Making Your Services More Reliable

Kevin Burke

@derivativeburke

What does Five Nines mean?

Five Nines = you can fail 0.001% of the time
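
As a rough worked example, the failure budget can be turned into seconds per year. This is a minimal sketch; the function name `allowed_downtime_seconds` is made up for illustration, and it assumes a 365-day year.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # ignores leap years


def allowed_downtime_seconds(nines: int) -> float:
    """Seconds of failure per year permitted by an N-nines availability target."""
    failure_fraction = 10 ** -nines  # 5 nines -> 0.00001, i.e. 0.001%
    return SECONDS_PER_YEAR * failure_fraction


for n in (3, 4, 5):
    minutes = allowed_downtime_seconds(n) / 60
    print(f"{n} nines: {minutes:.1f} minutes per year")
```

Under these assumptions five nines leaves a little over five minutes of failure per year.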

Chasing Five Nines

This might not be appropriate!

Microservices:

Many more opportunities for failure!
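
One way to see why: if a request has to pass through several services in series, their availabilities multiply. The sketch below assumes failures are independent and that every service is on the request path; `chain_availability` is a hypothetical helper, not something from the talk.

```python
def chain_availability(per_service: float, services: int) -> float:
    """Availability of a request that must succeed at every service in a serial chain.

    Assumes each service fails independently of the others.
    """
    return per_service ** services


# Ten services, each at five nines on its own.
combined = chain_availability(0.99999, 10)
print(f"combined availability: {combined:.6f}")
```

With ten five-nines services in a row, the chain as a whole is only at roughly four nines.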

What happens when I type google.com into my browser?

DNS Lookup

man getaddrinfo

Establish TCP Connection

man connect

Write Request

man 2 write

Read Response

man 2 read

Parse Response
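
The five steps above can be sketched with Python's `socket` module. This is an illustrative plain-HTTP/1.0 client only (no TLS, no redirects, no chunked encoding); the function name `fetch` and the 5 second timeout are assumptions, not taken from the slides.

```python
import socket


def fetch(host: str, port: int = 80, path: str = "/") -> tuple[int, bytes]:
    """Fetch a path over plain HTTP/1.0 and return (status code, body)."""
    # 1. DNS lookup (getaddrinfo); take the first address returned.
    family, socktype, proto, _, addr = socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    )[0]
    with socket.socket(family, socktype, proto) as sock:
        sock.settimeout(5)  # applies to connect and to each send/recv call
        # 2. Establish the TCP connection (connect).
        sock.connect(addr)
        # 3. Write the request (write).
        request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n"
        sock.sendall(request.encode("ascii"))
        # 4. Read the response (read) until the server closes the connection.
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    # 5. Parse the response: split headers from body, pull out the status code.
    head, _, body = b"".join(chunks).partition(b"\r\n\r\n")
    status_line = head.split(b"\r\n", 1)[0]
    return int(status_line.split()[1]), body
```

Each numbered step is a separate place where the request can fail or stall.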

DNS

DNS Provider Outages

eNom - 5 million domains

CloudFront - 100 Minutes Downtime

DNSimple - 11 Hour Downtime

DNS Lookup Failures

DNS Server is Down/Unreachable

You Might Be Vulnerable If...

DNS Resolver is Down - Workarounds

DNS Provider is Down - Workarounds

Timeouts

When something is taking too long, you abandon it

Your users have a timeout, whether your system does or not

Outside In

Setting Timeouts - 2 Questions

Hard Math Stuff

Fail early if you can't serve a request

Socket Timeouts Are Liars

Slow Read

Remote Server Unreachable

Why 18 Seconds?

HAProxy

retries is the number of times a connection attempt should be retried on a server when a connection either is refused or times out. The default value is 3.

Timeouts

One Timeout Value = set on both Connect/Read

Separate Connect Timeout

Not available in the standard library!
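
The slide does not say which library it has in mind. In Python, for instance, `urllib.request.urlopen` accepts a single `timeout` argument. As a hedged sketch, dropping down to the socket level is one way to use a short connect timeout and a longer read timeout; `open_with_timeouts` is a hypothetical helper.

```python
import socket


def open_with_timeouts(
    host: str, port: int, connect_timeout: float, read_timeout: float
) -> socket.socket:
    """Connect with one timeout, then switch the socket to another for I/O."""
    # The timeout here bounds each connection attempt.
    sock = socket.create_connection((host, port), timeout=connect_timeout)
    # From now on the timeout applies to each individual recv()/send() call.
    sock.settimeout(read_timeout)
    return sock
```

A connect should normally finish quickly, so a small connect timeout lets the caller give up on an unreachable host early while still allowing a slower response.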

Timeouts - Workarounds

Timeouts - Wall Clock Timeout
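
A socket timeout bounds each individual read, so a server that trickles out a byte at a time can keep a request alive far longer than the configured value. A wall clock timeout bounds the whole operation instead. The sketch below is one way to do that with a deadline; `recv_until_deadline` is a hypothetical name.

```python
import socket
import time


def recv_until_deadline(sock: socket.socket, total_timeout: float) -> bytes:
    """Read until EOF, but give up once total_timeout seconds have elapsed."""
    deadline = time.monotonic() + total_timeout
    chunks = []
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError("wall clock deadline exceeded")
        # Never let a single recv() wait past the overall deadline.
        sock.settimeout(remaining)
        try:
            chunk = sock.recv(4096)
        except socket.timeout:
            raise TimeoutError("wall clock deadline exceeded") from None
        if not chunk:
            return b"".join(chunks)
        chunks.append(chunk)
```

The monotonic clock is used so that system clock adjustments do not stretch or shrink the deadline.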

Timeouts - Measure

Retries

Retries - Temporal Failures

Retries - Single Component Failures

Exponential Backoff

1, 2, 4, 8...
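
The sequence above can be generated like this. It is a minimal sketch; `backoff_delays` and its parameters are illustrative names, and the unit (seconds) is an assumption.

```python
def backoff_delays(
    base: float = 1.0, factor: float = 2.0, attempts: int = 4
) -> list[float]:
    """Delay before each retry: base, base*factor, base*factor^2, ..."""
    return [base * factor ** i for i in range(attempts)]


print(backoff_delays())  # [1.0, 2.0, 4.0, 8.0]
```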

Exponential Backoff with Jitter

1.01, 2.03, 3.9, 8.2...
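
Jitter spreads retries out so that many clients that failed at the same moment do not all retry at the same moment. The numbers above look like a small random offset around each doubling delay; the sketch below assumes a multiplicative jitter of plus or minus 5%, which is a guess at the scheme rather than something the slides state. `jittered_delay` is a hypothetical name.

```python
import random


def jittered_delay(
    attempt: int, base: float = 1.0, jitter: float = 0.05, rng=random
) -> float:
    """Exponential delay for a zero-based attempt, scaled by a random factor.

    The factor is drawn uniformly from [1 - jitter, 1 + jitter].
    """
    delay = base * 2 ** attempt
    return delay * rng.uniform(1 - jitter, 1 + jitter)


# A seeded generator makes the example repeatable.
rng = random.Random(42)
print([round(jittered_delay(i, rng=rng), 2) for i in range(4)])
```

Other jitter schemes exist, such as picking uniformly between zero and the exponential delay; which one fits depends on how much spread is wanted.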

When can you retry?

Idempotence

Idempotent Actions

Idempotent Requests

You can always retry idempotent actions!

Not Idempotent

When can you retry?

Not Idempotent Requests

You can retry if the data never made it (connection timeout, connection error)
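
A sketch of that rule at the socket level: retry only while the connection has not been established, and stop retrying once any data may have been sent. `send_once_connected` and its parameters are hypothetical names for illustration.

```python
import socket
import time


def send_once_connected(
    host: str,
    port: int,
    payload: bytes,
    attempts: int = 3,
    connect_timeout: float = 3.0,
    pause: float = 0.1,
) -> bytes:
    """Retry the connect phase only; send the payload at most once."""
    last_error = None
    for _ in range(attempts):
        try:
            sock = socket.create_connection((host, port), timeout=connect_timeout)
        except OSError as exc:
            # Refused, unreachable, or connect timeout: nothing was sent,
            # so trying again cannot duplicate the request.
            last_error = exc
            time.sleep(pause)
            continue
        # Connected. From here on the server may have seen the data,
        # so any failure is allowed to propagate instead of being retried.
        with sock:
            sock.sendall(payload)
            return sock.recv(4096)
    raise ConnectionError(f"could not connect after {attempts} attempts") from last_error
```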

Not Idempotent Requests

Determining whether the data made it is hard

Not Idempotent Requests

You can retry if you get a 429 or a 503 (carefully!)
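
A 429 or 503 generally indicates that the server declined the request rather than processed it, and either response may carry a `Retry-After` header. The sketch below decides whether and how long to wait; `retry_delay`, the 1 second default, and the 60 second cap are assumptions for illustration, and the header lookup assumes the exact key `Retry-After`.

```python
from typing import Mapping, Optional

RETRYABLE_STATUSES = {429, 503}


def retry_delay(
    status: int,
    headers: Mapping[str, str],
    default: float = 1.0,
    max_delay: float = 60.0,
) -> Optional[float]:
    """Return seconds to wait before retrying, or None if no retry is advised."""
    if status not in RETRYABLE_STATUSES:
        return None
    value = headers.get("Retry-After", "")
    # Retry-After can also be an HTTP date; this sketch only handles
    # the delay-in-seconds form and falls back to the default otherwise.
    delay = float(value) if value.isdigit() else default
    return min(delay, max_delay)
```

Capping the delay and the number of attempts is part of being careful: an overloaded server is not helped by clients that retry aggressively.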

How do I make things idempotent?

Not idempotent

Idempotent (retryable) request

Idempotent (retryable) request

Requires sid collision handling!
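
The request examples from these slides did not survive extraction. As a hedged sketch of the general idea: if the client picks the sid, a retry of the same request can be recognised and answered without creating a second record, while a different request that reuses the sid is rejected as a collision. `MessageStore`, `SidCollision`, and the message fields are all hypothetical.

```python
import uuid


class SidCollision(Exception):
    """The sid is already in use by a request with a different body."""


class MessageStore:
    """In-memory stand-in for a server that accepts client-chosen sids."""

    def __init__(self) -> None:
        self._by_sid: dict[str, dict] = {}

    def create(self, sid: str, body: dict) -> dict:
        existing = self._by_sid.get(sid)
        if existing is None:
            self._by_sid[sid] = dict(body)
            return self._by_sid[sid]
        if existing == body:
            # Same sid, same body: treat as a retry and create nothing new.
            return existing
        raise SidCollision(sid)


store = MessageStore()
sid = uuid.uuid4().hex  # chosen by the client before the first attempt
first = store.create(sid, {"to": "+15550100", "text": "hi"})
again = store.create(sid, {"to": "+15550100", "text": "hi"})  # safe retry
```

The client must reuse the same sid on every retry of a given request; generating a fresh one per attempt would defeat the purpose.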

Know your HTTP client

Testing your clients

Thanks!

Kevin Burke

kev.inburke.com

kev@inburke.com

@derivativeburke


These slides are available at:

kev.inburke.com/slides/reliable-http
