An API Client that’s Faster than the API

For the past few weeks I've been working on Logrole, a Twilio log viewer. If you have to browse through your Twilio logs, I think this is the way that you should do it. We were able to do some things around performance and resource conservation that have been difficult to accomplish with today's popular web server technologies.

Fast List Responses

Twilio's API frequently takes over 1 second to return a page of calls or messages via the API. But Logrole usually returns results in under 100ms. How? Every 30 seconds we fetch the most recent page of Calls/Messages/Conferences and store them in a cache. When we download a page of resources, we get the URL to the next page - Twilio's next_page_uri — immediately, but a user might not browse to the next page for another few seconds. We don't have to wait for you to hit Next to get the results - on the server side, we fetch/cache the next page immediately, so it's ready when you finally hit the Next button, and it feels really snappy.

The cache is a simple LRU cache. We run Go structs through encoding/gob and then gzip before storing them, which takes the size of a cache value from 42KB to about 3KB. At this size, about 8,300 values can fit in 25MB of memory.

var buf bytes.Buffer
writer := gzip.NewWriter(&buf)
enc := gob.NewEncoder(writer)
if err := enc.Encode(data); err != nil {
	panic(err)
}
if err := writer.Close(); err != nil {
	panic(err)
}
c.mu.Lock()
defer c.mu.Unlock()
c.c.Add(key, buf.Bytes())
c.Debug("stored data in cache", "key", key, "size", buf.Len(),
    "cache_size", c.c.Len())

Right now one machine is more than enough to serve the website, but if we ever needed multiple machines, we could use a tool like groupcache to share/lookup cached values across multiple different machines.

Combining Shared Queries

The logic in the previous paragraphs leads to a race. If the user requests the Next page before we've finished retrieving/saving the value in the cache, we'll end up making two requests for the exact same data. This isn't too bad in our case, but doubles the load on Twilio, and means the second request could have gotten the results sooner by reusing the response from first request.

The singleflight package is useful for ensuring only one request ever gets made at a time. With singleflight, if a request is already in progress with a given key, a caller will get the return value from the first request. We use the next page URI as the key.

var g singleflight.Group
g.Do(page.NextPageURI, func() (interface{}, error) {
    // 1) return value from cache, if we've stored it
    // 2) else, retrieve the resource from the API
    // 3) store response in the cache
    // 4) return response
})

This technique is also useful for avoiding thundering herd problems.

Canceling Unnecessary Queries

You've configured a 30 second request timeout in nginx, and a query is taking too long, exceeding that timeout. nginx returns a 504 Gateway Timeout and moves on, but your server is still happily processing the request, even though no one is listening. It's a hard problem to solve because it's much easier for a thread to just give up than to give up and tell everyone downstream of you that they can give up too. A lot of our tools and libraries don't have the ability to do that kind of out of band signaling to a downstream process.

Go's context.Context is designed as an antidote for this. We set a timeout in a request handler very early on in the request lifecycle:

ctx, cancel := context.WithTimeout(req.Context(), timeout)
defer cancel()
req = req.WithContext(ctx)
h.ServeHTTP(w, req)

We pass this context to any method call that does I/O - a database query, a API client request (in our case), or a exec.Command. If the timeout is exceeded, we'll get a message on a channel at ctx.Done(), and can immediately stop work, no matter where we are. Stopping work if a context is canceled is a built in property of http.Request and os/exec in Go 1.7, and will be in database/sql starting with Go 1.8.

This is so nice - as a comparison, one of the most popular npm libraries for "stop after a timeout" is the connect-timeout library, which let you execute a callback after a particular amount of time, but does nothing to cancel any in-progress work. No popular ORM's for Node support canceling database queries.

It can be really tricky to enforce an absolute deadline on a HTTP request. In most languages you compute a timeout as a duration, but this timeout might reset to 0 every time a byte is received on the socket, making it difficult to enforce that the request doesn't exceed some wall-clock amount of time. Normally you have to do this by starting a 2nd thread that sleeps for a wall-clock amount of time, then checks whether the HTTP request is still in progress and kills the HTTP request thread if so. This 2nd thread also has to cleanup and close any open file descriptors.

Starting threads / finding and closing FD's may not be easy in your language but Contexts make it super easy to set a deadline for sending/receiving data and communicating that to a lot of different callers. Then the http request code can clean up the same way it would for any canceled request.

Metrics

I've been obsessed with performance for a long time and one of the first things I like to do in a new codebase is start printing out timestamps. How long did tests take? How long did it take to start the HTTP server? It's impossible to optimize something if you're not measuring it and it's hard to make people aware of a problem if you're not printing durations for common tasks.

Logrole prints three numbers in the footer of every response: the server response time, the template render time, and the Twilio API request time. You can use these numbers to get a picture of where the server was spending its time, and whether template rendering is taking too long. I use custom template functions to implement this - we store the request's start time in its Context, and then print time elapsed on screen. Obviously this is not a perfect measure since we can't see the time after the template footer is rendered - mainly the ResponseWriter.Write call. But it's close enough.

HTML5

Logrole loads one CSS file and one font. I would have had to use a lot more Javascript a few years ago, but HTML5 has some really nice features that eliminate the need for Javascript. For example, there's a built in date picker for HTML5, that people can use to filter calls/messages by date (It's best supported in Chrome at the moment). Similarly you don't need Javascript to play recordings anymore. HTML5 has an <audio> element that will provide controls for you.

I've needed Javascript in only three places so far:

a "click to show images" toggle button where the CSS necessary to implement it would have been too convoluted
a "click-to-copy" button
To submit a user's timezone change when they change it in the menu bar (instead of having a separate "Submit" button).

About 50 lines in total, implemented in the HTML below the elements where it's needed.

Conclusion

Combining these techniques, we get a server that uses little memory, doesn't waste any time doing unnecessary work, and responds and renders a slow data set extraordinarily quickly. Before starting a new project, evaluate the feature set of the language/frameworks you plan to use - whether the ORM/API clients you are planning to use support fast cancelation, whether you can set wall-clock timeouts and propagate them easily through your stack, and whether your language makes it easy to combine duplicate requests.

If you are a Twilio customer I hope you will give Logrole a try - I think you will like it a lot.

Thanks to Bradley Falzon, Kyle Conroy and Alan Shreve for reading drafts of this post. Thanks to Brad Fitzpatrick for designing and implementing most of the code mentioned here.

Liked what you read? I am available for hire.

KEVIN BURKE