Posts Tagged With: Code

Dumb Tricks to Save Database Space

I have seen a few databases recently that could have saved a lot of space by being more efficient with how they stored data. Sometimes this isn't a big problem, when a table is not going to grow particularly quickly. But it can become a big problem and you can be leaving a lot of disk savings on the table.

Let's review some of the benefits of smaller tables:

  • Indexes are smaller. This means your database needs less space to index your tables, and more RAM can be used to cache results.

  • The cache can hold more objects, since the objects are smaller.

  • You'll delay the point at which your database won't fit on a single disk, and you have to shard.

  • Query results which might have fit in 2 TCP packets will now fit in one.

  • Backups complete more quickly.

  • Your application servers will use less RAM to hold the result.

  • Migrations complete more quickly.

  • Full table searches complete more quickly.

Let's review some common data types and strategies for holding these. If these are obvious to you - great! You can stop reading at any point. They're not obvious to a lot of people.

A brief reminder before we get started - a bit is a single 0 or 1, and a byte is a series of 8 bits. Every ASCII character can be represented in a single byte.


It's common to store UUID's as text fields in your database. A typical UUID representation - "ad91e02b-a147-4c47-aa8c-1f3c2240c0df" - will take up 36 bytes and more if you store it with a prefix like SMS or user_. A UUID uses only 16 different characters (the hyphens are for display only, like hyphens in a phone number). This means you only need 4 bits to store a UUID character. There are 32 characters in a UUID, so can fit a UUID in 16 bytes, a saving of 55%. If you're using Postgres, you can use the uuid data type to store UUID's, or the binary data type - in MySQL, you can use a binary(16).


It's often useful to store a prefix with a UUID, so you know what the ID represents from looking at it - for example, SM123 or user_456. I wrote a short library that stores a prefix with a UUID, but strips it before writing to the database. To read UUID's out of the database with a prefix, attach them to the SELECT statement:

SELECT 'user_' || id FROM users LIMIT 5;

My old team at Shyp recently converted text ID's to UUID's and wrote about that process on their engineering blog.


It's becoming more common to store relational data in JSON or JSONB columns. There are a lot of reasons to do this or not do this - I don't want to rehash that discussion here. JSONB does lead to inefficient data storage for UUID's, however, since you are limited to storing characters that are valid JSON. If you are storing UUID's this means you can't get down to 16 bytes, since you can't just any byte in JSON. You can base64 encode your 16 byte UUID. In Go, that encoding dance looks something like this:

import "encoding/base64"
import "encoding/hex"
import "strings"
rawUUID := "ad91e02b-a147-4c47-aa8c-1f3c2240c0df"
// Strip the hyphens
uuidStr := strings.Replace(rawUUID, "-", "", 4)
// Decode the hex string into a slice of 16 bytes.
bits, _ := hex.DecodeString(uuidStr)
// Re-encode that 16-byte slice using base64.

That outputs rZHgK6FHTEeqjB88IkDA3w, which is only 22 bytes, a 38% improvement.

Use smaller numbers

A default Postgres integer is 4 bytes and can hold any number from -2147483648 to 2147483648. If you know that the integer you are storing is never going to exceed 32,760, you can use a smallint (2 bytes) to store it and save 2 bytes per row.

Use an enum type

Let's say you have a subscription that can have one of several states (trial, paid, expired). Storing the strings "trial", "paid", "expired" in the database can take up extra space. Instead use an enum type, which is only 4 bytes (1 byte in MySQL) and ensures you can't accidentally write a bad status like "trail". Another alternative is to store a smallint and convert them to values that make sense in the application, but this makes it harder to determine what things are if you're querying the database directly, and doesn't prevent mistakes.

Use binary for password hashes

Most password hashing algorithms should give you back raw bytes. You should be able to store the raw bytes directly in the database using bytea.

Move fields out of JSON columns

One downside of JSON/JSONB is that the key gets stored alongside the value for each row in the application. If you are storing a boolean like {"show_blue_button": true} in JSON, you're using 18 bytes per row to store the string "show_blue_button" and only one bit to store the boolean true. If you store this field in a Postgres column, you are only using one or two bits per row. Moving this to a column pays off in terms of space even if you only need the show_blue_button boolean once every 70-140 rows. It's much easier to add indexes on columns than JSON fields as well.


That's it! A small amount of thought and work upfront can pay big dividends down the line. Migrating columns after they're already in place can be a pain. In general, the best approach is to do the following:

  • Add the new column with the correct type.

  • Edit your application to write/update both the new and the old column.

  • Backfill the new column, copying over all values from the old column for old records in the database. If the table is large, do this in batches of 1000 rows or so to avoid locking your table for too long.

  • Edit your application to read exclusively from the new column.

  • Drop the old column.

I hope this helps!

Inspired by some tweets from Andrey Petrov and a Heap Analytics post about JSONB.

Liked what you read? I am available for hire.

Real Life Go Benchmarking

I've been following the commits to the Go project for some time now. Occasionally someone will post a commit with benchmarks showing how much the commit improves performance along some axis or another. In this commit, they've increased the performance of division by 7 (a notoriously tricky number to divide by) by about 40% on machines running the ARM instruction set. I'm not 100% sure what the commit does, but it switches around the instructions that get generated when you do long division on ARM in a way that makes things faster.

Anyway, I wanted to learn how to do benchmarks, and practice making something faster. Some motivation came along when Uber released their go-zap logging library, with a set of benchmarks showing my friend Alan's logging library as the worst performer. So I thought it would be a good candidate for benchmarking.

Fortunately Alan has already included a set of benchmarks in the test suite. You can run them by cloning the project and then calling the following:

go test -run=^$ -bench=.

You need to pass -run=^$ to exclude all tests in the test suite, otherwise all of the tests will run and also all of the benchmarks. We only care about the benchmarks, so -run=^$ filters out every argument.

We get some output!

BenchmarkStreamNoCtx-4                   	  300000	      5735 ns/op
BenchmarkDiscard-4                       	 2000000	       856 ns/op
BenchmarkCallerFileHandler-4             	 1000000	      1980 ns/op
BenchmarkCallerFuncHandler-4             	 1000000	      1864 ns/op
BenchmarkLogfmtNoCtx-4                   	  500000	      3866 ns/op
BenchmarkJsonNoCtx-4                     	 1000000	      1816 ns/op
BenchmarkMultiLevelFilter-4              	 2000000	      1015 ns/op
BenchmarkDescendant1-4                   	 2000000	       857 ns/op
BenchmarkDescendant2-4                   	 2000000	       872 ns/op
BenchmarkDescendant4-4                   	 2000000	      1029 ns/op
BenchmarkDescendant8-4                   	 1000000	      1018 ns/op
BenchmarkLog15AddingFields-4             	   50000	     29492 ns/op
BenchmarkLog15WithAccumulatedContext-4   	   50000	     33599 ns/op
BenchmarkLog15WithoutFields-4            	  200000	      9417 ns/op
ok	30.083s

The name on the left is the benchmark name. The number (4) is the number of CPU's used for the benchmark. The number in the middle is the number of test runs; to get a good benchmark, you want to run a thing as many times as feasible, then divide the total runtime by the number of test runs. Otherwise you run into problems like coordinated omission and weighting the extreme outliers too much, or failing to weight them at all.

To get a "good" benchmark, you want to try to isolate the running code from anything else running on the machine. For example, say you run the tip benchmarks while a Youtube video is playing, make a change, and then run the benchmarks while nothing is playing. Video players require a lot of CPU/RAM to play videos, and all other things being equal, the benchmark is going to be worse with Youtube running. There are a lot of ways to accidentally bias the results as well, for example you might get bored with the tip benchmarks and browse around, then make a change and observe the console intensely to see the new results. You're biasing the results simply by not switching tabs or screens!

If you have access to a Linux machine with nothing else running on it, that's going to be your best bet for benchmarking. Otherwise, shut down as many other programs as are feasible on your main machine before starting any benchmark tests.

Running a benchmark multiple times can also be a good way to compensate for environmental effects. Russ Cox's benchstat program is very useful for this; it gathers many runs of a benchmark, and tells you whether the results are statistically significant. Run your benchmark with the -count flag to run it multiple times in a row:

go test -count=20 -run=^$ -bench=Log15AddingFields | tee -a master-benchmark

Do the same for your change, writing to a different file (change-benchmark), then run benchstat:

benchstat master-benchmark change-benchmark

You'll get really nice looking output. This is generally the output used by the Go core development team to print benchmark results.

$ benchstat benchmarks/tip benchmarks/early-time-exit
name                 old time/op  new time/op  delta
StreamNoCtx-4        3.60µs ± 6%  3.17µs ± 1%  -12.02%  (p=0.001 n=8+6)
Discard-4             837ns ± 1%   804ns ± 3%   -3.94%  (p=0.001 n=7+6)
CallerFileHandler-4  1.94µs ± 2%  1.88µs ± 0%   -3.01%  (p=0.003 n=7+5)
CallerFuncHandler-4  1.72µs ± 3%  1.65µs ± 1%   -3.98%  (p=0.001 n=7+6)
LogfmtNoCtx-4        2.39µs ± 2%  2.04µs ± 1%  -14.69%  (p=0.001 n=8+6)
JsonNoCtx-4          1.68µs ± 1%  1.66µs ± 0%   -1.08%  (p=0.001 n=7+6)
MultiLevelFilter-4    880ns ± 2%   832ns ± 0%   -5.44%  (p=0.001 n=7+6)
Descendant1-4         855ns ± 3%   818ns ± 1%   -4.28%  (p=0.000 n=8+6)
Descendant2-4         872ns ± 3%   830ns ± 2%   -4.87%  (p=0.001 n=7+6)
Descendant4-4         934ns ± 1%   893ns ± 1%   -4.41%  (p=0.001 n=7+6)
Descendant8-4         990ns ± 2%   958ns ± 2%   -3.29%  (p=0.002 n=7+6)

OK! So now we have a framework for measuring whether a change helps or hurts.

How can I make my code faster?

Good question! There's no one answer for this. In general, there are three strategies:

  • Figure out a way to do less work than you did before. Avoid doing an expensive computation where it's not necessary, exit early from functions, &c.

  • Do the same work, but in a faster way; use a different algorithm, or use a different function, that's faster.

  • Do the same work, but parallelize it across multiple CPU's, or threads.

Each technique is useful in different places, and it can be hard to predict where you'll be able to extract performance improvements. It is useful to know how expensive various operations are, so you can evaluate the costs of a given operation.

It's also easy to spend a lot of time "optimizing" something only to realize it's not the bottleneck in your program. If you optimize something that takes 5% of the code's execution time, the best overall speedup you can get is 5%, even if you get the code to run instantaneously. Go's test framework has tools for figuring out where your code is spending the majority of its time. To get the best use out of them, focus on profiling the code execution for a single benchmark. Here, I'm profiling the StreamNoCtx benchmark in the log15 library.

$ go test -cpuprofile=cpu.out -benchmem -memprofile=mem.out -run=^$ -bench=StreamNoCtx -v
BenchmarkStreamNoCtx-4   	  500000	      3502 ns/op	     440 B/op	      12 allocs/op

This will generate 3 files: cpu.out, mem.out, and log15.test. These are binary files. You want to use the pprof tool to evaluate them. First let's look at the CPU profile; I've started it and then run top10 to get the top 10 functions.

$ go tool pprof log15.test cpu.out
Entering interactive mode (type "help" for commands)
(pprof) top5
560ms of 1540ms total (36.36%)
Showing top 5 nodes out of 112 (cum >= 60ms)
      flat  flat%   sum%        cum   cum%
     180ms 11.69% 11.69%      400ms 25.97%  runtime.mallocgc
     160ms 10.39% 22.08%      160ms 10.39%  runtime.mach_semaphore_signal
     100ms  6.49% 28.57%      260ms 16.88%
      60ms  3.90% 32.47%       70ms  4.55%  bytes.(*Buffer).WriteByte
      60ms  3.90% 36.36%       60ms  3.90%  runtime.stringiter2

The top 5 functions are responsible for 36% of the program's runtime. flat is how much time is spent inside of a function, cum shows how much time is spent in a function, and also in any code called by a function. Of these 5, runtime.stringiter2, runtime.mallocgc, and runtime.mach_semaphore_signal are not good candidates for optimization. They are very specialized pieces of code, and they're part of the Go runtime, so changes need to pass all tests and get approved by the Go core development team. We could potentially figure out how to call them less often though - mallocgc indicates we are creating lots of objects. Maybe we could figure out how to create fewer objects.

The likeliest candidate for improvement is the escapeString function in our own codebase. The list function is super useful for checking this.

(pprof) list escapeString
ROUTINE ======================== in /Users/kevin/code/go/src/
      30ms      330ms (flat, cum) 23.40% of Total
      10ms       10ms    225:func escapeString(s string) string {
         .          .    226:	needQuotes := false
         .       90ms    227:	e := bytes.Buffer{}
         .          .    228:	e.WriteByte('"')
      10ms       50ms    229:	for _, r := range s {
         .          .    230:		if r <= ' ' || r == '=' || r == '"' {
         .          .    231:			needQuotes = true
         .          .    232:		}
         .          .    233:
         .          .    234:		switch r {
         .          .    235:		case '\\', '"':
         .          .    236:			e.WriteByte('\\')
         .          .    237:			e.WriteByte(byte(r))
         .          .    238:		case '\n':
         .          .    239:			e.WriteByte('\\')
         .          .    240:			e.WriteByte('n')
         .          .    241:		case '\r':
         .          .    242:			e.WriteByte('\\')
         .          .    243:			e.WriteByte('r')
         .          .    244:		case '\t':
         .          .    245:			e.WriteByte('\\')
         .          .    246:			e.WriteByte('t')
         .          .    247:		default:
         .      100ms    248:			e.WriteRune(r)
         .          .    249:		}
         .          .    250:	}
         .       10ms    251:	e.WriteByte('"')
         .          .    252:	start, stop := 0, e.Len()
         .          .    253:	if !needQuotes {
         .          .    254:		start, stop = 1, stop-1
         .          .    255:	}
      10ms       70ms    256:	return string(e.Bytes()[start:stop])

The basic idea here is to write a string, but add a backslash before double quotes, newlines, and tab characters. Some ideas for improvement:

  • We create a bytes.Buffer{} at the beginning of the function. We could keep a Pool of buffers, and retrieve one, so we don't need to allocate memory for a buffer each time we escape a string.

  • If a string doesn't contain a double quote, newline, tab, or carriage return, it doesn't need to be escaped. Maybe we can avoid creating the Buffer entirely for that case, if we can find a fast way to check whether a string has those characters in it.

  • If we call WriteByte twice in a row, we could probably replace it with a WriteString(), which will use a copy to move two bytes, instead of growing the array twice.

  • We call e.Bytes() and then cast the result to a string. Maybe we can figure out how to call e.String() directly.

You can then look at how much time is being spent in each area to get an idea of how much any given idea will help your benchmarks. For example, replacing WriteByte with WriteString probably won't save much time; you're at most saving the time it takes to write every quote and newline, and most strings are made up of letters and numbers and space characters instead. (It doesn't show up at all in the benchmark but that's because the benchmark writes the phrase "test message" over and over again, and that doesn't have any escape-able characters).

That's the CPU benchmark; how much time the CPU spends running each function in the codebase. There's also the memory profile; how much memory each function allocates. Let's look at that! We have to pass one of these flags to pprof to get it to show memory information.

Sample value selection option (for heap profiles):
  -inuse_space      Display in-use memory size
  -inuse_objects    Display in-use object counts
  -alloc_space      Display allocated memory size
  -alloc_objects    Display allocated object counts

Let's pass one. Notice in this case, the top 5 functions allocate 96% of the objects:

go tool pprof -alloc_objects log15.test mem.out
(pprof) top5
6612331 of 6896359 total (95.88%)
Showing top 5 nodes out of 18 (cum >= 376843)
      flat  flat%   sum%        cum   cum%
   2146370 31.12% 31.12%    6612331 95.88%
   1631482 23.66% 54.78%    1631482 23.66%
   1540119 22.33% 77.11%    4465961 64.76%
    917517 13.30% 90.42%    1294360 18.77%
    376843  5.46% 95.88%     376843  5.46%  time.Time.Format

Let's look at a function:

ROUTINE ======================== in /Users/kevin/code/go/src/
   1540119    4465961 (flat, cum) 64.76% of Total
         .          .     97:		if i != 0 {
         .          .     98:			buf.WriteByte(' ')
         .          .     99:		}
         .          .    100:
         .          .    101:		k, ok := ctx[i].(string)
         .    2925842    102:		v := formatLogfmtValue(ctx[i+1])
         .          .    103:		if !ok {
         .          .    104:			k, v = errorKey, formatLogfmtValue(k)
         .          .    105:		}
         .          .    106:
         .          .    107:		// XXX: we should probably check that all of your key bytes aren't invalid
         .          .    108:		if color > 0 {
         .          .    109:			fmt.Fprintf(buf, "\x1b[%dm%s\x1b[0m=%s", color, k, v)
         .          .    110:		} else {
   1540119    1540119    111:			fmt.Fprintf(buf, "%s=%s", k, v)
         .          .    112:		}
         .          .    113:	}

In the common case on line 111 (when color = 0), we're calling fmt.Fprintf to write the key, then an equals sign, then the value. Fprintf also has to allocate memory for its own byte buffer, then parse the format string, then interpolate the two variables. It might be faster, and avoid allocations, to just call buf.WriteString(k), then write the equals sign, then call buf.WriteString(v). But you'd want to benchmark it first to double check!


Using a combination of these techniques, I was able to improve the performance of log15 by about 30% for some common code paths, and reduce memory allocations as well. I was not expecting this, but I also found a way to speed up JSON encoding in the Go standard library by about 20%!

Want someone to benchmark/improve performance in your company's application, or teach your team more about benchmarking? I am available for consulting; email me and let's set something up!

Liked what you read? I am available for hire.

Cleaning up Parallel Tests in Go 1.7

I have a lot of tests in Go that integrate with Postgres, and test the interactions between Go models and the database.

A lot of these tests can run in parallel. For example, any test that attempts to write a record, but fails with a constraint failure, can run in parallel with all other tests. A test that tries to read a random database ID and expects to not fetch a record can run in parallel with other tests. If you write your tests so they all use random UUID's, or all run inside of transactions, you can run them in parallel. You can use this technique to keep your test suite pretty fast, even if each individual test takes 20-40 milliseconds.

You can mark a test to run in parallel by calling t.Parallel() at the top of the test. Here's an example test from the job queue Rickover:

func TestCreateMissingFields(t *testing.T) {
  job := models.Job{
    Name: "email-signup",
  _, err := jobs.Create(job)
  test.AssertError(t, err, "")
  test.AssertEquals(t, err.Error(), "Invalid delivery_strategy: \"\"")

This test will run in parallel with other tests marked Parallel and only with other tests marked Parallel; all other tests run sequentially.

The problem comes when you want to clear the database. If you have a t.Parallel() test clean up after it has made its assertions, it might try to clear the database while another Parallel() test is still running! That wouldn't be good at all. Presumably, the sequential tests are expecting the database to be cleared. (They could clear it at the start of the test, but this might lead to unnecessary extra DB writes; it's better for tests that alter the database to clean up after themselves).

(You can also run every test in a transaction, and roll it back at the end. Which is great, and gives you automatic isolation! But you have to pass a *sql.Tx around everywhere, and make two additional roundtrips to the database, which you probably also need to do in your application).

Go 1.7 adds the ability to nest tests. Which means we can run setup once, run every parallel test, then tear down once. Something like this (from the docs):

func TestTeardownParallel(t *testing.T) {
  // This Run will not return until the parallel tests finish.
  t.Run("group", func(t *testing.T) {
    t.Run("Test1", parallelTest1)
    t.Run("Test2", parallelTest2)
    t.Run("Test3", parallelTest3)
  // <tear-down code>

Note you have to lowercase the function names for the parallel tests, or they'll run inside of the test block, and then again, individually. I settled on this pattern:

var parallelTests = []func(*testing.T){
func TestAll(t *testing.T) {
  defer test.TearDown(t)
  t.Run("Parallel", func(t *testing.T) {
    for _, parallelTest := range parallelTests {
      test.F(t, parallelTest)

The test mentioned there is the set of test helpers from the Let's Encrypt project, plus some of my own. test.F finds the defined function name, capitalizes it, and passes the result to test.Run:

// capitalize the first letter in the string
func capitalize(s string) string {
  r, size := utf8.DecodeRuneInString(s)
  return fmt.Sprintf("%c", unicode.ToTitle(r)) + s[size:]
func F(t *testing.T, f func(*testing.T)) {
  longfuncname := runtime.FuncForPC(reflect.ValueOf(f).Pointer()).Name()
  funcnameparts := strings.Split(longfuncname, ".")
  funcname := funcnameparts[len(funcnameparts)-1]
  t.Run(capitalize(funcname), f)

The result is a set of parallel tests that run a cleanup action exactly once. The downside is the resulting tests have two levels of nesting; you have to define a second t.Run that waits for the parallel tests to complete.

=== RUN   TestAll
=== RUN   TestAll/Parallel
=== RUN   TestAll/Parallel/TestCreate
=== RUN   TestAll/Parallel/TestCreateEmptyPriority
=== RUN   TestAll/Parallel/TestUniqueFailure
=== RUN   TestAll/Parallel/TestGet
--- PASS: TestAll (0.03s)
    --- PASS: TestAll/Parallel (0.00s)
        --- PASS: TestAll/Parallel/TestCreate (0.01s)
        --- PASS: TestAll/Parallel/TestCreateEmptyPriority (0.01s)
        --- PASS: TestAll/Parallel/TestUniqueFailure (0.01s)
        --- PASS: TestAll/Parallel/TestGet (0.02s)

The other thing that might trip you up: If you add print statements to your tear down lines, they'll appear in the console output before the PASS lines. However, I verified they run after all of your parallel tests are finished running.

Liked what you read? I am available for hire.

Buying stocks without a time limit

Recently I put the maximum amount of cash into my IRA account. Since stock prices jump up and down all the time, I wondered whether the current price would be the best one to buy the stock at. In particular, I'm not withdrawing money from my IRA for the better part of 40 years, so I'm not going to sell it — I want the absolute lowest price possible.

Stock markets offer a type of order that's useful for this - you can put in a limit order, which says essentially "I'll buy N shares at price X", where X is some price below the last traded price of the stock. Anyone can take the other side of that order at any time, which usually happens when the stock price drops below the limit order! So you can put in a limit order at any time and it will execute whenever the stock price drops below that amount. I'm simplifying slightly but that's the gist.

So if the current stock price is X, what price should I put in my limit order for? You should be able to come up with some kind of model of the stock price and how it moves and say, roughly, "If you set your limit order at Z, there will be a ~98% chance of it executing in the next year." The normal criticism of stock price models is they underestimate the probability of extreme events, which is alright for us, since extreme events make it more likely our limit order will trigger.

I figured someone studied this problem before so I asked about it on Quant Stack Exchange and I got a really good answer! The answer pointed to two different papers, which have attempted to model this.

Essentially the model looks at the volatility of the stock - how often it jumps up and down - the stock's current price, the probability you want the order to hit, and the amount of time you want to wait. The formula is a little complex but it looks like this:

I wrote some code to compute these values for the stocks I wanted to buy. The hardest part is figuring out the volatility, because you need access to a lot of stock prices, and you'll want to compute the standard deviation from that. I was able to get historical price data from Quandl, and current stock prices from Yahoo.

Combine the two and you get output that looks like this:

$ go run erf.go main.go --stock=<stock> --total=5500 --percent 0.95
the standard deviation is: 0.006742
annualized: 0.107022
current price: 74.95
Based on this stock's volatility, you should set a limit order for: $74.35.

Compared with buying it at the current price, you'll be able to buy 0.6 extra shares (a value of $43.77)

Pretty neat! I've put the code up on Github, so you can take a look for yourself.

Liked what you read? I am available for hire.

A Two Month Debugging Story

This is the story about an error that caused our tests to fail maybe one out of one hundred builds. Recently we figured out why this happened and deployed code to fix it.

The Problem

I've been fighting slow test times in Javascript and Sails since pretty much the day I started at Shyp. We run about 6000 tests on a popular CI service, and they take about 7 minutes to run end to end. This is pretty slow, so we spend a lot of time thinking about ways to make them faster.

To ensure each test has a clean slate, we clear the database between each test. Normally you clear tables by running TRUNCATE TABLE table1 table2 table3 table4 CASCADE. Unfortunately this takes about 200ms, which is too slow when you want to run it between every test - it would double the runtime of our test suite.

DELETE FROM runs much faster, but if you have records in table A that reference table B, DELETE FROM tableB will fail, since records in A still point to it. We could draw a dependency graph between our 60 tables and issue the DELETEs in the correct order, but this seems painful to maintain as we add more constraints. Instead our contractor James Cropcho found a solution - disable the constraints, run the DELETE, then re-enable the constraints. This ran in about 20ms.


This worked for a few months. Unfortunately, occasionally a test would time out and fail after 18 seconds - maybe 1 in 100 test runs would fail with a timeout. Normally when the tests fail you get back some kind of error message — a Postgres constraint failure will print a useful message, or a Javascript exception will bubble up, or Javascript will complain about an unhandled rejection.

With this error we didn't get any of those. We also observed it could happen anywhere - it didn't seem to correlate with any individual test or test file.

Where to start looking

For a long time we suspected that our ORM was losing some kind of race and a callback wasn't being hit, deep in the framework code. This type of problem isn't uncommon in Javascript - the most common tool for stubbing the system time causes hangs in many libraries, including Express, Request, async, Bluebird, and Co. These types of issues are incredibly difficult to debug, because when things hang, or fail to hit callbacks, you have virtually no insight into what the stack looks like, or what callback failed to get hit. We could add console.log lines but you have to have some idea what we were looking for. Furthermore annotating every single library and every single query would have caused readability problems elsewhere.

Run the tests forever

Our CI provider lets us SSH into a host, but you have to enable SSH access before the build completes, which isn't practical when 1 in 100 builds fails. My normal approach to debugging intermittent test failures is to run the tests in a continuous loop until they fail, and then inspect the test logs/SSH to the host and look for... something, until you figured out what was going on. Unfortunately that wasn't triggering it often enough either, and even when I did trigger it, our CI provider shuts down SSH access after 30 minutes, and I wasn't fast enough to notice/investigate.

Instead we had to ship whatever logs we needed as artifacts from the test. This required turning on database logging for Postgres, then copying logs to the directory where artifacts get exported. Database logs across multiple containers shared the same filename, which meant only database logs from the first container became available as artifacts, so we also had to add a unique ID to each log file so they'd get exported properly.

This was a lot of steps, and took just enough debugging/setup time that I procrastinated for a while, hoping a better stack trace or error message might reveal itself, or that the problem would go away.

Here are the Postgres settings we enabled:

logging_collector = on
log_statement = 'all'
log_duration = on
# Log if we are waiting on a lock.
deadlock_timeout = 1s
log_lock_waits = on
# Ensure all logs are in one file.
log_rotation_size = 200MB
log_line_prefix = '%m [%p-%c-%l] %e %q%u@%d '

Finally we found something!

20:38:04.947 LOG: process 16877 still waiting for AccessExclusiveLock on relation 16789 of database 16387 after 1000.113 ms
20:38:04.947 DETAIL: Process holding the lock: 16936. Wait queue: 16877.
20:38:23.141 LOG: process 16877 acquired AccessExclusiveLock on relation 16789 of database 16387 after 19193.981 ms
20:38:23.141 ERROR: canceling autovacuum task
20:38:23.141 CONTEXT: automatic vacuum of table ""
20:38:23.141 LOG: process 16877 acquired AccessExclusiveLock on relation 16789 of database 16387 after 19193.981 ms

It looks like, unbeknownst to us, autovacuum was running in the background, and conflicting with the multi-statement ALTER TABLE query, in a way that each one was waiting for the other to complete. If you get deep enough in the weeds of Postgres blog posts, you find that stuck VACUUMs can and do happen.

Although it requires a fairly unusual combination of circumstances, a stuck VACUUM can even lead to an undetected deadlock situation, as Tom Lane recently pointed out. If a process that has a cursor open on a table attempts to take an exclusive lock on that table while it's being vacuumed, the exclusive lock request and the vacuum will both block until one of them is manually cancelled.

Not out of the woods

Unfortunately, we continued to see deadlocks. In some cases, a test would continue to make background database queries after they had reported completion. These would interact poorly with our ALTER TABLE commands, and deadlock, leading to a timeout. I posted on the Postgresql mailing list about this, and Alexey Bashtanov suggested a different approach. Instead of running ALTER TABLE, start a transaction and defer enforcement of foreign key constraints until the COMMIT. That way we could run the DELETE FROM's in any order and they'd still work.


This approach also sped up that query! It now runs in about 10ms, versus 20ms before.

Not out of the woods, again

We thought we'd nipped it in the bud, and certainly we were happy with the performance improvement. Unfortunately we've still been seeing timeouts; less frequently, but maybe once or twice a week. They usually happen when INSERTing records into an empty table; Postgres just hangs for 15 seconds, until the statement_timeout kicks in and Postgres kills the query, which fails the build. There are no messages about locks, so we're not sure what's going on. This is pretty uncharacteristic behavior for Postgres, and we're not doing anything particularly crazy with queries. We're open to suggestions; I posted about it again on the Postgres mailing list but did not get any good feedback.


The amount of work that's gone into tracking this failure down is depressing, though it's surprising that Javascript isn't the culprit for intermittent test failures. It would be nice if our CI provider had a way to auto-enable SSH on failed builds, enabled Postgres logging by default, or disabled autovacuum by default. I'd also recommend if you're adding foreign key constraints to make them DEFERRABLE INITIALLY IMMEDIATE; this gives you the most flexibility to pursue strategies like we did.

Liked what you read? I am available for hire.

The Frustration and Loneliness of Server-Side Javascript Development

I was hired in December 2014 as the sixth engineer at Shyp. Shyp runs Node.js on the server. It's been a pretty frustrating journey, and I wanted to share some of the experiences we've had. There are no hot takes about the learning curve, or "why are there so many frameworks" in this post.

  • Initially we were running on Node 0.10.30. Timers would consistently fire 50 milliseconds early - that is, if you called setTimeout with a duration of 200 milliseconds, the timer would fire after 150 milliseconds. I asked about this in the #Node.js Freenode channel, and was told it was my fault for using an "ancient" version of Node (it was 18 months old at that time), and that I should get on a "long term stable" version. This made timeout-based tests difficult to write - every test had to set timeouts longer than 50ms.

  • I wrote a metrics library that published to Librato. I expect a background metric publishing daemon to silently swallow/log errors. One day Librato had an outage and returned a 502 to all requests; unbeknownst to me the underlying librato library we were using was also an EventEmitter, that crashed the process if unhandled. This caused about 30 minutes of downtime in our API.

  • The advice on the Node.js website is to crash the process if there's an uncaught exception. Our application is about 60 models, 30 controllers; it doesn't seem particularly large. It consistently takes 40 seconds to boot our server in production; the majority of this time is spent requiring files. Obviously we try not to crash, and fix crashes when we see them, but we can't take 40 seconds of downtime because someone sent an input we weren't expecting. I asked about this on the Node.js mailing list but got mostly peanuts. Recently the Go core team sped up build times by 2x in a month.

  • We discovered our framework was sleeping for 50ms on POST and PUT requests for no reason. I've previously written about the problems with Sails, so I am going to omit most of that here.

  • We upgraded to Node 4 (after spending a day dealing with a nasty TLS problem) and observed our servers were consuming 20-30% more RAM, with no accompanying speed, CPU utilization, or throughput increase. We were left wondering what benefit we got from upgrading.

  • It took about two months to figure out how to generate a npm shrinkwrap file that produced reliable changes when we tried to update it. Often attempting to modify/update it would change the "from" field in every single dependency.

  • The sinon library appears to be one of the most popular ways to stub the system time. The default method for stubbing (useFakeTimers) leads many other libraries to hang inexplicably. I noticed this after spending 90 minutes wondering why stubbing the system time caused CSV writing to fail. The only ways to debug this are 1) to add console.log statements at deeper and deeper levels, since you can't ever hit ctrl+C and print a stack trace, or 2) take a core dump.

  • The library we used to send messages to Slack crashed our server if/when Slack returned HTML instead of JSON.

  • Our Redis provider changed something - we don't know what, since the version number stayed the same - in their Redis instance, which causes the Redis connection library we use to crash the process with a "Unhandled event" message. We have no idea why this happens, but it's tied to the number of connections we have open - we've had to keep a close eye on the number of concurrent Redis connections, and eventually just phased out Redis altogether.

  • Our database framework doesn't support transactions, so we had to write our own transaction library.

  • The underlying database driver doesn't have a way to time out/cancel connection acquisition, so threads will wait forever for a connection to become available. I wrote a patch for this; it hasn't gotten any closer to merging in 10 weeks, so I published a new NPM package with the patch, and submitted that upstream.

  • I wrote a job queue in Go. I couldn't benchmark it using our existing Node server as the downstream server, the Node server was too slow - a basic Express app would take 50ms on average to process incoming requests and respond with a 202, with 100 concurrent in-flight requests. I had to write a separate downstream server for benchmarking.

  • Last week I noticed our integration servers were frequently running out of memory. Normally they use about 700MB of memory, but we saw they would accumulate 500MB of swap in about 20 seconds. I think the app served about thirty requests total in that time frame. There's nothing unusual about the requests, amount of traffic being sent over HTTP or to the database during this time frame; the amount of data was in kilobytes. We don't have tools like pprof. We can't take a heap dump, because the box is out of memory by that point.


You could argue - and certainly you could say this about Sails - that we chose bad libraries, and we should have picked better. But part of the problem with the Node ecosystem is it's hard to distinguish good libraries from bad.

I've also listed problems - pound your head, hours or days of frustrating debugging - with at least six different libraries above. At what point do you move beyond "this one library is bad" and start worrying about the bigger picture? Do the libraries have problems because the authors don't know better, or because the language and standard library make it difficult to write good libraries? I don't think it helps.

You could argue that we are a fast growing company and we'd have these problems with any language or framework we chose. Maybe? I really doubt it. The job queue is 6000 lines of Go code written from scratch. There was one really bad problem in about two months of development, and a) the race detector caught it before production, b) I got pointed to the right answer in IRC pretty quickly.

You could also argue we are bad engineers and better engineers wouldn't have these problems. I can't rule this out, but I certainly don't think this is the case.

The Javascript community gets a lot of credit for its work on codes of conduct, on creating awesome conferences, for creating a welcoming, thoughtful community. But the most overwhelming feeling I've gotten over the past year has been loneliness. I constantly feel like I am the only person running into these problems with libraries, these problems in production, or I'm the only person who cares about the consequences of e.g. waiting forever for a connection to become available. Frequently posting on Github or in IRC and getting no response, or getting told "that's the way it is". In the absence of feedback, I am left wondering how everyone else is getting along, or how their users are.

I've learned a lot about debugging Node applications, making builds faster, finding and fixing intermittent test failures and more. I am available to speak or consult at your company or event, about what I have learned. I am also interested in figuring out your solutions to these problems - I will buy you a coffee or a beer and chat about this at any time.

Liked what you read? I am available for hire.

Go Concurrency for Javascript Developers

Often the best way to learn a new language is to implement something you know
with it. Let’s take a look at some common async Javascript patterns, and how
you’d implement them in Go.


You can certainly implement callbacks in Go! Functions are first class
citizens. Here we make a HTTP request and hit the callback with the response
body or an error.

func makeRequest(method string, url string, cb func(error, io.ReadCloser)) {
    req, _ := http.NewRequest(method, url, nil)
    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        cb(err, nil)
    } else {
        cb(err, resp.Body)
func main() {
    makeRequest("GET", "", func(err error, body io.ReadCloser) {
        defer body.Close()
        io.Copy(os.Stdout, body)

Go uses callbacks in the standard library in some places – time.AfterFunc,
http.HandlerFunc. Other places callbacks are mostly used to allow you to
specify what function or action you want to take, in interaction with a useful
piece of code – filepath.Walk, strings.LastIndexFunc, etc.

Most of the same problems with callbacks apply to Go as Javascript, namely, you
can forget to call a callback, or you can call it more than once. At least you
can’t try to callback that is undefined in Go, since the compiler will not
let you do this.

But callbacks aren’t idiomatic. Javascript needs callbacks because there’s
only a single thread of execution, processing every request on your server
sequentially. If you "wait"/"block" for something to finish (e.g. by calling
fs.readFileSync or any of the Sync methods, every other request on your
server has to wait until you’re done. I’m simplifying, but you have to have
callbacks in Javascript so everything else (other incoming HTTP requests, for
example) have a chance to run.

Go does not have this same limitation. A Javascript thread can only run on one
CPU; Go can use all of the CPU’s on your computer, so multiple threads can
do work at the same time, even if all of them are making "blocking" function
calls, like HTTP requests, database queries, or reading or writing files.

Furthermore, even if you only had one CPU, multiple Go threads would run
without having IO "block" the scheduler, even though methods like http.Get
have a synchronous interface. How does this work? Deep down in code like
net/fd_unix.go, the socket code calls syscall.Write, which (when you follow
it far enough) calls runtime.entersyscall, which signals to the scheduler
that it won’t have anything to do for a while
, and other work should

For that reason you probably don’t want to use callbacks for asynchronous code.

.then / .catch / .finally

This is the other common approach to async operations in Javascript.
Fortunately most API’s in Go are blocking, so you can just do one thing and
then the other thing. Say you want to execute the following async actions:

var changeEmail = function(userId, newEmail) {
    return checkEmailAgainstBlacklist(newEmail).then(function() {
        return getUser(userId);
    }).then(function(user) { = newEmail;
    }).catch(function(err) {
        throw err;

In Go, you’d just do each action in turn and wait for them to finish:

func changeEmail(userId string, newEmail string) (User, error) {
    err := checkEmailAgainstBlacklist(newEmail)
    if err != nil {
        return nil, err
    user, err := getUser(userId)
    if err != nil {
        return nil, err
    user.Email = newEmail
    return user.Save()

.finally is usually implemented with the defer keyword; you can at any time
defer code to run at the point the function terminates.

resp, err := http.Get("")
// Note Close() will panic if err is non-nil, you still have to check err
defer resp.Body.Close()
io.Copy(os.Stdout, resp.Body)


Just ignore the error response, or handle it; there’s no real difference
between sync and async errors, besides calling panic, which your code
shouldn’t be doing.


Doing multiple things at once in Javascript is pretty easy with Promises.
Here we dispatch two database queries at once, and handle the results in one

var checkUserOwnsPickup = function(userId, pickupId) {
    return Promise.join(
    ).spread(function(user, pickup) {
        return pickup.userId ===;

Unfortunately doing this in Go is a little more complicated. Let’s start with
a slightly easier case – doing something where we don’t care about the result,
whether it errors or not. You can put each in a goroutine and they’ll execute
in the background.

func subscribeUser(user User) {
    go func() {
    go func() {

The easiest way to pass the results back is to declare the variables in the
outer scope and assign them in a goroutine.

func main() {
    var user models.User
    var userErr error
    var pickup models.Pickup
    var pickupErr error
    var wg sync.WaitGroup
    go func() {
        user, userErr = getUser(userId)
    go func() {
        pickup, pickupErr = getPickup(pickupId)
    // Error checking omitted

More verbose than Promise.join, but the lack of generics primitives makes a
shorter solution tough. You can gain more control, and early terminate by using
channels, but it’s more verbose to do so.

Do the same as the Go version of Promise.join above, and then write a for
loop to loop over the resolved values. This might be a little trickier because
you need to specify the types of everything, and you can’t create arrays with
differing types. If you want a given concurrency value, there are some neat
code examples in this post on Pipelines.

Checking for races

One of the strengths of Go is the race detector, which detects whether
two threads can attempt to read/write the same variable in different orders
without synchronizing first. It’s slightly slower than normal, but I highly
encourage you to run your test suite with the race detector on. Here’s our
make target:

    go test -race ./... -timeout 2s

Once it found a defect in our code that turned out to be a standard library
! I highly recommend enabling the race detector.

The equivalent in Javascript would be code that varies the event loop each
callback runs in, and checks that you still get the same results.


I hope this helps! I wrote this to be the guide I wish I had when I was getting
started figuring out how to program concurrently in Go.

Thanks to Kyle Conroy, Alan Shreve, Chris
, @rf, and members of the #go-nuts IRC channel for
answering my silly questions.

Liked what you read? I am available for hire.

Encoding and Decoding JSON, with Go’s net/http package

This is a pretty common task: encode JSON and send it to a server, decode JSON on the server, and vice versa. Amazingly, the existing resources on how to do this aren't very clear. So let's walk through each case, for the following simple User object:

type User struct{
    Id      string
    Balance uint64

Sending JSON in the body of a POST/PUT request

Let's start with the trickiest one: the body of a Go's http.Request is an io.Reader, which doesn't fit well if you have a struct - you need to write the struct first and then copy that to a reader.

func main() {
    u := User{Id: "US123", Balance: 8}
    b := new(bytes.Buffer)
    res, _ := http.Post("", "application/json; charset=utf-8", b)
    io.Copy(os.Stdout, res.Body)

Decoding JSON on the server

Let's say you're expecting the client to send JSON data to the server. Easy, decode it with json.NewDecoder(r.Body).Decode(&u). Here's what that looks like with error handling:

func main() {
    http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
        var u User
        if r.Body == nil {
            http.Error(w, "Please send a request body", 400)
        err := json.NewDecoder(r.Body).Decode(&u)
        if err != nil {
            http.Error(w, err.Error(), 400)
    log.Fatal(http.ListenAndServe(":8080", nil))

Encoding JSON in a server response

Just the opposite of the above - call json.NewEncoder(w).Encode(&u) to write JSON to the server.

func main() {
    http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
        u := User{Id: "US123", Balance: 8}
    log.Fatal(http.ListenAndServe(":8080", nil))

Reading a JSON response from the server.

This time you're going to read the response body in the client, after making the request.

func main() {
    u := User{Id: "US123", Balance: 8}
    b := new(bytes.Buffer)
    res, _ := http.Post("", "application/json; charset=utf-8", b)
    var body struct {
        // sends back key/value pairs, no map[string][]string
        Headers map[string]string `json:"headers"`
        Origin  string            `json:"origin"`

That's it! I hope it helps you a lot. Note that we only had to encode/decode the response to a byte array one time - in every other case we passed it directly to the reader/writer, which is one of the really nice things about Go, and interfaces - in most cases it's really easy to pass around streams, instead of having to deal with the intermediate steps.

Liked what you read? I am available for hire.

How to push to multiple Github accounts from the same machine

You might want/need to check out repos locally as several different Github users. For example, my Github account is kevinburke, but I push code at work as kevinburkeshyp. If I could only commit/push as one of those users on each machine, or I had to manually specify the SSH key to use, it would be a big hassle!

Generate some SSH keys

First generate SSH keys for each account, and then upload the public keys to the right Github accounts.

ssh-keygen -t rsa -b 4096 -f "$HOME/.ssh/kevinburke_rsa"
ssh-keygen -t rsa -b 4096 -f "$HOME/.ssh/kevinburkeshyp_rsa"

Set up SSH host aliasing

You're going to use your standard SSH config for the host you use more often, and an alias for the one you use less often. Here's my SSH configuration for the host I use more often - on my work laptop, it's my work account.

    IdentityFile ~/.ssh/kevinburkeshyp_rsa
    User kevinburkeshyp

Now, set up an alias for your second account.

    IdentityFile ~/.ssh/kevinburke_rsa
    User kevinburke

The Host is set to - this is what you write into your SSH config locally. The HostName is what's used for DNS resolution, which is how this still works. Note you'll want to switch the IdentityFile and the User with the key and usernames you set when you generated the keys.

Now if you want to clone something into the second account, use the special host alias:

git clone [email protected]:kevinburke/hamms.git

That's it! It should just work.

Which Commit Name/Email Address?

Most people configure their commit email address using the following command:

$ git config --global [email protected]

This sets your email address in $HOME/.gitconfig and applies that value to every repository/commit on your system. Obviously this is a problem if you have two different email addresses! I want to use [email protected] with my Shyp account and [email protected] with my personal account.

I fix this by putting an empty email in the global $HOME/.gitconfig:

    name = Kevin Burke
    email = ""

When I try a commit in a new repository, Git shows the following message:

*** Please tell me who you are.


  git config --global "[email protected]"
  git config --global "Your Name"

to set your account's default identity.
Omit --global to set the identity only in this repository.

fatal: unable to auto-detect email address (got [email protected](none)')

Which is good! It prevents me from committing until I've configured my email address. I then have helpers set up to easily configure this.

alias setinburkeemail="git config [email protected]'"

That's it! What tips do you use for managing two different Github accounts?

Liked what you read? I am available for hire.

Just Return One Error

Recently there's been a trend in API's to return more than one error, or to always return an array of errors, to the client, when there's a 400 or a 500 server error. From the JSON API specification:

Error objects MUST be returned as an array keyed by errors in the top level of a JSON API document.

Here's an example.

HTTP/1.1 422 Unprocessable Entity
Content-Type: application/vnd.api+json

  "errors": [
      "status": "422",
      "source": { "pointer": "/data/attributes/first-name" },
      "title":  "Invalid Attribute",
      "detail": "First name must contain at least three characters."

I've also seen it in model validation; for example, Waterline returns an array of errors in the invalidAttributes field of the returned error object.

I think this is overkill, and adds significant complexity to your application for little benefit. Let's look at why.

  • Clients are rarely prepared to show multiple errors. At Shyp, we returned an errors key from the API on a 400 or 500 error for about 18 months. All four of the API's clients - the iOS app, the Android app, our error logger, and our internal dashboard - simply processed this by parsing and displaying errors[0], and ignoring the rest. It makes sense from the client's perspective - the errors were being shown on a phone (not much room) or an alert box in the browser (show two, and you get the "Prevent this page from displaying more alerts" dialog). But if clients can't make use of multiple errors, it makes me wonder why we return them from the API.

  • You rarely catch every problem with a request. Say a pickup is 1) missing the to address, 2) missing the city, 3) shipping to a country we don't ship to, and 4) contains a photo of a firearm, which we can't ship. If we return an error, it's going to be ['Missing City', 'Missing Address Line 1']. There's no way we're going to submit your invalid address to our rates service and discover even more problems with the customs form, or try to parse the photo and learn that you're trying to ship a firearm. So even if you fix the first errors, we're still going to return you a 400 from the API.

  • Continuing to process the request is dangerous. Attaching multiple errors to an API response implies that you discovered a problem with an incoming request and continued processing it. This is dangerous! Every part of your system needs to be prepared to handle invalid inputs. This might be a useful defensive measure, but it's also a lot of work for every function in your system to avoid making assumptions about the input that's passed to it. In my experience, passing around bad inputs leads to unnecessary server errors and crashes.

  • Programming languages and libraries return one error. Generally if you call raise or throw, you're limited to throwing a single Error object. If you're using a third party library that can error, usually library functions are limited to returning one error. Returning a single error from your API or your model validation function aligns with 99% of error handling paradigms that currently exist.

A Caveat

There's one place where returning multiple errors may make sense - a signup form with multiple form fields, where it can be frustrating to only return one problem at a time. There are two ways to deal with this:

  • Perform validation on the client, and detect/display multiple errors there. It's a better user experience to do this, since feedback is more or less instantaneous. You'll need the validation in your API anyway, but if someone manages to bypass the client checks, it's probably fine to just return one error at a time.

  • Special case your returned error with a key like attributes, and return the necessary error messages, just for this case. I'd rather special case this error message, than force every other error to be an array just to handle this one place in the API.

Liked what you read? I am available for hire.