Safely Moving a Large Shrinkwrapped Dependency

This past week the team at Shyp decided to fork the framework we use (Sails.js) and the ORM it uses (Waterline). It was an interesting exercise in JS tooling, and I wasted a lot of time on busywork, so I thought I'd share a little bit about the process.

Why Fork?

We've been on an outdated fork of Sails/Waterline for a while. We chose not to update because we were worried that updates would introduce new errors, and many of the updates were new features that we didn't want or need.

The recent ugly discussions between the former CEO of Balderdash and other core contributors to Sails/Waterline made us rethink our dependency on the core tools. We concluded upgrading our framework was going to always be unlikely. We also figured we could remove the many parts of the framework that we don't want or need (sockets, cookies, blueprints, the horribly insecure "action routes", CSRF, Mongo support, Grunt, sails generate, sails www, the multi-query methods that aren't safe against race conditions, &c), and apply bugfixes to parts of the code that we knew were broken. Fortunately the changes already started to pay for themselves at the end of an 8 hour day, as I'll explain later.

Rollout Plan

On its face, the rollout plan was pretty simple.

  1. Fork Sails to Shyp/sails.

  2. Update the HEAD of the master branch to point at the version we have installed in production.

  3. Point the Sails dependency at our fork (no code changes).

  4. Make changes that we want.

  5. Update the dependency.

Breakdowns

This broke down somewhere around (3), for a few reasons.

  • If I tried to npm install github.com/Shyp/sails.git, every single package Sails depended on, and all of their sub-dependencies, would update. We didn't want this, for several reasons - I don't trust package maintainers to stay on top of updates, or predict when changes/updates will break code, and package updates amount to a large amount of unvetted code being deployed into your API without any code review.

    I didn't know how to work around this so I punted on it by hard-coding the update in the npm shrinkwrap file.

  • My next attempt was to try to generate a shrinkwrap file for Sails, and give our Sails fork the same versions of every dependency & sub-dependency we had in our API. This was the right idea, but failed in its implementation. Sails dependencies are currently alphabetized, but they weren't in the version that we were running. I alphabetized the package.json list while generating a shrinkwrap file, which changed the dependency order in the Shyp/sails shrinkwrap file. This made it impossible to diff against the shrinkwrap file in our API, which listed Sails's dependencies in an arbitrary order.

  • Finally I figured out what needed to happen. First, alphabetize the Sails dependency list in the Shyp API repository. Once that's done, I could use that shrinkwrap file as the basis for the shrinkwrap file in the Sails fork, so the dependencies in the Sails project matched up with the Shyp dependencies. Specifically, I accomplished this by sorting the dependencies in the package.json file, placing the sorted file in api/node_modules/sails/package.json, and then re-running clingwrap with the same dependencies installed. I then verified the same version of each dependency was installed by sorting the JSON blob in the old and new shrinkwrap files, and ensuring that the diff was empty.

    Once that was done, I could update dependencies in the (alphabetized) fork, and the diff in Shyp API's (now alphabetized) npm-shrinkwrap.json would be small. It was great to finally be able to generate a small diff in npm-shrinkwrap.json, and verify that it was correct.

Compounding the frustration was that each build took about 2 minutes, to wipe and reinstall all versions of every dependency. It was a slow afternoon that got more frustrating as it wore on.

Notes

  • Before adding or attempting to move any large NPM dependency, ensure that the dependencies are in alphabetical order. You'll run into too many headaches if this is not the case, and you won't be able to regenerate the npm-shrinkwrap file or be able to reason about it.

  • It's surprising that the builtin build tools don't enforce alphabetical order.

  • NPM's shrinkwrap tool does not generate the same file every time you run it. Specifically, if you install a package, occasionally the shrinkwrap file will report the "from" string is a range, for example, "from": "[email protected]>=0.6.0-1 <0.7.0", and sometimes it will report the "from" string as a npmjs.com URL. For a while, we worked around this by generating the NPM shrinkwrap file twice - the second time we ran it, it would generate a consistent set of strings.

      rm -rf node_modules
      npm cache clear
      npm install --production
      npm shrinkwrap
      npm install --production
      npm shrinkwrap
    

    Eventually we switched to using clingwrap, which solves this problem by avoiding the "from" and "resolved" fields entirely. We've also heard NPM 3 contains updates to the shrinkwrap file, but NPM 3 also believes our dependency tree contains errors, so we've been unable to effectively test it. Clingwrap also has some errors with NPM 3 and github repositories.

  • It was incredibly tempting to just say "fuck it" and deploy the updated version of every dependency and subdependency. Based on the amount of work it took me to figure out how to generate and check in consistent diffs, I came away thinking that there are a lot of Javascript applications out there that are unable to control/stabilize the versions of libraries that they use.

  • These are all problems you can avoid or mitigate by thinking incredibly hard before taking on a dependency, especially if, like Sails, the tool you're leaning on has a lot of its own dependencies. Is there really an advantage to using _.first(arr) instead of arr[0]? Do you really need the pluralize library, or is it sufficient to just append an 's' for your use case? Do you need to take a dependency to build a state machine, or can you use a dictionary and an UPDATE query?

Benefits

This took a lot of time to get right, and felt like busywork for at least three of the four hours I spent working on this, but we finally got some benefits out of this by the end of the day.

  • I committed a change to immediately error if anyone attempts to use the count() method, which attempts to pull the entire table into memory. Deploying this in our API application revealed four places where we were calling count()! Fortunately, they were all in test code. Still, it's delightful to be able to automatically enforce a rule we previously had to rely on code review (and shared team memory) to catch.

  • We removed the Grunt dependency, which we never used. There are about 4500 lines of deletions waiting in pull requests.

  • We understand shrinkwrap a little better, and hopefully this process won't be so painful next time.

  • We removed some incorrect Mongo-specific code from Waterline. This is the type of adaptation that we can only do by giving up compatibility, and we're happy with how easy this is now.

So that was my Wednesday last week. As it stands, we're planning on developing this fork in the open. I don't think you should use it - there are still better alternatives - but if you're stuck on Waterline with Postgres, or using Sails as a REST API, you might want to take a look.

Liked what you read? I am available for hire.

Comments are heavily moderated.