Semver is evil
Recently it has been brought to my attention that the node-ipc npm package deliberately started including malware.
I've talked about the downsides of package managers in other posts like "Package managers considered harmful", where I mostly talked about package bloat and the problems it causes.
This time I'd also like to talk more about some of the security concerns that come with the usage of a package manager.
The story of left-pad
You've probably heard about this one already. Some guy creates a helper function that left-pads a string and publishes it as a package on npm. Eventually he takes down the package, but only after it has racked up thousands of dependents, including popular libraries used by millions of developers around the globe. People can no longer deploy their websites because this one guy decided to de-list his silly left-pad package, and the CTO of npm is forced to step in and de-de-list the package.
This should've never happened, but at least we learned from it, right? Well, not really: developers continue to blindly pull in hundreds of dependencies to this day. I genuinely don't understand how these people sleep at night, knowing any one of these authors could prevent them from shipping a new version of their website at any moment. It's not a security concern per se, but having such a large number of dependencies is a completely unnecessary liability. Power outages and internet outages suck, but they're not really issues we can solve as programmers. Outages caused by rogue package authors, on the other hand, are our fault, and everyone touching a package manager should take responsibility.
Enter semver
The idea behind semantic versioning (or semver for short) is that you can distinguish between bug fixes (patch), minor additions (minor) and major breaking changes (major). Upgrading to a new patch or minor version of a package is generally considered to be safe, while upgrading to a new major version might take some extra effort. This scheme allows you to stay up to date with the latest features and security fixes without worrying about breaking your code, great!
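To make the three levels concrete, here's a minimal TypeScript sketch that compares two version strings and reports which kind of bump happened. The names parseVersion and classifyBump are made up for this post, and the sketch assumes plain MAJOR.MINOR.PATCH strings without pre-release tags or build metadata.

```typescript
// Hypothetical helpers for illustration; pre-release and build metadata are not handled.
type Version = { major: number; minor: number; patch: number };
type Bump = "major" | "minor" | "patch" | "none";

function parseVersion(text: string): Version {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(text.trim());
  if (match === null) {
    throw new Error("not a plain MAJOR.MINOR.PATCH version: " + text);
  }
  return {
    major: Number(match[1]),
    minor: Number(match[2]),
    patch: Number(match[3]),
  };
}

// Reports the left-most component that differs between the two versions.
function classifyBump(from: string, to: string): Bump {
  const a = parseVersion(from);
  const b = parseVersion(to);
  if (a.major !== b.major) return "major";
  if (a.minor !== b.minor) return "minor";
  if (a.patch !== b.patch) return "patch";
  return "none";
}
```

Note that the function only looks at the numbers the author chose to publish. It has no idea what actually changed in the code.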
It's far from perfect though. For one, the version is specified by a human and there's no guarantee the human won't typo, misremember, or maliciously change the version to something that doesn't represent the changes made to the codebase.
On top of this, the entire concept of semantic versioning is kind of flawed: it only cares about changes to the interface of the software, not about implementation details. In other words: it assumes computers are magic boxes with infinite memory and processing speed. Let's say you're making a game that needs to do some expensive calculation on each frame and you've found a package on npm that solves most of the problem. You've painstakingly optimized the rest of your pipeline to reach the target framerate while staying within the memory budget and everything seems to be working great.
Now the package author pushes a new patch that adds a bunch of security features. Suddenly your game slows down to a crawl! According to semver the package author didn't do anything wrong, and many other dependents of the package are actually quite happy with the new security patch. As it turns out, different projects are concerned about different things, and semver only kinda solves one thing.
Yet again we've introduced another source of effective outages where package authors pushing seemingly insignificant changes can completely ruin your day.
The dangers of semver
I'll only focus on npm here because it's the package manager I know best, but this may apply to other package managers as well.
Npm is the most popular package manager for JavaScript devs as it comes bundled with Node.js. It uses semver to the fullest: when installing a package it will automatically grab the latest compatible version. If you update your packages at a later date, it will use semver to check whether any packages are out of date and update them to the latest compatible version. Npm will notify developers when they depend on outdated packages, and in recent years it has even started warning about known security vulnerabilities in the packages you depend on, encouraging you to always stay up to date.
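To show what "latest compatible version" means in practice, here's a rough TypeScript sketch of caret-range matching, the range style npm records by default when you install a package (for example ^1.2.3). This is not npm's actual implementation. The helper names are made up, and the sketch ignores pre-release tags and every range operator other than the caret.

```typescript
// Hypothetical helpers for illustration; only plain caret ranges are handled.
type Triple = [number, number, number];

function parseTriple(text: string): Triple {
  const m = /^(\d+)\.(\d+)\.(\d+)$/.exec(text);
  if (m === null) throw new Error("expected MAJOR.MINOR.PATCH, got: " + text);
  return [Number(m[1]), Number(m[2]), Number(m[3])];
}

function compareTriples(a: Triple, b: Triple): number {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] < b[i] ? -1 : 1;
  }
  return 0;
}

// Exclusive upper bound of a caret range: bump the left-most non-zero component.
function caretUpperBound(lower: Triple): Triple {
  if (lower[0] > 0) return [lower[0] + 1, 0, 0];
  if (lower[1] > 0) return [0, lower[1] + 1, 0];
  return [0, 0, lower[2] + 1];
}

function satisfiesCaret(version: string, range: string): boolean {
  if (range.charAt(0) !== "^") throw new Error("only caret ranges are handled");
  const lower = parseTriple(range.slice(1));
  const candidate = parseTriple(version);
  return (
    compareTriples(candidate, lower) >= 0 &&
    compareTriples(candidate, caretUpperBound(lower)) < 0
  );
}

// Picks the highest published version that the caret range accepts.
function latestCompatible(published: string[], range: string): string | undefined {
  return published
    .filter((v) => satisfiesCaret(v, range))
    .sort((a, b) => compareTriples(parseTriple(a), parseTriple(b)))
    .pop();
}
```

With published versions 1.2.3, 1.2.9, 1.3.0 and 2.0.0, the range ^1.2.3 resolves to 1.3.0 in this sketch. The moment the author publishes 1.3.1, the same range resolves to that instead, without anyone touching your package.json.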
This sounds reasonable enough, since we all know security patches are important and semver lets us install said patches with a guarantee that they won't break our software. I've seen this reasoning distilled down to "semver is secure" or "package managers are secure" in some lectures.
This is where things start to get dangerous! Semver isn't bad per se, but believing semver will somehow make your applications secure is just terrible. It gives us a reason to justify relying on package managers, and it makes us want to install a bunch of tiny packages that solve specific problems because "the package authors know what they're doing, so we'll get security patches for free :)". It makes us compulsively keep our packages up to date at all times. It's not uncommon for projects to have a CI pipeline that always pulls in dependencies from npm before deployment.
I already talked about some of the problems of semver in the previous section, and these are drastically amplified as your dependency graph grows and update frequency increases.
Up until this point I've just been talking about these systems working as intended, but what happens when we have a malicious actor?
As you (hopefully) know, pulling in an npm package and calling its functions is arbitrary code execution. The package could do anything, and you should only install it if you trust it completely. Now chances are you actually can trust a popular library with millions of users; there's no way it'd get so popular if it were evil, after all. But with package managers you're not just trusting the package at the time of installing, you're trusting that the maintainers are good and will never do anything malicious in the future.
Humans are fragile. Some hypothetical genius who wrote the world's fastest circular-queue library in 2013 might turn into a drug-addicted schizophrenic twenty years down the line. Would you trust this individual to execute arbitrary code on your PC? Do you trust them executing arbitrary code on the production server? Well, if you're blindly updating your packages the answer should logically be yes.
As it turns out, in the case of node-ipc the maintainer of the library just wanted to spread an anti-war message, and did so by nuking the file system of every Russian who happened to run some software using the package.
I don't want to get too far into this, but it's honestly disgusting to me how Russian citizens (most of whom are also anti-war) are being otherized by our media and how most adults are going along with it. It's understandable that people want to fight back against Putin, but stop painting "the Russians" as the enemy while doing so.
Politics aside, this goes to show how fragile modern software has become. Projects you might not even associate with JavaScript, like Unity Hub, shipped with the malicious version of node-ipc.
Stuff like this is why I've personally started doing the exact opposite of what most people would call "proper security" by installing working versions of software and never updating them unless I need new features.
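If you want to try the same approach in an npm project, a first step is checking that none of your dependency specs are ranges. The TypeScript sketch below assumes you've already loaded the dependencies object from package.json. The function name findUnpinned is made up, and it treats anything that isn't a plain MAJOR.MINOR.PATCH string as unpinned.

```typescript
// Hypothetical helper: lists dependencies whose spec is not an exact MAJOR.MINOR.PATCH pin.
type DependencyMap = { [name: string]: string };

function findUnpinned(dependencies: DependencyMap): string[] {
  const exact = /^\d+\.\d+\.\d+$/;
  return Object.keys(dependencies)
    .filter((name) => !exact.test(dependencies[name]))
    .map((name) => name + "@" + dependencies[name]);
}
```

Keep in mind that this only looks at your direct dependencies. The packages they pull in can still declare ranges of their own.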
How to fix the semver problem
I personally believe semver is a net negative for developers. While it kind of allows us to painlessly update libraries, it lulls us into a false sense of security while doing so.
When choosing libraries to incorporate into our project we usually look at GitHub stars, activity and sponsors as good metrics of trustworthiness. When all else fails, only then do we actually look at the source code of a library to see if it's any good. To me this is completely reasonable, at least for projects where time to market is one of the primary concerns.
I believe we should use a similar level of discipline when updating packages: you don't have to look at all the diffs, but someone should. Npm warns you about security issues; presumably these are reported by the package authors. What if we had a similar system where independent developers can review changes made to a package before it gets approved? When updating you'd simply look at some of the positive and negative reviews. If there's a lack of reviews you could look at the code yourself before deciding to update. Not only does this solve the malware problem, you'd probably even learn some new stuff along the way!
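Here's a rough TypeScript sketch of what the update decision could look like under such a system. Everything in it is hypothetical: no review registry like this exists in npm today, and the Review shape, the threshold of three reviewers and the "any rejection means hold" policy are just assumptions I picked to illustrate the idea.

```typescript
// Entirely hypothetical: npm has no such review registry.
type Review = { reviewer: string; verdict: "approve" | "reject"; note: string };
type UpdateDecision = "update" | "hold" | "read-the-diff-yourself";

function decideUpdate(reviews: Review[], minReviews: number = 3): UpdateDecision {
  // Keep only the last review per reviewer so one account cannot stack verdicts.
  const latestByReviewer: { [reviewer: string]: Review } = {};
  for (const review of reviews) {
    latestByReviewer[review.reviewer] = review;
  }
  const unique = Object.keys(latestByReviewer).map((name) => latestByReviewer[name]);

  // Too few independent reviews: fall back to reading the code yourself.
  if (unique.length < minReviews) return "read-the-diff-yourself";
  // Assumed policy: any rejection blocks the update until a human reads the note.
  if (unique.some((r) => r.verdict === "reject")) return "hold";
  return "update";
}
```

A real system would also have to deal with fake reviewers and with reviews that go stale, which this sketch doesn't attempt.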
As a little aside I should also mention that nested dependencies are evil. If one of the packages you depend on has its own dependencies they should simply be appended to your list of dependencies instead of forming some implicit dependency graph. This means you'll have to review every package update that comes your way and no sneaky nested packages can slip through.
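As a sketch of what that flattening could look like, the TypeScript below walks a nested dependency tree and produces one flat list where every package shows up exactly once. The PackageNode shape and the function name are made up for illustration and don't correspond to any npm data structure.

```typescript
// Hypothetical shape for a package and its nested dependencies.
type PackageNode = { name: string; version: string; dependencies: PackageNode[] };

// Walks the tree breadth-first and returns every distinct name@version once,
// in first-seen order.
function flattenDependencies(roots: PackageNode[]): string[] {
  const seen: { [id: string]: boolean } = {};
  const flat: string[] = [];
  const pending = roots.slice();
  while (pending.length > 0) {
    const node = pending.shift() as PackageNode;
    const id = node.name + "@" + node.version;
    if (seen[id]) continue; // already listed; this also stops cycles
    seen[id] = true;
    flat.push(id);
    for (const child of node.dependencies) {
      pending.push(child);
    }
  }
  return flat;
}
```

Every entry in the resulting list is something you'd have to review yourself, which is exactly the point.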
While we could still use semver in combination with this review system, I'm not sure if we really need it. Not only can the reviews tell you about the (lack of) breaking changes in the programming interface, they can also tell you about performance degradation, dependency bloat, etc. You could then decide for yourself whether it's worth upgrading instead of leaving that decision up to the semver gods.