How can we trust open source software?
February 26, 2013 7:18 AM
Open source software is considered trustworthy because anyone can validate the source code and hold the developer accountable. Usually developers will also make compiled binaries available for convenience. How can we know that these binaries are compiled from the same source code the developer published, and not a malicious variant of it?
A truly paranoid person will compile from source every time. But most people don't have the knowledge or time to do that, so they trust the project's official compiled packages. Is there any way to validate that an application wasn't compiled from a secret parallel branch?
I was thinking of a checksum: compile an app yourself, then download the official binary, and the checksums of both should match. I don't know much about compiling code personally, but I assume there are nuances to the process that could result in variation even if the official binary is legitimate. Can anyone more informed shed some light on this?
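The check the poster describes would look roughly like this. A minimal sketch (Python), with hypothetical file names standing in for a binary you compiled yourself and the official download:

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Hypothetical file names, for illustration only.
local_digest = sha256_of("myapp-built-from-source")
official_digest = sha256_of("myapp-official-download")

if local_digest == official_digest:
    print("Binaries are byte-for-byte identical.")
else:
    print("Digests differ -- which, as the answers below explain, is the")
    print("usual outcome unless the build process is fully deterministic.")
```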
Best answer: You can't.
You just have to trust the people who distribute it, and keep an eye out for reports of violations of trust. (Usually when someone tries something funny with free software, someone will notice sooner or later, and make a fuss about it. Then you know not to trust that supplier again.)
As tylerkaraszewski points out: it's mathematically impossible without also verifying your compiler and your computer hardware, which is practically impossible.
Software distribution sites usually publish checksums of the packages as well, but that only guards against MITM attacks, not against malicious software distributors.
posted by richb at 7:33 AM on February 26, 2013 [1 favorite]
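To make that distinction concrete: checking a published checksum looks like the sketch below (placeholder digest and file name, not real values). A match proves the bytes you downloaded are the bytes the distributor published; it proves nothing about how those bytes were built.

```python
import hashlib

# Placeholder values for illustration; the digest would normally be copied
# from the project's download page or a signed checksums file.
PUBLISHED_SHA256 = "0123...abcd"        # not a real digest
DOWNLOADED_FILE = "package-1.0.tar.gz"

with open(DOWNLOADED_FILE, "rb") as f:
    actual = hashlib.sha256(f.read()).hexdigest()

# A match rules out tampering in transit (the MITM case), nothing more.
print("match" if actual == PUBLISHED_SHA256 else "mismatch")
```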
Response by poster: Thanks for the answers, unfortunate as they are. I'm surprised a solution to this hasn't been proposed yet.
Couldn't a neutral third party offer a VPS for developers to compile on? That way the process could be observed, verified, and reproduced. Then developers could post a "certified download" link to their application hosted on the third-party site along with all the compilation details to prove it's trustworthy. Seems like the kind of service GitHub or someone like that should be offering.
I don't know much about this stuff, there's probably a reason it hasn't been done.
posted by The Winsome Parker Lewis at 7:52 AM on February 26, 2013
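Purely as a sketch of the idea being proposed (none of this exists as described): the third party could publish a small "build certificate" recording what was built, how, and what came out, and a user could check a download against it. As the next answer points out, though, this only moves the trust onto whoever runs the service.

```python
import hashlib
import json

# Hypothetical record a neutral build service might publish next to a binary.
# All field names and values here are invented for illustration.
certificate = {
    "source_repo": "https://example.org/project.git",
    "source_commit": "<commit hash recorded by the service>",
    "build_command": "make release",
    "output_sha256": "<digest of the binary the service produced>",
}

def verify_download(path: str, cert: dict) -> bool:
    """User-side check: does the downloaded binary match the certified digest?"""
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    return digest == cert["output_sha256"]

print(json.dumps(certificate, indent=2))
# verify_download("myapp-official-download", certificate)
```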
I assume there are nuances to the process that could result in variation even if the official binary is legitimate.
I haven't specifically attempted to compile a package from source and then compare the checksums (actually, people usually use MD5 sums for this purpose) but I wouldn't expect it to work as you envision, unless your machine is absolutely identical in all respects to the distributor's systems (installed software versions, libraries, everything). Even then, I think distributors probably have a large automated build system for keeping packages compiled and up to date -- they don't do them one-at-a-time, by hand, as you probably would.
"Neutral third parties" are a common ingredient to throw into a system to improve security, but you always have to ask, "what if you don't trust that third party?" There's really nobody, no entity, that absolutely everybody else trusts. And if you've gotta trust somebody, why not trust the person who wrote the program?
posted by spacewrench at 8:00 AM on February 26, 2013
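A toy demonstration of that variation, assuming nothing about any particular compiler: packaging the exact same content twice, a couple of seconds apart, already yields different bytes because the archive format embeds timestamps. Compilers and packaging tools introduce differences in much the same way (embedded dates, paths, ordering), which is why two builds rarely hash the same.

```python
import hashlib
import io
import time
import zipfile

def build_package(payload: bytes) -> bytes:
    """'Build' a trivial package: one file inside a zip archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # writestr() with a plain name stamps the entry with the current time.
        zf.writestr("hello.txt", payload)
    return buf.getvalue()

payload = b"print('hello, world')\n"
first = build_package(payload)
time.sleep(2)                      # zip timestamps have two-second resolution
second = build_package(payload)

print(hashlib.sha256(first).hexdigest())
print(hashlib.sha256(second).hexdigest())   # differs, despite identical input
```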
You can't trust the distribution channel.
You can't trust the build system.
You can't trust the package maintainer.
You can't trust the software author.
You can't trust the compiler.
You can't trust the underlying operating system.
You can't even trust the hardware.
But that's no way to live. :) No matter where you try to address the problem, there are avenues of attack above and below. Until you're hand-crafting ICs from beach sand, your system will be vulnerable. Even then, you can't trust yourself. You're hand-crafting ICs from beach sand. You're crazy.
posted by roue at 8:11 AM on February 26, 2013 [16 favorites]
Response by poster: Thanks, everybody. Now I'm even more paranoid than I was before asking! :-)
I still think a third-party system for auditing the compilation process would be useful. I'd trust GitHub more than some Joe Schmo developer I don't know. I mean, if the source code's reviewable AND you can confirm that source is what's being compiled AND you can confirm the MD5 of the compiled app matches what you're downloading as a user, that's pretty watertight. I think there's a pretty clear line separating that (reasonable concern) from mistrusting your own OS and hardware (excessive FUD).
This probably isn't a big deal for most software but I'm surprised security experts aren't rallying for these measures to safeguard sensitive software like Tor, TrueCrypt, or various Bitcoin clients. Programs that you need to know are doing exactly what they're supposed to and nothing else.
posted by The Winsome Parker Lewis at 8:48 AM on February 26, 2013
I still think a third-party system for auditing the compilation process would be useful.
And who's going to pay for it? Your "free software" just became extremely expensive.
posted by Chocolate Pickle at 8:52 AM on February 26, 2013
Response by poster: And who's going to pay for it? Your "free software" just became extremely expensive.
Yeah, I don't know the answer to that question. I could see it being funded by the EFF or some other advocacy organization. Ultimately I'm just thinking aloud and making sure I understand the problem, not writing a business plan.
posted by The Winsome Parker Lewis at 9:01 AM on February 26, 2013
Most users of open source software on Linux probably use packages provided by their distribution, in which case the code is compiled by the distribution's package maintainers rather than by the upstream developers. In practice this means you need to trust both the package maintainers and the developers.
posted by colophon at 9:20 AM on February 26, 2013
I'd trust GitHub more than some Joe Schmo developer I don't know.
Personally, I trust the Debian maintainers not to screw me over. Debian is one of the few distros that has actually paid a decent amount of attention to culture.
posted by flabdablet at 9:36 AM on February 26, 2013 [2 favorites]
Commercial software developers also suffer from most of these (theoretical) security problems: there's nothing magic about selling software that makes you immune to them.
Ultimately, all you really have to go on is reputation & trust that the people involved value that reputation enough to put the effort in to eliminate as many risks as reasonably possible.
As with everything in life, you can't eliminate some risks no matter how much money you spend.
Personally, I trust the Debian developer community not to deliberately screw me over rather more than I do most commercial developers.
posted by pharm at 10:06 AM on February 26, 2013 [1 favorite]
Best answer: This is going to be long and involve some technical detail but I will try to make it all accessible to a lay audience.
Point #1: Even if you had a mathematically provable perfect chain of custody, there are already so many vulnerabilities in all but the best-audited pieces of software that it doesn't matter anyway. There is a huge market for so-called "zero-day" flaws, which are unpublished vulnerabilities. There are highly unusual projects like the OpenBSD operating system, which is extensively audited and very conservatively constructed. I don't know how to quantify this, but my guess is that they are well above the 99th percentile relative to the average project's security, perhaps even 99.99 or higher, and even they sometimes get flaws that allow a remote attacker access to the machine. That's not even taking into account flaws which allow an ordinary user to perform "privilege escalation" and take over the machine, which are far more common than remote exploits. Security is all about breaking the weakest link, stepping over the lowest fence, etc., and while poisoned binaries are a legitimate threat, in the scheme of things it's small.
Point #2: tylerkaraszewski references an excellent article, but one that's only relevant to compilers. When a new compiler is first built, an existing compiler is used to build it, which gives that existing compiler an opportunity to tamper with the resulting binary. The vast majority of libraries and applications are not used to generate other programs, so vulnerabilities couldn't be propagated that way. You'd have to attack the compiler itself, which is a much smaller target than a whole operating system.
Point #3: The compilation environment of a computer is absolutely enormous. There are so many little corners, environment variables, differences in the low-level hardware, etc., that the odds of producing the exact same binary, even on two seemingly identical installations of the same operating system, are pretty small. The hash would only be relevant on the same physical machine, so it's not like you could perform your own independent verification. And, as noted, the more you centralize that sort of thing, the more valuable it becomes to attack.
Point #4: Commercial software, as noted by pharm, typically has completely opaque source, development environment, and deployment. Talk about trust!
posted by wnissen at 10:15 AM on February 26, 2013
It is undecidable whether two programs are equivalent in the general case; this includes proving the equivalence of source code and object code. That is, there are at least some program pairs for which it is impossible to prove in a finite number of steps whether they are equivalent or not.
On the other hand, if you start with program P and transform it by a number of equivalence-preserving steps S = [S1, S2, ..., Sn] into program P', you have a verifiable equivalence proof. A person who receives P (the source code), P' (the object code), and the list of steps S need only repeat the n steps to verify the equivalence of P and P'.
You have the usual need to rely on the absence of bugs or backdoors in software (in this case, in the equivalence-verifying software).
Also, it's entirely possible that this system would still not be particularly useful. For instance, the time to verify the object code's equivalence to the source code might be on the same order as the time to build the source code into object code. (Transmitting all of P, P', and S also means downloading more than just P, the original source code.)
Or it might turn out that so many of the transformations taken by compilers are non-equivalent that the provably-equivalent object code would perform markedly worse at runtime than object code produced by a standard, non-verifiable compiler.
posted by jepler at 2:42 PM on February 26, 2013
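A toy sketch of the scheme jepler describes, with trivial invented "passes" standing in for real compiler transformations: the producer records which equivalence-preserving steps were applied, and a verifier who receives the source, the steps, and the output simply replays the steps and compares the result.

```python
# Trivial stand-in "compiler passes": each is a pure, replayable transformation.
PASSES = {
    "strip_comments": lambda src: "\n".join(
        line for line in src.splitlines() if not line.lstrip().startswith("#")
    ),
    "rename_print": lambda src: src.replace("print", "PRINT"),
}

def compile_with_trace(source, step_names):
    """Producer side: apply the passes in order and record which were used."""
    program = source
    for name in step_names:
        program = PASSES[name](program)
    return program, list(step_names)

def verify(source, claimed_output, step_names):
    """Verifier side: replay the recorded steps and compare against the claim."""
    program = source
    for name in step_names:
        program = PASSES[name](program)
    return program == claimed_output

P = "# demo program\nprint('hello')\n"
P_prime, S = compile_with_trace(P, ["strip_comments", "rename_print"])
print(verify(P, P_prime, S))   # True: replaying S on P reproduces P'
```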
There was that time when the Debian crew screwed everybody over for a while. Even so, I'm inclined to believe that it wasn't done maliciously or deliberately.
posted by flabdablet at 2:30 AM on February 27, 2013
Exactly, flabdablet: the best you can do is believe that a particular group won't deliberately screw you over. It remains inevitable that they're going to make mistakes from time to time.
posted by pharm at 3:29 AM on February 27, 2013
This thread is closed to new comments.