Can someone explain GPG/PKI/SSH signing?
May 18, 2024 6:53 PM
I've been working with RPM-OSTree. It is a neat idea and I like the concept, except the variant I'm using, at least, is additive only and not really immutable (e.g., no git reset --hard && git clean, granted it is an image file) due to limitations of the OS. For those unfamiliar, it builds the OS and packages in a container, then creates a local commit. User overrides take precedence, and you can accidentally overwrite a package, making the update procedure "break." But it does sign the original image artifacts. My question: is there a way to basically verify the signing history of an arbitrary object? I have a feeling I accidentally stumbled on what the whole blockchain thing was about.
I'm not going to go into details, to save space, but basically you have an original image of the kernel + packages and then it'll update nightly. Due to how the OS works, you can't really be immutable unless you delete everything. But it installs a key ring and verifies the original "image" ... then I realized that, outside of environments like Kubernetes or Git, I don't really understand signing files.
I guess my question is this: we sign any artifacts coming out of Kubernetes and we treat containers as immutable. In git itself all artifacts are signed and you can use something like keypass or a vault to do some sort of OIDC verification, plus a vault for backup. I'm not interested in encryption, but is there a way to show the history of a signed file?
If I sign a file now and it isn't a blob, I can inject my public key in there, or there's a number of ways to do it, but that's just showing I added it or, more likely in an automated environment, a service did it. I want to go a step beyond that, because you can always curl or obtain a file from anywhere and sign it without knowing whether it was verified, regardless of the protocol used. I'd like to say "unverified file pulled -> signed by kubernetes@geoffgo -> modified and signed by mary@geoffco" .... this isn't for malicious purposes but does show the sha (same file) and the history. I realize that's kinda basically git, but I was wondering if there was a platform-agnostic way to trace the lineage of a file, with or without encryption? Is that what blockchain is trying to achieve?
This isn't necessarily to see the history, why it was changed, etc. Just to see, oh yeah, you definitely did something with this file at this time. So it might be Word-2003-x86, but the sha is off and you did a gpg verify, so whether or not you remember, you did something, so let's find a blob that has a commit everyone else uses.
Does this make sense? I guess I just used verification to make sure it came from a trusted source in an environment where we knew what trusted meant. There might be a good reason history shows I downloaded terraform from HashiCorp, who signed it, then I changed it, signed it, put it in an internal repo, etc., but that's not germane to the question. If I can show that, there's going to be another system of record explaining why.
What's your mental model of asymmetric cryptography, and how does attestation work?
A key pair links two values, one secret and one public. For elliptic curves, the private key is a secret number and the public key is the point on the pre-agreed curve derived from it. For RSA, the pair is derived from two large prime numbers, and what one key of the pair transforms, only the other can undo. The private key is kept by the provider and the public key is shared openly. When you want to attest some data, the signing process produces a checksum of the data (typically a hash like SHA256) and then transforms it with the private key. Verifying does something similar: compute the hash of the attested data, then use the public key to check that the signature matches that hash.
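To make the hash-then-sign idea concrete, here is a toy sketch using textbook RSA with tiny primes. The primes, the exponent and the function names are all made up for illustration, and there is no padding, so this is nowhere near secure; real tools such as GPG use vetted libraries and keys thousands of bits long.

```python
import hashlib

# Toy parameters, for illustration only.
P, Q = 61, 53                  # the two secret primes
N = P * Q                      # public modulus (3233)
PHI = (P - 1) * (Q - 1)        # 3120, used only to derive the private exponent
E = 17                         # public exponent
D = pow(E, -1, PHI)            # private exponent (needs Python 3.8+)

def digest_as_int(data: bytes) -> int:
    """Hash the data with SHA-256, then squeeze the result into the toy modulus."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N

def sign(data: bytes) -> int:
    """Signer: transform the hash with the private exponent."""
    return pow(digest_as_int(data), D, N)

def verify(data: bytes, signature: int) -> bool:
    """Verifier: undo the transform with the public exponent and compare hashes."""
    return pow(signature, E, N) == digest_as_int(data)

if __name__ == "__main__":
    sig = sign(b"nightly image build")
    print(verify(b"nightly image build", sig))
```

Anyone holding only N and E can run verify, but producing a valid signature needs D. With a modulus this small, a changed message could occasionally still verify by accident, which is one reason real keys are enormous.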
Certificates on web servers rely on public keys distributed in client bundles, and GPG mail can attest that messages aren't modified or encrypt the whole conversation.
Git and OSTree use this idea incrementally: a data structure called a Merkle Tree has a chain of attestation, but also forking and branching, and the Merkle Tree is core to git's and OSTree's (and blockchain) storage. This explanation of OSTree and Ninja at Linux Weekly News has more detail.
Is there an immutable attested store for vendor-imported items? You could probably fake one with append-only storage or a custom git-blob script to include the attestation hash in the commit message.
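A rough sketch of what such a script might compute, assuming a SHA-1 git repository. The trailer names (Vendor-Blob, Vendor-SHA256, Vendor-Signature) are hypothetical, not a git convention, and the signature file name is whatever your signing tool produced.

```python
import hashlib

def git_blob_id(data: bytes) -> str:
    """Object id git assigns to a blob with this content in a SHA-1 repository."""
    header = b"blob " + str(len(data)).encode() + b"\0"
    return hashlib.sha1(header + data).hexdigest()

def commit_message(summary: str, data: bytes, signature_file: str) -> str:
    """Build a commit message that records the attestation hashes as trailers."""
    trailers = [
        f"Vendor-Blob: {git_blob_id(data)}",                     # hypothetical trailer
        f"Vendor-SHA256: {hashlib.sha256(data).hexdigest()}",    # hypothetical trailer
        f"Vendor-Signature: {signature_file}",                   # hypothetical trailer
    ]
    return summary + "\n\n" + "\n".join(trailers) + "\n"

if __name__ == "__main__":
    print(commit_message("Import vendor artifact", b"hello world\n", "artifact.sig"))
```

Because the commit message is part of what git hashes into the commit id, the recorded hashes become part of the history and can't be changed later without rewriting every descendant commit.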
posted by k3ninho at 1:15 AM on May 19, 2024 [1 favorite]
Your title question doesn't exactly line up with the body. I'll take a stab at what I understand the questions to be.
explaining GPG/PKI/SSH signing?
So these are all different things; what underlies them all is asymmetric encryption. The original RSA paper is actually pretty readable, especially if you skip the cryptographic analysis about why it works and is secure. The general idea is that your public key isn't a secret at all, but is related to your private key in such a way that what one locks the other unlocks.
The Achilles' heel in all of this is public key distribution. How do you know the public key you have is the 'right' one? For TLS, we have a chain of trust, going from browsers / OS vendors to every reputable website. The MetaFilter cert is signed by a key tied to Let's Encrypt R3, which is in turn signed by ISRG Root X1, which is known and trusted by Safari / macOS. So, to the degree that ISRG / LE / MF can be trusted, I can be sure this comment form is being POST'ed to MetaFilter and not any other party, because there is a direct chain between a trusted cert and the cert MeFi presented me.
Signing works by encrypting the object (in practice, a hash of it) with the private key, which anyone can then decrypt with the public key. Since only the responsible party has the private key, only they could have signed it, and anyone can prove that. Of course, again, key distribution matters. You need some method of proving that the key pair belongs to the person it purports to belong to. Keyservers can store keys, but anyone can upload. Most try to ensure you actually control an email address, but don't validate government-issued ID or anything.
I guess what I'm trying to say is identity is a social construct that is especially thin in the digital context. At best we can ensure persistence of identity -- who (or what?) ever decrypted or signed a message is the same as the last person who did so using a key pair.
Is there a way to basically verify signing history of an arbitrary object?
You bring up git and blockchain here, and these are not really the same thing, but both are built on Merkle Trees, which use hashing functions to produce tamper-resistant data structures. Each node in the tree consists of some data plus the metadata, including the parent(s), and that is fed into a hash function to compute the id of the newly created node. Since the parent id was also calculated using this process, the definition is recursive, and if any bit of history is altered, the hashes won't match the content. This is why you have to force push to a branch after a complicated git rebase -- you've altered the order of hashing, or the contents thereof.
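A minimal sketch of that recursive hashing, assuming SHA-256 and a made-up node layout (parent id, a NUL byte, then the payload). Git's real object format differs, and this is a single chain with no branching, but the tamper-evidence works the same way.

```python
import hashlib
from typing import List, Optional

def node_id(parent: Optional[str], payload: bytes) -> str:
    """Id of a node: a hash over its parent's id plus its own content."""
    h = hashlib.sha256()
    h.update((parent or "").encode())
    h.update(b"\0")
    h.update(payload)
    return h.hexdigest()

def build_chain(payloads: List[bytes]) -> List[str]:
    """Hash each payload on top of the previous node's id, oldest first."""
    ids: List[str] = []
    parent: Optional[str] = None
    for payload in payloads:
        parent = node_id(parent, payload)
        ids.append(parent)
    return ids

if __name__ == "__main__":
    original = build_chain([b"first", b"second", b"third"])
    rewritten = build_chain([b"first", b"SECOND", b"third"])
    # The untouched first node keeps its id; the edited node and
    # everything built on top of it get new ids.
    for before, after in zip(original, rewritten):
        print(before[:12], after[:12], before == after)
```

The third payload is identical in both runs, yet its id changes because its parent changed. That is the same effect that forces the push after a rebase.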
For git, the hash originally didn't need to be crypto secure. The hash function really only needs to make collisions extremely rare to be useful for object lookup and detecting corrupt data. So the hash function was locked in at SHA-1 originally, because it was available and fit for purpose. Over time, people started relying on that crypto secure hash, until it wasn't, and thus the 2019 move to make hash functions pluggable and move to SHA256. Bitcoin and other blockchains want their Merkle tree to be crypto secure because money is on the line.
Bitcoin further requires that all origin wallets sign the transaction, so that only wallet holders can spend money in the wallet. It then uses a global consensus protocol to ensure that the group supports a single unified history of the ledger. Git also optionally supports signed objects, but signing isn't required -- I could make an (unsigned!) git commit on my local file system claiming Linus Torvalds himself signed off on my rubocop policies enforcing spaces. When objects are signed, both do this using the standard signing approach mentioned above. Crucially, the signature is also fed into the ID-assigning hash function.
Finally, none of this helped in the case of the xz-utils attack last month. We know that code was committed by someone over the internet claiming a certain identity, but nobody knows if this person is even real. All SBOMs can do here is verify that the system delivered the bad code accurately. That's the metaphorical wide open barn door about open source -- any random state actor can patiently take over a key git repo with enough quality commits, and it's way easier than breaking PKI.
posted by pwnguin at 1:20 AM on May 19, 2024 [1 favorite]
You need some method of proving that the key pair belongs to the person who it purports to be.
This is the problem that Keybase was originally intended to address, by providing a generalized and provably secure way to prove that this social media account / email / website / public key / whatever and that social media account / email / website / public key / whatever are both owned by the same person.
posted by flabdablet at 2:53 AM on May 19, 2024
OCI images have some stuff around signatures, and are a layered format that might work well if you signed each layer. (Not saying you have to run the result as a container, just that the layered approach might be promising)
There might also be some interesting prior art with the whole "reproducible builds" stuff going on in the Linux community.
SBOM is another area of active development around these issues, although it's more focused on 3rd party libs causing security or legal problems when incorporated.
If you wanted to implement this from whole cloth, I'd probably start with a Merkle tree, storing various things on the leaves: artifact signatures (into some sort of content addressable store), then representations of modifications signed by actors within this ecosystem (CI pipelines? Admins?).
(You could probably prototype this pretty well with a Git repo, storing a signed metadata / x-delta / artifact tuple on each revision and keeping the files off repo with Git LFS.)
posted by Anonymous Function at 10:03 PM on May 18, 2024