* Re: Suggestion on hashing
From: Nguyen Thai Ngoc Duy @ 2011-12-02 14:22 UTC
To: Bill Zaumen; +Cc: Jeff King, Git Mailing List

(I'm not sure why you dropped git@vger. I see nothing private here so I'm bringing git@vger back.)

On Fri, Dec 2, 2011 at 3:08 PM, Bill Zaumen <bill.zaumen@gmail.com> wrote:
> At one point Nguyen said that "What I'm thinking is whether it's
> possible to decouple two sha-1 roles in git, as object identifier
> and digest, separately. Each sha-1 identifies an object and an extra
> set of digests on the "same" object."
>
> My code pretty much does that (it just uses a CRC instead of a real
> digest, but I can easily change that).

It'd be easier to look at your code if you split it into a series of smaller patches.

> So the question is whether
> using SHA-1 as an ID and SHA-256(?) as a digest is a better long term
> solution than simply replacing SHA-1.

I would not stick with any algorithm permanently. No one knows when SHA-256 might be broken.

> If there is some interest in pursuing it further, I could make those
> changes fairly easily. Then you'd have two message digests, a SHA-1
> and a longer one, with the longer one stored parallel to the actual
> object. Then it becomes easy to compute a digest of all the digests
> in a commit's tree and store that in a commit, if that is what you
> want to do.

I personally would like to see how it works out, especially when computing the new digests is much more expensive than SHA-1. And I hope that by delaying the computation of new digests (stored outside the actual objects), we could keep the code changes to git to a minimum. Security concerns may be the killer factor, though, and I haven't worked that out yet.
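A minimal Python sketch of the decoupling described above, assuming a simple in-memory store: the SHA-1 object ID is computed the way git computes it (over a "<type> <size>\0" header followed by the content), while a second, stronger digest (SHA-256 here, an assumption) is computed only when first requested and then cached outside the object. LazyDigestStore and its methods are hypothetical names, not part of git or of Bill's patch.

```python
import hashlib


def git_object_id(obj_type: str, data: bytes) -> str:
    """SHA-1 object ID over git's '<type> <size>\\0' header plus the content."""
    header = f"{obj_type} {len(data)}\0".encode()
    return hashlib.sha1(header + data).hexdigest()


class LazyDigestStore:
    """Hypothetical store: SHA-1 remains the object ID; a stronger digest
    (SHA-256, an assumption) is computed only when first requested and is
    kept in a side table, outside the object itself."""

    def __init__(self) -> None:
        self.objects: dict[str, tuple[str, bytes]] = {}  # ID -> (type, content)
        self.digests: dict[str, str] = {}                # ID -> cached SHA-256

    def add(self, obj_type: str, data: bytes) -> str:
        oid = git_object_id(obj_type, data)
        self.objects[oid] = (obj_type, data)  # no SHA-256 work on the write path
        return oid

    def digest(self, oid: str) -> str:
        if oid not in self.digests:  # computed on first use, then memoized
            obj_type, data = self.objects[oid]
            header = f"{obj_type} {len(data)}\0".encode()
            self.digests[oid] = hashlib.sha256(header + data).hexdigest()
        return self.digests[oid]


store = LazyDigestStore()
oid = store.add("blob", b"hello\n")
print(oid)  # ce013625030ba8dba906f756967f9e9ca394464a
```

The trade-off is that the extra cost is paid only by operations that actually need the stronger digest, at the price of keeping a side table in sync with the object store.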
> Replacing SHA-1 with something like SHA-256 sounds easier to implement,

SHA-1 characteristics (like 20 byte length) are hard coded everywhere in git, it'd be a big audit.

> but the problem is all the existing repositories. While rewriting all
> the objects and trees to use new hashes is similar to a rebase in most
> cases, there is a complication - submodules. Git stores the hash of
> a submodule's commit in its tree because a particular revision of
> a project 'goes' with a particular revision of a submodule. But, a
> submodule can exist in one revision and not in the next or previous
> revision. Furthermore, A could be a submodule of B at one point in time,
> and many commits later, B could end up being a submodule of A.
> Fixing it up could be pretty complicated (plus having to deal with
> network failures - to update GitHub, for example, you'd have to download
> the submodules it uses, possibly from somewhere else, and some submodules
> may not be publicly accessible, e.g., a private project kept on GitHub
> but with a critical submodule kept in house behind a corporate firewall).
> Also, you might have to update a git repository and its submodules
> concurrently, so that you can always find a new value when you need
> it.
>
> My guess is that this could be far more complicated than what I did.
> Excluding two files that are not used (the symbol PACKDB is not
> defined), I added two new files, crcdb.h and objd-crcdb.c, which store
> CRCs for loose objects - 517 lines total, including lots of comments in
> the header file (full documentation for each function). The other
> changes include 1475 lines of new code in previously existing git files
> and 136 deletions (most trivial). There were also minor changes to
> the makefile and test scripts.

You'd need to convince the git maintainer this is worth doing first, before talking about how big the changes are ;-)

> Bill

--
Duy
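To illustrate why submodules complicate a rewrite, here is a hypothetical Python sketch of translating one tree from old to new object IDs. Ordinary entries are looked up in the repository's own old-to-new table, but a gitlink entry (mode 160000, a submodule commit) can only be translated with the submodule repository's table, so the conversion fails if that repository is unavailable. The simplified tuple representation of tree entries and all names here are assumptions.

```python
GITLINK_MODE = "160000"  # tree-entry mode git uses for a submodule commit (gitlink)

# A simplified tree entry: (mode, path, object ID). Real trees are binary.
Entry = tuple[str, str, str]


class MissingSubmoduleMapping(Exception):
    """A gitlink points at a commit whose new ID is unknown, e.g. because
    the submodule repository could not be fetched."""


def rewrite_tree(entries: list[Entry],
                 local_map: dict[str, str],
                 submodule_maps: dict[str, dict[str, str]]) -> list[Entry]:
    """Translate every entry from old IDs to new IDs (hypothetical helper).

    Ordinary entries use this repository's own old->new table; gitlink
    entries need the old->new table of the submodule's repository, keyed
    here by path (a simplification: the repository behind a path can
    change from commit to commit).
    """
    rewritten = []
    for mode, path, oid in entries:
        if mode == GITLINK_MODE:
            table = submodule_maps.get(path, {})
            if oid not in table:
                raise MissingSubmoduleMapping(f"{path}: no new ID for commit {oid}")
            rewritten.append((mode, path, table[oid]))
        else:
            rewritten.append((mode, path, local_map[oid]))
    return rewritten
```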
* Re: Suggestion on hashing
From: Jeff King @ 2011-12-02 18:09 UTC
To: Nguyen Thai Ngoc Duy; +Cc: Bill Zaumen, Git Mailing List

On Fri, Dec 02, 2011 at 09:22:31PM +0700, Nguyen Thai Ngoc Duy wrote:

> > So the question is whether
> > using SHA-1 as an ID and SHA-256(?) as a digest is a better long term
> > solution than simply replacing SHA-1.
>
> I would not stick with any algorithm permanently. No one knows when
> SHA-256 might be broken.

Yeah, you could stick a few bits of algorithm parameter at the beginning of each identifier. It would mean unique hashes get one character or so longer (and they would all start with "1", or whatever the identifier is).

SHA-256 doesn't suffer from SHA-1's problems, though they are based on related constructions, so I think there is some concern that it may eventually fail in the same way. SHA-3 is a better bet in that sense, but it will also be very unproven, even once it is actually standardized.

> > Replacing SHA-1 with something like SHA-256 sounds easier to implement,
>
> SHA-1 characteristics (like 20 byte length) are hard coded everywhere
> in git, it'd be a big audit.

In theory, you could truncate a longer hash to 160 bits. It's not the bit-strength of SHA-1 that is the problem, but the attacks on the algorithm itself, which reduce the bit-strength to something too low. I would think a truncated result would retain the same cryptographic properties, as one of the properties of the un-truncated hash is that changes in the input data are reflected throughout the hash. Some hashes, like Skein, explicitly have a big internal state, and then just let you output as many bytes as is appropriate (i.e., being a drop-in replacement for SHA-1 is an explicit goal).
But I'm not a cryptographer, so there may be some subtle issues with doing that to arbitrary hash functions.

-Peff
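Both of Jeff's ideas can be sketched together in Python: a one-character algorithm tag in front of each identifier, and SHA-256 truncated to 160 bits so the hex body stays at 40 digits. The tag values are hypothetical, and whether truncation preserves the needed security properties is exactly the open question raised above; the sketch shows only the identifier format.

```python
import hashlib

# Hypothetical one-character algorithm tags prepended to each identifier.
# "2" is SHA-256 truncated to 160 bits, so the body stays 40 hex digits.
ALGORITHMS = {
    "1": lambda data: hashlib.sha1(data).digest(),
    "2": lambda data: hashlib.sha256(data).digest()[:20],
}


def tagged_id(tag: str, data: bytes) -> str:
    """Return '<tag><hex digest>', 41 characters for either tag."""
    return tag + ALGORITHMS[tag](data).hex()


def parse_tagged_id(identifier: str) -> tuple[str, bytes]:
    """Split an identifier back into its algorithm tag and raw digest."""
    tag, body = identifier[:1], identifier[1:]
    if tag not in ALGORITHMS:
        raise ValueError(f"unknown algorithm tag {tag!r}")
    raw = bytes.fromhex(body)
    if len(raw) != 20:
        raise ValueError("expected a 160-bit digest")
    return tag, raw


print(tagged_id("1", b"abc"))  # 1a9993e364706816aba3e25717850c26c9cd0d89d
print(tagged_id("2", b"abc"))  # 2ba7816bf8f01cfea414140de5dae2223b00361a3
```

Keeping the body at 160 bits is what would let existing 20-byte buffers survive; the tag is the part that would still need touching wherever identifiers are parsed.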
* Re: Suggestion on hashing
From: Bill Zaumen @ 2011-12-03 0:48 UTC
To: Nguyen Thai Ngoc Duy; +Cc: Jeff King, Git Mailing List

On Fri, 2011-12-02 at 21:22 +0700, Nguyen Thai Ngoc Duy wrote:
> (I'm not sure why you dropped git@vger. I see nothing private here so
> I'm bringing git@vger back.)

Oh, I just didn't want to flood the mailing list with too much on one topic, and figured we could summarize the discussion at some point and post that, but if you'd rather keep it all on the list, that's fine with me.

I can split the code into a series of patches smaller than the set of three I sent, but I'm not sure the test scripts will work with all of the intermediate patches if I do that. I can also make the digest (currently a CRC) pluggable. Then you can try different digests as an experiment and see how that affects performance.

My implementation uses the CRC or new digests only when the object database is being modified or explicitly verified. Basically, the code provides memoization for an additional hash function, used for whatever purpose you desire. If you want to put a digest of message digests into a commit message, you can do that fairly quickly, as one level of digests has been precomputed. I think Jeff's or your suggestion of putting an additional digest in the commit message is a good idea. If you want to experiment with such changes, the code would provide a reasonable start on that.

So, I guess I should make those changes - a pluggable digest and splitting the patches further.
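A hypothetical Python sketch of the eager approach described above: the secondary digest is chosen by name from a small registry (a CRC32 placeholder, SHA-1, or SHA-256), computed once when an object is written, memoized, and recomputed only on an explicit verify. The registry, class, and method names are illustrative assumptions, not Bill's code.

```python
import hashlib
import zlib

# Pluggable secondary digests, selected by name. "crc32" stands in for the
# placeholder CRC; the names and this registry are illustrative only.
DIGESTS = {
    "crc32": lambda data: zlib.crc32(data).to_bytes(4, "big"),
    "sha1": lambda data: hashlib.sha1(data).digest(),
    "sha256": lambda data: hashlib.sha256(data).digest(),
}


class EagerDigestDB:
    """Hypothetical object database that computes the secondary digest only
    when an object is written, and recomputes it only on explicit verify."""

    def __init__(self, digest_name: str = "sha256") -> None:
        self.digest_fn = DIGESTS[digest_name]
        self.objects: dict[str, bytes] = {}  # ID -> serialized object
        self.digests: dict[str, bytes] = {}  # ID -> memoized secondary digest

    def write_blob(self, content: bytes) -> str:
        serialized = f"blob {len(content)}\0".encode() + content
        oid = hashlib.sha1(serialized).hexdigest()  # the object ID stays SHA-1
        self.objects[oid] = serialized
        self.digests[oid] = self.digest_fn(serialized)  # memoized at write time
        return oid

    def verify(self, oid: str) -> bool:
        """Recompute the secondary digest and compare with the stored one."""
        return self.digest_fn(self.objects[oid]) == self.digests[oid]
```

Swapping digests is then a one-word change (`EagerDigestDB("sha1")`), which is what makes performance comparisons between candidates cheap to run.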
> You'd need to convince the git maintainer this is worth doing first,
> before talking about how big the changes are ;-)

I'd guess there are several issues: the amount of code, how complex the changes are, what the performance impacts are, whether the changes are backwards compatible, and what you get for the effort.

As a start on the last question, "what you get": aside from some extra checking to detect problems, if you modify commit messages and signed tags to use better digests, you can make a stronger argument regarding authentication. For example, suppose your code is dual-licensed (GPL for free use, but a separate license if the code is used in a proprietary product) and there is a legal dispute. Using a better digest than SHA-1 would have some advantages: when they start calling in expert witnesses, one side will bring in a security expert who will testify that SHA-1 is too weak to be used for authentication, citing government publications such as http://csrc.nist.gov/groups/ST/hash/statement.html as evidence. The jury is not going to consist of people who can fully understand the details, so being able to say that git's authentication matches current best practices would be an additional reason to use git.
* Re: Suggestion on hashing
From: Chris West (Faux) @ 2011-12-06 1:56 UTC
To: Nguyen Thai Ngoc Duy; +Cc: Bill Zaumen, Jeff King, Git Mailing List

Nguyen Thai Ngoc Duy wrote:
> SHA-1 characteristics (like 20 byte length) are hard coded everywhere
> in git, it'd be a big audit.

I was planning to look at this anyway. My branch[1] allows init/add/commit with SHA-256, SHA-512 and all the SHA-3 candidates.

log/fsck/etc. are all broken. Don't even dare try packs. Fixing things is painful but not impossible. I'm not convinced the task is even remotely insurmountable.

(This is not a request-for-comments, just an informational notification. It does not even attempt to address compatibility or the like.)

    $ make HASH=sha512 -j6
    $ PATH=bin-wrappers:..
    $ git init && echo hi > foo && git add foo && git commit -m "bang"
    Initialized empty Git repository in /.../.git/
    [master (root-commit) 8d3ae658dff0c6e398bb4a0d193974e49acfadedfcd61daca42c931ac18d5ac46f0a068e08d81c25d7b79b1c3f4951e4340eeb90f0ef39de355c9bab7e75faba] bang
     1 files changed, 1 insertions(+), 0 deletions(-)
     create mode 100644 foo

1. (Please use the hash-v0.0.1 tag, I rebase.)
   gitweb: http://preview.tinyurl.com/bsufh92
   git://git.goeswhere.com/git/git.git
   https://github.com/FauxFaux/git/tree/hash-v0.0.1

---
Chris West (Faux)
Freenode #git: FauxFaux
https://ssl.goeswhere.com/key-transition-2011-10-10.txt.asc
gpg: 408A E4F1 4EA7 33EF 1265 82C1 B195 E1C4 779B A9B2
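The branch above selects the hash at build time (HASH=sha512), and the commit ID in the transcript is 128 hex digits, matching SHA-512's 64-byte output. The Python sketch below (a hypothetical HashAlgo descriptor, not code from that branch) shows the kind of change the audit implies: code asks the algorithm for its sizes instead of assuming 20 raw bytes or 40 hex digits.

```python
import hashlib
import string
from dataclasses import dataclass


@dataclass(frozen=True)
class HashAlgo:
    """Hypothetical descriptor: callers query sizes from the algorithm
    instead of hard-coding 20 raw bytes / 40 hex digits."""
    name: str  # any name hashlib.new() accepts, e.g. "sha1", "sha256", "sha512"

    @property
    def raw_len(self) -> int:
        return hashlib.new(self.name).digest_size

    @property
    def hex_len(self) -> int:
        return 2 * self.raw_len

    def object_id(self, obj_type: str, data: bytes) -> str:
        header = f"{obj_type} {len(data)}\0".encode()
        return hashlib.new(self.name, header + data).hexdigest()

    def looks_like_id(self, text: str) -> bool:
        return len(text) == self.hex_len and all(c in string.hexdigits for c in text)


# raw/hex lengths: sha1 20/40, sha256 32/64, sha512 64/128
for name in ("sha1", "sha256", "sha512"):
    algo = HashAlgo(name)
    print(name, algo.raw_len, algo.hex_len)
```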
* Re: Suggestion on hashing
From: Bill Zaumen @ 2011-12-06 3:47 UTC
To: Chris West (Faux); +Cc: Nguyen Thai Ngoc Duy, Jeff King, Git Mailing List

When I went through the code, I noted that SHA-1 hashes are currently used for the following:

 * object IDs
 * authentication (something to sign using public-key encryption)
 * data integrity (basically a really good checksum).

While there are a lot of 20-byte arrays of unsigned char, many of those are associated with lookups. You might want to look at the number of places where git_SHA1_Init is called (there aren't all that many of those, and that function marks the points where SHA-1 hashes are being created).

While a few things I tried were complete false starts (I kept those out of the preliminary patches I sent), I managed to store a CRC (which you can treat as a placeholder for a real message digest) for each SHA-1 hash in a pack file, but I did it by creating a separate file (extension ".mds"), and that worked. I looked into modifying pack files, and that was too messy given that you'd want older versions to still work with newer remote repositories. The other factor is that the "mds" files are computed locally, at the same time that you create an "idx" file. The formats of the "pack" and "idx" files don't change.

I've just started on replacing the CRC I used with real message digests, making new digests easy to add. The plan is to initially make it work with both a CRC and SHA-1: the CRC so I can test it easily by comparing new and old versions to show that nothing changed when it shouldn't have, and SHA-1 because Git already implements it. I should complete my changes. If we are lucky, maybe the changes I'm trying would solve some of the problems you mentioned with pack files.
At least I can store the digests in a way that doesn't break the log and fsck operations (it went through all the test suites, with only minor modifications for things like counting the number of files in particular directories). If you make changes to commit objects, fixing the test scripts is a pain - there are a number of places where SHA-1 values are hard-coded, and those have to be replaced.

Bill

On Tue, 2011-12-06 at 01:56 +0000, Chris West (Faux) wrote:
> Nguyen Thai Ngoc Duy wrote:
> > SHA-1 characteristics (like 20 byte length) are hard coded everywhere
> > in git, it'd be a big audit.
>
> I was planning to look at this anyway. My branch[1] allows
> init/add/commit with SHA-256, SHA-512 and all the SHA-3 candidates.
>
> log/fsck/etc. are all broken. Don't even dare try packs. Fixing things
> is painful but not impossible. I'm not convinced the task is even
> remotely insurmountable.
>
> (This is not a request-for-comments, just an informational notification.
> It does not even attempt to address compatibility or the like.)
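One way to picture a per-pack side file like the ".mds" file is a table of fixed-size records sorted by object ID, so a digest can be found by binary search, much as object IDs are looked up in an ".idx" file. The layout below (a 4-byte count followed by 20-byte SHA-1 plus 32-byte SHA-256 records) is an assumption for illustration; it is not the actual ".mds" format from the patch.

```python
import struct
from typing import Optional

ID_LEN = 20      # raw SHA-1 object ID
DIGEST_LEN = 32  # raw SHA-256 secondary digest (assumed choice)
RECORD_LEN = ID_LEN + DIGEST_LEN


def build_side_table(digests: dict[bytes, bytes]) -> bytes:
    """Serialize {raw object ID: digest} as a big-endian count followed by
    fixed-size records sorted by object ID (hypothetical layout)."""
    out = bytearray(struct.pack(">I", len(digests)))
    for oid in sorted(digests):
        if len(oid) != ID_LEN or len(digests[oid]) != DIGEST_LEN:
            raise ValueError("unexpected ID or digest length")
        out += oid + digests[oid]
    return bytes(out)


def lookup_digest(table: bytes, oid: bytes) -> Optional[bytes]:
    """Binary-search the sorted records for oid; None if it is absent."""
    (count,) = struct.unpack_from(">I", table, 0)
    lo, hi = 0, count
    while lo < hi:
        mid = (lo + hi) // 2
        start = 4 + mid * RECORD_LEN
        key = table[start:start + ID_LEN]
        if key < oid:
            lo = mid + 1
        elif key > oid:
            hi = mid
        else:
            return table[start + ID_LEN:start + RECORD_LEN]
    return None
```

Because the file is derived locally from the pack, it can be regenerated at any time, which is what lets the pack and idx formats stay unchanged.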
* Re: Suggestion on hashing
From: Nguyen Thai Ngoc Duy @ 2011-12-06 4:46 UTC
To: Chris West (Faux); +Cc: Bill Zaumen, Jeff King, Git Mailing List

On Tue, Dec 6, 2011 at 8:56 AM, Chris West (Faux) <faux@goeswhere.com> wrote:
> Nguyen Thai Ngoc Duy wrote:
>> SHA-1 characteristics (like 20 byte length) are hard coded everywhere
>> in git, it'd be a big audit.
>
> I was planning to look at this anyway. My branch[1] allows
> init/add/commit with SHA-256, SHA-512 and all the SHA-3 candidates.

Great!

> log/fsck/etc. are all broken. Don't even dare try packs. Fixing things
> is painful but not impossible. I'm not convinced the task is even
> remotely insurmountable.

It would take more work, but after you're done with the code changes, you should have a look at updating the test suite. We have many SHA-1s there. If the test suite passes, your job is (beautifully) done.

--
Duy
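A small audit helper can locate the hard-coded SHA-1s that would need updating. Git's test scripts are shell; this Python sketch is only an illustration, and since it flags any run of exactly 40 lowercase hex digits, it will also catch some values that are not object IDs.

```python
import re

# Exactly 40 lowercase hex digits bounded by non-word characters; longer hex
# runs (e.g. 64-digit SHA-256 values) do not match because \b needs a boundary.
SHA1_LITERAL = re.compile(r"\b[0-9a-f]{40}\b")


def find_hardcoded_sha1s(script: str) -> list[tuple[int, str]]:
    """Return (line number, literal) pairs for SHA-1-looking strings."""
    hits = []
    for lineno, line in enumerate(script.splitlines(), start=1):
        hits.extend((lineno, m.group()) for m in SHA1_LITERAL.finditer(line))
    return hits
```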
* Re: Suggestion on hashing
From: Bill Zaumen @ 2011-12-06 6:02 UTC
To: Nguyen Thai Ngoc Duy; +Cc: Chris West (Faux), Jeff King, Git Mailing List

On Tue, 2011-12-06 at 11:46 +0700, Nguyen Thai Ngoc Duy wrote:
> On Tue, Dec 6, 2011 at 8:56 AM, Chris West (Faux) <faux@goeswhere.com> wrote:
> > Nguyen Thai Ngoc Duy wrote:
> >> SHA-1 characteristics (like 20 byte length) are hard coded everywhere
> >> in git, it'd be a big audit.
> >
> > I was planning to look at this anyway. My branch[1] allows
> > init/add/commit with SHA-256, SHA-512 and all the SHA-3 candidates.
>
> Great!

If you are replacing SHA-1 as an object ID with another hash function, two things to watch are submodules and alternative object databases. Because of those, it is necessary to worry about the order in which repositories are converted. In the worst case for submodules, you'd have to do multiple repositories at the same time, switching between them depending on what you need at each point.
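The ordering problem amounts to a topological sort over "must be converted first" edges (submodules and alternates), where a cycle means the repositories involved have to be converted together. A Python sketch using the standard library's graphlib module (Python 3.9 or later); the input format and the function name are assumptions.

```python
from graphlib import CycleError, TopologicalSorter


def conversion_order(needs_first: dict[str, set[str]]) -> list[str]:
    """Order repositories so each is converted after the repositories it
    depends on (its submodules and alternates).

    needs_first maps a repository to the set of repositories that must be
    converted before it. A cycle (A is a submodule of B in one commit, B of
    A in another) has no valid order; those repositories would have to be
    converted together.
    """
    try:
        return list(TopologicalSorter(needs_first).static_order())
    except CycleError as err:
        cycle = err.args[1]  # graphlib reports the cycle as the second arg
        raise ValueError(f"convert these together: {sorted(set(cycle))}") from err


print(conversion_order({"app": {"lib", "shared-objects"}, "lib": {"shared-objects"}}))
```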
* Re: Suggestion on hashing
From: Nguyen Thai Ngoc Duy @ 2011-12-06 6:23 UTC
To: Bill Zaumen; +Cc: Chris West (Faux), Jeff King, Git Mailing List

On Tue, Dec 6, 2011 at 1:02 PM, Bill Zaumen <bill.zaumen@gmail.com> wrote:
> On Tue, 2011-12-06 at 11:46 +0700, Nguyen Thai Ngoc Duy wrote:
>> On Tue, Dec 6, 2011 at 8:56 AM, Chris West (Faux) <faux@goeswhere.com> wrote:
>> > Nguyen Thai Ngoc Duy wrote:
>> >> SHA-1 characteristics (like 20 byte length) are hard coded everywhere
>> >> in git, it'd be a big audit.
>> >
>> > I was planning to look at this anyway. My branch[1] allows
>> > init/add/commit with SHA-256, SHA-512 and all the SHA-3 candidates.
>>
>> Great!
>
> If you are replacing SHA-1 as an object ID with another hash function,
> two things to watch are submodules and alternative object databases.
> Because of those, it is necessary to worry about the order in which
> repositories are converted. In the worst case for submodules, you'd
> have to do multiple repositories at the same time, switching between
> them depending on what you need at each point.

I know migration would be painful. But note that new repos can benefit from a stronger digest without legacy (of course until they link to an old repo). For submodules, I think we should extend the gitlink to become something similar to a soft link: the gitlink is the SHA-1 of a text file that contains the SHA-1 and maybe other digests of the submodule's tip.

--
Duy
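A hypothetical sketch of the soft-link idea in Python: instead of the submodule's commit ID, the superproject's tree would reference a small text blob listing the submodule tip under each algorithm. The line format ("<algorithm> <hex id>") and the helper names are invented here for illustration.

```python
import hashlib


def make_link_blob(tip_ids: dict[str, str]) -> bytes:
    """One '<algorithm> <hex id>' line per algorithm, sorted so the same
    set of IDs always produces the same bytes (and the same blob ID)."""
    return "".join(f"{algo} {tip_ids[algo]}\n" for algo in sorted(tip_ids)).encode()


def parse_link_blob(data: bytes) -> dict[str, str]:
    result = {}
    for line in data.decode().splitlines():
        algo, hex_id = line.split(" ", 1)
        result[algo] = hex_id
    return result


def blob_id(data: bytes) -> str:
    # The superproject's tree would record this SHA-1 instead of the tip's.
    return hashlib.sha1(f"blob {len(data)}\0".encode() + data).hexdigest()
```

A reader that only understands SHA-1 can still pick out the "sha1" line, while newer readers can check the stronger digests.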
* Re: Suggestion on hashing
From: Bill Zaumen @ 2011-12-07 1:44 UTC
To: Nguyen Thai Ngoc Duy; +Cc: Chris West (Faux), Jeff King, Git Mailing List

On Tue, 2011-12-06 at 13:23 +0700, Nguyen Thai Ngoc Duy wrote:
> On Tue, Dec 6, 2011 at 1:02 PM, Bill Zaumen <bill.zaumen@gmail.com> wrote:
> > If you are replacing SHA-1 as an object ID with another hash function,
> > two things to watch are submodules and alternative object databases.
> > Because of those, it is necessary to worry about the order in which
> > repositories are converted. In the worst case for submodules, you'd
> > have to do multiple repositories at the same time, switching between
> > them depending on what you need at each point.
>
> I know migration would be painful. But note that new repos can benefit
> from a stronger digest without legacy (of course until they link to an old
> repo). For submodules, I think we should extend the gitlink to become
> something similar to a soft link: the gitlink is the SHA-1 of a text file
> that contains the SHA-1 and maybe other digests of the submodule's tip.

Repositories would need to store a table mapping old SHA-1 values to the new ones (for commits). There's nothing in a repository to reliably indicate that it is being used as a submodule, and the choice of submodules can vary from commit to commit, making it difficult to control the order in which objects have their hashes updated. In some corner cases, you could have two branches in each of two repositories with different choices as to which is a submodule of which, although I'd be surprised if anyone actually did that.

Aside from that, in some corporate environments, the IT departments want to determine the release schedule for applications, and would take a dim view of changes that could not be tested before being widely deployed.
You could end up making Git unacceptable to those departments if you do not maintain backwards compatibility with existing repositories.
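The old-to-new table mentioned above could be kept in both directions, so either form of a commit ID can be resolved during and after a migration, for instance when talking to a repository that has not been converted yet. A Python sketch; the class name, and the assumption that new IDs are 64-digit SHA-256 values distinguishable from SHA-1 by length, are illustrative only.

```python
class CommitIdMap:
    """Hypothetical old<->new commit ID table kept by a converted
    repository, so references using either form can still be resolved."""

    OLD_HEX_LEN = 40  # SHA-1
    NEW_HEX_LEN = 64  # assumed SHA-256 replacement

    def __init__(self) -> None:
        self.old_to_new: dict[str, str] = {}
        self.new_to_old: dict[str, str] = {}

    def add(self, old: str, new: str) -> None:
        self.old_to_new[old] = new
        self.new_to_old[new] = old

    def to_new(self, oid: str) -> str:
        """Accept either form and return the new-style ID."""
        if len(oid) == self.NEW_HEX_LEN:
            return oid
        if len(oid) == self.OLD_HEX_LEN:
            return self.old_to_new[oid]
        raise ValueError(f"unrecognized ID length {len(oid)}")

    def to_old(self, oid: str) -> str:
        """Accept either form and return the SHA-1 ID, e.g. for an old client."""
        if len(oid) == self.OLD_HEX_LEN:
            return oid
        if len(oid) == self.NEW_HEX_LEN:
            return self.new_to_old[oid]
        raise ValueError(f"unrecognized ID length {len(oid)}")
```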
* Re: Suggestion on hashing
From: Jeff King @ 2011-12-02 17:54 UTC
To: Bill Zaumen; +Cc: git, pclouds

On Fri, Dec 02, 2011 at 12:08:39AM -0800, Bill Zaumen wrote:

> At one point Nguyen said that "What I'm thinking is whether it's
> possible to decouple two sha-1 roles in git, as object identifier
> and digest, separately. Each sha-1 identifies an object and an extra
> set of digests on the "same" object."
>
> My code pretty much does that (it just uses a CRC instead of a real
> digest, but I can easily change that). So the question is whether
> using SHA-1 as an ID and SHA-256(?) as a digest is a better long term
> solution than simply replacing SHA-1.

I think your code is solving the wrong problem (or solving the right problem in a half-way manner). The only things that make sense to me are:

  1. Do nothing. SHA-1 is probably not broken yet, even by the NSA, and
     even if it is, an attack is extremely expensive to mount. This may
     change in the future, of course, but it will probably stay
     expensive for a while.

  2. Decouple the object identifier and digest roles, but insert the
     digest into newly created objects, so it can be part of the
     signature chain. I described such a scheme in one of my replies to
     you. It has some complexities, but has the bonus that we can build
     directly on older history, preserving its sha1s.

  3. Replace SHA-1 with a more secure algorithm.

I'm probably in favor of (1) at this point. Whether to do (2) or (3) will depend on where we are when SHA-1 gets feasibly broken. It may be many years away, at which point we may be considering a git 2.0 that breaks repository compatibility anyway. That would be a natural time to consider changing the algorithm.
> Replacing SHA-1 with something like SHA-256 sounds easier to implement,
> but the problem is all the existing repositories.

Right. I don't think anyone is denying that it would be a giant pain.

-Peff
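Option 2 can be pictured as an extra commit header carrying a stronger digest of the tree: the commit ID stays SHA-1, but a signature over the commit then also covers the stronger digest. The header name "tree-sha256" is hypothetical, and this is not necessarily the scheme described in the earlier reply mentioned above; it only shows where such a digest could sit in a serialized commit.

```python
import hashlib


def build_commit(tree: str, parents: list[str], ident: str,
                 tree_digest: str, message: str) -> tuple[str, bytes]:
    """Serialize a commit and return (SHA-1 ID, raw object).

    The "tree-sha256" header is hypothetical: it carries a stronger digest
    of the tree, so anything that signs this commit also covers it.
    """
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {ident}", f"committer {ident}",
              f"tree-sha256 {tree_digest}",  # hypothetical extra header
              "", message]
    body = "\n".join(lines).encode()
    raw = f"commit {len(body)}\0".encode() + body
    return hashlib.sha1(raw).hexdigest(), raw


EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # git's empty tree (SHA-1)
ident = "A U Thor <author@example.com> 1323000000 +0000"
# SHA-256 over the same "tree 0\0" serialization that produces EMPTY_TREE.
empty_tree_sha256 = hashlib.sha256(b"tree 0\0").hexdigest()
oid, raw = build_commit(EMPTY_TREE, [], ident, empty_tree_sha256, "bang\n")
```

Because the digest is part of the hashed bytes, changing it changes the SHA-1 commit ID, which is what chains it into existing history without rewriting older objects.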
* Re: Suggestion on hashing
From: Bill Zaumen @ 2011-12-03 1:50 UTC
To: Jeff King; +Cc: git, pclouds

On Fri, 2011-12-02 at 12:54 -0500, Jeff King wrote:
> On Fri, Dec 02, 2011 at 12:08:39AM -0800, Bill Zaumen wrote:
> I think your code is solving the wrong problem (or solving the right
> problem in a half-way manner). The only things that make sense to me
> are:
>
>   1. Do nothing. SHA-1 is probably not broken yet, even by the NSA, and
>      even if it is, an attack is extremely expensive to mount. This may
>      change in the future, of course, but it will probably stay
>      expensive for a while.
>
>   2. Decouple the object identifier and digest roles, but insert the
>      digest into newly created objects, so it can be part of the
>      signature chain. I described such a scheme in one of my replies to
>      you. It has some complexities, but has the bonus that we can build
>      directly on older history, preserving its sha1s.
>
>   3. Replace SHA-1 with a more secure algorithm.

Suppose I make the digest pluggable, something I intended to do eventually anyway? Then you just use the existing SHA-1 as an object identifier and the new digest in a signature chain? What I did was essentially to compute the new digest (using a CRC as the trivial case) whenever an object's SHA-1 hash is computed, plus using the new digest for low-cost collision checks. Then you have everything needed to experiment with your second option. I got the impression that Nguyen had some interest in that, but I could be mistaken.

The use is simple: if you have the SHA-1 hash of an object, you call a function, currently named "has_sha1_file_crc", and it returns true if a CRC is available, writing the hash into the buffer supplied as its second argument. You can do whatever you like with it.
If you want a digest of digests, you just traverse a commit's tree and call has_sha1_file_crc whenever you want to look up a digest. So the API is actually very simple if you just use the patch to quickly look up the digest associated with a SHA-1 ID; everything else it does happens automatically.
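A Python sketch of the digest-of-digests traversal, assuming a simplified in-memory tree map and a lookup function standing in for has_sha1_file_crc (here it returns the stored digest or None rather than filling a caller-supplied buffer). The traversal order and the way paths are mixed into the hash are assumptions; any fixed order works as long as everyone computing the value uses the same one.

```python
import hashlib
from typing import Callable, Optional

# Simplified tree map: tree ID -> list of (name, object ID, is_subtree).
Trees = dict[str, list[tuple[str, str, bool]]]


def digest_of_digests(root: str, trees: Trees,
                      lookup: Callable[[str], Optional[bytes]]) -> bytes:
    """SHA-256 over the stored digests of every object reachable from root,
    visited in a fixed (sorted, depth-first) order."""
    h = hashlib.sha256()

    def feed(path: str, oid: str) -> None:
        digest = lookup(oid)  # stand-in for has_sha1_file_crc
        if digest is None:
            raise KeyError(f"no stored digest for {oid} ({path or '/'})")
        h.update(path.encode() + b"\0" + digest)

    def walk(tree_id: str, prefix: str) -> None:
        for name, oid, is_subtree in sorted(trees[tree_id]):
            feed(prefix + name, oid)
            if is_subtree:
                walk(oid, prefix + name + "/")

    feed("", root)
    walk(root, "")
    return h.digest()
```

Since every per-object digest is already stored, the cost is one lookup per reachable object rather than rehashing file contents.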
* Re: Suggestion on hashing
From: Jeff King @ 2011-12-03 15:08 UTC
To: Bill Zaumen; +Cc: git, pclouds

On Fri, Dec 02, 2011 at 05:50:21PM -0800, Bill Zaumen wrote:

> On Fri, 2011-12-02 at 12:54 -0500, Jeff King wrote:
> > On Fri, Dec 02, 2011 at 12:08:39AM -0800, Bill Zaumen wrote:
> >
> > I think your code is solving the wrong problem (or solving the right
> > problem in a half-way manner). The only things that make sense to me
> > are:
> >
> >   1. Do nothing. SHA-1 is probably not broken yet, even by the NSA, and
> >      even if it is, an attack is extremely expensive to mount. This may
> >      change in the future, of course, but it will probably stay
> >      expensive for a while.
> >
> >   2. Decouple the object identifier and digest roles, but insert the
> >      digest into newly created objects, so it can be part of the
> >      signature chain. I described such a scheme in one of my replies to
> >      you. It has some complexities, but has the bonus that we can build
> >      directly on older history, preserving its sha1s.
> >
> >   3. Replace SHA-1 with a more secure algorithm.
>
> Suppose I make the digest pluggable, something I intended to do
> eventually anyway? Then you just use the existing SHA-1 as an
> object identifier and the new digest in a signature chain? What I
> did was essentially to compute the new digest (using a CRC as the
> trivial case) whenever an object's SHA-1 hash is computed, plus
> using the new digest for low-cost collision checks.

If you make the digest stronger (or pluggable) and include it in the actual objects themselves, then you have a start on (2).

I'd drop all of the digest-exchange bits from the protocol, as the actual signatures are the real, trustable verification. I don't think you can drop the external storage of the digests, which is one of the ugliest bits.
You'll be asking for the digests all the time to create new commit objects, so you need to have it at hand without rehashing.

And I wouldn't get my hopes up that this will go into git any time soon. At this point, we're really guessing about how broken SHA-1 will be in the future, and how much we are going to want to care.

Just my two cents.

-Peff
* Re: Suggestion on hashing
From: Philip Oakley @ 2011-12-03 15:34 UTC
To: Bill Zaumen; +Cc: git, pclouds, Jeff King

Had you seen the recent thread by Junio with the footnote link to the paper on reconciliation using multiple hashes?

http://article.gmane.org/gmane.linux.kernel/1214517

"What's the Difference? Efficient Set Reconciliation without Prior Context"
http://cseweb.ucsd.edu/~fuyeda/papers/sigcomm2011.pdf

It looks to have a lot of the properties being sought, and links with other git aspects.

Philip

From: "Jeff King" <peff@peff.net>, Saturday, December 03, 2011 3:08 PM

On Fri, Dec 02, 2011 at 05:50:21PM -0800, Bill Zaumen wrote:
> On Fri, 2011-12-02 at 12:54 -0500, Jeff King wrote:
> > On Fri, Dec 02, 2011 at 12:08:39AM -0800, Bill Zaumen wrote:
> >
> > I think your code is solving the wrong problem (or solving the right
> > problem in a half-way manner). The only things that make sense to me
> > are:
> >
> >   1. Do nothing. SHA-1 is probably not broken yet, even by the NSA, and
> >      even if it is, an attack is extremely expensive to mount. This may
> >      change in the future, of course, but it will probably stay
> >      expensive for a while.
> >
> >   2. Decouple the object identifier and digest roles, but insert the
> >      digest into newly created objects, so it can be part of the
> >      signature chain. I described such a scheme in one of my replies to
> >      you. It has some complexities, but has the bonus that we can build
> >      directly on older history, preserving its sha1s.
> >
> >   3. Replace SHA-1 with a more secure algorithm.
>
> Suppose I make the digest pluggable, something I intended to do
> eventually anyway? Then you just use the existing SHA-1 as an
> object identifier and the new digest in a signature chain?
> What I did was essentially to compute the new digest (using a CRC as the
> trivial case) whenever an object's SHA-1 hash is computed, plus
> using the new digest for low-cost collision checks.

If you make the digest stronger (or pluggable) and include it in the actual objects themselves, then you have a start on (2).

I'd drop all of the digest-exchange bits from the protocol, as the actual signatures are the real, trustable verification. I don't think you can drop the external storage of the digests, which is one of the ugliest bits. You'll be asking for the digests all the time to create new commit objects, so you need to have it at hand without rehashing.

And I wouldn't get my hopes up that this will go into git any time soon. At this point, we're really guessing about how broken SHA-1 will be in the future, and how much we are going to want to care.

Just my two cents.

-Peff
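The paper cited above reconciles two sets using an Invertible Bloom Filter, part of what it calls a Difference Digest. The Python sketch below is a minimal version of that idea, not the paper's code or parameters: each side inserts its object IDs, one filter is subtracted from the other so shared IDs cancel, and "pure" cells are peeled to list the IDs each side is missing. The cell count, k = 3, and the SHA-256-derived hashes are assumptions, and decoding can fail if the filter is too small for the difference.

```python
import hashlib

KEY_BYTES = 20  # keys are raw SHA-1 object IDs, treated as 160-bit integers


def _hash(key: int, salt: int) -> int:
    data = bytes([salt]) + key.to_bytes(KEY_BYTES, "big")
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


class IBF:
    """Minimal invertible Bloom filter: each cell keeps a count, an XOR of
    keys, and an XOR of key checksums."""

    CHECK_SALT = 255  # salt for the per-key checksum

    def __init__(self, cells: int = 120, k: int = 3) -> None:
        self.cells, self.k = cells, k
        self.count = [0] * cells
        self.key_xor = [0] * cells
        self.check_xor = [0] * cells

    def _indices(self, key: int) -> list[int]:
        # One cell per partition, so a key always touches k distinct cells.
        part = self.cells // self.k
        return [i * part + _hash(key, i) % part for i in range(self.k)]

    def _apply(self, key: int, delta: int) -> None:
        check = _hash(key, self.CHECK_SALT)
        for c in self._indices(key):
            self.count[c] += delta
            self.key_xor[c] ^= key
            self.check_xor[c] ^= check

    def insert(self, key: int) -> None:
        self._apply(key, +1)

    def subtract(self, other: "IBF") -> "IBF":
        diff = IBF(self.cells, self.k)
        for i in range(self.cells):
            diff.count[i] = self.count[i] - other.count[i]
            diff.key_xor[i] = self.key_xor[i] ^ other.key_xor[i]
            diff.check_xor[i] = self.check_xor[i] ^ other.check_xor[i]
        return diff

    def decode(self) -> tuple[bool, set[int], set[int]]:
        """Peel a difference filter (destructively). Returns
        (complete, keys only on self's side, keys only on other's side)."""
        mine, theirs = set(), set()
        progress = True
        while progress:
            progress = False
            for i in range(self.cells):
                c = self.count[i]
                if c in (1, -1) and self.check_xor[i] == _hash(self.key_xor[i], self.CHECK_SALT):
                    key = self.key_xor[i]
                    (mine if c == 1 else theirs).add(key)
                    self._apply(key, -c)  # remove the key from all its cells
                    progress = True
        complete = not any(self.count) and not any(self.key_xor) and not any(self.check_xor)
        return complete, mine, theirs


def sha1_int(name: str) -> int:
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big")


ours = {sha1_int(f"obj{i}") for i in range(100)}
theirs = (ours - {sha1_int("obj3"), sha1_int("obj7")}) | {sha1_int(f"new{i}") for i in range(3)}
ours_ibf, theirs_ibf = IBF(), IBF()
for key in ours:
    ours_ibf.insert(key)
for key in theirs:
    theirs_ibf.insert(key)
complete, only_ours, only_theirs = ours_ibf.subtract(theirs_ibf).decode()
```

In a real exchange each side would send only its fixed-size filter; the filter size depends on the expected size of the difference, not on the total number of objects.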
* Re: Suggestion on hashing
From: Bill Zaumen @ 2011-12-03 21:21 UTC
To: Jeff King; +Cc: git, pclouds

On Sat, 2011-12-03 at 10:08 -0500, Jeff King wrote:
> >
> > Suppose I make the digest pluggable, something I intended to do
> > eventually anyway? Then you just use the existing SHA-1 as an
> > object identifier and the new digest in a signature chain? What I
> > did was essentially to compute the new digest (using a CRC as the
> > trivial case) whenever an object's SHA-1 hash is computed, plus
> > using the new digest for low-cost collision checks.
>
> If you make the digest stronger (or pluggable) and include it in the
> actual objects themselves, then you have a start on (2).
>
> I'd drop all of the digest-exchange bits from the protocol, as the
> actual signatures are the real, trustable verification. I don't think
> you can drop the external storage of the digests, which is one of the
> ugliest bits. You'll be asking for the digests all the time to create
> new commit objects, so you need to have it at hand without rehashing.

The digest-exchange bits, including the tests and response to errors, are only 222 lines of new code, so it's really a minor part. The rest takes care of what you referred to as "one of the ugliest bits," so I think it is useful to have available: you can then try various ways of improving the authentication of commit objects without having to do a lot of initial work. I can make those changes, probably over the next couple of weeks or so (I have some other unrelated things to take care of), and then send a new set of patches.

> And I wouldn't get my hopes up that this will go into git any time soon.
> At this point, we're really guessing about how broken SHA-1 will be in
> the future, and how much we are going to want to care.
>
> Just my two cents.

Thanks for the discussion.
I might add that it is not just a question of how broken SHA-1 is. If an IT department is considering adopting Git as the company's revision control system and authentication is important to the company, an IT manager may not accept SHA-1 for authentication purposes, because NIST claims SHA-1 is not adequate for authentication in general, and explaining to upper management why NIST's statement is not applicable given the way SHA-1 is used in Git is much harder than saying, "Git follows the current best practices regarding authentication." That statement is a simple checklist item one can show upper management when comparing alternatives.

Such issues (making technical choices for non-technical reasons) have come up before. I once worked on a high-speed (for the time) networking project, and our manager mentioned that transferring medical records such as X-ray pictures was one application. They do not accept lossy data compression because, even if it is completely adequate, in a malpractice suit the plaintiff's lawyer would say, "And they purposely threw away data critical to my client's health," which would sound pretty damning to a typical jury. The legal risk outweighed the cost of the additional bandwidth.