git.vger.kernel.org archive mirror
From: Bill Zaumen <bill.zaumen@gmail.com>
To: "Chris West (Faux)" <faux@goeswhere.com>
Cc: Nguyen Thai Ngoc Duy <pclouds@gmail.com>,
	Jeff King <peff@peff.net>, Git Mailing List <git@vger.kernel.org>
Subject: Re: Suggestion on hashing
Date: Mon, 05 Dec 2011 19:47:03 -0800
Message-ID: <1323143223.1745.67.camel@yos>
In-Reply-To: <alpine.DEB.2.00.1112060146121.15104@hoki.goeswhere.com>

When I went through the code, I noted that SHA-1 hashes are
currently used for the following:

   * object IDs
   * authentication (something to sign using public-key encryption)
   * data integrity (basically a really good checksum).

While there are a lot of 20-byte arrays of unsigned char, many of those
are associated with lookups.  You might want to look at the
number of places where git_SHA1_Init is called (there aren't all that
many, and that function marks the points where SHA-1
hashes are being created).
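
One rough way to get that count is to scan the source tree for the call.
The sketch below is not part of Git; the function name `count_calls` and
the choice to scan only `.c` and `.h` files are assumptions made for
illustration.

```python
import os
import re

def count_calls(root, name="git_SHA1_Init"):
    """Return {path: count} for files under root that call `name`."""
    # Match the identifier followed by an opening parenthesis, so that
    # longer identifiers merely containing the name are not counted.
    pattern = re.compile(r"\b" + re.escape(name) + r"\s*\(")
    hits = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if not filename.endswith((".c", ".h")):
                continue
            path = os.path.join(dirpath, filename)
            with open(path, encoding="utf-8", errors="replace") as f:
                n = len(pattern.findall(f.read()))
            if n:
                hits[path] = n
    return hits
```

Calling `count_calls(".")` from the top of a source checkout would list the
files that contain at least one call, with a count for each.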

While a few things I tried were complete false starts (I kept those
out of the preliminary patches I sent), I managed to store
a CRC (which you can treat as a place-holder for a real message
digest) for each SHA-1 hash in a pack file.  I did it by
creating a separate file (extension ".mds"), and that worked.
I looked into modifying pack files, but that was too messy given
that you'd want older versions to still work with newer remote
repositories.  The other factor is that the "mds" files are
computed locally, at the same time that you create an "idx" file.
The formats of the "pack" and "idx" files don't change.
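
The side-file idea can be sketched as follows.  The actual ".mds" layout
is not described in this message, so the record format below (a 20-byte
SHA-1 followed by a 4-byte big-endian CRC-32 of the object contents,
sorted by SHA-1) and the names `build_mds` and `lookup_mds` are purely
hypothetical.  The point is only that the extra digests live next to the
pack without touching the "pack" or "idx" formats.

```python
import hashlib
import struct
import zlib

RECORD = struct.Struct(">20sI")  # hypothetical: SHA-1 bytes, then CRC-32

def build_mds(objects):
    """Build side-file bytes from an iterable of raw object contents."""
    records = []
    for data in objects:
        # Stand-in for the object ID.  Real Git hashes a type/size header
        # plus the content; that header is omitted in this sketch.
        sha1 = hashlib.sha1(data).digest()
        crc = zlib.crc32(data) & 0xFFFFFFFF  # place-holder second digest
        records.append((sha1, crc))
    records.sort()  # sorted by SHA-1 so lookups can binary-search
    return b"".join(RECORD.pack(s, c) for s, c in records)

def lookup_mds(mds, sha1):
    """Binary-search the side file; return the CRC or None if absent."""
    lo, hi = 0, len(mds) // RECORD.size
    while lo < hi:
        mid = (lo + hi) // 2
        key, crc = RECORD.unpack_from(mds, mid * RECORD.size)
        if key == sha1:
            return crc
        if key < sha1:
            lo = mid + 1
        else:
            hi = mid
    return None
```

Because the side file is derived entirely from local data, it could be
regenerated at any time and never needs to be transferred.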

I've just started on replacing the CRC I used with real message
digests, making new digests easy to add. The plan is to initially
make it work with both a CRC and SHA-1: the CRC so I can test it
easily by comparing new and old versions to show that nothing
changed when it shouldn't have, and SHA-1 because Git already
implements it.
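
A minimal sketch of what "easy to add" could look like, assuming a simple
name-to-function table.  The names `DIGESTS`, `register_digest` and
`compute_digest` are hypothetical and do not come from the patches.

```python
import hashlib
import zlib

DIGESTS = {}  # hypothetical registry: name -> function(bytes) -> bytes

def register_digest(name, func):
    """Make a digest available under the given name."""
    DIGESTS[name] = func

def compute_digest(name, data):
    """Compute the named digest of data and return it as raw bytes."""
    return DIGESTS[name](data)

# The two digests mentioned above: a CRC for testing, and SHA-1.
register_digest("crc32", lambda data: zlib.crc32(data).to_bytes(4, "big"))
register_digest("sha1", lambda data: hashlib.sha1(data).digest())
```

With this shape, adding another digest is a single call, for example
`register_digest("sha256", lambda d: hashlib.sha256(d).digest())`.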

I should complete my changes.  If we are lucky, maybe the changes I'm
trying would solve some of the problems you mentioned with pack files.
At least I can store the digests in a way that doesn't break the log
and fsck operations (it went through all the test suites, with only
minor modifications for things like counting the number of files in
particular directories).

If you make changes to commit objects, fixing the test scripts is a
pain: there are a number of places where SHA-1 values are hard-coded,
and those have to be replaced.
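
To get a feel for how many such places there are, one could search the
test scripts for strings that look like a full SHA-1 in hex.  This is
only a sketch: the function name `find_hardcoded_sha1` is hypothetical,
and the pattern assumes the values are written as 40 hex digits, so
abbreviated hashes would be missed.

```python
import re

# 40 hex digits that are not part of a longer run of hex digits.
SHA1_HEX = re.compile(r"(?<![0-9a-fA-F])[0-9a-fA-F]{40}(?![0-9a-fA-F])")

def find_hardcoded_sha1(text):
    """Return (line_number, value) pairs for 40-hex-digit strings."""
    found = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in SHA1_HEX.finditer(line):
            found.append((lineno, match.group(0)))
    return found
```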

Bill

On Tue, 2011-12-06 at 01:56 +0000, Chris West (Faux) wrote:
> Nguyen Thai Ngoc Duy wrote:
> > SHA-1 characteristics (like 20 byte length) are hard coded everywhere
> > in git, it'd be a big audit.
> 
> I was planning to look at this anyway.  My branch[1] allows
>   init/add/commit with SHA-256, SHA-512 and all the SHA-3 candidates.
> 
> log/fsck/etc. are all broken.  Don't even dare try packs.  Fixing things
>   is painful but not impossible.  I'm not convinced the task is even
>   remotely insurmountable.
> 
> (This is not a request-for-comments, just an informational notification.
>   It does not even attempt to address compatibility or the like.)
