git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Shawn O. Pearce" <spearce@spearce.org>
To: David Srbecky <dsrbecky@gmail.com>
Cc: git@vger.kernel.org
Subject: Re: Why is the name of a blob SHA1("$type $size\0$data") and not SHA1("$data")?
Date: Thu, 30 Apr 2009 13:02:17 -0700	[thread overview]
Message-ID: <20090430200217.GU23604@spearce.org> (raw)
In-Reply-To: <49FA0214.70009@gmail.com>

David Srbecky <dsrbecky@gmail.com> wrote:
>
> I started digging into the details and there is one thing that is really  
> bugging me - why is the name of a blob SHA1("$type $size\0$data") and  
> not SHA1("$data")?  I mean, wouldn't it be beautiful if the name of the  
> blob would really just be the SHA1 of the uncompressed file content? :-)

Well, a commit is stored in the same namespace as a blob (file
content).  So the type being included in the SHA1 computation helps
to break them apart and say "this is really a commit" vs. "this
is a file that just happens to have the same content as a commit".
It does help consistency checkers like `git fsck` to know that the
object is used in the right context.

I can't guess what Linus had in mind when he wrote Git, but I would
wager it was something along the lines that storing everything in
a single directory structure was simpler/more elegant than having
a different directory structure per object type.  Today I would
probably have made the same design decision, but I'm biased by
Git already so who knows if I'm just mimicing Linus' brilliance or
would have arrived at the same result myself.

Including the length is overkill, yes, but its in the header of the
data so that git can immediately allocate a properly sized memory
buffer before it inflates the rest of the object content.  Its a
performance improvement.  Its probably a historical accident that
it got included in the SHA1 computation, as notice its position
between the type and the data... it likely was just easier to
include it in the SHA1 than to exclude it.

> I would really appriciate some comments on the design decisions so that  
> I can sleep well at night :-)

Then I won't mention pack files... which aren't as simple to read
as just inflating a file on disk.  :-)

-- 
Shawn.

  reply	other threads:[~2009-04-30 20:02 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-04-30 19:55 Why is the name of a blob SHA1("$type $size\0$data") and not SHA1("$data")? David Srbecky
2009-04-30 20:02 ` Shawn O. Pearce [this message]
2009-04-30 22:57 ` Björn Steinbrink

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20090430200217.GU23604@spearce.org \
    --to=spearce@spearce.org \
    --cc=dsrbecky@gmail.com \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).