From: Felipe Contreras <felipe.contreras@gmail.com>
To: Linus Torvalds <torvalds@linux-foundation.org>,
	ZheNing Hu <adlternative@gmail.com>
Cc: Jeff King <peff@peff.net>, Taylor Blau <me@ttaylorr.com>,
	Junio C Hamano <gitster@pobox.com>,
	Git List <git@vger.kernel.org>,
	johncai86@gmail.com
Subject: Re: [Question] Can git cat-file have a type filtering option?
Date: Sun, 16 Apr 2023 06:06:02 -0600
Message-ID: <643be4aa2bdf9_f43d294d9@chronos.notmuch>
In-Reply-To: <CAHk-=wjr-CMLX2Jo2++rwcv0VNr+HmZqXEVXNsJGiPRUwNxzBQ@mail.gmail.com>

Linus Torvalds wrote:
> On Fri, Apr 14, 2023 at 5:17 AM ZheNing Hu <adlternative@gmail.com> wrote:
> >
> > Jeff King <peff@peff.net> 于2023年4月14日周五 15:30写道:
> > >
> > > On Wed, Apr 12, 2023 at 05:57:02PM +0800, ZheNing Hu wrote:
> > > >
> > > > I'm still puzzled why git calculated the object id based on {type, size, data}
> > > >  together instead of just {data}?
> > >
> > > You'd have to ask Linus for the original reasoning. ;)
> 
> I originally thought of the git object store as "tagged pointers".
> 
> That actually caused confusion initially when I tried to explain this
> to SCM people, because "tag" means something very different in an SCM
> environment than it means in computer architecture.
> 
> And the implication of a tagged pointer is that you have two parts of
> it - the "tag" and the "address". Both are relevant at all points.
> 
> This isn't quite as obvious in everyday modern git usage, because a lot
> of uses end up _only_ using the "address" (aka SHA1), but it's very
> much part of the object store design. Internally, the object layout
> never uses just the SHA1, it's all "type:SHA1", even if sometimes the
> types are implied (ie the tree object doesn't spell out "blob", but
> it's still explicit in the mode bits).
> 
> This is very very obvious in "git cat-file", which was one of the
> original scripts in the first commit (but even there the tag/type has
> changed meaning over time: the very first version didn't use it as
> input at all, then it started verifying it, and then later it got the
> more subtle context of "peel the tags until you find this type").
> 
> You can also see this in the original README (again, go look at that
> first git commit): the README talks about the "tag of their type".
> 
> Of course, in practice git then walked away from having to specify the
> type all the time. It started even in that original release, in that
> the HEAD file never contained the type - because it was implicit (a
> HEAD is always a commit).
> 
> So we ended up having a lot of situations like that where the actual
> tag part was implicit from context, and these days people basically
> never refer to the "full" object name with tag, but only the SHA1
> address.
> 
> So now we have situations where the type really has to be looked up
> dynamically, because it's not explicitly encoded anywhere. While HEAD
> is supposed to always be a commit, other refs can be pretty much
> anything, and can point to a tag object, a commit, a tree or a blob.
> So then you actually have to look up the type based on the address.
> 
> End result: these days people don't even think of git objects as
> "tagged pointers".  Even internally in git, lots of code just passes
> the "object name" along without any tag/type, just the raw SHA1 / OID.
> 
> So that original "everything is a tagged pointer" is much less true
> than it used to be, and now, instead of having tagged pointers, you
> mostly end up with just "bare pointers" and look up the type
> dynamically from there.
> 
> And that "look up the type in the object" is possible because even
> originally, I did *not* want any kind of "object type aliasing".
> 
> So even when looking up the object with the full "tag:pointer", the
> encoding of the object itself then also contains that object type, so
> that you can cross-check that you used the right tag.
> 
> That said, you *can* see some of the effects of this "tagged pointers"
> in how the internals do things like
> 
>     struct commit *commit = lookup_commit(repo, &oid);
> 
> which conceptually very much is about tagged pointers. And the fact
> that two objects cannot alias is actually somewhat encoded in that: a
> "struct commit" contains a "struct object" as a member. But so does
> "struct blob" - and the two "struct object" cases are never the same
> "object".
> 
> So there's never any worry about "could blob.object be the same object
> as commit.object"?
> 
> That is actually inherent in the code, in how "lookup_commit()"
> actually does lookup_object() and then does object_as_type(OBJ_COMMIT)
> on the result.

This explains rather well why the object type is used in the calculation, and
it makes sense.
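
To make the "tagged pointer" point concrete, here is a hypothetical sketch of the no-aliasing rule described above. The names (lookup_object_sketch, as_type_sketch, and so on) are illustrative only and are not git's actual lookup_object()/object_as_type() code. Each in-memory record carries its own type, so once a name has been claimed as a commit, asking for the same name as a blob fails instead of aliasing.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified model; not git's real structs or functions. */
enum obj_type { OBJ_NONE, OBJ_COMMIT, OBJ_TREE, OBJ_BLOB, OBJ_TAG };

struct object {
    char name[41];          /* hex object name: the "address" */
    enum obj_type type;     /* the "tag", stored with the object itself */
};

#define MAX_OBJECTS 16
static struct object table[MAX_OBJECTS];
static int nr_objects;

/* Find or create the single in-memory record for a name. */
static struct object *lookup_object_sketch(const char *name)
{
    for (int i = 0; i < nr_objects; i++)
        if (!strcmp(table[i].name, name))
            return &table[i];
    if (nr_objects == MAX_OBJECTS)
        return NULL;
    struct object *obj = &table[nr_objects++];
    snprintf(obj->name, sizeof(obj->name), "%s", name);
    obj->type = OBJ_NONE;   /* type not known yet */
    return obj;
}

/* Claim the record as `type`; refuse if it is already something else,
 * so one name can never be both a commit and a blob. */
static struct object *as_type_sketch(struct object *obj, enum obj_type type)
{
    if (!obj)
        return NULL;
    if (obj->type == OBJ_NONE)
        obj->type = type;
    return obj->type == type ? obj : NULL;
}

static struct object *lookup_commit_sketch(const char *name)
{
    return as_type_sketch(lookup_object_sketch(name), OBJ_COMMIT);
}

static struct object *lookup_blob_sketch(const char *name)
{
    return as_type_sketch(lookup_object_sketch(name), OBJ_BLOB);
}
```

Looking up the same name twice as a commit returns the same record, while a later lookup of that name as a blob returns NULL.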

But I don't see anything about the object size. Isn't that unnecessary?
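
For reference, here is where the size enters the object name. For a loose object, git hashes a header of the form "<type> <size>\0" followed by the raw content, and that is what `git hash-object` reports. The sketch below assumes the SHA-1 object format. Its SHA-1 is a compact textbook version written only for this demo, and git_object_id is a hypothetical helper name, not a git function.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Compact textbook SHA-1, for illustration only. */
static uint32_t rol(uint32_t x, int n) { return (x << n) | (x >> (32 - n)); }

static void sha1_block(uint32_t h[5], const unsigned char *p)
{
    uint32_t w[80], a = h[0], b = h[1], c = h[2], d = h[3], e = h[4];
    for (int i = 0; i < 16; i++)
        w[i] = (uint32_t)p[4*i] << 24 | (uint32_t)p[4*i+1] << 16 |
               (uint32_t)p[4*i+2] << 8 | p[4*i+3];
    for (int i = 16; i < 80; i++)
        w[i] = rol(w[i-3] ^ w[i-8] ^ w[i-14] ^ w[i-16], 1);
    for (int i = 0; i < 80; i++) {
        uint32_t f, k;
        if (i < 20)      { f = (b & c) | (~b & d);         k = 0x5A827999; }
        else if (i < 40) { f = b ^ c ^ d;                   k = 0x6ED9EBA1; }
        else if (i < 60) { f = (b & c) | (b & d) | (c & d); k = 0x8F1BBCDC; }
        else             { f = b ^ c ^ d;                   k = 0xCA62C1D6; }
        uint32_t t = rol(a, 5) + f + e + k + w[i];
        e = d; d = c; c = rol(b, 30); b = a; a = t;
    }
    h[0] += a; h[1] += b; h[2] += c; h[3] += d; h[4] += e;
}

static void sha1(const unsigned char *msg, size_t len, unsigned char out[20])
{
    uint32_t h[5] = {0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0};
    unsigned char block[64];
    size_t i = 0;
    for (; i + 64 <= len; i += 64)
        sha1_block(h, msg + i);
    size_t rem = len - i;
    memset(block, 0, sizeof(block));
    memcpy(block, msg + i, rem);
    block[rem] = 0x80;                      /* padding marker */
    if (rem >= 56) {                        /* no room for length: extra block */
        sha1_block(h, block);
        memset(block, 0, sizeof(block));
    }
    uint64_t bits = (uint64_t)len * 8;      /* message length, big-endian */
    for (int j = 0; j < 8; j++)
        block[63 - j] = (unsigned char)(bits >> (8 * j));
    sha1_block(h, block);
    for (int j = 0; j < 20; j++)
        out[j] = (unsigned char)(h[j / 4] >> (24 - 8 * (j % 4)));
}

/* Hypothetical helper: name an object as SHA-1("<type> <size>\0" + data). */
static int git_object_id(const char *type, const unsigned char *data,
                         size_t len, char hex[41])
{
    char hdr[64];
    int hdrlen = snprintf(hdr, sizeof(hdr), "%s %zu", type, len) + 1; /* keep NUL */
    unsigned char digest[20];
    unsigned char *buf = malloc(hdrlen + len);
    if (!buf)
        return -1;
    memcpy(buf, hdr, hdrlen);               /* type and size go into the hash */
    memcpy(buf + hdrlen, data, len);
    sha1(buf, hdrlen + len, digest);
    free(buf);
    for (int i = 0; i < 20; i++)
        sprintf(hex + 2 * i, "%02x", digest[i]);
    return 0;
}
```

With this, the blob "hello\n" gets the name ce013625030ba8dba906f756967f9e9ca394464a, the same one `echo hello | git hash-object --stdin` prints. Hashing only the data would give a different value.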

-- 
Felipe Contreras


Thread overview: 23+ messages
2023-04-07 14:24 [Question] Can git cat-file have a type filtering option? ZheNing Hu
2023-04-07 16:30 ` Junio C Hamano
2023-04-08  6:27   ` ZheNing Hu
2023-04-09  1:28     ` Taylor Blau
2023-04-09  2:19       ` Taylor Blau
2023-04-09  2:26         ` Taylor Blau
2023-04-09  6:51           ` ZheNing Hu
2023-04-10 20:01             ` Jeff King
2023-04-10 23:20               ` Taylor Blau
2023-04-09  6:47       ` ZheNing Hu
2023-04-10 20:14         ` Jeff King
2023-04-11 14:09           ` ZheNing Hu
2023-04-12  7:43             ` Jeff King
2023-04-12  9:57               ` ZheNing Hu
2023-04-14  7:30                 ` Jeff King
2023-04-14 12:17                   ` ZheNing Hu
2023-04-14 15:58                     ` Junio C Hamano
2023-04-16 11:15                       ` ZheNing Hu
2023-04-14 17:04                     ` Linus Torvalds
2023-04-16 12:06                       ` Felipe Contreras [this message]
2023-04-16 12:43                       ` ZheNing Hu
2023-04-09  1:26   ` Taylor Blau
2023-04-09  1:23 ` Taylor Blau
