From: Felipe Contreras <felipe.contreras@gmail.com>
To: Linus Torvalds <torvalds@linux-foundation.org>,
ZheNing Hu <adlternative@gmail.com>
Cc: Jeff King <peff@peff.net>, Taylor Blau <me@ttaylorr.com>,
Junio C Hamano <gitster@pobox.com>,
Git List <git@vger.kernel.org>,
johncai86@gmail.com
Subject: Re: [Question] Can git cat-file have a type filtering option?
Date: Sun, 16 Apr 2023 06:06:02 -0600
Message-ID: <643be4aa2bdf9_f43d294d9@chronos.notmuch>
In-Reply-To: <CAHk-=wjr-CMLX2Jo2++rwcv0VNr+HmZqXEVXNsJGiPRUwNxzBQ@mail.gmail.com>
Linus Torvalds wrote:
> On Fri, Apr 14, 2023 at 5:17 AM ZheNing Hu <adlternative@gmail.com> wrote:
> >
> > Jeff King <peff@peff.net> wrote on Fri, Apr 14, 2023 at 15:30:
> > >
> > > On Wed, Apr 12, 2023 at 05:57:02PM +0800, ZheNing Hu wrote:
> > > >
> > > > I'm still puzzled why git calculated the object id based on {type, size, data}
> > > > together instead of just {data}?
> > >
> > > You'd have to ask Linus for the original reasoning. ;)
>
> I originally thought of the git object store as "tagged pointers".
>
> That actually caused confusion initially when I tried to explain this
> to SCM people, because "tag" means something very different in an SCM
> environment than it means in computer architecture.
>
> And the implication of a tagged pointer is that you have two parts of
> it - the "tag" and the "address". Both are relevant at all points.
>
> This isn't quite as obvious in everyday modern git usage, because a lot
> of uses end up _only_ using the "address" (aka SHA1), but it's very
> much part of the object store design. Internally, the object layout
> never uses just the SHA1, it's all "type:SHA1", even if sometimes the
> types are implied (ie the tree object doesn't spell out "blob", but
> it's still explicit in the mode bits).
>
> This is very very obvious in "git cat-file", which was one of the
> original scripts in the first commit (but even there the tag/type has
> changed meaning over time: the very first version didn't use it as
> input at all, then it started verifying it, and then later it got the
> more subtle context of "peel the tags until you find this type").
>
> You can also see this in the original README (again, go look at that
> first git commit): the README talks about the "tag of their type".
>
> Of course, in practice git then walked away from having to specify the
> type all the time. It started even in that original release, in that
> the HEAD file never contained the type - because it was implicit (a
> HEAD is always a commit).
>
> So we ended up having a lot of situations like that where the actual
> tag part was implicit from context, and these days people basically
> never refer to the "full" object name with tag, but only the SHA1
> address.
>
> So now we have situations where the type really has to be looked up
> dynamically, because it's not explicitly encoded anywhere. While HEAD
> is supposed to always be a commit, other refs can be pretty much
> anything, and can point to a tag object, a commit, a tree or a blob.
> So then you actually have to look up the type based on the address.
>
> End result: these days people don't even think of git objects as
> "tagged pointers". Even internally in git, lots of code just passes
> the "object name" along without any tag/type, just the raw SHA1 / OID.
>
> So that originally "everything is a tagged pointer" is much less true
> than it used to be, and now, instead of having tagged pointers, you
> mostly end up with just "bare pointers" and look up the type
> dynamically from there.
>
> And that "look up the type in the object" is possible because even
> originally, I did *not* want any kind of "object type aliasing".
>
> So even when looking up the object with the full "tag:pointer", the
> encoding of the object itself then also contains that object type, so
> that you can cross-check that you used the right tag.
>
> That said, you *can* see some of the effects of this "tagged pointers"
> in how the internals do things like
>
> struct commit *commit = lookup_commit(repo, &oid);
>
> which conceptually very much is about tagged pointers. And the fact
> that two objects cannot alias is actually somewhat encoded in that: a
> "struct commit" contains a "struct object" as a member. But so does
> "struct blob" - and the two "struct object" cases are never the same
> "object".
>
> So there's never any worry about "could blob.object be the same object
> as commit.object"?
>
> That is actually inherent in the code, in how "lookup_commit()"
> actually does lookup_object() and then does object_as_type(OBJ_COMMIT)
> on the result.
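A minimal sketch of the lookup-then-check pattern described above, using a hypothetical and heavily simplified model. `object_as_type_sketch()` and `lookup_commit_sketch()` are illustrative names, not git's real functions (git's actual `object_as_type()` takes additional arguments and does more work), and the `OBJ_*` values here are an assumption for the sketch.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified model; NOT git's real definitions. */
enum object_type { OBJ_NONE = 0, OBJ_COMMIT = 1, OBJ_TREE = 2, OBJ_BLOB = 3, OBJ_TAG = 4 };

struct object {
	enum object_type type;	/* the "tag" half of the tagged pointer */
	unsigned char oid[20];	/* the "address" half (a SHA-1) */
};

/* Each typed struct embeds its own struct object as the first member. */
struct commit { struct object object; /* commit-specific fields elided */ };
struct blob   { struct object object; /* blob-specific fields elided */ };

/*
 * Cross-check the type the caller asked for against the type recorded
 * in the object itself; a mismatch yields NULL instead of aliasing.
 */
static void *object_as_type_sketch(struct object *obj, enum object_type want)
{
	if (!obj)
		return NULL;
	if (obj->type == want)
		return obj;
	fprintf(stderr, "object type %d is not the requested type %d\n",
		(int)obj->type, (int)want);
	return NULL;
}

static struct commit *lookup_commit_sketch(struct object *obj)
{
	/*
	 * struct object is the first member of struct commit, so a pointer
	 * to it converts back to a pointer to the enclosing struct commit.
	 */
	return object_as_type_sketch(obj, OBJ_COMMIT);
}
```

Given a `struct commit c` whose embedded object is typed `OBJ_COMMIT`, `lookup_commit_sketch(&c.object)` returns `&c`; passing the embedded object of a `struct blob` returns NULL and prints a diagnostic, so a blob's `struct object` is never handed out as a commit.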
This explains rather well why the object type is used in the calculation, and
it makes sense.
But I don't see anything about the object size. Isn't that unnecessary?
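For reference on the {type, size, data} triple the thread started from: git hashes a header of the form `<type> <decimal size>` plus a NUL byte, followed by the raw content. A minimal sketch of building that byte string in C; `build_object_buffer()` is a hypothetical name, and the hashing step itself (SHA-1 in a default repository) is left out.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: builds "<type> <size>\0<data>", the byte string
 * whose hash is the object id. Returns a malloc'd buffer (the caller
 * frees it) and stores its length in *out_len, or returns NULL on error.
 */
static unsigned char *build_object_buffer(const char *type,
					  const unsigned char *data, size_t size,
					  size_t *out_len)
{
	char header[64];
	/* snprintf's return value excludes the terminating NUL. */
	int hdrlen = snprintf(header, sizeof(header), "%s %zu", type, size);
	if (hdrlen < 0 || (size_t)hdrlen >= sizeof(header))
		return NULL;

	/* header + its NUL separator + content */
	size_t total = (size_t)hdrlen + 1 + size;
	unsigned char *buf = malloc(total);
	if (!buf)
		return NULL;

	memcpy(buf, header, (size_t)hdrlen + 1);	/* copies the NUL too */
	memcpy(buf + hdrlen + 1, data, size);
	*out_len = total;
	return buf;
}
```

For the six-byte content `hello\n` of type `blob`, the buffer is 13 bytes: `blob 6`, a NUL, then `hello\n`. Hashing those bytes with SHA-1 should give the same id that `git hash-object --stdin` reports for that content.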
--
Felipe Contreras