git.vger.kernel.org archive mirror
From: Junio C Hamano <gitster@pobox.com>
To: ZheNing Hu <adlternative@gmail.com>
Cc: Jeff King <peff@peff.net>, Taylor Blau <me@ttaylorr.com>,
	Git List <git@vger.kernel.org>,
	johncai86@gmail.com,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [Question] Can git cat-file have a type filtering option?
Date: Fri, 14 Apr 2023 08:58:39 -0700
Message-ID: <xmqqh6titpzk.fsf@gitster.g>
In-Reply-To: <CAOLTT8SEeY1tfU39xHPJ21F7o3dmgEFwNCny=Z2F4Y2HFR3DzA@mail.gmail.com> (ZheNing Hu's message of "Fri, 14 Apr 2023 20:17:34 +0800")

ZheNing Hu <adlternative@gmail.com> writes:

> Oh, you are right, this could be to prevent conflicts between Git objects
> with identical content but different types. However, I always associate
> Git with the file system, where metadata such as file type and size is
> stored in the inode, while the file data is stored in separate chunks.

I am afraid the presentation order Peff used caused a bit of
confusion.  The true reason is what Peff brought up as "Or worse".
We need to be able to tell, given only the name of an object,
everything that we need to know about the object, and for that, we
need the type information when we ask for an object by its name.
Having the size embedded in the data that comes back to us when we
consult the object database with an object name helps the
implementation pre-allocate a buffer and then inflate into it, but
that is merely a convenience; there is no fundamental reason why it
should be there.
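To make the "pre-allocate, then inflate" convenience concrete, here is a minimal Python sketch. It assumes the loose-object layout, where the stored bytes are the zlib-deflated form of "<type> <size>\0<content>". The helper names `make_loose_object` and `read_loose_object` are hypothetical; this is not Git's actual C code path, only an illustration of why having the size up front is handy.

```python
import zlib


def make_loose_object(obj_type: str, content: bytes) -> bytes:
    """Deflate "<type> <size>\\0<content>", the loose-object layout."""
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\0"
    return zlib.compress(header + content)


def read_loose_object(data: bytes) -> tuple[str, bytes]:
    """Learn type and size from the header, then fill a pre-sized buffer."""
    d = zlib.decompressobj()
    # Inflate only a small prefix; that is enough to reach the NUL that
    # ends "<type> <decimal size>" for any realistic object size.
    head = d.decompress(data, 64)
    nul = head.index(b"\0")
    obj_type, size_str = head[:nul].decode("ascii").split(" ")
    size = int(size_str)

    # Allocate exactly `size` bytes up front, because the header says how
    # much is coming. (Python's zlib cannot inflate in place, so this
    # sketch copies the inflated chunks into the buffer instead.)
    buf = bytearray(size)
    pos = 0
    rest = d.decompress(d.unconsumed_tail) + d.flush()
    for chunk in (head[nul + 1:], rest):
        if pos + len(chunk) > size:
            raise ValueError("object is longer than its header claims")
        buf[pos:pos + len(chunk)] = chunk
        pos += len(chunk)
    if pos != size:
        raise ValueError("object is shorter than its header claims")
    return obj_type, bytes(buf)


obj = make_loose_object("blob", b"hello\n")
print(read_loose_object(obj))  # ('blob', b'hello\n')
```

Nothing here requires the size to live in the object; a reader could just as well grow its buffer as it inflates. The header only saves that bookkeeping.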

That the object type recorded in a tree entry may contradict the
actual type of the object that entry names is a secondary problem,
created by the design choice to store the type together with the
contents.  We could have declared that the object type found in a
tree entry is to be trusted, if we didn't record the type in the
object database together with the object contents.
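A tree entry does not spell out a type; it records a mode from which the type is implied. A minimal sketch of the consistency check this creates, assuming `actual_type` comes from the object's own stored header. The table and the function `entry_matches_object` are hypothetical illustrations, not a Git API.

```python
# Hypothetical table: the object type that a tree entry's mode implies.
MODE_TO_TYPE = {
    "40000": "tree",     # subdirectory (no leading zero inside tree objects)
    "100644": "blob",    # regular file
    "100755": "blob",    # executable file
    "120000": "blob",    # symbolic link
    "160000": "commit",  # gitlink (submodule)
}


def entry_matches_object(mode: str, actual_type: str) -> bool:
    """True if the type implied by `mode` agrees with the stored type."""
    expected = MODE_TO_TYPE.get(mode)
    if expected is None:
        raise ValueError(f"unknown tree entry mode {mode!r}")
    return expected == actual_type


# An entry claiming a regular file, while the object header says "tree":
print(entry_matches_object("100644", "tree"))  # False
```

If the type were not stored with the object, there would be nothing to compare against, and the mode in the tree entry would simply be the answer.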

I think your original question was not "why do we store type and
size together with the contents?", but "why do we include them in the
hash computation?", and all of the above discusses a related tangent
without touching the original question.

The need to have the type or size available when we ask the object
database for data associated with an object does not necessarily
mean they must be hashed together with the contents.  It was done
merely because "why not?  That way, we do not have to worry about
catching corrupt values for the type and size information we want to
store together with the contents".  IOW, we could have checksummed
these two pieces of information separately, but why bother?
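For reference, a minimal Python sketch of what "hashed together with the contents" means: the object name is the SHA-1 (the default object format) of the header "<type> <size>\0" followed by the contents. The function name `object_id` is hypothetical; the expected value for "hello\n" matches what `git hash-object` reports for that blob.

```python
import hashlib


def object_id(obj_type: str, content: bytes) -> str:
    """SHA-1 over "<type> <size>\\0" followed by the content."""
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


data = b"hello\n"
print(object_id("blob", data))  # ce013625030ba8dba906f756967f9e9ca394464a

# Because type and size are part of the hashed bytes, a corrupted type or
# size changes the name and is caught by the same check as the contents,
# and the same bytes stored under another type get a different name.
print(object_id("blob", data) != object_id("tree", data))  # True
```

Hashing only the contents and protecting the header with its own checksum would also work; folding it into one hash just means there is a single thing to verify.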


Thread overview: 23+ messages
2023-04-07 14:24 [Question] Can git cat-file have a type filtering option? ZheNing Hu
2023-04-07 16:30 ` Junio C Hamano
2023-04-08  6:27   ` ZheNing Hu
2023-04-09  1:28     ` Taylor Blau
2023-04-09  2:19       ` Taylor Blau
2023-04-09  2:26         ` Taylor Blau
2023-04-09  6:51           ` ZheNing Hu
2023-04-10 20:01             ` Jeff King
2023-04-10 23:20               ` Taylor Blau
2023-04-09  6:47       ` ZheNing Hu
2023-04-10 20:14         ` Jeff King
2023-04-11 14:09           ` ZheNing Hu
2023-04-12  7:43             ` Jeff King
2023-04-12  9:57               ` ZheNing Hu
2023-04-14  7:30                 ` Jeff King
2023-04-14 12:17                   ` ZheNing Hu
2023-04-14 15:58                     ` Junio C Hamano [this message]
2023-04-16 11:15                       ` ZheNing Hu
2023-04-14 17:04                     ` Linus Torvalds
2023-04-16 12:06                       ` Felipe Contreras
2023-04-16 12:43                       ` ZheNing Hu
2023-04-09  1:26   ` Taylor Blau
2023-04-09  1:23 ` Taylor Blau
