From: Junio C Hamano <gitster@pobox.com>
To: "Julia Evans" <julia@jvns.ca>
Cc: "Julia Evans" <gitgitgadget@gmail.com>,
git@vger.kernel.org,
"Kristoffer Haugsbakk" <kristofferhaugsbakk@fastmail.com>,
"D. Ben Knoble" <ben.knoble@gmail.com>,
"Patrick Steinhardt" <ps@pks.im>
Subject: Re: [PATCH v3] doc: add a explanation of Git's data model
Date: Thu, 16 Oct 2025 09:54:45 -0700
Message-ID: <xmqq347i948a.fsf@gitster.g>
In-Reply-To: <0eb276ef-7b1a-4e79-93da-13a83226aa01@app.fastmail.com> (Julia Evans's message of "Thu, 16 Oct 2025 11:19:46 -0400")
"Julia Evans" <julia@jvns.ca> writes:
>>> +[[tree]]
>>> +trees::
>>> + A tree is how Git represents a directory. It lists, for each item in
>>> + the tree:
>>> ++
>>> +[[file-mode]]
>>> +1. The *file mode*, for example `100644`. The format is inspired by Unix
>>> + permissions, but Git's modes are much more limited. Git only supports these file modes:
>>> ++
>>> + - `100644`: regular file (with type `blob`)
>>> + - `100755`: executable file (with type `blob`)
>>> + - `120000`: symbolic link (with type `blob`)
>>> + - `040000`: directory (with type `tree`)
>>> + - `160000`: gitlink, for use with submodules (with type `commit`)
>>
>> It is not really "supporting" file modes. Rather, Git only records
>> 5 kinds of entities associated with each path in a tree object, and
>> uses numbers that remotely resemble POSIX file modes to represent
>> these 5 kinds.
>>
>> Perhaps "supports" -> "uses"?
>
> "Uses" sounds good to me.
Also "much more limited" is misleading. We only represent 5 kinds
of things, so we use only 5 mode-bits-looking numbers.
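To make the "5 kinds, 5 numbers" point concrete, here is a minimal sketch that treats the mode as a tag selecting one of five entry kinds rather than as permission bits. The mode strings and types come from the list quoted above; the names `ENTRY_KINDS` and `entry_type` are made up for illustration and are not anything in Git's code.

```python
# The five entry kinds a tree records, keyed by the mode strings quoted above.
# ENTRY_KINDS and entry_type are hypothetical names used only in this sketch.
ENTRY_KINDS = {
    "100644": ("blob", "regular file"),
    "100755": ("blob", "executable file"),
    "120000": ("blob", "symbolic link"),
    "040000": ("tree", "directory"),
    "160000": ("commit", "gitlink (submodule)"),
}


def entry_type(mode: str) -> str:
    """Return the object type for a mode string; reject anything else."""
    try:
        return ENTRY_KINDS[mode][0]
    except KeyError:
        raise ValueError(f"not one of the five modes: {mode!r}") from None
```

Something like "100600" looks like a plausible POSIX mode but is simply not one of the five tags, so `entry_type` refuses it.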
>>> +2. The *type*: either <<blob,`blob`>> (a file), `tree` (a directory),
>>> + or <<commit,`commit`>> (a Git submodule, which is a
>>> + commit from a different Git repository)
>>> +3. The <<object-id,*object ID*>>
>>> +4. The *filename*
>>
>> Here it may be worth noting that this "filename" is a single
>> pathname component (roughly, what you would see in non-recursive
>> "ls"). In other words, it may be a directory name.
Comments?
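A rough sketch of what "single pathname component" means in practice, using nested dictionaries as a stand-in for tree objects. Everything here (the `root` layout, the placeholder IDs, the `lookup` helper) is hypothetical and only illustrates that a path like "src/main.c" never appears as one entry.

```python
# Hypothetical in-memory stand-in for tree objects: each tree maps a single
# path component to (mode, child). A child is either another tree (a dict)
# or a placeholder string standing in for a blob's object ID.
root = {
    "README": ("100644", "blob-id-1"),
    "src": ("040000", {
        "main.c": ("100644", "blob-id-2"),
    }),
}


def lookup(tree, path):
    """Resolve a path by walking one component per tree level."""
    mode, node = "040000", tree
    for component in path.split("/"):
        if not isinstance(node, dict):
            # Tried to descend into something that is not a tree.
            raise KeyError(path)
        mode, node = node[component]
    return mode, node
```

Here `lookup(root, "src/main.c")` has to go through the "src" entry first; the top-level tree only knows the names "README" and "src".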
>>> +[[blob]]
>>> +blobs::
>>> + A blob is how Git represents a file. A blob object contains the
>>> + file's contents.
>>
>> "represents a file" hints as if the thing may know its name, but
>> that is not the case (its name is given only by surrounding tree).
>>
>> "A blob is how Git represents an uninterpreted series of bytes, and
>> is most commonly used to store a file's contents." or something, perhaps?
>
> I'll say "A blob is how Git represents a file's contents", unless Git has
> another use for blobs that I don't know about (I think it's not
> that much of a stretch to say that a symbolic link is a special kind
> of file where the "contents" are the link destination).
A few configuration variables like mailmap.blob name a blob object,
for which _only_ its contents, i.e., the sequence of bytes, matter
and where they originally were stored does not matter.
But we are falling into the area of tautology, as any sequence of
bytes can be stored in a file so they can be called "contents of a
file". But the point is that these bytes do not have to be stored
in a file to become a blob (think: "git hash-object -t blob -w --stdin").
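The "only the bytes matter" point can be sketched without any file at all. This assumes a SHA-1 repository, where a blob's object name is the SHA-1 of a short "blob <size>" header, a NUL byte, and the raw bytes; the function name `blob_id` is made up for this sketch.

```python
import hashlib


def blob_id(data: bytes) -> str:
    """Compute the object name a SHA-1 repository gives a blob with these bytes.

    No path, mode, or filename is involved: only the byte sequence.
    """
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


# The empty byte sequence is a perfectly good blob:
# blob_id(b"") == "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"
```

Two files with different names and modes but identical contents would map to the same blob, because nothing but `data` goes into the hash.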
> I think it's always clearer to be more specific when possible, if there's only
> one purpose for blobs it's unnecessary (and IMO a bit misleading, because
> it makes the reader wonder if there are other purposes that they should
> know about) to say that blobs can be used to store any arbitrary bytes for
> any purpose.
I do not think describing other use cases is unnecessary. Even if
we limit ourselves to discuss a single purpose for blob, i.e. to
represent the contents of a file, we should stress that blob is to
store _only_ contents, and not other aspects of the file (e.g., in
what paths with what mode), and that is where my reaction to "how
Git represents a file" comes from.
>>> +[[branch]]
>>> +branches: `refs/heads/<name>`::
>>> + A branch is a name for a commit ID.
>>
>> Well a commit ID is an alternative way to refer to a commit object
>> *name*, so it is a bit strange to say "a name for a commit ID".
>>
>> Perhaps "A branch ref stores a commit ID." is better?
>
> I think I'll leave this alone, none of the many test readers reported
> being confused by it.
Would a confused person report that they are confused? ;-)
> I see that you don't like the "name for a commit ID" phrasing :)
> Maybe there's another way to say it, though again none of the test
> readers said they were confused by this or disagreed with the phrasing.
Yes, I get that given "refs/heads/main", you want to say "main" is
one of the ways to have repo_get_oid() yield the commit object,
and you are using "name" in that sense, but it is more like a ref
can be used to name an object. It is *not* the name of the object,
because the object can have other names, and more importantly, it
(i.e., to give a name for an object) is not the only thing that a
ref can do. And that is why I do not like that phrasing, combined
with the fact that the target of that name is spelled "a commit ID". The
commit ID is already another way to name the thing the refname can
also be used to name: a commit object. A commit object and a commit
object name are different things. The latter is a name that can
refer to the former. And a ref can be used just like the latter to
refer to the former (i.e. "commit object").
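As a toy illustration of that distinction (not how Git resolves names internally), here are two refs and a commit object name that all lead to the same commit object. The object store, the made-up ID "c0ffee", and the `resolve` helper are all hypothetical.

```python
# Hypothetical object store and refs; the ID is a made-up placeholder.
objects = {"c0ffee": {"type": "commit", "message": "initial"}}
refs = {
    "refs/heads/main": "c0ffee",
    "refs/heads/topic": "c0ffee",  # a second ref naming the same commit
}


def resolve(name):
    """Return the object ID a name refers to: a full ref, a branch name, or an ID."""
    for candidate in (name, "refs/heads/" + name):
        if candidate in refs:
            return refs[candidate]
    if name in objects:
        return name
    raise KeyError(name)
```

"main", "refs/heads/topic" and "c0ffee" are three different names, and none of them is *the* name of the commit object they all reach.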
By the way, I do like the way many of your responses are "will think
about it more", not "I'll take your version".
Very much appreciated.
Thanks.