git.vger.kernel.org archive mirror
From: Junio C Hamano <gitster@pobox.com>
To: Patrick Steinhardt <ps@pks.im>
Cc: Julia Evans via GitGitGadget <gitgitgadget@gmail.com>,
	git@vger.kernel.org,  Julia Evans <julia@jvns.ca>
Subject: Re: [PATCH] doc: add a explanation of Git's data model
Date: Tue, 07 Oct 2025 10:02:51 -0700
Message-ID: <xmqq4isalk5g.fsf@gitster.g>
In-Reply-To: <aOUkZa4_fq1hho7Q@pks.im> (Patrick Steinhardt's message of "Tue, 7 Oct 2025 16:32:05 +0200")

Patrick Steinhardt <ps@pks.im> writes:

>> +Git's core operations use 4 kinds of data:
>> +
>> +1. <<objects,Objects>>: commits, trees, blobs, and tag objects
>> +2. <<references,References>>: branches, tags,
>> +   remote-tracking branches, etc
>> +3. <<index,The index>>, also known as the staging area
>> +4. <<reflogs,Reflogs>>
>
> This list makes sense to me. There's of course more data structures in
> Git, but all the other data structures shouldn't really matter to users
> at all as they are mostly caches or internal details of the on-disk
> format.
>
> There's potentially one exception though, namely the Git configuration.
> I'd claim that Git "uses" the Git configuration similarly to how it uses
> the others, but I get why it's not explicitly mentioned here.

The core operations do not use Git configuration any more than they
use what is specified by the command line arguments.

>> +[[objects]]
>> +OBJECTS
>> +-------
>> +
>> +Commits, trees, blobs, and tag objects are all stored in Git's object database.
>> +Every object has:
>> +
>> +1. an *ID*, which is the SHA-1 hash of its contents.
>
> I think this needs to be adapted to not single out SHA-1 as the only
> hashing algorithm. We already support SHA-256, so we should definitely
> say that the algorithm can be swapped. Maybe something like:

Good point.  Also, the official term for them is "object name".

>   An *object ID*, which is the cryptographic hash of its contents. By
>   default, Git uses SHA-1 as object hash, but alternative hashes like
>   SHA-256 are supported.

I'd avoid "object name is the result of hashing X", which has
historically been a source of questions like "why does 'sha1sum
README.md' give a different hash from 'git add README.md && git
ls-files -s README.md'?"

It is an irrelevant implementation detail (and you'd eventually end
up having to say "X is <type> SP <length> NUL <contents>").

    An object name, which is derived cryptographically from its
    type, size and contents.  All versions of Git can use the SHA-1
    hash function, but more recent versions of Git can also use the
    SHA-256 hash function.
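
To make the "<type> SP <length> NUL <contents>" point concrete, here is a
minimal sketch in Python.  It is only an illustration of why a plain
sha1sum differs from the object name, not Git's implementation; the
helper name git_object_name is made up for this example.

```python
import hashlib


def git_object_name(obj_type: str, contents: bytes, algo: str = "sha1") -> str:
    """Hash '<type> SP <length> NUL <contents>' (hypothetical helper)."""
    header = f"{obj_type} {len(contents)}".encode("ascii") + b"\0"
    return hashlib.new(algo, header + contents).hexdigest()


data = b"hello\n"

# What 'sha1sum' computes: a hash over the file contents alone.
plain = hashlib.sha1(data).hexdigest()

# What Git records for the same bytes stored as a blob.
blob = git_object_name("blob", data)

print("sha1sum :", plain)
print("blob    :", blob)  # ce013625030ba8dba906f756967f9e9ca394464a
```

The two values differ only because of the "blob 6\0" prefix.  Passing
algo="sha256" hashes the same bytes with SHA-256 instead.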

>> +commits::
>> +    A commit contains:
>> ++
>> +1. Its *parent commit ID(s)*. The first commit in a repository has 0 parents,
>> +  regular commits have 1 parent, merge commits have 2+ parents
>
> I'd say "at least two parents" instead of "2+ parents".

Yup, that reads much better.

>> +tree 1b61de420a21a2f1aaef93e38ecd0e45e8bc9f0a
>> +parent 4ccb6d7b8869a86aae2e84c56523f8705b50c647
>> +author Maya <maya@example.com> 1759173425 -0400
>> +committer Maya <maya@example.com> 1759173425 -0400
>> +
>> +Add README
>> +----
>
> In practice, commits can have other headers that are ignored by Git. But
> that's certainly not part of Git's core data model, so I don't think we
> should mention that here.

Third-party software can add truly garbage ones that do not have any
meaning, and Git tolerates them by ignoring them.  But there are
others that Git does pay attention to, like encoding, gpgsig, etc.,
which may be worth mentioning (in the form of "these four are what you
typically see, but there may be others", without even naming any).
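
As a rough illustration of "these four plus possibly others", the
sketch below splits a commit object's text into headers and message
without assuming a fixed set of header names.  The function name
parse_commit and the added "encoding" line are assumptions made for
this example; the other lines are the commit quoted above.  A header
line starting with a space is treated as a continuation of the
previous header, which is how multi-line values such as gpgsig are
laid out.

```python
def parse_commit(raw: str):
    """Split a commit's text into (headers, message); hypothetical helper."""
    head, _, message = raw.partition("\n\n")
    headers = []
    for line in head.split("\n"):
        if line.startswith(" ") and headers:
            # Continuation of the previous header's value.
            key, value = headers[-1]
            headers[-1] = (key, value + "\n" + line[1:])
        else:
            key, _, value = line.partition(" ")
            headers.append((key, value))
    return headers, message


raw = (
    "tree 1b61de420a21a2f1aaef93e38ecd0e45e8bc9f0a\n"
    "parent 4ccb6d7b8869a86aae2e84c56523f8705b50c647\n"
    "author Maya <maya@example.com> 1759173425 -0400\n"
    "committer Maya <maya@example.com> 1759173425 -0400\n"
    "encoding ISO-8859-1\n"  # extra header added for illustration only
    "\n"
    "Add README\n"
)

headers, message = parse_commit(raw)
parents = [value for key, value in headers if key == "parent"]
typical = {"tree", "parent", "author", "committer"}
others = [key for key, _ in headers if key not in typical]

print(len(parents), "parent(s); other headers:", others)
```

A reader written this way keeps working when a commit has zero
parents, at least two parents, or headers it has never seen.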



