From: Junio C Hamano <gitster@pobox.com>
To: "Julia Evans via GitGitGadget" <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org,
Kristoffer Haugsbakk <kristofferhaugsbakk@fastmail.com>,
"D. Ben Knoble" <ben.knoble@gmail.com>,
Patrick Steinhardt <ps@pks.im>, Julia Evans <julia@jvns.ca>
Subject: Re: [PATCH v5] doc: add an explanation of Git's data model
Date: Fri, 31 Oct 2025 07:44:43 -0700
Message-ID: <xmqqtszf2kro.fsf@gitster.g>
In-Reply-To: <pull.1981.v5.git.1761856336360.gitgitgadget@gmail.com> (Julia Evans via GitGitGadget's message of "Thu, 30 Oct 2025 20:32:16 +0000")

"Julia Evans via GitGitGadget" <gitgitgadget@gmail.com> writes:

> diff --git a/Documentation/gitdatamodel.adoc b/Documentation/gitdatamodel.adoc
> new file mode 100644
> index 0000000000..1cefbb4833
> --- /dev/null
> +++ b/Documentation/gitdatamodel.adoc
> @@ -0,0 +1,296 @@
> +gitdatamodel(7)
> +===============
> +
> +NAME
> +----
> +gitdatamodel - Git's core data model
> +
> +SYNOPSIS
> +--------
> +gitdatamodel
> +
> +DESCRIPTION
> +-----------
> +
> +It's not necessary to understand Git's data model to use Git, but it's
> +very helpful when reading Git's documentation so that you know what it
> +means when the documentation says "object", "reference" or "index".
> +
> +Git's core operations use 4 kinds of data:
> +
> +1. <<objects,Objects>>: commits, trees, blobs, and tag objects
> +2. <<references,References>>: branches, tags,
> + remote-tracking branches, etc
> +3. <<index,The index>>, also known as the staging area
> +4. <<reflogs,Reflogs>>: logs of changes to references ("ref log")
> +
> +[[objects]]
> +OBJECTS
> +-------
> +
> +All of the commits and files in a Git repository are stored as "Git objects".
> +Git objects never change after they're created, and every object has an ID,
> +like `1b61de420a21a2f1aaef93e38ecd0e45e8bc9f0a`.
> +
> +This means that if you have an object's ID, you can always recover its
> +exact contents as long as the object hasn't been deleted.
> +
> +Every object has:
> +
> +[[object-id]]
> +1. an *ID* (aka "object name"), which is a cryptographic hash of its
> + type and contents.
> + It's fast to look up a Git object using its ID.
> + This is usually represented in hexadecimal, like
> + `1b61de420a21a2f1aaef93e38ecd0e45e8bc9f0a`.
> +2. a *type*. There are 4 types of objects:
> + <<commit,commits>>, <<tree,trees>>, <<blob,blobs>>,
> + and <<tag-object,tag objects>>.
> +3. *contents*. The structure of the contents depends on the type.
> +
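
The "cryptographic hash of its type and contents" wording can be made
concrete with a small sketch. It assumes a SHA-1 repository, where the
hashed bytes are the type, a space, the content length in decimal, a
NUL byte, and then the contents. `git_object_id` is a hypothetical
helper name used only for illustration, not a Git API.

```python
import hashlib


def git_object_id(obj_type: str, contents: bytes) -> str:
    """Return the hex object ID for a type and contents (SHA-1 repositories).

    Hypothetical helper for illustration only; not part of Git.
    """
    # The hashed header is "<type> <size>\0"; the ID covers header + contents.
    header = f"{obj_type} {len(contents)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + contents).hexdigest()


if __name__ == "__main__":
    # Should agree with: printf 'hello world\n' | git hash-object --stdin
    print(git_object_id("blob", b"hello world\n"))
```

Because the type is part of the hashed bytes, a blob and a tree with
identical contents would still get different IDs.
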
> +Here's how each type of object is structured:
> +
> +[[commit]]
> +commit::
> + A commit contains these required fields
> + (though there are other optional fields):
> ++
> +1. The full directory structure of all the files in that version of the
> + repository and each file's contents, stored as the *<<tree,tree>>* ID
> + of the commit's base directory.

"base directory" is a new term; I think we most often say
"top-level directory" (in various spellings).

    $ git grep -e 'base directory' -e 'level directory' Documentation/

> +[[tree]]
> +tree::
> + A tree is how Git represents a directory.
> + It can contain files or other trees (which are subdirectories).
> + It lists, for each item in the tree:
> ++
> +1. The *filename*, for example `hello.py`
> +2. The *file mode*. Git has these file modes. which are only

"has these" -> "uses only these", to make it clear that this is an
exhaustive enumeration and that users cannot invent 100664 and the
like, which is a mistake Git itself used to make (and allow).

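
To illustrate what "uses only these" means, here is a minimal sketch.
It assumes the five modes Git writes into tree entries today (regular
file, executable file, symbolic link, directory, gitlink). The names
`CANONICAL_MODES` and `is_canonical_mode` are made up for this example
and are not Git APIs.

```python
# Assumed set of tree-entry modes; descriptions are informal labels.
CANONICAL_MODES = {
    "100644": "regular file",
    "100755": "executable file",
    "120000": "symbolic link",
    "040000": "directory (tree)",
    "160000": "gitlink (submodule)",
}


def is_canonical_mode(mode: str) -> bool:
    """Return True if `mode` is one of the modes in CANONICAL_MODES.

    Hypothetical helper for illustration only.
    """
    # Raw tree objects store the directory mode as "40000" (no leading
    # zero), so pad to six digits before the lookup.
    return mode.zfill(6) in CANONICAL_MODES


if __name__ == "__main__":
    for candidate in ("100644", "100664", "40000"):
        print(candidate, is_canonical_mode(candidate))
```

A mode such as 100664 looks plausible as a Unix permission, but it is
not in the set, so the check rejects it.
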
> +[[tag-object]]
> +tag object::
> + Tag objects contain these required fields
> + (though there are other optional fields):
> ++
> +1. The object *ID* it references
> +2. The object *type*

I would rephrase these to

    1. The *ID* of the object it references
    2. The *type* of the object it references

because (1) a tag object references another object, not an ID. To name
the object it references, it uses that object's name, but just as
your name is not you, an object name is not the object (it is merely
*one* way to refer to it); and (2) unless it is very clear to readers
that "The object" in 1. and 2. refers to the same object, 2. invites
the question "type of which object?".

> +[[branch]]
> +branches: `refs/heads/<name>`::
> + A branch refers to a commit ID.
A branch refers to a commit object (by its ID). Ditto for tags.
> +NOTE: Git may delete objects that aren't "reachable" from any reference.
> +An object is "reachable" if we can find it by following tags to whatever
> +they tag, commits to their parents or trees, and trees to the trees or
> +blobs that they contain.
> +For example, if you amend a commit, with `git commit --amend`,
> +the old commit will usually not be reachable, so it may be deleted eventually.
> +Reachable objects will never be deleted.

Very good write-up. As we touch upon reflogs later in the same
document, we may want to extend the "amend" example a bit, perhaps
like this:

    Note: Git never deletes objects that are "reachable". An object
    is "reachable" if .... An unreachable object may be deleted.
    For example, ... a newly created commit will replace the old
    commit and the current branch ref points at the new commit. The
    old commit is recorded in the <<reflogs,reflog>> of the current
    branch, so it is still "reachable", but once sufficiently old
    reflog entries are expired away, the old commit may become
    unreachable and would then get deleted.

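
The reachability rule in the suggested note can be sketched as a graph
walk over a toy object store. This is only an illustration under
stated assumptions: object names such as "old-commit" are placeholders
rather than real IDs, `edges` stands in for "tags to what they tag,
commits to parents and trees, trees to trees and blobs", and the
functions `reachable` and `prunable` are hypothetical, not how Git
implements pruning.

```python
from collections import deque


def reachable(roots, edges):
    """Return the set of objects reachable from `roots` by following `edges`."""
    seen = set()
    queue = deque(roots)
    while queue:
        obj = queue.popleft()
        if obj in seen:
            continue
        seen.add(obj)
        # Follow every outgoing link of this object.
        queue.extend(edges.get(obj, ()))
    return seen


def prunable(all_objects, ref_tips, reflog_entries, edges):
    """Objects reachable from neither a ref tip nor a reflog entry."""
    roots = list(ref_tips) + list(reflog_entries)
    return set(all_objects) - reachable(roots, edges)


# Toy repository right after `git commit --amend` (placeholder names).
EDGES = {
    "new-commit": ["parent-commit", "new-tree"],
    "old-commit": ["parent-commit", "old-tree"],
    "parent-commit": ["parent-tree"],
    "new-tree": ["blob-a"],
    "old-tree": ["blob-b"],
    "parent-tree": ["blob-a"],
}
ALL_OBJECTS = set(EDGES) | {"blob-a", "blob-b"}

if __name__ == "__main__":
    # While the reflog still records the old commit, nothing is prunable.
    print(prunable(ALL_OBJECTS, ["new-commit"], ["old-commit"], EDGES))
    # Once that reflog entry has expired, the old commit and everything
    # only it referenced become candidates for deletion.
    print(sorted(prunable(ALL_OBJECTS, ["new-commit"], [], EDGES)))
```

Treating reflog entries as extra roots is what keeps the amended-away
commit alive in the first call and lets it go in the second.
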
Other than the above, I found everything very nicely written.
Thanks.