From: Taylor Blau <me@ttaylorr.com>
To: git@vger.kernel.org
Subject: [NOTES 05/11] Pluggable object databases
Date: Mon, 6 Oct 2025 15:19:18 -0400
Message-ID: <aOQWNrIDSTTZtbnG@nand.local>
In-Reply-To: <aOQVeVYY6zadPjln@nand.local>
Topic: Pluggable object databases
Leader: Patrick Steinhardt
* Already being worked towards since Git 2.50.
* Allow innovation on the server side for large binary files.
* The design will soon be up for discussion.
* Allow migration between different object formats, and let the implementer
pick the format later.
* The planned work is to make the object database more pluggable; right now the
work is still about refactoring. Git 2.53 will have a proof of concept. It
might take until the second half of 2026 to be done.
* Blocker 1: The database format is currently still unclear, particularly
regarding latency and performance.
* It might use content-chunked hashing, or an existing database implementation
like Cassandra.
* Blocker 2: The second problem is how to generate packfiles.
* Taylor wonders whether we can reuse the current object database, but Patrick
thinks the current implementation is too large/complex to adapt. The current
refactoring effort toward better abstractions might speed up future changes.
* Gitster wonders whether we can just use the hash of the chunks' hashes.
* Taylor also thinks a new obj db might become just as complex.
* Patrick thinks the new obj db can be more maintainable. Starting off with a
brand new abstraction allows faster iteration.
* Rewriting the obj db from scratch might be challenging because the packfile is
so intimately tied to so many usages and optimizations (e.g. bitmaps), and there
is also the need to identify large binary objects over the wire.
* Taylor thinks maybe we don't need to rewrite packed object storage, but
abstracting the packfile could make the code worse and more verbose.
* Patrick mentions there are already many adjacent projects that abstract away
the pack format, e.g. JGit and libgit2. JGit identified early on that
Cassandra's performance would never be acceptable due to latency overhead.
* Taylor suggests we identify a proof of concept with latency comparable to the
existing obj db before doing additional refactoring.
* Ezekiel refocuses the discussion on large binary files. Maybe with large
binary files, latency degradation is not as important.
* In Git, we already have a divergent code path for large binary files; we just
chose to store them in packfiles. Technically, people could change the storage
selection without refactoring.
* Patrick still thinks having sub-system abstraction would make code more
maintainable.
* Taylor is supportive of having some objects use the current db while only
large binary files use the new db; at least that way we don't impose the
overhead on all objects.
* The obj chunk design Patrick is proposing is meant to benefit both client-side
and server-side storage.
* We should resume this discussion with more concrete use cases; right now we are
still talking about potential scenarios.
* The promisor feature on the server side cannot satisfy all clients, since some
clients don't want to use promisor remotes, so the server side might still be
expected to have the large binary files on disk.
* Packfile URIs might still be the main direction for fixing the large binary
issue without exploding objects into chunks.
* Another benefit of obj chunking is reduced hashing time for large binary
files. Gerrit currently sees 50% of clone time spent on hashing. Chunking also
makes parallel hashing possible.
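A minimal sketch of the chunk-hashing ideas from the notes (content chunking, Gitster's "hash of the chunks' hashes", and parallel hashing), assuming fixed-size 1 MiB chunks and SHA-256. Real designs would more likely use content-defined chunking so that an edit does not shift every later chunk boundary. `chunked_object_id` and the chunk size are hypothetical, and the result is not a Git object ID.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 1 << 20  # hypothetical 1 MiB fixed chunk size (an assumption, not a Git setting)


def split_chunks(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Split data into fixed-size chunks; the last chunk may be shorter."""
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    return chunks or [b""]  # treat empty content as a single empty chunk


def chunked_object_id(data: bytes, size: int = CHUNK_SIZE, workers: int = 4) -> str:
    """Hypothetical ID: SHA-256 over the concatenated per-chunk SHA-256 digests.

    Chunk digests are independent, so they can be computed in parallel
    (CPython's hashlib releases the GIL while hashing large buffers); only
    the final hash over the short list of digests is sequential.
    """
    chunks = split_chunks(data, size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in input order, so the ID is deterministic.
        digests = list(pool.map(lambda c: hashlib.sha256(c).digest(), chunks))
    return hashlib.sha256(b"".join(digests)).hexdigest()
```

Changing the chunk size changes the resulting ID, so any real format built this way would have to pin its chunking parameters.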
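A minimal sketch of the split discussed by Taylor and Patrick: a pluggable backend interface, with blobs at or above a size threshold routed to a separate store and everything else kept in the existing one. `ObjectBackend`, `InMemoryBackend`, `blob_oid` and `RoutingObjectDatabase` are hypothetical names, not Git's internal API. The threshold plays a role loosely similar to Git's existing `core.bigFileThreshold` but is purely illustrative. Object IDs are computed the way Git hashes blobs with SHA-1 (`blob <size>\0<content>`).

```python
import hashlib
from abc import ABC, abstractmethod


class ObjectBackend(ABC):
    """Hypothetical interface that each pluggable object store implements."""

    @abstractmethod
    def has(self, oid: str) -> bool: ...

    @abstractmethod
    def read(self, oid: str) -> bytes: ...

    @abstractmethod
    def write(self, oid: str, data: bytes) -> None: ...


class InMemoryBackend(ObjectBackend):
    """Stand-in for a real store (packfiles, a chunked store, ...)."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def has(self, oid: str) -> bool:
        return oid in self._objects

    def read(self, oid: str) -> bytes:
        return self._objects[oid]

    def write(self, oid: str, data: bytes) -> None:
        self._objects[oid] = data


def blob_oid(data: bytes) -> str:
    """SHA-1 object ID of a blob, hashed the way Git does: header + content."""
    return hashlib.sha1(b"blob %d\0" % len(data) + data).hexdigest()


class RoutingObjectDatabase:
    """Route blobs >= `threshold` bytes to `large`, all other blobs to `default`."""

    def __init__(self, default: ObjectBackend, large: ObjectBackend, threshold: int) -> None:
        self.default = default
        self.large = large
        self.threshold = threshold

    def write_blob(self, data: bytes) -> str:
        oid = blob_oid(data)
        backend = self.large if len(data) >= self.threshold else self.default
        backend.write(oid, data)
        return oid

    def read(self, oid: str) -> bytes:
        # Probe each backend in turn; callers never need to know where an object lives.
        for backend in (self.default, self.large):
            if backend.has(oid):
                return backend.read(oid)
        raise KeyError(oid)
```

Only large blobs pay whatever overhead the new backend has, which is the property Taylor was after. A real implementation would presumably avoid probing every backend on each lookup.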