From: "brian m. carlson" <sandals@crustytoothpaste.net>
To: Eric Wong <e@80x24.org>
Cc: git@vger.kernel.org, Jeff King <peff@peff.net>,
Taylor Blau <me@ttaylorr.com>, Derrick Stolee <stolee@gmail.com>,
Patrick Steinhardt <ps@pks.im>,
Jonathan Nieder <jrnieder@gmail.com>
Subject: Re: Efficiently storing SHA-1 ↔ SHA-256 mappings in compatibility mode
Date: Thu, 28 Aug 2025 21:43:43 +0000
Message-ID: <aLDNj5GPYA9nR3xR@fruit.crustytoothpaste.net>
In-Reply-To: <20250827190817.M36986@dcvr>
On 2025-08-27 at 19:08:16, Eric Wong wrote:
> "brian m. carlson" <sandals@crustytoothpaste.net> wrote:
> > TL;DR: We need a different datastore than a flat file for storing
> > mappings between SHA-1 and SHA-256 in compatibility mode. Advice and
> > opinions sought.
>
> <snip>
>
> > Our approach for mapping object IDs between algorithms uses data in pack
> > index v3 (outlined in the transition document), plus a flat file called
> > `loose-object-idx` for loose objects. However, we didn't anticipate
> > that we'd need to handle mappings long-term for data that is neither a
> > loose object nor a packed object.
> >
> > For instance, with shallow clones, we must store a mapping for the
> > shallows the server has sent us[1], since we lack the history to convert
> > objects otherwise. Similarly, if there are submodules or we're using a
> > partial clone, we must store those mappings as well, since we cannot
> > convert trees without them. We can store them in the
> > `loose-object-idx`, but since it's not sorted or easily searchable, it's
> > going to perform really terribly when we store enough of them. Right
> > now, we read the entire file into two hashmaps (one in each direction)
> > and we sometimes need to re-read it when other processes add items, so
> > it won't take much to make it be slow and take a lot of memory.
>
> This really seems ideal for SQLite, which has come a long way
> since 2005 when git started.
>
> I really wish git would've relied on more on existing formats
> (e.g. LMDB refs) rather than introducing more one-off data
> formats that require more cognitive overhead to document and
> learn[1], especially when SQLite is extremely portable and works
> on tiny devices.

SQLite is not an option because it performs poorly with Java and we want
our formats to work with other implementations, like JGit. That's why
we created reftable instead of using SQLite.

Also, in general, I'm not interested in being tied to a single
implementation. If the developers of SQLite decide to dramatically
change the license of all their code like Oracle did with Berkeley DB,
we're going to have a problem. Yes, we can use the older versions, but
we'd still need people to maintain the library and update it.
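
For readers following the thread, the current behavior described in the
quoted text (reading the whole flat file into two hashmaps, one per
direction) can be sketched roughly as below. This is only an
illustration in Python, not Git's actual code: the function name
`load_mappings`, the one-pair-per-line text layout, and the `#` header
line are assumptions made for the example.

```python
import io


def load_mappings(stream):
    """Read every 'sha1 sha256' line into two dicts, one per direction.

    Assumed (hypothetical) layout: one whitespace-separated pair per line,
    with lines starting with '#' treated as headers and skipped.
    """
    sha1_to_sha256 = {}
    sha256_to_sha1 = {}
    for line in stream:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        sha1, sha256 = line.split()
        # Both directions are kept in memory, so each entry is stored twice.
        sha1_to_sha256[sha1] = sha256
        sha256_to_sha1[sha256] = sha1
    return sha1_to_sha256, sha256_to_sha1


# Fake object IDs of the right length (40 and 64 hex digits).
sample = io.StringIO(
    "# hypothetical header line\n"
    + "a" * 40 + " " + "b" * 64 + "\n"
    + "c" * 40 + " " + "d" * 64 + "\n"
)
forward, backward = load_mappings(sample)
```

The sketch makes the cost visible: every lookup requires the whole file
to have been parsed first, memory grows with the number of entries in
both directions, and the load has to be repeated whenever another
process appends to the file.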
--
brian m. carlson (they/them)
Toronto, Ontario, CA