From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Shawn Pearce <spearce@spearce.org>
Cc: git <git@vger.kernel.org>, Jeff King <peff@peff.net>,
Michael Haggerty <mhagger@alum.mit.edu>,
Junio C Hamano <gitster@pobox.com>,
David Borowitz <dborowitz@google.com>,
Stefan Beller <sbeller@google.com>,
David Turner <David.Turner@twosigma.com>,
Ben Alex <ben.alex@acegi.com.au>,
Kristoffer Sjogren <stoffe@gmail.com>
Subject: Re: reftable [v5]: new ref storage format
Date: Sun, 06 Aug 2017 18:56:16 +0200
Message-ID: <874ltkzlcf.fsf@gmail.com>
In-Reply-To: <CAJo=hJsOHF0KVmXvbSBiBgxq4zRdt7v7sj_GuKvcpbu8tkujFA@mail.gmail.com>
On Sun, Aug 06 2017, Shawn Pearce jotted:
> 5th iteration of the reftable storage format.
I haven't kept up with all of the discussion, sorry if these comments
repeat something that's already mentioned.
> ### Version 1
>
> A repository must set its `$GIT_DIR/config` to configure reftable:
>
> [core]
> repositoryformatversion = 1
> [extensions]
> reftable = true
David Turner's LMDB proposal specified an extensions.refStorage config
variable instead. I think that is a much better idea, cf. the mistake we
already made with grep.extendedRegexp & grep.patternType. I.e. have
'extensions.refStorage = reftable' instead of 'extensions.reftable =
true'.
If we grow another storage backend this'll become messy, and it won't be
obvious to the user that the settings are mutually exclusive (which
they surely will be), so we'll end up having to special-case it like
grep.[extendedRegexp,patternType] (i.e. either make one override the
other, or make specifying >1 an error, which is a hassle with the config API).
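To make the difference concrete, here is a rough sketch of what backend selection looks like under each scheme. This is not git's config code: both function names, the dict-based config, the "files" default name and the "lmdb" key are assumptions made purely for illustration. Keys are lowercased because git treats config variable names case-insensitively.

```python
# Hypothetical sketch, not git's actual config handling.
# "files" as the name of the default backend is an assumption.
KNOWN_BACKENDS = {"files", "reftable", "lmdb"}


def backend_from_booleans(extensions):
    """One boolean key per backend, e.g. extensions.reftable = true."""
    enabled = sorted(
        name for name in KNOWN_BACKENDS - {"files"}
        if extensions.get(name) == "true"
    )
    if len(enabled) > 1:
        # The special case: nothing in the config syntax itself says
        # these keys are mutually exclusive, so we must check by hand.
        raise ValueError("conflicting ref backends: " + ", ".join(enabled))
    return enabled[0] if enabled else "files"


def backend_from_single_key(extensions):
    """One key naming the backend, e.g. extensions.refStorage = reftable."""
    backend = extensions.get("refstorage", "files")
    if backend not in KNOWN_BACKENDS:
        raise ValueError("unknown ref backend: " + backend)
    return backend
```

With a single key, two backends cannot both be selected, so the conflict check simply disappears; with one boolean per backend, every new backend adds another key that has to be cross-checked against all the others.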
> Performance testing indicates reftable is faster for lookups (51%
> faster, 11.2 usec vs. 5.4 usec), although reftable produces a
> slightly larger file (+ ~3.2%, 28.3M vs 29.2M):
>
> format | size | seek cold | seek hot |
> ---------:|-------:|----------:|----------:|
> mh-alt | 28.3 M | 23.4 usec | 11.2 usec |
> reftable | 29.2 M | 19.9 usec | 5.4 usec |
>
> [mh-alt]: https://public-inbox.org/git/CAMy9T_HCnyc1g8XWOOWhe7nN0aEFyyBskV2aOMb_fe+wGvEJ7A@mail.gmail.com/
Might be worth noting "based on WIP Java implementation". I started
searching for patches for this new format & found via
<CAJo=hJtrdCOF-RxzXfyLx7R-1f2-7pZVO_UOg28J=wUDNdf3yw@mail.gmail.com>
that it's JGit only.
Also, if one wanted to run these tests via JGit using your WIP code,
where does that code live, and how would one test it?
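As a sanity check on the quoted figures, the percentages can be re-derived from the table. This is only arithmetic on the numbers posted above, not a new measurement, and the helper name is made up for illustration.

```python
def percent_change(old, new):
    """Relative change from old to new, in percent (negative = smaller)."""
    return (new - old) / old * 100.0


# Figures copied from the quoted table: mh-alt first, reftable second.
hot = percent_change(11.2, 5.4)    # hot seek time, usec
cold = percent_change(23.4, 19.9)  # cold seek time, usec
size = percent_change(28.3, 29.2)  # file size, MB

print(f"seek hot:  {hot:+.1f}%")
print(f"seek cold: {cold:+.1f}%")
print(f"size:      {size:+.1f}%")
```

The hot-seek time drops by a bit under 52% and the file grows by about 3.2%, which matches the "51% faster" and "+ ~3.2%" in the quoted text; the cold-seek improvement is much smaller, around 15%.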
> ### LMDB
>
> David Turner proposed [using LMDB][dt-lmdb], as LMDB is lightweight
> (64k of runtime code) and GPL-compatible license.
>
> A downside of LMDB is its reliance on a single C implementation. This
> makes embedding inside JGit (a popular reimplemenation of Git)
> difficult, and hoisting onto virtual storage (for JGit DFS) virtually
> impossible.
This rationale as stated reads a bit too much like https://xkcd.com/927/
I.e. surely the actual problem isn't that there's a single C
implementation of LMDB, since that's one more C implementation than
currently exists of this new format.
Also, isn't this info out of date now that this exists:
https://github.com/lmdbjava/lmdbjava ? That project was implemented
after David's initial LMDB patches on-list, but I don't know if it
implements the subset of the LMDB format needed for his proposed ref
storage.
Rather, I'd expect the actual rationale to be something like:
A downside of LMDB is that it would be too complex to implement the
subset of its database format needed for this reference storage in
Java in the nascent lmdbjava project and to keep the two compatible
going forward while juggling support for two upstream projects whose
aims may conflict with ours.
Or:
A downside of LMDB is <above rationale> + even if we did that
benchmarks <do we have those?> show that it wouldn't be worth it to
use the LMDB format since it's slower/bigger/whatever.
> A common format that can be supported by all major Git implementations
> (git-core, JGit, libgit2) is strongly preferred.
>
> [dt-lmdb]: https://public-inbox.org/git/1455772670-21142-26-git-send-email-dturner@twopensource.com/
>
> ## Future
>
> ### Longer hashes
>
> Version will bump (e.g. 2) to indicate `value` uses a different
> object id length other than 20. The length could be stored in an
> expanded file header, or hardcoded as part of the version.