From: Junio C Hamano <gitster@pobox.com>
To: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Cc: git@vger.kernel.org
Subject: Re: Compressing packed-refs
Date: Thu, 16 Jul 2020 15:27:15 -0700
Message-ID: <xmqqsgdrf64c.fsf@gitster.c.googlers.com>
In-Reply-To: <20200716221026.dgduvxful32gkhwy@chatter.i7.local> (Konstantin Ryabitsev's message of "Thu, 16 Jul 2020 18:10:26 -0400")
Konstantin Ryabitsev <konstantin@linuxfoundation.org> writes:
> I know repos with too many refs is a corner-case for most people, but
> it's looming large in my world, so I'm wondering if it makes sense to
> compress the packed-refs file when "git pack-refs" is performed?
I think reftable is the longer-term direction, but let's see if
there is an optimization opportunity easy enough that we can afford
its development and maintenance cost in the short term.
My .git/packed-refs file begins like so:
# pack-refs with: peeled fully-peeled sorted
c3808ca6982b0ad7ee9b87eca9b50b9a24ec08b0 refs/heads/maint-2.10
3b9e3c2cede15057af3ff8076c45ad5f33829436 refs/heads/maint-2.11
584f8975d2d9530a34bd0b936ae774f82fe30fed refs/heads/master
2cccc8116438182c988c7f26d9559a1c22e78f1c refs/heads/next
8300349bc1f0a0e2623d5824266bd72c1f4b5f24 refs/notes/commits
...
A few observations that can lead to easy design elements:
- Typically more than half of each record is consumed by the
  object name, which is hard to "compress".
- The file is sorted, so it could use prefix compression, as
  we do in the v4 index files.
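To put rough numbers on those two observations, here is a small
Python sketch run over the sample records quoted above. The helper
names (oid_share, shared_prefix_len) are made up for illustration
and are not anything in git; it assumes 40-character hex object
names and one newline per record.

```python
# Sample records copied from the packed-refs excerpt above.
RECORDS = [
    "c3808ca6982b0ad7ee9b87eca9b50b9a24ec08b0 refs/heads/maint-2.10",
    "3b9e3c2cede15057af3ff8076c45ad5f33829436 refs/heads/maint-2.11",
    "584f8975d2d9530a34bd0b936ae774f82fe30fed refs/heads/master",
    "2cccc8116438182c988c7f26d9559a1c22e78f1c refs/heads/next",
    "8300349bc1f0a0e2623d5824266bd72c1f4b5f24 refs/notes/commits",
]


def oid_share(line):
    """Fraction of one on-disk record taken up by the hex object name."""
    oid, _refname = line.split(" ", 1)
    return len(oid) / (len(line) + 1)  # +1 for the trailing newline


def shared_prefix_len(prev, cur):
    """Number of leading characters that two refnames have in common."""
    n = 0
    for a, b in zip(prev, cur):
        if a != b:
            break
        n += 1
    return n


if __name__ == "__main__":
    names = [line.split(" ", 1)[1] for line in RECORDS]
    for line in RECORDS:
        print(f"{oid_share(line):.0%} of the record is object name")
    for prev, cur in zip(names, names[1:]):
        print(f"{cur}: shares {shared_prefix_len(prev, cur)} chars with previous")
```

Because the file is sorted, neighbouring refnames tend to share a
long leading part, which is what prefix compression would exploit.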
So perhaps a new format could be:
- The header "# pack-refs with: " lists a new trait, "compressed";
- Object names will be expressed in binary, saving 20 bytes per
  record;
- Prefix compression of the refnames, similar to the v4 index, would
  save a bit more.
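As a rough illustration only, the following Python sketch encodes
and decodes records along those lines. The byte layout is an
assumption invented for this example (20-byte binary object name,
one byte giving how many leading bytes to reuse from the previous
refname, then the NUL-terminated remainder); it is not the v4 index
encoding and not a proposed on-disk format. It also assumes SHA-1
object names, skips the header line, and ignores peeled ("^")
entries.

```python
def common_prefix_len(a, b):
    """Length of the shared leading part of two byte strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def encode(records):
    """Encode sorted (hex_oid, refname) pairs into the hypothetical layout."""
    out = bytearray()
    prev = b""
    for hex_oid, refname in records:
        name = refname.encode("utf-8")
        # One byte holds the reuse count, so cap it at 255.
        keep = min(common_prefix_len(prev, name), 255)
        out += bytes.fromhex(hex_oid)  # 40 hex chars -> 20 binary bytes
        out.append(keep)
        out += name[keep:] + b"\0"     # refnames never contain NUL
        prev = name
    return bytes(out)


def decode(blob, oid_len=20):
    """Inverse of encode(): rebuild the (hex_oid, refname) pairs."""
    records = []
    prev = b""
    pos = 0
    while pos < len(blob):
        oid = blob[pos:pos + oid_len]
        pos += oid_len
        keep = blob[pos]
        pos += 1
        end = blob.index(b"\0", pos)   # search starts after the binary oid
        name = prev[:keep] + blob[pos:end]
        pos = end + 1
        records.append((oid.hex(), name.decode("utf-8")))
        prev = name
    return records


SAMPLE = [
    ("c3808ca6982b0ad7ee9b87eca9b50b9a24ec08b0", "refs/heads/maint-2.10"),
    ("3b9e3c2cede15057af3ff8076c45ad5f33829436", "refs/heads/maint-2.11"),
    ("584f8975d2d9530a34bd0b936ae774f82fe30fed", "refs/heads/master"),
    ("2cccc8116438182c988c7f26d9559a1c22e78f1c", "refs/heads/next"),
    ("8300349bc1f0a0e2623d5824266bd72c1f4b5f24", "refs/notes/commits"),
]

if __name__ == "__main__":
    text = "".join(f"{oid} {name}\n" for oid, name in SAMPLE)
    blob = encode(SAMPLE)
    assert decode(blob) == SAMPLE
    print(len(text), "bytes as text,", len(blob), "bytes encoded")
```

Most of the saving in this toy comes from the binary object names;
the prefix compression only trims the refname part of each record.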
Storing binary object names would actually be favourable for
performance, as the in-core data structure we use to store the
result of parsing the file uses binary.