From: Martin Fick <mfick@codeaurora.org>
To: git@vger.kernel.org
Subject: Ideas to speed up repacking
Date: Mon, 2 Dec 2013 16:30:45 -0700
Message-ID: <201312021630.45767.mfick@codeaurora.org>
I wanted to explore the idea of exploiting knowledge about
previous repacks to help speed up future repacks.
I had various ideas that seemed like they might be good
places to start, but things quickly got away from me.
Mainly I wanted to focus on reducing, and sometimes even
eliminating, reachability calculations, since that seems to
be the one major unsolved slow piece of repacking.
My first line of thinking goes like this: "After a full
repack, reachability of the current refs is known. Exploit
that knowledge for future repacks." There are some very
simple scenarios where if we could figure out how to
identify them reliably, I think we could simply avoid
reachability calculations entirely, and yet end up with the
same repacked files as if we had done the reachability
calculations. Let me outline some to see if they make sense
as a starting place for further discussion.
-------------
* Setup 1:
Do a full repack. All loose and packed objects are added
to a single pack file (assumes git config repack options do
not create multiple packs).
* Scenario 1:
Start with Setup 1. Nothing has changed in the repo
contents (no new objects/packs, refs all the same), but
repacking config options have changed (for example, the
compression level has changed).
* Scenario 2:
Start with Setup 1. Add one new pack file that was
pushed to the repo by adding a new ref to the repo (existing
refs did not change).
* Scenario 3:
Start with Setup 1. Add one new pack file that was
pushed to the repo by updating an existing ref with a fast
forward.
* Scenario 4:
Start with Setup 1. Add some loose objects to the repo
via a local fast forward ref update (I am assuming this is
possible without adding any new unreferenced objects?)
In all 4 scenarios, I believe we should be able to skip
history traversal and simply grab all objects and repack
them into a new file?
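To make the four scenarios a bit more concrete, here is a
rough Python sketch of how the ref side of the detection
might look. It assumes we have the refs recorded at the last
full repack and the current refs as plain name -> sha
mappings. The function name, the category labels and the
is_ancestor callback are all made up for illustration; in a
real repo the callback could be backed by something like
"git merge-base --is-ancestor <old> <new>", which is itself
a (usually short) history walk. The sketch only looks at
refs; it does not check whether unreferenced objects were
added, which the scenarios above also depend on.

```python
from typing import Callable, Dict


def classify_ref_changes(
    old_refs: Dict[str, str],
    new_refs: Dict[str, str],
    is_ancestor: Callable[[str, str], bool],
) -> str:
    """Classify how refs moved since the last full repack.

    Returns one of (hypothetical labels):
      "unchanged"    -> candidate for scenario 1
      "new-refs"     -> candidate for scenario 2
      "fast-forward" -> candidate for scenarios 3 and 4
      "other"        -> anything else; do a normal repack
    """
    removed = old_refs.keys() - new_refs.keys()
    added = new_refs.keys() - old_refs.keys()
    changed = [
        name
        for name in old_refs.keys() & new_refs.keys()
        if old_refs[name] != new_refs[name]
    ]

    # A deleted ref may leave objects unreachable.
    if removed:
        return "other"
    # A rewind or rewrite may leave objects unreachable too.
    if not all(is_ancestor(old_refs[n], new_refs[n]) for n in changed):
        return "other"

    if not added and not changed:
        return "unchanged"
    if added and not changed:
        return "new-refs"
    if changed and not added:
        return "fast-forward"
    # Mixed additions and fast forwards are not covered by the
    # scenarios above, so stay conservative.
    return "other"
```

Anything classified as "other" would simply fall back to the
normal repack with full reachability calculations.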
-------------
Of the 4 scenarios above, it seems like #3 and #4 are very
common operations (#2 is perhaps even more common for
Gerrit)? If these scenarios can be reliably identified
somehow, then perhaps they could be used to reduce repacking
time for these scenarios, and later used as building blocks
to reduce repacking time for other related but slightly more
complicated scenarios (with reduced history walking instead
of none)?
For example to identify scenario 1, what if we kept a copy
of all refs and their shas used during a full repack along
with the newly repacked file? A simplistic approach would
store them in the same format as the packed-refs file as
pack-<sha>.refs. During repacking, if none of the refs have
changed and there are no new objects...
Then, if none of the refs have changed and there are new
objects, we can just throw the new objects away?
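As a rough illustration of the pack-<sha>.refs idea, here is
a small Python sketch that writes and reads a refs snapshot
in the same "<sha> <refname>" line format that packed-refs
uses, and compares it against the current refs. The function
names are made up, and how the current refs are gathered is
left out; this only shows the comparison step, not a
decision about what to do with new objects.

```python
from typing import Dict


def format_refs_snapshot(refs: Dict[str, str]) -> str:
    """Serialize refs as '<sha> <refname>' lines, sorted by name."""
    return "".join(f"{sha} {name}\n" for name, sha in sorted(refs.items()))


def parse_refs_snapshot(text: str) -> Dict[str, str]:
    """Parse '<sha> <refname>' lines into a name -> sha mapping.

    Comment lines ('#') and peeled-tag lines ('^<sha>'), as found
    in a packed-refs file, are skipped.
    """
    refs: Dict[str, str] = {}
    for line in text.splitlines():
        if not line or line.startswith("#") or line.startswith("^"):
            continue
        sha, name = line.split(" ", 1)
        refs[name] = sha
    return refs


def refs_unchanged(snapshot_text: str, current_refs: Dict[str, str]) -> bool:
    """True if the current refs match the snapshot exactly."""
    return parse_refs_snapshot(snapshot_text) == current_refs
```

The snapshot would be written next to the new pack at the
end of a full repack and consulted at the start of the next
one.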
...
I am going to stop here because this email is long enough
and I wanted to get some feedback on the ideas before
offering more solutions.
Thanks,
-Martin
--
The Qualcomm Innovation Center, Inc. is a member of Code
Aurora Forum, hosted by The Linux Foundation