git.vger.kernel.org archive mirror
From: Martin Fick <mfick@codeaurora.org>
To: git@vger.kernel.org
Subject: Ideas to speed up repacking
Date: Mon, 2 Dec 2013 16:30:45 -0700
Message-ID: <201312021630.45767.mfick@codeaurora.org>

I wanted to explore the idea of exploiting knowledge about 
previous repacks to help speed up future repacks.  

I had various ideas that seemed like they might be good 
places to start, but things quickly got away from me.  
Mainly I wanted to focus on reducing and even sometimes 
eliminating reachability calculations, since that seems to 
be the one major unsolved slow piece during repacking.

My first line of thinking goes like this:  "After a full 
repack, reachability of the current refs is known.  Exploit 
that knowledge for future repacks."  There are some very 
simple scenarios where if we could figure out how to 
identify them reliably, I think we could simply avoid 
reachability calculations entirely, and yet end up with the 
same repacked files as if we had done the reachability 
calculations.  Let me outline some to see if they make sense 
as a starting place for further discussion.

-------------

* Setup 1:  

  Do a full repack.  All loose and packed objects are added 
to a single pack file (assumes git config repack options do 
not create multiple packs).

* Scenario 1:

  Start with Setup 1.  Nothing has changed in the repo 
contents (no new objects/packs, refs all the same), but 
repacking config options have changed (for example, the 
compression level has changed).

* Scenario 2:

  Start with Setup 1.  Add one new pack file that was 
pushed to the repo by adding a new ref to the repo (existing 
refs did not change).

* Scenario 3: 

  Start with Setup 1.  Add one new pack file that was 
pushed to the repo by updating an existing ref with a fast 
forward.

* Scenario 4:

  Start with Setup 1.  Add some loose objects to the repo 
via a local fast forward ref update (I am assuming this is 
possible without adding any new unreferenced objects?)


In all 4 scenarios, I believe we should be able to skip 
history traversal and simply grab all objects and repack 
them into a new file?

-------------

Of the 4 scenarios above, it seems like #3 and #4 are very 
common operations (#2 is perhaps even more common for 
Gerrit)?  If these scenarios can be reliably identified 
somehow, then perhaps they could be used to reduce repacking 
time for these scenarios, and later used as building blocks 
to reduce repacking time for other related but slightly more 
complicated scenarios (with reduced history walking instead 
of none)?
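
A rough sketch of how the scenarios might be told apart, 
assuming a record of the refs from the last full repack is 
available as a plain mapping.  The function name, the labels 
it returns, and the is_fast_forward callback are all made up 
for illustration; none of this exists in git.  In a real 
repo the callback could perhaps be backed by something like 
"git merge-base --is-ancestor", which itself walks some 
history, so this only narrows the problem.

```python
from typing import Callable, Dict

Refs = Dict[str, str]  # ref name -> object id (hypothetical representation)


def classify_ref_changes(
    old: Refs,
    new: Refs,
    is_fast_forward: Callable[[str, str], bool],
) -> str:
    """Label the difference between two ref snapshots.

    `old` is the set of refs recorded at the last full repack and
    `new` is the current set.  `is_fast_forward(old_id, new_id)` is a
    caller-supplied check (an assumption of this sketch).
    """
    deleted = old.keys() - new.keys()
    added = new.keys() - old.keys()
    updated = [r for r in old.keys() & new.keys() if old[r] != new[r]]

    # Deletions, or a mix of additions and updates, fall outside the
    # four simple scenarios.
    if deleted or (added and updated):
        return "other"
    if updated:
        # Scenarios 3 and 4: every changed ref moved forward.
        if all(is_fast_forward(old[r], new[r]) for r in updated):
            return "fast-forward"
        return "other"
    if added:
        # Scenario 2: only new refs, existing refs untouched.
        return "new-refs-only"
    # Scenario 1 candidate.  The refs alone cannot show whether new
    # objects or packs have appeared; that has to be checked separately.
    return "unchanged"
```

Anything labelled "other" would fall back to a normal repack 
with full reachability calculations.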

For example to identify scenario 1, what if we kept a copy 
of all refs and their shas used during a full repack along 
with the newly repacked file?  A simplistic approach would 
store them in the same format as the packed-refs file as 
pack-<sha>.refs.  During repacking, if none of the refs have 
changed and there are no new objects...  

Then, if none of the refs have changed and there are new 
objects, we can just throw the new objects away?
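
To make the "have any refs changed?" check concrete, here is 
a minimal sketch.  It assumes the snapshot is stored as text 
in the packed-refs format ("<sha> <refname>" lines, with "#" 
header lines and "^<sha>" peeled-tag lines, both ignored 
here), and that the caller can produce the current refs in 
the same format, loose refs included.  The function names 
are hypothetical.

```python
from typing import Dict


def parse_packed_refs(text: str) -> Dict[str, str]:
    """Parse packed-refs style text into {refname: sha}."""
    refs: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, the "# pack-refs with: ..." header, and
        # "^<sha>" lines that record the peeled value of a tag.
        if not line or line.startswith("#") or line.startswith("^"):
            continue
        sha, name = line.split(" ", 1)
        refs[name] = sha
    return refs


def refs_unchanged(snapshot_text: str, current_text: str) -> bool:
    """True if both texts describe exactly the same refs and shas."""
    return parse_packed_refs(snapshot_text) == parse_packed_refs(current_text)
```

Comparing the parsed mappings rather than the raw text means 
that line order and peeled lines do not affect the result.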

...

I am going to stop here because this email is long enough 
and I wanted to get some feedback on the ideas first before 
offering more solutions.

Thanks,

-Martin

-- 
The Qualcomm Innovation Center, Inc. is a member of Code 
Aurora Forum, hosted by The Linux Foundation