* git repack vs git gc --aggressive
From: Felix Natter @ 2012-08-07 18:22 UTC
To: git

hello,

I read this:
http://metalinguist.wordpress.com/2007/12/06/the-woes-of-git-gc-aggressive-and-how-git-deltas-work/
where
  git repack -a -d --depth=250 --window=250
is mentioned as a (recommended) alternative to git gc --aggressive.

I am a bit confused, because the page also mentions that
git gc --aggressive is recommended when a repo has been imported using
git fast-import.

So my questions are:

1. is the above repack command (with --depth=500) safe? Of course I want
   to be absolutely sure that our repo will be consistent.
   Do I need another command ("git gc", "git prune") as well?

2. is it the right tool for the job or shall I use git gc --aggressive?

Thanks!
-- 
Felix Natter
* Re: git repack vs git gc --aggressive
From: Jeff King @ 2012-08-07 18:44 UTC
To: Felix Natter; +Cc: git

On Tue, Aug 07, 2012 at 08:22:21PM +0200, Felix Natter wrote:

> I read this:
> http://metalinguist.wordpress.com/2007/12/06/the-woes-of-git-gc-aggressive-and-how-git-deltas-work/
> where
>   git repack -a -d --depth=250 --window=250
> is mentioned as a (recommended) alternative to git gc --aggressive.

Note how old that post is. In fact, on the very same day it was posted,
the discussion on the mailing list resulted in this commit:

  commit 1c192f3442414a6ce83f9a524806fc26a0861d2d
  Author: Johannes Schindelin <Johannes.Schindelin@gmx.de>
  Date:   Thu Dec 6 12:03:38 2007 +0000

      gc --aggressive: make it really aggressive

      The default was not to change the window or depth at all.  As
      suggested by Jon Smirl, Linus Torvalds and others, default to
      --window=250 --depth=250

So the packing parameters are the same these days for either method.
Note that "git gc --aggressive" will also use "-f" to recompute all
deltas. This is more expensive, but gives git more flexibility if the
old deltas were sub-optimal (typically, this is the case if the existing
pack was generated by fast-import, which favors speed of import versus
coming up with an optimal storage pattern).

> So my questions are:
>
> 1. is the above repack command (with --depth=500) safe? Of course I want
>    to be absolutely sure that our repo will be consistent.
>    Do I need another command ("git gc", "git prune") as well?

Yes, it's safe. Changing the depth parameter can never lose data.
However, it's probably not a good idea for two reasons:

  1. It probably does nothing. You're not likely to hit a 500-depth
     delta chain (the point of the "250" in --aggressive is that it is
     already ridiculously high).

  2. Even if you did come up with a 500-depth delta chain, it may not be
     a good tradeoff. You might save a little bit of space, but keep in
     mind that to generate the object data, it means that git will have
     to follow a chain of 500 deltas to regenerate the object.

Of course, every workload is different. One can develop pathological
cases where --depth=500 saves a lot of space. But it's unlikely that it
is the case for a normal repository. You can always try both and see the
result.

In fact, I'd also test how just "git gc" behaves versus "git gc
--aggressive" for your repo. The former is much less expensive to run.
You really shouldn't need to be running "--aggressive" all the time, so
if you are looking at doing a nightly repack or similar, just "git gc"
is probably fine.

-Peff
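
[Editorial sketch, not part of the original message.] One way to "try both and see the result" is to run each command on a throwaway copy of the repository and compare the pack sizes. The function names `pack_size_kib` and `compare_gc` below are hypothetical helpers introduced only for illustration; the script assumes a git new enough to support `git -C` and reads the `size-pack` figure from `git count-objects -v`.

```shell
#!/bin/sh
# Sketch: compare "git gc" with "git gc --aggressive" on throwaway copies.
# pack_size_kib and compare_gc are illustrative names, not git commands.

# Print the size of all packs in repository $1, in KiB.
pack_size_kib() {
    git -C "$1" count-objects -v | awk '/^size-pack:/ { print $2 }'
}

# Copy repository $1 twice, gc each copy differently, report both sizes.
compare_gc() {
    src=$1
    work=$(mktemp -d) || return 1
    # --no-hardlinks makes real copies, so the two experiments
    # do not share object files with the source repository.
    git clone --quiet --bare --no-hardlinks "$src" "$work/plain.git" || return 1
    git clone --quiet --bare --no-hardlinks "$src" "$work/aggressive.git" || return 1
    git -C "$work/plain.git" gc --quiet
    git -C "$work/aggressive.git" gc --quiet --aggressive
    echo "git gc: $(pack_size_kib "$work/plain.git") KiB"
    echo "git gc --aggressive: $(pack_size_kib "$work/aggressive.git") KiB"
    rm -rf "$work"
}
```

Calling `compare_gc /path/to/repo` prints one line per variant. Size is only half of the tradeoff discussed above; the time each command takes can be observed by prefixing the call with `time`.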
* Re: git repack vs git gc --aggressive
From: Junio C Hamano @ 2012-08-07 19:05 UTC
To: Jeff King; +Cc: Felix Natter, git

Jeff King <peff@peff.net> writes:

> So the packing parameters are the same these days for either method.
> Note that "git gc --aggressive" will also use "-f" to recompute all
> deltas. This is more expensive, but gives git more flexibility if the
> old deltas were sub-optimal (typically, this is the case if the existing
> pack was generated by fast-import, which favors speed of import versus
> coming up with an optimal storage pattern).

Also your fetch often results in storing the pack received from the
other end straight to your local repository (with necessary objects
to complete the pack the other end did not send appended at the
end).  If the server side hasn't been packed with "-f", you will
inherit the badness until you repack with "-f".

> Of course, every workload is different. One can develop pathological
> cases where --depth=500 saves a lot of space. But it's unlikely that it
> is the case for a normal repository. You can always try both and see the
> result.

For a dataset where ridiculously large depth really is a win, these
objects would have to be reasonably large and cost of expanding the
base and then applying hundreds of delta to recover one object may
not be negligible.  The user should consider if he is willing to pay
the price every time he does a local Git operation.

> In fact, I'd also test how just "git gc" behaves versus "git gc
> --aggressive" for your repo. The former is much less expensive to run.
> You really shouldn't need to be running "--aggressive" all the time, so
> if you are looking at doing a nightly repack or similar, just "git gc"
> is probably fine.

As I am coming from "large depth is harmful" school, I would
recommend

 - "git repack -a -d -f" with large "--window" with reasonably short
   "--depth" once, and mark the result with .keep;

 - "git repack -a -d -f" once every several weeks; and

 - "git gc" or "git repack" (without any other options) daily.

and ignore "--aggressive" entirely.
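
[Editorial sketch, not part of the original message.] The three tiers recommended above could be wrapped as small shell functions for use from cron or by hand. The function names are hypothetical, and the concrete values `--window=250` and `--depth=32` are assumptions standing in for "large" and "reasonably short"; the message itself gives no numbers. Marking the resulting pack with .keep is a separate manual step and is not done here.

```shell
#!/bin/sh
# Sketch of the three-tier repack schedule; all function names are
# illustrative, and the window/depth values are assumed, not prescribed.

# Once: recompute all deltas with a large window and a short depth.
repack_thorough() {
    git -C "$1" repack -a -d -f -q --window=250 --depth=32
}

# Every several weeks: recompute all deltas with default parameters.
repack_full() {
    git -C "$1" repack -a -d -f -q
}

# Daily: cheap housekeeping that reuses existing deltas.
repack_daily() {
    git -C "$1" gc --quiet
}
```

For example, `repack_thorough /srv/git/project.git` (a made-up path) would be run once after an import, with `repack_daily` scheduled nightly.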
* Re: git repack vs git gc --aggressive
From: Felix Natter @ 2012-08-10 19:09 UTC
To: git

Junio C Hamano <gitster@pobox.com> writes:

> Jeff King <peff@peff.net> writes:
>
>> So the packing parameters are the same these days for either method.
>> Note that "git gc --aggressive" will also use "-f" to recompute all
>> deltas. This is more expensive, but gives git more flexibility if the
>> old deltas were sub-optimal (typically, this is the case if the existing
>> pack was generated by fast-import, which favors speed of import versus
>> coming up with an optimal storage pattern).
>
> Also your fetch often results in storing the pack received from the
> other end straight to your local repository (with necessary objects
> to complete the pack the other end did not send appended at the
> end).  If the server side hasn't been packed with "-f", you will
> inherit the badness until you repack with "-f".
>
>> Of course, every workload is different. One can develop pathological
>> cases where --depth=500 saves a lot of space. But it's unlikely that it
>> is the case for a normal repository. You can always try both and see the
>> result.
>
> For a dataset where ridiculously large depth really is a win, these
> objects would have to be reasonably large and cost of expanding the
> base and then applying hundreds of delta to recover one object may
> not be negligible.  The user should consider if he is willing to pay
> the price every time he does a local Git operation.
>
>> In fact, I'd also test how just "git gc" behaves versus "git gc
>> --aggressive" for your repo. The former is much less expensive to run.
>> You really shouldn't need to be running "--aggressive" all the time, so
>> if you are looking at doing a nightly repack or similar, just "git gc"
>> is probably fine.

Thank you both very much for your answers! I have a few questions about
this:

> As I am coming from "large depth is harmful" school, I would
> recommend
>
>  - "git repack -a -d -f" with large "--window" with reasonably short
>    "--depth" once,

So something like --depth=250 and --window=500?

>    and mark the result with .keep;

I guess you refer to a toplevel '.keep' file. But what does that do
(sorry, couldn't find anything on google)?

>  - "git repack -a -d -f" once every several weeks; and
>
>  - "git gc" or "git repack" (without any other options) daily.
>
> and ignore "--aggressive" entirely.

One more question: I use bzr fast-export | git fast-import to import
branches from bzr:

  bzr fast-export --marks=$MARKS_BZR --git-branch="$BRANCHNAME" "$BZR_FREEPLANE_REPO/$BRANCHNAME/" | \
  git fast-import --import-marks=$MARKS_GIT --export-marks=$MARKS_GIT

Will those marks files (which remember which commits are already there
in the git repo) also work after I have done git repack / git gc? In
other words, can I import bzr-branches after I have run git repack /
git gc on the repo?

Thank you!
-- 
Felix Natter
* Re: git repack vs git gc --aggressive
From: Junio C Hamano @ 2012-08-10 20:09 UTC
To: Felix Natter; +Cc: git

Felix Natter <fnatter@gmx.net> writes:

> I have a few questions about this:
>
>> As I am coming from "large depth is harmful" school, I would
>> recommend
>>
>>  - "git repack -a -d -f" with large "--window" with reasonably short
>>    "--depth" once,
>
> So something like --depth=250 and --window=500?

I would use more like --depth=16 or 32 in my local repositories.

>>    and mark the result with .keep;
>
> I guess you refer to a toplevel '.keep' file.

Not at all.  And it is not documented, it seems X-<.

Typically you have a pair of files in .git/objects/pack, e.g.

  .git/objects/pack/pack-2e3e3b332b446278f9ff91c4f497bc6ed2626d00.idx
  .git/objects/pack/pack-2e3e3b332b446278f9ff91c4f497bc6ed2626d00.pack

And you can add another file next to them

  .git/objects/pack/pack-2e3e3b332b446278f9ff91c4f497bc6ed2626d00.keep

to prevent the pack from getting repacked.  I think "git clone" does
this for you after an initial import.
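
[Editorial sketch, not part of the original message.] Marking packs as described above amounts to creating an empty file with the same base name as the pack and a .keep suffix. The helper name `keep_packs` is hypothetical; it marks every pack currently present in the repository, on the assumption that it is called right after the one-time thorough repack, when there is exactly one pack.

```shell
#!/bin/sh
# Sketch: mark all existing packs of repository $1 with a .keep file.
# keep_packs is an illustrative name, not a git command.
keep_packs() {
    (
        cd "$1" || exit 1
        gitdir=$(git rev-parse --git-dir) || exit 1
        for pack in "$gitdir"/objects/pack/pack-*.pack; do
            # Skip the unexpanded glob when there are no packs at all.
            [ -e "$pack" ] || continue
            # An empty pack-<hash>.keep next to pack-<hash>.pack is enough.
            : > "${pack%.pack}.keep"
        done
    )
}
```

After `keep_packs .`, a later `git repack -a -d` should leave the marked pack in place and collect only newer objects into a separate pack. Removing the .keep file makes the pack eligible for repacking again.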
* Re: git repack vs git gc --aggressive
From: Marc Branchaud @ 2012-08-13 14:20 UTC
To: Junio C Hamano; +Cc: Felix Natter, git

On 12-08-10 04:09 PM, Junio C Hamano wrote:
> Felix Natter <fnatter@gmx.net> writes:
>
>> I have a few questions about this:
>>
>>> As I am coming from "large depth is harmful" school, I would
>>> recommend
>>>
>>>  - "git repack -a -d -f" with large "--window" with reasonably short
>>>    "--depth" once,
>>
>> So something like --depth=250 and --window=500?
>
> I would use more like --depth=16 or 32 in my local repositories.
>
>>>    and mark the result with .keep;
>>
>> I guess you refer to a toplevel '.keep' file.
>
> Not at all.  And it is not documented, it seems X-<.
>
> Typically you have a pair of files in .git/objects/pack, e.g.
>
>   .git/objects/pack/pack-2e3e3b332b446278f9ff91c4f497bc6ed2626d00.idx
>   .git/objects/pack/pack-2e3e3b332b446278f9ff91c4f497bc6ed2626d00.pack
>
> And you can add another file next to them
>
>   .git/objects/pack/pack-2e3e3b332b446278f9ff91c4f497bc6ed2626d00.keep
>
> to prevent the pack from getting repacked.  I think "git clone" does
> this for you after an initial import.

1.7.12.rc1 does not.  I even cloned from a repo with a few .keep files,
but ended up with only one big .pack file.

Maybe clone should preserve the packs it gets from the upstream repo?
For example, our main repo has a 690MB pack file that's marked .keep,
but the clone just ends up with a single 725MB pack file.  Would our
clones see performance improvements if they kept that big 690MB pack
separate from the others?

Perhaps the fact that clone creates a single pack file makes it
impossible to preserve the .keep packs from the upstream?

(I figure it's probably not a good idea for clone to .keep the single
pack file it creates.)

		M.
* Re: git repack vs git gc --aggressive
From: Junio C Hamano @ 2012-08-13 17:19 UTC
To: marcnarc; +Cc: Felix Natter, git

Marc Branchaud <mbranchaud@xiplink.com> writes:

> On 12-08-10 04:09 PM, Junio C Hamano wrote:
>> Felix Natter <fnatter@gmx.net> writes:
>>
>>> I have a few questions about this:
>>>
>>>> As I am coming from "large depth is harmful" school, I would
>>>> recommend
>>>>
>>>>  - "git repack -a -d -f" with large "--window" with reasonably short
>>>>    "--depth" once,
>>>
>>> So something like --depth=250 and --window=500?
>>
>> I would use more like --depth=16 or 32 in my local repositories.
>>
>>>>    and mark the result with .keep;
>>>
>>> I guess you refer to a toplevel '.keep' file.
>>
>> Not at all.  And it is not documented, it seems X-<.
>>
>> Typically you have a pair of files in .git/objects/pack, e.g.
>>
>>   .git/objects/pack/pack-2e3e3b332b446278f9ff91c4f497bc6ed2626d00.idx
>>   .git/objects/pack/pack-2e3e3b332b446278f9ff91c4f497bc6ed2626d00.pack
>>
>> And you can add another file next to them
>>
>>   .git/objects/pack/pack-2e3e3b332b446278f9ff91c4f497bc6ed2626d00.keep
>>
>> to prevent the pack from getting repacked.  I think "git clone" does
>> this for you after an initial import.
>
> 1.7.12.rc1 does not.

Sorry, I misremembered.

It was removed at 1db4a75 (Remove unnecessary pack-*.keep file after
successful git-clone, 2008-07-08), so even when the sender gave you
a crappy pack, you can repack locally to correct it.

> Maybe clone should preserve the packs it gets from the upstream repo?

That was part of the intention of the code 1db4a75 removed.

> For
> example, our main repo has a 690MB pack file that's marked .keep, but the
> clone just ends up with a single 725MB pack file.  Would our clones see
> performance improvements if they kept that big 690MB pack separate from
> the others?

There is no "pack boundary" in the object transfer protocol.  What
comes out of the wire is a single stream of pack data, so the above
is not feasible without major surgery and backward incompatible
change.