* WIth git-next, writing bitmaps fails when keep files are present @ 2014-01-23 2:38 Siddharth Agarwal 2014-01-23 20:36 ` Siddharth Agarwal 2014-01-23 22:52 ` [PATCH] pack-objects: turn off bitmaps when skipping objects Jeff King 0 siblings, 2 replies; 27+ messages in thread From: Siddharth Agarwal @ 2014-01-23 2:38 UTC (permalink / raw) To: git Running git-next, writing bitmap indexes fails if a keep file is present from an earlier pack. With git at b139ac2, the following commands demonstrate the problem: git init test cd test touch a git add a git commit -m "a" git repack -ad # generate a pack file for f in .git/objects/pack/*.pack; touch ${f/%pack/keep} # mark it as to keep touch b git add b git commit -m "b" git repack -adb This fails at the bitmap writing stage with something like: Counting objects: 2, done. Delta compression using up to 24 threads. Compressing objects: 100% (2/2), done. Writing objects: 100% (2/2), done. fatal: Failed to write bitmap index. Packfile doesn't have full closure (object 7388a015938147155b600eaacc59af6e78c75e5a is missing) In our case we have .keep files lying around from ages ago (possibly due to kill -9s run on the server). It also means that running repack -a with bitmap writing enabled on a repo becomes problematic if a fetch is run concurrently. Even if we practice good .keep hygiene, this seems like a bug in git that should be fixed. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: WIth git-next, writing bitmaps fails when keep files are present 2014-01-23 2:38 WIth git-next, writing bitmaps fails when keep files are present Siddharth Agarwal @ 2014-01-23 20:36 ` Siddharth Agarwal 2014-01-23 22:52 ` [PATCH] pack-objects: turn off bitmaps when skipping objects Jeff King 1 sibling, 0 replies; 27+ messages in thread From: Siddharth Agarwal @ 2014-01-23 20:36 UTC (permalink / raw) To: git On 01/22/2014 06:38 PM, Siddharth Agarwal wrote: > In our case we have .keep files lying around from ages ago (possibly > due to kill -9s run on the server). It also means that running repack > -a with bitmap writing enabled on a repo becomes problematic if a > fetch is run concurrently. We briefly discussed locking our repos while the repack was run, but the repo that would benefit the most from repacks cannot be locked to pushes for even a tenth of the time that repack takes on it. ^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH] pack-objects: turn off bitmaps when skipping objects 2014-01-23 2:38 WIth git-next, writing bitmaps fails when keep files are present Siddharth Agarwal 2014-01-23 20:36 ` Siddharth Agarwal @ 2014-01-23 22:52 ` Jeff King 2014-01-23 23:45 ` Siddharth Agarwal 1 sibling, 1 reply; 27+ messages in thread From: Jeff King @ 2014-01-23 22:52 UTC (permalink / raw) To: Siddharth Agarwal; +Cc: git On Wed, Jan 22, 2014 at 06:38:57PM -0800, Siddharth Agarwal wrote: > Running git-next, writing bitmap indexes fails if a keep file is > present from an earlier pack. Right, that's expected. The bitmap format cannot represent objects that are not present in the pack. So we cannot write a bitmap index if any object reachable from a packed commit is omitted from the pack. We could be nicer and downgrade it to a warning, though. The patch below does that. > In our case we have .keep files lying around from ages ago (possibly > due to kill -9s run on the server). We ran into that problem at GitHub, too. We just turn off `--honor-pack-keep` during our repacks, as we never want them on anyway (and we would prefer to ignore the .keep than to abort the bitmap). > It also means that running repack -a with bitmap writing enabled on a > repo becomes problematic if a fetch is run concurrently. For the most part, no. The .keep file should generally only be set during the period between indexing the pack and updating the refs (so while checking connectivity and running hooks). But pack-objects starts from the ref tips and walks backwards. Until they are updated, it will not try to pack the objects in the .keep files, as nobody references them. There are two loopholes, though: 1. In some instances, a remote may send an object we already have (e.g., because it is a blob referenced in an old commit, but newly referenced again due to a revert; we do not do a full object difference during the protocol negotiation, for reasons of efficiency). If that is the case, we may omit it if pack-objects starts during the period that the .pack and .keep files exist. 2. Once the fetch updates the refs, it removes the .keep file. But this isn't atomic. A repack which starts between the two may pick up the new ref values, but also see the .keep file. These are both unlikely, but possible on a very busy repository. The patch below will downgrade each to a warning, rather than aborting the repack. So this should just work out of the box with this patch. But if bitmaps are important to you (say, you are running a very busy site and want to make sure you always have bitmaps turned on) and you do not otherwise care about .keep files, you may want to disable them, too. -Peff -- >8 -- Subject: pack-objects: turn off bitmaps when skipping objects The pack bitmap format requires that we have a single bit for each object in the pack, and that each object's bitmap represents its complete set of reachable objects. Therefore we have no way to represent the bitmap of an object which references objects outside the pack. We notice this problem while generating the bitmaps, as we try to find the offset of a particular object and realize that we do not have it. In this case we die, and neither the bitmap nor the pack is generated. This is correct, but perhaps a little unfriendly. If you have bitmaps turned on in the config, many repacks will fail which would otherwise succeed. E.g., incremental repacks, repacks with "-l" when you have alternates, ".keep" files. Instead, this patch notices early that we are omitting some objects from the pack and turns off bitmaps (with a warning). Note that this is not strictly correct, as it's possible that the object being omitted is not reachable from any other object in the pack. In practice, this is almost never the case, and there are two advantages to doing it this way: 1. The code is much simpler, as we do not have to cleanly abort the bitmap-generation process midway through. 2. We do not waste time partially generating bitmaps only to find out that some object deep in the history is not being packed. Signed-off-by: Jeff King <peff@peff.net> --- I tried to keep the warning to an 80-character line without making it too confusing. Suggestions welcome if it doesn't make sense to people. builtin/pack-objects.c | 12 +++++++++++- t/t5310-pack-bitmaps.sh | 5 ++++- 2 files changed, 15 insertions(+), 2 deletions(-) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 8364fbd..76831d9 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -1000,6 +1000,10 @@ static void create_object_entry(const unsigned char *sha1, entry->no_try_delta = no_try_delta; } +static const char no_closure_warning[] = N_( +"disabling bitmap writing, as some objects are not being packed" +); + static int add_object_entry(const unsigned char *sha1, enum object_type type, const char *name, int exclude) { @@ -1010,8 +1014,14 @@ static int add_object_entry(const unsigned char *sha1, enum object_type type, if (have_duplicate_entry(sha1, exclude, &index_pos)) return 0; - if (!want_object_in_pack(sha1, exclude, &found_pack, &found_offset)) + if (!want_object_in_pack(sha1, exclude, &found_pack, &found_offset)) { + /* The pack is missing an object, so it will not have closure */ + if (write_bitmap_index) { + warning(_(no_closure_warning)); + write_bitmap_index = 0; + } return 0; + } create_object_entry(sha1, type, pack_name_hash(name), exclude, name && no_try_delta(name), diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh index d3a3afa..f13525c 100755 --- a/t/t5310-pack-bitmaps.sh +++ b/t/t5310-pack-bitmaps.sh @@ -91,7 +91,10 @@ test_expect_success 'fetch (partial bitmap)' ' test_expect_success 'incremental repack cannot create bitmaps' ' test_commit more-1 && - test_must_fail git repack -d + find .git/objects/pack -name "*.bitmap" >expect && + git repack -d && + find .git/objects/pack -name "*.bitmap" >actual && + test_cmp expect actual ' test_expect_success 'incremental repack can disable bitmaps' ' -- 1.8.5.2.500.g8060133 ^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH] pack-objects: turn off bitmaps when skipping objects 2014-01-23 22:52 ` [PATCH] pack-objects: turn off bitmaps when skipping objects Jeff King @ 2014-01-23 23:45 ` Siddharth Agarwal 2014-01-23 23:53 ` Siddharth Agarwal 2014-01-23 23:56 ` [PATCH] pack-objects: turn off bitmaps when skipping objects Vicent Martí 0 siblings, 2 replies; 27+ messages in thread From: Siddharth Agarwal @ 2014-01-23 23:45 UTC (permalink / raw) To: Jeff King; +Cc: git On 01/23/2014 02:52 PM, Jeff King wrote: > Right, that's expected. > > The bitmap format cannot represent objects that are not present in the > pack. So we cannot write a bitmap index if any object reachable from a > packed commit is omitted from the pack. > > We could be nicer and downgrade it to a warning, though. The patch below > does that. This makes sense. >> In our case we have .keep files lying around from ages ago (possibly >> due to kill -9s run on the server). > We ran into that problem at GitHub, too. We just turn off > `--honor-pack-keep` during our repacks, as we never want them on anyway > (and we would prefer to ignore the .keep than to abort the bitmap). Yes, we'd prefer to do that too. How do you actually do this, though? I don't see a way to pass `--honor-pack-keep` (shouldn't I pass in its inverse?) down to `git-pack-objects`. >> It also means that running repack -a with bitmap writing enabled on a >> repo becomes problematic if a fetch is run concurrently. > For the most part, no. The .keep file should generally only be set > during the period between indexing the pack and updating the refs (so > while checking connectivity and running hooks). But pack-objects starts > from the ref tips and walks backwards. Until they are updated, it will > not try to pack the objects in the .keep files, as nobody references > them. The worry is less certain objects not being packed and more the old packs being deleted by git repack, isn't it? From the man page for git-index-pack: --keep Before moving the index into its final destination create an empty .keep file for the associated pack file. This option is usually necessary with --stdin to prevent a simultaneous git repack process from deleting the newly constructed pack and index before refs can be updated to use objects contained in the pack. I could be misunderstanding things here, though. From the description in the man page it's not clear what the actual failure mode here is. > There are two loopholes, though: > > 1. In some instances, a remote may send an object we already have > (e.g., because it is a blob referenced in an old commit, but newly > referenced again due to a revert; we do not do a full object > difference during the protocol negotiation, for reasons of > efficiency). If that is the case, we may omit it if pack-objects > starts during the period that the .pack and .keep files exist. > > 2. Once the fetch updates the refs, it removes the .keep file. But > this isn't atomic. A repack which starts between the two may pick > up the new ref values, but also see the .keep file. > > These are both unlikely, but possible on a very busy repository. The > patch below will downgrade each to a warning, rather than aborting the > repack. > > So this should just work out of the box with this patch. But if bitmaps > are important to you (say, you are running a very busy site and want > to make sure you always have bitmaps turned on) and you do not otherwise > care about .keep files, you may want to disable them, too. We need to make sure bitmaps are always turned on, but we need to be even more certain that pushes don't fail due to races. > -Peff > > -- >8 -- > Subject: pack-objects: turn off bitmaps when skipping objects > > The pack bitmap format requires that we have a single bit > for each object in the pack, and that each object's bitmap > represents its complete set of reachable objects. Therefore > we have no way to represent the bitmap of an object which > references objects outside the pack. > > We notice this problem while generating the bitmaps, as we > try to find the offset of a particular object and realize > that we do not have it. In this case we die, and neither the > bitmap nor the pack is generated. This is correct, but > perhaps a little unfriendly. If you have bitmaps turned on > in the config, many repacks will fail which would otherwise > succeed. E.g., incremental repacks, repacks with "-l" when > you have alternates, ".keep" files. > > Instead, this patch notices early that we are omitting some > objects from the pack and turns off bitmaps (with a > warning). Note that this is not strictly correct, as it's > possible that the object being omitted is not reachable from > any other object in the pack. In practice, this is almost > never the case, and there are two advantages to doing it > this way: > > 1. The code is much simpler, as we do not have to cleanly > abort the bitmap-generation process midway through. > > 2. We do not waste time partially generating bitmaps only > to find out that some object deep in the history is not > being packed. > > Signed-off-by: Jeff King <peff@peff.net> > --- > I tried to keep the warning to an 80-character line without making it > too confusing. Suggestions welcome if it doesn't make sense to people. > > builtin/pack-objects.c | 12 +++++++++++- > t/t5310-pack-bitmaps.sh | 5 ++++- > 2 files changed, 15 insertions(+), 2 deletions(-) > > diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c > index 8364fbd..76831d9 100644 > --- a/builtin/pack-objects.c > +++ b/builtin/pack-objects.c > @@ -1000,6 +1000,10 @@ static void create_object_entry(const unsigned char *sha1, > entry->no_try_delta = no_try_delta; > } > > +static const char no_closure_warning[] = N_( > +"disabling bitmap writing, as some objects are not being packed" > +); > + > static int add_object_entry(const unsigned char *sha1, enum object_type type, > const char *name, int exclude) > { > @@ -1010,8 +1014,14 @@ static int add_object_entry(const unsigned char *sha1, enum object_type type, > if (have_duplicate_entry(sha1, exclude, &index_pos)) > return 0; > > - if (!want_object_in_pack(sha1, exclude, &found_pack, &found_offset)) > + if (!want_object_in_pack(sha1, exclude, &found_pack, &found_offset)) { > + /* The pack is missing an object, so it will not have closure */ > + if (write_bitmap_index) { > + warning(_(no_closure_warning)); > + write_bitmap_index = 0; > + } > return 0; > + } > > create_object_entry(sha1, type, pack_name_hash(name), > exclude, name && no_try_delta(name), > diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh > index d3a3afa..f13525c 100755 > --- a/t/t5310-pack-bitmaps.sh > +++ b/t/t5310-pack-bitmaps.sh > @@ -91,7 +91,10 @@ test_expect_success 'fetch (partial bitmap)' ' > > test_expect_success 'incremental repack cannot create bitmaps' ' > test_commit more-1 && > - test_must_fail git repack -d > + find .git/objects/pack -name "*.bitmap" >expect && > + git repack -d && > + find .git/objects/pack -name "*.bitmap" >actual && > + test_cmp expect actual > ' > > test_expect_success 'incremental repack can disable bitmaps' ' ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] pack-objects: turn off bitmaps when skipping objects 2014-01-23 23:45 ` Siddharth Agarwal @ 2014-01-23 23:53 ` Siddharth Agarwal 2014-01-24 2:28 ` Jeff King 2014-01-23 23:56 ` [PATCH] pack-objects: turn off bitmaps when skipping objects Vicent Martí 1 sibling, 1 reply; 27+ messages in thread From: Siddharth Agarwal @ 2014-01-23 23:53 UTC (permalink / raw) To: Jeff King; +Cc: git On 01/23/2014 03:45 PM, Siddharth Agarwal wrote: > > The worry is less certain objects not being packed and more the old > packs being deleted by git repack, isn't it? From the man page for > git-index-pack: This should probably be "new pack" and not "old packs", I guess. Not knowing much about how this actually works, I'm assuming the scenario here is something like: (1) git receive-pack receives a pack P.pack and writes it to disk (2) git index-pack runs on P.pack (3) git repack runs separately, finds pack P.pack with no refs pointing to it, and deletes it (4) everything goes wrong With a keep file, this would be averted because (1) git receive-pack receives a pack P.pack and writes it to disk (2) git index-pack writes a keep file for P.pack, called P.keep (3) git repack runs separately, finds pack P.pack with a keep file, doesn't touch it (4) git index-pack finishes, and something updates refs to point to P.pack and deletes P.keep ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] pack-objects: turn off bitmaps when skipping objects 2014-01-23 23:53 ` Siddharth Agarwal @ 2014-01-24 2:28 ` Jeff King 2014-01-24 2:44 ` Siddharth Agarwal 0 siblings, 1 reply; 27+ messages in thread From: Jeff King @ 2014-01-24 2:28 UTC (permalink / raw) To: Siddharth Agarwal; +Cc: git On Thu, Jan 23, 2014 at 03:53:28PM -0800, Siddharth Agarwal wrote: > On 01/23/2014 03:45 PM, Siddharth Agarwal wrote: > > > >The worry is less certain objects not being packed and more the old > >packs being deleted by git repack, isn't it? From the man page for > >git-index-pack: > > This should probably be "new pack" and not "old packs", I guess. Not > knowing much about how this actually works, I'm assuming the scenario > here is something like: > > (1) git receive-pack receives a pack P.pack and writes it to disk > (2) git index-pack runs on P.pack > (3) git repack runs separately, finds pack P.pack with no refs > pointing to it, and deletes it > (4) everything goes wrong > > With a keep file, this would be averted because > > (1) git receive-pack receives a pack P.pack and writes it to disk > (2) git index-pack writes a keep file for P.pack, called P.keep > (3) git repack runs separately, finds pack P.pack with a keep file, > doesn't touch it > (4) git index-pack finishes, and something updates refs to point to > P.pack and deletes P.keep I think your understanding is accurate here. So we want repack to respect keep files for deletion, but we _not_ necessarily want pack-objects to avoid packing an object just because it's in a pack marked by .keep (see my other email). -Peff ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] pack-objects: turn off bitmaps when skipping objects 2014-01-24 2:28 ` Jeff King @ 2014-01-24 2:44 ` Siddharth Agarwal 2014-01-28 6:09 ` [PATCH] repack: add `repack.honorpackkeep` config var Jeff King 0 siblings, 1 reply; 27+ messages in thread From: Siddharth Agarwal @ 2014-01-24 2:44 UTC (permalink / raw) To: Jeff King; +Cc: git On 01/23/2014 06:28 PM, Jeff King wrote: > I think your understanding is accurate here. So we want repack to > respect keep files for deletion, but we _not_ necessarily want > pack-objects to avoid packing an object just because it's in a pack > marked by .keep (see my other email). Yes, that makes sense and sounds pretty safe. So the right solution for us probably is to apply the patch Vicent posted, set repack.honorpackkeep to false, and also have a cron job that cleans up stale .keep files so that subsequent repacks clean it up. ^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH] repack: add `repack.honorpackkeep` config var 2014-01-24 2:44 ` Siddharth Agarwal @ 2014-01-28 6:09 ` Jeff King 2014-01-28 9:21 ` Junio C Hamano 0 siblings, 1 reply; 27+ messages in thread From: Jeff King @ 2014-01-28 6:09 UTC (permalink / raw) To: Siddharth Agarwal; +Cc: Vicent Marti, Junio C Hamano, git On Thu, Jan 23, 2014 at 06:44:43PM -0800, Siddharth Agarwal wrote: > On 01/23/2014 06:28 PM, Jeff King wrote: > >I think your understanding is accurate here. So we want repack to > >respect keep files for deletion, but we _not_ necessarily want > >pack-objects to avoid packing an object just because it's in a pack > >marked by .keep (see my other email). > > Yes, that makes sense and sounds pretty safe. > > So the right solution for us probably is to apply the patch Vicent > posted, set repack.honorpackkeep to false, and also have a cron job > that cleans up stale .keep files so that subsequent repacks clean it > up. Yes, that matches what we do at GitHub. Here's Vicent's patch, with documentation and an expanded commit message. I think it should be suitable for upstream git. -- >8 -- From: Vicent Marti <tanoku@gmail.com> Subject: repack: add `repack.honorpackkeep` config var The git-repack command always passes `--honor-pack-keep` to pack-objects. This has traditionally been a good thing, as we do not want to duplicate those objects in a new pack, and we are not going to delete the old pack. However, when bitmaps are in use, it is important for a full repack to include all reachable objects, even if they may be duplicated in a .keep pack. Otherwise, we cannot generate the bitmaps, as the on-disk format requires the set of objects in the pack to be fully closed. Even if the repository does not generally have .keep files, a simultaneous push could cause a race condition in which a .keep file exists at the moment of a repack. The repack may try to include those objects in one of two situations: 1. The pushed .keep pack contains objects that were already in the repository (e.g., blobs due to a revert of an old commit). 2. Receive-pack updates the refs, making the objects reachable, but before it removes the .keep file, the repack runs. In either case, we may prefer to duplicate some objects in the new, full pack, and let the next repack (after the .keep file is cleaned up) take care of removing them. This patch introduces an option to disable the `--honor-pack-keep` option. It is not triggered by default, even when pack.writeBitmaps is turned on, because its use depends on your overall packing strategy and use of .keep files. Note that this option just disables the pack-objects behavior. We still leave packs with a .keep in place, as we do not necessarily know that we have duplicated all of their objects. Signed-off-by: Jeff King <peff@peff.net> --- Intended for the jk/pack-bitmap topic. Documentation/config.txt | 8 ++++++++ builtin/repack.c | 8 +++++++- 2 files changed, 15 insertions(+), 1 deletion(-) diff --git a/Documentation/config.txt b/Documentation/config.txt index 947e6f8..5036a10 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -2128,6 +2128,14 @@ repack.usedeltabaseoffset:: "false" and repack. Access from old Git versions over the native protocol are unaffected by this option. +repack.honorPackKeep:: + If set to false, include objects in `.keep` files when repacking + via `git repack`. Note that we still do not delete `.keep` packs + after `pack-objects` finishes. This means that we may duplicate + objects, but this makes the option safe to use when there are + concurrent pushes or fetches. This option is generally only + useful if you have set `pack.writeBitmaps`. Defaults to true. + rerere.autoupdate:: When set to true, `git-rerere` updates the index with the resulting contents after it cleanly resolves conflicts using diff --git a/builtin/repack.c b/builtin/repack.c index a9c4593..585c41d 100644 --- a/builtin/repack.c +++ b/builtin/repack.c @@ -9,6 +9,7 @@ #include "argv-array.h" static int delta_base_offset = 1; +static int honor_pack_keep = 1; static char *packdir, *packtmp; static const char *const git_repack_usage[] = { @@ -22,6 +23,10 @@ static int repack_config(const char *var, const char *value, void *cb) delta_base_offset = git_config_bool(var, value); return 0; } + if (!strcmp(var, "repack.honorpackkeep")) { + honor_pack_keep = git_config_bool(var, value); + return 0; + } return git_default_config(var, value, cb); } @@ -190,10 +195,11 @@ int cmd_repack(int argc, const char **argv, const char *prefix) argv_array_push(&cmd_args, "pack-objects"); argv_array_push(&cmd_args, "--keep-true-parents"); - argv_array_push(&cmd_args, "--honor-pack-keep"); argv_array_push(&cmd_args, "--non-empty"); argv_array_push(&cmd_args, "--all"); argv_array_push(&cmd_args, "--reflog"); + if (honor_pack_keep) + argv_array_push(&cmd_args, "--honor-pack-keep"); if (window) argv_array_pushf(&cmd_args, "--window=%u", window); if (window_memory) -- 1.8.5.2.500.g8060133 ^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH] repack: add `repack.honorpackkeep` config var 2014-01-28 6:09 ` [PATCH] repack: add `repack.honorpackkeep` config var Jeff King @ 2014-01-28 9:21 ` Junio C Hamano 2014-02-24 8:24 ` Jeff King 0 siblings, 1 reply; 27+ messages in thread From: Junio C Hamano @ 2014-01-28 9:21 UTC (permalink / raw) To: Jeff King; +Cc: Siddharth Agarwal, Vicent Marti, git Jeff King <peff@peff.net> writes: > The git-repack command always passes `--honor-pack-keep` > to pack-objects. This has traditionally been a good thing, > as we do not want to duplicate those objects in a new pack, > and we are not going to delete the old pack. > ... > Note that this option just disables the pack-objects > behavior. We still leave packs with a .keep in place, as we > do not necessarily know that we have duplicated all of their > objects. > > Signed-off-by: Jeff King <peff@peff.net> > --- > Intended for the jk/pack-bitmap topic. Two comments. - It seems that this adds a configuration variable that cannot be countermanded from the command line. It has to come with either a very good justification in the documentation describing why it is a bad idea to even allow overriding from the command line in a repository that sets it, or a command line option to let the users override it. I personally prefer the latter, because that will be one less thing for users to remember (i.e. "usually you can override the configured default from the command line, but this variable cannot be because of these very good reasons"). - In the context of "pack-objects", the name "--honor-pack-keep" makes sense; it is understood that pack-objects will _not_ remove kept packfile, so "honoring" can only mean "do not attempt to pick objects out of kept packs to add to the pack being generated." and there is no room for --no-honor-pack-keep to be mistaken as "you canremove the ones marked to be kept after saving the still-used objects in it away." But does the same name make sense in the context of "repack"? Thanks. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] repack: add `repack.honorpackkeep` config var 2014-01-28 9:21 ` Junio C Hamano @ 2014-02-24 8:24 ` Jeff King 2014-02-24 19:10 ` Junio C Hamano 0 siblings, 1 reply; 27+ messages in thread From: Jeff King @ 2014-02-24 8:24 UTC (permalink / raw) To: Junio C Hamano; +Cc: Siddharth Agarwal, Vicent Marti, git On Tue, Jan 28, 2014 at 01:21:43AM -0800, Junio C Hamano wrote: > Jeff King <peff@peff.net> writes: > > > The git-repack command always passes `--honor-pack-keep` > > to pack-objects. This has traditionally been a good thing, > > as we do not want to duplicate those objects in a new pack, > > and we are not going to delete the old pack. > > ... > > Note that this option just disables the pack-objects > > behavior. We still leave packs with a .keep in place, as we > > do not necessarily know that we have duplicated all of their > > objects. > > > > Signed-off-by: Jeff King <peff@peff.net> > > --- > > Intended for the jk/pack-bitmap topic. > > Two comments. Sorry, this one slipped through the cracks. Here's a re-roll addressing your comments. > - It seems that this adds a configuration variable that cannot be > countermanded from the command line. It has to come with either a > very good justification in the documentation describing why it is > a bad idea to even allow overriding from the command line in a > repository that sets it, or a command line option to let the > users override it. I personally prefer the latter, because that > will be one less thing for users to remember (i.e. "usually you > can override the configured default from the command line, but > this variable cannot be because of these very good reasons"). It was less "it is a bad idea to override" and more "I cannot conceive of any good reason to override". And since you can always use "git -c", I didn't think it was worth cluttering repack's options. However, I suppose if one were explicitly bitmapping a single invocation with `git repack -adb`, it might make sense to use it on the command line. Fixed in the re-roll. > - In the context of "pack-objects", the name "--honor-pack-keep" > makes sense; it is understood that pack-objects will _not_ remove > kept packfile, so "honoring" can only mean "do not attempt to > pick objects out of kept packs to add to the pack being > generated." and there is no room for --no-honor-pack-keep to be > mistaken as "you canremove the ones marked to be kept after > saving the still-used objects in it away." > > But does the same name make sense in the context of "repack"? I think the distinction you are making is to capture the second second from the docs: If set to false, include objects in `.keep` files when repacking via `git repack`. Note that we still do not delete `.keep` packs after `pack-objects` finishes. The best name I could come up with is "--pack-keep-objects", since that is literally what it is doing. I'm not wild about the name because it is easy to read "keep" as a verb (and "pack" as a noun). I think it's OK, but suggestions are welcome. -- >8 -- From: Vicent Marti <tanoku@gmail.com> Subject: repack: add `repack.packKeepObjects` config var The git-repack command always passes `--honor-pack-keep` to pack-objects. This has traditionally been a good thing, as we do not want to duplicate those objects in a new pack, and we are not going to delete the old pack. However, when bitmaps are in use, it is important for a full repack to include all reachable objects, even if they may be duplicated in a .keep pack. Otherwise, we cannot generate the bitmaps, as the on-disk format requires the set of objects in the pack to be fully closed. Even if the repository does not generally have .keep files, a simultaneous push could cause a race condition in which a .keep file exists at the moment of a repack. The repack may try to include those objects in one of two situations: 1. The pushed .keep pack contains objects that were already in the repository (e.g., blobs due to a revert of an old commit). 2. Receive-pack updates the refs, making the objects reachable, but before it removes the .keep file, the repack runs. In either case, we may prefer to duplicate some objects in the new, full pack, and let the next repack (after the .keep file is cleaned up) take care of removing them. This patch introduces an option to disable the `--honor-pack-keep` option. It is not triggered by default, even when pack.writeBitmaps is turned on, because its use depends on your overall packing strategy and use of .keep files. Note that this option just disables the pack-objects behavior. We still leave packs with a .keep in place, as we do not necessarily know that we have duplicated all of their objects. Signed-off-by: Jeff King <peff@peff.net> --- I added a test, too, mostly to make sure I didn't bungle the option, since it's negated from its former name. Documentation/config.txt | 5 +++++ Documentation/git-repack.txt | 8 ++++++++ builtin/repack.c | 10 +++++++++- t/t7700-repack.sh | 16 ++++++++++++++++ 4 files changed, 38 insertions(+), 1 deletion(-) diff --git a/Documentation/config.txt b/Documentation/config.txt index becbade..8ad081e 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -2136,6 +2136,11 @@ repack.usedeltabaseoffset:: "false" and repack. Access from old Git versions over the native protocol are unaffected by this option. +repack.packKeepObjects:: + If set to true, makes `git repack` act as if + `--pack-keep-objects` was passed. See linkgit:git-repack[1] for + details. Defaults to false. + rerere.autoupdate:: When set to true, `git-rerere` updates the index with the resulting contents after it cleanly resolves conflicts using diff --git a/Documentation/git-repack.txt b/Documentation/git-repack.txt index 002cfd5..0c1ffbd 100644 --- a/Documentation/git-repack.txt +++ b/Documentation/git-repack.txt @@ -117,6 +117,14 @@ other objects in that pack they already have locally. must be able to refer to all reachable objects. This option overrides the setting of `pack.writebitmaps`. +--pack-keep-objects:: + Include objects in `.keep` files when repacking. Note that we + still do not delete `.keep` packs after `pack-objects` finishes. + This means that we may duplicate objects, but this makes the + option safe to use when there are concurrent pushes or fetches. + This option is generally only useful if you are writing bitmaps + with `-b` or `pack.writebitmaps`, as it ensures that the + bitmapped packfile has the necessary objects. Configuration ------------- diff --git a/builtin/repack.c b/builtin/repack.c index 49f5857..0785a4e 100644 --- a/builtin/repack.c +++ b/builtin/repack.c @@ -9,6 +9,7 @@ #include "argv-array.h" static int delta_base_offset = 1; +static int pack_keep_objects; static char *packdir, *packtmp; static const char *const git_repack_usage[] = { @@ -22,6 +23,10 @@ static int repack_config(const char *var, const char *value, void *cb) delta_base_offset = git_config_bool(var, value); return 0; } + if (!strcmp(var, "repack.packkeepobjects")) { + pack_keep_objects = git_config_bool(var, value); + return 0; + } return git_default_config(var, value, cb); } @@ -175,6 +180,8 @@ int cmd_repack(int argc, const char **argv, const char *prefix) N_("limits the maximum delta depth")), OPT_STRING(0, "max-pack-size", &max_pack_size, N_("bytes"), N_("maximum size of each packfile")), + OPT_BOOL(0, "pack-keep-objects", &pack_keep_objects, + N_("repack objects in packs marked with .keep")), OPT_END() }; @@ -190,7 +197,8 @@ int cmd_repack(int argc, const char **argv, const char *prefix) argv_array_push(&cmd_args, "pack-objects"); argv_array_push(&cmd_args, "--keep-true-parents"); - argv_array_push(&cmd_args, "--honor-pack-keep"); + if (!pack_keep_objects) + argv_array_push(&cmd_args, "--honor-pack-keep"); argv_array_push(&cmd_args, "--non-empty"); argv_array_push(&cmd_args, "--all"); argv_array_push(&cmd_args, "--reflog"); diff --git a/t/t7700-repack.sh b/t/t7700-repack.sh index b45bd1e..13ca93c 100755 --- a/t/t7700-repack.sh +++ b/t/t7700-repack.sh @@ -35,6 +35,22 @@ test_expect_success 'objects in packs marked .keep are not repacked' ' test -z "$found_duplicate_object" ' +test_expect_success '--pack-keep-objects duplicates objects' ' + # build on $objsha1, $packsha1, and .keep state from previous + git repack -Adl --pack-keep-objects && + test_when_finished "found_duplicate_object=" && + for p in .git/objects/pack/*.idx; do + idx=$(basename $p) + test "pack-$packsha1.idx" = "$idx" && continue + if git verify-pack -v $p | egrep "^$objsha1"; then + found_duplicate_object=1 + echo "DUPLICATE OBJECT FOUND" + break + fi + done && + test "$found_duplicate_object" = 1 +' + test_expect_success 'loose objects in alternate ODB are not repacked' ' mkdir alt_objects && echo `pwd`/alt_objects > .git/objects/info/alternates && -- 1.8.5.2.500.g8060133 ^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH] repack: add `repack.honorpackkeep` config var 2014-02-24 8:24 ` Jeff King @ 2014-02-24 19:10 ` Junio C Hamano 2014-02-26 10:13 ` Jeff King 0 siblings, 1 reply; 27+ messages in thread From: Junio C Hamano @ 2014-02-24 19:10 UTC (permalink / raw) To: Jeff King; +Cc: Siddharth Agarwal, Vicent Marti, git Jeff King <peff@peff.net> writes: > Sorry, this one slipped through the cracks. Here's a re-roll addressing > your comments. > ... >> - In the context of "pack-objects", the name "--honor-pack-keep" >> makes sense; it is understood that pack-objects will _not_ remove >> kept packfile, so "honoring" can only mean "do not attempt to >> pick objects out of kept packs to add to the pack being >> generated." and there is no room for --no-honor-pack-keep to be >> mistaken as "you canremove the ones marked to be kept after >> saving the still-used objects in it away." >> >> But does the same name make sense in the context of "repack"? > > I think the distinction you are making is to capture the second second > from the docs: > > If set to false, include objects in `.keep` files when repacking via > `git repack`. Note that we still do not delete `.keep` packs after > `pack-objects` finishes. > > The best name I could come up with is "--pack-keep-objects", since that > is literally what it is doing. I'm not wild about the name because it is > easy to read "keep" as a verb (and "pack" as a noun). I think it's OK, > but suggestions are welcome. pack-kept-objects then? ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] repack: add `repack.honorpackkeep` config var 2014-02-24 19:10 ` Junio C Hamano @ 2014-02-26 10:13 ` Jeff King 2014-02-26 20:30 ` Junio C Hamano 0 siblings, 1 reply; 27+ messages in thread From: Jeff King @ 2014-02-26 10:13 UTC (permalink / raw) To: Junio C Hamano; +Cc: Siddharth Agarwal, Vicent Marti, git On Mon, Feb 24, 2014 at 11:10:49AM -0800, Junio C Hamano wrote: > > The best name I could come up with is "--pack-keep-objects", since that > > is literally what it is doing. I'm not wild about the name because it is > > easy to read "keep" as a verb (and "pack" as a noun). I think it's OK, > > but suggestions are welcome. > > pack-kept-objects then? Hmm. That does address my point above, but somehow the word "kept" feels awkward to me. I'm ambivalent between the two. Here's the "kept" version if you want to apply that. -- >8 -- From: Vicent Marti <tanoku@gmail.com> Subject: [PATCH] repack: add `repack.packKeptObjects` config var The git-repack command always passes `--honor-pack-keep` to pack-objects. This has traditionally been a good thing, as we do not want to duplicate those objects in a new pack, and we are not going to delete the old pack. However, when bitmaps are in use, it is important for a full repack to include all reachable objects, even if they may be duplicated in a .keep pack. Otherwise, we cannot generate the bitmaps, as the on-disk format requires the set of objects in the pack to be fully closed. Even if the repository does not generally have .keep files, a simultaneous push could cause a race condition in which a .keep file exists at the moment of a repack. The repack may try to include those objects in one of two situations: 1. The pushed .keep pack contains objects that were already in the repository (e.g., blobs due to a revert of an old commit). 2. Receive-pack updates the refs, making the objects reachable, but before it removes the .keep file, the repack runs. In either case, we may prefer to duplicate some objects in the new, full pack, and let the next repack (after the .keep file is cleaned up) take care of removing them. This patch introduces an option to disable the `--honor-pack-keep` option. It is not triggered by default, even when pack.writeBitmaps is turned on, because its use depends on your overall packing strategy and use of .keep files. Note that this option just disables the pack-objects behavior. We still leave packs with a .keep in place, as we do not necessarily know that we have duplicated all of their objects. Signed-off-by: Jeff King <peff@peff.net> --- Documentation/config.txt | 5 +++++ Documentation/git-repack.txt | 8 ++++++++ builtin/repack.c | 10 +++++++++- t/t7700-repack.sh | 16 ++++++++++++++++ 4 files changed, 38 insertions(+), 1 deletion(-) diff --git a/Documentation/config.txt b/Documentation/config.txt index becbade..3a3d84f 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -2136,6 +2136,11 @@ repack.usedeltabaseoffset:: "false" and repack. Access from old Git versions over the native protocol are unaffected by this option. +repack.packKeptObjects:: + If set to true, makes `git repack` act as if + `--pack-kept-objects` was passed. See linkgit:git-repack[1] for + details. Defaults to false. + rerere.autoupdate:: When set to true, `git-rerere` updates the index with the resulting contents after it cleanly resolves conflicts using diff --git a/Documentation/git-repack.txt b/Documentation/git-repack.txt index 002cfd5..4786a78 100644 --- a/Documentation/git-repack.txt +++ b/Documentation/git-repack.txt @@ -117,6 +117,14 @@ other objects in that pack they already have locally. must be able to refer to all reachable objects. This option overrides the setting of `pack.writebitmaps`. +--pack-kept-objects:: + Include objects in `.keep` files when repacking. Note that we + still do not delete `.keep` packs after `pack-objects` finishes. + This means that we may duplicate objects, but this makes the + option safe to use when there are concurrent pushes or fetches. + This option is generally only useful if you are writing bitmaps + with `-b` or `pack.writebitmaps`, as it ensures that the + bitmapped packfile has the necessary objects. Configuration ------------- diff --git a/builtin/repack.c b/builtin/repack.c index 49f5857..49947b2 100644 --- a/builtin/repack.c +++ b/builtin/repack.c @@ -9,6 +9,7 @@ #include "argv-array.h" static int delta_base_offset = 1; +static int pack_kept_objects; static char *packdir, *packtmp; static const char *const git_repack_usage[] = { @@ -22,6 +23,10 @@ static int repack_config(const char *var, const char *value, void *cb) delta_base_offset = git_config_bool(var, value); return 0; } + if (!strcmp(var, "repack.packkeptobjects")) { + pack_kept_objects = git_config_bool(var, value); + return 0; + } return git_default_config(var, value, cb); } @@ -175,6 +180,8 @@ int cmd_repack(int argc, const char **argv, const char *prefix) N_("limits the maximum delta depth")), OPT_STRING(0, "max-pack-size", &max_pack_size, N_("bytes"), N_("maximum size of each packfile")), + OPT_BOOL(0, "pack-kept-objects", &pack_kept_objects, + N_("repack objects in packs marked with .keep")), OPT_END() }; @@ -190,7 +197,8 @@ int cmd_repack(int argc, const char **argv, const char *prefix) argv_array_push(&cmd_args, "pack-objects"); argv_array_push(&cmd_args, "--keep-true-parents"); - argv_array_push(&cmd_args, "--honor-pack-keep"); + if (!pack_kept_objects) + argv_array_push(&cmd_args, "--honor-pack-keep"); argv_array_push(&cmd_args, "--non-empty"); argv_array_push(&cmd_args, "--all"); argv_array_push(&cmd_args, "--reflog"); diff --git a/t/t7700-repack.sh b/t/t7700-repack.sh index b45bd1e..f8431a8 100755 --- a/t/t7700-repack.sh +++ b/t/t7700-repack.sh @@ -35,6 +35,22 @@ test_expect_success 'objects in packs marked .keep are not repacked' ' test -z "$found_duplicate_object" ' +test_expect_success '--pack-kept-objects duplicates objects' ' + # build on $objsha1, $packsha1, and .keep state from previous + git repack -Adl --pack-kept-objects && + test_when_finished "found_duplicate_object=" && + for p in .git/objects/pack/*.idx; do + idx=$(basename $p) + test "pack-$packsha1.idx" = "$idx" && continue + if git verify-pack -v $p | egrep "^$objsha1"; then + found_duplicate_object=1 + echo "DUPLICATE OBJECT FOUND" + break + fi + done && + test "$found_duplicate_object" = 1 +' + test_expect_success 'loose objects in alternate ODB are not repacked' ' mkdir alt_objects && echo `pwd`/alt_objects > .git/objects/info/alternates && -- 1.8.5.2.500.g8060133 ^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH] repack: add `repack.honorpackkeep` config var 2014-02-26 10:13 ` Jeff King @ 2014-02-26 20:30 ` Junio C Hamano 2014-02-27 11:27 ` Jeff King 0 siblings, 1 reply; 27+ messages in thread From: Junio C Hamano @ 2014-02-26 20:30 UTC (permalink / raw) To: Jeff King; +Cc: Siddharth Agarwal, Vicent Marti, git Jeff King <peff@peff.net> writes: > On Mon, Feb 24, 2014 at 11:10:49AM -0800, Junio C Hamano wrote: > >> > The best name I could come up with is "--pack-keep-objects", since that >> > is literally what it is doing. I'm not wild about the name because it is >> > easy to read "keep" as a verb (and "pack" as a noun). I think it's OK, >> > but suggestions are welcome. >> >> pack-kept-objects then? > > Hmm. That does address my point above, but somehow the word "kept" feels > awkward to me. I'm ambivalent between the two. That word does make my backside somewhat itchy ;-) Would it help to take a step back and think what the option really does? Perhaps we should call it --pack-all-objects, which is short for --pack-all-objectsregardless-of-where-they-currently-are-stored, or something? The word "all" gives a wrong connotation in a different way (e.g. "regardless of reachability" is a possible wrong interpretation), so that does not sound too good, either. "--repack-kept-objects"? "--include-kept-objects"? ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] repack: add `repack.honorpackkeep` config var 2014-02-26 20:30 ` Junio C Hamano @ 2014-02-27 11:27 ` Jeff King 2014-02-27 18:04 ` Junio C Hamano 0 siblings, 1 reply; 27+ messages in thread From: Jeff King @ 2014-02-27 11:27 UTC (permalink / raw) To: Junio C Hamano; +Cc: Siddharth Agarwal, Vicent Marti, git On Wed, Feb 26, 2014 at 12:30:36PM -0800, Junio C Hamano wrote: > >> pack-kept-objects then? > > > > Hmm. That does address my point above, but somehow the word "kept" feels > > awkward to me. I'm ambivalent between the two. > > That word does make my backside somewhat itchy ;-) > > Would it help to take a step back and think what the option really > does? Perhaps we should call it --pack-all-objects, which is short > for --pack-all-objectsregardless-of-where-they-currently-are-stored, > or something? The word "all" gives a wrong connotation in a > different way (e.g. "regardless of reachability" is a possible wrong > interpretation), so that does not sound too good, either. I do not think "all" is what we want to say. It only affects "kept" objects, not any of the other exclusions (e.g., "-l"). > "--repack-kept-objects"? "--include-kept-objects"? Of all of them, I think --pack-kept-objects is probably the best. And I think we are hitting diminishing returns in thinking too much more on the name. :) -Peff ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] repack: add `repack.honorpackkeep` config var 2014-02-27 11:27 ` Jeff King @ 2014-02-27 18:04 ` Junio C Hamano 2014-02-28 8:55 ` Jeff King 0 siblings, 1 reply; 27+ messages in thread From: Junio C Hamano @ 2014-02-27 18:04 UTC (permalink / raw) To: Jeff King; +Cc: Siddharth Agarwal, Vicent Marti, git Jeff King <peff@peff.net> writes: > Of all of them, I think --pack-kept-objects is probably the best. And I > think we are hitting diminishing returns in thinking too much more on > the name. :) True enough. I wonder if it makes sense to link it with "pack.writebitmaps" more tightly, without even exposing it as a seemingly orthogonal knob that can be tweaked, though. I think that is because I do not fully understand the ", because ..." part of the below: >> This patch introduces an option to disable the >> `--honor-pack-keep` option. It is not triggered by default, >> even when pack.writeBitmaps is turned on, because its use >> depends on your overall packing strategy and use of .keep >> files. If you ask --write-bitmap-index (or have pack.writeBitmaps on), you do want the bitmap-index to be written, and unless you tell pack-objects to ignore the .keep marker, it cannot do so, no? Does the ", because ..." part above mean "you may have an overall packing strategy to use .keep file to not ever repack some subset of the objects, so we will not silently explode the kept objects into a new pack"? ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] repack: add `repack.honorpackkeep` config var 2014-02-27 18:04 ` Junio C Hamano @ 2014-02-28 8:55 ` Jeff King 2014-02-28 17:09 ` Nasser Grainawi 2014-02-28 18:45 ` Junio C Hamano 0 siblings, 2 replies; 27+ messages in thread From: Jeff King @ 2014-02-28 8:55 UTC (permalink / raw) To: Junio C Hamano; +Cc: Siddharth Agarwal, Vicent Marti, git On Thu, Feb 27, 2014 at 10:04:44AM -0800, Junio C Hamano wrote: > I wonder if it makes sense to link it with "pack.writebitmaps" more > tightly, without even exposing it as a seemingly orthogonal knob > that can be tweaked, though. > > I think that is because I do not fully understand the ", because ..." > part of the below: > > >> This patch introduces an option to disable the > >> `--honor-pack-keep` option. It is not triggered by default, > >> even when pack.writeBitmaps is turned on, because its use > >> depends on your overall packing strategy and use of .keep > >> files. > > If you ask --write-bitmap-index (or have pack.writeBitmaps on), you > do want the bitmap-index to be written, and unless you tell > pack-objects to ignore the .keep marker, it cannot do so, no? > > Does the ", because ..." part above mean "you may have an overall > packing strategy to use .keep file to not ever repack some subset of > the objects, so we will not silently explode the kept objects into a > new pack"? Exactly. The two features (bitmaps and .keep) are not compatible with each other, so you have to prioritize one. If you are using static .keep files, you might want them to continue being respected at the expense of using bitmaps for that repo. So I think you want a separate option from --write-bitmap-index to allow the appropriate flexibility. The default is another matter. I think most people using .bitmaps on a server would probably want to set repack.packKeptObjects. They would want to repack often to take advantage of the .bitmaps anyway, so they probably don't care about .keep files (any they see are due to races with incoming pushes). So we could do something like falling back to turning the option on if --write-bitmap-index is on _and_ the user didn't specify --pack-kept-objects. The existing default is mostly there because it is the conservative choice (.keep files continue to do their thing as normal unless you say otherwise). But the fallback thing would be one less knob that bitmap users would need to turn in the common case. Here's the interdiff for doing the fallback: --- diff --git a/Documentation/config.txt b/Documentation/config.txt index 3a3d84f..a8ddc7f 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -2139,7 +2139,9 @@ repack.usedeltabaseoffset:: repack.packKeptObjects:: If set to true, makes `git repack` act as if `--pack-kept-objects` was passed. See linkgit:git-repack[1] for - details. Defaults to false. + details. Defaults to `false` normally, but `true` if a bitmap + index is being written (either via `--write-bitmap-index` or + `pack.writeBitmaps`). rerere.autoupdate:: When set to true, `git-rerere` updates the index with the diff --git a/builtin/repack.c b/builtin/repack.c index 49947b2..6b0b62d 100644 --- a/builtin/repack.c +++ b/builtin/repack.c @@ -9,7 +9,7 @@ #include "argv-array.h" static int delta_base_offset = 1; -static int pack_kept_objects; +static int pack_kept_objects = -1; static char *packdir, *packtmp; static const char *const git_repack_usage[] = { @@ -190,6 +190,9 @@ int cmd_repack(int argc, const char **argv, const char *prefix) argc = parse_options(argc, argv, prefix, builtin_repack_options, git_repack_usage, 0); + if (pack_kept_objects < 0) + pack_kept_objects = write_bitmap; + packdir = mkpathdup("%s/pack", get_object_directory()); packtmp = mkpathdup("%s/.tmp-%d-pack", packdir, (int)getpid()); diff --git a/t/t7700-repack.sh b/t/t7700-repack.sh index f8431a8..b1eed5c 100755 --- a/t/t7700-repack.sh +++ b/t/t7700-repack.sh @@ -21,7 +21,7 @@ test_expect_success 'objects in packs marked .keep are not repacked' ' objsha1=$(git verify-pack -v pack-$packsha1.idx | head -n 1 | sed -e "s/^\([0-9a-f]\{40\}\).*/\1/") && mv pack-* .git/objects/pack/ && - git repack -A -d -l && + git repack --no-pack-kept-objects -A -d -l && git prune-packed && for p in .git/objects/pack/*.idx; do idx=$(basename $p) @@ -35,9 +35,9 @@ test_expect_success 'objects in packs marked .keep are not repacked' ' test -z "$found_duplicate_object" ' -test_expect_success '--pack-kept-objects duplicates objects' ' +test_expect_success 'writing bitmaps duplicates .keep objects' ' # build on $objsha1, $packsha1, and .keep state from previous - git repack -Adl --pack-kept-objects && + git repack -Adl && test_when_finished "found_duplicate_object=" && for p in .git/objects/pack/*.idx; do idx=$(basename $p) ^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH] repack: add `repack.honorpackkeep` config var 2014-02-28 8:55 ` Jeff King @ 2014-02-28 17:09 ` Nasser Grainawi 2014-03-01 6:05 ` Jeff King 2014-02-28 18:45 ` Junio C Hamano 1 sibling, 1 reply; 27+ messages in thread From: Nasser Grainawi @ 2014-02-28 17:09 UTC (permalink / raw) To: Jeff King; +Cc: Junio C Hamano, Siddharth Agarwal, Vicent Marti, git On Feb 28, 2014, at 1:55 AM, Jeff King <peff@peff.net> wrote: > On Thu, Feb 27, 2014 at 10:04:44AM -0800, Junio C Hamano wrote: > >> I wonder if it makes sense to link it with "pack.writebitmaps" more >> tightly, without even exposing it as a seemingly orthogonal knob >> that can be tweaked, though. >> >> I think that is because I do not fully understand the ", because ..." >> part of the below: >> >>>> This patch introduces an option to disable the >>>> `--honor-pack-keep` option. It is not triggered by default, >>>> even when pack.writeBitmaps is turned on, because its use >>>> depends on your overall packing strategy and use of .keep >>>> files. >> >> If you ask --write-bitmap-index (or have pack.writeBitmaps on), you >> do want the bitmap-index to be written, and unless you tell >> pack-objects to ignore the .keep marker, it cannot do so, no? >> >> Does the ", because ..." part above mean "you may have an overall >> packing strategy to use .keep file to not ever repack some subset of >> the objects, so we will not silently explode the kept objects into a >> new pack"? > > Exactly. The two features (bitmaps and .keep) are not compatible with > each other, so you have to prioritize one. If you are using static .keep > files, you might want them to continue being respected at the expense of > using bitmaps for that repo. So I think you want a separate option from > --write-bitmap-index to allow the appropriate flexibility. Has anyone thought about how to make them compatible? We're using Martin Fick's git-exproll script which makes heavy use of keeps to reduce pack file churn. In addition to the on-disk benefits we get there, the driving factor behind creating exproll was to prevent Gerrit from having two large (30GB+) mostly duplicated pack files open in memory at the same time. Repacking in JGit would help in a single-master environment, but we'd be back to having this problem once we go to a multi-master setup. Perhaps the solution here is actually something in JGit where it could aggressively try to close references to pack files, but that still doesn't help the disk churn problem. As Peff says below, we would want to repack often to get up-to-date bitmaps, but ideally we could do that without writing hundreds of GBs to disk (which is obviously worse when "disk" is a NFS mount). > > The default is another matter. I think most people using .bitmaps on a > server would probably want to set repack.packKeptObjects. They would > want to repack often to take advantage of the .bitmaps anyway, so they > probably don't care about .keep files (any they see are due to races > with incoming pushes). > > So we could do something like falling back to turning the option on if > --write-bitmap-index is on _and_ the user didn't specify > --pack-kept-objects. The existing default is mostly there because it is > the conservative choice (.keep files continue to do their thing as > normal unless you say otherwise). But the fallback thing would be one > less knob that bitmap users would need to turn in the common case. > > Here's the interdiff for doing the fallback: > > --- > diff --git a/Documentation/config.txt b/Documentation/config.txt > index 3a3d84f..a8ddc7f 100644 > --- a/Documentation/config.txt > +++ b/Documentation/config.txt > @@ -2139,7 +2139,9 @@ repack.usedeltabaseoffset:: > repack.packKeptObjects:: > If set to true, makes `git repack` act as if > `--pack-kept-objects` was passed. See linkgit:git-repack[1] for > - details. Defaults to false. > + details. Defaults to `false` normally, but `true` if a bitmap > + index is being written (either via `--write-bitmap-index` or > + `pack.writeBitmaps`). > > rerere.autoupdate:: > When set to true, `git-rerere` updates the index with the > diff --git a/builtin/repack.c b/builtin/repack.c > index 49947b2..6b0b62d 100644 > --- a/builtin/repack.c > +++ b/builtin/repack.c > @@ -9,7 +9,7 @@ > #include "argv-array.h" > > static int delta_base_offset = 1; > -static int pack_kept_objects; > +static int pack_kept_objects = -1; > static char *packdir, *packtmp; > > static const char *const git_repack_usage[] = { > @@ -190,6 +190,9 @@ int cmd_repack(int argc, const char **argv, const char *prefix) > argc = parse_options(argc, argv, prefix, builtin_repack_options, > git_repack_usage, 0); > > + if (pack_kept_objects < 0) > + pack_kept_objects = write_bitmap; > + > packdir = mkpathdup("%s/pack", get_object_directory()); > packtmp = mkpathdup("%s/.tmp-%d-pack", packdir, (int)getpid()); > > diff --git a/t/t7700-repack.sh b/t/t7700-repack.sh > index f8431a8..b1eed5c 100755 > --- a/t/t7700-repack.sh > +++ b/t/t7700-repack.sh > @@ -21,7 +21,7 @@ test_expect_success 'objects in packs marked .keep are not repacked' ' > objsha1=$(git verify-pack -v pack-$packsha1.idx | head -n 1 | > sed -e "s/^\([0-9a-f]\{40\}\).*/\1/") && > mv pack-* .git/objects/pack/ && > - git repack -A -d -l && > + git repack --no-pack-kept-objects -A -d -l && > git prune-packed && > for p in .git/objects/pack/*.idx; do > idx=$(basename $p) > @@ -35,9 +35,9 @@ test_expect_success 'objects in packs marked .keep are not repacked' ' > test -z "$found_duplicate_object" > ' > > -test_expect_success '--pack-kept-objects duplicates objects' ' > +test_expect_success 'writing bitmaps duplicates .keep objects' ' > # build on $objsha1, $packsha1, and .keep state from previous > - git repack -Adl --pack-kept-objects && > + git repack -Adl && > test_when_finished "found_duplicate_object=" && > for p in .git/objects/pack/*.idx; do > idx=$(basename $p) > -- > To unsubscribe from this list: send the line "unsubscribe git" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] repack: add `repack.honorpackkeep` config var 2014-02-28 17:09 ` Nasser Grainawi @ 2014-03-01 6:05 ` Jeff King 2014-03-03 19:12 ` Shawn Pearce 0 siblings, 1 reply; 27+ messages in thread From: Jeff King @ 2014-03-01 6:05 UTC (permalink / raw) To: Nasser Grainawi; +Cc: Junio C Hamano, Siddharth Agarwal, Vicent Marti, git On Fri, Feb 28, 2014 at 10:09:08AM -0700, Nasser Grainawi wrote: > > Exactly. The two features (bitmaps and .keep) are not compatible with > > each other, so you have to prioritize one. If you are using static .keep > > files, you might want them to continue being respected at the expense of > > using bitmaps for that repo. So I think you want a separate option from > > --write-bitmap-index to allow the appropriate flexibility. > > Has anyone thought about how to make them compatible? Yes, but it's complicated and not likely to happen soon. Having .keep files means that you are not including some objects in the newly created pack. Each bit in a commit's bitmap corresponds to one object in the pack, and whether it is reachable from that commit. The bitmap is only useful if we can calculate the full reachability from it, and it has no way to specify objects outside of the pack. To fix this, you would need to change the on-disk format of the bitmaps to somehow reference objects outside of the pack. Either by having the bitmaps index a repo-global set of objects, or by permitting a list of "edge" objects that are referenced from the pack, but not included (and then when assembling the full reachable list, you would have to recurse across "edge" objects to find their reachable list in another pack, etc). So it's possible, but it would complicate the scheme quite a bit, and would not be backwards compatible with either JGit or C Git. > We're using Martin Fick's git-exproll script which makes heavy use of > keeps to reduce pack file churn. In addition to the on-disk benefits > we get there, the driving factor behind creating exproll was to > prevent Gerrit from having two large (30GB+) mostly duplicated pack > files open in memory at the same time. Repacking in JGit would help in > a single-master environment, but we'd be back to having this problem > once we go to a multi-master setup. > > Perhaps the solution here is actually something in JGit where it could > aggressively try to close references to pack files In C git we don't worry about this too much, because our programs tend to be short-lived, and references to the old pack will go away quickly. Plus it is all mmap'd, so as we simply stop accessing the pages of the old pack, they should eventually be dropped if there is memory pressure. I seem to recall that JGit does not mmap its packfiles. Does it pread? In that case, I'd expect unused bits from the duplicated packfile to get dropped from the disk cache over time. If it loads whole packfiles into memory, then yes, it should probably close more aggressively. > , but that still > doesn't help the disk churn problem. As Peff says below, we would want > to repack often to get up-to-date bitmaps, but ideally we could do > that without writing hundreds of GBs to disk (which is obviously worse > when "disk" is a NFS mount). Ultimately I think the solution to the churn problem is a packfile-like storage that allows true appending of deltas. You can come up with a scheme to allow deltas between on-disk packs (i.e., "thin" packs on disk). The trick there is handling the dependencies and cycles. I think you could get by with a strict ordering of packs and a few rules: 1. An object in a pack with weight A cannot have as a base an object in a pack with weight <= A. 2. A pack with weight A cannot be deleted if there exists a pack with weight > A. But you'd want to also add in a single update-able index over all the packfiles, and even then you'd still want to pack occasionally (because you'd end up with deltas on bases going back in time, but you really prefer your bases to be near the tip of history). So I am not volunteering to work on it. :) At GitHub we mostly deal with the churn by throwing more server resources at it. But we have the advantage of having a very large number of small-to-medium repos, which is relatively easy to scale up. A small number of huge repos is trickier. -Peff ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] repack: add `repack.honorpackkeep` config var 2014-03-01 6:05 ` Jeff King @ 2014-03-03 19:12 ` Shawn Pearce 0 siblings, 0 replies; 27+ messages in thread From: Shawn Pearce @ 2014-03-03 19:12 UTC (permalink / raw) To: Jeff King Cc: Nasser Grainawi, Junio C Hamano, Siddharth Agarwal, Vicent Marti, git On Fri, Feb 28, 2014 at 10:05 PM, Jeff King <peff@peff.net> wrote: > On Fri, Feb 28, 2014 at 10:09:08AM -0700, Nasser Grainawi wrote: > >> > Exactly. The two features (bitmaps and .keep) are not compatible with >> > each other, so you have to prioritize one. If you are using static .keep >> > files, you might want them to continue being respected at the expense of >> > using bitmaps for that repo. So I think you want a separate option from >> > --write-bitmap-index to allow the appropriate flexibility. >> >> Has anyone thought about how to make them compatible? > > Yes, but it's complicated and not likely to happen soon. > > Having .keep files means that you are not including some objects in the > newly created pack. Each bit in a commit's bitmap corresponds to one > object in the pack, and whether it is reachable from that commit. The > bitmap is only useful if we can calculate the full reachability from it, > and it has no way to specify objects outside of the pack. > > To fix this, you would need to change the on-disk format of the bitmaps > to somehow reference objects outside of the pack. Either by having the > bitmaps index a repo-global set of objects, or by permitting a list of > "edge" objects that are referenced from the pack, but not included (and > then when assembling the full reachable list, you would have to recurse > across "edge" objects to find their reachable list in another pack, > etc). > > So it's possible, but it would complicate the scheme quite a bit, and > would not be backwards compatible with either JGit or C Git. Colby Ranger always wanted to add this to the bitmap scheme. Construct a partial pack bitmap on a partial pack of "recent" objects, with edge pointers naming objects that are not in this pack but whose closures need to be considered part of the bitmap. Its complicated in-memory because you need to fuse together two or more bitmaps (the partial pack one, and the larger historical kept pack) before running the "want AND NOT have" computation. Colby did not find time to work on this in JGit, so it just didn't get implemented. But we did consider it, as the servers at Google we built bitmap for use a multi-level pack scheme and don't want to rebuild packs all of the time. >> We're using Martin Fick's git-exproll script which makes heavy use of >> keeps to reduce pack file churn. In addition to the on-disk benefits >> we get there, the driving factor behind creating exproll was to >> prevent Gerrit from having two large (30GB+) mostly duplicated pack >> files open in memory at the same time. Repacking in JGit would help in >> a single-master environment, but we'd be back to having this problem >> once we go to a multi-master setup. >> >> Perhaps the solution here is actually something in JGit where it could >> aggressively try to close references to pack files > > In C git we don't worry about this too much, because our programs tend > to be short-lived, and references to the old pack will go away quickly. > Plus it is all mmap'd, so as we simply stop accessing the pages of the > old pack, they should eventually be dropped if there is memory pressure. > > I seem to recall that JGit does not mmap its packfiles. Does it pread? JGit does not mmap because you can't munmap() until the Java GC gets around to freeing the tiny little header object that contains the memory address of the start of the mmap segment. This can take ages, to the point where you run out of virtual address space in the process and s**t starts to fail left and right inside of the JVM. The GC is just unable to prioritize finding those tiny headers and getting them out of the heap so the munmap can take place safely. So yea, JGit does pread() for the blocks but it holds those in its own buffer cache inside of the Java heap. Where a 4K disk block is a 4K memory array that puts pressure on the GC to actually wake up and free resources that are unused. What Nasser is talking about is JGit may take a long time to realize one pack is unused and start kicking those blocks out of its buffer cache. Those blocks are reference counted and the file descriptor JGit preads from is held open so long as at least one block is in the buffer cache. By keeping the file open we force the filesystem to keep the inode alive a lot longer, which means the disk needs a huge amount of free space to store the unlinked but still open 30G pack files from prior GC generations. > In that case, I'd expect unused bits from the duplicated packfile to get > dropped from the disk cache over time. If it loads whole packfiles into > memory, then yes, it should probably close more aggressively. Its more than that, its the inode being kept alive by the open file descriptor... ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] repack: add `repack.honorpackkeep` config var 2014-02-28 8:55 ` Jeff King 2014-02-28 17:09 ` Nasser Grainawi @ 2014-02-28 18:45 ` Junio C Hamano 2014-03-01 5:43 ` Jeff King 1 sibling, 1 reply; 27+ messages in thread From: Junio C Hamano @ 2014-02-28 18:45 UTC (permalink / raw) To: Jeff King; +Cc: Siddharth Agarwal, Vicent Marti, git Jeff King <peff@peff.net> writes: > On Thu, Feb 27, 2014 at 10:04:44AM -0800, Junio C Hamano wrote: > >> I wonder if it makes sense to link it with "pack.writebitmaps" more >> tightly, without even exposing it as a seemingly orthogonal knob >> that can be tweaked, though. >> >> I think that is because I do not fully understand the ", because ..." >> part of the below: >> >> >> This patch introduces an option to disable the >> >> `--honor-pack-keep` option. It is not triggered by default, >> >> even when pack.writeBitmaps is turned on, because its use >> >> depends on your overall packing strategy and use of .keep >> >> files. >> >> If you ask --write-bitmap-index (or have pack.writeBitmaps on), you >> do want the bitmap-index to be written, and unless you tell >> pack-objects to ignore the .keep marker, it cannot do so, no? >> >> Does the ", because ..." part above mean "you may have an overall >> packing strategy to use .keep file to not ever repack some subset of >> the objects, so we will not silently explode the kept objects into a >> new pack"? > > Exactly. The two features (bitmaps and .keep) are not compatible with > each other, so you have to prioritize one. If you are using static .keep > files, you might want them to continue being respected at the expense of > using bitmaps for that repo. So I think you want a separate option from > --write-bitmap-index to allow the appropriate flexibility. What is "the appropriate flexibility", though? If the user wants to use bitmap, we would need to drop .keep, no? Doesn't always having two copies in two packs degrade performance unnecessarily (without even talking about wasted diskspace)? An explicit .keep exists in the repository because it is expensive and undesirable to duplicate what is in there in the first place, so it feels to me that either - Disable with warning, or outright refuse, the "-b" option if there is .keep (if we want to give precedence to .keep); or - Remove .keep with warning when "-b" option is given (if we want to give precedence to "-b"). and nothing else would be a reasonable option. Unfortunately, we can do neither automatically because there could be a transient .keep file in an active repository. So I think I agree with this... > The default is another matter. I think most people using .bitmaps on a > server would probably want to set repack.packKeptObjects. They would > want to repack often to take advantage of the .bitmaps anyway, so they > probably don't care about .keep files (any they see are due to races > with incoming pushes). ... which makes me think that repack.packKeptObjects is merely a distraction---it should be enough to just pass "--pack-kept-objects" when "-b" is asked, without giving any extra configurability, no? > So we could do something like falling back to turning the option on if > --write-bitmap-index is on _and_ the user didn't specify > --pack-kept-objects. If you mean "didn't specify --no-pack-kept-objects", then I think that is sensible. I still do not know why we would want the configuration variable, though. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] repack: add `repack.honorpackkeep` config var 2014-02-28 18:45 ` Junio C Hamano @ 2014-03-01 5:43 ` Jeff King 2014-03-03 18:13 ` Junio C Hamano 0 siblings, 1 reply; 27+ messages in thread From: Jeff King @ 2014-03-01 5:43 UTC (permalink / raw) To: Junio C Hamano; +Cc: Siddharth Agarwal, Vicent Marti, git On Fri, Feb 28, 2014 at 10:45:39AM -0800, Junio C Hamano wrote: > > Exactly. The two features (bitmaps and .keep) are not compatible with > > each other, so you have to prioritize one. If you are using static .keep > > files, you might want them to continue being respected at the expense of > > using bitmaps for that repo. So I think you want a separate option from > > --write-bitmap-index to allow the appropriate flexibility. > > What is "the appropriate flexibility", though? If the user wants to > use bitmap, we would need to drop .keep, no? Or the flip side: if the user wants to use .keep, we should drop bitmaps. My point is that we do not know which way the user wants to go, so we should not tie the options together. > Doesn't always having two copies in two packs degrade performance > unnecessarily (without even talking about wasted diskspace)? An > explicit .keep exists in the repository because it is expensive and > undesirable to duplicate what is in there in the first place, so it > feels to me that either The benefits of static .keep files are (I think): 1. less I/O during repacks, as you do not rewrite a static set of objects 2. less turnover of packfiles, which can make dumb access more efficient (both for dumb clients, but also for things like non-git-aware backups). I think the existence of smart-http more or less nullifies (2). For (1), it helps at first, but you get diminishing returns as your non-keep packfile grows. I think it only helps in pathological cases (e.g., you mark 10GB worth of giant blobs in a .keep pack, and then pack the other 10MB of trees, commits, and normal-sized blobs as usual). > - Disable with warning, or outright refuse, the "-b" option if > there is .keep (if we want to give precedence to .keep); or > > - Remove .keep with warning when "-b" option is given (if we want > to give precedence to "-b"). > > and nothing else would be a reasonable option. Unfortunately, we > can do neither automatically because there could be a transient .keep > file in an active repository. Right, the transient ones complicate the issue. But I think even for static .keep versus bitmaps, there is question. See below... > > The default is another matter. I think most people using .bitmaps on a > > server would probably want to set repack.packKeptObjects. They would > > want to repack often to take advantage of the .bitmaps anyway, so they > > probably don't care about .keep files (any they see are due to races > > with incoming pushes). > > ... which makes me think that repack.packKeptObjects is merely a > distraction---it should be enough to just pass "--pack-kept-objects" > when "-b" is asked, without giving any extra configurability, no? But you do not necessarily ask for "-b" explicitly; it might come from the config, too. Imagine you have a server with many repos. You want to use bitmaps when you can, so you set pack.writeBitmaps in /etc/gitconfig. But in a few repos, you want to use .keep files, and it's more important for you to use it than bitmaps (e.g., because it is one of the pathological cases above). So you set repack.packKeptObjects to false in /etc/gitconfig, to prefer .keep to bitmaps where appropriate. If you did not have that config option, your alternative would be to turn off pack.writeBitmaps in the repositories with .keep files. But then you need to per-repo keep that flag in sync with whether or not the repo has .keep files. To be clear, at GitHub we do not plan on ever having repack.packKeptObjects off (for now we have it on explicitly, but if it were connected to pack.writeBitmaps, then we would be happy with that default). I am mostly trying to give an escape hatch to let people use different optimization strategies if they want. If we are going to have --pack-kept-objects (and I think we should), I think we also should have a matching config option. Because it is useful when matched with the bitmap code, and that can be turned on both from the command-line or from the config. Wherever you are doing that, you would want to be able to make the matching .keep decision. And I don't think it hurts much. With the fallback-to-on behavior, you do not have to even care that it is there unless you are doing something clever. > > So we could do something like falling back to turning the option on if > > --write-bitmap-index is on _and_ the user didn't specify > > --pack-kept-objects. > > If you mean "didn't specify --no-pack-kept-objects", then I think > that is sensible. I still do not know why we would want the > configuration variable, though. Right, I meant "if the pack-kept-objects variable is set, either on or off, either via the command line or in the config". As far as what the config is good for, see above. -Peff ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] repack: add `repack.honorpackkeep` config var 2014-03-01 5:43 ` Jeff King @ 2014-03-03 18:13 ` Junio C Hamano 2014-03-03 18:15 ` Jeff King 0 siblings, 1 reply; 27+ messages in thread From: Junio C Hamano @ 2014-03-03 18:13 UTC (permalink / raw) To: Jeff King; +Cc: Siddharth Agarwal, Vicent Marti, git Jeff King <peff@peff.net> writes: > On Fri, Feb 28, 2014 at 10:45:39AM -0800, Junio C Hamano wrote: > >> > Exactly. The two features (bitmaps and .keep) are not compatible with >> > each other, so you have to prioritize one. If you are using static .keep >> > files, you might want them to continue being respected at the expense of >> > using bitmaps for that repo. So I think you want a separate option from >> > --write-bitmap-index to allow the appropriate flexibility. >> >> What is "the appropriate flexibility", though? If the user wants to >> use bitmap, we would need to drop .keep, no? > > Or the flip side: if the user wants to use .keep, we should drop > bitmaps. My point is that we do not know which way the user wants to > go, so we should not tie the options together. Hmph. I think the short of your later explanation is "global config may tell us to use bitmap, in which case we would need a way to defeat that and have existing .keep honored, and it makes it easier to do so if these two are kept separate, because you do not want to run around and selectively disable bitmaps in these repositories. We can instead do so with repack.packKeptObjects in the global configuration." and I tend to agree with the reasoning. Thanks. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] repack: add `repack.honorpackkeep` config var 2014-03-03 18:13 ` Junio C Hamano @ 2014-03-03 18:15 ` Jeff King 2014-03-03 19:51 ` Junio C Hamano 0 siblings, 1 reply; 27+ messages in thread From: Jeff King @ 2014-03-03 18:15 UTC (permalink / raw) To: Junio C Hamano; +Cc: Siddharth Agarwal, Vicent Marti, git On Mon, Mar 03, 2014 at 10:13:47AM -0800, Junio C Hamano wrote: > > Or the flip side: if the user wants to use .keep, we should drop > > bitmaps. My point is that we do not know which way the user wants to > > go, so we should not tie the options together. > > Hmph. I think the short of your later explanation is "global config > may tell us to use bitmap, in which case we would need a way to > defeat that and have existing .keep honored, and it makes it easier > to do so if these two are kept separate, because you do not want to > run around and selectively disable bitmaps in these repositories. > We can instead do so with repack.packKeptObjects in the global > configuration." and I tend to agree with the reasoning. Yes. Do you need a re-roll from me? I think the last version I sent + the squash to tie the default to bitmap-writing makes the most sense. -Peff ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] repack: add `repack.honorpackkeep` config var 2014-03-03 18:15 ` Jeff King @ 2014-03-03 19:51 ` Junio C Hamano 2014-03-03 20:04 ` Jeff King 0 siblings, 1 reply; 27+ messages in thread From: Junio C Hamano @ 2014-03-03 19:51 UTC (permalink / raw) To: Jeff King; +Cc: Siddharth Agarwal, Vicent Marti, git Jeff King <peff@peff.net> writes: > On Mon, Mar 03, 2014 at 10:13:47AM -0800, Junio C Hamano wrote: > >> > Or the flip side: if the user wants to use .keep, we should drop >> > bitmaps. My point is that we do not know which way the user wants to >> > go, so we should not tie the options together. >> >> Hmph. I think the short of your later explanation is "global config >> may tell us to use bitmap, in which case we would need a way to >> defeat that and have existing .keep honored, and it makes it easier >> to do so if these two are kept separate, because you do not want to >> run around and selectively disable bitmaps in these repositories. >> We can instead do so with repack.packKeptObjects in the global >> configuration." and I tend to agree with the reasoning. > > Yes. Do you need a re-roll from me? I think the last version I sent + > the squash to tie the default to bitmap-writing makes the most sense. I have 9e20b390 (repack: add `repack.packKeptObjects` config var, 2014-02-26); I do not recall I've squashed anything into it, though. Do you mean this one? Here's the interdiff for doing the fallback: --- diff --git a/Documentation/config.txt b/Documentation/config.txt index 3a3d84f..a8ddc7f 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -2139,7 +2139,9 @@ repack.usedeltabaseoffset:: repack.packKeptObjects:: If set to true, makes `git repack` act as if `--pack-kept-objects` was passed. See linkgit:git-repack[1] for - details. Defaults to false. + details. Defaults to `false` normally, but `true` if a bitmap + index is being written (either via `--write-bitmap-index` or + `pack.writeBitmaps`). rerere.autoupdate:: When set to true, `git-rerere` updates the index with the diff --git a/builtin/repack.c b/builtin/repack.c index 49947b2..6b0b62d 100644 --- a/builtin/repack.c +++ b/builtin/repack.c @@ -9,7 +9,7 @@ #include "argv-array.h" static int delta_base_offset = 1; -static int pack_kept_objects; +static int pack_kept_objects = -1; static char *packdir, *packtmp; static const char *const git_repack_usage[] = { @@ -190,6 +190,9 @@ int cmd_repack(int argc, const char **argv, const char *prefix) argc = parse_options(argc, argv, prefix, builtin_repack_options, git_repack_usage, 0); + if (pack_kept_objects < 0) + pack_kept_objects = write_bitmap; + packdir = mkpathdup("%s/pack", get_object_directory()); packtmp = mkpathdup("%s/.tmp-%d-pack", packdir, (int)getpid()); diff --git a/t/t7700-repack.sh b/t/t7700-repack.sh index f8431a8..b1eed5c 100755 --- a/t/t7700-repack.sh +++ b/t/t7700-repack.sh @@ -21,7 +21,7 @@ test_expect_success 'objects in packs marked .keep are not repacked' ' objsha1=$(git verify-pack -v pack-$packsha1.idx | head -n 1 | sed -e "s/^\([0-9a-f]\{40\}\).*/\1/") && mv pack-* .git/objects/pack/ && - git repack -A -d -l && + git repack --no-pack-kept-objects -A -d -l && git prune-packed && for p in .git/objects/pack/*.idx; do idx=$(basename $p) @@ -35,9 +35,9 @@ test_expect_success 'objects in packs marked .keep are not repacked' ' test -z "$found_duplicate_object" ' -test_expect_success '--pack-kept-objects duplicates objects' ' +test_expect_success 'writing bitmaps duplicates .keep objects' ' # build on $objsha1, $packsha1, and .keep state from previous - git repack -Adl --pack-kept-objects && + git repack -Adl && test_when_finished "found_duplicate_object=" && for p in .git/objects/pack/*.idx; do idx=$(basename $p) ^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH] repack: add `repack.honorpackkeep` config var 2014-03-03 19:51 ` Junio C Hamano @ 2014-03-03 20:04 ` Jeff King 0 siblings, 0 replies; 27+ messages in thread From: Jeff King @ 2014-03-03 20:04 UTC (permalink / raw) To: Junio C Hamano; +Cc: Siddharth Agarwal, Vicent Marti, git On Mon, Mar 03, 2014 at 11:51:06AM -0800, Junio C Hamano wrote: > > Yes. Do you need a re-roll from me? I think the last version I sent + > > the squash to tie the default to bitmap-writing makes the most sense. > > I have 9e20b390 (repack: add `repack.packKeptObjects` config var, > 2014-02-26); I do not recall I've squashed anything into it, though. > > Do you mean this one? > > Here's the interdiff for doing the fallback: > [...] Yes. Though I just noticed that the commit message needs updating if that is squashed in. Here is the whole patch, with that update. And I am dropping Vicent as the author, since I think there are now literally zero lines of his left. ;) -- >8 -- Subject: [PATCH] repack: add `repack.packKeptObjects` config var The git-repack command always passes `--honor-pack-keep` to pack-objects. This has traditionally been a good thing, as we do not want to duplicate those objects in a new pack, and we are not going to delete the old pack. However, when bitmaps are in use, it is important for a full repack to include all reachable objects, even if they may be duplicated in a .keep pack. Otherwise, we cannot generate the bitmaps, as the on-disk format requires the set of objects in the pack to be fully closed. Even if the repository does not generally have .keep files, a simultaneous push could cause a race condition in which a .keep file exists at the moment of a repack. The repack may try to include those objects in one of two situations: 1. The pushed .keep pack contains objects that were already in the repository (e.g., blobs due to a revert of an old commit). 2. Receive-pack updates the refs, making the objects reachable, but before it removes the .keep file, the repack runs. In either case, we may prefer to duplicate some objects in the new, full pack, and let the next repack (after the .keep file is cleaned up) take care of removing them. This patch introduces both a command-line and config option to disable the `--honor-pack-keep` option. By default, it is triggered when pack.writeBitmaps (or `--write-bitmap-index` is turned on), but specifying it explicitly can override the behavior (e.g., in cases where you prefer .keep files to bitmaps, but only when they are present). Note that this option just disables the pack-objects behavior. We still leave packs with a .keep in place, as we do not necessarily know that we have duplicated all of their objects. Signed-off-by: Jeff King <peff@peff.net> --- Documentation/config.txt | 7 +++++++ Documentation/git-repack.txt | 8 ++++++++ builtin/repack.c | 13 ++++++++++++- t/t7700-repack.sh | 18 +++++++++++++++++- 4 files changed, 44 insertions(+), 2 deletions(-) diff --git a/Documentation/config.txt b/Documentation/config.txt index becbade..a8ddc7f 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -2136,6 +2136,13 @@ repack.usedeltabaseoffset:: "false" and repack. Access from old Git versions over the native protocol are unaffected by this option. +repack.packKeptObjects:: + If set to true, makes `git repack` act as if + `--pack-kept-objects` was passed. See linkgit:git-repack[1] for + details. Defaults to `false` normally, but `true` if a bitmap + index is being written (either via `--write-bitmap-index` or + `pack.writeBitmaps`). + rerere.autoupdate:: When set to true, `git-rerere` updates the index with the resulting contents after it cleanly resolves conflicts using diff --git a/Documentation/git-repack.txt b/Documentation/git-repack.txt index 002cfd5..4786a78 100644 --- a/Documentation/git-repack.txt +++ b/Documentation/git-repack.txt @@ -117,6 +117,14 @@ other objects in that pack they already have locally. must be able to refer to all reachable objects. This option overrides the setting of `pack.writebitmaps`. +--pack-kept-objects:: + Include objects in `.keep` files when repacking. Note that we + still do not delete `.keep` packs after `pack-objects` finishes. + This means that we may duplicate objects, but this makes the + option safe to use when there are concurrent pushes or fetches. + This option is generally only useful if you are writing bitmaps + with `-b` or `pack.writebitmaps`, as it ensures that the + bitmapped packfile has the necessary objects. Configuration ------------- diff --git a/builtin/repack.c b/builtin/repack.c index 49f5857..6b0b62d 100644 --- a/builtin/repack.c +++ b/builtin/repack.c @@ -9,6 +9,7 @@ #include "argv-array.h" static int delta_base_offset = 1; +static int pack_kept_objects = -1; static char *packdir, *packtmp; static const char *const git_repack_usage[] = { @@ -22,6 +23,10 @@ static int repack_config(const char *var, const char *value, void *cb) delta_base_offset = git_config_bool(var, value); return 0; } + if (!strcmp(var, "repack.packkeptobjects")) { + pack_kept_objects = git_config_bool(var, value); + return 0; + } return git_default_config(var, value, cb); } @@ -175,6 +180,8 @@ int cmd_repack(int argc, const char **argv, const char *prefix) N_("limits the maximum delta depth")), OPT_STRING(0, "max-pack-size", &max_pack_size, N_("bytes"), N_("maximum size of each packfile")), + OPT_BOOL(0, "pack-kept-objects", &pack_kept_objects, + N_("repack objects in packs marked with .keep")), OPT_END() }; @@ -183,6 +190,9 @@ int cmd_repack(int argc, const char **argv, const char *prefix) argc = parse_options(argc, argv, prefix, builtin_repack_options, git_repack_usage, 0); + if (pack_kept_objects < 0) + pack_kept_objects = write_bitmap; + packdir = mkpathdup("%s/pack", get_object_directory()); packtmp = mkpathdup("%s/.tmp-%d-pack", packdir, (int)getpid()); @@ -190,7 +200,8 @@ int cmd_repack(int argc, const char **argv, const char *prefix) argv_array_push(&cmd_args, "pack-objects"); argv_array_push(&cmd_args, "--keep-true-parents"); - argv_array_push(&cmd_args, "--honor-pack-keep"); + if (!pack_kept_objects) + argv_array_push(&cmd_args, "--honor-pack-keep"); argv_array_push(&cmd_args, "--non-empty"); argv_array_push(&cmd_args, "--all"); argv_array_push(&cmd_args, "--reflog"); diff --git a/t/t7700-repack.sh b/t/t7700-repack.sh index b45bd1e..b1eed5c 100755 --- a/t/t7700-repack.sh +++ b/t/t7700-repack.sh @@ -21,7 +21,7 @@ test_expect_success 'objects in packs marked .keep are not repacked' ' objsha1=$(git verify-pack -v pack-$packsha1.idx | head -n 1 | sed -e "s/^\([0-9a-f]\{40\}\).*/\1/") && mv pack-* .git/objects/pack/ && - git repack -A -d -l && + git repack --no-pack-kept-objects -A -d -l && git prune-packed && for p in .git/objects/pack/*.idx; do idx=$(basename $p) @@ -35,6 +35,22 @@ test_expect_success 'objects in packs marked .keep are not repacked' ' test -z "$found_duplicate_object" ' +test_expect_success 'writing bitmaps can duplicate .keep objects' ' + # build on $objsha1, $packsha1, and .keep state from previous + git repack -Adl && + test_when_finished "found_duplicate_object=" && + for p in .git/objects/pack/*.idx; do + idx=$(basename $p) + test "pack-$packsha1.idx" = "$idx" && continue + if git verify-pack -v $p | egrep "^$objsha1"; then + found_duplicate_object=1 + echo "DUPLICATE OBJECT FOUND" + break + fi + done && + test "$found_duplicate_object" = 1 +' + test_expect_success 'loose objects in alternate ODB are not repacked' ' mkdir alt_objects && echo `pwd`/alt_objects > .git/objects/info/alternates && -- 1.8.5.2.500.g8060133 ^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH] pack-objects: turn off bitmaps when skipping objects 2014-01-23 23:45 ` Siddharth Agarwal 2014-01-23 23:53 ` Siddharth Agarwal @ 2014-01-23 23:56 ` Vicent Martí 2014-01-24 2:26 ` Jeff King 1 sibling, 1 reply; 27+ messages in thread From: Vicent Martí @ 2014-01-23 23:56 UTC (permalink / raw) To: Siddharth Agarwal; +Cc: Jeff King, git On Fri, Jan 24, 2014 at 12:45 AM, Siddharth Agarwal <sid0@fb.com> wrote: > Yes, we'd prefer to do that too. How do you actually do this, though? I > don't see a way to pass `--honor-pack-keep` (shouldn't I pass in its > inverse?) down to `git-pack-objects`. We run with this patch in production, it may be of use to you: https://gist.github.com/vmg/8589317 In fact, it may be worth upstreaming too. I'll kindly ask peff to do it when he has a moment. Apologies for not attaching the patch inline, the GMail web UI doesn't mix well with patch workflow. Cheers, vmg ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] pack-objects: turn off bitmaps when skipping objects 2014-01-23 23:56 ` [PATCH] pack-objects: turn off bitmaps when skipping objects Vicent Martí @ 2014-01-24 2:26 ` Jeff King 0 siblings, 0 replies; 27+ messages in thread From: Jeff King @ 2014-01-24 2:26 UTC (permalink / raw) To: Vicent Martí; +Cc: Siddharth Agarwal, git On Fri, Jan 24, 2014 at 12:56:17AM +0100, Vicent Martí wrote: > On Fri, Jan 24, 2014 at 12:45 AM, Siddharth Agarwal <sid0@fb.com> wrote: > > Yes, we'd prefer to do that too. How do you actually do this, though? I > > don't see a way to pass `--honor-pack-keep` (shouldn't I pass in its > > inverse?) down to `git-pack-objects`. > > We run with this patch in production, it may be of use to you: > https://gist.github.com/vmg/8589317 > > In fact, it may be worth upstreaming too. I'll kindly ask peff to do > it when he has a moment. I was actually looking at it earlier when I sent this message. The tricky thing about the patch is that it turns off --honor-pack-keep, but does _not_ teach git-repack to clean up the .keep file. Which I think is the right and safe thing to do, as otherwise you might blow away a pack with .keep, even though you did not just pack its objects (i.e., because it was written by a fetch or push which did not yet update the refs). So the safe thing is to actually duplicate those objects, leave the .keep pack around, and then assume it will get cleaned up on the next repack. If you _do_ have a stale .keep file, though, then that stale pack will hang around forever (presumably with its objects duplicated in the "real" pack). So I think the patch is doing the right thing, but I was still figuring out how to explain it (and I hope I just did). I'll post it with a full commit message tomorrow. -Peff ^ permalink raw reply [flat|nested] 27+ messages in thread
end of thread, other threads:[~2014-03-03 20:04 UTC | newest] Thread overview: 27+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2014-01-23 2:38 WIth git-next, writing bitmaps fails when keep files are present Siddharth Agarwal 2014-01-23 20:36 ` Siddharth Agarwal 2014-01-23 22:52 ` [PATCH] pack-objects: turn off bitmaps when skipping objects Jeff King 2014-01-23 23:45 ` Siddharth Agarwal 2014-01-23 23:53 ` Siddharth Agarwal 2014-01-24 2:28 ` Jeff King 2014-01-24 2:44 ` Siddharth Agarwal 2014-01-28 6:09 ` [PATCH] repack: add `repack.honorpackkeep` config var Jeff King 2014-01-28 9:21 ` Junio C Hamano 2014-02-24 8:24 ` Jeff King 2014-02-24 19:10 ` Junio C Hamano 2014-02-26 10:13 ` Jeff King 2014-02-26 20:30 ` Junio C Hamano 2014-02-27 11:27 ` Jeff King 2014-02-27 18:04 ` Junio C Hamano 2014-02-28 8:55 ` Jeff King 2014-02-28 17:09 ` Nasser Grainawi 2014-03-01 6:05 ` Jeff King 2014-03-03 19:12 ` Shawn Pearce 2014-02-28 18:45 ` Junio C Hamano 2014-03-01 5:43 ` Jeff King 2014-03-03 18:13 ` Junio C Hamano 2014-03-03 18:15 ` Jeff King 2014-03-03 19:51 ` Junio C Hamano 2014-03-03 20:04 ` Jeff King 2014-01-23 23:56 ` [PATCH] pack-objects: turn off bitmaps when skipping objects Vicent Martí 2014-01-24 2:26 ` Jeff King
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).