* builtin git-bundle - pack contains many more objects than required @ 2007-03-06 4:17 Mark Levedahl 2007-03-06 6:18 ` Junio C Hamano 2007-03-06 7:45 ` [PATCH] git-bundle: fix pack generation Junio C Hamano 0 siblings, 2 replies; 22+ messages in thread From: Mark Levedahl @ 2007-03-06 4:17 UTC (permalink / raw) To: Johannes.Schindelin, Git Mailing List builtin-bundle in version v1.5.0.3-271-g5ced057 (commit 5ced0572217f82f20c4a3460) (and probably before) includes far more objects in the pack than are actually necessary. Comparison to the original shell scripts I wrote shows the prerequisites and refs always agree between those (and that is what I tested before), so the problem is apparently in the arguments as passed to pack-objects. I confess to not understanding the revision walking code called from builtin-bundle.c, so I do not have a patch. However, a simple example: git>git-rev-list --objects HEAD~1..HEAD 5ced0572217f82f20c4a3460492768e07c08aeea 1d30eaabe4b6f7218e4e4cfff5670a493aa7358e 2a1d6a2be1511f65b601897f05962e5f673257d8 contrib cbd77b2f57114d4fa2f119f1b1ee17968d6d67d4 contrib/emacs 8554e3967cc692c6916e5aee35952074d07e8bf0 contrib/emacs/Makefile shows that the bundle for this case should have five objects as shown. In this case, 916 objects are actually included in the pack when executing git bundle create test.bdl HEAD~1..HEAD Mark ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: builtin git-bundle - pack contains many more objects than required 2007-03-06 4:17 builtin git-bundle - pack contains many more objects than required Mark Levedahl @ 2007-03-06 6:18 ` Junio C Hamano 2007-03-06 7:45 ` [PATCH] git-bundle: fix pack generation Junio C Hamano 1 sibling, 0 replies; 22+ messages in thread From: Junio C Hamano @ 2007-03-06 6:18 UTC (permalink / raw) To: Mark Levedahl; +Cc: Johannes.Schindelin, Git Mailing List Mark Levedahl <mlevedahl@gmail.com> writes: > However, a simple example: > > git>git-rev-list --objects HEAD~1..HEAD > 5ced0572217f82f20c4a3460492768e07c08aeea > 1d30eaabe4b6f7218e4e4cfff5670a493aa7358e > 2a1d6a2be1511f65b601897f05962e5f673257d8 contrib > cbd77b2f57114d4fa2f119f1b1ee17968d6d67d4 contrib/emacs > 8554e3967cc692c6916e5aee35952074d07e8bf0 contrib/emacs/Makefile > > shows that the bundle for this case should have five objects as > shown. In this case, 916 objects are actually included in the pack > when executing > > git bundle create test.bdl HEAD~1..HEAD Indeed. The internal rev-list reimplemented in builtin-bundle.c lacks the code to mark trees and blobs in commits that are on the other side of the edge uninteresting. This is on top of the patches I send to the list and Dscho earlier. -- builtin-bundle.c | 6 ++++++ 1 files changed, 6 insertions(+), 0 deletions(-) diff --git a/builtin-bundle.c b/builtin-bundle.c index 4fe74a7..199a30d 100644 --- a/builtin-bundle.c +++ b/builtin-bundle.c @@ -278,6 +278,11 @@ static void show_object(struct object_array_entry *p) write_or_die(1, "\n", 1); } +static void show_edge(struct commit *commit) +{ + ; /* nothing to do */ +} + static int create_bundle(struct bundle_header *header, const char *path, int argc, const char **argv) { @@ -361,6 +366,7 @@ static int create_bundle(struct bundle_header *header, const char *path, dup2(in, 1); close(in); prepare_revision_walk(&revs); + mark_edges_uninteresting(revs.commits, &revs, show_edge); traverse_commit_list(&revs, show_commit, show_object); close(1); while (waitpid(pid, &status, 0) < 0) ^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH] git-bundle: fix pack generation. 2007-03-06 4:17 builtin git-bundle - pack contains many more objects than required Mark Levedahl 2007-03-06 6:18 ` Junio C Hamano @ 2007-03-06 7:45 ` Junio C Hamano 2007-03-07 1:15 ` Mark Levedahl 2007-03-07 1:16 ` [PATCH] git-bundle: fix pack generation Mark Levedahl 1 sibling, 2 replies; 22+ messages in thread From: Junio C Hamano @ 2007-03-06 7:45 UTC (permalink / raw) To: Mark Levedahl; +Cc: Johannes.Schindelin, Git Mailing List The handcrafted built-in rev-list lookalike forgot to mark the trees and blobs contained in the boundary commits uninteresting, resulting in unnecessary objects in the pack. Signed-off-by: Junio C Hamano <junkio@cox.net> --- * I'll have this in 'master' tonight. Also I am thinking about changing 'git bundle verify' to spit out heads and prereqs unconditionally, in addition to 'foo.bdl is ok'. Comments? builtin-bundle.c | 6 ++++++ 1 files changed, 6 insertions(+), 0 deletions(-) diff --git a/builtin-bundle.c b/builtin-bundle.c index d41a413..279b8f8 100644 --- a/builtin-bundle.c +++ b/builtin-bundle.c @@ -263,6 +263,11 @@ static void show_object(struct object_array_entry *p) write_or_die(1, "\n", 1); } +static void show_edge(struct commit *commit) +{ + ; /* nothing to do */ +} + static int create_bundle(struct bundle_header *header, const char *path, int argc, const char **argv) { @@ -341,6 +346,7 @@ static int create_bundle(struct bundle_header *header, const char *path, dup2(in, 1); close(in); prepare_revision_walk(&revs); + mark_edges_uninteresting(revs.commits, &revs, show_edge); traverse_commit_list(&revs, show_commit, show_object); close(1); while (waitpid(pid, &status, 0) < 0) -- 1.5.0.3.862.g71037 ^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH] git-bundle: fix pack generation. 2007-03-06 7:45 ` [PATCH] git-bundle: fix pack generation Junio C Hamano @ 2007-03-07 1:15 ` Mark Levedahl 2007-03-07 3:01 ` Junio C Hamano 2007-03-07 1:16 ` [PATCH] git-bundle: fix pack generation Mark Levedahl 1 sibling, 1 reply; 22+ messages in thread From: Mark Levedahl @ 2007-03-07 1:15 UTC (permalink / raw) To: Junio C Hamano; +Cc: Johannes.Schindelin, Git Mailing List Junio C Hamano wrote: > The handcrafted built-in rev-list lookalike forgot to mark the trees > and blobs contained in the boundary commits uninteresting, resulting > in unnecessary objects in the pack. > > Signed-off-by: Junio C Hamano <junkio@cox.net> > This works for things like master~1..master, but fails on git bundle create t.bdl master --since=1.day.ago. Apparently the boundary commits marked with a date are not being honored in creating a pack. Mark ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] git-bundle: fix pack generation. 2007-03-07 1:15 ` Mark Levedahl @ 2007-03-07 3:01 ` Junio C Hamano 2007-03-07 3:17 ` Mark Levedahl 0 siblings, 1 reply; 22+ messages in thread From: Junio C Hamano @ 2007-03-07 3:01 UTC (permalink / raw) To: Mark Levedahl; +Cc: Johannes.Schindelin, Git Mailing List Mark Levedahl <mlevedahl@gmail.com> writes: > Junio C Hamano wrote: >> The handcrafted built-in rev-list lookalike forgot to mark the trees >> and blobs contained in the boundary commits uninteresting, resulting >> in unnecessary objects in the pack. >> >> Signed-off-by: Junio C Hamano <junkio@cox.net> >> > This works for things like master~1..master, but fails on git bundle > create t.bdl master --since=1.day.ago. > Apparently the boundary commits marked with a date are not being > honored in creating a pack. That one is caused by the broken revision traversal in 'master' and being worked on in 'next'. Care to try the one from 'next' instead? ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] git-bundle: fix pack generation. 2007-03-07 3:01 ` Junio C Hamano @ 2007-03-07 3:17 ` Mark Levedahl 2007-03-07 3:22 ` Johannes Schindelin 0 siblings, 1 reply; 22+ messages in thread From: Mark Levedahl @ 2007-03-07 3:17 UTC (permalink / raw) To: Junio C Hamano; +Cc: Johannes.Schindelin, Git Mailing List Junio C Hamano wrote: > That one is caused by the broken revision traversal in 'master' > and being worked on in 'next'. Care to try the one from 'next' > instead? using next as just pulled from kernel.org (09890a9bce0bc27182bc1f74a34b53) ... git>git bundle create test.bdl HEAD~1..HEAD error: rev-list died 255 I have not found any rev-args set that avoids that error. Mark ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] git-bundle: fix pack generation. 2007-03-07 3:17 ` Mark Levedahl @ 2007-03-07 3:22 ` Johannes Schindelin 2007-03-07 3:50 ` Mark Levedahl 0 siblings, 1 reply; 22+ messages in thread From: Johannes Schindelin @ 2007-03-07 3:22 UTC (permalink / raw) To: Mark Levedahl; +Cc: Junio C Hamano, Git Mailing List Hi, On Tue, 6 Mar 2007, Mark Levedahl wrote: > Junio C Hamano wrote: > > That one is caused by the broken revision traversal in 'master' > > and being worked on in 'next'. Care to try the one from 'next' > > instead? > using next as just pulled from kernel.org (09890a9bce0bc27182bc1f74a34b53) > ... > > git>git bundle create test.bdl HEAD~1..HEAD > error: rev-list died 255 > > I have not found any rev-args set that avoids that error. Have you tried "make test"? In particular, t5510-fetch.sh? If it passes, git-bundle works at least for one particular case, and I'd suspect then that the inbuilt GIT_EXEC_PATH bites you. To avoid that particular peculiarity, just "export GIT_EXEC_PATH=/path/to/next/", and try again. Hiw, Dscho ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] git-bundle: fix pack generation. 2007-03-07 3:22 ` Johannes Schindelin @ 2007-03-07 3:50 ` Mark Levedahl 2007-03-07 4:05 ` Johannes Schindelin 0 siblings, 1 reply; 22+ messages in thread From: Mark Levedahl @ 2007-03-07 3:50 UTC (permalink / raw) To: Johannes Schindelin; +Cc: Junio C Hamano, Git Mailing List Johannes Schindelin wrote: > Have you tried "make test"? In particular, t5510-fetch.sh? > > If it passes, git-bundle works at least for one particular case, and I'd > suspect then that the inbuilt GIT_EXEC_PATH bites you. To avoid that > particular peculiarity, just "export GIT_EXEC_PATH=/path/to/next/", and > try again. > > Hiw, > Dscho Yes, setting GIT_EXEC_PATH fixed the problem, thanks. I just tried git-bundle in a repository where I just committed 1 file, the previous commit is several weeks old. git-bundle create test.bdl HEAD --since=1.day.ago ==>> pack with 1531 objects git-bundle create test.bdl HEAD ^HEAD~1 ==>> pack with 3 objects But, both should only have 3 objects. So, I think the boundary marking with date limiting still has a problem. Apparently, every blob supporting the included commits are included in the pack, even if those blobs are also part of the pack's prerequisites. Mark ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] git-bundle: fix pack generation. 2007-03-07 3:50 ` Mark Levedahl @ 2007-03-07 4:05 ` Johannes Schindelin 2007-03-07 5:57 ` Junio C Hamano 2007-03-07 16:34 ` Mark Levedahl 0 siblings, 2 replies; 22+ messages in thread From: Johannes Schindelin @ 2007-03-07 4:05 UTC (permalink / raw) To: Mark Levedahl; +Cc: Junio C Hamano, Git Mailing List Hi, On Tue, 6 Mar 2007, Mark Levedahl wrote: > Yes, setting GIT_EXEC_PATH fixed the problem, thanks. Wonderful! > I just tried git-bundle in a repository where I just committed 1 file, > the previous commit is several weeks old. > > git-bundle create test.bdl HEAD --since=1.day.ago ==>> pack with 1531 > objects Did you test with "--since=1.day.ago HEAD", i.e. with the correct order? I know you'd like the options to be interminglable, but "HEAD" really is not an option, but an argument. Hth, Dscho ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] git-bundle: fix pack generation. 2007-03-07 4:05 ` Johannes Schindelin @ 2007-03-07 5:57 ` Junio C Hamano 2007-03-07 16:34 ` Mark Levedahl 1 sibling, 0 replies; 22+ messages in thread From: Junio C Hamano @ 2007-03-07 5:57 UTC (permalink / raw) To: Johannes Schindelin; +Cc: Mark Levedahl, Git Mailing List Johannes Schindelin <Johannes.Schindelin@gmx.de> writes: >> git-bundle create test.bdl HEAD --since=1.day.ago ==>> pack with 1531 >> objects > > Did you test with "--since=1.day.ago HEAD", i.e. with the correct order? I > know you'd like the options to be interminglable, but "HEAD" really is not > an option, but an argument. That is probably true but I do not think we have ever had date limiter and --objects interact correctly. It is understandable as traditionally packs were to generate needed objects between revisions, and unlimited but counted (--max-count) or timed (--max-age) are unexpected uses for "rev-list --objects" (in other words, nobody had done so and I am not surprised it does not work as you expect -- rather I would be surprised if anybody has even expectd the current code to work that way without modification). I am a bit too tired tonight so anybody is welcome to beat me to fix this one, preferrably on top of 'next'. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] git-bundle: fix pack generation. 2007-03-07 4:05 ` Johannes Schindelin 2007-03-07 5:57 ` Junio C Hamano @ 2007-03-07 16:34 ` Mark Levedahl 2007-03-07 22:35 ` Johannes Schindelin 1 sibling, 1 reply; 22+ messages in thread From: Mark Levedahl @ 2007-03-07 16:34 UTC (permalink / raw) To: Johannes Schindelin; +Cc: Junio C Hamano, Git Mailing List On 3/6/07, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote: > Hi, > > > > > git-bundle create test.bdl HEAD --since=1.day.ago ==>> pack with 1531 > > objects > > Did you test with "--since=1.day.ago HEAD", i.e. with the correct order? I > know you'd like the options to be interminglable, but "HEAD" really is not > an option, but an argument. > Changing the order of arguments makes no difference, same result either way. Mark ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] git-bundle: fix pack generation. 2007-03-07 16:34 ` Mark Levedahl @ 2007-03-07 22:35 ` Johannes Schindelin 2007-03-07 23:17 ` Linus Torvalds 2007-03-07 23:32 ` [PATCH] git-bundle: fix pack generation Junio C Hamano 0 siblings, 2 replies; 22+ messages in thread From: Johannes Schindelin @ 2007-03-07 22:35 UTC (permalink / raw) To: Mark Levedahl; +Cc: Junio C Hamano, Git Mailing List Hi, On Wed, 7 Mar 2007, Mark Levedahl wrote: > On 3/6/07, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote: > > > > > git-bundle create test.bdl HEAD --since=1.day.ago ==>> pack with > > > 1531 objects > > > > Did you test with "--since=1.day.ago HEAD", i.e. with the correct > > order? I know you'd like the options to be interminglable, but "HEAD" > > really is not an option, but an argument. > > Changing the order of arguments makes no difference, same result either > way. We don't do thin packs. Should we? I guess that $ git ls-tree -r HEAD | wc results in something close to 1500 in that repo. Which basically means that the 1531 objects (including trees and the commit) sounds correct. Of course, we _could_ make the packs thin, but the disadvantage would be that we can never decide at a later stage to allow shallow fetches from that bundle. The advantage of the thin packs would be that the bundles would be much smaller. Ciao, Dscho ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] git-bundle: fix pack generation. 2007-03-07 22:35 ` Johannes Schindelin @ 2007-03-07 23:17 ` Linus Torvalds 2007-03-08 0:27 ` [PATCH] git-bundle: Make thin packs Johannes Schindelin 2007-03-07 23:32 ` [PATCH] git-bundle: fix pack generation Junio C Hamano 1 sibling, 1 reply; 22+ messages in thread From: Linus Torvalds @ 2007-03-07 23:17 UTC (permalink / raw) To: Johannes Schindelin; +Cc: Mark Levedahl, Junio C Hamano, Git Mailing List On Wed, 7 Mar 2007, Johannes Schindelin wrote: > > We don't do thin packs. Should we? Since a bundle doesn't make any sense *anyway* unless you have the prerequisites at the other end, I think you might as well do thin packs. That will cut down on the bundle size a *lot* for the common cases. Linus ^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH] git-bundle: Make thin packs 2007-03-07 23:17 ` Linus Torvalds @ 2007-03-08 0:27 ` Johannes Schindelin 2007-03-08 0:34 ` Linus Torvalds 2007-03-08 2:02 ` Junio C Hamano 0 siblings, 2 replies; 22+ messages in thread From: Johannes Schindelin @ 2007-03-08 0:27 UTC (permalink / raw) To: Linus Torvalds; +Cc: Mark Levedahl, Junio C Hamano, Git Mailing List Thin packs are way smaller, but they rely on the receiving end to have the base objects. However, Git's pack protocol also uses thin packs by default. So make the packs contained in bundles thin, since bundles are just another transport. The patch looks a bit bigger than intended, mainly because --thin _implies_ that pack-objects should run its own rev-list. Therefore, this patch removes all the stuff we used to roll rev-list ourselves. This commit also changes behaviour slightly: since we now know early enough if a specified ref is _not_ contained in the pack, we can avoid putting that ref into the pack. So, we don't die() here, but warn() instead, and skip that ref. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> --- It came to me as a surprise that --thin implies --revs, and I have to admit that I still don't understand it. Anyway, this commit deletes more than double the lines as it adds, so I'm not complaining. On Wed, 7 Mar 2007, Linus Torvalds wrote: > On Wed, 7 Mar 2007, Johannes Schindelin wrote: > > > We don't do thin packs. Should we? > > Since a bundle doesn't make any sense *anyway* unless you have > the prerequisites at the other end, I think you might as well do > thin packs. That will cut down on the bundle size a *lot* for > the common cases. Well, I disagree on the blanket "*anyway*". Shallow fetches are no longer possible from these bundles (at least after this commit _and_ ":/git-bundle: avoid packing" which I just sent out). But then, people who want shallow fetches via bundles are more likely to want tar balls _anyway_. builtin-bundle.c | 85 ++++++++++++++++------------------------------------- 1 files changed, 26 insertions(+), 59 deletions(-) diff --git a/builtin-bundle.c b/builtin-bundle.c index 70d4479..6163358 100644 --- a/builtin-bundle.c +++ b/builtin-bundle.c @@ -257,45 +257,15 @@ static int list_heads(struct bundle_header *header, int argc, const char **argv) return list_refs(&header->references, argc, argv); } -static void show_commit(struct commit *commit) -{ - write_or_die(1, sha1_to_hex(commit->object.sha1), 40); - write_or_die(1, "\n", 1); - if (commit->parents) { - free_commit_list(commit->parents); - commit->parents = NULL; - } -} - -static void show_object(struct object_array_entry *p) -{ - /* An object with name "foo\n0000000..." can be used to - * confuse downstream git-pack-objects very badly. - */ - const char *ep = strchr(p->name, '\n'); - int len = ep ? ep - p->name : strlen(p->name); - write_or_die(1, sha1_to_hex(p->item->sha1), 40); - write_or_die(1, " ", 1); - if (len) - write_or_die(1, p->name, len); - write_or_die(1, "\n", 1); -} - -static void show_edge(struct commit *commit) -{ - ; /* nothing to do */ -} - static int create_bundle(struct bundle_header *header, const char *path, int argc, const char **argv) { int bundle_fd = -1; const char **argv_boundary = xmalloc((argc + 4) * sizeof(const char *)); - const char **argv_pack = xmalloc(4 * sizeof(const char *)); + const char **argv_pack = xmalloc(5 * sizeof(const char *)); int pid, in, out, i, status; char buffer[1024]; struct rev_info revs; - struct object_array tips; bundle_fd = (!strcmp(path, "-") ? 1 : open(path, O_CREAT | O_WRONLY, 0666)); @@ -319,16 +289,20 @@ static int create_bundle(struct bundle_header *header, const char *path, pid = fork_with_pipe(argv_boundary, NULL, &out); if (pid < 0) return -1; - while ((i = read_string(out, buffer, sizeof(buffer))) > 0) + while ((i = read_string(out, buffer, sizeof(buffer))) > 0) { + unsigned char sha1[20]; if (buffer[0] == '-') { - unsigned char sha1[20]; write_or_die(bundle_fd, buffer, i); if (!get_sha1_hex(buffer + 1, sha1)) { struct object *object = parse_object(sha1); object->flags |= UNINTERESTING; add_pending_object(&revs, object, buffer); } + } else if (!get_sha1_hex(buffer, sha1)) { + struct object *object = parse_object(sha1); + object->flags |= SHOWN; } + } while ((i = waitpid(pid, &status, 0)) < 0) if (errno != EINTR) return error("rev-list died"); @@ -336,14 +310,10 @@ static int create_bundle(struct bundle_header *header, const char *path, return error("rev-list died %d", WEXITSTATUS(status)); /* write references */ - revs.tag_objects = 1; - revs.tree_objects = 1; - revs.blob_objects = 1; argc = setup_revisions(argc, argv, &revs, NULL); if (argc > 1) return error("unrecognized argument: %s'", argv[1]); - memset(&tips, 0, sizeof(tips)); for (i = 0; i < revs.pending.nr; i++) { struct object_array_entry *e = revs.pending.objects + i; unsigned char sha1[20]; @@ -353,11 +323,20 @@ static int create_bundle(struct bundle_header *header, const char *path, continue; if (dwim_ref(e->name, strlen(e->name), sha1, &ref) != 1) continue; + /* + * Make sure the refs we wrote out is correct; --max-count and + * other limiting options could have prevented all the tips + * from getting output. + */ + if (!(e->item->flags & SHOWN)) { + warn("ref '%s' is excluded by the rev-list options", + e->name); + continue; + } write_or_die(bundle_fd, sha1_to_hex(e->item->sha1), 40); write_or_die(bundle_fd, " ", 1); write_or_die(bundle_fd, ref, strlen(ref)); write_or_die(bundle_fd, "\n", 1); - add_object_array(e->item, e->name, &tips); free(ref); } @@ -368,39 +347,27 @@ static int create_bundle(struct bundle_header *header, const char *path, argv_pack[0] = "pack-objects"; argv_pack[1] = "--all-progress"; argv_pack[2] = "--stdout"; - argv_pack[3] = NULL; + argv_pack[3] = "--thin"; + argv_pack[4] = NULL; in = -1; out = bundle_fd; pid = fork_with_pipe(argv_pack, &in, &out); if (pid < 0) return error("Could not spawn pack-objects"); - close(1); - dup2(in, 1); + for (i = 0; i < revs.pending.nr; i++) { + struct object *object = revs.pending.objects[i].item; + if (object->flags & UNINTERESTING) + write(in, "^", 1); + write(in, sha1_to_hex(object->sha1), 40); + write(in, "\n", 1); + } close(in); - prepare_revision_walk(&revs); - mark_edges_uninteresting(revs.commits, &revs, show_edge); - traverse_commit_list(&revs, show_commit, show_object); - close(1); while (waitpid(pid, &status, 0) < 0) if (errno != EINTR) return -1; if (!WIFEXITED(status) || WEXITSTATUS(status)) return error ("pack-objects died"); - /* - * Make sure the refs we wrote out is correct; --max-count and - * other limiting options could have prevented all the tips - * from getting output. - */ - status = 0; - for (i = 0; i < tips.nr; i++) { - if (!(tips.objects[i].item->flags & SHOWN)) { - status = 1; - error("%s: not included in the resulting pack", - tips.objects[i].name); - } - } - return status; } -- 1.5.0.3.2562.gfcfe0 ^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH] git-bundle: Make thin packs 2007-03-08 0:27 ` [PATCH] git-bundle: Make thin packs Johannes Schindelin @ 2007-03-08 0:34 ` Linus Torvalds 2007-03-08 0:56 ` Johannes Schindelin 2007-03-08 2:02 ` Junio C Hamano 1 sibling, 1 reply; 22+ messages in thread From: Linus Torvalds @ 2007-03-08 0:34 UTC (permalink / raw) To: Johannes Schindelin; +Cc: Mark Levedahl, Junio C Hamano, Git Mailing List On Thu, 8 Mar 2007, Johannes Schindelin wrote: > > > Since a bundle doesn't make any sense *anyway* unless you have > > the prerequisites at the other end, I think you might as well do > > thin packs. That will cut down on the bundle size a *lot* for > > the common cases. > > Well, I disagree on the blanket "*anyway*". Shallow fetches are no > longer possible from these bundles (at least after this commit > _and_ ":/git-bundle: avoid packing" which I just sent out). Does anybody actually use shallow clones in real life? When I did the numbers a long time ago, the shallow clone didn't actually help much, because it meant that there were no deltas. Which meant that you got 1% of the history for 60% of the price of all history, and the shallow thing didn't really seem to make much sense. I guess that for something with a really long history, you'd get 0.001% of the history for 10% of the price, and maybe it makes sense then. Linus ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] git-bundle: Make thin packs 2007-03-08 0:34 ` Linus Torvalds @ 2007-03-08 0:56 ` Johannes Schindelin 0 siblings, 0 replies; 22+ messages in thread From: Johannes Schindelin @ 2007-03-08 0:56 UTC (permalink / raw) To: Linus Torvalds; +Cc: Mark Levedahl, Junio C Hamano, Git Mailing List Hi, On Wed, 7 Mar 2007, Linus Torvalds wrote: > Does anybody actually use shallow clones in real life? I don't. That's why I work on push/fetch from/via/into shallow repos. > When I did the numbers a long time ago, the shallow clone didn't > actually help much, because it meant that there were no deltas. Which > meant that you got 1% of the history for 60% of the price of all > history, and the shallow thing didn't really seem to make much sense. > > I guess that for something with a really long history, you'd get 0.001% > of the history for 10% of the price, and maybe it makes sense then. You -- being blessed by not having to work with anything closed -- miss an important fact of commercial software development. Most, if not all, projects in a commercial setting contain binary blobs. For example, a DLL, or a PNG, or a Firmware blob. These are updated regularly. And they delta really awfully bad. Also, for something as OpenOffice or Mozilla, I guess that if you really only want to work on the newest revision, the _initial_ fetch will be way cheaper than a full clone. Of course I have no numbers here, but that is what my gut feeling says. Naturally, over time, the shallow clone will fetch in more and more objects from the upstream, eventually being almost as large as if having the complete history, but the user's experience is different: the amount of time needed to get that amount of data is much more widely spread. Ciao, Dscho ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] git-bundle: Make thin packs 2007-03-08 0:27 ` [PATCH] git-bundle: Make thin packs Johannes Schindelin 2007-03-08 0:34 ` Linus Torvalds @ 2007-03-08 2:02 ` Junio C Hamano 2007-03-08 13:07 ` Johannes Schindelin 1 sibling, 1 reply; 22+ messages in thread From: Junio C Hamano @ 2007-03-08 2:02 UTC (permalink / raw) To: Johannes Schindelin; +Cc: Linus Torvalds, Mark Levedahl, Git Mailing List Johannes Schindelin <Johannes.Schindelin@gmx.de> writes: > This commit also changes behaviour slightly: since we now know early > enough if a specified ref is _not_ contained in the pack, we can avoid > putting that ref into the pack. So, we don't die() here, but warn() > instead, and skip that ref. I am a bit worried about that part. If somebody says "bundle create foo.bdl --since=1.day maint" and maint's tip hasn't changed for a month, we would end up having no refs and no pack in the bundle with just a warning. "bundle create foo.bdl -20 master next" does _not_ mean "20 revs but at least have master and next tip", but it may surprise an uninitiated to find out that running "bundle list foo.bdl" on the resulting bundle did not talk about master at all. I have a feeling that it would avoid confusion if we did not even start the pack generation and die early when we find the counting caused us not to include all the positive refs specified on the command line. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] git-bundle: Make thin packs 2007-03-08 2:02 ` Junio C Hamano @ 2007-03-08 13:07 ` Johannes Schindelin 0 siblings, 0 replies; 22+ messages in thread From: Johannes Schindelin @ 2007-03-08 13:07 UTC (permalink / raw) To: Junio C Hamano; +Cc: Linus Torvalds, Mark Levedahl, Git Mailing List Hi, On Wed, 7 Mar 2007, Junio C Hamano wrote: > Johannes Schindelin <Johannes.Schindelin@gmx.de> writes: > > > This commit also changes behaviour slightly: since we now know early > > enough if a specified ref is _not_ contained in the pack, we can avoid > > putting that ref into the pack. So, we don't die() here, but warn() > > instead, and skip that ref. > > I am a bit worried about that part. > > If somebody says "bundle create foo.bdl --since=1.day maint" and > maint's tip hasn't changed for a month, we would end up having > no refs and no pack in the bundle with just a warning. > > "bundle create foo.bdl -20 master next" does _not_ mean "20 revs > but at least have master and next tip", but it may surprise an > uninitiated to find out that running "bundle list foo.bdl" on > the resulting bundle did not talk about master at all. > > I have a feeling that it would avoid confusion if we did not > even start the pack generation and die early when we find the > counting caused us not to include all the positive refs > specified on the command line. Fair enough. Ciao, Dscho ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] git-bundle: fix pack generation. 2007-03-07 22:35 ` Johannes Schindelin 2007-03-07 23:17 ` Linus Torvalds @ 2007-03-07 23:32 ` Junio C Hamano 2007-03-07 23:43 ` [PATCH] git-bundle: avoid packing objects which are in the prerequisites Johannes Schindelin 1 sibling, 1 reply; 22+ messages in thread From: Junio C Hamano @ 2007-03-07 23:32 UTC (permalink / raw) To: Johannes Schindelin; +Cc: Mark Levedahl, Git Mailing List Johannes Schindelin <Johannes.Schindelin@gmx.de> writes: > On Wed, 7 Mar 2007, Mark Levedahl wrote: > >> On 3/6/07, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote: >> > >> > > git-bundle create test.bdl HEAD --since=1.day.ago ==>> pack with >> > > 1531 objects >> > >> > Did you test with "--since=1.day.ago HEAD", i.e. with the correct >> > order? I know you'd like the options to be interminglable, but "HEAD" >> > really is not an option, but an argument. >> >> Changing the order of arguments makes no difference, same result either >> way. > > We don't do thin packs. Should we? I guess that > > $ git ls-tree -r HEAD | wc > > results in something close to 1500 in that repo. Which basically means > that the 1531 objects (including trees and the commit) sounds correct. We should do thin packs, but 1500+ objects is a different story. rev-list --objects <revspec> means "show me the revs and its objects". rev-list --max-count=<n> <revspec> means "show me the revs and its objects, but stop at <n> revs". On the other hand, rev-list --objects ^<rev1> <rev2> means something totally different. "never show me what is reachable from <rev1> but I am interested to see what are reachable from <rev2>". So it is natural that "rev-list --objects --max-count=1 HEAD" shows the same set of tree and blob objects as "git-tar HEAD". "show me the revs and its objects" does not say "but exclude trees and blobs reachable from commits beyond the <n> revs boundary," which is "rev-list --objects HEAD^..HEAD". In other words, "rev-list -1 HEAD" and "rev-list HEAD^..HEAD" are _different_. We fixed (in 'next') the revision traversal so that --max-count and --boundary interact correctly. What's not fixed is that we still use "rev-list --objects --max-*" given by the user, which is not the semantics git-bundle wants. We should rewrite the revspec to drive pack-objects when generating the packfile part of the bundle to do "<revs> --not <boundaries>" internally in git-bundle. ^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH] git-bundle: avoid packing objects which are in the prerequisites 2007-03-07 23:32 ` [PATCH] git-bundle: fix pack generation Junio C Hamano @ 2007-03-07 23:43 ` Johannes Schindelin 2007-03-08 20:36 ` Mark Levedahl 0 siblings, 1 reply; 22+ messages in thread From: Johannes Schindelin @ 2007-03-07 23:43 UTC (permalink / raw) To: Junio C Hamano; +Cc: Mark Levedahl, Git Mailing List When saying something like "--since=1.day.ago" or "--max-count=5", git-bundle finds the boundary commits which are recorded as prerequisites. However, it failed to tell pack-objects _not_ to pack the objects which are in these. Fix that. And add a test for that. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> --- On Wed, 7 Mar 2007, Junio C Hamano wrote: > What's not fixed is that we still use "rev-list --objects > --max-*" given by the user, which is not the semantics > git-bundle wants. We should rewrite the revspec to drive > pack-objects when generating the packfile part of the bundle to > do "<revs> --not <boundaries>" internally in git-bundle. Since we already calculate the prerequisites, it's easy to pass them as uninteresting, too. It does not hurt to keep the max* limiters, so they are not unset. builtin-bundle.c | 15 ++++++++++++--- t/t5510-fetch.sh | 11 +++++++++++ 2 files changed, 23 insertions(+), 3 deletions(-) diff --git a/builtin-bundle.c b/builtin-bundle.c index 3b3bc25..70d4479 100644 --- a/builtin-bundle.c +++ b/builtin-bundle.c @@ -305,6 +305,10 @@ static int create_bundle(struct bundle_header *header, const char *path, /* write signature */ write_or_die(bundle_fd, bundle_signature, strlen(bundle_signature)); + /* init revs to list objects for pack-objects later */ + save_commit_buffer = 0; + init_revisions(&revs, NULL); + /* write prerequisites */ memcpy(argv_boundary + 3, argv + 1, argc * sizeof(const char *)); argv_boundary[0] = "rev-list"; @@ -316,8 +320,15 @@ static int create_bundle(struct bundle_header *header, const char *path, if (pid < 0) return -1; while ((i = read_string(out, buffer, sizeof(buffer))) > 0) - if (buffer[0] == '-') + if (buffer[0] == '-') { + unsigned char sha1[20]; write_or_die(bundle_fd, buffer, i); + if (!get_sha1_hex(buffer + 1, sha1)) { + struct object *object = parse_object(sha1); + object->flags |= UNINTERESTING; + add_pending_object(&revs, object, buffer); + } + } while ((i = waitpid(pid, &status, 0)) < 0) if (errno != EINTR) return error("rev-list died"); @@ -325,8 +336,6 @@ static int create_bundle(struct bundle_header *header, const char *path, return error("rev-list died %d", WEXITSTATUS(status)); /* write references */ - save_commit_buffer = 0; - init_revisions(&revs, NULL); revs.tag_objects = 1; revs.tree_objects = 1; revs.blob_objects = 1; diff --git a/t/t5510-fetch.sh b/t/t5510-fetch.sh index ce96b4b..7be9793 100755 --- a/t/t5510-fetch.sh +++ b/t/t5510-fetch.sh @@ -121,4 +121,15 @@ test_expect_success 'unbundle 2' ' test "tip" = "$(git log -1 --pretty=oneline master | cut -b42-)" ' +test_expect_success 'bundle does not prerequisite objects' ' + cd "$D" && + touch file2 && + git add file2 && + git commit -m add.file2 file2 && + git bundle create bundle3 -1 HEAD && + sed "1,4d" < bundle3 > bundle.pack && + git index-pack bundle.pack && + test 4 = $(git verify-pack -v bundle.pack | wc -l) +' + test_done -- 1.5.0.3.2562.gfcfe0 ^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH] git-bundle: avoid packing objects which are in the prerequisites 2007-03-07 23:43 ` [PATCH] git-bundle: avoid packing objects which are in the prerequisites Johannes Schindelin @ 2007-03-08 20:36 ` Mark Levedahl 0 siblings, 0 replies; 22+ messages in thread From: Mark Levedahl @ 2007-03-08 20:36 UTC (permalink / raw) To: Johannes Schindelin; +Cc: Junio C Hamano, Git Mailing List Johannes Schindelin wrote: Using next (2432cbc253e8ef5cf09062abb8f7325813476d1d) that has your patches for creating a thin-pack and limiting objects, I generally find the bundle file created cannot be used. Specific example you can repeat in a current git repo: git>git branch foo 2432cbc253e8ef5c git>git bundle create t.bdl foo~1..foo Generating pack... Done counting 20 objects. Result has 14 objects. Deltifying 14 objects. 100% (14/14) done Writing 14 objects. 100% (14/14) done Total 14 (delta 10), reused 10 (delta 6) git>git fetch t.bdl fatal: pack has junk at the end error: index-pack exited with status 128 fatal: Fetch failure: t.bdl git> Also, we get an interesting failure if the references are not touched in a given date range: git>git bundle create t.bdl --since=5.minutes.ago foo warning: ref 'foo' is excluded by the rev-list options Generating pack... Done counting 38321 objects. Deltifying 38321 objects. 100% (38321/38321) done Writing 38321 objects. 100% (38321/38321) done Total 38321 (delta 26678), reused 38180 (delta 26551) git> No references requies 38000+ objects?? In this case, the bundle includes every object in the repository, the exact opposite of what should happen (zero objects). Mark ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] git-bundle: fix pack generation. 2007-03-06 7:45 ` [PATCH] git-bundle: fix pack generation Junio C Hamano 2007-03-07 1:15 ` Mark Levedahl @ 2007-03-07 1:16 ` Mark Levedahl 1 sibling, 0 replies; 22+ messages in thread From: Mark Levedahl @ 2007-03-07 1:16 UTC (permalink / raw) To: Junio C Hamano; +Cc: Johannes.Schindelin, Git Mailing List On 3/6/07, *Junio C Hamano* <junkio@cox.net <mailto:junkio@cox.net>> wrote: * I'll have this in 'master' tonight. Also I am thinking about changing 'git bundle verify' to spit out heads and prereqs unconditionally, in addition to 'foo.bdl is ok'. Comments? That is a good feature, much better than trying to list the bundle, and I have no issue with this suggestion. We could also define "git bundle [depends|show-refs] bundle-file" to be more suggestive of the actual result. As the prerequisites have a common conceptual definition with those of a shallow clone, perhaps there should be a common way to get those. In this vein, git-ls-remote bundle already shows the references in git-like fashion. Mark ^ permalink raw reply [flat|nested] 22+ messages in thread
end of thread, other threads:[~2007-03-08 20:36 UTC | newest] Thread overview: 22+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2007-03-06 4:17 builtin git-bundle - pack contains many more objects than required Mark Levedahl 2007-03-06 6:18 ` Junio C Hamano 2007-03-06 7:45 ` [PATCH] git-bundle: fix pack generation Junio C Hamano 2007-03-07 1:15 ` Mark Levedahl 2007-03-07 3:01 ` Junio C Hamano 2007-03-07 3:17 ` Mark Levedahl 2007-03-07 3:22 ` Johannes Schindelin 2007-03-07 3:50 ` Mark Levedahl 2007-03-07 4:05 ` Johannes Schindelin 2007-03-07 5:57 ` Junio C Hamano 2007-03-07 16:34 ` Mark Levedahl 2007-03-07 22:35 ` Johannes Schindelin 2007-03-07 23:17 ` Linus Torvalds 2007-03-08 0:27 ` [PATCH] git-bundle: Make thin packs Johannes Schindelin 2007-03-08 0:34 ` Linus Torvalds 2007-03-08 0:56 ` Johannes Schindelin 2007-03-08 2:02 ` Junio C Hamano 2007-03-08 13:07 ` Johannes Schindelin 2007-03-07 23:32 ` [PATCH] git-bundle: fix pack generation Junio C Hamano 2007-03-07 23:43 ` [PATCH] git-bundle: avoid packing objects which are in the prerequisites Johannes Schindelin 2007-03-08 20:36 ` Mark Levedahl 2007-03-07 1:16 ` [PATCH] git-bundle: fix pack generation Mark Levedahl
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).