* builtin git-bundle - pack contains many more objects than required
@ 2007-03-06 4:17 Mark Levedahl
2007-03-06 6:18 ` Junio C Hamano
2007-03-06 7:45 ` [PATCH] git-bundle: fix pack generation Junio C Hamano
0 siblings, 2 replies; 22+ messages in thread
From: Mark Levedahl @ 2007-03-06 4:17 UTC (permalink / raw)
To: Johannes.Schindelin, Git Mailing List
builtin-bundle in version v1.5.0.3-271-g5ced057 (commit
5ced0572217f82f20c4a3460) (and probably before) includes far more
objects in the pack than are actually necessary. Comparison to the
original shell scripts I wrote shows the prerequisites and refs always
agree between those (and that is what I tested before), so the problem
is apparently in the arguments as passed to pack-objects. I confess to
not understanding the revision walking code called from
builtin-bundle.c, so I do not have a patch.
However, a simple example:
git>git-rev-list --objects HEAD~1..HEAD
5ced0572217f82f20c4a3460492768e07c08aeea
1d30eaabe4b6f7218e4e4cfff5670a493aa7358e
2a1d6a2be1511f65b601897f05962e5f673257d8 contrib
cbd77b2f57114d4fa2f119f1b1ee17968d6d67d4 contrib/emacs
8554e3967cc692c6916e5aee35952074d07e8bf0 contrib/emacs/Makefile
shows that the bundle for this case should have five objects as shown.
In this case, 916 objects are actually included in the pack when executing
git bundle create test.bdl HEAD~1..HEAD
Mark
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: builtin git-bundle - pack contains many more objects than required
2007-03-06 4:17 builtin git-bundle - pack contains many more objects than required Mark Levedahl
@ 2007-03-06 6:18 ` Junio C Hamano
2007-03-06 7:45 ` [PATCH] git-bundle: fix pack generation Junio C Hamano
1 sibling, 0 replies; 22+ messages in thread
From: Junio C Hamano @ 2007-03-06 6:18 UTC (permalink / raw)
To: Mark Levedahl; +Cc: Johannes.Schindelin, Git Mailing List
Mark Levedahl <mlevedahl@gmail.com> writes:
> However, a simple example:
>
> git>git-rev-list --objects HEAD~1..HEAD
> 5ced0572217f82f20c4a3460492768e07c08aeea
> 1d30eaabe4b6f7218e4e4cfff5670a493aa7358e
> 2a1d6a2be1511f65b601897f05962e5f673257d8 contrib
> cbd77b2f57114d4fa2f119f1b1ee17968d6d67d4 contrib/emacs
> 8554e3967cc692c6916e5aee35952074d07e8bf0 contrib/emacs/Makefile
>
> shows that the bundle for this case should have five objects as
> shown. In this case, 916 objects are actually included in the pack
> when executing
>
> git bundle create test.bdl HEAD~1..HEAD
Indeed. The internal rev-list reimplemented in builtin-bundle.c
lacks the code to mark trees and blobs in commits that are on
the other side of the edge uninteresting.
This is on top of the patches I send to the list and Dscho
earlier.
--
builtin-bundle.c | 6 ++++++
1 files changed, 6 insertions(+), 0 deletions(-)
diff --git a/builtin-bundle.c b/builtin-bundle.c
index 4fe74a7..199a30d 100644
--- a/builtin-bundle.c
+++ b/builtin-bundle.c
@@ -278,6 +278,11 @@ static void show_object(struct object_array_entry *p)
write_or_die(1, "\n", 1);
}
+static void show_edge(struct commit *commit)
+{
+ ; /* nothing to do */
+}
+
static int create_bundle(struct bundle_header *header, const char *path,
int argc, const char **argv)
{
@@ -361,6 +366,7 @@ static int create_bundle(struct bundle_header *header, const char *path,
dup2(in, 1);
close(in);
prepare_revision_walk(&revs);
+ mark_edges_uninteresting(revs.commits, &revs, show_edge);
traverse_commit_list(&revs, show_commit, show_object);
close(1);
while (waitpid(pid, &status, 0) < 0)
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH] git-bundle: fix pack generation.
2007-03-06 4:17 builtin git-bundle - pack contains many more objects than required Mark Levedahl
2007-03-06 6:18 ` Junio C Hamano
@ 2007-03-06 7:45 ` Junio C Hamano
2007-03-07 1:15 ` Mark Levedahl
2007-03-07 1:16 ` [PATCH] git-bundle: fix pack generation Mark Levedahl
1 sibling, 2 replies; 22+ messages in thread
From: Junio C Hamano @ 2007-03-06 7:45 UTC (permalink / raw)
To: Mark Levedahl; +Cc: Johannes.Schindelin, Git Mailing List
The handcrafted built-in rev-list lookalike forgot to mark the trees
and blobs contained in the boundary commits uninteresting, resulting
in unnecessary objects in the pack.
Signed-off-by: Junio C Hamano <junkio@cox.net>
---
* I'll have this in 'master' tonight. Also I am thinking about
changing 'git bundle verify' to spit out heads and prereqs
unconditionally, in addition to 'foo.bdl is ok'. Comments?
builtin-bundle.c | 6 ++++++
1 files changed, 6 insertions(+), 0 deletions(-)
diff --git a/builtin-bundle.c b/builtin-bundle.c
index d41a413..279b8f8 100644
--- a/builtin-bundle.c
+++ b/builtin-bundle.c
@@ -263,6 +263,11 @@ static void show_object(struct object_array_entry *p)
write_or_die(1, "\n", 1);
}
+static void show_edge(struct commit *commit)
+{
+ ; /* nothing to do */
+}
+
static int create_bundle(struct bundle_header *header, const char *path,
int argc, const char **argv)
{
@@ -341,6 +346,7 @@ static int create_bundle(struct bundle_header *header, const char *path,
dup2(in, 1);
close(in);
prepare_revision_walk(&revs);
+ mark_edges_uninteresting(revs.commits, &revs, show_edge);
traverse_commit_list(&revs, show_commit, show_object);
close(1);
while (waitpid(pid, &status, 0) < 0)
--
1.5.0.3.862.g71037
^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH] git-bundle: fix pack generation.
2007-03-06 7:45 ` [PATCH] git-bundle: fix pack generation Junio C Hamano
@ 2007-03-07 1:15 ` Mark Levedahl
2007-03-07 3:01 ` Junio C Hamano
2007-03-07 1:16 ` [PATCH] git-bundle: fix pack generation Mark Levedahl
1 sibling, 1 reply; 22+ messages in thread
From: Mark Levedahl @ 2007-03-07 1:15 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Johannes.Schindelin, Git Mailing List
Junio C Hamano wrote:
> The handcrafted built-in rev-list lookalike forgot to mark the trees
> and blobs contained in the boundary commits uninteresting, resulting
> in unnecessary objects in the pack.
>
> Signed-off-by: Junio C Hamano <junkio@cox.net>
>
This works for things like master~1..master, but fails on git bundle
create t.bdl master --since=1.day.ago.
Apparently the boundary commits marked with a date are not being honored
in creating a pack.
Mark
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] git-bundle: fix pack generation.
2007-03-06 7:45 ` [PATCH] git-bundle: fix pack generation Junio C Hamano
2007-03-07 1:15 ` Mark Levedahl
@ 2007-03-07 1:16 ` Mark Levedahl
1 sibling, 0 replies; 22+ messages in thread
From: Mark Levedahl @ 2007-03-07 1:16 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Johannes.Schindelin, Git Mailing List
On 3/6/07, *Junio C Hamano* <junkio@cox.net <mailto:junkio@cox.net>> wrote:
* I'll have this in 'master' tonight. Also I am thinking about
changing 'git bundle verify' to spit out heads and prereqs
unconditionally, in addition to 'foo.bdl is ok'. Comments?
That is a good feature, much better than trying to list the bundle, and
I have no issue with this suggestion.
We could also define "git bundle [depends|show-refs] bundle-file" to be
more suggestive of the actual result. As the prerequisites have a common
conceptual definition with those of a shallow clone, perhaps there
should be a common way to get those. In this vein, git-ls-remote bundle
already shows the references in git-like fashion.
Mark
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] git-bundle: fix pack generation.
2007-03-07 1:15 ` Mark Levedahl
@ 2007-03-07 3:01 ` Junio C Hamano
2007-03-07 3:17 ` Mark Levedahl
0 siblings, 1 reply; 22+ messages in thread
From: Junio C Hamano @ 2007-03-07 3:01 UTC (permalink / raw)
To: Mark Levedahl; +Cc: Johannes.Schindelin, Git Mailing List
Mark Levedahl <mlevedahl@gmail.com> writes:
> Junio C Hamano wrote:
>> The handcrafted built-in rev-list lookalike forgot to mark the trees
>> and blobs contained in the boundary commits uninteresting, resulting
>> in unnecessary objects in the pack.
>>
>> Signed-off-by: Junio C Hamano <junkio@cox.net>
>>
> This works for things like master~1..master, but fails on git bundle
> create t.bdl master --since=1.day.ago.
> Apparently the boundary commits marked with a date are not being
> honored in creating a pack.
That one is caused by the broken revision traversal in 'master'
and being worked on in 'next'. Care to try the one from 'next'
instead?
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] git-bundle: fix pack generation.
2007-03-07 3:01 ` Junio C Hamano
@ 2007-03-07 3:17 ` Mark Levedahl
2007-03-07 3:22 ` Johannes Schindelin
0 siblings, 1 reply; 22+ messages in thread
From: Mark Levedahl @ 2007-03-07 3:17 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Johannes.Schindelin, Git Mailing List
Junio C Hamano wrote:
> That one is caused by the broken revision traversal in 'master'
> and being worked on in 'next'. Care to try the one from 'next'
> instead?
using next as just pulled from kernel.org
(09890a9bce0bc27182bc1f74a34b53) ...
git>git bundle create test.bdl HEAD~1..HEAD
error: rev-list died 255
I have not found any rev-args set that avoids that error.
Mark
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] git-bundle: fix pack generation.
2007-03-07 3:17 ` Mark Levedahl
@ 2007-03-07 3:22 ` Johannes Schindelin
2007-03-07 3:50 ` Mark Levedahl
0 siblings, 1 reply; 22+ messages in thread
From: Johannes Schindelin @ 2007-03-07 3:22 UTC (permalink / raw)
To: Mark Levedahl; +Cc: Junio C Hamano, Git Mailing List
Hi,
On Tue, 6 Mar 2007, Mark Levedahl wrote:
> Junio C Hamano wrote:
> > That one is caused by the broken revision traversal in 'master'
> > and being worked on in 'next'. Care to try the one from 'next'
> > instead?
> using next as just pulled from kernel.org (09890a9bce0bc27182bc1f74a34b53)
> ...
>
> git>git bundle create test.bdl HEAD~1..HEAD
> error: rev-list died 255
>
> I have not found any rev-args set that avoids that error.
Have you tried "make test"? In particular, t5510-fetch.sh?
If it passes, git-bundle works at least for one particular case, and I'd
suspect then that the inbuilt GIT_EXEC_PATH bites you. To avoid that
particular peculiarity, just "export GIT_EXEC_PATH=/path/to/next/", and
try again.
Hiw,
Dscho
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] git-bundle: fix pack generation.
2007-03-07 3:22 ` Johannes Schindelin
@ 2007-03-07 3:50 ` Mark Levedahl
2007-03-07 4:05 ` Johannes Schindelin
0 siblings, 1 reply; 22+ messages in thread
From: Mark Levedahl @ 2007-03-07 3:50 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: Junio C Hamano, Git Mailing List
Johannes Schindelin wrote:
> Have you tried "make test"? In particular, t5510-fetch.sh?
>
> If it passes, git-bundle works at least for one particular case, and I'd
> suspect then that the inbuilt GIT_EXEC_PATH bites you. To avoid that
> particular peculiarity, just "export GIT_EXEC_PATH=/path/to/next/", and
> try again.
>
> Hiw,
> Dscho
Yes, setting GIT_EXEC_PATH fixed the problem, thanks.
I just tried git-bundle in a repository where I just committed 1 file,
the previous commit is several weeks old.
git-bundle create test.bdl HEAD --since=1.day.ago ==>> pack with 1531
objects
git-bundle create test.bdl HEAD ^HEAD~1 ==>> pack with 3 objects
But, both should only have 3 objects. So, I think the boundary marking
with date limiting still has a problem. Apparently, every blob
supporting the included commits are included in the pack, even if those
blobs are also part of the pack's prerequisites.
Mark
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] git-bundle: fix pack generation.
2007-03-07 3:50 ` Mark Levedahl
@ 2007-03-07 4:05 ` Johannes Schindelin
2007-03-07 5:57 ` Junio C Hamano
2007-03-07 16:34 ` Mark Levedahl
0 siblings, 2 replies; 22+ messages in thread
From: Johannes Schindelin @ 2007-03-07 4:05 UTC (permalink / raw)
To: Mark Levedahl; +Cc: Junio C Hamano, Git Mailing List
Hi,
On Tue, 6 Mar 2007, Mark Levedahl wrote:
> Yes, setting GIT_EXEC_PATH fixed the problem, thanks.
Wonderful!
> I just tried git-bundle in a repository where I just committed 1 file,
> the previous commit is several weeks old.
>
> git-bundle create test.bdl HEAD --since=1.day.ago ==>> pack with 1531
> objects
Did you test with "--since=1.day.ago HEAD", i.e. with the correct order? I
know you'd like the options to be interminglable, but "HEAD" really is not
an option, but an argument.
Hth,
Dscho
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] git-bundle: fix pack generation.
2007-03-07 4:05 ` Johannes Schindelin
@ 2007-03-07 5:57 ` Junio C Hamano
2007-03-07 16:34 ` Mark Levedahl
1 sibling, 0 replies; 22+ messages in thread
From: Junio C Hamano @ 2007-03-07 5:57 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: Mark Levedahl, Git Mailing List
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>> git-bundle create test.bdl HEAD --since=1.day.ago ==>> pack with 1531
>> objects
>
> Did you test with "--since=1.day.ago HEAD", i.e. with the correct order? I
> know you'd like the options to be interminglable, but "HEAD" really is not
> an option, but an argument.
That is probably true but I do not think we have ever had date
limiter and --objects interact correctly. It is understandable
as traditionally packs were to generate needed objects between
revisions, and unlimited but counted (--max-count) or timed
(--max-age) are unexpected uses for "rev-list --objects" (in
other words, nobody had done so and I am not surprised it does
not work as you expect -- rather I would be surprised if anybody
has even expectd the current code to work that way without
modification).
I am a bit too tired tonight so anybody is welcome to beat me to
fix this one, preferrably on top of 'next'.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] git-bundle: fix pack generation.
2007-03-07 4:05 ` Johannes Schindelin
2007-03-07 5:57 ` Junio C Hamano
@ 2007-03-07 16:34 ` Mark Levedahl
2007-03-07 22:35 ` Johannes Schindelin
1 sibling, 1 reply; 22+ messages in thread
From: Mark Levedahl @ 2007-03-07 16:34 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: Junio C Hamano, Git Mailing List
On 3/6/07, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> Hi,
>
> >
> > git-bundle create test.bdl HEAD --since=1.day.ago ==>> pack with 1531
> > objects
>
> Did you test with "--since=1.day.ago HEAD", i.e. with the correct order? I
> know you'd like the options to be interminglable, but "HEAD" really is not
> an option, but an argument.
>
Changing the order of arguments makes no difference, same result either way.
Mark
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] git-bundle: fix pack generation.
2007-03-07 16:34 ` Mark Levedahl
@ 2007-03-07 22:35 ` Johannes Schindelin
2007-03-07 23:17 ` Linus Torvalds
2007-03-07 23:32 ` [PATCH] git-bundle: fix pack generation Junio C Hamano
0 siblings, 2 replies; 22+ messages in thread
From: Johannes Schindelin @ 2007-03-07 22:35 UTC (permalink / raw)
To: Mark Levedahl; +Cc: Junio C Hamano, Git Mailing List
Hi,
On Wed, 7 Mar 2007, Mark Levedahl wrote:
> On 3/6/07, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> >
> > > git-bundle create test.bdl HEAD --since=1.day.ago ==>> pack with
> > > 1531 objects
> >
> > Did you test with "--since=1.day.ago HEAD", i.e. with the correct
> > order? I know you'd like the options to be interminglable, but "HEAD"
> > really is not an option, but an argument.
>
> Changing the order of arguments makes no difference, same result either
> way.
We don't do thin packs. Should we? I guess that
$ git ls-tree -r HEAD | wc
results in something close to 1500 in that repo. Which basically means
that the 1531 objects (including trees and the commit) sounds correct.
Of course, we _could_ make the packs thin, but the disadvantage would be
that we can never decide at a later stage to allow shallow fetches from
that bundle.
The advantage of the thin packs would be that the bundles would be much
smaller.
Ciao,
Dscho
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] git-bundle: fix pack generation.
2007-03-07 22:35 ` Johannes Schindelin
@ 2007-03-07 23:17 ` Linus Torvalds
2007-03-08 0:27 ` [PATCH] git-bundle: Make thin packs Johannes Schindelin
2007-03-07 23:32 ` [PATCH] git-bundle: fix pack generation Junio C Hamano
1 sibling, 1 reply; 22+ messages in thread
From: Linus Torvalds @ 2007-03-07 23:17 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: Mark Levedahl, Junio C Hamano, Git Mailing List
On Wed, 7 Mar 2007, Johannes Schindelin wrote:
>
> We don't do thin packs. Should we?
Since a bundle doesn't make any sense *anyway* unless you have the
prerequisites at the other end, I think you might as well do thin packs.
That will cut down on the bundle size a *lot* for the common cases.
Linus
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] git-bundle: fix pack generation.
2007-03-07 22:35 ` Johannes Schindelin
2007-03-07 23:17 ` Linus Torvalds
@ 2007-03-07 23:32 ` Junio C Hamano
2007-03-07 23:43 ` [PATCH] git-bundle: avoid packing objects which are in the prerequisites Johannes Schindelin
1 sibling, 1 reply; 22+ messages in thread
From: Junio C Hamano @ 2007-03-07 23:32 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: Mark Levedahl, Git Mailing List
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> On Wed, 7 Mar 2007, Mark Levedahl wrote:
>
>> On 3/6/07, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
>> >
>> > > git-bundle create test.bdl HEAD --since=1.day.ago ==>> pack with
>> > > 1531 objects
>> >
>> > Did you test with "--since=1.day.ago HEAD", i.e. with the correct
>> > order? I know you'd like the options to be interminglable, but "HEAD"
>> > really is not an option, but an argument.
>>
>> Changing the order of arguments makes no difference, same result either
>> way.
>
> We don't do thin packs. Should we? I guess that
>
> $ git ls-tree -r HEAD | wc
>
> results in something close to 1500 in that repo. Which basically means
> that the 1531 objects (including trees and the commit) sounds correct.
We should do thin packs, but 1500+ objects is a different story.
rev-list --objects <revspec>
means "show me the revs and its objects".
rev-list --max-count=<n> <revspec>
means "show me the revs and its objects, but stop at <n> revs".
On the other hand,
rev-list --objects ^<rev1> <rev2>
means something totally different. "never show me what is
reachable from <rev1> but I am interested to see what are
reachable from <rev2>".
So it is natural that "rev-list --objects --max-count=1 HEAD"
shows the same set of tree and blob objects as "git-tar HEAD".
"show me the revs and its objects" does not say "but exclude
trees and blobs reachable from commits beyond the <n> revs
boundary," which is "rev-list --objects HEAD^..HEAD".
In other words, "rev-list -1 HEAD" and "rev-list HEAD^..HEAD"
are _different_.
We fixed (in 'next') the revision traversal so that --max-count
and --boundary interact correctly. What's not fixed is that we
still use "rev-list --objects --max-*" given by the user, which
is not the semantics git-bundle wants. We should rewrite the
revspec to drive pack-objects when generating the packfile part
of the bundle to do "<revs> --not <boundaries>" internally in
git-bundle.
^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH] git-bundle: avoid packing objects which are in the prerequisites
2007-03-07 23:32 ` [PATCH] git-bundle: fix pack generation Junio C Hamano
@ 2007-03-07 23:43 ` Johannes Schindelin
2007-03-08 20:36 ` Mark Levedahl
0 siblings, 1 reply; 22+ messages in thread
From: Johannes Schindelin @ 2007-03-07 23:43 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Mark Levedahl, Git Mailing List
When saying something like "--since=1.day.ago" or "--max-count=5",
git-bundle finds the boundary commits which are recorded as
prerequisites. However, it failed to tell pack-objects _not_ to
pack the objects which are in these.
Fix that. And add a test for that.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
On Wed, 7 Mar 2007, Junio C Hamano wrote:
> What's not fixed is that we still use "rev-list --objects
> --max-*" given by the user, which is not the semantics
> git-bundle wants. We should rewrite the revspec to drive
> pack-objects when generating the packfile part of the bundle to
> do "<revs> --not <boundaries>" internally in git-bundle.
Since we already calculate the prerequisites, it's easy to pass
them as uninteresting, too. It does not hurt to keep the max*
limiters, so they are not unset.
builtin-bundle.c | 15 ++++++++++++---
t/t5510-fetch.sh | 11 +++++++++++
2 files changed, 23 insertions(+), 3 deletions(-)
diff --git a/builtin-bundle.c b/builtin-bundle.c
index 3b3bc25..70d4479 100644
--- a/builtin-bundle.c
+++ b/builtin-bundle.c
@@ -305,6 +305,10 @@ static int create_bundle(struct bundle_header *header, const char *path,
/* write signature */
write_or_die(bundle_fd, bundle_signature, strlen(bundle_signature));
+ /* init revs to list objects for pack-objects later */
+ save_commit_buffer = 0;
+ init_revisions(&revs, NULL);
+
/* write prerequisites */
memcpy(argv_boundary + 3, argv + 1, argc * sizeof(const char *));
argv_boundary[0] = "rev-list";
@@ -316,8 +320,15 @@ static int create_bundle(struct bundle_header *header, const char *path,
if (pid < 0)
return -1;
while ((i = read_string(out, buffer, sizeof(buffer))) > 0)
- if (buffer[0] == '-')
+ if (buffer[0] == '-') {
+ unsigned char sha1[20];
write_or_die(bundle_fd, buffer, i);
+ if (!get_sha1_hex(buffer + 1, sha1)) {
+ struct object *object = parse_object(sha1);
+ object->flags |= UNINTERESTING;
+ add_pending_object(&revs, object, buffer);
+ }
+ }
while ((i = waitpid(pid, &status, 0)) < 0)
if (errno != EINTR)
return error("rev-list died");
@@ -325,8 +336,6 @@ static int create_bundle(struct bundle_header *header, const char *path,
return error("rev-list died %d", WEXITSTATUS(status));
/* write references */
- save_commit_buffer = 0;
- init_revisions(&revs, NULL);
revs.tag_objects = 1;
revs.tree_objects = 1;
revs.blob_objects = 1;
diff --git a/t/t5510-fetch.sh b/t/t5510-fetch.sh
index ce96b4b..7be9793 100755
--- a/t/t5510-fetch.sh
+++ b/t/t5510-fetch.sh
@@ -121,4 +121,15 @@ test_expect_success 'unbundle 2' '
test "tip" = "$(git log -1 --pretty=oneline master | cut -b42-)"
'
+test_expect_success 'bundle does not prerequisite objects' '
+ cd "$D" &&
+ touch file2 &&
+ git add file2 &&
+ git commit -m add.file2 file2 &&
+ git bundle create bundle3 -1 HEAD &&
+ sed "1,4d" < bundle3 > bundle.pack &&
+ git index-pack bundle.pack &&
+ test 4 = $(git verify-pack -v bundle.pack | wc -l)
+'
+
test_done
--
1.5.0.3.2562.gfcfe0
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH] git-bundle: Make thin packs
2007-03-07 23:17 ` Linus Torvalds
@ 2007-03-08 0:27 ` Johannes Schindelin
2007-03-08 0:34 ` Linus Torvalds
2007-03-08 2:02 ` Junio C Hamano
0 siblings, 2 replies; 22+ messages in thread
From: Johannes Schindelin @ 2007-03-08 0:27 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Mark Levedahl, Junio C Hamano, Git Mailing List
Thin packs are way smaller, but they rely on the receiving end to have the
base objects. However, Git's pack protocol also uses thin packs by
default. So make the packs contained in bundles thin, since bundles are
just another transport.
The patch looks a bit bigger than intended, mainly because --thin
_implies_ that pack-objects should run its own rev-list. Therefore, this
patch removes all the stuff we used to roll rev-list ourselves.
This commit also changes behaviour slightly: since we now know early
enough if a specified ref is _not_ contained in the pack, we can avoid
putting that ref into the pack. So, we don't die() here, but warn()
instead, and skip that ref.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
It came to me as a surprise that --thin implies --revs, and
I have to admit that I still don't understand it.
Anyway, this commit deletes more than double the lines as it
adds, so I'm not complaining.
On Wed, 7 Mar 2007, Linus Torvalds wrote:
> On Wed, 7 Mar 2007, Johannes Schindelin wrote:
>
> > We don't do thin packs. Should we?
>
> Since a bundle doesn't make any sense *anyway* unless you have
> the prerequisites at the other end, I think you might as well do
> thin packs. That will cut down on the bundle size a *lot* for
> the common cases.
Well, I disagree on the blanket "*anyway*". Shallow fetches are no
longer possible from these bundles (at least after this commit
_and_ ":/git-bundle: avoid packing" which I just sent out).
But then, people who want shallow fetches via bundles are more
likely to want tar balls _anyway_.
builtin-bundle.c | 85 ++++++++++++++++-------------------------------------
1 files changed, 26 insertions(+), 59 deletions(-)
diff --git a/builtin-bundle.c b/builtin-bundle.c
index 70d4479..6163358 100644
--- a/builtin-bundle.c
+++ b/builtin-bundle.c
@@ -257,45 +257,15 @@ static int list_heads(struct bundle_header *header, int argc, const char **argv)
return list_refs(&header->references, argc, argv);
}
-static void show_commit(struct commit *commit)
-{
- write_or_die(1, sha1_to_hex(commit->object.sha1), 40);
- write_or_die(1, "\n", 1);
- if (commit->parents) {
- free_commit_list(commit->parents);
- commit->parents = NULL;
- }
-}
-
-static void show_object(struct object_array_entry *p)
-{
- /* An object with name "foo\n0000000..." can be used to
- * confuse downstream git-pack-objects very badly.
- */
- const char *ep = strchr(p->name, '\n');
- int len = ep ? ep - p->name : strlen(p->name);
- write_or_die(1, sha1_to_hex(p->item->sha1), 40);
- write_or_die(1, " ", 1);
- if (len)
- write_or_die(1, p->name, len);
- write_or_die(1, "\n", 1);
-}
-
-static void show_edge(struct commit *commit)
-{
- ; /* nothing to do */
-}
-
static int create_bundle(struct bundle_header *header, const char *path,
int argc, const char **argv)
{
int bundle_fd = -1;
const char **argv_boundary = xmalloc((argc + 4) * sizeof(const char *));
- const char **argv_pack = xmalloc(4 * sizeof(const char *));
+ const char **argv_pack = xmalloc(5 * sizeof(const char *));
int pid, in, out, i, status;
char buffer[1024];
struct rev_info revs;
- struct object_array tips;
bundle_fd = (!strcmp(path, "-") ? 1 :
open(path, O_CREAT | O_WRONLY, 0666));
@@ -319,16 +289,20 @@ static int create_bundle(struct bundle_header *header, const char *path,
pid = fork_with_pipe(argv_boundary, NULL, &out);
if (pid < 0)
return -1;
- while ((i = read_string(out, buffer, sizeof(buffer))) > 0)
+ while ((i = read_string(out, buffer, sizeof(buffer))) > 0) {
+ unsigned char sha1[20];
if (buffer[0] == '-') {
- unsigned char sha1[20];
write_or_die(bundle_fd, buffer, i);
if (!get_sha1_hex(buffer + 1, sha1)) {
struct object *object = parse_object(sha1);
object->flags |= UNINTERESTING;
add_pending_object(&revs, object, buffer);
}
+ } else if (!get_sha1_hex(buffer, sha1)) {
+ struct object *object = parse_object(sha1);
+ object->flags |= SHOWN;
}
+ }
while ((i = waitpid(pid, &status, 0)) < 0)
if (errno != EINTR)
return error("rev-list died");
@@ -336,14 +310,10 @@ static int create_bundle(struct bundle_header *header, const char *path,
return error("rev-list died %d", WEXITSTATUS(status));
/* write references */
- revs.tag_objects = 1;
- revs.tree_objects = 1;
- revs.blob_objects = 1;
argc = setup_revisions(argc, argv, &revs, NULL);
if (argc > 1)
return error("unrecognized argument: %s'", argv[1]);
- memset(&tips, 0, sizeof(tips));
for (i = 0; i < revs.pending.nr; i++) {
struct object_array_entry *e = revs.pending.objects + i;
unsigned char sha1[20];
@@ -353,11 +323,20 @@ static int create_bundle(struct bundle_header *header, const char *path,
continue;
if (dwim_ref(e->name, strlen(e->name), sha1, &ref) != 1)
continue;
+ /*
+ * Make sure the refs we wrote out is correct; --max-count and
+ * other limiting options could have prevented all the tips
+ * from getting output.
+ */
+ if (!(e->item->flags & SHOWN)) {
+ warn("ref '%s' is excluded by the rev-list options",
+ e->name);
+ continue;
+ }
write_or_die(bundle_fd, sha1_to_hex(e->item->sha1), 40);
write_or_die(bundle_fd, " ", 1);
write_or_die(bundle_fd, ref, strlen(ref));
write_or_die(bundle_fd, "\n", 1);
- add_object_array(e->item, e->name, &tips);
free(ref);
}
@@ -368,39 +347,27 @@ static int create_bundle(struct bundle_header *header, const char *path,
argv_pack[0] = "pack-objects";
argv_pack[1] = "--all-progress";
argv_pack[2] = "--stdout";
- argv_pack[3] = NULL;
+ argv_pack[3] = "--thin";
+ argv_pack[4] = NULL;
in = -1;
out = bundle_fd;
pid = fork_with_pipe(argv_pack, &in, &out);
if (pid < 0)
return error("Could not spawn pack-objects");
- close(1);
- dup2(in, 1);
+ for (i = 0; i < revs.pending.nr; i++) {
+ struct object *object = revs.pending.objects[i].item;
+ if (object->flags & UNINTERESTING)
+ write(in, "^", 1);
+ write(in, sha1_to_hex(object->sha1), 40);
+ write(in, "\n", 1);
+ }
close(in);
- prepare_revision_walk(&revs);
- mark_edges_uninteresting(revs.commits, &revs, show_edge);
- traverse_commit_list(&revs, show_commit, show_object);
- close(1);
while (waitpid(pid, &status, 0) < 0)
if (errno != EINTR)
return -1;
if (!WIFEXITED(status) || WEXITSTATUS(status))
return error ("pack-objects died");
- /*
- * Make sure the refs we wrote out is correct; --max-count and
- * other limiting options could have prevented all the tips
- * from getting output.
- */
- status = 0;
- for (i = 0; i < tips.nr; i++) {
- if (!(tips.objects[i].item->flags & SHOWN)) {
- status = 1;
- error("%s: not included in the resulting pack",
- tips.objects[i].name);
- }
- }
-
return status;
}
--
1.5.0.3.2562.gfcfe0
^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH] git-bundle: Make thin packs
2007-03-08 0:27 ` [PATCH] git-bundle: Make thin packs Johannes Schindelin
@ 2007-03-08 0:34 ` Linus Torvalds
2007-03-08 0:56 ` Johannes Schindelin
2007-03-08 2:02 ` Junio C Hamano
1 sibling, 1 reply; 22+ messages in thread
From: Linus Torvalds @ 2007-03-08 0:34 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: Mark Levedahl, Junio C Hamano, Git Mailing List
On Thu, 8 Mar 2007, Johannes Schindelin wrote:
>
> > Since a bundle doesn't make any sense *anyway* unless you have
> > the prerequisites at the other end, I think you might as well do
> > thin packs. That will cut down on the bundle size a *lot* for
> > the common cases.
>
> Well, I disagree on the blanket "*anyway*". Shallow fetches are no
> longer possible from these bundles (at least after this commit
> _and_ ":/git-bundle: avoid packing" which I just sent out).
Does anybody actually use shallow clones in real life?
When I did the numbers a long time ago, the shallow clone didn't actually
help much, because it meant that there were no deltas. Which meant that
you got 1% of the history for 60% of the price of all history, and the
shallow thing didn't really seem to make much sense.
I guess that for something with a really long history, you'd get 0.001% of
the history for 10% of the price, and maybe it makes sense then.
Linus
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] git-bundle: Make thin packs
2007-03-08 0:34 ` Linus Torvalds
@ 2007-03-08 0:56 ` Johannes Schindelin
0 siblings, 0 replies; 22+ messages in thread
From: Johannes Schindelin @ 2007-03-08 0:56 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Mark Levedahl, Junio C Hamano, Git Mailing List
Hi,
On Wed, 7 Mar 2007, Linus Torvalds wrote:
> Does anybody actually use shallow clones in real life?
I don't. That's why I work on push/fetch from/via/into shallow repos.
> When I did the numbers a long time ago, the shallow clone didn't
> actually help much, because it meant that there were no deltas. Which
> meant that you got 1% of the history for 60% of the price of all
> history, and the shallow thing didn't really seem to make much sense.
>
> I guess that for something with a really long history, you'd get 0.001%
> of the history for 10% of the price, and maybe it makes sense then.
You -- being blessed by not having to work with anything closed -- miss an
important fact of commercial software development. Most, if not all,
projects in a commercial setting contain binary blobs. For example, a DLL,
or a PNG, or a Firmware blob. These are updated regularly. And they delta
really awfully bad.
Also, for something as OpenOffice or Mozilla, I guess that if you really
only want to work on the newest revision, the _initial_ fetch will be way
cheaper than a full clone. Of course I have no numbers here, but that is
what my gut feeling says.
Naturally, over time, the shallow clone will fetch in more and more
objects from the upstream, eventually being almost as large as if having
the complete history, but the user's experience is different: the amount
of time needed to get that amount of data is much more widely spread.
Ciao,
Dscho
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] git-bundle: Make thin packs
2007-03-08 0:27 ` [PATCH] git-bundle: Make thin packs Johannes Schindelin
2007-03-08 0:34 ` Linus Torvalds
@ 2007-03-08 2:02 ` Junio C Hamano
2007-03-08 13:07 ` Johannes Schindelin
1 sibling, 1 reply; 22+ messages in thread
From: Junio C Hamano @ 2007-03-08 2:02 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: Linus Torvalds, Mark Levedahl, Git Mailing List
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> This commit also changes behaviour slightly: since we now know early
> enough if a specified ref is _not_ contained in the pack, we can avoid
> putting that ref into the pack. So, we don't die() here, but warn()
> instead, and skip that ref.
I am a bit worried about that part.
If somebody says "bundle create foo.bdl --since=1.day maint" and
maint's tip hasn't changed for a month, we would end up having
no refs and no pack in the bundle with just a warning.
"bundle create foo.bdl -20 master next" does _not_ mean "20 revs
but at least have master and next tip", but it may surprise an
uninitiated to find out that running "bundle list foo.bdl" on
the resulting bundle did not talk about master at all.
I have a feeling that it would avoid confusion if we did not
even start the pack generation and die early when we find the
counting caused us not to include all the positive refs
specified on the command line.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] git-bundle: Make thin packs
2007-03-08 2:02 ` Junio C Hamano
@ 2007-03-08 13:07 ` Johannes Schindelin
0 siblings, 0 replies; 22+ messages in thread
From: Johannes Schindelin @ 2007-03-08 13:07 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Linus Torvalds, Mark Levedahl, Git Mailing List
Hi,
On Wed, 7 Mar 2007, Junio C Hamano wrote:
> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
> > This commit also changes behaviour slightly: since we now know early
> > enough if a specified ref is _not_ contained in the pack, we can avoid
> > putting that ref into the pack. So, we don't die() here, but warn()
> > instead, and skip that ref.
>
> I am a bit worried about that part.
>
> If somebody says "bundle create foo.bdl --since=1.day maint" and
> maint's tip hasn't changed for a month, we would end up having
> no refs and no pack in the bundle with just a warning.
>
> "bundle create foo.bdl -20 master next" does _not_ mean "20 revs
> but at least have master and next tip", but it may surprise an
> uninitiated to find out that running "bundle list foo.bdl" on
> the resulting bundle did not talk about master at all.
>
> I have a feeling that it would avoid confusion if we did not
> even start the pack generation and die early when we find the
> counting caused us not to include all the positive refs
> specified on the command line.
Fair enough.
Ciao,
Dscho
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] git-bundle: avoid packing objects which are in the prerequisites
2007-03-07 23:43 ` [PATCH] git-bundle: avoid packing objects which are in the prerequisites Johannes Schindelin
@ 2007-03-08 20:36 ` Mark Levedahl
0 siblings, 0 replies; 22+ messages in thread
From: Mark Levedahl @ 2007-03-08 20:36 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: Junio C Hamano, Git Mailing List
Johannes Schindelin wrote:
Using next (2432cbc253e8ef5cf09062abb8f7325813476d1d) that has your
patches for creating a thin-pack and limiting objects, I generally find
the bundle file created cannot be used.
Specific example you can repeat in a current git repo:
git>git branch foo 2432cbc253e8ef5c
git>git bundle create t.bdl foo~1..foo
Generating pack...
Done counting 20 objects.
Result has 14 objects.
Deltifying 14 objects.
100% (14/14) done
Writing 14 objects.
100% (14/14) done
Total 14 (delta 10), reused 10 (delta 6)
git>git fetch t.bdl
fatal: pack has junk at the end
error: index-pack exited with status 128
fatal: Fetch failure: t.bdl
git>
Also, we get an interesting failure if the references are not touched in
a given date range:
git>git bundle create t.bdl --since=5.minutes.ago foo
warning: ref 'foo' is excluded by the rev-list options
Generating pack...
Done counting 38321 objects.
Deltifying 38321 objects.
100% (38321/38321) done
Writing 38321 objects.
100% (38321/38321) done
Total 38321 (delta 26678), reused 38180 (delta 26551)
git>
No references requies 38000+ objects?? In this case, the bundle includes
every object in the repository, the exact opposite of what should happen
(zero objects).
Mark
^ permalink raw reply [flat|nested] 22+ messages in thread
end of thread, other threads:[~2007-03-08 20:36 UTC | newest]
Thread overview: 22+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-03-06 4:17 builtin git-bundle - pack contains many more objects than required Mark Levedahl
2007-03-06 6:18 ` Junio C Hamano
2007-03-06 7:45 ` [PATCH] git-bundle: fix pack generation Junio C Hamano
2007-03-07 1:15 ` Mark Levedahl
2007-03-07 3:01 ` Junio C Hamano
2007-03-07 3:17 ` Mark Levedahl
2007-03-07 3:22 ` Johannes Schindelin
2007-03-07 3:50 ` Mark Levedahl
2007-03-07 4:05 ` Johannes Schindelin
2007-03-07 5:57 ` Junio C Hamano
2007-03-07 16:34 ` Mark Levedahl
2007-03-07 22:35 ` Johannes Schindelin
2007-03-07 23:17 ` Linus Torvalds
2007-03-08 0:27 ` [PATCH] git-bundle: Make thin packs Johannes Schindelin
2007-03-08 0:34 ` Linus Torvalds
2007-03-08 0:56 ` Johannes Schindelin
2007-03-08 2:02 ` Junio C Hamano
2007-03-08 13:07 ` Johannes Schindelin
2007-03-07 23:32 ` [PATCH] git-bundle: fix pack generation Junio C Hamano
2007-03-07 23:43 ` [PATCH] git-bundle: avoid packing objects which are in the prerequisites Johannes Schindelin
2007-03-08 20:36 ` Mark Levedahl
2007-03-07 1:16 ` [PATCH] git-bundle: fix pack generation Mark Levedahl
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).