* [RFC] send-pack: allow skipping delta when sending pack
@ 2006-05-21 5:48 Jeff King
2006-05-21 6:17 ` Junio C Hamano
0 siblings, 1 reply; 10+ messages in thread
From: Jeff King @ 2006-05-21 5:48 UTC (permalink / raw)
To: git
I have a git repo where I keep relatively large files (digital photos).
I have a local repo and a "master" repo on a server which I access over
ssh. Deltifying a bunch of large images takes a relatively long time. I
can live with this while packing (though it is slightly annoying to have
to pack separately on both repos, I understand why it might be hard to
reuse the deltification).
However, it is extremely annoying to add a large set of images and then
push them to the server. The server is on a 100Mbit LAN, so the
deltification part of the process takes up most of the time (and
typically ends up making no deltas, since the files are unrelated
images). The patch below causes the GIT_NODELTA environment variable to
set the window depth to 0 when sending a pack, preventing deltification.
The result is much better performance in my case. However, the method
seems quite hack-ish, so I wanted to get comments on how this should be
done. Possibilities I considered:
1. A command line option to git-send-pack. The problem with this is
that support is required from git-push and cg-push to pass the
option through.
2. A repo config variable that says not to deltify on sending (or
potentially, not to deltify at all, which makes packing in general
much nicer -- however, I don't think this is a good idea, as I do
still want deltification rarely, it's just that it mostly will
fail). This should probably be per-remote for the obvious reason
that one might push to local and remote repos. One drawback is that
sometimes deltification may be a win; it's just that I sometimes
know that it won't be (because I added a bunch of unrelated large
files). It's nice to selectively turn this option on for a given
push.
3. Ideally, we could do some heuristic to see if deltification will
yield helpful results. In particular, we may already have a pack
with these commits in it (especially if we just repack before a
push). If we can re-use this information, it at least saves
deltifying twice (once to pack, once to push). In theory, I would
think the fact that we don't pass --no-reuse-delta to pack-objects
means that this would happen automatically, but it clearly doesn't.
Comments?
---
f1cf653120dd492d1c86ee2a92a9c8221023cef1
send-pack.c | 6 ++++++
1 files changed, 6 insertions(+), 0 deletions(-)
f1cf653120dd492d1c86ee2a92a9c8221023cef1
diff --git a/send-pack.c b/send-pack.c
index 409f188..4ad6489 100644
--- a/send-pack.c
+++ b/send-pack.c
@@ -30,8 +30,14 @@ static void exec_pack_objects(void)
static const char *args[] = {
"pack-objects",
"--stdout",
+ NULL,
NULL
};
+ const char *nodelta;
+
+ nodelta = getenv("GIT_NODELTA");
+ if(nodelta && !strcmp(nodelta, "1"))
+ args[2] = "--depth=0";
execv_git_cmd(args);
die("git-pack-objects exec failed (%s)", strerror(errno));
}
--
1.3.3.g288c-dirty
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [RFC] send-pack: allow skipping delta when sending pack
2006-05-21 5:48 [RFC] send-pack: allow skipping delta when sending pack Jeff King
@ 2006-05-21 6:17 ` Junio C Hamano
2006-05-21 8:14 ` Jeff King
0 siblings, 1 reply; 10+ messages in thread
From: Junio C Hamano @ 2006-05-21 6:17 UTC (permalink / raw)
To: Jeff King; +Cc: git
Jeff King <peff@peff.net> writes:
> The result is much better performance in my case. However, the method
> seems quite hack-ish, so I wanted to get comments on how this should be
> done. Possibilities I considered:
> 1. A command line option to git-send-pack. The problem with this is
> that support is required from git-push and cg-push to pass the
> option through.
When you pull from such a repository you would also need to be
able to control this. The repository owner knows what's in the
repository a lot more than the downloader, so some repository
configuration that tells upload-pack to use such-and-such delta
window is also needed. But as you say below:
> 3. Ideally, we could do some heuristic to see if deltification will
> yield helpful results. In particular, we may already have a pack
> with these commits in it (especially if we just repack before a
> push). If we can re-use this information, it at least saves
> deltifying twice (once to pack, once to push). In theory, I would
> think the fact that we don't pass --no-reuse-delta to pack-objects
> means that this would happen automatically, but it clearly doesn't.
The lack of --no-reuse-delta just means "if the object we are
going to send is a delta in the source, and its delta base is
also something we are going to send, then pretend that it is the
base delta for that object to skip computation". What you want
here is "if the object we are going to send is not a delta in
the source, and there are sufficient number of other objects the
object could have been deltified against, then it is very likely
that it was not worth deltifying when it was packed; so it is
probably not worth deltifying it now".
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC] send-pack: allow skipping delta when sending pack
2006-05-21 6:17 ` Junio C Hamano
@ 2006-05-21 8:14 ` Jeff King
2006-05-21 8:24 ` Jakub Narebski
2006-05-21 8:31 ` Junio C Hamano
0 siblings, 2 replies; 10+ messages in thread
From: Jeff King @ 2006-05-21 8:14 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
On Sat, May 20, 2006 at 11:17:42PM -0700, Junio C Hamano wrote:
> base delta for that object to skip computation". What you want
> here is "if the object we are going to send is not a delta in
> the source, and there are sufficient number of other objects the
> object could have been deltified against, then it is very likely
> that it was not worth deltifying when it was packed; so it is
> probably not worth deltifying it now".
I think we can make a stronger statement in many cases: "if the object
we are going to send is not a delta in the source, and there are no
other objects it could be deltified against, then it is not worth
deltifying." That is, in the case that we just packed we KNOW that it's
not worth it, since we're not sending anything that isn't already
packed.
Following this logic, I believe we could always turn off deltification
if there are no loose objects. It seems a bit special case, but it
optimizes the "repack -d then push" workflow which I suspect may be
relatively common.
The patch below runs git-rev-list with --unpacked; if there are no
objects returned, it tells pack-objects to set depth=0 (is that really
what we want? It's OK to use existing deltas; we just don't want to
compute any new ones. I'm not sure how --depth behaves in that respect).
-Peff
---
f3e7fb4a5025cc8157557f3da6f9dc7d0a89395f
send-pack.c | 72 ++++++++++++++++++++++++++++++++++++++++++++++++-----------
1 files changed, 58 insertions(+), 14 deletions(-)
f3e7fb4a5025cc8157557f3da6f9dc7d0a89395f
diff --git a/send-pack.c b/send-pack.c
index 409f188..0bddc0a 100644
--- a/send-pack.c
+++ b/send-pack.c
@@ -25,18 +25,7 @@ static int is_zero_sha1(const unsigned c
return 1;
}
-static void exec_pack_objects(void)
-{
- static const char *args[] = {
- "pack-objects",
- "--stdout",
- NULL
- };
- execv_git_cmd(args);
- die("git-pack-objects exec failed (%s)", strerror(errno));
-}
-
-static void exec_rev_list(struct ref *refs)
+static void exec_rev_list(struct ref *refs, int unpacked)
{
struct ref *ref;
static const char *args[1000];
@@ -47,6 +36,8 @@ static void exec_rev_list(struct ref *re
args[i++] = "--objects-edge";
else
args[i++] = "--objects";
+ if(unpacked)
+ args[i++] = "--unpacked";
/* First send the ones we care about most */
for (ref = refs; ref; ref = ref->next) {
@@ -85,6 +76,59 @@ static void exec_rev_list(struct ref *re
die("git-rev-list exec failed (%s)", strerror(errno));
}
+static int rev_list_loose_objects(struct ref *refs)
+{
+ int fd[2];
+ pid_t pid;
+
+ if(pipe(fd) < 0)
+ die("rev-list setup: pipe failed");
+ pid = fork();
+ if(pid < 0)
+ die("rev-list setup: fork failed");
+ if(!pid) {
+ dup2(fd[1], 1);
+ close(fd[0]);
+ close(fd[1]);
+ exec_rev_list(refs, 1);
+ die("rev-list failed");
+ }
+ close(fd[1]);
+ return fd[0];
+}
+
+static int count_loose_objects(struct ref *refs)
+{
+ int fd = rev_list_loose_objects(refs);
+ unsigned count = 0;
+ char buf[4096];
+ int len;
+
+ while((len = read(fd, buf, sizeof buf)) > 0) {
+ int i;
+ for(i = 0; i < len; i++)
+ if(buf[i] == '\n')
+ count++;
+ }
+ if(len < 0)
+ die("rev-list read failed");
+ return count;
+}
+
+static void exec_pack_objects(struct ref *refs)
+{
+ static const char *args[] = {
+ "pack-objects",
+ "--stdout",
+ NULL,
+ NULL
+ };
+ if(count_loose_objects(refs) == 0)
+ args[2] = "--depth=0";
+ execv_git_cmd(args);
+ die("git-pack-objects exec failed (%s)", strerror(errno));
+}
+
static void rev_list(int fd, struct ref *refs)
{
int pipe_fd[2];
@@ -99,7 +143,7 @@ static void rev_list(int fd, struct ref
close(pipe_fd[0]);
close(pipe_fd[1]);
close(fd);
- exec_pack_objects();
+ exec_pack_objects(refs);
die("pack-objects setup failed");
}
if (pack_objects_pid < 0)
@@ -108,7 +152,7 @@ static void rev_list(int fd, struct ref
close(pipe_fd[0]);
close(pipe_fd[1]);
close(fd);
- exec_rev_list(refs);
+ exec_rev_list(refs, 0);
}
static int pack_objects(int fd, struct ref *refs)
--
1.3.3.gf1cf-dirty
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [RFC] send-pack: allow skipping delta when sending pack
2006-05-21 8:14 ` Jeff King
@ 2006-05-21 8:24 ` Jakub Narebski
2006-05-21 8:44 ` Jeff King
2006-05-21 8:31 ` Junio C Hamano
1 sibling, 1 reply; 10+ messages in thread
From: Jakub Narebski @ 2006-05-21 8:24 UTC (permalink / raw)
To: git
Jeff King wrote:
Hmmm... isn't the patch slightly against git coding style?
> + if(unpacked)
[...]
>+ for(i = 0; i < len; i++)
compare to:
> for (ref = refs; ref; ref = ref->next) {
--
Jakub Narebski
Warsaw, Poland
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC] send-pack: allow skipping delta when sending pack
2006-05-21 8:14 ` Jeff King
2006-05-21 8:24 ` Jakub Narebski
@ 2006-05-21 8:31 ` Junio C Hamano
2006-05-21 8:43 ` Jeff King
1 sibling, 1 reply; 10+ messages in thread
From: Junio C Hamano @ 2006-05-21 8:31 UTC (permalink / raw)
To: git
Jeff King <peff@peff.net> writes:
> On Sat, May 20, 2006 at 11:17:42PM -0700, Junio C Hamano wrote:
>
>> base delta for that object to skip computation". What you want
>> here is "if the object we are going to send is not a delta in
>> the source, and there are sufficient number of other objects the
>> object could have been deltified against, then it is very likely
>> that it was not worth deltifying when it was packed; so it is
>> probably not worth deltifying it now".
>
> I think we can make a stronger statement in many cases: "if the object
> we are going to send is not a delta in the source, and there are no
> other objects it could be deltified against, then it is not worth
> deltifying." That is, in the case that we just packed we KNOW that it's
> not worth it, since we're not sending anything that isn't already
> packed.
Careful.
We do not delta an otherwise perfectly deltifiable object if its
delta base happens to be at the depth edge in the original pack.
So no, we do _NOT_ know if it is not worth it merely from the
fact that it is not deltified in the existing pack. And the
latter part of your test "there are no other objects it could be
deltified against" is either expensive (you have to try first to
see if that is the case to really see it) or stupid (you just
assume there is no suitable delta base without looking at other
objects like we currently do).
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC] send-pack: allow skipping delta when sending pack
2006-05-21 8:31 ` Junio C Hamano
@ 2006-05-21 8:43 ` Jeff King
2006-05-21 9:40 ` Junio C Hamano
0 siblings, 1 reply; 10+ messages in thread
From: Jeff King @ 2006-05-21 8:43 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
On Sun, May 21, 2006 at 01:31:22AM -0700, Junio C Hamano wrote:
> We do not delta an otherwise perfectly deltifiable object if its
> delta base happens to be at the depth edge in the original pack.
> So no, we do _NOT_ know if it is not worth it merely from the
> fact that it is not deltified in the existing pack. And the
Yes, you're right. What do you suggest for this performance issue, then?
A manual no-delta trigger, or just going to get a cup of coffee while
pushing (my tests showed 5-6x slowdown from deltifying)?
-Peff
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC] send-pack: allow skipping delta when sending pack
2006-05-21 8:24 ` Jakub Narebski
@ 2006-05-21 8:44 ` Jeff King
2006-05-21 9:14 ` Junio C Hamano
2006-05-21 9:24 ` Jakub Narebski
0 siblings, 2 replies; 10+ messages in thread
From: Jeff King @ 2006-05-21 8:44 UTC (permalink / raw)
To: Jakub Narebski; +Cc: git
On Sun, May 21, 2006 at 10:24:37AM +0200, Jakub Narebski wrote:
> Hmmm... isn't the patch slightly against git coding style?
Oops, yes (though the point is moot since the patch is conceptually
wrong). Is there a git coding style document somewhere?
-Peff
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC] send-pack: allow skipping delta when sending pack
2006-05-21 8:44 ` Jeff King
@ 2006-05-21 9:14 ` Junio C Hamano
2006-05-21 9:24 ` Jakub Narebski
1 sibling, 0 replies; 10+ messages in thread
From: Junio C Hamano @ 2006-05-21 9:14 UTC (permalink / raw)
To: Jeff King; +Cc: git
Jeff King <peff@peff.net> writes:
> On Sun, May 21, 2006 at 10:24:37AM +0200, Jakub Narebski wrote:
>
>> Hmmm... isn't the patch slightly against git coding style?
>
> Oops, yes (though the point is moot since the patch is conceptually
> wrong). Is there a git coding style document somewhere?
When in doubt, please follow the kernel coding style ;-).
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC] send-pack: allow skipping delta when sending pack
2006-05-21 8:44 ` Jeff King
2006-05-21 9:14 ` Junio C Hamano
@ 2006-05-21 9:24 ` Jakub Narebski
1 sibling, 0 replies; 10+ messages in thread
From: Jakub Narebski @ 2006-05-21 9:24 UTC (permalink / raw)
To: git
Jeff King wrote:
> On Sun, May 21, 2006 at 10:24:37AM +0200, Jakub Narebski wrote:
>
>> Hmmm... isn't the patch slightly against git coding style?
>
> Oops, yes (though the point is moot since the patch is conceptually
> wrong). Is there a git coding style document somewhere?
Not that I know of. It's better to follow the coding style used
in the rest of files.
Besides, differentiating between
if (condition)
and
function(argument);
makes IMVHO sense.
--
Jakub Narebski
Warsaw, Poland
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC] send-pack: allow skipping delta when sending pack
2006-05-21 8:43 ` Jeff King
@ 2006-05-21 9:40 ` Junio C Hamano
0 siblings, 0 replies; 10+ messages in thread
From: Junio C Hamano @ 2006-05-21 9:40 UTC (permalink / raw)
To: Jeff King; +Cc: git
Jeff King <peff@peff.net> writes:
> What do you suggest for this performance issue, then?
> A manual no-delta trigger, or just going to get a cup of coffee while
> pushing (my tests showed 5-6x slowdown from deltifying)?
In the short term, give --depth to send-pack, would be better
than nothing.
I agree we should be able to do better. But let me grab some
sleep now first ;-).
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2006-05-21 9:40 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-05-21 5:48 [RFC] send-pack: allow skipping delta when sending pack Jeff King
2006-05-21 6:17 ` Junio C Hamano
2006-05-21 8:14 ` Jeff King
2006-05-21 8:24 ` Jakub Narebski
2006-05-21 8:44 ` Jeff King
2006-05-21 9:14 ` Junio C Hamano
2006-05-21 9:24 ` Jakub Narebski
2006-05-21 8:31 ` Junio C Hamano
2006-05-21 8:43 ` Jeff King
2006-05-21 9:40 ` Junio C Hamano
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).