* [RFC] send-pack: allow skipping delta when sending pack
@ 2006-05-21 5:48 Jeff King
2006-05-21 6:17 ` Junio C Hamano
0 siblings, 1 reply; 10+ messages in thread
From: Jeff King @ 2006-05-21 5:48 UTC (permalink / raw)
To: git
I have a git repo where I keep relatively large files (digital photos).
I have a local repo and a "master" repo on a server which I access over
ssh. Deltifying a bunch of large images takes a relatively long time. I
can live with this while packing (though it is slightly annoying to have
to pack separately on both repos, I understand why it might be hard to
reuse the deltification).
However, it is extremely annoying to add a large set of images and then
push them to the server. The server is on a 100Mbit LAN, so the
deltification part of the process takes up most of the time (and
typically ends up making no deltas, since the files are unrelated
images). The patch below causes the GIT_NODELTA environment variable to
set the window depth to 0 when sending a pack, preventing deltification.
The result is much better performance in my case. However, the method
seems quite hack-ish, so I wanted to get comments on how this should be
done. Possibilities I considered:
1. A command line option to git-send-pack. The problem with this is
that support is required from git-push and cg-push to pass the
option through.
2. A repo config variable that says not to deltify on sending (or
potentially, not to deltify at all, which makes packing in general
much nicer -- however, I don't think this is a good idea, as I do
still want deltification rarely, it's just that it mostly will
fail). This should probably be per-remote for the obvious reason
that one might push to local and remote repos. One drawback is that
sometimes deltification may be a win; it's just that I sometimes
know that it won't be (because I added a bunch of unrelated large
files). It's nice to selectively turn this option on for a given
push.
3. Ideally, we could do some heuristic to see if deltification will
yield helpful results. In particular, we may already have a pack
with these commits in it (especially if we just repack before a
push). If we can re-use this information, it at least saves
deltifying twice (once to pack, once to push). In theory, I would
think the fact that we don't pass --no-reuse-delta to pack-objects
means that this would happen automatically, but it clearly doesn't.
Comments?
---
f1cf653120dd492d1c86ee2a92a9c8221023cef1
send-pack.c | 6 ++++++
1 files changed, 6 insertions(+), 0 deletions(-)
f1cf653120dd492d1c86ee2a92a9c8221023cef1
diff --git a/send-pack.c b/send-pack.c
index 409f188..4ad6489 100644
--- a/send-pack.c
+++ b/send-pack.c
@@ -30,8 +30,14 @@ static void exec_pack_objects(void)
static const char *args[] = {
"pack-objects",
"--stdout",
+ NULL,
NULL
};
+ const char *nodelta;
+
+ nodelta = getenv("GIT_NODELTA");
+ if(nodelta && !strcmp(nodelta, "1"))
+ args[2] = "--depth=0";
execv_git_cmd(args);
die("git-pack-objects exec failed (%s)", strerror(errno));
}
--
1.3.3.g288c-dirty
^ permalink raw reply related [flat|nested] 10+ messages in thread* Re: [RFC] send-pack: allow skipping delta when sending pack 2006-05-21 5:48 [RFC] send-pack: allow skipping delta when sending pack Jeff King @ 2006-05-21 6:17 ` Junio C Hamano 2006-05-21 8:14 ` Jeff King 0 siblings, 1 reply; 10+ messages in thread From: Junio C Hamano @ 2006-05-21 6:17 UTC (permalink / raw) To: Jeff King; +Cc: git Jeff King <peff@peff.net> writes: > The result is much better performance in my case. However, the method > seems quite hack-ish, so I wanted to get comments on how this should be > done. Possibilities I considered: > 1. A command line option to git-send-pack. The problem with this is > that support is required from git-push and cg-push to pass the > option through. When you pull from such a repository you would also need to be able to control this. The repository owner knows what's in the repository a lot more than the downloader, so some repository configuration that tells upload-pack to use such-and-such delta window is also needed. But as you say below: > 3. Ideally, we could do some heuristic to see if deltification will > yield helpful results. In particular, we may already have a pack > with these commits in it (especially if we just repack before a > push). If we can re-use this information, it at least saves > deltifying twice (once to pack, once to push). In theory, I would > think the fact that we don't pass --no-reuse-delta to pack-objects > means that this would happen automatically, but it clearly doesn't. The lack of --no-reuse-delta just means "if the object we are going to send is a delta in the source, and its delta base is also something we are going to send, then pretend that it is the base delta for that object to skip computation". What you want here is "if the object we are going to send is not a delta in the source, and there are sufficient number of other objects the object could have been deltified against, then it is very likely that it was not worth deltifying when it was packed; so it is probably not worth deltifying it now". ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC] send-pack: allow skipping delta when sending pack 2006-05-21 6:17 ` Junio C Hamano @ 2006-05-21 8:14 ` Jeff King 2006-05-21 8:24 ` Jakub Narebski 2006-05-21 8:31 ` Junio C Hamano 0 siblings, 2 replies; 10+ messages in thread From: Jeff King @ 2006-05-21 8:14 UTC (permalink / raw) To: Junio C Hamano; +Cc: git On Sat, May 20, 2006 at 11:17:42PM -0700, Junio C Hamano wrote: > base delta for that object to skip computation". What you want > here is "if the object we are going to send is not a delta in > the source, and there are sufficient number of other objects the > object could have been deltified against, then it is very likely > that it was not worth deltifying when it was packed; so it is > probably not worth deltifying it now". I think we can make a stronger statement in many cases: "if the object we are going to send is not a delta in the source, and there are no other objects it could be deltified against, then it is not worth deltifying." That is, in the case that we just packed we KNOW that it's not worth it, since we're not sending anything that isn't already packed. Following this logic, I believe we could always turn off deltification if there are no loose objects. It seems a bit special case, but it optimizes the "repack -d then push" workflow which I suspect may be relatively common. The patch below runs git-rev-list with --unpacked; if there are no objects returned, it tells pack-objects to set depth=0 (is that really what we want? It's OK to use existing deltas; we just don't want to compute any new ones. I'm not sure how --depth behaves in that respect). -Peff --- f3e7fb4a5025cc8157557f3da6f9dc7d0a89395f send-pack.c | 72 ++++++++++++++++++++++++++++++++++++++++++++++++----------- 1 files changed, 58 insertions(+), 14 deletions(-) f3e7fb4a5025cc8157557f3da6f9dc7d0a89395f diff --git a/send-pack.c b/send-pack.c index 409f188..0bddc0a 100644 --- a/send-pack.c +++ b/send-pack.c @@ -25,18 +25,7 @@ static int is_zero_sha1(const unsigned c return 1; } -static void exec_pack_objects(void) -{ - static const char *args[] = { - "pack-objects", - "--stdout", - NULL - }; - execv_git_cmd(args); - die("git-pack-objects exec failed (%s)", strerror(errno)); -} - -static void exec_rev_list(struct ref *refs) +static void exec_rev_list(struct ref *refs, int unpacked) { struct ref *ref; static const char *args[1000]; @@ -47,6 +36,8 @@ static void exec_rev_list(struct ref *re args[i++] = "--objects-edge"; else args[i++] = "--objects"; + if(unpacked) + args[i++] = "--unpacked"; /* First send the ones we care about most */ for (ref = refs; ref; ref = ref->next) { @@ -85,6 +76,59 @@ static void exec_rev_list(struct ref *re die("git-rev-list exec failed (%s)", strerror(errno)); } +static int rev_list_loose_objects(struct ref *refs) +{ + int fd[2]; + pid_t pid; + + if(pipe(fd) < 0) + die("rev-list setup: pipe failed"); + pid = fork(); + if(pid < 0) + die("rev-list setup: fork failed"); + if(!pid) { + dup2(fd[1], 1); + close(fd[0]); + close(fd[1]); + exec_rev_list(refs, 1); + die("rev-list failed"); + } + close(fd[1]); + return fd[0]; +} + +static int count_loose_objects(struct ref *refs) +{ + int fd = rev_list_loose_objects(refs); + unsigned count = 0; + char buf[4096]; + int len; + + while((len = read(fd, buf, sizeof buf)) > 0) { + int i; + for(i = 0; i < len; i++) + if(buf[i] == '\n') + count++; + } + if(len < 0) + die("rev-list read failed"); + return count; +} + +static void exec_pack_objects(struct ref *refs) +{ + static const char *args[] = { + "pack-objects", + "--stdout", + NULL, + NULL + }; + if(count_loose_objects(refs) == 0) + args[2] = "--depth=0"; + execv_git_cmd(args); + die("git-pack-objects exec failed (%s)", strerror(errno)); +} + static void rev_list(int fd, struct ref *refs) { int pipe_fd[2]; @@ -99,7 +143,7 @@ static void rev_list(int fd, struct ref close(pipe_fd[0]); close(pipe_fd[1]); close(fd); - exec_pack_objects(); + exec_pack_objects(refs); die("pack-objects setup failed"); } if (pack_objects_pid < 0) @@ -108,7 +152,7 @@ static void rev_list(int fd, struct ref close(pipe_fd[0]); close(pipe_fd[1]); close(fd); - exec_rev_list(refs); + exec_rev_list(refs, 0); } static int pack_objects(int fd, struct ref *refs) -- 1.3.3.gf1cf-dirty ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [RFC] send-pack: allow skipping delta when sending pack 2006-05-21 8:14 ` Jeff King @ 2006-05-21 8:24 ` Jakub Narebski 2006-05-21 8:44 ` Jeff King 2006-05-21 8:31 ` Junio C Hamano 1 sibling, 1 reply; 10+ messages in thread From: Jakub Narebski @ 2006-05-21 8:24 UTC (permalink / raw) To: git Jeff King wrote: Hmmm... isn't the patch slightly against git coding style? > + if(unpacked) [...] >+ for(i = 0; i < len; i++) compare to: > for (ref = refs; ref; ref = ref->next) { -- Jakub Narebski Warsaw, Poland ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC] send-pack: allow skipping delta when sending pack 2006-05-21 8:24 ` Jakub Narebski @ 2006-05-21 8:44 ` Jeff King 2006-05-21 9:14 ` Junio C Hamano 2006-05-21 9:24 ` Jakub Narebski 0 siblings, 2 replies; 10+ messages in thread From: Jeff King @ 2006-05-21 8:44 UTC (permalink / raw) To: Jakub Narebski; +Cc: git On Sun, May 21, 2006 at 10:24:37AM +0200, Jakub Narebski wrote: > Hmmm... isn't the patch slightly against git coding style? Oops, yes (though the point is moot since the patch is conceptually wrong). Is there a git coding style document somewhere? -Peff ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC] send-pack: allow skipping delta when sending pack 2006-05-21 8:44 ` Jeff King @ 2006-05-21 9:14 ` Junio C Hamano 2006-05-21 9:24 ` Jakub Narebski 1 sibling, 0 replies; 10+ messages in thread From: Junio C Hamano @ 2006-05-21 9:14 UTC (permalink / raw) To: Jeff King; +Cc: git Jeff King <peff@peff.net> writes: > On Sun, May 21, 2006 at 10:24:37AM +0200, Jakub Narebski wrote: > >> Hmmm... isn't the patch slightly against git coding style? > > Oops, yes (though the point is moot since the patch is conceptually > wrong). Is there a git coding style document somewhere? When in doubt, please follow the kernel coding style ;-). ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC] send-pack: allow skipping delta when sending pack 2006-05-21 8:44 ` Jeff King 2006-05-21 9:14 ` Junio C Hamano @ 2006-05-21 9:24 ` Jakub Narebski 1 sibling, 0 replies; 10+ messages in thread From: Jakub Narebski @ 2006-05-21 9:24 UTC (permalink / raw) To: git Jeff King wrote: > On Sun, May 21, 2006 at 10:24:37AM +0200, Jakub Narebski wrote: > >> Hmmm... isn't the patch slightly against git coding style? > > Oops, yes (though the point is moot since the patch is conceptually > wrong). Is there a git coding style document somewhere? Not that I know of. It's better to follow the coding style used in the rest of files. Besides, differentiating between if (condition) and function(argument); makes IMVHO sense. -- Jakub Narebski Warsaw, Poland ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC] send-pack: allow skipping delta when sending pack 2006-05-21 8:14 ` Jeff King 2006-05-21 8:24 ` Jakub Narebski @ 2006-05-21 8:31 ` Junio C Hamano 2006-05-21 8:43 ` Jeff King 1 sibling, 1 reply; 10+ messages in thread From: Junio C Hamano @ 2006-05-21 8:31 UTC (permalink / raw) To: git Jeff King <peff@peff.net> writes: > On Sat, May 20, 2006 at 11:17:42PM -0700, Junio C Hamano wrote: > >> base delta for that object to skip computation". What you want >> here is "if the object we are going to send is not a delta in >> the source, and there are sufficient number of other objects the >> object could have been deltified against, then it is very likely >> that it was not worth deltifying when it was packed; so it is >> probably not worth deltifying it now". > > I think we can make a stronger statement in many cases: "if the object > we are going to send is not a delta in the source, and there are no > other objects it could be deltified against, then it is not worth > deltifying." That is, in the case that we just packed we KNOW that it's > not worth it, since we're not sending anything that isn't already > packed. Careful. We do not delta an otherwise perfectly deltifiable object if its delta base happens to be at the depth edge in the original pack. So no, we do _NOT_ know if it is not worth it merely from the fact that it is not deltified in the existing pack. And the latter part of your test "there are no other objects it could be deltified against" is either expensive (you have to try first to see if that is the case to really see it) or stupid (you just assume there is no suitable delta base without looking at other objects like we currently do). ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC] send-pack: allow skipping delta when sending pack 2006-05-21 8:31 ` Junio C Hamano @ 2006-05-21 8:43 ` Jeff King 2006-05-21 9:40 ` Junio C Hamano 0 siblings, 1 reply; 10+ messages in thread From: Jeff King @ 2006-05-21 8:43 UTC (permalink / raw) To: Junio C Hamano; +Cc: git On Sun, May 21, 2006 at 01:31:22AM -0700, Junio C Hamano wrote: > We do not delta an otherwise perfectly deltifiable object if its > delta base happens to be at the depth edge in the original pack. > So no, we do _NOT_ know if it is not worth it merely from the > fact that it is not deltified in the existing pack. And the Yes, you're right. What do you suggest for this performance issue, then? A manual no-delta trigger, or just going to get a cup of coffee while pushing (my tests showed 5-6x slowdown from deltifying)? -Peff ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC] send-pack: allow skipping delta when sending pack 2006-05-21 8:43 ` Jeff King @ 2006-05-21 9:40 ` Junio C Hamano 0 siblings, 0 replies; 10+ messages in thread From: Junio C Hamano @ 2006-05-21 9:40 UTC (permalink / raw) To: Jeff King; +Cc: git Jeff King <peff@peff.net> writes: > What do you suggest for this performance issue, then? > A manual no-delta trigger, or just going to get a cup of coffee while > pushing (my tests showed 5-6x slowdown from deltifying)? In the short term, give --depth to send-pack, would be better than nothing. I agree we should be able to do better. But let me grab some sleep now first ;-). ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2006-05-21 9:40 UTC | newest] Thread overview: 10+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2006-05-21 5:48 [RFC] send-pack: allow skipping delta when sending pack Jeff King 2006-05-21 6:17 ` Junio C Hamano 2006-05-21 8:14 ` Jeff King 2006-05-21 8:24 ` Jakub Narebski 2006-05-21 8:44 ` Jeff King 2006-05-21 9:14 ` Junio C Hamano 2006-05-21 9:24 ` Jakub Narebski 2006-05-21 8:31 ` Junio C Hamano 2006-05-21 8:43 ` Jeff King 2006-05-21 9:40 ` Junio C Hamano
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).