git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 2/2] Teach receive-pack how to keep pack files based on object count.
@ 2006-10-31  7:57 Shawn Pearce
  2006-10-31 19:56 ` Nicolas Pitre
  0 siblings, 1 reply; 7+ messages in thread
From: Shawn Pearce @ 2006-10-31  7:57 UTC (permalink / raw)
  To: Junio C Hamano, Nicolas Pitre; +Cc: git

Since keeping a pushed pack or exploding it into loose objects
should be a local repository decision this teaches receive-pack
to decide if it should call unpack-objects or index-pack --stdin
--fix-thin based on the setting of receive.unpackLimit and the
number of objects contained in the received pack.

If the number of objects (hdr_entries) in the received pack is
below the value of receive.unpackLimit (which is 5000 by default)
then we unpack-objects as we have in the past.

If the hdr_entries >= receive.unpackLimit then we call index-pack and
ask it to include our pid and hostname in the .keep file to make it
easier to identify why a given pack has been kept in the repository.

Currently this leaves every received pack as a kept pack.  We really
don't want that as received packs will tend to be small.  Instead we
want to delete the .keep file automatically after all refs have
been updated.  That is being left as room for future improvement.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
 Documentation/config.txt |   11 +++++++-
 cache.h                  |    1 +
 receive-pack.c           |   66 +++++++++++++++++++++++++++++++++++++++++++--
 sha1_file.c              |    2 +-
 4 files changed, 75 insertions(+), 5 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index d9e73da..9d3c71c 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -301,7 +301,16 @@ imap::
 	The configuration variables in the 'imap' section are described
 	in gitlink:git-imap-send[1].
 
-receive.denyNonFastforwads::
+receive.unpackLimit::
+	If the number of objects received in a push is below this
+	limit then the objects will be unpacked into loose object
+	files. However if the number of received objects equals or
+	exceeds this limit then the received pack will be stored as
+	a pack, after adding any missing delta bases.  Storing the
+	pack from a push can make the push operation complete faster,
+	especially on slow filesystems.
+
+receive.denyNonFastForwards::
 	If set to true, git-receive-pack will deny a ref update which is
 	not a fast forward. Use this to prevent such an update via a push,
 	even if that push is forced. This configuration variable is
diff --git a/cache.h b/cache.h
index e997a85..6cb7e1d 100644
--- a/cache.h
+++ b/cache.h
@@ -376,6 +376,7 @@ extern struct packed_git *parse_pack_ind
 						char *idx_path);
 
 extern void prepare_packed_git(void);
+extern void reprepare_packed_git(void);
 extern void install_packed_git(struct packed_git *pack);
 
 extern struct packed_git *find_sha1_pack(const unsigned char *sha1, 
diff --git a/receive-pack.c b/receive-pack.c
index 7e154c5..b394833 100644
--- a/receive-pack.c
+++ b/receive-pack.c
@@ -1,4 +1,5 @@
 #include "cache.h"
+#include "pack.h"
 #include "refs.h"
 #include "pkt-line.h"
 #include "run-command.h"
@@ -7,9 +8,8 @@
 
 static const char receive_pack_usage[] = "git-receive-pack <git-dir>";
 
-static const char *unpacker[] = { "unpack-objects", NULL };
-
 static int deny_non_fast_forwards = 0;
+static int unpack_limit = 5000;
 static int report_status;
 
 static char capabilities[] = "report-status";
@@ -25,6 +25,12 @@ static int receive_pack_config(const cha
 		return 0;
 	}
 
+	if (strcmp(var, "receive.unpacklimit") == 0)
+	{
+		unpack_limit = git_config_int(var, value);
+		return 0;
+	}
+
 	return 0;
 }
 
@@ -227,9 +233,63 @@ static void read_head_info(void)
 	}
 }
 
+static const char *parse_pack_header(struct pack_header *hdr)
+{
+	char *c = (char*)hdr;
+	ssize_t remaining = sizeof(struct pack_header);
+	do {
+		ssize_t r = xread(0, c, remaining);
+		if (r <= 0)
+			return "eof before pack header was fully read";
+		remaining -= r;
+		c += r;
+	} while (remaining > 0);
+	if (hdr->hdr_signature != htonl(PACK_SIGNATURE))
+		return "protocol error (pack signature mismatch detected)";
+	if (!pack_version_ok(hdr->hdr_version))
+		return "protocol error (pack version unsupported)";
+	return NULL;
+}
+
 static const char *unpack()
 {
-	int code = run_command_v_opt(1, unpacker, RUN_GIT_CMD);
+	struct pack_header hdr;
+	const char *hdr_err;
+	char hdr_arg[38];
+	int code;
+
+	hdr_err = parse_pack_header(&hdr);
+	if (hdr_err)
+		return hdr_err;
+	snprintf(hdr_arg, sizeof(hdr_arg), "--pack_header=%u,%u",
+		ntohl(hdr.hdr_version), ntohl(hdr.hdr_entries));
+
+	if (ntohl(hdr.hdr_entries) < unpack_limit) {
+		const char *unpacker[3];
+		unpacker[0] = "unpack-objects";
+		unpacker[1] = hdr_arg;
+		unpacker[2] = NULL;
+		code = run_command_v_opt(1, unpacker, RUN_GIT_CMD);
+	} else {
+		const char *keeper[6];
+		char my_host[255], keep_arg[128 + 255];
+
+		if (gethostname(my_host, sizeof(my_host)))
+			strcpy(my_host, "localhost");
+		snprintf(keep_arg, sizeof(keep_arg),
+			"--keep=receive-pack %i on %s",
+			getpid(), my_host);
+
+		keeper[0] = "index-pack";
+		keeper[1] = "--stdin";
+		keeper[2] = "--fix-thin";
+		keeper[3] = hdr_arg;
+		keeper[4] = keep_arg;
+		keeper[5] = NULL;
+		code = run_command_v_opt(1, keeper, RUN_GIT_CMD);
+		if (!code)
+			reprepare_packed_git();
+	}
 
 	switch (code) {
 	case 0:
diff --git a/sha1_file.c b/sha1_file.c
index 5e6c8b8..7bda2d4 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -663,7 +663,7 @@ void prepare_packed_git(void)
 	prepare_packed_git_run_once = 1;
 }
 
-static void reprepare_packed_git(void)
+void reprepare_packed_git(void)
 {
 	prepare_packed_git_run_once = 0;
 	prepare_packed_git();
-- 
1.4.3.3.g7d63

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 2/2] Teach receive-pack how to keep pack files based on object count.
  2006-10-31  7:57 [PATCH 2/2] Teach receive-pack how to keep pack files based on object count Shawn Pearce
@ 2006-10-31 19:56 ` Nicolas Pitre
  2006-10-31 20:11   ` Shawn Pearce
  0 siblings, 1 reply; 7+ messages in thread
From: Nicolas Pitre @ 2006-10-31 19:56 UTC (permalink / raw)
  To: Shawn Pearce; +Cc: Junio C Hamano, git

On Tue, 31 Oct 2006, Shawn Pearce wrote:

> Since keeping a pushed pack or exploding it into loose objects
> should be a local repository decision this teaches receive-pack
> to decide if it should call unpack-objects or index-pack --stdin
> --fix-thin based on the setting of receive.unpackLimit and the
> number of objects contained in the received pack.

This works fine when used with my replacement patch for your [1/2] one.

> Currently this leaves every received pack as a kept pack.  We really
> don't want that as received packs will tend to be small.  Instead we
> want to delete the .keep file automatically after all refs have
> been updated.  That is being left as room for future improvement.

I think this should be solved before rx packs are actually stored as 
packs though.  Otherwise people will end up with unwanted .keep files 
left around.  Maybe having a much bigger default for object number 
treshold for the time being?  (unless this patch is applied to "next" at 
the same time as another one that actually deals with those .keep 
files).



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 2/2] Teach receive-pack how to keep pack files based on object count.
  2006-10-31 19:56 ` Nicolas Pitre
@ 2006-10-31 20:11   ` Shawn Pearce
  2006-10-31 21:08     ` Nicolas Pitre
  0 siblings, 1 reply; 7+ messages in thread
From: Shawn Pearce @ 2006-10-31 20:11 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Junio C Hamano, git

Nicolas Pitre <nico@cam.org> wrote:
> On Tue, 31 Oct 2006, Shawn Pearce wrote:
> 
> > Since keeping a pushed pack or exploding it into loose objects
> > should be a local repository decision this teaches receive-pack
> > to decide if it should call unpack-objects or index-pack --stdin
> > --fix-thin based on the setting of receive.unpackLimit and the
> > number of objects contained in the received pack.
> 
> This works fine when used with my replacement patch for your [1/2] one.
> 
> > Currently this leaves every received pack as a kept pack.  We really
> > don't want that as received packs will tend to be small.  Instead we
> > want to delete the .keep file automatically after all refs have
> > been updated.  That is being left as room for future improvement.
> 
> I think this should be solved before rx packs are actually stored as 
> packs though.  Otherwise people will end up with unwanted .keep files 
> left around.  Maybe having a much bigger default for object number 
> treshold for the time being?  (unless this patch is applied to "next" at 
> the same time as another one that actually deals with those .keep 
> files).

Its next on my list of things to do.  Hopefully I'll be able to
implement it today.

I'm thinking of just brute forcing it: put enough identifying data
into the .keep file to make it unique, then go through every local
pack and look at their .keep file; if the content matches what
receive-pack asked index-pack to put there then remove it.

-- 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 2/2] Teach receive-pack how to keep pack files based on object count.
  2006-10-31 20:11   ` Shawn Pearce
@ 2006-10-31 21:08     ` Nicolas Pitre
  2006-10-31 21:29       ` Shawn Pearce
  2006-10-31 22:27       ` Junio C Hamano
  0 siblings, 2 replies; 7+ messages in thread
From: Nicolas Pitre @ 2006-10-31 21:08 UTC (permalink / raw)
  To: Shawn Pearce; +Cc: Junio C Hamano, git

On Tue, 31 Oct 2006, Shawn Pearce wrote:
> Nicolas Pitre <nico@cam.org> wrote:
> > I think this should be solved before rx packs are actually stored as 
> > packs though.  Otherwise people will end up with unwanted .keep files 
> > left around.  Maybe having a much bigger default for object number 
> > treshold for the time being?  (unless this patch is applied to "next" at 
> > the same time as another one that actually deals with those .keep 
> > files).
> 
> Its next on my list of things to do.  Hopefully I'll be able to
> implement it today.
> 
> I'm thinking of just brute forcing it: put enough identifying data
> into the .keep file to make it unique, then go through every local
> pack and look at their .keep file; if the content matches what
> receive-pack asked index-pack to put there then remove it.

Ouch.  What about the patch below?  It covers only the pull/fetch case, 
but covering the push case shouldn't be that hard either (simply use a 
pipe to read index-pack's stdout and capture the pack name).

I used "pack" <tab> <sha1> so it is easy to pick out of the list of refs 
that usually comes over the stream in the fetch case (if I understood 
that part right).

diff --git a/fetch-clone.c b/fetch-clone.c
index f629d8d..579307d 100644
--- a/fetch-clone.c
+++ b/fetch-clone.c
@@ -81,7 +81,18 @@ int receive_unpack_pack(int xd[2], const
 
 int receive_keep_pack(int xd[2], const char *me, int quiet, int sideband)
 {
-	const char *argv[5] = { "index-pack", "--stdin", "--fix-thin",
-				quiet ? NULL : "-v", NULL };
+	const char *argv[6];
+	char my_host[255], keep_arg[128 + 255];
+
+	if (gethostname(my_host, sizeof(my_host)))
+		strcpy(my_host, "localhost");
+	snprintf(keep_arg, sizeof(keep_arg), "--keep=fetch-pack %i on %s",
+		 getpid(), my_host);
+	argv[0] = "index-pack";
+	argv[1] = "--stdin";
+	argv[2] = "--fix-thin";
+	argv[3] = keep_arg;
+	argv[4] = quiet ? NULL : "-v";
+	argv[5] = NULL;
 	return get_pack(xd, me, sideband, argv);
 }
diff --git a/git-fetch.sh b/git-fetch.sh
index 539dff6..366014d 100755
--- a/git-fetch.sh
+++ b/git-fetch.sh
@@ -368,6 +368,7 @@ fetch_main () {
       ;; # we are already done.
   *)
     ( : subshell because we muck with IFS
+      pack_lockfile=
       IFS=" 	$LF"
       (
 	  git-fetch-pack $exec $keep "$remote" $rref || echo failed "$remote"
@@ -378,6 +379,10 @@ fetch_main () {
 	  failed)
 		  echo >&2 "Fetch failure: $remote"
 		  exit 1 ;;
+	  pack)
+		  # special line coming from index-pack with the pack name
+		  pack_lockfile="$GIT_OBJECT_DIRECTORY/pack/pack-$remote_name.keep"
+		  continue ;;
 	  esac
 	  found=
 	  single_force=
@@ -408,6 +413,7 @@ fetch_main () {
 	  append_fetch_head "$sha1" "$remote" \
 		  "$remote_name" "$remote_nick" "$local_name" "$not_for_merge"
       done
+      [ "$pack_lockfile" ] && rm -f "$pack_lockfile"
     ) || exit ;;
   esac
 
diff --git a/index-pack.c b/index-pack.c
index a3b55f9..c8f66da 100644
--- a/index-pack.c
+++ b/index-pack.c
@@ -767,18 +767,6 @@ static void final(const char *final_pack
 		if (err)
 			die("error while closing pack file: %s", strerror(errno));
 		chmod(curr_pack_name, 0444);
-
-		/*
-		 * Let's just mimic git-unpack-objects here and write
-		 * the last part of the buffer to stdout.
-		 */
-		while (input_len) {
-			err = xwrite(1, input_buffer + input_offset, input_len);
-			if (err <= 0)
-				break;
-			input_len -= err;
-			input_offset += err;
-		}
 	}
 
 	if (keep_msg) {
@@ -818,6 +806,27 @@ static void final(const char *final_pack
 		if (move_temp_to_file(curr_index_name, final_index_name))
 			die("cannot store index file");
 	}
+
+	if (!from_stdin) {
+		printf("%s\n", sha1_to_hex(sha1));
+	} else {
+		char buf[48];
+		int len = snprintf(buf, sizeof(buf), "pack\t%s\n",
+				   sha1_to_hex(sha1));
+		xwrite(1, buf, len);
+
+		/*
+		 * Let's just mimic git-unpack-objects here and write
+		 * the last part of the input buffer to stdout.
+		 */
+		while (input_len) {
+			err = xwrite(1, input_buffer + input_offset, input_len);
+			if (err <= 0)
+				break;
+			input_len -= err;
+			input_offset += err;
+		}
+	}
 }
 
 int main(int argc, char **argv)
@@ -934,8 +943,5 @@ int main(int argc, char **argv)
 	free(index_name_buf);
 	free(keep_name_buf);
 
-	if (!from_stdin)
-		printf("%s\n", sha1_to_hex(sha1));
-
 	return 0;

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 2/2] Teach receive-pack how to keep pack files based on object count.
  2006-10-31 21:08     ` Nicolas Pitre
@ 2006-10-31 21:29       ` Shawn Pearce
  2006-10-31 22:06         ` Nicolas Pitre
  2006-10-31 22:27       ` Junio C Hamano
  1 sibling, 1 reply; 7+ messages in thread
From: Shawn Pearce @ 2006-10-31 21:29 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Junio C Hamano, git

Nicolas Pitre <nico@cam.org> wrote:
> On Tue, 31 Oct 2006, Shawn Pearce wrote:
> > Nicolas Pitre <nico@cam.org> wrote:
> > > I think this should be solved before rx packs are actually stored as 
> > > packs though.  Otherwise people will end up with unwanted .keep files 
> > > left around.  Maybe having a much bigger default for object number 
> > > treshold for the time being?  (unless this patch is applied to "next" at 
> > > the same time as another one that actually deals with those .keep 
> > > files).
> > 
> > Its next on my list of things to do.  Hopefully I'll be able to
> > implement it today.
> > 
> > I'm thinking of just brute forcing it: put enough identifying data
> > into the .keep file to make it unique, then go through every local
> > pack and look at their .keep file; if the content matches what
> > receive-pack asked index-pack to put there then remove it.
> 
> Ouch.  What about the patch below?  It covers only the pull/fetch case, 
> but covering the push case shouldn't be that hard either (simply use a 
> pipe to read index-pack's stdout and capture the pack name).
> 
> I used "pack" <tab> <sha1> so it is easy to pick out of the list of refs 
> that usually comes over the stream in the fetch case (if I understood 
> that part right).

I thought about using a pipe too, but in the case of receive-pack
it looked like index-pack was sending something back to the push
end of the connection.  I didn't dig into the code enough to see
what that was and how to do the same in receive-pack itself.  The
brute force approach is horrible but simple.  ;-)

I'll look at your patch and what I need to do make a pipe work here,
because its clearly the better solution.

-- 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 2/2] Teach receive-pack how to keep pack files based on object count.
  2006-10-31 21:29       ` Shawn Pearce
@ 2006-10-31 22:06         ` Nicolas Pitre
  0 siblings, 0 replies; 7+ messages in thread
From: Nicolas Pitre @ 2006-10-31 22:06 UTC (permalink / raw)
  To: Shawn Pearce; +Cc: Junio C Hamano, git

On Tue, 31 Oct 2006, Shawn Pearce wrote:
> Nicolas Pitre <nico@cam.org> wrote:
> > I used "pack" <tab> <sha1> so it is easy to pick out of the list of refs 
> > that usually comes over the stream in the fetch case (if I understood 
> > that part right).
> 
> I thought about using a pipe too, but in the case of receive-pack
> it looked like index-pack was sending something back to the push
> end of the connection.  I didn't dig into the code enough to see
> what that was and how to do the same in receive-pack itself.

Well, I think it goes like this:

unpack-objects (and now index-pack --stdin) reads from stdin in 4kb 
chunks.  When the pack has been entirely parsed, it is possible that the 
4kb chunk contains data from the stream past the actual pack data which 
is why the remaining of the buffer is flushed out to stdout for the next 
tool in the chain to pick up.

So if you use a pipe with index-pack to retrieve the pack name, you also 
must consume all extra data from it and pass it on as well.



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 2/2] Teach receive-pack how to keep pack files based on object count.
  2006-10-31 21:08     ` Nicolas Pitre
  2006-10-31 21:29       ` Shawn Pearce
@ 2006-10-31 22:27       ` Junio C Hamano
  1 sibling, 0 replies; 7+ messages in thread
From: Junio C Hamano @ 2006-10-31 22:27 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Shawn Pearce, git

Nicolas Pitre <nico@cam.org> writes:

> I used "pack" <tab> <sha1> so it is easy to pick out of the list of refs 
> that usually comes over the stream in the fetch case (if I understood 
> that part right).

This sounds like a very sane thing to do.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2006-10-31 22:27 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-10-31  7:57 [PATCH 2/2] Teach receive-pack how to keep pack files based on object count Shawn Pearce
2006-10-31 19:56 ` Nicolas Pitre
2006-10-31 20:11   ` Shawn Pearce
2006-10-31 21:08     ` Nicolas Pitre
2006-10-31 21:29       ` Shawn Pearce
2006-10-31 22:06         ` Nicolas Pitre
2006-10-31 22:27       ` Junio C Hamano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).