Git development


* Re: Git is exploding
From: Tay Ray Chuan @ 2011-10-29  8:36 UTC
  To: Øyvind A. Holm; +Cc: git
In-Reply-To: <CAA787r=jeBv9moineaJVY=urYzEX+d7n23ED-txAGhLS+OPbmg@mail.gmail.com>

On Sat, Oct 29, 2011 at 8:39 AM, Øyvind A. Holm <sunny@sunbase.org> wrote:
> Found an interesting "Popularity Contest" graph on debian.org (via
> Thomas Bassetto on G+):
>
> http://bit.ly/rNxVN0
>
> Very cool indeed. Maybe it's the rise of GitHub, or simply that the
> user interface is mature enough that even "regular" users feel
> comfortable with it.

How were the numbers gathered? I looked around the page but gave up.

-- 
Cheers,
Ray Chuan


* Re: Git is exploding
From: Miles Bader @ 2011-10-29  8:28 UTC
  To: Øyvind A. Holm; +Cc: git
In-Reply-To: <8762j8jje9.fsf@catnip.gol.com>

2011/10/29 Miles Bader <miles@gnu.org>:
> The sharpness of that graph is pretty amazing though; what
> happened in 2010Q1?

Actually, now I realize what happened:  that's the date the Debian
"git-core" package was renamed "git" (the "git" package used to be
"gnu interactive tools")!!

-Miles

-- 
Cat is power.  Cat is peace.


* Re: sparse checkout using exclusions
From: Ramkumar Ramachandra @ 2011-10-29  5:46 UTC
  To: Eric Raible; +Cc: git@vger.kernel.org
In-Reply-To: <4EAB4632.5080101@nextest.com>

Hi Eric,

Eric Raible writes:
> Might it make sense for the example in git-read-tree.html to be
> updated to include the leading slash?

This issue was fixed in 5e821231 (git-read-tree.txt: update sparse
checkout examples, 2011-09-26).

Cheers.

-- Ram


* Bitbucket now has git
From: Alec Taylor @ 2011-10-29  3:36 UTC
  To: git

Please update http://git-scm.com/tools


* Fork freedesktop project to bitbucket, make changes, generate patch back to freedesktop?
From: Alec Taylor @ 2011-10-29  3:35 UTC
  To: git

Good afternoon,

I've forked a [git] freedesktop project to [git] bitbucket.

I am working with a team extending the functionality of this project.

After many MANY adds, commits and pushes back and forth on the
bitbucket project, we then want to send this freedesktop project a
PATCH with the changes we've made.

Can you tell me the command I need to do this?

Thanks for all suggestions,

Alec Taylor


* Git is exploding
From: Øyvind A. Holm @ 2011-10-29  0:39 UTC
  To: git

Found an interesting "Popularity Contest" graph on debian.org (via
Thomas Bassetto on G+):

http://bit.ly/rNxVN0

Very cool indeed. Maybe it's the rise of GitHub, or simply that the
user interface is mature enough that even "regular" users feel
comfortable with it.

Regards,
Øyvind


* sparse checkout using exclusions
From: Eric Raible @ 2011-10-29  0:17 UTC
  To: git@vger.kernel.org

Hi all.

I was just about to send a long message about using exclusions
in sparse-checkout, when I did one last search and saw that all
of my problems were fixed by using '/*' instead of '*' as the
first line in .git/info/sparse-checkout.

Might it make sense for the example in git-read-tree.html to be
updated to include the leading slash?

    /*
    !unwanted

- Eric


* [PATCH 4/4] Bulk check-in
From: Junio C Hamano @ 2011-10-28 23:54 UTC
  To: git
In-Reply-To: <1319846051-462-1-git-send-email-gitster@pobox.com>

This extends the earlier approach to stream a large file directly from the
filesystem to its own packfile, and allows "git add" to send large files
directly into a single pack. Older code used to spawn fast-import, but
the new bulk_checkin API replaces it.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 Makefile         |    2 +
 builtin/add.c    |    5 ++
 bulk-checkin.c   |  159 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 bulk-checkin.h   |   16 ++++++
 sha1_file.c      |   67 ++---------------------
 t/t1050-large.sh |   26 +++++++--
 6 files changed, 206 insertions(+), 69 deletions(-)
 create mode 100644 bulk-checkin.c
 create mode 100644 bulk-checkin.h

diff --git a/Makefile b/Makefile
index 3139c19..418dd2e 100644
--- a/Makefile
+++ b/Makefile
@@ -505,6 +505,7 @@ LIB_H += argv-array.h
 LIB_H += attr.h
 LIB_H += blob.h
 LIB_H += builtin.h
+LIB_H += bulk-checkin.h
 LIB_H += cache.h
 LIB_H += cache-tree.h
 LIB_H += color.h
@@ -591,6 +592,7 @@ LIB_OBJS += base85.o
 LIB_OBJS += bisect.o
 LIB_OBJS += blob.o
 LIB_OBJS += branch.o
+LIB_OBJS += bulk-checkin.o
 LIB_OBJS += bundle.o
 LIB_OBJS += cache-tree.o
 LIB_OBJS += color.o
diff --git a/builtin/add.c b/builtin/add.c
index c59b0c9..1c42900 100644
--- a/builtin/add.c
+++ b/builtin/add.c
@@ -13,6 +13,7 @@
 #include "diff.h"
 #include "diffcore.h"
 #include "revision.h"
+#include "bulk-checkin.h"
 
 static const char * const builtin_add_usage[] = {
 	"git add [options] [--] <filepattern>...",
@@ -458,11 +459,15 @@ int cmd_add(int argc, const char **argv, const char *prefix)
 		free(seen);
 	}
 
+	plug_bulk_checkin();
+
 	exit_status |= add_files_to_cache(prefix, pathspec, flags);
 
 	if (add_new_files)
 		exit_status |= add_files(&dir, flags);
 
+	unplug_bulk_checkin();
+
  finish:
 	if (active_cache_changed) {
 		if (write_cache(newfd, active_cache, active_nr) ||
diff --git a/bulk-checkin.c b/bulk-checkin.c
new file mode 100644
index 0000000..cad7a0b
--- /dev/null
+++ b/bulk-checkin.c
@@ -0,0 +1,159 @@
+/*
+ * Copyright (c) 2011, Google Inc.
+ */
+#include "bulk-checkin.h"
+#include "csum-file.h"
+#include "pack.h"
+
+static int pack_compression_level = Z_DEFAULT_COMPRESSION;
+
+static struct bulk_checkin_state {
+	unsigned plugged:1;
+
+	char *pack_tmp_name;
+	struct sha1file *f;
+	off_t offset;
+	struct pack_idx_option pack_idx_opts;
+
+	struct pack_idx_entry **written;
+	uint32_t alloc_written;
+	uint32_t nr_written;
+} state;
+
+static void finish_bulk_checkin(struct bulk_checkin_state *state)
+{
+	unsigned char sha1[20];
+	char packname[PATH_MAX];
+	int i;
+
+	if (!state->f)
+		return;
+
+	if (state->nr_written == 1) {
+		sha1close(state->f, sha1, CSUM_FSYNC);
+	} else {
+		int fd = sha1close(state->f, sha1, 0);
+		fixup_pack_header_footer(fd, sha1, state->pack_tmp_name,
+					 state->nr_written, sha1,
+					 state->offset);
+		close(fd);
+	}
+
+	sprintf(packname, "%s/pack/pack-", get_object_directory());
+	finish_tmp_packfile(packname, state->pack_tmp_name,
+			    state->written, state->nr_written,
+			    &state->pack_idx_opts, sha1);
+	for (i = 0; i < state->nr_written; i++)
+		free(state->written[i]);
+	free(state->written);
+	memset(state, 0, sizeof(*state));
+
+	/* Make objects we just wrote available to ourselves */
+	reprepare_packed_git();
+}
+
+static void deflate_to_pack(struct bulk_checkin_state *state,
+			    unsigned char sha1[],
+			    int fd, size_t size, enum object_type type,
+			    const char *path, unsigned flags)
+{
+	unsigned char obuf[16384];
+	unsigned hdrlen;
+	git_zstream s;
+	git_SHA_CTX ctx;
+	int write_object = (flags & HASH_WRITE_OBJECT);
+	int status = Z_OK;
+	struct pack_idx_entry *idx = NULL;
+
+	hdrlen = sprintf((char *)obuf, "%s %" PRIuMAX, typename(type), (uintmax_t)size) + 1;
+	git_SHA1_Init(&ctx);
+	git_SHA1_Update(&ctx, obuf, hdrlen);
+
+	if (write_object) {
+		idx = xcalloc(1, sizeof(*idx));
+		idx->offset = state->offset;
+		crc32_begin(state->f);
+	}
+	memset(&s, 0, sizeof(s));
+	git_deflate_init(&s, pack_compression_level);
+
+	hdrlen = encode_in_pack_object_header(type, size, obuf);
+	s.next_out = obuf + hdrlen;
+	s.avail_out = sizeof(obuf) - hdrlen;
+
+	while (status != Z_STREAM_END) {
+		unsigned char ibuf[16384];
+
+		if (size && !s.avail_in) {
+			ssize_t rsize = size < sizeof(ibuf) ? size : sizeof(ibuf);
+			if (xread(fd, ibuf, rsize) != rsize)
+				die("failed to read %d bytes from '%s'",
+				    (int)rsize, path);
+			git_SHA1_Update(&ctx, ibuf, rsize);
+			s.next_in = ibuf;
+			s.avail_in = rsize;
+			size -= rsize;
+		}
+
+		status = git_deflate(&s, size ? 0 : Z_FINISH);
+
+		if (!s.avail_out || status == Z_STREAM_END) {
+			size_t written = s.next_out - obuf;
+			if (write_object) {
+				sha1write(state->f, obuf, written);
+				state->offset += written;
+			}
+			s.next_out = obuf;
+			s.avail_out = sizeof(obuf);
+		}
+
+		switch (status) {
+		case Z_OK:
+		case Z_BUF_ERROR:
+		case Z_STREAM_END:
+			continue;
+		default:
+			die("unexpected deflate failure: %d", status);
+		}
+	}
+	git_deflate_end(&s);
+	git_SHA1_Final(sha1, &ctx);
+	if (write_object) {
+		idx->crc32 = crc32_end(state->f);
+		hashcpy(idx->sha1, sha1);
+		ALLOC_GROW(state->written,
+			   state->nr_written + 1, state->alloc_written);
+		state->written[state->nr_written++] = idx;
+	}
+}
+
+int index_bulk_checkin(unsigned char *sha1,
+		       int fd, size_t size, enum object_type type,
+		       const char *path, unsigned flags)
+{
+	if (!state.f && (flags & HASH_WRITE_OBJECT)) {
+		state.f = create_tmp_packfile(&state.pack_tmp_name);
+		reset_pack_idx_option(&state.pack_idx_opts);
+		/* Pretend we are going to write only one object */
+		state.offset = write_pack_header(state.f, 1);
+		if (!state.offset)
+			die_errno("unable to write pack header");
+	}
+
+	deflate_to_pack(&state, sha1, fd, size, type, path, flags);
+	if (!state.plugged)
+		finish_bulk_checkin(&state);
+	return 0;
+}
+
+void plug_bulk_checkin(void)
+{
+	state.plugged = 1;
+}
+
+void unplug_bulk_checkin(void)
+{
+	state.plugged = 0;
+	if (state.f)
+		finish_bulk_checkin(&state);
+}
diff --git a/bulk-checkin.h b/bulk-checkin.h
new file mode 100644
index 0000000..4f599f8
--- /dev/null
+++ b/bulk-checkin.h
@@ -0,0 +1,16 @@
+/*
+ * Copyright (c) 2011, Google Inc.
+ */
+#ifndef BULK_CHECKIN_H
+#define BULK_CHECKIN_H
+
+#include "cache.h"
+
+extern int index_bulk_checkin(unsigned char sha1[],
+			      int fd, size_t size, enum object_type type,
+			      const char *path, unsigned flags);
+
+extern void plug_bulk_checkin(void);
+extern void unplug_bulk_checkin(void);
+
+#endif
diff --git a/sha1_file.c b/sha1_file.c
index 27f3b9b..c96e366 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -18,6 +18,7 @@
 #include "refs.h"
 #include "pack-revindex.h"
 #include "sha1-lookup.h"
+#include "bulk-checkin.h"
 
 #ifndef O_NOATIME
 #if defined(__linux__) && (defined(__i386__) || defined(__PPC__))
@@ -2679,10 +2680,8 @@ static int index_core(unsigned char *sha1, int fd, size_t size,
 }
 
 /*
- * This creates one packfile per large blob, because the caller
- * immediately wants the result sha1, and fast-import can report the
- * object name via marks mechanism only by closing the created
- * packfile.
+ * This creates one packfile per large blob unless bulk-checkin
+ * machinery is "plugged".
  *
  * This also bypasses the usual "convert-to-git" dance, and that is on
  * purpose. We could write a streaming version of the converting
@@ -2696,65 +2695,7 @@ static int index_stream(unsigned char *sha1, int fd, size_t size,
 			enum object_type type, const char *path,
 			unsigned flags)
 {
-	struct child_process fast_import;
-	char export_marks[512];
-	const char *argv[] = { "fast-import", "--quiet", export_marks, NULL };
-	char tmpfile[512];
-	char fast_import_cmd[512];
-	char buf[512];
-	int len, tmpfd;
-
-	strcpy(tmpfile, git_path("hashstream_XXXXXX"));
-	tmpfd = git_mkstemp_mode(tmpfile, 0600);
-	if (tmpfd < 0)
-		die_errno("cannot create tempfile: %s", tmpfile);
-	if (close(tmpfd))
-		die_errno("cannot close tempfile: %s", tmpfile);
-	sprintf(export_marks, "--export-marks=%s", tmpfile);
-
-	memset(&fast_import, 0, sizeof(fast_import));
-	fast_import.in = -1;
-	fast_import.argv = argv;
-	fast_import.git_cmd = 1;
-	if (start_command(&fast_import))
-		die_errno("index-stream: git fast-import failed");
-
-	len = sprintf(fast_import_cmd, "blob\nmark :1\ndata %lu\n",
-		      (unsigned long) size);
-	write_or_whine(fast_import.in, fast_import_cmd, len,
-		       "index-stream: feeding fast-import");
-	while (size) {
-		char buf[10240];
-		size_t sz = size < sizeof(buf) ? size : sizeof(buf);
-		ssize_t actual;
-
-		actual = read_in_full(fd, buf, sz);
-		if (actual < 0)
-			die_errno("index-stream: reading input");
-		if (write_in_full(fast_import.in, buf, actual) != actual)
-			die_errno("index-stream: feeding fast-import");
-		size -= actual;
-	}
-	if (close(fast_import.in))
-		die_errno("index-stream: closing fast-import");
-	if (finish_command(&fast_import))
-		die_errno("index-stream: finishing fast-import");
-
-	tmpfd = open(tmpfile, O_RDONLY);
-	if (tmpfd < 0)
-		die_errno("index-stream: cannot open fast-import mark");
-	len = read(tmpfd, buf, sizeof(buf));
-	if (len < 0)
-		die_errno("index-stream: reading fast-import mark");
-	if (close(tmpfd) < 0)
-		die_errno("index-stream: closing fast-import mark");
-	if (unlink(tmpfile))
-		die_errno("index-stream: unlinking fast-import mark");
-	if (len != 44 ||
-	    memcmp(":1 ", buf, 3) ||
-	    get_sha1_hex(buf + 3, sha1))
-		die_errno("index-stream: unexpected fast-import mark: <%s>", buf);
-	return 0;
+	return index_bulk_checkin(sha1, fd, size, type, path, flags);
 }
 
 int index_fd(unsigned char *sha1, int fd, struct stat *st,
diff --git a/t/t1050-large.sh b/t/t1050-large.sh
index deba111..36def25 100755
--- a/t/t1050-large.sh
+++ b/t/t1050-large.sh
@@ -7,14 +7,28 @@ test_description='adding and checking out large blobs'
 
 test_expect_success setup '
 	git config core.bigfilethreshold 200k &&
-	echo X | dd of=large bs=1k seek=2000
+	echo X | dd of=large bs=1k seek=2000 &&
+	echo Y | dd of=huge bs=1k seek=2500
 '
 
-test_expect_success 'add a large file' '
-	git add large &&
-	# make sure we got a packfile and no loose objects
-	test -f .git/objects/pack/pack-*.pack &&
-	test ! -f .git/objects/??/??????????????????????????????????????
+test_expect_success 'add a large file or two' '
+	git add large huge &&
+	# make sure we got a single packfile and no loose objects
+	bad= count=0 &&
+	for p in .git/objects/pack/pack-*.pack
+	do
+		count=$(( $count + 1 ))
+		test -f "$p" && continue
+		bad=t
+	done &&
+	test -z "$bad" &&
+	test $count = 1 &&
+	for l in .git/objects/??/??????????????????????????????????????
+	do
+		test -f "$l" || continue
+		bad=t
+	done &&
+	test -z "$bad"
 '
 
 test_expect_success 'checkout a large file' '
-- 
1.7.7.1.573.ga40d2
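
One detail of deflate_to_pack() above that is easy to miss: the object
name is computed over a "<type> <size>" header, including its trailing
NUL, followed by the raw content, which is why hdrlen gets the "+ 1".
A minimal standalone sketch of just that header step follows.  The name
format_object_header() is hypothetical (it is not part of the patch), and
the sketch uses a bounded snprintf() and casts the size to uintmax_t so
that the argument matches PRIuMAX.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper, for illustration only.  Formats the header that
 * is hashed ahead of the object content and returns its length
 * including the terminating NUL, or -1 if the buffer is too small.
 */
static int format_object_header(char *buf, size_t len,
				const char *type, size_t size)
{
	int n = snprintf(buf, len, "%s %" PRIuMAX, type, (uintmax_t)size);

	if (n < 0 || (size_t)n >= len)
		return -1;
	return n + 1;	/* the NUL is part of what gets hashed */
}
```

For a blob of 2048001 bytes this yields "blob 2048001" plus the NUL, 13
bytes in total; those bytes would be fed to the hash before any of the
file content.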


* [PATCH 3/4] finish_tmp_packfile(): a helper function
From: Junio C Hamano @ 2011-10-28 23:54 UTC
  To: git
In-Reply-To: <1319846051-462-1-git-send-email-gitster@pobox.com>

Factor a small piece of logic out of the private write_pack_file() function
in builtin/pack-objects.c.

This changes the order of finishing multi-pack generation slightly. The
code used to

 - adjust shared perm of temporary packfile
 - rename temporary packfile to the final name
 - update mtime of the packfile under the final name
 - adjust shared perm of temporary idxfile
 - rename temporary idxfile to the final name

but because the helper should not have to deal with the mtime adjustment,
the updated code does that step first, on the temporary packfile, and then
all the rest.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/pack-objects.c |   33 ++++++++++-----------------------
 pack-write.c           |   31 +++++++++++++++++++++++++++++++
 pack.h                 |    1 +
 3 files changed, 42 insertions(+), 23 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 3258fa9..b458b6d 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -617,20 +617,8 @@ static void write_pack_file(void)
 
 		if (!pack_to_stdout) {
 			struct stat st;
-			const char *idx_tmp_name;
 			char tmpname[PATH_MAX];
 
-			idx_tmp_name = write_idx_file(NULL, written_list, nr_written,
-						      &pack_idx_opts, sha1);
-
-			snprintf(tmpname, sizeof(tmpname), "%s-%s.pack",
-				 base_name, sha1_to_hex(sha1));
-			free_pack_by_name(tmpname);
-			if (adjust_shared_perm(pack_tmp_name))
-				die_errno("unable to make temporary pack file readable");
-			if (rename(pack_tmp_name, tmpname))
-				die_errno("unable to rename temporary pack file");
-
 			/*
 			 * Packs are runtime accessed in their mtime
 			 * order since newer packs are more likely to contain
@@ -638,28 +626,27 @@ static void write_pack_file(void)
 			 * packs then we should modify the mtime of later ones
 			 * to preserve this property.
 			 */
-			if (stat(tmpname, &st) < 0) {
+			if (stat(pack_tmp_name, &st) < 0) {
 				warning("failed to stat %s: %s",
-					tmpname, strerror(errno));
+					pack_tmp_name, strerror(errno));
 			} else if (!last_mtime) {
 				last_mtime = st.st_mtime;
 			} else {
 				struct utimbuf utb;
 				utb.actime = st.st_atime;
 				utb.modtime = --last_mtime;
-				if (utime(tmpname, &utb) < 0)
+				if (utime(pack_tmp_name, &utb) < 0)
 					warning("failed utime() on %s: %s",
 						tmpname, strerror(errno));
 			}
 
-			snprintf(tmpname, sizeof(tmpname), "%s-%s.idx",
-				 base_name, sha1_to_hex(sha1));
-			if (adjust_shared_perm(idx_tmp_name))
-				die_errno("unable to make temporary index file readable");
-			if (rename(idx_tmp_name, tmpname))
-				die_errno("unable to rename temporary index file");
-
-			free((void *) idx_tmp_name);
+			/* Enough space for "-<sha-1>.pack"? */
+			if (sizeof(tmpname) <= strlen(base_name) + 50)
+				die("pack base name '%s' too long", base_name);
+			snprintf(tmpname, sizeof(tmpname), "%s-", base_name);
+			finish_tmp_packfile(tmpname, pack_tmp_name,
+					    written_list, nr_written,
+					    &pack_idx_opts, sha1);
 			free(pack_tmp_name);
 			puts(sha1_to_hex(sha1));
 		}
diff --git a/pack-write.c b/pack-write.c
index 863cce8..cadc3e1 100644
--- a/pack-write.c
+++ b/pack-write.c
@@ -338,3 +338,34 @@ struct sha1file *create_tmp_packfile(char **pack_tmp_name)
 	*pack_tmp_name = xstrdup(tmpname);
 	return sha1fd(fd, *pack_tmp_name);
 }
+
+void finish_tmp_packfile(char *name_buffer,
+			 const char *pack_tmp_name,
+			 struct pack_idx_entry **written_list,
+			 uint32_t nr_written,
+			 struct pack_idx_option *pack_idx_opts,
+			 unsigned char sha1[])
+{
+	const char *idx_tmp_name;
+	char *end_of_name_prefix = strrchr(name_buffer, 0);
+
+	if (adjust_shared_perm(pack_tmp_name))
+		die_errno("unable to make temporary pack file readable");
+
+	idx_tmp_name = write_idx_file(NULL, written_list, nr_written,
+				      pack_idx_opts, sha1);
+	if (adjust_shared_perm(idx_tmp_name))
+		die_errno("unable to make temporary index file readable");
+
+	sprintf(end_of_name_prefix, "%s.pack", sha1_to_hex(sha1));
+	free_pack_by_name(name_buffer);
+
+	if (rename(pack_tmp_name, name_buffer))
+		die_errno("unable to rename temporary pack file");
+
+	sprintf(end_of_name_prefix, "%s.idx", sha1_to_hex(sha1));
+	if (rename(idx_tmp_name, name_buffer))
+		die_errno("unable to rename temporary index file");
+
+	free((void *)idx_tmp_name);
+}
diff --git a/pack.h b/pack.h
index 0027ac6..cfb0f69 100644
--- a/pack.h
+++ b/pack.h
@@ -86,5 +86,6 @@ extern int encode_in_pack_object_header(enum object_type, uintmax_t, unsigned ch
 extern int read_pack_header(int fd, struct pack_header *);
 
 extern struct sha1file *create_tmp_packfile(char **pack_tmp_name);
+extern void finish_tmp_packfile(char *name_buffer, const char *pack_tmp_name, struct pack_idx_entry **written_list, uint32_t nr_written, struct pack_idx_option *pack_idx_opts, unsigned char sha1[]);
 
 #endif
-- 
1.7.7.1.573.ga40d2
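
The name_buffer convention used by finish_tmp_packfile() above (the
caller passes a buffer that already holds the "...pack-" prefix, and the
helper overwrites whatever follows the prefix, first with "<sha-1>.pack"
and then with "<sha-1>.idx") can be sketched in isolation.  The helper
name append_name() and the explicit size check are assumptions made for
this illustration; the patch itself uses sprintf() and relies on the
caller having verified that the buffer is large enough.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, for illustration only.  Overwrites everything
 * after the prefix in name_buffer with "<hex><ext>".  Returns 0 on
 * success, or -1 if the result would not fit in the buffer.
 */
static int append_name(char *name_buffer, size_t bufsize,
		       char *end_of_prefix,
		       const char *hex, const char *ext)
{
	size_t room = bufsize - (size_t)(end_of_prefix - name_buffer);
	int n = snprintf(end_of_prefix, room, "%s%s", hex, ext);

	return (n < 0 || (size_t)n >= room) ? -1 : 0;
}
```

A caller would remember the end of the prefix once, for example with
strchr(buf, '\0'), and then call append_name() twice with ".pack" and
".idx", renaming the corresponding temporary file after each call.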


* [PATCH 2/4] create_tmp_packfile(): a helper function
From: Junio C Hamano @ 2011-10-28 23:54 UTC
  To: git
In-Reply-To: <1319846051-462-1-git-send-email-gitster@pobox.com>

Factor a small piece of logic out of the private write_pack_file() function
in builtin/pack-objects.c.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/pack-objects.c |   12 +++---------
 pack-write.c           |   10 ++++++++++
 pack.h                 |    3 +++
 3 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 6643c16..3258fa9 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -584,16 +584,10 @@ static void write_pack_file(void)
 		unsigned char sha1[20];
 		char *pack_tmp_name = NULL;
 
-		if (pack_to_stdout) {
+		if (pack_to_stdout)
 			f = sha1fd_throughput(1, "<stdout>", progress_state);
-		} else {
-			char tmpname[PATH_MAX];
-			int fd;
-			fd = odb_mkstemp(tmpname, sizeof(tmpname),
-					 "pack/tmp_pack_XXXXXX");
-			pack_tmp_name = xstrdup(tmpname);
-			f = sha1fd(fd, pack_tmp_name);
-		}
+		else
+			f = create_tmp_packfile(&pack_tmp_name);
 
 		offset = write_pack_header(f, nr_remaining);
 		if (!offset)
diff --git a/pack-write.c b/pack-write.c
index 46f3f84..863cce8 100644
--- a/pack-write.c
+++ b/pack-write.c
@@ -328,3 +328,13 @@ int encode_in_pack_object_header(enum object_type type, uintmax_t size, unsigned
 	*hdr = c;
 	return n;
 }
+
+struct sha1file *create_tmp_packfile(char **pack_tmp_name)
+{
+	char tmpname[PATH_MAX];
+	int fd;
+
+	fd = odb_mkstemp(tmpname, sizeof(tmpname), "pack/tmp_pack_XXXXXX");
+	*pack_tmp_name = xstrdup(tmpname);
+	return sha1fd(fd, *pack_tmp_name);
+}
diff --git a/pack.h b/pack.h
index d429d8a..0027ac6 100644
--- a/pack.h
+++ b/pack.h
@@ -84,4 +84,7 @@ extern int encode_in_pack_object_header(enum object_type, uintmax_t, unsigned ch
 #define PH_ERROR_PACK_SIGNATURE	(-2)
 #define PH_ERROR_PROTOCOL	(-3)
 extern int read_pack_header(int fd, struct pack_header *);
+
+extern struct sha1file *create_tmp_packfile(char **pack_tmp_name);
+
 #endif
-- 
1.7.7.1.573.ga40d2
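
For readers without the git tree at hand, the shape of
create_tmp_packfile() above can be approximated with plain POSIX calls.
This is only a sketch under stated assumptions: odb_mkstemp() and
sha1fd() are git internals and are not reproduced here, mkstemp() stands
in for the former, and the name create_tmp_file() and the "dir"
parameter are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical analogue of create_tmp_packfile().  Creates a uniquely
 * named file under dir and hands back the open descriptor together
 * with a heap-allocated copy of the name (the caller frees it), much
 * like *pack_tmp_name in the patch.  Returns -1 on failure.
 */
static int create_tmp_file(const char *dir, char **tmp_name)
{
	char tmpname[4096];
	int fd, n;

	n = snprintf(tmpname, sizeof(tmpname), "%s/tmp_pack_XXXXXX", dir);
	if (n < 0 || (size_t)n >= sizeof(tmpname))
		return -1;
	fd = mkstemp(tmpname);	/* replaces the XXXXXX in place */
	if (fd < 0)
		return -1;
	*tmp_name = strdup(tmpname);
	if (!*tmp_name) {
		close(fd);
		unlink(tmpname);
		return -1;
	}
	return fd;
}
```

Returning the name alongside the descriptor matters because the later
"finish" step has to rename the temporary file to its final name.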


* [PATCH 1/4] write_pack_header(): a helper function
From: Junio C Hamano @ 2011-10-28 23:54 UTC
  To: git
In-Reply-To: <1319846051-462-1-git-send-email-gitster@pobox.com>

Factor a small piece of logic out of the private write_pack_file() function
in builtin/pack-objects.c.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/pack-objects.c |    9 +++------
 pack-write.c           |   12 ++++++++++++
 pack.h                 |    2 ++
 3 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index ba3705d..6643c16 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -571,7 +571,6 @@ static void write_pack_file(void)
 	uint32_t i = 0, j;
 	struct sha1file *f;
 	off_t offset;
-	struct pack_header hdr;
 	uint32_t nr_remaining = nr_result;
 	time_t last_mtime = 0;
 	struct object_entry **write_order;
@@ -596,11 +595,9 @@ static void write_pack_file(void)
 			f = sha1fd(fd, pack_tmp_name);
 		}
 
-		hdr.hdr_signature = htonl(PACK_SIGNATURE);
-		hdr.hdr_version = htonl(PACK_VERSION);
-		hdr.hdr_entries = htonl(nr_remaining);
-		sha1write(f, &hdr, sizeof(hdr));
-		offset = sizeof(hdr);
+		offset = write_pack_header(f, nr_remaining);
+		if (!offset)
+			die_errno("unable to write pack header");
 		nr_written = 0;
 		for (; i < nr_objects; i++) {
 			struct object_entry *e = write_order[i];
diff --git a/pack-write.c b/pack-write.c
index 9cd3bfb..46f3f84 100644
--- a/pack-write.c
+++ b/pack-write.c
@@ -178,6 +178,18 @@ const char *write_idx_file(const char *index_name, struct pack_idx_entry **objec
 	return index_name;
 }
 
+off_t write_pack_header(struct sha1file *f, uint32_t nr_entries)
+{
+	struct pack_header hdr;
+
+	hdr.hdr_signature = htonl(PACK_SIGNATURE);
+	hdr.hdr_version = htonl(PACK_VERSION);
+	hdr.hdr_entries = htonl(nr_entries);
+	if (sha1write(f, &hdr, sizeof(hdr)))
+		return 0;
+	return sizeof(hdr);
+}
+
 /*
  * Update pack header with object_count and compute new SHA1 for pack data
  * associated to pack_fd, and write that SHA1 at the end.  That new SHA1
diff --git a/pack.h b/pack.h
index 722a54e..d429d8a 100644
--- a/pack.h
+++ b/pack.h
@@ -2,6 +2,7 @@
 #define PACK_H
 
 #include "object.h"
+#include "csum-file.h"
 
 /*
  * Packed object header
@@ -74,6 +75,7 @@ extern const char *write_idx_file(const char *index_name, struct pack_idx_entry
 extern int check_pack_crc(struct packed_git *p, struct pack_window **w_curs, off_t offset, off_t len, unsigned int nr);
 extern int verify_pack_index(struct packed_git *);
 extern int verify_pack(struct packed_git *);
+extern off_t write_pack_header(struct sha1file *f, uint32_t);
 extern void fixup_pack_header_footer(int, unsigned char *, const char *, uint32_t, unsigned char *, off_t);
 extern char *index_pack_lockfile(int fd);
 extern int encode_in_pack_object_header(enum object_type, uintmax_t, unsigned char *);
-- 
1.7.7.1.573.ga40d2
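
As an illustration of what write_pack_header() above emits: the header
is three 32-bit fields in network byte order, 12 bytes in total.  The
standalone sketch below fills a caller-supplied buffer instead of going
through sha1write().  The function name format_pack_header() is
hypothetical, and the constant values (the signature spelling "PACK",
and version 2) are stated from memory of pack.h rather than taken from
this patch, so treat them as assumptions.

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>	/* htonl() */

#define PACK_SIGNATURE 0x5041434b	/* "PACK"; assumed value */
#define PACK_VERSION 2			/* assumed value */

/* Same three-field layout as the struct pack_header used in the patch. */
struct pack_header {
	uint32_t hdr_signature;
	uint32_t hdr_version;
	uint32_t hdr_entries;
};

/*
 * Hypothetical stand-in for write_pack_header().  Returns the number
 * of bytes produced, or 0 if the buffer is too small, mirroring the
 * "0 means failure" convention of the helper in the patch.
 */
static size_t format_pack_header(unsigned char *buf, size_t len,
				 uint32_t nr_entries)
{
	struct pack_header hdr;

	if (len < sizeof(hdr))
		return 0;
	hdr.hdr_signature = htonl(PACK_SIGNATURE);
	hdr.hdr_version = htonl(PACK_VERSION);
	hdr.hdr_entries = htonl(nr_entries);
	memcpy(buf, &hdr, sizeof(hdr));
	return sizeof(hdr);
}
```

Because the entry count sits at a fixed offset, a writer that does not
know the final count up front can write a placeholder and patch the
count in later.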


* [PATCH 0/4] Bulk check-in
From: Junio C Hamano @ 2011-10-28 23:54 UTC
  To: git

This miniseries is a continuation of the "large file" topic from the 1.7.6
development cycle.

The first three patches move existing code around for better reuse.  The
last one serves two purposes: to lift the one-pack-per-large-blob
constraint by introducing the concept of "plugging/unplugging" (i.e. you
plug the drain and throw many large blobs at index_fd(), and they appear in
a single pack when you unplug it), and to stop using fast-import in this
codepath.

Only very lightly tested.

Junio C Hamano (4):
  write_pack_header(): a helper function
  create_tmp_packfile(): a helper function
  finish_tmp_packfile(): a helper function
  Bulk check-in

 Makefile               |    2 +
 builtin/add.c          |    5 ++
 builtin/pack-objects.c |   56 +++++------------
 bulk-checkin.c         |  159 ++++++++++++++++++++++++++++++++++++++++++++++++
 bulk-checkin.h         |   16 +++++
 pack-write.c           |   53 ++++++++++++++++
 pack.h                 |    6 ++
 sha1_file.c            |   67 +-------------------
 t/t1050-large.sh       |   26 ++++++--
 9 files changed, 282 insertions(+), 108 deletions(-)
 create mode 100644 bulk-checkin.c
 create mode 100644 bulk-checkin.h

-- 
1.7.7.1.573.ga40d2
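
The plugging/unplugging idea described in this cover letter can be
modelled without any of the pack machinery.  The toy below is only a
sketch: every name in it (toy_state, toy_checkin() and so on) is
hypothetical, "packs" are reduced to counters, and nothing is written
to disk.  It is meant to show the control flow, not the series' actual
code.

```c
/* Toy model: a "pack" is a counter, an "object" is one increment. */
struct toy_state {
	unsigned plugged:1;
	int open;		/* is a pack currently being written? */
	int nr_in_pack;		/* objects in the pack being written */
	int nr_packs;		/* packs finished so far */
	int last_pack_size;	/* object count of the last finished pack */
};

static void toy_finish(struct toy_state *s)
{
	if (!s->open)
		return;
	s->last_pack_size = s->nr_in_pack;
	s->nr_packs++;
	s->open = 0;
	s->nr_in_pack = 0;
}

static void toy_checkin(struct toy_state *s)
{
	if (!s->open)
		s->open = 1;	/* lazily start a new pack */
	s->nr_in_pack++;
	if (!s->plugged)
		toy_finish(s);	/* unplugged: one pack per object */
}

static void toy_plug(struct toy_state *s)
{
	s->plugged = 1;
}

static void toy_unplug(struct toy_state *s)
{
	s->plugged = 0;
	toy_finish(s);		/* flush whatever accumulated */
}
```

With a zero-initialized state, two toy_checkin() calls produce two
packs of one object each; wrapping three calls in toy_plug() and
toy_unplug() produces a single pack of three objects.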


* Re: [RFC/PATCH] define the way new representation types are encoded in the pack
From: Nicolas Pitre @ 2011-10-28 23:30 UTC
  To: Shawn Pearce; +Cc: Junio C Hamano, git, Jeff King
In-Reply-To: <CAJo=hJsEzkFV9k8N+GAwWddmEZH8pQeJZrg_MXD72stbAW0ceQ@mail.gmail.com>


On Fri, 28 Oct 2011, Shawn Pearce wrote:

> On Fri, Oct 28, 2011 at 15:48, Nicolas Pitre <nico@fluxnic.net> wrote:
> > On Fri, 28 Oct 2011, Shawn Pearce wrote:
> >> - The immediate next byte encodes the extended type. This type is
> >> stored using the OFS_DELTA offset varint encoding, and thus may be
> >> larger than 256 if we ever need it to be.
> >
> > I'd say it is just a byte.  No encoding needed.  Let's not be silly
> > about it.  If we really have more than 255 object types one day (and I
> > really hope this will never happen) then the value 0 in that byte could
> > indicate yet another extended object type encoding.  But I truly hope
> > we'll have pack v9 or v10 by then and that we'll have obsoleted the
> > current 3-bit encoding completely at that point anyway.
> 
> Yes. I probably wouldn't code the parser to use a varint here. I would
> say the extended types stored in this byte must be >= 8, and must be
> <= 127. Any values out of this range are unsupported and should be
> rejected. We can later reserve the right to set the high bit and
> switch to the OFS_DELTA varint encoding if we need that many more
> types, and we explicitly define codes 0-7 as illegal if detected here
> in the extended byte field.

I wouldn't go as far as rejecting codes 1-7 as illegal though, but I 
otherwise agree with what you say.


Nicolas


* Re: [RFC/PATCH] define the way new representation types are encoded in the pack
From: Shawn Pearce @ 2011-10-28 23:07 UTC
  To: Nicolas Pitre; +Cc: Junio C Hamano, git, Jeff King
In-Reply-To: <alpine.LFD.2.02.1110290031540.30467@xanadu.home>

On Fri, Oct 28, 2011 at 15:48, Nicolas Pitre <nico@fluxnic.net> wrote:
> On Fri, 28 Oct 2011, Shawn Pearce wrote:
>> - The immediate next byte encodes the extended type. This type is
>> stored using the OFS_DELTA offset varint encoding, and thus may be
>> larger than 256 if we ever need it to be.
>
> I'd say it is just a byte.  No encoding needed.  Let's not be silly
> about it.  If we really have more than 255 object types one day (and I
> really hope this will never happen) then the value 0 in that byte could
> indicate yet another extended object type encoding.  But I truly hope
> we'll have pack v9 or v10 by then and that we'll have obsoleted the
> current 3-bit encoding completely at that point anyway.

Yes. I probably wouldn't code the parser to use a varint here. I would
say the extended types stored in this byte must be >= 8, and must be
<= 127. Any values out of this range are unsupported and should be
rejected. We can later reserve the right to set the high bit and
switch to the OFS_DELTA varint encoding if we need that many more
types, and we explicitly define codes 0-7 as illegal if detected here
in the extended byte field.
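
Read literally, the rules proposed in this message could be checked
roughly as follows.  This is a sketch of the proposal as worded in the
thread, not code from any patch: the function name parse_entry_header()
is hypothetical, and it assumes that type code 0 in bits 4-6 is the
escape, that the size is stored in the usual way (4 bits in the first
byte, then 7 bits per continuation byte), and that the extended type is
the single byte that follows the size.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical parser for the proposed scheme.  Returns the number of
 * bytes consumed, or 0 if the input is truncated or the extended type
 * byte falls outside the 8..127 range suggested above.
 */
static size_t parse_entry_header(const unsigned char *buf, size_t len,
				 unsigned *type, uintmax_t *size)
{
	size_t used = 0;
	unsigned shift = 4;
	unsigned char c;

	if (!len)
		return 0;
	c = buf[used++];
	*type = (c >> 4) & 7;		/* bits 4-6 */
	*size = c & 15;			/* bits 0-3 */
	while (c & 0x80) {		/* bit 7: more size bytes follow */
		if (used >= len || shift >= sizeof(*size) * 8)
			return 0;
		c = buf[used++];
		*size += (uintmax_t)(c & 0x7f) << shift;
		shift += 7;
	}
	if (*type == 0) {		/* escape: extended type byte */
		if (used >= len)
			return 0;
		c = buf[used++];
		if (c < 8 || c > 127)
			return 0;	/* 0-7 and high-bit values rejected */
		*type = c;
	}
	return used;
}
```

Rejecting values with the high bit set keeps open the option mentioned
above of switching that byte to a varint later on.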


* Re: [RFC/PATCH] define the way new representation types are encoded in the pack
From: Nicolas Pitre @ 2011-10-28 22:48 UTC
  To: Shawn Pearce; +Cc: Junio C Hamano, git, Jeff King
In-Reply-To: <CAJo=hJt-YZcdxw+D=1S4haPmY-8-LLjXD=MvDGeWbdJ88_VOGw@mail.gmail.com>


On Fri, 28 Oct 2011, Shawn Pearce wrote:

> On Thu, Oct 27, 2011 at 23:04, Junio C Hamano <gitster@pobox.com> wrote:
> > In addition to four basic types (commit, tree, blob and tag), the pack
> > stream can encode a few other "representation" types, such as REF_DELTA
> > and OFS_DELTA. As we allocate 3 bits in the first byte for this purpose,
> > we do not have much room to add new representation types in place, but we
> > do have one value reserved for future expansion.
> 
> We have 2 values reserved, 0 and 5.
> 
> > When bit 4-6 encodes type 5, the first byte is used this way:
> >
> >  - Bit 0-3 denotes the real "extended" representation type. Because types
> >   0-7 can already be encoded without using the extended format, we can
> >   offset the type by 8 (i.e. if bit 0-3 says 3, it means representation
> >   type 11 = 3 + 8);
> >
> >  - Bit 4-6 has the value "5";
> >
> >  - Bit 7 is used to signal if the _third_ byte needs to be read for larger
> >   size that cannot be represented with 8-bit.
> 
> This is very complicated. We don't need more complex logic in the pack
> encoding. We _especially_ do not need yet another variant of how to
> store a variable length integer in the pack file. I'm sorry, but we
> already have two different variants and this just adds a third. It is
> beyond crazy.
> 
> Last time (this is now years ago but whatever) Nico and I discussed
> adding a new type to packs it was for the alternate tree encoding in
> "pack v4". Trees happen so often that type code 5 is a good value to
> use for these. Later you talked about using the extended type to store
> a cattree blob thing, which would not appear nearly as often as a
> normal directory listing type tree that was encoded using the pack v4
> style encoding... I think saving type 5 for a small frequently
> occurring type is a good thing.
> 
> > As it is unlikely for us to pack things that do not need to record any
> > size, the second byte is always used in full to encode the low 8-bit of
> > the size.
> >
> > I haven't started using type=8 and upwards for anything yet, but because
> > we have only one "future expansion" value left, I want us to be extremely
> > careful in order to avoid painting us into a corner that we cannot get out
> > of, so I am sending this out early for a preliminary review.
> 
> I would have said something more like:
> 
> When bit 4-6 encodes "0", then:
> 
> - Bit 0-3 and bit 7 are used normally to encode a variable length
> "size" integer. These may be 0 indicating no size information.
> 
> - 2nd-nth bytes store remaining "size" information, if bit 7 was set.
> 
> - The immediate next byte encodes the extended type. This type is
> stored using the OFS_DELTA offset varint encoding, and thus may be
> larger than 256 if we ever need it to be.

I'd say it is just a byte.  No encoding needed.  Let's not be silly 
about it.  If we really have more than 255 object types one day (and I 
really hope this will never happen) then the value 0 in that byte could 
indicate yet another extended object type encoding.  But I truly hope 
we'll have pack v9 or v10 by then and that we'll have obsoleted the 
current 3-bit encoding completely at that point anyway.
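
As a rough illustration of the layout being discussed (a Python sketch, not
code from git): the first part below follows the existing pack object header,
where bits 4-6 of the first byte hold the type, bits 0-3 the low size bits,
and bit 7 says more size bytes follow. The type-0 branch is only the
hypothetical extension proposed in this thread, with the extended type stored
as one plain byte and the value 0 kept in reserve. The function name
decode_object_header is made up for this example.

```python
def decode_object_header(buf, pos=0):
    """Decode one pack object header; return (type, size, next_pos)."""
    byte = buf[pos]
    pos += 1
    obj_type = (byte >> 4) & 0x07   # bits 4-6: object type
    size = byte & 0x0F              # bits 0-3: low four bits of the size
    shift = 4
    while byte & 0x80:              # bit 7: another size byte follows
        byte = buf[pos]
        pos += 1
        size |= (byte & 0x7F) << shift
        shift += 7
    if obj_type == 0:
        # Hypothetical extension from this thread: the byte right after
        # the size holds the extended type, with no further encoding.
        obj_type = buf[pos]
        pos += 1
        if obj_type == 0:
            # Assumption: 0 stays reserved for some later encoding.
            raise ValueError("extended type 0 is reserved")
    return obj_type, size, pos
```

With this layout an ordinary object costs nothing extra, and an extended
object costs exactly one additional byte after its size.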

For the record, I spent around 20 hours working on pack v4 while in the 
Caribbean for a week last winter as I said I would.  Maybe I'll repeat 
the operation this year.


Nicolas

^ permalink raw reply

* Re: imap-send badly handles commit bodies beginning with "From <"
From: Jeff King @ 2011-10-28 21:37 UTC (permalink / raw)
  To: Andrew Eikum; +Cc: git
In-Reply-To: <20111028212122.GB3966@foghorn.codeweavers.com>

On Fri, Oct 28, 2011 at 04:21:22PM -0500, Andrew Eikum wrote:

> Since we have a program called "mailsplit," wouldn't it make more
> sense to have imap-send use its implementation to split mail instead
> of sharing just the From line detection?

Potentially, yeah. I was thinking of just pulling over the from line
detection (which is the real black magic bit), but it looks like
imap-send's mbox handling could use some general attention (maybe it
would be possible to not read the entire mbox into memory, for example).

> I was hoping it'd be a quick matter of pulling mailsplit's
> implementation out of builtin and into the top level, but I see it's
> got some global variables that are tangled enough that I actually have
> to understand the code before I can pull it apart :)
>
> If no one beats me to it, I'll work on this next week. It's late on
> Friday and I'm moving house this weekend.

No rush. Let us know if you have questions.

> Quick question, since I'm not intimately familiar with Git's code: I
> was thinking of creating a new compilation unit at the top level,
> mailutils.{c,h}, and referencing it from both imap-send.c and
> builtin/splitmail.c. Does that seem like the right approach? Is there
> an existing compilation unit I should be placing splitmail's guts into
> instead?

Yes, I think a new file makes sense here. Make sure to update LIB_H and
LIB_OBJS in the Makefile.

-Peff

^ permalink raw reply

* Re: imap-send badly handles commit bodies beginning with "From <"
From: Andrew Eikum @ 2011-10-28 21:21 UTC (permalink / raw)
  To: Jeff King; +Cc: Andrew Eikum, git
In-Reply-To: <20111028203256.GA15082@sigill.intra.peff.net>

On Fri, Oct 28, 2011 at 01:32:57PM -0700, Jeff King wrote:
> Mbox does have this problem, but I think in this case it is a
> particularly crappy implementation of mbox in imap-send. Look at
> imap-send.c:split_msg; it just looks for "From ".
> 
> It should at least check for something that looks like a timestamp, like
> git-mailsplit does. Maybe mailsplit's is_from_line should be factored
> out so that it can be reused in imap-send.

Since we have a program called "mailsplit," wouldn't it make more
sense to have imap-send use its implementation to split mail instead
of sharing just the From line detection?

> Want to work on a patch?

I was hoping it'd be a quick matter of pulling mailsplit's
implementation out of builtin and into the top level, but I see it's
got some global variables that are tangled enough that I actually have
to understand the code before I can pull it apart :)

If no one beats me to it, I'll work on this next week. It's late on
Friday and I'm moving house this weekend.

Quick question, since I'm not intimately familiar with Git's code: I
was thinking of creating a new compilation unit at the top level,
mailutils.{c,h}, and referencing it from both imap-send.c and
builtin/splitmail.c. Does that seem like the right approach? Is there
an existing compilation unit I should be placing splitmail's guts into
instead?

Andrew

^ permalink raw reply

* Re: [PATCH/WIP 01/11] Introduce "check-attr --excluded" as a replacement for "add --ignore-missing"
From: Nguyen Thai Ngoc Duy @ 2011-10-28 20:51 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vfwiez4s5.fsf@alter.siamese.dyndns.org>

2011/10/28 Junio C Hamano <gitster@pobox.com>:
> Perhaps ls-files is a more suitable home for the feature?

ls-files sounds good. It does all kinds of file selection already.
I'll see if I can add -I (aka "show ignored files only") to it.
-- 
Duy

^ permalink raw reply

* Re: [PATCH/WIP 02/11] notes-merge: use opendir/readdir instead of using read_directory()
From: Nguyen Thai Ngoc Duy @ 2011-10-28 20:47 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vzkgmz6v0.fsf@alter.siamese.dyndns.org>

On Fri, Oct 28, 2011 at 4:23 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Nguyen Thai Ngoc Duy <pclouds@gmail.com> writes:
>
>>> When read_directory("where/ever") is called, what kind of paths does it
>>> collect? Do the paths the function collects share "where/ever" as their
>>> common prefix? I thought it collects the paths relative to whatever
>>> top-level directory given to the function, so that "where/ever" could be
>>> anything.
>>
>> Correct. But read_directory() takes pathspec now so naturally it does
>> not treat "where/ever" as a common prefix anymore.  So it has to
>> open(".") and start from there.
>
> That is a puzzling statement. The read_directory() function takes:
>
>  - dir: use this struct to pass traversal status and collected paths;
>
>  - path, len: this is the directory (not a pathspec) we start traversal
>   from; and
>
>  - pathspec: these are the patterns that specify which parts of the
>   directory hierarchy under <path,len> are traversed.
>
> I do not see any good reason for <path,len> to become a match pattern. Are
> you trying to get it prepended to elements in pathspec[] and match the path
> collected including the <path> part?
>
> Why?
>
> I could see that "open . and start from there, treating as if <path,len>
> is also pathspec" could be made to work, but I do not see why that is
> desirable.
>
> In other words, are there existing callers that abuse read_directory()
> to feed a pattern in <path,len>? Maybe they should be the one that needs
> fixing instead?

fill_directory() tries to calculate a common prefix (i.e. <path,len>
to read_directory()) from pathspec and that may or may not work when
pathspec magic comes into play. But yes, I could just make
fill_directory() pass <"",0> to read_directory() and keep <path,len>
in read_directory() for notes-merge and future users.
-- 
Duy

^ permalink raw reply

* Re: imap-send badly handles commit bodies beginning with "From <"
From: Jeff King @ 2011-10-28 20:32 UTC (permalink / raw)
  To: Andrew Eikum; +Cc: git
In-Reply-To: <20111028180044.GA3966@foghorn.codeweavers.com>

On Fri, Oct 28, 2011 at 01:00:44PM -0500, Andrew Eikum wrote:

> On the server side, it was split into two mails on either side of that
> commit message's From line with neither mail actually containing the
> From line. To fix it, I just changed it to "Copied from <url>:" :-P
> 
> Ain't mbox grand?

Mbox does have this problem, but I think in this case it is a
particularly crappy implementation of mbox in imap-send. Look at
imap-send.c:split_msg; it just looks for "From ".

It should at least check for something that looks like a timestamp, like
git-mailsplit does. Maybe mailsplit's is_from_line should be factored
out so that it can be reused in imap-send.
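
To make the difference concrete, here is a small Python sketch (not a port of
mailsplit's is_from_line, whose exact rules are not reproduced here). It
accepts a line as an mbox separator only when "From " is followed by a sender
token and something shaped like an asctime timestamp. The names
looks_like_from_line and split_mbox are invented for the example.

```python
import re

# "From <sender> <Day> <Mon> <dd> <hh:mm:ss> <yyyy>", as in
# "From 0123abcd Mon Sep 17 00:00:00 2001".
FROM_LINE = re.compile(
    r"^From \S+ +"
    r"(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun) "
    r"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) "
    r"[ \d]\d \d{2}:\d{2}:\d{2} \d{4}\s*$"
)


def looks_like_from_line(line):
    """True only if the line has the sender-plus-timestamp shape."""
    return FROM_LINE.match(line) is not None


def split_mbox(text):
    """Split mbox text into messages at timestamped "From " lines."""
    messages = []
    current = []
    for line in text.splitlines(keepends=True):
        if looks_like_from_line(line) and current:
            messages.append("".join(current))
            current = []
        current.append(line)
    if current:
        messages.append("".join(current))
    return messages
```

A body line such as "From <url>" has no timestamp, so it stays inside its
message instead of starting a new one.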

Want to work on a patch?

-Peff

^ permalink raw reply

* Re: git alias and --help
From: Gelonida N @ 2011-10-28 20:23 UTC (permalink / raw)
  To: git
In-Reply-To: <m362j95jv3.fsf@localhost.localdomain>

On 10/28/2011 03:26 PM, Jakub Narebski wrote:
> Miles Bader <miles@gnu.org> writes:
>> Junio C Hamano <gitster@pobox.com> writes:
> 
>>>>> git branch --help
>>>>
>>>> How about "git help branch"?
>>>
>>> The reason why we do not do what you seem to be suggesting is because
>>> giving the same behaviour to "git b --help" as "git branch --help" is
>>> wrong.
>>
>> I agree with Gelonida's followup:  although what you say makes sense,
>> it's still pretty annoying behavior for the very common case of a
>> simple renaming alias...
>>
>> E.g., I have "co" aliased to "checkout", and so my fingers are very
>> very inclined to say "co" when I mean checkout... including when
>> asking for help.  I actually end up typing "git co --help", grumbling,
>> and retyping with the full command name, quite regularly.
>>
>> What I've often wished is that git's help system would output
>> something like:
>>
>>    $ git help co
>>    `git co' is aliased to `checkout'
>>
>>    Here's the help entry for `checkout':
>>
>>    GIT-CHECKOUT(1)                   Git Manual                   GIT-CHECKOUT(1)
> 
> Wouldn't it be more useful to say something like this:
> 
>   $ git co --help
>   `git co' is aliased to `checkout'
>  
>   You can see help entry for `checkout' with "git checkout --help"
> 
> Then help is only copy'n'paste away.  
> 
> (This helping text probably should be controlled by some advice.*
> config variable).

This is definitely an option, and one I suggested myself (though my
example had a typo and was perhaps not understandable because of it).

^ permalink raw reply

* Re: git alias and --help
From: Gelonida N @ 2011-10-28 20:21 UTC (permalink / raw)
  To: git
In-Reply-To: <m31utx5js7.fsf@localhost.localdomain>

On 10/28/2011 03:27 PM, Jakub Narebski wrote:
> Gelonida N <gelonida@gmail.com> writes:
> 
> [...]
> 
>> Another small detail:
>>
>> Let's assume I have following alias:
>>
>> log = log --name-status
>>
>>
>> In this case I directly get the help text for git log
>> if I typed 'git log --help' (or 'git help log').
>> I don't even see, that my log is in reality aliased.
> 
> That is because it doesn't work: git does not allow for aliasing its
> built-in commands.


Well this explains why.
Thanks

^ permalink raw reply

* Re: git slow over https
From: Gelonida N @ 2011-10-28 20:07 UTC (permalink / raw)
  To: git
In-Reply-To: <CAOs=hR+K_YZcjdAUq_jaz0wc9k8BRQ2-ny7A=GFaNL4R-W0UBw@mail.gmail.com>

On 10/28/2011 05:28 PM, Mika Fischer wrote:
> Hi,
> 
> I have an apache server that serves git repositories over https. I
> have the problem that git is very slow when accessing it, as in 3
> seconds for a "git pull" that does nothing.
> 
> I tracked the problem down to git sleeping for 50ms using select from
> time to time while downloading the response of the server. In my case
> this really hurts performance (see attached strace). However, with a
> different https server things work quite fine.
> 
> If I remove the select in http.c:673 (in run_active_slot), then things
> are fast also with my server.
> 
> So my questions are:
> 1) What's the purpose of the select in http.c:673? Can it be removed?
> 2) If it serves a useful purpose, what can be the reason that it hurts
> performance so much in my case?
> 

Do you experience the same delay if you ssh to the server?

If yes, then try adding the following line to the SSH daemon's config
file on the slow server.

UseDNS no

The config file should have a name similar to
/etc/ssh/sshd_config



Is it getting any faster?

^ permalink raw reply

* Re: [PATCH 00/28] Store references hierarchically in cache
From: Michael Haggerty @ 2011-10-28 18:45 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: Junio C Hamano, git, Jeff King, Drew Northup, Jakub Narebski,
	Heiko Voigt, Johan Herland, Julian Phillips
In-Reply-To: <CALkWK0=Hsq_yg1Vyr-3_jf9n=WcB_XNYRQa0SEhSWD5VxKBXQg@mail.gmail.com>

On 10/28/2011 03:07 PM, Ramkumar Ramachandra wrote:
> Michael Haggerty writes:
>> Therefore, this patch series changes the data structure used to store
>> the reference cache from a single array of pointers-to-struct into a
>> tree-like structure in which each subdirectory of the reference
>> namespace is stored as an array of pointers-to-entry and entries can
>> be either references or subdirectories containing more references.
> 
> Very nice! I like the idea. Can't wait to start reading the series.
> 
>>  * refs/replace is almost *always* needed even though it often
>>    doesn't even exist.  Thus the presence of many loose references
>>    slows down *many* git commands for no reason whatsoever.
> 
> Was this one of your primary inspirations for writing this series?

My primary inspiration was that "git filter-branch" was so slow, which
is partly because of the refs/replace thing and partly just the built-in
inefficiency of the old reference cache.

>> * the time to create a new branch goes from 180 ms to less than 10 ms
>>  (my test resolution only includes two decimal places) and the time
>>  to checkout a new branch does the same.
> 
> I'm interested in seeing how the callgraph changed.  Assuming you used
> Valgrind to profile it, could you publish the outputs?

I didn't use valgrind; I just timed commands using time(1).

>> The efficiency gains are such that some operations are now faster with
>> loose references than with packed references; however, some operations
>> with packed references slow down a bit.
> 
> Curiously, why do operations with packed references slow down?  (I'll
> probably find out in a few minutes after reading the series, but I'm
> asking anyway because it's very non-obvious to me now)

I think it's just because the new data structure is slightly slower than
the old one for the task of appending thousands of refs without doing
any searching or sorting.  Since packed refs are read all-or-nothing
(even after my changes), there is no way to read the packed refs only
for the directory that you are interested in.
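
The trade-off can be pictured with a small Python sketch. This is an
illustration only: the series stores each directory as an array of pointers
to entries, whereas the sketch uses a dict as a stand-in, and the names
RefDir, add_ref, find_dir and refs_under are invented here. Every insertion
walks the path components of the name, which a flat append would not need,
while a lookup such as refs/replace touches only its own subtree.

```python
class RefDir:
    """One directory of the ref namespace: name -> RefDir or value."""

    def __init__(self):
        self.entries = {}


def add_ref(root, refname, value):
    """Insert a ref, creating intermediate directories as needed."""
    # Assumes no name is both a ref and a directory, as git requires.
    *dirs, leaf = refname.split("/")
    node = root
    for d in dirs:
        node = node.entries.setdefault(d, RefDir())
    node.entries[leaf] = value


def find_dir(root, dirname):
    """Return the RefDir for dirname, or None if it does not exist."""
    node = root
    for d in dirname.strip("/").split("/"):
        node = node.entries.get(d)
        if not isinstance(node, RefDir):
            return None
    return node


def refs_under(root, dirname):
    """List (name, value) pairs below dirname, sorted by name."""
    start = find_dir(root, dirname)
    if start is None:
        return []
    found = []
    stack = [(dirname.strip("/"), start)]
    while stack:
        prefix, directory = stack.pop()
        for name, entry in directory.entries.items():
            full = prefix + "/" + name
            if isinstance(entry, RefDir):
                stack.append((full, entry))
            else:
                found.append((full, entry))
    return sorted(found)
```

A caller asking for a directory that does not exist gets an answer after a
few dict lookups, no matter how many refs live elsewhere in the tree.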

>> Patches 11-24 change most of the internal functions to work with
>> "struct ref_entry *" (namely the kind of ref_entry that holds a
>> directory of references) instead of "struct ref_dir *".  The reason
>> for this change is to allow these functions access to the "flag" and
>> "name" fields that are stored in ref_entry and thereby avoid having to
>> store redundant information in "struct ref_dir" (which would increase
>> the size of *every* ref_entry because of its presence in the union).
> 
> Hm, I was wondering why the series was looking so intimidating.  Is it
> not possible to squash all (or at least some) of these together?

Why would I possibly want to squash them together?  Each one is
self-contained and logically separate from the rest.  After each commit
the code compiles and works.  Squashing them together wouldn't reduce
the cognitive burden of reading them; on the contrary, it would make it
harder to read them because several logically-separate changes would be
confounded into a single patch.  Most of these patches have only a few
nontrivial lines which you can read at a glance.  And if the series ever
has to be bisected, the error can be narrowed down to a very small diff.

>> From: Michael Haggerty <mhagger@alum.mit.edu>
> 
> Nit: Can't you configure your email client to put this in the "From: "
> header of your emails?

I'm not sure where those extra lines come from.  My email client is
git-send-email.  I just checked, and they have appeared in patch series
sent through two completely different SMTP servers, so it seems unlikely
that the SMTP server is guilty.  I'll see if I can figure it out.

> Thanks for the interesting read.

Thanks for reading :-)

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/

^ permalink raw reply

* Re: git slow over https
From: Daniel Stenberg @ 2011-10-28 18:28 UTC (permalink / raw)
  To: Mika Fischer; +Cc: Git Mailing List
In-Reply-To: <CAOs=hR+K_YZcjdAUq_jaz0wc9k8BRQ2-ny7A=GFaNL4R-W0UBw@mail.gmail.com>

On Fri, 28 Oct 2011, Mika Fischer wrote:

> 1) What's the purpose of the select in http.c:673? Can it be removed?
> 2) If it serves a useful purpose, what can be the reason that it hurts
> performance so much in my case?

The purpose must be to avoid busy-looping in case there's nothing to read.

It should probably use curl_multi_fdset [1] to get a decent set to wait for 
instead so that it'll return fast if there is pending data. The timeout for 
select can in fact also get extended with the use of curl_multi_timeout [2].
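
The general idea, waiting on the actual descriptors so the call returns as
soon as data is pending instead of always sleeping a fixed interval, can be
sketched in Python with the standard select module. This does not call
libcurl; in the real code the descriptor set would come from
curl_multi_fdset and the timeout from curl_multi_timeout. The function name
wait_for_activity is invented for the example.

```python
import select
import socket
import time


def wait_for_activity(read_fds, timeout_s):
    """Return the readable descriptors, waiting at most timeout_s."""
    if not read_fds:
        # Nothing to watch yet: fall back to a plain bounded sleep.
        time.sleep(timeout_s)
        return []
    readable, _, _ = select.select(read_fds, [], [], timeout_s)
    return readable


if __name__ == "__main__":
    a, b = socket.socketpair()
    b.send(b"x")                       # data is already pending on a
    start = time.monotonic()
    ready = wait_for_activity([a], 5.0)
    # Returns well before the 5 second timeout because a is readable.
    print(ready == [a], time.monotonic() - start < 1.0)
    a.close()
    b.close()
```

With a descriptor set the timeout only bounds the idle case, so it can be
made longer without delaying transfers that have data ready.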

1 = http://curl.haxx.se/libcurl/c/curl_multi_fdset.html
2 = http://curl.haxx.se/libcurl/c/curl_multi_timeout.html

-- 

  / daniel.haxx.se

^ permalink raw reply
