Git development
* fetching packs and storing them as packs
From: Nicolas Pitre @ 2006-10-26  3:44 UTC
  To: git; +Cc: Junio C Hamano

With the last few patches I just posted, it is now possible to receive 
(fetch) packs, validate them on the fly, complete them if they are thin 
packs, and store them directly without exploding them into loose 
objects.

There are advantages and disadvantages to both methods, so I think this 
should become a configuration option and/or a command-line argument 
to git-fetch. Still, I think keeping packs packed has many more 
advantages, so using index-pack should become the default.

But I'm a bit tired of playing with it, so the final integration is left 
for someone else to do.  I've tested it lightly, using the extremely crude 
patch below to hook it into the fetch process.

Have fun!

diff --git a/fetch-clone.c b/fetch-clone.c
index 76b99af..28796c3 100644
--- a/fetch-clone.c
+++ b/fetch-clone.c
@@ -142,7 +142,8 @@ int receive_unpack_pack(int xd[2], const
 		dup2(fd[0], 0);
 		close(fd[0]);
 		close(fd[1]);
-		execl_git_cmd("unpack-objects", quiet ? "-q" : NULL, NULL);
+		execl_git_cmd("index-pack", "--stdin", "--fix-thin",
+			      quiet ? NULL : "-v", NULL);
 		die("git-unpack-objects exec failed");
 	}
 	close(fd[0]);
diff --git a/receive-pack.c b/receive-pack.c
index 1fcf3a9..7f6dc49 100644
--- a/receive-pack.c
+++ b/receive-pack.c
@@ -7,7 +7,7 @@
 
 static const char receive_pack_usage[] = "git-receive-pack <git-dir>";
 
-static const char *unpacker[] = { "unpack-objects", NULL };
+static const char *unpacker[] = { "index-pack", "-v", "--stdin", "--fix-thin", NULL };
 
 static int report_status;
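The crude hook above builds the index-pack command line inline: `quiet ? NULL : "-v"` either appends "-v" or ends the execl()-style list early. A minimal sketch of the same command line built as an explicit argv array; `build_index_pack_argv` is a hypothetical helper, not a git function, and the argument set is simply copied from the hunk.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: fill argv with the arguments the fetch-clone.c
 * hook passes to "git index-pack" and return argc.
 * argv must have room for at least 5 entries (4 arguments + NULL).
 */
static int build_index_pack_argv(const char **argv, int quiet)
{
	int argc = 0;

	assert(argv != NULL);
	argv[argc++] = "index-pack";
	argv[argc++] = "--stdin";
	argv[argc++] = "--fix-thin";
	if (!quiet)
		argv[argc++] = "-v";	/* progress output only when not quiet */
	argv[argc] = NULL;		/* execv()-style terminator */
	return argc;
}
```

An array like this could be handed to an execv()-family call. Unlike the conditional NULL in the argument list, it stays correct if more optional flags are ever appended after "-v".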


* [PATCH] diff-format.txt: Combined diff format documentation supplement
From: Jakub Narebski @ 2006-10-26  3:44 UTC
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vejswkoi4.fsf@assigned-by-dhcp.cox.net>

Update the example of the combined diff format to the current output of
$ git diff-tree -p -c fec9ebf16c948bcb4a8b88d0173ee63584bcde76
and provide the complete first chunk in the example.

Document the combined diff format headers: what the "diff header" looks
like, which "extended diff headers" are used with combined diff and what
they look like, how the two-line from-file/to-file header differs from
the non-combined diff format, and the chunk header format.

Note that the combined diff format was designed for quick
_content_ inspection: renames work correctly to pick which blobs
from each tree to compare, but are otherwise not reflected in the
output (the pathnames are not shown).

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
Junio C Hamano wrote:
> Patches to documentation would be easier to comment on and more
> productive, I guess.

So here it is. It should perhaps be reviewed for validity by someone
well versed in the combined diff generation code. There are some guesses
here...

It compiles, but the output was not inspected.


 Documentation/diff-format.txt |   70 ++++++++++++++++++++++++++++++++++++++---
 1 files changed, 65 insertions(+), 5 deletions(-)

diff --git a/Documentation/diff-format.txt b/Documentation/diff-format.txt
index 617d8f5..0d04b03 100644
--- a/Documentation/diff-format.txt
+++ b/Documentation/diff-format.txt
@@ -156,18 +156,78 @@ to produce 'combined diff', which looks 
 
 ------------
 diff --combined describe.c
-@@@ +98,7 @@@
-   return (a_date > b_date) ? -1 : (a_date == b_date) ? 0 : 1;
+index fabadb8,cc95eb0..4866510
+--- a/describe.c
++++ b/describe.c
+@@@ -98,20 -98,12 +98,20 @@@
+  	return (a_date > b_date) ? -1 : (a_date == b_date) ? 0 : 1;
   }
-
+  
 - static void describe(char *arg)
  -static void describe(struct commit *cmit, int last_one)
 ++static void describe(char *arg, int last_one)
   {
- +     unsigned char sha1[20];
- +     struct commit *cmit;
+ +	unsigned char sha1[20];
+ +	struct commit *cmit;
+  	struct commit_list *list;
+  	static int initialized = 0;
+  	struct commit_name *n;
+  
+ +	if (get_sha1(arg, sha1) < 0)
+ +		usage(describe_usage);
+ +	cmit = lookup_commit_reference(sha1);
+ +	if (!cmit)
+ +		usage(describe_usage);
+ +
+  	if (!initialized) {
+  		initialized = 1;
+  		for_each_ref(get_name);
 ------------
 
+1.   It is preceded by a "git diff" header that looks like
+     this (when the '-c' option is used):
+
+       diff --combined fileM
+
+     or like this (when the '--cc' option is used):
+
+       diff --cc fileM
+
+2.   It is followed by one or more extended header lines
+     (this example assumes a merge with two parents):
+
+       index <hash>,<hash>..<hash>
+       mode <mode>,<mode>..<mode>
+       new file mode <mode>
+
+     The "mode <mode>,<mode>..<mode>" line appears only if at least
+     one of the <mode>s is different from the rest. Extended headers
+     with information about detected content movement (rename
+     and copy detection) are designed to work with a diff of two
+     <tree-ish> and are not used by the combined diff format. Currently
+     the combined diff format cannot show files that were removed
+     by the merge, so "deleted file mode <mode>,<mode>" is never used.
+
+3.   It is followed by a two-line from-file/to-file header:
+
+       --- a/fileM
+       +++ b/fileM
+
+     Unlike the two-line header of the traditional 'unified' diff
+     format, and like the filenames in the ordinary "diff header",
+     /dev/null is not used when the combined diff shows a creation.
+
+4.   The chunk header format is modified to prevent people from
+     accidentally feeding it to 'patch -p1'. The combined diff format
+     was created for reviewing merge commit changes and was not
+     meant to be applied. The change is similar to the change in the
+     extended 'index' header:
+
+       @@@ <from-file-range> <from-file-range> <to-file-range> @@@
+
+     It might not be obvious that there are (number of parents + 1)
+     '@' characters in the chunk header for the combined diff format.
+
 Unlike the traditional 'unified' diff format, which shows two
 files A and B with a single column that has `-` (minus --
 appears in A but removed in B), `+` (plus -- missing in A but
-- 
1.4.2.1



-- 
Jakub Narebski
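Item 4 of the documentation patch says the chunk header carries one '@' per parent plus one. A minimal sketch of assembling such a header from preformatted "start,count" ranges; `format_combined_hunk_header` is hypothetical and is not meant to mirror how git's combine-diff code is structured.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write a combined-diff chunk header such as
 *   "@@@ -98,20 -98,12 +98,20 @@@"
 * into buf. parent_ranges[i] is the "start,count" text for parent i,
 * result_range the one for the merge result.
 * Returns the length written, or -1 if buf is too small.
 */
static int format_combined_hunk_header(char *buf, size_t bufsize,
				       int nr_parents,
				       const char **parent_ranges,
				       const char *result_range)
{
	size_t len = 0;
	int i, n;

	assert(nr_parents >= 2);	/* combined diff is for merges */
	/* Opening marker: nr_parents + 1 '@' characters. */
	for (i = 0; i <= nr_parents; i++) {
		if (len + 1 >= bufsize)
			return -1;
		buf[len++] = '@';
	}
	/* One " -start,count" per parent ... */
	for (i = 0; i < nr_parents; i++) {
		n = snprintf(buf + len, bufsize - len, " -%s", parent_ranges[i]);
		if (n < 0 || (size_t)n >= bufsize - len)
			return -1;
		len += n;
	}
	/* ... then " +start,count " for the result. */
	n = snprintf(buf + len, bufsize - len, " +%s ", result_range);
	if (n < 0 || (size_t)n >= bufsize - len)
		return -1;
	len += n;
	/* Closing marker mirrors the opening one. */
	for (i = 0; i <= nr_parents; i++) {
		if (len + 1 >= bufsize)
			return -1;
		buf[len++] = '@';
	}
	buf[len] = '\0';
	return (int)len;
}
```

For a two-parent merge, parent ranges "98,20" and "98,12" with result range "98,20" produce "@@@ -98,20 -98,12 +98,20 @@@", matching the example hunk in the patch.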


* [PATCH 2/3] add progress status to index-pack
From: Nicolas Pitre @ 2006-10-26  3:32 UTC
  To: Junio C Hamano; +Cc: git

This is more interesting to look at when performing a big fetch.

Signed-off-by: Nicolas Pitre <nico@cam.org>
---
 Documentation/git-index-pack.txt |    7 +++-
 index-pack.c                     |   74 +++++++++++++++++++++++++++++++++++--
 2 files changed, 75 insertions(+), 6 deletions(-)

diff --git a/Documentation/git-index-pack.txt b/Documentation/git-index-pack.txt
index c58287d..9fa4847 100644
--- a/Documentation/git-index-pack.txt
+++ b/Documentation/git-index-pack.txt
@@ -8,8 +8,8 @@ git-index-pack - Build pack index file f
 
 SYNOPSIS
 --------
-'git-index-pack' [-o <index-file>] <pack-file>
-'git-index-pack' --stdin [--fix-thin] [-o <index-file>] [<pack-file>]
+'git-index-pack' [-v] [-o <index-file>] <pack-file>
+'git-index-pack' --stdin [--fix-thin] [-v] [-o <index-file>] [<pack-file>]
 
 
 DESCRIPTION
@@ -22,6 +22,9 @@ objects/pack/ directory of a git reposit
 
 OPTIONS
 -------
+-v::
+	Be verbose about what is going on, including progress status.
+
 -o <index-file>::
 	Write the generated pack index into the specified
 	file.  Without this option the name of pack index
diff --git a/index-pack.c b/index-pack.c
index 9086bbf..2046b37 100644
--- a/index-pack.c
+++ b/index-pack.c
@@ -6,9 +6,11 @@
 #include "commit.h"
 #include "tag.h"
 #include "tree.h"
+#include <sys/time.h>
+#include <signal.h>
 
 static const char index_pack_usage[] =
-"git-index-pack [-o <index-file>] { <pack-file> | --stdin [--fix-thin] [<pack-file>] }";
+"git-index-pack [-v] [-o <index-file>] { <pack-file> | --stdin [--fix-thin] [<pack-file>] }";
 
 struct object_entry
 {
@@ -44,6 +46,42 @@ static int nr_deltas;
 static int nr_resolved_deltas;
 
 static int from_stdin;
+static int verbose;
+
+static volatile sig_atomic_t progress_update;
+
+static void progress_interval(int signum)
+{
+	progress_update = 1;
+}
+
+static void setup_progress_signal(void)
+{
+	struct sigaction sa;
+	struct itimerval v;
+
+	memset(&sa, 0, sizeof(sa));
+	sa.sa_handler = progress_interval;
+	sigemptyset(&sa.sa_mask);
+	sa.sa_flags = SA_RESTART;
+	sigaction(SIGALRM, &sa, NULL);
+
+	v.it_interval.tv_sec = 1;
+	v.it_interval.tv_usec = 0;
+	v.it_value = v.it_interval;
+	setitimer(ITIMER_REAL, &v, NULL);
+
+}
+
+static unsigned display_progress(unsigned n, unsigned total, unsigned last_pc)
+{
+	unsigned percent = n * 100 / total;
+	if (percent != last_pc || progress_update) {
+		fprintf(stderr, "%4u%% (%u/%u) done\r", percent, n, total);
+		progress_update = 0;
+	}
+	return percent;
+}
 
 /* We always read in 4kB chunks. */
 static unsigned char input_buffer[4096];
@@ -135,7 +173,6 @@ static void parse_pack_header(void)
 
 	nr_objects = ntohl(hdr->hdr_entries);
 	use(sizeof(struct pack_header));
-	/*fprintf(stderr, "Indexing %d objects\n", nr_objects);*/
 }
 
 static void bad_object(unsigned long offset, const char *format,
@@ -383,7 +420,7 @@ static int compare_delta_entry(const voi
 /* Parse all objects and return the pack content SHA1 hash */
 static void parse_pack_objects(unsigned char *sha1)
 {
-	int i;
+	int i, percent = -1;
 	struct delta_entry *delta = deltas;
 	void *data;
 	struct stat st;
@@ -394,6 +431,8 @@ static void parse_pack_objects(unsigned
 	 * - calculate SHA1 of all non-delta objects;
 	 * - remember base SHA1 for all deltas.
 	 */
+	if (verbose)
+		fprintf(stderr, "Indexing %d objects.\n", nr_objects);
 	for (i = 0; i < nr_objects; i++) {
 		struct object_entry *obj = &objects[i];
 		data = unpack_raw_entry(obj, &delta->base);
@@ -405,8 +444,12 @@ static void parse_pack_objects(unsigned
 		} else
 			sha1_object(data, obj->size, obj->type, obj->sha1);
 		free(data);
+		if (verbose)
+			percent = display_progress(i+1, nr_objects, percent);
 	}
 	objects[i].offset = consumed_bytes;
+	if (verbose)
+		fputc('\n', stderr);
 
 	/* Check pack integrity */
 	flush();
@@ -420,6 +463,9 @@ static void parse_pack_objects(unsigned
 	if (S_ISREG(st.st_mode) && st.st_size != consumed_bytes + 20)
 		die("pack has junk at the end");
 
+	if (!nr_deltas)
+		return;
+
 	/* Sort deltas by base SHA1/offset for fast searching */
 	qsort(deltas, nr_deltas, sizeof(struct delta_entry),
 	      compare_delta_entry);
@@ -432,6 +478,8 @@ static void parse_pack_objects(unsigned
 	 *   recursively checking if the resulting object is used as a base
 	 *   for some more deltas.
 	 */
+	if (verbose)
+		fprintf(stderr, "Resolving %d deltas.\n", nr_deltas);
 	for (i = 0; i < nr_objects; i++) {
 		struct object_entry *obj = &objects[i];
 		union delta_base base;
@@ -462,7 +510,12 @@ static void parse_pack_objects(unsigned
 						      obj->size, obj->type);
 			}
 		free(data);
+		if (verbose)
+			percent = display_progress(nr_resolved_deltas,
+						   nr_deltas, percent);
 	}
+	if (verbose && nr_resolved_deltas == nr_deltas)
+		fputc('\n', stderr);
 }
 
 static int write_compressed(int fd, void *in, unsigned int size)
@@ -521,7 +574,7 @@ static int delta_pos_compare(const void
 static void fix_unresolved_deltas(int nr_unresolved)
 {
 	struct delta_entry **sorted_by_pos;
-	int i, n = 0;
+	int i, n = 0, percent = -1;
 
 	/*
 	 * Since many unresolved deltas may well be themselves base objects
@@ -570,8 +623,13 @@ static void fix_unresolved_deltas(int nr
 
 		append_obj_to_pack(data, size, obj_type);
 		free(data);
+		if (verbose)
+			percent = display_progress(nr_resolved_deltas,
+						   nr_deltas, percent);
 	}
 	free(sorted_by_pos);
+	if (verbose)
+		fputc('\n', stderr);
 }
 
 static void readjust_pack_header_and_sha1(unsigned char *sha1)
@@ -747,6 +805,8 @@ int main(int argc, char **argv)
 				from_stdin = 1;
 			} else if (!strcmp(arg, "--fix-thin")) {
 				fix_thin_pack = 1;
+			} else if (!strcmp(arg, "-v")) {
+				verbose = 1;
 			} else if (!strcmp(arg, "-o")) {
 				if (index_name || (i+1) >= argc)
 					usage(index_pack_usage);
@@ -780,16 +840,22 @@ int main(int argc, char **argv)
 	parse_pack_header();
 	objects = xmalloc((nr_objects + 1) * sizeof(struct object_entry));
 	deltas = xmalloc(nr_objects * sizeof(struct delta_entry));
+	if (verbose)
+		setup_progress_signal();
 	parse_pack_objects(sha1);
 	if (nr_deltas != nr_resolved_deltas) {
 		if (fix_thin_pack) {
 			int nr_unresolved = nr_deltas - nr_resolved_deltas;
+			int nr_objects_initial = nr_objects;
 			if (nr_unresolved <= 0)
 				die("confusion beyond insanity");
 			objects = xrealloc(objects,
 					   (nr_objects + nr_unresolved + 1)
 					   * sizeof(*objects));
 			fix_unresolved_deltas(nr_unresolved);
+			if (verbose)
+				fprintf(stderr, "%d objects were added to complete this thin pack.\n",
+					nr_objects - nr_objects_initial);
 			readjust_pack_header_and_sha1(sha1);
 		}
 		if (nr_deltas != nr_resolved_deltas)
-- 
1.4.3.3.g10cf-dirty
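The throttling rule in display_progress() is: print when the integer percentage changes, or when the one-second SIGALRM timer has set progress_update. A minimal, testable sketch of that rule which formats into a buffer instead of writing to stderr; `format_progress` and `progress_tick` are hypothetical names standing in for display_progress() and progress_update.

```c
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>

/* Set asynchronously (e.g. from a SIGALRM handler) about once per second. */
static volatile sig_atomic_t progress_tick;

/*
 * Format a progress line into buf when the integer percentage changed
 * or the timer ticked; otherwise leave buf empty. Returns the new
 * percentage so the caller can pass it back as last_pc next time.
 */
static unsigned format_progress(char *buf, size_t bufsize,
				unsigned n, unsigned total, unsigned last_pc)
{
	unsigned percent;

	assert(total > 0 && bufsize > 0);
	percent = n * 100 / total;
	buf[0] = '\0';
	if (percent != last_pc || progress_tick) {
		/* '\r' lets the next update overwrite this line in place */
		snprintf(buf, bufsize, "%4u%% (%u/%u) done\r", percent, n, total);
		progress_tick = 0;
	}
	return percent;
}
```

Passing `(unsigned)-1` as the first last_pc, as the patch does with `percent = -1`, guarantees that the very first call prints.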


* [PATCH 3/3] mimic unpack-objects when --stdin is used with index-pack
From: Nicolas Pitre @ 2006-10-26  3:31 UTC
  To: Junio C Hamano; +Cc: git

It appears that git-unpack-objects writes the last part of the input
buffer to stdout after the pack has been parsed.  This looks a bit
suspicious since the last fill() might have filled the buffer up to
the 4096-byte limit and more data might still be pending on stdin,
but since this is about being a drop-in replacement for unpack-objects,
let's simply duplicate the same behavior for now.

Signed-off-by: Nicolas Pitre <nico@cam.org>
---
 index-pack.c |   17 +++++++++++++++--
 1 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/index-pack.c b/index-pack.c
index 2046b37..7f7dc5d 100644
--- a/index-pack.c
+++ b/index-pack.c
@@ -456,6 +456,7 @@ static void parse_pack_objects(unsigned
 	SHA1_Final(sha1, &input_ctx);
 	if (hashcmp(fill(20), sha1))
 		die("pack is corrupted (SHA1 mismatch)");
+	use(20);
 
 	/* If input_fd is a file, we should have reached its end now. */
 	if (fstat(input_fd, &st))
@@ -765,6 +766,18 @@ static void final(const char *final_pack
 		if (err)
 			die("error while closing pack file: %s", strerror(errno));
 		chmod(curr_pack_name, 0444);
+
+		/*
+		 * Let's just mimic git-unpack-objects here and write
+		 * the last part of the buffer to stdout.
+		 */
+		while (input_len) {
+			err = xwrite(1, input_buffer + input_offset, input_len);
+			if (err <= 0)
+				break;
+			input_len -= err;
+			input_offset += err;
+		}
 	}
 
 	if (final_pack_name != curr_pack_name) {
@@ -863,7 +876,6 @@ int main(int argc, char **argv)
 			    nr_deltas - nr_resolved_deltas);
 	} else {
 		/* Flush remaining pack final 20-byte SHA1. */
-		use(20);
 		flush();
 	}
 	free(deltas);
@@ -872,7 +884,8 @@ int main(int argc, char **argv)
 	free(objects);
 	free(index_name_buf);
 
-	printf("%s\n", sha1_to_hex(sha1));
+	if (!from_stdin)
+		printf("%s\n", sha1_to_hex(sha1));
 
 	return 0;
 }
-- 
1.4.3.3.g10cf-dirty
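The loop added to final() drains whatever is left in input_buffer to stdout and gives up on the first error or zero-byte write. A minimal sketch of the same drain loop using plain POSIX write(), with EINTR handled explicitly; `write_remaining` is a hypothetical name, and git's own xwrite() wrapper may already retry on interrupts.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Write buf[*offset .. *offset + *len) to fd, retrying on short writes
 * and on EINTR, advancing *offset and shrinking *len as data goes out.
 * Returns 0 once everything is written; -1 on a write error (errno set)
 * or if write() returns 0 and makes no progress.
 */
static int write_remaining(int fd, const unsigned char *buf,
			   size_t *offset, size_t *len)
{
	assert(buf && offset && len);
	while (*len) {
		ssize_t n = write(fd, buf + *offset, *len);
		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted: just retry */
			return -1;
		}
		if (n == 0)
			return -1;	/* no progress; avoid spinning forever */
		*offset += (size_t)n;
		*len -= (size_t)n;
	}
	return 0;
}
```

As the commit message points out, this only forwards bytes already sitting in the buffer; anything still pending on stdin is never read.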


* [PATCH 1/3] make index-pack able to complete thin packs
From: Nicolas Pitre @ 2006-10-26  3:28 UTC
  To: Junio C Hamano; +Cc: git

A new flag, --fix-thin, instructs git-index-pack to append any missing
objects to a thin pack to make it self-contained and indexable. Of course,
objects missing from the pack must be present elsewhere in the local
repository.

Signed-off-by: Nicolas Pitre <nico@cam.org>
---
 Documentation/git-index-pack.txt |   14 ++-
 index-pack.c                     |  254 +++++++++++++++++++++++++++++++-------
 2 files changed, 224 insertions(+), 44 deletions(-)

diff --git a/Documentation/git-index-pack.txt b/Documentation/git-index-pack.txt
index db7af58..c58287d 100644
--- a/Documentation/git-index-pack.txt
+++ b/Documentation/git-index-pack.txt
@@ -8,7 +8,8 @@ git-index-pack - Build pack index file f
 
 SYNOPSIS
 --------
-'git-index-pack' [-o <index-file>] { <pack-file> | --stdin [<pack-file>] }
+'git-index-pack' [-o <index-file>] <pack-file>
+'git-index-pack' --stdin [--fix-thin] [-o <index-file>] [<pack-file>]
 
 
 DESCRIPTION
@@ -36,6 +37,17 @@ OPTIONS
 	objects/pack/ directory of the current git repository with
 	a default name determined from the pack content.
 
+--fix-thin::
+	It is possible for gitlink:git-pack-objects[1] to build a
+	"thin" pack, which records objects in deltified form based on
+	objects not included in the pack, to reduce network traffic.
+	Those objects are expected to be present on the receiving end
+	and must be included in the pack for that pack to be
+	self-contained and indexable. Without this option, any attempt to
+	index a thin pack will fail. This option only makes sense in
+	conjunction with --stdin.
+
+
 Author
 ------
 Written by Sergey Vlasov <vsu@altlinux.ru>
diff --git a/index-pack.c b/index-pack.c
index cecdd26..9086bbf 100644
--- a/index-pack.c
+++ b/index-pack.c
@@ -8,7 +8,7 @@
 #include "tree.h"
 
 static const char index_pack_usage[] =
-"git-index-pack [-o <index-file>] { <pack-file> | --stdin [<pack-file>] }";
+"git-index-pack [-o <index-file>] { <pack-file> | --stdin [--fix-thin] [<pack-file>] }";
 
 struct object_entry
 {
@@ -33,14 +33,15 @@ union delta_base {
 
 struct delta_entry
 {
-	struct object_entry *obj;
 	union delta_base base;
+	int obj_no;
 };
 
 static struct object_entry *objects;
 static struct delta_entry *deltas;
 static int nr_objects;
 static int nr_deltas;
+static int nr_resolved_deltas;
 
 static int from_stdin;
 
@@ -50,6 +51,18 @@ static unsigned long input_offset, input
 static SHA_CTX input_ctx;
 static int input_fd, output_fd, mmap_fd;
 
+/* Discard current buffer used content. */
+static void flush()
+{
+	if (input_offset) {
+		if (output_fd >= 0)
+			write_or_die(output_fd, input_buffer, input_offset);
+		SHA1_Update(&input_ctx, input_buffer, input_offset);
+		memcpy(input_buffer, input_buffer + input_offset, input_len);
+		input_offset = 0;
+	}
+}
+
 /*
  * Make sure at least "min" bytes are available in the buffer, and
  * return the pointer to the buffer.
@@ -60,13 +73,7 @@ static void * fill(int min)
 		return input_buffer + input_offset;
 	if (min > sizeof(input_buffer))
 		die("cannot fill %d bytes", min);
-	if (input_offset) {
-		if (output_fd >= 0)
-			write_or_die(output_fd, input_buffer, input_offset);
-		SHA1_Update(&input_ctx, input_buffer, input_offset);
-		memcpy(input_buffer, input_buffer + input_offset, input_len);
-		input_offset = 0;
-	}
+	flush();
 	do {
 		int ret = xread(input_fd, input_buffer + input_len,
 				sizeof(input_buffer) - input_len);
@@ -323,10 +330,9 @@ static void sha1_object(const void *data
 	SHA1_Final(sha1, &ctx);
 }
 
-static void resolve_delta(struct delta_entry *delta, void *base_data,
+static void resolve_delta(struct object_entry *delta_obj, void *base_data,
 			  unsigned long base_size, enum object_type type)
 {
-	struct object_entry *obj = delta->obj;
 	void *delta_data;
 	unsigned long delta_size;
 	void *result;
@@ -334,29 +340,34 @@ static void resolve_delta(struct delta_e
 	union delta_base delta_base;
 	int j, first, last;
 
-	obj->real_type = type;
-	delta_data = get_data_from_pack(obj);
-	delta_size = obj->size;
+	delta_obj->real_type = type;
+	delta_data = get_data_from_pack(delta_obj);
+	delta_size = delta_obj->size;
 	result = patch_delta(base_data, base_size, delta_data, delta_size,
 			     &result_size);
 	free(delta_data);
 	if (!result)
-		bad_object(obj->offset, "failed to apply delta");
-	sha1_object(result, result_size, type, obj->sha1);
+		bad_object(delta_obj->offset, "failed to apply delta");
+	sha1_object(result, result_size, type, delta_obj->sha1);
+	nr_resolved_deltas++;
 
-	hashcpy(delta_base.sha1, obj->sha1);
+	hashcpy(delta_base.sha1, delta_obj->sha1);
 	if (!find_delta_childs(&delta_base, &first, &last)) {
-		for (j = first; j <= last; j++)
-			if (deltas[j].obj->type == OBJ_REF_DELTA)
-				resolve_delta(&deltas[j], result, result_size, type);
+		for (j = first; j <= last; j++) {
+			struct object_entry *child = objects + deltas[j].obj_no;
+			if (child->real_type == OBJ_REF_DELTA)
+				resolve_delta(child, result, result_size, type);
+		}
 	}
 
 	memset(&delta_base, 0, sizeof(delta_base));
-	delta_base.offset = obj->offset;
+	delta_base.offset = delta_obj->offset;
 	if (!find_delta_childs(&delta_base, &first, &last)) {
-		for (j = first; j <= last; j++)
-			if (deltas[j].obj->type == OBJ_OFS_DELTA)
-				resolve_delta(&deltas[j], result, result_size, type);
+		for (j = first; j <= last; j++) {
+			struct object_entry *child = objects + deltas[j].obj_no;
+			if (child->real_type == OBJ_OFS_DELTA)
+				resolve_delta(child, result, result_size, type);
+		}
 	}
 
 	free(result);
@@ -389,7 +400,7 @@ static void parse_pack_objects(unsigned
 		obj->real_type = obj->type;
 		if (obj->type == OBJ_REF_DELTA || obj->type == OBJ_OFS_DELTA) {
 			nr_deltas++;
-			delta->obj = obj;
+			delta->obj_no = i;
 			delta++;
 		} else
 			sha1_object(data, obj->size, obj->type, obj->sha1);
@@ -398,18 +409,15 @@ static void parse_pack_objects(unsigned
 	objects[i].offset = consumed_bytes;
 
 	/* Check pack integrity */
-	SHA1_Update(&input_ctx, input_buffer, input_offset);
+	flush();
 	SHA1_Final(sha1, &input_ctx);
 	if (hashcmp(fill(20), sha1))
 		die("pack is corrupted (SHA1 mismatch)");
-	use(20);
-	if (output_fd >= 0)
-		write_or_die(output_fd, input_buffer, input_offset);
 
 	/* If input_fd is a file, we should have reached its end now. */
 	if (fstat(input_fd, &st))
 		die("cannot fstat packfile: %s", strerror(errno));
-	if (S_ISREG(st.st_mode) && st.st_size != consumed_bytes)
+	if (S_ISREG(st.st_mode) && st.st_size != consumed_bytes + 20)
 		die("pack has junk at the end");
 
 	/* Sort deltas by base SHA1/offset for fast searching */
@@ -440,24 +448,161 @@ static void parse_pack_objects(unsigned
 			continue;
 		data = get_data_from_pack(obj);
 		if (ref)
-			for (j = ref_first; j <= ref_last; j++)
-				if (deltas[j].obj->type == OBJ_REF_DELTA)
-					resolve_delta(&deltas[j], data,
+			for (j = ref_first; j <= ref_last; j++) {
+				struct object_entry *child = objects + deltas[j].obj_no;
+				if (child->real_type == OBJ_REF_DELTA)
+					resolve_delta(child, data,
 						      obj->size, obj->type);
+			}
 		if (ofs)
-			for (j = ofs_first; j <= ofs_last; j++)
-				if (deltas[j].obj->type == OBJ_OFS_DELTA)
-					resolve_delta(&deltas[j], data,
+			for (j = ofs_first; j <= ofs_last; j++) {
+				struct object_entry *child = objects + deltas[j].obj_no;
+				if (child->real_type == OBJ_OFS_DELTA)
+					resolve_delta(child, data,
 						      obj->size, obj->type);
+			}
 		free(data);
 	}
+}
+
+static int write_compressed(int fd, void *in, unsigned int size)
+{
+	z_stream stream;
+	unsigned long maxsize;
+	void *out;
+
+	memset(&stream, 0, sizeof(stream));
+	deflateInit(&stream, zlib_compression_level);
+	maxsize = deflateBound(&stream, size);
+	out = xmalloc(maxsize);
+
+	/* Compress it */
+	stream.next_in = in;
+	stream.avail_in = size;
+	stream.next_out = out;
+	stream.avail_out = maxsize;
+	while (deflate(&stream, Z_FINISH) == Z_OK);
+	deflateEnd(&stream);
+
+	size = stream.total_out;
+	write_or_die(fd, out, size);
+	free(out);
+	return size;
+}
+
+static void append_obj_to_pack(void *buf,
+			       unsigned long size, enum object_type type)
+{
+	struct object_entry *obj = &objects[nr_objects++];
+	unsigned char header[10];
+	unsigned long s = size;
+	int n = 0;
+	unsigned char c = (type << 4) | (s & 15);
+	s >>= 4;
+	while (s) {
+		header[n++] = c | 0x80;
+		c = s & 0x7f;
+		s >>= 7;
+	}
+	header[n++] = c;
+	write_or_die(output_fd, header, n);
+	obj[1].offset = obj[0].offset + n;
+	obj[1].offset += write_compressed(output_fd, buf, size);
+	sha1_object(buf, size, type, obj->sha1);
+}
+
+static int delta_pos_compare(const void *_a, const void *_b)
+{
+	struct delta_entry *a = *(struct delta_entry **)_a;
+	struct delta_entry *b = *(struct delta_entry **)_b;
+	return a->obj_no - b->obj_no;
+}
 
-	/* Check for unresolved deltas */
+static void fix_unresolved_deltas(int nr_unresolved)
+{
+	struct delta_entry **sorted_by_pos;
+	int i, n = 0;
+
+	/*
+	 * Since many unresolved deltas may well be themselves base objects
+	 * for more unresolved deltas, we really want to include the
+	 * smallest number of base objects that would cover as much delta
+	 * as possible by picking the
+	 * trunc deltas first, allowing for other deltas to resolve without
+	 * additional base objects.  Since most base objects are to be found
+	 * before deltas depending on them, a good heuristic is to start
+	 * resolving deltas in the same order as their position in the pack.
+	 */
+	sorted_by_pos = xmalloc(nr_unresolved * sizeof(*sorted_by_pos));
 	for (i = 0; i < nr_deltas; i++) {
-		if (deltas[i].obj->real_type == OBJ_REF_DELTA ||
-		    deltas[i].obj->real_type == OBJ_OFS_DELTA)
-			die("pack has unresolved deltas");
+		if (objects[deltas[i].obj_no].real_type != OBJ_REF_DELTA)
+			continue;
+		sorted_by_pos[n++] = &deltas[i];
 	}
+	qsort(sorted_by_pos, n, sizeof(*sorted_by_pos), delta_pos_compare);
+
+	for (i = 0; i < n; i++) {
+		struct delta_entry *d = sorted_by_pos[i];
+		void *data;
+		unsigned long size;
+		char type[10];
+		enum object_type obj_type;
+		int j, first, last;
+
+		if (objects[d->obj_no].real_type != OBJ_REF_DELTA)
+			continue;
+		data = read_sha1_file(d->base.sha1, type, &size);
+		if (!data)
+			continue;
+		if      (!strcmp(type, blob_type))   obj_type = OBJ_BLOB;
+		else if (!strcmp(type, tree_type))   obj_type = OBJ_TREE;
+		else if (!strcmp(type, commit_type)) obj_type = OBJ_COMMIT;
+		else if (!strcmp(type, tag_type))    obj_type = OBJ_TAG;
+		else die("base object %s is of type '%s'",
+			 sha1_to_hex(d->base.sha1), type);
+
+		find_delta_childs(&d->base, &first, &last);
+		for (j = first; j <= last; j++) {
+			struct object_entry *child = objects + deltas[j].obj_no;
+			if (child->real_type == OBJ_REF_DELTA)
+				resolve_delta(child, data, size, obj_type);
+		}
+
+		append_obj_to_pack(data, size, obj_type);
+		free(data);
+	}
+	free(sorted_by_pos);
+}
+
+static void readjust_pack_header_and_sha1(unsigned char *sha1)
+{
+	struct pack_header hdr;
+	SHA_CTX ctx;
+	int size;
+
+	/* Rewrite pack header with updated object number */
+	if (lseek(output_fd, 0, SEEK_SET) != 0)
+		die("cannot seek back: %s", strerror(errno));
+	if (xread(output_fd, &hdr, sizeof(hdr)) != sizeof(hdr))
+		die("cannot read pack header back: %s", strerror(errno));
+	hdr.hdr_entries = htonl(nr_objects);
+	if (lseek(output_fd, 0, SEEK_SET) != 0)
+		die("cannot seek back: %s", strerror(errno));
+	write_or_die(output_fd, &hdr, sizeof(hdr));
+	if (lseek(output_fd, 0, SEEK_SET) != 0)
+		die("cannot seek back: %s", strerror(errno));
+
+	/* Recompute and store the new pack's SHA1 */
+	SHA1_Init(&ctx);
+	do {
+		unsigned char buf[4096];
+		size = xread(output_fd, buf, sizeof(buf));
+		if (size < 0)
+			die("cannot read pack data back: %s", strerror(errno));
+		SHA1_Update(&ctx, buf, size);
+	} while (size > 0);
+	SHA1_Final(sha1, &ctx);
+	write_or_die(output_fd, sha1, 20);
 }
 
 static int sha1_compare(const void *_a, const void *_b)
@@ -588,7 +733,7 @@ static void final(const char *final_pack
 
 int main(int argc, char **argv)
 {
-	int i;
+	int i, fix_thin_pack = 0;
 	const char *curr_pack, *pack_name = NULL;
 	const char *curr_index, *index_name = NULL;
 	char *index_name_buf = NULL;
@@ -600,6 +745,8 @@ int main(int argc, char **argv)
 		if (*arg == '-') {
 			if (!strcmp(arg, "--stdin")) {
 				from_stdin = 1;
+			} else if (!strcmp(arg, "--fix-thin")) {
+				fix_thin_pack = 1;
 			} else if (!strcmp(arg, "-o")) {
 				if (index_name || (i+1) >= argc)
 					usage(index_pack_usage);
@@ -616,6 +763,8 @@ int main(int argc, char **argv)
 
 	if (!pack_name && !from_stdin)
 		usage(index_pack_usage);
+	if (fix_thin_pack && !from_stdin)
+		die("--fix-thin cannot be used without --stdin");
 	if (!index_name && pack_name) {
 		int len = strlen(pack_name);
 		if (!has_extension(pack_name, ".pack"))
@@ -629,9 +778,28 @@ int main(int argc, char **argv)
 
 	curr_pack = open_pack_file(pack_name);
 	parse_pack_header();
-	objects = xcalloc(nr_objects + 1, sizeof(struct object_entry));
-	deltas = xcalloc(nr_objects, sizeof(struct delta_entry));
+	objects = xmalloc((nr_objects + 1) * sizeof(struct object_entry));
+	deltas = xmalloc(nr_objects * sizeof(struct delta_entry));
 	parse_pack_objects(sha1);
+	if (nr_deltas != nr_resolved_deltas) {
+		if (fix_thin_pack) {
+			int nr_unresolved = nr_deltas - nr_resolved_deltas;
+			if (nr_unresolved <= 0)
+				die("confusion beyond insanity");
+			objects = xrealloc(objects,
+					   (nr_objects + nr_unresolved + 1)
+					   * sizeof(*objects));
+			fix_unresolved_deltas(nr_unresolved);
+			readjust_pack_header_and_sha1(sha1);
+		}
+		if (nr_deltas != nr_resolved_deltas)
+			die("pack has %d unresolved deltas",
+			    nr_deltas - nr_resolved_deltas);
+	} else {
+		/* Flush remaining pack final 20-byte SHA1. */
+		use(20);
+		flush();
+	}
 	free(deltas);
 	curr_index = write_index_file(index_name, sha1);
 	final(pack_name, curr_pack, index_name, curr_index, sha1);
-- 
1.4.3.3.g10cf-dirty
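append_obj_to_pack() writes each object header as a small variable-length integer. A minimal sketch of that encoding plus a matching decoder, useful for checking the bit layout by hand; both function names are hypothetical, and the encoder simply restates the loop from the patch.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Encode a pack object header as append_obj_to_pack() does: the first
 * byte holds the 3-bit type in bits 4-6 and the low 4 bits of the size;
 * each following byte carries 7 more size bits. Bit 7 (0x80) of a byte
 * means "another byte follows". Returns the number of bytes written
 * (at most 10 for a 64-bit size).
 */
static int encode_obj_header(unsigned char *header, unsigned type,
			     unsigned long size)
{
	int n = 0;
	unsigned char c = (unsigned char)((type << 4) | (size & 15));

	assert(type <= 7);	/* the type field is 3 bits wide */
	size >>= 4;
	while (size) {
		header[n++] = c | 0x80;	/* more bytes follow */
		c = size & 0x7f;
		size >>= 7;
	}
	header[n++] = c;
	return n;
}

/* Inverse of encode_obj_header(); returns the number of bytes consumed. */
static int decode_obj_header(const unsigned char *header, unsigned *type,
			     unsigned long *size)
{
	int n = 0, shift = 4;
	unsigned char c = header[n++];

	*type = (c >> 4) & 7;
	*size = c & 15;
	while (c & 0x80) {
		c = header[n++];
		*size |= (unsigned long)(c & 0x7f) << shift;
		shift += 7;
	}
	return n;
}
```

For example, with type 3 (OBJ_BLOB in git's numbering) and size 100, the low four size bits (4) share the first byte with the type, giving 0xB4 once the continuation bit is set, and the remaining bits (100 >> 4 = 6) go into a second byte, 0x06.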


* Re: Combined diff format documentation
From: Junio C Hamano @ 2006-10-26  3:04 UTC
  To: Horst H. von Brand; +Cc: git
In-Reply-To: <200610260148.k9Q1mr99007511@laptop13.inf.utfsm.cl>

"Horst H. von Brand" <vonbrand@inf.utfsm.cl> writes:

>> Correct.  This was done to prevent people from accidentally
>> feeding it to "patch -p1".  In other words, we wanted to make it
>> so obvious that it is _not_ a patch.
>
> It isn't, really... perhaps it should be made /more/ obvious (not use @ but
> e.g. &, ...)?

Eh, sorry, what I meant was "obvious to the tool", so "patch"
would take notice.
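The point here is that the extra '@' makes the header unrecognizable to the tool, not just to humans. A deliberately simplified model of that check, assuming a consumer that treats only lines starting with "@@ -" as unified hunk headers; real patch(1) heuristics are more involved, and `looks_like_unified_hunk` is a hypothetical name.

```c
#include <assert.h>
#include <string.h>

/*
 * Simplified model of how a unified-diff consumer might spot a hunk
 * header: the line must start with exactly "@@ -". A combined-diff
 * header starts with three or more '@' characters, so its third
 * character is '@' rather than ' ' and this check rejects it.
 */
static int looks_like_unified_hunk(const char *line)
{
	assert(line != NULL);
	return strncmp(line, "@@ -", 4) == 0;
}
```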


* [PATCH] Documentation: updates to "Everyday GIT"
From: J. Bruce Fields @ 2006-10-26  2:43 UTC
  To: Junio C Hamano; +Cc: git

Remove the introduction: I think it should be obvious why
we have this.  (And if it isn't obvious then we've got other
problems.)

Replace references to git whatchanged with git log.

Miscellaneous style and grammar fixes.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
---
 Documentation/everyday.txt |   43 ++++++++++++-------------------------------
 1 files changed, 12 insertions(+), 31 deletions(-)

diff --git a/Documentation/everyday.txt b/Documentation/everyday.txt
index b935c18..99e24a4 100644
--- a/Documentation/everyday.txt
+++ b/Documentation/everyday.txt
@@ -1,22 +1,7 @@
 Everyday GIT With 20 Commands Or So
 ===================================
 
-GIT suite has over 100 commands, and the manual page for each of
-them discusses what the command does and how it is used in
-detail, but until you know what command should be used in order
-to achieve what you want to do, you cannot tell which manual
-page to look at, and if you know that already you do not need
-the manual.
-
-Does that mean you need to know all of them before you can use
-git?  Not at all.  Depending on the role you play, the set of
-commands you need to know is slightly different, but in any case
-what you need to learn is far smaller than the full set of
-commands to carry out your day-to-day work.  This document is to
-serve as a cheat-sheet and a set of pointers for people playing
-various roles.
-
-<<Basic Repository>> commands are needed by people who has a
+<<Basic Repository>> commands are needed by people who have a
 repository --- that is everybody, because every working tree of
 git is a repository.
 
@@ -25,28 +10,27 @@ essential for anybody who makes a commit
 works alone.
 
 If you work with other people, you will need commands listed in
-<<Individual Developer (Participant)>> section as well.
+the <<Individual Developer (Participant)>> section as well.
 
-People who play <<Integrator>> role need to learn some more
+People who play the <<Integrator>> role need to learn some more
 commands in addition to the above.
 
 <<Repository Administration>> commands are for system
-administrators who are responsible to care and feed git
-repositories to support developers.
+administrators who are responsible for the care and feeding
+of git repositories.
 
 
 Basic Repository[[Basic Repository]]
 ------------------------------------
 
-Everybody uses these commands to feed and care git repositories.
+Everybody uses these commands to maintain git repositories.
 
   * gitlink:git-init-db[1] or gitlink:git-clone[1] to create a
     new repository.
 
-  * gitlink:git-fsck-objects[1] to validate the repository.
+  * gitlink:git-fsck-objects[1] to check the repository for errors.
 
-  * gitlink:git-prune[1] to garbage collect cruft in the
-    repository.
+  * gitlink:git-prune[1] to remove unused objects in the repository.
 
   * gitlink:git-repack[1] to pack loose objects for efficiency.
 
@@ -78,8 +62,8 @@ Repack a small project into single pack.
 $ git prune
 ------------
 +
-<1> pack all the objects reachable from the refs into one pack
-and remove unneeded other packs
+<1> pack all the objects reachable from the refs into one pack,
+then remove the other packs.
 
 
 Individual Developer (Standalone)[[Individual Developer (Standalone)]]
@@ -93,9 +77,6 @@ following commands.
 
   * gitlink:git-log[1] to see what happened.
 
-  * gitlink:git-whatchanged[1] to find out where things have
-    come from.
-
   * gitlink:git-checkout[1] and gitlink:git-branch[1] to switch
     branches.
 
@@ -120,7 +101,7 @@ following commands.
 Examples
 ~~~~~~~~
 
-Extract a tarball and create a working tree and a new repository to keep track of it.::
+Use a tarball as a starting point for a new repository.::
 +
 ------------
 $ tar zxf frotz.tar.gz
@@ -203,7 +184,7 @@ Clone the upstream and work on it.  Feed
 $ edit/compile/test; git commit -a -s <1>
 $ git format-patch origin <2>
 $ git pull <3>
-$ git whatchanged -p ORIG_HEAD.. arch/i386 include/asm-i386 <4>
+$ git log -p ORIG_HEAD.. arch/i386 include/asm-i386 <4>
 $ git pull git://git.kernel.org/pub/.../jgarzik/libata-dev.git ALL <5>
 $ git reset --hard ORIG_HEAD <6>
 $ git prune <7>
-- 
1.4.3.2

^ permalink raw reply related

* Re: VCS comparison table
From: Linus Torvalds @ 2006-10-26  2:29 UTC (permalink / raw)
  To: David Rientjes; +Cc: Lachlan Patrick, bazaar-ng, git
In-Reply-To: <Pine.LNX.4.64N.0610232336010.30334@attu2.cs.washington.edu>



On Mon, 23 Oct 2006, David Rientjes wrote:
> 
> Some of the internal commands that have been coded in C are actually much 
> better handled by the shell in the first place.

Others have answered this, but the thing is, it was a _wonderful_ way to 
prototype things, and to add obvious (and nice) early UI issues that made 
git much more usable.

But no, things are not better handled in shell.

Shell tends to make some things really _hard_ to do. A fair chunk of the 
rewrite was because core functionality made things easier. For example, 
the whole internal revision parsing library is really actually a lot more 
capable than we could easily expose as a simple pipeline: the original 
"git log" pipeline worked very well, and you can actually still use those 
kinds of pipelines for a lot of work, but at the same time, some things 
really just work better when you have "deeper" interfaces.

For example, the revision parsing library not only makes "git log" trivial 
as C, it's also needed for an efficient "git annotate/blame/pickaxe" kind 
of thing. There are also things that are just ludicrously hard to do in 
shell-script, like exclusive and atomic file operations.

We used perl and python for some things, but finding people who know them 
tends to be problematic, and python in particular was also a dependency 
problem too, so the fact that the default recursive merge was python 
wasn't wonderful.

So I think the shell-scripts are great (and some of them quite likely will 
remain around for the foreseeable future) for prototyping, but for core 
functionality they were not wonderful. 

They are sometimes good examples of how powerful a scripting language git 
can be, though. Scripting is still very important, even though a lot of 
the core stuff doesn't necessarily depend on being scripts itself. 

But error handling in scripting is very hard or inconvenient, especially 
in pipelines. So some things were actively problematic (i.e. "git-rev-list 
--all --objects | git-pack-objects") and moving it to use the internal 
library interface was simply technically the right thing to do.
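
To make the pipeline point concrete: a POSIX shell pipeline such as "false | cat" reports only the last command's exit status, so a failing producer can go unnoticed. Below is a minimal POSIX C sketch (not git's code) that runs "producer | consumer" and checks both children. The name `run_pipeline` is hypothetical, and the usage below uses `true`, `false` and `cat` as stand-ins for `git-rev-list` and `git-pack-objects`.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Wait for one child; true only if it ran and exited with status 0. */
static int exited_ok(pid_t pid)
{
    int status;

    if (pid < 0 || waitpid(pid, &status, 0) < 0)
        return 0;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}

/*
 * Hypothetical helper: run "producer | consumer" and return 0 only if
 * BOTH processes exit successfully, -1 otherwise.
 */
int run_pipeline(char *const producer[], char *const consumer[])
{
    int fd[2], ok;
    pid_t p, c;

    if (pipe(fd) < 0)
        return -1;

    p = fork();
    if (p == 0) {                /* producer: stdout goes into the pipe */
        dup2(fd[1], STDOUT_FILENO);
        close(fd[0]);
        close(fd[1]);
        execvp(producer[0], producer);
        _exit(127);              /* exec failed */
    }

    c = fork();
    if (c == 0) {                /* consumer: stdin comes from the pipe */
        dup2(fd[0], STDIN_FILENO);
        close(fd[0]);
        close(fd[1]);
        execvp(consumer[0], consumer);
        _exit(127);
    }

    /* Drop the parent's copies; without closing the write end the
     * consumer would never see EOF. */
    close(fd[0]);
    close(fd[1]);

    ok = exited_ok(p);
    ok = exited_ok(c) && ok;     /* reap both children even if one failed */
    return ok ? 0 : -1;
}
```

With `true | cat` it returns 0. With `false | cat` it returns -1, even though a plain POSIX shell would report `cat`'s status of 0 for that pipeline.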

Others had real performance issues, eg the new merge in C is a lot faster. 
It was fast before, it's much faster still.


^ permalink raw reply

* Re: Combined diff format documentation
From: Horst H. von Brand @ 2006-10-26  1:48 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jakub Narebski, git
In-Reply-To: <7vejswkoi4.fsf@assigned-by-dhcp.cox.net>

Junio C Hamano <junkio@cox.net> wrote:
> Jakub Narebski <jnareb@gmail.com> writes:

[...]

> > 5. Hunk header is also modified: in ordinary diff we have
> > ...
> >    It might not be obvious that we have (number of parents + 1) '@'
> >    characters in the chunk header for the combined diff format.

> Correct.  This was done to prevent people from accidentally
> feeding it to "patch -p1".  In other words, we wanted to make it
> so obvious that it is _not_ a patch.

It isn't, really... perhaps it should be made /more/ obvious (not use @ but
e.g. &, ...)?
-- 
Dr. Horst H. von Brand                   User #22616 counter.li.org
Departamento de Informatica                    Fono: +56 32 2654431
Universidad Tecnica Federico Santa Maria             +56 32 2654239

^ permalink raw reply

* Re: VCS comparison table
From: Horst H. von Brand @ 2006-10-26  1:06 UTC (permalink / raw)
  To: Petr Baudis; +Cc: Jakub Narebski, git
In-Reply-To: <20061025224428.GN20017@pasky.or.cz>

Petr Baudis <pasky@suse.cz> wrote:
> Dear diary, on Thu, Oct 26, 2006 at 12:29:17AM CEST, I got a letter
> where Jakub Narebski <jnareb@gmail.com> said that...
> > Cute names are taken: CoGITo, gitk, qgit (GTK+ history viewer is gitview,
> > not ggit, curiously ;-) and tig.
> 
> wit?

Wig. 
-- 
Dr. Horst H. von Brand                   User #22616 counter.li.org
Departamento de Informatica                    Fono: +56 32 2654431
Universidad Tecnica Federico Santa Maria             +56 32 2654239

^ permalink raw reply

* Re: VCS comparison table
From: Matthew D. Fuller @ 2006-10-25 23:53 UTC (permalink / raw)
  To: David Lang; +Cc: bazaar-ng, git
In-Reply-To: <Pine.LNX.4.63.0610251459160.1754@qynat.qvtvafvgr.pbz>

On Wed, Oct 25, 2006 at 03:40:00PM -0700 I heard the voice of
David Lang, and lo! it spake thus:
> 
> I think we are talking past each other here.
> 
> what I think was said was
> 
> G 'one feature of git is that you can view arbitrary slices
> trivially'
> 
> B 'bzr can do this too, you just use branches to define the slices'

Ah.  This is more like "bzr [mostly] only does this now in terms of a
single branch (or some point back along it)".  The slices that go
between branches are very limited ('missing' gives you one view;
'branch:' and 'ancestor:' revision specifications give you another).
bzrk/'visualize' gives an interface similar to gitk, but also only in
the context of a single branch/head looking backward through its
previous tree AFAIK.  Any random DAG-slicing of what you have in the
revision store can be done, somebody would just have to write the code
for it.  Nothing about 'the workflow preserves parents' would make
that any harder than writing the code for git was.

Much of this is probably a result of the 'branch'-centric (rather than
'repository'-centric) view of the world; similarly to the fact that
branches are referred to by location (local ../otherbranch, or remote
http/sftp/etc) rather than by a name.  This is one of the bits of bzr
I'm personally somewhat ambivalent about.


> they now have three options

Those certainly aren't the only choices, but to stay OT:

> 3. pull from each other frequently to keep in sync.
> 
> this changes the topology to
> 
>    Master
>    /   \
>  dev1--dev2
> 
> if they do this with bzr then the revno's break, they each get extra
> commits showing up (so they can never show the same history).

These two are either/or, not and; either they pull (in which case
their old mainline is no longer meaningful), or they merge (in which
case they get the 'extra' merge commits).


> in git this is a non-issue, they can pull back and forth and the
> only new history to show up will be changes.

In git, this is a non-issue because you don't get to CHOOSE which way
to work.  You always (if you can) pull and obliterate your local
mainline.  In bzr, it's only an 'issue' because you CAN choose, and
CAN maintain your local mainline.  You CAN choose, right now, to work as
git does, pulling back and forth with only new history showing up, by
creating a 'bzr-pull' shell script that does a 'bzr pull || bzr merge'
(though you'd be a lot better off adding a '--fast-forward-if-you-can'
option to merge and aliasing that over).

More basically, though, I don't think that "histories become exactly
equivalent" is a necessary pass-word to enter the Hallowed City of
Truly Distributed Development.  And I certainly see no reason to
believe we'll agree on it this time any more than We (in broad) have
the last 6 times it came up in the thread.


-- 
Matthew Fuller     (MF4839)   |  fullermd@over-yonder.net
Systems/Network Administrator |  http://www.over-yonder.net/~fullermd/

^ permalink raw reply

* Re: Combined diff format documentation
From: Jakub Narebski @ 2006-10-25 23:45 UTC (permalink / raw)
  To: git
In-Reply-To: <7vejswkoi4.fsf@assigned-by-dhcp.cox.net>

Junio C Hamano wrote:

>>    BTW, the documentation does not mention that git diff uses a hunk section
>>    indicator, what regexp/expression it uses, or whether it is configurable.
> 
> If you mean by "hunk section indicator" the output similar to
> GNU diff -p option, I think it is not worth mentioning and we
> are not ready to mention it yet (we have not etched the
> expression in stone).  Nobody jumped up and down to say it needs
> to be configurable, so it is left undocumented more or less
> deliberately.

By the way, I have just checked that the combined diff format doesn't have
(for some unknown reason) the "which section" indicator in the chunk header.
Compare
$ git diff-tree -p -m fec9ebf16c948bcb4a8b88d0173ee63584bcde76
and
$ git diff-tree -p -c fec9ebf16c948bcb4a8b88d0173ee63584bcde76
(this is the source of the example combined diff in diff-format.txt,
which I found via
$ git rev-list --parents HEAD -- describe.c | grep " .* "
i.e. finding all merges which included changes to describe.c; there
are only two such commits).
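
The `grep " .* "` filter works because each line of `git rev-list --parents` output is a commit's SHA-1 followed by the SHA-1s of its parents, separated by single spaces; a line with at least two spaces therefore names a commit with two or more parents, i.e. a merge. A small C sketch of the same test follows. `parent_count` and `is_merge_line` are hypothetical helpers, not part of git.

```c
#include <stddef.h>

/*
 * Count the parents listed on one line of "git rev-list --parents" output:
 * "<commit> <parent1> <parent2> ...".  Each space separates the commit
 * (or a previous parent) from the next parent.
 */
size_t parent_count(const char *line)
{
    size_t spaces = 0;

    for (; *line != '\0' && *line != '\n'; line++)
        if (*line == ' ')
            spaces++;
    return spaces;
}

/* A merge has two or more parents, which is what grep " .* " checks. */
int is_merge_line(const char *line)
{
    return parent_count(line) >= 2;
}
```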
-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git


^ permalink raw reply

* Re: Combined diff format documentation
From: Junio C Hamano @ 2006-10-25 23:24 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git
In-Reply-To: <ehoq5r$8h5$1@sea.gmane.org>

Jakub Narebski <jnareb@gmail.com> writes:

> Well, the _documentation_ doesn't tell. I haven't fully grokked the code
> for generating and coloring combined diff output besides the fact that
> I think it uses the last indicator ('+' or '-') to choose the color for the
> rest of the line. You said that even if the possibility exists, it is
> extremely unlikely.

Well if I said that I must have been on booze ;-).

A '-' in the nth column means that the line is from the nth
parent and does _not_ appear in the merge result.  A '+' in the
nth column means that the line _appears_ in the merge result,
and the nth parent does not have that line (i.e. added by the
merge itself, or inherited from other parents).

Hence, by definition, you cannot have '-' and '+' on the same
line (otherwise the line has to exist and not exist in the merge
result at the same time).

A ' ' is a bit tricky to interpret.  A ' ' on a line _without_
any '-' means the line is the same as in that parent and the
merge result (i.e. the result inherited the line from that
parent).  A ' ' on a line that has '-' says nothing about the
merge result (because by definition '-' lines do not exist in
the merge result) nor the parent that has ' '; in other words,
it is a "don't care" bit.  In the example you quoted from the
commit log of af3feefa:

         - static void describe(char *arg)
          -static void describe(struct commit *cmit, int last_one)
         ++static void describe(char *arg, int last_one)
           {

The first parent had it as one-arg function, and the second one
two-arg but the first parameter was of type "struct commit *";
the merge result has it as two-arg with the first parameter of
type "char *".  The second parent does not know about the
one-arg form of the function so it has ' ' in its column for the
first line.

All versions start the function with an opening brace '{' so the
line has two ' ' prefixed, which is an example of ' ' on a line
without any '-'.
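
The rules above can be restated as a check on the per-parent prefix columns: a line is in the merge result exactly when no column holds '-', and '+' together with '-' on one line is impossible. The C sketch below encodes just that. `combined_line_in_result` is a hypothetical helper, and it assumes the caller already knows the number of parents (for example from the hunk header).

```c
#include <stddef.h>

/*
 * Classify one line of combined diff output whose first nparents
 * characters are the per-parent columns.
 * Returns 1 if the line is in the merge result, 0 if it is not, and
 * -1 if the prefix is malformed ('+' mixed with '-', or another char).
 */
int combined_line_in_result(const char *line, size_t nparents)
{
    int has_plus = 0, has_minus = 0;
    size_t i;

    for (i = 0; i < nparents; i++) {
        switch (line[i]) {
        case '+': has_plus = 1; break;  /* in result, not in this parent */
        case '-': has_minus = 1; break; /* in this parent, not in result */
        case ' ': break;                /* same as parent, or "don't care" */
        default:  return -1;            /* line too short or bad marker */
        }
    }
    if (has_plus && has_minus)
        return -1;  /* would have to exist and not exist at once */
    return !has_minus;
}
```

Applied to the describe() example with two parents, the first two lines give 0, while the "++" line and the "{" line give 1.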



^ permalink raw reply

* Re: VCS comparison table
From: Jakub Narebski @ 2006-10-25 23:15 UTC (permalink / raw)
  To: git
In-Reply-To: <20061025224428.GN20017@pasky.or.cz>

Petr Baudis wrote:

> Dear diary, on Thu, Oct 26, 2006 at 12:29:17AM CEST, I got a letter
> where Jakub Narebski <jnareb@gmail.com> said that...
>> Cute names are taken: CoGITo, gitk, qgit (GTK+ history viewer is gitview,
>> not ggit, curiously ;-) and tig.
> 
> wit?

Taken.

wit: a Python web interface to git maintained by Christian Meder.
Example site at http://www.grmso.net:8090/ . It uses PATH_INFO
much more than gitweb (which uses CGI parameters mostly, but also
supports multiple projects).

Well, it is not maintained, if http://www.absolutegiganten.org/wit/
is any indication:

  wit-0.0.4.tar.gz        08-Sep-2005

-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git


^ permalink raw reply

* Re: Combined diff format documentation
From: Junio C Hamano @ 2006-10-25 23:14 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git
In-Reply-To: <ehoq5r$8h5$1@sea.gmane.org>

Jakub Narebski <jnareb@gmail.com> writes:

> I was not sure about the output. All conclusions about combined diff output
> are from examples; I was planning to send a documentation patch once I'm
> sure that at least _most_ of what I've added is correct.
>
> Will do.

Thanks.

^ permalink raw reply

* Re: Combined diff format documentation
From: Jakub Narebski @ 2006-10-25 22:58 UTC (permalink / raw)
  To: git
In-Reply-To: <7vejswkoi4.fsf@assigned-by-dhcp.cox.net>

Junio C Hamano wrote:
> Jakub Narebski <jnareb@gmail.com> writes:
>
>> 6. Documentation/diff-format.txt explains combined and condensed combined
>>    format quite well, although it doesn't tell us if we can have plusses and
>>    minuses together in one line...
> 
> But you already know the answer to that question, since you
> asked me a few days ago ;-).

Yes, in "[RFC] Syntax highlighting for combined diff" thread
http://permalink.gmane.org/gmane.comp.version-control.git/29566

Well, the _documentation_ doesn't tell. I haven't fully grokked the code
for generating and coloring combined diff output besides the fact that
I think it uses the last indicator ('+' or '-') to choose the color for the
rest of the line. You said that even if the possibility exists, it is
extremely unlikely.

> Patches to documentation would be easier to comment on and more
> productive, I guess.

I was not sure about the output. All conclusions about combined diff output
are from examples; I was planning to send a documentation patch once I'm
sure that at least _most_ of what I've added is correct.

Will do.
-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git


^ permalink raw reply

* Re: [PATCH] Fix bad usage of mkpath in builtin-branch.sh
From: Junio C Hamano @ 2006-10-25 22:51 UTC (permalink / raw)
  To: Petr Baudis; +Cc: git
In-Reply-To: <20061025224313.GM20017@pasky.or.cz>

Petr Baudis <pasky@suse.cz> writes:

>> +
>> +test "$ret" = 0 && git-diff-tree --summary --root --no-commit-id HEAD
>> +
>>  exit "$ret"
>
> Yes, this might be a good idea, although after the commit is perhaps too
> late.

Before the commit, I thought we had git-status output in the
commit log buffer.

Ah,...

We had that old issue of "'M foo' cannot tell content changes from mode
changes (or both)", and people suggested "M+" and such, which
were rejected because Porcelains and people's scripts depended
deeply on "diff --name-status" output being stable.
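
Where a script does need to tell the two apart, the raw format carries both pieces of information: each record from `git diff-tree -r` between two commits has the form ":<old mode> <new mode> <old sha1> <new sha1> <status>\t<path>". A minimal C sketch, assuming that record layout. `raw_change_kind` and the `CHANGED_*` flags are hypothetical names, and the sketch does not handle the all-zero SHA-1 that some raw outputs use for working-tree files.

```c
#include <stdio.h>
#include <string.h>

/* Bit flags describing what changed in one raw diff record. */
enum { CHANGED_MODE = 1, CHANGED_CONTENT = 2 };

/*
 * Parse a raw record such as
 *   ":100644 100755 <old-sha1> <new-sha1> M\tfoo"
 * Returns a mask of CHANGED_* bits, or -1 if the line does not parse.
 */
int raw_change_kind(const char *line)
{
    unsigned int old_mode, new_mode;
    char old_sha[41], new_sha[41];
    int mask = 0;

    if (sscanf(line, ":%o %o %40s %40s",
               &old_mode, &new_mode, old_sha, new_sha) != 4)
        return -1;
    if (old_mode != new_mode)
        mask |= CHANGED_MODE;       /* e.g. 100644 -> 100755 */
    if (strcmp(old_sha, new_sha) != 0)
        mask |= CHANGED_CONTENT;    /* blob object differs */
    return mask;
}
```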

^ permalink raw reply

* rebasing, git-am and whitespace cleanup conundrums
From: Martin Langhoff @ 2006-10-25 22:49 UTC (permalink / raw)
  To: Git Mailing List

Hola!

I'm going through a rebase of a long series of patches (~50) that have
quite a bit of trailing whitespace. Once the rebase is done on my repo
(resolving all the code level conflicts), I want to re-rebase it on
top of the same commit but cleaning up trailing whitespace.

The problem is that upstream has 'some' trailing whitespace. The
policy is to reject "new" trailing whitespace and cleanup in stages to
avoid big merge conflicts and stuff.

Now, once the early patches are "in" with --whitespace=trim, the
following patches don't apply anymore :-/ and I can't just trim
whitespace in the whole patch series automatically.

Is there any way to get git-apply to ignore whitespace differences
when applying like GNU patch -l?

When -3 is passed, it must be calling the merge utility from
merge_file() -- which doesn't do 'ignore-whitespace'. But if it's a
straight patch application, it should be possible to do something like
that... maybe?
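
For reference, GNU patch documents -l (--ignore-whitespace) as letting any sequence of blanks in the patch match any sequence of blanks in the original file, with blanks at line ends ignored. A C sketch of such a loose line comparison follows. It is modelled on that description only, not on GNU patch's or git-apply's code, and `lines_match_loosely` is a hypothetical name. As a design choice, it requires a mid-line blank run to face another blank run, so "foobar" and "foo bar" still differ.

```c
#include <stdbool.h>

static bool is_blank(char c)
{
    return c == ' ' || c == '\t';
}

/*
 * Compare two lines loosely: any run of blanks (spaces/tabs) matches any
 * other run of blanks, and trailing blanks are ignored.  All other
 * characters must match exactly.
 */
bool lines_match_loosely(const char *a, const char *b)
{
    for (;;) {
        if (is_blank(*a) || is_blank(*b)) {
            bool ra = is_blank(*a), rb = is_blank(*b);

            while (is_blank(*a))
                a++;
            while (is_blank(*b))
                b++;
            if (*a == '\0' && *b == '\0')
                return true;        /* only trailing blanks differed */
            if (ra != rb)
                return false;       /* blank run faces a non-blank */
            continue;
        }
        if (*a != *b)
            return false;
        if (*a == '\0')
            return true;            /* both lines ended together */
        a++;
        b++;
    }
}
```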

cheers,



^ permalink raw reply

* [ANNOUNCE] GIT 1.4.3.3
From: Junio C Hamano @ 2006-10-25 22:45 UTC (permalink / raw)
  To: git; +Cc: linux-kernel

The latest maintenance release GIT 1.4.3.3 is available at the
usual places:

  http://www.kernel.org/pub/software/scm/git/

  git-1.4.3.3.tar.{gz,bz2}			(tarball)
  git-htmldocs-1.4.3.3.tar.{gz,bz2}		(preformatted docs)
  git-manpages-1.4.3.3.tar.{gz,bz2}		(preformatted docs)
  RPMS/$arch/git-*-1.4.3.3-1.$arch.rpm	(RPM)

Sorry to be doing three follow-up releases in a row.  This is
primarily to fix the partitioning of programs in the generated RPMs.  If
you are installing all of git it does not matter, but by mistake
we were placing git-archive into the git-arch subpackage, which
meant that you needed to install tla only to use git-tar-tree and
git-archive --format=zip.

Thanks to Gerrit for noticing and reporting it, although he is
from Debian camp ;-).

----------------------------------------------------------------

Changes since v1.4.3.2 are as follows:

Eric Wong (1):
      git-svn: fix symlink-to-file changes when using command-line svn 1.4.0

Gerrit Pape (1):
      Set $HOME for selftests

Junio C Hamano (5):
      Documentation: note about contrib/.
      RPM package re-classification.
      Refer to git-rev-parse:Specifying Revisions from git.txt
      Update cherry documentation.
      Documentation/SubmittingPatches: 3+1 != 6

Petr Baudis (1):
      xdiff: Match GNU diff behaviour when deciding hunk comment worthiness of lines

Tuncer Ayaz (1):
      git-fetch.sh printed protocol fix


^ permalink raw reply

* [PATCH] git-svnimport: support for partial imports
From: Sasha Khapyorsky @ 2006-10-25 22:50 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Matthias Urlichs

This adds support for partial svn imports. Let's assume that the SVN
repository layout looks like:

  $trunk/path/to/our/project
  $branches/path/to/our/project
  $tags/path/to/our/project

and that we would like to import only the tree under this specific
'path/to/our/project', not the whole tree under $trunk, $branches, etc.
Now we will be able to do it by using the '-P path/to/our/project' option
with git-svnimport.

Signed-off-by: Sasha Khapyorsky <sashak@voltaire.com>
---
 git-svnimport.perl |   29 +++++++++++++++++++++++++----
 1 files changed, 25 insertions(+), 4 deletions(-)

diff --git a/git-svnimport.perl b/git-svnimport.perl
index f6eff8e..cbaa8ab 100755
--- a/git-svnimport.perl
+++ b/git-svnimport.perl
@@ -31,7 +31,7 @@ die "Need SVN:Core 1.2.1 or better" if $
 $ENV{'TZ'}="UTC";
 
 our($opt_h,$opt_o,$opt_v,$opt_u,$opt_C,$opt_i,$opt_m,$opt_M,$opt_t,$opt_T,
-    $opt_b,$opt_r,$opt_I,$opt_A,$opt_s,$opt_l,$opt_d,$opt_D,$opt_S,$opt_F);
+    $opt_b,$opt_r,$opt_I,$opt_A,$opt_s,$opt_l,$opt_d,$opt_D,$opt_S,$opt_F,$opt_P);
 
 sub usage() {
 	print STDERR <<END;
@@ -39,17 +39,19 @@ Usage: ${\basename $0}     # fetch/updat
        [-o branch-for-HEAD] [-h] [-v] [-l max_rev]
        [-C GIT_repository] [-t tagname] [-T trunkname] [-b branchname]
        [-d|-D] [-i] [-u] [-r] [-I ignorefilename] [-s start_chg]
-       [-m] [-M regex] [-A author_file] [-S] [-F] [SVN_URL]
+       [-m] [-M regex] [-A author_file] [-S] [-F] [-P project_name] [SVN_URL]
 END
 	exit(1);
 }
 
-getopts("A:b:C:dDFhiI:l:mM:o:rs:t:T:Suv") or usage();
+getopts("A:b:C:dDFhiI:l:mM:o:rs:t:T:SP:uv") or usage();
 usage if $opt_h;
 
 my $tag_name = $opt_t || "tags";
 my $trunk_name = $opt_T || "trunk";
 my $branch_name = $opt_b || "branches";
+my $project_name = $opt_P || "";
+$project_name = "/" . $project_name if ($project_name);
 
 @ARGV == 1 or @ARGV == 2 or usage();
 
@@ -427,6 +429,20 @@ sub get_ignore($$$$$) {
 	}
 }
 
+sub project_path($$)
+{
+	my ($path, $project) = @_;
+
+	$path = "/".$path unless ($path =~ m#^\/#) ;
+	return $1 if ($path =~ m#^$project\/(.*)$#);
+
+	$path =~ s#\.#\\\.#g;
+	$path =~ s#\+#\\\+#g;
+	return "/" if ($project =~ m#^$path.*$#);
+
+	return undef;
+}
+
 sub split_path($$) {
 	my($rev,$path) = @_;
 	my $branch;
@@ -446,7 +462,11 @@ sub split_path($$) {
 		print STDERR "$rev: Unrecognized path: $path\n" unless (defined $no_error{$path});
 		return ()
 	}
-	$path = "/" if $path eq "";
+	if ($path eq "") {
+		$path = "/";
+	} elsif ($project_name) {
+		$path = project_path($path, $project_name);
+	}
 	return ($branch,$path);
 }
 
@@ -898,6 +918,7 @@ sub commit_all {
 	while(my($path,$action) = each %$changed_paths) {
 		($branch,$path) = split_path($revision,$path);
 		next if not defined $branch;
+		next if not defined $path;
 		$done{$branch}{$path} = $action;
 	}
 	while(($branch,$changed_paths) = each %done) {
-- 
1.4.3.1.g9f9e
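
For readers less at home in Perl, here is a C sketch of what the new project_path() helper computes, using literal prefix comparisons in place of the regexes (so no escaping of '.' or '+' is needed). It is an illustrative translation, not part of the patch, and assumes `project` is "/" . $opt_P as built in the patch. Like the Perl, the second check is a plain string-prefix match, not a path-component match.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical C rendering of project_path():
 *  - path inside the project  -> the part after "project/"
 *  - path is a prefix of it   -> "/" (maps to the project root)
 *  - anything else            -> NULL (Perl's undef: skip this path)
 */
const char *project_path(const char *path, const char *project)
{
    size_t plen, len;

    /* Compare without the leading '/' on either side. */
    if (*path == '/')
        path++;
    if (*project == '/')
        project++;

    plen = strlen(project);
    if (strncmp(path, project, plen) == 0 && path[plen] == '/')
        return path + plen + 1;

    len = strlen(path);
    if (strncmp(project, path, len) == 0)
        return "/";

    return NULL;
}
```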

^ permalink raw reply related

* Re: VCS comparison table
From: Petr Baudis @ 2006-10-25 22:44 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git
In-Reply-To: <ehooeo$1g6$2@sea.gmane.org>

Dear diary, on Thu, Oct 26, 2006 at 12:29:17AM CEST, I got a letter
where Jakub Narebski <jnareb@gmail.com> said that...
> Cute names are taken: CoGITo, gitk, qgit (GTK+ history viewer is gitview,
> not ggit, curiously ;-) and tig.

wit?

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
#!/bin/perl -sp0777i<X+d*lMLa^*lN%0]dsXx++lMlN/dsM0<j]dsj
$/=unpack('H*',$_);$_=`echo 16dio\U$k"SK$/SM$n\EsN0p[lN*1

^ permalink raw reply

* Re: [PATCH] Fix bad usage of mkpath in builtin-branch.sh
From: Petr Baudis @ 2006-10-25 22:43 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vslhddmtu.fsf@assigned-by-dhcp.cox.net>

Dear diary, on Wed, Oct 25, 2006 at 06:46:37AM CEST, I got a letter
where Junio C Hamano <junkio@cox.net> said that...
> Junio C Hamano <junkio@cox.net> writes:
> 
> > Petr Baudis <pasky@suse.cz> writes:
> >
> >> I have made my fair share of inadvertent mode changes as well (I don't
> >> even know how that *happened*), and I don't seem to be alone; since
> >> this is something you are doing only rarely anyway, perhaps we should
> >> try to make mode changes more visible?
> >
> > Well we already do and that's how I noticed.
> 
> Ah, sorry, I think I misunderstood you.
> Did you mean something like this?
> 
> diff --git a/git-commit.sh b/git-commit.sh
> index 5b1cf85..8bae734 100755
> --- a/git-commit.sh
> +++ b/git-commit.sh
> @@ -629,4 +629,7 @@ if test -x "$GIT_DIR"/hooks/post-commit
>  then
>  	"$GIT_DIR"/hooks/post-commit
>  fi
> +
> +test "$ret" = 0 && git-diff-tree --summary --root --no-commit-id HEAD
> +
>  exit "$ret"

Yes, this might be a good idea, although after the commit is perhaps too
late.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
#!/bin/perl -sp0777i<X+d*lMLa^*lN%0]dsXx++lMlN/dsM0<j]dsj
$/=unpack('H*',$_);$_=`echo 16dio\U$k"SK$/SM$n\EsN0p[lN*1

^ permalink raw reply

* Re: VCS comparison table
From: David Lang @ 2006-10-25 22:41 UTC (permalink / raw)
  To: Shawn Pearce; +Cc: git
In-Reply-To: <20061025221531.GB10140@spearce.org>

On Wed, 25 Oct 2006, Shawn Pearce wrote:

> David Lang <dlang@digitalinsight.com> wrote:
>> a quick lesson on program naming
>>
>> On Wed, 25 Oct 2006, Andreas Ericsson wrote:
>>
>>> I'm personally all for a rewrite of the necessary commands in C ("commit"
>>> comes to mind), but as many others, I have no personal interest in doing
>>> the actual work. I'm fairly certain that once we get it working natively
>>> on windows with some decent performance, windows hackers will pick up the
>>> ball and write "wingit", which will be a log viewer and GUI thing for
>>              ^^^^^^
>>
>> how many other people read this as 'wing it' rather than 'win git'? ;-)
>
> Yes, that's certainly a less than optimal name...
>
> What about gitk?  Is it "gi tk" or "git k" ?  This has actually
> been the source of much local debate.  :-)

in this case I think it's both (or technically git tk, with the double t's 
combined to save typing)


^ permalink raw reply

* Re: VCS comparison table
From: David Lang @ 2006-10-25 22:40 UTC (permalink / raw)
  To: Matthew D. Fuller; +Cc: Linus Torvalds, bazaar-ng, git
In-Reply-To: <20061025002713.GN17019@over-yonder.net>

On Tue, 24 Oct 2006, Matthew D. Fuller wrote:

> On Tue, Oct 24, 2006 at 11:03:20AM -0700 I heard the voice of
> David Lang, and lo! it spake thus:
>>
>> it sounded like you were saying that the way to get the slices of
>> the DAG was to use branches in bzr. [...]
>
> I'm not entirely sure I understand what you mean here, but I think
> you're saying "Nobody's written the code in bzr to show arbitrary
> slices of the DAG", which is true TTBOMK.

I think we are talking past each other here.

what I think was said was

G 'one feature of git is that you can view arbitrary slices trivially'

B 'bzr can do this too, you just use branches to define the slices'

G 'but this limits you because branches are defined as code is developed, git 
lets you define slices at viewing time'

by the way, I think it's more than just saying 'well, the code could be written 
to do this in $VCS'; some decisions and standard ways of doing things can impact 
how hard it is to implement a feature, and some decisions can make it 
impossible (without doing unexpected things).

>
>> everyone agrees that bzr supports the Star topology. Most people
>> (including bzr people) seem to agree that currently bzr does not
>> support the Distributed topology.
>
> I think this statement arouses so much grumbling because (a) bzr does
> support such a lot better than often seems implied, (b) where it
> doesn't, the changes needed to do so are relatively minor (often
> merely cosmetic), and (c) disagreement over whether some of the
> qualifications included for 'distributed' are really fundamental.
>
>
>> it's just fine for bzr to not support all possible topologies,
>
> I think there's a real intent for bzr TO support at least all common
> topologies.  I'll buy that current development has focused more on
> [relatively] simple topologies than the more wildly complex ones.  I
> look forward to more addressing of the less common cases as the tool
> matures, and I think a lot of this thread will be good material to
> work with as that happens.  It's just the suggestion that providing
> fruit for simple topologies _necessarily_ prejudices against complex
> ones that I find so onerous.

one concern that the git people are voicing is that the things that work for 
simple topologies (revno's) can't be used with the more complex ones (where you 
need the refid's). especially the fact that users need to do things 
significantly differently when there are fairly subtle changes to the topology.

the scenario that came up elsewhere today where you have

    Master
    /    \
dev1   dev2

and then dev1 and dev2 both start working on the same thing (without knowing 
it), then discover they are working on the same thing. they now have three 
options

1. merge their stuff up to the master so that they can both pull it down.
   but this puts broken, experimental stuff up in the master

2. declare one of the dev trees to be the master

this changes the topology to

Master--dev1--dev2

3. pull from each other frequently to keep in sync.

this changes the topology to

    Master
    /   \
dev1--dev2

if they do this with bzr then the revno's break, they each get extra commits 
showing up (so they can never show the same history).

in git this is a non-issue, they can pull back and forth and the only new 
history to show up will be changes.

this is the situation that the kernel developers are in frequently. it sounds as 
if you haven't needed to do this yet, so you haven't encountered the problems.

David Lang

^ permalink raw reply

* Re: Combined diff format documentation
From: Junio C Hamano @ 2006-10-25 22:40 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git
In-Reply-To: <ehoo2k$1g6$1@sea.gmane.org>

Jakub Narebski <jnareb@gmail.com> writes:

> 1. "git diff" header which looked like this
> 2. the "index" extended header line changes from
> 3. The "rename/copy" headers seems to be never present; see below.
>...

Thanks for starting this.  Your observation is correct.  It was
pretty much designed for quick _content_ inspection and renames
would work correctly to pick which blobs from each tree to
compare but otherwise not reflected in the output (the pathnames
are not shown as far as I know).  We could probably add it if
some users need it.

> 5. Hunk header is also modified: in ordinary diff we have
> ...
>    It might not be obvious that we have (number of parents + 1) '@'
>    characters in the chunk header for the combined diff format.

Correct.  This was done to prevent people from accidentally
feeding it to "patch -p1".  In other words, we wanted to make it
so obvious that it is _not_ a patch.
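
Going the other way, a tool that reads such output can recover the parent count from the hunk header alone: "@@" for an ordinary diff, and N + 1 '@' characters for a combined diff against N parents, e.g. "@@@ -1,5 -1,5 +1,7 @@@" for a two-parent merge. A minimal C sketch follows; `hunk_header_parents` is a hypothetical name.

```c
#include <stddef.h>

/*
 * Infer the number of parents from a hunk header by counting the
 * leading '@' characters (N parents use N + 1 of them).
 * Returns 0 if the line does not start with at least "@@".
 */
size_t hunk_header_parents(const char *line)
{
    size_t n = 0;

    while (line[n] == '@')
        n++;
    return n >= 2 ? n - 1 : 0;
}
```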

There may be more information in "git log -- combine-diff.c"
output that ought to be collected into the documentation, and
now might be a good time to do so, given that that part of the
system is fairly stable and has not changed for quite some time
in git timescale.

>    BTW. it is not mentioned in documentation that git diff uses hunk section
>    indicator, and what regexp/expression it uses (and is it configurable).
>    Not described in documentation.

If you mean by "hunk section indicator" the output similar to
GNU diff -p option, I think it is not worth mentioning and we
are not ready to mention it yet (we have not etched the
expression in stone).  Nobody jumped up and down to say it needs
to be configurable, so it is left undocumented more or less
deliberately.
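
For comparison, GNU diff documents -p as equivalent to -F '^[[:alpha:]$_]': the hunk is labelled with the nearest preceding line that starts with a letter, '_' or '$'. The C sketch below mirrors that documented GNU rule. It is not git's xdiff code, and, as noted above, git was not committing to any particular expression. `hunk_section_line` is a hypothetical helper.

```c
#include <ctype.h>
#include <stddef.h>

/* GNU diff -p heuristic: a line starting with a letter, '_' or '$'. */
static int is_funcname_line(const char *line)
{
    unsigned char c = (unsigned char)line[0];

    return isalpha(c) || c == '_' || c == '$';
}

/*
 * Return the index of the nearest line before hunk_start that looks like
 * a function header, or -1 if there is none.
 */
long hunk_section_line(const char *const lines[], long hunk_start)
{
    long i;

    for (i = hunk_start - 1; i >= 0; i--)
        if (is_funcname_line(lines[i]))
            return i;
    return -1;
}
```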

> 6. Documentation/diff-format.txt explains combined and condensed combined
>    format quite well, although it doesn't tell us if we can have plusses and
>    minuses together in one line...

But you already know the answer to that question, since you
asked me a few days ago ;-).

Patches to documentation would be easier to comment on and more
productive, I guess.

> Below are the following diffs: with the first parent, merge (with all
> parents) with rename detection, combined, and combined with rename
> detection. Is it all expected?

Yes.  I do not see anything obviously unexpected in your output.


^ permalink raw reply

