git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] binary patch.
@ 2006-05-04 23:52 Junio C Hamano
  2006-05-05  2:47 ` Nicolas Pitre
  0 siblings, 1 reply; 12+ messages in thread
From: Junio C Hamano @ 2006-05-04 23:52 UTC (permalink / raw)
  To: git

This adds "binary patch" to the diff output and teaches apply
what to do with them.

On the diff generation side, traditionally, we said "Binary
files differ\n" without giving anything other than the preimage
and postimage object name on the index line.  This was good
enough for applying a patch generated from your own repository
(very useful while rebasing), because the postimage would be
available in such a case.  However, this was not useful when the
recipient of such a patch via e-mail were to apply it, even if
the preimage was available.

This patch allows the diff to generate "binary" patch when
operating under --full-index option.  The binary patch follows
the usual extended git diff headers, and looks like this:

	"GIT binary patch\n"
	<length byte><data>"\n"
	...
	"\n"

Each line is prefixed with a "length-byte", whose value is upper
or lowercase alphabet that encodes number of bytes that the data
on the line decodes to (1..52 -- 'A' means 1, 'B' means 2, ...,
'Z' means 26, 'a' means 27, ...).  <data> is 1 or more groups of
5-byte sequence, each of which encodes up to 4 bytes in base85
encoding.  Because 52 / 4 * 5 = 65 and we have the length byte,
an output line is capped to 66 characters.  The payload is the
same diff-delta as we use in the packfiles.

On the consumption side, git-apply now can decode and apply the
binary patch when --allow-binary-replacement is given, the diff
was generated with --full-index, and the receiving repository
has the preimage blob, which is the same condition as it always
required when accepting an "Binary files differ\n" patch.

Signed-off-by: Junio C Hamano <junkio@cox.net>

---

 Makefile |    2 -
 apply.c  |  232 ++++++++++++++++++++++++++++++++++++++++++++++----------------
 cache.h  |    3 +
 diff.c   |  106 ++++++++++++++++++++++++++--
 4 files changed, 275 insertions(+), 68 deletions(-)

diff --git a/Makefile b/Makefile
index a3f7e92..b6be520 100644
--- a/Makefile
+++ b/Makefile
@@ -205,7 +205,7 @@ DIFF_OBJS = \
 	diffcore-delta.o log-tree.o
 
 LIB_OBJS = \
-	blob.o commit.o connect.o csum-file.o \
+	blob.o commit.o connect.o csum-file.o base85.o \
 	date.o diff-delta.o entry.o exec_cmd.o ident.o index.o \
 	object.o pack-check.o patch-delta.o path.o pkt-line.o \
 	quote.o read-cache.o refs.o run-command.o \
diff --git a/apply.c b/apply.c
index 269210a..e37c4eb 100644
--- a/apply.c
+++ b/apply.c
@@ -10,6 +10,7 @@ #include <fnmatch.h>
 #include "cache.h"
 #include "quote.h"
 #include "blob.h"
+#include "delta.h"
 
 //  --check turns on checking that the working tree matches the
 //    files that are being modified, but doesn't apply the patch
@@ -966,6 +967,70 @@ static inline int metadata_changes(struc
 		 patch->old_mode != patch->new_mode);
 }
 
+static int parse_binary(char *buffer, unsigned long size, struct patch *patch)
+{
+	/* We have read "GIT binary patch\n"; what follows is a
+	 * sequence of 'length-byte' followed by base-85 encoded
+	 * delta data.
+	 *
+	 * Each 5-byte sequence of base-85 encodes up to 4 bytes,
+	 * and we would limit the patch line to 66 characters,
+	 * so one line can fit up to 13 groups that would decode
+	 * to 52 bytes max.  The length byte 'A'-'Z' corresponds
+	 * to 1-26 bytes, and 'a'-'z' corresponds to 27-52 bytes.
+	 * The end of binary is signalled with an empty line.
+	 */
+	int llen, used;
+	struct fragment *fragment;
+	char *delta = NULL;
+
+	patch->is_binary = 1;
+	patch->fragments = fragment = xcalloc(1, sizeof(*fragment));
+	used = 0;
+	while (1) {
+		int byte_length, max_byte_length, newsize;
+		llen = linelen(buffer, size);
+		used += llen;
+		linenr++;
+		if (llen == 1)
+			break;
+		/* Minimum line is "A00000\n" which is 7-byte long,
+		 * and the line length must be multiple of 5 plus 2.
+		 */
+		if ((llen < 7) || (llen-2) % 5)
+			goto corrupt;
+		max_byte_length = (llen - 2) / 5 * 4;
+		byte_length = *buffer;
+		if ('A' <= byte_length && byte_length <= 'Z')
+			byte_length = byte_length - 'A' + 1;
+		else if ('a' <= byte_length && byte_length <= 'z')
+			byte_length = byte_length - 'a' + 27;
+		else
+			goto corrupt;
+		/* if the input length was not multiple of 4, we would
+		 * have filler at the end but the filler should never
+		 * exceed 3 bytes
+		 */
+		if (max_byte_length < byte_length ||
+		    byte_length <= max_byte_length - 4)
+			goto corrupt;
+		newsize = fragment->size + byte_length;
+		delta = xrealloc(delta, newsize);
+		if (decode_85(delta + fragment->size,
+			      buffer + 1,
+			      byte_length))
+			goto corrupt;
+		fragment->size = newsize;
+		buffer += llen;
+		size -= llen;
+	}
+	fragment->patch = delta;
+	return used;
+ corrupt:
+	return error("corrupt binary patch at line %d: %.*s",
+		     linenr-1, llen-1, buffer);
+}
+
 static int parse_chunk(char *buffer, unsigned long size, struct patch *patch)
 {
 	int hdrsize, patchsize;
@@ -982,19 +1047,34 @@ static int parse_chunk(char *buffer, uns
 			"Files ",
 			NULL,
 		};
+		static const char git_binary[] = "GIT binary patch\n";
 		int i;
 		int hd = hdrsize + offset;
 		unsigned long llen = linelen(buffer + hd, size - hd);
 
-		if (!memcmp(" differ\n", buffer + hd + llen - 8, 8))
+		if (llen == sizeof(git_binary) - 1 &&
+		    !memcmp(git_binary, buffer + hd, llen)) {
+			int used;
+			linenr++;
+			used = parse_binary(buffer + hd + llen,
+					    size - hd - llen, patch);
+			if (used)
+				patchsize = used + llen;
+			else
+				patchsize = 0;
+		}
+		else if (!memcmp(" differ\n", buffer + hd + llen - 8, 8)) {
 			for (i = 0; binhdr[i]; i++) {
 				int len = strlen(binhdr[i]);
 				if (len < size - hd &&
 				    !memcmp(binhdr[i], buffer + hd, len)) {
+					linenr++;
 					patch->is_binary = 1;
+					patchsize = llen;
 					break;
 				}
 			}
+		}
 
 		/* Empty patch cannot be applied if:
 		 * - it is a binary patch and we do not do binary_replace, or
@@ -1345,76 +1425,108 @@ #endif
 	return offset;
 }
 
-static int apply_fragments(struct buffer_desc *desc, struct patch *patch)
+static int apply_binary(struct buffer_desc *desc, struct patch *patch)
 {
-	struct fragment *frag = patch->fragments;
 	const char *name = patch->old_name ? patch->old_name : patch->new_name;
+	unsigned char sha1[20];
+	unsigned char hdr[50];
+	int hdrlen;
 
-	if (patch->is_binary) {
-		unsigned char sha1[20];
+	if (!allow_binary_replacement)
+		return error("cannot apply binary patch to '%s' "
+			     "without --allow-binary-replacement",
+			     name);
 
-		if (!allow_binary_replacement)
-			return error("cannot apply binary patch to '%s' "
-				     "without --allow-binary-replacement",
-				     name);
+	/* For safety, we require patch index line to contain
+	 * full 40-byte textual SHA1 for old and new, at least for now.
+	 */
+	if (strlen(patch->old_sha1_prefix) != 40 ||
+	    strlen(patch->new_sha1_prefix) != 40 ||
+	    get_sha1_hex(patch->old_sha1_prefix, sha1) ||
+	    get_sha1_hex(patch->new_sha1_prefix, sha1))
+		return error("cannot apply binary patch to '%s' "
+			     "without full index line", name);
 
-		/* For safety, we require patch index line to contain
-		 * full 40-byte textual SHA1 for old and new, at least for now.
+	if (patch->old_name) {
+		/* See if the old one matches what the patch
+		 * applies to.
 		 */
-		if (strlen(patch->old_sha1_prefix) != 40 ||
-		    strlen(patch->new_sha1_prefix) != 40 ||
-		    get_sha1_hex(patch->old_sha1_prefix, sha1) ||
-		    get_sha1_hex(patch->new_sha1_prefix, sha1))
-			return error("cannot apply binary patch to '%s' "
-				     "without full index line", name);
-
-		if (patch->old_name) {
-			unsigned char hdr[50];
-			int hdrlen;
-
-			/* See if the old one matches what the patch
-			 * applies to.
-			 */
-			write_sha1_file_prepare(desc->buffer, desc->size,
-						blob_type, sha1, hdr, &hdrlen);
-			if (strcmp(sha1_to_hex(sha1), patch->old_sha1_prefix))
-				return error("the patch applies to '%s' (%s), "
-					     "which does not match the "
-					     "current contents.",
-					     name, sha1_to_hex(sha1));
-		}
-		else {
-			/* Otherwise, the old one must be empty. */
-			if (desc->size)
-				return error("the patch applies to an empty "
-					     "'%s' but it is not empty", name);
-		}
+		write_sha1_file_prepare(desc->buffer, desc->size,
+					blob_type, sha1, hdr, &hdrlen);
+		if (strcmp(sha1_to_hex(sha1), patch->old_sha1_prefix))
+			return error("the patch applies to '%s' (%s), "
+				     "which does not match the "
+				     "current contents.",
+				     name, sha1_to_hex(sha1));
+	}
+	else {
+		/* Otherwise, the old one must be empty. */
+		if (desc->size)
+			return error("the patch applies to an empty "
+				     "'%s' but it is not empty", name);
+	}
+
+	if (desc->buffer) {
+		free(desc->buffer);
+		desc->alloc = desc->size = 0;
+	}
+	get_sha1_hex(patch->new_sha1_prefix, sha1);
+	if (!memcmp(sha1, null_sha1, 20))
+		return 0; /* deletion patch */
+
+	if (has_sha1_file(sha1)) {
+		char type[10];
+		unsigned long size;
 
-		/* For now, we do not record post-image data in the patch,
-		 * and require the object already present in the recipient's
-		 * object database.
+		desc->buffer = read_sha1_file(sha1, type, &size);
+		if (!desc->buffer)
+			return error("the necessary postimage %s for "
+				     "'%s' cannot be read",
+				     patch->new_sha1_prefix, name);
+		desc->alloc = desc->size = size;
+	}
+	else {
+		char type[10];
+		unsigned long src_size, dst_size;
+		void *src;
+
+		get_sha1_hex(patch->old_sha1_prefix, sha1);
+		src = read_sha1_file(sha1, type, &src_size);
+		if (!src)
+			return error("the necessary preimage %s for "
+				     "'%s' cannot be read",
+				     patch->old_sha1_prefix, name);
+
+		/* patch->fragment->patch has the delta data and
+		 * we should apply it to the preimage.
 		 */
-		if (desc->buffer) {
-			free(desc->buffer);
-			desc->alloc = desc->size = 0;
-		}
-		get_sha1_hex(patch->new_sha1_prefix, sha1);
-
-		if (memcmp(sha1, null_sha1, 20)) {
-			char type[10];
-			unsigned long size;
-
-			desc->buffer = read_sha1_file(sha1, type, &size);
-			if (!desc->buffer)
-				return error("the necessary postimage %s for "
-					     "'%s' does not exist",
-					     patch->new_sha1_prefix, name);
-			desc->alloc = desc->size = size;
-		}
+		desc->buffer = patch_delta(src, src_size,
+					   (void*) patch->fragments->patch,
+					   patch->fragments->size,
+					   &dst_size);
+		if (!desc->buffer)
+			return error("binary patch does not apply to '%s'",
+				     name);
+		desc->size = desc->alloc = dst_size;
 
-		return 0;
+		/* verify that the result matches */
+		write_sha1_file_prepare(desc->buffer, desc->size, blob_type,
+					sha1, hdr, &hdrlen);
+		if (strcmp(sha1_to_hex(sha1), patch->new_sha1_prefix))
+			return error("binary patch to '%s' creates incorrect result", name);
 	}
 
+	return 0;
+}
+
+static int apply_fragments(struct buffer_desc *desc, struct patch *patch)
+{
+	struct fragment *frag = patch->fragments;
+	const char *name = patch->old_name ? patch->old_name : patch->new_name;
+
+	if (patch->is_binary)
+		return apply_binary(desc, patch);
+
 	while (frag) {
 		if (apply_one_fragment(desc, frag) < 0)
 			return error("patch failed: %s:%ld",
diff --git a/cache.h b/cache.h
index 9d0ddcf..2f32f3d 100644
--- a/cache.h
+++ b/cache.h
@@ -363,4 +363,7 @@ extern int receive_keep_pack(int fd[2], 
 /* pager.c */
 extern void setup_pager(void);
 
+/* base85 */
+int decode_85(char *dst, char *line, int linelen);
+
 #endif /* CACHE_H */
diff --git a/diff.c b/diff.c
index c845c87..b14d897 100644
--- a/diff.c
+++ b/diff.c
@@ -8,6 +8,7 @@ #include "cache.h"
 #include "quote.h"
 #include "diff.h"
 #include "diffcore.h"
+#include "delta.h"
 #include "xdiff-interface.h"
 
 static int use_size_cache;
@@ -391,6 +392,90 @@ static void show_stats(struct diffstat_t
 			total_files, adds, dels);
 }
 
+static void *encode_delta_size(void *data, unsigned long size)
+{
+	unsigned char *cp = data;
+	*cp++ = size;
+	size >>= 7;
+	while (size) {
+		cp[-1] |= 0x80;
+		*cp++ = size;
+		size >>= 7;
+	}
+	return cp;
+}
+
+static void *safe_diff_delta(const unsigned char *src, unsigned long src_size,
+			     const unsigned char *dst, unsigned long dst_size,
+			     unsigned long *delta_size)
+{
+	unsigned long bufsize;
+	unsigned char *data;
+	unsigned char *cp;
+
+	if (src_size && dst_size)
+		return diff_delta(src, src_size, dst, dst_size, delta_size, 0);
+
+	/* diff-delta does not like to do delta with empty, so
+	 * we do that by hand here.  Sigh...
+	 */
+
+	if (!src_size)
+		/* literal copy can be done only 127-byte at a time.
+		 */
+		bufsize = dst_size + (dst_size / 127) + 40;
+	else
+		bufsize = 40;
+	data = xmalloc(bufsize);
+	cp = encode_delta_size(data, src_size);
+	cp = encode_delta_size(cp, dst_size);
+
+	if (dst_size) {
+		/* copy out literally */
+		while (dst_size) {
+			int sz = (127 < dst_size) ? 127 : dst_size;
+			*cp++ = sz;
+			dst_size -= sz;
+			while (sz) {
+				*cp++ = *dst++;
+				sz--;
+			}
+		}
+	}
+	*delta_size = (cp - data);
+	return data;
+}
+
+static void emit_binary_diff(mmfile_t *one, mmfile_t *two)
+{
+	void *delta, *cp;
+	unsigned long delta_size;
+
+	printf("GIT binary patch\n");
+	delta = safe_diff_delta(one->ptr, one->size,
+				two->ptr, two->size,
+				&delta_size);
+	if (!delta)
+		die("unable to generate binary diff");
+
+	/* emit delta encoded in base85 */
+	cp = delta;
+	while (delta_size) {
+		int bytes = (52 < delta_size) ? 52 : delta_size;
+		char line[70];
+		delta_size -= bytes;
+		if (bytes <= 26)
+			line[0] = bytes + 'A' - 1;
+		else
+			line[0] = bytes - 26 + 'a' - 1;
+		encode_85(line + 1, cp, bytes);
+		cp += bytes;
+		puts(line);
+	}
+	printf("\n");
+	free(delta);
+}
+
 #define FIRST_FEW_BYTES 8000
 static int mmfile_is_binary(mmfile_t *mf)
 {
@@ -407,6 +492,7 @@ static void builtin_diff(const char *nam
 			 struct diff_filespec *one,
 			 struct diff_filespec *two,
 			 const char *xfrm_msg,
+			 struct diff_options *o,
 			 int complete_rewrite)
 {
 	mmfile_t mf1, mf2;
@@ -451,8 +537,13 @@ static void builtin_diff(const char *nam
 	if (fill_mmfile(&mf1, one) < 0 || fill_mmfile(&mf2, two) < 0)
 		die("unable to read files to diff");
 
-	if (mmfile_is_binary(&mf1) || mmfile_is_binary(&mf2))
-		printf("Binary files %s and %s differ\n", lbl[0], lbl[1]);
+	if (mmfile_is_binary(&mf1) || mmfile_is_binary(&mf2)) {
+		if (o->full_index)
+			emit_binary_diff(&mf1, &mf2);
+		else
+			printf("Binary files %s and %s differ\n",
+			       lbl[0], lbl[1]);
+	}
 	else {
 		/* Crazy xdl interfaces.. */
 		const char *diffopts = getenv("GIT_DIFF_OPTS");
@@ -928,6 +1019,7 @@ static void run_diff_cmd(const char *pgm
 			 struct diff_filespec *one,
 			 struct diff_filespec *two,
 			 const char *xfrm_msg,
+			 struct diff_options *o,
 			 int complete_rewrite)
 {
 	if (pgm) {
@@ -937,7 +1029,7 @@ static void run_diff_cmd(const char *pgm
 	}
 	if (one && two)
 		builtin_diff(name, other ? other : name,
-			     one, two, xfrm_msg, complete_rewrite);
+			     one, two, xfrm_msg, o, complete_rewrite);
 	else
 		printf("* Unmerged path %s\n", name);
 }
@@ -971,7 +1063,7 @@ static void run_diff(struct diff_filepai
 
 	if (DIFF_PAIR_UNMERGED(p)) {
 		/* unmerged */
-		run_diff_cmd(pgm, p->one->path, NULL, NULL, NULL, NULL, 0);
+		run_diff_cmd(pgm, p->one->path, NULL, NULL, NULL, NULL, o, 0);
 		return;
 	}
 
@@ -1041,14 +1133,14 @@ static void run_diff(struct diff_filepai
 		 * needs to be split into deletion and creation.
 		 */
 		struct diff_filespec *null = alloc_filespec(two->path);
-		run_diff_cmd(NULL, name, other, one, null, xfrm_msg, 0);
+		run_diff_cmd(NULL, name, other, one, null, xfrm_msg, o, 0);
 		free(null);
 		null = alloc_filespec(one->path);
-		run_diff_cmd(NULL, name, other, null, two, xfrm_msg, 0);
+		run_diff_cmd(NULL, name, other, null, two, xfrm_msg, o, 0);
 		free(null);
 	}
 	else
-		run_diff_cmd(pgm, name, other, one, two, xfrm_msg,
+		run_diff_cmd(pgm, name, other, one, two, xfrm_msg, o,
 			     complete_rewrite);
 
 	free(name_munged);
-- 
1.3.1.g25a9

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] binary patch.
  2006-05-04 23:52 [PATCH] binary patch Junio C Hamano
@ 2006-05-05  2:47 ` Nicolas Pitre
  2006-05-05  6:47   ` Junio C Hamano
  0 siblings, 1 reply; 12+ messages in thread
From: Nicolas Pitre @ 2006-05-05  2:47 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Thu, 4 May 2006, Junio C Hamano wrote:

> This adds "binary patch" to the diff output and teaches apply
> what to do with them.

This is nice.

However I'd deflate the delta data before encoding it with base85.


Nicolas

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] binary patch.
  2006-05-05  2:47 ` Nicolas Pitre
@ 2006-05-05  6:47   ` Junio C Hamano
  2006-05-05 10:15     ` Junio C Hamano
  0 siblings, 1 reply; 12+ messages in thread
From: Junio C Hamano @ 2006-05-05  6:47 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: git

Nicolas Pitre <nico@cam.org> writes:

> On Thu, 4 May 2006, Junio C Hamano wrote:
>
>> This adds "binary patch" to the diff output and teaches apply
>> what to do with them.
>
> This is nice.
>
> However I'd deflate the delta data before encoding it with base85.

Yeah, things still to be done are deflating both delta and
optionally perhaps use just deflate without delta for "new file"
codepath.

And testsuite.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] binary patch.
  2006-05-05  6:47   ` Junio C Hamano
@ 2006-05-05 10:15     ` Junio C Hamano
  2006-05-05 15:41       ` Nicolas Pitre
  2006-05-06  7:40       ` Junio C Hamano
  0 siblings, 2 replies; 12+ messages in thread
From: Junio C Hamano @ 2006-05-05 10:15 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: git

Junio C Hamano <junkio@cox.net> writes:

> Nicolas Pitre <nico@cam.org> writes:
>
>> This is nice.
>>
>> However I'd deflate the delta data before encoding it with base85.
>
> Yeah, things still to be done are deflating both delta and
> optionally perhaps use just deflate without delta for "new file"
> codepath.
>
> And testsuite.

-- >8 --
[PATCH] binary diff: further updates.

This updates the user interface and generated diff data format.

 * "diff --binary" is used to signal that we want an e-mailable
   binary patch.  It implies --full-index and -p.

 * "apply --allow-binary-replacement" acquired a short synonym
   "apply --binary".

 * After the "GIT binary patch\n" header line there is a token
   to record which binary patch mechanism was used, so that we
   can extend it later.  Currently there are two mechanisms
   defined: "literal" and "delta".  The former records the
   deflated postimage and the latter records the deflated delta
   from the preimage to postimage.

   For purely implementation convenience, I added the deflated
   length after these "literal/delta" tokens (otherwise the
   decoding side needs to guess and reallocate the buffer while
   inflating).  Improvement patches are very welcomed.

Signed-off-by: Junio C Hamano <junkio@cox.net>

---

 * Have done only very minimum testing, and unlike somebody else ;-)
   my code is not always perfect, so this will still stay out of
   "next" for a while.

 apply.c |  130 +++++++++++++++++++++++++++++++++++++++++++++----------------
 cache.h |    1 
 diff.c  |  135 +++++++++++++++++++++++++++++++++------------------------------
 diff.h  |    1 
 4 files changed, 169 insertions(+), 98 deletions(-)

diff --git a/apply.c b/apply.c
index e37c4eb..1b93aab 100644
--- a/apply.c
+++ b/apply.c
@@ -114,6 +114,9 @@ struct patch {
 	char *new_name, *old_name, *def_name;
 	unsigned int old_mode, new_mode;
 	int is_rename, is_copy, is_new, is_delete, is_binary;
+#define BINARY_DELTA_DEFLATED 1
+#define BINARY_LITERAL_DEFLATED 2
+	unsigned long deflate_origlen;
 	int lines_added, lines_deleted;
 	int score;
 	struct fragment *fragments;
@@ -969,9 +972,11 @@ static inline int metadata_changes(struc
 
 static int parse_binary(char *buffer, unsigned long size, struct patch *patch)
 {
-	/* We have read "GIT binary patch\n"; what follows is a
-	 * sequence of 'length-byte' followed by base-85 encoded
-	 * delta data.
+	/* We have read "GIT binary patch\n"; what follows is a line
+	 * that says the patch method (currently, either "deflated
+	 * literal" or "deflated delta") and the length of data before
+	 * deflating; a sequence of 'length-byte' followed by base-85
+	 * encoded data follows.
 	 *
 	 * Each 5-byte sequence of base-85 encodes up to 4 bytes,
 	 * and we would limit the patch line to 66 characters,
@@ -982,11 +987,27 @@ static int parse_binary(char *buffer, un
 	 */
 	int llen, used;
 	struct fragment *fragment;
-	char *delta = NULL;
+	char *data = NULL;
 
-	patch->is_binary = 1;
 	patch->fragments = fragment = xcalloc(1, sizeof(*fragment));
-	used = 0;
+
+	/* Grab the type of patch */
+	llen = linelen(buffer, size);
+	used = llen;
+	linenr++;
+
+	if (!strncmp(buffer, "delta ", 6)) {
+		patch->is_binary = BINARY_DELTA_DEFLATED;
+		patch->deflate_origlen = strtoul(buffer + 6, NULL, 10);
+	}
+	else if (!strncmp(buffer, "literal ", 8)) {
+		patch->is_binary = BINARY_LITERAL_DEFLATED;
+		patch->deflate_origlen = strtoul(buffer + 8, NULL, 10);
+	}
+	else
+		return error("unrecognized binary patch at line %d: %.*s",
+			     linenr-1, llen-1, buffer);
+	buffer += llen;
 	while (1) {
 		int byte_length, max_byte_length, newsize;
 		llen = linelen(buffer, size);
@@ -1015,8 +1036,8 @@ static int parse_binary(char *buffer, un
 		    byte_length <= max_byte_length - 4)
 			goto corrupt;
 		newsize = fragment->size + byte_length;
-		delta = xrealloc(delta, newsize);
-		if (decode_85(delta + fragment->size,
+		data = xrealloc(data, newsize);
+		if (decode_85(data + fragment->size,
 			      buffer + 1,
 			      byte_length))
 			goto corrupt;
@@ -1024,7 +1045,7 @@ static int parse_binary(char *buffer, un
 		buffer += llen;
 		size -= llen;
 	}
-	fragment->patch = delta;
+	fragment->patch = data;
 	return used;
  corrupt:
 	return error("corrupt binary patch at line %d: %.*s",
@@ -1425,6 +1446,61 @@ #endif
 	return offset;
 }
 
+static char *inflate_it(const void *data, unsigned long size,
+			unsigned long inflated_size)
+{
+	z_stream stream;
+	void *out;
+	int st;
+
+	memset(&stream, 0, sizeof(stream));
+
+	stream.next_in = (unsigned char *)data;
+	stream.avail_in = size;
+	stream.next_out = out = xmalloc(inflated_size);
+	stream.avail_out = inflated_size;
+	inflateInit(&stream);
+	st = inflate(&stream, Z_FINISH);
+	if ((st != Z_STREAM_END) || stream.total_out != inflated_size) {
+		free(out);
+		return NULL;
+	}
+	return out;
+}
+
+static int apply_binary_fragment(struct buffer_desc *desc, struct patch *patch)
+{
+	unsigned long dst_size;
+	struct fragment *fragment = patch->fragments;
+	void *data;
+	void *result;
+
+	data = inflate_it(fragment->patch, fragment->size,
+			  patch->deflate_origlen);
+	if (!data)
+		return error("corrupt patch data");
+	switch (patch->is_binary) {
+	case BINARY_DELTA_DEFLATED:
+		result = patch_delta(desc->buffer, desc->size,
+				     data,
+				     patch->deflate_origlen,
+				     &dst_size);
+		free(desc->buffer);
+		desc->buffer = result;
+		free(data);
+		break;
+	case BINARY_LITERAL_DEFLATED:
+		free(desc->buffer);
+		desc->buffer = data;
+		dst_size = patch->deflate_origlen;
+		break;
+	}
+	if (!desc->buffer)
+		return -1;
+	desc->size = desc->alloc = dst_size;
+	return 0;
+}
+
 static int apply_binary(struct buffer_desc *desc, struct patch *patch)
 {
 	const char *name = patch->old_name ? patch->old_name : patch->new_name;
@@ -1466,18 +1542,20 @@ static int apply_binary(struct buffer_de
 				     "'%s' but it is not empty", name);
 	}
 
-	if (desc->buffer) {
+	get_sha1_hex(patch->new_sha1_prefix, sha1);
+	if (!memcmp(sha1, null_sha1, 20)) {
 		free(desc->buffer);
 		desc->alloc = desc->size = 0;
-	}
-	get_sha1_hex(patch->new_sha1_prefix, sha1);
-	if (!memcmp(sha1, null_sha1, 20))
+		desc->buffer = NULL;
 		return 0; /* deletion patch */
+	}
 
 	if (has_sha1_file(sha1)) {
+		/* We already have the postimage */
 		char type[10];
 		unsigned long size;
 
+		free(desc->buffer);
 		desc->buffer = read_sha1_file(sha1, type, &size);
 		if (!desc->buffer)
 			return error("the necessary postimage %s for "
@@ -1486,28 +1564,13 @@ static int apply_binary(struct buffer_de
 		desc->alloc = desc->size = size;
 	}
 	else {
-		char type[10];
-		unsigned long src_size, dst_size;
-		void *src;
-
-		get_sha1_hex(patch->old_sha1_prefix, sha1);
-		src = read_sha1_file(sha1, type, &src_size);
-		if (!src)
-			return error("the necessary preimage %s for "
-				     "'%s' cannot be read",
-				     patch->old_sha1_prefix, name);
-
-		/* patch->fragment->patch has the delta data and
-		 * we should apply it to the preimage.
+		/* We have verified desc matches the preimage;
+		 * apply the patch data to it, which is stored
+		 * in the patch->fragments->{patch,size}.
 		 */
-		desc->buffer = patch_delta(src, src_size,
-					   (void*) patch->fragments->patch,
-					   patch->fragments->size,
-					   &dst_size);
-		if (!desc->buffer)
+		if (apply_binary_fragment(desc, patch))
 			return error("binary patch does not apply to '%s'",
 				     name);
-		desc->size = desc->alloc = dst_size;
 
 		/* verify that the result matches */
 		write_sha1_file_prepare(desc->buffer, desc->size, blob_type,
@@ -2102,7 +2165,8 @@ int main(int argc, char **argv)
 			diffstat = 1;
 			continue;
 		}
-		if (!strcmp(arg, "--allow-binary-replacement")) {
+		if (!strcmp(arg, "--allow-binary-replacement") ||
+		    !strcmp(arg, "--binary")) {
 			allow_binary_replacement = 1;
 			continue;
 		}
diff --git a/cache.h b/cache.h
index 2f32f3d..4b7a439 100644
--- a/cache.h
+++ b/cache.h
@@ -365,5 +365,6 @@ extern void setup_pager(void);
 
 /* base85 */
 int decode_85(char *dst, char *line, int linelen);
+void encode_85(char *buf, unsigned char *data, int bytes);
 
 #endif /* CACHE_H */
diff --git a/diff.c b/diff.c
index b14d897..94bd94a 100644
--- a/diff.c
+++ b/diff.c
@@ -392,78 +392,75 @@ static void show_stats(struct diffstat_t
 			total_files, adds, dels);
 }
 
-static void *encode_delta_size(void *data, unsigned long size)
+static unsigned char *deflate_it(char *data,
+				 unsigned long size,
+				 unsigned long *result_size)
 {
-	unsigned char *cp = data;
-	*cp++ = size;
-	size >>= 7;
-	while (size) {
-		cp[-1] |= 0x80;
-		*cp++ = size;
-		size >>= 7;
-	}
-	return cp;
-}
-
-static void *safe_diff_delta(const unsigned char *src, unsigned long src_size,
-			     const unsigned char *dst, unsigned long dst_size,
-			     unsigned long *delta_size)
-{
-	unsigned long bufsize;
-	unsigned char *data;
-	unsigned char *cp;
-
-	if (src_size && dst_size)
-		return diff_delta(src, src_size, dst, dst_size, delta_size, 0);
-
-	/* diff-delta does not like to do delta with empty, so
-	 * we do that by hand here.  Sigh...
-	 */
-
-	if (!src_size)
-		/* literal copy can be done only 127-byte at a time.
-		 */
-		bufsize = dst_size + (dst_size / 127) + 40;
-	else
-		bufsize = 40;
-	data = xmalloc(bufsize);
-	cp = encode_delta_size(data, src_size);
-	cp = encode_delta_size(cp, dst_size);
-
-	if (dst_size) {
-		/* copy out literally */
-		while (dst_size) {
-			int sz = (127 < dst_size) ? 127 : dst_size;
-			*cp++ = sz;
-			dst_size -= sz;
-			while (sz) {
-				*cp++ = *dst++;
-				sz--;
-			}
-		}
-	}
-	*delta_size = (cp - data);
-	return data;
+	int bound;
+	unsigned char *deflated;
+	z_stream stream;
+
+	memset(&stream, 0, sizeof(stream));
+	deflateInit(&stream, Z_BEST_COMPRESSION);
+	bound = deflateBound(&stream, size);
+	deflated = xmalloc(bound);
+	stream.next_out = deflated;
+	stream.avail_out = bound;
+
+	stream.next_in = (unsigned char *)data;
+	stream.avail_in = size;
+	while (deflate(&stream, Z_FINISH) == Z_OK)
+		; /* nothing */
+	deflateEnd(&stream);
+	*result_size = stream.total_out;
+	return deflated;
 }
 
 static void emit_binary_diff(mmfile_t *one, mmfile_t *two)
 {
-	void *delta, *cp;
+	void *cp;
+	void *delta;
+	void *deflated;
+	void *data;
+	unsigned long orig_size;
 	unsigned long delta_size;
+	unsigned long deflate_size;
+	unsigned long data_size;
 
 	printf("GIT binary patch\n");
-	delta = safe_diff_delta(one->ptr, one->size,
-				two->ptr, two->size,
-				&delta_size);
-	if (!delta)
-		die("unable to generate binary diff");
-
-	/* emit delta encoded in base85 */
-	cp = delta;
-	while (delta_size) {
-		int bytes = (52 < delta_size) ? 52 : delta_size;
+	/* We could do deflated delta, or we could do just deflated two,
+	 * whichever is smaller.
+	 */
+	delta = NULL;
+	deflated = deflate_it(two->ptr, two->size, &deflate_size);
+	if (one->size && two->size) {
+		delta = diff_delta(one->ptr, one->size,
+				   two->ptr, two->size,
+				   &delta_size, deflate_size);
+		orig_size = delta_size;
+		if (delta)
+			delta = deflate_it(delta, delta_size, &delta_size);
+	}
+
+	if (delta && delta_size < deflate_size) {
+		printf("delta %lu\n", orig_size);
+		free(deflated);
+		data = delta;
+		data_size = delta_size;
+	}
+	else {
+		printf("literal %lu\n", two->size);
+		free(delta);
+		data = deflated;
+		data_size = deflate_size;
+	}
+
+	/* emit data encoded in base85 */
+	cp = data;
+	while (data_size) {
+		int bytes = (52 < data_size) ? 52 : data_size;
 		char line[70];
-		delta_size -= bytes;
+		data_size -= bytes;
 		if (bytes <= 26)
 			line[0] = bytes + 'A' - 1;
 		else
@@ -473,7 +470,7 @@ static void emit_binary_diff(mmfile_t *o
 		puts(line);
 	}
 	printf("\n");
-	free(delta);
+	free(data);
 }
 
 #define FIRST_FEW_BYTES 8000
@@ -538,7 +535,11 @@ static void builtin_diff(const char *nam
 		die("unable to read files to diff");
 
 	if (mmfile_is_binary(&mf1) || mmfile_is_binary(&mf2)) {
-		if (o->full_index)
+		/* Quite common confusing case */
+		if (mf1.size == mf2.size &&
+		    !memcmp(mf1.ptr, mf2.ptr, mf1.size))
+			goto free_ab_and_return;
+		if (o->binary)
 			emit_binary_diff(&mf1, &mf2);
 		else
 			printf("Binary files %s and %s differ\n",
@@ -1239,6 +1240,10 @@ int diff_opt_parse(struct diff_options *
 		options->rename_limit = strtoul(arg+2, NULL, 10);
 	else if (!strcmp(arg, "--full-index"))
 		options->full_index = 1;
+	else if (!strcmp(arg, "--binary")) {
+		options->output_format = DIFF_FORMAT_PATCH;
+		options->full_index = options->binary = 1;
+	}
 	else if (!strcmp(arg, "--name-only"))
 		options->output_format = DIFF_FORMAT_NAME;
 	else if (!strcmp(arg, "--name-status"))
diff --git a/diff.h b/diff.h
index b3b2c4d..d052608 100644
--- a/diff.h
+++ b/diff.h
@@ -28,6 +28,7 @@ struct diff_options {
 		 with_raw:1,
 		 with_stat:1,
 		 tree_in_recursive:1,
+		 binary:1,
 		 full_index:1,
 		 silent_on_remove:1,
 		 find_copies_harder:1;
-- 
1.3.1.g25a9

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] binary patch.
  2006-05-05 10:15     ` Junio C Hamano
@ 2006-05-05 15:41       ` Nicolas Pitre
  2006-05-05 17:38         ` Junio C Hamano
  2006-05-06  7:40       ` Junio C Hamano
  1 sibling, 1 reply; 12+ messages in thread
From: Nicolas Pitre @ 2006-05-05 15:41 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Fri, 5 May 2006, Junio C Hamano wrote:

> > Yeah, things still to be done are deflating both delta and
> > optionally perhaps use just deflate without delta for "new file"
> > codepath.

> +	/* We could do deflated delta, or we could do just deflated two,
> +	 * whichever is smaller.
> +	 */
> +	delta = NULL;
> +	deflated = deflate_it(two->ptr, two->size, &deflate_size);
> +	if (one->size && two->size) {
> +		delta = diff_delta(one->ptr, one->size,
> +				   two->ptr, two->size,
> +				   &delta_size, deflate_size);

Here you probably want to use deflate_size-1 (deflate_size can't be 0).

> +		orig_size = delta_size;
> +		if (delta)
> +			delta = deflate_it(delta, delta_size, &delta_size);

Here you're leaking the original delta buffer memory.


Nicolas

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] binary patch.
  2006-05-05 15:41       ` Nicolas Pitre
@ 2006-05-05 17:38         ` Junio C Hamano
  2006-05-05 18:33           ` Nicolas Pitre
  0 siblings, 1 reply; 12+ messages in thread
From: Junio C Hamano @ 2006-05-05 17:38 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: git

Nicolas Pitre <nico@cam.org> writes:

>> +	delta = NULL;
>> +	deflated = deflate_it(two->ptr, two->size, &deflate_size);
>> +	if (one->size && two->size) {
>> +		delta = diff_delta(one->ptr, one->size,
>> +				   two->ptr, two->size,
>> +				   &delta_size, deflate_size);
>
> Here you probably want to use deflate_size-1 (deflate_size can't be 0).

I am not sure if -1 is worth here.

The delta is going to be deflated and hopefully gets a bit
smaller, so if we really care that level of detail, it might be
worth to do (deflate_size*3/2) or something like that here, use
delta with or without deflate whichever is smaller, and mark the
uncompressed delta with a different tag ("uncompressed delta"?).
And for symmetry, to deal with uncompressible data, we may want
to have "uncompressed literal" as well.

>> +		orig_size = delta_size;
>> +		if (delta)
>> +			delta = deflate_it(delta, delta_size, &delta_size);
>
> Here you're leaking the original delta buffer memory.

Indeed.  Thanks.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] binary patch.
  2006-05-05 17:38         ` Junio C Hamano
@ 2006-05-05 18:33           ` Nicolas Pitre
  2006-05-05 19:23             ` Junio C Hamano
  0 siblings, 1 reply; 12+ messages in thread
From: Nicolas Pitre @ 2006-05-05 18:33 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Fri, 5 May 2006, Junio C Hamano wrote:

> Nicolas Pitre <nico@cam.org> writes:
> 
> >> +	delta = NULL;
> >> +	deflated = deflate_it(two->ptr, two->size, &deflate_size);
> >> +	if (one->size && two->size) {
> >> +		delta = diff_delta(one->ptr, one->size,
> >> +				   two->ptr, two->size,
> >> +				   &delta_size, deflate_size);
> >
> > Here you probably want to use deflate_size-1 (deflate_size can't be 0).
> 
> I am not sure if -1 is worth here.
> 
> The delta is going to be deflated and hopefully gets a bit
> smaller, so if we really care that level of detail, it might be
> worth to do (deflate_size*3/2) or something like that here, use
> delta with or without deflate whichever is smaller, and mark the
> uncompressed delta with a different tag ("uncompressed delta"?).
> And for symmetry, to deal with uncompressible data, we may want
> to have "uncompressed literal" as well.

Nah...  Please just forget that.  ;-)


Nicolas

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] binary patch.
  2006-05-05 18:33           ` Nicolas Pitre
@ 2006-05-05 19:23             ` Junio C Hamano
  2006-05-05 20:07               ` Nicolas Pitre
  2006-05-05 20:33               ` Daniel Barkalow
  0 siblings, 2 replies; 12+ messages in thread
From: Junio C Hamano @ 2006-05-05 19:23 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: git

Nicolas Pitre <nico@cam.org> writes:

> On Fri, 5 May 2006, Junio C Hamano wrote:
>
>> The delta is going to be deflated and hopefully gets a bit
>> smaller, so if we really care that level of detail, it might be
>> worth to do (deflate_size*3/2) or something like that here, use
>> delta with or without deflate whichever is smaller, and mark the
>> uncompressed delta with a different tag ("uncompressed delta"?).
>> And for symmetry, to deal with uncompressible data, we may want
>> to have "uncompressed literal" as well.
>
> Nah...  Please just forget that.  ;-)

I was serious about the above actually.

BTW, this "binary patch" opens a different can of worms.

Currently, the diff uses a heuristic borrowed from GNU diff 
(I did not look at the code when I did it, but it is described
in its documentation) to decide if a file is binary (look at the
first few bytes and find NUL).  I am sure people will want to
have a way to say "that heuristic fails but this _is_ a binary
file and please treat it as such".

There are two, both valid, I think, ways to do it.

 - give an option to "diff" that says "treat this path as binary
   for this invocation of the program".

 - give an attribute to blob object that says "this blob is
   binary and should be treated as such".

The latter is probably the right way to go in the longer term.

A blob being binary or not is a property of the content and does
not depend on where it sits in the history, so unlike "recording
renames as a hint in commit objects", the attribute is at the
blob level, not at the commit nor the tree that points at the
blob.

But "binaryness" affects only certain operations that extract
the data (e.g. diff and grep) and not others (e.g. fetch).
Also, it makes sense to being able to retroactively mark a blob,
which was not marked as such originally, is a binary.  So I do
not think it should be recorded in the object header.

Which suggests that we may perhaps want to have notes that can
be attached to existing objects to augment them without changing
the contents of the data, and have tools notice these notes when
they are available.  Another example is to associate correct
MIME types to blobs so, gitweb _blob_ links can do sensible
things to them.

These external notes are purely for Porcelains (in the context
of this sentence "diff" and "grep" are Porcelain), but we would
also want a way to propagate them across repositories somehow.
In a sense, "grafts" information is similar to the external
notes in that it augments existing commit objects, but its
effect is a bit more intrusive; it affects the way the core
operates.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] binary patch.
  2006-05-05 19:23             ` Junio C Hamano
@ 2006-05-05 20:07               ` Nicolas Pitre
  2006-05-05 20:33               ` Daniel Barkalow
  1 sibling, 0 replies; 12+ messages in thread
From: Nicolas Pitre @ 2006-05-05 20:07 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Fri, 5 May 2006, Junio C Hamano wrote:

> Nicolas Pitre <nico@cam.org> writes:
> 
> > On Fri, 5 May 2006, Junio C Hamano wrote:
> >
> >> The delta is going to be deflated and hopefully gets a bit
> >> smaller, so if we really care that level of detail, it might be
> >> worth to do (deflate_size*3/2) or something like that here, use
> >> delta with or without deflate whichever is smaller, and mark the
> >> uncompressed delta with a different tag ("uncompressed delta"?).
> >> And for symmetry, to deal with uncompressible data, we may want
> >> to have "uncompressed literal" as well.
> >
> > Nah...  Please just forget that.  ;-)
> 
> I was serious about the above actually.

And I think this is overkill.

First, if a deflated delta is to be _larger_ than its inflated version 
this is because the delta data is really really short, most probably 
shorter than a single base85 line.  Same for literal data.

So I truely think the pretty special and rare case where not deflating 
might be smaller is simply not worth the added complexity.

> BTW, this "binary patch" opens a different can of worms.
> 
> Currently, the diff uses a heuristic borrowed from GNU diff 
> (I did not look at the code when I did it, but it is described
> in its documentation) to decide if a file is binary (look at the
> first few bytes and find NUL).  I am sure people will want to
> have a way to say "that heuristic fails but this _is_ a binary
> file and please treat it as such".
> 
> There are two, both valid, I think, ways to do it.
> 
>  - give an option to "diff" that says "treat this path as binary
>    for this invocation of the program".
> 
>  - give an attribute to blob object that says "this blob is
>    binary and should be treated as such".
> 
> The latter is probably the right way to go in the longer term.

I'm not sure I agree.

> A blob being binary or not is a property of the content and does
> not depend on where it sits in the history, so unlike "recording
> renames as a hint in commit objects", the attribute is at the
> blob level, not at the commit nor the tree that points at the
> blob.

Well, sort of.

> But "binaryness" affects only certain operations that extract
> the data (e.g. diff and grep) and not others (e.g. fetch).
> Also, it makes sense to being able to retroactively mark a blob,
> which was not marked as such originally, is a binary.  So I do
> not think it should be recorded in the object header.

Agreed.

> Which suggests that we may perhaps want to have notes that can
> be attached to existing objects to augment them without changing
> the contents of the data, and have tools notice these notes when
> they are available.  Another example is to associate correct
> MIME types to blobs so, gitweb _blob_ links can do sensible
> things to them.

I think blobs are the wrong level to attach such notes.  If you go that 
path you'll have to add as many entries for the number of blobs many 
revisions of the same file might have.

Instead I think it should be attached to files.  After all being a 
binary or not is a file attribute regardless of its revision.  And 
implementation wise I'd do it as a .gitbin file listing all names of 
files that should be considered as binaries, with path globing and all, 
just like .gitignore currently lists files that should be ignored.

And the advantage is that those .gitbin files can be distributed and 
revision-controlled just like the .gitignore files.

And in addition to those files you could have a section in the repo 
config file listing default name patterns for files that are considered 
binaries.  Or even a section, if present, that lists patterns for files 
that are _not_ binaries since that list might certainly be shorter.  
There could be a corresponding .gittext as well.

And in the absence of any of those then the default automatic euristic 
applies.


Nicolas

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] binary patch.
  2006-05-05 19:23             ` Junio C Hamano
  2006-05-05 20:07               ` Nicolas Pitre
@ 2006-05-05 20:33               ` Daniel Barkalow
  2006-05-05 20:50                 ` Junio C Hamano
  1 sibling, 1 reply; 12+ messages in thread
From: Daniel Barkalow @ 2006-05-05 20:33 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Nicolas Pitre, git

On Fri, 5 May 2006, Junio C Hamano wrote:

> But "binaryness" affects only certain operations that extract
> the data (e.g. diff and grep) and not others (e.g. fetch).
> Also, it makes sense to being able to retroactively mark a blob,
> which was not marked as such originally, is a binary.  So I do
> not think it should be recorded in the object header.

Why do you think it makes sense to retroactively mark a blob with things 
like binariness or MIME type? To the extent that the information is not 
possible to extract from the blob contents, it seems to me to be a 
permanent aspect of the blob. And I could see having blobs with the same 
content but different type information (that one is a ZIP archive, while 
this one is a OpenDocument file), and tools may care how they were 
specified, and the user would want to be able to track how they had 
historically been marked, if the system allows them to be marked at all.

Of course, there's still the issue of how this info is generated for a new 
blob; I think it should live in the index for tracked files and come from 
a .gitignore-style file for new files. (For that matter, there could be a 
.gitmetadata file, which would handle "ignore" as well as binary and 
whatever other info you want to produce about your not-previously-tracked 
files.)

	-Daniel
*This .sig left intentionally blank*

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] binary patch.
  2006-05-05 20:33               ` Daniel Barkalow
@ 2006-05-05 20:50                 ` Junio C Hamano
  0 siblings, 0 replies; 12+ messages in thread
From: Junio C Hamano @ 2006-05-05 20:50 UTC (permalink / raw)
  To: git

Daniel Barkalow <barkalow@iabervon.org> writes:

> On Fri, 5 May 2006, Junio C Hamano wrote:
>
>> But "binaryness" affects only certain operations that extract
>> the data (e.g. diff and grep) and not others (e.g. fetch).
>> Also, it makes sense to being able to retroactively mark a blob,
>> which was not marked as such originally, is a binary.  So I do
>> not think it should be recorded in the object header.
>
> Why do you think it makes sense to retroactively mark a blob with things 
> like binariness or MIME type? To the extent that the information is not 
> possible to extract from the blob contents, it seems to me to be a 
> permanent aspect of the blob. And I could see having blobs with the same 
> content but different type information (that one is a ZIP archive, while 
> this one is a OpenDocument file), and tools may care how they were 
> specified, and the user would want to be able to track how they had 
> historically been marked, if the system allows them to be marked at all.
>
> Of course, there's still the issue of how this info is generated for a new 
> blob; I think it should live in the index for tracked files and come from 
> a .gitignore-style file for new files. (For that matter, there could be a 
> .gitmetadata file, which would handle "ignore" as well as binary and 
> whatever other info you want to produce about your not-previously-tracked 
> files.)

I think Nico's solution (compromise?) is the right and most
practical one.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] binary patch.
  2006-05-05 10:15     ` Junio C Hamano
  2006-05-05 15:41       ` Nicolas Pitre
@ 2006-05-06  7:40       ` Junio C Hamano
  1 sibling, 0 replies; 12+ messages in thread
From: Junio C Hamano @ 2006-05-06  7:40 UTC (permalink / raw)
  To: git

Junio C Hamano <junkio@cox.net> writes:

>> Yeah, things still to be done are deflating both delta and
>> optionally perhaps use just deflate without delta for "new file"
>> codepath.
>>
>> And testsuite.
>
> -- >8 --
> [PATCH] binary diff: further updates.
>...
> Signed-off-by: Junio C Hamano <junkio@cox.net>
>
> ---
>
>  * Have done only very minimum testing, and unlike somebody else ;-)
>    my code is not always perfect, so this will still stay out of
>    "next" for a while.

OK, now it passes the testsuite I wrote, so I'll push this out
in "next".  I was not drunk while doing the testsuite so the
code now must be perfect ;-).

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2006-05-06  7:40 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-05-04 23:52 [PATCH] binary patch Junio C Hamano
2006-05-05  2:47 ` Nicolas Pitre
2006-05-05  6:47   ` Junio C Hamano
2006-05-05 10:15     ` Junio C Hamano
2006-05-05 15:41       ` Nicolas Pitre
2006-05-05 17:38         ` Junio C Hamano
2006-05-05 18:33           ` Nicolas Pitre
2006-05-05 19:23             ` Junio C Hamano
2006-05-05 20:07               ` Nicolas Pitre
2006-05-05 20:33               ` Daniel Barkalow
2006-05-05 20:50                 ` Junio C Hamano
2006-05-06  7:40       ` Junio C Hamano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).