git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Remove diff machinery dependency from read-cache
@ 2010-01-21 19:37 Linus Torvalds
  2010-01-21 20:07 ` Junio C Hamano
                   ` (2 more replies)
  0 siblings, 3 replies; 23+ messages in thread
From: Linus Torvalds @ 2010-01-21 19:37 UTC (permalink / raw)
  To: Junio C Hamano, Git Mailing List


Exal Sibeaz pointed out that some git files are way too big, and that 
add_files_to_cache() brings in all the diff machinery to any git binary 
that needs the basic git SHA1 object operations from read-cache.c. Which 
is pretty much all of them.

It's doubly silly, since add_files_to_cache() is only used by builtin 
programs (add, checkout and commit), so it's fairly easily fixed by just 
moving the thing to builtin-add.c, and avoiding the dependency entirely.

I initially argued to Exal that it would probably be best to try to depend 
on smart compilers and linkers, but after spending some time trying to 
make -ffunction-sections work and giving up, I think Exal was right, and 
the fix is to just do some trivial cleanups like this.

This trivial cleanup results in pretty stunning file size differences. 
The diff machinery really is mostly used by just the builtin programs, and 
you have things like these trivial before-and-after numbers:

  -rwxr-xr-x 1 torvalds torvalds 1727420 2010-01-21 10:53 git-hash-object
  -rwxrwxr-x 1 torvalds torvalds  940265 2010-01-21 11:16 git-hash-object

Now, I'm not saying that 940kB is good either, but that's mostly all the 
debug information - you can see the real code with 'size':

   text	   data	    bss	    dec	    hex	filename
 418675	   3920	 127408	 550003	  86473	git-hash-object (before)
 230650	   2288	 111728	 344666	  5425a	git-hash-object (after)

ie we have a nice 24% size reduction from this trivial cleanup.

It's not just that one file either. I get:

	[torvalds@nehalem git]$ du -s /home/torvalds/libexec/git-core
	45640	/home/torvalds/libexec/git-core (before)
	33508	/home/torvalds/libexec/git-core (after)

so we're talking 12MB of diskspace here. 

(Of course, stripping all the binaries brings the 33MB down to 9MB, so the 
whole debug information thing is still the bulk of it all, but that's a 
separate issue entirely)

Now, I'm sure there are other things we should do, and changing our 
compiler flags from -O2 to -Os would bring the text size down by an 
additional almost 20%, but this thing Exal pointed out seems to be some 
good low-hanging fruit.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 builtin-add.c |   76 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 read-cache.c  |   78 ---------------------------------------------------------
 2 files changed, 76 insertions(+), 78 deletions(-)

diff --git a/builtin-add.c b/builtin-add.c
index cb6e590..2705f8d 100644
--- a/builtin-add.c
+++ b/builtin-add.c
@@ -11,6 +11,7 @@
 #include "run-command.h"
 #include "parse-options.h"
 #include "diff.h"
+#include "diffcore.h"
 #include "revision.h"
 
 static const char * const builtin_add_usage[] = {
@@ -20,6 +21,81 @@ static const char * const builtin_add_usage[] = {
 static int patch_interactive, add_interactive, edit_interactive;
 static int take_worktree_changes;
 
+struct update_callback_data
+{
+	int flags;
+	int add_errors;
+};
+
+static void update_callback(struct diff_queue_struct *q,
+			    struct diff_options *opt, void *cbdata)
+{
+	int i;
+	struct update_callback_data *data = cbdata;
+
+	for (i = 0; i < q->nr; i++) {
+		struct diff_filepair *p = q->queue[i];
+		const char *path = p->one->path;
+		switch (p->status) {
+		default:
+			die("unexpected diff status %c", p->status);
+		case DIFF_STATUS_UNMERGED:
+			/*
+			 * ADD_CACHE_IGNORE_REMOVAL is unset if "git
+			 * add -u" is calling us, In such a case, a
+			 * missing work tree file needs to be removed
+			 * if there is an unmerged entry at stage #2,
+			 * but such a diff record is followed by
+			 * another with DIFF_STATUS_DELETED (and if
+			 * there is no stage #2, we won't see DELETED
+			 * nor MODIFIED).  We can simply continue
+			 * either way.
+			 */
+			if (!(data->flags & ADD_CACHE_IGNORE_REMOVAL))
+				continue;
+			/*
+			 * Otherwise, it is "git add path" is asking
+			 * to explicitly add it; we fall through.  A
+			 * missing work tree file is an error and is
+			 * caught by add_file_to_index() in such a
+			 * case.
+			 */
+		case DIFF_STATUS_MODIFIED:
+		case DIFF_STATUS_TYPE_CHANGED:
+			if (add_file_to_index(&the_index, path, data->flags)) {
+				if (!(data->flags & ADD_CACHE_IGNORE_ERRORS))
+					die("updating files failed");
+				data->add_errors++;
+			}
+			break;
+		case DIFF_STATUS_DELETED:
+			if (data->flags & ADD_CACHE_IGNORE_REMOVAL)
+				break;
+			if (!(data->flags & ADD_CACHE_PRETEND))
+				remove_file_from_index(&the_index, path);
+			if (data->flags & (ADD_CACHE_PRETEND|ADD_CACHE_VERBOSE))
+				printf("remove '%s'\n", path);
+			break;
+		}
+	}
+}
+
+int add_files_to_cache(const char *prefix, const char **pathspec, int flags)
+{
+	struct update_callback_data data;
+	struct rev_info rev;
+	init_revisions(&rev, prefix);
+	setup_revisions(0, NULL, &rev, NULL);
+	rev.prune_data = pathspec;
+	rev.diffopt.output_format = DIFF_FORMAT_CALLBACK;
+	rev.diffopt.format_callback = update_callback;
+	data.flags = flags;
+	data.add_errors = 0;
+	rev.diffopt.format_callback_data = &data;
+	run_diff_files(&rev, DIFF_RACY_IS_MODIFIED);
+	return !!data.add_errors;
+}
+
 static void fill_pathspec_matches(const char **pathspec, char *seen, int specs)
 {
 	int num_unmatched = 0, i;
diff --git a/read-cache.c b/read-cache.c
index edd9959..79938bf 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -10,9 +10,6 @@
 #include "dir.h"
 #include "tree.h"
 #include "commit.h"
-#include "diff.h"
-#include "diffcore.h"
-#include "revision.h"
 #include "blob.h"
 #include "resolve-undo.h"
 
@@ -1648,81 +1645,6 @@ int read_index_unmerged(struct index_state *istate)
 	return unmerged;
 }
 
-struct update_callback_data
-{
-	int flags;
-	int add_errors;
-};
-
-static void update_callback(struct diff_queue_struct *q,
-			    struct diff_options *opt, void *cbdata)
-{
-	int i;
-	struct update_callback_data *data = cbdata;
-
-	for (i = 0; i < q->nr; i++) {
-		struct diff_filepair *p = q->queue[i];
-		const char *path = p->one->path;
-		switch (p->status) {
-		default:
-			die("unexpected diff status %c", p->status);
-		case DIFF_STATUS_UNMERGED:
-			/*
-			 * ADD_CACHE_IGNORE_REMOVAL is unset if "git
-			 * add -u" is calling us, In such a case, a
-			 * missing work tree file needs to be removed
-			 * if there is an unmerged entry at stage #2,
-			 * but such a diff record is followed by
-			 * another with DIFF_STATUS_DELETED (and if
-			 * there is no stage #2, we won't see DELETED
-			 * nor MODIFIED).  We can simply continue
-			 * either way.
-			 */
-			if (!(data->flags & ADD_CACHE_IGNORE_REMOVAL))
-				continue;
-			/*
-			 * Otherwise, it is "git add path" is asking
-			 * to explicitly add it; we fall through.  A
-			 * missing work tree file is an error and is
-			 * caught by add_file_to_index() in such a
-			 * case.
-			 */
-		case DIFF_STATUS_MODIFIED:
-		case DIFF_STATUS_TYPE_CHANGED:
-			if (add_file_to_index(&the_index, path, data->flags)) {
-				if (!(data->flags & ADD_CACHE_IGNORE_ERRORS))
-					die("updating files failed");
-				data->add_errors++;
-			}
-			break;
-		case DIFF_STATUS_DELETED:
-			if (data->flags & ADD_CACHE_IGNORE_REMOVAL)
-				break;
-			if (!(data->flags & ADD_CACHE_PRETEND))
-				remove_file_from_index(&the_index, path);
-			if (data->flags & (ADD_CACHE_PRETEND|ADD_CACHE_VERBOSE))
-				printf("remove '%s'\n", path);
-			break;
-		}
-	}
-}
-
-int add_files_to_cache(const char *prefix, const char **pathspec, int flags)
-{
-	struct update_callback_data data;
-	struct rev_info rev;
-	init_revisions(&rev, prefix);
-	setup_revisions(0, NULL, &rev, NULL);
-	rev.prune_data = pathspec;
-	rev.diffopt.output_format = DIFF_FORMAT_CALLBACK;
-	rev.diffopt.format_callback = update_callback;
-	data.flags = flags;
-	data.add_errors = 0;
-	rev.diffopt.format_callback_data = &data;
-	run_diff_files(&rev, DIFF_RACY_IS_MODIFIED);
-	return !!data.add_errors;
-}
-
 /*
  * Returns 1 if the path is an "other" path with respect to
  * the index; that is, the path is not mentioned in the index at all,

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: Remove diff machinery dependency from read-cache
  2010-01-21 19:37 Remove diff machinery dependency from read-cache Linus Torvalds
@ 2010-01-21 20:07 ` Junio C Hamano
  2010-01-21 20:15   ` Linus Torvalds
  2010-01-22  3:58 ` Linus Torvalds
  2010-01-23  6:31 ` Brian Campbell
  2 siblings, 1 reply; 23+ messages in thread
From: Junio C Hamano @ 2010-01-21 20:07 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git Mailing List

Linus Torvalds <torvalds@linux-foundation.org> writes:

> This trivial cleanup results in pretty stunning file size differences. 
> The diff machinery really is mostly used by just the builtin programs, and 
> you have things like these trivial before-and-after numbers:
>
>   -rwxr-xr-x 1 torvalds torvalds 1727420 2010-01-21 10:53 git-hash-object
>   -rwxrwxr-x 1 torvalds torvalds  940265 2010-01-21 11:16 git-hash-object
>
> Now, I'm not saying that 940kB is good either, but that's mostly all the 
> debug information - you can see the real code with 'size':
>
>    text	   data	    bss	    dec	    hex	filename
>  418675	   3920	 127408	 550003	  86473	git-hash-object (before)
>  230650	   2288	 111728	 344666	  5425a	git-hash-object (after)
>
> ie we have a nice 24% size reduction from this trivial cleanup.


The patch itself to move add_files_to_cache() to builtin-add.c (or to its
own file) makes sense from the code placement POV, but if the goal is to
shrink the on-disk footprint, isn't an alternative approach be to make
hash-object built-in?  You can lose the whole 1.7M from the filesystem
footprint that way, no?

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Remove diff machinery dependency from read-cache
  2010-01-21 20:07 ` Junio C Hamano
@ 2010-01-21 20:15   ` Linus Torvalds
  2010-01-21 22:18     ` Linus Torvalds
  0 siblings, 1 reply; 23+ messages in thread
From: Linus Torvalds @ 2010-01-21 20:15 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List



On Thu, 21 Jan 2010, Junio C Hamano wrote:
> 
> The patch itself to move add_files_to_cache() to builtin-add.c (or to its
> own file) makes sense from the code placement POV, but if the goal is to
> shrink the on-disk footprint, isn't an alternative approach be to make
> hash-object built-in?  You can lose the whole 1.7M from the filesystem
> footprint that way, no?

Sure. Except, as I mentioned, it's not just git-hash-object. It's _all_ of 
them.

The total space savings wasn't 1.7M, it was 12M.

		Linus

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Remove diff machinery dependency from read-cache
  2010-01-21 20:15   ` Linus Torvalds
@ 2010-01-21 22:18     ` Linus Torvalds
  2010-01-21 23:25       ` Linus Torvalds
  0 siblings, 1 reply; 23+ messages in thread
From: Linus Torvalds @ 2010-01-21 22:18 UTC (permalink / raw)
  To: Junio C Hamano, Shawn O. Pearce; +Cc: Git Mailing List



On Thu, 21 Jan 2010, Linus Torvalds wrote:
> 
> Sure. Except, as I mentioned, it's not just git-hash-object. It's _all_ of 
> them.
> 
> The total space savings wasn't 1.7M, it was 12M.

There are some other interesting cases. For example, look at this:

	[torvalds@nehalem git]$ size show-index.o git-show-index
	   text	   data	    bss	    dec	    hex	filename
	   1310	      0	   1024	   2334	    91e	show-index.o
	 222706	   2296	 112720	 337722	  5273a	git-show-index

ie a trivial program like 'show-index' has ballooned to 220kB. Let's look 
at why:

	[torvalds@nehalem git]$ nm show-index.o | grep ' U '
	                 U die
	                 U fread
	                 U free
	                 U printf
	                 U sha1_to_hex
	                 U stdin
	                 U usage
	                 U xmalloc

ok, if you ignore standard library things (which will be from a shared 
library anyway), it really only wants totally trivial things: die, 
xmalloc, and sha1_to_hex. Those should be a few hundred bytes, not a few 
hundred _kilo_bytes.

So what happens?

 - sha1_to_hex brings in all of sha1_file.c, even though it doesn't need 
   any of it. Ok, that's easily fixed: split up the hex helpers into a 
   file of its own ("hex.c")

 - "die()" brings in usage.c, which is actually designed correctly, so it 
   is all fine. No extra pain there. Sure, we'll get some other trivial 
   stuff from there, but we're talking maybe a kilobyte of code.

 - "xmalloc()" brings in the trivial wrappers.

   OOPS.

Those wrappers bring in zlib (through git_inflate*), which is not a huge 
issue, we coult just move the git_inflate*() wrappers to its own file. 
Trivial. But the wrappers also bring in:

 - xmalloc/xrealloc/xstrdup:
                 U release_pack_memory

which in turn brings in _all_ of the rest of the git libraries. End 
result: a trivial git helper program that _should_ be a couple of 
kilobytes in size ends up being 200+kB of text, and 900kB with debug 
information.

Absolutely _none_ of which is in the least useful.

Oh well.

We could fix it a few ways

 - ignore it. Most git programs will get the pack handling functions 
   anyway, since they want to get object reading.

 - as mentioned, just build in _everything_ so that we only ever have one 
   binary

 - get rid of release_pack_memory() entirely. We have better ways to limit 
   pack memory use these days, but they do require configuration (we do 
   have a default packed_git_limit, though, so even without any explicit 
   configuration it's not insane).

 - don't have explicit knowledge about 'release_pack_memory' in xmalloc, 
   but instead have the packing functions register a "xmalloc 
   pressure_reliever function". So then programs that have pack handling 
   will register the fixup function, and programs that don't will never 
   even know.

Hmm? We have about 20 external programs that may hide issues like this.

		Linus

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Remove diff machinery dependency from read-cache
  2010-01-21 22:18     ` Linus Torvalds
@ 2010-01-21 23:25       ` Linus Torvalds
  2010-01-22  0:45         ` Junio C Hamano
  0 siblings, 1 reply; 23+ messages in thread
From: Linus Torvalds @ 2010-01-21 23:25 UTC (permalink / raw)
  To: Junio C Hamano, Shawn O. Pearce; +Cc: Git Mailing List



On Thu, 21 Jan 2010, Linus Torvalds wrote:
> 
> We could fix it a few ways
> 
>  - ignore it. Most git programs will get the pack handling functions 
>    anyway, since they want to get object reading.

In fact, we should probably remove git-show-index. It may have some 
historical significance as a pack-file index debugger, but it has no 
actual redeeming features now, considering that the binary is a megabyte 
of useless crud with debugging info.

However, we do actually use it in t/t5302-pack-index.sh. So in the 
meantime, how about this hacky patch to simply just avoid xmalloc, and 
separating out the trivial hex functions into "hex.o".

This results in

  [torvalds@nehalem git]$ size git-show-index 
       text    data     bss     dec     hex filename
     222818    2276  112688  337782   52776 git-show-index (before)
       5696     624    1264    7584    1da0 git-show-index (after)

which is a whole lot better, no?

(Or make it a built-in, if we actually think we want to carry it along in 
the long run)

		Linus

---
 Makefile     |    1 +
 sha1_file.c  |   66 ----------------------------------------------------------
 show-index.c |    2 +-
 3 files changed, 2 insertions(+), 67 deletions(-)

diff --git a/Makefile b/Makefile
index ad890ec..a041b69 100644
--- a/Makefile
+++ b/Makefile
@@ -559,6 +559,7 @@ LIB_OBJS += graph.o
 LIB_OBJS += grep.o
 LIB_OBJS += hash.o
 LIB_OBJS += help.o
+LIB_OBJS += hex.o
 LIB_OBJS += ident.o
 LIB_OBJS += levenshtein.o
 LIB_OBJS += list-objects.o
diff --git a/sha1_file.c b/sha1_file.c
index 7086760..12478a3 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -35,54 +35,6 @@ static size_t sz_fmt(size_t s) { return s; }
 
 const unsigned char null_sha1[20];
 
-const signed char hexval_table[256] = {
-	 -1, -1, -1, -1, -1, -1, -1, -1,		/* 00-07 */
-	 -1, -1, -1, -1, -1, -1, -1, -1,		/* 08-0f */
-	 -1, -1, -1, -1, -1, -1, -1, -1,		/* 10-17 */
-	 -1, -1, -1, -1, -1, -1, -1, -1,		/* 18-1f */
-	 -1, -1, -1, -1, -1, -1, -1, -1,		/* 20-27 */
-	 -1, -1, -1, -1, -1, -1, -1, -1,		/* 28-2f */
-	  0,  1,  2,  3,  4,  5,  6,  7,		/* 30-37 */
-	  8,  9, -1, -1, -1, -1, -1, -1,		/* 38-3f */
-	 -1, 10, 11, 12, 13, 14, 15, -1,		/* 40-47 */
-	 -1, -1, -1, -1, -1, -1, -1, -1,		/* 48-4f */
-	 -1, -1, -1, -1, -1, -1, -1, -1,		/* 50-57 */
-	 -1, -1, -1, -1, -1, -1, -1, -1,		/* 58-5f */
-	 -1, 10, 11, 12, 13, 14, 15, -1,		/* 60-67 */
-	 -1, -1, -1, -1, -1, -1, -1, -1,		/* 68-67 */
-	 -1, -1, -1, -1, -1, -1, -1, -1,		/* 70-77 */
-	 -1, -1, -1, -1, -1, -1, -1, -1,		/* 78-7f */
-	 -1, -1, -1, -1, -1, -1, -1, -1,		/* 80-87 */
-	 -1, -1, -1, -1, -1, -1, -1, -1,		/* 88-8f */
-	 -1, -1, -1, -1, -1, -1, -1, -1,		/* 90-97 */
-	 -1, -1, -1, -1, -1, -1, -1, -1,		/* 98-9f */
-	 -1, -1, -1, -1, -1, -1, -1, -1,		/* a0-a7 */
-	 -1, -1, -1, -1, -1, -1, -1, -1,		/* a8-af */
-	 -1, -1, -1, -1, -1, -1, -1, -1,		/* b0-b7 */
-	 -1, -1, -1, -1, -1, -1, -1, -1,		/* b8-bf */
-	 -1, -1, -1, -1, -1, -1, -1, -1,		/* c0-c7 */
-	 -1, -1, -1, -1, -1, -1, -1, -1,		/* c8-cf */
-	 -1, -1, -1, -1, -1, -1, -1, -1,		/* d0-d7 */
-	 -1, -1, -1, -1, -1, -1, -1, -1,		/* d8-df */
-	 -1, -1, -1, -1, -1, -1, -1, -1,		/* e0-e7 */
-	 -1, -1, -1, -1, -1, -1, -1, -1,		/* e8-ef */
-	 -1, -1, -1, -1, -1, -1, -1, -1,		/* f0-f7 */
-	 -1, -1, -1, -1, -1, -1, -1, -1,		/* f8-ff */
-};
-
-int get_sha1_hex(const char *hex, unsigned char *sha1)
-{
-	int i;
-	for (i = 0; i < 20; i++) {
-		unsigned int val = (hexval(hex[0]) << 4) | hexval(hex[1]);
-		if (val & ~0xff)
-			return -1;
-		*sha1++ = val;
-		hex += 2;
-	}
-	return 0;
-}
-
 static inline int offset_1st_component(const char *path)
 {
 	if (has_dos_drive_prefix(path))
@@ -133,24 +85,6 @@ int safe_create_leading_directories_const(const char *path)
 	return result;
 }
 
-char *sha1_to_hex(const unsigned char *sha1)
-{
-	static int bufno;
-	static char hexbuffer[4][50];
-	static const char hex[] = "0123456789abcdef";
-	char *buffer = hexbuffer[3 & ++bufno], *buf = buffer;
-	int i;
-
-	for (i = 0; i < 20; i++) {
-		unsigned int val = *sha1++;
-		*buf++ = hex[val >> 4];
-		*buf++ = hex[val & 0xf];
-	}
-	*buf = '\0';
-
-	return buffer;
-}
-
 static void fill_sha1_path(char *pathbuf, const unsigned char *sha1)
 {
 	int i;
diff --git a/show-index.c b/show-index.c
index 63f9da5..4c0ac13 100644
--- a/show-index.c
+++ b/show-index.c
@@ -48,7 +48,7 @@ int main(int argc, char **argv)
 			unsigned char sha1[20];
 			uint32_t crc;
 			uint32_t off;
-		} *entries = xmalloc(nr * sizeof(entries[0]));
+		} *entries = malloc(nr * sizeof(entries[0]));
 		for (i = 0; i < nr; i++)
 			if (fread(entries[i].sha1, 20, 1, stdin) != 1)
 				die("unable to read sha1 %u/%u", i, nr);

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: Remove diff machinery dependency from read-cache
  2010-01-21 23:25       ` Linus Torvalds
@ 2010-01-22  0:45         ` Junio C Hamano
  2010-01-22  0:59           ` Johannes Schindelin
                             ` (2 more replies)
  0 siblings, 3 replies; 23+ messages in thread
From: Junio C Hamano @ 2010-01-22  0:45 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Shawn O. Pearce, Git Mailing List

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Thu, 21 Jan 2010, Linus Torvalds wrote:
>> 
>> We could fix it a few ways
>> 
>>  - ignore it. Most git programs will get the pack handling functions 
>>    anyway, since they want to get object reading.
>
> In fact, we should probably remove git-show-index. It may have some 
> historical significance as a pack-file index debugger, but it has no 
> actual redeeming features now, considering that the binary is a megabyte 
> of useless crud with debugging info.
>
> However, we do actually use it in t/t5302-pack-index.sh. So in the 
> meantime, how about this hacky patch to simply just avoid xmalloc, and 
> separating out the trivial hex functions into "hex.o".
>
> This results in
>
>   [torvalds@nehalem git]$ size git-show-index 
>        text    data     bss     dec     hex filename
>      222818    2276  112688  337782   52776 git-show-index (before)
>        5696     624    1264    7584    1da0 git-show-index (after)
>
> which is a whole lot better, no?
>
> (Or make it a built-in, if we actually think we want to carry it along in 
> the long run)

We tend to not remove things unless we are absolutely certain nobody uses
it, so probably making it built-in would be preferrable.  I don't think
show-index is used very often if ever, but scripts that use hash-object
would use it really often and would do so via its --stdin interface if it
knows that it is creating more than a dozen objects, so start-up time
required to map the whole git is probably not an issue.

By the way, do you think anybody still uses "git merge-trees"?

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Remove diff machinery dependency from read-cache
  2010-01-22  0:45         ` Junio C Hamano
@ 2010-01-22  0:59           ` Johannes Schindelin
  2010-01-22  1:01             ` Junio C Hamano
  2010-01-22  2:20           ` Linus Torvalds
  2010-01-22  2:35           ` Remove diff machinery dependency from read-cache Nicolas Pitre
  2 siblings, 1 reply; 23+ messages in thread
From: Johannes Schindelin @ 2010-01-22  0:59 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, Shawn O. Pearce, Git Mailing List

Hi,

On Thu, 21 Jan 2010, Junio C Hamano wrote:

> By the way, do you think anybody still uses "git merge-trees"?

IMO this is the only viable way to a non-broken merge-recursive.  Removing 
it would be counterproductive.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Remove diff machinery dependency from read-cache
  2010-01-22  0:59           ` Johannes Schindelin
@ 2010-01-22  1:01             ` Junio C Hamano
  2010-01-22  1:43               ` Johannes Schindelin
  0 siblings, 1 reply; 23+ messages in thread
From: Junio C Hamano @ 2010-01-22  1:01 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Junio C Hamano, Linus Torvalds, Shawn O. Pearce, Git Mailing List

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> On Thu, 21 Jan 2010, Junio C Hamano wrote:
>
>> By the way, do you think anybody still uses "git merge-trees"?
>
> IMO this is the only viable way to a non-broken merge-recursive.  Removing 
> it would be counterproductive.

Do you mean you don't think

    Subject: Re: git-merge segfault in 1.6.6 and master
    Date: Thu, 21 Jan 2010 16:38:56 -0800
    Message-ID: <7vaaw7j7mn.fsf@alter.siamese.dyndns.org>

    ...
    In the meantime, I think applying this patch is the right thing to do.

    -- >8 --
    Subject: merge-recursive: do not return NULL only to cause segfault

would help us?

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Remove diff machinery dependency from read-cache
  2010-01-22  1:01             ` Junio C Hamano
@ 2010-01-22  1:43               ` Johannes Schindelin
  2010-01-22  3:50                 ` Junio C Hamano
  0 siblings, 1 reply; 23+ messages in thread
From: Johannes Schindelin @ 2010-01-22  1:43 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, Shawn O. Pearce, Git Mailing List

Hi,

On Thu, 21 Jan 2010, Junio C Hamano wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> 
> > On Thu, 21 Jan 2010, Junio C Hamano wrote:
> >
> >> By the way, do you think anybody still uses "git merge-trees"?
> >
> > IMO this is the only viable way to a non-broken merge-recursive.  Removing 
> > it would be counterproductive.
> 
> Do you mean you don't think
> 
>     Subject: Re: git-merge segfault in 1.6.6 and master
>     Date: Thu, 21 Jan 2010 16:38:56 -0800
>     Message-ID: <7vaaw7j7mn.fsf@alter.siamese.dyndns.org>
> 
>     ...
>     In the meantime, I think applying this patch is the right thing to do.
> 
>     -- >8 --
>     Subject: merge-recursive: do not return NULL only to cause segfault
> 
> would help us?

Sorry, I cannot have a look at this now.

But in the long run, I think that it gets tiring to chase all kinds of 
weird interactions between unpack_trees(), the rename detection and the 
recursive merge.

I could see merge-trees as a viable alternative approach.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Remove diff machinery dependency from read-cache
  2010-01-22  0:45         ` Junio C Hamano
  2010-01-22  0:59           ` Johannes Schindelin
@ 2010-01-22  2:20           ` Linus Torvalds
  2010-01-22  2:25             ` Linus Torvalds
  2010-01-22  2:35           ` Remove diff machinery dependency from read-cache Nicolas Pitre
  2 siblings, 1 reply; 23+ messages in thread
From: Linus Torvalds @ 2010-01-22  2:20 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Shawn O. Pearce, Git Mailing List



On Thu, 21 Jan 2010, Junio C Hamano wrote:
> 
> We tend to not remove things unless we are absolutely certain nobody uses
> it, so probably making it built-in would be preferrable.  I don't think
> show-index is used very often if ever, but scripts that use hash-object
> would use it really often and would do so via its --stdin interface if it
> knows that it is creating more than a dozen objects, so start-up time
> required to map the whole git is probably not an issue.

git-show-index should certainly be easy to turn into a built-in too. Patch 
appended.

> By the way, do you think anybody still uses "git merge-trees"?

I dunno. I think it has some conceptual advantages, but realistically, I 
doubt anybody is willing to go through the pain to make it grow up enough 
to become a viable alternative to our current situation.

		Linus

---
From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Thu Jan 21 18:16:49 2010 -0800
Subject: Turn 'show-index' into a builtin

.. rather than being a huge executable just because it sucks in all the
git library code and then does something really trivial.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 Makefile                             |    2 +-
 show-index.c => builtin-show-index.c |    2 +-
 builtin.h                            |    1 +
 git.c                                |    3 ++-
 4 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/Makefile b/Makefile
index ad890ec..3439d2c 100644
--- a/Makefile
+++ b/Makefile
@@ -396,7 +396,6 @@ PROGRAMS += git-mktag$X
 PROGRAMS += git-pack-redundant$X
 PROGRAMS += git-patch-id$X
 PROGRAMS += git-shell$X
-PROGRAMS += git-show-index$X
 PROGRAMS += git-unpack-file$X
 PROGRAMS += git-upload-pack$X
 PROGRAMS += git-var$X
@@ -693,6 +692,7 @@ BUILTIN_OBJS += builtin-rm.o
 BUILTIN_OBJS += builtin-send-pack.o
 BUILTIN_OBJS += builtin-shortlog.o
 BUILTIN_OBJS += builtin-show-branch.o
+BUILTIN_OBJS += builtin-show-index.o
 BUILTIN_OBJS += builtin-show-ref.o
 BUILTIN_OBJS += builtin-stripspace.o
 BUILTIN_OBJS += builtin-symbolic-ref.o
diff --git a/show-index.c b/builtin-show-index.c
similarity index 96%
rename from show-index.c
rename to builtin-show-index.c
index 63f9da5..92202b8 100644
--- a/show-index.c
+++ b/builtin-show-index.c
@@ -4,7 +4,7 @@
 static const char show_index_usage[] =
 "git show-index < <packed archive index>";
 
-int main(int argc, char **argv)
+int cmd_show_index(int argc, const char **argv, const  char *prefix)
 {
 	int i;
 	unsigned nr;
diff --git a/builtin.h b/builtin.h
index c3f83c0..4d73d7c 100644
--- a/builtin.h
+++ b/builtin.h
@@ -93,6 +93,7 @@ extern int cmd_send_pack(int argc, const char **argv, const char *prefix);
 extern int cmd_shortlog(int argc, const char **argv, const char *prefix);
 extern int cmd_show(int argc, const char **argv, const char *prefix);
 extern int cmd_show_branch(int argc, const char **argv, const char *prefix);
+extern int cmd_show_index(int argc, const char **argv, const char *prefix);
 extern int cmd_status(int argc, const char **argv, const char *prefix);
 extern int cmd_stripspace(int argc, const char **argv, const char *prefix);
 extern int cmd_symbolic_ref(int argc, const char **argv, const char *prefix);
diff --git a/git.c b/git.c
index 194471f..5fabf18 100644
--- a/git.c
+++ b/git.c
@@ -358,8 +358,9 @@ static void handle_internal_command(int argc, const char **argv)
 		{ "rm", cmd_rm, RUN_SETUP },
 		{ "send-pack", cmd_send_pack, RUN_SETUP },
 		{ "shortlog", cmd_shortlog, USE_PAGER },
-		{ "show-branch", cmd_show_branch, RUN_SETUP },
 		{ "show", cmd_show, RUN_SETUP | USE_PAGER },
+		{ "show-branch", cmd_show_branch, RUN_SETUP },
+		{ "show-index", cmd_show_index },
 		{ "status", cmd_status, RUN_SETUP | NEED_WORK_TREE },
 		{ "stripspace", cmd_stripspace },
 		{ "symbolic-ref", cmd_symbolic_ref, RUN_SETUP },

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: Remove diff machinery dependency from read-cache
  2010-01-22  2:20           ` Linus Torvalds
@ 2010-01-22  2:25             ` Linus Torvalds
  2010-01-22  3:50               ` Linus Torvalds
  2010-01-22  8:43               ` Johannes Sixt
  0 siblings, 2 replies; 23+ messages in thread
From: Linus Torvalds @ 2010-01-22  2:25 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Shawn O. Pearce, Git Mailing List



On Thu, 21 Jan 2010, Linus Torvalds wrote:
> 
> > By the way, do you think anybody still uses "git merge-trees"?
> 
> I dunno. I think it has some conceptual advantages, but realistically, I 
> doubt anybody is willing to go through the pain to make it grow up enough 
> to become a viable alternative to our current situation.

This makes it a built-in, at least, so it doesn't waste the diskspace.

		Linus

---
 Makefile                             |    2 +-
 merge-tree.c => builtin-merge-tree.c |    4 +---
 builtin.h                            |    1 +
 git.c                                |    1 +
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/Makefile b/Makefile
index 3439d2c..35aea16 100644
--- a/Makefile
+++ b/Makefile
@@ -391,7 +391,6 @@ PROGRAMS += git-hash-object$X
 PROGRAMS += git-imap-send$X
 PROGRAMS += git-index-pack$X
 PROGRAMS += git-merge-index$X
-PROGRAMS += git-merge-tree$X
 PROGRAMS += git-mktag$X
 PROGRAMS += git-pack-redundant$X
 PROGRAMS += git-patch-id$X
@@ -670,6 +669,7 @@ BUILTIN_OBJS += builtin-merge-base.o
 BUILTIN_OBJS += builtin-merge-file.o
 BUILTIN_OBJS += builtin-merge-ours.o
 BUILTIN_OBJS += builtin-merge-recursive.o
+BUILTIN_OBJS += builtin-merge-tree.o
 BUILTIN_OBJS += builtin-mktree.o
 BUILTIN_OBJS += builtin-mv.o
 BUILTIN_OBJS += builtin-name-rev.o
diff --git a/merge-tree.c b/builtin-merge-tree.c
similarity index 99%
rename from merge-tree.c
rename to builtin-merge-tree.c
index 37b94d9..8e16c3e 100644
--- a/merge-tree.c
+++ b/builtin-merge-tree.c
@@ -337,7 +337,7 @@ static void *get_tree_descriptor(struct tree_desc *desc, const char *rev)
 	return buf;
 }
 
-int main(int argc, char **argv)
+int cmd_merge_tree(int argc, const char **argv, const char *prefix)
 {
 	struct tree_desc t[3];
 	void *buf1, *buf2, *buf3;
@@ -347,8 +347,6 @@ int main(int argc, char **argv)
 
 	git_extract_argv0_path(argv[0]);
 
-	setup_git_directory();
-
 	buf1 = get_tree_descriptor(t+0, argv[1]);
 	buf2 = get_tree_descriptor(t+1, argv[2]);
 	buf3 = get_tree_descriptor(t+2, argv[3]);
diff --git a/builtin.h b/builtin.h
index 4d73d7c..06bf04e 100644
--- a/builtin.h
+++ b/builtin.h
@@ -70,6 +70,7 @@ extern int cmd_merge_base(int argc, const char **argv, const char *prefix);
 extern int cmd_merge_ours(int argc, const char **argv, const char *prefix);
 extern int cmd_merge_file(int argc, const char **argv, const char *prefix);
 extern int cmd_merge_recursive(int argc, const char **argv, const char *prefix);
+extern int cmd_merge_tree(int argc, const char **argv, const char *prefix);
 extern int cmd_mktree(int argc, const char **argv, const char *prefix);
 extern int cmd_mv(int argc, const char **argv, const char *prefix);
 extern int cmd_name_rev(int argc, const char **argv, const char *prefix);
diff --git a/git.c b/git.c
index 5fabf18..e5964a8 100644
--- a/git.c
+++ b/git.c
@@ -335,6 +335,7 @@ static void handle_internal_command(int argc, const char **argv)
 		{ "merge-recursive-ours", cmd_merge_recursive, RUN_SETUP | NEED_WORK_TREE },
 		{ "merge-recursive-theirs", cmd_merge_recursive, RUN_SETUP | NEED_WORK_TREE },
 		{ "merge-subtree", cmd_merge_recursive, RUN_SETUP | NEED_WORK_TREE },
+		{ "merge-tree", cmd_merge_tree, RUN_SETUP },
 		{ "mktree", cmd_mktree, RUN_SETUP },
 		{ "mv", cmd_mv, RUN_SETUP | NEED_WORK_TREE },
 		{ "name-rev", cmd_name_rev, RUN_SETUP },

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: Remove diff machinery dependency from read-cache
  2010-01-22  0:45         ` Junio C Hamano
  2010-01-22  0:59           ` Johannes Schindelin
  2010-01-22  2:20           ` Linus Torvalds
@ 2010-01-22  2:35           ` Nicolas Pitre
  2010-01-22  2:44             ` Linus Torvalds
                               ` (2 more replies)
  2 siblings, 3 replies; 23+ messages in thread
From: Nicolas Pitre @ 2010-01-22  2:35 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, Shawn O. Pearce, Git Mailing List

On Thu, 21 Jan 2010, Junio C Hamano wrote:

> Linus Torvalds <torvalds@linux-foundation.org> writes:
> 
> > On Thu, 21 Jan 2010, Linus Torvalds wrote:
> >> 
> >> We could fix it a few ways
> >> 
> >>  - ignore it. Most git programs will get the pack handling functions 
> >>    anyway, since they want to get object reading.
> >
> > In fact, we should probably remove git-show-index. It may have some 
> > historical significance as a pack-file index debugger, but it has no 
> > actual redeeming features now, considering that the binary is a megabyte 
> > of useless crud with debugging info.
> >
> > However, we do actually use it in t/t5302-pack-index.sh. So in the 
> > meantime, how about this hacky patch to simply just avoid xmalloc, and 
> > separating out the trivial hex functions into "hex.o".
> >
> > This results in
> >
> >   [torvalds@nehalem git]$ size git-show-index 
> >        text    data     bss     dec     hex filename
> >      222818    2276  112688  337782   52776 git-show-index (before)
> >        5696     624    1264    7584    1da0 git-show-index (after)
> >
> > which is a whole lot better, no?
> >
> > (Or make it a built-in, if we actually think we want to carry it along in 
> > the long run)
> 
> We tend to not remove things unless we are absolutely certain nobody uses
> it, so probably making it built-in would be preferrable.  I don't think
> show-index is used very often if ever, but scripts that use hash-object
> would use it really often and would do so via its --stdin interface if it
> knows that it is creating more than a dozen objects, so start-up time
> required to map the whole git is probably not an issue.

I do use it, but for developing/debugging pack stuff.
I don't suggest removing it, but I don't think making it a built-in has 
value either.

So I really think that Linus' patch (which is missing hex.c btw) is a 
good thing to do, even if only for the cleanup value.

Then, git-show-index could probably become test-show-index and no longer 
leave the build directory.


Nicolas

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Remove diff machinery dependency from read-cache
  2010-01-22  2:35           ` Remove diff machinery dependency from read-cache Nicolas Pitre
@ 2010-01-22  2:44             ` Linus Torvalds
  2010-01-22  3:56             ` Junio C Hamano
  2010-01-22  4:31             ` Linus Torvalds
  2 siblings, 0 replies; 23+ messages in thread
From: Linus Torvalds @ 2010-01-22  2:44 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Junio C Hamano, Shawn O. Pearce, Git Mailing List



On Thu, 21 Jan 2010, Nicolas Pitre wrote:
> 
> So I really think that Linus' patch (which is missing hex.c btw) is a 
> good thing to do, even if only for the cleanup value.

Gaah. hex.c was the obvious code movement with just a "cache.h" include at 
the top. However, I already threw away that whole tree in favor of the 
built-in version, and now I'm too lazy to re-create it.

		Linus

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Remove diff machinery dependency from read-cache
  2010-01-22  2:25             ` Linus Torvalds
@ 2010-01-22  3:50               ` Linus Torvalds
  2010-01-22  8:43               ` Johannes Sixt
  1 sibling, 0 replies; 23+ messages in thread
From: Linus Torvalds @ 2010-01-22  3:50 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Shawn O. Pearce, Git Mailing List



On Thu, 21 Jan 2010, Linus Torvalds wrote:
> 
> This makes it a built-in, at least, so it doesn't waste the diskspace.

.. and here's 'git hash-object' as a built-in.

		Linus

---
 Makefile                               |    2 +-
 hash-object.c => builtin-hash-object.c |    5 +----
 builtin.h                              |    1 +
 git.c                                  |    1 +
 4 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/Makefile b/Makefile
index 35aea16..f9e4aa3 100644
--- a/Makefile
+++ b/Makefile
@@ -387,7 +387,6 @@ EXTRA_PROGRAMS =
 # ... and all the rest that could be moved out of bindir to gitexecdir
 PROGRAMS += $(EXTRA_PROGRAMS)
 PROGRAMS += git-fast-import$X
-PROGRAMS += git-hash-object$X
 PROGRAMS += git-imap-send$X
 PROGRAMS += git-index-pack$X
 PROGRAMS += git-merge-index$X
@@ -656,6 +655,7 @@ BUILTIN_OBJS += builtin-for-each-ref.o
 BUILTIN_OBJS += builtin-fsck.o
 BUILTIN_OBJS += builtin-gc.o
 BUILTIN_OBJS += builtin-grep.o
+BUILTIN_OBJS += builtin-hash-object.o
 BUILTIN_OBJS += builtin-help.o
 BUILTIN_OBJS += builtin-init-db.o
 BUILTIN_OBJS += builtin-log.o
diff --git a/hash-object.c b/builtin-hash-object.c
similarity index 97%
rename from hash-object.c
rename to builtin-hash-object.c
index 9455dd0..6a5f5b5 100644
--- a/hash-object.c
+++ b/builtin-hash-object.c
@@ -73,17 +73,14 @@ static const struct option hash_object_options[] = {
 	OPT_END()
 };
 
-int main(int argc, const char **argv)
+int cmd_hash_object(int argc, const char **argv, const char *prefix)
 {
 	int i;
-	const char *prefix = NULL;
 	int prefix_length = -1;
 	const char *errstr = NULL;
 
 	type = blob_type;
 
-	git_extract_argv0_path(argv[0]);
-
 	argc = parse_options(argc, argv, NULL, hash_object_options,
 			     hash_object_usage, 0);
 
diff --git a/builtin.h b/builtin.h
index 06bf04e..3aa6b6c 100644
--- a/builtin.h
+++ b/builtin.h
@@ -55,6 +55,7 @@ extern int cmd_fsck(int argc, const char **argv, const char *prefix);
 extern int cmd_gc(int argc, const char **argv, const char *prefix);
 extern int cmd_get_tar_commit_id(int argc, const char **argv, const char *prefix);
 extern int cmd_grep(int argc, const char **argv, const char *prefix);
+extern int cmd_hash_object(int argc, const char **argv, const char *prefix);
 extern int cmd_help(int argc, const char **argv, const char *prefix);
 extern int cmd_http_fetch(int argc, const char **argv, const char *prefix);
 extern int cmd_init_db(int argc, const char **argv, const char *prefix);
diff --git a/git.c b/git.c
index e5964a8..a952663 100644
--- a/git.c
+++ b/git.c
@@ -318,6 +318,7 @@ static void handle_internal_command(int argc, const char **argv)
 		{ "gc", cmd_gc, RUN_SETUP },
 		{ "get-tar-commit-id", cmd_get_tar_commit_id },
 		{ "grep", cmd_grep, USE_PAGER },
+		{ "hash-object", cmd_hash_object },
 		{ "help", cmd_help },
 		{ "init", cmd_init_db },
 		{ "init-db", cmd_init_db },

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: Remove diff machinery dependency from read-cache
  2010-01-22  1:43               ` Johannes Schindelin
@ 2010-01-22  3:50                 ` Junio C Hamano
  0 siblings, 0 replies; 23+ messages in thread
From: Junio C Hamano @ 2010-01-22  3:50 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Linus Torvalds, Shawn O. Pearce, Git Mailing List

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

>>     In the meantime, I think applying this patch is the right thing to do.
>> 
>>     -- >8 --
>>     Subject: merge-recursive: do not return NULL only to cause segfault
>> 
>> would help us?
>
> Sorry, I cannot have a look at this now.

That's fine; I know you've been busy outside git (you've kept saying that
for past several months), and I didn't really expect you to be single
handedly fixing or rewriting merge-recursive.  "In the meantime" patch is
not about attempting to "fix" anything deep inside the guts of it; it is
merely to die() with messages in a function that returned NULL when the
caller never expected to see NULL and caused segfault.

> But in the long run, I think that it gets tiring to chase all kinds of 
> weird interactions between unpack_trees(), the rename detection and the 
> recursive merge.

I don't think there is any interaction; as Tim Olsen reported, "resolve"
that uses the same unpack_trees() merges the trees just fine.  The bug
seems to be all inside merge_recursive().

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Remove diff machinery dependency from read-cache
  2010-01-22  2:35           ` Remove diff machinery dependency from read-cache Nicolas Pitre
  2010-01-22  2:44             ` Linus Torvalds
@ 2010-01-22  3:56             ` Junio C Hamano
  2010-01-22  4:21               ` Linus Torvalds
  2010-01-22  4:31             ` Linus Torvalds
  2 siblings, 1 reply; 23+ messages in thread
From: Junio C Hamano @ 2010-01-22  3:56 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Junio C Hamano, Linus Torvalds, Shawn O. Pearce, Git Mailing List

Nicolas Pitre <nico@fluxnic.net> writes:

> I do use it, but for developing/debugging pack stuff.  I don't suggest
> removing it, but I don't think making it a built-in has value either.

I thought people _might_ have used it for satistics purposes, but it
appears that the command doesn't even give in-pack size of objects nor
delta chain length, so probably anybody doing pack statistics would be
using "verify-pack -v" and wouldn't mind if it became test-show-index.

> So I really think that Linus' patch (which is missing hex.c btw) is a 
> good thing to do, even if only for the cleanup value.
>
> Then, git-show-index could probably become test-show-index and no longer 
> leave the build directory.

Yeah, I think that is a sensible long term plan, too.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Remove diff machinery dependency from read-cache
  2010-01-21 19:37 Remove diff machinery dependency from read-cache Linus Torvalds
  2010-01-21 20:07 ` Junio C Hamano
@ 2010-01-22  3:58 ` Linus Torvalds
  2010-01-23  6:31 ` Brian Campbell
  2 siblings, 0 replies; 23+ messages in thread
From: Linus Torvalds @ 2010-01-22  3:58 UTC (permalink / raw)
  To: Junio C Hamano, Git Mailing List



On Thu, 21 Jan 2010, Linus Torvalds wrote:
> 
> It's not just that one file either. I get:
> 
> 	[torvalds@nehalem git]$ du -s /home/torvalds/libexec/git-core
> 	45640	/home/torvalds/libexec/git-core (before)
> 	33508	/home/torvalds/libexec/git-core (after)
> 
> so we're talking 12MB of diskspace here. 

Btw, with the few built-in patches I've sent, it's now

	[torvalds@nehalem git]$ du -s /home/torvalds/libexec/git-core
	27876	/home/torvalds/libexec/git-core

so they're worth about 5.5M of disk-space.

Of course, without debugging and with -Os, the difference is much smaller, 
and libexec ends up being "just" 8.6MB in size. There are still a number 
of trivial programs that could be made builtin.

Optimally, I suspect just things like 'git-daemon' and the remote helpers 
would be actual separate binaries.

		Linus

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Remove diff machinery dependency from read-cache
  2010-01-22  3:56             ` Junio C Hamano
@ 2010-01-22  4:21               ` Linus Torvalds
  0 siblings, 0 replies; 23+ messages in thread
From: Linus Torvalds @ 2010-01-22  4:21 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Nicolas Pitre, Shawn O. Pearce, Git Mailing List



On Thu, 21 Jan 2010, Junio C Hamano wrote:

> Nicolas Pitre <nico@fluxnic.net> writes:
> 
> > I do use it, but for developing/debugging pack stuff.  I don't suggest
> > removing it, but I don't think making it a built-in has value either.
> 
> I thought people _might_ have used it for satistics purposes, but it
> appears that the command doesn't even give in-pack size of objects nor
> delta chain length, so probably anybody doing pack statistics would be
> using "verify-pack -v" and wouldn't mind if it became test-show-index.
> 
> > So I really think that Linus' patch (which is missing hex.c btw) is a 
> > good thing to do, even if only for the cleanup value.
> >
> > Then, git-show-index could probably become test-show-index and no longer 
> > leave the build directory.
> 
> Yeah, I think that is a sensible long term plan, too.

Note that there are other totally trivial git programs like 'git-var' that 
have exactly the same issue as 'git-show-index'.

Same details: it doesn't really want any diff machinery, doesn't want any 
git object handling, but can't avoid it because it uses xmalloc (through 
environment.c and config.c etc), so a program that _should_ be pretty 
trivially small again ends up being 200+kB in size even without debug 
info (and almost a megabyte with it).

So here's another trivial builtin-ification patch.

		Linus

---
 Makefile               |    2 +-
 var.c => builtin-var.c |    4 +---
 builtin.h              |    1 +
 git.c                  |    1 +
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/Makefile b/Makefile
index f9e4aa3..398b5fb 100644
--- a/Makefile
+++ b/Makefile
@@ -396,7 +396,6 @@ PROGRAMS += git-patch-id$X
 PROGRAMS += git-shell$X
 PROGRAMS += git-unpack-file$X
 PROGRAMS += git-upload-pack$X
-PROGRAMS += git-var$X
 PROGRAMS += git-http-backend$X
 
 # List built-in command $C whose implementation cmd_$C() is not in
@@ -703,6 +702,7 @@ BUILTIN_OBJS += builtin-update-index.o
 BUILTIN_OBJS += builtin-update-ref.o
 BUILTIN_OBJS += builtin-update-server-info.o
 BUILTIN_OBJS += builtin-upload-archive.o
+BUILTIN_OBJS += builtin-var.o
 BUILTIN_OBJS += builtin-verify-pack.o
 BUILTIN_OBJS += builtin-verify-tag.o
 BUILTIN_OBJS += builtin-write-tree.o
diff --git a/var.c b/builtin-var.c
similarity index 96%
rename from var.c
rename to builtin-var.c
index d9892f8..2280518 100644
--- a/var.c
+++ b/builtin-var.c
@@ -72,7 +72,7 @@ static int show_config(const char *var, const char *value, void *cb)
 	return git_default_config(var, value, cb);
 }
 
-int main(int argc, char **argv)
+int cmd_var(int argc, const char **argv, const char *prefix)
 {
 	const char *val;
 	int nongit;
@@ -80,8 +80,6 @@ int main(int argc, char **argv)
 		usage(var_usage);
 	}
 
-	git_extract_argv0_path(argv[0]);
-
 	setup_git_directory_gently(&nongit);
 	val = NULL;
 
diff --git a/builtin.h b/builtin.h
index 3aa6b6c..0c9c396 100644
--- a/builtin.h
+++ b/builtin.h
@@ -107,6 +107,7 @@ extern int cmd_update_ref(int argc, const char **argv, const char *prefix);
 extern int cmd_update_server_info(int argc, const char **argv, const char *prefix);
 extern int cmd_upload_archive(int argc, const char **argv, const char *prefix);
 extern int cmd_upload_tar(int argc, const char **argv, const char *prefix);
+extern int cmd_var(int argc, const char **argv, const char *prefix);
 extern int cmd_verify_tag(int argc, const char **argv, const char *prefix);
 extern int cmd_version(int argc, const char **argv, const char *prefix);
 extern int cmd_whatchanged(int argc, const char **argv, const char *prefix);
diff --git a/git.c b/git.c
index a952663..09d3272 100644
--- a/git.c
+++ b/git.c
@@ -373,6 +373,7 @@ static void handle_internal_command(int argc, const char **argv)
 		{ "update-ref", cmd_update_ref, RUN_SETUP },
 		{ "update-server-info", cmd_update_server_info, RUN_SETUP },
 		{ "upload-archive", cmd_upload_archive },
+		{ "var", cmd_var },
 		{ "verify-tag", cmd_verify_tag, RUN_SETUP },
 		{ "version", cmd_version },
 		{ "whatchanged", cmd_whatchanged, RUN_SETUP | USE_PAGER },

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: Remove diff machinery dependency from read-cache
  2010-01-22  2:35           ` Remove diff machinery dependency from read-cache Nicolas Pitre
  2010-01-22  2:44             ` Linus Torvalds
  2010-01-22  3:56             ` Junio C Hamano
@ 2010-01-22  4:31             ` Linus Torvalds
  2 siblings, 0 replies; 23+ messages in thread
From: Linus Torvalds @ 2010-01-22  4:31 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Junio C Hamano, Shawn O. Pearce, Git Mailing List



On Thu, 21 Jan 2010, Nicolas Pitre wrote:
> 
> So I really think that Linus' patch (which is missing hex.c btw) is a 
> good thing to do, even if only for the cleanup value.

Btw, it looks like the separate hex.c would fix not just git-show-index 
(together with de-xmalloc), but also make git-patch-id shrink down. Except 
git-patch-id for some reason does git_extract_argv0_path(), which brings 
in exec_cmd.o, which brings in strbuf, and xmalloc, and now it's all the 
same old pain again.

So rather than try to solve it all (xmalloc in particular is pretty 
hairy), here's another patch.

		Linus
---
 Makefile                         |    2 +-
 patch-id.c => builtin-patch-id.c |    4 +---
 builtin.h                        |    1 +
 git.c                            |    1 +
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/Makefile b/Makefile
index 398b5fb..5b614e4 100644
--- a/Makefile
+++ b/Makefile
@@ -392,7 +392,6 @@ PROGRAMS += git-index-pack$X
 PROGRAMS += git-merge-index$X
 PROGRAMS += git-mktag$X
 PROGRAMS += git-pack-redundant$X
-PROGRAMS += git-patch-id$X
 PROGRAMS += git-shell$X
 PROGRAMS += git-unpack-file$X
 PROGRAMS += git-upload-pack$X
@@ -674,6 +673,7 @@ BUILTIN_OBJS += builtin-mv.o
 BUILTIN_OBJS += builtin-name-rev.o
 BUILTIN_OBJS += builtin-pack-objects.o
 BUILTIN_OBJS += builtin-pack-refs.o
+BUILTIN_OBJS += builtin-patch-id.o
 BUILTIN_OBJS += builtin-prune-packed.o
 BUILTIN_OBJS += builtin-prune.o
 BUILTIN_OBJS += builtin-push.o
diff --git a/patch-id.c b/builtin-patch-id.c
similarity index 95%
rename from patch-id.c
rename to builtin-patch-id.c
index 0df4cb0..af0911e 100644
--- a/patch-id.c
+++ b/builtin-patch-id.c
@@ -75,13 +75,11 @@ static void generate_id_list(void)
 
 static const char patch_id_usage[] = "git patch-id < patch";
 
-int main(int argc, char **argv)
+int cmd_patch_id(int argc, const char **argv, const char *prefix)
 {
 	if (argc != 1)
 		usage(patch_id_usage);
 
-	git_extract_argv0_path(argv[0]);
-
 	generate_id_list();
 	return 0;
 }
diff --git a/builtin.h b/builtin.h
index 0c9c396..ab723f8 100644
--- a/builtin.h
+++ b/builtin.h
@@ -76,6 +76,7 @@ extern int cmd_mktree(int argc, const char **argv, const char *prefix);
 extern int cmd_mv(int argc, const char **argv, const char *prefix);
 extern int cmd_name_rev(int argc, const char **argv, const char *prefix);
 extern int cmd_pack_objects(int argc, const char **argv, const char *prefix);
+extern int cmd_patch_id(int argc, const char **argv, const char *prefix);
 extern int cmd_pickaxe(int argc, const char **argv, const char *prefix);
 extern int cmd_prune(int argc, const char **argv, const char *prefix);
 extern int cmd_prune_packed(int argc, const char **argv, const char *prefix);
diff --git a/git.c b/git.c
index 09d3272..e38f201 100644
--- a/git.c
+++ b/git.c
@@ -341,6 +341,7 @@ static void handle_internal_command(int argc, const char **argv)
 		{ "mv", cmd_mv, RUN_SETUP | NEED_WORK_TREE },
 		{ "name-rev", cmd_name_rev, RUN_SETUP },
 		{ "pack-objects", cmd_pack_objects, RUN_SETUP },
+		{ "patch-id", cmd_patch_id },
 		{ "peek-remote", cmd_ls_remote },
 		{ "pickaxe", cmd_blame, RUN_SETUP },
 		{ "prune", cmd_prune, RUN_SETUP },

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: Remove diff machinery dependency from read-cache
  2010-01-22  2:25             ` Linus Torvalds
  2010-01-22  3:50               ` Linus Torvalds
@ 2010-01-22  8:43               ` Johannes Sixt
  2010-01-22 11:47                 ` [PATCH] merge-tree: remove unnecessary call of git_extract_argv0_path Johannes Sixt
  1 sibling, 1 reply; 23+ messages in thread
From: Johannes Sixt @ 2010-01-22  8:43 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, Shawn O. Pearce, Git Mailing List

Linus Torvalds schrieb:
> @@ -347,8 +347,6 @@ int main(int argc, char **argv)
>  
>  	git_extract_argv0_path(argv[0]);

This line must go away as well.

> -	setup_git_directory();
> -

-- Hannes

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH] merge-tree: remove unnecessary call of git_extract_argv0_path
  2010-01-22  8:43               ` Johannes Sixt
@ 2010-01-22 11:47                 ` Johannes Sixt
  2010-01-22 16:40                   ` Linus Torvalds
  0 siblings, 1 reply; 23+ messages in thread
From: Johannes Sixt @ 2010-01-22 11:47 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, Shawn O. Pearce, git, Johannes Sixt

This call should have been removed when the utility was made a builtin by
907a7cb.

Signed-off-by: Johannes Sixt <j6t@kdbg.org>
---
 builtin-merge-tree.c |    2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/builtin-merge-tree.c b/builtin-merge-tree.c
index 8e16c3e..a4a4f2c 100644
--- a/builtin-merge-tree.c
+++ b/builtin-merge-tree.c
@@ -345,8 +345,6 @@ int cmd_merge_tree(int argc, const char **argv, const char *prefix)
 	if (argc != 4)
 		usage(merge_tree_usage);
 
-	git_extract_argv0_path(argv[0]);
-
 	buf1 = get_tree_descriptor(t+0, argv[1]);
 	buf2 = get_tree_descriptor(t+1, argv[2]);
 	buf3 = get_tree_descriptor(t+2, argv[3]);
-- 
1.6.6.1.372.g084d

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH] merge-tree: remove unnecessary call of git_extract_argv0_path
  2010-01-22 11:47                 ` [PATCH] merge-tree: remove unnecessary call of git_extract_argv0_path Johannes Sixt
@ 2010-01-22 16:40                   ` Linus Torvalds
  0 siblings, 0 replies; 23+ messages in thread
From: Linus Torvalds @ 2010-01-22 16:40 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Junio C Hamano, Shawn O. Pearce, git



On Fri, 22 Jan 2010, Johannes Sixt wrote:
>
> This call should have been removed when the utility was made a builtin by
> 907a7cb.

Ack.

			Linus

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Remove diff machinery dependency from read-cache
  2010-01-21 19:37 Remove diff machinery dependency from read-cache Linus Torvalds
  2010-01-21 20:07 ` Junio C Hamano
  2010-01-22  3:58 ` Linus Torvalds
@ 2010-01-23  6:31 ` Brian Campbell
  2 siblings, 0 replies; 23+ messages in thread
From: Brian Campbell @ 2010-01-23  6:31 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, Git Mailing List

On Jan 21, 2010, at 2:37 PM, Linus Torvalds wrote:

> ---
> builtin-add.c |   76 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
> read-cache.c  |   78 ---------------------------------------------------------
> 2 files changed, 76 insertions(+), 78 deletions(-)
> 
> diff --git a/builtin-add.c b/builtin-add.c
> index cb6e590..2705f8d 100644

I've recently tried building "master" from git://git.kernel.org/pub/scm/git/git.git, and got the following error:

    $ make
...snip...	
        LINK git-fast-import
    Undefined symbols:
      "_git_mailmap_file", referenced from:
          _git_default_config in libgit.a(config.o)
    ld: symbol(s) not found
    collect2: ld returned 1 exit status
    make: *** [git-fast-import] Error 1

When I bisect, I find this commit to blame:

    $ git bisect start master v1.6.6.1
    Bisecting: 197 revisions left to test after this (roughly 8 steps)
    [f8eb50f60b5c8efda3529fcf89517080c686ce0b] Merge branch 'jh/commit-status'
    $ git bisect run make -j2
    running make -j2
    GIT_VERSION = 1.6.6.240.gf8eb5
...snip...
    fb7d3f32b283a3847e6f151a06794abd14ffd81b is the first bad commit
    commit fb7d3f32b283a3847e6f151a06794abd14ffd81b
    Author: Linus Torvalds <torvalds@linux-foundation.org>
    Date:   Thu Jan 21 11:37:38 2010 -0800

I also verified that it fails from with "make clean; make", so a dirty tree or -j2 aren't to blame. I'm not terribly familiar with the code base, so I'm a bit puzzled about why this commit would have cause the error that it does; I don't see any obvious connection between add_files_to_cache() and git_mailmap_file. 

Can anyone explain why I'd be seeing such an error? Is this a problem on my end?

    $ uname -a
    Darwin erlang.local 10.2.0 Darwin Kernel Version 10.2.0: Tue Nov  3 10:37:10 PST 2009; root:xnu-1486.2.11~1/RELEASE_I386 i386
    $ gcc --version
    i686-apple-darwin10-gcc-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5646) (dot 1)
    Copyright (C) 2007 Free Software Foundation, Inc.
    This is free software; see the source for copying conditions.  There is NO
    warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2010-01-23  6:32 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-01-21 19:37 Remove diff machinery dependency from read-cache Linus Torvalds
2010-01-21 20:07 ` Junio C Hamano
2010-01-21 20:15   ` Linus Torvalds
2010-01-21 22:18     ` Linus Torvalds
2010-01-21 23:25       ` Linus Torvalds
2010-01-22  0:45         ` Junio C Hamano
2010-01-22  0:59           ` Johannes Schindelin
2010-01-22  1:01             ` Junio C Hamano
2010-01-22  1:43               ` Johannes Schindelin
2010-01-22  3:50                 ` Junio C Hamano
2010-01-22  2:20           ` Linus Torvalds
2010-01-22  2:25             ` Linus Torvalds
2010-01-22  3:50               ` Linus Torvalds
2010-01-22  8:43               ` Johannes Sixt
2010-01-22 11:47                 ` [PATCH] merge-tree: remove unnecessary call of git_extract_argv0_path Johannes Sixt
2010-01-22 16:40                   ` Linus Torvalds
2010-01-22  2:35           ` Remove diff machinery dependency from read-cache Nicolas Pitre
2010-01-22  2:44             ` Linus Torvalds
2010-01-22  3:56             ` Junio C Hamano
2010-01-22  4:21               ` Linus Torvalds
2010-01-22  4:31             ` Linus Torvalds
2010-01-22  3:58 ` Linus Torvalds
2010-01-23  6:31 ` Brian Campbell

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).