git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Linus Torvalds <torvalds@linux-foundation.org>
To: Junio C Hamano <gitster@pobox.com>,
	Git Mailing List <git@vger.kernel.org>
Subject: Remove diff machinery dependency from read-cache
Date: Thu, 21 Jan 2010 11:37:38 -0800 (PST)	[thread overview]
Message-ID: <alpine.LFD.2.00.1001211119130.13231@localhost.localdomain> (raw)


Exal Sibeaz pointed out that some git files are way too big, and that 
add_files_to_cache() brings in all the diff machinery to any git binary 
that needs the basic git SHA1 object operations from read-cache.c. Which 
is pretty much all of them.

It's doubly silly, since add_files_to_cache() is only used by builtin 
programs (add, checkout and commit), so it's fairly easily fixed by just 
moving the thing to builtin-add.c, and avoiding the dependency entirely.

I initially argued to Exal that it would probably be best to try to depend 
on smart compilers and linkers, but after spending some time trying to 
make -ffunction-sections work and giving up, I think Exal was right, and 
the fix is to just do some trivial cleanups like this.

This trivial cleanup results in pretty stunning file size differences. 
The diff machinery really is mostly used by just the builtin programs, and 
you have things like these trivial before-and-after numbers:

  -rwxr-xr-x 1 torvalds torvalds 1727420 2010-01-21 10:53 git-hash-object
  -rwxrwxr-x 1 torvalds torvalds  940265 2010-01-21 11:16 git-hash-object

Now, I'm not saying that 940kB is good either, but that's mostly all the 
debug information - you can see the real code with 'size':

   text	   data	    bss	    dec	    hex	filename
 418675	   3920	 127408	 550003	  86473	git-hash-object (before)
 230650	   2288	 111728	 344666	  5425a	git-hash-object (after)

ie we have a nice 24% size reduction from this trivial cleanup.

It's not just that one file either. I get:

	[torvalds@nehalem git]$ du -s /home/torvalds/libexec/git-core
	45640	/home/torvalds/libexec/git-core (before)
	33508	/home/torvalds/libexec/git-core (after)

so we're talking 12MB of diskspace here. 

(Of course, stripping all the binaries brings the 33MB down to 9MB, so the 
whole debug information thing is still the bulk of it all, but that's a 
separate issue entirely)

Now, I'm sure there are other things we should do, and changing our 
compiler flags from -O2 to -Os would bring the text size down by an 
additional almost 20%, but this thing Exal pointed out seems to be some 
good low-hanging fruit.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 builtin-add.c |   76 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 read-cache.c  |   78 ---------------------------------------------------------
 2 files changed, 76 insertions(+), 78 deletions(-)

diff --git a/builtin-add.c b/builtin-add.c
index cb6e590..2705f8d 100644
--- a/builtin-add.c
+++ b/builtin-add.c
@@ -11,6 +11,7 @@
 #include "run-command.h"
 #include "parse-options.h"
 #include "diff.h"
+#include "diffcore.h"
 #include "revision.h"
 
 static const char * const builtin_add_usage[] = {
@@ -20,6 +21,81 @@ static const char * const builtin_add_usage[] = {
 static int patch_interactive, add_interactive, edit_interactive;
 static int take_worktree_changes;
 
+struct update_callback_data
+{
+	int flags;
+	int add_errors;
+};
+
+static void update_callback(struct diff_queue_struct *q,
+			    struct diff_options *opt, void *cbdata)
+{
+	int i;
+	struct update_callback_data *data = cbdata;
+
+	for (i = 0; i < q->nr; i++) {
+		struct diff_filepair *p = q->queue[i];
+		const char *path = p->one->path;
+		switch (p->status) {
+		default:
+			die("unexpected diff status %c", p->status);
+		case DIFF_STATUS_UNMERGED:
+			/*
+			 * ADD_CACHE_IGNORE_REMOVAL is unset if "git
+			 * add -u" is calling us, In such a case, a
+			 * missing work tree file needs to be removed
+			 * if there is an unmerged entry at stage #2,
+			 * but such a diff record is followed by
+			 * another with DIFF_STATUS_DELETED (and if
+			 * there is no stage #2, we won't see DELETED
+			 * nor MODIFIED).  We can simply continue
+			 * either way.
+			 */
+			if (!(data->flags & ADD_CACHE_IGNORE_REMOVAL))
+				continue;
+			/*
+			 * Otherwise, it is "git add path" is asking
+			 * to explicitly add it; we fall through.  A
+			 * missing work tree file is an error and is
+			 * caught by add_file_to_index() in such a
+			 * case.
+			 */
+		case DIFF_STATUS_MODIFIED:
+		case DIFF_STATUS_TYPE_CHANGED:
+			if (add_file_to_index(&the_index, path, data->flags)) {
+				if (!(data->flags & ADD_CACHE_IGNORE_ERRORS))
+					die("updating files failed");
+				data->add_errors++;
+			}
+			break;
+		case DIFF_STATUS_DELETED:
+			if (data->flags & ADD_CACHE_IGNORE_REMOVAL)
+				break;
+			if (!(data->flags & ADD_CACHE_PRETEND))
+				remove_file_from_index(&the_index, path);
+			if (data->flags & (ADD_CACHE_PRETEND|ADD_CACHE_VERBOSE))
+				printf("remove '%s'\n", path);
+			break;
+		}
+	}
+}
+
+int add_files_to_cache(const char *prefix, const char **pathspec, int flags)
+{
+	struct update_callback_data data;
+	struct rev_info rev;
+	init_revisions(&rev, prefix);
+	setup_revisions(0, NULL, &rev, NULL);
+	rev.prune_data = pathspec;
+	rev.diffopt.output_format = DIFF_FORMAT_CALLBACK;
+	rev.diffopt.format_callback = update_callback;
+	data.flags = flags;
+	data.add_errors = 0;
+	rev.diffopt.format_callback_data = &data;
+	run_diff_files(&rev, DIFF_RACY_IS_MODIFIED);
+	return !!data.add_errors;
+}
+
 static void fill_pathspec_matches(const char **pathspec, char *seen, int specs)
 {
 	int num_unmatched = 0, i;
diff --git a/read-cache.c b/read-cache.c
index edd9959..79938bf 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -10,9 +10,6 @@
 #include "dir.h"
 #include "tree.h"
 #include "commit.h"
-#include "diff.h"
-#include "diffcore.h"
-#include "revision.h"
 #include "blob.h"
 #include "resolve-undo.h"
 
@@ -1648,81 +1645,6 @@ int read_index_unmerged(struct index_state *istate)
 	return unmerged;
 }
 
-struct update_callback_data
-{
-	int flags;
-	int add_errors;
-};
-
-static void update_callback(struct diff_queue_struct *q,
-			    struct diff_options *opt, void *cbdata)
-{
-	int i;
-	struct update_callback_data *data = cbdata;
-
-	for (i = 0; i < q->nr; i++) {
-		struct diff_filepair *p = q->queue[i];
-		const char *path = p->one->path;
-		switch (p->status) {
-		default:
-			die("unexpected diff status %c", p->status);
-		case DIFF_STATUS_UNMERGED:
-			/*
-			 * ADD_CACHE_IGNORE_REMOVAL is unset if "git
-			 * add -u" is calling us, In such a case, a
-			 * missing work tree file needs to be removed
-			 * if there is an unmerged entry at stage #2,
-			 * but such a diff record is followed by
-			 * another with DIFF_STATUS_DELETED (and if
-			 * there is no stage #2, we won't see DELETED
-			 * nor MODIFIED).  We can simply continue
-			 * either way.
-			 */
-			if (!(data->flags & ADD_CACHE_IGNORE_REMOVAL))
-				continue;
-			/*
-			 * Otherwise, it is "git add path" is asking
-			 * to explicitly add it; we fall through.  A
-			 * missing work tree file is an error and is
-			 * caught by add_file_to_index() in such a
-			 * case.
-			 */
-		case DIFF_STATUS_MODIFIED:
-		case DIFF_STATUS_TYPE_CHANGED:
-			if (add_file_to_index(&the_index, path, data->flags)) {
-				if (!(data->flags & ADD_CACHE_IGNORE_ERRORS))
-					die("updating files failed");
-				data->add_errors++;
-			}
-			break;
-		case DIFF_STATUS_DELETED:
-			if (data->flags & ADD_CACHE_IGNORE_REMOVAL)
-				break;
-			if (!(data->flags & ADD_CACHE_PRETEND))
-				remove_file_from_index(&the_index, path);
-			if (data->flags & (ADD_CACHE_PRETEND|ADD_CACHE_VERBOSE))
-				printf("remove '%s'\n", path);
-			break;
-		}
-	}
-}
-
-int add_files_to_cache(const char *prefix, const char **pathspec, int flags)
-{
-	struct update_callback_data data;
-	struct rev_info rev;
-	init_revisions(&rev, prefix);
-	setup_revisions(0, NULL, &rev, NULL);
-	rev.prune_data = pathspec;
-	rev.diffopt.output_format = DIFF_FORMAT_CALLBACK;
-	rev.diffopt.format_callback = update_callback;
-	data.flags = flags;
-	data.add_errors = 0;
-	rev.diffopt.format_callback_data = &data;
-	run_diff_files(&rev, DIFF_RACY_IS_MODIFIED);
-	return !!data.add_errors;
-}
-
 /*
  * Returns 1 if the path is an "other" path with respect to
  * the index; that is, the path is not mentioned in the index at all,

             reply	other threads:[~2010-01-21 19:45 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-01-21 19:37 Linus Torvalds [this message]
2010-01-21 20:07 ` Remove diff machinery dependency from read-cache Junio C Hamano
2010-01-21 20:15   ` Linus Torvalds
2010-01-21 22:18     ` Linus Torvalds
2010-01-21 23:25       ` Linus Torvalds
2010-01-22  0:45         ` Junio C Hamano
2010-01-22  0:59           ` Johannes Schindelin
2010-01-22  1:01             ` Junio C Hamano
2010-01-22  1:43               ` Johannes Schindelin
2010-01-22  3:50                 ` Junio C Hamano
2010-01-22  2:20           ` Linus Torvalds
2010-01-22  2:25             ` Linus Torvalds
2010-01-22  3:50               ` Linus Torvalds
2010-01-22  8:43               ` Johannes Sixt
2010-01-22 11:47                 ` [PATCH] merge-tree: remove unnecessary call of git_extract_argv0_path Johannes Sixt
2010-01-22 16:40                   ` Linus Torvalds
2010-01-22  2:35           ` Remove diff machinery dependency from read-cache Nicolas Pitre
2010-01-22  2:44             ` Linus Torvalds
2010-01-22  3:56             ` Junio C Hamano
2010-01-22  4:21               ` Linus Torvalds
2010-01-22  4:31             ` Linus Torvalds
2010-01-22  3:58 ` Linus Torvalds
2010-01-23  6:31 ` Brian Campbell

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.LFD.2.00.1001211119130.13231@localhost.localdomain \
    --to=torvalds@linux-foundation.org \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).