git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: git@vger.kernel.org
Subject: [PATCH 2/2] builtin-add.c: optimize -A option and "git add $paths..."
Date: Wed, 23 Jul 2008 00:01:42 -0700	[thread overview]
Message-ID: <1216796502-11227-2-git-send-email-gitster@pobox.com> (raw)
In-Reply-To: <1216796502-11227-1-git-send-email-gitster@pobox.com>

The earlier "git add -A" change was done in a quite inefficient
way (i.e. it is as unefficient as "git add -u && git add ." modulo
one fork/exec and read/write index).

For that matter, the original "git add $paths..." could be improved.

When the user asks "git add .", we do not have to examine all paths
we encounter and perform the excluded() and dir_add_name()
processing, both of which are slower code and use slower data structure
by git standards, especially when the index is already reasonably well
populated.

Instead, we implement "git add $pathspec..." as:

 - read the index;

 - read_directory() to process untracked, unignored files the current
   way, that is, recursively doing readdir(), filtering them by pathspec
   and excluded(), queueing them via dir_add_name() and finally do
   add_files(); and

 - iterate over the index, filtering them by pathspec, and update only
   the modified/type changed paths but not deleted ones.

And "git add -A" becomes exactly the same as above, modulo:

 - missing $pathspec means "." instead of being an error; and

 - "iterate over the index" part handles deleted ones as well,
   i.e. exactly what the current update_callback() in builtin-add.c does.

In either case, because fill_directory() does not use read_directory() to
read everything in, we need to add an extra logic to iterate over the
index to catch mistyped pathspec.

In a fully populated kernel tree (without any local change), best of ten
runs:

(with this patch)
$ /usr/bin/time ~/git-test/bin/git add .
0.16user 0.17system 0:00.34elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+3231minor)pagefaults 0swaps

(without this patch)
$ /usr/bin/time ~/git-master/bin/git add .
0.21user 0.21system 0:00.42elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+3647minor)pagefaults 0swaps

Under the same condition, but after "rm .git/index" to force re-indexing
everything (the patch should not make any difference):

(with this patch)
$ /usr/bin/time ~/git-test/bin/git add .
3.22user 2.10system 1:35.10elapsed 5%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (30037major+63003minor)pagefaults 0swaps

(without this patch)
$ /usr/bin/time ~/git-master/bin/git add .
3.31user 2.28system 1:44.77elapsed 5%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (30472major+62563minor)pagefaults 0swaps

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin-add.c |   43 +++++++++++++++++++++++++++++++------------
 1 files changed, 31 insertions(+), 12 deletions(-)

diff --git a/builtin-add.c b/builtin-add.c
index 0de516a..1834e2d 100644
--- a/builtin-add.c
+++ b/builtin-add.c
@@ -18,6 +18,27 @@ static const char * const builtin_add_usage[] = {
 static int patch_interactive = 0, add_interactive = 0;
 static int take_worktree_changes;
 
+static void fill_pathspec_matches(const char **pathspec, char *seen, int specs)
+{
+	int num_unmatched = 0, i;
+
+	/*
+	 * Since we are walking the index as if we are warlking the directory,
+	 * we have to mark the matched pathspec as seen; otherwise we will
+	 * mistakenly think that the user gave a pathspec that did not match
+	 * anything.
+	 */
+	for (i = 0; i < specs; i++)
+		if (!seen[i])
+			num_unmatched++;
+	if (!num_unmatched)
+		return;
+	for (i = 0; i < active_nr; i++) {
+		struct cache_entry *ce = active_cache[i];
+		match_pathspec(pathspec, ce->name, ce_namelen(ce), 0, seen);
+	}
+}
+
 static void prune_directory(struct dir_struct *dir, const char **pathspec, int prefix)
 {
 	char *seen;
@@ -37,6 +58,7 @@ static void prune_directory(struct dir_struct *dir, const char **pathspec, int p
 			*dst++ = entry;
 	}
 	dir->nr = dst - dir->entries;
+	fill_pathspec_matches(pathspec, seen, specs);
 
 	for (i = 0; i < specs; i++) {
 		if (!seen[i] && !file_exists(pathspec[i]))
@@ -201,7 +223,7 @@ int cmd_add(int argc, const char **argv, const char *prefix)
 
 	if (addremove && take_worktree_changes)
 		die("-A and -u are mutually incompatible");
-	if (addremove && !argc) {
+	if ((addremove || take_worktree_changes) && !argc) {
 		static const char *here[2] = { ".", NULL };
 		argc = 1;
 		argv = here;
@@ -214,7 +236,9 @@ int cmd_add(int argc, const char **argv, const char *prefix)
 
 	flags = ((verbose ? ADD_CACHE_VERBOSE : 0) |
 		 (show_only ? ADD_CACHE_PRETEND : 0) |
-		 (ignore_add_errors ? ADD_CACHE_IGNORE_ERRORS : 0));
+		 (ignore_add_errors ? ADD_CACHE_IGNORE_ERRORS : 0) |
+		 (!(addremove || take_worktree_changes)
+		  ? ADD_CACHE_IGNORE_REMOVAL : 0));
 
 	if (require_pathspec && argc == 0) {
 		fprintf(stderr, "Nothing specified, nothing added.\n");
@@ -223,24 +247,19 @@ int cmd_add(int argc, const char **argv, const char *prefix)
 	}
 	pathspec = get_pathspec(prefix, argv);
 
-	/*
-	 * If we are adding new files, we need to scan the working
-	 * tree to find the ones that match pathspecs; this needs
-	 * to be done before we read the index.
-	 */
-	if (add_new_files)
-		fill_directory(&dir, pathspec, ignored_too);
-
 	if (read_cache() < 0)
 		die("index file corrupt");
 
+	if (add_new_files)
+		/* This picks up the paths that are not tracked */
+		fill_directory(&dir, pathspec, ignored_too);
+
 	if (refresh_only) {
 		refresh(verbose, pathspec);
 		goto finish;
 	}
 
-	if (take_worktree_changes || addremove)
-		exit_status |= add_files_to_cache(prefix, pathspec, flags);
+	exit_status |= add_files_to_cache(prefix, pathspec, flags);
 
 	if (add_new_files)
 		exit_status |= add_files(&dir, flags);
-- 
1.6.0.rc0.15.g0dda1

      reply	other threads:[~2008-07-23  7:03 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-07-23  7:01 [PATCH 1/2] builtin-add.c: restructure the code for maintainability Junio C Hamano
2008-07-23  7:01 ` Junio C Hamano [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1216796502-11227-2-git-send-email-gitster@pobox.com \
    --to=gitster@pobox.com \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).