git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Erik Elfström" <erik.elfstrom@gmail.com>
To: git@vger.kernel.org
Cc: "Erik Elfström" <erik.elfstrom@gmail.com>
Subject: [PATCH 3/3] clean: improve performance when removing lots of directories
Date: Mon,  6 Apr 2015 13:48:24 +0200	[thread overview]
Message-ID: <1428320904-12366-4-git-send-email-erik.elfstrom@gmail.com> (raw)
In-Reply-To: <1428320904-12366-1-git-send-email-erik.elfstrom@gmail.com>

Before this change, clean used resolve_gitlink_ref to check for the
presence of nested git repositories. This had the drawback of creating
a ref_cache entry for every directory that should potentially be
cleaned. The linear search through the ref_cache list caused a massive
performance hit for large number of directories.

Teach clean.c:remove_dirs to use setup.c:is_git_directory
instead. is_git_directory will actually open HEAD and parse the HEAD
ref but this implies a nested git repository and should be rare when
cleaning.

Using is_git_directory should give a more standardized check for what
is and what isn't a git repository but also gives a slight behavioral
change. We will now detect and respect bare and empty nested git
repositories (only init run). Update t7300 to reflect this.

The time to clean an untracked directory containing 100000 sub
directories went from 61s to 1.7s after this change.

Signed-off-by: Erik Elfström <erik.elfstrom@gmail.com>
Helped-by: Jeff King <peff@peff.net>
---
 builtin/clean.c  | 23 +++++++++++++++++++----
 t/t7300-clean.sh |  4 ++--
 2 files changed, 21 insertions(+), 6 deletions(-)

diff --git a/builtin/clean.c b/builtin/clean.c
index 98c103f..e951bd9 100644
--- a/builtin/clean.c
+++ b/builtin/clean.c
@@ -10,7 +10,6 @@
 #include "cache.h"
 #include "dir.h"
 #include "parse-options.h"
-#include "refs.h"
 #include "string-list.h"
 #include "quote.h"
 #include "column.h"
@@ -148,6 +147,24 @@ static int exclude_cb(const struct option *opt, const char *arg, int unset)
 	return 0;
 }
 
+static int is_git_repository(struct strbuf *path)
+{
+	int ret = 0;
+	if (is_git_directory(path->buf))
+		ret = 1;
+	else {
+		int orig_path_len = path->len;
+		if (path->buf[orig_path_len - 1] != '/')
+			strbuf_addch(path, '/');
+		strbuf_addstr(path, ".git");
+		if (is_git_directory(path->buf))
+			ret = 1;
+		strbuf_setlen(path, orig_path_len);
+	}
+
+	return ret;
+}
+
 static int remove_dirs(struct strbuf *path, const char *prefix, int force_flag,
 		int dry_run, int quiet, int *dir_gone)
 {
@@ -155,13 +172,11 @@ static int remove_dirs(struct strbuf *path, const char *prefix, int force_flag,
 	struct strbuf quoted = STRBUF_INIT;
 	struct dirent *e;
 	int res = 0, ret = 0, gone = 1, original_len = path->len, len;
-	unsigned char submodule_head[20];
 	struct string_list dels = STRING_LIST_INIT_DUP;
 
 	*dir_gone = 1;
 
-	if ((force_flag & REMOVE_DIR_KEEP_NESTED_GIT) &&
-			!resolve_gitlink_ref(path->buf, "HEAD", submodule_head)) {
+	if ((force_flag & REMOVE_DIR_KEEP_NESTED_GIT) && is_git_repository(path)) {
 		if (!quiet) {
 			quote_path_relative(path->buf, prefix, &quoted);
 			printf(dry_run ?  _(msg_would_skip_git_dir) : _(msg_skip_git_dir),
diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh
index cfdf6d4..ada569d 100755
--- a/t/t7300-clean.sh
+++ b/t/t7300-clean.sh
@@ -455,7 +455,7 @@ test_expect_success 'nested git work tree' '
 	! test -d bar
 '
 
-test_expect_failure 'nested git (only init) should be kept' '
+test_expect_success 'nested git (only init) should be kept' '
 	rm -fr foo bar &&
 	mkdir foo bar &&
 	(
@@ -471,7 +471,7 @@ test_expect_failure 'nested git (only init) should be kept' '
 	! test -d bar
 '
 
-test_expect_failure 'nested git (bare) should be kept' '
+test_expect_success 'nested git (bare) should be kept' '
 	rm -fr foo bar &&
 	mkdir foo bar &&
 	(
-- 
2.4.0.rc0.37.ga3b75b3

  parent reply	other threads:[~2015-04-06 11:48 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-04-06 11:48 [PATCH 0/3] Improving performance of git clean Erik Elfström
2015-04-06 11:48 ` [PATCH 1/3] t7300: add tests to document behavior of clean and nested git Erik Elfström
2015-04-06 22:06   ` Eric Sunshine
2015-04-07 19:27     ` erik elfström
2015-04-07 19:40   ` Eric Sunshine
2015-04-07 19:53     ` Torsten Bögershausen
2015-04-06 11:48 ` [PATCH 2/3] p7300: added performance tests for clean Erik Elfström
2015-04-06 20:40   ` Torsten Bögershausen
2015-04-06 22:09     ` Eric Sunshine
2015-04-07 19:35       ` erik elfström
2015-04-06 11:48 ` Erik Elfström [this message]
2015-04-06 22:10   ` [PATCH 3/3] clean: improve performance when removing lots of directories Eric Sunshine
2015-04-07 19:55     ` erik elfström
2015-04-08 21:29       ` Eric Sunshine

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1428320904-12366-4-git-send-email-erik.elfstrom@gmail.com \
    --to=erik.elfstrom@gmail.com \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).