git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jeff King <peff@peff.net>,
	Michael J Gruber <git@drmicha.warpmail.net>,
	Jon Schewe <jpschewe@mtu.net>,
	spearce@spearce.org, git@vger.kernel.org
Subject: Re: [PATCH] ls-files: fix overeager pathspec optimization
Date: Fri, 08 Jan 2010 16:54:34 -0800	[thread overview]
Message-ID: <7v8wc8kshh.fsf@alter.siamese.dyndns.org> (raw)
In-Reply-To: <alpine.LFD.2.00.1001081619570.7821@localhost.localdomain> (Linus Torvalds's message of "Fri\, 8 Jan 2010 16\:24\:21 -0800 \(PST\)")

Linus Torvalds <torvalds@linux-foundation.org> writes:

> I have this memory that _used_ to have a per-filename flag in "git add" 
> that checked if that particular filename component was used or not. But 
> now it just looks at 'dir->ignored_nr' and 'dir->ignored[]'.

Yes, and the previous patch wasn't adding what is ignored to the array, so
here is a re-roll to fix that in addition to the fix to "should the loop
start from checking an empty path?" issue you noticed.

But I am starting to wonder if we might be better off restructuring
read_directory_recursive().  Currently it assumes that the path it was
given _must_ be of interest (i.e. not ignored) and runs excluded() on
subdirectories it finds to make that same decision before recursing into
them or skipping them.  It might make more sense if it first checked if
the path given by the caller should be ignored and act accordingly. If it
is to be ignored, perhaps it will simply return without doing anything
(normal case), or it will return but adds the path to ignored[]
(DIR_COLLECT_IGNORED case), or it will recurse into itself but tell it
that everything it finds is to be ignored.  I dunno...

-- >8 --
Subject: [PATCH (v3)] ls-files: fix overeager pathspec optimization

Given pathspecs that share a common prefix, ls-files optimized its call
into recursive directory reader by starting at the common prefix
directory.

If you have a directory "t" with an untracked file "t/junk" in it, but the
top-level .gitignore file told us to ignore "t/", this resulted in:

    $ git ls-files -o --exclude-standard
    $ git ls-files -o --exclude-standard t/
    t/junk
    $ git ls-files -o --exclude-standard t/junk
    t/junk
    $ cd t && git ls-files -o --exclude-standard
    junk

We could argue that you are overriding the ignore file by giving a
patchspec that matches or being in that directory, but it is somewhat
unexpected.  Worse yet, these behave differently:

    $ git ls-files -o --exclude-standard t/ .
    $ git ls-files -o --exclude-standard t/
    t/junk

This patch changes the optimization so that it notices when the common
prefix directory that it starts reading from is an ignored one.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 dir.c                              |   36 +++++++++++++++++++++++++++++++++
 t/t3001-ls-files-others-exclude.sh |   39 ++++++++++++++++++++++++++++++++++++
 2 files changed, 75 insertions(+), 0 deletions(-)

diff --git a/dir.c b/dir.c
index d0999ba..e8be909 100644
--- a/dir.c
+++ b/dir.c
@@ -778,6 +778,39 @@ static void free_simplify(struct path_simplify *simplify)
 	free(simplify);
 }
 
+static int has_leading_ignored_dir(struct dir_struct *dir,
+				   const char *path_, int len,
+				   const struct path_simplify *simplify)
+{
+	int dtype = DT_DIR;
+	char path[PATH_MAX], *cp = path;
+
+	memcpy(path, path_, len);
+	while (1) {
+		char *next = memchr(cp, '/', path + len - cp);
+		int exclude;
+
+		/*
+		 * NOTE! NOTE! NOTE!: we might want to actually lstat(2)
+		 * path[] to make sure it is a directory.
+		 */
+		if (next)
+			*next = '\0';
+		exclude = excluded(dir, path, &dtype);
+		if (next)
+			*next = '/';
+		if (exclude) {
+			if (dir->flags & DIR_COLLECT_IGNORED)
+				dir_add_ignored(dir, path, len);
+			return 1;
+		}
+		if (!next)
+			break;
+		cp = next + 1;
+	}
+	return 0;
+}
+
 int read_directory(struct dir_struct *dir, const char *path, int len, const char **pathspec)
 {
 	struct path_simplify *simplify;
@@ -786,6 +819,9 @@ int read_directory(struct dir_struct *dir, const char *path, int len, const char
 		return dir->nr;
 
 	simplify = create_simplify(pathspec);
+	if (has_leading_ignored_dir(dir, path, len, simplify))
+		return dir->nr;
+
 	read_directory_recursive(dir, path, len, 0, simplify);
 	free_simplify(simplify);
 	qsort(dir->entries, dir->nr, sizeof(struct dir_entry *), cmp_name);
diff --git a/t/t3001-ls-files-others-exclude.sh b/t/t3001-ls-files-others-exclude.sh
index c65bca8..9e71260 100755
--- a/t/t3001-ls-files-others-exclude.sh
+++ b/t/t3001-ls-files-others-exclude.sh
@@ -153,4 +153,43 @@ test_expect_success 'negated exclude matches can override previous ones' '
 	grep "^a.1" output
 '
 
+test_expect_success 'subdirectory ignore (setup)' '
+	mkdir -p top/l1/l2 &&
+	(
+		cd top &&
+		git init &&
+		echo /.gitignore >.gitignore &&
+		echo l1 >>.gitignore &&
+		echo l2 >l1/.gitignore &&
+		>l1/l2/l1
+	)
+'
+
+test_expect_success 'subdirectory ignore (toplevel)' '
+	(
+		cd top &&
+		git ls-files -o --exclude-standard
+	) >actual &&
+	>expect &&
+	test_cmp expect actual
+'
+
+test_expect_success 'subdirectory ignore (l1/l2)' '
+	(
+		cd top/l1/l2 &&
+		git ls-files -o --exclude-standard
+	) >actual &&
+	>expect &&
+	test_cmp expect actual
+'
+
+test_expect_success 'subdirectory ignore (l1)' '
+	(
+		cd top/l1 &&
+		git ls-files -o --exclude-standard
+	) >actual &&
+	>expect &&
+	test_cmp expect actual
+'
+
 test_done
-- 
1.6.6.209.g52296.dirty

  reply	other threads:[~2010-01-09  0:54 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-01-08 15:17 Possible bug in git-completion.sh Jon Schewe
2010-01-08 15:40 ` Michael J Gruber
2010-01-08 16:24   ` Jeff King
2010-01-08 16:38     ` Junio C Hamano
2010-01-08 16:41       ` Jeff King
2010-01-08 16:45         ` Junio C Hamano
2010-01-08 16:56           ` Junio C Hamano
2010-01-08 17:24             ` Jeff King
2010-01-08 17:21           ` Jeff King
2010-01-08 18:21           ` Junio C Hamano
2010-01-08 19:58             ` Junio C Hamano
2010-01-08 23:01               ` [PATCH] ls-files: fix overeager pathspec optimization Junio C Hamano
2010-01-08 23:24                 ` Linus Torvalds
2010-01-08 23:31                   ` Junio C Hamano
2010-01-09  0:06                   ` Junio C Hamano
2010-01-09  0:24                     ` Linus Torvalds
2010-01-09  0:54                       ` Junio C Hamano [this message]
2010-01-09  1:07                         ` Linus Torvalds
2010-01-09  5:42                           ` Jeff King
2010-01-09  7:16                           ` Junio C Hamano
2010-01-09  7:35                             ` [PATCH 1/4] t3001: test ls-files -o ignored/dir Junio C Hamano
2010-01-09  7:35                               ` [PATCH 2/4] read_directory_recursive(): refactor handling of a single path into a separate function Junio C Hamano
2010-01-09  7:35                               ` [PATCH 3/4] read_directory(): further split treat_path() Junio C Hamano
2010-01-09  7:35                               ` [PATCH 4/4] ls-files: fix overeager pathspec optimization Junio C Hamano
2010-01-12 16:33                                 ` Jeff King
2010-01-09  8:07                             ` [PATCH] " Junio C Hamano
2010-01-09 18:05                               ` Linus Torvalds
2010-01-10  6:31                                 ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7v8wc8kshh.fsf@alter.siamese.dyndns.org \
    --to=gitster@pobox.com \
    --cc=git@drmicha.warpmail.net \
    --cc=git@vger.kernel.org \
    --cc=jpschewe@mtu.net \
    --cc=peff@peff.net \
    --cc=spearce@spearce.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).