git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>
To: git@vger.kernel.org
Cc: "Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>
Subject: [PATCH 4/4] tree_entry_interesting: do basedir compare on wildcard patterns when possible
Date: Sun, 18 Nov 2012 16:13:09 +0700	[thread overview]
Message-ID: <1353229989-13075-5-git-send-email-pclouds@gmail.com> (raw)
In-Reply-To: <1353229989-13075-1-git-send-email-pclouds@gmail.com>

Currently we treat "*.c" and "path/to/*.c" the same way. Which means
we check all possible paths in repo against "path/to/*.c". One could
see that "path/elsewhere/foo.c" obviously cannot match "path/to/*.c"
and we only need to check all paths _inside_ "path/to/" against that
pattern.

This patch checks the leading fixed part of a pathspec against base
directory and exit early if possible. We could even optimize further
in "path/to/something*.c" case (i.e. check the fixed part against
name_entry as well) but that's more complicated and probably does not
gain us much.

-O2 build on linux-2.6, without and with this patch respectively:

$ time git rev-list --quiet HEAD -- 'drivers/*.c'

real    1m9.484s
user    1m9.128s
sys     0m0.181s

$ time ~/w/git/git rev-list --quiet HEAD -- 'drivers/*.c'

real    0m15.710s
user    0m15.564s
sys     0m0.107s

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 tree-walk.c | 63 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 62 insertions(+), 1 deletion(-)

diff --git a/tree-walk.c b/tree-walk.c
index 42fe610..dcc1015 100644
--- a/tree-walk.c
+++ b/tree-walk.c
@@ -573,6 +573,52 @@ static int match_dir_prefix(const char *base,
 }
 
 /*
+ * Perform matching on the leading non-wildcard part of
+ * pathspec. item->nowildcard_len must be greater than zero. Return
+ * non-zero if base is matched.
+ */
+static int match_wildcard_base(const struct pathspec_item *item,
+			       const char *base, int baselen,
+			       int *matched)
+{
+	const char *match = item->match;
+	/* the wildcard part is not considered in this function */
+	int matchlen = item->nowildcard_len;
+
+	if (baselen) {
+		int dirlen;
+		/*
+		 * Return early if base is longer than the
+		 * non-wildcard part but it does not match.
+		 */
+		if (baselen >= matchlen) {
+			*matched = matchlen;
+			return !strncmp(base, match, matchlen);
+		}
+
+		dirlen = matchlen;
+		while (dirlen && match[dirlen - 1] != '/')
+			dirlen--;
+
+		/* Return early if base is shorter than the
+		   non-wildcard part but it does not match. Note that
+		   base ends with '/' so we are sure it really matches
+		   directory */
+		if (strncmp(base, match, baselen))
+			return 0;
+		*matched = baselen;
+	} else
+		*matched = 0;
+	/*
+	 * we could have checked entry against the non-wildcard part
+	 * that is not in base and does similar never_interesting
+	 * optimization as in match_entry. For now just be happy with
+	 * base comparison.
+	 */
+	return entry_interesting;
+}
+
+/*
  * Is a tree entry interesting given the pathspec we have?
  *
  * Pre-condition: either baselen == base_offset (i.e. empty path)
@@ -602,7 +648,7 @@ enum interesting tree_entry_interesting(const struct name_entry *entry,
 		const struct pathspec_item *item = ps->items+i;
 		const char *match = item->match;
 		const char *base_str = base->buf + base_offset;
-		int matchlen = item->len;
+		int matchlen = item->len, matched = 0;
 
 		if (baselen >= matchlen) {
 			/* If it doesn't match, move along... */
@@ -647,9 +693,24 @@ match_wildcards:
 		if (item->nowildcard_len == item->len)
 			continue;
 
+		if (item->nowildcard_len &&
+		    !match_wildcard_base(item, base_str, baselen, &matched))
+			return entry_not_interesting;
+
 		/*
 		 * Concatenate base and entry->path into one and do
 		 * fnmatch() on it.
+		 *
+		 * While we could avoid concatenation in certain cases
+		 * [1], which saves a memcpy and potentially a
+		 * realloc, it turns out not worth it. Measurement on
+		 * linux-2.6 does not show any clear improvements,
+		 * partly because of the nowildcard_len optimization
+		 * in git_fnmatch(). Avoid micro-optimizations here.
+		 *
+		 * [1] if match_wildcard_base() says the base
+		 * directory is already matched, we only need to match
+		 * the rest, which is shorter so _in theory_ faster.
 		 */
 
 		strbuf_add(base, entry->path, pathlen);
-- 
1.8.0.rc2.23.g1fb49df

      parent reply	other threads:[~2012-11-18  9:13 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-11-18  9:13 [PATCH 0/4] Some pathspec wildcard optimization Nguyễn Thái Ngọc Duy
2012-11-18  9:13 ` [PATCH 1/4] pathspec: save the non-wildcard length part Nguyễn Thái Ngọc Duy
2012-11-19 20:48   ` Junio C Hamano
2012-11-18  9:13 ` [PATCH 2/4] pathspec: do exact comparison on the leading non-wildcard part Nguyễn Thái Ngọc Duy
2012-11-19 20:54   ` Junio C Hamano
2012-11-20 13:09     ` Nguyen Thai Ngoc Duy
2012-11-18  9:13 ` [PATCH 3/4] pathspec: apply "*.c" optimization from exclude Nguyễn Thái Ngọc Duy
2012-11-19 21:20   ` Junio C Hamano
2012-11-20 13:23     ` Nguyen Thai Ngoc Duy
2012-11-20 19:47       ` Junio C Hamano
2012-11-18  9:13 ` Nguyễn Thái Ngọc Duy [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1353229989-13075-5-git-send-email-pclouds@gmail.com \
    --to=pclouds@gmail.com \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).