From: "Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>
To: git@vger.kernel.org
Cc: "Junio C Hamano" <gitster@pobox.com>,
"Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>
Subject: [PATCH v2 3/4] pathspec: apply "*.c" optimization from exclude
Date: Sat, 24 Nov 2012 11:33:50 +0700 [thread overview]
Message-ID: <1353731631-20593-4-git-send-email-pclouds@gmail.com> (raw)
In-Reply-To: <1353731631-20593-1-git-send-email-pclouds@gmail.com>
When a pattern contains only a single asterisk as wildcard,
e.g. "foo*bar", after literally comparing the leading part "foo" with
the string, we can compare the tail of the string and make sure it
matches "bar", instead of running fnmatch() on "*bar" against the
remainder of the string.
-O2 build on linux-2.6, without the patch:
$ time git rev-list --quiet HEAD -- '*.c'
real 0m40.770s
user 0m40.290s
sys 0m0.256s
With the patch
$ time ~/w/git/git rev-list --quiet HEAD -- '*.c'
real 0m34.288s
user 0m33.997s
sys 0m0.205s
The above command is not supposed to be widely popular. It's chosen
because it exercises pathspec matching a lot. The point is it cuts
down matching time for popular patterns like *.c, which could be used
as pathspec in other places.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
The optimization's condition could be loosen to "only one asterisk"
(iow, fixed-length wildcards are OK, e.g. "abc[de]gh*.[ch]") when we
have a custom fnmatch implementation that supports dry-run, so it can
parse the pattern and determine the actual string length of
"abc[de]gh" and ".[ch]". We would have something similar to regcomp in
addition to git_fnmatch().
cache.h | 3 +++
dir.c | 18 ++++++++++++++++--
dir.h | 1 +
tree-walk.c | 6 ++++--
4 files changed, 24 insertions(+), 4 deletions(-)
diff --git a/cache.h b/cache.h
index bf031f1..babf9e5 100644
--- a/cache.h
+++ b/cache.h
@@ -473,6 +473,8 @@ extern int index_name_is_other(const struct index_state *, const char *, int);
extern int ie_match_stat(const struct index_state *, struct cache_entry *, struct stat *, unsigned int);
extern int ie_modified(const struct index_state *, struct cache_entry *, struct stat *, unsigned int);
+#define PATHSPEC_ONESTAR 1 /* the pathspec pattern sastisfies GFNM_ONESTAR */
+
struct pathspec {
const char **raw; /* get_pathspec() result, not freed by free_pathspec() */
int nr;
@@ -483,6 +485,7 @@ struct pathspec {
const char *match;
int len;
int nowildcard_len;
+ int flags;
} *items;
};
diff --git a/dir.c b/dir.c
index f81e1d2..9afd388 100644
--- a/dir.c
+++ b/dir.c
@@ -46,6 +46,13 @@ inline int git_fnmatch(const char *pattern, const char *string,
pattern += prefix;
string += prefix;
}
+ if (flags & GFNM_ONESTAR) {
+ int pattern_len = strlen(++pattern);
+ int string_len = strlen(string);
+ return string_len < pattern_len ||
+ strcmp(pattern,
+ string + string_len - pattern_len);
+ }
return fnmatch(pattern, string, fnm_flags);
}
@@ -246,7 +253,9 @@ static int match_pathspec_item(const struct pathspec_item *item, int prefix,
}
if (item->nowildcard_len < item->len &&
- !git_fnmatch(match, name, 0, item->nowildcard_len - prefix))
+ !git_fnmatch(match, name,
+ item->flags & PATHSPEC_ONESTAR ? GFNM_ONESTAR : 0,
+ item->nowildcard_len - prefix))
return MATCHED_FNMATCH;
return 0;
@@ -1446,8 +1455,13 @@ int init_pathspec(struct pathspec *pathspec, const char **paths)
item->match = path;
item->len = strlen(path);
item->nowildcard_len = simple_length(path);
- if (item->nowildcard_len < item->len)
+ item->flags = 0;
+ if (item->nowildcard_len < item->len) {
pathspec->has_wildcard = 1;
+ if (path[item->nowildcard_len] == '*' &&
+ no_wildcard(path + item->nowildcard_len + 1))
+ item->flags |= PATHSPEC_ONESTAR;
+ }
}
qsort(pathspec->items, pathspec->nr,
diff --git a/dir.h b/dir.h
index 0e8ae84..ab5af42 100644
--- a/dir.h
+++ b/dir.h
@@ -143,6 +143,7 @@ extern int fnmatch_icase(const char *pattern, const char *string, int flags);
* The prefix part of pattern must not contains wildcards.
*/
#define GFNM_PATHNAME 1 /* similar to FNM_PATHNAME */
+#define GFNM_ONESTAR 2 /* there is only _one_ wildcard, a star */
extern int git_fnmatch(const char *pattern, const char *string,
int flags, int prefix);
diff --git a/tree-walk.c b/tree-walk.c
index 2fcf3c0..585899e 100644
--- a/tree-walk.c
+++ b/tree-walk.c
@@ -628,7 +628,8 @@ enum interesting tree_entry_interesting(const struct name_entry *entry,
if (item->nowildcard_len < item->len) {
if (!git_fnmatch(match + baselen, entry->path,
- 0, item->nowildcard_len - baselen))
+ item->flags & PATHSPEC_ONESTAR ? GFNM_ONESTAR : 0,
+ item->nowildcard_len - baselen))
return entry_interesting;
/*
@@ -654,7 +655,8 @@ match_wildcards:
strbuf_add(base, entry->path, pathlen);
if (!git_fnmatch(match, base->buf + base_offset,
- 0, item->nowildcard_len)) {
+ item->flags & PATHSPEC_ONESTAR ? GFNM_ONESTAR : 0,
+ item->nowildcard_len)) {
strbuf_setlen(base, base_offset + baselen);
return entry_interesting;
}
--
1.8.0.rc2.23.g1fb49df
next prev parent reply other threads:[~2012-11-24 4:34 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-11-24 4:33 [PATCH v2 0/4] nd/pathspec-wildcard Nguyễn Thái Ngọc Duy
2012-11-24 4:33 ` [PATCH v2 1/4] pathspec: save the non-wildcard length part Nguyễn Thái Ngọc Duy
2012-11-24 4:33 ` [PATCH v2 2/4] pathspec: do exact comparison on the leading non-wildcard part Nguyễn Thái Ngọc Duy
2012-11-24 4:33 ` Nguyễn Thái Ngọc Duy [this message]
2012-11-24 4:33 ` [PATCH v2 4/4] tree_entry_interesting: do basedir compare on wildcard patterns when possible Nguyễn Thái Ngọc Duy
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1353731631-20593-4-git-send-email-pclouds@gmail.com \
--to=pclouds@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.