* [PATCH v2 0/4] nd/pathspec-wildcard
@ 2012-11-24 4:33 Nguyễn Thái Ngọc Duy
2012-11-24 4:33 ` [PATCH v2 1/4] pathspec: save the non-wildcard length part Nguyễn Thái Ngọc Duy
` (3 more replies)
0 siblings, 4 replies; 5+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2012-11-24 4:33 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy
The only change (apart from what Junio's made after I sent the series)
is rename GF_* to GFNM_* and PSF_* to PATHSPEC_* with a brief explanation
for each flag.
.gitignore code does not have "foo*oob" bug that the original series has.
Nguyễn Thái Ngọc Duy (4):
pathspec: save the non-wildcard length part
pathspec: do exact comparison on the leading non-wildcard part
pathspec: apply "*.c" optimization from exclude
tree_entry_interesting: do basedir compare on wildcard patterns when
possible
builtin/ls-files.c | 2 +-
builtin/ls-tree.c | 2 +-
cache.h | 5 +++-
dir.c | 36 +++++++++++++++++++++++---
dir.h | 9 +++++++
tree-walk.c | 75 ++++++++++++++++++++++++++++++++++++++++++++++++++----
6 files changed, 118 insertions(+), 11 deletions(-)
--
1.8.0.rc2.23.g1fb49df
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH v2 1/4] pathspec: save the non-wildcard length part
2012-11-24 4:33 [PATCH v2 0/4] nd/pathspec-wildcard Nguyễn Thái Ngọc Duy
@ 2012-11-24 4:33 ` Nguyễn Thái Ngọc Duy
2012-11-24 4:33 ` [PATCH v2 2/4] pathspec: do exact comparison on the leading non-wildcard part Nguyễn Thái Ngọc Duy
` (2 subsequent siblings)
3 siblings, 0 replies; 5+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2012-11-24 4:33 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy
We mark pathspec with wildcards with the field use_wildcard. We
could do better by saving the length of the non-wildcard part, which
can be used for optimizations such as f9f6e2c (exclude: do strcmp as
much as possible before fnmatch - 2012-06-07).
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
builtin/ls-files.c | 2 +-
builtin/ls-tree.c | 2 +-
cache.h | 2 +-
dir.c | 6 +++---
tree-walk.c | 4 ++--
5 files changed, 8 insertions(+), 8 deletions(-)
diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index b5434af..4a9ee69 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -337,7 +337,7 @@ void overlay_tree_on_cache(const char *tree_name, const char *prefix)
matchbuf[0] = prefix;
matchbuf[1] = NULL;
init_pathspec(&pathspec, matchbuf);
- pathspec.items[0].use_wildcard = 0;
+ pathspec.items[0].nowildcard_len = pathspec.items[0].len;
} else
init_pathspec(&pathspec, NULL);
if (read_tree(tree, 1, &pathspec))
diff --git a/builtin/ls-tree.c b/builtin/ls-tree.c
index 235c17c..fb76e38 100644
--- a/builtin/ls-tree.c
+++ b/builtin/ls-tree.c
@@ -168,7 +168,7 @@ int cmd_ls_tree(int argc, const char **argv, const char *prefix)
init_pathspec(&pathspec, get_pathspec(prefix, argv + 1));
for (i = 0; i < pathspec.nr; i++)
- pathspec.items[i].use_wildcard = 0;
+ pathspec.items[i].nowildcard_len = pathspec.items[i].len;
pathspec.has_wildcard = 0;
tree = parse_tree_indirect(sha1);
if (!tree)
diff --git a/cache.h b/cache.h
index dbd8018..bf031f1 100644
--- a/cache.h
+++ b/cache.h
@@ -482,7 +482,7 @@ struct pathspec {
struct pathspec_item {
const char *match;
int len;
- unsigned int use_wildcard:1;
+ int nowildcard_len;
} *items;
};
diff --git a/dir.c b/dir.c
index 5a83aa7..c391d46 100644
--- a/dir.c
+++ b/dir.c
@@ -230,7 +230,7 @@ static int match_pathspec_item(const struct pathspec_item *item, int prefix,
return MATCHED_RECURSIVELY;
}
- if (item->use_wildcard && !fnmatch(match, name, 0))
+ if (item->nowildcard_len < item->len && !fnmatch(match, name, 0))
return MATCHED_FNMATCH;
return 0;
@@ -1429,8 +1429,8 @@ int init_pathspec(struct pathspec *pathspec, const char **paths)
item->match = path;
item->len = strlen(path);
- item->use_wildcard = !no_wildcard(path);
- if (item->use_wildcard)
+ item->nowildcard_len = simple_length(path);
+ if (item->nowildcard_len < item->len)
pathspec->has_wildcard = 1;
}
diff --git a/tree-walk.c b/tree-walk.c
index 3f54c02..af871c5 100644
--- a/tree-walk.c
+++ b/tree-walk.c
@@ -626,7 +626,7 @@ enum interesting tree_entry_interesting(const struct name_entry *entry,
&never_interesting))
return entry_interesting;
- if (item->use_wildcard) {
+ if (item->nowildcard_len < item->len) {
if (!fnmatch(match + baselen, entry->path, 0))
return entry_interesting;
@@ -642,7 +642,7 @@ enum interesting tree_entry_interesting(const struct name_entry *entry,
}
match_wildcards:
- if (!item->use_wildcard)
+ if (item->nowildcard_len == item->len)
continue;
/*
--
1.8.0.rc2.23.g1fb49df
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH v2 2/4] pathspec: do exact comparison on the leading non-wildcard part
2012-11-24 4:33 [PATCH v2 0/4] nd/pathspec-wildcard Nguyễn Thái Ngọc Duy
2012-11-24 4:33 ` [PATCH v2 1/4] pathspec: save the non-wildcard length part Nguyễn Thái Ngọc Duy
@ 2012-11-24 4:33 ` Nguyễn Thái Ngọc Duy
2012-11-24 4:33 ` [PATCH v2 3/4] pathspec: apply "*.c" optimization from exclude Nguyễn Thái Ngọc Duy
2012-11-24 4:33 ` [PATCH v2 4/4] tree_entry_interesting: do basedir compare on wildcard patterns when possible Nguyễn Thái Ngọc Duy
3 siblings, 0 replies; 5+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2012-11-24 4:33 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
dir.c | 18 +++++++++++++++++-
dir.h | 8 ++++++++
tree-walk.c | 6 ++++--
3 files changed, 29 insertions(+), 3 deletions(-)
diff --git a/dir.c b/dir.c
index c391d46..f81e1d2 100644
--- a/dir.c
+++ b/dir.c
@@ -34,6 +34,21 @@ int fnmatch_icase(const char *pattern, const char *string, int flags)
return fnmatch(pattern, string, flags | (ignore_case ? FNM_CASEFOLD : 0));
}
+inline int git_fnmatch(const char *pattern, const char *string,
+ int flags, int prefix)
+{
+ int fnm_flags = 0;
+ if (flags & GFNM_PATHNAME)
+ fnm_flags |= FNM_PATHNAME;
+ if (prefix > 0) {
+ if (strncmp(pattern, string, prefix))
+ return FNM_NOMATCH;
+ pattern += prefix;
+ string += prefix;
+ }
+ return fnmatch(pattern, string, fnm_flags);
+}
+
static size_t common_prefix_len(const char **pathspec)
{
const char *n, *first;
@@ -230,7 +245,8 @@ static int match_pathspec_item(const struct pathspec_item *item, int prefix,
return MATCHED_RECURSIVELY;
}
- if (item->nowildcard_len < item->len && !fnmatch(match, name, 0))
+ if (item->nowildcard_len < item->len &&
+ !git_fnmatch(match, name, 0, item->nowildcard_len - prefix))
return MATCHED_FNMATCH;
return 0;
diff --git a/dir.h b/dir.h
index f5c89e3..0e8ae84 100644
--- a/dir.h
+++ b/dir.h
@@ -139,4 +139,12 @@ extern int strcmp_icase(const char *a, const char *b);
extern int strncmp_icase(const char *a, const char *b, size_t count);
extern int fnmatch_icase(const char *pattern, const char *string, int flags);
+/*
+ * The prefix part of pattern must not contains wildcards.
+ */
+#define GFNM_PATHNAME 1 /* similar to FNM_PATHNAME */
+
+extern int git_fnmatch(const char *pattern, const char *string,
+ int flags, int prefix);
+
#endif
diff --git a/tree-walk.c b/tree-walk.c
index af871c5..2fcf3c0 100644
--- a/tree-walk.c
+++ b/tree-walk.c
@@ -627,7 +627,8 @@ enum interesting tree_entry_interesting(const struct name_entry *entry,
return entry_interesting;
if (item->nowildcard_len < item->len) {
- if (!fnmatch(match + baselen, entry->path, 0))
+ if (!git_fnmatch(match + baselen, entry->path,
+ 0, item->nowildcard_len - baselen))
return entry_interesting;
/*
@@ -652,7 +653,8 @@ match_wildcards:
strbuf_add(base, entry->path, pathlen);
- if (!fnmatch(match, base->buf + base_offset, 0)) {
+ if (!git_fnmatch(match, base->buf + base_offset,
+ 0, item->nowildcard_len)) {
strbuf_setlen(base, base_offset + baselen);
return entry_interesting;
}
--
1.8.0.rc2.23.g1fb49df
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH v2 3/4] pathspec: apply "*.c" optimization from exclude
2012-11-24 4:33 [PATCH v2 0/4] nd/pathspec-wildcard Nguyễn Thái Ngọc Duy
2012-11-24 4:33 ` [PATCH v2 1/4] pathspec: save the non-wildcard length part Nguyễn Thái Ngọc Duy
2012-11-24 4:33 ` [PATCH v2 2/4] pathspec: do exact comparison on the leading non-wildcard part Nguyễn Thái Ngọc Duy
@ 2012-11-24 4:33 ` Nguyễn Thái Ngọc Duy
2012-11-24 4:33 ` [PATCH v2 4/4] tree_entry_interesting: do basedir compare on wildcard patterns when possible Nguyễn Thái Ngọc Duy
3 siblings, 0 replies; 5+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2012-11-24 4:33 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy
When a pattern contains only a single asterisk as wildcard,
e.g. "foo*bar", after literally comparing the leading part "foo" with
the string, we can compare the tail of the string and make sure it
matches "bar", instead of running fnmatch() on "*bar" against the
remainder of the string.
-O2 build on linux-2.6, without the patch:
$ time git rev-list --quiet HEAD -- '*.c'
real 0m40.770s
user 0m40.290s
sys 0m0.256s
With the patch
$ time ~/w/git/git rev-list --quiet HEAD -- '*.c'
real 0m34.288s
user 0m33.997s
sys 0m0.205s
The above command is not supposed to be widely popular. It's chosen
because it exercises pathspec matching a lot. The point is it cuts
down matching time for popular patterns like *.c, which could be used
as pathspec in other places.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
The optimization's condition could be loosen to "only one asterisk"
(iow, fixed-length wildcards are OK, e.g. "abc[de]gh*.[ch]") when we
have a custom fnmatch implementation that supports dry-run, so it can
parse the pattern and determine the actual string length of
"abc[de]gh" and ".[ch]". We would have something similar to regcomp in
addition to git_fnmatch().
cache.h | 3 +++
dir.c | 18 ++++++++++++++++--
dir.h | 1 +
tree-walk.c | 6 ++++--
4 files changed, 24 insertions(+), 4 deletions(-)
diff --git a/cache.h b/cache.h
index bf031f1..babf9e5 100644
--- a/cache.h
+++ b/cache.h
@@ -473,6 +473,8 @@ extern int index_name_is_other(const struct index_state *, const char *, int);
extern int ie_match_stat(const struct index_state *, struct cache_entry *, struct stat *, unsigned int);
extern int ie_modified(const struct index_state *, struct cache_entry *, struct stat *, unsigned int);
+#define PATHSPEC_ONESTAR 1 /* the pathspec pattern sastisfies GFNM_ONESTAR */
+
struct pathspec {
const char **raw; /* get_pathspec() result, not freed by free_pathspec() */
int nr;
@@ -483,6 +485,7 @@ struct pathspec {
const char *match;
int len;
int nowildcard_len;
+ int flags;
} *items;
};
diff --git a/dir.c b/dir.c
index f81e1d2..9afd388 100644
--- a/dir.c
+++ b/dir.c
@@ -46,6 +46,13 @@ inline int git_fnmatch(const char *pattern, const char *string,
pattern += prefix;
string += prefix;
}
+ if (flags & GFNM_ONESTAR) {
+ int pattern_len = strlen(++pattern);
+ int string_len = strlen(string);
+ return string_len < pattern_len ||
+ strcmp(pattern,
+ string + string_len - pattern_len);
+ }
return fnmatch(pattern, string, fnm_flags);
}
@@ -246,7 +253,9 @@ static int match_pathspec_item(const struct pathspec_item *item, int prefix,
}
if (item->nowildcard_len < item->len &&
- !git_fnmatch(match, name, 0, item->nowildcard_len - prefix))
+ !git_fnmatch(match, name,
+ item->flags & PATHSPEC_ONESTAR ? GFNM_ONESTAR : 0,
+ item->nowildcard_len - prefix))
return MATCHED_FNMATCH;
return 0;
@@ -1446,8 +1455,13 @@ int init_pathspec(struct pathspec *pathspec, const char **paths)
item->match = path;
item->len = strlen(path);
item->nowildcard_len = simple_length(path);
- if (item->nowildcard_len < item->len)
+ item->flags = 0;
+ if (item->nowildcard_len < item->len) {
pathspec->has_wildcard = 1;
+ if (path[item->nowildcard_len] == '*' &&
+ no_wildcard(path + item->nowildcard_len + 1))
+ item->flags |= PATHSPEC_ONESTAR;
+ }
}
qsort(pathspec->items, pathspec->nr,
diff --git a/dir.h b/dir.h
index 0e8ae84..ab5af42 100644
--- a/dir.h
+++ b/dir.h
@@ -143,6 +143,7 @@ extern int fnmatch_icase(const char *pattern, const char *string, int flags);
* The prefix part of pattern must not contains wildcards.
*/
#define GFNM_PATHNAME 1 /* similar to FNM_PATHNAME */
+#define GFNM_ONESTAR 2 /* there is only _one_ wildcard, a star */
extern int git_fnmatch(const char *pattern, const char *string,
int flags, int prefix);
diff --git a/tree-walk.c b/tree-walk.c
index 2fcf3c0..585899e 100644
--- a/tree-walk.c
+++ b/tree-walk.c
@@ -628,7 +628,8 @@ enum interesting tree_entry_interesting(const struct name_entry *entry,
if (item->nowildcard_len < item->len) {
if (!git_fnmatch(match + baselen, entry->path,
- 0, item->nowildcard_len - baselen))
+ item->flags & PATHSPEC_ONESTAR ? GFNM_ONESTAR : 0,
+ item->nowildcard_len - baselen))
return entry_interesting;
/*
@@ -654,7 +655,8 @@ match_wildcards:
strbuf_add(base, entry->path, pathlen);
if (!git_fnmatch(match, base->buf + base_offset,
- 0, item->nowildcard_len)) {
+ item->flags & PATHSPEC_ONESTAR ? GFNM_ONESTAR : 0,
+ item->nowildcard_len)) {
strbuf_setlen(base, base_offset + baselen);
return entry_interesting;
}
--
1.8.0.rc2.23.g1fb49df
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH v2 4/4] tree_entry_interesting: do basedir compare on wildcard patterns when possible
2012-11-24 4:33 [PATCH v2 0/4] nd/pathspec-wildcard Nguyễn Thái Ngọc Duy
` (2 preceding siblings ...)
2012-11-24 4:33 ` [PATCH v2 3/4] pathspec: apply "*.c" optimization from exclude Nguyễn Thái Ngọc Duy
@ 2012-11-24 4:33 ` Nguyễn Thái Ngọc Duy
3 siblings, 0 replies; 5+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2012-11-24 4:33 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy
Currently we treat "*.c" and "path/to/*.c" the same way. Which means
we check all possible paths in repo against "path/to/*.c". One could
see that "path/elsewhere/foo.c" obviously cannot match "path/to/*.c"
and we only need to check all paths _inside_ "path/to/" against that
pattern.
This patch checks the leading fixed part of a pathspec against base
directory and exit early if possible. We could even optimize further
in "path/to/something*.c" case (i.e. check the fixed part against
name_entry as well) but that's more complicated and probably does not
gain us much.
-O2 build on linux-2.6, without and with this patch respectively:
$ time git rev-list --quiet HEAD -- 'drivers/*.c'
real 1m9.484s
user 1m9.128s
sys 0m0.181s
$ time ~/w/git/git rev-list --quiet HEAD -- 'drivers/*.c'
real 0m15.710s
user 0m15.564s
sys 0m0.107s
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
tree-walk.c | 63 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 62 insertions(+), 1 deletion(-)
diff --git a/tree-walk.c b/tree-walk.c
index 585899e..9d6e59b 100644
--- a/tree-walk.c
+++ b/tree-walk.c
@@ -573,6 +573,52 @@ static int match_dir_prefix(const char *base,
}
/*
+ * Perform matching on the leading non-wildcard part of
+ * pathspec. item->nowildcard_len must be greater than zero. Return
+ * non-zero if base is matched.
+ */
+static int match_wildcard_base(const struct pathspec_item *item,
+ const char *base, int baselen,
+ int *matched)
+{
+ const char *match = item->match;
+ /* the wildcard part is not considered in this function */
+ int matchlen = item->nowildcard_len;
+
+ if (baselen) {
+ int dirlen;
+ /*
+ * Return early if base is longer than the
+ * non-wildcard part but it does not match.
+ */
+ if (baselen >= matchlen) {
+ *matched = matchlen;
+ return !strncmp(base, match, matchlen);
+ }
+
+ dirlen = matchlen;
+ while (dirlen && match[dirlen - 1] != '/')
+ dirlen--;
+
+ /* Return early if base is shorter than the
+ non-wildcard part but it does not match. Note that
+ base ends with '/' so we are sure it really matches
+ directory */
+ if (strncmp(base, match, baselen))
+ return 0;
+ *matched = baselen;
+ } else
+ *matched = 0;
+ /*
+ * we could have checked entry against the non-wildcard part
+ * that is not in base and does similar never_interesting
+ * optimization as in match_entry. For now just be happy with
+ * base comparison.
+ */
+ return entry_interesting;
+}
+
+/*
* Is a tree entry interesting given the pathspec we have?
*
* Pre-condition: either baselen == base_offset (i.e. empty path)
@@ -602,7 +648,7 @@ enum interesting tree_entry_interesting(const struct name_entry *entry,
const struct pathspec_item *item = ps->items+i;
const char *match = item->match;
const char *base_str = base->buf + base_offset;
- int matchlen = item->len;
+ int matchlen = item->len, matched = 0;
if (baselen >= matchlen) {
/* If it doesn't match, move along... */
@@ -647,9 +693,24 @@ match_wildcards:
if (item->nowildcard_len == item->len)
continue;
+ if (item->nowildcard_len &&
+ !match_wildcard_base(item, base_str, baselen, &matched))
+ return entry_not_interesting;
+
/*
* Concatenate base and entry->path into one and do
* fnmatch() on it.
+ *
+ * While we could avoid concatenation in certain cases
+ * [1], which saves a memcpy and potentially a
+ * realloc, it turns out not worth it. Measurement on
+ * linux-2.6 does not show any clear improvements,
+ * partly because of the nowildcard_len optimization
+ * in git_fnmatch(). Avoid micro-optimizations here.
+ *
+ * [1] if match_wildcard_base() says the base
+ * directory is already matched, we only need to match
+ * the rest, which is shorter so _in theory_ faster.
*/
strbuf_add(base, entry->path, pathlen);
--
1.8.0.rc2.23.g1fb49df
^ permalink raw reply related [flat|nested] 5+ messages in thread
end of thread, other threads:[~2012-11-24 4:34 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-11-24 4:33 [PATCH v2 0/4] nd/pathspec-wildcard Nguyễn Thái Ngọc Duy
2012-11-24 4:33 ` [PATCH v2 1/4] pathspec: save the non-wildcard length part Nguyễn Thái Ngọc Duy
2012-11-24 4:33 ` [PATCH v2 2/4] pathspec: do exact comparison on the leading non-wildcard part Nguyễn Thái Ngọc Duy
2012-11-24 4:33 ` [PATCH v2 3/4] pathspec: apply "*.c" optimization from exclude Nguyễn Thái Ngọc Duy
2012-11-24 4:33 ` [PATCH v2 4/4] tree_entry_interesting: do basedir compare on wildcard patterns when possible Nguyễn Thái Ngọc Duy
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).