git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH/WIP 0/3] Bye bye fnmatch()
@ 2012-12-19 13:08 Nguyễn Thái Ngọc Duy
  2012-12-19 13:08 ` [PATCH 1/3] wildmatch: make dowild() take arbitrary flags Nguyễn Thái Ngọc Duy
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2012-12-19 13:08 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

For those who have not followed, nd/wildmatch brings another
fnmatch-like implementation which can nearly replace fnmatch.
System fnmatch() seems to behave differently in some cases. It's
better to stay away and use one implementation for all.

I just wanted to see how much work there may be if we go this way. It
turns out not much. I haven't checked my dowild() changes carefully.
I may have left a bug in '[]' code. There are some minor issues I
like dependency on FNM_* macros or wildmatch.h should be incorporated
back to git-compat-util.h. But the test suite passes for me. So it's
promising.

Nguyễn Thái Ngọc Duy (3):
  wildmatch: make dowild() take arbitrary flags
  wildmatch: support "no FNM_PATHNAME" mode
  Convert all fnmatch() calls to wildmatch()

 builtin/apply.c        |  3 ++-
 builtin/branch.c       |  3 ++-
 builtin/describe.c     |  3 ++-
 builtin/for-each-ref.c |  3 ++-
 builtin/ls-remote.c    |  3 ++-
 builtin/name-rev.c     |  3 ++-
 builtin/reflog.c       |  3 ++-
 builtin/replace.c      |  3 ++-
 builtin/show-branch.c  |  3 ++-
 builtin/tag.c          |  3 ++-
 diffcore-order.c       |  3 ++-
 dir.c                  |  6 +++---
 refs.c                 |  3 ++-
 t/t3070-wildmatch.sh   | 27 +++++++++++++++++++++++++++
 test-wildmatch.c       |  4 +++-
 tree-walk.c            |  5 +++--
 wildmatch.c            | 25 ++++++++++++++-----------
 17 files changed, 74 insertions(+), 29 deletions(-)

-- 
1.8.0.rc2.23.g1fb49df

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/3] wildmatch: make dowild() take arbitrary flags
  2012-12-19 13:08 [PATCH/WIP 0/3] Bye bye fnmatch() Nguyễn Thái Ngọc Duy
@ 2012-12-19 13:08 ` Nguyễn Thái Ngọc Duy
  2012-12-19 16:32   ` Junio C Hamano
  2012-12-19 13:08 ` [PATCH 2/3] wildmatch: support "no FNM_PATHNAME" mode Nguyễn Thái Ngọc Duy
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 11+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2012-12-19 13:08 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy


Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 wildmatch.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/wildmatch.c b/wildmatch.c
index 3972e26..9586ed9 100644
--- a/wildmatch.c
+++ b/wildmatch.c
@@ -55,7 +55,7 @@ typedef unsigned char uchar;
 #define ISXDIGIT(c) (ISASCII(c) && isxdigit(c))
 
 /* Match pattern "p" against "text" */
-static int dowild(const uchar *p, const uchar *text, int force_lower_case)
+static int dowild(const uchar *p, const uchar *text, int flags)
 {
 	uchar p_ch;
 
@@ -64,9 +64,9 @@ static int dowild(const uchar *p, const uchar *text, int force_lower_case)
 		uchar t_ch, prev_ch;
 		if ((t_ch = *text) == '\0' && p_ch != '*')
 			return ABORT_ALL;
-		if (force_lower_case && ISUPPER(t_ch))
+		if (flags & FNM_CASEFOLD && ISUPPER(t_ch))
 			t_ch = tolower(t_ch);
-		if (force_lower_case && ISUPPER(p_ch))
+		if (flags & FNM_CASEFOLD && ISUPPER(p_ch))
 			p_ch = tolower(p_ch);
 		switch (p_ch) {
 		case '\\':
@@ -100,7 +100,7 @@ static int dowild(const uchar *p, const uchar *text, int force_lower_case)
 					 * both foo/bar and foo/a/bar.
 					 */
 					if (p[0] == '/' &&
-					    dowild(p + 1, text, force_lower_case) == MATCH)
+					    dowild(p + 1, text, flags) == MATCH)
 						return MATCH;
 					special = TRUE;
 				} else
@@ -119,7 +119,7 @@ static int dowild(const uchar *p, const uchar *text, int force_lower_case)
 			while (1) {
 				if (t_ch == '\0')
 					break;
-				if ((matched = dowild(p, text,  force_lower_case)) != NOMATCH) {
+				if ((matched = dowild(p, text,  flags)) != NOMATCH) {
 					if (!special || matched != ABORT_TO_STARSTAR)
 						return matched;
 				} else if (!special && t_ch == '/')
@@ -229,6 +229,5 @@ static int dowild(const uchar *p, const uchar *text, int force_lower_case)
 /* Match the "pattern" against the "text" string. */
 int wildmatch(const char *pattern, const char *text, int flags)
 {
-	return dowild((const uchar*)pattern, (const uchar*)text,
-		      flags & FNM_CASEFOLD ? 1 :0);
+	return dowild((const uchar*)pattern, (const uchar*)text, flags);
 }
-- 
1.8.0.rc2.23.g1fb49df

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 2/3] wildmatch: support "no FNM_PATHNAME" mode
  2012-12-19 13:08 [PATCH/WIP 0/3] Bye bye fnmatch() Nguyễn Thái Ngọc Duy
  2012-12-19 13:08 ` [PATCH 1/3] wildmatch: make dowild() take arbitrary flags Nguyễn Thái Ngọc Duy
@ 2012-12-19 13:08 ` Nguyễn Thái Ngọc Duy
  2012-12-19 17:24   ` Junio C Hamano
  2012-12-19 13:08 ` [PATCH 3/3] Convert all fnmatch() calls to wildmatch() Nguyễn Thái Ngọc Duy
  2012-12-19 13:21 ` [PATCH/WIP 0/3] Bye bye fnmatch() Nguyen Thai Ngoc Duy
  3 siblings, 1 reply; 11+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2012-12-19 13:08 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

By default wildmatch(,, 0) is equivalent with fnmatch(,, FNM_PATHNAME).

This patch makes wildmatch behave more like fnmatch: FNM_PATHNAME
behavior is always applied when FNM_PATHNAME is passed to
wildmatch. Without FNM_PATHNAME, wildmatch accepts '/' in '?' and '[]'
and treats '*' like '**' in the original version.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 The choice of name "pathspec" is not good. I couldn't think of
 anything appropriate and just did not care enough at this point.

 dir.c                |  2 +-
 t/t3070-wildmatch.sh | 27 +++++++++++++++++++++++++++
 test-wildmatch.c     |  4 +++-
 wildmatch.c          | 12 ++++++++----
 4 files changed, 39 insertions(+), 6 deletions(-)

diff --git a/dir.c b/dir.c
index cb7328b..7bbd6f8 100644
--- a/dir.c
+++ b/dir.c
@@ -595,7 +595,7 @@ int match_pathname(const char *pathname, int pathlen,
 	}
 
 	return wildmatch(pattern, name,
-			 ignore_case ? FNM_CASEFOLD : 0) == 0;
+			 FNM_PATHNAME | (ignore_case ? FNM_CASEFOLD : 0)) == 0;
 }
 
 /* Scan the list and let the last match determine the fate.
diff --git a/t/t3070-wildmatch.sh b/t/t3070-wildmatch.sh
index 3155eab..ca4ac46 100755
--- a/t/t3070-wildmatch.sh
+++ b/t/t3070-wildmatch.sh
@@ -29,6 +29,18 @@ match() {
     fi
 }
 
+pathspec() {
+    if [ $1 = 1 ]; then
+	test_expect_success "pathspec:    match '$2' '$3'" "
+	    test-wildmatch pathspec '$2' '$3'
+	"
+    else
+	test_expect_success "pathspec: no match '$2' '$3'" "
+	    ! test-wildmatch pathspec '$2' '$3'
+	"
+    fi
+}
+
 # Basic wildmat features
 match 1 1 foo foo
 match 0 0 foo bar
@@ -192,4 +204,19 @@ match 0 0 'XXX/adobe/courier/bold/o/normal//12/120/75/75/X/70/iso8859/1' 'XXX/*/
 match 1 0 'abcd/abcdefg/abcdefghijk/abcdefghijklmnop.txt' '**/*a*b*g*n*t'
 match 0 0 'abcd/abcdefg/abcdefghijk/abcdefghijklmnop.txtz' '**/*a*b*g*n*t'
 
+pathspec 1 foo foo
+pathspec 0 foo fo
+pathspec 1 foo/bar foo/bar
+pathspec 1 foo/bar 'foo/*'
+pathspec 1 foo/bba/arr 'foo/*'
+pathspec 1 foo/bba/arr 'foo/**'
+pathspec 1 foo/bba/arr 'foo*'
+pathspec 1 foo/bba/arr 'foo**'
+pathspec 1 foo/bba/arr 'foo/*arr'
+pathspec 1 foo/bba/arr 'foo/**arr'
+pathspec 0 foo/bba/arr 'foo/*z'
+pathspec 0 foo/bba/arr 'foo/**z'
+pathspec 1 foo/bar 'foo?bar'
+pathspec 1 foo/bar 'foo[/]bar'
+
 test_done
diff --git a/test-wildmatch.c b/test-wildmatch.c
index e384c8e..7fefa4f 100644
--- a/test-wildmatch.c
+++ b/test-wildmatch.c
@@ -12,9 +12,11 @@ int main(int argc, char **argv)
 			argv[i] += 3;
 	}
 	if (!strcmp(argv[1], "wildmatch"))
+		return !!wildmatch(argv[3], argv[2], FNM_PATHNAME);
+	else if (!strcmp(argv[1], "pathspec"))
 		return !!wildmatch(argv[3], argv[2], 0);
 	else if (!strcmp(argv[1], "iwildmatch"))
-		return !!wildmatch(argv[3], argv[2], FNM_CASEFOLD);
+		return !!wildmatch(argv[3], argv[2], FNM_PATHNAME | FNM_CASEFOLD);
 	else if (!strcmp(argv[1], "fnmatch"))
 		return !!fnmatch(argv[3], argv[2], FNM_PATHNAME);
 	else
diff --git a/wildmatch.c b/wildmatch.c
index 9586ed9..6aa034f 100644
--- a/wildmatch.c
+++ b/wildmatch.c
@@ -80,14 +80,17 @@ static int dowild(const uchar *p, const uchar *text, int flags)
 			continue;
 		case '?':
 			/* Match anything but '/'. */
-			if (t_ch == '/')
+			if (flags & FNM_PATHNAME && t_ch == '/')
 				return NOMATCH;
 			continue;
 		case '*':
 			if (*++p == '*') {
 				const uchar *prev_p = p - 2;
 				while (*++p == '*') {}
-				if ((prev_p == text || *prev_p == '/') ||
+				if (!(flags & FNM_PATHNAME))
+					/* without FNM_PATHNAME, '*' == '**' */
+					special = TRUE;
+				else if ((prev_p == text || *prev_p == '/') ||
 				    (*p == '\0' || *p == '/' ||
 				     (p[0] == '\\' && p[1] == '/'))) {
 					/*
@@ -106,7 +109,7 @@ static int dowild(const uchar *p, const uchar *text, int flags)
 				} else
 					return ABORT_MALFORMED;
 			} else
-				special = FALSE;
+				special = flags & FNM_PATHNAME ? FALSE: TRUE;
 			if (*p == '\0') {
 				/* Trailing "**" matches everything.  Trailing "*" matches
 				 * only if there are no more slash characters. */
@@ -217,7 +220,8 @@ static int dowild(const uchar *p, const uchar *text, int flags)
 				} else if (t_ch == p_ch)
 					matched = TRUE;
 			} while (prev_ch = p_ch, (p_ch = *++p) != ']');
-			if (matched == special || t_ch == '/')
+			if (matched == special ||
+			    (flags & FNM_PATHNAME && t_ch == '/'))
 				return NOMATCH;
 			continue;
 		}
-- 
1.8.0.rc2.23.g1fb49df

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 3/3] Convert all fnmatch() calls to wildmatch()
  2012-12-19 13:08 [PATCH/WIP 0/3] Bye bye fnmatch() Nguyễn Thái Ngọc Duy
  2012-12-19 13:08 ` [PATCH 1/3] wildmatch: make dowild() take arbitrary flags Nguyễn Thái Ngọc Duy
  2012-12-19 13:08 ` [PATCH 2/3] wildmatch: support "no FNM_PATHNAME" mode Nguyễn Thái Ngọc Duy
@ 2012-12-19 13:08 ` Nguyễn Thái Ngọc Duy
  2012-12-19 18:36   ` Junio C Hamano
  2012-12-19 13:21 ` [PATCH/WIP 0/3] Bye bye fnmatch() Nguyen Thai Ngoc Duy
  3 siblings, 1 reply; 11+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2012-12-19 13:08 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy


Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 builtin/apply.c        | 3 ++-
 builtin/branch.c       | 3 ++-
 builtin/describe.c     | 3 ++-
 builtin/for-each-ref.c | 3 ++-
 builtin/ls-remote.c    | 3 ++-
 builtin/name-rev.c     | 3 ++-
 builtin/reflog.c       | 3 ++-
 builtin/replace.c      | 3 ++-
 builtin/show-branch.c  | 3 ++-
 builtin/tag.c          | 3 ++-
 diffcore-order.c       | 3 ++-
 dir.c                  | 4 ++--
 refs.c                 | 3 ++-
 tree-walk.c            | 5 +++--
 14 files changed, 29 insertions(+), 16 deletions(-)

diff --git a/builtin/apply.c b/builtin/apply.c
index d2180b0..86ff51d 100644
--- a/builtin/apply.c
+++ b/builtin/apply.c
@@ -16,6 +16,7 @@
 #include "dir.h"
 #include "diff.h"
 #include "parse-options.h"
+#include "wildmatch.h"
 
 /*
  *  --check turns on checking that the working tree matches the
@@ -3786,7 +3787,7 @@ static int use_patch(struct patch *p)
 	/* See if it matches any of exclude/include rule */
 	for (i = 0; i < limit_by_name.nr; i++) {
 		struct string_list_item *it = &limit_by_name.items[i];
-		if (!fnmatch(it->string, pathname, 0))
+		if (!wildmatch(it->string, pathname, 0))
 			return (it->util != NULL);
 	}
 
diff --git a/builtin/branch.c b/builtin/branch.c
index 0e060f2..81257c1 100644
--- a/builtin/branch.c
+++ b/builtin/branch.c
@@ -17,6 +17,7 @@
 #include "revision.h"
 #include "string-list.h"
 #include "column.h"
+#include "wildmatch.h"
 
 static const char * const builtin_branch_usage[] = {
 	"git branch [options] [-r | -a] [--merged | --no-merged]",
@@ -286,7 +287,7 @@ static int match_patterns(const char **pattern, const char *refname)
 	if (!*pattern)
 		return 1; /* no pattern always matches */
 	while (*pattern) {
-		if (!fnmatch(*pattern, refname, 0))
+		if (!wildmatch(*pattern, refname, 0))
 			return 1;
 		pattern++;
 	}
diff --git a/builtin/describe.c b/builtin/describe.c
index 9f63067..1f55802 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -7,6 +7,7 @@
 #include "parse-options.h"
 #include "diff.h"
 #include "hash.h"
+#include "wildmatch.h"
 
 #define SEEN		(1u<<0)
 #define MAX_TAGS	(FLAG_BITS - 1)
@@ -161,7 +162,7 @@ static int get_name(const char *path, const unsigned char *sha1, int flag, void
 		else
 			prio = 1;
 
-		if (pattern && fnmatch(pattern, path + 10, 0))
+		if (pattern && wildmatch(pattern, path + 10, 0))
 			prio = 0;
 	}
 	else
diff --git a/builtin/for-each-ref.c b/builtin/for-each-ref.c
index 0c5294e..85f87dd 100644
--- a/builtin/for-each-ref.c
+++ b/builtin/for-each-ref.c
@@ -9,6 +9,7 @@
 #include "quote.h"
 #include "parse-options.h"
 #include "remote.h"
+#include "wildmatch.h"
 
 /* Quoting styles */
 #define QUOTE_NONE 0
@@ -794,7 +795,7 @@ static int grab_single_ref(const char *refname, const unsigned char *sha1, int f
 			     refname[plen] == '/' ||
 			     p[plen-1] == '/'))
 				break;
-			if (!fnmatch(p, refname, FNM_PATHNAME))
+			if (!wildmatch(p, refname, FNM_PATHNAME))
 				break;
 		}
 		if (!*pattern)
diff --git a/builtin/ls-remote.c b/builtin/ls-remote.c
index 41c88a9..8271fbb 100644
--- a/builtin/ls-remote.c
+++ b/builtin/ls-remote.c
@@ -2,6 +2,7 @@
 #include "cache.h"
 #include "transport.h"
 #include "remote.h"
+#include "wildmatch.h"
 
 static const char ls_remote_usage[] =
 "git ls-remote [--heads] [--tags]  [-u <exec> | --upload-pack <exec>]\n"
@@ -22,7 +23,7 @@ static int tail_match(const char **pattern, const char *path)
 	if (snprintf(pathbuf, sizeof(pathbuf), "/%s", path) > sizeof(pathbuf))
 		return error("insanely long ref %.*s...", 20, path);
 	while ((p = *(pattern++)) != NULL) {
-		if (!fnmatch(p, pathbuf, 0))
+		if (!wildmatch(p, pathbuf, 0))
 			return 1;
 	}
 	return 0;
diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index 1b37458..0cc3141 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -4,6 +4,7 @@
 #include "tag.h"
 #include "refs.h"
 #include "parse-options.h"
+#include "wildmatch.h"
 
 #define CUTOFF_DATE_SLOP 86400 /* one day */
 
@@ -97,7 +98,7 @@ static int name_ref(const char *path, const unsigned char *sha1, int flags, void
 	if (data->tags_only && prefixcmp(path, "refs/tags/"))
 		return 0;
 
-	if (data->ref_filter && fnmatch(data->ref_filter, path, 0))
+	if (data->ref_filter && wildmatch(data->ref_filter, path, 0))
 		return 0;
 
 	while (o && o->type == OBJ_TAG) {
diff --git a/builtin/reflog.c b/builtin/reflog.c
index 062d7da..39dd8b4 100644
--- a/builtin/reflog.c
+++ b/builtin/reflog.c
@@ -7,6 +7,7 @@
 #include "diff.h"
 #include "revision.h"
 #include "reachable.h"
+#include "wildmatch.h"
 
 /*
  * reflog expire
@@ -561,7 +562,7 @@ static void set_reflog_expiry_param(struct cmd_reflog_expire_cb *cb, int slot, c
 		return; /* both given explicitly -- nothing to tweak */
 
 	for (ent = reflog_expire_cfg; ent; ent = ent->next) {
-		if (!fnmatch(ent->pattern, ref, 0)) {
+		if (!wildmatch(ent->pattern, ref, 0)) {
 			if (!(slot & EXPIRE_TOTAL))
 				cb->expire_total = ent->expire_total;
 			if (!(slot & EXPIRE_UNREACH))
diff --git a/builtin/replace.c b/builtin/replace.c
index 4a8970e..7fdefa8 100644
--- a/builtin/replace.c
+++ b/builtin/replace.c
@@ -12,6 +12,7 @@
 #include "builtin.h"
 #include "refs.h"
 #include "parse-options.h"
+#include "wildmatch.h"
 
 static const char * const git_replace_usage[] = {
 	"git replace [-f] <object> <replacement>",
@@ -25,7 +26,7 @@ static int show_reference(const char *refname, const unsigned char *sha1,
 {
 	const char *pattern = cb_data;
 
-	if (!fnmatch(pattern, refname, 0))
+	if (!wildmatch(pattern, refname, 0))
 		printf("%s\n", refname);
 
 	return 0;
diff --git a/builtin/show-branch.c b/builtin/show-branch.c
index a59e088..f4fa165 100644
--- a/builtin/show-branch.c
+++ b/builtin/show-branch.c
@@ -4,6 +4,7 @@
 #include "builtin.h"
 #include "color.h"
 #include "parse-options.h"
+#include "wildmatch.h"
 
 static const char* show_branch_usage[] = {
     "git show-branch [-a|--all] [-r|--remotes] [--topo-order | --date-order] [--current] [--color[=<when>] | --no-color] [--sparse] [--more=<n> | --list | --independent | --merge-base] [--no-name | --sha1-name] [--topics] [(<rev> | <glob>)...]",
@@ -452,7 +453,7 @@ static int append_matching_ref(const char *refname, const unsigned char *sha1, i
 			slash--;
 	if (!*tail)
 		return 0;
-	if (fnmatch(match_ref_pattern, tail, 0))
+	if (wildmatch(match_ref_pattern, tail, 0))
 		return 0;
 	if (!prefixcmp(refname, "refs/heads/"))
 		return append_head_ref(refname, sha1, flag, cb_data);
diff --git a/builtin/tag.c b/builtin/tag.c
index 7b1be85..89aead3 100644
--- a/builtin/tag.c
+++ b/builtin/tag.c
@@ -17,6 +17,7 @@
 #include "gpg-interface.h"
 #include "sha1-array.h"
 #include "column.h"
+#include "wildmatch.h"
 
 static const char * const git_tag_usage[] = {
 	"git tag [-a|-s|-u <key-id>] [-f] [-m <msg>|-F <file>] <tagname> [<head>]",
@@ -42,7 +43,7 @@ static int match_pattern(const char **patterns, const char *ref)
 	if (!*patterns)
 		return 1;
 	for (; *patterns; patterns++)
-		if (!fnmatch(*patterns, ref, 0))
+		if (!wildmatch(*patterns, ref, 0))
 			return 1;
 	return 0;
 }
diff --git a/diffcore-order.c b/diffcore-order.c
index 23e9385..21eb30c 100644
--- a/diffcore-order.c
+++ b/diffcore-order.c
@@ -4,6 +4,7 @@
 #include "cache.h"
 #include "diff.h"
 #include "diffcore.h"
+#include "wildmatch.h"
 
 static char **order;
 static int order_cnt;
@@ -79,7 +80,7 @@ static int match_order(const char *path)
 		strcpy(p, path);
 		while (p[0]) {
 			char *cp;
-			if (!fnmatch(order[i], p, 0))
+			if (!wildmatch(order[i], p, 0))
 				return i;
 			cp = strrchr(p, '/');
 			if (!cp)
diff --git a/dir.c b/dir.c
index 7bbd6f8..c1339c4 100644
--- a/dir.c
+++ b/dir.c
@@ -32,7 +32,7 @@ int strncmp_icase(const char *a, const char *b, size_t count)
 
 int fnmatch_icase(const char *pattern, const char *string, int flags)
 {
-	return fnmatch(pattern, string, flags | (ignore_case ? FNM_CASEFOLD : 0));
+	return wildmatch(pattern, string, flags | (ignore_case ? FNM_CASEFOLD : 0));
 }
 
 static size_t common_prefix_len(const char **pathspec)
@@ -231,7 +231,7 @@ static int match_pathspec_item(const struct pathspec_item *item, int prefix,
 			return MATCHED_RECURSIVELY;
 	}
 
-	if (item->use_wildcard && !fnmatch(match, name, 0))
+	if (item->use_wildcard && !wildmatch(match, name, 0))
 		return MATCHED_FNMATCH;
 
 	return 0;
diff --git a/refs.c b/refs.c
index da74a2b..47929d9 100644
--- a/refs.c
+++ b/refs.c
@@ -3,6 +3,7 @@
 #include "object.h"
 #include "tag.h"
 #include "dir.h"
+#include "wildmatch.h"
 
 /*
  * Make sure "ref" is something reasonable to have under ".git/refs/";
@@ -1188,7 +1189,7 @@ static int filter_refs(const char *refname, const unsigned char *sha1, int flags
 		       void *data)
 {
 	struct ref_filter *filter = (struct ref_filter *)data;
-	if (fnmatch(filter->pattern, refname, 0))
+	if (wildmatch(filter->pattern, refname, 0))
 		return 0;
 	return filter->fn(refname, sha1, flags, filter->cb_data);
 }
diff --git a/tree-walk.c b/tree-walk.c
index 492c7cd..c729e89 100644
--- a/tree-walk.c
+++ b/tree-walk.c
@@ -3,6 +3,7 @@
 #include "unpack-trees.h"
 #include "dir.h"
 #include "tree.h"
+#include "wildmatch.h"
 
 static const char *get_mode(const char *str, unsigned int *modep)
 {
@@ -627,7 +628,7 @@ enum interesting tree_entry_interesting(const struct name_entry *entry,
 				return entry_interesting;
 
 			if (item->use_wildcard) {
-				if (!fnmatch(match + baselen, entry->path, 0))
+				if (!wildmatch(match + baselen, entry->path, 0))
 					return entry_interesting;
 
 				/*
@@ -652,7 +653,7 @@ match_wildcards:
 
 		strbuf_add(base, entry->path, pathlen);
 
-		if (!fnmatch(match, base->buf + base_offset, 0)) {
+		if (!wildmatch(match, base->buf + base_offset, 0)) {
 			strbuf_setlen(base, base_offset + baselen);
 			return entry_interesting;
 		}
-- 
1.8.0.rc2.23.g1fb49df

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH/WIP 0/3] Bye bye fnmatch()
  2012-12-19 13:08 [PATCH/WIP 0/3] Bye bye fnmatch() Nguyễn Thái Ngọc Duy
                   ` (2 preceding siblings ...)
  2012-12-19 13:08 ` [PATCH 3/3] Convert all fnmatch() calls to wildmatch() Nguyễn Thái Ngọc Duy
@ 2012-12-19 13:21 ` Nguyen Thai Ngoc Duy
  3 siblings, 0 replies; 11+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2012-12-19 13:21 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

On Wed, Dec 19, 2012 at 8:08 PM, Nguyễn Thái Ngọc Duy <pclouds@gmail.com> wrote:
> For those who have not followed, nd/wildmatch brings another
> fnmatch-like implementation which can nearly replace fnmatch.
> System fnmatch() seems to behave differently in some cases. It's
> better to stay away and use one implementation for all.

Oh I forgot. Case-insensitive matching will be available to everybody.
On the other hand it'll be a lot more work to (implement and) use
other FNM_* flags like FNM_PERIOD.
-- 
Duy

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/3] wildmatch: make dowild() take arbitrary flags
  2012-12-19 13:08 ` [PATCH 1/3] wildmatch: make dowild() take arbitrary flags Nguyễn Thái Ngọc Duy
@ 2012-12-19 16:32   ` Junio C Hamano
  0 siblings, 0 replies; 11+ messages in thread
From: Junio C Hamano @ 2012-12-19 16:32 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git

Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:

> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---
>  wildmatch.c | 13 ++++++-------
>  1 file changed, 6 insertions(+), 7 deletions(-)
>
> diff --git a/wildmatch.c b/wildmatch.c
> index 3972e26..9586ed9 100644
> --- a/wildmatch.c
> +++ b/wildmatch.c
> @@ -55,7 +55,7 @@ typedef unsigned char uchar;
>  #define ISXDIGIT(c) (ISASCII(c) && isxdigit(c))
>  
>  /* Match pattern "p" against "text" */
> -static int dowild(const uchar *p, const uchar *text, int force_lower_case)
> +static int dowild(const uchar *p, const uchar *text, int flags)

It may be better to declare a bitset like this unsigned.

>  {
>  	uchar p_ch;
>  
> @@ -64,9 +64,9 @@ static int dowild(const uchar *p, const uchar *text, int force_lower_case)
>  		uchar t_ch, prev_ch;
>  		if ((t_ch = *text) == '\0' && p_ch != '*')
>  			return ABORT_ALL;
> -		if (force_lower_case && ISUPPER(t_ch))
> +		if (flags & FNM_CASEFOLD && ISUPPER(t_ch))

Please add parentheses around bitwise-AND that is used as a boolean,
i.e.

	if ((flags & FNM_CASEFOLD) && ISUPPER(t_ch))

Less chance of confusion.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/3] wildmatch: support "no FNM_PATHNAME" mode
  2012-12-19 13:08 ` [PATCH 2/3] wildmatch: support "no FNM_PATHNAME" mode Nguyễn Thái Ngọc Duy
@ 2012-12-19 17:24   ` Junio C Hamano
  2012-12-20  1:55     ` Nguyen Thai Ngoc Duy
  0 siblings, 1 reply; 11+ messages in thread
From: Junio C Hamano @ 2012-12-19 17:24 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git

Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:

> By default wildmatch(,, 0) is equivalent with fnmatch(,, FNM_PATHNAME).

Is this stating a fact before or after the patch?

I think it is more like:

    So far, wildmatch() has always honoured directory boundary and
    there was no way to turn it off.  Make it behave more like
    fnmatch() by requiring all callers that want the FNM_PATHNAME
    behaviour to pass that in the flags parameter.  Callers that do
    not specify FNM_PATHNAME will get wildcards like ? and * in
    their patterns matched against '/' and ...

>  The choice of name "pathspec" is not good. I couldn't think of
>  anything appropriate and just did not care enough at this point.

Well, you should, before this series leaves the WIP state.  It seems
that all operating modes supported by test-wildmatch are named as
somethingmatch, so "pathmatch" may be a better candidate.

> diff --git a/test-wildmatch.c b/test-wildmatch.c
> index e384c8e..7fefa4f 100644
> --- a/test-wildmatch.c
> +++ b/test-wildmatch.c
> @@ -12,9 +12,11 @@ int main(int argc, char **argv)
>  			argv[i] += 3;
>  	}
>  	if (!strcmp(argv[1], "wildmatch"))
> +		return !!wildmatch(argv[3], argv[2], FNM_PATHNAME);
> +	else if (!strcmp(argv[1], "pathspec"))
>  		return !!wildmatch(argv[3], argv[2], 0);

"ipathmatch" to pass only FNM_CASEFOLD may be a natural extension,
but I doubt we use that combination in the real life (yet).

I am probably two step ahead of what is being done (read: this will
be a Git 2.0 topic, if not later), but I am wondering how we'd
integrate this new machinery well with the pathspec limited
traversal.

Traditionally, when traversing a tree limited with a pathspec, say,
"Documentation/*.txt", we used the simple part of the prefix and
noticed that any subdirectory whose name is not "Documentation" can
never match the pathspec and avoided descending into it.  As the
user can use "**" to expand to "any levels of subdirectory", I think
the pathspec limited traversal eventually can safely (and should)
use FNM_PATHNAME by default.  

That will allow people to say "Documentation/**/*.txt" to match both
"git.txt" and "howto/maintain-git.txt", without resorting to the
more traditional !FNM_PATHNAME semantics "Documentation/*.txt" to
match the latter (i.e. "*" matches "howto/maintain-git").

When that happens, we should want to retain the same "do not bother
to descend into subdirectories that will never match" optimization
for a pattern like "Doc*tion/**/*.txt".  Because of FNM_PATHNAME, we
can tell if a subdirectory is worth descending into by looking at
the not-so-simple prefix "Doc*tion/"; "Documentation" will match,
"Doc" will not (because '*' won't match '/').

Which tells me that integrating this _well_ into the rest of the
system is not just a matter of replacing fnmatch() with wildmatch().

I also expect that wildmatch() would be much slower than fnmatch()
especially when doing its "**" magic, but I personally do not think
it will be a showstopper.  If the user asks for a more powerful but
expensive operation, we are saving time for the user by doing a more
powerful thing (reducing the need to postprocess the results) and
can afford to spend extra cycles.

As long as simpler patterns fnmatch() groks (namely, '?', '*', and
'[class]' wildcards only) are not slowed down by replacing it with
wildmatch(), that is, of course.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 3/3] Convert all fnmatch() calls to wildmatch()
  2012-12-19 13:08 ` [PATCH 3/3] Convert all fnmatch() calls to wildmatch() Nguyễn Thái Ngọc Duy
@ 2012-12-19 18:36   ` Junio C Hamano
  2012-12-20 12:36     ` Nguyen Thai Ngoc Duy
  0 siblings, 1 reply; 11+ messages in thread
From: Junio C Hamano @ 2012-12-19 18:36 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git

Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:

> diff --git a/tree-walk.c b/tree-walk.c
> index 492c7cd..c729e89 100644
> --- a/tree-walk.c
> +++ b/tree-walk.c
> @@ -3,6 +3,7 @@
>  #include "unpack-trees.h"
>  #include "dir.h"
>  #include "tree.h"
> +#include "wildmatch.h"
>  
>  static const char *get_mode(const char *str, unsigned int *modep)
>  {
> @@ -627,7 +628,7 @@ enum interesting tree_entry_interesting(const struct name_entry *entry,
>  				return entry_interesting;
>  
>  			if (item->use_wildcard) {
> -				if (!fnmatch(match + baselen, entry->path, 0))
> +				if (!wildmatch(match + baselen, entry->path, 0))
>  					return entry_interesting;

This and the change to dir.c obviously have interactions with
8c6abbc (pathspec: apply "*.c" optimization from exclude,
2012-11-24).

I've already alluded to it in my response to 2/3, I guess.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/3] wildmatch: support "no FNM_PATHNAME" mode
  2012-12-19 17:24   ` Junio C Hamano
@ 2012-12-20  1:55     ` Nguyen Thai Ngoc Duy
  2012-12-20 13:34       ` Nguyen Thai Ngoc Duy
  0 siblings, 1 reply; 11+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2012-12-20  1:55 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Thu, Dec 20, 2012 at 12:24 AM, Junio C Hamano <gitster@pobox.com> wrote:
> When that happens, we should want to retain the same "do not bother
> to descend into subdirectories that will never match" optimization
> for a pattern like "Doc*tion/**/*.txt".  Because of FNM_PATHNAME, we
> can tell if a subdirectory is worth descending into by looking at
> the not-so-simple prefix "Doc*tion/"; "Documentation" will match,
> "Doc" will not (because '*' won't match '/').
>
> Which tells me that integrating this _well_ into the rest of the
> system is not just a matter of replacing fnmatch() with wildmatch().

Yeah, we open a door for more opportunities and a lot more headache.

> I also expect that wildmatch() would be much slower than fnmatch()
> especially when doing its "**" magic, but I personally do not think
> it will be a showstopper.

A potential showstopper is the lack of multibyte support. I don't know
if people want that though.

> If the user asks for a more powerful but
> expensive operation, we are saving time for the user by doing a more
> powerful thing (reducing the need to postprocess the results) and
> can afford to spend extra cycles.

In some case we may be able to spend fewer cycles by preprocessing
patterns first.

> As long as simpler patterns fnmatch() groks (namely, '?', '*', and
> '[class]' wildcards only) are not slowed down by replacing it with
> wildmatch(), that is, of course.

I'm concerned about performance vs fnmatch too. I'll probably write a
small program to exercise both and measure it with glibc.
-- 
Duy

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 3/3] Convert all fnmatch() calls to wildmatch()
  2012-12-19 18:36   ` Junio C Hamano
@ 2012-12-20 12:36     ` Nguyen Thai Ngoc Duy
  0 siblings, 0 replies; 11+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2012-12-20 12:36 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Thu, Dec 20, 2012 at 1:36 AM, Junio C Hamano <gitster@pobox.com> wrote:
>> @@ -627,7 +628,7 @@ enum interesting tree_entry_interesting(const struct name_entry *entry,
>>                               return entry_interesting;
>>
>>                       if (item->use_wildcard) {
>> -                             if (!fnmatch(match + baselen, entry->path, 0))
>> +                             if (!wildmatch(match + baselen, entry->path, 0))
>>                                       return entry_interesting;
>
> This and the change to dir.c obviously have interactions with
> 8c6abbc (pathspec: apply "*.c" optimization from exclude,
> 2012-11-24).
>
> I've already alluded to it in my response to 2/3, I guess.

Conflict of plans. I did not expect myself to work on replacing
fnmatch soon and git_fnmatch is an intermediate step to collect more
optimizations and gradually replace fnmatch. Eventually git_fnmatch()
would become a wrapper of wildmatch, all the optimizations are pushed
down there. If we replace fnmatch->wildmatch first then there's little
reason for the existence of git_fnmatch(). Maybe I should merge this
with the fnmatch elimination into a single series.
-- 
Duy

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/3] wildmatch: support "no FNM_PATHNAME" mode
  2012-12-20  1:55     ` Nguyen Thai Ngoc Duy
@ 2012-12-20 13:34       ` Nguyen Thai Ngoc Duy
  0 siblings, 0 replies; 11+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2012-12-20 13:34 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Thu, Dec 20, 2012 at 8:55 AM, Nguyen Thai Ngoc Duy <pclouds@gmail.com> wrote:
>> As long as simpler patterns fnmatch() groks (namely, '?', '*', and
>> '[class]' wildcards only) are not slowed down by replacing it with
>> wildmatch(), that is, of course.
>
> I'm concerned about performance vs fnmatch too. I'll probably write a
> small program to exercise both and measure it with glibc.

I'll send out proper patches later this weekend, but unless I make big
mistakes it seems wildmatch is faster than fnmatch from glibc 2.14.1
on x86-64 :) Maybe I did make mistakes. The test matches every line in
/tmp/filelist.txt (which is ls-files from linux-2.6.git) 2000 times.
"pathname" means FNM_PATHNAME.

$ export LANG=C
$ ./test-wildmatch-perf /tmp/filelist.txt 'Documentation/*' 2000
wildmatch 1s 528394us
fnmatch   2s 588396us
$ ./test-wildmatch-perf /tmp/filelist.txt 'drivers/*' 2000
wildmatch 1s 957243us
fnmatch   3s 134839us
$ ./test-wildmatch-perf /tmp/filelist.txt 'Documentation/*' 2000 pathname
wildmatch 1s 548575us
fnmatch   2s 629680us
$ ./test-wildmatch-perf /tmp/filelist.txt 'drivers/*' 2000 pathname
wildmatch 2s 142377us
fnmatch   3s 318295us
$ ./test-wildmatch-perf /tmp/filelist.txt '[Dd]ocu[Mn]entation/*' 2000
wildmatch 1s 748505us
fnmatch   3s 224414us
$ ./test-wildmatch-perf /tmp/filelist.txt '[Dd]o?u[Mn]en?ati?n/*' 2000
wildmatch 1s 743268us
fnmatch   3s 278034us
$ ./test-wildmatch-perf /tmp/filelist.txt '[Dd]o?u[Mn]en?ati?n/*' 2000 pathname
wildmatch 1s 745151us
fnmatch   3s 314127us
$ ./test-wildmatch-perf /tmp/filelist.txt nonsense 2000 pathname
wildmatch 1s 310727us
fnmatch   2s 279123us
-- 
Duy

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2012-12-20 13:34 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-12-19 13:08 [PATCH/WIP 0/3] Bye bye fnmatch() Nguyễn Thái Ngọc Duy
2012-12-19 13:08 ` [PATCH 1/3] wildmatch: make dowild() take arbitrary flags Nguyễn Thái Ngọc Duy
2012-12-19 16:32   ` Junio C Hamano
2012-12-19 13:08 ` [PATCH 2/3] wildmatch: support "no FNM_PATHNAME" mode Nguyễn Thái Ngọc Duy
2012-12-19 17:24   ` Junio C Hamano
2012-12-20  1:55     ` Nguyen Thai Ngoc Duy
2012-12-20 13:34       ` Nguyen Thai Ngoc Duy
2012-12-19 13:08 ` [PATCH 3/3] Convert all fnmatch() calls to wildmatch() Nguyễn Thái Ngọc Duy
2012-12-19 18:36   ` Junio C Hamano
2012-12-20 12:36     ` Nguyen Thai Ngoc Duy
2012-12-19 13:21 ` [PATCH/WIP 0/3] Bye bye fnmatch() Nguyen Thai Ngoc Duy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).