* [PATCH] wildmatch: properly fold case everywhere @ 2013-05-28 12:32 Anthony Ramine 2013-05-28 12:53 ` Duy Nguyen ` (2 more replies) 0 siblings, 3 replies; 18+ messages in thread From: Anthony Ramine @ 2013-05-28 12:32 UTC (permalink / raw) To: git; +Cc: Nguyễn Thái Ngọc Duy Case folding is not done correctly when matching against the [:upper:] character class and uppercased character ranges (e.g. A-Z). Specifically, an uppercase letter fails to match against any of them when case folding is requested because plain characters in the pattern and the whole string and preemptively lowercased to handle the base case fast. That optimization is kept and ISLOWER() is used in the [:upper:] case when case folding is requested, while matching against a character range is retried with toupper() if the character was lowercase. Signed-off-by: Anthony Ramine <n.oxyde@gmail.com> --- t/t3070-wildmatch.sh | 43 +++++++++++++++++++++++++++++++++++++------ wildmatch.c | 7 +++++++ 2 files changed, 44 insertions(+), 6 deletions(-) diff --git a/t/t3070-wildmatch.sh b/t/t3070-wildmatch.sh index 4c37057..17315aa 100755 --- a/t/t3070-wildmatch.sh +++ b/t/t3070-wildmatch.sh @@ -6,20 +6,20 @@ test_description='wildmatch tests' match() { if [ $1 = 1 ]; then - test_expect_success "wildmatch: match '$3' '$4'" " + test_expect_success "wildmatch: match '$3' '$4'" " test-wildmatch wildmatch '$3' '$4' " else - test_expect_success "wildmatch: no match '$3' '$4'" " + test_expect_success "wildmatch: no match '$3' '$4'" " ! test-wildmatch wildmatch '$3' '$4' " fi if [ $2 = 1 ]; then - test_expect_success "fnmatch: match '$3' '$4'" " + test_expect_success "fnmatch: match '$3' '$4'" " test-wildmatch fnmatch '$3' '$4' " elif [ $2 = 0 ]; then - test_expect_success "fnmatch: no match '$3' '$4'" " + test_expect_success "fnmatch: no match '$3' '$4'" " ! test-wildmatch fnmatch '$3' '$4' " # else @@ -29,13 +29,25 @@ match() { fi } +imatch() { + if [ $1 = 1 ]; then + test_expect_success "iwildmatch: match '$2' '$3'" " + test-wildmatch iwildmatch '$2' '$3' + " + else + test_expect_success "iwildmatch: no match '$2' '$3'" " + ! test-wildmatch iwildmatch '$2' '$3' + " + fi +} + pathmatch() { if [ $1 = 1 ]; then - test_expect_success "pathmatch: match '$2' '$3'" " + test_expect_success "pathmatch: match '$2' '$3'" " test-wildmatch pathmatch '$2' '$3' " else - test_expect_success "pathmatch: no match '$2' '$3'" " + test_expect_success "pathmatch: no match '$2' '$3'" " ! test-wildmatch pathmatch '$2' '$3' " fi @@ -235,4 +247,23 @@ pathmatch 1 abcXdefXghi '*X*i' pathmatch 1 ab/cXd/efXg/hi '*/*X*/*/*i' pathmatch 1 ab/cXd/efXg/hi '*Xg*i' +# Case-sensitivy features +match 0 x 'a' '[A-Z]' +match 1 x 'A' '[A-Z]' +match 0 x 'A' '[a-z]' +match 1 x 'a' '[a-z]' +match 0 x 'a' '[[:upper:]]' +match 1 x 'A' '[[:upper:]]' +match 0 x 'A' '[[:lower:]]' +match 1 x 'a' '[[:lower:]]' + +imatch 1 'a' '[A-Z]' +imatch 1 'A' '[A-Z]' +imatch 1 'A' '[a-z]' +imatch 1 'a' '[a-z]' +imatch 1 'a' '[[:upper:]]' +imatch 1 'A' '[[:upper:]]' +imatch 1 'A' '[[:lower:]]' +imatch 1 'a' '[[:lower:]]' + test_done diff --git a/wildmatch.c b/wildmatch.c index 7192bdc..ea318d3 100644 --- a/wildmatch.c +++ b/wildmatch.c @@ -196,6 +196,11 @@ static int dowild(const uchar *p, const uchar *text, unsigned int flags) } if (t_ch <= p_ch && t_ch >= prev_ch) matched = 1; + else if ((flags & WM_CASEFOLD) && ISLOWER(t_ch)) { + t_ch = toupper(t_ch); + if (t_ch <= p_ch && t_ch >= prev_ch) + matched = 1; + } p_ch = 0; /* This makes "prev_ch" get set to 0. */ } else if (p_ch == '[' && p[1] == ':') { const uchar *s; @@ -245,6 +250,8 @@ static int dowild(const uchar *p, const uchar *text, unsigned int flags) } else if (CC_EQ(s,i, "upper")) { if (ISUPPER(t_ch)) matched = 1; + else if ((flags & WM_CASEFOLD) && ISLOWER(t_ch)) + matched = 1; } else if (CC_EQ(s,i, "xdigit")) { if (ISXDIGIT(t_ch)) matched = 1; -- 1.8.3 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH] wildmatch: properly fold case everywhere 2013-05-28 12:32 [PATCH] wildmatch: properly fold case everywhere Anthony Ramine @ 2013-05-28 12:53 ` Duy Nguyen 2013-05-28 13:01 ` Anthony Ramine 2013-05-28 13:10 ` [PATCH v2] " Anthony Ramine 2013-05-28 13:58 ` [PATCH v3] " Anthony Ramine 2 siblings, 1 reply; 18+ messages in thread From: Duy Nguyen @ 2013-05-28 12:53 UTC (permalink / raw) To: Anthony Ramine; +Cc: Git Mailing List On Tue, May 28, 2013 at 7:32 PM, Anthony Ramine <n.oxyde@gmail.com> wrote: > @@ -196,6 +196,11 @@ static int dowild(const uchar *p, const uchar *text, unsigned int flags) > } > if (t_ch <= p_ch && t_ch >= prev_ch) > matched = 1; > + else if ((flags & WM_CASEFOLD) && ISLOWER(t_ch)) { > + t_ch = toupper(t_ch); This happens in a while loop where t_ch may be used again. Should we make a local copy of toupper(t_ch) and leave t_ch untouched? > + if (t_ch <= p_ch && t_ch >= prev_ch) > + matched = 1; > + } > p_ch = 0; /* This makes "prev_ch" get set to 0. */ > } else if (p_ch == '[' && p[1] == ':') { > const uchar *s; -- Duy ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] wildmatch: properly fold case everywhere 2013-05-28 12:53 ` Duy Nguyen @ 2013-05-28 13:01 ` Anthony Ramine 0 siblings, 0 replies; 18+ messages in thread From: Anthony Ramine @ 2013-05-28 13:01 UTC (permalink / raw) To: Duy Nguyen; +Cc: Git Mailing List You're right, I will amend my patch. How do I make git-send-email reply to that thread? -- Anthony Ramine Le 28 mai 2013 à 14:53, Duy Nguyen a écrit : > On Tue, May 28, 2013 at 7:32 PM, Anthony Ramine <n.oxyde@gmail.com> wrote: >> @@ -196,6 +196,11 @@ static int dowild(const uchar *p, const uchar *text, unsigned int flags) >> } >> if (t_ch <= p_ch && t_ch >= prev_ch) >> matched = 1; >> + else if ((flags & WM_CASEFOLD) && ISLOWER(t_ch)) { >> + t_ch = toupper(t_ch); > > This happens in a while loop where t_ch may be used again. Should we > make a local copy of toupper(t_ch) and leave t_ch untouched? > >> + if (t_ch <= p_ch && t_ch >= prev_ch) >> + matched = 1; >> + } >> p_ch = 0; /* This makes "prev_ch" get set to 0. */ >> } else if (p_ch == '[' && p[1] == ':') { >> const uchar *s; > -- > Duy ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v2] wildmatch: properly fold case everywhere 2013-05-28 12:32 [PATCH] wildmatch: properly fold case everywhere Anthony Ramine 2013-05-28 12:53 ` Duy Nguyen @ 2013-05-28 13:10 ` Anthony Ramine 2013-05-28 13:58 ` [PATCH v3] " Anthony Ramine 2 siblings, 0 replies; 18+ messages in thread From: Anthony Ramine @ 2013-05-28 13:10 UTC (permalink / raw) To: git; +Cc: Nguyễn Thái Ngọc Duy Case folding is not done correctly when matching against the [:upper:] character class and uppercased character ranges (e.g. A-Z). Specifically, an uppercase letter fails to match against any of them when case folding is requested because plain characters in the pattern and the whole string and preemptively lowercased to handle the base case fast. That optimization is kept and ISLOWER() is used in the [:upper:] case when case folding is requested, while matching against a character range is retried with toupper() if the character was lowercase. Signed-off-by: Anthony Ramine <n.oxyde@gmail.com> --- t/t3070-wildmatch.sh | 47 +++++++++++++++++++++++++++++++++++++++++------ wildmatch.c | 7 +++++++ 2 files changed, 48 insertions(+), 6 deletions(-) diff --git a/t/t3070-wildmatch.sh b/t/t3070-wildmatch.sh index 4c37057..e1b45e6 100755 --- a/t/t3070-wildmatch.sh +++ b/t/t3070-wildmatch.sh @@ -6,20 +6,20 @@ test_description='wildmatch tests' match() { if [ $1 = 1 ]; then - test_expect_success "wildmatch: match '$3' '$4'" " + test_expect_success "wildmatch: match '$3' '$4'" " test-wildmatch wildmatch '$3' '$4' " else - test_expect_success "wildmatch: no match '$3' '$4'" " + test_expect_success "wildmatch: no match '$3' '$4'" " ! test-wildmatch wildmatch '$3' '$4' " fi if [ $2 = 1 ]; then - test_expect_success "fnmatch: match '$3' '$4'" " + test_expect_success "fnmatch: match '$3' '$4'" " test-wildmatch fnmatch '$3' '$4' " elif [ $2 = 0 ]; then - test_expect_success "fnmatch: no match '$3' '$4'" " + test_expect_success "fnmatch: no match '$3' '$4'" " ! test-wildmatch fnmatch '$3' '$4' " # else @@ -29,13 +29,25 @@ match() { fi } +imatch() { + if [ $1 = 1 ]; then + test_expect_success "iwildmatch: match '$2' '$3'" " + test-wildmatch iwildmatch '$2' '$3' + " + else + test_expect_success "iwildmatch: no match '$2' '$3'" " + ! test-wildmatch iwildmatch '$2' '$3' + " + fi +} + pathmatch() { if [ $1 = 1 ]; then - test_expect_success "pathmatch: match '$2' '$3'" " + test_expect_success "pathmatch: match '$2' '$3'" " test-wildmatch pathmatch '$2' '$3' " else - test_expect_success "pathmatch: no match '$2' '$3'" " + test_expect_success "pathmatch: no match '$2' '$3'" " ! test-wildmatch pathmatch '$2' '$3' " fi @@ -235,4 +247,27 @@ pathmatch 1 abcXdefXghi '*X*i' pathmatch 1 ab/cXd/efXg/hi '*/*X*/*/*i' pathmatch 1 ab/cXd/efXg/hi '*Xg*i' +# Case-sensitivy features +match 0 x 'a' '[A-Z]' +match 1 x 'A' '[A-Z]' +match 0 x 'A' '[a-z]' +match 1 x 'a' '[a-z]' +match 0 x 'a' '[[:upper:]]' +match 1 x 'A' '[[:upper:]]' +match 0 x 'A' '[[:lower:]]' +match 1 x 'a' '[[:lower:]]' +match 0 x 'A' '[B-Za]' +match 1 x 'a' '[B-Za]' + +imatch 1 'a' '[A-Z]' +imatch 1 'A' '[A-Z]' +imatch 1 'A' '[a-z]' +imatch 1 'a' '[a-z]' +imatch 1 'a' '[[:upper:]]' +imatch 1 'A' '[[:upper:]]' +imatch 1 'A' '[[:lower:]]' +imatch 1 'a' '[[:lower:]]' +imatch 1 'A' '[B-Za]' +imatch 1 'a' '[B-Za]' + test_done diff --git a/wildmatch.c b/wildmatch.c index 7192bdc..ea318d3 100644 --- a/wildmatch.c +++ b/wildmatch.c @@ -196,6 +196,11 @@ static int dowild(const uchar *p, const uchar *text, unsigned int flags) } if (t_ch <= p_ch && t_ch >= prev_ch) matched = 1; + else if ((flags & WM_CASEFOLD) && ISLOWER(t_ch)) { + t_ch = toupper(t_ch); + if (t_ch <= p_ch && t_ch >= prev_ch) + matched = 1; + } p_ch = 0; /* This makes "prev_ch" get set to 0. */ } else if (p_ch == '[' && p[1] == ':') { const uchar *s; @@ -245,6 +250,8 @@ static int dowild(const uchar *p, const uchar *text, unsigned int flags) } else if (CC_EQ(s,i, "upper")) { if (ISUPPER(t_ch)) matched = 1; + else if ((flags & WM_CASEFOLD) && ISLOWER(t_ch)) + matched = 1; } else if (CC_EQ(s,i, "xdigit")) { if (ISXDIGIT(t_ch)) matched = 1; -- 1.8.3 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v3] wildmatch: properly fold case everywhere 2013-05-28 12:32 [PATCH] wildmatch: properly fold case everywhere Anthony Ramine 2013-05-28 12:53 ` Duy Nguyen 2013-05-28 13:10 ` [PATCH v2] " Anthony Ramine @ 2013-05-28 13:58 ` Anthony Ramine 2013-05-29 13:22 ` Duy Nguyen 2 siblings, 1 reply; 18+ messages in thread From: Anthony Ramine @ 2013-05-28 13:58 UTC (permalink / raw) To: Git Mailing List; +Cc: Nguyễn Thái Ngọc Duy Case folding is not done correctly when matching against the [:upper:] character class and uppercased character ranges (e.g. A-Z). Specifically, an uppercase letter fails to match against any of them when case folding is requested because plain characters in the pattern and the whole string and preemptively lowercased to handle the base case fast. That optimization is kept and ISLOWER() is used in the [:upper:] case when case folding is requested, while matching against a character range is retried with toupper() if the character was lowercase. Signed-off-by: Anthony Ramine <n.oxyde@gmail.com> --- t/t3070-wildmatch.sh | 47 +++++++++++++++++++++++++++++++++++++++++------ wildmatch.c | 7 +++++++ 2 files changed, 48 insertions(+), 6 deletions(-) Please disregard PATCH v2, it is identical to the first one. diff --git a/t/t3070-wildmatch.sh b/t/t3070-wildmatch.sh index 4c37057..e1b45e6 100755 --- a/t/t3070-wildmatch.sh +++ b/t/t3070-wildmatch.sh @@ -6,20 +6,20 @@ test_description='wildmatch tests' match() { if [ $1 = 1 ]; then - test_expect_success "wildmatch: match '$3' '$4'" " + test_expect_success "wildmatch: match '$3' '$4'" " test-wildmatch wildmatch '$3' '$4' " else - test_expect_success "wildmatch: no match '$3' '$4'" " + test_expect_success "wildmatch: no match '$3' '$4'" " ! test-wildmatch wildmatch '$3' '$4' " fi if [ $2 = 1 ]; then - test_expect_success "fnmatch: match '$3' '$4'" " + test_expect_success "fnmatch: match '$3' '$4'" " test-wildmatch fnmatch '$3' '$4' " elif [ $2 = 0 ]; then - test_expect_success "fnmatch: no match '$3' '$4'" " + test_expect_success "fnmatch: no match '$3' '$4'" " ! test-wildmatch fnmatch '$3' '$4' " # else @@ -29,13 +29,25 @@ match() { fi } +imatch() { + if [ $1 = 1 ]; then + test_expect_success "iwildmatch: match '$2' '$3'" " + test-wildmatch iwildmatch '$2' '$3' + " + else + test_expect_success "iwildmatch: no match '$2' '$3'" " + ! test-wildmatch iwildmatch '$2' '$3' + " + fi +} + pathmatch() { if [ $1 = 1 ]; then - test_expect_success "pathmatch: match '$2' '$3'" " + test_expect_success "pathmatch: match '$2' '$3'" " test-wildmatch pathmatch '$2' '$3' " else - test_expect_success "pathmatch: no match '$2' '$3'" " + test_expect_success "pathmatch: no match '$2' '$3'" " ! test-wildmatch pathmatch '$2' '$3' " fi @@ -235,4 +247,27 @@ pathmatch 1 abcXdefXghi '*X*i' pathmatch 1 ab/cXd/efXg/hi '*/*X*/*/*i' pathmatch 1 ab/cXd/efXg/hi '*Xg*i' +# Case-sensitivy features +match 0 x 'a' '[A-Z]' +match 1 x 'A' '[A-Z]' +match 0 x 'A' '[a-z]' +match 1 x 'a' '[a-z]' +match 0 x 'a' '[[:upper:]]' +match 1 x 'A' '[[:upper:]]' +match 0 x 'A' '[[:lower:]]' +match 1 x 'a' '[[:lower:]]' +match 0 x 'A' '[B-Za]' +match 1 x 'a' '[B-Za]' + +imatch 1 'a' '[A-Z]' +imatch 1 'A' '[A-Z]' +imatch 1 'A' '[a-z]' +imatch 1 'a' '[a-z]' +imatch 1 'a' '[[:upper:]]' +imatch 1 'A' '[[:upper:]]' +imatch 1 'A' '[[:lower:]]' +imatch 1 'a' '[[:lower:]]' +imatch 1 'A' '[B-Za]' +imatch 1 'a' '[B-Za]' + test_done diff --git a/wildmatch.c b/wildmatch.c index 7192bdc..f91ba99 100644 --- a/wildmatch.c +++ b/wildmatch.c @@ -196,6 +196,11 @@ static int dowild(const uchar *p, const uchar *text, unsigned int flags) } if (t_ch <= p_ch && t_ch >= prev_ch) matched = 1; + else if ((flags & WM_CASEFOLD) && ISLOWER(t_ch)) { + uchar t_ch_upper = toupper(t_ch); + if (t_ch_upper <= p_ch && t_ch_upper >= prev_ch) + matched = 1; + } p_ch = 0; /* This makes "prev_ch" get set to 0. */ } else if (p_ch == '[' && p[1] == ':') { const uchar *s; @@ -245,6 +250,8 @@ static int dowild(const uchar *p, const uchar *text, unsigned int flags) } else if (CC_EQ(s,i, "upper")) { if (ISUPPER(t_ch)) matched = 1; + else if ((flags & WM_CASEFOLD) && ISLOWER(t_ch)) + matched = 1; } else if (CC_EQ(s,i, "xdigit")) { if (ISXDIGIT(t_ch)) matched = 1; -- 1.8.3 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH v3] wildmatch: properly fold case everywhere 2013-05-28 13:58 ` [PATCH v3] " Anthony Ramine @ 2013-05-29 13:22 ` Duy Nguyen 2013-05-29 13:37 ` Anthony Ramine 0 siblings, 1 reply; 18+ messages in thread From: Duy Nguyen @ 2013-05-29 13:22 UTC (permalink / raw) To: Anthony Ramine; +Cc: Git Mailing List On Tue, May 28, 2013 at 8:58 PM, Anthony Ramine <n.oxyde@gmail.com> wrote: > Case folding is not done correctly when matching against the [:upper:] > character class and uppercased character ranges (e.g. A-Z). > Specifically, an uppercase letter fails to match against any of them > when case folding is requested because plain characters in the pattern > and the whole string and preemptively lowercased to handle the base case > fast. I did a little test with glibc fnmatch and also checked the source code. I don't think 'a' matches [:upper:]. So I'm not sure if that's a correct behavior or a bug in glibc. The spec is not clear (I think) on this. I guess we should just assume that 'a' should match '[:upper:]'? > @@ -196,6 +196,11 @@ static int dowild(const uchar *p, const uchar *text, unsigned int flags) > } > if (t_ch <= p_ch && t_ch >= prev_ch) > matched = 1; > + else if ((flags & WM_CASEFOLD) && ISLOWER(t_ch)) { > + uchar t_ch_upper = toupper(t_ch); > + if (t_ch_upper <= p_ch && t_ch_upper >= prev_ch) > + matched = 1; > + } Or we could stick with to tolower. Something like this if ((t_ch <= p_ch && t_ch >= prev_ch) || ((flags & WM_CASEFOLD) && t_ch <= tolower(p_ch) && t_ch >= tolower(prev_ch))) match = 1; I think it's easier to read if we either downcase all, or upcase all, not both. > p_ch = 0; /* This makes "prev_ch" get set to 0. */ > } else if (p_ch == '[' && p[1] == ':') { > const uchar *s; > @@ -245,6 +250,8 @@ static int dowild(const uchar *p, const uchar *text, unsigned int flags) > } else if (CC_EQ(s,i, "upper")) { > if (ISUPPER(t_ch)) > matched = 1; > + else if ((flags & WM_CASEFOLD) && ISLOWER(t_ch)) > + matched = 1; > } else if (CC_EQ(s,i, "xdigit")) { > if (ISXDIGIT(t_ch)) > matched = 1; If WM_CASEFOLD is set, maybe isalpha(t_ch) is enough then? -- Duy ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v3] wildmatch: properly fold case everywhere 2013-05-29 13:22 ` Duy Nguyen @ 2013-05-29 13:37 ` Anthony Ramine 2013-05-29 13:52 ` Duy Nguyen 0 siblings, 1 reply; 18+ messages in thread From: Anthony Ramine @ 2013-05-29 13:37 UTC (permalink / raw) To: Duy Nguyen; +Cc: Git Mailing List Replied inline. Regards, -- Anthony Ramine Le 29 mai 2013 à 15:22, Duy Nguyen a écrit : > On Tue, May 28, 2013 at 8:58 PM, Anthony Ramine <n.oxyde@gmail.com> wrote: >> Case folding is not done correctly when matching against the [:upper:] >> character class and uppercased character ranges (e.g. A-Z). >> Specifically, an uppercase letter fails to match against any of them >> when case folding is requested because plain characters in the pattern >> and the whole string and preemptively lowercased to handle the base case >> fast. > > I did a little test with glibc fnmatch and also checked the source > code. I don't think 'a' matches [:upper:]. So I'm not sure if that's a > correct behavior or a bug in glibc. The spec is not clear (I think) on > this. I guess we should just assume that 'a' should match '[:upper:]'? I don't know, in my opinion if case folding is enabled we should say [:upper:], [:lower:] and [:alpha:] are equivalent. This opinion is shared by GNU Flex [1]: > • If your scanner is case-insensitive (the ‘-i’ flag), then ‘[:upper:]’ and ‘[:lower:]’ are equivalent to ‘[:alpha:]’. [1] http://flex.sourceforge.net/manual/Patterns.html >> @@ -196,6 +196,11 @@ static int dowild(const uchar *p, const uchar *text, unsigned int flags) >> } >> if (t_ch <= p_ch && t_ch >= prev_ch) >> matched = 1; >> + else if ((flags & WM_CASEFOLD) && ISLOWER(t_ch)) { >> + uchar t_ch_upper = toupper(t_ch); >> + if (t_ch_upper <= p_ch && t_ch_upper >= prev_ch) >> + matched = 1; >> + } > > Or we could stick with to tolower. Something like this > > if ((t_ch <= p_ch && t_ch >= prev_ch) || > ((flags & WM_CASEFOLD) && > t_ch <= tolower(p_ch) && t_ch >= tolower(prev_ch))) > match = 1; > > I think it's easier to read if we either downcase all, or upcase all, not both. If the range to match against is [A-_], it will become [a-_] which is an empty range, ord('a') > ord('_'). I think it is simpler to reuse toupper() after the fact as I did. Anyway maybe I should add a test for that corner case? >> p_ch = 0; /* This makes "prev_ch" get set to 0. */ >> } else if (p_ch == '[' && p[1] == ':') { >> const uchar *s; >> @@ -245,6 +250,8 @@ static int dowild(const uchar *p, const uchar *text, unsigned int flags) >> } else if (CC_EQ(s,i, "upper")) { >> if (ISUPPER(t_ch)) >> matched = 1; >> + else if ((flags & WM_CASEFOLD) && ISLOWER(t_ch)) >> + matched = 1; >> } else if (CC_EQ(s,i, "xdigit")) { >> if (ISXDIGIT(t_ch)) >> matched = 1; > > If WM_CASEFOLD is set, maybe isalpha(t_ch) is enough then? Yes isalpha() is enought but I wanted to keep the two cases separated, I can amend that if you want. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v3] wildmatch: properly fold case everywhere 2013-05-29 13:37 ` Anthony Ramine @ 2013-05-29 13:52 ` Duy Nguyen 2013-05-29 17:57 ` Anthony Ramine 0 siblings, 1 reply; 18+ messages in thread From: Duy Nguyen @ 2013-05-29 13:52 UTC (permalink / raw) To: Anthony Ramine; +Cc: Git Mailing List On Wed, May 29, 2013 at 8:37 PM, Anthony Ramine <n.oxyde@gmail.com> wrote: > Le 29 mai 2013 à 15:22, Duy Nguyen a écrit : > >> On Tue, May 28, 2013 at 8:58 PM, Anthony Ramine <n.oxyde@gmail.com> wrote: >>> Case folding is not done correctly when matching against the [:upper:] >>> character class and uppercased character ranges (e.g. A-Z). >>> Specifically, an uppercase letter fails to match against any of them >>> when case folding is requested because plain characters in the pattern >>> and the whole string and preemptively lowercased to handle the base case >>> fast. >> >> I did a little test with glibc fnmatch and also checked the source >> code. I don't think 'a' matches [:upper:]. So I'm not sure if that's a >> correct behavior or a bug in glibc. The spec is not clear (I think) on >> this. I guess we should just assume that 'a' should match '[:upper:]'? > > I don't know, in my opinion if case folding is enabled we should say [:upper:], [:lower:] and [:alpha:] are equivalent. > > This opinion is shared by GNU Flex [1]: > >> • If your scanner is case-insensitive (the ‘-i’ flag), then ‘[:upper:]’ and ‘[:lower:]’ are equivalent to ‘[:alpha:]’. > > [1] http://flex.sourceforge.net/manual/Patterns.html Then we should do it too because of this precedent, I think. >>> @@ -196,6 +196,11 @@ static int dowild(const uchar *p, const uchar *text, unsigned int flags) >>> } >>> if (t_ch <= p_ch && t_ch >= prev_ch) >>> matched = 1; >>> + else if ((flags & WM_CASEFOLD) && ISLOWER(t_ch)) { >>> + uchar t_ch_upper = toupper(t_ch); >>> + if (t_ch_upper <= p_ch && t_ch_upper >= prev_ch) >>> + matched = 1; >>> + } >> >> Or we could stick with to tolower. Something like this >> >> if ((t_ch <= p_ch && t_ch >= prev_ch) || >> ((flags & WM_CASEFOLD) && >> t_ch <= tolower(p_ch) && t_ch >= tolower(prev_ch))) >> match = 1; >> >> I think it's easier to read if we either downcase all, or upcase all, not both. > > If the range to match against is [A-_], it will become [a-_] which is an empty range, ord('a') > ord('_'). I think it is simpler to reuse toupper() after the fact as I did. > > Anyway maybe I should add a test for that corner case? Yeah I was thinking about such a case, but I saw glibc do it... I guess we just found another bug, at least in compat/fnmatch.c. Yes a test for it would be great, in case I change my mind 2 years from now and decide to turn it the other way ;) > >>> p_ch = 0; /* This makes "prev_ch" get set to 0. */ >>> } else if (p_ch == '[' && p[1] == ':') { >>> const uchar *s; >>> @@ -245,6 +250,8 @@ static int dowild(const uchar *p, const uchar *text, unsigned int flags) >>> } else if (CC_EQ(s,i, "upper")) { >>> if (ISUPPER(t_ch)) >>> matched = 1; >>> + else if ((flags & WM_CASEFOLD) && ISLOWER(t_ch)) >>> + matched = 1; >>> } else if (CC_EQ(s,i, "xdigit")) { >>> if (ISXDIGIT(t_ch)) >>> matched = 1; >> >> If WM_CASEFOLD is set, maybe isalpha(t_ch) is enough then? > > Yes isalpha() is enought but I wanted to keep the two cases separated, I can amend that if you want. Either way is fine. I don't think this code is performance critical. Your call. -- Duy ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v3] wildmatch: properly fold case everywhere 2013-05-29 13:52 ` Duy Nguyen @ 2013-05-29 17:57 ` Anthony Ramine 2013-05-30 0:04 ` Duy Nguyen 0 siblings, 1 reply; 18+ messages in thread From: Anthony Ramine @ 2013-05-29 17:57 UTC (permalink / raw) To: Duy Nguyen; +Cc: Git Mailing List Replied inline. -- Anthony Ramine Le 29 mai 2013 à 15:52, Duy Nguyen a écrit : > On Wed, May 29, 2013 at 8:37 PM, Anthony Ramine <n.oxyde@gmail.com> wrote: >> Le 29 mai 2013 à 15:22, Duy Nguyen a écrit : >> >>> On Tue, May 28, 2013 at 8:58 PM, Anthony Ramine <n.oxyde@gmail.com> wrote: >>>> Case folding is not done correctly when matching against the [:upper:] >>>> character class and uppercased character ranges (e.g. A-Z). >>>> Specifically, an uppercase letter fails to match against any of them >>>> when case folding is requested because plain characters in the pattern >>>> and the whole string and preemptively lowercased to handle the base case >>>> fast. >>> >>> I did a little test with glibc fnmatch and also checked the source >>> code. I don't think 'a' matches [:upper:]. So I'm not sure if that's a >>> correct behavior or a bug in glibc. The spec is not clear (I think) on >>> this. I guess we should just assume that 'a' should match '[:upper:]'? >> >> I don't know, in my opinion if case folding is enabled we should say [:upper:], [:lower:] and [:alpha:] are equivalent. >> >> This opinion is shared by GNU Flex [1]: >> >>> • If your scanner is case-insensitive (the ‘-i’ flag), then ‘[:upper:]’ and ‘[:lower:]’ are equivalent to ‘[:alpha:]’. >> >> [1] http://flex.sourceforge.net/manual/Patterns.html > > Then we should do it too because of this precedent, I think. > >>>> @@ -196,6 +196,11 @@ static int dowild(const uchar *p, const uchar *text, unsigned int flags) >>>> } >>>> if (t_ch <= p_ch && t_ch >= prev_ch) >>>> matched = 1; >>>> + else if ((flags & WM_CASEFOLD) && ISLOWER(t_ch)) { >>>> + uchar t_ch_upper = toupper(t_ch); >>>> + if (t_ch_upper <= p_ch && t_ch_upper >= prev_ch) >>>> + matched = 1; >>>> + } >>> >>> Or we could stick with to tolower. Something like this >>> >>> if ((t_ch <= p_ch && t_ch >= prev_ch) || >>> ((flags & WM_CASEFOLD) && >>> t_ch <= tolower(p_ch) && t_ch >= tolower(prev_ch))) >>> match = 1; >>> >>> I think it's easier to read if we either downcase all, or upcase all, not both. >> >> If the range to match against is [A-_], it will become [a-_] which is an empty range, ord('a') > ord('_'). I think it is simpler to reuse toupper() after the fact as I did. >> >> Anyway maybe I should add a test for that corner case? > > Yeah I was thinking about such a case, but I saw glibc do it... I > guess we just found another bug, at least in compat/fnmatch.c. Yes a > test for it would be great, in case I change my mind 2 years from now > and decide to turn it the other way ;) Should I patch compat/fnmatch.c too? That would make it different from the glibc's one. >> >>>> p_ch = 0; /* This makes "prev_ch" get set to 0. */ >>>> } else if (p_ch == '[' && p[1] == ':') { >>>> const uchar *s; >>>> @@ -245,6 +250,8 @@ static int dowild(const uchar *p, const uchar *text, unsigned int flags) >>>> } else if (CC_EQ(s,i, "upper")) { >>>> if (ISUPPER(t_ch)) >>>> matched = 1; >>>> + else if ((flags & WM_CASEFOLD) && ISLOWER(t_ch)) >>>> + matched = 1; >>>> } else if (CC_EQ(s,i, "xdigit")) { >>>> if (ISXDIGIT(t_ch)) >>>> matched = 1; >>> >>> If WM_CASEFOLD is set, maybe isalpha(t_ch) is enough then? >> >> Yes isalpha() is enought but I wanted to keep the two cases separated, I can amend that if you want. > > Either way is fine. I don't think this code is performance critical. Your call. > -- > Duy ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v3] wildmatch: properly fold case everywhere 2013-05-29 17:57 ` Anthony Ramine @ 2013-05-30 0:04 ` Duy Nguyen 2013-05-30 8:45 ` [PATCH] " Anthony Ramine 0 siblings, 1 reply; 18+ messages in thread From: Duy Nguyen @ 2013-05-30 0:04 UTC (permalink / raw) To: Anthony Ramine; +Cc: Git Mailing List On Thu, May 30, 2013 at 12:57 AM, Anthony Ramine <n.oxyde@gmail.com> wrote: >>> If the range to match against is [A-_], it will become [a-_] which is an empty range, ord('a') > ord('_'). I think it is simpler to reuse toupper() after the fact as I did. >>> >>> Anyway maybe I should add a test for that corner case? >> >> Yeah I was thinking about such a case, but I saw glibc do it... I >> guess we just found another bug, at least in compat/fnmatch.c. Yes a >> test for it would be great, in case I change my mind 2 years from now >> and decide to turn it the other way ;) > > Should I patch compat/fnmatch.c too? That would make it different from the glibc's one. No. I plan to remove compat/fnmatch and always use wildmatch, even ignoring system's fnmatch. That would keep the matching behavior consistent across platforms. -- Duy ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH] wildmatch: properly fold case everywhere 2013-05-30 0:04 ` Duy Nguyen @ 2013-05-30 8:45 ` Anthony Ramine 2013-05-30 8:52 ` Duy Nguyen ` (2 more replies) 0 siblings, 3 replies; 18+ messages in thread From: Anthony Ramine @ 2013-05-30 8:45 UTC (permalink / raw) To: Git Mailing List; +Cc: Nguyễn Thái Ngọc Duy Case folding is not done correctly when matching against the [:upper:] character class and uppercased character ranges (e.g. A-Z). Specifically, an uppercase letter fails to match against any of them when case folding is requested because plain characters in the pattern and the whole string and preemptively lowercased to handle the base case fast. That optimization is kept and ISLOWER() is used in the [:upper:] case when case folding is requested, while matching against a character range is retried with toupper() if the character was lowercase, as the bounds of the range itself cannot be modified (in a case-insensitive context, [A-_] is not equivalent to [a-_]). Signed-off-by: Anthony Ramine <n.oxyde@gmail.com> --- t/t3070-wildmatch.sh | 55 ++++++++++++++++++++++++++++++++++++++++++++++------ wildmatch.c | 7 +++++++ 2 files changed, 56 insertions(+), 6 deletions(-) I added four tests for the [A-_] range case and a note about it in the commit message. diff --git a/t/t3070-wildmatch.sh b/t/t3070-wildmatch.sh index 4c37057..38446a0 100755 --- a/t/t3070-wildmatch.sh +++ b/t/t3070-wildmatch.sh @@ -6,20 +6,20 @@ test_description='wildmatch tests' match() { if [ $1 = 1 ]; then - test_expect_success "wildmatch: match '$3' '$4'" " + test_expect_success "wildmatch: match '$3' '$4'" " test-wildmatch wildmatch '$3' '$4' " else - test_expect_success "wildmatch: no match '$3' '$4'" " + test_expect_success "wildmatch: no match '$3' '$4'" " ! test-wildmatch wildmatch '$3' '$4' " fi if [ $2 = 1 ]; then - test_expect_success "fnmatch: match '$3' '$4'" " + test_expect_success "fnmatch: match '$3' '$4'" " test-wildmatch fnmatch '$3' '$4' " elif [ $2 = 0 ]; then - test_expect_success "fnmatch: no match '$3' '$4'" " + test_expect_success "fnmatch: no match '$3' '$4'" " ! test-wildmatch fnmatch '$3' '$4' " # else @@ -29,13 +29,25 @@ match() { fi } +imatch() { + if [ $1 = 1 ]; then + test_expect_success "iwildmatch: match '$2' '$3'" " + test-wildmatch iwildmatch '$2' '$3' + " + else + test_expect_success "iwildmatch: no match '$2' '$3'" " + ! test-wildmatch iwildmatch '$2' '$3' + " + fi +} + pathmatch() { if [ $1 = 1 ]; then - test_expect_success "pathmatch: match '$2' '$3'" " + test_expect_success "pathmatch: match '$2' '$3'" " test-wildmatch pathmatch '$2' '$3' " else - test_expect_success "pathmatch: no match '$2' '$3'" " + test_expect_success "pathmatch: no match '$2' '$3'" " ! test-wildmatch pathmatch '$2' '$3' " fi @@ -235,4 +247,35 @@ pathmatch 1 abcXdefXghi '*X*i' pathmatch 1 ab/cXd/efXg/hi '*/*X*/*/*i' pathmatch 1 ab/cXd/efXg/hi '*Xg*i' +# Case-sensitivy features +match 0 x 'a' '[A-Z]' +match 1 x 'A' '[A-Z]' +match 0 x 'A' '[a-z]' +match 1 x 'a' '[a-z]' +match 0 x 'a' '[[:upper:]]' +match 1 x 'A' '[[:upper:]]' +match 0 x 'A' '[[:lower:]]' +match 1 x 'a' '[[:lower:]]' +match 0 x 'A' '[B-Za]' +match 1 x 'a' '[B-Za]' +match 0 x 'A' '[B-a]' +match 1 x 'a' '[B-a]' +match 0 x 'z' '[Z-y]' +match 1 x 'Z' '[Z-y]' + +imatch 1 'a' '[A-Z]' +imatch 1 'A' '[A-Z]' +imatch 1 'A' '[a-z]' +imatch 1 'a' '[a-z]' +imatch 1 'a' '[[:upper:]]' +imatch 1 'A' '[[:upper:]]' +imatch 1 'A' '[[:lower:]]' +imatch 1 'a' '[[:lower:]]' +imatch 1 'A' '[B-Za]' +imatch 1 'a' '[B-Za]' +imatch 1 'A' '[B-a]' +imatch 1 'a' '[B-a]' +imatch 1 'z' '[Z-y]' +imatch 1 'Z' '[Z-y]' + test_done diff --git a/wildmatch.c b/wildmatch.c index 7192bdc..f91ba99 100644 --- a/wildmatch.c +++ b/wildmatch.c @@ -196,6 +196,11 @@ static int dowild(const uchar *p, const uchar *text, unsigned int flags) } if (t_ch <= p_ch && t_ch >= prev_ch) matched = 1; + else if ((flags & WM_CASEFOLD) && ISLOWER(t_ch)) { + uchar t_ch_upper = toupper(t_ch); + if (t_ch_upper <= p_ch && t_ch_upper >= prev_ch) + matched = 1; + } p_ch = 0; /* This makes "prev_ch" get set to 0. */ } else if (p_ch == '[' && p[1] == ':') { const uchar *s; @@ -245,6 +250,8 @@ static int dowild(const uchar *p, const uchar *text, unsigned int flags) } else if (CC_EQ(s,i, "upper")) { if (ISUPPER(t_ch)) matched = 1; + else if ((flags & WM_CASEFOLD) && ISLOWER(t_ch)) + matched = 1; } else if (CC_EQ(s,i, "xdigit")) { if (ISXDIGIT(t_ch)) matched = 1; -- 1.8.3 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH] wildmatch: properly fold case everywhere 2013-05-30 8:45 ` [PATCH] " Anthony Ramine @ 2013-05-30 8:52 ` Duy Nguyen 2013-05-30 9:07 ` Eric Sunshine 2013-05-30 10:19 ` [PATCH v5] " Anthony Ramine 2 siblings, 0 replies; 18+ messages in thread From: Duy Nguyen @ 2013-05-30 8:52 UTC (permalink / raw) To: Anthony Ramine; +Cc: Git Mailing List On Thu, May 30, 2013 at 3:45 PM, Anthony Ramine <n.oxyde@gmail.com> wrote: > Case folding is not done correctly when matching against the [:upper:] > character class and uppercased character ranges (e.g. A-Z). > Specifically, an uppercase letter fails to match against any of them > when case folding is requested because plain characters in the pattern > and the whole string and preemptively lowercased to handle the base case > fast. > > That optimization is kept and ISLOWER() is used in the [:upper:] case > when case folding is requested, while matching against a character range > is retried with toupper() if the character was lowercase, as the bounds > of the range itself cannot be modified (in a case-insensitive context, > [A-_] is not equivalent to [a-_]). > > Signed-off-by: Anthony Ramine <n.oxyde@gmail.com> Reviewed-by: Duy Nguyen <pclouds@gmail.com> If you have time, you may want to send a similar patch to rsync, which contains original wildmatch implementation. It does not look much different from this one, except that (flags & WM_CASEFOLD) is replaced with force_lower_case. Thanks. -- Duy ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] wildmatch: properly fold case everywhere 2013-05-30 8:45 ` [PATCH] " Anthony Ramine 2013-05-30 8:52 ` Duy Nguyen @ 2013-05-30 9:07 ` Eric Sunshine 2013-05-30 9:29 ` Anthony Ramine 2013-05-30 10:19 ` [PATCH v5] " Anthony Ramine 2 siblings, 1 reply; 18+ messages in thread From: Eric Sunshine @ 2013-05-30 9:07 UTC (permalink / raw) To: Anthony Ramine; +Cc: Git Mailing List, Nguyễn Thái Ngọc Duy On Thu, May 30, 2013 at 4:45 AM, Anthony Ramine <n.oxyde@gmail.com> wrote: > Case folding is not done correctly when matching against the [:upper:] > character class and uppercased character ranges (e.g. A-Z). > Specifically, an uppercase letter fails to match against any of them > when case folding is requested because plain characters in the pattern > and the whole string and preemptively lowercased to handle the base case Did you mean s/and preemptively/are preemptively/ ? > fast. > > That optimization is kept and ISLOWER() is used in the [:upper:] case > when case folding is requested, while matching against a character range > is retried with toupper() if the character was lowercase, as the bounds > of the range itself cannot be modified (in a case-insensitive context, > [A-_] is not equivalent to [a-_]). > > Signed-off-by: Anthony Ramine <n.oxyde@gmail.com> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] wildmatch: properly fold case everywhere 2013-05-30 9:07 ` Eric Sunshine @ 2013-05-30 9:29 ` Anthony Ramine 2013-05-30 10:09 ` Eric Sunshine 0 siblings, 1 reply; 18+ messages in thread From: Anthony Ramine @ 2013-05-30 9:29 UTC (permalink / raw) To: Eric Sunshine; +Cc: Git Mailing List, Nguyễn Thái Ngọc Duy Yes indeed. Will amend. Should I add your name in Reviewed-by as well? -- Anthony Ramine Le 30 mai 2013 à 11:07, Eric Sunshine a écrit : > On Thu, May 30, 2013 at 4:45 AM, Anthony Ramine <n.oxyde@gmail.com> wrote: >> Case folding is not done correctly when matching against the [:upper:] >> character class and uppercased character ranges (e.g. A-Z). >> Specifically, an uppercase letter fails to match against any of them >> when case folding is requested because plain characters in the pattern >> and the whole string and preemptively lowercased to handle the base case > > Did you mean s/and preemptively/are preemptively/ ? > >> fast. >> >> That optimization is kept and ISLOWER() is used in the [:upper:] case >> when case folding is requested, while matching against a character range >> is retried with toupper() if the character was lowercase, as the bounds >> of the range itself cannot be modified (in a case-insensitive context, >> [A-_] is not equivalent to [a-_]). >> >> Signed-off-by: Anthony Ramine <n.oxyde@gmail.com> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] wildmatch: properly fold case everywhere 2013-05-30 9:29 ` Anthony Ramine @ 2013-05-30 10:09 ` Eric Sunshine 0 siblings, 0 replies; 18+ messages in thread From: Eric Sunshine @ 2013-05-30 10:09 UTC (permalink / raw) To: Anthony Ramine; +Cc: Git Mailing List, Nguyễn Thái Ngọc Duy On Thu, May 30, 2013 at 5:29 AM, Anthony Ramine <n.oxyde@gmail.com> wrote: > Yes indeed. Will amend. Should I add your name in Reviewed-by as well? No. I merely spotted a minor typographical error. > -- > Anthony Ramine > > Le 30 mai 2013 à 11:07, Eric Sunshine a écrit : > >> On Thu, May 30, 2013 at 4:45 AM, Anthony Ramine <n.oxyde@gmail.com> wrote: >>> Case folding is not done correctly when matching against the [:upper:] >>> character class and uppercased character ranges (e.g. A-Z). >>> Specifically, an uppercase letter fails to match against any of them >>> when case folding is requested because plain characters in the pattern >>> and the whole string and preemptively lowercased to handle the base case >> >> Did you mean s/and preemptively/are preemptively/ ? >> >>> fast. >>> >>> That optimization is kept and ISLOWER() is used in the [:upper:] case >>> when case folding is requested, while matching against a character range >>> is retried with toupper() if the character was lowercase, as the bounds >>> of the range itself cannot be modified (in a case-insensitive context, >>> [A-_] is not equivalent to [a-_]). >>> >>> Signed-off-by: Anthony Ramine <n.oxyde@gmail.com> > ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v5] wildmatch: properly fold case everywhere 2013-05-30 8:45 ` [PATCH] " Anthony Ramine 2013-05-30 8:52 ` Duy Nguyen 2013-05-30 9:07 ` Eric Sunshine @ 2013-05-30 10:19 ` Anthony Ramine 2013-06-02 21:53 ` Junio C Hamano 2 siblings, 1 reply; 18+ messages in thread From: Anthony Ramine @ 2013-05-30 10:19 UTC (permalink / raw) To: Git Mailing List; +Cc: Nguyễn Thái Ngọc Duy Case folding is not done correctly when matching against the [:upper:] character class and uppercased character ranges (e.g. A-Z). Specifically, an uppercase letter fails to match against any of them when case folding is requested because plain characters in the pattern and the whole string are preemptively lowercased to handle the base case fast. That optimization is kept and ISLOWER() is used in the [:upper:] case when case folding is requested, while matching against a character range is retried with toupper() if the character was lowercase, as the bounds of the range itself cannot be modified (in a case-insensitive context, [A-_] is not equivalent to [a-_]). Signed-off-by: Anthony Ramine <n.oxyde@gmail.com> Reviewed-by: Duy Nguyen <pclouds@gmail.com> --- t/t3070-wildmatch.sh | 55 ++++++++++++++++++++++++++++++++++++++++++++++------ wildmatch.c | 7 +++++++ 2 files changed, 56 insertions(+), 6 deletions(-) I added Duy as reviewer and fixed a typo in the commit message reported by Eric Sunshine. diff --git a/t/t3070-wildmatch.sh b/t/t3070-wildmatch.sh index 4c37057..38446a0 100755 --- a/t/t3070-wildmatch.sh +++ b/t/t3070-wildmatch.sh @@ -6,20 +6,20 @@ test_description='wildmatch tests' match() { if [ $1 = 1 ]; then - test_expect_success "wildmatch: match '$3' '$4'" " + test_expect_success "wildmatch: match '$3' '$4'" " test-wildmatch wildmatch '$3' '$4' " else - test_expect_success "wildmatch: no match '$3' '$4'" " + test_expect_success "wildmatch: no match '$3' '$4'" " ! test-wildmatch wildmatch '$3' '$4' " fi if [ $2 = 1 ]; then - test_expect_success "fnmatch: match '$3' '$4'" " + test_expect_success "fnmatch: match '$3' '$4'" " test-wildmatch fnmatch '$3' '$4' " elif [ $2 = 0 ]; then - test_expect_success "fnmatch: no match '$3' '$4'" " + test_expect_success "fnmatch: no match '$3' '$4'" " ! test-wildmatch fnmatch '$3' '$4' " # else @@ -29,13 +29,25 @@ match() { fi } +imatch() { + if [ $1 = 1 ]; then + test_expect_success "iwildmatch: match '$2' '$3'" " + test-wildmatch iwildmatch '$2' '$3' + " + else + test_expect_success "iwildmatch: no match '$2' '$3'" " + ! test-wildmatch iwildmatch '$2' '$3' + " + fi +} + pathmatch() { if [ $1 = 1 ]; then - test_expect_success "pathmatch: match '$2' '$3'" " + test_expect_success "pathmatch: match '$2' '$3'" " test-wildmatch pathmatch '$2' '$3' " else - test_expect_success "pathmatch: no match '$2' '$3'" " + test_expect_success "pathmatch: no match '$2' '$3'" " ! test-wildmatch pathmatch '$2' '$3' " fi @@ -235,4 +247,35 @@ pathmatch 1 abcXdefXghi '*X*i' pathmatch 1 ab/cXd/efXg/hi '*/*X*/*/*i' pathmatch 1 ab/cXd/efXg/hi '*Xg*i' +# Case-sensitivy features +match 0 x 'a' '[A-Z]' +match 1 x 'A' '[A-Z]' +match 0 x 'A' '[a-z]' +match 1 x 'a' '[a-z]' +match 0 x 'a' '[[:upper:]]' +match 1 x 'A' '[[:upper:]]' +match 0 x 'A' '[[:lower:]]' +match 1 x 'a' '[[:lower:]]' +match 0 x 'A' '[B-Za]' +match 1 x 'a' '[B-Za]' +match 0 x 'A' '[B-a]' +match 1 x 'a' '[B-a]' +match 0 x 'z' '[Z-y]' +match 1 x 'Z' '[Z-y]' + +imatch 1 'a' '[A-Z]' +imatch 1 'A' '[A-Z]' +imatch 1 'A' '[a-z]' +imatch 1 'a' '[a-z]' +imatch 1 'a' '[[:upper:]]' +imatch 1 'A' '[[:upper:]]' +imatch 1 'A' '[[:lower:]]' +imatch 1 'a' '[[:lower:]]' +imatch 1 'A' '[B-Za]' +imatch 1 'a' '[B-Za]' +imatch 1 'A' '[B-a]' +imatch 1 'a' '[B-a]' +imatch 1 'z' '[Z-y]' +imatch 1 'Z' '[Z-y]' + test_done diff --git a/wildmatch.c b/wildmatch.c index 7192bdc..f91ba99 100644 --- a/wildmatch.c +++ b/wildmatch.c @@ -196,6 +196,11 @@ static int dowild(const uchar *p, const uchar *text, unsigned int flags) } if (t_ch <= p_ch && t_ch >= prev_ch) matched = 1; + else if ((flags & WM_CASEFOLD) && ISLOWER(t_ch)) { + uchar t_ch_upper = toupper(t_ch); + if (t_ch_upper <= p_ch && t_ch_upper >= prev_ch) + matched = 1; + } p_ch = 0; /* This makes "prev_ch" get set to 0. */ } else if (p_ch == '[' && p[1] == ':') { const uchar *s; @@ -245,6 +250,8 @@ static int dowild(const uchar *p, const uchar *text, unsigned int flags) } else if (CC_EQ(s,i, "upper")) { if (ISUPPER(t_ch)) matched = 1; + else if ((flags & WM_CASEFOLD) && ISLOWER(t_ch)) + matched = 1; } else if (CC_EQ(s,i, "xdigit")) { if (ISXDIGIT(t_ch)) matched = 1; -- 1.8.3 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH v5] wildmatch: properly fold case everywhere 2013-05-30 10:19 ` [PATCH v5] " Anthony Ramine @ 2013-06-02 21:53 ` Junio C Hamano 2013-06-02 23:42 ` Anthony Ramine 0 siblings, 1 reply; 18+ messages in thread From: Junio C Hamano @ 2013-06-02 21:53 UTC (permalink / raw) To: Anthony Ramine; +Cc: Git Mailing List, Nguyễn Thái Ngọc Duy Anthony Ramine <n.oxyde@gmail.com> writes: > ase folding is not done correctly when matching against the [:upper:] > character class and uppercased character ranges (e.g. A-Z). > Specifically, an uppercase letter fails to match against any of them > when case folding is requested because plain characters in the pattern > and the whole string are preemptively lowercased to handle the base case > fast. > > That optimization is kept and ISLOWER() is used in the [:upper:] case > when case folding is requested, while matching against a character range > is retried with toupper() if the character was lowercase, as the bounds > of the range itself cannot be modified (in a case-insensitive context, > [A-_] is not equivalent to [a-_]). > > Signed-off-by: Anthony Ramine <n.oxyde@gmail.com> > Reviewed-by: Duy Nguyen <pclouds@gmail.com> > --- Thanks. > t/t3070-wildmatch.sh | 55 ++++++++++++++++++++++++++++++++++++++++++++++------ > wildmatch.c | 7 +++++++ > 2 files changed, 56 insertions(+), 6 deletions(-) > > I added Duy as reviewer and fixed a typo in the commit message reported by > Eric Sunshine. > > diff --git a/t/t3070-wildmatch.sh b/t/t3070-wildmatch.sh > index 4c37057..38446a0 100755 > --- a/t/t3070-wildmatch.sh > +++ b/t/t3070-wildmatch.sh > @@ -6,20 +6,20 @@ test_description='wildmatch tests' > > match() { > if [ $1 = 1 ]; then > - test_expect_success "wildmatch: match '$3' '$4'" " > + test_expect_success "wildmatch: match '$3' '$4'" " > test-wildmatch wildmatch '$3' '$4' > " > else > - test_expect_success "wildmatch: no match '$3' '$4'" " > + test_expect_success "wildmatch: no match '$3' '$4'" " > ! test-wildmatch wildmatch '$3' '$4' > " > fi > if [ $2 = 1 ]; then > - test_expect_success "fnmatch: match '$3' '$4'" " > + test_expect_success "fnmatch: match '$3' '$4'" " > test-wildmatch fnmatch '$3' '$4' > " > elif [ $2 = 0 ]; then > - test_expect_success "fnmatch: no match '$3' '$4'" " > + test_expect_success "fnmatch: no match '$3' '$4'" " > ! test-wildmatch fnmatch '$3' '$4' > " > # else Is the above about aligning $3/$4 across checks of different types (i.e. purely cosmetic)? I am not complaining; just making sure if there is nothing deeper going on. It is outside the scope of this change, but the shell style of this script (most notably use of [] instead of test) needs to be fixed someday, preferrably soon, including the commented out else clause at the end of the hunk. > @@ -235,4 +247,35 @@ pathmatch 1 abcXdefXghi '*X*i' > pathmatch 1 ab/cXd/efXg/hi '*/*X*/*/*i' > pathmatch 1 ab/cXd/efXg/hi '*Xg*i' > > +# Case-sensitivy features > +match 0 x 'a' '[A-Z]' > +match 1 x 'A' '[A-Z]' > +match 0 x 'A' '[a-z]' > +match 1 x 'a' '[a-z]' > +match 0 x 'a' '[[:upper:]]' > +match 1 x 'A' '[[:upper:]]' > +match 0 x 'A' '[[:lower:]]' > +match 1 x 'a' '[[:lower:]]' > +match 0 x 'A' '[B-Za]' > +match 1 x 'a' '[B-Za]' > +match 0 x 'A' '[B-a]' > +match 1 x 'a' '[B-a]' > +match 0 x 'z' '[Z-y]' > +match 1 x 'Z' '[Z-y]' > + > +imatch 1 'a' '[A-Z]' Do we want "# Case-insensitivity features" commment here as well? > +imatch 1 'A' '[A-Z]' > +imatch 1 'A' '[a-z]' > +imatch 1 'a' '[a-z]' > +imatch 1 'a' '[[:upper:]]' > +imatch 1 'A' '[[:upper:]]' > +imatch 1 'A' '[[:lower:]]' > +imatch 1 'a' '[[:lower:]]' > +imatch 1 'A' '[B-Za]' > +imatch 1 'a' '[B-Za]' > +imatch 1 'A' '[B-a]' > +imatch 1 'a' '[B-a]' > +imatch 1 'z' '[Z-y]' > +imatch 1 'Z' '[Z-y]' > + > test_done > diff --git a/wildmatch.c b/wildmatch.c > index 7192bdc..f91ba99 100644 > --- a/wildmatch.c > +++ b/wildmatch.c > @@ -196,6 +196,11 @@ static int dowild(const uchar *p, const uchar *text, unsigned int flags) > } > if (t_ch <= p_ch && t_ch >= prev_ch) > matched = 1; > + else if ((flags & WM_CASEFOLD) && ISLOWER(t_ch)) { > + uchar t_ch_upper = toupper(t_ch); > + if (t_ch_upper <= p_ch && t_ch_upper >= prev_ch) > + matched = 1; > + } > p_ch = 0; /* This makes "prev_ch" get set to 0. */ Hmm, this looks somewhat strange. * At the beginning of the outermost "per characters in the text" loop, we seem to downcase t_ch when WM_CASEFOLD is set. * Also at the same place, we also seem to downcase p_ch under the same condition. which makes me wonder why the fix is not like this: + if (flags & WM_CASEFOLD) { + if (ISUPPER(p_ch)) + p_ch = tolower(p_ch); + if (prev_ch && ISUPPER(prev_ch)) + prev_ch = tolower(prev_ch); + } if (t_ch <= p_ch && t_ch >= prev_ch) matched = 1; p_ch = 0; /* This sets "prev_ch" to 0 */ Ahh, OK, the "seemingly strange" construct is about handling a range like "[Z-y]"; we do not want to upcase or downcase the p_ch/prev_ch make the range "[z-y]" (empty) which would exclude bytes like "^", "_" or even "Z". And it is also OK to downcase p_ch in a single-letter case, not the characters in a range, at the beginning of the outermost loop, because we always compare for equality against t_ch (which is downcased) in that case. > @@ -245,6 +250,8 @@ static int dowild(const uchar *p, const uchar *text, unsigned int flags) > } else if (CC_EQ(s,i, "upper")) { > if (ISUPPER(t_ch)) > matched = 1; > + else if ((flags & WM_CASEFOLD) && ISLOWER(t_ch)) > + matched = 1; This also looks somewhat strange but correct in that t_ch is already downcased so we do not need a corresponding change for CC_EQ("lower") codepath. Interesting. Will apply. Thanks. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v5] wildmatch: properly fold case everywhere 2013-06-02 21:53 ` Junio C Hamano @ 2013-06-02 23:42 ` Anthony Ramine 0 siblings, 0 replies; 18+ messages in thread From: Anthony Ramine @ 2013-06-02 23:42 UTC (permalink / raw) To: Junio C Hamano; +Cc: Git Mailing List, Nguyễn Thái Ngọc Duy Hello Junio, Replied inline. Regards, -- Anthony Ramine Le 2 juin 2013 à 23:53, Junio C Hamano a écrit : > Anthony Ramine <n.oxyde@gmail.com> writes: > >> ase folding is not done correctly when matching against the [:upper:] >> character class and uppercased character ranges (e.g. A-Z). >> Specifically, an uppercase letter fails to match against any of them >> when case folding is requested because plain characters in the pattern >> and the whole string are preemptively lowercased to handle the base case >> fast. >> >> That optimization is kept and ISLOWER() is used in the [:upper:] case >> when case folding is requested, while matching against a character range >> is retried with toupper() if the character was lowercase, as the bounds >> of the range itself cannot be modified (in a case-insensitive context, >> [A-_] is not equivalent to [a-_]). >> >> Signed-off-by: Anthony Ramine <n.oxyde@gmail.com> >> Reviewed-by: Duy Nguyen <pclouds@gmail.com> >> --- > > Thanks. > >> t/t3070-wildmatch.sh | 55 ++++++++++++++++++++++++++++++++++++++++++++++------ >> wildmatch.c | 7 +++++++ >> 2 files changed, 56 insertions(+), 6 deletions(-) >> >> I added Duy as reviewer and fixed a typo in the commit message reported by >> Eric Sunshine. >> >> diff --git a/t/t3070-wildmatch.sh b/t/t3070-wildmatch.sh >> index 4c37057..38446a0 100755 >> --- a/t/t3070-wildmatch.sh >> +++ b/t/t3070-wildmatch.sh >> @@ -6,20 +6,20 @@ test_description='wildmatch tests' >> >> match() { >> if [ $1 = 1 ]; then >> - test_expect_success "wildmatch: match '$3' '$4'" " >> + test_expect_success "wildmatch: match '$3' '$4'" " >> test-wildmatch wildmatch '$3' '$4' >> " >> else >> - test_expect_success "wildmatch: no match '$3' '$4'" " >> + test_expect_success "wildmatch: no match '$3' '$4'" " >> ! test-wildmatch wildmatch '$3' '$4' >> " >> fi >> if [ $2 = 1 ]; then >> - test_expect_success "fnmatch: match '$3' '$4'" " >> + test_expect_success "fnmatch: match '$3' '$4'" " >> test-wildmatch fnmatch '$3' '$4' >> " >> elif [ $2 = 0 ]; then >> - test_expect_success "fnmatch: no match '$3' '$4'" " >> + test_expect_success "fnmatch: no match '$3' '$4'" " >> ! test-wildmatch fnmatch '$3' '$4' >> " >> # else > > Is the above about aligning $3/$4 across checks of different types > (i.e. purely cosmetic)? I am not complaining; just making sure if > there is nothing deeper going on. Yes purely cosmetic. > It is outside the scope of this change, but the shell style of this > script (most notably use of [] instead of test) needs to be fixed > someday, preferrably soon, including the commented out else clause > at the end of the hunk. > >> @@ -235,4 +247,35 @@ pathmatch 1 abcXdefXghi '*X*i' >> pathmatch 1 ab/cXd/efXg/hi '*/*X*/*/*i' >> pathmatch 1 ab/cXd/efXg/hi '*Xg*i' >> >> +# Case-sensitivy features >> +match 0 x 'a' '[A-Z]' >> +match 1 x 'A' '[A-Z]' >> +match 0 x 'A' '[a-z]' >> +match 1 x 'a' '[a-z]' >> +match 0 x 'a' '[[:upper:]]' >> +match 1 x 'A' '[[:upper:]]' >> +match 0 x 'A' '[[:lower:]]' >> +match 1 x 'a' '[[:lower:]]' >> +match 0 x 'A' '[B-Za]' >> +match 1 x 'a' '[B-Za]' >> +match 0 x 'A' '[B-a]' >> +match 1 x 'a' '[B-a]' >> +match 0 x 'z' '[Z-y]' >> +match 1 x 'Z' '[Z-y]' >> + >> +imatch 1 'a' '[A-Z]' > > Do we want "# Case-insensitivity features" commment here as well? I don't particularly care, some sections have titles, some don't. >> +imatch 1 'A' '[A-Z]' >> +imatch 1 'A' '[a-z]' >> +imatch 1 'a' '[a-z]' >> +imatch 1 'a' '[[:upper:]]' >> +imatch 1 'A' '[[:upper:]]' >> +imatch 1 'A' '[[:lower:]]' >> +imatch 1 'a' '[[:lower:]]' >> +imatch 1 'A' '[B-Za]' >> +imatch 1 'a' '[B-Za]' >> +imatch 1 'A' '[B-a]' >> +imatch 1 'a' '[B-a]' >> +imatch 1 'z' '[Z-y]' >> +imatch 1 'Z' '[Z-y]' > >> + >> test_done >> diff --git a/wildmatch.c b/wildmatch.c >> index 7192bdc..f91ba99 100644 >> --- a/wildmatch.c >> +++ b/wildmatch.c >> @@ -196,6 +196,11 @@ static int dowild(const uchar *p, const uchar *text, unsigned int flags) >> } >> if (t_ch <= p_ch && t_ch >= prev_ch) >> matched = 1; >> + else if ((flags & WM_CASEFOLD) && ISLOWER(t_ch)) { >> + uchar t_ch_upper = toupper(t_ch); >> + if (t_ch_upper <= p_ch && t_ch_upper >= prev_ch) >> + matched = 1; >> + } >> p_ch = 0; /* This makes "prev_ch" get set to 0. */ > > Hmm, this looks somewhat strange. > > * At the beginning of the outermost "per characters in the text" > loop, we seem to downcase t_ch when WM_CASEFOLD is set. > * Also at the same place, we also seem to downcase p_ch under the > same condition. > > which makes me wonder why the fix is not like this: > > + if (flags & WM_CASEFOLD) { > + if (ISUPPER(p_ch)) > + p_ch = tolower(p_ch); > + if (prev_ch && ISUPPER(prev_ch)) > + prev_ch = tolower(prev_ch); > + } > if (t_ch <= p_ch && t_ch >= prev_ch) > matched = 1; > p_ch = 0; /* This sets "prev_ch" to 0 */ > > > Ahh, OK, the "seemingly strange" construct is about handling a range > like "[Z-y]"; we do not want to upcase or downcase the p_ch/prev_ch > make the range "[z-y]" (empty) which would exclude bytes like "^", > "_" or even "Z". > > And it is also OK to downcase p_ch in a single-letter case, not the > characters in a range, at the beginning of the outermost loop, > because we always compare for equality against t_ch (which is > downcased) in that case. Yeah I tried to explain that in the commit message but it is surely too dense. >> @@ -245,6 +250,8 @@ static int dowild(const uchar *p, const uchar *text, unsigned int flags) >> } else if (CC_EQ(s,i, "upper")) { >> if (ISUPPER(t_ch)) >> matched = 1; >> + else if ((flags & WM_CASEFOLD) && ISLOWER(t_ch)) >> + matched = 1; > > This also looks somewhat strange but correct in that t_ch is already > downcased so we do not need a corresponding change for CC_EQ("lower") > codepath. I chose to still lowercase everything first, to keep the common path as is. > Interesting. Will apply. > > Thanks. Great. You're welcome. ^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2013-06-02 23:43 UTC | newest] Thread overview: 18+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2013-05-28 12:32 [PATCH] wildmatch: properly fold case everywhere Anthony Ramine 2013-05-28 12:53 ` Duy Nguyen 2013-05-28 13:01 ` Anthony Ramine 2013-05-28 13:10 ` [PATCH v2] " Anthony Ramine 2013-05-28 13:58 ` [PATCH v3] " Anthony Ramine 2013-05-29 13:22 ` Duy Nguyen 2013-05-29 13:37 ` Anthony Ramine 2013-05-29 13:52 ` Duy Nguyen 2013-05-29 17:57 ` Anthony Ramine 2013-05-30 0:04 ` Duy Nguyen 2013-05-30 8:45 ` [PATCH] " Anthony Ramine 2013-05-30 8:52 ` Duy Nguyen 2013-05-30 9:07 ` Eric Sunshine 2013-05-30 9:29 ` Anthony Ramine 2013-05-30 10:09 ` Eric Sunshine 2013-05-30 10:19 ` [PATCH v5] " Anthony Ramine 2013-06-02 21:53 ` Junio C Hamano 2013-06-02 23:42 ` Anthony Ramine
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).