From: Jeff King <peff@peff.net>
To: git@vger.kernel.org
Cc: Junio C Hamano <gitster@pobox.com>,
Linus Torvalds <torvalds@linux-foundation.org>
Subject: [PATCH] is_hfs_dotgit: loosen over-eager match of \u{..47}
Date: Tue, 23 Dec 2014 03:45:36 -0500 [thread overview]
Message-ID: <20141223084536.GA25190@peff.net> (raw)
Our is_hfs_dotgit function relies on the hackily-implemented
next_hfs_char to give us the next character that an HFS+
filename comparison would look at. It's hacky because it
doesn't implement the full case-folding table of HFS+; it
gives us just enough to see if the path matches ".git".
At the end of next_hfs_char, we use tolower() to convert our
32-bit code point to lowercase. Our tolower() implementation
only takes an 8-bit char, though; it throws away the upper
24 bits. This means we can't have any false negatives for
is_hfs_dotgit. We only care about matching 7-bit ASCII
characters in ".git", and we will correctly process 'G' or
'g'.
However, we _can_ have false positives. Because we throw
away the upper bits, code point \u{0147} (for example) will
look like 'G' and get downcased to 'g'. It's not known
whether a sequence of code points whose truncation ends up
as ".git" is meaningful in any language, but it does not
hurt to be more accurate here. We can just pass out the full
32-bit code point, and compare it manually to the upper and
lowercase characters we care about.
Signed-off-by: Jeff King <peff@peff.net>
---
I saw Linus ask about this on G+. I had done the "no false
negative" analysis when writing the patch, but didn't
consider the false positive.
Another way of accomplishing the same thing is for next_hfs_char to
continue folding case, but _only_ do so for 8-bit code points. Like:
return (out & 0xffffff00) ? out : tolower(out);
I think the what's below is more obvious (and is actually
how I originally wrote it; I switched to using tolower()
during development to try to make it more readable).
t/t1450-fsck.sh | 16 ++++++++++++++++
utf8.c | 17 ++++++++++++-----
2 files changed, 28 insertions(+), 5 deletions(-)
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 1f04b8a..3f5883d 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -349,6 +349,22 @@ dot-backslash-case .\\\\.GIT\\\\foobar
dotgit-case-backslash .git\\\\foobar
EOF
+test_expect_success 'fsck allows .Ňit' '
+ (
+ git init not-dotgit &&
+ cd not-dotgit &&
+ echo content >file &&
+ git add file &&
+ git commit -m base &&
+ blob=$(git rev-parse :file) &&
+ printf "100644 blob $blob\t.\\305\\207it" >tree &&
+ tree=$(git mktree <tree) &&
+ git fsck 2>err &&
+ cat err &&
+ ! test -s err
+ )
+'
+
# create a static test repo which is broken by omitting
# one particular object ($1, which is looked up via rev-parse
# in the new repository).
diff --git a/utf8.c b/utf8.c
index 9a3f4ad..34a779e 100644
--- a/utf8.c
+++ b/utf8.c
@@ -606,7 +606,7 @@ static ucs_char_t next_hfs_char(const char **in)
* but this is enough to catch anything that will convert
* to ".git"
*/
- return tolower(out);
+ return out;
}
}
@@ -614,10 +614,17 @@ int is_hfs_dotgit(const char *path)
{
ucs_char_t c;
- if (next_hfs_char(&path) != '.' ||
- next_hfs_char(&path) != 'g' ||
- next_hfs_char(&path) != 'i' ||
- next_hfs_char(&path) != 't')
+ c = next_hfs_char(&path);
+ if (c != '.')
+ return 0;
+ c = next_hfs_char(&path);
+ if (c != 'g' && c != 'G')
+ return 0;
+ c = next_hfs_char(&path);
+ if (c != 'i' && c != 'I')
+ return 0;
+ c = next_hfs_char(&path);
+ if (c != 't' && c != 'T')
return 0;
c = next_hfs_char(&path);
if (c && !is_dir_sep(c))
--
2.2.1.376.gec59d43
next reply other threads:[~2014-12-23 8:45 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-12-23 8:45 Jeff King [this message]
2014-12-23 15:24 ` [PATCH] is_hfs_dotgit: loosen over-eager match of \u{..47} Torsten Bögershausen
2014-12-23 18:17 ` Junio C Hamano
2014-12-23 18:18 ` Jeff King
2014-12-23 20:14 ` Jonathan Nieder
2014-12-23 21:02 ` Junio C Hamano
2014-12-23 21:12 ` Jeff King
2014-12-23 21:09 ` Jeff King
2014-12-23 20:31 ` Johannes Sixt
2014-12-23 21:11 ` Jeff King
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20141223084536.GA25190@peff.net \
--to=peff@peff.net \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).