From: Jeff King <peff@peff.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH v2 1/2] grep: use static trans-case table
Date: Wed, 29 Feb 2012 03:28:14 -0500 [thread overview]
Message-ID: <20120229082814.GB14181@sigill.intra.peff.net> (raw)
In-Reply-To: <1330474831-9030-2-git-send-email-gitster@pobox.com>
On Tue, Feb 28, 2012 at 04:20:30PM -0800, Junio C Hamano wrote:
> In order to prepare the kwset machinery for a case-insensitive search, we
> used to use a static table of 256 elements and filled it every time before
> calling kwsalloc(). Because the kwset machinery will never modify this
> table, just allocate a single instance globally and fill it at the compile
> time.
Hmm. I was going to complain that the original code used tolower() to
generate the table at run-time, and therefore respected the current
locale. But of course we have replaced tolower() with a
locale-independent version, so it should behave identically.
But that does make me wonder. Do people expect their case-insensitive
searches to work on non-ASCII characters? I would think yes, but I do
not use non-ASCII characters in the first place, so my opinion may not
mean much.
For that matter, does REG_ICASE respect locales? The glibc code appears
to consider it, but I couldn't make it work in some simple tests. But if
it does, that raises another weirdness: we fall back to kwset
transparently when a grep pattern contains no metacharacters. So you
would get different results for "-i --grep=é" versus "-i --grep=é.*".
Of course, even if we used a locale-respecting version of tolower in the
original code, I suspect that a byte table would be fundamentally
insufficient, anyway, in the face of multi-byte encodings like utf8.
So I don't think your patch is making the problem any worse. And even if
somebody wants to tackle the problem later, the solution would look so
unlike the original code that your change is not hurting their effort.
-Peff
next prev parent reply other threads:[~2012-02-29 8:28 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-02-29 0:20 [PATCH v2 0/2] "log --regexp-ignore-case -S/-G<string>" Junio C Hamano
2012-02-29 0:20 ` [PATCH v2 1/2] grep: use static trans-case table Junio C Hamano
2012-02-29 8:28 ` Jeff King [this message]
2012-02-29 8:51 ` Junio C Hamano
2012-02-29 0:20 ` [PATCH v2 2/2] pickaxe: allow -i to search in patch case-insensitively Junio C Hamano
2012-02-29 8:35 ` Jeff King
2012-02-29 8:55 ` Junio C Hamano
2012-02-29 9:18 ` Jeff King
2012-02-29 11:40 ` Thomas Rast
2012-02-29 18:05 ` Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20120229082814.GB14181@sigill.intra.peff.net \
--to=peff@peff.net \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).