From: Jakub Narebski <jnareb@gmail.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 4/5] xdiff: introduce XDF_IGNORE_CASE
Date: Wed, 22 Feb 2012 10:07:56 -0800 (PST)
Message-ID: <m3ehtmeo7c.fsf@localhost.localdomain>
In-Reply-To: <1329704188-9955-5-git-send-email-gitster@pobox.com>
Junio C Hamano <gitster@pobox.com> writes:
> Teach the hash function and per-line comparison logic to compare lines
> while ignoring the differences in case. It is not an ignore-whitespace
> option but still needs to trigger the inexact match logic, and that is
> why the previous step introduced XDF_INEXACT_MATCH mask.
N.B. How does this compare with ignoring case in filesystem paths?
> Assign the 7th bit for this option, and move the bits to select diff
> algorithms out of the way in order to leave room for a few bits to add
> more variants of ignore-whitespace, such as --ignore-tab-expansion, if
> somebody else is inclined to do so later.
Or implement a proper Unicode collation algorithm, with its different
levels (UTS #10, 4.3 "Form a sort key for each string"):
Level 1: alphabetic ordering
Level 2: diacritic ordering
Level 3: case ordering
Level 4: tie-breaking (e.g. when variable weighting is 'shifted')
> We would still need to teach the front-end to flip this bit, for this
> change to be any useful.
>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
> +static inline int match_a_byte(char ch1, char ch2, long flags)
> +{
> + if (ch1 == ch2)
> + return 1;
> + if (!(flags & XDF_IGNORE_CASE) || ((ch1 | ch2) & 0x80))
> + return 0;
> + if (isupper(ch1))
> + ch1 = tolower(ch1);
> + if (isupper(ch2))
> + ch2 = tolower(ch2);
> + return (ch1 == ch2);
> +}
<del>
Wouldn't a collation algorithm be a better solution than changing the
comparison function? Or is this a performance hack for the typical body
of text under version control (mostly lowercase)?
</del>
"(libc.info)Collation Functions" says:
The functions `strcoll' and `wcscoll' perform this translation
implicitly, in order to do one comparison. By contrast, `strxfrm' and
`wcsxfrm' perform the mapping explicitly. If you are making multiple
comparisons using the same string or set of strings, it is likely to be
more efficient to use `strxfrm' or `wcsxfrm' to transform all the
strings just once, and subsequently compare the transformed strings
with `strcmp' or `wcscmp'.
The match_a_byte function defined here (a memcoll analogue?) is similar
to strcoll; do we compare a single line with more than one other line?
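If we do, the transform-once pattern from the libc manual applies. A minimal sketch in standard C only; `make_sort_key` and `count_collating_equal` are hypothetical helpers, not xdiff code. The C standard guarantees that strcmp() on two strxfrm() results has the same sign as strcoll() on the original strings, as long as the locale does not change in between.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a malloc'ed strxfrm() sort key for s
 * (caller frees), or NULL on allocation failure. */
static char *make_sort_key(const char *s)
{
	size_t need, got;
	char *key;

	assert(s != NULL);
	/* With n == 0 (and a NULL buffer), strxfrm() only reports the
	 * length of the key, not counting the terminating NUL. */
	need = strxfrm(NULL, s, 0);
	key = malloc(need + 1);
	if (!key)
		return NULL;
	got = strxfrm(key, s, need + 1);
	assert(got == need);
	(void)got;
	return key;
}

/* Count how many of others[0..n-1] collate equal to line.  The key for
 * line is computed once and reused with plain strcmp() for all n
 * comparisons, instead of calling strcoll() n times. */
static int count_collating_equal(const char *line,
				 const char *const *others, size_t n)
{
	char *key = make_sort_key(line);
	int count = 0;
	size_t i;

	if (!key)
		return -1;
	for (i = 0; i < n; i++) {
		char *other = make_sort_key(others[i]);

		if (!other) {
			free(key);
			return -1;
		}
		if (strcmp(key, other) == 0)
			count++;
		free(other);
	}
	free(key);
	return count;
}
```

The gain only materializes when keys are reused; if every string is compared exactly once, a direct strcoll() call is simpler and no slower.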
> +static inline unsigned long hash_a_byte(const char ch_, long flags)
> +{
> + unsigned long ch = ch_ & 0xFF;
> + if ((flags & XDF_IGNORE_CASE) && !(ch & 0x80) && isupper(ch))
> + ch = tolower(ch);
> + return ch;
> +}
> +
Hmmm... hash_a_byte (a memxfrm analogue?) is similar to strxfrm, so you
do use each approach in its place: the xfrm-like one for hashing and the
coll-like one for comparing...
--
Jakub Narebski