From: Jakub Narebski <jnareb@gmail.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 4/5] xdiff: introduce XDF_IGNORE_CASE
Date: Wed, 22 Feb 2012 10:07:56 -0800 (PST)
Message-ID: <m3ehtmeo7c.fsf@localhost.localdomain>
In-Reply-To: <1329704188-9955-5-git-send-email-gitster@pobox.com>
Junio C Hamano <gitster@pobox.com> writes:
> Teach the hash function and per-line comparison logic to compare lines
> while ignoring the differences in case. It is not an ignore-whitespace
> option but still needs to trigger the inexact match logic, and that is
> why the previous step introduced XDF_INEXACT_MATCH mask.
NB: how does this compare with ignoring case in filesystem paths?
> Assign the 7th bit for this option, and move the bits to select diff
> algorithms out of the way in order to leave room for a few bits to add
> more variants of ignore-whitespace, such as --ignore-tab-expansion, if
> somebody else is inclined to do so later.
Or do a proper Unicode sorting / collation algorithm, with different
levels (UTS #10, 4.3 "Form a sort key for each string"):
Level 1: alphabetic ordering
Level 2: diacritic ordering
Level 3: case ordering
Level 4: tie-breaking (e.g. when variable collation elements are 'shifted')
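To make the idea of levels concrete, here is a rough sketch restricted to
ASCII. It is not the UTS #10 algorithm (which needs the collation element
tables); the names compare_levels, primary_weight and tertiary_weight are
made up for illustration. Level 2 is skipped because ASCII has no
diacritics. Stopping at level 1 gives a case-insensitive comparison;
going on to level 3 uses case only as a tie-breaker.

```c
#include <ctype.h>

/* Level 1 weight: the letter itself, regardless of case (ASCII only). */
static int primary_weight(unsigned char c)
{
	return tolower(c);
}

/* Level 3 weight: lowercase sorts before uppercase in this sketch. */
static int tertiary_weight(unsigned char c)
{
	return isupper(c) ? 1 : 0;
}

/*
 * Compare a and b level by level, up to max_level (1 or 3).
 * Returns <0, 0 or >0 like strcmp().
 */
static int compare_levels(const char *a, const char *b, int max_level)
{
	const char *p, *q;

	/* Level 1: compare the whole strings ignoring case. */
	for (p = a, q = b; *p && *q; p++, q++) {
		int d = primary_weight((unsigned char)*p) -
			primary_weight((unsigned char)*q);
		if (d)
			return d;
	}
	if (*p || *q)
		return *p ? 1 : -1;	/* the shorter string sorts first */

	if (max_level < 3)
		return 0;

	/* Level 3: strings are equal at level 1, so break the tie on case. */
	for (p = a, q = b; *p; p++, q++) {
		int d = tertiary_weight((unsigned char)*p) -
			tertiary_weight((unsigned char)*q);
		if (d)
			return d;
	}
	return 0;
}
```

With this sketch, compare_levels("Makefile", "makefile", 1) reports the
two as equal, while asking for level 3 orders them by case.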
> We would still need to teach the front-end to flip this bit, for this
> change to be any useful.
>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
> +static inline int match_a_byte(char ch1, char ch2, long flags)
> +{
> +	if (ch1 == ch2)
> +		return 1;
> +	if (!(flags & XDF_IGNORE_CASE) || ((ch1 | ch2) & 0x80))
> +		return 0;
> +	if (isupper(ch1))
> +		ch1 = tolower(ch1);
> +	if (isupper(ch2))
> +		ch2 = tolower(ch2);
> +	return (ch1 == ch2);
> +}
<del>
Wouldn't a better solution be a collation algorithm rather than changing
a sorting function? Or is it a performance hack for the typical body of
text under version control (mainly lowercase)?
</del>
"(libc.info)Collation Functions" says:
The functions `strcoll' and `wcscoll' perform this translation
implicitly, in order to do one comparison. By contrast, `strxfrm' and
`wcsxfrm' perform the mapping explicitly. If you are making multiple
comparisons using the same string or set of strings, it is likely to be
more efficient to use `strxfrm' or `wcsxfrm' to transform all the
strings just once, and subsequently compare the transformed strings
with `strcmp' or `wcscmp'.
The function match_a_byte (memcoll?) defined here is similar to strcoll;
do we compare a single line with more than one other line?
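For reference, the trade-off described in the libc manual looks roughly
like this in plain C. This is only a sketch of the standard library
calls, not of anything in xdiff; make_sort_key is a made-up helper name.
A program would normally call setlocale(LC_COLLATE, "") first so that the
user's collation rules are in effect.

```c
#include <stdlib.h>
#include <string.h>

/*
 * One-off comparison: strcoll() applies the locale's collation mapping
 * to both strings every time it is called.
 */
static int compare_once(const char *a, const char *b)
{
	return strcoll(a, b);
}

/*
 * Transform a string once with strxfrm(). The returned key can then be
 * compared any number of times with plain strcmp(). The caller frees
 * the result; NULL is returned if allocation fails.
 */
static char *make_sort_key(const char *s)
{
	size_t n = strxfrm(NULL, s, 0) + 1;	/* size needed, plus NUL */
	char *key = malloc(n);

	if (key)
		strxfrm(key, s, n);
	return key;
}
```

If each line is compared against many others, computing the key once per
line and using strcmp() on the keys should give the same ordering as
calling strcoll() on every pair, at lower cost.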
> +static inline unsigned long hash_a_byte(const char ch_, long flags)
> +{
> +	unsigned long ch = ch_ & 0xFF;
> +	if ((flags & XDF_IGNORE_CASE) && !(ch & 0x80) && isupper(ch))
> +		ch = tolower(ch);
> +	return ch;
> +}
> +
Hmmm... hash_a_byte (memxfrm?) is similar to strxfrm, so you do use
one or the other...
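The two have to agree, though: any pair of lines that the comparison
treats as equal must also get the same hash. A standalone sketch of that
invariant follows, modelled on the two quoted functions. The demo_*
names, the DEMO_IGNORE_CASE value and the djb2-style hash are all
assumptions for illustration and not what xdiff actually uses.

```c
#include <ctype.h>
#include <stddef.h>

#define DEMO_IGNORE_CASE (1L << 0)	/* hypothetical flag value */

/* Compare two bytes, folding ASCII case when the flag is set. */
static int demo_match_byte(char ch1, char ch2, long flags)
{
	if (ch1 == ch2)
		return 1;
	if (!(flags & DEMO_IGNORE_CASE) || ((ch1 | ch2) & 0x80))
		return 0;
	return tolower((unsigned char)ch1) == tolower((unsigned char)ch2);
}

/* Map a byte to the value fed into the hash, with the same folding. */
static unsigned long demo_fold_byte(char ch_, long flags)
{
	unsigned long ch = ch_ & 0xFF;

	if ((flags & DEMO_IGNORE_CASE) && !(ch & 0x80))
		ch = (unsigned long)tolower((int)ch);
	return ch;
}

/* Hash a line from its folded bytes (djb2-style, illustration only). */
static unsigned long demo_hash_line(const char *s, size_t len, long flags)
{
	unsigned long h = 5381;
	size_t i;

	for (i = 0; i < len; i++)
		h = h * 33 + demo_fold_byte(s[i], flags);
	return h;
}

/* Two lines match when they have equal length and every byte matches. */
static int demo_match_line(const char *a, size_t alen,
			   const char *b, size_t blen, long flags)
{
	size_t i;

	if (alen != blen)
		return 0;
	for (i = 0; i < alen; i++)
		if (!demo_match_byte(a[i], b[i], flags))
			return 0;
	return 1;
}
```

With the flag set, "Hello" and "hELLO" both match and hash identically;
without it they do neither, and bytes with the high bit set are never
folded in either function.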
--
Jakub Narebski