From: Jakub Narebski <jnareb@gmail.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 4/5] xdiff: introduce XDF_IGNORE_CASE
Date: Wed, 22 Feb 2012 10:07:56 -0800 (PST)
Message-ID: <m3ehtmeo7c.fsf@localhost.localdomain>
In-Reply-To: <1329704188-9955-5-git-send-email-gitster@pobox.com>

Junio C Hamano <gitster@pobox.com> writes:

> Teach the hash function and per-line comparison logic to compare lines
> while ignoring the differences in case.  It is not an ignore-whitespace
> option but still needs to trigger the inexact match logic, and that is
> why the previous step introduced XDF_INEXACT_MATCH mask.

NB: how does this compare with ignoring case in filesystem paths?
 
> Assign the 7th bit for this option, and move the bits to select diff
> algorithms out of the way in order to leave room for a few bits to add
> more variants of ignore-whitespace, such as --ignore-tab-expansion, if
> somebody else is inclined to do so later.

Or implement a proper Unicode sorting / collation algorithm, with its
different levels

  (4.3 Form a sort key for each string, UTS #10.):

     Level 1: alphabetic ordering
     Level 2: diacritic ordering
     Level 3: case ordering
     Level 4: tie-breaking (e.g. in the case when variable is 'shifted')
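
To illustrate what I mean by levels, here is a minimal sketch that is
ASCII-only and is NOT the UTS #10 algorithm: it compares letters first
while ignoring case, and looks at case only to break ties.  The name
two_level_cmp is hypothetical, and the byte-wise tie-break is just an
assumption made to keep the example short.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Compare two NUL-terminated ASCII strings in two passes.
 * First pass: compare while ignoring case.
 * Second pass (only if the first finds no difference): plain
 * byte-wise comparison, so case acts as a tie-breaker.
 * Returns <0, 0 or >0, like strcmp().
 */
static int two_level_cmp(const char *a, const char *b)
{
	const unsigned char *p = (const unsigned char *)a;
	const unsigned char *q = (const unsigned char *)b;
	size_t i;

	for (i = 0; p[i] && q[i]; i++) {
		int d = tolower(p[i]) - tolower(q[i]);
		if (d)
			return d;
	}
	/* one string is a prefix of the other: shorter sorts first */
	if (p[i] || q[i])
		return p[i] ? 1 : -1;
	/* same letters: let case decide */
	return strcmp(a, b);
}
```

With this, "apple" sorts before "Banana" even though strcmp() would
put the uppercase 'B' first, while "abc" and "ABC" still compare as
different.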

> We would still need to teach the front-end to flip this bit, for this
> change to be any useful.
> 
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---

> +static inline int match_a_byte(char ch1, char ch2, long flags)
> +{
> +	if (ch1 == ch2)
> +		return 1;
> +	if (!(flags & XDF_IGNORE_CASE) || ((ch1 | ch2) & 0x80))
> +		return 0;
> +	if (isupper(ch1))
> +		ch1 = tolower(ch1);
> +	if (isupper(ch2))
> +		ch2 = tolower(ch2);
> +	return (ch1 == ch2);
> +}

<del>
Wouldn't a better solution be a collation algorithm rather than changing
a sorting function?  Or is it a performance hack for the typical body of
text under version control (mainly lowercase)?
</del>

"(libc.info)Collation Functions" says:

     The functions `strcoll' and `wcscoll' perform this translation
  implicitly, in order to do one comparison.  By contrast, `strxfrm' and
  `wcsxfrm' perform the mapping explicitly.  If you are making multiple
  comparisons using the same string or set of strings, it is likely to be
  more efficient to use `strxfrm' or `wcsxfrm' to transform all the
  strings just once, and subsequently compare the transformed strings
  with `strcmp' or `wcscmp'.

The function match_a_byte (memcoll?) defined here is similar to strcoll;
do we compare a single line with more than one other line?
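
For reference, the "transform once, compare many times" approach the
manual describes can be sketched with the standard strxfrm() as below.
The helper name make_sort_key is hypothetical; this only illustrates
the libc interface and is not something the patch does.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Build a sort key for s in the current LC_COLLATE locale.
 * Comparing two such keys with strcmp() orders them the same way
 * strcoll() orders the original strings.
 * Returns a malloc()ed buffer the caller must free, or NULL.
 */
static char *make_sort_key(const char *s)
{
	/* with a zero-sized buffer, strxfrm() only reports the length */
	size_t n = strxfrm(NULL, s, 0) + 1;
	char *key = malloc(n);

	if (key)
		strxfrm(key, s, n);
	return key;
}
```

A caller would normally run setlocale(LC_COLLATE, "") first; without
it the "C" locale is in effect and the ordering is the same as plain
strcmp() on the original strings.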

> +static inline unsigned long hash_a_byte(const char ch_, long flags)
> +{
> +	unsigned long ch = ch_ & 0xFF;
> +	if ((flags & XDF_IGNORE_CASE) && !(ch & 0x80) && isupper(ch))
> +		ch = tolower(ch);
> +	return ch;
> +}
> +

Hmmm... hash_a_byte (memxfrm?) is similar to strxfrm, so you do use
one or the other...
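
What matters for the pairing, I think, is that any two bytes that
match_a_byte treats as equal also get the same value from hash_a_byte.
Below is a small self-contained check of that property.  The two
functions are copied from the quoted patch; the value of
XDF_IGNORE_CASE and the helper count_hash_mismatches are assumptions
made up for this sketch.

```c
#include <ctype.h>

/* Assumed value, for illustration only (the patch says "the 7th bit"). */
#define XDF_IGNORE_CASE (1 << 7)

static inline int match_a_byte(char ch1, char ch2, long flags)
{
	if (ch1 == ch2)
		return 1;
	if (!(flags & XDF_IGNORE_CASE) || ((ch1 | ch2) & 0x80))
		return 0;
	if (isupper(ch1))
		ch1 = tolower(ch1);
	if (isupper(ch2))
		ch2 = tolower(ch2);
	return (ch1 == ch2);
}

static inline unsigned long hash_a_byte(const char ch_, long flags)
{
	unsigned long ch = ch_ & 0xFF;
	if ((flags & XDF_IGNORE_CASE) && !(ch & 0x80) && isupper(ch))
		ch = tolower(ch);
	return ch;
}

/*
 * Count byte pairs that compare as matching but would feed
 * different values into the hash.  Any such pair could let two
 * "equal" lines end up with different hashes.
 */
static int count_hash_mismatches(long flags)
{
	int a, b, bad = 0;

	for (a = 0; a < 256; a++) {
		for (b = 0; b < 256; b++) {
			char ca = (char)a, cb = (char)b;
			if (match_a_byte(ca, cb, flags) &&
			    hash_a_byte(ca, flags) != hash_a_byte(cb, flags))
				bad++;
		}
	}
	return bad;
}
```

In the default "C" locale I would expect the count to be zero both
with and without the flag set.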

-- 
Jakub Narebski
