git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Marko Kreen" <markokr@gmail.com>
To: "Linus Torvalds" <torvalds@linux-foundation.org>
Cc: "Dmitry Potapov" <dpotapov@gmail.com>,
	"Andreas Ericsson" <ae@op5.se>,
	"Git Mailing List" <git@vger.kernel.org>,
	"Junio C Hamano" <gitster@pobox.com>
Subject: Re: I'm a total push-over..
Date: Sat, 26 Jan 2008 14:16:29 +0200	[thread overview]
Message-ID: <e51f66da0801260416p5f5ffb98w16fe832fe62dc7c9@mail.gmail.com> (raw)
In-Reply-To: <alpine.LFD.1.00.0801251407010.5056@hp.linux-foundation.org>

On 1/26/08, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Fri, 25 Jan 2008, Marko Kreen wrote:
> > Well, although this is very clever approach, I suggest against it.
> > You'll end up with complex code that gives out substandard results.
>
> Actually, *your* operation is the one that gives substandard results.
>
> > I think its better to have separate case-folding function (or several),
> > that copies string to temp buffer and then run proper optimized hash
> > function on that buffer.
>
> I'm sorry, but you just cannot do that efficiently and portably.
>
> I can write a hash function that reliably does 8 bytes at a time for the
> common case on a 64-bit architecture, exactly because it's easy to do
> "test high bits in parallel" with a simple bitwise 'and', and we can do
> the same with "approximate lower-to-uppercase 8 bytes at a time" for a
> hash by just clearing bit 5.
>
> In contrast, trying to do the same thing in half-way portable C, but being
> limited to having to get the case-folding *exactly* right (which you need
> for the comparison function) is much much harder. It's basically
> impossible in portable C (it's doable with architecture-specific features,
> ie vector extensions that have per-byte compares etc).

Here you misunderstood me, I was proposing following:

int hash_folded(const char *str, int len)
{
   char buf[512];
   do_folding(buf, str, len);
   return do_hash(buf, len);
}

That is - the folded string should stay internal to hash function.

Only difference from combined foling+hashing would be that
you can code each part separately.

> And hashing is performance-critical, much more so than the compares (ie
> you're likely to have to hash tens of thousands of files, while you will
> only compare a couple). So it really is worth optimizing for.
>
> And the thing is, "performance" isn't a secondary feature. It's also not
> something you can add later by optimizing.
>
> It's also a mindset issue. Quite frankly, people who do this by "convert
> to some folded/normalized form, then do the operation" will generally make
> much more fundamental mistakes. Once you get into the mindset of "let's
> pass a corrupted strign around", you are in trouble. You start thinking
> that the corrupted string isn't really "corrupt", it's in an "optimized
> format".
>
> And it's all downhill from there. Don't do it.

Againg, you seem to keep HFS+ behaviour in mind, but that was
not what I did suggest.  Probably my mistake, sorry.

-- 
marko

  parent reply	other threads:[~2008-01-26 12:17 UTC|newest]

Thread overview: 51+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-01-22 23:37 I'm a total push-over Linus Torvalds
2008-01-23  1:35 ` Kevin Ballard
2008-01-23  2:23 ` Junio C Hamano
2008-01-23  2:36   ` Junio C Hamano
2008-01-23 12:24     ` Johannes Schindelin
2008-01-23 12:28       ` David Kastrup
2008-01-23 12:56         ` Theodore Tso
2008-01-23  2:58   ` Linus Torvalds
2008-01-23  3:19     ` Linus Torvalds
2008-01-25  6:50       ` Junio C Hamano
2008-01-25 16:24         ` Linus Torvalds
2008-01-23  7:23     ` Junio C Hamano
2008-01-23 12:25       ` Johannes Schindelin
2008-01-23 16:25         ` Linus Torvalds
2008-01-23 16:34           ` Johannes Schindelin
2008-01-23 17:09             ` Linus Torvalds
2008-01-23 17:29               ` Linus Torvalds
2008-01-25  5:21                 ` Jeremy Maitin-Shepard
2008-01-25 12:51                   ` Johannes Schindelin
2008-01-25 18:19                     ` Jeremy Maitin-Shepard
2008-01-25 18:24                       ` Johannes Schindelin
2008-01-25 19:07                       ` Junio C Hamano
2008-01-23  8:32 ` Andreas Ericsson
2008-01-23  9:15   ` Dmitry Potapov
2008-01-23  9:31     ` Andreas Ericsson
2008-01-23 14:01       ` Marko Kreen
2008-01-23 14:39         ` Andreas Ericsson
2008-01-24  6:51           ` Luke Lu
2008-01-24 10:24             ` Andreas Ericsson
2008-01-24 13:19           ` Marko Kreen
2008-01-24 16:00             ` Andreas Ericsson
2008-01-24 16:13               ` Marko Kreen
2008-01-24 16:28               ` Dmitry Potapov
2008-01-24 17:15                 ` Linus Torvalds
2008-01-24 18:45                   ` Dmitry Potapov
2008-01-24 19:08                     ` Linus Torvalds
2008-01-25 20:52                   ` Marko Kreen
2008-01-25 22:16                     ` Linus Torvalds
2008-01-25 22:35                       ` Linus Torvalds
2008-01-26 12:16                       ` Marko Kreen [this message]
2008-01-27  6:51                         ` Linus Torvalds
2008-01-27  8:21                           ` Dmitry Potapov
2008-01-27 14:07                             ` Johannes Schindelin
2008-01-27 14:48                               ` Dmitry Potapov
2008-01-27  9:45                           ` Marko Kreen
2008-01-27 15:06                             ` Dmitry Potapov
2008-01-26 12:37                       ` Marko Kreen
2008-01-25 20:08               ` Marko Kreen
2008-01-23 17:10       ` Dmitry Potapov
2008-01-24 10:39         ` Andreas Ericsson
2008-01-23 16:06   ` Linus Torvalds

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=e51f66da0801260416p5f5ffb98w16fe832fe62dc7c9@mail.gmail.com \
    --to=markokr@gmail.com \
    --cc=ae@op5.se \
    --cc=dpotapov@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).