From mboxrd@z Thu Jan 1 00:00:00 1970
References: <1445522453-14450-1-git-send-email-P@draigBrady.com>
<5628F4BC.2040502@redhat.com> <5628F634.6040809@redhat.com>
<5628FE20.80802@draigBrady.com> <56290152.7010408@redhat.com>
<562908A6.3000307@redhat.com> <56290B55.2000703@redhat.com>
<20151022173919.GC14789@potion.brq.redhat.com>
<56293D5D.2030107@redhat.com>
From: Pádraig Brady
Message-ID: <562A16BB.6080006@draigBrady.com>
Date: Fri, 23 Oct 2015 12:15:07 +0100
MIME-Version: 1.0
In-Reply-To: <56293D5D.2030107@redhat.com>
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Subject: Re: [Qemu-devel] [PATCH] copy, dd: simplify and optimize NUL bytes detection
To: Paolo Bonzini, Radim Krčmář
Cc: Rusty Russell, coreutils@gnu.org, qemu-devel@nongnu.org
On 22/10/15 20:47, Paolo Bonzini wrote:
>
>
> On 22/10/2015 19:39, Radim Krčmář wrote:
>> 2015-10-22 18:14+0200, Paolo Bonzini:
>>> On 22/10/2015 18:02, Eric Blake wrote:
>>>> I see a bug in there:
>>>
>>> Of course. You shouldn't have told me what the bug was, I deserved
>>> to look for it myself. :)
>>
>> It rather seems that you don't want spoilers, :)
>>
>> I see two bugs now.
>
> Me too. :) But Rusty surely has some testcases in case he wants to
> adopt some of the ideas here. O:-)

For completeness, I think this should address the bugs:

#include <stdbool.h>
#include <stddef.h>
#include <string.h>

bool memeqzero4_paolo(const void *data, size_t length)
{
    const unsigned char *p = data;
    unsigned long word;

    if (!length)
        return true;

    /* Check len bytes not aligned on a word. */
    while (__builtin_expect(length & (sizeof(word) - 1), 0)) {
        if (*p)
            return false;
        p++;
        length--;
        if (!length)
            return true;
    }

    /* Check up to 16 bytes a word at a time. */
    for (;;) {
        memcpy(&word, p, sizeof(word));
        if (word)
            return false;
        p += sizeof(word);
        length -= sizeof(word);
        if (!length)
            return true;
        if (__builtin_expect(length & 15, 0) == 0)
            break;
    }

    /* Now we know that's zero, memcmp with self. */
    return memcmp(data, p, length) == 0;
}
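
The "memcmp with self" step can be easier to see in isolation. Below is a minimal sketch of just that idea, not the function from this thread: the names is_zero_naive, is_zero_overlap, overlap_matches_naive and the PREFIX value are hypothetical and chosen only for illustration. It checks a short prefix byte by byte, then compares the buffer against itself shifted by that prefix, and cross-checks the result against a plain loop.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Reference: check every byte one at a time. */
static bool is_zero_naive(const void *data, size_t length)
{
    const unsigned char *p = data;
    for (size_t i = 0; i < length; i++)
        if (p[i])
            return false;
    return true;
}

/* PREFIX is an arbitrary illustrative value, not taken from the patch. */
#define PREFIX 16

/* Sketch of the overlap trick: verify a short prefix by hand, then let
 * memcmp compare the buffer against itself shifted by that prefix. */
static bool is_zero_overlap(const void *data, size_t length)
{
    const unsigned char *p = data;
    size_t head = length < PREFIX ? length : PREFIX;

    if (!is_zero_naive(p, head))
        return false;
    /* If data[i] == data[i + head] for every i, and data[0..head) is
     * zero, then by induction every byte is zero. */
    return memcmp(p, p + head, length - head) == 0;
}

/* Compare both helpers on all-zero buffers and on buffers with a single
 * nonzero byte at each position; true when they always agree. */
static bool overlap_matches_naive(size_t max_len)
{
    unsigned char buf[256];

    if (max_len > sizeof(buf))
        max_len = sizeof(buf);
    for (size_t len = 0; len <= max_len; len++) {
        memset(buf, 0, sizeof(buf));
        if (is_zero_overlap(buf, len) != is_zero_naive(buf, len))
            return false;
        for (size_t pos = 0; pos < len; pos++) {
            buf[pos] = 1;
            if (is_zero_overlap(buf, len) != is_zero_naive(buf, len))
                return false;
            buf[pos] = 0;
        }
    }
    return true;
}
```

Both memcmp ranges stay inside the buffer, since the shifted range ends exactly at data + length. A check along the lines of overlap_matches_naive(200) is the kind of test that would catch off-by-one mistakes in the prefix handling.
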

Compiled with gcc 5.1.1 -march=native -O2 on an i3-2310M,
we get these timings:

bytes        1      8     16    512  65536
---------------------------------------------
Rusty:      10     28     59    114   6510
Paolo:       9      9     12     75   6495
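
The harness behind these numbers is not shown in this message, and the unit of the timings is not stated. As a rough sketch of one way to gather comparable per-call figures, assuming standard clock() resolution is good enough once the call is repeated many times (the names zero_check_fn, check_bytewise and ns_per_call are hypothetical):

```c
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Signature shared by the zero-check candidates being compared. */
typedef bool (*zero_check_fn)(const void *data, size_t length);

/* Stand-in implementation so the sketch compiles on its own. */
static bool check_bytewise(const void *data, size_t length)
{
    const unsigned char *p = data;
    while (length--)
        if (*p++)
            return false;
    return true;
}

/* Average CPU time per call in nanoseconds, measured with clock().
 * Returns 0.0 when iters is zero. */
static double ns_per_call(zero_check_fn fn, const void *buf,
                          size_t length, unsigned long iters)
{
    volatile bool sink = false; /* discourages optimizing the calls away */
    clock_t start, end;

    if (!iters)
        return 0.0;
    start = clock();
    for (unsigned long i = 0; i < iters; i++)
        sink = fn(buf, length);
    end = clock();
    (void)sink;
    return (double)(end - start) * 1e9 / CLOCKS_PER_SEC / (double)iters;
}
```

Calling ns_per_call once per candidate and per buffer size (1, 8, 16, 512, 65536) with a large iteration count would give a table of the same shape. The absolute values depend on the machine and compiler flags, so they should not be expected to match the figures above.
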

It's also smaller, especially at -O3 (nm -S prints sizes in hex:
0x5b = 91 bytes vs 0x63 = 99, and vs 0x133 = 307 at -O3):

$ nm -S a.out | grep memeqzero4
... 000000000000005b t memeqzero4_paolo
... 0000000000000063 t memeqzero4_rusty
$ gcc -march=native -O3 memeqzero.c
$ nm -S a.out | grep memeqzero4
... 000000000000005b t memeqzero4_paolo
... 0000000000000133 t memeqzero4_rusty
cheers,
Pádraig.