From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:37811)
	by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1ZpaJD-0003GU-K2 for qemu-devel@nongnu.org;
	Fri, 23 Oct 2015 07:14:56 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from ) id 1ZpaJA-0005wy-VC for qemu-devel@nongnu.org;
	Fri, 23 Oct 2015 07:14:55 -0400
Received: from mx1.redhat.com ([209.132.183.28]:45264)
	by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1ZpaJA-0005wa-Q6 for qemu-devel@nongnu.org;
	Fri, 23 Oct 2015 07:14:52 -0400
References: <1445522453-14450-1-git-send-email-P@draigBrady.com>
	<5628F4BC.2040502@redhat.com> <5628F634.6040809@redhat.com>
	<5628FE20.80802@draigBrady.com> <56290152.7010408@redhat.com>
	<562908A6.3000307@redhat.com> <56290B55.2000703@redhat.com>
	<20151022173919.GC14789@potion.brq.redhat.com>
	<56293D5D.2030107@redhat.com> <562A1625.60900@lincor.com>
From: Paolo Bonzini
Message-ID: <562A16A7.7070506@redhat.com>
Date: Fri, 23 Oct 2015 13:14:47 +0200
MIME-Version: 1.0
In-Reply-To: <562A1625.60900@lincor.com>
Content-Type: text/plain; charset=utf-8
Subject: Re: [Qemu-devel] [PATCH] copy, dd: simplify and optimize NUL bytes detection
To: Pádraig Brady, Radim Krčmář
Cc: Rusty Russell, coreutils@gnu.org, "qemu-devel@nongnu.org"

On 23/10/2015 13:12, Pádraig Brady wrote:
> On 22/10/15 20:47, Paolo Bonzini wrote:
>>
>>
>> On 22/10/2015 19:39, Radim Krčmář wrote:
>>> 2015-10-22 18:14+0200, Paolo Bonzini:
>>>> On 22/10/2015 18:02, Eric Blake wrote:
>>>>> I see a bug in there:
>>>>
>>>> Of course.  You shouldn't have told me what the bug was, I deserved
>>>> to look for it myself. :)
>>>
>>> It rather seems that you don't want spoilers, :)
>>>
>>> I see two bugs now.
>>
>> Me too. :)  But Rusty surely has some testcases in case he wants to
>> adopt some of the ideas here. O:-)
>
> For completeness this should address the bugs I think?

Yes, thanks! :D

Paolo

> bool memeqzero4_paolo(const void *data, size_t length)
> {
>     const unsigned char *p = data;
>     unsigned long word;
>
>     if (!length)
>         return true;
>
>     /* Check len bytes not aligned on a word.  */
>     while (__builtin_expect(length & (sizeof(word) - 1), 0)) {
>         if (*p)
>             return false;
>         p++;
>         length--;
>         if (!length)
>             return true;
>     }
>
>     /* Check up to 16 bytes a word at a time.  */
>     for (;;) {
>         memcpy(&word, p, sizeof(word));
>         if (word)
>             return false;
>         p += sizeof(word);
>         length -= sizeof(word);
>         if (!length)
>             return true;
>         if (__builtin_expect(length & 15, 0) == 0)
>             break;
>     }
>
>     /* Now we know that's zero, memcmp with self. */
>     return memcmp(data, p, length) == 0;
> }
>
> compiled with gcc 5.1.1 -march=native -O2 on an i3-2310M
> we get these timings:
>
>         bytes     1     8    16   512  65536
>         ---------------------------------------------
>         Rusty:   10    28    59   114   6510
>         Paolo:    9     9    12    75   6495
>
> It's also smaller, especially at -O3:
>
> $ nm -S a.out | grep memeqzero4
> ... 000000000000005b t memeqzero4_paolo
> ... 0000000000000063 t memeqzero4_rusty
> $ gcc -march=native -O3 memeqzero.c
> $ nm -S a.out | grep memeqzero4
> ... 000000000000005b t memeqzero4_paolo
> ... 0000000000000133 t memeqzero4_rusty
>
> cheers,
> Pádraig.
>
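
Since testcases came up in the thread: below is a minimal sketch of an
exhaustive self-check for the function quoted above. It is not Rusty's test
suite. The names memeqzero_naive and memeqzero_selfcheck are hypothetical,
introduced only for this illustration, and __builtin_expect assumes GCC or
Clang. The check compares the quoted function against a plain byte loop for
every length up to a bound, every position of a single nonzero byte, and a few
unaligned start offsets.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Reference implementation: plain byte loop (hypothetical helper). */
static bool memeqzero_naive(const void *data, size_t length)
{
    const unsigned char *p = data;

    while (length--)
        if (*p++)
            return false;
    return true;
}

/* The function quoted in the message, reproduced so the block compiles
 * on its own.  __builtin_expect is a GCC/Clang builtin. */
static bool memeqzero4_paolo(const void *data, size_t length)
{
    const unsigned char *p = data;
    unsigned long word;

    if (!length)
        return true;

    /* Check leading bytes until the remaining length is a multiple
     * of the word size. */
    while (__builtin_expect(length & (sizeof(word) - 1), 0)) {
        if (*p)
            return false;
        p++;
        length--;
        if (!length)
            return true;
    }

    /* Check a word at a time until the remaining length is a
     * multiple of 16. */
    for (;;) {
        memcpy(&word, p, sizeof(word));
        if (word)
            return false;
        p += sizeof(word);
        length -= sizeof(word);
        if (!length)
            return true;
        if (__builtin_expect(length & 15, 0) == 0)
            break;
    }

    /* The first (p - data) bytes are known to be zero, so an
     * overlapping memcmp against the start covers the rest. */
    return memcmp(data, p, length) == 0;
}

/* Hypothetical self-check: returns the number of disagreements with
 * the naive loop, or (size_t)-1 if allocation fails. */
static size_t memeqzero_selfcheck(size_t max_len)
{
    enum { MAX_OFFSET = 8 };
    size_t mismatches = 0;
    unsigned char *buf = calloc(max_len + MAX_OFFSET, 1);

    if (!buf)
        return (size_t)-1;

    for (size_t off = 0; off < MAX_OFFSET; off++) {
        unsigned char *start = buf + off;

        for (size_t len = 0; len <= max_len; len++) {
            /* All-zero buffer of this length. */
            if (memeqzero4_paolo(start, len) != memeqzero_naive(start, len))
                mismatches++;

            /* One nonzero byte at each possible position. */
            for (size_t pos = 0; pos < len; pos++) {
                start[pos] = 1;
                if (memeqzero4_paolo(start, len) != memeqzero_naive(start, len))
                    mismatches++;
                start[pos] = 0;
            }
        }
    }

    free(buf);
    return mismatches;
}
```

Calling memeqzero_selfcheck(64) from a small main() and checking that it
returns 0 exercises all three phases of the function: the byte loop, the word
loop, and the final overlapping memcmp.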