From: Paolo Bonzini
Date: Thu, 22 Oct 2015 16:44:04 +0200
Subject: Re: [Qemu-devel] [PATCH] copy, dd: simplify and optimize NUL bytes detection
To: Eric Blake, Pádraig Brady, coreutils@gnu.org
Cc: Rusty Russell, "qemu-devel@nongnu.org"
Message-ID: <5628F634.6040809@redhat.com>
In-Reply-To: <5628F4BC.2040502@redhat.com>
References: <1445522453-14450-1-git-send-email-P@draigBrady.com> <5628F4BC.2040502@redhat.com>

On 22/10/2015 16:37, Eric Blake wrote:
>> > +  /* Check first 16 bytes manually. */
>> > +  for (len = 0; len < 16; len++)
>> > +    {
>> > +      if (! bufsize)
>> > +        return true;
>> > +      if (*p)
>> > +        return false;
>> > +      p++;
>> > +      bufsize--;
>> > +    }
>> > +
>> > +  /* Now we know that's zero, memcmp with self. */
>> > +  return memcmp (buf, p, bufsize) == 0;
>> >  }
> Cool trick of using a suitably-aligned overlap-to-self check to then
> trigger platform-specific speedups without having to rewrite them by
> hand! qemu is doing a similar check in util/cutils.c:buffer_is_zero()
> that could probably benefit from the same idea.

Nice trick indeed.

On the other hand, the first 16 bytes are enough to rule out 99.99%
(number out of thin air) of the non-zero blocks, so that's where you
want to optimize. Checking them an unsigned long at a time, or fetching
a few unsigned longs and ORing them together, would probably be the best
of both worlds, because you then only use the FPU in the rare case of a
zero buffer.

Paolo
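
For illustration, here is a rough sketch of how the two ideas above might
be combined: OR together the first 16 bytes as unsigned longs, and only
fall through to the overlapping memcmp when that prefix is zero. The names
buf_is_zero and PREFIX are made up for this example; this is not the code
from the coreutils patch or from qemu's buffer_is_zero(), and it assumes
that sizeof (unsigned long) divides 16.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical prefix length, matching the 16 bytes discussed above. */
enum { PREFIX = 16 };

/* Assumption: unsigned long evenly divides the prefix (true for 4 and 8). */
_Static_assert (PREFIX % sizeof (unsigned long) == 0,
                "PREFIX must be a multiple of sizeof (unsigned long)");

/* Hypothetical helper: return true if all bufsize bytes of buf are zero. */
static bool
buf_is_zero (const void *buf, size_t bufsize)
{
  const unsigned char *p = buf;

  if (bufsize < PREFIX)
    {
      /* Short buffer: a plain byte loop is enough. */
      for (size_t i = 0; i < bufsize; i++)
        if (p[i])
          return false;
      return true;
    }

  /* Fetch the first PREFIX bytes as unsigned longs and OR them together.
     memcpy keeps the loads safe for an unaligned buf.  */
  unsigned long words[PREFIX / sizeof (unsigned long)];
  unsigned long acc = 0;
  memcpy (words, p, PREFIX);
  for (size_t i = 0; i < sizeof words / sizeof words[0]; i++)
    acc |= words[i];
  if (acc)
    return false;

  /* The prefix is zero, so comparing the buffer with itself shifted by
     PREFIX bytes succeeds only if every remaining byte is zero too.  */
  return memcmp (p, p + PREFIX, bufsize - PREFIX) == 0;
}
```

With this shape, a block that is non-zero within its first 16 bytes is
rejected after a couple of integer loads, and memcmp is reached only when
the prefix is already known to be zero.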