From: Eric Blake <eblake@redhat.com>
To: "Paolo Bonzini" <pbonzini@redhat.com>,
	"Pádraig Brady" <P@draigbrady.com>,
	coreutils@gnu.org
Cc: Rusty Russell <rusty@rustcorp.com.au>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>
Subject: Re: [Qemu-devel] [PATCH] copy, dd: simplify and optimize NUL bytes detection
Date: Thu, 22 Oct 2015 10:02:46 -0600
Message-ID: <562908A6.3000307@redhat.com>
In-Reply-To: <56290152.7010408@redhat.com>

On 10/22/2015 09:31 AM, Paolo Bonzini wrote:

> Only if your machine cannot do unaligned loads.  If it can, you can
> align the length instead of the buffer.  memcmp will take care of
> aligning the buffer (with some luck it won't have to, e.g. if buf is
> 0x12340002 and length = 4094).  On x86 unaligned "unsigned long" loads
> are basically free as long as they don't cross a cache line.
> 
>> BTW Rusty has a benchmark framework for this as referenced from:
>> http://rusty.ozlabs.org/?p=560
> 
> I missed his benchmark framework so I wrote another one, here it is:
> https://gist.githubusercontent.com/bonzini/9a95b0e02d1ceb60af9e/raw/7bc42ddccdb6c42fea3db58e0539d0443d0e6dc6/memeqzero.c

I see a bug in there:

static __attribute__((noinline)) bool memeqzero4_paolo(const void *data,
                                                       size_t length)
{
    const unsigned char *p = data;
    unsigned long word;

    while (__builtin_expect(length & (sizeof(word) - 1), 0)) {
        if (*p)
            return false;
        p++;
        length--;
    }
    while (__builtin_expect(length & (16 - sizeof(word)), 0)) {
        memcpy(&word, p, sizeof(word));
        if (word)
            return false;
        p += sizeof(word);
        length -= sizeof(word);
    }

    /* Now we know that's zero, memcmp with self. */
    return length == 0 || memcmp(data, p, length) == 0;
}

If length is already a multiple of 16 on entry, neither loop runs and p
still equals data, so you are calling memcmp(data, data, length), which
is trivially 0 for all input, rather than checking for actual NUL
bytes.  You MUST check at least one byte manually before handing off to
memcmp().  Beyond that, having the distance between data and p be a
multiple of a cache line (well, blindly picking 16 as Rusty did is a
close approximation) will probably let the libc memcmp() run a lot
faster than if it has to deal with mutually unaligned pointers.  In
that case it can arrange for one of the two reads to be aligned, but
the other read stays unaligned; even if the two reads are close enough
to hit the same cache line, you still suffer some slowdown.
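
To make the point concrete, here is a minimal sketch of the shape I
have in mind.  It is not the gist's code and not what coreutils ships;
the name memeqzero_sketch is made up for illustration, and the fixed
16-byte head is just the same blind guess mentioned above.  It checks
the head by hand so that p is strictly ahead of data, and only then
hands the rest to memcmp():

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical name; an illustration only, not taken from the gist. */
static bool memeqzero_sketch(const void *data, size_t length)
{
    const unsigned char *p = data;
    size_t head;

    /* Check the first (up to) 16 bytes by hand, so that p ends up
       strictly ahead of data before memcmp() is called. */
    for (head = 0; head < 16; head++) {
        if (length == 0)
            return true;
        if (*p)
            return false;
        p++;
        length--;
    }

    /* data[0..15] are now known to be zero.  Comparing data[i] with
       data[i + 16] for every remaining i extends that fact to the
       whole buffer. */
    return memcmp(data, p, length) == 0;
}
```

With this shape a 64-byte buffer that has a single nonzero byte at
offset 40 is rejected, whereas the version quoted above would hand it
to memcmp(data, data, 64) and report it as all zeros.  Whether the
byte-at-a-time head is fast enough is a separate question for the
benchmark.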

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org



