From: Bernhard Voelker <mail@bernhard-voelker.de>
To: "Eric Blake" <eblake@redhat.com>,
"Pádraig Brady" <P@draigBrady.com>,
"Paolo Bonzini" <pbonzini@redhat.com>,
coreutils@gnu.org
Cc: Rusty Russell <rusty@rustcorp.com.au>,
"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>
Subject: Re: [Qemu-devel] [PATCH] copy, dd: simplify and optimize NUL bytes detection
Date: Fri, 23 Oct 2015 12:59:18 +0200
Message-ID: <562A1306.2070902@bernhard-voelker.de>
In-Reply-To: <562906DD.5040501@redhat.com>
On 10/22/2015 05:55 PM, Eric Blake wrote:
> On 10/22/2015 09:47 AM, Bernhard Voelker wrote:
>
>>> Also I suspect the extra conditions involved in using longs
>>> for just the first 16 bytes would outweigh the benefits?
>>> I.E. the first simple loop probably breaks early, and if not
>>> has the added benefit of "priming the pumps" for the subsequent memcmp().
>>
>> what about spending some 16 bytes of memory and do the memcmp on the whole
>> buffer?
>>
>> static unsigned char p[] = {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
>> return 0 == memcmp (p, buf, bufsize);
>
> Won't work over the whole bufsize for anything larger than 16 unless you
> do repeated memcmp()s.
>
> Or are you suggesting that the first 16-byte head validation be done
> against a static buffer via one memcmp(), followed by the other
> overlap-self memcmp() for the rest of the buffer? But I suspect that
> for short lengths, it is more efficient to do an unrolled loop than to
> make a function call (where the function call itself will probably just
> do an unrolled loop on the short length). You want the short case to be
> fast, and the real speedup comes by delegating as much of the long case
> as possible to the system memcmp() optimizations.
Of course, you're completely right. My example above was over-simplified
and therefore plain wrong, sorry.
Aiming at tools like dd(1), I played a bit with the idea of a known-zeroed
prefix in front of the real payload data, i.e. a buffer of 16 + 64k bytes
whose first 16 bytes are all NULs, so that the overlap-self memcmp() can be
used immediately, with the payload starting at offset 16.
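
A minimal sketch of that prefix idea, for illustration only. The names
alloc_prefixed(), payload_of(), payload_is_nul() and ZERO_PREFIX are
hypothetical and not taken from any patch in this thread; the sketch assumes
the caller never writes into the 16-byte head after allocation:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Size of the always-zero head placed in front of the payload
   (16 as in the description above; the name is hypothetical).  */
enum { ZERO_PREFIX = 16 };

/* Allocate ZERO_PREFIX + PAYLOAD_SIZE bytes.  The head is zeroed once
   and must never be written to afterwards.  Returns the start of the
   whole block (the pointer to pass to free), or NULL on failure.  */
static unsigned char *
alloc_prefixed (size_t payload_size)
{
  unsigned char *block = malloc (ZERO_PREFIX + payload_size);
  if (block)
    memset (block, 0, ZERO_PREFIX);
  return block;
}

/* The payload (e.g. the area handed to read(2)) starts after the head.  */
static unsigned char *
payload_of (unsigned char *block)
{
  return block + ZERO_PREFIX;
}

/* Return true if the first LEN payload bytes are all NUL.
   memcmp compares block[i] with block[i + ZERO_PREFIX] for each i < LEN.
   Since block[0..15] is known to be zero, equality everywhere implies
   that every payload byte is zero as well.  */
static bool
payload_is_nul (const unsigned char *block, size_t len)
{
  return memcmp (block, block + ZERO_PREFIX, len) == 0;
}
```

With this layout the check is a single memcmp() call with no head loop, at
the cost of one function call even for very short payloads.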
Tests showed that you are right with your other suspicion, too: for small
buffer sizes, the overhead of calling memcmp() makes this slower than Rusty's way.
Therefore +1 for Pádraig's patch.
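
For comparison, a rough sketch of the approach Eric describes above: a short
loop over the head, then one memcmp() of the buffer against itself at an
offset. This is written from the description in this thread and is not the
code of the actual patch; the name is_all_nul() is hypothetical:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if the LENGTH bytes at BUF are all NUL.  */
static bool
is_all_nul (const void *buf, size_t length)
{
  const unsigned char *p = buf;

  /* Check up to the first 16 bytes one at a time.  Short buffers are
     fully handled here, and non-zero data usually fails early.  */
  for (size_t i = 0; i < 16; i++)
    {
      if (length == 0)
        return true;
      if (*p != 0)
        return false;
      p++;
      length--;
    }

  /* The first 16 bytes are zero and P points 16 bytes past BUF.
     Comparing the buffer with itself at that offset checks that each
     remaining byte equals the byte 16 positions before it, so all
     remaining bytes must be zero too.  */
  return memcmp (buf, p, length) == 0;
}
```

Buffers of 16 bytes or fewer never reach memcmp() here, which is the short
case the tests above favoured.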
Have a nice day,
Berny