qemu-devel.nongnu.org archive mirror
From: Paolo Bonzini <pbonzini@redhat.com>
To: Peter Lieven <pl@kamp.de>
Cc: Stefan Hajnoczi <stefanha@gmail.com>,
	Orit Wasserman <owasserm@redhat.com>,
	qemu-devel@nongnu.org, quintela@redhat.com
Subject: Re: [Qemu-devel] [PATCHv4 0/9] buffer_is_zero / migration optimizations
Date: Mon, 25 Mar 2013 15:34:16 +0100
Message-ID: <51506068.5080103@redhat.com>
In-Reply-To: <806A8BFB-FF1F-482C-B679-2B1B10D06D7C@kamp.de>

On 25/03/2013 14:32, Peter Lieven wrote:
> 
> On 25.03.2013 at 14:23, Peter Lieven <pl@kamp.de> wrote:
> 
>>
>> On 25.03.2013 at 14:02, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>
>>>> Maybe I should have explained the output in more detail. The percentages
>>>> are cumulative. 35.8% in the second-to-last column means that
>>>> 35.8% of pages have a return value that is less than TARGET_PAGE_SIZE.
>>>> This was meant to illustrate how many 64-bit chunks you have
>>>> to look at to catch a certain percentage of non-zero pages.
>>>
>>> Ok, I wrongly understood that many pages had 4088 zero bytes but
>>> the last 8 were not zero.  Now it's clearer, and more logical too. :)
>>>
>>>> Looking e.g. at the third value it means that looking at the first
>>>> three 64-bit chunks it will catch 34.0% of all pages.
>>>> It turns out that the non-zeroness of a page can be detected looking
>>>> at the first 256 or so bits and only a low
>>>> percentage turns out to be non-zero at a later position. So after
>>>> having checked the first chunks one by one
>>>> there is no big penalty looking at the remaining chunks with the
>>>> vectorized loop.
>>>
>>> I think it makes most sense to unroll the first four non-vectorized
>>> iterations, i.e. not use SSE and use three or four ifs.  Either:
>>>
>>>  if (foo[0]) return 0;
>>>  if (foo[1]) return 8;
>>>  if (foo[2]) return 16;
>>>  if (foo[3]) return 24;
>>>
>>> or
>>>
>>>  if (foo[0]) return 0;
>>>  if (foo[1] | foo[2] | foo[3]) return 8;
>>>
>>> and then proceed on the remaining 4096-4*sizeof(long) bytes with
>>> the vectorized loop.  foo+4 is aligned for SIMD operations on both
>>> 32- and 64-bit machines, which makes this a nice choice.
>>
>> I can't start at foo+4 since the remaining X-4*sizeof(long) bytes
>> are not divisible by 8*sizeof(VECTYPE).


Hmm, right.  What about just processing the first few longs twice, i.e.
the above followed by "for (i = 0; i < len / sizeof(VECTYPE); i
+= BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR)"?
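
To make the idea concrete, here is a rough sketch, not the code from the patch. It assumes plain unsigned long chunks in place of the SIMD VECTYPE, and the names chunk_t, UNROLL and find_nonzero_offset are made up for illustration. The first four words get a cheap scalar check, then the unrolled loop restarts at index 0, so the length stays a multiple of the unroll factor at the cost of rereading a few words.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: the patch uses a SIMD VECTYPE and
 * BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR; plain words keep this portable. */
typedef unsigned long chunk_t;
#define UNROLL 8

/*
 * Return the byte offset of the first non-zero word if it is among the
 * first four, otherwise the byte offset of the first UNROLL-sized group
 * that contains non-zero data, or len if the whole buffer is zero.
 * Assumes buf is aligned for chunk_t and len is a non-zero multiple of
 * UNROLL * sizeof(chunk_t).
 */
static size_t find_nonzero_offset(const void *buf, size_t len)
{
    const chunk_t *p = buf;
    size_t i;

    assert(len > 0 && len % (UNROLL * sizeof(chunk_t)) == 0);

    /* Early exits: most non-zero pages differ within the first words. */
    for (i = 0; i < 4; i++) {
        if (p[i]) {
            return i * sizeof(chunk_t);
        }
    }

    /* Start again at 0, so the first four words are simply read twice. */
    for (i = 0; i < len / sizeof(chunk_t); i += UNROLL) {
        chunk_t acc = p[i + 0] | p[i + 1] | p[i + 2] | p[i + 3] |
                      p[i + 4] | p[i + 5] | p[i + 6] | p[i + 7];
        if (acc) {
            return i * sizeof(chunk_t);
        }
    }
    return len;
}
```

Restarting at 0 avoids the divisibility problem you mention; whether the redundant reads are measurable would need benchmarking.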

Paolo

>>
>>    for (i = BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR; 
>>         i < len / sizeof(VECTYPE); 
>>         i += BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR) {
>>        …
>>    }
> 
> performance of the above is bad compared to:
> 
>     for (i = 0; i < BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR; i++) {
>         if (!ALL_EQ(p[i], zero)) {
>             return i * sizeof(VECTYPE);
>         }
>     }
> 
> …
> 
> The above is basically what the old is_dup_page does, but after the first
> 8 iterations the optimized version kicks in.
> 
> Peter
> 
> 
> 
