All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-trivial] [PATCH] arch_init.c: remove duplicate function
@ 2014-04-14  2:27 Amos Kong
  2014-04-14  9:19 ` Laszlo Ersek
  0 siblings, 1 reply; 6+ messages in thread
From: Amos Kong @ 2014-04-14  2:27 UTC (permalink / raw)
  To: qemu-trivial; +Cc: yamahata, lersek, mjt, qemu-devel

We already have a function buffer_is_zero() in util/cutils.c

Signed-off-by: Amos Kong <akong@redhat.com>
---
 arch_init.c | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 60c975d..342e5dc 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -152,11 +152,6 @@ int qemu_read_default_config_files(bool userconfig)
     return 0;
 }
 
-static inline bool is_zero_range(uint8_t *p, uint64_t size)
-{
-    return buffer_find_nonzero_offset(p, size) == size;
-}
-
 /* struct contains XBZRLE cache and a static page
    used by the compression */
 static struct {
@@ -587,7 +582,7 @@ static int ram_save_block(QEMUFile *f, bool last_stage)
                         acct_info.dup_pages++;
                     }
                 }
-            } else if (is_zero_range(p, TARGET_PAGE_SIZE)) {
+            } else if (buffer_is_zero(p, TARGET_PAGE_SIZE)) {
                 acct_info.dup_pages++;
                 bytes_sent = save_block_hdr(f, block, offset, cont,
                                             RAM_SAVE_FLAG_COMPRESS);
@@ -983,7 +978,7 @@ static inline void *host_from_stream_offset(QEMUFile *f,
  */
 void ram_handle_compressed(void *host, uint8_t ch, uint64_t size)
 {
-    if (ch != 0 || !is_zero_range(host, size)) {
+    if (ch != 0 || !buffer_is_zero(host, size)) {
         memset(host, ch, size);
     }
 }
-- 
1.9.0



^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [Qemu-trivial] [PATCH] arch_init.c: remove duplicate function
  2014-04-14  2:27 [Qemu-trivial] [PATCH] arch_init.c: remove duplicate function Amos Kong
@ 2014-04-14  9:19 ` Laszlo Ersek
  2014-04-14 23:55   ` Michael R. Hines
  0 siblings, 1 reply; 6+ messages in thread
From: Laszlo Ersek @ 2014-04-14  9:19 UTC (permalink / raw)
  To: Amos Kong, qemu-trivial; +Cc: yamahata, mjt, qemu-devel, Michael R. Hines

On 04/14/14 04:27, Amos Kong wrote:
> We already have a function buffer_is_zero() in util/cutils.c
> 
> Signed-off-by: Amos Kong <akong@redhat.com>
> ---
>  arch_init.c | 9 ++-------
>  1 file changed, 2 insertions(+), 7 deletions(-)
> 
> diff --git a/arch_init.c b/arch_init.c
> index 60c975d..342e5dc 100644
> --- a/arch_init.c
> +++ b/arch_init.c
> @@ -152,11 +152,6 @@ int qemu_read_default_config_files(bool userconfig)
>      return 0;
>  }
>  
> -static inline bool is_zero_range(uint8_t *p, uint64_t size)
> -{
> -    return buffer_find_nonzero_offset(p, size) == size;
> -}
> -
>  /* struct contains XBZRLE cache and a static page
>     used by the compression */
>  static struct {
> @@ -587,7 +582,7 @@ static int ram_save_block(QEMUFile *f, bool last_stage)
>                          acct_info.dup_pages++;
>                      }
>                  }
> -            } else if (is_zero_range(p, TARGET_PAGE_SIZE)) {
> +            } else if (buffer_is_zero(p, TARGET_PAGE_SIZE)) {
>                  acct_info.dup_pages++;
>                  bytes_sent = save_block_hdr(f, block, offset, cont,
>                                              RAM_SAVE_FLAG_COMPRESS);
> @@ -983,7 +978,7 @@ static inline void *host_from_stream_offset(QEMUFile *f,
>   */
>  void ram_handle_compressed(void *host, uint8_t ch, uint64_t size)
>  {
> -    if (ch != 0 || !is_zero_range(host, size)) {
> +    if (ch != 0 || !buffer_is_zero(host, size)) {
>          memset(host, ch, size);
>      }
>  }
> 

This seems to be correct, functionally -- apparently buffer_is_zero()
has laxer size requirements than buffer_find_nonzero_offset(). However,
I think the latter might be faster.

For ram_save_block() I guess the difference is negligible. But
ram_handle_compressed() is also called from "migration-rdma.c", where I
can't even guess if a little bit of slowdown would count.

I'm OK with the patch if Michael (CC'd) is.

Thanks
Laszlo


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Qemu-trivial] [PATCH] arch_init.c: remove duplicate function
  2014-04-14  9:19 ` Laszlo Ersek
@ 2014-04-14 23:55   ` Michael R. Hines
  2014-04-15  9:00     ` Laszlo Ersek
  2014-04-27  8:48     ` [Qemu-trivial] " Paolo Bonzini
  0 siblings, 2 replies; 6+ messages in thread
From: Michael R. Hines @ 2014-04-14 23:55 UTC (permalink / raw)
  To: Laszlo Ersek, Amos Kong, qemu-trivial
  Cc: yamahata, Paolo Bonzini, mjt, qemu-devel, Michael R. Hines

On 04/14/2014 05:19 PM, Laszlo Ersek wrote:
> On 04/14/14 04:27, Amos Kong wrote:
>> We already have a function buffer_is_zero() in util/cutils.c
>>
>> Signed-off-by: Amos Kong <akong@redhat.com>
>> ---
>>   arch_init.c | 9 ++-------
>>   1 file changed, 2 insertions(+), 7 deletions(-)
>>
>> diff --git a/arch_init.c b/arch_init.c
>> index 60c975d..342e5dc 100644
>> --- a/arch_init.c
>> +++ b/arch_init.c
>> @@ -152,11 +152,6 @@ int qemu_read_default_config_files(bool userconfig)
>>       return 0;
>>   }
>>   
>> -static inline bool is_zero_range(uint8_t *p, uint64_t size)
>> -{
>> -    return buffer_find_nonzero_offset(p, size) == size;
>> -}
>> -
>>   /* struct contains XBZRLE cache and a static page
>>      used by the compression */
>>   static struct {
>> @@ -587,7 +582,7 @@ static int ram_save_block(QEMUFile *f, bool last_stage)
>>                           acct_info.dup_pages++;
>>                       }
>>                   }
>> -            } else if (is_zero_range(p, TARGET_PAGE_SIZE)) {
>> +            } else if (buffer_is_zero(p, TARGET_PAGE_SIZE)) {
>>                   acct_info.dup_pages++;
>>                   bytes_sent = save_block_hdr(f, block, offset, cont,
>>                                               RAM_SAVE_FLAG_COMPRESS);
>> @@ -983,7 +978,7 @@ static inline void *host_from_stream_offset(QEMUFile *f,
>>    */
>>   void ram_handle_compressed(void *host, uint8_t ch, uint64_t size)
>>   {
>> -    if (ch != 0 || !is_zero_range(host, size)) {
>> +    if (ch != 0 || !buffer_is_zero(host, size)) {
>>           memset(host, ch, size);
>>       }
>>   }
>>
> This seems to be correct, functionally -- apparently buffer_is_zero()
> has laxer size requirements than buffer_find_nonzero_offset(). However,
> I think the latter might be faster.
>
> For ram_save_block() I guess the difference is negligible. But
> ram_handle_compressed() is also called from "migration-rdma.c", where I
> can't even guess if a little bit of slowdown would count.
>
> I'm OK with the patch if Michael (CC'd) is.
>
> Thanks
> Laszlo
>

Thanks for the CC.

Actually, it looks like buffer_is_zero() is calling 
buffer_find_nonzero_offset()
as a "first try" anyway - which is the same thing RDMA is doing. So, all
calls to ram_handle_compressed() should hit the branch target there in
buffer_is_zero() for can_use_buffer_find_nonzero_offset() and automatically
drop into the vectorized-optimization to search for zeros, so there 
shouldn't
be any change in performance. The same should apply for TCP migration
as well - it's working on page-granularity, which is always aligned to 
32 or 64 bits.

Paolo? I see that some of the block-migration code and the qemu-img code 
is also
calling buffer_is_zero() - are you guys depending on the performance of any
buffer_is_zero() calls to use the vector-optimized version like 
migration does?

Just for RDMA: Reviewed-by: Michael R. Hines <mrhines@us.ibm.com>

- Michael



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Qemu-trivial] [PATCH] arch_init.c: remove duplicate function
  2014-04-14 23:55   ` Michael R. Hines
@ 2014-04-15  9:00     ` Laszlo Ersek
  2014-04-15 16:03       ` [Qemu-trivial] [Qemu-devel] " 陈梁
  2014-04-27  8:48     ` [Qemu-trivial] " Paolo Bonzini
  1 sibling, 1 reply; 6+ messages in thread
From: Laszlo Ersek @ 2014-04-15  9:00 UTC (permalink / raw)
  To: Michael R. Hines, Amos Kong, qemu-trivial
  Cc: yamahata, Paolo Bonzini, mjt, qemu-devel, Michael R. Hines

On 04/15/14 01:55, Michael R. Hines wrote:
> On 04/14/2014 05:19 PM, Laszlo Ersek wrote:
>> On 04/14/14 04:27, Amos Kong wrote:
>>> We already have a function buffer_is_zero() in util/cutils.c
>>>
>>> Signed-off-by: Amos Kong <akong@redhat.com>
>>> ---
>>>   arch_init.c | 9 ++-------
>>>   1 file changed, 2 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/arch_init.c b/arch_init.c
>>> index 60c975d..342e5dc 100644
>>> --- a/arch_init.c
>>> +++ b/arch_init.c
>>> @@ -152,11 +152,6 @@ int qemu_read_default_config_files(bool userconfig)
>>>       return 0;
>>>   }
>>>   -static inline bool is_zero_range(uint8_t *p, uint64_t size)
>>> -{
>>> -    return buffer_find_nonzero_offset(p, size) == size;
>>> -}
>>> -
>>>   /* struct contains XBZRLE cache and a static page
>>>      used by the compression */
>>>   static struct {
>>> @@ -587,7 +582,7 @@ static int ram_save_block(QEMUFile *f, bool
>>> last_stage)
>>>                           acct_info.dup_pages++;
>>>                       }
>>>                   }
>>> -            } else if (is_zero_range(p, TARGET_PAGE_SIZE)) {
>>> +            } else if (buffer_is_zero(p, TARGET_PAGE_SIZE)) {
>>>                   acct_info.dup_pages++;
>>>                   bytes_sent = save_block_hdr(f, block, offset, cont,
>>>                                               RAM_SAVE_FLAG_COMPRESS);
>>> @@ -983,7 +978,7 @@ static inline void
>>> *host_from_stream_offset(QEMUFile *f,
>>>    */
>>>   void ram_handle_compressed(void *host, uint8_t ch, uint64_t size)
>>>   {
>>> -    if (ch != 0 || !is_zero_range(host, size)) {
>>> +    if (ch != 0 || !buffer_is_zero(host, size)) {
>>>           memset(host, ch, size);
>>>       }
>>>   }
>>>
>> This seems to be correct, functionally -- apparently buffer_is_zero()
>> has laxer size requirements than buffer_find_nonzero_offset(). However,
>> I think the latter might be faster.
>>
>> For ram_save_block() I guess the difference is negligible. But
>> ram_handle_compressed() is also called from "migration-rdma.c", where I
>> can't even guess if a little bit of slowdown would count.
>>
>> I'm OK with the patch if Michael (CC'd) is.
>>
>> Thanks
>> Laszlo
>>
> 
> Thanks for the CC.
> 
> Actually, it looks like buffer_is_zero() is calling
> buffer_find_nonzero_offset()
> as a "first try" anyway

I have no idea how I managed to miss that.

> - which is the same thing RDMA is doing. So, all
> calls to ram_handle_compressed() should hit the branch target there in
> buffer_is_zero() for can_use_buffer_find_nonzero_offset() and automatically
> drop into the vectorized-optimization to search for zeros, so there
> shouldn't
> be any change in performance. The same should apply for TCP migration
> as well - it's working on page-granularity, which is always aligned to
> 32 or 64 bits.
> 
> Paolo? I see that some of the block-migration code and the qemu-img code
> is also
> calling buffer_is_zero() - are you guys depending on the performance of any
> buffer_is_zero() calls to use the vector-optimized version like
> migration does?

The patch doesn't change buffer_is_zero() internally, so callers of
buffer_is_zero() are unaffected. The patch turns two indirect callers of
buffer_find_nonzero_offset() into two "differently indirect" callers of
the same (which I now see thanks to your explanation). Hence,

Before:
ram_save_block        -> is_zero_range -> buffer_find_nonzero_offset
ram_handle_compressed -> is_zero_range -> buffer_find_nonzero_offset

After:
ram_save_block        -> buffer_is_zero -> buffer_find_nonzero_offset
ram_handle_compressed -> buffer_is_zero -> buffer_find_nonzero_offset

Reviewed-by: Laszlo Ersek <lersek@redhat.com>

Thanks!
Laszlo


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Qemu-trivial] [Qemu-devel] [PATCH] arch_init.c: remove duplicate function
  2014-04-15  9:00     ` Laszlo Ersek
@ 2014-04-15 16:03       ` 陈梁
  0 siblings, 0 replies; 6+ messages in thread
From: 陈梁 @ 2014-04-15 16:03 UTC (permalink / raw)
  To: Laszlo Ersek
  Cc: yamahata, qemu-trivial, mjt, qemu-devel, Michael R. Hines,
	陈梁, Michael R. Hines, Paolo Bonzini


> On 04/15/14 01:55, Michael R. Hines wrote:
>> On 04/14/2014 05:19 PM, Laszlo Ersek wrote:
>>> On 04/14/14 04:27, Amos Kong wrote:
>>>> We already have a function buffer_is_zero() in util/cutils.c
>>>> 
>>>> Signed-off-by: Amos Kong <akong@redhat.com>
>>>> ---
>>>>  arch_init.c | 9 ++-------
>>>>  1 file changed, 2 insertions(+), 7 deletions(-)
>>>> 
>>>> diff --git a/arch_init.c b/arch_init.c
>>>> index 60c975d..342e5dc 100644
>>>> --- a/arch_init.c
>>>> +++ b/arch_init.c
>>>> @@ -152,11 +152,6 @@ int qemu_read_default_config_files(bool userconfig)
>>>>      return 0;
>>>>  }
>>>>  -static inline bool is_zero_range(uint8_t *p, uint64_t size)
>>>> -{
>>>> -    return buffer_find_nonzero_offset(p, size) == size;
>>>> -}
>>>> -
>>>>  /* struct contains XBZRLE cache and a static page
>>>>     used by the compression */
>>>>  static struct {
>>>> @@ -587,7 +582,7 @@ static int ram_save_block(QEMUFile *f, bool
>>>> last_stage)
>>>>                          acct_info.dup_pages++;
>>>>                      }
>>>>                  }
>>>> -            } else if (is_zero_range(p, TARGET_PAGE_SIZE)) {
>>>> +            } else if (buffer_is_zero(p, TARGET_PAGE_SIZE)) {
>>>>                  acct_info.dup_pages++;
>>>>                  bytes_sent = save_block_hdr(f, block, offset, cont,
>>>>                                              RAM_SAVE_FLAG_COMPRESS);
>>>> @@ -983,7 +978,7 @@ static inline void
>>>> *host_from_stream_offset(QEMUFile *f,
>>>>   */
>>>>  void ram_handle_compressed(void *host, uint8_t ch, uint64_t size)
>>>>  {
>>>> -    if (ch != 0 || !is_zero_range(host, size)) {
>>>> +    if (ch != 0 || !buffer_is_zero(host, size)) {
>>>>          memset(host, ch, size);
>>>>      }
>>>>  }
>>>> 
>>> This seems to be correct, functionally -- apparently buffer_is_zero()
>>> has laxer size requirements than buffer_find_nonzero_offset(). However,
>>> I think the latter might be faster.
>>> 
>>> For ram_save_block() I guess the difference is negligible. But
>>> ram_handle_compressed() is also called from "migration-rdma.c", where I
>>> can't even guess if a little bit of slowdown would count.
>>> 
>>> I'm OK with the patch if Michael (CC'd) is.
>>> 
>>> Thanks
>>> Laszlo
>>> 
>> 
>> Thanks for the CC.
>> 
>> Actually, it looks like buffer_is_zero() is calling
>> buffer_find_nonzero_offset()
>> as a "first try" anyway
> 
> I have no idea how I managed to miss that.
> 
>> - which is the same thing RDMA is doing. So, all
>> calls to ram_handle_compressed() should hit the branch target there in
>> buffer_is_zero() for can_use_buffer_find_nonzero_offset() and automatically
>> drop into the vectorized-optimization to search for zeros, so there
>> shouldn't
>> be any change in performance. The same should apply for TCP migration
>> as well - it's working on page-granularity, which is always aligned to
>> 32 or 64 bits.
>> 
>> Paolo? I see that some of the block-migration code and the qemu-img code
>> is also
>> calling buffer_is_zero() - are you guys depending on the performance of any
>> buffer_is_zero() calls to use the vector-optimized version like
>> migration does?
> 
> The patch doesn't change buffer_is_zero() internally, so callers of
> buffer_is_zero() are unaffected. The patch turns two indirect callers of
> buffer_find_nonzero_offset() into two "differently indirect" callers of
> the same (which I now see thanks to your explanation). Hence,
> 
> Before:
> ram_save_block        -> is_zero_range -> buffer_find_nonzero_offset
> ram_handle_compressed -> is_zero_range -> buffer_find_nonzero_offset
Because of *static inline *is_zero_range, it may be like that:
    ram_save_block    -> buffer_find_nonzero_offset
    ram_handle_compressed   -> buffer_find_nonzero_offset
But this affects little in my testing.

Best regards
ChenLiang

> After:
> ram_save_block        -> buffer_is_zero -> buffer_find_nonzero_offset
> ram_handle_compressed -> buffer_is_zero -> buffer_find_nonzero_offset
> 
> Reviewed-by: Laszlo Ersek <lersek@redhat.com>
> 
> Thanks!
> Laszlo



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Qemu-trivial] [PATCH] arch_init.c: remove duplicate function
  2014-04-14 23:55   ` Michael R. Hines
  2014-04-15  9:00     ` Laszlo Ersek
@ 2014-04-27  8:48     ` Paolo Bonzini
  1 sibling, 0 replies; 6+ messages in thread
From: Paolo Bonzini @ 2014-04-27  8:48 UTC (permalink / raw)
  To: Michael R. Hines, Laszlo Ersek, Amos Kong, qemu-trivial
  Cc: yamahata, mjt, qemu-devel, Michael R. Hines

Il 15/04/2014 01:55, Michael R. Hines ha scritto:
> Paolo? I see that some of the block-migration code and the qemu-img code
> is also
> calling buffer_is_zero() - are you guys depending on the performance of any
> buffer_is_zero() calls to use the vector-optimized version like
> migration does?

Yes, definitely.  Not to the point that we care about which functions 
are inlined, but the constant-factor improvement that comes for 
vectorization can have a noticeable impact.

Paolo


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2014-04-27  8:48 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-04-14  2:27 [Qemu-trivial] [PATCH] arch_init.c: remove duplicate function Amos Kong
2014-04-14  9:19 ` Laszlo Ersek
2014-04-14 23:55   ` Michael R. Hines
2014-04-15  9:00     ` Laszlo Ersek
2014-04-15 16:03       ` [Qemu-trivial] [Qemu-devel] " 陈梁
2014-04-27  8:48     ` [Qemu-trivial] " Paolo Bonzini

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.