* [Qemu-trivial] [PATCH] arch_init.c: remove duplicate function @ 2014-04-14 2:27 Amos Kong 2014-04-14 9:19 ` Laszlo Ersek 0 siblings, 1 reply; 6+ messages in thread From: Amos Kong @ 2014-04-14 2:27 UTC (permalink / raw) To: qemu-trivial; +Cc: yamahata, lersek, mjt, qemu-devel We already have a function buffer_is_zero() in util/cutils.c Signed-off-by: Amos Kong <akong@redhat.com> --- arch_init.c | 9 ++------- 1 file changed, 2 insertions(+), 7 deletions(-) diff --git a/arch_init.c b/arch_init.c index 60c975d..342e5dc 100644 --- a/arch_init.c +++ b/arch_init.c @@ -152,11 +152,6 @@ int qemu_read_default_config_files(bool userconfig) return 0; } -static inline bool is_zero_range(uint8_t *p, uint64_t size) -{ - return buffer_find_nonzero_offset(p, size) == size; -} - /* struct contains XBZRLE cache and a static page used by the compression */ static struct { @@ -587,7 +582,7 @@ static int ram_save_block(QEMUFile *f, bool last_stage) acct_info.dup_pages++; } } - } else if (is_zero_range(p, TARGET_PAGE_SIZE)) { + } else if (buffer_is_zero(p, TARGET_PAGE_SIZE)) { acct_info.dup_pages++; bytes_sent = save_block_hdr(f, block, offset, cont, RAM_SAVE_FLAG_COMPRESS); @@ -983,7 +978,7 @@ static inline void *host_from_stream_offset(QEMUFile *f, */ void ram_handle_compressed(void *host, uint8_t ch, uint64_t size) { - if (ch != 0 || !is_zero_range(host, size)) { + if (ch != 0 || !buffer_is_zero(host, size)) { memset(host, ch, size); } } -- 1.9.0 ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [Qemu-trivial] [PATCH] arch_init.c: remove duplicate function 2014-04-14 2:27 [Qemu-trivial] [PATCH] arch_init.c: remove duplicate function Amos Kong @ 2014-04-14 9:19 ` Laszlo Ersek 2014-04-14 23:55 ` Michael R. Hines 0 siblings, 1 reply; 6+ messages in thread From: Laszlo Ersek @ 2014-04-14 9:19 UTC (permalink / raw) To: Amos Kong, qemu-trivial; +Cc: yamahata, mjt, qemu-devel, Michael R. Hines On 04/14/14 04:27, Amos Kong wrote: > We already have a function buffer_is_zero() in util/cutils.c > > Signed-off-by: Amos Kong <akong@redhat.com> > --- > arch_init.c | 9 ++------- > 1 file changed, 2 insertions(+), 7 deletions(-) > > diff --git a/arch_init.c b/arch_init.c > index 60c975d..342e5dc 100644 > --- a/arch_init.c > +++ b/arch_init.c > @@ -152,11 +152,6 @@ int qemu_read_default_config_files(bool userconfig) > return 0; > } > > -static inline bool is_zero_range(uint8_t *p, uint64_t size) > -{ > - return buffer_find_nonzero_offset(p, size) == size; > -} > - > /* struct contains XBZRLE cache and a static page > used by the compression */ > static struct { > @@ -587,7 +582,7 @@ static int ram_save_block(QEMUFile *f, bool last_stage) > acct_info.dup_pages++; > } > } > - } else if (is_zero_range(p, TARGET_PAGE_SIZE)) { > + } else if (buffer_is_zero(p, TARGET_PAGE_SIZE)) { > acct_info.dup_pages++; > bytes_sent = save_block_hdr(f, block, offset, cont, > RAM_SAVE_FLAG_COMPRESS); > @@ -983,7 +978,7 @@ static inline void *host_from_stream_offset(QEMUFile *f, > */ > void ram_handle_compressed(void *host, uint8_t ch, uint64_t size) > { > - if (ch != 0 || !is_zero_range(host, size)) { > + if (ch != 0 || !buffer_is_zero(host, size)) { > memset(host, ch, size); > } > } > This seems to be correct, functionally -- apparently buffer_is_zero() has laxer size requirements than buffer_find_nonzero_offset(). However, I think the latter might be faster. For ram_save_block() I guess the difference is negligible. But ram_handle_compressed() is also called from "migration-rdma.c", where I can't even guess if a little bit of slowdown would count. I'm OK with the patch if Michael (CC'd) is. Thanks Laszlo ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Qemu-trivial] [PATCH] arch_init.c: remove duplicate function 2014-04-14 9:19 ` Laszlo Ersek @ 2014-04-14 23:55 ` Michael R. Hines 2014-04-15 9:00 ` Laszlo Ersek 2014-04-27 8:48 ` [Qemu-trivial] " Paolo Bonzini 0 siblings, 2 replies; 6+ messages in thread From: Michael R. Hines @ 2014-04-14 23:55 UTC (permalink / raw) To: Laszlo Ersek, Amos Kong, qemu-trivial Cc: yamahata, Paolo Bonzini, mjt, qemu-devel, Michael R. Hines On 04/14/2014 05:19 PM, Laszlo Ersek wrote: > On 04/14/14 04:27, Amos Kong wrote: >> We already have a function buffer_is_zero() in util/cutils.c >> >> Signed-off-by: Amos Kong <akong@redhat.com> >> --- >> arch_init.c | 9 ++------- >> 1 file changed, 2 insertions(+), 7 deletions(-) >> >> diff --git a/arch_init.c b/arch_init.c >> index 60c975d..342e5dc 100644 >> --- a/arch_init.c >> +++ b/arch_init.c >> @@ -152,11 +152,6 @@ int qemu_read_default_config_files(bool userconfig) >> return 0; >> } >> >> -static inline bool is_zero_range(uint8_t *p, uint64_t size) >> -{ >> - return buffer_find_nonzero_offset(p, size) == size; >> -} >> - >> /* struct contains XBZRLE cache and a static page >> used by the compression */ >> static struct { >> @@ -587,7 +582,7 @@ static int ram_save_block(QEMUFile *f, bool last_stage) >> acct_info.dup_pages++; >> } >> } >> - } else if (is_zero_range(p, TARGET_PAGE_SIZE)) { >> + } else if (buffer_is_zero(p, TARGET_PAGE_SIZE)) { >> acct_info.dup_pages++; >> bytes_sent = save_block_hdr(f, block, offset, cont, >> RAM_SAVE_FLAG_COMPRESS); >> @@ -983,7 +978,7 @@ static inline void *host_from_stream_offset(QEMUFile *f, >> */ >> void ram_handle_compressed(void *host, uint8_t ch, uint64_t size) >> { >> - if (ch != 0 || !is_zero_range(host, size)) { >> + if (ch != 0 || !buffer_is_zero(host, size)) { >> memset(host, ch, size); >> } >> } >> > This seems to be correct, functionally -- apparently buffer_is_zero() > has laxer size requirements than buffer_find_nonzero_offset(). However, > I think the latter might be faster. > > For ram_save_block() I guess the difference is negligible. But > ram_handle_compressed() is also called from "migration-rdma.c", where I > can't even guess if a little bit of slowdown would count. > > I'm OK with the patch if Michael (CC'd) is. > > Thanks > Laszlo > Thanks for the CC. Actually, it looks like buffer_is_zero() is calling buffer_find_nonzero_offset() as a "first try" anyway - which is the same thing RDMA is doing. So, all calls to ram_handle_compressed() should hit the branch target there in buffer_is_zero() for can_use_buffer_find_nonzero_offset() and automatically drop into the vectorized-optimization to search for zeros, so there shouldn't be any change in performance. The same should apply for TCP migration as well - it's working on page-granularity, which is always aligned to 32 or 64 bits. Paolo? I see that some of the block-migration code and the qemu-img code is also calling buffer_is_zero() - are you guys depending on the performance of any buffer_is_zero() calls to use the vector-optimized version like migration does? Just for RDMA: Reviewed-by: Michael R. Hines <mrhines@us.ibm.com> - Michael ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Qemu-trivial] [PATCH] arch_init.c: remove duplicate function 2014-04-14 23:55 ` Michael R. Hines @ 2014-04-15 9:00 ` Laszlo Ersek 2014-04-15 16:03 ` [Qemu-trivial] [Qemu-devel] " 陈梁 2014-04-27 8:48 ` [Qemu-trivial] " Paolo Bonzini 1 sibling, 1 reply; 6+ messages in thread From: Laszlo Ersek @ 2014-04-15 9:00 UTC (permalink / raw) To: Michael R. Hines, Amos Kong, qemu-trivial Cc: yamahata, Paolo Bonzini, mjt, qemu-devel, Michael R. Hines On 04/15/14 01:55, Michael R. Hines wrote: > On 04/14/2014 05:19 PM, Laszlo Ersek wrote: >> On 04/14/14 04:27, Amos Kong wrote: >>> We already have a function buffer_is_zero() in util/cutils.c >>> >>> Signed-off-by: Amos Kong <akong@redhat.com> >>> --- >>> arch_init.c | 9 ++------- >>> 1 file changed, 2 insertions(+), 7 deletions(-) >>> >>> diff --git a/arch_init.c b/arch_init.c >>> index 60c975d..342e5dc 100644 >>> --- a/arch_init.c >>> +++ b/arch_init.c >>> @@ -152,11 +152,6 @@ int qemu_read_default_config_files(bool userconfig) >>> return 0; >>> } >>> -static inline bool is_zero_range(uint8_t *p, uint64_t size) >>> -{ >>> - return buffer_find_nonzero_offset(p, size) == size; >>> -} >>> - >>> /* struct contains XBZRLE cache and a static page >>> used by the compression */ >>> static struct { >>> @@ -587,7 +582,7 @@ static int ram_save_block(QEMUFile *f, bool >>> last_stage) >>> acct_info.dup_pages++; >>> } >>> } >>> - } else if (is_zero_range(p, TARGET_PAGE_SIZE)) { >>> + } else if (buffer_is_zero(p, TARGET_PAGE_SIZE)) { >>> acct_info.dup_pages++; >>> bytes_sent = save_block_hdr(f, block, offset, cont, >>> RAM_SAVE_FLAG_COMPRESS); >>> @@ -983,7 +978,7 @@ static inline void >>> *host_from_stream_offset(QEMUFile *f, >>> */ >>> void ram_handle_compressed(void *host, uint8_t ch, uint64_t size) >>> { >>> - if (ch != 0 || !is_zero_range(host, size)) { >>> + if (ch != 0 || !buffer_is_zero(host, size)) { >>> memset(host, ch, size); >>> } >>> } >>> >> This seems to be correct, functionally -- apparently buffer_is_zero() >> has laxer size requirements than buffer_find_nonzero_offset(). However, >> I think the latter might be faster. >> >> For ram_save_block() I guess the difference is negligible. But >> ram_handle_compressed() is also called from "migration-rdma.c", where I >> can't even guess if a little bit of slowdown would count. >> >> I'm OK with the patch if Michael (CC'd) is. >> >> Thanks >> Laszlo >> > > Thanks for the CC. > > Actually, it looks like buffer_is_zero() is calling > buffer_find_nonzero_offset() > as a "first try" anyway I have no idea how I managed to miss that. > - which is the same thing RDMA is doing. So, all > calls to ram_handle_compressed() should hit the branch target there in > buffer_is_zero() for can_use_buffer_find_nonzero_offset() and automatically > drop into the vectorized-optimization to search for zeros, so there > shouldn't > be any change in performance. The same should apply for TCP migration > as well - it's working on page-granularity, which is always aligned to > 32 or 64 bits. > > Paolo? I see that some of the block-migration code and the qemu-img code > is also > calling buffer_is_zero() - are you guys depending on the performance of any > buffer_is_zero() calls to use the vector-optimized version like > migration does? The patch doesn't change buffer_is_zero() internally, so callers of buffer_is_zero() are unaffected. The patch turns two indirect callers of buffer_find_nonzero_offset() into two "differently indirect" callers of the same (which I now see thanks to your explanation). Hence, Before: ram_save_block -> is_zero_range -> buffer_find_nonzero_offset ram_handle_compressed -> is_zero_range -> buffer_find_nonzero_offset After: ram_save_block -> buffer_is_zero -> buffer_find_nonzero_offset ram_handle_compressed -> buffer_is_zero -> buffer_find_nonzero_offset Reviewed-by: Laszlo Ersek <lersek@redhat.com> Thanks! Laszlo ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Qemu-trivial] [Qemu-devel] [PATCH] arch_init.c: remove duplicate function 2014-04-15 9:00 ` Laszlo Ersek @ 2014-04-15 16:03 ` 陈梁 0 siblings, 0 replies; 6+ messages in thread From: 陈梁 @ 2014-04-15 16:03 UTC (permalink / raw) To: Laszlo Ersek Cc: yamahata, qemu-trivial, mjt, qemu-devel, Michael R. Hines, 陈梁, Michael R. Hines, Paolo Bonzini > On 04/15/14 01:55, Michael R. Hines wrote: >> On 04/14/2014 05:19 PM, Laszlo Ersek wrote: >>> On 04/14/14 04:27, Amos Kong wrote: >>>> We already have a function buffer_is_zero() in util/cutils.c >>>> >>>> Signed-off-by: Amos Kong <akong@redhat.com> >>>> --- >>>> arch_init.c | 9 ++------- >>>> 1 file changed, 2 insertions(+), 7 deletions(-) >>>> >>>> diff --git a/arch_init.c b/arch_init.c >>>> index 60c975d..342e5dc 100644 >>>> --- a/arch_init.c >>>> +++ b/arch_init.c >>>> @@ -152,11 +152,6 @@ int qemu_read_default_config_files(bool userconfig) >>>> return 0; >>>> } >>>> -static inline bool is_zero_range(uint8_t *p, uint64_t size) >>>> -{ >>>> - return buffer_find_nonzero_offset(p, size) == size; >>>> -} >>>> - >>>> /* struct contains XBZRLE cache and a static page >>>> used by the compression */ >>>> static struct { >>>> @@ -587,7 +582,7 @@ static int ram_save_block(QEMUFile *f, bool >>>> last_stage) >>>> acct_info.dup_pages++; >>>> } >>>> } >>>> - } else if (is_zero_range(p, TARGET_PAGE_SIZE)) { >>>> + } else if (buffer_is_zero(p, TARGET_PAGE_SIZE)) { >>>> acct_info.dup_pages++; >>>> bytes_sent = save_block_hdr(f, block, offset, cont, >>>> RAM_SAVE_FLAG_COMPRESS); >>>> @@ -983,7 +978,7 @@ static inline void >>>> *host_from_stream_offset(QEMUFile *f, >>>> */ >>>> void ram_handle_compressed(void *host, uint8_t ch, uint64_t size) >>>> { >>>> - if (ch != 0 || !is_zero_range(host, size)) { >>>> + if (ch != 0 || !buffer_is_zero(host, size)) { >>>> memset(host, ch, size); >>>> } >>>> } >>>> >>> This seems to be correct, functionally -- apparently buffer_is_zero() >>> has laxer size requirements than buffer_find_nonzero_offset(). However, >>> I think the latter might be faster. >>> >>> For ram_save_block() I guess the difference is negligible. But >>> ram_handle_compressed() is also called from "migration-rdma.c", where I >>> can't even guess if a little bit of slowdown would count. >>> >>> I'm OK with the patch if Michael (CC'd) is. >>> >>> Thanks >>> Laszlo >>> >> >> Thanks for the CC. >> >> Actually, it looks like buffer_is_zero() is calling >> buffer_find_nonzero_offset() >> as a "first try" anyway > > I have no idea how I managed to miss that. > >> - which is the same thing RDMA is doing. So, all >> calls to ram_handle_compressed() should hit the branch target there in >> buffer_is_zero() for can_use_buffer_find_nonzero_offset() and automatically >> drop into the vectorized-optimization to search for zeros, so there >> shouldn't >> be any change in performance. The same should apply for TCP migration >> as well - it's working on page-granularity, which is always aligned to >> 32 or 64 bits. >> >> Paolo? I see that some of the block-migration code and the qemu-img code >> is also >> calling buffer_is_zero() - are you guys depending on the performance of any >> buffer_is_zero() calls to use the vector-optimized version like >> migration does? > > The patch doesn't change buffer_is_zero() internally, so callers of > buffer_is_zero() are unaffected. The patch turns two indirect callers of > buffer_find_nonzero_offset() into two "differently indirect" callers of > the same (which I now see thanks to your explanation). Hence, > > Before: > ram_save_block -> is_zero_range -> buffer_find_nonzero_offset > ram_handle_compressed -> is_zero_range -> buffer_find_nonzero_offset Because of *static inline *is_zero_range, it may be like that: ram_save_block -> buffer_find_nonzero_offset ram_handle_compressed -> buffer_find_nonzero_offset But this affects little in my testing. Best regards ChenLiang > After: > ram_save_block -> buffer_is_zero -> buffer_find_nonzero_offset > ram_handle_compressed -> buffer_is_zero -> buffer_find_nonzero_offset > > Reviewed-by: Laszlo Ersek <lersek@redhat.com> > > Thanks! > Laszlo ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Qemu-trivial] [PATCH] arch_init.c: remove duplicate function 2014-04-14 23:55 ` Michael R. Hines 2014-04-15 9:00 ` Laszlo Ersek @ 2014-04-27 8:48 ` Paolo Bonzini 1 sibling, 0 replies; 6+ messages in thread From: Paolo Bonzini @ 2014-04-27 8:48 UTC (permalink / raw) To: Michael R. Hines, Laszlo Ersek, Amos Kong, qemu-trivial Cc: yamahata, mjt, qemu-devel, Michael R. Hines Il 15/04/2014 01:55, Michael R. Hines ha scritto: > Paolo? I see that some of the block-migration code and the qemu-img code > is also > calling buffer_is_zero() - are you guys depending on the performance of any > buffer_is_zero() calls to use the vector-optimized version like > migration does? Yes, definitely. Not to the point that we care about which functions are inlined, but the constant-factor improvement that comes for vectorization can have a noticeable impact. Paolo ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2014-04-27 8:48 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2014-04-14 2:27 [Qemu-trivial] [PATCH] arch_init.c: remove duplicate function Amos Kong 2014-04-14 9:19 ` Laszlo Ersek 2014-04-14 23:55 ` Michael R. Hines 2014-04-15 9:00 ` Laszlo Ersek 2014-04-15 16:03 ` [Qemu-trivial] [Qemu-devel] " 陈梁 2014-04-27 8:48 ` [Qemu-trivial] " Paolo Bonzini
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.