[Qemu-devel] [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling

qemu-devel.nongnu.org archive mirror

* [Qemu-devel] [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling
@ 2010-02-10 10:52 OHMURA Kei
  2010-02-10 13:10 ` [Qemu-devel] " Ulrich Drepper
  2010-02-10 13:20 ` Avi Kivity
  0 siblings, 2 replies; 26+ messages in thread
From: OHMURA Kei @ 2010-02-10 10:52 UTC (permalink / raw)
  To: kvm@vger.kernel.org, qemu-devel@nongnu.org
  Cc: ohmura.kei, mtosatti, Avi Kivity

dirty-bitmap-traveling is carried out by byte size in qemu-kvm.c.
But we think that dirty-bitmap-traveling by long size is faster than by byte
size, especially when most of memory is not dirty.

Signed-off-by: OHMURA Kei <ohmura.kei@lab.ntt.co.jp>
---
 bswap.h    |    1 -
 qemu-kvm.c |   30 ++++++++++++++++--------------
 2 files changed, 16 insertions(+), 15 deletions(-)

diff --git a/bswap.h b/bswap.h
index 4558704..d896f01 100644
--- a/bswap.h
+++ b/bswap.h
@@ -209,7 +209,6 @@ static inline void cpu_to_be32wu(uint32_t *p, uint32_t v)
 #define cpu_to_32wu cpu_to_le32wu
 #endif
 
-#undef le_bswap
 #undef be_bswap
 #undef le_bswaps
 #undef be_bswaps
diff --git a/qemu-kvm.c b/qemu-kvm.c
index a305907..ea07912 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -2438,27 +2438,29 @@ static int kvm_get_dirty_pages_log_range(unsigned long start_addr,
                                          unsigned long offset,
                                          unsigned long mem_size)
 {
-    unsigned int i, j, n = 0;
-    unsigned char c;
-    unsigned long page_number, addr, addr1;
+    unsigned int i, j;
+    unsigned long page_number, addr, addr1, c;
     ram_addr_t ram_addr;
-    unsigned int len = ((mem_size / TARGET_PAGE_SIZE) + 7) / 8;
+    unsigned int len = ((mem_size / TARGET_PAGE_SIZE) + HOST_LONG_BITS - 1) /
+        HOST_LONG_BITS;
+    unsigned long *bitmap_ul = (unsigned long *)bitmap;
 
     /* 
      * bitmap-traveling is faster than memory-traveling (for addr...) 
      * especially when most of the memory is not dirty.
      */
     for (i = 0; i < len; i++) {
-        c = bitmap[i];
-        while (c > 0) {
-            j = ffsl(c) - 1;
-            c &= ~(1u << j);
-            page_number = i * 8 + j;
-            addr1 = page_number * TARGET_PAGE_SIZE;
-            addr = offset + addr1;
-            ram_addr = cpu_get_physical_page_desc(addr);
-            cpu_physical_memory_set_dirty(ram_addr);
-            n++;
+        if (bitmap_ul[i] != 0) {
+            c = le_bswap(bitmap_ul[i], HOST_LONG_BITS);
+            while (c > 0) {
+                j = ffsl(c) - 1;
+                c &= ~(1ul << j);
+                page_number = i * HOST_LONG_BITS + j;
+                addr1 = page_number * TARGET_PAGE_SIZE;
+                addr = offset + addr1;
+                ram_addr = cpu_get_physical_page_desc(addr);
+                cpu_physical_memory_set_dirty(ram_addr);
+            }
         }
     }
     return 0;
-- 
1.6.3.3


* [Qemu-devel] Re: [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling
  2010-02-10 10:52 [Qemu-devel] [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling OHMURA Kei
@ 2010-02-10 13:10 ` Ulrich Drepper
  2010-02-10 13:20 ` Avi Kivity
  1 sibling, 0 replies; 26+ messages in thread
From: Ulrich Drepper @ 2010-02-10 13:10 UTC (permalink / raw)
  To: OHMURA Kei
  Cc: mtosatti, qemu-devel@nongnu.org, kvm@vger.kernel.org, Avi Kivity

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 02/10/2010 02:52 AM, OHMURA Kei wrote:

>      for (i = 0; i < len; i++) {
> -        c = bitmap[i];
> -        while (c > 0) {
> -            j = ffsl(c) - 1;
> -            c &= ~(1u << j);
> -            page_number = i * 8 + j;
> -            addr1 = page_number * TARGET_PAGE_SIZE;
> -            addr = offset + addr1;
> -            ram_addr = cpu_get_physical_page_desc(addr);
> -            cpu_physical_memory_set_dirty(ram_addr);
> -            n++;
> +        if (bitmap_ul[i] != 0) {
> +            c = le_bswap(bitmap_ul[i], HOST_LONG_BITS);
> +            while (c > 0) {
> +                j = ffsl(c) - 1;
> +                c &= ~(1ul << j);
> +                page_number = i * HOST_LONG_BITS + j;
> +                addr1 = page_number * TARGET_PAGE_SIZE;
> +                addr = offset + addr1;
> +                ram_addr = cpu_get_physical_page_desc(addr);
> +                cpu_physical_memory_set_dirty(ram_addr);
> +            }

If you're optimizing this code you might want to do it all.  The
compiler might not see through the bswap call and create unnecessary
data dependencies.  Especially problematic if the bitmap is really
sparse.  Also, the outer test is != while the inner test is >.  Be
consistent.  I suggest to replace the inner loop with

      do {
        ...
      } while (c != 0);

Depending on how sparse the bitmap is populated this might reduce the
number of data dependencies quite a bit.

- -- 
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)

iEYEARECAAYFAktysDoACgkQ2ijCOnn/RHS2zwCfcj+G0S5ZAEA8MjGAVI/rKjJJ
+0oAnA4njIrwx3/5+o43ekYeYXSNyei0
=ukkz
-----END PGP SIGNATURE-----


* [Qemu-devel] Re: [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling
  2010-02-10 10:52 [Qemu-devel] [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling OHMURA Kei
  2010-02-10 13:10 ` [Qemu-devel] " Ulrich Drepper
@ 2010-02-10 13:20 ` Avi Kivity
  2010-02-10 15:54   ` Anthony Liguori
  2010-02-10 15:55   ` Anthony Liguori
  1 sibling, 2 replies; 26+ messages in thread
From: Avi Kivity @ 2010-02-10 13:20 UTC (permalink / raw)
  To: OHMURA Kei
  Cc: Anthony Liguori, mtosatti, qemu-devel@nongnu.org,
	kvm@vger.kernel.org

On 02/10/2010 12:52 PM, OHMURA Kei wrote:
> dirty-bitmap-traveling is carried out by byte size in qemu-kvm.c.
> But we think that dirty-bitmap-traveling by long size is faster than by byte
> size, especially when most of memory is not dirty.
>
> --- a/bswap.h
> +++ b/bswap.h
> @@ -209,7 +209,6 @@ static inline void cpu_to_be32wu(uint32_t *p, uint32_t v)
>  #define cpu_to_32wu cpu_to_le32wu
>  #endif
>  
> -#undef le_bswap
>  #undef be_bswap
>  #undef le_bswaps
>   


Anthony, is it okay to export le_bswap this way, or will you want
leul_to_cpu()?

-- 
error compiling committee.c: too many arguments to function


* [Qemu-devel] Re: [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling
  2010-02-10 13:20 ` Avi Kivity
@ 2010-02-10 15:54   ` Anthony Liguori
  2010-02-10 15:57     ` Avi Kivity
  2010-02-10 16:00     ` Alexander Graf
  2010-02-10 15:55   ` Anthony Liguori
  1 sibling, 2 replies; 26+ messages in thread
From: Anthony Liguori @ 2010-02-10 15:54 UTC (permalink / raw)
  To: Avi Kivity
  Cc: OHMURA Kei, mtosatti, qemu-devel@nongnu.org, kvm@vger.kernel.org,
	Anthony Liguori

On 02/10/2010 07:20 AM, Avi Kivity wrote:
> On 02/10/2010 12:52 PM, OHMURA Kei wrote:
>   
>> dirty-bitmap-traveling is carried out by byte size in qemu-kvm.c.
>> But we think that dirty-bitmap-traveling by long size is faster than by byte
>> size, especially when most of memory is not dirty.
>>
>> --- a/bswap.h
>> +++ b/bswap.h
>> @@ -209,7 +209,6 @@ static inline void cpu_to_be32wu(uint32_t *p, uint32_t v)
>>  #define cpu_to_32wu cpu_to_le32wu
>>  #endif
>>  
>> -#undef le_bswap
>>  #undef be_bswap
>>  #undef le_bswaps
>>   
>>     
>
> Anthony, is it okay to export le_bswap this way, or will you want
> leul_to_cpu()?
>   

kvm_get_dirty_pages_log_range() is kvm-specific code. We're guaranteed
that when we're using kvm, target byte order == host byte order.

So is it really necessary to use a byte swapping function at all?

Regards,

Anthony Liguori


* [Qemu-devel] Re: [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling
  2010-02-10 13:20 ` Avi Kivity
  2010-02-10 15:54   ` Anthony Liguori
@ 2010-02-10 15:55   ` Anthony Liguori
  2010-02-12  2:03     ` OHMURA Kei
  1 sibling, 1 reply; 26+ messages in thread
From: Anthony Liguori @ 2010-02-10 15:55 UTC (permalink / raw)
  To: Avi Kivity
  Cc: OHMURA Kei, mtosatti, qemu-devel@nongnu.org, kvm@vger.kernel.org,
	Anthony Liguori

On 02/10/2010 07:20 AM, Avi Kivity wrote:
> On 02/10/2010 12:52 PM, OHMURA Kei wrote:
>   
>> dirty-bitmap-traveling is carried out by byte size in qemu-kvm.c.
>> But we think that dirty-bitmap-traveling by long size is faster than by byte
>> size, especially when most of memory is not dirty.
>>
>> --- a/bswap.h
>> +++ b/bswap.h
>> @@ -209,7 +209,6 @@ static inline void cpu_to_be32wu(uint32_t *p, uint32_t v)
>>  #define cpu_to_32wu cpu_to_le32wu
>>  #endif
>>  
>> -#undef le_bswap
>>  #undef be_bswap
>>  #undef le_bswaps
>>   
>>     
>
> Anthony, is it okay to export le_bswap this way, or will you want
> leul_to_cpu()?
>   

Oh, I see what's happening here. Yes, I think a leul_to_cpu() makes more
sense.

Regards,

Anthony Liguori


* [Qemu-devel] Re: [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling
  2010-02-10 15:54   ` Anthony Liguori
@ 2010-02-10 15:57     ` Avi Kivity
  2010-02-10 16:00     ` Alexander Graf
  1 sibling, 0 replies; 26+ messages in thread
From: Avi Kivity @ 2010-02-10 15:57 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: OHMURA Kei, mtosatti, qemu-devel@nongnu.org, kvm@vger.kernel.org,
	Anthony Liguori

On 02/10/2010 05:54 PM, Anthony Liguori wrote:
> On 02/10/2010 07:20 AM, Avi Kivity wrote:
>   
>> On 02/10/2010 12:52 PM, OHMURA Kei wrote:
>>   
>>     
>>> dirty-bitmap-traveling is carried out by byte size in qemu-kvm.c.
>>> But we think that dirty-bitmap-traveling by long size is faster than by byte
>>> size, especially when most of memory is not dirty.
>>>
>>> --- a/bswap.h
>>> +++ b/bswap.h
>>> @@ -209,7 +209,6 @@ static inline void cpu_to_be32wu(uint32_t *p, uint32_t v)
>>>  #define cpu_to_32wu cpu_to_le32wu
>>>  #endif
>>>  
>>> -#undef le_bswap
>>>  #undef be_bswap
>>>  #undef le_bswaps
>>>   
>>>     
>>>       
>> Anthony, is it okay to export le_bswap this way, or will you want
>> leul_to_cpu()?
>>   
>>     
> kvm_get_dirty_pages_log_range() is kvm-specific code. We're guaranteed
> that when we're using kvm, target byte order == host byte order.
>
> So is it really necessary to use a byte swapping function at all?
>   

The dirty log bitmap is always little endian. This is so we don't have
to depend on sizeof(long) (which can vary between kernel and userspace)
or mandate some other access size.

(if native endian worked, then the previous byte-based code would have
been broken on big endian).

Seriously, those who say that big vs little endian is a matter of taste
are missing something.


-- 
error compiling committee.c: too many arguments to function


* Re: [Qemu-devel] Re: [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling
  2010-02-10 15:54   ` Anthony Liguori
  2010-02-10 15:57     ` Avi Kivity
@ 2010-02-10 16:00     ` Alexander Graf
  2010-02-10 16:35       ` Anthony Liguori
  1 sibling, 1 reply; 26+ messages in thread
From: Alexander Graf @ 2010-02-10 16:00 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Anthony Liguori, kvm@vger.kernel.org, mtosatti, OHMURA Kei,
	qemu-devel@nongnu.org, Avi Kivity

Anthony Liguori wrote:
> On 02/10/2010 07:20 AM, Avi Kivity wrote:
>   
>> On 02/10/2010 12:52 PM, OHMURA Kei wrote:
>>   
>>     
>>> dirty-bitmap-traveling is carried out by byte size in qemu-kvm.c.
>>> But we think that dirty-bitmap-traveling by long size is faster than by byte
>>> size, especially when most of memory is not dirty.
>>>
>>> --- a/bswap.h
>>> +++ b/bswap.h
>>> @@ -209,7 +209,6 @@ static inline void cpu_to_be32wu(uint32_t *p, uint32_t v)
>>>  #define cpu_to_32wu cpu_to_le32wu
>>>  #endif
>>>  
>>> -#undef le_bswap
>>>  #undef be_bswap
>>>  #undef le_bswaps
>>>   
>>>     
>>>       
>> Anthony, is it okay to export le_bswap this way, or will you want
>> leul_to_cpu()?
>>   
>>     
>
> kvm_get_dirty_pages_log_range() is kvm-specific code. We're guaranteed
> that when we're using kvm, target byte order == host byte order.
>
> So is it really necessary to use a byte swapping function at all?
>   

On PPC the bitmap is Little Endian.


Alex


* Re: [Qemu-devel] Re: [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling
  2010-02-10 16:00     ` Alexander Graf
@ 2010-02-10 16:35       ` Anthony Liguori
  2010-02-10 16:43         ` Alexander Graf
  2010-02-10 16:43         ` Avi Kivity
  0 siblings, 2 replies; 26+ messages in thread
From: Anthony Liguori @ 2010-02-10 16:35 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Anthony Liguori, kvm@vger.kernel.org, mtosatti, OHMURA Kei,
	qemu-devel@nongnu.org, Avi Kivity

On 02/10/2010 10:00 AM, Alexander Graf wrote:
> On PPC the bitmap is Little Endian.
>   

Out of curiosity, why? It seems like an odd interface.

Regards,

Anthony Liguori


* Re: [Qemu-devel] Re: [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling
  2010-02-10 16:35       ` Anthony Liguori
@ 2010-02-10 16:43         ` Alexander Graf
  2010-02-10 16:46           ` Avi Kivity
  2010-02-10 16:43         ` Avi Kivity
  1 sibling, 1 reply; 26+ messages in thread
From: Alexander Graf @ 2010-02-10 16:43 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Anthony Liguori, kvm@vger.kernel.org, mtosatti, OHMURA Kei,
	qemu-devel@nongnu.org, Avi Kivity

Anthony Liguori wrote:
> On 02/10/2010 10:00 AM, Alexander Graf wrote:
>   
>> On PPC the bitmap is Little Endian.
>>   
>>     
>
> Out of curiosity, why? It seems like an odd interface.
>   

Because on PPC, you usually run PPC32 userspace code on a PPC64 kernel.
Unlike with x86, there's no real benefit in using 64 bit userspace.

So thanks to the nature of big endianness, that breaks our set_bit
helpers, because they assume you're using "long" data types for the
bits. While that's no real issue on little endian, since the next int is
just the high part of a u64, it messes everything up on ppc.

For more details, please just look in the archives on my patches to make
it little endian.

Alex


* Re: [Qemu-devel] Re: [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling
  2010-02-10 16:35       ` Anthony Liguori
  2010-02-10 16:43         ` Alexander Graf
@ 2010-02-10 16:43         ` Avi Kivity
  1 sibling, 0 replies; 26+ messages in thread
From: Avi Kivity @ 2010-02-10 16:43 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Anthony Liguori, kvm@vger.kernel.org, qemu-devel@nongnu.org,
	mtosatti, OHMURA Kei, Alexander Graf

On 02/10/2010 06:35 PM, Anthony Liguori wrote:
> On 02/10/2010 10:00 AM, Alexander Graf wrote:
>   
>> On PPC the bitmap is Little Endian.
>>   
>>     
> Out of curiosity, why? It seems like an odd interface.
>
>   

Exactly this issue. If you specify it as unsigned long native endian,
there is ambiguity between 32-bit and 64-bit userspace. If you specify
it as uint64_t native endian, you have an inefficient implementation on
32-bit userspace. So we went for unsigned byte native endian, which is
the same as any size little endian.

(well I think the real reason is that it just grew that way out of x86,
but the above is quite plausible).

-- 
error compiling committee.c: too many arguments to function


* Re: [Qemu-devel] Re: [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling
  2010-02-10 16:43         ` Alexander Graf
@ 2010-02-10 16:46           ` Avi Kivity
  2010-02-10 16:47             ` Alexander Graf
  0 siblings, 1 reply; 26+ messages in thread
From: Avi Kivity @ 2010-02-10 16:46 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Anthony Liguori, kvm@vger.kernel.org, mtosatti, OHMURA Kei,
	qemu-devel@nongnu.org

On 02/10/2010 06:43 PM, Alexander Graf wrote:
>
>> Out of curiosity, why? It seems like an odd interface.
>>   
>>     
> Because on PPC, you usually run PPC32 userspace code on a PPC64 kernel.
> Unlike with x86, there's no real benefit in using 64 bit userspace.
>   

btw, does 32-bit ppc qemu support large memory guests? It doesn't on
x86, and I don't remember any hacks to support large memory guests
elsewhere.

-- 
error compiling committee.c: too many arguments to function


* Re: [Qemu-devel] Re: [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling
  2010-02-10 16:46           ` Avi Kivity
@ 2010-02-10 16:47             ` Alexander Graf
  2010-02-10 16:52               ` Avi Kivity
  0 siblings, 1 reply; 26+ messages in thread
From: Alexander Graf @ 2010-02-10 16:47 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Anthony Liguori, kvm@vger.kernel.org, mtosatti, OHMURA Kei,
	qemu-devel@nongnu.org

Avi Kivity wrote:
> On 02/10/2010 06:43 PM, Alexander Graf wrote:
>   
>>> Out of curiosity, why? It seems like an odd interface.
>>>   
>>>     
>>>       
>> Because on PPC, you usually run PPC32 userspace code on a PPC64 kernel.
>> Unlike with x86, there's no real benefit in using 64 bit userspace.
>>   
>>     
>
> btw, does 32-bit ppc qemu support large memory guests? It doesn't on
> x86, and I don't remember any hacks to support large memory guests
> elsewhere.
>   


It doesn't :-). In fact, the guest we virtualize wouldn't work with > 2
GB anyways, because that needs an iommu implementation.


Alex


* Re: [Qemu-devel] Re: [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling
  2010-02-10 16:47             ` Alexander Graf
@ 2010-02-10 16:52               ` Avi Kivity
  2010-02-10 16:54                 ` Alexander Graf
  0 siblings, 1 reply; 26+ messages in thread
From: Avi Kivity @ 2010-02-10 16:52 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Anthony Liguori, kvm@vger.kernel.org, mtosatti, OHMURA Kei,
	qemu-devel@nongnu.org

On 02/10/2010 06:47 PM, Alexander Graf wrote:
>>> Because on PPC, you usually run PPC32 userspace code on a PPC64 kernel.
>>> Unlike with x86, there's no real benefit in using 64 bit userspace.
>>>   
>>>     
>>>       
>> btw, does 32-bit ppc qemu support large memory guests? It doesn't on
>> x86, and I don't remember any hacks to support large memory guests
>> elsewhere.
>>   
>>     
>
> It doesn't :-). In fact, the guest we virtualize wouldn't work with > 2
> GB anyways, because that needs an iommu implementation.
>   

Oh, so you may want to revisit the "there's no real benefit in using 64
bit userspace".

Seriously, that looks like a big deficiency. What would it take to
implement an iommu?

I imagine Anthony's latest patches are a first step in that journey.

-- 
error compiling committee.c: too many arguments to function


* Re: [Qemu-devel] Re: [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling
  2010-02-10 16:52               ` Avi Kivity
@ 2010-02-10 16:54                 ` Alexander Graf
  0 siblings, 0 replies; 26+ messages in thread
From: Alexander Graf @ 2010-02-10 16:54 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Anthony Liguori, kvm@vger.kernel.org, mtosatti, OHMURA Kei,
	qemu-devel@nongnu.org

Avi Kivity wrote:
> On 02/10/2010 06:47 PM, Alexander Graf wrote:
>   
>>>> Because on PPC, you usually run PPC32 userspace code on a PPC64 kernel.
>>>> Unlike with x86, there's no real benefit in using 64 bit userspace.
>>>>   
>>>>     
>>>>       
>>>>         
>>> btw, does 32-bit ppc qemu support large memory guests? It doesn't on
>>> x86, and I don't remember any hacks to support large memory guests
>>> elsewhere.
>>>   
>>>     
>>>       
>> It doesn't :-). In fact, the guest we virtualize wouldn't work with > 2
>> GB anyways, because that needs an iommu implementation.
>>   
>>     
>
> Oh, so you may want to revisit the "there's no real benefit in using 64
> bit userspace".
>   

Well, for normal users they don't. SLES11 is 64-bit only, so we're good
on that. But openSUSE uses 32-bit userland.

> Seriously, that looks like a big deficiency. What would it take to
> implement an iommu?
>
> I imagine Anthony's latest patches are a first step in that journey.
>   

All reads/writes from PCI devices would need to go through a wrapper.
Maybe we could also define a per-device offset for memory accesses. That
way the overhead might be less.

Yes, Anthony's patches look like they are a really big step in that
direction.


Alex


* [Qemu-devel] Re: [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling
  2010-02-10 15:55   ` Anthony Liguori
@ 2010-02-12  2:03     ` OHMURA Kei
  2010-02-14 12:34       ` Avi Kivity
  0 siblings, 1 reply; 26+ messages in thread
From: OHMURA Kei @ 2010-02-12  2:03 UTC (permalink / raw)
  To: Anthony Liguori, Avi Kivity, qemu-devel@nongnu.org,
	kvm@vger.kernel.org
  Cc: ohmura.kei, mtosatti, drepper, Yoshiaki Tamura

On 02/11/2010 Anthony Liguori <anthony@codemonkey.ws> wrote:
> Oh, I see what's happening here. Yes, I think a leul_to_cpu() makes more
> sense.

Maybe I'm missing something here.
I couldn't find leul_to_cpu(), so I have defined it in bswap.h.
Is this correct?

--- a/bswap.h
+++ b/bswap.h
@@ -205,8 +205,10 @@ static inline void cpu_to_be32wu(uint32_t *p, uint32_t v)
 
 #ifdef HOST_WORDS_BIGENDIAN
 #define cpu_to_32wu cpu_to_be32wu
+#define leul_to_cpu(v) le ## HOST_LONG_BITS ## _to_cpu(v)
 #else
 #define cpu_to_32wu cpu_to_le32wu
+#define leul_to_cpu(v) (v)
 #endif



On 02/10/2010 Ulrich Drepper <drepper@redhat.com> wrote:
> If you're optimizing this code you might want to do it all.  The
> compiler might not see through the bswap call and create unnecessary
> data dependencies.  Especially problematic if the bitmap is really
> sparse.  Also, the outer test is != while the inner test is >.  Be
> consistent.  I suggest to replace the inner loop with
> 
>      do {
>        ...
>      } while (c != 0);
> 
> Depending on how sparse the bitmap is populated this might reduce the
> number of data dependencies quite a bit.

Combining all comments, the code would be like this.
     
 if (bitmap_ul[i] != 0) {
     c = leul_to_cpu(bitmap_ul[i]);
     do {
         j = ffsl(c) - 1;
         c &= ~(1ul << j);
         page_number = i * HOST_LONG_BITS + j;
         addr1 = page_number * TARGET_PAGE_SIZE;
         addr = offset + addr1;
         ram_addr = cpu_get_physical_page_desc(addr);
         cpu_physical_memory_set_dirty(ram_addr);
     } while (c != 0);
 }


* [Qemu-devel] Re: [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling
  2010-02-12  2:03     ` OHMURA Kei
@ 2010-02-14 12:34       ` Avi Kivity
  2010-02-15  6:12         ` OHMURA Kei
  0 siblings, 1 reply; 26+ messages in thread
From: Avi Kivity @ 2010-02-14 12:34 UTC (permalink / raw)
  To: OHMURA Kei
  Cc: kvm@vger.kernel.org, mtosatti, Yoshiaki Tamura,
	qemu-devel@nongnu.org, drepper

On 02/12/2010 04:03 AM, OHMURA Kei wrote:
> On 02/11/2010 Anthony Liguori <anthony@codemonkey.ws> wrote:
>   
>> Oh, I see what's happening here. Yes, I think a leul_to_cpu() makes more
>> sense.
>>     
> Maybe I'm missing something here.
> I couldn't find leul_to_cpu(), so I have defined it in bswap.h.
> Is this correct?
>
> --- a/bswap.h
> +++ b/bswap.h
> @@ -205,8 +205,10 @@ static inline void cpu_to_be32wu(uint32_t *p, uint32_t v)
>  
>  #ifdef HOST_WORDS_BIGENDIAN
>  #define cpu_to_32wu cpu_to_be32wu
> +#define leul_to_cpu(v) le ## HOST_LONG_BITS ## _to_cpu(v)
>  #else
>  #define cpu_to_32wu cpu_to_le32wu
> +#define leul_to_cpu(v) (v)
>  #endif
>
>
>
> On 02/10/2010 Ulrich Drepper <drepper@redhat.com> wrote:
>   
>> If you're optimizing this code you might want to do it all.  The
>> compiler might not see through the bswap call and create unnecessary
>> data dependencies.  Especially problematic if the bitmap is really
>> sparse.  Also, the outer test is != while the inner test is >.  Be
>> consistent.  I suggest to replace the inner loop with
>>
>>      do {
>>        ...
>>      } while (c != 0);
>>
>> Depending on how sparse the bitmap is populated this might reduce the
>> number of data dependencies quite a bit.
>>     
> Combining all comments, the code would be like this.
>      
>  if (bitmap_ul[i] != 0) {
>      c = leul_to_cpu(bitmap_ul[i]);
>      do {
>          j = ffsl(c) - 1;
>          c &= ~(1ul << j);
>          page_number = i * HOST_LONG_BITS + j;
>          addr1 = page_number * TARGET_PAGE_SIZE;
>          addr = offset + addr1;
>          ram_addr = cpu_get_physical_page_desc(addr);
>          cpu_physical_memory_set_dirty(ram_addr);
>      } while (c != 0);
>  }
>   

Except you don't need bitmap_ul any more - you can change the type of
the bitmap variable, since all accesses should now be ulongs.

-- 
error compiling committee.c: too many arguments to function


* [Qemu-devel] Re: [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling
  2010-02-14 12:34       ` Avi Kivity
@ 2010-02-15  6:12         ` OHMURA Kei
  2010-02-15  8:24           ` Alexander Graf
  0 siblings, 1 reply; 26+ messages in thread
From: OHMURA Kei @ 2010-02-15  6:12 UTC (permalink / raw)
  To: Avi Kivity, qemu-devel@nongnu.org, kvm@vger.kernel.org
  Cc: ohmura.kei, mtosatti, drepper, Yoshiaki Tamura

dirty-bitmap-traveling is carried out by byte size in qemu-kvm.c.
But we think that dirty-bitmap-traveling by long size is faster than by byte
size, especially when most of memory is not dirty.

Signed-off-by: OHMURA Kei <ohmura.kei@lab.ntt.co.jp>
---
 bswap.h    |    2 ++
 qemu-kvm.c |   31 ++++++++++++++++---------------
 2 files changed, 18 insertions(+), 15 deletions(-)

diff --git a/bswap.h b/bswap.h
index 4558704..1f87e6d 100644
--- a/bswap.h
+++ b/bswap.h
@@ -205,8 +205,10 @@ static inline void cpu_to_be32wu(uint32_t *p, uint32_t v)
 
 #ifdef HOST_WORDS_BIGENDIAN
 #define cpu_to_32wu cpu_to_be32wu
+#define leul_to_cpu(v) le ## HOST_LONG_BITS ## _to_cpu(v)
 #else
 #define cpu_to_32wu cpu_to_le32wu
+#define leul_to_cpu(v) (v)
 #endif
 
 #undef le_bswap
diff --git a/qemu-kvm.c b/qemu-kvm.c
index a305907..6952aa5 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -2434,31 +2434,32 @@ int kvm_physical_memory_set_dirty_tracking(int enable)
 
 /* get kvm's dirty pages bitmap and update qemu's */
 static int kvm_get_dirty_pages_log_range(unsigned long start_addr,
-                                         unsigned char *bitmap,
+                                         unsigned long *bitmap,
                                          unsigned long offset,
                                          unsigned long mem_size)
 {
-    unsigned int i, j, n = 0;
-    unsigned char c;
-    unsigned long page_number, addr, addr1;
+    unsigned int i, j;
+    unsigned long page_number, addr, addr1, c;
     ram_addr_t ram_addr;
-    unsigned int len = ((mem_size / TARGET_PAGE_SIZE) + 7) / 8;
+    unsigned int len = ((mem_size / TARGET_PAGE_SIZE) + HOST_LONG_BITS - 1) /
+        HOST_LONG_BITS;
 
     /* 
      * bitmap-traveling is faster than memory-traveling (for addr...) 
      * especially when most of the memory is not dirty.
      */
     for (i = 0; i < len; i++) {
-        c = bitmap[i];
-        while (c > 0) {
-            j = ffsl(c) - 1;
-            c &= ~(1u << j);
-            page_number = i * 8 + j;
-            addr1 = page_number * TARGET_PAGE_SIZE;
-            addr = offset + addr1;
-            ram_addr = cpu_get_physical_page_desc(addr);
-            cpu_physical_memory_set_dirty(ram_addr);
-            n++;
+        if (bitmap[i] != 0) {
+            c = leul_to_cpu(bitmap[i]);
+            do {
+                j = ffsl(c) - 1;
+                c &= ~(1ul << j);
+                page_number = i * HOST_LONG_BITS + j;
+                addr1 = page_number * TARGET_PAGE_SIZE;
+                addr = offset + addr1;
+                ram_addr = cpu_get_physical_page_desc(addr);
+                cpu_physical_memory_set_dirty(ram_addr);
+            } while (c != 0);
         }
     }
     return 0;
-- 
1.6.3.3


* Re: [Qemu-devel] Re: [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling
  2010-02-15  6:12         ` OHMURA Kei
@ 2010-02-15  8:24           ` Alexander Graf
  2010-02-16 11:16             ` OHMURA Kei
  0 siblings, 1 reply; 26+ messages in thread
From: Alexander Graf @ 2010-02-15  8:24 UTC (permalink / raw)
  To: OHMURA Kei
  Cc: kvm@vger.kernel.org, mtosatti, Yoshiaki Tamura,
	qemu-devel@nongnu.org, Avi Kivity, drepper


On 15.02.2010, at 07:12, OHMURA Kei wrote:

> dirty-bitmap-traveling is carried out by byte size in qemu-kvm.c.
> But we think that dirty-bitmap-traveling by long size is faster than by byte

"We think"? I mean - yes, I think so too. But have you actually measured it? How much improvement are we talking here?
Is it still faster when a bswap is involved?

Alex


* Re: [Qemu-devel] Re: [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling
  2010-02-15  8:24           ` Alexander Graf
@ 2010-02-16 11:16             ` OHMURA Kei
  2010-02-16 11:18               ` Alexander Graf
  0 siblings, 1 reply; 26+ messages in thread
From: OHMURA Kei @ 2010-02-16 11:16 UTC (permalink / raw)
  To: Alexander Graf
  Cc: ohmura.kei, kvm@vger.kernel.org, mtosatti, Yoshiaki Tamura,
	qemu-devel@nongnu.org, Avi Kivity, drepper

> "We think"? I mean - yes, I think so too. But have you actually measured it?
> How much improvement are we talking here?
> Is it still faster when a bswap is involved?

Thanks for pointing that out.
I will post the data for x86 later.
However, I don't have a test environment to check the impact of bswap.
Could you please measure the run time of the following section, if possible?

start ->
qemu-kvm.c:

static int kvm_get_dirty_bitmap_cb(unsigned long start, unsigned long len,
                                   void *bitmap, void *opaque)
{
    /* warm up each function */
    kvm_get_dirty_pages_log_range(start, bitmap, start, len);
    kvm_get_dirty_pages_log_range_new(start, bitmap, start, len);

    /* measurement */
    int64_t t1, t2;
    t1 = cpu_get_real_ticks();
    kvm_get_dirty_pages_log_range(start, bitmap, start, len);
    t1 = cpu_get_real_ticks() - t1;
    t2 = cpu_get_real_ticks();
    kvm_get_dirty_pages_log_range_new(start, bitmap, start, len);
    t2 = cpu_get_real_ticks() - t2;

    /* t1 and t2 are int64_t, so print them with PRId64 (<inttypes.h>) */
    printf("## %" PRId64 ", %" PRId64 "\n", t1, t2); fflush(stdout);

    return kvm_get_dirty_pages_log_range_new(start, bitmap, start, len);
}
end ->


* Re: [Qemu-devel] Re: [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling
  2010-02-16 11:16             ` OHMURA Kei
@ 2010-02-16 11:18               ` Alexander Graf
  2010-02-17  9:42                 ` OHMURA Kei
  0 siblings, 1 reply; 26+ messages in thread
From: Alexander Graf @ 2010-02-16 11:18 UTC (permalink / raw)
  To: OHMURA Kei
  Cc: kvm@vger.kernel.org, mtosatti, Yoshiaki Tamura,
	qemu-devel@nongnu.org, Avi Kivity, drepper


On 16.02.2010, at 12:16, OHMURA Kei wrote:

>> "We think"? I mean - yes, I think so too. But have you actually measured it?
>> How much improvement are we talking here?
>> Is it still faster when a bswap is involved?
> 
> Thanks for pointing that out.
> I will post the data for x86 later.
> However, I don't have a test environment to check the impact of bswap.
> Could you please measure the run time of the following section, if possible?

It'd make more sense to have a real stand alone test program, no?
I can try to write one today, but I have some really nasty important bugs to fix first.


Alex

* Re: [Qemu-devel] Re: [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling
  2010-02-16 11:18               ` Alexander Graf
@ 2010-02-17  9:42                 ` OHMURA Kei
  2010-02-17  9:46                   ` Alexander Graf
  2010-02-17  9:47                   ` Avi Kivity
  0 siblings, 2 replies; 26+ messages in thread
From: OHMURA Kei @ 2010-02-17  9:42 UTC (permalink / raw)
  To: Alexander Graf
  Cc: ohmura.kei, kvm@vger.kernel.org, mtosatti, Yoshiaki Tamura,
	qemu-devel@nongnu.org, Avi Kivity, drepper

>>> "We think"? I mean - yes, I think so too. But have you actually measured it?
>>> How much improvement are we talking here?
>>> Is it still faster when a bswap is involved?
>> Thanks for pointing out.
>> I will post the data for x86 later.
>> However, I don't have a test environment to check the impact of bswap.
>> Would you please measure the run time between the following section if possible?
> 
> It'd make more sense to have a real stand alone test program, no?
> I can try to write one today, but I have some really nasty important bugs to fix first.


OK.  I will prepare test code with sample data.
Since I found a ppc machine nearby, I will run the code and post the results
for both x86 and ppc.


By the way, the following data are x86 results measured in QEMU/KVM.

The tables show how many times the function was called (#called), the runtime
of the original function (orig.), the runtime with this patch (patch), and the
speedup ratio (ratio).

Test1: Guest OS reads a 3GB file, which is bigger than memory.
#called     orig.(msec)     patch(msec)     ratio
108         1.1             0.1             7.6
102         1.0             0.1             6.8
132         1.6             0.2             7.1
 
Test2: Guest OS reads/writes a 3GB file, which is bigger than memory.
#called     orig.(msec)     patch(msec)     ratio
2394        33              7.7             4.3
2100        29              7.1             4.1
2832        40              9.9             4.0
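
For readers who want to see the idea behind these numbers in isolation, here
is a minimal sketch. It is not the code from the patch: the function names and
the output-array interface are assumptions made for illustration. It compares
a byte-wise walk with a walk that first tests one unsigned long at a time and
skips words that are entirely clean, which is where the gain comes from when
most of memory is not dirty.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Record every set bit in bitmap[first..last) as a page index
 * (byte index * 8 + bit index, assuming 8-bit bytes).
 * Stores at most max_out indices in out; 'found' is the running count
 * and the updated count is returned. */
static size_t scan_byte_range(const unsigned char *bitmap, size_t first,
                              size_t last, size_t *out, size_t max_out,
                              size_t found)
{
    for (size_t i = first; i < last; i++) {
        unsigned int c = bitmap[i];
        for (unsigned int j = 0; c != 0; j++, c >>= 1) {
            if (c & 1u) {
                if (found < max_out) {
                    out[found] = i * 8 + j;
                }
                found++;
            }
        }
    }
    return found;
}

/* Byte-wise walk: one load and one test for every 8 pages. */
size_t scan_bytewise(const unsigned char *bitmap, size_t nbytes,
                     size_t *out, size_t max_out)
{
    assert(bitmap != NULL || nbytes == 0);
    return scan_byte_range(bitmap, 0, nbytes, out, max_out, 0);
}

/* Word-wise walk: one load and one test per unsigned long. Only words
 * that contain a set bit are examined byte by byte, so the result does
 * not depend on host byte order. */
size_t scan_wordwise(const unsigned char *bitmap, size_t nbytes,
                     size_t *out, size_t max_out)
{
    size_t found = 0;
    size_t i = 0;

    assert(bitmap != NULL || nbytes == 0);
    for (; i + sizeof(unsigned long) <= nbytes; i += sizeof(unsigned long)) {
        unsigned long w;
        memcpy(&w, bitmap + i, sizeof(w));   /* alignment-safe load */
        if (w == 0) {
            continue;                        /* whole word is clean */
        }
        found = scan_byte_range(bitmap, i, i + sizeof(w), out, max_out, found);
    }
    /* Tail bytes that do not fill a whole word. */
    return scan_byte_range(bitmap, i, nbytes, out, max_out, found);
}
```

Both functions return the same page indices in the same order; only the number
of loads and tests spent on clean regions differs.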

* Re: [Qemu-devel] Re: [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling
  2010-02-17  9:42                 ` OHMURA Kei
@ 2010-02-17  9:46                   ` Alexander Graf
  2010-02-18  5:57                     ` OHMURA Kei
  2010-02-17  9:47                   ` Avi Kivity
  1 sibling, 1 reply; 26+ messages in thread
From: Alexander Graf @ 2010-02-17  9:46 UTC (permalink / raw)
  To: OHMURA Kei
  Cc: kvm@vger.kernel.org, mtosatti, Yoshiaki Tamura,
	qemu-devel@nongnu.org, Avi Kivity, drepper


On 17.02.2010, at 10:42, OHMURA Kei wrote:

>>>> "We think"? I mean - yes, I think so too. But have you actually measured it?
>>>> How much improvement are we talking here?
>>>> Is it still faster when a bswap is involved?
>>> Thanks for pointing out.
>>> I will post the data for x86 later.
>>> However, I don't have a test environment to check the impact of bswap.
>>> Would you please measure the run time between the following section if possible?
>> It'd make more sense to have a real stand alone test program, no?
>> I can try to write one today, but I have some really nasty important bugs to fix first.
> 
> 
> OK.  I will prepare a test code with sample data.  Since I found a ppc machine around, I will run the code and post the results of
> x86 and ppc.
> 
> 
> By the way, the following data is a result of x86 measured in QEMU/KVM.  
> This data shows, how many times the function is called (#called), runtime of original function(orig.), runtime of this patch(patch), speedup ratio (ratio).

That does indeed look promising!

Thanks for doing this micro-benchmark. I just want to be 100% sure that it doesn't affect performance for big endian badly.


Alex

* Re: [Qemu-devel] Re: [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling
  2010-02-17  9:42                 ` OHMURA Kei
  2010-02-17  9:46                   ` Alexander Graf
@ 2010-02-17  9:47                   ` Avi Kivity
  2010-02-17  9:49                     ` Alexander Graf
  1 sibling, 1 reply; 26+ messages in thread
From: Avi Kivity @ 2010-02-17  9:47 UTC (permalink / raw)
  To: OHMURA Kei
  Cc: kvm@vger.kernel.org, qemu-devel@nongnu.org, mtosatti,
	Yoshiaki Tamura, Alexander Graf, drepper

On 02/17/2010 11:42 AM, OHMURA Kei wrote:
>>>> "We think"? I mean - yes, I think so too. But have you actually 
>>>> measured it?
>>>> How much improvement are we talking here?
>>>> Is it still faster when a bswap is involved?
>>> Thanks for pointing out.
>>> I will post the data for x86 later.
>>> However, I don't have a test environment to check the impact of bswap.
>>> Would you please measure the run time between the following section 
>>> if possible?
>>
>> It'd make more sense to have a real stand alone test program, no?
>> I can try to write one today, but I have some really nasty important 
>> bugs to fix first.
>
>
> OK.  I will prepare a test code with sample data.  Since I found a ppc 
> machine around, I will run the code and post the results of
> x86 and ppc.
>

I've applied the patch - I think the x86 results justify it, and I'll be 
very surprised if ppc doesn't show a similar gain.  Skipping 7 memory 
accesses and 7 tests must be a win.


-- 
error compiling committee.c: too many arguments to function

* Re: [Qemu-devel] Re: [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling
  2010-02-17  9:47                   ` Avi Kivity
@ 2010-02-17  9:49                     ` Alexander Graf
  0 siblings, 0 replies; 26+ messages in thread
From: Alexander Graf @ 2010-02-17  9:49 UTC (permalink / raw)
  To: Avi Kivity
  Cc: OHMURA Kei, kvm@vger.kernel.org, mtosatti, Yoshiaki Tamura,
	qemu-devel@nongnu.org, drepper


On 17.02.2010, at 10:47, Avi Kivity wrote:

> On 02/17/2010 11:42 AM, OHMURA Kei wrote:
>>>>> "We think"? I mean - yes, I think so too. But have you actually measured it?
>>>>> How much improvement are we talking here?
>>>>> Is it still faster when a bswap is involved?
>>>> Thanks for pointing out.
>>>> I will post the data for x86 later.
>>>> However, I don't have a test environment to check the impact of bswap.
>>>> Would you please measure the run time between the following section if possible?
>>> 
>>> It'd make more sense to have a real stand alone test program, no?
>>> I can try to write one today, but I have some really nasty important bugs to fix first.
>> 
>> 
>> OK.  I will prepare a test code with sample data.  Since I found a ppc machine around, I will run the code and post the results of
>> x86 and ppc.
>> 
> 
> I've applied the patch - I think the x86 results justify it, and I'll be very surprised if ppc doesn't show a similar gain.  Skipping 7 memory accesses and 7 tests must be a win.

Sounds good to me. I don't assume bswap to be horribly slow either. Just want to be sure.


Alex

* Re: [Qemu-devel] Re: [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling
  2010-02-17  9:46                   ` Alexander Graf
@ 2010-02-18  5:57                     ` OHMURA Kei
  2010-02-18 10:30                       ` Alexander Graf
  0 siblings, 1 reply; 26+ messages in thread
From: OHMURA Kei @ 2010-02-18  5:57 UTC (permalink / raw)
  To: Alexander Graf
  Cc: ohmura.kei, kvm@vger.kernel.org, mtosatti, Yoshiaki Tamura,
	qemu-devel@nongnu.org, Avi Kivity, drepper

>>>>> "We think"? I mean - yes, I think so too. But have you actually measured it?
>>>>> How much improvement are we talking here?
>>>>> Is it still faster when a bswap is involved?
>>>> Thanks for pointing out.
>>>> I will post the data for x86 later.
>>>> However, I don't have a test environment to check the impact of bswap.
>>>> Would you please measure the run time between the following section if possible?
>>> It'd make more sense to have a real stand alone test program, no?
>>> I can try to write one today, but I have some really nasty important bugs to fix first.
>>
>> OK.  I will prepare a test code with sample data.  Since I found a ppc machine around, I will run the code and post the results of
>> x86 and ppc.
>>
>>
>> By the way, the following data is a result of x86 measured in QEMU/KVM.  
>> This data shows, how many times the function is called (#called), runtime of original function(orig.), runtime of this patch(patch), speedup ratio (ratio).
> 
> That does indeed look promising!
> 
> Thanks for doing this micro-benchmark. I just want to be 100% sure that it doesn't affect performance for big endian badly.


I measured the runtime of the test code with sample data.  My test
environments and results are described below.

x86 Test Environment:
CPU: 4x Intel Xeon Quad Core 2.66GHz
Mem size: 6GB

ppc Test Environment:
CPU: 2x Dual Core PPC970MP
Mem size: 2GB

The sample dirty-bitmap data was produced by QEMU/KVM while the guest OS
was being live migrated.  To measure the runtime, I copied QEMU's
cpu_get_real_ticks() into my test program.
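
A stand-alone harness of this kind can be sketched roughly as follows. This is
not the test program used for the results in this mail: it assumes a POSIX
system and uses clock_gettime(CLOCK_MONOTONIC) in place of
cpu_get_real_ticks(), and the names bitmap_job, count_dirty_job and
best_time_ns are hypothetical.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical workload description: a bitmap and a slot for the result. */
struct bitmap_job {
    const unsigned char *bitmap;
    size_t nbytes;
    size_t dirty;          /* filled in by count_dirty_job() */
};

/* Example workload: count the set bits in the bitmap. */
void count_dirty_job(void *arg)
{
    struct bitmap_job *job = arg;
    size_t n = 0;

    for (size_t i = 0; i < job->nbytes; i++) {
        unsigned int c = job->bitmap[i];
        while (c != 0) {
            c &= c - 1;    /* clear the lowest set bit */
            n++;
        }
    }
    job->dirty = n;
}

/* Monotonic time in nanoseconds. */
int64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Run fn once to warm up, then reps more times, and return the shortest
 * elapsed time in nanoseconds. */
int64_t best_time_ns(void (*fn)(void *), void *arg, int reps)
{
    int64_t best = INT64_MAX;

    assert(fn != NULL);
    if (reps < 1) {
        reps = 1;
    }
    fn(arg);                               /* warm up caches */
    for (int i = 0; i < reps; i++) {
        int64_t t0 = now_ns();
        fn(arg);
        int64_t dt = now_ns() - t0;
        if (dt < best) {
            best = dt;
        }
    }
    return best;
}
```

Taking the best of several runs reduces noise from scheduling; the warm-up call
mirrors the "warm up each function" step in the in-tree measurement earlier in
this thread.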


Experimental results:
Test1: Guest OS reads a 3GB file, which is bigger than memory.
       orig.(msec)    patch(msec)    ratio
x86    0.3            0.1            6.4 
ppc    7.9            2.7            3.0 

Test2: Guest OS reads/writes a 3GB file, which is bigger than memory.
       orig.(msec)    patch(msec)    ratio
x86    12.0           3.2            3.7 
ppc    251.1          123            2.0 


I also measured the runtime of bswap itself on ppc, and found that it was
only 0.3% to 0.7% of the runtime described above.
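
To illustrate where the bswap enters, here is a minimal sketch, assuming the
bitmap is laid out as little-endian words (which is why a big-endian host such
as ppc has to swap each word before looking at bit positions). It is not the
patch itself: bswap_long, host_is_big_endian, load_le_long and
first_dirty_page are hypothetical helpers, and the swap is written portably
rather than with QEMU's bswap.h macros.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Reverse the byte order of an unsigned long (no compiler builtins). */
unsigned long bswap_long(unsigned long w)
{
    unsigned long r = 0;

    for (size_t i = 0; i < sizeof(w); i++) {
        r = (r << CHAR_BIT) | (w & UCHAR_MAX);
        w >>= CHAR_BIT;
    }
    return r;
}

/* Runtime check: does the host store the most significant byte first? */
int host_is_big_endian(void)
{
    const unsigned int one = 1;
    unsigned char first;

    memcpy(&first, &one, 1);
    return first == 0;
}

/* Load one little-endian word from the bitmap in host order. */
unsigned long load_le_long(const unsigned char *p)
{
    unsigned long w;

    memcpy(&w, p, sizeof(w));
    return host_is_big_endian() ? bswap_long(w) : w;
}

/* Return the index of the first dirty page, or (size_t)-1 if none.
 * One swap is paid per word, and clean words are skipped right after it. */
size_t first_dirty_page(const unsigned char *bitmap, size_t nlongs)
{
    assert(bitmap != NULL);
    for (size_t i = 0; i < nlongs; i++) {
        unsigned long w = load_le_long(bitmap + i * sizeof(unsigned long));
        size_t bit = 0;

        if (w == 0) {
            continue;
        }
        while (!(w & 1UL)) {
            w >>= 1;
            bit++;
        }
        return i * sizeof(unsigned long) * CHAR_BIT + bit;
    }
    return (size_t)-1;
}
```

Since the swap happens once per word while the skipped work is per byte, a
small bswap share of the total runtime is what one would expect.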

* Re: [Qemu-devel] Re: [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling
  2010-02-18  5:57                     ` OHMURA Kei
@ 2010-02-18 10:30                       ` Alexander Graf
  0 siblings, 0 replies; 26+ messages in thread
From: Alexander Graf @ 2010-02-18 10:30 UTC (permalink / raw)
  To: OHMURA Kei
  Cc: kvm@vger.kernel.org, mtosatti, Yoshiaki Tamura,
	qemu-devel@nongnu.org, Avi Kivity, drepper


On 18.02.2010, at 06:57, OHMURA Kei wrote:

>>>>>> "We think"? I mean - yes, I think so too. But have you actually measured it?
>>>>>> How much improvement are we talking here?
>>>>>> Is it still faster when a bswap is involved?
>>>>> Thanks for pointing out.
>>>>> I will post the data for x86 later.
>>>>> However, I don't have a test environment to check the impact of bswap.
>>>>> Would you please measure the run time between the following section if possible?
>>>> It'd make more sense to have a real stand alone test program, no?
>>>> I can try to write one today, but I have some really nasty important bugs to fix first.
>>> 
>>> OK.  I will prepare a test code with sample data.  Since I found a ppc machine around, I will run the code and post the results of
>>> x86 and ppc.
>>> 
>>> 
>>> By the way, the following data is a result of x86 measured in QEMU/KVM.  This data shows, how many times the function is called (#called), runtime of original function(orig.), runtime of this patch(patch), speedup ratio (ratio).
>> That does indeed look promising!
>> Thanks for doing this micro-benchmark. I just want to be 100% sure that it doesn't affect performance for big endian badly.
> 
> 
> I measured runtime of the test code with sample data.  My test environment and results are described below.
> 
> x86 Test Environment:
> CPU: 4x Intel Xeon Quad Core 2.66GHz
> Mem size: 6GB
> 
> ppc Test Environment:
> CPU: 2x Dual Core PPC970MP
> Mem size: 2GB
> 
> The sample data of dirty bitmap was produced by QEMU/KVM while the guest OS
> was live migrating.  To measure the runtime I copied cpu_get_real_ticks() of
> QEMU to my test program.
> 
> 
> Experimental results:
> Test1: Guest OS read 3GB file, which is bigger than memory.
>        orig.(msec)    patch(msec)    ratio
> x86    0.3            0.1            6.4
> ppc    7.9            2.7            3.0
>
> Test2: Guest OS read/write 3GB file, which is bigger than memory.
>        orig.(msec)    patch(msec)    ratio
> x86    12.0           3.2            3.7
> ppc    251.1          123            2.0
> 
> I also measured the runtime of bswap itself on ppc, and I found it was only just 0.3% ~ 0.7 % of the runtime described above. 

Awesome! Thank you so much for giving actual data to make me feel comfortable with it :-).


Alex

Thread overview: 26+ messages
2010-02-10 10:52 [Qemu-devel] [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling OHMURA Kei
2010-02-10 13:10 ` [Qemu-devel] " Ulrich Drepper
2010-02-10 13:20 ` Avi Kivity
2010-02-10 15:54   ` Anthony Liguori
2010-02-10 15:57     ` Avi Kivity
2010-02-10 16:00     ` Alexander Graf
2010-02-10 16:35       ` Anthony Liguori
2010-02-10 16:43         ` Alexander Graf
2010-02-10 16:46           ` Avi Kivity
2010-02-10 16:47             ` Alexander Graf
2010-02-10 16:52               ` Avi Kivity
2010-02-10 16:54                 ` Alexander Graf
2010-02-10 16:43         ` Avi Kivity
2010-02-10 15:55   ` Anthony Liguori
2010-02-12  2:03     ` OHMURA Kei
2010-02-14 12:34       ` Avi Kivity
2010-02-15  6:12         ` OHMURA Kei
2010-02-15  8:24           ` Alexander Graf
2010-02-16 11:16             ` OHMURA Kei
2010-02-16 11:18               ` Alexander Graf
2010-02-17  9:42                 ` OHMURA Kei
2010-02-17  9:46                   ` Alexander Graf
2010-02-18  5:57                     ` OHMURA Kei
2010-02-18 10:30                       ` Alexander Graf
2010-02-17  9:47                   ` Avi Kivity
2010-02-17  9:49                     ` Alexander Graf
