qemu-devel.nongnu.org archive mirror
* [Qemu-devel] [PATCH] qemu-kvm: Speed up of the dirty-bitmap-traveling
@ 2010-02-05 10:18 OHMURA Kei
  2010-02-05 12:04 ` [Qemu-devel] " Jan Kiszka
  2010-02-08 12:40 ` Avi Kivity
  0 siblings, 2 replies; 11+ messages in thread
From: OHMURA Kei @ 2010-02-05 10:18 UTC (permalink / raw)
  To: kvm, qemu-devel; +Cc: ohmura.kei, avi

dirty-bitmap-traveling is carried out by byte size in qemu-kvm.c.
We think that dirty-bitmap-traveling by long size is faster than by byte
size, especially when most of memory is not dirty.

Signed-off-by: OHMURA Kei <ohmura.kei@lab.ntt.co.jp>
---
 qemu-kvm.c |   49 ++++++++++++++++++++++++++++++++++++++++++-------
 1 files changed, 42 insertions(+), 7 deletions(-)

diff --git a/qemu-kvm.c b/qemu-kvm.c
index a305907..5459cdd 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -2433,22 +2433,21 @@ int kvm_physical_memory_set_dirty_tracking(int enable)
 }
 
 /* get kvm's dirty pages bitmap and update qemu's */
-static int kvm_get_dirty_pages_log_range(unsigned long start_addr,
-                                         unsigned char *bitmap,
-                                         unsigned long offset,
-                                         unsigned long mem_size)
+static void kvm_get_dirty_pages_log_range_by_byte(unsigned int start,
+                                                  unsigned int end,
+                                                  unsigned char *bitmap,
+                                                  unsigned long offset)
 {
     unsigned int i, j, n = 0;
     unsigned char c;
     unsigned long page_number, addr, addr1;
     ram_addr_t ram_addr;
-    unsigned int len = ((mem_size / TARGET_PAGE_SIZE) + 7) / 8;
 
     /* 
      * bitmap-traveling is faster than memory-traveling (for addr...) 
      * especially when most of the memory is not dirty.
      */
-    for (i = 0; i < len; i++) {
+    for (i = start; i < end; i++) {
         c = bitmap[i];
         while (c > 0) {
             j = ffsl(c) - 1;
@@ -2461,13 +2460,49 @@ static int kvm_get_dirty_pages_log_range(unsigned long start_addr,
             n++;
         }
     }
+}
+
+static int kvm_get_dirty_pages_log_range_by_long(unsigned long start_addr,
+                                                 unsigned char *bitmap,
+                                                 unsigned long offset,
+                                                 unsigned long mem_size)
+{
+    unsigned int i;
+    unsigned int len;
+    unsigned long *bitmap_ul = (unsigned long *)bitmap;
+
+    /* bitmap-traveling by long size is faster than by byte size
+     * especially when most of memory is not dirty.
+     * bitmap should be long-size aligned for traveling by long.
+     */
+    if (((unsigned long)bitmap & (TARGET_LONG_SIZE - 1)) == 0) {
+        len = ((mem_size / TARGET_PAGE_SIZE) + TARGET_LONG_BITS - 1) /
+            TARGET_LONG_BITS;
+        for (i = 0; i < len; i++)
+            if (bitmap_ul[i] != 0)
+                kvm_get_dirty_pages_log_range_by_byte(i * TARGET_LONG_SIZE, 
+                    (i + 1) * TARGET_LONG_SIZE, bitmap, offset);
+        /* 
+         * We will check the remaining dirty-bitmap, 
+         * when the mem_size is not a multiple of TARGET_LONG_SIZE. 
+         */ 
+        if ((mem_size & (TARGET_LONG_SIZE - 1)) != 0) {
+            len = ((mem_size / TARGET_PAGE_SIZE) + 7) / 8;
+            kvm_get_dirty_pages_log_range_by_byte(i * TARGET_LONG_SIZE, 
+                len, bitmap, offset);
+        }
+    } else { /* slow path: traveling by byte. */
+        len = ((mem_size / TARGET_PAGE_SIZE) + 7) / 8;
+        kvm_get_dirty_pages_log_range_by_byte(0, len, bitmap, offset);
+    }
+
     return 0;
 }
 
 static int kvm_get_dirty_bitmap_cb(unsigned long start, unsigned long len,
                                    void *bitmap, void *opaque)
 {
-    return kvm_get_dirty_pages_log_range(start, bitmap, start, len);
+    return kvm_get_dirty_pages_log_range_by_long(start, bitmap, start, len);
 }
 
 /* 
--
1.6.3.3

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [Qemu-devel] Re: [PATCH] qemu-kvm: Speed up of the dirty-bitmap-traveling
  2010-02-05 10:18 [Qemu-devel] [PATCH] qemu-kvm: Speed up of the dirty-bitmap-traveling OHMURA Kei
@ 2010-02-05 12:04 ` Jan Kiszka
  2010-02-08  6:14   ` OHMURA Kei
  2010-02-08 12:40 ` Avi Kivity
  1 sibling, 1 reply; 11+ messages in thread
From: Jan Kiszka @ 2010-02-05 12:04 UTC (permalink / raw)
  To: OHMURA Kei; +Cc: qemu-devel, kvm, avi

OHMURA Kei wrote:
> dirty-bitmap-traveling is carried out by byte size in qemu-kvm.c.
> We think that dirty-bitmap-traveling by long size is faster than by byte
> size, especially when most of memory is not dirty.

Sounds logical - do you have numbers on the improvement?

Would be great if you could provide a version for upstream as well,
because it will likely replace this qemu-kvm code one day.

Jan

> 
> Signed-off-by: OHMURA Kei <ohmura.kei@lab.ntt.co.jp>
> ---
>  qemu-kvm.c |   49 ++++++++++++++++++++++++++++++++++++++++++-------
>  1 files changed, 42 insertions(+), 7 deletions(-)
> 
> diff --git a/qemu-kvm.c b/qemu-kvm.c
> index a305907..5459cdd 100644
> --- a/qemu-kvm.c
> +++ b/qemu-kvm.c
> @@ -2433,22 +2433,21 @@ int kvm_physical_memory_set_dirty_tracking(int enable)
>  }
>  
>  /* get kvm's dirty pages bitmap and update qemu's */
> -static int kvm_get_dirty_pages_log_range(unsigned long start_addr,
> -                                         unsigned char *bitmap,
> -                                         unsigned long offset,
> -                                         unsigned long mem_size)
> +static void kvm_get_dirty_pages_log_range_by_byte(unsigned int start,
> +                                                  unsigned int end,
> +                                                  unsigned char *bitmap,
> +                                                  unsigned long offset)
>  {
>      unsigned int i, j, n = 0;
>      unsigned char c;
>      unsigned long page_number, addr, addr1;
>      ram_addr_t ram_addr;
> -    unsigned int len = ((mem_size / TARGET_PAGE_SIZE) + 7) / 8;
>  
>      /* 
>       * bitmap-traveling is faster than memory-traveling (for addr...) 
>       * especially when most of the memory is not dirty.
>       */
> -    for (i = 0; i < len; i++) {
> +    for (i = start; i < end; i++) {
>          c = bitmap[i];
>          while (c > 0) {
>              j = ffsl(c) - 1;
> @@ -2461,13 +2460,49 @@ static int kvm_get_dirty_pages_log_range(unsigned long start_addr,
>              n++;
>          }
>      }
> +}
> +
> +static int kvm_get_dirty_pages_log_range_by_long(unsigned long start_addr,
> +                                                 unsigned char *bitmap,
> +                                                 unsigned long offset,
> +                                                 unsigned long mem_size)
> +{
> +    unsigned int i;
> +    unsigned int len;
> +    unsigned long *bitmap_ul = (unsigned long *)bitmap;
> +
> +    /* bitmap-traveling by long size is faster than by byte size
> +     * especially when most of memory is not dirty.
> +     * bitmap should be long-size aligned for traveling by long.
> +     */
> +    if (((unsigned long)bitmap & (TARGET_LONG_SIZE - 1)) == 0) {
> +        len = ((mem_size / TARGET_PAGE_SIZE) + TARGET_LONG_BITS - 1) /
> +            TARGET_LONG_BITS;
> +        for (i = 0; i < len; i++)
> +            if (bitmap_ul[i] != 0)
> +                kvm_get_dirty_pages_log_range_by_byte(i * TARGET_LONG_SIZE, 
> +                    (i + 1) * TARGET_LONG_SIZE, bitmap, offset);
> +        /* 
> +         * We will check the remaining dirty-bitmap, 
> +         * when the mem_size is not a multiple of TARGET_LONG_SIZE. 
> +         */ 
> +        if ((mem_size & (TARGET_LONG_SIZE - 1)) != 0) {
> +            len = ((mem_size / TARGET_PAGE_SIZE) + 7) / 8;
> +            kvm_get_dirty_pages_log_range_by_byte(i * TARGET_LONG_SIZE, 
> +                len, bitmap, offset);
> +        }
> +    } else { /* slow path: traveling by byte. */
> +        len = ((mem_size / TARGET_PAGE_SIZE) + 7) / 8;
> +        kvm_get_dirty_pages_log_range_by_byte(0, len, bitmap, offset);
> +    }
> +
>      return 0;
>  }
>  
>  static int kvm_get_dirty_bitmap_cb(unsigned long start, unsigned long len,
>                                     void *bitmap, void *opaque)
>  {
> -    return kvm_get_dirty_pages_log_range(start, bitmap, start, len);
> +    return kvm_get_dirty_pages_log_range_by_long(start, bitmap, start, len);
>  }
>  
>  /* 
> --
> 1.6.3.3

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


* [Qemu-devel] Re: [PATCH] qemu-kvm: Speed up of the dirty-bitmap-traveling
  2010-02-05 12:04 ` [Qemu-devel] " Jan Kiszka
@ 2010-02-08  6:14   ` OHMURA Kei
  2010-02-08 11:23     ` OHMURA Kei
  0 siblings, 1 reply; 11+ messages in thread
From: OHMURA Kei @ 2010-02-08  6:14 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: ohmura.kei, qemu-devel, kvm, avi

> Sounds logical - do you have numbers on the improvement?

Sure.  The patch showed an approximately 3-7x speedup when measured with
rdtsc.  The test environment and detailed results are described below.

---
tmp = rdtsc();
/* function of original code*/
t1 += rdtsc() - tmp;
    
tmp = rdtsc();
/* function of this patch */
t2 += rdtsc() - tmp;
---
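
For reference, the measurement pattern above can be reproduced with a small cycle-counter helper (a sketch; the x86 `rdtsc` path matches the method described, while the `clock_gettime` fallback is an assumption added here so the snippet stays portable to non-x86 hosts):

```c
#include <stdint.h>
#include <time.h>

/* Read a monotonically increasing cycle/time counter. */
static uint64_t read_counter(void)
{
#if defined(__x86_64__) || defined(__i386__)
    uint32_t lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
#else
    /* assumed fallback for non-x86 hosts */
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
#endif
}

/* Accumulating the cost of one call, as in the snippet above:
 *     tmp = read_counter();
 *     ...function under test...
 *     t1 += read_counter() - tmp;
 */
```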

Test Environment:
CPU: 4x Intel Xeon Quad Core 2.66GHz
Mem size: 6GB
kvm version: 2.6.31-17-server
qemu version: commit ed880109f74f0a4dd5b7ec09e6a2d9ba4903d9a5

Host OS: Ubuntu 9.10 (kernel 2.6.31)
Guest OS: Debian/GNU Linux lenny (kernel 2.6.26)
Guest Mem size: 512MB

We executed live migration three times.  The data below shows how many times
the function is called (#called), the runtime of the original code (orig.),
the runtime with this patch (patch), and the speedup ratio (ratio) during
live migration.

Experimental results:
Test1: Guest OS read 3GB file, which is bigger than memory.
#called     orig.(msec)     patch(msec)     ratio
114         1.00            0.15            6.76
132         1.57            0.25            6.26
96          1.00            0.16            6.27
 
Test2: Guest OS read/write 3GB file, which is bigger than memory.
#called     orig.(msec)     patch(msec)     ratio
2196        38.1            10.6            3.59
2256        39.6            10.8            3.68
2112        36.3            10.3            3.53



> Would be great if you could provide a version for upstream as well
> because it will likely replace this qemu-kvm code on day.

O.K.  We'll prepare it.

We'll also post a patch set to quicken dirty pages checking in ram_save_block
and ram_save_live soon.


* [Qemu-devel] Re: [PATCH] qemu-kvm: Speed up of the dirty-bitmap-traveling
  2010-02-08  6:14   ` OHMURA Kei
@ 2010-02-08 11:23     ` OHMURA Kei
  2010-02-08 11:44       ` Jan Kiszka
  0 siblings, 1 reply; 11+ messages in thread
From: OHMURA Kei @ 2010-02-08 11:23 UTC (permalink / raw)
  To: Jan Kiszka, kvm, qemu-devel; +Cc: ohmura.kei, avi

>> Would be great if you could provide a version for upstream as well
>> because it will likely replace this qemu-kvm code one day.
> O.K.  We'll prepare it.


We have implemented the version for upstream.  Some of the source code is
borrowed from qemu-kvm.c.  It is not fully tested yet, though.

We also ran a performance test of this patch.  The test environment is the
same as in the email I sent before.


Experimental results:
Test1: Guest OS read 3GB file, which is bigger than memory.
#called     orig.(msec)     patch(msec)     ratio
14          3.79            0.18            20.8
12          3.20            0.15            21.4
11          2.89            0.14            21.0
 
Test2: Guest OS read/write 3GB file, which is bigger than memory.
#called     orig.(msec)     patch(msec)     ratio
364         180             8.70            20.7
326         161             7.71            20.9
474         235             11.7            20.1


---
 kvm-all.c |   80 +++++++++++++++++++++++++++++++++++++++++++++++++-----------
 1 files changed, 65 insertions(+), 15 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index 15ec38e..9666843 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -279,9 +279,69 @@ int kvm_set_migration_log(int enable)
     return 0;
 }
 
-static int test_le_bit(unsigned long nr, unsigned char *addr)
+static inline void kvm_get_dirty_pages_log_range_by_byte(unsigned int start,
+                                                         unsigned int end,
+                                                         unsigned char *bitmap,
+                                                         unsigned long offset)
 {
-    return (addr[nr >> 3] >> (nr & 7)) & 1;
+    unsigned int i, j, n = 0;
+    unsigned long page_number, addr, addr1;
+    ram_addr_t ram_addr;
+    unsigned char c;
+
+    /*   
+     * bitmap-traveling is faster than memory-traveling (for addr...)
+     * especially when most of the memory is not dirty.
+     */
+    for (i = start; i < end; i++) {
+        c = bitmap[i];
+        while (c > 0) {
+            j = ffsl(c) - 1;
+            c &= ~(1u << j);
+            page_number = i * 8 + j;
+            addr1 = page_number * TARGET_PAGE_SIZE;
+            addr = offset + addr1;
+            ram_addr = cpu_get_physical_page_desc(addr);
+            cpu_physical_memory_set_dirty(ram_addr);
+            n++;
+        }
+    }
+}
+
+static int kvm_get_dirty_pages_log_range_by_long(unsigned long start_addr,
+                                                 unsigned char *bitmap,
+                                                 unsigned long mem_size)
+{
+    unsigned int i;
+    unsigned int len;
+    unsigned long *bitmap_ul = (unsigned long *)bitmap;
+
+    /* bitmap-traveling by long size is faster than by byte size
+     * especially when most of memory is not dirty.
+     * bitmap should be long-size aligned for traveling by long.
+     */
+    if (((unsigned long)bitmap & (TARGET_LONG_SIZE - 1)) == 0) {
+        len = ((mem_size / TARGET_PAGE_SIZE) + TARGET_LONG_BITS - 1) / 
+            TARGET_LONG_BITS;
+        for (i = 0; i < len; i++)
+            if (bitmap_ul[i] != 0)
+                kvm_get_dirty_pages_log_range_by_byte(i * TARGET_LONG_SIZE,
+                    (i + 1) * TARGET_LONG_SIZE, bitmap, start_addr);
+        /*                                                      
+         * We will check the remaining dirty-bitmap, 
+         * when the mem_size is not a multiple of TARGET_LONG_SIZE. 
+         */
+        if ((mem_size & (TARGET_LONG_SIZE - 1)) != 0) {
+            len = ((mem_size / TARGET_PAGE_SIZE) + 7) / 8;
+            kvm_get_dirty_pages_log_range_by_byte(i * TARGET_LONG_SIZE,
+                len, bitmap, start_addr);
+        }
+    } else { /* slow path: traveling by byte. */
+        len = ((mem_size / TARGET_PAGE_SIZE) + 7) / 8;
+        kvm_get_dirty_pages_log_range_by_byte(0, len, bitmap, start_addr);
+    }
+
+    return 0;
 }
 
 /**
@@ -297,8 +357,6 @@ int kvm_physical_sync_dirty_bitmap(target_phys_addr_t start_addr,
 {
     KVMState *s = kvm_state;
     unsigned long size, allocated_size = 0;
-    target_phys_addr_t phys_addr;
-    ram_addr_t addr;
     KVMDirtyLog d;
     KVMSlot *mem;
     int ret = 0;
@@ -327,17 +385,9 @@ int kvm_physical_sync_dirty_bitmap(target_phys_addr_t start_addr,
             break;
         }
 
-        for (phys_addr = mem->start_addr, addr = mem->phys_offset;
-             phys_addr < mem->start_addr + mem->memory_size;
-             phys_addr += TARGET_PAGE_SIZE, addr += TARGET_PAGE_SIZE) {
-            unsigned char *bitmap = (unsigned char *)d.dirty_bitmap;
-            unsigned nr = (phys_addr - mem->start_addr) >> TARGET_PAGE_BITS;
-
-            if (test_le_bit(nr, bitmap)) {
-                cpu_physical_memory_set_dirty(addr);
-            }
-        }
-        start_addr = phys_addr;
+        kvm_get_dirty_pages_log_range_by_long(mem->start_addr, 
+            d.dirty_bitmap, mem->memory_size);
+        start_addr = mem->start_addr + mem->memory_size;
     }
     qemu_free(d.dirty_bitmap);
 
-- 
1.6.3.3 


* [Qemu-devel] Re: [PATCH] qemu-kvm: Speed up of the dirty-bitmap-traveling
  2010-02-08 11:23     ` OHMURA Kei
@ 2010-02-08 11:44       ` Jan Kiszka
  2010-02-09  9:55         ` OHMURA Kei
  0 siblings, 1 reply; 11+ messages in thread
From: Jan Kiszka @ 2010-02-08 11:44 UTC (permalink / raw)
  To: OHMURA Kei; +Cc: qemu-devel@nongnu.org, kvm@vger.kernel.org, avi@redhat.com

OHMURA Kei wrote:
>>> Would be great if you could provide a version for upstream as well
>>> because it will likely replace this qemu-kvm code one day.
>> O.K.  We'll prepare it.
> 
> 
> We have implemented the version for upstream.  Some of the source code is
> borrowed from qemu-kvm.c.  It is not fully tested yet, though.
> 
> We also ran a performance test of this patch.  The test environment is the
> same as in the email I sent before.
> 
> 
> Experimental results:
> Test1: Guest OS read 3GB file, which is bigger than memory.
> #called     orig.(msec)     patch(msec)     ratio
> 14          3.79            0.18            20.8
> 12          3.20            0.15            21.4
> 11          2.89            0.14            21.0
>  
> Test2: Guest OS read/write 3GB file, which is bigger than memory.
> #called     orig.(msec)     patch(msec)     ratio
> 364         180             8.70            20.7
> 326         161             7.71            20.9
> 474         235             11.7            20.1
> 

Wow, so we were really inefficient here. Nice work!

Once you are done with your tests, please post this against
qemu-kvm.git's uq/master so that Avi or Marcelo can push it upstream.
Minor remarks below.

> 
> ---
>  kvm-all.c |   80 +++++++++++++++++++++++++++++++++++++++++++++++++-----------
>  1 files changed, 65 insertions(+), 15 deletions(-)
> 
> diff --git a/kvm-all.c b/kvm-all.c
> index 15ec38e..9666843 100644
> --- a/kvm-all.c
> +++ b/kvm-all.c
> @@ -279,9 +279,69 @@ int kvm_set_migration_log(int enable)
>      return 0;
>  }
>  
> -static int test_le_bit(unsigned long nr, unsigned char *addr)
> +static inline void kvm_get_dirty_pages_log_range_by_byte(unsigned int start,

I don't think inline is appropriate here. Smart compilers are able to do
this on their own. And small code footprint actually contributes to
speed as well.

> +                                                         unsigned int end,
> +                                                         unsigned char *bitmap,
> +                                                         unsigned long offset)
>  {
> -    return (addr[nr >> 3] >> (nr & 7)) & 1;
> +    unsigned int i, j, n = 0;
> +    unsigned long page_number, addr, addr1;
> +    ram_addr_t ram_addr;
> +    unsigned char c;
> +
> +    /*   
> +     * bitmap-traveling is faster than memory-traveling (for addr...)
> +     * especially when most of the memory is not dirty.
> +     */
> +    for (i = start; i < end; i++) {
> +        c = bitmap[i];
> +        while (c > 0) {
> +            j = ffsl(c) - 1;
> +            c &= ~(1u << j);
> +            page_number = i * 8 + j;
> +            addr1 = page_number * TARGET_PAGE_SIZE;
> +            addr = offset + addr1;
> +            ram_addr = cpu_get_physical_page_desc(addr);
> +            cpu_physical_memory_set_dirty(ram_addr);
> +            n++;
> +        }
> +    }
> +}
> +
> +static int kvm_get_dirty_pages_log_range_by_long(unsigned long start_addr,
> +                                                 unsigned char *bitmap,
> +                                                 unsigned long mem_size)
> +{
> +    unsigned int i;
> +    unsigned int len;
> +    unsigned long *bitmap_ul = (unsigned long *)bitmap;
> +
> +    /* bitmap-traveling by long size is faster than by byte size
> +     * especially when most of memory is not dirty.
> +     * bitmap should be long-size aligned for traveling by long.
> +     */
> +    if (((unsigned long)bitmap & (TARGET_LONG_SIZE - 1)) == 0) {
> +        len = ((mem_size / TARGET_PAGE_SIZE) + TARGET_LONG_BITS - 1) / 
> +            TARGET_LONG_BITS;
> +        for (i = 0; i < len; i++)
> +            if (bitmap_ul[i] != 0)
> +                kvm_get_dirty_pages_log_range_by_byte(i * TARGET_LONG_SIZE,
> +                    (i + 1) * TARGET_LONG_SIZE, bitmap, start_addr);

Missing { }, 2x.
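
The braced form the coding style asks for would look roughly like this (a simplified, self-contained sketch; `scan_words`, `scan_bytes`, and `LONG_SIZE` are stand-ins for the patch's symbols):

```c
#include <stddef.h>

#define LONG_SIZE sizeof(unsigned long)  /* stand-in for TARGET_LONG_SIZE */

static int calls;  /* counts byte-range scans so the shape is testable */

static void scan_bytes(size_t start, size_t end,
                       const unsigned char *bitmap, unsigned long offset)
{
    (void)start; (void)end; (void)bitmap; (void)offset;
    calls++;
}

/* Same loop as in the patch, with braces added on both the for and
 * the if statement. */
static void scan_words(const unsigned long *bitmap_ul, size_t len,
                       unsigned long offset)
{
    size_t i;

    for (i = 0; i < len; i++) {
        if (bitmap_ul[i] != 0) {
            scan_bytes(i * LONG_SIZE, (i + 1) * LONG_SIZE,
                       (const unsigned char *)bitmap_ul, offset);
        }
    }
}
```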

> +        /*                                                      
> +         * We will check the remaining dirty-bitmap, 
> +         * when the mem_size is not a multiple of TARGET_LONG_SIZE. 
> +         */
> +        if ((mem_size & (TARGET_LONG_SIZE - 1)) != 0) {
> +            len = ((mem_size / TARGET_PAGE_SIZE) + 7) / 8;
> +            kvm_get_dirty_pages_log_range_by_byte(i * TARGET_LONG_SIZE,
> +                len, bitmap, start_addr);

This line should be indented to the '('.

> +        }
> +    } else { /* slow path: traveling by byte. */
> +        len = ((mem_size / TARGET_PAGE_SIZE) + 7) / 8;
> +        kvm_get_dirty_pages_log_range_by_byte(0, len, bitmap, start_addr);
> +    }
> +
> +    return 0;
>  }
>  
>  /**
> @@ -297,8 +357,6 @@ int kvm_physical_sync_dirty_bitmap(target_phys_addr_t start_addr,
>  {
>      KVMState *s = kvm_state;
>      unsigned long size, allocated_size = 0;
> -    target_phys_addr_t phys_addr;
> -    ram_addr_t addr;
>      KVMDirtyLog d;
>      KVMSlot *mem;
>      int ret = 0;
> @@ -327,17 +385,9 @@ int kvm_physical_sync_dirty_bitmap(target_phys_addr_t start_addr,
>              break;
>          }
>  
> -        for (phys_addr = mem->start_addr, addr = mem->phys_offset;
> -             phys_addr < mem->start_addr + mem->memory_size;
> -             phys_addr += TARGET_PAGE_SIZE, addr += TARGET_PAGE_SIZE) {
> -            unsigned char *bitmap = (unsigned char *)d.dirty_bitmap;
> -            unsigned nr = (phys_addr - mem->start_addr) >> TARGET_PAGE_BITS;
> -
> -            if (test_le_bit(nr, bitmap)) {
> -                cpu_physical_memory_set_dirty(addr);
> -            }
> -        }
> -        start_addr = phys_addr;
> +        kvm_get_dirty_pages_log_range_by_long(mem->start_addr, 
> +            d.dirty_bitmap, mem->memory_size);
> +        start_addr = mem->start_addr + mem->memory_size;
>      }
>      qemu_free(d.dirty_bitmap);
>  

Thanks,
Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


* [Qemu-devel] Re: [PATCH] qemu-kvm: Speed up of the dirty-bitmap-traveling
  2010-02-05 10:18 [Qemu-devel] [PATCH] qemu-kvm: Speed up of the dirty-bitmap-traveling OHMURA Kei
  2010-02-05 12:04 ` [Qemu-devel] " Jan Kiszka
@ 2010-02-08 12:40 ` Avi Kivity
  2010-02-09  9:54   ` OHMURA Kei
  1 sibling, 1 reply; 11+ messages in thread
From: Avi Kivity @ 2010-02-08 12:40 UTC (permalink / raw)
  To: OHMURA Kei; +Cc: qemu-devel, kvm

On 02/05/2010 12:18 PM, OHMURA Kei wrote:
> dirty-bitmap-traveling is carried out by byte size in qemu-kvm.c.
> We think that dirty-bitmap-traveling by long size is faster than by byte
> size, especially when most of memory is not dirty.
>
>
>
> +
> +static int kvm_get_dirty_pages_log_range_by_long(unsigned long start_addr,
> +                                                 unsigned char *bitmap,
> +                                                 unsigned long offset,
> +                                                 unsigned long mem_size)
> +{
> +    unsigned int i;
> +    unsigned int len;
> +    unsigned long *bitmap_ul = (unsigned long *)bitmap;
> +
> +    /* bitmap-traveling by long size is faster than by byte size
> +     * especially when most of memory is not dirty.
> +     * bitmap should be long-size aligned for traveling by long.
> +     */
> +    if (((unsigned long)bitmap & (TARGET_LONG_SIZE - 1)) == 0) {
>   

Since we allocate the bitmap, we can be sure that it is aligned on a
long boundary (qemu_malloc() should guarantee that). So you can
eliminate the fallback.
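
The suggestion amounts to replacing the runtime branch with an alignment check that can never fire for memory the allocator hands out long-aligned; a sketch of such a predicate (names are illustrative, not QEMU API):

```c
#include <stdint.h>

/* True if p is suitably aligned to be scanned as an array of longs.
 * With an allocator that guarantees long alignment, this could become
 * a one-time assert() instead of a per-call fallback branch. */
static int is_long_aligned(const void *p)
{
    return ((uintptr_t)p & (sizeof(unsigned long) - 1)) == 0;
}
```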

> +        len = ((mem_size / TARGET_PAGE_SIZE) + TARGET_LONG_BITS - 1) /
> +            TARGET_LONG_BITS;
> +        for (i = 0; i < len; i++)
> +            if (bitmap_ul[i] != 0)
> +                kvm_get_dirty_pages_log_range_by_byte(i * TARGET_LONG_SIZE, 
> +                    (i + 1) * TARGET_LONG_SIZE, bitmap, offset);
>   

Better to just use the original loop here (since we don't need the
function as a fallback).

> +        /* 
> +         * We will check the remaining dirty-bitmap, 
> +         * when the mem_size is not a multiple of TARGET_LONG_SIZE. 
> +         */ 
> +        if ((mem_size & (TARGET_LONG_SIZE - 1)) != 0) {
> +            len = ((mem_size / TARGET_PAGE_SIZE) + 7) / 8;
> +            kvm_get_dirty_pages_log_range_by_byte(i * TARGET_LONG_SIZE, 
> +                len, bitmap, offset);
> +        }
>   

Seems like the bitmap size is aligned as well (it is allocated using
BITMAP_SIZE, which rounds up using HOST_LONG_BITS), so this is also
unnecessary.

-- 
error compiling committee.c: too many arguments to function


* [Qemu-devel] Re: [PATCH] qemu-kvm: Speed up of the dirty-bitmap-traveling
  2010-02-08 12:40 ` Avi Kivity
@ 2010-02-09  9:54   ` OHMURA Kei
  2010-02-09 10:26     ` Avi Kivity
  0 siblings, 1 reply; 11+ messages in thread
From: OHMURA Kei @ 2010-02-09  9:54 UTC (permalink / raw)
  To: Avi Kivity; +Cc: ohmura.kei, mtosatti, qemu-devel, kvm

Thank you for your comments.  We have updated the code to apply your
comments.  This is the patch for qemu-kvm.c.

Signed-off-by: OHMURA Kei <ohmura.kei@lab.ntt.co.jp>
---
 qemu-kvm.c |   31 +++++++++++++++++++------------
 1 files changed, 19 insertions(+), 12 deletions(-)

diff --git a/qemu-kvm.c b/qemu-kvm.c
index a305907..d7474ea 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -2438,27 +2438,34 @@ static int kvm_get_dirty_pages_log_range(unsigned long start_addr,
                                          unsigned long offset,
                                          unsigned long mem_size)
 {
-    unsigned int i, j, n = 0;
+    unsigned int i, j, k, start, end;
     unsigned char c;
     unsigned long page_number, addr, addr1;
     ram_addr_t ram_addr;
-    unsigned int len = ((mem_size / TARGET_PAGE_SIZE) + 7) / 8;
+    unsigned int len = ((mem_size / TARGET_PAGE_SIZE) + TARGET_LONG_BITS - 1) /
+        TARGET_LONG_BITS;
+    unsigned long *bitmap_ul = (unsigned long *)bitmap;
 
     /* 
      * bitmap-traveling is faster than memory-traveling (for addr...) 
      * especially when most of the memory is not dirty.
      */
     for (i = 0; i < len; i++) {
-        c = bitmap[i];
-        while (c > 0) {
-            j = ffsl(c) - 1;
-            c &= ~(1u << j);
-            page_number = i * 8 + j;
-            addr1 = page_number * TARGET_PAGE_SIZE;
-            addr = offset + addr1;
-            ram_addr = cpu_get_physical_page_desc(addr);
-            cpu_physical_memory_set_dirty(ram_addr);
-            n++;
+        if (bitmap_ul[i] != 0) {
+            start = i * TARGET_LONG_SIZE;
+            end = (i + 1) * TARGET_LONG_SIZE;
+            for (j = start; j < end; j++) {
+                c = bitmap[j];
+                while (c > 0) {
+                    k = ffsl(c) - 1;
+                    c &= ~(1u << k);
+                    page_number = j * 8 + k;
+                    addr1 = page_number * TARGET_PAGE_SIZE;
+                    addr = offset + addr1;
+                    ram_addr = cpu_get_physical_page_desc(addr);
+                    cpu_physical_memory_set_dirty(ram_addr);
+                }
+            }
         }
     }
     return 0;
-- 
1.6.3.3


* [Qemu-devel] Re: [PATCH] qemu-kvm: Speed up of the dirty-bitmap-traveling
  2010-02-08 11:44       ` Jan Kiszka
@ 2010-02-09  9:55         ` OHMURA Kei
  0 siblings, 0 replies; 11+ messages in thread
From: OHMURA Kei @ 2010-02-09  9:55 UTC (permalink / raw)
  To: Jan Kiszka, avi@redhat.com
  Cc: ohmura.kei, mtosatti, qemu-devel@nongnu.org, kvm@vger.kernel.org

Thank you for your comments.  We have updated the code to apply your
comments.  This is the patch for upstream.

Signed-off-by: OHMURA Kei <ohmura.kei@lab.ntt.co.jp>
---
 kvm-all.c |   54 +++++++++++++++++++++++++++++++++++++++---------------
 1 files changed, 39 insertions(+), 15 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index 6c0fd37..603307c 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -287,9 +287,43 @@ int kvm_set_migration_log(int enable)
     return 0;
 }
 
-static int test_le_bit(unsigned long nr, unsigned char *addr)
+/* get kvm's dirty pages bitmap and update qemu's */
+static int kvm_get_dirty_pages_log_range(unsigned long start_addr,
+                                         unsigned char *bitmap,
+                                         unsigned long offset,
+                                         unsigned long mem_size)
 {
-    return (addr[nr >> 3] >> (nr & 7)) & 1;
+    unsigned int i, j, k, start, end;
+    unsigned char c;
+    unsigned long page_number, addr, addr1;
+    ram_addr_t ram_addr;
+    unsigned int len = ((mem_size / TARGET_PAGE_SIZE) + TARGET_LONG_BITS - 1) /
+        TARGET_LONG_BITS;
+    unsigned long *bitmap_ul = (unsigned long *)bitmap;
+    
+    /*
+     * bitmap-traveling is faster than memory-traveling (for addr...)
+     * especially when most of the memory is not dirty.
+     */
+    for (i = 0; i < len; i++) {
+        if (bitmap_ul[i] != 0) {
+            start = i * TARGET_LONG_SIZE;
+            end = (i + 1) * TARGET_LONG_SIZE;
+            for (j = start; j < end; j++) {
+                c = bitmap[j];
+                while (c > 0) {
+                    k = ffsl(c) - 1;
+                    c &= ~(1u << k);
+                    page_number = j * 8 + k;
+                    addr1 = page_number * TARGET_PAGE_SIZE;
+                    addr = offset + addr1;
+                    ram_addr = cpu_get_physical_page_desc(addr);
+                    cpu_physical_memory_set_dirty(ram_addr);
+                }
+            }
+        }
+    }
+    return 0;
 }
 
 /**
@@ -305,8 +339,6 @@ int kvm_physical_sync_dirty_bitmap(target_phys_addr_t start_addr,
 {
     KVMState *s = kvm_state;
     unsigned long size, allocated_size = 0;
-    target_phys_addr_t phys_addr;
-    ram_addr_t addr;
     KVMDirtyLog d;
     KVMSlot *mem;
     int ret = 0;
@@ -335,17 +367,9 @@ int kvm_physical_sync_dirty_bitmap(target_phys_addr_t start_addr,
             break;
         }
 
-        for (phys_addr = mem->start_addr, addr = mem->phys_offset;
-             phys_addr < mem->start_addr + mem->memory_size;
-             phys_addr += TARGET_PAGE_SIZE, addr += TARGET_PAGE_SIZE) {
-            unsigned char *bitmap = (unsigned char *)d.dirty_bitmap;
-            unsigned nr = (phys_addr - mem->start_addr) >> TARGET_PAGE_BITS;
-
-            if (test_le_bit(nr, bitmap)) {
-                cpu_physical_memory_set_dirty(addr);
-            }
-        }
-        start_addr = phys_addr;
+        kvm_get_dirty_pages_log_range(mem->start_addr, d.dirty_bitmap, 
+                                      mem->start_addr, mem->memory_size);
+        start_addr = mem->start_addr + mem->memory_size;
     }
     qemu_free(d.dirty_bitmap);
 
-- 
1.6.3.3


* [Qemu-devel] Re: [PATCH] qemu-kvm: Speed up of the dirty-bitmap-traveling
  2010-02-09  9:54   ` OHMURA Kei
@ 2010-02-09 10:26     ` Avi Kivity
  2010-02-10  9:55       ` OHMURA Kei
  0 siblings, 1 reply; 11+ messages in thread
From: Avi Kivity @ 2010-02-09 10:26 UTC (permalink / raw)
  To: OHMURA Kei; +Cc: mtosatti, qemu-devel, kvm

On 02/09/2010 11:54 AM, OHMURA Kei wrote:
> Thank you for your comments.  We have updated the code based on your 
> comments.  This is a patch for qemu-kvm.c.
>   

Please reuse the changelog when reposting a patch; this makes it easier
for me to apply it.

> @@ -2438,27 +2438,34 @@ static int kvm_get_dirty_pages_log_range(unsigned long start_addr,
>                                           unsigned long offset,
>                                           unsigned long mem_size)
>  {
> -    unsigned int i, j, n = 0;
> +    unsigned int i, j, k, start, end;
>      unsigned char c;
>      unsigned long page_number, addr, addr1;
>      ram_addr_t ram_addr;
> -    unsigned int len = ((mem_size / TARGET_PAGE_SIZE) + 7) / 8;
> +    unsigned int len = ((mem_size / TARGET_PAGE_SIZE) + TARGET_LONG_BITS - 1) /
> +        TARGET_LONG_BITS;
> +    unsigned long *bitmap_ul = (unsigned long *)bitmap;
>  
>      /* 
>       * bitmap-traveling is faster than memory-traveling (for addr...) 
>       * especially when most of the memory is not dirty.
>       */
>      for (i = 0; i < len; i++) {
> -        c = bitmap[i];
> -        while (c > 0) {
> -            j = ffsl(c) - 1;
> -            c &= ~(1u << j);
> -            page_number = i * 8 + j;
> -            addr1 = page_number * TARGET_PAGE_SIZE;
> -            addr = offset + addr1;
> -            ram_addr = cpu_get_physical_page_desc(addr);
> -            cpu_physical_memory_set_dirty(ram_addr);
> -            n++;
> +        if (bitmap_ul[i] != 0) {
> +            start = i * TARGET_LONG_SIZE;
> +            end = (i + 1) * TARGET_LONG_SIZE;
>   

This should be the host long size, not the guest's.  It will fail when
running a 32-bit qemu-system-x86_64 binary.

> +            for (j = start; j < end; j++) {
> +                c = bitmap[j];
> +                while (c > 0) {
> +                    k = ffsl(c) - 1;
> +                    c &= ~(1u << k);
> +                    page_number = j * 8 + k;
> +                    addr1 = page_number * TARGET_PAGE_SIZE;
> +                    addr = offset + addr1;
> +                    ram_addr = cpu_get_physical_page_desc(addr);
> +                    cpu_physical_memory_set_dirty(ram_addr);
> +                }
> +            }
>   

Instead of using a nested loop if bitmap_ul[i] != 0, it is possible to
use just a single loop (while (c > 0)), and process a long's worth of data.

The only trickery is with big endian hosts, where the conversion from
bit number to page number is a bit complicated.

If we do this, we can convert the bitmap's type to unsigned long
throughout, and avoid the casts.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Qemu-devel] Re: [PATCH] qemu-kvm: Speed up of the dirty-bitmap-traveling
  2010-02-09 10:26     ` Avi Kivity
@ 2010-02-10  9:55       ` OHMURA Kei
  2010-02-10 10:24         ` Avi Kivity
  0 siblings, 1 reply; 11+ messages in thread
From: OHMURA Kei @ 2010-02-10  9:55 UTC (permalink / raw)
  To: Avi Kivity; +Cc: ohmura.kei, mtosatti, qemu-devel, kvm

> Please reuse the changelog when reposting a patch; this makes it easier
> for me to apply it.

Thanks.  I will do so from next time.


> Should be a host long size, not guest. This will fail when running a
> 32-bit qemu-system-x86_64 binary.

Sorry. That was our mistake.


> Instead of using a nested loop if bitmap_ul[i] != 0, it is possible to
> use just a single loop (while (c > 0)), and process a long's worth of data.
>
> The only trickery is with big endian hosts, where the conversion from
> bit number to page number is a bit complicated.

To convert the bitmap from big endian to little endian, the le_bswap macro
in bswap.h seems useful, though it is currently undefined.  What do you think
of this approach?

This is an example bitmap-traveling code using le_bswap:
     /* 
      * bitmap-traveling is faster than memory-traveling (for addr...) 
      * especially when most of the memory is not dirty.
      */
     for (i = 0; i < len; i++) {
        if (bitmap_ul[i] != 0) {
            c = le_bswap(bitmap_ul[i], HOST_LONG_BITS);
            while (c > 0) {
                j = ffsl(c) - 1;
                c &= ~(1ul << j);
                page_number = i * HOST_LONG_BITS + j;
                addr1 = page_number * TARGET_PAGE_SIZE;
                addr = offset + addr1;
                ram_addr = cpu_get_physical_page_desc(addr);
                cpu_physical_memory_set_dirty(ram_addr);
            }
        }
     }

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Qemu-devel] Re: [PATCH] qemu-kvm: Speed up of the dirty-bitmap-traveling
  2010-02-10  9:55       ` OHMURA Kei
@ 2010-02-10 10:24         ` Avi Kivity
  0 siblings, 0 replies; 11+ messages in thread
From: Avi Kivity @ 2010-02-10 10:24 UTC (permalink / raw)
  To: OHMURA Kei; +Cc: mtosatti, qemu-devel, kvm

On 02/10/2010 11:55 AM, OHMURA Kei wrote:
>
>> Instead of using a nested loop if bitmap_ul[i] != 0, it is possible to
>> use just a single loop (while (c > 0)), and process a long's worth of data.
>>
>> The only trickery is with big endian hosts, where the conversion from
>> bit number to page number is a bit complicated.
>>     
> To convert the bitmap from big endian to little endian, the le_bswap macro
> in bswap.h seems useful, though it is currently undefined.  What do you think
> of this approach?
>
> This is an example bitmap-traveling code using le_bswap:
>      /* 
>       * bitmap-traveling is faster than memory-traveling (for addr...) 
>       * especially when most of the memory is not dirty.
>       */
>      for (i = 0; i < len; i++) {
>         if (bitmap_ul[i] != 0) {
>             c = le_bswap(bitmap_ul[i], HOST_LONG_BITS);
>             while (c > 0) {
>                 j = ffsl(c) - 1;
>                 c &= ~(1ul << j);
>                 page_number = i * HOST_LONG_BITS + j;
>                 addr1 = page_number * TARGET_PAGE_SIZE;
>                 addr = offset + addr1;
>                 ram_addr = cpu_get_physical_page_desc(addr);
>                 cpu_physical_memory_set_dirty(ram_addr);
>             }
>          }
>      }
>   

Yes, that solves the problem very neatly.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2010-02-10 10:24 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-02-05 10:18 [Qemu-devel] [PATCH] qemu-kvm: Speed up of the dirty-bitmap-traveling OHMURA Kei
2010-02-05 12:04 ` [Qemu-devel] " Jan Kiszka
2010-02-08  6:14   ` OHMURA Kei
2010-02-08 11:23     ` OHMURA Kei
2010-02-08 11:44       ` Jan Kiszka
2010-02-09  9:55         ` OHMURA Kei
2010-02-08 12:40 ` Avi Kivity
2010-02-09  9:54   ` OHMURA Kei
2010-02-09 10:26     ` Avi Kivity
2010-02-10  9:55       ` OHMURA Kei
2010-02-10 10:24         ` Avi Kivity
