* [Qemu-devel] [PATCH] qemu-kvm: Speed up of the dirty-bitmap-traveling
@ 2010-02-05 10:18 OHMURA Kei
2010-02-05 12:04 ` [Qemu-devel] " Jan Kiszka
2010-02-08 12:40 ` Avi Kivity
0 siblings, 2 replies; 11+ messages in thread
From: OHMURA Kei @ 2010-02-05 10:18 UTC (permalink / raw)
To: kvm, qemu-devel; +Cc: ohmura.kei, avi
dirty-bitmap-traveling is carried out by byte size in qemu-kvm.c.
We think that dirty-bitmap-traveling by long size is faster than by byte
size, especially when most of the memory is not dirty.
Signed-off-by: OHMURA Kei <ohmura.kei@lab.ntt.co.jp>
---
qemu-kvm.c | 49 ++++++++++++++++++++++++++++++++++++++++++-------
1 files changed, 42 insertions(+), 7 deletions(-)
diff --git a/qemu-kvm.c b/qemu-kvm.c
index a305907..5459cdd 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -2433,22 +2433,21 @@ int kvm_physical_memory_set_dirty_tracking(int enable)
}
/* get kvm's dirty pages bitmap and update qemu's */
-static int kvm_get_dirty_pages_log_range(unsigned long start_addr,
- unsigned char *bitmap,
- unsigned long offset,
- unsigned long mem_size)
+static void kvm_get_dirty_pages_log_range_by_byte(unsigned int start,
+ unsigned int end,
+ unsigned char *bitmap,
+ unsigned long offset)
{
unsigned int i, j, n = 0;
unsigned char c;
unsigned long page_number, addr, addr1;
ram_addr_t ram_addr;
- unsigned int len = ((mem_size / TARGET_PAGE_SIZE) + 7) / 8;
/*
* bitmap-traveling is faster than memory-traveling (for addr...)
* especially when most of the memory is not dirty.
*/
- for (i = 0; i < len; i++) {
+ for (i = start; i < end; i++) {
c = bitmap[i];
while (c > 0) {
j = ffsl(c) - 1;
@@ -2461,13 +2460,49 @@ static int kvm_get_dirty_pages_log_range(unsigned long start_addr,
n++;
}
}
+}
+
+static int kvm_get_dirty_pages_log_range_by_long(unsigned long start_addr,
+ unsigned char *bitmap,
+ unsigned long offset,
+ unsigned long mem_size)
+{
+ unsigned int i;
+ unsigned int len;
+ unsigned long *bitmap_ul = (unsigned long *)bitmap;
+
+ /* bitmap-traveling by long size is faster than by byte size
+ * especially when most of memory is not dirty.
+ * bitmap should be long-size aligned for traveling by long.
+ */
+ if (((unsigned long)bitmap & (TARGET_LONG_SIZE - 1)) == 0) {
+ len = ((mem_size / TARGET_PAGE_SIZE) + TARGET_LONG_BITS - 1) /
+ TARGET_LONG_BITS;
+ for (i = 0; i < len; i++)
+ if (bitmap_ul[i] != 0)
+ kvm_get_dirty_pages_log_range_by_byte(i * TARGET_LONG_SIZE,
+ (i + 1) * TARGET_LONG_SIZE, bitmap, offset);
+ /*
+ * We will check the remaining dirty-bitmap,
+ * when the mem_size is not a multiple of TARGET_LONG_SIZE.
+ */
+ if ((mem_size & (TARGET_LONG_SIZE - 1)) != 0) {
+ len = ((mem_size / TARGET_PAGE_SIZE) + 7) / 8;
+ kvm_get_dirty_pages_log_range_by_byte(i * TARGET_LONG_SIZE,
+ len, bitmap, offset);
+ }
+ } else { /* slow path: traveling by byte. */
+ len = ((mem_size / TARGET_PAGE_SIZE) + 7) / 8;
+ kvm_get_dirty_pages_log_range_by_byte(0, len, bitmap, offset);
+ }
+
return 0;
}
static int kvm_get_dirty_bitmap_cb(unsigned long start, unsigned long len,
void *bitmap, void *opaque)
{
- return kvm_get_dirty_pages_log_range(start, bitmap, start, len);
+ return kvm_get_dirty_pages_log_range_by_long(start, bitmap, start, len);
}
/*
-- 1.6.3.3
* [Qemu-devel] Re: [PATCH] qemu-kvm: Speed up of the dirty-bitmap-traveling
2010-02-05 10:18 [Qemu-devel] [PATCH] qemu-kvm: Speed up of the dirty-bitmap-traveling OHMURA Kei
@ 2010-02-05 12:04 ` Jan Kiszka
2010-02-08 6:14 ` OHMURA Kei
2010-02-08 12:40 ` Avi Kivity
1 sibling, 1 reply; 11+ messages in thread
From: Jan Kiszka @ 2010-02-05 12:04 UTC (permalink / raw)
To: OHMURA Kei; +Cc: qemu-devel, kvm, avi
OHMURA Kei wrote:
> dirty-bitmap-traveling is carried out by byte size in qemu-kvm.c.
> We think that dirty-bitmap-traveling by long size is faster than by byte
> size, especially when most of the memory is not dirty.
Sounds logical - do you have numbers on the improvement?
Would be great if you could provide a version for upstream as well
because it will likely replace this qemu-kvm code one day.
Jan
>
> Signed-off-by: OHMURA Kei <ohmura.kei@lab.ntt.co.jp>
> ---
> qemu-kvm.c | 49 ++++++++++++++++++++++++++++++++++++++++++-------
> 1 files changed, 42 insertions(+), 7 deletions(-)
>
> diff --git a/qemu-kvm.c b/qemu-kvm.c
> index a305907..5459cdd 100644
> --- a/qemu-kvm.c
> +++ b/qemu-kvm.c
> @@ -2433,22 +2433,21 @@ int kvm_physical_memory_set_dirty_tracking(int enable)
> }
>
> /* get kvm's dirty pages bitmap and update qemu's */
> -static int kvm_get_dirty_pages_log_range(unsigned long start_addr,
> - unsigned char *bitmap,
> - unsigned long offset,
> - unsigned long mem_size)
> +static void kvm_get_dirty_pages_log_range_by_byte(unsigned int start,
> + unsigned int end,
> + unsigned char *bitmap,
> + unsigned long offset)
> {
> unsigned int i, j, n = 0;
> unsigned char c;
> unsigned long page_number, addr, addr1;
> ram_addr_t ram_addr;
> - unsigned int len = ((mem_size / TARGET_PAGE_SIZE) + 7) / 8;
>
> /*
> * bitmap-traveling is faster than memory-traveling (for addr...)
> * especially when most of the memory is not dirty.
> */
> - for (i = 0; i < len; i++) {
> + for (i = start; i < end; i++) {
> c = bitmap[i];
> while (c > 0) {
> j = ffsl(c) - 1;
> @@ -2461,13 +2460,49 @@ static int kvm_get_dirty_pages_log_range(unsigned long start_addr,
> n++;
> }
> }
> +}
> +
> +static int kvm_get_dirty_pages_log_range_by_long(unsigned long start_addr,
> + unsigned char *bitmap,
> + unsigned long offset,
> + unsigned long mem_size)
> +{
> + unsigned int i;
> + unsigned int len;
> + unsigned long *bitmap_ul = (unsigned long *)bitmap;
> +
> + /* bitmap-traveling by long size is faster than by byte size
> + * especially when most of memory is not dirty.
> + * bitmap should be long-size aligned for traveling by long.
> + */
> + if (((unsigned long)bitmap & (TARGET_LONG_SIZE - 1)) == 0) {
> + len = ((mem_size / TARGET_PAGE_SIZE) + TARGET_LONG_BITS - 1) /
> + TARGET_LONG_BITS;
> + for (i = 0; i < len; i++)
> + if (bitmap_ul[i] != 0)
> + kvm_get_dirty_pages_log_range_by_byte(i * TARGET_LONG_SIZE,
> + (i + 1) * TARGET_LONG_SIZE, bitmap, offset);
> + /*
> + * We will check the remaining dirty-bitmap,
> + * when the mem_size is not a multiple of TARGET_LONG_SIZE.
> + */
> + if ((mem_size & (TARGET_LONG_SIZE - 1)) != 0) {
> + len = ((mem_size / TARGET_PAGE_SIZE) + 7) / 8;
> + kvm_get_dirty_pages_log_range_by_byte(i * TARGET_LONG_SIZE,
> + len, bitmap, offset);
> + }
> + } else { /* slow path: traveling by byte. */
> + len = ((mem_size / TARGET_PAGE_SIZE) + 7) / 8;
> + kvm_get_dirty_pages_log_range_by_byte(0, len, bitmap, offset);
> + }
> +
> return 0;
> }
>
> static int kvm_get_dirty_bitmap_cb(unsigned long start, unsigned long len,
> void *bitmap, void *opaque)
> {
> - return kvm_get_dirty_pages_log_range(start, bitmap, start, len);
> + return kvm_get_dirty_pages_log_range_by_long(start, bitmap, start, len);
> }
>
> /*
> -- 1.6.3.3
--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
* [Qemu-devel] Re: [PATCH] qemu-kvm: Speed up of the dirty-bitmap-traveling
2010-02-05 12:04 ` [Qemu-devel] " Jan Kiszka
@ 2010-02-08 6:14 ` OHMURA Kei
2010-02-08 11:23 ` OHMURA Kei
0 siblings, 1 reply; 11+ messages in thread
From: OHMURA Kei @ 2010-02-08 6:14 UTC (permalink / raw)
To: Jan Kiszka; +Cc: ohmura.kei, qemu-devel, kvm, avi
> Sounds logical - do you have numbers on the improvement?
Sure. The patch showed an approximately 3-7x speedup when measured with
rdtsc. The test environment and detailed results are described below.
---
tmp = rdtsc();
/* function of original code */
t1 += rdtsc() - tmp;
tmp = rdtsc();
/* function of this patch */
t2 += rdtsc() - tmp;
---
Test Environment:
CPU: 4x Intel Xeon Quad Core 2.66GHz
Mem size: 6GB
kvm version: 2.6.31-17-server
qemu version: commit ed880109f74f0a4dd5b7ec09e6a2d9ba4903d9a5
Host OS: Ubuntu 9.10 (kernel 2.6.31)
Guest OS: Debian/GNU Linux lenny (kernel 2.6.26)
Guest Mem size: 512MB
We executed live migration three times. The data below shows how many times
the function was called (#called), the runtime of the original code (orig.),
the runtime with this patch (patch), and the speedup ratio (ratio) during
live migration.
Experimental results:
Test1: The guest OS reads a 3GB file, which is bigger than memory.
#called orig.(msec) patch(msec) ratio
114 1.00 0.15 6.76
132 1.57 0.25 6.26
96 1.00 0.16 6.27
Test2: The guest OS reads/writes a 3GB file, which is bigger than memory.
#called orig.(msec) patch(msec) ratio
2196 38.1 10.6 3.59
2256 39.6 10.8 3.68
2112 36.3 10.3 3.53
> Would be great if you could provide a version for upstream as well
> because it will likely replace this qemu-kvm code one day.
O.K. We'll prepare it.
We'll also post a patch set soon to speed up dirty-page checking in
ram_save_block and ram_save_live.
* [Qemu-devel] Re: [PATCH] qemu-kvm: Speed up of the dirty-bitmap-traveling
2010-02-08 6:14 ` OHMURA Kei
@ 2010-02-08 11:23 ` OHMURA Kei
2010-02-08 11:44 ` Jan Kiszka
0 siblings, 1 reply; 11+ messages in thread
From: OHMURA Kei @ 2010-02-08 11:23 UTC (permalink / raw)
To: Jan Kiszka, kvm, qemu-devel; +Cc: ohmura.kei, avi
>> Would be great if you could provide a version for upstream as well
>> because it will likely replace this qemu-kvm code one day.
> O.K. We'll prepare it.
We have implemented the version for upstream. Some of the source code is
borrowed from qemu-kvm.c. It is not fully tested yet, though.
We also ran a performance test of this patch. The test environment is the
same as in the e-mail I sent before.
Experimental results:
Test1: The guest OS reads a 3GB file, which is bigger than memory.
#called orig.(msec) patch(msec) ratio
14 3.79 0.18 20.8
12 3.20 0.15 21.4
11 2.89 0.14 21.0
Test2: The guest OS reads/writes a 3GB file, which is bigger than memory.
#called orig.(msec) patch(msec) ratio
364 180 8.70 20.7
326 161 7.71 20.9
474 235 11.7 20.1
---
kvm-all.c | 80 +++++++++++++++++++++++++++++++++++++++++++++++++-----------
1 files changed, 65 insertions(+), 15 deletions(-)
diff --git a/kvm-all.c b/kvm-all.c
index 15ec38e..9666843 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -279,9 +279,69 @@ int kvm_set_migration_log(int enable)
return 0;
}
-static int test_le_bit(unsigned long nr, unsigned char *addr)
+static inline void kvm_get_dirty_pages_log_range_by_byte(unsigned int start,
+ unsigned int end,
+ unsigned char *bitmap,
+ unsigned long offset)
{
- return (addr[nr >> 3] >> (nr & 7)) & 1;
+ unsigned int i, j, n = 0;
+ unsigned long page_number, addr, addr1;
+ ram_addr_t ram_addr;
+ unsigned char c;
+
+ /*
+ * bitmap-traveling is faster than memory-traveling (for addr...)
+ * especially when most of the memory is not dirty.
+ */
+ for (i = start; i < end; i++) {
+ c = bitmap[i];
+ while (c > 0) {
+ j = ffsl(c) - 1;
+ c &= ~(1u << j);
+ page_number = i * 8 + j;
+ addr1 = page_number * TARGET_PAGE_SIZE;
+ addr = offset + addr1;
+ ram_addr = cpu_get_physical_page_desc(addr);
+ cpu_physical_memory_set_dirty(ram_addr);
+ n++;
+ }
+ }
+}
+
+static int kvm_get_dirty_pages_log_range_by_long(unsigned long start_addr,
+ unsigned char *bitmap,
+ unsigned long mem_size)
+{
+ unsigned int i;
+ unsigned int len;
+ unsigned long *bitmap_ul = (unsigned long *)bitmap;
+
+ /* bitmap-traveling by long size is faster than by byte size
+ * especially when most of memory is not dirty.
+ * bitmap should be long-size aligned for traveling by long.
+ */
+ if (((unsigned long)bitmap & (TARGET_LONG_SIZE - 1)) == 0) {
+ len = ((mem_size / TARGET_PAGE_SIZE) + TARGET_LONG_BITS - 1) /
+ TARGET_LONG_BITS;
+ for (i = 0; i < len; i++)
+ if (bitmap_ul[i] != 0)
+ kvm_get_dirty_pages_log_range_by_byte(i * TARGET_LONG_SIZE,
+ (i + 1) * TARGET_LONG_SIZE, bitmap, start_addr);
+ /*
+ * We will check the remaining dirty-bitmap,
+ * when the mem_size is not a multiple of TARGET_LONG_SIZE.
+ */
+ if ((mem_size & (TARGET_LONG_SIZE - 1)) != 0) {
+ len = ((mem_size / TARGET_PAGE_SIZE) + 7) / 8;
+ kvm_get_dirty_pages_log_range_by_byte(i * TARGET_LONG_SIZE,
+ len, bitmap, start_addr);
+ }
+ } else { /* slow path: traveling by byte. */
+ len = ((mem_size / TARGET_PAGE_SIZE) + 7) / 8;
+ kvm_get_dirty_pages_log_range_by_byte(0, len, bitmap, start_addr);
+ }
+
+ return 0;
}
/**
@@ -297,8 +357,6 @@ int kvm_physical_sync_dirty_bitmap(target_phys_addr_t start_addr,
{
KVMState *s = kvm_state;
unsigned long size, allocated_size = 0;
- target_phys_addr_t phys_addr;
- ram_addr_t addr;
KVMDirtyLog d;
KVMSlot *mem;
int ret = 0;
@@ -327,17 +385,9 @@ int kvm_physical_sync_dirty_bitmap(target_phys_addr_t start_addr,
break;
}
- for (phys_addr = mem->start_addr, addr = mem->phys_offset;
- phys_addr < mem->start_addr + mem->memory_size;
- phys_addr += TARGET_PAGE_SIZE, addr += TARGET_PAGE_SIZE) {
- unsigned char *bitmap = (unsigned char *)d.dirty_bitmap;
- unsigned nr = (phys_addr - mem->start_addr) >> TARGET_PAGE_BITS;
-
- if (test_le_bit(nr, bitmap)) {
- cpu_physical_memory_set_dirty(addr);
- }
- }
- start_addr = phys_addr;
+ kvm_get_dirty_pages_log_range_by_long(mem->start_addr,
+ d.dirty_bitmap, mem->memory_size);
+ start_addr = mem->start_addr + mem->memory_size;
}
qemu_free(d.dirty_bitmap);
--
1.6.3.3
* [Qemu-devel] Re: [PATCH] qemu-kvm: Speed up of the dirty-bitmap-traveling
2010-02-08 11:23 ` OHMURA Kei
@ 2010-02-08 11:44 ` Jan Kiszka
2010-02-09 9:55 ` OHMURA Kei
0 siblings, 1 reply; 11+ messages in thread
From: Jan Kiszka @ 2010-02-08 11:44 UTC (permalink / raw)
To: OHMURA Kei; +Cc: qemu-devel@nongnu.org, kvm@vger.kernel.org, avi@redhat.com
OHMURA Kei wrote:
>>> Would be great if you could provide a version for upstream as well
>>> because it will likely replace this qemu-kvm code one day.
>> O.K. We'll prepare it.
>
>
> We have implemented the version for upstream. Some of the source code is
> borrowed from qemu-kvm.c. It is not fully tested yet, though.
>
> We also ran a performance test of this patch. The test environment is the
> same as in the e-mail I sent before.
>
>
> Experimental results:
> Test1: Guest OS read 3GB file, which is bigger than memory.
> #called orig.(msec) patch(msec) ratio
> 14 3.79 0.18 20.8
> 12 3.20 0.15 21.4
> 11 2.89 0.14 21.0
>
> Test2: Guest OS read/write 3GB file, which is bigger than memory.
> #called orig.(msec) patch(msec) ratio
> 364 180 8.70 20.7
> 326 161 7.71 20.9
> 474 235 11.7 20.1
>
Wow, so we were really inefficient here. Nice work!
Once you are done with your tests, please post this against
qemu-kvm.git's uq/master so that Avi or Marcelo can push it upstream.
Minor remarks below.
>
> ---
> kvm-all.c | 80 +++++++++++++++++++++++++++++++++++++++++++++++++-----------
> 1 files changed, 65 insertions(+), 15 deletions(-)
>
> diff --git a/kvm-all.c b/kvm-all.c
> index 15ec38e..9666843 100644
> --- a/kvm-all.c
> +++ b/kvm-all.c
> @@ -279,9 +279,69 @@ int kvm_set_migration_log(int enable)
> return 0;
> }
>
> -static int test_le_bit(unsigned long nr, unsigned char *addr)
> +static inline void kvm_get_dirty_pages_log_range_by_byte(unsigned int start,
I don't think inline is appropriate here. Smart compilers are able to do
this on their own, and a small code footprint actually contributes to
speed as well.
> + unsigned int end,
> + unsigned char *bitmap,
> + unsigned long offset)
> {
> - return (addr[nr >> 3] >> (nr & 7)) & 1;
> + unsigned int i, j, n = 0;
> + unsigned long page_number, addr, addr1;
> + ram_addr_t ram_addr;
> + unsigned char c;
> +
> + /*
> + * bitmap-traveling is faster than memory-traveling (for addr...)
> + * especially when most of the memory is not dirty.
> + */
> + for (i = start; i < end; i++) {
> + c = bitmap[i];
> + while (c > 0) {
> + j = ffsl(c) - 1;
> + c &= ~(1u << j);
> + page_number = i * 8 + j;
> + addr1 = page_number * TARGET_PAGE_SIZE;
> + addr = offset + addr1;
> + ram_addr = cpu_get_physical_page_desc(addr);
> + cpu_physical_memory_set_dirty(ram_addr);
> + n++;
> + }
> + }
> +}
> +
> +static int kvm_get_dirty_pages_log_range_by_long(unsigned long start_addr,
> + unsigned char *bitmap,
> + unsigned long mem_size)
> +{
> + unsigned int i;
> + unsigned int len;
> + unsigned long *bitmap_ul = (unsigned long *)bitmap;
> +
> + /* bitmap-traveling by long size is faster than by byte size
> + * especially when most of memory is not dirty.
> + * bitmap should be long-size aligned for traveling by long.
> + */
> + if (((unsigned long)bitmap & (TARGET_LONG_SIZE - 1)) == 0) {
> + len = ((mem_size / TARGET_PAGE_SIZE) + TARGET_LONG_BITS - 1) /
> + TARGET_LONG_BITS;
> + for (i = 0; i < len; i++)
> + if (bitmap_ul[i] != 0)
> + kvm_get_dirty_pages_log_range_by_byte(i * TARGET_LONG_SIZE,
> + (i + 1) * TARGET_LONG_SIZE, bitmap, start_addr);
Missing { }, 2x.
> + /*
> + * We will check the remaining dirty-bitmap,
> + * when the mem_size is not a multiple of TARGET_LONG_SIZE.
> + */
> + if ((mem_size & (TARGET_LONG_SIZE - 1)) != 0) {
> + len = ((mem_size / TARGET_PAGE_SIZE) + 7) / 8;
> + kvm_get_dirty_pages_log_range_by_byte(i * TARGET_LONG_SIZE,
> + len, bitmap, start_addr);
This line should be indented to the '('.
> + }
> + } else { /* slow path: traveling by byte. */
> + len = ((mem_size / TARGET_PAGE_SIZE) + 7) / 8;
> + kvm_get_dirty_pages_log_range_by_byte(0, len, bitmap, start_addr);
> + }
> +
> + return 0;
> }
>
> /**
> @@ -297,8 +357,6 @@ int kvm_physical_sync_dirty_bitmap(target_phys_addr_t start_addr,
> {
> KVMState *s = kvm_state;
> unsigned long size, allocated_size = 0;
> - target_phys_addr_t phys_addr;
> - ram_addr_t addr;
> KVMDirtyLog d;
> KVMSlot *mem;
> int ret = 0;
> @@ -327,17 +385,9 @@ int kvm_physical_sync_dirty_bitmap(target_phys_addr_t start_addr,
> break;
> }
>
> - for (phys_addr = mem->start_addr, addr = mem->phys_offset;
> - phys_addr < mem->start_addr + mem->memory_size;
> - phys_addr += TARGET_PAGE_SIZE, addr += TARGET_PAGE_SIZE) {
> - unsigned char *bitmap = (unsigned char *)d.dirty_bitmap;
> - unsigned nr = (phys_addr - mem->start_addr) >> TARGET_PAGE_BITS;
> -
> - if (test_le_bit(nr, bitmap)) {
> - cpu_physical_memory_set_dirty(addr);
> - }
> - }
> - start_addr = phys_addr;
> + kvm_get_dirty_pages_log_range_by_long(mem->start_addr,
> + d.dirty_bitmap, mem->memory_size);
> + start_addr = mem->start_addr + mem->memory_size;
> }
> qemu_free(d.dirty_bitmap);
>
Thanks,
Jan
--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
* [Qemu-devel] Re: [PATCH] qemu-kvm: Speed up of the dirty-bitmap-traveling
2010-02-05 10:18 [Qemu-devel] [PATCH] qemu-kvm: Speed up of the dirty-bitmap-traveling OHMURA Kei
2010-02-05 12:04 ` [Qemu-devel] " Jan Kiszka
@ 2010-02-08 12:40 ` Avi Kivity
2010-02-09 9:54 ` OHMURA Kei
1 sibling, 1 reply; 11+ messages in thread
From: Avi Kivity @ 2010-02-08 12:40 UTC (permalink / raw)
To: OHMURA Kei; +Cc: qemu-devel, kvm
On 02/05/2010 12:18 PM, OHMURA Kei wrote:
> dirty-bitmap-traveling is carried out by byte size in qemu-kvm.c.
> We think that dirty-bitmap-traveling by long size is faster than by byte
> size, especially when most of the memory is not dirty.
>
>
>
> +
> +static int kvm_get_dirty_pages_log_range_by_long(unsigned long start_addr,
> + unsigned char *bitmap,
> + unsigned long offset,
> + unsigned long mem_size)
> +{
> + unsigned int i;
> + unsigned int len;
> + unsigned long *bitmap_ul = (unsigned long *)bitmap;
> +
> + /* bitmap-traveling by long size is faster than by byte size
> + * especially when most of memory is not dirty.
> + * bitmap should be long-size aligned for traveling by long.
> + */
> + if (((unsigned long)bitmap & (TARGET_LONG_SIZE - 1)) == 0) {
>
Since we allocate the bitmap, we can be sure that it is aligned on a
long boundary (qemu_malloc() should guarantee that). So you can
eliminate the fallback.
> + len = ((mem_size / TARGET_PAGE_SIZE) + TARGET_LONG_BITS - 1) /
> + TARGET_LONG_BITS;
> + for (i = 0; i < len; i++)
> + if (bitmap_ul[i] != 0)
> + kvm_get_dirty_pages_log_range_by_byte(i * TARGET_LONG_SIZE,
> + (i + 1) * TARGET_LONG_SIZE, bitmap, offset);
>
Better to just use the original loop here (since we don't need the
function as a fallback).
> + /*
> + * We will check the remaining dirty-bitmap,
> + * when the mem_size is not a multiple of TARGET_LONG_SIZE.
> + */
> + if ((mem_size & (TARGET_LONG_SIZE - 1)) != 0) {
> + len = ((mem_size / TARGET_PAGE_SIZE) + 7) / 8;
> + kvm_get_dirty_pages_log_range_by_byte(i * TARGET_LONG_SIZE,
> + len, bitmap, offset);
> + }
>
It seems the bitmap size is aligned as well (it is allocated using
BITMAP_SIZE, which rounds up to HOST_LONG_BITS), so this check is
unnecessary too.
--
error compiling committee.c: too many arguments to function
* [Qemu-devel] Re: [PATCH] qemu-kvm: Speed up of the dirty-bitmap-traveling
2010-02-08 12:40 ` Avi Kivity
@ 2010-02-09 9:54 ` OHMURA Kei
2010-02-09 10:26 ` Avi Kivity
0 siblings, 1 reply; 11+ messages in thread
From: OHMURA Kei @ 2010-02-09 9:54 UTC (permalink / raw)
To: Avi Kivity; +Cc: ohmura.kei, mtosatti, qemu-devel, kvm
Thank you for your comments. We have updated the code to apply your
comments. This is the patch for qemu-kvm.c.
Signed-off-by: OHMURA Kei <ohmura.kei@lab.ntt.co.jp>
---
qemu-kvm.c | 31 +++++++++++++++++++------------
1 files changed, 19 insertions(+), 12 deletions(-)
diff --git a/qemu-kvm.c b/qemu-kvm.c
index a305907..d7474ea 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -2438,27 +2438,34 @@ static int kvm_get_dirty_pages_log_range(unsigned long start_addr,
unsigned long offset,
unsigned long mem_size)
{
- unsigned int i, j, n = 0;
+ unsigned int i, j, k, start, end;
unsigned char c;
unsigned long page_number, addr, addr1;
ram_addr_t ram_addr;
- unsigned int len = ((mem_size / TARGET_PAGE_SIZE) + 7) / 8;
+ unsigned int len = ((mem_size / TARGET_PAGE_SIZE) + TARGET_LONG_BITS - 1) /
+ TARGET_LONG_BITS;
+ unsigned long *bitmap_ul = (unsigned long *)bitmap;
/*
* bitmap-traveling is faster than memory-traveling (for addr...)
* especially when most of the memory is not dirty.
*/
for (i = 0; i < len; i++) {
- c = bitmap[i];
- while (c > 0) {
- j = ffsl(c) - 1;
- c &= ~(1u << j);
- page_number = i * 8 + j;
- addr1 = page_number * TARGET_PAGE_SIZE;
- addr = offset + addr1;
- ram_addr = cpu_get_physical_page_desc(addr);
- cpu_physical_memory_set_dirty(ram_addr);
- n++;
+ if (bitmap_ul[i] != 0) {
+ start = i * TARGET_LONG_SIZE;
+ end = (i + 1) * TARGET_LONG_SIZE;
+ for (j = start; j < end; j++) {
+ c = bitmap[j];
+ while (c > 0) {
+ k = ffsl(c) - 1;
+ c &= ~(1u << k);
+ page_number = j * 8 + k;
+ addr1 = page_number * TARGET_PAGE_SIZE;
+ addr = offset + addr1;
+ ram_addr = cpu_get_physical_page_desc(addr);
+ cpu_physical_memory_set_dirty(ram_addr);
+ }
+ }
}
}
return 0;
--
1.6.3.3
* [Qemu-devel] Re: [PATCH] qemu-kvm: Speed up of the dirty-bitmap-traveling
2010-02-08 11:44 ` Jan Kiszka
@ 2010-02-09 9:55 ` OHMURA Kei
0 siblings, 0 replies; 11+ messages in thread
From: OHMURA Kei @ 2010-02-09 9:55 UTC (permalink / raw)
To: Jan Kiszka, avi@redhat.com
Cc: ohmura.kei, mtosatti, qemu-devel@nongnu.org, kvm@vger.kernel.org
Thank you for your comments. We have updated the code to apply your
comments. This is the patch for upstream.
Signed-off-by: OHMURA Kei <ohmura.kei@lab.ntt.co.jp>
---
kvm-all.c | 54 +++++++++++++++++++++++++++++++++++++++---------------
1 files changed, 39 insertions(+), 15 deletions(-)
diff --git a/kvm-all.c b/kvm-all.c
index 6c0fd37..603307c 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -287,9 +287,43 @@ int kvm_set_migration_log(int enable)
return 0;
}
-static int test_le_bit(unsigned long nr, unsigned char *addr)
+/* get kvm's dirty pages bitmap and update qemu's */
+static int kvm_get_dirty_pages_log_range(unsigned long start_addr,
+ unsigned char *bitmap,
+ unsigned long offset,
+ unsigned long mem_size)
{
- return (addr[nr >> 3] >> (nr & 7)) & 1;
+ unsigned int i, j, k, start, end;
+ unsigned char c;
+ unsigned long page_number, addr, addr1;
+ ram_addr_t ram_addr;
+ unsigned int len = ((mem_size / TARGET_PAGE_SIZE) + TARGET_LONG_BITS - 1) /
+ TARGET_LONG_BITS;
+ unsigned long *bitmap_ul = (unsigned long *)bitmap;
+
+ /*
+ * bitmap-traveling is faster than memory-traveling (for addr...)
+ * especially when most of the memory is not dirty.
+ */
+ for (i = 0; i < len; i++) {
+ if (bitmap_ul[i] != 0) {
+ start = i * TARGET_LONG_SIZE;
+ end = (i + 1) * TARGET_LONG_SIZE;
+ for (j = start; j < end; j++) {
+ c = bitmap[j];
+ while (c > 0) {
+ k = ffsl(c) - 1;
+ c &= ~(1u << k);
+ page_number = j * 8 + k;
+ addr1 = page_number * TARGET_PAGE_SIZE;
+ addr = offset + addr1;
+ ram_addr = cpu_get_physical_page_desc(addr);
+ cpu_physical_memory_set_dirty(ram_addr);
+ }
+ }
+ }
+ }
+ return 0;
}
/**
@@ -305,8 +339,6 @@ int kvm_physical_sync_dirty_bitmap(target_phys_addr_t start_addr,
{
KVMState *s = kvm_state;
unsigned long size, allocated_size = 0;
- target_phys_addr_t phys_addr;
- ram_addr_t addr;
KVMDirtyLog d;
KVMSlot *mem;
int ret = 0;
@@ -335,17 +367,9 @@ int kvm_physical_sync_dirty_bitmap(target_phys_addr_t start_addr,
break;
}
- for (phys_addr = mem->start_addr, addr = mem->phys_offset;
- phys_addr < mem->start_addr + mem->memory_size;
- phys_addr += TARGET_PAGE_SIZE, addr += TARGET_PAGE_SIZE) {
- unsigned char *bitmap = (unsigned char *)d.dirty_bitmap;
- unsigned nr = (phys_addr - mem->start_addr) >> TARGET_PAGE_BITS;
-
- if (test_le_bit(nr, bitmap)) {
- cpu_physical_memory_set_dirty(addr);
- }
- }
- start_addr = phys_addr;
+ kvm_get_dirty_pages_log_range(mem->start_addr, d.dirty_bitmap,
+ mem->start_addr, mem->memory_size);
+ start_addr = mem->start_addr + mem->memory_size;
}
qemu_free(d.dirty_bitmap);
--
1.6.3.3
* [Qemu-devel] Re: [PATCH] qemu-kvm: Speed up of the dirty-bitmap-traveling
2010-02-09 9:54 ` OHMURA Kei
@ 2010-02-09 10:26 ` Avi Kivity
2010-02-10 9:55 ` OHMURA Kei
0 siblings, 1 reply; 11+ messages in thread
From: Avi Kivity @ 2010-02-09 10:26 UTC (permalink / raw)
To: OHMURA Kei; +Cc: mtosatti, qemu-devel, kvm
On 02/09/2010 11:54 AM, OHMURA Kei wrote:
> Thank you for your comments. We have implemented the code which applied your
> comments. This is patch for qemu-kvm.c.
>
Please reuse the changelog when reposting a patch; this makes it easier
for me to apply it.
> @@ -2438,27 +2438,34 @@ static int kvm_get_dirty_pages_log_range(unsigned long start_addr,
> unsigned long offset,
> unsigned long mem_size)
> {
> - unsigned int i, j, n = 0;
> + unsigned int i, j, k, start, end;
> unsigned char c;
> unsigned long page_number, addr, addr1;
> ram_addr_t ram_addr;
> - unsigned int len = ((mem_size / TARGET_PAGE_SIZE) + 7) / 8;
> + unsigned int len = ((mem_size / TARGET_PAGE_SIZE) + TARGET_LONG_BITS - 1) /
> + TARGET_LONG_BITS;
> + unsigned long *bitmap_ul = (unsigned long *)bitmap;
>
> /*
> * bitmap-traveling is faster than memory-traveling (for addr...)
> * especially when most of the memory is not dirty.
> */
> for (i = 0; i < len; i++) {
> - c = bitmap[i];
> - while (c > 0) {
> - j = ffsl(c) - 1;
> - c &= ~(1u << j);
> - page_number = i * 8 + j;
> - addr1 = page_number * TARGET_PAGE_SIZE;
> - addr = offset + addr1;
> - ram_addr = cpu_get_physical_page_desc(addr);
> - cpu_physical_memory_set_dirty(ram_addr);
> - n++;
> + if (bitmap_ul[i] != 0) {
> + start = i * TARGET_LONG_SIZE;
> + end = (i + 1) * TARGET_LONG_SIZE;
>
This should be the host long size, not the guest's. It will fail when
running a 32-bit qemu-system-x86_64 binary.
> + for (j = start; j < end; j++) {
> + c = bitmap[j];
> + while (c > 0) {
> + k = ffsl(c) - 1;
> + c &= ~(1u << k);
> + page_number = j * 8 + k;
> + addr1 = page_number * TARGET_PAGE_SIZE;
> + addr = offset + addr1;
> + ram_addr = cpu_get_physical_page_desc(addr);
> + cpu_physical_memory_set_dirty(ram_addr);
> + }
> + }
>
Instead of using a nested loop when bitmap_ul[i] != 0, it is possible to
use just a single loop (while (c > 0)) and process a long's worth of data.
The only trickery is with big endian hosts, where the conversion from
bit number to page number is a bit complicated.
If we do this, we can convert the bitmap's type to unsigned long
throughout, and avoid the casts.
--
error compiling committee.c: too many arguments to function
* [Qemu-devel] Re: [PATCH] qemu-kvm: Speed up of the dirty-bitmap-traveling
2010-02-09 10:26 ` Avi Kivity
@ 2010-02-10 9:55 ` OHMURA Kei
2010-02-10 10:24 ` Avi Kivity
0 siblings, 1 reply; 11+ messages in thread
From: OHMURA Kei @ 2010-02-10 9:55 UTC (permalink / raw)
To: Avi Kivity; +Cc: ohmura.kei, mtosatti, qemu-devel, kvm
> Please reuse the changelog when reposting a patch; this makes it easier
> for me to apply it.
Thanks. We will do so from next time.
> Should be a host long size, not guest. This will fail when running a
> 32-bit qemu-system-x86_64 binary.
Sorry. That was our mistake.
> Instead of using a nested loop when bitmap_ul[i] != 0, it is possible to
> use just a single loop (while (c > 0)) and process a long's worth of data.
>
> The only trickery is with big endian hosts, where the conversion from
> bit number to page number is a bit complicated.
To convert the bitmap from big endian to little endian, the le_bswap macro
in bswap.h seems useful, though it is currently undefined. What do you think
about this approach?
Here is an example of bitmap-traveling code using le_bswap:
/*
* bitmap-traveling is faster than memory-traveling (for addr...)
* especially when most of the memory is not dirty.
*/
for (i = 0; i < len; i++) {
if (bitmap_ul[i] != 0) {
c = le_bswap(bitmap_ul[i], HOST_LONG_BITS);
while (c > 0) {
j = ffsl(c) - 1;
c &= ~(1ul << j);
page_number = i * HOST_LONG_BITS + j;
addr1 = page_number * TARGET_PAGE_SIZE;
addr = offset + addr1;
ram_addr = cpu_get_physical_page_desc(addr);
cpu_physical_memory_set_dirty(ram_addr);
}
}
}
* [Qemu-devel] Re: [PATCH] qemu-kvm: Speed up of the dirty-bitmap-traveling
2010-02-10 9:55 ` OHMURA Kei
@ 2010-02-10 10:24 ` Avi Kivity
0 siblings, 0 replies; 11+ messages in thread
From: Avi Kivity @ 2010-02-10 10:24 UTC (permalink / raw)
To: OHMURA Kei; +Cc: mtosatti, qemu-devel, kvm
On 02/10/2010 11:55 AM, OHMURA Kei wrote:
>
>> Instead of using a nested loop if bitmap_ul[i] != 0, it is possible to
>> use just a single loop (while (c> 0)), and process a long's worth of data.
>>
>> The only trickery is with big endian hosts, where the conversion from
>> bit number to page number is a bit complicated.
>>
> To convert the bitmap from big endian to little endian, le_bswap macro in
> bswap.h seems useful, which is now undefined. What do you think about this
> approach?
>
> This is an example bitmap-traveling code using le_bswap:
> /*
> * bitmap-traveling is faster than memory-traveling (for addr...)
> * especially when most of the memory is not dirty.
> */
> for (i = 0; i < len; i++) {
> if (bitmap_ul[i] != 0) {
> c = le_bswap(bitmap_ul[i], HOST_LONG_BITS);
> while (c > 0) {
> j = ffsl(c) - 1;
> c &= ~(1ul << j);
> page_number = i * HOST_LONG_BITS + j;
> addr1 = page_number * TARGET_PAGE_SIZE;
> addr = offset + addr1;
> ram_addr = cpu_get_physical_page_desc(addr);
> cpu_physical_memory_set_dirty(ram_addr);
> }
> }
> }
>
Yes, that solves the problem very neatly.
--
error compiling committee.c: too many arguments to function