* [Qemu-devel] [PATCH 0/3] memory: an optimization
@ 2016-02-20 2:35 Gonglei
From: Gonglei @ 2016-02-20 2:35 UTC
To: qemu-devel; +Cc: pbonzini, Gonglei, peter.huangpeng
Perf top shows that qemu_get_ram_ptr consumes too many CPU cycles.
> 22.56% qemu-kvm [.] address_space_translate
> 13.29% qemu-kvm [.] qemu_get_ram_ptr
> 4.71% qemu-kvm [.] phys_page_find
> 4.43% qemu-kvm [.] address_space_translate_internal
> 3.47% libpthread-2.19.so [.] __pthread_mutex_unlock_usercnt
> 3.08% qemu-kvm [.] qemu_ram_addr_from_host
> 2.62% qemu-kvm [.] address_space_map
> 2.61% libc-2.19.so [.] _int_malloc
> 2.58% libc-2.19.so [.] _int_free
> 2.38% libc-2.19.so [.] malloc
> 2.06% libpthread-2.19.so [.] pthread_mutex_lock
> 1.68% libc-2.19.so [.] malloc_consolidate
> 1.35% libc-2.19.so [.] __memcpy_sse2_unaligned
> 1.23% qemu-kvm [.] lduw_le_phys
> 1.18% qemu-kvm [.] find_next_zero_bit
> 1.02% qemu-kvm [.] object_unref
And Paolo suggested that we can get rid of qemu_get_ram_ptr
by storing the RAMBlock pointer into the memory region,
instead of the ram_addr_t value. And after applying this change,
I got much better performance indeed.
BTW, PATCH 3 is an incidental find.
Gonglei (3):
  exec: store RAMBlock pointer into memory region
  memory: optimize qemu_get_ram_ptr and qemu_ram_ptr_length
  memory: Remove the superfluous code

 exec.c                | 48 ++++++++++++++++++++++++++++++------------------
 include/exec/memory.h |  7 +++----
 memory.c              |  3 ++-
 3 files changed, 35 insertions(+), 23 deletions(-)

--
1.8.5.2
* [Qemu-devel] [PATCH 1/3] exec: store RAMBlock pointer into memory region
@ 2016-02-20 2:35 Gonglei
From: Gonglei @ 2016-02-20 2:35 UTC
To: qemu-devel; +Cc: pbonzini, Gonglei, peter.huangpeng

Each RAM memory region has a unique corresponding RAMBlock.
In the current implementation, the memory region only stores
the ram_addr, which is the offset into the RAM address space.
We need to query the global ram_list to find the RAMBlock
by ram_addr if we want to get the RAMBlock, which is very
expensive.

Now, we store the RAMBlock pointer in the memory region
structure. So, if we know the mr, we can easily get the
RAMBlock.

Signed-off-by: Gonglei <arei.gonglei@huawei.com>
---
 exec.c                | 2 ++
 include/exec/memory.h | 1 +
 memory.c              | 1 +
 3 files changed, 4 insertions(+)

diff --git a/exec.c b/exec.c
index 1f24500..e29e369 100644
--- a/exec.c
+++ b/exec.c
@@ -1717,6 +1717,8 @@ ram_addr_t qemu_ram_alloc_internal(ram_addr_t size, ram_addr_t max_size,
         error_propagate(errp, local_err);
         return -1;
     }
+    /* store the ram block pointer into memory region */
+    mr->ram_block = new_block;
     return addr;
 }
 
diff --git a/include/exec/memory.h b/include/exec/memory.h
index c92734a..23e2e3e 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -172,6 +172,7 @@ struct MemoryRegion {
     bool global_locking;
     uint8_t dirty_log_mask;
     ram_addr_t ram_addr;
+    void *ram_block;   /* RAMBlock pointer */
     Object *owner;
     const MemoryRegionIOMMUOps *iommu_ops;
 
diff --git a/memory.c b/memory.c
index 09041ed..b4451dd 100644
--- a/memory.c
+++ b/memory.c
@@ -912,6 +912,7 @@ void memory_region_init(MemoryRegion *mr,
     }
     mr->name = g_strdup(name);
     mr->owner = owner;
+    mr->ram_block = NULL;
 
     if (name) {
         char *escaped_name = memory_region_escape_name(name);
--
1.8.5.2
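A minimal sketch of the idea behind this patch, using toy types rather than QEMU's real structures. The names `ToyRAMBlock`, `ToyMemoryRegion`, `toy_find_block` and `toy_region_attach` are hypothetical and exist only for illustration; the real RAMBlock list is also RCU-protected, which this sketch ignores. It contrasts the list walk needed when only `ram_addr` is known with the direct pointer the region can cache at allocation time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins for QEMU's types; field names are illustrative only. */
typedef struct ToyRAMBlock {
    uint64_t offset;            /* start of the block in the RAM address space */
    uint64_t length;            /* size of the block in bytes */
    uint8_t *host;              /* host memory backing the block */
    struct ToyRAMBlock *next;   /* next block in the global list */
} ToyRAMBlock;

typedef struct ToyMemoryRegion {
    uint64_t ram_addr;          /* offset into the RAM address space */
    ToyRAMBlock *ram_block;     /* cached owning block, NULL if not RAM */
} ToyMemoryRegion;

/* Slow path: walk the list until a block covers addr. */
static ToyRAMBlock *toy_find_block(ToyRAMBlock *head, uint64_t addr)
{
    for (ToyRAMBlock *b = head; b != NULL; b = b->next) {
        /* Unsigned subtraction wraps for addr < offset, so one compare suffices. */
        if (addr - b->offset < b->length) {
            return b;
        }
    }
    return NULL;
}

/* Allocation time: remember the block in the region, as the patch does. */
static void toy_region_attach(ToyMemoryRegion *mr, ToyRAMBlock *block)
{
    assert(block != NULL);
    mr->ram_addr = block->offset;
    mr->ram_block = block;
}
```

Once the region holds the pointer, a caller that already has the region can skip `toy_find_block` entirely; the search cost grows with the number of blocks, while the cached pointer is a single load.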
* Re: [Qemu-devel] [PATCH 1/3] exec: store RAMBlock pointer into memory region
@ 2016-02-22 2:45 Fam Zheng
From: Fam Zheng @ 2016-02-22 2:45 UTC
To: Gonglei; +Cc: pbonzini, qemu-devel, peter.huangpeng

On Sat, 02/20 10:35, Gonglei wrote:
> Each RAM memory region has a unique corresponding RAMBlock.
> In the current implementation, the memory region only stores
> the ram_addr, which is the offset into the RAM address space.
> We need to query the global ram_list to find the RAMBlock
> by ram_addr if we want to get the RAMBlock, which is very
> expensive.
>
> Now, we store the RAMBlock pointer in the memory region
> structure. So, if we know the mr, we can easily get the
> RAMBlock.
>
> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
> ---
>  exec.c                | 2 ++
>  include/exec/memory.h | 1 +
>  memory.c              | 1 +
>  3 files changed, 4 insertions(+)
>
> diff --git a/exec.c b/exec.c
> index 1f24500..e29e369 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -1717,6 +1717,8 @@ ram_addr_t qemu_ram_alloc_internal(ram_addr_t size, ram_addr_t max_size,
>          error_propagate(errp, local_err);
>          return -1;
>      }
> +    /* store the ram block pointer into memory region */

The comment is superfluous IMHO, the code is quite self-explanatory.

> +    mr->ram_block = new_block;
>      return addr;
>  }
>
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index c92734a..23e2e3e 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -172,6 +172,7 @@ struct MemoryRegion {
>      bool global_locking;
>      uint8_t dirty_log_mask;
>      ram_addr_t ram_addr;
> +    void *ram_block;   /* RAMBlock pointer */

Why not add

    typedef struct RAMBlock RAMBlock;

then

    RAMBlock *ram_block;

?

>      Object *owner;
>      const MemoryRegionIOMMUOps *iommu_ops;
>
> diff --git a/memory.c b/memory.c
> index 09041ed..b4451dd 100644
> --- a/memory.c
> +++ b/memory.c
> @@ -912,6 +912,7 @@ void memory_region_init(MemoryRegion *mr,
>      }
>      mr->name = g_strdup(name);
>      mr->owner = owner;
> +    mr->ram_block = NULL;
>
>      if (name) {
>          char *escaped_name = memory_region_escape_name(name);
> --
> 1.8.5.2
>
>
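The suggestion above relies on a C forward declaration: a typedef of an incomplete struct is enough to declare a typed pointer member, so the header does not need the full structure layout. A minimal sketch, with hypothetical names (`DemoBlock`, `DemoRegion`, `demo_region_length`) standing in for the QEMU types:

```c
#include <assert.h>
#include <stddef.h>

/* Forward declaration: enough to declare pointers, no layout required. */
typedef struct DemoBlock DemoBlock;

typedef struct DemoRegion {
    DemoBlock *block;   /* typed pointer instead of void * */
} DemoRegion;

/* The full definition can live elsewhere, for example in another file. */
struct DemoBlock {
    size_t length;
};

/* Code that sees the full definition can dereference without a cast. */
static size_t demo_region_length(const DemoRegion *r)
{
    assert(r != NULL);
    return r->block ? r->block->length : 0;
}
```

Compared with `void *`, the typed pointer lets the compiler reject an assignment from an unrelated pointer type and removes the need for the explanatory comment on the member.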
* Re: [Qemu-devel] [PATCH 1/3] exec: store RAMBlock pointer into memory region
@ 2016-02-22 3:28 Gonglei (Arei)
From: Gonglei (Arei) @ 2016-02-22 3:28 UTC
To: Fam Zheng; +Cc: pbonzini@redhat.com, qemu-devel@nongnu.org, Huangpeng (Peter)

Hi Fam,

> From: Fam Zheng [mailto:famz@redhat.com]
> Sent: Monday, February 22, 2016 10:46 AM
>
> On Sat, 02/20 10:35, Gonglei wrote:
> > Each RAM memory region has a unique corresponding RAMBlock.
> > In the current implementation, the memory region only stores
> > the ram_addr, which is the offset into the RAM address space.
> > We need to query the global ram_list to find the RAMBlock
> > by ram_addr if we want to get the RAMBlock, which is very
> > expensive.
> >
> > Now, we store the RAMBlock pointer in the memory region
> > structure. So, if we know the mr, we can easily get the
> > RAMBlock.
> >
> > Signed-off-by: Gonglei <arei.gonglei@huawei.com>
> > ---
> >  exec.c                | 2 ++
> >  include/exec/memory.h | 1 +
> >  memory.c              | 1 +
> >  3 files changed, 4 insertions(+)
> >
> > diff --git a/exec.c b/exec.c
> > index 1f24500..e29e369 100644
> > --- a/exec.c
> > +++ b/exec.c
> > @@ -1717,6 +1717,8 @@ ram_addr_t qemu_ram_alloc_internal(ram_addr_t size, ram_addr_t max_size,
> >          error_propagate(errp, local_err);
> >          return -1;
> >      }
> > +    /* store the ram block pointer into memory region */
>
> The comment is superfluous IMHO, the code is quite self-explanatory.
>

Yes, agreed.

> > +    mr->ram_block = new_block;
> >      return addr;
> >  }
> >
> > diff --git a/include/exec/memory.h b/include/exec/memory.h
> > index c92734a..23e2e3e 100644
> > --- a/include/exec/memory.h
> > +++ b/include/exec/memory.h
> > @@ -172,6 +172,7 @@ struct MemoryRegion {
> >      bool global_locking;
> >      uint8_t dirty_log_mask;
> >      ram_addr_t ram_addr;
> > +    void *ram_block;   /* RAMBlock pointer */
>
> Why not add
>
>     typedef struct RAMBlock RAMBlock;
>
> then
>
>     RAMBlock *ram_block;
>
> ?
>

It's clearer. Will fix in v2, thanks :)

Regards,
-Gonglei

> >      Object *owner;
> >      const MemoryRegionIOMMUOps *iommu_ops;
> >
> > diff --git a/memory.c b/memory.c
> > index 09041ed..b4451dd 100644
> > --- a/memory.c
> > +++ b/memory.c
> > @@ -912,6 +912,7 @@ void memory_region_init(MemoryRegion *mr,
> >      }
> >      mr->name = g_strdup(name);
> >      mr->owner = owner;
> > +    mr->ram_block = NULL;
> >
> >      if (name) {
> >          char *escaped_name = memory_region_escape_name(name);
> > --
> > 1.8.5.2
> >
> >
* [Qemu-devel] [PATCH 2/3] memory: optimize qemu_get_ram_ptr and qemu_ram_ptr_length
@ 2016-02-20 2:35 Gonglei
From: Gonglei @ 2016-02-20 2:35 UTC
To: qemu-devel; +Cc: pbonzini, Gonglei, peter.huangpeng

These two functions spend too many CPU cycles finding the
RAMBlock by RAM address. After this patch, callers can pass
the RAMBlock pointer to them, so that most of the time they
no longer need to look up the RAMBlock. This gives better
performance in address translation.

Signed-off-by: Gonglei <arei.gonglei@huawei.com>
---
 exec.c                | 46 ++++++++++++++++++++++++++++------------------
 include/exec/memory.h |  4 ++--
 memory.c              |  2 +-
 3 files changed, 31 insertions(+), 21 deletions(-)

diff --git a/exec.c b/exec.c
index e29e369..f714238 100644
--- a/exec.c
+++ b/exec.c
@@ -1868,9 +1868,13 @@ void *qemu_get_ram_block_host_ptr(ram_addr_t addr)
  *
  * Called within RCU critical section.
  */
-void *qemu_get_ram_ptr(ram_addr_t addr)
+void *qemu_get_ram_ptr(RAMBlock *ram_block, ram_addr_t addr)
 {
-    RAMBlock *block = qemu_get_ram_block(addr);
+    RAMBlock *block = ram_block;
+
+    if (block == NULL) {
+        block = qemu_get_ram_block(addr);
+    }
 
     if (xen_enabled() && block->host == NULL) {
         /* We need to check if the requested address is in the RAM
@@ -1891,15 +1895,18 @@ void *qemu_get_ram_ptr(ram_addr_t addr)
  *
  * Called within RCU critical section.
  */
-static void *qemu_ram_ptr_length(ram_addr_t addr, hwaddr *size)
+static void *qemu_ram_ptr_length(RAMBlock *ram_block, ram_addr_t addr,
+                                 hwaddr *size)
 {
-    RAMBlock *block;
+    RAMBlock *block = ram_block;
     ram_addr_t offset_inside_block;
     if (*size == 0) {
         return NULL;
     }
 
-    block = qemu_get_ram_block(addr);
+    if (block == NULL) {
+        block = qemu_get_ram_block(addr);
+    }
     offset_inside_block = addr - block->offset;
     *size = MIN(*size, block->max_length - offset_inside_block);
 
@@ -2027,13 +2034,13 @@ static void notdirty_mem_write(void *opaque, hwaddr ram_addr,
     }
     switch (size) {
     case 1:
-        stb_p(qemu_get_ram_ptr(ram_addr), val);
+        stb_p(qemu_get_ram_ptr(NULL, ram_addr), val);
         break;
     case 2:
-        stw_p(qemu_get_ram_ptr(ram_addr), val);
+        stw_p(qemu_get_ram_ptr(NULL, ram_addr), val);
         break;
     case 4:
-        stl_p(qemu_get_ram_ptr(ram_addr), val);
+        stl_p(qemu_get_ram_ptr(NULL, ram_addr), val);
         break;
     default:
         abort();
@@ -2609,7 +2616,7 @@ static MemTxResult address_space_write_continue(AddressSpace *as, hwaddr addr,
         } else {
             addr1 += memory_region_get_ram_addr(mr);
             /* RAM case */
-            ptr = qemu_get_ram_ptr(addr1);
+            ptr = qemu_get_ram_ptr(mr->ram_block, addr1);
             memcpy(ptr, buf, l);
             invalidate_and_set_dirty(mr, addr1, l);
         }
@@ -2700,7 +2707,7 @@ MemTxResult address_space_read_continue(AddressSpace *as, hwaddr addr,
             }
         } else {
             /* RAM case */
-            ptr = qemu_get_ram_ptr(mr->ram_addr + addr1);
+            ptr = qemu_get_ram_ptr(mr->ram_block, mr->ram_addr + addr1);
             memcpy(buf, ptr, l);
         }
 
@@ -2785,7 +2792,7 @@ static inline void cpu_physical_memory_write_rom_internal(AddressSpace *as,
         } else {
             addr1 += memory_region_get_ram_addr(mr);
             /* ROM/RAM case */
-            ptr = qemu_get_ram_ptr(addr1);
+            ptr = qemu_get_ram_ptr(mr->ram_block, addr1);
             switch (type) {
             case WRITE_DATA:
                 memcpy(ptr, buf, l);
@@ -2997,7 +3004,7 @@ void *address_space_map(AddressSpace *as,
 
     memory_region_ref(mr);
     *plen = done;
-    ptr = qemu_ram_ptr_length(raddr + base, plen);
+    ptr = qemu_ram_ptr_length(mr->ram_block, raddr + base, plen);
     rcu_read_unlock();
 
     return ptr;
@@ -3081,7 +3088,8 @@ static inline uint32_t address_space_ldl_internal(AddressSpace *as, hwaddr addr,
 #endif
     } else {
         /* RAM case */
-        ptr = qemu_get_ram_ptr((memory_region_get_ram_addr(mr)
+        ptr = qemu_get_ram_ptr(mr->ram_block,
+                               (memory_region_get_ram_addr(mr)
                                 & TARGET_PAGE_MASK)
                                + addr1);
         switch (endian) {
@@ -3176,7 +3184,8 @@ static inline uint64_t address_space_ldq_internal(AddressSpace *as, hwaddr addr,
 #endif
     } else {
         /* RAM case */
-        ptr = qemu_get_ram_ptr((memory_region_get_ram_addr(mr)
+        ptr = qemu_get_ram_ptr(mr->ram_block,
+                               (memory_region_get_ram_addr(mr)
                                 & TARGET_PAGE_MASK)
                                + addr1);
         switch (endian) {
@@ -3291,7 +3300,8 @@ static inline uint32_t address_space_lduw_internal(AddressSpace *as,
 #endif
     } else {
         /* RAM case */
-        ptr = qemu_get_ram_ptr((memory_region_get_ram_addr(mr)
+        ptr = qemu_get_ram_ptr(mr->ram_block,
+                               (memory_region_get_ram_addr(mr)
                                 & TARGET_PAGE_MASK)
                                + addr1);
         switch (endian) {
@@ -3376,7 +3386,7 @@ void address_space_stl_notdirty(AddressSpace *as, hwaddr addr, uint32_t val,
         r = memory_region_dispatch_write(mr, addr1, val, 4, attrs);
     } else {
         addr1 += memory_region_get_ram_addr(mr) & TARGET_PAGE_MASK;
-        ptr = qemu_get_ram_ptr(addr1);
+        ptr = qemu_get_ram_ptr(mr->ram_block, addr1);
         stl_p(ptr, val);
 
         dirty_log_mask = memory_region_get_dirty_log_mask(mr);
@@ -3431,7 +3441,7 @@ static inline void address_space_stl_internal(AddressSpace *as,
     } else {
         /* RAM case */
         addr1 += memory_region_get_ram_addr(mr) & TARGET_PAGE_MASK;
-        ptr = qemu_get_ram_ptr(addr1);
+        ptr = qemu_get_ram_ptr(mr->ram_block, addr1);
         switch (endian) {
         case DEVICE_LITTLE_ENDIAN:
             stl_le_p(ptr, val);
@@ -3541,7 +3551,7 @@ static inline void address_space_stw_internal(AddressSpace *as,
     } else {
         /* RAM case */
         addr1 += memory_region_get_ram_addr(mr) & TARGET_PAGE_MASK;
-        ptr = qemu_get_ram_ptr(addr1);
+        ptr = qemu_get_ram_ptr(mr->ram_block, addr1);
         switch (endian) {
         case DEVICE_LITTLE_ENDIAN:
             stw_le_p(ptr, val);
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 23e2e3e..227fbf4 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -1390,7 +1390,7 @@ MemTxResult address_space_read_continue(AddressSpace *as, hwaddr addr,
                                         MemoryRegion *mr);
 MemTxResult address_space_read_full(AddressSpace *as, hwaddr addr,
                                     MemTxAttrs attrs, uint8_t *buf, int len);
-void *qemu_get_ram_ptr(ram_addr_t addr);
+void *qemu_get_ram_ptr(RAMBlock *ram_block, ram_addr_t addr);
 
 static inline bool memory_access_is_direct(MemoryRegion *mr, bool is_write)
 {
@@ -1431,7 +1431,7 @@ MemTxResult address_space_read(AddressSpace *as, hwaddr addr, MemTxAttrs attrs,
         mr = address_space_translate(as, addr, &addr1, &l, false);
         if (len == l && memory_access_is_direct(mr, false)) {
             addr1 += memory_region_get_ram_addr(mr);
-            ptr = qemu_get_ram_ptr(addr1);
+            ptr = qemu_get_ram_ptr(mr->ram_block, addr1);
             memcpy(buf, ptr, len);
         } else {
             result = address_space_read_continue(as, addr, attrs, buf, len,
diff --git a/memory.c b/memory.c
index b4451dd..0dd9695 100644
--- a/memory.c
+++ b/memory.c
@@ -1570,7 +1570,7 @@ void *memory_region_get_ram_ptr(MemoryRegion *mr)
         mr = mr->alias;
     }
     assert(mr->ram_addr != RAM_ADDR_INVALID);
-    ptr = qemu_get_ram_ptr(mr->ram_addr & TARGET_PAGE_MASK);
+    ptr = qemu_get_ram_ptr(mr->ram_block, mr->ram_addr & TARGET_PAGE_MASK);
     rcu_read_unlock();
 
     return ptr + offset;
--
1.8.5.2
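The pattern this patch applies to both functions is an optional hint with a fallback: use the caller's block pointer when it is non-NULL, and search only when it is NULL (as `notdirty_mem_write` does by passing NULL). A self-contained sketch of that pattern follows; `SketchBlock`, `sketch_find_block`, `sketch_get_ram_ptr` and the `sketch_lookups` counter are hypothetical names, and a plain array stands in for QEMU's block list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct SketchBlock {
    uint64_t offset;    /* start of the block in the RAM address space */
    uint64_t length;    /* size of the block in bytes */
    uint8_t *host;      /* host memory backing the block */
} SketchBlock;

static int sketch_lookups;  /* counts how often the slow path runs */

/* Slow path: linear search for the block that covers addr. */
static SketchBlock *sketch_find_block(SketchBlock *blocks, size_t n,
                                      uint64_t addr)
{
    sketch_lookups++;
    for (size_t i = 0; i < n; i++) {
        if (addr - blocks[i].offset < blocks[i].length) {
            return &blocks[i];
        }
    }
    return NULL;
}

/* Use the caller's hint when present; fall back to the search otherwise. */
static uint8_t *sketch_get_ram_ptr(SketchBlock *blocks, size_t n,
                                   SketchBlock *hint, uint64_t addr)
{
    SketchBlock *block = hint;

    if (block == NULL) {
        block = sketch_find_block(blocks, n, addr);
    }
    assert(block != NULL);
    return block->host + (addr - block->offset);
}
```

Both call styles return the same host pointer; only the call without a hint increments `sketch_lookups`. Note that the hint is trusted as given, so a caller passing a block that does not cover `addr` would get a wrong pointer.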
* [Qemu-devel] [PATCH 3/3] memory: Remove the superfluous code
@ 2016-02-20 2:35 Gonglei
From: Gonglei @ 2016-02-20 2:35 UTC
To: qemu-devel; +Cc: pbonzini, Gonglei, peter.huangpeng

Signed-off-by: Gonglei <arei.gonglei@huawei.com>
---
 include/exec/memory.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 227fbf4..5f96e6b 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -1399,8 +1399,6 @@ static inline bool memory_access_is_direct(MemoryRegion *mr, bool is_write)
     } else {
         return memory_region_is_ram(mr) || memory_region_is_romd(mr);
     }
-
-    return false;
 }
 
 /**
--
1.8.5.2
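This patch carries no commit message, so here is a small sketch of why the removed statement is dead code: when both arms of an if/else return, nothing after the conditional can execute. The function below is hypothetical (`demo_access_is_direct`, with plain booleans instead of a MemoryRegion), and its write branch is an assumption, since the hunk only shows the else branch.

```c
#include <assert.h>
#include <stdbool.h>

static bool demo_access_is_direct(bool is_ram, bool is_romd,
                                  bool readonly, bool is_write)
{
    if (is_write) {
        /* Assumed write-side condition; not visible in the patch context. */
        return is_ram && !readonly;
    } else {
        /* Read-side condition, mirroring the context line in the hunk. */
        return is_ram || is_romd;
    }
    /* Unreachable: both branches above return, so a trailing
     * "return false;" here would never run. */
}
```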
* Re: [Qemu-devel] [PATCH 0/3] memory: an optimization
@ 2016-02-20 9:47 Paolo Bonzini
From: Paolo Bonzini @ 2016-02-20 9:47 UTC
To: Gonglei, qemu-devel; +Cc: peter.huangpeng

On 20/02/2016 03:35, Gonglei wrote:
> Perf top shows that qemu_get_ram_ptr consumes too many CPU cycles.
>> 22.56% qemu-kvm [.] address_space_translate
>> 13.29% qemu-kvm [.] qemu_get_ram_ptr
>> 4.71% qemu-kvm [.] phys_page_find
>> 4.43% qemu-kvm [.] address_space_translate_internal
>> 3.47% libpthread-2.19.so [.] __pthread_mutex_unlock_usercnt
>> 3.08% qemu-kvm [.] qemu_ram_addr_from_host
>> 2.62% qemu-kvm [.] address_space_map
>> 2.61% libc-2.19.so [.] _int_malloc
>> 2.58% libc-2.19.so [.] _int_free
>> 2.38% libc-2.19.so [.] malloc
>> 2.06% libpthread-2.19.so [.] pthread_mutex_lock
>> 1.68% libc-2.19.so [.] malloc_consolidate
>> 1.35% libc-2.19.so [.] __memcpy_sse2_unaligned
>> 1.23% qemu-kvm [.] lduw_le_phys
>> 1.18% qemu-kvm [.] find_next_zero_bit
>> 1.02% qemu-kvm [.] object_unref
>
> And Paolo suggested that we can get rid of qemu_get_ram_ptr
> by storing the RAMBlock pointer into the memory region,
> instead of the ram_addr_t value. And after applying this change,
> I got much better performance indeed.

What's the gain like?

I've not reviewed the patch in depth, but what I can say is that I like
it a lot. It only does the bare minimum needed to provide the
optimization, but this also makes it very simple to understand. More
cleanups and further optimizations are possible (including removing
mr->ram_addr completely), but your patches really do one thing and
do it well. Good job!

Paolo
* Re: [Qemu-devel] [PATCH 0/3] memory: an optimization
@ 2016-02-20 10:34 Gonglei (Arei)
From: Gonglei (Arei) @ 2016-02-20 10:34 UTC
To: Paolo Bonzini, qemu-devel@nongnu.org; +Cc: Huangpeng (Peter)

Hi Paolo,

> -----Original Message-----
> From: Paolo Bonzini [mailto:paolo.bonzini@gmail.com] On Behalf Of Paolo
> Bonzini
> Sent: Saturday, February 20, 2016 5:48 PM
> To: Gonglei (Arei); qemu-devel@nongnu.org
> Cc: Huangpeng (Peter)
> Subject: Re: [PATCH 0/3] memory: an optimization
>
>
>
> On 20/02/2016 03:35, Gonglei wrote:
> > Perf top shows that qemu_get_ram_ptr consumes too many CPU cycles.
> >> 22.56% qemu-kvm [.] address_space_translate
> >> 13.29% qemu-kvm [.] qemu_get_ram_ptr
> >> 4.71% qemu-kvm [.] phys_page_find
> >> 4.43% qemu-kvm [.] address_space_translate_internal
> >> 3.47% libpthread-2.19.so [.] __pthread_mutex_unlock_usercnt
> >> 3.08% qemu-kvm [.] qemu_ram_addr_from_host
> >> 2.62% qemu-kvm [.] address_space_map
> >> 2.61% libc-2.19.so [.] _int_malloc
> >> 2.58% libc-2.19.so [.] _int_free
> >> 2.38% libc-2.19.so [.] malloc
> >> 2.06% libpthread-2.19.so [.] pthread_mutex_lock
> >> 1.68% libc-2.19.so [.] malloc_consolidate
> >> 1.35% libc-2.19.so [.] __memcpy_sse2_unaligned
> >> 1.23% qemu-kvm [.] lduw_le_phys
> >> 1.18% qemu-kvm [.] find_next_zero_bit
> >> 1.02% qemu-kvm [.] object_unref
> >
> > And Paolo suggested that we can get rid of qemu_get_ram_ptr
> > by storing the RAMBlock pointer into the memory region,
> > instead of the ram_addr_t value. And after applying this change,
> > I got much better performance indeed.
>
> What's the gain like?
>

After rebasing onto the current master branch, I found that
qemu_get_ram_ptr is no longer one of the main consumers. But I still
get some benefit from this patch set.

Before this optimization:
 1.26% qemu-kvm [.] qemu_get_ram_ptr
 0.89% qemu-kvm [.] qemu_get_ram_block

With the patch set applied:
 0.87% qemu-kvm [.] qemu_get_ram_ptr

Now the main consumers are (quite different from qemu-2.3):
 6.38% libpthread-2.19.so [.] __pthread_mutex_unlock_usercnt
 6.02% qemu-kvm [.] vring_desc_read.isra.26
 5.27% qemu-kvm [.] address_space_map
 4.45% qemu-kvm [.] qemu_ram_block_from_host
 4.13% libpthread-2.19.so [.] pthread_mutex_lock
 3.95% libc-2.19.so [.] _int_free
 3.46% qemu-kvm [.] address_space_translate_internal
 3.40% qemu-kvm [.] address_space_translate
 3.39% qemu-kvm [.] phys_page_find
 3.37% libc-2.19.so [.] _int_malloc
 3.21% qemu-kvm [.] stw_le_phys
 2.70% libc-2.19.so [.] malloc
 2.18% qemu-kvm [.] lduw_le_phys
 2.15% libc-2.19.so [.] __memcpy_sse2_unaligned
 1.58% qemu-kvm [.] address_space_write
 1.48% libc-2.19.so [.] memset
 1.22% qemu-kvm [.] virtqueue_map_desc
 1.22% libc-2.19.so [.] __libc_calloc
 1.21% qemu-kvm [.] virtio_notify

And the speed with the master branch plus my patch series:

Testing AES-128-CBC cipher:
 Encrypting in chunks of 256 bytes: done. 506.27 MiB in 5.01 secs: 100.97 MiB/sec (2073684 packets)
 Encrypting in chunks of 256 bytes: done. 505.89 MiB in 5.02 secs: 100.85 MiB/sec (2072106 packets)
 Encrypting in chunks of 256 bytes: done. 505.94 MiB in 5.02 secs: 100.86 MiB/sec (2072343 packets)
 Encrypting in chunks of 256 bytes: done. 505.96 MiB in 5.02 secs: 100.87 MiB/sec (2072412 packets)
 Encrypting in chunks of 256 bytes: done. 505.92 MiB in 5.02 secs: 100.86 MiB/sec (2072241 packets)
 Encrypting in chunks of 256 bytes: done. 506.36 MiB in 5.02 secs: 100.95 MiB/sec (2074057 packets)
 Encrypting in chunks of 256 bytes: done. 506.35 MiB in 5.01 secs: 101.02 MiB/sec (2073998 packets)
 Encrypting in chunks of 256 bytes: done. 505.41 MiB in 5.01 secs: 100.92 MiB/sec (2070157 packets)

> I've not reviewed the patch in depth, but what I can say is that I like
> it a lot. It only does the bare minimum needed to provide the
> optimization, but this also makes it very simple to understand. More
> cleanups and further optimizations are possible (including removing
> mr->ram_addr completely), but your patches really do one thing and
> do it well. Good job!
>

Thanks!

Regards,
-Gonglei