* [Qemu-devel] [PATCH v2 0/3] memory: an optimization
From: Gonglei @ 2016-02-22 8:34 UTC
To: qemu-devel; +Cc: pbonzini, Gonglei, peter.huangpeng

perf top tells me that qemu_get_ram_ptr consumes some CPU cycles.

Before this optimization:
 1.26%  qemu-kvm  [.] qemu_get_ram_ptr
 0.89%  qemu-kvm  [.] qemu_get_ram_block

With the patch set applied:
 0.87%  qemu-kvm  [.] qemu_get_ram_ptr

Paolo suggested that we can get rid of qemu_get_ram_ptr
by storing the RAMBlock pointer into the memory region,
instead of the ram_addr_t value. After applying this change,
I got much better performance indeed.

BTW, patch 3 is an incidental find.

v2:
 - use 'struct RAMBlock *' instead of 'void *' in patch 1 [Fam]
 - drop superfluous comments in patch 1 [Fam]

Gonglei (3):
  exec: store RAMBlock pointer into memory region
  memory: optimize qemu_get_ram_ptr and qemu_ram_ptr_length
  memory: Remove the superfluous code

 exec.c                | 48 ++++++++++++++++++++++++++++++------------------
 include/exec/memory.h |  8 ++++----
 memory.c              |  3 ++-
 3 files changed, 36 insertions(+), 23 deletions(-)

-- 
1.8.5.2

* [Qemu-devel] [PATCH v2 1/3] exec: store RAMBlock pointer into memory region
From: Gonglei @ 2016-02-22 8:34 UTC
To: qemu-devel; +Cc: pbonzini, Gonglei, peter.huangpeng

Each RAM memory region has a unique corresponding RAMBlock.
In the current implementation, the memory region only stores
the ram_addr, which is the offset into the RAM address space.
To get the RAM block, we need to query the global ram_list to
find it by ram_addr, which is very expensive.

Now, we store the RAMBlock pointer in the memory region
structure. So, if we know the mr, we can easily get the RAMBlock.

Signed-off-by: Gonglei <arei.gonglei@huawei.com>
---
 exec.c                | 2 ++
 include/exec/memory.h | 2 ++
 memory.c              | 1 +
 3 files changed, 5 insertions(+)

diff --git a/exec.c b/exec.c
index 1f24500..4c0114a 100644
--- a/exec.c
+++ b/exec.c
@@ -1717,6 +1717,8 @@ ram_addr_t qemu_ram_alloc_internal(ram_addr_t size, ram_addr_t max_size,
         error_propagate(errp, local_err);
         return -1;
     }
+
+    mr->ram_block = new_block;
     return addr;
 }
 
diff --git a/include/exec/memory.h b/include/exec/memory.h
index c92734a..4025729 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -156,6 +156,7 @@ struct MemoryRegionIOMMUOps {
 
 typedef struct CoalescedMemoryRange CoalescedMemoryRange;
 typedef struct MemoryRegionIoeventfd MemoryRegionIoeventfd;
+typedef struct RAMBlock RAMBlock;
 
 struct MemoryRegion {
     Object parent_obj;
@@ -172,6 +173,7 @@ struct MemoryRegion {
     bool global_locking;
     uint8_t dirty_log_mask;
     ram_addr_t ram_addr;
+    RAMBlock *ram_block;
     Object *owner;
     const MemoryRegionIOMMUOps *iommu_ops;
 
diff --git a/memory.c b/memory.c
index 09041ed..b4451dd 100644
--- a/memory.c
+++ b/memory.c
@@ -912,6 +912,7 @@ void memory_region_init(MemoryRegion *mr,
     }
     mr->name = g_strdup(name);
     mr->owner = owner;
+    mr->ram_block = NULL;
 
     if (name) {
         char *escaped_name = memory_region_escape_name(name);
-- 
1.8.5.2

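The trade-off described in patch 1/3 can be illustrated with a small standalone C sketch. The types below are simplified stand-ins, not QEMU's real definitions, and the function names find_block_by_addr and block_of_region are hypothetical. The sketch contrasts a walk over a linked list of blocks with a direct read of a pointer cached in the region.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for illustration; NOT QEMU's real definitions. */
typedef uint64_t ram_addr_t;

typedef struct RAMBlock {
    ram_addr_t offset;       /* start of the block in RAM address space */
    ram_addr_t max_length;   /* size of the block in bytes */
    struct RAMBlock *next;   /* next block in the (hypothetical) list */
} RAMBlock;

typedef struct MemoryRegion {
    ram_addr_t ram_addr;     /* offset in RAM address space */
    RAMBlock *ram_block;     /* cached owning block, the field patch 1 adds */
} MemoryRegion;

/* Before: walk the list until a block covers addr, O(number of blocks). */
RAMBlock *find_block_by_addr(RAMBlock *head, ram_addr_t addr)
{
    for (RAMBlock *b = head; b != NULL; b = b->next) {
        /* Unsigned subtraction wraps if addr < offset, so one compare
         * checks both bounds. */
        if (addr - b->offset < b->max_length) {
            return b;
        }
    }
    return NULL;
}

/* After: the region already knows its block, O(1). */
RAMBlock *block_of_region(const MemoryRegion *mr)
{
    assert(mr != NULL);
    return mr->ram_block;
}
```

With the cached pointer, the cost of finding the block no longer depends on how many blocks exist, at the price of one extra pointer per region that must be kept in sync with the allocation.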
* [Qemu-devel] [PATCH v2 2/3] memory: optimize qemu_get_ram_ptr and qemu_ram_ptr_length
From: Gonglei @ 2016-02-22 8:34 UTC
To: qemu-devel; +Cc: pbonzini, Gonglei, peter.huangpeng

These two functions spend too many CPU cycles finding the
RAMBlock by RAM address.

With this patch, callers can pass the RAMBlock pointer to them,
so most of the time they no longer need to look up the RAMBlock.
This gives better performance in address translation processing.

Signed-off-by: Gonglei <arei.gonglei@huawei.com>
---
 exec.c                | 46 ++++++++++++++++++++++++++++------------------
 include/exec/memory.h |  4 ++--
 memory.c              |  2 +-
 3 files changed, 31 insertions(+), 21 deletions(-)

diff --git a/exec.c b/exec.c
index 4c0114a..c62c439 100644
--- a/exec.c
+++ b/exec.c
@@ -1868,9 +1868,13 @@ void *qemu_get_ram_block_host_ptr(ram_addr_t addr)
  *
  * Called within RCU critical section.
  */
-void *qemu_get_ram_ptr(ram_addr_t addr)
+void *qemu_get_ram_ptr(RAMBlock *ram_block, ram_addr_t addr)
 {
-    RAMBlock *block = qemu_get_ram_block(addr);
+    RAMBlock *block = ram_block;
+
+    if (block == NULL) {
+        block = qemu_get_ram_block(addr);
+    }
 
     if (xen_enabled() && block->host == NULL) {
         /* We need to check if the requested address is in the RAM
@@ -1891,15 +1895,18 @@ void *qemu_get_ram_ptr(ram_addr_t addr)
  *
  * Called within RCU critical section.
  */
-static void *qemu_ram_ptr_length(ram_addr_t addr, hwaddr *size)
+static void *qemu_ram_ptr_length(RAMBlock *ram_block, ram_addr_t addr,
+                                 hwaddr *size)
 {
-    RAMBlock *block;
+    RAMBlock *block = ram_block;
     ram_addr_t offset_inside_block;
     if (*size == 0) {
         return NULL;
     }
 
-    block = qemu_get_ram_block(addr);
+    if (block == NULL) {
+        block = qemu_get_ram_block(addr);
+    }
     offset_inside_block = addr - block->offset;
     *size = MIN(*size, block->max_length - offset_inside_block);
 
@@ -2027,13 +2034,13 @@ static void notdirty_mem_write(void *opaque, hwaddr ram_addr,
     }
     switch (size) {
     case 1:
-        stb_p(qemu_get_ram_ptr(ram_addr), val);
+        stb_p(qemu_get_ram_ptr(NULL, ram_addr), val);
         break;
     case 2:
-        stw_p(qemu_get_ram_ptr(ram_addr), val);
+        stw_p(qemu_get_ram_ptr(NULL, ram_addr), val);
         break;
     case 4:
-        stl_p(qemu_get_ram_ptr(ram_addr), val);
+        stl_p(qemu_get_ram_ptr(NULL, ram_addr), val);
         break;
     default:
         abort();
@@ -2609,7 +2616,7 @@ static MemTxResult address_space_write_continue(AddressSpace *as, hwaddr addr,
         } else {
             addr1 += memory_region_get_ram_addr(mr);
             /* RAM case */
-            ptr = qemu_get_ram_ptr(addr1);
+            ptr = qemu_get_ram_ptr(mr->ram_block, addr1);
             memcpy(ptr, buf, l);
             invalidate_and_set_dirty(mr, addr1, l);
         }
@@ -2700,7 +2707,7 @@ MemTxResult address_space_read_continue(AddressSpace *as, hwaddr addr,
             }
         } else {
             /* RAM case */
-            ptr = qemu_get_ram_ptr(mr->ram_addr + addr1);
+            ptr = qemu_get_ram_ptr(mr->ram_block, mr->ram_addr + addr1);
             memcpy(buf, ptr, l);
         }
 
@@ -2785,7 +2792,7 @@ static inline void cpu_physical_memory_write_rom_internal(AddressSpace *as,
         } else {
             addr1 += memory_region_get_ram_addr(mr);
             /* ROM/RAM case */
-            ptr = qemu_get_ram_ptr(addr1);
+            ptr = qemu_get_ram_ptr(mr->ram_block, addr1);
             switch (type) {
             case WRITE_DATA:
                 memcpy(ptr, buf, l);
@@ -2997,7 +3004,7 @@ void *address_space_map(AddressSpace *as,
 
     memory_region_ref(mr);
     *plen = done;
-    ptr = qemu_ram_ptr_length(raddr + base, plen);
+    ptr = qemu_ram_ptr_length(mr->ram_block, raddr + base, plen);
     rcu_read_unlock();
 
     return ptr;
@@ -3081,7 +3088,8 @@ static inline uint32_t address_space_ldl_internal(AddressSpace *as, hwaddr addr,
 #endif
     } else {
         /* RAM case */
-        ptr = qemu_get_ram_ptr((memory_region_get_ram_addr(mr)
+        ptr = qemu_get_ram_ptr(mr->ram_block,
+                               (memory_region_get_ram_addr(mr)
                                 & TARGET_PAGE_MASK)
                                + addr1);
         switch (endian) {
@@ -3176,7 +3184,8 @@ static inline uint64_t address_space_ldq_internal(AddressSpace *as, hwaddr addr,
 #endif
     } else {
         /* RAM case */
-        ptr = qemu_get_ram_ptr((memory_region_get_ram_addr(mr)
+        ptr = qemu_get_ram_ptr(mr->ram_block,
+                               (memory_region_get_ram_addr(mr)
                                 & TARGET_PAGE_MASK)
                                + addr1);
         switch (endian) {
@@ -3291,7 +3300,8 @@ static inline uint32_t address_space_lduw_internal(AddressSpace *as,
 #endif
     } else {
         /* RAM case */
-        ptr = qemu_get_ram_ptr((memory_region_get_ram_addr(mr)
+        ptr = qemu_get_ram_ptr(mr->ram_block,
+                               (memory_region_get_ram_addr(mr)
                                 & TARGET_PAGE_MASK)
                                + addr1);
         switch (endian) {
@@ -3376,7 +3386,7 @@ void address_space_stl_notdirty(AddressSpace *as, hwaddr addr, uint32_t val,
         r = memory_region_dispatch_write(mr, addr1, val, 4, attrs);
     } else {
         addr1 += memory_region_get_ram_addr(mr) & TARGET_PAGE_MASK;
-        ptr = qemu_get_ram_ptr(addr1);
+        ptr = qemu_get_ram_ptr(mr->ram_block, addr1);
         stl_p(ptr, val);
 
         dirty_log_mask = memory_region_get_dirty_log_mask(mr);
@@ -3431,7 +3441,7 @@ static inline void address_space_stl_internal(AddressSpace *as,
     } else {
         /* RAM case */
         addr1 += memory_region_get_ram_addr(mr) & TARGET_PAGE_MASK;
-        ptr = qemu_get_ram_ptr(addr1);
+        ptr = qemu_get_ram_ptr(mr->ram_block, addr1);
         switch (endian) {
         case DEVICE_LITTLE_ENDIAN:
             stl_le_p(ptr, val);
@@ -3541,7 +3551,7 @@ static inline void address_space_stw_internal(AddressSpace *as,
     } else {
         /* RAM case */
         addr1 += memory_region_get_ram_addr(mr) & TARGET_PAGE_MASK;
-        ptr = qemu_get_ram_ptr(addr1);
+        ptr = qemu_get_ram_ptr(mr->ram_block, addr1);
         switch (endian) {
         case DEVICE_LITTLE_ENDIAN:
             stw_le_p(ptr, val);
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 4025729..1cf2e51 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -1391,7 +1391,7 @@ MemTxResult address_space_read_continue(AddressSpace *as, hwaddr addr,
                                         MemoryRegion *mr);
 MemTxResult address_space_read_full(AddressSpace *as, hwaddr addr,
                                     MemTxAttrs attrs, uint8_t *buf, int len);
-void *qemu_get_ram_ptr(ram_addr_t addr);
+void *qemu_get_ram_ptr(RAMBlock *ram_block, ram_addr_t addr);
 
 static inline bool memory_access_is_direct(MemoryRegion *mr, bool is_write)
 {
@@ -1432,7 +1432,7 @@ MemTxResult address_space_read(AddressSpace *as, hwaddr addr, MemTxAttrs attrs,
         mr = address_space_translate(as, addr, &addr1, &l, false);
         if (len == l && memory_access_is_direct(mr, false)) {
             addr1 += memory_region_get_ram_addr(mr);
-            ptr = qemu_get_ram_ptr(addr1);
+            ptr = qemu_get_ram_ptr(mr->ram_block, addr1);
             memcpy(buf, ptr, len);
         } else {
             result = address_space_read_continue(as, addr, attrs, buf, len,
diff --git a/memory.c b/memory.c
index b4451dd..0dd9695 100644
--- a/memory.c
+++ b/memory.c
@@ -1570,7 +1570,7 @@ void *memory_region_get_ram_ptr(MemoryRegion *mr)
         mr = mr->alias;
     }
     assert(mr->ram_addr != RAM_ADDR_INVALID);
-    ptr = qemu_get_ram_ptr(mr->ram_block, mr->ram_addr & TARGET_PAGE_MASK);
+    ptr = qemu_get_ram_ptr(mr->ram_block, mr->ram_addr & TARGET_PAGE_MASK);
     rcu_read_unlock();
 
     return ptr + offset;
-- 
1.8.5.2

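The calling convention introduced in patch 2/3 (use the caller's block when one is passed, fall back to the lookup when it is NULL) can be sketched in isolation as follows. This is only an illustration under assumptions: the types are simplified stand-ins, and ram_list_head, slow_lookups, lookup_block and get_ram_ptr are hypothetical names, not QEMU symbols. The Xen special case in the real function is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for illustration; NOT QEMU's real definitions. */
typedef uint64_t ram_addr_t;

typedef struct RAMBlock {
    uint8_t *host;           /* host memory backing this block */
    ram_addr_t offset;       /* start of the block in RAM address space */
    ram_addr_t max_length;   /* size of the block in bytes */
    struct RAMBlock *next;
} RAMBlock;

RAMBlock *ram_list_head;     /* hypothetical global list of blocks */
unsigned slow_lookups;       /* counts how often the slow path runs */

/* Slow path: search the list for the block that covers addr. */
RAMBlock *lookup_block(ram_addr_t addr)
{
    slow_lookups++;
    for (RAMBlock *b = ram_list_head; b != NULL; b = b->next) {
        if (addr - b->offset < b->max_length) {
            return b;
        }
    }
    return NULL;
}

/* Callers that know the block pass it as a hint; others pass NULL. */
void *get_ram_ptr(RAMBlock *hint, ram_addr_t addr)
{
    RAMBlock *block = hint;

    if (block == NULL) {
        block = lookup_block(addr);
    }
    assert(block != NULL);
    return block->host + (addr - block->offset);
}
```

Both call styles return the same host pointer. Only the NULL case pays for the list walk, which matches the "most of the time" wording in the commit message: the notdirty_mem_write call sites in the diff still pass NULL.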
* [Qemu-devel] [PATCH v2 3/3] memory: Remove the superfluous code
From: Gonglei @ 2016-02-22 8:34 UTC
To: qemu-devel; +Cc: pbonzini, Gonglei, peter.huangpeng

Signed-off-by: Gonglei <arei.gonglei@huawei.com>
---
 include/exec/memory.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 1cf2e51..4e5a145 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -1400,8 +1400,6 @@ static inline bool memory_access_is_direct(MemoryRegion *mr, bool is_write)
     } else {
         return memory_region_is_ram(mr) || memory_region_is_romd(mr);
     }
-
-    return false;
 }
 
 /**
-- 
1.8.5.2

* Re: [Qemu-devel] [PATCH v2 0/3] memory: an optimization
From: Paolo Bonzini @ 2016-02-22 10:22 UTC
To: Gonglei, qemu-devel; +Cc: peter.huangpeng

On 22/02/2016 09:34, Gonglei wrote:
> perf top tells me that qemu_get_ram_ptr consumes some CPU cycles.
>
> Before this optimization:
>  1.26%  qemu-kvm  [.] qemu_get_ram_ptr
>  0.89%  qemu-kvm  [.] qemu_get_ram_block
>
> With the patch set applied:
>  0.87%  qemu-kvm  [.] qemu_get_ram_ptr
>
> Paolo suggested that we can get rid of qemu_get_ram_ptr
> by storing the RAMBlock pointer into the memory region,
> instead of the ram_addr_t value. After applying this change,
> I got much better performance indeed.
>
> BTW, patch 3 is an incidental find.
>
> v2:
>  - use 'struct RAMBlock *' instead of 'void *' in patch 1 [Fam]
>  - drop superfluous comments in patch 1 [Fam]
>
> Gonglei (3):
>   exec: store RAMBlock pointer into memory region
>   memory: optimize qemu_get_ram_ptr and qemu_ram_ptr_length
>   memory: Remove the superfluous code
>
>  exec.c                | 48 ++++++++++++++++++++++++++++++------------------
>  include/exec/memory.h |  8 ++++----
>  memory.c              |  3 ++-
>  3 files changed, 36 insertions(+), 23 deletions(-)
>

Thanks Lei and Fam, patches queued.

Paolo

* Re: [Qemu-devel] [PATCH v2 0/3] memory: an optimization
From: Fam Zheng @ 2016-02-23 3:49 UTC
To: Paolo Bonzini; +Cc: Gonglei, qemu-devel, peter.huangpeng

On Mon, 02/22 11:22, Paolo Bonzini wrote:
>
>
> On 22/02/2016 09:34, Gonglei wrote:
> > perf top tells me that qemu_get_ram_ptr consumes some CPU cycles.
> >
> > Before this optimization:
> >  1.26%  qemu-kvm  [.] qemu_get_ram_ptr
> >  0.89%  qemu-kvm  [.] qemu_get_ram_block
> >
> > With the patch set applied:
> >  0.87%  qemu-kvm  [.] qemu_get_ram_ptr
> >
> > Paolo suggested that we can get rid of qemu_get_ram_ptr
> > by storing the RAMBlock pointer into the memory region,
> > instead of the ram_addr_t value. After applying this change,
> > I got much better performance indeed.
> >
> > BTW, patch 3 is an incidental find.
> >
> > v2:
> >  - use 'struct RAMBlock *' instead of 'void *' in patch 1 [Fam]
> >  - drop superfluous comments in patch 1 [Fam]
> >
> > Gonglei (3):
> >   exec: store RAMBlock pointer into memory region
> >   memory: optimize qemu_get_ram_ptr and qemu_ram_ptr_length
> >   memory: Remove the superfluous code
> >
> >  exec.c                | 48 ++++++++++++++++++++++++++++++------------------
> >  include/exec/memory.h |  8 ++++----
> >  memory.c              |  3 ++-
> >  3 files changed, 36 insertions(+), 23 deletions(-)
> >
>
> Thanks Lei and Fam, patches queued.

Thanks!

Actually I'd like to clean this up a bit more: move the assignment
to mr->ram_block from exec.c to memory.c, and drop mr->ram_addr. I had
already done this on top of master last Friday, before v1 of this
series was posted (oops! :), but I can rebase on top of these patches.

On top of that, I think we can replicate the ram_list.mru_block trick
as AddressSpaceDispatch.mru_section, to further reduce the calls to
qemu_get_ram_ptr.

Paolo, is there a git branch I can base off?

Fam

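The "most recently used" trick Fam refers to can be sketched generically as a single-entry cache in front of a slower search. The code below is a rough illustration under assumptions, not QEMU code: Section, SectionTable, section_covers and section_lookup are hypothetical names, and the sketch ignores the RCU and concurrency concerns a real implementation would have to handle.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types for illustration; NOT QEMU's real definitions. */
typedef uint64_t addr_t;

typedef struct Section {
    addr_t start;            /* first address covered by the section */
    addr_t size;             /* number of bytes covered */
} Section;

typedef struct SectionTable {
    Section *sections;       /* all known sections */
    size_t count;
    Section *mru;            /* most recently used section, or NULL */
    unsigned full_scans;     /* counts slow-path searches */
} SectionTable;

int section_covers(const Section *s, addr_t addr)
{
    /* Unsigned subtraction wraps if addr < start, so one compare suffices. */
    return addr - s->start < s->size;
}

Section *section_lookup(SectionTable *t, addr_t addr)
{
    assert(t != NULL);

    /* Fast path: consecutive accesses often hit the same section. */
    if (t->mru != NULL && section_covers(t->mru, addr)) {
        return t->mru;
    }

    /* Slow path: full search, then remember the hit for next time. */
    t->full_scans++;
    for (size_t i = 0; i < t->count; i++) {
        if (section_covers(&t->sections[i], addr)) {
            t->mru = &t->sections[i];
            return t->mru;
        }
    }
    return NULL;
}
```

The cache only pays off when accesses show locality. A miss costs one extra comparison before the normal search.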