From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([208.118.235.92]:45973) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1UcBcl-0002Md-1A for qemu-devel@nongnu.org; Tue, 14 May 2013 05:34:25 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1UcBch-0008DK-6w for qemu-devel@nongnu.org; Tue, 14 May 2013 05:34:22 -0400
Received: from mail-gh0-f178.google.com ([209.85.160.178]:37466) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1UcBcg-0008Cq-US for qemu-devel@nongnu.org; Tue, 14 May 2013 05:34:19 -0400
Received: by mail-gh0-f178.google.com with SMTP id g15so37947ghb.37 for ; Tue, 14 May 2013 02:34:18 -0700 (PDT)
Sender: Paolo Bonzini
Message-ID: <51920512.3090708@redhat.com>
Date: Tue, 14 May 2013 11:34:10 +0200
From: Paolo Bonzini
MIME-Version: 1.0
References: <1368415264-10800-1-git-send-email-qemulist@gmail.com>
 <1368415264-10800-3-git-send-email-qemulist@gmail.com>
 <5190B2FD.7090008@redhat.com>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [RFC PATCH 2/2] mem: prepare address_space listener rcu style
To: liu ping fan
Cc: Peter Maydell, Anthony Liguori, "Michael S. Tsirkin", Jan Kiszka, qemu-devel@nongnu.org, Stefan Hajnoczi

On 14/05/2013 07:47, liu ping fan wrote:
> On Mon, May 13, 2013 at 5:31 PM, Paolo Bonzini wrote:
>> On 13/05/2013 05:21, Liu Ping Fan wrote:
>>> From: Liu Ping Fan
>>>
>>> Each address space listener has PhysPageMap *cur_map, *next_map,
>>> the switch from cur_map to next_map complete the RCU style. The
>>> mem_commit() do the switch, and it is against reader but AddressSpace's
>>> lock or later RCU mechanism (around address_space_translate() ).
>>>
>>> Signed-off-by: Liu Ping Fan
>>> ---
>>>  exec.c                         | 36 +++++++++++++++++++++++++++---------
>>>  include/exec/memory-internal.h | 11 ++++++++++-
>>>  2 files changed, 37 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/exec.c b/exec.c
>>> index bb4e540..e5871d6 100644
>>> --- a/exec.c
>>> +++ b/exec.c
>>> @@ -186,24 +186,26 @@ static void phys_page_set(AddressSpaceDispatch *d,
>>>                            hwaddr index, hwaddr nb,
>>>                            uint16_t leaf)
>>>  {
>>> +    PhysPageMap *map = d->next_map;
>>>      /* Wildly overreserve - it doesn't matter much. */
>>>      phys_map_node_reserve(3 * P_L2_LEVELS);
>>>
>>> -    phys_page_set_level(&d->phys_map, &index, &nb, leaf, P_L2_LEVELS - 1);
>>> +    phys_page_set_level(&map->root, &index, &nb, leaf, P_L2_LEVELS - 1);
>>>  }
>>>
>>>  static PhysSection *phys_section_find(AddressSpaceDispatch *d,
>>>                                        hwaddr index)
>>>  {
>>> -    PhysPageEntry lp = d->phys_map;
>>>      PhysPageEntry *p;
>>> -    PhysSection *phys_sections = cur_pgtbl->phys_sections;
>>> -    Node *phys_map_nodes = cur_pgtbl->phys_map_nodes;
>>> +    PhysPageEntry lp = d->cur_map->root;
>>> +    PhysPageTable *pgtbl = d->cur_map->pgtbl;
>>> +    PhysSection *phys_sections = pgtbl->phys_sections;
>>> +    Node *phys_map_nodes = pgtbl->phys_map_nodes;
>>>      int i;
>>>
>>>      for (i = P_L2_LEVELS - 1; i >= 0 && !lp.is_leaf; i--) {
>>>          if (lp.ptr == PHYS_MAP_NODE_NIL) {
>>> -            return &phys_sections[cur_pgtbl->phys_section_unassigned];
>>> +            return &phys_sections[pgtbl->phys_section_unassigned];
>>>          }
>>>          p = phys_map_nodes[lp.ptr];
>>>          lp = p[(index >> (i * L2_BITS)) & (L2_SIZE - 1)];
>>> @@ -234,7 +236,7 @@ MemoryRegionSection *address_space_translate(AddressSpace *as, hwaddr addr,
>>>      IOMMUTLBEntry iotlb;
>>>      MemoryRegionSection *section;
>>>      hwaddr len = *plen;
>>> -
>>> +    PhysPageTable *pgtbl = cur_pgtbl;
>>
>> d->cur_map->pgtbl.
>>
>>>      for (;;) {
>>>          section = address_space_lookup_region(as, addr);
>>>
>>> @@ -254,7 +256,7 @@ MemoryRegionSection *address_space_translate(AddressSpace *as, hwaddr addr,
>>>                  | (addr & iotlb.addr_mask));
>>>          len = MIN(len, (addr | iotlb.addr_mask) - addr + 1);
>>>          if (!iotlb.perm[is_write]) {
>>> -            section = &cur_pgtbl->phys_sections[cur_pgtbl->phys_section_unassigned].section;
>>> +            section = &pgtbl->phys_sections[pgtbl->phys_section_unassigned].section;
>>>              break;
>>>          }
>>>
>>> @@ -1703,7 +1705,21 @@ static void mem_begin(MemoryListener *listener)
>>>  {
>>>      AddressSpaceDispatch *d = container_of(listener, AddressSpaceDispatch, listener);
>>>
>>> -    d->phys_map.ptr = PHYS_MAP_NODE_NIL;
>>> +    d->next_map = g_new0(PhysPageMap, 1);
>>> +    d->next_map->pgtbl = next_pgtbl;
>>> +}
>>> +
>>> +static void mem_commit(MemoryListener *listener)
>>> +{
>>> +    AddressSpaceDispatch *d = container_of(listener, AddressSpaceDispatch, listener);
>>> +    PhysPageMap *m = d->cur_map;
>>> +
>>> +    d->cur_map = d->next_map;
>>> +    /* Fixme, Currently, we rely on biglock or address-space lock against
>>> +     * reader. So here, we can safely drop it.
>>> +     * After RCU, should change to call_rcu()
>>> +     */
>>> +    g_free(m);
>>>  }
>>>
>>>  static void core_begin(MemoryListener *listener)
>>> @@ -1771,11 +1787,12 @@ void address_space_init_dispatch(AddressSpace *as)
>>>  {
>>>      AddressSpaceDispatch *d = g_new(AddressSpaceDispatch, 1);
>>>
>>> -    d->phys_map = (PhysPageEntry) { .ptr = PHYS_MAP_NODE_NIL, .is_leaf = 0 };
>>> +    d->cur_map = g_new0(PhysPageMap, 1);
>>>      d->listener = (MemoryListener) {
>>>          .begin = mem_begin,
>>>          .region_add = mem_add,
>>>          .region_nop = mem_add,
>>> +        .commit = mem_commit,
>>>          .priority = 0,
>>>      };
>>>      as->dispatch = d;
>>> @@ -1787,6 +1804,7 @@ void address_space_destroy_dispatch(AddressSpace *as)
>>>      AddressSpaceDispatch *d = as->dispatch;
>>>
>>>      memory_listener_unregister(&d->listener);
>>> +    g_free(d->cur_map);
>>>      g_free(d);
>>>      as->dispatch = NULL;
>>>  }
>>> diff --git a/include/exec/memory-internal.h b/include/exec/memory-internal.h
>>> index 1b156fd..0dfe260 100644
>>> --- a/include/exec/memory-internal.h
>>> +++ b/include/exec/memory-internal.h
>>> @@ -30,13 +30,22 @@ struct PhysPageEntry {
>>>      uint16_t ptr : 15;
>>>  };
>>>
>>> +struct PhysPageTable;
>>> +typedef struct PhysPageMap PhysPageMap;
>>> +
>>> +struct PhysPageMap {
>>> +    PhysPageEntry root;
>>> +    struct PhysPageTable *pgtbl;
>>
>> cur_pgtbl should be introduced in patch 1 already.
>>
> PhysPageMap is a member of each AddressSpaceDispatch. And we achieve
> RCU based on PhysPageMap *cur_map=next_map.

But assignment of cur_map is not atomic.  That's what I was trying to
achieve with the do...while loop below.

> While cur_pgtbl/next_pgtbl are shared by all AddressSpaceDispatch.

You cannot share them, because the root of some AddressSpaceDispatch
structures will still refer to the old pagetable.  You need to attach
the root of each ASD to the right pagetable, and the simplest way to do
it is to put cur_pgtbl/next_pgtbl in the ASD.  And this problem exists
in the first patch already, so that's where you have to put it.
>>> +};
>>> +
>>>  typedef struct AddressSpaceDispatch AddressSpaceDispatch;
>>>
>>>  struct AddressSpaceDispatch {
>>>      /* This is a multi-level map on the physical address space.
>>>       * The bottom level has pointers to MemoryRegionSections.
>>>       */
>>> -    PhysPageEntry phys_map;
>>> +    PhysPageMap *cur_map;
>>> +    PhysPageMap *next_map;
>>
>> Pointers are quite expensive here.  With RCU we can fetch a consistent
>> root/table pair like this:
>>
>>     rcu_read_lock();
>>     do {
>>         pgtbl = d->cur_pgtbl;
>>         smp_rmb();
>>         root = d->cur_root;
>>
>>         /* RCU ensures that d->cur_pgtbl remains alive, thus it cannot
>>          * be recycled while this loop is running.  If
>>          * d->cur_pgtbl == pgtbl, the root is the right one for this
>>          * pgtable.
>>          */
>>         smp_rmb();
>>     } while (d->cur_pgtbl == pgtbl);

Ouch, != of course.

>>     ...
>>     rcu_read_unlock();
>>
> It seems to break the semantic of rcu_dereference() and rcu_assign().

It doesn't.  In fact it is even stronger: I'm using a "full" rmb instead
of read_barrier_depends.

> If pointers are expensive, how about this:
> if (unlikely(d->prev_map!=d->cur_map)) {
>     d->root = d->cur_map->root;
>     d->pgtbl = d->cur_map->root;
>     d->prev_map = d->cur_map;
> }
> So usually, we use cache value.

Doesn't work, it has an ABA problem.  In my solution above, RCU avoids
ABA because the read and the check are under the same RCU critical
section.  In your solution, the read and the check are under different
RCU critical sections, so it doesn't work.

Paolo

> Thanks and regards,
> Pingfan
>> Remember to have a matching smp_wmb() in mem_commit, and to write ->root
>> first:
>>
>>     old_pgtbl = d->cur_pgtbl;
>>     smp_wmb();
>>     d->cur_root = d->next_root;
>>
>>     /* Write the root before updating the page table.  */
>>     smp_wmb();
>>     d->cur_pgtbl = d->next_pgtbl;
>>
>>     /* Write cur_pgtbl before possibly destroying the old one.  */
>>     smp_mb();
>>     page_table_unref(old_pgtbl); /* uses call_rcu if --refcount == 0 */
>>
>> If you are renaming fields, please do it as the first step.
>>
>> Paolo
>>
>>>      MemoryListener listener;
>>>  };
>>>
>>>
>>
>
>
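
To make the ABA objection to the cached prev_map check concrete, here is
a minimal single-threaded sketch.  It is not QEMU code: Map, Dispatch,
pool, map_alloc(), map_free(), commit() and cached_lookup() are all
hypothetical names, and the two-slot pool is only an assumption that
stands in for an allocator handing a freed address back out.  The point
it tries to show is that comparing pointers cannot tell "same map" from
"new map that happens to live at the old address" once the old map has
been freed between two lookups.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for PhysPageMap: only the root matters here. */
typedef struct Map {
    int root;
    bool in_use;
} Map;

/* Assumption: a tiny pool that reuses freed slots, like a real allocator may. */
#define POOL_SIZE 2
static Map pool[POOL_SIZE];

static Map *map_alloc(int root)
{
    for (size_t i = 0; i < POOL_SIZE; i++) {
        if (!pool[i].in_use) {
            pool[i].in_use = true;
            pool[i].root = root;
            return &pool[i];
        }
    }
    return NULL; /* pool exhausted */
}

static void map_free(Map *m)
{
    m->in_use = false;
}

/* Hypothetical stand-in for AddressSpaceDispatch with the proposed cache. */
typedef struct Dispatch {
    Map *cur_map;
    Map *prev_map;   /* map the cached root was copied from */
    int cached_root;
} Dispatch;

/* The proposed fast path: refresh the cache only when the pointer changes. */
static int cached_lookup(Dispatch *d)
{
    if (d->prev_map != d->cur_map) {
        d->cached_root = d->cur_map->root;
        d->prev_map = d->cur_map;
    }
    return d->cached_root;
}

/* Writer side: publish a new map, then free the old one immediately
 * (no grace period, which is what lets the address be recycled). */
static void commit(Dispatch *d, int new_root)
{
    Map *old = d->cur_map;
    Map *next = map_alloc(new_root);

    assert(next != NULL);
    d->cur_map = next;
    map_free(old);
}
```

If a reader calls cached_lookup() once, and the writer then calls
commit() twice before the reader's next lookup, the second new map lands
in the slot the first one was freed from.  cur_map then compares equal
to prev_map, so the lookup keeps returning the root cached from the
original map.  Keeping the read and the check inside one RCU critical
section rules this out, because the old map cannot be freed and reused
while that section is open.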