From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([208.118.235.92]:33449)
    by lists.gnu.org with esmtp (Exim 4.71)
    (envelope-from )
    id 1Ubp6r-0006s0-3p
    for qemu-devel@nongnu.org; Mon, 13 May 2013 05:31:59 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
    (envelope-from )
    id 1Ubp6p-0008Mo-Nj
    for qemu-devel@nongnu.org; Mon, 13 May 2013 05:31:57 -0400
Received: from mx1.redhat.com ([209.132.183.28]:60974)
    by eggs.gnu.org with esmtp (Exim 4.71)
    (envelope-from )
    id 1Ubp6p-0008Mk-FW
    for qemu-devel@nongnu.org; Mon, 13 May 2013 05:31:55 -0400
Message-ID: <5190B2FD.7090008@redhat.com>
Date: Mon, 13 May 2013 11:31:41 +0200
From: Paolo Bonzini
MIME-Version: 1.0
References: <1368415264-10800-1-git-send-email-qemulist@gmail.com>
    <1368415264-10800-3-git-send-email-qemulist@gmail.com>
In-Reply-To: <1368415264-10800-3-git-send-email-qemulist@gmail.com>
Content-Type: text/plain; charset=ISO-8859-15
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [RFC PATCH 2/2] mem: prepare address_space listener rcu style
To: Liu Ping Fan
Cc: Peter Maydell, Anthony Liguori, "Michael S. Tsirkin", Jan Kiszka,
    qemu-devel@nongnu.org, Stefan Hajnoczi

On 13/05/2013 05:21, Liu Ping Fan wrote:
> From: Liu Ping Fan
>
> Each address space listener has PhysPageMap *cur_map, *next_map,
> the switch from cur_map to next_map complete the RCU style. The
> mem_commit() do the switch, and it is against reader but AddressSpace's
> lock or later RCU mechanism (around address_space_translate() ).
>
> Signed-off-by: Liu Ping Fan
> ---
>  exec.c                         | 36 +++++++++++++++++++++++++++---------
>  include/exec/memory-internal.h | 11 ++++++++++-
>  2 files changed, 37 insertions(+), 10 deletions(-)
>
> diff --git a/exec.c b/exec.c
> index bb4e540..e5871d6 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -186,24 +186,26 @@ static void phys_page_set(AddressSpaceDispatch *d,
>                            hwaddr index, hwaddr nb,
>                            uint16_t leaf)
>  {
> +    PhysPageMap *map = d->next_map;
>      /* Wildly overreserve - it doesn't matter much. */
>      phys_map_node_reserve(3 * P_L2_LEVELS);
>
> -    phys_page_set_level(&d->phys_map, &index, &nb, leaf, P_L2_LEVELS - 1);
> +    phys_page_set_level(&map->root, &index, &nb, leaf, P_L2_LEVELS - 1);
>  }
>
>  static PhysSection *phys_section_find(AddressSpaceDispatch *d,
>                                        hwaddr index)
>  {
> -    PhysPageEntry lp = d->phys_map;
>      PhysPageEntry *p;
> -    PhysSection *phys_sections = cur_pgtbl->phys_sections;
> -    Node *phys_map_nodes = cur_pgtbl->phys_map_nodes;
> +    PhysPageEntry lp = d->cur_map->root;
> +    PhysPageTable *pgtbl = d->cur_map->pgtbl;
> +    PhysSection *phys_sections = pgtbl->phys_sections;
> +    Node *phys_map_nodes = pgtbl->phys_map_nodes;
>      int i;
>
>      for (i = P_L2_LEVELS - 1; i >= 0 && !lp.is_leaf; i--) {
>          if (lp.ptr == PHYS_MAP_NODE_NIL) {
> -            return &phys_sections[cur_pgtbl->phys_section_unassigned];
> +            return &phys_sections[pgtbl->phys_section_unassigned];
>          }
>          p = phys_map_nodes[lp.ptr];
>          lp = p[(index >> (i * L2_BITS)) & (L2_SIZE - 1)];
> @@ -234,7 +236,7 @@ MemoryRegionSection *address_space_translate(AddressSpace *as, hwaddr addr,
>      IOMMUTLBEntry iotlb;
>      MemoryRegionSection *section;
>      hwaddr len = *plen;
> -
> +    PhysPageTable *pgtbl = cur_pgtbl;

d->cur_map->pgtbl.

>      for (;;) {
>          section = address_space_lookup_region(as, addr);
>
> @@ -254,7 +256,7 @@ MemoryRegionSection *address_space_translate(AddressSpace *as, hwaddr addr,
>                       | (addr & iotlb.addr_mask));
>          len = MIN(len, (addr | iotlb.addr_mask) - addr + 1);
>          if (!iotlb.perm[is_write]) {
> -            section = &cur_pgtbl->phys_sections[cur_pgtbl->phys_section_unassigned].section;
> +            section = &pgtbl->phys_sections[pgtbl->phys_section_unassigned].section;
>              break;
>          }
>
> @@ -1703,7 +1705,21 @@ static void mem_begin(MemoryListener *listener)
>  {
>      AddressSpaceDispatch *d = container_of(listener, AddressSpaceDispatch, listener);
>
> -    d->phys_map.ptr = PHYS_MAP_NODE_NIL;
> +    d->next_map = g_new0(PhysPageMap, 1);
> +    d->next_map->pgtbl = next_pgtbl;
> +}
> +
> +static void mem_commit(MemoryListener *listener)
> +{
> +    AddressSpaceDispatch *d = container_of(listener, AddressSpaceDispatch, listener);
> +    PhysPageMap *m = d->cur_map;
> +
> +    d->cur_map = d->next_map;
> +    /* Fixme, Currently, we rely on biglock or address-space lock against
> +     * reader. So here, we can safely drop it.
> +     * After RCU, should change to call_rcu()
> +     */
> +    g_free(m);
>  }
>
>  static void core_begin(MemoryListener *listener)
> @@ -1771,11 +1787,12 @@ void address_space_init_dispatch(AddressSpace *as)
>  {
>      AddressSpaceDispatch *d = g_new(AddressSpaceDispatch, 1);
>
> -    d->phys_map = (PhysPageEntry) { .ptr = PHYS_MAP_NODE_NIL, .is_leaf = 0 };
> +    d->cur_map = g_new0(PhysPageMap, 1);
>      d->listener = (MemoryListener) {
>          .begin = mem_begin,
>          .region_add = mem_add,
>          .region_nop = mem_add,
> +        .commit = mem_commit,
>          .priority = 0,
>      };
>      as->dispatch = d;
> @@ -1787,6 +1804,7 @@ void address_space_destroy_dispatch(AddressSpace *as)
>      AddressSpaceDispatch *d = as->dispatch;
>
>      memory_listener_unregister(&d->listener);
> +    g_free(d->cur_map);
>      g_free(d);
>      as->dispatch = NULL;
>  }
> diff --git a/include/exec/memory-internal.h b/include/exec/memory-internal.h
> index 1b156fd..0dfe260 100644
> --- a/include/exec/memory-internal.h
> +++ b/include/exec/memory-internal.h
> @@ -30,13 +30,22 @@ struct PhysPageEntry {
>      uint16_t ptr : 15;
>  };
>
> +struct PhysPageTable;
> +typedef struct PhysPageMap PhysPageMap;
> +
> +struct PhysPageMap {
> +    PhysPageEntry root;
> +    struct PhysPageTable *pgtbl;

cur_pgtbl should be introduced in patch 1 already.

> +};
> +
>  typedef struct AddressSpaceDispatch AddressSpaceDispatch;
>
>  struct AddressSpaceDispatch {
>      /* This is a multi-level map on the physical address space.
>       * The bottom level has pointers to MemoryRegionSections.
>       */
> -    PhysPageEntry phys_map;
> +    PhysPageMap *cur_map;
> +    PhysPageMap *next_map;

Pointers are quite expensive here.  With RCU we can fetch a consistent
root/table pair like this:

    rcu_read_lock();
    do {
        pgtbl = d->cur_pgtbl;
        smp_rmb();
        root = d->cur_root;

        /* RCU ensures that d->cur_pgtbl remains alive, thus it cannot
         * be recycled while this loop is running.  If
         * d->cur_pgtbl == pgtbl, the root is the right one for this
         * pgtable.  Otherwise the table changed under us: retry.
         */
        smp_rmb();
    } while (d->cur_pgtbl != pgtbl);
    ...
    rcu_read_unlock();

Remember to have a matching smp_wmb() in mem_commit, and to write ->root
first:

    old_pgtbl = d->cur_pgtbl;
    smp_wmb();
    d->cur_root = d->next_root;

    /* Write the root before updating the page table.  */
    smp_wmb();
    d->cur_pgtbl = d->next_pgtbl;

    /* Write cur_pgtbl before possibly destroying the old one.  */
    smp_mb();
    page_table_unref(old_pgtbl); /* uses call_rcu if --refcount == 0 */

If you are renaming fields, please do it as the first step.

Paolo

>      MemoryListener listener;
>  };
>
>
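
For comparison, the single-pointer scheme that the patch itself uses
(cur_map/next_map, switched in mem_commit) might look roughly like the
standalone C11 sketch below.  This is only an illustration under stated
assumptions: PageTable, PageMap, Dispatch and the dispatch_*() helpers
are hypothetical stand-ins, not QEMU code, and the C11 release/acquire
operations merely approximate what rcu_assign_pointer()/rcu_dereference()
style primitives would provide.  Freeing the old map is left to the
caller, who is assumed to wait until no reader can still hold it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the patch's types; not QEMU's definitions. */
typedef struct PageTable {
    int generation;
} PageTable;

typedef struct PageMap {
    uint16_t root;       /* stands in for PhysPageEntry root */
    PageTable *pgtbl;    /* the table that root indexes into */
} PageMap;

typedef struct Dispatch {
    _Atomic(PageMap *) cur_map;  /* what readers see */
    PageMap *next_map;           /* private to the updater until commit */
} Dispatch;

/* Updater: start building a fresh map (compare mem_begin in the patch). */
static void dispatch_begin(Dispatch *d, PageTable *next_pgtbl)
{
    d->next_map = calloc(1, sizeof(*d->next_map));
    assert(d->next_map != NULL);
    d->next_map->pgtbl = next_pgtbl;
}

/* Updater: publish next_map and hand back the old map (compare mem_commit).
 * The release store orders the map's initialization before its publication.
 * Assumption: the caller frees the returned map only once no reader can
 * still be using it, e.g. after an RCU grace period. */
static PageMap *dispatch_commit(Dispatch *d)
{
    PageMap *old = atomic_load_explicit(&d->cur_map, memory_order_relaxed);

    atomic_store_explicit(&d->cur_map, d->next_map, memory_order_release);
    d->next_map = NULL;
    return old;
}

/* Reader: a single acquire load yields a root/table pair that were
 * published together, so no retry loop is needed. */
static const PageMap *dispatch_current(Dispatch *d)
{
    return atomic_load_explicit(&d->cur_map, memory_order_acquire);
}
```

A reader would call dispatch_current() inside its read-side critical
section and take both root and pgtbl from the same PageMap.  The price
is the extra pointer dereference on every lookup, which is what the
two-field scheme above tries to avoid.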