From: Paolo Bonzini <pbonzini@redhat.com>
To: Liu Ping Fan <qemulist@gmail.com>
Cc: Peter Maydell <peter.maydell@linaro.org>,
Anthony Liguori <anthony@codemonkey.ws>,
"Michael S. Tsirkin" <mst@redhat.com>,
Jan Kiszka <jan.kiszka@siemens.com>,
qemu-devel@nongnu.org, Stefan Hajnoczi <stefanha@redhat.com>
Subject: Re: [Qemu-devel] [RFC PATCH 2/2] mem: prepare address_space listener rcu style
Date: Mon, 13 May 2013 11:31:41 +0200
Message-ID: <5190B2FD.7090008@redhat.com>
In-Reply-To: <1368415264-10800-3-git-send-email-qemulist@gmail.com>

On 13/05/2013 05:21, Liu Ping Fan wrote:
> From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>
> Each address space listener has PhysPageMap *cur_map, *next_map,
> the switch from cur_map to next_map complete the RCU style. The
> mem_commit() do the switch, and it is against reader but AddressSpace's
> lock or later RCU mechanism (around address_space_translate() ).
>
> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> ---
> exec.c | 36 +++++++++++++++++++++++++++---------
> include/exec/memory-internal.h | 11 ++++++++++-
> 2 files changed, 37 insertions(+), 10 deletions(-)
>
> diff --git a/exec.c b/exec.c
> index bb4e540..e5871d6 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -186,24 +186,26 @@ static void phys_page_set(AddressSpaceDispatch *d,
> hwaddr index, hwaddr nb,
> uint16_t leaf)
> {
> + PhysPageMap *map = d->next_map;
> /* Wildly overreserve - it doesn't matter much. */
> phys_map_node_reserve(3 * P_L2_LEVELS);
>
> - phys_page_set_level(&d->phys_map, &index, &nb, leaf, P_L2_LEVELS - 1);
> + phys_page_set_level(&map->root, &index, &nb, leaf, P_L2_LEVELS - 1);
> }
>
> static PhysSection *phys_section_find(AddressSpaceDispatch *d,
> hwaddr index)
> {
> - PhysPageEntry lp = d->phys_map;
> PhysPageEntry *p;
> - PhysSection *phys_sections = cur_pgtbl->phys_sections;
> - Node *phys_map_nodes = cur_pgtbl->phys_map_nodes;
> + PhysPageEntry lp = d->cur_map->root;
> + PhysPageTable *pgtbl = d->cur_map->pgtbl;
> + PhysSection *phys_sections = pgtbl->phys_sections;
> + Node *phys_map_nodes = pgtbl->phys_map_nodes;
> int i;
>
> for (i = P_L2_LEVELS - 1; i >= 0 && !lp.is_leaf; i--) {
> if (lp.ptr == PHYS_MAP_NODE_NIL) {
> - return &phys_sections[cur_pgtbl->phys_section_unassigned];
> + return &phys_sections[pgtbl->phys_section_unassigned];
> }
> p = phys_map_nodes[lp.ptr];
> lp = p[(index >> (i * L2_BITS)) & (L2_SIZE - 1)];
> @@ -234,7 +236,7 @@ MemoryRegionSection *address_space_translate(AddressSpace *as, hwaddr addr,
> IOMMUTLBEntry iotlb;
> MemoryRegionSection *section;
> hwaddr len = *plen;
> -
> + PhysPageTable *pgtbl = cur_pgtbl;

This should be d->cur_map->pgtbl.

> for (;;) {
> section = address_space_lookup_region(as, addr);
>
> @@ -254,7 +256,7 @@ MemoryRegionSection *address_space_translate(AddressSpace *as, hwaddr addr,
> | (addr & iotlb.addr_mask));
> len = MIN(len, (addr | iotlb.addr_mask) - addr + 1);
> if (!iotlb.perm[is_write]) {
> - section = &cur_pgtbl->phys_sections[cur_pgtbl->phys_section_unassigned].section;
> + section = &pgtbl->phys_sections[pgtbl->phys_section_unassigned].section;
> break;
> }
>
> @@ -1703,7 +1705,21 @@ static void mem_begin(MemoryListener *listener)
> {
> AddressSpaceDispatch *d = container_of(listener, AddressSpaceDispatch, listener);
>
> - d->phys_map.ptr = PHYS_MAP_NODE_NIL;
> + d->next_map = g_new0(PhysPageMap, 1);
> + d->next_map->pgtbl = next_pgtbl;
> +}
> +
> +static void mem_commit(MemoryListener *listener)
> +{
> + AddressSpaceDispatch *d = container_of(listener, AddressSpaceDispatch, listener);
> + PhysPageMap *m = d->cur_map;
> +
> + d->cur_map = d->next_map;
> + /* Fixme, Currently, we rely on biglock or address-space lock against
> + * reader. So here, we can safely drop it.
> + * After RCU, should change to call_rcu()
> + */
> + g_free(m);
> }
>
> static void core_begin(MemoryListener *listener)
> @@ -1771,11 +1787,12 @@ void address_space_init_dispatch(AddressSpace *as)
> {
> AddressSpaceDispatch *d = g_new(AddressSpaceDispatch, 1);
>
> - d->phys_map = (PhysPageEntry) { .ptr = PHYS_MAP_NODE_NIL, .is_leaf = 0 };
> + d->cur_map = g_new0(PhysPageMap, 1);
> d->listener = (MemoryListener) {
> .begin = mem_begin,
> .region_add = mem_add,
> .region_nop = mem_add,
> + .commit = mem_commit,
> .priority = 0,
> };
> as->dispatch = d;
> @@ -1787,6 +1804,7 @@ void address_space_destroy_dispatch(AddressSpace *as)
> AddressSpaceDispatch *d = as->dispatch;
>
> memory_listener_unregister(&d->listener);
> + g_free(d->cur_map);
> g_free(d);
> as->dispatch = NULL;
> }
> diff --git a/include/exec/memory-internal.h b/include/exec/memory-internal.h
> index 1b156fd..0dfe260 100644
> --- a/include/exec/memory-internal.h
> +++ b/include/exec/memory-internal.h
> @@ -30,13 +30,22 @@ struct PhysPageEntry {
> uint16_t ptr : 15;
> };
>
> +struct PhysPageTable;
> +typedef struct PhysPageMap PhysPageMap;
> +
> +struct PhysPageMap {
> + PhysPageEntry root;
> + struct PhysPageTable *pgtbl;

cur_pgtbl should be introduced in patch 1 already.

> +};
> +
> typedef struct AddressSpaceDispatch AddressSpaceDispatch;
>
> struct AddressSpaceDispatch {
> /* This is a multi-level map on the physical address space.
> * The bottom level has pointers to MemoryRegionSections.
> */
> - PhysPageEntry phys_map;
> + PhysPageMap *cur_map;
> + PhysPageMap *next_map;

An extra level of pointers is quite expensive here.  With RCU we can
fetch a consistent root/table pair like this:

    rcu_read_lock();
    do {
        pgtbl = d->cur_pgtbl;
        smp_rmb();
        root = d->cur_root;

        /* RCU ensures that d->cur_pgtbl remains alive, thus it cannot
         * be recycled while this loop is running.  If
         * d->cur_pgtbl == pgtbl, the root is the right one for this
         * pgtable; otherwise retry.
         */
        smp_rmb();
    } while (d->cur_pgtbl != pgtbl);
    ...
    rcu_read_unlock();

Remember to have a matching smp_wmb() in mem_commit, and to write ->root
first:

    old_pgtbl = d->cur_pgtbl;
    smp_wmb();
    d->cur_root = d->next_root;

    /* Write the root before updating the page table. */
    smp_wmb();
    d->cur_pgtbl = d->next_pgtbl;

    /* Write cur_pgtbl before possibly destroying the old one. */
    smp_mb();
    page_table_unref(old_pgtbl); /* uses call_rcu if --refcount == 0 */

If you are renaming fields, please do it as the first step.
Paolo
> MemoryListener listener;
> };
>
>