From: Michael Roth
To: Paolo Bonzini, qemu-devel@nongnu.org
Cc: alex.williamson@redhat.com
Date: Tue, 10 Feb 2015 23:13:02 -0600
Subject: Re: [Qemu-devel] [PATCH] memory: unregister AddressSpace MemoryListener within BQL
Message-ID: <20150211051302.3809.50882@loki>
In-Reply-To: <1423572769-4238-1-git-send-email-pbonzini@redhat.com>
References: <1423572769-4238-1-git-send-email-pbonzini@redhat.com>

Quoting Paolo Bonzini (2015-02-10 06:52:49)
> address_space_destroy_dispatch is called from an RCU callback and hence
> outside the iothread mutex (BQL). However, after address_space_destroy
> no new accesses can hit the destroyed AddressSpace so it is not necessary
> to observe changes to the memory map. Move the memory_listener_unregister
> call earlier, to make it thread-safe again.
>
> Reported-by: Alex Williamson
> Fixes: 374f2981d1f10bc4307f250f24b2a7ddb9b14be0
> Signed-off-by: Paolo Bonzini

Prior to this patch I was seeing segfaults in various parts of the memory
listener register/unregister path when running a workload that rapidly hot
plugs/unplugs a sizeable number of devices. That seems to be addressed with
this patch applied. But now I'm seeing a less frequent segfault in the RCU
thread when running the same workload:

  Program received signal SIGSEGV, Segmentation fault.
  [Switching to Thread 0x3fffb689ec20 (LWP 26230)]
  call_rcu_thread (opaque=) at /home/mdroth/w/qemu.git/util/rcu.c:250
  250         node->func(node);
  (gdb) bt
  #0  call_rcu_thread (opaque=) at /home/mdroth/w/qemu.git/util/rcu.c:250
  #1  0x00003fffb787c29c in .start_thread () from /lib64/libpthread.so.0
  #2  0x00003fffb779cd30 in .__clone () from /lib64/libc.so.6
  (gdb) ptype node
  type = struct rcu_head {
      struct rcu_head *next;
      RCUCBFunc *func;
  } *
  (gdb) print node
  $1 = (struct rcu_head *) 0x11189a68
  (gdb) print node->func
  $2 = (RCUCBFunc *) 0x0
  (gdb) print node->next
  $3 = (struct rcu_head *) 0x3fff9800d4f0

I've seen it on both x86 and pseries (with spapr hotplug patches applied),
and have only seen it occur at this spot.

AFAICT node->func is only set via one of:

  call_rcu(old_view, flatview_unref, rcu);
  call_rcu(as, do_address_space_destroy, rcu);

so it shouldn't ever be NULL... and there's a wmb after node->func is set,
prior to the node being made available to the RCU thread via enqueue(), so
that doesn't seem to be the issue.
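
For reference, the ordering argument above (set func first, then publish the
node) can be sketched in plain C11. This is only a simplified illustration
and not the code in util/rcu.c: every name in it (cb_node, cb_enqueue,
cb_run_all, count_cb) is hypothetical, it uses a LIFO stack rather than a
queue, and it relies on C11 release/acquire atomics where the real code uses
an explicit wmb.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a callback node; not QEMU's struct rcu_head. */
struct cb_node;
typedef void cb_func(struct cb_node *node);

struct cb_node {
    struct cb_node *_Atomic next;
    cb_func *func;
};

/* Lock-free LIFO stack of callbacks waiting to run. */
static struct cb_node *_Atomic pending;

/* Producer: fill in func first, then publish the node with release
 * ordering so a consumer that sees the node also sees func. */
static void cb_enqueue(struct cb_node *node, cb_func *func)
{
    struct cb_node *old;

    node->func = func;
    old = atomic_load_explicit(&pending, memory_order_relaxed);
    do {
        atomic_store_explicit(&node->next, old, memory_order_relaxed);
    } while (!atomic_compare_exchange_weak_explicit(&pending, &old, node,
                                                    memory_order_release,
                                                    memory_order_relaxed));
}

/* Consumer: detach the whole stack with acquire ordering and run every
 * callback. Returns the number of callbacks that were run. */
static int cb_run_all(void)
{
    struct cb_node *node = atomic_exchange_explicit(&pending,
                                                    (struct cb_node *)NULL,
                                                    memory_order_acquire);
    int count = 0;

    while (node) {
        /* Read next before the call: the callback may free the node. */
        struct cb_node *next = atomic_load_explicit(&node->next,
                                                    memory_order_relaxed);

        /* The invariant in question: a published node has a callback. */
        assert(node->func != NULL);
        node->func(node);
        node = next;
        count++;
    }
    return count;
}

/* Example callback that only counts how often it was invoked. */
static int calls;

static void count_cb(struct cb_node *node)
{
    (void)node;
    calls++;
}
```

With this shape, enqueueing two nodes via cb_enqueue(&n, count_cb) and then
calling cb_run_all() runs both callbacks; a node reaching the consumer with
func == NULL would trip the assert rather than jump through a NULL pointer.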

I think the node in this case is a FlatView*, if that helps narrow it down:

  (gdb) print ((AddressSpace *)(0x3fff9800d4f0))->name
  $5 = 0x100000000
  (gdb) print ((FlatView *)(0x3fff9800d4f0))->ref
  $6 = 1
  (gdb) print ((FlatView *)(0x3fff9800d4f0))->nr
  $7 = 34
  (gdb) print ((FlatView *)(0x3fff9800d4f0))->nr_allocated
  $8 = 40
  (gdb)

The workload is basically this, run in a tight loop:

  device_add virtio-net-pci,id=0
  sleep .5
  ...
  device_add virtio-net-pci,id=14
  sleep .5
  sleep 3
  device_del 0
  ...
  device_del 14

Let me know if there's anything else I can do to narrow it down further.

> ---
>  exec.c                         | 6 +++++-
>  include/exec/memory-internal.h | 1 +
>  memory.c                       | 1 +
>  3 files changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/exec.c b/exec.c
> index 6b79ad1..6dff7bc 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -2059,11 +2059,15 @@ void address_space_init_dispatch(AddressSpace *as)
>      memory_listener_register(&as->dispatch_listener, as);
>  }
>
> +void address_space_unregister(AddressSpace *as)
> +{
> +    memory_listener_unregister(&as->dispatch_listener);
> +}
> +
>  void address_space_destroy_dispatch(AddressSpace *as)
>  {
>      AddressSpaceDispatch *d = as->dispatch;
>
> -    memory_listener_unregister(&as->dispatch_listener);
>      g_free(d);
>      as->dispatch = NULL;
>  }
> diff --git a/include/exec/memory-internal.h b/include/exec/memory-internal.h
> index 25c43c0..fb467ac 100644
> --- a/include/exec/memory-internal.h
> +++ b/include/exec/memory-internal.h
> @@ -23,6 +23,7 @@
>  typedef struct AddressSpaceDispatch AddressSpaceDispatch;
>
>  void address_space_init_dispatch(AddressSpace *as);
> +void address_space_unregister(AddressSpace *as);
>  void address_space_destroy_dispatch(AddressSpace *as);
>
>  extern const MemoryRegionOps unassigned_mem_ops;
> diff --git a/memory.c b/memory.c
> index 9b91243..130152c 100644
> --- a/memory.c
> +++ b/memory.c
> @@ -1978,6 +1978,7 @@ void address_space_destroy(AddressSpace *as)
>      as->root = NULL;
>      memory_region_transaction_commit();
>      QTAILQ_REMOVE(&address_spaces, as, address_spaces_link);
> +    address_space_unregister(as);
>
>      /* At this point, as->dispatch and as->current_map are dummy
>       * entries that the guest should never use. Wait for the old
> --
> 1.8.3.1