From: Paolo Bonzini <pbonzini@redhat.com>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH v2 00/38] Delay destruction of memory regions to instance_finalize
Date: Wed, 18 Sep 2013 00:02:51 +0200 [thread overview]
Message-ID: <5238D18B.6020901@redhat.com> (raw)
In-Reply-To: <20130917195123.GB21419@redhat.com>
Il 17/09/2013 21:51, Michael S. Tsirkin ha scritto:
> A much more interesting case is e.g. disabling memory.
> E.g.
>
> config write (disable memory)
> read (flush out outstanding writes)
> write <- must now have no effect
This works already. memory_region_del_subregion is synchronous, and
will remain synchronous wrt the current CPU even after memory dispatch
is moved out of the BQL.
This is racy of course:
VCPU 1 VCPU 2
------------------------------------------------------------
config write (disable memory) write
read
This is also racy:
VCPU 1 DMA to MMIO region
------------------------------------------------------------
config write (disable memory) write
read
This is the case where a write from another device can do weird things
such as causing a packet to be transmitted on a NIC. As you wrote (and
I agree) device removal is a superset of disabling memory and bus
master; ergo, if we handle it for disabling memory we need to handle it
for device removal too. This is why I believe qemu_del_nic belongs in
instance_finalize.
> Or disabling bus master:
>
> config write (disable bus master)
> read (flush in outstanding writes)
> <- device must now not change memory
This doesn't work. The problem is that when you disable bus master any
previous call to address_space_map remains mapped. Whoever called
address_space_map can and will write blindly to that area.
This cannot be fixed by synchronize_rcu() no matter where you place it.
The logic is like this
address_space_map
rcu_read_lock()
mr = find memory region for address
memory_region_ref(mr)
rcu_read_unlock()
do actual read or write
address_space_unmap
memory_region_unref(mr)
synchronize_rcu() ensures that future writes will not write to memory.
But it does not ensure anything about writes started before. And
unfortunately a write "starts" at the moment you start an AIO operation,
and lasts until roughly when the AIO callback executes.
If you want to fix that, the right place to do it is the PCI core. You
need a new callback of some sort in the PCI device, that will
synchronously cancel all pending I/O when the bus master bit is set to zero.
I don't think address_space_map/unmap has any equivalent PCI transaction
(or exists in any other sane bus, for that matter). However, we do it
pervasively for obvious performance reasons. I'm afraid that this is an
emulation tradeoff that we cannot avoid.
> And these rules absolutely must be obeyed,
> if you don't you'll break guests.
Yes (and I think it was already discussed, I have some deja vu feeling).
I'm not sure it is _that_ important (QEMU lived with a no-op bus master
bit for years and nothing exploded), but it is certainly good to know
what works and what doesn't---and why, and whether it is fixable.
> So I think we really should find a way to make the above work correctly.
> Removal will follow almost automatically, since it disables memory and
> mastering by itself.
These patches are concerned with the "almost" part. The difference
between config writes and removal is that, as the code is currently
written, config writes cannot cause dangling pointers.
Paolo
next prev parent reply other threads:[~2013-09-17 22:04 UTC|newest]
Thread overview: 78+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-09-03 12:32 [Qemu-devel] [PATCH v2 00/38] Delay destruction of memory regions to instance_finalize Paolo Bonzini
2013-09-03 12:32 ` [Qemu-devel] [PATCH 01/38] qdev: document assumption that unrealize is followed by finalize Paolo Bonzini
2013-09-17 9:00 ` Michael S. Tsirkin
2013-09-03 12:32 ` [Qemu-devel] [PATCH 02/38] pci: split exit and finalize Paolo Bonzini
2013-09-17 9:16 ` Michael S. Tsirkin
2013-09-17 9:56 ` Paolo Bonzini
2013-09-17 10:23 ` Paolo Bonzini
2013-09-17 10:06 ` Michael S. Tsirkin
2013-09-03 12:32 ` [Qemu-devel] [PATCH 03/38] ac97: use instance_finalize instead of exit Paolo Bonzini
2013-09-03 12:32 ` [Qemu-devel] [PATCH 04/38] es1370: " Paolo Bonzini
2013-09-03 12:32 ` [Qemu-devel] [PATCH 05/38] hda: reclaim memory in " Paolo Bonzini
2013-09-03 12:32 ` [Qemu-devel] [PATCH 06/38] serial: " Paolo Bonzini
2013-09-03 12:32 ` [Qemu-devel] [PATCH 07/38] tpci200: use " Paolo Bonzini
2013-09-03 12:32 ` [Qemu-devel] [PATCH 08/38] pci-assign: reclaim memory in " Paolo Bonzini
2013-09-03 12:33 ` [Qemu-devel] [PATCH 09/38] ahci: " Paolo Bonzini
2013-09-03 12:33 ` [Qemu-devel] [PATCH 10/38] msix: split msix_free from msix_uninit Paolo Bonzini
2013-09-17 9:21 ` Michael S. Tsirkin
2013-09-17 9:56 ` Paolo Bonzini
2013-09-03 12:33 ` [Qemu-devel] [PATCH 11/38] cmd646: use instance_finalize instead of exit Paolo Bonzini
2013-09-03 12:33 ` [Qemu-devel] [PATCH 12/38] ide/piix: " Paolo Bonzini
2013-09-03 12:33 ` [Qemu-devel] [PATCH 13/38] ide/via: " Paolo Bonzini
2013-09-03 12:33 ` [Qemu-devel] [PATCH 14/38] ivshmem: reclaim memory in " Paolo Bonzini
2013-09-03 12:33 ` [Qemu-devel] [PATCH 15/38] pci-testdev: use " Paolo Bonzini
2013-09-03 12:33 ` [Qemu-devel] [PATCH 16/38] vfio: reclaim memory in " Paolo Bonzini
2013-09-03 12:33 ` [Qemu-devel] [PATCH 17/38] e1000: use " Paolo Bonzini
2013-09-17 9:27 ` Michael S. Tsirkin
2013-09-17 10:13 ` Paolo Bonzini
2013-09-03 12:33 ` [Qemu-devel] [PATCH 18/38] eepro100: " Paolo Bonzini
2013-09-03 12:33 ` [Qemu-devel] [PATCH 19/38] ne2000: " Paolo Bonzini
2013-09-03 12:33 ` [Qemu-devel] [PATCH 20/38] pcnet: " Paolo Bonzini
2013-09-03 12:33 ` [Qemu-devel] [PATCH 21/38] rtl8139: " Paolo Bonzini
2013-09-03 12:33 ` [Qemu-devel] [PATCH 22/38] vmxnet3: reclaim memory in " Paolo Bonzini
2013-09-03 12:33 ` [Qemu-devel] [PATCH 23/38] shpc: split shpc_free from shpc_cleanup Paolo Bonzini
2013-09-17 9:24 ` Michael S. Tsirkin
2013-09-17 9:58 ` Paolo Bonzini
2013-09-17 10:03 ` Michael S. Tsirkin
2013-09-03 12:33 ` [Qemu-devel] [PATCH 24/38] pci_bridge: split pci_bridge_free from pci_bridge_exitfn Paolo Bonzini
2013-09-03 12:33 ` [Qemu-devel] [PATCH 25/38] pcie_aer: pcie_aer_exit really frees stuff Paolo Bonzini
2013-09-03 12:33 ` [Qemu-devel] [PATCH 26/38] pci_bridge: reclaim memory in instance_finalize instead of exit Paolo Bonzini
2013-09-03 12:33 ` [Qemu-devel] [PATCH 27/38] ioh4320: " Paolo Bonzini
2013-09-03 12:33 ` [Qemu-devel] [PATCH 28/38] xio3130-downstream: " Paolo Bonzini
2013-09-03 12:33 ` [Qemu-devel] [PATCH 29/38] xio3130-upstream: " Paolo Bonzini
2013-09-03 12:33 ` [Qemu-devel] [PATCH 30/38] pcie: do not recreate mmcfg I/O region, use an alias instead Paolo Bonzini
2013-09-03 12:33 ` [Qemu-devel] [PATCH 31/38] esp: use instance_finalize instead of exit Paolo Bonzini
2013-09-03 12:33 ` [Qemu-devel] [PATCH 32/38] lsi: " Paolo Bonzini
2013-09-03 12:33 ` [Qemu-devel] [PATCH 33/38] pvscsi: reclaim memory in " Paolo Bonzini
2013-09-03 12:33 ` [Qemu-devel] [PATCH 34/38] usb-uhci: use " Paolo Bonzini
2013-09-03 12:33 ` [Qemu-devel] [PATCH 35/38] virtio-pci: reclaim memory in " Paolo Bonzini
2013-09-03 12:33 ` [Qemu-devel] [PATCH 36/38] wdt_i6300esb: use " Paolo Bonzini
2013-09-03 12:33 ` [Qemu-devel] [PATCH 37/38] xen_pt: reclaim memory in " Paolo Bonzini
2013-09-03 12:33 ` [Qemu-devel] [PATCH 38/38] tpm: move add/del_subregion to realize/unrealize Paolo Bonzini
2013-09-16 16:35 ` [Qemu-devel] [PATCH v2 00/38] Delay destruction of memory regions to instance_finalize Paolo Bonzini
2013-09-17 6:44 ` Wenchao Xia
2013-09-17 10:01 ` Paolo Bonzini
2013-09-20 6:16 ` Wenchao Xia
2013-09-17 9:31 ` Michael S. Tsirkin
2013-09-17 12:47 ` Michael S. Tsirkin
2013-09-17 14:41 ` Paolo Bonzini
2013-09-17 14:45 ` Michael S. Tsirkin
2013-09-17 15:41 ` Paolo Bonzini
2013-09-17 15:59 ` Michael S. Tsirkin
2013-09-17 16:13 ` Paolo Bonzini
2013-09-17 16:29 ` Michael S. Tsirkin
2013-09-17 16:58 ` Paolo Bonzini
2013-09-17 17:07 ` Michael S. Tsirkin
2013-09-17 17:16 ` Paolo Bonzini
2013-09-17 17:26 ` Michael S. Tsirkin
2013-09-17 19:07 ` Paolo Bonzini
2013-09-17 19:51 ` Michael S. Tsirkin
2013-09-17 22:02 ` Paolo Bonzini [this message]
2013-09-18 5:48 ` Michael S. Tsirkin
2013-09-18 7:40 ` Paolo Bonzini
2013-09-18 8:41 ` Michael S. Tsirkin
2013-09-18 11:26 ` Paolo Bonzini
2013-09-18 11:56 ` Peter Maydell
2013-09-18 13:11 ` Paolo Bonzini
2013-09-18 13:19 ` Peter Maydell
2013-09-18 13:28 ` Paolo Bonzini
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5238D18B.6020901@redhat.com \
--to=pbonzini@redhat.com \
--cc=mst@redhat.com \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).