From: "Michael S. Tsirkin" <mst@redhat.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH v2 00/38] Delay destruction of memory regions to instance_finalize
Date: Wed, 18 Sep 2013 08:48:48 +0300 [thread overview]
Message-ID: <20130918054848.GB23532@redhat.com> (raw)
In-Reply-To: <5238D18B.6020901@redhat.com>
On Wed, Sep 18, 2013 at 12:02:51AM +0200, Paolo Bonzini wrote:
> Il 17/09/2013 21:51, Michael S. Tsirkin ha scritto:
> > A much more interesting case is e.g. disabling memory.
> > E.g.
> >
> > config write (disable memory)
> > read (flush out outstanding writes)
> > write <- must now have no effect
>
> This works already. memory_region_del_subregion is synchronous, and
> will remain synchronous wrt the current CPU even after memory dispatch
> is moved out of the BQL.
>
> This is racy of course:
>
> VCPU 1 VCPU 2
> ------------------------------------------------------------
> config write (disable memory) write
> read
>
> This is also racy:
>
> VCPU 1 DMA to MMIO region
What about normal DMA to memory?
I don't believe DMA to MMIO is relevant ATM,
we can just make it fail (e.g. as Express spec 1.0 allowed).
> ------------------------------------------------------------
> config write (disable memory) write
> read
>
> This is the case where a write from another device can do weird things
> such as causing a packet to be transmitted on a NIC. As you wrote (and
> I agree) device removal is a superset of disabling memory and bus
> master; ergo, if we handle it for disabling memory we need to handle it
> for device removal too. This is why I believe qemu_del_nic belongs in
> instance_finalize.
>
> > Or disabling bus master:
> >
> > config write (disable bus master)
> > read (flush in outstanding writes)
> > <- device must now not change memory
>
> This doesn't work. The problem is that when you disable bus master any
> previous call to address_space_map remains mapped. Whoever called
> address_space_map can and will write blindly to that area.
>
> This cannot be fixed by synchronize_rcu() no matter where you place it.
> The logic is like this
>
> address_space_map
> rcu_read_lock()
> mr = find memory region for address
> memory_region_ref(mr)
> rcu_read_unlock()
> do actual read or write
> address_space_unmap
> memory_region_unref(mr)
>
> synchronize_rcu() ensures that future writes will not write to memory.
> But it does not ensure anything about writes started before. And
> unfortunately a write "starts" at the moment you start an AIO operation,
> and lasts until roughly when the AIO callback executes.
>
> If you want to fix that, the right place to do it is the PCI core. You
> need a new callback of some sort in the PCI device, that will
> synchronously cancel all pending I/O when the bus master bit is set to zero.
It has to work even without config changes really.
So I think the fix is actually obeying ordering rules,
that is know that write is in progress
and flush on read.
> I don't think address_space_map/unmap has any equivalent PCI transaction
> (or exists in any other sane bus, for that matter). However, we do it
> pervasively for obvious performance reasons. I'm afraid that this is an
> emulation tradeoff that we cannot avoid.
>
> > And these rules absolutely must be obeyed,
> > if you don't you'll break guests.
>
> Yes (and I think it was already discussed, I have some deja vu feeling).
> I'm not sure it is _that_ important (QEMU lived with a no-op bus master
> bit for years and nothing exploded),
Yes but you can't pass e.g. WHQL without it.
And even bus master is less important than obeying ordering rules.
We got and get away with ignoring them because of the BQL.
The moment emulated PCI devices write into memory without a BQL
we'll need to emulate ordering.
> but it is certainly good to know
> what works and what doesn't---and why, and whether it is fixable.
>
> > So I think we really should find a way to make the above work correctly.
> > Removal will follow almost automatically, since it disables memory and
> > mastering by itself.
>
> These patches are concerned with the "almost" part. The difference
> between config writes and removal is that, as the code is currently
> written, config writes cannot cause dangling pointers.
>
> Paolo
I'm not really objecting to the patches.
I think moving memory region destroy out to finalize makes sense
irrespectively, as long as destroy is made idempotent so we can simply
destroy everything without worrying whether we initialized it.
The rest of the changes will be harder, we'll have to do
them carefully on a case by case basis.
--
MST
next prev parent reply other threads:[~2013-09-18 5:46 UTC|newest]
Thread overview: 78+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-09-03 12:32 [Qemu-devel] [PATCH v2 00/38] Delay destruction of memory regions to instance_finalize Paolo Bonzini
2013-09-03 12:32 ` [Qemu-devel] [PATCH 01/38] qdev: document assumption that unrealize is followed by finalize Paolo Bonzini
2013-09-17 9:00 ` Michael S. Tsirkin
2013-09-03 12:32 ` [Qemu-devel] [PATCH 02/38] pci: split exit and finalize Paolo Bonzini
2013-09-17 9:16 ` Michael S. Tsirkin
2013-09-17 9:56 ` Paolo Bonzini
2013-09-17 10:23 ` Paolo Bonzini
2013-09-17 10:06 ` Michael S. Tsirkin
2013-09-03 12:32 ` [Qemu-devel] [PATCH 03/38] ac97: use instance_finalize instead of exit Paolo Bonzini
2013-09-03 12:32 ` [Qemu-devel] [PATCH 04/38] es1370: " Paolo Bonzini
2013-09-03 12:32 ` [Qemu-devel] [PATCH 05/38] hda: reclaim memory in " Paolo Bonzini
2013-09-03 12:32 ` [Qemu-devel] [PATCH 06/38] serial: " Paolo Bonzini
2013-09-03 12:32 ` [Qemu-devel] [PATCH 07/38] tpci200: use " Paolo Bonzini
2013-09-03 12:32 ` [Qemu-devel] [PATCH 08/38] pci-assign: reclaim memory in " Paolo Bonzini
2013-09-03 12:33 ` [Qemu-devel] [PATCH 09/38] ahci: " Paolo Bonzini
2013-09-03 12:33 ` [Qemu-devel] [PATCH 10/38] msix: split msix_free from msix_uninit Paolo Bonzini
2013-09-17 9:21 ` Michael S. Tsirkin
2013-09-17 9:56 ` Paolo Bonzini
2013-09-03 12:33 ` [Qemu-devel] [PATCH 11/38] cmd646: use instance_finalize instead of exit Paolo Bonzini
2013-09-03 12:33 ` [Qemu-devel] [PATCH 12/38] ide/piix: " Paolo Bonzini
2013-09-03 12:33 ` [Qemu-devel] [PATCH 13/38] ide/via: " Paolo Bonzini
2013-09-03 12:33 ` [Qemu-devel] [PATCH 14/38] ivshmem: reclaim memory in " Paolo Bonzini
2013-09-03 12:33 ` [Qemu-devel] [PATCH 15/38] pci-testdev: use " Paolo Bonzini
2013-09-03 12:33 ` [Qemu-devel] [PATCH 16/38] vfio: reclaim memory in " Paolo Bonzini
2013-09-03 12:33 ` [Qemu-devel] [PATCH 17/38] e1000: use " Paolo Bonzini
2013-09-17 9:27 ` Michael S. Tsirkin
2013-09-17 10:13 ` Paolo Bonzini
2013-09-03 12:33 ` [Qemu-devel] [PATCH 18/38] eepro100: " Paolo Bonzini
2013-09-03 12:33 ` [Qemu-devel] [PATCH 19/38] ne2000: " Paolo Bonzini
2013-09-03 12:33 ` [Qemu-devel] [PATCH 20/38] pcnet: " Paolo Bonzini
2013-09-03 12:33 ` [Qemu-devel] [PATCH 21/38] rtl8139: " Paolo Bonzini
2013-09-03 12:33 ` [Qemu-devel] [PATCH 22/38] vmxnet3: reclaim memory in " Paolo Bonzini
2013-09-03 12:33 ` [Qemu-devel] [PATCH 23/38] shpc: split shpc_free from shpc_cleanup Paolo Bonzini
2013-09-17 9:24 ` Michael S. Tsirkin
2013-09-17 9:58 ` Paolo Bonzini
2013-09-17 10:03 ` Michael S. Tsirkin
2013-09-03 12:33 ` [Qemu-devel] [PATCH 24/38] pci_bridge: split pci_bridge_free from pci_bridge_exitfn Paolo Bonzini
2013-09-03 12:33 ` [Qemu-devel] [PATCH 25/38] pcie_aer: pcie_aer_exit really frees stuff Paolo Bonzini
2013-09-03 12:33 ` [Qemu-devel] [PATCH 26/38] pci_bridge: reclaim memory in instance_finalize instead of exit Paolo Bonzini
2013-09-03 12:33 ` [Qemu-devel] [PATCH 27/38] ioh4320: " Paolo Bonzini
2013-09-03 12:33 ` [Qemu-devel] [PATCH 28/38] xio3130-downstream: " Paolo Bonzini
2013-09-03 12:33 ` [Qemu-devel] [PATCH 29/38] xio3130-upstream: " Paolo Bonzini
2013-09-03 12:33 ` [Qemu-devel] [PATCH 30/38] pcie: do not recreate mmcfg I/O region, use an alias instead Paolo Bonzini
2013-09-03 12:33 ` [Qemu-devel] [PATCH 31/38] esp: use instance_finalize instead of exit Paolo Bonzini
2013-09-03 12:33 ` [Qemu-devel] [PATCH 32/38] lsi: " Paolo Bonzini
2013-09-03 12:33 ` [Qemu-devel] [PATCH 33/38] pvscsi: reclaim memory in " Paolo Bonzini
2013-09-03 12:33 ` [Qemu-devel] [PATCH 34/38] usb-uhci: use " Paolo Bonzini
2013-09-03 12:33 ` [Qemu-devel] [PATCH 35/38] virtio-pci: reclaim memory in " Paolo Bonzini
2013-09-03 12:33 ` [Qemu-devel] [PATCH 36/38] wdt_i6300esb: use " Paolo Bonzini
2013-09-03 12:33 ` [Qemu-devel] [PATCH 37/38] xen_pt: reclaim memory in " Paolo Bonzini
2013-09-03 12:33 ` [Qemu-devel] [PATCH 38/38] tpm: move add/del_subregion to realize/unrealize Paolo Bonzini
2013-09-16 16:35 ` [Qemu-devel] [PATCH v2 00/38] Delay destruction of memory regions to instance_finalize Paolo Bonzini
2013-09-17 6:44 ` Wenchao Xia
2013-09-17 10:01 ` Paolo Bonzini
2013-09-20 6:16 ` Wenchao Xia
2013-09-17 9:31 ` Michael S. Tsirkin
2013-09-17 12:47 ` Michael S. Tsirkin
2013-09-17 14:41 ` Paolo Bonzini
2013-09-17 14:45 ` Michael S. Tsirkin
2013-09-17 15:41 ` Paolo Bonzini
2013-09-17 15:59 ` Michael S. Tsirkin
2013-09-17 16:13 ` Paolo Bonzini
2013-09-17 16:29 ` Michael S. Tsirkin
2013-09-17 16:58 ` Paolo Bonzini
2013-09-17 17:07 ` Michael S. Tsirkin
2013-09-17 17:16 ` Paolo Bonzini
2013-09-17 17:26 ` Michael S. Tsirkin
2013-09-17 19:07 ` Paolo Bonzini
2013-09-17 19:51 ` Michael S. Tsirkin
2013-09-17 22:02 ` Paolo Bonzini
2013-09-18 5:48 ` Michael S. Tsirkin [this message]
2013-09-18 7:40 ` Paolo Bonzini
2013-09-18 8:41 ` Michael S. Tsirkin
2013-09-18 11:26 ` Paolo Bonzini
2013-09-18 11:56 ` Peter Maydell
2013-09-18 13:11 ` Paolo Bonzini
2013-09-18 13:19 ` Peter Maydell
2013-09-18 13:28 ` Paolo Bonzini
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130918054848.GB23532@redhat.com \
--to=mst@redhat.com \
--cc=pbonzini@redhat.com \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).