From: Paolo Bonzini <pbonzini@redhat.com>
To: Peter Maydell <peter.maydell@linaro.org>
Cc: QEMU Developers <qemu-devel@nongnu.org>,
	"Michael S. Tsirkin" <mst@redhat.com>
Subject: Re: [Qemu-devel] [PATCH v2 00/38] Delay destruction of memory regions to instance_finalize
Date: Wed, 18 Sep 2013 15:11:47 +0200
Message-ID: <5239A693.2070409@redhat.com>
In-Reply-To: <CAFEAcA__D4uV9smfeWd2KCURBcky1cia9m0yPh6=P3QdRsw6Gw@mail.gmail.com>

On 18/09/2013 13:56, Peter Maydell wrote:
>> > But does guest code actually care?  In many cases, I suspect that
>> > sticking a smp_rmb() in the read side of "unlocked" register accesses,
>> > and a smp_wmb() in the write side, will do just fine.  And add a
>> > compatibility property to place a device back under the BQL for guests
>> > that have problems.
> Yuck. This sounds like a recipe for spending the next five years
> debugging subtle race conditions. We need to continue to support
> the semantics that the architecture and hardware specs define
> for memory access orderings to emulated devices.

We cannot in the general case; QEMU is not a cycle-exact simulator.

You need to look at the particular case.  And if you do, you'll find
many cases that are already broken today.

For example, we already give no such guarantee for RAM BARs when running
under KVM, because guest accesses to them do not go through QEMU and are
not serialized by the BQL.

Or you could have a device with an MSI vector, program the vector to
write to RAM, and poll that RAM location from the guest.  Such a write is
currently not ordered with earlier DMA from the device, which contradicts
the PCI spec.  (This is a bug and can be fixed.)
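
A minimal sketch of the ordering the PCI spec asks for here, written with
plain C11 atomics and pthreads rather than QEMU code: the device's DMA
payload must be visible before the MSI write that the guest polls on.
`dma_buf`, `msi_word`, `device_thread`, `guest_poll` and `run_msi_round`
are hypothetical names; the release store plays the role of an smp_wmb()
on the device side and the acquire load that of an smp_rmb() in the
polling guest.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

#define DMA_LEN 64

/* Hypothetical guest RAM: a DMA buffer plus the word targeted by the MSI. */
static uint32_t dma_buf[DMA_LEN];
static atomic_uint msi_word;

/* Device side: DMA the payload, then signal completion with the "MSI" write. */
static void *device_thread(void *arg)
{
    uint32_t seed = *(uint32_t *)arg;
    for (int i = 0; i < DMA_LEN; i++) {
        dma_buf[i] = seed + (uint32_t)i;    /* plain DMA stores */
    }
    /* Release: every DMA store above is visible before this store. */
    atomic_store_explicit(&msi_word, 1, memory_order_release);
    return NULL;
}

/* Guest side: spin on the MSI location, then read the DMA'd data. */
static uint64_t guest_poll(void)
{
    /* Acquire: the buffer reads below cannot move before the flag read. */
    while (atomic_load_explicit(&msi_word, memory_order_acquire) == 0) {
        /* busy-wait, as a polling guest driver would */
    }
    uint64_t sum = 0;
    for (int i = 0; i < DMA_LEN; i++) {
        sum += dma_buf[i];
    }
    return sum;
}

/* Runs one DMA + MSI round and returns the checksum seen by the guest. */
static uint64_t run_msi_round(uint32_t seed)
{
    pthread_t dev;
    atomic_store(&msi_word, 0);
    pthread_create(&dev, NULL, device_thread, &seed);
    uint64_t sum = guest_poll();
    pthread_join(dev, NULL);
    return sum;
}
```

For example, run_msi_round(10) returns 64 * 10 + 2016 = 2656.  With
relaxed ordering in place of release/acquire, a weakly ordered host could
let the guest see the flag before the payload, which is the kind of
reordering described above.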

address_space_map/unmap already breaks, for practical purposes, any DMA
that is concurrent with control register accesses (e.g. to the PCI
command register).
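
A deterministic toy model of that problem, under an assumption made for
illustration only (the names are hypothetical and this is not how QEMU
implements address_space_map): bus mastering is checked once, when the
buffer is mapped, so clearing the bus-master bit afterwards cannot stop
DMA that is already in flight through the returned pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PCI_COMMAND_MASTER 0x4  /* Bus Master Enable bit of the command register */

/* Hypothetical device state: command register plus a slice of guest RAM. */
struct toy_dev {
    uint16_t command;
    uint8_t ram[256];
};

/* Hypothetical map: validates bus mastering once and hands out a raw pointer. */
static uint8_t *toy_map(struct toy_dev *d, size_t addr, size_t len)
{
    if (!(d->command & PCI_COMMAND_MASTER) || addr + len > sizeof(d->ram)) {
        return NULL;
    }
    return d->ram + addr;
}

/* Hypothetical guest MMIO write to the command register. */
static void toy_write_command(struct toy_dev *d, uint16_t val)
{
    d->command = val;
}

/* Returns true if DMA through the mapping still landed in RAM even though
 * the guest disabled bus mastering between the map and the copy. */
static bool dma_after_master_disable(void)
{
    struct toy_dev d = { .command = PCI_COMMAND_MASTER };
    uint8_t *p = toy_map(&d, 16, 4);
    assert(p != NULL);

    toy_write_command(&d, 0);           /* guest clears Bus Master Enable */
    memcpy(p, "\x11\x22\x33\x44", 4);   /* in-flight DMA still writes RAM */

    return d.command == 0 && d.ram[16] == 0x11 && d.ram[19] == 0x44;
}
```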

And all these cases are already there!

Moving devices outside the BQL will of course create more of them.  But
it's not as if everything is broken.  For example, ordering memory
accesses to one emulated device from one CPU is handled naturally (in
either TCG or KVM mode).  Ordering a CPU's accesses with respect to those
from the QEMU data-plane code is also handled simply, with locks or
memory barriers private to the device.
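
A rough sketch of "locks private to the device": a hypothetical NIC whose
MMIO handler (called from the VCPU thread) and data-plane thread both
take a per-device mutex, so neither needs the BQL.  `toy_nic`,
`toy_mmio_kick`, `toy_mmio_read_sent`, `toy_dataplane` and `run_toy_nic`
are invented for this example; only the locking pattern matters.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdint.h>

/* Hypothetical device: registers guarded by a lock owned by the device. */
struct toy_nic {
    pthread_mutex_t lock;   /* device-private lock, used instead of the BQL */
    uint32_t pending;       /* packets kicked by the guest, not yet sent */
    uint32_t sent;          /* packets completed by the data plane */
    uint32_t target;        /* how many packets this run will send */
};

/* VCPU thread: MMIO write to a "TX kick" register queues one packet. */
static void toy_mmio_kick(struct toy_nic *n)
{
    pthread_mutex_lock(&n->lock);
    n->pending++;
    pthread_mutex_unlock(&n->lock);
}

/* VCPU thread: MMIO read of a status register. */
static uint32_t toy_mmio_read_sent(struct toy_nic *n)
{
    pthread_mutex_lock(&n->lock);
    uint32_t v = n->sent;
    pthread_mutex_unlock(&n->lock);
    return v;
}

/* Data-plane thread: drains pending packets until the target is reached. */
static void *toy_dataplane(void *arg)
{
    struct toy_nic *n = arg;
    for (;;) {
        pthread_mutex_lock(&n->lock);
        if (n->pending > 0) {
            n->pending--;
            n->sent++;
        }
        int done = (n->sent == n->target);
        pthread_mutex_unlock(&n->lock);
        if (done) {
            return NULL;
        }
        sched_yield();
    }
}

/* Kicks `packets` packets from the calling (VCPU) thread; returns sent count. */
static uint32_t run_toy_nic(uint32_t packets)
{
    struct toy_nic n = { .pending = 0, .sent = 0, .target = packets };
    pthread_t dp;
    pthread_mutex_init(&n.lock, NULL);
    pthread_create(&dp, NULL, toy_dataplane, &n);
    for (uint32_t i = 0; i < packets; i++) {
        toy_mmio_kick(&n);
    }
    pthread_join(dp, NULL);
    uint32_t sent = toy_mmio_read_sent(&n);
    assert(n.pending == 0);
    pthread_mutex_destroy(&n.lock);
    return sent;
}
```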

With multiple VCPUs operating at the same time (e.g. the send path of a
network driver on one VCPU, with the interrupts processed on another
VCPU) the activities are likely not independent, and the guest is doing
its own synchronization anyway.  Most likely it uses a lock, but it can
even do Dekker-style synchronization using MMIO registers, and that will
just work as long as the MMIO read/write ops use
atomic_mb_read/atomic_mb_set (i.e. as long as the bus ordering
guarantees are implemented locally to the device).
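
The Dekker case can be sketched with two threads standing in for two
VCPUs.  The `flag` and `turn` "registers" are accessed through
hypothetical `mmio_read`/`mmio_write` helpers built on sequentially
consistent C11 atomics, used here as a stand-in for the full-barrier
semantics of atomic_mb_read/atomic_mb_set.  Dekker's algorithm relies on
each VCPU's store to its own flag being ordered before its load of the
other VCPU's flag; with relaxed accessors both VCPUs could enter the
critical section at once.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical MMIO "registers" shared by two VCPUs. */
static atomic_int flag[2];
static atomic_int turn;
static long counter;            /* protected only by Dekker's algorithm */

/* Hypothetical accessors; seq_cst stands in for atomic_mb_read/atomic_mb_set. */
static int mmio_read(atomic_int *reg) { return atomic_load(reg); }
static void mmio_write(atomic_int *reg, int v) { atomic_store(reg, v); }

static void dekker_lock(int self)
{
    int other = 1 - self;
    mmio_write(&flag[self], 1);
    while (mmio_read(&flag[other])) {
        if (mmio_read(&turn) != self) {
            mmio_write(&flag[self], 0);     /* back off until it is our turn */
            while (mmio_read(&turn) != self) {
                sched_yield();
            }
            mmio_write(&flag[self], 1);
        } else {
            sched_yield();
        }
    }
}

static void dekker_unlock(int self)
{
    mmio_write(&turn, 1 - self);            /* hand the turn to the other VCPU */
    mmio_write(&flag[self], 0);
}

struct vcpu_arg { int id; long iters; };

static void *vcpu_thread(void *p)
{
    struct vcpu_arg *a = p;
    for (long i = 0; i < a->iters; i++) {
        dekker_lock(a->id);
        counter++;                          /* critical section */
        dekker_unlock(a->id);
    }
    return NULL;
}

/* Runs two "VCPUs" and returns the final counter value. */
static long run_dekker(long iters)
{
    pthread_t t[2];
    struct vcpu_arg args[2] = { { 0, iters }, { 1, iters } };
    counter = 0;
    atomic_store(&flag[0], 0);
    atomic_store(&flag[1], 0);
    atomic_store(&turn, 0);
    for (int i = 0; i < 2; i++) {
        pthread_create(&t[i], NULL, vcpu_thread, &args[i]);
    }
    for (int i = 0; i < 2; i++) {
        pthread_join(t[i], NULL);
    }
    return counter;
}
```

If mutual exclusion holds, no increment is lost, so run_dekker(n) returns
exactly 2 * n.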

There's nothing magic, really.  Both PV and real devices have been doing
this forever, by placing some registers in RAM instead of MMIO and
communicating synchronization points via interrupts and doorbell registers.
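
A minimal sketch of that split, loosely modeled on virtio-style rings but
not taken from any real device: the driver writes a descriptor and a
producer index into shared "RAM", publishes the index with a release
store, and then rings a doorbell.  The doorbell only says "look again";
the device learns how much work is ready from the index in RAM.  `ring`,
`prod_idx`, `cons_idx`, `doorbell`, `driver_send`, `device_loop` and
`run_ring` are hypothetical, and the doorbell is an atomic counter rather
than a real MMIO register.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdint.h>

#define RING_SIZE 16

/* Hypothetical shared "guest RAM" layout. */
static uint32_t ring[RING_SIZE];    /* descriptors, plain RAM */
static atomic_uint prod_idx;        /* producer index, written by the driver */
static atomic_uint cons_idx;        /* consumer index, written by the device */
static atomic_uint doorbell;        /* hypothetical doorbell: pure notification */

/* Driver: fill a slot, publish the index, then kick the device. */
static void driver_send(uint32_t value)
{
    unsigned p = atomic_load_explicit(&prod_idx, memory_order_relaxed);
    /* Wait for a free slot; acquire pairs with the device's cons_idx store. */
    while (p - atomic_load_explicit(&cons_idx, memory_order_acquire) == RING_SIZE) {
        sched_yield();
    }
    ring[p % RING_SIZE] = value;
    atomic_store_explicit(&prod_idx, p + 1, memory_order_release);
    /* Release mirrors the write barrier a driver issues before a doorbell write. */
    atomic_fetch_add_explicit(&doorbell, 1, memory_order_release);
}

struct dev_arg { unsigned total; uint64_t sum; };

/* Device: wait for a doorbell, then consume everything the index says is ready. */
static void *device_loop(void *p)
{
    struct dev_arg *a = p;
    unsigned seen_bell = 0, c = 0, bell;
    while (c < a->total) {
        while ((bell = atomic_load_explicit(&doorbell, memory_order_acquire)) == seen_bell) {
            sched_yield();
        }
        seen_bell = bell;
        unsigned ready = atomic_load_explicit(&prod_idx, memory_order_acquire);
        while (c != ready) {
            a->sum += ring[c % RING_SIZE];
            c++;
            atomic_store_explicit(&cons_idx, c, memory_order_release);
        }
    }
    return NULL;
}

/* Sends values 1..total through the ring and returns the device's sum. */
static uint64_t run_ring(unsigned total)
{
    pthread_t dev;
    struct dev_arg a = { total, 0 };
    atomic_store(&prod_idx, 0);
    atomic_store(&cons_idx, 0);
    atomic_store(&doorbell, 0);
    pthread_create(&dev, NULL, device_loop, &a);
    for (unsigned i = 1; i <= total; i++) {
        driver_send(i);
    }
    pthread_join(dev, NULL);
    return a.sum;
}
```

Because the doorbell carries no data, coalescing several kicks into one
is harmless: the device simply finds a larger producer index in RAM.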

But above all, devices have to request BQL-free MMIO explicitly.  You do
not have to use it at all; you can use the infrastructure just for
unlocked bus-master DMA (which is already broken from the ordering point
of view anyway).  You can limit BQL-free MMIO to PV devices, to extremely
simple devices, or to one or two highly optimized registers.  There is a
huge range of choices, and no magic, really.
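
The opt-in can be pictured as a per-region flag consulted by the
dispatcher: every region is serialized by the BQL unless its device
explicitly clears the flag.  `toy_region`, `needs_bql`,
`toy_region_clear_bql`, `toy_dispatch_write` and the `bql` mutex are
hypothetical and much simpler than QEMU's memory API; the point is only
that the default stays BQL-protected.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the big QEMU lock. */
static pthread_mutex_t bql = PTHREAD_MUTEX_INITIALIZER;
static int bql_acquisitions;        /* counts dispatches that took the BQL */

struct toy_region {
    bool needs_bql;                 /* true unless the device opts out */
    void (*write)(void *opaque, uint64_t addr, uint64_t val);
    void *opaque;
};

static void toy_region_init(struct toy_region *r,
                            void (*write)(void *, uint64_t, uint64_t),
                            void *opaque)
{
    assert(write != NULL);
    r->needs_bql = true;            /* safe default: serialized by the BQL */
    r->write = write;
    r->opaque = opaque;
}

/* Explicit opt-in, for devices that do their own locking or barriers. */
static void toy_region_clear_bql(struct toy_region *r)
{
    r->needs_bql = false;
}

static void toy_dispatch_write(struct toy_region *r, uint64_t addr, uint64_t val)
{
    if (r->needs_bql) {
        pthread_mutex_lock(&bql);
        bql_acquisitions++;
        r->write(r->opaque, addr, val);
        pthread_mutex_unlock(&bql);
    } else {
        r->write(r->opaque, addr, val);  /* device-private synchronization applies */
    }
}

/* Hypothetical register backend that records the last value written. */
static void record_write(void *opaque, uint64_t addr, uint64_t val)
{
    (void)addr;
    *(uint64_t *)opaque = val;
}
```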

Paolo


Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5239A693.2070409@redhat.com \
    --to=pbonzini@redhat.com \
    --cc=mst@redhat.com \
    --cc=peter.maydell@linaro.org \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.