From: "Michael S. Tsirkin" <mst@redhat.com>
To: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	qemu-devel@nongnu.org, Anthony Liguori <anthony@codemonkey.ws>,
	David Gibson <david@gibson.dropbear.id.au>
Subject: Re: [Qemu-devel] [PATCH] Add a memory barrier to guest memory access function
Date: Tue, 22 May 2012 01:22:44 +0300	[thread overview]
Message-ID: <20120521222243.GI17031@redhat.com> (raw)
In-Reply-To: <1337637497.2779.114.camel@pasglop>

On Tue, May 22, 2012 at 07:58:17AM +1000, Benjamin Herrenschmidt wrote:
> On Mon, 2012-05-21 at 15:18 +0300, Michael S. Tsirkin wrote:
> 
> > > mb is more than just "a read flushes writes" (besides, it's not a
> > > statement about flushing, it's a statement about ordering; whether it has
> > > a flushing side effect on x86 is a separate issue, it doesn't on power,
> > > for example).
> > 
> > I referred to reads not bypassing writes on PCI.
> 
> Again, from which originator ? From a given initiator, nothing bypasses
> anything, so the right thing to do here is a full mb(). However, I
> suspect what you are talking about here is read -responses- not
> bypassing writes in the direction of the response (ie, the "flushing"
> semantic of reads) which is a different matter.

No. My spec says (rules A3, A4):

  A Posted Request must be able to pass Non-Posted Requests to avoid
  deadlocks.


> Also don't forget that
> this is only a semantic of PCI, not of the system fabric, ie, a device
> DMA read doesn't flush a CPU write that is still in that CPU store
> queue.

We need to emulate what the hardware does, IMO.
So if USB has different rules, it needs different barriers.


> > This is the real argument in my eyes: that we
> > should behave the way real hardware does.
> 
> But that doesn't really make much sense since we don't actually have a
> non-coherent bus sitting in the middle :-)
> 
> However we should as much as possible be observed to behave as such, I
> agree, though I don't think we need to bother too much about timings
> since we don't really have a way to enforce the immediate visibility of
> stores within the coherent domain without a bunch of arch specific very
> very heavy hammers which we really don't want to wield at this point.
> 
> > > Real flushing out writes matters very much in real life in two very
> > > different contexts that tend to not affect emulation in qemu as much.
> > > 
> > > One is flushing write in the opposite direction (or rather, having the
> > > read response queued up behind those writes) which is critical to
> > > ensuring proper completion of DMAs after an LSI from a guest driver
> > > perspective on real HW typically.
> > > 
> > > The other classic case is to flush posted MMIO writes in order to ensure
> > > that a subsequent delay is respected.
> > > 
> > > Most of those don't actually matter when doing emulation. Besides a
> > > barrier won't provide you the second guarantee, you need a nastier
> > > construct at least on some architectures like power.
> > 
> > Exactly. This is what I was saying too.
> 
> Right and I'm reasonably sure that none of those above is our problem. 
> 
> As I said, at this point, what I want to sort out is purely the
> observable ordering of DMA transactions. The side effect of reads in one
> direction on writes in the other direction is an orthogonal problem
> which as I wrote above is probably not hurting us.
> 
> > > However, we do need to ensure that read and writes are properly ordered
> > > vs. each other (regardless of any "flush" semantic) or things could go
> > > very wrong on OO architectures (here too, my understanding on x86 is
> > > limited).
> > 
> > Right. Here's a compromise:
> > - add smp_rmb() on any DMA read
> > - add smp_wmb() on any DMA write
> > This is almost zero cost on x86 at least.
> > So we are not regressing existing setups.
> 
> And it's not correct. With that setup, DMA writes can pass DMA reads
> (and vice-versa) which doesn't correspond to the guarantees of the PCI
> spec.

Cite the spec, please. The PCI Express spec matches this, at least.
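For illustration only, the rmb/wmb compromise under discussion could look roughly like this in a hypothetical accessor (dma_memory_read/dma_memory_write and guest_mem are stand-in names, not the real QEMU API, and C11 fences stand in for smp_rmb()/smp_wmb()):

```c
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

static uint8_t guest_mem[4096];            /* stand-in for guest RAM */

static void dma_memory_read(uint64_t addr, void *buf, size_t len)
{
    memcpy(buf, guest_mem + addr, len);
    /* smp_rmb(): keeps this read ordered against later reads */
    atomic_thread_fence(memory_order_acquire);
}

static void dma_memory_write(uint64_t addr, const void *buf, size_t len)
{
    /* smp_wmb(): keeps earlier writes ordered before this write */
    atomic_thread_fence(memory_order_release);
    memcpy(guest_mem + addr, buf, len);
}
```

Note that neither fence orders a write against a *subsequent* read; that case needs a full smp_mb(), which is exactly the gap being argued about here.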

> The question I suppose is whether this is a problem in practice...
> 
> > Are there any places where devices do read after write?
> 
> It's possible, things like the update of a descriptor followed by reading
> of the next one, etc...  I don't have an example in mind right now of
> a device that would be hurt, but I'm a bit nervous as this would be a
> violation of the PCI guaranteed ordering.
> 
> > My preferred way is to find them and do pci_dma_flush() invoking
> > smp_mb(). If there is such a case it's likely on datapath anyway
> > so we do care.
> > 
> > But I can also live with a global flag "latest_dma_read"
> > and on read we could do
> > 	if (unlikely(latest_dma_read))
> > 		smp_mb();
> > 
> > if you really insist on it
> > though I do think it's inelegant.
> 
> Again, why do you object to simply making the default accessors fully
> ordered? Do you think it will be a measurable difference in most cases?
> 
> Shouldn't we measure it first ?

It's a lot of work. We measured the effect for virtio in
the past. I don't think we need to redo it.
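For reference, the flag idea quoted above might be sketched as follows. The mail names the flag "latest_dma_read"; read literally, the useful invariant is "the previous access was a write", so this sketch tracks that instead. All names are illustrative, with C11 fences standing in for the kernel-style barriers:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

static uint8_t guest_mem[4096];       /* stand-in for guest RAM */
static bool prev_dma_was_write;

static void dma_read(uint64_t addr, void *buf, size_t len)
{
    /* Pay for a full smp_mb() only on a read that follows a write:
     * write->read is the one ordering wmb/rmb alone cannot give. */
    if (__builtin_expect(prev_dma_was_write, 0)) {
        atomic_thread_fence(memory_order_seq_cst);   /* smp_mb() */
        prev_dma_was_write = false;
    }
    memcpy(buf, guest_mem + addr, len);
    atomic_thread_fence(memory_order_acquire);       /* smp_rmb() */
}

static void dma_write(uint64_t addr, const void *buf, size_t len)
{
    atomic_thread_fence(memory_order_release);       /* smp_wmb() */
    memcpy(guest_mem + addr, buf, len);
    prev_dma_was_write = true;
}
```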

> > You said above x86 is unaffected. This is portability, not safety.
> 
> x86 is unaffected by the missing wmb/rmb; it might not be unaffected by
> the missing ordering between loads and stores, I don't know. As I said,
> I don't fully know the x86 memory model.

You don't need to understand it. Assume memory-barriers.h is correct.

> In any case, opposing "portability" to "safety" the way you do it means
> you are making assumptions that basically "qemu is written for x86 and
> nothing else matters".

No. But find a way to fix power without hurting working setups.

> If that's your point of view, so be it and be clear about it, but I will
> disagree :-) And while I can understand that powerpc might not be
> considered as the most important arch around at this point in time,
> these problems are going to affect ARM as well.
> 
> > > Almost all our
> > > devices were written without any thought given to ordering, so they
> > > basically can and should be considered as all broken.
> > 
> > Problem is, a lot of code is likely broken even after you sprinkle
> > barriers around. For example qemu might write A then B where guest driver
> > expects to see B written before A.
> 
> No, this is totally unrelated bugs, nothing to do with barriers. You are
> mixing up two completely different problems and using one as an excuse
> to not fix the other one :-)
> 
> A device with the above problem would be broken today on x86 regardless.
> 
> > > Since thinking
> > > about ordering is something that, by experience, very few programmer can
> > > do and get right, the default should absolutely be fully ordered.
> > 
> > Give it bus ordering. That is not fully ordered.
> 
> It pretty much is actually, look at your PCI spec :-)

I looked. 2.4.1.  Transaction Ordering Rules
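As an aside, the descriptor-style DMA pattern mentioned earlier is exactly what those ordering rules protect: the (emulated) device must make the payload visible before it publishes the descriptor's "done" flag, or a polling guest driver can see the flag and read stale data. A minimal sketch, with illustrative names and a C11 release fence standing in for smp_wmb():

```c
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

struct desc {
    uint32_t len;
    uint32_t done;            /* guest driver polls this */
};

static uint8_t rx_buf[256];   /* stand-in for a guest-visible buffer */
static struct desc rx_desc;

static void device_complete_rx(const void *payload, uint32_t len)
{
    memcpy(rx_buf, payload, len);              /* 1: payload into guest RAM */
    rx_desc.len = len;                         /* 2: descriptor fields */
    /* smp_wmb(): without this, a reordered store of "done" lets the
     * guest observe the flag before the payload is visible. */
    atomic_thread_fence(memory_order_release);
    rx_desc.done = 1;                          /* 3: publish */
}
```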

> > > Performance regressions aren't a big deal to bisect in that case: If
> > > there's a regression for a given driver and it points to this specific
> > > patch adding the barriers then we know precisely where the regression
> > > come from, and we can get some insight about how this specific driver
> > > can be improved to use more relaxed accessors.
> > > 
> > > I don't see the problem.
> > > 
> > > One thing that might be worth looking at is if indeed mb() is so much
> > > more costly than just wmb/rmb, in which circumstances we could have some
> > > smarts in the accessors to make them skip the full mb based on knowledge
> > > of previous access direction, though here too I would be tempted to only
> > > do that if absolutely necessary (ie if we can't instead just fix the
> > > sensitive driver to use explicitly relaxed accessors).
> > 
> > We did this in virtio and yes it is measurable.
> 
> You did it in virtio in a very hot spot on a performance critical
> driver. My argument is that:
> 
>  - We can do it in a way that doesn't affect virtio at all (by using the
> dma accessors instead of cpu_*)
> 
>  - Only few drivers have that kind of performance criticality and they
> can be easily hand fixed.
> 
> > branches are pretty cheap though.
> 
> Depends, not always but yes, cheaper than barriers in many cases.
> 
> > > > > So on that I will not compromise.
> > > > > 
> > > > > However, I think it might be better to leave the barrier in the dma
> > > > > accessor since that's how you also get iommu transparency etc... so it's
> > > > > not a bad place to put them, and leave the cpu_physical_* for use by
> > > > > lower level device drivers which are thus responsible also for dealing
> > > > > with ordering if they have to.
> > > > > 
> > > > > Cheers,
> > > > > Ben.
> > > > 
> > > > You claim to understand what matters for all devices. I doubt that.
> > > 
> > > It's pretty obvious that anything that does DMA using a classic
> > > descriptor + buffers structure is broken without appropriate ordering.
> > > 
> > > And yes, I claim to have a fairly good idea of the problem, but I don't
> > > think throwing credentials around is going to be helpful.
> > >  
> > > > Why don't we add safe APIs, then go over devices and switch over?
> > > > I counted 97 pci_dma_ accesses.
> > > > 33 in rtl, 32 in eepro100, 12 in lsi, 7 in e1000.
> > > > 
> > > > Let maintainers make a decision where does speed matter.
> > > 
> > > No. Let's fix the general bug first. Then let's people who know the
> > > individual drivers intimately and understand their access patterns make
> > > the call as to when things can/should be relaxed.
> > > 
> > > Ben.
> > 
> > As a maintainer of a device, if you send me a patch I can review.
> > If you change core APIs creating performance regressions
> > I don't even know what to review without wasting time debugging
> > and bisecting.
> 
> Well, if you don't know, then you ask on the list, and others (such as
> myself or a certain mst) who happen to know those issues will help that
> lone clueless maintainer; seriously, it's not like it's hard, and a lot
> easier than just keeping everything broken and hoping we get to audit
> them all properly.
> 
> > According to what you said you want to fix kvm on powerpc.
> > Good. Find a way that looks non-intrusive on x86, please.
> 
> And ARM. And any other OO arch.
> 
> Ben.

