qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	qemu-devel@nongnu.org, Anthony Liguori <anthony@codemonkey.ws>,
	David Gibson <david@gibson.dropbear.id.au>
Subject: Re: [Qemu-devel] [PATCH] Add a memory barrier to guest memory access function
Date: Tue, 22 May 2012 08:56:12 +1000	[thread overview]
Message-ID: <1337640972.2779.127.camel@pasglop> (raw)
In-Reply-To: <20120521222243.GI17031@redhat.com>

On Tue, 2012-05-22 at 01:22 +0300, Michael S. Tsirkin wrote:
> > Again, from which originator ? From a given initiator, nothing
> bypasses
> > anything, so the right thing to do here is a full mb(). However, I
> > suspect what you are talking about here is read -responses- not
> > bypassing writes in the direction of the response (ie, the
> "flushing"
> > semantic of reads) which is a different matter.
> 
> No. My spec says:
> A3, A4
> A Posted Request must be able to pass Non-Posted Requests to avoid
> deadlocks.

Right, a read + write can become write + read at the target, I forgot
about that, or you can deadlock due to the flush semantics, but a write
+ read must remain in order or am I missing something ?

And write + read afaik is typically the one that x86 can re-order
without a barrier isn't it ?

> > Also don't forget that
> > this is only a semantic of PCI, not of the system fabric, ie, a
> device
> > DMA read doesn't flush a CPU write that is still in that CPU store
> > queue.
> 
> We need to emulate what hardware does IMO.
> So if usb has different rules it needs a different barriers.

Who talks about USB here ? Whatever rules USB has only matter between
the USB device and the USB controller emulation, USB doesn't sit
directly on the memory bus doing DMA, it all goes through the HCI, which
adheres the the ordering rules of whatever bus it sits on.

Here, sanity must prevail :-) I suggest ordering by default...

> > And it's not correct. With that setup, DMA writes can pass DMA reads
> > (and vice-versa) which doesn't correspond to the guarantees of the
> PCI
> > spec.
> 
> Cite the spec please. Express spec matches this at least.

Sure, see above. Yes I did forgot that a read + write could be
re-ordered on PCI but that isn't the case of a write + read, or am I
reading the table sideways ?

> It's a lot of work. We measured the effect for virtio in
> the past. I don't think we need to redo it.

virtio is specifically our high performance case and what I'm proposing
isn't affecting it.

> > > You said above x86 is unaffected. This is portability, not safety.
> > 
> > x86 is unaffected by the missing wmb/rmb, it might not be unaffected
> by
> > the missing ordering between loads and stores, I don't know, as I
> said,
> > I don't fully know the x86 memory model.
> 
> You don't need to understand it. Assume memory-barriers.h is correct.

In which case we still need a full mb() unless we can convince ourselves
that the ordering between a write and a subsequent read can be relaxed
safely and I'm really not sure about it.

> > In any case, opposing "portability" to "safety" the way you do it
> means
> > you are making assumptions that basically "qemu is written for x86
> and
> > nothing else matters".
> 
> No. But find a way to fix power without hurting working setups.

And ARM ;-)

Arguably x86 is wrong too anyway, at least from a strict interpretation
off the spec (and unless I missed something).

> > If that's your point of view, so be it and be clear about it, but I
> will
> > disagree :-) And while I can understand that powerpc might not be
> > considered as the most important arch around at this point in time,
> > these problems are going to affect ARM as well.
> > 
> > > > Almost all our
> > > > devices were written without any thought given to ordering, so
> they
> > > > basically can and should be considered as all broken.
> > > 
> > > Problem is, a lot of code is likely broken even after you sprinkle
> > > barriers around. For example qemu might write A then B where guest
> driver
> > > expects to see B written before A.
> > 
> > No, this is totally unrelated bugs, nothing to do with barriers. You
> are
> > mixing up two completely different problems and using one as an
> excuse
> > to not fix the other one :-)
> > 
> > A device with the above problem would be broken today on x86
> regardless.
> > 
> > > > Since thinking
> > > > about ordering is something that, by experience, very few
> programmer can
> > > > do and get right, the default should absolutely be fully
> ordered.
> > > 
> > > Give it bus ordering. That is not fully ordered.
> > 
> > It pretty much is actually, look at your PCI spec :-)
> 
> I looked. 2.4.1.  Transaction Ordering Rules
> 
> > > > Performance regressions aren't a big deal to bisect in that
> case: If
> > > > there's a regression for a given driver and it points to this
> specific
> > > > patch adding the barriers then we know precisely where the
> regression
> > > > come from, and we can get some insight about how this specific
> driver
> > > > can be improved to use more relaxed accessors.
> > > > 
> > > > I don't see the problem.
> > > > 
> > > > One thing that might be worth looking at is if indeed mb() is so
> much
> > > > more costly than just wmb/rmb, in which circumstances we could
> have some
> > > > smarts in the accessors to make them skip the full mb based on
> knowledge
> > > > of previous access direction, though here too I would be tempted
> to only
> > > > do that if absolutely necessary (ie if we can't instead just fix
> the
> > > > sensitive driver to use explicitly relaxed accessors).
> > > 
> > > We did this in virtio and yes it is measureable.
> > 
> > You did it in virtio in a very hot spot on a performance critical
> > driver. My argument is that:
> > 
> >  - We can do it in a way that doesn't affect virtio at all (by using
> the
> > dma accessors instead of cpu_*)
> > 
> >  - Only few drivers have that kind of performance criticality and
> they
> > can be easily hand fixed.
> > 
> > > branches are pretty cheap though.
> > 
> > Depends, not always but yes, cheaper than barriers in many cases.
> > 
> > > > > > So on that I will not compromise.
> > > > > > 
> > > > > > However, I think it might be better to leave the barrier in
> the dma
> > > > > > accessor since that's how you also get iommu transparency
> etc... so it's
> > > > > > not a bad place to put them, and leave the cpu_physical_*
> for use by
> > > > > > lower level device drivers which are thus responsible also
> for dealing
> > > > > > with ordering if they have to.
> > > > > > 
> > > > > > Cheers,
> > > > > > Ben.
> > > > > 
> > > > > You claim to understand what matters for all devices I doubt
> that.
> > > > 
> > > > It's pretty obvious that anything that does DMA using a classic
> > > > descriptor + buffers structure is broken without appropriate
> ordering.
> > > > 
> > > > And yes, I claim to have a fairly good idea of the problem, but
> I don't
> > > > think throwing credentials around is going to be helpful.
> > > >  
> > > > > Why don't we add safe APIs, then go over devices and switch
> over?
> > > > > I counted 97 pci_dma_ accesses.
> > > > > 33 in rtl, 32 in eepro100, 12 in lsi, 7 in e1000.
> > > > > 
> > > > > Let maintainers make a decision where does speed matter.
> > > > 
> > > > No. Let's fix the general bug first. Then let's people who know
> the
> > > > individual drivers intimately and understand their access
> patterns make
> > > > the call as to when things can/should be relaxed.
> > > > 
> > > > Ben.
> > > 
> > > As a maintainer of a device, if you send me a patch I can review.
> > > If you change core APIs creating performance regressions
> > > I don't even know what to review without wasting time debugging
> > > and bisecting.
> > 
> > Well, if you don't know then you ask on the list and others (such as
> > myself or a certain mst) who happens to know those issues will help
> that
> > lone clueless maintainer, seriously it's not like it's hard, and a
> lot
> > easier than just keeping everything broken and hoping we get to
> audit
> > them all properly.
> > 
> > > According to what you said you want to fix kvm on powerpc.
> > > Good. Find a way that looks non intrusive on x86 please.
> > 
> > And ARM. And any other OO arch.
> > 
> > Ben.
> 
> 
> 

  reply	other threads:[~2012-05-21 22:56 UTC|newest]

Thread overview: 89+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-05-10  4:48 [Qemu-devel] [PATCH 00/13] IOMMU infrastructure Benjamin Herrenschmidt
2012-05-10  4:48 ` [Qemu-devel] [PATCH 01/13] Better support for dma_addr_t variables Benjamin Herrenschmidt
2012-05-10  4:48 ` [Qemu-devel] [PATCH 02/13] Implement cpu_physical_memory_zero() Benjamin Herrenschmidt
2012-05-15  0:42   ` Anthony Liguori
2012-05-15  1:23     ` David Gibson
2012-05-15  2:03       ` Anthony Liguori
2012-05-10  4:48 ` [Qemu-devel] [PATCH 03/13] iommu: Add universal DMA helper functions Benjamin Herrenschmidt
2012-05-10  4:48 ` [Qemu-devel] [PATCH 04/13] usb-ohci: Use " Benjamin Herrenschmidt
2012-05-10  4:48 ` [Qemu-devel] [PATCH 05/13] iommu: Make sglists and dma_bdrv helpers use new universal DMA helpers Benjamin Herrenschmidt
2012-05-10  4:49 ` [Qemu-devel] [PATCH 06/13] ide/ahci: Use universal DMA helper functions Benjamin Herrenschmidt
2012-05-21  1:51   ` [Qemu-devel] [PATCH 06/13 - UPDATED] " Benjamin Herrenschmidt
2012-05-10  4:49 ` [Qemu-devel] [PATCH 07/13] usb: Convert usb_packet_{map, unmap} to universal DMA helpers Benjamin Herrenschmidt
2012-05-10  4:49 ` [Qemu-devel] [PATCH 08/13] iommu: Introduce IOMMU emulation infrastructure Benjamin Herrenschmidt
2012-05-15  0:49   ` Anthony Liguori
2012-05-15  1:42     ` David Gibson
2012-05-15  2:03       ` Anthony Liguori
2012-05-15  2:32         ` Benjamin Herrenschmidt
2012-05-15  2:50           ` Anthony Liguori
2012-05-15  3:02             ` Benjamin Herrenschmidt
2012-05-15 14:02               ` Anthony Liguori
2012-05-15 21:55                 ` Benjamin Herrenschmidt
2012-05-15 22:02                   ` Anthony Liguori
2012-05-15 23:08                     ` Benjamin Herrenschmidt
2012-05-15 23:58                       ` Anthony Liguori
2012-05-16  0:41                         ` Benjamin Herrenschmidt
2012-05-16  0:54                           ` Anthony Liguori
2012-05-16  1:20                             ` Benjamin Herrenschmidt
2012-05-16 19:36                               ` Anthony Liguori
2012-05-10  4:49 ` [Qemu-devel] [PATCH 09/13] iommu: Add facility to cancel in-use dma memory maps Benjamin Herrenschmidt
2012-05-10  4:49 ` [Qemu-devel] [PATCH 10/13] pseries: Convert sPAPR TCEs to use generic IOMMU infrastructure Benjamin Herrenschmidt
2012-05-10  4:49 ` [Qemu-devel] [PATCH 11/13] iommu: Allow PCI to use " Benjamin Herrenschmidt
2012-05-10  4:49 ` [Qemu-devel] [PATCH 12/13] pseries: Implement IOMMU and DMA for PAPR PCI devices Benjamin Herrenschmidt
2012-05-10  4:49 ` [Qemu-devel] [PATCH 13/13] iommu: Add a memory barrier to DMA RW function Benjamin Herrenschmidt
2012-05-15  0:52   ` Anthony Liguori
2012-05-15  1:11     ` Benjamin Herrenschmidt
2012-05-15  1:44     ` David Gibson
2012-05-16  4:35       ` Benjamin Herrenschmidt
2012-05-16  5:51         ` David Gibson
2012-05-16 19:39         ` Anthony Liguori
2012-05-16 21:10           ` Benjamin Herrenschmidt
2012-05-16 21:12             ` Benjamin Herrenschmidt
2012-05-17  0:07           ` Benjamin Herrenschmidt
2012-05-17  0:24             ` Benjamin Herrenschmidt
2012-05-17  0:52               ` [Qemu-devel] [RFC/PATCH] Add a memory barrier to guest memory access functions Benjamin Herrenschmidt
2012-05-17  2:28                 ` Anthony Liguori
2012-05-17  2:44                   ` Benjamin Herrenschmidt
2012-05-17 22:09                     ` Anthony Liguori
2012-05-18  1:04                       ` David Gibson
2012-05-18  1:16                       ` Benjamin Herrenschmidt
2012-05-17  3:35                   ` David Gibson
2012-05-18  6:53               ` [Qemu-devel] [PATCH 13/13] iommu: Add a memory barrier to DMA RW function Paolo Bonzini
2012-05-18  8:18                 ` Benjamin Herrenschmidt
2012-05-18  8:57                   ` Paolo Bonzini
2012-05-18 22:26                     ` Benjamin Herrenschmidt
2012-05-19  7:24                       ` Paolo Bonzini
2012-05-20 21:36                         ` Benjamin Herrenschmidt
2012-05-21  1:56                           ` [Qemu-devel] [PATCH] Add a memory barrier to guest memory access functions Benjamin Herrenschmidt
2012-05-21  8:11                             ` Paolo Bonzini
2012-05-21  8:31                               ` Michael S. Tsirkin
2012-05-21  8:58                                 ` Benjamin Herrenschmidt
2012-05-21  9:07                                   ` Benjamin Herrenschmidt
2012-05-21  9:16                                     ` Benjamin Herrenschmidt
2012-05-21  9:34                                       ` Michael S. Tsirkin
2012-05-21  9:53                                         ` Benjamin Herrenschmidt
2012-05-21 10:31                                           ` Michael S. Tsirkin
2012-05-21 11:45                                             ` Benjamin Herrenschmidt
2012-05-21 12:18                                               ` Michael S. Tsirkin
2012-05-21 15:16                                                 ` Paolo Bonzini
2012-05-21 21:58                                                 ` [Qemu-devel] [PATCH] Add a memory barrier to guest memory access function Benjamin Herrenschmidt
2012-05-21 22:22                                                   ` Michael S. Tsirkin
2012-05-21 22:56                                                     ` Benjamin Herrenschmidt [this message]
2012-05-22  5:11                                                       ` Michael S. Tsirkin
2012-05-22  0:00                                                     ` Benjamin Herrenschmidt
2012-05-22  4:19                                                   ` Rusty Russell
2012-05-21 22:18                                 ` [Qemu-devel] [PATCH] Add a memory barrier to guest memory access functions Anthony Liguori
2012-05-21 22:26                                   ` Benjamin Herrenschmidt
2012-05-21 22:31                                     ` Anthony Liguori
2012-05-21 22:44                                       ` Michael S. Tsirkin
2012-05-21 23:02                                         ` Benjamin Herrenschmidt
2012-05-22  4:34                                         ` [Qemu-devel] [PATCH] Add a memory barrier to DMA functions Benjamin Herrenschmidt
2012-05-22  4:51                                           ` Benjamin Herrenschmidt
2012-05-22  7:17                                           ` Benjamin Herrenschmidt
2012-05-22 11:14                                             ` Michael S. Tsirkin
2012-05-22 11:41                                               ` Benjamin Herrenschmidt
2012-05-22 12:03                                                 ` Michael S. Tsirkin
2012-05-22 21:24                                                   ` Benjamin Herrenschmidt
2012-05-22 21:40                                                   ` Anthony Liguori
2012-05-21 22:37                                   ` [Qemu-devel] [PATCH] Add a memory barrier to guest memory access functions Michael S. Tsirkin
2012-05-15  0:52 ` [Qemu-devel] [PATCH 00/13] IOMMU infrastructure Anthony Liguori

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1337640972.2779.127.camel@pasglop \
    --to=benh@kernel.crashing.org \
    --cc=anthony@codemonkey.ws \
    --cc=david@gibson.dropbear.id.au \
    --cc=mst@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).