public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: "Ira W. Snyder" <iws@ovro.caltech.edu>
To: Shi Xuelin-B29237 <B29237@freescale.com>
Cc: "vinod.koul@intel.com" <vinod.koul@intel.com>,
	"dan.j.williams@intel.com" <dan.j.williams@intel.com>,
	"linuxppc-dev@lists.ozlabs.org" <linuxppc-dev@lists.ozlabs.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Li Yang-R58472 <r58472@freescale.com>
Subject: Re: [PATCH][RFC] fsldma: fix performance degradation by optimizing spinlock use.
Date: Fri, 2 Dec 2011 09:13:35 -0800	[thread overview]
Message-ID: <20111202171334.GA19677@ovro.caltech.edu> (raw)
In-Reply-To: <DBB740589CE8814680DECFE34BE197AB14BCBC@039-SN1MPN1-006.039d.mgd.msft.net>

On Fri, Dec 02, 2011 at 03:47:27AM +0000, Shi Xuelin-B29237 wrote:
> Hi Iris,
> 
> >I'm convinced that "smp_rmb()" is needed when removing the spinlock. As noted, Documentation/memory-barriers.txt says that stores on one CPU can be
> >observed by another CPU in a different order.
> >Previously, there was an UNLOCK (in fsl_dma_tx_submit) followed by a LOCK (in fsl_tx_status). This provided a "full barrier", forcing the operations to 
> >complete correctly when viewed by the second CPU. 
> 
> I do not agree this smp_rmb() works here. Because when this smp_rmb() executed and begin to read chan->common.cookie, you still cannot avoid the order issue. Something like one is reading old value, but another CPU is updating the new value. 
> 
> My point is here the order is not important for the DMA decision.
> Completed DMA tx is decided as not complete is not a big deal, because next time it will be OK.
> 
> I believe there is no case that could cause uncompleted DMA tx is decided as completed, because the fsl_tx_status is called after fsl_dma_tx_submit for a specific cookie. If you can give me an example here, I will agree with you.
> 

According to memory-barriers.txt, writes to main memory may be observed in
any order if memory barriers are not used. This means that writes can
appear to happen in a different order than they were issued by the CPU.

Citing from the text:

> There are certain things that the Linux kernel memory barriers do not guarantee:
>
>  (*) There is no guarantee that any of the memory accesses specified before a
>      memory barrier will be _complete_ by the completion of a memory barrier
>      instruction; the barrier can be considered to draw a line in that CPU's
>      access queue that accesses of the appropriate type may not cross.

Also:

> Without intervention, CPU 2 may perceive the events on CPU 1 in some
> effectively random order, despite the write barrier issued by CPU 1:

Also:

> When dealing with CPU-CPU interactions, certain types of memory barrier should
> always be paired.  A lack of appropriate pairing is almost certainly an error.
>
> A write barrier should always be paired with a data dependency barrier or read
> barrier, though a general barrier would also be viable.

Therefore, in an SMP system, the following situation can happen.

descriptor->cookie = 2
chan->common.cookie = 1
chan->completed_cookie = 1

This occurs when CPU-A calls fsl_dma_tx_submit() and then CPU-B calls
dma_async_is_complete() ***after*** CPU-B has observed the write to
descriptor->cookie, and ***before*** before CPU-B has observed the write to
chan->common.cookie.

Remember, without barriers, CPU-B can observe CPU-A's memory accesses in
*any possible order*. Memory accesses are not guaranteed to be *complete*
by the time fsl_dma_tx_submit() returns!

With the above values, dma_async_is_complete() returns DMA_COMPLETE. This
is incorrect: the DMA is still in progress. The required invariant
chan->common.cookie >= descriptor->cookie has not been met.

By adding an smp_rmb(), I force CPU-B to stall until *both* stores in
fsl_dma_tx_submit() (descriptor->cookie and chan->common.cookie) actually
hit main memory. This avoids the above situation: all CPU's observe
descriptor->cookie and chan->common.cookie to update in sync with each
other.

Is this unclear in any way?

Please run your test with the smp_rmb() and measure the performance
impact.

Ira

  reply	other threads:[~2011-12-02 17:13 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-11-22  4:55 [PATCH][RFC] fsldma: fix performance degradation by optimizing spinlock use b29237
2011-11-22 18:59 ` Ira W. Snyder
2011-11-24  8:12   ` Shi Xuelin-B29237
2011-11-28 16:38     ` Ira W. Snyder
2011-11-29  3:19       ` Li Yang-R58472
2011-11-29 17:25         ` Ira W. Snyder
2011-11-30  0:08           ` Tabi Timur-B04825
2011-11-30  9:57           ` Shi Xuelin-B29237
2011-11-30 17:07             ` Ira W. Snyder
2011-12-02  3:47               ` Shi Xuelin-B29237
2011-12-02 17:13                 ` Ira W. Snyder [this message]
2011-12-05  6:11                   ` Shi Xuelin-B29237
2011-11-29 19:49         ` Scott Wood
2011-11-29  3:41       ` Shi Xuelin-B29237

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20111202171334.GA19677@ovro.caltech.edu \
    --to=iws@ovro.caltech.edu \
    --cc=B29237@freescale.com \
    --cc=dan.j.williams@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=r58472@freescale.com \
    --cc=vinod.koul@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox