From: "Ira W. Snyder" <iws@ovro.caltech.edu>
To: Shi Xuelin-B29237 <B29237@freescale.com>
Cc: "vinod.koul@intel.com" <vinod.koul@intel.com>,
"dan.j.williams@intel.com" <dan.j.williams@intel.com>,
"linuxppc-dev@lists.ozlabs.org" <linuxppc-dev@lists.ozlabs.org>,
Li Yang-R58472 <r58472@freescale.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH][RFC] fsldma: fix performance degradation by optimizing spinlock use.
Date: Mon, 28 Nov 2011 08:38:15 -0800 [thread overview]
Message-ID: <20111128163814.GA10919@ovro.caltech.edu> (raw)
In-Reply-To: <DBB740589CE8814680DECFE34BE197AB148CDD@039-SN1MPN1-006.039d.mgd.msft.net>
On Thu, Nov 24, 2011 at 08:12:25AM +0000, Shi Xuelin-B29237 wrote:
> Hi Ira,
>
> Thanks for your review.
>
> After second thought, I think your scenario may not occur.
> Because the cookie 20 we query must be returned by fsl_dma_tx_submit(...) in practice.
> We never query a cookie not returned by fsl_dma_tx_submit(...).
>
I agree about this part.
> When we call fsl_tx_status(20), the chan->common.cookie is definitely wrote as 20 and cpu2 could not read as 19.
>
This is what I don't agree about. However, I'm not an expert on CPU
cache vs. memory accesses in an multi-processor system. The section
titled "CACHE COHERENCY" in Documentation/memory-barriers.txt leads me
to believe that the scenario I described is possible.
What happens if CPU1's write of chan->common.cookie only goes into
CPU1's cache. It never makes it to main memory before CPU2 fetches the
old value of 19.
I don't think you should see any performance impact from the smp_mb()
operation.
Thanks,
Ira
> -----Original Message-----
> From: Ira W. Snyder [mailto:iws@ovro.caltech.edu]
> Sent: 2011年11月23日 2:59
> To: Shi Xuelin-B29237
> Cc: dan.j.williams@intel.com; Li Yang-R58472; zw@zh-kernel.org; vinod.koul@intel.com; linuxppc-dev@lists.ozlabs.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH][RFC] fsldma: fix performance degradation by optimizing spinlock use.
>
> On Tue, Nov 22, 2011 at 12:55:05PM +0800, b29237@freescale.com wrote:
> > From: Forrest Shi <b29237@freescale.com>
> >
> > dma status check function fsl_tx_status is heavily called in
> > a tight loop and the desc lock in fsl_tx_status contended by
> > the dma status update function. this caused the dma performance
> > degrades much.
> >
> > this patch releases the lock in the fsl_tx_status function.
> > I believe it has no neglect impact on the following call of
> > dma_async_is_complete(...).
> >
> > we can see below three conditions will be identified as success
> > a) x < complete < use
> > b) x < complete+N < use+N
> > c) x < complete < use+N
> > here complete is the completed_cookie, use is the last_used
> > cookie, x is the querying cookie, N is MAX cookie
> >
> > when chan->completed_cookie is being read, the last_used may
> > be incresed. Anyway it has no neglect impact on the dma status
> > decision.
> >
> > Signed-off-by: Forrest Shi <xuelin.shi@freescale.com>
> > ---
> > drivers/dma/fsldma.c | 5 -----
> > 1 files changed, 0 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c index
> > 8a78154..1dca56f 100644
> > --- a/drivers/dma/fsldma.c
> > +++ b/drivers/dma/fsldma.c
> > @@ -986,15 +986,10 @@ static enum dma_status fsl_tx_status(struct dma_chan *dchan,
> > struct fsldma_chan *chan = to_fsl_chan(dchan);
> > dma_cookie_t last_complete;
> > dma_cookie_t last_used;
> > - unsigned long flags;
> > -
> > - spin_lock_irqsave(&chan->desc_lock, flags);
> >
>
> This will cause a bug. See below for a detailed explanation. You need this instead:
>
> /*
> * On an SMP system, we must ensure that this CPU has seen the
> * memory accesses performed by another CPU under the
> * chan->desc_lock spinlock.
> */
> smp_mb();
> > last_complete = chan->completed_cookie;
> > last_used = dchan->cookie;
> >
> > - spin_unlock_irqrestore(&chan->desc_lock, flags);
> > -
> > dma_set_tx_state(txstate, last_complete, last_used, 0);
> > return dma_async_is_complete(cookie, last_complete, last_used); }
>
> Facts:
> - dchan->cookie is the same member as chan->common.cookie (same memory location)
> - chan->common.cookie is the "last allocated cookie for a pending transaction"
> - chan->completed_cookie is the "last completed transaction"
>
> I have replaced "dchan->cookie" with "chan->common.cookie" in the below explanation, to keep everything referenced from the same structure.
>
> Variable usage before your change. Everything is used locked.
> - RW chan->common.cookie (fsl_dma_tx_submit)
> - R chan->common.cookie (fsl_tx_status)
> - R chan->completed_cookie (fsl_tx_status)
> - W chan->completed_cookie (dma_do_tasklet)
>
> Variable usage after your change:
> - RW chan->common.cookie LOCKED
> - R chan->common.cookie NO LOCK
> - R chan->completed_cookie NO LOCK
> - W chan->completed_cookie LOCKED
>
> What if we assume that you have a 2 CPU system (such as a P2020). After your changes, one possible sequence is:
>
> === CPU1 - allocate + submit descriptor: fsl_dma_tx_submit() === spin_lock_irqsave
> descriptor->cookie = 20 (x in your example)
> chan->common.cookie = 20 (used in your example)
> spin_unlock_irqrestore
>
> === CPU2 - immediately calls fsl_tx_status() ===
> chan->common.cookie == 19
> chan->completed_cookie == 19
> descriptor->cookie == 20
>
> Since we don't have locks anymore, CPU2 may not have seen the write to
> chan->common.cookie yet.
>
> Also assume that the DMA hardware has not started processing the transaction yet. Therefore dma_do_tasklet() has not been called, and
> chan->completed_cookie has not been updated.
>
> In this case, dma_async_is_complete() (on CPU2) returns DMA_SUCCESS, even though the DMA operation has not succeeded. The DMA operation has not even started yet!
>
> The smp_mb() fixes this, since it forces CPU2 to have seen all memory operations that happened before CPU1 released the spinlock. Spinlocks are implicit SMP memory barriers.
>
> Therefore, the above example becomes:
> smp_mb();
> chan->common.cookie == 20
> chan->completed_cookie == 19
> descriptor->cookie == 20
>
> Then dma_async_is_complete() returns DMA_IN_PROGRESS, which is correct.
>
> Thanks,
> Ira
>
>
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev
next prev parent reply other threads:[~2011-11-28 16:38 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-11-22 4:55 [PATCH][RFC] fsldma: fix performance degradation by optimizing spinlock use b29237
2011-11-22 18:59 ` Ira W. Snyder
2011-11-24 8:12 ` Shi Xuelin-B29237
2011-11-28 16:38 ` Ira W. Snyder [this message]
2011-11-29 3:19 ` Li Yang-R58472
2011-11-29 17:25 ` Ira W. Snyder
2011-11-30 0:08 ` Tabi Timur-B04825
2011-11-30 9:57 ` Shi Xuelin-B29237
2011-11-30 17:07 ` Ira W. Snyder
2011-12-02 3:47 ` Shi Xuelin-B29237
2011-12-02 17:13 ` Ira W. Snyder
2011-12-05 6:11 ` Shi Xuelin-B29237
2011-11-29 19:49 ` Scott Wood
2011-11-29 3:41 ` Shi Xuelin-B29237
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20111128163814.GA10919@ovro.caltech.edu \
--to=iws@ovro.caltech.edu \
--cc=B29237@freescale.com \
--cc=dan.j.williams@intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linuxppc-dev@lists.ozlabs.org \
--cc=r58472@freescale.com \
--cc=vinod.koul@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).