From: "Ira W. Snyder" <iws@ovro.caltech.edu>
To: Shi Xuelin-B29237 <B29237@freescale.com>
Cc: "vinod.koul@intel.com" <vinod.koul@intel.com>,
	"dan.j.williams@intel.com" <dan.j.williams@intel.com>,
	"linuxppc-dev@lists.ozlabs.org" <linuxppc-dev@lists.ozlabs.org>,
	Li Yang-R58472 <r58472@freescale.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH][RFC] fsldma: fix performance degradation by optimizing spinlock use.
Date: Mon, 28 Nov 2011 08:38:15 -0800
Message-ID: <20111128163814.GA10919@ovro.caltech.edu>
In-Reply-To: <DBB740589CE8814680DECFE34BE197AB148CDD@039-SN1MPN1-006.039d.mgd.msft.net>

On Thu, Nov 24, 2011 at 08:12:25AM +0000, Shi Xuelin-B29237 wrote:
> Hi Ira,
> 
> Thanks for your review.
> 
> On second thought, I think your scenario cannot occur, because in practice
> the cookie 20 that we query must have been returned by fsl_dma_tx_submit(...).
> We never query a cookie that was not returned by fsl_dma_tx_submit(...).
> 

I agree with this part.

> When we call fsl_tx_status(20), chan->common.cookie has definitely been written as 20, so CPU2 cannot read it as 19.
> 

This is what I don't agree with. However, I'm not an expert on CPU
cache vs. memory accesses in a multi-processor system. The section
titled "CACHE COHERENCY" in Documentation/memory-barriers.txt leads me
to believe that the scenario I described is possible.

What happens if CPU1's write to chan->common.cookie only goes into
CPU1's cache, and never makes it to main memory before CPU2 fetches
the old value of 19?

I don't think you should see any noticeable performance impact from the
smp_mb() operation.

Thanks,
Ira

> -----Original Message-----
> From: Ira W. Snyder [mailto:iws@ovro.caltech.edu] 
> Sent: November 23, 2011 2:59
> To: Shi Xuelin-B29237
> Cc: dan.j.williams@intel.com; Li Yang-R58472; zw@zh-kernel.org; vinod.koul@intel.com; linuxppc-dev@lists.ozlabs.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH][RFC] fsldma: fix performance degradation by optimizing spinlock use.
> 
> On Tue, Nov 22, 2011 at 12:55:05PM +0800, b29237@freescale.com wrote:
> > From: Forrest Shi <b29237@freescale.com>
> > 
> >     The DMA status check function fsl_tx_status() is called heavily in
> >     a tight loop, and the desc lock it takes is contended by the DMA
> >     status update function. This degrades DMA performance significantly.
> > 
> >     This patch removes the lock from fsl_tx_status(). I believe this
> >     has no significant impact on the subsequent call to
> >     dma_async_is_complete(...).
> > 
> >     The following three conditions are identified as success:
> >     a)  x < complete < use
> >     b)  x < complete+N < use+N
> >     c)  x < complete < use+N
> >     where complete is completed_cookie, use is the last_used
> >     cookie, x is the cookie being queried, and N is the maximum
> >     cookie value.
> > 
> >     While chan->completed_cookie is being read, last_used may be
> >     increased. Either way, this has no significant impact on the
> >     DMA status decision.
> > 
> >     Signed-off-by: Forrest Shi <xuelin.shi@freescale.com>
> > ---
> >  drivers/dma/fsldma.c |    5 -----
> >  1 files changed, 0 insertions(+), 5 deletions(-)
> > 
> > diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
> > index 8a78154..1dca56f 100644
> > --- a/drivers/dma/fsldma.c
> > +++ b/drivers/dma/fsldma.c
> > @@ -986,15 +986,10 @@ static enum dma_status fsl_tx_status(struct dma_chan *dchan,
> >  	struct fsldma_chan *chan = to_fsl_chan(dchan);
> >  	dma_cookie_t last_complete;
> >  	dma_cookie_t last_used;
> > -	unsigned long flags;
> > -
> > -	spin_lock_irqsave(&chan->desc_lock, flags);
> >  
> 
> This will cause a bug. See below for a detailed explanation. You need this instead:
> 
> 	/*
> 	 * On an SMP system, we must ensure that this CPU has seen the
> 	 * memory accesses performed by another CPU under the
> 	 * chan->desc_lock spinlock.
> 	 */
> 	smp_mb();
> >  	last_complete = chan->completed_cookie;
> >  	last_used = dchan->cookie;
> >  
> > -	spin_unlock_irqrestore(&chan->desc_lock, flags);
> > -
> >  	dma_set_tx_state(txstate, last_complete, last_used, 0);
> >  	return dma_async_is_complete(cookie, last_complete, last_used);  }
> 
> Facts:
> - dchan->cookie is the same member as chan->common.cookie (same memory location)
> - chan->common.cookie is the "last allocated cookie for a pending transaction"
> - chan->completed_cookie is the "last completed transaction"
> 
> I have replaced "dchan->cookie" with "chan->common.cookie" in the explanation below, to keep everything referenced from the same structure.
> 
> Variable usage before your change (every access is made with chan->desc_lock held):
> - RW chan->common.cookie		(fsl_dma_tx_submit)
> - R  chan->common.cookie		(fsl_tx_status)
> - R  chan->completed_cookie		(fsl_tx_status)
> - W  chan->completed_cookie		(dma_do_tasklet)
> 
> Variable usage after your change:
> - RW chan->common.cookie		(fsl_dma_tx_submit)	LOCKED
> - R  chan->common.cookie		(fsl_tx_status)		NO LOCK
> - R  chan->completed_cookie		(fsl_tx_status)		NO LOCK
> - W  chan->completed_cookie		(dma_do_tasklet)	LOCKED
> 
> Assume that you have a 2-CPU system (such as a P2020). After your change, one possible sequence is:
> 
> === CPU1 - allocate + submit descriptor: fsl_dma_tx_submit() ===
> spin_lock_irqsave
> descriptor->cookie = 20		(x in your example)
> chan->common.cookie = 20	(used in your example)
> spin_unlock_irqrestore
> 
> === CPU2 - immediately calls fsl_tx_status() ===
> chan->common.cookie == 19
> chan->completed_cookie == 19
> descriptor->cookie == 20
> 
> Since we don't have locks anymore, CPU2 may not have seen the write to
> chan->common.cookie yet.
> 
> Also assume that the DMA hardware has not started processing the transaction yet. Therefore dma_do_tasklet() has not been called, and
> chan->completed_cookie has not been updated.
> 
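> In this case, dma_async_is_complete() (on CPU2) returns DMA_SUCCESS, even though the DMA operation has not succeeded: because last_complete <= last_used (19 <= 19), any cookie greater than last_used is treated as already complete, so cookie 20 is reported as done. The DMA operation has not even started yet!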
> 
> The smp_mb() fixes this, since it forces CPU2 to have seen all memory operations that happened before CPU1 released the spinlock. Spinlocks are implicit SMP memory barriers.
> 
> Therefore, the above example becomes:
> smp_mb();
> chan->common.cookie == 20
> chan->completed_cookie == 19
> descriptor->cookie == 20
> 
> Then dma_async_is_complete() returns DMA_IN_PROGRESS, which is correct.
> 
> Thanks,
> Ira
> 
> 
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev


Thread overview:
2011-11-22  4:55 [PATCH][RFC] fsldma: fix performance degradation by optimizing spinlock use b29237
2011-11-22 18:59 ` Ira W. Snyder
2011-11-24  8:12   ` Shi Xuelin-B29237
2011-11-28 16:38     ` Ira W. Snyder [this message]
2011-11-29  3:19       ` Li Yang-R58472
2011-11-29 17:25         ` Ira W. Snyder
2011-11-30  0:08           ` Tabi Timur-B04825
2011-11-30  9:57           ` Shi Xuelin-B29237
2011-11-30 17:07             ` Ira W. Snyder
2011-12-02  3:47               ` Shi Xuelin-B29237
2011-12-02 17:13                 ` Ira W. Snyder
2011-12-05  6:11                   ` Shi Xuelin-B29237
2011-11-29 19:49         ` Scott Wood
2011-11-29  3:41       ` Shi Xuelin-B29237
