From mboxrd@z Thu Jan 1 00:00:00 1970
From: dan.j.williams@intel.com (Dan Williams)
Date: Wed, 14 Apr 2010 17:08:24 -0700
Subject: DMA Engine API performance issues
In-Reply-To:
References:
Message-ID:
To: linux-arm-kernel@lists.infradead.org
List-Id: linux-arm-kernel.lists.infradead.org

On Tue, Apr 13, 2010 at 11:42 PM, melwyn lobo wrote:
> Hello Dan,
> I have some doubts regarding DMA API usage on its clients for example
> MMC, ALSA, USB etc.
>
> I am going to take example of the ALSA framework. Audio data transfer
> is initiated in soc_pcm_trigger(). This is called in an atomic context,
> with spinlock held and irqs disabled. Here most drivers enable data
> transfer from the MSP peripheral to the audio codec via tx_submit
> implementation of the DMA engine driver. This enqueues the transaction
> in an active list which calls for using spinlocks with bottom half
> disabled. It is in this function when spin_unlock_bh() is called the
> kernel detects irq's disabled and generates a warning.
> So the workaround here for ALSA drivers would be to use tasklet or
> workqueues to defer calling this in an interruptible context, which
> would cause performance problems (the same function soc_pcm_trigger is
> called for stopping the transfer) in cases where the stream has to be
> repeatedly stopped and started.
>
> So the core issue is use of spin_unlock_bh in an atomic context.
> Workaround for removing the warnings would be:
> 1. Use spin_lock with irqsave and corresponding unlock function which
>    does not generate a warning in a similar situation.
>    But this could be futile in one case where the tasklet is scheduled
>    from ksoftirqd which could lead to corruption.
>    Also this means the interrupts would be disabled (on the local cpu)
>    till the function executed which is not something desirable.
> 2. Use local_irq_enable() before calling the DMA APIs and disable once
>    done. This is a crude solution and understandably undesirable and
>    dangerous.
>
> The DMA Engine framework assumes that the channel interrupt handling
> is done in a tasklet (dma_run_dependencies), which I believe is the
> reason for the issue.

dma_run_dependencies() is only needed in the channel switching case
which really only applies to the raid/mem-to-mem usage model (i.e. xor
on one channel followed by copy on another). In the mem-to-io model
you should not need to perform channel switching. I suggest following
what the other mem-to-io drivers (ipu, dw_dmac, coh...) have
implemented with their locks.

In general the dmaengine api is meant to provide
1/ a method for matching dma consumers with capable dma devices
2/ a platform agnostic api for issuing mem-to-mem and simple mem-to-io
   (slave) dma.

If the current framework provides everything you need, then by all
means use it, but you may find there are architecture specific
concerns that cannot be supported under the existing mem-to-io model.
In other words the dmaengine abstraction stops being useful and gets
in the way when there are specific architecture considerations beyond
simple channel to slave-device associations. For example, dw_dmac and
ipu are successfully using the dma-slave interface while the PXA folks
are sticking with their local dma api. So use it if it simplifies
your development more than it complicates it.

--
Dan
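
The locking problem raised in the quoted message can be illustrated with
a toy userspace model. This is only a sketch under stated assumptions and
is not kernel code: every name in it (model_chan, model_lock_bh,
model_lock_irqsave, submit_with_bh_lock, submit_with_irqsave_lock,
trigger_start) is hypothetical. The model_* helpers merely stand in for
spin_lock_bh()/spin_unlock_bh() and
spin_lock_irqsave()/spin_unlock_irqrestore(), and the warning is reduced
to a counter. It compares a tx_submit-style enqueue under each locking
variant when the caller already has interrupts disabled, as
soc_pcm_trigger() is described as doing.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy userspace model, NOT kernel code. All names are hypothetical. */

static bool irqs_enabled = true; /* models the local CPU interrupt state */
static int bh_warnings;          /* modelled "bh re-enabled with irqs off" warnings */

struct model_chan {
	bool locked;     /* models the channel spinlock */
	int queued;      /* descriptors on the active list */
	int next_cookie; /* next cookie to hand out */
};

/* Stand-in for spin_lock_bh(). */
static void model_lock_bh(struct model_chan *c)
{
	assert(!c->locked);
	c->locked = true;
}

/* Stand-in for spin_unlock_bh(): re-enabling bottom halves while
 * interrupts are off is what produces the warning in the thread. */
static void model_unlock_bh(struct model_chan *c)
{
	c->locked = false;
	if (!irqs_enabled)
		bh_warnings++;
}

/* Stand-in for spin_lock_irqsave(): remember the old state, disable irqs. */
static bool model_lock_irqsave(struct model_chan *c)
{
	bool saved = irqs_enabled;

	irqs_enabled = false;
	assert(!c->locked);
	c->locked = true;
	return saved;
}

/* Stand-in for spin_unlock_irqrestore(): put the saved state back. */
static void model_unlock_irqrestore(struct model_chan *c, bool saved)
{
	c->locked = false;
	irqs_enabled = saved;
}

/* tx_submit-style enqueue using the bottom-half lock variant. */
static int submit_with_bh_lock(struct model_chan *c)
{
	int cookie;

	model_lock_bh(c);
	cookie = c->next_cookie++;
	c->queued++;
	model_unlock_bh(c);
	return cookie;
}

/* The same enqueue using the irqsave variant (option 1 in the quote). */
static int submit_with_irqsave_lock(struct model_chan *c)
{
	bool saved = model_lock_irqsave(c);
	int cookie = c->next_cookie++;

	c->queued++;
	model_unlock_irqrestore(c, saved);
	return cookie;
}

/* Models an atomic caller: interrupts are already off around the submit. */
static int trigger_start(struct model_chan *c,
			 int (*submit)(struct model_chan *))
{
	bool saved = irqs_enabled;
	int cookie;

	irqs_enabled = false;
	cookie = submit(c);
	irqs_enabled = saved;
	return cookie;
}
```

Calling trigger_start() with submit_with_bh_lock bumps the warning
counter, while submit_with_irqsave_lock does not and leaves the caller's
interrupt state as it found it. The model does not capture the drawback
noted in the quote, namely that interrupts stay disabled on the local
cpu for the whole critical section.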