* dmaengine: fix dma_unmap (was: Re: [PATCH 06/13] DMAENGINE: driver for the ARM PL080/PL081 PrimeCells)
From: Dan Williams @ 2011-01-03 16:36 UTC
To: Russell King - ARM Linux
Cc: Linus Walleij, Viresh Kumar, Kukjin Kim, yuanyabin1978, linux-kernel,
    Ben Dooks, Peter Pearse, linux-arm-kernel, Alessandro Rubini, linux-raid

On Mon, Jan 3, 2011 at 3:14 AM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Sun, Jan 02, 2011 at 09:33:34PM +0100, Linus Walleij wrote:
>> As for the in-tree PL08x driver I'd say it's doing pretty well for
>> memcpy() so we could add platform data for that on supported
>> platforms, then for device transfers we need more elaborative
>> work.
>
> It has the issue that it's not unmapping the buffers after the memcpy()
> operation has completed, so on ARMv6+ we have the possibility for
> speculative prefetches to corrupt the destination buffer.
>
> Neither are a number of the other DMA engine drivers.  This is why I'd
> like to see some common infrastructure in the DMA engine core for saying
> "this tx descriptor is now complete" so that DMA engine driver authors
> don't have to even think about whether they should be unmapping buffers.

This requires that a copy of the mapped addresses be maintained
outside the driver's physical descriptor.  This needs support from the
client to set up storage for this information (probably a
scatterlist).  The dmaengine core could use this to implement a common
unmap routine.  However, this still has the problem of how to prevent
unmapping too early in the multi-operation raid case and how to
communicate the full set of addresses to unmap to the final descriptor
in such a chain.  I think the only way to fully solve this is to make
the client solely responsible for both mapping and unmapping.
For raid this will have implications for architectures that split
operation types on to different physical channels.  Preparing the
entire operation chain ahead of time is not possible on such
configuration because we need to remap the buffers for each channel
transition.  So, raid will have an optimized path for engines like
mv_xor, ioatdma, and iop-adma (iop13xx) where all buffers can be
mapped upfront (against a single physical channel) and then unmapped
when all stripe operations complete.  For the others iop-adma (iop3xx)
and ppc44x we need to wait for each leg to finish before mapping and
issuing the next leg.  There will most likely be negative performance
implications of waiting and reissuing, but as far as I can see this is
unavoidable.

> I'd also like to see DMA_COMPL_SKIP_*_UNMAP always set by prep_slave_sg()
> in tx->flags so we don't have to end up with "is this a slave operation"
> tests in the completion handler.

Longer term I do not see these flags surviving, but yes a 2.6.38
change along these lines makes sense.

--
Dan
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
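The "unmap only when all stripe operations complete" rule described above can be sketched as a small userspace model. This is an illustration under stated assumptions, not kernel code: `struct model_stripe`, `model_stripe_map()` and `model_stripe_op_done()` are hypothetical names that do not exist in dmaengine or md, and the real mapping calls are reduced to a flag.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model (not a kernel structure): a stripe whose
 * buffers the client maps once, up front, for a chain of operations.
 */
struct model_stripe {
	int pending_ops;	/* operations of the chain still in flight */
	bool mapped;		/* true while the device owns the buffers */
};

/* Client maps all buffers before issuing the whole chain of nr_ops. */
static void model_stripe_map(struct model_stripe *s, int nr_ops)
{
	assert(!s->mapped && nr_ops > 0);
	s->mapped = true;
	s->pending_ops = nr_ops;
}

/*
 * Completion hook for one operation of the chain. The buffers are
 * unmapped only when the last operation completes, never earlier.
 * Returns true if this call performed the unmap.
 */
static bool model_stripe_op_done(struct model_stripe *s)
{
	assert(s->mapped && s->pending_ops > 0);
	if (--s->pending_ops > 0)
		return false;	/* chain still running: keep the mapping */
	s->mapped = false;	/* stripe returns to the cpu */
	return true;
}
```

In this model the client, not the driver, decides when the mapping ends, which is the ownership split proposed in the message above. For engines that need a remap per leg, the same two calls would simply be used with a chain length of one per leg.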
* Re: dmaengine: fix dma_unmap (was: Re: [PATCH 06/13] DMAENGINE: driver for the ARM PL080/PL081 PrimeCells)
From: Russell King - ARM Linux @ 2011-01-03 16:52 UTC
To: Dan Williams
Cc: Viresh Kumar, Kukjin Kim, Linus Walleij, Peter Pearse, linux-kernel,
    linux-raid, Ben Dooks, yuanyabin1978, linux-arm-kernel, Alessandro Rubini

On Mon, Jan 03, 2011 at 08:36:00AM -0800, Dan Williams wrote:
> For raid this will have implications for architectures that split
> operation types on to different physical channels.  Preparing the
> entire operation chain ahead of time is not possible on such
> configuration because we need to remap the buffers for each channel
> transition.

That's not entirely true.  You will only need to remap buffers if
old_chan->device != new_chan->device, as the underlying struct device
will be different and could possibly have a different IOMMU or
DMA-able memory parameters.

So, when changing channels, the optimization is not engine specific,
but can be effected when the chan->device points to the same dma_device
structure.  That means it should still be possible to chain several
operations together, even if it means that they occur on different
channels on the same device.

One passing idea is the async_* operations need to chain buffers in
terms of <virtual page+offset, len, dma_addr_t, struct dma_device *>, or
maybe <struct dma_device *, scatterlist>.  If the dma_device pointer is
initialized, the scatterlist is already mapped.  If this differs from
the dma_device for the next selected operation, the previous operations
need to be run, then unmap and remap for the new device.

Does that sound possible?
> > I'd also like to see DMA_COMPL_SKIP_*_UNMAP always set by prep_slave_sg()
> > in tx->flags so we don't have to end up with "is this a slave operation"
> > tests in the completion handler.
>
> Longer term I do not see these flags surviving, but yes a 2.6.38
> change along these lines makes sense.

Well, if the idea is to kill those flags, then it would be a good idea
not to introduce new uses of them as that'll only complicate matters.

I do have an untested patch which adds the unmap to pl08x, but I'm
wondering if it's worth it, or whether to disable the memcpy support
for the time being.
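The <struct dma_device *, scatterlist> idea in the message above can be sketched as a userspace model. Everything here is an assumption made for illustration: the `model_*` types are stand-ins rather than the kernel's `struct dma_device` or `struct dma_chan`, and the flush, unmap and map steps are reduced to counters.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for dma_device and dma_chan (not kernel types). */
struct model_dma_device { int id; };
struct model_dma_chan { struct model_dma_device *device; };

/* A buffer plus the device it is currently mapped for (NULL = unmapped). */
struct model_buf {
	struct model_dma_device *mapped_for;
	int map_count;		/* number of map operations performed */
	int unmap_count;	/* number of unmap operations performed */
};

enum model_prep_result { MODEL_REUSED, MODEL_MAPPED, MODEL_REMAPPED };

/* Prepare buf for the next operation, which was assigned to chan. */
static enum model_prep_result
model_prepare(struct model_buf *buf, const struct model_dma_chan *chan)
{
	assert(buf != NULL && chan != NULL && chan->device != NULL);

	/* Same underlying device: keep chaining, the mapping stays valid. */
	if (buf->mapped_for == chan->device)
		return MODEL_REUSED;

	if (buf->mapped_for != NULL) {
		/*
		 * Different device: the earlier operations would have to
		 * run to completion here, then unmap and remap.
		 */
		buf->unmap_count++;
		buf->mapped_for = chan->device;
		buf->map_count++;
		return MODEL_REMAPPED;
	}

	/* First use: the device pointer was not initialized yet. */
	buf->mapped_for = chan->device;
	buf->map_count++;
	return MODEL_MAPPED;
}
```

Two channels that share a device reuse the mapping in this model, while a channel on another device forces one unmap and one new map, matching the old_chan->device != new_chan->device test.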
* Re: dmaengine: fix dma_unmap (was: Re: [PATCH 06/13] DMAENGINE: driver for the ARM PL080/PL081 PrimeCells)
From: Dan Williams @ 2011-01-03 21:26 UTC
To: Russell King - ARM Linux
Cc: Viresh Kumar, Kukjin Kim, Linus Walleij, Peter Pearse, linux-kernel,
    linux-raid, Ben Dooks, yuanyabin1978, linux-arm-kernel, Alessandro Rubini

On Mon, Jan 3, 2011 at 8:52 AM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Mon, Jan 03, 2011 at 08:36:00AM -0800, Dan Williams wrote:
>> For raid this will have implications for architectures that split
>> operation types on to different physical channels.  Preparing the
>> entire operation chain ahead of time is not possible on such
>> configuration because we need to remap the buffers for each channel
>> transition.
>
> That's not entirely true.  You will only need to remap buffers if
> old_chan->device != new_chan->device, as the underlying struct device
> will be different and could possibly have a different IOMMU or
> DMA-able memory parameters.
>

Yes, but currently operation capabilities are organized per dma device
(i.e. all channels on a dma device share the same set of
capabilities).  The channel allocator will keep the chain on a single
channel where possible, but if it determines we need to switch to a
channel with a different capability set then we have also switched dma
devices at that point.  iop3xx and ppc4xx currently have this
dma_device-per-dma_chan organization.  They could switch to a model of
hiding multiple hw channels behind a single dma_chan, but then they
would need to handle the operation ordering and channel transitions
internally.

> So, when changing channels, the optimization is not engine specific,
> but can be effected when the chan->device points to the same dma_device
> structure.
> That means it should still be possible to chain several
> operations together, even if it means that they occur on different
> channels on the same device.
>
> One passing idea is the async_* operations need to chain buffers in
> terms of <virtual page+offset, len, dma_addr_t, struct dma_device *>, or
> maybe <struct dma_device *, scatterlist>.  If the dma_device pointer is
> initialized, the scatterlist is already mapped.  If this differs from
> the dma_device for the next selected operation, the previous operations
> need to be run, then unmap and remap for the new device.
>
> Does that sound possible?

Yes, but the dma driver still does not have enough information to
determine when it is finally safe to unmap / allow speculative reads.
The raid driver can make a much cleaner guarantee of "this stripe now
belongs to a dma device" and "all dma operations have completed, this
stripe can be returned to the cpu / rescheduled on a new channel".

>> > I'd also like to see DMA_COMPL_SKIP_*_UNMAP always set by prep_slave_sg()
>> > in tx->flags so we don't have to end up with "is this a slave operation"
>> > tests in the completion handler.
>>
>> Longer term I do not see these flags surviving, but yes a 2.6.38
>> change along these lines makes sense.
>
> Well, if the idea is to kill those flags, then it would be a good idea
> not to introduce new uses of them as that'll only complicate matters.
>
> I do have an untested patch which adds the unmap to pl08x, but I'm
> wondering if it's worth it, or whether to disable the memcpy support
> for the time being.

We could disable the driver if NET_DMA or ASYNC_TX_DMA are selected.
That still allows the driver to be exercised with dmatest.  Although
I notice the driver is already marked experimental, do we need
something stronger for 37-final?
--
Dan
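The channel-selection behaviour described in the message above (stay on the current channel where possible, switch when the capability set does not cover the next operation) can be modelled in a few lines. This is a hedged sketch only: `model_pick_chan()`, `struct model_chan` and the capability bits are hypothetical and are not the dmaengine channel allocator.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical capability bits, one per operation type. */
enum model_op {
	MODEL_OP_MEMCPY = 1 << 0,
	MODEL_OP_XOR    = 1 << 1
};

/* Hypothetical channel: the device it belongs to and what it can do. */
struct model_chan {
	int device_id;
	unsigned int caps;
};

/*
 * Pick a channel for op, staying on cur when it is capable.
 * Returns NULL when no channel in chans[0..n) supports op.
 */
static const struct model_chan *
model_pick_chan(const struct model_chan *cur,
		const struct model_chan *chans, size_t n, enum model_op op)
{
	size_t i;

	if (cur != NULL && (cur->caps & op))
		return cur;	/* keep the chain on a single channel */

	for (i = 0; i < n; i++)
		if (chans[i].caps & op)
			return &chans[i];

	return NULL;
}
```

With one device per channel and split capabilities (the organization attributed above to iop3xx and ppc4xx), a memcpy followed by an xor lands on two different devices in this model, so the buffers would need a remap between the legs. With a single channel that advertises both capabilities, the chain never leaves it.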
* Re: dmaengine: fix dma_unmap (was: Re: [PATCH 06/13] DMAENGINE: driver for the ARM PL080/PL081 PrimeCells)
From: Linus Walleij @ 2011-01-04 22:34 UTC
To: Dan Williams
Cc: Russell King - ARM Linux, Viresh Kumar, Kukjin Kim, Peter Pearse,
    linux-kernel, linux-raid, Ben Dooks, yuanyabin1978, linux-arm-kernel,
    Alessandro Rubini

2011/1/3 Dan Williams <dan.j.williams@intel.com>:

> We could disable the driver if NET_DMA or ASYNC_TX_DMA are selected.
> That still allows the driver to be exercised with dmatest.  Although
> I notice the driver is already marked experimental, do we need
> something stronger for 37-final?

Your pick, IMHO.

To use it out-of-the-box with 2.6.37 is not possible on any system
anyway - we have not patched in the required platform data to any
ARM system!  Those who do such things surely know what they're
doing.

Yours,
Linus Walleij