From mboxrd@z Thu Jan 1 00:00:00 1970
From: Martin Sperl
Subject: Re: Depreciated spi_master.transfer and "prepared spi messages" for an optimized pipelined-SPI-DMA-driver
Date: Fri, 15 Nov 2013 12:15:55 +0100
Message-ID: <5286026B.2090903@sperl.org>
References: <20131107203127.GB2493@sirena.org.uk> <86AE15B6-05AF-4EFF-8B8F-10806A7C148B@sperl.org> <20131108161957.GP2493@sirena.org.uk> <5F70E708-89B9-4DCF-A31A-E688BAA0E062@sperl.org> <20131108180934.GQ2493@sirena.org.uk> <20131109183056.GU2493@sirena.org.uk> <6C7903B3-8563-490E-AD7D-BA5D65FFB9BC@sperl.org> <20131112011954.GH2674@sirena.org.uk> <52823E73.503@sperl.org> <20131114015009.GB26614@sirena.org.uk> <9640F4C7-7F82-453E-9D83-5875A1559A20@sperl.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: linux-spi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
To: Mark Brown
Return-path:
In-Reply-To: <9640F4C7-7F82-453E-9D83-5875A1559A20-d5rIkyn9cnPYtjvyW6yDsg@public.gmane.org>
Sender: linux-spi-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
List-ID:

Hi Mark!

I thought a bit about what you said about DMA and building blocks, and I have come up with a possible vision of the "future".

You essentially gave me the idea that we could reduce restrictions on the proposed spi_prepare_message by adding some flags indicating in what way the structure may change. So if you have a "fully" fixed structure, we could add flags here to indicate whether the structure is also fixed with regard to:
* length of transfers
* changes in the pointers to transfers
* providing DMA data or not (this could get indicated via is_dma_mapped)

The bus driver (or framework) could then use different construction schemes to build the DMA chain based on the different flag variations, and thus optimize the time it takes to build the DMA chain (which is currently the worst case in my driver).
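To make that concrete, here is a minimal userspace sketch of how a bus driver might pick a DMA-chain construction scheme from such flags. All names here (SPI_MSG_FIXED_LENGTH, pick_strategy, the strategy enum) are made up for illustration - nothing like them exists in the SPI core today:

```c
#include <assert.h>

/* Hypothetical flags a caller could pass to spi_prepare_message() to
 * promise what will NOT change between submissions of the message. */
#define SPI_MSG_FIXED_LENGTH   (1u << 0) /* transfer lengths stay fixed   */
#define SPI_MSG_FIXED_BUFFERS  (1u << 1) /* tx/rx pointers stay fixed     */
#define SPI_MSG_DMA_MAPPED     (1u << 2) /* caller provides DMA addresses */

enum dma_build_strategy {
	BUILD_FULL,        /* rebuild the whole control-block chain each time */
	BUILD_PATCH_ADDRS, /* reuse the chain, patch only buffer addresses    */
	BUILD_REUSE,       /* chain is fully reusable as-is                   */
};

/* Pick the cheapest construction scheme the caller's promises allow. */
static enum dma_build_strategy pick_strategy(unsigned long flags)
{
	if ((flags & SPI_MSG_FIXED_LENGTH) && (flags & SPI_MSG_FIXED_BUFFERS))
		return BUILD_REUSE;
	if (flags & SPI_MSG_FIXED_LENGTH)
		return BUILD_PATCH_ADDRS;
	return BUILD_FULL;
}
```

The point of the sketch is only that the stricter the promises, the more of the prepared DMA chain can survive between submissions.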
This would definitely improve the performance of those sync calls (if we somehow do a prepare in the sync call for the hot path).

The driver can also start to use pre-built/cached "building blocks" that make up a lot of the DMA control blocks that are needed (configuring the SPI controller and such).

For more complex drivers we may also defer the DMA preparation (and the scheduling thereof) of messages to a worker thread, which would produce DMA chains to schedule. But if the message is already prepared, it would get scheduled from spi_async context immediately - unless there is something in the prepare queue, in which case it would have to wait so that no reordering occurs.

This comes with a potential "trap": we would need to define whether a "reorder" of SPI messages between two different SPI bus devices with different CS is a valid operation or not. (The question is mostly: would this queuing happen on a per-bus-driver basis or on a per-CS basis?)

The timing sequence would look like this (for spi_async messages scheduled at the same time):

driver A: spi_async(spi0, msg0_unprepared); /* gets put into the "queue" */
driver B: spi_async(spi1, msg1_prepared);   /* gets scheduled immediately */

On the SPI bus you would then see the following order:

msg1_prepared; msg0_unprepared;

So here we need a "definition" of scheduling policies. As far as I can tell (and I may be wrong), the issue described would only have an impact if a single device driver required two or more different CS to make a device work properly. So I would say that reordering is essentially allowed if we are talking about two different drivers talking to different devices (or multiple instances of a single driver talking to distinct devices of the same type).

Note that this could expose some strange "concurrency behavior" in some drivers due to implicit assumptions which are not defined until now and which would never show up with the current setup.
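The ordering rule sketched above - a prepared message may bypass the preparation worker only when nothing is waiting ahead of it - can be written down as a toy model. toy_master and can_schedule_immediately are invented names for illustration, not kernel API:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the per-bus scheduling decision discussed above. */
struct toy_master {
	int prepare_queue_len; /* messages still waiting for DMA preparation */
};

/* A prepared message may be scheduled straight from spi_async context
 * only if the prepare queue is empty; otherwise it must queue behind
 * the unprepared messages so that bus order is preserved. */
static bool can_schedule_immediately(const struct toy_master *m,
				     bool msg_prepared)
{
	return msg_prepared && m->prepare_queue_len == 0;
}
```

Whether this check lives per bus driver or per chip select is exactly the policy question raised above.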
That is, unless you find a board with 2 SPI buses and connect an identical device to each SPI bus. But looked at from this direction, such an assumption in the driver would be a bug that needs to get fixed in the device driver itself.

But if a driver exists for a device that requires two (or more) chip selects with the additional requirement of "proper" ordering, then:
* either the driver needs to make both messages "prepared" (we may need to provide an extra flag to the "prepare" to force global ordering on the SPI bus),
* or we make it a requirement that - in such cases - the spi_lock_bus interface is used to block the whole bus.

But the first question is: are there any drivers currently that require multiple CS to do their work?

All of these ideas will also help us in the future and may then get used for other bus drivers as well - we could even start using a "DMA-sequencing" engine inside the framework to reduce the code in the drivers. (How to define those sequences would be tricky, though - we would need to see how "flexible" other devices are compared to the bcm2835.)

But first we need to get one working driver with "optimal" performance, so that we can confirm it works in principle, then continue to see where optimizations can be made, and finally refactor this into a "generic" solution sharing code in the SPI framework - it may eventually even move into dmaengine...

Another thing that came to my mind is a means of giving spi_async a hint whether it runs in interrupt context or not. This would give us the opportunity to allow memory allocations in spi_async and transfer using GFP_KERNEL instead of GFP_ATOMIC for the cases that run outside interrupt context. We could make this a flag in the spi_message structure, or use different functions - spi_async_threaded and spi_async_interrupt - with spi_async mapping to spi_async_interrupt for compatibility with older drivers.
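A minimal sketch of that context hint, with the kernel's GFP constants mocked locally so the example is self-contained; spi_async_gfp is a hypothetical helper that both spi_async_threaded and spi_async_interrupt could funnel into:

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-ins for the kernel's allocation flags, defined here
 * only so this sketch compiles on its own. */
#define GFP_KERNEL 0x01u
#define GFP_ATOMIC 0x02u

typedef unsigned int gfp_t;

/* Map "may this submission path sleep?" to the allocation flags the
 * prepare/DMA-build path should use.  A threaded caller can afford
 * GFP_KERNEL (may sleep, can reclaim); an interrupt-context caller
 * must fall back to GFP_ATOMIC. */
static gfp_t spi_async_gfp(bool may_sleep)
{
	return may_sleep ? GFP_KERNEL : GFP_ATOMIC;
}
```

Whether the hint arrives as a flag in spi_message or via the split entry points, the allocation decision reduces to something like this helper.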
Both are valid approaches - I believe it is a matter of taste which one gets used...

Another thing that you may have mentioned could be a message type like "read(8|16)bitReadXbyte": you first read 1 (or 2) bytes and then read X bytes, where X comes from those first one or two bytes. The problem is that such a scheme would not necessarily work with a fully DMA-pipelined approach - especially as the endianness might differ (we could possibly work around that with 1-byte DMA transfers, if the DMA engine supports them). But in some hardware devices the length may be mixed with other control bits, which would require some "and+shift" logic that is impossible to implement in DMA on most DMA engines (the mcp2515 would be such a case). The real question here is: how many such devices exist, and would they make it worth doing this in a generic way?

Finally, I have a proposal to avoid the naming conflict we have seen in our discussions so far. Why not call the new interface:

int spi_(pre)build_message(struct spi_device *, struct spi_message *, unsigned long flags);
int spi_unbuild_message(struct spi_device *, struct spi_message *);

with the corresponding equivalent methods in spi_master? Other naming ideas are welcome (I just would not say "DMA", as it might get used for other things as well).

So I think all the above ideas give us a lot to ponder about how/where we may want to move in the future...

In the meantime I will try to move the SPI bus driver to the transfer interface, and I will probably add a module parameter to allow selecting which interface is used. This way we have many more scenarios to compare, to see how they fare in different situations/use-cases/... But in the end I would guess that the "direct" DMA-chaining approach will give the best results...

Martin

P.S: maybe we should start renaming the subject of this thread?