Continuous streaming SPI transfer

linux-spi.vger.kernel.org archive mirror

* Continuous streaming SPI transfer
@ 2012-09-27 22:09 Nuutti Kotivuori
From: Nuutti Kotivuori @ 2012-09-27 22:09 UTC
  To: spi-devel-general-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

Hello,

I would like to use SPI in a streaming fashion, with a transfer being
active all the time. This seems very difficult with the current kernel
drivers. I am mainly looking at the Raspberry Pi BCM2708 SPI driver, but
I have also tried to find similar features from other drivers.

There seems to be no way to prevent the deactivation and reactivation of
the clock and everything between separate transfers - and a single
transfer is bounded in size and no progress is reported for it. Even
within a single transfer, it would seem that the driver waits for an
earlier transfer to raise its DONE signal (TX FIFO empty, RX FIFO has
everything) before starting a new transfer, so there's always a small gap
between transfers. (The RPi hardware stops sending SCLK whenever no byte
is ready in the FIFO to be sent, so by the time a DONE signal is
received, SCLK has already stopped.)

Keeping a transfer active constantly would need some buffering on writes
and reads to keep the TX FIFO always filled and RX FIFO not
full. Otherwise I don't see any direct problems with it, at least on the
hardware level.
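To make that buffering concrete, here is a minimal C sketch of a software ring buffer sitting between the application and the TX FIFO (a second instance would serve the RX side). The names `struct ring`, `ring_put` and `ring_get` are hypothetical and not part of any existing driver; the capacity is an arbitrary power of two so that indices can wrap with a mask.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical software buffer between the application and the SPI FIFO.
 * RING_SIZE must be a power of two so indices can wrap with a mask. */
#define RING_SIZE 4096u

struct ring {
    uint8_t data[RING_SIZE];
    size_t head;   /* total bytes ever written (producer side) */
    size_t tail;   /* total bytes ever read (consumer side) */
};

/* Bytes currently buffered; unsigned subtraction stays correct after
 * head and tail wrap around. */
static size_t ring_count(const struct ring *r)
{
    return r->head - r->tail;
}

/* Producer: queue one byte. Returns false if the buffer is full. */
static bool ring_put(struct ring *r, uint8_t byte)
{
    if (ring_count(r) == RING_SIZE)
        return false;
    r->data[r->head & (RING_SIZE - 1)] = byte;
    r->head++;
    return true;
}

/* Consumer (e.g. the TX FIFO refill path): take one byte. Returns false if
 * empty, which on the TX side means the FIFO is about to underrun. */
static bool ring_get(struct ring *r, uint8_t *byte)
{
    if (ring_count(r) == 0)
        return false;
    *byte = r->data[r->tail & (RING_SIZE - 1)];
    r->tail++;
    return true;
}
```

The refill routine would call `ring_get()` each time the hardware reports room in the TX FIFO; a `false` return there is exactly the underrun that stops SCLK. As written it is single-threaded; sharing it between an interrupt handler and a reader would need atomics or memory barriers.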

So, my questions are:

 1) Is there a fundamental problem in keeping an SPI transfer active for
    extended periods of time (several hours)?
 2) Is this possible with the kernel API somehow?
 3) Is this possible from userland somehow?
 4) If it is not possible, what would be a good API for such a case?
 5) Would patches for such functionality be accepted?

Thank you in advance,
-- Naked

* Re: Continuous streaming SPI transfer
@ 2012-09-28  1:30 Ned Forrester
From: Ned Forrester @ 2012-09-28  1:30 UTC
  To: Nuutti Kotivuori; +Cc: spi-devel-general-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

On 09/27/2012 06:09 PM, Nuutti Kotivuori wrote:
> Hello,
> 
> I would like to use SPI in a streaming fashion, with a transfer being
> active all the time. This seems very difficult with the current kernel
> drivers. I am mainly looking at the Raspberry Pi BCM2708 SPI driver, but
> I have also tried to find similar features from other drivers.

You don't say whether you mean to transfer into or out of the external
device (or both). Nor do you say what clock rate you want to use (what's
the average bit/byte rate?).

I am not familiar with the hardware you want to use, but I have done
this on a PXA255 processor, at 11Mbit/sec from the external device to
the PXA processor.  For this I had to extensively modify the pxa2xx_spi
controller driver (that was with kernel 2.6.20, I think the name has
changed in recent kernels), and, of course, to write my own protocol
driver (the glue between the kernel side of Clib and the controller
driver).  I did not have to make any changes to the SPI core.

NOTE WELL: Please don't ask for this code, I cannot share it.

> There seems to be no way to prevent the deactivation and reactivation of
> the clock and everything between separate transfers - and a single
> transfer is bounded in size and no progress is reported for it. Even
> within a single transfer, it would seem that an earlier transfer is
> waited to receive a DONE signal (tx fifo empty, rx fifo has everything)
> before starting a new transfer, so there's always a small gap between
> transfers. (The RPi hardware stops sending SCLK if there ever is a
> condition where there is no byte ready in the FIFO to be sent, so if a
> DONE signal is received that means that SCLK has already been stopped.)

To overcome this problem, I ran the PXA processor as a slave, with the
clocks supplied from external hardware.  The problem then becomes
keeping the receive FIFO from overflowing.  That was solved by enabling
chained DMA transfers (at enormous driver-writing expense; I'd never
written a device driver before).  DMA chaining allows the DMA hardware
to fetch its own next set of DMA parameters (addresses and byte count)
at the end of each DMA, thus avoiding the interrupt latency between each
buffer of data.  Using 256 buffers of 4096 bytes, interrupt latency of
100s of milliseconds becomes acceptable.
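The arithmetic behind those numbers: 256 x 4096 bytes is 1 MiB, or 8,388,608 bits, which at 11 Mbit/s is about 0.76 s of data, so the CPU can be late by hundreds of milliseconds before any buffer is overwritten. The C sketch below shows the chaining idea with a hypothetical descriptor layout (real DMA engines, including the PXA's, define their own register-level formats): each descriptor names a buffer and points at the next one, and the last points back to the first so the engine keeps running.

```c
#include <stddef.h>
#include <stdint.h>

#define NBUF   256u      /* number of chained buffers (figure from the text) */
#define BUFLEN 4096u     /* bytes per buffer (figure from the text) */

/* Hypothetical descriptor: real hardware defines its own format. */
struct dma_desc {
    uint8_t *dst;              /* where this DMA writes received bytes */
    uint32_t len;              /* byte count for this DMA */
    struct dma_desc *next;     /* fetched by the engine when this one ends */
};

/* Link descriptors into a ring so the DMA engine cycles through all
 * buffers without CPU involvement; software only has to drain each
 * buffer before the engine wraps around to it again. */
static void build_ring(struct dma_desc *desc, uint8_t (*bufs)[BUFLEN], size_t n)
{
    for (size_t i = 0; i < n; i++) {
        desc[i].dst = bufs[i];
        desc[i].len = BUFLEN;
        desc[i].next = &desc[(i + 1) % n];
    }
}

/* Seconds of data the whole ring can hold at a given bit rate: roughly
 * how long the CPU may stall before unread data is overwritten. */
static double ring_seconds(size_t n, size_t buflen, double bits_per_sec)
{
    return (double)n * (double)buflen * 8.0 / bits_per_sec;
}
```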

I used this to stream data from an external device, but I don't know of
any reason why it could not be used to stream data to a device, or to
both transmit and receive.  I have no idea whether your hardware has any
equivalent of DMA chaining, nor whether you have the experience/budget
for writing device drivers.  I don't think you can achieve what you want
without a dedicated effort, because the SPI model is not intended for
this type of application.

> Keeping a transfer active constantly would need some buffering on writes
> and reads to keep the TX FIFO always filled and RX FIFO not
> full. Otherwise I don't see any direct problems with it, at least on the
> hardware level.
> 
> So, my questions are:
> 
>  1) Is there a fundamental problem in keeping an SPI transfer active for
>     extended periods of time (several hours)?

It depends on the clock rate, the features of the hardware you intend to
use, and your willingness to write your own driver.

>  2) Is this possible with the kernel API somehow?

The SPI core assumes a series of messages containing transfers.  It is
intended to support multiple chips attached to one (or more) bus(es).
Each chip has a protocol driver with corresponding user-space interface
provided by the Clib, and each protocol driver passes messages in mixed
fashion to the controller driver, which actually manipulates the bus and
chip selects.

What you want can be done through the SPI core only if there is a single
device on the bus (or if data to/from multiple devices is completely
predictable, e.g. sequential).  I wrote my driver as a modification of
pxa2xx_spi, with all existing functionality intact and continuing to use
all the message mechanics of the SPI core.  That was probably a mistake,
as it would have taken me far less time to write a dedicated, single
purpose driver.

>  3) Is this possible from userland somehow?

Ultimately everything has to pass to userland to be useful.  Do you
mean: can this be done without writing a device driver?  I doubt it.

>  4) If it is not possible, what would be a good API for such a case?

Not sure.  Somehow you need to be able to execute a read() from user
space and get data.  I chose to read() a block of data equal to the
length of each DMA transfer (4096 bytes), so that each buffer could be
freed as it is read.  It should also work to model the device as
continuously streaming and then to read a number of bytes, either
blocking (returns only when byte count is satisfied) or non-blocking
(returns available bytes).
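The blocking variant amounts to looping over read() until the requested count has arrived, because read() on a pipe or character device may return fewer bytes than asked for. A minimal POSIX sketch follows; `read_full` is a hypothetical helper name, not an existing API. The non-blocking variant is a single read() on a descriptor opened with O_NONBLOCK, which returns whatever is available or fails with EAGAIN.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: read exactly `count` bytes unless EOF or a real
 * error occurs. Returns the number of bytes read (less than count only
 * at EOF), or -1 on error with errno set. */
static ssize_t read_full(int fd, void *buf, size_t count)
{
    size_t done = 0;
    while (done < count) {
        ssize_t n = read(fd, (char *)buf + done, count - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;        /* interrupted by a signal: retry */
            return -1;
        }
        if (n == 0)
            break;               /* end of file / device closed */
        done += (size_t)n;
    }
    return (ssize_t)done;
}
```

A reader following the 4096-byte scheme above would call `read_full(fd, buf, 4096)` once per DMA buffer.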

I am thinking of a device that streams data to the processor.  Note that
the chained DMA scheme involves considerable delay between when the
device actually sends data and when those data are ultimately read in
user space (10s or 100s of buffers later).  If you are sending AND
receiving data, the sent data must be queued with write() well in
advance, and the corresponding read data will not be returned until
later.  Some mechanism would be required to identify which received data
corresponded to which transmitted data.  In the normal SPI core model,
you would write the data for a message, then read the data from the
message, before starting the next message.
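One possible mechanism, sketched in C under the assumption of a single continuous full-duplex stream with no dropped bytes: on SPI the k-th byte received is shifted in while the k-th byte is shifted out, so counting bytes from the start of the stream in both directions pairs every received byte with the transmitted byte that was on the wire at the same moment. Any device-specific lag (for example, a reply that arrives one byte after its command) would be applied on top of that offset. `struct tx_history`, `note_tx`, `pair_rx` and the window size `HIST` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define HIST 1024u   /* hypothetical: must exceed the TX-to-RX pipeline depth */

/* Remembers recently transmitted bytes, indexed by stream offset. */
struct tx_history {
    uint8_t byte[HIST];
    uint64_t tx_off;   /* bytes queued for transmission so far */
    uint64_t rx_off;   /* bytes received so far */
};

/* Record a byte as it is queued for transmission. */
static void note_tx(struct tx_history *h, uint8_t b)
{
    h->byte[h->tx_off % HIST] = b;
    h->tx_off++;
}

/* Call once per received byte: reports the transmitted byte that was
 * clocked out at the same offset. Fails if that byte was never sent or
 * has already been overwritten (pipeline deeper than HIST). */
static bool pair_rx(struct tx_history *h, uint8_t *sent_with, uint64_t *offset)
{
    uint64_t k = h->rx_off;
    if (k >= h->tx_off || h->tx_off - k > HIST)
        return false;
    *sent_with = h->byte[k % HIST];
    *offset = k;
    h->rx_off++;
    return true;
}
```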

>  5) Would patches for such functionality be accepted?

Hard to say.  This sort of question has come up many times in the past
in connection with proposed (but never implemented) slave mode for
Linux.  I would not expect patches to be well received if they depart
from the multi-chip message model.

-- 
Ned Forrester                                       nforrester-/d+BM93fTQY@public.gmane.org
Oceanographic Systems Lab                           508-289-2226 Office
Applied Ocean Physics and Engineering Dept.
Woods Hole Oceanographic Institution          Woods Hole, MA 02543, USA
http://www.whoi.edu/
http://www.whoi.edu/page.do?pid=29856
http://www.whoi.edu/hpb/Site.do?id=1532

* Re: Continuous streaming SPI transfer
@ 2012-09-29 20:20 Nuutti Kotivuori
From: Nuutti Kotivuori @ 2012-09-29 20:20 UTC
  To: Ned Forrester; +Cc: spi-devel-general-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

Ned Forrester <nforrester-/d+BM93fTQY@public.gmane.org> writes:
> On 09/27/2012 06:09 PM, Nuutti Kotivuori wrote:
>> I would like to use SPI in a streaming fashion, with a transfer being
>> active all the time. This seems very difficult with the current kernel
>> drivers. I am mainly looking at the Raspberry Pi BCM2708 SPI driver, but
>> I have also tried to find similar features from other drivers.
>
> You don't say whether you mean to transfer into or out of the external
> device (or both). Nor do you say what clock rate you want to use (what's
> the average bit/byte rate?).

I mean to transfer both in and out, but the clock rate I need is not very
high, 100 kHz should be fine, 250 kHz should be plenty.

> I am not familiar with the hardware you want to use, but I have done
> this on a PXA255 processor, at 11Mbit/sec from the external device to
> the PXA processor.  For this I had to extensively modify the pxa2xx_spi
> controller driver (that was with kernel 2.6.20, I think the name has
> changed in recent kernels), and, of course, to write my own protocol
> driver (the glue between the kernel side of Clib and the controller
> driver).  I did not have to make any changes to the SPI core.
>
> NOTE WELL: Please don't ask for this code, I cannot share it.

That sounds like a *lot* of effort, something beyond what I'm willing to
put in.

> To overcome this problem, I ran the PXA processor as a slave, with the
> clocks supplied from external hardware.  The problem then becomes
> keeping the receive FIFO from overflowing.  That was solved by enabling
> chained DMA transfers (at enormous driver-writing expense; I'd never
> written a device driver before).

[...]

> I don't think you can achieve what you want without a dedicated
> effort, because the SPI model is not intended for this type of
> application.

I agree, this is so for your data rate. However, for my 250 kHz, things
are much simpler.

Because of my own investigations and your answer, I decided to forgo the
kernel route altogether. Instead, I used the bcm2835 library from
userspace as the peripherals are simply mapped to a fixed address space.

With this, my main loop is simply

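  /* spi_cs and spi_fifo point at the SPI0 CS and FIFO registers;
   * stop_requested is the loop-exit flag (e.g. set from a signal handler). */
  static const struct timespec loop_wait = { 0, 300000 };   /* 300 us */

  while (!stop_requested) {
    uint32_t state = bcm2835_peri_read(spi_cs);
    if (state & BCM2835_SPI0_CS_RXF)               /* RX FIFO full: overrun risk */
      fprintf(stderr, "RX FIFO full!\n");
    if (state & BCM2835_SPI0_CS_DONE)              /* TX FIFO ran dry, SCLK stopped */
      fprintf(stderr, "TX FIFO empty!\n");
    if (state & BCM2835_SPI0_CS_TXD)               /* TX FIFO can accept data */
      bcm2835_peri_write_nb(spi_fifo, producebyte());
    if (state & BCM2835_SPI0_CS_RXD)               /* RX FIFO holds data */
      processbyte(bcm2835_peri_read_nb(spi_fifo));
    if (!(state & (BCM2835_SPI0_CS_TXD | BCM2835_SPI0_CS_RXD)))
      nanosleep(&loop_wait, NULL);                 /* nothing to do: back off */
  }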

To explain, I simply loop in a userspace (but realtime scheduled)
process, writing to the TX FIFO if the TX FIFO can accept more input and
reading from the RX FIFO if the RX FIFO has any bytes to read. If
neither is possible, I sleep for 300 microseconds.

This piece of code can easily keep both the RX and TX FIFOs well tended at
250 kHz - and I managed to get it stable at even 2 MHz while testing. The
solution does not use interrupts for anything (as it can't), but since
the data rate is fixed a simple timer is not much worse. The 300
microsecond sleep version uses about 7% of the CPU, and that is easily
acceptable for me.
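These figures can be checked with a little arithmetic. At a bit rate f, one 8-bit byte takes 8/f seconds, so while the loop sleeps for t seconds about t * f / 8 bytes are clocked in each direction: roughly 9.4 bytes for a 300 us sleep at 250 kHz and 75 bytes at 2 MHz. Whether that is safe depends on the FIFO depth, which is a parameter in the sketch below rather than a value taken from the BCM2835 documentation; since nanosleep() sleeps at least as long as requested and possibly longer, the chosen interval should stay well below the computed maximum.

```c
/* Bytes clocked in each direction while sleeping `sleep_us` microseconds
 * at `clock_hz` (SPI clock in Hz, one bit per cycle, 8 bits per byte). */
static double bytes_per_sleep(double clock_hz, double sleep_us)
{
    return clock_hz / 8.0 * sleep_us * 1e-6;
}

/* Longest sleep, in microseconds, before a full TX FIFO drains or an empty
 * RX FIFO fills. fifo_bytes is an assumed parameter, not a datasheet value. */
static double max_sleep_us(double clock_hz, unsigned fifo_bytes)
{
    return (double)fifo_bytes * 8.0 / clock_hz * 1e6;
}
```

For example, if the FIFO held 16 bytes (an illustrative figure only), `max_sleep_us(250e3, 16)` would give 512 us at 250 kHz but only 64 us at 2 MHz.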

I see no fundamental reason why a similar mode of operation could not be
supported for /dev/spidev or inside the kernel - much more efficiently -
and I would very much like to see something like this implemented.

However, for my use case I have total control of the runtime environment
and I can easily make sure no other software touches the SPI or the pins
I need - and I stand to gain nothing by implementing this inside the
kernel - so I will stick with the userland mmap() /dev/mem hack for the
time being.
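For readers unfamiliar with the "mmap() /dev/mem" approach, a minimal sketch follows. It assumes the original Raspberry Pi (BCM2835), whose peripherals sit at physical address 0x20000000, with SPI0 at offset 0x204000 and its CS and FIFO registers at byte offsets 0x00 and 0x04 (the same values as the bcm2835 library's header constants). It needs root, later boards use a different peripheral base, and `map_regs` and the macro names are hypothetical.

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed addresses for the original Raspberry Pi (BCM2835) only. */
#define PERI_PHYS_BASE  0x20000000u  /* physical peripheral base */
#define SPI0_OFFSET     0x204000u    /* SPI0 block within the peripherals */
#define SPI0_CS_REG     (0x00 / 4)   /* CS register, as a 32-bit word index */
#define SPI0_FIFO_REG   (0x04 / 4)   /* FIFO register, as a 32-bit word index */
#define MAP_LEN         4096u        /* one page covers the SPI0 registers */

/* Hypothetical helper: map MAP_LEN bytes of `path` starting at the
 * page-aligned offset `phys`. Returns NULL on failure. */
static volatile uint32_t *map_regs(const char *path, off_t phys)
{
    int fd = open(path, O_RDWR | O_SYNC);
    if (fd < 0)
        return NULL;
    void *p = mmap(NULL, MAP_LEN, PROT_READ | PROT_WRITE, MAP_SHARED, fd, phys);
    close(fd);   /* the mapping remains valid after the descriptor is closed */
    return p == MAP_FAILED ? NULL : (volatile uint32_t *)p;
}
```

On a Pi 1 running as root, `map_regs("/dev/mem", PERI_PHYS_BASE + SPI0_OFFSET)` returns a base pointer whose `+ SPI0_CS_REG` and `+ SPI0_FIFO_REG` words can play the roles of `spi_cs` and `spi_fifo` in the loop above.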

Thank you for the answer, it was very enlightening.

-- Naked

* Re: Continuous streaming SPI transfer
@ 2012-09-30  1:56 Ned Forrester
From: Ned Forrester @ 2012-09-30  1:56 UTC
  To: Nuutti Kotivuori; +Cc: spi-devel-general-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

On 09/29/2012 04:20 PM, Nuutti Kotivuori wrote:
> Ned Forrester <nforrester-/d+BM93fTQY@public.gmane.org> writes:
>> I don't think you can achieve what you want without a dedicated
>> effort, because the SPI model is not intended for this type of
>> application.
> 
> I agree, this is so for your data rate. However, for my 250 kHz, things
> are much simpler.
> 
> [...]

Most certainly simpler at lower clock rate.  I'm glad you found
something that works.

* Re: Continuous streaming SPI transfer
@ 2012-10-04 21:13 Mark Brown
From: Mark Brown @ 2012-10-04 21:13 UTC
  To: Nuutti Kotivuori; +Cc: spi-devel-general-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

On Fri, Sep 28, 2012 at 01:09:25AM +0300, Nuutti Kotivuori wrote:

> There seems to be no way to prevent the deactivation and reactivation of
> the clock and everything between separate transfers - and a single
> transfer is bounded in size and no progress is reported for it. Even
> within a single transfer, it would seem that an earlier transfer is

This sounds more like you've got an IIO application (or audio) than a
SPI one - the hardware is the same but you're thinking about it in a
completely different way and so a separate subsystem makes sense.

Thread overview: 5 messages
2012-09-27 22:09 Continuous streaming SPI transfer Nuutti Kotivuori
2012-09-28  1:30   Ned Forrester
2012-09-29 20:20       Nuutti Kotivuori
2012-09-30  1:56           Ned Forrester
2012-10-04 21:13   Mark Brown