public inbox for linux-kernel@vger.kernel.org
* PCI DMA burst delay
@ 2005-12-29 14:43 Burkhard Schölpen
  2005-12-29 15:23 ` Paul Fulghum
  0 siblings, 1 reply; 7+ messages in thread
From: Burkhard Schölpen @ 2005-12-29 14:43 UTC (permalink / raw)
  To: linux-kernel

Hello,

I'm working on a Linux driver for a custom PCI card (containing a Xilinx FPGA) which is bus-master capable and has to transfer large amounts of data at high bandwidth. I finally succeeded in mmapping the DMA buffer residing in RAM into user space to avoid unnecessary copying. It seems to work quite well, but sometimes, apparently at random, I run into trouble with DMA transfers from RAM to the device. When the problem occurs, data arrives too late at the input FIFO on the PCI card (16 kbit).

Looking at some signals with an oscilloscope shows the following behaviour:
1. After the DMA buffer in RAM has been prepared and the PCI card has been told to begin the transfer, the first DMA burst is transmitted normally.
2. After the first burst, the PCI bus grant signal is deasserted, so access to the bus appears to be denied.
3. About 400 nanoseconds later, the PCI device tries to initiate the next burst, but does not succeed (bus access is not granted).
  => this process repeats three times
4. In most cases the next burst starts after the third attempt (and all subsequent bursts follow normally). But in the occasional faulty case, the second burst starts only after a further delay of about 600 ns, which is too late: by then the FIFO in the FPGA has underrun. After a few delayed bursts the transfer continues normally.

Does anybody have an idea why the DMA bursts could be delayed, even though I deactivated all other PCI devices that could disturb the transfers? Maybe it is a simple issue, because I'm not yet very experienced with DMA. Could it be a problem with my driver implementation, given that when the problem occurs it is always after the first burst? I allocated the DMA buffer in RAM with pci_alloc_consistent(), as described in Rubini's book and the DMA-mapping.txt documentation file.
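In outline, the allocation and mmap path in my driver looks like this (a simplified sketch with placeholder names and no error handling; on 2.6.13 there is no dma_mmap_coherent() yet, and using virt_to_phys() on the coherent buffer works on x86 but is not portable):

```c
#include <linux/pci.h>
#include <linux/mm.h>
#include <asm/io.h>

#define RING_SIZE (4 * 1024 * 1024)

static void *ring_virt;          /* kernel virtual address of the buffer */
static dma_addr_t ring_bus;      /* bus address programmed into the card */

/* Allocate the coherent DMA ring buffer at probe time. */
static int ring_alloc(struct pci_dev *pdev)
{
        ring_virt = pci_alloc_consistent(pdev, RING_SIZE, &ring_bus);
        return ring_virt ? 0 : -ENOMEM;
}

/* mmap() file operation: map the whole buffer into user space. */
static int ring_mmap(struct file *filp, struct vm_area_struct *vma)
{
        unsigned long pfn = virt_to_phys(ring_virt) >> PAGE_SHIFT;

        return remap_pfn_range(vma, vma->vm_start, pfn,
                               vma->vm_end - vma->vm_start,
                               vma->vm_page_prot);
}
```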

Here is some information about my environment:
- Gigabyte GA-8I945GMF mainboard with Pentium D processor
- custom pci board with Xilinx FPGA Spartan 2 (XC2S150-6) with PCI 32 LogiCore
- Debian Linux with 2.6.13.4 SMP kernel

Another thing I should mention: I tried to configure the length of the DMA bursts in the PCI core, but that didn't work. The oscilloscope showed that the actual burst length was never more than 512 bits, which I think is far too short to be efficient.

Any hint would be much appreciated.

Kind regards,
Burkhard Schölpen




* Re: PCI DMA burst delay
  2005-12-29 14:43 PCI DMA burst delay Burkhard Schölpen
@ 2005-12-29 15:23 ` Paul Fulghum
  0 siblings, 0 replies; 7+ messages in thread
From: Paul Fulghum @ 2005-12-29 15:23 UTC (permalink / raw)
  To: Burkhard Schölpen; +Cc: linux-kernel

Burkhard Schölpen wrote:
> ... in the (rarely) faulty case, the 2nd burst only starts
> after another delay of about 600ns, which is too late

Looking at the PCI 2.3 specification,
arbitration latency on the order of a microsecond
or two does not seem excessive for a 33MHz bus.

> ... I deactivated all other pci devices that could disturb the transfers?

Are you accessing registers on your device
during the DMA transfers? If so, the CPU is
acting as a PCI master that could delay granting
the bus to your device.

-- 
Paul Fulghum
Microgate Systems, Ltd.


* Re: PCI DMA burst delay
@ 2005-12-29 16:03 Burkhard Schölpen
  2005-12-29 16:55 ` Paul Fulghum
  0 siblings, 1 reply; 7+ messages in thread
From: Burkhard Schölpen @ 2005-12-29 16:03 UTC (permalink / raw)
  To: Paul Fulghum; +Cc: linux-kernel

>Paul Fulghum <paulkf@microgate.com> schrieb am 29.12.05 16:30:20:
>
>Burkhard Schölpen wrote:
>> ... in the (rarely) faulty case, the 2nd burst only starts
>> after another delay of about 600ns, which is too late
>
>Looking at the PCI 2.3 specification,
>arbitration latency on the order of a microsecond
>or two does not seem excessive for a 33MHz bus.

Okay, then I have to figure out why I cannot get bursts longer than 512 bits. Does anybody have a clue how I can handle that?

>> ... I deactivated all other pci devices that could disturb the transfers?
>
>Are you accessing registers on your device
>during the DMA transfers? If so, the CPU is
>acting as a PCI master that could delay granting
>the bus to your device.

No, I made sure of that: there are no register accesses during the DMA transfer. The driver puts the application to sleep until an interrupt signals completion.

Kind regards,
Burkhard




* Re: PCI DMA burst delay
  2005-12-29 16:03 Burkhard Schölpen
@ 2005-12-29 16:55 ` Paul Fulghum
  0 siblings, 0 replies; 7+ messages in thread
From: Paul Fulghum @ 2005-12-29 16:55 UTC (permalink / raw)
  To: Burkhard Schölpen; +Cc: linux-kernel

Burkhard Schölpen wrote:
> why I cannot get longer bursts than 512 Bits...

What value is written by the system into the
PCI configuration space of your device for
the latency timer?
(8 bits at offset 0x0d, units = clock cycles)

You can try setting it to a higher
value in your driver.
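In the driver that could look roughly like this (a sketch; 0x40 is just a common starting value, not a recommendation for your specific chipset):

```c
#include <linux/pci.h>

/* Sketch: inspect and, if low, raise the PCI latency timer
 * (config space offset 0x0d, PCI_LATENCY_TIMER). The value is
 * in PCI clock cycles; 0x40 = 64 clocks is a common default. */
static void bump_latency_timer(struct pci_dev *pdev)
{
        u8 lat;

        pci_read_config_byte(pdev, PCI_LATENCY_TIMER, &lat);
        if (lat < 0x40)
                pci_write_config_byte(pdev, PCI_LATENCY_TIMER, 0x40);
}
```

From user space you can also experiment first with setpci, e.g. `setpci -s 02:00.0 latency_timer=40` (the bus address here is just an example).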

-- 
Paul Fulghum
Microgate Systems, Ltd.


* Re: PCI DMA burst delay
       [not found] <5p7gt-3lu-5@gated-at.bofh.it>
@ 2005-12-29 18:40 ` Robert Hancock
  0 siblings, 0 replies; 7+ messages in thread
From: Robert Hancock @ 2005-12-29 18:40 UTC (permalink / raw)
  To: linux-kernel; +Cc: bschoelpen

Burkhard Schölpen wrote:
> Hello,
> 
> I'm working on a Linux driver for a custom PCI card (containing a Xilinx FPGA) which is bus-master capable and has to transfer large amounts of data at high bandwidth. I finally succeeded in mmapping the DMA buffer residing in RAM into user space to avoid unnecessary copying. It seems to work quite well, but sometimes, apparently at random, I run into trouble with DMA transfers from RAM to the device. When the problem occurs, data arrives too late at the input FIFO on the PCI card (16 kbit).
> 
> Looking at some signals with an oscilloscope shows the following behaviour:
> 1. After the DMA buffer in RAM has been prepared and the PCI card has been told to begin the transfer, the first DMA burst is transmitted normally.
> 2. After the first burst, the PCI bus grant signal is deasserted, so access to the bus appears to be denied.
> 3. About 400 nanoseconds later, the PCI device tries to initiate the next burst, but does not succeed (bus access is not granted).
>   => this process repeats three times
> 4. In most cases the next burst starts after the third attempt (and all subsequent bursts follow normally). But in the occasional faulty case, the second burst starts only after a further delay of about 600 ns, which is too late: by then the FIFO in the FPGA has underrun. After a few delayed bursts the transfer continues normally.
> 
> Does anybody have an idea why the DMA bursts could be delayed, even though I deactivated all other PCI devices that could disturb the transfers? Maybe it is a simple issue, because I'm not yet very experienced with DMA. Could it be a problem with my driver implementation, given that when the problem occurs it is always after the first burst? I allocated the DMA buffer in RAM with pci_alloc_consistent(), as described in Rubini's book and the DMA-mapping.txt documentation file.

What kind of PCI transaction is the core using to do the reads? I think 
that Memory Read can cause bursts to be interrupted quickly on some 
chipsets. If you can use Memory Read Line or Memory Read Multiple this 
may increase performance.

You may also need more buffering in the FPGA, otherwise you may be 
vulnerable to underruns if there is contention on the PCI bus. The 
device should be able to handle normal arbitration delays.

-- 
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@nospamshaw.ca
Home Page: http://www.roberthancock.com/



* Re: PCI DMA burst delay
@ 2005-12-30 13:23 Burkhard Schölpen
  2005-12-30 18:45 ` Robert Hancock
  0 siblings, 1 reply; 7+ messages in thread
From: Burkhard Schölpen @ 2005-12-30 13:23 UTC (permalink / raw)
  To: Robert Hancock, linux-kernel

>What kind of PCI transaction is the core using to do the reads? I think 
>that Memory Read can cause bursts to be interrupted quickly on some 
>chipsets. If you can use Memory Read Line or Memory Read Multiple this 
>may increase performance.
>
>You may also need more buffering in the FPGA, otherwise you may be 
>vulnerable to underruns if there is contention on the PCI bus. The 
>device should be able to handle normal arbitration delays.

Yeah, that was it! I asked the FPGA programmer and he told me he was using the plain Memory Read transaction. After changing it to Memory Read Multiple we get a much longer burst length, and now the buffer underruns really seem to have disappeared. That is great! He also told me that the FIFO on the FPGA cannot be made larger, because its size is limited in the core (it uses some special block RAM, I think), so we are lucky that the longer bursts seem to fix our problem.

By the way, another question comes to mind. The PCI card is designed for a large copying machine (i.e. it is something like a framegrabber that simultaneously has to write data out to a printer), which requires really high bandwidth. For now I allocate the DMA ring buffer in RAM using pci_alloc_consistent(), which unfortunately limits the size to about 4 MB. It would be convenient to allocate a larger DMA buffer, because then we could run some image-processing algorithms directly inside this buffer by mmapping it to user space. Is there a simple way to achieve this without being forced to use scatter/gather DMA (our hardware cannot do that, at least not yet)?

Thank you very much for your help!

Kind regards,
Burkhard Schölpen




* Re: PCI DMA burst delay
  2005-12-30 13:23 Burkhard Schölpen
@ 2005-12-30 18:45 ` Robert Hancock
  0 siblings, 0 replies; 7+ messages in thread
From: Robert Hancock @ 2005-12-30 18:45 UTC (permalink / raw)
  To: Burkhard Schölpen; +Cc: linux-kernel

Burkhard Schölpen wrote:
> By the way, another question comes to mind. The PCI card is designed for a large copying machine (i.e. it is something like a framegrabber that simultaneously has to write data out to a printer), which requires really high bandwidth. For now I allocate the DMA ring buffer in RAM using pci_alloc_consistent(), which unfortunately limits the size to about 4 MB. It would be convenient to allocate a larger DMA buffer, because then we could run some image-processing algorithms directly inside this buffer by mmapping it to user space. Is there a simple way to achieve this without being forced to use scatter/gather DMA (our hardware cannot do that, at least not yet)?

Unfortunately if you need a memory buffer that is physically contiguous 
to do DMA on, your choices are basically either pci_alloc_consistent, or 
possibly boot-time allocation of memory by telling the kernel to use 
less memory than is in the machine. Trying to allocate a big chunk of 
contiguous memory after the system has come up will not be very reliable 
since memory tends to become fragmented.

When dealing with this amount of data it really would be best to use 
some form of scatter-gather DMA. Even if the hardware is not capable of 
taking multiple addresses and doing the DMA on its own, you could sort 
of fake it and tell it to do multiple transfers, one for each block of 
memory - that might have some overhead though.
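As a sketch (start_dma() and wait_dma_done() are hypothetical placeholders for whatever register interface your FPGA core provides, and the block count and size are arbitrary):

```c
#include <linux/pci.h>

/* "Faked" scatter-gather: instead of one huge physically contiguous
 * buffer, allocate several smaller contiguous blocks and program the
 * device with one DMA transfer per block. start_dma()/wait_dma_done()
 * stand in for the card's real register interface. */

#define NBLOCKS    8
#define BLOCK_SIZE (1024 * 1024)   /* 8 x 1 MB instead of one 8 MB chunk */

static void *blk_virt[NBLOCKS];    /* kernel virtual addresses */
static dma_addr_t blk_bus[NBLOCKS];/* bus addresses for the device */

static int transfer_all(struct pci_dev *pdev)
{
        int i;

        for (i = 0; i < NBLOCKS; i++) {
                start_dma(pdev, blk_bus[i], BLOCK_SIZE); /* hypothetical */
                wait_dma_done(pdev);                     /* hypothetical */
        }
        return 0;
}
```

The per-block restart is where the overhead mentioned above comes from: each block pays the arbitration and setup cost again.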

