* Re: DMA with PCIe and very large DMA transfers
  [not found] <fa.vod1UTmCwWWvRyGIk08cgMVx/H4@ifi.uio.no>
@ 2008-07-24 20:03 ` Robert Hancock
From: Robert Hancock @ 2008-07-24 20:03 UTC
To: Alex; +Cc: linux-kernel

Alex wrote:
> Are there any examples (or just documentation) on providing DMA for
> PCIe devices? I have read the DMA-mapping.txt document but wasn't sure
> if this was all relevant to PCIe. For example, pci_set_dma_mask talks
> about driving pins on the PCI bus, but PCIe doesn't work in quite the
> same way. Perhaps these calls have no effect in this case (similar to
> the PCI latency timers) but I just wondered.

Not sure where you saw that reference, but there's no difference with
respect to DMA mapping with PCI vs. PCI Express.

>
> I'm also interested in knowing if any drivers perform very large DMA
> transfers. I'm putting together a driver for a specialist high-speed
> data acquisition device that typically might need a DMA buffer of
> 100-500MB (ouch!) in the low 32 bit address space (or possibly 36 bit
> address space, but I'm not sure if this is possible to allocate
> without allocating as much as possible and then discarding?) but only
> supports a very limited number of scatter/gather entries (between 1
> and 4). The particular use-case for this is a ring buffer with
> registers in IO memory that are used to keep track of read/write
> pointers in the buffer. The device writes to the DMA memory when there
> is space in the ring buffer i.e. the DMA transfer is only from device
> to host.
>
> I would like to perform the DMA straight from device to user-space
> (probably via mmap), which I think requires a consistent/coherent
> rather than streaming DMA so that I may read from the ring buffer
> while the DMA may still be active (although not active in that section
> of the buffer).
>
> I assume that to allocate that much memory in physical contiguous
> addresses will require a driver to be loaded as soon as possible at
> startup. I was thinking about trying to grab a lot of high-order pages
> and try and make them one contiguous block - is that feasible?

For a block of memory that big, you may need to reserve some memory at
boot time for use by the device. I don't really have any details on how
to do that, though.

> Browsing the archives, I found references to early allocation for
> large buffers, but no direct links to existing examples or recommended
> techniques on how to stitch pages together in to a single buffer. Is
> there a platform independent way to ensure cache coherency with
> allocated pages like this (i.e. not allocated with
> pci_alloc_consistent / dma_alloc_coherent)?
>
> I suppose that anything which takes a large chunk of physical memory
> at startup isn't very recommended, but this is for a specialist device
> and the host machine will probably be dedicated to using it.
>
> As an aside, my module, driver and device are under the pci bus in
> sysfs - should the PCIe device be showing under the pci_express bus?
> This appears to be the PCIe Port Bus Driver and only has the aer
> driver listed under it. I can't find any other drivers in the kernel
> source that use it (I'm currently running 2.6.21).

Most parts of the kernel don't care whether devices are PCI or PCI-E, so
this is presumably why.
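
[Editorial sketch, not part of the original message.] The DMA mask
discussed above describes which bus addresses a device can generate, and
that is the same question on PCI and on PCIe. The userspace C sketch
below only illustrates the arithmetic of checking a buffer against a 32
or 36 bit mask. The helper names dma_bit_mask() and dma_range_ok() are
hypothetical and are not kernel functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: build an n-bit address mask, e.g. 32 -> 0xffffffff. */
uint64_t dma_bit_mask(unsigned bits)
{
    return bits >= 64 ? UINT64_MAX : ((UINT64_C(1) << bits) - 1);
}

/*
 * Hypothetical helper: true if every byte of [bus_addr, bus_addr + len)
 * is addressable by a device limited to the given mask.
 */
bool dma_range_ok(uint64_t bus_addr, uint64_t len, uint64_t mask)
{
    uint64_t last;

    if (len == 0)
        return false;
    last = bus_addr + (len - 1);
    if (last < bus_addr)            /* address arithmetic wrapped around */
        return false;
    /* mask is 2^n - 1, so checking the last byte covers the whole range */
    return (last & ~mask) == 0;
}
```

For example, a 512 MiB buffer starting at 0xF0000000 ends above 4 GiB, so
it fails the 32 bit check but passes the 36 bit one.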
* DMA with PCIe and very large DMA transfers
@ 2008-07-24 15:06 Alex
  2008-07-24 20:02 ` Jesse Barnes
From: Alex @ 2008-07-24 15:06 UTC
To: linux-kernel

Are there any examples (or just documentation) on providing DMA for
PCIe devices? I have read the DMA-mapping.txt document but wasn't sure
if this was all relevant to PCIe. For example, pci_set_dma_mask talks
about driving pins on the PCI bus, but PCIe doesn't work in quite the
same way. Perhaps these calls have no effect in this case (similar to
the PCI latency timers) but I just wondered.

I'm also interested in knowing if any drivers perform very large DMA
transfers. I'm putting together a driver for a specialist high-speed
data acquisition device that typically might need a DMA buffer of
100-500MB (ouch!) in the low 32 bit address space (or possibly 36 bit
address space, but I'm not sure if this is possible to allocate
without allocating as much as possible and then discarding?) but only
supports a very limited number of scatter/gather entries (between 1
and 4). The particular use-case for this is a ring buffer with
registers in IO memory that are used to keep track of read/write
pointers in the buffer. The device writes to the DMA memory when there
is space in the ring buffer i.e. the DMA transfer is only from device
to host.

I would like to perform the DMA straight from device to user-space
(probably via mmap), which I think requires a consistent/coherent
rather than streaming DMA so that I may read from the ring buffer
while the DMA may still be active (although not active in that section
of the buffer).

I assume that to allocate that much memory in physical contiguous
addresses will require a driver to be loaded as soon as possible at
startup. I was thinking about trying to grab a lot of high-order pages
and try and make them one contiguous block - is that feasible?

Browsing the archives, I found references to early allocation for
large buffers, but no direct links to existing examples or recommended
techniques on how to stitch pages together in to a single buffer. Is
there a platform independent way to ensure cache coherency with
allocated pages like this (i.e. not allocated with
pci_alloc_consistent / dma_alloc_coherent)?

I suppose that anything which takes a large chunk of physical memory
at startup isn't very recommended, but this is for a specialist device
and the host machine will probably be dedicated to using it.

As an aside, my module, driver and device are under the pci bus in
sysfs - should the PCIe device be showing under the pci_express bus?
This appears to be the PCIe Port Bus Driver and only has the aer
driver listed under it. I can't find any other drivers in the kernel
source that use it (I'm currently running 2.6.21).

Thanks,
Alex
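
[Editorial sketch, not part of the original message.] The ring buffer
described above, with the device advancing a write pointer and the host
advancing a read pointer, can be modelled in plain C as follows. This is
a minimal sketch under stated assumptions: the struct layout and all
function names are hypothetical, offsets are byte offsets into the
buffer, and one byte is kept unused so that "full" and "empty" can be
told apart. The real device's register semantics may differ.

```c
#include <stddef.h>

/* Hypothetical model of the ring state kept in the device registers. */
struct ring {
    size_t size;  /* total buffer size in bytes */
    size_t rd;    /* host read offset,    0 <= rd < size */
    size_t wr;    /* device write offset, 0 <= wr < size */
};

/* Bytes written by the device and not yet consumed by the host. */
size_t ring_used(const struct ring *r)
{
    return (r->wr + r->size - r->rd) % r->size;
}

/* Bytes the device may still write (one byte is always left unused). */
size_t ring_free(const struct ring *r)
{
    return r->size - 1 - ring_used(r);
}

/* Bytes the host can read starting at rd without wrapping to offset 0. */
size_t ring_contig_readable(const struct ring *r)
{
    size_t used = ring_used(r);
    size_t to_end = r->size - r->rd;

    return used < to_end ? used : to_end;
}

/* Host has finished with n bytes; advance the read offset with wrap. */
void ring_consume(struct ring *r, size_t n)
{
    r->rd = (r->rd + n) % r->size;
}
```

The host only ever touches the region between rd and wr, which is the
section where DMA is not currently active. This is the property the
message above relies on when asking for a coherent mapping.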
* Re: DMA with PCIe and very large DMA transfers
  2008-07-24 15:06 Alex
@ 2008-07-24 20:02 ` Jesse Barnes
  2008-07-25  7:33   ` Clemens Ladisch
From: Jesse Barnes @ 2008-07-24 20:02 UTC
To: Alex; +Cc: linux-kernel

On Thursday, July 24, 2008 8:06 am Alex wrote:
> Are there any examples (or just documentation) on providing DMA for
> PCIe devices? I have read the DMA-mapping.txt document but wasn't sure
> if this was all relevant to PCIe. For example, pci_set_dma_mask talks
> about driving pins on the PCI bus, but PCIe doesn't work in quite the
> same way. Perhaps these calls have no effect in this case (similar to
> the PCI latency timers) but I just wondered.

Yes, the API applies to PCIe as well. It's really more of an address
translation layer than anything else, giving you bus addresses to use
with your device. The API described in DMA-API.txt is even more generic,
but either one should work for your purposes.

> I'm also interested in knowing if any drivers perform very large DMA
> transfers. I'm putting together a driver for a specialist high-speed
> data acquisition device that typically might need a DMA buffer of
> 100-500MB (ouch!) in the low 32 bit address space (or possibly 36 bit
> address space, but I'm not sure if this is possible to allocate
> without allocating as much as possible and then discarding?) but only
> supports a very limited number of scatter/gather entries (between 1
> and 4). The particular use-case for this is a ring buffer with
> registers in IO memory that are used to keep track of read/write
> pointers in the buffer. The device writes to the DMA memory when there
> is space in the ring buffer i.e. the DMA transfer is only from device
> to host.

It sounds like you'll probably have fairly special purpose
configurations. In that case, it may be reasonable to reserve your large
DMA buffers at boot time, assuming you need large, contiguous chunks of
physical memory.

Depending on your usage model, it may even be reasonable to limit the
kernel's view of memory by modifying the e820 map or similar, and
writing some custom code to manage your DMA buffers.

> I would like to perform the DMA straight from device to user-space
> (probably via mmap), which I think requires a consistent/coherent
> rather than streaming DMA so that I may read from the ring buffer
> while the DMA may still be active (although not active in that section
> of the buffer).

Well, this would likely rule out the second approach I mentioned above,
unless you wanted to do more major surgery on the kernel. For
high-bandwidth userspace communication you might check out relayfs
(Documentation/filesystems/relay.txt); if it's suitable for your needs,
it could make things easier on you.

> I assume that to allocate that much memory in physical contiguous
> addresses will require a driver to be loaded as soon as possible at
> startup. I was thinking about trying to grab a lot of high-order pages
> and try and make them one contiguous block - is that feasible?
> Browsing the archives, I found references to early allocation for
> large buffers, but no direct links to existing examples or recommended
> techniques on how to stitch pages together in to a single buffer. Is
> there a platform independent way to ensure cache coherency with
> allocated pages like this (i.e. not allocated with
> pci_alloc_consistent / dma_alloc_coherent)?

Not that I can think of offhand, though on many platforms DMA mapped
with map_single or map_sg will be coherent by default, which may be good
enough for you.

> As an aside, my module, driver and device are under the pci bus in
> sysfs - should the PCIe device be showing under the pci_express bus?
> This appears to be the PCIe Port Bus Driver and only has the aer
> driver listed under it. I can't find any other drivers in the kernel
> source that use it (I'm currently running 2.6.21).

Yeah, for the most part stuff will appear under PCI in sysfs; only PCIe
specific features like AER will show up under the PCI express bus
driver.

Note that if you want to upstream this driver we'll probably want to add
some platform independent code to support your needs (huge, coherent DMA
buffers) in a reasonably generic way...

Jesse
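
[Editorial sketch, not part of the original message.] The "custom code
to manage your DMA buffers" mentioned above could, in its simplest form,
be a bump allocator over a physical range that the kernel has been told
not to use. The userspace C sketch below shows only that bookkeeping. The
struct, the function name and the convention of returning 0 on failure
are all assumptions made for illustration; it also assumes the reserved
region does not start at physical address 0 and that the addresses
involved are far from the 64 bit limit. Mapping such memory into the
kernel or into user space is a separate problem that this does not
address.

```c
#include <stdint.h>

/* Hypothetical descriptor for a physical range reserved at boot time. */
struct carveout {
    uint64_t base;  /* first physical address of the reserved region */
    uint64_t size;  /* length of the region in bytes */
    uint64_t next;  /* offset of the first byte not yet handed out */
};

/*
 * Hand out len bytes aligned to align (a power of two).
 * Returns the physical start address, or 0 if the request is invalid
 * or does not fit in what is left of the region.
 */
uint64_t carveout_alloc(struct carveout *c, uint64_t len, uint64_t align)
{
    uint64_t start, off;

    if (len == 0 || align == 0 || (align & (align - 1)) != 0)
        return 0;

    /* Round the next free address up to the requested alignment. */
    start = (c->base + c->next + align - 1) & ~(align - 1);
    off = start - c->base;

    if (off > c->size || len > c->size - off)
        return 0;

    c->next = off + len;
    return start;
}
```

With a 1 GiB region, two 500 MiB buffers fit and a further 100 MiB
request is refused. There is deliberately no free operation, on the
assumption that a dedicated acquisition machine allocates its buffers
once at driver load.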
* Re: DMA with PCIe and very large DMA transfers
  2008-07-24 20:02 ` Jesse Barnes
@ 2008-07-25  7:33   ` Clemens Ladisch
From: Clemens Ladisch @ 2008-07-25 7:33 UTC
To: Alex; +Cc: Jesse Barnes, linux-kernel

Jesse Barnes wrote:
> On Thursday, July 24, 2008 8:06 am Alex wrote:
> > I'm also interested in knowing if any drivers perform very large DMA
> > transfers. I'm putting together a driver for a specialist high-speed
> > data acquisition device that typically might need a DMA buffer of
> > 100-500MB (ouch!) in the low 32 bit address space (or possibly 36 bit
> > address space, but I'm not sure if this is possible to allocate
> > without allocating as much as possible and then discarding?) but only
> > supports a very limited number of scatter/gather entries (between 1
> > and 4). [...]
>
> It sounds like you'll probably have fairly special purpose configurations. In
> that case, it may be reasonable to reserve your large DMA buffers at boot
> time, assuming you need large, contiguous chunks of physical memory.

Most of the sound drivers do this because few chips support SG.

> > I assume that to allocate that much memory in physical contiguous
> > addresses will require a driver to be loaded as soon as possible at
> > startup. I was thinking about trying to grab a lot of high-order pages
> > and try and make them one contiguous block - is that feasible?
> > Browsing the archives, I found references to early allocation for
> > large buffers, but no direct links to existing examples or recommended
> > techniques on how to stitch pages together in to a single buffer.

Have a look into sound/core/memalloc.c. It tries to get a contiguous
block from the kernel; I don't think that it's possible to do this
manually if the kernel has failed.

Regards,
Clemens
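
[Editorial sketch, not part of the original message.] To put numbers on
the high-order page question, the userspace C sketch below computes the
allocation order that a given buffer size would need. The function name
order_for_size() is hypothetical, and the figures in the note afterwards
assume 4 KiB pages and a page allocator whose largest block is order 10
(4 MiB), which was a common configuration at the time but is not
universal.

```c
#include <stddef.h>

/*
 * Hypothetical helper: smallest order such that
 * (page_size << order) >= size, i.e. the power-of-two number of pages
 * a single contiguous allocation of that size would need.
 */
unsigned order_for_size(size_t size, size_t page_size)
{
    size_t pages = (size + page_size - 1) / page_size;  /* round up */
    unsigned order = 0;

    while (((size_t)1 << order) < pages)
        order++;
    return order;
}
```

Under those assumptions a 100 MB buffer needs order 15 and a 500 MB
buffer needs order 17, both well beyond an order 10 limit. That is
consistent with the advice in this thread to reserve the memory at boot
time instead of asking the page allocator for it later.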