* Re: PCI Bursting with PIO
From: Robert Hancock @ 2008-02-16 6:00 UTC
To: Dan Gora; +Cc: linux-kernel

Dan Gora wrote:
> Hi,
>
> I am trying to optimize a driver for a slave only PCI device and am
> having a lot of trouble getting any kind of PCI burst transactions in
> either the read or the write direction. Using bcopy/memcpy or even a
> hand-crafted while (len) { *pdst++ = *psrc++} (with pdst and psrc
> unsigned long*) I can only get writes to burst and even in that case
> only for 2 data phases (8 bytes) and only on 64 bit machines. The
> best that I have managed is to use a hand crafted asm function which
> copies the data through mmx registers on i386 machines, but that still
> only bursts a maximum of 16 bytes in the write direction and not at
> all in the read direction. The source and destination pointers are
> both aligned to 8 byte boundaries, so I don't think that it's an
> alignment issue.

The chipset is being limited by what the CPU is giving it. If the CPU
sends only a small amount of data in one access then the chipset usually
does not try to burst more than that.

>
> Is there any way to get PIO to burst over the PCI bus in the read and
> write direction? My device has 4 BAR registers, but the area where I
> am transferring data is marked 'prefetchable' (although the others are
> not). I read here: http://lkml.org/lkml/2004/9/23/393 that this was a
> prerequisite, but it is apparently not sufficient. He also mentioned
> that the area had to be marked as write-back, but it's not clear how
> you can tell (no /proc/mtrr doesn't tell you) or that it has anything
> to do with bursting reads.
>
> Any ideas would be really appreciated,

Well, in order for the CPU to batch up more writes you'd have to map the
BAR as either write-combining or write-back. If it's not listed in
/proc/mtrr it will be the default setting of uncacheable. X has code to
set up the video memory on the video card as write-combining so it can
get better write performance, you could do something similar.

Setting it as write-back might allow you to get the reads to do bursting
as well (since the CPU will do a cache-line fill instead of individual
accesses) but if the device is modifying this memory area, unless
you add code to invalidate those cache lines before reading the data
you'll get stale data back. You could run into some other less obvious
issues as well, as normally device memory regions are not mapped
write-back.

In general, especially if you need to read data back from the device,
implementing a DMA engine would be by far the better option. Most
chipsets seem not at all optimized for handling sequential reads from
PCI memory from the CPU. (Even in the DMA case, you have to be careful
with what type of memory read transaction you use when transferring from
host memory - some chipsets don't like to burst more than one cycle if
you use normal Memory Read instead of Memory Read Line or Memory Read
Multiple.)
* Re: PCI Bursting with PIO
From: Dan Gora @ 2008-02-17 4:53 UTC
To: linux-kernel

On Feb 15, 2008 10:00 PM, Robert Hancock <rwh461@mail.usask.ca> wrote:
>
> Well, in order for the CPU to batch up more writes you'd have to map the
> BAR as either write-combining or write-back. If it's not listed in
> /proc/mtrr it will be the default setting of uncacheable.

Ok, this is pretty much what I thought, but I still don't really have
any idea how to do this. ioremap() doesn't take any flags and I'm not
using ioremap_uncacheable(), plus the BAR is marked prefetchable...

> X has code to
> set up the video memory on the video card as write-combining so it can
> get better write performance, you could do something similar.

Alan mentioned this as well, but I haven't tried to hunt this code
yet. If you have any pointers as to where I might find this, I would
appreciate it.

> Setting it as write-back might allow you to get the reads to do bursting
> as well (since the CPU will do a cache-line fill instead of individual
> accesses)

I don't see what the cache write policy has to do with the reads. If
the region is marked cacheable, then all reads should try and read a
cache line, right? The write-back or write-through policy only has to
do with the writes. If it's write through then writes go directly to
RAM, if it's write-back then they hit the cache and are flushed when
the line is flushed (LRU replacement, explicit cache line flush,
etc..), right?

> but if the device is modifying this memory area, unless
> you add code to invalidate those cache lines before reading the data
> you'll get stale data back.

Yeah this could definitely be tricky, would pci_dma_sync suffice for this?

> You could run into some other less obvious
> issues as well, as normally device memory regions are not mapped write-back.
>
> In general, especially if you need to read data back from the device,
> implementing a DMA engine would be by far the better option. Most
> chipsets seem not at all optimized for handling sequential reads from
> PCI memory from the CPU. (Even in the DMA case, you have to be careful
> with what type of memory read transaction you use when transferring from
> host memory - some chipsets don't like to burst more than one cycle if
> you use normal Memory Read instead of Memory Read Line or Memory Read
> Multiple.)

True enough... Fortunately my device allows me to set these...

What I am trying to avoid is PCI read transactions in general. PCI
reads are slow pretty much no matter if they are originated from the
device or from the host because of all the multitude of bridges they
have to go through (I've seen 5 in some cases... sheesh). So
ultimately I'd like for everything going to the device to be written
from the host, then everything going towards the host be DMA'd into
RAM by the device, at least then we can take advantage of PCI write
posting and you don't have to wait for the write to actually complete
before we plod on. But this depends on at least getting write
burst performance from the host so that the time to write the data
from host is less than the time it would take for the device to read
the data out of RAM.

thanks again for your help!
dan
* Re: PCI Bursting with PIO
From: Robert Hancock @ 2008-02-17 19:06 UTC
To: Dan Gora; +Cc: linux-kernel

Dan Gora wrote:
> On Feb 15, 2008 10:00 PM, Robert Hancock <rwh461@mail.usask.ca> wrote:
>> Well, in order for the CPU to batch up more writes you'd have to map the
>> BAR as either write-combining or write-back. If it's not listed in
>> /proc/mtrr it will be the default setting of uncacheable.
>
> Ok, this is pretty much what I thought, but I still don't really have
> any idea how to do this. ioremap() doesn't take any flags and I'm not
> using ioremap_uncacheable(), plus the BAR is marked prefetchable...

Likely easiest to do it from userspace by writing into /proc/mtrr to
change the memory type attributes. Have a look at
Documentation/mtrr.txt.

>
>> X has code to
>> set up the video memory on the video card as write-combining so it can
>> get better write performance, you could do something similar.
>
> Alan mentioned this as well, but I haven't tried to hunt this code
> yet. If you have any pointers as to where I might find this, I would
> appreciate it.
>
>> Setting it as write-back might allow you to get the reads to do bursting
>> as well (since the CPU will do a cache-line fill instead of individual
>> accesses)
>
> I don't see what the cache write policy has to do with the reads. If
> the region is marked cacheable, then all reads should try and read a
> cache line, right? The write-back or write-through policy only has to
> do with the writes. If it's write through then writes go directly to
> RAM, if it's write-back then they hit the cache and are flushed when
> the line is flushed (LRU replacement, explicit cache line flush,
> etc..), right?

That caching attribute affects reads as well. If it's marked uncacheable
or write-combining then reads will never be cached; they are cached only
if it's marked write-back.

>
>> but if the device is modifying this memory area, unless
>> you add code to invalidate those cache lines before reading the data
>> you'll get stale data back.
>
> Yeah this could definitely be tricky, would pci_dma_sync suffice for this?

No, that's not meant to handle this case of stale data in the CPU's
cache since that doesn't normally happen. Something like clflush or
wbinvd would do it, those being x86 specific of course..

>
>> You could run into some other less obvious
>> issues as well, as normally device memory regions are not mapped write-back.
>>
>> In general, especially if you need to read data back from the device,
>> implementing a DMA engine would be by far the better option. Most
>> chipsets seem not at all optimized for handling sequential reads from
>> PCI memory from the CPU. (Even in the DMA case, you have to be careful
>> with what type of memory read transaction you use when transferring from
>> host memory - some chipsets don't like to burst more than one cycle if
>> you use normal Memory Read instead of Memory Read Line or Memory Read
>> Multiple.)
>
> True enough... Fortunately my device allows me to set these...
>
> What I am trying to avoid is PCI read transactions in general. PCI
> reads are slow pretty much no matter if they are originated from the
> device or from the host because of all the multitude of bridges they
> have to go through (I've seen 5 in some cases... sheesh). So
> ultimately I'd like for everything going to the device to be written
> from the host, then everything going towards the host be DMA'd into
> RAM by the device, at least then we can take advantage of PCI write
> posting and you don't have to wait for the write to actually complete
> before we plod on. But this depends on at least getting write
> burst performance from the host so that the time to write the data
> from host is less than the time it would take for the device to read
> the data out of RAM.
>
> thanks again for your help!
> dan

Setting write-combining should be fairly easy without too many weird
side effects. Trying to use write-back to get burst reads is potentially
doable, but may be fraught with difficulty.

I think DMA in both directions is still likely better though, unless the
data you are writing is very small. Most chipsets have pretty small
posting buffers so the amount it will help you is small. If you fill
them up you'll just stall the CPU. With doing a DMA read, at least only
the device will stall.
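
[Editorial sketch, not part of the original message.] The /proc/mtrr
suggestion above can be illustrated with a small helper that builds the
request line. The "base=... size=... type=write-combining" text follows
the form shown in Documentation/mtrr.txt; the function names and the
example base/size values are hypothetical, and the power-of-two and
alignment check is an assumption about typical x86 MTRR constraints, not
something stated in this thread. The real base and size would come from
the BAR as reported for the device.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Assumed constraint (typical for x86 variable-range MTRRs): the size is
 * a power of two of at least 4 KiB and the base is aligned to the size.
 * Returns 1 if the region passes this check, 0 otherwise.
 */
int mtrr_region_ok(uint64_t base, uint64_t size)
{
    if (size < 4096 || (size & (size - 1)) != 0)
        return 0;
    return (base & (size - 1)) == 0;
}

/*
 * Build the text line that would be written to /proc/mtrr to request a
 * write-combining region. Returns the number of characters stored
 * (excluding the terminating NUL), or -1 if the region fails the check
 * above or the buffer is too small.
 */
int mtrr_format_wc(char *buf, size_t len, uint64_t base, uint64_t size)
{
    int n;

    assert(buf != NULL);
    if (!mtrr_region_ok(base, size))
        return -1;

    n = snprintf(buf, len, "base=0x%llx size=0x%llx type=write-combining",
                 (unsigned long long)base, (unsigned long long)size);
    if (n < 0 || (size_t)n >= len)
        return -1;
    return n;
}
```

The resulting string would then be written to /proc/mtrr by a process
with sufficient privileges, for example with fopen("/proc/mtrr", "w")
and fputs(), and the new entry should show up when the file is read
back. Whether the kernel accepts the request depends on the MTRRs still
free on that machine.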
* PCI Bursting with PIO
From: Dan Gora @ 2008-02-15 3:28 UTC
To: linux-kernel

Hi,

I am trying to optimize a driver for a slave only PCI device and am
having a lot of trouble getting any kind of PCI burst transactions in
either the read or the write direction. Using bcopy/memcpy or even a
hand-crafted while (len) { *pdst++ = *psrc++} (with pdst and psrc
unsigned long*) I can only get writes to burst and even in that case
only for 2 data phases (8 bytes) and only on 64 bit machines. The
best that I have managed is to use a hand crafted asm function which
copies the data through mmx registers on i386 machines, but that still
only bursts a maximum of 16 bytes in the write direction and not at
all in the read direction. The source and destination pointers are
both aligned to 8 byte boundaries, so I don't think that it's an
alignment issue.

Is there any way to get PIO to burst over the PCI bus in the read and
write direction? My device has 4 BAR registers, but the area where I
am transferring data is marked 'prefetchable' (although the others are
not). I read here: http://lkml.org/lkml/2004/9/23/393 that this was a
prerequisite, but it is apparently not sufficient. He also mentioned
that the area had to be marked as write-back, but it's not clear how
you can tell (no /proc/mtrr doesn't tell you) or that it has anything
to do with bursting reads.

Any ideas would be really appreciated,

thanks-
dan
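
[Editorial sketch, not part of the original message.] The hand-crafted
loop quoted in this message is shorthand. A compilable version might
look like the following, assuming the length is given as a count of
unsigned long words; the names copy_words and nwords are illustrative
only. The destination is declared volatile so the compiler emits one
store per word. That does not by itself make the bus burst: as the
replies in this thread explain, how stores are merged depends on the
memory type of the mapping. In a real driver the destination would be an
ioremap()ed pointer and a kernel accessor such as memcpy_toio() would
normally be used instead of a plain pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Copy nwords machine words from psrc to pdst, one store per word.
 * Both pointers are expected to be word aligned, matching the aligned
 * buffers described in the message.
 */
void copy_words(volatile unsigned long *pdst, const unsigned long *psrc,
                size_t nwords)
{
    size_t i;

    assert(((uintptr_t)pdst % sizeof(unsigned long)) == 0);
    assert(((uintptr_t)psrc % sizeof(unsigned long)) == 0);

    for (i = 0; i < nwords; i++)
        pdst[i] = psrc[i];
}
```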
* Re: PCI Bursting with PIO
From: Andi Kleen @ 2008-02-15 10:54 UTC
To: Dan Gora; +Cc: linux-kernel

"Dan Gora" <dan.gora@gmail.com> writes:
>
> Is there any way to get PIO

I assume you really mean MMIO, not PIO. PIO would be port IO.

> to burst over the PCI bus in the read and
> write direction?

You should set the MMIO mapping to write combining using an MTRR.

You might need to add appropriate memory barriers if you rely
on write ordering though.

-Andi
* Re: PCI Bursting with PIO
From: Dan Gora @ 2008-02-15 17:55 UTC
To: Andi Kleen; +Cc: linux-kernel

On Fri, Feb 15, 2008 at 2:54 AM, Andi Kleen <andi@firstfloor.org> wrote:
> "Dan Gora" <dan.gora@gmail.com> writes:
> >
> > Is there any way to get PIO
>
> I assume you really mean MMIO, not PIO. PIO would be port IO.

Sorry, I always saw it referred to as "Programmed I/O" as opposed to DMA...

> You should set the MMIO mapping to write combining using an MTRR

Sorry to be thick here, but how would I go about doing that?

> You might need to add appropriate memory barriers if you rely
> on write ordering though.

Ok, thanks for the info...

dan
* Re: PCI Bursting with PIO
From: Alan Cox @ 2008-02-15 13:02 UTC
To: Dan Gora; +Cc: linux-kernel

> Is there any way to get PIO to burst over the PCI bus in the read and
> write direction? My device has 4 BAR registers, but the area where I

I think you are doing about as well as the X folks did when they spent
time on trying to optimise pio transfers to and from graphics card RAM.

> Any ideas would be really appreciated,

Put a DMA controller on it ;)

Alan
* Re: PCI Bursting with PIO
From: Dan Gora @ 2008-02-15 18:00 UTC
To: linux-kernel

On Fri, Feb 15, 2008 at 5:02 AM, Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:
> > Is there any way to get PIO to burst over the PCI bus in the read and
> > write direction? My device has 4 BAR registers, but the area where I
>
> I think you are doing about as well as the X folks did when they spent
> time on trying to optimise pio transfers to and from graphics card RAM.
>

That's good to know. Do you have a link or anything to their
discussion or some key words that I could hunt it down?

> > Any ideas would be really appreciated,
>
> Put a DMA controller on it ;)

Ugh.. sadly that's what's coming. I really don't get why the
northbridge cannot burst however. If the memory is mapped
prefetchable and you have to do a PCI read through 3 PCIe bridges to
finally get to your device it seems like it would _really_ behoove the
bridge to do a Memory read multiple and get the whole cache line.

I have searched around a lot and there doesn't seem to be any info at
all on how you can persuade these bridges to do different PCI commands
or burst. I don't know why....

thanks again for your help,
dan
* Re: PCI Bursting with PIO
From: H. Peter Anvin @ 2008-02-15 18:41 UTC
To: Dan Gora; +Cc: linux-kernel

Dan Gora wrote:
>>
>> Put a DMA controller on it ;)
>
> Ugh.. sadly that's what's coming. I really don't get why the
> northbridge cannot burst however.

Because the early Intel northbridges didn't, so no one else bothered
either, since everyone designed their hardware to not require that
capability.

-hpa
* Re: PCI Bursting with PIO
From: Alan Cox @ 2008-02-15 19:00 UTC
To: Dan Gora; +Cc: linux-kernel

On Fri, 15 Feb 2008 10:00:28 -0800 "Dan Gora" <dan.gora@gmail.com> wrote:
> On Fri, Feb 15, 2008 at 5:02 AM, Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:
> > > Is there any way to get PIO to burst over the PCI bus in the read and
> > > write direction? My device has 4 BAR registers, but the area where I
> >
> > I think you are doing about as well as the X folks did when they spent
> > time on trying to optimise pio transfers to and from graphics card RAM.
> >
>
> That's good to know. Do you have a link or anything to their
> discussion or some key words that I could hunt it down?

It was some time ago but a look at the X tree will find you the code.
It's basically the same as you did - using MMX.