* E100 RX ring buffers continued... @ 2009-08-22 14:09 Krzysztof Halasa 2009-08-22 15:32 ` Krzysztof Halasa 2009-08-23 1:15 ` David Miller 0 siblings, 2 replies; 8+ messages in thread From: Krzysztof Halasa @ 2009-08-22 14:09 UTC (permalink / raw) To: netdev Cc: David Miller, Jeff Kirsher, Jesse Brandeburg, Bruce Allan, PJ Waskiewicz, John Ronciak Hi, E100 RX buffers seem more problematic than I though. The descriptor doesn't contain the address of the data buffer. Instead, the descriptor is to be followed by the data buffer immediately. It's not only the driver invention :-( The manual mentions it's the only mode supported, aka "simplified mode". There is also apparently unsupported and mostly undocumented "flexible mode". The question to Intel's experts: can the flexible mode be used anyway? I can write the code, but need info about eg. the data structures (RFD, how does it work in flexible mode) and a confirmation the silicon (or maybe which silicon) will work with it. TIA. -- Krzysztof Halasa ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: E100 RX ring buffers continued... 2009-08-22 14:09 E100 RX ring buffers continued Krzysztof Halasa @ 2009-08-22 15:32 ` Krzysztof Halasa 2009-08-23 1:15 ` David Miller 1 sibling, 0 replies; 8+ messages in thread From: Krzysztof Halasa @ 2009-08-22 15:32 UTC (permalink / raw) To: netdev Cc: David Miller, Jeff Kirsher, Jesse Brandeburg, Bruce Allan, PJ Waskiewicz, John Ronciak Krzysztof Halasa <khc@pm.waw.pl> writes: > There is also apparently unsupported and mostly undocumented "flexible > mode". The question to Intel's experts: can the flexible mode be used > anyway? I can write the code, but need info about eg. the data > structures (RFD, how does it work in flexible mode) and a confirmation > the silicon (or maybe which silicon) will work with it. I can see there is some "erratum" in 82559 related to flexible RX mode. It seems we're not affected, though - it requires more than 1 RBD (per RFD I hope) for the problem to happen. Do I get it right? -- Krzysztof Halasa ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: E100 RX ring buffers continued... 2009-08-22 14:09 E100 RX ring buffers continued Krzysztof Halasa 2009-08-22 15:32 ` Krzysztof Halasa @ 2009-08-23 1:15 ` David Miller 2009-08-23 15:30 ` Krzysztof Halasa 1 sibling, 1 reply; 8+ messages in thread From: David Miller @ 2009-08-23 1:15 UTC (permalink / raw) To: khc Cc: netdev, jeffrey.t.kirsher, jesse.brandeburg, bruce.w.allan, peter.p.waskiewicz.jr, john.ronciak From: Krzysztof Halasa <khc@pm.waw.pl> Date: Sat, 22 Aug 2009 16:09:00 +0200 > E100 RX buffers seem more problematic than I though. The descriptor > doesn't contain the address of the data buffer. Instead, the > descriptor is to be followed by the data buffer immediately. It's > not only the driver invention :-( I'm happy we can now see now why nobody has "fixed" this for more than 15 years :-) I think going down the road of trying to use the flexible mode is a dead end. I doubt any other OS driver is using it, and that means that there are likely many other errata hiding in the bushes which you will unearth by trying to use this new descriptor mode. And it won't show up when you test it, it will show up when some random person in some remote data center somewhere updates their kernel, and they won't send us a bug report, they'll replace their card or downgrade their kernel instead. And that's why it's inappropiate to approach this problem in this way for a driver this old, and for hardware that's been around so long. Just make the driver use consistent buffers for RX, and when packets arrive an SKB is allocated and the packet data is copied into the SKB from the consistent buffer. And for 2.6.31-rcX we probably have to simply revert your change. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: E100 RX ring buffers continued... 2009-08-23 1:15 ` David Miller @ 2009-08-23 15:30 ` Krzysztof Halasa 2009-08-23 18:20 ` Walt Holman 0 siblings, 1 reply; 8+ messages in thread From: Krzysztof Halasa @ 2009-08-23 15:30 UTC (permalink / raw) To: David Miller, Walt Holman Cc: netdev, jeffrey.t.kirsher, jesse.brandeburg, bruce.w.allan, peter.p.waskiewicz.jr, john.ronciak David Miller <davem@davemloft.net> writes: > I think going down the road of trying to use the flexible mode is a > dead end. I doubt any other OS driver is using it, and that means > that there are likely many other errata hiding in the bushes which you > will unearth by trying to use this new descriptor mode. And it won't > show up when you test it, it will show up when some random person in > some remote data center somewhere updates their kernel, and they won't > send us a bug report, they'll replace their card or downgrade their > kernel instead. Well, I'm afraid it's a possible scenario. I won't touch the flexible mode unless Intel folks tell me it's safe. OTOH it seems it was used by less common software, at least in the 82557-9 times. Not sure about Windows driver. (It seems the simplified mode was meant for Linux-alike "1 buffer per packet" approach while the flexible mode was to support systems using mbuf-like structures). > Just make the driver use consistent buffers for RX, and when packets > arrive an SKB is allocated and the packet data is copied into the SKB >>From the consistent buffer. That would be a performance hit, wouldn't it? Especially in older machines, where e100 was typically installed. I think it should be the last resort. > And for 2.6.31-rcX we probably have to simply revert your change. Unfortunately it seems that reverting will not fix operation completely on a system with swiotlb (checking the descriptor status isn't the only racy operation which depends on the cache behaviour, though it's the most frequent). And it will break all non-coherent archs again, they will either need to use the patch or still stick to (already removed) eepro100.c I looked at the eepro100.c sources. It uses BIDIR mapping the same way as e100.c does, but then syncs using FROM_DEVICE/TO_DEVICE instead of e100's always-BIDIR. I think the same in e100.c would work on all platforms (though still violating the DMA API a bit). Perhaps we should do that instead? Walt, can you check if 2.6.30.5 with the following patch applied still breaks e100 with the 6 GB of RAM, please? TIA. (This patch makes swiotlb aware that the CPU didn't alter the buffer, though I don't know if swiotlb will use this info). diff --git a/drivers/net/e100.c b/drivers/net/e100.c index 014dfb6..53e8252 100644 --- a/drivers/net/e100.c +++ b/drivers/net/e100.c @@ -1764,7 +1764,7 @@ static int e100_rx_indicate(struct nic *nic, struct rx *rx, nic->ru_running = RU_SUSPENDED; pci_dma_sync_single_for_device(nic->pdev, rx->dma_addr, sizeof(struct rfd), - PCI_DMA_BIDIRECTIONAL); + PCI_DMA_FROMDEVICE); return -ENODATA; } -- Krzysztof Halasa ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: E100 RX ring buffers continued... 2009-08-23 15:30 ` Krzysztof Halasa @ 2009-08-23 18:20 ` Walt Holman 2009-08-23 20:43 ` Krzysztof Halasa 0 siblings, 1 reply; 8+ messages in thread From: Walt Holman @ 2009-08-23 18:20 UTC (permalink / raw) To: Krzysztof Halasa Cc: netdev, jeffrey t kirsher, jesse brandeburg, bruce w allan, peter p waskiewicz jr, john ronciak, David Miller ----- "Krzysztof Halasa" <khc@pm.waw.pl> wrote: > David Miller <davem@davemloft.net> writes: > > > I think going down the road of trying to use the flexible mode is a > > dead end. I doubt any other OS driver is using it, and that means > > that there are likely many other errata hiding in the bushes which > you > > will unearth by trying to use this new descriptor mode. And it > won't > > show up when you test it, it will show up when some random person > in > > some remote data center somewhere updates their kernel, and they > won't > > send us a bug report, they'll replace their card or downgrade their > > kernel instead. > > Well, I'm afraid it's a possible scenario. I won't touch the flexible > mode unless Intel folks tell me it's safe. > OTOH it seems it was used by less common software, at least in the > 82557-9 times. Not sure about Windows driver. > (It seems the simplified mode was meant for Linux-alike "1 buffer per > packet" approach while the flexible mode was to support systems using > mbuf-like structures). > > > Just make the driver use consistent buffers for RX, and when > packets > > arrive an SKB is allocated and the packet data is copied into the > SKB > >>From the consistent buffer. > > That would be a performance hit, wouldn't it? Especially in older > machines, where e100 was typically installed. I think it should be > the > last resort. > > > And for 2.6.31-rcX we probably have to simply revert your change. > > Unfortunately it seems that reverting will not fix operation > completely > on a system with swiotlb (checking the descriptor status isn't the > only > racy operation which depends on the cache behaviour, though it's the > most frequent). And it will break all non-coherent archs again, they > will either need to use the patch or still stick to (already removed) > eepro100.c > > I looked at the eepro100.c sources. It uses BIDIR mapping the same > way > as e100.c does, but then syncs using FROM_DEVICE/TO_DEVICE instead of > e100's always-BIDIR. I think the same in e100.c would work on all > platforms (though still violating the DMA API a bit). Perhaps we > should > do that instead? > > Walt, can you check if 2.6.30.5 with the following patch applied > still > breaks e100 with the 6 GB of RAM, please? TIA. > (This patch makes swiotlb aware that the CPU didn't alter the buffer, > though I don't know if swiotlb will use this info). > > diff --git a/drivers/net/e100.c b/drivers/net/e100.c > index 014dfb6..53e8252 100644 > --- a/drivers/net/e100.c > +++ b/drivers/net/e100.c > @@ -1764,7 +1764,7 @@ static int e100_rx_indicate(struct nic *nic, > struct rx *rx, > nic->ru_running = RU_SUSPENDED; > pci_dma_sync_single_for_device(nic->pdev, rx->dma_addr, > sizeof(struct rfd), > - PCI_DMA_BIDIRECTIONAL); > + PCI_DMA_FROMDEVICE); > return -ENODATA; > } > > -- > Krzysztof Halasa Krzystof, I've tested this under 2.6.31-rc7 (which was also broken to begin with) and this appears to have fixed it. It's a fairly limited test at this point, as I've just been downloading a large file for the past 15 mins. or so, but this was normally enough to have multiple stoppages. Do you still need 2.6.30.5 tested as well? -Walt ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: E100 RX ring buffers continued... 2009-08-23 18:20 ` Walt Holman @ 2009-08-23 20:43 ` Krzysztof Halasa 2009-08-24 2:03 ` David Miller 0 siblings, 1 reply; 8+ messages in thread From: Krzysztof Halasa @ 2009-08-23 20:43 UTC (permalink / raw) To: Walt Holman Cc: netdev, jeffrey t kirsher, jesse brandeburg, bruce w allan, peter p waskiewicz jr, john ronciak, David Miller Walt Holman <walt@holmansrus.com> writes: > I've tested this under 2.6.31-rc7 (which was also broken to begin > with) and this appears to have fixed it. It's a fairly limited test > at this point, as I've just been downloading a large file for the past > 15 mins. or so, but this was normally enough to have multiple > stoppages. Do you still need 2.6.30.5 tested as well? No, these tests and the drivers in both versions are equivalent. Thanks a lot. David, I think it's safe to apply this. Probably to "stable" series, too. I think at least this should theoretically be changed as well: e100_rx_indicate() ... /* Need to sync before taking a peek at cb_complete bit */ pci_dma_sync_single_for_cpu(nic->pdev, rx->dma_addr, sizeof(struct rfd), PCI_DMA_BIDIRECTIONAL); rfd_status = le16_to_cpu(rfd->status); but since it appears to work I wouldn't touch it ATM. I guess the swiotlb code is smart enough to not write to device-owner memory on for_cpu() call. ... E100: fix interaction with swiotlb on X86. E100 places it's RX packet descriptors inside skb->data and uses them with bidirectional streaming DMA mapping. Data in descriptors is accessed simultaneously by the chip (writing status and size when a packet is received) and CPU (reading to check if the packet was received). This isn't a valid usage of PCI DMA API, which requires use of the coherent (consistent) memory for such purpose. Unfortunately e100 chips working in "simplified" RX mode have to store received data directly after the descriptor. Fixing the driver to conform to the API would require using unsupported "flexible" RX mode or receiving data into a coherent memory and using CPU to copy it to network buffers. This patch, while not yet making the driver conform to the PCI DMA API, allows it to work correctly on X86 with swiotlb (while not breaking other architectures). Signed-off-by: Krzysztof Hałasa <khc@pm.waw.pl> diff --git a/drivers/net/e100.c b/drivers/net/e100.c index 014dfb6..53e8252 100644 --- a/drivers/net/e100.c +++ b/drivers/net/e100.c @@ -1764,7 +1764,7 @@ static int e100_rx_indicate(struct nic *nic, struct rx *rx, nic->ru_running = RU_SUSPENDED; pci_dma_sync_single_for_device(nic->pdev, rx->dma_addr, sizeof(struct rfd), - PCI_DMA_BIDIRECTIONAL); + PCI_DMA_FROMDEVICE); return -ENODATA; } -- Krzysztof Halasa ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: E100 RX ring buffers continued... 2009-08-23 20:43 ` Krzysztof Halasa @ 2009-08-24 2:03 ` David Miller [not found] ` <m3bpm5v65e.fsf@intrepid.localdomain> 0 siblings, 1 reply; 8+ messages in thread From: David Miller @ 2009-08-24 2:03 UTC (permalink / raw) To: khc Cc: walt, netdev, jeffrey.t.kirsher, jesse.brandeburg, bruce.w.allan, peter.p.waskiewicz.jr, john.ronciak From: Krzysztof Halasa <khc@pm.waw.pl> Date: Sun, 23 Aug 2009 22:43:29 +0200 > David, I think it's safe to apply this. Yep, I've applied this to net-2.6, thanks! > Probably to "stable" series, too. That would depend upon applying your original patch to -stable, which I had no intentions of doing. ^ permalink raw reply [flat|nested] 8+ messages in thread
[parent not found: <m3bpm5v65e.fsf@intrepid.localdomain>]
* Re: E100 RX ring buffers continued... [not found] ` <m3bpm5v65e.fsf@intrepid.localdomain> @ 2009-08-24 20:10 ` David Miller 0 siblings, 0 replies; 8+ messages in thread From: David Miller @ 2009-08-24 20:10 UTC (permalink / raw) To: khc Cc: walt, netdev, jeffrey.t.kirsher, jesse.brandeburg, bruce.w.allan, peter.p.waskiewicz.jr, john.ronciak From: Krzysztof Halasa <khc@pm.waw.pl> Date: Mon, 24 Aug 2009 12:49:01 +0200 > David Miller <davem@davemloft.net> writes: > >>> Probably to "stable" series, too. >> >> That would depend upon applying your original patch to -stable, >> which I had no intentions of doing. > > Still you sent it for both 2.6.27 and .30 anyway :-) Did I? Aha, then I indeed have to send your fix :-) ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2009-08-24 20:10 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-08-22 14:09 E100 RX ring buffers continued Krzysztof Halasa
2009-08-22 15:32 ` Krzysztof Halasa
2009-08-23 1:15 ` David Miller
2009-08-23 15:30 ` Krzysztof Halasa
2009-08-23 18:20 ` Walt Holman
2009-08-23 20:43 ` Krzysztof Halasa
2009-08-24 2:03 ` David Miller
[not found] ` <m3bpm5v65e.fsf@intrepid.localdomain>
2009-08-24 20:10 ` David Miller
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).