* E100 RX ring buffers continued...
@ 2009-08-22 14:09 Krzysztof Halasa
2009-08-22 15:32 ` Krzysztof Halasa
2009-08-23 1:15 ` David Miller
0 siblings, 2 replies; 8+ messages in thread
From: Krzysztof Halasa @ 2009-08-22 14:09 UTC (permalink / raw)
To: netdev
Cc: David Miller, Jeff Kirsher, Jesse Brandeburg, Bruce Allan,
PJ Waskiewicz, John Ronciak
Hi,
E100 RX buffers seem more problematic than I though.
The descriptor doesn't contain the address of the data buffer. Instead,
the descriptor is to be followed by the data buffer immediately. It's
not only the driver invention :-(
The manual mentions it's the only mode supported, aka "simplified mode".
There is also apparently unsupported and mostly undocumented "flexible
mode". The question to Intel's experts: can the flexible mode be used
anyway? I can write the code, but need info about eg. the data
structures (RFD, how does it work in flexible mode) and a confirmation
the silicon (or maybe which silicon) will work with it.
TIA.
--
Krzysztof Halasa
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: E100 RX ring buffers continued...
2009-08-22 14:09 E100 RX ring buffers continued Krzysztof Halasa
@ 2009-08-22 15:32 ` Krzysztof Halasa
2009-08-23 1:15 ` David Miller
1 sibling, 0 replies; 8+ messages in thread
From: Krzysztof Halasa @ 2009-08-22 15:32 UTC (permalink / raw)
To: netdev
Cc: David Miller, Jeff Kirsher, Jesse Brandeburg, Bruce Allan,
PJ Waskiewicz, John Ronciak
Krzysztof Halasa <khc@pm.waw.pl> writes:
> There is also apparently unsupported and mostly undocumented "flexible
> mode". The question to Intel's experts: can the flexible mode be used
> anyway? I can write the code, but need info about eg. the data
> structures (RFD, how does it work in flexible mode) and a confirmation
> the silicon (or maybe which silicon) will work with it.
I can see there is some "erratum" in 82559 related to flexible RX mode.
It seems we're not affected, though - it requires more than 1 RBD (per
RFD I hope) for the problem to happen.
Do I get it right?
--
Krzysztof Halasa
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: E100 RX ring buffers continued...
2009-08-22 14:09 E100 RX ring buffers continued Krzysztof Halasa
2009-08-22 15:32 ` Krzysztof Halasa
@ 2009-08-23 1:15 ` David Miller
2009-08-23 15:30 ` Krzysztof Halasa
1 sibling, 1 reply; 8+ messages in thread
From: David Miller @ 2009-08-23 1:15 UTC (permalink / raw)
To: khc
Cc: netdev, jeffrey.t.kirsher, jesse.brandeburg, bruce.w.allan,
peter.p.waskiewicz.jr, john.ronciak
From: Krzysztof Halasa <khc@pm.waw.pl>
Date: Sat, 22 Aug 2009 16:09:00 +0200
> E100 RX buffers seem more problematic than I though. The descriptor
> doesn't contain the address of the data buffer. Instead, the
> descriptor is to be followed by the data buffer immediately. It's
> not only the driver invention :-(
I'm happy we can now see now why nobody has "fixed" this for more than
15 years :-)
I think going down the road of trying to use the flexible mode is a
dead end. I doubt any other OS driver is using it, and that means
that there are likely many other errata hiding in the bushes which you
will unearth by trying to use this new descriptor mode. And it won't
show up when you test it, it will show up when some random person in
some remote data center somewhere updates their kernel, and they won't
send us a bug report, they'll replace their card or downgrade their
kernel instead.
And that's why it's inappropiate to approach this problem in this way
for a driver this old, and for hardware that's been around so long.
Just make the driver use consistent buffers for RX, and when packets
arrive an SKB is allocated and the packet data is copied into the SKB
from the consistent buffer.
And for 2.6.31-rcX we probably have to simply revert your change.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: E100 RX ring buffers continued...
2009-08-23 1:15 ` David Miller
@ 2009-08-23 15:30 ` Krzysztof Halasa
2009-08-23 18:20 ` Walt Holman
0 siblings, 1 reply; 8+ messages in thread
From: Krzysztof Halasa @ 2009-08-23 15:30 UTC (permalink / raw)
To: David Miller, Walt Holman
Cc: netdev, jeffrey.t.kirsher, jesse.brandeburg, bruce.w.allan,
peter.p.waskiewicz.jr, john.ronciak
David Miller <davem@davemloft.net> writes:
> I think going down the road of trying to use the flexible mode is a
> dead end. I doubt any other OS driver is using it, and that means
> that there are likely many other errata hiding in the bushes which you
> will unearth by trying to use this new descriptor mode. And it won't
> show up when you test it, it will show up when some random person in
> some remote data center somewhere updates their kernel, and they won't
> send us a bug report, they'll replace their card or downgrade their
> kernel instead.
Well, I'm afraid it's a possible scenario. I won't touch the flexible
mode unless Intel folks tell me it's safe.
OTOH it seems it was used by less common software, at least in the
82557-9 times. Not sure about Windows driver.
(It seems the simplified mode was meant for Linux-alike "1 buffer per
packet" approach while the flexible mode was to support systems using
mbuf-like structures).
> Just make the driver use consistent buffers for RX, and when packets
> arrive an SKB is allocated and the packet data is copied into the SKB
>>From the consistent buffer.
That would be a performance hit, wouldn't it? Especially in older
machines, where e100 was typically installed. I think it should be the
last resort.
> And for 2.6.31-rcX we probably have to simply revert your change.
Unfortunately it seems that reverting will not fix operation completely
on a system with swiotlb (checking the descriptor status isn't the only
racy operation which depends on the cache behaviour, though it's the
most frequent). And it will break all non-coherent archs again, they
will either need to use the patch or still stick to (already removed)
eepro100.c
I looked at the eepro100.c sources. It uses BIDIR mapping the same way
as e100.c does, but then syncs using FROM_DEVICE/TO_DEVICE instead of
e100's always-BIDIR. I think the same in e100.c would work on all
platforms (though still violating the DMA API a bit). Perhaps we should
do that instead?
Walt, can you check if 2.6.30.5 with the following patch applied still
breaks e100 with the 6 GB of RAM, please? TIA.
(This patch makes swiotlb aware that the CPU didn't alter the buffer,
though I don't know if swiotlb will use this info).
diff --git a/drivers/net/e100.c b/drivers/net/e100.c
index 014dfb6..53e8252 100644
--- a/drivers/net/e100.c
+++ b/drivers/net/e100.c
@@ -1764,7 +1764,7 @@ static int e100_rx_indicate(struct nic *nic, struct rx *rx,
nic->ru_running = RU_SUSPENDED;
pci_dma_sync_single_for_device(nic->pdev, rx->dma_addr,
sizeof(struct rfd),
- PCI_DMA_BIDIRECTIONAL);
+ PCI_DMA_FROMDEVICE);
return -ENODATA;
}
--
Krzysztof Halasa
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: E100 RX ring buffers continued...
2009-08-23 15:30 ` Krzysztof Halasa
@ 2009-08-23 18:20 ` Walt Holman
2009-08-23 20:43 ` Krzysztof Halasa
0 siblings, 1 reply; 8+ messages in thread
From: Walt Holman @ 2009-08-23 18:20 UTC (permalink / raw)
To: Krzysztof Halasa
Cc: netdev, jeffrey t kirsher, jesse brandeburg, bruce w allan,
peter p waskiewicz jr, john ronciak, David Miller
----- "Krzysztof Halasa" <khc@pm.waw.pl> wrote:
> David Miller <davem@davemloft.net> writes:
>
> > I think going down the road of trying to use the flexible mode is a
> > dead end. I doubt any other OS driver is using it, and that means
> > that there are likely many other errata hiding in the bushes which
> you
> > will unearth by trying to use this new descriptor mode. And it
> won't
> > show up when you test it, it will show up when some random person
> in
> > some remote data center somewhere updates their kernel, and they
> won't
> > send us a bug report, they'll replace their card or downgrade their
> > kernel instead.
>
> Well, I'm afraid it's a possible scenario. I won't touch the flexible
> mode unless Intel folks tell me it's safe.
> OTOH it seems it was used by less common software, at least in the
> 82557-9 times. Not sure about Windows driver.
> (It seems the simplified mode was meant for Linux-alike "1 buffer per
> packet" approach while the flexible mode was to support systems using
> mbuf-like structures).
>
> > Just make the driver use consistent buffers for RX, and when
> packets
> > arrive an SKB is allocated and the packet data is copied into the
> SKB
> >>From the consistent buffer.
>
> That would be a performance hit, wouldn't it? Especially in older
> machines, where e100 was typically installed. I think it should be
> the
> last resort.
>
> > And for 2.6.31-rcX we probably have to simply revert your change.
>
> Unfortunately it seems that reverting will not fix operation
> completely
> on a system with swiotlb (checking the descriptor status isn't the
> only
> racy operation which depends on the cache behaviour, though it's the
> most frequent). And it will break all non-coherent archs again, they
> will either need to use the patch or still stick to (already removed)
> eepro100.c
>
> I looked at the eepro100.c sources. It uses BIDIR mapping the same
> way
> as e100.c does, but then syncs using FROM_DEVICE/TO_DEVICE instead of
> e100's always-BIDIR. I think the same in e100.c would work on all
> platforms (though still violating the DMA API a bit). Perhaps we
> should
> do that instead?
>
> Walt, can you check if 2.6.30.5 with the following patch applied
> still
> breaks e100 with the 6 GB of RAM, please? TIA.
> (This patch makes swiotlb aware that the CPU didn't alter the buffer,
> though I don't know if swiotlb will use this info).
>
> diff --git a/drivers/net/e100.c b/drivers/net/e100.c
> index 014dfb6..53e8252 100644
> --- a/drivers/net/e100.c
> +++ b/drivers/net/e100.c
> @@ -1764,7 +1764,7 @@ static int e100_rx_indicate(struct nic *nic,
> struct rx *rx,
> nic->ru_running = RU_SUSPENDED;
> pci_dma_sync_single_for_device(nic->pdev, rx->dma_addr,
> sizeof(struct rfd),
> - PCI_DMA_BIDIRECTIONAL);
> + PCI_DMA_FROMDEVICE);
> return -ENODATA;
> }
>
> --
> Krzysztof Halasa
Krzystof,
I've tested this under 2.6.31-rc7 (which was also broken to begin with) and this appears to have fixed it. It's a fairly limited test at this point, as I've just been downloading a large file for the past 15 mins. or so, but this was normally enough to have multiple stoppages. Do you still need 2.6.30.5 tested as well?
-Walt
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: E100 RX ring buffers continued...
2009-08-23 18:20 ` Walt Holman
@ 2009-08-23 20:43 ` Krzysztof Halasa
2009-08-24 2:03 ` David Miller
0 siblings, 1 reply; 8+ messages in thread
From: Krzysztof Halasa @ 2009-08-23 20:43 UTC (permalink / raw)
To: Walt Holman
Cc: netdev, jeffrey t kirsher, jesse brandeburg, bruce w allan,
peter p waskiewicz jr, john ronciak, David Miller
Walt Holman <walt@holmansrus.com> writes:
> I've tested this under 2.6.31-rc7 (which was also broken to begin
> with) and this appears to have fixed it. It's a fairly limited test
> at this point, as I've just been downloading a large file for the past
> 15 mins. or so, but this was normally enough to have multiple
> stoppages. Do you still need 2.6.30.5 tested as well?
No, these tests and the drivers in both versions are equivalent. Thanks
a lot.
David, I think it's safe to apply this. Probably to "stable" series,
too.
I think at least this should theoretically be changed as well:
e100_rx_indicate()
...
/* Need to sync before taking a peek at cb_complete bit */
pci_dma_sync_single_for_cpu(nic->pdev, rx->dma_addr,
sizeof(struct rfd), PCI_DMA_BIDIRECTIONAL);
rfd_status = le16_to_cpu(rfd->status);
but since it appears to work I wouldn't touch it ATM. I guess the
swiotlb code is smart enough to not write to device-owner memory on
for_cpu() call.
...
E100: fix interaction with swiotlb on X86.
E100 places it's RX packet descriptors inside skb->data and uses them
with bidirectional streaming DMA mapping. Data in descriptors is
accessed simultaneously by the chip (writing status and size when
a packet is received) and CPU (reading to check if the packet was
received). This isn't a valid usage of PCI DMA API, which requires use
of the coherent (consistent) memory for such purpose. Unfortunately e100
chips working in "simplified" RX mode have to store received data
directly after the descriptor. Fixing the driver to conform to the API
would require using unsupported "flexible" RX mode or receiving data
into a coherent memory and using CPU to copy it to network buffers.
This patch, while not yet making the driver conform to the PCI DMA API,
allows it to work correctly on X86 with swiotlb (while not breaking
other architectures).
Signed-off-by: Krzysztof Hałasa <khc@pm.waw.pl>
diff --git a/drivers/net/e100.c b/drivers/net/e100.c
index 014dfb6..53e8252 100644
--- a/drivers/net/e100.c
+++ b/drivers/net/e100.c
@@ -1764,7 +1764,7 @@ static int e100_rx_indicate(struct nic *nic, struct rx *rx,
nic->ru_running = RU_SUSPENDED;
pci_dma_sync_single_for_device(nic->pdev, rx->dma_addr,
sizeof(struct rfd),
- PCI_DMA_BIDIRECTIONAL);
+ PCI_DMA_FROMDEVICE);
return -ENODATA;
}
--
Krzysztof Halasa
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: E100 RX ring buffers continued...
2009-08-23 20:43 ` Krzysztof Halasa
@ 2009-08-24 2:03 ` David Miller
[not found] ` <m3bpm5v65e.fsf@intrepid.localdomain>
0 siblings, 1 reply; 8+ messages in thread
From: David Miller @ 2009-08-24 2:03 UTC (permalink / raw)
To: khc
Cc: walt, netdev, jeffrey.t.kirsher, jesse.brandeburg, bruce.w.allan,
peter.p.waskiewicz.jr, john.ronciak
From: Krzysztof Halasa <khc@pm.waw.pl>
Date: Sun, 23 Aug 2009 22:43:29 +0200
> David, I think it's safe to apply this.
Yep, I've applied this to net-2.6, thanks!
> Probably to "stable" series, too.
That would depend upon applying your original patch to -stable,
which I had no intentions of doing.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: E100 RX ring buffers continued...
[not found] ` <m3bpm5v65e.fsf@intrepid.localdomain>
@ 2009-08-24 20:10 ` David Miller
0 siblings, 0 replies; 8+ messages in thread
From: David Miller @ 2009-08-24 20:10 UTC (permalink / raw)
To: khc
Cc: walt, netdev, jeffrey.t.kirsher, jesse.brandeburg, bruce.w.allan,
peter.p.waskiewicz.jr, john.ronciak
From: Krzysztof Halasa <khc@pm.waw.pl>
Date: Mon, 24 Aug 2009 12:49:01 +0200
> David Miller <davem@davemloft.net> writes:
>
>>> Probably to "stable" series, too.
>>
>> That would depend upon applying your original patch to -stable,
>> which I had no intentions of doing.
>
> Still you sent it for both 2.6.27 and .30 anyway :-)
Did I? Aha, then I indeed have to send your fix :-)
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2009-08-24 20:10 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-08-22 14:09 E100 RX ring buffers continued Krzysztof Halasa
2009-08-22 15:32 ` Krzysztof Halasa
2009-08-23 1:15 ` David Miller
2009-08-23 15:30 ` Krzysztof Halasa
2009-08-23 18:20 ` Walt Holman
2009-08-23 20:43 ` Krzysztof Halasa
2009-08-24 2:03 ` David Miller
[not found] ` <m3bpm5v65e.fsf@intrepid.localdomain>
2009-08-24 20:10 ` David Miller
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).