public inbox for linux-kernel@vger.kernel.org
* at91sam9260 MACB problem with IP fragmentation
@ 2012-12-06 11:32 Erwin Rol
  2012-12-06 13:27 ` Nicolas Ferre
  0 siblings, 1 reply; 7+ messages in thread
From: Erwin Rol @ 2012-12-06 11:32 UTC
  To: linux-kernel; +Cc: Havard Skinnemoen, Nicolas Ferre

Hello Nicolas, Havard, all,

I have a very obscure problem with an at91sam9260 board (an almost
1-to-1 copy of the Atmel EK).

The MACB seems to stall when I use large (>2 * MTU) UDP datagrams. The
test case is a UDP echo client (PC) that sends datagrams of increasing
length to the AT91 until the maximum UDP datagram length is reached.
When there is no IP fragmentation everything is fine, but once the
datagrams start to get fragmented the AT91 stops replying. As soon as
some other network traffic happens it continues again, and none of the
data is lost.

With Wireshark the effect is easy to see (192.168.1.4 is the PC echo
client, and 192.168.1.133 is the AT91 echo server). After the first
request no reply comes. After a 5 second timeout the second request is
sent, and then both replies are returned.

When I enabled debugging output it all started to work. So I tried some
udelays in the driver instead of printk, and with a 1 ms delay in the IRQ
handler it started working. Of course that is an unacceptable fix, but
it looks like there is some weird race condition that causes the sending
to stall. The only difference from normal MTU-sized datagrams I can
think of is that the fragmented packets can be passed very quickly to
the macb tx function, because the kernel has all 5 skbs ready.

I would be very interested to hear if someone else could reproduce this
problem. Or even better, has seen this problem and has a fix for it.

I tried several kernels including the test version from Nicolas that he
posted on LKML in October. They all show the same effect.

Please CC me because I am currently not on the list.

- Erwin

The Wireshark dump:

> No.     Time           Source                Destination           Protocol Length Info
>       1 0.000000000    192.168.1.4           192.168.1.133         IPv4     1514   Fragmented IP protocol (proto=UDP 17, off=0, ID=0654) [Reassembled in #5]
>       2 0.000123000    192.168.1.4           192.168.1.133         IPv4     1514   Fragmented IP protocol (proto=UDP 17, off=1480, ID=0654) [Reassembled in #5]
>       3 0.000113000    192.168.1.4           192.168.1.133         IPv4     1514   Fragmented IP protocol (proto=UDP 17, off=2960, ID=0654) [Reassembled in #5]
>       4 0.000147000    192.168.1.4           192.168.1.133         IPv4     1514   Fragmented IP protocol (proto=UDP 17, off=4440, ID=0654) [Reassembled in #5]
>       5 0.000114000    192.168.1.4           192.168.1.133         ECHO     1259   Request
>       6 4.527395000    192.168.1.4           192.168.1.133         IPv4     1514   Fragmented IP protocol (proto=UDP 17, off=0, ID=065d) [Reassembled in #10]
>       7 0.000174000    192.168.1.4           192.168.1.133         IPv4     1514   Fragmented IP protocol (proto=UDP 17, off=1480, ID=065d) [Reassembled in #10]
>       8 0.000026000    192.168.1.4           192.168.1.133         IPv4     1514   Fragmented IP protocol (proto=UDP 17, off=2960, ID=065d) [Reassembled in #10]
>       9 0.000213000    192.168.1.4           192.168.1.133         IPv4     1514   Fragmented IP protocol (proto=UDP 17, off=4440, ID=065d) [Reassembled in #10]
>      10 0.000018000    192.168.1.4           192.168.1.133         ECHO     1260   Request
>      11 0.001115000    192.168.1.133         192.168.1.4           IPv4     1514   Fragmented IP protocol (proto=UDP 17, off=0, ID=c75d) [Reassembled in #15]
>      12 0.000120000    192.168.1.133         192.168.1.4           IPv4     1514   Fragmented IP protocol (proto=UDP 17, off=1480, ID=c75d) [Reassembled in #15]
>      13 0.000205000    192.168.1.133         192.168.1.4           IPv4     1514   Fragmented IP protocol (proto=UDP 17, off=2960, ID=c75d) [Reassembled in #15]
>      14 0.000167000    192.168.1.133         192.168.1.4           IPv4     1514   Fragmented IP protocol (proto=UDP 17, off=4440, ID=c75d) [Reassembled in #15]
>      15 0.000006000    192.168.1.133         192.168.1.4           ECHO     1259   Response
>      16 0.000396000    192.168.1.133         192.168.1.4           IPv4     1514   Fragmented IP protocol (proto=UDP 17, off=0, ID=c75e) [Reassembled in #20]
>      17 0.000224000    192.168.1.133         192.168.1.4           IPv4     1514   Fragmented IP protocol (proto=UDP 17, off=1480, ID=c75e) [Reassembled in #20]
>      18 0.000009000    192.168.1.133         192.168.1.4           IPv4     1514   Fragmented IP protocol (proto=UDP 17, off=2960, ID=c75e) [Reassembled in #20]
>      19 0.000237000    192.168.1.133         192.168.1.4           IPv4     1514   Fragmented IP protocol (proto=UDP 17, off=4440, ID=c75e) [Reassembled in #20]
>      20 0.000009000    192.168.1.133         192.168.1.4           ECHO     1260   Response
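
The frame lengths in the dump can be cross-checked with a little
arithmetic. The following is only a sketch under stated assumptions: a
1500 byte MTU, IPv4 headers without options, and captured frames that
include the 14 byte Ethernet header but no FCS. The name plan_fragments
and the payload sizes 7137 and 7138 are not from the mail; the sizes are
back-calculated from the lengths of frames 5 and 10.

```c
#include <assert.h>

#define ETH_HDR_LEN 14u /* Ethernet header as captured: no VLAN tag, no FCS */
#define IP_HDR_LEN  20u /* IPv4 header without options (assumption) */
#define UDP_HDR_LEN  8u

struct frag_plan {
    unsigned count;          /* number of IP fragments */
    unsigned last_frame_len; /* captured length of the final frame */
};

/* Hypothetical helper: split a UDP datagram the way IPv4 fragmentation
 * does. Every fragment except the last carries a multiple of 8 payload
 * bytes, because fragment offsets are counted in 8-byte units. */
struct frag_plan plan_fragments(unsigned udp_data_len, unsigned mtu)
{
    /* Guarantees at least 8 payload bytes per fragment, so the loop ends. */
    assert(mtu >= IP_HDR_LEN + 8u);

    unsigned left = udp_data_len + UDP_HDR_LEN; /* IP payload still to send */
    unsigned room = mtu - IP_HDR_LEN;           /* payload space per packet */
    struct frag_plan plan = { 0, 0 };
    unsigned len;

    do {
        /* Non-final fragments are rounded down to a multiple of 8. */
        len = left > room ? (room & ~7u) : left;
        left -= len;
        plan.count++;
    } while (left > 0);

    plan.last_frame_len = ETH_HDR_LEN + IP_HDR_LEN + len;
    return plan;
}
```

With these assumptions, plan_fragments(7137, 1500) gives 5 fragments and
a final frame of 1259 bytes, which matches frames 1 to 5, and the
fragment payload of 1480 bytes matches the offsets 0, 1480, 2960 and
4440.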




* Re: at91sam9260 MACB problem with IP fragmentation
  2012-12-06 11:32 at91sam9260 MACB problem with IP fragmentation Erwin Rol
@ 2012-12-06 13:27 ` Nicolas Ferre
  2012-12-06 15:15   ` Erwin Rol
  2012-12-20  9:17   ` Erwin Rol
  0 siblings, 2 replies; 7+ messages in thread
From: Nicolas Ferre @ 2012-12-06 13:27 UTC
  To: Erwin Rol
  Cc: linux-kernel, Havard Skinnemoen, linux-arm-kernel, matteo.fortini,
	netdev

Erwin,

On 12/06/2012 12:32 PM, Erwin Rol wrote:
> Hello Nicolas, Havard, all,
> 
> I have a very obscure problem with an at91sam9260 board (an almost
> 1-to-1 copy of the Atmel EK).
> 
> The MACB seems to stall when I use large (>2 * MTU) UDP datagrams. The
> test case is a UDP echo client (PC) that sends datagrams of increasing
> length to the AT91 until the maximum UDP datagram length is reached.
> When there is no IP fragmentation everything is fine, but once the
> datagrams start to get fragmented the AT91 stops replying. As soon as
> some other network traffic happens it continues again, and none of the
> data is lost.
> 
> With Wireshark the effect is easy to see (192.168.1.4 is the PC echo
> client, and 192.168.1.133 is the AT91 echo server). After the first
> request no reply comes. After a 5 second timeout the second request is
> sent, and then both replies are returned.
> 
> When I enabled debugging output it all started to work. So I tried some
> udelays in the driver instead of printk, and with a 1 ms delay in the IRQ
> handler it started working. Of course that is an unacceptable fix, but
> it looks like there is some weird race condition that causes the sending
> to stall. The only difference from normal MTU-sized datagrams I can
> think of is that the fragmented packets can be passed very quickly to
> the macb tx function, because the kernel has all 5 skbs ready.
> 
> I would be very interested to hear if someone else could reproduce this
> problem. Or even better, has seen this problem and has a fix for it.
> 
> I tried several kernels including the test version from Nicolas that he
> posted on LKML in October. They all show the same effect.

[..]

It seems that Matteo has the same behavior: check here:
http://www.spinics.net/lists/netdev/msg218951.html

I am working on the macb driver right now, so I will try to reproduce
and track this issue on my side.

Best regards,
-- 
Nicolas Ferre


* Re: at91sam9260 MACB problem with IP fragmentation
  2012-12-06 13:27 ` Nicolas Ferre
@ 2012-12-06 15:15   ` Erwin Rol
  2012-12-20  9:17   ` Erwin Rol
  1 sibling, 0 replies; 7+ messages in thread
From: Erwin Rol @ 2012-12-06 15:15 UTC
  To: Nicolas Ferre
  Cc: linux-kernel, Havard Skinnemoen, linux-arm-kernel, matteo.fortini,
	netdev

Hey Nicolas,

On 6-12-2012 14:27, Nicolas Ferre wrote:
> Erwin,
> 
> On 12/06/2012 12:32 PM, Erwin Rol wrote:
>> Hello Nicolas, Havard, all,
>>
>> I have a very obscure problem with an at91sam9260 board (an almost
>> 1-to-1 copy of the Atmel EK).
>>
>>  <snip>
>>
> [..]
> 
> It seems that Matteo has the same behavior: check here:
> http://www.spinics.net/lists/netdev/msg218951.html

The difference seems to be that in Matteo's case the receiving stalls.
In my case it is the sending that stalls. I see the UDP datagram in
userspace and the sendto call also returns without error, but the data
does not end up on the network (until the next packet is sent).

> I am working on the macb driver right now, so I will try to reproduce
> and track this issue on my side.

That would be really great, and thank you for the quick reply. If you
have anything that I should try or test on my hardware just let me know.

BTW: A quick check on an at91sam9263 board did not show the problem. I
will try to verify whether it really does work on a 9263, because maybe
it is just rarer on a 9263.

- Erwin



* Re: at91sam9260 MACB problem with IP fragmentation
  2012-12-06 13:27 ` Nicolas Ferre
  2012-12-06 15:15   ` Erwin Rol
@ 2012-12-20  9:17   ` Erwin Rol
  2012-12-20 17:51     ` Nicolas Ferre
  2013-02-12 10:08     ` [PATCH] net/macb: fix race with RX interrupt while doing NAPI Nicolas Ferre
  1 sibling, 2 replies; 7+ messages in thread
From: Erwin Rol @ 2012-12-20  9:17 UTC
  To: Nicolas Ferre
  Cc: linux-kernel, Havard Skinnemoen, linux-arm-kernel, matteo.fortini,
	netdev

Hallo Nicolas,

On 6-12-2012 14:27, Nicolas Ferre wrote:
> Erwin,
> 
> On 12/06/2012 12:32 PM, Erwin Rol wrote:
>> Hello Nicolas, Havard, all,
>>
>> I have a very obscure problem with an at91sam9260 board (an almost
>> 1-to-1 copy of the Atmel EK).
>>
>> The MACB seems to stall when I use large (>2 * MTU) UDP datagrams. The
>> test case is a UDP echo client (PC) that sends datagrams of increasing
>> length to the AT91 until the maximum UDP datagram length is reached.
>> When there is no IP fragmentation everything is fine, but once the
>> datagrams start to get fragmented the AT91 stops replying. As soon as
>> some other network traffic happens it continues again, and none of the
>> data is lost.

<snip>

>> I tried several kernels including the test version from Nicolas that he
>> posted on LKML in October. They all show the same effect.
> 
> [..]
> 
> It seems that Matteo has the same behavior: check here:
> http://www.spinics.net/lists/netdev/msg218951.html

I tried Matteo's patch and it seems to work. But I don't know if the
patch is really the right solution. I checked again with Wireshark and
it really seems to be the sending that stalls, not the receiving. But as
soon as an Ethernet frame is received the sending "un-stalls". So maybe
the patch just causes a MACB IRQ at certain moments that causes the
sending to continue?

> I am working on the macb driver right now, so I will try to reproduce
> and track this issue on my side.

Any luck reproducing it?


- Erwin



* Re: at91sam9260 MACB problem with IP fragmentation
  2012-12-20  9:17   ` Erwin Rol
@ 2012-12-20 17:51     ` Nicolas Ferre
  2013-02-12 10:08     ` [PATCH] net/macb: fix race with RX interrupt while doing NAPI Nicolas Ferre
  1 sibling, 0 replies; 7+ messages in thread
From: Nicolas Ferre @ 2012-12-20 17:51 UTC
  To: Erwin Rol
  Cc: linux-kernel, Havard Skinnemoen, linux-arm-kernel, matteo.fortini,
	netdev

On 12/20/2012 10:17 AM, Erwin Rol wrote:
> Hallo Nicolas,
> 
> On 6-12-2012 14:27, Nicolas Ferre wrote:
>> Erwin,
>>
>> On 12/06/2012 12:32 PM, Erwin Rol wrote:
>>> Hello Nicolas, Havard, all,
>>>
>>> I have a very obscure problem with an at91sam9260 board (an almost
>>> 1-to-1 copy of the Atmel EK).
>>>
>>> The MACB seems to stall when I use large (>2 * MTU) UDP datagrams. The
>>> test case is a UDP echo client (PC) that sends datagrams of increasing
>>> length to the AT91 until the maximum UDP datagram length is reached.
>>> When there is no IP fragmentation everything is fine, but once the
>>> datagrams start to get fragmented the AT91 stops replying. As soon as
>>> some other network traffic happens it continues again, and none of the
>>> data is lost.
> 
> <snip>
> 
>>> I tried several kernels including the test version from Nicolas that he
>>> posted on LKML in October. They all show the same effect.
>>
>> [..]
>>
>> It seems that Matteo has the same behavior: check here:
>> http://www.spinics.net/lists/netdev/msg218951.html
> 
> I tried Matteo's patch and it seems to work. But I don't know if the
> patch is really the right solution. I checked again with Wireshark and
> it really seems to be the sending that stalls, not the receiving. But as
> soon as an Ethernet frame is received the sending "un-stalls". So maybe
> the patch just causes a MACB IRQ at certain moments that causes the
> sending to continue?

Any findings from your digging are of interest to me.


>> I am working on the macb driver right now, so I will try to reproduce
>> and track this issue on my side.
> 
> Any luck reproducing it?

Yes, I see unexpected things happening, but as I am connected to a whole
company network, maybe some broadcast packets are unlocking the
interface...
Anyway, I am continuing to investigate.

Best regards,
-- 
Nicolas Ferre


* [PATCH] net/macb: fix race with RX interrupt while doing NAPI
  2012-12-20  9:17   ` Erwin Rol
  2012-12-20 17:51     ` Nicolas Ferre
@ 2013-02-12 10:08     ` Nicolas Ferre
  2013-02-13 18:36       ` David Miller
  1 sibling, 1 reply; 7+ messages in thread
From: Nicolas Ferre @ 2013-02-12 10:08 UTC
  To: David S. Miller, netdev
  Cc: linux-arm-kernel, linux-kernel, Jean-Christophe PLAGNIOL-VILLARD,
	mailinglists, Nicolas Ferre

When interrupts are disabled, an RX condition can occur but
it is not reported when enabling interrupts again. We need to check
RSR and use napi_reschedule() if condition is met.

Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
---
 drivers/net/ethernet/cadence/macb.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/net/ethernet/cadence/macb.c b/drivers/net/ethernet/cadence/macb.c
index a9b0830..b9d4bb9 100644
--- a/drivers/net/ethernet/cadence/macb.c
+++ b/drivers/net/ethernet/cadence/macb.c
@@ -693,6 +693,11 @@ static int macb_poll(struct napi_struct *napi, int budget)
 		 * get notified when new packets arrive.
 		 */
 		macb_writel(bp, IER, MACB_RX_INT_FLAGS);
+
+		/* Packets received while interrupts were disabled */
+		status = macb_readl(bp, RSR);
+		if (unlikely(status))
+			napi_reschedule(napi);
 	}
 
 	/* TODO: Handle errors */
-- 
1.8.0
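
The race described in the commit message can be illustrated with a small
user-space model. This is only a sketch and not driver code: struct
fake_nic and all function names below are hypothetical, and it assumes
that the RX interrupt is raised only at the moment a frame arrives while
the interrupt is enabled. A frame that arrives after the poll has drained
the ring but before the interrupt is re-enabled then goes unnoticed
unless the status is checked again.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the NIC state that matters for the race. */
struct fake_nic {
    bool rx_irq_enabled; /* models the RX bits in the interrupt mask */
    bool rsr;            /* models a "frame received" status bit */
    bool poll_scheduled; /* models the NAPI poll being scheduled */
    int  pending;        /* frames waiting in the RX ring */
};

/* A frame arrives; the interrupt only fires if it is enabled right now. */
void frame_arrives(struct fake_nic *nic)
{
    nic->pending++;
    nic->rsr = true;
    if (nic->rx_irq_enabled) {
        nic->rx_irq_enabled = false; /* handler masks RX interrupts */
        nic->poll_scheduled = true;  /* and schedules the poll */
    }
}

/* First half of the poll: clear the status and drain the ring. */
void poll_drain(struct fake_nic *nic)
{
    nic->rsr = false;
    nic->pending = 0;
}

/* Second half: the poll completes and re-enables the interrupt. With
 * recheck_status set, it also re-reads the status afterwards and
 * reschedules itself, which is the idea of the patch. */
void poll_complete(struct fake_nic *nic, bool recheck_status)
{
    nic->poll_scheduled = false;
    nic->rx_irq_enabled = true;
    if (recheck_status && nic->rsr)
        nic->poll_scheduled = true;
}

/* Replays the racy ordering and returns how many frames are left stuck. */
int frames_stuck_after_race(bool recheck_status)
{
    struct fake_nic nic = { .rx_irq_enabled = true };

    frame_arrives(&nic);          /* interrupt fires, poll is scheduled */
    assert(nic.poll_scheduled);
    poll_drain(&nic);
    frame_arrives(&nic);          /* arrives while masked: no interrupt */
    poll_complete(&nic, recheck_status);
    if (nic.poll_scheduled)
        poll_drain(&nic);         /* rescheduled poll picks it up */
    return nic.pending;
}
```

In this model the frame stays in the ring without the recheck and is
picked up with it. In the model, as in the reports earlier in the thread,
a stuck frame is only noticed once a later frame raises a new interrupt.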



* Re: [PATCH] net/macb: fix race with RX interrupt while doing NAPI
  2013-02-12 10:08     ` [PATCH] net/macb: fix race with RX interrupt while doing NAPI Nicolas Ferre
@ 2013-02-13 18:36       ` David Miller
  0 siblings, 0 replies; 7+ messages in thread
From: David Miller @ 2013-02-13 18:36 UTC
  To: nicolas.ferre
  Cc: netdev, linux-arm-kernel, linux-kernel, plagnioj, mailinglists

From: Nicolas Ferre <nicolas.ferre@atmel.com>
Date: Tue, 12 Feb 2013 11:08:48 +0100

> When interrupts are disabled, an RX condition can occur but
> it is not reported when enabling interrupts again. We need to check
> RSR and use napi_reschedule() if condition is met.
> 
> Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>

Applied.
