netdev.vger.kernel.org archive mirror
* LRO/GSO interaction when packets are forwarded
@ 2008-03-07 14:09 Kieran Mansley
  2008-03-07 16:25 ` Stephen Hemminger
  0 siblings, 1 reply; 21+ messages in thread
From: Kieran Mansley @ 2008-03-07 14:09 UTC
  To: netdev

We've seen a couple of problems when using a bridge or IP forwarding
combined with LRO packets generated by a network device driver.  As you
know, LRO packets can either be page based (and passed up with
lro_receive_page()) or use the skb frag_list (and passed up with
lro_receive_skb()).  In both cases it is likely that the device driver
will have set CHECKSUM_UNNECESSARY to indicate that the packet has been
checksummed by the device, and gso_size to mark it as an LRO packet and
indicate the actual received MSS.

If this skb goes directly to the network stack everything is fine.  The
problem comes when this packet instead goes into a bridge and is then
retransmitted on another device.  The skb seems to pass through the
bridge relatively unmodified and because it has gso_size set the
transmit path will attempt to segment it.  If page-based allocation has
been used, this is fine, but if the skb frag_list has been used the
transmit path BUGs in skb_gso_segment():

http://lxr.linux.no/linux+v2.6.24.3/net/core/dev.c#L1410

Secondly, the same function hopes that a GSO packet will have
CHECKSUM_PARTIAL set - if this packet had originated from a stack rather
than from an LRO device this would be the case - but instead it will
most likely have CHECKSUM_UNNECESSARY.

Both of these problems are essentially caused by gso_size and the
ip_summed field having slightly different meanings on the receive and
transmit paths, and the bridge/IP forwarding code not translating from
one to the other.  To be fair to the bridge, it is not obvious to it
whether it will be passing the packet to a real device (which will invoke
the transmit path) or to a stack.
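
To make the mismatch concrete, here is a minimal user-space sketch of
the two failure modes described above.  It is a model only, not kernel
code: struct model_skb, the MODEL_* names, model_tx_check() and the
order of the checks are all assumptions made for illustration, and the
enum values are not the kernel's real ip_summed values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for ip_summed states; the numeric values are
 * arbitrary and are NOT taken from the kernel headers. */
enum model_csum {
    MODEL_CSUM_NONE,
    MODEL_CSUM_UNNECESSARY, /* receive side: device already verified it */
    MODEL_CSUM_PARTIAL      /* transmit side: checksum still to be filled in */
};

/* Hypothetical, heavily reduced view of the skb fields under discussion. */
struct model_skb {
    enum model_csum ip_summed;
    unsigned int gso_size;  /* rx: MSS seen by LRO; tx: size to segment to */
    bool uses_frag_list;    /* true if built the lro_receive_skb() way */
};

enum model_tx_result {
    MODEL_TX_OK,
    MODEL_TX_WARN_CSUM,     /* segmentation expected CHECKSUM_PARTIAL */
    MODEL_TX_BUG_FRAG_LIST  /* segmentation cannot handle frag_list */
};

/* Model of what the transmit path concludes about a forwarded skb. */
enum model_tx_result model_tx_check(const struct model_skb *skb)
{
    assert(skb != NULL);

    if (skb->gso_size == 0)
        return MODEL_TX_OK;             /* not treated as GSO at all */
    if (skb->uses_frag_list)
        return MODEL_TX_BUG_FRAG_LIST;  /* first problem */
    if (skb->ip_summed != MODEL_CSUM_PARTIAL)
        return MODEL_TX_WARN_CSUM;      /* second problem */
    return MODEL_TX_OK;
}
```

In this model an LRO skb built with the frag_list and marked
CHECKSUM_UNNECESSARY hits the first problem, a page-based one hits only
the second, and a GSO skb that originated from a stack passes.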

This leads me to my questions:

 - any idea why other drivers aren't hitting this problem?  One
possibility is that they're using lro_receive_page rather than
lro_receive_skb, but I'd still expect to see the CHECKSUM_PARTIAL
warning.  I'm wondering if having LRO and forwarding between devices is
a relatively rare thing, and so it just hasn't been tested.

 - any suggestion as to the best place to try and fix this up?  My
preference is making the transmit path cope with a packet that has the
frag_list in use.  Making it cope with CHECKSUM_UNNECESSARY should also
be possible but to be honest I'm finding skb_gso_segment's handling of
CHECKSUM_PARTIAL a bit hard to follow.  The alternative, I suppose,
would be to get the bridge and IP forwarding code to fix the socket
buffer up before transmitting it, or for the driver to somehow know that
this packet will be forwarded and so it shouldn't use LRO.
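
As a rough illustration of the last option (the driver knowing that
packets may be forwarded), the decision could be modelled like this.
This is a user-space sketch under assumed names: struct model_rx_config
and model_lro_allowed() are hypothetical, and how a real driver would
learn that forwarding is possible is exactly the open question.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-device receive configuration. */
struct model_rx_config {
    bool lro_requested;       /* LRO has been enabled for the device */
    bool forwarding_possible; /* device is a bridge port, or IP
                                 forwarding is enabled (assumed input) */
};

/* Use LRO only when received packets cannot end up on a transmit path. */
bool model_lro_allowed(const struct model_rx_config *cfg)
{
    assert(cfg != NULL);
    return cfg->lro_requested && !cfg->forwarding_possible;
}
```

The cost of this approach is that LRO is lost on any device that might
forward, even for traffic that terminates locally.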

Of course, if we're hitting this because we're doing something wrong and
you're confident it's not a problem in Linux, I'd be grateful to know!

Here's a stack trace showing the path a packet that hits this might
take:

 [<c0106831>] die+0x111/0x210
 [<c0106d67>] do_trap+0x97/0xf0
 [<c0107149>] do_invalid_op+0x89/0xa0
 [<c033c2fa>] error_code+0x72/0x78
 [<c02d41de>] dev_hard_start_xmit+0x1ae/0x2c0
 [<c02e276f>] __qdisc_run+0x4f/0x1d0
 [<c02d45c1>] dev_queue_xmit+0x2d1/0x350
 [<f8ae4054>] br_dev_queue_push_xmit+0x64/0xb0 [bridge]
 [<f8ae8bd3>] br_nf_dev_queue_xmit+0x13/0x40 [bridge]
 [<f8ae90b0>] br_nf_post_routing+0x1b0/0x1f0 [bridge]
 [<c02e724b>] nf_iterate+0x5b/0x90
 [<c02e72ca>] nf_hook_slow+0x4a/0xc0
 [<f8ae41b6>] br_forward_finish+0x46/0x60 [bridge]
 [<f8ae9317>] br_nf_forward_finish+0xc7/0x160 [bridge]
 [<f8ae98e7>] br_nf_forward_ip+0x137/0x1b0 [bridge]
 [<c02e724b>] nf_iterate+0x5b/0x90
 [<c02e72ca>] nf_hook_slow+0x4a/0xc0
 [<f8ae4225>] __br_forward+0x55/0x80 [bridge]
 [<f8ae4307>] br_forward+0x27/0x30 [bridge]
 [<f8ae4cfd>] br_handle_frame_finish+0xed/0x150 [bridge]
 [<f8ae960e>] br_nf_pre_routing_finish+0x1be/0x360 [bridge]
 [<f8ae9f15>] br_nf_pre_routing+0x425/0x6e0 [bridge]
 [<c02e724b>] nf_iterate+0x5b/0x90
 [<c02e72ca>] nf_hook_slow+0x4a/0xc0
 [<f8ae4ecb>] br_handle_frame+0x16b/0x210 [bridge]
 [<c02d4856>] netif_receive_skb+0x216/0x310
 [<c02d49b6>] process_backlog+0x66/0xd0
 [<c02d0c72>] net_rx_action+0xd2/0x170
 [<c0131f72>] __do_softirq+0x82/0x100
 [<c0107f11>] do_softirq+0x71/0xc0

skb_gso_segment is called from dev_gso_segment, which is called from
dev_hard_start_xmit, which is shown in the stack trace.

Thanks

Kieran




end of thread, other threads:[~2008-09-14  2:09 UTC | newest]

Thread overview: 21+ messages
2008-03-07 14:09 LRO/GSO interaction when packets are forwarded Kieran Mansley
2008-03-07 16:25 ` Stephen Hemminger
2008-03-07 17:06   ` Kieran Mansley
2008-03-07 21:43     ` [PATCH] ethtool: command line support for lro Stephen Hemminger
2008-03-10 18:07       ` Ben Hutchings
2008-03-10 18:29         ` Stephen Hemminger
2008-03-10 18:50           ` Ben Hutchings
2008-04-17 12:11         ` Ben Hutchings
2008-04-30 18:36           ` Kok, Auke
2008-05-02 14:34             ` Ben Hutchings
2008-09-14  2:09           ` Jeff Garzik
2008-03-11 16:50     ` LRO/GSO interaction when packets are forwarded Kieran Mansley
2008-04-22 21:15     ` Ben Hutchings
2008-04-22 23:01       ` Stephen Hemminger
2008-04-23  6:00         ` Jarek Poplawski
2008-04-23  6:15           ` Jarek Poplawski
2008-04-23 10:07             ` Ben Hutchings
2008-04-23 10:38               ` Jarek Poplawski
2008-04-23 10:42                 ` David Miller
2008-04-23 11:09                   ` Jarek Poplawski
2008-04-23 10:04         ` Ben Hutchings
