From mboxrd@z Thu Jan 1 00:00:00 1970
From: Kieran Mansley
Subject: LRO/GSO interaction when packets are forwarded
Date: Fri, 07 Mar 2008 14:09:57 +0000
Message-ID: <1204898997.4220.41.camel@moonstone.uk.level5networks.com>
Mime-Version: 1.0
Content-Type: text/plain
Content-Transfer-Encoding: 7bit
To: netdev@vger.kernel.org
Return-path:
Received: from 216-237-3-220.orange.nextweb.net ([216.237.3.220]:38254 "EHLO exchange.solarflare.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1760806AbYCGOhp (ORCPT ); Fri, 7 Mar 2008 09:37:45 -0500
Sender: netdev-owner@vger.kernel.org
List-ID:

We've seen a couple of problems when using a bridge or IP forwarding
combined with LRO packets generated by a network device driver.

As you know, LRO packets can either be page based (and passed up with
lro_receive_page()) or use the skb frag_list (and passed up with
lro_receive_skb()). In both cases it is likely that the device driver
will have set CHECKSUM_UNNECESSARY to indicate that the packet has been
checksummed by the device, and gso_size to mark it as an LRO packet and
indicate the actual received MSS.

If this skb goes directly to the network stack, everything is fine. The
problem comes when the packet instead goes into a bridge and is then
retransmitted on another device. The skb seems to pass through the
bridge relatively unmodified, and because it has gso_size set the
transmit path will attempt to segment it. If page-based allocation has
been used this is fine, but if the skb frag_list has been used the
transmit path BUGs in skb_gso_segment():

http://lxr.linux.no/linux+v2.6.24.3/net/core/dev.c#L1410

Secondly, the same function expects a GSO packet to have
CHECKSUM_PARTIAL set. If the packet had originated from a stack rather
than from an LRO device this would be the case, but instead it will
most likely have CHECKSUM_UNNECESSARY.
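
To make the two failure modes concrete, here is a minimal userspace
sketch of the transmit-side checks as described above. It is only a
model: the model_* types and function are hypothetical names invented
for illustration, not kernel API, and it assumes the behaviour stated
in this mail (a BUG when the frag_list is in use, a warning when
ip_summed is not CHECKSUM_PARTIAL).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; this is NOT the kernel's struct sk_buff. */
enum model_csum {
    MODEL_CHECKSUM_NONE,
    MODEL_CHECKSUM_UNNECESSARY, /* receive meaning: device already verified it */
    MODEL_CHECKSUM_PARTIAL      /* transmit meaning: checksum still to be completed */
};

struct model_skb {
    unsigned int gso_size;      /* non-zero: transmit path will try to segment */
    bool has_frag_list;         /* true for the lro_receive_skb() style of packet */
    enum model_csum ip_summed;
};

enum model_tx_result {
    MODEL_TX_NOT_GSO,           /* gso_size is zero, no segmentation attempted */
    MODEL_TX_OK,                /* segmentation proceeds without complaint */
    MODEL_TX_WARN,              /* ip_summed is not CHECKSUM_PARTIAL */
    MODEL_TX_BUG                /* frag_list is in use */
};

/* Assumed order of checks: the frag_list test comes before the checksum test. */
static enum model_tx_result model_tx_segment(const struct model_skb *skb)
{
    if (skb->gso_size == 0)
        return MODEL_TX_NOT_GSO;
    if (skb->has_frag_list)
        return MODEL_TX_BUG;
    if (skb->ip_summed != MODEL_CHECKSUM_PARTIAL)
        return MODEL_TX_WARN;
    return MODEL_TX_OK;
}
```

In this model a forwarded LRO packet built on the frag_list hits the
BUG case, a page-based one only hits the warning, and a packet that
came from a stack with CHECKSUM_PARTIAL passes cleanly.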

Both of these problems are essentially caused by gso_size and the
ip_summed field having slightly different meanings on the receive and
transmit paths, and by the bridge and IP forwarding code not translating
from one to the other. To be fair to the bridge, it is not obvious to it
whether it will be passing the packet to a real device (which will
invoke the transmit path) or to a stack.

This leads me to my questions:

- Any idea why other drivers aren't hitting this problem? One
  possibility is that they're using lro_receive_page rather than
  lro_receive_skb, but I'd still expect to see the CHECKSUM_PARTIAL
  warning. I'm wondering if combining LRO with forwarding between
  devices is relatively rare, and so it just hasn't been tested.

- Any suggestion as to the best place to try and fix this up? My
  preference is to make the transmit path cope with a packet that has
  the frag_list in use. Making it cope with CHECKSUM_UNNECESSARY should
  also be possible, but to be honest I'm finding skb_gso_segment's
  handling of CHECKSUM_PARTIAL a bit hard to follow. The alternative
  would be, I suppose, to get the bridge and IP forwarding code to fix
  the socket buffer up before transmitting it, or for the driver to
  somehow know that this packet will be forwarded and so it shouldn't
  use LRO.

Of course, if we're hitting this because we're doing something wrong and
you're confident it's not a problem in Linux, I'd be grateful to know!

Here's a stack trace showing the path a packet that hits this might
take:

 [] die+0x111/0x210
 [] do_trap+0x97/0xf0
 [] do_invalid_op+0x89/0xa0
 [] error_code+0x72/0x78
 [] dev_hard_start_xmit+0x1ae/0x2c0
 [] __qdisc_run+0x4f/0x1d0
 [] dev_queue_xmit+0x2d1/0x350
 [] br_dev_queue_push_xmit+0x64/0xb0 [bridge]
 [] br_nf_dev_queue_xmit+0x13/0x40 [bridge]
 [] br_nf_post_routing+0x1b0/0x1f0 [bridge]
 [] nf_iterate+0x5b/0x90
 [] nf_hook_slow+0x4a/0xc0
 [] br_forward_finish+0x46/0x60 [bridge]
 [] br_nf_forward_finish+0xc7/0x160 [bridge]
 [] br_nf_forward_ip+0x137/0x1b0 [bridge]
 [] nf_iterate+0x5b/0x90
 [] nf_hook_slow+0x4a/0xc0
 [] __br_forward+0x55/0x80 [bridge]
 [] br_forward+0x27/0x30 [bridge]
 [] br_handle_frame_finish+0xed/0x150 [bridge]
 [] br_nf_pre_routing_finish+0x1be/0x360 [bridge]
 [] br_nf_pre_routing+0x425/0x6e0 [bridge]
 [] nf_iterate+0x5b/0x90
 [] nf_hook_slow+0x4a/0xc0
 [] br_handle_frame+0x16b/0x210 [bridge]
 [] netif_receive_skb+0x216/0x310
 [] process_backlog+0x66/0xd0
 [] net_rx_action+0xd2/0x170
 [] __do_softirq+0x82/0x100
 [] do_softirq+0x71/0xc0

skb_gso_segment is called from dev_gso_segment, which is called from
dev_hard_start_xmit, which is shown in the stack trace.

Thanks

Kieran