Re: [RFC PATCH] net: ip_finish_output_gso: Attempt gso_size clamping if segments exceed mtu

From: Florian Westphal <fw@strlen.de>
To: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>,
	netdev@vger.kernel.org,
	Hannes Frederic Sowa <hannes@stressinduktion.org>,
	Eric Dumazet <edumazet@google.com>,
	Florian Westphal <fw@strlen.de>
Subject: Re: [RFC PATCH] net: ip_finish_output_gso: Attempt gso_size clamping if segments exceed mtu
Date: Mon, 22 Aug 2016 14:58:42 +0200
Message-ID: <20160822125842.GF6199@breakpoint.cc>
In-Reply-To: <1471867570-1406-1-git-send-email-shmulik.ladkani@gmail.com>

Shmulik Ladkani <shmulik.ladkani@gmail.com> wrote:
> There are cases where gso skbs (which originate from an ingress
> interface) have a gso_size value that exceeds the output dst mtu:
> 
>  - ipv4 forwarding middlebox having in/out interfaces with different mtus
>    addressed by fe6cc55f3a 'net: ip, ipv6: handle gso skbs in forwarding path'
>  - bridge having a tunnel member interface stacked over a device with small mtu
>    addressed by b8247f095e 'net: ip_finish_output_gso: If skb_gso_network_seglen exceeds MTU, allow segmentation for local udp tunneled skbs'
> 
> In both cases, such skbs are identified, then go through early software
> segmentation+fragmentation as part of ip_finish_output_gso.
> 
> Another approach is to shrink the gso_size to a suitable value so that
> resulting segments are smaller than dst mtu, as suggested by Eric
> Dumazet (as part of [1]) and Florian Westphal (as part of [2]).
> 
> This will avoid the need for software segmentation/fragmentation at
> ip_finish_output_gso, thus significantly improving throughput and
> lowering cpu load.
> 
> This RFC patch attempts to implement this gso_size clamping.
> 
> [1] https://patchwork.ozlabs.org/patch/314327/
> [2] https://patchwork.ozlabs.org/patch/644724/
> 
> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Florian Westphal <fw@strlen.de>
> 
> Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
> ---
> 
>  Comments welcome.
> 
>  Few questions embedded in the patch.
> 
>  Florian, in fe6cc55f you described a BUG due to gso_size decrease.
>  I've tested both bridged and routed cases, but in my setups failed to
>  hit the issue; I'd appreciate it if you could provide some hints.

I still get the BUG with this patch applied on top of net-next.

On hypervisor:
10.0.0.2 via 192.168.7.10 dev tap0 mtu lock 1500
ssh root@10.0.0.2 'cat > /dev/null' < /dev/zero

On vm1 (which dies instantly, see below):
eth0 mtu 1500 (192.168.7.10)
eth1 mtu 1280 (10.0.0.1)

On vm2
eth0 mtu 1280 (10.0.0.2)

Normal ipv4 routing via vm1, no iptables etc. present, so we have:

  hypervisor 1500 -> 1500 VM1 1280 -> 1280 VM2

Turning off gro avoids this problem.
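
To make the sizes in this setup concrete, here is a minimal userspace
sketch of the arithmetic involved. It is not the kernel code and not the
RFC patch: the helper names are hypothetical, and it assumes plain IPv4
and TCP headers of 20 bytes each with no options. Under those
assumptions, a GRO'd skb built from 1500-byte frames on vm1's eth0
carries gso_size 1460, so each resulting segment is 1500 bytes at the
network layer and does not fit the 1280 mtu on eth1.

```c
#include <stdbool.h>

/* Assumption: IPv4 and TCP headers without options, 20 bytes each. */
enum { IPV4_HDR_LEN = 20, TCP_HDR_LEN = 20 };

/* Network-layer length of one segment: headers plus gso_size payload bytes. */
static unsigned int seg_network_len(unsigned int gso_size)
{
	return IPV4_HDR_LEN + TCP_HDR_LEN + gso_size;
}

/* True if a segment built with this gso_size would not fit the egress mtu. */
static bool seg_exceeds_mtu(unsigned int gso_size, unsigned int mtu)
{
	return seg_network_len(gso_size) > mtu;
}

/*
 * Hypothetical clamp: the largest gso_size whose segments still fit mtu.
 * Returns 0 when the mtu cannot even hold the headers.
 */
static unsigned int clamp_gso_size(unsigned int gso_size, unsigned int mtu)
{
	unsigned int hdrs = IPV4_HDR_LEN + TCP_HDR_LEN;
	unsigned int max_payload;

	if (mtu <= hdrs)
		return 0;
	max_payload = mtu - hdrs;
	return gso_size < max_payload ? gso_size : max_payload;
}
```

With these assumptions, clamp_gso_size(1460, 1280) gives 1240. Computing
the smaller value is the easy part; as the trace below shows, lowering
gso_size on an skb that GRO already aggregated can still trip the BUG in
skb_segment.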

------------[ cut here ]------------
kernel BUG at net-next/net/core/skbuff.c:3210!
invalid opcode: 0000 [#1] SMP
CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.8.0-rc2+ #1842
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014
task: ffff88013b100000 task.stack: ffff88013b0fc000
RIP: 0010:[<ffffffff8135ab44>]  [<ffffffff8135ab44>] skb_segment+0x964/0xb20
RSP: 0018:ffff88013fd838d0  EFLAGS: 00010212
RAX: 00000000000005a8 RBX: ffff88013a9f9900 RCX: ffff88013b1cf500
RDX: 0000000000006612 RSI: 0000000000000494 RDI: 0000000000000114
RBP: ffff88013fd839a8 R08: 00000000000069ca R09: ffff88013b1cf400
R10: 0000000000000011 R11: 0000000000006612 R12: 00000000000064fe
R13: ffff8801394c7300 R14: ffff88013937ad80 R15: 0000000000000011
FS:  0000000000000000(0000) GS:ffff88013fd80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f059fc3b2b0 CR3: 0000000001806000 CR4: 00000000000006a0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Stack:
 000000000000003b ffffffffffffffbe fffffff400000000 ffff88013b1cf400
 0000000000000000 0000000000000042 0000000000000040 0000000000000001
 0000000000000042 ffff88013b1cf600 0000000000000000 ffff8801000004cc
Call Trace:
 <IRQ> 
 [<ffffffff8123bacf>] ? swiotlb_map_page+0x5f/0x120
 [<ffffffff813eda00>] tcp_gso_segment+0x100/0x480
 [<ffffffff813eddb3>] tcp4_gso_segment+0x33/0x90
 [<ffffffff813fda7a>] inet_gso_segment+0x12a/0x3b0
 [<ffffffff81368c00>] ? dev_hard_start_xmit+0x20/0x110
 [<ffffffff813684f0>] skb_mac_gso_segment+0x90/0xf0
 [<ffffffff81368601>] __skb_gso_segment+0xb1/0x140
 [<ffffffff81368a7f>] validate_xmit_skb+0x14f/0x2b0
 [<ffffffff81368d2e>] validate_xmit_skb_list+0x3e/0x60
 [<ffffffff8138cb6a>] sch_direct_xmit+0x10a/0x1a0
 [<ffffffff81369199>] __dev_queue_xmit+0x369/0x5d0
 [<ffffffff8136940b>] dev_queue_xmit+0xb/0x10
 [<ffffffff813c8f47>] ip_finish_output2+0x247/0x310
 [<ffffffff813cac10>] ip_finish_output+0x1c0/0x250
 [<ffffffff813cadea>] ip_output+0x3a/0x40
 [<ffffffff813c751c>] ip_forward+0x36c/0x410
 [<ffffffff813c5b06>] ip_rcv+0x2e6/0x630
 [<ffffffff81364d5f>] __netif_receive_skb_core+0x2cf/0x940
 [<ffffffff813189bd>] ? e1000_alloc_rx_buffers+0x1bd/0x490
 [<ffffffff813653e8>] __netif_receive_skb+0x18/0x60
 [<ffffffff81365728>] netif_receive_skb_internal+0x28/0x90
 [<ffffffff813ee3b0>] ? tcp4_gro_complete+0x80/0x90
 [<ffffffff8136580a>] napi_gro_complete+0x7a/0xa0
 [<ffffffff813697e5>] napi_gro_flush+0x55/0x70
 [<ffffffff81369d06>] napi_complete_done+0x66/0xb0
 [<ffffffff81319810>] e1000_clean+0x380/0x900
 [<ffffffff81368c65>] ? dev_hard_start_xmit+0x85/0x110
 [<ffffffff81369ef3>] net_rx_action+0x1a3/0x2b0
 [<ffffffff81049c22>] __do_softirq+0xe2/0x1d0
 [<ffffffff81049f09>] irq_exit+0x89/0x90
 [<ffffffff810199bf>] do_IRQ+0x4f/0xd0
 [<ffffffff81498882>] common_interrupt+0x82/0x82
 <EOI> 
 [<ffffffff81035bd6>] ? native_safe_halt+0x6/0x10
 [<ffffffff8101ff49>] default_idle+0x9/0x10
 [<ffffffff8102052a>] arch_cpu_idle+0xa/0x10
 [<ffffffff810791ce>] default_idle_call+0x2e/0x30
 [<ffffffff8107933f>] cpu_startup_entry+0x16f/0x220
 [<ffffffff8102d6f5>] start_secondary+0x105/0x130
Code: 00 08 02 48 89 df 44 89 44 24 18 83 e6 c0 e8 04 c7 ff ff 85 c0 0f 85 02 01 00 00 8b 83 b8 00 00 00 44 8b 44 24 18 e9 cc fe ff ff <0f> 0b 0f 0b 0f 0b 8b 4b 74 85 c9 0f 85 ce 00 00 00 48 8b 83 c0 
RIP  [<ffffffff8135ab44>] skb_segment+0x964/0xb20
 RSP <ffff88013fd838d0>
---[ end trace 924612451efe8dce ]---
Kernel panic - not syncing: Fatal exception in interrupt
Kernel Offset: disabled
---[ end Kernel panic - not syncing: Fatal exception in interrupt

Thread overview: 8+ messages
2016-08-22 12:06 [RFC PATCH] net: ip_finish_output_gso: Attempt gso_size clamping if segments exceed mtu Shmulik Ladkani
2016-08-22 12:58 ` Florian Westphal [this message]
2016-08-22 13:05   ` Shmulik Ladkani
2016-08-24 14:53   ` Shmulik Ladkani
2016-08-24 14:59     ` Florian Westphal
2016-08-25  9:05   ` Shmulik Ladkani
2016-08-26 11:19     ` Herbert Xu
2016-09-09  5:48     ` Shmulik Ladkani
