From: Shmulik Ladkani <shmulik.ladkani@gmail.com>
To: Daniel Borkmann <daniel@iogearbox.net>,
	Eric Dumazet <eric.dumazet@gmail.com>,
	netdev <netdev@vger.kernel.org>,
	Alexander Duyck <alexander.duyck@gmail.com>
Cc: Alexei Starovoitov <ast@kernel.org>, Yonghong Song <yhs@fb.com>,
	Steffen Klassert <steffen.klassert@secunet.com>,
	shmulik@metanetworks.com, eyal@metanetworks.com
Subject: BUG_ON in skb_segment, after bpf_skb_change_proto was applied
Date: Mon, 26 Aug 2019 17:07:24 +0300
Message-ID: <20190826170724.25ff616f@pixies>

Hi,

In our production systems, running v4.19.y longterm kernels, we hit a
BUG_ON in 'skb_segment()'. It occurs rarely and, although we tried, we
could not reproduce it synthetically.

In v4.19.41 it crashes at net/core/skbuff.c:3711

		while (pos < offset + len) {
			if (i >= nfrags) {
				i = 0;
				nfrags = skb_shinfo(list_skb)->nr_frags;
				frag = skb_shinfo(list_skb)->frags;
				frag_skb = list_skb;
				if (!skb_headlen(list_skb)) {
					BUG_ON(!nfrags);
				} else {
3711:					BUG_ON(!list_skb->head_frag);

With the accompanying dump:

 kernel BUG at net/core/skbuff.c:3711!
 invalid opcode: 0000 [#1] SMP PTI
 CPU: 2 PID: 0 Comm: swapper/2 Kdump: loaded Not tainted 4.19.41-041941-generic #201905080231
 Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/05/2016
 RIP: 0010:skb_segment+0xb65/0xda9
 Code: 89 44 24 60 48 89 4c 24 70 e8 87 b3 ff ff 48 8b 4c 24 70 44 8b 44 24 60 85 c0 44 8b 54 24 4c 0f 84 fc fb ff ff e9 16 fd ff ff <0f> 0b 29 c1 89 ce 09 ca e9 61 ff ff ff 0f 0b 41 8b bf 84 00 00 00
 RSP: 0018:ffff9e4d79b037c0 EFLAGS: 00010246
 RAX: ffff9e4d75012ec0 RBX: ffff9e4d74067500 RCX: 0000000000000000
 RDX: 0000000000480020 RSI: 0000000000000000 RDI: ffff9e4d74e3a200
 RBP: ffff9e4d79b03898 R08: 0000000000000564 R09: f69d84ecbfe8b972
 R10: 0000000000000571 R11: a6b66a32f69d84ec R12: 0000000000000564
 R13: ffff9e4c18d03ef0 R14: 0000000000000000 R15: ffff9e4d74e3a200
 FS:  0000000000000000(0000) GS:ffff9e4d79b00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00000000007f50d8 CR3: 000000009420a003 CR4: 00000000001606e0
 Call Trace:
  <IRQ>
  tcp_gso_segment+0xf9/0x4e0
  tcp6_gso_segment+0x5e/0x100
  ipv6_gso_segment+0x112/0x340
  skb_mac_gso_segment+0xb9/0x130
  __skb_gso_segment+0x84/0x190
  validate_xmit_skb+0x14a/0x2f0
  validate_xmit_skb_list+0x4b/0x70
  sch_direct_xmit+0x154/0x390
  __dev_queue_xmit+0x808/0x920
  dev_queue_xmit+0x10/0x20
  neigh_direct_output+0x11/0x20
  ip6_finish_output2+0x1b9/0x5b0
  ip6_finish_output+0x13a/0x1b0
  ip6_output+0x6c/0x110
  ? ip6_fragment+0xa40/0xa40
  ip6_forward+0x501/0x810
  ip6_rcv_finish+0x7a/0x90
  ipv6_rcv+0x69/0xe0
  ? nf_hook.part.24+0x10/0x10
  __netif_receive_skb_core+0x4fa/0xc80
  ? netif_receive_skb_core+0x20/0x20
  ? netif_receive_skb_internal+0x45/0xf0
  ? tcp4_gro_complete+0x86/0x90
  ? napi_gro_complete+0x53/0x90
  __netif_receive_skb_one_core+0x3b/0x80
  __netif_receive_skb+0x18/0x60
  process_backlog+0xb3/0x170
  net_rx_action+0x130/0x350
  __do_softirq+0xdc/0x2d4

To the best of our knowledge, the packet flow leading to this BUG_ON is:

  - ingress on eth0 (veth, gro:on), ipv4 udp encapsulated esp
  - re-ingress on eth0, after xfrm, decapsulated ipv4 tcp
  - the skb was GROed (skb_is_gso:true)
  - ipv4 forwarding to dummy1, where eBPF nat4-to-6 program is attached
    at TC Egress (calls 'bpf_skb_change_proto()'), then redirect to ingress
    on same device.
    NOTE: 'bpf_skb_proto_4_to_6()' mangles 'shinfo->gso_size'
  - ingress on dummy1, ipv6 tcp
  - ipv6 forwarding
  - egress on tun2 (tun device) that calls:
    validate_xmit_skb -> ... -> skb_segment BUG_ON
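
For illustration, the gso_size adjustment mentioned in the NOTE above can
be modeled in userspace roughly as follows. This is only a sketch, not
kernel code: 'struct gso_model', 'model_proto_4_to_6()' and 'model_segs()'
are hypothetical names, the 20-byte delta is the difference between the
fixed IPv6 header (40 bytes) and an IPv4 header without options (20 bytes),
and the 1448-byte MSS used in the usage note is an assumed example value,
not something taken from the crash dump.

```c
#include <assert.h>

/* Hypothetical userspace stand-in for the gso_size field of skb_shared_info. */
struct gso_model {
	unsigned int gso_size;	/* payload bytes per segment (MSS) */
};

enum { IPV4_HDR_LEN = 20, IPV6_HDR_LEN = 40 };

/* Model of the 4-to-6 adjustment: shrink gso_size by the header growth. */
static void model_proto_4_to_6(struct gso_model *g)
{
	const unsigned int len_diff = IPV6_HDR_LEN - IPV4_HDR_LEN;

	assert(g->gso_size > len_diff);	/* sketch-only precondition */
	g->gso_size -= len_diff;
}

/* Segments needed to carry payload_len bytes at a given gso_size. */
static unsigned int model_segs(unsigned int payload_len, unsigned int gso_size)
{
	return (payload_len + gso_size - 1) / gso_size;
}
```

Under these assumptions, a GROed skb built from three 1448-byte buffers
(4344 bytes of payload) needs three segments before translation, but four
segments at the reduced gso_size of 1428, so segment boundaries no longer
line up with the buffers that GRO chained together.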

A similar issue was reported and fixed by Yonghong Song in commit
13acc94eff12 ("net: permit skb_segment on head_frag frag_list skb").

However, 13acc94eff12 added "BUG_ON(!list_skb->head_frag)" to line 3711,
and its patchwork description states:

    This patch addressed the issue by handling skb_headlen(list_skb) != 0
    case properly if list_skb->head_frag is true, which is expected in
    most cases. [1]

meaning that 13acc94eff12 does not support the list_skb->head_frag=0 case.
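
To make the unsupported case explicit, the two BUG_ON conditions in the
snippet quoted earlier can be restated as a small userspace model. This is
a sketch under stated assumptions: 'struct list_skb_model', 'enum
seg_verdict' and 'model_check()' are hypothetical names, and the model only
covers the moment skb_segment advances to the next frag_list member, not
the rest of the function.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the fields of list_skb that the checks read. */
struct list_skb_model {
	unsigned int headlen;	/* bytes in the linear area */
	unsigned int nr_frags;	/* number of page fragments */
	bool head_frag;		/* linear area is page-backed */
};

enum seg_verdict {
	SEG_OK,
	SEG_BUG_NO_FRAGS,	/* models BUG_ON(!nfrags) */
	SEG_BUG_NO_HEAD_FRAG	/* models BUG_ON(!list_skb->head_frag), line 3711 */
};

/* Mirrors the if/else around the two BUG_ON calls in the quoted snippet. */
static enum seg_verdict model_check(const struct list_skb_model *s)
{
	if (!s->headlen)
		return s->nr_frags ? SEG_OK : SEG_BUG_NO_FRAGS;
	return s->head_frag ? SEG_OK : SEG_BUG_NO_HEAD_FRAG;
}
```

In this model, a frag_list member with a non-empty linear area and
head_frag=0 yields SEG_BUG_NO_HEAD_FRAG, which corresponds to the crash at
line 3711.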

Historically, it is claimed that skb_segment is rather intolerant to
gso_size changes, quote:

    Eric suggested to shrink gso_size instead to avoid segmentation+fragments.
    I think its nice idea, but skb_gso_segment makes certain assumptions about
    nr_frags and gso_size (it can't handle frag size > desired mss). [2]

Any suggestions on how to debug and fix this?

Could it be that 'bpf_skb_change_proto()' isn't really allowed to
mangle 'gso_size', and that we should somehow enforce a 'skb_segment()'
call PRIOR to translation?

Appreciate any input and assistance,
Shmulik

[1] https://patchwork.ozlabs.org/patch/889166/
[2] https://patchwork.ozlabs.org/patch/314327/
