All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jakub Sitnicki <jakub@cloudflare.com>
To: John Fastabend <john.fastabend@gmail.com>
Cc: cong.wang@bytedance.com, daniel@iogearbox.net, lmb@isovalent.com,
	edumazet@google.com, bpf@vger.kernel.org, netdev@vger.kernel.org,
	ast@kernel.org, andrii@kernel.org, will@isovalent.com
Subject: Re: [PATCH bpf v2 01/12] bpf: sockmap, pass skb ownership through read_skb
Date: Tue, 28 Mar 2023 12:42:14 +0200	[thread overview]
Message-ID: <87y1nh5f8k.fsf@cloudflare.com> (raw)
In-Reply-To: <20230327175446.98151-2-john.fastabend@gmail.com>

On Mon, Mar 27, 2023 at 10:54 AM -07, John Fastabend wrote:
> The read_skb hook calls consume_skb() now, but this means that if the
> recv_actor program wants to use the skb it needs to inc the ref cnt
> so that the consume_skb() doesn't kfree the sk_buff.
>
> This is problematic because in some error cases under memory pressure
> we may need to linearize the sk_buff from sk_psock_skb_ingress_enqueue().
> Then we get this,
>
>  skb_linearize()
>    __pskb_pull_tail()
>      pskb_expand_head()
>        BUG_ON(skb_shared(skb))
>
> Because we incremented users refcnt from sk_psock_verdict_recv() we
> hit the bug on with refcnt > 1 and trip it.
>
> To fix lets simply pass ownership of the sk_buff through the skb_read
> call. Then we can drop the consume from read_skb handlers and assume
> the verdict recv does any required kfree.
>
> Bug found while testing in our CI which runs in VMs that hit memory
> constraints rather regularly. William tested TCP read_skb handlers.
>
> [  106.536188] ------------[ cut here ]------------
> [  106.536197] kernel BUG at net/core/skbuff.c:1693!
> [  106.536479] invalid opcode: 0000 [#1] PREEMPT SMP PTI
> [  106.536726] CPU: 3 PID: 1495 Comm: curl Not tainted 5.19.0-rc5 #1
> [  106.537023] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.16.0-1 04/01/2014
> [  106.537467] RIP: 0010:pskb_expand_head+0x269/0x330
> [  106.538585] RSP: 0018:ffffc90000138b68 EFLAGS: 00010202
> [  106.538839] RAX: 000000000000003f RBX: ffff8881048940e8 RCX: 0000000000000a20
> [  106.539186] RDX: 0000000000000002 RSI: 0000000000000000 RDI: ffff8881048940e8
> [  106.539529] RBP: ffffc90000138be8 R08: 00000000e161fd1a R09: 0000000000000000
> [  106.539877] R10: 0000000000000018 R11: 0000000000000000 R12: ffff8881048940e8
> [  106.540222] R13: 0000000000000003 R14: 0000000000000000 R15: ffff8881048940e8
> [  106.540568] FS:  00007f277dde9f00(0000) GS:ffff88813bd80000(0000) knlGS:0000000000000000
> [  106.540954] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  106.541227] CR2: 00007f277eeede64 CR3: 000000000ad3e000 CR4: 00000000000006e0
> [  106.541569] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  106.541915] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [  106.542255] Call Trace:
> [  106.542383]  <IRQ>
> [  106.542487]  __pskb_pull_tail+0x4b/0x3e0
> [  106.542681]  skb_ensure_writable+0x85/0xa0
> [  106.542882]  sk_skb_pull_data+0x18/0x20
> [  106.543084]  bpf_prog_b517a65a242018b0_bpf_skskb_http_verdict+0x3a9/0x4aa9
> [  106.543536]  ? migrate_disable+0x66/0x80
> [  106.543871]  sk_psock_verdict_recv+0xe2/0x310
> [  106.544258]  ? sk_psock_write_space+0x1f0/0x1f0
> [  106.544561]  tcp_read_skb+0x7b/0x120
> [  106.544740]  tcp_data_queue+0x904/0xee0
> [  106.544931]  tcp_rcv_established+0x212/0x7c0
> [  106.545142]  tcp_v4_do_rcv+0x174/0x2a0
> [  106.545326]  tcp_v4_rcv+0xe70/0xf60
> [  106.545500]  ip_protocol_deliver_rcu+0x48/0x290
> [  106.545744]  ip_local_deliver_finish+0xa7/0x150
>
> Fixes: 04919bed948dc ("tcp: Introduce tcp_read_skb()")
> Reported-by: William Findlay <will@isovalent.com>
> Tested-by: William Findlay <will@isovalent.com>
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>
> ---

Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>

  reply	other threads:[~2023-03-28 10:42 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-27 17:54 [PATCH bpf v2 00/11] bpf sockmap fixes John Fastabend
2023-03-27 17:54 ` [PATCH bpf v2 01/12] bpf: sockmap, pass skb ownership through read_skb John Fastabend
2023-03-28 10:42   ` Jakub Sitnicki [this message]
2023-03-27 17:54 ` [PATCH bpf v2 02/12] bpf: sockmap, convert schedule_work into delayed_work John Fastabend
2023-03-28 12:09   ` Jakub Sitnicki
2023-03-28 21:56     ` John Fastabend
2023-03-29 11:09       ` Jakub Sitnicki
2023-03-27 17:54 ` [PATCH bpf v2 03/12] bpf: sockmap, improved check for empty queue John Fastabend
2023-03-29 12:24   ` Jakub Sitnicki
2023-04-01  0:59     ` John Fastabend
2023-04-03  8:42       ` Jakub Sitnicki
2023-03-27 17:54 ` [PATCH bpf v2 04/12] bpf: sockmap, handle fin correctly John Fastabend
2023-04-03 11:11   ` Jakub Sitnicki
2023-04-03 21:05     ` John Fastabend
2023-04-04 10:11       ` Jakub Sitnicki
2023-03-27 17:54 ` [PATCH bpf v2 05/12] bpf: sockmap, TCP data stall on recv before accept John Fastabend
2023-03-27 17:54 ` [PATCH bpf v2 06/12] bpf: sockmap, wake up polling after data copy John Fastabend
2023-03-27 17:54 ` [PATCH bpf v2 07/12] bpf: sockmap incorrectly handling copied_seq John Fastabend
2023-03-27 17:54 ` [PATCH bpf v2 08/12] bpf: sockmap, pull socket helpers out of listen test for general use John Fastabend
2023-03-27 17:54 ` [PATCH bpf v2 09/12] bpf: sockmap, build helper to create connected socket pair John Fastabend
2023-03-27 17:54 ` [PATCH bpf v2 10/12] bpf: sockmap, test shutdown() correctly exits epoll and recv()=0 John Fastabend
2023-03-27 17:54 ` [PATCH bpf v2 11/12] bpf: sockmap, test FIONREAD returns correct bytes in rx buffer John Fastabend
2023-03-27 17:54 ` [PATCH bpf v2 12/12] bpf: sockmap, test FIONREAD returns correct bytes in rx buffer with drops John Fastabend
2023-03-27 18:17 ` [PATCH bpf v2 00/11] bpf sockmap fixes John Fastabend

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87y1nh5f8k.fsf@cloudflare.com \
    --to=jakub@cloudflare.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=cong.wang@bytedance.com \
    --cc=daniel@iogearbox.net \
    --cc=edumazet@google.com \
    --cc=john.fastabend@gmail.com \
    --cc=lmb@isovalent.com \
    --cc=netdev@vger.kernel.org \
    --cc=will@isovalent.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.