From: John Fastabend <john.fastabend@gmail.com>
To: zijianzhang@bytedance.com, bpf@vger.kernel.org
Cc: edumazet@google.com, john.fastabend@gmail.com,
jakub@cloudflare.com, davem@davemloft.net, dsahern@kernel.org,
kuba@kernel.org, pabeni@redhat.com, ast@kernel.org,
daniel@iogearbox.net, andrii@kernel.org, martin.lau@linux.dev,
eddyz87@gmail.com, song@kernel.org, yonghong.song@linux.dev,
kpsingh@kernel.org, sdf@fomichev.me, haoluo@google.com,
jolsa@kernel.org, mykolal@fb.com, shuah@kernel.org,
wangyufen@huawei.com, xiyou.wangcong@gmail.com,
zijianzhang@bytedance.com
Subject: RE: [PATCH bpf 2/2] tcp_bpf: Fix the sk_mem_uncharge logic in tcp_bpf_sendmsg
Date: Wed, 20 Nov 2024 21:43:48 -0800 [thread overview]
Message-ID: <673ec8941b96d_157a20829@john.notmuch> (raw)
In-Reply-To: <20241016234838.3167769-3-zijianzhang@bytedance.com>
zijianzhang@ wrote:
> From: Zijian Zhang <zijianzhang@bytedance.com>
>
> The current sk memory accounting logic in __SK_REDIRECT is pre-uncharging
> tosend bytes, which is either msg->sg.size or a smaller value apply_bytes.
> Potential problems with this strategy are as follows:
> - If the actual sent bytes are smaller than tosend, we need to charge some
> bytes back, as in line 487, which is okay but seems not clean.
> - When tosend is set to apply_bytes, as in line 417, and (ret < 0), we may
> miss uncharging (msg->sg.size - apply_bytes) bytes.
>
> 415 tosend = msg->sg.size;
> 416 if (psock->apply_bytes && psock->apply_bytes < tosend)
> 417 tosend = psock->apply_bytes;
> ...
> 443 sk_msg_return(sk, msg, tosend);
> 444 release_sock(sk);
> 446 origsize = msg->sg.size;
> 447 ret = tcp_bpf_sendmsg_redir(sk_redir, redir_ingress,
> 448 msg, tosend, flags);
> 449 sent = origsize - msg->sg.size;
> ...
> 454 lock_sock(sk);
> 455 if (unlikely(ret < 0)) {
> 456 int free = sk_msg_free_nocharge(sk, msg);
> 458 if (!cork)
> 459 *copied -= free;
> 460 }
> ...
> 487 if (eval == __SK_REDIRECT)
> 488 sk_mem_charge(sk, tosend - sent);
>
> When running the selftest test_txmsg_redir_wait_sndmem with txmsg_apply,
> the following warning will be reported,
> ------------[ cut here ]------------
> WARNING: CPU: 6 PID: 57 at net/ipv4/af_inet.c:156 inet_sock_destruct+0x190/0x1a0
> Modules linked in:
> CPU: 6 UID: 0 PID: 57 Comm: kworker/6:0 Not tainted 6.12.0-rc1.bm.1-amd64+ #43
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
> Workqueue: events sk_psock_destroy
> RIP: 0010:inet_sock_destruct+0x190/0x1a0
> RSP: 0018:ffffad0a8021fe08 EFLAGS: 00010206
> RAX: 0000000000000011 RBX: ffff9aab4475b900 RCX: ffff9aab481a0800
> RDX: 0000000000000303 RSI: 0000000000000011 RDI: ffff9aab4475b900
> RBP: ffff9aab4475b990 R08: 0000000000000000 R09: ffff9aab40050ec0
> R10: 0000000000000000 R11: ffff9aae6fdb1d01 R12: ffff9aab49c60400
> R13: ffff9aab49c60598 R14: ffff9aab49c60598 R15: dead000000000100
> FS: 0000000000000000(0000) GS:ffff9aae6fd80000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007ffec7e47bd8 CR3: 00000001a1a1c004 CR4: 0000000000770ef0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> PKRU: 55555554
> Call Trace:
> <TASK>
> ? __warn+0x89/0x130
> ? inet_sock_destruct+0x190/0x1a0
> ? report_bug+0xfc/0x1e0
> ? handle_bug+0x5c/0xa0
> ? exc_invalid_op+0x17/0x70
> ? asm_exc_invalid_op+0x1a/0x20
> ? inet_sock_destruct+0x190/0x1a0
> __sk_destruct+0x25/0x220
> sk_psock_destroy+0x2b2/0x310
> process_scheduled_works+0xa3/0x3e0
> worker_thread+0x117/0x240
> ? __pfx_worker_thread+0x10/0x10
> kthread+0xcf/0x100
> ? __pfx_kthread+0x10/0x10
> ret_from_fork+0x31/0x40
> ? __pfx_kthread+0x10/0x10
> ret_from_fork_asm+0x1a/0x30
> </TASK>
> ---[ end trace 0000000000000000 ]---
>
> In __SK_REDIRECT, a more concise way is delaying the uncharging after sent
> bytes are finalized, and uncharge this value. When (ret < 0), we shall
> invoke sk_msg_free.
>
> Same thing happens in case __SK_DROP, when tosend is set to apply_bytes,
> we may miss uncharging (msg->sg.size - apply_bytes) bytes. The same
> warning will be reported in selftest.
>
> 468 case __SK_DROP:
> 469 default:
> 470 sk_msg_free_partial(sk, msg, tosend);
> 471 sk_msg_apply_bytes(psock, tosend);
> 472 *copied -= (tosend + delta);
> 473 return -EACCES;
>
> So instead of sk_msg_free_partial we can do sk_msg_free here.
>
> Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface")
> Fixes: 8ec95b94716a ("bpf, sockmap: Fix the sk->sk_forward_alloc warning of sk_stream_kill_queues")
> Signed-off-by: Zijian Zhang <zijianzhang@bytedance.com>
> ---
Agree thanks.
Acked-by: John Fastabend <john.fastabend@gmail.com>
next prev parent reply other threads:[~2024-11-21 5:43 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-10-16 23:48 [PATCH bpf 0/2] tcp_bpf: Fix the sk_mem_uncharge logic in tcp_bpf_sendmsg zijianzhang
2024-10-16 23:48 ` [PATCH bpf 1/2] selftests/bpf: Add apply_bytes test to test_txmsg_redir_wait_sndmem in test_sockmap zijianzhang
2024-11-21 5:38 ` John Fastabend
2024-10-16 23:48 ` [PATCH bpf 2/2] tcp_bpf: Fix the sk_mem_uncharge logic in tcp_bpf_sendmsg zijianzhang
2024-11-21 5:43 ` John Fastabend [this message]
2024-11-21 5:49 ` [PATCH bpf 0/2] " John Fastabend
2024-11-26 19:50 ` patchwork-bot+netdevbpf
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=673ec8941b96d_157a20829@john.notmuch \
--to=john.fastabend@gmail.com \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=davem@davemloft.net \
--cc=dsahern@kernel.org \
--cc=eddyz87@gmail.com \
--cc=edumazet@google.com \
--cc=haoluo@google.com \
--cc=jakub@cloudflare.com \
--cc=jolsa@kernel.org \
--cc=kpsingh@kernel.org \
--cc=kuba@kernel.org \
--cc=martin.lau@linux.dev \
--cc=mykolal@fb.com \
--cc=pabeni@redhat.com \
--cc=sdf@fomichev.me \
--cc=shuah@kernel.org \
--cc=song@kernel.org \
--cc=wangyufen@huawei.com \
--cc=xiyou.wangcong@gmail.com \
--cc=yonghong.song@linux.dev \
--cc=zijianzhang@bytedance.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.