From: Jakub Sitnicki <jakub@cloudflare.com>
To: Cong Wang <xiyou.wangcong@gmail.com>, zijianzhang@bytedance.com
Cc: netdev@vger.kernel.org, bpf@vger.kernel.org,
john.fastabend@gmail.com, zhoufeng.zf@bytedance.com,
Amery Hung <amery.hung@bytedance.com>,
Cong Wang <cong.wang@bytedance.com>
Subject: Re: [Patch bpf-next v4 4/4] tcp_bpf: improve ingress redirection performance with message corking
Date: Wed, 02 Jul 2025 14:17:13 +0200
Message-ID: <87ecuyn5x2.fsf@cloudflare.com>
In-Reply-To: <20250701011201.235392-5-xiyou.wangcong@gmail.com> (Cong Wang's message of "Mon, 30 Jun 2025 18:12:01 -0700")
On Mon, Jun 30, 2025 at 06:12 PM -07, Cong Wang wrote:
> From: Zijian Zhang <zijianzhang@bytedance.com>
>
> The TCP_BPF ingress redirection path currently lacks the message corking
> mechanism found in standard TCP. This causes the sender to wake up the
> receiver for every message, even when messages are small, resulting in
> reduced throughput compared to regular TCP in certain scenarios.
I'm curious which scenarios you are referring to. Is it send-to-local or
ingress-to-local? [1]
If the sender is emitting small messages, that's probably intended.
That is, they likely want to get the message across as soon as possible,
because they must have disabled the Nagle algorithm (set TCP_NODELAY) to
do that.
Otherwise, you get small segment merging on the sender side by default.
And if MTU is a limiting factor, you should also be getting batching
from GRO.
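
To make the point about defaults concrete, here is a minimal userspace
sketch (not part of the patch) showing that a fresh TCP socket has
Nagle's algorithm enabled, and that a sender has to opt out explicitly
with TCP_NODELAY. The helper names nagle_disabled() and
set_low_latency() are made up for illustration only.

```python
import socket


def nagle_disabled(sock: socket.socket) -> bool:
    """Return True if TCP_NODELAY is set, i.e. Nagle's algorithm is off."""
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0


def set_low_latency(sock: socket.socket, enabled: bool) -> None:
    """Hypothetical helper: toggle TCP_NODELAY on an existing TCP socket."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1 if enabled else 0)


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# A new socket starts with Nagle enabled, so small writes may be merged
# by the sender's stack before they reach the wire.
default_state = nagle_disabled(sock)

# A latency-sensitive sender opts out explicitly.
set_low_latency(sock, True)
opted_out_state = nagle_disabled(sock)

sock.close()
```

In other words, a sender that emits many small segments has usually
asked for that behavior.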
What I'm getting at is that I don't quite follow why you don't see
sufficient batching before the sockmap redirect today.
> This change introduces a kernel worker-based intermediate layer to provide
> automatic message corking for TCP_BPF. While this adds a slight latency
> overhead, it significantly improves overall throughput by reducing
> unnecessary wake-ups and reducing the sock lock contention.
"Slight" for a +5% increase in latency is an understatement :-)
IDK about this being always on for every socket. For send-to-local
[1], sk_msg redirs can be viewed as a form of IPC, where latency
matters.
I do understand that you're trying to optimize for bulk-transfer
workloads, but please consider also request-response workloads.
[1] https://github.com/jsitnicki/kubecon-2024-sockmap/blob/main/cheatsheet-sockmap-redirect.png
> Reviewed-by: Amery Hung <amery.hung@bytedance.com>
> Co-developed-by: Cong Wang <cong.wang@bytedance.com>
> Signed-off-by: Cong Wang <cong.wang@bytedance.com>
> Signed-off-by: Zijian Zhang <zijianzhang@bytedance.com>
> ---