All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jakub Sitnicki <jakub@cloudflare.com>
To: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Zijian Zhang <zijianzhang@bytedance.com>,
	 netdev@vger.kernel.org, bpf@vger.kernel.org,
	 john.fastabend@gmail.com, zhoufeng.zf@bytedance.com,
	 Amery Hung <amery.hung@bytedance.com>,
	 Cong Wang <cong.wang@bytedance.com>
Subject: Re: [Patch bpf-next v4 4/4] tcp_bpf: improve ingress redirection performance with message corking
Date: Mon, 07 Jul 2025 19:51:12 +0200	[thread overview]
Message-ID: <87cyabhotr.fsf@cloudflare.com> (raw)
In-Reply-To: <aGdWhRi/0KLTFL8k@pop-os.localdomain> (Cong Wang's message of "Thu, 3 Jul 2025 21:20:21 -0700")

On Thu, Jul 03, 2025 at 09:20 PM -07, Cong Wang wrote:
> On Thu, Jul 03, 2025 at 01:32:08PM +0200, Jakub Sitnicki wrote:
>> I'm all for reaping the benefits of batching, but I'm not thrilled about
>> having a backlog worker on the path. The one we have on the sk_skb path
>> has been a bottleneck:
>
> It depends on what you compare with. If you compare it with vanilla
> TCP_BPF, we did see is 5% latency increase. If you compare it with
> regular TCP, it is still much better. Our goal is to make Cillium's
> sockops-enable competitive with regular TCP, hence we compare it with
> regular TCP.
>
> I hope this makes sense to you. Sorry if this was not clear in our cover
> letter.

Latency-wise I think we should be comparing sk_msg send-to-local against
UDS rather than full-stack TCP.

There is quite a bit of guessing on my side as to what you're looking
for because the cover letter doesn't say much about the use case.

For instance, do you control the sender?  Why not do big writes on the
sender side if raw throughput is what you care about?

>> 1) There's no backpressure propagation so you can have a backlog
>> build-up. One thing to check is what happens if the receiver closes its
>> window.
>
> Right, I am sure there are still a lot of optimizations we can further
> improve. The only question is how much we need for now. How about
> optimizing it one step each time? :)

This is introducing a quite a bit complexity from the start. I'd like to
least explore if it can be done in a simpler fashion before committing to
it.

You point at wake-ups as being the throughput killer. As an alternative,
can we wake up the receiver conditionally? That is only if the receiver
has made progress since on the queue since the last notification. This
could also be a form of wakeup moderation.

>> 2) There's a scheduling latency. That's why the performance of splicing
>> sockets with sockmap (ingress-to-egress) looks bleak [1].
>
> Same for regular TCP, we have to wakeup the receiver/worker. But I may
> misunderstand this point?

What I meant is that, in the pessimistic case, to deliver a message we
now have to go through two wakeups:

sender -wakeup-> kworker -wakeup-> receiver

>> So I have to dig deeper...
>> 
>> Have you considered and/or evaluated any alternative designs? For
>> instance, what stops us from having an auto-corking / coalescing
>> strategy on the sender side?
>
> Auto corking _may_ be not as easy as TCP, since essentially we have no
> protocol here, just a pure socket layer.

You're right. We don't have a flush signal for auto-corking on the
sender side with sk_msg's.

What about what I mentioned above - can we moderate the wakeups based on
receiver making progress? Does that sound feasible to you?

Thanks,
-jkbs

  reply	other threads:[~2025-07-07 17:51 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-07-01  1:11 [Patch bpf-next v4 0/4] tcp_bpf: improve ingress redirection performance with message corking Cong Wang
2025-07-01  1:11 ` [Patch bpf-next v4 1/4] skmsg: rename sk_msg_alloc() to sk_msg_expand() Cong Wang
2025-07-02  9:24   ` Jakub Sitnicki
2025-07-01  1:11 ` [Patch bpf-next v4 2/4] skmsg: implement slab allocator cache for sk_msg Cong Wang
2025-07-02 11:36   ` Jakub Sitnicki
2025-07-01  1:12 ` [Patch bpf-next v4 3/4] skmsg: save some space in struct sk_psock Cong Wang
2025-07-02 11:46   ` Jakub Sitnicki
2025-07-01  1:12 ` [Patch bpf-next v4 4/4] tcp_bpf: improve ingress redirection performance with message corking Cong Wang
2025-07-02 12:17   ` Jakub Sitnicki
2025-07-03  2:17     ` Zijian Zhang
2025-07-03 11:32       ` Jakub Sitnicki
2025-07-04  4:20         ` Cong Wang
2025-07-07 17:51           ` Jakub Sitnicki [this message]
2025-07-15  0:26             ` Zijian Zhang
2025-07-02 10:22 ` [Patch bpf-next v4 0/4] " Jakub Sitnicki
2025-07-03  1:48   ` Zijian Zhang
2025-07-02 11:05 ` Jakub Sitnicki

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87cyabhotr.fsf@cloudflare.com \
    --to=jakub@cloudflare.com \
    --cc=amery.hung@bytedance.com \
    --cc=bpf@vger.kernel.org \
    --cc=cong.wang@bytedance.com \
    --cc=john.fastabend@gmail.com \
    --cc=netdev@vger.kernel.org \
    --cc=xiyou.wangcong@gmail.com \
    --cc=zhoufeng.zf@bytedance.com \
    --cc=zijianzhang@bytedance.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.