From: Stanislav Fomichev <stfomichev@gmail.com>
To: Jason Xing <kerneljasonxing@gmail.com>
Cc: Jakub Kicinski <kuba@kernel.org>,
davem@davemloft.net, edumazet@google.com, pabeni@redhat.com,
bjorn@kernel.org, magnus.karlsson@intel.com,
maciej.fijalkowski@intel.com, jonathan.lemon@gmail.com,
sdf@fomichev.me, ast@kernel.org, daniel@iogearbox.net,
hawk@kernel.org, john.fastabend@gmail.com, joe@dama.to,
willemdebruijn.kernel@gmail.com, bpf@vger.kernel.org,
netdev@vger.kernel.org, Jason Xing <kernelxing@tencent.com>
Subject: Re: [PATCH net-next v3] net: xsk: introduce XDP_MAX_TX_BUDGET set/getsockopt
Date: Mon, 23 Jun 2025 17:48:09 -0700 [thread overview]
Message-ID: <aFn1ybR3kgSfvL_N@mini-arch> (raw)
In-Reply-To: <CAL+tcoBub4JpHrgWekK+OVCb0frXUaFYDGVd2XL3bvjHOTmFjQ@mail.gmail.com>
On 06/24, Jason Xing wrote:
> On Mon, Jun 23, 2025 at 10:18 PM Stanislav Fomichev
> <stfomichev@gmail.com> wrote:
> >
> > On 06/21, Jason Xing wrote:
> > > On Sat, Jun 21, 2025 at 12:47 AM Stanislav Fomichev
> > > <stfomichev@gmail.com> wrote:
> > > >
> > > > On 06/21, Jason Xing wrote:
> > > > > On Fri, Jun 20, 2025 at 10:25 PM Stanislav Fomichev
> > > > > <stfomichev@gmail.com> wrote:
> > > > > >
> > > > > > On 06/19, Jakub Kicinski wrote:
> > > > > > > On Thu, 19 Jun 2025 17:04:40 +0800 Jason Xing wrote:
> > > > > > > > @@ -424,7 +421,9 @@ bool xsk_tx_peek_desc(struct xsk_buff_pool *pool, struct xdp_desc *desc)
> > > > > > > > rcu_read_lock();
> > > > > > > > again:
> > > > > > > > list_for_each_entry_rcu(xs, &pool->xsk_tx_list, tx_list) {
> > > > > > > > - if (xs->tx_budget_spent >= MAX_PER_SOCKET_BUDGET) {
> > > > > > > > + int max_budget = READ_ONCE(xs->max_tx_budget);
> > > > > > > > +
> > > > > > > > + if (xs->tx_budget_spent >= max_budget) {
> > > > > > > > budget_exhausted = true;
> > > > > > > > continue;
> > > > > > > > }
> > > > > > > > @@ -779,7 +778,7 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs,
> > > > > > > > static int __xsk_generic_xmit(struct sock *sk)
> > > > > > > > {
> > > > > > > > struct xdp_sock *xs = xdp_sk(sk);
> > > > > > > > - u32 max_batch = TX_BATCH_SIZE;
> > > > > > > > + u32 max_budget = READ_ONCE(xs->max_tx_budget);
> > > > > > >
> > > > > > > Hm, maybe a question to Stan / Willem & other XSK experts but are these
> > > > > > > two max values / code paths really related? Question 2 -- is generic
> > > > > > > XSK a legit optimization target, legit enough to add uAPI?
> > > > > >
> > > > > > 1) xsk_tx_peek_desc is for zc case and xsk_build_skb is copy mode;
> > > > > > whether we want to affect zc case given the fact that Jason seemingly
> > > > > > cares about copy mode is a good question.
> > > > >
> > > > > Allow me to ask the similar question that you asked me before: even though I
> > > > > didn't see the necessity to set the max budget for zc mode (just
> > > > > because I didn't spot it happening), would it be better if we separate
> > > > > both of them because it's an uAPI interface. IIUC, if the setsockopt
> > > > > is set, we will not separate it any more in the future?
> > > > >
> > > > > We can keep using the hardcoded value (32) in the zc mode like
> > > > > before and __only__ touch the copy mode? Later if someone or I found
> > > > > the significance of making it tunable, then another parameter of
> > > > > setsockopt can be added? Does it make sense?
> > > >
> > > > Related suggestion: maybe we don't need this limit at all for the copy mode?
> > > > If the user, with a socket option, can arbitrarily change it, what is the
> > > > point of this limit? Keep it on the zc side to make sure one socket doesn't
> > > > starve the rest and drop from the copy mode.. Any reason not to do it?
> > >
> > > Thanks for bringing up the same question that I had in this thread. I
> > > saw the commit[1] mentioned it is used to avoid the burst as DPDK
> > > does, so my thought is that it might be used to prevent such a case
> > > where multiple sockets try to send packets through a shared umem
> > > nearly at the same time?
> > >
> > > Making it tunable is to provide a chance to let users seek for a good
> > > solution that is the best fit for them. It doesn't mean we
> > > allow/expect to see the burst situation.
> >
> > The users can choose to moderate their batches by submitting less
> > with each sendmsg call. I see why having a batch limit might be useful for
> > zerocopy to tx in batches to interleave multiple sockets, but not
> > sure how this limit helps for the copy mode. Since we are not running
> > qdisc layer on tx, we don't really have a good answer for multiple
> > sockets sharing the same device/queue..
>
> It's worth mentioning that the xsk still holds the tx queue lock in
> the non-zc mode. So I assume getting rid of the limit might be harmful
> for other non xsk flows. That is what I know about the burst concern.
But one still needs NET_RAW to use it, right? So it's not like some
random process will suddenly start ddos-ing tx queues.. Maybe we should
add need_resched() / signal_pending() to the loop to break it?
next prev parent reply other threads:[~2025-06-24 0:48 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-06-19 9:04 [PATCH net-next v3] net: xsk: introduce XDP_MAX_TX_BUDGET set/getsockopt Jason Xing
2025-06-19 13:53 ` Willem de Bruijn
2025-06-19 23:53 ` Jason Xing
2025-06-20 0:02 ` Jason Xing
2025-06-20 13:43 ` Willem de Bruijn
2025-06-20 13:58 ` Willem de Bruijn
2025-06-20 14:37 ` Jason Xing
2025-06-20 22:21 ` Willem de Bruijn
2025-06-19 15:09 ` Jakub Kicinski
2025-06-20 0:17 ` Jason Xing
2025-06-20 13:50 ` Willem de Bruijn
2025-06-20 15:03 ` Jason Xing
2025-06-20 22:24 ` Willem de Bruijn
2025-06-21 0:40 ` Jason Xing
2025-06-21 14:43 ` Jakub Kicinski
2025-06-22 0:05 ` Jason Xing
2025-06-20 14:25 ` Stanislav Fomichev
2025-06-20 16:30 ` Jason Xing
2025-06-20 16:47 ` Stanislav Fomichev
2025-06-20 17:46 ` Jason Xing
2025-06-23 14:18 ` Stanislav Fomichev
2025-06-23 23:54 ` Jason Xing
2025-06-24 0:48 ` Stanislav Fomichev [this message]
2025-06-24 2:47 ` Jason Xing
2025-06-20 22:20 ` Willem de Bruijn
2025-06-21 1:06 ` Jason Xing
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aFn1ybR3kgSfvL_N@mini-arch \
--to=stfomichev@gmail.com \
--cc=ast@kernel.org \
--cc=bjorn@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=hawk@kernel.org \
--cc=joe@dama.to \
--cc=john.fastabend@gmail.com \
--cc=jonathan.lemon@gmail.com \
--cc=kerneljasonxing@gmail.com \
--cc=kernelxing@tencent.com \
--cc=kuba@kernel.org \
--cc=maciej.fijalkowski@intel.com \
--cc=magnus.karlsson@intel.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=sdf@fomichev.me \
--cc=willemdebruijn.kernel@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).