netdev.vger.kernel.org archive mirror
From: Stanislav Fomichev <stfomichev@gmail.com>
To: Jason Xing <kerneljasonxing@gmail.com>
Cc: Jakub Kicinski <kuba@kernel.org>,
	davem@davemloft.net, edumazet@google.com, pabeni@redhat.com,
	bjorn@kernel.org, magnus.karlsson@intel.com,
	maciej.fijalkowski@intel.com, jonathan.lemon@gmail.com,
	sdf@fomichev.me, ast@kernel.org, daniel@iogearbox.net,
	hawk@kernel.org, john.fastabend@gmail.com, joe@dama.to,
	willemdebruijn.kernel@gmail.com, bpf@vger.kernel.org,
	netdev@vger.kernel.org, Jason Xing <kernelxing@tencent.com>
Subject: Re: [PATCH net-next v3] net: xsk: introduce XDP_MAX_TX_BUDGET set/getsockopt
Date: Mon, 23 Jun 2025 17:48:09 -0700
Message-ID: <aFn1ybR3kgSfvL_N@mini-arch>
In-Reply-To: <CAL+tcoBub4JpHrgWekK+OVCb0frXUaFYDGVd2XL3bvjHOTmFjQ@mail.gmail.com>

On 06/24, Jason Xing wrote:
> On Mon, Jun 23, 2025 at 10:18 PM Stanislav Fomichev
> <stfomichev@gmail.com> wrote:
> >
> > On 06/21, Jason Xing wrote:
> > > On Sat, Jun 21, 2025 at 12:47 AM Stanislav Fomichev
> > > <stfomichev@gmail.com> wrote:
> > > >
> > > > On 06/21, Jason Xing wrote:
> > > > > On Fri, Jun 20, 2025 at 10:25 PM Stanislav Fomichev
> > > > > <stfomichev@gmail.com> wrote:
> > > > > >
> > > > > > On 06/19, Jakub Kicinski wrote:
> > > > > > > On Thu, 19 Jun 2025 17:04:40 +0800 Jason Xing wrote:
> > > > > > > > @@ -424,7 +421,9 @@ bool xsk_tx_peek_desc(struct xsk_buff_pool *pool, struct xdp_desc *desc)
> > > > > > > >     rcu_read_lock();
> > > > > > > >  again:
> > > > > > > >     list_for_each_entry_rcu(xs, &pool->xsk_tx_list, tx_list) {
> > > > > > > > -           if (xs->tx_budget_spent >= MAX_PER_SOCKET_BUDGET) {
> > > > > > > > +           int max_budget = READ_ONCE(xs->max_tx_budget);
> > > > > > > > +
> > > > > > > > +           if (xs->tx_budget_spent >= max_budget) {
> > > > > > > >                     budget_exhausted = true;
> > > > > > > >                     continue;
> > > > > > > >             }
> > > > > > > > @@ -779,7 +778,7 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs,
> > > > > > > >  static int __xsk_generic_xmit(struct sock *sk)
> > > > > > > >  {
> > > > > > > >     struct xdp_sock *xs = xdp_sk(sk);
> > > > > > > > -   u32 max_batch = TX_BATCH_SIZE;
> > > > > > > > +   u32 max_budget = READ_ONCE(xs->max_tx_budget);
> > > > > > >
> > > > > > > Hm, maybe a question to Stan / Willem & other XSK experts but are these
> > > > > > > two max values / code paths really related? Question 2 -- is generic
> > > > > > > XSK a legit optimization target, legit enough to add uAPI?
> > > > > >
> > > > > > 1) xsk_tx_peek_desc is for zc case and xsk_build_skb is copy mode;
> > > > > > whether we want to affect zc case given the fact that Jason seemingly
> > > > > > cares about copy mode is a good question.
> > > > >
> > > > > Allow me to ask the similar question that you asked me before: even though I
> > > > > didn't see the necessity to set the max budget for zc mode (just
> > > > > because I didn't spot it happening), would it be better if we separate
> > > > > both of them because it's an uAPI interface. IIUC, if the setsockopt
> > > > > is set, we will not separate it any more in the future?
> > > > >
> > > > > We can keep using the hardcoded value (32) in the zc mode like
> > > > > before and __only__ touch the copy mode? Later if someone or I found
> > > > > the significance of making it tunable, then another parameter of
> > > > > setsockopt can be added? Does it make sense?
> > > >
> > > > Related suggestion: maybe we don't need this limit at all for the copy mode?
> > > > If the user, with a socket option, can arbitrarily change it, what is the
> > > > point of this limit? Keep it on the zc side to make sure one socket doesn't
> > > > starve the rest and drop from the copy mode.. Any reason not to do it?
> > >
> > > Thanks for bringing up the same question that I had in this thread. I
> > > saw the commit[1] mentioned it is used to avoid the burst as DPDK
> > > does, so my thought is that it might be used to prevent such a case
> > > where multiple sockets try to send packets through a shared umem
> > > nearly at the same time?
> > >
> > > Making it tunable is to provide a chance to let users seek for a good
> > > solution that is the best fit for them. It doesn't mean we
> > > allow/expect to see the burst situation.
> >
> > The users can choose to moderate their batches by submitting less
> > with each sendmsg call. I see why having a batch limit might be useful for
> > zerocopy to tx in batches to interleave multiple sockets, but not
> > sure how this limit helps for the copy mode. Since we are not running
> > qdisc layer on tx, we don't really have a good answer for multiple
> > sockets sharing the same device/queue..
> 
> It's worth mentioning that the xsk still holds the tx queue lock in
> the non-zc mode. So I assume getting rid of the limit might be harmful
> for other non xsk flows. That is what I know about the burst concern.

But one still needs CAP_NET_RAW to use it, right? So it's not like some
random process will suddenly start DDoS-ing tx queues. Maybe we should
add need_resched() / signal_pending() checks to the loop to break out of it?
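
To make that last idea concrete, here is a rough userspace-only sketch of a
copy-mode xmit loop that has no fixed budget and instead bails out when the
scheduler wants the CPU back or a signal is pending. It is not kernel code:
sim_ctx, sim_need_resched(), sim_signal_pending() and sim_generic_xmit() are
hypothetical stand-ins for need_resched(), signal_pending() and
__xsk_generic_xmit(), and the -EAGAIN / -EINTR return codes are assumptions
rather than what a real patch would necessarily use.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of one socket's tx state; not a kernel structure. */
struct sim_ctx {
	unsigned int pending;       /* descriptors waiting in the tx ring */
	unsigned int resched_after; /* "need_resched" fires after N sends, 0 = never */
	bool signal;                /* pretend a signal is pending */
};

/* Stand-in for need_resched(): true once enough work has been done. */
static bool sim_need_resched(const struct sim_ctx *c, unsigned int sent)
{
	return c->resched_after && sent >= c->resched_after;
}

/* Stand-in for signal_pending(current). */
static bool sim_signal_pending(const struct sim_ctx *c)
{
	return c->signal;
}

/*
 * Drain the ring without a per-call budget. Returns 0 when the ring is
 * empty, -EINTR if a signal is pending, -EAGAIN if we yield to the
 * scheduler. *sent reports how many descriptors were transmitted.
 */
static int sim_generic_xmit(struct sim_ctx *c, unsigned int *sent)
{
	assert(c && sent);

	*sent = 0;
	while (c->pending) {
		if (sim_signal_pending(c))
			return -EINTR;
		if (sim_need_resched(c, *sent))
			return -EAGAIN;

		/* A real loop would build and transmit an skb here. */
		c->pending--;
		(*sent)++;
	}
	return 0;
}
```

With this shape the caller sees a partial send and simply calls sendmsg()
again, much like it does today when the batch limit is hit. The difference is
that the stopping point is driven by the scheduler and signals rather than by
a constant.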
