netdev.vger.kernel.org archive mirror
* [PATCH net-next] net: xsk: update tx queue consumer immediately after transmission
@ 2025-06-19  9:36 Jason Xing
  2025-06-20 14:10 ` Stanislav Fomichev
  0 siblings, 1 reply; 9+ messages in thread
From: Jason Xing @ 2025-06-19  9:36 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
	maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
	john.fastabend, joe, willemdebruijn.kernel
  Cc: bpf, netdev, Jason Xing

From: Jason Xing <kernelxing@tencent.com>

For AF_XDP, the return value of the sendto() syscall doesn't reflect how
many descs were handled in the kernel. One use case is a user-space
application that wants to know the number of transmitted skbs so it can
decide whether to keep sending, e.g. whether it was stopped by the max tx
budget.

The following formula can be used after sending to learn how many
skbs/descs the kernel took care of:

  tx_queue.consumer_after - tx_queue.consumer_before

Prior to the current patch, the consumer index of the tx queue is not
immediately updated at the end of each sendto syscall, which leaves the
consumer value out of date from the perspective of user space. So this
patch adds a store operation that publishes the cached consumer value to
the shared ring to handle the problem.
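
A rough user-space sketch of that usage (names like 'tx_consumer' and
'xsk_fd' are illustrative only; the consumer pointer comes from the TX
ring mapping described by the XDP_MMAP_OFFSETS layout):

	__u32 before, after, handled;

	before = __atomic_load_n(tx_consumer, __ATOMIC_ACQUIRE);
	sendto(xsk_fd, NULL, 0, MSG_DONTWAIT, NULL, 0);
	after = __atomic_load_n(tx_consumer, __ATOMIC_ACQUIRE);

	handled = after - before;	/* descs the kernel consumed this call */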

Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
 net/xdp/xsk.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 7c47f665e9d1..3288ab2d67b4 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -856,6 +856,8 @@ static int __xsk_generic_xmit(struct sock *sk)
 	}
 
 out:
+	__xskq_cons_release(xs->tx);
+
 	if (sent_frame)
 		if (xsk_tx_writeable(xs))
 			sk->sk_write_space(sk);
-- 
2.43.5



* Re: [PATCH net-next] net: xsk: update tx queue consumer immediately after transmission
  2025-06-19  9:36 [PATCH net-next] net: xsk: update tx queue consumer immediately after transmission Jason Xing
@ 2025-06-20 14:10 ` Stanislav Fomichev
  2025-06-20 15:25   ` Jason Xing
  2025-06-20 15:35   ` Maciej Fijalkowski
  0 siblings, 2 replies; 9+ messages in thread
From: Stanislav Fomichev @ 2025-06-20 14:10 UTC (permalink / raw)
  To: Jason Xing
  Cc: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
	maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
	john.fastabend, joe, willemdebruijn.kernel, bpf, netdev,
	Jason Xing

On 06/19, Jason Xing wrote:
> From: Jason Xing <kernelxing@tencent.com>
> 
> For AF_XDP, the return value of the sendto() syscall doesn't reflect how
> many descs were handled in the kernel. One use case is a user-space
> application that wants to know the number of transmitted skbs so it can
> decide whether to keep sending, e.g. whether it was stopped by the max tx
> budget.
> 
> The following formula can be used after sending to learn how many
> skbs/descs the kernel took care of:
> 
>   tx_queue.consumer_after - tx_queue.consumer_before
> 
> Prior to the current patch, the consumer index of the tx queue is not
> immediately updated at the end of each sendto syscall, which leaves the
> consumer value out of date from the perspective of user space. So this
> patch adds a store operation that publishes the cached consumer value to
> the shared ring to handle the problem.
> 
> Signed-off-by: Jason Xing <kernelxing@tencent.com>
> ---
>  net/xdp/xsk.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> index 7c47f665e9d1..3288ab2d67b4 100644
> --- a/net/xdp/xsk.c
> +++ b/net/xdp/xsk.c
> @@ -856,6 +856,8 @@ static int __xsk_generic_xmit(struct sock *sk)
>  	}
>  
>  out:
> +	__xskq_cons_release(xs->tx);
> +
>  	if (sent_frame)
>  		if (xsk_tx_writeable(xs))
>  			sk->sk_write_space(sk);

So for the "good" case we are going to write the cons twice? From
xskq_cons_peek_desc and from here? Maybe make this __xskq_cons_release
conditional ('if (err)')?
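
Something like this, i.e. only publish the consumer when the loop bailed
out early (just a sketch of the idea, untested):

out:
	if (err)
		__xskq_cons_release(xs->tx);

	if (sent_frame)
		if (xsk_tx_writeable(xs))
			sk->sk_write_space(sk);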

I also wonder whether we should add a test for that? Should be easy to
verify by sending more than 32 packets. Is there a place in
tools/testing/selftests/bpf/xskxceiver.c to add that?


* Re: [PATCH net-next] net: xsk: update tx queue consumer immediately after transmission
  2025-06-20 14:10 ` Stanislav Fomichev
@ 2025-06-20 15:25   ` Jason Xing
  2025-06-20 15:58     ` Stanislav Fomichev
  2025-06-20 15:35   ` Maciej Fijalkowski
  1 sibling, 1 reply; 9+ messages in thread
From: Jason Xing @ 2025-06-20 15:25 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
	maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
	john.fastabend, joe, willemdebruijn.kernel, bpf, netdev,
	Jason Xing

On Fri, Jun 20, 2025 at 10:10 PM Stanislav Fomichev
<stfomichev@gmail.com> wrote:
>
> On 06/19, Jason Xing wrote:
> > From: Jason Xing <kernelxing@tencent.com>
> >
> > For AF_XDP, the return value of the sendto() syscall doesn't reflect how
> > many descs were handled in the kernel. One use case is a user-space
> > application that wants to know the number of transmitted skbs so it can
> > decide whether to keep sending, e.g. whether it was stopped by the max tx
> > budget.
> >
> > The following formula can be used after sending to learn how many
> > skbs/descs the kernel took care of:
> >
> >   tx_queue.consumer_after - tx_queue.consumer_before
> >
> > Prior to the current patch, the consumer index of the tx queue is not
> > immediately updated at the end of each sendto syscall, which leaves the
> > consumer value out of date from the perspective of user space. So this
> > patch adds a store operation that publishes the cached consumer value to
> > the shared ring to handle the problem.
> >
> > Signed-off-by: Jason Xing <kernelxing@tencent.com>
> > ---
> >  net/xdp/xsk.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> > index 7c47f665e9d1..3288ab2d67b4 100644
> > --- a/net/xdp/xsk.c
> > +++ b/net/xdp/xsk.c
> > @@ -856,6 +856,8 @@ static int __xsk_generic_xmit(struct sock *sk)
> >       }
> >
> >  out:
> > +     __xskq_cons_release(xs->tx);
> > +
> >       if (sent_frame)
> >               if (xsk_tx_writeable(xs))
> >                       sk->sk_write_space(sk);
>
> So for the "good" case we are going to write the cons twice? From
> xskq_cons_peek_desc and from here? Maybe make this __xskq_cons_release
> conditional ('if (err)')?

One unlikely exception:
xskq_cons_peek_desc() -> xskq_cons_read_desc() -> xskq_cons_is_valid_desc()
returning false?

There are still two possible 'return false' paths in xskq_cons_peek_desc(),
though so far I haven't spotted either of them happening.
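
For reference, the peek path looks roughly like this (paraphrased from
net/xdp/xsk_queue.h, details trimmed):

	static inline bool xskq_cons_peek_desc(struct xsk_queue *q,
					       struct xdp_desc *desc,
					       struct xsk_buff_pool *pool)
	{
		/* refresh the cached producer view if the local view is empty */
		if (q->cached_prod == q->cached_cons)
			xskq_cons_get_entries(q);
		/* false when the ring is empty or the desc is invalid */
		return xskq_cons_read_desc(q, desc, pool);
	}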

Admittedly, your suggestion covers the majority of normal good cases. I
can adjust it as you suggested.

>
> I also wonder whether we should add a test for that? Should be easy to
> verify by sending more than 32 packets. Is there a place in
> tools/testing/selftests/bpf/xskxceiver.c to add that?

Well, sorry, if it's not required, please don't force me to do so :S
The patch is just a simple update of the consumer index that is shared
between user space and the kernel.

Thanks,
Jason


* Re: [PATCH net-next] net: xsk: update tx queue consumer immediately after transmission
  2025-06-20 14:10 ` Stanislav Fomichev
  2025-06-20 15:25   ` Jason Xing
@ 2025-06-20 15:35   ` Maciej Fijalkowski
  2025-06-20 15:42     ` Jason Xing
  1 sibling, 1 reply; 9+ messages in thread
From: Maciej Fijalkowski @ 2025-06-20 15:35 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Jason Xing, davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
	jonathan.lemon, sdf, ast, daniel, hawk, john.fastabend, joe,
	willemdebruijn.kernel, bpf, netdev, Jason Xing

On Fri, Jun 20, 2025 at 07:10:51AM -0700, Stanislav Fomichev wrote:
> On 06/19, Jason Xing wrote:
> > From: Jason Xing <kernelxing@tencent.com>
> > 
> > For AF_XDP, the return value of the sendto() syscall doesn't reflect how
> > many descs were handled in the kernel. One use case is a user-space
> > application that wants to know the number of transmitted skbs so it can
> > decide whether to keep sending, e.g. whether it was stopped by the max tx
> > budget.
> >
> > The following formula can be used after sending to learn how many
> > skbs/descs the kernel took care of:
> >
> >   tx_queue.consumer_after - tx_queue.consumer_before
> >
> > Prior to the current patch, the consumer index of the tx queue is not
> > immediately updated at the end of each sendto syscall, which leaves the
> > consumer value out of date from the perspective of user space. So this
> > patch adds a store operation that publishes the cached consumer value to
> > the shared ring to handle the problem.
> > 
> > Signed-off-by: Jason Xing <kernelxing@tencent.com>
> > ---
> >  net/xdp/xsk.c | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> > index 7c47f665e9d1..3288ab2d67b4 100644
> > --- a/net/xdp/xsk.c
> > +++ b/net/xdp/xsk.c
> > @@ -856,6 +856,8 @@ static int __xsk_generic_xmit(struct sock *sk)
> >  	}
> >  
> >  out:
> > +	__xskq_cons_release(xs->tx);
> > +
> >  	if (sent_frame)
> >  		if (xsk_tx_writeable(xs))
> >  			sk->sk_write_space(sk);
> 
> So for the "good" case we are going to write the cons twice? From
> xskq_cons_peek_desc and from here? Maybe make this __xskq_cons_release
> conditional ('if (err)')?

This patch updates the global state seen by the producer, whereas the
generic xmit loop only updates the local value. This global state is also
updated within the peeking function.
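
For reference, the release helper is roughly this (from net/xdp/xsk_queue.h):

	static inline void __xskq_cons_release(struct xsk_queue *q)
	{
		/* publish the locally cached consumer index to the ring
		 * shared with the user-space producer
		 */
		smp_store_release(&q->ring->consumer, q->cached_cons);
	}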

From a quick look the patch seems to be correct, but my mind is in vacation
mode so I'll take a second look on Monday.

> 
> I also wonder whether we should add a test for that? Should be easy to
> verify by sending more than 32 packets. Is there a place in
> tools/testing/selftests/bpf/xskxceiver.c to add that?


* Re: [PATCH net-next] net: xsk: update tx queue consumer immediately after transmission
  2025-06-20 15:35   ` Maciej Fijalkowski
@ 2025-06-20 15:42     ` Jason Xing
  0 siblings, 0 replies; 9+ messages in thread
From: Jason Xing @ 2025-06-20 15:42 UTC (permalink / raw)
  To: Maciej Fijalkowski
  Cc: Stanislav Fomichev, davem, edumazet, kuba, pabeni, bjorn,
	magnus.karlsson, jonathan.lemon, sdf, ast, daniel, hawk,
	john.fastabend, joe, willemdebruijn.kernel, bpf, netdev,
	Jason Xing

On Fri, Jun 20, 2025 at 11:35 PM Maciej Fijalkowski
<maciej.fijalkowski@intel.com> wrote:
>
> On Fri, Jun 20, 2025 at 07:10:51AM -0700, Stanislav Fomichev wrote:
> > On 06/19, Jason Xing wrote:
> > > From: Jason Xing <kernelxing@tencent.com>
> > >
> > > For AF_XDP, the return value of the sendto() syscall doesn't reflect how
> > > many descs were handled in the kernel. One use case is a user-space
> > > application that wants to know the number of transmitted skbs so it can
> > > decide whether to keep sending, e.g. whether it was stopped by the max tx
> > > budget.
> > >
> > > The following formula can be used after sending to learn how many
> > > skbs/descs the kernel took care of:
> > >
> > >   tx_queue.consumer_after - tx_queue.consumer_before
> > >
> > > Prior to the current patch, the consumer index of the tx queue is not
> > > immediately updated at the end of each sendto syscall, which leaves the
> > > consumer value out of date from the perspective of user space. So this
> > > patch adds a store operation that publishes the cached consumer value to
> > > the shared ring to handle the problem.
> > >
> > > Signed-off-by: Jason Xing <kernelxing@tencent.com>
> > > ---
> > >  net/xdp/xsk.c | 2 ++
> > >  1 file changed, 2 insertions(+)
> > >
> > > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> > > index 7c47f665e9d1..3288ab2d67b4 100644
> > > --- a/net/xdp/xsk.c
> > > +++ b/net/xdp/xsk.c
> > > @@ -856,6 +856,8 @@ static int __xsk_generic_xmit(struct sock *sk)
> > >     }
> > >
> > >  out:
> > > +   __xskq_cons_release(xs->tx);
> > > +
> > >     if (sent_frame)
> > >             if (xsk_tx_writeable(xs))
> > >                     sk->sk_write_space(sk);
> >
> > So for the "good" case we are going to write the cons twice? From
> > xskq_cons_peek_desc and from here? Maybe make this __xskq_cons_release
> > conditional ('if (err)')?
>
> This patch updates the global state seen by the producer, whereas the
> generic xmit loop only updates the local value. This global state is also
> updated within the peeking function.

Stanislav also pointed out that the good case is the normal/majority one.
I will filter out the good case then.

>
> From a quick look the patch seems to be correct, but my mind is in vacation
> mode so I'll take a second look on Monday.

Thanks. I'm quite sure that the line this patch introduces is helpful,
because I manually printk'ed the delta before/after
__xskq_cons_release(xs->tx) to verify it, and spotted a few values
larger than zero during a simple test.
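
Roughly this kind of debug-only check (not part of the patch):

	/* how far the shared consumer lags behind the locally cached one */
	pr_info("xsk tx cons lag: %u\n",
		xs->tx->cached_cons - READ_ONCE(xs->tx->ring->consumer));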

Thanks,
Jason

>
> >
> > I also wonder whether we should add a test for that? Should be easy to
> > verify by sending more than 32 packets. Is there a place in
> > tools/testing/selftests/bpf/xskxceiver.c to add that?


* Re: [PATCH net-next] net: xsk: update tx queue consumer immediately after transmission
  2025-06-20 15:25   ` Jason Xing
@ 2025-06-20 15:58     ` Stanislav Fomichev
  2025-06-20 16:26       ` Jason Xing
  2025-06-23  5:31       ` Jason Xing
  0 siblings, 2 replies; 9+ messages in thread
From: Stanislav Fomichev @ 2025-06-20 15:58 UTC (permalink / raw)
  To: Jason Xing
  Cc: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
	maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
	john.fastabend, joe, willemdebruijn.kernel, bpf, netdev,
	Jason Xing

On 06/20, Jason Xing wrote:
> On Fri, Jun 20, 2025 at 10:10 PM Stanislav Fomichev
> <stfomichev@gmail.com> wrote:
> >
> > On 06/19, Jason Xing wrote:
> > > From: Jason Xing <kernelxing@tencent.com>
> > >
> > > For AF_XDP, the return value of the sendto() syscall doesn't reflect how
> > > many descs were handled in the kernel. One use case is a user-space
> > > application that wants to know the number of transmitted skbs so it can
> > > decide whether to keep sending, e.g. whether it was stopped by the max tx
> > > budget.
> > >
> > > The following formula can be used after sending to learn how many
> > > skbs/descs the kernel took care of:
> > >
> > >   tx_queue.consumer_after - tx_queue.consumer_before
> > >
> > > Prior to the current patch, the consumer index of the tx queue is not
> > > immediately updated at the end of each sendto syscall, which leaves the
> > > consumer value out of date from the perspective of user space. So this
> > > patch adds a store operation that publishes the cached consumer value to
> > > the shared ring to handle the problem.
> > >
> > > Signed-off-by: Jason Xing <kernelxing@tencent.com>
> > > ---
> > >  net/xdp/xsk.c | 2 ++
> > >  1 file changed, 2 insertions(+)
> > >
> > > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> > > index 7c47f665e9d1..3288ab2d67b4 100644
> > > --- a/net/xdp/xsk.c
> > > +++ b/net/xdp/xsk.c
> > > @@ -856,6 +856,8 @@ static int __xsk_generic_xmit(struct sock *sk)
> > >       }
> > >
> > >  out:
> > > +     __xskq_cons_release(xs->tx);
> > > +
> > >       if (sent_frame)
> > >               if (xsk_tx_writeable(xs))
> > >                       sk->sk_write_space(sk);
> >
> > So for the "good" case we are going to write the cons twice? From
> > xskq_cons_peek_desc and from here? Maybe make this __xskq_cons_release
> > conditional ('if (err)')?
> 
> One unlikely exception:
> xskq_cons_peek_desc()->xskq_cons_read_desc()->xskq_cons_is_valid_desc()->return
> false;
> ?
> 
> There are still two possible 'return false' in xskq_cons_peek_desc()
> while so far I didn't spot a single one happening.
> 
> Admittedly, your suggestion covers the majority of normal good ones. I
> can adjust it as you said.
> 
> >
> > I also wonder whether we should add a test for that? Should be easy to
> > verify by sending more than 32 packets. Is there a place in
> > tools/testing/selftests/bpf/xskxceiver.c to add that?
> 
> Well, sorry, if it's not required, please don't force me to do so :S
> The patch is only one simple update of the consumer that is shared
> between user-space and kernel.

My suspicion is that the same issue exists for the zc case. So would
be nice to test it and fix it as well :-p


* Re: [PATCH net-next] net: xsk: update tx queue consumer immediately after transmission
  2025-06-20 15:58     ` Stanislav Fomichev
@ 2025-06-20 16:26       ` Jason Xing
  2025-06-20 16:29         ` Jason Xing
  2025-06-23  5:31       ` Jason Xing
  1 sibling, 1 reply; 9+ messages in thread
From: Jason Xing @ 2025-06-20 16:26 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
	maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
	john.fastabend, joe, willemdebruijn.kernel, bpf, netdev,
	Jason Xing

On Fri, Jun 20, 2025 at 11:58 PM Stanislav Fomichev
<stfomichev@gmail.com> wrote:
>
> On 06/20, Jason Xing wrote:
> > On Fri, Jun 20, 2025 at 10:10 PM Stanislav Fomichev
> > <stfomichev@gmail.com> wrote:
> > >
> > > On 06/19, Jason Xing wrote:
> > > > From: Jason Xing <kernelxing@tencent.com>
> > > >
> > > > For AF_XDP, the return value of the sendto() syscall doesn't reflect how
> > > > many descs were handled in the kernel. One use case is a user-space
> > > > application that wants to know the number of transmitted skbs so it can
> > > > decide whether to keep sending, e.g. whether it was stopped by the max tx
> > > > budget.
> > > >
> > > > The following formula can be used after sending to learn how many
> > > > skbs/descs the kernel took care of:
> > > >
> > > >   tx_queue.consumer_after - tx_queue.consumer_before
> > > >
> > > > Prior to the current patch, the consumer index of the tx queue is not
> > > > immediately updated at the end of each sendto syscall, which leaves the
> > > > consumer value out of date from the perspective of user space. So this
> > > > patch adds a store operation that publishes the cached consumer value to
> > > > the shared ring to handle the problem.
> > > >
> > > > Signed-off-by: Jason Xing <kernelxing@tencent.com>
> > > > ---
> > > >  net/xdp/xsk.c | 2 ++
> > > >  1 file changed, 2 insertions(+)
> > > >
> > > > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> > > > index 7c47f665e9d1..3288ab2d67b4 100644
> > > > --- a/net/xdp/xsk.c
> > > > +++ b/net/xdp/xsk.c
> > > > @@ -856,6 +856,8 @@ static int __xsk_generic_xmit(struct sock *sk)
> > > >       }
> > > >
> > > >  out:
> > > > +     __xskq_cons_release(xs->tx);
> > > > +
> > > >       if (sent_frame)
> > > >               if (xsk_tx_writeable(xs))
> > > >                       sk->sk_write_space(sk);
> > >
> > > So for the "good" case we are going to write the cons twice? From
> > > xskq_cons_peek_desc and from here? Maybe make this __xskq_cons_release
> > > conditional ('if (err)')?
> >
> > One unlikely exception:
> > xskq_cons_peek_desc()->xskq_cons_read_desc()->xskq_cons_is_valid_desc()->return
> > false;
> > ?
> >
> > There are still two possible 'return false' in xskq_cons_peek_desc()
> > while so far I didn't spot a single one happening.
> >
> > Admittedly, your suggestion covers the majority of normal good ones. I
> > can adjust it as you said.
> >
> > >
> > > I also wonder whether we should add a test for that? Should be easy to
> > > verify by sending more than 32 packets. Is there a place in
> > > tools/testing/selftests/bpf/xskxceiver.c to add that?
> >
> > Well, sorry, if it's not required, please don't force me to do so :S
> > The patch is only one simple update of the consumer that is shared
> > between user-space and kernel.
>
> My suspicion is that the same issue exists for the zc case. So would
> be nice to test it and fix it as well :-p

Oh, well, I will take a look at how the selftest works in the next few days.

Allow me to ask the question that you asked me before: even though I
didn't see the necessity of setting the max budget for zc mode (just
because I didn't spot it being hit), would it be better if we separated
the two, since it's a uAPI interface? IIUC, once the setsockopt is in
place, we will not be able to separate them any more in the future?

Or we could keep using the hardcoded value (32) in the zc mode like
before and __only__ touch the copy mode? Then, if someone (or I) later
finds it worth making tunable, another parameter of the setsockopt can
be added? Does that make sense?

Thanks,
Jason


* Re: [PATCH net-next] net: xsk: update tx queue consumer immediately after transmission
  2025-06-20 16:26       ` Jason Xing
@ 2025-06-20 16:29         ` Jason Xing
  0 siblings, 0 replies; 9+ messages in thread
From: Jason Xing @ 2025-06-20 16:29 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
	maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
	john.fastabend, joe, willemdebruijn.kernel, bpf, netdev,
	Jason Xing

> Allow me to ask the question that you asked me before: even though I
> didn't see the necessity to set the max budget for zc mode (just
> because I didn't spot it happening), would it be better if we separate
> both of them because it's an uAPI interface. IIUC, if the setsockopt
> is set, we will not separate it any more in the future?
>
> Or we can keep using the hardcoded value (32) in the zc mode like
> before and __only__ touch the copy mode? Then if someone or I found
> the significance of making it tunable, then another parameter of
> setsockopt can be added? Does it make sense?

I found that I replied to the wrong thread. Let me copy & paste it there instead.

Thanks,
Jason


* Re: [PATCH net-next] net: xsk: update tx queue consumer immediately after transmission
  2025-06-20 15:58     ` Stanislav Fomichev
  2025-06-20 16:26       ` Jason Xing
@ 2025-06-23  5:31       ` Jason Xing
  1 sibling, 0 replies; 9+ messages in thread
From: Jason Xing @ 2025-06-23  5:31 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
	maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
	john.fastabend, joe, willemdebruijn.kernel, bpf, netdev,
	Jason Xing

On Fri, Jun 20, 2025 at 11:58 PM Stanislav Fomichev
<stfomichev@gmail.com> wrote:
>
> On 06/20, Jason Xing wrote:
> > On Fri, Jun 20, 2025 at 10:10 PM Stanislav Fomichev
> > <stfomichev@gmail.com> wrote:
> > >
> > > On 06/19, Jason Xing wrote:
> > > > From: Jason Xing <kernelxing@tencent.com>
> > > >
> > > > For AF_XDP, the return value of the sendto() syscall doesn't reflect how
> > > > many descs were handled in the kernel. One use case is a user-space
> > > > application that wants to know the number of transmitted skbs so it can
> > > > decide whether to keep sending, e.g. whether it was stopped by the max tx
> > > > budget.
> > > >
> > > > The following formula can be used after sending to learn how many
> > > > skbs/descs the kernel took care of:
> > > >
> > > >   tx_queue.consumer_after - tx_queue.consumer_before
> > > >
> > > > Prior to the current patch, the consumer index of the tx queue is not
> > > > immediately updated at the end of each sendto syscall, which leaves the
> > > > consumer value out of date from the perspective of user space. So this
> > > > patch adds a store operation that publishes the cached consumer value to
> > > > the shared ring to handle the problem.
> > > >
> > > > Signed-off-by: Jason Xing <kernelxing@tencent.com>
> > > > ---
> > > >  net/xdp/xsk.c | 2 ++
> > > >  1 file changed, 2 insertions(+)
> > > >
> > > > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> > > > index 7c47f665e9d1..3288ab2d67b4 100644
> > > > --- a/net/xdp/xsk.c
> > > > +++ b/net/xdp/xsk.c
> > > > @@ -856,6 +856,8 @@ static int __xsk_generic_xmit(struct sock *sk)
> > > >       }
> > > >
> > > >  out:
> > > > +     __xskq_cons_release(xs->tx);
> > > > +
> > > >       if (sent_frame)
> > > >               if (xsk_tx_writeable(xs))
> > > >                       sk->sk_write_space(sk);
> > >
> > > So for the "good" case we are going to write the cons twice? From
> > > xskq_cons_peek_desc and from here? Maybe make this __xskq_cons_release
> > > conditional ('if (err)')?
> >
> > One unlikely exception:
> > xskq_cons_peek_desc()->xskq_cons_read_desc()->xskq_cons_is_valid_desc()->return
> > false;
> > ?
> >
> > There are still two possible 'return false' in xskq_cons_peek_desc()
> > while so far I didn't spot a single one happening.
> >
> > Admittedly, your suggestion covers the majority of normal good ones. I
> > can adjust it as you said.
> >
> > >
> > > I also wonder whether we should add a test for that? Should be easy to
> > > verify by sending more than 32 packets. Is there a place in
> > > tools/testing/selftests/bpf/xskxceiver.c to add that?
> >
> > Well, sorry, if it's not required, please don't force me to do so :S
> > The patch is only one simple update of the consumer that is shared
> > between user-space and kernel.
>
> My suspicion is that the same issue exists for the zc case. So would
> be nice to test it and fix it as well :-p

After digging into the logic around xsk_tx_peek_desc(), I can say that
at the end of every caller of xsk_tx_peek_desc() there is always an
xsk_tx_release() call that is used to propagate the local consumer to
the global consumer state. So for the zero copy mode, there is no need
to change anything at all :)
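
I.e. the usual zero-copy driver tx path already follows roughly this
pattern (just a sketch, driver details omitted):

	while (budget-- && xsk_tx_peek_desc(pool, &desc)) {
		/* post desc to the HW tx ring ... */
	}
	/* propagates the cached consumer to the shared ring */
	xsk_tx_release(pool);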

I will soon send v2 with the 'if (err)' check in __xsk_generic_xmit().

Thanks,
Jason


Thread overview: 9+ messages
2025-06-19  9:36 [PATCH net-next] net: xsk: update tx queue consumer immediately after transmission Jason Xing
2025-06-20 14:10 ` Stanislav Fomichev
2025-06-20 15:25   ` Jason Xing
2025-06-20 15:58     ` Stanislav Fomichev
2025-06-20 16:26       ` Jason Xing
2025-06-20 16:29         ` Jason Xing
2025-06-23  5:31       ` Jason Xing
2025-06-20 15:35   ` Maciej Fijalkowski
2025-06-20 15:42     ` Jason Xing
