All of lore.kernel.org
 help / color / mirror / Atom feed
From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
To: Jason Xing <kerneljasonxing@gmail.com>
Cc: <davem@davemloft.net>, <edumazet@google.com>, <kuba@kernel.org>,
	<pabeni@redhat.com>, <bjorn@kernel.org>,
	<magnus.karlsson@intel.com>, <jonathan.lemon@gmail.com>,
	<sdf@fomichev.me>, <ast@kernel.org>, <daniel@iogearbox.net>,
	<hawk@kernel.org>, <john.fastabend@gmail.com>, <joe@dama.to>,
	<willemdebruijn.kernel@gmail.com>, <fmancera@suse.de>,
	<csmate@nop.hu>, <bpf@vger.kernel.org>, <netdev@vger.kernel.org>,
	Jason Xing <kernelxing@tencent.com>
Subject: Re: [PATCH RFC net-next 2/2] xsk: introduce a cached cq to temporarily store descriptor addrs
Date: Fri, 31 Oct 2025 15:02:18 +0100	[thread overview]
Message-ID: <aQTBajODN3Nnskta@boxer> (raw)
In-Reply-To: <20251031093230.82386-3-kerneljasonxing@gmail.com>

On Fri, Oct 31, 2025 at 05:32:30PM +0800, Jason Xing wrote:
> From: Jason Xing <kernelxing@tencent.com>
> 
> Before the commit 30f241fcf52a ("xsk: Fix immature cq descriptor
> production"), there is one issue[1] which causes the wrong publish
> of descriptors in race condidtion. The above commit fixes the issue
> but adds more memory operations in the xmit hot path and interrupt
> context, which can cause side effect in performance.
> 
> This patch tries to propose a new solution to fix the problem
> without manipulating the allocation and deallocation of memory. One
> of the key points is that I borrowed the idea from the above commit
> that postpones updating the ring->descs in xsk_destruct_skb()
> instead of in __xsk_generic_xmit().
> 
> The core logics are as show below:
> 1. allocate a new local queue. Only its cached_prod member is used.
> 2. write the descriptors into the local queue in the xmit path. And
>    record the cached_prod as @start_addr that reflects the
>    start position of this queue so that later the skb can easily
>    find where its addrs are written in the destruction phase.
> 3. initialize the upper 24 bits of destructor_arg to store @start_addr
>    in xsk_skb_init_misc().
> 4. Initialize the lower 8 bits of destructor_arg to store how many
>    descriptors the skb owns in xsk_update_num_desc().
> 5. write the desc addr(s) from the @start_addr from the cached cq
>    one by one into the real cq in xsk_destruct_skb(). In turn sync
>    the global state of the cq.
> 
> The format of destructor_arg is designed as:
>  ------------------------ --------
> |       start_addr       |  num   |
>  ------------------------ --------
> Using upper 24 bits is enough to keep the temporary descriptors. And
> it's also enough to use lower 8 bits to show the number of descriptors
> that one skb owns.
> 
> [1]: https://lore.kernel.org/all/20250530095957.43248-1-e.kubanski@partner.samsung.com/
> 
> Signed-off-by: Jason Xing <kernelxing@tencent.com>
> ---
> I posted the series as an RFC because I'd like to hear more opinions on
> the current rought approach so that the fix[2] can be avoided and
> mitigate the impact of performance. This patch might have bugs because
> I decided to spend more time on it after we come to an agreement. Please
> review the overall concepts. Thanks!
> 
> Maciej, could you share with me the way you tested jumbo frame? I used
> ./xdpsock -i enp2s0f1 -t -q 1 -S -s 9728 but the xdpsock utilizes the
> nic more than 90%, which means I cannot see the performance impact.
> 
> [2]:https://lore.kernel.org/all/20251030140355.4059-1-fmancera@suse.de/
> ---
>  include/net/xdp_sock.h      |   1 +
>  include/net/xsk_buff_pool.h |   1 +
>  net/xdp/xsk.c               | 104 ++++++++++++++++++++++++++++--------
>  net/xdp/xsk_buff_pool.c     |   1 +
>  4 files changed, 84 insertions(+), 23 deletions(-)

(...)

> diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c
> index aa9788f20d0d..6e170107dec7 100644
> --- a/net/xdp/xsk_buff_pool.c
> +++ b/net/xdp/xsk_buff_pool.c
> @@ -99,6 +99,7 @@ struct xsk_buff_pool *xp_create_and_assign_umem(struct xdp_sock *xs,
>  
>  	pool->fq = xs->fq_tmp;
>  	pool->cq = xs->cq_tmp;
> +	pool->cached_cq = xs->cached_cq;

Jason,

pool can be shared between multiple sockets that bind to same <netdev,qid>
tuple. I believe here you're opening up for the very same issue Eryk
initially reported.

>  
>  	for (i = 0; i < pool->free_heads_cnt; i++) {
>  		xskb = &pool->heads[i];
> -- 
> 2.41.3
> 

  reply	other threads:[~2025-10-31 14:02 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-10-31  9:32 [PATCH RFC net-next 0/2] xsk: fix immature cq descriptor production (II) Jason Xing
2025-10-31  9:32 ` [PATCH RFC net-next 1/2] Revert "xsk: Fix immature cq descriptor production" Jason Xing
2025-10-31  9:32 ` [PATCH RFC net-next 2/2] xsk: introduce a cached cq to temporarily store descriptor addrs Jason Xing
2025-10-31 14:02   ` Maciej Fijalkowski [this message]
2025-10-31 23:59     ` Jason Xing
2025-11-03 15:00       ` Maciej Fijalkowski
2025-11-03 23:29         ` Jason Xing
2025-11-11 13:03         ` Jason Xing
2025-11-11 13:44           ` Magnus Karlsson
2025-11-11 14:02             ` Jason Xing
2025-11-14 15:52               ` Maciej Fijalkowski
2025-11-14 23:46                 ` Jason Xing
2025-11-17 16:05                   ` Maciej Fijalkowski
2025-11-18  0:01                     ` Jason Xing
2025-11-18 10:14                       ` Maciej Fijalkowski
2025-11-18 11:40                         ` Jason Xing
2025-11-18 18:28                           ` Maciej Fijalkowski
2025-11-18 23:37                             ` Jason Xing
2025-11-19  9:20                               ` Jason Xing

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aQTBajODN3Nnskta@boxer \
    --to=maciej.fijalkowski@intel.com \
    --cc=ast@kernel.org \
    --cc=bjorn@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=csmate@nop.hu \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=fmancera@suse.de \
    --cc=hawk@kernel.org \
    --cc=joe@dama.to \
    --cc=john.fastabend@gmail.com \
    --cc=jonathan.lemon@gmail.com \
    --cc=kerneljasonxing@gmail.com \
    --cc=kernelxing@tencent.com \
    --cc=kuba@kernel.org \
    --cc=magnus.karlsson@intel.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=sdf@fomichev.me \
    --cc=willemdebruijn.kernel@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.