From: Jesper Dangaard Brouer <hawk@kernel.org>
To: Dragos Tatulea <dtatulea@nvidia.com>,
	Chris Arges <carges@cloudflare.com>
Cc: Jesse Brandeburg <jbrandeburg@cloudflare.com>,
	netdev@vger.kernel.org, bpf@vger.kernel.org,
	kernel-team <kernel-team@cloudflare.com>,
	tariqt@nvidia.com, saeedm@nvidia.com,
	Leon Romanovsky <leon@kernel.org>,
	Andrew Lunn <andrew+netdev@lunn.ch>,
	"David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	John Fastabend <john.fastabend@gmail.com>,
	Simon Horman <horms@kernel.org>,
	Andrew Rzeznik <arzeznik@cloudflare.com>,
	Yan Zhai <yan@cloudflare.com>,
	Kumar Kartikeya Dwivedi <memxor@gmail.com>
Subject: Re: [BUG] mlx5_core memory management issue
Date: Thu, 14 Aug 2025 13:26:37 +0200	[thread overview]
Message-ID: <e60404e2-4782-409f-8596-ae21ce7272c4@kernel.org> (raw)
In-Reply-To: <4zkm7dmkxhfhf3cm7eniim26z6nbp3zsm4qttapg3xbvkrqhro@cvjnbr624m5h>

[-- Attachment #1: Type: text/plain, Size: 4328 bytes --]



On 13/08/2025 22.24, Dragos Tatulea wrote:
> On Wed, Aug 13, 2025 at 07:26:49PM +0000, Dragos Tatulea wrote:
>> On Wed, Aug 13, 2025 at 01:53:48PM -0500, Chris Arges wrote:
>>> On 2025-08-12 16:25:58, Chris Arges wrote:
>>>> On 2025-08-12 20:19:30, Dragos Tatulea wrote:
>>>>> On Tue, Aug 12, 2025 at 11:55:39AM -0700, Jesse Brandeburg wrote:
>>>>>> On 8/12/25 8:44 AM, 'Dragos Tatulea' via kernel-team wrote:
>>>>>>
>>>>>>> diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
>>>>>>> index 482d284a1553..484216c7454d 100644
>>>>>>> --- a/kernel/bpf/devmap.c
>>>>>>> +++ b/kernel/bpf/devmap.c
>>>>>>> @@ -408,8 +408,10 @@ static void bq_xmit_all(struct xdp_dev_bulk_queue *bq, u32 flags)
>>>>>>>           /* If not all frames have been transmitted, it is our
>>>>>>>            * responsibility to free them
>>>>>>>            */
>>>>>>> +       xdp_set_return_frame_no_direct();
>>>>>>>           for (i = sent; unlikely(i < to_send); i++)
>>>>>>>                   xdp_return_frame_rx_napi(bq->q[i]);
>>>>>>> +       xdp_clear_return_frame_no_direct();
>>>>>>
>>>>>> Why can't this instead just be xdp_return_frame(bq->q[i]); with no
>>>>>> "no_direct" fussing?
>>>>>>
>>>>>> Wouldn't this be the safest way for this function to call frame completion?
>>>>>> It seems like presuming the calling context is napi is wrong?
>>>>>>
>>>>> It would be better indeed. Thanks for removing my horse glasses!
>>>>>
>>>>> Once Chris verifies that this works for him I can prepare a fix patch.
>>>>>
>>>> Working on that now, I'm testing a kernel with the following change:
>>>>
>>>> ---
>>>>
>>>> diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
>>>> index 3aa002a47..ef86d9e06 100644
>>>> --- a/kernel/bpf/devmap.c
>>>> +++ b/kernel/bpf/devmap.c
>>>> @@ -409,7 +409,7 @@ static void bq_xmit_all(struct xdp_dev_bulk_queue *bq, u32 flags)
>>>>           * responsibility to free them
>>>>           */
>>>>          for (i = sent; unlikely(i < to_send); i++)
>>>> -               xdp_return_frame_rx_napi(bq->q[i]);
>>>> +               xdp_return_frame(bq->q[i]);
>>>>   
>>>>   out:
>>>>          bq->count = 0;
>>>
>>> This patch resolves the issue I was seeing and I am no longer able to
>>> reproduce the issue. I tested for about 2 hours, when the reproducer usually
>>> takes about 1-2 minutes.
>>>
>> Thanks! Will send a patch tomorrow and also add you in the Tested-by tag.
>>

Looking at the code ... there are more cases we need to deal with if we
simply replace xdp_return_frame_rx_napi() with xdp_return_frame().

The normal way to fix this is to use the helpers:
  - xdp_set_return_frame_no_direct();
  - xdp_clear_return_frame_no_direct()

With these set, the __xdp_return() code[1] sees
xdp_return_frame_no_direct() return true and ignores napi_direct requests.

  [1] https://elixir.bootlin.com/linux/v6.16/source/net/core/xdp.c#L439
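To illustrate the check in [1], here is a minimal user-space sketch of how
the "no_direct" flag overrides a napi_direct request. This is a model, not
the kernel code: struct model_ctx and every model_* name are made up for
illustration, and the flag word only stands in for the per-context state
that the real helpers set and clear.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the "no direct return" bit in the context flags. */
#define MODEL_F_RF_NO_DIRECT (1u << 0)

/* Hypothetical stand-in for the per-context redirect state. */
struct model_ctx {
	unsigned int kern_flags;
};

/* Models xdp_set_return_frame_no_direct(). */
static void model_set_no_direct(struct model_ctx *ctx)
{
	ctx->kern_flags |= MODEL_F_RF_NO_DIRECT;
}

/* Models xdp_clear_return_frame_no_direct(). */
static void model_clear_no_direct(struct model_ctx *ctx)
{
	ctx->kern_flags &= ~MODEL_F_RF_NO_DIRECT;
}

/* Models xdp_return_frame_no_direct(). */
static bool model_no_direct(const struct model_ctx *ctx)
{
	return ctx->kern_flags & MODEL_F_RF_NO_DIRECT;
}

/*
 * Models the page_pool branch of the return path: the caller may ask
 * for a direct return, but the request is dropped while the flag is set.
 * Returns the napi_direct value that would reach the page_pool.
 */
static bool model_effective_napi_direct(const struct model_ctx *ctx,
					bool napi_direct)
{
	if (napi_direct && model_no_direct(ctx))
		napi_direct = false;
	return napi_direct;
}
```

The point of the sketch is that the flag only protects returns that happen
while it is set; anything that frees frames after the clear call gets the
direct path again.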

Something doesn't add up, because the remote CPUMAP bpf-prog that
redirects to veth is running in cpu_map_bpf_prog_run_xdp()[2] and that
function already uses the xdp_set_return_frame_no_direct() helper.

  [2] https://elixir.bootlin.com/linux/v6.16/source/kernel/bpf/cpumap.c#L189

I see the bug now... a patch with the fix is attached.
The "no_direct" scope did not wrap the xdp_do_flush() call.

Looks like the bug was introduced in 11941f8a8536 ("bpf: cpumap: Implement
generic cpumap"), v5.15.
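A rough user-space model of the scope problem, assuming the worst case
where the flush has to free every queued frame itself (as bq_xmit_all()
does for frames the driver did not send). All model_* names and the struct
are hypothetical; the only thing the sketch tries to show is the ordering
of the clear call relative to the flush.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical state: flag, frames parked by redirect, unsafe frees seen. */
struct model_state {
	bool no_direct;
	size_t queued;
	size_t direct_returns;
};

/* Models the bpf-prog run: redirected frames are only queued, not freed. */
static void model_run_prog(struct model_state *s, size_t nframes)
{
	s->queued += nframes;
}

/* Models the flush: unsent frames are freed here, under the current flag. */
static void model_flush(struct model_state *s)
{
	while (s->queued) {
		s->queued--;
		if (!s->no_direct)
			s->direct_returns++;
	}
}

/* Old ordering: flag cleared before the flush runs. */
static size_t model_old_scope(size_t nframes)
{
	struct model_state s = { 0 };

	s.no_direct = true;
	model_run_prog(&s, nframes);
	s.no_direct = false;
	model_flush(&s);
	return s.direct_returns;
}

/* Fixed ordering: flag stays set until after the flush. */
static size_t model_new_scope(size_t nframes)
{
	struct model_state s = { 0 };

	s.no_direct = true;
	model_run_prog(&s, nframes);
	model_flush(&s);
	s.no_direct = false;
	return s.direct_returns;
}
```

In this model the old ordering frees every queued frame with the direct
optimization enabled, while the fixed ordering frees none that way.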

>> As follow up work it would be good to have a way to catch this family of
>> issues. Something in the lines of the patch below.
>>

Yes, please, we want something that can catch this kind of hard-to-find
bug.

>> Thanks,
>> Dragos
>>
>> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
>> index f1373756cd0f..0c498fbd8df6 100644
>> --- a/net/core/page_pool.c
>> +++ b/net/core/page_pool.c
>> @@ -794,6 +794,10 @@ __page_pool_put_page(struct page_pool *pool, netmem_ref netmem,
>>   {
>>          lockdep_assert_no_hardirq();
>>   
>> +#ifdef CONFIG_PAGE_POOL_CACHEDEBUG
>> +       WARN(page_pool_napi_local(pool), "Page pool cache access from non-direct napi context");
> I meant to negate the condition here.
> 

The XDP code has evolved since the xdp_set_return_frame_no_direct()
calls were added.  Now page_pool keeps track of pp->napi and
pool->cpuid.  Maybe the __xdp_return [1] checks should be updated?
(and maybe that would allow us to remove the no_direct helpers).
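
As a sketch of what such an updated check could look like, here is a
user-space model that only honours a direct request when the pool is
local to the CPU doing the return. This is an idea under stated
assumptions, not the kernel implementation: the struct, the field names
and the model_* functions are hypothetical, and the locality rule (pool
pinned to this CPU, or the pool's NAPI currently owned by this CPU) is my
simplified reading of what pp->napi and pool->cpuid make possible.

```c
#include <stdbool.h>

/* Hypothetical marker for "no CPU recorded". */
#define MODEL_NO_CPU (-1)

/* Hypothetical view of the locality info a pool could carry. */
struct model_pool {
	int cpuid;          /* CPU the pool is pinned to, or MODEL_NO_CPU */
	int napi_owner_cpu; /* CPU running the pool's NAPI, or MODEL_NO_CPU */
};

/* True when the returning CPU is the one that owns the pool's cache. */
static bool model_pool_is_local(const struct model_pool *pool, int this_cpu)
{
	if (this_cpu < 0)
		return false;
	if (pool->cpuid == this_cpu)
		return true;
	return pool->napi_owner_cpu == this_cpu;
}

/* A direct return is allowed only if requested and the pool is local. */
static bool model_allow_direct(const struct model_pool *pool, int this_cpu,
			       bool napi_direct)
{
	return napi_direct && model_pool_is_local(pool, this_cpu);
}
```

With a check of this shape the decision depends on the pool and the
current CPU rather than on a flag the caller has to remember to set around
every path that may free frames.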

--Jesper

[-- Attachment #2: 01-cpumap-disable-pp-direct.patch --]
[-- Type: text/x-patch, Size: 2101 bytes --]

cpumap: disable page_pool direct xdp_return need larger scope

From: Jesper Dangaard Brouer <hawk@kernel.org>

When running an XDP bpf_prog on the remote CPU in cpumap code
then we must disable the direct return optimization that
xdp_return can perform for mem_type page_pool.  This optimization
assumes code is still executing under RX-NAPI of the original
receiving CPU, which isn't true on this remote CPU.

The cpumap code already disabled this via helpers
xdp_set_return_frame_no_direct() and xdp_clear_return_frame_no_direct(),
but the scope didn't include xdp_do_flush().

When doing XDP_REDIRECT towards e.g. devmap this causes the
function bq_xmit_all() to run with direct return optimization
enabled. This can lead to hard-to-find bugs.

Fix by expanding scope to include xdp_do_flush().

Fixes: 11941f8a8536 ("bpf: cpumap: Implement generic cpumap")
Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org>
---
 kernel/bpf/cpumap.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c
index b2b7b8ec2c2a..c46360b27871 100644
--- a/kernel/bpf/cpumap.c
+++ b/kernel/bpf/cpumap.c
@@ -186,7 +186,6 @@ static int cpu_map_bpf_prog_run_xdp(struct bpf_cpu_map_entry *rcpu,
 	struct xdp_buff xdp;
 	int i, nframes = 0;
 
-	xdp_set_return_frame_no_direct();
 	xdp.rxq = &rxq;
 
 	for (i = 0; i < n; i++) {
@@ -231,7 +230,6 @@ static int cpu_map_bpf_prog_run_xdp(struct bpf_cpu_map_entry *rcpu,
 		}
 	}
 
-	xdp_clear_return_frame_no_direct();
 	stats->pass += nframes;
 
 	return nframes;
@@ -255,6 +253,7 @@ static void cpu_map_bpf_prog_run(struct bpf_cpu_map_entry *rcpu, void **frames,
 
 	rcu_read_lock();
 	bpf_net_ctx = bpf_net_ctx_set(&__bpf_net_ctx);
+	xdp_set_return_frame_no_direct();
 
 	ret->xdp_n = cpu_map_bpf_prog_run_xdp(rcpu, frames, ret->xdp_n, stats);
 	if (unlikely(ret->skb_n))
@@ -264,6 +263,7 @@ static void cpu_map_bpf_prog_run(struct bpf_cpu_map_entry *rcpu, void **frames,
 	if (stats->redirect)
 		xdp_do_flush();
 
+	xdp_clear_return_frame_no_direct();
 	bpf_net_ctx_clear(bpf_net_ctx);
 	rcu_read_unlock();
 
