Linux Kernel Selftest development
 help / color / mirror / Atom feed
From: Bastien Curutchet <bastien.curutchet@bootlin.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>,
	Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Cc: "Björn Töpel" <bjorn@kernel.org>,
	"Magnus Karlsson" <magnus.karlsson@intel.com>,
	"Jonathan Lemon" <jonathan.lemon@gmail.com>,
	"Alexei Starovoitov" <ast@kernel.org>,
	"Daniel Borkmann" <daniel@iogearbox.net>,
	"Andrii Nakryiko" <andrii@kernel.org>,
	"Martin KaFai Lau" <martin.lau@linux.dev>,
	"Eduard Zingerman" <eddyz87@gmail.com>,
	"Song Liu" <song@kernel.org>,
	"Yonghong Song" <yonghong.song@linux.dev>,
	"John Fastabend" <john.fastabend@gmail.com>,
	"KP Singh" <kpsingh@kernel.org>,
	"Stanislav Fomichev" <sdf@fomichev.me>,
	"Hao Luo" <haoluo@google.com>, "Jiri Olsa" <jolsa@kernel.org>,
	"Mykola Lysenko" <mykolal@fb.com>,
	"Shuah Khan" <shuah@kernel.org>,
	"David S. Miller" <davem@davemloft.net>,
	"Jakub Kicinski" <kuba@kernel.org>,
	"Jesper Dangaard Brouer" <hawk@kernel.org>,
	"Thomas Petazzoni" <thomas.petazzoni@bootlin.com>,
	"Alexis Lothore" <alexis.lothore@bootlin.com>,
	"Network Development" <netdev@vger.kernel.org>,
	bpf <bpf@vger.kernel.org>,
	"open list:KERNEL SELFTEST FRAMEWORK"
	<linux-kselftest@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH bpf-next v4 04/15] selftests/bpf: test_xsk: fix memory leak in testapp_stats_rx_dropped()
Date: Mon, 29 Sep 2025 10:57:45 +0200	[thread overview]
Message-ID: <4c4edfc4-f69e-43d3-8bb7-95d00bad45c5@bootlin.com> (raw)
In-Reply-To: <CAADnVQ+bBofJDfieyOYzSmSujSfJwDTQhiz3aJw7hE+4E2_iPA@mail.gmail.com>

On 9/27/25 1:19 PM, Alexei Starovoitov wrote:
> On Fri, Sep 26, 2025 at 12:47 PM Maciej Fijalkowski
> <maciej.fijalkowski@intel.com> wrote:
>>
>> On Fri, Sep 26, 2025 at 08:39:28AM +0200, Bastien Curutchet wrote:
>>> Hi Maciej,
>>>
>>> On 9/25/25 3:32 PM, Maciej Fijalkowski wrote:
>>>> On Wed, Sep 24, 2025 at 04:49:39PM +0200, Bastien Curutchet (eBPF Foundation) wrote:
>>>>> testapp_stats_rx_dropped() generates pkt_stream twice. The last
>>>>> generated is released by pkt_stream_restore_default() at the end of the
>>>>> test but we lose the pointer of the first pkt_stream.
>>>>>
>>>>> Release the 'middle' pkt_stream when it's getting replaced to prevent
>>>>> memory leaks.
>>>>>
>>>>> Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
>>>>> ---
>>>>>    tools/testing/selftests/bpf/test_xsk.c | 7 +++++++
>>>>>    1 file changed, 7 insertions(+)
>>>>>
>>>>> diff --git a/tools/testing/selftests/bpf/test_xsk.c b/tools/testing/selftests/bpf/test_xsk.c
>>>>> index 8d7c38eb32ca3537cb019f120c3350ebd9f8c6bc..eb18288ea1e4aa1c9337d16333b7174ecaed0999 100644
>>>>> --- a/tools/testing/selftests/bpf/test_xsk.c
>>>>> +++ b/tools/testing/selftests/bpf/test_xsk.c
>>>>> @@ -536,6 +536,13 @@ static void pkt_stream_receive_half(struct test_spec *test)
>>>>>            struct pkt_stream *pkt_stream = test->ifobj_tx->xsk->pkt_stream;
>>>>>            u32 i;
>>>>> + if (test->ifobj_rx->xsk->pkt_stream != test->rx_pkt_stream_default)
>>>>> +         /* Packet stream has already been replaced so we have to release this one.
>>>>> +          * The newly created one will be freed by the restore_default() at the
>>>>> +          * end of the test
>>>>> +          */
>>>>> +         pkt_stream_delete(test->ifobj_rx->xsk->pkt_stream);
>>>>
>>>> I don't see why this one is not addressed within test case
>>>> (testapp_stats_rx_dropped()) and other fix is (testapp_xdp_shared_umem()).
>>>>
>>>
>>> pkt_stream_receive_half() can be used by other tests. I thought it would be
>>
>> So is pkt_stream_replace_half() and other routines that eventually call
>> pkt_stream_generate() and overwrite the pkt_stream, right?
>>
>> It just feels odd to have a special treatment in one function and other
>> are left as-is just because currently we don't have another abusive test
>> case.
>>
>> Maybe it's enough of bike-shedding here, just wanted to clarify on my POV.
>>
>> In the end don't get me wrong here, this interface is a bit PITA for me
>> and thanks for whole effort!
> 
> My reading of this discussion that it doesn't block the series
> and can be done in the follow up if necessary.
> 
> So I was planning to apply it, but it found real bugs:
> 
> ./test_progs -t xsk
> [   18.066989] bpf_testmod: loading out-of-tree module taints kernel.
> [   32.204881] BUG: Bad page state in process test_progs  pfn:11c98b
> [   32.207167] page: refcount:0 mapcount:0 mapping:0000000000000000
> index:0x0 pfn:0x11c98b
> [   32.210084] flags: 0x1fffe0000000000(node=0|zone=1|lastcpupid=0x7fff)
> [   32.212493] raw: 01fffe0000000000 dead000000000040 ff11000123c9b000
> 0000000000000000
> [   32.218056] raw: 0000000000000000 0000000000000001 00000000ffffffff
> 0000000000000000
> [   32.220900] page dumped because: page_pool leak
> [   32.222636] Modules linked in: bpf_testmod(O) bpf_preload
> [   32.224632] CPU: 6 UID: 0 PID: 3612 Comm: test_progs Tainted: G
>        O        6.17.0-rc5-gfec474d29325 #6969 PREEMPT
> [   32.224638] Tainted: [O]=OOT_MODULE
> [   32.224639] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
> [   32.224641] Call Trace:
> [   32.224644]  <IRQ>
> [   32.224646]  dump_stack_lvl+0x4b/0x70
> [   32.224653]  bad_page.cold+0xbd/0xe0
> [   32.224657]  __free_frozen_pages+0x838/0x10b0
> [   32.224660]  ? skb_pp_cow_data+0x782/0xc30
> [   32.224665]  bpf_xdp_shrink_data+0x221/0x530
> [   32.224668]  ? skb_pp_cow_data+0x6d1/0xc30
> [   32.224671]  bpf_xdp_adjust_tail+0x598/0x810
> [   32.224673]  ? xsk_destruct_skb+0x321/0x800
> [   32.224678]  bpf_prog_004ac6bb21de57a7_xsk_xdp_adjust_tail+0x52/0xd6
> [   32.224681]  veth_xdp_rcv_skb+0x45d/0x15a0
> [   32.224684]  ? get_stack_info_noinstr+0x16/0xe0
> [   32.224688]  ? veth_set_channels+0x920/0x920
> [   32.224691]  ? get_stack_info+0x2f/0x80
> [   32.224693]  ? unwind_next_frame+0x3af/0x1df0
> [   32.224697]  veth_xdp_rcv.constprop.0+0x38a/0xbe0
> [   32.224700]  ? common_startup_64+0x13e/0x148
> [   32.224703]  ? veth_xdp_rcv_one+0xcd0/0xcd0
> [   32.224706]  ? stack_trace_save+0x84/0xa0
> [   32.224709]  ? stack_depot_save_flags+0x28/0x820
> [   32.224713]  ? __resched_curr.constprop.0+0x332/0x3b0
> [   32.224716]  ? timerqueue_add+0x217/0x320
> [   32.224719]  veth_poll+0x115/0x5e0
> [   32.224722]  ? veth_xdp_rcv.constprop.0+0xbe0/0xbe0
> [   32.224726]  ? update_load_avg+0x1cb/0x12d0
> [   32.224730]  ? update_cfs_group+0x121/0x2c0
> [   32.224733]  __napi_poll+0xa0/0x420
> [   32.224736]  net_rx_action+0x901/0xe90
> [   32.224740]  ? run_backlog_napi+0x50/0x50
> [   32.224743]  ? clockevents_program_event+0x1cc/0x280
> [   32.224746]  ? hrtimer_interrupt+0x31e/0x7c0
> [   32.224749]  handle_softirqs+0x151/0x430
> [   32.224752]  do_softirq+0x3f/0x60
> [   32.224755]  </IRQ>
> [   32.224756]  <TASK>
> [   32.224757]  __local_bh_enable_ip+0x58/0x60
> [   32.224759]  __dev_direct_xmit+0x295/0x540
> [   32.224762]  __xsk_generic_xmit+0x180a/0x2df0
> [   32.224764]  ? ___kmalloc_large_node+0xdf/0x130
> [   32.224767]  ? __mutex_unlock_slowpath.isra.0+0x330/0x330
> [   32.224770]  ? __rtnl_unlock+0x65/0xd0
> [   32.224773]  ? xsk_create+0x700/0x700
> [   32.224774]  ? netdev_run_todo+0xce/0xbe0
> [   32.224777]  ? _raw_spin_lock_irqsave+0x7b/0xc0
> [   32.224780]  xsk_sendmsg+0x365/0x770
> [   32.224782]  ? xsk_poll+0x640/0x640
> [   32.224783]  __sock_sendmsg+0xc1/0x150
> [   32.224787]  __sys_sendto+0x1d0/0x260
> [   32.224790]  ? __ia32_sys_getpeername+0xb0/0xb0
> [   32.224793]  ? fput+0x29/0x80
> [   32.224796]  ? __sys_bind+0x187/0x1c0
> [   32.224798]  ? __sys_bind_socket+0x90/0x90
> [   32.224801]  ? randomize_page+0x60/0x60
> [   32.224804]  ? fget+0x18e/0x230
> [   32.224807]  __x64_sys_sendto+0xe0/0x1b0
> [   32.224810]  ? fpregs_assert_state_consistent+0x57/0xe0
> [   32.224812]  do_syscall_64+0x46/0x180
> [   32.224815]  entry_SYSCALL_64_after_hwframe+0x4b/0x53
> 
> and at the end:
> 
> # ERROR: [receive_pkts] Receive loop timed out
> test_xsk:FAIL:Run test unexpected error: -1 (errno 12)
> #251/32  ns_xsk_drv/XDP_ADJUST_TAIL_SHRINK_MULTI_BUFF:FAIL
> #251     ns_xsk_drv:FAIL
> Summary: 1/67 PASSED, 0 SKIPPED, 1 FAILED
> 
> [   99.308243] page_pool_release_retry() stalled pool shutdown: id
> 185, 48 inflight 60 sec
> [  159.724173] page_pool_release_retry() stalled pool shutdown: id
> 185, 48 inflight 120 sec
> 
> 
> The test is great and the work to make it run as part of test_progs
> paid off big time.
> 
> But we cannot enable it by default, since it will be crashing CI VMs.
> 
> Please reproduce the above issue.
> You might need
> CONFIG_DEBUG_VM=y
> and other mm debug flags.
> 

I did reproduce the issue with CONFIG_DEBUG_VM=y

> If the fix can be done quickly let's land the fix first.
> If not, please respin the series, but disable the test by default
> until the bug is fixed.

I won't have much time this week to investigate this further, so I'll 
respin the series with this test in the 'flaky table'.

Best regards,
-- 
Bastien Curutchet, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com


  reply	other threads:[~2025-09-29  8:58 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-09-24 14:49 [PATCH bpf-next v4 00/15] selftests/bpf: Integrate test_xsk.c to test_progs framework Bastien Curutchet (eBPF Foundation)
2025-09-24 14:49 ` [PATCH bpf-next v4 01/15] selftests/bpf: test_xsk: Split xskxceiver Bastien Curutchet (eBPF Foundation)
2025-09-24 14:49 ` [PATCH bpf-next v4 02/15] selftests/bpf: test_xsk: Initialize bitmap before use Bastien Curutchet (eBPF Foundation)
2025-09-24 14:49 ` [PATCH bpf-next v4 03/15] selftests/bpf: test_xsk: Fix __testapp_validate_traffic()'s return value Bastien Curutchet (eBPF Foundation)
2025-09-24 14:49 ` [PATCH bpf-next v4 04/15] selftests/bpf: test_xsk: fix memory leak in testapp_stats_rx_dropped() Bastien Curutchet (eBPF Foundation)
2025-09-25 13:32   ` Maciej Fijalkowski
2025-09-26  6:39     ` Bastien Curutchet
2025-09-26 11:47       ` Maciej Fijalkowski
2025-09-27 11:19         ` Alexei Starovoitov
2025-09-29  8:57           ` Bastien Curutchet [this message]
2025-09-29 14:37             ` Maciej Fijalkowski
2025-09-29 16:04               ` Maciej Fijalkowski
2025-09-24 14:49 ` [PATCH bpf-next v4 05/15] selftests/bpf: test_xsk: fix memory leak in testapp_xdp_shared_umem() Bastien Curutchet (eBPF Foundation)
2025-09-24 14:49 ` [PATCH bpf-next v4 06/15] selftests/bpf: test_xsk: Wrap test clean-up in functions Bastien Curutchet (eBPF Foundation)
2025-09-24 14:49 ` [PATCH bpf-next v4 07/15] selftests/bpf: test_xsk: Release resources when swap fails Bastien Curutchet (eBPF Foundation)
2025-09-24 14:49 ` [PATCH bpf-next v4 08/15] selftests/bpf: test_xsk: Add return value to init_iface() Bastien Curutchet (eBPF Foundation)
2025-09-24 14:49 ` [PATCH bpf-next v4 09/15] selftests/bpf: test_xsk: Don't exit immediately when xsk_attach fails Bastien Curutchet (eBPF Foundation)
2025-09-24 14:49 ` [PATCH bpf-next v4 10/15] selftests/bpf: test_xsk: Don't exit immediately when gettimeofday fails Bastien Curutchet (eBPF Foundation)
2025-09-24 14:49 ` [PATCH bpf-next v4 11/15] selftests/bpf: test_xsk: Don't exit immediately when workers fail Bastien Curutchet (eBPF Foundation)
2025-09-24 14:49 ` [PATCH bpf-next v4 12/15] selftests/bpf: test_xsk: Don't exit immediately if validate_traffic fails Bastien Curutchet (eBPF Foundation)
2025-09-24 14:49 ` [PATCH bpf-next v4 13/15] selftests/bpf: test_xsk: Don't exit immediately on allocation failures Bastien Curutchet (eBPF Foundation)
2025-09-24 14:49 ` [PATCH bpf-next v4 14/15] selftests/bpf: test_xsk: Isolate flaky tests Bastien Curutchet (eBPF Foundation)
2025-09-24 14:49 ` [PATCH bpf-next v4 15/15] selftests/bpf: test_xsk: Integrate test_xsk.c to test_progs framework Bastien Curutchet (eBPF Foundation)
2025-09-26 11:49 ` [PATCH bpf-next v4 00/15] selftests/bpf: " Maciej Fijalkowski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4c4edfc4-f69e-43d3-8bb7-95d00bad45c5@bootlin.com \
    --to=bastien.curutchet@bootlin.com \
    --cc=alexei.starovoitov@gmail.com \
    --cc=alexis.lothore@bootlin.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bjorn@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=eddyz87@gmail.com \
    --cc=haoluo@google.com \
    --cc=hawk@kernel.org \
    --cc=john.fastabend@gmail.com \
    --cc=jolsa@kernel.org \
    --cc=jonathan.lemon@gmail.com \
    --cc=kpsingh@kernel.org \
    --cc=kuba@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=maciej.fijalkowski@intel.com \
    --cc=magnus.karlsson@intel.com \
    --cc=martin.lau@linux.dev \
    --cc=mykolal@fb.com \
    --cc=netdev@vger.kernel.org \
    --cc=sdf@fomichev.me \
    --cc=shuah@kernel.org \
    --cc=song@kernel.org \
    --cc=thomas.petazzoni@bootlin.com \
    --cc=yonghong.song@linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox