[bug report] kmemleak issue observed during blktests

linux-block.vger.kernel.org archive mirror

* [bug report] kmemleak issue observed during blktests
@ 2025-07-16  1:42 Yi Zhang
  2025-07-16  1:54 ` Jens Axboe
  0 siblings, 1 reply; 12+ messages in thread
From: Yi Zhang @ 2025-07-16  1:42 UTC (permalink / raw)
  To: linux-block; +Cc: Ming Lei, Shinichiro Kawasaki, Jens Axboe

Hi

I found the following kmemleak issue on the latest
linux-block/for-next, please help check it and let me know if you need
any info/testing for it, thanks.

commit: linux-block/for-next: 8192f418ee2f (HEAD -> for-next,
origin/for-next) Merge branch 'for-6.17/io_uring' into for-next

# dmesg | grep kmemleak
[31404.993877] kmemleak: 608 new suspected memory leaks (see
/sys/kernel/debug/kmemleak)

unreferenced object 0xffff8882e7fb9000 (size 2048):
  comm "check", pid 10460, jiffies 4324980514
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace (crc c47e6a37):
    __kvmalloc_node_noprof+0x55d/0x7a0
    sbitmap_init_node+0x15a/0x6a0
    kyber_init_hctx+0x316/0xb90
    blk_mq_init_sched+0x416/0x580
    elevator_switch+0x18b/0x630
    elv_update_nr_hw_queues+0x219/0x2c0
    __blk_mq_update_nr_hw_queues+0x36a/0x6f0
    blk_mq_update_nr_hw_queues+0x3a/0x60
    find_fallback+0x510/0x540 [nbd]
    nbd_send_cmd+0x24b/0x1480 [nbd]
    configfs_write_iter+0x2ae/0x470
    vfs_write+0x524/0xe70
    ksys_write+0xff/0x200
    do_syscall_64+0x98/0x3c0
    entry_SYSCALL_64_after_hwframe+0x76/0x7e
unreferenced object 0xffff8882e7fbb000 (size 2048):
  comm "check", pid 10460, jiffies 4324980514
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace (crc c47e6a37):
    __kvmalloc_node_noprof+0x55d/0x7a0
    sbitmap_init_node+0x15a/0x6a0
    kyber_init_hctx+0x316/0xb90
    blk_mq_init_sched+0x416/0x580
    elevator_switch+0x18b/0x630
    elv_update_nr_hw_queues+0x219/0x2c0
    __blk_mq_update_nr_hw_queues+0x36a/0x6f0
    blk_mq_update_nr_hw_queues+0x3a/0x60
    find_fallback+0x510/0x540 [nbd]
    nbd_send_cmd+0x24b/0x1480 [nbd]
    configfs_write_iter+0x2ae/0x470
    vfs_write+0x524/0xe70
    ksys_write+0xff/0x200
    do_syscall_64+0x98/0x3c0
    entry_SYSCALL_64_after_hwframe+0x76/0x7e
unreferenced object 0xffff88819e855000 (size 2048):
  comm "check", pid 10460, jiffies 4324980514
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace (crc c47e6a37):
    __kvmalloc_node_noprof+0x55d/0x7a0
    sbitmap_init_node+0x15a/0x6a0
    kyber_init_hctx+0x316/0xb90
    blk_mq_init_sched+0x416/0x580
    elevator_switch+0x18b/0x630
    elv_update_nr_hw_queues+0x219/0x2c0
    __blk_mq_update_nr_hw_queues+0x36a/0x6f0
    blk_mq_update_nr_hw_queues+0x3a/0x60
    find_fallback+0x510/0x540 [nbd]
    nbd_send_cmd+0x24b/0x1480 [nbd]
    configfs_write_iter+0x2ae/0x470
    vfs_write+0x524/0xe70
    ksys_write+0xff/0x200
    do_syscall_64+0x98/0x3c0
    entry_SYSCALL_64_after_hwframe+0x76/0x7e


-- 
Best Regards,
  Yi Zhang



* Re: [bug report] kmemleak issue observed during blktests
  2025-07-16  1:42 [bug report] kmemleak issue observed during blktests Yi Zhang
@ 2025-07-16  1:54 ` Jens Axboe
  2025-07-16  7:50   ` Yu Kuai
  0 siblings, 1 reply; 12+ messages in thread
From: Jens Axboe @ 2025-07-16  1:54 UTC (permalink / raw)
  To: Yi Zhang, linux-block; +Cc: Ming Lei, Shinichiro Kawasaki

On 7/15/25 7:42 PM, Yi Zhang wrote:
> Hi
> 
> I found the following kmemleak issue on the latest
> linux-block/for-next, please help check it and let me know if you need
> any info/testing for it, thanks.
> 
> commit: linux-block/for-next: 8192f418ee2f (HEAD -> for-next,
> origin/for-next) Merge branch 'for-6.17/io_uring' into for-next
> 
> # dmesg | grep kmemleak
> [31404.993877] kmemleak: 608 new suspected memory leaks (see
> /sys/kernel/debug/kmemleak)
> 
> unreferenced object 0xffff8882e7fb9000 (size 2048):
>   comm "check", pid 10460, jiffies 4324980514
>   hex dump (first 32 bytes):
>     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>   backtrace (crc c47e6a37):
>     __kvmalloc_node_noprof+0x55d/0x7a0
>     sbitmap_init_node+0x15a/0x6a0
>     kyber_init_hctx+0x316/0xb90
>     blk_mq_init_sched+0x416/0x580
>     elevator_switch+0x18b/0x630
>     elv_update_nr_hw_queues+0x219/0x2c0
>     __blk_mq_update_nr_hw_queues+0x36a/0x6f0
>     blk_mq_update_nr_hw_queues+0x3a/0x60
>     find_fallback+0x510/0x540 [nbd]
>     nbd_send_cmd+0x24b/0x1480 [nbd]
>     configfs_write_iter+0x2ae/0x470
>     vfs_write+0x524/0xe70
>     ksys_write+0xff/0x200
>     do_syscall_64+0x98/0x3c0
>     entry_SYSCALL_64_after_hwframe+0x76/0x7e
> unreferenced object 0xffff8882e7fbb000 (size 2048):
>   comm "check", pid 10460, jiffies 4324980514
>   hex dump (first 32 bytes):
>     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>   backtrace (crc c47e6a37):
>     __kvmalloc_node_noprof+0x55d/0x7a0
>     sbitmap_init_node+0x15a/0x6a0
>     kyber_init_hctx+0x316/0xb90
>     blk_mq_init_sched+0x416/0x580
>     elevator_switch+0x18b/0x630
>     elv_update_nr_hw_queues+0x219/0x2c0
>     __blk_mq_update_nr_hw_queues+0x36a/0x6f0
>     blk_mq_update_nr_hw_queues+0x3a/0x60
>     find_fallback+0x510/0x540 [nbd]
>     nbd_send_cmd+0x24b/0x1480 [nbd]
>     configfs_write_iter+0x2ae/0x470
>     vfs_write+0x524/0xe70
>     ksys_write+0xff/0x200
>     do_syscall_64+0x98/0x3c0
>     entry_SYSCALL_64_after_hwframe+0x76/0x7e
> unreferenced object 0xffff88819e855000 (size 2048):
>   comm "check", pid 10460, jiffies 4324980514
>   hex dump (first 32 bytes):
>     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>   backtrace (crc c47e6a37):
>     __kvmalloc_node_noprof+0x55d/0x7a0
>     sbitmap_init_node+0x15a/0x6a0
>     kyber_init_hctx+0x316/0xb90
>     blk_mq_init_sched+0x416/0x580
>     elevator_switch+0x18b/0x630
>     elv_update_nr_hw_queues+0x219/0x2c0
>     __blk_mq_update_nr_hw_queues+0x36a/0x6f0
>     blk_mq_update_nr_hw_queues+0x3a/0x60
>     find_fallback+0x510/0x540 [nbd]
>     nbd_send_cmd+0x24b/0x1480 [nbd]
>     configfs_write_iter+0x2ae/0x470
>     vfs_write+0x524/0xe70
>     ksys_write+0xff/0x200
>     do_syscall_64+0x98/0x3c0
>     entry_SYSCALL_64_after_hwframe+0x76/0x7e

Can you try and revert:

commit 8b428f42f3edfd62422aa7ad87049ab232a2eaa9
Author: Ming Lei <ming.lei@redhat.com>
Date:   Wed Jul 9 19:17:44 2025 +0800

    nbd: fix lockdep deadlock warning

and see if that fixes it? If not, then checkout for-6.17/block instead,
and then run a bisect with that sha marked as bad, and v6.16-rc4 as the
known good one.
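
The revert-then-bisect procedure described above could look roughly like
the following. This is only a sketch: it assumes the kernel tree has a
remote named linux-block that carries the for-6.17/block branch, and the
function names are made up for illustration.

```shell
#!/bin/sh
# Sketch only; run from inside a kernel git tree.
# Assumption: a remote named "linux-block" provides for-6.17/block.

# Commit "nbd: fix lockdep deadlock warning"
SUSPECT=8b428f42f3edfd62422aa7ad87049ab232a2eaa9

# Step 1: revert the suspect commit on top of the tested tree,
# then rebuild and rerun the failing test.
revert_suspect() {
    git revert --no-edit "$SUSPECT"
}

# Step 2 (only if the revert does not help): bisect for-6.17/block
# against v6.16-rc4.
start_bisect() {
    git checkout linux-block/for-6.17/block &&
    git bisect start &&
    git bisect bad &&             # current HEAD shows the leak
    git bisect good v6.16-rc4     # known-good baseline
}
```

After start_bisect, each step is the usual build, boot, run the test, and
then "git bisect good" or "git bisect bad" depending on whether kmemleak
reports the leak.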

-- 
Jens Axboe


* Re: [bug report] kmemleak issue observed during blktests
  2025-07-16  1:54 ` Jens Axboe
@ 2025-07-16  7:50   ` Yu Kuai
  2025-07-16 10:40     ` Ming Lei
  0 siblings, 1 reply; 12+ messages in thread
From: Yu Kuai @ 2025-07-16  7:50 UTC (permalink / raw)
  To: Jens Axboe, Yi Zhang, linux-block
  Cc: Ming Lei, Shinichiro Kawasaki, yukuai (C)

Hi,

On 2025/07/16 9:54, Jens Axboe wrote:
> unreferenced object 0xffff8882e7fbb000 (size 2048):
>    comm "check", pid 10460, jiffies 4324980514
>    hex dump (first 32 bytes):
>      00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>      00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>    backtrace (crc c47e6a37):
>      __kvmalloc_node_noprof+0x55d/0x7a0
>      sbitmap_init_node+0x15a/0x6a0
>      kyber_init_hctx+0x316/0xb90
>      blk_mq_init_sched+0x416/0x580
>      elevator_switch+0x18b/0x630
>      elv_update_nr_hw_queues+0x219/0x2c0
>      __blk_mq_update_nr_hw_queues+0x36a/0x6f0
>      blk_mq_update_nr_hw_queues+0x3a/0x60
>      find_fallback+0x510/0x540 [nbd]

This is weird. I checked the code, and it's impossible for
blk_mq_update_nr_hw_queues() to be called from find_fallback().

Does kmemleak show wrong backtrace?

Thanks,
Kuai

>      nbd_send_cmd+0x24b/0x1480 [nbd]
>      configfs_write_iter+0x2ae/0x470
>      vfs_write+0x524/0xe70
>      ksys_write+0xff/0x200
>      do_syscall_64+0x98/0x3c0
>      entry_SYSCALL_64_after_hwframe+0x76/0x7e



* Re: [bug report] kmemleak issue observed during blktests
  2025-07-16  7:50   ` Yu Kuai
@ 2025-07-16 10:40     ` Ming Lei
  2025-07-16 19:24       ` Nilay Shroff
  2025-07-17  3:58       ` Yi Zhang
  0 siblings, 2 replies; 12+ messages in thread
From: Ming Lei @ 2025-07-16 10:40 UTC (permalink / raw)
  To: Yu Kuai; +Cc: Jens Axboe, Yi Zhang, linux-block, Shinichiro Kawasaki,
	yukuai (C)

On Wed, Jul 16, 2025 at 03:50:34PM +0800, Yu Kuai wrote:
> Hi,
> 
> On 2025/07/16 9:54, Jens Axboe wrote:
> > unreferenced object 0xffff8882e7fbb000 (size 2048):
> >    comm "check", pid 10460, jiffies 4324980514
> >    hex dump (first 32 bytes):
> >      00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> >      00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> >    backtrace (crc c47e6a37):
> >      __kvmalloc_node_noprof+0x55d/0x7a0
> >      sbitmap_init_node+0x15a/0x6a0
> >      kyber_init_hctx+0x316/0xb90
> >      blk_mq_init_sched+0x416/0x580
> >      elevator_switch+0x18b/0x630
> >      elv_update_nr_hw_queues+0x219/0x2c0
> >      __blk_mq_update_nr_hw_queues+0x36a/0x6f0
> >      blk_mq_update_nr_hw_queues+0x3a/0x60
> >      find_fallback+0x510/0x540 [nbd]
> 
> This is weird. I checked the code, and it's impossible for
> blk_mq_update_nr_hw_queues() to be called from find_fallback().

Yes.

> Does kmemleak show wrong backtrace?

I tried to run blktests block/005 over nbd, but can't reproduce this
kmemleak report after setting up the detector.

Yi, can you share your reproducer?


Thanks
Ming



* Re: [bug report] kmemleak issue observed during blktests
  2025-07-16 10:40     ` Ming Lei
@ 2025-07-16 19:24       ` Nilay Shroff
  2025-07-17  0:02         ` Ming Lei
  2025-07-17  3:58       ` Yi Zhang
  1 sibling, 1 reply; 12+ messages in thread
From: Nilay Shroff @ 2025-07-16 19:24 UTC (permalink / raw)
  To: Ming Lei, Yu Kuai
  Cc: Jens Axboe, Yi Zhang, linux-block, Shinichiro Kawasaki,
	yukuai (C)

On 7/16/25 4:10 PM, Ming Lei wrote:
> On Wed, Jul 16, 2025 at 03:50:34PM +0800, Yu Kuai wrote:
>> Hi,
>>
>> On 2025/07/16 9:54, Jens Axboe wrote:
>>> unreferenced object 0xffff8882e7fbb000 (size 2048):
>>>    comm "check", pid 10460, jiffies 4324980514
>>>    hex dump (first 32 bytes):
>>>      00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>>>      00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>>>    backtrace (crc c47e6a37):
>>>      __kvmalloc_node_noprof+0x55d/0x7a0
>>>      sbitmap_init_node+0x15a/0x6a0
>>>      kyber_init_hctx+0x316/0xb90
>>>      blk_mq_init_sched+0x416/0x580
>>>      elevator_switch+0x18b/0x630
>>>      elv_update_nr_hw_queues+0x219/0x2c0
>>>      __blk_mq_update_nr_hw_queues+0x36a/0x6f0
>>>      blk_mq_update_nr_hw_queues+0x3a/0x60
>>>      find_fallback+0x510/0x540 [nbd]
>>
>> This is weird. I checked the code, and it's impossible for
>> blk_mq_update_nr_hw_queues() to be called from find_fallback().
> 
> Yes.
> 
>> Does kmemleak show wrong backtrace?
> 
> I tried to run blktests block/005 over nbd, but can't reproduce this
> kmemleak report after setting up the detector.

I have analyzed this bug and found the root cause:

The issue arises while we run the nr_hw_queues update. Specifically, we first
reallocate hardware contexts (hctx) via __blk_mq_realloc_hw_ctxs(), and
then later invoke elevator_switch() (assuming q->elevator is not NULL).
The elevator switch code first exits the old elevator (elevator_exit)
and then switches to the new elevator. elevator_exit loops through
each hctx and invokes the elevator’s per-hctx exit method ->exit_hctx(),
which releases resources allocated during ->init_hctx().

This memleak manifests when we reduce the number of h/w queues - for example,
when the initial update sets the number of queues to X, and a later update
reduces it to Y, where Y < X. In this case, we lose access to the old
hctxs by the time we reach the elevator exit code, because
__blk_mq_realloc_hw_ctxs() has already released them. As we no longer have
any reference to the old hctxs, we have no way to free the scheduler
resources (which are allocated in ->init_hctx()), and kmemleak complains
about it.
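
The ordering problem described above can be illustrated with a small toy
model. This is not kernel code: the function names only mirror the roles
of ->init_hctx(), ->exit_hctx() and the hctx reallocation, and the
"allocations" are just a list of hctx indexes kept in a shell variable.

```shell
#!/bin/sh
# Toy model of the ordering problem, NOT kernel code.
# "allocs" lists the hctx indexes that currently own scheduler data.

allocs=""          # scheduler data allocated by the init step
nr_hctx=0          # hctxs currently reachable from the queue

sched_init() {     # models ->init_hctx() for every reachable hctx
    i=0
    while [ "$i" -lt "$nr_hctx" ]; do
        allocs="$allocs $i"
        i=$((i + 1))
    done
}

sched_exit() {     # models ->exit_hctx(): only reachable hctxs are visited
    kept=""
    for a in $allocs; do
        [ "$a" -lt "$nr_hctx" ] || kept="$kept $a"
    done
    allocs=$kept
}

realloc_hctxs() {  # models the hctx reallocation: extra hctxs go away
    nr_hctx=$1
}

leaked_count() {   # how many allocations were never freed
    set -- $allocs
    echo "$#"
}

# Problematic order: shrink first, then exit the old scheduler.
simulate_shrink() {   # usage: simulate_shrink X Y
    allocs=""
    nr_hctx=$1
    sched_init
    realloc_hctxs "$2"
    sched_exit
    leaked_count
}
```

In this model, "simulate_shrink 4 2" prints 2: the scheduler data of the
two hctxs that were dropped before the exit step is never freed. Growing
the queue count or keeping it unchanged leaves nothing behind.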

Regarding reproduction, I was also not able to recreate it using block/005
but then I wrote a script using null-blk driver which updates nr_hw_queue
from X to Y (where Y < X) and I encountered this memleak. So this is not
an issue with nbd driver.
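
The script itself was not posted in this thread. A hypothetical sketch of
such a reproducer is shown below. It assumes that null_blk is loaded, that
configfs is mounted at /sys/kernel/config, that the kyber scheduler is
available, and that the disk is named after its configfs directory (which
may not hold on older kernels). The function name and the CONFIGFS/SYSFS
variables are made up for illustration.

```shell
#!/bin/sh
# Hypothetical reproducer sketch; not the script used in the thread.
# Assumptions: null_blk loaded, configfs mounted, kyber available, root.

CONFIGFS=${CONFIGFS:-/sys/kernel/config}
SYSFS=${SYSFS:-/sys}

shrink_hw_queues() {   # usage: shrink_hw_queues DEV X Y   (with Y < X)
    dev=$1 x=$2 y=$3
    cfg="$CONFIGFS/nullb/$dev"

    mkdir -p "$cfg" || return 1
    echo "$x" > "$cfg/submit_queues"    # start with X hardware queues
    echo 1 > "$cfg/power"               # create the disk
    # Attach an elevator so that per-hctx scheduler data gets allocated.
    echo kyber > "$SYSFS/block/$dev/queue/scheduler"
    echo "$y" > "$cfg/submit_queues"    # shrink to Y: nr_hw_queues update
}
```

A call such as "shrink_hw_queues nullb1 8 2", followed some time later by
a kmemleak scan, would be one way to check whether the leak shows up.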

I've implemented a potential fix for the above issue and I'm unit 
testing it now. I will post a formal patch in some time.

Thanks,
--Nilay


* Re: [bug report] kmemleak issue observed during blktests
  2025-07-16 19:24       ` Nilay Shroff
@ 2025-07-17  0:02         ` Ming Lei
  2025-07-17  0:46           ` Yi Zhang
  2025-07-17 14:11           ` Nilay Shroff
  0 siblings, 2 replies; 12+ messages in thread
From: Ming Lei @ 2025-07-17  0:02 UTC (permalink / raw)
  To: Nilay Shroff
  Cc: Yu Kuai, Jens Axboe, Yi Zhang, linux-block, Shinichiro Kawasaki,
	yukuai (C)

On Thu, Jul 17, 2025 at 12:54:31AM +0530, Nilay Shroff wrote:
> 
> 
> On 7/16/25 4:10 PM, Ming Lei wrote:
> > On Wed, Jul 16, 2025 at 03:50:34PM +0800, Yu Kuai wrote:
> >> Hi,
> >>
> >> On 2025/07/16 9:54, Jens Axboe wrote:
> >>> unreferenced object 0xffff8882e7fbb000 (size 2048):
> >>>    comm "check", pid 10460, jiffies 4324980514
> >>>    hex dump (first 32 bytes):
> >>>      00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> >>>      00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> >>>    backtrace (crc c47e6a37):
> >>>      __kvmalloc_node_noprof+0x55d/0x7a0
> >>>      sbitmap_init_node+0x15a/0x6a0
> >>>      kyber_init_hctx+0x316/0xb90
> >>>      blk_mq_init_sched+0x416/0x580
> >>>      elevator_switch+0x18b/0x630
> >>>      elv_update_nr_hw_queues+0x219/0x2c0
> >>>      __blk_mq_update_nr_hw_queues+0x36a/0x6f0
> >>>      blk_mq_update_nr_hw_queues+0x3a/0x60
> >>>      find_fallback+0x510/0x540 [nbd]
> >>
> >> This is weird. I checked the code, and it's impossible for
> >> blk_mq_update_nr_hw_queues() to be called from find_fallback().
> > 
> > Yes.
> > 
> >> Does kmemleak show wrong backtrace?
> > 
> > I tried to run blktests block/005 over nbd, but can't reproduce this
> > kmemleak report after setting up the detector.
> 
> I have analyzed this bug and found the root cause:
> 
> The issue arises while we run the nr_hw_queues update. Specifically, we first
> reallocate hardware contexts (hctx) via __blk_mq_realloc_hw_ctxs(), and
> then later invoke elevator_switch() (assuming q->elevator is not NULL).
> The elevator switch code first exits the old elevator (elevator_exit)
> and then switches to the new elevator. elevator_exit loops through
> each hctx and invokes the elevator’s per-hctx exit method ->exit_hctx(),
> which releases resources allocated during ->init_hctx().
>
> This memleak manifests when we reduce the number of h/w queues - for example,
> when the initial update sets the number of queues to X, and a later update
> reduces it to Y, where Y < X. In this case, we lose access to the old
> hctxs by the time we reach the elevator exit code, because
> __blk_mq_realloc_hw_ctxs() has already released them. As we no longer have
> any reference to the old hctxs, we have no way to free the scheduler
> resources (which are allocated in ->init_hctx()), and kmemleak complains
> about it.
> 
> Regarding reproduction, I was also not able to recreate it using block/005
> but then I wrote a script using null-blk driver which updates nr_hw_queue
> from X to Y (where Y < X) and I encountered this memleak. So this is not
> an issue with nbd driver.
> 
> I've implemented a potential fix for the above issue and I'm unit 
> testing it now. I will post a formal patch in some time.

Great!

Looks like it was introduced in commit 596dce110b7d ("block: simplify elevator
reattachment for updating nr_hw_queues"), but it is easy to cause a panic with
that patchset.

One simple fix is to restore the original two-stage elevator switch, while
saving the elevator name in an xarray to avoid adding the boilerplate code back.


Thanks,
Ming



* Re: [bug report] kmemleak issue observed during blktests
  2025-07-17  0:02         ` Ming Lei
@ 2025-07-17  0:46           ` Yi Zhang
  2025-07-17 14:22             ` Nilay Shroff
  2025-07-17 14:11           ` Nilay Shroff
  1 sibling, 1 reply; 12+ messages in thread
From: Yi Zhang @ 2025-07-17  0:46 UTC (permalink / raw)
  To: Ming Lei
  Cc: Nilay Shroff, Yu Kuai, Jens Axboe, linux-block,
	Shinichiro Kawasaki, yukuai (C)

On Thu, Jul 17, 2025 at 8:02 AM Ming Lei <ming.lei@redhat.com> wrote:
>
> On Thu, Jul 17, 2025 at 12:54:31AM +0530, Nilay Shroff wrote:
> >
> >
> > On 7/16/25 4:10 PM, Ming Lei wrote:
> > > On Wed, Jul 16, 2025 at 03:50:34PM +0800, Yu Kuai wrote:
> > >> Hi,
> > >>
> > >> On 2025/07/16 9:54, Jens Axboe wrote:
> > >>> unreferenced object 0xffff8882e7fbb000 (size 2048):
> > >>>    comm "check", pid 10460, jiffies 4324980514
> > >>>    hex dump (first 32 bytes):
> > >>>      00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> > >>>      00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> > >>>    backtrace (crc c47e6a37):
> > >>>      __kvmalloc_node_noprof+0x55d/0x7a0
> > >>>      sbitmap_init_node+0x15a/0x6a0
> > >>>      kyber_init_hctx+0x316/0xb90
> > >>>      blk_mq_init_sched+0x416/0x580
> > >>>      elevator_switch+0x18b/0x630
> > >>>      elv_update_nr_hw_queues+0x219/0x2c0
> > >>>      __blk_mq_update_nr_hw_queues+0x36a/0x6f0
> > >>>      blk_mq_update_nr_hw_queues+0x3a/0x60
> > >>>      find_fallback+0x510/0x540 [nbd]
> > >>
> > >> This is weird. I checked the code, and it's impossible for
> > >> blk_mq_update_nr_hw_queues() to be called from find_fallback().
> > >
> > > Yes.
> > >
> > >> Does kmemleak show wrong backtrace?
> > >
> > > I tried to run blktests block/005 over nbd, but can't reproduce this
> > > kmemleak report after setting up the detector.
> >
> > I have analyzed this bug and found the root cause:
> >
> > The issue arises while we run the nr_hw_queues update. Specifically, we first
> > reallocate hardware contexts (hctx) via __blk_mq_realloc_hw_ctxs(), and
> > then later invoke elevator_switch() (assuming q->elevator is not NULL).
> > The elevator switch code first exits the old elevator (elevator_exit)
> > and then switches to the new elevator. elevator_exit loops through
> > each hctx and invokes the elevator’s per-hctx exit method ->exit_hctx(),
> > which releases resources allocated during ->init_hctx().
> >
> > This memleak manifests when we reduce the number of h/w queues - for example,
> > when the initial update sets the number of queues to X, and a later update
> > reduces it to Y, where Y < X. In this case, we lose access to the old
> > hctxs by the time we reach the elevator exit code, because
> > __blk_mq_realloc_hw_ctxs() has already released them. As we no longer have
> > any reference to the old hctxs, we have no way to free the scheduler
> > resources (which are allocated in ->init_hctx()), and kmemleak complains
> > about it.
> >
> > Regarding reproduction, I was also not able to recreate it using block/005
> > but then I wrote a script using null-blk driver which updates nr_hw_queue
> > from X to Y (where Y < X) and I encountered this memleak. So this is not
> > an issue with nbd driver.
> >
> > I've implemented a potential fix for the above issue and I'm unit
> > testing it now. I will post a formal patch in some time.
>
> Great!
>
> Looks like it was introduced in commit 596dce110b7d ("block: simplify elevator
> reattachment for updating nr_hw_queues"), but it is easy to cause a panic with
> that patchset.
>
> One simple fix is to restore the original two-stage elevator switch, while
> saving the elevator name in an xarray to avoid adding the boilerplate code back.
>
>
> Thanks,
> Ming
>

Sorry for the late response; it took me some time to find which test case
triggered the kmemleak report.
It turns out that block/040 [1] triggered it, and just running [2] right
after block/040 does not produce the kmemleak report immediately.
We have to wait longer.
[1]
[  458.175983] null_blk: disk nullb0 created
[  458.180035] null_blk: module loaded
[  458.397994] run blktests block/040 at 2025-07-16 20:31:20
[  458.571488] null_blk: disk nullb1 created
[  874.620574] kmemleak: 522 new suspected memory leaks (see
/sys/kernel/debug/kmemleak)
[2]
echo scan >/sys/kernel/debug/kmemleak
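
Since a single scan right after the test may report nothing, one option is
to rescan a few times with a delay in between. The helper below is a
minimal sketch of that idea; the function names, the KMEMLEAK variable and
the retry policy are assumptions, not something taken from blktests.

```shell
#!/bin/sh
# Sketch: rescan until kmemleak lists a suspected leak or we give up.
KMEMLEAK=${KMEMLEAK:-/sys/kernel/debug/kmemleak}

kmemleak_scan()   { echo scan > "$KMEMLEAK"; }   # trigger a memory scan
kmemleak_report() { cat "$KMEMLEAK"; }           # print suspected leaks

wait_for_kmemleak() {   # usage: wait_for_kmemleak ATTEMPTS DELAY_SECONDS
    attempts=$1 delay=$2
    while [ "$attempts" -gt 0 ]; do
        kmemleak_scan
        if kmemleak_report | grep -q "unreferenced object"; then
            return 0            # at least one suspected leak is listed
        fi
        attempts=$((attempts - 1))
        sleep "$delay"
    done
    return 1                    # nothing reported within the budget
}
```

For example, "wait_for_kmemleak 10 60" would rescan once a minute for up
to ten minutes after block/040 has finished.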

-- 
Best Regards,
  Yi Zhang



* Re: [bug report] kmemleak issue observed during blktests
  2025-07-16 10:40     ` Ming Lei
  2025-07-16 19:24       ` Nilay Shroff
@ 2025-07-17  3:58       ` Yi Zhang
  1 sibling, 0 replies; 12+ messages in thread
From: Yi Zhang @ 2025-07-17  3:58 UTC (permalink / raw)
  To: Ming Lei
  Cc: Yu Kuai, Jens Axboe, Nilay Shroff, linux-block,
	Shinichiro Kawasaki, yukuai (C)

On Wed, Jul 16, 2025 at 6:40 PM Ming Lei <ming.lei@redhat.com> wrote:
>
> On Wed, Jul 16, 2025 at 03:50:34PM +0800, Yu Kuai wrote:
> > Hi,
> >
> > On 2025/07/16 9:54, Jens Axboe wrote:
> > > unreferenced object 0xffff8882e7fbb000 (size 2048):
> > >    comm "check", pid 10460, jiffies 4324980514
> > >    hex dump (first 32 bytes):
> > >      00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> > >      00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> > >    backtrace (crc c47e6a37):
> > >      __kvmalloc_node_noprof+0x55d/0x7a0
> > >      sbitmap_init_node+0x15a/0x6a0
> > >      kyber_init_hctx+0x316/0xb90
> > >      blk_mq_init_sched+0x416/0x580
> > >      elevator_switch+0x18b/0x630
> > >      elv_update_nr_hw_queues+0x219/0x2c0
> > >      __blk_mq_update_nr_hw_queues+0x36a/0x6f0
> > >      blk_mq_update_nr_hw_queues+0x3a/0x60
> > >      find_fallback+0x510/0x540 [nbd]
> >
> > This is weird. I checked the code, and it's impossible for
> > blk_mq_update_nr_hw_queues() to be called from find_fallback().
>
> Yes.
>
> > Does kmemleak show wrong backtrace?
>

I think so. The kmemleak report was observed when I was running all of
blktests, which includes the nbd tests; that's why the backtrace has the
nbd info.
If I run only block/040, the backtrace no longer has the nbd info when
the kmemleak report occurs.

> I tried to run blktests block/005 over nbd, but can't reproduce this
> kmemleak report after setting up the detector.
>
> Yi, can you share your reproducer?
>
>
> Thanks
> Ming
>


-- 
Best Regards,
  Yi Zhang



* Re: [bug report] kmemleak issue observed during blktests
  2025-07-17  0:02         ` Ming Lei
  2025-07-17  0:46           ` Yi Zhang
@ 2025-07-17 14:11           ` Nilay Shroff
  2025-07-17 14:25             ` Yi Zhang
  1 sibling, 1 reply; 12+ messages in thread
From: Nilay Shroff @ 2025-07-17 14:11 UTC (permalink / raw)
  To: Ming Lei
  Cc: Yu Kuai, Jens Axboe, Yi Zhang, linux-block, Shinichiro Kawasaki,
	yukuai (C)



On 7/17/25 5:32 AM, Ming Lei wrote:
> On Thu, Jul 17, 2025 at 12:54:31AM +0530, Nilay Shroff wrote:
>>
>>
>> On 7/16/25 4:10 PM, Ming Lei wrote:
>>> On Wed, Jul 16, 2025 at 03:50:34PM +0800, Yu Kuai wrote:
>>>> Hi,
>>>>
>>>> On 2025/07/16 9:54, Jens Axboe wrote:
>>>>> unreferenced object 0xffff8882e7fbb000 (size 2048):
>>>>>    comm "check", pid 10460, jiffies 4324980514
>>>>>    hex dump (first 32 bytes):
>>>>>      00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>>>>>      00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>>>>>    backtrace (crc c47e6a37):
>>>>>      __kvmalloc_node_noprof+0x55d/0x7a0
>>>>>      sbitmap_init_node+0x15a/0x6a0
>>>>>      kyber_init_hctx+0x316/0xb90
>>>>>      blk_mq_init_sched+0x416/0x580
>>>>>      elevator_switch+0x18b/0x630
>>>>>      elv_update_nr_hw_queues+0x219/0x2c0
>>>>>      __blk_mq_update_nr_hw_queues+0x36a/0x6f0
>>>>>      blk_mq_update_nr_hw_queues+0x3a/0x60
>>>>>      find_fallback+0x510/0x540 [nbd]
>>>>
>>>> This is weird. I checked the code, and it's impossible for
>>>> blk_mq_update_nr_hw_queues() to be called from find_fallback().
>>>
>>> Yes.
>>>
>>>> Does kmemleak show wrong backtrace?
>>>
>>> I tried to run blktests block/005 over nbd, but can't reproduce this
>>> kmemleak report after setting up the detector.
>>
>> I have analyzed this bug and found the root cause:
>>
>> The issue arises while we run the nr_hw_queues update. Specifically, we first
>> reallocate hardware contexts (hctx) via __blk_mq_realloc_hw_ctxs(), and
>> then later invoke elevator_switch() (assuming q->elevator is not NULL).
>> The elevator switch code first exits the old elevator (elevator_exit)
>> and then switches to the new elevator. elevator_exit loops through
>> each hctx and invokes the elevator’s per-hctx exit method ->exit_hctx(),
>> which releases resources allocated during ->init_hctx().
>>
>> This memleak manifests when we reduce the number of h/w queues - for example,
>> when the initial update sets the number of queues to X, and a later update
>> reduces it to Y, where Y < X. In this case, we lose access to the old
>> hctxs by the time we reach the elevator exit code, because
>> __blk_mq_realloc_hw_ctxs() has already released them. As we no longer have
>> any reference to the old hctxs, we have no way to free the scheduler
>> resources (which are allocated in ->init_hctx()), and kmemleak complains
>> about it.
>>
>> Regarding reproduction, I was also not able to recreate it using block/005
>> but then I wrote a script using null-blk driver which updates nr_hw_queue
>> from X to Y (where Y < X) and I encountered this memleak. So this is not
>> an issue with nbd driver.
>>
>> I've implemented a potential fix for the above issue and I'm unit 
>> testing it now. I will post a formal patch in some time.
> 
> Great!
> 
> Looks like it was introduced in commit 596dce110b7d ("block: simplify elevator
> reattachment for updating nr_hw_queues"), but it is easy to cause a panic with
> that patchset.
> 
Yeah correct.

> One simple fix is to restore the original two-stage elevator switch, while
> saving the elevator name in an xarray to avoid adding the boilerplate code back.

Agreed, I did implement the same and the fix is on its way...

Thanks,
--Nilay



* Re: [bug report] kmemleak issue observed during blktests
  2025-07-17  0:46           ` Yi Zhang
@ 2025-07-17 14:22             ` Nilay Shroff
  0 siblings, 0 replies; 12+ messages in thread
From: Nilay Shroff @ 2025-07-17 14:22 UTC (permalink / raw)
  To: Yi Zhang, Ming Lei
  Cc: Yu Kuai, Jens Axboe, linux-block, Shinichiro Kawasaki, yukuai (C)



On 7/17/25 6:16 AM, Yi Zhang wrote:
> 
> Sorry for the late response; it took me some time to find which test case
> triggered the kmemleak report.
> It turns out that block/040 [1] triggered it, and just running [2] right
> after block/040 does not produce the kmemleak report immediately.
> We have to wait longer.
> [1]
> [  458.175983] null_blk: disk nullb0 created
> [  458.180035] null_blk: module loaded
> [  458.397994] run blktests block/040 at 2025-07-16 20:31:20
> [  458.571488] null_blk: disk nullb1 created
> [  874.620574] kmemleak: 522 new suspected memory leaks (see
> /sys/kernel/debug/kmemleak)
> [2]
> echo scan >/sys/kernel/debug/kmemleak
> 
This brings to mind a potential improvement: why don’t we enable
kmemleak checks in blktests? We could clear the kmemleak state at
the beginning of each test, run the test, and then scan for any
reported memory leaks.

If kmemleak reports a leak, we could:
- Mark the test as failed.
- Append the leak details to the test log for later review.

This would help catch resource leaks more proactively during
automated testing. If everyone agrees, I could work out a patch
to blktests for the above improvement.
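
The clear, run, scan and check sequence proposed above could be sketched
as follows. This is not the blktests implementation: the function names,
the KMEMLEAK variable and the log handling are assumptions made for
illustration, and a real patch would probably also need to wait or scan
more than once before trusting an empty report.

```shell
#!/bin/sh
# Sketch of a per-test kmemleak check; not actual blktests code.
KMEMLEAK=${KMEMLEAK:-/sys/kernel/debug/kmemleak}

kmemleak_clear()  { echo clear > "$KMEMLEAK"; }  # drop earlier suspects
kmemleak_scan()   { echo scan > "$KMEMLEAK"; }   # trigger a memory scan
kmemleak_report() { cat "$KMEMLEAK"; }           # print suspected leaks

# usage: run_with_kmemleak_check LOGFILE CMD [ARGS...]
run_with_kmemleak_check() {
    log=$1; shift
    kmemleak_clear              # forget leaks reported before this test
    "$@" || return 1            # the test itself failed
    kmemleak_scan
    leaks=$(kmemleak_report)
    if [ -n "$leaks" ]; then
        printf '%s\n' "$leaks" >> "$log"   # keep details for later review
        return 1                           # mark the test as failed
    fi
    return 0
}
```

A caller would wrap each test invocation, for example
"run_with_kmemleak_check results/block-040.kmemleak ./check block/040",
where the log path is only a placeholder.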

Thanks,
--Nilay


* Re: [bug report] kmemleak issue observed during blktests
  2025-07-17 14:11           ` Nilay Shroff
@ 2025-07-17 14:25             ` Yi Zhang
  2025-07-17 14:28               ` Nilay Shroff
  0 siblings, 1 reply; 12+ messages in thread
From: Yi Zhang @ 2025-07-17 14:25 UTC (permalink / raw)
  To: Nilay Shroff
  Cc: Ming Lei, Yu Kuai, Jens Axboe, linux-block, Shinichiro Kawasaki,
	yukuai (C)

On Thu, Jul 17, 2025 at 10:12 PM Nilay Shroff <nilay@linux.ibm.com> wrote:
>
>
>
> On 7/17/25 5:32 AM, Ming Lei wrote:
> > On Thu, Jul 17, 2025 at 12:54:31AM +0530, Nilay Shroff wrote:
> >>
> >>
> >> On 7/16/25 4:10 PM, Ming Lei wrote:
> >>> On Wed, Jul 16, 2025 at 03:50:34PM +0800, Yu Kuai wrote:
> >>>> Hi,
> >>>>
> >>>> On 2025/07/16 9:54, Jens Axboe wrote:
> >>>>> unreferenced object 0xffff8882e7fbb000 (size 2048):
> >>>>>    comm "check", pid 10460, jiffies 4324980514
> >>>>>    hex dump (first 32 bytes):
> >>>>>      00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> >>>>>      00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> >>>>>    backtrace (crc c47e6a37):
> >>>>>      __kvmalloc_node_noprof+0x55d/0x7a0
> >>>>>      sbitmap_init_node+0x15a/0x6a0
> >>>>>      kyber_init_hctx+0x316/0xb90
> >>>>>      blk_mq_init_sched+0x416/0x580
> >>>>>      elevator_switch+0x18b/0x630
> >>>>>      elv_update_nr_hw_queues+0x219/0x2c0
> >>>>>      __blk_mq_update_nr_hw_queues+0x36a/0x6f0
> >>>>>      blk_mq_update_nr_hw_queues+0x3a/0x60
> >>>>>      find_fallback+0x510/0x540 [nbd]
> >>>>
> >>>> This is weird. I checked the code, and it's impossible for
> >>>> blk_mq_update_nr_hw_queues() to be called from find_fallback().
> >>>
> >>> Yes.
> >>>
> >>>> Does kmemleak show wrong backtrace?
> >>>
> >>> I tried to run blktests block/005 over nbd, but can't reproduce this
> >>> kmemleak report after setting up the detector.
> >>
> >> I have analyzed this bug and found the root cause:
> >>
> >> The issue arises while we run the nr_hw_queues update. Specifically, we first
> >> reallocate hardware contexts (hctx) via __blk_mq_realloc_hw_ctxs(), and
> >> then later invoke elevator_switch() (assuming q->elevator is not NULL).
> >> The elevator switch code first exits the old elevator (elevator_exit)
> >> and then switches to the new elevator. elevator_exit loops through
> >> each hctx and invokes the elevator’s per-hctx exit method ->exit_hctx(),
> >> which releases resources allocated during ->init_hctx().
> >>
> >> This memleak manifests when we reduce the number of h/w queues - for example,
> >> when the initial update sets the number of queues to X, and a later update
> >> reduces it to Y, where Y < X. In this case, we lose access to the old
> >> hctxs by the time we reach the elevator exit code, because
> >> __blk_mq_realloc_hw_ctxs() has already released them. As we no longer have
> >> any reference to the old hctxs, we have no way to free the scheduler
> >> resources (which are allocated in ->init_hctx()), and kmemleak complains
> >> about it.
> >>
> >> Regarding reproduction, I was also not able to recreate it using block/005,
> >> but then I wrote a script using the null-blk driver which updates nr_hw_queues
> >> from X to Y (where Y < X), and I encountered this memleak. So this is not
> >> an issue with the nbd driver.
> >>
> >> I've implemented a potential fix for the above issue and I'm unit
> >> testing it now. I will post a formal patch in some time.
> >
> > Great!
> >
> > > Looks like it was introduced in commit 596dce110b7d ("block: simplify elevator reattachment
> > > for updating nr_hw_queues"), but it is easy to cause a panic with that patchset.
> >
> Yeah correct.
>
> > One simple fix is to restore the original two-stage elevator switch, while saving
> > the elevator name in an xarray to avoid adding the boilerplate code back.
>
> Agreed, I implemented the same and the fix is on its way...
>
> Thanks,
> --Nilay
>
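
The ordering problem described in the quoted analysis can be modeled in user space. The following is only a minimal sketch, not kernel code: every toy_* name, the fixed-size slot array, and the allocation counter are hypothetical stand-ins for the hctx array, ->init_hctx()/->exit_hctx(), and __blk_mq_realloc_hw_ctxs(). It compares a switch that reallocates first with a two-stage switch that exits the elevator before reallocating.

```c
#include <assert.h>
#include <string.h>

#define MAX_HCTX 16

/* Toy stand-in for a hardware context (hypothetical, not the kernel struct). */
struct toy_hctx {
	int present;        /* slot holds a live hctx */
	int has_sched_data; /* models memory allocated by ->init_hctx() */
};

struct toy_queue {
	struct toy_hctx hctx[MAX_HCTX];
	int nr_hw_queues;
};

/* Number of scheduler allocations that have not been released yet. */
static int live_sched_allocs;

/* Models the elevator attach path calling ->init_hctx() on every hctx. */
static void toy_elevator_init(struct toy_queue *q)
{
	for (int i = 0; i < q->nr_hw_queues; i++) {
		q->hctx[i].has_sched_data = 1;
		live_sched_allocs++;
	}
}

/* Models elevator_exit(): ->exit_hctx() runs only on reachable hctxs. */
static void toy_elevator_exit(struct toy_queue *q)
{
	for (int i = 0; i < q->nr_hw_queues; i++) {
		if (!q->hctx[i].has_sched_data)
			continue;
		q->hctx[i].has_sched_data = 0;
		live_sched_allocs--;
	}
}

/*
 * Models the hctx reallocation: surplus hctxs are dropped without
 * releasing their scheduler data, so that data becomes unreachable.
 */
static void toy_realloc_hw_ctxs(struct toy_queue *q, int nr)
{
	assert(nr > 0 && nr <= MAX_HCTX);
	for (int i = q->nr_hw_queues; i < nr; i++)
		q->hctx[i].present = 1;
	for (int i = nr; i < q->nr_hw_queues; i++)
		memset(&q->hctx[i], 0, sizeof(q->hctx[i]));
	q->nr_hw_queues = nr;
}

/* Realloc first, then switch the elevator. Returns leaked allocations. */
int update_single_stage(int old_nr, int new_nr)
{
	struct toy_queue q = {0};

	live_sched_allocs = 0;
	toy_realloc_hw_ctxs(&q, old_nr);
	toy_elevator_init(&q);

	toy_realloc_hw_ctxs(&q, new_nr); /* old hctxs are gone after this */
	toy_elevator_exit(&q);           /* only sees the surviving hctxs */
	toy_elevator_init(&q);

	toy_elevator_exit(&q);           /* final teardown */
	return live_sched_allocs;
}

/* Exit the elevator first, realloc, then re-attach. Returns leaked allocations. */
int update_two_stage(int old_nr, int new_nr)
{
	struct toy_queue q = {0};

	live_sched_allocs = 0;
	toy_realloc_hw_ctxs(&q, old_nr);
	toy_elevator_init(&q);

	toy_elevator_exit(&q);           /* stage 1: all old hctxs still reachable */
	toy_realloc_hw_ctxs(&q, new_nr);
	toy_elevator_init(&q);           /* stage 2: attach to the new hctxs */

	toy_elevator_exit(&q);           /* final teardown */
	return live_sched_allocs;
}
```

In this model, update_single_stage(4, 2) returns 2 (one leaked allocation per dropped hctx), while update_single_stage(2, 4) and update_two_stage(4, 2) both return 0, which matches the observation that the leak only shows up when the queue count is reduced.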

Hi Nilay

How about updating the patch with the trace below, which doesn't have nbd info:

unreferenced object 0xffff8881b82f7400 (size 512):
  comm "check", pid 68454, jiffies 4310588881
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace (crc 5bac8b34):
    __kvmalloc_node_noprof+0x55d/0x7a0
    sbitmap_init_node+0x15a/0x6a0
    kyber_init_hctx+0x316/0xb90
    blk_mq_init_sched+0x419/0x580
    elevator_switch+0x18b/0x630
    elv_update_nr_hw_queues+0x219/0x2c0
    __blk_mq_update_nr_hw_queues+0x36a/0x6f0
    blk_mq_update_nr_hw_queues+0x3a/0x60
    0xffffffffc09ceb80
    0xffffffffc09d7e0b
    configfs_write_iter+0x2b1/0x470
    vfs_write+0x527/0xe70
    ksys_write+0xff/0x200
    do_syscall_64+0x98/0x3c0
    entry_SYSCALL_64_after_hwframe+0x76/0x7e
unreferenced object 0xffff8881b82f6000 (size 512):
  comm "check", pid 68454, jiffies 4310588881
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace (crc 5bac8b34):
    __kvmalloc_node_noprof+0x55d/0x7a0
    sbitmap_init_node+0x15a/0x6a0
    kyber_init_hctx+0x316/0xb90
    blk_mq_init_sched+0x419/0x580
    elevator_switch+0x18b/0x630
    elv_update_nr_hw_queues+0x219/0x2c0
    __blk_mq_update_nr_hw_queues+0x36a/0x6f0
    blk_mq_update_nr_hw_queues+0x3a/0x60
    0xffffffffc09ceb80
    0xffffffffc09d7e0b
    configfs_write_iter+0x2b1/0x470
    vfs_write+0x527/0xe70
    ksys_write+0xff/0x200
    do_syscall_64+0x98/0x3c0
    entry_SYSCALL_64_after_hwframe+0x76/0x7e
unreferenced object 0xffff8881b82f5800 (size 512):
  comm "check", pid 68454, jiffies 4310588881
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace (crc 5bac8b34):
    __kvmalloc_node_noprof+0x55d/0x7a0
    sbitmap_init_node+0x15a/0x6a0
    kyber_init_hctx+0x316/0xb90
    blk_mq_init_sched+0x419/0x580
    elevator_switch+0x18b/0x630
    elv_update_nr_hw_queues+0x219/0x2c0
    __blk_mq_update_nr_hw_queues+0x36a/0x6f0
    blk_mq_update_nr_hw_queues+0x3a/0x60
    0xffffffffc09ceb80
    0xffffffffc09d7e0b
    configfs_write_iter+0x2b1/0x470
    vfs_write+0x527/0xe70
    ksys_write+0xff/0x200
    do_syscall_64+0x98/0x3c0
    entry_SYSCALL_64_after_hwframe+0x76/0x7e

--
Best Regards,
  Yi Zhang



* Re: [bug report] kmemleak issue observed during blktests
  2025-07-17 14:25             ` Yi Zhang
@ 2025-07-17 14:28               ` Nilay Shroff
  0 siblings, 0 replies; 12+ messages in thread
From: Nilay Shroff @ 2025-07-17 14:28 UTC (permalink / raw)
  To: Yi Zhang
  Cc: Ming Lei, Yu Kuai, Jens Axboe, linux-block, Shinichiro Kawasaki,
	yukuai (C)



On 7/17/25 7:55 PM, Yi Zhang wrote:
> 
> Hi Nilay
> 
> How about updating the patch with the trace below, which doesn't have nbd info:
> 
> unreferenced object 0xffff8881b82f7400 (size 512):
>   comm "check", pid 68454, jiffies 4310588881
>   hex dump (first 32 bytes):
>     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>   backtrace (crc 5bac8b34):
>     __kvmalloc_node_noprof+0x55d/0x7a0
>     sbitmap_init_node+0x15a/0x6a0
>     kyber_init_hctx+0x316/0xb90
>     blk_mq_init_sched+0x419/0x580
>     elevator_switch+0x18b/0x630
>     elv_update_nr_hw_queues+0x219/0x2c0
>     __blk_mq_update_nr_hw_queues+0x36a/0x6f0
>     blk_mq_update_nr_hw_queues+0x3a/0x60
>     0xffffffffc09ceb80
>     0xffffffffc09d7e0b
>     configfs_write_iter+0x2b1/0x470
>     vfs_write+0x527/0xe70
>     ksys_write+0xff/0x200
>     do_syscall_64+0x98/0x3c0
>     entry_SYSCALL_64_after_hwframe+0x76/0x7e
> unreferenced object 0xffff8881b82f6000 (size 512):
>   comm "check", pid 68454, jiffies 4310588881
>   hex dump (first 32 bytes):
>     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>   backtrace (crc 5bac8b34):
>     __kvmalloc_node_noprof+0x55d/0x7a0
>     sbitmap_init_node+0x15a/0x6a0
>     kyber_init_hctx+0x316/0xb90
>     blk_mq_init_sched+0x419/0x580
>     elevator_switch+0x18b/0x630
>     elv_update_nr_hw_queues+0x219/0x2c0
>     __blk_mq_update_nr_hw_queues+0x36a/0x6f0
>     blk_mq_update_nr_hw_queues+0x3a/0x60
>     0xffffffffc09ceb80
>     0xffffffffc09d7e0b
>     configfs_write_iter+0x2b1/0x470
>     vfs_write+0x527/0xe70
>     ksys_write+0xff/0x200
>     do_syscall_64+0x98/0x3c0
>     entry_SYSCALL_64_after_hwframe+0x76/0x7e
> unreferenced object 0xffff8881b82f5800 (size 512):
>   comm "check", pid 68454, jiffies 4310588881
>   hex dump (first 32 bytes):
>     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>   backtrace (crc 5bac8b34):
>     __kvmalloc_node_noprof+0x55d/0x7a0
>     sbitmap_init_node+0x15a/0x6a0
>     kyber_init_hctx+0x316/0xb90
>     blk_mq_init_sched+0x419/0x580
>     elevator_switch+0x18b/0x630
>     elv_update_nr_hw_queues+0x219/0x2c0
>     __blk_mq_update_nr_hw_queues+0x36a/0x6f0
>     blk_mq_update_nr_hw_queues+0x3a/0x60
>     0xffffffffc09ceb80
>     0xffffffffc09d7e0b
>     configfs_write_iter+0x2b1/0x470
>     vfs_write+0x527/0xe70
>     ksys_write+0xff/0x200
>     do_syscall_64+0x98/0x3c0
>     entry_SYSCALL_64_after_hwframe+0x76/0x7e
> 
Sure, I will send another patch with the updated 
commit message.

Thanks,
--Nilay



end of thread, other threads:[~2025-07-17 14:29 UTC | newest]

Thread overview: 12+ messages
2025-07-16  1:42 [bug report] kmemleak issue observed during blktests Yi Zhang
2025-07-16  1:54 ` Jens Axboe
2025-07-16  7:50   ` Yu Kuai
2025-07-16 10:40     ` Ming Lei
2025-07-16 19:24       ` Nilay Shroff
2025-07-17  0:02         ` Ming Lei
2025-07-17  0:46           ` Yi Zhang
2025-07-17 14:22             ` Nilay Shroff
2025-07-17 14:11           ` Nilay Shroff
2025-07-17 14:25             ` Yi Zhang
2025-07-17 14:28               ` Nilay Shroff
2025-07-17  3:58       ` Yi Zhang
