* [PATCH 0/2] fix sbitmap initialization and null_blk tagset setup @ 2025-07-20 11:35 Nilay Shroff 2025-07-20 11:35 ` [PATCH 1/2] lib/sbitmap: fix kernel crash observed when sbitmap depth is zero Nilay Shroff 2025-07-20 11:35 ` [PATCH 2/2] null_blk: fix set->driver_data while setting up tagset Nilay Shroff 0 siblings, 2 replies; 8+ messages in thread From: Nilay Shroff @ 2025-07-20 11:35 UTC (permalink / raw) To: linux-block Cc: hch, ming.lei, hare, axboe, dlemoal, johannes.thumshirn, gjoyce Hi, This patchset fixes two subtle issues discovered while unit testing nr_hw_queue update code using null_blk driver. The first patch in the series, fixes an issue in the sbitmap initialization code, where sb->alloc_hint is not explicitly set to NULL when the sbitmap depth is zero. This can lead to a kernel crash in sbitmap_free(), which unconditionally calls free_percpu() on sb->alloc_hint — even if it was never allocated. The crash is caused by dereferencing an invalid pointer or stale garbage value. The second patch in the series fixes a bug in the null_blk driver where the driver_data field of the tagset is not properly initialized when setting up shared tagsets. This omission causes null_map_queues() to fail during nr_hw_queues update, leading to no software queues (ctx) being mapped to new hardware queues (hctx). As a result, the affected hctx remains unused for any IO. Interestingly, this bug exposed the first issue with sbitmap freeing. As usual, review and feedback are most welcome! Nilay Shroff (2): lib/sbitmap: fix kernel crash observed when sbitmap depth is zero null_blk: fix set->driver_data while setting up tagset drivers/block/null_blk/main.c | 3 ++- lib/sbitmap.c | 1 + 2 files changed, 3 insertions(+), 1 deletion(-) -- 2.50.1 ^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH 1/2] lib/sbitmap: fix kernel crash observed when sbitmap depth is zero 2025-07-20 11:35 [PATCH 0/2] fix sbitmap initialization and null_blk tagset setup Nilay Shroff @ 2025-07-20 11:35 ` Nilay Shroff 2025-07-21 7:15 ` Hannes Reinecke 2025-07-21 13:02 ` Damien Le Moal 2025-07-20 11:35 ` [PATCH 2/2] null_blk: fix set->driver_data while setting up tagset Nilay Shroff 1 sibling, 2 replies; 8+ messages in thread From: Nilay Shroff @ 2025-07-20 11:35 UTC (permalink / raw) To: linux-block Cc: hch, ming.lei, hare, axboe, dlemoal, johannes.thumshirn, gjoyce We observed a kernel crash when the I/O scheduler allocates an sbitmap for a hardware queue (hctx) that has no associated software queues (ctx), and later attempts to free it. When no software queues are mapped to a hardware queue, the sbitmap is initialized with a depth of zero. In such cases, the sbitmap_init_node() function should set sb->alloc_hint to NULL. However, if this is not done, sb->alloc_hint may contain garbage, and calling sbitmap_free() will pass this invalid pointer to free_percpu(), resulting in a kernel crash. Example crash trace: ================================================================== Kernel attempted to read user page (28) - exploit attempt? (uid: 0) BUG: Kernel NULL pointer dereference on read at 0x00000028 Faulting instruction address: 0xc000000000708f88 Oops: Kernel access of bad area, sig: 11 [#1] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries [...] CPU: 5 UID: 0 PID: 5491 Comm: mk_nullb_shared Kdump: loaded Tainted: G B 6.16.0-rc5+ #294 VOLUNTARY Tainted: [B]=BAD_PAGE Hardware name: IBM,9043-MRX POWER10 (architected) 0x800200 0xf000006 of:IBM,FW1060.00 (NM1060_028) hv:phyp pSeries [...] NIP [c000000000708f88] free_percpu+0x144/0xba8 LR [c000000000708f84] free_percpu+0x140/0xba8 Call Trace: free_percpu+0x140/0xba8 (unreliable) kyber_exit_hctx+0x94/0x124 blk_mq_exit_sched+0xe4/0x214 elevator_exit+0xa8/0xf4 elevator_switch+0x3b8/0x5d8 elv_update_nr_hw_queues+0x14c/0x300 blk_mq_update_nr_hw_queues+0x5cc/0x670 nullb_update_nr_hw_queues+0x118/0x1f8 [null_blk] nullb_device_submit_queues_store+0xac/0x170 [null_blk] configfs_write_iter+0x1dc/0x2d0 vfs_write+0x5b0/0x77c ksys_write+0xa0/0x180 system_call_exception+0x1b0/0x4f0 system_call_vectored_common+0x15c/0x2ec If the sbitmap depth is zero, sb->alloc_hint memory is NOT allocated, but the pointer is not explicitly set to NULL. Later, during sbitmap_free(), the kernel attempts to free sb->alloc_hint, which is a per cpu pointer variable, regardless of whether it was valid, leading to a crash. This patch ensures that sb->alloc_hint is explicitly set to NULL in sbitmap_init_node() when the requested depth is zero. This prevents free_percpu() from freeing sb->alloc_hint and thus avoids the observed crash. Signed-off-by: Nilay Shroff <nilay@linux.ibm.com> --- lib/sbitmap.c | 1 + 1 file changed, 1 insertion(+) diff --git a/lib/sbitmap.c b/lib/sbitmap.c index d3412984170c..aa8b6ca76169 100644 --- a/lib/sbitmap.c +++ b/lib/sbitmap.c @@ -119,6 +119,7 @@ int sbitmap_init_node(struct sbitmap *sb, unsigned int depth, int shift, if (depth == 0) { sb->map = NULL; + sb->alloc_hint = NULL; return 0; } -- 2.50.1 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH 1/2] lib/sbitmap: fix kernel crash observed when sbitmap depth is zero 2025-07-20 11:35 ` [PATCH 1/2] lib/sbitmap: fix kernel crash observed when sbitmap depth is zero Nilay Shroff @ 2025-07-21 7:15 ` Hannes Reinecke 2025-07-21 13:02 ` Damien Le Moal 1 sibling, 0 replies; 8+ messages in thread From: Hannes Reinecke @ 2025-07-21 7:15 UTC (permalink / raw) To: Nilay Shroff, linux-block Cc: hch, ming.lei, axboe, dlemoal, johannes.thumshirn, gjoyce On 7/20/25 13:35, Nilay Shroff wrote: > We observed a kernel crash when the I/O scheduler allocates an sbitmap > for a hardware queue (hctx) that has no associated software queues (ctx), > and later attempts to free it. When no software queues are mapped to a > hardware queue, the sbitmap is initialized with a depth of zero. In such > cases, the sbitmap_init_node() function should set sb->alloc_hint to NULL. > However, if this is not done, sb->alloc_hint may contain garbage, and > calling sbitmap_free() will pass this invalid pointer to free_percpu(), > resulting in a kernel crash. > > Example crash trace: > ================================================================== > Kernel attempted to read user page (28) - exploit attempt? (uid: 0) > BUG: Kernel NULL pointer dereference on read at 0x00000028 > Faulting instruction address: 0xc000000000708f88 > Oops: Kernel access of bad area, sig: 11 [#1] > LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries > [...] > CPU: 5 UID: 0 PID: 5491 Comm: mk_nullb_shared Kdump: loaded Tainted: G B 6.16.0-rc5+ #294 VOLUNTARY > Tainted: [B]=BAD_PAGE > Hardware name: IBM,9043-MRX POWER10 (architected) 0x800200 0xf000006 of:IBM,FW1060.00 (NM1060_028) hv:phyp pSeries > [...] > NIP [c000000000708f88] free_percpu+0x144/0xba8 > LR [c000000000708f84] free_percpu+0x140/0xba8 > Call Trace: > free_percpu+0x140/0xba8 (unreliable) > kyber_exit_hctx+0x94/0x124 > blk_mq_exit_sched+0xe4/0x214 > elevator_exit+0xa8/0xf4 > elevator_switch+0x3b8/0x5d8 > elv_update_nr_hw_queues+0x14c/0x300 > blk_mq_update_nr_hw_queues+0x5cc/0x670 > nullb_update_nr_hw_queues+0x118/0x1f8 [null_blk] > nullb_device_submit_queues_store+0xac/0x170 [null_blk] > configfs_write_iter+0x1dc/0x2d0 > vfs_write+0x5b0/0x77c > ksys_write+0xa0/0x180 > system_call_exception+0x1b0/0x4f0 > system_call_vectored_common+0x15c/0x2ec > > If the sbitmap depth is zero, sb->alloc_hint memory is NOT allocated, but > the pointer is not explicitly set to NULL. Later, during sbitmap_free(), > the kernel attempts to free sb->alloc_hint, which is a per cpu pointer > variable, regardless of whether it was valid, leading to a crash. > > This patch ensures that sb->alloc_hint is explicitly set to NULL in > sbitmap_init_node() when the requested depth is zero. This prevents > free_percpu() from freeing sb->alloc_hint and thus avoids the observed > crash. > > Signed-off-by: Nilay Shroff <nilay@linux.ibm.com> > --- > lib/sbitmap.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/lib/sbitmap.c b/lib/sbitmap.c > index d3412984170c..aa8b6ca76169 100644 > --- a/lib/sbitmap.c > +++ b/lib/sbitmap.c > @@ -119,6 +119,7 @@ int sbitmap_init_node(struct sbitmap *sb, unsigned int depth, int shift, > > if (depth == 0) { > sb->map = NULL; > + sb->alloc_hint = NULL; > return 0; > } > Reviewed-by: Hannes Reinecke <hare@suse.de> Cheers, Hannes -- Dr. Hannes Reinecke Kernel Storage Architect hare@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/2] lib/sbitmap: fix kernel crash observed when sbitmap depth is zero 2025-07-20 11:35 ` [PATCH 1/2] lib/sbitmap: fix kernel crash observed when sbitmap depth is zero Nilay Shroff 2025-07-21 7:15 ` Hannes Reinecke @ 2025-07-21 13:02 ` Damien Le Moal 1 sibling, 0 replies; 8+ messages in thread From: Damien Le Moal @ 2025-07-21 13:02 UTC (permalink / raw) To: Nilay Shroff, linux-block Cc: hch, ming.lei, hare, axboe, johannes.thumshirn, gjoyce On 2025/07/20 20:35, Nilay Shroff wrote: > We observed a kernel crash when the I/O scheduler allocates an sbitmap > for a hardware queue (hctx) that has no associated software queues (ctx), > and later attempts to free it. When no software queues are mapped to a > hardware queue, the sbitmap is initialized with a depth of zero. In such > cases, the sbitmap_init_node() function should set sb->alloc_hint to NULL. > However, if this is not done, sb->alloc_hint may contain garbage, and > calling sbitmap_free() will pass this invalid pointer to free_percpu(), > resulting in a kernel crash. Reviewed-by: Damien Le Moal <dlemoal@kernel.org> -- Damien Le Moal Western Digital Research ^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH 2/2] null_blk: fix set->driver_data while setting up tagset 2025-07-20 11:35 [PATCH 0/2] fix sbitmap initialization and null_blk tagset setup Nilay Shroff 2025-07-20 11:35 ` [PATCH 1/2] lib/sbitmap: fix kernel crash observed when sbitmap depth is zero Nilay Shroff @ 2025-07-20 11:35 ` Nilay Shroff 2025-07-21 7:16 ` Hannes Reinecke 2025-07-21 13:04 ` Damien Le Moal 1 sibling, 2 replies; 8+ messages in thread From: Nilay Shroff @ 2025-07-20 11:35 UTC (permalink / raw) To: linux-block Cc: hch, ming.lei, hare, axboe, dlemoal, johannes.thumshirn, gjoyce When setting up a null block device, we initialize a tagset that includes a driver_data field—typically used by block drivers to store a pointer to driver-specific data. In the case of null_blk, this should point to the struct nullb instance. However, due to recent tagset refactoring in the null_blk driver, we missed initializing driver_data when creating a shared tagset. As a result, software queues (ctx) fail to map correctly to new hardware queues (hctx). For example, increasing the number of submit queues triggers an nr_hw_queues update, which invokes null_map_queues() to remap queues. Since set->driver_data is unset, null_map_queues() fails to map any ctx to the new hctxs, leading to hctx->nr_ctx == 0, effectively making the hardware queues unusable for I/O. This patch fixes the issue by ensuring that set->driver_data is properly initialized to point to the struct nullb during tagset setup. Fixes: 72ca28765fc4 ("null_blk: refactor tag_set setup") Signed-off-by: Nilay Shroff <nilay@linux.ibm.com> --- drivers/block/null_blk/main.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/block/null_blk/main.c b/drivers/block/null_blk/main.c index aa163ae9b2aa..9e1c4ce6fc42 100644 --- a/drivers/block/null_blk/main.c +++ b/drivers/block/null_blk/main.c @@ -1854,13 +1854,14 @@ static int null_init_global_tag_set(void) static int null_setup_tagset(struct nullb *nullb) { + nullb->tag_set->driver_data = nullb; + if (nullb->dev->shared_tags) { nullb->tag_set = &tag_set; return null_init_global_tag_set(); } nullb->tag_set = &nullb->__tag_set; - nullb->tag_set->driver_data = nullb; nullb->tag_set->nr_hw_queues = nullb->dev->submit_queues; nullb->tag_set->queue_depth = nullb->dev->hw_queue_depth; nullb->tag_set->numa_node = nullb->dev->home_node; -- 2.50.1 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH 2/2] null_blk: fix set->driver_data while setting up tagset 2025-07-20 11:35 ` [PATCH 2/2] null_blk: fix set->driver_data while setting up tagset Nilay Shroff @ 2025-07-21 7:16 ` Hannes Reinecke 2025-07-21 13:04 ` Damien Le Moal 1 sibling, 0 replies; 8+ messages in thread From: Hannes Reinecke @ 2025-07-21 7:16 UTC (permalink / raw) To: Nilay Shroff, linux-block Cc: hch, ming.lei, axboe, dlemoal, johannes.thumshirn, gjoyce On 7/20/25 13:35, Nilay Shroff wrote: > When setting up a null block device, we initialize a tagset that > includes a driver_data field—typically used by block drivers to > store a pointer to driver-specific data. In the case of null_blk, > this should point to the struct nullb instance. > > However, due to recent tagset refactoring in the null_blk driver, we > missed initializing driver_data when creating a shared tagset. As a > result, software queues (ctx) fail to map correctly to new hardware > queues (hctx). For example, increasing the number of submit queues > triggers an nr_hw_queues update, which invokes null_map_queues() to > remap queues. Since set->driver_data is unset, null_map_queues() > fails to map any ctx to the new hctxs, leading to hctx->nr_ctx == 0, > effectively making the hardware queues unusable for I/O. > > This patch fixes the issue by ensuring that set->driver_data is properly > initialized to point to the struct nullb during tagset setup. > > Fixes: 72ca28765fc4 ("null_blk: refactor tag_set setup") > Signed-off-by: Nilay Shroff <nilay@linux.ibm.com> > --- > drivers/block/null_blk/main.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/drivers/block/null_blk/main.c b/drivers/block/null_blk/main.c > index aa163ae9b2aa..9e1c4ce6fc42 100644 > --- a/drivers/block/null_blk/main.c > +++ b/drivers/block/null_blk/main.c > @@ -1854,13 +1854,14 @@ static int null_init_global_tag_set(void) > > static int null_setup_tagset(struct nullb *nullb) > { > + nullb->tag_set->driver_data = nullb; > + > if (nullb->dev->shared_tags) { > nullb->tag_set = &tag_set; > return null_init_global_tag_set(); > } > > nullb->tag_set = &nullb->__tag_set; > - nullb->tag_set->driver_data = nullb; > nullb->tag_set->nr_hw_queues = nullb->dev->submit_queues; > nullb->tag_set->queue_depth = nullb->dev->hw_queue_depth; > nullb->tag_set->numa_node = nullb->dev->home_node; Reviewed-by: Hannes Reinecke <hare@suse.de> Cheers, Hannes -- Dr. Hannes Reinecke Kernel Storage Architect hare@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 2/2] null_blk: fix set->driver_data while setting up tagset 2025-07-20 11:35 ` [PATCH 2/2] null_blk: fix set->driver_data while setting up tagset Nilay Shroff 2025-07-21 7:16 ` Hannes Reinecke @ 2025-07-21 13:04 ` Damien Le Moal 2025-07-21 13:37 ` Nilay Shroff 1 sibling, 1 reply; 8+ messages in thread From: Damien Le Moal @ 2025-07-21 13:04 UTC (permalink / raw) To: Nilay Shroff, linux-block Cc: hch, ming.lei, hare, axboe, johannes.thumshirn, gjoyce On 2025/07/20 20:35, Nilay Shroff wrote: > When setting up a null block device, we initialize a tagset that > includes a driver_data field—typically used by block drivers to > store a pointer to driver-specific data. In the case of null_blk, > this should point to the struct nullb instance. > > However, due to recent tagset refactoring in the null_blk driver, we > missed initializing driver_data when creating a shared tagset. As a > result, software queues (ctx) fail to map correctly to new hardware > queues (hctx). For example, increasing the number of submit queues > triggers an nr_hw_queues update, which invokes null_map_queues() to > remap queues. Since set->driver_data is unset, null_map_queues() > fails to map any ctx to the new hctxs, leading to hctx->nr_ctx == 0, > effectively making the hardware queues unusable for I/O. > > This patch fixes the issue by ensuring that set->driver_data is properly > initialized to point to the struct nullb during tagset setup. > > Fixes: 72ca28765fc4 ("null_blk: refactor tag_set setup") > Signed-off-by: Nilay Shroff <nilay@linux.ibm.com> > --- > drivers/block/null_blk/main.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/drivers/block/null_blk/main.c b/drivers/block/null_blk/main.c > index aa163ae9b2aa..9e1c4ce6fc42 100644 > --- a/drivers/block/null_blk/main.c > +++ b/drivers/block/null_blk/main.c > @@ -1854,13 +1854,14 @@ static int null_init_global_tag_set(void) > > static int null_setup_tagset(struct nullb *nullb) > { > + nullb->tag_set->driver_data = nullb; > + How can this be correct since the tag_set pointer is initialized below ? > if (nullb->dev->shared_tags) { > nullb->tag_set = &tag_set; Shouldn't you add: nullb->tag_set->driver_data = nullb; here instead ? > return null_init_global_tag_set(); > } > > nullb->tag_set = &nullb->__tag_set; > - nullb->tag_set->driver_data = nullb; > nullb->tag_set->nr_hw_queues = nullb->dev->submit_queues; > nullb->tag_set->queue_depth = nullb->dev->hw_queue_depth; > nullb->tag_set->numa_node = nullb->dev->home_node; -- Damien Le Moal Western Digital Research ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 2/2] null_blk: fix set->driver_data while setting up tagset 2025-07-21 13:04 ` Damien Le Moal @ 2025-07-21 13:37 ` Nilay Shroff 0 siblings, 0 replies; 8+ messages in thread From: Nilay Shroff @ 2025-07-21 13:37 UTC (permalink / raw) To: Damien Le Moal, linux-block Cc: hch, ming.lei, hare, axboe, johannes.thumshirn, gjoyce On 7/21/25 6:34 PM, Damien Le Moal wrote: > On 2025/07/20 20:35, Nilay Shroff wrote: >> When setting up a null block device, we initialize a tagset that >> includes a driver_data field—typically used by block drivers to >> store a pointer to driver-specific data. In the case of null_blk, >> this should point to the struct nullb instance. >> >> However, due to recent tagset refactoring in the null_blk driver, we >> missed initializing driver_data when creating a shared tagset. As a >> result, software queues (ctx) fail to map correctly to new hardware >> queues (hctx). For example, increasing the number of submit queues >> triggers an nr_hw_queues update, which invokes null_map_queues() to >> remap queues. Since set->driver_data is unset, null_map_queues() >> fails to map any ctx to the new hctxs, leading to hctx->nr_ctx == 0, >> effectively making the hardware queues unusable for I/O. >> >> This patch fixes the issue by ensuring that set->driver_data is properly >> initialized to point to the struct nullb during tagset setup. >> >> Fixes: 72ca28765fc4 ("null_blk: refactor tag_set setup") >> Signed-off-by: Nilay Shroff <nilay@linux.ibm.com> >> --- >> drivers/block/null_blk/main.c | 3 ++- >> 1 file changed, 2 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/block/null_blk/main.c b/drivers/block/null_blk/main.c >> index aa163ae9b2aa..9e1c4ce6fc42 100644 >> --- a/drivers/block/null_blk/main.c >> +++ b/drivers/block/null_blk/main.c >> @@ -1854,13 +1854,14 @@ static int null_init_global_tag_set(void) >> >> static int null_setup_tagset(struct nullb *nullb) >> { >> + nullb->tag_set->driver_data = nullb; >> + > > How can this be correct since the tag_set pointer is initialized below ? > >> if (nullb->dev->shared_tags) { >> nullb->tag_set = &tag_set; > > Shouldn't you add: > > nullb->tag_set->driver_data = nullb; > > here instead ? > >> return null_init_global_tag_set(); >> } Oh yes good catch! I'll fix this in the next patchset. My bad.. :( Thanks, --Nilay ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2025-07-21 13:37 UTC | newest] Thread overview: 8+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-07-20 11:35 [PATCH 0/2] fix sbitmap initialization and null_blk tagset setup Nilay Shroff 2025-07-20 11:35 ` [PATCH 1/2] lib/sbitmap: fix kernel crash observed when sbitmap depth is zero Nilay Shroff 2025-07-21 7:15 ` Hannes Reinecke 2025-07-21 13:02 ` Damien Le Moal 2025-07-20 11:35 ` [PATCH 2/2] null_blk: fix set->driver_data while setting up tagset Nilay Shroff 2025-07-21 7:16 ` Hannes Reinecke 2025-07-21 13:04 ` Damien Le Moal 2025-07-21 13:37 ` Nilay Shroff
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox