* [PATCH 0/2] fix sbitmap initialization and null_blk tagset setup
@ 2025-07-20 11:35 Nilay Shroff
2025-07-20 11:35 ` [PATCH 1/2] lib/sbitmap: fix kernel crash observed when sbitmap depth is zero Nilay Shroff
2025-07-20 11:35 ` [PATCH 2/2] null_blk: fix set->driver_data while setting up tagset Nilay Shroff
0 siblings, 2 replies; 8+ messages in thread
From: Nilay Shroff @ 2025-07-20 11:35 UTC (permalink / raw)
To: linux-block
Cc: hch, ming.lei, hare, axboe, dlemoal, johannes.thumshirn, gjoyce
Hi,
This patchset fixes two subtle issues discovered while unit testing
nr_hw_queue update code using null_blk driver.
The first patch in the series, fixes an issue in the sbitmap initialization
code, where sb->alloc_hint is not explicitly set to NULL when the sbitmap
depth is zero. This can lead to a kernel crash in sbitmap_free(), which
unconditionally calls free_percpu() on sb->alloc_hint — even if it was
never allocated. The crash is caused by dereferencing an invalid pointer
or stale garbage value.
The second patch in the series fixes a bug in the null_blk driver where
the driver_data field of the tagset is not properly initialized when
setting up shared tagsets. This omission causes null_map_queues() to fail
during nr_hw_queues update, leading to no software queues (ctx) being
mapped to new hardware queues (hctx). As a result, the affected hctx
remains unused for any IO. Interestingly, this bug exposed the first
issue with sbitmap freeing.
As usual, review and feedback are most welcome!
Nilay Shroff (2):
lib/sbitmap: fix kernel crash observed when sbitmap depth is zero
null_blk: fix set->driver_data while setting up tagset
drivers/block/null_blk/main.c | 3 ++-
lib/sbitmap.c | 1 +
2 files changed, 3 insertions(+), 1 deletion(-)
--
2.50.1
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH 1/2] lib/sbitmap: fix kernel crash observed when sbitmap depth is zero
2025-07-20 11:35 [PATCH 0/2] fix sbitmap initialization and null_blk tagset setup Nilay Shroff
@ 2025-07-20 11:35 ` Nilay Shroff
2025-07-21 7:15 ` Hannes Reinecke
2025-07-21 13:02 ` Damien Le Moal
2025-07-20 11:35 ` [PATCH 2/2] null_blk: fix set->driver_data while setting up tagset Nilay Shroff
1 sibling, 2 replies; 8+ messages in thread
From: Nilay Shroff @ 2025-07-20 11:35 UTC (permalink / raw)
To: linux-block
Cc: hch, ming.lei, hare, axboe, dlemoal, johannes.thumshirn, gjoyce
We observed a kernel crash when the I/O scheduler allocates an sbitmap
for a hardware queue (hctx) that has no associated software queues (ctx),
and later attempts to free it. When no software queues are mapped to a
hardware queue, the sbitmap is initialized with a depth of zero. In such
cases, the sbitmap_init_node() function should set sb->alloc_hint to NULL.
However, if this is not done, sb->alloc_hint may contain garbage, and
calling sbitmap_free() will pass this invalid pointer to free_percpu(),
resulting in a kernel crash.
Example crash trace:
==================================================================
Kernel attempted to read user page (28) - exploit attempt? (uid: 0)
BUG: Kernel NULL pointer dereference on read at 0x00000028
Faulting instruction address: 0xc000000000708f88
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
[...]
CPU: 5 UID: 0 PID: 5491 Comm: mk_nullb_shared Kdump: loaded Tainted: G B 6.16.0-rc5+ #294 VOLUNTARY
Tainted: [B]=BAD_PAGE
Hardware name: IBM,9043-MRX POWER10 (architected) 0x800200 0xf000006 of:IBM,FW1060.00 (NM1060_028) hv:phyp pSeries
[...]
NIP [c000000000708f88] free_percpu+0x144/0xba8
LR [c000000000708f84] free_percpu+0x140/0xba8
Call Trace:
free_percpu+0x140/0xba8 (unreliable)
kyber_exit_hctx+0x94/0x124
blk_mq_exit_sched+0xe4/0x214
elevator_exit+0xa8/0xf4
elevator_switch+0x3b8/0x5d8
elv_update_nr_hw_queues+0x14c/0x300
blk_mq_update_nr_hw_queues+0x5cc/0x670
nullb_update_nr_hw_queues+0x118/0x1f8 [null_blk]
nullb_device_submit_queues_store+0xac/0x170 [null_blk]
configfs_write_iter+0x1dc/0x2d0
vfs_write+0x5b0/0x77c
ksys_write+0xa0/0x180
system_call_exception+0x1b0/0x4f0
system_call_vectored_common+0x15c/0x2ec
If the sbitmap depth is zero, sb->alloc_hint memory is NOT allocated, but
the pointer is not explicitly set to NULL. Later, during sbitmap_free(),
the kernel attempts to free sb->alloc_hint, which is a per cpu pointer
variable, regardless of whether it was valid, leading to a crash.
This patch ensures that sb->alloc_hint is explicitly set to NULL in
sbitmap_init_node() when the requested depth is zero. This prevents
free_percpu() from freeing sb->alloc_hint and thus avoids the observed
crash.
Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
---
lib/sbitmap.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/lib/sbitmap.c b/lib/sbitmap.c
index d3412984170c..aa8b6ca76169 100644
--- a/lib/sbitmap.c
+++ b/lib/sbitmap.c
@@ -119,6 +119,7 @@ int sbitmap_init_node(struct sbitmap *sb, unsigned int depth, int shift,
if (depth == 0) {
sb->map = NULL;
+ sb->alloc_hint = NULL;
return 0;
}
--
2.50.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH 2/2] null_blk: fix set->driver_data while setting up tagset
2025-07-20 11:35 [PATCH 0/2] fix sbitmap initialization and null_blk tagset setup Nilay Shroff
2025-07-20 11:35 ` [PATCH 1/2] lib/sbitmap: fix kernel crash observed when sbitmap depth is zero Nilay Shroff
@ 2025-07-20 11:35 ` Nilay Shroff
2025-07-21 7:16 ` Hannes Reinecke
2025-07-21 13:04 ` Damien Le Moal
1 sibling, 2 replies; 8+ messages in thread
From: Nilay Shroff @ 2025-07-20 11:35 UTC (permalink / raw)
To: linux-block
Cc: hch, ming.lei, hare, axboe, dlemoal, johannes.thumshirn, gjoyce
When setting up a null block device, we initialize a tagset that
includes a driver_data field—typically used by block drivers to
store a pointer to driver-specific data. In the case of null_blk,
this should point to the struct nullb instance.
However, due to recent tagset refactoring in the null_blk driver, we
missed initializing driver_data when creating a shared tagset. As a
result, software queues (ctx) fail to map correctly to new hardware
queues (hctx). For example, increasing the number of submit queues
triggers an nr_hw_queues update, which invokes null_map_queues() to
remap queues. Since set->driver_data is unset, null_map_queues()
fails to map any ctx to the new hctxs, leading to hctx->nr_ctx == 0,
effectively making the hardware queues unusable for I/O.
This patch fixes the issue by ensuring that set->driver_data is properly
initialized to point to the struct nullb during tagset setup.
Fixes: 72ca28765fc4 ("null_blk: refactor tag_set setup")
Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
---
drivers/block/null_blk/main.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/block/null_blk/main.c b/drivers/block/null_blk/main.c
index aa163ae9b2aa..9e1c4ce6fc42 100644
--- a/drivers/block/null_blk/main.c
+++ b/drivers/block/null_blk/main.c
@@ -1854,13 +1854,14 @@ static int null_init_global_tag_set(void)
static int null_setup_tagset(struct nullb *nullb)
{
+ nullb->tag_set->driver_data = nullb;
+
if (nullb->dev->shared_tags) {
nullb->tag_set = &tag_set;
return null_init_global_tag_set();
}
nullb->tag_set = &nullb->__tag_set;
- nullb->tag_set->driver_data = nullb;
nullb->tag_set->nr_hw_queues = nullb->dev->submit_queues;
nullb->tag_set->queue_depth = nullb->dev->hw_queue_depth;
nullb->tag_set->numa_node = nullb->dev->home_node;
--
2.50.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH 1/2] lib/sbitmap: fix kernel crash observed when sbitmap depth is zero
2025-07-20 11:35 ` [PATCH 1/2] lib/sbitmap: fix kernel crash observed when sbitmap depth is zero Nilay Shroff
@ 2025-07-21 7:15 ` Hannes Reinecke
2025-07-21 13:02 ` Damien Le Moal
1 sibling, 0 replies; 8+ messages in thread
From: Hannes Reinecke @ 2025-07-21 7:15 UTC (permalink / raw)
To: Nilay Shroff, linux-block
Cc: hch, ming.lei, axboe, dlemoal, johannes.thumshirn, gjoyce
On 7/20/25 13:35, Nilay Shroff wrote:
> We observed a kernel crash when the I/O scheduler allocates an sbitmap
> for a hardware queue (hctx) that has no associated software queues (ctx),
> and later attempts to free it. When no software queues are mapped to a
> hardware queue, the sbitmap is initialized with a depth of zero. In such
> cases, the sbitmap_init_node() function should set sb->alloc_hint to NULL.
> However, if this is not done, sb->alloc_hint may contain garbage, and
> calling sbitmap_free() will pass this invalid pointer to free_percpu(),
> resulting in a kernel crash.
>
> Example crash trace:
> ==================================================================
> Kernel attempted to read user page (28) - exploit attempt? (uid: 0)
> BUG: Kernel NULL pointer dereference on read at 0x00000028
> Faulting instruction address: 0xc000000000708f88
> Oops: Kernel access of bad area, sig: 11 [#1]
> LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
> [...]
> CPU: 5 UID: 0 PID: 5491 Comm: mk_nullb_shared Kdump: loaded Tainted: G B 6.16.0-rc5+ #294 VOLUNTARY
> Tainted: [B]=BAD_PAGE
> Hardware name: IBM,9043-MRX POWER10 (architected) 0x800200 0xf000006 of:IBM,FW1060.00 (NM1060_028) hv:phyp pSeries
> [...]
> NIP [c000000000708f88] free_percpu+0x144/0xba8
> LR [c000000000708f84] free_percpu+0x140/0xba8
> Call Trace:
> free_percpu+0x140/0xba8 (unreliable)
> kyber_exit_hctx+0x94/0x124
> blk_mq_exit_sched+0xe4/0x214
> elevator_exit+0xa8/0xf4
> elevator_switch+0x3b8/0x5d8
> elv_update_nr_hw_queues+0x14c/0x300
> blk_mq_update_nr_hw_queues+0x5cc/0x670
> nullb_update_nr_hw_queues+0x118/0x1f8 [null_blk]
> nullb_device_submit_queues_store+0xac/0x170 [null_blk]
> configfs_write_iter+0x1dc/0x2d0
> vfs_write+0x5b0/0x77c
> ksys_write+0xa0/0x180
> system_call_exception+0x1b0/0x4f0
> system_call_vectored_common+0x15c/0x2ec
>
> If the sbitmap depth is zero, sb->alloc_hint memory is NOT allocated, but
> the pointer is not explicitly set to NULL. Later, during sbitmap_free(),
> the kernel attempts to free sb->alloc_hint, which is a per cpu pointer
> variable, regardless of whether it was valid, leading to a crash.
>
> This patch ensures that sb->alloc_hint is explicitly set to NULL in
> sbitmap_init_node() when the requested depth is zero. This prevents
> free_percpu() from freeing sb->alloc_hint and thus avoids the observed
> crash.
>
> Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
> ---
> lib/sbitmap.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/lib/sbitmap.c b/lib/sbitmap.c
> index d3412984170c..aa8b6ca76169 100644
> --- a/lib/sbitmap.c
> +++ b/lib/sbitmap.c
> @@ -119,6 +119,7 @@ int sbitmap_init_node(struct sbitmap *sb, unsigned int depth, int shift,
>
> if (depth == 0) {
> sb->map = NULL;
> + sb->alloc_hint = NULL;
> return 0;
> }
>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 2/2] null_blk: fix set->driver_data while setting up tagset
2025-07-20 11:35 ` [PATCH 2/2] null_blk: fix set->driver_data while setting up tagset Nilay Shroff
@ 2025-07-21 7:16 ` Hannes Reinecke
2025-07-21 13:04 ` Damien Le Moal
1 sibling, 0 replies; 8+ messages in thread
From: Hannes Reinecke @ 2025-07-21 7:16 UTC (permalink / raw)
To: Nilay Shroff, linux-block
Cc: hch, ming.lei, axboe, dlemoal, johannes.thumshirn, gjoyce
On 7/20/25 13:35, Nilay Shroff wrote:
> When setting up a null block device, we initialize a tagset that
> includes a driver_data field—typically used by block drivers to
> store a pointer to driver-specific data. In the case of null_blk,
> this should point to the struct nullb instance.
>
> However, due to recent tagset refactoring in the null_blk driver, we
> missed initializing driver_data when creating a shared tagset. As a
> result, software queues (ctx) fail to map correctly to new hardware
> queues (hctx). For example, increasing the number of submit queues
> triggers an nr_hw_queues update, which invokes null_map_queues() to
> remap queues. Since set->driver_data is unset, null_map_queues()
> fails to map any ctx to the new hctxs, leading to hctx->nr_ctx == 0,
> effectively making the hardware queues unusable for I/O.
>
> This patch fixes the issue by ensuring that set->driver_data is properly
> initialized to point to the struct nullb during tagset setup.
>
> Fixes: 72ca28765fc4 ("null_blk: refactor tag_set setup")
> Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
> ---
> drivers/block/null_blk/main.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/block/null_blk/main.c b/drivers/block/null_blk/main.c
> index aa163ae9b2aa..9e1c4ce6fc42 100644
> --- a/drivers/block/null_blk/main.c
> +++ b/drivers/block/null_blk/main.c
> @@ -1854,13 +1854,14 @@ static int null_init_global_tag_set(void)
>
> static int null_setup_tagset(struct nullb *nullb)
> {
> + nullb->tag_set->driver_data = nullb;
> +
> if (nullb->dev->shared_tags) {
> nullb->tag_set = &tag_set;
> return null_init_global_tag_set();
> }
>
> nullb->tag_set = &nullb->__tag_set;
> - nullb->tag_set->driver_data = nullb;
> nullb->tag_set->nr_hw_queues = nullb->dev->submit_queues;
> nullb->tag_set->queue_depth = nullb->dev->hw_queue_depth;
> nullb->tag_set->numa_node = nullb->dev->home_node;
Reviewed-by: Hannes Reinecke <hare@suse.de>
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/2] lib/sbitmap: fix kernel crash observed when sbitmap depth is zero
2025-07-20 11:35 ` [PATCH 1/2] lib/sbitmap: fix kernel crash observed when sbitmap depth is zero Nilay Shroff
2025-07-21 7:15 ` Hannes Reinecke
@ 2025-07-21 13:02 ` Damien Le Moal
1 sibling, 0 replies; 8+ messages in thread
From: Damien Le Moal @ 2025-07-21 13:02 UTC (permalink / raw)
To: Nilay Shroff, linux-block
Cc: hch, ming.lei, hare, axboe, johannes.thumshirn, gjoyce
On 2025/07/20 20:35, Nilay Shroff wrote:
> We observed a kernel crash when the I/O scheduler allocates an sbitmap
> for a hardware queue (hctx) that has no associated software queues (ctx),
> and later attempts to free it. When no software queues are mapped to a
> hardware queue, the sbitmap is initialized with a depth of zero. In such
> cases, the sbitmap_init_node() function should set sb->alloc_hint to NULL.
> However, if this is not done, sb->alloc_hint may contain garbage, and
> calling sbitmap_free() will pass this invalid pointer to free_percpu(),
> resulting in a kernel crash.
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
--
Damien Le Moal
Western Digital Research
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 2/2] null_blk: fix set->driver_data while setting up tagset
2025-07-20 11:35 ` [PATCH 2/2] null_blk: fix set->driver_data while setting up tagset Nilay Shroff
2025-07-21 7:16 ` Hannes Reinecke
@ 2025-07-21 13:04 ` Damien Le Moal
2025-07-21 13:37 ` Nilay Shroff
1 sibling, 1 reply; 8+ messages in thread
From: Damien Le Moal @ 2025-07-21 13:04 UTC (permalink / raw)
To: Nilay Shroff, linux-block
Cc: hch, ming.lei, hare, axboe, johannes.thumshirn, gjoyce
On 2025/07/20 20:35, Nilay Shroff wrote:
> When setting up a null block device, we initialize a tagset that
> includes a driver_data field—typically used by block drivers to
> store a pointer to driver-specific data. In the case of null_blk,
> this should point to the struct nullb instance.
>
> However, due to recent tagset refactoring in the null_blk driver, we
> missed initializing driver_data when creating a shared tagset. As a
> result, software queues (ctx) fail to map correctly to new hardware
> queues (hctx). For example, increasing the number of submit queues
> triggers an nr_hw_queues update, which invokes null_map_queues() to
> remap queues. Since set->driver_data is unset, null_map_queues()
> fails to map any ctx to the new hctxs, leading to hctx->nr_ctx == 0,
> effectively making the hardware queues unusable for I/O.
>
> This patch fixes the issue by ensuring that set->driver_data is properly
> initialized to point to the struct nullb during tagset setup.
>
> Fixes: 72ca28765fc4 ("null_blk: refactor tag_set setup")
> Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
> ---
> drivers/block/null_blk/main.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/block/null_blk/main.c b/drivers/block/null_blk/main.c
> index aa163ae9b2aa..9e1c4ce6fc42 100644
> --- a/drivers/block/null_blk/main.c
> +++ b/drivers/block/null_blk/main.c
> @@ -1854,13 +1854,14 @@ static int null_init_global_tag_set(void)
>
> static int null_setup_tagset(struct nullb *nullb)
> {
> + nullb->tag_set->driver_data = nullb;
> +
How can this be correct since the tag_set pointer is initialized below ?
> if (nullb->dev->shared_tags) {
> nullb->tag_set = &tag_set;
Shouldn't you add:
nullb->tag_set->driver_data = nullb;
here instead ?
> return null_init_global_tag_set();
> }
>
> nullb->tag_set = &nullb->__tag_set;
> - nullb->tag_set->driver_data = nullb;
> nullb->tag_set->nr_hw_queues = nullb->dev->submit_queues;
> nullb->tag_set->queue_depth = nullb->dev->hw_queue_depth;
> nullb->tag_set->numa_node = nullb->dev->home_node;
--
Damien Le Moal
Western Digital Research
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 2/2] null_blk: fix set->driver_data while setting up tagset
2025-07-21 13:04 ` Damien Le Moal
@ 2025-07-21 13:37 ` Nilay Shroff
0 siblings, 0 replies; 8+ messages in thread
From: Nilay Shroff @ 2025-07-21 13:37 UTC (permalink / raw)
To: Damien Le Moal, linux-block
Cc: hch, ming.lei, hare, axboe, johannes.thumshirn, gjoyce
On 7/21/25 6:34 PM, Damien Le Moal wrote:
> On 2025/07/20 20:35, Nilay Shroff wrote:
>> When setting up a null block device, we initialize a tagset that
>> includes a driver_data field—typically used by block drivers to
>> store a pointer to driver-specific data. In the case of null_blk,
>> this should point to the struct nullb instance.
>>
>> However, due to recent tagset refactoring in the null_blk driver, we
>> missed initializing driver_data when creating a shared tagset. As a
>> result, software queues (ctx) fail to map correctly to new hardware
>> queues (hctx). For example, increasing the number of submit queues
>> triggers an nr_hw_queues update, which invokes null_map_queues() to
>> remap queues. Since set->driver_data is unset, null_map_queues()
>> fails to map any ctx to the new hctxs, leading to hctx->nr_ctx == 0,
>> effectively making the hardware queues unusable for I/O.
>>
>> This patch fixes the issue by ensuring that set->driver_data is properly
>> initialized to point to the struct nullb during tagset setup.
>>
>> Fixes: 72ca28765fc4 ("null_blk: refactor tag_set setup")
>> Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
>> ---
>> drivers/block/null_blk/main.c | 3 ++-
>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/block/null_blk/main.c b/drivers/block/null_blk/main.c
>> index aa163ae9b2aa..9e1c4ce6fc42 100644
>> --- a/drivers/block/null_blk/main.c
>> +++ b/drivers/block/null_blk/main.c
>> @@ -1854,13 +1854,14 @@ static int null_init_global_tag_set(void)
>>
>> static int null_setup_tagset(struct nullb *nullb)
>> {
>> + nullb->tag_set->driver_data = nullb;
>> +
>
> How can this be correct since the tag_set pointer is initialized below ?
>
>> if (nullb->dev->shared_tags) {
>> nullb->tag_set = &tag_set;
>
> Shouldn't you add:
>
> nullb->tag_set->driver_data = nullb;
>
> here instead ?
>
>> return null_init_global_tag_set();
>> }
Oh yes good catch! I'll fix this in the next patchset.
My bad.. :(
Thanks,
--Nilay
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2025-07-21 13:37 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-07-20 11:35 [PATCH 0/2] fix sbitmap initialization and null_blk tagset setup Nilay Shroff
2025-07-20 11:35 ` [PATCH 1/2] lib/sbitmap: fix kernel crash observed when sbitmap depth is zero Nilay Shroff
2025-07-21 7:15 ` Hannes Reinecke
2025-07-21 13:02 ` Damien Le Moal
2025-07-20 11:35 ` [PATCH 2/2] null_blk: fix set->driver_data while setting up tagset Nilay Shroff
2025-07-21 7:16 ` Hannes Reinecke
2025-07-21 13:04 ` Damien Le Moal
2025-07-21 13:37 ` Nilay Shroff
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox