* [PATCHv3 0/2] fix sbitmap initialization and null_blk shared tagset behavior
@ 2025-07-23 13:43 Nilay Shroff
2025-07-23 13:43 ` [PATCHv3 1/2] lib/sbitmap: fix kernel crash observed when sbitmap depth is zero Nilay Shroff
2025-07-23 13:43 ` [PATCHv3 2/2] null_blk: prevent submit and poll queues update for shared tagset Nilay Shroff
0 siblings, 2 replies; 7+ messages in thread
From: Nilay Shroff @ 2025-07-23 13:43 UTC (permalink / raw)
To: linux-block
Cc: hch, dlemoal, yukuai1, hare, ming.lei, axboe, johannes.thumshirn,
gjoyce
Hi,
This patchset fixes two subtle issues discovered while unit testing
nr_hw_queue update code using null_blk driver.
The first patch in the series, fixes an issue in the sbitmap initialization
code, where sb->alloc_hint is not explicitly set to NULL when the sbitmap
depth is zero. This can lead to a kernel crash in sbitmap_free(), which
unconditionally calls free_percpu() on sb->alloc_hint — even if it was
never allocated. The crash is caused by dereferencing an invalid pointer
or stale garbage value.
The second patch in the series, prevents runtime updates to submit_queues
or poll_queues when using a shared tagset. Currently, such updates lead
to the allocation of new hardware queues (hctx) that are never mapped to
any software queues (ctx), rendering them unusable for I/O. This patch
rejects these changes and ensures more consistent behavior. Interestingly,
this unnecessary queue update path helped uncover the issue fixed in first
patch.
As usual, review and feedback are most welcome!
Changes from v2:
- Updated the second patch to prevent the user from modifying submit
or poll queues when tagset is shared (Damien Le Moal, Yu Kuai)
Changes from v1:
- The set->driver_data field should be initialized separately for the
shared tagset to ensure it is correctly set for both shared and
non-shared tagset cases. (Damien Le Moal)
Nilay Shroff (2):
lib/sbitmap: fix kernel crash observed when sbitmap depth is zero
null_blk: prevent submit and poll queues update for shared tagset
drivers/block/null_blk/main.c | 32 ++++++++++++++++++++++----------
lib/sbitmap.c | 1 +
2 files changed, 23 insertions(+), 10 deletions(-)
--
2.50.1
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCHv3 1/2] lib/sbitmap: fix kernel crash observed when sbitmap depth is zero
2025-07-23 13:43 [PATCHv3 0/2] fix sbitmap initialization and null_blk shared tagset behavior Nilay Shroff
@ 2025-07-23 13:43 ` Nilay Shroff
2025-07-24 2:19 ` Yu Kuai
2025-07-23 13:43 ` [PATCHv3 2/2] null_blk: prevent submit and poll queues update for shared tagset Nilay Shroff
1 sibling, 1 reply; 7+ messages in thread
From: Nilay Shroff @ 2025-07-23 13:43 UTC (permalink / raw)
To: linux-block
Cc: hch, dlemoal, yukuai1, hare, ming.lei, axboe, johannes.thumshirn,
gjoyce
We observed a kernel crash when the I/O scheduler allocates an sbitmap
for a hardware queue (hctx) that has no associated software queues (ctx),
and later attempts to free it. When no software queues are mapped to a
hardware queue, the sbitmap is initialized with a depth of zero. In such
cases, the sbitmap_init_node() function should set sb->alloc_hint to NULL.
However, if this is not done, sb->alloc_hint may contain garbage, and
calling sbitmap_free() will pass this invalid pointer to free_percpu(),
resulting in a kernel crash.
Example crash trace:
==================================================================
Kernel attempted to read user page (28) - exploit attempt? (uid: 0)
BUG: Kernel NULL pointer dereference on read at 0x00000028
Faulting instruction address: 0xc000000000708f88
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
[...]
CPU: 5 UID: 0 PID: 5491 Comm: mk_nullb_shared Kdump: loaded Tainted: G B 6.16.0-rc5+ #294 VOLUNTARY
Tainted: [B]=BAD_PAGE
Hardware name: IBM,9043-MRX POWER10 (architected) 0x800200 0xf000006 of:IBM,FW1060.00 (NM1060_028) hv:phyp pSeries
[...]
NIP [c000000000708f88] free_percpu+0x144/0xba8
LR [c000000000708f84] free_percpu+0x140/0xba8
Call Trace:
free_percpu+0x140/0xba8 (unreliable)
kyber_exit_hctx+0x94/0x124
blk_mq_exit_sched+0xe4/0x214
elevator_exit+0xa8/0xf4
elevator_switch+0x3b8/0x5d8
elv_update_nr_hw_queues+0x14c/0x300
blk_mq_update_nr_hw_queues+0x5cc/0x670
nullb_update_nr_hw_queues+0x118/0x1f8 [null_blk]
nullb_device_submit_queues_store+0xac/0x170 [null_blk]
configfs_write_iter+0x1dc/0x2d0
vfs_write+0x5b0/0x77c
ksys_write+0xa0/0x180
system_call_exception+0x1b0/0x4f0
system_call_vectored_common+0x15c/0x2ec
If the sbitmap depth is zero, sb->alloc_hint memory is NOT allocated, but
the pointer is not explicitly set to NULL. Later, during sbitmap_free(),
the kernel attempts to free sb->alloc_hint, which is a per cpu pointer
variable, regardless of whether it was valid, leading to a crash.
This patch ensures that sb->alloc_hint is explicitly set to NULL in
sbitmap_init_node() when the requested depth is zero. This prevents
free_percpu() from freeing sb->alloc_hint and thus avoids the observed
crash.
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
---
lib/sbitmap.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/lib/sbitmap.c b/lib/sbitmap.c
index d3412984170c..aa8b6ca76169 100644
--- a/lib/sbitmap.c
+++ b/lib/sbitmap.c
@@ -119,6 +119,7 @@ int sbitmap_init_node(struct sbitmap *sb, unsigned int depth, int shift,
if (depth == 0) {
sb->map = NULL;
+ sb->alloc_hint = NULL;
return 0;
}
--
2.50.1
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCHv3 2/2] null_blk: prevent submit and poll queues update for shared tagset
2025-07-23 13:43 [PATCHv3 0/2] fix sbitmap initialization and null_blk shared tagset behavior Nilay Shroff
2025-07-23 13:43 ` [PATCHv3 1/2] lib/sbitmap: fix kernel crash observed when sbitmap depth is zero Nilay Shroff
@ 2025-07-23 13:43 ` Nilay Shroff
2025-07-23 15:29 ` Hannes Reinecke
` (2 more replies)
1 sibling, 3 replies; 7+ messages in thread
From: Nilay Shroff @ 2025-07-23 13:43 UTC (permalink / raw)
To: linux-block
Cc: hch, dlemoal, yukuai1, hare, ming.lei, axboe, johannes.thumshirn,
gjoyce
When a user updates the number of submit or poll queues on a null_blk
device, the block layer creates new hardware queues (hctxs). However, if
the device is using a shared tagset, null_blk does not map any software
queues (ctx) to the newly created hctx (via null_map_queues()), resulting
in those hardware queues being left unused for I/O. This behavior is
misleading, as the user may expect the new queues to be functional, even
though they are effectively ignored. To avoid this confusion and potential
misconfiguration:
- Reject runtime updates to submit_queues or poll_queues via sysfs when
the device uses a shared tagset by returning -EINVAL.
- During configuration validation (prior to powering on the device), reset
submit_queues and poll_queues to the module parameters (g_submit_queues
and g_poll_queues) if the shared tagset is enabled.
This ensures consistent behavior and avoids creating unused hardware queues
(hctxs) due to ineffective runtime queue updates.
Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
---
drivers/block/null_blk/main.c | 32 ++++++++++++++++++++++----------
1 file changed, 22 insertions(+), 10 deletions(-)
diff --git a/drivers/block/null_blk/main.c b/drivers/block/null_blk/main.c
index aa163ae9b2aa..57bd9aeb9aaf 100644
--- a/drivers/block/null_blk/main.c
+++ b/drivers/block/null_blk/main.c
@@ -388,6 +388,12 @@ static int nullb_update_nr_hw_queues(struct nullb_device *dev,
if (!submit_queues)
return -EINVAL;
+ /*
+ * Cannot update queues with shared tagset.
+ */
+ if (dev->shared_tags)
+ return -EINVAL;
+
/*
* Make sure that null_init_hctx() does not access nullb->queues[] past
* the end of that array.
@@ -1884,18 +1890,24 @@ static int null_validate_conf(struct nullb_device *dev)
dev->queue_mode = NULL_Q_MQ;
}
- if (dev->use_per_node_hctx) {
- if (dev->submit_queues != nr_online_nodes)
- dev->submit_queues = nr_online_nodes;
- } else if (dev->submit_queues > nr_cpu_ids)
- dev->submit_queues = nr_cpu_ids;
- else if (dev->submit_queues == 0)
- dev->submit_queues = 1;
- dev->prev_submit_queues = dev->submit_queues;
-
- if (dev->poll_queues > g_poll_queues)
+ if (dev->shared_tags) {
+ dev->submit_queues = g_submit_queues;
dev->poll_queues = g_poll_queues;
+ } else {
+ if (dev->use_per_node_hctx) {
+ if (dev->submit_queues != nr_online_nodes)
+ dev->submit_queues = nr_online_nodes;
+ } else if (dev->submit_queues > nr_cpu_ids)
+ dev->submit_queues = nr_cpu_ids;
+ else if (dev->submit_queues == 0)
+ dev->submit_queues = 1;
+
+ if (dev->poll_queues > g_poll_queues)
+ dev->poll_queues = g_poll_queues;
+ }
+ dev->prev_submit_queues = dev->submit_queues;
dev->prev_poll_queues = dev->poll_queues;
+
dev->irqmode = min_t(unsigned int, dev->irqmode, NULL_IRQ_TIMER);
/* Do memory allocation, so set blocking */
--
2.50.1
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCHv3 2/2] null_blk: prevent submit and poll queues update for shared tagset
2025-07-23 13:43 ` [PATCHv3 2/2] null_blk: prevent submit and poll queues update for shared tagset Nilay Shroff
@ 2025-07-23 15:29 ` Hannes Reinecke
2025-07-23 23:53 ` Damien Le Moal
2025-07-24 2:21 ` Yu Kuai
2 siblings, 0 replies; 7+ messages in thread
From: Hannes Reinecke @ 2025-07-23 15:29 UTC (permalink / raw)
To: Nilay Shroff, linux-block
Cc: hch, dlemoal, yukuai1, ming.lei, axboe, johannes.thumshirn,
gjoyce
On 7/23/25 15:43, Nilay Shroff wrote:
> When a user updates the number of submit or poll queues on a null_blk
> device, the block layer creates new hardware queues (hctxs). However, if
> the device is using a shared tagset, null_blk does not map any software
> queues (ctx) to the newly created hctx (via null_map_queues()), resulting
> in those hardware queues being left unused for I/O. This behavior is
> misleading, as the user may expect the new queues to be functional, even
> though they are effectively ignored. To avoid this confusion and potential
> misconfiguration:
> - Reject runtime updates to submit_queues or poll_queues via sysfs when
> the device uses a shared tagset by returning -EINVAL.
> - During configuration validation (prior to powering on the device), reset
> submit_queues and poll_queues to the module parameters (g_submit_queues
> and g_poll_queues) if the shared tagset is enabled.
>
> This ensures consistent behavior and avoids creating unused hardware queues
> (hctxs) due to ineffective runtime queue updates.
>
> Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
> ---
> drivers/block/null_blk/main.c | 32 ++++++++++++++++++++++----------
> 1 file changed, 22 insertions(+), 10 deletions(-)
>
Reviwed-by: Hannes Reinecke <hare@suse.de>
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCHv3 2/2] null_blk: prevent submit and poll queues update for shared tagset
2025-07-23 13:43 ` [PATCHv3 2/2] null_blk: prevent submit and poll queues update for shared tagset Nilay Shroff
2025-07-23 15:29 ` Hannes Reinecke
@ 2025-07-23 23:53 ` Damien Le Moal
2025-07-24 2:21 ` Yu Kuai
2 siblings, 0 replies; 7+ messages in thread
From: Damien Le Moal @ 2025-07-23 23:53 UTC (permalink / raw)
To: Nilay Shroff, linux-block
Cc: hch, yukuai1, hare, ming.lei, axboe, johannes.thumshirn, gjoyce
On 7/23/25 22:43, Nilay Shroff wrote:
> When a user updates the number of submit or poll queues on a null_blk
> device, the block layer creates new hardware queues (hctxs). However, if
> the device is using a shared tagset, null_blk does not map any software
> queues (ctx) to the newly created hctx (via null_map_queues()), resulting
> in those hardware queues being left unused for I/O. This behavior is
> misleading, as the user may expect the new queues to be functional, even
> though they are effectively ignored. To avoid this confusion and potential
> misconfiguration:
> - Reject runtime updates to submit_queues or poll_queues via sysfs when
> the device uses a shared tagset by returning -EINVAL.
> - During configuration validation (prior to powering on the device), reset
> submit_queues and poll_queues to the module parameters (g_submit_queues
> and g_poll_queues) if the shared tagset is enabled.
>
> This ensures consistent behavior and avoids creating unused hardware queues
> (hctxs) due to ineffective runtime queue updates.
>
> Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
--
Damien Le Moal
Western Digital Research
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCHv3 1/2] lib/sbitmap: fix kernel crash observed when sbitmap depth is zero
2025-07-23 13:43 ` [PATCHv3 1/2] lib/sbitmap: fix kernel crash observed when sbitmap depth is zero Nilay Shroff
@ 2025-07-24 2:19 ` Yu Kuai
0 siblings, 0 replies; 7+ messages in thread
From: Yu Kuai @ 2025-07-24 2:19 UTC (permalink / raw)
To: Nilay Shroff, linux-block
Cc: hch, dlemoal, yukuai1, hare, ming.lei, axboe, johannes.thumshirn,
gjoyce, yukuai (C)
在 2025/07/23 21:43, Nilay Shroff 写道:
> We observed a kernel crash when the I/O scheduler allocates an sbitmap
> for a hardware queue (hctx) that has no associated software queues (ctx),
> and later attempts to free it. When no software queues are mapped to a
> hardware queue, the sbitmap is initialized with a depth of zero. In such
> cases, the sbitmap_init_node() function should set sb->alloc_hint to NULL.
> However, if this is not done, sb->alloc_hint may contain garbage, and
> calling sbitmap_free() will pass this invalid pointer to free_percpu(),
> resulting in a kernel crash.
>
> Example crash trace:
> ==================================================================
> Kernel attempted to read user page (28) - exploit attempt? (uid: 0)
> BUG: Kernel NULL pointer dereference on read at 0x00000028
> Faulting instruction address: 0xc000000000708f88
> Oops: Kernel access of bad area, sig: 11 [#1]
> LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
> [...]
> CPU: 5 UID: 0 PID: 5491 Comm: mk_nullb_shared Kdump: loaded Tainted: G B 6.16.0-rc5+ #294 VOLUNTARY
> Tainted: [B]=BAD_PAGE
> Hardware name: IBM,9043-MRX POWER10 (architected) 0x800200 0xf000006 of:IBM,FW1060.00 (NM1060_028) hv:phyp pSeries
> [...]
> NIP [c000000000708f88] free_percpu+0x144/0xba8
> LR [c000000000708f84] free_percpu+0x140/0xba8
> Call Trace:
> free_percpu+0x140/0xba8 (unreliable)
> kyber_exit_hctx+0x94/0x124
> blk_mq_exit_sched+0xe4/0x214
> elevator_exit+0xa8/0xf4
> elevator_switch+0x3b8/0x5d8
> elv_update_nr_hw_queues+0x14c/0x300
> blk_mq_update_nr_hw_queues+0x5cc/0x670
> nullb_update_nr_hw_queues+0x118/0x1f8 [null_blk]
> nullb_device_submit_queues_store+0xac/0x170 [null_blk]
> configfs_write_iter+0x1dc/0x2d0
> vfs_write+0x5b0/0x77c
> ksys_write+0xa0/0x180
> system_call_exception+0x1b0/0x4f0
> system_call_vectored_common+0x15c/0x2ec
>
> If the sbitmap depth is zero, sb->alloc_hint memory is NOT allocated, but
> the pointer is not explicitly set to NULL. Later, during sbitmap_free(),
> the kernel attempts to free sb->alloc_hint, which is a per cpu pointer
> variable, regardless of whether it was valid, leading to a crash.
>
> This patch ensures that sb->alloc_hint is explicitly set to NULL in
> sbitmap_init_node() when the requested depth is zero. This prevents
> free_percpu() from freeing sb->alloc_hint and thus avoids the observed
> crash.
>
> Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
> Reviewed-by: Hannes Reinecke <hare@suse.de>
> Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
> ---
> lib/sbitmap.c | 1 +
> 1 file changed, 1 insertion(+)
>
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
Thanks
> diff --git a/lib/sbitmap.c b/lib/sbitmap.c
> index d3412984170c..aa8b6ca76169 100644
> --- a/lib/sbitmap.c
> +++ b/lib/sbitmap.c
> @@ -119,6 +119,7 @@ int sbitmap_init_node(struct sbitmap *sb, unsigned int depth, int shift,
>
> if (depth == 0) {
> sb->map = NULL;
> + sb->alloc_hint = NULL;
> return 0;
> }
>
>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCHv3 2/2] null_blk: prevent submit and poll queues update for shared tagset
2025-07-23 13:43 ` [PATCHv3 2/2] null_blk: prevent submit and poll queues update for shared tagset Nilay Shroff
2025-07-23 15:29 ` Hannes Reinecke
2025-07-23 23:53 ` Damien Le Moal
@ 2025-07-24 2:21 ` Yu Kuai
2 siblings, 0 replies; 7+ messages in thread
From: Yu Kuai @ 2025-07-24 2:21 UTC (permalink / raw)
To: Nilay Shroff, linux-block
Cc: hch, dlemoal, yukuai1, hare, ming.lei, axboe, johannes.thumshirn,
gjoyce, yukuai (C)
在 2025/07/23 21:43, Nilay Shroff 写道:
> When a user updates the number of submit or poll queues on a null_blk
> device, the block layer creates new hardware queues (hctxs). However, if
> the device is using a shared tagset, null_blk does not map any software
> queues (ctx) to the newly created hctx (via null_map_queues()), resulting
> in those hardware queues being left unused for I/O. This behavior is
> misleading, as the user may expect the new queues to be functional, even
> though they are effectively ignored. To avoid this confusion and potential
> misconfiguration:
> - Reject runtime updates to submit_queues or poll_queues via sysfs when
> the device uses a shared tagset by returning -EINVAL.
> - During configuration validation (prior to powering on the device), reset
> submit_queues and poll_queues to the module parameters (g_submit_queues
> and g_poll_queues) if the shared tagset is enabled.
>
> This ensures consistent behavior and avoids creating unused hardware queues
> (hctxs) due to ineffective runtime queue updates.
>
> Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
> ---
> drivers/block/null_blk/main.c | 32 ++++++++++++++++++++++----------
> 1 file changed, 22 insertions(+), 10 deletions(-)
>
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
Thanks
> diff --git a/drivers/block/null_blk/main.c b/drivers/block/null_blk/main.c
> index aa163ae9b2aa..57bd9aeb9aaf 100644
> --- a/drivers/block/null_blk/main.c
> +++ b/drivers/block/null_blk/main.c
> @@ -388,6 +388,12 @@ static int nullb_update_nr_hw_queues(struct nullb_device *dev,
> if (!submit_queues)
> return -EINVAL;
>
> + /*
> + * Cannot update queues with shared tagset.
> + */
> + if (dev->shared_tags)
> + return -EINVAL;
> +
> /*
> * Make sure that null_init_hctx() does not access nullb->queues[] past
> * the end of that array.
> @@ -1884,18 +1890,24 @@ static int null_validate_conf(struct nullb_device *dev)
> dev->queue_mode = NULL_Q_MQ;
> }
>
> - if (dev->use_per_node_hctx) {
> - if (dev->submit_queues != nr_online_nodes)
> - dev->submit_queues = nr_online_nodes;
> - } else if (dev->submit_queues > nr_cpu_ids)
> - dev->submit_queues = nr_cpu_ids;
> - else if (dev->submit_queues == 0)
> - dev->submit_queues = 1;
> - dev->prev_submit_queues = dev->submit_queues;
> -
> - if (dev->poll_queues > g_poll_queues)
> + if (dev->shared_tags) {
> + dev->submit_queues = g_submit_queues;
> dev->poll_queues = g_poll_queues;
> + } else {
> + if (dev->use_per_node_hctx) {
> + if (dev->submit_queues != nr_online_nodes)
> + dev->submit_queues = nr_online_nodes;
> + } else if (dev->submit_queues > nr_cpu_ids)
> + dev->submit_queues = nr_cpu_ids;
> + else if (dev->submit_queues == 0)
> + dev->submit_queues = 1;
> +
> + if (dev->poll_queues > g_poll_queues)
> + dev->poll_queues = g_poll_queues;
> + }
> + dev->prev_submit_queues = dev->submit_queues;
> dev->prev_poll_queues = dev->poll_queues;
> +
> dev->irqmode = min_t(unsigned int, dev->irqmode, NULL_IRQ_TIMER);
>
> /* Do memory allocation, so set blocking */
>
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2025-07-24 2:21 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-07-23 13:43 [PATCHv3 0/2] fix sbitmap initialization and null_blk shared tagset behavior Nilay Shroff
2025-07-23 13:43 ` [PATCHv3 1/2] lib/sbitmap: fix kernel crash observed when sbitmap depth is zero Nilay Shroff
2025-07-24 2:19 ` Yu Kuai
2025-07-23 13:43 ` [PATCHv3 2/2] null_blk: prevent submit and poll queues update for shared tagset Nilay Shroff
2025-07-23 15:29 ` Hannes Reinecke
2025-07-23 23:53 ` Damien Le Moal
2025-07-24 2:21 ` Yu Kuai
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).