* [PATCH net v2 2/2] vsock/virtio: restore msg_iter on transmission failure
From: Octavian Purdila @ 2026-06-13 0:09 UTC (permalink / raw)
To: netdev
Cc: Alexander Viro, Andrew Morton, Arseniy Krasnov, David S. Miller,
Eric Dumazet, Eugenio Pérez, Jakub Kicinski, Jason Wang, kvm,
linux-block, linux-fsdevel, linux-kernel, Michael S. Tsirkin,
Paolo Abeni, Simon Horman, Stefan Hajnoczi, Stefano Garzarella,
virtualization, Xuan Zhuo, Octavian Purdila,
syzbot+28e5f3d207b14bae122a
In-Reply-To: <20260613000953.467473-1-tavip@google.com>
When transmission fails in virtio_transport_send_pkt_info, the msg_iter
might have been partially advanced. If we don't restore it, the next
attempt to send data will use an incorrect iterator state, leading to
desync and warnings like "send_pkt() returns 0, but X expected".
Specifically, this can happen in the following scenario, triggered by
the syzkaller repro:
1. A write-only VMA (PROT_WRITE only) is partially populated by a
prior TUN write that failed with -EIO but still faulted in some
pages.
2. A vsock sendmmsg call with MSG_ZEROCOPY requests transmission of a
buffer from this VMA.
3. The first packet (64KB) is sent successfully because the pages are
populated.
4. The second packet allocation fails because GUP fast pins the first page
but GUP slow fails on the next unpopulated page due to PROT_WRITE-only
permissions.
5. The iterator is advanced by the partially successful GUP (68KB total
advanced: 64KB from first packet + 4KB from second), but the send loop
breaks and only reports 64KB sent. This creates a 4KB desync.
6. The next retry starts with a non-zero iov_offset, disabling zerocopy
and falling back to copy mode.
7. In copy mode, the transmission succeeds for the next packets but
exhausts the iterator early because of the desync.
8. The final retry sees an empty iterator but zerocopy is re-enabled
(offset resets). It attempts to send the remaining bytes with zerocopy
but pins 0 pages, creating an empty packet.
9. The transport sends the empty packet, triggering the warning because
the returned bytes (header only) do not match the expected payload size.
10. The loop continues to spin, allocating ubuf_info each time, eventually
exhausting sysctl_optmem_max and returning -ENOMEM to userspace.
Restore msg_iter to its original state before the packet allocation
and transmission attempt if they fail.
Fixes: e0718bd82e27 ("vsock: enable setting SO_ZEROCOPY")
Reported-by: syzbot+28e5f3d207b14bae122a@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=28e5f3d207b14bae122a
Assisted-by: gemini:gemini-3.1-pro
Signed-off-by: Octavian Purdila <tavip@google.com>
---
net/vmw_vsock/virtio_transport_common.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index b10666937c490..2baa5a6ebd750 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -295,6 +295,7 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
u32 max_skb_len = VIRTIO_VSOCK_MAX_PKT_BUF_SIZE;
u32 src_cid, src_port, dst_cid, dst_port;
const struct virtio_transport *t_ops;
+ struct iov_iter_state msg_iter_state;
struct virtio_vsock_sock *vvs;
struct ubuf_info *uarg = NULL;
u32 pkt_len = info->pkt_len;
@@ -368,8 +369,17 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
struct sk_buff *skb;
size_t skb_len;
+ /* Save iterator state in case allocation or transmission fails
+ * so we can restore it and retry.
+ */
+ if (info->msg)
+ iov_iter_save_state(&info->msg->msg_iter, &msg_iter_state);
+
skb_len = min(max_skb_len, rest_len);
+ /* Note: virtio_transport_alloc_skb() can advance info->msg->msg_iter
+ * even if it fails (e.g. partial GUP success).
+ */
skb = virtio_transport_alloc_skb(info, skb_len, can_zcopy,
uarg,
src_cid, src_port,
@@ -399,6 +409,9 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
break;
} while (rest_len);
+ if (info->msg && ret < 0)
+ iov_iter_restore(&info->msg->msg_iter, &msg_iter_state);
+
virtio_transport_put_credit(vvs, rest_len);
/* msg_zerocopy_realloc() initializes the ubuf_info refcnt to 1.
--
2.54.0.1136.gdb2ca164c4-goog
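
The desync in steps 3 to 5 of the commit message, and the effect of restoring the iterator, can be modeled in userspace. This is only a sketch under stated assumptions, not kernel code: `struct toy_iter`, `toy_save()`, `toy_restore()`, `toy_alloc()` and `toy_send()` are hypothetical stand-ins for `struct iov_iter`, `iov_iter_save_state()`, `iov_iter_restore()`, `virtio_transport_alloc_skb()` and the send loop, and the unpopulated page is simulated with a fixed byte limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct iov_iter: a cursor and a remaining length. */
struct toy_iter {
	size_t offset;	/* bytes already consumed */
	size_t count;	/* bytes remaining */
};

/* Hypothetical stand-in for struct iov_iter_state. */
struct toy_iter_state {
	size_t offset;
	size_t count;
};

static void toy_save(const struct toy_iter *it, struct toy_iter_state *st)
{
	st->offset = it->offset;
	st->count = it->count;
}

static void toy_restore(struct toy_iter *it, const struct toy_iter_state *st)
{
	it->offset = st->offset;
	it->count = st->count;
}

/*
 * Models an allocation that pins one page at a time. Pages at or beyond
 * `populated` bytes cannot be pinned, so the call may consume part of the
 * request and then fail, leaving the iterator partially advanced.
 */
static int toy_alloc(struct toy_iter *it, size_t len, size_t page,
		     size_t populated)
{
	size_t done = 0;

	assert(page > 0);
	while (done < len) {
		if (it->offset >= populated)
			return -1;	/* models the failed pin */
		it->offset += page;
		it->count -= page;
		done += page;
	}
	return 0;
}

/*
 * Sends up to `total` bytes in packets of at most `pkt` bytes and returns
 * the number of bytes sent. With `restore` set, a failed packet rolls the
 * iterator back to where that packet started.
 */
static size_t toy_send(struct toy_iter *it, size_t total, size_t pkt,
		       size_t page, size_t populated, bool restore)
{
	size_t sent = 0;

	while (sent < total) {
		struct toy_iter_state st;
		size_t len = total - sent < pkt ? total - sent : pkt;

		toy_save(it, &st);
		if (toy_alloc(it, len, page, populated) < 0) {
			if (restore)
				toy_restore(it, &st);
			break;
		}
		sent += len;
	}
	return sent;
}
```

With 64KB packets, 4KB pages and 68KB of populated memory, the model without restore reports 64KB sent while the cursor sits at 68KB; with restore, the cursor and the reported byte count agree at 64KB.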
* [PATCH net v2 1/2] iov_iter: export iov_iter_restore
From: Octavian Purdila @ 2026-06-13 0:09 UTC (permalink / raw)
To: netdev
Cc: Alexander Viro, Andrew Morton, Arseniy Krasnov, David S. Miller,
Eric Dumazet, Eugenio Pérez, Jakub Kicinski, Jason Wang, kvm,
linux-block, linux-fsdevel, linux-kernel, Michael S. Tsirkin,
Paolo Abeni, Simon Horman, Stefan Hajnoczi, Stefano Garzarella,
virtualization, Xuan Zhuo, Octavian Purdila
In-Reply-To: <20260613000953.467473-1-tavip@google.com>
Export iov_iter_restore so that it can be used by modules.
This is needed by the virtio vsock transport (which can be built as a
module) to restore the msg_iter state when transmission fails.
Signed-off-by: Octavian Purdila <tavip@google.com>
---
lib/iov_iter.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 243662af1af73..067e745f9ef53 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1491,6 +1491,7 @@ void iov_iter_restore(struct iov_iter *i, struct iov_iter_state *state)
i->__iov -= state->nr_segs - i->nr_segs;
i->nr_segs = state->nr_segs;
}
+EXPORT_SYMBOL(iov_iter_restore);
/*
* Extract a list of contiguous pages from an ITER_FOLIOQ iterator. This does
--
2.54.0.1136.gdb2ca164c4-goog
* [PATCH net v2 0/2] vsock/virtio: fix msg_iter desync on transmission failure
From: Octavian Purdila @ 2026-06-13 0:09 UTC (permalink / raw)
To: netdev
Cc: Alexander Viro, Andrew Morton, Arseniy Krasnov, David S. Miller,
Eric Dumazet, Eugenio Pérez, Jakub Kicinski, Jason Wang, kvm,
linux-block, linux-fsdevel, linux-kernel, Michael S. Tsirkin,
Paolo Abeni, Simon Horman, Stefan Hajnoczi, Stefano Garzarella,
virtualization, Xuan Zhuo, Octavian Purdila
This series fixes a msg_iter desync issue in the virtio vsock transport
that can lead to warnings and eventual -ENOMEM under specific failure
scenarios (e.g. partial GUP failure during MSG_ZEROCOPY transmission).
To fix this, we need to restore the msg_iter state on transmission failure.
However, since virtio vsock transport can be built as a module, we first
need to export iov_iter_restore.
Patch 1 exports iov_iter_restore.
Patch 2 implements the msg_iter restoration in virtio vsock.
Changes in v2:
- Use iov_iter_save_state()/iov_iter_restore() (Stefano)
- Use a single restore point (Stefano)
- Reverse xmas tree (Stefano)
- Added comments in the code (Stefano)
v1: https://lore.kernel.org/all/20260609004809.1285028-1-tavip@google.com/
Octavian Purdila (2):
iov_iter: export iov_iter_restore
vsock/virtio: restore msg_iter on transmission failure
lib/iov_iter.c | 1 +
net/vmw_vsock/virtio_transport_common.c | 13 +++++++++++++
2 files changed, 14 insertions(+)
--
2.54.0.1136.gdb2ca164c4-goog
* [PATCH] block: check bio split for unaligned bvec
From: Keith Busch @ 2026-06-12 22:32 UTC (permalink / raw)
To: linux-block, axboe; +Cc: hch, Keith Busch, Carlos Maiolino
From: Keith Busch <kbusch@kernel.org>
Offsets and lengths need to be validated against the dma alignment. This
check was skipped for a sufficiently small bio with a single bvec, which
may allow an invalid request to be dispatched to the driver. Force the
validation for an unaligned bvec by taking the bio split path that
handles this condition.
Fixes: 7eac33186957 ("iomap: simplify direct io validity check")
Fixes: 5ff3f74e145a ("block: simplify direct io validity check")
Reported-by: Carlos Maiolino <cem@kernel.org>
Signed-off-by: Keith Busch <kbusch@kernel.org>
---
block/blk.h | 2 ++
1 file changed, 2 insertions(+)
diff --git a/block/blk.h b/block/blk.h
index 1a2d9101bba04..004048fa0c5a8 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -404,6 +404,8 @@ static inline bool bio_may_need_split(struct bio *bio,
bv = __bvec_iter_bvec(bio->bi_io_vec, bio->bi_iter);
if (bio->bi_iter.bi_size > bv->bv_len - bio->bi_iter.bi_bvec_done)
return true;
+ if ((bv->bv_offset | bv->bv_len) & lim->dma_alignment)
+ return true;
return bv->bv_len + bv->bv_offset > lim->max_fast_segment_size;
}
--
2.52.0
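
The added condition tests two values against the alignment mask with a single AND. The following userspace sketch illustrates that idiom only; `toy_is_unaligned()` is a hypothetical name, and the mask is assumed to be (alignment - 1) for a power-of-two alignment, for example 511 for 512-byte alignment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true when either the offset or the length has any bit set
 * under `align_mask`, i.e. when at least one of them is not a multiple
 * of (align_mask + 1).
 */
static bool toy_is_unaligned(uint32_t offset, uint32_t len,
			     uint32_t align_mask)
{
	/* The mask trick only works for power-of-two alignments. */
	assert(((align_mask + 1) & align_mask) == 0);

	/* OR-ing first lets one AND test both values at once. */
	return ((offset | len) & align_mask) != 0;
}
```

Because a set low bit in either operand survives the OR, the combined test is equivalent to checking the offset and the length separately.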
* Re: [PATCH 16/27] loop: Split loop_configure()
From: Bart Van Assche @ 2026-06-12 17:12 UTC (permalink / raw)
To: Haris Iqbal, Jens Axboe; +Cc: linux-block, Christoph Hellwig, Marco Elver
In-Reply-To: <667e352d-afad-412e-8f2a-ad5f5f8a737b@linux.dev>
On 6/12/26 9:16 AM, Haris Iqbal wrote:
> On 6/10/26 00:05, Bart Van Assche wrote:
>> -static int loop_configure(struct loop_device *lo, blk_mode_t mode,
>> - struct block_device *bdev,
>> - const struct loop_config *config)
>> +static int __loop_configure(struct loop_device *lo, blk_mode_t mode,
>> + struct block_device *bdev,
>> + const struct loop_config *config, struct file *file,
>> + bool *partscan)
>> {
>
> I wonder if we can add "__must_hold(&lo->lo_mutex)" to this.
> Same for the function __loop_change_fd()
I will look into this.
Thanks,
Bart.
* Re: configurable block error injection v5
From: Jens Axboe @ 2026-06-12 16:44 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Jonathan Corbet, Damien Le Moal, Hannes Reinecke, Keith Busch,
linux-block, linux-doc
In-Reply-To: <20260611140703.2401204-1-hch@lst.de>
On Thu, 11 Jun 2026 16:06:43 +0200, Christoph Hellwig wrote:
> this series adds a new configurable block error injection facility.
> We already have a few ways to inject block errors, but unfortunately most
> of them are either not very useful or hard to use, or both:
>
> - The fail_make_request failure injection point can't distinguish
> different commands, different ranges in the file and can only inject
> plain I/O errors.
> - the should_fail_bio 'dynamic' failure injection has all the same issues
> as fail_make_request
> - dm-error can only fail all commands in the table using BLK_STS_IOERR
> and requires setting up a new block device
> - dm-flakey and dm-dust allow all kinds of configurability, but still
> don't have good error selection, no good support for non-read/write
> commands and are limited to the dm table alignment requirements,
> which for zoned devices enforces setting them up for an entire zone.
> They also once again require setting up a stacked block device,
> which is really annoying in harnesses like xfstests
>
> [...]
Applied, thanks!
[1/4] block: add a macro to initialize the status table
commit: 8c8ebed16581faf3b3e97336aeca3d8226c4435f
[2/4] block: add a "tag" for block status codes
commit: ce351560b714403acfdeed86ef96675d229da837
[3/4] block: add a str_to_blk_op helper
commit: d39a63ead381c7ee93cd938ea2d759c17343b522
[4/4] block: add configurable error injection
commit: e8dcf2d142bd720c8334233ad6cfdf00f0e76b7f
Best regards,
--
Jens Axboe
* Re: [PATCH 16/27] loop: Split loop_configure()
From: Haris Iqbal @ 2026-06-12 16:16 UTC (permalink / raw)
To: Bart Van Assche, Jens Axboe; +Cc: linux-block, Christoph Hellwig, Marco Elver
In-Reply-To: <f001b652198021d586af28c4c2c27be4b1824f3d.1781042470.git.bvanassche@acm.org>
On 6/10/26 00:05, Bart Van Assche wrote:
> Prepare for adding a second __loop_configure() call.
>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
> drivers/block/loop.c | 109 ++++++++++++++++++++++---------------------
> 1 file changed, 57 insertions(+), 52 deletions(-)
>
> diff --git a/drivers/block/loop.c b/drivers/block/loop.c
> index 6fbea0af144f..80fdb0dee268 100644
> --- a/drivers/block/loop.c
> +++ b/drivers/block/loop.c
> @@ -981,61 +981,28 @@ static void loop_update_limits(struct loop_device *lo, struct queue_limits *lim,
> lim->discard_granularity = 0;
> }
>
> -static int loop_configure(struct loop_device *lo, blk_mode_t mode,
> - struct block_device *bdev,
> - const struct loop_config *config)
> +static int __loop_configure(struct loop_device *lo, blk_mode_t mode,
> + struct block_device *bdev,
> + const struct loop_config *config, struct file *file,
> + bool *partscan)
> {
I wonder if we can add "__must_hold(&lo->lo_mutex)" to this.
Same for the function __loop_change_fd()
> - struct file *file = fget(config->fd);
> struct queue_limits lim;
> - int error;
> loff_t size;
> - bool partscan;
> - bool is_loop;
> -
> - if (!file)
> - return -EBADF;
> -
> - error = loop_check_backing_file(file);
> - if (error) {
> - fput(file);
> - return error;
> - }
> -
> - is_loop = is_loop_device(file);
> -
> - /* This is safe, since we have a reference from open(). */
> - __module_get(THIS_MODULE);
> -
> - /*
> - * If we don't hold exclusive handle for the device, upgrade to it
> - * here to avoid changing device under exclusive owner.
> - */
> - if (!(mode & BLK_OPEN_EXCL)) {
> - error = bd_prepare_to_claim(bdev, loop_configure, NULL);
> - if (error)
> - goto out_putf;
> - }
> -
> - error = loop_global_lock_killable(lo, is_loop);
> - if (error)
> - goto out_bdev;
> + int error;
>
> - error = -EBUSY;
> if (lo->lo_state != Lo_unbound)
> - goto out_unlock;
> + return -EBUSY;
>
> error = loop_validate_file(file, bdev);
> if (error)
> - goto out_unlock;
> + return error;
>
> - if ((config->info.lo_flags & ~LOOP_CONFIGURE_SETTABLE_FLAGS) != 0) {
> - error = -EINVAL;
> - goto out_unlock;
> - }
> + if ((config->info.lo_flags & ~LOOP_CONFIGURE_SETTABLE_FLAGS) != 0)
> + return -EINVAL;
>
> error = loop_set_status_from_info(lo, &config->info);
> if (error)
> - goto out_unlock;
> + return error;
> lo->lo_flags = config->info.lo_flags;
>
> if (!(file->f_mode & FMODE_WRITE) || !(mode & BLK_OPEN_WRITE) ||
> @@ -1046,10 +1013,8 @@ static int loop_configure(struct loop_device *lo, blk_mode_t mode,
> lo->workqueue = alloc_workqueue("loop%d",
> WQ_UNBOUND | WQ_FREEZABLE,
> 0, lo->lo_number);
> - if (!lo->workqueue) {
> - error = -ENOMEM;
> - goto out_unlock;
> - }
> + if (!lo->workqueue)
> + return -ENOMEM;
> }
>
> /* suppress uevents while reconfiguring the device */
> @@ -1066,7 +1031,7 @@ static int loop_configure(struct loop_device *lo, blk_mode_t mode,
> /* No need to freeze the queue as the device isn't bound yet. */
> error = queue_limits_commit_update(lo->lo_queue, &lim);
> if (error)
> - goto out_unlock;
> + return error;
>
> /*
> * We might switch to direct I/O mode for the loop device, write back
> @@ -1087,14 +1052,56 @@ static int loop_configure(struct loop_device *lo, blk_mode_t mode,
> WRITE_ONCE(lo->lo_state, Lo_bound);
> if (part_shift)
> lo->lo_flags |= LO_FLAGS_PARTSCAN;
> - partscan = lo->lo_flags & LO_FLAGS_PARTSCAN;
> - if (partscan)
> + *partscan = lo->lo_flags & LO_FLAGS_PARTSCAN;
> + if (*partscan)
> clear_bit(GD_SUPPRESS_PART_SCAN, &lo->lo_disk->state);
>
> dev_set_uevent_suppress(disk_to_dev(lo->lo_disk), 0);
> kobject_uevent(&disk_to_dev(lo->lo_disk)->kobj, KOBJ_CHANGE);
>
> + return 0;
> +}
> +
> +static int loop_configure(struct loop_device *lo, blk_mode_t mode,
> + struct block_device *bdev,
> + const struct loop_config *config)
> +{
> + struct file *file = fget(config->fd);
> + int error;
> + bool partscan;
> + bool is_loop;
> +
> + if (!file)
> + return -EBADF;
> +
> + error = loop_check_backing_file(file);
> + if (error) {
> + fput(file);
> + return error;
> + }
> +
> + is_loop = is_loop_device(file);
> +
> + /* This is safe, since we have a reference from open(). */
> + __module_get(THIS_MODULE);
> +
> + /*
> + * If we don't hold exclusive handle for the device, upgrade to it
> + * here to avoid changing device under exclusive owner.
> + */
> + if (!(mode & BLK_OPEN_EXCL)) {
> + error = bd_prepare_to_claim(bdev, loop_configure, NULL);
> + if (error)
> + goto out_putf;
> + }
> +
> + error = loop_global_lock_killable(lo, is_loop);
> + if (error)
> + goto out_bdev;
> + error = __loop_configure(lo, mode, bdev, config, file, &partscan);
> loop_global_unlock(lo, is_loop);
> + if (error)
> + goto out_bdev;
> if (partscan)
> loop_reread_partitions(lo);
>
> @@ -1103,8 +1110,6 @@ static int loop_configure(struct loop_device *lo, blk_mode_t mode,
>
> return 0;
>
> -out_unlock:
> - loop_global_unlock(lo, is_loop);
> out_bdev:
> if (!(mode & BLK_OPEN_EXCL))
> bd_abort_claiming(bdev, loop_configure);
>
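
The shape of the split quoted above, and the contract that a `__must_hold()` annotation would document, can be sketched in userspace. Everything here is an assumption made for illustration: `struct toy_lock`, `struct toy_dev`, `toy_configure_locked()` and `toy_configure()` are hypothetical names, the lock is a plain flag rather than a mutex, and a runtime assert stands in for the static check.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy lock; a real driver would use a mutex. */
struct toy_lock {
	bool held;
};

static void toy_lock_acquire(struct toy_lock *l)
{
	assert(!l->held);
	l->held = true;
}

static void toy_lock_release(struct toy_lock *l)
{
	assert(l->held);
	l->held = false;
}

struct toy_dev {
	struct toy_lock lock;
	bool bound;
};

/*
 * Inner helper: the caller must hold dev->lock. Since locking is the
 * caller's job, every error path can simply return.
 */
static int toy_configure_locked(struct toy_dev *dev, bool *partscan)
{
	assert(dev->lock.held);

	if (dev->bound)
		return -1;	/* models -EBUSY */

	dev->bound = true;
	*partscan = true;
	return 0;
}

/* Outer wrapper: takes the lock, delegates, and unlocks on every path. */
static int toy_configure(struct toy_dev *dev, bool *partscan)
{
	int error;

	toy_lock_acquire(&dev->lock);
	error = toy_configure_locked(dev, partscan);
	toy_lock_release(&dev->lock);
	return error;
}
```

Keeping lock and unlock in one function is what allows the inner helper to replace its `goto out_unlock` paths with direct returns.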
* Re: [PATCH v2] block: invalidate cached plug timestamp after task switch
From: Usama Arif @ 2026-06-12 15:47 UTC (permalink / raw)
To: Peter Zijlstra
Cc: axboe, linux-block, bsegall, dietmar.eggemann, juri.lelli,
kprateek.nayak, linux-kernel, mgorman, mingo, rostedt,
vincent.guittot, vschneid, shakeel.butt, hannes, riel,
kernel-team, stable
In-Reply-To: <20260612154022.GC42921@noisy.programming.kicks-ass.net>
On 12/06/2026 16:40, Peter Zijlstra wrote:
> On Fri, Jun 12, 2026 at 11:02:58AM +0100, Usama Arif wrote:
>>
>>
>> On 12/06/2026 10:45, Peter Zijlstra wrote:
>>> On Fri, Jun 12, 2026 at 02:40:42AM -0700, Usama Arif wrote:
>>>
>>>> +static __always_inline void blk_plug_invalidate_ts(void)
>>>> {
>>>> + if (unlikely(current->flags & PF_BLOCK_TS)) {
>>>> + struct blk_plug *plug = current->plug;
>>>>
>>>> + if (plug)
>>>> + plug->cur_ktime = 0;
>>>> + current->flags &= ~PF_BLOCK_TS;
>>>> + }
>>>> }
>>>
>>> If you can guarantee PF_BLOCK_TS is only ever set when current->plug,
>>> this can be reduced further.
>>
>> Thanks for the reviews!
>>
>> The invariant holds at set time (the only set in blk_time_get_ns() is
>> gated by if (!plug)) and through the only legitimate plug clear in
>> blk_finish_plug() (which goes through __blk_flush_plug() that clears
>> PF_BLOCK_TS first).
>>
>> However, copy_process() sets p->plug = NULL for the child but doesn't
>> strip PF_BLOCK_TS from the inherited flags.
>>
>> I think the if(plug) is a good defensive check, but can also do the below
>> if you prefer?
>
> I think that's worth the extra few lines.
Ah sorry, I didn't understand you completely. By extra lines, do you mean keep
the if(plug), or get rid of it and clear the flag in fork?
I thought more about it and I think it's much nicer to get rid of the if(plug)
and clear the flag in fork.
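
The two options discussed here can be compared in a small userspace model. This is a sketch under stated assumptions rather than the kernel implementation: `struct toy_task`, `struct toy_plug`, `TOY_PF_BLOCK_TS` and the `toy_*` functions are hypothetical stand-ins for the task struct, `struct blk_plug`, `PF_BLOCK_TS`, `blk_plug_invalidate_ts()` and `copy_process()`.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PF_BLOCK_TS 0x1u	/* hypothetical flag value */

struct toy_plug {
	unsigned long long cur_ktime;
};

struct toy_task {
	unsigned int flags;
	struct toy_plug *plug;
};

/* Defensive variant: tolerates the flag being set without a plug. */
static void toy_invalidate_defensive(struct toy_task *t)
{
	if (t->flags & TOY_PF_BLOCK_TS) {
		if (t->plug)
			t->plug->cur_ktime = 0;
		t->flags &= ~TOY_PF_BLOCK_TS;
	}
}

/* Reduced variant: relies on "flag set implies plug != NULL". */
static void toy_invalidate_strict(struct toy_task *t)
{
	if (t->flags & TOY_PF_BLOCK_TS) {
		assert(t->plug != NULL);
		t->plug->cur_ktime = 0;
		t->flags &= ~TOY_PF_BLOCK_TS;
	}
}

/*
 * Models fork: the child inherits the parent's flags but starts without
 * a plug. Clearing the flag here keeps the invariant that the strict
 * variant depends on.
 */
static struct toy_task toy_fork(const struct toy_task *parent)
{
	struct toy_task child = *parent;

	child.plug = NULL;
	child.flags &= ~TOY_PF_BLOCK_TS;
	return child;
}
```

In this model the strict variant is safe on a forked child only because `toy_fork()` strips the flag; without that line the child would carry the flag with a NULL plug, which is the case the defensive check guards against.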
* Re: [PATCH v2] block: invalidate cached plug timestamp after task switch
From: Peter Zijlstra @ 2026-06-12 15:40 UTC (permalink / raw)
To: Usama Arif
Cc: axboe, linux-block, bsegall, dietmar.eggemann, juri.lelli,
kprateek.nayak, linux-kernel, mgorman, mingo, rostedt,
vincent.guittot, vschneid, shakeel.butt, hannes, riel,
kernel-team, stable
In-Reply-To: <789fd34a-d051-4f98-bd66-e3d99ec2dbb1@linux.dev>
On Fri, Jun 12, 2026 at 11:02:58AM +0100, Usama Arif wrote:
>
>
> On 12/06/2026 10:45, Peter Zijlstra wrote:
> > On Fri, Jun 12, 2026 at 02:40:42AM -0700, Usama Arif wrote:
> >
> >> +static __always_inline void blk_plug_invalidate_ts(void)
> >> {
> >> + if (unlikely(current->flags & PF_BLOCK_TS)) {
> >> + struct blk_plug *plug = current->plug;
> >>
> >> + if (plug)
> >> + plug->cur_ktime = 0;
> >> + current->flags &= ~PF_BLOCK_TS;
> >> + }
> >> }
> >
> > If you can guarantee PF_BLOCK_TS is only ever set when current->plug,
> > this can be reduced further.
>
> Thanks for the reviews!
>
> The invariant holds at set time (the only set in blk_time_get_ns() is
> gated by if (!plug)) and through the only legitimate plug clear in
> blk_finish_plug() (which goes through __blk_flush_plug() that clears
> PF_BLOCK_TS first).
>
> However, copy_process() sets p->plug = NULL for the child but doesn't
> strip PF_BLOCK_TS from the inherited flags.
>
> I think the if(plug) is a good defensive check, but can also do the below
> if you prefer?
I think that's worth the extra few lines.
* Re: [PATCH 01/27] aoe: Enable lock context analysis
From: Haris Iqbal @ 2026-06-12 14:52 UTC (permalink / raw)
To: Bart Van Assche, Jens Axboe
Cc: linux-block, Christoph Hellwig, Marco Elver, Christoph Hellwig,
Justin Sanders
In-Reply-To: <f15d4c4f4a22ab4788a40d51d88d228a649d0115.1781042470.git.bvanassche@acm.org>
On 6/10/26 00:04, Bart Van Assche wrote:
> Add a missing __must_hold() annotation. Enable lock context analysis in the
> Makefile.
>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Looks good,
Reviewed-by: Md Haris Iqbal <haris.iqbal@linux.dev>
> ---
> drivers/block/aoe/Makefile | 2 ++
> drivers/block/aoe/aoecmd.c | 1 +
> 2 files changed, 3 insertions(+)
>
> diff --git a/drivers/block/aoe/Makefile b/drivers/block/aoe/Makefile
> index b7545ce2f1b0..27bff6359a56 100644
> --- a/drivers/block/aoe/Makefile
> +++ b/drivers/block/aoe/Makefile
> @@ -3,5 +3,7 @@
> # Makefile for ATA over Ethernet
> #
>
> +CONTEXT_ANALYSIS := y
> +
> obj-$(CONFIG_ATA_OVER_ETH) += aoe.o
> aoe-y := aoeblk.o aoechr.o aoecmd.o aoedev.o aoemain.o aoenet.o
> diff --git a/drivers/block/aoe/aoecmd.c b/drivers/block/aoe/aoecmd.c
> index a4744a30a8af..54c57b9f8894 100644
> --- a/drivers/block/aoe/aoecmd.c
> +++ b/drivers/block/aoe/aoecmd.c
> @@ -1193,6 +1193,7 @@ noskb: if (buf)
> */
> static int
> ktio(int id)
> + __must_hold(&iocq[id].lock)
> {
> struct frame *f;
> struct list_head *pos;
>
* Re: configurable block error injection v5
From: Haris Iqbal @ 2026-06-12 14:10 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe
Cc: Jonathan Corbet, Damien Le Moal, Hannes Reinecke, Keith Busch,
linux-block, linux-doc
In-Reply-To: <20260611140703.2401204-1-hch@lst.de>
On 6/11/26 16:06, Christoph Hellwig wrote:
> Hi all,
>
> this series adds a new configurable block error injection facility.
> We already have a few ways to inject block errors, but unfortunately most
> of them are either not very useful or hard to use, or both:
>
> - The fail_make_request failure injection point can't distinguish
> different commands, different ranges in the file and can only inject
> plain I/O errors.
> - the should_fail_bio 'dynamic' failure injection has all the same issues
> as fail_make_request
> - dm-error can only fail all commands in the table using BLK_STS_IOERR
> and requires setting up a new block device
> - dm-flakey and dm-dust allow all kinds of configurability, but still
> don't have good error selection, no good support for non-read/write
> commands and are limited to the dm table alignment requirements,
> which for zoned devices enforces setting them up for an entire zone.
> They also once again require setting up a stacked block device,
> which is really annoying in harnesses like xfstests
>
> This series adds a new debugfs-based block layer error injection
> that allows to configure what operations and ranges the injection
> applied to, and what status to return. It also allows to configure a
> failure ratio similar to the xfs errortag injection.
>
> Changes since v4:
> - don't unlock in removeall to avoid a race between removeall and setup
> - document why we can't match 0-sized bios
>
> Changes since v3:
> - use a static branch to guard the new condition
> - split out a new header so that jump_label.h doesn't get pulled into
> blk.h
> - more checking for impossible conditions in blk_status_to_tag
> - more spelling fixes
>
> Changes since v2:
> - improve the documentation a bit
> - fix a spelling mistake in a comment
>
> Changes since v1:
> - drop the should_fail_bio removal and cleanup depending on it, as it's
> used by eBPF programs and thus a hidden UABI.
> - as a result split the code out to its own Kconfig symbol
> - various error handling fixes pointed out by Keith
> - documentation spelling fixes pointed out by Randy
>
> Diffstat:
> Documentation/block/error-injection.rst | 59 +++++
> Documentation/block/index.rst | 1
> block/Kconfig | 8
> block/Makefile | 1
> block/blk-core.c | 87 ++++++--
> block/blk-sysfs.c | 5
> block/blk.h | 3
> block/error-injection.c | 315 ++++++++++++++++++++++++++++++++
> block/error-injection.h | 21 ++
> block/genhd.c | 4
> include/linux/blkdev.h | 6
> 11 files changed, 490 insertions(+), 20 deletions(-)
Thank you for this series. It is a nice addition.
Reviewed-by: Md Haris Iqbal <haris.iqbal@linux.dev>
(for the whole series)
>
* [PATCH v3] rust: add procedural macro for declaring configfs attributes
From: Malte Wechter @ 2026-06-12 13:29 UTC (permalink / raw)
To: Andreas Hindborg, Breno Leitao, Miguel Ojeda, Boqun Feng,
Gary Guo, Björn Roy Baron, Benno Lossin, Alice Ryhl,
Trevor Gross, Danilo Krummrich, Jens Axboe, Luis Chamberlain,
Petr Pavlu, Daniel Gomez, Sami Tolvanen, Aaron Tomlin
Cc: linux-kernel, rust-for-linux, linux-block, linux-modules,
Malte Wechter
Implement `configfs_attrs!` as a procedural macro using `syn`, which
improves readability and maintainability. Remove the old macro and
replace all uses with the new macro. Add the new macro implementation
file to MAINTAINERS.
Signed-off-by: Malte Wechter <maltewechter@gmail.com>
---
Changes in v3:
- Remove 'make_static_ident' function, make names for static variables simpler
- Move 'parse_ordered_fields' macro from module.rs into helpers
- Use 'parse_ordered_fields' macro for parsing instead of doing it ad-hoc
- Link to v2: https://lore.kernel.org/r/20260603-configfs-syn-v2-1-cb58489c2647@gmail.com
Changes in v2:
- Add a try_parse helper function to macros/helpers.rs
- Fix bug where 'child' configuration gets dropped if trailing comma is missing (sashiko)
- Link to v1: https://lore.kernel.org/r/20260520-configfs-syn-v1-1-6c5b80a9cef2@gmail.com
---
MAINTAINERS | 1 +
drivers/block/rnull/configfs.rs | 2 +-
rust/kernel/configfs.rs | 251 ----------------------------------------
rust/macros/configfs_attrs.rs | 135 +++++++++++++++++++++
rust/macros/helpers.rs | 139 ++++++++++++++++++++++
rust/macros/lib.rs | 85 ++++++++++++++
rust/macros/module.rs | 137 ----------------------
samples/rust/rust_configfs.rs | 2 +-
8 files changed, 362 insertions(+), 390 deletions(-)
diff --git a/MAINTAINERS b/MAINTAINERS
index 2fb1c75afd16..45f7a1ec93b4 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -6464,6 +6464,7 @@ T: git git://git.kernel.org/pub/scm/linux/kernel/git/a.hindborg/linux.git config
F: fs/configfs/
F: include/linux/configfs.h
F: rust/kernel/configfs.rs
+F: rust/macros/configfs_attrs.rs
F: samples/configfs/
F: samples/rust/rust_configfs.rs
diff --git a/drivers/block/rnull/configfs.rs b/drivers/block/rnull/configfs.rs
index 7c2eb5c0b722..f28ec69d7984 100644
--- a/drivers/block/rnull/configfs.rs
+++ b/drivers/block/rnull/configfs.rs
@@ -4,8 +4,8 @@
use kernel::{
block::mq::gen_disk::{GenDisk, GenDiskBuilder},
configfs::{self, AttributeOperations},
- configfs_attrs,
fmt::{self, Write as _},
+ macros::configfs_attrs,
new_mutex,
page::PAGE_SIZE,
prelude::*,
diff --git a/rust/kernel/configfs.rs b/rust/kernel/configfs.rs
index 2339c6467325..7a91e36677f5 100644
--- a/rust/kernel/configfs.rs
+++ b/rust/kernel/configfs.rs
@@ -791,254 +791,3 @@ fn as_ptr(&self) -> *const bindings::config_item_type {
self.item_type.get()
}
}
-
-/// Define a list of configfs attributes statically.
-///
-/// Invoking the macro in the following manner:
-///
-/// ```ignore
-/// let item_type = configfs_attrs! {
-/// container: configfs::Subsystem<Configuration>,
-/// data: Configuration,
-/// child: Child,
-/// attributes: [
-/// message: 0,
-/// bar: 1,
-/// ],
-/// };
-/// ```
-///
-/// Expands the following output:
-///
-/// ```ignore
-/// let item_type = {
-/// static CONFIGURATION_MESSAGE_ATTR: kernel::configfs::Attribute<
-/// 0,
-/// Configuration,
-/// Configuration,
-/// > = unsafe {
-/// kernel::configfs::Attribute::new({
-/// const S: &str = "message\u{0}";
-/// const C: &kernel::str::CStr = match kernel::str::CStr::from_bytes_with_nul(
-/// S.as_bytes()
-/// ) {
-/// Ok(v) => v,
-/// Err(_) => {
-/// core::panicking::panic_fmt(core::const_format_args!(
-/// "string contains interior NUL"
-/// ));
-/// }
-/// };
-/// C
-/// })
-/// };
-///
-/// static CONFIGURATION_BAR_ATTR: kernel::configfs::Attribute<
-/// 1,
-/// Configuration,
-/// Configuration
-/// > = unsafe {
-/// kernel::configfs::Attribute::new({
-/// const S: &str = "bar\u{0}";
-/// const C: &kernel::str::CStr = match kernel::str::CStr::from_bytes_with_nul(
-/// S.as_bytes()
-/// ) {
-/// Ok(v) => v,
-/// Err(_) => {
-/// core::panicking::panic_fmt(core::const_format_args!(
-/// "string contains interior NUL"
-/// ));
-/// }
-/// };
-/// C
-/// })
-/// };
-///
-/// const N: usize = (1usize + (1usize + 0usize)) + 1usize;
-///
-/// static CONFIGURATION_ATTRS: kernel::configfs::AttributeList<N, Configuration> =
-/// unsafe { kernel::configfs::AttributeList::new() };
-///
-/// {
-/// const N: usize = 0usize;
-/// unsafe { CONFIGURATION_ATTRS.add::<N, 0, _>(&CONFIGURATION_MESSAGE_ATTR) };
-/// }
-///
-/// {
-/// const N: usize = (1usize + 0usize);
-/// unsafe { CONFIGURATION_ATTRS.add::<N, 1, _>(&CONFIGURATION_BAR_ATTR) };
-/// }
-///
-/// static CONFIGURATION_TPE:
-/// kernel::configfs::ItemType<configfs::Subsystem<Configuration> ,Configuration>
-/// = kernel::configfs::ItemType::<
-/// configfs::Subsystem<Configuration>,
-/// Configuration
-/// >::new_with_child_ctor::<N,Child>(
-/// &THIS_MODULE,
-/// &CONFIGURATION_ATTRS
-/// );
-///
-/// &CONFIGURATION_TPE
-/// }
-/// ```
-#[macro_export]
-macro_rules! configfs_attrs {
- (
- container: $container:ty,
- data: $data:ty,
- attributes: [
- $($name:ident: $attr:literal),* $(,)?
- ] $(,)?
- ) => {
- $crate::configfs_attrs!(
- count:
- @container($container),
- @data($data),
- @child(),
- @no_child(x),
- @attrs($($name $attr)*),
- @eat($($name $attr,)*),
- @assign(),
- @cnt(0usize),
- )
- };
- (
- container: $container:ty,
- data: $data:ty,
- child: $child:ty,
- attributes: [
- $($name:ident: $attr:literal),* $(,)?
- ] $(,)?
- ) => {
- $crate::configfs_attrs!(
- count:
- @container($container),
- @data($data),
- @child($child),
- @no_child(),
- @attrs($($name $attr)*),
- @eat($($name $attr,)*),
- @assign(),
- @cnt(0usize),
- )
- };
- (count:
- @container($container:ty),
- @data($data:ty),
- @child($($child:ty)?),
- @no_child($($no_child:ident)?),
- @attrs($($aname:ident $aattr:literal)*),
- @eat($name:ident $attr:literal, $($rname:ident $rattr:literal,)*),
- @assign($($assign:block)*),
- @cnt($cnt:expr),
- ) => {
- $crate::configfs_attrs!(
- count:
- @container($container),
- @data($data),
- @child($($child)?),
- @no_child($($no_child)?),
- @attrs($($aname $aattr)*),
- @eat($($rname $rattr,)*),
- @assign($($assign)* {
- const N: usize = $cnt;
- // The following macro text expands to a call to `Attribute::add`.
-
- // SAFETY: By design of this macro, the name of the variable we
- // invoke the `add` method on below, is not visible outside of
- // the macro expansion. The macro does not operate concurrently
- // on this variable, and thus we have exclusive access to the
- // variable.
- unsafe {
- $crate::macros::paste!(
- [< $data:upper _ATTRS >]
- .add::<N, $attr, _>(&[< $data:upper _ $name:upper _ATTR >])
- )
- };
- }),
- @cnt(1usize + $cnt),
- )
- };
- (count:
- @container($container:ty),
- @data($data:ty),
- @child($($child:ty)?),
- @no_child($($no_child:ident)?),
- @attrs($($aname:ident $aattr:literal)*),
- @eat(),
- @assign($($assign:block)*),
- @cnt($cnt:expr),
- ) =>
- {
- $crate::configfs_attrs!(
- final:
- @container($container),
- @data($data),
- @child($($child)?),
- @no_child($($no_child)?),
- @attrs($($aname $aattr)*),
- @assign($($assign)*),
- @cnt($cnt),
- )
- };
- (final:
- @container($container:ty),
- @data($data:ty),
- @child($($child:ty)?),
- @no_child($($no_child:ident)?),
- @attrs($($name:ident $attr:literal)*),
- @assign($($assign:block)*),
- @cnt($cnt:expr),
- ) =>
- {
- $crate::macros::paste!{
- {
- $(
- // SAFETY: We are expanding `configfs_attrs`.
- static [< $data:upper _ $name:upper _ATTR >]:
- $crate::configfs::Attribute<$attr, $data, $data> =
- unsafe {
- $crate::configfs::Attribute::new(
- $crate::c_str!(::core::stringify!($name)),
- )
- };
- )*
-
-
- // We need space for a null terminator.
- const N: usize = $cnt + 1usize;
-
- // SAFETY: We are expanding `configfs_attrs`.
- static [< $data:upper _ATTRS >]:
- $crate::configfs::AttributeList<N, $data> =
- unsafe { $crate::configfs::AttributeList::new() };
-
- $($assign)*
-
- $(
- const [<$no_child:upper>]: bool = true;
-
- static [< $data:upper _TPE >] : $crate::configfs::ItemType<$container, $data> =
- $crate::configfs::ItemType::<$container, $data>::new::<N>(
- &THIS_MODULE, &[<$ data:upper _ATTRS >]
- );
- )?
-
- $(
- static [< $data:upper _TPE >]:
- $crate::configfs::ItemType<$container, $data> =
- $crate::configfs::ItemType::<$container, $data>::
- new_with_child_ctor::<N, $child>(
- &THIS_MODULE, &[<$ data:upper _ATTRS >]
- );
- )?
-
- & [< $data:upper _TPE >]
- }
- }
- };
-
-}
-
-pub use crate::configfs_attrs;
diff --git a/rust/macros/configfs_attrs.rs b/rust/macros/configfs_attrs.rs
new file mode 100644
index 000000000000..81037bc38188
--- /dev/null
+++ b/rust/macros/configfs_attrs.rs
@@ -0,0 +1,135 @@
+// SPDX-License-Identifier: GPL-2.0
+
+use quote::{
+ format_ident,
+ quote, //
+};
+
+use syn::{
+ bracketed,
+ ext::IdentExt,
+ parse::{
+ Parse,
+ ParseStream, //
+ },
+ punctuated::Punctuated,
+ spanned::Spanned,
+ Error,
+ Ident,
+ LitInt,
+ Token,
+ Type, //
+};
+
+use crate::helpers::parse_ordered_fields;
+
+pub(crate) struct ConfigfsAttrs {
+ container: Type,
+ data: Type,
+ child: Option<Type>,
+ attributes: Vec<(Ident, LitInt)>,
+}
+
+fn parse_attribute_field(stream: ParseStream<'_>) -> syn::Result<(Ident, LitInt)> {
+ let id = stream.parse::<syn::Ident>()?;
+ let _colon = stream.parse::<Token![:]>()?;
+ let v = stream.parse::<LitInt>()?;
+ Ok((id, v))
+}
+
+fn parse_attributes(stream: ParseStream<'_>) -> syn::Result<Vec<(Ident, LitInt)>> {
+ let attr_stream;
+ let _bracket = bracketed!(attr_stream in stream);
+ let attributes = Punctuated::<(Ident, LitInt), Token![,]>::parse_terminated_with(
+ &attr_stream,
+ parse_attribute_field,
+ )?;
+ Ok(attributes.into_iter().collect::<Vec<_>>())
+}
+
+impl Parse for ConfigfsAttrs {
+ fn parse(input: ParseStream<'_>) -> syn::Result<Self> {
+ parse_ordered_fields!(
+ from input;
+ container [required] => (input.parse::<Type>())?,
+ data [required] => (input.parse::<Type>())?,
+ child => (input.parse::<Type>())?,
+ attributes [required] => parse_attributes(input)?,
+ );
+
+ Ok(ConfigfsAttrs {
+ container,
+ data,
+ child,
+ attributes,
+ })
+ }
+}
+
+pub(crate) fn configfs_attrs(cfs_attrs: ConfigfsAttrs) -> proc_macro2::TokenStream {
+ let (container_ty, data_ty) = (&cfs_attrs.container, &cfs_attrs.data);
+
+ let data_tp_ident = Ident::new("DATA_TPE", cfs_attrs.data.span());
+ let data_attr_ident = Ident::new("DATA_ATTR_LIST", cfs_attrs.data.span());
+
+ let n = cfs_attrs.attributes.len() + 1;
+
+ let attr_list = quote! {
+ static #data_attr_ident: kernel::configfs::AttributeList<#n, #data_ty> =
+ // SAFETY: We are expanding `configfs_attrs`.
+ unsafe { kernel::configfs::AttributeList::new() };
+ };
+
+ let mut attrs = Vec::new();
+ for (attr_idx, (name, id)) in cfs_attrs.attributes.iter().enumerate() {
+ let name_with_attr = format_ident!("{}_ATTR_{}", name.to_string().to_uppercase(), attr_idx);
+
+ let id: u64 = match id.base10_parse::<u64>() {
+ Ok(v) => v,
+ Err(_) => {
+ return syn::Error::new(id.span(), "Could not parse attribute ID as a u64")
+ .to_compile_error();
+ }
+ };
+
+ attrs.push(quote! {
+ static #name_with_attr: kernel::configfs::Attribute<#id, #data_ty, #data_ty> =
+ // SAFETY: We are expanding `configfs_attrs`.
+ unsafe {
+ kernel::configfs::Attribute::new(kernel::c_str!(::core::stringify!(#name)))
+ };
+
+ // SAFETY: By design of this macro, the name of the variable we
+ // invoke the `add` method on below, is not visible outside of
+ // the macro expansion. The macro does not operate concurrently
+ // on this variable, and thus we have exclusive access to the
+ // variable.
+ unsafe { #data_attr_ident.add::<#attr_idx, #id, _>(&#name_with_attr) }
+ });
+ }
+
+ let has_child_code = if let Some(child) = cfs_attrs.child {
+ quote! { new_with_child_ctor::<#n, #child>}
+ } else {
+ quote! { new::<#n> }
+ };
+
+ let data_type = quote! {
+ {
+ static #data_tp_ident:
+ kernel::configfs::ItemType<#container_ty, #data_ty> =
+ kernel::configfs::ItemType::<#container_ty, #data_ty>::#has_child_code(
+ &THIS_MODULE, &#data_attr_ident
+ );
+ &#data_tp_ident
+ }
+ };
+
+ quote! {
+ {
+ #attr_list
+ #(#attrs)*
+ #data_type
+ }
+ }
+}
diff --git a/rust/macros/helpers.rs b/rust/macros/helpers.rs
index d18fbf4daa0a..df524749631a 100644
--- a/rust/macros/helpers.rs
+++ b/rust/macros/helpers.rs
@@ -58,3 +58,142 @@ pub(crate) fn file() -> String {
pub(crate) fn gather_cfg_attrs(attr: &[Attribute]) -> impl Iterator<Item = &Attribute> + '_ {
attr.iter().filter(|a| a.path().is_ident("cfg"))
}
+
+/// Parse fields that are required to use a specific order.
+///
+/// As fields must follow a specific order, we *could* just parse fields one by one by peeking.
+/// However the error message generated when implementing that way is not very friendly.
+///
+/// So instead we parse fields in an arbitrary order, but only enforce the ordering after parsing,
+/// and if the wrong order is used, the proper order is communicated to the user with an error message.
+///
+/// Usage looks like this:
+/// ```ignore
+/// parse_ordered_fields! {
+/// from input;
+///
+/// // This will extract "foo: <field>" into a variable named "foo".
+/// // The variable will have type `Option<_>`.
+/// foo => <expression that parses the field>,
+///
+/// // If you need the variable name to be different than the key name.
+/// // This extracts "baz: <field>" into a variable named "bar".
+/// // You might want this if "baz" is a keyword.
+/// baz as bar => <expression that parses the field>,
+///
+/// // You can mark a key as required, and the variable will no longer be `Option`.
+/// // foobar will be of type `Expr` instead of `Option<Expr>`.
+/// foobar [required] => input.parse::<Expr>()?,
+/// }
+/// ```
+macro_rules! parse_ordered_fields {
+ (@gen
+ [$input:expr]
+ [$([$name:ident; $key:ident; $parser:expr])*]
+ [$([$req_name:ident; $req_key:ident])*]
+ ) => {
+ $(let mut $name = None;)*
+
+ const EXPECTED_KEYS: &[&str] = &[$(stringify!($key),)*];
+ const REQUIRED_KEYS: &[&str] = &[$(stringify!($req_key),)*];
+
+ let span = $input.span();
+ let mut seen_keys = Vec::new();
+
+ while !$input.is_empty() {
+ let key = $input.call(Ident::parse_any)?;
+
+ if seen_keys.contains(&key) {
+ Err(Error::new_spanned(
+ &key,
+ format!(r#"duplicated key "{key}". Keys can only be specified once."#),
+ ))?
+ }
+
+ $input.parse::<Token![:]>()?;
+
+ match &*key.to_string() {
+ $(
+ stringify!($key) => $name = Some($parser),
+ )*
+ _ => {
+ Err(Error::new_spanned(
+ &key,
+ format!(r#"unknown key "{key}". Valid keys are: {EXPECTED_KEYS:?}."#),
+ ))?
+ }
+ }
+
+ $input.parse::<Token![,]>()?;
+ seen_keys.push(key);
+ }
+
+ for key in REQUIRED_KEYS {
+ if !seen_keys.iter().any(|e| e == key) {
+ Err(Error::new(span, format!(r#"missing required key "{key}""#)))?
+ }
+ }
+
+ let mut ordered_keys: Vec<&str> = Vec::new();
+ for key in EXPECTED_KEYS {
+ if seen_keys.iter().any(|e| e == key) {
+ ordered_keys.push(key);
+ }
+ }
+
+ if seen_keys != ordered_keys {
+ Err(Error::new(
+ span,
+ format!(r#"keys are not ordered as expected. Order them like: {ordered_keys:?}."#),
+ ))?
+ }
+
+ $(let $req_name = $req_name.expect("required field");)*
+ };
+
+ // Handle required fields.
+ (@gen
+ [$input:expr] [$($tok:tt)*] [$($req:tt)*]
+ $key:ident as $name:ident [required] => $parser:expr,
+ $($rest:tt)*
+ ) => {
+ parse_ordered_fields!(
+ @gen [$input] [$($tok)* [$name; $key; $parser]] [$($req)* [$name; $key]] $($rest)*
+ )
+ };
+ (@gen
+ [$input:expr] [$($tok:tt)*] [$($req:tt)*]
+ $name:ident [required] => $parser:expr,
+ $($rest:tt)*
+ ) => {
+ parse_ordered_fields!(
+ @gen [$input] [$($tok)* [$name; $name; $parser]] [$($req)* [$name; $name]] $($rest)*
+ )
+ };
+
+ // Handle optional fields.
+ (@gen
+ [$input:expr] [$($tok:tt)*] [$($req:tt)*]
+ $key:ident as $name:ident => $parser:expr,
+ $($rest:tt)*
+ ) => {
+ parse_ordered_fields!(
+ @gen [$input] [$($tok)* [$name; $key; $parser]] [$($req)*] $($rest)*
+ )
+ };
+ (@gen
+ [$input:expr] [$($tok:tt)*] [$($req:tt)*]
+ $name:ident => $parser:expr,
+ $($rest:tt)*
+ ) => {
+ parse_ordered_fields!(
+ @gen [$input] [$($tok)* [$name; $name; $parser]] [$($req)*] $($rest)*
+ )
+ };
+
+ (from $input:expr; $($tok:tt)*) => {
+ parse_ordered_fields!(@gen [$input] [] [] $($tok)*)
+ }
+}
+
+pub(crate) use parse_ordered_fields;
diff --git a/rust/macros/lib.rs b/rust/macros/lib.rs
index 2cfd59e0f9e7..be04d94d0bc5 100644
--- a/rust/macros/lib.rs
+++ b/rust/macros/lib.rs
@@ -15,6 +15,8 @@
#![cfg_attr(not(CONFIG_RUSTC_HAS_SPAN_FILE), feature(proc_macro_span))]
mod concat_idents;
+#[cfg(CONFIG_CONFIGFS_FS)]
+mod configfs_attrs;
mod export;
mod fmt;
mod helpers;
@@ -489,3 +491,86 @@ pub fn kunit_tests(attr: TokenStream, input: TokenStream) -> TokenStream {
.unwrap_or_else(|e| e.into_compile_error())
.into()
}
+
+/// Define a list of configfs attributes statically.
+///
+/// # Examples
+///
+/// ```ignore
+/// let item_type = configfs_attrs! {
+/// container: configfs::Subsystem<Configuration>,
+/// data: Configuration,
+/// child: Child,
+/// attributes: [
+/// message: 0,
+/// bar: 1,
+/// ],
+/// };
+/// ```
+///
+/// Expands to the following output:
+/// let item_type = {
+/// static DATA_ATTR_LIST: kernel::configfs::AttributeList<
+/// 3usize,
+/// Configuration,
+/// > = unsafe { kernel::configfs::AttributeList::new() };
+/// static MESSAGE_ATTR_0: kernel::configfs::Attribute<
+/// 0u64,
+/// Configuration,
+/// Configuration,
+/// > = unsafe {
+/// kernel::configfs::Attribute::new({
+/// const S: &str = "message\u{0}";
+/// const C: &kernel::str::CStr = match kernel::str::CStr::from_bytes_with_nul(
+/// S.as_bytes(),
+/// ) {
+/// Ok(v) => v,
+/// Err(_) => {
+/// ::core::panicking::panic_fmt(
+/// format_args!("string contains interior NUL"),
+/// );
+/// }
+/// };
+/// C
+/// })
+/// };
+/// unsafe { DATA_ATTR_LIST.add::<0usize, 0u64, _>(&MESSAGE_ATTR_0) }
+/// static BAR_ATTR_1: kernel::configfs::Attribute<
+/// 1u64,
+/// Configuration,
+/// Configuration,
+/// > = unsafe {
+/// kernel::configfs::Attribute::new({
+/// const S: &str = "bar\u{0}";
+/// const C: &kernel::str::CStr = match kernel::str::CStr::from_bytes_with_nul(
+/// S.as_bytes(),
+/// ) {
+/// Ok(v) => v,
+/// Err(_) => {
+/// ::core::panicking::panic_fmt(
+/// format_args!("string contains interior NUL"),
+/// );
+/// }
+/// };
+/// C
+/// })
+/// };
+/// unsafe { DATA_ATTR_LIST.add::<1usize, 1u64, _>(&BAR_ATTR_1) }
+/// {
+/// static DATA_TPE: kernel::configfs::ItemType<
+/// Subsystem<Configuration>,
+/// Configuration,
+/// > = kernel::configfs::ItemType::<
+/// Subsystem<Configuration>,
+/// Configuration,
+/// >::new_with_child_ctor::<3usize, Child>(&THIS_MODULE, &DATA_ATTR_LIST);
+/// &DATA_TPE
+/// }
+/// };
+///
+#[cfg(CONFIG_CONFIGFS_FS)]
+#[proc_macro]
+pub fn configfs_attrs(input: TokenStream) -> TokenStream {
+ configfs_attrs::configfs_attrs(parse_macro_input!(input as configfs_attrs::ConfigfsAttrs))
+ .into()
+}
diff --git a/rust/macros/module.rs b/rust/macros/module.rs
index 06c18e207508..7ff6ad09b1a2 100644
--- a/rust/macros/module.rs
+++ b/rust/macros/module.rs
@@ -196,143 +196,6 @@ fn param_ops_path(param_type: &str) -> Path {
}
}
-/// Parse fields that are required to use a specific order.
-///
-/// As fields must follow a specific order, we *could* just parse fields one by one by peeking.
-/// However the error message generated when implementing that way is not very friendly.
-///
-/// So instead we parse fields in an arbitrary order, but only enforce the ordering after parsing,
-/// and if the wrong order is used, the proper order is communicated to the user with error message.
-///
-/// Usage looks like this:
-/// ```ignore
-/// parse_ordered_fields! {
-/// from input;
-///
-/// // This will extract "foo: <field>" into a variable named "foo".
-/// // The variable will have type `Option<_>`.
-/// foo => <expression that parses the field>,
-///
-/// // If you need the variable name to be different than the key name.
-/// // This extracts "baz: <field>" into a variable named "bar".
-/// // You might want this if "baz" is a keyword.
-/// baz as bar => <expression that parse the field>,
-///
-/// // You can mark a key as required, and the variable will no longer be `Option`.
-/// // foobar will be of type `Expr` instead of `Option<Expr>`.
-/// foobar [required] => input.parse::<Expr>()?,
-/// }
-/// ```
-macro_rules! parse_ordered_fields {
- (@gen
- [$input:expr]
- [$([$name:ident; $key:ident; $parser:expr])*]
- [$([$req_name:ident; $req_key:ident])*]
- ) => {
- $(let mut $name = None;)*
-
- const EXPECTED_KEYS: &[&str] = &[$(stringify!($key),)*];
- const REQUIRED_KEYS: &[&str] = &[$(stringify!($req_key),)*];
-
- let span = $input.span();
- let mut seen_keys = Vec::new();
-
- while !$input.is_empty() {
- let key = $input.call(Ident::parse_any)?;
-
- if seen_keys.contains(&key) {
- Err(Error::new_spanned(
- &key,
- format!(r#"duplicated key "{key}". Keys can only be specified once."#),
- ))?
- }
-
- $input.parse::<Token![:]>()?;
-
- match &*key.to_string() {
- $(
- stringify!($key) => $name = Some($parser),
- )*
- _ => {
- Err(Error::new_spanned(
- &key,
- format!(r#"unknown key "{key}". Valid keys are: {EXPECTED_KEYS:?}."#),
- ))?
- }
- }
-
- $input.parse::<Token![,]>()?;
- seen_keys.push(key);
- }
-
- for key in REQUIRED_KEYS {
- if !seen_keys.iter().any(|e| e == key) {
- Err(Error::new(span, format!(r#"missing required key "{key}""#)))?
- }
- }
-
- let mut ordered_keys: Vec<&str> = Vec::new();
- for key in EXPECTED_KEYS {
- if seen_keys.iter().any(|e| e == key) {
- ordered_keys.push(key);
- }
- }
-
- if seen_keys != ordered_keys {
- Err(Error::new(
- span,
- format!(r#"keys are not ordered as expected. Order them like: {ordered_keys:?}."#),
- ))?
- }
-
- $(let $req_name = $req_name.expect("required field");)*
- };
-
- // Handle required fields.
- (@gen
- [$input:expr] [$($tok:tt)*] [$($req:tt)*]
- $key:ident as $name:ident [required] => $parser:expr,
- $($rest:tt)*
- ) => {
- parse_ordered_fields!(
- @gen [$input] [$($tok)* [$name; $key; $parser]] [$($req)* [$name; $key]] $($rest)*
- )
- };
- (@gen
- [$input:expr] [$($tok:tt)*] [$($req:tt)*]
- $name:ident [required] => $parser:expr,
- $($rest:tt)*
- ) => {
- parse_ordered_fields!(
- @gen [$input] [$($tok)* [$name; $name; $parser]] [$($req)* [$name; $name]] $($rest)*
- )
- };
-
- // Handle optional fields.
- (@gen
- [$input:expr] [$($tok:tt)*] [$($req:tt)*]
- $key:ident as $name:ident => $parser:expr,
- $($rest:tt)*
- ) => {
- parse_ordered_fields!(
- @gen [$input] [$($tok)* [$name; $key; $parser]] [$($req)*] $($rest)*
- )
- };
- (@gen
- [$input:expr] [$($tok:tt)*] [$($req:tt)*]
- $name:ident => $parser:expr,
- $($rest:tt)*
- ) => {
- parse_ordered_fields!(
- @gen [$input] [$($tok)* [$name; $name; $parser]] [$($req)*] $($rest)*
- )
- };
-
- (from $input:expr; $($tok:tt)*) => {
- parse_ordered_fields!(@gen [$input] [] [] $($tok)*)
- }
-}
-
struct Parameter {
name: Ident,
ptype: Ident,
diff --git a/samples/rust/rust_configfs.rs b/samples/rust/rust_configfs.rs
index a1bd9db6010d..876462f7789d 100644
--- a/samples/rust/rust_configfs.rs
+++ b/samples/rust/rust_configfs.rs
@@ -4,7 +4,7 @@
use kernel::alloc::flags;
use kernel::configfs;
-use kernel::configfs::configfs_attrs;
+use kernel::macros::configfs_attrs;
use kernel::new_mutex;
use kernel::page::PAGE_SIZE;
use kernel::prelude::*;
---
base-commit: 254f49634ee16a731174d2ae34bc50bd5f45e731
change-id: 20260417-configfs-syn-191e07130027
Best regards,
--
Malte Wechter <maltewechter@gmail.com>
^ permalink raw reply related
* Re: [PATCH] iomap: enforce DIO alignment check in iomap
From: Carlos Maiolino @ 2026-06-12 13:23 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Keith Busch, brauner, linux-block, linux-fsdevel, linux-ext4,
linux-xfs, Hannes Reinecke, Martin K. Petersen, Jens Axboe
In-Reply-To: <20260612052831.GA9010@lst.de>
On Fri, Jun 12, 2026 at 07:28:31AM +0200, Christoph Hellwig wrote:
> On Thu, Jun 11, 2026 at 05:47:07PM +0200, Carlos Maiolino wrote:
> > On Thu, Jun 11, 2026 at 03:38:33PM +0200, Christoph Hellwig wrote:
> > > On Thu, Jun 11, 2026 at 06:57:47AM -0600, Keith Busch wrote:
> > > > It's entirely possible a device supports byte aligned addresses. The
> > > > block layer just doesn't let a driver report that. So either it really
> > > > was successful because you found a bug that skips the alignment checks,
> > > > or your device silently corrupted your payload.
> >
> > I tried this on different hardware, I find it hard to say all those
> > devices were corrupting the payload.
>
> I think in the other thread we agreed that we are currently missing
> the alignment check for fast-path bios not hitting the splitting code,
> so maybe that is something you see. Additionally we're missing the
> checks for purely bio based drivers not calling the splitting helper
> at all, but I don't think that applies here.
>
> > > > Anyway, my earlier suggestion should work. Ming thinks it may go to far,
> > > > though, in not taking the optimization when it was possible. So here's
> > > > an alternative suggestion that should get things working as expected:
> > >
> > > The fix below looks like it is addressing a real bug. I'm not sure if
> > > Carlos is hitting it, but we were missing the alignment checks for
> > > single-bvec fast path bios so far indeed.
> >
> > You left context out so I'm assuming by the fix you meant Keith's patch.
>
> Yes.
The fix does indeed seem to fix the behavior I'm seeing. Keith, could you Cc
me if you end up sending an official version?
^ permalink raw reply
* [PATCH v5 9/9] arm64: dts: qcom: arduino-imola: Describe NVMEM layout for WiFi/BT addresses
From: Loic Poulain @ 2026-06-12 13:21 UTC (permalink / raw)
To: Ulf Hansson, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
Bjorn Andersson, Konrad Dybcio, Jens Axboe, Johannes Berg,
Jeff Johnson, Bartosz Golaszewski, Marcel Holtmann,
Luiz Augusto von Dentz, Balakrishna Godavarthi, Rocky Liao,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, Srinivas Kandagatla, Andrew Lunn, Heiner Kallweit,
Russell King, Saravana Kannan
Cc: linux-mmc, devicetree, linux-kernel, linux-arm-msm, linux-block,
linux-wireless, ath10k, linux-bluetooth, netdev, daniel,
Loic Poulain, Konrad Dybcio, Bartosz Golaszewski
In-Reply-To: <20260612-block-as-nvmem-v5-0-95e0b30fff90@oss.qualcomm.com>
On Arduino Uno-Q, the eMMC boot1 partition is factory provisioned
with device-specific information such as the WiFi MAC address
and the Bluetooth BD address. This partition can serve as an
alternative to additional non-volatile memory, such as a
dedicated EEPROM.
The eMMC boot partitions are typically good candidates, as they
are relatively small, read-only by default (and can be enforced
as hardware read-only), and are not affected by board reflashing
procedures, which generally target the eMMC user or GP partitions.
Describe the corresponding nvmem-layout for the WiFi and Bluetooth
addresses, and point the WiFi and Bluetooth nodes to the appropriate
NVMEM cells to retrieve them.
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Reviewed-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
Signed-off-by: Loic Poulain <loic.poulain@oss.qualcomm.com>
---
arch/arm64/boot/dts/qcom/qrb2210-arduino-imola.dts | 39 ++++++++++++++++++++++
1 file changed, 39 insertions(+)
diff --git a/arch/arm64/boot/dts/qcom/qrb2210-arduino-imola.dts b/arch/arm64/boot/dts/qcom/qrb2210-arduino-imola.dts
index bf088fa9807f040f0c8f405f9111b01790b09377..128c7a7e76b5b089044745f5d6407d6391055fc2 100644
--- a/arch/arm64/boot/dts/qcom/qrb2210-arduino-imola.dts
+++ b/arch/arm64/boot/dts/qcom/qrb2210-arduino-imola.dts
@@ -409,7 +409,40 @@ &sdhc_1 {
no-sdio;
no-sd;
+ #address-cells = <1>;
+ #size-cells = <0>;
+
status = "okay";
+
+ card@0 {
+ compatible = "mmc-card";
+ reg = <0>;
+
+ partitions-boot1 {
+ compatible = "fixed-partitions";
+
+ #address-cells = <1>;
+ #size-cells = <1>;
+
+ nvmem-layout {
+ compatible = "fixed-layout";
+ #address-cells = <1>;
+ #size-cells = <1>;
+
+ wifi_mac_addr: mac-addr@4400 {
+ compatible = "mac-base";
+ reg = <0x4400 0x6>;
+ #nvmem-cell-cells = <1>;
+ };
+
+ bd_addr: bd-addr@5400 {
+ compatible = "mac-base";
+ reg = <0x5400 0x6>;
+ #nvmem-cell-cells = <1>;
+ };
+ };
+ };
+ };
};
&spi5 {
@@ -512,6 +545,9 @@ bluetooth {
vddch0-supply = <&pm4125_l22>;
enable-gpios = <&tlmm 87 GPIO_ACTIVE_HIGH>;
max-speed = <3000000>;
+
+ nvmem-cells = <&bd_addr 0>;
+ nvmem-cell-names = "local-bd-address";
};
};
@@ -557,6 +593,9 @@ &wifi {
qcom,ath10k-calibration-variant = "ArduinoImola";
firmware-name = "qcm2290";
+ nvmem-cells = <&wifi_mac_addr 0>;
+ nvmem-cell-names = "mac-address";
+
status = "okay";
};
--
2.34.1
^ permalink raw reply related
* [PATCH v5 8/9] Bluetooth: qca: Set NVMEM BD address quirks when address is invalid
From: Loic Poulain @ 2026-06-12 13:21 UTC (permalink / raw)
To: Ulf Hansson, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
Bjorn Andersson, Konrad Dybcio, Jens Axboe, Johannes Berg,
Jeff Johnson, Bartosz Golaszewski, Marcel Holtmann,
Luiz Augusto von Dentz, Balakrishna Godavarthi, Rocky Liao,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, Srinivas Kandagatla, Andrew Lunn, Heiner Kallweit,
Russell King, Saravana Kannan
Cc: linux-mmc, devicetree, linux-kernel, linux-arm-msm, linux-block,
linux-wireless, ath10k, linux-bluetooth, netdev, daniel,
Loic Poulain, Bartosz Golaszewski
In-Reply-To: <20260612-block-as-nvmem-v5-0-95e0b30fff90@oss.qualcomm.com>
When the controller BD address is invalid (zero or default),
set the NVMEM quirks to allow retrieving the address from a
'local-bd-address' NVMEM cell. The BD address is often stored
alongside the WiFi MAC address in big-endian format, so also
set the big-endian quirk.
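
For illustration only, the byte-order conversion implied by the big-endian
quirk can be sketched in user space as below. This assumes the stored cell
holds the address most-significant byte first, while the HCI layer keeps it
least-significant byte first; bd_addr_reverse() and BD_ADDR_LEN are
hypothetical names standing in for the kernel's baswap() and bdaddr_t size,
not code from this series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BD_ADDR_LEN 6 /* a BD address is 48 bits wide */

/*
 * Hypothetical user-space stand-in for the kernel's baswap(): copy a 6-byte
 * address from src to dst with the byte order reversed. The two buffers must
 * not overlap.
 */
static void bd_addr_reverse(uint8_t dst[BD_ADDR_LEN],
			    const uint8_t src[BD_ADDR_LEN])
{
	size_t i;

	assert(dst != NULL && src != NULL);

	for (i = 0; i < BD_ADDR_LEN; i++)
		dst[i] = src[BD_ADDR_LEN - 1 - i];
}
```

With a cell containing 00:11:22:33:44:55 in storage order, the reversed
buffer starts with 0x55 and ends with 0x00.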
Reviewed-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
Signed-off-by: Loic Poulain <loic.poulain@oss.qualcomm.com>
---
drivers/bluetooth/btqca.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/bluetooth/btqca.c b/drivers/bluetooth/btqca.c
index dda76365726f0bfe0e80e05fe04859fa4f0592e1..df33eacfd29fa680f393f90215150743e6001d5b 100644
--- a/drivers/bluetooth/btqca.c
+++ b/drivers/bluetooth/btqca.c
@@ -721,8 +721,11 @@ static int qca_check_bdaddr(struct hci_dev *hdev, const struct qca_fw_config *co
}
bda = (struct hci_rp_read_bd_addr *)skb->data;
- if (!bacmp(&bda->bdaddr, &config->bdaddr))
+ if (!bacmp(&bda->bdaddr, &config->bdaddr)) {
hci_set_quirk(hdev, HCI_QUIRK_USE_BDADDR_PROPERTY);
+ hci_set_quirk(hdev, HCI_QUIRK_USE_BDADDR_NVMEM);
+ hci_set_quirk(hdev, HCI_QUIRK_BDADDR_NVMEM_BE);
+ }
kfree_skb(skb);
--
2.34.1
^ permalink raw reply related
* [PATCH v5 7/9] Bluetooth: hci_sync: Add NVMEM-backed BD address retrieval
From: Loic Poulain @ 2026-06-12 13:20 UTC (permalink / raw)
To: Ulf Hansson, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
Bjorn Andersson, Konrad Dybcio, Jens Axboe, Johannes Berg,
Jeff Johnson, Bartosz Golaszewski, Marcel Holtmann,
Luiz Augusto von Dentz, Balakrishna Godavarthi, Rocky Liao,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, Srinivas Kandagatla, Andrew Lunn, Heiner Kallweit,
Russell King, Saravana Kannan
Cc: linux-mmc, devicetree, linux-kernel, linux-arm-msm, linux-block,
linux-wireless, ath10k, linux-bluetooth, netdev, daniel,
Loic Poulain, Bartosz Golaszewski
In-Reply-To: <20260612-block-as-nvmem-v5-0-95e0b30fff90@oss.qualcomm.com>
Some devices store the Bluetooth BD address in non-volatile
memory, which can be accessed through the NVMEM framework.
Similar to Ethernet or WiFi MAC addresses, add support for
reading the BD address from a 'local-bd-address' NVMEM cell.
As with the device-tree provided BD address, add a quirk to
indicate whether a device or platform should attempt to read
the address from NVMEM when no valid in-chip address is present.
Also add a quirk to indicate if the address is stored in
big-endian byte order.
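
As a rough user-space model of the resulting fallback order (a sketch based
on the hci_dev_setup_sync() hunk below, not kernel code): the firmware
property is consulted first, and the NVMEM cell only if the public address is
still all-zero afterwards. struct bd_addr_sources and bd_addr_resolve() are
hypothetical names introduced for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BD_ADDR_LEN 6

/* Hypothetical model of the inputs consulted during setup. */
struct bd_addr_sources {
	bool use_property;            /* models HCI_QUIRK_USE_BDADDR_PROPERTY */
	bool use_nvmem;               /* models HCI_QUIRK_USE_BDADDR_NVMEM */
	const uint8_t *property_addr; /* NULL if no property is available */
	const uint8_t *nvmem_addr;    /* NULL if no usable NVMEM cell exists */
};

static bool bd_addr_is_zero(const uint8_t addr[BD_ADDR_LEN])
{
	static const uint8_t zero[BD_ADDR_LEN];

	return memcmp(addr, zero, BD_ADDR_LEN) == 0;
}

/*
 * Try each enabled source in turn, but only while public_addr is still
 * all-zero. Returns true if a non-zero address is available afterwards.
 */
static bool bd_addr_resolve(const struct bd_addr_sources *src,
			    uint8_t public_addr[BD_ADDR_LEN])
{
	assert(src != NULL && public_addr != NULL);

	if (src->use_property && bd_addr_is_zero(public_addr) &&
	    src->property_addr)
		memcpy(public_addr, src->property_addr, BD_ADDR_LEN);

	if (src->use_nvmem && bd_addr_is_zero(public_addr) && src->nvmem_addr)
		memcpy(public_addr, src->nvmem_addr, BD_ADDR_LEN);

	return !bd_addr_is_zero(public_addr);
}
```

In this model, an address obtained from the property is never overwritten by
the NVMEM cell, which matches the order of the two checks in the hunk.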
Reviewed-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
Signed-off-by: Loic Poulain <loic.poulain@oss.qualcomm.com>
---
include/net/bluetooth/hci.h | 18 ++++++++++++++++++
net/bluetooth/hci_sync.c | 39 ++++++++++++++++++++++++++++++++++++++-
2 files changed, 56 insertions(+), 1 deletion(-)
diff --git a/include/net/bluetooth/hci.h b/include/net/bluetooth/hci.h
index 572b1c620c5d653a1fe10b26c1b0ba33e8f4968f..7686466d1109253b0d75edeb5f6a99fb98ce4cc6 100644
--- a/include/net/bluetooth/hci.h
+++ b/include/net/bluetooth/hci.h
@@ -164,6 +164,24 @@ enum {
*/
HCI_QUIRK_BDADDR_PROPERTY_BROKEN,
+ /* When this quirk is set, the public Bluetooth address
+ * initially reported by HCI Read BD Address command
+ * is considered invalid. The public BD Address can be
+ * retrieved via a 'local-bd-address' NVMEM cell.
+ *
+ * This quirk can be set before hci_register_dev is called or
+ * during the hdev->setup vendor callback.
+ */
+ HCI_QUIRK_USE_BDADDR_NVMEM,
+
+ /* When this quirk is set, the Bluetooth Device Address provided by
+ * the 'local-bd-address' NVMEM cell is stored in big-endian order.
+ *
+ * This quirk can be set before hci_register_dev is called or
+ * during the hdev->setup vendor callback.
+ */
+ HCI_QUIRK_BDADDR_NVMEM_BE,
+
/* When this quirk is set, the duplicate filtering during
* scanning is based on Bluetooth devices addresses. To allow
* RSSI based updates, restart scanning if needed.
diff --git a/net/bluetooth/hci_sync.c b/net/bluetooth/hci_sync.c
index fd3aacdea512a37c22b9a2be90c89ddca4b4d99f..589ccdfa26c1281d6eb979370523fff0d7920302 100644
--- a/net/bluetooth/hci_sync.c
+++ b/net/bluetooth/hci_sync.c
@@ -7,6 +7,7 @@
*/
#include <linux/property.h>
+#include <linux/of_net.h>
#include <net/bluetooth/bluetooth.h>
#include <net/bluetooth/hci_core.h>
@@ -3588,6 +3589,37 @@ int hci_powered_update_sync(struct hci_dev *hdev)
return 0;
}
+/**
+ * hci_dev_get_bd_addr_from_nvmem - Get the Bluetooth Device Address
+ * (BD_ADDR) for a HCI device from
+ * an NVMEM cell.
+ * @hdev: The HCI device
+ *
+ * Search for a 'local-bd-address' NVMEM cell in the device firmware node.
+ *
+ * All-zero BD addresses are rejected (unprovisioned).
+ */
+static int hci_dev_get_bd_addr_from_nvmem(struct hci_dev *hdev)
+{
+ struct device_node *np = dev_of_node(hdev->dev.parent);
+ u8 ba[sizeof(bdaddr_t)];
+ int err;
+
+ if (!np)
+ return -ENODEV;
+
+ err = of_get_nvmem_eui48(np, "local-bd-address", ba);
+ if (err)
+ return err;
+
+ if (hci_test_quirk(hdev, HCI_QUIRK_BDADDR_NVMEM_BE))
+ baswap(&hdev->public_addr, (bdaddr_t *)ba);
+ else
+ bacpy(&hdev->public_addr, (bdaddr_t *)ba);
+
+ return 0;
+}
+
/**
* hci_dev_get_bd_addr_from_property - Get the Bluetooth Device Address
* (BD_ADDR) for a HCI device from
@@ -5042,12 +5074,17 @@ static int hci_dev_setup_sync(struct hci_dev *hdev)
* its setup callback.
*/
invalid_bdaddr = hci_test_quirk(hdev, HCI_QUIRK_INVALID_BDADDR) ||
- hci_test_quirk(hdev, HCI_QUIRK_USE_BDADDR_PROPERTY);
+ hci_test_quirk(hdev, HCI_QUIRK_USE_BDADDR_PROPERTY) ||
+ hci_test_quirk(hdev, HCI_QUIRK_USE_BDADDR_NVMEM);
if (!ret) {
if (hci_test_quirk(hdev, HCI_QUIRK_USE_BDADDR_PROPERTY) &&
!bacmp(&hdev->public_addr, BDADDR_ANY))
hci_dev_get_bd_addr_from_property(hdev);
+ if (hci_test_quirk(hdev, HCI_QUIRK_USE_BDADDR_NVMEM) &&
+ !bacmp(&hdev->public_addr, BDADDR_ANY))
+ hci_dev_get_bd_addr_from_nvmem(hdev);
+
if (invalid_bdaddr && bacmp(&hdev->public_addr, BDADDR_ANY) &&
hdev->set_bdaddr) {
ret = hdev->set_bdaddr(hdev, &hdev->public_addr);
--
2.34.1
^ permalink raw reply related
* [PATCH v5 6/9] net: of_net: Add of_get_nvmem_eui48() helper for EUI-48 lookup
From: Loic Poulain @ 2026-06-12 13:20 UTC (permalink / raw)
To: Ulf Hansson, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
Bjorn Andersson, Konrad Dybcio, Jens Axboe, Johannes Berg,
Jeff Johnson, Bartosz Golaszewski, Marcel Holtmann,
Luiz Augusto von Dentz, Balakrishna Godavarthi, Rocky Liao,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, Srinivas Kandagatla, Andrew Lunn, Heiner Kallweit,
Russell King, Saravana Kannan
Cc: linux-mmc, devicetree, linux-kernel, linux-arm-msm, linux-block,
linux-wireless, ath10k, linux-bluetooth, netdev, daniel,
Loic Poulain, Bartosz Golaszewski
In-Reply-To: <20260612-block-as-nvmem-v5-0-95e0b30fff90@oss.qualcomm.com>
Factor out the common NVMEM EUI-48 retrieval logic from
of_get_mac_address_nvmem() into a new of_get_nvmem_eui48() helper that
accepts the NVMEM cell name as a parameter. This allows other subsystems
(e.g. Bluetooth) to reuse the same lookup-validate-copy pattern with a
different cell name, without duplicating code.
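
The lookup-validate-copy pattern can be sketched in user space as follows,
with the NVMEM lookup replaced by a caller-supplied buffer. This is an
illustration under that assumption, not the kernel implementation:
eui48_from_cell() and mac_from_cell() are hypothetical names. The generic
step only rejects a wrong length or an all-zero value, and the MAC-specific
wrapper additionally rejects multicast addresses, which is the extra check
is_valid_ether_addr() performs.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EUI48_LEN 6

/*
 * Hypothetical generic step: accept only a 6-byte, non-zero cell and copy
 * it to the output buffer.
 */
static int eui48_from_cell(const uint8_t *cell, size_t len,
			   uint8_t out[EUI48_LEN])
{
	static const uint8_t zero[EUI48_LEN];

	assert(out != NULL);

	if (cell == NULL || len != EUI48_LEN)
		return -EINVAL;
	if (memcmp(cell, zero, EUI48_LEN) == 0)
		return -EINVAL;

	memcpy(out, cell, EUI48_LEN);
	return 0;
}

/*
 * Hypothetical MAC-specific wrapper: reuse the generic step, then reject
 * multicast addresses (least-significant bit of the first octet set).
 */
static int mac_from_cell(const uint8_t *cell, size_t len,
			 uint8_t out[EUI48_LEN])
{
	uint8_t mac[EUI48_LEN];
	int ret;

	ret = eui48_from_cell(cell, len, mac);
	if (ret)
		return ret;
	if (mac[0] & 0x01)
		return -EINVAL;

	memcpy(out, mac, EUI48_LEN);
	return 0;
}
```

Keeping the multicast check out of the generic step is what lets a
non-Ethernet user, such as a Bluetooth address lookup, share the same helper.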
of_get_mac_address_nvmem() is updated to call of_get_nvmem_eui48() with
"mac-address", preserving its existing behavior.
Reviewed-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
Signed-off-by: Loic Poulain <loic.poulain@oss.qualcomm.com>
---
include/linux/of_net.h | 7 +++++++
net/core/of_net.c | 49 +++++++++++++++++++++++++++++++++++++------------
2 files changed, 44 insertions(+), 12 deletions(-)
diff --git a/include/linux/of_net.h b/include/linux/of_net.h
index d88715a0b3a52f87af23d47791bea3baf5be5200..7854ba555d9a55f3d020a37fe00a27ae52e0e5dc 100644
--- a/include/linux/of_net.h
+++ b/include/linux/of_net.h
@@ -15,6 +15,7 @@ struct net_device;
extern int of_get_phy_mode(struct device_node *np, phy_interface_t *interface);
extern int of_get_mac_address(struct device_node *np, u8 *mac);
extern int of_get_mac_address_nvmem(struct device_node *np, u8 *mac);
+int of_get_nvmem_eui48(struct device_node *np, const char *cell_name, u8 *addr);
int of_get_ethdev_address(struct device_node *np, struct net_device *dev);
extern struct net_device *of_find_net_device_by_node(struct device_node *np);
#else
@@ -34,6 +35,12 @@ static inline int of_get_mac_address_nvmem(struct device_node *np, u8 *mac)
return -ENODEV;
}
+static inline int of_get_nvmem_eui48(struct device_node *np,
+ const char *cell_name, u8 *addr)
+{
+ return -ENODEV;
+}
+
static inline int of_get_ethdev_address(struct device_node *np, struct net_device *dev)
{
return -ENODEV;
diff --git a/net/core/of_net.c b/net/core/of_net.c
index 93ea425b9248a23f4f95a336e9cdbf0053248e32..11c1acca151266ac9287457b4050a54b08e2b5f5 100644
--- a/net/core/of_net.c
+++ b/net/core/of_net.c
@@ -61,9 +61,7 @@ static int of_get_mac_addr(struct device_node *np, const char *name, u8 *addr)
int of_get_mac_address_nvmem(struct device_node *np, u8 *addr)
{
struct platform_device *pdev = of_find_device_by_node(np);
- struct nvmem_cell *cell;
- const void *mac;
- size_t len;
+ u8 mac[ETH_ALEN] __aligned(sizeof(u16));
int ret;
/* Try lookup by device first, there might be a nvmem_cell_lookup
@@ -75,27 +73,54 @@ int of_get_mac_address_nvmem(struct device_node *np, u8 *addr)
return ret;
}
- cell = of_nvmem_cell_get(np, "mac-address");
+ ret = of_get_nvmem_eui48(np, "mac-address", mac);
+ if (ret)
+ return ret;
+
+ if (!is_valid_ether_addr(mac))
+ return -EINVAL;
+
+ ether_addr_copy(addr, mac);
+ return 0;
+}
+EXPORT_SYMBOL(of_get_mac_address_nvmem);
+
+/**
+ * of_get_nvmem_eui48 - Read a 6-byte EUI-48 address from a named NVMEM cell.
+ * @np: Device node to look up the NVMEM cell from.
+ * @cell_name: Name of the NVMEM cell (e.g. "mac-address", "local-bd-address").
+ * @addr: Output buffer for the 6-byte address.
+ *
+ * Reads the named NVMEM cell and validates that it contains a non-zero 6-byte
+ * address. Returns 0 on success, negative errno on failure.
+ */
+int of_get_nvmem_eui48(struct device_node *np, const char *cell_name, u8 *addr)
+{
+ struct nvmem_cell *cell;
+ const void *eui48;
+ size_t len;
+
+ cell = of_nvmem_cell_get(np, cell_name);
if (IS_ERR(cell))
return PTR_ERR(cell);
- mac = nvmem_cell_read(cell, &len);
+ eui48 = nvmem_cell_read(cell, &len);
nvmem_cell_put(cell);
- if (IS_ERR(mac))
- return PTR_ERR(mac);
+ if (IS_ERR(eui48))
+ return PTR_ERR(eui48);
- if (len != ETH_ALEN || !is_valid_ether_addr(mac)) {
- kfree(mac);
+ if (len != ETH_ALEN || !memchr_inv(eui48, 0, ETH_ALEN)) {
+ kfree(eui48);
return -EINVAL;
}
- memcpy(addr, mac, ETH_ALEN);
- kfree(mac);
+ memcpy(addr, eui48, ETH_ALEN);
+ kfree(eui48);
return 0;
}
-EXPORT_SYMBOL(of_get_mac_address_nvmem);
+EXPORT_SYMBOL_GPL(of_get_nvmem_eui48);
/**
* of_get_mac_address()
--
2.34.1
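Note on the two validation levels in this patch: of_get_nvmem_eui48() only requires a 6-byte cell that is not all zero, while of_get_mac_address_nvmem() additionally applies is_valid_ether_addr(), which also rejects multicast addresses. A minimal userspace sketch of that split follows. The names eui48_cell_ok(), eui48_is_unicast() and eui48_read() are hypothetical stand-ins for illustration, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EUI48_LEN 6 /* same value as the kernel's ETH_ALEN */

/* Generic check: exactly six bytes, and not all zero. */
static bool eui48_cell_ok(const uint8_t *buf, size_t len)
{
	static const uint8_t zero[EUI48_LEN];

	return len == EUI48_LEN && memcmp(buf, zero, EUI48_LEN) != 0;
}

/* Ethernet-only check: additionally reject group (multicast) addresses. */
static bool eui48_is_unicast(const uint8_t *addr)
{
	return eui48_cell_ok(addr, EUI48_LEN) && !(addr[0] & 0x01);
}

/* Copy the address out only when the generic check passes; -1 otherwise. */
static int eui48_read(const uint8_t *cell, size_t len, uint8_t *out)
{
	assert(cell && out);
	if (!eui48_cell_ok(cell, len))
		return -1;
	memcpy(out, cell, EUI48_LEN);
	return 0;
}
```

Keeping the multicast test out of the generic helper lets a non-Ethernet consumer (such as a Bluetooth BD address reader) reuse it without inheriting Ethernet-specific rules.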
* [PATCH v5 5/9] block: implement NVMEM provider
From: Loic Poulain @ 2026-06-12 13:20 UTC (permalink / raw)
To: Ulf Hansson, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
Bjorn Andersson, Konrad Dybcio, Jens Axboe, Johannes Berg,
Jeff Johnson, Bartosz Golaszewski, Marcel Holtmann,
Luiz Augusto von Dentz, Balakrishna Godavarthi, Rocky Liao,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, Srinivas Kandagatla, Andrew Lunn, Heiner Kallweit,
Russell King, Saravana Kannan
Cc: linux-mmc, devicetree, linux-kernel, linux-arm-msm, linux-block,
linux-wireless, ath10k, linux-bluetooth, netdev, daniel,
Loic Poulain
In-Reply-To: <20260612-block-as-nvmem-v5-0-95e0b30fff90@oss.qualcomm.com>
From: Daniel Golle <daniel@makrotopia.org>
On embedded devices using an eMMC it is common that one or more partitions
on the eMMC are used to store MAC addresses and Wi-Fi calibration EEPROM
data. Allow referencing the partition in device tree for the kernel and
Wi-Fi drivers accessing it via the NVMEM layer.
For now, NVMEM is registered only for the whole-disk block device, as the
OF node is currently associated only with it.
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Co-developed-by: Loic Poulain <loic.poulain@oss.qualcomm.com>
Signed-off-by: Loic Poulain <loic.poulain@oss.qualcomm.com>
---
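Note on the read path: blk_nvmem_reg_read() in this patch copies the requested range one folio at a time, so a cell that straddles a folio boundary is assembled from two copies. The loop can be modelled in userspace as below. This is a sketch under stated assumptions: a fixed 4096-byte "page" stands in for a folio, a flat byte array stands in for the block device, and model_reg_read() is a hypothetical name. The bounds check is an addition of the sketch and is not part of the function in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096u /* stand-in for a folio; real folios can be larger */

/*
 * Copy 'bytes' bytes starting at offset 'from' out of 'dev', never crossing
 * a page boundary within a single memcpy(). Returns 0 on success, -1 when
 * the request runs past the end of the modelled device.
 */
static int model_reg_read(const uint8_t *dev, size_t dev_size,
			  unsigned int from, void *val, size_t bytes)
{
	uint8_t *dst = val;
	size_t pos = from;

	assert(dev && val);
	if (pos > dev_size || bytes > dev_size - pos)
		return -1;

	while (bytes) {
		size_t page_off = pos % MODEL_PAGE_SIZE;
		size_t chunk = MODEL_PAGE_SIZE - page_off;

		if (chunk > bytes)
			chunk = bytes;
		memcpy(dst, dev + pos, chunk);
		pos += chunk;
		dst += chunk;
		bytes -= chunk;
	}
	return 0;
}
```

A 6-byte read at offset 4093, for example, takes two iterations: 3 bytes from the end of the first page and 3 bytes from the start of the second.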
block/Kconfig | 9 ++++
block/Makefile | 1 +
block/blk-nvmem.c | 109 ++++++++++++++++++++++++++++++++++++++++++++++
block/blk.h | 8 ++++
block/genhd.c | 4 ++
include/linux/blk_types.h | 3 ++
include/linux/blkdev.h | 1 +
7 files changed, 135 insertions(+)
diff --git a/block/Kconfig b/block/Kconfig
index 15027963472d7b40e27b9097a5993c457b5b3054..0b33747e16dc33473683706f75c92bdf8b648f7c 100644
--- a/block/Kconfig
+++ b/block/Kconfig
@@ -209,6 +209,15 @@ config BLK_INLINE_ENCRYPTION_FALLBACK
by falling back to the kernel crypto API when inline
encryption hardware is not present.
+config BLK_NVMEM
+ bool "Block device NVMEM provider"
+ depends on OF
+ depends on NVMEM
+ help
+ Allow block devices (or partitions) to act as NVMEM providers,
+ typically used with eMMC to store MAC addresses or Wi-Fi
+ calibration data on embedded devices.
+
source "block/partitions/Kconfig"
config BLK_PM
diff --git a/block/Makefile b/block/Makefile
index 7dce2e44276c4274c11a0a61121c83d9c43d6e0c..d7ac389e71902bc091a8800ea266190a43b3e63d 100644
--- a/block/Makefile
+++ b/block/Makefile
@@ -36,3 +36,4 @@ obj-$(CONFIG_BLK_INLINE_ENCRYPTION) += blk-crypto.o blk-crypto-profile.o \
blk-crypto-sysfs.o
obj-$(CONFIG_BLK_INLINE_ENCRYPTION_FALLBACK) += blk-crypto-fallback.o
obj-$(CONFIG_BLOCK_HOLDER_DEPRECATED) += holder.o
+obj-$(CONFIG_BLK_NVMEM) += blk-nvmem.o
diff --git a/block/blk-nvmem.c b/block/blk-nvmem.c
new file mode 100644
index 0000000000000000000000000000000000000000..c005f059d9fe56242ebaef9905673dff902b5686
--- /dev/null
+++ b/block/blk-nvmem.c
@@ -0,0 +1,109 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * block device NVMEM provider
+ *
+ * Copyright (c) 2024 Daniel Golle <daniel@makrotopia.org>
+ * Copyright (c) Qualcomm Technologies, Inc. and/or its subsidiaries.
+ *
+ * Useful on devices using a partition on an eMMC for MAC addresses or
+ * Wi-Fi calibration EEPROM data.
+ */
+
+#include <linux/file.h>
+#include <linux/nvmem-provider.h>
+#include <linux/nvmem-consumer.h>
+#include <linux/of.h>
+#include <linux/pagemap.h>
+#include <linux/property.h>
+
+#include "blk.h"
+
+static int blk_nvmem_reg_read(void *priv, unsigned int from, void *val, size_t bytes)
+{
+ blk_mode_t mode = BLK_OPEN_READ | BLK_OPEN_RESTRICT_WRITES;
+ dev_t devt = (dev_t)(uintptr_t)priv;
+ size_t bytes_left = bytes;
+ loff_t pos = from;
+ int ret = 0;
+
+ struct file *bdev_file __free(fput) = bdev_file_open_by_dev(devt, mode, priv, NULL);
+ if (IS_ERR(bdev_file))
+ return PTR_ERR(bdev_file);
+
+ while (bytes_left) {
+ pgoff_t f_index = pos >> PAGE_SHIFT;
+ struct folio *folio;
+ size_t folio_off;
+ size_t to_read;
+
+ folio = read_mapping_folio(bdev_file->f_mapping, f_index, NULL);
+ if (IS_ERR(folio)) {
+ ret = PTR_ERR(folio);
+ break;
+ }
+
+ folio_off = offset_in_folio(folio, pos);
+ to_read = min(bytes_left, folio_size(folio) - folio_off);
+ memcpy_from_folio(val, folio, folio_off, to_read);
+ pos += to_read;
+ bytes_left -= to_read;
+ val += to_read;
+ folio_put(folio);
+ }
+
+ return ret;
+}
+
+void blk_nvmem_add(struct block_device *bdev)
+{
+ struct device *dev = &bdev->bd_device;
+ struct nvmem_config config = {};
+
+ /* skip devices which do not have a device tree node */
+ if (!dev_of_node(dev))
+ return;
+
+ /* skip devices without an nvmem layout defined */
+ struct device_node *child __free(device_node) =
+ of_get_child_by_name(dev_of_node(dev), "nvmem-layout");
+ if (!child)
+ return;
+
+ /*
+ * skip block device too large to be represented as NVMEM devices,
+ * the NVMEM reg_read callback uses an unsigned int offset
+ */
+ if (bdev_nr_bytes(bdev) > UINT_MAX) {
+ dev_warn(dev, "block device too large to be an NVMEM provider\n");
+ return;
+ }
+
+ config.id = NVMEM_DEVID_NONE;
+ config.dev = dev;
+ config.name = dev_name(dev);
+ config.owner = THIS_MODULE;
+ config.priv = (void *)(uintptr_t)dev->devt;
+ config.reg_read = blk_nvmem_reg_read;
+ config.size = bdev_nr_bytes(bdev);
+ config.word_size = 1;
+ config.stride = 1;
+ config.read_only = true;
+ config.root_only = true;
+ config.ignore_wp = true;
+ config.of_node = to_of_node(dev->fwnode);
+
+ bdev->bd_nvmem = nvmem_register(&config);
+ if (IS_ERR(bdev->bd_nvmem)) {
+ dev_err_probe(dev, PTR_ERR(bdev->bd_nvmem),
+ "Failed to register NVMEM device\n");
+ bdev->bd_nvmem = NULL;
+ }
+}
+
+void blk_nvmem_del(struct block_device *bdev)
+{
+ if (bdev->bd_nvmem)
+ nvmem_unregister(bdev->bd_nvmem);
+
+ bdev->bd_nvmem = NULL;
+}
diff --git a/block/blk.h b/block/blk.h
index ec4674cdf2ead4fd259ff5fc42401f591e684ee9..cd3c7ca723391c40be56f1dd4810e641b7c8a2b3 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -757,4 +757,12 @@ static inline void blk_debugfs_unlock(struct request_queue *q,
memalloc_noio_restore(memflags);
}
+#ifdef CONFIG_BLK_NVMEM
+void blk_nvmem_add(struct block_device *bdev);
+void blk_nvmem_del(struct block_device *bdev);
+#else
+static inline void blk_nvmem_add(struct block_device *bdev) {}
+static inline void blk_nvmem_del(struct block_device *bdev) {}
+#endif
+
#endif /* BLK_INTERNAL_H */
diff --git a/block/genhd.c b/block/genhd.c
index 7d6854fd28e95ae9134309679a7c6a937f5b7db8..1b2382de6fb30c1e5f60f45c04dc03ed3bf5d5f2 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -421,6 +421,8 @@ static void add_disk_final(struct gendisk *disk)
*/
dev_set_uevent_suppress(ddev, 0);
disk_uevent(disk, KOBJ_ADD);
+
+ blk_nvmem_add(disk->part0);
}
blk_apply_bdi_limits(disk->bdi, &disk->queue->limits);
@@ -704,6 +706,8 @@ static void __del_gendisk(struct gendisk *disk)
disk_del_events(disk);
+ blk_nvmem_del(disk->part0);
+
/*
* Prevent new openers by unlinked the bdev inode.
*/
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 8808ee76e73c09e0ceaac41ba59e86fb0c4efc64..ace6f59b860d0813665b2f62a1c03a1f4be94059 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -73,6 +73,9 @@ struct block_device {
int bd_writers;
#ifdef CONFIG_SECURITY
void *bd_security;
+#endif
+#ifdef CONFIG_BLK_NVMEM
+ struct nvmem_device *bd_nvmem;
#endif
/*
* keep this out-of-line as it's both big and not needed in the fast
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 890128cdea1ce66863c5baa36f3b336ec4550807..f15d2b5bf9e4fd2368b8a70416a978e22c0d4333 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -30,6 +30,7 @@
struct module;
struct request_queue;
+struct nvmem_device;
struct elevator_queue;
struct blk_trace;
struct request;
--
2.34.1
* [PATCH v5 4/9] dt-bindings: bluetooth: qcom: Add NVMEM BD address cell
From: Loic Poulain @ 2026-06-12 13:20 UTC (permalink / raw)
To: Ulf Hansson, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
Bjorn Andersson, Konrad Dybcio, Jens Axboe, Johannes Berg,
Jeff Johnson, Bartosz Golaszewski, Marcel Holtmann,
Luiz Augusto von Dentz, Balakrishna Godavarthi, Rocky Liao,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, Srinivas Kandagatla, Andrew Lunn, Heiner Kallweit,
Russell King, Saravana Kannan
Cc: linux-mmc, devicetree, linux-kernel, linux-arm-msm, linux-block,
linux-wireless, ath10k, linux-bluetooth, netdev, daniel,
Loic Poulain, Bartosz Golaszewski
In-Reply-To: <20260612-block-as-nvmem-v5-0-95e0b30fff90@oss.qualcomm.com>
Add support for an NVMEM cell provider for "local-bd-address",
allowing the Bluetooth stack to retrieve the controller's BD address
from non-volatile storage such as an EEPROM or an eMMC partition.
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Reviewed-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
Signed-off-by: Loic Poulain <loic.poulain@oss.qualcomm.com>
---
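Note on byte order: the binding below specifies that the cell holds the address most significant byte first, whereas the Linux Bluetooth stack keeps a bdaddr_t least significant byte first. A consumer therefore has to reverse the six bytes at some point. How the hci_sync patch in this series does so is not shown here; the following is only a userspace sketch of the reversal, and bdaddr_from_be() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

#define BDADDR_LEN 6

/* Reverse a 6-byte address: big-endian cell order -> little-endian order. */
static void bdaddr_from_be(const uint8_t *be, uint8_t *le)
{
	assert(be && le && be != le); /* in-place reversal is not supported */
	for (int i = 0; i < BDADDR_LEN; i++)
		le[i] = be[BDADDR_LEN - 1 - i];
}
```

With a cell containing 00 11 22 33 44 55, the output buffer holds 55 44 33 22 11 00.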
.../devicetree/bindings/net/bluetooth/qcom,bluetooth-common.yaml | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/Documentation/devicetree/bindings/net/bluetooth/qcom,bluetooth-common.yaml b/Documentation/devicetree/bindings/net/bluetooth/qcom,bluetooth-common.yaml
index c8e9c55c1afb4c8e05ba2dae41ce2db4194b4a0f..7cb28f30c9af032082f23311f2fc89a32f266f17 100644
--- a/Documentation/devicetree/bindings/net/bluetooth/qcom,bluetooth-common.yaml
+++ b/Documentation/devicetree/bindings/net/bluetooth/qcom,bluetooth-common.yaml
@@ -22,4 +22,13 @@ properties:
description:
boot firmware is incorrectly passing the address in big-endian order
+ nvmem-cells:
+ maxItems: 1
+ description:
+ Nvmem data cell that contains a 6 byte BD address with the most
+ significant byte first (big-endian).
+
+ nvmem-cell-names:
+ const: local-bd-address
+
additionalProperties: true
--
2.34.1
* [PATCH v5 3/9] dt-bindings: net: wireless: qcom,ath10k: Document NVMEM cells
From: Loic Poulain @ 2026-06-12 13:20 UTC (permalink / raw)
To: Ulf Hansson, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
Bjorn Andersson, Konrad Dybcio, Jens Axboe, Johannes Berg,
Jeff Johnson, Bartosz Golaszewski, Marcel Holtmann,
Luiz Augusto von Dentz, Balakrishna Godavarthi, Rocky Liao,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, Srinivas Kandagatla, Andrew Lunn, Heiner Kallweit,
Russell King, Saravana Kannan
Cc: linux-mmc, devicetree, linux-kernel, linux-arm-msm, linux-block,
linux-wireless, ath10k, linux-bluetooth, netdev, daniel,
Loic Poulain, Bartosz Golaszewski, Krzysztof Kozlowski
In-Reply-To: <20260612-block-as-nvmem-v5-0-95e0b30fff90@oss.qualcomm.com>
Document the NVMEM cells supported by the ath10k driver: the
mac-address, pre-calibration data, and calibration data.
Since such data may also originate from chipset OTP or be supplied
via other device tree structures, all of these cells are optional
and can be provided independently, in any combination.
Reviewed-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Signed-off-by: Loic Poulain <loic.poulain@oss.qualcomm.com>
---
.../devicetree/bindings/net/wireless/qcom,ath10k.yaml | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/Documentation/devicetree/bindings/net/wireless/qcom,ath10k.yaml b/Documentation/devicetree/bindings/net/wireless/qcom,ath10k.yaml
index c21d66c7cd558ab792524be9afec8b79272d1c87..878c5d833a9cb073520c256c1b72d0f1489e7f4a 100644
--- a/Documentation/devicetree/bindings/net/wireless/qcom,ath10k.yaml
+++ b/Documentation/devicetree/bindings/net/wireless/qcom,ath10k.yaml
@@ -92,6 +92,22 @@ properties:
ieee80211-freq-limit: true
+ nvmem-cells:
+ minItems: 1
+ maxItems: 3
+ description:
+ References to nvmem cells for MAC address and/or calibration data.
+ Supported cell names are mac-address, calibration, and pre-calibration.
+
+ nvmem-cell-names:
+ minItems: 1
+ maxItems: 3
+ items:
+ enum:
+ - mac-address
+ - calibration
+ - pre-calibration
+
qcom,calibration-data:
$ref: /schemas/types.yaml#/definitions/uint8-array
description:
--
2.34.1
* [PATCH v5 2/9] dt-bindings: mmc: Document support for nvmem-layout
From: Loic Poulain @ 2026-06-12 13:20 UTC (permalink / raw)
To: Ulf Hansson, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
Bjorn Andersson, Konrad Dybcio, Jens Axboe, Johannes Berg,
Jeff Johnson, Bartosz Golaszewski, Marcel Holtmann,
Luiz Augusto von Dentz, Balakrishna Godavarthi, Rocky Liao,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, Srinivas Kandagatla, Andrew Lunn, Heiner Kallweit,
Russell King, Saravana Kannan
Cc: linux-mmc, devicetree, linux-kernel, linux-arm-msm, linux-block,
linux-wireless, ath10k, linux-bluetooth, netdev, daniel,
Loic Poulain, Bartosz Golaszewski
In-Reply-To: <20260612-block-as-nvmem-v5-0-95e0b30fff90@oss.qualcomm.com>
Add support for an nvmem-layout subnode under an eMMC hardware
partition. This allows the partition to be exposed as an NVMEM
provider and its internal layout to be described. For example,
an eMMC boot partition can be used to store device-specific
information such as a WiFi MAC address.
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Reviewed-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
Signed-off-by: Loic Poulain <loic.poulain@oss.qualcomm.com>
---
.../devicetree/bindings/mmc/mmc-card.yaml | 29 ++++++++++++++++++++++
1 file changed, 29 insertions(+)
diff --git a/Documentation/devicetree/bindings/mmc/mmc-card.yaml b/Documentation/devicetree/bindings/mmc/mmc-card.yaml
index a61d6c96df759102f9c1fbfd548b026a77921cae..ca907ad73095925b234b119948f94ae81e698c86 100644
--- a/Documentation/devicetree/bindings/mmc/mmc-card.yaml
+++ b/Documentation/devicetree/bindings/mmc/mmc-card.yaml
@@ -40,6 +40,9 @@ patternProperties:
contains:
const: fixed-partitions
+ nvmem-layout:
+ $ref: /schemas/nvmem/layouts/nvmem-layout.yaml
+
required:
- compatible
- reg
@@ -86,6 +89,32 @@ examples:
read-only;
};
};
+
+ partitions-boot2 {
+ compatible = "fixed-partitions";
+
+ #address-cells = <1>;
+ #size-cells = <1>;
+
+ nvmem-layout {
+ compatible = "fixed-layout";
+
+ #address-cells = <1>;
+ #size-cells = <1>;
+
+ mac-addr@4400 {
+ compatible = "mac-base";
+ reg = <0x4400 0x6>;
+ #nvmem-cell-cells = <1>;
+ };
+
+ bd-addr@5400 {
+ compatible = "mac-base";
+ reg = <0x5400 0x6>;
+ #nvmem-cell-cells = <1>;
+ };
+ };
+ };
};
};
--
2.34.1
* [PATCH v5 1/9] block: partitions: of: Skip child nodes without reg property
From: Loic Poulain @ 2026-06-12 13:20 UTC (permalink / raw)
To: Ulf Hansson, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
Bjorn Andersson, Konrad Dybcio, Jens Axboe, Johannes Berg,
Jeff Johnson, Bartosz Golaszewski, Marcel Holtmann,
Luiz Augusto von Dentz, Balakrishna Godavarthi, Rocky Liao,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, Srinivas Kandagatla, Andrew Lunn, Heiner Kallweit,
Russell King, Saravana Kannan
Cc: linux-mmc, devicetree, linux-kernel, linux-arm-msm, linux-block,
linux-wireless, ath10k, linux-bluetooth, netdev, daniel,
Loic Poulain
In-Reply-To: <20260612-block-as-nvmem-v5-0-95e0b30fff90@oss.qualcomm.com>
Child nodes of a fixed-partitions node are not necessarily partition
entries; for example, an nvmem-layout node has no reg property. The
current code passes a NULL reg pointer and an uninitialized len to the
length check, which can result in a kernel panic or a silent failure to
register any partitions.
Fix validate_of_partition() to return a skip indicator when no reg
property is present. Guard add_of_partition() with a reg property
check for the same reason.
Signed-off-by: Loic Poulain <loic.poulain@oss.qualcomm.com>
---
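Note on the new return convention: after this change validate_of_partition() has three outcomes (negative for a malformed partition, 0 for a valid one, 1 for a node to skip), and the slot counter advances only for valid partitions. A userspace sketch of the validation pass follows. It is a simplified model, not the kernel code: struct model_node, model_validate() and model_count_partitions() are hypothetical, and the real function performs further checks that are omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_node {
	bool has_reg;   /* node carries a reg property */
	bool reg_valid; /* reg length matches #address-cells + #size-cells */
};

/* <0: malformed partition, 0: valid partition, 1: not a partition, skip it */
static int model_validate(const struct model_node *n)
{
	if (!n->has_reg)
		return 1;
	return n->reg_valid ? 0 : -1;
}

/* Returns the number of partitions that would get a slot, or -1 on error. */
static int model_count_partitions(const struct model_node *nodes, size_t n)
{
	int slot = 1;

	assert(nodes || n == 0);
	for (size_t i = 0; i < n; i++) {
		int err = model_validate(&nodes[i]);

		if (err < 0)
			return -1;
		if (!err)
			slot++; /* skipped nodes do not consume a slot */
	}
	return slot - 1;
}
```

The registration pass in the patch applies the same rule through of_property_present(np, "reg"), so both passes agree on which children consume a slot.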
block/partitions/of.c | 20 +++++++++++++-------
1 file changed, 13 insertions(+), 7 deletions(-)
diff --git a/block/partitions/of.c b/block/partitions/of.c
index c22b6066109819c71568f73e8db8833d196b1cf6..534e02a9d85f62611d880af9b302d9fd49aa4d46 100644
--- a/block/partitions/of.c
+++ b/block/partitions/of.c
@@ -15,6 +15,10 @@ static int validate_of_partition(struct device_node *np, int slot)
int a_cells = of_n_addr_cells(np);
int s_cells = of_n_size_cells(np);
+ /* Skip nodes without a reg property (e.g. nvmem-layout) */
+ if (!reg)
+ return 1;
+
/* Make sure reg len match the expected addr and size cells */
if (len / sizeof(*reg) != a_cells + s_cells)
return -EINVAL;
@@ -80,14 +84,15 @@ int of_partition(struct parsed_partitions *state)
slot = 1;
/* Validate parition offset and size */
for_each_child_of_node(partitions_np, np) {
- if (validate_of_partition(np, slot)) {
+ int err = validate_of_partition(np, slot);
+
+ if (err < 0) {
of_node_put(np);
of_node_put(partitions_np);
-
return -1;
}
-
- slot++;
+ if (!err)
+ slot++;
}
slot = 1;
@@ -97,9 +102,10 @@ int of_partition(struct parsed_partitions *state)
break;
}
- add_of_partition(state, slot, np);
-
- slot++;
+ if (of_property_present(np, "reg")) {
+ add_of_partition(state, slot, np);
+ slot++;
+ }
}
seq_buf_puts(&state->pp_buf, "\n");
--
2.34.1
* [PATCH v5 0/9] Support for block device NVMEM providers
From: Loic Poulain @ 2026-06-12 13:20 UTC (permalink / raw)
To: Ulf Hansson, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
Bjorn Andersson, Konrad Dybcio, Jens Axboe, Johannes Berg,
Jeff Johnson, Bartosz Golaszewski, Marcel Holtmann,
Luiz Augusto von Dentz, Balakrishna Godavarthi, Rocky Liao,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, Srinivas Kandagatla, Andrew Lunn, Heiner Kallweit,
Russell King, Saravana Kannan
Cc: linux-mmc, devicetree, linux-kernel, linux-arm-msm, linux-block,
linux-wireless, ath10k, linux-bluetooth, netdev, daniel,
Loic Poulain, Bartosz Golaszewski, Krzysztof Kozlowski,
Konrad Dybcio
On embedded devices, it is common for factory provisioning to store
device-specific information, such as Ethernet or WiFi MAC addresses,
in a dedicated area of an eMMC partition. This avoids the need for
an additional EEPROM/OTP and leverages the persistence of eMMC.
One example is the Arduino UNO-Q, where the WiFi MAC address and the
Bluetooth Device address are stored in the eMMC Boot1 partition.
Until now, accessing this information required a custom bootloader
to read the data and inject it into the Device Tree before handing
control over to the kernel. This approach is fragile and leads to
device-specific workarounds.
Rather than adding a new NVMEM provider specifically to the eMMC
subsystem, the new support operates at the block layer, allowing any
block device to behave like other non-volatile memories such as EEPROM
or OTP.
This series builds on earlier work by Daniel Golle that enables block
devices to act as NVMEM providers:
https://lore.kernel.org/all/6061aa4201030b9bb2f8d03ef32a564fdb786ed1.1709667858.git.daniel@makrotopia.org/
It also introduces an NVMEM layout description for the Arduino UNO-Q,
allowing device-specific data stored in the eMMC Boot1 partition to
be accessed in a standard way.
WiFi and Ethernet already support retrieving MAC addresses from NVMEM.
Bluetooth requires similar support, which is also addressed.
Note that this is currently limited to MMC-backed block devices, as
only the MMC core associates a firmware node with the block device
(add_disk_fwnode). This can be easily extended in the future to
support additional block drivers.
Signed-off-by: Loic Poulain <loic.poulain@oss.qualcomm.com>
---
Changes in v5:
- Fixed ath10k binding issue + extended commit message (Krzysztof)
- Moved blk-nvmem handling to block core instead of a class_interface
This allows correct/robust integration with block device life cycle (Bartosz).
- block: partitions: of: Skip child nodes without reg property (sashiko)
- Link to v4: https://lore.kernel.org/r/20260609-block-as-nvmem-v4-0-45712e6b22c6@oss.qualcomm.com
Changes in v4:
- Fix squash issue (dts commit incorrectly squashed) (Konrad)
- Use devres for nvmem resources (Bartosz)
- use __free() destructor helper when possible (Bartosz)
- Fix value return checking for bdev_file_open_by_dev
- Link to v3: https://lore.kernel.org/r/20260608-block-as-nvmem-v3-0-82681f50aa35@oss.qualcomm.com
Changes in v3:
- Fixed missing 'fixed-partitions' compatible in partition (Rob)
- Fixed clashing nvmem cells, document calibration along mac (Sashiko)
- Remove workaround to handle dangling nvmem references after
unregistering, this is a generic nvmem framework issue handled
in Bartosz's series:
https://lore.kernel.org/all/20260429-nvmem-unbind-v3-0-2a694f95395b@oss.qualcomm.com/
- Validate mac (is_valid_ether_addr) before copying to output buffer
- Link to v2: https://lore.kernel.org/r/20260507-block-as-nvmem-v2-0-bf17edd5134e@oss.qualcomm.com
Changes in v2:
- Fix example nvmem-layout cells to use compatible = "mac-base"
- Squash WiFi MAC and Bluetooth BD address consumer patches into the nvmem layout patch
- Fix possible use-after-free in blk-nvmem: bnv (nvmem priv) linked to nvmem lifetime
- Simplify nvmem-cell-names from items: - const: to plain const:
- Factor out common NVMEM EUI-48 retrieval logic
- Reorder changes
- Link to v1: https://lore.kernel.org/r/20260428-block-as-nvmem-v1-0-6ad23e75190a@oss.qualcomm.com
---
Daniel Golle (1):
block: implement NVMEM provider
Loic Poulain (8):
block: partitions: of: Skip child nodes without reg property
dt-bindings: mmc: Document support for nvmem-layout
dt-bindings: net: wireless: qcom,ath10k: Document NVMEM cells
dt-bindings: bluetooth: qcom: Add NVMEM BD address cell
net: of_net: Add of_get_nvmem_eui48() helper for EUI-48 lookup
Bluetooth: hci_sync: Add NVMEM-backed BD address retrieval
Bluetooth: qca: Set NVMEM BD address quirks when address is invalid
arm64: dts: qcom: arduino-imola: Describe NVMEM layout for WiFi/BT addresses
.../devicetree/bindings/mmc/mmc-card.yaml | 29 ++++++
.../net/bluetooth/qcom,bluetooth-common.yaml | 9 ++
.../bindings/net/wireless/qcom,ath10k.yaml | 16 +++
arch/arm64/boot/dts/qcom/qrb2210-arduino-imola.dts | 39 ++++++++
block/Kconfig | 9 ++
block/Makefile | 1 +
block/blk-nvmem.c | 109 +++++++++++++++++++++
block/blk.h | 8 ++
block/genhd.c | 4 +
block/partitions/of.c | 20 ++--
drivers/bluetooth/btqca.c | 5 +-
include/linux/blk_types.h | 3 +
include/linux/blkdev.h | 1 +
include/linux/of_net.h | 7 ++
include/net/bluetooth/hci.h | 18 ++++
net/bluetooth/hci_sync.c | 39 +++++++-
net/core/of_net.c | 49 ++++++---
17 files changed, 345 insertions(+), 21 deletions(-)
---
base-commit: ccb7390d6cdb23b298a6e2a7028ec134dfc4db10
change-id: 20260428-block-as-nvmem-4b308e8bda9a
Best regards,
--
Loic Poulain <loic.poulain@oss.qualcomm.com>
* Re: [PATCH RFC 0/1] block: fix concurrent elevator change failure
From: Nilay Shroff @ 2026-06-12 11:45 UTC (permalink / raw)
To: Ming Lei, Shin'ichiro Kawasaki; +Cc: linux-block, Jens Axboe
In-Reply-To: <aivoHk4DE_pkKkDm@fedora>
On 6/12/26 4:36 PM, Ming Lei wrote:
> On Fri, Jun 12, 2026 at 06:47:50PM +0900, Shin'ichiro Kawasaki wrote:
>> On Jun 11, 2026 / 06:22, Ming Lei wrote:
>>> Hi Shin'ichiro,
>>
>> Hi Ming, thanks for the comments.
>>
>>>
>>> On Thu, Jun 11, 2026 at 04:41:59PM +0900, Shin'ichiro Kawasaki wrote:
>>>> I observed that the blktests test case block/005 hangs on a specific
>>>> server hardware using a specific HDD as a block device. During the test
>>>> case run, the kernel reported a KASAN null-ptr-deref (and other memory
>>>> corruption symptoms) [2]. This failure looked sporadic and hardware-
>>>> dependent.
>>>>
>>>> From the kernel message, I noticed that udev-worker wrote to the
>>>> queue/scheduler sysfs attribute to change the IO scheduler, or elevator.
>>>> The test case block/005 also wrote to the same sysfs attribute, which
>>>
>>> sysfs write is supposed to be serialized...
>>
>> I checked the sysfs write handler elv_iosched_store() in block/elevator.c.
>> I found elevator_change() call is guarded with the rw_semaphore
>> "set->update_nr_hwq_lock", but the guard is not the writer lock but the reader
>> lock. This does not serialize the sysfs writes.
>
> Please see kernfs_fop_write_iter(), in which mutex is held before calling
> ->write().
>
I think you're referring to @of->mutex here; however, of->mutex is per struct
kernfs_open_file, which is associated with an open instance of the sysfs file.
The important point is that two separate opens can have different kernfs_open_file
instances and therefore different mutexes. Thus, concurrent writes to the same sysfs
attribute from two different processes may still be possible.
>>
>> I tried the patch below to replace the reader lock with the writer lock. With
>> a quick trial, it looks working. The kernel message is no longer observed and
>> the new test case does not cause hangs. I will do further testing to confirm
>> that this change does not trigger other new lockdep WARNs. Assuming it does not
>> have such side effects, I hope this fix approach is acceptable. It doesn't add
>> the new lock, so I think it's the better.
>>
>> diff --git a/block/elevator.c b/block/elevator.c
>> index 3bcd37c2aa34..b03185a217ff 100644
>> --- a/block/elevator.c
>> +++ b/block/elevator.c
>> @@ -813,7 +813,7 @@ ssize_t elv_iosched_store(struct gendisk *disk, const char *buf,
>> * update_nr_hwq_lock -> kn->active (via del_gendisk -> kobject_del)
>> * kn->active -> update_nr_hwq_lock (via this sysfs write path)
>> */
>> - if (!down_read_trylock(&set->update_nr_hwq_lock)) {
>> + if (!down_write_trylock(&set->update_nr_hwq_lock)) {
>> ret = -EBUSY;
>> goto out;
>> }
>> @@ -824,7 +824,7 @@ ssize_t elv_iosched_store(struct gendisk *disk, const char *buf,
>> } else {
>> ret = -ENOENT;
>> }
>> - up_read(&set->update_nr_hwq_lock);
>> + up_write(&set->update_nr_hwq_lock);
>>
>> out:
>> if (ctx.type)
>>
>> [...]
>>
>>> blk_mq_sched_reg_debugfs already includes debugfs lock, so I feel the proper
>>> fix could be check & avoid the null-ptr-deref.
>>
>> Actually, null-ptr-deref is one of the failure symptoms. KASAN slab-user-after
>> free is also observed [3]. Then I'm guessing adding null checks may not be
>> enough.
>>
>>> Adding new lock should be the last straw usually, especially this one is
>>> depended by queue freeze.
>>
>> Got it, thanks.
>>
>>
>> [3] KASAN slab-use-after-free
>
> Then you need to figure out the exact slab type and check if the pointer is cleared
> during free.
>
> Anyway, there is guard already, not see reason to add new lock for covering
> it.
>
Regarding the observed failure, my understanding is that blk_mq_debugfs_register_sched()
and blk_mq_debugfs_register_sched_hctx() access q->elevator without holding q->elevator_lock.
If multiple scheduler update paths run concurrently, one path can replace and free the
elevator while another path is still using it, which would explain the observed KASAN
use-after-free and NULL pointer dereference reports.
With the proposed change, upgrading update_nr_hwq_lock from a reader lock to a writer
lock in elv_iosched_store() would serialize concurrent scheduler updates and therefore
prevent multiple elevator switch operations from running at the same time.
Another way to fix this might be to acquire q->elevator_lock in blk_mq_sched_reg_debugfs()
and thus serialize access to q->elevator in blk_mq_debugfs_register_sched() and
blk_mq_debugfs_register_sched_hctx().
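The difference between the reader and writer trylock discussed above can be illustrated with a toy, single-threaded model of the acquisition rules. This is only a sketch: struct toy_rwsem and the toy_* helpers are hypothetical and say nothing about the kernel's actual rw_semaphore implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy, single-threaded model of a reader/writer semaphore's trylock rules. */
struct toy_rwsem {
	int readers; /* number of current read holders */
	bool writer; /* true while a writer holds the lock */
};

/* Readers are admitted as long as no writer holds the lock. */
static bool toy_down_read_trylock(struct toy_rwsem *s)
{
	assert(s);
	if (s->writer)
		return false;
	s->readers++;
	return true;
}

/* A writer is admitted only when nobody else holds the lock. */
static bool toy_down_write_trylock(struct toy_rwsem *s)
{
	assert(s);
	if (s->writer || s->readers)
		return false;
	s->writer = true;
	return true;
}

static void toy_up_read(struct toy_rwsem *s)  { s->readers--; }
static void toy_up_write(struct toy_rwsem *s) { s->writer = false; }
```

In this model two callers can both pass toy_down_read_trylock() and proceed at the same time, which corresponds to two concurrent scheduler stores. With the write variant the second caller fails, which in elv_iosched_store() would surface as -EBUSY rather than as a wait.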
Thanks,
--Nilay
* Re: [PATCH RFC 0/1] block: fix concurrent elevator change failure
From: Ming Lei @ 2026-06-12 11:06 UTC (permalink / raw)
To: Shin'ichiro Kawasaki; +Cc: linux-block, Jens Axboe, Nilay Shroff
In-Reply-To: <aivMxPCd305WbBsk@shinmob>
On Fri, Jun 12, 2026 at 06:47:50PM +0900, Shin'ichiro Kawasaki wrote:
> On Jun 11, 2026 / 06:22, Ming Lei wrote:
> > Hi Shin'ichiro,
>
> Hi Ming, thanks for the comments.
>
> >
> > On Thu, Jun 11, 2026 at 04:41:59PM +0900, Shin'ichiro Kawasaki wrote:
> > > I observed that the blktests test case block/005 hangs on a specific
> > > server hardware using a specific HDD as a block device. During the test
> > > case run, the kernel reported a KASAN null-ptr-deref (and other memory
> > > corruption symptoms) [2]. This failure looked sporadic and hardware-
> > > dependent.
> > >
> > > From the kernel message, I noticed that udev-worker wrote to the
> > > queue/scheduler sysfs attribute to change the IO scheduler, or elevator.
> > > The test case block/005 also wrote to the same sysfs attribute, which
> >
> > sysfs write is supposed to be serialized...
>
> I checked the sysfs write handler elv_iosched_store() in block/elevator.c.
> I found elevator_change() call is guarded with the rw_semaphore
> "set->update_nr_hwq_lock", but the guard is not the writer lock but the reader
> lock. This does not serialize the sysfs writes.
Please see kernfs_fop_write_iter(), in which a mutex is held before calling
->write().
>
> I tried the patch below to replace the reader lock with the writer lock. With
> a quick trial, it looks working. The kernel message is no longer observed and
> the new test case does not cause hangs. I will do further testing to confirm
> that this change does not trigger other new lockdep WARNs. Assuming it does not
> have such side effects, I hope this fix approach is acceptable. It doesn't add
> the new lock, so I think it's the better.
>
> diff --git a/block/elevator.c b/block/elevator.c
> index 3bcd37c2aa34..b03185a217ff 100644
> --- a/block/elevator.c
> +++ b/block/elevator.c
> @@ -813,7 +813,7 @@ ssize_t elv_iosched_store(struct gendisk *disk, const char *buf,
> * update_nr_hwq_lock -> kn->active (via del_gendisk -> kobject_del)
> * kn->active -> update_nr_hwq_lock (via this sysfs write path)
> */
> - if (!down_read_trylock(&set->update_nr_hwq_lock)) {
> + if (!down_write_trylock(&set->update_nr_hwq_lock)) {
> ret = -EBUSY;
> goto out;
> }
> @@ -824,7 +824,7 @@ ssize_t elv_iosched_store(struct gendisk *disk, const char *buf,
> } else {
> ret = -ENOENT;
> }
> - up_read(&set->update_nr_hwq_lock);
> + up_write(&set->update_nr_hwq_lock);
>
> out:
> if (ctx.type)
>
> [...]
>
> > blk_mq_sched_reg_debugfs already includes debugfs lock, so I feel the proper
> > fix could be check & avoid the null-ptr-deref.
>
> Actually, null-ptr-deref is one of the failure symptoms. KASAN slab-user-after
> free is also observed [3]. Then I'm guessing adding null checks may not be
> enough.
>
> > Adding new lock should be the last straw usually, especially this one is
> > depended by queue freeze.
>
> Got it, thanks.
>
>
> [3] KASAN slab-use-after-free
Then you need to figure out the exact slab type and check if the pointer is cleared
during free.
Anyway, there is a guard already, so I don't see a reason to add a new lock
to cover it.
Thanks,
Ming