* Re: [PATCH] rnbd-clt: Use common error handling code in rnbd_get_iu()
From: Haris Iqbal @ 2026-06-12 9:39 UTC
To: Markus Elfring; +Cc: linux-block, Jack Wang, Jens Axboe, LKML, kernel-janitors
In-Reply-To: <c9f86f0b-331d-4cb1-b8a2-00bc1e857ec7@web.de>
On Wed, Jun 10, 2026 at 9:03 PM Markus Elfring <Markus.Elfring@web.de> wrote:
>
> From: Markus Elfring <elfring@users.sourceforge.net>
> Date: Wed, 10 Jun 2026 20:58:47 +0200
>
> Use an additional label so that a bit of exception handling can be better
> reused at the end of an if branch.
>
> This issue was detected by using the Coccinelle software.
>
> Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
> ---
> drivers/block/rnbd/rnbd-clt.c | 7 +++----
> 1 file changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/block/rnbd/rnbd-clt.c b/drivers/block/rnbd/rnbd-clt.c
> index 4d6725a0035e..d8e3f145ee2f 100644
> --- a/drivers/block/rnbd/rnbd-clt.c
> +++ b/drivers/block/rnbd/rnbd-clt.c
> @@ -329,10 +329,8 @@ static struct rnbd_iu *rnbd_get_iu(struct rnbd_clt_session *sess,
> return NULL;
>
> permit = rnbd_get_permit(sess, con_type, wait);
> - if (!permit) {
> - kfree(iu);
> - return NULL;
> - }
> + if (!permit)
> + goto free_iu;
>
> iu->permit = permit;
> /*
> @@ -349,6 +347,7 @@ static struct rnbd_iu *rnbd_get_iu(struct rnbd_clt_session *sess,
>
> if (sg_alloc_table(&iu->sgt, 1, GFP_KERNEL)) {
> rnbd_put_permit(sess, permit);
> +free_iu:
Thanks for the patch.
It does what the commit description says, but maybe we do not need
this change?
If there were a kfree before the final "return iu;", it would make
more sense. But jumping into the middle of a conditional block to reuse
the kfree and return seems forced.
> kfree(iu);
> return NULL;
> }
> --
> 2.54.0
>
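For comparison, the conventional unwinding style that the reply alludes to
keeps the cleanup labels at the end of the function instead of inside a
conditional block. The user-space sketch below is only an illustration under
stated assumptions: struct iu_sketch, get_iu_sketch(), put_iu_sketch() and the
fail_* knobs are hypothetical stand-ins, not the rnbd code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct rnbd_iu; not the real layout. */
struct iu_sketch {
	int *permit;
	int *table;
};

/*
 * Allocate an iu in three steps. Each failure jumps to a label at the
 * end of the function, and the labels undo the steps in reverse order.
 * The fail_* flags only exist to exercise the error paths.
 */
static struct iu_sketch *get_iu_sketch(bool fail_permit, bool fail_table)
{
	struct iu_sketch *iu = calloc(1, sizeof(*iu));

	if (!iu)
		return NULL;

	iu->permit = fail_permit ? NULL : malloc(sizeof(*iu->permit));
	if (!iu->permit)
		goto free_iu;

	iu->table = fail_table ? NULL : malloc(sizeof(*iu->table));
	if (!iu->table)
		goto put_permit;

	return iu;

put_permit:
	free(iu->permit);
free_iu:
	free(iu);
	return NULL;
}

/* Release everything a successful get_iu_sketch() handed out. */
static void put_iu_sketch(struct iu_sketch *iu)
{
	assert(iu != NULL);
	free(iu->table);
	free(iu->permit);
	free(iu);
}
```

With this layout every error path falls through the labels in order, so no
jump ever lands inside an if block.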
* [PATCH v2] block: invalidate cached plug timestamp after task switch
From: Usama Arif @ 2026-06-12 9:40 UTC
To: axboe, linux-block, bsegall, dietmar.eggemann, juri.lelli,
kprateek.nayak, linux-kernel, mgorman, mingo, peterz, rostedt,
vincent.guittot, vschneid
Cc: shakeel.butt, hannes, riel, kernel-team, Usama Arif, stable
blk_time_get_ns() caches ktime_get_ns() in current->plug->cur_ktime
and marks the task with PF_BLOCK_TS. That cache is only valid while the
task keeps running; if the task is switched out, wall-clock time
advances and the cached value must not be reused when the task runs again.
The existing invalidation covers explicit plug flushes through
__blk_flush_plug(), and the schedule() / rtmutex paths through
sched_update_worker(). It does not cover in-kernel preemption paths such
as preempt_schedule(), preempt_schedule_notrace(), and
preempt_schedule_irq(), which enter __schedule(SM_PREEMPT) directly and
return without calling sched_update_worker().
As a result, a task preempted while holding a plug with PF_BLOCK_TS set
can reuse a stale plug->cur_ktime after it is scheduled back in. blk-iocost
then consumes that stale timestamp through ioc_now(), producing stale vnow
values for throttle decisions, and through ioc_rqos_done(), inflating
on-queue time and feeding false missed-QoS samples into vrate
adjustment.
Move the schedule-side invalidation to finish_task_switch(), which runs
for the scheduled-in task after every actual context switch regardless
of which schedule entry point was used. Keep __blk_flush_plug() as the
explicit flush/finish-plug invalidation path, and remove only the
PF_BLOCK_TS handling from sched_update_worker().
Fixes: 06b23f92af87 ("block: update cached timestamp post schedule/preemption")
Cc: stable@vger.kernel.org
Signed-off-by: Usama Arif <usama.arif@linux.dev>
---
v1 -> v2: https://lore.kernel.org/all/20260611231428.345098-1-usama.arif@linux.dev/
- Make the function just blk_plug_invalidate_ts(), move the check for
PF_BLOCK_TS flag into blk_plug_invalidate_ts and make it __always_inline
(Peter Zijlstra).
---
include/linux/blkdev.h | 17 ++++++++---------
kernel/sched/core.c | 12 ++++++++----
2 files changed, 16 insertions(+), 13 deletions(-)
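The stale-timestamp problem described in the commit message can be modelled
in user space. The sketch below is a simplification under stated assumptions:
the sk_* names, the hand-advanced clock and the SK_BLOCK_TS bit are
hypothetical stand-ins for the kernel's plug, ktime_get_ns() and PF_BLOCK_TS,
and the model ignores details such as interrupt context.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_BLOCK_TS 0x1u	/* hypothetical stand-in for PF_BLOCK_TS */

struct sk_plug { uint64_t cur_ktime; };
struct sk_task { unsigned int flags; struct sk_plug *plug; };

static uint64_t sk_clock_ns;	/* fake clock, advanced by hand */

/* Loosely models blk_time_get_ns(): read the clock once, then serve the cache. */
static uint64_t sk_time_get_ns(struct sk_task *t)
{
	if (!t->plug)
		return sk_clock_ns;
	if (!t->plug->cur_ktime) {
		t->plug->cur_ktime = sk_clock_ns;
		t->flags |= SK_BLOCK_TS;
	}
	return t->plug->cur_ktime;
}

/* Models the invalidation that the patch calls from finish_task_switch(). */
static void sk_invalidate_ts(struct sk_task *t)
{
	if (t->flags & SK_BLOCK_TS) {
		if (t->plug)
			t->plug->cur_ktime = 0;
		t->flags &= ~SK_BLOCK_TS;
	}
}

/*
 * The task is off-CPU for 'ns' nanoseconds. 'invalidate' selects between
 * a path that drops the cache on switch-in (1) and one that forgets to (0).
 */
static void sk_switch_out_and_in(struct sk_task *t, uint64_t ns, int invalidate)
{
	assert(t != NULL);
	sk_clock_ns += ns;
	if (invalidate)
		sk_invalidate_ts(t);
}
```

In this model a switch without invalidation keeps returning the timestamp
taken before the task was switched out, which is the behaviour the patch
removes for the preemption paths.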
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 57e84d59a642..1c1fd31ce187 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1216,16 +1216,15 @@ static inline void blk_flush_plug(struct blk_plug *plug, bool async)
__blk_flush_plug(plug, async);
}
-/*
- * tsk == current here
- */
-static inline void blk_plug_invalidate_ts(struct task_struct *tsk)
+static __always_inline void blk_plug_invalidate_ts(void)
{
- struct blk_plug *plug = tsk->plug;
+ if (unlikely(current->flags & PF_BLOCK_TS)) {
+ struct blk_plug *plug = current->plug;
- if (plug)
- plug->cur_ktime = 0;
- current->flags &= ~PF_BLOCK_TS;
+ if (plug)
+ plug->cur_ktime = 0;
+ current->flags &= ~PF_BLOCK_TS;
+ }
}
int blkdev_issue_flush(struct block_device *bdev);
@@ -1251,7 +1250,7 @@ static inline void blk_flush_plug(struct blk_plug *plug, bool async)
{
}
-static inline void blk_plug_invalidate_ts(struct task_struct *tsk)
+static inline void blk_plug_invalidate_ts(void)
{
}
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 8b791e9e9f67..e97e98c33be5 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5368,6 +5368,12 @@ static struct rq *finish_task_switch(struct task_struct *prev)
*/
kmap_local_sched_in();
+ /*
+ * Any cached block-layer timestamp (plug->cur_ktime) is stale now,
+ * invalidate it.
+ */
+ blk_plug_invalidate_ts();
+
fire_sched_in_preempt_notifiers(current);
/*
* When switching through a kernel thread, the loop in
@@ -7290,12 +7296,10 @@ static inline void sched_submit_work(struct task_struct *tsk)
static void sched_update_worker(struct task_struct *tsk)
{
- if (tsk->flags & (PF_WQ_WORKER | PF_IO_WORKER | PF_BLOCK_TS)) {
- if (tsk->flags & PF_BLOCK_TS)
- blk_plug_invalidate_ts(tsk);
+ if (tsk->flags & (PF_WQ_WORKER | PF_IO_WORKER)) {
if (tsk->flags & PF_WQ_WORKER)
wq_worker_running(tsk);
- else if (tsk->flags & PF_IO_WORKER)
+ else
io_wq_worker_running(tsk);
}
}
--
2.53.0-Meta
* Re: [PATCH v2] block: invalidate cached plug timestamp after task switch
From: Peter Zijlstra @ 2026-06-12 9:45 UTC
To: Usama Arif
Cc: axboe, linux-block, bsegall, dietmar.eggemann, juri.lelli,
kprateek.nayak, linux-kernel, mgorman, mingo, rostedt,
vincent.guittot, vschneid, shakeel.butt, hannes, riel,
kernel-team, stable
In-Reply-To: <20260612094042.3350401-1-usama.arif@linux.dev>
On Fri, Jun 12, 2026 at 02:40:42AM -0700, Usama Arif wrote:
> +static __always_inline void blk_plug_invalidate_ts(void)
> {
> + if (unlikely(current->flags & PF_BLOCK_TS)) {
> + struct blk_plug *plug = current->plug;
>
> + if (plug)
> + plug->cur_ktime = 0;
> + current->flags &= ~PF_BLOCK_TS;
> + }
> }
If you can guarantee PF_BLOCK_TS is only ever set when current->plug,
this can be reduced further.
* Re: [PATCH RFC 0/1] block: fix concurrent elevator change failure
From: Shin'ichiro Kawasaki @ 2026-06-12 9:47 UTC
To: Ming Lei; +Cc: linux-block, Jens Axboe, Nilay Shroff
In-Reply-To: <aiqaXfTqCLMu2DwF@fedora>
On Jun 11, 2026 / 06:22, Ming Lei wrote:
> Hi Shin'ichiro,
Hi Ming, thanks for the comments.
>
> On Thu, Jun 11, 2026 at 04:41:59PM +0900, Shin'ichiro Kawasaki wrote:
> > I observed that the blktests test case block/005 hangs on a specific
> > server hardware using a specific HDD as a block device. During the test
> > case run, the kernel reported a KASAN null-ptr-deref (and other memory
> > corruption symptoms) [2]. This failure looked sporadic and hardware-
> > dependent.
> >
> > From the kernel message, I noticed that udev-worker wrote to the
> > queue/scheduler sysfs attribute to change the IO scheduler, or elevator.
> > The test case block/005 also wrote to the same sysfs attribute, which
>
> sysfs write is supposed to be serialized...
I checked the sysfs write handler elv_iosched_store() in block/elevator.c.
I found elevator_change() call is guarded with the rw_semaphore
"set->update_nr_hwq_lock", but the guard is not the writer lock but the reader
lock. This does not serialize the sysfs writes.
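Why a reader lock does not serialize the two sysfs writers can be seen with a
toy model. This is a single-threaded sketch under stated assumptions: struct
toy_rwsem and the toy_* helpers are hypothetical and only mimic the trylock
semantics of a reader/writer semaphore; they are not the kernel rwsem
implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy reader/writer lock: any number of readers, or exactly one writer. */
struct toy_rwsem {
	int readers;
	bool writer;
};

/* Succeeds whenever no writer holds the lock, even if readers already do. */
static bool toy_down_read_trylock(struct toy_rwsem *sem)
{
	if (sem->writer)
		return false;
	sem->readers++;
	return true;
}

static void toy_up_read(struct toy_rwsem *sem)
{
	assert(sem->readers > 0);
	sem->readers--;
}

/* Succeeds only when the lock is completely free. */
static bool toy_down_write_trylock(struct toy_rwsem *sem)
{
	if (sem->writer || sem->readers > 0)
		return false;
	sem->writer = true;
	return true;
}

static void toy_up_write(struct toy_rwsem *sem)
{
	assert(sem->writer);
	sem->writer = false;
}
```

Two callers of toy_down_read_trylock() both get in, so both would proceed to
change the elevator at the same time. With toy_down_write_trylock() the
second caller fails, which corresponds to the -EBUSY return in the sysfs
write path.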
I tried the patch below, which replaces the reader lock with the writer lock.
In a quick trial it appears to work: the kernel message is no longer observed
and the new test case does not hang. I will do further testing to confirm
that this change does not trigger new lockdep WARNs. Assuming it has no such
side effects, I hope this fix approach is acceptable. It does not add a new
lock, so I think it is the better approach.
diff --git a/block/elevator.c b/block/elevator.c
index 3bcd37c2aa34..b03185a217ff 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -813,7 +813,7 @@ ssize_t elv_iosched_store(struct gendisk *disk, const char *buf,
* update_nr_hwq_lock -> kn->active (via del_gendisk -> kobject_del)
* kn->active -> update_nr_hwq_lock (via this sysfs write path)
*/
- if (!down_read_trylock(&set->update_nr_hwq_lock)) {
+ if (!down_write_trylock(&set->update_nr_hwq_lock)) {
ret = -EBUSY;
goto out;
}
@@ -824,7 +824,7 @@ ssize_t elv_iosched_store(struct gendisk *disk, const char *buf,
} else {
ret = -ENOENT;
}
- up_read(&set->update_nr_hwq_lock);
+ up_write(&set->update_nr_hwq_lock);
out:
if (ctx.type)
[...]
> blk_mq_sched_reg_debugfs already includes debugfs lock, so I feel the proper
> fix could be check & avoid the null-ptr-deref.
Actually, null-ptr-deref is only one of the failure symptoms. KASAN
slab-use-after-free is also observed [3], so I'm guessing that adding null
checks may not be enough.
> Adding new lock should be the last straw usually, especially this one is
> depended by queue freeze.
Got it, thanks.
[3] KASAN slab-use-after-free
[ 802.836569][ T3919] run blktests block/005 at 2026-05-11 10:42:39
[ 804.256901][ T3866] debugfs: 'sched' already exists in 'sdd'
[ 804.874743][ T3919] debugfs: 'sched' already exists in 'sdd'
[ 804.882124][ T3919] ==================================================================
[ 804.882154][ T3866] debugfs: 'sched' already exists in 'sdd'
[ 804.890039][ T3919] BUG: KASAN: slab-use-after-free in elevator_change_done+0x304/0x610
[ 804.890053][ T3919] Write of size 8 at addr ffff8881273e08e0 by task check/3919
[ 804.890061][ T3919]
[ 804.890069][ T3919] CPU: 4 UID: 0 PID: 3919 Comm: check Not tainted 7.1.0-rc2-kts+ #1 PREEMPT(lazy)
[ 804.890080][ T3919] Hardware name: Supermicro Super Server/X10SRL-F, BIOS 2.0 12/17/2015
[ 804.890086][ T3919] Call Trace:
[ 804.890092][ T3919] <TASK>
[ 804.890098][ T3919] dump_stack_lvl+0x6e/0xa0
[ 804.890118][ T3919] print_address_description.constprop.0+0x70/0x300
[ 804.890135][ T3919] ? elevator_change_done+0x304/0x610
[ 804.890145][ T3919] print_report+0xfc/0x1ff
[ 804.890154][ T3919] ? __virt_addr_valid+0x1d1/0x3f0
[ 804.890163][ T3919] ? elevator_change_done+0x304/0x610
[ 804.890168][ T3919] kasan_report+0xf6/0x1c0
[ 804.890176][ T3919] ? elevator_change_done+0x304/0x610
[ 804.890185][ T3919] kasan_check_range+0x125/0x200
[ 804.890192][ T3919] elevator_change_done+0x304/0x610
[ 804.890198][ T3919] ? sysfs_file_ops+0x70/0x140
[ 804.890206][ T3919] ? __pfx_elevator_change_done+0x10/0x10
[ 804.890213][ T3919] ? __pfx_sysfs_kf_write+0x10/0x10
[ 804.890220][ T3919] ? __pfx_sysfs_kf_write+0x10/0x10
[ 804.890225][ T3919] elevator_change+0x283/0x4f0
[ 804.890233][ T3919] ? __pfx_sysfs_kf_write+0x10/0x10
[ 804.890239][ T3919] elv_iosched_store+0x30c/0x3a0
[ 804.890246][ T3919] ? __pfx_elv_iosched_store+0x10/0x10
[ 804.890255][ T3919] ? lock_acquire.part.0+0xb8/0x230
[ 804.890262][ T3919] ? kernfs_fop_write_iter+0x25b/0x5e0
[ 804.890268][ T3919] ? lock_acquire.part.0+0xb8/0x230
[ 804.890274][ T3919] ? lock_acquire+0x126/0x140
[ 804.890281][ T3919] ? __pfx_sysfs_kf_write+0x10/0x10
[ 804.890286][ T3919] queue_attr_store+0x23f/0x360
[ 804.890295][ T3919] ? __pfx_queue_attr_store+0x10/0x10
[ 804.890300][ T3919] ? __lock_acquire+0x55d/0xbd0
[ 804.890308][ T3919] ? lock_acquire.part.0+0xb8/0x230
[ 804.890314][ T3919] ? sysfs_file_kobj+0x1d/0x1b0
[ 804.890319][ T3919] ? find_held_lock+0x2b/0x80
[ 804.890326][ T3919] ? __lock_release.isra.0+0x59/0x170
[ 804.890334][ T3919] ? lock_release.part.0+0x1c/0x50
[ 804.890340][ T3919] ? sysfs_file_kobj+0xb9/0x1b0
[ 804.890345][ T3919] ? sysfs_kf_write+0x65/0x170
[ 804.890352][ T3919] ? __pfx_sysfs_kf_write+0x10/0x10
[ 804.890357][ T3919] kernfs_fop_write_iter+0x3da/0x5e0
[ 804.890363][ T3919] ? __pfx_kernfs_fop_write_iter+0x10/0x10
[ 804.890368][ T3919] vfs_write+0x524/0x1010
[ 804.890378][ T3919] ? __pfx_vfs_write+0x10/0x10
[ 804.890393][ T3919] ksys_write+0xff/0x200
[ 804.890401][ T3919] ? __pfx_ksys_write+0x10/0x10
[ 804.890408][ T3919] ? __pfx_pte_val+0x10/0x10
[ 804.890414][ T3919] ? folio_xchg_last_cpupid+0xc6/0x130
[ 804.890421][ T3919] do_syscall_64+0xf4/0x1550
[ 804.890429][ T3919] ? __lock_release.isra.0+0x59/0x170
[ 804.890437][ T3919] ? lock_release.part.0+0x1c/0x50
[ 804.890444][ T3919] ? rcu_read_unlock+0x1c/0x60
[ 804.890449][ T3919] ? wp_page_reuse+0x160/0x1e0
[ 804.890455][ T3919] ? do_wp_page+0x5db/0x10a0
[ 804.890465][ T3919] ? handle_pte_fault+0x54e/0x760
[ 804.890472][ T3919] ? __pfx_handle_pte_fault+0x10/0x10
[ 804.890479][ T3919] ? __pfx_pmd_val+0x10/0x10
[ 804.890485][ T3919] ? __handle_mm_fault+0xa02/0xef0
[ 804.890493][ T3919] ? __lock_acquire+0x55d/0xbd0
[ 804.890499][ T3919] ? __pfx_css_rstat_updated+0x10/0x10
[ 804.890509][ T3919] ? lock_acquire.part.0+0xb8/0x230
[ 804.890515][ T3919] ? count_memcg_events_mm.constprop.0+0x22/0x130
[ 804.890522][ T3919] ? find_held_lock+0x2b/0x80
[ 804.890528][ T3919] ? __lock_release.isra.0+0x59/0x170
[ 804.890536][ T3919] ? find_held_lock+0x2b/0x80
[ 804.890542][ T3919] ? __lock_release.isra.0+0x59/0x170
[ 804.890550][ T3919] ? do_user_addr_fault+0x811/0xed0
[ 804.890559][ T3919] ? do_syscall_64+0x34/0x1550
[ 804.890564][ T3919] ? lockdep_hardirqs_on_prepare.part.0+0x9b/0x140
[ 804.890570][ T3919] ? do_syscall_64+0x34/0x1550
[ 804.890575][ T3919] ? trace_hardirqs_on+0x19/0x1a0
[ 804.890584][ T3919] ? do_syscall_64+0xab/0x1550
[ 804.890590][ T3919] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 804.890596][ T3919] RIP: 0033:0x7ff08cbe3bbe
[ 804.890603][ T3919] Code: 4d 89 d8 e8 34 bd 00 00 4c 8b 5d f8 41 8b 93 08 03 00 00 59 5e 48 83 f8 fc 74 11 c9 c3 0f 1f 80 00 00 00 00 48 8b 45 10 0f 05 <c9> c3 83 e2 39 83 fa 08 75 e7 e8 13 ff ff ff 0f 1f 00 f3 0f 1e fa
[ 804.890609][ T3919] RSP: 002b:00007ffc95718820 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
[ 804.890616][ T3919] RAX: ffffffffffffffda RBX: 00007ff08cd5f5c0 RCX: 00007ff08cbe3bbe
[ 804.890621][ T3919] RDX: 0000000000000006 RSI: 0000563340f2c390 RDI: 0000000000000001
[ 804.890624][ T3919] RBP: 00007ffc95718830 R08: 0000000000000000 R09: 0000000000000000
[ 804.890627][ T3919] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000006
[ 804.890630][ T3919] R13: 0000000000000006 R14: 0000563340f2c390 R15: 0000563340f96890
[ 804.890641][ T3919] </TASK>
[ 804.890643][ T3919]
[ 805.368835][ T3919] Allocated by task 3919:
[ 805.373543][ T3919] kasan_save_stack+0x30/0x50
[ 805.378559][ T3919] kasan_save_track+0x14/0x30
[ 805.383559][ T3919] __kasan_kmalloc+0x9a/0xb0
[ 805.388465][ T3919] elevator_alloc+0xc5/0x2b0
[ 805.393366][ T3919] blk_mq_init_sched+0xa6/0x5e0
[ 805.398554][ T3919] elevator_switch+0x18e/0x680
[ 805.403702][ T3919] elevator_change+0x2d8/0x4f0
[ 805.408802][ T3919] elv_iosched_store+0x30c/0x3a0
[ 805.414116][ T3919] queue_attr_store+0x23f/0x360
[ 805.419289][ T3919] kernfs_fop_write_iter+0x3da/0x5e0
[ 805.424938][ T3919] vfs_write+0x524/0x1010
[ 805.429600][ T3919] ksys_write+0xff/0x200
[ 805.434159][ T3919] do_syscall_64+0xf4/0x1550
[ 805.439064][ T3919] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 805.445273][ T3919]
[ 805.447927][ T3919] Freed by task 3866:
[ 805.452231][ T3919] kasan_save_stack+0x30/0x50
[ 805.457287][ T3919] kasan_save_track+0x14/0x30
[ 805.462282][ T3919] kasan_save_free_info+0x3b/0x70
[ 805.467645][ T3919] __kasan_slab_free+0x6b/0x90
[ 805.472736][ T3919] kfree+0x21c/0x620
[ 805.476953][ T3919] kobject_cleanup+0x105/0x3a0
[ 805.482039][ T3919] elevator_change_done+0x196/0x610
[ 805.487633][ T3919] elevator_change+0x283/0x4f0
[ 805.492730][ T3919] elv_iosched_store+0x30c/0x3a0
[ 805.497989][ T3919] queue_attr_store+0x23f/0x360
[ 805.503144][ T3919] kernfs_fop_write_iter+0x3da/0x5e0
[ 805.508747][ T3919] vfs_write+0x524/0x1010
[ 805.513381][ T3919] ksys_write+0xff/0x200
[ 805.517944][ T3919] do_syscall_64+0xf4/0x1550
[ 805.522862][ T3919] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 805.529118][ T3919]
[ 805.531858][ T3919] The buggy address belongs to the object at ffff8881273e0800
[ 805.531858][ T3919] which belongs to the cache kmalloc-rnd-13-1k of size 1024
[ 805.547392][ T3919] The buggy address is located 224 bytes inside of
[ 805.547392][ T3919] freed 1024-byte region [ffff8881273e0800, ffff8881273e0c00)
[ 805.562078][ T3919]
[ 805.564734][ T3919] The buggy address belongs to the physical page:
[ 805.571446][ T3919] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1273e0
[ 805.580609][ T3919] head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
[ 805.589411][ T3919] flags: 0x17ffffc0000040(head|node=0|zone=2|lastcpupid=0x1fffff)
[ 805.597524][ T3919] page_type: f5(slab)
[ 805.601916][ T3919] raw: 0017ffffc0000040 ffff88810005c640 dead000000000100 dead000000000122
[ 805.610881][ T3919] raw: 0000000000000000 0000000800100010 00000000f5000000 0000000000000000
[ 805.619808][ T3919] head: 0017ffffc0000040 ffff88810005c640 dead000000000100 dead000000000122
[ 805.628815][ T3919] head: 0000000000000000 0000000800100010 00000000f5000000 0000000000000000
[ 805.637838][ T3919] head: 0017ffffc0000003 fffffffffffffe01 00000000ffffffff 00000000ffffffff
[ 805.646901][ T3919] head: ffffffffffffffff 0000000000000000 00000000ffffffff 0000000000000008
[ 805.655983][ T3919] page dumped because: kasan: bad access detected
[ 805.662913][ T3919]
[ 805.665657][ T3919] Memory state around the buggy address:
[ 805.671717][ T3919] ffff8881273e0780: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 805.680194][ T3919] ffff8881273e0800: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 805.688697][ T3919] >ffff8881273e0880: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 805.697130][ T3919] ^
[ 805.704717][ T3919] ffff8881273e0900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 805.713179][ T3919] ffff8881273e0980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 805.721720][ T3919] ==================================================================
[ 805.730526][ T3919] Disabling lock debugging due to kernel taint
...
* [PATCH v3 0/4] btrfs: use IOMAP_DIO_BOUNCE flag instead of falling back to buffered IO
From: Qu Wenruo @ 2026-06-12 9:51 UTC
To: linux-btrfs, linux-block, linux-fsdevel, linux-xfs
[CHANGELOG]
v3:
- Fix a bug in error handling of bio_iov_iter_bounce_write()
Which can lead to generic/708 failure on btrfs.
- Respect nofault flag in bio_iov_iter_bounce_write()
To avoid btrfs specific deadlocks.
- Reject NOWAIT and BOUNCE direct IOs
BOUNCE always allocates pages using GFP_KERNEL, which can sleep and
break the NOWAIT requirement, so such a combination has to be rejected.
v2:
- Rework the comment in btrfs_dio_write()
Commit 968f19c5b1b7 ("btrfs: always fallback to buffered write if the
inode requires checksum") solved the csum mismatch caused by unstable
direct IO buffers, but it has a pretty hefty performance penalty.
Meanwhile upstream iomap has introduced the IOMAP_DIO_BOUNCE flag to get
stable buffers without falling back to buffered IO.
Using that flag btrfs can reach 95% of the original zero-copy direct IO
performance, almost 2x the current buffered fallback performance.
However during my tests, I found several iomap-related bugs that can
lead to direct IO test case failures:
- generic/708
Garbage ends up at the end of the writes; this is a bug in the error
handling of a short copy.
Fixed in the first patch.
- Deadlock if using the page cache as direct IO buffer
This is because bio_iov_iter_bounce_write() doesn't respect
iov_iter::nofault flag.
Fixed in the second patch.
- Possible NOWAIT and BOUNCE conflicts
BOUNCE flag for both reads and writes will allocate new folios using
GFP_KERNEL, which can sleep and break NOWAIT requirement.
Reject such combination in iomap_dio_bio_iter() directly in the 3rd
patch.
The final patch enables btrfs to use the IOMAP_DIO_BOUNCE flag, so that
even with data checksums we do not need to fall back to buffered IO, and
we reclaim most of the lost direct IO performance.
Qu Wenruo (4):
block: revert the iov_iter after a short copy in
bio_iov_iter_bounce_write()
block: respect iov_iter::nofault flag in bio_iov_iter_bounce_write()
iomap: reject NOWAIT and BOUNCE direct IOs
btrfs: use IOMAP_DIO_BOUNCE flag instead of falling back to buffered
IO
block/bio.c | 10 ++++++---
fs/btrfs/direct-io.c | 53 ++++++++++++++++++++------------------------
fs/iomap/direct-io.c | 4 ++++
3 files changed, 35 insertions(+), 32 deletions(-)
--
2.54.0
* [PATCH v3 1/4] block: revert the iov_iter after a short copy in bio_iov_iter_bounce_write()
From: Qu Wenruo @ 2026-06-12 9:51 UTC
To: linux-btrfs, linux-block, linux-fsdevel, linux-xfs
In-Reply-To: <cover.1781253428.git.wqu@suse.com>
With the incoming IOMAP_DIO_BOUNCE usage inside btrfs, it's pretty
easy to hit a short copy inside bio_iov_iter_bounce_write().
This is because btrfs disables page faults to avoid certain deadlocks
during direct writes, and instead manually faults in the pages and then
retries.
Inside bio_iov_iter_bounce_write(), if we hit a short copy, we
don't revert the iov_iter, which can cause problems like unexpected
garbage on the next retry.
Revert the iov_iter after a short copy.
One thing to note is that the folio is allocated and then immediately
queued into the bio, so the proper revert size is
(bi_size - this_len + copied).
Fixes: 8dd5e7c75d7b ("block: add helpers to bounce buffer an iov_iter into bios")
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
block/bio.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
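The revert size from the commit message can be checked with a small worked
example. The helper below is a hypothetical user-space sketch, not code from
the patch: because the failing chunk is already counted in bi_size, the
iterator has advanced by the earlier full chunks (bi_size - this_len) plus the
partial copy (copied).

```c
#include <assert.h>
#include <stddef.h>

/*
 * Bytes to hand back to the iterator after a short copy.
 *
 * bi_size  - bytes queued in the bio, including the chunk that failed
 * this_len - size of the chunk that failed
 * copied   - bytes of that chunk that were actually copied
 */
static size_t revert_len(size_t bi_size, size_t this_len, size_t copied)
{
	assert(copied < this_len && this_len <= bi_size);
	return bi_size - this_len + copied;
}
```

For instance, if a first 1 MiB chunk is copied in full and a second 1 MiB
chunk stops after 4096 bytes, bi_size is 2 MiB and the iterator must be
reverted by 1 MiB + 4096 bytes. If the very first chunk fails with nothing
copied, the result is 0.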
diff --git a/block/bio.c b/block/bio.c
index 5f10900b3f42..b33ff69bb722 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1321,6 +1321,7 @@ static int bio_iov_iter_bounce_write(struct bio *bio, struct iov_iter *iter,
do {
size_t this_len = min(total_len, SZ_1M);
+ size_t copied;
struct folio *folio;
if (this_len > minsize * 2)
@@ -1334,12 +1335,12 @@ static int bio_iov_iter_bounce_write(struct bio *bio, struct iov_iter *iter,
break;
bio_add_folio_nofail(bio, folio, this_len, 0);
- if (copy_from_iter(folio_address(folio), this_len, iter) !=
- this_len) {
+ copied = copy_from_iter(folio_address(folio), this_len, iter);
+ if (copied < this_len) {
+ iov_iter_revert(iter, bio->bi_iter.bi_size - this_len + copied);
bio_free_folios(bio);
return -EFAULT;
}
-
total_len -= this_len;
} while (total_len && bio->bi_vcnt < bio->bi_max_vecs);
--
2.54.0
* [PATCH v3 4/4] btrfs: use IOMAP_DIO_BOUNCE flag instead of falling back to buffered IO
From: Qu Wenruo @ 2026-06-12 9:51 UTC
To: linux-btrfs, linux-block, linux-fsdevel, linux-xfs
In-Reply-To: <cover.1781253428.git.wqu@suse.com>
Previously btrfs forced direct writes to fall back to buffered ones if the
inode has data checksums or the profile has duplication.
That fallback prevents the buffer from being modified during the write, which
could leave the final content mismatching the checksum or the other mirrors.
It comes with a pretty huge performance cost, which already caused some
concern at the time.
Later, upstream commit c9d114846b38 ("iomap: add a flag to bounce
buffer direct I/O") introduced a new method that copies the content into
new pages and does all the operations on the newly allocated pages.
So let btrfs utilize the new flag for direct writes if we require
stable folios.
Here is a quick benchmark, using the following fio setup:
fio --name=randwrite --filename $mnt/foobar --ioengine=libaio --size=4G \
--rw=randwrite --iodepth=64 --runtime=60 --time_based --direct=1 \
--bs=$blocksize
Unit is MiB/s.
Blocksize | Zero-copy (*) | Buffered | Bounce
-----------+---------------+----------+-----------
4K | 35.1 | 17.1 | 33.8
64K | 522 | 251 | 492
*: This is done by reverting the commit 968f19c5b1b7 ("btrfs: always
fallback to buffered write if the inode requires checksum")
Although with page bouncing the performance is only around 95% of
true zero-copy, it's still almost double the performance of the buffered
fallback.
There will be a small change in behavior: since we're using the
IOMAP_DIO_BOUNCE flag to allocate new folios, NOWAIT direct IO will
fail immediately.
So for NOWAIT direct IOs, NODATASUM and RAID0/SINGLE profiles are still
required.
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/direct-io.c | 53 ++++++++++++++++++++------------------------
1 file changed, 24 insertions(+), 29 deletions(-)
diff --git a/fs/btrfs/direct-io.c b/fs/btrfs/direct-io.c
index e566a60b0ce5..bbf94056a874 100644
--- a/fs/btrfs/direct-io.c
+++ b/fs/btrfs/direct-io.c
@@ -818,13 +818,36 @@ static ssize_t btrfs_dio_read(struct kiocb *iocb, struct iov_iter *iter,
IOMAP_DIO_PARTIAL | IOMAP_DIO_FSBLOCK_ALIGNED, &data, done_before);
}
+static bool need_stable_write(struct btrfs_inode *inode)
+{
+ const u64 data_profile = btrfs_data_alloc_profile(inode->root->fs_info) &
+ BTRFS_BLOCK_GROUP_PROFILE_MASK;
+
+ /* Data checksum requires stable buffer. */
+ if (!(inode->flags & BTRFS_INODE_NODATASUM))
+ return true;
+ /*
+ * Any profile with mirror/parity will require stable buffer.
+ * Otherwise the mirror may differ from each other.
+ *
+ * Thus only SINGLE and RAID0 doesn't require stable buffer.
+ */
+ if (data_profile != 0 && data_profile != BTRFS_BLOCK_GROUP_RAID0)
+ return true;
+ return false;
+}
+
static struct iomap_dio *btrfs_dio_write(struct kiocb *iocb, struct iov_iter *iter,
size_t done_before)
{
struct btrfs_dio_data data = { 0 };
+ unsigned int dio_flags = IOMAP_DIO_PARTIAL | IOMAP_DIO_FSBLOCK_ALIGNED;
+
+ if (need_stable_write(BTRFS_I(file_inode(iocb->ki_filp))))
+ dio_flags |= IOMAP_DIO_BOUNCE;
return __iomap_dio_rw(iocb, iter, &btrfs_dio_iomap_ops, &btrfs_dio_ops,
- IOMAP_DIO_PARTIAL | IOMAP_DIO_FSBLOCK_ALIGNED, &data, done_before);
+ dio_flags, &data, done_before);
}
static ssize_t check_direct_IO(struct btrfs_fs_info *fs_info,
@@ -853,8 +876,6 @@ ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
ssize_t ret;
unsigned int ilock_flags = 0;
struct iomap_dio *dio;
- const u64 data_profile = btrfs_data_alloc_profile(fs_info) &
- BTRFS_BLOCK_GROUP_PROFILE_MASK;
if (iocb->ki_flags & IOCB_NOWAIT)
ilock_flags |= BTRFS_ILOCK_TRY;
@@ -868,16 +889,6 @@ ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
if (iocb->ki_pos + iov_iter_count(from) <= i_size_read(inode) && IS_NOSEC(inode))
ilock_flags |= BTRFS_ILOCK_SHARED;
- /*
- * If our data profile has duplication (either extra mirrors or RAID56),
- * we can not trust the direct IO buffer, the content may change during
- * writeback and cause different contents written to different mirrors.
- *
- * Thus only RAID0 and SINGLE can go true zero-copy direct IO.
- */
- if (data_profile != BTRFS_BLOCK_GROUP_RAID0 && data_profile != 0)
- goto buffered;
-
relock:
ret = btrfs_inode_lock(BTRFS_I(inode), ilock_flags);
if (ret < 0)
@@ -918,22 +929,6 @@ ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
btrfs_inode_unlock(BTRFS_I(inode), ilock_flags);
goto buffered;
}
- /*
- * We can't control the folios being passed in, applications can write
- * to them while a direct IO write is in progress. This means the
- * content might change after we calculated the data checksum.
- * Therefore we can end up storing a checksum that doesn't match the
- * persisted data.
- *
- * To be extra safe and avoid false data checksum mismatch, if the
- * inode requires data checksum, just fallback to buffered IO.
- * For buffered IO we have full control of page cache and can ensure
- * no one is modifying the content during writeback.
- */
- if (!(BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM)) {
- btrfs_inode_unlock(BTRFS_I(inode), ilock_flags);
- goto buffered;
- }
/*
* The iov_iter can be mapped to the same file range we are writing to.
--
2.54.0
* [PATCH v3 2/4] block: respect iov_iter::nofault flag in bio_iov_iter_bounce_write()
From: Qu Wenruo @ 2026-06-12 9:51 UTC
To: linux-btrfs, linux-block, linux-fsdevel, linux-xfs
In-Reply-To: <cover.1781253428.git.wqu@suse.com>
For the incoming usage of IOMAP_DIO_BOUNCE in btrfs, btrfs has set
iov_iter::nofault to prevent deadlock when a page fault is needed to
read out the buffer.
However bio_iov_iter_bounce_write() doesn't respect the iov_iter::nofault
flag and just calls a plain copy_from_iter(), so it can still trigger a
page fault and cause a deadlock in btrfs.
Fix it by utilizing copy_folio_from_iter_atomic() if nofault flag is
set, otherwise use copy_folio_from_iter().
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
block/bio.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/block/bio.c b/block/bio.c
index b33ff69bb722..01bb76d9717c 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1335,7 +1335,10 @@ static int bio_iov_iter_bounce_write(struct bio *bio, struct iov_iter *iter,
break;
bio_add_folio_nofail(bio, folio, this_len, 0);
- copied = copy_from_iter(folio_address(folio), this_len, iter);
+ if (iter->nofault)
+ copied = copy_folio_from_iter_atomic(folio, 0, this_len, iter);
+ else
+ copied = copy_folio_from_iter(folio, 0, this_len, iter);
if (copied < this_len) {
iov_iter_revert(iter, bio->bi_iter.bi_size - this_len + copied);
bio_free_folios(bio);
--
2.54.0
* [PATCH v3 3/4] iomap: reject NOWAIT and BOUNCE direct IOs
From: Qu Wenruo @ 2026-06-12 9:51 UTC
To: linux-btrfs, linux-block, linux-fsdevel, linux-xfs
In-Reply-To: <cover.1781253428.git.wqu@suse.com>
If a direct IO requires bounced pages for a stable buffer, it will always
allocate memory, and both bio_iov_iter_bounce_write() and
bio_iov_iter_bounce_read() allocate pages using GFP_KERNEL, which
can sleep and break the NOWAIT requirement.
So we need to reject direct IO that combines NOWAIT and BOUNCE in
iomap_dio_bio_iter().
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/iomap/direct-io.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index b36ee619cdcd..d1601122f0b5 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -412,6 +412,10 @@ static int iomap_dio_bio_iter(struct iomap_iter *iter, struct iomap_dio *dio)
unsigned int alignment;
ssize_t ret = 0;
+ /* Bounced direct IO will need to allocate memory, breaking NOWAIT flag. */
+ if (unlikely(iter->flags & IOMAP_NOWAIT && dio->flags & IOMAP_DIO_BOUNCE))
+ return -EAGAIN;
+
/*
* File systems that write out of place and always allocate new blocks
* need each bio to be block aligned as that's the unit of allocation.
--
2.54.0
* Re: [PATCH v2] block: invalidate cached plug timestamp after task switch
From: Usama Arif @ 2026-06-12 10:02 UTC
To: Peter Zijlstra
Cc: axboe, linux-block, bsegall, dietmar.eggemann, juri.lelli,
kprateek.nayak, linux-kernel, mgorman, mingo, rostedt,
vincent.guittot, vschneid, shakeel.butt, hannes, riel,
kernel-team, stable
In-Reply-To: <20260612094520.GA42921@noisy.programming.kicks-ass.net>
On 12/06/2026 10:45, Peter Zijlstra wrote:
> On Fri, Jun 12, 2026 at 02:40:42AM -0700, Usama Arif wrote:
>
>> +static __always_inline void blk_plug_invalidate_ts(void)
>> {
>> + if (unlikely(current->flags & PF_BLOCK_TS)) {
>> + struct blk_plug *plug = current->plug;
>>
>> + if (plug)
>> + plug->cur_ktime = 0;
>> + current->flags &= ~PF_BLOCK_TS;
>> + }
>> }
>
> If you can guarantee PF_BLOCK_TS is only ever set when current->plug,
> this can be reduced further.
Thanks for the reviews!
The invariant holds at set time (the only set in blk_time_get_ns() is
gated by if (!plug)) and through the only legitimate plug clear in
blk_finish_plug() (which goes through __blk_flush_plug() that clears
PF_BLOCK_TS first).
However, copy_process() sets p->plug = NULL for the child but doesn't
strip PF_BLOCK_TS from the inherited flags.
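To make the gap concrete, here is a minimal userspace sketch (not kernel
code). The toy_* names and the TOY_PF_BLOCK_TS bit value are made up for
illustration; the only thing modelled is a fork that copies the flags but
resets the plug pointer, with and without stripping the flag:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Arbitrary bit for illustration; not the kernel's PF_BLOCK_TS value. */
#define TOY_PF_BLOCK_TS 0x1u

struct toy_plug {
	unsigned long long cur_ktime;
};

struct toy_task {
	unsigned int flags;
	struct toy_plug *plug;
};

/* Model of fork: flags are inherited, the child starts without a plug. */
struct toy_task toy_fork(const struct toy_task *parent, bool strip_block_ts)
{
	struct toy_task child = *parent;

	child.plug = NULL;
	if (strip_block_ts)
		child.flags &= ~TOY_PF_BLOCK_TS;
	return child;
}

/* The invariant under discussion: flag set implies a non-NULL plug. */
bool toy_invariant_holds(const struct toy_task *t)
{
	return !(t->flags & TOY_PF_BLOCK_TS) || t->plug != NULL;
}
```

In this model the invariant breaks for a child forked with
strip_block_ts == false and holds again once the flag is cleared at fork
time, which is the case where an unconditional plug dereference would be
safe.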
I think the if (plug) check is a good defensive measure, but I can also
do the below if you prefer?
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 1c1fd31ce187..c285a4d9837d 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1219,10 +1219,7 @@ static inline void blk_flush_plug(struct blk_plug *plug, bool async)
static __always_inline void blk_plug_invalidate_ts(void)
{
if (unlikely(current->flags & PF_BLOCK_TS)) {
- struct blk_plug *plug = current->plug;
-
- if (plug)
- plug->cur_ktime = 0;
+ current->plug->cur_ktime = 0;
current->flags &= ~PF_BLOCK_TS;
}
}
diff --git a/kernel/fork.c b/kernel/fork.c
index 892a95214c54..9a062149e0d8 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2167,7 +2167,8 @@ __latent_entropy struct task_struct *copy_process(
goto bad_fork_cleanup_count;
delayacct_tsk_init(p); /* Must remain after dup_task_struct() */
- p->flags &= ~(PF_SUPERPRIV | PF_WQ_WORKER | PF_IDLE | PF_NO_SETAFFINITY);
+ p->flags &= ~(PF_SUPERPRIV | PF_WQ_WORKER | PF_IDLE | PF_NO_SETAFFINITY |
+ PF_BLOCK_TS);
p->flags |= PF_FORKNOEXEC;
INIT_LIST_HEAD(&p->children);
INIT_LIST_HEAD(&p->sibling);
^ permalink raw reply related
* Re: [PATCH v4 6/8] Bluetooth: hci_sync: Add NVMEM-backed BD address retrieval
From: Loic Poulain @ 2026-06-12 10:00 UTC (permalink / raw)
To: Dmitry Baryshkov
Cc: Ulf Hansson, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
Bjorn Andersson, Konrad Dybcio, Jens Axboe, Johannes Berg,
Jeff Johnson, Bartosz Golaszewski, Marcel Holtmann,
Luiz Augusto von Dentz, Balakrishna Godavarthi, Rocky Liao,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, Srinivas Kandagatla, Andrew Lunn, Heiner Kallweit,
Russell King, Saravana Kannan, linux-mmc, devicetree,
linux-kernel, linux-arm-msm, linux-block, linux-wireless, ath10k,
linux-bluetooth, netdev, daniel, Bartosz Golaszewski
In-Reply-To: <sy2ofvdbcxspxtmfdavjvdz7oes5ieuep4znf4ayknmuwhrlgk@7lp3bkegaeif>
On Fri, Jun 12, 2026 at 11:11 AM Dmitry Baryshkov
<dmitry.baryshkov@oss.qualcomm.com> wrote:
>
> On Tue, Jun 09, 2026 at 09:52:31AM +0200, Loic Poulain wrote:
> > Some devices store the Bluetooth BD address in non-volatile
> > memory, which can be accessed through the NVMEM framework.
> > Similar to Ethernet or WiFi MAC addresses, add support for
> > reading the BD address from a 'local-bd-address' NVMEM cell.
> >
> > As with the device-tree provided BD address, add a quirk to
> > indicate whether a device or platform should attempt to read
> > the address from NVMEM when no valid in-chip address is present.
> > Also add a quirk to indicate if the address is stored in
> > big-endian byte order.
> >
> > Reviewed-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
> > Signed-off-by: Loic Poulain <loic.poulain@oss.qualcomm.com>
> > ---
> > include/net/bluetooth/hci.h | 18 ++++++++++++++++++
> > net/bluetooth/hci_sync.c | 39 ++++++++++++++++++++++++++++++++++++++-
> > 2 files changed, 56 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/net/bluetooth/hci.h b/include/net/bluetooth/hci.h
> > index 572b1c620c5d653a1fe10b26c1b0ba33e8f4968f..7686466d1109253b0d75edeb5f6a99fb98ce4cc6 100644
> > --- a/include/net/bluetooth/hci.h
> > +++ b/include/net/bluetooth/hci.h
> > @@ -164,6 +164,24 @@ enum {
> > */
> > HCI_QUIRK_BDADDR_PROPERTY_BROKEN,
> >
> > + /* When this quirk is set, the public Bluetooth address
> > + * initially reported by HCI Read BD Address command
> > + * is considered invalid. The public BD Address can be
> > + * retrieved via a 'local-bd-address' NVMEM cell.
>
> Why do we need a quirk here? Can't we always assume that if there is an
> NVMEM cell, it contains a correct address, even if HCI command returned
> a seemingly-sensible one?
The pattern follows HCI_QUIRK_USE_BDADDR_PROPERTY: that quirk indicates
that the address returned by the HCI Read BD Address command is
invalid and should be overridden using a fwnode property. Without this
quirk, even a valid fwnode-provided address is ignored. So here this
is primarily done to align with that established behavior, although
whether that design choice is ideal is a good question.
This also raises the question of why an explicit HCI_QUIRK_USE_* flag
is required to allow reading from NVMEM when the controller-provided
address is known to be invalid, rather than attempting to use any
available backend (fwnode-prop or NVMEM). But this remains
consistent with the behavior established by the fwnode-based quirk.
So, I think these aspects could be revisited in a Bluetooth follow-up
series if there is interest in reworking the overall addr fallback
design.
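As a rough illustration of the gating described above, here is a
userspace sketch, not the hci_sync.c code. The toy_* names are
hypothetical, and the byte reversal for the big-endian case is an
assumption about how such a quirk would be applied:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_BDADDR_LEN 6

/* Hypothetical stand-ins for the two quirk bits discussed here. */
struct toy_quirks {
	bool use_bdaddr_nvmem;
	bool bdaddr_nvmem_be;
};

/*
 * Start from the controller-reported address and override it only when
 * the quirk is set and an NVMEM address is available (nvmem != NULL).
 * Returns true when the NVMEM address was used.
 */
bool toy_pick_bdaddr(const struct toy_quirks *q,
		     const uint8_t controller[TOY_BDADDR_LEN],
		     const uint8_t *nvmem,
		     uint8_t out[TOY_BDADDR_LEN])
{
	size_t i;

	memcpy(out, controller, TOY_BDADDR_LEN);
	if (!q->use_bdaddr_nvmem || !nvmem)
		return false;

	/* Assumed handling: reverse the bytes when stored big-endian. */
	for (i = 0; i < TOY_BDADDR_LEN; i++)
		out[i] = q->bdaddr_nvmem_be ? nvmem[TOY_BDADDR_LEN - 1 - i]
					    : nvmem[i];
	return true;
}
```

Without the quirk the NVMEM address is ignored even if present, which
mirrors the behavior described for the fwnode property.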
Regards,
Loic
>
> > + *
> > + * This quirk can be set before hci_register_dev is called or
> > + * during the hdev->setup vendor callback.
> > + */
> > + HCI_QUIRK_USE_BDADDR_NVMEM,
> > +
> > + /* When this quirk is set, the Bluetooth Device Address provided by
> > + * the 'local-bd-address' NVMEM is stored in big-endian order.
> > + *
> > + * This quirk can be set before hci_register_dev is called or
> > + * during the hdev->setup vendor callback.
> > + */
> > + HCI_QUIRK_BDADDR_NVMEM_BE,
>
> Also, is this necessary? Are the devices which store the address in the
> wrong format in the NVMEM?
>
> > +
> > /* When this quirk is set, the duplicate filtering during
> > * scanning is based on Bluetooth devices addresses. To allow
> > * RSSI based updates, restart scanning if needed.
>
> --
> With best wishes
> Dmitry
^ permalink raw reply
* Re: [LSF/MM/BPF RFC PATCH 00/13]
From: Haris Iqbal @ 2026-06-12 10:36 UTC (permalink / raw)
To: Leon Romanovsky
Cc: linux-block, linux-rdma, linux-kernel, axboe, bvanassche, hch,
jgg, jinpu.wang
In-Reply-To: <20260611115902.GO327369@unreal>
On Thu, Jun 11, 2026 at 1:59 PM Leon Romanovsky <leon@kernel.org> wrote:
>
> On Wed, May 27, 2026 at 02:44:08PM +0200, Haris Iqbal wrote:
> > On Tue, May 12, 2026 at 12:34 PM Leon Romanovsky <leon@kernel.org> wrote:
> > >
> > > On Tue, May 05, 2026 at 09:46:12AM +0200, Md Haris Iqbal wrote:
> > > > Following a conversation with Bart yesterday, I am sending the RMR+BRMR
> > > > code through patch for easier review.
> > > >
> > > > The patches apply over the for-next branch of the block tree over commit
> > > > 07dfa981ca3
> > > >
> > > > For context,
> > > > RMR (Reliable Multicast over RTRS) is a kernel module that provides
> > > > active-active block-level replication over RDMA. It guarantees delivery
> > > > of IO to a group of storage nodes and handles resynchronization of data
> > > > directly between storage nodes without involving the compute client.
> > > >
> > > > BRMR (Block device over RMR) sits on top of RMR and exposes a standard
> > > > Linux block device (/dev/brmrX) backed by an RMR pool. Together, RMR and
> > > > BRMR provide a single-hop replication and resynchronization solution for
> > > > RDMA-connected storage clusters.
> > > >
> > > > My session is on Wednesday, at 12 in the storage room (Istanbul).
> > >
> > > To summarize the discussion:
> > >
> > > 1. Move as much logic as possible into the block layer; RDMA should serve
> > > strictly as a transport.
> > > 2. Identify another in‑kernel user of this functionality, and add support for
> > > it if required. At least accommodate potential users elsewhere in the
> > > kernel.
> >
> > Thanks for the summary Leon.
> >
> > The main logic which handles multicast/replication legs, missed I/O
> > tracking, re-synchronization, etc are the core parts of RMR.
> > If we move those to a separate module, there won't be much left in
> > RMR. RMR already uses RTRS from the RDMA subsystem as transport.
> >
> > Having said that, I am not against moving RMR out of the RDMA layer.
> > It can serve as a reliable replication service/library for any other
> > user in the kernel to use.
> > Which subsystem (block or something else) would be a better fit then,
> > can be discussed.
> >
> > PS: Would this be a good candidate for a session/discussion in the upcoming LPC?
>
> Probably yes.
>
> Thanks
Thanks Leon. I'll submit the abstract through the portal.
Do you think the topic is better suited to the Refereed track or to Kernel Summit?
>
> >
> > >
> > > Thanks
^ permalink raw reply
* Re: [PATCH RFC 0/1] block: fix concurrent elevator change failure
From: Ming Lei @ 2026-06-12 11:06 UTC (permalink / raw)
To: Shin'ichiro Kawasaki; +Cc: linux-block, Jens Axboe, Nilay Shroff
In-Reply-To: <aivMxPCd305WbBsk@shinmob>
On Fri, Jun 12, 2026 at 06:47:50PM +0900, Shin'ichiro Kawasaki wrote:
> On Jun 11, 2026 / 06:22, Ming Lei wrote:
> > Hi Shin'ichiro,
>
> Hi Ming, thanks for the comments.
>
> >
> > On Thu, Jun 11, 2026 at 04:41:59PM +0900, Shin'ichiro Kawasaki wrote:
> > > I observed that the blktests test case block/005 hangs on a specific
> > > server hardware using a specific HDD as a block device. During the test
> > > case run, the kernel reported a KASAN null-ptr-deref (and other memory
> > > corruption symptoms) [2]. This failure looked sporadic and hardware-
> > > dependent.
> > >
> > > From the kernel message, I noticed that udev-worker wrote to the
> > > queue/scheduler sysfs attribute to change the IO scheduler, or elevator.
> > > The test case block/005 also wrote to the same sysfs attribute, which
> >
> > sysfs write is supposed to be serialized...
>
> I checked the sysfs write handler elv_iosched_store() in block/elevator.c.
> I found elevator_change() call is guarded with the rw_semaphore
> "set->update_nr_hwq_lock", but the guard is not the writer lock but the reader
> lock. This does not serialize the sysfs writes.
Please see kernfs_fop_write_iter(), in which mutex is held before calling
->write().
>
> I tried the patch below to replace the reader lock with the writer lock. With
> a quick trial, it looks working. The kernel message is no longer observed and
> the new test case does not cause hangs. I will do further testing to confirm
> that this change does not trigger other new lockdep WARNs. Assuming it does not
> have such side effects, I hope this fix approach is acceptable. It doesn't add
> the new lock, so I think it's the better.
>
> diff --git a/block/elevator.c b/block/elevator.c
> index 3bcd37c2aa34..b03185a217ff 100644
> --- a/block/elevator.c
> +++ b/block/elevator.c
> @@ -813,7 +813,7 @@ ssize_t elv_iosched_store(struct gendisk *disk, const char *buf,
> * update_nr_hwq_lock -> kn->active (via del_gendisk -> kobject_del)
> * kn->active -> update_nr_hwq_lock (via this sysfs write path)
> */
> - if (!down_read_trylock(&set->update_nr_hwq_lock)) {
> + if (!down_write_trylock(&set->update_nr_hwq_lock)) {
> ret = -EBUSY;
> goto out;
> }
> @@ -824,7 +824,7 @@ ssize_t elv_iosched_store(struct gendisk *disk, const char *buf,
> } else {
> ret = -ENOENT;
> }
> - up_read(&set->update_nr_hwq_lock);
> + up_write(&set->update_nr_hwq_lock);
>
> out:
> if (ctx.type)
>
> [...]
>
> > blk_mq_sched_reg_debugfs already includes debugfs lock, so I feel the proper
> > fix could be check & avoid the null-ptr-deref.
>
> Actually, null-ptr-deref is one of the failure symptoms. KASAN slab-user-after
> free is also observed [3]. Then I'm guessing adding null checks may not be
> enough.
>
> > Adding new lock should be the last straw usually, especially this one is
> > depended by queue freeze.
>
> Got it, thanks.
>
>
> [3] KASAN slab-use-after-free
Then you need to figure out the exact slab type and check if the pointer is cleared
during free.
Anyway, there is a guard already, so I don't see a reason to add a new
lock to cover it.
Thanks,
Ming
^ permalink raw reply
* Re: [PATCH RFC 0/1] block: fix concurrent elevator change failure
From: Nilay Shroff @ 2026-06-12 11:45 UTC (permalink / raw)
To: Ming Lei, Shin'ichiro Kawasaki; +Cc: linux-block, Jens Axboe
In-Reply-To: <aivoHk4DE_pkKkDm@fedora>
On 6/12/26 4:36 PM, Ming Lei wrote:
> On Fri, Jun 12, 2026 at 06:47:50PM +0900, Shin'ichiro Kawasaki wrote:
>> On Jun 11, 2026 / 06:22, Ming Lei wrote:
>>> Hi Shin'ichiro,
>>
>> Hi Ming, thanks for the comments.
>>
>>>
>>> On Thu, Jun 11, 2026 at 04:41:59PM +0900, Shin'ichiro Kawasaki wrote:
>>>> I observed that the blktests test case block/005 hangs on a specific
>>>> server hardware using a specific HDD as a block device. During the test
>>>> case run, the kernel reported a KASAN null-ptr-deref (and other memory
>>>> corruption symptoms) [2]. This failure looked sporadic and hardware-
>>>> dependent.
>>>>
>>>> From the kernel message, I noticed that udev-worker wrote to the
>>>> queue/scheduler sysfs attribute to change the IO scheduler, or elevator.
>>>> The test case block/005 also wrote to the same sysfs attribute, which
>>>
>>> sysfs write is supposed to be serialized...
>>
>> I checked the sysfs write handler elv_iosched_store() in block/elevator.c.
>> I found elevator_change() call is guarded with the rw_semaphore
>> "set->update_nr_hwq_lock", but the guard is not the writer lock but the reader
>> lock. This does not serialize the sysfs writes.
>
> Please see kernfs_fop_write_iter(), in which mutex is held before calling
> ->write().
>
I think you're referring to @of->mutex here; however of->mutex is per struct
kernfs_open_file, which is associated with an open instance of the sysfs file.
The important point is that two separate opens can have different kernfs_open_file
instances and therefore different mutexes. Thus, concurrent writes to the same
sysfs attribute from two different processes may still be possible.
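A tiny userspace model of that point, under the assumption that the lock
can be reduced to a "held" flag. The toy_* types are stand-ins, not the
kernfs structures:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy lock: just a flag, enough to show who gets in. */
struct toy_mutex {
	bool held;
};

/* One mutex per open instance, as with of->mutex. */
struct toy_open_file {
	struct toy_mutex mutex;
};

static bool toy_mutex_trylock(struct toy_mutex *m)
{
	if (m->held)
		return false;
	m->held = true;
	return true;
}

static void toy_mutex_unlock(struct toy_mutex *m)
{
	m->held = false;
}

/*
 * Two writers try to enter ->write(), each taking the mutex of the open
 * file it uses. Returns how many of them are inside at the same time.
 */
int toy_writers_inside(struct toy_open_file *a, struct toy_open_file *b)
{
	bool got_a = toy_mutex_trylock(&a->mutex);
	bool got_b = toy_mutex_trylock(&b->mutex);
	int inside = got_a + got_b;

	if (got_b)
		toy_mutex_unlock(&b->mutex);
	if (got_a)
		toy_mutex_unlock(&a->mutex);
	return inside;
}
```

With two distinct open files both writers get in; only when both share
the same open file does the per-open mutex keep the second one out.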
>>
>> I tried the patch below to replace the reader lock with the writer lock. With
>> a quick trial, it looks working. The kernel message is no longer observed and
>> the new test case does not cause hangs. I will do further testing to confirm
>> that this change does not trigger other new lockdep WARNs. Assuming it does not
>> have such side effects, I hope this fix approach is acceptable. It doesn't add
>> the new lock, so I think it's the better.
>>
>> diff --git a/block/elevator.c b/block/elevator.c
>> index 3bcd37c2aa34..b03185a217ff 100644
>> --- a/block/elevator.c
>> +++ b/block/elevator.c
>> @@ -813,7 +813,7 @@ ssize_t elv_iosched_store(struct gendisk *disk, const char *buf,
>> * update_nr_hwq_lock -> kn->active (via del_gendisk -> kobject_del)
>> * kn->active -> update_nr_hwq_lock (via this sysfs write path)
>> */
>> - if (!down_read_trylock(&set->update_nr_hwq_lock)) {
>> + if (!down_write_trylock(&set->update_nr_hwq_lock)) {
>> ret = -EBUSY;
>> goto out;
>> }
>> @@ -824,7 +824,7 @@ ssize_t elv_iosched_store(struct gendisk *disk, const char *buf,
>> } else {
>> ret = -ENOENT;
>> }
>> - up_read(&set->update_nr_hwq_lock);
>> + up_write(&set->update_nr_hwq_lock);
>>
>> out:
>> if (ctx.type)
>>
>> [...]
>>
>>> blk_mq_sched_reg_debugfs already includes debugfs lock, so I feel the proper
>>> fix could be check & avoid the null-ptr-deref.
>>
>> Actually, null-ptr-deref is one of the failure symptoms. KASAN slab-user-after
>> free is also observed [3]. Then I'm guessing adding null checks may not be
>> enough.
>>
>>> Adding new lock should be the last straw usually, especially this one is
>>> depended by queue freeze.
>>
>> Got it, thanks.
>>
>>
>> [3] KASAN slab-use-after-free
>
> Then you need to figure out the exact slab type and check if the pointer is cleared
> during free.
>
> Anyway, there is guard already, not see reason to add new lock for covering
> it.
>
Regarding the observed failure, my understanding is that blk_mq_debugfs_register_sched()
and blk_mq_debugfs_register_sched_hctx() access q->elevator without holding q->elevator_lock.
If multiple scheduler update paths run concurrently, one path can replace and free the
elevator while another path is still using it, which would explain the observed KASAN
use-after-free and NULL pointer dereference reports.
With the proposed change, upgrading update_nr_hwq_lock from a reader lock to a writer
lock in elv_iosched_store() would serialize concurrent scheduler updates and therefore
prevent multiple elevator switch operations from running at the same time.
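The difference between the two trylock flavours can be sketched in
userspace as follows. This is a simplified counter-based model with
hypothetical toy_* names, not the kernel rw_semaphore implementation:

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified reader/writer lock state. */
struct toy_rwsem {
	int readers;
	bool writer;
};

bool toy_down_read_trylock(struct toy_rwsem *sem)
{
	if (sem->writer)
		return false;
	sem->readers++;
	return true;
}

bool toy_down_write_trylock(struct toy_rwsem *sem)
{
	if (sem->writer || sem->readers > 0)
		return false;
	sem->writer = true;
	return true;
}

/*
 * Two scheduler-switch attempts arrive back to back. Returns how many
 * of them are admitted at once, then drops whatever was taken.
 */
int toy_switchers_admitted(struct toy_rwsem *sem, bool exclusive)
{
	bool first, second;

	if (exclusive) {
		first = toy_down_write_trylock(sem);
		second = toy_down_write_trylock(sem);
		if (first || second)
			sem->writer = false;
	} else {
		first = toy_down_read_trylock(sem);
		second = toy_down_read_trylock(sem);
		sem->readers -= first + second;
	}
	return first + second;
}
```

In this model the reader variant admits both attempts, while the writer
variant admits one and the other would take the -EBUSY path.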
Another way to fix this might be to acquire q->elevator_lock in blk_mq_sched_reg_debugfs()
and thus serialize access to q->elevator in blk_mq_debugfs_register_sched() and
blk_mq_debugfs_register_sched_hctx().
Thanks,
--Nilay
^ permalink raw reply
* [PATCH v5 0/9] Support for block device NVMEM providers
From: Loic Poulain @ 2026-06-12 13:20 UTC (permalink / raw)
To: Ulf Hansson, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
Bjorn Andersson, Konrad Dybcio, Jens Axboe, Johannes Berg,
Jeff Johnson, Bartosz Golaszewski, Marcel Holtmann,
Luiz Augusto von Dentz, Balakrishna Godavarthi, Rocky Liao,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, Srinivas Kandagatla, Andrew Lunn, Heiner Kallweit,
Russell King, Saravana Kannan
Cc: linux-mmc, devicetree, linux-kernel, linux-arm-msm, linux-block,
linux-wireless, ath10k, linux-bluetooth, netdev, daniel,
Loic Poulain, Bartosz Golaszewski, Krzysztof Kozlowski,
Konrad Dybcio
On embedded devices, it is common for factory provisioning to store
device-specific information, such as Ethernet or WiFi MAC addresses,
in a dedicated area of an eMMC partition. This avoids the need for
an additional EEPROM/OTP and leverages the persistence of eMMC.
One example is the Arduino UNO-Q, where the WiFi MAC address and the
Bluetooth Device address are stored in the eMMC Boot1 partition.
Until now, accessing this information required a custom bootloader
to read the data and inject it into the Device Tree before handing
control over to the kernel. This approach is fragile and leads to
device-specific workarounds.
Rather than adding a new NVMEM provider specifically to the eMMC
subsystem, the new support operates at the block layer, allowing any
block device to behave like other non-volatile memories such as EEPROM
or OTP.
This series builds on earlier work by Daniel Golle that enables block
devices to act as NVMEM providers:
https://lore.kernel.org/all/6061aa4201030b9bb2f8d03ef32a564fdb786ed1.1709667858.git.daniel@makrotopia.org/
It also introduces an NVMEM layout description for the Arduino UNO-Q,
allowing device-specific data stored in the eMMC Boot1 partition to
be accessed in a standard way.
WiFi and Ethernet already support retrieving MAC addresses from NVMEM.
Bluetooth requires similar support, which is also addressed.
Note that this is currently limited to MMC-backed block devices, as
only the MMC core associates a firmware node with the block device
(add_disk_fwnode). This can be easily extended in the future to
support additional block drivers.
Signed-off-by: Loic Poulain <loic.poulain@oss.qualcomm.com>
---
Changes in v5:
- Fixed ath10k binding issue + extended commit message (Krzysztof)
- Moved blk-nvmem handling to block core instead of a class_interface
This allows correct/robust integration with block device life cycle (Bartosz).
- block: partitions: of: Skip child nodes without reg property (sashiko)
- Link to v4: https://lore.kernel.org/r/20260609-block-as-nvmem-v4-0-45712e6b22c6@oss.qualcomm.com
Changes in v4:
- Fix squash issue (dts commit incorrectly squashed) (Konrad)
- Use devres for nvmem resources (Bartosz)
- use __free() destructor helper when possible (Bartosz)
- Fix value return checking for bdev_file_open_by_dev
- Link to v3: https://lore.kernel.org/r/20260608-block-as-nvmem-v3-0-82681f50aa35@oss.qualcomm.com
Changes in v3:
- Fixed missing 'fixed-partitions' compatible in partition (Rob)
- Fixed clashing nvmem cells, document calibration along mac (Sashiko)
- Remove workaround to handle dangling nvmem references after
unregistering, this is a generic nvmem framework issue handled
in Bartosz's series:
https://lore.kernel.org/all/20260429-nvmem-unbind-v3-0-2a694f95395b@oss.qualcomm.com/
- Validate mac (is_valid_ether_addr) before copying to output buffer
- Link to v2: https://lore.kernel.org/r/20260507-block-as-nvmem-v2-0-bf17edd5134e@oss.qualcomm.com
Changes in v2:
- Fix example nvmem-layout cells to use compatible = "mac-base"
- Squash WiFi MAC and Bluetooth BD address consumer patches into the nvmem layout patch
- Fix possible use-after-free in blk-nvmem: bnv (nvmem priv) linked to nvmem lifetime
- Simplify nvmem-cell-names from items: - const: to plain const:
- Factor out common NVMEM EUI-48 retrieval logic
- Reorder changes
- Link to v1: https://lore.kernel.org/r/20260428-block-as-nvmem-v1-0-6ad23e75190a@oss.qualcomm.com
---
Daniel Golle (1):
block: implement NVMEM provider
Loic Poulain (8):
block: partitions: of: Skip child nodes without reg property
dt-bindings: mmc: Document support for nvmem-layout
dt-bindings: net: wireless: qcom,ath10k: Document NVMEM cells
dt-bindings: bluetooth: qcom: Add NVMEM BD address cell
net: of_net: Add of_get_nvmem_eui48() helper for EUI-48 lookup
Bluetooth: hci_sync: Add NVMEM-backed BD address retrieval
Bluetooth: qca: Set NVMEM BD address quirks when address is invalid
arm64: dts: qcom: arduino-imola: Describe NVMEM layout for WiFi/BT addresses
.../devicetree/bindings/mmc/mmc-card.yaml | 29 ++++++
.../net/bluetooth/qcom,bluetooth-common.yaml | 9 ++
.../bindings/net/wireless/qcom,ath10k.yaml | 16 +++
arch/arm64/boot/dts/qcom/qrb2210-arduino-imola.dts | 39 ++++++++
block/Kconfig | 9 ++
block/Makefile | 1 +
block/blk-nvmem.c | 109 +++++++++++++++++++++
block/blk.h | 8 ++
block/genhd.c | 4 +
block/partitions/of.c | 20 ++--
drivers/bluetooth/btqca.c | 5 +-
include/linux/blk_types.h | 3 +
include/linux/blkdev.h | 1 +
include/linux/of_net.h | 7 ++
include/net/bluetooth/hci.h | 18 ++++
net/bluetooth/hci_sync.c | 39 +++++++-
net/core/of_net.c | 49 ++++++---
17 files changed, 345 insertions(+), 21 deletions(-)
---
base-commit: ccb7390d6cdb23b298a6e2a7028ec134dfc4db10
change-id: 20260428-block-as-nvmem-4b308e8bda9a
Best regards,
--
Loic Poulain <loic.poulain@oss.qualcomm.com>
^ permalink raw reply
* [PATCH v5 1/9] block: partitions: of: Skip child nodes without reg property
From: Loic Poulain @ 2026-06-12 13:20 UTC (permalink / raw)
To: Ulf Hansson, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
Bjorn Andersson, Konrad Dybcio, Jens Axboe, Johannes Berg,
Jeff Johnson, Bartosz Golaszewski, Marcel Holtmann,
Luiz Augusto von Dentz, Balakrishna Godavarthi, Rocky Liao,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, Srinivas Kandagatla, Andrew Lunn, Heiner Kallweit,
Russell King, Saravana Kannan
Cc: linux-mmc, devicetree, linux-kernel, linux-arm-msm, linux-block,
linux-wireless, ath10k, linux-bluetooth, netdev, daniel,
Loic Poulain
In-Reply-To: <20260612-block-as-nvmem-v5-0-95e0b30fff90@oss.qualcomm.com>
Child nodes of a fixed-partitions node are not necessarily partition
entries, for example an nvmem-layout node has no reg property. The
current code passes a NULL reg pointer and uninitialized len to the
length check, which can result in a kernel panic or silent failure to
register any partitions.
Fix validate_of_partition() to return a skip indicator when no reg
property is present. Guard add_of_partition() with a reg property
check for the same reason.
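The resulting return convention (negative for an error, 0 for a valid
partition, positive for "skip this node") can be sketched in userspace
like this. The toy_* types are hypothetical and only model whether a
node has a reg property and how many cells it holds:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy child node: has a reg property or not, and its cell count. */
struct toy_node {
	bool has_reg;
	int reg_cells;
};

/* <0: invalid partition, 0: valid partition, 1: not a partition, skip. */
int toy_validate(const struct toy_node *np, int expected_cells)
{
	if (!np->has_reg)
		return 1;
	if (np->reg_cells != expected_cells)
		return -1;
	return 0;
}

/* Only valid partitions consume a slot; skipped nodes do not. */
int toy_count_partitions(const struct toy_node *nodes, size_t count,
			 int expected_cells)
{
	int slots = 0;
	size_t i;

	for (i = 0; i < count; i++) {
		int err = toy_validate(&nodes[i], expected_cells);

		if (err < 0)
			return -1;
		if (!err)
			slots++;
	}
	return slots;
}
```

A node without reg sitting between two partitions leaves the slot
numbering of the partitions contiguous, while a malformed reg still
fails the whole scan.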
Signed-off-by: Loic Poulain <loic.poulain@oss.qualcomm.com>
---
block/partitions/of.c | 20 +++++++++++++-------
1 file changed, 13 insertions(+), 7 deletions(-)
diff --git a/block/partitions/of.c b/block/partitions/of.c
index c22b6066109819c71568f73e8db8833d196b1cf6..534e02a9d85f62611d880af9b302d9fd49aa4d46 100644
--- a/block/partitions/of.c
+++ b/block/partitions/of.c
@@ -15,6 +15,10 @@ static int validate_of_partition(struct device_node *np, int slot)
int a_cells = of_n_addr_cells(np);
int s_cells = of_n_size_cells(np);
+ /* Skip nodes without a reg property (e.g. nvmem-layout) */
+ if (!reg)
+ return 1;
+
/* Make sure reg len match the expected addr and size cells */
if (len / sizeof(*reg) != a_cells + s_cells)
return -EINVAL;
@@ -80,14 +84,15 @@ int of_partition(struct parsed_partitions *state)
slot = 1;
/* Validate parition offset and size */
for_each_child_of_node(partitions_np, np) {
- if (validate_of_partition(np, slot)) {
+ int err = validate_of_partition(np, slot);
+
+ if (err < 0) {
of_node_put(np);
of_node_put(partitions_np);
-
return -1;
}
-
- slot++;
+ if (!err)
+ slot++;
}
slot = 1;
@@ -97,9 +102,10 @@ int of_partition(struct parsed_partitions *state)
break;
}
- add_of_partition(state, slot, np);
-
- slot++;
+ if (of_property_present(np, "reg")) {
+ add_of_partition(state, slot, np);
+ slot++;
+ }
}
seq_buf_puts(&state->pp_buf, "\n");
--
2.34.1
^ permalink raw reply related
* [PATCH v5 2/9] dt-bindings: mmc: Document support for nvmem-layout
From: Loic Poulain @ 2026-06-12 13:20 UTC (permalink / raw)
To: Ulf Hansson, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
Bjorn Andersson, Konrad Dybcio, Jens Axboe, Johannes Berg,
Jeff Johnson, Bartosz Golaszewski, Marcel Holtmann,
Luiz Augusto von Dentz, Balakrishna Godavarthi, Rocky Liao,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, Srinivas Kandagatla, Andrew Lunn, Heiner Kallweit,
Russell King, Saravana Kannan
Cc: linux-mmc, devicetree, linux-kernel, linux-arm-msm, linux-block,
linux-wireless, ath10k, linux-bluetooth, netdev, daniel,
Loic Poulain, Bartosz Golaszewski
In-Reply-To: <20260612-block-as-nvmem-v5-0-95e0b30fff90@oss.qualcomm.com>
Add support for an nvmem-layout subnode under an eMMC hardware
partition. This allows the partition to be exposed as an NVMEM
provider and its internal layout to be described. For example,
an eMMC boot partition can be used to store device-specific
information such as a WiFi MAC address.
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Reviewed-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
Signed-off-by: Loic Poulain <loic.poulain@oss.qualcomm.com>
---
.../devicetree/bindings/mmc/mmc-card.yaml | 29 ++++++++++++++++++++++
1 file changed, 29 insertions(+)
diff --git a/Documentation/devicetree/bindings/mmc/mmc-card.yaml b/Documentation/devicetree/bindings/mmc/mmc-card.yaml
index a61d6c96df759102f9c1fbfd548b026a77921cae..ca907ad73095925b234b119948f94ae81e698c86 100644
--- a/Documentation/devicetree/bindings/mmc/mmc-card.yaml
+++ b/Documentation/devicetree/bindings/mmc/mmc-card.yaml
@@ -40,6 +40,9 @@ patternProperties:
contains:
const: fixed-partitions
+ nvmem-layout:
+ $ref: /schemas/nvmem/layouts/nvmem-layout.yaml
+
required:
- compatible
- reg
@@ -86,6 +89,32 @@ examples:
read-only;
};
};
+
+ partitions-boot2 {
+ compatible = "fixed-partitions";
+
+ #address-cells = <1>;
+ #size-cells = <1>;
+
+ nvmem-layout {
+ compatible = "fixed-layout";
+
+ #address-cells = <1>;
+ #size-cells = <1>;
+
+ mac-addr@4400 {
+ compatible = "mac-base";
+ reg = <0x4400 0x6>;
+ #nvmem-cell-cells = <1>;
+ };
+
+ bd-addr@5400 {
+ compatible = "mac-base";
+ reg = <0x5400 0x6>;
+ #nvmem-cell-cells = <1>;
+ };
+ };
+ };
};
};
--
2.34.1
^ permalink raw reply related
* [PATCH v5 3/9] dt-bindings: net: wireless: qcom,ath10k: Document NVMEM cells
From: Loic Poulain @ 2026-06-12 13:20 UTC (permalink / raw)
To: Ulf Hansson, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
Bjorn Andersson, Konrad Dybcio, Jens Axboe, Johannes Berg,
Jeff Johnson, Bartosz Golaszewski, Marcel Holtmann,
Luiz Augusto von Dentz, Balakrishna Godavarthi, Rocky Liao,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, Srinivas Kandagatla, Andrew Lunn, Heiner Kallweit,
Russell King, Saravana Kannan
Cc: linux-mmc, devicetree, linux-kernel, linux-arm-msm, linux-block,
linux-wireless, ath10k, linux-bluetooth, netdev, daniel,
Loic Poulain, Bartosz Golaszewski, Krzysztof Kozlowski
In-Reply-To: <20260612-block-as-nvmem-v5-0-95e0b30fff90@oss.qualcomm.com>
Document the NVMEM cells supported by the ath10k driver: the MAC
address, pre-calibration data, and calibration data.
Since such data may also originate from chipset OTP or be supplied
via other device tree structures, all of these cells are optional
and can be provided independently, in any combination.
Reviewed-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Signed-off-by: Loic Poulain <loic.poulain@oss.qualcomm.com>
---
.../devicetree/bindings/net/wireless/qcom,ath10k.yaml | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/Documentation/devicetree/bindings/net/wireless/qcom,ath10k.yaml b/Documentation/devicetree/bindings/net/wireless/qcom,ath10k.yaml
index c21d66c7cd558ab792524be9afec8b79272d1c87..878c5d833a9cb073520c256c1b72d0f1489e7f4a 100644
--- a/Documentation/devicetree/bindings/net/wireless/qcom,ath10k.yaml
+++ b/Documentation/devicetree/bindings/net/wireless/qcom,ath10k.yaml
@@ -92,6 +92,22 @@ properties:
ieee80211-freq-limit: true
+ nvmem-cells:
+ minItems: 1
+ maxItems: 3
+ description:
+ References to nvmem cells for MAC address and/or calibration data.
+ Supported cell names are mac-address, calibration, and pre-calibration.
+
+ nvmem-cell-names:
+ minItems: 1
+ maxItems: 3
+ items:
+ enum:
+ - mac-address
+ - calibration
+ - pre-calibration
+
qcom,calibration-data:
$ref: /schemas/types.yaml#/definitions/uint8-array
description:
--
2.34.1
^ permalink raw reply related
* [PATCH v5 4/9] dt-bindings: bluetooth: qcom: Add NVMEM BD address cell
From: Loic Poulain @ 2026-06-12 13:20 UTC (permalink / raw)
To: Ulf Hansson, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
Bjorn Andersson, Konrad Dybcio, Jens Axboe, Johannes Berg,
Jeff Johnson, Bartosz Golaszewski, Marcel Holtmann,
Luiz Augusto von Dentz, Balakrishna Godavarthi, Rocky Liao,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, Srinivas Kandagatla, Andrew Lunn, Heiner Kallweit,
Russell King, Saravana Kannan
Cc: linux-mmc, devicetree, linux-kernel, linux-arm-msm, linux-block,
linux-wireless, ath10k, linux-bluetooth, netdev, daniel,
Loic Poulain, Bartosz Golaszewski
In-Reply-To: <20260612-block-as-nvmem-v5-0-95e0b30fff90@oss.qualcomm.com>
Add support for an NVMEM cell provider for "local-bd-address",
allowing the Bluetooth stack to retrieve the controller's BD address
from non-volatile storage such as an EEPROM or an eMMC partition.
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Reviewed-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
Signed-off-by: Loic Poulain <loic.poulain@oss.qualcomm.com>
---
.../devicetree/bindings/net/bluetooth/qcom,bluetooth-common.yaml | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/Documentation/devicetree/bindings/net/bluetooth/qcom,bluetooth-common.yaml b/Documentation/devicetree/bindings/net/bluetooth/qcom,bluetooth-common.yaml
index c8e9c55c1afb4c8e05ba2dae41ce2db4194b4a0f..7cb28f30c9af032082f23311f2fc89a32f266f17 100644
--- a/Documentation/devicetree/bindings/net/bluetooth/qcom,bluetooth-common.yaml
+++ b/Documentation/devicetree/bindings/net/bluetooth/qcom,bluetooth-common.yaml
@@ -22,4 +22,13 @@ properties:
description:
boot firmware is incorrectly passing the address in big-endian order
+ nvmem-cells:
+ maxItems: 1
+ description:
+ Nvmem data cell that contains a 6 byte BD address with the most
+ significant byte first (big-endian).
+
+ nvmem-cell-names:
+ const: local-bd-address
+
additionalProperties: true
--
2.34.1
^ permalink raw reply related
* [PATCH v5 5/9] block: implement NVMEM provider
From: Loic Poulain @ 2026-06-12 13:20 UTC (permalink / raw)
To: Ulf Hansson, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
Bjorn Andersson, Konrad Dybcio, Jens Axboe, Johannes Berg,
Jeff Johnson, Bartosz Golaszewski, Marcel Holtmann,
Luiz Augusto von Dentz, Balakrishna Godavarthi, Rocky Liao,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, Srinivas Kandagatla, Andrew Lunn, Heiner Kallweit,
Russell King, Saravana Kannan
Cc: linux-mmc, devicetree, linux-kernel, linux-arm-msm, linux-block,
linux-wireless, ath10k, linux-bluetooth, netdev, daniel,
Loic Poulain
In-Reply-To: <20260612-block-as-nvmem-v5-0-95e0b30fff90@oss.qualcomm.com>
From: Daniel Golle <daniel@makrotopia.org>
On embedded devices using an eMMC it is common that one or more partitions
on the eMMC are used to store MAC addresses and Wi-Fi calibration EEPROM
data. Allow referencing the partition in device tree for the kernel and
Wi-Fi drivers accessing it via the NVMEM layer.
For now, NVMEM is only registered for the whole-disk block device, as the
OF node is currently only associated with it.
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Co-developed-by: Loic Poulain <loic.poulain@oss.qualcomm.com>
Signed-off-by: Loic Poulain <loic.poulain@oss.qualcomm.com>
---
block/Kconfig | 9 ++++
block/Makefile | 1 +
block/blk-nvmem.c | 109 ++++++++++++++++++++++++++++++++++++++++++++++
block/blk.h | 8 ++++
block/genhd.c | 4 ++
include/linux/blk_types.h | 3 ++
include/linux/blkdev.h | 1 +
7 files changed, 135 insertions(+)
diff --git a/block/Kconfig b/block/Kconfig
index 15027963472d7b40e27b9097a5993c457b5b3054..0b33747e16dc33473683706f75c92bdf8b648f7c 100644
--- a/block/Kconfig
+++ b/block/Kconfig
@@ -209,6 +209,15 @@ config BLK_INLINE_ENCRYPTION_FALLBACK
by falling back to the kernel crypto API when inline
encryption hardware is not present.
+config BLK_NVMEM
+ bool "Block device NVMEM provider"
+ depends on OF
+ depends on NVMEM
+ help
+ Allow block devices (or partitions) to act as NVMEM providers,
+ typically used with eMMC to store MAC addresses or Wi-Fi
+ calibration data on embedded devices.
+
source "block/partitions/Kconfig"
config BLK_PM
diff --git a/block/Makefile b/block/Makefile
index 7dce2e44276c4274c11a0a61121c83d9c43d6e0c..d7ac389e71902bc091a8800ea266190a43b3e63d 100644
--- a/block/Makefile
+++ b/block/Makefile
@@ -36,3 +36,4 @@ obj-$(CONFIG_BLK_INLINE_ENCRYPTION) += blk-crypto.o blk-crypto-profile.o \
blk-crypto-sysfs.o
obj-$(CONFIG_BLK_INLINE_ENCRYPTION_FALLBACK) += blk-crypto-fallback.o
obj-$(CONFIG_BLOCK_HOLDER_DEPRECATED) += holder.o
+obj-$(CONFIG_BLK_NVMEM) += blk-nvmem.o
diff --git a/block/blk-nvmem.c b/block/blk-nvmem.c
new file mode 100644
index 0000000000000000000000000000000000000000..c005f059d9fe56242ebaef9905673dff902b5686
--- /dev/null
+++ b/block/blk-nvmem.c
@@ -0,0 +1,109 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * block device NVMEM provider
+ *
+ * Copyright (c) 2024 Daniel Golle <daniel@makrotopia.org>
+ * Copyright (c) Qualcomm Technologies, Inc. and/or its subsidiaries.
+ *
+ * Useful on devices using a partition on an eMMC for MAC addresses or
+ * Wi-Fi calibration EEPROM data.
+ */
+
+#include <linux/file.h>
+#include <linux/nvmem-provider.h>
+#include <linux/nvmem-consumer.h>
+#include <linux/of.h>
+#include <linux/pagemap.h>
+#include <linux/property.h>
+
+#include "blk.h"
+
+static int blk_nvmem_reg_read(void *priv, unsigned int from, void *val, size_t bytes)
+{
+ blk_mode_t mode = BLK_OPEN_READ | BLK_OPEN_RESTRICT_WRITES;
+ dev_t devt = (dev_t)(uintptr_t)priv;
+ size_t bytes_left = bytes;
+ loff_t pos = from;
+ int ret = 0;
+
+ struct file *bdev_file __free(fput) = bdev_file_open_by_dev(devt, mode, priv, NULL);
+ if (IS_ERR(bdev_file))
+ return PTR_ERR(bdev_file);
+
+ while (bytes_left) {
+ pgoff_t f_index = pos >> PAGE_SHIFT;
+ struct folio *folio;
+ size_t folio_off;
+ size_t to_read;
+
+ folio = read_mapping_folio(bdev_file->f_mapping, f_index, NULL);
+ if (IS_ERR(folio)) {
+ ret = PTR_ERR(folio);
+ break;
+ }
+
+ folio_off = offset_in_folio(folio, pos);
+ to_read = min(bytes_left, folio_size(folio) - folio_off);
+ memcpy_from_folio(val, folio, folio_off, to_read);
+ pos += to_read;
+ bytes_left -= to_read;
+ val += to_read;
+ folio_put(folio);
+ }
+
+ return ret;
+}
+
+void blk_nvmem_add(struct block_device *bdev)
+{
+ struct device *dev = &bdev->bd_device;
+ struct nvmem_config config = {};
+
+ /* skip devices which do not have a device tree node */
+ if (!dev_of_node(dev))
+ return;
+
+ /* skip devices without an nvmem layout defined */
+ struct device_node *child __free(device_node) =
+ of_get_child_by_name(dev_of_node(dev), "nvmem-layout");
+ if (!child)
+ return;
+
+ /*
+ * skip block device too large to be represented as NVMEM devices,
+ * the NVMEM reg_read callback uses an unsigned int offset
+ */
+ if (bdev_nr_bytes(bdev) > UINT_MAX) {
+ dev_warn(dev, "block device too large to be an NVMEM provider\n");
+ return;
+ }
+
+ config.id = NVMEM_DEVID_NONE;
+ config.dev = dev;
+ config.name = dev_name(dev);
+ config.owner = THIS_MODULE;
+ config.priv = (void *)(uintptr_t)dev->devt;
+ config.reg_read = blk_nvmem_reg_read;
+ config.size = bdev_nr_bytes(bdev);
+ config.word_size = 1;
+ config.stride = 1;
+ config.read_only = true;
+ config.root_only = true;
+ config.ignore_wp = true;
+ config.of_node = to_of_node(dev->fwnode);
+
+ bdev->bd_nvmem = nvmem_register(&config);
+ if (IS_ERR(bdev->bd_nvmem)) {
+ dev_err_probe(dev, PTR_ERR(bdev->bd_nvmem),
+ "Failed to register NVMEM device\n");
+ bdev->bd_nvmem = NULL;
+ }
+}
+
+void blk_nvmem_del(struct block_device *bdev)
+{
+ if (bdev->bd_nvmem)
+ nvmem_unregister(bdev->bd_nvmem);
+
+ bdev->bd_nvmem = NULL;
+}
diff --git a/block/blk.h b/block/blk.h
index ec4674cdf2ead4fd259ff5fc42401f591e684ee9..cd3c7ca723391c40be56f1dd4810e641b7c8a2b3 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -757,4 +757,12 @@ static inline void blk_debugfs_unlock(struct request_queue *q,
memalloc_noio_restore(memflags);
}
+#ifdef CONFIG_BLK_NVMEM
+void blk_nvmem_add(struct block_device *bdev);
+void blk_nvmem_del(struct block_device *bdev);
+#else
+static inline void blk_nvmem_add(struct block_device *bdev) {}
+static inline void blk_nvmem_del(struct block_device *bdev) {}
+#endif
+
#endif /* BLK_INTERNAL_H */
diff --git a/block/genhd.c b/block/genhd.c
index 7d6854fd28e95ae9134309679a7c6a937f5b7db8..1b2382de6fb30c1e5f60f45c04dc03ed3bf5d5f2 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -421,6 +421,8 @@ static void add_disk_final(struct gendisk *disk)
*/
dev_set_uevent_suppress(ddev, 0);
disk_uevent(disk, KOBJ_ADD);
+
+ blk_nvmem_add(disk->part0);
}
blk_apply_bdi_limits(disk->bdi, &disk->queue->limits);
@@ -704,6 +706,8 @@ static void __del_gendisk(struct gendisk *disk)
disk_del_events(disk);
+ blk_nvmem_del(disk->part0);
+
/*
* Prevent new openers by unlinked the bdev inode.
*/
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 8808ee76e73c09e0ceaac41ba59e86fb0c4efc64..ace6f59b860d0813665b2f62a1c03a1f4be94059 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -73,6 +73,9 @@ struct block_device {
int bd_writers;
#ifdef CONFIG_SECURITY
void *bd_security;
+#endif
+#ifdef CONFIG_BLK_NVMEM
+ struct nvmem_device *bd_nvmem;
#endif
/*
* keep this out-of-line as it's both big and not needed in the fast
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 890128cdea1ce66863c5baa36f3b336ec4550807..f15d2b5bf9e4fd2368b8a70416a978e22c0d4333 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -30,6 +30,7 @@
struct module;
struct request_queue;
+struct nvmem_device;
struct elevator_queue;
struct blk_trace;
struct request;
--
2.34.1
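
Note on the read loop in blk_nvmem_reg_read(): it splits an arbitrary byte
range into per-folio copies. Below is a minimal userspace sketch of the same
offset arithmetic. It assumes a fixed 4096-byte chunk in place of a folio and
a flat array in place of the page cache. CHUNK_SHIFT, CHUNK_SIZE and
chunked_read() are illustrative names and are not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumption: fixed-size chunks stand in for page-cache folios. */
#define CHUNK_SHIFT 12
#define CHUNK_SIZE  ((size_t)1 << CHUNK_SHIFT)

/*
 * Copy `bytes` bytes starting at offset `from` out of `store`, one chunk at
 * a time, in the same way the patch walks folios. Returns the number of
 * chunks touched so that the splitting can be observed.
 */
static size_t chunked_read(const unsigned char *store, size_t store_size,
                           size_t from, void *val, size_t bytes)
{
	unsigned char *out = val;
	size_t pos = from;
	size_t bytes_left = bytes;
	size_t chunks_touched = 0;

	/* The requested range must lie inside the backing store. */
	assert(from <= store_size && bytes <= store_size - from);

	while (bytes_left) {
		size_t index = pos >> CHUNK_SHIFT;      /* which chunk */
		size_t off = pos & (CHUNK_SIZE - 1);    /* offset inside it */
		size_t to_read = CHUNK_SIZE - off;      /* room left in chunk */

		if (to_read > bytes_left)
			to_read = bytes_left;

		memcpy(out, store + (index << CHUNK_SHIFT) + off, to_read);
		pos += to_read;
		out += to_read;
		bytes_left -= to_read;
		chunks_touched++;
	}

	return chunks_touched;
}
```

A 12-byte read starting 6 bytes before a chunk boundary is served by two
copies of 6 bytes each, while a read that stays inside one chunk needs a
single copy.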
^ permalink raw reply related
* [PATCH v5 6/9] net: of_net: Add of_get_nvmem_eui48() helper for EUI-48 lookup
From: Loic Poulain @ 2026-06-12 13:20 UTC (permalink / raw)
To: Ulf Hansson, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
Bjorn Andersson, Konrad Dybcio, Jens Axboe, Johannes Berg,
Jeff Johnson, Bartosz Golaszewski, Marcel Holtmann,
Luiz Augusto von Dentz, Balakrishna Godavarthi, Rocky Liao,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, Srinivas Kandagatla, Andrew Lunn, Heiner Kallweit,
Russell King, Saravana Kannan
Cc: linux-mmc, devicetree, linux-kernel, linux-arm-msm, linux-block,
linux-wireless, ath10k, linux-bluetooth, netdev, daniel,
Loic Poulain, Bartosz Golaszewski
In-Reply-To: <20260612-block-as-nvmem-v5-0-95e0b30fff90@oss.qualcomm.com>
Factor out the common NVMEM EUI-48 retrieval logic from
of_get_mac_address_nvmem() into a new of_get_nvmem_eui48() helper that
accepts the NVMEM cell name as a parameter. This allows other subsystems
(e.g. Bluetooth) to reuse the same lookup-validate-copy pattern with a
different cell name, without duplicating code.
of_get_mac_address_nvmem() is updated to call of_get_nvmem_eui48() with
"mac-address", preserving its existing behavior.
Reviewed-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
Signed-off-by: Loic Poulain <loic.poulain@oss.qualcomm.com>
---
include/linux/of_net.h | 7 +++++++
net/core/of_net.c | 49 +++++++++++++++++++++++++++++++++++++------------
2 files changed, 44 insertions(+), 12 deletions(-)
diff --git a/include/linux/of_net.h b/include/linux/of_net.h
index d88715a0b3a52f87af23d47791bea3baf5be5200..7854ba555d9a55f3d020a37fe00a27ae52e0e5dc 100644
--- a/include/linux/of_net.h
+++ b/include/linux/of_net.h
@@ -15,6 +15,7 @@ struct net_device;
extern int of_get_phy_mode(struct device_node *np, phy_interface_t *interface);
extern int of_get_mac_address(struct device_node *np, u8 *mac);
extern int of_get_mac_address_nvmem(struct device_node *np, u8 *mac);
+int of_get_nvmem_eui48(struct device_node *np, const char *cell_name, u8 *addr);
int of_get_ethdev_address(struct device_node *np, struct net_device *dev);
extern struct net_device *of_find_net_device_by_node(struct device_node *np);
#else
@@ -34,6 +35,12 @@ static inline int of_get_mac_address_nvmem(struct device_node *np, u8 *mac)
return -ENODEV;
}
+static inline int of_get_nvmem_eui48(struct device_node *np,
+ const char *cell_name, u8 *addr)
+{
+ return -ENODEV;
+}
+
static inline int of_get_ethdev_address(struct device_node *np, struct net_device *dev)
{
return -ENODEV;
diff --git a/net/core/of_net.c b/net/core/of_net.c
index 93ea425b9248a23f4f95a336e9cdbf0053248e32..11c1acca151266ac9287457b4050a54b08e2b5f5 100644
--- a/net/core/of_net.c
+++ b/net/core/of_net.c
@@ -61,9 +61,7 @@ static int of_get_mac_addr(struct device_node *np, const char *name, u8 *addr)
int of_get_mac_address_nvmem(struct device_node *np, u8 *addr)
{
struct platform_device *pdev = of_find_device_by_node(np);
- struct nvmem_cell *cell;
- const void *mac;
- size_t len;
+ u8 mac[ETH_ALEN] __aligned(sizeof(u16));
int ret;
/* Try lookup by device first, there might be a nvmem_cell_lookup
@@ -75,27 +73,54 @@ int of_get_mac_address_nvmem(struct device_node *np, u8 *addr)
return ret;
}
- cell = of_nvmem_cell_get(np, "mac-address");
+ ret = of_get_nvmem_eui48(np, "mac-address", mac);
+ if (ret)
+ return ret;
+
+ if (!is_valid_ether_addr(mac))
+ return -EINVAL;
+
+ ether_addr_copy(addr, mac);
+ return 0;
+}
+EXPORT_SYMBOL(of_get_mac_address_nvmem);
+
+/**
+ * of_get_nvmem_eui48 - Read a 6-byte EUI-48 address from a named NVMEM cell.
+ * @np: Device node to look up the NVMEM cell from.
+ * @cell_name: Name of the NVMEM cell (e.g. "mac-address", "local-bd-address").
+ * @addr: Output buffer for the 6-byte address.
+ *
+ * Reads the named NVMEM cell and validates that it contains a non-zero 6-byte
+ * address. Returns 0 on success, negative errno on failure.
+ */
+int of_get_nvmem_eui48(struct device_node *np, const char *cell_name, u8 *addr)
+{
+ struct nvmem_cell *cell;
+ const void *eui48;
+ size_t len;
+
+ cell = of_nvmem_cell_get(np, cell_name);
if (IS_ERR(cell))
return PTR_ERR(cell);
- mac = nvmem_cell_read(cell, &len);
+ eui48 = nvmem_cell_read(cell, &len);
nvmem_cell_put(cell);
- if (IS_ERR(mac))
- return PTR_ERR(mac);
+ if (IS_ERR(eui48))
+ return PTR_ERR(eui48);
- if (len != ETH_ALEN || !is_valid_ether_addr(mac)) {
- kfree(mac);
+ if (len != ETH_ALEN || !memchr_inv(eui48, 0, ETH_ALEN)) {
+ kfree(eui48);
return -EINVAL;
}
- memcpy(addr, mac, ETH_ALEN);
- kfree(mac);
+ memcpy(addr, eui48, ETH_ALEN);
+ kfree(eui48);
return 0;
}
-EXPORT_SYMBOL(of_get_mac_address_nvmem);
+EXPORT_SYMBOL_GPL(of_get_nvmem_eui48);
/**
* of_get_mac_address()
--
2.34.1
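
Note on the validation in of_get_nvmem_eui48(): the helper only checks the
cell length and rejects an all-zero value, and the unicast check
(is_valid_ether_addr()) stays in of_get_mac_address_nvmem(). A minimal
userspace sketch of that length-plus-nonzero test follows. EUI48_LEN and
eui48_cell_is_usable() are illustrative names, and the explicit loop stands
in for the kernel's memchr_inv().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define EUI48_LEN 6 /* same value as the kernel's ETH_ALEN */

/*
 * Userspace stand-in for the check in of_get_nvmem_eui48(): accept only a
 * cell that is exactly six bytes long and not all zeros. The kernel code
 * expresses the second half with memchr_inv(), which returns NULL when
 * every byte equals the given value.
 */
static bool eui48_cell_is_usable(const unsigned char *cell, size_t len)
{
	size_t i;

	assert(cell != NULL);

	if (len != EUI48_LEN)
		return false;

	for (i = 0; i < EUI48_LEN; i++)
		if (cell[i] != 0)
			return true;    /* at least one non-zero byte */

	return false;                   /* all zeros: unprovisioned */
}
```

Under this check ff:ff:ff:ff:ff:ff is accepted, which is why callers that
need a valid unicast Ethernet address still have to apply their own test.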
^ permalink raw reply related
* [PATCH v5 7/9] Bluetooth: hci_sync: Add NVMEM-backed BD address retrieval
From: Loic Poulain @ 2026-06-12 13:20 UTC (permalink / raw)
To: Ulf Hansson, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
Bjorn Andersson, Konrad Dybcio, Jens Axboe, Johannes Berg,
Jeff Johnson, Bartosz Golaszewski, Marcel Holtmann,
Luiz Augusto von Dentz, Balakrishna Godavarthi, Rocky Liao,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, Srinivas Kandagatla, Andrew Lunn, Heiner Kallweit,
Russell King, Saravana Kannan
Cc: linux-mmc, devicetree, linux-kernel, linux-arm-msm, linux-block,
linux-wireless, ath10k, linux-bluetooth, netdev, daniel,
Loic Poulain, Bartosz Golaszewski
In-Reply-To: <20260612-block-as-nvmem-v5-0-95e0b30fff90@oss.qualcomm.com>
Some devices store the Bluetooth BD address in non-volatile
memory, which can be accessed through the NVMEM framework.
Similar to Ethernet or WiFi MAC addresses, add support for
reading the BD address from a 'local-bd-address' NVMEM cell.
As with the device-tree provided BD address, add a quirk to
indicate whether a device or platform should attempt to read
the address from NVMEM when no valid in-chip address is present.
Also add a quirk to indicate if the address is stored in
big-endian byte order.
Reviewed-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
Signed-off-by: Loic Poulain <loic.poulain@oss.qualcomm.com>
---
include/net/bluetooth/hci.h | 18 ++++++++++++++++++
net/bluetooth/hci_sync.c | 39 ++++++++++++++++++++++++++++++++++++++-
2 files changed, 56 insertions(+), 1 deletion(-)
diff --git a/include/net/bluetooth/hci.h b/include/net/bluetooth/hci.h
index 572b1c620c5d653a1fe10b26c1b0ba33e8f4968f..7686466d1109253b0d75edeb5f6a99fb98ce4cc6 100644
--- a/include/net/bluetooth/hci.h
+++ b/include/net/bluetooth/hci.h
@@ -164,6 +164,24 @@ enum {
*/
HCI_QUIRK_BDADDR_PROPERTY_BROKEN,
+ /* When this quirk is set, the public Bluetooth address
+ * initially reported by HCI Read BD Address command
+ * is considered invalid. The public BD Address can be
+ * retrieved via a 'local-bd-address' NVMEM cell.
+ *
+ * This quirk can be set before hci_register_dev is called or
+ * during the hdev->setup vendor callback.
+ */
+ HCI_QUIRK_USE_BDADDR_NVMEM,
+
+ /* When this quirk is set, the Bluetooth Device Address provided by
+ * the 'local-bd-address' NVMEM is stored in big-endian order.
+ *
+ * This quirk can be set before hci_register_dev is called or
+ * during the hdev->setup vendor callback.
+ */
+ HCI_QUIRK_BDADDR_NVMEM_BE,
+
/* When this quirk is set, the duplicate filtering during
* scanning is based on Bluetooth devices addresses. To allow
* RSSI based updates, restart scanning if needed.
diff --git a/net/bluetooth/hci_sync.c b/net/bluetooth/hci_sync.c
index fd3aacdea512a37c22b9a2be90c89ddca4b4d99f..589ccdfa26c1281d6eb979370523fff0d7920302 100644
--- a/net/bluetooth/hci_sync.c
+++ b/net/bluetooth/hci_sync.c
@@ -7,6 +7,7 @@
*/
#include <linux/property.h>
+#include <linux/of_net.h>
#include <net/bluetooth/bluetooth.h>
#include <net/bluetooth/hci_core.h>
@@ -3588,6 +3589,37 @@ int hci_powered_update_sync(struct hci_dev *hdev)
return 0;
}
+/**
+ * hci_dev_get_bd_addr_from_nvmem - Get the Bluetooth Device Address
+ * (BD_ADDR) for a HCI device from
+ * an NVMEM cell.
+ * @hdev: The HCI device
+ *
+ * Search for 'local-bd-address' NVMEM cell in the device firmware node.
+ *
+ * All-zero BD addresses are rejected (unprovisioned).
+ */
+static int hci_dev_get_bd_addr_from_nvmem(struct hci_dev *hdev)
+{
+ struct device_node *np = dev_of_node(hdev->dev.parent);
+ u8 ba[sizeof(bdaddr_t)];
+ int err;
+
+ if (!np)
+ return -ENODEV;
+
+ err = of_get_nvmem_eui48(np, "local-bd-address", ba);
+ if (err)
+ return err;
+
+ if (hci_test_quirk(hdev, HCI_QUIRK_BDADDR_NVMEM_BE))
+ baswap(&hdev->public_addr, (bdaddr_t *)ba);
+ else
+ bacpy(&hdev->public_addr, (bdaddr_t *)ba);
+
+ return 0;
+}
+
/**
* hci_dev_get_bd_addr_from_property - Get the Bluetooth Device Address
* (BD_ADDR) for a HCI device from
@@ -5042,12 +5074,17 @@ static int hci_dev_setup_sync(struct hci_dev *hdev)
* its setup callback.
*/
invalid_bdaddr = hci_test_quirk(hdev, HCI_QUIRK_INVALID_BDADDR) ||
- hci_test_quirk(hdev, HCI_QUIRK_USE_BDADDR_PROPERTY);
+ hci_test_quirk(hdev, HCI_QUIRK_USE_BDADDR_PROPERTY) ||
+ hci_test_quirk(hdev, HCI_QUIRK_USE_BDADDR_NVMEM);
if (!ret) {
if (hci_test_quirk(hdev, HCI_QUIRK_USE_BDADDR_PROPERTY) &&
!bacmp(&hdev->public_addr, BDADDR_ANY))
hci_dev_get_bd_addr_from_property(hdev);
+ if (hci_test_quirk(hdev, HCI_QUIRK_USE_BDADDR_NVMEM) &&
+ !bacmp(&hdev->public_addr, BDADDR_ANY))
+ hci_dev_get_bd_addr_from_nvmem(hdev);
+
if (invalid_bdaddr && bacmp(&hdev->public_addr, BDADDR_ANY) &&
hdev->set_bdaddr) {
ret = hdev->set_bdaddr(hdev, &hdev->public_addr);
--
2.34.1
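
Note on HCI_QUIRK_BDADDR_NVMEM_BE: bdaddr_t holds the address
least-significant byte first, so a cell stored most-significant byte first
has to be reversed, which is what baswap() does. A minimal userspace sketch
of that reversal is shown here. BDADDR_LEN and bdaddr_swap() are illustrative
names, not kernel symbols.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BDADDR_LEN 6

/*
 * Reverse a 6-byte address into a separate destination buffer. This is the
 * byte-order conversion applied when the NVMEM cell is big-endian.
 */
static void bdaddr_swap(uint8_t dst[BDADDR_LEN], const uint8_t src[BDADDR_LEN])
{
	size_t i;

	/* This sketch does not support in-place reversal. */
	assert(dst != src);

	for (i = 0; i < BDADDR_LEN; i++)
		dst[i] = src[BDADDR_LEN - 1 - i];
}
```

For a cell containing 00 11 22 33 44 55, the destination buffer ends up as
55 44 33 22 11 00.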
^ permalink raw reply related
* [PATCH v5 8/9] Bluetooth: qca: Set NVMEM BD address quirks when address is invalid
From: Loic Poulain @ 2026-06-12 13:21 UTC (permalink / raw)
To: Ulf Hansson, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
Bjorn Andersson, Konrad Dybcio, Jens Axboe, Johannes Berg,
Jeff Johnson, Bartosz Golaszewski, Marcel Holtmann,
Luiz Augusto von Dentz, Balakrishna Godavarthi, Rocky Liao,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, Srinivas Kandagatla, Andrew Lunn, Heiner Kallweit,
Russell King, Saravana Kannan
Cc: linux-mmc, devicetree, linux-kernel, linux-arm-msm, linux-block,
linux-wireless, ath10k, linux-bluetooth, netdev, daniel,
Loic Poulain, Bartosz Golaszewski
In-Reply-To: <20260612-block-as-nvmem-v5-0-95e0b30fff90@oss.qualcomm.com>
When the controller BD address is invalid (zero or default),
set the NVMEM quirks to allow retrieving the address from a
'local-bd-address' NVMEM cell. The BD address is often stored
alongside the WiFi MAC address in big-endian format, so also
set the big-endian quirk.
Reviewed-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
Signed-off-by: Loic Poulain <loic.poulain@oss.qualcomm.com>
---
drivers/bluetooth/btqca.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/bluetooth/btqca.c b/drivers/bluetooth/btqca.c
index dda76365726f0bfe0e80e05fe04859fa4f0592e1..df33eacfd29fa680f393f90215150743e6001d5b 100644
--- a/drivers/bluetooth/btqca.c
+++ b/drivers/bluetooth/btqca.c
@@ -721,8 +721,11 @@ static int qca_check_bdaddr(struct hci_dev *hdev, const struct qca_fw_config *co
}
bda = (struct hci_rp_read_bd_addr *)skb->data;
- if (!bacmp(&bda->bdaddr, &config->bdaddr))
+ if (!bacmp(&bda->bdaddr, &config->bdaddr)) {
hci_set_quirk(hdev, HCI_QUIRK_USE_BDADDR_PROPERTY);
+ hci_set_quirk(hdev, HCI_QUIRK_USE_BDADDR_NVMEM);
+ hci_set_quirk(hdev, HCI_QUIRK_BDADDR_NVMEM_BE);
+ }
kfree_skb(skb);
--
2.34.1
^ permalink raw reply related
* [PATCH v5 9/9] arm64: dts: qcom: arduino-imola: Describe NVMEM layout for WiFi/BT addresses
From: Loic Poulain @ 2026-06-12 13:21 UTC (permalink / raw)
To: Ulf Hansson, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
Bjorn Andersson, Konrad Dybcio, Jens Axboe, Johannes Berg,
Jeff Johnson, Bartosz Golaszewski, Marcel Holtmann,
Luiz Augusto von Dentz, Balakrishna Godavarthi, Rocky Liao,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, Srinivas Kandagatla, Andrew Lunn, Heiner Kallweit,
Russell King, Saravana Kannan
Cc: linux-mmc, devicetree, linux-kernel, linux-arm-msm, linux-block,
linux-wireless, ath10k, linux-bluetooth, netdev, daniel,
Loic Poulain, Konrad Dybcio, Bartosz Golaszewski
In-Reply-To: <20260612-block-as-nvmem-v5-0-95e0b30fff90@oss.qualcomm.com>
On Arduino Uno-Q, the eMMC boot1 partition is factory provisioned
with device-specific information such as the WiFi MAC address
and the Bluetooth BD address. This partition can serve as an
alternative to additional non-volatile memory, such as a
dedicated EEPROM.
The eMMC boot partitions are typically good candidates, as they
are relatively small, read-only by default (and can be enforced
as hardware read-only), and are not affected by board reflashing
procedures, which generally target the eMMC user or GP partitions.
Describe the corresponding nvmem-layout for the WiFi and Bluetooth
addresses, and point the WiFi and Bluetooth nodes to the appropriate
NVMEM cells to retrieve them.
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Reviewed-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
Signed-off-by: Loic Poulain <loic.poulain@oss.qualcomm.com>
---
arch/arm64/boot/dts/qcom/qrb2210-arduino-imola.dts | 39 ++++++++++++++++++++++
1 file changed, 39 insertions(+)
diff --git a/arch/arm64/boot/dts/qcom/qrb2210-arduino-imola.dts b/arch/arm64/boot/dts/qcom/qrb2210-arduino-imola.dts
index bf088fa9807f040f0c8f405f9111b01790b09377..128c7a7e76b5b089044745f5d6407d6391055fc2 100644
--- a/arch/arm64/boot/dts/qcom/qrb2210-arduino-imola.dts
+++ b/arch/arm64/boot/dts/qcom/qrb2210-arduino-imola.dts
@@ -409,7 +409,40 @@ &sdhc_1 {
no-sdio;
no-sd;
+ #address-cells = <1>;
+ #size-cells = <0>;
+
status = "okay";
+
+ card@0 {
+ compatible = "mmc-card";
+ reg = <0>;
+
+ partitions-boot1 {
+ compatible = "fixed-partitions";
+
+ #address-cells = <1>;
+ #size-cells = <1>;
+
+ nvmem-layout {
+ compatible = "fixed-layout";
+ #address-cells = <1>;
+ #size-cells = <1>;
+
+ wifi_mac_addr: mac-addr@4400 {
+ compatible = "mac-base";
+ reg = <0x4400 0x6>;
+ #nvmem-cell-cells = <1>;
+ };
+
+ bd_addr: bd-addr@5400 {
+ compatible = "mac-base";
+ reg = <0x5400 0x6>;
+ #nvmem-cell-cells = <1>;
+ };
+ };
+ };
+ };
};
&spi5 {
@@ -512,6 +545,9 @@ bluetooth {
vddch0-supply = <&pm4125_l22>;
enable-gpios = <&tlmm 87 GPIO_ACTIVE_HIGH>;
max-speed = <3000000>;
+
+ nvmem-cells = <&bd_addr 0>;
+ nvmem-cell-names = "local-bd-address";
};
};
@@ -557,6 +593,9 @@ &wifi {
qcom,ath10k-calibration-variant = "ArduinoImola";
firmware-name = "qcm2290";
+ nvmem-cells = <&wifi_mac_addr 0>;
+ nvmem-cell-names = "mac-address";
+
status = "okay";
};
--
2.34.1
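
Note on the "mac-base" cells: both consumers pass a cell argument of 0
(<&bd_addr 0> and <&wifi_mac_addr 0>). Assuming the argument is an offset
added to the stored base address, as the fixed-cell binding describes, an
offset of 0 yields the stored bytes unchanged. A minimal userspace sketch of
such an addition follows. MAC_LEN and mac_base_add() are hypothetical names,
and the 48-bit wrap-around is a choice made for this sketch only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAC_LEN 6

/*
 * Add `offset` to a 6-byte address treated as a 48-bit big-endian integer.
 * Assumption: this mirrors how a "mac-base" cell argument is applied.
 */
static void mac_base_add(uint8_t mac[MAC_LEN], uint32_t offset)
{
	uint64_t value = 0;
	int i;

	assert(mac != NULL);

	/* Pack the bytes, most significant first. */
	for (i = 0; i < MAC_LEN; i++)
		value = (value << 8) | mac[i];

	/* Apply the offset and wrap at 48 bits. */
	value = (value + offset) & 0xFFFFFFFFFFFFULL;

	/* Unpack back into the buffer, least significant byte last. */
	for (i = MAC_LEN - 1; i >= 0; i--) {
		mac[i] = (uint8_t)(value & 0xFF);
		value >>= 8;
	}
}
```

With offset 1, an address ending in 44:ff becomes one ending in 45:00, since
the carry propagates into the next byte.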
^ permalink raw reply related
* Re: [PATCH] iomap: enforce DIO alignment check in iomap
From: Carlos Maiolino @ 2026-06-12 13:23 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Keith Busch, brauner, linux-block, linux-fsdevel, linux-ext4,
linux-xfs, Hannes Reinecke, Martin K. Petersen, Jens Axboe
In-Reply-To: <20260612052831.GA9010@lst.de>
On Fri, Jun 12, 2026 at 07:28:31AM +0200, Christoph Hellwig wrote:
> On Thu, Jun 11, 2026 at 05:47:07PM +0200, Carlos Maiolino wrote:
> > On Thu, Jun 11, 2026 at 03:38:33PM +0200, Christoph Hellwig wrote:
> > > On Thu, Jun 11, 2026 at 06:57:47AM -0600, Keith Busch wrote:
> > > > It's entirely possible a device supports byte aligned addresses. The
> > > > block layer just doesn't let a driver report that. So either it really
> > > > was successful because you found a bug that skips the alignment checks,
> > > > or your device silently corrupted your payload.
> >
> > I tried this on different hardware, I find it hard to say all those
> > devices were corrupting the payload.
>
> I think in the other thread we agreed that we are currently missing
> the alignment check for fast-path bios not hitting the splitting code,
> so maybe that is something you see. Additionally we're missing the
> checks for purely bio based drivers not calling the splitting helper
> at all, but I don't think that applies here.
>
> > > > Anyway, my earlier suggestion should work. Ming thinks it may go too far,
> > > > though, in not taking the optimization when it was possible. So here's
> > > > an alternative suggestion that should get things working as expected:
> > >
> > > The fix below looks like it is addressing a real bug. I'm not sure if
> > > Carlos is hitting it, but we were missing the alignment checks for
> > > single-bvec fast path bios so far indeed.
> >
> > You left context out so I'm assuming by the fix you meant Keith's patch.
>
> Yes.
The fix does indeed seem to resolve the behavior I'm seeing. Keith, could you
Cc me if you end up sending an official version?
^ permalink raw reply