* [syzbot] [block?] BUG: sleeping function called from invalid context in __xas_nomem (2)
@ 2025-06-27 16:20 syzbot
2025-06-28 8:36 ` [PATCH] brd: fix sleeping memory allocation in brd_insert_page() Tetsuo Handa
0 siblings, 1 reply; 5+ messages in thread
From: syzbot @ 2025-06-27 16:20 UTC (permalink / raw)
To: axboe, linux-block, linux-kernel, syzkaller-bugs
Hello,
syzbot found the following issue on:
HEAD commit: 86731a2a651e Linux 6.16-rc3
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=1630bb0c580000
kernel config: https://syzkaller.appspot.com/x/.config?x=4ad206eb0100c6a2
dashboard link: https://syzkaller.appspot.com/bug?extid=ea4c8fd177a47338881a
compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
userspace arch: i386
Unfortunately, I don't have any reproducer for this issue yet.
Downloadable assets:
disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/d900f083ada3/non_bootable_disk-86731a2a.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/9e7ff33d1e1f/vmlinux-86731a2a.xz
kernel image: https://storage.googleapis.com/syzbot-assets/1bb9a09c88bb/bzImage-86731a2a.xz
IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+ea4c8fd177a47338881a@syzkaller.appspotmail.com
BUG: sleeping function called from invalid context at ./include/linux/sched/mm.h:321
in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 6843, name: syz.1.211
preempt_count: 0, expected: 0
RCU nest depth: 1, expected: 0
1 lock held by syz.1.211/6843:
#0: ffffffff8e5c47c0 (rcu_read_lock){....}-{1:3}, at: rcu_lock_release include/linux/rcupdate.h:341 [inline]
#0: ffffffff8e5c47c0 (rcu_read_lock){....}-{1:3}, at: rcu_read_unlock include/linux/rcupdate.h:871 [inline]
#0: ffffffff8e5c47c0 (rcu_read_lock){....}-{1:3}, at: brd_insert_page drivers/block/brd.c:65 [inline]
#0: ffffffff8e5c47c0 (rcu_read_lock){....}-{1:3}, at: brd_rw_bvec drivers/block/brd.c:121 [inline]
#0: ffffffff8e5c47c0 (rcu_read_lock){....}-{1:3}, at: brd_submit_bio+0x935/0x10a0 drivers/block/brd.c:191
CPU: 1 UID: 0 PID: 6843 Comm: syz.1.211 Not tainted 6.16.0-rc3-syzkaller #0 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:94 [inline]
dump_stack_lvl+0x16c/0x1f0 lib/dump_stack.c:120
__might_resched+0x3c0/0x5e0 kernel/sched/core.c:8800
might_alloc include/linux/sched/mm.h:321 [inline]
might_alloc include/linux/sched/mm.h:316 [inline]
slab_pre_alloc_hook mm/slub.c:4099 [inline]
slab_alloc_node mm/slub.c:4177 [inline]
kmem_cache_alloc_lru_noprof+0x2d2/0x3b0 mm/slub.c:4216
__xas_nomem+0x266/0x670 lib/xarray.c:341
__xa_cmpxchg_raw lib/xarray.c:1786 [inline]
__xa_cmpxchg+0x119/0x290 lib/xarray.c:1766
brd_insert_page drivers/block/brd.c:72 [inline]
brd_rw_bvec drivers/block/brd.c:121 [inline]
brd_submit_bio+0x9ce/0x10a0 drivers/block/brd.c:191
__submit_bio+0x301/0x690 block/blk-core.c:644
__submit_bio_noacct block/blk-core.c:690 [inline]
submit_bio_noacct_nocheck+0x852/0xd30 block/blk-core.c:753
submit_bio_noacct+0x50d/0x1eb0 block/blk-core.c:874
__blkdev_direct_IO block/fops.c:257 [inline]
blkdev_direct_IO+0x1647/0x1ff0 block/fops.c:433
blkdev_direct_write block/fops.c:701 [inline]
blkdev_write_iter+0x6fd/0xdf0 block/fops.c:768
do_iter_readv_writev+0x654/0x950 fs/read_write.c:827
vfs_writev+0x35f/0xde0 fs/read_write.c:1057
do_writev+0x132/0x340 fs/read_write.c:1103
do_syscall_32_irqs_on arch/x86/entry/syscall_32.c:83 [inline]
__do_fast_syscall_32+0x7c/0x3a0 arch/x86/entry/syscall_32.c:306
do_fast_syscall_32+0x32/0x80 arch/x86/entry/syscall_32.c:331
entry_SYSENTER_compat_after_hwframe+0x84/0x8e
RIP: 0023:0xf7f11579
Code: b8 01 10 06 03 74 b4 01 10 07 03 74 b0 01 10 08 03 74 d8 01 00 00 00 00 00 00 00 00 00 00 00 00 00 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 8d b4 26 00 00 00 00 8d b4 26 00 00 00 00
RSP: 002b:00000000f503655c EFLAGS: 00000296 ORIG_RAX: 0000000000000092
RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 0000000080000a40
RDX: 0000000000000021 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
</TASK>
----------------
Code disassembly (best guess), 2 bytes skipped:
0: 10 06 adc %al,(%rsi)
2: 03 74 b4 01 add 0x1(%rsp,%rsi,4),%esi
6: 10 07 adc %al,(%rdi)
8: 03 74 b0 01 add 0x1(%rax,%rsi,4),%esi
c: 10 08 adc %cl,(%rax)
e: 03 74 d8 01 add 0x1(%rax,%rbx,8),%esi
1e: 00 51 52 add %dl,0x52(%rcx)
21: 55 push %rbp
22: 89 e5 mov %esp,%ebp
24: 0f 34 sysenter
26: cd 80 int $0x80
* 28: 5d pop %rbp <-- trapping instruction
29: 5a pop %rdx
2a: 59 pop %rcx
2b: c3 ret
2c: 90 nop
2d: 90 nop
2e: 90 nop
2f: 90 nop
30: 8d b4 26 00 00 00 00 lea 0x0(%rsi,%riz,1),%esi
37: 8d b4 26 00 00 00 00 lea 0x0(%rsi,%riz,1),%esi
---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.
syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title
If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)
If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report
If you want to undo deduplication, reply with:
#syz undup
* [PATCH] brd: fix sleeping memory allocation in brd_insert_page()
2025-06-27 16:20 [syzbot] [block?] BUG: sleeping function called from invalid context in __xas_nomem (2) syzbot
@ 2025-06-28 8:36 ` Tetsuo Handa
2025-06-28 9:39 ` Tetsuo Handa
2025-06-30 5:36 ` Christoph Hellwig
0 siblings, 2 replies; 5+ messages in thread
From: Tetsuo Handa @ 2025-06-28 8:36 UTC (permalink / raw)
To: Jens Axboe, Yu Kuai, Christoph Hellwig, LKML
syzbot is reporting that brd_insert_page() is calling
__xa_cmpxchg(__GFP_DIRECT_RECLAIM) with spinlock and RCU lock held.
Change __xa_cmpxchg() to use GFP_NOWAIT | __GFP_NOWARN, for it is likely
that __xa_cmpxchg() succeeds because of preceding alloc_page().
Fixes: bbcacab2e8ee ("brd: avoid extra xarray lookups on first write")
Reported-by: syzbot+ea4c8fd177a47338881a@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=ea4c8fd177a47338881a
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
---
drivers/block/brd.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/block/brd.c b/drivers/block/brd.c
index b1be6c510372..ed3eb931750c 100644
--- a/drivers/block/brd.c
+++ b/drivers/block/brd.c
@@ -70,7 +70,7 @@ static struct page *brd_insert_page(struct brd_device *brd, sector_t sector,
xa_lock(&brd->brd_pages);
ret = __xa_cmpxchg(&brd->brd_pages, sector >> PAGE_SECTORS_SHIFT, NULL,
- page, gfp);
+ page, GFP_NOWAIT | __GFP_NOWARN);
if (ret) {
xa_unlock(&brd->brd_pages);
__free_page(page);
--
2.50.0
* Re: [PATCH] brd: fix sleeping memory allocation in brd_insert_page()
2025-06-28 8:36 ` [PATCH] brd: fix sleeping memory allocation in brd_insert_page() Tetsuo Handa
@ 2025-06-28 9:39 ` Tetsuo Handa
2025-06-28 11:03 ` Tetsuo Handa
2025-06-30 5:36 ` Christoph Hellwig
1 sibling, 1 reply; 5+ messages in thread
From: Tetsuo Handa @ 2025-06-28 9:39 UTC (permalink / raw)
To: Jens Axboe, Yu Kuai, Christoph Hellwig, LKML
On 2025/06/28 17:36, Tetsuo Handa wrote:
> syzbot is reporting that brd_insert_page() is calling
> __xa_cmpxchg(__GFP_DIRECT_RECLAIM) with spinlock and RCU lock held.
Hmm. Holding spinlock itself is OK because xa_lock() is a requirement.
> Change __xa_cmpxchg() to use GFP_NOWAIT | __GFP_NOWARN, for it is likely
> that __xa_cmpxchg() succeeds because of preceding alloc_page().
Since this gfp flag is for allocating index array, it should use
__GFP_DIRECT_RECLAIM if possible. Then, deferring RCU lock if possible
makes sense. Then, I wonder what this RCU lock is protecting...
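For readers following the report, the check that fired can be modelled in user space roughly as below. This is only a sketch: the MODEL_* flags, struct model_ctx and the model_*() helpers are hypothetical stand-ins, not the kernel's definitions. It shows why "RCU nest depth: 1, expected: 0" triggers the splat for an allocation that may enter direct reclaim, and why a non-blocking allocation does not.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for gfp bits; values are NOT the kernel's. */
#define MODEL_GFP_DIRECT_RECLAIM 0x1u  /* allocation may sleep to reclaim memory */
#define MODEL_GFP_NOWARN         0x2u  /* suppress allocation-failure warnings */
#define MODEL_GFP_KERNEL         (MODEL_GFP_DIRECT_RECLAIM)
#define MODEL_GFP_NOWAIT         0x0u  /* never sleeps in this model */

/* The two counters printed in the report. */
struct model_ctx {
	int preempt_count;   /* non-zero while e.g. a spinlock is held */
	int rcu_nest_depth;  /* non-zero inside rcu_read_lock() */
};

static void model_rcu_read_lock(struct model_ctx *c)   { c->rcu_nest_depth++; }
static void model_rcu_read_unlock(struct model_ctx *c) { c->rcu_nest_depth--; }

/*
 * Returns true if a "sleeping function called from invalid context"
 * report would be produced for an allocation using these gfp flags.
 */
static bool model_might_alloc_splats(const struct model_ctx *c, unsigned int gfp)
{
	if (!(gfp & MODEL_GFP_DIRECT_RECLAIM))
		return false;  /* cannot sleep, so no check is needed */
	return c->preempt_count != 0 || c->rcu_nest_depth != 0;
}
```

In this model a blocking allocation made between model_rcu_read_lock() and model_rcu_read_unlock() is flagged, while MODEL_GFP_NOWAIT | MODEL_GFP_NOWARN is not, which is the trade-off discussed above.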
* Re: [PATCH] brd: fix sleeping memory allocation in brd_insert_page()
2025-06-28 9:39 ` Tetsuo Handa
@ 2025-06-28 11:03 ` Tetsuo Handa
0 siblings, 0 replies; 5+ messages in thread
From: Tetsuo Handa @ 2025-06-28 11:03 UTC (permalink / raw)
To: Jens Axboe, Yu Kuai, Christoph Hellwig, LKML
On 2025/06/28 18:39, Tetsuo Handa wrote:
> On 2025/06/28 17:36, Tetsuo Handa wrote:
>> syzbot is reporting that brd_insert_page() is calling
>> __xa_cmpxchg(__GFP_DIRECT_RECLAIM) with spinlock and RCU lock held.
>
> Hmm. Holding spinlock itself is OK because xa_lock() is a requirement.
>
>> Change __xa_cmpxchg() to use GFP_NOWAIT | __GFP_NOWARN, for it is likely
>> that __xa_cmpxchg() succeeds because of preceding alloc_page().
>
> Since this gfp flag is for allocating index array, it should use
> __GFP_DIRECT_RECLAIM if possible. Then, deferring RCU lock if possible
> makes sense. Then, I wonder what this RCU lock is protecting...
>
OK. I assume that the "concurrent discard" in
https://lkml.kernel.org/20250628011459.832760-1-yukuai1@huaweicloud.com means
brd_do_discard().
Calling rcu_read_lock() in brd_insert_page() before xa_unlock() is called keeps
brd_free_one_page(), which brd_do_discard() queues via call_rcu(), from calling
__free_page() on the page. So even if the page allocated by alloc_page() and stored
into brd->brd_pages by __xa_cmpxchg() is removed by __xa_erase() before brd_rw_bvec()
calls memcpy_{to,from}_page()/memset(), brd_rw_bvec() can continue using the page
returned by brd_insert_page().
One possibility worries me about the above expectation, for I don't know the
details of xarray.
__xa_cmpxchg() calls __xa_cmpxchg_raw() with xa_lock already held.
__xa_cmpxchg_raw() always calls __xas_nomem() with xa_lock already held.
__xas_nomem() might temporarily release xa_lock for allocating memory if
__GFP_DIRECT_RECLAIM is specified.
__xa_cmpxchg_raw() might store "entry" at xas_store() before calling __xas_nomem().
Then, is there a possibility that __xas_nomem() temporarily releases xa_lock for
allocating memory after __xa_cmpxchg_raw() has already called xas_store()?
Unless there is a guarantee that __xas_nomem() never releases xa_lock once
__xa_cmpxchg_raw() has called xas_store(), there will be a race window in which
the page allocated by alloc_page() and stored into brd->brd_pages by __xa_cmpxchg() is
removed by __xa_erase() from brd_do_discard(), and __free_page() from brd_free_one_page()
from call_rcu() from brd_do_discard() fires before brd_insert_page() calls
rcu_read_lock() immediately after returning from __xa_cmpxchg().
Also, what serializes concurrent brd_insert_page() calls? While __xas_nomem() has
temporarily released xa_lock for allocating memory, two threads might concurrently
call kmem_cache_alloc_lru() from __xas_nomem().
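The retry loop being asked about can be sketched as a toy user-space model. It rests on an assumption that should be checked against lib/xarray.c: that __xas_nomem() allocates (and possibly drops xa_lock) only when the preceding xas_store() failed with -ENOMEM, meaning the entry has not been stored yet. All model_* names are hypothetical, a single slot stands in for the whole tree, and the model forces the first in-store node allocation to fail so that the fallback path runs.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy xarray: one slot, one interior node, one lock flag. */
struct model_xa {
	void *slot;              /* the only entry in the "tree" */
	bool  have_node;         /* interior node for the slot exists */
	bool  locked;            /* models xa_lock being held */
	int   lock_drops;        /* how often the lock was released */
	void *slot_at_last_drop; /* slot contents when the lock was last dropped */
};

/* Toy operation state, loosely modelled on struct xa_state. */
struct model_xas {
	bool enomem;   /* the last store failed for lack of a node */
	bool prealloc; /* a node was preallocated by model_nomem() */
};

/* Store: fails with "ENOMEM" if a node is needed and none is preallocated. */
static void model_store(struct model_xa *xa, struct model_xas *xas, void *entry)
{
	if (!xa->have_node) {
		if (!xas->prealloc) {
			xas->enomem = true; /* nothing was stored */
			return;
		}
		xa->have_node = true;
		xas->prealloc = false;
	}
	xa->slot = entry;
}

/* Returns true when the caller should retry the store. */
static bool model_nomem(struct model_xa *xa, struct model_xas *xas, bool may_block)
{
	if (!xas->enomem)
		return false;   /* store succeeded: no allocation, lock stays held */
	if (may_block) {
		xa->locked = false; /* window in which other lock users can run */
		xa->lock_drops++;
		xa->slot_at_last_drop = xa->slot;
		xa->locked = true;
	}
	xas->prealloc = true;   /* assume the allocation itself succeeds */
	xas->enomem = false;
	return true;
}

/* Compare-and-exchange with retry, called with the lock held. */
static void *model_cmpxchg(struct model_xa *xa, void *old, void *entry, bool may_block)
{
	struct model_xas xas = { false, false };
	void *curr;

	do {
		curr = xa->slot; /* reloaded on every retry */
		if (curr == old)
			model_store(xa, &xas, entry);
	} while (model_nomem(xa, &xas, may_block));
	return curr;
}
```

In this model the lock is only ever dropped while the slot still holds the old value, and the slot is reloaded after the lock is retaken, so a concurrent inserter would simply make the retry return the other entry. Whether the real __xa_cmpxchg_raw() gives the same guarantee is exactly the question raised above.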
* Re: [PATCH] brd: fix sleeping memory allocation in brd_insert_page()
2025-06-28 8:36 ` [PATCH] brd: fix sleeping memory allocation in brd_insert_page() Tetsuo Handa
2025-06-28 9:39 ` Tetsuo Handa
@ 2025-06-30 5:36 ` Christoph Hellwig
1 sibling, 0 replies; 5+ messages in thread
From: Christoph Hellwig @ 2025-06-30 5:36 UTC (permalink / raw)
To: Tetsuo Handa; +Cc: Jens Axboe, Yu Kuai, Christoph Hellwig, LKML
I think the correct fix is "brd: fix sleeping function called from invalid
context in brd_insert_page()" from Yu Kuai. Please take a look at that
and double check it, though.