* [syzbot] [kernfs?] possible deadlock in kernfs_fop_llseek
@ 2024-04-03 18:23 syzbot
2024-04-03 23:51 ` syzbot
0 siblings, 1 reply; 29+ messages in thread
From: syzbot @ 2024-04-03 18:23 UTC (permalink / raw)
To: gregkh, linux-fsdevel, linux-kernel, syzkaller-bugs, tj
Hello,
syzbot found the following issue on:
HEAD commit: fe46a7dd189e Merge tag 'sound-6.9-rc1' of git://git.kernel..
git tree: upstream
console+strace: https://syzkaller.appspot.com/x/log.txt?x=136eb2f6180000
kernel config: https://syzkaller.appspot.com/x/.config?x=4d90a36f0cab495a
dashboard link: https://syzkaller.appspot.com/bug?extid=9a5b0ced8b1bfb238b56
compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=17f1d93d180000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=15c38139180000
Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/72ab73815344/disk-fe46a7dd.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/2d6d6b0d7071/vmlinux-fe46a7dd.xz
kernel image: https://storage.googleapis.com/syzbot-assets/48e275e5478b/bzImage-fe46a7dd.xz
IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+9a5b0ced8b1bfb238b56@syzkaller.appspotmail.com
======================================================
WARNING: possible circular locking dependency detected
6.8.0-syzkaller-08951-gfe46a7dd189e #0 Not tainted
------------------------------------------------------
syz-executor250/5062 is trying to acquire lock:
ffff888022c36888 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_llseek+0x7e/0x2a0 fs/kernfs/file.c:867
but task is already holding lock:
ffff88807edeadd8 (&ovl_i_lock_key[depth]){+.+.}-{3:3}, at: ovl_inode_lock fs/overlayfs/overlayfs.h:649 [inline]
ffff88807edeadd8 (&ovl_i_lock_key[depth]){+.+.}-{3:3}, at: ovl_llseek+0x26b/0x470 fs/overlayfs/file.c:214
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #2 (&ovl_i_lock_key[depth]){+.+.}-{3:3}:
lock_acquire+0x1e4/0x530 kernel/locking/lockdep.c:5754
__mutex_lock_common kernel/locking/mutex.c:608 [inline]
__mutex_lock+0x136/0xd70 kernel/locking/mutex.c:752
ovl_inode_lock_interruptible fs/overlayfs/overlayfs.h:654 [inline]
ovl_nlink_start+0xdc/0x390 fs/overlayfs/util.c:1162
ovl_do_remove+0x1fa/0xd90 fs/overlayfs/dir.c:893
vfs_rmdir+0x367/0x4c0 fs/namei.c:4209
do_rmdir+0x3b5/0x580 fs/namei.c:4268
__do_sys_rmdir fs/namei.c:4287 [inline]
__se_sys_rmdir fs/namei.c:4285 [inline]
__x64_sys_rmdir+0x49/0x60 fs/namei.c:4285
do_syscall_64+0xfb/0x240
entry_SYSCALL_64_after_hwframe+0x6d/0x75
-> #1 (&ovl_i_mutex_dir_key[depth]){++++}-{3:3}:
lock_acquire+0x1e4/0x530 kernel/locking/lockdep.c:5754
down_read+0xb1/0xa40 kernel/locking/rwsem.c:1526
inode_lock_shared include/linux/fs.h:803 [inline]
lookup_slow+0x45/0x70 fs/namei.c:1708
walk_component+0x2e1/0x410 fs/namei.c:2004
lookup_last fs/namei.c:2461 [inline]
path_lookupat+0x16f/0x450 fs/namei.c:2485
filename_lookup+0x256/0x610 fs/namei.c:2514
kern_path+0x35/0x50 fs/namei.c:2622
lookup_bdev+0xc5/0x290 block/bdev.c:1072
resume_store+0x1a0/0x710 kernel/power/hibernate.c:1235
kernfs_fop_write_iter+0x3a4/0x500 fs/kernfs/file.c:334
call_write_iter include/linux/fs.h:2108 [inline]
new_sync_write fs/read_write.c:497 [inline]
vfs_write+0xa84/0xcb0 fs/read_write.c:590
ksys_write+0x1a0/0x2c0 fs/read_write.c:643
do_syscall_64+0xfb/0x240
entry_SYSCALL_64_after_hwframe+0x6d/0x75
-> #0 (&of->mutex){+.+.}-{3:3}:
check_prev_add kernel/locking/lockdep.c:3134 [inline]
check_prevs_add kernel/locking/lockdep.c:3253 [inline]
validate_chain+0x18cb/0x58e0 kernel/locking/lockdep.c:3869
__lock_acquire+0x1346/0x1fd0 kernel/locking/lockdep.c:5137
lock_acquire+0x1e4/0x530 kernel/locking/lockdep.c:5754
__mutex_lock_common kernel/locking/mutex.c:608 [inline]
__mutex_lock+0x136/0xd70 kernel/locking/mutex.c:752
kernfs_fop_llseek+0x7e/0x2a0 fs/kernfs/file.c:867
ovl_llseek+0x314/0x470 fs/overlayfs/file.c:218
vfs_llseek fs/read_write.c:289 [inline]
ksys_lseek fs/read_write.c:302 [inline]
__do_sys_lseek fs/read_write.c:313 [inline]
__se_sys_lseek fs/read_write.c:311 [inline]
__x64_sys_lseek+0x153/0x1e0 fs/read_write.c:311
do_syscall_64+0xfb/0x240
entry_SYSCALL_64_after_hwframe+0x6d/0x75
other info that might help us debug this:
Chain exists of:
&of->mutex --> &ovl_i_mutex_dir_key[depth] --> &ovl_i_lock_key[depth]
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&ovl_i_lock_key[depth]);
lock(&ovl_i_mutex_dir_key[depth]);
lock(&ovl_i_lock_key[depth]);
lock(&of->mutex);
*** DEADLOCK ***
1 lock held by syz-executor250/5062:
#0: ffff88807edeadd8 (&ovl_i_lock_key[depth]){+.+.}-{3:3}, at: ovl_inode_lock fs/overlayfs/overlayfs.h:649 [inline]
#0: ffff88807edeadd8 (&ovl_i_lock_key[depth]){+.+.}-{3:3}, at: ovl_llseek+0x26b/0x470 fs/overlayfs/file.c:214
stack backtrace:
CPU: 1 PID: 5062 Comm: syz-executor250 Not tainted 6.8.0-syzkaller-08951-gfe46a7dd189e #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114
check_noncircular+0x36a/0x4a0 kernel/locking/lockdep.c:2187
check_prev_add kernel/locking/lockdep.c:3134 [inline]
check_prevs_add kernel/locking/lockdep.c:3253 [inline]
validate_chain+0x18cb/0x58e0 kernel/locking/lockdep.c:3869
__lock_acquire+0x1346/0x1fd0 kernel/locking/lockdep.c:5137
lock_acquire+0x1e4/0x530 kernel/locking/lockdep.c:5754
__mutex_lock_common kernel/locking/mutex.c:608 [inline]
__mutex_lock+0x136/0xd70 kernel/locking/mutex.c:752
kernfs_fop_llseek+0x7e/0x2a0 fs/kernfs/file.c:867
ovl_llseek+0x314/0x470 fs/overlayfs/file.c:218
vfs_llseek fs/read_write.c:289 [inline]
ksys_lseek fs/read_write.c:302 [inline]
__do_sys_lseek fs/read_write.c:313 [inline]
__se_sys_lseek fs/read_write.c:311 [inline]
__x64_sys_lseek+0x153/0x1e0 fs/read_write.c:311
do_syscall_64+0xfb/0x240
entry_SYSCALL_64_after_hwframe+0x6d/0x75
RIP: 0033:0x7f0e2bdfd219
Code: 48 83 c4 28 c3 e8 67 17 00 00 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffd2f80f3f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000008
RAX: ffffffffffffffda RBX: 00007ffd2f80f400 RCX: 00007f0e2bdfd219
RDX: 0000000000000003 RSI: 0000000000000000 RDI: 0000000000000005
RBP: 00007ffd2f80f408 R08: 00007f0e2bdca000 R09: 00007f0e2bdca000
R10: 00007f0e2bdca000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffd2f80f668 R14: 0000000000000001 R15: 0000000000000001
---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.
syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title
If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.
If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)
If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report
If you want to undo deduplication, reply with:
#syz undup
^ permalink raw reply [flat|nested] 29+ messages in thread* Re: [syzbot] [kernfs?] possible deadlock in kernfs_fop_llseek 2024-04-03 18:23 [syzbot] [kernfs?] possible deadlock in kernfs_fop_llseek syzbot @ 2024-04-03 23:51 ` syzbot 2024-04-04 6:54 ` Amir Goldstein 0 siblings, 1 reply; 29+ messages in thread From: syzbot @ 2024-04-03 23:51 UTC (permalink / raw) To: gregkh, linux-fsdevel, linux-kernel, syzkaller-bugs, tj, valesini syzbot has bisected this issue to: commit 0fedefd4c4e33dd24f726b13b5d7c143e2b483be Author: Valentine Sinitsyn <valesini@yandex-team.ru> Date: Mon Sep 25 08:40:12 2023 +0000 kernfs: sysfs: support custom llseek method for sysfs entries bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=17cb5e03180000 start commit: fe46a7dd189e Merge tag 'sound-6.9-rc1' of git://git.kernel.. git tree: upstream final oops: https://syzkaller.appspot.com/x/report.txt?x=142b5e03180000 console output: https://syzkaller.appspot.com/x/log.txt?x=102b5e03180000 kernel config: https://syzkaller.appspot.com/x/.config?x=4d90a36f0cab495a dashboard link: https://syzkaller.appspot.com/bug?extid=9a5b0ced8b1bfb238b56 syz repro: https://syzkaller.appspot.com/x/repro.syz?x=17f1d93d180000 C reproducer: https://syzkaller.appspot.com/x/repro.c?x=15c38139180000 Reported-by: syzbot+9a5b0ced8b1bfb238b56@syzkaller.appspotmail.com Fixes: 0fedefd4c4e3 ("kernfs: sysfs: support custom llseek method for sysfs entries") For information about bisection process see: https://goo.gl/tpsmEJ#bisection ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [syzbot] [kernfs?] possible deadlock in kernfs_fop_llseek 2024-04-03 23:51 ` syzbot @ 2024-04-04 6:54 ` Amir Goldstein 2024-04-04 8:11 ` Al Viro 2024-04-05 15:08 ` Amir Goldstein 0 siblings, 2 replies; 29+ messages in thread From: Amir Goldstein @ 2024-04-04 6:54 UTC (permalink / raw) To: syzbot Cc: gregkh, linux-fsdevel, linux-kernel, syzkaller-bugs, tj, valesini, Christoph Hellwig, Christian Brauner, Jan Kara, Miklos Szeredi, Al Viro On Thu, Apr 4, 2024 at 2:51 AM syzbot <syzbot+9a5b0ced8b1bfb238b56@syzkaller.appspotmail.com> wrote: > > syzbot has bisected this issue to: > > commit 0fedefd4c4e33dd24f726b13b5d7c143e2b483be > Author: Valentine Sinitsyn <valesini@yandex-team.ru> > Date: Mon Sep 25 08:40:12 2023 +0000 > > kernfs: sysfs: support custom llseek method for sysfs entries > > bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=17cb5e03180000 > start commit: fe46a7dd189e Merge tag 'sound-6.9-rc1' of git://git.kernel.. > git tree: upstream > final oops: https://syzkaller.appspot.com/x/report.txt?x=142b5e03180000 > console output: https://syzkaller.appspot.com/x/log.txt?x=102b5e03180000 > kernel config: https://syzkaller.appspot.com/x/.config?x=4d90a36f0cab495a > dashboard link: https://syzkaller.appspot.com/bug?extid=9a5b0ced8b1bfb238b56 > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=17f1d93d180000 > C reproducer: https://syzkaller.appspot.com/x/repro.c?x=15c38139180000 > > Reported-by: syzbot+9a5b0ced8b1bfb238b56@syzkaller.appspotmail.com > Fixes: 0fedefd4c4e3 ("kernfs: sysfs: support custom llseek method for sysfs entries") > > For information about bisection process see: https://goo.gl/tpsmEJ#bisection > I think this commit is only the trigger for lockdep warning in this specific scenario, but the conceptual issue existed before that for example, with read from sysfs, which also can take of->mutex. I think (not sure) that the potential deadlock is real, not a false positive. OTOH, hibernation code may be crawling with potential and more likely deadlocks... The conceptual issue (I think) is this: Overlayfs is a stacked filesystem which regularly calls vfs helpers such as path lookup on other filesystems. This specialized behavior is accompanied with a declaration of s_stack_depth > 0, annotating ovl inode locks per stack depth (ovl_lockdep_annotate_inode_mutex_key) and restricting the types of filesystems that are allowed for writable upper layer. In the lockdep dependency chain, overlayfs inode lock is taken before kernfs internal of->mutex, where kernfs (sysfs) is the lower layer of overlayfs, which is sane. With /sys/power/resume (and probably other files), sysfs also behaves as a stacking filesystem, calling vfs helpers, such as lookup_bdev() -> kern_path(), which is a behavior of a stacked filesystem, without all the precautions that comes with behaving as a stacked filesystem. If an overlayfs path is written into /sys/power/resume and that overlayfs has sysfs as its lower layer, then there may be a small chance of hitting the ABBA deadlock, but there could very well be some conditions that prevent it. It's a shame that converting blockdev path to major:minor is not always done as a separate step by usersapce, but not much to do about it now... I don't think that Christoph's refactoring to blockdev interfaces have changed anything in this regard, but I wasn't sure so CCed the developers involved. A relatively easy way to avert this use case is to define a lookup flag LOOKUP_NO_STACKED, which does not traverse into any stacked fs (s_stack_depth > 0) and use this flag for looking up the hibernation blockdev. I suppose writing the hibernation blockdev from inside containers is not a common thing to do, so this mitigation could be good enough. Thoughts? Amir. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [syzbot] [kernfs?] possible deadlock in kernfs_fop_llseek 2024-04-04 6:54 ` Amir Goldstein @ 2024-04-04 8:11 ` Al Viro 2024-04-04 8:21 ` Al Viro 2024-04-05 15:08 ` Amir Goldstein 1 sibling, 1 reply; 29+ messages in thread From: Al Viro @ 2024-04-04 8:11 UTC (permalink / raw) To: Amir Goldstein Cc: syzbot, gregkh, linux-fsdevel, linux-kernel, syzkaller-bugs, tj, valesini, Christoph Hellwig, Christian Brauner, Jan Kara, Miklos Szeredi On Thu, Apr 04, 2024 at 09:54:35AM +0300, Amir Goldstein wrote: > > In the lockdep dependency chain, overlayfs inode lock is taken > before kernfs internal of->mutex, where kernfs (sysfs) is the lower > layer of overlayfs, which is sane. > > With /sys/power/resume (and probably other files), sysfs also > behaves as a stacking filesystem, calling vfs helpers, such as > lookup_bdev() -> kern_path(), which is a behavior of a stacked > filesystem, without all the precautions that comes with behaving > as a stacked filesystem. No. This is far worse than anything stacked filesystems do - it's an arbitrary pathname resolution while holding a lock. It's not local. Just about anything (including automounts, etc.) can be happening there and it pushes the lock in question outside of *ALL* pathwalk-related locks. Pathname doesn't have to resolve to anything on overlayfs - it can just go through a symlink on it, or walk into it and traverse a bunch of .. afterwards, etc. Don't confuse that with stacking - it's not even close. You can't use that anywhere near overlayfs layers. Maybe isolate it into a separate filesystem, to be automounted on /sys/power. And make anyone playing with overlayfs with sysfs as a layer mount the damn thing on top of power/ in your overlayfs. But using that thing as a part of layer is a non-starter. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [syzbot] [kernfs?] possible deadlock in kernfs_fop_llseek 2024-04-04 8:11 ` Al Viro @ 2024-04-04 8:21 ` Al Viro 2024-04-04 8:40 ` Al Viro 2024-04-04 9:33 ` Amir Goldstein 0 siblings, 2 replies; 29+ messages in thread From: Al Viro @ 2024-04-04 8:21 UTC (permalink / raw) To: Amir Goldstein Cc: syzbot, gregkh, linux-fsdevel, linux-kernel, syzkaller-bugs, tj, valesini, Christoph Hellwig, Christian Brauner, Jan Kara, Miklos Szeredi On Thu, Apr 04, 2024 at 09:11:22AM +0100, Al Viro wrote: > On Thu, Apr 04, 2024 at 09:54:35AM +0300, Amir Goldstein wrote: > > > > In the lockdep dependency chain, overlayfs inode lock is taken > > before kernfs internal of->mutex, where kernfs (sysfs) is the lower > > layer of overlayfs, which is sane. > > > > With /sys/power/resume (and probably other files), sysfs also > > behaves as a stacking filesystem, calling vfs helpers, such as > > lookup_bdev() -> kern_path(), which is a behavior of a stacked > > filesystem, without all the precautions that comes with behaving > > as a stacked filesystem. > > No. This is far worse than anything stacked filesystems do - it's > an arbitrary pathname resolution while holding a lock. > It's not local. Just about anything (including automounts, etc.) > can be happening there and it pushes the lock in question outside > of *ALL* pathwalk-related locks. Pathname doesn't have to > resolve to anything on overlayfs - it can just go through > a symlink on it, or walk into it and traverse a bunch of .. > afterwards, etc. > > Don't confuse that with stacking - it's not even close. > You can't use that anywhere near overlayfs layers. > > Maybe isolate it into a separate filesystem, to be automounted > on /sys/power. And make anyone playing with overlayfs with > sysfs as a layer mount the damn thing on top of power/ in your > overlayfs. But using that thing as a part of layer is > a non-starter. Incidentally, why do you need to lock overlayfs inode to call vfs_llseek() on the underlying file? It might (or might not) need to lock the underlying file (for things like ->i_size, etc.), but that will be done by ->llseek() instance and it would deal with the inode in the layer, not overlayfs one. Similar question applies to ovl_write_iter() - why do you need to hold the overlayfs inode locked during the call of backing_file_write_iter()? ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [syzbot] [kernfs?] possible deadlock in kernfs_fop_llseek 2024-04-04 8:21 ` Al Viro @ 2024-04-04 8:40 ` Al Viro 2024-04-04 9:33 ` Amir Goldstein 1 sibling, 0 replies; 29+ messages in thread From: Al Viro @ 2024-04-04 8:40 UTC (permalink / raw) To: Amir Goldstein Cc: syzbot, gregkh, linux-fsdevel, linux-kernel, syzkaller-bugs, tj, valesini, Christoph Hellwig, Christian Brauner, Jan Kara, Miklos Szeredi On Thu, Apr 04, 2024 at 09:21:10AM +0100, Al Viro wrote: > Similar question applies to ovl_write_iter() - why do you > need to hold the overlayfs inode locked during the call of > backing_file_write_iter()? Consider the scenario when unlink() is called on that sucker during the write() that triggers that pathwalk. We have unlink: blocked on overlayfs inode of file, while holding the parent directory. write: holding the overlayfs inode of file and trying to resolve a pathname that contains .../power/suspend_stats/../../...; blocked on attempt to lock parent so we could do a lookup in it. No llseek involved anywhere, kernfs of->mutex held, but not contended, deadlock purely on ->i_rwsem of overlayfs inodes. Holding overlayfs inode locked during the call of lookup_bdev() is really no-go. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [syzbot] [kernfs?] possible deadlock in kernfs_fop_llseek 2024-04-04 8:21 ` Al Viro 2024-04-04 8:40 ` Al Viro @ 2024-04-04 9:33 ` Amir Goldstein 2024-04-04 22:01 ` Al Viro ` (3 more replies) 1 sibling, 4 replies; 29+ messages in thread From: Amir Goldstein @ 2024-04-04 9:33 UTC (permalink / raw) To: Al Viro Cc: syzbot, gregkh, linux-fsdevel, linux-kernel, syzkaller-bugs, tj, valesini, Christoph Hellwig, Christian Brauner, Jan Kara, Miklos Szeredi On Thu, Apr 4, 2024 at 11:21 AM Al Viro <viro@zeniv.linux.org.uk> wrote: > > On Thu, Apr 04, 2024 at 09:11:22AM +0100, Al Viro wrote: > > On Thu, Apr 04, 2024 at 09:54:35AM +0300, Amir Goldstein wrote: > > > > > > In the lockdep dependency chain, overlayfs inode lock is taken > > > before kernfs internal of->mutex, where kernfs (sysfs) is the lower > > > layer of overlayfs, which is sane. > > > > > > With /sys/power/resume (and probably other files), sysfs also > > > behaves as a stacking filesystem, calling vfs helpers, such as > > > lookup_bdev() -> kern_path(), which is a behavior of a stacked > > > filesystem, without all the precautions that comes with behaving > > > as a stacked filesystem. > > > > No. This is far worse than anything stacked filesystems do - it's > > an arbitrary pathname resolution while holding a lock. > > It's not local. Just about anything (including automounts, etc.) > > can be happening there and it pushes the lock in question outside > > of *ALL* pathwalk-related locks. Pathname doesn't have to > > resolve to anything on overlayfs - it can just go through > > a symlink on it, or walk into it and traverse a bunch of .. > > afterwards, etc. > > > > Don't confuse that with stacking - it's not even close. > > You can't use that anywhere near overlayfs layers. > > > > Maybe isolate it into a separate filesystem, to be automounted > > on /sys/power. And make anyone playing with overlayfs with > > sysfs as a layer mount the damn thing on top of power/ in your > > overlayfs. But using that thing as a part of layer is > > a non-starter. I don't follow what you are saying. Which code is in non-starter violation? kernfs for calling lookup_bdev() with internal of->mutex held? Overlayfs for allowing sysfs as a lower layer and calling vfs_llseek(lower_sysfs_file,...) during copy up while ovl inode is held for legit reasons (e.g. from ovl_rename())? > > Incidentally, why do you need to lock overlayfs inode to call > vfs_llseek() on the underlying file? It might (or might not) > need to lock the underlying file (for things like ->i_size, > etc.), but that will be done by ->llseek() instance and it > would deal with the inode in the layer, not overlayfs one. We do not (anymore) lock ovl inode in ovl_llseek(), see: b1f9d3858f72 ovl: use ovl_inode_lock in ovl_llseek() but ovl inode is held in operations (e.g. ovl_rename) which trigger copy up and call vfs_llseek() on the lower file. > > Similar question applies to ovl_write_iter() - why do you > need to hold the overlayfs inode locked during the call of > backing_file_write_iter()? > Not sure. This question I need to defer to Miklos. I see in several places the pattern: inode_lock(inode); /* Update mode */ ovl_copyattr(inode); ret = file_remove_privs(file); ... /* Update size */ ovl_file_modified(file); ... inode_unlock(inode); so it could be related to atomic remove privs and update mtime, but possibly we could convert all of those inode_lock() to ovl_inode_lock() (i.e. internal lock below vfs inode lock). [...] > Consider the scenario when unlink() is called on that sucker > during the write() that triggers that pathwalk. We have > > unlink: blocked on overlayfs inode of file, while holding the parent directory. > write: holding the overlayfs inode of file and trying to resolve a pathname > that contains .../power/suspend_stats/../../...; blocked on attempt to lock > parent so we could do a lookup in it. This specifically cannot happen because sysfs is not allowed as an upper layer only as a lower layer, so overlayfs itself will not be writing to /sys/power/resume. > > No llseek involved anywhere, kernfs of->mutex held, but not contended, > deadlock purely on ->i_rwsem of overlayfs inodes. > > Holding overlayfs inode locked during the call of lookup_bdev() is really > no-go. Yes. I see that, but how can this be resolved? If the specific file /sys/power/resume will not have FMODE_LSEEK, then ovl_copy_up_file() will not try to skip_hole on it at all and the llseek lock dependency will be averted. The question is whether opt-out of FMODE_LSEEK for /sys/power/resume will break anything or if we should mark seq files in another way so that overlayfs would not try to seek_hole on any of them categorically. Thanks, Amir. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [syzbot] [kernfs?] possible deadlock in kernfs_fop_llseek 2024-04-04 9:33 ` Amir Goldstein @ 2024-04-04 22:01 ` Al Viro 2024-04-05 10:34 ` Amir Goldstein 2024-04-05 6:51 ` Christoph Hellwig ` (2 subsequent siblings) 3 siblings, 1 reply; 29+ messages in thread From: Al Viro @ 2024-04-04 22:01 UTC (permalink / raw) To: Amir Goldstein Cc: syzbot, gregkh, linux-fsdevel, linux-kernel, syzkaller-bugs, tj, valesini, Christoph Hellwig, Christian Brauner, Jan Kara, Miklos Szeredi On Thu, Apr 04, 2024 at 12:33:40PM +0300, Amir Goldstein wrote: > This specifically cannot happen because sysfs is not allowed as an > upper layer only as a lower layer, so overlayfs itself will not be writing to > /sys/power/resume. Then how could you possibly get a deadlock there? What would your minimal deadlocked set look like? 1. Something is blocked in lookup_bdev() called from resume_store(), called from sysfs_kf_write(), called from kernfs_write_iter(), which has acquired ->mutex of struct kernfs_open_file that had been allocated by kernfs_fop_open() back when the file had been opened. Note that each struct file instance gets a separate struct kernfs_open_file. Since we are calling ->write_iter(), the file *MUST* have been opened for write. 2. Something is blocked in kernfs_fop_llseek() on the same of->mutex, i.e. using the same struct file as (1). That something is holding an overlayfs inode lock, which is what the next thread is blocked on. + at least one more thread, to complete the cycle. Right? How could that possibly happen without overlayfs opening /sys/power/resume for write? Again, each struct file instance gets a separate of->mutex; for a deadlock you need a cycle of threads and a cycle of locks, such that each thread is holding the corresponding lock and is blocked on attempt to get the lock that comes next in the cyclic order. If overlayfs never writes to that sucker, it can't participate in that cycle. Sure, you can get overlayfs llseek grabbing of->mutex of *ANOTHER* struct file opened for the same sysfs file. Since it's not the same struct file and since each struct file there gets a separate kernfs_open_file instance, the mutex won't be the same. Unless I'm missing something else, that can't deadlock. For a quick and dirty experiment, try to give of->mutex on r/o opens a class separate from that on r/w and w/o opens (mutex_init() in kernfs_fop_open()) and see if lockdep warnings persist. Something like if (has_mmap) mutex_init(&of->mutex); else if (file->f_mode & FMODE_WRITE) mutex_init(&of->mutex); else mutex_init(&of->mutex); circa fs/kernfs/file.c:642. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [syzbot] [kernfs?] possible deadlock in kernfs_fop_llseek 2024-04-04 22:01 ` Al Viro @ 2024-04-05 10:34 ` Amir Goldstein 0 siblings, 0 replies; 29+ messages in thread From: Amir Goldstein @ 2024-04-05 10:34 UTC (permalink / raw) To: Al Viro Cc: syzbot, gregkh, linux-fsdevel, linux-kernel, syzkaller-bugs, tj, valesini, Christoph Hellwig, Christian Brauner, Jan Kara, Miklos Szeredi On Fri, Apr 5, 2024 at 1:01 AM Al Viro <viro@zeniv.linux.org.uk> wrote: > > On Thu, Apr 04, 2024 at 12:33:40PM +0300, Amir Goldstein wrote: > > > This specifically cannot happen because sysfs is not allowed as an > > upper layer only as a lower layer, so overlayfs itself will not be writing to > > /sys/power/resume. > > Then how could you possibly get a deadlock there? What would your minimal > deadlocked set look like? > > 1. Something is blocked in lookup_bdev() called from resume_store(), called > from sysfs_kf_write(), called from kernfs_write_iter(), which has acquired > ->mutex of struct kernfs_open_file that had been allocated by > kernfs_fop_open() back when the file had been opened. Note that each > struct file instance gets a separate struct kernfs_open_file. Since we are > calling ->write_iter(), the file *MUST* have been opened for write. > > 2. Something is blocked in kernfs_fop_llseek() on the same of->mutex, > i.e. using the same struct file as (1). That something is holding an > overlayfs inode lock, which is what the next thread is blocked on. > > + at least one more thread, to complete the cycle. > > Right? How could that possibly happen without overlayfs opening /sys/power/resume > for write? Again, each struct file instance gets a separate of->mutex; > for a deadlock you need a cycle of threads and a cycle of locks, such > that each thread is holding the corresponding lock and is blocked on > attempt to get the lock that comes next in the cyclic order. Absolutely right. I had it in my mind that this was a node lock. Did not look closely. > > If overlayfs never writes to that sucker, it can't participate in that > cycle. Sure, you can get overlayfs llseek grabbing of->mutex of *ANOTHER* > struct file opened for the same sysfs file. Since it's not the same > struct file and since each struct file there gets a separate kernfs_open_file > instance, the mutex won't be the same. > > Unless I'm missing something else, that can't deadlock. For a quick and > dirty experiment, try to give of->mutex on r/o opens a class separate from > that on r/w and w/o opens (mutex_init() in kernfs_fop_open()) and see > if lockdep warnings persist. > > Something like > > if (has_mmap) > mutex_init(&of->mutex); > else if (file->f_mode & FMODE_WRITE) > mutex_init(&of->mutex); > else > mutex_init(&of->mutex); Why a quick experiment? Why not a permanent kludge? It is not any better or worse than the already existing has_mmap subclass annotation. huh? Thanks, Amir. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [syzbot] [kernfs?] possible deadlock in kernfs_fop_llseek 2024-04-04 9:33 ` Amir Goldstein 2024-04-04 22:01 ` Al Viro @ 2024-04-05 6:51 ` Christoph Hellwig 2024-04-05 11:19 ` Christian Brauner ` (2 more replies) 2024-04-05 10:47 ` Christian Brauner 2024-04-06 4:09 ` Al Viro 3 siblings, 3 replies; 29+ messages in thread From: Christoph Hellwig @ 2024-04-05 6:51 UTC (permalink / raw) To: Amir Goldstein Cc: Al Viro, syzbot, gregkh, linux-fsdevel, linux-kernel, syzkaller-bugs, tj, valesini, Christoph Hellwig, Christian Brauner, Jan Kara, Miklos Szeredi On Thu, Apr 04, 2024 at 12:33:40PM +0300, Amir Goldstein wrote: > I don't follow what you are saying. > Which code is in non-starter violation? > kernfs for calling lookup_bdev() with internal of->mutex held? That is a huge problem, and has been causing endless annoying lockdep chains in the block layer for us. If we have some way to kill this the whole block layer would benefit. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [syzbot] [kernfs?] possible deadlock in kernfs_fop_llseek 2024-04-05 6:51 ` Christoph Hellwig @ 2024-04-05 11:19 ` Christian Brauner 2024-04-05 13:48 ` Christian Brauner 2024-04-05 15:41 ` Tejun Heo 2024-04-06 3:54 ` Al Viro 2 siblings, 1 reply; 29+ messages in thread From: Christian Brauner @ 2024-04-05 11:19 UTC (permalink / raw) To: Christoph Hellwig Cc: Amir Goldstein, Al Viro, syzbot, gregkh, linux-fsdevel, linux-kernel, syzkaller-bugs, tj, valesini, Jan Kara, Miklos Szeredi On Fri, Apr 05, 2024 at 08:51:35AM +0200, Christoph Hellwig wrote: > On Thu, Apr 04, 2024 at 12:33:40PM +0300, Amir Goldstein wrote: > > I don't follow what you are saying. > > Which code is in non-starter violation? > > kernfs for calling lookup_bdev() with internal of->mutex held? > > That is a huge problem, and has been causing endless annoying lockdep > chains in the block layer for us. If we have some way to kill this > the whole block layer would benefit. Why not just try and add a better resume api that forces resume to not use a path argument neither for resume_file nor for writes to /sys/power/resume. IOW, extend the protocol what can get written to /sys/power/resume and then phase out the path argument. It'll take a while but it's a possibly clean solution. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [syzbot] [kernfs?] possible deadlock in kernfs_fop_llseek 2024-04-05 11:19 ` Christian Brauner @ 2024-04-05 13:48 ` Christian Brauner 2024-04-05 14:52 ` Christoph Hellwig 0 siblings, 1 reply; 29+ messages in thread From: Christian Brauner @ 2024-04-05 13:48 UTC (permalink / raw) To: Christoph Hellwig Cc: Amir Goldstein, Al Viro, syzbot, gregkh, linux-fsdevel, linux-kernel, syzkaller-bugs, tj, valesini, Jan Kara, Miklos Szeredi On Fri, Apr 05, 2024 at 01:19:35PM +0200, Christian Brauner wrote: > On Fri, Apr 05, 2024 at 08:51:35AM +0200, Christoph Hellwig wrote: > > On Thu, Apr 04, 2024 at 12:33:40PM +0300, Amir Goldstein wrote: > > > I don't follow what you are saying. > > > Which code is in non-starter violation? > > > kernfs for calling lookup_bdev() with internal of->mutex held? > > > > That is a huge problem, and has been causing endless annoying lockdep > > chains in the block layer for us. If we have some way to kill this > > the whole block layer would benefit. > > Why not just try and add a better resume api that forces resume to not > use a path argument neither for resume_file nor for writes to > /sys/power/resume. IOW, extend the protocol what can get written to > /sys/power/resume and then phase out the path argument. It'll take a > while but it's a possibly clean solution. In fact, just looking at this code with a naive set of eyes for a second: * There's early_lookup_bdev() which deals with PARTUUID, PARTLABEL, raw device number, and lookup based on /dev. No actual path lookup involved in that. * So the only interesting case is lookup_bdev() for /sys/power/suspend. That one takes arbitrary paths. But being realistic for a moment... How many people will specify a device path that's _not_ some variant of /dev/...? IOW, how many people will specify a device path that's not on devtmpfs or a symlink on devtmpfs? Probably almost no one. Containers come to mind ofc. But they can't mount devtmpfs so usually what they do is that they create a tmpfs mount at /dev and then bind-mount device nodes they need into there. But unprivileged containers cannot use suspend because that requires init_user_ns capabilities. And privileged containers that are allowed to hibernate and use custom paths seem extremly unlikely as well. So really, _naively_ it seems to me that one could factor out the /dev/* part of the device number parsing logic in early_lookup_bdev() and port resume_store() to use that first and only if that fails fall back to full lookup_bdev(). (Possibly combined with some sort of logging that the user should use /dev/... paths or at least a way to recognize that this arbitrary path stuff is actually used.) And citing from a chat with the hibernation maintainer in systemd: <brauner> So /sys/power/resume does systemd ever write anything other than a /dev/* path in to there? <maintainer> Hmm? You never do that? It only accepts devno. So that takes away one of the main users of this api. So I really suspect that arbitrary device path is unused in practice. Maybe I'm all wrong though. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [syzbot] [kernfs?] possible deadlock in kernfs_fop_llseek 2024-04-05 13:48 ` Christian Brauner @ 2024-04-05 14:52 ` Christoph Hellwig 0 siblings, 0 replies; 29+ messages in thread From: Christoph Hellwig @ 2024-04-05 14:52 UTC (permalink / raw) To: Christian Brauner Cc: Christoph Hellwig, Amir Goldstein, Al Viro, syzbot, gregkh, linux-fsdevel, linux-kernel, syzkaller-bugs, tj, valesini, Jan Kara, Miklos Szeredi On Fri, Apr 05, 2024 at 03:48:20PM +0200, Christian Brauner wrote: > * There's early_lookup_bdev() which deals with PARTUUID, > PARTLABEL, raw device number, and lookup based on /dev. No actual path > lookup involved in that. > > * So the only interesting case is lookup_bdev() for /sys/power/suspend. > That one takes arbitrary paths. But being realistic for a moment... > How many people will specify a device path that's _not_ some variant > of /dev/...? IOW, how many people will specify a device path that's > not on devtmpfs or a symlink on devtmpfs? Probably almost no one. That's not the point. The poins is that trying to do the dumb name to bdev translation in early_lookup_bdev is wrong. Distro had and have their own numbering schemes, and not using them bypasses access control. We should never use that at runtime. > <brauner> So /sys/power/resume does systemd ever write anything other than a /dev/* path in to there? > <maintainer> Hmm? You never do that? It only accepts devno. > > So that takes away one of the main users of this api. So I really > suspect that arbitrary device path is unused in practice. Maybe I'm all > wrong though. I'm all fine with just accepting a devno and no name. But I fear it will break something as someone added it for whatever use case they had (and we should not have allowed that back then, but that ship has sailed unfortunately) ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [syzbot] [kernfs?] possible deadlock in kernfs_fop_llseek 2024-04-05 6:51 ` Christoph Hellwig 2024-04-05 11:19 ` Christian Brauner @ 2024-04-05 15:41 ` Tejun Heo 2024-04-06 3:54 ` Al Viro 2 siblings, 0 replies; 29+ messages in thread From: Tejun Heo @ 2024-04-05 15:41 UTC (permalink / raw) To: Christoph Hellwig Cc: Amir Goldstein, Al Viro, syzbot, gregkh, linux-fsdevel, linux-kernel, syzkaller-bugs, valesini, Christian Brauner, Jan Kara, Miklos Szeredi Hello, On Fri, Apr 05, 2024 at 08:51:35AM +0200, Christoph Hellwig wrote: > On Thu, Apr 04, 2024 at 12:33:40PM +0300, Amir Goldstein wrote: > > I don't follow what you are saying. > > Which code is in non-starter violation? > > kernfs for calling lookup_bdev() with internal of->mutex held? > > That is a huge problem, and has been causing endless annoying lockdep > chains in the block layer for us. If we have some way to kill this > the whole block layer would benefit. of->mutex is mostly there as a convenience to kernfs (here, sysfs) users so that they don't have to worry about concurrent invocation of the callbacks. It needs more careful look but on cursory observation, it shouldn't be difficult to implement a flag or different op type which skips of->mutex if this causes a lot of pain. Thanks. -- tejun ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [syzbot] [kernfs?] possible deadlock in kernfs_fop_llseek 2024-04-05 6:51 ` Christoph Hellwig 2024-04-05 11:19 ` Christian Brauner 2024-04-05 15:41 ` Tejun Heo @ 2024-04-06 3:54 ` Al Viro 2 siblings, 0 replies; 29+ messages in thread From: Al Viro @ 2024-04-06 3:54 UTC (permalink / raw) To: Christoph Hellwig Cc: Amir Goldstein, syzbot, gregkh, linux-fsdevel, linux-kernel, syzkaller-bugs, tj, valesini, Christian Brauner, Jan Kara, Miklos Szeredi On Fri, Apr 05, 2024 at 08:51:35AM +0200, Christoph Hellwig wrote: > On Thu, Apr 04, 2024 at 12:33:40PM +0300, Amir Goldstein wrote: > > I don't follow what you are saying. > > Which code is in non-starter violation? > > kernfs for calling lookup_bdev() with internal of->mutex held? > > That is a huge problem, and has been causing endless annoying lockdep > chains in the block layer for us. If we have some way to kill this > the whole block layer would benefit. Specifically of->mutex or having pathwalk done from some ->write_iter()? Incidentally, losetup happily accepts sysfs files as backing store. I don't think it adds problems we don't have otherwise, but IMO that's really bogus... ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [syzbot] [kernfs?] possible deadlock in kernfs_fop_llseek 2024-04-04 9:33 ` Amir Goldstein 2024-04-04 22:01 ` Al Viro 2024-04-05 6:51 ` Christoph Hellwig @ 2024-04-05 10:47 ` Christian Brauner 2024-04-05 14:48 ` Amir Goldstein 2024-04-06 4:09 ` Al Viro 3 siblings, 1 reply; 29+ messages in thread From: Christian Brauner @ 2024-04-05 10:47 UTC (permalink / raw) To: Amir Goldstein Cc: Al Viro, syzbot, gregkh, linux-fsdevel, linux-kernel, syzkaller-bugs, tj, valesini, Christoph Hellwig, Jan Kara, Miklos Szeredi On Thu, Apr 04, 2024 at 12:33:40PM +0300, Amir Goldstein wrote: > On Thu, Apr 4, 2024 at 11:21 AM Al Viro <viro@zeniv.linux.org.uk> wrote: > > > > On Thu, Apr 04, 2024 at 09:11:22AM +0100, Al Viro wrote: > > > On Thu, Apr 04, 2024 at 09:54:35AM +0300, Amir Goldstein wrote: > > > > > > > > In the lockdep dependency chain, overlayfs inode lock is taken > > > > before kernfs internal of->mutex, where kernfs (sysfs) is the lower > > > > layer of overlayfs, which is sane. > > > > > > > > With /sys/power/resume (and probably other files), sysfs also > > > > behaves as a stacking filesystem, calling vfs helpers, such as > > > > lookup_bdev() -> kern_path(), which is a behavior of a stacked > > > > filesystem, without all the precautions that comes with behaving > > > > as a stacked filesystem. > > > > > > No. This is far worse than anything stacked filesystems do - it's > > > an arbitrary pathname resolution while holding a lock. > > > It's not local. Just about anything (including automounts, etc.) > > > can be happening there and it pushes the lock in question outside > > > of *ALL* pathwalk-related locks. Pathname doesn't have to > > > resolve to anything on overlayfs - it can just go through > > > a symlink on it, or walk into it and traverse a bunch of .. > > > afterwards, etc. > > > > > > Don't confuse that with stacking - it's not even close. > > > You can't use that anywhere near overlayfs layers. > > > > > > Maybe isolate it into a separate filesystem, to be automounted > > > on /sys/power. And make anyone playing with overlayfs with > > > sysfs as a layer mount the damn thing on top of power/ in your > > > overlayfs. But using that thing as a part of layer is > > > a non-starter. > > I don't follow what you are saying. > Which code is in non-starter violation? > kernfs for calling lookup_bdev() with internal of->mutex held? > Overlayfs for allowing sysfs as a lower layer and calling > vfs_llseek(lower_sysfs_file,...) during copy up while ovl inode is held > for legit reasons (e.g. from ovl_rename())? > > > > > Incidentally, why do you need to lock overlayfs inode to call > > vfs_llseek() on the underlying file? It might (or might not) > > need to lock the underlying file (for things like ->i_size, > > etc.), but that will be done by ->llseek() instance and it > > would deal with the inode in the layer, not overlayfs one. > > We do not (anymore) lock ovl inode in ovl_llseek(), see: > b1f9d3858f72 ovl: use ovl_inode_lock in ovl_llseek() > but ovl inode is held in operations (e.g. ovl_rename) > which trigger copy up and call vfs_llseek() on the lower file. > > > > > Similar question applies to ovl_write_iter() - why do you > > need to hold the overlayfs inode locked during the call of > > backing_file_write_iter()? > > > > Not sure. This question I need to defer to Miklos. > I see in several places the pattern: > inode_lock(inode); > /* Update mode */ > ovl_copyattr(inode); > ret = file_remove_privs(file); > ... > /* Update size */ > ovl_file_modified(file); > ... > inode_unlock(inode); > > so it could be related to atomic remove privs and update mtime, > but possibly we could convert all of those inode_lock() to > ovl_inode_lock() (i.e. internal lock below vfs inode lock). > > [...] > > Consider the scenario when unlink() is called on that sucker > > during the write() that triggers that pathwalk. We have > > > > unlink: blocked on overlayfs inode of file, while holding the parent directory. > > write: holding the overlayfs inode of file and trying to resolve a pathname > > that contains .../power/suspend_stats/../../...; blocked on attempt to lock > > parent so we could do a lookup in it. > > This specifically cannot happen because sysfs is not allowed as an > upper layer only as a lower layer, so overlayfs itself will not be writing to > /sys/power/resume. I don't understand that part. If overlayfs uses /sys/power/ as a lower layer it can open and write to /sys/power/resume, no? Honestly, why don't you just block /sys/power from appearing in any layer in overlayfs? This seems like such a niche use-case that it's so unlikely that this will be used that I would just try and kill it. If you do it like Al suggested and switch it to an automount you get that for free. But I guess you can also just block it without that. (Frankly, I find it weird that sysfs is allowed as a layer in any case. I completely forgot about this. Imho, both procfs and sysfs should not be usable as a lower layer - procfs is, I know - and then only select parts should be like /sys/fs/cgroup or sm where I can see the container people somehow using this to mess with the cgroup tree or something.) > > > > > No llseek involved anywhere, kernfs of->mutex held, but not contended, > > deadlock purely on ->i_rwsem of overlayfs inodes. > > > > Holding overlayfs inode locked during the call of lookup_bdev() is really > > no-go. > > Yes. I see that, but how can this be resolved? > > If the specific file /sys/power/resume will not have FMODE_LSEEK, > then ovl_copy_up_file() will not try to skip_hole on it at all and the > llseek lock dependency will be averted. > > The question is whether opt-out of FMODE_LSEEK for /sys/power/resume > will break anything or if we should mark seq files in another way so that > overlayfs would not try to seek_hole on any of them categorically. > > Thanks, > Amir. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [syzbot] [kernfs?] possible deadlock in kernfs_fop_llseek 2024-04-05 10:47 ` Christian Brauner @ 2024-04-05 14:48 ` Amir Goldstein 0 siblings, 0 replies; 29+ messages in thread From: Amir Goldstein @ 2024-04-05 14:48 UTC (permalink / raw) To: Christian Brauner Cc: Al Viro, syzbot, gregkh, linux-fsdevel, linux-kernel, syzkaller-bugs, tj, valesini, Christoph Hellwig, Jan Kara, Miklos Szeredi On Fri, Apr 5, 2024 at 1:47 PM Christian Brauner <brauner@kernel.org> wrote: > > On Thu, Apr 04, 2024 at 12:33:40PM +0300, Amir Goldstein wrote: > > On Thu, Apr 4, 2024 at 11:21 AM Al Viro <viro@zeniv.linux.org.uk> wrote: > > > > > > On Thu, Apr 04, 2024 at 09:11:22AM +0100, Al Viro wrote: > > > > On Thu, Apr 04, 2024 at 09:54:35AM +0300, Amir Goldstein wrote: > > > > > > > > > > In the lockdep dependency chain, overlayfs inode lock is taken > > > > > before kernfs internal of->mutex, where kernfs (sysfs) is the lower > > > > > layer of overlayfs, which is sane. > > > > > > > > > > With /sys/power/resume (and probably other files), sysfs also > > > > > behaves as a stacking filesystem, calling vfs helpers, such as > > > > > lookup_bdev() -> kern_path(), which is a behavior of a stacked > > > > > filesystem, without all the precautions that comes with behaving > > > > > as a stacked filesystem. > > > > > > > > No. This is far worse than anything stacked filesystems do - it's > > > > an arbitrary pathname resolution while holding a lock. > > > > It's not local. Just about anything (including automounts, etc.) > > > > can be happening there and it pushes the lock in question outside > > > > of *ALL* pathwalk-related locks. Pathname doesn't have to > > > > resolve to anything on overlayfs - it can just go through > > > > a symlink on it, or walk into it and traverse a bunch of .. > > > > afterwards, etc. > > > > > > > > Don't confuse that with stacking - it's not even close. > > > > You can't use that anywhere near overlayfs layers. > > > > > > > > Maybe isolate it into a separate filesystem, to be automounted > > > > on /sys/power. And make anyone playing with overlayfs with > > > > sysfs as a layer mount the damn thing on top of power/ in your > > > > overlayfs. But using that thing as a part of layer is > > > > a non-starter. > > > > I don't follow what you are saying. > > Which code is in non-starter violation? > > kernfs for calling lookup_bdev() with internal of->mutex held? > > Overlayfs for allowing sysfs as a lower layer and calling > > vfs_llseek(lower_sysfs_file,...) during copy up while ovl inode is held > > for legit reasons (e.g. from ovl_rename())? > > > > > > > > Incidentally, why do you need to lock overlayfs inode to call > > > vfs_llseek() on the underlying file? It might (or might not) > > > need to lock the underlying file (for things like ->i_size, > > > etc.), but that will be done by ->llseek() instance and it > > > would deal with the inode in the layer, not overlayfs one. > > > > We do not (anymore) lock ovl inode in ovl_llseek(), see: > > b1f9d3858f72 ovl: use ovl_inode_lock in ovl_llseek() > > but ovl inode is held in operations (e.g. ovl_rename) > > which trigger copy up and call vfs_llseek() on the lower file. > > > > > > > > Similar question applies to ovl_write_iter() - why do you > > > need to hold the overlayfs inode locked during the call of > > > backing_file_write_iter()? > > > > > > > Not sure. This question I need to defer to Miklos. > > I see in several places the pattern: > > inode_lock(inode); > > /* Update mode */ > > ovl_copyattr(inode); > > ret = file_remove_privs(file); > > ... > > /* Update size */ > > ovl_file_modified(file); > > ... > > inode_unlock(inode); > > > > so it could be related to atomic remove privs and update mtime, > > but possibly we could convert all of those inode_lock() to > > ovl_inode_lock() (i.e. internal lock below vfs inode lock). > > > > [...] > > > Consider the scenario when unlink() is called on that sucker > > > during the write() that triggers that pathwalk. We have > > > > > > unlink: blocked on overlayfs inode of file, while holding the parent directory. > > > write: holding the overlayfs inode of file and trying to resolve a pathname > > > that contains .../power/suspend_stats/../../...; blocked on attempt to lock > > > parent so we could do a lookup in it. > > > > This specifically cannot happen because sysfs is not allowed as an > > upper layer only as a lower layer, so overlayfs itself will not be writing to > > /sys/power/resume. > > I don't understand that part. If overlayfs uses /sys/power/ as a lower > layer it can open and write to /sys/power/resume, no? > > Honestly, why don't you just block /sys/power from appearing in any > layer in overlayfs? This seems like such a niche use-case that it's so > unlikely that this will be used that I would just try and kill it. I do not want to special case /sys/power in overlayfs. > > If you do it like Al suggested and switch it to an automount you get Not important enough IMO to make this change. > that for free. But I guess you can also just block it without that. > > (Frankly, I find it weird that sysfs is allowed as a layer in any case. I > completely forgot about this. Imho, both procfs and sysfs should not be > usable as a lower layer - procfs is, I know - and then only select parts > should be like /sys/fs/cgroup or sm where I can see the container people > somehow using this to mess with the cgroup tree or something.) > I do not know if using sysfs as a lower layer is an important use case, but I have a feeling that people already may do it, so I cannot regress it without a good reason. Al's suggestion to annotate writable kernfs files as a different class from readonly kernfs files seems fine by me to silence lockdep false positive. I will try to feed this solution to syzbot. Thanks, Amir. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [syzbot] [kernfs?] possible deadlock in kernfs_fop_llseek 2024-04-04 9:33 ` Amir Goldstein ` (2 preceding siblings ...) 2024-04-05 10:47 ` Christian Brauner @ 2024-04-06 4:09 ` Al Viro 2024-04-06 5:25 ` Amir Goldstein 3 siblings, 1 reply; 29+ messages in thread From: Al Viro @ 2024-04-06 4:09 UTC (permalink / raw) To: Amir Goldstein Cc: syzbot, gregkh, linux-fsdevel, linux-kernel, syzkaller-bugs, tj, valesini, Christoph Hellwig, Christian Brauner, Jan Kara, Miklos Szeredi On Thu, Apr 04, 2024 at 12:33:40PM +0300, Amir Goldstein wrote: > We do not (anymore) lock ovl inode in ovl_llseek(), see: > b1f9d3858f72 ovl: use ovl_inode_lock in ovl_llseek() > but ovl inode is held in operations (e.g. ovl_rename) > which trigger copy up and call vfs_llseek() on the lower file. OK, but why do we bother with ovl_inode_lock() there? Note that serialization on struct file level is provided on syscall level - see call of fdget_pos() in there. IOW, which object are you protecting? If it's struct file passed your way, you should already have the serialization. If it's underlying file on disk, that's up to vfs_llseek(). Exclusion with copyup by a different operation? I'm not saying it's wrong - it's just that the thing is tricky enough, so some clarification might be a good idea. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [syzbot] [kernfs?] possible deadlock in kernfs_fop_llseek 2024-04-06 4:09 ` Al Viro @ 2024-04-06 5:25 ` Amir Goldstein 0 siblings, 0 replies; 29+ messages in thread From: Amir Goldstein @ 2024-04-06 5:25 UTC (permalink / raw) To: Al Viro, Miklos Szeredi Cc: syzbot, gregkh, linux-fsdevel, linux-kernel, syzkaller-bugs, tj, valesini, Christoph Hellwig, Christian Brauner, Jan Kara On Sat, Apr 6, 2024 at 7:09 AM Al Viro <viro@zeniv.linux.org.uk> wrote: > > On Thu, Apr 04, 2024 at 12:33:40PM +0300, Amir Goldstein wrote: > > > We do not (anymore) lock ovl inode in ovl_llseek(), see: > > b1f9d3858f72 ovl: use ovl_inode_lock in ovl_llseek() > > but ovl inode is held in operations (e.g. ovl_rename) > > which trigger copy up and call vfs_llseek() on the lower file. > > OK, but why do we bother with ovl_inode_lock() there? > Note that serialization on struct file level is provided > on syscall level - see call of fdget_pos() in there. > IOW, which object are you protecting? If it's struct file > passed your way, you should already have the serialization. > If it's underlying file on disk, that's up to vfs_llseek(). You're right. > Exclusion with copyup by a different operation? Nah, don't see how this is relevant to file->f_pos. > > I'm not saying it's wrong - it's just that the thing is > tricky enough, so some clarification might be a good idea. I think I just used inode_lock() in 9e46b840c705 ("ovl: support stacked SEEK_HOLE/SEEK_DATA") as a common coding pattern in overlayfs when protecting the "master" copy of overlay inode attributes, but it was not needed for file->f_pos. Miklos, please ack that I am not missing anything and that ovl_inode_lock() is indeed redundant in ovl_llseek(). Anyway, this lock is not part of the lockdep issue that started this thread. Thanks, Amir. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [syzbot] [kernfs?] possible deadlock in kernfs_fop_llseek 2024-04-04 6:54 ` Amir Goldstein 2024-04-04 8:11 ` Al Viro @ 2024-04-05 15:08 ` Amir Goldstein 2024-04-05 15:37 ` syzbot 1 sibling, 1 reply; 29+ messages in thread From: Amir Goldstein @ 2024-04-05 15:08 UTC (permalink / raw) To: syzbot Cc: gregkh, linux-fsdevel, linux-kernel, syzkaller-bugs, tj, valesini, Christoph Hellwig, Christian Brauner, Jan Kara, Miklos Szeredi, Al Viro On Thu, Apr 4, 2024 at 9:54 AM Amir Goldstein <amir73il@gmail.com> wrote: > > On Thu, Apr 4, 2024 at 2:51 AM syzbot > <syzbot+9a5b0ced8b1bfb238b56@syzkaller.appspotmail.com> wrote: > > > > syzbot has bisected this issue to: > > > > commit 0fedefd4c4e33dd24f726b13b5d7c143e2b483be > > Author: Valentine Sinitsyn <valesini@yandex-team.ru> > > Date: Mon Sep 25 08:40:12 2023 +0000 > > > > kernfs: sysfs: support custom llseek method for sysfs entries > > > > bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=17cb5e03180000 > > start commit: fe46a7dd189e Merge tag 'sound-6.9-rc1' of git://git.kernel.. > > git tree: upstream > > final oops: https://syzkaller.appspot.com/x/report.txt?x=142b5e03180000 > > console output: https://syzkaller.appspot.com/x/log.txt?x=102b5e03180000 > > kernel config: https://syzkaller.appspot.com/x/.config?x=4d90a36f0cab495a > > dashboard link: https://syzkaller.appspot.com/bug?extid=9a5b0ced8b1bfb238b56 > > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=17f1d93d180000 > > C reproducer: https://syzkaller.appspot.com/x/repro.c?x=15c38139180000 > > > > Reported-by: syzbot+9a5b0ced8b1bfb238b56@syzkaller.appspotmail.com > > Fixes: 0fedefd4c4e3 ("kernfs: sysfs: support custom llseek method for sysfs entries") > > > > For information about bisection process see: https://goo.gl/tpsmEJ#bisection > > > Let's test Al's solution. #syz test: https://github.com/amir73il/linux/ vfs-fixes Thanks, Amir. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [syzbot] [kernfs?] possible deadlock in kernfs_fop_llseek 2024-04-05 15:08 ` Amir Goldstein @ 2024-04-05 15:37 ` syzbot 2024-04-05 16:23 ` Al Viro 0 siblings, 1 reply; 29+ messages in thread From: syzbot @ 2024-04-05 15:37 UTC (permalink / raw) To: amir73il, brauner, gregkh, hch, jack, linux-fsdevel, linux-kernel, miklos, syzkaller-bugs, tj, valesini, viro Hello, syzbot tried to test the proposed patch but the build/boot failed: viperboard [ 8.144672][ T1] usbcore: registered new interface driver dln2 [ 8.146814][ T1] usbcore: registered new interface driver pn533_usb [ 8.154721][ T1] nfcsim 0.2 initialized [ 8.156856][ T1] usbcore: registered new interface driver port100 [ 8.158745][ T1] usbcore: registered new interface driver nfcmrvl [ 8.168709][ T1] Loading iSCSI transport class v2.0-870. [ 8.188275][ T1] virtio_scsi virtio0: 1/0/0 default/read/poll queues [ 8.202624][ T1] ------------[ cut here ]------------ [ 8.204252][ T1] refcount_t: decrement hit 0; leaking memory. [ 8.206219][ T1] WARNING: CPU: 0 PID: 1 at lib/refcount.c:31 refcount_warn_saturate+0xfa/0x1d0 [ 8.208638][ T1] Modules linked in: [ 8.209556][ T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.9.0-rc1-syzkaller-00001-g70d370568b75 #0 [ 8.214301][ T1] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024 [ 8.217361][ T1] RIP: 0010:refcount_warn_saturate+0xfa/0x1d0 [ 8.218680][ T1] Code: b2 00 00 00 e8 77 99 f2 fc 5b 5d c3 cc cc cc cc e8 6b 99 f2 fc c6 05 11 1e f0 0a 01 90 48 c7 c7 e0 5e 1e 8c e8 b7 35 b5 fc 90 <0f> 0b 90 90 eb d9 e8 4b 99 f2 fc c6 05 ee 1d f0 0a 01 90 48 c7 c7 [ 8.223425][ T1] RSP: 0000:ffffc90000066e18 EFLAGS: 00010246 [ 8.225385][ T1] RAX: 1f757896feb95b00 RBX: ffff8881482a2d7c RCX: ffff888016ac8000 [ 8.227629][ T1] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 [ 8.229990][ T1] RBP: 0000000000000004 R08: ffffffff8157ffe2 R09: fffffbfff1c396e0 [ 8.232520][ T1] R10: dffffc0000000000 R11: fffffbfff1c396e0 R12: ffffea000502cdc0 [ 8.234601][ T1] R13: ffffea000502cdc8 R14: 1ffffd4000a059b9 R15: 0000000000000000 [ 8.236876][ T1] FS: 0000000000000000(0000) GS:ffff8880b9400000(0000) knlGS:0000000000000000 [ 8.239600][ T1] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 8.240914][ T1] CR2: ffff88823ffff000 CR3: 000000000e132000 CR4: 00000000003506f0 [ 8.243303][ T1] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 8.245750][ T1] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 8.247945][ T1] Call Trace: [ 8.248964][ T1] <TASK> [ 8.250217][ T1] ? __warn+0x163/0x4e0 [ 8.251589][ T1] ? refcount_warn_saturate+0xfa/0x1d0 [ 8.253743][ T1] ? report_bug+0x2b3/0x500 [ 8.254770][ T1] ? refcount_warn_saturate+0xfa/0x1d0 [ 8.256336][ T1] ? handle_bug+0x3e/0x70 [ 8.257030][ T1] ? exc_invalid_op+0x1a/0x50 [ 8.258667][ T1] ? asm_exc_invalid_op+0x1a/0x20 [ 8.259957][ T1] ? __warn_printk+0x292/0x360 [ 8.261215][ T1] ? refcount_warn_saturate+0xfa/0x1d0 [ 8.263208][ T1] ? refcount_warn_saturate+0xf9/0x1d0 [ 8.264293][ T1] __free_pages_ok+0xc54/0xd80 [ 8.265404][ T1] make_alloc_exact+0xa3/0xf0 [ 8.266560][ T1] vring_alloc_queue_split+0x20a/0x600 [ 8.267988][ T1] ? __pfx_vring_alloc_queue_split+0x10/0x10 [ 8.269965][ T1] ? vp_find_vqs+0x4c/0x4e0 [ 8.271058][ T1] ? virtscsi_probe+0x3ea/0xf60 [ 8.272057][ T1] ? virtio_dev_probe+0x991/0xaf0 [ 8.273466][ T1] ? really_probe+0x2b8/0xad0 [ 8.275102][ T1] ? driver_probe_device+0x50/0x430 [ 8.276174][ T1] vring_create_virtqueue_split+0xc6/0x310 [ 8.277826][ T1] ? ret_from_fork+0x4b/0x80 [ 8.279140][ T1] ? __pfx_vring_create_virtqueue_split+0x10/0x10 [ 8.281586][ T1] vring_create_virtqueue+0xca/0x110 [ 8.283311][ T1] ? __pfx_vp_notify+0x10/0x10 [ 8.285146][ T1] ? __pfx_virtscsi_ctrl_done+0x10/0x10 [ 8.286554][ T1] setup_vq+0xe9/0x2d0 [ 8.287916][ T1] ? __pfx_vp_notify+0x10/0x10 [ 8.289684][ T1] ? __pfx_virtscsi_ctrl_done+0x10/0x10 [ 8.291329][ T1] ? __pfx_virtscsi_ctrl_done+0x10/0x10 [ 8.292741][ T1] ? __pfx_virtscsi_ctrl_done+0x10/0x10 [ 8.294786][ T1] vp_setup_vq+0xbf/0x330 [ 8.296422][ T1] ? __pfx_vp_config_changed+0x10/0x10 [ 8.297504][ T1] ? ioread16+0x2f/0x90 [ 8.298989][ T1] ? __pfx_virtscsi_ctrl_done+0x10/0x10 [ 8.300925][ T1] vp_find_vqs_msix+0x8b2/0xc80 [ 8.302291][ T1] vp_find_vqs+0x4c/0x4e0 [ 8.303546][ T1] virtscsi_init+0x8db/0xd00 [ 8.304820][ T1] ? __pfx_virtscsi_init+0x10/0x10 [ 8.305808][ T1] ? __pfx_default_calc_sets+0x10/0x10 [ 8.307015][ T1] ? scsi_host_alloc+0xa57/0xea0 [ 8.308290][ T1] ? vp_get+0xfd/0x140 [ 8.309405][ T1] virtscsi_probe+0x3ea/0xf60 [ 8.310389][ T1] ? __pfx_virtscsi_probe+0x10/0x10 [ 8.311500][ T1] ? kernfs_add_one+0x156/0x8b0 [ 8.312348][ T1] ? virtio_no_restricted_mem_acc+0x9/0x10 [ 8.313977][ T1] ? virtio_features_ok+0x10c/0x270 [ 8.314859][ T1] virtio_dev_probe+0x991/0xaf0 [ 8.315766][ T1] ? __pfx_virtio_dev_probe+0x10/0x10 [ 8.316886][ T1] really_probe+0x2b8/0xad0 [ 8.317711][ T1] __driver_probe_device+0x1a2/0x390 [ 8.318525][ T1] driver_probe_device+0x50/0x430 [ 8.319567][ T1] __driver_attach+0x45f/0x710 [ 8.320525][ T1] ? __pfx___driver_attach+0x10/0x10 [ 8.321841][ T1] bus_for_each_dev+0x239/0x2b0 [ 8.322872][ T1] ? __pfx___driver_attach+0x10/0x10 [ 8.324235][ T1] ? __pfx_bus_for_each_dev+0x10/0x10 [ 8.326127][ T1] ? do_raw_spin_unlock+0x13c/0x8b0 [ 8.327579][ T1] bus_add_driver+0x347/0x620 [ 8.328989][ T1] driver_register+0x23a/0x320 [ 8.330318][ T1] ? __pfx_virtio_scsi_init+0x10/0x10 [ 8.332178][ T1] virtio_scsi_init+0x65/0xe0 [ 8.333752][ T1] ? __pfx_virtio_scsi_init+0x10/0x10 [ 8.334950][ T1] do_one_initcall+0x248/0x880 [ 8.336015][ T1] ? __pfx_virtio_scsi_init+0x10/0x10 [ 8.337202][ T1] ? __pfx_do_one_initcall+0x10/0x10 [ 8.338254][ T1] ? lockdep_hardirqs_on_prepare+0x43d/0x780 [ 8.339378][ T1] ? __pfx_parse_args+0x10/0x10 [ 8.341464][ T1] ? do_initcalls+0x1c/0x80 [ 8.342223][ T1] ? rcu_is_watching+0x15/0xb0 [ 8.343161][ T1] do_initcall_level+0x157/0x210 [ 8.344252][ T1] do_initcalls+0x3f/0x80 [ 8.345400][ T1] kernel_init_freeable+0x435/0x5d0 [ 8.346571][ T1] ? __pfx_kernel_init_freeable+0x10/0x10 [ 8.349024][ T1] ? __pfx_lockdep_hardirqs_on_prepare+0x10/0x10 [ 8.350664][ T1] ? __pfx_kernel_init+0x10/0x10 [ 8.352143][ T1] ? __pfx_kernel_init+0x10/0x10 [ 8.353430][ T1] ? __pfx_kernel_init+0x10/0x10 [ 8.354609][ T1] kernel_init+0x1d/0x2b0 [ 8.355433][ T1] ret_from_fork+0x4b/0x80 [ 8.356203][ T1] ? __pfx_kernel_init+0x10/0x10 [ 8.357092][ T1] ret_from_fork_asm+0x1a/0x30 [ 8.358282][ T1] </TASK> [ 8.359166][ T1] Kernel panic - not syncing: kernel: panic_on_warn set ... [ 8.360941][ T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.9.0-rc1-syzkaller-00001-g70d370568b75 #0 [ 8.363085][ T1] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024 [ 8.363411][ T1] Call Trace: [ 8.363411][ T1] <TASK> [ 8.363411][ T1] dump_stack_lvl+0x241/0x360 [ 8.363411][ T1] ? __pfx_dump_stack_lvl+0x10/0x10 [ 8.363411][ T1] ? __pfx__printk+0x10/0x10 [ 8.363411][ T1] ? _printk+0xd5/0x120 [ 8.363411][ T1] ? vscnprintf+0x5d/0x90 [ 8.363411][ T1] panic+0x349/0x860 [ 8.363411][ T1] ? __warn+0x172/0x4e0 [ 8.363411][ T1] ? __pfx_panic+0x10/0x10 [ 8.363411][ T1] ? show_trace_log_lvl+0x4e6/0x520 [ 8.363411][ T1] ? ret_from_fork_asm+0x1a/0x30 [ 8.363411][ T1] __warn+0x346/0x4e0 [ 8.363411][ T1] ? refcount_warn_saturate+0xfa/0x1d0 [ 8.363411][ T1] report_bug+0x2b3/0x500 [ 8.363411][ T1] ? refcount_warn_saturate+0xfa/0x1d0 [ 8.363411][ T1] handle_bug+0x3e/0x70 [ 8.363411][ T1] exc_invalid_op+0x1a/0x50 [ 8.363411][ T1] asm_exc_invalid_op+0x1a/0x20 [ 8.363411][ T1] RIP: 0010:refcount_warn_saturate+0xfa/0x1d0 [ 8.363411][ T1] Code: b2 00 00 00 e8 77 99 f2 fc 5b 5d c3 cc cc cc cc e8 6b 99 f2 fc c6 05 11 1e f0 0a 01 90 48 c7 c7 e0 5e 1e 8c e8 b7 35 b5 fc 90 <0f> 0b 90 90 eb d9 e8 4b 99 f2 fc c6 05 ee 1d f0 0a 01 90 48 c7 c7 [ 8.363411][ T1] RSP: 0000:ffffc90000066e18 EFLAGS: 00010246 [ 8.363411][ T1] RAX: 1f757896feb95b00 RBX: ffff8881482a2d7c RCX: ffff888016ac8000 [ 8.363411][ T1] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 [ 8.363411][ T1] RBP: 0000000000000004 R08: ffffffff8157ffe2 R09: fffffbfff1c396e0 [ 8.363411][ T1] R10: dffffc0000000000 R11: fffffbfff1c396e0 R12: ffffea000502cdc0 [ 8.363411][ T1] R13: ffffea000502cdc8 R14: 1ffffd4000a059b9 R15: 0000000000000000 [ 8.363411][ T1] ? __warn_printk+0x292/0x360 [ 8.363411][ T1] ? refcount_warn_saturate+0xf9/0x1d0 [ 8.363411][ T1] __free_pages_ok+0xc54/0xd80 [ 8.363411][ T1] make_alloc_exact+0xa3/0xf0 [ 8.363411][ T1] vring_alloc_queue_split+0x20a/0x600 [ 8.363411][ T1] ? __pfx_vring_alloc_queue_split+0x10/0x10 [ 8.363411][ T1] ? vp_find_vqs+0x4c/0x4e0 [ 8.363411][ T1] ? virtscsi_probe+0x3ea/0xf60 [ 8.363411][ T1] ? virtio_dev_probe+0x991/0xaf0 [ 8.363411][ T1] ? really_probe+0x2b8/0xad0 [ 8.363411][ T1] ? driver_probe_device+0x50/0x430 [ 8.363411][ T1] vring_create_virtqueue_split+0xc6/0x310 [ 8.363411][ T1] ? ret_from_fork+0x4b/0x80 [ 8.363411][ T1] ? __pfx_vring_create_virtqueue_split+0x10/0x10 [ 8.363411][ T1] vring_create_virtqueue+0xca/0x110 [ 8.363411][ T1] ? __pfx_vp_notify+0x10/0x10 [ 8.363411][ T1] ? __pfx_virtscsi_ctrl_done+0x10/0x10 [ 8.363411][ T1] setup_vq+0xe9/0x2d0 [ 8.363411][ T1] ? __pfx_vp_notify+0x10/0x10 [ 8.363411][ T1] ? __pfx_virtscsi_ctrl_done+0x10/0x10 [ 8.363411][ T1] ? __pfx_virtscsi_ctrl_done+0x10/0x10 [ 8.363411][ T1] ? __pfx_virtscsi_ctrl_done+0x10/0x10 [ 8.363411][ T1] vp_setup_vq+0xbf/0x330 [ 8.363411][ T1] ? __pfx_vp_config_changed+0x10/0x10 [ 8.363411][ T1] ? ioread16+0x2f/0x90 [ 8.363411][ T1] ? __pfx_virtscsi_ctrl_done+0x10/0x10 [ 8.363411][ T1] vp_find_vqs_msix+0x8b2/0xc80 [ 8.363411][ T1] vp_find_vqs+0x4c/0x4e0 [ 8.363411][ T1] virtscsi_init+0x8db/0xd00 [ 8.363411][ T1] ? __pfx_virtscsi_init+0x10/0x10 [ 8.363411][ T1] ? __pfx_default_calc_sets+0x10/0x10 [ 8.363411][ T1] ? scsi_host_alloc+0xa57/0xea0 [ 8.363411][ T1] ? vp_get+0xfd/0x140 [ 8.363411][ T1] virtscsi_probe+0x3ea/0xf60 [ 8.363411][ T1] ? __pfx_virtscsi_probe+0x10/0x10 [ 8.363411][ T1] ? kernfs_add_one+0x156/0x8b0 [ 8.363411][ T1] ? virtio_no_restricted_mem_acc+0x9/0x10 [ 8.363411][ T1] ? virtio_features_ok+0x10c/0x270 [ 8.363411][ T1] virtio_dev_probe+0x991/0xaf0 [ 8.363411][ T1] ? __pfx_virtio_dev_probe+0x10/0x10 [ 8.363411][ T1] really_probe+0x2b8/0xad0 [ 8.363411][ T1] __driver_probe_device+0x1a2/0x390 [ 8.363411][ T1] driver_probe_device+0x50/0x430 [ 8.363411][ T1] __driver_attach+0x45f/0x710 [ 8.363411][ T1] ? __pfx___driver_attach+0x10/0x10 [ 8.363411][ T1] bus_for_each_dev+0x239/0x2b0 [ 8.363411][ T1] ? __pfx___driver_attach+0x10/0x10 [ 8.363411][ T1] ? __pfx_bus_for_each_dev+0x10/0x10 [ 8.363411][ T1] ? do_raw_spin_unlock+0x13c/0x8b0 [ 8.363411][ T1] bus_add_driver+0x347/0x620 [ 8.363411][ T1] driver_register+0x23a/0x320 [ 8.363411][ T1] ? __pfx_virtio_scsi_init+0x10/0x10 [ 8.363411][ T1] virtio_scsi_init+0x65/0xe0 [ 8.363411][ T1] ? __pfx_virtio_scsi_init+0x10/0x10 [ 8.363411][ T1] do_one_initcall+0x248/0x880 [ 8.363411][ T1] ? __pfx_virtio_scsi_init+0x10/0x10 [ 8.363411][ T1] ? __pfx_do_one_initcall+0x10/0x10 [ 8.363411][ T1] ? lockdep_hardirqs_on_prepare+0x43d/0x780 [ 8.363411][ T1] ? __pfx_parse_args+0x10/0x10 [ 8.363411][ T1] ? do_initcalls+0x1c/0x80 [ 8.363411][ T1] ? rcu_is_watching+0x15/0xb0 [ 8.363411][ T1] do_initcall_level+0x157/0x210 [ 8.363411][ T1] do_initcalls+0x3f/0x80 [ 8.363411][ T1] kernel_init_freeable+0x435/0x5d0 [ 8.363411][ T1] ? __pfx_kernel_init_freeable+0x10/0x10 [ 8.363411][ T1] ? __pfx_lockdep_hardirqs_on_prepare+0x10/0x10 [ 8.363411][ T1] ? __pfx_kernel_init+0x10/0x10 [ 8.363411][ T1] ? __pfx_kernel_init+0x10/0x10 [ 8.363411][ T1] ? __pfx_kernel_init+0x10/0x10 [ 8.363411][ T1] kernel_init+0x1d/0x2b0 [ 8.363411][ T1] ret_from_fork+0x4b/0x80 [ 8.363411][ T1] ? __pfx_kernel_init+0x10/0x10 [ 8.363411][ T1] ret_from_fork_asm+0x1a/0x30 [ 8.363411][ T1] </TASK> [ 8.363411][ T1] Kernel Offset: disabled [ 8.363411][ T1] Rebooting in 86400 seconds.. syzkaller build log: go env (err=<nil>) GO111MODULE='auto' GOARCH='amd64' GOBIN='' GOCACHE='/syzkaller/.cache/go-build' GOENV='/syzkaller/.config/go/env' GOEXE='' GOEXPERIMENT='' GOFLAGS='' GOHOSTARCH='amd64' GOHOSTOS='linux' GOINSECURE='' GOMODCACHE='/syzkaller/jobs-2/linux/gopath/pkg/mod' GONOPROXY='' GONOSUMDB='' GOOS='linux' GOPATH='/syzkaller/jobs-2/linux/gopath' GOPRIVATE='' GOPROXY='https://proxy.golang.org,direct' GOROOT='/usr/local/go' GOSUMDB='sum.golang.org' GOTMPDIR='' GOTOOLCHAIN='auto' GOTOOLDIR='/usr/local/go/pkg/tool/linux_amd64' GOVCS='' GOVERSION='go1.21.4' GCCGO='gccgo' GOAMD64='v1' AR='ar' CC='gcc' CXX='g++' CGO_ENABLED='1' GOMOD='/syzkaller/jobs-2/linux/gopath/src/github.com/google/syzkaller/go.mod' GOWORK='' CGO_CFLAGS='-O2 -g' CGO_CPPFLAGS='' CGO_CXXFLAGS='-O2 -g' CGO_FFLAGS='-O2 -g' CGO_LDFLAGS='-O2 -g' PKG_CONFIG='pkg-config' GOGCCFLAGS='-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build2829307758=/tmp/go-build -gno-record-gcc-switches' git status (err=<nil>) HEAD detached at 51c4dcff8 nothing to commit, working tree clean tput: No value for $TERM and no -T specified tput: No value for $TERM and no -T specified Makefile:31: run command via tools/syz-env for best compatibility, see: Makefile:32: https://github.com/google/syzkaller/blob/master/docs/contributing.md#using-syz-env go list -f '{{.Stale}}' ./sys/syz-sysgen | grep -q false || go install ./sys/syz-sysgen make .descriptions tput: No value for $TERM and no -T specified tput: No value for $TERM and no -T specified Makefile:31: run command via tools/syz-env for best compatibility, see: Makefile:32: https://github.com/google/syzkaller/blob/master/docs/contributing.md#using-syz-env bin/syz-sysgen touch .descriptions GOOS=linux GOARCH=amd64 go build "-ldflags=-s -w -X github.com/google/syzkaller/prog.GitRevision=51c4dcff83b0574620c280cc5130ef59cc4a2e32 -X 'github.com/google/syzkaller/prog.gitRevisionDate=20240403-123700'" "-tags=syz_target syz_os_linux syz_arch_amd64 " -o ./bin/linux_amd64/syz-fuzzer github.com/google/syzkaller/syz-fuzzer GOOS=linux GOARCH=amd64 go build "-ldflags=-s -w -X github.com/google/syzkaller/prog.GitRevision=51c4dcff83b0574620c280cc5130ef59cc4a2e32 -X 'github.com/google/syzkaller/prog.gitRevisionDate=20240403-123700'" "-tags=syz_target syz_os_linux syz_arch_amd64 " -o ./bin/linux_amd64/syz-execprog github.com/google/syzkaller/tools/syz-execprog GOOS=linux GOARCH=amd64 go build "-ldflags=-s -w -X github.com/google/syzkaller/prog.GitRevision=51c4dcff83b0574620c280cc5130ef59cc4a2e32 -X 'github.com/google/syzkaller/prog.gitRevisionDate=20240403-123700'" "-tags=syz_target syz_os_linux syz_arch_amd64 " -o ./bin/linux_amd64/syz-stress github.com/google/syzkaller/tools/syz-stress mkdir -p ./bin/linux_amd64 gcc -o ./bin/linux_amd64/syz-executor executor/executor.cc \ -m64 -O2 -pthread -Wall -Werror -Wparentheses -Wunused-const-variable -Wframe-larger-than=16384 -Wno-stringop-overflow -Wno-array-bounds -Wno-format-overflow -Wno-unused-but-set-variable -Wno-unused-command-line-argument -static-pie -fpermissive -w -DGOOS_linux=1 -DGOARCH_amd64=1 \ -DHOSTGOOS_linux=1 -DGIT_REVISION=\"51c4dcff83b0574620c280cc5130ef59cc4a2e32\" Error text is too large and was truncated, full error text is at: https://syzkaller.appspot.com/x/error.txt?x=16ae7ead180000 Tested on: commit: 70d37056 kernfs: annotate different lockdep class for .. git tree: https://github.com/amir73il/linux/ vfs-fixes kernel config: https://syzkaller.appspot.com/x/.config?x=d45c08b2154c215d dashboard link: https://syzkaller.appspot.com/bug?extid=9a5b0ced8b1bfb238b56 compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40 Note: no patches were applied. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [syzbot] [kernfs?] possible deadlock in kernfs_fop_llseek 2024-04-05 15:37 ` syzbot @ 2024-04-05 16:23 ` Al Viro 2024-04-06 5:34 ` Amir Goldstein 0 siblings, 1 reply; 29+ messages in thread From: Al Viro @ 2024-04-05 16:23 UTC (permalink / raw) To: syzbot Cc: amir73il, brauner, gregkh, hch, jack, linux-fsdevel, linux-kernel, miklos, syzkaller-bugs, tj, valesini On Fri, Apr 05, 2024 at 08:37:03AM -0700, syzbot wrote: > Hello, > > syzbot tried to test the proposed patch but the build/boot failed: WTF? The patch is diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c index e9df2f87072c6..8502ef68459b9 100644 --- a/fs/kernfs/file.c +++ b/fs/kernfs/file.c @@ -636,11 +636,18 @@ static int kernfs_fop_open(struct inode *inode, struct file *file) * each file a separate locking class. Let's differentiate on * whether the file has mmap or not for now. * - * Both paths of the branch look the same. They're supposed to + * For similar reasons, writable and readonly files are given different + * lockdep key, because the writable file /sys/power/resume may call vfs + * lookup helpers for arbitrary paths and readonly files can be read by + * overlayfs from vfs helpers when sysfs is a lower layer of overalyfs. + * + * All three cases look the same. They're supposed to * look that way and give @of->mutex different static lockdep keys. */ if (has_mmap) mutex_init(&of->mutex); + else if (file->f_mode & FMODE_WRITE) + mutex_init(&of->mutex); else mutex_init(&of->mutex); How could it possibly trigger boot failure? Test the parent, perhaps? ^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: [syzbot] [kernfs?] possible deadlock in kernfs_fop_llseek 2024-04-05 16:23 ` Al Viro @ 2024-04-06 5:34 ` Amir Goldstein 2024-04-06 7:05 ` syzbot 0 siblings, 1 reply; 29+ messages in thread From: Amir Goldstein @ 2024-04-06 5:34 UTC (permalink / raw) To: Al Viro Cc: syzbot, brauner, gregkh, hch, jack, linux-fsdevel, linux-kernel, miklos, syzkaller-bugs, tj, valesini On Fri, Apr 5, 2024 at 7:23 PM Al Viro <viro@zeniv.linux.org.uk> wrote: > > On Fri, Apr 05, 2024 at 08:37:03AM -0700, syzbot wrote: > > Hello, > > > > syzbot tried to test the proposed patch but the build/boot failed: > > WTF? The patch is > > diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c > index e9df2f87072c6..8502ef68459b9 100644 > --- a/fs/kernfs/file.c > +++ b/fs/kernfs/file.c > @@ -636,11 +636,18 @@ static int kernfs_fop_open(struct inode *inode, struct file *file) > * each file a separate locking class. Let's differentiate on > * whether the file has mmap or not for now. > * > - * Both paths of the branch look the same. They're supposed to > + * For similar reasons, writable and readonly files are given different > + * lockdep key, because the writable file /sys/power/resume may call vfs > + * lookup helpers for arbitrary paths and readonly files can be read by > + * overlayfs from vfs helpers when sysfs is a lower layer of overalyfs. > + * > + * All three cases look the same. They're supposed to > * look that way and give @of->mutex different static lockdep keys. > */ > if (has_mmap) > mutex_init(&of->mutex); > + else if (file->f_mode & FMODE_WRITE) > + mutex_init(&of->mutex); > else > mutex_init(&of->mutex); > > How could it possibly trigger boot failure? Test the parent, perhaps? Let's try again, rebased on current master: #syz test: https://github.com/amir73il/linux/ vfs-fixes Thanks, Amir. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [syzbot] [kernfs?] possible deadlock in kernfs_fop_llseek 2024-04-06 5:34 ` Amir Goldstein @ 2024-04-06 7:05 ` syzbot 2024-04-06 7:11 ` Al Viro 0 siblings, 1 reply; 29+ messages in thread From: syzbot @ 2024-04-06 7:05 UTC (permalink / raw) To: amir73il, brauner, gregkh, hch, jack, linux-fsdevel, linux-kernel, miklos, syzkaller-bugs, tj, valesini, viro Hello, syzbot tried to test the proposed patch but the build/boot failed: viperboard [ 8.566509][ T1] usbcore: registered new interface driver dln2 [ 8.568801][ T1] usbcore: registered new interface driver pn533_usb [ 8.579446][ T1] nfcsim 0.2 initialized [ 8.581035][ T1] usbcore: registered new interface driver port100 [ 8.583546][ T1] usbcore: registered new interface driver nfcmrvl [ 8.595440][ T1] Loading iSCSI transport class v2.0-870. [ 8.614503][ T1] virtio_scsi virtio0: 1/0/0 default/read/poll queues [ 8.624792][ T1] ------------[ cut here ]------------ [ 8.626075][ T1] refcount_t: decrement hit 0; leaking memory. [ 8.627319][ T1] WARNING: CPU: 0 PID: 1 at lib/refcount.c:31 refcount_warn_saturate+0xfa/0x1d0 [ 8.629053][ T1] Modules linked in: [ 8.629654][ T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.9.0-rc2-syzkaller-00388-g3398bf34c993 #0 [ 8.631954][ T1] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024 [ 8.634054][ T1] RIP: 0010:refcount_warn_saturate+0xfa/0x1d0 [ 8.635268][ T1] Code: b2 00 00 00 e8 d7 93 f0 fc 5b 5d c3 cc cc cc cc e8 cb 93 f0 fc c6 05 ae 5c ee 0a 01 90 48 c7 c7 40 80 1e 8c e8 e7 2e b3 fc 90 <0f> 0b 90 90 eb d9 e8 ab 93 f0 fc c6 05 8b 5c ee 0a 01 90 48 c7 c7 [ 8.638923][ T1] RSP: 0000:ffffc90000066e18 EFLAGS: 00010246 [ 8.639822][ T1] RAX: 9b6fe4ff1df42a00 RBX: ffff888021006ccc RCX: ffff888016ac0000 [ 8.641085][ T1] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 [ 8.642968][ T1] RBP: 0000000000000004 R08: ffffffff8157ffc2 R09: fffffbfff1c39af8 [ 8.645656][ T1] R10: dffffc0000000000 R11: fffffbfff1c39af8 R12: ffffea000502fdc0 [ 8.646945][ T1] R13: ffffea000502fdc8 R14: 1ffffd4000a05fb9 R15: 0000000000000000 [ 8.648121][ T1] FS: 0000000000000000(0000) GS:ffff8880b9400000(0000) knlGS:0000000000000000 [ 8.650496][ T1] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 8.652023][ T1] CR2: ffff88823ffff000 CR3: 000000000e134000 CR4: 00000000003506f0 [ 8.653935][ T1] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 8.655934][ T1] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 8.657851][ T1] Call Trace: [ 8.658338][ T1] <TASK> [ 8.658807][ T1] ? __warn+0x163/0x4e0 [ 8.659517][ T1] ? refcount_warn_saturate+0xfa/0x1d0 [ 8.660549][ T1] ? report_bug+0x2b3/0x500 [ 8.661574][ T1] ? refcount_warn_saturate+0xfa/0x1d0 [ 8.663372][ T1] ? handle_bug+0x3e/0x70 [ 8.664928][ T1] ? exc_invalid_op+0x1a/0x50 [ 8.666599][ T1] ? asm_exc_invalid_op+0x1a/0x20 [ 8.668117][ T1] ? __warn_printk+0x292/0x360 [ 8.670085][ T1] ? refcount_warn_saturate+0xfa/0x1d0 [ 8.671620][ T1] ? refcount_warn_saturate+0xf9/0x1d0 [ 8.673520][ T1] __free_pages_ok+0xc54/0xd80 [ 8.675020][ T1] make_alloc_exact+0xa3/0xf0 [ 8.676151][ T1] vring_alloc_queue_split+0x20a/0x600 [ 8.677685][ T1] ? __pfx_vring_alloc_queue_split+0x10/0x10 [ 8.678762][ T1] ? vp_find_vqs+0x4c/0x4e0 [ 8.679763][ T1] ? virtscsi_probe+0x3ea/0xf60 [ 8.681408][ T1] ? virtio_dev_probe+0x991/0xaf0 [ 8.683147][ T1] ? really_probe+0x2b8/0xad0 [ 8.684759][ T1] ? driver_probe_device+0x50/0x430 [ 8.685967][ T1] vring_create_virtqueue_split+0xc6/0x310 [ 8.687403][ T1] ? ret_from_fork+0x4b/0x80 [ 8.688676][ T1] ? __pfx_vring_create_virtqueue_split+0x10/0x10 [ 8.690045][ T1] vring_create_virtqueue+0xca/0x110 [ 8.691014][ T1] ? __pfx_vp_notify+0x10/0x10 [ 8.692612][ T1] ? __pfx_virtscsi_ctrl_done+0x10/0x10 [ 8.693924][ T1] setup_vq+0xe9/0x2d0 [ 8.695355][ T1] ? __pfx_vp_notify+0x10/0x10 [ 8.696851][ T1] ? __pfx_virtscsi_ctrl_done+0x10/0x10 [ 8.699030][ T1] ? __pfx_virtscsi_ctrl_done+0x10/0x10 [ 8.700639][ T1] ? __pfx_virtscsi_ctrl_done+0x10/0x10 [ 8.702131][ T1] vp_setup_vq+0xbf/0x330 [ 8.703279][ T1] ? __pfx_vp_config_changed+0x10/0x10 [ 8.704850][ T1] ? ioread16+0x2f/0x90 [ 8.706423][ T1] ? __pfx_virtscsi_ctrl_done+0x10/0x10 [ 8.707372][ T1] vp_find_vqs_msix+0x8b2/0xc80 [ 8.708975][ T1] vp_find_vqs+0x4c/0x4e0 [ 8.709918][ T1] virtscsi_init+0x8db/0xd00 [ 8.710723][ T1] ? __pfx_virtscsi_init+0x10/0x10 [ 8.711813][ T1] ? __pfx_default_calc_sets+0x10/0x10 [ 8.713608][ T1] ? scsi_host_alloc+0xa57/0xea0 [ 8.714998][ T1] ? vp_get+0xfd/0x140 [ 8.715786][ T1] virtscsi_probe+0x3ea/0xf60 [ 8.716933][ T1] ? __pfx_virtscsi_probe+0x10/0x10 [ 8.718052][ T1] ? kernfs_add_one+0x156/0x8b0 [ 8.719084][ T1] ? virtio_no_restricted_mem_acc+0x9/0x10 [ 8.719983][ T1] ? virtio_features_ok+0x10c/0x270 [ 8.721406][ T1] virtio_dev_probe+0x991/0xaf0 [ 8.722392][ T1] ? __pfx_virtio_dev_probe+0x10/0x10 [ 8.723354][ T1] really_probe+0x2b8/0xad0 [ 8.725227][ T1] __driver_probe_device+0x1a2/0x390 [ 8.726307][ T1] driver_probe_device+0x50/0x430 [ 8.727156][ T1] __driver_attach+0x45f/0x710 [ 8.727968][ T1] ? __pfx___driver_attach+0x10/0x10 [ 8.728999][ T1] bus_for_each_dev+0x239/0x2b0 [ 8.730291][ T1] ? __pfx___driver_attach+0x10/0x10 [ 8.731396][ T1] ? __pfx_bus_for_each_dev+0x10/0x10 [ 8.732523][ T1] ? do_raw_spin_unlock+0x13c/0x8b0 [ 8.733601][ T1] bus_add_driver+0x347/0x620 [ 8.734733][ T1] driver_register+0x23a/0x320 [ 8.735723][ T1] ? __pfx_virtio_scsi_init+0x10/0x10 [ 8.736859][ T1] virtio_scsi_init+0x65/0xe0 [ 8.737729][ T1] ? __pfx_virtio_scsi_init+0x10/0x10 [ 8.739109][ T1] do_one_initcall+0x248/0x880 [ 8.740693][ T1] ? __pfx_virtio_scsi_init+0x10/0x10 [ 8.742428][ T1] ? __pfx_do_one_initcall+0x10/0x10 [ 8.745322][ T1] ? lockdep_hardirqs_on_prepare+0x43d/0x780 [ 8.746957][ T1] ? __pfx_parse_args+0x10/0x10 [ 8.748064][ T1] ? do_initcalls+0x1c/0x80 [ 8.749434][ T1] ? rcu_is_watching+0x15/0xb0 [ 8.750232][ T1] do_initcall_level+0x157/0x210 [ 8.751547][ T1] do_initcalls+0x3f/0x80 [ 8.752397][ T1] kernel_init_freeable+0x435/0x5d0 [ 8.754091][ T1] ? __pfx_kernel_init_freeable+0x10/0x10 [ 8.755229][ T1] ? __pfx_lockdep_hardirqs_on_prepare+0x10/0x10 [ 8.757834][ T1] ? __pfx_kernel_init+0x10/0x10 [ 8.759572][ T1] ? __pfx_kernel_init+0x10/0x10 [ 8.761202][ T1] ? __pfx_kernel_init+0x10/0x10 [ 8.762156][ T1] kernel_init+0x1d/0x2b0 [ 8.763250][ T1] ret_from_fork+0x4b/0x80 [ 8.764029][ T1] ? __pfx_kernel_init+0x10/0x10 [ 8.765039][ T1] ret_from_fork_asm+0x1a/0x30 [ 8.766178][ T1] </TASK> [ 8.766835][ T1] Kernel panic - not syncing: kernel: panic_on_warn set ... [ 8.768906][ T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.9.0-rc2-syzkaller-00388-g3398bf34c993 #0 [ 8.770531][ T1] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024 [ 8.772129][ T1] Call Trace: [ 8.772129][ T1] <TASK> [ 8.772129][ T1] dump_stack_lvl+0x241/0x360 [ 8.772129][ T1] ? __pfx_dump_stack_lvl+0x10/0x10 [ 8.772129][ T1] ? __pfx__printk+0x10/0x10 [ 8.772129][ T1] ? _printk+0xd5/0x120 [ 8.772129][ T1] ? vscnprintf+0x5d/0x90 [ 8.772129][ T1] panic+0x349/0x860 [ 8.772129][ T1] ? __warn+0x172/0x4e0 [ 8.772129][ T1] ? __pfx_panic+0x10/0x10 [ 8.772129][ T1] ? show_trace_log_lvl+0x4e6/0x520 [ 8.781563][ T1] ? ret_from_fork_asm+0x1a/0x30 [ 8.781563][ T1] __warn+0x346/0x4e0 [ 8.781563][ T1] ? refcount_warn_saturate+0xfa/0x1d0 [ 8.781563][ T1] report_bug+0x2b3/0x500 [ 8.781563][ T1] ? refcount_warn_saturate+0xfa/0x1d0 [ 8.781563][ T1] handle_bug+0x3e/0x70 [ 8.781563][ T1] exc_invalid_op+0x1a/0x50 [ 8.781563][ T1] asm_exc_invalid_op+0x1a/0x20 [ 8.781563][ T1] RIP: 0010:refcount_warn_saturate+0xfa/0x1d0 [ 8.791455][ T1] Code: b2 00 00 00 e8 d7 93 f0 fc 5b 5d c3 cc cc cc cc e8 cb 93 f0 fc c6 05 ae 5c ee 0a 01 90 48 c7 c7 40 80 1e 8c e8 e7 2e b3 fc 90 <0f> 0b 90 90 eb d9 e8 ab 93 f0 fc c6 05 8b 5c ee 0a 01 90 48 c7 c7 [ 8.791455][ T1] RSP: 0000:ffffc90000066e18 EFLAGS: 00010246 [ 8.791455][ T1] RAX: 9b6fe4ff1df42a00 RBX: ffff888021006ccc RCX: ffff888016ac0000 [ 8.791455][ T1] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 [ 8.791455][ T1] RBP: 0000000000000004 R08: ffffffff8157ffc2 R09: fffffbfff1c39af8 [ 8.801572][ T1] R10: dffffc0000000000 R11: fffffbfff1c39af8 R12: ffffea000502fdc0 [ 8.801572][ T1] R13: ffffea000502fdc8 R14: 1ffffd4000a05fb9 R15: 0000000000000000 [ 8.801572][ T1] ? __warn_printk+0x292/0x360 [ 8.801572][ T1] ? refcount_warn_saturate+0xf9/0x1d0 [ 8.801572][ T1] __free_pages_ok+0xc54/0xd80 [ 8.801572][ T1] make_alloc_exact+0xa3/0xf0 [ 8.801572][ T1] vring_alloc_queue_split+0x20a/0x600 [ 8.801572][ T1] ? __pfx_vring_alloc_queue_split+0x10/0x10 [ 8.811433][ T1] ? vp_find_vqs+0x4c/0x4e0 [ 8.811433][ T1] ? virtscsi_probe+0x3ea/0xf60 [ 8.811433][ T1] ? virtio_dev_probe+0x991/0xaf0 [ 8.811433][ T1] ? really_probe+0x2b8/0xad0 [ 8.811433][ T1] ? driver_probe_device+0x50/0x430 [ 8.811433][ T1] vring_create_virtqueue_split+0xc6/0x310 [ 8.811433][ T1] ? ret_from_fork+0x4b/0x80 [ 8.811433][ T1] ? __pfx_vring_create_virtqueue_split+0x10/0x10 [ 8.811433][ T1] vring_create_virtqueue+0xca/0x110 [ 8.821568][ T1] ? __pfx_vp_notify+0x10/0x10 [ 8.821568][ T1] ? __pfx_virtscsi_ctrl_done+0x10/0x10 [ 8.821568][ T1] setup_vq+0xe9/0x2d0 [ 8.821568][ T1] ? __pfx_vp_notify+0x10/0x10 [ 8.821568][ T1] ? __pfx_virtscsi_ctrl_done+0x10/0x10 [ 8.821568][ T1] ? __pfx_virtscsi_ctrl_done+0x10/0x10 [ 8.821568][ T1] ? __pfx_virtscsi_ctrl_done+0x10/0x10 [ 8.821568][ T1] vp_setup_vq+0xbf/0x330 [ 8.831450][ T1] ? __pfx_vp_config_changed+0x10/0x10 [ 8.831450][ T1] ? ioread16+0x2f/0x90 [ 8.831450][ T1] ? __pfx_virtscsi_ctrl_done+0x10/0x10 [ 8.831450][ T1] vp_find_vqs_msix+0x8b2/0xc80 [ 8.831450][ T1] vp_find_vqs+0x4c/0x4e0 [ 8.831450][ T1] virtscsi_init+0x8db/0xd00 [ 8.831450][ T1] ? __pfx_virtscsi_init+0x10/0x10 [ 8.831450][ T1] ? __pfx_default_calc_sets+0x10/0x10 [ 8.831450][ T1] ? scsi_host_alloc+0xa57/0xea0 [ 8.831450][ T1] ? vp_get+0xfd/0x140 [ 8.831450][ T1] virtscsi_probe+0x3ea/0xf60 [ 8.841557][ T1] ? __pfx_virtscsi_probe+0x10/0x10 [ 8.841557][ T1] ? kernfs_add_one+0x156/0x8b0 [ 8.841557][ T1] ? virtio_no_restricted_mem_acc+0x9/0x10 [ 8.841557][ T1] ? virtio_features_ok+0x10c/0x270 [ 8.841557][ T1] virtio_dev_probe+0x991/0xaf0 [ 8.841557][ T1] ? __pfx_virtio_dev_probe+0x10/0x10 [ 8.841557][ T1] really_probe+0x2b8/0xad0 [ 8.841557][ T1] __driver_probe_device+0x1a2/0x390 [ 8.841557][ T1] driver_probe_device+0x50/0x430 [ 8.841557][ T1] __driver_attach+0x45f/0x710 [ 8.851437][ T1] ? __pfx___driver_attach+0x10/0x10 [ 8.851437][ T1] bus_for_each_dev+0x239/0x2b0 [ 8.851437][ T1] ? __pfx___driver_attach+0x10/0x10 [ 8.851437][ T1] ? __pfx_bus_for_each_dev+0x10/0x10 [ 8.851437][ T1] ? do_raw_spin_unlock+0x13c/0x8b0 [ 8.851437][ T1] bus_add_driver+0x347/0x620 [ 8.851437][ T1] driver_register+0x23a/0x320 [ 8.851437][ T1] ? __pfx_virtio_scsi_init+0x10/0x10 [ 8.851437][ T1] virtio_scsi_init+0x65/0xe0 [ 8.851437][ T1] ? __pfx_virtio_scsi_init+0x10/0x10 [ 8.851437][ T1] do_one_initcall+0x248/0x880 [ 8.861551][ T1] ? __pfx_virtio_scsi_init+0x10/0x10 [ 8.861551][ T1] ? __pfx_do_one_initcall+0x10/0x10 [ 8.861551][ T1] ? lockdep_hardirqs_on_prepare+0x43d/0x780 [ 8.861551][ T1] ? __pfx_parse_args+0x10/0x10 [ 8.861551][ T1] ? do_initcalls+0x1c/0x80 [ 8.861551][ T1] ? rcu_is_watching+0x15/0xb0 [ 8.861551][ T1] do_initcall_level+0x157/0x210 [ 8.861551][ T1] do_initcalls+0x3f/0x80 [ 8.861551][ T1] kernel_init_freeable+0x435/0x5d0 [ 8.871429][ T1] ? __pfx_kernel_init_freeable+0x10/0x10 [ 8.871429][ T1] ? __pfx_lockdep_hardirqs_on_prepare+0x10/0x10 [ 8.871429][ T1] ? __pfx_kernel_init+0x10/0x10 [ 8.871429][ T1] ? __pfx_kernel_init+0x10/0x10 [ 8.871429][ T1] ? __pfx_kernel_init+0x10/0x10 [ 8.871429][ T1] kernel_init+0x1d/0x2b0 [ 8.871429][ T1] ret_from_fork+0x4b/0x80 [ 8.871429][ T1] ? __pfx_kernel_init+0x10/0x10 [ 8.871429][ T1] ret_from_fork_asm+0x1a/0x30 [ 8.871429][ T1] </TASK> [ 8.871429][ T1] Kernel Offset: disabled [ 8.871429][ T1] Rebooting in 86400 seconds.. syzkaller build log: go env (err=<nil>) GO111MODULE='auto' GOARCH='amd64' GOBIN='' GOCACHE='/syzkaller/.cache/go-build' GOENV='/syzkaller/.config/go/env' GOEXE='' GOEXPERIMENT='' GOFLAGS='' GOHOSTARCH='amd64' GOHOSTOS='linux' GOINSECURE='' GOMODCACHE='/syzkaller/jobs-2/linux/gopath/pkg/mod' GONOPROXY='' GONOSUMDB='' GOOS='linux' GOPATH='/syzkaller/jobs-2/linux/gopath' GOPRIVATE='' GOPROXY='https://proxy.golang.org,direct' GOROOT='/usr/local/go' GOSUMDB='sum.golang.org' GOTMPDIR='' GOTOOLCHAIN='auto' GOTOOLDIR='/usr/local/go/pkg/tool/linux_amd64' GOVCS='' GOVERSION='go1.21.4' GCCGO='gccgo' GOAMD64='v1' AR='ar' CC='gcc' CXX='g++' CGO_ENABLED='1' GOMOD='/syzkaller/jobs-2/linux/gopath/src/github.com/google/syzkaller/go.mod' GOWORK='' CGO_CFLAGS='-O2 -g' CGO_CPPFLAGS='' CGO_CXXFLAGS='-O2 -g' CGO_FFLAGS='-O2 -g' CGO_LDFLAGS='-O2 -g' PKG_CONFIG='pkg-config' GOGCCFLAGS='-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build3147238964=/tmp/go-build -gno-record-gcc-switches' git status (err=<nil>) HEAD detached at 51c4dcff8 nothing to commit, working tree clean tput: No value for $TERM and no -T specified tput: No value for $TERM and no -T specified Makefile:31: run command via tools/syz-env for best compatibility, see: Makefile:32: https://github.com/google/syzkaller/blob/master/docs/contributing.md#using-syz-env go list -f '{{.Stale}}' ./sys/syz-sysgen | grep -q false || go install ./sys/syz-sysgen make .descriptions tput: No value for $TERM and no -T specified tput: No value for $TERM and no -T specified Makefile:31: run command via tools/syz-env for best compatibility, see: Makefile:32: https://github.com/google/syzkaller/blob/master/docs/contributing.md#using-syz-env bin/syz-sysgen touch .descriptions GOOS=linux GOARCH=amd64 go build "-ldflags=-s -w -X github.com/google/syzkaller/prog.GitRevision=51c4dcff83b0574620c280cc5130ef59cc4a2e32 -X 'github.com/google/syzkaller/prog.gitRevisionDate=20240403-123700'" "-tags=syz_target syz_os_linux syz_arch_amd64 " -o ./bin/linux_amd64/syz-fuzzer github.com/google/syzkaller/syz-fuzzer GOOS=linux GOARCH=amd64 go build "-ldflags=-s -w -X github.com/google/syzkaller/prog.GitRevision=51c4dcff83b0574620c280cc5130ef59cc4a2e32 -X 'github.com/google/syzkaller/prog.gitRevisionDate=20240403-123700'" "-tags=syz_target syz_os_linux syz_arch_amd64 " -o ./bin/linux_amd64/syz-execprog github.com/google/syzkaller/tools/syz-execprog GOOS=linux GOARCH=amd64 go build "-ldflags=-s -w -X github.com/google/syzkaller/prog.GitRevision=51c4dcff83b0574620c280cc5130ef59cc4a2e32 -X 'github.com/google/syzkaller/prog.gitRevisionDate=20240403-123700'" "-tags=syz_target syz_os_linux syz_arch_amd64 " -o ./bin/linux_amd64/syz-stress github.com/google/syzkaller/tools/syz-stress mkdir -p ./bin/linux_amd64 gcc -o ./bin/linux_amd64/syz-executor executor/executor.cc \ -m64 -O2 -pthread -Wall -Werror -Wparentheses -Wunused-const-variable -Wframe-larger-than=16384 -Wno-stringop-overflow -Wno-array-bounds -Wno-format-overflow -Wno-unused-but-set-variable -Wno-unused-command-line-argument -static-pie -fpermissive -w -DGOOS_linux=1 -DGOARCH_amd64=1 \ -DHOSTGOOS_linux=1 -DGIT_REVISION=\"51c4dcff83b0574620c280cc5130ef59cc4a2e32\" Error text is too large and was truncated, full error text is at: https://syzkaller.appspot.com/x/error.txt?x=16a356f6180000 Tested on: commit: 3398bf34 kernfs: annotate different lockdep class for .. git tree: https://github.com/amir73il/linux/ vfs-fixes kernel config: https://syzkaller.appspot.com/x/.config?x=c5cda112a8438056 dashboard link: https://syzkaller.appspot.com/bug?extid=9a5b0ced8b1bfb238b56 compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40 Note: no patches were applied. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [syzbot] [kernfs?] possible deadlock in kernfs_fop_llseek 2024-04-06 7:05 ` syzbot @ 2024-04-06 7:11 ` Al Viro 2024-04-06 8:57 ` Amir Goldstein 0 siblings, 1 reply; 29+ messages in thread From: Al Viro @ 2024-04-06 7:11 UTC (permalink / raw) To: syzbot Cc: amir73il, brauner, gregkh, hch, jack, linux-fsdevel, linux-kernel, miklos, syzkaller-bugs, tj, valesini On Sat, Apr 06, 2024 at 12:05:04AM -0700, syzbot wrote: > commit: 3398bf34 kernfs: annotate different lockdep class for .. > git tree: https://github.com/amir73il/linux/ vfs-fixes > kernel config: https://syzkaller.appspot.com/x/.config?x=c5cda112a8438056 > dashboard link: https://syzkaller.appspot.com/bug?extid=9a5b0ced8b1bfb238b56 > compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40 > > Note: no patches were applied. How about the same test on 6c6e47d69d821047097909288b6d7f1aafb3b9b1? ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [syzbot] [kernfs?] possible deadlock in kernfs_fop_llseek 2024-04-06 7:11 ` Al Viro @ 2024-04-06 8:57 ` Amir Goldstein 2024-04-07 0:50 ` Al Viro 0 siblings, 1 reply; 29+ messages in thread From: Amir Goldstein @ 2024-04-06 8:57 UTC (permalink / raw) To: Al Viro Cc: syzbot, brauner, gregkh, hch, jack, linux-fsdevel, linux-kernel, miklos, syzkaller-bugs, tj, valesini, Hillf Danton On Sat, Apr 6, 2024 at 10:11 AM Al Viro <viro@zeniv.linux.org.uk> wrote: > > On Sat, Apr 06, 2024 at 12:05:04AM -0700, syzbot wrote: > > > commit: 3398bf34 kernfs: annotate different lockdep class for .. > > git tree: https://github.com/amir73il/linux/ vfs-fixes > > kernel config: https://syzkaller.appspot.com/x/.config?x=c5cda112a8438056 > > dashboard link: https://syzkaller.appspot.com/bug?extid=9a5b0ced8b1bfb238b56 > > compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40 > > > > Note: no patches were applied. > Looks like it fixes the problem: https://lore.kernel.org/lkml/000000000000a386f2061562ba6a@google.com/ Al, Are you ok with going with your solution? Do you want to pick it up through your tree? Or shall I post it and ask Christian or Greg to pick it up? Thanks, Amir. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [syzbot] [kernfs?] possible deadlock in kernfs_fop_llseek 2024-04-06 8:57 ` Amir Goldstein @ 2024-04-07 0:50 ` Al Viro 2024-04-07 11:02 ` Amir Goldstein 2024-04-09 9:18 ` Christian Brauner 0 siblings, 2 replies; 29+ messages in thread From: Al Viro @ 2024-04-07 0:50 UTC (permalink / raw) To: Amir Goldstein Cc: syzbot, brauner, gregkh, hch, jack, linux-fsdevel, linux-kernel, miklos, syzkaller-bugs, tj, valesini, Hillf Danton On Sat, Apr 06, 2024 at 11:57:11AM +0300, Amir Goldstein wrote: > On Sat, Apr 6, 2024 at 10:11 AM Al Viro <viro@zeniv.linux.org.uk> wrote: > > > > On Sat, Apr 06, 2024 at 12:05:04AM -0700, syzbot wrote: > > > > > commit: 3398bf34 kernfs: annotate different lockdep class for .. > > > git tree: https://github.com/amir73il/linux/ vfs-fixes > > > kernel config: https://syzkaller.appspot.com/x/.config?x=c5cda112a8438056 > > > dashboard link: https://syzkaller.appspot.com/bug?extid=9a5b0ced8b1bfb238b56 > > > compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40 > > > > > > Note: no patches were applied. > > > > Looks like it fixes the problem: > https://lore.kernel.org/lkml/000000000000a386f2061562ba6a@google.com/ > > Al, > > Are you ok with going with your solution? > Do you want to pick it up through your tree? > Or shall I post it and ask Christian or Greg to pick it up? Umm... I can grab it into #fixes. Would be nice to get the result of that test both on current mainline and mainline + patch, though - looks like the same test keeps hitting some unrelated shite in virtio_scsi. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [syzbot] [kernfs?] possible deadlock in kernfs_fop_llseek 2024-04-07 0:50 ` Al Viro @ 2024-04-07 11:02 ` Amir Goldstein 2024-04-09 9:18 ` Christian Brauner 1 sibling, 0 replies; 29+ messages in thread From: Amir Goldstein @ 2024-04-07 11:02 UTC (permalink / raw) To: Al Viro Cc: syzbot, brauner, gregkh, hch, jack, linux-fsdevel, linux-kernel, miklos, syzkaller-bugs, tj, valesini, Hillf Danton On Sun, Apr 7, 2024 at 3:51 AM Al Viro <viro@zeniv.linux.org.uk> wrote: > > On Sat, Apr 06, 2024 at 11:57:11AM +0300, Amir Goldstein wrote: > > On Sat, Apr 6, 2024 at 10:11 AM Al Viro <viro@zeniv.linux.org.uk> wrote: > > > > > > On Sat, Apr 06, 2024 at 12:05:04AM -0700, syzbot wrote: > > > > > > > commit: 3398bf34 kernfs: annotate different lockdep class for .. > > > > git tree: https://github.com/amir73il/linux/ vfs-fixes > > > > kernel config: https://syzkaller.appspot.com/x/.config?x=c5cda112a8438056 > > > > dashboard link: https://syzkaller.appspot.com/bug?extid=9a5b0ced8b1bfb238b56 > > > > compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40 > > > > > > > > Note: no patches were applied. > > > > > > > Looks like it fixes the problem: > > https://lore.kernel.org/lkml/000000000000a386f2061562ba6a@google.com/ > > > > Al, > > > > Are you ok with going with your solution? > > Do you want to pick it up through your tree? > > Or shall I post it and ask Christian or Greg to pick it up? > > Umm... I can grab it into #fixes. Cool. Also: #syz dup: possible deadlock in seq_read_iter (3) https://lore.kernel.org/linux-fsdevel/CAJfpegumD0gDXpZwy53pPu54ifOfW-tTBLniLHep3sW2Z96MjQ@mail.gmail.com/ Thanks, Amir. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [syzbot] [kernfs?] possible deadlock in kernfs_fop_llseek 2024-04-07 0:50 ` Al Viro 2024-04-07 11:02 ` Amir Goldstein @ 2024-04-09 9:18 ` Christian Brauner 1 sibling, 0 replies; 29+ messages in thread From: Christian Brauner @ 2024-04-09 9:18 UTC (permalink / raw) To: Al Viro Cc: Amir Goldstein, syzbot, gregkh, hch, jack, linux-fsdevel, linux-kernel, miklos, syzkaller-bugs, tj, valesini, Hillf Danton On Sun, Apr 07, 2024 at 01:50:56AM +0100, Al Viro wrote: > On Sat, Apr 06, 2024 at 11:57:11AM +0300, Amir Goldstein wrote: > > On Sat, Apr 6, 2024 at 10:11 AM Al Viro <viro@zeniv.linux.org.uk> wrote: > > > > > > On Sat, Apr 06, 2024 at 12:05:04AM -0700, syzbot wrote: > > > > > > > commit: 3398bf34 kernfs: annotate different lockdep class for .. > > > > git tree: https://github.com/amir73il/linux/ vfs-fixes > > > > kernel config: https://syzkaller.appspot.com/x/.config?x=c5cda112a8438056 > > > > dashboard link: https://syzkaller.appspot.com/bug?extid=9a5b0ced8b1bfb238b56 > > > > compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40 > > > > > > > > Note: no patches were applied. > > > > > > > Looks like it fixes the problem: > > https://lore.kernel.org/lkml/000000000000a386f2061562ba6a@google.com/ > > > > Al, > > > > Are you ok with going with your solution? > > Do you want to pick it up through your tree? > > Or shall I post it and ask Christian or Greg to pick it up? > > Umm... I can grab it into #fixes. Would be nice to get the result of Fine by me if you have other stuff pending for -rc4 anyway. Otherwise I can grab it. ^ permalink raw reply [flat|nested] 29+ messages in thread
end of thread, other threads:[~2024-04-09 9:18 UTC | newest] Thread overview: 29+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-04-03 18:23 [syzbot] [kernfs?] possible deadlock in kernfs_fop_llseek syzbot 2024-04-03 23:51 ` syzbot 2024-04-04 6:54 ` Amir Goldstein 2024-04-04 8:11 ` Al Viro 2024-04-04 8:21 ` Al Viro 2024-04-04 8:40 ` Al Viro 2024-04-04 9:33 ` Amir Goldstein 2024-04-04 22:01 ` Al Viro 2024-04-05 10:34 ` Amir Goldstein 2024-04-05 6:51 ` Christoph Hellwig 2024-04-05 11:19 ` Christian Brauner 2024-04-05 13:48 ` Christian Brauner 2024-04-05 14:52 ` Christoph Hellwig 2024-04-05 15:41 ` Tejun Heo 2024-04-06 3:54 ` Al Viro 2024-04-05 10:47 ` Christian Brauner 2024-04-05 14:48 ` Amir Goldstein 2024-04-06 4:09 ` Al Viro 2024-04-06 5:25 ` Amir Goldstein 2024-04-05 15:08 ` Amir Goldstein 2024-04-05 15:37 ` syzbot 2024-04-05 16:23 ` Al Viro 2024-04-06 5:34 ` Amir Goldstein 2024-04-06 7:05 ` syzbot 2024-04-06 7:11 ` Al Viro 2024-04-06 8:57 ` Amir Goldstein 2024-04-07 0:50 ` Al Viro 2024-04-07 11:02 ` Amir Goldstein 2024-04-09 9:18 ` Christian Brauner
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox