* [PATCH] block: Avoid mounting the bdev pseudo-filesystem in userspace
@ 2026-05-21 7:28 Denis Arefev
2026-05-25 6:07 ` Christoph Hellwig
` (2 more replies)
0 siblings, 3 replies; 12+ messages in thread
From: Denis Arefev @ 2026-05-21 7:28 UTC (permalink / raw)
To: Jens Axboe; +Cc: linux-block, linux-kernel, lvc-project, stable
The bdev pseudo-filesystem is an internal kernel filesystem with which
userspace should not interfere. Unregister it so that userspace cannot
even attempt to mount it.
This fixes a bug [1] that occurs when attempting to access files,
because the system call move_mount() uses pointers declared in the
inode_operations structure, which for the bdev pseudo-filesystem
are always equal to 0. `inode->i_op = &empty_iops;`
[1]
BUG: kernel NULL pointer dereference, address: 0000000000000000
#PF: supervisor instruction fetch in kernel mode
#PF: error_code(0x0010) - not-present page
PGD 23380067 P4D 23380067 PUD 23381067 PMD 0
Oops: 0010 [#1] PREEMPT SMP KASAN NOPTI
CPU: 2 PID: 17125 Comm: syz-executor.0 Not tainted 6.1.155-syzkaller-00350-g84221fde2681 #0
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
RIP: 0010:0x0
Call Trace:
<TASK>
lookup_open.isra.0+0x700/0x1180 fs/namei.c:3460
open_last_lookups fs/namei.c:3550 [inline]
path_openat+0x953/0x2700 fs/namei.c:3780
do_filp_open+0x1c5/0x410 fs/namei.c:3810
do_sys_openat2+0x171/0x4d0 fs/open.c:1318
do_sys_open fs/open.c:1334 [inline]
__do_sys_openat fs/open.c:1350 [inline]
__se_sys_openat fs/open.c:1345 [inline]
__x64_sys_openat+0x13c/0x1f0 fs/open.c:1345
do_syscall_x64 arch/x86/entry/common.c:51 [inline]
do_syscall_64+0x35/0x80 arch/x86/entry/common.c:81
entry_SYSCALL_64_after_hwframe+0x6e/0xd8
Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Link: https://lore.kernel.org/all/20131010004732.GJ13318@ZenIV.linux.org.uk/T/#
Cc: stable@vger.kernel.org
Signed-off-by: Denis Arefev <arefev@swemel.ru>
---
block/bdev.c | 5 -----
1 file changed, 5 deletions(-)
diff --git a/block/bdev.c b/block/bdev.c
index bb0ffa3bb4df..107ac9eaac7f 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -446,15 +446,10 @@ EXPORT_SYMBOL_GPL(blockdev_superblock);
 void __init bdev_cache_init(void)
 {
-	int err;
-
 	bdev_cachep = kmem_cache_create("bdev_cache", sizeof(struct bdev_inode),
 			0, (SLAB_HWCACHE_ALIGN|SLAB_RECLAIM_ACCOUNT|
 				SLAB_ACCOUNT|SLAB_PANIC),
 			init_once);
-	err = register_filesystem(&bd_type);
-	if (err)
-		panic("Cannot register bdev pseudo-fs");
 	blockdev_mnt = kern_mount(&bd_type);
 	if (IS_ERR(blockdev_mnt))
 		panic("Cannot create bdev pseudo-fs");
--
2.43.0
^ permalink raw reply related [flat|nested] 12+ messages in thread

* Re: [PATCH] block: Avoid mounting the bdev pseudo-filesystem in userspace
2026-05-21 7:28 [PATCH] block: Avoid mounting the bdev pseudo-filesystem in userspace Denis Arefev
@ 2026-05-25 6:07 ` Christoph Hellwig
2026-06-02 1:21 ` Al Viro
2026-05-26 16:37 ` Jens Axboe
2026-06-02 1:19 ` Al Viro
2 siblings, 1 reply; 12+ messages in thread
From: Christoph Hellwig @ 2026-05-25 6:07 UTC (permalink / raw)
To: Denis Arefev; +Cc: Jens Axboe, linux-block, linux-kernel, lvc-project, stable

On Thu, May 21, 2026 at 10:28:56AM +0300, Denis Arefev wrote:
> The bdev pseudo-filesystem is an internal kernel filesystem with which
> userspace should not interfere. Unregister it so that userspace cannot
> even attempt to mount it.
>
> This fixes a bug [1] that occurs when attempting to access files,
> because the system call move_mount() uses pointers declared in the
> inode_operations structure, which for the bdev pseudo-filesystem
> are always equal to 0. `inode->i_op = &empty_iops;`

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>
* Re: [PATCH] block: Avoid mounting the bdev pseudo-filesystem in userspace
2026-05-25 6:07 ` Christoph Hellwig
@ 2026-06-02 1:21 ` Al Viro
0 siblings, 0 replies; 12+ messages in thread
From: Al Viro @ 2026-06-02 1:21 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Denis Arefev, Jens Axboe, linux-block, linux-kernel, lvc-project, stable

On Sun, May 24, 2026 at 11:07:18PM -0700, Christoph Hellwig wrote:
> On Thu, May 21, 2026 at 10:28:56AM +0300, Denis Arefev wrote:
> > The bdev pseudo-filesystem is an internal kernel filesystem with which
> > userspace should not interfere. Unregister it so that userspace cannot
> > even attempt to mount it.
> >
> > This fixes a bug [1] that occurs when attempting to access files,
> > because the system call move_mount() uses pointers declared in the
> > inode_operations structure, which for the bdev pseudo-filesystem
> > are always equal to 0. `inode->i_op = &empty_iops;`
>
> Looks good:

It really, really does not. I would like to see the reproducer - analysis
looks like random noise out of LLM. I've no real problem with removing
that register_filesystem(), but if it *does* fix some reproducer, I really
want to see details.
* Re: [PATCH] block: Avoid mounting the bdev pseudo-filesystem in userspace
2026-05-21 7:28 [PATCH] block: Avoid mounting the bdev pseudo-filesystem in userspace Denis Arefev
2026-05-25 6:07 ` Christoph Hellwig
@ 2026-05-26 16:37 ` Jens Axboe
2026-06-02 1:19 ` Al Viro
2 siblings, 0 replies; 12+ messages in thread
From: Jens Axboe @ 2026-05-26 16:37 UTC (permalink / raw)
To: Denis Arefev; +Cc: linux-block, linux-kernel, lvc-project, stable

On Thu, 21 May 2026 10:28:56 +0300, Denis Arefev wrote:
> The bdev pseudo-filesystem is an internal kernel filesystem with which
> userspace should not interfere. Unregister it so that userspace cannot
> even attempt to mount it.
>
> This fixes a bug [1] that occurs when attempting to access files,
> because the system call move_mount() uses pointers declared in the
> inode_operations structure, which for the bdev pseudo-filesystem
> are always equal to 0. `inode->i_op = &empty_iops;`
>
> [...]

Applied, thanks!

[1/1] block: Avoid mounting the bdev pseudo-filesystem in userspace
      commit: b518ae170f6c411cac2d5f320278c27d902bc628

Best regards,
--
Jens Axboe
* Re: [PATCH] block: Avoid mounting the bdev pseudo-filesystem in userspace
2026-05-21 7:28 [PATCH] block: Avoid mounting the bdev pseudo-filesystem in userspace Denis Arefev
2026-05-25 6:07 ` Christoph Hellwig
2026-05-26 16:37 ` Jens Axboe
@ 2026-06-02 1:19 ` Al Viro
2026-06-02 1:35 ` Al Viro
2 siblings, 1 reply; 12+ messages in thread
From: Al Viro @ 2026-06-02 1:19 UTC (permalink / raw)
To: Denis Arefev; +Cc: Jens Axboe, linux-block, linux-kernel, lvc-project, stable

On Thu, May 21, 2026 at 10:28:56AM +0300, Denis Arefev wrote:
> The bdev pseudo-filesystem is an internal kernel filesystem with which
> userspace should not interfere. Unregister it so that userspace cannot
> even attempt to mount it.
>
> This fixes a bug [1] that occurs when attempting to access files,
> because the system call move_mount() uses pointers declared in the
> inode_operations structure, which for the bdev pseudo-filesystem
> are always equal to 0. `inode->i_op = &empty_iops;`

What? init_pseudo() sets SB_NOUSER; what are you talking about?

And assuming you've somehow managed to mount the sucker, which
->i_op method had been accessed?
* Re: [PATCH] block: Avoid mounting the bdev pseudo-filesystem in userspace
2026-06-02 1:19 ` Al Viro
@ 2026-06-02 1:35 ` Al Viro
2026-06-02 2:04 ` [PATCH] make new mount API honour SB_NOUSER (was Re: [PATCH] block: Avoid mounting the bdev pseudo-filesystem in userspace) Al Viro
0 siblings, 1 reply; 12+ messages in thread
From: Al Viro @ 2026-06-02 1:35 UTC (permalink / raw)
To: Denis Arefev
Cc: linux-fsdevel, Jens Axboe, linux-block, linux-kernel, lvc-project, stable

On Tue, Jun 02, 2026 at 02:19:07AM +0100, Al Viro wrote:
> On Thu, May 21, 2026 at 10:28:56AM +0300, Denis Arefev wrote:
> > The bdev pseudo-filesystem is an internal kernel filesystem with which
> > userspace should not interfere. Unregister it so that userspace cannot
> > even attempt to mount it.
> >
> > This fixes a bug [1] that occurs when attempting to access files,
> > because the system call move_mount() uses pointers declared in the
> > inode_operations structure, which for the bdev pseudo-filesystem
> > are always equal to 0. `inode->i_op = &empty_iops;`
>
> What? init_pseudo() sets SB_NOUSER; what are you talking about?

... which doesn't suffice, apparently, since now bdev has become mountable,
along with the rest of pseudo-fs. *THAT* is a bug.

> And assuming you've somehow managed to mount the sucker, which
> ->i_op method had been accessed?

->lookup(), apparently. Which means that 'directory' should've been
rejected by d_can_lookup(), no matter which filesystem it's been from.
Which might or might not be a bug in its own right.

In any case, NAK on that patch - it's papering over the real bug that has
nothing to do with block layer.

mount -t bdev none /mnt must fail, same as for pipefs, sockfs, etc.
It doesn't.

fsdevel Cc'd, as it should've been from the very beginning.
* [PATCH] make new mount API honour SB_NOUSER (was Re: [PATCH] block: Avoid mounting the bdev pseudo-filesystem in userspace)
2026-06-02 1:35 ` Al Viro
@ 2026-06-02 2:04 ` Al Viro
2026-06-02 9:11 ` Jan Kara
2026-06-02 14:55 ` Christian Brauner
0 siblings, 2 replies; 12+ messages in thread
From: Al Viro @ 2026-06-02 2:04 UTC (permalink / raw)
To: Linus Torvalds
Cc: Christian Brauner, Jan Kara, linux-fsdevel, Jens Axboe, linux-block, linux-kernel, lvc-project, stable, Denis Arefev

one should *not* be allowed to mount one of those, new API or not.

Reported-by: Denis Arefev <arefev@swemel.ru>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
[[ I still want to see the rest of the reproducer - report smells like a missing
d_can_lookup() somewhere, on top of fsmount(2) bug]]
diff --git a/fs/namespace.c b/fs/namespace.c
index fe919abd2f01..17777c837683 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -4499,6 +4499,10 @@ SYSCALL_DEFINE3(fsmount, int, fs_fd, unsigned int, flags,
 	new_mnt = vfs_create_mount(fc);
 	if (IS_ERR(new_mnt))
 		return PTR_ERR(new_mnt);
+	if (new_mnt->mnt_sb->s_flags & SB_NOUSER) {
+		mntput(new_mnt);
+		return -EINVAL;
+	}
 	new_mnt->mnt_flags = mnt_flags;
 
 	new_path.dentry = dget(fc->root);
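[Editorial sketch, not part of the thread.] The check added to fsmount(2) in the patch above can be modelled in plain userspace C. This is only a toy under stated assumptions: the `model_*` types and helpers are hypothetical stand-ins for struct super_block, struct vfsmount and mntput(), and `MODEL_SB_NOUSER` is an illustrative bit rather than the kernel's own definition. It shows the shape of the logic only: once the mount exists, look at the flags of the resulting superblock, drop the reference and refuse with -EINVAL if the no-user flag is set.

```c
#include <assert.h>
#include <errno.h>

/* Illustrative flag bit; the real SB_NOUSER lives in the kernel headers. */
#define MODEL_SB_NOUSER (1UL << 18)

/* Hypothetical, stripped-down stand-ins for the kernel structures. */
struct model_super_block {
	unsigned long s_flags;
};

struct model_mount {
	struct model_super_block *mnt_sb;
	int refs;			/* toy reference count */
};

/* Toy counterpart of mntput(): drop one reference. */
static void model_mntput(struct model_mount *mnt)
{
	assert(mnt->refs > 0);
	mnt->refs--;
}

/*
 * Model of the added check: returns 0 if the freshly created mount may be
 * handed to userspace, or -EINVAL (after dropping the reference) if the
 * superblock is marked as not mountable by users.
 */
static int model_fsmount_check(struct model_mount *new_mnt)
{
	if (new_mnt->mnt_sb->s_flags & MODEL_SB_NOUSER) {
		model_mntput(new_mnt);
		return -EINVAL;
	}
	return 0;
}
```

In this model the decision depends on the flags of the superblock that was actually produced, not on anything the caller requested, which is the point made later in the thread about ->s_flags versus fc->sb_flags.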
* Re: [PATCH] make new mount API honour SB_NOUSER (was Re: [PATCH] block: Avoid mounting the bdev pseudo-filesystem in userspace)
2026-06-02 2:04 ` [PATCH] make new mount API honour SB_NOUSER (was Re: [PATCH] block: Avoid mounting the bdev pseudo-filesystem in userspace) Al Viro
@ 2026-06-02 9:11 ` Jan Kara
2026-06-02 13:23 ` Arefev
2026-06-02 14:07 ` Al Viro
2026-06-02 14:55 ` Christian Brauner
1 sibling, 2 replies; 12+ messages in thread
From: Jan Kara @ 2026-06-02 9:11 UTC (permalink / raw)
To: Al Viro
Cc: Linus Torvalds, Christian Brauner, Jan Kara, linux-fsdevel, Jens Axboe, linux-block, linux-kernel, lvc-project, stable, Denis Arefev

On Tue 02-06-26 03:04:44, Al Viro wrote:
> one should *not* be allowed to mount one of those, new API or not.
>
> Reported-by: Denis Arefev <arefev@swemel.ru>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Won't it make sense to actually check fc->sb_flags before we call
vfs_create_mount()? Otherwise it looks good to me.

								Honza

> ---
> [[ I still want to see the rest of the reproducer - report smells like a missing
> d_can_lookup() somewhere, on top of fsmount(2) bug]]
> diff --git a/fs/namespace.c b/fs/namespace.c
> index fe919abd2f01..17777c837683 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -4499,6 +4499,10 @@ SYSCALL_DEFINE3(fsmount, int, fs_fd, unsigned int, flags,
>  	new_mnt = vfs_create_mount(fc);
>  	if (IS_ERR(new_mnt))
>  		return PTR_ERR(new_mnt);
> +	if (new_mnt->mnt_sb->s_flags & SB_NOUSER) {
> +		mntput(new_mnt);
> +		return -EINVAL;
> +	}
>  	new_mnt->mnt_flags = mnt_flags;
>
>  	new_path.dentry = dget(fc->root);
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
* Re: [PATCH] make new mount API honour SB_NOUSER (was Re: [PATCH] block: Avoid mounting the bdev pseudo-filesystem in userspace)
2026-06-02 9:11 ` Jan Kara
@ 2026-06-02 13:23 ` Arefev
2026-06-02 14:54 ` Al Viro
2026-06-02 14:07 ` Al Viro
1 sibling, 1 reply; 12+ messages in thread
From: Arefev @ 2026-06-02 13:23 UTC (permalink / raw)
To: Jan Kara, Al Viro
Cc: Linus Torvalds, Christian Brauner, linux-fsdevel, Jens Axboe, linux-block, linux-kernel, lvc-project, stable

02.06.2026 12:11, Jan Kara wrote:
> On Tue 02-06-26 03:04:44, Al Viro wrote:
>> one should *not* be allowed to mount one of those, new API or not.
>>
>> Reported-by: Denis Arefev <arefev@swemel.ru>
>> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> Won't it make sense to actually check fc->sb_flags before we call
> vfs_create_mount()? Otherwise it looks good to me.
>
> 								Honza

Hi all.

The sequence of system calls before the crash could be as follows:

fsopen("bdev", ...)
fsconfig(fd_fs, FSCONFIG_CMD_CREATE, 0,0,0)
fsmount(fd_fs, 0,0)
move_mount(fd_mnt, "", AT_FDCWD, "./file1", 0x46ul)

The system call executed at the time of the crash:

open("/dev/media0", ...);

Simplified stacktrace:

path_openat
 |-> link_path_walk
  |-> walk_component
   |-> __lookup_slow
    |-> ld = inode->i_op->lookup(inode, dentry, flags); <- Oops

Searching for possible solutions in the commit history yielded the
following result:

commit fd3e007f6c6a0f677e4ee8aca4b9bab8ad6cab9a
commit 1a6e9e76b713d9632783efe78295ed3507fdad64
commit d6f2589ad561aa5fa39f347eca6942668b7560a1

Checking the fc->sb_flags flag before calling vfs_create_mount() is a great
idea, if it helps prevent crashes in two more file systems, 'sockfs' and
'pipefs'.

Best regards, Denis.

>
>> ---
>> [[ I still want to see the rest of the reproducer - report smells like a missing
>> d_can_lookup() somewhere, on top of fsmount(2) bug]]
>> diff --git a/fs/namespace.c b/fs/namespace.c
>> index fe919abd2f01..17777c837683 100644
>> --- a/fs/namespace.c
>> +++ b/fs/namespace.c
>> @@ -4499,6 +4499,10 @@ SYSCALL_DEFINE3(fsmount, int, fs_fd, unsigned int, flags,
>>  	new_mnt = vfs_create_mount(fc);
>>  	if (IS_ERR(new_mnt))
>>  		return PTR_ERR(new_mnt);
>> +	if (new_mnt->mnt_sb->s_flags & SB_NOUSER) {
>> +		mntput(new_mnt);
>> +		return -EINVAL;
>> +	}
>>  	new_mnt->mnt_flags = mnt_flags;
>>
>>  	new_path.dentry = dget(fc->root);
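[Editorial sketch, not part of the thread.] The first three steps of the sequence described in the message above (fsopen, fsconfig with FSCONFIG_CMD_CREATE, fsmount) can be issued from userspace through syscall(2). The helper below is a minimal sketch with assumptions spelled out: `try_new_api_mount` is a hypothetical name, the fallback syscall numbers are only for older headers that lack the SYS_* definitions, and `SKETCH_FSCONFIG_CMD_CREATE` is a local constant meant to mirror the uapi value. It stops short of move_mount() and does not try to reproduce the reported oops; it only reports whether the kernel agreed to create a mount for a given filesystem type.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Fallbacks for older headers only (assumption: generic syscall numbering). */
#ifndef SYS_fsopen
#define SYS_fsopen 430
#endif
#ifndef SYS_fsconfig
#define SYS_fsconfig 431
#endif
#ifndef SYS_fsmount
#define SYS_fsmount 432
#endif

/* Local mirror of FSCONFIG_CMD_CREATE, named differently to avoid header clashes. */
enum { SKETCH_FSCONFIG_CMD_CREATE = 6 };

/*
 * Hypothetical helper: run fsopen -> fsconfig(CMD_CREATE) -> fsmount for
 * the given filesystem type. Returns 0 if the kernel handed back a mount
 * fd, or -errno from the first step that failed.
 */
static int try_new_api_mount(const char *fstype)
{
	int fs_fd, mnt_fd, ret = 0;

	assert(fstype != NULL);

	fs_fd = (int)syscall(SYS_fsopen, fstype, 0);
	if (fs_fd < 0)
		return -errno;

	if (syscall(SYS_fsconfig, fs_fd, SKETCH_FSCONFIG_CMD_CREATE,
		    NULL, NULL, 0) < 0) {
		ret = -errno;
	} else {
		mnt_fd = (int)syscall(SYS_fsmount, fs_fd, 0, 0);
		if (mnt_fd < 0)
			ret = -errno;
		else
			close(mnt_fd);	/* never attached anywhere */
	}

	close(fs_fd);
	return ret;
}
```

Run with sufficient privileges, a call such as `try_new_api_mount("bdev")` would presumably return 0 on a kernel showing the behaviour described in this thread and a negative errno once fsmount(2) refuses SB_NOUSER superblocks; without privileges the first step is expected to fail regardless.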
* Re: [PATCH] make new mount API honour SB_NOUSER (was Re: [PATCH] block: Avoid mounting the bdev pseudo-filesystem in userspace)
2026-06-02 13:23 ` Arefev
@ 2026-06-02 14:54 ` Al Viro
0 siblings, 0 replies; 12+ messages in thread
From: Al Viro @ 2026-06-02 14:54 UTC (permalink / raw)
To: Arefev
Cc: Jan Kara, Linus Torvalds, Christian Brauner, linux-fsdevel, Jens Axboe, linux-block, linux-kernel, lvc-project, stable

On Tue, Jun 02, 2026 at 04:23:21PM +0300, Arefev wrote:

> The sequence of system calls before the crash could be as follows:
>
> fsopen("bdev", ...)
> fsconfig(fd_fs, FSCONFIG_CMD_CREATE, 0,0,0)
> fsmount(fd_fs, 0,0)
> move_mount(fd_mnt, "", AT_FDCWD, "./file1", 0x46ul)

Huh? "file1" being a regular file or was it actually a directory?
AFAICS, the d_is_dir() mismatch would be rejected by do_move_mount()...

> The system call executed at the time of the cras:
>
> open("/dev/media0", ...);
>
> Simplified stacktrace:
>
> path_openat
>  |-> link_path_walk
>   |-> walk_component
>    |-> __lookup_slow
>     |-> ld = inode->i_op->lookup(inode, dentry, flags); <- Oops

How the hell does that thing bound on top of "./file1" lead to resolution
of "/dev/media0" walking anywhere near it? Something's missing here.

> Checking the fc->sb_flags flag before calling vfs_create_mount() is a great
> idea,
> if it helps prevent crashes in two more file systems, 'sockfs' and 'pipefs'.

Calling vfs_create_mount() is not a problem; refusing to attach the result
if SB_NOUSER has ended up in ->s_flags is the right thing to do, but I still
would like to understand how did this call of walk_component() manage to
evade

	if (unlikely(!d_can_lookup(nd->path.dentry))) {
		if (nd->flags & LOOKUP_RCU) {
			if (!try_to_unlazy(nd))
				return -ECHILD;
		}
		return -ENOTDIR;
	}

on the previous iteration through link_path_walk() or, if it had been the
first one, the corresponding checks at chroot()/chdir()/fchdir() time.

Note that there are very legitimate objects with NULL ->lookup() - every
regular file is like that, obviously, but there also exist ones that look
like directories in mode bits, but still have NULL ->lookup(). See
d_flags_for_inode() and look for DCACHE_AUTODIR_TYPE there.

So whatever scenario has played out, you've got a call of walk_component()
with nd->path.dentry that should have failed d_can_lookup(). That ought
to have been prevented and this prevention would better be much closer than
anything fsmount(2) does.

Don't get me wrong - userland mounting of bdev and friends should not
be allowed, but that's not the only thing that went wrong in the reproducer.

BTW, how easy to trigger it is? Is that "you need to run for a few months
on a bunch of boxen" or "run this sequence and it'll crash that way"?
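[Editorial sketch, not part of the thread.] The point made above, that a path walk should never call ->lookup() on an object which has none, can be illustrated with a small userspace model. Everything here is hypothetical: the `model_*` names are invented for illustration, and the test in `model_can_lookup()` is a deliberate simplification, since the real d_can_lookup() examines dentry type flags computed elsewhere rather than the method pointer directly.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct model_inode;

/* Toy counterpart of the ->lookup() method. */
typedef int (*model_lookup_fn)(struct model_inode *dir, const char *name);

struct model_inode_ops {
	model_lookup_fn lookup;
};

struct model_inode {
	const struct model_inode_ops *i_op;
};

/* Stand-in for an "empty" operations table: every method is NULL. */
static const struct model_inode_ops model_empty_iops = { .lookup = NULL };

/* A directory-like object that really can resolve names. */
static int model_dir_lookup(struct model_inode *dir, const char *name)
{
	(void)dir;
	(void)name;
	return 0;	/* pretend the child was found */
}

static const struct model_inode_ops model_dir_iops = {
	.lookup = model_dir_lookup,
};

/* Simplified model of d_can_lookup(): walkable only with a lookup method. */
static int model_can_lookup(const struct model_inode *inode)
{
	return inode->i_op->lookup != NULL;
}

/*
 * Model of one walk step: reject with -ENOTDIR before the method pointer
 * is ever called, so a NULL ->lookup() cannot be jumped through.
 */
static int model_walk_component(struct model_inode *dir, const char *name)
{
	assert(dir != NULL && dir->i_op != NULL);
	if (!model_can_lookup(dir))
		return -ENOTDIR;
	return dir->i_op->lookup(dir, name);
}
```

With this guard in place, an object using the empty table yields -ENOTDIR instead of a jump through a NULL pointer; the open question raised in the message above is how the reported walk got past the equivalent guard in the kernel.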
* Re: [PATCH] make new mount API honour SB_NOUSER (was Re: [PATCH] block: Avoid mounting the bdev pseudo-filesystem in userspace)
2026-06-02 9:11 ` Jan Kara
2026-06-02 13:23 ` Arefev
@ 2026-06-02 14:07 ` Al Viro
1 sibling, 0 replies; 12+ messages in thread
From: Al Viro @ 2026-06-02 14:07 UTC (permalink / raw)
To: Jan Kara
Cc: Linus Torvalds, Christian Brauner, linux-fsdevel, Jens Axboe, linux-block, linux-kernel, lvc-project, stable, Denis Arefev

On Tue, Jun 02, 2026 at 11:11:11AM +0200, Jan Kara wrote:
> On Tue 02-06-26 03:04:44, Al Viro wrote:
> > one should *not* be allowed to mount one of those, new API or not.
> >
> > Reported-by: Denis Arefev <arefev@swemel.ru>
> > Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
>
> Won't it make sense to actually check fc->sb_flags before we call
> vfs_create_mount()? Otherwise it looks good to me.

Interpretation of fc->sb_flags is up to your ->get_tree(). What matters
is ->s_flags in the resulting superblock; that's type-independent and
that's what we ought to check...
* Re: [PATCH] make new mount API honour SB_NOUSER (was Re: [PATCH] block: Avoid mounting the bdev pseudo-filesystem in userspace)
2026-06-02 2:04 ` [PATCH] make new mount API honour SB_NOUSER (was Re: [PATCH] block: Avoid mounting the bdev pseudo-filesystem in userspace) Al Viro
2026-06-02 9:11 ` Jan Kara
@ 2026-06-02 14:55 ` Christian Brauner
1 sibling, 0 replies; 12+ messages in thread
From: Christian Brauner @ 2026-06-02 14:55 UTC (permalink / raw)
To: Linus Torvalds, Al Viro
Cc: Christian Brauner, Jan Kara, linux-fsdevel, Jens Axboe, linux-block, linux-kernel, lvc-project, stable, Denis Arefev

On Tue, 02 Jun 2026 03:04:44 +0100, Al Viro wrote:
> one should *not* be allowed to mount one of those, new API or not.

Applied to the vfs-7.2.misc branch of the vfs/vfs.git tree.
Patches in the vfs-7.2.misc branch should appear in linux-next soon.

Please report any outstanding bugs that were missed during review in a
new review to the original patch series allowing us to drop it.

It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible patch trailers will be updated.

Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.

tree: https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs-7.2.misc

[1/1] mount: honour SB_NOUSER in the new mount API
      https://git.kernel.org/vfs/vfs/c/67d8c452fae1