From: Thibaut Sautereau <thibaut@sautereau.fr>
To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Alexander Viro <viro@zeniv.linux.org.uk>,
	Christian Brauner <christian@brauner.io>
Subject: NULL pointer deref in put_fs_context with unprivileged LXC
Date: Thu, 10 Oct 2019 23:35:12 +0200
Message-ID: <20191010213512.GA875@gandi.net>

Since v5.1 and as of v5.3.5, I get the following oops every single time
I start an *unprivileged* LXC container:

	BUG: kernel NULL pointer dereference, address: 0000000000000043
	#PF: supervisor read access in kernel mode
	#PF: error_code(0x0000) - not-present page
	PGD 0 P4D 0 
	Oops: 0000 [#1] SMP PTI
	CPU: 3 PID: 3789 Comm: systemd Tainted: G                T 5.3.5 #5
	RIP: 0010:put_fs_context+0x13/0x180
	Code: e4 31 c9 eb c8 e8 1d d6 dc ff 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 41 54 55 48 89 fd 53 48 8b b>
	RSP: 0018:ffffc90000777e10 EFLAGS: 00010286
	RAX: 00000000fffffff3 RBX: 0000000000000000 RCX: ffffc90000777d6c
	RDX: 0000000000000000 RSI: ffff8884062331e8 RDI: fffffffffffffff3
	RBP: ffff8883e772dc00 R08: ffff88840d6bc680 R09: 0000000000000001
	R10: 0000000000000000 R11: 0000000000000001 R12: fffffffffffffff3
	R13: ffff888405ad2860 R14: ffff8883e772dc00 R15: 0000000000000027
	FS:  00007998d1444980(0000) GS:ffff88840f980000(0000) knlGS:0000000000000000
	CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
	CR2: 0000000000000043 CR3: 000000040d236003 CR4: 00000000001606e0
	Call Trace:
	 do_mount+0x2f6/0xab0
	 ksys_mount+0x79/0xc0
	 __x64_sys_mount+0x1d/0x30
	 do_syscall_64+0x68/0x666
	 entry_SYSCALL_64_after_hwframe+0x49/0xbe
	RIP: 0033:0x7998d23aafea
	Code: 48 8b 0d a9 0e 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 4>
	RSP: 002b:00007ffd4b0c8bc8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
	RAX: ffffffffffffffda RBX: 00005ae352a55a30 RCX: 00007998d23aafea
	RDX: 00005ae3529fe0b3 RSI: 00005ae3529fe0d5 RDI: 00005ae3529fe0b3
	RBP: 0000000000000007 R08: 00005ae3529fe0ca R09: 00005ae35433fb20
	R10: 000000000000000e R11: 0000000000000246 R12: 00000000fffffffe
	R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000001
	Modules linked in:
	CR2: 0000000000000043
	---[ end trace 66de701522a6be46 ]---
	RIP: 0010:put_fs_context+0x13/0x180
	Code: e4 31 c9 eb c8 e8 1d d6 dc ff 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 41 54 55 48 89 fd 53 48 8b b>
	RSP: 0018:ffffc90000777e10 EFLAGS: 00010286
	RAX: 00000000fffffff3 RBX: 0000000000000000 RCX: ffffc90000777d6c
	RDX: 0000000000000000 RSI: ffff8884062331e8 RDI: fffffffffffffff3
	RBP: ffff8883e772dc00 R08: ffff88840d6bc680 R09: 0000000000000001
	R10: 0000000000000000 R11: 0000000000000001 R12: fffffffffffffff3
	R13: ffff888405ad2860 R14: ffff8883e772dc00 R15: 0000000000000027
	FS:  00007998d1444980(0000) GS:ffff88840f980000(0000) knlGS:0000000000000000
	CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
	CR2: 0000000000000043 CR3: 000000040d236003 CR4: 00000000001606e0

According to GDB:
	$ gdb fs/fs_context.o
	(gdb) l *put_fs_context+0x13
	0xa53 is in put_fs_context (fs/fs_context.c:494).
	489	void put_fs_context(struct fs_context *fc)
	490	{
	491		struct super_block *sb;
	492
	493		if (fc->root) {
	494			sb = fc->root->d_sb;
	495			dput(fc->root);
	496			fc->root = NULL;
	497			deactivate_super(sb);
	498		}

	$ gdb fs/namespace.o
	(gdb) l *do_mount+0x2f6
	0x5506 is in do_mount (fs/namespace.c:2796).
	2791			err = vfs_get_tree(fc);
	2792		if (!err)
	2793			err = do_new_mount_fc(fc, path, mnt_flags);
	2794
	2795		put_fs_context(fc);
	2796		return err;
	2797	}
	2798
	2799	int finish_automount(struct vfsmount *m, struct path *path)
	2800	{


I don't hit this issue when starting the same container as a
privileged one. I ran strace on the container launched in the
foreground, and the following snippet may be related to the problem:

	[pid 35813] openat(AT_FDCWD, "/sys/fs", O_RDONLY|O_CLOEXEC|O_PATH|O_DIRECTORY) = 4
	[pid 35813] name_to_handle_at(4, "cgroup", {handle_bytes=128}, 0x7ffcdf6ebac4, AT_SYMLINK_FOLLOW) = -1 EOPNOTSUPP (Operation not supported)
	[pid 35813] name_to_handle_at(4, "", {handle_bytes=128}, 0x7ffcdf6ebac4, AT_EMPTY_PATH) = -1 EOPNOTSUPP (Operation not supported)
	[pid 35813] openat(4, "cgroup", O_RDONLY|O_CLOEXEC|O_PATH) = 5
	[pid 35813] openat(AT_FDCWD, "/proc/self/fdinfo/5", O_RDONLY|O_CLOEXEC) = 6
	[pid 35813] fstat(6, {st_mode=S_IFREG|0400, st_size=0, ...}) = 0
	[pid 35813] fstat(6, {st_mode=S_IFREG|0400, st_size=0, ...}) = 0
	[pid 35813] read(6, "pos:\t0\nflags:\t012000000\nmnt_id:\t"..., 2048) = 36
	[pid 35813] read(6, "", 1024)           = 0
	[pid 35813] close(6)                    = 0
	[pid 35813] close(5)                    = 0
	[pid 35813] openat(AT_FDCWD, "/proc/self/fdinfo/4", O_RDONLY|O_CLOEXEC) = 5
	[pid 35813] fstat(5, {st_mode=S_IFREG|0400, st_size=0, ...}) = 0
	[pid 35813] fstat(5, {st_mode=S_IFREG|0400, st_size=0, ...}) = 0
	[pid 35813] read(5, "pos:\t0\nflags:\t012200000\nmnt_id:\t"..., 2048) = 36
	[pid 35813] read(5, "", 1024)           = 0
	[pid 35813] close(5)                    = 0
	[pid 35813] newfstatat(4, "cgroup", {st_mode=S_IFDIR|0555, st_size=0, ...}, 0) = 0
	[pid 35813] newfstatat(4, "", {st_mode=S_IFDIR|0755, st_size=0, ...}, AT_EMPTY_PATH) = 0
	[pid 35813] close(4)                    = 0
	[pid 35813] stat("/sys/fs", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
	[pid 35813] mkdir("/sys/fs/cgroup", 0755) = -1 EEXIST (File exists)
	[pid 35813] stat("/sys/fs/cgroup", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
	[pid 35813] mount("tmpfs", "/sys/fs/cgroup", "tmpfs", MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_STRICTATIME, "mode=755") = 0
	[pid 35813] statfs("/sys/fs/cgroup/", {f_type=TMPFS_MAGIC, f_bsize=4096, f_blocks=2032290, f_bfree=2032290, f_bavail=2032290, f_files=2032290, f_ffree=2032289, f_fsid={val=[0, 0]}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID|ST_NOSUID|ST_NODEV|ST_NOEXEC}) = 0
	[pid 35813] statfs("/sys/fs/cgroup/unified/", 0x7ffcdf6ebc10) = -1 ENOENT (No such file or directory)
	[pid 35813] statfs("/sys/fs/cgroup/systemd/", 0x7ffcdf6ebc10) = -1 ENOENT (No such file or directory)
	[pid 35813] openat(AT_FDCWD, "/proc/1/cmdline", O_RDONLY|O_CLOEXEC) = 4
	[pid 35813] prlimit64(0, RLIMIT_STACK, NULL, {rlim_cur=8192*1024, rlim_max=RLIM64_INFINITY}) = 0
	[pid 35813] mmap(NULL, 2101248, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7999ced5b000
	[pid 35813] fstat(4, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
	[pid 35813] read(4, "/sbin/init\0", 1024) = 11
	[pid 35813] read(4, "", 1024)           = 0
	[pid 35813] mremap(0x7999ced5b000, 2101248, 4096, MREMAP_MAYMOVE) = 0x7999ced5b000
	[pid 35813] close(4)                    = 0
	[pid 35813] munmap(0x7999ced5b000, 4096) = 0
	[pid 35813] openat(AT_FDCWD, "/", O_RDONLY|O_NOFOLLOW|O_CLOEXEC|O_PATH) = 4
	[pid 35813] openat(4, "sys", O_RDONLY|O_NOFOLLOW|O_CLOEXEC|O_PATH) = 5
	[pid 35813] fstat(5, {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
	[pid 35813] close(4)                    = 0
	[pid 35813] openat(5, "fs", O_RDONLY|O_NOFOLLOW|O_CLOEXEC|O_PATH) = 4
	[pid 35813] fstat(4, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
	[pid 35813] close(5)                    = 0
	[pid 35813] openat(4, "cgroup", O_RDONLY|O_NOFOLLOW|O_CLOEXEC|O_PATH) = 5
	[pid 35813] fstat(5, {st_mode=S_IFDIR|0755, st_size=40, ...}) = 0
	[pid 35813] close(4)                    = 0
	[pid 35813] openat(5, "unified", O_RDONLY|O_NOFOLLOW|O_CLOEXEC|O_PATH) = -1 ENOENT (No such file or directory)
	[pid 35813] close(5)                    = 0
	[pid 35813] stat("/sys/fs/cgroup", {st_mode=S_IFDIR|0755, st_size=40, ...}) = 0
	[pid 35813] mkdir("/sys/fs/cgroup/unified", 0755) = 0
	[pid 35813] mount("cgroup2", "/sys/fs/cgroup/unified", "cgroup2", MS_NOSUID|MS_NODEV|MS_NOEXEC, "nsdelegate") = ?
	[pid 35813] +++ killed by SIGKILL +++

I've tried to reproduce it by playing with user namespaces and
cgroup2 mounts, without success. Only an lxc-start of an
unprivileged container triggers the oops (every single time). I wanted
to dive into the code, but I hadn't looked at this part of the kernel
since the recent rework of the filesystem mounting internals, so I've
been postponing that for weeks, and I thought it was time to report the
bug anyway. Sorry for the lack of more detailed info :/

-- 
Thibaut
