[PATCH review 00/18] userns: review of bug fixes for 3.19-rcX

public inbox for linux-fsdevel@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH review 00/18] userns: review of bug fixes for 3.19-rcX
@ 2014-12-12 22:32 Eric W. Biederman
  2014-12-12 22:48 ` [PATCH review 01/18] mnt: Implicitly add MNT_NODEV on remount when it was implicitly added by mount Eric W. Biederman
                   ` (14 more replies)
  0 siblings, 15 replies; 24+ messages in thread
From: Eric W. Biederman @ 2014-12-12 22:32 UTC (permalink / raw)
  To: Linux Containers
  Cc: linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Richard Weinberger,
	Andy Lutomirski

The entire tree for testing is available at:
	git.kernel.org:/pub/scm/linux/kernel/git/ebiederm/user-namespace.git for-testing

This is my queue of important bug fixes for user namespaces.  Most of
these changes warrant being backported.  A few are bug fixes for cases
where only root can trigger the issue so have not been marked for being
back ported to stable.

A few of these patches have not been posted for review preivously, so I
a giving the light of mailling list before I send them to Linus.  This
patchset has seen some testing already. 

Since there are small deliberate breakage of userspace in here the more
reviewers/testers the better.

Baring complictions I intend to ask Linus to pull this patchset sometime
early next week.

So far nothing broke on my libvirt-lxc test bed. :-)
Tested with openSUSE 13.2 and libvirt 1.2.9.
Tested-by: Richard Weinberger <richard-/L3Ra7n9ekc@public.gmane.org>

Tested on Fedora20 with libvirt 1.2.11, works fine.
Tested-by: Chen Hanxiao <chenhanxiao-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>

Eric W. Biederman (18):
      mnt: Implicitly add MNT_NODEV on remount when it was implicitly added by mount
      mnt: Update unprivileged remount test
      umount: Disallow unprivileged mount force
      umount: Do not allow unmounting rootfs.
      mnt: Move the clear of MNT_LOCKED from copy_tree to it's callers.
      mnt: Carefully set CL_UNPRIVILEGED in clone_mnt
      mnt: Clear mnt_expire during pivot_root
      groups: Consolidate the setgroups permission checks
      userns: Document what the invariant required for safe unprivileged mappings.
      userns: Don't allow setgroups until a gid mapping has been setablished
      userns: Don't allow unprivileged creation of gid mappings
      userns: Check euid no fsuid when establishing an unprivileged uid mapping
      userns: Only allow the creator of the userns unprivileged mappings
      userns: Rename id_map_mutex to userns_state_mutex
      userns: Add a knob to disable setgroups on a per user namespace basis
      userns: Allow setting gid_maps without privilege when setgroups is disabled
      userns; Correct the comment in map_write
      userns: Unbreak the unprivileged remount tests

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH review 01/18] mnt: Implicitly add MNT_NODEV on remount when it was implicitly added by mount
  2014-12-12 22:32 [PATCH review 00/18] userns: review of bug fixes for 3.19-rcX Eric W. Biederman
@ 2014-12-12 22:48 ` Eric W. Biederman
  2014-12-12 22:48 ` [PATCH review 02/18] mnt: Update unprivileged remount test Eric W. Biederman
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 24+ messages in thread
From: Eric W. Biederman @ 2014-12-12 22:48 UTC (permalink / raw)
  To: Linux Containers
  Cc: linux-fsdevel, Serge E. Hallyn, Andy Lutomirski, Chen Hanxiao,
	Richard Weinberger, Eric W. Biederman, stable

Now that remount is properly enforcing the rule that you can't remove
nodev at least sandstorm.io is breaking when performing a remount.

It turns out that there is an easy intuitive solution implicitly
add nodev on remount when nodev was implicitly added on mount.

Tested-by: Cedric Bosdonnat <cbosdonnat@suse.com>
Tested-by: Richard Weinberger <richard@nod.at>
Cc: stable@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 fs/namespace.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 5b66b2b3624d..3a1a87dc33df 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2098,7 +2098,13 @@ static int do_remount(struct path *path, int flags, int mnt_flags,
 	}
 	if ((mnt->mnt.mnt_flags & MNT_LOCK_NODEV) &&
 	    !(mnt_flags & MNT_NODEV)) {
-		return -EPERM;
+		/* Was the nodev implicitly added in mount? */
+		if ((mnt->mnt_ns->user_ns != &init_user_ns) &&
+		    !(sb->s_type->fs_flags & FS_USERNS_DEV_MOUNT)) {
+			mnt_flags |= MNT_NODEV;
+		} else {
+			return -EPERM;
+		}
 	}
 	if ((mnt->mnt.mnt_flags & MNT_LOCK_NOSUID) &&
 	    !(mnt_flags & MNT_NOSUID)) {
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH review 02/18] mnt: Update unprivileged remount test
  2014-12-12 22:32 [PATCH review 00/18] userns: review of bug fixes for 3.19-rcX Eric W. Biederman
  2014-12-12 22:48 ` [PATCH review 01/18] mnt: Implicitly add MNT_NODEV on remount when it was implicitly added by mount Eric W. Biederman
@ 2014-12-12 22:48 ` Eric W. Biederman
       [not found] ` <87k31wzehb.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 24+ messages in thread
From: Eric W. Biederman @ 2014-12-12 22:48 UTC (permalink / raw)
  To: Linux Containers
  Cc: linux-fsdevel, Serge E. Hallyn, Andy Lutomirski, Chen Hanxiao,
	Richard Weinberger, Eric W. Biederman, stable

- MNT_NODEV should be irrelevant except when reading back mount flags,
  no longer specify MNT_NODEV on remount.

- Test MNT_NODEV on devpts where it is meaningful even for unprivileged mounts.

- Add a test to verify that remount of a prexisting mount with the same flags
  is allowed and does not change those flags.

- Cleanup up the definitions of MS_REC, MS_RELATIME, MS_STRICTATIME that are used
  when the code is built in an environment without them.

- Correct the test error messages when tests fail.  There were not 5 tests
  that tested MS_RELATIME.

Cc: stable@vger.kernel.org
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 .../selftests/mount/unprivileged-remount-test.c    | 172 +++++++++++++++++----
 1 file changed, 142 insertions(+), 30 deletions(-)

diff --git a/tools/testing/selftests/mount/unprivileged-remount-test.c b/tools/testing/selftests/mount/unprivileged-remount-test.c
index 1b3ff2fda4d0..9669d375625a 100644
--- a/tools/testing/selftests/mount/unprivileged-remount-test.c
+++ b/tools/testing/selftests/mount/unprivileged-remount-test.c
@@ -6,6 +6,8 @@
 #include <sys/types.h>
 #include <sys/mount.h>
 #include <sys/wait.h>
+#include <sys/vfs.h>
+#include <sys/statvfs.h>
 #include <stdlib.h>
 #include <unistd.h>
 #include <fcntl.h>
@@ -32,11 +34,14 @@
 # define CLONE_NEWPID 0x20000000
 #endif
 
+#ifndef MS_REC
+# define MS_REC 16384
+#endif
 #ifndef MS_RELATIME
-#define MS_RELATIME (1 << 21)
+# define MS_RELATIME (1 << 21)
 #endif
 #ifndef MS_STRICTATIME
-#define MS_STRICTATIME (1 << 24)
+# define MS_STRICTATIME (1 << 24)
 #endif
 
 static void die(char *fmt, ...)
@@ -87,6 +92,45 @@ static void write_file(char *filename, char *fmt, ...)
 	}
 }
 
+static int read_mnt_flags(const char *path)
+{
+	int ret;
+	struct statvfs stat;
+	int mnt_flags;
+
+	ret = statvfs(path, &stat);
+	if (ret != 0) {
+		die("statvfs of %s failed: %s\n",
+			path, strerror(errno));
+	}
+	if (stat.f_flag & ~(ST_RDONLY | ST_NOSUID | ST_NODEV | \
+			ST_NOEXEC | ST_NOATIME | ST_NODIRATIME | ST_RELATIME | \
+			ST_SYNCHRONOUS | ST_MANDLOCK)) {
+		die("Unrecognized mount flags\n");
+	}
+	mnt_flags = 0;
+	if (stat.f_flag & ST_RDONLY)
+		mnt_flags |= MS_RDONLY;
+	if (stat.f_flag & ST_NOSUID)
+		mnt_flags |= MS_NOSUID;
+	if (stat.f_flag & ST_NODEV)
+		mnt_flags |= MS_NODEV;
+	if (stat.f_flag & ST_NOEXEC)
+		mnt_flags |= MS_NOEXEC;
+	if (stat.f_flag & ST_NOATIME)
+		mnt_flags |= MS_NOATIME;
+	if (stat.f_flag & ST_NODIRATIME)
+		mnt_flags |= MS_NODIRATIME;
+	if (stat.f_flag & ST_RELATIME)
+		mnt_flags |= MS_RELATIME;
+	if (stat.f_flag & ST_SYNCHRONOUS)
+		mnt_flags |= MS_SYNCHRONOUS;
+	if (stat.f_flag & ST_MANDLOCK)
+		mnt_flags |= ST_MANDLOCK;
+
+	return mnt_flags;
+}
+
 static void create_and_enter_userns(void)
 {
 	uid_t uid;
@@ -118,7 +162,8 @@ static void create_and_enter_userns(void)
 }
 
 static
-bool test_unpriv_remount(int mount_flags, int remount_flags, int invalid_flags)
+bool test_unpriv_remount(const char *fstype, const char *mount_options,
+			 int mount_flags, int remount_flags, int invalid_flags)
 {
 	pid_t child;
 
@@ -151,9 +196,11 @@ bool test_unpriv_remount(int mount_flags, int remount_flags, int invalid_flags)
 			strerror(errno));
 	}
 
-	if (mount("testing", "/tmp", "ramfs", mount_flags, NULL) != 0) {
-		die("mount of /tmp failed: %s\n",
-			strerror(errno));
+	if (mount("testing", "/tmp", fstype, mount_flags, mount_options) != 0) {
+		die("mount of %s with options '%s' on /tmp failed: %s\n",
+		    fstype,
+		    mount_options? mount_options : "",
+		    strerror(errno));
 	}
 
 	create_and_enter_userns();
@@ -181,62 +228,127 @@ bool test_unpriv_remount(int mount_flags, int remount_flags, int invalid_flags)
 
 static bool test_unpriv_remount_simple(int mount_flags)
 {
-	return test_unpriv_remount(mount_flags, mount_flags, 0);
+	return test_unpriv_remount("ramfs", NULL, mount_flags, mount_flags, 0);
 }
 
 static bool test_unpriv_remount_atime(int mount_flags, int invalid_flags)
 {
-	return test_unpriv_remount(mount_flags, mount_flags, invalid_flags);
+	return test_unpriv_remount("ramfs", NULL, mount_flags, mount_flags,
+				   invalid_flags);
+}
+
+static bool test_priv_mount_unpriv_remount(void)
+{
+	pid_t child;
+	int ret;
+	const char *orig_path = "/dev";
+	const char *dest_path = "/tmp";
+	int orig_mnt_flags, remount_mnt_flags;
+
+	child = fork();
+	if (child == -1) {
+		die("fork failed: %s\n",
+			strerror(errno));
+	}
+	if (child != 0) { /* parent */
+		pid_t pid;
+		int status;
+		pid = waitpid(child, &status, 0);
+		if (pid == -1) {
+			die("waitpid failed: %s\n",
+				strerror(errno));
+		}
+		if (pid != child) {
+			die("waited for %d got %d\n",
+				child, pid);
+		}
+		if (!WIFEXITED(status)) {
+			die("child did not terminate cleanly\n");
+		}
+		return WEXITSTATUS(status) == EXIT_SUCCESS ? true : false;
+	}
+
+	orig_mnt_flags = read_mnt_flags(orig_path);
+
+	create_and_enter_userns();
+	ret = unshare(CLONE_NEWNS);
+	if (ret != 0) {
+		die("unshare(CLONE_NEWNS) failed: %s\n",
+			strerror(errno));
+	}
+
+	ret = mount(orig_path, dest_path, "bind", MS_BIND | MS_REC, NULL);
+	if (ret != 0) {
+		die("recursive bind mount of %s onto %s failed: %s\n",
+			orig_path, dest_path, strerror(errno));
+	}
+
+	ret = mount(dest_path, dest_path, "none",
+		    MS_REMOUNT | MS_BIND | orig_mnt_flags , NULL);
+	if (ret != 0) {
+		/* system("cat /proc/self/mounts"); */
+		die("remount of /tmp failed: %s\n",
+		    strerror(errno));
+	}
+
+	remount_mnt_flags = read_mnt_flags(dest_path);
+	if (orig_mnt_flags != remount_mnt_flags) {
+		die("Mount flags unexpectedly changed during remount of %s originally mounted on %s\n",
+			dest_path, orig_path);
+	}
+	exit(EXIT_SUCCESS);
 }
 
 int main(int argc, char **argv)
 {
-	if (!test_unpriv_remount_simple(MS_RDONLY|MS_NODEV)) {
+	if (!test_unpriv_remount_simple(MS_RDONLY)) {
 		die("MS_RDONLY malfunctions\n");
 	}
-	if (!test_unpriv_remount_simple(MS_NODEV)) {
+	if (!test_unpriv_remount("devpts", "newinstance", MS_NODEV, MS_NODEV, 0)) {
 		die("MS_NODEV malfunctions\n");
 	}
-	if (!test_unpriv_remount_simple(MS_NOSUID|MS_NODEV)) {
+	if (!test_unpriv_remount_simple(MS_NOSUID)) {
 		die("MS_NOSUID malfunctions\n");
 	}
-	if (!test_unpriv_remount_simple(MS_NOEXEC|MS_NODEV)) {
+	if (!test_unpriv_remount_simple(MS_NOEXEC)) {
 		die("MS_NOEXEC malfunctions\n");
 	}
-	if (!test_unpriv_remount_atime(MS_RELATIME|MS_NODEV,
-				       MS_NOATIME|MS_NODEV))
+	if (!test_unpriv_remount_atime(MS_RELATIME,
+				       MS_NOATIME))
 	{
 		die("MS_RELATIME malfunctions\n");
 	}
-	if (!test_unpriv_remount_atime(MS_STRICTATIME|MS_NODEV,
-				       MS_NOATIME|MS_NODEV))
+	if (!test_unpriv_remount_atime(MS_STRICTATIME,
+				       MS_NOATIME))
 	{
 		die("MS_STRICTATIME malfunctions\n");
 	}
-	if (!test_unpriv_remount_atime(MS_NOATIME|MS_NODEV,
-				       MS_STRICTATIME|MS_NODEV))
+	if (!test_unpriv_remount_atime(MS_NOATIME,
+				       MS_STRICTATIME))
 	{
-		die("MS_RELATIME malfunctions\n");
+		die("MS_NOATIME malfunctions\n");
 	}
-	if (!test_unpriv_remount_atime(MS_RELATIME|MS_NODIRATIME|MS_NODEV,
-				       MS_NOATIME|MS_NODEV))
+	if (!test_unpriv_remount_atime(MS_RELATIME|MS_NODIRATIME,
+				       MS_NOATIME))
 	{
-		die("MS_RELATIME malfunctions\n");
+		die("MS_RELATIME|MS_NODIRATIME malfunctions\n");
 	}
-	if (!test_unpriv_remount_atime(MS_STRICTATIME|MS_NODIRATIME|MS_NODEV,
-				       MS_NOATIME|MS_NODEV))
+	if (!test_unpriv_remount_atime(MS_STRICTATIME|MS_NODIRATIME,
+				       MS_NOATIME))
 	{
-		die("MS_RELATIME malfunctions\n");
+		die("MS_STRICTATIME|MS_NODIRATIME malfunctions\n");
 	}
-	if (!test_unpriv_remount_atime(MS_NOATIME|MS_NODIRATIME|MS_NODEV,
-				       MS_STRICTATIME|MS_NODEV))
+	if (!test_unpriv_remount_atime(MS_NOATIME|MS_NODIRATIME,
+				       MS_STRICTATIME))
 	{
-		die("MS_RELATIME malfunctions\n");
+		die("MS_NOATIME|MS_DIRATIME malfunctions\n");
 	}
-	if (!test_unpriv_remount(MS_STRICTATIME|MS_NODEV, MS_NODEV,
-				 MS_NOATIME|MS_NODEV))
+	if (!test_unpriv_remount("ramfs", NULL, MS_STRICTATIME, 0, MS_NOATIME))
 	{
 		die("Default atime malfunctions\n");
 	}
+	if (!test_priv_mount_unpriv_remount()) {
+		die("Mount flags unexpectedly changed after remount\n");
+	}
 	return EXIT_SUCCESS;
 }
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH review 03/18] umount: Disallow unprivileged mount force
       [not found] ` <87k31wzehb.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>
@ 2014-12-12 22:48   ` Eric W. Biederman
  2014-12-12 23:07     ` Andy Lutomirski
  2014-12-12 22:48   ` [PATCH review 04/18] umount: Do not allow unmounting rootfs Eric W. Biederman
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 24+ messages in thread
From: Eric W. Biederman @ 2014-12-12 22:48 UTC (permalink / raw)
  To: Linux Containers
  Cc: Richard Weinberger, stable-u79uwXL29TY76Z2rM5mHXA,
	Andy Lutomirski, Eric W. Biederman,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA

Forced unmount affects not just the mount namespace but the underlying
superblock as well.  Restrict forced unmount to the global root user
for now.  Otherwise it becomes possible a user in a less privileged
mount namespace to force the shutdown of a superblock of a filesystem
in a more privileged mount namespace, allowing a DOS attack on root.

Cc: stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Signed-off-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
 fs/namespace.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/namespace.c b/fs/namespace.c
index 3a1a87dc33df..43b16af8af30 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1544,6 +1544,9 @@ SYSCALL_DEFINE2(umount, char __user *, name, int, flags)
 		goto dput_and_out;
 	if (mnt->mnt.mnt_flags & MNT_LOCKED)
 		goto dput_and_out;
+	retval = -EPERM;
+	if (flags & MNT_FORCE && !capable(CAP_SYS_ADMIN))
+		goto dput_and_out;
 
 	retval = do_umount(mnt, flags);
 dput_and_out:
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH review 04/18] umount: Do not allow unmounting rootfs.
       [not found] ` <87k31wzehb.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>
  2014-12-12 22:48   ` [PATCH review 03/18] umount: Disallow unprivileged mount force Eric W. Biederman
@ 2014-12-12 22:48   ` Eric W. Biederman
  2014-12-12 22:48   ` [PATCH review 11/18] userns: Don't allow unprivileged creation of gid mappings Eric W. Biederman
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 24+ messages in thread
From: Eric W. Biederman @ 2014-12-12 22:48 UTC (permalink / raw)
  To: Linux Containers
  Cc: Richard Weinberger, Andy Lutomirski, Eric W. Biederman,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA

Andrew Vagin <avagin-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org> writes:

> #define _GNU_SOURCE
> #include <sys/types.h>
> #include <sys/stat.h>
> #include <fcntl.h>
> #include <sched.h>
> #include <unistd.h>
> #include <sys/mount.h>
>
> int main(int argc, char **argv)
> {
> 	int fd;
>
> 	fd = open("/proc/self/ns/mnt", O_RDONLY);
> 	if (fd < 0)
> 	   return 1;
> 	   while (1) {
> 	   	 if (umount2("/", MNT_DETACH) ||
> 		        setns(fd, CLONE_NEWNS))
> 					break;
> 					}
>
> 					return 0;
> }
>
> root@ubuntu:/home/avagin# gcc -Wall nsenter.c -o nsenter
> root@ubuntu:/home/avagin# strace ./nsenter
> execve("./nsenter", ["./nsenter"], [/* 22 vars */]) = 0
> ...
> open("/proc/self/ns/mnt", O_RDONLY)     = 3
> umount("/", MNT_DETACH)                 = 0
> setns(3, 131072)                        = 0
> umount("/", MNT_DETACH
>
causes:

> [  260.548301] ------------[ cut here ]------------
> [  260.550941] kernel BUG at /build/buildd/linux-3.13.0/fs/pnode.c:372!
> [  260.552068] invalid opcode: 0000 [#1] SMP
> [  260.552068] Modules linked in: xt_CHECKSUM iptable_mangle xt_tcpudp xt_addrtype xt_conntrack ipt_MASQUERADE iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack bridge stp llc dm_thin_pool dm_persistent_data dm_bufio dm_bio_prison iptable_filter ip_tables x_tables crct10dif_pclmul crc32_pclmul ghash_clmulni_intel binfmt_misc nfsd auth_rpcgss nfs_acl aesni_intel nfs lockd aes_x86_64 sunrpc fscache lrw gf128mul glue_helper ablk_helper cryptd serio_raw ppdev parport_pc lp parport btrfs xor raid6_pq libcrc32c psmouse floppy
> [  260.552068] CPU: 0 PID: 1723 Comm: nsenter Not tainted 3.13.0-30-generic #55-Ubuntu
> [  260.552068] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> [  260.552068] task: ffff8800376097f0 ti: ffff880074824000 task.ti: ffff880074824000
> [  260.552068] RIP: 0010:[<ffffffff811e9483>]  [<ffffffff811e9483>] propagate_umount+0x123/0x130
> [  260.552068] RSP: 0018:ffff880074825e98  EFLAGS: 00010246
> [  260.552068] RAX: ffff88007c741140 RBX: 0000000000000002 RCX: ffff88007c741190
> [  260.552068] RDX: ffff88007c741190 RSI: ffff880074825ec0 RDI: ffff880074825ec0
> [  260.552068] RBP: ffff880074825eb0 R08: 00000000000172e0 R09: ffff88007fc172e0
> [  260.552068] R10: ffffffff811cc642 R11: ffffea0001d59000 R12: ffff88007c741140
> [  260.552068] R13: ffff88007c741140 R14: ffff88007c741140 R15: 0000000000000000
> [  260.552068] FS:  00007fd5c7e41740(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
> [  260.552068] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  260.552068] CR2: 00007fd5c7968050 CR3: 0000000070124000 CR4: 00000000000406f0
> [  260.552068] Stack:
> [  260.552068]  0000000000000002 0000000000000002 ffff88007c631000 ffff880074825ed8
> [  260.552068]  ffffffff811dcfac ffff88007c741140 0000000000000002 ffff88007c741160
> [  260.552068]  ffff880074825f38 ffffffff811dd12b ffffffff811cc642 0000000075640000
> [  260.552068] Call Trace:
> [  260.552068]  [<ffffffff811dcfac>] umount_tree+0x20c/0x260
> [  260.552068]  [<ffffffff811dd12b>] do_umount+0x12b/0x300
> [  260.552068]  [<ffffffff811cc642>] ? final_putname+0x22/0x50
> [  260.552068]  [<ffffffff811cc849>] ? putname+0x29/0x40
> [  260.552068]  [<ffffffff811dd88c>] SyS_umount+0xdc/0x100
> [  260.552068]  [<ffffffff8172aeff>] tracesys+0xe1/0xe6
> [  260.552068] Code: 89 50 08 48 8b 50 08 48 89 02 49 89 45 08 e9 72 ff ff ff 0f 1f 44 00 00 4c 89 e6 4c 89 e7 e8 f5 f6 ff ff 48 89 c3 e9 39 ff ff ff <0f> 0b 66 2e 0f 1f 84 00 00 00 00 00 90 66 66 66 66 90 55 b8 01
> [  260.552068] RIP  [<ffffffff811e9483>] propagate_umount+0x123/0x130
> [  260.552068]  RSP <ffff880074825e98>
> [  260.611451] ---[ end trace 11c33d85f1d4c652 ]--

Which in practice is totally uninteresting.  Only the global root user can
do it, and it is just a stupid thing to do.

However that is no excuse to allow a silly way to oops the kernel.

We can avoid this silly problem by setting MNT_LOCKED on the rootfs
mount point and thus avoid needing any special cases in the unmount
code.

Signed-off-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
 fs/namespace.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/namespace.c b/fs/namespace.c
index 43b16af8af30..15d0328bd035 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3011,6 +3011,7 @@ static void __init init_mount_tree(void)
 
 	root.mnt = mnt;
 	root.dentry = mnt->mnt_root;
+	mnt->mnt_flags |= MNT_LOCKED;
 
 	set_fs_pwd(current->fs, &root);
 	set_fs_root(current->fs, &root);
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH review 05/18] mnt: Move the clear of MNT_LOCKED from copy_tree to it's callers.
  2014-12-12 22:32 [PATCH review 00/18] userns: review of bug fixes for 3.19-rcX Eric W. Biederman
                   ` (2 preceding siblings ...)
       [not found] ` <87k31wzehb.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>
@ 2014-12-12 22:48 ` Eric W. Biederman
  2014-12-12 22:48 ` [PATCH review 06/18] mnt: Carefully set CL_UNPRIVILEGED in clone_mnt Eric W. Biederman
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 24+ messages in thread
From: Eric W. Biederman @ 2014-12-12 22:48 UTC (permalink / raw)
  To: Linux Containers
  Cc: linux-fsdevel, Serge E. Hallyn, Andy Lutomirski, Chen Hanxiao,
	Richard Weinberger, Eric W. Biederman

Clear MNT_LOCKED in the callers of copy_tree except copy_mnt_ns, and
collect_mounts.  In copy_mnt_ns it is necessary to create an exact
copy of a mount tree, so not clearing MNT_LOCKED is important.
Similarly collect_mounts is used to take a snapshot of the mount tree
for audit logging purposes and auditing using a faithful copy of the
tree is important.

This becomes particularly significant when we start setting MNT_LOCKED
on rootfs to prevent it from being unmounted.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 fs/namespace.c | 1 -
 fs/pnode.c     | 1 +
 2 files changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 15d0328bd035..e8d1ffa7f132 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1613,7 +1613,6 @@ struct mount *copy_tree(struct mount *mnt, struct dentry *dentry,
 	if (IS_ERR(q))
 		return q;
 
-	q->mnt.mnt_flags &= ~MNT_LOCKED;
 	q->mnt_mountpoint = mnt->mnt_mountpoint;
 
 	p = mnt;
diff --git a/fs/pnode.c b/fs/pnode.c
index aae331a5d03b..260ac8f898a4 100644
--- a/fs/pnode.c
+++ b/fs/pnode.c
@@ -242,6 +242,7 @@ static int propagate_one(struct mount *m)
 	child = copy_tree(last_source, last_source->mnt.mnt_root, type);
 	if (IS_ERR(child))
 		return PTR_ERR(child);
+	child->mnt.mnt_flags &= ~MNT_LOCKED;
 	mnt_set_mountpoint(m, mp, child);
 	last_dest = m;
 	last_source = child;
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH review 06/18] mnt: Carefully set CL_UNPRIVILEGED in clone_mnt
  2014-12-12 22:32 [PATCH review 00/18] userns: review of bug fixes for 3.19-rcX Eric W. Biederman
                   ` (3 preceding siblings ...)
  2014-12-12 22:48 ` [PATCH review 05/18] mnt: Move the clear of MNT_LOCKED from copy_tree to it's callers Eric W. Biederman
@ 2014-12-12 22:48 ` Eric W. Biederman
  2014-12-12 22:48 ` [PATCH review 07/18] mnt: Clear mnt_expire during pivot_root Eric W. Biederman
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 24+ messages in thread
From: Eric W. Biederman @ 2014-12-12 22:48 UTC (permalink / raw)
  To: Linux Containers
  Cc: linux-fsdevel, Serge E. Hallyn, Andy Lutomirski, Chen Hanxiao,
	Richard Weinberger, Eric W. Biederman

old->mnt_expiry should be ignored unless CL_EXPIRE is set.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 fs/namespace.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index e8d1ffa7f132..f87a90b98da2 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -963,7 +963,8 @@ static struct mount *clone_mnt(struct mount *old, struct dentry *root,
 	}
 
 	/* Don't allow unprivileged users to reveal what is under a mount */
-	if ((flag & CL_UNPRIVILEGED) && list_empty(&old->mnt_expire))
+	if ((flag & CL_UNPRIVILEGED) &&
+	    (!(flag & CL_EXPIRE) || list_empty(&old->mnt_expire)))
 		mnt->mnt.mnt_flags |= MNT_LOCKED;
 
 	atomic_inc(&sb->s_active);
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH review 07/18] mnt: Clear mnt_expire during pivot_root
  2014-12-12 22:32 [PATCH review 00/18] userns: review of bug fixes for 3.19-rcX Eric W. Biederman
                   ` (4 preceding siblings ...)
  2014-12-12 22:48 ` [PATCH review 06/18] mnt: Carefully set CL_UNPRIVILEGED in clone_mnt Eric W. Biederman
@ 2014-12-12 22:48 ` Eric W. Biederman
  2014-12-12 22:48 ` [PATCH review 08/18] groups: Consolidate the setgroups permission checks Eric W. Biederman
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 24+ messages in thread
From: Eric W. Biederman @ 2014-12-12 22:48 UTC (permalink / raw)
  To: Linux Containers
  Cc: linux-fsdevel, Serge E. Hallyn, Andy Lutomirski, Chen Hanxiao,
	Richard Weinberger, Eric W. Biederman

When inspecting the pivot_root and the current mount expiry logic I
realized that pivot_root fails to clear like mount move does.

Add the missing line in case someone does the interesting feat of
moving an expirable submount.  This gives a strong guarantee that root
of the filesystem tree will never expire.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 fs/namespace.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/namespace.c b/fs/namespace.c
index f87a90b98da2..fe1c77145a78 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2967,6 +2967,8 @@ SYSCALL_DEFINE2(pivot_root, const char __user *, new_root,
 	/* mount new_root on / */
 	attach_mnt(new_mnt, real_mount(root_parent.mnt), root_mp);
 	touch_mnt_namespace(current->nsproxy->mnt_ns);
+	/* A moved mount should not expire automatically */
+	list_del_init(&new_mnt->mnt_expire);
 	unlock_mount_hash();
 	chroot_fs_refs(&root, &new);
 	put_mountpoint(root_mp);
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH review 08/18] groups: Consolidate the setgroups permission checks
  2014-12-12 22:32 [PATCH review 00/18] userns: review of bug fixes for 3.19-rcX Eric W. Biederman
                   ` (5 preceding siblings ...)
  2014-12-12 22:48 ` [PATCH review 07/18] mnt: Clear mnt_expire during pivot_root Eric W. Biederman
@ 2014-12-12 22:48 ` Eric W. Biederman
  2014-12-12 22:48 ` [PATCH review 09/18] userns: Document what the invariant required for safe unprivileged mappings Eric W. Biederman
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 24+ messages in thread
From: Eric W. Biederman @ 2014-12-12 22:48 UTC (permalink / raw)
  To: Linux Containers
  Cc: linux-fsdevel, Serge E. Hallyn, Andy Lutomirski, Chen Hanxiao,
	Richard Weinberger, Eric W. Biederman, stable

Today there are 3 instances of setgroups and due to an oversight their
permission checking has diverged.  Add a common function so that
they may all share the same permission checking code.

This corrects the current oversight in the current permission checks
and adds a helper to avoid this in the future.

A user namespace security fix will update this new helper, shortly.

Cc: stable@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 arch/s390/kernel/compat_linux.c | 2 +-
 include/linux/cred.h            | 1 +
 kernel/groups.c                 | 9 ++++++++-
 kernel/uid16.c                  | 2 +-
 4 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/arch/s390/kernel/compat_linux.c b/arch/s390/kernel/compat_linux.c
index ca38139423ae..437e61159279 100644
--- a/arch/s390/kernel/compat_linux.c
+++ b/arch/s390/kernel/compat_linux.c
@@ -249,7 +249,7 @@ COMPAT_SYSCALL_DEFINE2(s390_setgroups16, int, gidsetsize, u16 __user *, grouplis
 	struct group_info *group_info;
 	int retval;
 
-	if (!capable(CAP_SETGID))
+	if (!may_setgroups())
 		return -EPERM;
 	if ((unsigned)gidsetsize > NGROUPS_MAX)
 		return -EINVAL;
diff --git a/include/linux/cred.h b/include/linux/cred.h
index b2d0820837c4..2fb2ca2127ed 100644
--- a/include/linux/cred.h
+++ b/include/linux/cred.h
@@ -68,6 +68,7 @@ extern void groups_free(struct group_info *);
 extern int set_current_groups(struct group_info *);
 extern void set_groups(struct cred *, struct group_info *);
 extern int groups_search(const struct group_info *, kgid_t);
+extern bool may_setgroups(void);
 
 /* access the groups "array" with this macro */
 #define GROUP_AT(gi, i) \
diff --git a/kernel/groups.c b/kernel/groups.c
index 451698f86cfa..02d8a251c476 100644
--- a/kernel/groups.c
+++ b/kernel/groups.c
@@ -213,6 +213,13 @@ out:
 	return i;
 }
 
+bool may_setgroups(void)
+{
+	struct user_namespace *user_ns = current_user_ns();
+
+	return ns_capable(user_ns, CAP_SETGID);
+}
+
 /*
  *	SMP: Our groups are copy-on-write. We can set them safely
  *	without another task interfering.
@@ -223,7 +230,7 @@ SYSCALL_DEFINE2(setgroups, int, gidsetsize, gid_t __user *, grouplist)
 	struct group_info *group_info;
 	int retval;
 
-	if (!ns_capable(current_user_ns(), CAP_SETGID))
+	if (!may_setgroups())
 		return -EPERM;
 	if ((unsigned)gidsetsize > NGROUPS_MAX)
 		return -EINVAL;
diff --git a/kernel/uid16.c b/kernel/uid16.c
index 602e5bbbceff..d58cc4d8f0d1 100644
--- a/kernel/uid16.c
+++ b/kernel/uid16.c
@@ -176,7 +176,7 @@ SYSCALL_DEFINE2(setgroups16, int, gidsetsize, old_gid_t __user *, grouplist)
 	struct group_info *group_info;
 	int retval;
 
-	if (!ns_capable(current_user_ns(), CAP_SETGID))
+	if (!may_setgroups())
 		return -EPERM;
 	if ((unsigned)gidsetsize > NGROUPS_MAX)
 		return -EINVAL;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH review 09/18] userns: Document what the invariant required for safe unprivileged mappings.
  2014-12-12 22:32 [PATCH review 00/18] userns: review of bug fixes for 3.19-rcX Eric W. Biederman
                   ` (6 preceding siblings ...)
  2014-12-12 22:48 ` [PATCH review 08/18] groups: Consolidate the setgroups permission checks Eric W. Biederman
@ 2014-12-12 22:48 ` Eric W. Biederman
  2014-12-12 22:48 ` [PATCH review 10/18] userns: Don't allow setgroups until a gid mapping has been setablished Eric W. Biederman
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 24+ messages in thread
From: Eric W. Biederman @ 2014-12-12 22:48 UTC (permalink / raw)
  To: Linux Containers
  Cc: linux-fsdevel, Serge E. Hallyn, Andy Lutomirski, Chen Hanxiao,
	Richard Weinberger, Eric W. Biederman, stable

The rule is simple.  Don't allow anything that wouldn't be allowed
without unprivileged mappings.

It was previously overlooked that establishing gid mappings would
allow dropping groups and potentially gaining permission to files and
directories that had lesser permissions for a specific group than for
all other users.

This is the rule needed to fix CVE-2014-8989 and prevent any other
security issues with new_idmap_permitted.

The reason for this rule is that the unix permission model is old and
there are programs out there somewhere that take advantage of every
little corner of it.  So allowing a uid or gid mapping to be
established without privielge that would allow anything that would not
be allowed without that mapping will result in expectations from some
code somewhere being violated.  Violated expectations about the
behavior of the OS is a long way to say a security issue.

Cc: stable@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 kernel/user_namespace.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index aa312b0dc3ec..b99c862a2e3f 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -812,7 +812,9 @@ static bool new_idmap_permitted(const struct file *file,
 				struct user_namespace *ns, int cap_setid,
 				struct uid_gid_map *new_map)
 {
-	/* Allow mapping to your own filesystem ids */
+	/* Don't allow mappings that would allow anything that wouldn't
+	 * be allowed without the establishment of unprivileged mappings.
+	 */
 	if ((new_map->nr_extents == 1) && (new_map->extent[0].count == 1)) {
 		u32 id = new_map->extent[0].lower_first;
 		if (cap_setid == CAP_SETUID) {
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH review 10/18] userns: Don't allow setgroups until a gid mapping has been setablished
  2014-12-12 22:32 [PATCH review 00/18] userns: review of bug fixes for 3.19-rcX Eric W. Biederman
                   ` (7 preceding siblings ...)
  2014-12-12 22:48 ` [PATCH review 09/18] userns: Document what the invariant required for safe unprivileged mappings Eric W. Biederman
@ 2014-12-12 22:48 ` Eric W. Biederman
  2014-12-12 22:48 ` [PATCH review 12/18] userns: Check euid no fsuid when establishing an unprivileged uid mapping Eric W. Biederman
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 24+ messages in thread
From: Eric W. Biederman @ 2014-12-12 22:48 UTC (permalink / raw)
  To: Linux Containers
  Cc: linux-fsdevel, Serge E. Hallyn, Andy Lutomirski, Chen Hanxiao,
	Richard Weinberger, Eric W. Biederman, stable

setgroups is unique in not needing a valid mapping before it can be called,
in the case of setgroups(0, NULL) which drops all supplemental groups.

The design of the user namespace assumes that CAP_SETGID can not actually
be used until a gid mapping is established.  Therefore add a helper function
to see if the user namespace gid mapping has been established and call
that function in the setgroups permission check.

This is part of the fix for CVE-2014-8989, being able to drop groups
without privilege using user namespaces.

Cc: stable@vger.kernel.org
Reviewed-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 include/linux/user_namespace.h |  5 +++++
 kernel/groups.c                |  4 +++-
 kernel/user_namespace.c        | 14 ++++++++++++++
 3 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index e95372654f09..8d493083486a 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -63,6 +63,7 @@ extern const struct seq_operations proc_projid_seq_operations;
 extern ssize_t proc_uid_map_write(struct file *, const char __user *, size_t, loff_t *);
 extern ssize_t proc_gid_map_write(struct file *, const char __user *, size_t, loff_t *);
 extern ssize_t proc_projid_map_write(struct file *, const char __user *, size_t, loff_t *);
+extern bool userns_may_setgroups(const struct user_namespace *ns);
 #else
 
 static inline struct user_namespace *get_user_ns(struct user_namespace *ns)
@@ -87,6 +88,10 @@ static inline void put_user_ns(struct user_namespace *ns)
 {
 }
 
+static inline bool userns_may_setgroups(const struct user_namespace *ns)
+{
+	return true;
+}
 #endif
 
 #endif /* _LINUX_USER_H */
diff --git a/kernel/groups.c b/kernel/groups.c
index 02d8a251c476..664411f171b5 100644
--- a/kernel/groups.c
+++ b/kernel/groups.c
@@ -6,6 +6,7 @@
 #include <linux/slab.h>
 #include <linux/security.h>
 #include <linux/syscalls.h>
+#include <linux/user_namespace.h>
 #include <asm/uaccess.h>
 
 /* init to 2 - one for init_task, one to ensure it is never freed */
@@ -217,7 +218,8 @@ bool may_setgroups(void)
 {
 	struct user_namespace *user_ns = current_user_ns();
 
-	return ns_capable(user_ns, CAP_SETGID);
+	return ns_capable(user_ns, CAP_SETGID) &&
+		userns_may_setgroups(user_ns);
 }
 
 /*
diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index b99c862a2e3f..27c8dab48c07 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -843,6 +843,20 @@ static bool new_idmap_permitted(const struct file *file,
 	return false;
 }
 
+bool userns_may_setgroups(const struct user_namespace *ns)
+{
+	bool allowed;
+
+	mutex_lock(&id_map_mutex);
+	/* It is not safe to use setgroups until a gid mapping in
+	 * the user namespace has been established.
+	 */
+	allowed = ns->gid_map.nr_extents != 0;
+	mutex_unlock(&id_map_mutex);
+
+	return allowed;
+}
+
 static void *userns_get(struct task_struct *task)
 {
 	struct user_namespace *user_ns;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH review 11/18] userns: Don't allow unprivileged creation of gid mappings
       [not found] ` <87k31wzehb.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>
  2014-12-12 22:48   ` [PATCH review 03/18] umount: Disallow unprivileged mount force Eric W. Biederman
  2014-12-12 22:48   ` [PATCH review 04/18] umount: Do not allow unmounting rootfs Eric W. Biederman
@ 2014-12-12 22:48   ` Eric W. Biederman
  2014-12-12 22:48   ` [PATCH review 13/18] userns: Only allow the creator of the userns unprivileged mappings Eric W. Biederman
  2014-12-14 19:41   ` [PATCH review 00/18] userns: review of bug fixes for 3.19-rcX Richard Weinberger
  4 siblings, 0 replies; 24+ messages in thread
From: Eric W. Biederman @ 2014-12-12 22:48 UTC (permalink / raw)
  To: Linux Containers
  Cc: Richard Weinberger, stable-u79uwXL29TY76Z2rM5mHXA,
	Andy Lutomirski, Eric W. Biederman,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA

As any gid mapping will allow and must allow for backwards
compatibility dropping groups don't allow any gid mappings to be
established without CAP_SETGID in the parent user namespace.

For a small class of applications this change breaks userspace
and removes useful functionality.  This small class of applications
includes tools/testing/selftests/mount/unprivilged-remount-test.c

Most of the removed functionality will be added back with the addition
of a one way knob to disable setgroups.  Once setgroups is disabled
setting the gid_map becomes as safe as setting the uid_map.

For more common applications that set the uid_map and the gid_map
with privilege this change will have no affect.

This is part of a fix for CVE-2014-8989.

Cc: stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Reviewed-by: Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org>
Signed-off-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
 kernel/user_namespace.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index 27c8dab48c07..1ce6d67c07b7 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -821,10 +821,6 @@ static bool new_idmap_permitted(const struct file *file,
 			kuid_t uid = make_kuid(ns->parent, id);
 			if (uid_eq(uid, file->f_cred->fsuid))
 				return true;
-		} else if (cap_setid == CAP_SETGID) {
-			kgid_t gid = make_kgid(ns->parent, id);
-			if (gid_eq(gid, file->f_cred->fsgid))
-				return true;
 		}
 	}
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH review 12/18] userns: Check euid no fsuid when establishing an unprivileged uid mapping
  2014-12-12 22:32 [PATCH review 00/18] userns: review of bug fixes for 3.19-rcX Eric W. Biederman
                   ` (8 preceding siblings ...)
  2014-12-12 22:48 ` [PATCH review 10/18] userns: Don't allow setgroups until a gid mapping has been setablished Eric W. Biederman
@ 2014-12-12 22:48 ` Eric W. Biederman
  2014-12-12 22:48 ` [PATCH review 14/18] userns: Rename id_map_mutex to userns_state_mutex Eric W. Biederman
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 24+ messages in thread
From: Eric W. Biederman @ 2014-12-12 22:48 UTC (permalink / raw)
  To: Linux Containers
  Cc: linux-fsdevel, Serge E. Hallyn, Andy Lutomirski, Chen Hanxiao,
	Richard Weinberger, Eric W. Biederman, stable

setresuid allows the euid to be set to any of uid, euid, suid, and
fsuid.  Therefor it is safe to allow an unprivileged user to map
their euid and use CAP_SETUID privileged with exactly that uid,
as no new credentials can be obtained.

I can not find a combination of existing system calls that allows setting
uid, euid, suid, and fsuid from the fsuid making the previous use
of fsuid for allowing unprivileged mappings a bug.

This is part of a fix for CVE-2014-8989.

Cc: stable@vger.kernel.org
Reviewed-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 kernel/user_namespace.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index 1ce6d67c07b7..9451b12a9b6c 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -819,7 +819,7 @@ static bool new_idmap_permitted(const struct file *file,
 		u32 id = new_map->extent[0].lower_first;
 		if (cap_setid == CAP_SETUID) {
 			kuid_t uid = make_kuid(ns->parent, id);
-			if (uid_eq(uid, file->f_cred->fsuid))
+			if (uid_eq(uid, file->f_cred->euid))
 				return true;
 		}
 	}
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH review 13/18] userns: Only allow the creator of the userns unprivileged mappings
       [not found] ` <87k31wzehb.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>
                     ` (2 preceding siblings ...)
  2014-12-12 22:48   ` [PATCH review 11/18] userns: Don't allow unprivileged creation of gid mappings Eric W. Biederman
@ 2014-12-12 22:48   ` Eric W. Biederman
  2014-12-14 19:41   ` [PATCH review 00/18] userns: review of bug fixes for 3.19-rcX Richard Weinberger
  4 siblings, 0 replies; 24+ messages in thread
From: Eric W. Biederman @ 2014-12-12 22:48 UTC (permalink / raw)
  To: Linux Containers
  Cc: Richard Weinberger, stable-u79uwXL29TY76Z2rM5mHXA,
	Andy Lutomirski, Eric W. Biederman,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA

If you did not create the user namespace and are allowed
to write to uid_map or gid_map you should already have the necessary
privilege in the parent user namespace to establish any mapping
you want so this will not affect userspace in practice.

Limiting unprivileged uid mapping establishment to the creator of the
user namespace makes it easier to verify all credentials obtained with
the uid mapping can be obtained without the uid mapping without
privilege.

Limiting unprivileged gid mapping establishment (which is temporarily
absent) to the creator of the user namespace also ensures that the
combination of uid and gid can already be obtained without privilege.

This is part of the fix for CVE-2014-8989.

Cc: stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Reviewed-by: Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org>
Signed-off-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
 kernel/user_namespace.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index 9451b12a9b6c..1e34de2fbd60 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -812,14 +812,16 @@ static bool new_idmap_permitted(const struct file *file,
 				struct user_namespace *ns, int cap_setid,
 				struct uid_gid_map *new_map)
 {
+	const struct cred *cred = file->f_cred;
 	/* Don't allow mappings that would allow anything that wouldn't
 	 * be allowed without the establishment of unprivileged mappings.
 	 */
-	if ((new_map->nr_extents == 1) && (new_map->extent[0].count == 1)) {
+	if ((new_map->nr_extents == 1) && (new_map->extent[0].count == 1) &&
+	    uid_eq(ns->owner, cred->euid)) {
 		u32 id = new_map->extent[0].lower_first;
 		if (cap_setid == CAP_SETUID) {
 			kuid_t uid = make_kuid(ns->parent, id);
-			if (uid_eq(uid, file->f_cred->euid))
+			if (uid_eq(uid, cred->euid))
 				return true;
 		}
 	}
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH review 14/18] userns: Rename id_map_mutex to userns_state_mutex
  2014-12-12 22:32 [PATCH review 00/18] userns: review of bug fixes for 3.19-rcX Eric W. Biederman
                   ` (9 preceding siblings ...)
  2014-12-12 22:48 ` [PATCH review 12/18] userns: Check euid no fsuid when establishing an unprivileged uid mapping Eric W. Biederman
@ 2014-12-12 22:48 ` Eric W. Biederman
  2014-12-12 22:48 ` [PATCH review 15/18] userns: Add a knob to disable setgroups on a per user namespace basis Eric W. Biederman
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 24+ messages in thread
From: Eric W. Biederman @ 2014-12-12 22:48 UTC (permalink / raw)
  To: Linux Containers
  Cc: linux-fsdevel, Serge E. Hallyn, Andy Lutomirski, Chen Hanxiao,
	Richard Weinberger, Eric W. Biederman, stable

Generalize id_map_mutex so it can be used for more state of a user namespace.

Cc: stable@vger.kernel.org
Reviewed-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 kernel/user_namespace.c | 14 ++++++--------
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index 1e34de2fbd60..44a555ac6104 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -24,6 +24,7 @@
 #include <linux/fs_struct.h>
 
 static struct kmem_cache *user_ns_cachep __read_mostly;
+static DEFINE_MUTEX(userns_state_mutex);
 
 static bool new_idmap_permitted(const struct file *file,
 				struct user_namespace *ns, int cap_setid,
@@ -583,9 +584,6 @@ static bool mappings_overlap(struct uid_gid_map *new_map,
 	return false;
 }
 
-
-static DEFINE_MUTEX(id_map_mutex);
-
 static ssize_t map_write(struct file *file, const char __user *buf,
 			 size_t count, loff_t *ppos,
 			 int cap_setid,
@@ -602,7 +600,7 @@ static ssize_t map_write(struct file *file, const char __user *buf,
 	ssize_t ret = -EINVAL;
 
 	/*
-	 * The id_map_mutex serializes all writes to any given map.
+	 * The userns_state_mutex serializes all writes to any given map.
 	 *
 	 * Any map is only ever written once.
 	 *
@@ -620,7 +618,7 @@ static ssize_t map_write(struct file *file, const char __user *buf,
 	 * order and smp_rmb() is guaranteed that we don't have crazy
 	 * architectures returning stale data.
 	 */
-	mutex_lock(&id_map_mutex);
+	mutex_lock(&userns_state_mutex);
 
 	ret = -EPERM;
 	/* Only allow one successful write to the map */
@@ -750,7 +748,7 @@ static ssize_t map_write(struct file *file, const char __user *buf,
 	*ppos = count;
 	ret = count;
 out:
-	mutex_unlock(&id_map_mutex);
+	mutex_unlock(&userns_state_mutex);
 	if (page)
 		free_page(page);
 	return ret;
@@ -845,12 +843,12 @@ bool userns_may_setgroups(const struct user_namespace *ns)
 {
 	bool allowed;
 
-	mutex_lock(&id_map_mutex);
+	mutex_lock(&userns_state_mutex);
 	/* It is not safe to use setgroups until a gid mapping in
 	 * the user namespace has been established.
 	 */
 	allowed = ns->gid_map.nr_extents != 0;
-	mutex_unlock(&id_map_mutex);
+	mutex_unlock(&userns_state_mutex);
 
 	return allowed;
 }
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH review 15/18] userns: Add a knob to disable setgroups on a per user namespace basis
  2014-12-12 22:32 [PATCH review 00/18] userns: review of bug fixes for 3.19-rcX Eric W. Biederman
                   ` (10 preceding siblings ...)
  2014-12-12 22:48 ` [PATCH review 14/18] userns: Rename id_map_mutex to userns_state_mutex Eric W. Biederman
@ 2014-12-12 22:48 ` Eric W. Biederman
  2014-12-12 22:48 ` [PATCH review 16/18] userns: Allow setting gid_maps without privilege when setgroups is disabled Eric W. Biederman
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 24+ messages in thread
From: Eric W. Biederman @ 2014-12-12 22:48 UTC (permalink / raw)
  To: Linux Containers
  Cc: linux-fsdevel, Serge E. Hallyn, Andy Lutomirski, Chen Hanxiao,
	Richard Weinberger, Eric W. Biederman, stable

- Expose the knob to user space through a proc file /proc/<pid>/setgroups

  A value of "deny" means the setgroups system call is disabled in the
  current processes user namespace and can not be enabled in the
  future in this user namespace.

  A value of "allow" means the segtoups system call is enabled.

- Descendant user namespaces inherit the value of setgroups from
  their parents.

- A proc file is used (instead of a sysctl) as sysctls currently do
  not allow checking the permissions at open time.

- Writing to the proc file is restricted to before the gid_map
  for the user namespace is set.

  This ensures that disabling setgroups at a user namespace
  level will never remove the ability to call setgroups
  from a process that already has that ability.

  A process may opt in to the setgroups disable for itself by
  creating, entering and configuring a user namespace or by calling
  setns on an existing user namespace with setgroups disabled.
  Processes without privileges already can not call setgroups so this
  is a noop.  Prodcess with privilege become processes without
  privilege when entering a user namespace and as with any other path
  to dropping privilege they would not have the ability to call
  setgroups.  So this remains within the bounds of what is possible
  without a knob to disable setgroups permanently in a user namespace.

Cc: stable@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 fs/proc/base.c                 | 53 ++++++++++++++++++++++++++
 include/linux/user_namespace.h |  7 ++++
 kernel/user.c                  |  1 +
 kernel/user_namespace.c        | 85 ++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 146 insertions(+)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 772efa45a452..7dc3ea89ef1a 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -2464,6 +2464,57 @@ static const struct file_operations proc_projid_map_operations = {
 	.llseek		= seq_lseek,
 	.release	= proc_id_map_release,
 };
+
+static int proc_setgroups_open(struct inode *inode, struct file *file)
+{
+	struct user_namespace *ns = NULL;
+	struct task_struct *task;
+	int ret;
+
+	ret = -ESRCH;
+	task = get_proc_task(inode);
+	if (task) {
+		rcu_read_lock();
+		ns = get_user_ns(task_cred_xxx(task, user_ns));
+		rcu_read_unlock();
+		put_task_struct(task);
+	}
+	if (!ns)
+		goto err;
+
+	if (file->f_mode & FMODE_WRITE) {
+		ret = -EACCES;
+		if (!ns_capable(ns, CAP_SYS_ADMIN))
+			goto err_put_ns;
+	}
+
+	ret = single_open(file, &proc_setgroups_show, ns);
+	if (ret)
+		goto err_put_ns;
+
+	return 0;
+err_put_ns:
+	put_user_ns(ns);
+err:
+	return ret;
+}
+
+static int proc_setgroups_release(struct inode *inode, struct file *file)
+{
+	struct seq_file *seq = file->private_data;
+	struct user_namespace *ns = seq->private;
+	int ret = single_release(inode, file);
+	put_user_ns(ns);
+	return ret;
+}
+
+static const struct file_operations proc_setgroups_operations = {
+	.open		= proc_setgroups_open,
+	.write		= proc_setgroups_write,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= proc_setgroups_release,
+};
 #endif /* CONFIG_USER_NS */
 
 static int proc_pid_personality(struct seq_file *m, struct pid_namespace *ns,
@@ -2572,6 +2623,7 @@ static const struct pid_entry tgid_base_stuff[] = {
 	REG("uid_map",    S_IRUGO|S_IWUSR, proc_uid_map_operations),
 	REG("gid_map",    S_IRUGO|S_IWUSR, proc_gid_map_operations),
 	REG("projid_map", S_IRUGO|S_IWUSR, proc_projid_map_operations),
+	REG("setgroups",  S_IRUGO|S_IWUSR, proc_setgroups_operations),
 #endif
 #ifdef CONFIG_CHECKPOINT_RESTORE
 	REG("timers",	  S_IRUGO, proc_timers_operations),
@@ -2913,6 +2965,7 @@ static const struct pid_entry tid_base_stuff[] = {
 	REG("uid_map",    S_IRUGO|S_IWUSR, proc_uid_map_operations),
 	REG("gid_map",    S_IRUGO|S_IWUSR, proc_gid_map_operations),
 	REG("projid_map", S_IRUGO|S_IWUSR, proc_projid_map_operations),
+	REG("setgroups",  S_IRUGO|S_IWUSR, proc_setgroups_operations),
 #endif
 };
 
diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index 8d493083486a..9f3579ff543d 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -17,6 +17,10 @@ struct uid_gid_map {	/* 64 bytes -- 1 cache line */
 	} extent[UID_GID_MAP_MAX_EXTENTS];
 };
 
+#define USERNS_SETGROUPS_ALLOWED 1UL
+
+#define USERNS_INIT_FLAGS USERNS_SETGROUPS_ALLOWED
+
 struct user_namespace {
 	struct uid_gid_map	uid_map;
 	struct uid_gid_map	gid_map;
@@ -27,6 +31,7 @@ struct user_namespace {
 	kuid_t			owner;
 	kgid_t			group;
 	unsigned int		proc_inum;
+	unsigned long		flags;
 
 	/* Register of per-UID persistent keyrings for this namespace */
 #ifdef CONFIG_PERSISTENT_KEYRINGS
@@ -63,6 +68,8 @@ extern const struct seq_operations proc_projid_seq_operations;
 extern ssize_t proc_uid_map_write(struct file *, const char __user *, size_t, loff_t *);
 extern ssize_t proc_gid_map_write(struct file *, const char __user *, size_t, loff_t *);
 extern ssize_t proc_projid_map_write(struct file *, const char __user *, size_t, loff_t *);
+extern ssize_t proc_setgroups_write(struct file *, const char __user *, size_t, loff_t *);
+extern int proc_setgroups_show(struct seq_file *m, void *v);
 extern bool userns_may_setgroups(const struct user_namespace *ns);
 #else
 
diff --git a/kernel/user.c b/kernel/user.c
index 4efa39350e44..2d09940c9632 100644
--- a/kernel/user.c
+++ b/kernel/user.c
@@ -51,6 +51,7 @@ struct user_namespace init_user_ns = {
 	.owner = GLOBAL_ROOT_UID,
 	.group = GLOBAL_ROOT_GID,
 	.proc_inum = PROC_USER_INIT_INO,
+	.flags = USERNS_INIT_FLAGS,
 #ifdef CONFIG_PERSISTENT_KEYRINGS
 	.persistent_keyring_register_sem =
 	__RWSEM_INITIALIZER(init_user_ns.persistent_keyring_register_sem),
diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index 44a555ac6104..6e80f4c1322b 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -100,6 +100,11 @@ int create_user_ns(struct cred *new)
 	ns->owner = owner;
 	ns->group = group;
 
+	/* Inherit USERNS_SETGROUPS_ALLOWED from our parent */
+	mutex_lock(&userns_state_mutex);
+	ns->flags = parent_ns->flags;
+	mutex_unlock(&userns_state_mutex);
+
 	set_cred_user_ns(new, ns);
 
 #ifdef CONFIG_PERSISTENT_KEYRINGS
@@ -839,6 +844,84 @@ static bool new_idmap_permitted(const struct file *file,
 	return false;
 }
 
+int proc_setgroups_show(struct seq_file *seq, void *v)
+{
+	struct user_namespace *ns = seq->private;
+	unsigned long userns_flags = ACCESS_ONCE(ns->flags);
+
+	seq_printf(seq, "%s\n",
+		   (userns_flags & USERNS_SETGROUPS_ALLOWED) ?
+		   "allow" : "deny");
+	return 0;
+}
+
+ssize_t proc_setgroups_write(struct file *file, const char __user *buf,
+			     size_t count, loff_t *ppos)
+{
+	struct seq_file *seq = file->private_data;
+	struct user_namespace *ns = seq->private;
+	char kbuf[8], *pos;
+	bool setgroups_allowed;
+	ssize_t ret;
+
+	/* Only allow a very narrow range of strings to be written */
+	ret = -EINVAL;
+	if ((*ppos != 0) || (count >= sizeof(kbuf)))
+		goto out;
+
+	/* What was written? */
+	ret = -EFAULT;
+	if (copy_from_user(kbuf, buf, count))
+		goto out;
+	kbuf[count] = '\0';
+	pos = kbuf;
+
+	/* What is being requested? */
+	ret = -EINVAL;
+	if (strncmp(pos, "allow", 5) == 0) {
+		pos += 5;
+		setgroups_allowed = true;
+	}
+	else if (strncmp(pos, "deny", 4) == 0) {
+		pos += 4;
+		setgroups_allowed = false;
+	}
+	else
+		goto out;
+
+	/* Verify there is not trailing junk on the line */
+	pos = skip_spaces(pos);
+	if (*pos != '\0')
+		goto out;
+
+	ret = -EPERM;
+	mutex_lock(&userns_state_mutex);
+	if (setgroups_allowed) {
+		/* Enabling setgroups after setgroups has been disabled
+		 * is not allowed.
+		 */
+		if (!(ns->flags & USERNS_SETGROUPS_ALLOWED))
+			goto out_unlock;
+	} else {
+		/* Permanently disabling setgroups after setgroups has
+		 * been enabled by writing the gid_map is not allowed.
+		 */
+		if (ns->gid_map.nr_extents != 0)
+			goto out_unlock;
+		ns->flags &= ~USERNS_SETGROUPS_ALLOWED;
+	}
+	mutex_unlock(&userns_state_mutex);
+
+	/* Report a successful write */
+	*ppos = count;
+	ret = count;
+out:
+	return ret;
+out_unlock:
+	mutex_unlock(&userns_state_mutex);
+	goto out;
+}
+
 bool userns_may_setgroups(const struct user_namespace *ns)
 {
 	bool allowed;
@@ -848,6 +931,8 @@ bool userns_may_setgroups(const struct user_namespace *ns)
 	 * the user namespace has been established.
 	 */
 	allowed = ns->gid_map.nr_extents != 0;
+	/* Is setgroups allowed? */
+	allowed = allowed && (ns->flags & USERNS_SETGROUPS_ALLOWED);
 	mutex_unlock(&userns_state_mutex);
 
 	return allowed;
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH review 16/18] userns: Allow setting gid_maps without privilege when setgroups is disabled
  2014-12-12 22:32 [PATCH review 00/18] userns: review of bug fixes for 3.19-rcX Eric W. Biederman
                   ` (11 preceding siblings ...)
  2014-12-12 22:48 ` [PATCH review 15/18] userns: Add a knob to disable setgroups on a per user namespace basis Eric W. Biederman
@ 2014-12-12 22:48 ` Eric W. Biederman
  2014-12-12 22:48 ` [PATCH review 17/18] userns; Correct the comment in map_write Eric W. Biederman
  2014-12-12 22:48 ` [PATCH review 18/18] userns: Unbreak the unprivileged remount tests Eric W. Biederman
  14 siblings, 0 replies; 24+ messages in thread
From: Eric W. Biederman @ 2014-12-12 22:48 UTC (permalink / raw)
  To: Linux Containers
  Cc: linux-fsdevel, Serge E. Hallyn, Andy Lutomirski, Chen Hanxiao,
	Richard Weinberger, Eric W. Biederman, stable

Now that setgroups can be disabled and not reenabled, setting gid_map
without privielge can now be enabled when setgroups is disabled.

This restores most of the functionality that was lost when unprivileged
setting of gid_map was removed.  Applications that use this functionality
will need to check to see if they use setgroups or init_groups, and if they
don't they can be fixed by simply disabling setgroups before writing to
gid_map.

Cc: stable@vger.kernel.org
Reviewed-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 kernel/user_namespace.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index 6e80f4c1322b..a2e37c5d2f63 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -826,6 +826,11 @@ static bool new_idmap_permitted(const struct file *file,
 			kuid_t uid = make_kuid(ns->parent, id);
 			if (uid_eq(uid, cred->euid))
 				return true;
+		} else if (cap_setid == CAP_SETGID) {
+			kgid_t gid = make_kgid(ns->parent, id);
+			if (!(ns->flags & USERNS_SETGROUPS_ALLOWED) &&
+			    gid_eq(gid, cred->egid))
+				return true;
 		}
 	}
 
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH review 17/18] userns; Correct the comment in map_write
  2014-12-12 22:32 [PATCH review 00/18] userns: review of bug fixes for 3.19-rcX Eric W. Biederman
                   ` (12 preceding siblings ...)
  2014-12-12 22:48 ` [PATCH review 16/18] userns: Allow setting gid_maps without privilege when setgroups is disabled Eric W. Biederman
@ 2014-12-12 22:48 ` Eric W. Biederman
  2014-12-12 22:48 ` [PATCH review 18/18] userns: Unbreak the unprivileged remount tests Eric W. Biederman
  14 siblings, 0 replies; 24+ messages in thread
From: Eric W. Biederman @ 2014-12-12 22:48 UTC (permalink / raw)
  To: Linux Containers
  Cc: linux-fsdevel, Serge E. Hallyn, Andy Lutomirski, Chen Hanxiao,
	Richard Weinberger, Eric W. Biederman

It is important that all maps are less than PAGE_SIZE
or else setting the last byte of the buffer to '0'
could write off the end of the allocated storage.

Correct the misleading comment.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 kernel/user_namespace.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index a2e37c5d2f63..ad419b04c146 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -643,7 +643,7 @@ static ssize_t map_write(struct file *file, const char __user *buf,
 	if (!page)
 		goto out;
 
-	/* Only allow <= page size writes at the beginning of the file */
+	/* Only allow < page size writes at the beginning of the file */
 	ret = -EINVAL;
 	if ((*ppos != 0) || (count >= PAGE_SIZE))
 		goto out;
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH review 18/18] userns: Unbreak the unprivileged remount tests
  2014-12-12 22:32 [PATCH review 00/18] userns: review of bug fixes for 3.19-rcX Eric W. Biederman
                   ` (13 preceding siblings ...)
  2014-12-12 22:48 ` [PATCH review 17/18] userns; Correct the comment in map_write Eric W. Biederman
@ 2014-12-12 22:48 ` Eric W. Biederman
  14 siblings, 0 replies; 24+ messages in thread
From: Eric W. Biederman @ 2014-12-12 22:48 UTC (permalink / raw)
  To: Linux Containers
  Cc: linux-fsdevel, Serge E. Hallyn, Andy Lutomirski, Chen Hanxiao,
	Richard Weinberger, Eric W. Biederman, stable

A security fix in caused the way the unprivileged remount tests were
using user namespaces to break.  Tweak the way user namespaces are
being used so the test works again.

Cc: stable@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 .../selftests/mount/unprivileged-remount-test.c    | 32 ++++++++++++++++------
 1 file changed, 24 insertions(+), 8 deletions(-)

diff --git a/tools/testing/selftests/mount/unprivileged-remount-test.c b/tools/testing/selftests/mount/unprivileged-remount-test.c
index 9669d375625a..517785052f1c 100644
--- a/tools/testing/selftests/mount/unprivileged-remount-test.c
+++ b/tools/testing/selftests/mount/unprivileged-remount-test.c
@@ -53,17 +53,14 @@ static void die(char *fmt, ...)
 	exit(EXIT_FAILURE);
 }
 
-static void write_file(char *filename, char *fmt, ...)
+static void vmaybe_write_file(bool enoent_ok, char *filename, char *fmt, va_list ap)
 {
 	char buf[4096];
 	int fd;
 	ssize_t written;
 	int buf_len;
-	va_list ap;
 
-	va_start(ap, fmt);
 	buf_len = vsnprintf(buf, sizeof(buf), fmt, ap);
-	va_end(ap);
 	if (buf_len < 0) {
 		die("vsnprintf failed: %s\n",
 		    strerror(errno));
@@ -74,6 +71,8 @@ static void write_file(char *filename, char *fmt, ...)
 
 	fd = open(filename, O_WRONLY);
 	if (fd < 0) {
+		if ((errno == ENOENT) && enoent_ok)
+			return;
 		die("open of %s failed: %s\n",
 		    filename, strerror(errno));
 	}
@@ -92,6 +91,26 @@ static void write_file(char *filename, char *fmt, ...)
 	}
 }
 
+static void maybe_write_file(char *filename, char *fmt, ...)
+{
+	va_list ap;
+
+	va_start(ap, fmt);
+	vmaybe_write_file(true, filename, fmt, ap);
+	va_end(ap);
+
+}
+
+static void write_file(char *filename, char *fmt, ...)
+{
+	va_list ap;
+
+	va_start(ap, fmt);
+	vmaybe_write_file(false, filename, fmt, ap);
+	va_end(ap);
+
+}
+
 static int read_mnt_flags(const char *path)
 {
 	int ret;
@@ -144,13 +163,10 @@ static void create_and_enter_userns(void)
 			strerror(errno));
 	}
 
+	maybe_write_file("/proc/self/setgroups", "deny");
 	write_file("/proc/self/uid_map", "0 %d 1", uid);
 	write_file("/proc/self/gid_map", "0 %d 1", gid);
 
-	if (setgroups(0, NULL) != 0) {
-		die("setgroups failed: %s\n",
-			strerror(errno));
-	}
 	if (setgid(0) != 0) {
 		die ("setgid(0) failed %s\n",
 			strerror(errno));
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH review 03/18] umount: Disallow unprivileged mount force
  2014-12-12 22:48   ` [PATCH review 03/18] umount: Disallow unprivileged mount force Eric W. Biederman
@ 2014-12-12 23:07     ` Andy Lutomirski
       [not found]       ` <CALCETrV2kBfzypMbYKgxJ4BqB6yBG6Xvo=sZy3tvTng5ZRvAKQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 24+ messages in thread
From: Andy Lutomirski @ 2014-12-12 23:07 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Containers, Linux FS Devel, Serge E. Hallyn, Chen Hanxiao,
	Richard Weinberger, stable

On Fri, Dec 12, 2014 at 2:48 PM, Eric W. Biederman
<ebiederm@xmission.com> wrote:
> Forced unmount affects not just the mount namespace but the underlying
> superblock as well.  Restrict forced unmount to the global root user
> for now.  Otherwise it becomes possible a user in a less privileged
> mount namespace to force the shutdown of a superblock of a filesystem
> in a more privileged mount namespace, allowing a DOS attack on root.
>

I thought I already fixed this.  Did I miss part of it?

> Cc: stable@vger.kernel.org
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
>  fs/namespace.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/fs/namespace.c b/fs/namespace.c
> index 3a1a87dc33df..43b16af8af30 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -1544,6 +1544,9 @@ SYSCALL_DEFINE2(umount, char __user *, name, int, flags)
>                 goto dput_and_out;
>         if (mnt->mnt.mnt_flags & MNT_LOCKED)
>                 goto dput_and_out;
> +       retval = -EPERM;
> +       if (flags & MNT_FORCE && !capable(CAP_SYS_ADMIN))
> +               goto dput_and_out;
>
>         retval = do_umount(mnt, flags);
>  dput_and_out:
> --
> 1.9.1
>



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH review 03/18] umount: Disallow unprivileged mount force
       [not found]       ` <CALCETrV2kBfzypMbYKgxJ4BqB6yBG6Xvo=sZy3tvTng5ZRvAKQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2014-12-12 23:25         ` Eric W. Biederman
  2014-12-13  0:20           ` Andy Lutomirski
  0 siblings, 1 reply; 24+ messages in thread
From: Eric W. Biederman @ 2014-12-12 23:25 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Richard Weinberger, Linux Containers, stable, Linux FS Devel

Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> writes:

> On Fri, Dec 12, 2014 at 2:48 PM, Eric W. Biederman
> <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote:
>> Forced unmount affects not just the mount namespace but the underlying
>> superblock as well.  Restrict forced unmount to the global root user
>> for now.  Otherwise it becomes possible a user in a less privileged
>> mount namespace to force the shutdown of a superblock of a filesystem
>> in a more privileged mount namespace, allowing a DOS attack on root.
>>
>
> I thought I already fixed this.  Did I miss part of it?

My tree is based at 3.18-rc6 and as of there I don't see another
fix.

You fixed the remount_sb case in umount I don't think you fixed forced
unmounts themselves.

If you did fix this and can point me at it I will be happy to drop this
patch.

Eric

>> Cc: stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>> Signed-off-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
>> ---
>>  fs/namespace.c | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/fs/namespace.c b/fs/namespace.c
>> index 3a1a87dc33df..43b16af8af30 100644
>> --- a/fs/namespace.c
>> +++ b/fs/namespace.c
>> @@ -1544,6 +1544,9 @@ SYSCALL_DEFINE2(umount, char __user *, name, int, flags)
>>                 goto dput_and_out;
>>         if (mnt->mnt.mnt_flags & MNT_LOCKED)
>>                 goto dput_and_out;
>> +       retval = -EPERM;
>> +       if (flags & MNT_FORCE && !capable(CAP_SYS_ADMIN))
>> +               goto dput_and_out;
>>
>>         retval = do_umount(mnt, flags);
>>  dput_and_out:
>> --
>> 1.9.1
>>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH review 03/18] umount: Disallow unprivileged mount force
  2014-12-12 23:25         ` Eric W. Biederman
@ 2014-12-13  0:20           ` Andy Lutomirski
  0 siblings, 0 replies; 24+ messages in thread
From: Andy Lutomirski @ 2014-12-13  0:20 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Containers, Linux FS Devel, Serge E. Hallyn, Chen Hanxiao,
	Richard Weinberger, stable

On Fri, Dec 12, 2014 at 3:25 PM, Eric W. Biederman
<ebiederm@xmission.com> wrote:
> Andy Lutomirski <luto@amacapital.net> writes:
>
>> On Fri, Dec 12, 2014 at 2:48 PM, Eric W. Biederman
>> <ebiederm@xmission.com> wrote:
>>> Forced unmount affects not just the mount namespace but the underlying
>>> superblock as well.  Restrict forced unmount to the global root user
>>> for now.  Otherwise it becomes possible a user in a less privileged
>>> mount namespace to force the shutdown of a superblock of a filesystem
>>> in a more privileged mount namespace, allowing a DOS attack on root.
>>>
>>
>> I thought I already fixed this.  Did I miss part of it?
>
> My tree is based at 3.18-rc6 and as of there I don't see another
> fix.
>
> You fixed the remount_sb case in umount I don't think you fixed forced
> unmounts themselves.
>
> If you did fix this and can point me at it I will be happy to drop this
> patch.

Oh, right, I missed:

    if (flags & MNT_FORCE && sb->s_op->umount_begin) {
        sb->s_op->umount_begin(sb);
    }

whoops.

--Andy

>
> Eric
>
>>> Cc: stable@vger.kernel.org
>>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
>>> ---
>>>  fs/namespace.c | 3 +++
>>>  1 file changed, 3 insertions(+)
>>>
>>> diff --git a/fs/namespace.c b/fs/namespace.c
>>> index 3a1a87dc33df..43b16af8af30 100644
>>> --- a/fs/namespace.c
>>> +++ b/fs/namespace.c
>>> @@ -1544,6 +1544,9 @@ SYSCALL_DEFINE2(umount, char __user *, name, int, flags)
>>>                 goto dput_and_out;
>>>         if (mnt->mnt.mnt_flags & MNT_LOCKED)
>>>                 goto dput_and_out;
>>> +       retval = -EPERM;
>>> +       if (flags & MNT_FORCE && !capable(CAP_SYS_ADMIN))
>>> +               goto dput_and_out;
>>>
>>>         retval = do_umount(mnt, flags);
>>>  dput_and_out:
>>> --
>>> 1.9.1
>>>



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH review 00/18] userns: review of bug fixes for 3.19-rcX
       [not found] ` <87k31wzehb.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>
                     ` (3 preceding siblings ...)
  2014-12-12 22:48   ` [PATCH review 13/18] userns: Only allow the creator of the userns unprivileged mappings Eric W. Biederman
@ 2014-12-14 19:41   ` Richard Weinberger
  2014-12-15  2:25     ` Eric W. Biederman
  4 siblings, 1 reply; 24+ messages in thread
From: Richard Weinberger @ 2014-12-14 19:41 UTC (permalink / raw)
  To: Eric W. Biederman, Linux Containers
  Cc: Andy Lutomirski, kzak-H+wXaHxf7aLQT0dZR+AlfA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	dottedmag-6nM9faFvqWvMFIMGWPqnnw

Am 12.12.2014 um 23:32 schrieb Eric W. Biederman:
> 
> The entire tree for testing is available at:
> 	git.kernel.org:/pub/scm/linux/kernel/git/ebiederm/user-namespace.git for-testing
> 
> This is my queue of important bug fixes for user namespaces.  Most of
> these changes warrant being backported.  A few are bug fixes for cases
> where only root can trigger the issue so have not been marked for being
> back ported to stable.
> 
> A few of these patches have not been posted for review preivously, so I
> a giving the light of mailling list before I send them to Linus.  This
> patchset has seen some testing already. 
> 
> Since there are small deliberate breakage of userspace in here the more
> reviewers/testers the better.
> 
> Baring complictions I intend to ask Linus to pull this patchset sometime
> early next week.
> 
> So far nothing broke on my libvirt-lxc test bed. :-)
> Tested with openSUSE 13.2 and libvirt 1.2.9.
> Tested-by: Richard Weinberger <richard-/L3Ra7n9ekc@public.gmane.org>

FYI, this change set breaks util-linux's unshare(1) tool
as an unprivileged is no longer allowed to write to /proc/self/gid_map.

Thanks,
//richard

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH review 00/18] userns: review of bug fixes for 3.19-rcX
  2014-12-14 19:41   ` [PATCH review 00/18] userns: review of bug fixes for 3.19-rcX Richard Weinberger
@ 2014-12-15  2:25     ` Eric W. Biederman
  0 siblings, 0 replies; 24+ messages in thread
From: Eric W. Biederman @ 2014-12-15  2:25 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: Linux Containers, linux-fsdevel, Serge E. Hallyn, Andy Lutomirski,
	Chen Hanxiao, kzak, dottedmag

Richard Weinberger <richard@nod.at> writes:

> Am 12.12.2014 um 23:32 schrieb Eric W. Biederman:
>> 
>> The entire tree for testing is available at:
>> 	git.kernel.org:/pub/scm/linux/kernel/git/ebiederm/user-namespace.git for-testing
>> 
>> This is my queue of important bug fixes for user namespaces.  Most of
>> these changes warrant being backported.  A few are bug fixes for cases
>> where only root can trigger the issue so have not been marked for being
>> back ported to stable.
>> 
>> A few of these patches have not been posted for review preivously, so I
>> a giving the light of mailling list before I send them to Linus.  This
>> patchset has seen some testing already. 
>> 
>> Since there are small deliberate breakage of userspace in here the more
>> reviewers/testers the better.
>> 
>> Baring complictions I intend to ask Linus to pull this patchset sometime
>> early next week.
>> 
>> So far nothing broke on my libvirt-lxc test bed. :-)
>> Tested with openSUSE 13.2 and libvirt 1.2.9.
>> Tested-by: Richard Weinberger <richard@nod.at>
>
> FYI, this change set breaks util-linux's unshare(1) tool
> as an unprivileged is no longer allowed to write to
> /proc/self/gid_map.

Only the --map-root-user option.  The patch below fixes it.
I will push this upstream after I push the main change to Linus.

This probably deseres a little discussion on the util-linux list.  Most
use cases will continue to work but with setgroups disabled some things
won't work and can not be made to work without privilege.

From: "Eric W. Biederman" <ebiederm@xmission.com>
Date: Thu, 11 Dec 2014 20:05:25 -0600
Subject: [PATCH] unshare: Fix --map-root-user to work on new kernels

In rare cases droping groups with setgroups(0, NULL) is an operation
that can grant a user additional privileges.  User namespaces were
allwoing that operation to unprivileged users and that had to be
fixed.

Update unshare --map-root-user to disable the setgroups operation
before setting the gid_map.

This is needed as after the security fix gid_map is restricted to
privileged users unless setgroups has been disabled.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 include/pathnames.h |  1 +
 sys-utils/unshare.c | 19 +++++++++++++++++++
 2 files changed, 20 insertions(+)

diff --git a/include/pathnames.h b/include/pathnames.h
index 1cc4e15e6e4f..1c53e4554268 100644
--- a/include/pathnames.h
+++ b/include/pathnames.h
@@ -92,6 +92,7 @@
 
 #define _PATH_PROC_UIDMAP	"/proc/self/uid_map"
 #define _PATH_PROC_GIDMAP	"/proc/self/gid_map"
+#define _PATH_PROC_SETGROUPS	"/proc/self/setgroups"
 
 #define _PATH_PROC_ATTR_CURRENT	"/proc/self/attr/current"
 #define _PATH_PROC_ATTR_EXEC	"/proc/self/attr/exec"
diff --git a/sys-utils/unshare.c b/sys-utils/unshare.c
index 95e4afbd055e..d409a7c936b6 100644
--- a/sys-utils/unshare.c
+++ b/sys-utils/unshare.c
@@ -39,6 +39,24 @@
 #include "pathnames.h"
 #include "all-io.h"
 
+static void disable_setgroups(void)
+{
+	const char *file = _PATH_PROC_SETGROUPS;
+	const char *deny = "deny";
+	int fd;
+
+	fd = open(file, O_WRONLY);
+	if (fd < 0) {
+		if (errno == ENOENT)
+			return;
+		 err(EXIT_FAILURE, _("cannot open %s"), file);
+	}
+
+	if (write_all(fd, deny, strlen(deny)))
+		err(EXIT_FAILURE, _("write failed %s"), file);
+	close(fd);
+}
+
 static void map_id(const char *file, uint32_t from, uint32_t to)
 {
 	char *buf;
@@ -178,6 +196,7 @@ int main(int argc, char *argv[])
 	}
 
 	if (maproot) {
+		disable_setgroups();
 		map_id(_PATH_PROC_UIDMAP, 0, real_euid);
 		map_id(_PATH_PROC_GIDMAP, 0, real_egid);
 	}
-- 
2.1.3



^ permalink raw reply related	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2014-12-15  2:27 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-12-12 22:32 [PATCH review 00/18] userns: review of bug fixes for 3.19-rcX Eric W. Biederman
2014-12-12 22:48 ` [PATCH review 01/18] mnt: Implicitly add MNT_NODEV on remount when it was implicitly added by mount Eric W. Biederman
2014-12-12 22:48 ` [PATCH review 02/18] mnt: Update unprivileged remount test Eric W. Biederman
     [not found] ` <87k31wzehb.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>
2014-12-12 22:48   ` [PATCH review 03/18] umount: Disallow unprivileged mount force Eric W. Biederman
2014-12-12 23:07     ` Andy Lutomirski
     [not found]       ` <CALCETrV2kBfzypMbYKgxJ4BqB6yBG6Xvo=sZy3tvTng5ZRvAKQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-12-12 23:25         ` Eric W. Biederman
2014-12-13  0:20           ` Andy Lutomirski
2014-12-12 22:48   ` [PATCH review 04/18] umount: Do not allow unmounting rootfs Eric W. Biederman
2014-12-12 22:48   ` [PATCH review 11/18] userns: Don't allow unprivileged creation of gid mappings Eric W. Biederman
2014-12-12 22:48   ` [PATCH review 13/18] userns: Only allow the creator of the userns unprivileged mappings Eric W. Biederman
2014-12-14 19:41   ` [PATCH review 00/18] userns: review of bug fixes for 3.19-rcX Richard Weinberger
2014-12-15  2:25     ` Eric W. Biederman
2014-12-12 22:48 ` [PATCH review 05/18] mnt: Move the clear of MNT_LOCKED from copy_tree to it's callers Eric W. Biederman
2014-12-12 22:48 ` [PATCH review 06/18] mnt: Carefully set CL_UNPRIVILEGED in clone_mnt Eric W. Biederman
2014-12-12 22:48 ` [PATCH review 07/18] mnt: Clear mnt_expire during pivot_root Eric W. Biederman
2014-12-12 22:48 ` [PATCH review 08/18] groups: Consolidate the setgroups permission checks Eric W. Biederman
2014-12-12 22:48 ` [PATCH review 09/18] userns: Document what the invariant required for safe unprivileged mappings Eric W. Biederman
2014-12-12 22:48 ` [PATCH review 10/18] userns: Don't allow setgroups until a gid mapping has been setablished Eric W. Biederman
2014-12-12 22:48 ` [PATCH review 12/18] userns: Check euid no fsuid when establishing an unprivileged uid mapping Eric W. Biederman
2014-12-12 22:48 ` [PATCH review 14/18] userns: Rename id_map_mutex to userns_state_mutex Eric W. Biederman
2014-12-12 22:48 ` [PATCH review 15/18] userns: Add a knob to disable setgroups on a per user namespace basis Eric W. Biederman
2014-12-12 22:48 ` [PATCH review 16/18] userns: Allow setting gid_maps without privilege when setgroups is disabled Eric W. Biederman
2014-12-12 22:48 ` [PATCH review 17/18] userns; Correct the comment in map_write Eric W. Biederman
2014-12-12 22:48 ` [PATCH review 18/18] userns: Unbreak the unprivileged remount tests Eric W. Biederman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox