linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [Patch resend v3 0/3] allow the creation of architecture emulation containers where the emulator binary is outside the container
@ 2016-04-19 12:24 James Bottomley
  2016-04-19 12:27 ` [Patch resend v3 1/3] fs: add filp_clone_open API James Bottomley
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: James Bottomley @ 2016-04-19 12:24 UTC (permalink / raw)
  To: containers, linux-fsdevel; +Cc: Al Viro, Jan Kara

Changes in v3:
 * Collected acks from Jan Kara and Serge Hallyn
 * Fixed doc spelling errors
 * fput -> put_filp in error path

Changes in v2:

 * Use oldfile f_creds for the vfs_open credentials in
   filp_clone_open()
 * remove spurious fput() after filp_close()
 * Add documentation for the F option

There's also now a git tree at

https://git.kernel.org/cgit/linux/kernel/git/jejb/binfmt_misc.git

Original cover letter:

I've recently been playing a lot with architecture emulation containers
using qemu, mainly so I can build and test the arm secure boot
toolchain on my x86 laptop (not having any arm server hardware).

The process for bringing up architecture emulation containers using
qemu-user is very cumbersome because the emulation has to be installed
within the container.  This is bad for several reasons, firstly because
it contaminates the container image to have an emulator sitting inside
it.  Secondly it means that all orchestration systems have to be
explicitly modified to work with non-native architectures and thirdly
it means that the consumer of the container can accidentally destroy
the emulation and thus permanently crash the container.

The fix for this is to add a mode to binfmt_misc where the emulation
can be permanently installed from the current mount namespace and where
it survives both chroot and changes of mount namespace, effectively
allowing the container to run using the emulator, but where the
emulator itself isn't present within the container.

There are a couple of downsides to this, firstly the mapping of the
interpreter is accessible inside the container even if the actual file
isn't (I don't think this is really a security problem) and secondly,
to update the emulator package, you now have to remove the emulation
update and reinstall it, but this should be easy to do with packaging
scripts.  Finally, the emulator must be static otherwise the container
would crash when the elf binary format attempted to run the elf
interpreter.

I'm not really a regular FS developer, so I'd appreciate FS people
taking a look at the new filp_clone_open() API and whether I got
everything correct.

Thanks,

James

---
James Bottomley (3):
  fs: add filp_clone_open API
  binfmt_misc: add persistent opened binary handler for containers
  binfmt_misc: add F option description to documentation

 Documentation/binfmt_misc.txt |  7 +++++++
 fs/binfmt_misc.c              | 41
+++++++++++++++++++++++++++++++++++++++--
 fs/internal.h                 |  1 +
 fs/open.c                     | 20 ++++++++++++++++++++
 4 files changed, 67 insertions(+), 2 deletions(-)


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Patch resend v3 1/3] fs: add filp_clone_open API
  2016-04-19 12:24 [Patch resend v3 0/3] allow the creation of architecture emulation containers where the emulator binary is outside the container James Bottomley
@ 2016-04-19 12:27 ` James Bottomley
  2016-04-19 12:28 ` [Patch resend v3 2/3] binfmt_misc: add persistent opened binary handler for containers James Bottomley
  2016-04-19 12:29 ` [Patch resend v3 3/3] binfmt_misc: add F option description to documentation James Bottomley
  2 siblings, 0 replies; 6+ messages in thread
From: James Bottomley @ 2016-04-19 12:27 UTC (permalink / raw)
  To: containers, linux-fsdevel; +Cc: Al Viro, Jan Kara

I need an API that allows me to obtain a clone of the current file
pointer to pass in to an exec handler.  I've labelled this as an
internal API because I can't see how it would be useful outside of the
fs subsystem.  The use case will be a persistent binfmt_misc handler.

Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Acked-by: Jan Kara <jack@suse.cz>
---
 fs/internal.h |  1 +
 fs/open.c     | 20 ++++++++++++++++++++
 2 files changed, 21 insertions(+)

diff --git a/fs/internal.h b/fs/internal.h
index b71deee..c8ca0c9 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -108,6 +108,7 @@ extern long do_handle_open(int mountdirfd,
 			   struct file_handle __user *ufh, int open_flag);
 extern int open_check_o_direct(struct file *f);
 extern int vfs_open(const struct path *, struct file *, const struct cred *);
+extern struct file *filp_clone_open(struct file *);
 
 /*
  * inode.c
diff --git a/fs/open.c b/fs/open.c
index 17cb6b1..bfe6f2b 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -1002,6 +1002,26 @@ struct file *file_open_root(struct dentry *dentry, struct vfsmount *mnt,
 }
 EXPORT_SYMBOL(file_open_root);
 
+struct file *filp_clone_open(struct file *oldfile)
+{
+	struct file *file;
+	int retval;
+
+	file = get_empty_filp();
+	if (IS_ERR(file))
+		return file;
+
+	file->f_flags = oldfile->f_flags;
+	retval = vfs_open(&oldfile->f_path, file, oldfile->f_cred);
+	if (retval) {
+		put_filp(file);
+		return ERR_PTR(retval);
+	}
+
+	return file;
+}
+EXPORT_SYMBOL(filp_clone_open);
+
 long do_sys_open(int dfd, const char __user *filename, int flags, umode_t mode)
 {
 	struct open_flags op;
-- 
2.6.6


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [Patch resend v3 2/3] binfmt_misc: add persistent opened binary handler for containers
  2016-04-19 12:24 [Patch resend v3 0/3] allow the creation of architecture emulation containers where the emulator binary is outside the container James Bottomley
  2016-04-19 12:27 ` [Patch resend v3 1/3] fs: add filp_clone_open API James Bottomley
@ 2016-04-19 12:28 ` James Bottomley
  2016-04-19 12:29 ` [Patch resend v3 3/3] binfmt_misc: add F option description to documentation James Bottomley
  2 siblings, 0 replies; 6+ messages in thread
From: James Bottomley @ 2016-04-19 12:28 UTC (permalink / raw)
  To: containers, linux-fsdevel; +Cc: Al Viro, Jan Kara

>From a119fa7e0d4eb96a6290eb63a5bd597f5844fab4 Mon Sep 17 00:00:00 2001
From: James Bottomley <James.Bottomley@HansenPartnership.com>
Date: Wed, 17 Feb 2016 16:51:16 -0800
Subject: [PATCH 2/3] binfmt_misc: add persistent opened binary handler for
 containers

This patch adds a new flag 'F' to the binfmt handlers.  If you pass in
'F' the binary that runs the emulation will be opened immediately and
in future, will be cloned from the open file.

The net effect is that the handler survives both changeroots and mount
namespace changes, making it easy to work with foreign architecture
containers without contaminating the container image with the
emulator.

Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
---
 fs/binfmt_misc.c | 41 +++++++++++++++++++++++++++++++++++++++--
 1 file changed, 39 insertions(+), 2 deletions(-)

diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c
index 3a3ced7..8a108c4 100644
--- a/fs/binfmt_misc.c
+++ b/fs/binfmt_misc.c
@@ -26,6 +26,8 @@
 #include <linux/fs.h>
 #include <linux/uaccess.h>
 
+#include "internal.h"
+
 #ifdef DEBUG
 # define USE_DEBUG 1
 #else
@@ -43,6 +45,7 @@ enum {Enabled, Magic};
 #define MISC_FMT_PRESERVE_ARGV0 (1 << 31)
 #define MISC_FMT_OPEN_BINARY (1 << 30)
 #define MISC_FMT_CREDENTIALS (1 << 29)
+#define MISC_FMT_OPEN_FILE (1 << 28)
 
 typedef struct {
 	struct list_head list;
@@ -54,6 +57,7 @@ typedef struct {
 	char *interpreter;		/* filename of interpreter */
 	char *name;
 	struct dentry *dentry;
+	struct file *interp_file;
 } Node;
 
 static DEFINE_RWLOCK(entries_lock);
@@ -201,7 +205,13 @@ static int load_misc_binary(struct linux_binprm *bprm)
 	if (retval < 0)
 		goto error;
 
-	interp_file = open_exec(iname);
+	if (fmt->flags & MISC_FMT_OPEN_FILE && fmt->interp_file) {
+		interp_file = filp_clone_open(fmt->interp_file);
+		if (!IS_ERR(interp_file))
+			deny_write_access(interp_file);
+	} else {
+		interp_file = open_exec(iname);
+	}
 	retval = PTR_ERR(interp_file);
 	if (IS_ERR(interp_file))
 		goto error;
@@ -285,6 +295,11 @@ static char *check_special_flags(char *sfs, Node *e)
 			e->flags |= (MISC_FMT_CREDENTIALS |
 					MISC_FMT_OPEN_BINARY);
 			break;
+		case 'F':
+			pr_debug("register: flag: F: open interpreter file now\n");
+			p++;
+			e->flags |= MISC_FMT_OPEN_FILE;
+			break;
 		default:
 			cont = 0;
 		}
@@ -543,6 +558,8 @@ static void entry_status(Node *e, char *page)
 		*dp++ = 'O';
 	if (e->flags & MISC_FMT_CREDENTIALS)
 		*dp++ = 'C';
+	if (e->flags & MISC_FMT_OPEN_FILE)
+		*dp++ = 'F';
 	*dp++ = '\n';
 
 	if (!test_bit(Magic, &e->flags)) {
@@ -590,6 +607,11 @@ static void kill_node(Node *e)
 	}
 	write_unlock(&entries_lock);
 
+	if ((e->flags & MISC_FMT_OPEN_FILE) && e->interp_file) {
+		filp_close(e->interp_file, NULL);
+		e->interp_file = NULL;
+	}
+
 	if (dentry) {
 		drop_nlink(d_inode(dentry));
 		d_drop(dentry);
@@ -698,6 +720,21 @@ static ssize_t bm_register_write(struct file *file, const char __user *buffer,
 		goto out2;
 	}
 
+	if (e->flags & MISC_FMT_OPEN_FILE) {
+		struct file *f;
+
+		f = open_exec(e->interpreter);
+		if (IS_ERR(f)) {
+			err = PTR_ERR(f);
+			pr_notice("register: failed to install interpreter file %s\n", e->interpreter);
+			simple_release_fs(&bm_mnt, &entry_count);
+			iput(inode);
+			inode = NULL;
+			goto out2;
+		}
+		e->interp_file = f;
+	}
+
 	e->dentry = dget(dentry);
 	inode->i_private = e;
 	inode->i_fop = &bm_entry_operations;
@@ -716,7 +753,7 @@ out:
 
 	if (err) {
 		kfree(e);
-		return -EINVAL;
+		return err;
 	}
 	return count;
 }
-- 
2.6.6


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [Patch resend v3 3/3] binfmt_misc: add F option description to documentation
  2016-04-19 12:24 [Patch resend v3 0/3] allow the creation of architecture emulation containers where the emulator binary is outside the container James Bottomley
  2016-04-19 12:27 ` [Patch resend v3 1/3] fs: add filp_clone_open API James Bottomley
  2016-04-19 12:28 ` [Patch resend v3 2/3] binfmt_misc: add persistent opened binary handler for containers James Bottomley
@ 2016-04-19 12:29 ` James Bottomley
  2016-04-19 19:58   ` Serge E. Hallyn
  2 siblings, 1 reply; 6+ messages in thread
From: James Bottomley @ 2016-04-19 12:29 UTC (permalink / raw)
  To: containers, linux-fsdevel; +Cc: Al Viro, Jan Kara

Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
---
 Documentation/binfmt_misc.txt | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/Documentation/binfmt_misc.txt b/Documentation/binfmt_misc.txt
index 6b1de70..ec83bbc 100644
--- a/Documentation/binfmt_misc.txt
+++ b/Documentation/binfmt_misc.txt
@@ -66,6 +66,13 @@ Here is what the fields mean:
             This feature should be used with care as the interpreter
             will run with root permissions when a setuid binary owned by root
             is run with binfmt_misc.
+      'F' - fix binary.  The usual behaviour of binfmt_misc is to spawn the
+      	    binary lazily when the misc format file is invoked.  However,
+	    this doesn't work very well in the face of mount namespaces and
+	    changeroots, so the F mode opens the binary as soon as the
+	    emulation is installed and uses the opened image to spawn the
+	    emulator, meaning it is always available once installed,
+	    regardless of how the environment changes.
 
 
 There are some restrictions:
-- 
2.6.6


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [Patch resend v3 3/3] binfmt_misc: add F option description to documentation
  2016-04-19 12:29 ` [Patch resend v3 3/3] binfmt_misc: add F option description to documentation James Bottomley
@ 2016-04-19 19:58   ` Serge E. Hallyn
  2016-04-19 20:01     ` James Bottomley
  0 siblings, 1 reply; 6+ messages in thread
From: Serge E. Hallyn @ 2016-04-19 19:58 UTC (permalink / raw)
  To: James Bottomley; +Cc: containers, linux-fsdevel, Jan Kara, Al Viro

Quoting James Bottomley (James.Bottomley@HansenPartnership.com):
> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>

Thanks, James.  Which tree were you thinking this would go in
through?

> ---
>  Documentation/binfmt_misc.txt | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/Documentation/binfmt_misc.txt b/Documentation/binfmt_misc.txt
> index 6b1de70..ec83bbc 100644
> --- a/Documentation/binfmt_misc.txt
> +++ b/Documentation/binfmt_misc.txt
> @@ -66,6 +66,13 @@ Here is what the fields mean:
>              This feature should be used with care as the interpreter
>              will run with root permissions when a setuid binary owned by root
>              is run with binfmt_misc.
> +      'F' - fix binary.  The usual behaviour of binfmt_misc is to spawn the
> +      	    binary lazily when the misc format file is invoked.  However,
> +	    this doesn't work very well in the face of mount namespaces and
> +	    changeroots, so the F mode opens the binary as soon as the
> +	    emulation is installed and uses the opened image to spawn the
> +	    emulator, meaning it is always available once installed,
> +	    regardless of how the environment changes.
>  
>  
>  There are some restrictions:
> -- 
> 2.6.6
> 
> _______________________________________________
> Containers mailing list
> Containers@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/containers

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Patch resend v3 3/3] binfmt_misc: add F option description to documentation
  2016-04-19 19:58   ` Serge E. Hallyn
@ 2016-04-19 20:01     ` James Bottomley
  0 siblings, 0 replies; 6+ messages in thread
From: James Bottomley @ 2016-04-19 20:01 UTC (permalink / raw)
  To: Serge E. Hallyn; +Cc: containers, linux-fsdevel, Jan Kara, Al Viro

On Tue, 2016-04-19 at 14:58 -0500, Serge E. Hallyn wrote:
> Quoting James Bottomley (James.Bottomley@HansenPartnership.com):
> > Signed-off-by: James Bottomley <
> > James.Bottomley@HansenPartnership.com>
> 
> Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
> 
> Thanks, James.  Which tree were you thinking this would go in
> through?

I'm happy to become binfmt_misc maintainer (on the grounds that last to
touch it gets to keep it) and thus run my own tree, but I talked to Al
at LSF/MM and he thinks it can go through the vfs tree.

James

> > ---
> >  Documentation/binfmt_misc.txt | 7 +++++++
> >  1 file changed, 7 insertions(+)
> > 
> > diff --git a/Documentation/binfmt_misc.txt
> > b/Documentation/binfmt_misc.txt
> > index 6b1de70..ec83bbc 100644
> > --- a/Documentation/binfmt_misc.txt
> > +++ b/Documentation/binfmt_misc.txt
> > @@ -66,6 +66,13 @@ Here is what the fields mean:
> >              This feature should be used with care as the
> > interpreter
> >              will run with root permissions when a setuid binary
> > owned by root
> >              is run with binfmt_misc.
> > +      'F' - fix binary.  The usual behaviour of binfmt_misc is to
> > spawn the
> > +      	    binary lazily when the misc format file is
> > invoked.  However,
> > +	    this doesn't work very well in the face of mount
> > namespaces and
> > +	    changeroots, so the F mode opens the binary as soon as
> > the
> > +	    emulation is installed and uses the opened image to
> > spawn the
> > +	    emulator, meaning it is always available once
> > installed,
> > +	    regardless of how the environment changes.
> >  
> >  
> >  There are some restrictions:


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2016-04-19 20:01 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-04-19 12:24 [Patch resend v3 0/3] allow the creation of architecture emulation containers where the emulator binary is outside the container James Bottomley
2016-04-19 12:27 ` [Patch resend v3 1/3] fs: add filp_clone_open API James Bottomley
2016-04-19 12:28 ` [Patch resend v3 2/3] binfmt_misc: add persistent opened binary handler for containers James Bottomley
2016-04-19 12:29 ` [Patch resend v3 3/3] binfmt_misc: add F option description to documentation James Bottomley
2016-04-19 19:58   ` Serge E. Hallyn
2016-04-19 20:01     ` James Bottomley

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).