* [Patch 0/2] allow the creation of architecture emulation containers where the emulator binary is outside the container
@ 2016-02-18 1:02 James Bottomley
[not found] ` <1455757372.7664.7.camel-d9PhHud1JfjCXq6kfMZ53/egYHeGw8Jk@public.gmane.org>
` (2 more replies)
0 siblings, 3 replies; 6+ messages in thread
From: James Bottomley @ 2016-02-18 1:02 UTC (permalink / raw)
To: containers, linux-fsdevel
I've recently been playing a lot with architecture emulation containers
using qemu, mainly so I can build and test the arm secure boot
toolchain on my x86 laptop (not having any arm server hardware).
The process for bringing up architecture emulation containers using
qemu-user is very cumbersome because the emulation has to be installed
within the container. This is bad for several reasons, firstly because
it contaminates the container image to have an emulator sitting inside
it. Secondly it means that all orchestration systems have to be
explicitly modified to work with non-native architectures and thirdly
it means that the consumer of the container can accidentally destroy
the emulation and thus permanently crash the container.
The fix for this is to add a mode to binfmt_misc where the emulation
can be permanently installed from the current mount namespace and where
it survives both chroot and changes of mount namespace, effectively
allowing the container to run using the emulator, but where the
emulator itself isn't present within the container.
There are a couple of downsides to this, firstly the mapping of the
interpreter is accessible inside the container even if the actual file
isn't (I don't think this is really a security problem) and secondly,
to update the emulator package, you now have to remove the emulation
update and reinstall it, but this should be easy to do with packaging
scripts. Finally, the emulator must be static otherwise the container
would crash when the elf binary format attempted to run the elf
interpreter.
I'm not really a regular FS developer, so I'd appreciate FS people
taking a look at the new filp_clone_open() API and whether I got
everything correct.
Thanks,
James
---
James Bottomley (2):
fs: add filp_clone_open API
binfmt_misc: add persistent opened binary handler for containers
fs/binfmt_misc.c | 42 ++++++++++++++++++++++++++++++++++++++++--
fs/internal.h | 1 +
fs/open.c | 20 ++++++++++++++++++++
3 files changed, 61 insertions(+), 2 deletions(-)
--
2.6.2
^ permalink raw reply [flat|nested] 6+ messages in thread[parent not found: <1455757372.7664.7.camel-d9PhHud1JfjCXq6kfMZ53/egYHeGw8Jk@public.gmane.org>]
* [Patch 1/2] fs: add filp_clone_open API
[not found] ` <1455757372.7664.7.camel-d9PhHud1JfjCXq6kfMZ53/egYHeGw8Jk@public.gmane.org>
@ 2016-02-18 1:04 ` James Bottomley
2016-02-18 1:05 ` [Patch 2/2] binfmt_misc: add persistently opened binary handler for containers James Bottomley
1 sibling, 0 replies; 6+ messages in thread
From: James Bottomley @ 2016-02-18 1:04 UTC (permalink / raw)
To: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
linux-fsdevel
I need an API that allows me to obtain a clone of the current file
pointer to pass in to an exec handler. I've labelled this as an
internal API because I can't see how it would be useful outside of the
fs subsystem. The use case will be a persistent binfmt_misc handler.
Signed-off-by: James Bottomley <James.Bottomley-d9PhHud1JfjCXq6kfMZ53/egYHeGw8Jk@public.gmane.org>
---
fs/internal.h | 1 +
fs/open.c | 20 ++++++++++++++++++++
2 files changed, 21 insertions(+)
diff --git a/fs/internal.h b/fs/internal.h
index b71deee..c8ca0c9 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -108,6 +108,7 @@ extern long do_handle_open(int mountdirfd,
struct file_handle __user *ufh, int open_flag);
extern int open_check_o_direct(struct file *f);
extern int vfs_open(const struct path *, struct file *, const struct cred *);
+extern struct file *filp_clone_open(struct file *);
/*
* inode.c
diff --git a/fs/open.c b/fs/open.c
index 55bdc75..6c21c14 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -1004,6 +1004,26 @@ struct file *file_open_root(struct dentry *dentry, struct vfsmount *mnt,
}
EXPORT_SYMBOL(file_open_root);
+struct file *filp_clone_open(struct file *oldfile)
+{
+ struct file *file;
+ int retval;
+
+ file = get_empty_filp();
+ if (IS_ERR(file))
+ return file;
+
+ file->f_flags = oldfile->f_flags;
+ retval = vfs_open(&oldfile->f_path, file, current_cred());
+ if (retval) {
+ fput(file);
+ return ERR_PTR(retval);
+ }
+
+ return file;
+}
+EXPORT_SYMBOL(filp_clone_open);
+
long do_sys_open(int dfd, const char __user *filename, int flags, umode_t mode)
{
struct open_flags op;
--
2.6.2
^ permalink raw reply related [flat|nested] 6+ messages in thread* [Patch 2/2] binfmt_misc: add persistently opened binary handler for containers
[not found] ` <1455757372.7664.7.camel-d9PhHud1JfjCXq6kfMZ53/egYHeGw8Jk@public.gmane.org>
2016-02-18 1:04 ` [Patch 1/2] fs: add filp_clone_open API James Bottomley
@ 2016-02-18 1:05 ` James Bottomley
1 sibling, 0 replies; 6+ messages in thread
From: James Bottomley @ 2016-02-18 1:05 UTC (permalink / raw)
To: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
linux-fsdevel
This patch adds a new flag 'F' to the binfmt handlers. If you pass in
'F' the binary that runs the emulation will be opened immediately and
in future, will be cloned from the open file.
The net effect is that the handler survives both changeroots and mount
namespace changes, making it easy to work with foreign architecture
containers without contaminating the container image with the
emulator.
Signed-off-by: James Bottomley <James.Bottomley-d9PhHud1JfjCXq6kfMZ53/egYHeGw8Jk@public.gmane.org>
---
fs/binfmt_misc.c | 42 ++++++++++++++++++++++++++++++++++++++++--
1 file changed, 40 insertions(+), 2 deletions(-)
diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c
index 3a3ced7..c61ade2 100644
--- a/fs/binfmt_misc.c
+++ b/fs/binfmt_misc.c
@@ -26,6 +26,8 @@
#include <linux/fs.h>
#include <linux/uaccess.h>
+#include "internal.h"
+
#ifdef DEBUG
# define USE_DEBUG 1
#else
@@ -43,6 +45,7 @@ enum {Enabled, Magic};
#define MISC_FMT_PRESERVE_ARGV0 (1 << 31)
#define MISC_FMT_OPEN_BINARY (1 << 30)
#define MISC_FMT_CREDENTIALS (1 << 29)
+#define MISC_FMT_OPEN_FILE (1 << 28)
typedef struct {
struct list_head list;
@@ -54,6 +57,7 @@ typedef struct {
char *interpreter; /* filename of interpreter */
char *name;
struct dentry *dentry;
+ struct file *interp_file;
} Node;
static DEFINE_RWLOCK(entries_lock);
@@ -201,7 +205,13 @@ static int load_misc_binary(struct linux_binprm *bprm)
if (retval < 0)
goto error;
- interp_file = open_exec(iname);
+ if (fmt->flags & MISC_FMT_OPEN_FILE && fmt->interp_file) {
+ interp_file = filp_clone_open(fmt->interp_file);
+ if (!IS_ERR(interp_file))
+ deny_write_access(interp_file);
+ } else {
+ interp_file = open_exec(iname);
+ }
retval = PTR_ERR(interp_file);
if (IS_ERR(interp_file))
goto error;
@@ -285,6 +295,11 @@ static char *check_special_flags(char *sfs, Node *e)
e->flags |= (MISC_FMT_CREDENTIALS |
MISC_FMT_OPEN_BINARY);
break;
+ case 'F':
+ pr_debug("register: flag: F: open interpreter file now\n");
+ p++;
+ e->flags |= MISC_FMT_OPEN_FILE;
+ break;
default:
cont = 0;
}
@@ -543,6 +558,8 @@ static void entry_status(Node *e, char *page)
*dp++ = 'O';
if (e->flags & MISC_FMT_CREDENTIALS)
*dp++ = 'C';
+ if (e->flags & MISC_FMT_OPEN_FILE)
+ *dp++ = 'F';
*dp++ = '\n';
if (!test_bit(Magic, &e->flags)) {
@@ -590,6 +607,12 @@ static void kill_node(Node *e)
}
write_unlock(&entries_lock);
+ if ((e->flags & MISC_FMT_OPEN_FILE) && e->interp_file) {
+ filp_close(e->interp_file, NULL);
+ fput(e->interp_file);
+ e->interp_file = NULL;
+ }
+
if (dentry) {
drop_nlink(d_inode(dentry));
d_drop(dentry);
@@ -698,6 +721,21 @@ static ssize_t bm_register_write(struct file *file, const char __user *buffer,
goto out2;
}
+ if (e->flags & MISC_FMT_OPEN_FILE) {
+ struct file *f;
+
+ f = open_exec(e->interpreter);
+ if (IS_ERR(f)) {
+ err = PTR_ERR(f);
+ pr_notice("register: failed to install interpreter file %s\n", e->interpreter);
+ simple_release_fs(&bm_mnt, &entry_count);
+ iput(inode);
+ inode = NULL;
+ goto out2;
+ }
+ e->interp_file = f;
+ }
+
e->dentry = dget(dentry);
inode->i_private = e;
inode->i_fop = &bm_entry_operations;
@@ -716,7 +754,7 @@ out:
if (err) {
kfree(e);
- return -EINVAL;
+ return err;
}
return count;
}
--
2.6.2
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [Patch 1/2] fs: add filp_clone_open API
2016-02-18 1:02 [Patch 0/2] allow the creation of architecture emulation containers where the emulator binary is outside the container James Bottomley
[not found] ` <1455757372.7664.7.camel-d9PhHud1JfjCXq6kfMZ53/egYHeGw8Jk@public.gmane.org>
@ 2016-02-18 1:04 ` James Bottomley
2016-02-18 1:05 ` [Patch 2/2] binfmt_misc: add persistently opened binary handler for containers James Bottomley
2 siblings, 0 replies; 6+ messages in thread
From: James Bottomley @ 2016-02-18 1:04 UTC (permalink / raw)
To: containers, linux-fsdevel
I need an API that allows me to obtain a clone of the current file
pointer to pass in to an exec handler. I've labelled this as an
internal API because I can't see how it would be useful outside of the
fs subsystem. The use case will be a persistent binfmt_misc handler.
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
---
fs/internal.h | 1 +
fs/open.c | 20 ++++++++++++++++++++
2 files changed, 21 insertions(+)
diff --git a/fs/internal.h b/fs/internal.h
index b71deee..c8ca0c9 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -108,6 +108,7 @@ extern long do_handle_open(int mountdirfd,
struct file_handle __user *ufh, int open_flag);
extern int open_check_o_direct(struct file *f);
extern int vfs_open(const struct path *, struct file *, const struct cred *);
+extern struct file *filp_clone_open(struct file *);
/*
* inode.c
diff --git a/fs/open.c b/fs/open.c
index 55bdc75..6c21c14 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -1004,6 +1004,26 @@ struct file *file_open_root(struct dentry *dentry, struct vfsmount *mnt,
}
EXPORT_SYMBOL(file_open_root);
+struct file *filp_clone_open(struct file *oldfile)
+{
+ struct file *file;
+ int retval;
+
+ file = get_empty_filp();
+ if (IS_ERR(file))
+ return file;
+
+ file->f_flags = oldfile->f_flags;
+ retval = vfs_open(&oldfile->f_path, file, current_cred());
+ if (retval) {
+ fput(file);
+ return ERR_PTR(retval);
+ }
+
+ return file;
+}
+EXPORT_SYMBOL(filp_clone_open);
+
long do_sys_open(int dfd, const char __user *filename, int flags, umode_t mode)
{
struct open_flags op;
--
2.6.2
^ permalink raw reply related [flat|nested] 6+ messages in thread* [Patch 2/2] binfmt_misc: add persistently opened binary handler for containers
2016-02-18 1:02 [Patch 0/2] allow the creation of architecture emulation containers where the emulator binary is outside the container James Bottomley
[not found] ` <1455757372.7664.7.camel-d9PhHud1JfjCXq6kfMZ53/egYHeGw8Jk@public.gmane.org>
2016-02-18 1:04 ` [Patch 1/2] fs: add filp_clone_open API James Bottomley
@ 2016-02-18 1:05 ` James Bottomley
2 siblings, 0 replies; 6+ messages in thread
From: James Bottomley @ 2016-02-18 1:05 UTC (permalink / raw)
To: containers, linux-fsdevel
This patch adds a new flag 'F' to the binfmt handlers. If you pass in
'F' the binary that runs the emulation will be opened immediately and
in future, will be cloned from the open file.
The net effect is that the handler survives both changeroots and mount
namespace changes, making it easy to work with foreign architecture
containers without contaminating the container image with the
emulator.
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
---
fs/binfmt_misc.c | 42 ++++++++++++++++++++++++++++++++++++++++--
1 file changed, 40 insertions(+), 2 deletions(-)
diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c
index 3a3ced7..c61ade2 100644
--- a/fs/binfmt_misc.c
+++ b/fs/binfmt_misc.c
@@ -26,6 +26,8 @@
#include <linux/fs.h>
#include <linux/uaccess.h>
+#include "internal.h"
+
#ifdef DEBUG
# define USE_DEBUG 1
#else
@@ -43,6 +45,7 @@ enum {Enabled, Magic};
#define MISC_FMT_PRESERVE_ARGV0 (1 << 31)
#define MISC_FMT_OPEN_BINARY (1 << 30)
#define MISC_FMT_CREDENTIALS (1 << 29)
+#define MISC_FMT_OPEN_FILE (1 << 28)
typedef struct {
struct list_head list;
@@ -54,6 +57,7 @@ typedef struct {
char *interpreter; /* filename of interpreter */
char *name;
struct dentry *dentry;
+ struct file *interp_file;
} Node;
static DEFINE_RWLOCK(entries_lock);
@@ -201,7 +205,13 @@ static int load_misc_binary(struct linux_binprm *bprm)
if (retval < 0)
goto error;
- interp_file = open_exec(iname);
+ if (fmt->flags & MISC_FMT_OPEN_FILE && fmt->interp_file) {
+ interp_file = filp_clone_open(fmt->interp_file);
+ if (!IS_ERR(interp_file))
+ deny_write_access(interp_file);
+ } else {
+ interp_file = open_exec(iname);
+ }
retval = PTR_ERR(interp_file);
if (IS_ERR(interp_file))
goto error;
@@ -285,6 +295,11 @@ static char *check_special_flags(char *sfs, Node *e)
e->flags |= (MISC_FMT_CREDENTIALS |
MISC_FMT_OPEN_BINARY);
break;
+ case 'F':
+ pr_debug("register: flag: F: open interpreter file now\n");
+ p++;
+ e->flags |= MISC_FMT_OPEN_FILE;
+ break;
default:
cont = 0;
}
@@ -543,6 +558,8 @@ static void entry_status(Node *e, char *page)
*dp++ = 'O';
if (e->flags & MISC_FMT_CREDENTIALS)
*dp++ = 'C';
+ if (e->flags & MISC_FMT_OPEN_FILE)
+ *dp++ = 'F';
*dp++ = '\n';
if (!test_bit(Magic, &e->flags)) {
@@ -590,6 +607,12 @@ static void kill_node(Node *e)
}
write_unlock(&entries_lock);
+ if ((e->flags & MISC_FMT_OPEN_FILE) && e->interp_file) {
+ filp_close(e->interp_file, NULL);
+ fput(e->interp_file);
+ e->interp_file = NULL;
+ }
+
if (dentry) {
drop_nlink(d_inode(dentry));
d_drop(dentry);
@@ -698,6 +721,21 @@ static ssize_t bm_register_write(struct file *file, const char __user *buffer,
goto out2;
}
+ if (e->flags & MISC_FMT_OPEN_FILE) {
+ struct file *f;
+
+ f = open_exec(e->interpreter);
+ if (IS_ERR(f)) {
+ err = PTR_ERR(f);
+ pr_notice("register: failed to install interpreter file %s\n", e->interpreter);
+ simple_release_fs(&bm_mnt, &entry_count);
+ iput(inode);
+ inode = NULL;
+ goto out2;
+ }
+ e->interp_file = f;
+ }
+
e->dentry = dget(dentry);
inode->i_private = e;
inode->i_fop = &bm_entry_operations;
@@ -716,7 +754,7 @@ out:
if (err) {
kfree(e);
- return -EINVAL;
+ return err;
}
return count;
}
--
2.6.2
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [Patch 0/2] allow the creation of architecture emulation containers where the emulator binary is outside the container
@ 2016-02-18 1:02 James Bottomley
0 siblings, 0 replies; 6+ messages in thread
From: James Bottomley @ 2016-02-18 1:02 UTC (permalink / raw)
To: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
linux-fsdevel
I've recently been playing a lot with architecture emulation containers
using qemu, mainly so I can build and test the arm secure boot
toolchain on my x86 laptop (not having any arm server hardware).
The process for bringing up architecture emulation containers using
qemu-user is very cumbersome because the emulation has to be installed
within the container. This is bad for several reasons, firstly because
it contaminates the container image to have an emulator sitting inside
it. Secondly it means that all orchestration systems have to be
explicitly modified to work with non-native architectures and thirdly
it means that the consumer of the container can accidentally destroy
the emulation and thus permanently crash the container.
The fix for this is to add a mode to binfmt_misc where the emulation
can be permanently installed from the current mount namespace and where
it survives both chroot and changes of mount namespace, effectively
allowing the container to run using the emulator, but where the
emulator itself isn't present within the container.
There are a couple of downsides to this, firstly the mapping of the
interpreter is accessible inside the container even if the actual file
isn't (I don't think this is really a security problem) and secondly,
to update the emulator package, you now have to remove the emulation
update and reinstall it, but this should be easy to do with packaging
scripts. Finally, the emulator must be static otherwise the container
would crash when the elf binary format attempted to run the elf
interpreter.
I'm not really a regular FS developer, so I'd appreciate FS people
taking a look at the new filp_clone_open() API and whether I got
everything correct.
Thanks,
James
---
James Bottomley (2):
fs: add filp_clone_open API
binfmt_misc: add persistent opened binary handler for containers
fs/binfmt_misc.c | 42 ++++++++++++++++++++++++++++++++++++++++--
fs/internal.h | 1 +
fs/open.c | 20 ++++++++++++++++++++
3 files changed, 61 insertions(+), 2 deletions(-)
--
2.6.2
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2016-02-18 1:05 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-02-18 1:02 [Patch 0/2] allow the creation of architecture emulation containers where the emulator binary is outside the container James Bottomley
[not found] ` <1455757372.7664.7.camel-d9PhHud1JfjCXq6kfMZ53/egYHeGw8Jk@public.gmane.org>
2016-02-18 1:04 ` [Patch 1/2] fs: add filp_clone_open API James Bottomley
2016-02-18 1:05 ` [Patch 2/2] binfmt_misc: add persistently opened binary handler for containers James Bottomley
2016-02-18 1:04 ` [Patch 1/2] fs: add filp_clone_open API James Bottomley
2016-02-18 1:05 ` [Patch 2/2] binfmt_misc: add persistently opened binary handler for containers James Bottomley
-- strict thread matches above, loose matches on Subject: below --
2016-02-18 1:02 [Patch 0/2] allow the creation of architecture emulation containers where the emulator binary is outside the container James Bottomley
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.