public inbox for linux-kernel@vger.kernel.org
* [PATCH] Move console redirect to pid namespace
@ 2013-02-09  2:28 minyard
From: minyard @ 2013-02-09  2:28 UTC
  To: Linux Kernel; +Cc: Corey Minyard

From: Corey Minyard <cminyard@mvista.com>

The console redirect - ioctl(fd, TIOCCONS) - is not in a namespace,
thus a container can do a redirect and grab all the I/O on the host
and all container consoles.

This change puts the redirect in the pid namespace.

Signed-off-by: Corey Minyard <cminyard@mvista.com>
---

I'm pretty sure this patch is not correct, but I'm not quite sure the
best way to fix this.  I'm not 100% sure that the pid namespace is the
right place, but it seemed the most reasonable of all the choices.  The
other obvious choice is the mount namespace, but it didn't seem as good
a fit.
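The per-namespace bookkeeping this patch introduces can be modeled in userspace to check its semantics. The following is a minimal sketch, not kernel code. It assumes hypothetical stand-in types `file_model` and `pidns_model`. Each namespace owns one redirect slot, and a single global lock guards every slot, matching the patch's comment that the redirect "is in the pid namespace, but we use a global lock". The control flow mirrors the patched `tioccons()`. A C11 `atomic_flag` spin loop stands in for the kernel spinlock.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct file: a refcount plus a flag modeling
 * "f_op->write == redirected_tty_write" (i.e. the fd is /dev/console). */
struct file_model {
	atomic_int refcount;
	int is_console;
};

/* Hypothetical stand-in for struct pid_namespace: one redirect slot. */
struct pidns_model {
	struct file_model *redirect;
};

/* One global lock for all namespaces, like redirect_lock in the patch. */
static atomic_flag redirect_lock_model = ATOMIC_FLAG_INIT;

static void lock_redirect(void)
{
	while (atomic_flag_test_and_set_explicit(&redirect_lock_model,
						 memory_order_acquire))
		; /* spin */
}

static void unlock_redirect(void)
{
	atomic_flag_clear_explicit(&redirect_lock_model, memory_order_release);
}

static struct file_model *get_file_model(struct file_model *f)
{
	atomic_fetch_add(&f->refcount, 1);
	return f;
}

static void fput_model(struct file_model *f)
{
	atomic_fetch_sub(&f->refcount, 1);
}

/* Same control flow as the patched tioccons(), applied to namespace ns. */
static int tioccons_model(struct pidns_model *ns, struct file_model *file)
{
	if (file->is_console) {
		/* TIOCCONS on the console itself clears this namespace's slot. */
		struct file_model *f;

		lock_redirect();
		f = ns->redirect;
		ns->redirect = NULL;
		unlock_redirect();
		if (f)
			fput_model(f); /* reference dropped after unlocking */
		return 0;
	}
	lock_redirect();
	if (ns->redirect) {
		unlock_redirect();
		return -EBUSY; /* only one redirect per namespace */
	}
	ns->redirect = get_file_model(file);
	unlock_redirect();
	return 0;
}
```

In this model, two namespaces can each hold a redirect at the same time. A second TIOCCONS in the same namespace still gets -EBUSY, just as the single global redirect did before.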

The other problem is that I don't think you can call fput() from
destroy_pid_namespace().  That can be called from interrupt context,
and I don't think fput() is safe there.  I know it's not safe in 3.4
with the RT patch applied.  However, the only way I've come up with to
fix it is to add a workqueue, and that seems a bit heavy for this.
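A common way to avoid dropping a reference in a context where that is unsafe is to queue the object and release it later, from a context where it is safe. The userspace sketch below shows that pattern and assumes a hypothetical `deferred_obj` type. `defer_release()` only performs a lock-free push, loosely like the kernel's `llist_add()`. `drain_deferred()` detaches the whole list at once, like `llist_del_all()`, and then releases each entry. In the kernel, the drain step is the part a work item would run. Here it is simply called directly.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical object whose release must not happen in atomic context. */
struct deferred_obj {
	struct deferred_obj *next;
	int released; /* set by the drain step in this model */
};

static _Atomic(struct deferred_obj *) deferred_head = NULL;

/* Callable where sleeping is forbidden: it only pushes onto a list. */
static void defer_release(struct deferred_obj *obj)
{
	struct deferred_obj *old =
		atomic_load_explicit(&deferred_head, memory_order_relaxed);

	do {
		obj->next = old;
	} while (!atomic_compare_exchange_weak_explicit(&deferred_head, &old, obj,
							memory_order_release,
							memory_order_relaxed));
}

/* Runs later, where releasing is allowed; returns the number released. */
static int drain_deferred(void)
{
	struct deferred_obj *obj =
		atomic_exchange_explicit(&deferred_head, NULL, memory_order_acquire);
	int n = 0;

	while (obj) {
		struct deferred_obj *next = obj->next;

		obj->released = 1; /* stand-in for the real fput() */
		obj = next;
		n++;
	}
	return n;
}
```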

 drivers/tty/tty_io.c          |   29 ++++++++++++++++++-----------
 include/linux/pid_namespace.h |    1 +
 kernel/pid_namespace.c        |    3 +++
 3 files changed, 22 insertions(+), 11 deletions(-)

diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c
index da9fde8..c93c23d 100644
--- a/drivers/tty/tty_io.c
+++ b/drivers/tty/tty_io.c
@@ -95,6 +95,7 @@
 #include <linux/seq_file.h>
 #include <linux/serial.h>
 #include <linux/ratelimit.h>
+#include <linux/pid_namespace.h>
 
 #include <linux/uaccess.h>
 
@@ -503,8 +504,10 @@ static const struct file_operations hung_up_tty_fops = {
 	.release	= tty_release,
 };
 
+/*
+ * redirect is in the pid namespace, but we use a global lock.
+ */
 static DEFINE_SPINLOCK(redirect_lock);
-static struct file *redirect;
 
 /**
  *	tty_wakeup	-	request more data
@@ -563,15 +566,17 @@ static void __tty_hangup(struct tty_struct *tty)
 	int    closecount = 0, n;
 	unsigned long flags;
 	int refs = 0;
+	struct file *redir;
 
 	if (!tty)
 		return;
 
 
 	spin_lock(&redirect_lock);
-	if (redirect && file_tty(redirect) == tty) {
-		f = redirect;
-		redirect = NULL;
+	redir = current->nsproxy->pid_ns->redirect;
+	if (redir && file_tty(redir) == tty) {
+		f = redir;
+		current->nsproxy->pid_ns->redirect = NULL;
 	}
 	spin_unlock(&redirect_lock);
 
@@ -1163,10 +1168,12 @@ ssize_t redirected_tty_write(struct file *file, const char __user *buf,
 						size_t count, loff_t *ppos)
 {
 	struct file *p = NULL;
+	struct file *redir;
 
 	spin_lock(&redirect_lock);
-	if (redirect)
-		p = get_file(redirect);
+	redir = current->nsproxy->pid_ns->redirect;
+	if (redir)
+		p = get_file(redir);
 	spin_unlock(&redirect_lock);
 
 	if (p) {
@@ -2247,19 +2254,19 @@ static int tioccons(struct file *file)
 	if (file->f_op->write == redirected_tty_write) {
 		struct file *f;
 		spin_lock(&redirect_lock);
-		f = redirect;
-		redirect = NULL;
+		f = current->nsproxy->pid_ns->redirect;
+		current->nsproxy->pid_ns->redirect = NULL;
 		spin_unlock(&redirect_lock);
 		if (f)
 			fput(f);
 		return 0;
 	}
 	spin_lock(&redirect_lock);
-	if (redirect) {
+	if (current->nsproxy->pid_ns->redirect) {
 		spin_unlock(&redirect_lock);
 		return -EBUSY;
 	}
-	redirect = get_file(file);
+	current->nsproxy->pid_ns->redirect = get_file(file);
 	spin_unlock(&redirect_lock);
 	return 0;
 }
diff --git a/include/linux/pid_namespace.h b/include/linux/pid_namespace.h
index 215e5e3..b04870f 100644
--- a/include/linux/pid_namespace.h
+++ b/include/linux/pid_namespace.h
@@ -38,6 +38,7 @@ struct pid_namespace {
 	int hide_pid;
 	int reboot;	/* group exit code if this pidns was rebooted */
 	unsigned int proc_inum;
+	struct file *redirect;
 };
 
 extern struct pid_namespace init_pid_ns;
diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
index c1c3dc1..af1bfce 100644
--- a/kernel/pid_namespace.c
+++ b/kernel/pid_namespace.c
@@ -18,6 +18,7 @@
 #include <linux/proc_fs.h>
 #include <linux/reboot.h>
 #include <linux/export.h>
+#include <linux/file.h>
 
 #define BITS_PER_PAGE		(PAGE_SIZE*8)
 
@@ -138,6 +139,8 @@ static void destroy_pid_namespace(struct pid_namespace *ns)
 {
 	int i;
 
+	if (ns->redirect)
+		fput(ns->redirect);
 	proc_free_inum(ns->proc_inum);
 	for (i = 0; i < PIDMAP_ENTRIES; i++)
 		kfree(ns->pidmap[i].page);
-- 
1.7.4.1



* Re: [PATCH] Move console redirect to pid namespace
@ 2013-02-09 18:14 ` Bruno Prémont
From: Bruno Prémont @ 2013-02-09 18:14 UTC
  To: Corey Minyard; +Cc: Linux Kernel, Corey Minyard, containers

CCing containers list

On Fri, 08 February 2013 minyard@acm.org wrote:
> From: Corey Minyard <cminyard@mvista.com>
> 
> The console redirect - ioctl(fd, TIOCCONS) - is not in a namespace,
> thus a container can do a redirect and grab all the I/O on the host
> and all container consoles.
> 
> This change puts the redirect in the pid namespace.
> 
> Signed-off-by: Corey Minyard <cminyard@mvista.com>
> ---
> 
> I'm pretty sure this patch is not correct, but I'm not quite sure the
> best way to fix this.  I'm not 100% sure that the pid namespace is the
> right place, but it seemed the most reasonable of all the choices.  The
> other obvious choice is the mount namespace, but it didn't seem as good
> a fit.

With recent changes, tying it to the init user namespace might even be better.

> The other problem is that I don't think you can call fput() from
> destroy_pid_namespace().  That can be called from interrupt context,
> and I don't think fput() is safe there.  I know it's not safe in 3.4
> with the RT patch applied.  However, the only way I've come up with to
> fix it is to add a workqueue, and that seems a bit heavy for this.



* Re: [PATCH] Move console redirect to pid namespace
@ 2013-02-13 19:08   ` Eric W. Biederman
From: Eric W. Biederman @ 2013-02-13 19:08 UTC
  To: Bruno Prémont; +Cc: Corey Minyard, Corey Minyard, containers, Linux Kernel

Bruno Prémont <bonbons@linux-vserver.org> writes:

> CCing containers list
>
> On Fri, 08 February 2013 minyard@acm.org wrote:
>> From: Corey Minyard <cminyard@mvista.com>
>> 
>> The console redirect - ioctl(fd, TIOCCONS) - is not in a namespace,
>> thus a container can do a redirect and grab all the I/O on the host
>> and all container consoles.
>> 
>> This change puts the redirect in the pid namespace.
>> 
>> Signed-off-by: Corey Minyard <cminyard@mvista.com>
>> ---
>> 
>> I'm pretty sure this patch is not correct, but I'm not quite sure the
>> best way to fix this.  I'm not 100% sure that the pid namespace is the
>> right place, but it seemed the most reasonable of all the choices.  The
>> other obvious choice is the mount namespace, but it didn't seem as good
>> a fit.
>
> With recent changes, tying it to the init user namespace might even
> be better.

With recent changes this is tied to the initial user namespace.  So the
simple solution to this and so many other similar security problems is
to run your container in a user namespace.

The permission check currently is capable(CAP_SYS_ADMIN), which requires
the caller to have CAP_SYS_ADMIN in the initial user namespace.
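In userspace, the capability check described above appears as an errno from the ioctl. The following is a minimal sketch that assumes hypothetical helper names. `try_tioccons()` returns 0 or a negative errno. `tioccons_reason()` maps the errors discussed in this thread: `EPERM` from the capable(CAP_SYS_ADMIN) check, `EBUSY` when a redirect is already in place (see `tioccons()` in the patch), and `ENOTTY` for a descriptor that is not a tty.

```c
#include <errno.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical helper: redirect console output to the tty open on fd.
 * Returns 0 on success or a negative errno value. */
static int try_tioccons(int fd)
{
	if (ioctl(fd, TIOCCONS) == -1)
		return -errno;
	return 0;
}

/* Human-readable meaning of the results discussed in this thread. */
static const char *tioccons_reason(int err)
{
	switch (err) {
	case 0:
		return "console output redirected";
	case -EPERM:
		return "caller lacks CAP_SYS_ADMIN in the initial user namespace";
	case -EBUSY:
		return "console output is already redirected";
	case -ENOTTY:
		return "descriptor is not a tty";
	default:
		return "unexpected error";
	}
}
```

Calling `try_tioccons()` on a pipe fails, typically with -ENOTTY, without touching the console. On a real tty with sufficient privilege, it really does redirect console output, so it is not something to run casually on a host.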

Is there a desire to have TIOCCONS not just fail in a container but to
have TIOCCONS work in a container-specific way?

>> The other problem is that I don't think you can call fput() from
>> destroy_pid_namespace().  That can be called from interrupt context,
>> and I don't think fput() is safe there.  I know it's not safe in 3.4
>> with the RT patch applied.  However, the only way I've come up with to
>> fix it is to add a workqueue, and that seems a bit heavy for this.

Actually getting destroy_pid_namespace out of interrupt context wouldn't
be the worst thing in the world.

Eric


* Re: [PATCH] Move console redirect to pid namespace
@ 2013-02-15  2:08     ` Corey Minyard
From: Corey Minyard @ 2013-02-15  2:08 UTC
  To: Eric W. Biederman
  Cc: Bruno Prémont, Corey Minyard, containers, Linux Kernel

On 02/13/2013 01:08 PM, Eric W. Biederman wrote:
> Bruno Prémont <bonbons@linux-vserver.org> writes:
>
>> CCing containers list
>>
>> On Fri, 08 February 2013 minyard@acm.org wrote:
>>> From: Corey Minyard <cminyard@mvista.com>
>>>
>>> The console redirect - ioctl(fd, TIOCCONS) - is not in a namespace,
>>> thus a container can do a redirect and grab all the I/O on the host
>>> and all container consoles.
>>>
>>> This change puts the redirect in the pid namespace.
>>>
>>> Signed-off-by: Corey Minyard <cminyard@mvista.com>
>>> ---
>>>
>>> I'm pretty sure this patch is not correct, but I'm not quite sure the
>>> best way to fix this.  I'm not 100% sure that the pid namespace is the
>>> right place, but it seemed the most reasonable of all the choices.  The
>>> other obvious choice is the mount namespace, but it didn't seem as good
>>> a fit.
>> With recent changes, tying it to the init user namespace might even
>> be better.
> With recent changes this is tied to the initial user namespace.  So the
> simple solution to this and so many other similar security problems is
> to run your container in a user namespace.
>
> The permission check currently is capable(CAP_SYS_ADMIN), which requires
> the caller to have CAP_SYS_ADMIN in the initial user namespace.

I'm not sure I follow.  Are these changes in k.org, or in another 
repository someplace?

>
> Is there a desire to have TIOCCONS not just fail in a container but to
> have TIOCCONS work in a container-specific way?

Well, my desire is for the host console to work properly if a container 
uses TIOCCONS :-).  It seems to me that the most consistent way to 
handle this is to have TIOCCONS in a container redirect the container's 
console.

>
>>> The other problem is that I don't think you can call fput() from
>>> destroy_pid_namespace().  That can be called from interrupt context,
>>> and I don't think fput() is safe there.  I know it's not safe in 3.4
>>> with the RT patch applied.  However, the only way I've come up with to
>>> fix it is to add a workqueue, and that seems a bit heavy for this.
> Actually getting destroy_pid_namespace out of interrupt context wouldn't
> be the worst thing in the world.

I would agree, but it would still require something like a workqueue.  
Is there a better mechanism?

-corey


* Re: [PATCH] Move console redirect to pid namespace
@ 2013-02-15  4:23       ` Eric W. Biederman
From: Eric W. Biederman @ 2013-02-15  4:23 UTC
  To: minyard; +Cc: Bruno Prémont, Corey Minyard, containers, Linux Kernel

Corey Minyard <tcminyard@gmail.com> writes:

> On 02/13/2013 01:08 PM, Eric W. Biederman wrote:
>> Bruno Prémont <bonbons@linux-vserver.org> writes:
>>
>>> CCing containers list
>>>
>>> On Fri, 08 February 2013 minyard@acm.org wrote:
>>>> From: Corey Minyard <cminyard@mvista.com>
>>>>
>>>> The console redirect - ioctl(fd, TIOCCONS) - is not in a namespace,
>>>> thus a container can do a redirect and grab all the I/O on the host
>>>> and all container consoles.
>>>>
>>>> This change puts the redirect in the pid namespace.
>>>>
>>>> Signed-off-by: Corey Minyard <cminyard@mvista.com>
>>>> ---
>>>>
>>>> I'm pretty sure this patch is not correct, but I'm not quite sure the
>>>> best way to fix this.  I'm not 100% sure that the pid namespace is the
>>>> right place, but it seemed the most reasonable of all the choices.  The
>>>> other obvious choice is the mount namespace, but it didn't seem as good
>>>> a fit.
>>> With recent changes, tying it to the init user namespace might even
>>> be better.
>> With recent changes this is tied to the initial user namespace.  So the
>> simple solution to this and so many other similar security problems is
>> to run your container in a user namespace.
>>
>> The permission check currently is capable(CAP_SYS_ADMIN), which requires
>> the caller to have CAP_SYS_ADMIN in the initial user namespace.
>
> I'm not sure I follow.  Are these changes in k.org, or in another
> repository someplace?

In k.org. 3.7 would work. 3.8-rcX would work even better.

root in a user namespace does not have permission to call TIOCCONS.

>> Is there a desire to have TIOCCONS not just fail in a container but to
>> have TIOCCONS work in a container-specific way?
>
> Well, my desire is for the host console to work properly if a
> container uses TIOCCONS :-).  It seems to me that the most consistent
> way to handle this is to have TIOCCONS in a container redirect the
> container's console.

Last I looked, people were creating a regular pty and using that in
/dev/ for their containers.  So the empirical evidence is that TIOCCONS
is not needed.  What case are you looking at that needs TIOCCONS?

If there is good cause, we can make TIOCCONS work, but we need a
compelling case beyond the fact that root in a container can do bad things.
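The regular-pty approach described above needs no TIOCCONS at all. The container manager allocates an ordinary pty pair, exposes the slave in the container's /dev/, and keeps the master on the host side. The following is a minimal Linux-specific sketch that assumes a hypothetical helper `open_pty_pair()` and a devpts mount at /dev/pts. It opens /dev/ptmx, clears the slave lock with TIOCSPTLCK, and reads the slave index with TIOCGPTN, which is essentially what glibc's `unlockpt()` and `ptsname()` do on Linux. The portable route is `posix_openpt()`, `grantpt()`, `unlockpt()`, and `ptsname()`.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical helper: allocate a pty pair on Linux.  On success returns 0,
 * stores both descriptors, and writes the slave path (e.g. /dev/pts/3) into
 * path.  On failure returns a negative errno and leaves nothing open. */
static int open_pty_pair(int *master_out, int *slave_out, char *path, size_t len)
{
	unsigned int ptn;
	int unlock = 0;
	int master, slave, err;

	master = open("/dev/ptmx", O_RDWR | O_NOCTTY);
	if (master < 0)
		return -errno;

	/* Unlock the slave side and ask devpts for its index. */
	if (ioctl(master, TIOCSPTLCK, &unlock) < 0 ||
	    ioctl(master, TIOCGPTN, &ptn) < 0)
		goto fail;

	/* Assumes devpts is mounted at /dev/pts. */
	if (snprintf(path, len, "/dev/pts/%u", ptn) >= (int)len) {
		errno = ENAMETOOLONG;
		goto fail;
	}

	slave = open(path, O_RDWR | O_NOCTTY);
	if (slave < 0)
		goto fail;

	*master_out = master;
	*slave_out = slave;
	return 0;

fail:
	err = errno;
	close(master);
	return -err;
}
```

The container gets the slave as its console. The host keeps the master and reads whatever the container writes there.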

>>>> The other problem is that I don't think you can call fput() from
>>>> destroy_pid_namespace().  That can be called from interrupt context,
>>>> and I don't think fput() is safe there.  I know it's not safe in 3.4
>>>> with the RT patch applied.  However, the only way I've come up with to
>>>> fix it is to add a workqueue, and that seems a bit heavy for this.
>> Actually getting destroy_pid_namespace out of interrupt context wouldn't
>> be the worst thing in the world.
>
> I would agree, but it would still require something like a workqueue.
> Is there a better mechanism?

It might be as simple as finding all of the put_pid() calls and moving
them out of spin_lock critical sections.  I don't know that we drop pids
in actual interrupt context.
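To move a put out of a spinlock-held section, you usually detach the pointer while holding the lock and drop the reference only after unlocking. The existing `tioccons()` code already does this with `fput()`. The following is a minimal single-threaded userspace sketch that assumes hypothetical names (`obj`, `slot_lock`, `clear_slot()`). It adds an assertion inside the put function that enforces the rule.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical refcounted object, standing in for struct pid or struct file. */
struct obj {
	atomic_int refcount;
};

static atomic_flag slot_lock = ATOMIC_FLAG_INIT;
static int slot_lock_held; /* single-threaded bookkeeping for the check */

static void lock_slot(void)
{
	while (atomic_flag_test_and_set_explicit(&slot_lock, memory_order_acquire))
		; /* spin */
	slot_lock_held = 1;
}

static void unlock_slot(void)
{
	slot_lock_held = 0;
	atomic_flag_clear_explicit(&slot_lock, memory_order_release);
}

/* The put that must never run while the spinlock is held. */
static void put_obj(struct obj *o)
{
	assert(!slot_lock_held);
	atomic_fetch_sub(&o->refcount, 1);
}

/* Detach under the lock; drop the reference after unlocking. */
static void clear_slot(struct obj **slot)
{
	struct obj *old;

	lock_slot();
	old = *slot;
	*slot = NULL;
	unlock_slot();

	if (old)
		put_obj(old);
}
```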

Eric



* Re: [PATCH] Move console redirect to pid namespace
@ 2013-02-15 14:50         ` Corey Minyard
From: Corey Minyard @ 2013-02-15 14:50 UTC
  To: Eric W. Biederman
  Cc: Bruno Prémont, Corey Minyard, containers, Linux Kernel

On 02/14/2013 10:23 PM, Eric W. Biederman wrote:
>
>>> With recent changes this is tied to the initial user namespace.  So the
>>> simple solution to this and so many other similar security problems is
>>> to run your container in a user namespace.
>>>
>>> The permission check currently is capable(CAP_SYS_ADMIN), which requires
>>> the caller to have CAP_SYS_ADMIN in the initial user namespace.
>> I'm not sure I follow.  Are these changes in k.org, or in another
>> repository someplace?
> In k.org. 3.7 would work. 3.8-rcX would work even better.
>
> root in a user namespace does not have permission to call TIOCCONS.

Ok, that's good enough for me.  I don't have a compelling reason to make 
it work, beyond liking consistency.

Thank you,

-corey
