public inbox for linux-kernel@vger.kernel.org
* [PATCH] [RFC] Make it easier to harden /proc/
@ 2011-03-16 19:31 Richard Weinberger
  2011-03-16 19:55 ` Kees Cook
  0 siblings, 1 reply; 19+ messages in thread
From: Richard Weinberger @ 2011-03-16 19:31 UTC (permalink / raw)
  To: linux-kernel
  Cc: akpm, serge, eparis, kees.cook, jmorris, eugeneteo, drosenberg,
	Richard Weinberger

When containers like LXC are used, an unprivileged and jailed
root user can still write to critical files in /proc/.
For example: /proc/sys/kernel/{sysrq, panic, panic_on_oops, ... }

This new restricted attribute makes it possible to protect such
files. When restricted is set to true, root needs CAP_SYS_ADMIN
to write into the file.

Signed-off-by: Richard Weinberger <richard@nod.at>
---
 fs/proc/proc_sysctl.c  |    3 +++
 include/linux/sysctl.h |    1 +
 2 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index 8eb2522..cf7f27d 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -149,6 +149,9 @@ static ssize_t proc_sys_call_handler(struct file *filp, void __user *buf,
 	if (sysctl_perm(head->root, table, write ? MAY_WRITE : MAY_READ))
 		goto out;
 
+	if (write && table->restricted && !capable(CAP_SYS_ADMIN))
+		goto out;
+
 	/* if that can happen at all, it should be -EINVAL, not -EISDIR */
 	error = -EINVAL;
 	if (!table->proc_handler)
diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index 11684d9..67d6129 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -1018,6 +1018,7 @@ struct ctl_table
 	void *data;
 	int maxlen;
 	mode_t mode;
+	bool restricted; /* CAP_SYS_ADMIN is needed for write access */
 	struct ctl_table *child;
 	struct ctl_table *parent;	/* Automatically set */
 	proc_handler *proc_handler;	/* Callback for text formatting */
-- 
1.6.6.1
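
The check added by the patch above can be modelled in userspace to make its
behaviour easy to test. This is only a sketch: struct model_ctl_table and
model_sysctl_access() are hypothetical names, the has_sys_admin flag stands in
for capable(CAP_SYS_ADMIN), and returning -EPERM assumes that is the error
value the handler holds when it jumps to "out".

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the patched struct ctl_table. */
struct model_ctl_table {
	const char *procname;
	bool restricted;	/* mirrors the field added by the patch */
};

/*
 * Models the added check. Returns 0 if the access may proceed and
 * -EPERM otherwise (assumed error value, see the note above).
 * has_sys_admin stands in for capable(CAP_SYS_ADMIN).
 */
static int model_sysctl_access(const struct model_ctl_table *table,
			       bool write, bool has_sys_admin)
{
	assert(table != NULL);

	/* Only writes to restricted entries need the capability. */
	if (write && table->restricted && !has_sys_admin)
		return -EPERM;
	return 0;
}
```

Reads are never affected in this model, and entries that leave restricted
unset behave as before.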



* Re: [PATCH] [RFC] Make it easier to harden /proc/
  2011-03-16 19:31 [PATCH] [RFC] Make it easier to harden /proc/ Richard Weinberger
@ 2011-03-16 19:55 ` Kees Cook
  2011-03-16 20:08   ` Richard Weinberger
  0 siblings, 1 reply; 19+ messages in thread
From: Kees Cook @ 2011-03-16 19:55 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: linux-kernel, akpm, serge, eparis, jmorris, eugeneteo, drosenberg

Hi Richard,

On Wed, Mar 16, 2011 at 08:31:47PM +0100, Richard Weinberger wrote:
> When containers like LXC are used, an unprivileged and jailed
> root user can still write to critical files in /proc/.
> For example: /proc/sys/kernel/{sysrq, panic, panic_on_oops, ... }
> 
> This new restricted attribute makes it possible to protect such
> files. When restricted is set to true, root needs CAP_SYS_ADMIN
> to write into the file.

I was thinking about this too. I'd prefer more fine-grained control
in this area, since some sysctl entries aren't strictly controlled by
CAP_SYS_ADMIN (e.g. mmap_min_addr is already checking CAP_SYS_RAWIO).

How about this instead?

Signed-off-by: Kees Cook <kees.cook@canonical.com>
---
diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index 8eb2522..5c5cfab 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -149,6 +149,10 @@ static ssize_t proc_sys_call_handler(struct file *filp, void __user *buf,
 	if (sysctl_perm(head->root, table, write ? MAY_WRITE : MAY_READ))
 		goto out;
 
+	if (write && !cap_isclear(table->write_caps) &&
+            !cap_issubset(table->write_caps, current_cred()->cap_permitted))
+		goto out;
+
 	/* if that can happen at all, it should be -EINVAL, not -EISDIR */
 	error = -EINVAL;
 	if (!table->proc_handler)
diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index 11684d9..4e05493 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -1018,6 +1018,7 @@ struct ctl_table
 	void *data;
 	int maxlen;
 	mode_t mode;
+	kernel_cap_t write_caps;	/* Capabilities required to write */
 	struct ctl_table *child;
 	struct ctl_table *parent;	/* Automatically set */
 	proc_handler *proc_handler;	/* Callback for text formatting */


-- 
Kees Cook
Ubuntu Security Team
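
The subset test in the patch above can be illustrated with a small userspace
model. This is a sketch under assumptions, not kernel code: model_cap_t is a
hypothetical 64-bit mask standing in for kernel_cap_t, the model_* helpers
imitate cap_isclear() and cap_issubset(), and the two capability numbers are
the values from linux/capability.h. As in the patch, the check is made against
the permitted set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for kernel_cap_t: one bit per capability. */
typedef uint64_t model_cap_t;

/* Capability numbers as defined in linux/capability.h. */
enum { MODEL_CAP_SYS_RAWIO = 17, MODEL_CAP_SYS_ADMIN = 21 };

static model_cap_t model_cap_bit(unsigned int cap)
{
	assert(cap < 64);
	return (model_cap_t)1 << cap;
}

/* True if no capability is set (imitates cap_isclear()). */
static bool model_cap_isclear(model_cap_t a)
{
	return a == 0;
}

/* True if every capability in a is also in set (imitates cap_issubset()). */
static bool model_cap_issubset(model_cap_t a, model_cap_t set)
{
	return (a & ~set) == 0;
}

/* Mirrors the condition added to proc_sys_call_handler() for writes. */
static bool model_write_allowed(model_cap_t write_caps, model_cap_t permitted)
{
	if (!model_cap_isclear(write_caps) &&
	    !model_cap_issubset(write_caps, permitted))
		return false;
	return true;
}
```

With this shape an entry such as mmap_min_addr could list CAP_SYS_RAWIO while
others list CAP_SYS_ADMIN, and an entry with an empty mask stays unrestricted.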


* Re: [PATCH] [RFC] Make it easier to harden /proc/
  2011-03-16 19:55 ` Kees Cook
@ 2011-03-16 20:08   ` Richard Weinberger
  2011-03-16 20:45     ` Arnd Bergmann
  2011-03-16 21:19     ` Alexey Dobriyan
  0 siblings, 2 replies; 19+ messages in thread
From: Richard Weinberger @ 2011-03-16 20:08 UTC (permalink / raw)
  To: Kees Cook
  Cc: linux-kernel, akpm, serge, eparis, jmorris, eugeneteo, drosenberg

Kees,

Am Mittwoch 16 März 2011, 20:55:49 schrieb Kees Cook:
> Hi Richard,
> 
> On Wed, Mar 16, 2011 at 08:31:47PM +0100, Richard Weinberger wrote:
> > When containers like LXC are used, an unprivileged and jailed
> > root user can still write to critical files in /proc/.
> > For example: /proc/sys/kernel/{sysrq, panic, panic_on_oops, ... }
> > 
> > This new restricted attribute makes it possible to protect such
> > files. When restricted is set to true, root needs CAP_SYS_ADMIN
> > to write into the file.
> 
> I was thinking about this too. I'd prefer more fine-grained control
> in this area, since some sysctl entries aren't strictly controlled by
> CAP_SYS_ADMIN (e.g. mmap_min_addr is already checking CAP_SYS_RAWIO).
> 
> How about this instead?

Good idea.
Maybe we should also consider a per-directory restriction.
Every file in /proc/sys/{kernel/, vm/, fs/, dev/} needs protection.
It would be much easier to set the protection on the parent directory
instead of protecting file by file...
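
One way to picture the per-directory idea is a lookup that walks from an entry
up through its parents and treats the entry as restricted if any ancestor is.
The sketch below is a userspace illustration only: struct model_dir_table and
model_is_restricted() are hypothetical, and nothing in this thread says the
kernel would implement it this way.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical table entry with a link to its parent directory. */
struct model_dir_table {
	const char *procname;
	bool restricted;
	const struct model_dir_table *parent;	/* NULL at the top level */
};

/* An entry is restricted if it, or any directory above it, is restricted. */
static bool model_is_restricted(const struct model_dir_table *table)
{
	assert(table != NULL);

	for (; table != NULL; table = table->parent)
		if (table->restricted)
			return true;
	return false;
}
```

Marking only the kernel/ directory would then cover every file below it.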




* Re: [PATCH] [RFC] Make it easier to harden /proc/
  2011-03-16 20:08   ` Richard Weinberger
@ 2011-03-16 20:45     ` Arnd Bergmann
  2011-03-16 20:52       ` Richard Weinberger
  2011-03-16 21:19     ` Alexey Dobriyan
  1 sibling, 1 reply; 19+ messages in thread
From: Arnd Bergmann @ 2011-03-16 20:45 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: Kees Cook, linux-kernel, akpm, serge, eparis, jmorris, eugeneteo,
	drosenberg, Eric W. Biederman

On Wednesday 16 March 2011 21:08:16 Richard Weinberger wrote:
> Am Mittwoch 16 März 2011, 20:55:49 schrieb Kees Cook:
> > On Wed, Mar 16, 2011 at 08:31:47PM +0100, Richard Weinberger wrote:
> > > When containers like LXC are used, an unprivileged and jailed
> > > root user can still write to critical files in /proc/.
> > > For example: /proc/sys/kernel/{sysrq, panic, panic_on_oops, ... }
> > > 
> > > This new restricted attribute makes it possible to protect such
> > > files. When restricted is set to true, root needs CAP_SYS_ADMIN
> > > to write into the file.
> > 
> > I was thinking about this too. I'd prefer more fine-grained control
> > in this area, since some sysctl entries aren't strictly controlled by
> > CAP_SYS_ADMIN (e.g. mmap_min_addr is already checking CAP_SYS_RAWIO).
> > 
> > How about this instead?
> 
> Good idea.
> Maybe we should also consider a per-directory restriction.
> Every file in /proc/sys/{kernel/, vm/, fs/, dev/} needs protection.
> It would be much easier to set the protection on the parent directory
> instead of protecting file by file...

How does this interact with the per-namespace sysctls that Eric
Biederman added a few years ago?

I had expected that any dangerous sysctl would not be visible in
an unprivileged container anyway.

	Arnd



* Re: [PATCH] [RFC] Make it easier to harden /proc/
  2011-03-16 20:45     ` Arnd Bergmann
@ 2011-03-16 20:52       ` Richard Weinberger
  2011-03-16 21:03         ` Arnd Bergmann
                           ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Richard Weinberger @ 2011-03-16 20:52 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Kees Cook, linux-kernel, akpm, serge, eparis, jmorris, eugeneteo,
	drosenberg, Eric W. Biederman

Am Mittwoch 16 März 2011, 21:45:45 schrieb Arnd Bergmann:
> On Wednesday 16 March 2011 21:08:16 Richard Weinberger wrote:
> > Am Mittwoch 16 März 2011, 20:55:49 schrieb Kees Cook:
> > > On Wed, Mar 16, 2011 at 08:31:47PM +0100, Richard Weinberger wrote:
> > > > When containers like LXC are used, an unprivileged and jailed
> > > > root user can still write to critical files in /proc/.
> > > > For example: /proc/sys/kernel/{sysrq, panic, panic_on_oops, ... }
> > > > 
> > > > This new restricted attribute makes it possible to protect such
> > > > files. When restricted is set to true, root needs CAP_SYS_ADMIN
> > > > to write into the file.
> > > 
> > > I was thinking about this too. I'd prefer more fine-grained control
> > > in this area, since some sysctl entries aren't strictly controlled by
> > > CAP_SYS_ADMIN (e.g. mmap_min_addr is already checking CAP_SYS_RAWIO).
> > > 
> > > How about this instead?
> > 
> > Good idea.
> > Maybe we should also consider a per-directory restriction.
> > Every file in /proc/sys/{kernel/, vm/, fs/, dev/} needs protection.
> > It would be much easier to set the protection on the parent directory
> > instead of protecting file by file...
> 
> How does this interact with the per-namespace sysctls that Eric
> Biederman added a few years ago?

Do you mean CONFIG_{UTS, IPC, USER, NET}_NS?

> I had expected that any dangerous sysctl would not be visible in
> an unprivileged container anyway.

No way.
That's why it's currently a very good idea to mount /proc/ read-only into a container.

> 	Arnd
> 



* Re: [PATCH] [RFC] Make it easier to harden /proc/
  2011-03-16 20:52       ` Richard Weinberger
@ 2011-03-16 21:03         ` Arnd Bergmann
  2011-03-16 21:04         ` Alexey Dobriyan
  2011-03-16 21:17         ` Eric W. Biederman
  2 siblings, 0 replies; 19+ messages in thread
From: Arnd Bergmann @ 2011-03-16 21:03 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: Kees Cook, linux-kernel, akpm, serge, eparis, jmorris, eugeneteo,
	drosenberg, Eric W. Biederman

On Wednesday 16 March 2011 21:52:49 Richard Weinberger wrote:
> Am Mittwoch 16 März 2011, 21:45:45 schrieb Arnd Bergmann:
> > How does this interact with the per-namespace sysctls that Eric
> > Biederman added a few years ago?
> 
> Do you mean CONFIG_{UTS, IPC, USER, NET}_NS?

I mean specifically e51b6ba07 "sysctl: Infrastructure for per namespace sysctls"
and related patches. I've looked a bit closer there and it seems that
this is only used for network namespaces at the moment.

> > I had expected that any dangerous sysctl would not be visible in
> > an unprivileged container anyway.
> 
> No way.
> That's why it's currently a very good idea to mount /proc/ read-only into a container.

Ok, I see. 

	Arnd


* Re: [PATCH] [RFC] Make it easier to harden /proc/
  2011-03-16 20:52       ` Richard Weinberger
  2011-03-16 21:03         ` Arnd Bergmann
@ 2011-03-16 21:04         ` Alexey Dobriyan
  2011-03-16 21:07           ` Richard Weinberger
  2011-03-16 21:17         ` Eric W. Biederman
  2 siblings, 1 reply; 19+ messages in thread
From: Alexey Dobriyan @ 2011-03-16 21:04 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: Arnd Bergmann, Kees Cook, linux-kernel, akpm, serge, eparis,
	jmorris, eugeneteo, drosenberg, Eric W. Biederman

On Wed, Mar 16, 2011 at 09:52:49PM +0100, Richard Weinberger wrote:
> Am Mittwoch 16 März 2011, 21:45:45 schrieb Arnd Bergmann:
> > On Wednesday 16 March 2011 21:08:16 Richard Weinberger wrote:
> > > Am Mittwoch 16 März 2011, 20:55:49 schrieb Kees Cook:
> > > > On Wed, Mar 16, 2011 at 08:31:47PM +0100, Richard Weinberger wrote:
> > > > > When containers like LXC are used, an unprivileged and jailed
> > > > > root user can still write to critical files in /proc/.
> > > > > For example: /proc/sys/kernel/{sysrq, panic, panic_on_oops, ... }
> > > > > 
> > > > > This new restricted attribute makes it possible to protect such
> > > > > files. When restricted is set to true, root needs CAP_SYS_ADMIN
> > > > > to write into the file.
> > > > 
> > > > I was thinking about this too. I'd prefer more fine-grained control
> > > > in this area, since some sysctl entries aren't strictly controlled by
> > > > CAP_SYS_ADMIN (e.g. mmap_min_addr is already checking CAP_SYS_RAWIO).
> > > > 
> > > > How about this instead?
> > > 
> > > Good idea.
> > > Maybe we should also consider a per-directory restriction.
> > > Every file in /proc/sys/{kernel/, vm/, fs/, dev/} needs protection.
> > > It would be much easier to set the protection on the parent directory
> > > instead of protecting file by file...
> > 
> > How does this interact with the per-namespace sysctls that Eric
> > Biederman added a few years ago?
> 
> Do you mean CONFIG_{UTS, IPC, USER, NET}_NS?

It only covers /proc/sys/net/

> > I had expected that any dangerous sysctl would not be visible in
> > an unprivileged container anyway.
> 
> No way.

No way what exactly?

> That's why it's currently a very good idea to mount /proc/ read-only into a container.


* Re: [PATCH] [RFC] Make it easier to harden /proc/
  2011-03-16 21:04         ` Alexey Dobriyan
@ 2011-03-16 21:07           ` Richard Weinberger
  2011-03-16 21:15             ` Alexey Dobriyan
  0 siblings, 1 reply; 19+ messages in thread
From: Richard Weinberger @ 2011-03-16 21:07 UTC (permalink / raw)
  To: Alexey Dobriyan
  Cc: Arnd Bergmann, Kees Cook, linux-kernel, akpm, serge, eparis,
	jmorris, eugeneteo, drosenberg, Eric W. Biederman

Am Mittwoch 16 März 2011, 22:04:52 schrieb Alexey Dobriyan:
> On Wed, Mar 16, 2011 at 09:52:49PM +0100, Richard Weinberger wrote:
> > Am Mittwoch 16 März 2011, 21:45:45 schrieb Arnd Bergmann:
> > > On Wednesday 16 March 2011 21:08:16 Richard Weinberger wrote:
> > > > Am Mittwoch 16 März 2011, 20:55:49 schrieb Kees Cook:
> > > > > On Wed, Mar 16, 2011 at 08:31:47PM +0100, Richard Weinberger wrote:
> > > > > > When containers like LXC are used, an unprivileged and jailed
> > > > > > root user can still write to critical files in /proc/.
> > > > > > For example: /proc/sys/kernel/{sysrq, panic, panic_on_oops, ... }
> > > > > > 
> > > > > > This new restricted attribute makes it possible to protect such
> > > > > > files. When restricted is set to true, root needs CAP_SYS_ADMIN
> > > > > > to write into the file.
> > > > > 
> > > > > I was thinking about this too. I'd prefer more fine-grained control
> > > > > in this area, since some sysctl entries aren't strictly controlled
> > > > > by CAP_SYS_ADMIN (e.g. mmap_min_addr is already checking
> > > > > CAP_SYS_RAWIO).
> > > > > 
> > > > > How about this instead?
> > > > 
> > > > Good idea.
> > > > Maybe we should also consider a per-directory restriction.
> > > > Every file in /proc/sys/{kernel/, vm/, fs/, dev/} needs protection.
> > > > It would be much easier to set the protection on the parent directory
> > > > instead of protecting file by file...
> > > 
> > > How does this interact with the per-namespace sysctls that Eric
> > > Biederman added a few years ago?
> > 
> > Do you mean CONFIG_{UTS, IPC, USER, NET}_NS?
> 
> It only covers /proc/sys/net/

Exactly.

> > > I had expected that any dangerous sysctl would not be visible in
> > > an unprivileged container anyway.
> > 
> > No way.
> 
> No way what exactly?

Dangerous sysctls are not protected at all.
E.g. a jailed root can use /proc/sysrq-trigger.

> 
> > That's why it's currently a very good idea to mount /proc/ read-only into
> > a container.



* Re: [PATCH] [RFC] Make it easier to harden /proc/
  2011-03-16 21:07           ` Richard Weinberger
@ 2011-03-16 21:15             ` Alexey Dobriyan
  2011-03-17 10:14               ` Miquel van Smoorenburg
  0 siblings, 1 reply; 19+ messages in thread
From: Alexey Dobriyan @ 2011-03-16 21:15 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: Arnd Bergmann, Kees Cook, linux-kernel, akpm, serge, eparis,
	jmorris, eugeneteo, drosenberg, Eric W. Biederman

On Wed, Mar 16, 2011 at 10:07:48PM +0100, Richard Weinberger wrote:
> Am Mittwoch 16 März 2011, 22:04:52 schrieb Alexey Dobriyan:
> > On Wed, Mar 16, 2011 at 09:52:49PM +0100, Richard Weinberger wrote:
> > > Am Mittwoch 16 März 2011, 21:45:45 schrieb Arnd Bergmann:
> > > > On Wednesday 16 March 2011 21:08:16 Richard Weinberger wrote:
> > > > > Am Mittwoch 16 März 2011, 20:55:49 schrieb Kees Cook:
> > > > > > On Wed, Mar 16, 2011 at 08:31:47PM +0100, Richard Weinberger wrote:
> > > > > > > When containers like LXC are used, an unprivileged and jailed
> > > > > > > root user can still write to critical files in /proc/.
> > > > > > > For example: /proc/sys/kernel/{sysrq, panic, panic_on_oops, ... }
> > > > > > > 
> > > > > > > This new restricted attribute makes it possible to protect such
> > > > > > > files. When restricted is set to true, root needs CAP_SYS_ADMIN
> > > > > > > to write into the file.
> > > > > > 
> > > > > > I was thinking about this too. I'd prefer more fine-grained control
> > > > > > in this area, since some sysctl entries aren't strictly controlled
> > > > > > by CAP_SYS_ADMIN (e.g. mmap_min_addr is already checking
> > > > > > CAP_SYS_RAWIO).
> > > > > > 
> > > > > > How about this instead?
> > > > > 
> > > > > Good idea.
> > > > > Maybe we should also consider a per-directory restriction.
> > > > > Every file in /proc/sys/{kernel/, vm/, fs/, dev/} needs protection.
> > > > > It would be much easier to set the protection on the parent directory
> > > > > instead of protecting file by file...
> > > > 
> > > > How does this interact with the per-namespace sysctls that Eric
> > > > Biederman added a few years ago?
> > > 
> > > Do you mean CONFIG_{UTS, IPC, USER, NET}_NS?
> > 
> > It only covers /proc/sys/net/
> 
> Exactly.
> 
> > > > I had expected that any dangerous sysctl would not be visible in
> > > > an unprivileged container anyway.
> > > 
> > > No way.
> > 
> > No way what exactly?
> 
> Dangerous sysctls are not protected at all.
> E.g. a jailed root can use /proc/sysrq-trigger.

Yes, and it's suggested that you do not show it at all,
instead of bloating ctl_table.

But this requires knowledge of which /proc is root and which one is "root".
:-(

With current splitup into FOO_NS...


* Re: [PATCH] [RFC] Make it easier to harden /proc/
  2011-03-16 20:52       ` Richard Weinberger
  2011-03-16 21:03         ` Arnd Bergmann
  2011-03-16 21:04         ` Alexey Dobriyan
@ 2011-03-16 21:17         ` Eric W. Biederman
  2011-03-16 21:23           ` Richard Weinberger
  2011-03-17  6:41           ` Kees Cook
  2 siblings, 2 replies; 19+ messages in thread
From: Eric W. Biederman @ 2011-03-16 21:17 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: Arnd Bergmann, Kees Cook, linux-kernel, akpm, serge, eparis,
	jmorris, eugeneteo, drosenberg

Richard Weinberger <richard@nod.at> writes:

> Am Mittwoch 16 März 2011, 21:45:45 schrieb Arnd Bergmann:
>> On Wednesday 16 March 2011 21:08:16 Richard Weinberger wrote:
>> > Am Mittwoch 16 März 2011, 20:55:49 schrieb Kees Cook:
>> > > On Wed, Mar 16, 2011 at 08:31:47PM +0100, Richard Weinberger wrote:
>> > > > When containers like LXC are used, an unprivileged and jailed
>> > > > root user can still write to critical files in /proc/.
>> > > > For example: /proc/sys/kernel/{sysrq, panic, panic_on_oops, ... }
>> > > > 
>> > > > This new restricted attribute makes it possible to protect such
>> > > > files. When restricted is set to true, root needs CAP_SYS_ADMIN
>> > > > to write into the file.
>> > > 
>> > > I was thinking about this too. I'd prefer more fine-grained control
>> > > in this area, since some sysctl entries aren't strictly controlled by
>> > > CAP_SYS_ADMIN (e.g. mmap_min_addr is already checking CAP_SYS_RAWIO).
>> > > 
>> > > How about this instead?
>> > 
>> > Good idea.
>> > Maybe we should also consider a per-directory restriction.
>> > Every file in /proc/sys/{kernel/, vm/, fs/, dev/} needs protection.
>> > It would be much easier to set the protection on the parent directory
>> > instead of protecting file by file...
>> 
>> How does this interact with the per-namespace sysctls that Eric
>> Biederman added a few years ago?
>
> Do you mean CONFIG_{UTS, IPC, USER, NET}_NS?
>
>> I had expected that any dangerous sysctl would not be visible in
>> an unprivileged container anyway.
>
> No way.
> That's why it's currently a very good idea to mount /proc/ read-only
> into a container.

However it is in the architecture.  The problem is that the user
namespace is not finished.  Once finished even root with all caps in a
container will have no more permissions than the unprivileged user that
created the user namespace.

Essentially the change is to make permission checks become a comparison
of the tuple (user_ns, uid) instead of just comparisons by uid.  If we
want to fix permission problems with proc and containers please let's
focus on completing the user namespace.

Eric
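
The difference between comparing by uid alone and comparing the tuple
(user_ns, uid) can be shown with a small userspace model. The types and
function names below are hypothetical and exist only for illustration; this is
not how the kernel represents credentials.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for a user namespace and a credential. */
struct model_user_ns {
	int id;
};

struct model_cred {
	const struct model_user_ns *user_ns;
	unsigned int uid;
};

/* Old-style check: only the numeric uid is compared. */
static bool model_uid_only_match(const struct model_cred *subject,
				 const struct model_cred *owner)
{
	assert(subject != NULL && owner != NULL);
	return subject->uid == owner->uid;
}

/* Tuple check: both the namespace and the uid have to match. */
static bool model_same_owner(const struct model_cred *subject,
			     const struct model_cred *owner)
{
	assert(subject != NULL && owner != NULL);
	return subject->user_ns == owner->user_ns &&
	       subject->uid == owner->uid;
}
```

In this model uid 0 inside a container matches host root under the uid-only
check but not under the tuple check.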


* Re: [PATCH] [RFC] Make it easier to harden /proc/
  2011-03-16 20:08   ` Richard Weinberger
  2011-03-16 20:45     ` Arnd Bergmann
@ 2011-03-16 21:19     ` Alexey Dobriyan
  2011-03-17 16:51       ` Eric W. Biederman
  1 sibling, 1 reply; 19+ messages in thread
From: Alexey Dobriyan @ 2011-03-16 21:19 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: Kees Cook, linux-kernel, akpm, serge, eparis, jmorris, eugeneteo,
	drosenberg

On Wed, Mar 16, 2011 at 09:08:16PM +0100, Richard Weinberger wrote:
> Kees,
> 
> Am Mittwoch 16 März 2011, 20:55:49 schrieb Kees Cook:
> > Hi Richard,
> > 
> > On Wed, Mar 16, 2011 at 08:31:47PM +0100, Richard Weinberger wrote:
> > > When containers like LXC are used, an unprivileged and jailed
> > > root user can still write to critical files in /proc/.
> > > For example: /proc/sys/kernel/{sysrq, panic, panic_on_oops, ... }
> > > 
> > > This new restricted attribute makes it possible to protect such
> > > files. When restricted is set to true, root needs CAP_SYS_ADMIN
> > > to write into the file.
> > 
> > I was thinking about this too. I'd prefer more fine-grained control
> > in this area, since some sysctl entries aren't strictly controlled by
> > CAP_SYS_ADMIN (e.g. mmap_min_addr is already checking CAP_SYS_RAWIO).
> > 
> > How about this instead?
> 
> Good idea.
> Maybe we should also consider a per-directory restriction.
> Every file in /proc/sys/{kernel/, vm/, fs/, dev/} needs protection.
> It would be much easier to set the protection on the parent directory
> instead of protecting file by file...

Of course not.

You should _enable_ them one by one, not the other way around.

	"default deny"
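
A "default deny" policy can be sketched as an allowlist: a container may write
only to entries that were explicitly enabled, and everything else is refused.
The entries and the function name below are hypothetical examples chosen for
illustration, not a list proposed in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical allowlist: paths relative to /proc/sys/, enabled one by one. */
static const char *const model_container_writable[] = {
	"kernel/hostname",
	"kernel/domainname",
};

/* Default deny: anything not listed above is refused. */
static bool model_container_may_write(const char *path)
{
	size_t i;
	size_t n = sizeof(model_container_writable) /
		   sizeof(model_container_writable[0]);

	assert(path != NULL);

	for (i = 0; i < n; i++)
		if (strcmp(path, model_container_writable[i]) == 0)
			return true;
	return false;
}
```

A sysctl added later is denied automatically until someone lists it, which is
the opposite of having to remember to restrict each new file.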


* Re: [PATCH] [RFC] Make it easier to harden /proc/
  2011-03-16 21:17         ` Eric W. Biederman
@ 2011-03-16 21:23           ` Richard Weinberger
  2011-03-16 21:27             ` Eric W. Biederman
  2011-03-17  6:41           ` Kees Cook
  1 sibling, 1 reply; 19+ messages in thread
From: Richard Weinberger @ 2011-03-16 21:23 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Arnd Bergmann, Kees Cook, linux-kernel, akpm, serge, eparis,
	jmorris, eugeneteo, drosenberg

Am Mittwoch 16 März 2011, 22:17:39 schrieb Eric W. Biederman:
> Richard Weinberger <richard@nod.at> writes:
> 
> > Am Mittwoch 16 März 2011, 21:45:45 schrieb Arnd Bergmann:
> >> On Wednesday 16 March 2011 21:08:16 Richard Weinberger wrote:
> >> > Am Mittwoch 16 März 2011, 20:55:49 schrieb Kees Cook:
> >> > > On Wed, Mar 16, 2011 at 08:31:47PM +0100, Richard Weinberger wrote:
> >> > > > When containers like LXC are used, an unprivileged and jailed
> >> > > > root user can still write to critical files in /proc/.
> >> > > > For example: /proc/sys/kernel/{sysrq, panic, panic_on_oops, ... }
> >> > > > 
> >> > > > This new restricted attribute makes it possible to protect such
> >> > > > files. When restricted is set to true, root needs CAP_SYS_ADMIN
> >> > > > to write into the file.
> >> > > 
> >> > > I was thinking about this too. I'd prefer more fine-grained control
> >> > > in this area, since some sysctl entries aren't strictly controlled
> >> > > by CAP_SYS_ADMIN (e.g. mmap_min_addr is already checking
> >> > > CAP_SYS_RAWIO).
> >> > > 
> >> > > How about this instead?
> >> > 
> >> > Good idea.
> >> > Maybe we should also consider a per-directory restriction.
> >> > Every file in /proc/sys/{kernel/, vm/, fs/, dev/} needs protection.
> >> > It would be much easier to set the protection on the parent directory
> >> > instead of protecting file by file...
> >> 
> >> How does this interact with the per-namespace sysctls that Eric
> >> Biederman added a few years ago?
> > 
> > Do you mean CONFIG_{UTS, IPC, USER, NET}_NS?
> > 
> >> I had expected that any dangerous sysctl would not be visible in
> >> an unprivileged container anyway.
> > 
> > No way.
> > That's why it's currently a very good idea to mount /proc/ read-only
> > into a container.
> 
> However it is in the architecture.  The problem is that the user
> namespace is not finished.  Once finished even root with all caps in a
> container will have no more permissions than the unprivileged user that
> created the user namespace.
> 
> Essentially the change is to make permission checks become a comparison
> of the tuple (user_ns, uid) instead of just comparisons by uid.  If we
> want to fix permission problems with proc and containers please let's
> focus on completing the user namespace.

Ok. What's the current status, where can I help?

> Eric



* Re: [PATCH] [RFC] Make it easier to harden /proc/
  2011-03-16 21:23           ` Richard Weinberger
@ 2011-03-16 21:27             ` Eric W. Biederman
  0 siblings, 0 replies; 19+ messages in thread
From: Eric W. Biederman @ 2011-03-16 21:27 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: Arnd Bergmann, Kees Cook, linux-kernel, akpm, serge, eparis,
	jmorris, eugeneteo, drosenberg

Richard Weinberger <richard@nod.at> writes:

> Am Mittwoch 16 März 2011, 22:17:39 schrieb Eric W. Biederman:
>> Richard Weinberger <richard@nod.at> writes:
>> 
>> > Am Mittwoch 16 März 2011, 21:45:45 schrieb Arnd Bergmann:
>> >> On Wednesday 16 March 2011 21:08:16 Richard Weinberger wrote:
>> >> > Am Mittwoch 16 März 2011, 20:55:49 schrieb Kees Cook:
>> >> > > On Wed, Mar 16, 2011 at 08:31:47PM +0100, Richard Weinberger wrote:
>> >> > > > When containers like LXC are used, an unprivileged and jailed
>> >> > > > root user can still write to critical files in /proc/.
>> >> > > > For example: /proc/sys/kernel/{sysrq, panic, panic_on_oops, ... }
>> >> > > > 
>> >> > > > This new restricted attribute makes it possible to protect such
>> >> > > > files. When restricted is set to true, root needs CAP_SYS_ADMIN
>> >> > > > to write into the file.
>> >> > > 
>> >> > > I was thinking about this too. I'd prefer more fine-grained control
>> >> > > in this area, since some sysctl entries aren't strictly controlled
>> >> > > by CAP_SYS_ADMIN (e.g. mmap_min_addr is already checking
>> >> > > CAP_SYS_RAWIO).
>> >> > > 
>> >> > > How about this instead?
>> >> > 
>> >> > Good idea.
>> >> > Maybe we should also consider a per-directory restriction.
>> >> > Every file in /proc/sys/{kernel/, vm/, fs/, dev/} needs protection.
>> >> > It would be much easier to set the protection on the parent directory
>> >> > instead of protecting file by file...
>> >> 
>> >> How does this interact with the per-namespace sysctls that Eric
>> >> Biederman added a few years ago?
>> > 
>> > Do you mean CONFIG_{UTS, IPC, USER, NET}_NS?
>> > 
>> >> I had expected that any dangerous sysctl would not be visible in
>> >> an unprivileged container anyway.
>> > 
>> > No way.
>> > That's why it's currently a very good idea to mount /proc/ read-only
>> > into a container.
>> 
>> However it is in the architecture.  The problem is that the user
>> namespace is not finished.  Once finished even root with all caps in a
>> container will have no more permissions than the unprivileged user that
>> created the user namespace.
>> 
>> Essentially the change is to make permission checks become a comparison
>> of the tuple (user_ns, uid) instead of just comparisons by uid.  If we
>> want to fix permission problems with proc and containers please let's
>> focus on completing the user namespace.
>
> Ok. What's the current status, where can I help?

Serge has been getting some of the pieces together and merging them to
Andrew.  I think he has the basic infrastructure in place.  Certainly he
has the infrastructure in place for per user namespace capabilities.

What should be left is the mechanics of making certain every permission
check in the kernel takes user namespaces properly into account.
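
To illustrate the tuple comparison, here is a minimal userspace sketch
(not kernel code). The struct and function names are made up for this
example and are assumptions, not existing kernel interfaces. It only
shows why uid 0 inside a container stops matching uid 0 on the host once
the namespace is part of the comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a user namespace; only its identity matters. */
struct user_ns_sketch {
	const char *name;
};

/* Hypothetical credential: an identity is the pair (namespace, uid). */
struct cred_sketch {
	const struct user_ns_sketch *ns;
	unsigned int uid;
};

/* Old-style check: compares uids only, so uid 0 is root everywhere. */
static bool same_owner_uid_only(const struct cred_sketch *task,
				const struct cred_sketch *owner)
{
	assert(task != NULL && owner != NULL);
	return task->uid == owner->uid;
}

/* Namespace-aware check: both the namespace and the uid must match. */
static bool same_owner_ns_uid(const struct cred_sketch *task,
			      const struct cred_sketch *owner)
{
	assert(task != NULL && owner != NULL);
	return task->ns == owner->ns && task->uid == owner->uid;
}
```

With a host root (init namespace, uid 0) and a jailed root (child
namespace, uid 0), the uid-only check treats them as the same owner,
while the (user_ns, uid) check does not.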

Eric


* Re: [PATCH] [RFC] Make it easier to harden /proc/
  2011-03-16 21:17         ` Eric W. Biederman
  2011-03-16 21:23           ` Richard Weinberger
@ 2011-03-17  6:41           ` Kees Cook
  2011-03-17  7:30             ` Richard Weinberger
  1 sibling, 1 reply; 19+ messages in thread
From: Kees Cook @ 2011-03-17  6:41 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Richard Weinberger, Arnd Bergmann, linux-kernel, akpm, serge,
	eparis, jmorris, eugeneteo, drosenberg

On Wed, Mar 16, 2011 at 02:17:39PM -0700, Eric W. Biederman wrote:
> Richard Weinberger <richard@nod.at> writes:
> > On Wednesday 16 March 2011, 21:45:45, Arnd Bergmann wrote:
> >> On Wednesday 16 March 2011 21:08:16 Richard Weinberger wrote:
> >> > On Wednesday 16 March 2011, 20:55:49, Kees Cook wrote:
> >> > > On Wed, Mar 16, 2011 at 08:31:47PM +0100, Richard Weinberger wrote:
> >> > > > When containers like LXC are used, an unprivileged and jailed
> >> > > > root user can still write to critical files in /proc/.
> >> > > > E.g: /proc/sys/kernel/{sysrq, panic, panic_on_oops, ... }
> >> > > > 
> >> > > > This new restricted attribute makes it possible to protect such
> >> > > > files. When restricted is set to true root needs CAP_SYS_ADMIN
> >> > > > to write into the file.
> >> > > 
> >> > > I was thinking about this too. I'd prefer more fine-grained control
> >> > > in this area, since some sysctl entries aren't strictly controlled by
> >> > > CAP_SYS_ADMIN (e.g. mmap_min_addr is already checking CAP_SYS_RAWIO).
> >> > > 
> >> > > How about this instead?
> >> > 
> >> > Good Idea.
> >> > Maybe we should also consider a per-directory restriction.
> >> > Every file in /proc/sys/{kernel/, vm/, fs/, dev/} needs a protection.
> >> > It would be much easier to set the protection on the parent directory
> >> > instead of protecting file by file...
> >> 
> >> How does this interact with the per-namespace sysctls that Eric
> >> Biederman added a few years ago?
> >
> > Do you mean CONFIG_{UTS, IPC, USER, NET,}_NS?
> >
> >> I had expected that any dangerous sysctl would not be visible in
> >> an unprivileged container anyway.
> >
> > No way.
> > That's why it's currently a very good idea to mount /proc/ read-only
> > into a container.
> 
> However it is in the architecture.  The problem is that the user
> namespace is not finished.  Once finished even root with all caps in a
> container will have no more permissions than the unprivileged user that
> created the user namespace.
> 
> Essentially the change is to make permissions checks become a comparison
> of the tuple (user_ns, uid) instead of just comparisons by uid.  If we
> want to fix permission problems with proc and containers please let's
> focus on completing the user namespace.

I actually think these are not mutually exclusive. Right now /proc/sys is
filled with ways to gain caps as a reduced-privilege uid 0 user. I don't
think containers are the only place where we want to be limiting /proc/sys.
(For example, core_pattern and modprobe entries can both be written by
root, regardless of cap, which can be directed to run arbitrary commands
with full caps. And yes, that's also being fixed separately, it's just an
example.)

I'd still like to see the sysctl table expanded to include caps to test.
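
As a rough userspace sketch of what a table with a per-entry cap could
look like (this is not the kernel's struct ctl_table; the type, field and
function names are invented for illustration, and the caps assigned to
core_pattern and modprobe are assumptions, only the mmap_min_addr pairing
comes from the discussion above):

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical capability bit positions, not the kernel's CAP_* values. */
enum sketch_cap {
	SKETCH_CAP_SYS_RAWIO = 0,
	SKETCH_CAP_SYS_ADMIN = 1,
};

/* Hypothetical table entry: each name carries the cap needed to write it. */
struct sketch_ctl_entry {
	const char *name;
	enum sketch_cap write_cap;
};

static const struct sketch_ctl_entry sketch_table[] = {
	{ "mmap_min_addr", SKETCH_CAP_SYS_RAWIO },
	{ "core_pattern",  SKETCH_CAP_SYS_ADMIN },	/* assumed cap */
	{ "modprobe",      SKETCH_CAP_SYS_ADMIN },	/* assumed cap */
};

/*
 * Return 0 if a writer holding the capability bitmask `caps` may write the
 * named entry, -EPERM if the required cap is missing, and -ENOENT if the
 * entry is not in the table.
 */
static int sketch_may_write(const char *name, unsigned int caps)
{
	size_t i;

	assert(name != NULL);
	for (i = 0; i < sizeof(sketch_table) / sizeof(sketch_table[0]); i++) {
		const struct sketch_ctl_entry *entry = &sketch_table[i];

		if (strcmp(entry->name, name) != 0)
			continue;
		if ((caps >> entry->write_cap) & 1u)
			return 0;
		return -EPERM;
	}
	return -ENOENT;
}
```

A writer that only holds the RAWIO bit could then change mmap_min_addr
but not core_pattern, independent of its uid.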

-Kees

-- 
Kees Cook
Ubuntu Security Team


* Re: [PATCH] [RFC] Make it easier to harden /proc/
  2011-03-17  6:41           ` Kees Cook
@ 2011-03-17  7:30             ` Richard Weinberger
  0 siblings, 0 replies; 19+ messages in thread
From: Richard Weinberger @ 2011-03-17  7:30 UTC (permalink / raw)
  To: Kees Cook
  Cc: Eric W. Biederman, Arnd Bergmann, linux-kernel, akpm, serge,
	eparis, jmorris, eugeneteo, drosenberg

On Wed, 16 Mar 2011 23:41:36 -0700, Kees Cook <kees.cook@canonical.com>
wrote:
> On Wed, Mar 16, 2011 at 02:17:39PM -0700, Eric W. Biederman wrote:
>> Richard Weinberger <richard@nod.at> writes:
>> > On Wednesday 16 March 2011, 21:45:45, Arnd Bergmann wrote:
>> >> On Wednesday 16 March 2011 21:08:16 Richard Weinberger wrote:
>> >> > On Wednesday 16 March 2011, 20:55:49, Kees Cook wrote:
>> >> > > On Wed, Mar 16, 2011 at 08:31:47PM +0100, Richard Weinberger wrote:
>> >> > > > When containers like LXC are used, an unprivileged and jailed
>> >> > > > root user can still write to critical files in /proc/.
>> >> > > > E.g: /proc/sys/kernel/{sysrq, panic, panic_on_oops, ... }
>> >> > > >
>> >> > > > This new restricted attribute makes it possible to protect such
>> >> > > > files. When restricted is set to true root needs CAP_SYS_ADMIN
>> >> > > > to write into the file.
>> >> > >
>> >> > > I was thinking about this too. I'd prefer more fine-grained control
>> >> > > in this area, since some sysctl entries aren't strictly controlled by
>> >> > > CAP_SYS_ADMIN (e.g. mmap_min_addr is already checking CAP_SYS_RAWIO).
>> >> > >
>> >> > > How about this instead?
>> >> >
>> >> > Good Idea.
>> >> > Maybe we should also consider a per-directory restriction.
>> >> > Every file in /proc/sys/{kernel/, vm/, fs/, dev/} needs a protection.
>> >> > It would be much easier to set the protection on the parent directory
>> >> > instead of protecting file by file...
>> >>
>> >> How does this interact with the per-namespace sysctls that Eric
>> >> Biederman added a few years ago?
>> >
>> > Do you mean CONFIG_{UTS, IPC, USER, NET,}_NS?
>> >
>> >> I had expected that any dangerous sysctl would not be visible in
>> >> an unprivileged container anyway.
>> >
>> > No way.
>> > That's why it's currently a very good idea to mount /proc/ read-only
>> > into a container.
>>
>> However it is in the architecture.  The problem is that the user
>> namespace is not finished.  Once finished even root with all caps in a
>> container will have no more permissions than the unprivileged user that
>> created the user namespace.
>>
>> Essentially the change is to make permissions checks become a comparison
>> of the tuple (user_ns, uid) instead of just comparisons by uid.  If we
>> want to fix permission problems with proc and containers please let's
>> focus on completing the user namespace.
> 
> I actually think these are not mutually exclusive. Right now /proc/sys is
> filled with ways to gain caps as a reduced-privilege uid 0 user. I don't
> think containers are the only place where we want to be limiting /proc/sys.
> (For example, core_pattern and modprobe entries can both be written by
> root, regardless of cap, which can be directed to run arbitrary commands
> with full caps. And yes, that's also being fixed separately, it's just an
> example.)
> 
> I'd still like to see the sysctl table expanded to include caps to test.

I agree with you.
Every writable file in /proc/ should have a check for at least one cap.

 
> -Kees



* Re: [PATCH] [RFC] Make it easier to harden /proc/
  2011-03-16 21:15             ` Alexey Dobriyan
@ 2011-03-17 10:14               ` Miquel van Smoorenburg
  2011-03-17 10:57                 ` Richard Weinberger
  0 siblings, 1 reply; 19+ messages in thread
From: Miquel van Smoorenburg @ 2011-03-17 10:14 UTC (permalink / raw)
  To: Alexey Dobriyan
  Cc: Richard Weinberger, Arnd Bergmann, Kees Cook, linux-kernel, akpm,
	serge, eparis, jmorris, eugeneteo, drosenberg, Eric W. Biederman,
	Miquel van Smoorenburg



On 16-03-11 10:15 PM, Alexey Dobriyan wrote:
> On Wed, Mar 16, 2011 at 10:07:48PM +0100, Richard Weinberger wrote:
>> On Wednesday 16 March 2011, 22:04:52, Alexey Dobriyan wrote:
>>> On Wed, Mar 16, 2011 at 09:52:49PM +0100, Richard Weinberger wrote:
>>>> On Wednesday 16 March 2011, 21:45:45, Arnd Bergmann wrote:
>>>>> On Wednesday 16 March 2011 21:08:16 Richard Weinberger wrote:
>>>>>> On Wednesday 16 March 2011, 20:55:49, Kees Cook wrote:
>>>>> I had expected that any dangerous sysctl would not be visible in
>>>>> an unprivileged container anyway.
>>>>
>>>> No way.
>>>
>>> No way what exactly?
>>
>> Dangerous sysctls are not protected at all.
>> E.g. A jailed root can use /proc/sysrq-trigger.
>
> Yes, and it's suggested that you do not show it at all,
> instead of bloating ctl_table.
>
> But this requires knowledge which /proc is root and which one is "root".
> :-(
>
> With current splitup into FOO_NS...

And what about sysfs? There's a lot of writable stuff there too. For
example in /sys/module/*/parameters, /sys/block/*/device/queu ,
/sys/kernel/, /sys/platform/ etc. Perhaps there are also things you
don't want to be readable, such as some uevent files.

Shouldn't that be made inaccessible as well, preferably not visible?

Programs in containers may need sysfs for stuff like 
/sys/class/net/<device> , so just not mounting sysfs may not be an option.

Mike.


* Re: [PATCH] [RFC] Make it easier to harden /proc/
  2011-03-17 10:14               ` Miquel van Smoorenburg
@ 2011-03-17 10:57                 ` Richard Weinberger
  0 siblings, 0 replies; 19+ messages in thread
From: Richard Weinberger @ 2011-03-17 10:57 UTC (permalink / raw)
  To: Miquel van Smoorenburg
  Cc: Alexey Dobriyan, Arnd Bergmann, Kees Cook, linux-kernel, akpm,
	serge, eparis, jmorris, eugeneteo, drosenberg, Eric W. Biederman

On Thursday 17 March 2011, 11:14:26, Miquel van Smoorenburg wrote:
> On 16-03-11 10:15 PM, Alexey Dobriyan wrote:
> > On Wed, Mar 16, 2011 at 10:07:48PM +0100, Richard Weinberger wrote:
> >> On Wednesday 16 March 2011, 22:04:52, Alexey Dobriyan wrote:
> >>> On Wed, Mar 16, 2011 at 09:52:49PM +0100, Richard Weinberger wrote:
> >>>> On Wednesday 16 March 2011, 21:45:45, Arnd Bergmann wrote:
> >>>>> On Wednesday 16 March 2011 21:08:16 Richard Weinberger wrote:
> >>>>>> On Wednesday 16 March 2011, 20:55:49, Kees Cook wrote:
> >>>>> I had expected that any dangerous sysctl would not be visible in
> >>>>> an unprivileged container anyway.
> >>>> 
> >>>> No way.
> >>> 
> >>> No way what exactly?
> >> 
> >> Dangerous sysctls are not protected at all.
> >> E.g. A jailed root can use /proc/sysrq-trigger.
> > 
> > Yes, and it's suggested that you do not show it at all,
> > instead of bloating ctl_table.
> > 
> > But this requires knowledge which /proc is root and which one is "root".
> > 
> > :-(
> > 
> > With current splitup into FOO_NS...
> 
> And what about sysfs? There's a lot of writable stuff there too. For
> example in /sys/module/*/parameters, /sys/block/*/device/queu ,
> /sys/kernel/, /sys/platform/ etc. Perhaps there are also things you
> don't want to be readable, such as some uevent files.
> 
> Shouldn't that be made inaccessible as well, preferably not visible?

Sure.
It's the next big thing on my TODO list. :)

> Programs in containers may need sysfs for stuff like
> /sys/class/net/<device> , so just not mounting sysfs may not be an option.

In most cases mounting /sys read-only is sufficient.
Also, in most of my setups /sys is not needed at all.

> Mike.



* Re: [PATCH] [RFC] Make it easier to harden /proc/
  2011-03-16 21:19     ` Alexey Dobriyan
@ 2011-03-17 16:51       ` Eric W. Biederman
  2011-03-19 10:43         ` Richard Weinberger
  0 siblings, 1 reply; 19+ messages in thread
From: Eric W. Biederman @ 2011-03-17 16:51 UTC (permalink / raw)
  To: Alexey Dobriyan
  Cc: Richard Weinberger, Kees Cook, linux-kernel, akpm, serge, eparis,
	jmorris, eugeneteo, drosenberg

Alexey Dobriyan <adobriyan@gmail.com> writes:

> On Wed, Mar 16, 2011 at 09:08:16PM +0100, Richard Weinberger wrote:
>> Kees,
>> 
>> On Wednesday 16 March 2011, 20:55:49, Kees Cook wrote:
>> > Hi Richard,
>> > 
>> > On Wed, Mar 16, 2011 at 08:31:47PM +0100, Richard Weinberger wrote:
>> > > When containers like LXC are used, an unprivileged and jailed
>> > > root user can still write to critical files in /proc/.
>> > > E.g: /proc/sys/kernel/{sysrq, panic, panic_on_oops, ... }
>> > > 
>> > > This new restricted attribute makes it possible to protect such
>> > > files. When restricted is set to true root needs CAP_SYS_ADMIN
>> > > to write into the file.
>> > 
>> > I was thinking about this too. I'd prefer more fine-grained control
>> > in this area, since some sysctl entries aren't strictly controlled by
>> > CAP_SYS_ADMIN (e.g. mmap_min_addr is already checking CAP_SYS_RAWIO).
>> > 
>> > How about this instead?
>> 
>> Good Idea.
>> Maybe we should also consider a per-directory restriction.
>> Every file in /proc/sys/{kernel/, vm/, fs/, dev/} needs a protection.
>> It would be much easier to set the protection on the parent directory
>> instead of protecting file by file...
>
> Of course, not.
>
> You should _enable_ them one by one, not the other way around.
>
> 	"default deny"

Right.

Since the primary problem here is containers we can use the
user_namespace to add the default deny policy.

Something like the trivial patch below should make /proc/sys safe,
and the technique applies in general.

Richard, is that a good enough example to get you started?

Eric

diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 0f1bd83..a172a9d 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1674,10 +1674,12 @@ void register_sysctl_root(struct ctl_table_root *root)
 
 static int test_perm(int mode, int op)
 {
-	if (!current_euid())
-		mode >>= 6;
-	else if (in_egroup_p(0))
-		mode >>= 3;
+	if (current_user_ns() == &init_user_ns) {
+		if (!current_euid())
+			mode >>= 6;
+		else if (in_egroup_p(0))
+			mode >>= 3;
+	}
 	if ((op & ~mode & (MAY_READ|MAY_WRITE|MAY_EXEC)) == 0)
 		return 0;
 	return -EACCES;
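
For readers who want to try the effect of this change outside the kernel,
the following is a userspace sketch of the patched test_perm(). It is an
approximation under stated assumptions: the function name and the
in_init_ns, euid and in_group0 parameters are invented here to stand in
for current_user_ns() == &init_user_ns, current_euid() and in_egroup_p(0),
and the SK_MAY_* constants mirror the kernel's MAY_EXEC/MAY_WRITE/MAY_READ
bit values.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define SK_MAY_EXEC	1
#define SK_MAY_WRITE	2
#define SK_MAY_READ	4

/*
 * Select the owner, group or "other" permission bits of `mode` and check
 * them against the requested operation `op`. Outside the initial user
 * namespace no shift happens, so only the "other" bits are consulted.
 */
static int sketch_test_perm(int mode, int op, bool in_init_ns,
			    unsigned int euid, bool in_group0)
{
	assert((mode & ~0777) == 0);

	if (in_init_ns) {
		if (euid == 0)
			mode >>= 6;	/* owner bits */
		else if (in_group0)
			mode >>= 3;	/* group bits */
	}
	if ((op & ~mode & (SK_MAY_READ | SK_MAY_WRITE | SK_MAY_EXEC)) == 0)
		return 0;
	return -EACCES;
}
```

For a 0644 entry, uid 0 in the initial namespace is checked against the
owner bits (rw-) and may write, while uid 0 in any other user namespace
falls through to the "other" bits (r--) and gets -EACCES on write.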




* Re: [PATCH] [RFC] Make it easier to harden /proc/
  2011-03-17 16:51       ` Eric W. Biederman
@ 2011-03-19 10:43         ` Richard Weinberger
  0 siblings, 0 replies; 19+ messages in thread
From: Richard Weinberger @ 2011-03-19 10:43 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Alexey Dobriyan, Kees Cook, linux-kernel, akpm, serge, eparis,
	jmorris, eugeneteo, drosenberg

On Thursday 17 March 2011, 17:51:41, Eric W. Biederman wrote:
> Alexey Dobriyan <adobriyan@gmail.com> writes:
> > On Wed, Mar 16, 2011 at 09:08:16PM +0100, Richard Weinberger wrote:
> >> Kees,
> >> 
> >> On Wednesday 16 March 2011, 20:55:49, Kees Cook wrote:
> >> > Hi Richard,
> >> > 
> >> > On Wed, Mar 16, 2011 at 08:31:47PM +0100, Richard Weinberger wrote:
> >> > > When containers like LXC are used, an unprivileged and jailed
> >> > > root user can still write to critical files in /proc/.
> >> > > E.g: /proc/sys/kernel/{sysrq, panic, panic_on_oops, ... }
> >> > > 
> >> > > This new restricted attribute makes it possible to protect such
> >> > > files. When restricted is set to true root needs CAP_SYS_ADMIN
> >> > > to write into the file.
> >> > 
> >> > I was thinking about this too. I'd prefer more fine-grained control
> >> > in this area, since some sysctl entries aren't strictly controlled by
> >> > CAP_SYS_ADMIN (e.g. mmap_min_addr is already checking CAP_SYS_RAWIO).
> >> > 
> >> > How about this instead?
> >> 
> >> Good Idea.
> >> Maybe we should also consider a per-directory restriction.
> >> Every file in /proc/sys/{kernel/, vm/, fs/, dev/} needs a protection.
> >> It would be much easier to set the protection on the parent directory
> >> instead of protecting file by file...
> > 
> > Of course, not.
> > 
> > You should _enable_ them one by one, not the other way around.
> > 
> > 	"default deny"
> 
> Right.
> 
> Since the primary problem here is containers we can use the
> user_namespace to add the default deny policy.
> 
> Something like the trivial patch below should make /proc/sys safe,
> and the technique applies in general.
> 
> Richard, is that a good enough example to get you started?

Yes. Thanks.

> Eric
> 
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index 0f1bd83..a172a9d 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -1674,10 +1674,12 @@ void register_sysctl_root(struct ctl_table_root *root)
> 
>  static int test_perm(int mode, int op)
>  {
> -	if (!current_euid())
> -		mode >>= 6;
> -	else if (in_egroup_p(0))
> -		mode >>= 3;
> +	if (current_user_ns() == &init_user_ns) {
> +		if (!current_euid())
> +			mode >>= 6;
> +		else if (in_egroup_p(0))
> +			mode >>= 3;
> +	}
>  	if ((op & ~mode & (MAY_READ|MAY_WRITE|MAY_EXEC)) == 0)
>  		return 0;
>  	return -EACCES;



end of thread, other threads:[~2011-03-19 10:43 UTC | newest]

Thread overview: 19+ messages
2011-03-16 19:31 [PATCH] [RFC] Make it easier to harden /proc/ Richard Weinberger
2011-03-16 19:55 ` Kees Cook
2011-03-16 20:08   ` Richard Weinberger
2011-03-16 20:45     ` Arnd Bergmann
2011-03-16 20:52       ` Richard Weinberger
2011-03-16 21:03         ` Arnd Bergmann
2011-03-16 21:04         ` Alexey Dobriyan
2011-03-16 21:07           ` Richard Weinberger
2011-03-16 21:15             ` Alexey Dobriyan
2011-03-17 10:14               ` Miquel van Smoorenburg
2011-03-17 10:57                 ` Richard Weinberger
2011-03-16 21:17         ` Eric W. Biederman
2011-03-16 21:23           ` Richard Weinberger
2011-03-16 21:27             ` Eric W. Biederman
2011-03-17  6:41           ` Kees Cook
2011-03-17  7:30             ` Richard Weinberger
2011-03-16 21:19     ` Alexey Dobriyan
2011-03-17 16:51       ` Eric W. Biederman
2011-03-19 10:43         ` Richard Weinberger
