Re: [PATCH 01/15] add Documentation/namespaces/user_namespace.txt (v3)

From: "Serge E. Hallyn" <serge.hallyn@canonical.com>
To: Vasiliy Kulikov <segoon@openwall.com>
Cc: Serge Hallyn <serge@hallyn.com>,
	akpm@osdl.org, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org, containers@lists.linux-foundation.org,
	dhowells@redhat.com, ebiederm@xmission.com, rdunlap@xenotime.net,
	kernel-hardening@lists.openwall.com
Subject: Re: [PATCH 01/15] add Documentation/namespaces/user_namespace.txt (v3)
Date: Sat, 1 Oct 2011 12:00:47 -0500
Message-ID: <20111001170047.GA2935@sergelap>
In-Reply-To: <20110927155659.GA22532@albatros>

Quoting Vasiliy Kulikov (segoon@openwall.com):
> On Tue, Sep 27, 2011 at 08:21 -0500, Serge E. Hallyn wrote:
> > > First, the patches by design expose much kernel code to unprivileged
> > > userspace processes.  This code doesn't expect malformed data (e.g. VFS,
> > > specific filesystems, block layer, char drivers, sysadmin part of LSMs,
> > > etc. etc.).  By relaxing permission rules you greatly increase attack
> > > surface of the kernel from unprivileged users.  Are you (or somebody
> > > else) planning to audit this code?
> > 
> > I had wanted to (but didn't) propose a discussion at ksummit about how
> > best to approach the filesystem code.  That's not even just for user
> > namespaces - patches have been floated in the past to make mount an
> > unprivileged operation depending on the FS and the user's permission
> > over the device and target.
> 
> This is a dangerous operation by itself.

Of course it is :)  And it's been a while since it has been brought up,
but it *was* quite well thought through and thoroughly discussed - see
e.g. https://lkml.org/lkml/2008/1/8/131

Oh, that's right.  In the end the reason it didn't go in had to do with
the ability for an unprivileged user to prevent a privileged user from
unmounting trees by leaving a busy mount in a hidden namespace.

Eric, in the past we didn't know what to do about that, but I wonder
if setns could be used in some clever way to solve it from userspace.

> AFAICS, this is the reason why
> e.g. FUSE doesn't pass user mount points to other users and even root.
> Beginning from violating some rules like existence of single "." and
> ".." in each directory and ending with filename charsets with /, \000
> and things like `, ", ', \ inside.
> 
> 
> >  So I don't know if a combination of auditing
> > and fuzzing is the way to go,
> 
> Maybe the combination of both.  There are no generic recommendations,
> it's always limited to the subsystem, checked property, and the
> auditor.

Ok, let me keep focusing on the tightening down right now, and then
before proceeding with relaxing, I'll start some analysis and discussion
of the code which is already under targeted (ns_capable) capability checks.
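
To make the capable() vs ns_capable() distinction concrete, here is a
rough userspace sketch.  It is a toy model, not code from the patchset:
the toy_* structs and helpers are hypothetical names, and the rules are
simplified to (a) capabilities held in a namespace also apply to its
descendants, (b) the creator of a namespace has full capabilities over
it, and (c) nothing held in a child namespace applies to an ancestor.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: hypothetical types, not the kernel's structures. */
#define TOY_CAP_SYS_ADMIN 21

struct toy_user_ns {
	struct toy_user_ns *parent;	/* NULL for the initial namespace */
	int owner_uid;			/* uid in the parent ns that created this ns */
};

struct toy_task {
	struct toy_user_ns *user_ns;	/* namespace the task lives in */
	int uid;
	unsigned long cap_effective;	/* capability bitmask, valid in user_ns */
};

static struct toy_user_ns toy_init_ns = { NULL, 0 };

/* Targeted check: does the task hold 'cap' relative to 'target'? */
static bool toy_ns_capable(const struct toy_task *t,
			   const struct toy_user_ns *target, int cap)
{
	const struct toy_user_ns *ns;

	/* Walk from the target up towards the initial namespace. */
	for (ns = target; ns; ns = ns->parent) {
		/* Reached the task's own ns: its capability set decides. */
		if (ns == t->user_ns)
			return (t->cap_effective >> cap) & 1UL;
		/* The creator of a namespace is fully privileged over it. */
		if (ns->parent == t->user_ns && ns->owner_uid == t->uid)
			return true;
	}
	/* Target is not the task's ns or a descendant of it: deny. */
	return false;
}

/* Untargeted check: always evaluated against the initial namespace. */
static bool toy_capable(const struct toy_task *t, int cap)
{
	return toy_ns_capable(t, &toy_init_ns, cap);
}
```

In this model a "root" task inside a child namespace passes
toy_ns_capable() against its own namespace but fails toy_capable(),
which is why a check that is left as capable(X) grants nothing to a
non-init user namespace.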

> > > Also, will it be possible to somehow restrict what specific kernel
> > > facilities are accessible from users (IOW, what root emulation
> > > limitations are in action)?  It is useful from both points of sysadmin,
> > > who might not want to allow users to do such things, and from the
> > > security POV in sense of attack surface reduction.
> > 
> > You're probably thinking along different lines, but this is why I've
> > been wanting seccomp2 to get pushed through.  So that we can deny a
> > container the syscalls we know it won't need, especially newer ones,
> > to reduce the attack surface available to it.
> 
> This dependency greatly complicates the things.

IMO this is not a dependency for user namespaces though - it's only a
dependency for unprivileged user namespaces.  And we haven't seriously
discussed doing that yet precisely because we're nowhere near ready
(and frankly I don't know that it'll ever be sane).

> First, there is a big misunderstanding between Will and Ingo in what
> needs seccompv2 should serve.  Will wants to reduce kernel attack

I know I know :)

> surface by limiting syscalls and syscall arguments available to a user
> (a single task, btw).  Ingo wants to see a full featured filtering
> engine, which needs code changes all over the kernel.  Given the needed
> changes amounts, it will unlikely reduce attack surface.
> 
> You probably don't want Will's version as syscalls filtering is a very

It seems to me per-syscall filtering is a great start.  I'm not looking
to seccomp2 as an assurance against formerly privileged (and now only
privileged per-namespace) code which may have had previously overlooked
bugs.  I'm looking to seccomp2 as an assurance against bugs in newly
written syscalls or the compatibility layer.
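
To illustrate why a per-syscall filter helps against newly added
syscalls in particular, here is a rough sketch of a default-deny
allowlist.  This is a userspace toy, not the seccomp2 interface: the
toy_* names and the 512-entry table size are assumptions made up for
the example.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: hypothetical names, not the seccomp API. */
#define TOY_MAX_SYSCALLS	512
#define TOY_BITS_PER_WORD	(sizeof(unsigned long) * CHAR_BIT)
#define TOY_WORDS \
	((TOY_MAX_SYSCALLS + TOY_BITS_PER_WORD - 1) / TOY_BITS_PER_WORD)

struct toy_syscall_filter {
	unsigned long allowed[TOY_WORDS];	/* one bit per syscall number */
};

/* Mark syscall number 'nr' as permitted; out-of-range numbers are ignored. */
static void toy_filter_allow(struct toy_syscall_filter *f, unsigned int nr)
{
	if (nr < TOY_MAX_SYSCALLS)
		f->allowed[nr / TOY_BITS_PER_WORD] |=
			1UL << (nr % TOY_BITS_PER_WORD);
}

/* Default deny: anything never explicitly allowed is refused. */
static bool toy_filter_permits(const struct toy_syscall_filter *f,
			       unsigned int nr)
{
	if (nr >= TOY_MAX_SYSCALLS)
		return false;
	return (f->allowed[nr / TOY_BITS_PER_WORD] >>
		(nr % TOY_BITS_PER_WORD)) & 1UL;
}
```

Because the table starts out empty, a syscall number that did not
exist when the policy was written is refused without anyone having to
update the policy.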

> bad abstraction in your case.  user_namespaces likely need Ingo's
> version of seccomp as it will be possible to filter e.g. fs-specific
> events, but even if it is implemented, it will take a looong time for
> your needs IMHO.

Yes, I think that would just lead to exploits through bad policy.

> Also, I'm afraid for _good_ user_namespace filtering the policy
> definition will be too complicated (like SELinux policy definition for
> non-trivial applications) if it is implemented in events filtering
> terms.
> 
> 
> > The way we're approaching it right now is that by default everything
> > stays 'capable(X)', so that a non-init user namespace doesn't get the
> > privileges.
> 
> Great.  I was not sure about it.
> 
> 
> >  While some of my patchsets this summer didn't follow this,
> > Eric reminded me that we should first clamp down on the user namespaces
> > as much as possible, and relax permissions in child namespaces later.
> 
> I think it is the only sane way.

Yup.  I trust you and Eric will keep me in check if I get over-zealous :)

-serge
