Re: Network isolation with RLIMIT

netdev.vger.kernel.org archive mirror

* Re: Network isolation with RLIMIT_NETWORK, cont'd.
       [not found] <1260674379-4262-1-git-send-email-michael@laptop.org>
@ 2009-12-13  3:44 ` Michael Stone
  2009-12-13  5:09   ` setrlimit(RLIMIT_NETWORK) vs. prctl(???) Michael Stone
                     ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Michael Stone @ 2009-12-13  3:44 UTC (permalink / raw)
  To: linux-kernel, netdev, linux-security-module
  Cc: Andi Kleen, David Lang, Oliver Hartkopp, Alan Cox, Herbert Xu,
	Valdis Kletnieks, Bryan Donlan, Rémi Denis-Courmont,
	Evgeniy Polyakov, C. Scott Ananian, James Morris,
	Eric W. Biederman, Bernie Innocenti, Mark Seaborn

Gentlefolks,

You were all meant to be included on the CC-list for the letter and patches
which I just sent to lkml:

   http://lkml.org/lkml/2009/12/12/149

Apologies for the typo in my previous mail headers.

Regards,

Michael

* setrlimit(RLIMIT_NETWORK) vs. prctl(???)
  2009-12-13  3:44 ` Network isolation with RLIMIT_NETWORK, cont'd Michael Stone
@ 2009-12-13  5:09   ` Michael Stone
  2009-12-13  5:20     ` Ulrich Drepper
  2009-12-13  8:32   ` Network isolation with RLIMIT_NETWORK, cont'd Rémi Denis-Courmont
  2009-12-13 10:05   ` Eric W. Biederman
  2 siblings, 1 reply; 14+ messages in thread
From: Michael Stone @ 2009-12-13  5:09 UTC (permalink / raw)
  To: linux-kernel, netdev, linux-security-module
  Cc: Andi Kleen, David Lang, Oliver Hartkopp, Alan Cox, Herbert Xu,
	Valdis Kletnieks, Bryan Donlan, Rémi Denis-Courmont,
	Evgeniy Polyakov, C. Scott Ananian, James Morris,
	Eric W. Biederman, Bernie Innocenti, Mark Seaborn, Michael Stone

Folks,

A colleague just asked me an excellent question about my approach which I'd
like to share with you. Paraphrasing, he wrote:

> rlimits seem very heavy for a simple inherited boolean flag. Also, creating
> a new one will require modifying a lot of delicate userland software.
> Wouldn't some new prctl() flags be a better choice?

Here's my response:

> You're absolutely right that choosing to expose this functionality as an
> rlimit (as opposed to as a new syscall or as a flag to an old syscall like
> prctl()) is a decision with complex consequences.
> 
> I picked rlimits for this patch (after trying the "new syscall" approach
> privately) because doing so provides exactly the interface, semantics, and
> userland integration that I want:
>
> interface: "unprivileged", "temporarily drop", "permanently drop", "get
> current state", "persist current state across exec()", and some room for
> future expansion of semantics by defining new state values between 0 and
> RLIMIT_INFINITY.
> 
> integration: lots of sandboxing code already contains logic to drop rlimits
> when starting up an isolated process. Furthermore, I think it would be really
> great to be able to limit networking from the shell via ulimit and on a
> per-user basis via /etc/security/limits.conf.
> 
> That being said, I'm not wedded to the decision. Could you give me some more
> specific examples of the kinds of changes in low-level userspace code that
> you're worried about?
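
As a rough illustration of the rlimit semantics listed above, the sketch below uses RLIMIT_NOFILE as a stand-in, since RLIMIT_NETWORK exists only in the proposed patchset. It lowers the soft and hard limits in a child between fork() and exec() and reads them back from the exec'd program, which shows that limits persist across exec(). The helper name `limits_seen_by_child` and the values 32 and 64 are assumptions made for this example.

```python
import resource
import subprocess
import sys

# Program run after exec(): print the soft and hard RLIMIT_NOFILE it inherited.
REPORT = "import resource; print(*resource.getrlimit(resource.RLIMIT_NOFILE))"


def limits_seen_by_child(new_soft, new_hard):
    """Lower RLIMIT_NOFILE between fork() and exec(); return what the child sees."""

    def lower_limits():
        # Runs in the child only. Lowering the soft limit is the "temporary
        # drop" (it can be raised back up to the hard limit); lowering the
        # hard limit is the "permanent drop" for an unprivileged process.
        resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, new_hard))

    done = subprocess.run(
        [sys.executable, "-c", REPORT],
        preexec_fn=lower_limits,
        capture_output=True,
        text=True,
        check=True,
    )
    soft, hard = (int(field) for field in done.stdout.split())
    return soft, hard


if __name__ == "__main__":
    # "Get current state" in the parent, then show what the child inherits.
    print("parent:", resource.getrlimit(resource.RLIMIT_NOFILE))
    print("child: ", limits_seen_by_child(32, 64))
```

The parent's own limits are untouched, which is the same shape a sandbox launcher would use: drop the limit in the child, then exec the isolated program.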

Regards,

Michael

* Re: setrlimit(RLIMIT_NETWORK) vs. prctl(???)
  2009-12-13  5:09   ` setrlimit(RLIMIT_NETWORK) vs. prctl(???) Michael Stone
@ 2009-12-13  5:20     ` Ulrich Drepper
  2009-12-15  5:33       ` Michael Stone
  0 siblings, 1 reply; 14+ messages in thread
From: Ulrich Drepper @ 2009-12-13  5:20 UTC (permalink / raw)
  To: Michael Stone
  Cc: linux-kernel, netdev, linux-security-module, Andi Kleen,
	David Lang, Oliver Hartkopp, Alan Cox, Herbert Xu,
	Valdis Kletnieks, Bryan Donlan, Rémi Denis-Courmont,
	Evgeniy Polyakov, C. Scott Ananian, James Morris,
	Eric W. Biederman, Bernie Innocenti, Mark Seaborn

On Sat, Dec 12, 2009 at 21:09, Michael Stone <michael@laptop.org> wrote:
>> That being said, I'm not wedded to the decision. Could you give me some more
>> specific examples of the kinds of changes in low-level userspace code that
>> you're worried about?

It was an accident that I sent the email privately.

As summarized in the paraphrased comment, it's a pain to deal with
rlimit extensions.  It's easy enough to do all this using prctl() with
the same semantics and without forcing any other code to be modified.
I'll let others more competent judge the usefulness.  But using rlimit
as the interface is just plain wrong.

* Re: setrlimit(RLIMIT_NETWORK) vs. prctl(???)
  2009-12-13  5:20     ` Ulrich Drepper
@ 2009-12-15  5:33       ` Michael Stone
  0 siblings, 0 replies; 14+ messages in thread
From: Michael Stone @ 2009-12-15  5:33 UTC (permalink / raw)
  To: Ulrich Drepper; +Cc: linux-kernel, netdev, linux-security-module

Ulrich Drepper wrote:
> On Sat, Dec 12, 2009 at 21:09, Michael Stone <michael@laptop.org> wrote:
>> That being said, I'm not wedded to the decision. Could you give me some
>> more specific examples of the kinds of changes in low-level userspace code
>> that you're worried about?
> 
> As summarized in the paraphrased comment, it's a pain to deal with
> rlimit extensions.  It's easy enough to do all this using prctl() with
> the same semantics and without forcing any other code to be modified.
> I'll let others more competent judge the usefulness.  But using rlimit
> as the interface is just plain wrong.

I still like the rlimit-based interface because I think it gives good intuition
about how to use the facility and about how it ought to be exposed to high-level
parts of userland. That said, it certainly can't hurt to cook up a version based
on prctl() so that we can make a fair comparison of the two.

I'll see what I can come up with.

Regards,

Michael

* Re: Network isolation with RLIMIT_NETWORK, cont'd.
  2009-12-13  3:44 ` Network isolation with RLIMIT_NETWORK, cont'd Michael Stone
  2009-12-13  5:09   ` setrlimit(RLIMIT_NETWORK) vs. prctl(???) Michael Stone
@ 2009-12-13  8:32   ` Rémi Denis-Courmont
  2009-12-13 13:44     ` Michael Stone
  2009-12-13 10:05   ` Eric W. Biederman
  2 siblings, 1 reply; 14+ messages in thread
From: Rémi Denis-Courmont @ 2009-12-13  8:32 UTC (permalink / raw)
  To: Michael Stone
  Cc: linux-kernel, netdev, linux-security-module, Andi Kleen,
	David Lang, Oliver Hartkopp, Alan Cox, Herbert Xu,
	Valdis Kletnieks, Bryan Donlan, Evgeniy Polyakov,
	C. Scott Ananian, James Morris, Eric W. Biederman,
	Bernie Innocenti, Mark Seaborn

	Hello,

On Sunday 13 December 2009 05:44:18, Michael Stone wrote:
> You were all meant to be included on the CC-list for the letter and patches
> which I just sent to lkml:
> 
>    http://lkml.org/lkml/2009/12/12/149

You explicitly mention the need to connect to the X server over local sockets.  
But won't that allow the sandboxed application to send synthetic events to any 
other X11 applications? Hence unless the whole X server has restricted network 
access, this seems a bit broken? D-Bus, which also uses local sockets, will 
exhibit similar issues, as will any unrestricted IPC mechanism in fact.

I am not sure if restricting network access but not other file descriptors 
makes that much sense... ? Then again, I'm not entirely clear what you are 
trying to solve.

If I had to sandbox something, I'd drop the process file limit to 0. That will 
effectively cut off network, file system, and POSIX IPCs. Unfortunately, the 
process can still use SysV IPC, ptrace(), and send signals to others. So those 
are the gaps I would first try to contain.

-- 
Rémi Denis-Courmont
http://www.remlab.net/
http://fi.linkedin.com/in/remidenis

* Re: Network isolation with RLIMIT_NETWORK, cont'd.
  2009-12-13  8:32   ` Network isolation with RLIMIT_NETWORK, cont'd Rémi Denis-Courmont
@ 2009-12-13 13:44     ` Michael Stone
  0 siblings, 0 replies; 14+ messages in thread
From: Michael Stone @ 2009-12-13 13:44 UTC (permalink / raw)
  To: Rémi Denis-Courmont
  Cc: Michael Stone, linux-kernel, netdev, linux-security-module,
	Andi Kleen, David Lang, Oliver Hartkopp, Alan Cox, Herbert Xu,
	Valdis Kletnieks, Bryan Donlan, Evgeniy Polyakov,
	C. Scott Ananian, James Morris, Eric W. Biederman,
	Bernie Innocenti, Mark Seaborn

Rémi,

> You explicitly mention the need to connect to the X server over local sockets.
> But won't that allow the sandboxed application to send synthetic events to any
> other X11 applications? 

X11 cookie authentication and socket ownership+permissions effectively control
access to the X server by local processes. Thus, as an isolation author, I may
easily grant my isolated process any of:

   a) full access to the main X server 
   b) some access to a nested X server (like a Xephyr) which I'm using to do
      some event filtering
   c) no access to any X server by withholding the cookies or by changing the
      permissions on the X socket to be more restrictive

with existing techniques.
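
To make the "socket ownership+permissions" point concrete, here is a minimal sketch of a filesystem AF_UNIX socket restricted with ordinary file permissions. The socket name `service.sock` and the helper `make_private_socket` are hypothetical; on Linux, connecting to a pathname socket requires write permission on the socket file, so mode 0600 limits clients to the owning user (and root).

```python
import os
import socket
import stat
import tempfile


def make_private_socket(directory):
    """Bind a listening AF_UNIX socket under `directory`, owner-only access."""
    path = os.path.join(directory, "service.sock")  # hypothetical name
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(path)      # creates the socket file on the filesystem
    os.chmod(path, 0o600)  # ordinary Unix DAC now governs who may connect
    server.listen(1)
    return server, path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as directory:
        server, path = make_private_socket(directory)
        mode = os.stat(path).st_mode
        print(path, oct(stat.S_IMODE(mode)), "socket:", stat.S_ISSOCK(mode))
        server.close()
```

There is a short window between bind() and chmod() in which the default mode applies; placing the socket in a directory that is itself restricted is one common way to close it.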

> Hence unless the whole X server has restricted network access, this seems a
> bit broken? 

Not broken for the reasons I mentioned above. However, using this rlimit to
disable fresh network access for the whole X server actually sounds like a
rather nice idea; thanks for suggesting it.

> D-Bus, which also uses local sockets, will exhibit similar issues, 

Absolutely. However, D-Bus, like X, already has strong authentication
mechanisms in place that permit me to use pre-existing Unix discretionary 
access control to limit what communication takes place. More specifically, I can 

   a) tell D-Bus to use a file-system socket and change the credentials on that
      socket

   b) use cookies to authenticate incoming connections

   c) explicitly tell D-Bus what users and groups may connect via configuration
      files

   d) explicitly tell D-Bus what users and groups may send and receive which
      messages via configuration files

> as will any unrestricted IPC mechanism in fact. I am not sure if restricting
> network access but not other file descriptors makes that much sense...? Then
> again, I'm not entirely clear what you are trying to solve.

Inadequately access-controlled IPC mechanisms are the specific problem that I
am trying to address. Fortunately, these mechanisms seem to be rare: the only
two that I know of are non-AF_UNIX sockets and ptrace(). All the other IPC
mechanisms that I have seen may be adequately restricted by changing file
permissions and ownership.

> If I had to sandbox something, I'd drop the process file limit to 0. 

That is a technique that is commonly used by many people in this space. It
works well for some limited use cases but, like SECCOMP, is too restrictive for
the kinds of general-purpose applications that I'm sandboxing.

If you're interested,

   http://cr.yp.to/unix/disablenetwork.html

lists several specific problems. To see more, just try dropping RLIMIT_NOFILE
to 0 before launching all your favorite apps. I'd be curious to hear how far
you get.
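
For readers who want to try this on a small scale first, the following sketch temporarily sets the soft RLIMIT_NOFILE to 0 inside one process, attempts a few descriptor-creating calls, and then restores the limit. The function name `try_with_nofile_zero` is an assumption for this example, and the behavior described is that of Linux.

```python
import errno
import os
import resource
import socket


def try_with_nofile_zero():
    """Attempt to create descriptors while the soft RLIMIT_NOFILE is 0."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    attempts = (
        ("open", lambda: os.open("/dev/null", os.O_RDONLY)),
        ("socket", lambda: socket.socket(socket.AF_INET, socket.SOCK_STREAM)),
        ("pipe", os.pipe),
    )
    results = {}
    # Only the soft limit is lowered, so it can be restored afterwards.
    resource.setrlimit(resource.RLIMIT_NOFILE, (0, hard))
    try:
        for name, attempt in attempts:
            try:
                attempt()
                results[name] = "succeeded"
            except OSError as exc:
                results[name] = errno.errorcode.get(exc.errno, str(exc.errno))
    finally:
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))
    return results


if __name__ == "__main__":
    print(try_with_nofile_zero())
```

Descriptors that were already open keep working; only the creation of new ones is refused, which is why this technique cuts off files, pipes, and sockets alike rather than networking alone.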

Regards,

Michael

* Re: Network isolation with RLIMIT_NETWORK, cont'd.
  2009-12-13  3:44 ` Network isolation with RLIMIT_NETWORK, cont'd Michael Stone
  2009-12-13  5:09   ` setrlimit(RLIMIT_NETWORK) vs. prctl(???) Michael Stone
  2009-12-13  8:32   ` Network isolation with RLIMIT_NETWORK, cont'd Rémi Denis-Courmont
@ 2009-12-13 10:05   ` Eric W. Biederman
  2009-12-13 14:21     ` Michael Stone
  2009-12-17 17:52     ` Andi Kleen
  2 siblings, 2 replies; 14+ messages in thread
From: Eric W. Biederman @ 2009-12-13 10:05 UTC (permalink / raw)
  To: Michael Stone
  Cc: linux-kernel, netdev, linux-security-module, Andi Kleen,
	David Lang, Oliver Hartkopp, Alan Cox, Herbert Xu,
	Valdis Kletnieks, Bryan Donlan, Rémi Denis-Courmont,
	Evgeniy Polyakov, C. Scott Ananian, James Morris,
	Bernie Innocenti, Mark Seaborn, Linux Containers

I have added the containers list to the cc as there is some overlap.

Michael Stone <michael@laptop.org> writes:

> Dear lkml,
> 
> A few months ago [1], I asked for feedback on a new network isolation primitive
> named "RLIMIT_NETWORK" designed for use with Unix sandboxing utilities like
> Rainbow, Plash, and friends [2]. Thank you to all those CC'ed for your helpful
> early remarks.
> 
> Here is an updated patchset with responses to the following criticisms:

Overall what you have looks ad hoc and very special-case, which is
likely to impair maintenance in the future.

Furthermore you have not addressed the primary issue that keeps
unshare(CLONE_NEWNET) requiring root privileges.  You can in theory
confuse a suid root application and cause it to take action with its
elevated privileges that violate the security policy.  The network
namespace has more potential to confuse existing applications than
your mechanism, but the problem seems to remain.

>   1. ptrace() 
>      
>      It was pointed out by Alan Cox, Andi Kleen, and others that processes
>      which dropped their RLIMIT_NETWORK rlimit were still able to directly
>      perform networking through a ptrace()'d victim.
> 
>      The new patchset adds an access check to __ptrace_may_access() to prevent
>      this behavior.

Solve that with an unused uid.  That ptrace_may_access check is
completely non-intuitive, and a problem if we ever remove the current
== task security module bug avoidance.

>   2. unshare(CLONE_NEWNET)
> 
>      It was pointed out by James Morris that network namespaces could be used
>      to implement behavior similar to the behavior this patchset is designed to
>      implement. To address this criticism, I added support for network
>      namespaces to my sandboxing utility (Rainbow).
> 
>      Unfortunately, I have discovered that network namespaces in their current
>      form are not appropriate for my use cases because they prevent the
>      namespace'd apps from connecting to the X server, even over plain old
>      AF_UNIX sockets.

We discussed that a while ago, and there is no fundamental reason to
disallow opening unix domain sockets from another network namespace.
The reason this has not been done, is that no one has taken a good
hard look at the packet transmit path and said there are no technical
problems for packets traversing between two network namespaces.

It is probably time to revisit that.

>      The RLIMIT_NETWORK facility I propose contains a specific exception for
>      AF_UNIX filesystem sockets since those sockets are already bound by
>      regular Unix discretionary access control.

What is more significant than unix discretionary access control is the
fact that the set of available af_unix sockets you can bind to is filtered
by the mount namespace.

With respect to the problem of handling suid root applications my long
term plan is to finish the security credentials namespace aka
unshare(NEWUSER).  That means making capabilities namespace-local and
changing all uid-based checks from uid1 == uid2 to (ns1, uid1) ==
(ns2, uid2).  At that point suid root applications will no longer be a
problem, because the problematic root capabilities will not be available
for them to acquire.
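
The change of comparison described here can be illustrated with a toy model. This is only a sketch of the idea, not kernel code: the `Cred` type, the namespace labels, and both function names are invented for the example.

```python
from typing import NamedTuple


class Cred(NamedTuple):
    """Toy credential: a user-namespace label plus a uid (both hypothetical)."""
    user_ns: str
    uid: int


def same_user_legacy(a: Cred, b: Cred) -> bool:
    # Old-style check: uid1 == uid2, ignoring the namespace entirely.
    return a.uid == b.uid


def same_user_namespaced(a: Cred, b: Cred) -> bool:
    # Namespace-aware check: (ns1, uid1) == (ns2, uid2).
    return (a.user_ns, a.uid) == (b.user_ns, b.uid)


if __name__ == "__main__":
    host_root = Cred("init", 0)
    sandbox_root = Cred("sandbox", 0)
    print(same_user_legacy(host_root, sandbox_root))      # uids match
    print(same_user_namespaced(host_root, sandbox_root))  # namespaces differ
```

Under the namespace-aware comparison, uid 0 inside a fresh namespace no longer compares equal to uid 0 outside it, which is the property the plan relies on.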

Eric

* Re: Network isolation with RLIMIT_NETWORK, cont'd.
  2009-12-13 10:05   ` Eric W. Biederman
@ 2009-12-13 14:21     ` Michael Stone
  2009-12-17 17:31       ` Mark Seaborn
  2009-12-17 17:52     ` Andi Kleen
  1 sibling, 1 reply; 14+ messages in thread
From: Michael Stone @ 2009-12-13 14:21 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Michael Stone, linux-kernel, netdev, linux-security-module,
	Andi Kleen, David Lang, Oliver Hartkopp, Alan Cox, Herbert Xu,
	Valdis Kletnieks, Bryan Donlan, Rémi Denis-Courmont,
	Evgeniy Polyakov, C. Scott Ananian, James Morris,
	Bernie Innocenti, Mark Seaborn, Linux Containers

Eric Biederman wrote:

> I have added the containers list to the cc as there is some overlap.

Good idea; thanks.

> Overall what you have looks ad-hoc, and very special case which is
> likely to impair maintenance in the future.

Unfortunately, these are the semantics which are necessary to make further
progress on sandboxing real Linux apps with the discretionary access control
facilities which are available today.

> You can in theory confuse a suid root application and cause it to take action
> with its elevated privileges that violate the security policy.

You're right, in theory. In practice, the setuid-root facility is a rather
special escape hatch which *everyone* in this field knows must be carefully
audited and maintained when building or updating trustworthy systems.

Also, in practice, I'm not expecting perfection today. Nor was I last year, nor
will I be next year. What I am expecting is that the kernel will supply me (perhaps
with my assistance along the way) with the access control facilities that I
need to do my job in userland. This is one of them.

> The network namespace has more potential to confuse existing applications
> than your mechanism, but the problem seems to remain.

I'm glad to hear that you find this mechanism to be comparatively less
confusing.

>>   1. ptrace() 
>>      
>>      It was pointed out by Alan Cox, Andi Kleen, and others that processes
>>      which dropped their RLIMIT_NETWORK rlimit were still able to directly
>>      perform networking through a ptrace()'d victim.
>> 
>>      The new patchset adds an access check to __ptrace_may_access() to prevent
>>      this behavior.
> 
> Solve that with an unused uid.  

I already do, in general. (As do the other people requesting this facility.)

The reason for the __ptrace_may_access() check is that the logical way for
*application authors* whose code is *already* running in a fresh uid to further
improve system security is to separate their network I/O from their parsing
code across a process boundary and to drop networking privileges in the parser.
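
A minimal sketch of that pattern, under stated assumptions: the parent stands in for the process that performs network I/O, and a forked child does the parsing after giving up networking. `drop_network_access` is a hypothetical placeholder (RLIMIT_NETWORK exists only in the proposed patchset, so it is a no-op here), and the "key=value" format is invented for the example.

```python
import json
import os


def drop_network_access():
    """Hypothetical placeholder for the proposed setrlimit(RLIMIT_NETWORK)
    call; a no-op here because that limit exists only in the patchset."""


def parse_in_isolated_child(raw):
    """Parse 'key=value' lines in a child process; return them as a dict."""
    request_r, request_w = os.pipe()
    reply_r, reply_w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: the parser. It only ever touches the two inherited pipes.
        status = 1
        try:
            os.close(request_w)
            os.close(reply_r)
            drop_network_access()
            with os.fdopen(request_r, "rb") as source:
                text = source.read().decode("utf-8")
            fields = dict(
                line.split("=", 1) for line in text.splitlines() if "=" in line
            )
            with os.fdopen(reply_w, "wb") as sink:
                sink.write(json.dumps(fields).encode("utf-8"))
            status = 0
        finally:
            os._exit(status)
    # Parent: would own the network connection; here it just forwards bytes.
    os.close(request_r)
    os.close(reply_w)
    with os.fdopen(request_w, "wb") as sink:
        sink.write(raw)
    with os.fdopen(reply_r, "rb") as source:
        reply = source.read()
    _, wait_status = os.waitpid(pid, 0)
    if wait_status != 0 or not reply:
        raise RuntimeError("parser child failed")
    return json.loads(reply)


if __name__ == "__main__":
    print(parse_in_isolated_child(b"host=example\nport=80\n"))
```

Because parent and child share a uid in this arrangement, the child could otherwise ptrace() the parent and borrow its network access, which is the gap the access check is meant to close.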

>>   2. unshare(CLONE_NEWNET)
>> 
>>      It was pointed out by James Morris that network namespaces could be used
>>      to implement behavior similar to the behavior this patchset is designed to
>>      implement. To address this criticism, I added support for network
>>      namespaces to my sandboxing utility (Rainbow).
>> 
>>      Unfortunately, I have discovered that network namespaces in their current
>>      form are not appropriate for my use cases because they prevent the
>>      namespace'd apps from connecting to the X server, even over plain old
>>      AF_UNIX sockets.
>
> We discussed that a while ago, and there is no fundamental reason to
> disallow opening unix domain sockets from another network namespace.

I disagree. I like that the network namespaces have (fairly) clear semantics.
They are excellent semantics for some of my other use cases, like testing
networked software [1]. They're probably quite nice for full-blown
containerization. They're just not right for the kind of lightweight sandboxing
of complicated legacy apps that I'm doing.

[1]: http://dev.laptop.org/git/users/mstone/dnshash/tree/docs/unit_testing.txt

>> The RLIMIT_NETWORK facility I propose contains a specific exception for
>> AF_UNIX filesystem sockets since those sockets are already bound by
>> regular Unix discretionary access control.
> 
> What is more significant than unix discretionary access control is the
> fact that the set of available af_unix sockets you can bind to is filtered
> by the mount namespace.

Actually, the Unix DAC is far more important for my purposes. The reason is
that it's unprivileged, already understood by literally *everyone* involved in
Unix security, and it has the best tools support of any access control
mechanism.

For comparison, I do use CLONE_NEWNS mount namespaces and they've been a real
pain because

   a) unlike in Plan 9, they're privileged,

   b) they greatly complicate debugging the isolated app because you see
      different things inside and outside the namespace,

   c) there's no good way to manipulate them from userland, and

   d) they're poorly documented outside of the mount man page.

Regards,

Michael

* Re: Network isolation with RLIMIT_NETWORK, cont'd.
  2009-12-13 14:21     ` Michael Stone
@ 2009-12-17 17:31       ` Mark Seaborn
  2009-12-17 18:24         ` Bryan Donlan
  2009-12-17 19:23         ` Bernie Innocenti
  0 siblings, 2 replies; 14+ messages in thread
From: Mark Seaborn @ 2009-12-17 17:31 UTC (permalink / raw)
  To: Michael Stone
  Cc: David Lang, Valdis Kletnieks, Herbert Xu, netdev, linux-kernel,
	C. Scott Ananian, Bernie Innocenti, linux-security-module,
	Andi Kleen, Eric W. Biederman, Oliver Hartkopp, Linux Containers,
	Evgeniy Polyakov, Bryan Donlan, Rémi Denis-Courmont, Alan Cox

On Sun, Dec 13, 2009 at 2:21 PM, Michael Stone <michael@laptop.org> wrote:

> For comparison, I do use CLONE_NEWNS mount namespaces and they've been a real
> pain because
>
>  a) unlike in Plan 9, they're privileged,
>
>  b) they greatly complicate debugging the isolated app because you see
>     different things inside and outside the namespace,
>
>  c) there's no good way to manipulate them from userland, and
>
>  d) they're poorly documented outside of the mount man page.
>

Maybe we could try to fix those problems.

The reason chroot() and clone()/CLONE_NEWNS are privileged is that they
provide a way to violate the assumptions of setuid/setgid executables.  If
we add a per-process flag that prevents a process from exec'ing setuid
executables, we could allow chroot() and CLONE_NEWNS when that flag is set.
That fixes (a).

Maybe we could fix (b) by making mount namespaces into first class objects
that can be named through a file descriptor, so that one process can
manipulate another process's namespace without itself being subject to the
namespace.

Cheers,
Mark

* Re: Network isolation with RLIMIT_NETWORK, cont'd.
  2009-12-17 17:31       ` Mark Seaborn
@ 2009-12-17 18:24         ` Bryan Donlan
  2009-12-17 19:35           ` Bernie Innocenti
  2009-12-17 19:23         ` Bernie Innocenti
  1 sibling, 1 reply; 14+ messages in thread
From: Bryan Donlan @ 2009-12-17 18:24 UTC (permalink / raw)
  To: Mark Seaborn
  Cc: Michael Stone, Eric W. Biederman, linux-kernel, netdev,
	linux-security-module, Andi Kleen, David Lang, Oliver Hartkopp,
	Alan Cox, Herbert Xu, Valdis Kletnieks, Rémi Denis-Courmont,
	Evgeniy Polyakov, C. Scott Ananian, James Morris,
	Bernie Innocenti, Linux Containers

On Thu, Dec 17, 2009 at 12:31 PM, Mark Seaborn <mrs@mythic-beasts.com> wrote:

> Maybe we could fix (b) by making mount namespaces into first class objects
> that can be named through a file descriptor, so that one process can
> manipulate another process's namespace without itself being subject to the
> namespace.

Can this be done using openat() and friends currently? It would seem
the natural way to implement this; open /proc/(pid)/root, then
openat() things from there (or even chdir to it and see the mounts
that it sees from there...)
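
The mechanics of that suggestion look roughly like the sketch below, which uses the caller's own `/proc/self/root` so that it runs without special privileges. The helper names are assumptions for this example; opening another process's `root` entry requires suitable permissions, and whether the result reflects a different mount namespace is the question the thread leaves open.

```python
import os


def open_root_of(pid="self"):
    """Return a directory fd for the root directory as seen via /proc/<pid>/root."""
    return os.open(f"/proc/{pid}/root", os.O_RDONLY | os.O_DIRECTORY)


def list_relative(root_fd, relative_path):
    """List a directory resolved relative to `root_fd`, i.e. via openat().

    `relative_path` must be relative; an absolute path would ignore root_fd.
    """
    fd = os.open(relative_path, os.O_RDONLY | os.O_DIRECTORY, dir_fd=root_fd)
    try:
        return sorted(os.listdir(fd))
    finally:
        os.close(fd)


if __name__ == "__main__":
    root_fd = open_root_of()
    try:
        print(list_relative(root_fd, "."))
    finally:
        os.close(root_fd)
```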

* Re: Network isolation with RLIMIT_NETWORK, cont'd.
  2009-12-17 18:24         ` Bryan Donlan
@ 2009-12-17 19:35           ` Bernie Innocenti
  2009-12-17 19:53             ` Bryan Donlan
  0 siblings, 1 reply; 14+ messages in thread
From: Bernie Innocenti @ 2009-12-17 19:35 UTC (permalink / raw)
  To: Bryan Donlan
  Cc: Mark Seaborn, Michael Stone, Eric W. Biederman, linux-kernel,
	netdev, linux-security-module, Andi Kleen, David Lang,
	Oliver Hartkopp, Alan Cox, Herbert Xu, Valdis Kletnieks,
	Rémi Denis-Courmont, Evgeniy Polyakov, C. Scott Ananian,
	James Morris, Linux Containers

On Thu, 2009-12-17 at 13:24 -0500, Bryan Donlan wrote:
> Can this be done using openat() and friends currently? It would seem
> the natural way to implement this; open /proc/(pid)/root, then
> openat() things from there (or even chdir to it and see the mounts
> that it sees from there...)

Yeah, but /proc/<pid>/root is just a symlink. It's correct for chroots,
but I doubt it can be meaningful for per-process namespaces.

If we were to implement Mark Seaborn's idea of naming
namespaces, /proc/<pid>/rootfd would be a file descriptor providing
access to the namespace through some fancy ioctls.

Or maybe not. Could such a file-descriptor be used as the source
argument to mount(), perhaps along with a new MS_NS flag?

Alternatively, perhaps one could come up with a userspace solution:
read /proc/<pid>/mounts and repeat all mounts, perhaps with a prefix.
The downsides are that it would require superuser privs and wouldn't
automatically stay synchronized with the real namespace.
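
The first half of that userspace approach, reading the mount table, can be sketched as follows. `/proc/<pid>/mounts` uses the fstab-like layout of whitespace-separated fields with octal escapes such as `\040` for a space; the function names are assumptions for this example, and the privileged step of actually repeating each mount is deliberately left out.

```python
import re

# Octal escapes such as \040 (space) appear in source and mount-point fields.
_OCTAL_ESCAPE = re.compile(r"\\([0-7]{3})")


def _unescape(field):
    return _OCTAL_ESCAPE.sub(lambda match: chr(int(match.group(1), 8)), field)


def parse_mounts(text):
    """Parse the contents of a /proc/<pid>/mounts file into a list of dicts."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        source, target, fstype, options = fields[:4]
        entries.append({
            "source": _unescape(source),
            "target": _unescape(target),
            "fstype": fstype,
            "options": options.split(","),
        })
    return entries


def read_mounts(pid="self"):
    """Read and parse the mount table of process `pid`."""
    with open(f"/proc/{pid}/mounts") as handle:
        return parse_mounts(handle.read())


if __name__ == "__main__":
    for entry in read_mounts():
        print(entry["target"], entry["fstype"])
```

A snapshot taken this way goes stale as soon as the target process mounts or unmounts something, which is the synchronization downside noted above.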

-- 
   // Bernie Innocenti - http://codewiz.org/
 \X/  Sugar Labs       - http://sugarlabs.org/

* Re: Network isolation with RLIMIT_NETWORK, cont'd.
  2009-12-17 19:35           ` Bernie Innocenti
@ 2009-12-17 19:53             ` Bryan Donlan
  0 siblings, 0 replies; 14+ messages in thread
From: Bryan Donlan @ 2009-12-17 19:53 UTC (permalink / raw)
  To: Bernie Innocenti
  Cc: Mark Seaborn, Michael Stone, Eric W. Biederman, linux-kernel,
	netdev, linux-security-module, Andi Kleen, David Lang,
	Oliver Hartkopp, Alan Cox, Herbert Xu, Valdis Kletnieks,
	Rémi Denis-Courmont, Evgeniy Polyakov, C. Scott Ananian,
	James Morris, Linux Containers

On Thu, Dec 17, 2009 at 2:35 PM, Bernie Innocenti <bernie@codewiz.org> wrote:
> On Thu, 2009-12-17 at 13:24 -0500, Bryan Donlan wrote:
>> Can this be done using openat() and friends currently? It would seem
>> the natural way to implement this; open /proc/(pid)/root, then
>> openat() things from there (or even chdir to it and see the mounts
>> that it sees from there...)
>
> Yeah, but /proc/<pid>/root is just a symlink. It's correct for chroots,
> but I doubt it can be meaningful for per-process namespaces.

The files in /proc/<pid>/fd are 'just symlinks', but opening them can
provide access to objects (e.g., deleted files) not accessible through
the normal filesystem namespace. I see no reason, API-wise, why
/proc/<pid>/root couldn't be extended similarly - but I've not looked
at the namespaces implementation, so maybe there's some reason it'd be
difficult to implement...
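
The deleted-file case is easy to reproduce. The sketch below, which assumes Linux with /proc mounted, unlinks a temporary file while holding a descriptor to it and then reopens the same file through `/proc/self/fd/<n>`; the function name is an assumption for this example.

```python
import os
import tempfile


def read_deleted_file_via_proc(payload):
    """Write `payload`, unlink the file, then reopen it through /proc/self/fd."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, payload)
        os.unlink(path)  # the name is gone from the filesystem namespace
        link = f"/proc/self/fd/{fd}"
        # Opening the link yields a fresh open file on the same, now
        # nameless, inode, so reading starts from offset 0.
        with open(link, "rb") as reopened:
            return os.path.exists(path), reopened.read()
    finally:
        os.close(fd)


if __name__ == "__main__":
    print(read_deleted_file_via_proc(b"still here"))
```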

* Re: Network isolation with RLIMIT_NETWORK, cont'd.
  2009-12-17 17:31       ` Mark Seaborn
  2009-12-17 18:24         ` Bryan Donlan
@ 2009-12-17 19:23         ` Bernie Innocenti
  1 sibling, 0 replies; 14+ messages in thread
From: Bernie Innocenti @ 2009-12-17 19:23 UTC (permalink / raw)
  To: Mark Seaborn
  Cc: Michael Stone, Eric W. Biederman, linux-kernel, netdev,
	linux-security-module, Andi Kleen, David Lang, Oliver Hartkopp,
	Alan Cox, Herbert Xu, Valdis Kletnieks, Bryan Donlan,
	Rémi Denis-Courmont, Evgeniy Polyakov, C. Scott Ananian,
	James Morris, Linux Containers

On Thu, 2009-12-17 at 17:31 +0000, Mark Seaborn wrote:


> The reason chroot() and clone()/CLONE_NEWNS are privileged is that
> they provide a way to violate the assumptions of setuid/setgid
> executables.  If we add a per-process flag that prevents a process
> from exec'ing setuid executables, we could allow chroot() and
> CLONE_NEWNS when that flag is set.  That fixes (a).

I think this would be great.

> 
> Maybe we could fix (b) by making mount namespaces into first class
> objects that can be named through a file descriptor, so that one
> process can manipulate another process's namespace without itself
> being subject to the namespace.

I think Michael's problem with debugging is much more fundamental:
application programmers get confused when some filesystem operations
fail in the debugged process while they work fine from the shell.

It would help if the kernel provided a way for a process to switch to
another process' namespace. Even better, it would be great if existing
namespaces could be mounted at an arbitrary position within another
namespace. Then one could use traditional shell tools to inspect it, or
even chroot into it.

</delirium>

-- 
   // Bernie Innocenti - http://codewiz.org/
 \X/  Sugar Labs       - http://sugarlabs.org/

* Re: Network isolation with RLIMIT_NETWORK, cont'd.
  2009-12-13 10:05   ` Eric W. Biederman
  2009-12-13 14:21     ` Michael Stone
@ 2009-12-17 17:52     ` Andi Kleen
  1 sibling, 0 replies; 14+ messages in thread
From: Andi Kleen @ 2009-12-17 17:52 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Michael Stone, linux-kernel, netdev, linux-security-module,
	Andi Kleen, David Lang, Oliver Hartkopp, Alan Cox, Herbert Xu,
	Valdis Kletnieks, Bryan Donlan, Rémi Denis-Courmont,
	Evgeniy Polyakov, C. Scott Ananian, James Morris,
	Bernie Innocenti, Mark Seaborn, Linux Containers

> Solve that with an unused uid.  That ptrace_may_access check is
> completely non-intuitive, and a problem if we ever remove the current
> == task security module bug avoidance.

I thought he wanted to do that without suid?

If he can change uids he can as well just use full network namespaces.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.
