* [PATCH 0/3] New system call, unshare
From: Janak Desai @ 2005-08-08 13:28 UTC
To: viro, sds, linuxram, ericvh, dwalsh, jmorris, akpm, torvalds, gh,
linux-fsdevel
Cc: linux-kernel
Patch Summary:
This patch implements a new system call, unshare. unshare allows
a process to disassociate parts of its context that it shares
with other tasks as a result of the clone() system call.
The patch consists of two parts:
[1/2] Implements the system call handler function sys_unshare.
[2/2] Implements system call setup for x86 architecture.
Patch Justification:
Inspiration for this patch came from the 4/20/05 post by Al Viro
on the linux-fsdevel mailing list and from the needs of
polyinstantiated directories based on per-process namespaces. In
his post Mr. Viro saw value in being able to create a private
namespace without forking. He also mentioned that "There used to
be a kinda-sorta agreement on a new syscall: unshare(bitmap) with
arguments like those of clone(2)".
Polyinstantiated directories provide an instance of a directory
based on the process security context (user id and/or extended
SELinux attributes). Polyinstantiation of public directories such
as /tmp provides better separation of processes and prevents
illegal information flow through file names. Polyinstantiated
directories are needed for Common Criteria certification using
Mandatory Access Control based Protection Profiles.
Legacy Mandatory Access Control based UNIX operating systems
often modified the kernel's pathname translation routines to
implement polyinstantiated directories. We are currently working
on a userspace polyinstantiation mechanism that was proposed by
Stephen Smalley on the selinux mailing list and that uses the
per-process namespace. Without the unshare system call, namespace
separation can only be achieved by clone(2), which would require
porting and maintaining every command that establishes a user
session, such as login, su, gdm, ssh, cron and newrole. With
unshare, namespace setup can be done using PAM session management
functions without patching individual commands.
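As a rough illustration of the PAM use case, a session module could do something like the following when a session is opened. This is a hypothetical sketch, not code from the patch or from any existing PAM module: `polyinstantiate_tmp()` and the instance directory are made-up names, and it uses the present-day glibc `unshare()` and `mount()` wrappers, including the `MS_PRIVATE` propagation flag, which postdate this posting. It needs CAP_SYS_ADMIN.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <sys/mount.h>

/* Hypothetical session-setup helper: give the calling process its own
   mount namespace and bind-mount a per-user instance directory over /tmp.
   Requires CAP_SYS_ADMIN.  Returns 0 on success, -1 on failure. */
static int polyinstantiate_tmp(const char *instance_dir)
{
    /* Detach from the shared mount namespace without forking. */
    if (unshare(CLONE_NEWNS) == -1) {
        fprintf(stderr, "unshare(CLONE_NEWNS): %s\n", strerror(errno));
        return -1;
    }
    /* On systems where / is a shared mount, stop the bind mount below
       from propagating back into the original namespace. */
    if (mount("none", "/", NULL, MS_REC | MS_PRIVATE, NULL) == -1) {
        fprintf(stderr, "make / private: %s\n", strerror(errno));
        return -1;
    }
    /* Only this process, and children it creates afterwards, see the new /tmp. */
    if (mount(instance_dir, "/tmp", NULL, MS_BIND, NULL) == -1) {
        fprintf(stderr, "bind %s on /tmp: %s\n", instance_dir, strerror(errno));
        return -1;
    }
    return 0;
}
```

Because the namespace change happens in the process that will later exec the user's shell, login, su, sshd and the others would not need to be modified; only the PAM configuration would change.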
This patch was first submitted on linux-fsdevel in mid-May, and
suggestions for improvement have been incorporated. It has now been
ported to the latest rc5-mm tree and is being submitted for
consideration for inclusion in the mm tree for 2.6.14.
Overall Approach:
The overall approach follows the clone system call and its
permission enforcement. However, instead of clone's "what do we
leave shared?" logic, the logic here is based on "what do we
unshare that was previously being shared?". Unlike clone, which
operates on a newly allocated task structure that is not yet
schedulable, unshare has to work on the current process, so
additional task_lock()s are taken to avoid race conditions. Before
unsharing any part of the context, a check is made to ensure that
that part of the context is being shared in the first place. If it
is not being shared, the system call returns success. If it is
being shared, the system call makes a private copy of that context
and updates the appropriate pointers in the current task structure
to point to the new private copy. If allocation or setup of the
private copy fails, the system call restores the current task
structure so that it continues using the shared context.
Currently, the system call only allows "unsharing" of namespace,
signal handlers and virtual memory, because those three were deemed
useful on the linux-fsdevel mailing list.
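The unshare-or-roll-back logic described above can be illustrated with a small user-space model. This is a sketch under stated assumptions, not the patch's kernel code: `struct ctx`, `struct task`, `private_copy()`, `install()` and `unshare_task()` are hypothetical names, a plain `refcount` field stands in for the kernel's reference counts, and locking is only indicated by a comment. It also prepares every copy before touching the task instead of swapping pointers and restoring them on failure, so the failure path only has to free what was already allocated.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one sharable part of a task's context
   (namespace, signal-handler table, address space, ...). */
struct ctx {
    int refcount;   /* how many tasks currently point at this object */
    int data[4];    /* payload that a private copy must duplicate */
};

/* Hypothetical stand-in for the task structure, with two sharable parts. */
struct task {
    struct ctx *ns;
    struct ctx *sighand;
};

/* Return a private copy of *shared, or shared itself if it is not
   actually shared.  Returns NULL only if allocation fails. */
static struct ctx *private_copy(struct ctx *shared)
{
    struct ctx *copy;

    if (shared->refcount == 1)
        return shared;              /* not shared: nothing to unshare */
    copy = malloc(sizeof(*copy));
    if (copy == NULL)
        return NULL;
    memcpy(copy->data, shared->data, sizeof(copy->data));
    copy->refcount = 1;
    return copy;
}

/* Point *slot at copy and drop this task's reference on the old object. */
static void install(struct ctx **slot, struct ctx *copy)
{
    if (copy != *slot) {
        (*slot)->refcount--;
        *slot = copy;
    }
}

/* Unshare the requested parts of t.  If any allocation fails, copies made
   so far are freed and t keeps using the shared objects.
   Returns 0 on success, -1 on failure. */
static int unshare_task(struct task *t, int want_ns, int want_sighand)
{
    struct ctx *new_ns = t->ns, *new_sig = t->sighand;

    if (want_ns && (new_ns = private_copy(t->ns)) == NULL)
        return -1;
    if (want_sighand && (new_sig = private_copy(t->sighand)) == NULL) {
        if (new_ns != t->ns)
            free(new_ns);           /* roll back the namespace copy */
        return -1;
    }

    /* Commit: in the kernel, task_lock(current) would be held here. */
    install(&t->ns, new_ns);
    install(&t->sighand, new_sig);
    return 0;
}
```

For example, a task whose namespace has a reference count of 2 gets a fresh copy with a count of 1 (and the shared object drops to 1), while a signal-handler table that is already private is left in place and the call still succeeds.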
Testing:
The patch has been tested on a uniprocessor i386 system running
Fedora Core 3.
Signed off by: Janak Desai
* Re: [PATCH 0/3] New system call, unshare
From: Florian Weimer @ 2005-08-10 14:08 UTC
To: Janak Desai
Cc: viro, sds, linuxram, ericvh, dwalsh, jmorris, akpm, torvalds, gh,
linux-fsdevel, linux-kernel
* Janak Desai:
> With unshare, namespace setup can be done using PAM session
> management functions without patching individual commands.
I don't think it's a good idea to use security-critical code well
without its original specification. Clearly the current situation
sucks, but this is mainly a lack of PAM functionality, IMHO.
* Re: [PATCH 0/3] New system call, unshare
From: serue @ 2005-08-10 14:18 UTC
To: Florian Weimer
Cc: Janak Desai, viro, sds, linuxram, ericvh, dwalsh, jmorris, akpm,
torvalds, gh, linux-fsdevel, linux-kernel
Quoting Florian Weimer (fw@deneb.enyo.de):
> * Janak Desai:
>
> > With unshare, namespace setup can be done using PAM session
> > management functions without patching individual commands.
>
> I don't think it's a good idea to use security-critical code well
Note that this patch is not removing the CAP_SYS_ADMIN requirement,
just allowing the operation to happen outside of clone(). Unlike
domain transitions in selinux, which should be tied to exec() so
as to tie them to known code, I don't see what clone() would provide
in terms of safety which we are losing.
> without its original specification. Clearly the current situation
> sucks, but this is mainly a lack of PAM functionality, IMHO.
I'm not sure this is to do with PAM functionality, rather than
just its design. Is there a way of "fixing" pam so that we don't
need unshare()?
thanks,
-serge
* Re: [PATCH 0/3] New system call, unshare
From: Janak Desai @ 2005-08-10 15:05 UTC
To: serue
Cc: Florian Weimer, viro, sds, linuxram, ericvh, dwalsh, jmorris,
akpm, torvalds, gh, linux-fsdevel, linux-kernel
serue@us.ibm.com wrote:
> Quoting Florian Weimer (fw@deneb.enyo.de):
>
>>* Janak Desai:
>>
>>
>>>With unshare, namespace setup can be done using PAM session
>>>management functions without patching individual commands.
>>
>>I don't think it's a good idea to use security-critical code well
>
>
> Note that this patch is not removing the CAP_SYS_ADMIN requirement,
> just allowing the operation to happen outside of clone(). Unlike
> domain transitions in selinux, which should be tied to exec() so
> as to tie them to known code, I don't see what clone() would provide
> in terms of safety which we are losing.
>
>
>>without its original specification. Clearly the current situation
>>sucks, but this is mainly a lack of PAM functionality, IMHO.
>
>
> I'm not sure this is to do with PAM functionality, rather than
> just its design. Is there a way of "fixing" pam so that we don't
> need unshare()?
>
I have been trying to narrow down the problem since Alan's post
about using clone() instead of unshare. The problem comes down to
the parent clobbering the controlling tty when it calls _exit().
I have tried, from the child, closing and reopening the tty stored
in PAM, but that has not helped.
-Janak
* Re: [PATCH 0/3] New system call, unshare
From: Al Viro @ 2005-08-23 6:18 UTC
To: Florian Weimer
Cc: Janak Desai, sds, linuxram, ericvh, dwalsh, jmorris, akpm,
torvalds, gh, linux-fsdevel, linux-kernel
On Wed, Aug 10, 2005 at 04:08:31PM +0200, Florian Weimer wrote:
> * Janak Desai:
>
> > With unshare, namespace setup can be done using PAM session
> > management functions without patching individual commands.
>
> I don't think it's a good idea to use security-critical code well
> without its original specification. Clearly the current situation
> sucks, but this is mainly a lack of PAM functionality, IMHO.
Eh? We are talking about a primitive that has far more uses than
PAM. This is a missing piece of the stuff done by clone() and fork():
each task is a virtual machine with sharable components. We can
get a copy of the machine with an arbitrary set of components replaced
with private copies. That's what clone() and fork() do. The thing
missing from that set is taking a component (VM, descriptors, etc.)
of the process itself and making it private. The same thing we do on
fork(), but without creating a new process.
FWIW, I'm OK with that. IIRC, Linus ACKed the concept some time ago.
PAM is one obvious use, but there are other situations where the lack
of that primitive is inconvenient...
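The point that unshare() does "the same thing we do on fork(), but without creating a new process" can be seen with descriptor tables. The sketch below assumes the `unshare(2)` and `clone(2)` interfaces as they exist in mainline Linux and glibc today (unsharing CLONE_FILES was not part of this particular patch); `child_fn()` and `parent_fd_survives()` are hypothetical helpers. A child created with CLONE_FILES shares the parent's descriptor table, so closing a descriptor in the child closes it for the parent as well, unless the child first takes a private copy of the table with unshare(CLONE_FILES).

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

#define CHILD_STACK_SIZE (64 * 1024)

struct child_arg {
    int fd;          /* descriptor the child will close */
    int do_unshare;  /* nonzero: call unshare(CLONE_FILES) before closing */
};

static int child_fn(void *p)
{
    struct child_arg *a = p;

    /* The child starts out sharing the parent's descriptor table,
       because it was created with CLONE_FILES. */
    if (a->do_unshare && unshare(CLONE_FILES) == -1)
        return 2;                   /* e.g. unshare() forbidden here */

    /* Closes the parent's descriptor too, unless unshared above. */
    close(a->fd);
    return 0;
}

/* Returns 1 if fd is still open in the parent afterwards, 0 if the child's
   close() also closed it for the parent, -1 on setup failure, and -2 if
   the child's unshare() call failed. */
static int parent_fd_survives(int fd, int do_unshare)
{
    struct child_arg a = { fd, do_unshare };
    char *stack = malloc(CHILD_STACK_SIZE);
    int status;
    pid_t pid;

    if (stack == NULL)
        return -1;
    /* The stack grows downward on common architectures, so pass its top. */
    pid = clone(child_fn, stack + CHILD_STACK_SIZE, CLONE_FILES | SIGCHLD, &a);
    if (pid == -1 || waitpid(pid, &status, 0) == -1) {
        free(stack);
        return -1;
    }
    free(stack);
    if (!WIFEXITED(status))
        return -1;
    if (WEXITSTATUS(status) == 2)
        return -2;
    return fcntl(fd, F_GETFD) != -1;
}
```

With a freshly opened descriptor, `parent_fd_survives(fd, 0)` should return 0, while `parent_fd_survives(fd, 1)` should return 1, unless the environment forbids unshare(), in which case it returns -2.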