linux-fsdevel.vger.kernel.org archive mirror
* [PATCH 0/3] New system call, unshare
@ 2005-08-08 13:28 Janak Desai
  2005-08-10 14:08 ` Florian Weimer
  0 siblings, 1 reply; 5+ messages in thread
From: Janak Desai @ 2005-08-08 13:28 UTC (permalink / raw)
  To: viro, sds, linuxram, ericvh, dwalsh, jmorris, akpm, torvalds, gh,
	linux-fsdevel
  Cc: linux-kernel


Patch Summary:
This patch implements a new system call, unshare.  unshare allows
a process to disassociate parts of the process context that were 
initially being shared using the clone() system call.

The patch consists of two parts:
[1/2] Implements the system call handler function sys_unshare.
[2/2] Implements system call setup for x86 architecture.

Patch Justification:
Inspiration for this patch came from the 4/20/05 post by Al Viro
on the linux-fsdevel mailing list and the needs of per-process
namespace based polyinstantiated directories. In his post Mr. Viro
saw the usefulness of the ability to create a private namespace
without forking. He also mentioned that "There used to be a kinda-sorta
agreement on a new syscall: unshare(bitmap) with arguments like
those of clone(2)".

Polyinstantiated directories provide an instance of a directory
based on the process security context (user id and/or extended
SELinux attributes). Polyinstantiation of public directories such
as /tmp provides better separation of processes and prevents
illegal information flow through file names. Polyinstantiated
directories are needed for Common Criteria certification using
Mandatory Access Control based Protection Profiles.

Legacy Mandatory Access Control based UNIX operating systems
often modified the kernel's pathname translation routines to
implement polyinstantiated directories. We are currently working
on a userspace polyinstantiation mechanism that was proposed by
Stephen Smalley on the SELinux mailing list and that uses the
per-process namespace.  Without the unshare system call, namespace
separation can only be achieved by clone(2), which would require
porting and maintaining all commands such as login, su, gdm, ssh,
cron, newrole, etc., that establish a user session.  With unshare,
namespace setup can be done using PAM session management functions
without patching individual commands.
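
For illustration, a minimal userspace sketch of what a session-setup
hook might do once such a call is available. It assumes the
present-day glibc unshare() wrapper and the CLONE_NEWNS flag from
<sched.h>, neither of which is part of this posting, and the helper
name enter_private_namespace() is hypothetical. The caller needs
CAP_SYS_ADMIN, so an unprivileged caller should expect -EPERM.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* exposes unshare() and CLONE_NEWNS in <sched.h> */
#endif
#include <assert.h>
#include <errno.h>
#include <sched.h>

/*
 * Hypothetical helper: give the calling process its own mount
 * namespace without forking.  Returns 0 on success or a negative
 * errno value (for example -EPERM without CAP_SYS_ADMIN).
 */
int enter_private_namespace(void)
{
    if (unshare(CLONE_NEWNS) == -1) {
        assert(errno != 0);     /* the wrapper sets errno on failure */
        return -errno;
    }
    return 0;
}
```

Mounts made after a successful return are no longer visible to the
processes that still use the original namespace, which is what a
per-session polyinstantiated /tmp relies on.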

This patch was first submitted on linux-fsdevel in mid-May and
suggestions for improvement have been incorporated. It is now
ported to the latest rc5-mm tree and is being submitted for
consideration for inclusion in the mm tree for 2.6.14.

Overall Approach:
The overall approach follows the clone system call and its permission
enforcement. However, instead of clone's "what do we leave shared?"
logic, the logic here is based on "what do we unshare, that was
previously being shared?". Unlike clone, which operates on a newly
allocated and not-yet-schedulable task structure, unshare has to
work on the current process, so additional task_lock()s are taken
to avoid race conditions. Before unsharing any part of the context,
a check is made to ensure that that part of the context is being
shared in the first place. If the context is not being shared to
begin with, the system call returns success. If the context is
being shared, the system call makes a private copy of that context
and updates the appropriate pointers of the current task structure
to point to this new private copy. If allocation and setup of the
private copy fails, the system call restores the current task
structure to continue using the shared context.

Currently, the system call only allows "unsharing" of namespace, 
signal handlers and virtual memory, because those three were deemed 
useful on the linux-fsdevel mailing list.
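
The check, copy and pointer-swap sequence described above can be
modelled in plain userspace C. This is only a sketch under stated
assumptions: struct ctx, struct task and unshare_ctx() are
hypothetical stand-ins rather than the kernel's structures, and the
task_lock() protection that the real code needs is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical shared piece of process context with a user count. */
struct ctx {
    int  users;
    char data[32];
};

/* Hypothetical task that points at a (possibly shared) context. */
struct task {
    struct ctx *ctx;
};

/* Returns 0 on success, -ENOMEM if the private copy cannot be made. */
int unshare_ctx(struct task *t)
{
    struct ctx *old = t->ctx;
    struct ctx *copy;

    assert(old->users >= 1);

    /* Not shared to begin with: nothing to do, report success. */
    if (old->users == 1)
        return 0;

    copy = malloc(sizeof(*copy));
    if (copy == NULL)
        return -ENOMEM;         /* task still uses the shared context */

    /* Set up the private copy before the task is switched over. */
    memcpy(copy->data, old->data, sizeof(copy->data));
    copy->users = 1;

    /* Point the task at its private copy and drop the shared reference. */
    t->ctx = copy;
    old->users--;
    return 0;
}
```

Because the task pointer is only updated after the copy is fully set
up, the failure path in this model has nothing to undo; the posted
patch describes an explicit restore step instead.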

Testing:
The patch has been tested on a uniprocessor i386 system running
Fedora Core 3.

Signed-off-by: Janak Desai



* Re: [PATCH 0/3] New system call, unshare
  2005-08-08 13:28 [PATCH 0/3] New system call, unshare Janak Desai
@ 2005-08-10 14:08 ` Florian Weimer
  2005-08-10 14:18   ` serue
  2005-08-23  6:18   ` Al Viro
  0 siblings, 2 replies; 5+ messages in thread
From: Florian Weimer @ 2005-08-10 14:08 UTC (permalink / raw)
  To: Janak Desai
  Cc: viro, sds, linuxram, ericvh, dwalsh, jmorris, akpm, torvalds, gh,
	linux-fsdevel, linux-kernel

* Janak Desai:

> With unshare, namespace setup can be done using PAM session
> management functions without patching individual commands.

I don't think it's a good idea to use security-critical code well
without its original specification.  Clearly the current situation
sucks, but this is mainly a lack of PAM functionality, IMHO.


* Re: [PATCH 0/3] New system call, unshare
  2005-08-10 14:08 ` Florian Weimer
@ 2005-08-10 14:18   ` serue
  2005-08-10 15:05     ` Janak Desai
  2005-08-23  6:18   ` Al Viro
  1 sibling, 1 reply; 5+ messages in thread
From: serue @ 2005-08-10 14:18 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Janak Desai, viro, sds, linuxram, ericvh, dwalsh, jmorris, akpm,
	torvalds, gh, linux-fsdevel, linux-kernel

Quoting Florian Weimer (fw@deneb.enyo.de):
> * Janak Desai:
> 
> > With unshare, namespace setup can be done using PAM session
> > management functions without patching individual commands.
> 
> I don't think it's a good idea to use security-critical code well

Note that this patch is not removing the CAP_SYS_ADMIN requirement,
just allowing the operation to happen outside of clone().  Unlike
domain transitions in selinux, which should be tied to exec() so
as to tie them to known code, I don't see what clone() would provide
in terms of safety which we are losing.

> without its original specification.  Clearly the current situation
> sucks, but this is mainly a lack of PAM functionality, IMHO.

I'm not sure this is to do with PAM functionality, rather than
just its design.  Is there a way of "fixing" pam so that we don't
need unshare()?

thanks,
-serge


* Re: [PATCH 0/3] New system call, unshare
  2005-08-10 14:18   ` serue
@ 2005-08-10 15:05     ` Janak Desai
  0 siblings, 0 replies; 5+ messages in thread
From: Janak Desai @ 2005-08-10 15:05 UTC (permalink / raw)
  To: serue
  Cc: Florian Weimer, viro, sds, linuxram, ericvh, dwalsh, jmorris,
	akpm, torvalds, gh, linux-fsdevel, linux-kernel

serue@us.ibm.com wrote:
> Quoting Florian Weimer (fw@deneb.enyo.de):
> 
>>* Janak Desai:
>>
>>
>>>With unshare, namespace setup can be done using PAM session
>>>management functions without patching individual commands.
>>
>>I don't think it's a good idea to use security-critical code well
> 
> 
> Note that this patch is not removing the CAP_SYS_ADMIN requirement,
> just allowing the operation to happen outside of clone().  Unlike
> domain transitions in selinux, which should be tied to exec() so
> as to tie them to known code, I don't see what clone() would provide
> in terms of safety which we are losing.
> 
> 
>>without its original specification.  Clearly the current situation
>>sucks, but this is mainly a lack of PAM functionality, IMHO.
> 
> 
> I'm not sure this is to do with PAM functionality, rather than
> just its design.  Is there a way of "fixing" pam so that we don't
> need unshare()?
> 

I have been trying to narrow down the problem since Alan's post
about using clone() instead of unshare. The problem comes down to
the parent, on _exit(), clobbering the controlling tty. I have
tried, from the child, closing and reopening the tty stored in PAM,
but that has not helped.

-Janak



* Re: [PATCH 0/3] New system call, unshare
  2005-08-10 14:08 ` Florian Weimer
  2005-08-10 14:18   ` serue
@ 2005-08-23  6:18   ` Al Viro
  1 sibling, 0 replies; 5+ messages in thread
From: Al Viro @ 2005-08-23  6:18 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Janak Desai, sds, linuxram, ericvh, dwalsh, jmorris, akpm,
	torvalds, gh, linux-fsdevel, linux-kernel

On Wed, Aug 10, 2005 at 04:08:31PM +0200, Florian Weimer wrote:
> * Janak Desai:
> 
> > With unshare, namespace setup can be done using PAM session
> > management functions without patching individual commands.
> 
> I don't think it's a good idea to use security-critical code well
> without its original specification.  Clearly the current situation
> sucks, but this is mainly a lack of PAM functionality, IMHO.

Eh?  We are talking about a primitive that has far more uses than
PAM.  This is a missing piece of the stuff done by clone() and fork():
each task is a virtual machine with sharable components.  We can
get a copy of the machine with an arbitrary set of components replaced
with private copies.  That's what clone() and fork() do.  The thing
missing from that set is taking a component (VM, descriptors, etc.) of
the process itself and making it private.  The same thing we do on fork(), but
without creating a new process.

FWIW, I'm OK with that.  IIRC, Linus ACKed the concept some time ago.
PAM is one obvious use, but there are other situations where the lack
of that primitive is inconvenient...
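
A small sketch of that primitive from the caller's side, written
against the present-day glibc unshare() wrapper rather than the
posted patch. CLONE_FS and CLONE_FILES are used as the example
components (descriptor unsharing is not among the three the posted
patch handles), and make_fs_and_files_private() is a hypothetical
name.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* exposes unshare() and CLONE_* in <sched.h> */
#endif
#include <assert.h>
#include <errno.h>
#include <sched.h>

/*
 * Hypothetical helper: make the caller's filesystem attributes
 * (cwd, root, umask) and its descriptor table private.  This is the
 * separation fork() would give for those components, but the caller
 * keeps running as the same process.  Returns 0 or a negative errno.
 */
int make_fs_and_files_private(void)
{
    if (unshare(CLONE_FS | CLONE_FILES) == -1) {
        assert(errno != 0);     /* the wrapper sets errno on failure */
        return -errno;
    }
    return 0;
}
```

In a process that shares neither component with another task, the
call has nothing to copy and is expected to simply succeed.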

