From mboxrd@z Thu Jan 1 00:00:00 1970 From: Stanislav Kinsbursky Subject: Re: [PATCH review 52/85] sunrpc: Properly encode kuids and kgids in auth.unix.gid rpc pipe upcalls. Date: Thu, 14 Feb 2013 11:12:16 +0400 Message-ID: <511C8E50.8080007@parallels.com> References: <87621w14vs.fsf@xmission.com> <1360777934-5663-1-git-send-email-ebiederm@xmission.com> <1360777934-5663-52-git-send-email-ebiederm@xmission.com> <20130213210545.GO14195@fieldses.org> <874nhfrjgg.fsf@xmission.com> <20130213215047.GR14195@fieldses.org> <8738wzq1z6.fsf@xmission.com> <20130213225840.GV14195@fieldses.org> <87ip5vn6iv.fsf@xmission.com> Mime-Version: 1.0 Content-Type: text/plain; charset=windows-1251; format=flowed Content-Transfer-Encoding: QUOTED-PRINTABLE Cc: "J. Bruce Fields" , , Linux Containers , , "Serge E. Hallyn" , "Trond Myklebust" To: "Eric W. Biederman" Return-path: In-Reply-To: <87ip5vn6iv.fsf@xmission.com> Sender: linux-kernel-owner@vger.kernel.org List-Id: linux-fsdevel.vger.kernel.org 14.02.2013 03:22, Eric W. Biederman =EF=E8=F8=E5=F2: > "J. Bruce Fields" writes: > >> On Wed, Feb 13, 2013 at 02:32:29PM -0800, Eric W. Biederman wrote: >>> "J. Bruce Fields" writes: >>> >>>> On Wed, Feb 13, 2013 at 01:29:35PM -0800, Eric W. Biederman wrote: >>>>> "J. Bruce Fields" writes: >>>>> >>>>>> On Wed, Feb 13, 2013 at 09:51:41AM -0800, Eric W. Biederman wrot= e: >>>>>>> From: "Eric W. Biederman" >>>>>>> >>>>>>> When a new rpc connection is established with an in-kernel serv= er, the >>>>>>> traffic passes through svc_process_common, and svc_set_client a= nd down >>>>>>> into svcauth_unix_set_client if it is of type RPC_AUTH_NULL or >>>>>>> RPC_AUTH_UNIX. >>>>>>> >>>>>>> svcauth_unix_set_client then looks at the uid of the credential= we >>>>>>> have assigned to the incomming client and if we don't have the = groups >>>>>>> already cached makes an upcall to get a list of groups that the= client >>>>>>> can use. >>>>>>> >>>>>>> The upcall encodes send a rpc message to user space encoding th= e uid >>>>>>> of the user whose groups we want to know. Encode the kuid of t= he user >>>>>>> in the initial user namespace as nfs mounts can only happen tod= ay in >>>>>>> the initial user namespace. >>>>>> >>>>>> OK, I didn't know that. >>>>>> >>>>>> (Though I'm unclear how it should matter to the server what user >>>>>> namespace the client is in?) >>>>> >>>>> Perhaps I have the description a little scrambled. The short ver= sion >>>>> is that to start I only support the initial network namespace. >>>>> >>>>> If I haven't succeeded it is my intent to initially limit the ser= vers >>>>> to the initial user namespace as well. I should see if I can fig= ure >>>>> that out. >>>>> >>>>>>> When a reply to an upcall comes in convert interpret the uid an= d gid values >>>>>>> from the rpc pipe as uids and gids in the initial user namespac= e and convert >>>>>>> them into kuids and kgids before processing them further. >>>>>>> >>>>>>> When reading proc files listing the uid to gid list cache conve= rt the >>>>>>> kuids and kgids from into uids and gids the initial user namesp= ace. As we are >>>>>>> displaying server internal details it makes sense to display th= ese values >>>>>>> from the servers perspective. >>>>>> >>>>>> All of these caches are already per-network-namespace. Ideally = wouldn't >>>>>> we also like to associate a user namespace with each cache someh= ow? >>>>> >>>>> Ideally yes. I read through the caches enough to figure out wher= e there >>>>> user space interfaces were, and to make certain we had conversion= s >>>>> to/from kuids and kgids. >>>>> >>>>> I haven't looked at what user namespace makes sense for these >>>>> caches. For this cache my first guess is that net->user_ns >>>>> is what we want as it will be shared by all users in network name= space I >>>>> presume. >>>> >>>> Oh, I didn't know about net->user_ns--so each network namespace is >>>> associated with a single user namespace, great, that simplifies li= fe. >>>> Yes, that sounds exactly right. >>> >>> Yes. net->user_ns is the user namespace the network namespace was >>> created in. And it is the user namespace that is used in test >>> like ns_capable(net->user_ns, CAP_NET_ADMIN) to see if you are allo= wed >>> to manipulate the network namespace. So looks like exactly what we >>> want for that cache. >>> >>> Could you double check my understanding of the code? >>> >>> I want to be certain that I can't _yet_ start an sunrpc server proc= ess >>> outside of the initial user namespace. While writing an earlier re= ply I >>> realized that I hadn't thought about where sunrpc server processes = come >>> from. >>> >>> Reading through the code it looks like we can have nfs mounts outsi= de of >>> the initial network namespace. >> >> We're talking about the server side here, not the client, so I'm not >> sure what you mean by "nfs mounts". The nfs server does use various >> pseudofilesystems ("proc", "nfsd"), and those can be mounted outside= the >> initial network namespace. > > Actually I was seeing that nfs clients were starting lockd. So I was > just reasoning here that anything that came from a nfs client was > ultimately in the user namespace of that client, which is ultimately > limited by the client out. > >> The server can receive rpc requests over network interfaces outside = the >> initial network namespace, sure. The server doesn't perform mounts = on >> behalf of clients, though, it just accesses previously mounted >> filesystems on clients' behalf. > > But nfsd_init_socks only creates sockets in a single network namespac= e, > and today we pass only &init_net. > >>> But because they are mounts they are >>> still limited to the initial user namespace. >> >> OK, so that's just a limitation on any mount whatsoever for now. I'= m >> catching on, slowly, thanks! > > If you set in struct filesystem .fs_flags =3D FS_USERNS_MOUNT your > filesystem can be mounted outside of the initial user namespace. But > since that takes extra work and because unprivileged users are allowe= d > to create user namespaces and perform the mounts by default it is off= =2E > >>> Now looking at the nfs server, seems to be hard coded to only start >>> in the initial network namespace despite almost having support for >>> starting in more. >> >> Right, Stanislav's got 4 more patches that should finish the job; se= e >> http://mid.gmane.org/<20130201125210.3257.46454.stgit@localhost.loca= ldomain> >> and followups. That should make it for 3.9, I just need to review >> them.... > > Ok that is interesting. > > There is an interesting corner case here where an unprivileged user > can create a user namespace and then can create a network namespace. > Depending on how we interpret things when Stanislaves patches reach > there we might have to add: > > if (net->user_ns !=3D &init_user_ns) > -EINVAL > > Somewhere appropriate. > >>> Even more the nfs server is controlled and started through the "nfs= d" >>> filesystem. Which has to be mounted before you can start the serve= r. >>> So you can only start the server through a mount in the initial use= r >>> namespace. >> >> Yes. >> >>> lockd is started by either the nfs server or the nfs client. >>> >>> There are no other sunrpc servers in the kernel. >> >> There are a couple callback services on the NFS client--those should= be >> associated with nfs mounts in some obvious way. There's a confusing= ACL >> service that's really just an appendage of NFSv2/v3 service. >> >> I think we're fine. > > Thanks. > >>> I think all of that is enough to reasonably claim that you can't ha= ve >>> any sunrpc server processes outside of the initial user namespace. = But >>> if I am wrong I would to find an appropriate spot to put in a line >>> that says: >>> if (current_user_ns() !=3D &init_user) >>> return -ESORRY_CHARLEY; >> >> I think you're right. >> >> So for now it's safely confined to one user namespace, and I think w= e >> understand approximately what to do if we want to support nfsd's in = user >> namespace in the future. (Mainly, make sure nfsd and proc can be >> mounted in them and then most things will be determined by the user_= ns >> of the network namespace associated with a given rpc.) > > For 3.9 the list of filesystems mountable outside the initial user > namespace is: mqueuefs, tmpfs, ramfs, devpts, sysfs, and proc. > > I am a touch concerned about /proc/fs/nfsd/exports after my patches > and Stanislavs patches both come in. As I think that will allow for > cases where net->user_ns !=3D &init_userns. But we can cross that br= idge > when we come to it. > Hmmm... Maybe I'm missing the point of user namespaces, but since NFS kernel se= rver is controlled via NFSd file system write calls, maybe it would be bette= r to add: =2Efs_flags =3D FS_USERNS_MOUNT to it and add check: + if (net->user_ns !=3D current_user_ns()) + return -EINVAL; No? > Eric > --=20 Best regards, Stanislav Kinsbursky