* Re: container support in DazukoFS
       [not found] <87tyogde9l.fsf@lonestar.fn.ogness.net>
@ 2010-07-03 15:42 ` Eric W. Biederman
  2010-07-03 16:40   ` Eric W. Biederman
  2010-07-03 21:13   ` Serge E. Hallyn
  0 siblings, 2 replies; 4+ messages in thread
From: Eric W. Biederman @ 2010-07-03 15:42 UTC
  To: John Ogness; +Cc: dazuko-devel, Linux Containers, linux-fsdevel

John Ogness <dazukolist3@ogness.net> writes:

> [Cc: Eric Biederman because of his container feedback on LKML.
> Hi Eric, the Dazuko-Devel mailing list is register-only, but if you
> reply to me, then I can post your comments on the list.]

I am not willing to discuss design ideas in detail on a closed list,
so I have copied a couple of appropriate mailing lists to have such a
discussion.

> I've been wondering what we could do to make DazukoFS more acceptable
> for mainline inclusion. Aside from making DazukoFS more complete:
>
> http://lists.gnu.org/archive/html/dazuko-devel/2010-06/msg00000.html
>
> the main issue reported from LKML reviews was that we need to support
> containers. I think I have a solution for this, which will also make
> DazukoFS more flexible when not using containers.

That is a very odd way of putting it. We really don't allow container
support to be compiled out, so you really have only two cases: there
is only one instance of the various namespaces, or there are many.
Your code is simply broken if it doesn't handle namespaces properly,
especially the mount namespace.

> My idea is the following:
>
> 1. There is only 1 global device "/dev/dazukofs.ctrl".
>
> 2. When a group is added (using the "add" or "addtrack" commands),
> DazukoFS will create the "/dev/dazukofs.N" group device within the
> container-space of the process adding the group. This means that
> contained environments can create their own local group devices. For
> systems not using containers this is also an improvement because it
> means group devices are created dynamically (instead of the 10 static
> groups that exist now).

What is a container-space? So far we only have a single device
namespace.

If you are going around creating control devices dynamically, I
suggest a control pseudo filesystem like devpts might be more
appropriate. Then you can keep your per-instance configuration as
per-mount data in your control fs.

> 3. When a process reads the global control device to see the group
> names, only the groups within the same container-space as the reading
> process are shown. This keeps information about other containers
> private.
>
> 4. When a file is accessed, only the groups from container-spaces
> where the file exists in that container-space will be notified.
>
> 5. If an ignore device is desired, it may be created using a new
> command for the control device. Perhaps something like "addign". The
> ignore device would be created within the same container-space of the
> process requesting the ignore device. The ignore scope would only be
> the container-space of the ignore device. This means that if a process
> is being ignored within its container, a non-contained process on the
> host machine could still react to file accesses by the contained
> ignored process.
>
> 6. When a new group is created, it does not go live until the first
> read on the group device has occurred. This allows an application to
> set up a new group and set the permissions on the new group device
> before dropping its privileges and beginning file access control.
>
>
> There are a couple things that I like about these changes. First off,
> I like that group and ignore devices are created dynamically. This
> should have been the way it was done from the beginning. This not only
> removes restrictions on the number of groups, but makes it much easier
> to think in terms of containers.
>
> Secondly, I like that a group does not go live until the first read on
> the group device. This makes it much simpler (and cleaner) for
> developing non-privileged applications to perform online file access
> control.
>
> I also see some open issues here. When automatically creating new
> devices, one must always consider the permissions and ownership
> involved. Right now this can be handled using udev rules or by a
> privileged process setting them appropriately. I think that this is
> probably ok for now, especially since complex SELinux rules could come
> into play. But it is something we need to keep in mind.
>
> I am not familiar with how udev works for containers. For example,
> would every container be able to have its own /dev/dazukofs.0 or must
> udev devices be globally unique? Either way, I do not see this as a
> problem, but will need to be considered.
>
> I am also not familiar with the kernel APIs for container
> management. So my idea may need to be adjusted a bit, but I think in
> general it should work. If anyone has experience with containers, I'd
> be interested in hearing about this.

For the mount namespace, which sounds like what you primarily care
about, the APIs are:

  clone( ... CLONE_NEWNS ... )
  unshare( CLONE_NEWNS )
  mount( ... )
  chroot( ... )

They have been in the kernel since at least 2.5.early. If you are
doing interesting things with filesystems and you don't understand
those APIs I don't see how you can possibly create correct code.

Eric
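For readers following the thread, here is a minimal userspace sketch of two of the calls named above, unshare(CLONE_NEWNS) followed by mount(). It is only an illustration and is not taken from DazukoFS or from the kernel tree. The helper name enter_private_mount_ns and the DEMO_MAIN guard are assumptions made up for this example, and the sketch assumes a Linux system with glibc; the calls normally need CAP_SYS_ADMIN, so expect -EPERM when run unprivileged.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>      /* unshare(), CLONE_NEWNS */
#include <sys/mount.h>  /* mount(), MS_REC, MS_PRIVATE */

/*
 * Hypothetical helper (not part of DazukoFS): give the calling process
 * its own copy of the mount namespace.
 * Returns 0 on success or a negative errno value on failure.
 */
int enter_private_mount_ns(void)
{
    /* Detach from the parent's mount namespace. */
    if (unshare(CLONE_NEWNS) == -1)
        return -errno;

    /*
     * Mark every mount private so that later mounts made here do not
     * propagate back to the original namespace when "/" was shared.
     */
    if (mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, NULL) == -1)
        return -errno;

    return 0;
}

#ifdef DEMO_MAIN
#include <stdio.h>
#include <string.h>

int main(void)
{
    int rc = enter_private_mount_ns();

    if (rc < 0)
        fprintf(stderr, "could not unshare: %s\n", strerror(-rc));
    else
        puts("running in a private mount namespace");
    return rc < 0;
}
#endif
```

After a successful call, anything the process mounts is invisible to processes in other mount namespaces, and the same path may name different files in each namespace. That is the situation a path-based access control layer has to account for.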
* Re: container support in DazukoFS
  2010-07-03 15:42 ` container support in DazukoFS Eric W. Biederman
@ 2010-07-03 16:40   ` Eric W. Biederman
  2010-07-03 23:15     ` John Ogness
  2010-07-03 21:13   ` Serge E. Hallyn
  1 sibling, 1 reply; 4+ messages in thread
From: Eric W. Biederman @ 2010-07-03 16:40 UTC
  To: John Ogness; +Cc: dazuko-devel, Linux Containers, linux-fsdevel

ebiederm@xmission.com (Eric W. Biederman) writes:

> John Ogness <dazukolist3@ogness.net> writes:
>
>> [Cc: Eric Biederman because of his container feedback on LKML.
>> Hi Eric, the Dazuko-Devel mailing list is register-only, but if you
>> reply to me, then I can post your comments on the list.]
>
> I am not willing to discuss design ideas in detail on a closed list,
> as such I have copied a couple appropriate mailing lists to have such
> a discussion.
>
>> I've been wondering what we could do to make DazukoFS more acceptable
>> for mainline inclusion. Aside from making DazukoFS more complete:
>>
>> http://lists.gnu.org/archive/html/dazuko-devel/2010-06/msg00000.html
>>
>> the main issue reported from LKML reviews was that we need to support
>> containers. I think I have a solution for this, which will also make
>> DazukoFS more flexible when not using containers.
>
> That is a very odd way of putting it. We really don't allow the
> ability to compile out container support. So you really have only
> two cases. When there is only one instance of various namespaces
> or when there are many. Your code is simply broken if it doesn't
> handle namespaces properly, especially the mount namespace.

I just looked back at the reviews, and what I see is that your code
essentially got a brush-off, as not really being worth reviewing. The
comments were largely meant to point out giant design flaws in your
approach, rather than a serious "hey, this is a good idea, here are a
couple of little problems you need to fix to make it a good
implementation."

I don't think you even comprehended, much less addressed, Al's
concerns. For something like this you definitely need something that
will at least get Al Viro's nod of approval, as Al is the VFS
maintainer. For good or bad the VFS is an exceedingly complex beast;
you need to understand and work with the VFS, not fight it, if you
want to do file-level access control.

In particular Al was saying that the scenario you warn about in your
readme is impossible to avoid, and thus Dazuko is broken by design.

> ========
> WARNING
> =========
>
> It is possible to mount DazukoFS to a directory other than the directory
> that is being stacked upon. For example:
>
> # mount -t dazukofs /usr/local/games /tmp/dazukofs_test
>
> When accessing files within /tmp/dazukofs_test, you will be accessing
> files in /usr/local/games (through dazukofs). When accessing files directly
> in /usr/local/games, dazukofs will not be involved (and will not detect
> the file access).
>
> THIS HAS POTENTIAL PROBLEMS!
>
> If files are modified directly in /usr/local/games, the dazukofs layer
> will not know about it. When dazukofs later tries to access those files,
> it may result in corrupt data or kernel crashes. As long as
> /usr/local/games is ONLY modified through dazukofs, there should not be
> any problems.

I am a bit puzzled why you are making something like this a kernel
feature at all instead of treating virus scanning as something that
apps can voluntarily participate in. With so many races and holes in
your implementation I don't see how a userspace implementation in
something like the gnome-vfs would be less effective.

>> My idea is the following:
>>
>> 1. There is only 1 global device "/dev/dazukofs.ctrl".
>>
>> 2. When a group is added (using the "add" or "addtrack" commands),
>> DazukoFS will create the "/dev/dazukofs.N" group device within the
>> container-space of the process adding the group. This means that
>> contained environments can create their own local group devices. For
>> systems not using containers this is also an improvement because it
>> means group devices are created dynamically (instead of the 10 static
>> groups that exist now).
>
> What is a container-space ? So far we only have a single device namespace.
>
> If you are going around creating control devices dynamically, I
> suggest a control pseudo filesystem like devpts might be more appropriate.
> Then you can keep your per instance configuration as per mount data
> in your control fs.
>
>> 3. When a process reads the global control device to see the group
>> names, only the groups within the same container-space as the reading
>> process are shown. This keeps information about other containers
>> private.

What I was objecting to long ago is the existence of group names; your
current design has global group names. I can't understand what your
groups are doing, or why your groups need names, but having group
names in a new interface makes them global and unusable by containers,
and so fragile that later on you are going to wish you had had the
sense to design something less prone to problems.

Also, using the concept of a dazuko group when we already have the
concept of a process group is, to put it mildly, confusing.

I looked at your tracking code a little bit. I don't understand what
you are trying to accomplish, but the code certainly does not track
the process that opens the dazuko group as the description indicates
it should.

>> 4. When a file is accessed, only the groups from container-spaces
>> where the file exists in that container-space will be notified.

Since you asked: you should not use current->pid. You want something
that is struct pid based for your notifications, or you will never
figure out which process is doing what in the presence of pid
namespaces.

>> 5. If an ignore device is desired, it may be created using a new
>> command for the control device. Perhaps something like "addign". The
>> ignore device would be created within the same container-space of the
>> process requesting the ignore device. The ignore scope would only be
>> the container-space of the ignore device. This means that if a process
>> is being ignored within its container, a non-contained process on the
>> host machine could still react to file accesses by the contained
>> ignored process.
>>
>> 6. When a new group is created, it does not go live until the first
>> read on the group device has occurred. This allows an application to
>> set up a new group and set the permissions on the new group device
>> before dropping its privileges and beginning file access control.
>>
>>
>> There are a couple things that I like about these changes. First off,
>> I like that group and ignore devices are created dynamically. This
>> should have been the way it was done from the beginning. This not only
>> removes restrictions on the number of groups, but makes it much easier
>> to think in terms of containers.
>>
>> Secondly, I like that a group does not go live until the first read on
>> the group device. This makes it much simpler (and cleaner) for
>> developing non-privileged applications to perform online file access
>> control.
>>
>> I also see some open issues here. When automatically creating new
>> devices, one must always consider the permissions and ownership
>> involved. Right now this can be handled using udev rules or by a
>> privileged process setting them appropriately. I think that this is
>> probably ok for now, especially since complex SELinux rules could come
>> into play. But it is something we need to keep in mind.
>>
>> I am not familiar with how udev works for containers. For example,
>> would every container be able to have its own /dev/dazukofs.0 or must
>> udev devices be globally unique? Either way, I do not see this as a
>> problem, but will need to be considered.
>>
>> I am also not familiar with the kernel APIs for container
>> management. So my idea may need to be adjusted a bit, but I think in
>> general it should work. If anyone has experience with containers, I'd
>> be interested in hearing about this.
>
> For the mount namespace which sounds like you primarily care about the
> APIs are:
> clone( ... CLONE_NEWNS ... )
> unshare( CLONE_NEWNS )
> mount( ... )
> chroot( ... )
>
> They have been in the kernel since at least 2.5.early. If you are doing
> interesting things with filesystems and you don't understand those APIs
> I don't see how you can possibly create correct code.
>
>
> Eric
* Re: container support in DazukoFS
  2010-07-03 16:40   ` Eric W. Biederman
@ 2010-07-03 23:15     ` John Ogness
  0 siblings, 0 replies; 4+ messages in thread
From: John Ogness @ 2010-07-03 23:15 UTC
  To: Eric W. Biederman; +Cc: Linux Containers, linux-fsdevel, dazuko-devel

On 2010-07-03, ebiederm@xmission.com (Eric W. Biederman) wrote:
>> I've been wondering what we could do to make DazukoFS more acceptable
>> for mainline inclusion.
>>
>> [...]
>
> I just looked back at the reviews, and what I see is that your code
> essentially got a brush-off, as not really being worth
> reviewing.

I didn't get that impression, especially since posting the patches led
to a relatively positive LWN.net article from Jake Edge.

> The comments were largely to point out giant design flaws in your
> approach to you, more than a serious hey this is a good idea, here a
> couple of little problems you need to fix to make it a good
> implementation.

The only giant design flaws that were discussed were related to
stackable filesystems in general and affect current mainline code
(eCryptfs) just as much as DazukoFS. Since eCryptfs also has these
issues and was accepted into mainline, I did not view this as a reason
to reject DazukoFS.

As I stated in the original patch posts, one of the reasons for adding
another stackable filesystem to mainline would be to help identify
common functionality between the stackable filesystems. Together we
could then figure out how to solve these problems, which currently
affect any stackable filesystem in Linux.

> [...]
>
> In particular Al was saying that the scenario you warn about in your
> readme is impossible to avoid, and thus Dazuko is broken by design.

His comments were saying that stackable filesystems are broken by
design. Do we need to fix filesystem stacking in Linux before
accepting any more broken stackable filesystems? Or do we just pretend
that eCryptfs doesn't have these problems while brushing off any other
stackable filesystem submissions?

> [...]
>
> I am a bit puzzled why you are making something like this a kernel
> feature at all instead of treating virus scanning as something that
> apps can voluntarily participate in.

Getting every possible application on a system to participate is a lot
more work than simply letting the filesystem handle it. All file
access must go through the filesystem, so if you want to control file
access I think it makes sense to implement that at the filesystem
level.

> With so many races and holes in your implementation I don't see how
> a userspace implementation in something like the gnome-vfs would be
> less effective.

The only races and holes are related to stackable filesystems on Linux
in general.

> [...]
>
> If you are going around creating control devices dynamically, I
> suggest a control pseudo filesystem like devpts might be more
> appropriate. Then you can keep your per instance configuration as
> per mount data in your control fs.

That is an interesting suggestion. I will think about how we
could/should do that.

> [...]
>
> What I was objecting to long ago is the existence of group names,
> your current design has global group names. I can't understand what
> your groups are doing, or why your groups need names, but having
> group names in a new interface makes them global and unusable by
> containers, and pretty much so fragile that you are going to wish
> you had sense to design something less prone to problems later on.
>
> Also using the concept of a dazuko group when we already have the
> concept of process group is to put it mildly confusing.
>
> I looked at your tracking code a little bit I don't understand what
> you are trying to accomplish but the code certainly does not track
> the process that opens the dazuko group as the description indicates
> it should.

A Dazuko group is not associated with processes. Instead, processes
decide if they want to do work for an existing group. Maybe "file
access event queue" is a more appropriate description than "group".
There is no restriction on which processes can handle an item of a
file access event queue except for the Linux security permissions on
the queue itself (which is currently a device node).

I can see how using Linux process groups to implement this feature
would be possible. But it would change the semantics of the feature
considerably and make things IMHO unnecessarily complicated.

Perhaps I need technical documentation that is geared towards kernel
rather than userspace developers. Then such misunderstandings and
incorrect associations could (possibly) be avoided.

> [...]
>
> Since you asked you should not use current->pid. You want something
> that is struct pid based for your notifications, or you will never
> figure out which process is doing what in the presence of pid
> namespaces.

Thank you. These changes were implemented after your comment on LKML.

> [...]
>
> For the mount namespace which sounds like you primarily care about
> the APIs are:
> clone( ... CLONE_NEWNS ... )
> unshare( CLONE_NEWNS )
> mount( ... )
> chroot( ... )
>
> They have been in the kernel since at least 2.5.early. If you are
> doing interesting things with filesystems and you don't understand
> those APIs I don't see how you can possibly create correct code.

I am interested in creating correct code. That is why I have asked
questions.

John Ogness
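The "file access event queue" description above can be pictured with a small conceptual sketch. This is not DazukoFS code: every name in it (access_event, event_queue, queue_push, queue_pop, QUEUE_CAP, EVENT_PATH_LEN) is invented for illustration, it runs in userspace, and it leaves out locking, blocking reads, and the allow/deny reply that a real implementation would need. It only shows the idea that events wait in FIFO order and that whichever worker is allowed to read the queue takes the oldest one.

```c
#include <stddef.h>
#include <string.h>

#define QUEUE_CAP 8        /* illustrative capacity, not a DazukoFS limit */
#define EVENT_PATH_LEN 64  /* illustrative, far shorter than PATH_MAX */

/* One pending file access awaiting a decision (hypothetical layout). */
struct access_event {
    char path[EVENT_PATH_LEN];
};

/* FIFO ring buffer of pending events, i.e. one "group" in the thread. */
struct event_queue {
    struct access_event slots[QUEUE_CAP];
    size_t head;   /* index of the oldest queued event */
    size_t count;  /* number of events currently queued */
};

/* Queue a file access. Returns 0 on success, -1 if the queue is full. */
int queue_push(struct event_queue *q, const char *path)
{
    struct access_event *ev;

    if (q->count == QUEUE_CAP)
        return -1;
    ev = &q->slots[(q->head + q->count) % QUEUE_CAP];
    strncpy(ev->path, path, EVENT_PATH_LEN - 1);
    ev->path[EVENT_PATH_LEN - 1] = '\0';  /* always NUL-terminate */
    q->count++;
    return 0;
}

/*
 * Hand the oldest event to whichever worker asks first.
 * Returns 0 on success, -1 if the queue is empty.
 */
int queue_pop(struct event_queue *q, struct access_event *out)
{
    if (q->count == 0)
        return -1;
    *out = q->slots[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    return 0;
}
```

Nothing in the queue records which process will service an event, which matches the point that a group is not tied to particular processes; access is limited only by permissions on the queue's device node.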
* Re: container support in DazukoFS
  2010-07-03 15:42 ` container support in DazukoFS Eric W. Biederman
  2010-07-03 16:40   ` Eric W. Biederman
@ 2010-07-03 21:13   ` Serge E. Hallyn
  1 sibling, 0 replies; 4+ messages in thread
From: Serge E. Hallyn @ 2010-07-03 21:13 UTC
  To: Eric W. Biederman
  Cc: John Ogness, Linux Containers, linux-fsdevel, dazuko-devel

Quoting Eric W. Biederman (ebiederm@xmission.com):
> John Ogness <dazukolist3@ogness.net> writes:
> For the mount namespace which sounds like you primarily care about the
> APIs are:
> clone( ... CLONE_NEWNS ... )
> unshare( CLONE_NEWNS )
> mount( ... )
> chroot( ... )

and pivot_root

> They have been in the kernel since at least 2.5.early. If you are doing
> interesting things with filesystems and you don't understand those APIs
> I don't see how you can possibly create correct code.
>
>
> Eric