* FUSE merging? @ 2005-06-30 9:19 Miklos Szeredi 2005-06-30 9:27 ` Andrew Morton 0 siblings, 1 reply; 78+ messages in thread From: Miklos Szeredi @ 2005-06-30 9:19 UTC (permalink / raw) To: akpm; +Cc: linux-kernel Hi Andrew! What's up with FUSE merging? Is there anything pending that I should do? Ted Ts'o's ideas about selective access to mountpoints are interesting, but I wouldn't consider them merge critical, as they solve a problem, that hasn't yet come up in real life. Thanks, Miklos ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: FUSE merging? 2005-06-30 9:19 FUSE merging? Miklos Szeredi @ 2005-06-30 9:27 ` Andrew Morton 2005-06-30 9:51 ` Miklos Szeredi 0 siblings, 1 reply; 78+ messages in thread From: Andrew Morton @ 2005-06-30 9:27 UTC (permalink / raw) To: Miklos Szeredi; +Cc: linux-kernel Miklos Szeredi <miklos@szeredi.hu> wrote: > > What's up with FUSE merging? Is there anything pending that I should > do? Where are we up to with the fuse_allow_task() bunfight? ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: FUSE merging? 2005-06-30 9:27 ` Andrew Morton @ 2005-06-30 9:51 ` Miklos Szeredi 2005-06-30 10:00 ` Arjan van de Ven 0 siblings, 1 reply; 78+ messages in thread From: Miklos Szeredi @ 2005-06-30 9:51 UTC (permalink / raw) To: akpm; +Cc: linux-kernel > > What's up with FUSE merging? Is there anything pending that I should > > do? > > Where are we up to with the fuse_allow_task() bunfight? I think we agreed that there seem to be no alternatives. Tytso said that the fuse_allow_task() thing is basically OK, but there should be some method to make certain tasks exempt from this limitation. I agree with this, but I think there should be at least one (preferably more) user who actually needs this before I start thinking about implementing it. Making a mount exempt is already possible with the 'allow_other' (privileged by default) mount option. Miklos ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: FUSE merging? 2005-06-30 9:51 ` Miklos Szeredi @ 2005-06-30 10:00 ` Arjan van de Ven 2005-06-30 10:12 ` Miklos Szeredi 2005-06-30 10:16 ` Miklos Szeredi 0 siblings, 2 replies; 78+ messages in thread From: Arjan van de Ven @ 2005-06-30 10:00 UTC (permalink / raw) To: Miklos Szeredi; +Cc: akpm, linux-kernel On Thu, 2005-06-30 at 11:51 +0200, Miklos Szeredi wrote: > > > What's up with FUSE merging? Is there anything pending that I should > > > do? > > > > Where are we up to with the fuse_allow_task() bunfight? > > I think we agreed, that there seem to be no alternatives. > > Tytso said, that fuse_allow_task() thing is basically OK, but there > should be some method to make certain tasks excempt from this > limitation. I agree, with this, but I think there should be at least > one (preferably more) users who actually need this, before I start > thinking about implementing it. > > Making a mount be excepmt is already possible with the 'allow_other' > (privileged by default) mount option. if you are so interested in getting fuse merged... why not merge it first with the security stuff removed entirely. And then start discussing putting security stuff back in ? ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: FUSE merging? 2005-06-30 10:00 ` Arjan van de Ven @ 2005-06-30 10:12 ` Miklos Szeredi 2005-06-30 10:20 ` Arjan van de Ven 2005-06-30 10:16 ` Miklos Szeredi 1 sibling, 1 reply; 78+ messages in thread From: Miklos Szeredi @ 2005-06-30 10:12 UTC (permalink / raw) To: arjan; +Cc: akpm, linux-kernel > if you are so interested in getting fuse merged... why not merge it > first with the security stuff removed entirely. And then start > discussing putting security stuff back in ? a) it's already been discussed to death (just search for 'fuse' on lkml and fsdevel) b) I don't consider it a good idea to ship a defunct version of it in the mainline Can you please accept my wish to have FUSE merged _with_ the unprivileged mount's thing. If anybody has anything to add to the discussion, please do it now, and not later. Delaying this further won't get us any bonus IMO. Miklos ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: FUSE merging? 2005-06-30 10:12 ` Miklos Szeredi @ 2005-06-30 10:20 ` Arjan van de Ven 2005-06-30 10:24 ` Miklos Szeredi 2005-06-30 11:13 ` Anton Altaparmakov 0 siblings, 2 replies; 78+ messages in thread From: Arjan van de Ven @ 2005-06-30 10:20 UTC (permalink / raw) To: Miklos Szeredi; +Cc: akpm, linux-kernel On Thu, 2005-06-30 at 12:12 +0200, Miklos Szeredi wrote: > > if you are so interested in getting fuse merged... why not merge it > > first with the security stuff removed entirely. And then start > > discussing putting security stuff back in ? > > a) it's already been discussed to death (just search for 'fuse' on > lkml and fsdevel) > > b) I don't consider it a good idea to ship a defunct version of it in > the mainline > > Can you please accept my wish to have FUSE merged _with_ the > unprivileged mount's thing. By the same argument: Then can you please accept that FUSE will not get merged right now. ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: FUSE merging? 2005-06-30 10:20 ` Arjan van de Ven @ 2005-06-30 10:24 ` Miklos Szeredi 2005-06-30 19:39 ` Avuton Olrich 2005-06-30 11:13 ` Anton Altaparmakov 1 sibling, 1 reply; 78+ messages in thread From: Miklos Szeredi @ 2005-06-30 10:24 UTC (permalink / raw) To: arjan; +Cc: akpm, linux-kernel > By the same argument: > Then can you please accept that FUSE will not get merged right now. Yes. My argument is: IF it's not going to get merged now, can we please continue the discussion about why it's unacceptable, and what are the alternatives. Is that fair? Miklos ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: FUSE merging? 2005-06-30 10:24 ` Miklos Szeredi @ 2005-06-30 19:39 ` Avuton Olrich 2005-07-01 6:23 ` Miklos Szeredi 0 siblings, 1 reply; 78+ messages in thread From: Avuton Olrich @ 2005-06-30 19:39 UTC (permalink / raw) To: Miklos Szeredi; +Cc: arjan, akpm, linux-kernel On 6/30/05, Miklos Szeredi <miklos@szeredi.hu> wrote: > > Then can you please accept that FUSE will not get merged right now. > My argument is: IF it's not going to get merged now, can we please > continue the discussion about why it's unacceptable, and what are the > alternatives. Why has there not been more discussion about just making an option for those 15 lines, just for merging's sake, and hopefully after more discussion, the option will go away one way or another. On the other hand everyone says security, security, security and I don't remember one person actually saying something negative about what it does to security. avuton -- Anyone who quotes me in their sig is an idiot. -- Rusty Russell. ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: FUSE merging? 2005-06-30 19:39 ` Avuton Olrich @ 2005-07-01 6:23 ` Miklos Szeredi 0 siblings, 0 replies; 78+ messages in thread From: Miklos Szeredi @ 2005-07-01 6:23 UTC (permalink / raw) To: avuton; +Cc: arjan, akpm, linux-kernel > > > Then can you please accept that FUSE will not get merged right now. > > My argument is: IF it's not going to get merged now, can we please > > continue the discussion about why it's unacceptable, and what are the > > alternatives. > > Why has there not been more discussion about just making an option for > those 15 lines, just for merging's sake, and hopefully after more > discussion, the option will go away one way or another. On the other > hand everyone says security, security, security and I don't remember > one person actually saying something negative about what it does to > security. There is a mount option: 'allow_other' which does just this. Or did you mean a config option? Thanks, Miklos ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: FUSE merging? 2005-06-30 10:20 ` Arjan van de Ven 2005-06-30 10:24 ` Miklos Szeredi @ 2005-06-30 11:13 ` Anton Altaparmakov 2005-06-30 19:46 ` Andrew Morton 1 sibling, 1 reply; 78+ messages in thread From: Anton Altaparmakov @ 2005-06-30 11:13 UTC (permalink / raw) To: Arjan van de Ven; +Cc: Miklos Szeredi, akpm, linux-kernel On Thu, 2005-06-30 at 12:20 +0200, Arjan van de Ven wrote: > On Thu, 2005-06-30 at 12:12 +0200, Miklos Szeredi wrote: > > > if you are so interested in getting fuse merged... why not merge it > > > first with the security stuff removed entirely. And then start > > > discussing putting security stuff back in ? > > > > a) it's already been discussed to death (just search for 'fuse' on > > lkml and fsdevel) > > > > b) I don't consider it a good idea to ship a defunct version of it in > > the mainline > > > > Can you please accept my wish to have FUSE merged _with_ the > > unprivileged mount's thing. > > By the same argument: > Then can you please accept that FUSE will not get merged right now. Why should he? IMNSHO it should be merged right now with the security stuff. FUSE works as is. Without the security stuff FUSE is useless. I have yet to read even a single constructive argument why it should not be merged as is. Best regards, Anton -- Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @) Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/ ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: FUSE merging? 2005-06-30 11:13 ` Anton Altaparmakov @ 2005-06-30 19:46 ` Andrew Morton 2005-06-30 20:00 ` Andrew Morton ` (2 more replies) 0 siblings, 3 replies; 78+ messages in thread From: Andrew Morton @ 2005-06-30 19:46 UTC (permalink / raw) To: Anton Altaparmakov; +Cc: arjan, miklos, linux-kernel, Frank van Maarseveen Anton Altaparmakov <aia21@cam.ac.uk> wrote: > > On Thu, 2005-06-30 at 12:20 +0200, Arjan van de Ven wrote: > > On Thu, 2005-06-30 at 12:12 +0200, Miklos Szeredi wrote: > > > > if you are so interested in getting fuse merged... why not merge it > > > > first with the security stuff removed entirely. And then start > > > > discussing putting security stuff back in ? > > > > > > a) it's already been discussed to death (just search for 'fuse' on > > > lkml and fsdevel) > > > > > > b) I don't consider it a good idea to ship a defunct version of it in > > > the mainline > > > > > > Can you please accept my wish to have FUSE merged _with_ the > > > unprivileged mount's thing. > > > > By the same argument: > > Then can you please accept that FUSE will not get merged right now. > > Why should he? IMNSHO it should be merged right now with the security > stuff. FUSE works as is. Without the security stuff FUSE is useless. > > I have yet to read even a single constructive argument why it should not > be merged as is. I believe that the requirement which fuse_allow_task() attempts to satisfy is legitimate and is useful to FUSE users. The fact that, AFAIK, nobody has found a way to implement it more nicely is a Linux problem, not a FUSE problem. Given that the actual amount of code involved is small, centralised and well known, we can easily fix it up later if/when new infrastructure or new ideas become available. So unless someone is able to come up with a better approach in the next few days I'm inclined to say "we suck" and merge the thing as-is. 
However, a few things:
- is there anything in the current implementation of the permission stuff which might tie our hands if it is later reimplemented? IOW: does the current FUSE user interface in any way lock us into the current FUSE implementation (fuse_allow_task())?
- the fuse mount options don't seem to be documented
- aren't we going to remove the nfs semi-server feature?
- Frank points out that a user can send a sigstop to his own setuid(0) task and he intimates that this could cause DoS problems with FUSE. More details needed please?
- I don't recall seeing an exhaustive investigation of how an unprivileged user could use a FUSE mount to implement DoS attacks against other users or against root.
^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: FUSE merging? 2005-06-30 19:46 ` Andrew Morton @ 2005-06-30 20:00 ` Andrew Morton 2005-07-01 6:40 ` Miklos Szeredi 2005-06-30 22:28 ` Frank van Maarseveen 2005-07-01 6:36 ` FUSE merging? Miklos Szeredi 2 siblings, 1 reply; 78+ messages in thread From: Andrew Morton @ 2005-06-30 20:00 UTC (permalink / raw) To: aia21, arjan, miklos, linux-kernel, frankvm Andrew Morton <akpm@osdl.org> wrote: > > However, a few things: > > - is there anything in the current implementation of the permission stuff > which might tie our hands if it is later reimplemented? IOW: does the > current FUSE user interface in any way lock us into the current FUSE > implementation (fuse_allow_task())? > > - the fuse mount options don't seem to be documented > > - aren't we going to remove the nfs semi-server feature? > > - Frank points out that a user can send a sigstop to his own setuid(0) > task and he intimates that this could cause DoS problems with FUSE. More > details needed please? > > - I don't recall seeing an exhaustive investigation of how an > unprivileged user could use a FUSE mount to implement DoS attacks against > other users or against root. You say "If a sysadmin trusts the users enough, or can ensure through other measures, that system processes will never enter non-privileged mounts, it can relax the last limitation with a "user_allow_other" config option. If this config option is set, the mounting user can add the "allow_other" mount option which disables the check for other users' processes." What config option, where? ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: FUSE merging? 2005-06-30 20:00 ` Andrew Morton @ 2005-07-01 6:40 ` Miklos Szeredi 0 siblings, 0 replies; 78+ messages in thread From: Miklos Szeredi @ 2005-07-01 6:40 UTC (permalink / raw) To: akpm; +Cc: aia21, arjan, miklos, linux-kernel, frankvm > > - I don't recall seeing an exhaustive investigation of how an > > unprivileged user could use a FUSE mount to implement DoS attacks against > > other users or against root. > > You say > > "If a sysadmin trusts the users enough, or can ensure through other > measures, that system processes will never enter non-privileged mounts, > it can relax the last limitation with a "user_allow_other" config > option. If this config option is set, the mounting user can add the > "allow_other" mount option which disables the check for other users' > processes." > > What config option, where? Currently that's a userspace issue. There's a /etc/fuse.conf file, with two options: max_mounts=X user_allow_other The fusermount helper reads this file, and decides if passing the 'allow_other' mount option to the kernel is OK or not. If we want unprivileged sys_mount() these will have to be checked in kernel (set via sysfs, etc). Miklos ^ permalink raw reply [flat|nested] 78+ messages in thread
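[Editorial illustration] The userspace check Miklos describes can be sketched roughly as follows. This is a minimal Python model, not fusermount's code: the option names are the ones given in the message above, while the function names, the `#` comment syntax and the one-option-per-line parsing are assumptions made for the example.

```python
def parse_fuse_conf(text):
    """Parse fuse.conf-style text into a dict (hypothetical helper)."""
    conf = {"user_allow_other": False, "max_mounts": None}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # assumed comment syntax
        if not line:
            continue
        if line == "user_allow_other":
            conf["user_allow_other"] = True
        elif line.startswith("max_mounts="):
            conf["max_mounts"] = int(line.split("=", 1)[1])
    return conf


def may_pass_allow_other(conf, uid):
    """root may always use allow_other; other users only if the admin opted in."""
    return uid == 0 or conf["user_allow_other"]
```

With an empty config, a non-root user would be refused 'allow_other', which matches the "privileged by default" behaviour mentioned earlier in the thread.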
* Re: FUSE merging? 2005-06-30 19:46 ` Andrew Morton 2005-06-30 20:00 ` Andrew Morton @ 2005-06-30 22:28 ` Frank van Maarseveen 2005-07-01 6:58 ` Miklos Szeredi 2005-07-01 6:36 ` FUSE merging? Miklos Szeredi 2 siblings, 1 reply; 78+ messages in thread From: Frank van Maarseveen @ 2005-06-30 22:28 UTC (permalink / raw) To: Andrew Morton Cc: Anton Altaparmakov, arjan, miklos, linux-kernel, Frank van Maarseveen On Thu, Jun 30, 2005 at 12:46:22PM -0700, Andrew Morton wrote: > > - Frank points out that a user can send a sigstop to his own setuid(0) > task and he intimates that this could cause DoS problems with FUSE. More > details needed please? It's the other way around: Apparently it is not a security problem to SIGSTOP or even SIGKILL a setuid program. So why is it a security problem when such a program is delayed by a supposedly malicious behaving FUSE mount? I think that setuid programs take too many things for granted, especially "time". I also think the ptrace equivalence principle (item C2 in the FUSE doc) is too harsh for FUSE. Suppose the process changes id to full root and we can no longer send signals to it. Are there any other ways we could affect its scheduling without FUSE? I think "yes", clearly not that easy as when it accesses a FUSE mount but "yes". Think about typing ^S (XOFF), or by letting it read from a pipe or from a file on a very very slow device. Or by renicing the parent in advance. Regarding the pipe: yes the setuid program could check that with fstat() but is such a check fundamentally the right approach? I have doubt because unified I/O is a good thing and there is no guarantee whatsoever about completion of any FS operation within a certain amount of time. Suppose another malicious process does a lookup in a huge directory without hashed names? What about a process consuming lots of memory, pushing everything else into swap? What about deleting a _huge_ file or do other things which might(?) take a considerable amount of kernel time? 
[id]notify might even help using this to delay a root process at a crucial point to exploit a race. So, I think there are many ways to affect the execution speed of [setuid] programs. I have never heard of a setuid root program which renices itself, such, that it successfully avoids a race or DoS exploit. And then the DoS thing using simulated endless files within FUSE. It is already possible to create terabyte sized [sparse] files. Can the fstat() size/blocks info be trusted from FUSE? no more than fstat() outside FUSE because the file may still be growing! > - I don't recall seeing an exhaustive investigation of how an > unprivileged user could use a FUSE mount to implement DoS attacks against > other users or against root. In general I think it is _hard_ to protect against a local DoS for many reasons and I don't see any new fundamental problem here with FUSE: it is just making it more obvious that it's hard to write secure setuid programs. Those programs should _know_ that input data and anything else from the user is "tainted" and that they must be _very_ careful with it, in every detail. -- Frank ^ permalink raw reply [flat|nested] 78+ messages in thread
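[Editorial illustration] The fstat() check Frank mentions for pipes might look like the sketch below. It is written in Python for brevity, and `describe_fd` is a hypothetical name. It only classifies the descriptor; as Frank argues, knowing the file type says nothing about how long a later read will take.

```python
import os
import stat


def describe_fd(fd):
    """Classify a descriptor by file type without reading from it."""
    st = os.fstat(fd)
    if stat.S_ISFIFO(st.st_mode):
        return "pipe"
    if stat.S_ISREG(st.st_mode):
        return "regular"
    if stat.S_ISCHR(st.st_mode):
        return "chardev"
    return "other"
```

A setuid program could call this on its standard input before trusting it, but a regular file on a slow or synthetic filesystem would still pass the check.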
* Re: FUSE merging? 2005-06-30 22:28 ` Frank van Maarseveen @ 2005-07-01 6:58 ` Miklos Szeredi 2005-07-01 9:24 ` Frank van Maarseveen 0 siblings, 1 reply; 78+ messages in thread From: Miklos Szeredi @ 2005-07-01 6:58 UTC (permalink / raw) To: frankvm; +Cc: akpm, aia21, arjan, miklos, linux-kernel, frankvm > > > > - Frank points out that a user can send a sigstop to his own setuid(0) > > task and he intimates that this could cause DoS problems with FUSE. More > > details needed please? > > It's the other way around: > Apparently it is not a security problem to SIGSTOP or even SIGKILL a > setuid program. So why is it a security problem when such a program is > delayed by a supposedly malicious behaving FUSE mount? Perfectly valid argument. My question: is it not a security problem to allow signals to reach a suid program? > I think that setuid programs take too many things for granted, especially > "time". I also think the ptrace equivalence principle (item C2 in the > FUSE doc) is too harsh for FUSE. It's obviously not equivalence. FUSE filesystem gets a subset of ptrace's capabilities (and rather a small one). > Suppose the process changes id to full root and we can no longer send > signals to it. Are there any other ways we could affect its scheduling > without FUSE? I think "yes", clearly not that easy as when it accesses a > FUSE mount but "yes". Think about typing ^S (XOFF), or by letting it read > from a pipe or from a file on a very very slow device. Or by renicing > the parent in advance. Regarding the pipe: yes the setuid program could > check that with fstat() but is such a check fundamentally the right > approach? I have doubt because unified I/O is a good thing and there is > no guarantee whatsoever about completion of any FS operation within a > certain amount of time. Suppose another malicious process does a lookup > in a huge directory without hashed names? What about a process consuming > lots of memory, pushing everything else into swap? 
What about deleting > a _huge_ file or do other things which might(?) take a considerable > amount of kernel time? [id]notify might even help using this to delay > a root process at a crucial point to exploit a race. So, I think there > are many ways to affect the execution speed of [setuid] programs. I > have never heard of a setuid root program which renices itself, such, > that it successfully avoids a race or DoS exploit. There's a huge difference between slowing down, and stopping a process. I wouldn't consider the first a true DoS. > And then the DoS thing using simulated endless files within FUSE. It is > already possible to create terabyte sized [sparse] files. Can the fstat() > size/blocks info be trusted from FUSE? no more than fstat() outside FUSE > because the file may still be growing! > > > - I don't recall seeing an exhaustive investigation of how an > > unprivileged user could use a FUSE mount to implement DoS attacks against > > other users or against root. > > In general I think it is _hard_ to protect against a local DoS for many > reasons and I don't see any new fundamental problem here with FUSE: > it is just making it more obvious that it's hard to write secure setuid > programs. Those programs should _know_ that input data and anything else > from the user is "tainted" and that they must be _very_ careful with it, > in every detail. Yes. The extra problem with FUSE, is that they are not _able_ to be careful. They can't even check if a file is in fact on a FUSE mount or not without the FUSE daemon's intervention (lookup on a file will be passed to userspace). Thanks, Miklos ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: FUSE merging? 2005-07-01 6:58 ` Miklos Szeredi @ 2005-07-01 9:24 ` Frank van Maarseveen 2005-07-01 10:27 ` Miklos Szeredi 0 siblings, 1 reply; 78+ messages in thread From: Frank van Maarseveen @ 2005-07-01 9:24 UTC (permalink / raw) To: Miklos Szeredi; +Cc: frankvm, akpm, aia21, arjan, linux-kernel On Fri, Jul 01, 2005 at 08:58:05AM +0200, Miklos Szeredi wrote: > > > > > > - Frank points out that a user can send a sigstop to his own setuid(0) > > > task and he intimates that this could cause DoS problems with FUSE. More > > > details needed please? > > > > It's the other way around: > > Apparently it is not a security problem to SIGSTOP or even SIGKILL a > > setuid program. So why is it a security problem when such a program is > > delayed by a supposedly malicious behaving FUSE mount? > > Perfectly valid argument. My question: is it not a security problem > to allow signals to reach a suid program? That's what I thought too so I asked it first on the security mailing list. Apparently this signal behavior is normal. > There's a huge difference between slowing down, and stopping a > process. I wouldn't consider the first a true DoS. Stopping is a special case. But it is effectively the same as being indefinitely slowed down by, say, 10000+ malicious processes and from that angle I don't see a fundamental difference w.r.t. security. Killing the malicious processes should solve the problem. And killing one FUSE process looks easier to me than killing 10000+ ones. > Yes. The extra problem with FUSE, is that they are not _able_ to be > careful. I think this is not true. Every pathname passed to a setuid program by the user is basically "tainted". Standard I/O is tainted as well. > They can't even check if a file is in fact on a FUSE mount They shouldn't. The pathname is not to be trusted anyway. I think FUSE has shown to be conservative enough w.r.t. security to be merged. But it may be interesting to consider:
- replace ptraceability test by a kill()ability test.
- some sort of "intr" mount option for most signals on by default.
- Forbid hiding data by mounting a FUSE filesystem on top of it (does FUSE check for this already?)
- /proc isn't a problem: most root processes tend to avoid it because it is synthetic and thus uninteresting. Maybe we should extend the idea of "synthetic file-systems being uninteresting" to any process which cannot receive signals from the FUSE mount owner. When one cannot hide data by a FUSE mount and it's synthetic anyway so not interesting then just show the original empty mount point.
-- Frank ^ permalink raw reply [flat|nested] 78+ messages in thread
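[Editorial illustration] The difference between the existing ptrace-style test and the kill()ability test Frank proposes can be modelled as below. This is a simplified Python sketch under stated assumptions: the ptrace-style rule is taken to mean that all of the task's user IDs must equal the mount owner's, group IDs and capabilities are ignored, and the `Cred` tuple and both function names are hypothetical. The kill-style rule follows the POSIX kill() permission check (sender's real or effective UID matches the receiver's real or saved set-user-ID).

```python
from collections import namedtuple

# Simplified credentials: real, effective and saved user IDs.
Cred = namedtuple("Cred", "ruid euid suid")


def ptrace_style_allow(owner_uid, task, allow_other=False):
    """Assumed model of the current test: every UID must equal the owner's."""
    if allow_other:
        return True
    return task.ruid == task.euid == task.suid == owner_uid


def kill_style_allow(owner_uid, task, allow_other=False):
    """Model of the proposed test: could the mount owner signal this task?

    The owner is modelled as an ordinary process with ruid == euid == owner_uid.
    """
    if allow_other:
        return True
    return owner_uid in (task.ruid, task.suid)
```

Under this model a setuid-root program started by the user (ruid=user, euid=0, suid=0) is refused by the first test but admitted by the second, while a process that has fully become root is refused by both.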
* Re: FUSE merging? 2005-07-01 9:24 ` Frank van Maarseveen @ 2005-07-01 10:27 ` Miklos Szeredi 2005-07-01 12:00 ` Frank van Maarseveen 0 siblings, 1 reply; 78+ messages in thread From: Miklos Szeredi @ 2005-07-01 10:27 UTC (permalink / raw) To: frankvm; +Cc: miklos, frankvm, akpm, aia21, arjan, linux-kernel > > Perfectly valid argument. My question: is it not a security problem > > to allow signals to reach a suid program? > > That's what I though too so I asked it first on the security mailing list. > Apparently this signal behavior is normal. Well, I think it's a fertile ground for hole hunters out there. Just needs a little publicity ;) Is it considered DoS for example if I prevent other users from sending email? SIGSTOP on sendmail at the right moment (when the database is locked) should do it fine. > Stopping is a special case. But it is effectively the same as being > indefinately slowed down by, say, 10000+ malicious processes and from > that angle I don't see a fundamental difference w.r.t. security. On a well protected multiuser system there will be ulimits in place to prevent that. > Killing the malicous processes should solve the problem. And killing > one FUSE process looks easier to me than killing 10000+ ones. Killing always works, if the sysadmin happens to be around. If not then there's not a lot other users can do. > I think this is not true. Every pathname passed to a setuid program > by the user is basically "tainted". Standard I/O is tainted as well. You mean suid programs are never to touch paths passed to them? If that would be true, then fuse_allow_task() would not be needed, but would do no harm either, since it would never be invoked by a suid program. > > They can't even check if a file is in fact on a FUSE mount > > They shouldn't. The pathname is not to be trusted anyway. > > I think FUSE has shown to be conservative enough w.r.t. security to be > merged. 
But it may be interesting to consider: > > - replace ptraceability test by a kill()ability test. You didn't consider the information leak aspect (point B in fuse.txt). > - some sort of "intr" mount option for most signals on by default. KILL will always interrupt a request. So getting rid of a malicious mount should present no problems. > - Forbid hiding data by mounting a FUSE filesystem on top of it (does > FUSE check for this already?) Yes. It checks for writability on the mountpoint (excluding limited writability, as with /tmp for example). > - /proc isn't a problem: most root processes tend to avoid it because > it is synthetic and thus uninteresting. Maybe we should extend > the idea of "synthetic file-systems being uninteresting" to any > process which cannot receive signals from the FUSE mount owner. When > one cannot hide data by a FUSE mount and it's synthetic anyway so not > interesting then just show the original empty mount point. Been there. People (like Al Viro) didn't like it. Miklos ^ permalink raw reply [flat|nested] 78+ messages in thread
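[Editorial illustration] The mountpoint test Miklos mentions can be modelled roughly as follows. This is a hypothetical Python sketch, not the actual fusermount logic: it assumes that "limited writability" means a sticky directory such as /tmp, and that a sticky directory owned by the mounting user still counts as fully writable. Both points are assumptions made for the example.

```python
import stat


def mountpoint_ok(mode, dir_owner, uid, can_write):
    """Hypothetical model of the mountpoint writability test."""
    if not stat.S_ISDIR(mode):
        return False
    if not can_write:
        return False
    # A sticky directory such as /tmp only grants limited write access,
    # so it does not count unless the user owns it (assumption).
    if mode & stat.S_ISVTX and dir_owner != uid:
        return False
    return True
```

Here `mode` and `dir_owner` would come from a stat() of the mountpoint, and `can_write` from an access check performed with the mounting user's credentials.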
* Re: FUSE merging? 2005-07-01 10:27 ` Miklos Szeredi @ 2005-07-01 12:00 ` Frank van Maarseveen 2005-07-01 12:36 ` Miklos Szeredi 0 siblings, 1 reply; 78+ messages in thread From: Frank van Maarseveen @ 2005-07-01 12:00 UTC (permalink / raw) To: Miklos Szeredi; +Cc: frankvm, akpm, aia21, arjan, linux-kernel On Fri, Jul 01, 2005 at 12:27:01PM +0200, Miklos Szeredi wrote: > > You mean suid programs are never to touch paths passed to them? never when euid==root. The pathname could even point into /proc or anything else yet unknown, e.g. by putting some symlinks at the right places. The mere act of opening the file as root could have unwanted side effects already. > > If that would be true, then fuse_allow_task() would not be needed, but > would do no harm either, since it would never be invoked by a suid > program. In theory it should not be necessary. But on a practical side: we need to provide security for daemons with elevated privileges which need to traverse all local disks. > You didn't consider the information leak aspect (point B in fuse.txt). Correct. I have no answer to that other than: is it a real problem or yet something else a setuid program should take into consideration? And what info can we extract already using inotify/dnotify? There are several ways to monitor activity and it is all information. /proc (ps) gives information too. > > - Forbid hiding data by mounting a FUSE filesystem on top of it (does > > FUSE check for this already?) > > Yes. It checks for writablilty on the mountpoing (excluding limited > writablilty as /tmp for example). But can you mount FUSE on top of a populated tree, a non-leaf dir? > > - /proc isn't a problem: most root processes tend to avoid it because > > it is synthetic and thus uninteresting. Maybe we should extend > > the idea of "synthetic file-systems being uninteresting" to any > > process which cannot receive signals from the FUSE mount owner. 
When > > one cannot hide data by a FUSE mount and its synthetic anyway so not > > interesting then just show the original empty mount point. > > Been there. People (like Al Viro) didn't like it. including changing the ptraceability test by a signal test and including the (IMHO) required emptyness of the mount stub? Traversing a FUSE mountpoint is almost equivalent to talking with a userspace program. Why should that be interesting when one simply wants to traverse the FS? root isn't going to execute all user programs to see what they do either. -- Frank ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: FUSE merging? 2005-07-01 12:00 ` Frank van Maarseveen @ 2005-07-01 12:36 ` Miklos Szeredi 2005-07-01 13:05 ` Frank van Maarseveen 0 siblings, 1 reply; 78+ messages in thread From: Miklos Szeredi @ 2005-07-01 12:36 UTC (permalink / raw) To: frankvm; +Cc: akpm, aia21, arjan, linux-kernel > > You mean suid programs are never to touch paths passed to them? > > never when euid==root. > The pathname could even point into /proc or anything else yet unknown, > e.g. by putting some symlinks at the right places. The mere act of > opening the file as root could have unwanted side effects already. OK, open is out. However other operations (stat, unlink, chmod etc) are always without side effects on "normal" filesystems. However on FUSE they are very much unsafe (can block, not do what was instructed and return success, etc). > > If that would be true, then fuse_allow_task() would not be needed, but > > would do no harm either, since it would never be invoked by a suid > > program. > > In theory it should not be necessary. But on a practical side: we need > to provide security for daemons with elevated privileges which need to > traverse all local disks. I agree wholeheartedly. However, I'm not arguing this point, because it has been (rightly) pointed out, that private namespaces can be used to solve this. While the suid issue is not solvable with private namespaces. > > You didn't consider the information leak aspect (point B in fuse.txt). > > Correct. I have no answer to that other than: is it a real problem or > yet something else a setuid program should take into consideration? > And what info can we extract already using inotify/dnotify? Probably not file access patterns. But yes I don't consider this a very grave problem. > There are several ways to monitor activity and it is all > information. /proc (ps) gives information too. > > > > - Forbid hiding data by mounting a FUSE filesystem on top of it (does > > > FUSE check for this already?) > > > > Yes. 
It checks for writability on the mountpoint (excluding limited > > writability as /tmp for example). > > But can you mount FUSE on top of a populated tree, a non-leaf dir? Yes, but I think that's OK, because if the directory on which you mount is writable, then you can hide the data already (unlinking it, but keeping a reference through a file descriptor). And it's not very effective hiding, since a bind mount of the mountpoint's filesystem will reveal what's underneath the FUSE mount. > > > - /proc isn't a problem: most root processes tend to avoid it because > > > it is synthetic and thus uninteresting. Maybe we should extend > > > the idea of "synthetic file-systems being uninteresting" to any > > > process which cannot receive signals from the FUSE mount owner. When > > > one cannot hide data by a FUSE mount and it's synthetic anyway so not > > > interesting then just show the original empty mount point. > > > > Been there. People (like Al Viro) didn't like it. > > including changing the ptraceability test by a signal test and including > the (IMHO) required emptyness of the mount stub? It's been thrown out for the reason that it's unacceptable if suid programs see a different namespace from non-suid ones. > Traversing a FUSE mountpoint is almost equivalent to talking with a > userspace program. Why should that be interesting when one simply wants > to traverse the FS? root isn't going to execute all user programs to > see what they do either. Yes. Please explain that to Al Viro, Christoph Hellwig et al. Believe me it's not something that's easy to get across, and I'm very happy that you see it this way too :). Miklos ^ permalink raw reply [flat|nested] 78+ messages in thread
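[Editorial illustration] The point about hiding data in a writable directory without any mount can be shown with a short Python sketch: the file is unlinked while a descriptor is held open, so the name disappears but the contents stay reachable. The function name and the file name "secret" are made up for the example.

```python
import os
import tempfile


def hide_but_keep(directory):
    """Unlink a file while holding it open; return (still_listed, data)."""
    path = os.path.join(directory, "secret")
    with open(path, "w") as f:
        f.write("still here")
    fd = os.open(path, os.O_RDONLY)
    try:
        os.unlink(path)  # the name is gone from the directory...
        still_listed = "secret" in os.listdir(directory)
        data = os.read(fd, 100).decode()  # ...but the inode is still readable
    finally:
        os.close(fd)
    return still_listed, data


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(hide_but_keep(d))
```

Anyone with write access to the directory can do this today, which is why mounting over a writable, populated directory does not give the user a new way to hide data.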
* Re: FUSE merging? 2005-07-01 12:36 ` Miklos Szeredi @ 2005-07-01 13:05 ` Frank van Maarseveen 2005-07-01 13:21 ` Miklos Szeredi 0 siblings, 1 reply; 78+ messages in thread From: Frank van Maarseveen @ 2005-07-01 13:05 UTC (permalink / raw) To: Miklos Szeredi; +Cc: frankvm, akpm, aia21, arjan, linux-kernel On Fri, Jul 01, 2005 at 02:36:22PM +0200, Miklos Szeredi wrote: > > > You mean suid programs are never to touch paths passed to them? > > > > never when euid==root. > > The pathname could even point into /proc or anything else yet unknown, > > e.g. by putting some symlinks at the right places. The mere act of > > opening the file as root could have unwanted side effects already. > > OK, open is out. However other operations (stat, unlink, chmod etc) > are always without side effects on "normal" filesystems. However on > FUSE they are very much unsafe (can block, not do what was instructed > and return success, etc). What about tricking a setuid program to stat into /auto (/mnt/auto, /misc, whatever it is called)? then the automounter will act upon a root request with again possibly unwanted side effects. See how careful a setuid/full-root program must be in handling userdata including pathnames? FUSE suddenly makes this more obvious but it is not new. > > > > - /proc isn't a problem: most root processes tend to avoid it because > > > > it is synthetic and thus uninteresting. Maybe we should extend > > > > the idea of "synthetic file-systems being uninteresting" to any > > > > process which cannot receive signals from the FUSE mount owner. When > > > > one cannot hide data by a FUSE mount and its synthetic anyway so not > > > > interesting then just show the original empty mount point. > > > > > > Been there. People (like Al Viro) didn't like it. > > > > including changing the ptraceability test by a signal test and including > > the (IMHO) required emptyness of the mount stub? 
> > It's been thrown out for the reason, that it's unacceptable if suid > programs see a different namespace as non-suid. You mean root versus non-root. or user versus other user I assume. Because the euid (fsuid) is what matters. But then: this _is_ already the case for NFS when squash_root is in effect (what about kerberos et.al?). So there are several reasons to consider FUSE a nonlocal fs instead of a local one so nothing new there. FUSE could be used to implement a usable (not perfect) userspace NFS/ftp client. To require an empty stub to mount FUSE upon makes the whole picture cleaner: users are only able to extend the namespace _leaf_ nodes for themselves and processes they can send signals to: setuid programs which do not fully become root. The existing namespace [nodes] remains unchanged for everyone. -- Frank ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: FUSE merging? 2005-07-01 13:05 ` Frank van Maarseveen @ 2005-07-01 13:21 ` Miklos Szeredi 2005-07-01 15:20 ` Frank van Maarseveen 0 siblings, 1 reply; 78+ messages in thread From: Miklos Szeredi @ 2005-07-01 13:21 UTC (permalink / raw) To: frankvm; +Cc: akpm, aia21, arjan, linux-kernel > > OK, open is out. However other operations (stat, unlink, chmod etc) > > are always without side effects on "normal" filesystems. However on > > FUSE they are very much unsafe (can block, not do what was instructed > > and return success, etc). > > What about tricking a setuid program to stat into /auto (/mnt/auto, > /misc, whatever it is called)? then the automounter will act upon a root > request with again possibly unwanted side effects. See how careful a > setuid/full-root program must be in handling userdata including pathnames? I don't see why /auto is special. It's basically a userspace filesystem too, but that's not what is special about FUSE. It's the fact that it's a userspace filesystem controlled by an _ordinary user_. > FUSE suddenly makes this more obvious but it is not new. I believe it _is_ something new. If it were not, then your arguments would be bulletproof. As it is, I think you miss the point that the side effect is actually in the hands of the user invoking the suid program, instead of something external. > > > including changing the ptraceability test by a signal test and including > > > the (IMHO) required emptyness of the mount stub? > > > > It's been thrown out for the reason, that it's unacceptable if suid > > programs see a different namespace as non-suid. > > You mean root versus non-root. or user versus other user I assume. Because > the euid (fsuid) is what matters. Yes. > But then: this _is_ already the case for NFS when squash_root is in effect > (what about kerberos et.al?). So there are several reasons to consider > FUSE a nonlocal fs instead of a local one so nothing new there.
FUSE could > be used to implement a usable (not perfect) userspace NFS/ftp client. Yes. In fact even if the check were left out of the kernel, the userspace filesystem could still return different data/error based on fsuid/fsgid/pid. So what's so controversial about it? I really fail to understand... > To require an empty stub to mount FUSE upon makes the whole picture > cleaner: users are only able to extend the namespace _leaf_ nodes for > themselves and processes they can send signals to: setuid programs > which do not fully become root. The existing namespace [nodes] remains > unchanged for everyone. It's not as simple. A filesystem can be mounted many times (either with mount --bind, or just by mounting the same device on multiple mountpoints). In this case you can't ensure, that a mountpoint will remain a leaf node after being mounted on. Miklos ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: FUSE merging? 2005-07-01 13:21 ` Miklos Szeredi @ 2005-07-01 15:20 ` Frank van Maarseveen 2005-07-01 17:04 ` Miklos Szeredi 0 siblings, 1 reply; 78+ messages in thread From: Frank van Maarseveen @ 2005-07-01 15:20 UTC (permalink / raw) To: Miklos Szeredi; +Cc: frankvm, akpm, aia21, arjan, linux-kernel On Fri, Jul 01, 2005 at 03:21:59PM +0200, Miklos Szeredi wrote: > > > To require an empty stub to mount FUSE upon makes the whole picture > > cleaner: users are only able to extend the namespace _leaf_ nodes for > > themselves and processes they can send signals to: setuid programs > > which do not fully become root. The existing namespace [nodes] remains > > unchanged for everyone. > > It's not as simple. A filesystem can be mounted many times (either > with mount --bind, or just by mounting the same device on multiple > mountpoints). In this case you can't ensure, that a mountpoint will > remain a leaf node after being mounted on. I have bind-mounted / on /net/blabla I tried two experiments: mounting something under / and looking for it under /net/blabla mounting something under /net/blabla and looking for it under / The experiment was done with bind mounts and by mounting a USB stick (/dev/sdb1) and there was no auto propagation of mounts. (2.6.12-rc6) How can a leaf dir suddenly become non-leaf by a mount without an explicit mount command? -- Frank ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: FUSE merging? 2005-07-01 15:20 ` Frank van Maarseveen @ 2005-07-01 17:04 ` Miklos Szeredi 2005-07-01 18:04 ` Frank van Maarseveen 0 siblings, 1 reply; 78+ messages in thread From: Miklos Szeredi @ 2005-07-01 17:04 UTC (permalink / raw) To: frankvm; +Cc: akpm, aia21, arjan, linux-kernel > > It's not as simple. A filesystem can be mounted many times (either > > with mount --bind, or just by mounting the same device on multiple > > mountpoints). In this case you can't ensure, that a mountpoint will > > remain a leaf node after being mounted on. > > I have bind-mounted / on /net/blabla > I tried two experiments: > > mounting something under / and looking for it under /net/blabla > mounting something under /net/blabla and looking for it under / > > The experiment was done with bind mounts and by mounting a USB stick > (/dev/sdb1) and there was no auto propagation of mounts. I'm not talking about auto propagation (that's only now being implemented by Ram Pai, and is not in stock kernels). What I'm saying is that mounting something over a leaf node, does not guarantee, that it will remain a leaf node after it's been mounted on. For example:

  mkdir /tmp/leafdir
  mkdir /tmp/rootcopy
  mount --bind / /tmp/rootcopy
  mount /dev/sdb1 /tmp/leafdir
  mkdir /tmp/rootcopy/tmp/leafdir/child

Now 'leafdir' is no longer a leaf. I'm not saying this is a problem, but also I don't see any overwhelming reason to not allow user mounts over non-leaf directories. Miklos ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: FUSE merging? 2005-07-01 17:04 ` Miklos Szeredi @ 2005-07-01 18:04 ` Frank van Maarseveen 2005-07-01 19:35 ` Jeremy Maitin-Shepard 2005-07-02 14:49 ` Miklos Szeredi 0 siblings, 2 replies; 78+ messages in thread From: Frank van Maarseveen @ 2005-07-01 18:04 UTC (permalink / raw) To: Miklos Szeredi; +Cc: frankvm, akpm, aia21, arjan, linux-kernel On Fri, Jul 01, 2005 at 07:04:50PM +0200, Miklos Szeredi wrote: > I'm not saying this is a problem, but also I don't see any > overwhelming reason to not allow user mounts over non-leaf > directories. All things considered I'd still prefer forbidding FUSE mounts on non-leaf dirs. For name space sanity. And it may be easier to get the whole thing accepted: - One could argue that the existing name space is extended rather than changed [for a subset of processes], what Al Viro seems to reject. - The processes which cannot be ptraced/sent a signal by the mount owner are not "forced" to traverse the FUSE mount for the sake of name space invariancy, with all associated security problems: they can see everything up to the leaf node of all the usual mounts. But put otherwise: is there a compelling reason to permit FUSE mounts on non-leaf nodes? Can FUSE mount on a file like NFS? What is your opinion about replacing the ptrace check by a signal check (later on, no hurry)? -- Frank ^ permalink raw reply [flat|nested] 78+ messages in thread
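[Editorial illustration, not part of the original message.] The "leaf node" constraint proposed above amounts to checking, at mount time, that the target directory is empty. A minimal sketch of such a check follows; the function name `is_leaf_dir` is hypothetical, and nothing in the thread says that fusermount or the kernel performs this test.

```python
import os


def is_leaf_dir(path):
    """Return True if `path` is a directory with no entries in it.

    This is a point-in-time check only: it says nothing about whether
    the directory stays empty after something has been mounted on it.
    """
    return os.path.isdir(path) and not os.listdir(path)
```

Because the answer is only valid at the instant of the call, a check like this cannot by itself guarantee that the directory underneath a mount stays a leaf, which is the objection raised earlier in the thread with the bind-mount example.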
* Re: FUSE merging? 2005-07-01 18:04 ` Frank van Maarseveen @ 2005-07-01 19:35 ` Jeremy Maitin-Shepard 2005-07-02 14:49 ` Miklos Szeredi 1 sibling, 0 replies; 78+ messages in thread From: Jeremy Maitin-Shepard @ 2005-07-01 19:35 UTC (permalink / raw) To: linux-kernel Frank van Maarseveen <frankvm@frankvm.com> writes: [snip] > But put otherwise: is there a compelling reason to permit FUSE mounts on > non-leaf nodes? In my own use of FUSE, I have found it handy to stick mount scripts in some of the directories that I use as FUSE mount points. -- Jeremy Maitin-Shepard ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: FUSE merging? 2005-07-01 18:04 ` Frank van Maarseveen 2005-07-01 19:35 ` Jeremy Maitin-Shepard @ 2005-07-02 14:49 ` Miklos Szeredi 2005-07-02 16:00 ` Frank van Maarseveen 1 sibling, 1 reply; 78+ messages in thread From: Miklos Szeredi @ 2005-07-02 14:49 UTC (permalink / raw) To: frankvm; +Cc: akpm, aia21, arjan, linux-kernel > > I'm not saying this is a problem, but also I don't see any > > overwhelming reason to not allow user mounts over non-leaf > > directories. > > All things considered I'd still prefer forbidding FUSE mounts on non-leaf > dirs. For name space sanity. And it may be easier to get the whole thing > accepted: > > - One could argue that the existing name space is extended rather than > changed [for a subset of processes], what Al Viro seems to reject. > - The processes which cannot be ptraced/sent a signal by the mount > owner are not "forced" to traverse the FUSE mount for the sake of > name space invariancy, with all associated security problems: they > can see everything up to the leaf node of all the usual mounts. > > But put otherwise: is there a compelling reason to permit FUSE mounts on > non-leaf nodes? Not really. Maybe it does have some uses, but I'm not aware of any. But I don't think it would matter in the acceptance of the mount hiding patch, since that patch was not rejected on the basis of what FUSE would use it for, rather for the general philosophy of not allowing namespace differences based on user id. > Can FUSE mount on a file like NFS? Yes. > What is your opinion about replacing the ptrace check by a signal check > (later on, no hurry)? Maybe. You'd still have to convince me, that signals sent to suid programs are not a security problem. Miklos ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: FUSE merging? 2005-07-02 14:49 ` Miklos Szeredi @ 2005-07-02 16:00 ` Frank van Maarseveen 2005-07-03 6:16 ` Miklos Szeredi 0 siblings, 1 reply; 78+ messages in thread From: Frank van Maarseveen @ 2005-07-02 16:00 UTC (permalink / raw) To: Miklos Szeredi; +Cc: frankvm, akpm, aia21, arjan, linux-kernel On Sat, Jul 02, 2005 at 04:49:24PM +0200, Miklos Szeredi wrote: > > > > All things considered I'd still prefer forbidding FUSE mounts on non-leaf > > dirs. For name space sanity. And it may be easier to get the whole thing > > accepted: > > > > But I don't think it would matter in the acceptance of the mount > hiding patch, since that patch was not rejected on the basis of what > FUSE would use it for, rather for the general philosophy of not > allowing namespace differences based on user id. That would really be a loss. After some thinking, the whole "not allowing namespace differences based on user id" philosophy is unenforcable and not even true sometimes nowadays. Think NFS: have a look at the unfsd server, you'll be surprised what it can do. Think any other networked file system exported by a machine with an unusual disk file-system underneath. IIRC ncpfs does this on the server based on access and thus based on uid. (hmm, I _hated_ it seeing empty directories only because I had no access to anything below. Based on that I'd prefer EACCES instead of seeing an empty mount stub when FUSE denies access to root or any other user.) The thing is, root rules the _local_ part of the name space. So it should make a _huge_ difference if FUSE can fiddle with that or only with what's below the leaf nodes. > > What is your opinion about replacing the ptrace check by a signal check > > (later on, no hurry)? > > Maybe. You'd still have to convince me, that signals sent to suid > programs are not a security problem. google kill(2): http://www.opengroup.org/onlinepubs/007908799/xsh/kill.html It is _defined_ behavior. 
So, it is up to the quality of the programmer whether or not it results in a security problem ;-) -- Frank ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: FUSE merging? 2005-07-02 16:00 ` Frank van Maarseveen @ 2005-07-03 6:16 ` Miklos Szeredi 2005-07-03 11:25 ` Frank van Maarseveen 0 siblings, 1 reply; 78+ messages in thread From: Miklos Szeredi @ 2005-07-03 6:16 UTC (permalink / raw) To: frankvm; +Cc: akpm, aia21, arjan, linux-kernel > After some thinking, the whole "not allowing namespace differences > based on user id" philosophy is unenforcable and not even true sometimes > nowadays. Think NFS: have a look at the unfsd server, you'll be surprised > what it can do. Think any other networked file system exported by a > machine with an unusual disk file-system underneath. IIRC ncpfs does > this on the server based on access and thus based on uid. Hmm, do you mean returning different directory contents based on uid? > (hmm, I _hated_ it seeing empty directories only because I had no access > to anything below. Based on that I'd prefer EACCES instead of seeing an > empty mount stub when FUSE denies access to root or any other user.) Well, it works that way currently, and there doesn't seem to be any real problem with it. > The thing is, root rules the _local_ part of the name space. So it should > make a _huge_ difference if FUSE can fiddle with that or only with what's > below the leaf nodes. I don't really understand what you mean by "local". The problem with this leaf node philosophy, is that it's not really consistent. You can ensure that a mountpoint is a leaf node at mount time, but you can force it to remain a leaf node after the mount. So I don't see why this check at mount time would make _any_ difference. > > > What is your opinion about replacing the ptrace check by a signal check > > > (later on, no hurry)? > > > > Maybe. You'd still have to convince me, that signals sent to suid > > programs are not a security problem. > > google kill(2): > > http://www.opengroup.org/onlinepubs/007908799/xsh/kill.html > > It is _defined_ behavior. 
So, it is up to the quality of the programmer > whether or not it results in a security problem ;-) Ahh, right. The info leak argument still holds, but it's pretty weak. So if the current behavior causes a problem for somebody, and relaxing the check from ptraceability to killability fixes it, then I'll consider doing it. Until then, let's keep the more secure check. Miklos ^ permalink raw reply [flat|nested] 78+ messages in thread
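[Editorial illustration, not part of the original message.] The difference between the stricter check and the proposed "killability" check can be modelled in userspace. This is a sketch under assumptions: `Creds`, `strict_owner_match` and `may_signal` are hypothetical names; `strict_owner_match` is only a simplified stand-in for a ptraceability test (the real kernel test also involves group IDs and other state); `may_signal` follows the POSIX kill() rule that the sender's real or effective user ID must match the receiver's real or saved set-user-ID, with root modelled simply as euid 0.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Creds:
    uid: int   # real user ID
    euid: int  # effective user ID
    suid: int  # saved set-user-ID


def strict_owner_match(owner_uid, target):
    """Simplified stand-in for the strict check: every user ID of the
    target process must equal the mount owner's user ID."""
    return (target.uid == owner_uid
            and target.euid == owner_uid
            and target.suid == owner_uid)


def may_signal(sender, target):
    """POSIX kill() permission rule, with root modelled as euid 0."""
    if sender.euid == 0:
        return True
    return (sender.uid in (target.uid, target.suid)
            or sender.euid in (target.uid, target.suid))


# Hypothetical processes: the mount owner, a setuid-root program started
# by the owner, and a process that has fully become root.
owner = Creds(uid=1000, euid=1000, suid=1000)
suid_program = Creds(uid=1000, euid=0, suid=0)
full_root = Creds(uid=0, euid=0, suid=0)
```

In this model the setuid program started by the owner fails the strict test but passes the signal test, while a process that has fully become root fails both. That is the class of "setuid programs which do not fully become root" the relaxation would let through.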
* Re: FUSE merging? 2005-07-03 6:16 ` Miklos Szeredi @ 2005-07-03 11:25 ` Frank van Maarseveen 2005-07-03 13:24 ` Miklos Szeredi 0 siblings, 1 reply; 78+ messages in thread From: Frank van Maarseveen @ 2005-07-03 11:25 UTC (permalink / raw) To: Miklos Szeredi; +Cc: frankvm, akpm, aia21, arjan, linux-kernel On Sun, Jul 03, 2005 at 08:16:37AM +0200, Miklos Szeredi wrote: > > After some thinking, the whole "not allowing namespace differences > > based on user id" philosophy is unenforcable and not even true sometimes > > nowadays. Think NFS: have a look at the unfsd server, you'll be surprised > > what it can do. Think any other networked file system exported by a > > machine with an unusual disk file-system underneath. IIRC ncpfs does > > this on the server based on access and thus based on uid. > > Hmm, do you mean returning different directory contents based on uid? http://clusternfs.sourceforge.net Don't ask me how this plays with the dcache. > > The thing is, root rules the _local_ part of the name space. So it should > > make a _huge_ difference if FUSE can fiddle with that or only with what's > > below the leaf nodes. > > I don't really understand what you mean by "local". The opposite of "local" is "remote", i.e. networked filesystems: mount foo:/bar /usr/src/bar /, /usr and /usr/src are stored on a local disk. /usr/src/bar/* is not. Namespace invariance can be guaranteed for the "/usr/src" part. Not for anything below unless you control the peer. > > The problem with this leaf node philosophy, is that it's not really > consistent. You can ensure that a mountpoint is a leaf node at mount > time, but you cannot force it to remain a leaf node after the mount. So ^^^ inserted by me ok, I just remembered that any process with an open directory handle could still fchdir() underneath. I think the leaf node enforcing is possible but it is indeed a bit more complicated. 
(Hmm, it's a bit bizarre but could you mount FUSE on, for example, a named pipe and change it into a directory?) > I don't see why this check at mount time would make _any_ difference. It should be possible to do audits on local filesystems, e.g. by: find / /home /var -xdev .... This can be done as root but sometimes you may want to do this with the uid/gid of a specific user, for safety or for checking what the user actually can access or damage. And that won't work as expected when the user places a FUSE mount on top of his own login directory. But I don't think leaf node enforcing is required from a security point of view. This is the only thing I could come up with. IMHO The namespace argument against FUSE is weak for multiple reasons. The only variancy I see is when crossing the mount point. And that disappears once EACCES is returned when non-ptraceable processes try to cross it. But that's not really acceptable (see previous audit case) unless FUSE refuses to mount on non-leaf dirs. -- Frank ^ permalink raw reply [flat|nested] 78+ messages in thread
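[Editorial illustration, not part of the original message.] The audit scenario above (a `find -xdev` style walk that stays on one filesystem) can be sketched as follows, to show where a mount point that answers EACCES would surface. This is a minimal sketch: `audit_tree` is a hypothetical name, and the behaviour of a real FUSE mount is not simulated here, only the handling of "different device" and "permission denied".

```python
import os


def audit_tree(root):
    """Walk `root` without crossing onto other filesystems.

    Returns three lists: directories that were read, directories skipped
    because they live on a different device (mount points), and paths
    where access was denied.
    """
    root_dev = os.lstat(root).st_dev
    visited, skipped_mounts, denied = [], [], []
    stack = [root]
    while stack:
        current = stack.pop()
        try:
            entries = list(os.scandir(current))
        except PermissionError:
            denied.append(current)
            continue
        visited.append(current)
        for entry in entries:
            try:
                if not entry.is_dir(follow_symlinks=False):
                    continue
                dev = entry.stat(follow_symlinks=False).st_dev
            except PermissionError:
                # A mount point that refuses stat() ends up here.
                denied.append(entry.path)
                continue
            if dev != root_dev:
                skipped_mounts.append(entry.path)
            else:
                stack.append(entry.path)
    return visited, skipped_mounts, denied
```

Anything that lies underneath a path reported in `denied` is never seen by the walk, which is the concern about mounts placed on top of populated directories.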
* Re: FUSE merging? 2005-07-03 11:25 ` Frank van Maarseveen @ 2005-07-03 13:24 ` Miklos Szeredi 2005-07-03 13:50 ` Frank van Maarseveen 2005-07-03 14:10 ` FUSE merging? (2) Frank van Maarseveen 0 siblings, 2 replies; 78+ messages in thread From: Miklos Szeredi @ 2005-07-03 13:24 UTC (permalink / raw) To: frankvm; +Cc: akpm, aia21, arjan, linux-kernel > > Hmm, do you mean returning different directory contents based on uid? > > http://clusternfs.sourceforge.net > > Don't ask me how this plays with the dcache. But here the decision on what to return is in the _server_. There's nothing magic about that. It's as if it was N different servers for N different clients, only more effective. > The opposite of "local" is "remote", i.e. networked filesystems: > > mount foo:/bar /usr/src/bar > > /, /usr and /usr/src are stored on a local disk. /usr/src/bar/* is not. > Namespace invariance can be guaranteed for the "/usr/src" part. Not for > anything below unless you control the peer. I think what you call namespace invariance is basically true for all existing filesystems. There could be a filesystem which returns different directory contents based on whatever it wants, but it can't return a different "dentry" for the same name. So file/directory _content_ can be made to vary, but the namespace itself can't. > > > > The problem with this leaf node philosophy, is that it's not really > > consistent. You can ensure that a mountpoint is a leaf node at mount > > time, but you cannot force it to remain a leaf node after the mount. So > ^^^ > inserted by me [well corrected :)] > > ok, I just remembered that any process with an open directory handle > could still fchdir() underneath. I think the leaf node enforcing is > possible but it is indeed a bit more complicated. > > (Hmm, it's a bit bizarre but could you mount FUSE on, for example, a > named pipe and change it into a directory?) No. 
Fusermount checks file type and refuses the mount if there's a mismatch (and it protects against races by mounting on '.' for directories, and on '/proc/self/fd/X' for regular files). > > I don't see why this check at mount time would make _any_ difference. > > It should be possible to do audits on local filesystems, e.g. by: > > find / /home /var -xdev .... > > This can be done as root but sometimes you may want to do this with the > uid/gid of a specific user, for safety or for checking what the user > actually can access or damage. But note, that running with the uid/gid of the user exposes the auditing script to manipulation (kill, ptrace) by the user. Running with changed fsuid/fsgid is OK though. > And that won't work as expected when the user places a FUSE mount on > top of his own login directory. But I don't think leaf node > enforcing is required from a security point of view. This is the > only thing I could come up with. OK, from the auditing POV, there's a slight hole in unprivileged mounts. But I don't think this is grave, since it's not so hard to hide any sensitive data from such scripts anyway (keeping data in memory, or keeping a file descriptor to an unlinked file, etc). > IMHO The namespace argument against FUSE is weak for multiple reasons. The > only variancy I see is when crossing the mount point. And that disappears > once EACCES is returned when non-ptraceable processes try to cross it. Yes, but still this is just a difference in permission, and not a difference in namespace. > But that's not really acceptable (see previous audit case) unless FUSE > refuses to mount on non-leaf dirs. I don't think the audit case is important. It's easy to work around it manually by the sysadmin, and for the automatic case it doesn't really matter (as detailed above). Miklos ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: FUSE merging? 2005-07-03 13:24 ` Miklos Szeredi @ 2005-07-03 13:50 ` Frank van Maarseveen 2005-07-03 14:03 ` Miklos Szeredi 2005-07-03 14:10 ` FUSE merging? (2) Frank van Maarseveen 1 sibling, 1 reply; 78+ messages in thread From: Frank van Maarseveen @ 2005-07-03 13:50 UTC (permalink / raw) To: Miklos Szeredi; +Cc: frankvm, akpm, aia21, arjan, linux-kernel On Sun, Jul 03, 2005 at 03:24:04PM +0200, Miklos Szeredi wrote: > > > Hmm, do you mean returning different directory contents based on uid? > > > > http://clusternfs.sourceforge.net > > > > Don't ask me how this plays with the dcache. > > But here the decision on what to return is in the _server_. It still means that name space invariancy cannot be guaranteed. > There's > nothing magic about that. It's as if it was N different servers for N > different clients, only more effective. Not entirely, there is a UID dependancy. > I think what you call namespace invariance is basically true for all > existing filesystems. There could be a filesystem which returns > different directory contents based on whatever it wants, but it can't > return a different "dentry" for the same name. This is not what I mean. The directory contents itself must be identical for every user. And every name must of course correspond with only one dentry. That's name-space invariance IMO. > > IMHO The namespace argument against FUSE is weak for multiple reasons. The > > only variancy I see is when crossing the mount point. And that disappears > > once EACCES is returned when non-ptraceable processes try to cross it. > > Yes, but still this is just a difference in permission, and not a > difference in namespace. Exactly. And such a difference in permission already exists for (sane) networked file systems such as NFS with "squash_root" in effect on the server. -- Frank ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: FUSE merging? 2005-07-03 13:50 ` Frank van Maarseveen @ 2005-07-03 14:03 ` Miklos Szeredi 0 siblings, 0 replies; 78+ messages in thread From: Miklos Szeredi @ 2005-07-03 14:03 UTC (permalink / raw) To: frankvm; +Cc: akpm, aia21, arjan, linux-kernel > > There's > > nothing magic about that. It's as if it was N different servers for N > > different clients, only more effective. > > Not entirely, there is a UID dependancy. Ahh, so there is. Does it actually work? I doubt it. The VFS won't allow two different dentries to refer to the same name. And without that, how would you have several inodes for a single name? > > I think what you call namespace invariance is basically true for all > > existing filesystems. There could be a filesystem which returns > > different directory contents based on whatever it wants, but it can't > > return a different "dentry" for the same name. > > This is not what I mean. The directory contents itself must be identical > for every user. And every name must of course correspond with only one > dentry. That's name-space invariance IMO. OK. > > > IMHO The namespace argument against FUSE is weak for multiple > > > reasons. The only variancy I see is when crossing the mount > > > point. And that disappears once EACCES is returned when > > > non-ptraceable processes try to cross it. > > > > Yes, but still this is just a difference in permission, and not a > > difference in namespace. > > Exactly. And such a difference in permission already exists for (sane) > networked file systems such as NFS with "squash_root" in effect on > the server. Agreed. Miklos ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: FUSE merging? (2) 2005-07-03 13:24 ` Miklos Szeredi 2005-07-03 13:50 ` Frank van Maarseveen @ 2005-07-03 14:10 ` Frank van Maarseveen 2005-07-03 15:47 ` Miklos Szeredi 1 sibling, 1 reply; 78+ messages in thread From: Frank van Maarseveen @ 2005-07-03 14:10 UTC (permalink / raw) To: Miklos Szeredi; +Cc: frankvm, akpm, aia21, arjan, linux-kernel On Sun, Jul 03, 2005 at 03:24:04PM +0200, Miklos Szeredi wrote: > > > But that's not really acceptable (see previous audit case) unless FUSE > > refuses to mount on non-leaf dirs. > > I don't think the audit case is important. It's easy to work around > it manually by the sysadmin, and for the automatic case it doesn't > really matter (as detailed above). Note that the audit case "as user" is less important than the root case. I consider the latter very important and EACCES will break it when FUSE permits mounting on non-leaf dirs. -- Frank ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: FUSE merging? (2) 2005-07-03 14:10 ` FUSE merging? (2) Frank van Maarseveen @ 2005-07-03 15:47 ` Miklos Szeredi 2005-07-03 19:36 ` Frank van Maarseveen 0 siblings, 1 reply; 78+ messages in thread From: Miklos Szeredi @ 2005-07-03 15:47 UTC (permalink / raw) To: frankvm; +Cc: akpm, aia21, arjan, linux-kernel > > > But that's not really acceptable (see previous audit case) unless FUSE > > > refuses to mount on non-leaf dirs. > > > > I don't think the audit case is important. It's easy to work around > > it manually by the sysadmin, and for the automatic case it doesn't > > really matter (as detailed above). > > Note that the audit case "as user" is less important than the root case. I > consider the latter very important and EACCES will break it when FUSE > permits mounting on non-leaf dirs. OK. Can you tell me, why you consider it important? And what's your proposal for dealing with it? Refusing to mount on non-leaf dir is not a solution, since it would still allow arbitrary hiding. Miklos ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: FUSE merging? (2) 2005-07-03 15:47 ` Miklos Szeredi @ 2005-07-03 19:36 ` Frank van Maarseveen 2005-07-04 8:56 ` Miklos Szeredi 0 siblings, 1 reply; 78+ messages in thread From: Frank van Maarseveen @ 2005-07-03 19:36 UTC (permalink / raw) To: Miklos Szeredi; +Cc: frankvm, akpm, aia21, arjan, linux-kernel On Sun, Jul 03, 2005 at 05:47:58PM +0200, Miklos Szeredi wrote: > > > > But that's not really acceptable (see previous audit case) unless FUSE > > > > refuses to mount on non-leaf dirs. > > > > > > I don't think the audit case is important. It's easy to work around > > > it manually by the sysadmin, and for the automatic case it doesn't > > > really matter (as detailed above). > > > > Note that the audit case "as user" is less important than the root case. I > > consider the latter very important and EACCES will break it when FUSE > > permits mounting on non-leaf dirs. > > OK. Can you tell me, why you consider it important? And what's your > proposal for dealing with it? It is important because on UNIX, "root" rules on local filesystems. I don't like the idea of root not being able to run "find -xdev" anymore for administrative tasks, just because something got hidden by accident or just for fun by a user. It's not about malicious users who want to hide data: they can do that in tons of ways. The simple "find -xdev" by root should just not be affected unless there is a very good reason (SELinux or other "hardened" solutions). IMHO the best thing FUSE could do is to make the mount totally invisible: don't return EACCES, don't follow the FUSE mount but stay on the original tree. I think it's either this or returning EACCES plus the leaf node constraint at mount time. The name-space variancy introduced by the first option is only minor: mounting anything over a tree which is still in use by a process is much worse because it tends to be disruptive. And that has always been possible.
[And I would use the kill() equivalence instead of ptrace() because it is more appropriate. Doing so avoids the risk of accidentally breaking useful setuid programs - I don't know if that will happen but I don't see any security issues here.] -- Frank ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: FUSE merging? (2) 2005-07-03 19:36 ` Frank van Maarseveen @ 2005-07-04 8:56 ` Miklos Szeredi 2005-07-04 9:59 ` Frank van Maarseveen 0 siblings, 1 reply; 78+ messages in thread From: Miklos Szeredi @ 2005-07-04 8:56 UTC (permalink / raw) To: frankvm; +Cc: akpm, aia21, arjan, linux-kernel > It is important because on UNIX, "root" rules on local filesystems. > I don't like the idea of root not being able to run "find -xdev" > anymore for administrative tasks, just because something got hidden > by accident or just for fun by a user. It's not about malicious > users who want to hide data: they can do that in tons of ways. That's a sort of security by obscurity: if the user is dumb enough he cannot do any harm. But I'm not interested in that sort of thing. If this issue is important, then it should be solved properly, and not just by "preventing accidents". > IMHO The best thing FUSE could do is to make the mount totally > invisible: don't return EACCES, don't follow the FUSE mount but stay > on the original tree. I think it's either this or returning EACCES > plus the leaf node constraint at mount time. The leaf node constraint doesn't make sense. The hidden mount thing does, but it has been very flatly rejected by Al Viro. There's a nice solution to this (discussed at length earlier): private namespaces. I think we are still confusing these two issues, which are in fact separate. 1) polluting global namespace is bad (find -xdev issue) 2) not ptraceable (or not killable) processes should not be able to access an unprivileged mount For 1) private namespaces are the proper solution. For 2) the fuse_allow_task() in its current or modified form (to check killability) should be OK. 1) is completely orthogonal to FUSE. 2) is currently provably secure, and doesn't seem to cause problems in practice. Do you have a concrete example, where it would cause problems? Miklos ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: FUSE merging? (2) 2005-07-04 8:56 ` Miklos Szeredi @ 2005-07-04 9:59 ` Frank van Maarseveen 2005-07-04 10:27 ` Miklos Szeredi 0 siblings, 1 reply; 78+ messages in thread From: Frank van Maarseveen @ 2005-07-04 9:59 UTC (permalink / raw) To: Miklos Szeredi; +Cc: frankvm, akpm, aia21, arjan, linux-kernel On Mon, Jul 04, 2005 at 10:56:30AM +0200, Miklos Szeredi wrote: > > It is important because on UNIX, "root" rules on local filesystems. > > I don't like the idea of root not being able to run "find -xdev" > > anymore for administrative tasks, just because something got hidden > > by accident or just for fun by a user. It's not about malicious > > users who want to hide data: they can do that in tons of ways. > > That's a sort of security by obscurity: if the user is dumb enough he > cannot do any harm. But I'm not interested in that sort of thing. If > this issue is important, then it should be solved properly, and not just > by "preventing accidents". "solving it properly" refers to hardening the leaf node constraint against circumvention I assume. Suppose there's a script for doing simple on-line backups using "find". Now explain to the user why he lost his data due to a backup script getting EACCES on a non-leaf FUSE mount. I don't think that's acceptable. On the other hand, when the user stored something _deliberately_ under a mountpoint, circumventing the leaf node constraint by some trickery, then it is clearly his own fault when the data is lost. Anyway, a leaf node constraint can be hardened against misuse later on, should it become necessary. Your bind-mount case to circumvent this restriction is slightly flawed because it requires root interaction. > > There's a nice solution to this (discussed at length earlier): private > namespaces. I thought that was rejected because a process doesn't automatically get the right namespace after rsh into such a machine? And fixing it by adjusting the name-space of a process (by whatever means) is not transparent. 
> I think we are still confusing these two issues, which are in fact > separate. > > 1) polluting global namespace is bad (find -xdev issue) > > 2) not ptraceable (or not killable) processes should not be able to > access an unprivileged mount > > For 1) private namespaces are the proper solution. For 2) the > fuse_allow_task() in its current or modified form (to check > killability) should be OK. > > 1) is completely orthogonal to FUSE. 2) is currently provably secure, > and doesn't seem to cause problems in practice. Do you have a concrete > example, where it would cause problems? See above backup scenario. Issues (1) and (2) are tied together I'm afraid: When using a private name-space and thus assuming an unrelated process needs to do something very special to get that name-space then (2) would not be needed at all. On the other hand, name-space inheritance by setuid processes suddenly becomes an issue: issue (2) is re-appearing but at another place. -- Frank ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: FUSE merging? (2) 2005-07-04 9:59 ` Frank van Maarseveen @ 2005-07-04 10:27 ` Miklos Szeredi 2005-07-04 11:26 ` Frank van Maarseveen 0 siblings, 1 reply; 78+ messages in thread From: Miklos Szeredi @ 2005-07-04 10:27 UTC (permalink / raw) To: frankvm; +Cc: miklos, frankvm, akpm, aia21, arjan, linux-kernel > "solving it properly" refers to hardening the leaf node constraint > against circumvention I assume. Suppose there's a script for doing simple > on-line backups using "find". Now explain to the user why he lost his > data due to a backup script geting EACCES on a non-leaf FUSE mount. I see your point. But then this is really not a security issue, but an "are you sure you want to format C:" style protection for the user's own sake. Adding a mount option (checked by the library) for this would be fine. E.g. with "mount_nonempty" it would not refuse to mount on a non-leaf dir, and README would document, that using this option might cause trouble. Otherwise the mount would be refused with a reference to the above option. Is that what you were thinking? > > There's a nice solution to this (discussed at length earlier): private > > namespaces. > > I thought that's rejected because a process doesn't automatically get the > right namespace after rsh into such a machine? And fixing it by adjusting > the name-space of a process (by whatever means) is not transparent. Private namespaces in their current form are not really useful. But that's irrelevant to the current discussion. If somebody needs private namespaces they will have to add the missing features (Ram Pai is working on shared subtrees, the biggest chunk). > > I think we are still confusing these two issues, which are in fact > > separate. > > > > 1) polluting global namespace is bad (find -xdev issue) > > > > 2) not ptraceable (or not killable) processes should not be able to > > access an unprivileged mount > > > > For 1) private namespaces are the proper solution. 
For 2) the > > fuse_allow_task() in its current or modified form (to check > > killability) should be OK. > > > > 1) is completely orthogonal to FUSE. 2) is currently provably secure, > > and doesn't seem to cause problems in practice. Do you have a concrete > > example, where it would cause problems? > > See above backup scenario. The backup problem is a consequence of 1). It has absolutely zero to do with 2). If the fuse_allow_task() security check didn't exist the backup script would still not work. > Issues (1) and (2) are tied together I'm afraid: > > When using a private name-space and thus assuming an unrelated process > needs to do something very special to get that name-space then (2) > would not be needed at all. Wrong. It's still needed, because suid/sgid programs can - run under the private namespace without doing anything special - run with extra privileges, not possessed by the user executing the program > On the other hand, Name-space inheritance by setuid processes suddenly > becomes an issue: issue (2) is re-appearing but at another place. I don't think you could change the rules of namespace inheritance without causing trouble. Miklos ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: FUSE merging? (2) 2005-07-04 10:27 ` Miklos Szeredi @ 2005-07-04 11:26 ` Frank van Maarseveen 0 siblings, 0 replies; 78+ messages in thread From: Frank van Maarseveen @ 2005-07-04 11:26 UTC (permalink / raw) To: Miklos Szeredi; +Cc: frankvm, akpm, aia21, arjan, linux-kernel On Mon, Jul 04, 2005 at 12:27:13PM +0200, Miklos Szeredi wrote: > E.g. with "mount_nonempty" it would not refuse to > mount on a non-leaf dir, and README would document, that using this > option might cause trouble. Otherwise the mount would be refused with > a reference to the above option. that will do. -- Frank ^ permalink raw reply [flat|nested] 78+ messages in thread
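[Editorial sketch, not part of the original thread.] The "mount_nonempty" behaviour agreed on above could be checked on the library side roughly as follows. This is a minimal sketch under stated assumptions: the helper names dir_is_empty() and check_mountpoint() are hypothetical, and only the option name "mount_nonempty" comes from the discussion. It treats a directory as a leaf when it holds nothing besides "." and "..".

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Returns 1 if the directory holds nothing besides "." and "..",
 * 0 if it holds at least one other entry, -1 if it cannot be opened. */
int dir_is_empty(const char *path)
{
	DIR *dir;
	struct dirent *ent;
	int empty = 1;

	assert(path != NULL);
	dir = opendir(path);
	if (dir == NULL)
		return -1;
	while ((ent = readdir(dir)) != NULL) {
		if (strcmp(ent->d_name, ".") != 0 &&
		    strcmp(ent->d_name, "..") != 0) {
			empty = 0;
			break;
		}
	}
	closedir(dir);
	return empty;
}

/* Hypothetical library-side check: refuse a non-leaf mountpoint unless
 * the caller asked for the (assumed) "mount_nonempty" option.
 * Returns 0 if the mount may proceed, -1 if it must be refused. */
int check_mountpoint(const char *path, int mount_nonempty)
{
	struct stat st;
	int empty;

	if (stat(path, &st) == -1 || !S_ISDIR(st.st_mode)) {
		fprintf(stderr, "%s: not a usable directory\n", path);
		return -1;
	}
	if (mount_nonempty)
		return 0;
	empty = dir_is_empty(path);
	if (empty == 1)
		return 0;
	if (empty == -1)
		fprintf(stderr, "%s: cannot read directory\n", path);
	else
		fprintf(stderr, "%s: mountpoint is not empty, "
			"use the mount_nonempty option to override\n", path);
	return -1;
}
```

As noted in the thread, a check like this protects the user from accidents only; it is not a security boundary, since anything created under the mountpoint after the check is not covered.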
* Re: FUSE merging? 2005-06-30 19:46 ` Andrew Morton 2005-06-30 20:00 ` Andrew Morton 2005-06-30 22:28 ` Frank van Maarseveen @ 2005-07-01 6:36 ` Miklos Szeredi 2005-07-01 6:50 ` Andrew Morton ` (2 more replies) 2 siblings, 3 replies; 78+ messages in thread From: Miklos Szeredi @ 2005-07-01 6:36 UTC (permalink / raw) To: akpm; +Cc: aia21, arjan, miklos, linux-kernel, frankvm > However, a few things: > > - is there anything in the current implementation of the permission stuff > which might tie our hands if it is later reimplemented? IOW: does the > current FUSE user interface in any way lock us into the current FUSE > implementation (fuse_allow_task())? No. This thing is above the userspace interface and completely independent. Either a task is allowed, and then the request goes through to the interface. Or if it's not, the request is stopped right there, and never reaches the userspace interface. > - the fuse mount options don't seem to be documented True. I'll send a patch (they are documented in the README of the fuse distribution). > - aren't we going to remove the nfs semi-server feature? I leave the decision to you ;) It's a separate independent patch already (fuse-nfs-export.patch). > - Frank points out that a user can send a sigstop to his own setuid(0) > task and he intimates that this could cause DoS problems with FUSE. More > details needed please? Will follow up in Franks answer. > - I don't recall seeing an exhaustive investigation of how an > unprivileged user could use a FUSE mount to implement DoS attacks against > other users or against root. Here's a description of a theoretical DoS scenario: http://marc.theaimsgroup.com/?l=linux-fsdevel&m=111522019516694&w=2 Miklos ^ permalink raw reply [flat|nested] 78+ messages in thread
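[Editorial sketch, not part of the original thread.] The point that the permission check sits above the userspace interface can be modelled in a few lines. This is a userspace toy, not the kernel's fuse_allow_task(): every name below is hypothetical, and the "ptraceable by the mount owner" condition is approximated, as an assumption of this sketch, by requiring the task's real, effective and saved uids to all equal the mount owner's uid. The 'allow_other' exemption is taken from the thread.

```c
#include <errno.h>
#include <sys/types.h>

/* Hypothetical model types; none of these are kernel structures. */
struct task_model {
	uid_t uid, euid, suid;
};

struct mount_model {
	uid_t owner;      /* user who performed the unprivileged mount */
	int allow_other;  /* models the 'allow_other' mount option */
};

/* Counts requests that actually reached the (modelled) userspace server. */
int requests_seen;

int model_send_request(void)
{
	requests_seen++;
	return 0;
}

/* Assumed stand-in for the ptrace-ability test: a task whose uids all
 * match the mount owner is treated as traceable by that owner. */
int model_allow_task(const struct mount_model *m, const struct task_model *t)
{
	if (m->allow_other)
		return 1;
	return t->uid == m->owner && t->euid == m->owner &&
	       t->suid == m->owner;
}

/* The gate: a refused task gets EACCES and the server never sees it. */
int model_dispatch(const struct mount_model *m, const struct task_model *t)
{
	if (!model_allow_task(m, t))
		return -EACCES;
	return model_send_request();
}
```

Under this model a setuid-root program started by the mount owner (uid 1000, euid 0) is refused before any request is generated, which is the property the thread calls issue 2).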
* Re: FUSE merging? 2005-07-01 6:36 ` FUSE merging? Miklos Szeredi @ 2005-07-01 6:50 ` Andrew Morton 2005-07-01 7:07 ` Miklos Szeredi 2005-07-01 12:37 ` bert hubert 2005-07-01 7:46 ` Frederik Deweerdt 2005-07-01 9:36 ` Frank van Maarseveen 2 siblings, 2 replies; 78+ messages in thread From: Andrew Morton @ 2005-07-01 6:50 UTC (permalink / raw) To: Miklos Szeredi; +Cc: aia21, arjan, miklos, linux-kernel, frankvm Miklos Szeredi <miklos@szeredi.hu> wrote: > > > - aren't we going to remove the nfs semi-server feature? > > I leave the decision to you ;) It's a separate independent patch > already (fuse-nfs-export.patch). Let's leave it out - that'll stimulate some activity in the userspace-nfs-server-for-FUSE area. Speaking of which, dumb question: what does FUSE offer over simply using NFS protocol to talk to the userspace filesystem driver? ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: FUSE merging? 2005-07-01 6:50 ` Andrew Morton @ 2005-07-01 7:07 ` Miklos Szeredi 2005-07-01 7:14 ` Andrew Morton 2005-07-01 12:37 ` bert hubert 1 sibling, 1 reply; 78+ messages in thread From: Miklos Szeredi @ 2005-07-01 7:07 UTC (permalink / raw) To: akpm; +Cc: aia21, arjan, linux-kernel, frankvm > > > > > - aren't we going to remove the nfs semi-server feature? > > > > I leave the decision to you ;) It's a separate independent patch > > already (fuse-nfs-export.patch). > > Let's leave it out - that'll stimulate some activity in the > userspace-nfs-server-for-FUSE area. > > Speaking of which, dumb question: what does FUSE offer over simply using > NFS protocol to talk to the userspace filesystem driver? Oh lots: - no deadlocks (NFS mounted from localhost is riddled with them) - efficient protocol, optimized for less context switches - dcache invalidation policy - probably more, but I can't remember Miklos ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: FUSE merging? 2005-07-01 7:07 ` Miklos Szeredi @ 2005-07-01 7:14 ` Andrew Morton 2005-07-01 7:27 ` Miles Bader 2005-07-01 7:38 ` Miklos Szeredi 0 siblings, 2 replies; 78+ messages in thread From: Andrew Morton @ 2005-07-01 7:14 UTC (permalink / raw) To: Miklos Szeredi; +Cc: aia21, arjan, linux-kernel, frankvm Miklos Szeredi <miklos@szeredi.hu> wrote: > > > > > > > > - aren't we going to remove the nfs semi-server feature? > > > > > > I leave the decision to you ;) It's a separate independent patch > > > already (fuse-nfs-export.patch). > > > > Let's leave it out - that'll stimulate some activity in the > > userspace-nfs-server-for-FUSE area. > > > > Speaking of which, dumb question: what does FUSE offer over simply using > > NFS protocol to talk to the userspace filesystem driver? > > Oh lots: > > - no deadlocks (NFS mounted from localhost is riddled with them) It is? We had some low-memory problems a while back, but they got fixed. During that work I did some nfs-to-localhost testing and things seemed OK. > - efficient protocol, optimized for less context switches One wouldn't really expect a userspace filesystem to be particularly fast, and the performance will be dominated by memory copies and IO wait anyway. > - dcache invalidation policy What's that? > - probably more, but I can't remember Please do.. ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: FUSE merging? 2005-07-01 7:14 ` Andrew Morton @ 2005-07-01 7:27 ` Miles Bader 2005-07-01 7:38 ` Miklos Szeredi 1 sibling, 0 replies; 78+ messages in thread From: Miles Bader @ 2005-07-01 7:27 UTC (permalink / raw) To: Andrew Morton; +Cc: Miklos Szeredi, aia21, arjan, linux-kernel, frankvm Andrew Morton <akpm@osdl.org> writes: >> - efficient protocol, optimized for less context switches > > One wouldn't really expect a userspace filesystem to be particularly fast, > and the performance will be dominated by memory copies and IO wait anyway. Well there's slow and then there's slow... numbers are always nice though. -miles -- [|nurgle|] ddt- demonic? so quake will have an evil kinda setting? one that will make every christian in the world foamm at the mouth? [iddt] nurg, that's the goal ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: FUSE merging? 2005-07-01 7:14 ` Andrew Morton 2005-07-01 7:27 ` Miles Bader @ 2005-07-01 7:38 ` Miklos Szeredi 2005-07-01 8:02 ` Andrew Morton 1 sibling, 1 reply; 78+ messages in thread From: Miklos Szeredi @ 2005-07-01 7:38 UTC (permalink / raw) To: akpm; +Cc: miklos, aia21, arjan, linux-kernel, frankvm > > > > > > > > > > > - aren't we going to remove the nfs semi-server feature? > > > > > > > > I leave the decision to you ;) It's a separate independent patch > > > > already (fuse-nfs-export.patch). > > > > > > Let's leave it out - that'll stimulate some activity in the > > > userspace-nfs-server-for-FUSE area. > > > > > > Speaking of which, dumb question: what does FUSE offer over simply using > > > NFS protocol to talk to the userspace filesystem driver? > > > > Oh lots: > > > > - no deadlocks (NFS mounted from localhost is riddled with them) > > It is? We had some low-memory problems a while back, but they got fixed. > During that work I did some nfs-to-localhost testing and things seemed OK. Well, there's the "unsolvable" writeback deadlock problem, that FUSE works around by not buffering dirty pages (and not allowing writable mmap). Does NFS solve that? I'm interested :) Then there's the usual "filesystem recursing into itself" deadlock. Mounting with 'intr' probably solves this for NFS, but that has unwanted side effects. FUSE only allows KILL to interrupt a request. > > - efficient protocol, optimized for less context switches > > One wouldn't really expect a userspace filesystem to be particularly fast, FUSE is pretty fast. >100Mbytes/s transfer speeds on a moderate hardware are not unusual. > and the performance will be dominated by memory copies and IO wait anyway. Memory copies don't seem to be an issue (and FUSE does very little of it). Performance is mostly dominated by context switch times (if the underlying filesystem can keep up). 
Unfortunately unbuffered writes mean a separate request for each written page, and thus a context switch (on UP at least). This has a marked effect on write performance. > > - dcache invalidation policy > > What's that? Userspace can tell the kernel, how long a dentry should be valid. I don't think the NFS protocol provides this. Same holds for the inode attributes. > > - probably more, but I can't remember > > Please do.. OK, I'll do a little research. Miklos ^ permalink raw reply [flat|nested] 78+ messages in thread
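[Editorial sketch, not part of the original thread.] The "dcache invalidation policy" point, where the userspace filesystem tells the kernel how long a dentry stays valid, can be illustrated with a per-entry timeout. This is a minimal model under stated assumptions: the struct and function names are hypothetical, and the real FUSE protocol fields are not reproduced here. The current time is passed in explicitly so the behaviour is easy to follow.

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical cached name lookup with a validity deadline chosen by
 * the userspace filesystem rather than by a kernel heuristic. */
struct cached_entry {
	char name[64];
	time_t valid_until;
};

/* Store a lookup result; timeout_sec comes from the filesystem's reply.
 * A timeout of 0 means "revalidate on every use". */
void entry_store(struct cached_entry *e, const char *name,
		 time_t now, unsigned int timeout_sec)
{
	snprintf(e->name, sizeof(e->name), "%s", name);
	e->valid_until = now + (time_t)timeout_sec;
}

/* Returns 1 while the cached entry may be used without asking the
 * filesystem again, 0 once it has to be looked up afresh. */
int entry_is_valid(const struct cached_entry *e, time_t now)
{
	return now < e->valid_until;
}
```

A static archive filesystem might answer with a long timeout, while a synthetic filesystem whose contents change arbitrarily could answer with 0 and force a fresh lookup each time.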
* Re: FUSE merging? 2005-07-01 7:38 ` Miklos Szeredi @ 2005-07-01 8:02 ` Andrew Morton 2005-07-01 10:11 ` Miklos Szeredi 2005-07-03 19:39 ` Pavel Machek 0 siblings, 2 replies; 78+ messages in thread From: Andrew Morton @ 2005-07-01 8:02 UTC (permalink / raw) To: Miklos Szeredi; +Cc: miklos, aia21, arjan, linux-kernel, frankvm Miklos Szeredi <miklos@szeredi.hu> wrote: > > > > > > > > > > > > > > > - aren't we going to remove the nfs semi-server feature? > > > > > > > > > > I leave the decision to you ;) It's a separate independent patch > > > > > already (fuse-nfs-export.patch). > > > > > > > > Let's leave it out - that'll stimulate some activity in the > > > > userspace-nfs-server-for-FUSE area. > > > > > > > > Speaking of which, dumb question: what does FUSE offer over simply using > > > > NFS protocol to talk to the userspace filesystem driver? > > > > > > Oh lots: > > > > > > - no deadlocks (NFS mounted from localhost is riddled with them) > > > > It is? We had some low-memory problems a while back, but they got fixed. > > During that work I did some nfs-to-localhost testing and things seemed OK. > > Well, there's the "unsolvable" writeback deadlock problem, that FUSE > works around by not buffering dirty pages (and not allowing writable > mmap). Does NFS solve that? I'm interested :) I don't know - first you'd have to describe it. > Then there's the usual "filesystem recursing into itself" deadlock. Describe this completely as well, please. > Mounting with 'intr' probably solves this for NFS, but that has > unwanted side effects. FUSE only allows KILL to interrupt a request. Maybe these things can be solved in NFS? > > > - dcache invalidation policy > > > > What's that? > > Userspace can tell the kernel, how long a dentry should be valid. I > don't think the NFS protocol provides this. Same holds for the inode > attributes. Why is that needed? > > > - probably more, but I can't remember > > > > Please do.. > > OK, I'll do a little research. 
> v9fs has a user-level server too. Maybe it has been used in FUSE-like scenarios more than NFS. Plus NFS and v9fs work across the network... ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: FUSE merging? 2005-07-01 8:02 ` Andrew Morton @ 2005-07-01 10:11 ` Miklos Szeredi 2005-07-01 11:29 ` Andrew Morton ` (2 more replies) 2005-07-03 19:39 ` Pavel Machek 1 sibling, 3 replies; 78+ messages in thread From: Miklos Szeredi @ 2005-07-01 10:11 UTC (permalink / raw) To: akpm; +Cc: miklos, miklos, aia21, arjan, linux-kernel, frankvm > > Well, there's the "unsolvable" writeback deadlock problem, that FUSE > > works around by not buffering dirty pages (and not allowing writable > > mmap). Does NFS solve that? I'm interested :) > > I don't know - first you'd have to describe it. A dirty page is being written back, but the userspace server needs to allocate memory to complete the request. But the allocation will block, since there's no more free memory. > > Then there's the usual "filesystem recursing into itself" deadlock. > > Describe this completely as well, please. User does unlink("/mnt/userfs/file"). Userspace server receives request to unlink "/file". Then the daemon does unlink("/mnt/userfs/file"). This will deadlock on i_sem. > > Mounting with 'intr' probably solves this for NFS, but that has > > unwanted side effects. FUSE only allows KILL to interrupt a request. > > Maybe these things can be solved in NFS? Possibly. > > > > > - dcache invalidation policy > > > > > > What's that? > > > > Userspace can tell the kernel, how long a dentry should be valid. I > > don't think the NFS protocol provides this. Same holds for the inode > > attributes. > > Why is that needed? Because, I can well imagine a synthetic filesystem, where file data/metadata change arbitrarily. In this case the timeout heuristic in NFS is not useful. In fact with NFS it's often a PITA, that it doesn't want to refresh a file's data/metadata, which I _know_ has changed on the server. > > > > - probably more, but I can't remember > > > > > > Please do.. > > > > OK, I'll do a little research. > > > > v9fs has a user-level server too. 
Maybe it has been used in FUSE-like > scenarios more than NFS. I think the p9 protocol is suffering from trying to be too generic. The FUSE kernel interface is probably slightly tied to the linux VFS, and would present problems if trying to port to other *NIX or god forbid some other OS family altogether. That may seem like a drawback, but I don't think it is: - people are encouraged to use the FUSE library API instead of the raw kernel interface - if it will be ported to other systems, even the kernel interface could probably be made compatible, only at the loss of simplicity/performance. > Plus NFS and v9fs work across the network... Yes. I consider that a drawback. FUSE does data transfer very efficiently (single copy), without the heavy network infrastructure being in the way. Miklos ^ permalink raw reply [flat|nested] 78+ messages in thread
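[Editorial sketch, not part of the original thread.] The "filesystem recursing into itself" case described in this message (the daemon calling unlink() on its own mountpoint while serving an unlink request) can be modelled in plain C. This is a toy under stated assumptions: all names are hypothetical, and the non-recursive lock stands in for i_sem. Where the real semaphore would sleep forever, the model returns -EDEADLK so the situation can be observed instead of hanging.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the directory inode and its i_sem. */
struct inode_model {
	int locked;
};

typedef int (*server_handler)(struct inode_model *dir);

/* Non-recursive lock: a second acquisition would block forever in the
 * real kernel, so the model reports it as -EDEADLK. */
int model_lock(struct inode_model *dir)
{
	if (dir->locked)
		return -EDEADLK;
	dir->locked = 1;
	return 0;
}

void model_unlock(struct inode_model *dir)
{
	dir->locked = 0;
}

/* Kernel side of unlink(): take the lock, hand the request to the
 * userspace server, release the lock when the server has answered. */
int model_unlink(struct inode_model *dir, server_handler handler)
{
	int err = model_lock(dir);

	if (err)
		return err;
	err = handler != NULL ? handler(dir) : 0;
	model_unlock(dir);
	return err;
}

/* A server that handles the request without touching its own mount. */
int well_behaved_server(struct inode_model *dir)
{
	(void)dir;
	return 0;
}

/* A server that re-enters its own mountpoint while the kernel still
 * holds the lock on its behalf. */
int recursing_server(struct inode_model *dir)
{
	return model_unlink(dir, well_behaved_server);
}
```

In this model model_unlink() with well_behaved_server succeeds, while the same call with recursing_server reports the would-be deadlock.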
* Re: FUSE merging? 2005-07-01 10:11 ` Miklos Szeredi @ 2005-07-01 11:29 ` Andrew Morton 2005-07-01 12:00 ` Miklos Szeredi ` (3 more replies) 2005-07-01 12:08 ` Frank van Maarseveen 2005-07-01 13:21 ` Eric Van Hensbergen 2 siblings, 4 replies; 78+ messages in thread From: Andrew Morton @ 2005-07-01 11:29 UTC (permalink / raw) To: Miklos Szeredi; +Cc: miklos, aia21, arjan, linux-kernel, frankvm Miklos Szeredi <miklos@szeredi.hu> wrote: > > > > Well, there's the "unsolvable" writeback deadlock problem, that FUSE > > > works around by not buffering dirty pages (and not allowing writable > > > mmap). Does NFS solve that? I'm interested :) > > > > I don't know - first you'd have to describe it. > > A dirty page is being written back, but the userspace server needs to > allocate memory to complete the request. But the allocation will > block, since there's no more free memory. That shouldn't happen with write() traffic due to the dirty memory balancing logic. It'll happen with MAP_SHARED. Totally disallowing MAP_SHARED sounds a bit drastic, but of course nfs/v9fs could be taught to do that. > > > Then there's the usual "filesystem recursing into itself" deadlock. > > > > Describe this completely as well, please. > > User does unlink("/mnt/userfs/file"). Userspace server receives > request to unlink "/file". Then the daemon does > unlink("/mnt/userfs/file"). This will deadlock on i_sem. eh? How can the fuse client and the fuse server both get access to the same file in this manner? I don't see how you could set that up with NFS, for example. > > > Userspace can tell the kernel, how long a dentry should be valid. I > > > don't think the NFS protocol provides this. Same holds for the inode > > > attributes. > > > > Why is that needed? > > Because, I can well imagine a synthetic filesystem, where file > data/metadata change aribitrarily. In this case the timeout heuristic > in NFS is not useful. 
> > In fact with NFS it's often a PITA, that it doesn't want to refresh a > file's data/metadata, which I _know_ has changed on the server. I think nfs can do this, as long as the modification was done through the server. I'd expect v9fs would be the same. > > Plus NFS and v9fs work across the network... > > Yes. I consider that a drawback. Others (many) would disagree. Sorry, but I'm not buying it. I still don't see a solid reason why all this could not be done with nfs/v9fs, some kernel tweaks and the rest in userspace. It would take some effort, but that effort would end up strengthening existing kernel capabilities rather than adding brand new things, which is good. ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: FUSE merging? 2005-07-01 11:29 ` Andrew Morton @ 2005-07-01 12:00 ` Miklos Szeredi 2005-07-01 12:53 ` Anton Altaparmakov ` (2 subsequent siblings) 3 siblings, 0 replies; 78+ messages in thread From: Miklos Szeredi @ 2005-07-01 12:00 UTC (permalink / raw) To: akpm; +Cc: aia21, arjan, linux-kernel, frankvm > > A dirty page is being written back, but the userspace server needs to > > allocate memory to complete the request. But the allocation will > > block, since there's no more free memory. > > That shouldn't happen with write() traffic due to the dirty memory > balancing logic. How? It either blocks other allocations until the writeback is completed (DoS) or allows memory to be exhausted (deadlock). Making unpriv mounts work securely is not a trivial thing I can tell you ;) > > User does unlink("/mnt/userfs/file"). Userspace server receives > > request to unlink "/file". Then the daemon does > > unlink("/mnt/userfs/file"). This will deadlock on i_sem. > > eh? How can the fuse client and the fuse server both get access to the > same file in this manner? I don't see how you could set that up with NFS, > for example. With a custom userspace NFS server you can do whatever you want. That's the whole purpose of the exercise. > > Because, I can well imagine a synthetic filesystem, where file > > data/metadata change aribitrarily. In this case the timeout heuristic > > in NFS is not useful. > > > > In fact with NFS it's often a PITA, that it doesn't want to refresh a > > file's data/metatata, which I _know_ has changed on the server. > > I think nfs can do this, as long as the modification was done through the > server. I'd expect v9fs would be the same. It's often not. Sshfs is a good example. File server will not be able to notify the client when anything changes. Polling is the only solution, and NFS doesn't always get it right (and in fact it cannot). 
It's much better to leave cache timeout policy to the userspace filesystem, than trying to guess it in the kernel. > > > Plus NFS and v9fs work across the network... > > > > Yes. I consider that a drawback. > > Others (many) would disagree. > > > Sorry, but I'm not buying it. I still don't see a solid reason why all > this could not be done with nfs/v9fs, some kernel tweaks and the rest in > userspace. It would take some effort, but that effort would end up > strengthening existing kernel capabilities rather than adding brand new > things, which is good. I'm not sure. NFS is a monster, everybody can agree. Getting all the requirements of FUSE (safe unprivileged mounts, etc) would be a nightmare. FUSE does one thing, and it does that right. I think that's good. Miklos ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: FUSE merging? 2005-07-01 11:29 ` Andrew Morton 2005-07-01 12:00 ` Miklos Szeredi @ 2005-07-01 12:53 ` Anton Altaparmakov 2005-07-01 13:07 ` Anton Altaparmakov 2005-07-01 13:51 ` Frank van Maarseveen 2005-07-01 13:29 ` Eric Van Hensbergen 2005-07-01 16:45 ` Matthias Urlichs 3 siblings, 2 replies; 78+ messages in thread From: Anton Altaparmakov @ 2005-07-01 12:53 UTC (permalink / raw) To: Andrew Morton; +Cc: Miklos Szeredi, arjan, linux-kernel, frankvm On Fri, 2005-07-01 at 04:29 -0700, Andrew Morton wrote: > Sorry, but I'm not buying it. I still don't see a solid reason why all > this could not be done with nfs/v9fs, some kernel tweaks and the rest in > userspace. It would take some effort, but that effort would end up > strengthening existing kernel capabilities rather than adding brand new > things, which is good. FUSE is a generic FS API which is _very_ easy to write an FS for (learning curve is about 10-15 minutes starting after you have unpacked the fuse source code, at least it took me that long to start writing an FS based on the example one provided). NFS is not anything like that. Also can the NFS approach provide me with different content depending on the uid of the accessing process? With FUSE that is easy as pie. Even easier than that actually... Best regards, Anton -- Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @) Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/ ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: FUSE merging? 2005-07-01 12:53 ` Anton Altaparmakov @ 2005-07-01 13:07 ` Anton Altaparmakov 2005-07-01 13:51 ` Frank van Maarseveen 1 sibling, 0 replies; 78+ messages in thread From: Anton Altaparmakov @ 2005-07-01 13:07 UTC (permalink / raw) To: Andrew Morton; +Cc: Miklos Szeredi, arjan, linux-kernel, frankvm On Fri, 2005-07-01 at 13:53 +0100, Anton Altaparmakov wrote: > On Fri, 2005-07-01 at 04:29 -0700, Andrew Morton wrote: > > Sorry, but I'm not buying it. I still don't see a solid reason why all > > this could not be done with nfs/v9fs, some kernel tweaks and the rest in > > userspace. It would take some effort, but that effort would end up > > strengthening existing kernel capabilities rather than adding brand new > > things, which is good. > > FUSE is a generic FS API which is _very_ easy to write an FS for > (learning curve is about 10-15 minutes starting after you have unpacked > the fuse source code, at least it took me that long to start writing an > FS based on the example one provided). NFS is not anything like that. > > Also can the NFS approach provide me with different content depending on > the uid of the accessing process? With FUSE that is easy as pie. Even > easier than that actually... I forgot: And doesn't NFS require stable inode numbers and other "invariables" like that for it to work? FUSE doesn't and those requirements are a real PITA in a lot of cases where there simply are no inodes and the numbers are synthetic and change on each remount or even on each access after the dentry has expired... And I always thought that doing FS in userspace via NFS is considered an ugly hack. I didn't have the impression that that had changed recently. 
(-; Best regards, Anton -- Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @) Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/ ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: FUSE merging? 2005-07-01 12:53 ` Anton Altaparmakov 2005-07-01 13:07 ` Anton Altaparmakov @ 2005-07-01 13:51 ` Frank van Maarseveen 1 sibling, 0 replies; 78+ messages in thread From: Frank van Maarseveen @ 2005-07-01 13:51 UTC (permalink / raw) To: Anton Altaparmakov Cc: Andrew Morton, Miklos Szeredi, arjan, linux-kernel, frankvm On Fri, Jul 01, 2005 at 01:53:54PM +0100, Anton Altaparmakov wrote: > On Fri, 2005-07-01 at 04:29 -0700, Andrew Morton wrote: > > Sorry, but I'm not buying it. I still don't see a solid reason why all > > this could not be done with nfs/v9fs, some kernel tweaks and the rest in > > userspace. It would take some effort, but that effort would end up > > strengthening existing kernel capabilities rather than adding brand new > > things, which is good. > > Also can the NFS approach provide me with different content depending on > the uid of the accessing process? With FUSE that is easy as pie. Even > easier than that actually... unfsd can that I believe. However, FUSE and user space NFSd are complementary. For every NFS solution one still needs to do the mounting as root. FUSE addresses the client side: it can implement a user space NFS client. -- Frank ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: FUSE merging? 2005-07-01 11:29 ` Andrew Morton 2005-07-01 12:00 ` Miklos Szeredi 2005-07-01 12:53 ` Anton Altaparmakov @ 2005-07-01 13:29 ` Eric Van Hensbergen 2005-07-01 16:45 ` Matthias Urlichs 3 siblings, 0 replies; 78+ messages in thread From: Eric Van Hensbergen @ 2005-07-01 13:29 UTC (permalink / raw) To: Andrew Morton Cc: Miklos Szeredi, aia21, arjan, linux-kernel, frankvm, v9fs-developer On 7/1/05, Andrew Morton <akpm@osdl.org> wrote: > Miklos Szeredi <miklos@szeredi.hu> wrote: > > > > Userspace can tell the kernel, how long a dentry should be valid. I > > > > don't think the NFS protocol provides this. Same holds for the inode > > > > attributes. > > > > > > Why is that needed? > > > > Because, I can well imagine a synthetic filesystem, where file > > data/metadata change aribitrarily. In this case the timeout heuristic > > in NFS is not useful. > > > > In fact with NFS it's often a PITA, that it doesn't want to refresh a > > file's data/metatata, which I _know_ has changed on the server. > > I think nfs can do this, as long as the modification was done through the > server. I'd expect v9fs would be the same. > v9fs aggressively invalidates dentries by default -- it is our experience that caching metadata (particularly in synthetics) causes more problems than it is worth. That being said, there are prototype designs for v9fs cache layers which actively detect if underlying file systems are synthetic or static and allow parametrized cache policies (for both the dcache and the page cache). As a side-note which I know less about, I believe NFSv4 includes server-push invalidation semantics, but I can't remember if that applies to metadata or just data. -eric ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: FUSE merging? 2005-07-01 11:29 ` Andrew Morton ` (2 preceding siblings ...) 2005-07-01 13:29 ` Eric Van Hensbergen @ 2005-07-01 16:45 ` Matthias Urlichs 3 siblings, 0 replies; 78+ messages in thread From: Matthias Urlichs @ 2005-07-01 16:45 UTC (permalink / raw) To: linux-kernel Hi, Andrew Morton wrote: > Sorry, but I'm not buying it. I still don't see a solid reason why all > this could not be done with nfs/v9fs, some kernel tweaks and the rest in > userspace. Let's forget about NFS here. It's stateless. You don't want a wholly stateless layer between two stateful instances; the fact that it works for a disk-based NFS server isn't proof that it'd work for gmailfs or sshfs. There are a lot of FUSE server implementations out there already. You want all of them to rewrite their code for v9fs? I admit that I don't know zilch about how difficult it is to write a v9fs server (is there sane sample code / a support library?) or how much overhead such a server would incur or how safe it'd be to run a user-controlled server on the same machine as the mountpoint. The point is that the FUSE people already cover all these points, thus: unless there's a major technical problem with it that v9fs solves better, I'd advocate to include it. -- Matthias Urlichs | {M:U} IT Design @ m-u-it.de | smurf@smurf.noris.de Disclaimer: The quote was selected randomly. Really. | http://smurf.noris.de - - Magpie, n.: A bird whose thievish disposition suggested to someone that it might be taught to talk. -- Ambrose Bierce, "The Devil's Dictionary" ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: FUSE merging? 2005-07-01 10:11 ` Miklos Szeredi 2005-07-01 11:29 ` Andrew Morton @ 2005-07-01 12:08 ` Frank van Maarseveen 2005-07-01 13:21 ` Eric Van Hensbergen 2 siblings, 0 replies; 78+ messages in thread From: Frank van Maarseveen @ 2005-07-01 12:08 UTC (permalink / raw) To: Miklos Szeredi; +Cc: akpm, aia21, arjan, linux-kernel, frankvm On Fri, Jul 01, 2005 at 12:11:53PM +0200, Miklos Szeredi wrote: > > > Userspace can tell the kernel, how long a dentry should be valid. I > > > don't think the NFS protocol provides this. Same holds for the inode > > > attributes. > > > > Why is that needed? > > Because, I can well imagine a synthetic filesystem, where file > data/metadata change aribitrarily. In this case the timeout heuristic > in NFS is not useful. > > In fact with NFS it's often a PITA, that it doesn't want to refresh a > file's data/metatata, which I _know_ has changed on the server. This NFS issue is on my radar for years already. I have a patch which is practical but a bit disgusting. IMHO it's orthogonal to FUSE. -- Frank ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: FUSE merging? 2005-07-01 10:11 ` Miklos Szeredi 2005-07-01 11:29 ` Andrew Morton 2005-07-01 12:08 ` Frank van Maarseveen @ 2005-07-01 13:21 ` Eric Van Hensbergen 2005-07-01 13:53 ` Miklos Szeredi 2 siblings, 1 reply; 78+ messages in thread From: Eric Van Hensbergen @ 2005-07-01 13:21 UTC (permalink / raw) To: Miklos Szeredi; +Cc: akpm, aia21, arjan, linux-kernel, frankvm, v9fs-developer On 7/1/05, Miklos Szeredi <miklos@szeredi.hu> wrote: > > > > v9fs has a user-level server too. Maybe it has been used in FUSE-like > > scenarios more than NFS. We've really only dabbled with v9fs and user-level file services, mostly through interacting with Plan 9 From User Space applications (http://www.plan9.us) However, there are people actively improving this area of functionality including providing an SDK to allow easy creation of synthetic file systems. That being said, there are many aspects of v9fs which have been written/re-written with the express purpose of providing support for such synthetics. > > I think the p9 protocol is suffering from trying to be too generic. > The FUSE kernel interface is probably slightly tied to the linux VFS, > and would present problems if trying to port to other *NIX or god > forbid some other OS family altogether. > I don't know where 9P "suffers" from being too generic, it's just well-designed and has done a good job of keeping things simple -- something that the plethora of over designed, bloated interfaces of today could learn from. > > > Plus NFS and v9fs work across the network... > > Yes. I consider that a drawback. FUSE does data transfer very > efficiently (single copy), without the heavy network infrastructure > being in the way. > I'll grant you this is something v9fs-2.0 suffers from, but its something we are actively addressing in v9fs-2.1. 
We're working more towards the implementation that is present in the Plan 9 kernel, where the core efficiently multiplexes the requests either directly to local servers (in Plan 9's case via function call APIs) or encapsulates them for shipping across the network. The 9P interface is used for both, it just has different embodiments depending on underlying transport. That being said, I imagine the time spent context switching in and out of the kernel dominates performance. With a proper mux there is no reason why v9fs can't be made as efficient as FUSE - and that's what we intend to demonstrate in v9fs-2.1. Plus, with v9fs you get the benefit of being able to export your synthetic file systems over the network with no additional copies. Further, when you create an infrastructure which is meant to work over a network, you take fewer things for granted -- which ultimately leads to a more robust system capable of dealing with many of these problems. -eric ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: FUSE merging? 2005-07-01 13:21 ` Eric Van Hensbergen @ 2005-07-01 13:53 ` Miklos Szeredi 2005-07-01 14:18 ` Eric Van Hensbergen 0 siblings, 1 reply; 78+ messages in thread From: Miklos Szeredi @ 2005-07-01 13:53 UTC (permalink / raw) To: ericvh; +Cc: miklos, akpm, aia21, arjan, linux-kernel, frankvm, v9fs-developer > I don't know where 9P "suffers" from being too generic, it's just > well-designed and has done a good job of keeping things simple -- > something that the plethora of over designed, bloated interfaces of > today could learn from. True. I very much like the simplicity of the 9P protocol. But it's system independence sometimes makes it fit poorly to the Linux VFS interface. I guess you have a wide experience with this :) > > > Plus NFS and v9fs work across the network... > > > > Yes. I consider that a drawback. FUSE does data transfer very > > efficiently (single copy), without the heavy network infrastructure > > being in the way. > > > > I'll grant you this is something v9fs-2.0 suffers from, but its > something we are actively addressing in v9fs-2.1. We're working more > towards the implementation that is present in the Plan 9 kernel, where > the core efficiently multiplexes the requests either directly to local > servers (in Plan 9's case via function call APIs) or encapsulates them > for shipping across the network. The 9P interface is used for both, > it just has different embodiments depending on underlying transport. > > That being said, I imagine the time spent context switching in and out > of the kernel dominates performance. Context switch happens from one process to the other, not when entering/leaving the kernel (which is very efficient). So it's much more important to reduce the number of round-trips for a single operation, than multiplexing requests for multiple operations. > With a proper mux there is no reason why v9fs can't be made as > efficient as FUSE - and that's what we intend to demonstrate in > v9fs-2.1. 
Plus, with v9fs you get the benefit of being able to > export your synthetic file systems over the network with no > additional copies. Yes, but does that matter? I'm not sure that it's a good idea bundling network filesystem functionality together with userspace filesystem functionality. Each has its own set of requirements, and its own way of working optimally. What would people say if ext3 was always mounted locally through NFS, because the kernel would only provide the NFS filesystem client? Differentiation of interfaces depending on the "closeness" of the client to the server makes good sense IMO. We currently have in-kernel and across-network. FUSE adds in-userspace in between those two. Sometimes these can overlap, but one interface will always be more optimal (in terms of functionality as well as speed) for a specific application. > Further, when you create an infrastructure which is meant to work over > a network, you take fewer things for granted -- which ultimately leads > to a more robust system capable of dealing with many of these > problems. Yes. I'm not speaking against v9fs, which I think has a valid niche, as well as FUSE. Miklos ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: FUSE merging? 2005-07-01 13:53 ` Miklos Szeredi @ 2005-07-01 14:18 ` Eric Van Hensbergen 2005-07-01 14:31 ` Miklos Szeredi 0 siblings, 1 reply; 78+ messages in thread From: Eric Van Hensbergen @ 2005-07-01 14:18 UTC (permalink / raw) To: Miklos Szeredi; +Cc: akpm, aia21, arjan, linux-kernel, frankvm, v9fs-developer On 7/1/05, Miklos Szeredi <miklos@szeredi.hu> wrote: > > I don't know where 9P "suffers" from being too generic, it's just > > well-designed and has done a good job of keeping things simple -- > > something that the plethora of over designed, bloated interfaces of > > today could learn from. > > True. I very much like the simplicity of the 9P protocol. But it's > system independence sometimes makes it fit poorly to the Linux VFS > interface. I guess you have a wide experience with this :) > Yeah, but most of our problems had less to do with the VFS interface per se, and more to do with the dcache/page-cache. In the long run, the portability is something you may want though -- not only to provide support under BSD or whatever, but also to insulate changes in the VFS API from user file servers. > > So it's much more important to reduce the number of round-trips for a > single operation, than multiplexing requests for multiple operations. > Agreed, this will be something we'll (v9fs) have to keep a close tab on to keep things efficient. > > With a proper mux there is no reason why v9fs can't be made as > > efficient as FUSE - and that's what we intend to demonstrate in > > v9fs-2.1. Plus, with v9fs you get the benefit of being able to > > export your synthetic file systems over the network with no > > additional copies. > > Yes, but does that matter? I'm not sure that it's a good idea > bundling network filesystem functionality together with userspace > filesystem functionality. Each has it's own set of requirements, and > it's own way of working optimally. 
> I see your point, but increasingly common usage environments are distributed systems and I think network synthetics will have their niche. > What would people say if ext3 was always mounted locally through NFS, > because the kernel would only provide the NFS filesystem client. Probably the same thing they would say if ext3 was a user-space application that always needed to be mounted via FUSE ;) > > Differentiation of interfaces depending on the "closeness" of the > client to the server makes good sense IMO. We currently have > in-kernel and across-network. FUSE adds in-userspace in between those > two. > I think that remains to be seen. There is much to be gained from blurring the differentiation as we move Linux towards a first-class distributed system. If unified interfaces can be made "good-enough" performance wise, what justifies having multiple interfaces depending on network versus local? Specialization has its place, but performance mongering at the cost of design is what killed systems research. In the end, specialization has its place, but I think it's always worth striving towards unified interfaces when performance doesn't suffer to a great degree. > > > Further, when you create an infrastructure which is meant to work over > > a network, you take fewer things for granted -- which ultimately leads > > to a more robust system capable of dealing with many of these > > problems. > > Yes. I'm not speaking agains v9fs, which I think has a valid niche, > as well as FUSE. > FUSE certainly has its place, and has done a great job creating an environment in which it is relatively easy to create new file systems in user-space. 
My main point in responding was to take the position that the v9fs mechanisms are adequate to provide user-space file systems and that while it was not the primary motivation behind the v9fs project, we are actively pursuing improving the performance and robustness of our mechanisms for providing user-space (as well as kernel-space) file service and developing an SDK to ease the implementation of 9P-based synthetic file servers. -eric ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: FUSE merging? 2005-07-01 14:18 ` Eric Van Hensbergen @ 2005-07-01 14:31 ` Miklos Szeredi 2005-07-02 10:01 ` Eric W. Biederman 0 siblings, 1 reply; 78+ messages in thread From: Miklos Szeredi @ 2005-07-01 14:31 UTC (permalink / raw) To: ericvh; +Cc: akpm, aia21, arjan, linux-kernel, frankvm, v9fs-developer > > What would people say if ext3 was always mounted locally through NFS, > > because the kernel would only provide the NFS filesystem client. > > Probably the same thing they would say if ext3 was a user-space > application that always needed to be mounted via FUSE ;) Yes, and rightly. One of the misunderstandings about userspace filesystems (Linus falls into this) is to compare it with microkernels. FUSE (and userspace filesystems in general) are NOT meant to replace in kernel filesystems or the VFS. They are an addition with which different kinds of filesystems can be implemented much better than they could be in kernel. Miklos ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: FUSE merging? 2005-07-01 14:31 ` Miklos Szeredi @ 2005-07-02 10:01 ` Eric W. Biederman 2005-07-02 14:58 ` Miklos Szeredi 2005-07-02 16:43 ` Eric Van Hensbergen 0 siblings, 2 replies; 78+ messages in thread From: Eric W. Biederman @ 2005-07-02 10:01 UTC (permalink / raw) To: Miklos Szeredi Cc: ericvh, akpm, aia21, arjan, linux-kernel, frankvm, v9fs-developer Miklos Szeredi <miklos@szeredi.hu> writes: >> > What would people say if ext3 was always mounted locally through NFS, >> > because the kernel would only provide the NFS filesystem client. >> >> Probably the same thing they would say if ext3 was a user-space >> application that always needed to be mounted via FUSE ;) > > Yes, and rightly. > > One of the misunderstandings about userspace filesystems (Linus falls > into this) is to compare it with microkernels. > > FUSE (and userspace filesystems in general) are NOT meant to replace > in kernel filesystems or the VFS. They are an addition with which > different kinds of filesystems can be implemented much better than > they could be in kernel. Taking a quick glance at v9fs and fuse I fail to see how either plays nicely with the page cache. v9fs according to my reading of the protocol specification does not have any concept of a lease. So you can't tell if you are talking about a virtual filesystem where all calls should be passed straight to the server or a real filesystem where you can perform caching. The implementation simply appears to bypass the pagecache which seems sane. Skimming through the FUSE code I see the same problem, in that you can't autodetect the right thing. This is currently hacked around with "direct_io" mount option selecting between a cached and a non-cached status on a filesystem basis at mount time. But having a per file flag would be nicer. I also don't understand why in fuse direct_io is an if statement in fuse_file_read/write instead of simply being a different set of filesystem operations. 
Neither implementation seems to forward user space locks to the filesystem server. Eric ^ permalink raw reply [flat|nested] 78+ messages in thread
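Eric's remark about direct_io being "a different set of filesystem operations" rather than an if statement can be illustrated outside the kernel. What follows is only a minimal sketch in Python, not the FUSE code: the names `CachedOps`, `DirectOps` and `open_file` are hypothetical, and the "server" is just a callable standing in for the userspace filesystem.

```python
class CachedOps:
    """Reads go through a cache that is filled from the server on a miss."""

    def __init__(self, server, cache):
        self.server = server
        self.cache = cache

    def read(self, path):
        if path not in self.cache:
            self.cache[path] = self.server(path)
        return self.cache[path]


class DirectOps:
    """Every read is passed straight to the server; nothing is cached."""

    def __init__(self, server):
        self.server = server

    def read(self, path):
        return self.server(path)


def open_file(server, cache, direct_io):
    # The decision is made once, when the file is opened, by picking an
    # operations object. read() itself never tests a direct_io flag.
    return DirectOps(server) if direct_io else CachedOps(server, cache)
```

With this shape, making the choice per file instead of per mount would only change what `open_file` is given, not the read path.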
* Re: FUSE merging? 2005-07-02 10:01 ` Eric W. Biederman @ 2005-07-02 14:58 ` Miklos Szeredi 2005-07-02 16:43 ` Eric Van Hensbergen 1 sibling, 0 replies; 78+ messages in thread From: Miklos Szeredi @ 2005-07-02 14:58 UTC (permalink / raw) To: ebiederm Cc: ericvh, akpm, aia21, arjan, linux-kernel, frankvm, v9fs-developer > Taking a quick glance at v9fs and fuse I fail to see how either > plays nicely with the page cache. > > v9fs according to my reading of the protocol specification does > not have any concept of a lease. So you can't tell if you are > talking about a virtual filesystem where all calls should be passed > straight to the server or a real filesystem where you can perform > caching. The implementation simply appears to bypass the pagecache > which seems sane. > > Skimming through the FUSE code I see the same problem, in that you can't > autodetect the right thing. This is currently hacked around with > "direct_io" mount option selecting between a cached and a non-cached > status on a filesystem basis at mount time. But having > a per file flag would be nicer. There's a plan to make this work. The kernel ABI has already been prepared for this, it would be relatively little work to implement. But I usually wait with something like this until people actually start asking for this feature. > I also don't understand why in fuse direct_io is an if statement in > fuse_file_read/write instead of simply being a different set of > filesystem operations. Good point. I'll fix that. > Neither implementation seems to forward user space locks to the > filesystem server. This too has been discussed. The last half year has been mostly spent ironing out problems caught during integration. Sometime this summer I'll start implementing these new features (inode based API, locking, userspace NFS serving, maybe shared writable mmap support). Miklos ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: FUSE merging? 2005-07-02 10:01 ` Eric W. Biederman 2005-07-02 14:58 ` Miklos Szeredi @ 2005-07-02 16:43 ` Eric Van Hensbergen 2005-07-02 17:33 ` Eric W. Biederman 1 sibling, 1 reply; 78+ messages in thread From: Eric Van Hensbergen @ 2005-07-02 16:43 UTC (permalink / raw) To: Eric W. Biederman, Miklos Szeredi Cc: akpm, aia21, arjan, linux-kernel, frankvm, v9fs-developer On Sat, 2 Jul 2005 6:15 am, Eric W. Biederman wrote: > > Taking a quick glance at v9fs and fuse I fail to see how either > plays nicely with the page cache. > True, in fact it actively avoids using it. The previous version used both the page cache and the dcache with undesirable effects on synthetic file systems so we removed cache support. Our intention is to design a cache layer (similar to cfs on Plan 9) which handles cache semantics which can be parameterized with the appropriate cache policy depending on the underlying file server. > v9fs according to my reading of the protocol specification does > not have any concept of a lease. So you can't tell if you are > talking about a virtual filesystem where all calls should be passed > straight to the server or a real filesystem where you can perform > caching. While 9P contains no explicit support for leases and caching there is an informal mechanism which is used (at least for plan 9 file servers). If the qid.vers is 0 the file can be assumed to be a synthetic file and so it is not cached. > > Neither implementation seems to forward user space locks to the > filesystem server. > Yup. We have exclusive open semantics but not locks in the POSIX sense. Lock support is on our 2.1 roadmap. -eric ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: FUSE merging? 2005-07-02 16:43 ` Eric Van Hensbergen @ 2005-07-02 17:33 ` Eric W. Biederman 0 siblings, 0 replies; 78+ messages in thread From: Eric W. Biederman @ 2005-07-02 17:33 UTC (permalink / raw) To: Eric Van Hensbergen Cc: Miklos Szeredi, akpm, aia21, arjan, linux-kernel, frankvm, v9fs-developer Eric Van Hensbergen <ericvh@gmail.com> writes: > On Sat, 2 Jul 2005 6:15 am, Eric W. Biederman wrote: >> >> Taking a quick glance at v9fs and fuse I fail to see how either >> plays nicely with the page cache. >> > > True, in fact it actively avoids using it. The previous version used both the > page cache and the dcache with undesirable effects on synthetic file systems so > we removed cache support. Our intention is to design a cache layer (similar to > cfs on Plan 9) which handles cache semantics which can be parameterized with the > appropriate cache policy depending on the underlying file server. Not having auto discovery for that kind of thing disturbs me. But if you can discover what you must do and then the policy is about what you can do it I guess I'm fine with that. >> v9fs according to my reading of the protocol specification does >> not have any concept of a lease. So you can't tell if you are >> talking about a virtual filesystem where all calls should be passed >> straight to the server or a real filesystem where you can perform >> caching. > > While 9P contains no explicit support for leases and cacheing there is an > informal mechanism which is used (at least for plan 9 file servers). If the > qid.vers is 0 the file can be assumed to be a synthetic file and so it is not > cached. That sounds sane. With that you can at least do NFS style caching with a lot of stat calls to verify your cache is coherent and by implementing it as a write-through cache you can even do a halfway decent job of being cache coherent. Which is probably about the best you can do with the current unix API. 
With a write-through cache you can likely achieve the same semantic effect of totally not caching a file with an appropriate number of stat calls. Not caching some files will like yield I suggest you document the qid.vers == 0 magic for an uncachable file, so future interoperability is assured. Eric ^ permalink raw reply [flat|nested] 78+ messages in thread
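The combination discussed in the last two messages (treat version 0 as "do not cache", otherwise revalidate with a stat call and write through) can be sketched as follows. This is a toy model in Python under stated assumptions, not v9fs code: `Server` is an in-memory stand-in for a file server, its `stat` returns only a version number playing the role of qid.vers, and the gap between the stat call and the following read is not protected against concurrent changes.

```python
class Server:
    """Toy in-memory file server; vers 0 marks a synthetic (uncachable) file."""

    def __init__(self):
        self.files = {}  # name -> (vers, data)
        self.reads = 0   # number of data reads that reached the server

    def create(self, name, data, synthetic=False):
        self.files[name] = (0 if synthetic else 1, data)

    def stat(self, name):
        return self.files[name][0]

    def read(self, name):
        self.reads += 1
        return self.files[name][1]

    def write(self, name, data):
        vers = self.files[name][0]
        new_vers = 0 if vers == 0 else vers + 1  # synthetic files stay at 0
        self.files[name] = (new_vers, data)
        return new_vers


class WriteThroughCache:
    def __init__(self, server):
        self.server = server
        self.entries = {}  # name -> (vers, data)

    def read(self, name):
        vers = self.server.stat(name)  # revalidate on every access
        if vers == 0:
            self.entries.pop(name, None)
            return self.server.read(name)  # synthetic: always ask the server
        cached = self.entries.get(name)
        if cached is not None and cached[0] == vers:
            return cached[1]  # version unchanged, cached copy is still valid
        data = self.server.read(name)
        self.entries[name] = (vers, data)
        return data

    def write(self, name, data):
        vers = self.server.write(name, data)  # write-through: server first
        if vers != 0:
            self.entries[name] = (vers, data)
```

The cost is one stat round-trip per read, which is the "lot of stat calls" trade-off mentioned above.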
* Re: FUSE merging? 2005-07-01 8:02 ` Andrew Morton 2005-07-01 10:11 ` Miklos Szeredi @ 2005-07-03 19:39 ` Pavel Machek 2005-07-04 8:38 ` Miklos Szeredi 1 sibling, 1 reply; 78+ messages in thread From: Pavel Machek @ 2005-07-03 19:39 UTC (permalink / raw) To: Andrew Morton; +Cc: Miklos Szeredi, aia21, arjan, linux-kernel, frankvm Hi! > > > > > > I leave the decision to you ;) It's a separate independent patch > > > > > > already (fuse-nfs-export.patch). > > > > > > > > > > Let's leave it out - that'll stimulate some activity in the > > > > > userspace-nfs-server-for-FUSE area. > > > > > > > > > > Speaking of which, dumb question: what does FUSE offer over simply using > > > > > NFS protocol to talk to the userspace filesystem driver? > > > > > > > > Oh lots: > > > > > > > > - no deadlocks (NFS mounted from localhost is riddled with them) > > > > > > It is? We had some low-memory problems a while back, but they got fixed. > > > During that work I did some nfs-to-localhost testing and things seemed OK. > > > > Well, there's the "unsolvable" writeback deadlock problem, that FUSE > > works around by not buffering dirty pages (and not allowing writable > > mmap). Does NFS solve that? I'm interested :) > > I don't know - first you'd have to describe it. Actually, the right question is "how is fuse better than coda". I've asked that before; unlike nfs, userspace filesystems implemented with coda actually *work*, but do not provide partial-file writes. Pavel -- teflon -- maybe it is a trademark, but it should not be. ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: FUSE merging? 2005-07-03 19:39 ` Pavel Machek @ 2005-07-04 8:38 ` Miklos Szeredi [not found] ` <20050704084900.GG15370@elf.ucw.cz> 0 siblings, 1 reply; 78+ messages in thread From: Miklos Szeredi @ 2005-07-04 8:38 UTC (permalink / raw) To: pavel; +Cc: akpm, aia21, arjan, linux-kernel, frankvm > Actually, the right question is "how is fuse better than coda". I've > asked that before; unlike nfs, userspace filesystems implemented with > coda actually *work*, but do not provide partial-file writes. You answered your own question. I did talk to Jan Harkes about the file I/O issue before starting FUSE. [searching archives] here's a quote from him about this: "I've been thinking about partial file accesses myself. However, I really don't want to go all the way to block-level caching. That would add a lot of overhead either in passing every read/write call up to userspace, or by using a largish amount of memory to keep track of availability of parts of the file. It also defeats the more efficient 'streaming' fetch of a whole file. However, something that would work reasonably well is a file offset marker that indicates how much data is available. Basically, when the application opens a file, the open upcall returns after the first... let's say 64KB... have arrived. Any read's and write (and mmap's) that access the available part of the file will be allowed. When any operation tries to access beyond the marker an upcall is made which blocks until the related part of the file has streamed in." So true random access doesn't fit too well into the CODA philosophy. Of course you could extend CODA to handle this as well (and all the other things needed for safe user mounts), but the results would probably not have pleased either side. Miklos ^ permalink raw reply [flat|nested] 78+ messages in thread
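The "file offset marker" idea in Jan Harkes' quote can be modelled in a few lines. The following is a rough userspace sketch in Python, not Coda code: `StreamingFile`, `feed` and the `last` flag are hypothetical names, and a condition variable stands in for the blocking upcall.

```python
import threading


class StreamingFile:
    """A file that streams in front to back; reads past the marker block."""

    def __init__(self):
        self._data = bytearray()
        self._complete = False
        self._cond = threading.Condition()

    @property
    def marker(self):
        # Offset up to which data is available locally.
        with self._cond:
            return len(self._data)

    def feed(self, chunk, last=False):
        # Called by the fetcher as more of the file arrives.
        with self._cond:
            self._data += chunk
            self._complete = self._complete or last
            self._cond.notify_all()

    def read(self, offset, size):
        end = offset + size
        with self._cond:
            # Within the marker this returns at once; beyond it, wait until
            # enough data has streamed in or the whole file has arrived.
            self._cond.wait_for(
                lambda: len(self._data) >= end or self._complete)
            return bytes(self._data[offset:end])
```

A read near the end of a large file has to wait for everything before it, which is why this scheme suits streaming access better than true random access.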
* Re: FUSE merging? [not found] ` <20050704084900.GG15370@elf.ucw.cz> @ 2005-07-04 9:02 ` Miklos Szeredi 2005-07-04 10:46 ` Pekka Enberg 0 siblings, 1 reply; 78+ messages in thread From: Miklos Szeredi @ 2005-07-04 9:02 UTC (permalink / raw) To: pavel; +Cc: akpm, aia21, arjan, linux-kernel, frankvm [CC restored] > Okay, I just wanted to mention CODA. Modifying CODA is probably still > better than modifying NFS (as akpm suggested at one point). Definitely. Here are some numbers on the size of these filesystems in current -mm ('wc fs/${fs}/* include/linux/${fs}*') nfs: 25495 9p: 6102 coda: 4752 fuse: 3733 I'm sure FUSE came out smallest because I'm biased and did something wrong ;) Miklos ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: FUSE merging? 2005-07-04 9:02 ` Miklos Szeredi @ 2005-07-04 10:46 ` Pekka Enberg 0 siblings, 0 replies; 78+ messages in thread From: Pekka Enberg @ 2005-07-04 10:46 UTC (permalink / raw) To: Miklos Szeredi; +Cc: pavel, akpm, aia21, arjan, linux-kernel, frankvm On 7/4/05, Miklos Szeredi <miklos@szeredi.hu> wrote: > Here are some numbers on the size these filesystems as in current -mm > ('wc fs/${fs}/* include/linux/${fs}*') Sloccount [1] gives more meaningful numbers than wc: ('sloccount fs/${fs}/* include/linux/${fs}*') nfs: 21,046 9p: 3,856 coda: 3,358 fuse: 2,829 1. http://www.dwheeler.com/sloccount/ Pekka ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: FUSE merging? 2005-07-01 6:50 ` Andrew Morton 2005-07-01 7:07 ` Miklos Szeredi @ 2005-07-01 12:37 ` bert hubert 1 sibling, 0 replies; 78+ messages in thread From: bert hubert @ 2005-07-01 12:37 UTC (permalink / raw) To: Andrew Morton; +Cc: Miklos Szeredi, aia21, arjan, linux-kernel, frankvm On Thu, Jun 30, 2005 at 11:50:59PM -0700, Andrew Morton wrote: > Speaking of which, dumb question: what does FUSE offer over simply using > NFS protocol to talk to the userspace filesystem driver? NFS does not currently engender warm feelings wrt ease of programming and quality in general - especially under Linux sadly enough. It is also a narrow window through which to speak to the rich set of options, flags, attributes and features the Linux kernel offers. I think Solaris used to implement bind mounts through loopback NFS, but that went out of fashion as well. -- http://www.PowerDNS.com Open source, database driven DNS Software http://netherlabs.nl Open and Closed source services ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: FUSE merging? 2005-07-01 6:36 ` FUSE merging? Miklos Szeredi 2005-07-01 6:50 ` Andrew Morton @ 2005-07-01 7:46 ` Frederik Deweerdt 2005-07-01 9:47 ` Miklos Szeredi 2005-07-01 9:36 ` Frank van Maarseveen 2 siblings, 1 reply; 78+ messages in thread From: Frederik Deweerdt @ 2005-07-01 7:46 UTC (permalink / raw) To: Miklos Szeredi; +Cc: akpm, aia21, arjan, linux-kernel, frankvm Le 01/07/05 08:36 +0200, Miklos Szeredi écrivit: > Here's a description of a theoretical DoS scenario: > > http://marc.theaimsgroup.com/?l=linux-fsdevel&m=111522019516694&w=2 > > Miklos Could this be solved by implementing some sort of (optional) timeout on fuse syscalls (in request_send)? Fred -- o---------------------------------------------o | http://open-news.net : l'info alternative | | Tech - Sciences - Politique - International | o---------------------------------------------o ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: FUSE merging? 2005-07-01 7:46 ` Frederik Deweerdt @ 2005-07-01 9:47 ` Miklos Szeredi 0 siblings, 0 replies; 78+ messages in thread From: Miklos Szeredi @ 2005-07-01 9:47 UTC (permalink / raw) To: frederik.deweerdt; +Cc: akpm, aia21, arjan, linux-kernel, frankvm > Could this be solved by implementing some sort of (optional) timeout on fuse > syscalls (in request_send)? Yes, but that would be a thousand times worse than the current solution. You just can't know in advance what a "sane" timeout value is. Miklos ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: FUSE merging? 2005-07-01 6:36 ` FUSE merging? Miklos Szeredi 2005-07-01 6:50 ` Andrew Morton 2005-07-01 7:46 ` Frederik Deweerdt @ 2005-07-01 9:36 ` Frank van Maarseveen 2005-07-01 10:45 ` Miklos Szeredi 2 siblings, 1 reply; 78+ messages in thread From: Frank van Maarseveen @ 2005-07-01 9:36 UTC (permalink / raw) To: Miklos Szeredi; +Cc: akpm, aia21, arjan, linux-kernel, frankvm On Fri, Jul 01, 2005 at 08:36:02AM +0200, Miklos Szeredi wrote: > > Here's a description of a theoretical DoS scenario: > > http://marc.theaimsgroup.com/?l=linux-fsdevel&m=111522019516694&w=2 So the open() hangs indefinitely. But what if blackhat tries to install a package from a no longer existing server on /net or via NFS? A user supplied pathname is not to be trusted by any setuid (or full root) program. Another example: I'm not sure if there are still /dev/tty devices which may block indefinitely upon open() but: - I have yet to see a setuid program which always uses O_NONBLOCK when opening user supplied pathnames. - one cannot stat() and then open() because that gives a race. -- Frank ^ permalink raw reply [flat|nested] 78+ messages in thread
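Frank's two points (open with O_NONBLOCK, and do not stat() before open()) correspond to a common defensive pattern: open first, then inspect the descriptor with fstat(). Below is a minimal sketch in Python using the standard os and stat modules; `open_regular_file` is a hypothetical helper name. As the thread itself notes, this only avoids blocking on things like ttys and FIFOs and would not by itself help against a filesystem whose open() never returns.

```python
import os
import stat


def open_regular_file(path):
    """Open a user supplied path read-only, accepting regular files only."""
    # O_NONBLOCK keeps open() from waiting on a FIFO or a tty line.
    fd = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
    try:
        # fstat() examines the object that was actually opened, so there is
        # no window between the check and the open as with stat() + open().
        if not stat.S_ISREG(os.fstat(fd).st_mode):
            raise OSError(f"{path!r} is not a regular file")
    except BaseException:
        os.close(fd)
        raise
    return fd
```

The caller owns the returned descriptor and must close it.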
* Re: FUSE merging? 2005-07-01 9:36 ` Frank van Maarseveen @ 2005-07-01 10:45 ` Miklos Szeredi 2005-07-01 11:34 ` Frank van Maarseveen 0 siblings, 1 reply; 78+ messages in thread From: Miklos Szeredi @ 2005-07-01 10:45 UTC (permalink / raw) To: frankvm; +Cc: akpm, aia21, arjan, linux-kernel, frankvm > > > > Here's a description of a theoretical DoS scenario: > > > > http://marc.theaimsgroup.com/?l=linux-fsdevel&m=111522019516694&w=2 > > So the open() hangs indefinately. but what if blackhat tries to install > a package from a no longer existing server on /net or via NFS? > > A user supplied pathname is not to be trusted by any setuid (or full > root) program. If /net won't detect a dead server within a timeout, I think it can be considered broken. > Another example: I'm not sure if there are still /dev/tty devices which > may block indefinately upon open() but: > > - I have yet to see a setuid program which always uses O_NONBLOCK > when opening user supplied pathnames. > - one cannot stat() and then open() because that gives a race. Is "being already broken" an excuse for preventing future breakage, when these are fixed? Miklos ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: FUSE merging? 2005-07-01 10:45 ` Miklos Szeredi @ 2005-07-01 11:34 ` Frank van Maarseveen 0 siblings, 0 replies; 78+ messages in thread From: Frank van Maarseveen @ 2005-07-01 11:34 UTC (permalink / raw) To: Miklos Szeredi; +Cc: frankvm, akpm, aia21, arjan, linux-kernel On Fri, Jul 01, 2005 at 12:45:22PM +0200, Miklos Szeredi wrote: > > > > > > Here's a description of a theoretical DoS scenario: > > > > > > http://marc.theaimsgroup.com/?l=linux-fsdevel&m=111522019516694&w=2 > > > > So the open() hangs indefinately. but what if blackhat tries to install > > a package from a no longer existing server on /net or via NFS? > > > > A user supplied pathname is not to be trusted by any setuid (or full > > root) program. > > If /net won't detect a dead server within a timeout, I think it can be > considered broken. > > > Another example: I'm not sure if there are still /dev/tty devices which > > may block indefinately upon open() but: > > > > - I have yet to see a setuid program which always uses O_NONBLOCK > > when opening user supplied pathnames. > > - one cannot stat() and then open() because that gives a race. > > Is "being already broken" an excuse for preventing future breakage, > when these are fixed? All this breakage points into the same direction: A user supplied pathname is not to be trusted by any setuid (or full root) program. -- Frank ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: FUSE merging? 2005-06-30 10:00 ` Arjan van de Ven 2005-06-30 10:12 ` Miklos Szeredi @ 2005-06-30 10:16 ` Miklos Szeredi 2005-06-30 16:30 ` Pavel Machek 1 sibling, 1 reply; 78+ messages in thread From: Miklos Szeredi @ 2005-06-30 10:16 UTC To: arjan; +Cc: akpm, linux-kernel > if you are so interested in getting fuse merged... why not merge it > first with the security stuff removed entirely. And then start > discussing putting security stuff back in ? BTW, I can split out the security stuff into a separate patch from the rest, if people feel more comfortable discussing a concrete patch, instead of a range of lines (actually a 15 line function) of the whole. Miklos
* Re: FUSE merging? 2005-06-30 10:16 ` Miklos Szeredi @ 2005-06-30 16:30 ` Pavel Machek 0 siblings, 0 replies; 78+ messages in thread From: Pavel Machek @ 2005-06-30 16:30 UTC To: Miklos Szeredi; +Cc: arjan, akpm, linux-kernel Hi! > > if you are so interested in getting fuse merged... why not merge it > > first with the security stuff removed entirely. And then start > > discussing putting security stuff back in ? > > BTW, I can split out the security stuff into a separate patch from the > rest, if people feel more comfortable discussing a concrete patch, > instead of a range of lines (actually a 15 line function) of the > whole. Yes, I think that would help. [And also make it last in the series ;-)] Pavel -- teflon -- maybe it is a trademark, but it should not be.
[parent not found: <4ly7J-14H-9@gated-at.bofh.it>]
[parent not found: <4lRDA-4U-55@gated-at.bofh.it>]
[parent not found: <4lSJa-16Z-7@gated-at.bofh.it>]
[parent not found: <4m5ZG-2ok-1@gated-at.bofh.it>]
[parent not found: <4maPM-5XJ-5@gated-at.bofh.it>]
[parent not found: <4mcHV-7no-21@gated-at.bofh.it>]
[parent not found: <4mduc-7Zg-1@gated-at.bofh.it>]
[parent not found: <4mfcJ-UT-17@gated-at.bofh.it>]
[parent not found: <4mitV-3mL-3@gated-at.bofh.it>]
[parent not found: <4mv7Q-2Ki-19@gated-at.bofh.it>]
[parent not found: <4mwdG-3AP-15@gated-at.bofh.it>]
[parent not found: <4mwwX-3N9-9@gated-at.bofh.it>]
* Re: FUSE merging? (2) [not found] ` <4mwwX-3N9-9@gated-at.bofh.it> @ 2005-07-04 13:09 ` Bodo Eggert 2005-07-04 13:17 ` Miklos Szeredi 0 siblings, 1 reply; 78+ messages in thread From: Bodo Eggert @ 2005-07-04 13:09 UTC To: Miklos Szeredi, frankvm, miklos, frankvm, akpm, aia21, arjan, linux-kernel Miklos Szeredi <miklos@szeredi.hu> wrote: > I see your point. But then this is really not a security issue, but > an "are you sure you want to format C:" style protection for the > user's own sake. Adding a mount option (checked by the library) for > this would be fine. E.g. with "mount_nonempty" it would not refuse to > mount on a non-leaf dir, and README would document, that using this > option might cause trouble. Otherwise the mount would be refused with > a reference to the above option. IMO that should be a generic mount option, not FUSE specific. Maybe the default could vary for each fs, but I'd vote against that. -- I thank GMX for sabotaging the use of my addresses by means of lies spread via SPF.
* Re: FUSE merging? (2) 2005-07-04 13:09 ` FUSE merging? (2) Bodo Eggert @ 2005-07-04 13:17 ` Miklos Szeredi 2005-07-04 15:19 ` Ragnar Kjørstad 0 siblings, 1 reply; 78+ messages in thread From: Miklos Szeredi @ 2005-07-04 13:17 UTC To: 7eggert; +Cc: akpm, aia21, arjan, linux-kernel > > I see your point. But then this is really not a security issue, but > > an "are you sure you want to format C:" style protection for the > > user's own sake. Adding a mount option (checked by the library) for > > this would be fine. E.g. with "mount_nonempty" it would not refuse to > > mount on a non-leaf dir, and README would document, that using this > > option might cause trouble. Otherwise the mount would be refused with > > a reference to the above option. > > IMO that should be a generic mount option, not FUSE specific. > Maybe the default could vary for each fs, but I'd vote against that. The option only makes sense with the default being restrictive. But making that default for all filesystems can't be done, because that would immediately break thousands of existing installations. I think this makes some sense for unprivileged mounts, but otherwise not really. If sysadmin is not careful about where the mounts go, tough luck on him. Miklos
* Re: FUSE merging? (2) 2005-07-04 13:17 ` Miklos Szeredi @ 2005-07-04 15:19 ` Ragnar Kjørstad 0 siblings, 0 replies; 78+ messages in thread From: Ragnar Kjørstad @ 2005-07-04 15:19 UTC To: Miklos Szeredi; +Cc: 7eggert, akpm, aia21, arjan, linux-kernel On Mon, Jul 04, 2005 at 03:17:35PM +0200, Miklos Szeredi wrote: > > > I see your point. But then this is really not a security issue, but > > > an "are you sure you want to format C:" style protection for the > > > user's own sake. Adding a mount option (checked by the library) for > > > this would be fine. E.g. with "mount_nonempty" it would not refuse to > > > mount on a non-leaf dir, and README would document, that using this > > > option might cause trouble. Otherwise the mount would be refused with > > > a reference to the above option. > > > > IMO that should be a generic mount option, not FUSE specific. > > Maybe the default could vary for each fs, but I'd vote against that. Why a mount option at all? Why not just a switch for the mount utility? > The option only makes sense with the default being restrictive. But > making that default for all filesystems can't be done, because that > would immediately break thousands of existing installations. I think it is acceptable to change this behaviour in a new version of the mount utility. One could consider ignoring the restriction when running with "-a" or when running as root - that would reduce or eliminate the problems with the transition. However, if this is implemented in mount itself, it is totally orthogonal to the FUSE merge discussion. -- Ragnar Kjørstad Software Engineer Scali - http://www.scali.com Scaling the Linux Datacenter
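The userspace check discussed in the last three messages (refuse to mount over a directory that already has entries unless the user opts in) could look roughly like the sketch below. The names mountpoint_allowed and allow_nonempty are hypothetical; the thread only proposes an option called "mount_nonempty" and does not show an implementation. The sketch assumes the mountpoint is a directory and reads "non-leaf dir" as "directory with at least one entry".

```python
import os


def mountpoint_allowed(path, allow_nonempty=False):
    """Return True if a mount on directory `path` should proceed.

    Hypothetical policy check: restrictive by default, overridable by
    the caller (standing in for the proposed "mount_nonempty" option).
    """
    if allow_nonempty:
        return True
    with os.scandir(path) as entries:
        # scandir() never yields "." or "..", so a single entry is
        # enough to know the directory is not empty.
        return next(entries, None) is None
```

Placing the check in the mount helper instead of the kernel matches the point made above that this is a protection for the user's own sake, not a security boundary: another process can still populate the directory between the check and the mount.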
end of thread, other threads:[~2005-07-04 15:20 UTC | newest]
Thread overview: 78+ messages
2005-06-30 9:19 FUSE merging? Miklos Szeredi
2005-06-30 9:27 ` Andrew Morton
2005-06-30 9:51 ` Miklos Szeredi
2005-06-30 10:00 ` Arjan van de Ven
2005-06-30 10:12 ` Miklos Szeredi
2005-06-30 10:20 ` Arjan van de Ven
2005-06-30 10:24 ` Miklos Szeredi
2005-06-30 19:39 ` Avuton Olrich
2005-07-01 6:23 ` Miklos Szeredi
2005-06-30 11:13 ` Anton Altaparmakov
2005-06-30 19:46 ` Andrew Morton
2005-06-30 20:00 ` Andrew Morton
2005-07-01 6:40 ` Miklos Szeredi
2005-06-30 22:28 ` Frank van Maarseveen
2005-07-01 6:58 ` Miklos Szeredi
2005-07-01 9:24 ` Frank van Maarseveen
2005-07-01 10:27 ` Miklos Szeredi
2005-07-01 12:00 ` Frank van Maarseveen
2005-07-01 12:36 ` Miklos Szeredi
2005-07-01 13:05 ` Frank van Maarseveen
2005-07-01 13:21 ` Miklos Szeredi
2005-07-01 15:20 ` Frank van Maarseveen
2005-07-01 17:04 ` Miklos Szeredi
2005-07-01 18:04 ` Frank van Maarseveen
2005-07-01 19:35 ` Jeremy Maitin-Shepard
2005-07-02 14:49 ` Miklos Szeredi
2005-07-02 16:00 ` Frank van Maarseveen
2005-07-03 6:16 ` Miklos Szeredi
2005-07-03 11:25 ` Frank van Maarseveen
2005-07-03 13:24 ` Miklos Szeredi
2005-07-03 13:50 ` Frank van Maarseveen
2005-07-03 14:03 ` Miklos Szeredi
2005-07-03 14:10 ` FUSE merging? (2) Frank van Maarseveen
2005-07-03 15:47 ` Miklos Szeredi
2005-07-03 19:36 ` Frank van Maarseveen
2005-07-04 8:56 ` Miklos Szeredi
2005-07-04 9:59 ` Frank van Maarseveen
2005-07-04 10:27 ` Miklos Szeredi
2005-07-04 11:26 ` Frank van Maarseveen
2005-07-01 6:36 ` FUSE merging? Miklos Szeredi
2005-07-01 6:50 ` Andrew Morton
2005-07-01 7:07 ` Miklos Szeredi
2005-07-01 7:14 ` Andrew Morton
2005-07-01 7:27 ` Miles Bader
2005-07-01 7:38 ` Miklos Szeredi
2005-07-01 8:02 ` Andrew Morton
2005-07-01 10:11 ` Miklos Szeredi
2005-07-01 11:29 ` Andrew Morton
2005-07-01 12:00 ` Miklos Szeredi
2005-07-01 12:53 ` Anton Altaparmakov
2005-07-01 13:07 ` Anton Altaparmakov
2005-07-01 13:51 ` Frank van Maarseveen
2005-07-01 13:29 ` Eric Van Hensbergen
2005-07-01 16:45 ` Matthias Urlichs
2005-07-01 12:08 ` Frank van Maarseveen
2005-07-01 13:21 ` Eric Van Hensbergen
2005-07-01 13:53 ` Miklos Szeredi
2005-07-01 14:18 ` Eric Van Hensbergen
2005-07-01 14:31 ` Miklos Szeredi
2005-07-02 10:01 ` Eric W. Biederman
2005-07-02 14:58 ` Miklos Szeredi
2005-07-02 16:43 ` Eric Van Hensbergen
2005-07-02 17:33 ` Eric W. Biederman
2005-07-03 19:39 ` Pavel Machek
2005-07-04 8:38 ` Miklos Szeredi
[not found] ` <20050704084900.GG15370@elf.ucw.cz>
2005-07-04 9:02 ` Miklos Szeredi
2005-07-04 10:46 ` Pekka Enberg
2005-07-01 12:37 ` bert hubert
2005-07-01 7:46 ` Frederik Deweerdt
2005-07-01 9:47 ` Miklos Szeredi
2005-07-01 9:36 ` Frank van Maarseveen
2005-07-01 10:45 ` Miklos Szeredi
2005-07-01 11:34 ` Frank van Maarseveen
2005-06-30 10:16 ` Miklos Szeredi
2005-06-30 16:30 ` Pavel Machek
[not found] <4ly7J-14H-9@gated-at.bofh.it>
[not found] ` <4lRDA-4U-55@gated-at.bofh.it>
[not found] ` <4lSJa-16Z-7@gated-at.bofh.it>
[not found] ` <4m5ZG-2ok-1@gated-at.bofh.it>
[not found] ` <4maPM-5XJ-5@gated-at.bofh.it>
[not found] ` <4mcHV-7no-21@gated-at.bofh.it>
[not found] ` <4mduc-7Zg-1@gated-at.bofh.it>
[not found] ` <4mfcJ-UT-17@gated-at.bofh.it>
[not found] ` <4mitV-3mL-3@gated-at.bofh.it>
[not found] ` <4mv7Q-2Ki-19@gated-at.bofh.it>
[not found] ` <4mwdG-3AP-15@gated-at.bofh.it>
[not found] ` <4mwwX-3N9-9@gated-at.bofh.it>
2005-07-04 13:09 ` FUSE merging? (2) Bodo Eggert
2005-07-04 13:17 ` Miklos Szeredi
2005-07-04 15:19 ` Ragnar Kjørstad