FUSE merging?

public inbox for linux-kernel@vger.kernel.org

* FUSE merging?
@ 2005-06-30  9:19 Miklos Szeredi
  2005-06-30  9:27 ` Andrew Morton
  0 siblings, 1 reply; 80+ messages in thread
From: Miklos Szeredi @ 2005-06-30  9:19 UTC (permalink / raw)
  To: akpm; +Cc: linux-kernel

Hi Andrew!

What's up with FUSE merging?  Is there anything pending that I should
do?

Ted Ts'o's ideas about selective access to mountpoints are
interesting, but I wouldn't consider them merge critical, as they
solve a problem that hasn't yet come up in real life.

Thanks,
Miklos

* Re: FUSE merging?
  2005-06-30  9:19 Miklos Szeredi
@ 2005-06-30  9:27 ` Andrew Morton
  2005-06-30  9:51   ` Miklos Szeredi
  0 siblings, 1 reply; 80+ messages in thread
From: Andrew Morton @ 2005-06-30  9:27 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: linux-kernel

Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> What's up with FUSE merging?  Is there anything pending that I should
>  do?

Where are we up to with the fuse_allow_task() bunfight?

* Re: FUSE merging?
  2005-06-30  9:27 ` Andrew Morton
@ 2005-06-30  9:51   ` Miklos Szeredi
  2005-06-30 10:00     ` Arjan van de Ven
  0 siblings, 1 reply; 80+ messages in thread
From: Miklos Szeredi @ 2005-06-30  9:51 UTC (permalink / raw)
  To: akpm; +Cc: linux-kernel

> > What's up with FUSE merging?  Is there anything pending that I should
> >  do?
> 
> Where are we up to with the fuse_allow_task() bunfight?

I think we agreed that there seem to be no alternatives.

Tytso said that the fuse_allow_task() thing is basically OK, but there
should be some method to make certain tasks exempt from this
limitation.  I agree with this, but I think there should be at least
one user (preferably more) who actually needs this before I start
thinking about implementing it.

Making a mount exempt is already possible with the 'allow_other'
(privileged by default) mount option.

Miklos

* Re: FUSE merging?
  2005-06-30  9:51   ` Miklos Szeredi
@ 2005-06-30 10:00     ` Arjan van de Ven
  2005-06-30 10:12       ` Miklos Szeredi
  2005-06-30 10:16       ` Miklos Szeredi
  0 siblings, 2 replies; 80+ messages in thread
From: Arjan van de Ven @ 2005-06-30 10:00 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: akpm, linux-kernel

On Thu, 2005-06-30 at 11:51 +0200, Miklos Szeredi wrote:
> > > What's up with FUSE merging?  Is there anything pending that I should
> > >  do?
> > 
> > Where are we up to with the fuse_allow_task() bunfight?
> 
> I think we agreed that there seem to be no alternatives.
> 
> Tytso said that the fuse_allow_task() thing is basically OK, but there
> should be some method to make certain tasks exempt from this
> limitation.  I agree with this, but I think there should be at least
> one user (preferably more) who actually needs this before I start
> thinking about implementing it.
> 
> Making a mount exempt is already possible with the 'allow_other'
> (privileged by default) mount option.

if you are so interested in getting fuse merged... why not merge it
first with the security stuff removed entirely. And then start
discussing putting security stuff back in ?



* Re: FUSE merging?
  2005-06-30 10:00     ` Arjan van de Ven
@ 2005-06-30 10:12       ` Miklos Szeredi
  2005-06-30 10:20         ` Arjan van de Ven
  2005-06-30 10:16       ` Miklos Szeredi
  1 sibling, 1 reply; 80+ messages in thread
From: Miklos Szeredi @ 2005-06-30 10:12 UTC (permalink / raw)
  To: arjan; +Cc: akpm, linux-kernel

> if you are so interested in getting fuse merged... why not merge it
> first with the security stuff removed entirely. And then start
> discussing putting security stuff back in ?

  a) it's already been discussed to death (just search for 'fuse' on
     lkml and fsdevel)

  b) I don't consider it a good idea to ship a defunct version of it in
     the mainline

Can you please accept my wish to have FUSE merged _with_ the
unprivileged mounts thing?

If anybody has anything to add to the discussion, please do it now,
and not later.  Delaying this further won't get us any bonus IMO.

Miklos

* Re: FUSE merging?
  2005-06-30 10:00     ` Arjan van de Ven
  2005-06-30 10:12       ` Miklos Szeredi
@ 2005-06-30 10:16       ` Miklos Szeredi
  2005-06-30 16:30         ` Pavel Machek
  1 sibling, 1 reply; 80+ messages in thread
From: Miklos Szeredi @ 2005-06-30 10:16 UTC (permalink / raw)
  To: arjan; +Cc: akpm, linux-kernel

> if you are so interested in getting fuse merged... why not merge it
> first with the security stuff removed entirely. And then start
> discussing putting security stuff back in ?

BTW, I can split out the security stuff into a separate patch from the
rest, if people feel more comfortable discussing a concrete patch,
instead of a range of lines (actually a 15 line function) of the
whole.

Miklos



* Re: FUSE merging?
  2005-06-30 10:12       ` Miklos Szeredi
@ 2005-06-30 10:20         ` Arjan van de Ven
  2005-06-30 10:24           ` Miklos Szeredi
  2005-06-30 11:13           ` Anton Altaparmakov
  0 siblings, 2 replies; 80+ messages in thread
From: Arjan van de Ven @ 2005-06-30 10:20 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: akpm, linux-kernel

On Thu, 2005-06-30 at 12:12 +0200, Miklos Szeredi wrote:
> > if you are so interested in getting fuse merged... why not merge it
> > first with the security stuff removed entirely. And then start
> > discussing putting security stuff back in ?
> 
>   a) it's already been discussed to death (just search for 'fuse' on
>      lkml and fsdevel)
> 
>   b) I don't consider it a good idea to ship a defunct version of it in
>      the mainline
> 
> Can you please accept my wish to have FUSE merged _with_ the
> unprivileged mounts thing?

By the same argument:
Then can you please accept that FUSE will not get merged right now.



* Re: FUSE merging?
  2005-06-30 10:20         ` Arjan van de Ven
@ 2005-06-30 10:24           ` Miklos Szeredi
  2005-06-30 19:39             ` Avuton Olrich
  2005-06-30 11:13           ` Anton Altaparmakov
  1 sibling, 1 reply; 80+ messages in thread
From: Miklos Szeredi @ 2005-06-30 10:24 UTC (permalink / raw)
  To: arjan; +Cc: akpm, linux-kernel

> By the same argument:
> Then can you please accept that FUSE will not get merged right now.

Yes.

My argument is: IF it's not going to get merged now, can we please
continue the discussion about why it's unacceptable, and what the
alternatives are?

Is that fair?

Miklos

* Re: FUSE merging?
  2005-06-30 10:20         ` Arjan van de Ven
  2005-06-30 10:24           ` Miklos Szeredi
@ 2005-06-30 11:13           ` Anton Altaparmakov
  2005-06-30 19:46             ` Andrew Morton
  1 sibling, 1 reply; 80+ messages in thread
From: Anton Altaparmakov @ 2005-06-30 11:13 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: Miklos Szeredi, akpm, linux-kernel

On Thu, 2005-06-30 at 12:20 +0200, Arjan van de Ven wrote:
> On Thu, 2005-06-30 at 12:12 +0200, Miklos Szeredi wrote:
> > > if you are so interested in getting fuse merged... why not merge it
> > > first with the security stuff removed entirely. And then start
> > > discussing putting security stuff back in ?
> > 
> >   a) it's already been discussed to death (just search for 'fuse' on
> >      lkml and fsdevel)
> > 
> >   b) I don't consider it a good idea to ship a defunct version of it in
> >      the mainline
> > 
> > Can you please accept my wish to have FUSE merged _with_ the
> > unprivileged mounts thing?
> 
> By the same argument:
> Then can you please accept that FUSE will not get merged right now.

Why should he?  IMNSHO it should be merged right now with the security
stuff.  FUSE works as is.  Without the security stuff FUSE is useless.

I have yet to read even a single constructive argument why it should not
be merged as is.

Best regards,

        Anton
-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/


* Re: FUSE merging?
  2005-06-30 10:16       ` Miklos Szeredi
@ 2005-06-30 16:30         ` Pavel Machek
  0 siblings, 0 replies; 80+ messages in thread
From: Pavel Machek @ 2005-06-30 16:30 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: arjan, akpm, linux-kernel

Hi!

> > if you are so interested in getting fuse merged... why not merge it
> > first with the security stuff removed entirely. And then start
> > discussing putting security stuff back in ?
> 
> BTW, I can split out the security stuff into a separate patch from the
> > rest, if people feel more comfortable discussing a concrete patch,
> instead of a range of lines (actually a 15 line function) of the
> whole.

Yes, I think that would help. [And also make it last in the series
;-)]
								Pavel
-- 
teflon -- maybe it is a trademark, but it should not be.

* Re: FUSE merging?
  2005-06-30 10:24           ` Miklos Szeredi
@ 2005-06-30 19:39             ` Avuton Olrich
  2005-07-01  6:23               ` Miklos Szeredi
  0 siblings, 1 reply; 80+ messages in thread
From: Avuton Olrich @ 2005-06-30 19:39 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: arjan, akpm, linux-kernel

On 6/30/05, Miklos Szeredi <miklos@szeredi.hu> wrote:
> > Then can you please accept that FUSE will not get merged right now.
> My argument is: IF it's not going to get merged now, can we please
> continue the discussion about why it's unacceptable, and what are the
> alternatives.

Why has there not been more discussion about just making an option for
those 15 lines, just for merging's sake, and hopefully after more
discussion, the option will go away one way or another. On the other
hand everyone says security, security, security and I don't remember
one person actually saying something negative about what it does to
security.

avuton

-- 
  Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

* Re: FUSE merging?
  2005-06-30 11:13           ` Anton Altaparmakov
@ 2005-06-30 19:46             ` Andrew Morton
  2005-06-30 20:00               ` Andrew Morton
                                 ` (2 more replies)
  0 siblings, 3 replies; 80+ messages in thread
From: Andrew Morton @ 2005-06-30 19:46 UTC (permalink / raw)
  To: Anton Altaparmakov; +Cc: arjan, miklos, linux-kernel, Frank van Maarseveen

Anton Altaparmakov <aia21@cam.ac.uk> wrote:
>
> On Thu, 2005-06-30 at 12:20 +0200, Arjan van de Ven wrote:
> > On Thu, 2005-06-30 at 12:12 +0200, Miklos Szeredi wrote:
> > > > if you are so interested in getting fuse merged... why not merge it
> > > > first with the security stuff removed entirely. And then start
> > > > discussing putting security stuff back in ?
> > > 
> > >   a) it's already been discussed to death (just search for 'fuse' on
> > >      lkml and fsdevel)
> > > 
> > >   b) I don't consider it a good idea to ship a defunct version of it in
> > >      the mainline
> > > 
> > > Can you please accept my wish to have FUSE merged _with_ the
> > > unprivileged mounts thing?
> > 
> > By the same argument:
> > Then can you please accept that FUSE will not get merged right now.
> 
> Why should he?  IMNSHO it should be merged right now with the security
> stuff.  FUSE works as is.  Without the security stuff FUSE is useless.
> 
> I have yet to read even a single constructive argument why it should not
> be merged as is.

I believe that the requirement which fuse_allow_task() attempts to satisfy
is legitimate and is useful to FUSE users.

The fact that, AFAIK, nobody has found a way to implement it more nicely is
a Linux problem, not a FUSE problem.

Given that the actual amount of code involved is small, centralised and
well known about we can easily fix it up later if/when new infrastructure
or new ideas become available.

So unless someone is able to come up with a better approach in the next few
days I'm inclined to say "we suck" and merge the thing as-is.

However, a few things:

- is there anything in the current implementation of the permission stuff
  which might tie our hands if it is later reimplemented?  IOW: does the
  current FUSE user interface in any way lock us into the current FUSE
  implementation (fuse_allow_task())?

- the fuse mount options don't seem to be documented

- aren't we going to remove the nfs semi-server feature?

- Frank points out that a user can send a sigstop to his own setuid(0)
  task and he intimates that this could cause DoS problems with FUSE.  More
  details needed please?

- I don't recall seeing an exhaustive investigation of how an
  unprivileged user could use a FUSE mount to implement DoS attacks against
  other users or against root.

* Re: FUSE merging?
  2005-06-30 19:46             ` Andrew Morton
@ 2005-06-30 20:00               ` Andrew Morton
  2005-07-01  6:40                 ` Miklos Szeredi
  2005-06-30 22:28               ` Frank van Maarseveen
  2005-07-01  6:36               ` Miklos Szeredi
  2 siblings, 1 reply; 80+ messages in thread
From: Andrew Morton @ 2005-06-30 20:00 UTC (permalink / raw)
  To: aia21, arjan, miklos, linux-kernel, frankvm

Andrew Morton <akpm@osdl.org> wrote:
>
>  However, a few things:
> 
>  - is there anything in the current implementation of the permission stuff
>    which might tie our hands if it is later reimplemented?  IOW: does the
>    current FUSE user interface in any way lock us into the current FUSE
>    implementation (fuse_allow_task())?
> 
>  - the fuse mount options don't seem to be documented
> 
>  - aren't we going to remove the nfs semi-server feature?
> 
>  - Frank points out that a user can send a sigstop to his own setuid(0)
>    task and he intimates that this could cause DoS problems with FUSE.  More
>    details needed please?
> 
>  - I don't recall seeing an exhaustive investigation of how an
>    unprivileged user could use a FUSE mount to implement DoS attacks against
>    other users or against root.

You say

  "If a sysadmin trusts the users enough, or can ensure through other
   measures, that system processes will never enter non-privileged mounts,
   it can relax the last limitation with a "user_allow_other" config
   option.  If this config option is set, the mounting user can add the
   "allow_other" mount option which disables the check for other users'
   processes."

What config option, where?


* Re: FUSE merging?
  2005-06-30 19:46             ` Andrew Morton
  2005-06-30 20:00               ` Andrew Morton
@ 2005-06-30 22:28               ` Frank van Maarseveen
  2005-07-01  6:58                 ` Miklos Szeredi
  2005-07-01  6:36               ` Miklos Szeredi
  2 siblings, 1 reply; 80+ messages in thread
From: Frank van Maarseveen @ 2005-06-30 22:28 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Anton Altaparmakov, arjan, miklos, linux-kernel,
	Frank van Maarseveen

On Thu, Jun 30, 2005 at 12:46:22PM -0700, Andrew Morton wrote:
> 
> - Frank points out that a user can send a sigstop to his own setuid(0)
>   task and he intimates that this could cause DoS problems with FUSE.  More
>   details needed please?

It's the other way around:
Apparently it is not a security problem to SIGSTOP or even SIGKILL a
setuid program. So why is it a security problem when such a program is
delayed by a supposedly malicious behaving FUSE mount?

I think that setuid programs take too many things for granted, especially
"time". I also think the ptrace equivalence principle (item C2 in the
FUSE doc) is too harsh for FUSE.

Suppose the process changes id to full root and we can no longer send
signals to it. Are there any other ways we could affect its scheduling
without FUSE? I think "yes", clearly not as easily as when it accesses a
FUSE mount but "yes". Think about typing ^S (XOFF), or letting it read
from a pipe or from a file on a very very slow device. Or renicing
the parent in advance. Regarding the pipe: yes the setuid program could
check that with fstat() but is such a check fundamentally the right
approach? I have doubts because unified I/O is a good thing and there is
no guarantee whatsoever about completion of any FS operation within a
certain amount of time. Suppose another malicious process does a lookup
in a huge directory without hashed names? What about a process consuming
lots of memory, pushing everything else into swap? What about deleting
a _huge_ file or doing other things which might(?) take a considerable
amount of kernel time? [id]notify might even help using this to delay
a root process at a crucial point to exploit a race. So, I think there
are many ways to affect the execution speed of [setuid] programs. I
have never heard of a setuid root program which renices itself such
that it successfully avoids a race or DoS exploit.

And then the DoS thing using simulated endless files within FUSE. It is
already possible to create terabyte sized [sparse] files. Can the fstat()
size/blocks info be trusted from FUSE? No more than fstat() outside FUSE
because the file may still be growing!

> - I don't recall seeing an exhaustive investigation of how an
>   unprivileged user could use a FUSE mount to implement DoS attacks against
>   other users or against root.

In general I think it is _hard_ to protect against a local DoS for many
reasons and I don't see any new fundamental problem here with FUSE:
it is just making it more obvious that it's hard to write secure setuid
programs. Those programs should _know_ that input data and anything else
from the user is "tainted" and that they must be _very_ careful with it,
in every detail.

-- 
Frank

* Re: FUSE merging?
  2005-06-30 19:39             ` Avuton Olrich
@ 2005-07-01  6:23               ` Miklos Szeredi
  0 siblings, 0 replies; 80+ messages in thread
From: Miklos Szeredi @ 2005-07-01  6:23 UTC (permalink / raw)
  To: avuton; +Cc: arjan, akpm, linux-kernel

> > > Then can you please accept that FUSE will not get merged right now.
> > My argument is: IF it's not going to get merged now, can we please
> > continue the discussion about why it's unacceptable, and what are the
> > alternatives.
> 
> Why has there not been more discussion about just making an option for
> those 15 lines, just for merging's sake, and hopefully after more
> discussion, the option will go away one way or another. On the other
> hand everyone says security, security, security and I don't remember
> one person actually saying something negative about what it does to
> security.

There is a mount option: 'allow_other' which does just this.  Or did
you mean a config option?

Thanks,
Miklos

* Re: FUSE merging?
  2005-06-30 19:46             ` Andrew Morton
  2005-06-30 20:00               ` Andrew Morton
  2005-06-30 22:28               ` Frank van Maarseveen
@ 2005-07-01  6:36               ` Miklos Szeredi
  2005-07-01  6:50                 ` Andrew Morton
                                   ` (2 more replies)
  2 siblings, 3 replies; 80+ messages in thread
From: Miklos Szeredi @ 2005-07-01  6:36 UTC (permalink / raw)
  To: akpm; +Cc: aia21, arjan, miklos, linux-kernel, frankvm

> However, a few things:
> 
> - is there anything in the current implementation of the permission stuff
>   which might tie our hands if it is later reimplemented?  IOW: does the
>   current FUSE user interface in any way lock us into the current FUSE
>   implementation (fuse_allow_task())?

No.  This check sits above the userspace interface and is completely
independent of it.  Either a task is allowed, in which case the request
goes through to the interface, or it is not, in which case the request
is stopped right there and never reaches the userspace interface.
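
[Editor's illustration] The gating described above can be modelled in a
few lines.  This is only a sketch under stated assumptions: the names
Mount, allow_task() and handle_request() are hypothetical, the check is
reduced to a single uid comparison plus the 'allow_other' mount option,
and the "EACCES" result is an assumed error code.  It is not the
kernel's fuse_allow_task() implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mount:
    """Hypothetical model of a FUSE mount, for illustration only."""
    owner_uid: int             # user who performed the mount
    allow_other: bool = False  # the 'allow_other' mount option


def allow_task(mount: Mount, task_uid: int) -> bool:
    """Return True if a task with this uid may use the mount.

    Assumption: the real check may compare more credentials than one uid.
    """
    if mount.allow_other:
        return True
    return task_uid == mount.owner_uid


def handle_request(mount: Mount, task_uid: int, forward):
    """Run the permission gate before touching the userspace interface."""
    if not allow_task(mount, task_uid):
        # The request stops here; the userspace daemon never sees it.
        return "EACCES"
    return forward()
```

The point of the layering is that forward(), which stands in for the
userspace interface, is only ever called for allowed tasks, so the gate
could later be replaced without changing that interface.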

> - the fuse mount options don't seem to be documented

True.  I'll send a patch (they are documented in the README of the
fuse distribution).

> - aren't we going to remove the nfs semi-server feature?

I leave the decision to you ;)  It's a separate independent patch
already (fuse-nfs-export.patch).

> - Frank points out that a user can send a sigstop to his own setuid(0)
>   task and he intimates that this could cause DoS problems with FUSE.  More
>   details needed please?

I'll follow up in my reply to Frank's answer.

> - I don't recall seeing an exhaustive investigation of how an
>   unprivileged user could use a FUSE mount to implement DoS attacks against
>   other users or against root.

Here's a description of a theoretical DoS scenario:

  http://marc.theaimsgroup.com/?l=linux-fsdevel&m=111522019516694&w=2

Miklos


* Re: FUSE merging?
  2005-06-30 20:00               ` Andrew Morton
@ 2005-07-01  6:40                 ` Miklos Szeredi
  0 siblings, 0 replies; 80+ messages in thread
From: Miklos Szeredi @ 2005-07-01  6:40 UTC (permalink / raw)
  To: akpm; +Cc: aia21, arjan, miklos, linux-kernel, frankvm

> >  - I don't recall seeing an exhaustive investigation of how an
> >    unprivileged user could use a FUSE mount to implement DoS attacks against
> >    other users or against root.
> 
> You say
> 
>   "If a sysadmin trusts the users enough, or can ensure through other
>    measures, that system processes will never enter non-privileged mounts,
>    it can relax the last limitation with a "user_allow_other" config
>    option.  If this config option is set, the mounting user can add the
>    "allow_other" mount option which disables the check for other users'
>    processes."
> 
> What config option, where?

Currently that's a userspace issue.  There's a /etc/fuse.conf file,
with two options:

  max_mounts=X
  user_allow_other

The fusermount helper reads this file, and decides if passing the
'allow_other' mount option to the kernel is OK or not.

If we want unprivileged sys_mount() these will have to be checked in
kernel (set via sysfs, etc).
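
[Editor's illustration] A minimal sketch of the decision the helper
makes, assuming the file holds one option per line.  The function names
parse_fuse_conf() and may_pass_allow_other() are hypothetical, '#'
comment handling is an assumption, and letting root bypass the check is
inferred from the "privileged by default" remark earlier in the thread.
It is not the fusermount source.

```python
def parse_fuse_conf(text):
    """Parse fuse.conf-style text into a dict of the two options."""
    conf = {"max_mounts": None, "user_allow_other": False}
    for raw in text.splitlines():
        # Assumption: '#' starts a comment that runs to end of line.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        if line == "user_allow_other":
            conf["user_allow_other"] = True
        elif line.startswith("max_mounts="):
            conf["max_mounts"] = int(line.split("=", 1)[1])
    return conf


def may_pass_allow_other(conf, uid):
    """Decide whether 'allow_other' may be passed on to the kernel."""
    # Assumption: root may always use allow_other; other users need
    # the administrator to have set user_allow_other.
    return uid == 0 or conf["user_allow_other"]
```

With an empty configuration, may_pass_allow_other() refuses the option
for an ordinary user and accepts it for uid 0.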

Miklos

* Re: FUSE merging?
  2005-07-01  6:36               ` Miklos Szeredi
@ 2005-07-01  6:50                 ` Andrew Morton
  2005-07-01  7:07                   ` Miklos Szeredi
  2005-07-01 12:37                   ` bert hubert
  2005-07-01  7:46                 ` Frederik Deweerdt
  2005-07-01  9:36                 ` Frank van Maarseveen
  2 siblings, 2 replies; 80+ messages in thread
From: Andrew Morton @ 2005-07-01  6:50 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: aia21, arjan, miklos, linux-kernel, frankvm

Miklos Szeredi <miklos@szeredi.hu> wrote:
>
>  > - aren't we going to remove the nfs semi-server feature?
> 
>  I leave the decision to you ;)  It's a separate independent patch
>  already (fuse-nfs-export.patch).

Let's leave it out - that'll stimulate some activity in the
userspace-nfs-server-for-FUSE area.

Speaking of which, dumb question: what does FUSE offer over simply using
NFS protocol to talk to the userspace filesystem driver?

* Re: FUSE merging?
  2005-06-30 22:28               ` Frank van Maarseveen
@ 2005-07-01  6:58                 ` Miklos Szeredi
  2005-07-01  9:24                   ` Frank van Maarseveen
  0 siblings, 1 reply; 80+ messages in thread
From: Miklos Szeredi @ 2005-07-01  6:58 UTC (permalink / raw)
  To: frankvm; +Cc: akpm, aia21, arjan, miklos, linux-kernel, frankvm

> > 
> > - Frank points out that a user can send a sigstop to his own setuid(0)
> >   task and he intimates that this could cause DoS problems with FUSE.  More
> >   details needed please?
> 
> It's the other way around:
> Apparently it is not a security problem to SIGSTOP or even SIGKILL a
> setuid program. So why is it a security problem when such a program is
> delayed by a supposedly malicious behaving FUSE mount?

Perfectly valid argument.  My question: is it not a security problem
to allow signals to reach a suid program?

> I think that setuid programs take too many things for granted, especially
> "time". I also think the ptrace equivalence principle (item C2 in the
> FUSE doc) is too harsh for FUSE.

It's obviously not equivalence.  A FUSE filesystem gets a subset of
ptrace's capabilities (and a rather small one).

> Suppose the process changes id to full root and we can no longer send
> signals to it. Are there any other ways we could affect its scheduling
> without FUSE? I think "yes", clearly not as easily as when it accesses a
> FUSE mount but "yes". Think about typing ^S (XOFF), or letting it read
> from a pipe or from a file on a very very slow device. Or renicing
> the parent in advance. Regarding the pipe: yes the setuid program could
> check that with fstat() but is such a check fundamentally the right
> approach? I have doubts because unified I/O is a good thing and there is
> no guarantee whatsoever about completion of any FS operation within a
> certain amount of time. Suppose another malicious process does a lookup
> in a huge directory without hashed names? What about a process consuming
> lots of memory, pushing everything else into swap? What about deleting
> a _huge_ file or doing other things which might(?) take a considerable
> amount of kernel time? [id]notify might even help using this to delay
> a root process at a crucial point to exploit a race. So, I think there
> are many ways to affect the execution speed of [setuid] programs. I
> have never heard of a setuid root program which renices itself such
> that it successfully avoids a race or DoS exploit.

There's a huge difference between slowing down and stopping a
process.  I wouldn't consider the first a true DoS.

> And then the DoS thing using simulated endless files within FUSE. It is
> already possible to create terabyte sized [sparse] files. Can the fstat()
> size/blocks info be trusted from FUSE? No more than fstat() outside FUSE
> because the file may still be growing!
> 
> > - I don't recall seeing an exhaustive investigation of how an
> >   unprivileged user could use a FUSE mount to implement DoS attacks against
> >   other users or against root.
> 
> In general I think it is _hard_ to protect against a local DoS for many
> reasons and I don't see any new fundamental problem here with FUSE:
> it is just making it more obvious that it's hard to write secure setuid
> programs. Those programs should _know_ that input data and anything else
> from the user is "tainted" and that they must be _very_ careful with it,
> in every detail.

Yes.  The extra problem with FUSE is that they are not _able_ to be
careful.  They can't even check whether a file is in fact on a FUSE
mount without the FUSE daemon's intervention (a lookup on a file will
be passed to userspace).

Thanks,
Miklos

* Re: FUSE merging?
  2005-07-01  6:50                 ` Andrew Morton
@ 2005-07-01  7:07                   ` Miklos Szeredi
  2005-07-01  7:14                     ` Andrew Morton
  2005-07-01 12:37                   ` bert hubert
  1 sibling, 1 reply; 80+ messages in thread
From: Miklos Szeredi @ 2005-07-01  7:07 UTC (permalink / raw)
  To: akpm; +Cc: aia21, arjan, linux-kernel, frankvm

> >
> >  > - aren't we going to remove the nfs semi-server feature?
> > 
> >  I leave the decision to you ;)  It's a separate independent patch
> >  already (fuse-nfs-export.patch).
> 
> Let's leave it out - that'll stimulate some activity in the
> userspace-nfs-server-for-FUSE area.
> 
> Speaking of which, dumb question: what does FUSE offer over simply using
> NFS protocol to talk to the userspace filesystem driver?

Oh lots:

  - no deadlocks (NFS mounted from localhost is riddled with them)

  - efficient protocol, optimized for less context switches

  - dcache invalidation policy
  
  - probably more, but I can't remember

Miklos

* Re: FUSE merging?
  2005-07-01  7:07                   ` Miklos Szeredi
@ 2005-07-01  7:14                     ` Andrew Morton
  2005-07-01  7:27                       ` Miles Bader
  2005-07-01  7:38                       ` Miklos Szeredi
  0 siblings, 2 replies; 80+ messages in thread
From: Andrew Morton @ 2005-07-01  7:14 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: aia21, arjan, linux-kernel, frankvm

Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> > >
> > >  > - aren't we going to remove the nfs semi-server feature?
> > > 
> > >  I leave the decision to you ;)  It's a separate independent patch
> > >  already (fuse-nfs-export.patch).
> > 
> > Let's leave it out - that'll stimulate some activity in the
> > userspace-nfs-server-for-FUSE area.
> > 
> > Speaking of which, dumb question: what does FUSE offer over simply using
> > NFS protocol to talk to the userspace filesystem driver?
> 
> Oh lots:
> 
>   - no deadlocks (NFS mounted from localhost is riddled with them)

It is?  We had some low-memory problems a while back, but they got fixed. 
During that work I did some nfs-to-localhost testing and things seemed OK.

>   - efficient protocol, optimized for less context switches

One wouldn't really expect a userspace filesystem to be particularly fast,
and the performance will be dominated by memory copies and IO wait anyway.

>   - dcache invalidation policy

What's that?

>   - probably more, but I can't remember

Please do..

* Re: FUSE merging?
  2005-07-01  7:14                     ` Andrew Morton
@ 2005-07-01  7:27                       ` Miles Bader
  2005-07-01  7:38                       ` Miklos Szeredi
  1 sibling, 0 replies; 80+ messages in thread
From: Miles Bader @ 2005-07-01  7:27 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Miklos Szeredi, aia21, arjan, linux-kernel, frankvm

Andrew Morton <akpm@osdl.org> writes:
>>   - efficient protocol, optimized for less context switches
>
> One wouldn't really expect a userspace filesystem to be particularly fast,
> and the performance will be dominated by memory copies and IO wait anyway.

Well there's slow and then there's slow...  numbers are always nice though.

-miles
-- 
[|nurgle|]  ddt- demonic? so quake will have an evil kinda setting? one that
            will  make every christian in the world foamm at the mouth?
[iddt]      nurg, that's the goal

* Re: FUSE merging?
  2005-07-01  7:14                     ` Andrew Morton
  2005-07-01  7:27                       ` Miles Bader
@ 2005-07-01  7:38                       ` Miklos Szeredi
  2005-07-01  8:02                         ` Andrew Morton
  1 sibling, 1 reply; 80+ messages in thread
From: Miklos Szeredi @ 2005-07-01  7:38 UTC (permalink / raw)
  To: akpm; +Cc: miklos, aia21, arjan, linux-kernel, frankvm

> >
> > > >
> > > >  > - aren't we going to remove the nfs semi-server feature?
> > > > 
> > > >  I leave the decision to you ;)  It's a separate independent patch
> > > >  already (fuse-nfs-export.patch).
> > > 
> > > Let's leave it out - that'll stimulate some activity in the
> > > userspace-nfs-server-for-FUSE area.
> > > 
> > > Speaking of which, dumb question: what does FUSE offer over simply using
> > > NFS protocol to talk to the userspace filesystem driver?
> > 
> > Oh lots:
> > 
> >   - no deadlocks (NFS mounted from localhost is riddled with them)
> 
> It is?  We had some low-memory problems a while back, but they got fixed. 
> During that work I did some nfs-to-localhost testing and things seemed OK.

Well, there's the "unsolvable" writeback deadlock problem, that FUSE
works around by not buffering dirty pages (and not allowing writable
mmap).  Does NFS solve that?  I'm interested :)

Then there's the usual "filesystem recursing into itself" deadlock.
Mounting with 'intr' probably solves this for NFS, but that has
unwanted side effects.  FUSE only allows KILL to interrupt a request.

> >   - efficient protocol, optimized for less context switches
> 
> One wouldn't really expect a userspace filesystem to be particularly fast,

FUSE is pretty fast.  Transfer speeds above 100 Mbytes/s on moderate
hardware are not unusual.

> and the performance will be dominated by memory copies and IO wait anyway.

Memory copies don't seem to be an issue (and FUSE does very little of
it).  Performance is mostly dominated by context switch times (if the
underlying filesystem can keep up).  Unfortunately unbuffered writes
mean a separate request for each written page, and thus a context
switch (on UP at least).  This has a marked effect on write
performance.

> >   - dcache invalidation policy
> 
> What's that?

Userspace can tell the kernel how long a dentry should be valid.  I
don't think the NFS protocol provides this.  The same holds for the inode
attributes.
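
As a rough illustration of the per-entry validity idea, here is a toy
model in Python.  It is not FUSE's kernel code or library API: the names
EntryCache and lookup, and the (inode, timeout) reply shape, are
assumptions made up for this sketch.  The only point is that the reply
from the userspace server carries the lifetime, so the cache itself
needs no heuristic.

```python
import time


class EntryCache:
    """Toy name cache where the server chooses each entry's lifetime."""

    def __init__(self, server_lookup, clock=time.monotonic):
        # server_lookup(name) -> (inode, timeout_seconds); hypothetical shape.
        self._server_lookup = server_lookup
        self._clock = clock
        self._entries = {}  # name -> (inode, expiry time)

    def lookup(self, name):
        now = self._clock()
        cached = self._entries.get(name)
        if cached is not None and now < cached[1]:
            return cached[0]  # still valid, no round trip to the server
        # Expired or unknown: ask the server, which also says how long
        # the answer may be trusted.
        inode, timeout = self._server_lookup(name)
        self._entries[name] = (inode, now + timeout)
        return inode
```

A server for a synthetic filesystem whose contents change at any moment
would return a timeout of 0 and be asked every time; one backed by
stable data could return a long timeout.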

> >   - probably more, but I can't remember
> 
> Please do..

OK, I'll do a little research.

Miklos

* Re: FUSE merging?
  2005-07-01  6:36               ` Miklos Szeredi
  2005-07-01  6:50                 ` Andrew Morton
@ 2005-07-01  7:46                 ` Frederik Deweerdt
  2005-07-01  9:47                   ` Miklos Szeredi
  2005-07-01  9:36                 ` Frank van Maarseveen
  2 siblings, 1 reply; 80+ messages in thread
From: Frederik Deweerdt @ 2005-07-01  7:46 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: akpm, aia21, arjan, linux-kernel, frankvm

Le 01/07/05 08:36 +0200, Miklos Szeredi écrivit:
> Here's a description of a theoretical DoS scenario:
> 
>   http://marc.theaimsgroup.com/?l=linux-fsdevel&m=111522019516694&w=2
> 
> Miklos
Could this be solved by implementing some sort of (optional) timeout on fuse
syscalls (in request_send)?

Fred

-- 
o---------------------------------------------o
| http://open-news.net : l'info alternative   |
| Tech - Sciences - Politique - International |
o---------------------------------------------o

* Re: FUSE merging?
  2005-07-01  7:38                       ` Miklos Szeredi
@ 2005-07-01  8:02                         ` Andrew Morton
  2005-07-01 10:11                           ` Miklos Szeredi
  2005-07-03 19:39                           ` Pavel Machek
  0 siblings, 2 replies; 80+ messages in thread
From: Andrew Morton @ 2005-07-01  8:02 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: miklos, aia21, arjan, linux-kernel, frankvm

Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> > >
> > > > >
> > > > >  > - aren't we going to remove the nfs semi-server feature?
> > > > > 
> > > > >  I leave the decision to you ;)  It's a separate independent patch
> > > > >  already (fuse-nfs-export.patch).
> > > > 
> > > > Let's leave it out - that'll stimulate some activity in the
> > > > userspace-nfs-server-for-FUSE area.
> > > > 
> > > > Speaking of which, dumb question: what does FUSE offer over simply using
> > > > NFS protocol to talk to the userspace filesystem driver?
> > > 
> > > Oh lots:
> > > 
> > >   - no deadlocks (NFS mounted from localhost is riddled with them)
> > 
> > It is?  We had some low-memory problems a while back, but they got fixed. 
> > During that work I did some nfs-to-localhost testing and things seemed OK.
> 
> Well, there's the "unsolvable" writeback deadlock problem, that FUSE
> works around by not buffering dirty pages (and not allowing writable
> mmap).  Does NFS solve that?  I'm interested :)

I don't know - first you'd have to describe it.

> Then there's the usual "filesystem recursing into itself" deadlock.

Describe this completely as well, please.

> Mounting with 'intr' probably solves this for NFS, but that has
> unwanted side effects.  FUSE only allows KILL to interrupt a request.

Maybe these things can be solved in NFS?

> > >   - dcache invalidation policy
> > 
> > What's that?
> 
> Userspace can tell the kernel, how long a dentry should be valid.  I
> don't think the NFS protocol provides this. Same holds for the inode
> attributes.

Why is that needed?

> > >   - probably more, but I can't remember
> > 
> > Please do..
> 
> OK, I'll do a little research.
> 

v9fs has a user-level server too.  Maybe it has been used in FUSE-like
scenarios more than NFS.

Plus NFS and v9fs work across the network...

* Re: FUSE merging?
  2005-07-01  6:58                 ` Miklos Szeredi
@ 2005-07-01  9:24                   ` Frank van Maarseveen
  2005-07-01 10:27                     ` Miklos Szeredi
  0 siblings, 1 reply; 80+ messages in thread
From: Frank van Maarseveen @ 2005-07-01  9:24 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: frankvm, akpm, aia21, arjan, linux-kernel

On Fri, Jul 01, 2005 at 08:58:05AM +0200, Miklos Szeredi wrote:
> > > 
> > > - Frank points out that a user can send a sigstop to his own setuid(0)
> > >   task and he intimates that this could cause DoS problems with FUSE.  More
> > >   details needed please?
> > 
> > It's the other way around:
> > Apparently it is not a security problem to SIGSTOP or even SIGKILL a
> > setuid program. So why is it a security problem when such a program is
> > delayed by a supposedly malicious behaving FUSE mount?
> 
> Perfectly valid argument.  My question: is it not a security problem
> to allow signals to reach a suid program?

That's what I thought too so I asked it first on the security mailing list.
Apparently this signal behavior is normal.

> There's a huge difference between slowing down, and stopping a
> process.  I wouldn't consider the first a true DoS. 

Stopping is a special case. But it is effectively the same as being
indefinitely slowed down by, say, 10000+ malicious processes and from
that angle I don't see a fundamental difference w.r.t. security.

Killing the malicious processes should solve the problem. And killing
one FUSE process looks easier to me than killing 10000+ ones.

> Yes.  The extra problem with FUSE, is that they are not _able_ to be
> careful.

I think this is not true. Every pathname passed to a setuid program
by the user is basically "tainted". Standard I/O is tainted as well.

> They can't even check if a file is in fact on a FUSE mount

They shouldn't. The pathname is not to be trusted anyway.

I think FUSE has been shown to be conservative enough w.r.t. security to be
merged. But it may be interesting to consider:

-	replace ptraceability test by a kill()ability test.
-	some sort of "intr" mount option for most signals on by default.
-	Forbid hiding data by mounting a FUSE filesystem on top of it (does
	FUSE check for this already?)
-	/proc isn't a problem: most root processes tend to avoid it because
	it is synthetic and thus uninteresting. Maybe we should extend
	the idea of "synthetic file-systems being uninteresting" to any
	process which cannot receive signals from the FUSE mount owner. When
	one cannot hide data by a FUSE mount and it's synthetic anyway so not
	interesting then just show the original empty mount point.

-- 
Frank

* Re: FUSE merging?
  2005-07-01  6:36               ` Miklos Szeredi
  2005-07-01  6:50                 ` Andrew Morton
  2005-07-01  7:46                 ` Frederik Deweerdt
@ 2005-07-01  9:36                 ` Frank van Maarseveen
  2005-07-01 10:45                   ` Miklos Szeredi
  2 siblings, 1 reply; 80+ messages in thread
From: Frank van Maarseveen @ 2005-07-01  9:36 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: akpm, aia21, arjan, linux-kernel, frankvm

On Fri, Jul 01, 2005 at 08:36:02AM +0200, Miklos Szeredi wrote:
> 
> Here's a description of a theoretical DoS scenario:
> 
>   http://marc.theaimsgroup.com/?l=linux-fsdevel&m=111522019516694&w=2

So the open() hangs indefinitely.  But what if blackhat tries to install
a package from a no longer existing server on /net or via NFS?

A user supplied pathname is not to be trusted by any setuid (or full
root) program.

Another example: I'm not sure if there are still /dev/tty devices which
may block indefinitely upon open() but:

-	I have yet to see a setuid program which always uses O_NONBLOCK
	when opening user supplied pathnames.
-	one cannot stat() and then open() because that gives a race.
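
To make the two points above concrete, here is a minimal sketch in
Python of what a careful program could do: open with O_NONBLOCK so that
a FIFO or tty cannot stall the open(), then check the type with fstat()
on the descriptor instead of stat() on the path, which avoids the race.
The helper name open_regular_file is made up for this example.  As far
as I know O_NONBLOCK does nothing for a regular file on a hung network
or userspace filesystem, so this covers only part of the problem.

```python
import os
import stat


def open_regular_file(path):
    """Open path read-only without blocking on FIFOs or ttys, then
    verify what was actually opened via fstat() on the descriptor."""
    fd = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
    try:
        # fstat() inspects the object behind fd, so nothing can be
        # swapped in between the check and the use.
        if not stat.S_ISREG(os.fstat(fd).st_mode):
            raise OSError("not a regular file: %r" % path)
    except BaseException:
        os.close(fd)
        raise
    return fd
```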

-- 
Frank

* Re: FUSE merging?
  2005-07-01  7:46                 ` Frederik Deweerdt
@ 2005-07-01  9:47                   ` Miklos Szeredi
  0 siblings, 0 replies; 80+ messages in thread
From: Miklos Szeredi @ 2005-07-01  9:47 UTC (permalink / raw)
  To: frederik.deweerdt; +Cc: akpm, aia21, arjan, linux-kernel, frankvm

> Could this be solved by implementing some sort of (optional) timeout on fuse
> syscalls (in request_send)?

Yes, but that would be a thousand times worse than the current solution.
You just can't know in advance what a "sane" timeout value is.

Miklos

* Re: FUSE merging?
  2005-07-01  8:02                         ` Andrew Morton
@ 2005-07-01 10:11                           ` Miklos Szeredi
  2005-07-01 11:29                             ` Andrew Morton
                                               ` (2 more replies)
  2005-07-03 19:39                           ` Pavel Machek
  1 sibling, 3 replies; 80+ messages in thread
From: Miklos Szeredi @ 2005-07-01 10:11 UTC (permalink / raw)
  To: akpm; +Cc: miklos, miklos, aia21, arjan, linux-kernel, frankvm

> > Well, there's the "unsolvable" writeback deadlock problem, that FUSE
> > works around by not buffering dirty pages (and not allowing writable
> > mmap).  Does NFS solve that?  I'm interested :)
> 
> I don't know - first you'd have to describe it.

A dirty page is being written back, but the userspace server needs to
allocate memory to complete the request.  But the allocation will
block, since there's no more free memory.  

> > Then there's the usual "filesystem recursing into itself" deadlock.
> 
> Describe this completely as well, please.

User does unlink("/mnt/userfs/file").  Userspace server receives
request to unlink "/file".  Then the daemon does
unlink("/mnt/userfs/file").  This will deadlock on i_sem.
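
A toy model of that recursion, in Python, assuming nothing about the
real kernel code: a plain non-reentrant lock stands in for i_sem, and
the hypothetical functions client_unlink and server_handle_unlink stand
in for the caller and the daemon.  The timeout exists only so that the
sketch terminates; in the real case there is none and both sides would
wait forever.

```python
import threading

# Stand-in for the directory lock (i_sem in the description above).
dir_lock = threading.Lock()


def server_handle_unlink(timeout):
    """Hypothetical daemon handler that re-enters its own mount."""
    # The nested unlink needs the same directory lock the caller holds.
    got_it = dir_lock.acquire(timeout=timeout)
    if got_it:
        dir_lock.release()
    return got_it


def client_unlink(timeout=0.2):
    """Caller side: take the lock, then wait for the daemon's reply."""
    with dir_lock:
        return server_handle_unlink(timeout)
```

Here client_unlink() returns False because the nested acquire can never
succeed while the outer one is held.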

> > Mounting with 'intr' probably solves this for NFS, but that has
> > unwanted side effects.  FUSE only allows KILL to interrupt a request.
> 
> Maybe these things can be solved in NFS?

Possibly.

> 
> > > >   - dcache invalidation policy
> > > 
> > > What's that?
> > 
> > Userspace can tell the kernel, how long a dentry should be valid.  I
> > don't think the NFS protocol provides this. Same holds for the inode
> > attributes.
> 
> Why is that needed?

Because I can well imagine a synthetic filesystem where file
data/metadata change arbitrarily.  In this case the timeout heuristic
in NFS is not useful.

In fact with NFS it's often a PITA that it doesn't want to refresh a
file's data/metadata, which I _know_ has changed on the server.

> > > >   - probably more, but I can't remember
> > > 
> > > Please do..
> > 
> > OK, I'll do a little research.
> > 
> 
> v9fs has a user-level server too.  Maybe it has been used in FUSE-like
> scenarios more than NFS.

I think the p9 protocol is suffering from trying to be too generic.
The FUSE kernel interface is probably slightly tied to the linux VFS,
and would present problems if trying to port to other *NIX or god
forbid some other OS family altogether.

That may seem like a drawback, but I don't think it is:

   - people are encouraged to use the FUSE library API instead of the
     raw kernel interface

   - if it will be ported to other systems, even the kernel interface
     could probably be made compatible, only at the loss of
     simplicity/performance.

> Plus NFS and v9fs work across the network...

Yes.  I consider that a drawback.  FUSE does data transfer very
efficiently (single copy), without the heavy network infrastructure
being in the way.

Miklos

* Re: FUSE merging?
  2005-07-01  9:24                   ` Frank van Maarseveen
@ 2005-07-01 10:27                     ` Miklos Szeredi
  2005-07-01 12:00                       ` Frank van Maarseveen
  0 siblings, 1 reply; 80+ messages in thread
From: Miklos Szeredi @ 2005-07-01 10:27 UTC (permalink / raw)
  To: frankvm; +Cc: miklos, frankvm, akpm, aia21, arjan, linux-kernel

> > Perfectly valid argument.  My question: is it not a security problem
> > to allow signals to reach a suid program?
> 
> That's what I thought too so I asked it first on the security mailing list.
> Apparently this signal behavior is normal.

Well, I think it's a fertile ground for hole hunters out there.  Just
needs a little publicity ;)

Is it considered DoS for example if I prevent other users from sending
email?  SIGSTOP on sendmail at the right moment (when the database is
locked) should do it fine.

> Stopping is a special case. But it is effectively the same as being
> indefinitely slowed down by, say, 10000+ malicious processes and from
> that angle I don't see a fundamental difference w.r.t. security.

On a well protected multiuser system there will be ulimits in place to
prevent that.

> Killing the malicious processes should solve the problem. And killing
> one FUSE process looks easier to me than killing 10000+ ones.

Killing always works, if the sysadmin happens to be around.  If not
then there's not a lot other users can do.

> I think this is not true. Every pathname passed to a setuid program
> by the user is basically "tainted". Standard I/O is tainted as well.

You mean suid programs are never to touch paths passed to them?

If that would be true, then fuse_allow_task() would not be needed, but
would do no harm either, since it would never be invoked by a suid
program.

> > They can't even check if a file is in fact on a FUSE mount
> 
> They shouldn't. The pathname is not to be trusted anyway.
> 
> I think FUSE has been shown to be conservative enough w.r.t. security to be
> merged. But it may be interesting to consider:
> 
> -	replace ptraceability test by a kill()ability test.

You didn't consider the information leak aspect (point B in fuse.txt).

> -	some sort of "intr" mount option for most signals on by default.

KILL will always interrupt a request.  So getting rid of a malicious
mount should present no problems.

> -	Forbid hiding data by mounting a FUSE filesystem on top of it (does
> 	FUSE check for this already?)

Yes.  It checks for writability on the mountpoint (excluding limited
writability as /tmp for example).

> -	/proc isn't a problem: most root processes tend to avoid it because
> 	it is synthetic and thus uninteresting. Maybe we should extend
> 	the idea of "synthetic file-systems being uninteresting" to any
> 	process which cannot receive signals from the FUSE mount owner. When
> 	one cannot hide data by a FUSE mount and it's synthetic anyway so not
> 	interesting then just show the original empty mount point.

Been there.  People (like Al Viro) didn't like it.

Miklos

* Re: FUSE merging?
  2005-07-01  9:36                 ` Frank van Maarseveen
@ 2005-07-01 10:45                   ` Miklos Szeredi
  2005-07-01 11:34                     ` Frank van Maarseveen
  0 siblings, 1 reply; 80+ messages in thread
From: Miklos Szeredi @ 2005-07-01 10:45 UTC (permalink / raw)
  To: frankvm; +Cc: akpm, aia21, arjan, linux-kernel, frankvm

> > 
> > Here's a description of a theoretical DoS scenario:
> > 
> >   http://marc.theaimsgroup.com/?l=linux-fsdevel&m=111522019516694&w=2
> 
> So the open() hangs indefinitely.  But what if blackhat tries to install
> a package from a no longer existing server on /net or via NFS?
> 
> A user supplied pathname is not to be trusted by any setuid (or full
> root) program.

If /net won't detect a dead server within a timeout, I think it can be
considered broken.

> Another example: I'm not sure if there are still /dev/tty devices which
> may block indefinitely upon open() but:
> 
> -	I have yet to see a setuid program which always uses O_NONBLOCK
> 	when opening user supplied pathnames.
> -	one cannot stat() and then open() because that gives a race.

Is "being already broken" an excuse for preventing future breakage,
when these are fixed?

Miklos

* Re: FUSE merging?
  2005-07-01 10:11                           ` Miklos Szeredi
@ 2005-07-01 11:29                             ` Andrew Morton
  2005-07-01 12:00                               ` Miklos Szeredi
                                                 ` (3 more replies)
  2005-07-01 12:08                             ` Frank van Maarseveen
  2005-07-01 13:21                             ` Eric Van Hensbergen
  2 siblings, 4 replies; 80+ messages in thread
From: Andrew Morton @ 2005-07-01 11:29 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: miklos, aia21, arjan, linux-kernel, frankvm

Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> > > Well, there's the "unsolvable" writeback deadlock problem, that FUSE
> > > works around by not buffering dirty pages (and not allowing writable
> > > mmap).  Does NFS solve that?  I'm interested :)
> > 
> > I don't know - first you'd have to describe it.
> 
> A dirty page is being written back, but the userspace server needs to
> allocate memory to complete the request.  But the allocation will
> block, since there's no more free memory.  

That shouldn't happen with write() traffic due to the dirty memory
balancing logic.

It'll happen with MAP_SHARED.  Totally disallowing MAP_SHARED sounds a bit
drastic, but of course nfs/v9fs could be taught to do that.

> > > Then there's the usual "filesystem recursing into itself" deadlock.
> > 
> > Describe this completely as well, please.
> 
> User does unlink("/mnt/userfs/file").  Userspace server receives
> request to unlink "/file".  Then the daemon does
> unlink("/mnt/userfs/file").  This will deadlock on i_sem.

eh?  How can the fuse client and the fuse server both get access to the
same file in this manner?  I don't see how you could set that up with NFS,
for example.

> > > Userspace can tell the kernel, how long a dentry should be valid.  I
> > > don't think the NFS protocol provides this. Same holds for the inode
> > > attributes.
> > 
> > Why is that needed?
> 
> Because, I can well imagine a synthetic filesystem, where file
> data/metadata change arbitrarily.  In this case the timeout heuristic
> in NFS is not useful.
> 
> In fact with NFS it's often a PITA, that it doesn't want to refresh a
> file's data/metadata, which I _know_ has changed on the server.

I think nfs can do this, as long as the modification was done through the
server.  I'd expect v9fs would be the same.

> > Plus NFS and v9fs work across the network...
> 
> Yes.  I consider that a drawback.

Others (many) would disagree.


Sorry, but I'm not buying it.  I still don't see a solid reason why all
this could not be done with nfs/v9fs, some kernel tweaks and the rest in
userspace.  It would take some effort, but that effort would end up
strengthening existing kernel capabilities rather than adding brand new
things, which is good.

* Re: FUSE merging?
  2005-07-01 10:45                   ` Miklos Szeredi
@ 2005-07-01 11:34                     ` Frank van Maarseveen
  0 siblings, 0 replies; 80+ messages in thread
From: Frank van Maarseveen @ 2005-07-01 11:34 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: frankvm, akpm, aia21, arjan, linux-kernel

On Fri, Jul 01, 2005 at 12:45:22PM +0200, Miklos Szeredi wrote:
> > > 
> > > Here's a description of a theoretical DoS scenario:
> > > 
> > >   http://marc.theaimsgroup.com/?l=linux-fsdevel&m=111522019516694&w=2
> > 
> > So the open() hangs indefinitely.  But what if blackhat tries to install
> > a package from a no longer existing server on /net or via NFS?
> > 
> > A user supplied pathname is not to be trusted by any setuid (or full
> > root) program.
> 
> If /net won't detect a dead server within a timeout, I think it can be
> considered broken.
> 
> > Another example: I'm not sure if there are still /dev/tty devices which
> > may block indefinitely upon open() but:
> > 
> > -	I have yet to see a setuid program which always uses O_NONBLOCK
> > 	when opening user supplied pathnames.
> > -	one cannot stat() and then open() because that gives a race.
> 
> Is "being already broken" an excuse for preventing future breakage,
> when these are fixed?

All this breakage points into the same direction: A user supplied pathname
is not to be trusted by any setuid (or full root) program.

-- 
Frank

* Re: FUSE merging?
  2005-07-01 10:27                     ` Miklos Szeredi
@ 2005-07-01 12:00                       ` Frank van Maarseveen
  2005-07-01 12:36                         ` Miklos Szeredi
  0 siblings, 1 reply; 80+ messages in thread
From: Frank van Maarseveen @ 2005-07-01 12:00 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: frankvm, akpm, aia21, arjan, linux-kernel

On Fri, Jul 01, 2005 at 12:27:01PM +0200, Miklos Szeredi wrote:
> 
> You mean suid programs are never to touch paths passed to them?

never when euid==root.
The pathname could even point into /proc or anything else yet unknown,
e.g. by putting some symlinks at the right places. The mere act of
opening the file as root could have unwanted side effects already.

> 
> If that would be true, then fuse_allow_task() would not be needed, but
> would do no harm either, since it would never be invoked by a suid
> program.

In theory it should not be necessary. But on a practical side: we need
to provide security for daemons with elevated privileges which need to
traverse all local disks.

> You didn't consider the information leak aspect (point B in fuse.txt).

Correct. I have no answer to that other than: is it a real problem or
yet something else a setuid program should take into consideration?
And what info can we extract already using inotify/dnotify? There are
several ways to monitor activity and it is all information. /proc (ps)
gives information too.

> > -	Forbid hiding data by mounting a FUSE filesystem on top of it (does
> > 	FUSE check for this already?)
> 
> Yes.  It checks for writability on the mountpoint (excluding limited
> writability as /tmp for example).

But can you mount FUSE on top of a populated tree, a non-leaf dir?

> > -	/proc isn't a problem: most root processes tend to avoid it because
> > 	it is synthetic and thus uninteresting. Maybe we should extend
> > 	the idea of "synthetic file-systems being uninteresting" to any
> > 	process which cannot receive signals from the FUSE mount owner. When
> > 	one cannot hide data by a FUSE mount and it's synthetic anyway so not
> > 	interesting then just show the original empty mount point.
> 
> Been there.  People (like Al Viro) didn't like it.

including replacing the ptraceability test with a signal test and including
the (IMHO) required emptiness of the mount stub?

Traversing a FUSE mountpoint is almost equivalent to talking with a
userspace program. Why should that be interesting when one simply wants
to traverse the FS? root isn't going to execute all user programs to
see what they do either.

-- 
Frank

* Re: FUSE merging?
  2005-07-01 11:29                             ` Andrew Morton
@ 2005-07-01 12:00                               ` Miklos Szeredi
  2005-07-01 12:53                               ` Anton Altaparmakov
                                                 ` (2 subsequent siblings)
  3 siblings, 0 replies; 80+ messages in thread
From: Miklos Szeredi @ 2005-07-01 12:00 UTC (permalink / raw)
  To: akpm; +Cc: aia21, arjan, linux-kernel, frankvm

> > A dirty page is being written back, but the userspace server needs to
> > allocate memory to complete the request.  But the allocation will
> > block, since there's no more free memory.  
> 
> That shouldn't happen with write() traffic due to the dirty memory
> balancing logic.

How?  It either blocks other allocations until the writeback is
completed (DoS) or allows memory to be exhausted (deadlock).

Making unpriv mounts work securely is not a trivial thing I can tell
you ;)

> > User does unlink("/mnt/userfs/file").  Userspace server receives
> > request to unlink "/file".  Then the daemon does
> > unlink("/mnt/userfs/file").  This will deadlock on i_sem.
> 
> eh?  How can the fuse client and the fuse server both get access to the
> same file in this manner?  I don't see how you could set that up with NFS,
> for example.

With a custom userspace NFS server you can do whatever you want.
That's the whole purpose of the exercise.

> > Because, I can well imagine a synthetic filesystem, where file
> > data/metadata change arbitrarily.  In this case the timeout heuristic
> > in NFS is not useful.
> > 
> > In fact with NFS it's often a PITA, that it doesn't want to refresh a
> > file's data/metadata, which I _know_ has changed on the server.
> 
> I think nfs can do this, as long as the modification was done through the
> server.  I'd expect v9fs would be the same.

It's often not.  Sshfs is a good example.  File server will not be
able to notify the client when anything changes.  Polling is the only
solution, and NFS doesn't always get it right (and in fact it cannot).
It's much better to leave cache timeout policy to the userspace
filesystem than to try to guess it in the kernel.


> > > Plus NFS and v9fs work across the network...
> > 
> > Yes.  I consider that a drawback.
> 
> Others (many) would disagree.
> 
> 
> Sorry, but I'm not buying it.  I still don't see a solid reason why all
> this could not be done with nfs/v9fs, some kernel tweaks and the rest in
> userspace.  It would take some effort, but that effort would end up
> strengthening existing kernel capabilities rather than adding brand new
> things, which is good.

I'm not sure.  NFS is a monster, everybody can agree.  Getting all the
requirements of FUSE (safe unprivileged mounts, etc) would be a
nightmare.

FUSE does one thing, and it does that right.  I think that's good.

Miklos

* Re: FUSE merging?
  2005-07-01 10:11                           ` Miklos Szeredi
  2005-07-01 11:29                             ` Andrew Morton
@ 2005-07-01 12:08                             ` Frank van Maarseveen
  2005-07-01 13:21                             ` Eric Van Hensbergen
  2 siblings, 0 replies; 80+ messages in thread
From: Frank van Maarseveen @ 2005-07-01 12:08 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: akpm, aia21, arjan, linux-kernel, frankvm

On Fri, Jul 01, 2005 at 12:11:53PM +0200, Miklos Szeredi wrote:
> > > Userspace can tell the kernel, how long a dentry should be valid.  I
> > > don't think the NFS protocol provides this. Same holds for the inode
> > > attributes.
> > 
> > Why is that needed?
> 
> Because, I can well imagine a synthetic filesystem, where file
> data/metadata change arbitrarily.  In this case the timeout heuristic
> in NFS is not useful.
> 
> In fact with NFS it's often a PITA, that it doesn't want to refresh a
> file's data/metadata, which I _know_ has changed on the server.

This NFS issue has been on my radar for years already. I have a patch which
is practical but a bit disgusting. IMHO it's orthogonal to FUSE.

-- 
Frank

* Re: FUSE merging?
  2005-07-01 12:00                       ` Frank van Maarseveen
@ 2005-07-01 12:36                         ` Miklos Szeredi
  2005-07-01 13:05                           ` Frank van Maarseveen
  0 siblings, 1 reply; 80+ messages in thread
From: Miklos Szeredi @ 2005-07-01 12:36 UTC (permalink / raw)
  To: frankvm; +Cc: akpm, aia21, arjan, linux-kernel

> > You mean suid programs are never to touch paths passed to them?
> 
> never when euid==root.
> The pathname could even point into /proc or anything else yet unknown,
> e.g. by putting some symlinks at the right places. The mere act of
> opening the file as root could have unwanted side effects already.

OK, open is out.  However other operations (stat, unlink, chmod etc)
are always without side effects on "normal" filesystems.  However on
FUSE they are very much unsafe (can block, not do what was instructed
and return success, etc).

> > If that would be true, then fuse_allow_task() would not be needed, but
> > would do no harm either, since it would never be invoked by a suid
> > program.
> 
> In theory it should not be necessary. But on a practical side: we need
> to provide security for daemons with elevated privileges which need to
> traverse all local disks.

I agree wholeheartedly.  However, I'm not arguing this point, because
it has been (rightly) pointed out, that private namespaces can be used
to solve this.  While the suid issue is not solvable with private
namespaces.

> > You didn't consider the information leak aspect (point B in fuse.txt).
> 
> Correct. I have no answer to that other than: is it a real problem or
> yet something else a setuid program should take into consideration?
> And what info can we extract already using inotify/dnotify?

Probably not file access patterns.  But yes I don't consider this a
very grave problem.

> There are several ways to monitor activity and it is all
> information. /proc (ps) gives information too.
> 
> > > -	Forbid hiding data by mounting a FUSE filesystem on top of it (does
> > > 	FUSE check for this already?)
> > 
> > Yes.  It checks for writability on the mountpoint (excluding limited
> > writability as /tmp for example).
> 
> But can you mount FUSE on top of a populated tree, a non-leaf dir?

Yes, but I think that's OK, because if the directory on which you
mount is writable, then you can hide the data already (unlinking it, but
keeping a reference through a file descriptor).  And it's not very
effective hiding, since a bind mount of the mountpoint's filesystem
will reveal what's underneath the FUSE mount.
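
The "unlink it but keep a descriptor" trick can be sketched in a few
lines of Python.  This is only an illustration of ordinary POSIX
behaviour on a writable directory, not anything FUSE specific; the
function name hide_by_unlink and the file name "data" are invented for
the example.

```python
import os
import tempfile


def hide_by_unlink():
    """Unlink a file while holding a descriptor to it."""
    directory = tempfile.mkdtemp()
    path = os.path.join(directory, "data")
    with open(path, "w") as f:
        f.write("still here")
    fd = os.open(path, os.O_RDONLY)
    os.unlink(path)                  # the name disappears from the directory
    visible = os.listdir(directory)  # what anyone else now sees
    content = os.read(fd, 64)        # the data is still reachable via fd
    os.close(fd)
    os.rmdir(directory)
    return visible, content
```

The directory listing comes back empty while the holder of the
descriptor can still read the contents.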

> > > -	/proc isn't a problem: most root processes tend to avoid it because
> > > 	it is synthetic and thus uninteresting. Maybe we should extend
> > > 	the idea of "synthetic file-systems being uninteresting" to any
> > > 	process which cannot receive signals from the FUSE mount owner. When
> > > 	one cannot hide data by a FUSE mount and it's synthetic anyway so not
> > > 	interesting then just show the original empty mount point.
> > 
> > Been there.  People (like Al Viro) didn't like it.
> 
> including changing the ptraceability test by a signal test and including
> the (IMHO) required emptiness of the mount stub?

It's been thrown out for the reason that it's unacceptable if suid
programs see a different namespace than non-suid ones.

> Traversing a FUSE mountpoint is almost equivalent to talking with a
> userspace program. Why should that be interesting when one simply wants
> to traverse the FS? root isn't going to execute all user programs to
> see what they do either.

Yes.  Please explain that to Al Viro, Christoph Hellwig et al.
Believe me it's not something that's easy to get across, and I'm very
happy that you see it this way too :).

Miklos

* Re: FUSE merging?
  2005-07-01  6:50                 ` Andrew Morton
  2005-07-01  7:07                   ` Miklos Szeredi
@ 2005-07-01 12:37                   ` bert hubert
  1 sibling, 0 replies; 80+ messages in thread
From: bert hubert @ 2005-07-01 12:37 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Miklos Szeredi, aia21, arjan, linux-kernel, frankvm

On Thu, Jun 30, 2005 at 11:50:59PM -0700, Andrew Morton wrote:
> Speaking of which, dumb question: what does FUSE offer over simply using
> NFS protocol to talk to the userspace filesystem driver?

NFS does not currently engender warm feelings wrt ease of
programming and quality in general - especially under Linux sadly enough.

It is also a narrow window through which to speak to the rich set of
options, flags, attributes and features the Linux kernel offers.

I think Solaris used to implement bind mounts through loopback NFS, but that
went out of fashion as well.

-- 
http://www.PowerDNS.com      Open source, database driven DNS Software 
http://netherlabs.nl              Open and Closed source services

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: FUSE merging?
  2005-07-01 11:29                             ` Andrew Morton
  2005-07-01 12:00                               ` Miklos Szeredi
@ 2005-07-01 12:53                               ` Anton Altaparmakov
  2005-07-01 13:07                                 ` Anton Altaparmakov
  2005-07-01 13:51                                 ` Frank van Maarseveen
  2005-07-01 13:29                               ` Eric Van Hensbergen
  2005-07-01 16:45                               ` Matthias Urlichs
  3 siblings, 2 replies; 80+ messages in thread
From: Anton Altaparmakov @ 2005-07-01 12:53 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Miklos Szeredi, arjan, linux-kernel, frankvm

On Fri, 2005-07-01 at 04:29 -0700, Andrew Morton wrote:
> Sorry, but I'm not buying it.  I still don't see a solid reason why all
> this could not be done with nfs/v9fs, some kernel tweaks and the rest in
> userspace.  It would take some effort, but that effort would end up
> strengthening existing kernel capabilities rather than adding brand new
> things, which is good.

FUSE is a generic FS API which is _very_ easy to write an FS for
(learning curve is about 10-15 minutes starting after you have unpacked
the fuse source code, at least it took me that long to start writing an
FS based on the example one provided).  NFS is not anything like that.

Also can the NFS approach provide me with different content depending on
the uid of the accessing process?  With FUSE that is easy as pie.  Even
easier than that actually...
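
As a rough illustration of the uid-dependent content mentioned above,
here is a toy read handler in Python.  It is not the FUSE API: the
RequestContext type, the read_file function and the per-uid table are
all hypothetical names invented for this sketch.  The only assumption
taken from the discussion is that the userspace filesystem learns the
uid of the accessing process with each request.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestContext:
    """Hypothetical per-request caller identity handed to the filesystem."""
    uid: int
    gid: int
    pid: int


# Hypothetical per-user views of the same path.
PER_UID_CONTENT = {
    1000: b"notes for user 1000\n",
    1001: b"notes for user 1001\n",
}


def read_file(path, ctx):
    """Return content for `path` that depends on who is asking."""
    if path != "/notes":
        raise FileNotFoundError(path)
    try:
        return PER_UID_CONTENT[ctx.uid]
    except KeyError:
        # Unknown callers get an error rather than someone else's data.
        raise PermissionError(path) from None
```

The same path yields different bytes for uid 1000 and uid 1001, and an
error for everyone else; the decision lives entirely in the userspace
server.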

Best regards,

        Anton
-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: FUSE merging?
  2005-07-01 12:36                         ` Miklos Szeredi
@ 2005-07-01 13:05                           ` Frank van Maarseveen
  2005-07-01 13:21                             ` Miklos Szeredi
  0 siblings, 1 reply; 80+ messages in thread
From: Frank van Maarseveen @ 2005-07-01 13:05 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: frankvm, akpm, aia21, arjan, linux-kernel

On Fri, Jul 01, 2005 at 02:36:22PM +0200, Miklos Szeredi wrote:
> > > You mean suid programs are never to touch paths passed to them?
> > 
> > never when euid==root.
> > The pathname could even point into /proc or anything else yet unknown,
> > e.g. by putting some symlinks at the right places. The mere act of
> > opening the file as root could have unwanted side effects already.
> 
> OK, open is out.  However other operations (stat, unlink, chmod etc)
> are always without side effects on "normal" filesystems.  However on
> FUSE they are very much unsafe (can block, not do what was instructed
> and return success, etc).

What about tricking a setuid program into stat-ing a path under /auto
(/mnt/auto, /misc, whatever it is called)?  Then the automounter will act
upon a root request, again with possibly unwanted side effects.  See how
careful a setuid/full-root program must be in handling user data,
including pathnames?

FUSE suddenly makes this more obvious but it is not new.

> > > > -	/proc isn't a problem: most root processes tend to avoid it because
> > > > 	it is synthetic and thus uninteresting. Maybe we should extend
> > > > 	the idea of "synthetic file-systems being uninteresting" to any
> > > > 	process which cannot receive signals from the FUSE mount owner. When
> > > > 	one cannot hide data by a FUSE mount and it's synthetic anyway so not
> > > > 	interesting then just show the original empty mount point.
> > > 
> > > Been there.  People (like Al Viro) didn't like it.
> > 
> > including changing the ptraceability test by a signal test and including
> > the (IMHO) required emptiness of the mount stub?
> 
> It's been thrown out because it's unacceptable for suid programs to
> see a different namespace than non-suid ones.

You mean root versus non-root, or user versus other user, I assume.  Because
the euid (fsuid) is what matters.

But then: this _is_ already the case for NFS when root_squash is in effect
(what about Kerberos et al.?).  So there are several reasons to consider
FUSE a nonlocal fs instead of a local one, so nothing new there.  FUSE could
be used to implement a usable (not perfect) userspace NFS/ftp client.

To require an empty stub to mount FUSE upon makes the whole picture
cleaner: users are only able to extend the namespace _leaf_ nodes for
themselves and processes they can send signals to: setuid programs
which do not fully become root. The existing namespace [nodes] remains
unchanged for everyone.

-- 
Frank

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: FUSE merging?
  2005-07-01 12:53                               ` Anton Altaparmakov
@ 2005-07-01 13:07                                 ` Anton Altaparmakov
  2005-07-01 13:51                                 ` Frank van Maarseveen
  1 sibling, 0 replies; 80+ messages in thread
From: Anton Altaparmakov @ 2005-07-01 13:07 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Miklos Szeredi, arjan, linux-kernel, frankvm

On Fri, 2005-07-01 at 13:53 +0100, Anton Altaparmakov wrote:
> On Fri, 2005-07-01 at 04:29 -0700, Andrew Morton wrote:
> > Sorry, but I'm not buying it.  I still don't see a solid reason why all
> > this could not be done with nfs/v9fs, some kernel tweaks and the rest in
> > userspace.  It would take some effort, but that effort would end up
> > strengthening existing kernel capabilities rather than adding brand new
> > things, which is good.
> 
> FUSE is a generic FS API which is _very_ easy to write an FS for
> (learning curve is about 10-15 minutes starting after you have unpacked
> the fuse source code, at least it took me that long to start writing an
> FS based on the example one provided).  NFS is not anything like that.
> 
> Also can the NFS approach provide me with different content depending on
> the uid of the accessing process?  With FUSE that is easy as pie.  Even
> easier than that actually...

I forgot:  And doesn't NFS require stable inode numbers and other
"invariables" like that for it to work?  FUSE doesn't and those
requirements are a real PITA in a lot of cases where there simply are no
inodes and the numbers are synthetic and change on each remount or even
on each access after the dentry has expired...

And I always thought that doing FS in userspace via NFS is considered an
ugly hack.  I didn't have the impression that that had changed recently.
(-;

Best regards,

        Anton
-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: FUSE merging?
  2005-07-01 10:11                           ` Miklos Szeredi
  2005-07-01 11:29                             ` Andrew Morton
  2005-07-01 12:08                             ` Frank van Maarseveen
@ 2005-07-01 13:21                             ` Eric Van Hensbergen
  2005-07-01 13:53                               ` Miklos Szeredi
  2 siblings, 1 reply; 80+ messages in thread
From: Eric Van Hensbergen @ 2005-07-01 13:21 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: akpm, aia21, arjan, linux-kernel, frankvm, v9fs-developer

On 7/1/05, Miklos Szeredi <miklos@szeredi.hu> wrote:
> >
> > v9fs has a user-level server too.  Maybe it has been used in FUSE-like
> > scenarios more than NFS.

We've really only dabbled with v9fs and user-level file services,
mostly through interacting with Plan 9 From User Space applications
(http://www.plan9.us).  However, there are people actively improving
this area of functionality including providing an SDK to allow easy
creation of synthetic file systems.  That being said, there are many
aspects of v9fs which have been written/re-written with the express
purpose of providing support for such synthetics.

> 
> I think the p9 protocol is suffering from trying to be too generic.
> The FUSE kernel interface is probably slightly tied to the linux VFS,
> and would present problems if trying to port to other *NIX or god
> forbid some other OS family altogether.
> 

I don't know where 9P "suffers" from being too generic, it's just
well-designed and has done a good job of keeping things simple --
something that the plethora of over-designed, bloated interfaces of
today could learn from.

> 
> > Plus NFS and v9fs work across the network...
> 
> Yes.  I consider that a drawback.  FUSE does data transfer very
> efficiently (single copy), without the heavy network infrastructure
> being in the way.
> 

I'll grant you this is something v9fs-2.0 suffers from, but it's
something we are actively addressing in v9fs-2.1.  We're working more
towards the implementation that is present in the Plan 9 kernel, where
the core efficiently multiplexes the requests either directly to local
servers (in Plan 9's case via function call APIs) or encapsulates them
for shipping across the network.  The 9P interface is used for both,
it just has different embodiments depending on underlying transport.

That being said, I imagine the time spent context switching in and out
of the kernel dominates performance.  With a proper mux there is no
reason why v9fs can't be made as efficient as FUSE - and that's what
we intend to demonstrate in v9fs-2.1.  Plus, with v9fs you get the
benefit of being able to export your synthetic file systems over the
network with no additional copies.

Further, when you create an infrastructure which is meant to work over
a network, you take fewer things for granted -- which ultimately leads
to a more robust system capable of dealing with many of these
problems.

          -eric

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: FUSE merging?
  2005-07-01 13:05                           ` Frank van Maarseveen
@ 2005-07-01 13:21                             ` Miklos Szeredi
  2005-07-01 15:20                               ` Frank van Maarseveen
  0 siblings, 1 reply; 80+ messages in thread
From: Miklos Szeredi @ 2005-07-01 13:21 UTC (permalink / raw)
  To: frankvm; +Cc: akpm, aia21, arjan, linux-kernel

> > OK, open is out.  However other operations (stat, unlink, chmod etc)
> > are always without side effects on "normal" filesystems.  However on
> > FUSE they are very much unsafe (can block, not do what was instructed
> > and return success, etc).
> 
> What about tricking a setuid program into stat-ing a path under /auto
> (/mnt/auto, /misc, whatever it is called)?  Then the automounter will act
> upon a root request, again with possibly unwanted side effects.  See how
> careful a setuid/full-root program must be in handling user data,
> including pathnames?

I don't see why /auto is special.  It's basically a userspace
filesystem too, but that's not what is special about FUSE.  It's the
fact that it's a userspace filesystem controlled by an _ordinary user_.

> FUSE suddenly makes this more obvious but it is not new.

I believe it _is_ something new.  If it were not, then your arguments
would be bulletproof.  As it is, I think you miss the point that the
side effect is actually in the hands of the user invoking the suid
program, instead of something external.

> > > including changing the ptraceability test by a signal test and including
> > > the (IMHO) required emptiness of the mount stub?
> > 
> > It's been thrown out because it's unacceptable for suid programs to
> > see a different namespace than non-suid ones.
> 
> You mean root versus non-root, or user versus other user, I assume.  Because
> the euid (fsuid) is what matters.

Yes.

> But then: this _is_ already the case for NFS when root_squash is in effect
> (what about Kerberos et al.?).  So there are several reasons to consider
> FUSE a nonlocal fs instead of a local one, so nothing new there.  FUSE could
> be used to implement a usable (not perfect) userspace NFS/ftp client.

Yes.  In fact even if the check were left out of the kernel, the
userspace filesystem could still return different data/error based on
fsuid/fsgid/pid.

So what's so controversial about it?  I really fail to understand...

> To require an empty stub to mount FUSE upon makes the whole picture
> cleaner: users are only able to extend the namespace _leaf_ nodes for
> themselves and processes they can send signals to: setuid programs
> which do not fully become root. The existing namespace [nodes] remains
> unchanged for everyone.

It's not that simple.  A filesystem can be mounted many times (either
with mount --bind, or just by mounting the same device on multiple
mountpoints).  In this case you can't ensure that a mountpoint will
remain a leaf node after being mounted on.

Miklos

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: FUSE merging?
  2005-07-01 11:29                             ` Andrew Morton
  2005-07-01 12:00                               ` Miklos Szeredi
  2005-07-01 12:53                               ` Anton Altaparmakov
@ 2005-07-01 13:29                               ` Eric Van Hensbergen
  2005-07-01 16:45                               ` Matthias Urlichs
  3 siblings, 0 replies; 80+ messages in thread
From: Eric Van Hensbergen @ 2005-07-01 13:29 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Miklos Szeredi, aia21, arjan, linux-kernel, frankvm,
	v9fs-developer

On 7/1/05, Andrew Morton <akpm@osdl.org> wrote:
> Miklos Szeredi <miklos@szeredi.hu> wrote: 
> > > > Userspace can tell the kernel, how long a dentry should be valid.  I
> > > > don't think the NFS protocol provides this. Same holds for the inode
> > > > attributes.
> > >
> > > Why is that needed?
> >
> > Because, I can well imagine a synthetic filesystem, where file
> > data/metadata change arbitrarily.  In this case the timeout heuristic
> > in NFS is not useful.
> >
> > In fact with NFS it's often a PITA, that it doesn't want to refresh a
> > file's data/metadata, which I _know_ has changed on the server.
> 
> I think nfs can do this, as long as the modification was done through the
> server.  I'd expect v9fs would be the same.
> 

v9fs aggressively invalidates dentries by default -- it is our
experience that caching metadata (particularly in synthetics) causes
more problems than it is worth.  That being said, there are prototype
designs for v9fs cache layers which actively detect if underlying file
systems are synthetic or static and allow parametrized cache policies
(for both the dcache and the page cache).
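
To make the caching trade-off concrete, here is a small Python sketch of
a lookup cache whose entries carry a validity timeout chosen by the file
server, in the spirit of the per-dentry timeout discussed above.  A
timeout of 0 models the "always revalidate" default.  The class and
method names are invented for illustration and do not correspond to
v9fs, NFS or FUSE code.

```python
class TimedLookupCache:
    """Cache of name -> value where each entry has its own lifetime."""

    def __init__(self, lookup):
        # lookup is a callable(name) -> (value, ttl_seconds); it stands
        # in for a round-trip to the file server.
        self._lookup = lookup
        self._entries = {}  # name -> (value, expiry_time)
        self.misses = 0

    def get(self, name, now):
        """Return the cached value, asking the server again once expired."""
        entry = self._entries.get(name)
        if entry is not None and now < entry[1]:
            return entry[0]
        self.misses += 1
        value, ttl = self._lookup(name)
        self._entries[name] = (value, now + ttl)
        return value
```

With a server that returns a long timeout for static files and 0 for
synthetic ones, the static entries are served from the cache while
every access to a synthetic entry goes back to the server.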

As a side-note which I know less about, I believe NFSv4 includes
server-push invalidation semantics, but I can't remember if that
applies to metadata or just data.

          -eric

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: FUSE merging?
  2005-07-01 12:53                               ` Anton Altaparmakov
  2005-07-01 13:07                                 ` Anton Altaparmakov
@ 2005-07-01 13:51                                 ` Frank van Maarseveen
  1 sibling, 0 replies; 80+ messages in thread
From: Frank van Maarseveen @ 2005-07-01 13:51 UTC (permalink / raw)
  To: Anton Altaparmakov
  Cc: Andrew Morton, Miklos Szeredi, arjan, linux-kernel, frankvm

On Fri, Jul 01, 2005 at 01:53:54PM +0100, Anton Altaparmakov wrote:
> On Fri, 2005-07-01 at 04:29 -0700, Andrew Morton wrote:
> > Sorry, but I'm not buying it.  I still don't see a solid reason why all
> > this could not be done with nfs/v9fs, some kernel tweaks and the rest in
> > userspace.  It would take some effort, but that effort would end up
> > strengthening existing kernel capabilities rather than adding brand new
> > things, which is good.
> 
> Also can the NFS approach provide me with different content depending on
> the uid of the accessing process?  With FUSE that is easy as pie.  Even
> easier than that actually...

unfsd can do that, I believe.  However, FUSE and user space NFSd are complementary.
For every NFS solution one still needs to do the mounting as root. FUSE
addresses the client side: it can implement a user space NFS client.

-- 
Frank

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: FUSE merging?
  2005-07-01 13:21                             ` Eric Van Hensbergen
@ 2005-07-01 13:53                               ` Miklos Szeredi
  2005-07-01 14:18                                 ` Eric Van Hensbergen
  0 siblings, 1 reply; 80+ messages in thread
From: Miklos Szeredi @ 2005-07-01 13:53 UTC (permalink / raw)
  To: ericvh; +Cc: miklos, akpm, aia21, arjan, linux-kernel, frankvm, v9fs-developer

> I don't know where 9P "suffers" from being too generic, it's just
> well-designed and has done a good job of keeping things simple --
> something that the plethora of over-designed, bloated interfaces of
> today could learn from.

True.  I very much like the simplicity of the 9P protocol.  But its
system independence sometimes makes it fit poorly to the Linux VFS
interface.  I guess you have a wide experience with this :)

> > > Plus NFS and v9fs work across the network...
> > 
> > Yes.  I consider that a drawback.  FUSE does data transfer very
> > efficiently (single copy), without the heavy network infrastructure
> > being in the way.
> > 
> 
> I'll grant you this is something v9fs-2.0 suffers from, but it's
> something we are actively addressing in v9fs-2.1.  We're working more
> towards the implementation that is present in the Plan 9 kernel, where
> the core efficiently multiplexes the requests either directly to local
> servers (in Plan 9's case via function call APIs) or encapsulates them
> for shipping across the network.  The 9P interface is used for both,
> it just has different embodiments depending on underlying transport.
> 
> That being said, I imagine the time spent context switching in and out
> of the kernel dominates performance.

Context switch happens from one process to the other, not when
entering/leaving the kernel (which is very efficient).

So it's much more important to reduce the number of round-trips for a
single operation, than multiplexing requests for multiple operations.

> With a proper mux there is no reason why v9fs can't be made as
> efficient as FUSE - and that's what we intend to demonstrate in
> v9fs-2.1.  Plus, with v9fs you get the benefit of being able to
> export your synthetic file systems over the network with no
> additional copies.

Yes, but does that matter?  I'm not sure that it's a good idea
bundling network filesystem functionality together with userspace
filesystem functionality.  Each has its own set of requirements, and
its own way of working optimally.

What would people say if ext3 was always mounted locally through NFS,
because the kernel would only provide the NFS filesystem client?

Differentiation of interfaces depending on the "closeness" of the
client to the server makes good sense IMO.  We currently have
in-kernel and across-network.  FUSE adds in-userspace in between those
two.

Sometimes these can overlap, but one interface will always be more
optimal (in terms of functionality as well as speed) for a specific
application.

> Further, when you create an infrastructure which is meant to work over
> a network, you take fewer things for granted -- which ultimately leads
> to a more robust system capable of dealing with many of these
> problems.

Yes.  I'm not speaking against v9fs, which I think has a valid niche,
as well as FUSE.

Miklos

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: FUSE merging?
  2005-07-01 13:53                               ` Miklos Szeredi
@ 2005-07-01 14:18                                 ` Eric Van Hensbergen
  2005-07-01 14:31                                   ` Miklos Szeredi
  0 siblings, 1 reply; 80+ messages in thread
From: Eric Van Hensbergen @ 2005-07-01 14:18 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: akpm, aia21, arjan, linux-kernel, frankvm, v9fs-developer

On 7/1/05, Miklos Szeredi <miklos@szeredi.hu> wrote:
> > I don't know where 9P "suffers" from being too generic, it's just
> > well-designed and has done a good job of keeping things simple --
> > something that the plethora of over-designed, bloated interfaces of
> > today could learn from.
> 
> True.  I very much like the simplicity of the 9P protocol.  But its
> system independence sometimes makes it fit poorly to the Linux VFS
> interface.  I guess you have a wide experience with this :)
>

Yeah, but most of our problems had less to do with the VFS interface
per se, and more to do with the dcache/page-cache.   In the long run,
the portability is something you may want though -- not only to
provide support under BSD or whatever, but also to insulate changes in
the VFS API from user file servers.

> 
> So it's much more important to reduce the number of round-trips for a
> single operation, than multiplexing requests for multiple operations.
>

Agreed, this will be something we'll (v9fs) have to keep close tabs
on to keep things efficient.

> > With a proper mux there is no reason why v9fs can't be made as
> > efficient as FUSE - and that's what we intend to demonstrate in
> > v9fs-2.1.  Plus, with v9fs you get the benefit of being able to
> > export your synthetic file systems over the network with no
> > additional copies.
> 
> Yes, but does that matter?  I'm not sure that it's a good idea
> bundling network filesystem functionality together with userspace
> filesystem functionality.  Each has its own set of requirements, and
> its own way of working optimally.
> 

I see your point, but increasingly common usage environments are
distributed systems and I think network synthetics will have their
niche.

> What would people say if ext3 was always mounted locally through NFS,
> because the kernel would only provide the NFS filesystem client?

Probably the same thing they would say if ext3 was a user-space
application that always needed to be mounted via FUSE ;)

> 
> Differentiation of interfaces depending on the "closeness" of the
> client to the server makes good sense IMO.  We currently have
> in-kernel and across-network.  FUSE adds in-userspace in between those
> two.
> 

I think that remains to be seen.  There is much to be gained from
blurring the differentiation as we move Linux towards a first-class
distributed system.  If unified interfaces can be made "good-enough"
performance-wise, what justifies having multiple interfaces depending
on network versus local?  Specialization has its place, but
performance mongering at the cost of design is what killed systems
research.  In the end, I think it's always worth striving towards
unified interfaces when performance doesn't suffer to a great degree.

> 
> > Further, when you create an infrastructure which is meant to work over
> > a network, you take fewer things for granted -- which ultimately leads
> > to a more robust system capable of dealing with many of these
> > problems.
> 
> Yes.  I'm not speaking against v9fs, which I think has a valid niche,
> as well as FUSE.
> 

FUSE certainly has its place, and has done a great job creating an
environment in which it is relatively easy to create new file systems
in user-space.  My main point in responding was to take the position
that the v9fs mechanisms are adequate to provide user-space file
systems and that while it was not the primary motivation behind the
v9fs project, we are actively pursuing improving the performance and
robustness of our mechanisms for providing user-space (as well as
kernel-space) file service and developing an SDK to ease the
implementation of 9P-based synthetic file servers.

         -eric

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: FUSE merging?
  2005-07-01 14:18                                 ` Eric Van Hensbergen
@ 2005-07-01 14:31                                   ` Miklos Szeredi
  2005-07-02 10:01                                     ` Eric W. Biederman
  0 siblings, 1 reply; 80+ messages in thread
From: Miklos Szeredi @ 2005-07-01 14:31 UTC (permalink / raw)
  To: ericvh; +Cc: akpm, aia21, arjan, linux-kernel, frankvm, v9fs-developer

> > What would people say if ext3 was always mounted locally through NFS,
> > because the kernel would only provide the NFS filesystem client?
> 
> Probably the same thing they would say if ext3 was a user-space
> application that always needed to be mounted via FUSE ;)

Yes, and rightly.

One of the misunderstandings about userspace filesystems (Linus falls
into this) is to compare them with microkernels.

FUSE (and userspace filesystems in general) are NOT meant to replace
in-kernel filesystems or the VFS.  They are an addition with which
different kinds of filesystems can be implemented much better than
they could be in the kernel.

Miklos

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: FUSE merging?
  2005-07-01 13:21                             ` Miklos Szeredi
@ 2005-07-01 15:20                               ` Frank van Maarseveen
  2005-07-01 17:04                                 ` Miklos Szeredi
  0 siblings, 1 reply; 80+ messages in thread
From: Frank van Maarseveen @ 2005-07-01 15:20 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: frankvm, akpm, aia21, arjan, linux-kernel

On Fri, Jul 01, 2005 at 03:21:59PM +0200, Miklos Szeredi wrote:
> 
> > To require an empty stub to mount FUSE upon makes the whole picture
> > cleaner: users are only able to extend the namespace _leaf_ nodes for
> > themselves and processes they can send signals to: setuid programs
> > which do not fully become root. The existing namespace [nodes] remains
> > unchanged for everyone.
> 
> It's not that simple.  A filesystem can be mounted many times (either
> with mount --bind, or just by mounting the same device on multiple
> mountpoints).  In this case you can't ensure that a mountpoint will
> remain a leaf node after being mounted on.

I have bind-mounted / on /net/blabla
I tried two experiments:

	mounting something under / and looking for it under /net/blabla
	mounting something under /net/blabla and looking for it under /

The experiment was done with bind mounts and by mounting a USB stick
(/dev/sdb1) and there was no auto propagation of mounts.

(2.6.12-rc6)

How can a leaf dir suddenly become non-leaf by a mount without an explicit
mount command?

-- 
Frank

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: FUSE merging?
  2005-07-01 11:29                             ` Andrew Morton
                                                 ` (2 preceding siblings ...)
  2005-07-01 13:29                               ` Eric Van Hensbergen
@ 2005-07-01 16:45                               ` Matthias Urlichs
  3 siblings, 0 replies; 80+ messages in thread
From: Matthias Urlichs @ 2005-07-01 16:45 UTC (permalink / raw)
  To: linux-kernel

Hi, Andrew Morton wrote:

> Sorry, but I'm not buying it.  I still don't see a solid reason why all
> this could not be done with nfs/v9fs, some kernel tweaks and the rest in
> userspace.

Let's forget about NFS here. It's stateless. You don't want a wholly
stateless layer between two stateful instances; the fact that it works for
a disk-based NFS server isn't proof that it'd work for gmailfs or sshfs.

There are a lot of FUSE server implementations out there already.
You want all of them to rewrite their code for v9fs?

I admit that I know zilch about how difficult it is to write a v9fs
server (is there sane sample code / a support library?) or how much
overhead such a server would incur or how safe it'd be to run a
user-controlled server on the same machine as the mountpoint.
The point is that the FUSE people already cover all these points,
thus: unless there's a major technical problem with it that v9fs solves
better, I'd advocate including it.

-- 
Matthias Urlichs   |   {M:U} IT Design @ m-u-it.de   |  smurf@smurf.noris.de
Disclaimer: The quote was selected randomly. Really. | http://smurf.noris.de
 - -
Magpie, n.:
	A bird whose thievish disposition suggested to someone that it
	might be taught to talk.
		-- Ambrose Bierce, "The Devil's Dictionary"

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: FUSE merging?
  2005-07-01 15:20                               ` Frank van Maarseveen
@ 2005-07-01 17:04                                 ` Miklos Szeredi
  2005-07-01 18:04                                   ` Frank van Maarseveen
  0 siblings, 1 reply; 80+ messages in thread
From: Miklos Szeredi @ 2005-07-01 17:04 UTC (permalink / raw)
  To: frankvm; +Cc: akpm, aia21, arjan, linux-kernel

> > It's not that simple.  A filesystem can be mounted many times (either
> > with mount --bind, or just by mounting the same device on multiple
> > mountpoints).  In this case you can't ensure that a mountpoint will
> > remain a leaf node after being mounted on.
> 
> I have bind-mounted / on /net/blabla
> I tried two experiments:
> 
> 	mounting something under / and looking for it under /net/blabla
> 	mounting something under /net/blabla and looking for it under /
> 
> The experiment was done with bind mounts and by mounting a USB stick
> (/dev/sdb1) and there was no auto propagation of mounts.

I'm not talking about auto propagation (that's only now being
implemented by Ram Pai, and is not in stock kernels).

What I'm saying is that mounting something over a leaf node does not
guarantee that it will remain a leaf node after it's been mounted on.

For example:

mkdir /tmp/leafdir
mkdir /tmp/rootcopy
mount --bind / /tmp/rootcopy
mount /dev/sdb1 /tmp/leafdir
mkdir /tmp/rootcopy/tmp/leafdir/child

Now 'leafdir' is no longer a leaf.

I'm not saying this is a problem, but also I don't see any
overwhelming reason to not allow user mounts over non-leaf
directories.

Miklos

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: FUSE merging?
  2005-07-01 17:04                                 ` Miklos Szeredi
@ 2005-07-01 18:04                                   ` Frank van Maarseveen
  2005-07-01 19:35                                     ` Jeremy Maitin-Shepard
  2005-07-02 14:49                                     ` Miklos Szeredi
  0 siblings, 2 replies; 80+ messages in thread
From: Frank van Maarseveen @ 2005-07-01 18:04 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: frankvm, akpm, aia21, arjan, linux-kernel

On Fri, Jul 01, 2005 at 07:04:50PM +0200, Miklos Szeredi wrote:

> I'm not saying this is a problem, but also I don't see any
> overwhelming reason to not allow user mounts over non-leaf
> directories.

All things considered I'd still prefer forbidding FUSE mounts on non-leaf
dirs. For name space sanity. And it may be easier to get the whole thing
accepted:

-	One could argue that the existing name space is extended rather than
	changed [for a subset of processes], what Al Viro seems to reject.
-	The processes which cannot be ptraced/sent a signal by the mount
	owner are not "forced" to traverse the FUSE mount for the sake of
	name space invariancy, with all associated security problems: they
	can see everything up to the leaf node of all the usual mounts.

But put otherwise: is there a compelling reason to permit FUSE mounts on
non-leaf nodes?

Can FUSE mount on a file like NFS?

What is your opinion about replacing the ptrace check by a signal check
(later on, no hurry)?

-- 
Frank

* Re: FUSE merging?
  2005-07-01 18:04                                   ` Frank van Maarseveen
@ 2005-07-01 19:35                                     ` Jeremy Maitin-Shepard
  2005-07-02 14:49                                     ` Miklos Szeredi
  1 sibling, 0 replies; 80+ messages in thread
From: Jeremy Maitin-Shepard @ 2005-07-01 19:35 UTC (permalink / raw)
  To: linux-kernel

Frank van Maarseveen <frankvm@frankvm.com> writes:

[snip]

> But put otherwise: is there a compelling reason to permit FUSE mounts on
> non-leaf nodes?

In my own use of FUSE, I have found it handy to stick mount scripts in
some of the directories that I use as FUSE mount points.

-- 
Jeremy Maitin-Shepard

* Re: FUSE merging?
  2005-07-01 14:31                                   ` Miklos Szeredi
@ 2005-07-02 10:01                                     ` Eric W. Biederman
  2005-07-02 14:58                                       ` Miklos Szeredi
  2005-07-02 16:43                                       ` Eric Van Hensbergen
  0 siblings, 2 replies; 80+ messages in thread
From: Eric W. Biederman @ 2005-07-02 10:01 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: ericvh, akpm, aia21, arjan, linux-kernel, frankvm, v9fs-developer

Miklos Szeredi <miklos@szeredi.hu> writes:

>> > What would people say if ext3 was always mounted locally through NFS,
>> > because the kernel would only provide the NFS filesystem client.
>> 
>> Probably the same thing they would say if ext3 was a user-space
>> application that always needed to be mounted via FUSE ;)
>
> Yes, and rightly.
>
> One of the misunderstandings about userspace filesystems (Linus falls
> into this) is to compare it with microkernels.
>
> FUSE (and userspace filesystems in general) are NOT meant to replace
> in kernel filesystems or the VFS.  They are an addition with which
> different kinds of filesystems can be implemented much better than
> they could be in kernel.

Taking a quick glance at v9fs and fuse I fail to see how either
plays nicely with the page cache.

v9fs according to my reading of the protocol specification does
not have any concept of a lease.  So you can't tell if you are
talking about a virtual filesystem where all calls should be passed
straight to the server or a real filesystem where you can perform
caching.  The implementation simply appears to bypass the pagecache
which seems sane.

Skimming through the FUSE code I see the same problem, in that you can't
autodetect the right thing.  This is currently hacked around with
"direct_io" mount option selecting between a cached and a non-cached
status on a filesystem basis at mount time.  But having
a per file flag would be nicer.  I also don't understand
why in fuse direct_io is an if statement in fuse_file_read/write
instead of simply being a different set of filesystem operations.

Neither implementation seems to forward user space locks to the
filesystem server.

Eric

* Re: FUSE merging?
  2005-07-01 18:04                                   ` Frank van Maarseveen
  2005-07-01 19:35                                     ` Jeremy Maitin-Shepard
@ 2005-07-02 14:49                                     ` Miklos Szeredi
  2005-07-02 16:00                                       ` Frank van Maarseveen
  1 sibling, 1 reply; 80+ messages in thread
From: Miklos Szeredi @ 2005-07-02 14:49 UTC (permalink / raw)
  To: frankvm; +Cc: akpm, aia21, arjan, linux-kernel

> > I'm not saying this is a problem, but also I don't see any
> > overwhelming reason to not allow user mounts over non-leaf
> > directories.
> 
> All things considered I'd still prefer forbidding FUSE mounts on non-leaf
> dirs. For name space sanity. And it may be easier to get the whole thing
> accepted:
> 
> -	One could argue that the existing name space is extended rather than
> 	changed [for a subset of processes], what Al Viro seems to reject.
> -	The processes which cannot be ptraced/sent a signal by the mount
> 	owner are not "forced" to traverse the FUSE mount for the sake of
> 	name space invariancy, with all associated security problems: they
> 	can see everything up to the leaf node of all the usual mounts.
> 
> But put otherwise: is there a compelling reason to permit FUSE mounts on
> non-leaf nodes?

Not really.  Maybe it does have some uses, but I'm not aware of any.

But I don't think it would matter in the acceptance of the mount
hiding patch, since that patch was not rejected on the basis of what
FUSE would use it for, rather for the general philosophy of not
allowing namespace differences based on user id.

> Can FUSE mount on a file like NFS?

Yes.

> What is your opinion about replacing the ptrace check by a signal check
> (later on, no hurry)?

Maybe.  You'd still have to convince me, that signals sent to suid
programs are not a security problem.

Miklos

* Re: FUSE merging?
  2005-07-02 10:01                                     ` Eric W. Biederman
@ 2005-07-02 14:58                                       ` Miklos Szeredi
  2005-07-02 16:43                                       ` Eric Van Hensbergen
  1 sibling, 0 replies; 80+ messages in thread
From: Miklos Szeredi @ 2005-07-02 14:58 UTC (permalink / raw)
  To: ebiederm
  Cc: ericvh, akpm, aia21, arjan, linux-kernel, frankvm, v9fs-developer

> Taking a quick glance at v9fs and fuse I fail to see how either
> plays nicely with the page cache.
> 
> v9fs according to my reading of the protocol specification does
> not have any concept of a lease.  So you can't tell if you are
> talking about a virtual filesystem where all calls should be passed
> straight to the server or a real filesystem where you can perform
> caching.  The implementation simply appears to bypass the pagecache
> which seems sane.
> 
> Skimming through the FUSE code I see the same problem, in that you can't
> autodetect the right thing.  This is currently hacked around with
> "direct_io" mount option selecting between a cached and a non-cached
> status on a filesystem basis at mount time.  But having
> a per file flag would be nicer.

There's a plan to make this work.  The kernel ABI has already been
prepared for this, so it would be relatively little work to implement.
But I usually wait with something like this until people actually
start asking for this feature.

> I also don't understand why in fuse direct_io is an if statement in
> fuse_file_read/write instead of simply being a different set of
> filesystem operations.

Good point.  I'll fix that.
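
A minimal userspace sketch of that restructuring, not the FUSE code:
the names file_ops_sketch, cached_read, direct_read and select_ops are
made up for illustration.  The idea is to pick one of two operation
tables once, so the read/write paths no longer test a direct_io flag
on every call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a file_operations table; not the FUSE structs. */
struct file_ops_sketch {
	const char *name;
	size_t (*read)(char *buf, size_t len);
};

/* Cached variant: a real filesystem would go through the page cache here. */
static size_t cached_read(char *buf, size_t len)
{
	assert(buf != NULL);
	memset(buf, 'c', len);
	return len;
}

/* Direct variant: a real filesystem would forward each request to userspace. */
static size_t direct_read(char *buf, size_t len)
{
	assert(buf != NULL);
	memset(buf, 'd', len);
	return len;
}

static const struct file_ops_sketch cached_ops = { "cached", cached_read };
static const struct file_ops_sketch direct_ops = { "direct", direct_read };

/* Choose the table once, at setup time; no per-call flag test remains. */
static const struct file_ops_sketch *select_ops(int direct_io)
{
	return direct_io ? &direct_ops : &cached_ops;
}
```

With this shape, a per-file flag would only change which table is
handed out at open time; the two read paths stay independent.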

> Neither implementation seems to forward user space locks to the
> filesystem server.

This too has been discussed.  The last half year has been mostly spent
on ironing out problems caught during integration.  Sometime this
summer I'll start implementing these new features (inode based API,
locking, userspace NFS serving, maybe shared writable mmap support).

Miklos

* Re: FUSE merging?
  2005-07-02 14:49                                     ` Miklos Szeredi
@ 2005-07-02 16:00                                       ` Frank van Maarseveen
  2005-07-03  6:16                                         ` Miklos Szeredi
  0 siblings, 1 reply; 80+ messages in thread
From: Frank van Maarseveen @ 2005-07-02 16:00 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: frankvm, akpm, aia21, arjan, linux-kernel

On Sat, Jul 02, 2005 at 04:49:24PM +0200, Miklos Szeredi wrote:
> > 
> > All things considered I'd still prefer forbidding FUSE mounts on non-leaf
> > dirs. For name space sanity. And it may be easier to get the whole thing
> > accepted:
> > 
> 
> But I don't think it would matter in the acceptance of the mount
> hiding patch, since that patch was not rejected on the basis of what
> FUSE would use it for, rather for the general philosophy of not
> allowing namespace differences based on user id.

That would really be a loss.

After some thinking, the whole "not allowing namespace differences
based on user id" philosophy is unenforceable and not even true sometimes
nowadays. Think NFS: have a look at the unfsd server, you'll be surprised
what it can do. Think any other networked file system exported by a
machine with an unusual disk file-system underneath. IIRC ncpfs does
this on the server based on access and thus based on uid.

(hmm, I _hated_ it seeing empty directories only because I had no access
 to anything below. Based on that I'd prefer EACCES instead of seeing an
 empty mount stub when FUSE denies access to root or any other user.)

The thing is, root rules the _local_ part of the name space. So it should
make a _huge_ difference if FUSE can fiddle with that or only with what's
below the leaf nodes.

> > What is your opinion about replacing the ptrace check by a signal check
> > (later on, no hurry)?
> 
> Maybe.  You'd still have to convince me, that signals sent to suid
> programs are not a security problem.

google kill(2):

	http://www.opengroup.org/onlinepubs/007908799/xsh/kill.html

It is _defined_ behavior. So, it is up to the quality of the programmer
whether or not it results in a security problem ;-)

-- 
Frank

* Re: FUSE merging?
  2005-07-02 10:01                                     ` Eric W. Biederman
  2005-07-02 14:58                                       ` Miklos Szeredi
@ 2005-07-02 16:43                                       ` Eric Van Hensbergen
  2005-07-02 17:33                                         ` Eric W. Biederman
  1 sibling, 1 reply; 80+ messages in thread
From: Eric Van Hensbergen @ 2005-07-02 16:43 UTC (permalink / raw)
  To: Eric W. Biederman, Miklos Szeredi
  Cc: akpm, aia21, arjan, linux-kernel, frankvm, v9fs-developer

On Sat, 2 Jul 2005 6:15 am, Eric W. Biederman wrote:
>
> Taking a quick glance at v9fs and fuse I fail to see how either
> plays nicely with the page cache.
>

True, in fact it actively avoids using it.  The previous version used 
both the page cache and the dcache with undesirable effects on synthetic 
file systems so we removed cache support.  Our intention is to design a 
cache layer (similar to cfs on Plan 9) which handles cache semantics 
which can be parameterized with the appropriate cache policy depending 
on the underlying file server.

> v9fs according to my reading of the protocol specification does
> not have any concept of a lease.  So you can't tell if you are
> talking about a virtual filesystem where all calls should be passed
> straight to the server or a real filesystem where you can perform
> caching.

While 9P contains no explicit support for leases and caching there is
an informal mechanism which is used (at least for plan 9 file servers).  
If the qid.vers is 0 the file can be assumed to be a synthetic file and 
so it is not cached.
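
The convention can be sketched as follows.  This is an illustration
under assumptions, not v9fs code: qid_sketch, qid_may_cache and
qid_cache_valid are made-up names, and the second helper additionally
assumes the usual 9P meaning of qid.vers (it changes when the file is
modified).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* 9P qid fields: type (1 byte), version (4 bytes), path (8 bytes). */
struct qid_sketch {
	uint8_t  type;
	uint32_t vers;
	uint64_t path;
};

/* Informal rule from the text: vers == 0 marks a synthetic file. */
static bool qid_may_cache(const struct qid_sketch *q)
{
	assert(q != NULL);
	return q->vers != 0;
}

/* Assumed policy: cached data stays usable only for the same path and version. */
static bool qid_cache_valid(const struct qid_sketch *cached,
			    const struct qid_sketch *fresh)
{
	assert(cached != NULL && fresh != NULL);
	return qid_may_cache(fresh) &&
	       cached->path == fresh->path &&
	       cached->vers == fresh->vers;
}
```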

>
> Neither implementation seems to forward user space locks to the
> filesystem server.
>

Yup.  We have exclusive open semantics but not locks in the Posix 
sense.  Lock support is on our 2.1 roadmap.

    -eric

* Re: FUSE merging?
  2005-07-02 16:43                                       ` Eric Van Hensbergen
@ 2005-07-02 17:33                                         ` Eric W. Biederman
  0 siblings, 0 replies; 80+ messages in thread
From: Eric W. Biederman @ 2005-07-02 17:33 UTC (permalink / raw)
  To: Eric Van Hensbergen
  Cc: Miklos Szeredi, akpm, aia21, arjan, linux-kernel, frankvm,
	v9fs-developer

Eric Van Hensbergen <ericvh@gmail.com> writes:

> On Sat, 2 Jul 2005 6:15 am, Eric W. Biederman wrote:
>>
>> Taking a quick glance at v9fs and fuse I fail to see how either
>> plays nicely with the page cache.
>>
>
> True, in fact it actively avoids using it.  The previous version used both the
> page cache and the dcache with undesirable effects on synthetic file systems so
> we removed cache support.  Our intention is to design a cache layer (similar to
> cfs on Plan 9) which handles cache semantics which can be parameterized with the
> appropriate cache policy depending on the underlying file server.

Not having auto discovery for that kind of thing disturbs me.  But
if you can discover what you must do, and the policy is then only
about what you may do, I guess I'm fine with that.

>> v9fs according to my reading of the protocol specification does
>> not have any concept of a lease.  So you can't tell if you are
>> talking about a virtual filesystem where all calls should be passed
>> straight to the server or a real filesystem where you can perform
>> caching.
>
> While 9P contains no explicit support for leases and caching there is an
> informal mechanism which is used (at least for plan 9 file servers).  If the
> qid.vers is 0 the file can be assumed to be a synthetic file and so it is not
> cached.

That sounds sane.  With that you can at least do NFS style caching 
with a lot of stat calls to verify your cache is coherent and by
implementing it as a write-through cache you can even do a halfway
decent job of being cache coherent.  Which is probably about the
best you can do with the current unix API.

With a write-through cache you can likely achieve the same
semantic effect of totally not caching a file with an appropriate
number of stat calls.  Not caching some files will like yield 

I suggest you document the qid.vers == 0 magic for an uncacheable
file, so future interoperability is assured.

Eric

* Re: FUSE merging?
  2005-07-02 16:00                                       ` Frank van Maarseveen
@ 2005-07-03  6:16                                         ` Miklos Szeredi
  2005-07-03 11:25                                           ` Frank van Maarseveen
  0 siblings, 1 reply; 80+ messages in thread
From: Miklos Szeredi @ 2005-07-03  6:16 UTC (permalink / raw)
  To: frankvm; +Cc: akpm, aia21, arjan, linux-kernel

> After some thinking, the whole "not allowing namespace differences
> based on user id" philosophy is unenforceable and not even true sometimes
> nowadays. Think NFS: have a look at the unfsd server, you'll be surprised
> what it can do. Think any other networked file system exported by a
> machine with an unusual disk file-system underneath. IIRC ncpfs does
> this on the server based on access and thus based on uid.

Hmm, do you mean returning different directory contents based on uid?

> (hmm, I _hated_ it seeing empty directories only because I had no access
>  to anything below. Based on that I'd prefer EACCES instead of seeing an
>  empty mount stub when FUSE denies access to root or any other user.)

Well, it works that way currently, and there doesn't seem to be any
real problem with it.

> The thing is, root rules the _local_ part of the name space. So it should
> make a _huge_ difference if FUSE can fiddle with that or only with what's
> below the leaf nodes.

I don't really understand what you mean by "local".

The problem with this leaf node philosophy, is that it's not really
consistent.  You can ensure that a mountpoint is a leaf node at mount
time, but you can force it to remain a leaf node after the mount.  So
I don't see why this check at mount time would make _any_ difference.

> > > What is your opinion about replacing the ptrace check by a signal check
> > > (later on, no hurry)?
> > 
> > Maybe.  You'd still have to convince me, that signals sent to suid
> > programs are not a security problem.
> 
> google kill(2):
> 
> 	http://www.opengroup.org/onlinepubs/007908799/xsh/kill.html
> 
> It is _defined_ behavior. So, it is up to the quality of the programmer
> whether or not it results in a security problem ;-)

Ahh, right.

The info leak argument still holds, but it's pretty weak.

So if the current behavior causes a problem for somebody, and relaxing
the check from ptraceability to killability fixes it, then I'll
consider doing it.  Until then, let's keep the more secure check.
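
To make the difference between the two checks concrete, here is a
simplified model.  may_signal follows the POSIX kill() rule for
unprivileged senders (the sender's real or effective uid must match
the target's real or saved set-user-ID).  may_trace is only an assumed
approximation of a ptrace-style check: it ignores group IDs,
capabilities and the dumpable flag.  cred_sketch and both function
names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical credential triple: real, effective, saved set-user-ID. */
struct cred_sketch {
	unsigned ruid, euid, suid;
};

/* POSIX kill() permission rule for a sender without privileges. */
static bool may_signal(const struct cred_sketch *s, const struct cred_sketch *t)
{
	assert(s != NULL && t != NULL);
	return s->ruid == t->ruid || s->ruid == t->suid ||
	       s->euid == t->ruid || s->euid == t->suid;
}

/* Assumed stricter rule: every uid of the target equals the sender's real uid. */
static bool may_trace(const struct cred_sketch *s, const struct cred_sketch *t)
{
	assert(s != NULL && t != NULL);
	return t->ruid == s->ruid && t->euid == s->ruid && t->suid == s->ruid;
}
```

A suid-root program started by uid 1000 keeps real uid 1000, so under
this model the user may signal it but may not trace it.  That is
exactly the case where relaxing the check would widen access.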

Miklos

* Re: FUSE merging?
  2005-07-03  6:16                                         ` Miklos Szeredi
@ 2005-07-03 11:25                                           ` Frank van Maarseveen
  2005-07-03 13:24                                             ` Miklos Szeredi
  0 siblings, 1 reply; 80+ messages in thread
From: Frank van Maarseveen @ 2005-07-03 11:25 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: frankvm, akpm, aia21, arjan, linux-kernel

On Sun, Jul 03, 2005 at 08:16:37AM +0200, Miklos Szeredi wrote:
> > After some thinking, the whole "not allowing namespace differences
> > based on user id" philosophy is unenforceable and not even true sometimes
> > nowadays. Think NFS: have a look at the unfsd server, you'll be surprised
> > what it can do. Think any other networked file system exported by a
> > machine with an unusual disk file-system underneath. IIRC ncpfs does
> > this on the server based on access and thus based on uid.
> 
> Hmm, do you mean returning different directory contents based on uid?

	http://clusternfs.sourceforge.net

Don't ask me how this plays with the dcache.

> > The thing is, root rules the _local_ part of the name space. So it should
> > make a _huge_ difference if FUSE can fiddle with that or only with what's
> > below the leaf nodes.
> 
> I don't really understand what you mean by "local".

The opposite of "local" is "remote", i.e. networked filesystems:

	mount foo:/bar /usr/src/bar

/, /usr and /usr/src are stored on a local disk. /usr/src/bar/* is not.
Namespace invariance can be guaranteed for the "/usr/src" part. Not for
anything below unless you control the peer.

> 
> The problem with this leaf node philosophy, is that it's not really
> consistent.  You can ensure that a mountpoint is a leaf node at mount
> time, but you cannot force it to remain a leaf node after the mount.  So
                   ^^^
                 inserted by me

ok, I just remembered that any process with an open directory handle
could still fchdir() underneath. I think the leaf node enforcing is
possible but it is indeed a bit more complicated.

(Hmm, it's a bit bizarre but could you mount FUSE on, for example, a
 named pipe and change it into a directory?)

> I don't see why this check at mount time would make _any_ difference.

It should be possible to do audits on local filesystems, e.g. by:

	find / /home /var -xdev ....

This can be done as root but sometimes you may want to do this with the
uid/gid of a specific user, for safety or for checking what the user
actually can access or damage. And that won't work as expected when the
user places a FUSE mount on top of his own login directory. But I don't
think leaf node enforcing is required from a security point of view. This
is the only thing I could come up with.

IMHO The namespace argument against FUSE is weak for multiple reasons. The
only variancy I see is when crossing the mount point. And that disappears
once EACCES is returned when non-ptraceable processes try to cross it.
But that's not really acceptable (see previous audit case) unless FUSE
refuses to mount on non-leaf dirs.

-- 
Frank

* Re: FUSE merging?
  2005-07-03 11:25                                           ` Frank van Maarseveen
@ 2005-07-03 13:24                                             ` Miklos Szeredi
  2005-07-03 13:50                                               ` Frank van Maarseveen
  0 siblings, 1 reply; 80+ messages in thread
From: Miklos Szeredi @ 2005-07-03 13:24 UTC (permalink / raw)
  To: frankvm; +Cc: akpm, aia21, arjan, linux-kernel

> > Hmm, do you mean returning different directory contents based on uid?
> 
> 	http://clusternfs.sourceforge.net
> 
> Don't ask me how this plays with the dcache.

But here the decision on what to return is in the _server_.  There's
nothing magic about that.  It's as if it was N different servers for N
different clients, only more effective.

> The opposite of "local" is "remote", i.e. networked filesystems:
> 
> 	mount foo:/bar /usr/src/bar
> 
> /, /usr and /usr/src are stored on a local disk. /usr/src/bar/* is not.
> Namespace invariance can be guaranteed for the "/usr/src" part. Not for
> anything below unless you control the peer.

I think what you call namespace invariance is basically true for all
existing filesystems.  There could be a filesystem which returns
different directory contents based on whatever it wants, but it can't
return a different "dentry" for the same name.

So file/directory _content_ can be made to vary, but the namespace
itself can't.

> > 
> > The problem with this leaf node philosophy, is that it's not really
> > consistent.  You can ensure that a mountpoint is a leaf node at mount
> > time, but you cannot force it to remain a leaf node after the mount.  So
>                    ^^^
>                  inserted by me

[well corrected :)]

> 
> ok, I just remembered that any process with an open directory handle
> could still fchdir() underneath. I think the leaf node enforcing is
> possible but it is indeed a bit more complicated.
> 
> (Hmm, it's a bit bizarre but could you mount FUSE on, for example, a
>  named pipe and change it into a directory?)

No.  Fusermount checks file type and refuses the mount if there's a
mismatch (and it protects against races by mounting on '.' for
directories, and on '/proc/self/fd/X' for regular files).

> > I don't see why this check at mount time would make _any_ difference.
> 
> It should be possible to do audits on local filesystems, e.g. by:
> 
> 	find / /home /var -xdev ....
> 
> This can be done as root but sometimes you may want to do this with the
> uid/gid of a specific user, for safety or for checking what the user
> actually can access or damage.

But note, that running with the uid/gid of the user exposes the
auditing script to manipulation (kill, ptrace) by the user.  Running
with changed fsuid/fsgid is OK though.

> And that won't work as expected when the user places a FUSE mount on
> top of his own login directory. But I don't think leaf node
> enforcing is required from a security point of view. This is the
> only thing I could come up with.

OK, from the auditing POV, there's a slight hole in unprivileged
mounts.  But I don't think this is grave, since it's not so hard to
hide any sensitive data from such scripts anyway (keeping data in
memory, or keeping a file descriptor to an unlinked file, etc).

> IMHO The namespace argument against FUSE is weak for multiple reasons. The
> only variancy I see is when crossing the mount point. And that disappears
> once EACCES is returned when non-ptraceable processes try to cross it.

Yes, but still this is just a difference in permission, and not a
difference in namespace.

> But that's not really acceptable (see previous audit case) unless FUSE
> refuses to mount on non-leaf dirs.

I don't think the audit case is important.  It's easy to work around
it manually by the sysadmin, and for the automatic case it doesn't
really matter (as detailed above).

Miklos

* Re: FUSE merging?
  2005-07-03 13:24                                             ` Miklos Szeredi
@ 2005-07-03 13:50                                               ` Frank van Maarseveen
  2005-07-03 14:03                                                 ` Miklos Szeredi
  0 siblings, 1 reply; 80+ messages in thread
From: Frank van Maarseveen @ 2005-07-03 13:50 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: frankvm, akpm, aia21, arjan, linux-kernel

On Sun, Jul 03, 2005 at 03:24:04PM +0200, Miklos Szeredi wrote:
> > > Hmm, do you mean returning different directory contents based on uid?
> > 
> > 	http://clusternfs.sourceforge.net
> > 
> > Don't ask me how this plays with the dcache.
> 
> But here the decision on what to return is in the _server_.

It still means that name space invariancy cannot be guaranteed.

> There's
> nothing magic about that.  It's as if it was N different servers for N
> different clients, only more effective.

Not entirely, there is a UID dependency.

> I think what you call namespace invariance is basically true for all
> existing filesystems.  There could be a filesystem which returns
> different directory contents based on whatever it wants, but it can't
> return a different "dentry" for the same name.

This is not what I mean. The directory contents itself must be identical
for every user. And every name must of course correspond with only one
dentry. That's name-space invariance IMO.

> > IMHO The namespace argument against FUSE is weak for multiple reasons. The
> > only variancy I see is when crossing the mount point. And that disappears
> > once EACCES is returned when non-ptraceable processes try to cross it.
> 
> Yes, but still this is just a difference in permission, and not a
> difference in namespace.

Exactly. And such a difference in permission already exists for (sane)
networked file systems such as NFS with "root_squash" in effect on
the server.

-- 
Frank

* Re: FUSE merging?
  2005-07-03 13:50                                               ` Frank van Maarseveen
@ 2005-07-03 14:03                                                 ` Miklos Szeredi
  0 siblings, 0 replies; 80+ messages in thread
From: Miklos Szeredi @ 2005-07-03 14:03 UTC (permalink / raw)
  To: frankvm; +Cc: akpm, aia21, arjan, linux-kernel

> > There's
> > nothing magic about that.  It's as if it was N different servers for N
> > different clients, only more effective.
> 
> > Not entirely, there is a UID dependency.

Ahh, so there is.

Does it actually work?  I doubt it.  The VFS won't allow two different
dentries to refer to the same name.  And without that, how would you
have several inodes for a single name?

> > I think what you call namespace invariance is basically true for all
> > existing filesystems.  There could be a filesystem which returns
> > different directory contents based on whatever it wants, but it can't
> > return a different "dentry" for the same name.
> 
> This is not what I mean. The directory contents itself must be identical
> for every user. And every name must of course correspond with only one
> dentry. That's name-space invariance IMO.

OK.

> > > IMHO The namespace argument against FUSE is weak for multiple
> > > reasons. The only variancy I see is when crossing the mount
> > > point. And that disappears once EACCES is returned when
> > > non-ptraceable processes try to cross it.
> > 
> > Yes, but still this is just a difference in permission, and not a
> > difference in namespace.
> 
> > Exactly. And such a difference in permission already exists for (sane)
> > networked file systems such as NFS with "root_squash" in effect on
> > the server.

Agreed.

Miklos

* Re: FUSE merging?
  2005-07-01  8:02                         ` Andrew Morton
  2005-07-01 10:11                           ` Miklos Szeredi
@ 2005-07-03 19:39                           ` Pavel Machek
  2005-07-04  8:38                             ` Miklos Szeredi
  1 sibling, 1 reply; 80+ messages in thread
From: Pavel Machek @ 2005-07-03 19:39 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Miklos Szeredi, aia21, arjan, linux-kernel, frankvm

Hi!

> > > > > >  I leave the decision to you ;)  It's a separate independent patch
> > > > > >  already (fuse-nfs-export.patch).
> > > > > 
> > > > > Let's leave it out - that'll stimulate some activity in the
> > > > > userspace-nfs-server-for-FUSE area.
> > > > > 
> > > > > Speaking of which, dumb question: what does FUSE offer over simply using
> > > > > NFS protocol to talk to the userspace filesystem driver?
> > > > 
> > > > Oh lots:
> > > > 
> > > >   - no deadlocks (NFS mounted from localhost is riddled with them)
> > > 
> > > It is?  We had some low-memory problems a while back, but they got fixed. 
> > > During that work I did some nfs-to-localhost testing and things seemed OK.
> > 
> > Well, there's the "unsolvable" writeback deadlock problem, that FUSE
> > works around by not buffering dirty pages (and not allowing writable
> > mmap).  Does NFS solve that?  I'm interested :)
> 
> I don't know - first you'd have to describe it.

Actually, the right question is "how is fuse better than coda". I've
asked that before; unlike nfs, userspace filesystems implemented with
coda actually *work*, but do not provide partial-file writes.

								Pavel
-- 
teflon -- maybe it is a trademark, but it should not be.

* Re: FUSE merging?
  2005-07-03 19:39                           ` Pavel Machek
@ 2005-07-04  8:38                             ` Miklos Szeredi
       [not found]                               ` <20050704084900.GG15370@elf.ucw.cz>
  0 siblings, 1 reply; 80+ messages in thread
From: Miklos Szeredi @ 2005-07-04  8:38 UTC (permalink / raw)
  To: pavel; +Cc: akpm, aia21, arjan, linux-kernel, frankvm

> Actually, the right question is "how is fuse better than coda". I've
> asked that before; unlike nfs, userspace filesystems implemented with
> coda actually *work*, but do not provide partial-file writes.

You answered your own question.

I did talk to Jan Harkes about the file I/O issue before starting
FUSE.  [searching archives] here's a quote from him about this:

  "I've been thinking about partial file accesses myself. However, I
  really don't want to go all the way to block-level caching. That
  would add a lot of overhead either in passing every read/write call
  up to userspace, or by using a largish amount of memory to keep
  track of availability of parts of the file. It also defeats the more
  efficient 'streaming' fetch of a whole file.

  However, something that would work reasonably well is a file offset
  marker that indicates how much data is available. Basically, when the
  application opens a file, the open upcall returns after the first...
  let's say 64KB... have arrived. Any read's and write (and mmap's) that
  access the available part of the file will be allowed. When any
  operation tries to access beyond the marker an upcall is made which
  blocks until the related part of the file has streamed in."

So true random access doesn't fit too well into the CODA philosophy.
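
The offset-marker idea in the quote can be modelled roughly like this.
It is a sketch under assumptions, not Coda code: stream_marker,
access_is_local and marker_advance are hypothetical names, and the
blocking upcall itself is reduced to a yes/no decision.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes [0, available) of the file have already streamed in. */
struct stream_marker {
	uint64_t available;
};

/*
 * True when [off, off + len) lies entirely below the marker, so the
 * access can proceed without an upcall.  Written so that off + len
 * is never computed and cannot overflow.
 */
static bool access_is_local(const struct stream_marker *m,
			    uint64_t off, uint64_t len)
{
	assert(m != NULL);
	return off <= m->available && len <= m->available - off;
}

/* Record streaming progress; the marker only ever moves forward. */
static void marker_advance(struct stream_marker *m, uint64_t new_available)
{
	assert(m != NULL);
	if (new_available > m->available)
		m->available = new_available;
}
```

Sequential access stays cheap with this scheme, but a read near the
end of a large file has to wait for everything before it, which is
the poor fit for random access noted above.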

Of course you could extend CODA to handle this as well (and all the
other things needed for safe user mounts), but the results would
probably not have pleased either side.

Miklos

* Re: FUSE merging?
       [not found]                               ` <20050704084900.GG15370@elf.ucw.cz>
@ 2005-07-04  9:02                                 ` Miklos Szeredi
  2005-07-04 10:46                                   ` Pekka Enberg
  0 siblings, 1 reply; 80+ messages in thread
From: Miklos Szeredi @ 2005-07-04  9:02 UTC (permalink / raw)
  To: pavel; +Cc: akpm, aia21, arjan, linux-kernel, frankvm

[CC restored]

> Okay, I just wanted to mention CODA. Modifying CODA is probably still
> better than modifying NFS (as akpm suggested at one point).

Definitely.

Here are some numbers on the size of these filesystems in current -mm
('wc fs/${fs}/* include/linux/${fs}*'):

nfs:  25495
9p:    6102
coda:  4752
fuse:  3733

I'm sure FUSE came out smallest because I'm biased and did something
wrong ;)
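
For anyone who wants to repeat the count without the shell, a rough
Python equivalent is sketched below.  It counts newline characters
only (like 'wc -l'), which is an assumption about which wc column the
numbers above come from; count_lines and fs_sizes are made-up names.

```python
import glob
import os


def count_lines(patterns):
    """Sum newline counts over all regular files matching the glob patterns."""
    total = 0
    for pattern in patterns:
        for path in glob.glob(pattern):
            if not os.path.isfile(path):
                continue  # skip directories instead of reporting an error
            with open(path, "rb") as f:
                # Read in blocks so large files are not loaded at once.
                for block in iter(lambda: f.read(65536), b""):
                    total += block.count(b"\n")
    return total


def fs_sizes(tree, names):
    """Return {fs: lines} for fs/<fs>/* and include/linux/<fs>* under tree."""
    return {
        fs: count_lines([
            os.path.join(tree, "fs", fs, "*"),
            os.path.join(tree, "include", "linux", fs + "*"),
        ])
        for fs in names
    }
```

Calling fs_sizes("/path/to/linux", ["nfs", "9p", "coda", "fuse"]) on a
kernel tree gives one line count per filesystem.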

Miklos

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: FUSE merging?
  2005-07-04  9:02                                 ` Miklos Szeredi
@ 2005-07-04 10:46                                   ` Pekka Enberg
  0 siblings, 0 replies; 80+ messages in thread
From: Pekka Enberg @ 2005-07-04 10:46 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: pavel, akpm, aia21, arjan, linux-kernel, frankvm

On 7/4/05, Miklos Szeredi <miklos@szeredi.hu> wrote:
> Here are some numbers on the size of these filesystems as in current -mm
> ('wc fs/${fs}/* include/linux/${fs}*')

Sloccount [1] gives more meaningful numbers than wc:

('sloccount fs/${fs}/* include/linux/${fs}*')

nfs:  21,046
9p:    3,856
coda:  3,358
fuse:  2,829

  1. http://www.dwheeler.com/sloccount/

                              Pekka

^ permalink raw reply	[flat|nested] 80+ messages in thread

* FUSE merging?
@ 2005-09-02 22:02 Miklos Szeredi
  2005-09-02 22:34 ` Andrew Morton
  0 siblings, 1 reply; 80+ messages in thread
From: Miklos Szeredi @ 2005-09-02 22:02 UTC (permalink / raw)
  To: akpm; +Cc: linux-kernel, fuse-devel, torvalds

Hi Andrew!

Do you plan to send FUSE to Linus for 2.6.14?

I know you have some doubts about usefulness, etc.  Here are a couple
of facts that I hope show that Linux would benefit from having FUSE:

 - total number of downloads from SF: ~25000

 - number of downloads of last release (during 3 months): ~7000

 - number of distros carrying official packages: 2 (debian, gentoo)

 - number of publicly available filesystems known: 27

 - of which at least 2 are carried by debian (and maybe others)

 - number of language bindings: 7 (native: C, java, python, perl, C#, sh, TCL)

 - biggest known commercial user: ~110TB exported, total bandwidth: 1.5TB/s

 - mailing list traffic 100-200 messages/month

 - has been in -mm since January 2005

Miklos

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: FUSE merging?
  2005-09-02 22:02 FUSE merging? Miklos Szeredi
@ 2005-09-02 22:34 ` Andrew Morton
  2005-09-03  0:24   ` FUSE merging? (why I chose FUSE over v9fs) Joshua J. Berry
                     ` (2 more replies)
  0 siblings, 3 replies; 80+ messages in thread
From: Andrew Morton @ 2005-09-02 22:34 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: linux-kernel, fuse-devel, torvalds

Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> Hi Andrew!
> 
> Do you plan to send FUSE to Linus for 2.6.14?

Haven't thought about it all much.  Have spent most of my time in the last
month admiring the contents of kernel bugzilla, and the ongoing attempts to
increase them.

> I know you have some doubts about usefulness, etc.  Here are a couple
> of facts that I hope show that Linux would benefit from having FUSE:
> 
>  - total number of downloads from SF: ~25000
> 
>  - number of downloads of last release (during 3 months): ~7000
> 
>  - number of distros carrying official packages: 2 (debian, gentoo)
> 
>  - number of publicly available filesystems known: 27
> 
>  - of which at least 2 are carried by debian (and maybe others)
> 
>  - number of language bindings: 7 (native: C, java, python, perl, C#, sh, TCL)
> 
>  - biggest known commercial user: ~110TB exported, total bandwidth: 1.5TB/s
> 
>  - mailing list traffic 100-200 messages/month
> 
>  - has been in -mm since January 2005
> 

I agree that lots of people would like the functionality.  I regret that
although it appears that v9fs could provide it, there seems to be no
interest in working on that.

The main sticking point with FUSE remains the permission tricks around
fuse_allow_task().  AFAIK it remains the case that nobody has come up with
any better idea, so I'm inclined to merge the thing.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: FUSE merging? (why I chose FUSE over v9fs)
  2005-09-02 22:34 ` Andrew Morton
@ 2005-09-03  0:24   ` Joshua J. Berry
  2005-09-03  0:34   ` FUSE merging? Kasper Sandberg
  2005-09-03  5:31   ` Miklos Szeredi
  2 siblings, 0 replies; 80+ messages in thread
From: Joshua J. Berry @ 2005-09-03  0:24 UTC (permalink / raw)
  To: fuse-devel; +Cc: Andrew Morton, Miklos Szeredi, linux-kernel, torvalds

On Friday 02 September 2005 15:34, Andrew Morton wrote:
> Miklos Szeredi <miklos@szeredi.hu> wrote:
> > Hi Andrew!
> >
> > Do you plan to send FUSE to Linus for 2.6.14?
...
> I agree that lots of people would like the functionality.  I regret that
> although it appears that v9fs could provide it, there seems to be no
> interest in working on that.

I evaluated both v9fs and FUSE for my project (I don't want to link to it 
until it does something actually useful ;) ) ... and it seemed that v9fs 
just wasn't UNIXy enough for my purposes -- the Plan9 way and the UNIX way 
were different enough to make me nervous.  I don't remember the specific 
details (this was a few months ago), but I do remember that v9fs had no 
extended attribute support, which was a showstopper for me.  Also, I 
couldn't find any userspace library for writing 9P servers.

Others may have reached similar conclusions.  Or maybe FUSE is just 
better-marketed. ;)

Either way, I am a happy FUSE user.  I think it offers things v9fs doesn't, 
and I'd like to see it in mainline. :)

-- Josh

-- 
Joshua J. Berry

"I haven't lost my mind -- it's backed up on tape somewhere."
    -- /usr/games/fortune

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: FUSE merging?
  2005-09-02 22:34 ` Andrew Morton
  2005-09-03  0:24   ` FUSE merging? (why I chose FUSE over v9fs) Joshua J. Berry
@ 2005-09-03  0:34   ` Kasper Sandberg
  2005-09-03  5:31   ` Miklos Szeredi
  2 siblings, 0 replies; 80+ messages in thread
From: Kasper Sandberg @ 2005-09-03  0:34 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Miklos Szeredi, linux-kernel, fuse-devel, torvalds

On Fri, 2005-09-02 at 15:34 -0700, Andrew Morton wrote:
> Miklos Szeredi <miklos@szeredi.hu> wrote:
> >
> > Hi Andrew!
> > 
> > Do you plan to send FUSE to Linus for 2.6.14?
> 
<snip>

i use fuse too, and i like it, it works well, and it's quite fast and
easy. it has given me no problems at all, i suggest merging, it harms
nothing, and seems to be well maintained


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: FUSE merging?
  2005-09-02 22:34 ` Andrew Morton
  2005-09-03  0:24   ` FUSE merging? (why I chose FUSE over v9fs) Joshua J. Berry
  2005-09-03  0:34   ` FUSE merging? Kasper Sandberg
@ 2005-09-03  5:31   ` Miklos Szeredi
  2005-09-03  6:40     ` Andrew Morton
                       ` (2 more replies)
  2 siblings, 3 replies; 80+ messages in thread
From: Miklos Szeredi @ 2005-09-03  5:31 UTC (permalink / raw)
  To: akpm; +Cc: linux-kernel, fuse-devel, torvalds

> Haven't thought about it all much.  Have spent most of my time in the last
> month admiring the contents of kernel bugzilla, and the ongoing attempts to
> increase them.

A penal system could be created, for example if someone is caught
introducing a bug, he will have to choose three additional reports
from bugzilla and analyze/fix them ;)

> >  - number of language bindings: 7 (native: C, java, python, perl,
> >    C#, sh, TCL)

8 now, someone just sent a private mail about bindings for the Pliant
(never heard of it) language.

> I agree that lots of people would like the functionality.  I regret that
> although it appears that v9fs could provide it,

I think you are wrong there.  You don't appreciate all the complexity
FUSE _lacks_ by not being network transparent.  Just look at the error
text to errno conversion muck that v9fs has.  And their problems with
trying to do generic uid/gid mappings.
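
To show what kind of conversion is meant, here is a small Python
sketch of mapping error text back to an errno value.  It is not the
v9fs table: it builds its table from the local C library's strerror()
strings, and errno_from_text and the EIO fallback are assumptions made
for the illustration.

```python
import errno
import os


def build_error_table():
    """Map lower-cased strerror() text to its errno number."""
    table = {}
    for code in sorted(errno.errorcode):
        # If two codes share one message, keep the lowest code.
        table.setdefault(os.strerror(code).lower(), code)
    return table


_TABLE = build_error_table()


def errno_from_text(message, default=errno.EIO):
    """Translate an error string to an errno, falling back to `default`."""
    return _TABLE.get(message.strip().lower(), default)
```

Any string the table does not know collapses to the fallback, so the
table has to track every message a server might send.  An interface
that passes the numeric errno needs none of this.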

> there seems to be no interest in working on that.

It would mean adding a plethora of extensions to the 9P protocol that
would take away all its beauty.  I think you should realize that
these are different interfaces for different purposes. There may be
some overlap, but not enough to warrant trying to massage them into
one big ball.

> The main sticking point with FUSE remains the permission tricks around
> fuse_allow_task().  AFAIK it remains the case that nobody has come up with
> any better idea, so I'm inclined to merge the thing.

Do you promise?  I can do a resplit and submit to Linus, if that takes
some load off you.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: FUSE merging?
  2005-09-03  5:31   ` Miklos Szeredi
@ 2005-09-03  6:40     ` Andrew Morton
  2005-09-03  7:23       ` Miklos Szeredi
  2005-09-03 12:12     ` [fuse-devel] " yoann padioleau
  2005-09-03 13:29     ` Eric Van Hensbergen
  2 siblings, 1 reply; 80+ messages in thread
From: Andrew Morton @ 2005-09-03  6:40 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: linux-kernel, fuse-devel, torvalds

Miklos Szeredi <miklos@szeredi.hu> wrote:
>
>  > The main sticking point with FUSE remains the permission tricks around
>  > fuse_allow_task().  AFAIK it remains the case that nobody has come up with
>  > any better idea, so I'm inclined to merge the thing.
> 
>  Do you promise?

I troll.  What others think matters.  But at this stage, objections would
need to be substantial, IMO.  We're rather deadlocked on the permission
thing, but if we can't come up with anything better then I'm inclined to
say what-the-hell.

>   I can do a resplit and submit to Linus, if that takes
>  some load off you.

Nah, then I'd just have to check that everything is the same.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: FUSE merging?
  2005-09-03  6:40     ` Andrew Morton
@ 2005-09-03  7:23       ` Miklos Szeredi
  0 siblings, 0 replies; 80+ messages in thread
From: Miklos Szeredi @ 2005-09-03  7:23 UTC (permalink / raw)
  To: akpm; +Cc: linux-kernel, fuse-devel, torvalds

> >  > The main sticking point with FUSE remains the permission tricks around
> >  > fuse_allow_task().  AFAIK it remains the case that nobody has come up with
> >  > any better idea, so I'm inclined to merge the thing.
> > 
> >  Do you promise?
> 
> I troll.  What others think matters.  But at this stage, objections would
> need to be substantial, IMO.

Fair enough.

> We're rather deadlocked on the permission thing, but if we can't
> come up with anything better then I'm inclined to say what-the-hell.

There's no disadvantage IMO.  It adds nearly zero complexity.  If
someone doesn't like it, it can be configured out in userspace.  And
it leaves no legacy interfaces to support if later a better method is
found.
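
As a rough model of how small the check is, consider the Python sketch
below.  It is a simplification and not the kernel code: the real
fuse_allow_task() may compare more credentials than a single uid, and
Mount, owner_uid and allow_task are names made up for the sketch.

```python
from dataclasses import dataclass


@dataclass
class Mount:
    owner_uid: int             # uid of the user who mounted the filesystem
    allow_other: bool = False  # models the 'allow_other' mount option


def allow_task(mount, task_uid):
    """Return True if a task with task_uid may access the mount."""
    if mount.allow_other:
        # The restriction has been configured out for this mount.
        return True
    # Otherwise only the mount owner gets through.
    return task_uid == mount.owner_uid
```

In this model every other uid, including 0, is refused unless the mount
was made with 'allow_other'.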

> >   I can do a resplit and submit to Linus, if that takes
> >  some load off you.
> 
> Nah, then I'd just have to check that everything is the same.

OK.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [fuse-devel] Re: FUSE merging?
  2005-09-03  5:31   ` Miklos Szeredi
  2005-09-03  6:40     ` Andrew Morton
@ 2005-09-03 12:12     ` yoann padioleau
  2005-09-03 13:29     ` Eric Van Hensbergen
  2 siblings, 0 replies; 80+ messages in thread
From: yoann padioleau @ 2005-09-03 12:12 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: akpm, linux-kernel, fuse-devel, torvalds

>
>
>>>  - number of language bindings: 7 (native: C, java, python, perl,
>>>    C#, sh, TCL)
>>>
>
> 8 now, someone just sent a private mail about bindings for the Pliant
> (never heard of it) language.
>

9 now (there is an ocaml binding, and if you don't know ocaml, shame
on you).

I would just like to add
  "please, merge fuse in linux, pleaseeeeeeeeeeee"
I am a happy user of debian so I have no problem, but when I want other
people to install my
fuse-advanced-not-yet-public-like-spotlight-and-winfs-just-better
filesystem, then there is a big problem.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: FUSE merging?
  2005-09-03  5:31   ` Miklos Szeredi
  2005-09-03  6:40     ` Andrew Morton
  2005-09-03 12:12     ` [fuse-devel] " yoann padioleau
@ 2005-09-03 13:29     ` Eric Van Hensbergen
  2005-09-03 14:20       ` Miklos Szeredi
  2 siblings, 1 reply; 80+ messages in thread
From: Eric Van Hensbergen @ 2005-09-03 13:29 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: akpm, linux-kernel, fuse-devel, torvalds, V9FS Developers,
	Linux FS Devel

On 9/3/05, Miklos Szeredi <miklos@szeredi.hu> wrote:
> 
> > I agree that lots of people would like the functionality.  I regret that
> > although it appears that v9fs could provide it,
> 
> I think you are wrong there.  You don't appreciate all the complexity
> FUSE _lacks_ by not being network transparent.  Just look at the error
> text to errno conversion muck that v9fs has.  And their problems with
> trying to do generic uid/gid mappings.
>

While FUSE doesn't handle it directly, doesn't it have to punt it to
its network file systems?  How do sshfs and whatnot handle this
sort of mapping?  Not really a criticism, just curious.  This doesn't
so much relate to FUSE, but I've been wrestling with what to do about
this chunk of (mapping) code -- it seems like it might be a good idea
to have some common code shared amongst the networked file systems to
handle this sort of thing.  The NFS idmapd service seems
overcomplicated, but something like that in the common code could
provide the same level of service.  What do folks think? Should
someone (me?) take a whack at a common id mapping service for the
kernel (or just extract idmapd from NFS) -- or is this something
better implemented filesystem-to-filesystem?

> > there seems to be no interest in working on that.
> 
> It would mean adding a plethora of extensions to the 9P protocol that
> would take away all its beauty.  I think you should realize that
> these are different interfaces for different purposes. There may be
> some overlap, but not enough to warrant trying to massage them into
> one big ball.
> 

A very good point.  I toyed with the idea of looking at creating a
FUSE-API-compatible v9fs file server library -- but there are a good
many features (like extended attributes) that we don't have
provisions for in the protocol -- and most likely a good deal of
complexity in supporting some of these features that we may not want to
deal with just yet.

Miklos is right, for the moment FUSE and v9fs have some overlap, but
they remain very different things.  FUSE is far more focused on
delivering user-space file servers, and as such has a better solution
for developing user-space file servers.  We are still focusing on
getting the core of v9fs worked out; when we eventually have that
working smoothly, I like to think we'd be able to spend some time
developing a file server SDK as rich as FUSE (perhaps something
API-compatible as I mentioned before) -- but we want to focus on
getting the core protocol implementation right first - since it has
uses beyond user-space file servers.

         -eric

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: FUSE merging?
  2005-09-03 13:29     ` Eric Van Hensbergen
@ 2005-09-03 14:20       ` Miklos Szeredi
  2005-09-03 15:01         ` Eric Van Hensbergen
  0 siblings, 1 reply; 80+ messages in thread
From: Miklos Szeredi @ 2005-09-03 14:20 UTC (permalink / raw)
  To: ericvh
  Cc: akpm, linux-kernel, fuse-devel, torvalds, v9fs-developer,
	linux-fsdevel

> While FUSE doesn't handle it directly, doesn't it have to punt it to
> its network file systems?  How do sshfs and whatnot handle this
> sort of mapping?

Sshfs handles it by not handling it.  In this case it is neither
possible, nor needed to be able to correctly map the id space.

Yes, it may confuse the user.  It may even confuse the kernel for
sticky directories(*).  But basically it just works, and is very
simple.

> Not really a criticism, just curious.  This doesn't so much relate
> to FUSE, but I've been wrestling with what to do about this chunk of
> (mapping) code -- it seems like it might be a good idea to have some
> common code shared amongst the networked file systems to handle this
> sort of thing.  The NFS idmapd service seems overcomplicated, but
> something like that in the common code could provide the same level
> of service.  What do folks think? Should someone (me?) take a whack
> at a common id mapping service for the kernel (or just extract
> idmapd from NFS) -- or is this something better implemented
> filesystem-to-filesystem?

If more than one filesystem would use it, it would make sense to
abstract it out.  FUSE doesn't need it since it can happily do the
mapping in userspace.

Miklos

(*) I think the correct behavior would be if checking sticky
permissions could also be delegated to the filesystem, like checking
normal permissions can be.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: FUSE merging?
  2005-09-03 14:20       ` Miklos Szeredi
@ 2005-09-03 15:01         ` Eric Van Hensbergen
  2005-09-03 15:38           ` Miklos Szeredi
  0 siblings, 1 reply; 80+ messages in thread
From: Eric Van Hensbergen @ 2005-09-03 15:01 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: akpm, linux-kernel, fuse-devel, v9fs-developer, linux-fsdevel

On 9/3/05, Miklos Szeredi <miklos@szeredi.hu> wrote:
> > While FUSE doesn't handle it directly, doesn't it have to punt it to
> > its network file systems?  How do sshfs and whatnot handle this
> > sort of mapping?
> 
> Sshfs handles it by not handling it.  In this case it is neither
> possible, nor needed to be able to correctly map the id space.
> 
> Yes, it may confuse the user.  It may even confuse the kernel for
> sticky directories(*).  But basically it just works, and is very
> simple.
> 

In principle, Plan 9 file servers handle permission checking
server-side, so we could likewise punt -- but it seemed a good idea to
have some form of mapping for directory listings (and things like
sticky directories) to make sense.

               -eric

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: FUSE merging?
  2005-09-03 15:01         ` Eric Van Hensbergen
@ 2005-09-03 15:38           ` Miklos Szeredi
  0 siblings, 0 replies; 80+ messages in thread
From: Miklos Szeredi @ 2005-09-03 15:38 UTC (permalink / raw)
  To: ericvh; +Cc: akpm, linux-kernel, fuse-devel, v9fs-developer, linux-fsdevel

> > Yes, it may confuse the user.  It may even confuse the kernel for
> > sticky directories(*).  But basically it just works, and is very
> > simple.
> > 
> 
> In principle, Plan 9 file servers handle permission checking
> server-side, so we could likewise punt -- but it seemed a good idea to
> have some form of mapping for directory listings (and things like
> sticky directories) to make sense.

Yes, if the user/group names are available (as in 9P), then doing the
mapping based on /etc/passwd for example makes sense.

But sshfs only transfers the numeric uid/gid, and hence there's simply
no info to base any transformation on.

It could transfer /etc/passwd from the remote server, and use that to
do mapping, but that is getting more complex than the problem actually
warrants IMO.
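
For illustration, a name-based mapping of that kind could look like the
Python sketch below.  It is not something sshfs does; parse_passwd and
map_remote_uid are hypothetical helpers that assume both passwd files
are available as text.

```python
def parse_passwd(text):
    """Return ({uid: name}, {name: uid}) parsed from passwd-format text."""
    by_uid, by_name = {}, {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        # passwd format: name:password:uid:gid:gecos:home:shell
        if len(fields) < 3 or not fields[2].isdigit():
            continue
        name, uid = fields[0], int(fields[2])
        by_uid.setdefault(uid, name)   # first entry wins on duplicates
        by_name.setdefault(name, uid)
    return by_uid, by_name


def map_remote_uid(remote_uid, remote_passwd, local_passwd, fallback=None):
    """Map a remote numeric uid to the local uid of the same user name."""
    remote_by_uid, _ = parse_passwd(remote_passwd)
    _, local_by_name = parse_passwd(local_passwd)
    name = remote_by_uid.get(remote_uid)
    if name is None:
        return fallback  # remote uid has no name to map by
    return local_by_name.get(name, fallback)
```

Without the remote file there is no name for the first lookup, which is
the situation when only numeric ids are transferred.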

Miklos

^ permalink raw reply	[flat|nested] 80+ messages in thread

end of thread, other threads:[~2005-09-03 15:38 UTC | newest]

Thread overview: 80+ messages
2005-09-02 22:02 FUSE merging? Miklos Szeredi
2005-09-02 22:34 ` Andrew Morton
2005-09-03  0:24   ` FUSE merging? (why I chose FUSE over v9fs) Joshua J. Berry
2005-09-03  0:34   ` FUSE merging? Kasper Sandberg
2005-09-03  5:31   ` Miklos Szeredi
2005-09-03  6:40     ` Andrew Morton
2005-09-03  7:23       ` Miklos Szeredi
2005-09-03 12:12     ` [fuse-devel] " yoann padioleau
2005-09-03 13:29     ` Eric Van Hensbergen
2005-09-03 14:20       ` Miklos Szeredi
2005-09-03 15:01         ` Eric Van Hensbergen
2005-09-03 15:38           ` Miklos Szeredi
  -- strict thread matches above, loose matches on Subject: below --
2005-06-30  9:19 Miklos Szeredi
2005-06-30  9:27 ` Andrew Morton
2005-06-30  9:51   ` Miklos Szeredi
2005-06-30 10:00     ` Arjan van de Ven
2005-06-30 10:12       ` Miklos Szeredi
2005-06-30 10:20         ` Arjan van de Ven
2005-06-30 10:24           ` Miklos Szeredi
2005-06-30 19:39             ` Avuton Olrich
2005-07-01  6:23               ` Miklos Szeredi
2005-06-30 11:13           ` Anton Altaparmakov
2005-06-30 19:46             ` Andrew Morton
2005-06-30 20:00               ` Andrew Morton
2005-07-01  6:40                 ` Miklos Szeredi
2005-06-30 22:28               ` Frank van Maarseveen
2005-07-01  6:58                 ` Miklos Szeredi
2005-07-01  9:24                   ` Frank van Maarseveen
2005-07-01 10:27                     ` Miklos Szeredi
2005-07-01 12:00                       ` Frank van Maarseveen
2005-07-01 12:36                         ` Miklos Szeredi
2005-07-01 13:05                           ` Frank van Maarseveen
2005-07-01 13:21                             ` Miklos Szeredi
2005-07-01 15:20                               ` Frank van Maarseveen
2005-07-01 17:04                                 ` Miklos Szeredi
2005-07-01 18:04                                   ` Frank van Maarseveen
2005-07-01 19:35                                     ` Jeremy Maitin-Shepard
2005-07-02 14:49                                     ` Miklos Szeredi
2005-07-02 16:00                                       ` Frank van Maarseveen
2005-07-03  6:16                                         ` Miklos Szeredi
2005-07-03 11:25                                           ` Frank van Maarseveen
2005-07-03 13:24                                             ` Miklos Szeredi
2005-07-03 13:50                                               ` Frank van Maarseveen
2005-07-03 14:03                                                 ` Miklos Szeredi
2005-07-01  6:36               ` Miklos Szeredi
2005-07-01  6:50                 ` Andrew Morton
2005-07-01  7:07                   ` Miklos Szeredi
2005-07-01  7:14                     ` Andrew Morton
2005-07-01  7:27                       ` Miles Bader
2005-07-01  7:38                       ` Miklos Szeredi
2005-07-01  8:02                         ` Andrew Morton
2005-07-01 10:11                           ` Miklos Szeredi
2005-07-01 11:29                             ` Andrew Morton
2005-07-01 12:00                               ` Miklos Szeredi
2005-07-01 12:53                               ` Anton Altaparmakov
2005-07-01 13:07                                 ` Anton Altaparmakov
2005-07-01 13:51                                 ` Frank van Maarseveen
2005-07-01 13:29                               ` Eric Van Hensbergen
2005-07-01 16:45                               ` Matthias Urlichs
2005-07-01 12:08                             ` Frank van Maarseveen
2005-07-01 13:21                             ` Eric Van Hensbergen
2005-07-01 13:53                               ` Miklos Szeredi
2005-07-01 14:18                                 ` Eric Van Hensbergen
2005-07-01 14:31                                   ` Miklos Szeredi
2005-07-02 10:01                                     ` Eric W. Biederman
2005-07-02 14:58                                       ` Miklos Szeredi
2005-07-02 16:43                                       ` Eric Van Hensbergen
2005-07-02 17:33                                         ` Eric W. Biederman
2005-07-03 19:39                           ` Pavel Machek
2005-07-04  8:38                             ` Miklos Szeredi
     [not found]                               ` <20050704084900.GG15370@elf.ucw.cz>
2005-07-04  9:02                                 ` Miklos Szeredi
2005-07-04 10:46                                   ` Pekka Enberg
2005-07-01 12:37                   ` bert hubert
2005-07-01  7:46                 ` Frederik Deweerdt
2005-07-01  9:47                   ` Miklos Szeredi
2005-07-01  9:36                 ` Frank van Maarseveen
2005-07-01 10:45                   ` Miklos Szeredi
2005-07-01 11:34                     ` Frank van Maarseveen
2005-06-30 10:16       ` Miklos Szeredi
2005-06-30 16:30         ` Pavel Machek
