linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH 0/5] fuse: handle release synchronously (v4)
       [not found] <20140925120244.540.31506.stgit@dhcp-10-30-22-200.sw.ru>
@ 2014-09-26 15:28 ` Miklos Szeredi
  2014-09-30  3:15   ` Miklos Szeredi
  0 siblings, 1 reply; 4+ messages in thread
From: Miklos Szeredi @ 2014-09-26 15:28 UTC (permalink / raw)
  To: Maxim Patlasov
  Cc: fuse-devel, Anand Avati, Kernel Mailing List, Linux-Fsdevel,
	Al Viro, Linus Torvalds

[Adding CC's]

On Thu, Sep 25, 2014 at 2:05 PM, Maxim Patlasov <MPatlasov@parallels.com> wrote:

> In short, the problem is that fuse_release (that's usually called on last
> user close(2)) sends FUSE_RELEASE to userspace and returns without waiting
> for ACK from userspace. Consequently, there is a gap when user regards the
> file released while userspace fuse is still working on it. An attempt to
> access the file from another node leads to complicated synchronization
> problems because the first node still "holds" the file.
>
> The patch-set resolves the problem by making fuse_release synchronous:
> wait for ACK from userspace for FUSE_RELEASE if the feature is ON.
>
> It must be emphasized that even if the feature is enabled (i.e. fuse_release
> is synchronous), nothing guarantees that fuse_release() will be called
> in the context of close(2). In fact, it may happen later, on last fput().
> However, there are still a lot of use cases when close(2) is synchronous,
> so the feature must be regarded as an optimization maximizing chances of
> synchronous behaviour of close(2).

Okay, we have the common case of close -> last-fput ->release.  This
being synchronous is fine.  Related cases are munmap(), and weird
corner cases including I/O on file descriptor being done in parallel
with close() in another thread.  Synchronous behavior is also fine in
these cases, since the task calling the last fput() is in fact
responsible for releasing the file.

And then we have the uncommon cases when fput() is called from
something unrelated.  Take the following DoS example:  malicious app
creates a socket loop (sockA is sent over sockB and sockB is sent over
sockA) and in addition it tacks a fuse fd onto one of the sockets.
The fuse fd is implemented to block forever on release.  In this case
the loop will persist after the sockets are closed due to refcounting.
Later, a garbage collection is triggered from a completely unrelated
socket operation.   This result in the unrelated task being blocked
forever.

The simplest way to avoid such an attack is to make the sync-release
feature privileged.  But even if it's privileged, the fact that
->release can take a lot of time and a completely unrelated task could
be waiting for it to finish is not a good thing.

So I'm wondering: could we have some way of distinguishing "good
release" from "bad release"?  Maybe adding an fput_sync() variant that
passes an "sync" flag to ->release()?

Al, Linus?

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH 0/5] fuse: handle release synchronously (v4)
  2014-09-26 15:28 ` Miklos Szeredi
@ 2014-09-30  3:15   ` Miklos Szeredi
  0 siblings, 0 replies; 4+ messages in thread
From: Miklos Szeredi @ 2014-09-30  3:15 UTC (permalink / raw)
  To: Maxim Patlasov
  Cc: fuse-devel, Anand Avati, Kernel Mailing List, Linux-Fsdevel,
	Al Viro, Linus Torvalds

On Fri, Sep 26, 2014 at 5:28 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> [Adding CC's]
>
> On Thu, Sep 25, 2014 at 2:05 PM, Maxim Patlasov <MPatlasov@parallels.com> wrote:
>
>> In short, the problem is that fuse_release (that's usually called on last
>> user close(2)) sends FUSE_RELEASE to userspace and returns without waiting
>> for ACK from userspace. Consequently, there is a gap when user regards the
>> file released while userspace fuse is still working on it. An attempt to
>> access the file from another node leads to complicated synchronization
>> problems because the first node still "holds" the file.
>>
>> The patch-set resolves the problem by making fuse_release synchronous:
>> wait for ACK from userspace for FUSE_RELEASE if the feature is ON.
>>
>> It must be emphasized that even if the feature is enabled (i.e. fuse_release
>> is synchronous), nothing guarantees that fuse_release() will be called
>> in the context of close(2). In fact, it may happen later, on last fput().
>> However, there are still a lot of use cases when close(2) is synchronous,
>> so the feature must be regarded as an optimization maximizing chances of
>> synchronous behaviour of close(2).
>
> Okay, we have the common case of close -> last-fput ->release.  This
> being synchronous is fine.  Related cases are munmap(), and weird
> corner cases including I/O on file descriptor being done in parallel
> with close() in another thread.  Synchronous behavior is also fine in
> these cases, since the task calling the last fput() is in fact
> responsible for releasing the file.
>
> And then we have the uncommon cases when fput() is called from
> something unrelated.  Take the following DoS example:  malicious app
> creates a socket loop (sockA is sent over sockB and sockB is sent over
> sockA) and in addition it tacks a fuse fd onto one of the sockets.
> The fuse fd is implemented to block forever on release.  In this case
> the loop will persist after the sockets are closed due to refcounting.
> Later, a garbage collection is triggered from a completely unrelated
> socket operation.   This result in the unrelated task being blocked
> forever.
>
> The simplest way to avoid such an attack is to make the sync-release
> feature privileged.  But even if it's privileged, the fact that
> ->release can take a lot of time and a completely unrelated task could
> be waiting for it to finish is not a good thing.
>
> So I'm wondering: could we have some way of distinguishing "good
> release" from "bad release"?  Maybe adding an fput_sync() variant that
> passes an "sync" flag to ->release()?

If we just concentrate on close(2), then there's a much simpler
solution: in fuse_flush() check if file_count(file) is 1.  If it is,
then this is a final close and we can mark the FLUSH request as such
with a flag (FUSE_FLUSH_FINAL).  Userspace filesystem can act
accordingly and may even return an error value to close(2).  In this
case we could even omit the RELEASE request as an optimization, since
we know the file cannot be resurrected if this was the last reference.

Do we need to be synchronous in any other case?  E.g. munmap() comes to mind.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH 0/5] fuse: handle release synchronously (v4)
       [not found]                 ` <CA+55aFyU2NgXYZSnNBwMR_ZyFuKzHna2NBBhwd6iev0uaEdqKA@mail.gmail.com>
@ 2014-10-18 18:22                   ` Al Viro
  2014-10-18 22:44                     ` Eric W. Biederman
  0 siblings, 1 reply; 4+ messages in thread
From: Al Viro @ 2014-10-18 18:22 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Miklos Szeredi, Maxim Patlasov, Anand Avati,
	Linux Kernel Mailing List, Michael j Theall, fuse-devel,
	linux-fsdevel

On Sat, Oct 18, 2014 at 08:40:05AM -0700, Linus Torvalds wrote:
> On Sat, Oct 18, 2014 at 8:35 AM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > Look around for AIO. Look around for the loop driver. Look around for
> > a number of things that do "fget()" and that you completely ignored.
> 
> .. actually, there are more instances of "get_file()" than of
> "fget()", the aio one just happened to be the latter form. Lots and
> lots of ways to get ahold of a file descriptor that keeps it open past
> the "last close".

FWIW, procfs patch touches a very annoying issue: ->show_fdinfo() being
blocking.  I would really like to get rid of that particular get_file()
and even more so - of get_files_struct() in there.

I certainly agree that anyone who expects that close() means the end of IO
is completely misguided.  Mappings don't disappear on close(), neither does
a descriptor returned by dup(), or one that child got over fork(),
or something sent over in SCM_RIGHTS datagram, or, as you suggested, made
backing store for /dev/loop, etc.

What's more, in the example given upthread, somebody might've spotted that
file in /proc/<pid>/fd/* and *opened* it.  At which point umount would
have to fail with EBUSY.  And the same lsof(8) might've done just that.

It's not a matter of correctness or security, especially since somebody who
could do that, could've stopped your process, PTRACE_POKEd a fairly short
series of syscalls that would connect to AF_UNIX socket, send the file
over to them and clean after itself, then single-stepped through all of that,
restored the original state and resumed your process.  

It is a QoI matter, though.  And get_files_struct() in there is a lot more
annoying than get_file()/fput().  Suppose you catch the process during
exit().  All of a sudden, read from /proc/<pid>/fdinfo/<n> ends up doing
shitloads of filp_close().  It would be nice to avoid that.

Folks, how much pain would it be to make ->show_fdinfo() non-blocking?

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH 0/5] fuse: handle release synchronously (v4)
  2014-10-18 18:22                   ` [PATCH 0/5] fuse: handle release synchronously (v4) Al Viro
@ 2014-10-18 22:44                     ` Eric W. Biederman
  0 siblings, 0 replies; 4+ messages in thread
From: Eric W. Biederman @ 2014-10-18 22:44 UTC (permalink / raw)
  To: Al Viro
  Cc: Linus Torvalds, Miklos Szeredi, Maxim Patlasov, Anand Avati,
	Linux Kernel Mailing List, Michael j Theall, fuse-devel,
	linux-fsdevel

Al Viro <viro@ZenIV.linux.org.uk> writes:

> On Sat, Oct 18, 2014 at 08:40:05AM -0700, Linus Torvalds wrote:
>> On Sat, Oct 18, 2014 at 8:35 AM, Linus Torvalds
>> <torvalds@linux-foundation.org> wrote:
>> >
>> > Look around for AIO. Look around for the loop driver. Look around for
>> > a number of things that do "fget()" and that you completely ignored.
>> 
>> .. actually, there are more instances of "get_file()" than of
>> "fget()", the aio one just happened to be the latter form. Lots and
>> lots of ways to get ahold of a file descriptor that keeps it open past
>> the "last close".
>
> FWIW, procfs patch touches a very annoying issue: ->show_fdinfo() being
> blocking.  I would really like to get rid of that particular get_file()
> and even more so - of get_files_struct() in there.
>
> I certainly agree that anyone who expects that close() means the end of IO
> is completely misguided.  Mappings don't disappear on close(), neither does
> a descriptor returned by dup(), or one that child got over fork(),
> or something sent over in SCM_RIGHTS datagram, or, as you suggested, made
> backing store for /dev/loop, etc.
>
> What's more, in the example given upthread, somebody might've spotted that
> file in /proc/<pid>/fd/* and *opened* it.  At which point umount would
> have to fail with EBUSY.  And the same lsof(8) might've done just that.
>
> It's not a matter of correctness or security, especially since somebody who
> could do that, could've stopped your process, PTRACE_POKEd a fairly short
> series of syscalls that would connect to AF_UNIX socket, send the file
> over to them and clean after itself, then single-stepped through all of that,
> restored the original state and resumed your process.  
>
> It is a QoI matter, though.  And get_files_struct() in there is a lot more
> annoying than get_file()/fput().  Suppose you catch the process during
> exit().  All of a sudden, read from /proc/<pid>/fdinfo/<n> ends up doing
> shitloads of filp_close().  It would be nice to avoid that.
>
> Folks, how much pain would it be to make ->show_fdinfo() non-blocking?

I took a quick look and there are a couple of instances in tun,
eventpoll, and fanotify/inotify that take a spinlock while traversing
the data that needs to be printed.

So it would take a good hard stare at those pieces of code to understand
the locking, and potentially rewrite those routines.

The only one I am particularly familiar with tun did not look
fundamentally hard to change but it also isn't something I would
casually do either, as it would be easy to introduce nasty races by
accident.

Eric

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2014-10-18 22:44 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20140930191933.GC5011@tucsk.piliscsaba.szeredi.hu>
     [not found] ` <CA+55aFyD2Q6tJzRUbNyNFRdbSN09SAN9CqC5BTmCtOu4W9PKGw@mail.gmail.com>
     [not found]   ` <542BE551.1010705@parallels.com>
     [not found]     ` <CAJfpegs4Up_feUbqEGxfhq+rZneTmy0qcZy-k3so7VjufQY7-w@mail.gmail.com>
     [not found]       ` <543F9E75.2090509@parallels.com>
     [not found]         ` <CAJfpegvS+rmFqGgvHq183Z-MxLAwcgEB57LQxAwy3QAe5CaVwg@mail.gmail.com>
     [not found]           ` <CA+55aFw68DMazGSu=YaOVVEWp=_LrD+2abah+5Tq3kpmH-r+Hg@mail.gmail.com>
     [not found]             ` <20141017085509.GE5011@tucsk.piliscsaba.szeredi.hu>
     [not found]               ` <CA+55aFyALiUbRVcxGGG79N-97pa997v9En64O9jLLBvBAO6vrA@mail.gmail.com>
     [not found]                 ` <CA+55aFyU2NgXYZSnNBwMR_ZyFuKzHna2NBBhwd6iev0uaEdqKA@mail.gmail.com>
2014-10-18 18:22                   ` [PATCH 0/5] fuse: handle release synchronously (v4) Al Viro
2014-10-18 22:44                     ` Eric W. Biederman
     [not found] <20140925120244.540.31506.stgit@dhcp-10-30-22-200.sw.ru>
2014-09-26 15:28 ` Miklos Szeredi
2014-09-30  3:15   ` Miklos Szeredi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).