From: Maxim Patlasov <mpatlasov@parallels.com>
To: Linus Torvalds <torvalds@linux-foundation.org>,
Miklos Szeredi <miklos@szeredi.hu>
Cc: Anand Avati <avati@gluster.org>,
"open list:FUSE: FILESYSTEM..."
<fuse-devel@lists.sourceforge.net>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
<mtheall@us.ibm.com>
Subject: Re: [PATCH 0/5] fuse: handle release synchronously (v4)
Date: Wed, 1 Oct 2014 15:28:17 +0400
Message-ID: <542BE551.1010705@parallels.com>
In-Reply-To: <CA+55aFyD2Q6tJzRUbNyNFRdbSN09SAN9CqC5BTmCtOu4W9PKGw@mail.gmail.com>

On 10/01/2014 12:44 AM, Linus Torvalds wrote:
> On Tue, Sep 30, 2014 at 12:19 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>> What about flock(2), FL_SETLEASE, etc semantics (which are the sane ones,
>> compared to the POSIX locks shit which mandates release of lock on each close(2)
>> instead of "when all [duplicate] descriptors have been closed")?
>>
>> You have to do that from ->release(), there's no question about that.
> We do locks_remove_file() independently on ->release, but yes, it's
> basically done just before the last release.
>
> But it has the *exact* same semantics as release, including very much
> having nothing what-so-ever to do with "last close()".
>
> If the file descriptor is opened for other reasons (ie mmap, /proc
> accesses, whatever), then that delays locks_remove_file() the same way
> it delays release.
>
> None of that has *anything* to do with "synchronous". Thinking it does is wrong.
>
> And none of this has *anything* to do with the issue that Maxim
> pointed to in the mailing list web page, which was about write caches,
> and how you cannot (and MUST NOT) delay them until release time.

I apologise for mentioning that mailing list web page in my cover
letter. It was really misleading; I should have thought it through in
advance. Of course, write caches must be flushed in the scope of
->flush(), not ->release(). Let me set forth the use case that led me to
those patches.

We implemented a FUSE-based distributed storage solution intended for
keeping images of VMs (virtual machines) and their configuration files.
The way VMs use their images makes exclusive-opener semantics very
attractive: while a VM is using its image on one node, concurrent access
to that image from other nodes is neither desirable nor necessary. So we
acquire an exclusive lease on FUSE_OPEN and release it on FUSE_RELEASE.
This is quite natural and obviously has nothing to do with FUSE_FLUSH.

With such semantics, there are two choices for handling open() when the
file is currently locked exclusively by a remote node: (a) return EBUSY;
(b) block until the remote node releases the file. We chose (a), because
(b) is very inconvenient in practice: most applications handle a failed
open(2) properly, but very few are clever enough to call open() in a
separate thread and kill that thread if open() has not succeeded within
a reasonable time.

The patches I sent do essentially one thing: they make FUSE ->release()
wait for an ACK from userspace before returning. Without these patches,
any attempt to test or use our storage in valid use cases led to
spurious EBUSY errors. For example, when migrating a VM from one node to
another, we first close the image file on the source node and then try
to open it on the destination node, but the open fails because userspace
on the source node has not processed FUSE_RELEASE yet.

Given that those patches must die, do you have any ideas on how to
resolve that "spurious EBUSY" problem?

Thanks,
Maxim