From: Vivek Goyal <vgoyal@redhat.com>
To: Amir Goldstein <amir73il@gmail.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>,
Hanna Reitz <hreitz@redhat.com>
Subject: Re: Persistent FUSE file handles (Was: virtiofs uuid and file handles)
Date: Mon, 12 Sep 2022 15:56:47 -0400
Message-ID: <Yx+O/0gVFso5YNxG@redhat.com>
In-Reply-To: <CAOQ4uxhTksMqScNuRbRNNtXvs+JhTbcggPQpXfzqHJtYmTKuRA@mail.gmail.com>

On Mon, Sep 12, 2022 at 06:07:42PM +0300, Amir Goldstein wrote:
> On Mon, Sep 12, 2022 at 5:35 PM Vivek Goyal <vgoyal@redhat.com> wrote:
> >
> > On Mon, Sep 12, 2022 at 04:38:48PM +0300, Amir Goldstein wrote:
> > > On Mon, Sep 12, 2022 at 4:16 PM Vivek Goyal <vgoyal@redhat.com> wrote:
> > > >
> > > > On Sun, Sep 11, 2022 at 01:14:49PM +0300, Amir Goldstein wrote:
> > > > > On Wed, Sep 23, 2020 at 10:44 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
> > > > > >
> > > > > > One proposal was to add LOOKUP_HANDLE operation that is similar to
> > > > > > LOOKUP except it takes a {variable length handle, name} as input and
> > > > > > returns a variable length handle *and* a u64 node_id that can be used
> > > > > > normally for all other operations.
> > > > > >
> > > > > > The advantage of such a scheme for virtio-fs (and possibly other fuse
> > > > > > based fs) would be that userspace need not keep a refcounted object
> > > > > > around until the kernel sends a FORGET, but can prune its node ID
> > > > > > based cache at any time. If that happens and a request from the
> > > > > > client (kernel) comes in with a stale node ID, the server will return
> > > > > > -ESTALE and the client can ask for a new node ID with a special
> > > > > > lookup_handle(fh, NULL).
> > > > > >
> > > > > > Disadvantages being:
> > > > > >
> > > > > > - cost of generating a file handle on all lookups
> > > > > > - cost of storing file handle in kernel icache
> > > > > >
> > > > > > I don't think either of those are problematic in the virtiofs case.
> > > > > > The cost of having to keep fds open while the client has them in its
> > > > > > cache is much higher.
> > > > > >
> > > > >
> > > > > I was thinking of taking a stab at LOOKUP_HANDLE for a generic
> > > > > implementation of persistent file handles for FUSE.
> > > >
> > > > Hi Amir,
> > > >
> > > > I was going through the proposal above for LOOKUP_HANDLE and I was
> > > > wondering how nodeid reuse is handled.
> > >
> > > LOOKUP_HANDLE extends the 64bit node id to be variable size id.
> >
> > Ok. So this variable size id is basically file handle returned by
> > host?
> >
> > So this looks a little different from what Miklos had suggested. IIUC,
> > he wanted LOOKUP_HANDLE to return both file handle as well as *node id*.
> >
> > *********************************
> > One proposal was to add LOOKUP_HANDLE operation that is similar to
> > LOOKUP except it takes a {variable length handle, name} as input and
> > returns a variable length handle *and* a u64 node_id that can be used
> > normally for all other operations.
> > ***************************************
> >
>
> Ha! Thanks for reminding me about that.
> It's been a while since I looked at what actually needs to be done.
> That means that evicting server inodes from cache may not be as
> easy as I had imagined.
>
> > > A server that declares support for LOOKUP_HANDLE must never
> > > reuse a handle.
> > >
> > > That's the basic idea. Just as a filesystem that declares to support
> > > exportfs must never reuse a file handle.
> >
> > >
> > > > IOW, if server decides to drop
> > > > nodeid from its cache and reuse it for some other file, how will we
> > > > differentiate between two. Some sort of generation id encoded in
> > > > nodeid?
> > > >
> > >
> > > That's usually the way that file handles are implemented in
> > > local fs. The inode number is the internal lookup index and the
> > > generation part is advanced on reuse.
> > >
> > > But for passthrough fs like virtiofsd, the LOOKUP_HANDLE will
> > > just use the native fs file handles, so virtiofsd can evict the inodes
> > > entry from its cache completely, not only close the open fds.
> >
> > Ok, got it. Will be interesting to see how kernel fuse changes look
> > to accommodate this variable sized nodeid.
> >
>
> It may make sense to have a FUSE protocol dialect where nodeid
> is variable size for all commands, but it probably won't be part of
> the initial LOOKUP_HANDLE work.
>
> > >
> > > That is what my libfuse_passthough POC does.
> >
> > Where have you hosted corresponding kernel changes?
> >
>
> There are no kernel changes.
>
> For xfs and ext4 I know how to implement open_by_ino()
> and I know how to parse the opaque fs file handle to extract
> ino+generation from it and return them in FUSE_LOOKUP
> response.

Aha, interesting. So this is filesystem specific? It works on xfs/ext4 but
not necessarily on other filesystems like nfs (because they have their
own way of encoding things in the file handle).
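
To spell out why I think this is filesystem specific, here is a minimal
sketch. The 8-byte "32-bit ino + 32-bit generation" layout is purely an
assumption for illustration, not the real handle format of xfs, ext4 or
any other filesystem, and parse_ino_gen() and lookup_reply() are
hypothetical helper names.

```python
import struct

# ASSUMED layout for illustration only: 32-bit inode number followed by a
# 32-bit generation, little-endian. Real filesystems define their own formats.
INO_GEN = struct.Struct("<II")


def parse_ino_gen(handle: bytes):
    """Return (ino, generation), or None if the handle does not match the assumed layout."""
    if len(handle) != INO_GEN.size:
        return None
    return INO_GEN.unpack(handle)


def lookup_reply(handle: bytes):
    """Build the nodeid/generation pair that a lookup reply would carry."""
    parsed = parse_ino_gen(handle)
    if parsed is None:
        # An opaque handle (for example one from a network filesystem) has
        # no ino+generation that we know how to extract.
        raise ValueError("opaque handle: cannot derive a 64bit nodeid from it")
    ino, generation = parsed
    return {"nodeid": ino, "generation": generation}
```

A handle of any other size or encoding falls through to the error path,
which is the case I am worried about for filesystems like nfs.
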
>
> So I could manage to implement persistent NFS file handles
> over the existing FUSE protocol with 64bit node id.

And that explains why you did not have to make kernel changes. But
this does not allow the server to close the fd associated with a nodeid? Or
is there a way for the server to generate a file handle and then call
open_by_handle_at()?
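
For comparison, this is how I read the scheme Miklos described: the
server can prune its node ID cache at any time, returns -ESTALE for a
stale node ID, and the client then asks for a new node ID using the file
handle. Below is a toy model of that flow, not real FUSE code. The class
and method names are made up, and it assumes node IDs come from a
counter that is never reused, which sidesteps the reuse question.

```python
import errno


class HandleServer:
    """Toy server: node IDs are a prunable cache, handles are persistent."""

    def __init__(self):
        self._next_nodeid = 2      # assumption: 1 is reserved for the root
        self._by_nodeid = {}       # nodeid -> persistent handle (bytes)

    def lookup_handle(self, handle: bytes) -> int:
        """Hand out a fresh node ID for a persistent handle."""
        nodeid = self._next_nodeid
        self._next_nodeid += 1     # never reused in this sketch
        self._by_nodeid[nodeid] = handle
        return nodeid

    def prune(self, nodeid: int) -> None:
        """Drop a cache entry without waiting for a FORGET."""
        self._by_nodeid.pop(nodeid, None)

    def getattr(self, nodeid: int):
        """Stand-in for any request that takes a node ID."""
        handle = self._by_nodeid.get(nodeid)
        if handle is None:
            return -errno.ESTALE
        return {"handle": handle}


class Client:
    """Toy client that keeps the handle next to the cached node ID."""

    def __init__(self, server: HandleServer):
        self.server = server
        self.nodeid_of = {}        # handle -> last known nodeid

    def getattr(self, handle: bytes):
        nodeid = self.nodeid_of.get(handle)
        if nodeid is None:
            nodeid = self.nodeid_of[handle] = self.server.lookup_handle(handle)
        reply = self.server.getattr(nodeid)
        if reply == -errno.ESTALE:
            # Node ID was pruned on the server: get a new one and retry once.
            nodeid = self.nodeid_of[handle] = self.server.lookup_handle(handle)
            reply = self.server.getattr(nodeid)
        return reply
```

The cost Miklos mentions shows up here as the client having to store the
handle for every cached inode so that it can recover from -ESTALE.
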
Thanks
Vivek