From: Krister Johansen <kjlx@templeofstupid.com>
To: Bernd Schubert <bernd.schubert@fastmail.fm>
Cc: Krister Johansen <kjlx@templeofstupid.com>,
Miklos Szeredi <miklos@szeredi.hu>,
linux-fsdevel@vger.kernel.org,
Miklos Szeredi <mszeredi@redhat.com>,
linux-kernel@vger.kernel.org,
German Maglione <gmaglione@redhat.com>,
Greg Kurz <groug@kaod.org>, Max Reitz <mreitz@redhat.com>
Subject: Re: [resend PATCH v2 2/2] fuse: ensure that submounts lookup their parent
Date: Mon, 9 Oct 2023 10:15:25 -0700 [thread overview]
Message-ID: <20231009171525.GA1973@templeofstupid.com> (raw)
In-Reply-To: <968148ad-787e-4ccb-9d84-f32b5da88517@fastmail.fm>
On Mon, Oct 09, 2023 at 02:52:42PM +0200, Bernd Schubert wrote:
>
>
> On 10/7/23 02:41, Krister Johansen wrote:
> > On Fri, Oct 06, 2023 at 07:13:06PM +0200, Bernd Schubert wrote:
> > >
> > >
> > > On 10/2/23 17:24, Krister Johansen wrote:
> > > > The submount code uses the parent nodeid passed into the function in
> > > > order to create the root dentry for the new submount. This nodeid does
> > > > not get its remote reference count incremented by a lookup option.
> > > >
> > > > If the parent inode is evicted from its superblock, due to memory
> > > > pressure for example, it can result in a forget opertation being sent to
> > > > the server. Should this nodeid be forgotten while it is still in use in
> > > > a submount, users of the submount get an error from the server on any
> > > > subsequent access. In the author's case, this was an EBADF on all
> > > > subsequent operations that needed to reference the root.
> > > >
> > > > Debugging the problem revealed that the dentry shrinker triggered a forget
> > > > after killing the dentry with the last reference, despite the root
> > > > dentry in another superblock still using the nodeid.
> > > >
> > > > As a result, a container that was also using this submount failed to
> > > > access its filesystem because it had borrowed the reference instead of
> > > > taking its own when setting up its superblock for the submount.
> > > >
> > > > This commit fixes the problem by having the new submount trigger a
> > > > lookup for the parent as part of creating a new root dentry for the
> > > > virtiofsd submount superblock. This allows each superblock to have its
> > > > inodes removed by the shrinker when unreferenced, while keeping the
> > > > nodeid reference count accurate and active with the server.
> > > >
> > > > Signed-off-by: Krister Johansen <kjlx@templeofstupid.com>
> > > > ---
> > > > fs/fuse/dir.c | 10 +++++-----
> > > > fs/fuse/fuse_i.h | 6 ++++++
> > > > fs/fuse/inode.c | 43 +++++++++++++++++++++++++++++++++++++------
> > > > 3 files changed, 48 insertions(+), 11 deletions(-)
> > > >
> > > > diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
> > > > index 5e01946d7531..333730c74619 100644
> > > > --- a/fs/fuse/dir.c
> > > > +++ b/fs/fuse/dir.c
> > > > @@ -183,11 +183,11 @@ static void fuse_lookup_init(struct fuse_conn *fc, struct fuse_args *args,
> > > > args->out_args[0].value = outarg;
> > > > }
> > > > -static int fuse_dentry_revalidate_lookup(struct fuse_mount *fm,
> > > > - struct dentry *entry,
> > > > - struct inode *inode,
> > > > - struct fuse_entry_out *outarg,
> > > > - bool *lookedup)
> > > > +int fuse_dentry_revalidate_lookup(struct fuse_mount *fm,
> > > > + struct dentry *entry,
> > > > + struct inode *inode,
> > > > + struct fuse_entry_out *outarg,
> > > > + bool *lookedup)
> > > > {
> > > > struct dentry *parent;
> > > > struct fuse_forget_link *forget;
> > > > diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> > > > index 405252bb51f2..a66fcf50a4cc 100644
> > > > --- a/fs/fuse/fuse_i.h
> > > > +++ b/fs/fuse/fuse_i.h
> > > > @@ -1325,6 +1325,12 @@ void fuse_dax_dontcache(struct inode *inode, unsigned int flags);
> > > > bool fuse_dax_check_alignment(struct fuse_conn *fc, unsigned int map_alignment);
> > > > void fuse_dax_cancel_work(struct fuse_conn *fc);
> > > > +/* dir.c */
> > > > +int fuse_dentry_revalidate_lookup(struct fuse_mount *fm, struct dentry *entry,
> > > > + struct inode *inode,
> > > > + struct fuse_entry_out *outarg,
> > > > + bool *lookedup);
> > > > +
> > > > /* ioctl.c */
> > > > long fuse_file_ioctl(struct file *file, unsigned int cmd, unsigned long arg);
> > > > long fuse_file_compat_ioctl(struct file *file, unsigned int cmd,
> > > > diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
> > > > index 444418e240c8..79a31cb55512 100644
> > > > --- a/fs/fuse/inode.c
> > > > +++ b/fs/fuse/inode.c
> > > > @@ -1464,7 +1464,13 @@ static int fuse_fill_super_submount(struct super_block *sb,
> > > > struct fuse_mount *fm = get_fuse_mount_super(sb);
> > > > struct super_block *parent_sb = parent_fi->inode.i_sb;
> > > > struct fuse_attr root_attr;
> > > > + struct fuse_inode *fi;
> > > > struct inode *root;
> > > > + struct inode *parent;
> > > > + struct dentry *pdent;
> > > > + struct fuse_entry_out outarg;
> > > > + bool lookedup = false;
> > > > + int ret;
> > > > fuse_sb_defaults(sb);
> > > > fm->sb = sb;
> > > > @@ -1480,14 +1486,39 @@ static int fuse_fill_super_submount(struct super_block *sb,
> > > > if (parent_sb->s_subtype && !sb->s_subtype)
> > > > return -ENOMEM;
> > > > - fuse_fill_attr_from_inode(&root_attr, parent_fi);
> > > > - root = fuse_iget(sb, parent_fi->nodeid, 0, &root_attr, 0, 0);
> > > > /*
> > > > - * This inode is just a duplicate, so it is not looked up and
> > > > - * its nlookup should not be incremented. fuse_iget() does
> > > > - * that, though, so undo it here.
> > > > + * It is necessary to lookup the parent_if->nodeid in case the dentry
> > > > + * that triggered the automount of the submount is later evicted.
> > > > + * If this dentry is evicted without the lookup count getting increased
> > > > + * on the submount root, then the server can subsequently forget this
> > > > + * nodeid which leads to errors when trying to access the root of the
> > > > + * submount.
> > > > */
> > > > - get_fuse_inode(root)->nlookup--;
> > > > + parent = &parent_fi->inode;
> > > > + pdent = d_find_alias(parent);
> > > > + if (!pdent)
> > > > + return -EINVAL;
> > > > +
> > > > + ret = fuse_dentry_revalidate_lookup(fm, pdent, parent, &outarg,
> > > > + &lookedup);
> > > > + dput(pdent);
> > > > + /*
> > > > + * The new root owns this nlookup on success, and it is incremented by
> > > > + * fuse_iget(). In the case the lookup succeeded but revalidate fails,
> > > > + * ensure that the lookup count is tracked by the parent.
> > > > + */
> > > > + if (ret <= 0) {
> > > > + if (lookedup) {
> > > > + fi = get_fuse_inode(parent);
> > > > + spin_lock(&fi->lock);
> > > > + fi->nlookup++;
> > > > + spin_unlock(&fi->lock);
> > > > + }
> > >
> > > I might be wrong, but doesn't that mean that
> > > "get_fuse_inode(root)->nlookup--" needs to be called?
> >
> > In the case where ret > 0, the nlookup on get_fuse_inode(root) is set to
> > 1 by fuse_iget(). That ensures that the root is forgotten when later
> > unmounted. The code that handles the forget uses the count of nlookup
> > to tell the server-side how many references to forget. (That's in
> > fuse_evict_inode()).
> >
> > However, if the fuse_dentry_revalidate_lookup() call performs a valid
> > lookup but returns an error, this function will return before it fills
> > out s_root in the superblock or calls fuse_iget(). If the superblock
> > doesn't have a s_root set, then the code in generic_kill_super() won't
> > dput() the root dentry and trigger the forget.
> >
> > The intention of this code was to handle the case where the lookup had
> > succeeded, but the code determined it was still necessary to return an
> > error. In that situation, the reference taken by the lookup has to be
> > accounted somewhere, and the parent seemed like a plausible candidate.
>
> Yeah sorry, I had just missed that fuse_iget() also moved and then thought
> it would have increased fi->nlookup already.
No worries; I'd much rather get feedback if something doesn't look
right, even if it turns out okay in the end.
> > However, after writing up this response, I can see that there's still a
> > problem here if d_make_root(root) returns NULL, because we'll also lose
> > track of the nlookup in that case.
> >
> > If you agree that charging this to the parent on error makes sense, I'll
> > re-work the error handling here so that the right thing happens when
> > either fuse_dentry_revalidate_lookup() or d_make_root() encounter an
> > error.
>
> Oh yeah, I also missed that. Although, iput() calls iput_final, which then
> calls evict and sends the fuse forget - isn't that the right action already?
Thanks, I had forgotten that d_make_root() would call iput() for me if
d_alloc_anon() fails. Let me restate this to suggest that I account the
nlookup to the parent if fuse_dentry_revalidate_lookup() or fuse_iget()
fail instead. Does that sound right?
> > Thanks for the feedback.
>
> Well, false alarm from my side, sorry again!
No apology necessary; I appreciate you spending the time to look and ask
questions.
-K
next prev parent reply other threads:[~2023-10-09 17:15 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-10-02 15:24 [resend PATCH v2 0/2] virtiofs submounts that are still in use forgotten by shrinker Krister Johansen
2023-10-02 15:24 ` [resend PATCH v2 1/2] fuse: revalidate: move lookup into a separate function Krister Johansen
2023-10-02 15:24 ` [resend PATCH v2 2/2] fuse: ensure that submounts lookup their parent Krister Johansen
2023-10-06 17:13 ` Bernd Schubert
2023-10-07 0:41 ` Krister Johansen
2023-10-09 12:52 ` Bernd Schubert
2023-10-09 17:15 ` Krister Johansen [this message]
2023-10-09 18:43 ` Bernd Schubert
2023-10-10 2:35 ` Krister Johansen
2023-10-09 19:45 ` Miklos Szeredi
2023-10-10 2:35 ` Krister Johansen
2023-10-10 8:15 ` Miklos Szeredi
2023-10-11 1:25 ` Krister Johansen
2023-10-11 7:07 ` Miklos Szeredi
2023-10-11 16:32 ` Krister Johansen
2023-10-11 18:27 ` Miklos Szeredi
2023-10-18 1:33 ` Krister Johansen
2023-10-18 1:33 ` [PATCH v3] fuse: share lookup state between submount and its parent Krister Johansen
2023-10-19 12:39 ` Miklos Szeredi
2023-10-20 21:33 ` Krister Johansen
2023-10-20 21:34 ` [PATCH v4] " Krister Johansen
2023-10-02 22:18 ` [resend PATCH v2 0/2] virtiofs submounts that are still in use forgotten by shrinker Bernd Schubert
2023-10-03 16:48 ` Krister Johansen
2023-10-03 22:54 ` Bernd Schubert
2023-10-04 13:58 ` Krister Johansen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20231009171525.GA1973@templeofstupid.com \
--to=kjlx@templeofstupid.com \
--cc=bernd.schubert@fastmail.fm \
--cc=gmaglione@redhat.com \
--cc=groug@kaod.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=miklos@szeredi.hu \
--cc=mreitz@redhat.com \
--cc=mszeredi@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox