linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: NeilBrown <neilb@ownmail.net>
To: "Al Viro" <viro@zeniv.linux.org.uk>
Cc: "Trond Myklebust" <trondmy@kernel.org>,
	"Linus Torvalds" <torvalds@linux-foundation.org>,
	"Christian Brauner" <brauner@kernel.org>,
	linux-fsdevel@vger.kernel.org, "Jan Kara" <jack@suse.cz>
Subject: Re: [RFC] a possible way of reducing the PITA of ->d_name audits
Date: Mon, 15 Sep 2025 09:07:00 +1000	[thread overview]
Message-ID: <175789122000.1696783.12073600603531640624@noble.neil.brown.name> (raw)
In-Reply-To: <20250914055626.GG39973@ZenIV>

On Sun, 14 Sep 2025, Al Viro wrote:
> On Sun, Sep 14, 2025 at 02:37:30AM +0100, Al Viro wrote:
> 
> > AFAICS, it can happen if you are there from nfs4_file_open(), hit
> > _nfs4_opendata_to_nfs4_state(opendata), find ->rpc_done to be true
> > in there, hit nfs4_opendata_find_nfs4_state(), have it call
> > nfs4_opendata_get_inode() and run into a server without
> > NFS_CAP_ATOMIC_OPEN_V1.  Then you get ->o_arg.claim set to
> > NFS4_OPEN_CLAIM_NULL and hit this:
> >                 inode = nfs_fhget(data->dir->d_sb, &data->o_res.fh,
> >                                 &data->f_attr);
> > finding not the same inode as your dentry has attached to it.
> > 
> > So the test might end up not being true, at least from my reading of
> > that code.
> > 
> > What I don't understand is the reasons for not failing immediately
> > with EOPENSTALE in that case.
> > 
> > TBH, I would be a lot more comfortable if the "attach inode to dentry"
> > logics in there had been taken several levels up the call chains - analysis
> > would be much easier that way...
>  
> BTW, that's one place where your scheme with locking dentry might cause
> latency problems - two opens on the same cached dentry could be sent
> in parallel, but if you hold it against renames, etc., you might end up
> with those two roundtrips serialized against each other...
> 

That is certainly something we need to explore when the time comes,
though at the moment I'm mostly focuses on trying to land APIs to
centralise the locking so that we can easily see what locking is
actually being done and can change it all in one place.

Currently all calls to ->atomic_open are exclusive on the dentry, either
because O_CREATE means the directory is locked exclusively or because
d_alloc_parallel() locked the dentry with DCACHE_PAR_LOOKUP.  ...except
a cached-negative (which was accepted by d_revalidate) without O_CREATE.
I would rather we didn't call ->atomic_open in that case.  If the
filesystem wants us to (which NFS does) they can fail ->d_revalidate
for LOOKUP_OPEN.  I have a patch to do that for NFS so we can drop the
d_alloc_parallel() from nfs_atomic_open().  I think we need that case to
be exclusive anyway.  You achieved that for NFS by using
d_alloc_parallel().

All other atomic_open functions simply call finish_no_open() or return
-ENOENT when given a non-d_in_lookup() dentry and are not asked to
CREATE.  They could all be simplified if we avoided the atomic_open call
in that case.

The proposal is to add a case when O_CREATE isn't requested and the
dentry is no longer d_in_lookup().  Probably that needs exclusive access
to the dentry anyway?  Either way it wouldn't be a regression.

I wonder if maybe we should transition from offering ->atomic_open() to
offering ->lookup_open() which is called instead of the current
lookup_open() (though with locking moved inside the function).

So for a LAST_NORM open the filesystem can take complete control after
the parent has been found.  The VFS can export a bunch of appropriate
helpers which the filesystem can mix and match.  This seems cleaner than
having a ->lookup() which bypasses LOOKUP_OPEN requests and a
->d_revalidate which does the same.

NeilBrown

      reply	other threads:[~2025-09-14 23:07 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-09-07 20:32 [RFC] a possible way of reducing the PITA of ->d_name audits Al Viro
2025-09-07 21:51 ` Linus Torvalds
2025-09-08  0:06   ` Al Viro
2025-09-08  0:47     ` Linus Torvalds
2025-09-08  2:51       ` Al Viro
2025-09-08  3:57         ` Al Viro
2025-09-08  4:50           ` NeilBrown
2025-09-08  5:19             ` Al Viro
2025-09-08  6:25               ` NeilBrown
2025-09-08  9:05                 ` Al Viro
2025-09-10  2:45                   ` NeilBrown
2025-09-10  7:24                     ` Al Viro
2025-09-10 22:52                       ` NeilBrown
2025-09-12  5:49                       ` ->atomic_open() fun (was Re: [RFC] a possible way of reducing the PITA of ->d_name audits) Al Viro
2025-09-12  8:23                         ` Miklos Szeredi
2025-09-12 18:29                           ` Al Viro
2025-09-12 19:22                             ` Miklos Szeredi
2025-09-12 20:36                               ` Al Viro
2025-09-12 20:50                                 ` Al Viro
2025-09-13  3:36                             ` NeilBrown
2025-09-13  5:07                               ` Al Viro
2025-09-13  5:50                                 ` NeilBrown
2025-09-14 19:01                                 ` Miklos Szeredi
2025-09-14 19:50                                   ` Al Viro
2025-09-14 20:05                                     ` Miklos Szeredi
2025-09-15  8:54                                       ` Bernd Schubert
2025-09-12 18:55                         ` Al Viro
2025-09-12 18:59                           ` [PATCH 1/9] allow finish_no_open(file, ERR_PTR(-E...)) Al Viro
2025-09-12 18:59                             ` [PATCH 2/9] 9p: simplify v9fs_vfs_atomic_open() Al Viro
2025-09-12 18:59                             ` [PATCH 3/9] 9p: simplify v9fs_vfs_atomic_open_dotl() Al Viro
2025-09-12 18:59                             ` [PATCH 4/9] simplify cifs_atomic_open() Al Viro
2025-09-12 18:59                             ` [PATCH 5/9] simplify vboxsf_dir_atomic_open() Al Viro
2025-09-12 18:59                             ` [PATCH 6/9] simplify nfs_atomic_open_v23() Al Viro
2025-09-12 18:59                             ` [PATCH 7/9] simplify fuse_atomic_open() Al Viro
2025-09-12 18:59                             ` [PATCH 8/9] simplify gfs2_atomic_open() Al Viro
2025-09-12 18:59                             ` [PATCH 9/9] slightly simplify nfs_atomic_open() Al Viro
2025-09-12 22:23                             ` [PATCH 1/9] allow finish_no_open(file, ERR_PTR(-E...)) Linus Torvalds
2025-09-13  3:34                             ` NeilBrown
2025-09-13 21:28                   ` [RFC] a possible way of reducing the PITA of ->d_name audits Al Viro
2025-09-14  1:05                     ` NeilBrown
2025-09-14  1:37                       ` Al Viro
2025-09-14  5:56                         ` Al Viro
2025-09-14 23:07                           ` NeilBrown [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=175789122000.1696783.12073600603531640624@noble.neil.brown.name \
    --to=neilb@ownmail.net \
    --cc=brauner@kernel.org \
    --cc=jack@suse.cz \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=neil@brown.name \
    --cc=torvalds@linux-foundation.org \
    --cc=trondmy@kernel.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).