* [PATCH 1/2] NFS: remove d_drop()/d_alloc_paralle() from nfs_atomic_open()
2025-09-15 3:01 VFS: change ->atomic_open() calling to always have exclusive access NeilBrown
@ 2025-09-15 3:01 ` NeilBrown
2025-09-15 3:01 ` [PATCH 2/2] VFS: don't call ->atomic_open on cached negative without O_CREAT NeilBrown
2025-09-17 4:34 ` VFS: change ->atomic_open() calling to always have exclusive access Al Viro
2 siblings, 0 replies; 7+ messages in thread
From: NeilBrown @ 2025-09-15 3:01 UTC (permalink / raw)
To: Alexander Viro, Christian Brauner, Trond Myklebust,
Anna Schumaker
Cc: linux-nfs, linux-fsdevel, Jeff Layton, Amir Goldstein, Jan Kara
From: NeilBrown <neil@brown.name>
It is important that two non-create NFS "open"s of a negative dentry don't race.
They both have a shared lock on i_rwsem and so could run concurrently, but
they might both try to call d_exact_alias() or d_splice_alias() at the
same time which is confusing at best.
nfs_atomic_open() currently avoids this by discarding the negative dentry
and creating a new one with d_alloc_parallel(). Only one thread can
successfully get the d_in_lookup() dentry, the other will wait for the
first to finish, and can use the result of that first lookup.
Dropping the dentry like this will defeat a proposed new locking scheme
which locks the dentry and requires it to remain hashed.
We can achieve the same effect by causing ->d_revalidate to invalidate a
negative dentry when LOOKUP_OPEN is set. Doing this is consistent with
the "close to open" caching semantics of NFS which require the server to
be queried whenever opening a file - cached information must not be
trusted.
With this change to d_revaliate (implemented in nfs_neg_need_reval)
we can be sure that if the dentry that reaches nfs_atomic_open() is
not d_in_lookup and is negative, then some other lookup or atomic_open
must have happened since d_revalidate was called, so we have recent
confirmation from the server that the name doesn't exist, and we can
safely fail the open request (which finish_no_open() will effectively do
when passed a negative dentry).
Note that none of the above applies to O_CREAT opens. These are fully
serialised with an exclusive lock on i_rwsem and there is no attempt to
d_drop() the dentry in that case. d_revalidate() does not need to
request invalidation of a negative dentry as a create will proceed in
that case anyway.
Signed-off-by: NeilBrown <neil@brown.name>
---
fs/nfs/dir.c | 30 +++++++-----------------------
1 file changed, 7 insertions(+), 23 deletions(-)
diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index d81217923936..bc417566508e 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -1615,6 +1615,11 @@ int nfs_neg_need_reval(struct inode *dir, struct dentry *dentry,
{
if (flags & (LOOKUP_CREATE | LOOKUP_RENAME_TARGET))
return 0;
+ if (flags & LOOKUP_OPEN)
+ /* close-to-open semantics require we go to server
+ * on each open.
+ */
+ return 1;
if (NFS_SERVER(dir)->flags & NFS_MOUNT_LOOKUP_CACHE_NONEG)
return 1;
/* Case insensitive server? Revalidate negative dentries */
@@ -2060,14 +2065,12 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
struct file *file, unsigned open_flags,
umode_t mode)
{
- DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
struct nfs_open_context *ctx;
struct dentry *res;
struct iattr attr = { .ia_valid = ATTR_OPEN };
struct inode *inode;
unsigned int lookup_flags = 0;
unsigned long dir_verifier;
- bool switched = false;
int created = 0;
int err;
@@ -2112,16 +2115,8 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
attr.ia_size = 0;
}
- if (!(open_flags & O_CREAT) && !d_in_lookup(dentry)) {
- d_drop(dentry);
- switched = true;
- dentry = d_alloc_parallel(dentry->d_parent,
- &dentry->d_name, &wq);
- if (IS_ERR(dentry))
- return PTR_ERR(dentry);
- if (unlikely(!d_in_lookup(dentry)))
- return finish_no_open(file, dentry);
- }
+ if (!(open_flags & O_CREAT) && !d_in_lookup(dentry))
+ return finish_no_open(file, dentry);
ctx = create_nfs_open_context(dentry, open_flags, file);
err = PTR_ERR(ctx);
@@ -2165,10 +2160,6 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
trace_nfs_atomic_open_exit(dir, ctx, open_flags, err);
put_nfs_open_context(ctx);
out:
- if (unlikely(switched)) {
- d_lookup_done(dentry);
- dput(dentry);
- }
return err;
no_open:
@@ -2191,13 +2182,6 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
res = ERR_PTR(-EOPENSTALE);
}
}
- if (switched) {
- d_lookup_done(dentry);
- if (!res)
- res = dentry;
- else
- dput(dentry);
- }
if (IS_ERR(res))
return PTR_ERR(res);
return finish_no_open(file, res);
--
2.50.0.107.gf914562f5916.dirty
^ permalink raw reply related [flat|nested] 7+ messages in thread* [PATCH 2/2] VFS: don't call ->atomic_open on cached negative without O_CREAT
2025-09-15 3:01 VFS: change ->atomic_open() calling to always have exclusive access NeilBrown
2025-09-15 3:01 ` [PATCH 1/2] NFS: remove d_drop()/d_alloc_paralle() from nfs_atomic_open() NeilBrown
@ 2025-09-15 3:01 ` NeilBrown
2025-09-17 4:23 ` Al Viro
2025-09-17 4:34 ` VFS: change ->atomic_open() calling to always have exclusive access Al Viro
2 siblings, 1 reply; 7+ messages in thread
From: NeilBrown @ 2025-09-15 3:01 UTC (permalink / raw)
To: Alexander Viro, Christian Brauner, Trond Myklebust,
Anna Schumaker
Cc: linux-nfs, linux-fsdevel, Jeff Layton, Amir Goldstein, Jan Kara
From: NeilBrown <neil@brown.name>
If the dentry is found to be negative (not d_in_lookup() and not
positive) and if O_CREATE wasn't requested then we do not have exclusive
access the dentry.
If we pass it to ->atomic_open() the filesystem will need to ensure any
lookup+open operations are serialised so that two threads don't both try
to instantiate the dentry. This is an unnecessary burden to put on the
filesystem.
If the filesystem wants to perform such a lookup+open operation when a
negative dentry is found, it should return 0 from ->d_revalidate in that
case (when LOOKUP_OPEN) so that the calls serialise in
d_alloc_parallel().
All filesystems with ->atomic_open() currently handle the case of a
negative dentry without O_CREAT either by returning -ENOENT or by
calling finish_no_open(), either with NULL or with the negative dentry.
All of these have the same effect.
For filesystems without ->atomic_open(), lookup_open() will, in this
case, also call finish_no_open().
So this patch removes the burden from filesystems by calling
finish_no_open() early on a negative cached dentry when O_CREAT isn't
requested.
With this change any ->atomic_open() function can be certain that it has
exclusive access to the dentry, either because an exclusive lock is held
on the parent directory or because DCACHE_PAR_LOOKUP is set implying an
exclusive lock on the dentry itself.
Signed-off-by: NeilBrown <neil@brown.name>
---
Documentation/filesystems/vfs.rst | 4 ++++
fs/namei.c | 8 ++++++++
2 files changed, 12 insertions(+)
diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst
index 486a91633474..be7dd654f5fd 100644
--- a/Documentation/filesystems/vfs.rst
+++ b/Documentation/filesystems/vfs.rst
@@ -678,6 +678,10 @@ otherwise noted.
flag should be set in file->f_mode. In case of O_EXCL the
method must only succeed if the file didn't exist and hence
FMODE_CREATED shall always be set on success.
+ atomic_open() will always have exclusive access to the dentry
+ as if O_CREAT hasn't caused the directory to be locked exclusively,
+ then the dentry will have DCACHE_PAR_LOOKUP will also
+ provides exclusivity.
``tmpfile``
called in the end of O_TMPFILE open(). Optional, equivalent to
diff --git a/fs/namei.c b/fs/namei.c
index ba8bf73d2f9c..5f732b9cd2db 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3647,6 +3647,14 @@ static struct dentry *lookup_open(struct nameidata *nd, struct file *file,
/* Cached positive dentry: will open in f_op->open */
return dentry;
}
+ if ((open_flag & O_CREAT) == 0 && !d_in_lookup(dentry)) {
+ /* Cached negative dentry and no create requested.
+ * If a filesystem wants to be called in this case
+ * it should trigger dentry invalidation in
+ * ->d_revalidate.
+ */
+ return dentry;
+ }
if (open_flag & O_CREAT)
audit_inode(nd->name, dir, AUDIT_INODE_PARENT);
--
2.50.0.107.gf914562f5916.dirty
^ permalink raw reply related [flat|nested] 7+ messages in thread* Re: VFS: change ->atomic_open() calling to always have exclusive access
2025-09-15 3:01 VFS: change ->atomic_open() calling to always have exclusive access NeilBrown
2025-09-15 3:01 ` [PATCH 1/2] NFS: remove d_drop()/d_alloc_paralle() from nfs_atomic_open() NeilBrown
2025-09-15 3:01 ` [PATCH 2/2] VFS: don't call ->atomic_open on cached negative without O_CREAT NeilBrown
@ 2025-09-17 4:34 ` Al Viro
2025-09-17 7:25 ` NeilBrown
2 siblings, 1 reply; 7+ messages in thread
From: Al Viro @ 2025-09-17 4:34 UTC (permalink / raw)
To: NeilBrown
Cc: Christian Brauner, Trond Myklebust, Anna Schumaker, linux-nfs,
linux-fsdevel, Jeff Layton, Amir Goldstein, Jan Kara
On Mon, Sep 15, 2025 at 01:01:06PM +1000, NeilBrown wrote:
> ->atomic_open() is called with only a shared lock on the directory when
> O_CREAT wasn't requested. If the dentry is negative it is still called
> because the filesystem might want to revalidate-and-open in an atomic
> operations. NFS does this to fullfil close-to-open consistency
> requirements.
>
> NFS has complex code to drop the dentry and reallocate with
> d_alloc_parallel() in this case to ensure that only one lookup/open
> happens at a time. It would be simpler to have NFS return zero from
> ->d_revalidate in this case so that d_alloc_parallel() will be calling
> in lookup_open() and NFS wan't need to worry about concurrency.
>
> So this series makes that change to NFS to simplify atomic_open and remove
> the d_drop() and d_alloc_parallel(), and then changes lookup_open() so that
> atomic_open() can never be called without exclusive access to the dentry.
What will happen if you have a stale negative dentry of /mnt/foo/bar, with
/mnt/foo not writable to you (e.g. because /mnt is read-only mount) and
foo/bar having actually been created by another client since your old lookup
had happened? What's more, assume that this file is owned by you and has 0660
for mode.
Calling open("/mnt/foo/bar", O_CREAT|O_RDWR, 0666) should succeed - the file
in question exists, so O_CREAT and the third argument of open(2) should be
ignored.
What happens is that we get ->d_revalidate() still with effects of O_CREAT
(i.e. LOOKUP_CREATE|LOOKUP_OPEN), but ->atomic_open() does *not* get O_CREAT.
Your series is broken in that case, AFAICS.
^ permalink raw reply [flat|nested] 7+ messages in thread