[PATCH v2 0/2] VFS: change ->atomic_open() calling to always have exclusive access

Linux NFS development
 help / color / mirror / Atom feed

* [PATCH v2 0/2] VFS: change ->atomic_open() calling to always have exclusive access
@ 2025-09-25  1:17 NeilBrown
  2025-09-25  1:17 ` [PATCH v2 1/2] NFS: remove d_drop()/d_alloc_parallel() from nfs_atomic_open() NeilBrown
  2025-09-25  1:17 ` [PATCH v2 2/2] VFS: don't call ->atomic_open on cached negative without O_CREAT NeilBrown
  0 siblings, 2 replies; 3+ messages in thread
From: NeilBrown @ 2025-09-25  1:17 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Alexander Viro,
	Christian Brauner
  Cc: Jan Kara, linux-fsdevel, linux-nfs

->atomic_open() is called with only a shared lock on the directory when
O_CREAT wasn't requested.  If the dentry is negative it is still called
because the filesystem might want to revalidate-and-open in an atomic
operations.  NFS does this to fullfil close-to-open consistency
requirements.

NFS has complex code to drop the dentry and reallocate with
d_alloc_parallel() in this case to ensure that only one lookup/open
happens at a time.  It would be simpler to have NFS return zero from
->d_revalidate in this case so that d_alloc_parallel() will be called
in lookup_open() and NFS wan't need to worry about concurrency.

So this series makes that change to NFS to simplify atomic_open and remove
the d_drop() and d_alloc_parallel(), and then changes lookup_open() so that
atomic_open() can never be called without exclusive access to the dentry.

The v2 fixes a couple couple of bugs and revises the documentation to
remove the claim that ->atomic_open will not not be called with a
negative dentry and not O_CREAT.  As O_CREAT can be cleared late, that
combination is still possible.  The important property is that
->atomic_open will now always have exclusive access to the dentry.

NeilBrown


 [PATCH v2 1/2] NFS: remove d_drop()/d_alloc_parallel() from
 [PATCH v2 2/2] VFS: don't call ->atomic_open on cached negative

^ permalink raw reply	[flat|nested] 3+ messages in thread

* [PATCH v2 1/2] NFS: remove d_drop()/d_alloc_parallel() from nfs_atomic_open()
  2025-09-25  1:17 [PATCH v2 0/2] VFS: change ->atomic_open() calling to always have exclusive access NeilBrown
@ 2025-09-25  1:17 ` NeilBrown
  2025-09-25  1:17 ` [PATCH v2 2/2] VFS: don't call ->atomic_open on cached negative without O_CREAT NeilBrown
  1 sibling, 0 replies; 3+ messages in thread
From: NeilBrown @ 2025-09-25  1:17 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Alexander Viro,
	Christian Brauner
  Cc: Jan Kara, linux-fsdevel, linux-nfs

From: NeilBrown <neil@brown.name>

It is important that two non-create NFS "open"s of a negative dentry
don't race.  They both have a shared lock on i_rwsem and so could run
concurrently, but they might both try to call d_exact_alias() or
d_splice_alias() at the same time which is confusing at best.

nfs_atomic_open() currently avoids this by discarding the negative
dentry and creating a new one with d_alloc_parallel().  Only one thread
can successfully get the d_in_lookup() dentry, the other will wait for
the first to finish, and can use the result of that first lookup.

Dropping the dentry like this will defeat a proposed new locking scheme
which locks the dentry and requires it to remain hashed.  Calling
d_alloc_parallel() here when the parent is locked interferes with
proposed changes to invert the lock ordering between the parent inode
and DCACHE_PAR_LOOKUP on a child.

We can achieve the same effect by causing ->d_revalidate to invalidate a
negative dentry when LOOKUP_OPEN is set.  Doing this is consistent with
the "close to open" caching semantics of NFS which requires the server
to be queried whenever opening a file - cached information must not be
trusted.

With this change to ->d_revaliate (implemented in nfs_neg_need_reval) we
can be sure that we have exclusive access to any dentry that reaches
nfs_atomic_open().  Either O_CREAT was requested and so the parent is
locked exclusively, or the dentry will have DCACHE_PAR_LOOKUP set.

This means that the d_drop() and d_alloc_parallel() calls in
nfs_atomic_lookup() are no longer needed to provide exclusion

Signed-off-by: NeilBrown <neil@brown.name>
---
 fs/nfs/dir.c | 31 +++++++------------------------
 1 file changed, 7 insertions(+), 24 deletions(-)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 5f7d9be6f022..c7c746ae377c 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -1615,6 +1615,13 @@ int nfs_neg_need_reval(struct inode *dir, struct dentry *dentry,
 {
 	if (flags & (LOOKUP_CREATE | LOOKUP_RENAME_TARGET))
 		return 0;
+	if (flags & LOOKUP_OPEN)
+		/* close-to-open semantics require we go to server
+		 * on each open.  By invalidating the dentry we
+		 * also ensure nfs_atomic_open() always has exclusive
+		 * access to the dentry.
+		 */
+		return 0;
 	if (NFS_SERVER(dir)->flags & NFS_MOUNT_LOOKUP_CACHE_NONEG)
 		return 1;
 	/* Case insensitive server? Revalidate negative dentries */
@@ -2060,14 +2067,12 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
 		    struct file *file, unsigned open_flags,
 		    umode_t mode)
 {
-	DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
 	struct nfs_open_context *ctx;
 	struct dentry *res;
 	struct iattr attr = { .ia_valid = ATTR_OPEN };
 	struct inode *inode;
 	unsigned int lookup_flags = 0;
 	unsigned long dir_verifier;
-	bool switched = false;
 	int created = 0;
 	int err;

@@ -2112,17 +2117,6 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
 		attr.ia_size = 0;
 	}

-	if (!(open_flags & O_CREAT) && !d_in_lookup(dentry)) {
-		d_drop(dentry);
-		switched = true;
-		dentry = d_alloc_parallel(dentry->d_parent,
-					  &dentry->d_name, &wq);
-		if (IS_ERR(dentry))
-			return PTR_ERR(dentry);
-		if (unlikely(!d_in_lookup(dentry)))
-			return finish_no_open(file, dentry);
-	}
-
 	ctx = create_nfs_open_context(dentry, open_flags, file);
 	err = PTR_ERR(ctx);
 	if (IS_ERR(ctx))
@@ -2165,10 +2159,6 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
 	trace_nfs_atomic_open_exit(dir, ctx, open_flags, err);
 	put_nfs_open_context(ctx);
 out:
-	if (unlikely(switched)) {
-		d_lookup_done(dentry);
-		dput(dentry);
-	}
 	return err;

 no_open:
@@ -2191,13 +2181,6 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
 			res = ERR_PTR(-EOPENSTALE);
 		}
 	}
-	if (switched) {
-		d_lookup_done(dentry);
-		if (!res)
-			res = dentry;
-		else
-			dput(dentry);
-	}
 	return finish_no_open(file, res);
 }
 EXPORT_SYMBOL_GPL(nfs_atomic_open);
-- 
2.50.0.107.gf914562f5916.dirty

^ permalink raw reply related	[flat|nested] 3+ messages in thread

* [PATCH v2 2/2] VFS: don't call ->atomic_open on cached negative without O_CREAT
  2025-09-25  1:17 [PATCH v2 0/2] VFS: change ->atomic_open() calling to always have exclusive access NeilBrown
  2025-09-25  1:17 ` [PATCH v2 1/2] NFS: remove d_drop()/d_alloc_parallel() from nfs_atomic_open() NeilBrown
@ 2025-09-25  1:17 ` NeilBrown
  1 sibling, 0 replies; 3+ messages in thread
From: NeilBrown @ 2025-09-25  1:17 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Alexander Viro,
	Christian Brauner
  Cc: Jan Kara, linux-fsdevel, linux-nfs

From: NeilBrown <neil@brown.name>

If the dentry is found to be negative (not d_in_lookup() and not
positive) and if O_CREATE wasn't requested then we do not have exclusive
access the dentry.

If we pass it to ->atomic_open() the filesystem will need to ensure any
lookup+open operations are serialised so that two threads don't both try
to instantiate the dentry.  This is an unnecessary burden to put on the
filesystem.

If the filesystem wants to perform such a lookup+open operation when a
negative dentry is found, it should return 0 from ->d_revalidate in that
case (when LOOKUP_OPEN) so that the calls serialise in
d_alloc_parallel().

All filesystems with ->atomic_open() currently handle the case of a
negative dentry without O_CREAT either by returning -ENOENT or by
calling finish_no_open() with a NULL dentry.  These have the same effect
of causing atomic_open() to return -ENOENT.

For filesystems without ->atomic_open(), lookup_open() will, in this
case, also return -ENOENT.

So this patch removes the burden from filesystems by returning -ENOENT
early on a negative cached dentry when O_CREAT isn't requested.

With this change any ->atomic_open() function can be certain that it has
exclusive access to the dentry, either because an exclusive lock is held
on the parent directory or because DCACHE_PAR_LOOKUP is set implying an
exclusive lock on the dentry itself.

Signed-off-by: NeilBrown <neil@brown.name>
---
 Documentation/filesystems/porting.rst | 10 ++++++++++
 Documentation/filesystems/vfs.rst     |  4 ++++
 fs/namei.c                            |  9 +++++++++
 3 files changed, 23 insertions(+)

diff --git a/Documentation/filesystems/porting.rst b/Documentation/filesystems/porting.rst
index ab48ab3f6eb2..0ce53a9d36ec 100644
--- a/Documentation/filesystems/porting.rst
+++ b/Documentation/filesystems/porting.rst
@@ -1297,3 +1297,13 @@ a different length, use
 	vfs_parse_fs_qstr(fc, key, &QSTR_LEN(value, len))

 instead.
+
+---
+
+**mandatory**
+
+If a filesystem provides ->atomic_open and needs to handle non-creating
+open of a cached-negative dentry, it should provide a ->d_revalidate
+that returns zero for a negative dentry when LOOKUP_OPEN is set.
+In return it is guaranteed exclusive access to any dentry passed to
+->atomic_open.
diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst
index 486a91633474..6ef17a97064f 100644
--- a/Documentation/filesystems/vfs.rst
+++ b/Documentation/filesystems/vfs.rst
@@ -678,6 +678,10 @@ otherwise noted.
 	flag should be set in file->f_mode.  In case of O_EXCL the
 	method must only succeed if the file didn't exist and hence
 	FMODE_CREATED shall always be set on success.
+	atomic_open() will always have exclusive access to the dentry,
+	as if O_CREAT hasn't caused the directory to be locked exclusively,
+	then the dentry will have DCACHE_PAR_LOOKUP which also
+	provides exclusivity.

 ``tmpfile``
 	called in the end of O_TMPFILE open().  Optional, equivalent to
diff --git a/fs/namei.c b/fs/namei.c
index ba8bf73d2f9c..520296f70f34 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3647,6 +3647,15 @@ static struct dentry *lookup_open(struct nameidata *nd, struct file *file,
 		/* Cached positive dentry: will open in f_op->open */
 		return dentry;
 	}
+	if ((open_flag & O_CREAT) == 0 && !d_in_lookup(dentry)) {
+		/* Cached negative dentry and no create requested.
+		 * If a filesystem wants to be called in this case
+		 * it should trigger dentry invalidation in
+		 * ->d_revalidate(.., LOOKUP_OPEN).
+		 */
+		error = -ENOENT;
+		goto out_dput;
+	}

 	if (open_flag & O_CREAT)
 		audit_inode(nd->name, dir, AUDIT_INODE_PARENT);
-- 
2.50.0.107.gf914562f5916.dirty

^ permalink raw reply related	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2025-09-25  1:22 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-09-25  1:17 [PATCH v2 0/2] VFS: change ->atomic_open() calling to always have exclusive access NeilBrown
2025-09-25  1:17 ` [PATCH v2 1/2] NFS: remove d_drop()/d_alloc_parallel() from nfs_atomic_open() NeilBrown
2025-09-25  1:17 ` [PATCH v2 2/2] VFS: don't call ->atomic_open on cached negative without O_CREAT NeilBrown

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox