[PATCH v9 0/7] fuse: full atomic open and atomic-open-revalidate

linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH v9 0/7] fuse: full atomic open and atomic-open-revalidate
@ 2023-09-20 17:34 Bernd Schubert
  2023-09-20 17:34 ` [PATCH v9 1/7] fuse: rename fuse_create_open Bernd Schubert
                   ` (7 more replies)
  0 siblings, 8 replies; 23+ messages in thread
From: Bernd Schubert @ 2023-09-20 17:34 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: bernd.schubert, miklos, dsingh, Bernd Schubert, Vivek Goyal,
	Christian Brauner, Al Viro, Horst Birthelmer

In FUSE, as of now, uncached lookups are expensive over the wire.
E.g additional latencies and stressing (meta data) servers from
thousands of clients. With atomic-open lookup before open
can be avoided.

Here is the link to performance numbers
https://lore.kernel.org/linux-fsdevel/20220322121212.5087-1-dharamhans87@gmail.com/

Here is the libfuse pull request
https://github.com/libfuse/libfuse/pull/813

The patches are passing passthrough_hp xfstests (libfuse part applied),
although we had to introduce umount retries into xfstests, as recent
kernels/xfstests fail umount in some tests with
EBUSY - independent of atomic open. (Although outstanding for v7)

I'm especially interested in Al's and Christians opinion about the
atomic open dentry revalidation in v7. If the vfs changes are
acceptable, would it be possible to also look at the other patches
and their vfs/dcache interaction? I __hope__ I got it right and I hope
the vfs can be accepted.

v9:
    - Followed Miklos suggestion and added another patch to further
      optimize atomic revalidate/open, which avoids dentry
      acquire/release and also avoids double call into ->d_revalidate
    - Updates following Miklos' review
    - Dropped a temporary comment in patch 2/7 (accidental leftover)

v8: - Another slight indentation fix in _fuse_atomic_open
    - Fix compilation error in patch 4 (fuse atomic revalidate)
    - Remove LOOKUP_ATOMIC_REVALIDATE
    - Switch from DCACHE_ATOMIC_OPEN flag to return value and
      and introduce an enum for d_revalidate return values.
    - checkpatch fixes

v7: - Indentation and style fixes for _fuse_atomic_open.
    - Remodel atomic open to avoid races with parallel lookup, similar
      to NFS commit c94c09535c4debcc439f55b5b6d9ebe57bd4665a and what
      is done in _nfs4_open_and_get_state()
      A WARN_ONCE() and fallback is added to ensure operation is on
      negative dentries only.
    - Error handling is done via the fallback fuse_create_open()
      to reduce complexity and code duplication.
    - Remove entry cache invalidation on ENOENT in the atomic-open
      patch, as atomic-open so far operates on negative dentries only.
    - Remove fuse_advise_use_readdirplus() in _fuse_atomic_open
      (Thanks Miklos)
    - Add forgotten free_ext_value() (Thanks Miklos).
    - Declare struct fuse_inode per condition as the value needs to
      be retrieved anyway per condition.
    - Added atomic open-revalidation and required vfs changes
    - Added myself (Bernd) as Co-developed-by to Dharmendras patches, as
      I did substantial modifications.
    - More as reminder for myself, so far these tests below are
      done manually or with custom scripts, I think we need xfstests
      for these.

        With updated libfuse /scratch/dest is mounted by:
        passthrough_hp -o allow_other --foreground --debug-fuse /scratch/source /scratch/dest

        1) Test atomic open (file create) and negative dentry open

            rm -f /scratch/source/file # ensure file does not exist
            mount /scratch/dest  # overlay of /scratch source
            echo a > /scratch/dest/file # non-existing file
            umount and mount /scratch/test (clear cache)
            cat /scratch/dest/file
            rm -f /scratch/dest/file

        2) Test dir open

            mkdir /scratch/source/dir
            mount /scratch/dest  # overlay of /scratch source
            cat /scratch/source/dir
            rmdir /scratch/source/dir

        3)  Test revalidate without file change

            mount /scratch/dest
            echo "a" > /scratch/dest/file
            echo "b" >> /scratch/dest/file
            echo "c" >> /scratch/dest/file
            cat /scratch/dest/file
            rm -f /scratch/dest/file

        4)  Test revalidate by underlying file change

            mount /scratch/dest
            echo "a" > /scratch/dest/file
            cat /scratch/dest/file
            rm -f /scratch/source/file # unknown to dest mount
            str="b"
            echo "${str}" > /scratch/source/file
            reval=$(cat /scratch/dest/file)
            if [ "$str" != "reval" ]; then
                echo "String mismatch after revalidation"
                exit 1
            fi
            rm -f /scratch/dest/file

        5) Test revalidate by underlying file change, but with
           O_CREATE included. Tests dentry creation by the atomic
           revalidate

            mount /scratch/dest
            echo "a" >> /scratch/dest/file
            rm -f /scratch/source/file
            echo "b" > /scratch/source/file

            # revalidate includes O_CREATE
            echo "c" >> /scratch/dest/file

        6) Repeat above tests, but with additional "--nocache"
           passthrough_hp option

v6: Addressed Miklos comments and rewrote atomic open into its own
    function. Also dropped for now is the revalidate optimization, we
    have the code/patch, but it needs more testing. Also easier to
    first agree on atomic open and then to land the next optimization.

v5: Addressed comments

v3: Addressed comments

v4: Addressed all comments and refactored the code into 3 separate patches
    respectively for Atomic create, Atomic open, optimizing lookup in
    d_revalidate().

v3: handle review comments

v2: fixed a memory leak in <fuse_atomic_open_common>

Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dharmendra Singh <dsingh@ddn.com>
Cc: Horst Birthelmer <hbirthelmer@ddn.com>
Cc: linux-fsdevel@vger.kernel.org

Bernd Schubert (6):
  fuse: rename fuse_create_open
  fuse: introduce atomic open
  vfs: Optimize atomic_open() on positive dentry
  fuse: Revalidate positive entries in fuse_atomic_open
  fuse: Return D_REVALIDATE_ATOMIC for cached dentries
  fuse: Avoid code duplication in atomic open

Miklos Szeredi (1):
  [RFC] Allow atomic_open() on positive dentry

 fs/fuse/dir.c             | 390 +++++++++++++++++++++++++++++++++++++-
 fs/fuse/fuse_i.h          |   6 +
 fs/namei.c                |  66 ++++++-
 include/linux/namei.h     |   7 +
 include/uapi/linux/fuse.h |   3 +
 5 files changed, 461 insertions(+), 11 deletions(-)

-- 
2.39.2


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v9 1/7] fuse: rename fuse_create_open
  2023-09-20 17:34 [PATCH v9 0/7] fuse: full atomic open and atomic-open-revalidate Bernd Schubert
@ 2023-09-20 17:34 ` Bernd Schubert
  2023-09-20 17:34 ` [PATCH v9 2/7] fuse: introduce atomic open Bernd Schubert
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 23+ messages in thread
From: Bernd Schubert @ 2023-09-20 17:34 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: bernd.schubert, miklos, dsingh, Bernd Schubert

Just preparation work for atomic open.

Signed-off-by: Bernd Schubert <bschubert@ddn.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Dharmendra Singh <dsingh@ddn.com>
Cc: linux-fsdevel@vger.kernel.org
---
 fs/fuse/dir.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index f67bef9d83c4..542086140781 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -611,7 +611,7 @@ static void free_ext_value(struct fuse_args *args)
  * If the filesystem doesn't support this, then fall back to separate
  * 'mknod' + 'open' requests.
  */
-static int fuse_create_open(struct inode *dir, struct dentry *entry,
+static int _fuse_create_open(struct inode *dir, struct dentry *entry,
 			    struct file *file, unsigned int flags,
 			    umode_t mode, u32 opcode)
 {
@@ -751,7 +751,7 @@ static int fuse_atomic_open(struct inode *dir, struct dentry *entry,
 	if (fc->no_create)
 		goto mknod;
 
-	err = fuse_create_open(dir, entry, file, flags, mode, FUSE_CREATE);
+	err = _fuse_create_open(dir, entry, file, flags, mode, FUSE_CREATE);
 	if (err == -ENOSYS) {
 		fc->no_create = 1;
 		goto mknod;
@@ -877,7 +877,7 @@ static int fuse_tmpfile(struct mnt_idmap *idmap, struct inode *dir,
 	if (fc->no_tmpfile)
 		return -EOPNOTSUPP;
 
-	err = fuse_create_open(dir, file->f_path.dentry, file, file->f_flags, mode, FUSE_TMPFILE);
+	err = _fuse_create_open(dir, file->f_path.dentry, file, file->f_flags, mode, FUSE_TMPFILE);
 	if (err == -ENOSYS) {
 		fc->no_tmpfile = 1;
 		err = -EOPNOTSUPP;
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v9 2/7] fuse: introduce atomic open
  2023-09-20 17:34 [PATCH v9 0/7] fuse: full atomic open and atomic-open-revalidate Bernd Schubert
  2023-09-20 17:34 ` [PATCH v9 1/7] fuse: rename fuse_create_open Bernd Schubert
@ 2023-09-20 17:34 ` Bernd Schubert
       [not found]   ` <7616CA3C-312F-4F9F-9BB3-903D3A77289B@chromium.org>
  2023-10-03  3:38   ` Yuan Yao
  2023-09-20 17:34 ` [PATCH v9 3/7] [RFC] Allow atomic_open() on positive dentry Bernd Schubert
                   ` (5 subsequent siblings)
  7 siblings, 2 replies; 23+ messages in thread
From: Bernd Schubert @ 2023-09-20 17:34 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: bernd.schubert, miklos, dsingh, Bernd Schubert, Horst Birthelmer,
	Christian Brauner, Al Viro

From: Dharmendra Singh <dsingh@ddn.com>

This adds full atomic open support, to avoid lookup before open/create.
If the implementation (fuse server/daemon) does not support atomic open
it falls back to non-atomic open.

Co-developed-by: Bernd Schubert <bschubert@ddn.com>
Signed-off-by: Dharmendra Singh <dsingh@ddn.com>
Signed-off-by: Horst Birthelmer <hbirthelmer@ddn.com>
Signed-off-by: Bernd Schubert <bschubert@ddn.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dharmendra Singh <dsingh@ddn.com>
Cc: linux-fsdevel@vger.kernel.org
---
 fs/fuse/dir.c             | 214 +++++++++++++++++++++++++++++++++++++-
 fs/fuse/fuse_i.h          |   3 +
 include/uapi/linux/fuse.h |   3 +
 3 files changed, 219 insertions(+), 1 deletion(-)

diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index 542086140781..4cb2809a852d 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -722,7 +722,7 @@ static int _fuse_create_open(struct inode *dir, struct dentry *entry,
 
 static int fuse_mknod(struct mnt_idmap *, struct inode *, struct dentry *,
 		      umode_t, dev_t);
-static int fuse_atomic_open(struct inode *dir, struct dentry *entry,
+static int fuse_create_open(struct inode *dir, struct dentry *entry,
 			    struct file *file, unsigned flags,
 			    umode_t mode)
 {
@@ -768,6 +768,218 @@ static int fuse_atomic_open(struct inode *dir, struct dentry *entry,
 	return finish_no_open(file, res);
 }
 
+static int _fuse_atomic_open(struct inode *dir, struct dentry *entry,
+			     struct file *file, unsigned flags,
+			     umode_t mode)
+{
+	int err;
+	struct inode *inode;
+	struct fuse_mount *fm = get_fuse_mount(dir);
+	struct fuse_conn *fc = fm->fc;
+	FUSE_ARGS(args);
+	struct fuse_forget_link *forget;
+	struct fuse_create_in inarg;
+	struct fuse_open_out outopen;
+	struct fuse_entry_out outentry;
+	struct fuse_inode *fi;
+	struct fuse_file *ff;
+	struct dentry *switched_entry = NULL, *alias = NULL;
+	DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
+
+	/* Expect a negative dentry */
+	if (unlikely(d_inode(entry)))
+		goto fallback;
+
+	/* Userspace expects S_IFREG in create mode */
+	if ((flags & O_CREAT) && (mode & S_IFMT) != S_IFREG)
+		goto fallback;
+
+	forget = fuse_alloc_forget();
+	err = -ENOMEM;
+	if (!forget)
+		goto out_err;
+
+	err = -ENOMEM;
+	ff = fuse_file_alloc(fm);
+	if (!ff)
+		goto out_put_forget_req;
+
+	if (!fc->dont_mask)
+		mode &= ~current_umask();
+
+	flags &= ~O_NOCTTY;
+	memset(&inarg, 0, sizeof(inarg));
+	memset(&outentry, 0, sizeof(outentry));
+	inarg.flags = flags;
+	inarg.mode = mode;
+	inarg.umask = current_umask();
+
+	if (fc->handle_killpriv_v2 && (flags & O_TRUNC) &&
+	    !(flags & O_EXCL) && !capable(CAP_FSETID)) {
+		inarg.open_flags |= FUSE_OPEN_KILL_SUIDGID;
+	}
+
+	args.opcode = FUSE_OPEN_ATOMIC;
+	args.nodeid = get_node_id(dir);
+	args.in_numargs = 2;
+	args.in_args[0].size = sizeof(inarg);
+	args.in_args[0].value = &inarg;
+	args.in_args[1].size = entry->d_name.len + 1;
+	args.in_args[1].value = entry->d_name.name;
+	args.out_numargs = 2;
+	args.out_args[0].size = sizeof(outentry);
+	args.out_args[0].value = &outentry;
+	args.out_args[1].size = sizeof(outopen);
+	args.out_args[1].value = &outopen;
+
+	if (flags & O_CREAT) {
+		err = get_create_ext(&args, dir, entry, mode);
+		if (err)
+			goto out_free_ff;
+	}
+
+	err = fuse_simple_request(fm, &args);
+	free_ext_value(&args);
+	if (err == -ENOSYS) {
+		fc->no_open_atomic = 1;
+		fuse_file_free(ff);
+		kfree(forget);
+		goto fallback;
+	}
+
+	if (!err && !outentry.nodeid)
+		err = -ENOENT;
+
+	if (err)
+		goto out_free_ff;
+
+	err = -EIO;
+	if (invalid_nodeid(outentry.nodeid) || fuse_invalid_attr(&outentry.attr))
+		goto out_free_ff;
+
+	ff->fh = outopen.fh;
+	ff->nodeid = outentry.nodeid;
+	ff->open_flags = outopen.open_flags;
+	inode = fuse_iget(dir->i_sb, outentry.nodeid, outentry.generation,
+			  &outentry.attr, entry_attr_timeout(&outentry), 0);
+	if (!inode) {
+		flags &= ~(O_CREAT | O_EXCL | O_TRUNC);
+		fuse_sync_release(NULL, ff, flags);
+		fuse_queue_forget(fm->fc, forget, outentry.nodeid, 1);
+		err = -ENOMEM;
+		goto out_err;
+	}
+
+	/* prevent racing/parallel lookup on a negative hashed */
+	if (!(flags & O_CREAT) && !d_in_lookup(entry)) {
+		d_drop(entry);
+		switched_entry = d_alloc_parallel(entry->d_parent,
+						   &entry->d_name, &wq);
+		if (IS_ERR(switched_entry)) {
+			err = PTR_ERR(switched_entry);
+			goto out_free_ff;
+		}
+
+		if (unlikely(!d_in_lookup(switched_entry))) {
+			/* fall back */
+			dput(switched_entry);
+			switched_entry = NULL;
+			goto free_and_fallback;
+		}
+
+		entry = switched_entry;
+	}
+
+	if (d_really_is_negative(entry)) {
+		d_drop(entry);
+		alias = d_exact_alias(entry, inode);
+		if (!alias) {
+			alias = d_splice_alias(inode, entry);
+			if (IS_ERR(alias)) {
+				/*
+				 * Close the file in user space, but do not unlink it,
+				 * if it was created - with network file systems other
+				 * clients might have already accessed it.
+				 */
+				fi = get_fuse_inode(inode);
+				fuse_sync_release(fi, ff, flags);
+				fuse_queue_forget(fm->fc, forget, outentry.nodeid, 1);
+				err = PTR_ERR(alias);
+				goto out_err;
+			}
+		}
+
+		if (alias)
+			entry = alias;
+	}
+
+	fuse_change_entry_timeout(entry, &outentry);
+
+	/*  File was indeed created */
+	if (outopen.open_flags & FOPEN_FILE_CREATED) {
+		if (!(flags & O_CREAT)) {
+			pr_debug("Server side bug, FOPEN_FILE_CREATED set "
+				 "without O_CREAT, ignoring.");
+		} else {
+			/* This should be always set when the file is created */
+			fuse_dir_changed(dir);
+			file->f_mode |= FMODE_CREATED;
+		}
+	}
+
+	if (S_ISDIR(mode))
+		ff->open_flags &= ~FOPEN_DIRECT_IO;
+	err = finish_open(file, entry, generic_file_open);
+	if (err) {
+		fi = get_fuse_inode(inode);
+		fuse_sync_release(fi, ff, flags);
+	} else {
+		file->private_data = ff;
+		fuse_finish_open(inode, file);
+	}
+
+	kfree(forget);
+
+	if (switched_entry) {
+		d_lookup_done(switched_entry);
+		dput(switched_entry);
+	}
+
+	dput(alias);
+
+	return err;
+
+out_free_ff:
+	fuse_file_free(ff);
+out_put_forget_req:
+	kfree(forget);
+out_err:
+	if (switched_entry) {
+		d_lookup_done(switched_entry);
+		dput(switched_entry);
+	}
+
+	return err;
+
+free_and_fallback:
+	fuse_file_free(ff);
+	kfree(forget);
+fallback:
+	return fuse_create_open(dir, entry, file, flags, mode);
+}
+
+static int fuse_atomic_open(struct inode *dir, struct dentry *entry,
+			    struct file *file, unsigned int flags,
+			    umode_t mode)
+{
+	struct fuse_conn *fc = get_fuse_conn(dir);
+
+	if (fc->no_open_atomic)
+		return fuse_create_open(dir, entry, file, flags, mode);
+	else
+		return _fuse_atomic_open(dir, entry, file, flags, mode);
+}
+
 /*
  * Code shared between mknod, mkdir, symlink and link
  */
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 9b7fc7d3c7f1..c838708cfa2b 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -672,6 +672,9 @@ struct fuse_conn {
 	/** Is open/release not implemented by fs? */
 	unsigned no_open:1;
 
+	/** Is open atomic not implemented by fs? */
+	unsigned no_open_atomic:1;
+
 	/** Is opendir/releasedir not implemented by fs? */
 	unsigned no_opendir:1;
 
diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
index b3fcab13fcd3..33fefee42697 100644
--- a/include/uapi/linux/fuse.h
+++ b/include/uapi/linux/fuse.h
@@ -315,6 +315,7 @@ struct fuse_file_lock {
  * FOPEN_STREAM: the file is stream-like (no file position at all)
  * FOPEN_NOFLUSH: don't flush data cache on close (unless FUSE_WRITEBACK_CACHE)
  * FOPEN_PARALLEL_DIRECT_WRITES: Allow concurrent direct writes on the same inode
+ * FOPEN_FILE_CREATED: the file was indeed created
  */
 #define FOPEN_DIRECT_IO		(1 << 0)
 #define FOPEN_KEEP_CACHE	(1 << 1)
@@ -323,6 +324,7 @@ struct fuse_file_lock {
 #define FOPEN_STREAM		(1 << 4)
 #define FOPEN_NOFLUSH		(1 << 5)
 #define FOPEN_PARALLEL_DIRECT_WRITES	(1 << 6)
+#define FOPEN_FILE_CREATED	(1 << 7)
 
 /**
  * INIT request/reply flags
@@ -575,6 +577,7 @@ enum fuse_opcode {
 	FUSE_REMOVEMAPPING	= 49,
 	FUSE_SYNCFS		= 50,
 	FUSE_TMPFILE		= 51,
+	FUSE_OPEN_ATOMIC	= 52,
 
 	/* CUSE specific operations */
 	CUSE_INIT		= 4096,
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v9 3/7] [RFC] Allow atomic_open() on positive dentry
  2023-09-20 17:34 [PATCH v9 0/7] fuse: full atomic open and atomic-open-revalidate Bernd Schubert
  2023-09-20 17:34 ` [PATCH v9 1/7] fuse: rename fuse_create_open Bernd Schubert
  2023-09-20 17:34 ` [PATCH v9 2/7] fuse: introduce atomic open Bernd Schubert
@ 2023-09-20 17:34 ` Bernd Schubert
  2023-09-21  6:20   ` Amir Goldstein
  2023-09-20 17:34 ` [PATCH v9 4/7] vfs: Optimize " Bernd Schubert
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 23+ messages in thread
From: Bernd Schubert @ 2023-09-20 17:34 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: bernd.schubert, miklos, dsingh, Bernd Schubert, Christian Brauner,
	Al Viro

From: Miklos Szeredi <miklos@szeredi.hu>

atomic_open() will do an open-by-name or create-and-open
depending on the flags.

If file was created, then the old positive dentry is obviously
stale, so it will be invalidated and a new one will be allocated.

If not created, then check whether it's the same inode (same as in
->d_revalidate()) and if not, invalidate & allocate new dentry.

Co-developed-by: Bernd Schubert <bschubert@ddn.com>
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Bernd Schubert <bschubert@ddn.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dharmendra Singh <dsingh@ddn.com>
Cc: linux-fsdevel@vger.kernel.org
---
 fs/namei.c            | 25 ++++++++++++++++++++-----
 include/linux/namei.h |  7 +++++++
 2 files changed, 27 insertions(+), 5 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index e56ff39a79bc..f01b278ac0ef 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -858,7 +858,7 @@ static inline int d_revalidate(struct dentry *dentry, unsigned int flags)
 	if (unlikely(dentry->d_flags & DCACHE_OP_REVALIDATE))
 		return dentry->d_op->d_revalidate(dentry, flags);
 	else
-		return 1;
+		return D_REVALIDATE_VALID;
 }
 
 /**
@@ -1611,10 +1611,11 @@ struct dentry *lookup_one_qstr_excl(const struct qstr *name,
 }
 EXPORT_SYMBOL(lookup_one_qstr_excl);
 
-static struct dentry *lookup_fast(struct nameidata *nd)
+static struct dentry *lookup_fast(struct nameidata *nd, bool *atomic_revalidate)
 {
 	struct dentry *dentry, *parent = nd->path.dentry;
 	int status = 1;
+	*atomic_revalidate = false;
 
 	/*
 	 * Rename seqlock is not required here because in the off chance
@@ -1656,6 +1657,10 @@ static struct dentry *lookup_fast(struct nameidata *nd)
 		dput(dentry);
 		return ERR_PTR(status);
 	}
+
+	if (status == D_REVALIDATE_ATOMIC)
+		*atomic_revalidate = true;
+
 	return dentry;
 }
 
@@ -1981,6 +1986,7 @@ static const char *handle_dots(struct nameidata *nd, int type)
 static const char *walk_component(struct nameidata *nd, int flags)
 {
 	struct dentry *dentry;
+	bool atomic_revalidate;
 	/*
 	 * "." and ".." are special - ".." especially so because it has
 	 * to be able to know about the current root directory and
@@ -1991,7 +1997,7 @@ static const char *walk_component(struct nameidata *nd, int flags)
 			put_link(nd);
 		return handle_dots(nd, nd->last_type);
 	}
-	dentry = lookup_fast(nd);
+	dentry = lookup_fast(nd, &atomic_revalidate);
 	if (IS_ERR(dentry))
 		return ERR_CAST(dentry);
 	if (unlikely(!dentry)) {
@@ -1999,6 +2005,9 @@ static const char *walk_component(struct nameidata *nd, int flags)
 		if (IS_ERR(dentry))
 			return ERR_CAST(dentry);
 	}
+
+	WARN_ON(atomic_revalidate);
+
 	if (!(flags & WALK_MORE) && nd->depth)
 		put_link(nd);
 	return step_into(nd, flags, dentry);
@@ -3430,7 +3439,7 @@ static struct dentry *lookup_open(struct nameidata *nd, struct file *file,
 		dput(dentry);
 		dentry = NULL;
 	}
-	if (dentry->d_inode) {
+	if (dentry->d_inode && error != D_REVALIDATE_ATOMIC) {
 		/* Cached positive dentry: will open in f_op->open */
 		return dentry;
 	}
@@ -3523,12 +3532,18 @@ static const char *open_last_lookups(struct nameidata *nd,
 	}
 
 	if (!(open_flag & O_CREAT)) {
+		bool atomic_revalidate;
+
 		if (nd->last.name[nd->last.len])
 			nd->flags |= LOOKUP_FOLLOW | LOOKUP_DIRECTORY;
 		/* we _can_ be in RCU mode here */
-		dentry = lookup_fast(nd);
+		dentry = lookup_fast(nd, &atomic_revalidate);
 		if (IS_ERR(dentry))
 			return ERR_CAST(dentry);
+		if (dentry && unlikely(atomic_revalidate)) {
+			dput(dentry);
+			dentry = NULL;
+		}
 		if (likely(dentry))
 			goto finish_lookup;
 
diff --git a/include/linux/namei.h b/include/linux/namei.h
index 1463cbda4888..a70e87d2b2a9 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -47,6 +47,13 @@ enum {LAST_NORM, LAST_ROOT, LAST_DOT, LAST_DOTDOT};
 /* LOOKUP_* flags which do scope-related checks based on the dirfd. */
 #define LOOKUP_IS_SCOPED (LOOKUP_BENEATH | LOOKUP_IN_ROOT)
 
+/* ->d_revalidate return codes */
+enum {
+	D_REVALIDATE_INVALID = 0, /* invalid dentry */
+	D_REVALIDATE_VALID   = 1, /* valid dentry */
+	D_REVALIDATE_ATOMIC =  2, /* atomic_open will revalidate */
+};
+
 extern int path_pts(struct path *path);
 
 extern int user_path_at_empty(int, const char __user *, unsigned, struct path *, int *empty);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v9 4/7] vfs: Optimize atomic_open() on positive dentry
  2023-09-20 17:34 [PATCH v9 0/7] fuse: full atomic open and atomic-open-revalidate Bernd Schubert
                   ` (2 preceding siblings ...)
  2023-09-20 17:34 ` [PATCH v9 3/7] [RFC] Allow atomic_open() on positive dentry Bernd Schubert
@ 2023-09-20 17:34 ` Bernd Schubert
  2023-09-21  6:16   ` Amir Goldstein
  2023-10-11  7:17   ` Dan Carpenter
  2023-09-20 17:34 ` [PATCH v9 5/7] fuse: Revalidate positive entries in fuse_atomic_open Bernd Schubert
                   ` (3 subsequent siblings)
  7 siblings, 2 replies; 23+ messages in thread
From: Bernd Schubert @ 2023-09-20 17:34 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: bernd.schubert, miklos, dsingh, Bernd Schubert, Christian Brauner,
	Al Viro

This was suggested by Miklos based on review from the previous
patch that introduced atomic open for positive dentries.
In open_last_lookups() the dentry was not used anymore when atomic
revalidate was available, which required to release the dentry,
then code fall through to lookup_open was done, which resulted
in another d_lookup and also d_revalidate. All of that can
be avoided by the new atomic_revalidate_open() function.

Another included change is the introduction of an enum as
d_revalidate return code.

Signed-off-by: Bernd Schubert <bschubert@ddn.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Dharmendra Singh <dsingh@ddn.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
---
 fs/namei.c | 45 +++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 43 insertions(+), 2 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index f01b278ac0ef..8ad7a0dfa596 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3388,6 +3388,44 @@ static struct dentry *atomic_open(struct nameidata *nd, struct dentry *dentry,
 	return dentry;
 }
 
+static struct dentry *atomic_revalidate_open(struct dentry *dentry,
+					     struct nameidata *nd,
+					     struct file *file,
+					     const struct open_flags *op,
+					     bool *got_write)
+{
+	struct mnt_idmap *idmap;
+	struct dentry *dir = nd->path.dentry;
+	struct inode *dir_inode = dir->d_inode;
+	int open_flag = op->open_flag;
+	umode_t mode = op->mode;
+
+	if (unlikely(IS_DEADDIR(dir_inode)))
+		return ERR_PTR(-ENOENT);
+
+	file->f_mode &= ~FMODE_CREATED;
+
+	if (unlikely(open_flag & O_CREAT)) {
+		WARN_ON(1);
+		return ERR_PTR(-EINVAL);
+	}
+
+	if (open_flag & (O_TRUNC | O_WRONLY | O_RDWR))
+		*got_write = !mnt_want_write(nd->path.mnt);
+	else
+		*got_write = false;
+
+	if (!got_write)
+		open_flag &= ~O_TRUNC;
+
+	inode_lock_shared(dir->d_inode);
+	dentry = atomic_open(nd, dentry, file, open_flag, mode);
+	inode_unlock_shared(dir->d_inode);
+
+	return dentry;
+
+}
+
 /*
  * Look up and maybe create and open the last component.
  *
@@ -3541,8 +3579,10 @@ static const char *open_last_lookups(struct nameidata *nd,
 		if (IS_ERR(dentry))
 			return ERR_CAST(dentry);
 		if (dentry && unlikely(atomic_revalidate)) {
-			dput(dentry);
-			dentry = NULL;
+			BUG_ON(nd->flags & LOOKUP_RCU);
+			dentry = atomic_revalidate_open(dentry, nd, file, op,
+							&got_write);
+			goto drop_write;
 		}
 		if (likely(dentry))
 			goto finish_lookup;
@@ -3580,6 +3620,7 @@ static const char *open_last_lookups(struct nameidata *nd,
 	else
 		inode_unlock_shared(dir->d_inode);
 
+drop_write:
 	if (got_write)
 		mnt_drop_write(nd->path.mnt);
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v9 5/7] fuse: Revalidate positive entries in fuse_atomic_open
  2023-09-20 17:34 [PATCH v9 0/7] fuse: full atomic open and atomic-open-revalidate Bernd Schubert
                   ` (3 preceding siblings ...)
  2023-09-20 17:34 ` [PATCH v9 4/7] vfs: Optimize " Bernd Schubert
@ 2023-09-20 17:34 ` Bernd Schubert
  2023-09-20 17:34 ` [PATCH v9 6/7] fuse: Return D_REVALIDATE_ATOMIC for cached dentries Bernd Schubert
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 23+ messages in thread
From: Bernd Schubert @ 2023-09-20 17:34 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: bernd.schubert, miklos, dsingh, Bernd Schubert, Horst Birthelmer,
	Christian Brauner, Al Viro

From: Dharmendra Singh <dsingh@ddn.com>

This makes use of the vfs changes and fuse_dentry_revalidate()
can now skip revalidate, if the fuse implementation has
atomic_open support, which will has to do the dentry
revalidation.

Skipping revalidate is only possible when we absolutely
know that the implementation supports atomic_open, so
another bit had to be added to struct fuse_conn, which
is set when atomic_open was successful.

Once struct fuse_conn has the positive 'has_open_atomic'
fuse_dentry_revalidate() might set DCACHE_ATOMIC_OPEN.
vfs use that flag to use atomic_open.

If the file was newly created, the previous positive dentry
is invalidated and a new dentry and inode are allocated
and linked (d_splice_alias).

If file was not created, we revalidate the inode. If inode is
stale, current inode is marked as bad. And new inode is allocated
and linked to new dentry(old dentry invalidated). In case of
inode attributes differing with fresh attr, we allocate new
dentry and hook current inode to it and open the file.

For negative dentry, FS just allocate new inode and hook it onto
passed entry from VFS and open the file.

Co-developed-by: Bernd Schubert <bschubert@ddn.com>
Signed-off-by: Dharmendra Singh <dsingh@ddn.com>
Signed-off-by: Horst Birthelmer <hbirthelmer@ddn.com>
Signed-off-by: Bernd Schubert <bschubert@ddn.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dharmendra Singh <dsingh@ddn.com>
Cc: linux-fsdevel@vger.kernel.org
---
 fs/fuse/dir.c    | 203 ++++++++++++++++++++++++++++++++++++++++-------
 fs/fuse/fuse_i.h |   3 +
 2 files changed, 176 insertions(+), 30 deletions(-)

diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index 4cb2809a852d..aefd783c7552 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -230,6 +230,19 @@ static int fuse_dentry_revalidate(struct dentry *entry, unsigned int flags)
 
 		fm = get_fuse_mount(inode);
 
+		/* If open atomic is supported by FUSE then use this opportunity
+		 * to avoid this lookup and combine lookup + open into a single call.
+		 *
+		 * Note: Fuse detects open atomic implementation automatically.
+		 * Therefore first few call would go into open atomic code path
+		 * , detects that open atomic is implemented or not by setting
+		 * fc->no_open_atomic. In case open atomic is not implemented,
+		 * calls fall back to non-atomic open.
+		 */
+		if (fm->fc->has_open_atomic && flags & LOOKUP_OPEN) {
+			ret = D_REVALIDATE_ATOMIC;
+			goto out;
+		}
 		forget = fuse_alloc_forget();
 		ret = -ENOMEM;
 		if (!forget)
@@ -280,12 +293,12 @@ static int fuse_dentry_revalidate(struct dentry *entry, unsigned int flags)
 			dput(parent);
 		}
 	}
-	ret = 1;
+	ret = D_REVALIDATE_VALID;
 out:
 	return ret;
 
 invalid:
-	ret = 0;
+	ret = D_REVALIDATE_INVALID;
 	goto out;
 }
 
@@ -768,12 +781,84 @@ static int fuse_create_open(struct inode *dir, struct dentry *entry,
 	return finish_no_open(file, res);
 }
 
+/**
+ * Revalidate inode hooked into dentry against freshly acquired
+ * attributes. If inode is stale then allocate new dentry and
+ * hook it onto fresh inode.
+ */
+static struct dentry *
+fuse_atomic_open_revalidate(struct fuse_conn *fc, struct dentry *entry,
+			    struct inode *inode, int switched,
+			    struct fuse_entry_out *outentry,
+			    wait_queue_head_t *wq, int *alloc_inode)
+{
+	u64 attr_version;
+	struct dentry *prev = entry;
+
+	if (outentry->nodeid != get_node_id(inode) ||
+	    (bool) IS_AUTOMOUNT(inode) !=
+	    (bool) (outentry->attr.flags & FUSE_ATTR_SUBMOUNT)) {
+		*alloc_inode = 1;
+	} else if (fuse_stale_inode(inode, outentry->generation,
+				  &outentry->attr)) {
+		fuse_make_bad(inode);
+		*alloc_inode = 1;
+	}
+
+	if (*alloc_inode) {
+		struct dentry *new = NULL;
+
+		if (!switched && !d_in_lookup(entry)) {
+			d_drop(entry);
+			new = d_alloc_parallel(entry->d_parent, &entry->d_name,
+					       wq);
+			if (IS_ERR(new))
+				return new;
+
+			if (unlikely(!d_in_lookup(new))) {
+				dput(new);
+				new = ERR_PTR(-EIO);
+				return new;
+			}
+		}
+
+		fuse_invalidate_entry(entry);
+
+		entry = new;
+	} else if (!*alloc_inode) {
+		attr_version = fuse_get_attr_version(fc);
+		forget_all_cached_acls(inode);
+		fuse_change_attributes(inode, &outentry->attr,
+				       entry_attr_timeout(outentry),
+				       attr_version);
+	}
+
+	if (prev == entry) {
+		/* nothing changed, atomic-open on the server side
+		 * had increased the lookup count - do the same here
+		 */
+		struct fuse_inode *fi = get_fuse_inode(inode);
+
+		spin_lock(&fi->lock);
+		fi->nlookup++;
+		spin_unlock(&fi->lock);
+	}
+
+	return entry;
+}
+
+/**
+ * Does 'lookup + create + open' or 'lookup + open' atomically.
+ * @entry might be positive as well, therefore inode is re-validated.
+ * Positive dentry is invalidated in case inode attributes differ or
+ * we encountered error.
+ */
 static int _fuse_atomic_open(struct inode *dir, struct dentry *entry,
 			     struct file *file, unsigned flags,
 			     umode_t mode)
 {
 	int err;
-	struct inode *inode;
+	struct inode *inode = d_inode(entry);
 	struct fuse_mount *fm = get_fuse_mount(dir);
 	struct fuse_conn *fc = fm->fc;
 	FUSE_ARGS(args);
@@ -785,10 +870,7 @@ static int _fuse_atomic_open(struct inode *dir, struct dentry *entry,
 	struct fuse_file *ff;
 	struct dentry *switched_entry = NULL, *alias = NULL;
 	DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
-
-	/* Expect a negative dentry */
-	if (unlikely(d_inode(entry)))
-		goto fallback;
+	int alloc_inode = 0;
 
 	/* Userspace expects S_IFREG in create mode */
 	if ((flags & O_CREAT) && (mode & S_IFMT) != S_IFREG)
@@ -840,37 +922,56 @@ static int _fuse_atomic_open(struct inode *dir, struct dentry *entry,
 
 	err = fuse_simple_request(fm, &args);
 	free_ext_value(&args);
-	if (err == -ENOSYS) {
-		fc->no_open_atomic = 1;
-		fuse_file_free(ff);
-		kfree(forget);
-		goto fallback;
-	}
 
 	if (!err && !outentry.nodeid)
 		err = -ENOENT;
 
-	if (err)
-		goto out_free_ff;
+	if (err) {
+		if (err == -ENOSYS) {
+			fc->no_open_atomic = 1;
+
+			/* Might come up if userspace tricks us and would
+			 * return -ENOSYS for OPEN_ATOMIC after it was
+			 * aready working
+			 */
+			if (unlikely(fc->has_open_atomic == 1))
+				pr_info("fuse server/daemon bug, atomic open "
+					"got -ENOSYS although it was already "
+					"succeeding before.");
+
+			/* This should better never happen, revalidate
+			 * is missing for this entry*/
+			if (d_really_is_positive(entry)) {
+				WARN_ON(1);
+				err = -EIO;
+				goto out_free_ff;
+			}
+
+			fuse_file_free(ff);
+			kfree(forget);
+			goto fallback;
+		} else {
+			if (d_really_is_positive(entry)) {
+				if (err != -EINTR && err != -ENOMEM)
+					fuse_invalidate_entry(entry);
+			}
+
+			goto out_free_ff;
+		}
+	}
+
+	if (!err && !fc->has_open_atomic) {
+		/* Only set this flag when atomic open did not return an error,
+		 * so that we are absolutely sure it is implemented.
+		 */
+		fc->has_open_atomic = 1;
+	}
 
 	err = -EIO;
 	if (invalid_nodeid(outentry.nodeid) || fuse_invalid_attr(&outentry.attr))
 		goto out_free_ff;
 
-	ff->fh = outopen.fh;
-	ff->nodeid = outentry.nodeid;
-	ff->open_flags = outopen.open_flags;
-	inode = fuse_iget(dir->i_sb, outentry.nodeid, outentry.generation,
-			  &outentry.attr, entry_attr_timeout(&outentry), 0);
-	if (!inode) {
-		flags &= ~(O_CREAT | O_EXCL | O_TRUNC);
-		fuse_sync_release(NULL, ff, flags);
-		fuse_queue_forget(fm->fc, forget, outentry.nodeid, 1);
-		err = -ENOMEM;
-		goto out_err;
-	}
-
-	/* prevent racing/parallel lookup on a negative hashed */
+	/* prevent racing/parallel lookup */
 	if (!(flags & O_CREAT) && !d_in_lookup(entry)) {
 		d_drop(entry);
 		switched_entry = d_alloc_parallel(entry->d_parent,
@@ -884,10 +985,52 @@ static int _fuse_atomic_open(struct inode *dir, struct dentry *entry,
 			/* fall back */
 			dput(switched_entry);
 			switched_entry = NULL;
-			goto free_and_fallback;
+
+			if (!inode) {
+				goto free_and_fallback;
+			} else {
+				/* XXX can this happen at all and is there a
+				 * better way to handle it?
+				 */
+				err = -EIO;
+				goto out_free_ff;
+			}
+		}
+	}
+
+	if (inode) {
+		struct dentry *new;
+
+		err = -ESTALE;
+		new = fuse_atomic_open_revalidate(fm->fc, entry, inode,
+						  !!switched_entry,
+						  &outentry, &wq, &alloc_inode);
+		if (IS_ERR(new)) {
+			err = PTR_ERR(new);
+			goto out_free_ff;
 		}
 
+		if (new != entry && new != NULL)
+			switched_entry = new;
+	}
+
+	if (switched_entry)
 		entry = switched_entry;
+
+	ff->fh = outopen.fh;
+	ff->nodeid = outentry.nodeid;
+	ff->open_flags = outopen.open_flags;
+
+	if (!inode || alloc_inode) {
+		inode = fuse_iget(dir->i_sb, outentry.nodeid, outentry.generation,
+				  &outentry.attr, entry_attr_timeout(&outentry), 0);
+		if (!inode) {
+			flags &= ~(O_CREAT | O_EXCL | O_TRUNC);
+			fuse_sync_release(NULL, ff, flags);
+			fuse_queue_forget(fm->fc, forget, outentry.nodeid, 1);
+			err = -ENOMEM;
+			goto out_err;
+		}
 	}
 
 	if (d_really_is_negative(entry)) {
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index c838708cfa2b..3987046682b1 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -675,6 +675,9 @@ struct fuse_conn {
 	/** Is open atomic not implemented by fs? */
 	unsigned no_open_atomic:1;
 
+	/** Is open atomic is proven to be implemented by fs? */
+	unsigned has_open_atomic:1;
+
 	/** Is opendir/releasedir not implemented by fs? */
 	unsigned no_opendir:1;
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v9 6/7] fuse: Return D_REVALIDATE_ATOMIC for cached dentries
  2023-09-20 17:34 [PATCH v9 0/7] fuse: full atomic open and atomic-open-revalidate Bernd Schubert
                   ` (4 preceding siblings ...)
  2023-09-20 17:34 ` [PATCH v9 5/7] fuse: Revalidate positive entries in fuse_atomic_open Bernd Schubert
@ 2023-09-20 17:34 ` Bernd Schubert
  2023-09-20 17:34 ` [PATCH v9 7/7] fuse: Avoid code duplication in atomic open Bernd Schubert
  2023-09-21  9:33 ` [PATCH v9 0/7] fuse: full atomic open and atomic-open-revalidate Amir Goldstein
  7 siblings, 0 replies; 23+ messages in thread
From: Bernd Schubert @ 2023-09-20 17:34 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: bernd.schubert, miklos, dsingh, Bernd Schubert, Horst Birthelmer

Cached dentries do not get revalidate, but open would still result in
open + getattr, instead of one atomic_open call only.

libfuse logs (passthrough_hp):

Unpatched:
----------
unique: 22, opcode: OPEN (14), nodeid: 140698229673544, insize: 48, pid: 3434
   unique: 22, success, outsize: 32
unique: 24, opcode: GETATTR (3), nodeid: 140698229673544, insize: 56, pid: 3434
   unique: 24, success, outsize: 120
unique: 26, opcode: FLUSH (25), nodeid: 140698229673544, insize: 64, pid: 3434
   unique: 26, success, outsize: 16
unique: 28, opcode: RELEASE (18), nodeid: 140698229673544, insize: 64, pid: 0
   unique: 28, success, outsize: 16

Patched:
----------
unique: 20, opcode: OPEN_ATOMIC (52), nodeid: 1, insize: 63, pid: 3397
   unique: 20, success, outsize: 160
unique: 22, opcode: FLUSH (25), nodeid: 140024188243528, insize: 64, pid: 3397
   unique: 22, success, outsize: 16
unique: 24, opcode: RELEASE (18), nodeid: 140024188243528, insize: 64, pid: 0
   unique: 24, success, outsize: 16

Signed-off-by: Bernd Schubert <bschubert@ddn.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Dharmendra Singh <dsingh@ddn.com>
Cc: Horst Birthelmer <hbirthelmer@ddn.com>
Cc: linux-fsdevel@vger.kernel.org
---
 fs/fuse/dir.c | 49 +++++++++++++++++++++++++++++++++++++------------
 1 file changed, 37 insertions(+), 12 deletions(-)

diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index aefd783c7552..e2c397e6e4bf 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -193,6 +193,22 @@ static void fuse_lookup_init(struct fuse_conn *fc, struct fuse_args *args,
 	args->out_args[0].value = outarg;
 }
 
+/*
+ * If open atomic is supported by FUSE then use this opportunity
+ * to avoid this lookup and combine lookup + open into a single call.
+ */
+static int fuse_dentry_do_atomic_revalidate(struct dentry *entry,
+					     unsigned int flags,
+					     struct fuse_conn *fc)
+{
+	int ret = 0;
+
+	if (flags & LOOKUP_OPEN && fc->has_open_atomic)
+		ret = D_REVALIDATE_ATOMIC;
+
+	return ret;
+}
+
 /*
  * Check whether the dentry is still valid
  *
@@ -230,19 +246,10 @@ static int fuse_dentry_revalidate(struct dentry *entry, unsigned int flags)
 
 		fm = get_fuse_mount(inode);
 
-		/* If open atomic is supported by FUSE then use this opportunity
-		 * to avoid this lookup and combine lookup + open into a single call.
-		 *
-		 * Note: Fuse detects open atomic implementation automatically.
-		 * Therefore first few call would go into open atomic code path
-		 * , detects that open atomic is implemented or not by setting
-		 * fc->no_open_atomic. In case open atomic is not implemented,
-		 * calls fall back to non-atomic open.
-		 */
-		if (fm->fc->has_open_atomic && flags & LOOKUP_OPEN) {
-			ret = D_REVALIDATE_ATOMIC;
+		ret = fuse_dentry_do_atomic_revalidate(entry, flags, fm->fc);
+		if (ret)
 			goto out;
-		}
+
 		forget = fuse_alloc_forget();
 		ret = -ENOMEM;
 		if (!forget)
@@ -285,6 +292,16 @@ static int fuse_dentry_revalidate(struct dentry *entry, unsigned int flags)
 	} else if (inode) {
 		fi = get_fuse_inode(inode);
 		if (flags & LOOKUP_RCU) {
+			fm = get_fuse_mount(inode);
+			if (fm->fc->has_open_atomic) {
+				/* Atomic open is preferred, as it does entry
+				 * revalidate and attribute refresh, but
+				 * DCACHE_ATOMIC_OPEN cannot be set in RCU mode
+				 */
+				if (flags & LOOKUP_OPEN)
+					return -ECHILD;
+			}
+
 			if (test_bit(FUSE_I_INIT_RDPLUS, &fi->state))
 				return -ECHILD;
 		} else if (test_and_clear_bit(FUSE_I_INIT_RDPLUS, &fi->state)) {
@@ -292,6 +309,14 @@ static int fuse_dentry_revalidate(struct dentry *entry, unsigned int flags)
 			fuse_advise_use_readdirplus(d_inode(parent));
 			dput(parent);
 		}
+
+		/* revalidate is skipped, but we still want atomic open to
+		 * update attributes during open
+		 */
+		fm = get_fuse_mount(inode);
+		ret = fuse_dentry_do_atomic_revalidate(entry, flags, fm->fc);
+		if (ret)
+			goto out;
 	}
 	ret = D_REVALIDATE_VALID;
 out:
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v9 7/7] fuse: Avoid code duplication in atomic open
  2023-09-20 17:34 [PATCH v9 0/7] fuse: full atomic open and atomic-open-revalidate Bernd Schubert
                   ` (5 preceding siblings ...)
  2023-09-20 17:34 ` [PATCH v9 6/7] fuse: Return D_REVALIDATE_ATOMIC for cached dentries Bernd Schubert
@ 2023-09-20 17:34 ` Bernd Schubert
  2023-09-21  9:33 ` [PATCH v9 0/7] fuse: full atomic open and atomic-open-revalidate Amir Goldstein
  7 siblings, 0 replies; 23+ messages in thread
From: Bernd Schubert @ 2023-09-20 17:34 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: bernd.schubert, miklos, dsingh, Bernd Schubert, Horst Birthelmer

The same code was used in fuse_atomic_open_revalidate()
_fuse_atomic_open().

(If preferred, this could be merged into the main fuse atomic
revalidate patch). Or adding the function could be moved up
in the series.

Signed-off-by: Bernd Schubert <bschubert@ddn.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Dharmendra Singh <dsingh@ddn.com>
Cc: Horst Birthelmer <hbirthelmer@ddn.com>
Cc: linux-fsdevel@vger.kernel.org
---
 fs/fuse/dir.c | 48 +++++++++++++++++++++++-------------------------
 1 file changed, 23 insertions(+), 25 deletions(-)

diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index e2c397e6e4bf..4e285af2f17f 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -806,6 +806,25 @@ static int fuse_create_open(struct inode *dir, struct dentry *entry,
 	return finish_no_open(file, res);
 }
 
+static struct dentry *fuse_atomic_open_alloc_dentry(struct dentry *entry,
+						    wait_queue_head_t *wq)
+{
+	struct dentry *new;
+
+	d_drop(entry);
+	new = d_alloc_parallel(entry->d_parent, &entry->d_name,
+			       wq);
+	if (IS_ERR(new))
+		return new;
+
+	/* XXX Can this happen at all and there a way to handle it? */
+	if (unlikely(!d_in_lookup(new))) {
+		dput(new);
+		new = ERR_PTR(-EIO);
+	}
+	return new;
+}
+
 /**
  * Revalidate inode hooked into dentry against freshly acquired
  * attributes. If inode is stale then allocate new dentry and
@@ -834,17 +853,9 @@ fuse_atomic_open_revalidate(struct fuse_conn *fc, struct dentry *entry,
 		struct dentry *new = NULL;
 
 		if (!switched && !d_in_lookup(entry)) {
-			d_drop(entry);
-			new = d_alloc_parallel(entry->d_parent, &entry->d_name,
-					       wq);
+			new = fuse_atomic_open_alloc_dentry(entry, wq);
 			if (IS_ERR(new))
 				return new;
-
-			if (unlikely(!d_in_lookup(new))) {
-				dput(new);
-				new = ERR_PTR(-EIO);
-				return new;
-			}
 		}
 
 		fuse_invalidate_entry(entry);
@@ -998,26 +1009,13 @@ static int _fuse_atomic_open(struct inode *dir, struct dentry *entry,
 
 	/* prevent racing/parallel lookup */
 	if (!(flags & O_CREAT) && !d_in_lookup(entry)) {
-		d_drop(entry);
-		switched_entry = d_alloc_parallel(entry->d_parent,
-						   &entry->d_name, &wq);
+		switched_entry = fuse_atomic_open_alloc_dentry(entry, &wq);
 		if (IS_ERR(switched_entry)) {
-			err = PTR_ERR(switched_entry);
-			goto out_free_ff;
-		}
-
-		if (unlikely(!d_in_lookup(switched_entry))) {
-			/* fall back */
-			dput(switched_entry);
-			switched_entry = NULL;
-
 			if (!inode) {
 				goto free_and_fallback;
 			} else {
-				/* XXX can this happen at all and is there a
-				 * better way to handle it?
-				 */
-				err = -EIO;
+				/* XXX Is there a better way to handle it? */
+				err = PTR_ERR(switched_entry);
 				goto out_free_ff;
 			}
 		}
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v9 4/7] vfs: Optimize atomic_open() on positive dentry
  2023-09-20 17:34 ` [PATCH v9 4/7] vfs: Optimize " Bernd Schubert
@ 2023-09-21  6:16   ` Amir Goldstein
  2023-09-21  8:09     ` Bernd Schubert
  2023-10-11  7:17   ` Dan Carpenter
  1 sibling, 1 reply; 23+ messages in thread
From: Amir Goldstein @ 2023-09-21  6:16 UTC (permalink / raw)
  To: Bernd Schubert
  Cc: linux-fsdevel, bernd.schubert, miklos, dsingh, Christian Brauner,
	Al Viro

On Thu, Sep 21, 2023 at 12:48 AM Bernd Schubert <bschubert@ddn.com> wrote:
>
> This was suggested by Miklos based on review from the previous
> patch that introduced atomic open for positive dentries.
> In open_last_lookups() the dentry was not used anymore when atomic
> revalidate was available, which required to release the dentry,
> then code fall through to lookup_open was done, which resulted
> in another d_lookup and also d_revalidate. All of that can
> be avoided by the new atomic_revalidate_open() function.
>
> Another included change is the introduction of an enum as
> d_revalidate return code.
>
> Signed-off-by: Bernd Schubert <bschubert@ddn.com>
> Cc: Miklos Szeredi <miklos@szeredi.hu>
> Cc: Dharmendra Singh <dsingh@ddn.com>
> Cc: Christian Brauner <brauner@kernel.org>
> Cc: Al Viro <viro@zeniv.linux.org.uk>
> Cc: linux-fsdevel@vger.kernel.org
> ---
>  fs/namei.c | 45 +++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 43 insertions(+), 2 deletions(-)
>
> diff --git a/fs/namei.c b/fs/namei.c
> index f01b278ac0ef..8ad7a0dfa596 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -3388,6 +3388,44 @@ static struct dentry *atomic_open(struct nameidata *nd, struct dentry *dentry,
>         return dentry;
>  }
>
> +static struct dentry *atomic_revalidate_open(struct dentry *dentry,
> +                                            struct nameidata *nd,
> +                                            struct file *file,
> +                                            const struct open_flags *op,
> +                                            bool *got_write)
> +{
> +       struct mnt_idmap *idmap;
> +       struct dentry *dir = nd->path.dentry;
> +       struct inode *dir_inode = dir->d_inode;
> +       int open_flag = op->open_flag;
> +       umode_t mode = op->mode;
> +
> +       if (unlikely(IS_DEADDIR(dir_inode)))
> +               return ERR_PTR(-ENOENT);
> +
> +       file->f_mode &= ~FMODE_CREATED;
> +
> +       if (unlikely(open_flag & O_CREAT)) {
> +               WARN_ON(1);

      if (WARN_ON_ONCE(open_flag & O_CREAT)) {

is more compact and produces a nicer print

> +               return ERR_PTR(-EINVAL);
> +       }
> +
> +       if (open_flag & (O_TRUNC | O_WRONLY | O_RDWR))
> +               *got_write = !mnt_want_write(nd->path.mnt);
> +       else
> +               *got_write = false;
> +
> +       if (!got_write)
> +               open_flag &= ~O_TRUNC;
> +
> +       inode_lock_shared(dir->d_inode);
> +       dentry = atomic_open(nd, dentry, file, open_flag, mode);
> +       inode_unlock_shared(dir->d_inode);
> +
> +       return dentry;
> +
> +}
> +
>  /*
>   * Look up and maybe create and open the last component.
>   *
> @@ -3541,8 +3579,10 @@ static const char *open_last_lookups(struct nameidata *nd,
>                 if (IS_ERR(dentry))
>                         return ERR_CAST(dentry);
>                 if (dentry && unlikely(atomic_revalidate)) {
> -                       dput(dentry);
> -                       dentry = NULL;
> +                       BUG_ON(nd->flags & LOOKUP_RCU);

The assertion means that someone wrote bad code
it does not mean that the kernel internal state is hopelessly corrupted.
Please avoid BUG_ON and use WARN_ON_ONCE and if possible
(seems to be the case here) also take the graceful error path.
It's better to fail an open than to crash the kernel.

Thanks,
Amir.

> +                       dentry = atomic_revalidate_open(dentry, nd, file, op,
> +                                                       &got_write);
> +                       goto drop_write;
>                 }
>                 if (likely(dentry))
>                         goto finish_lookup;
> @@ -3580,6 +3620,7 @@ static const char *open_last_lookups(struct nameidata *nd,
>         else
>                 inode_unlock_shared(dir->d_inode);
>
> +drop_write:
>         if (got_write)
>                 mnt_drop_write(nd->path.mnt);
>
> --
> 2.39.2
>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v9 3/7] [RFC] Allow atomic_open() on positive dentry
  2023-09-20 17:34 ` [PATCH v9 3/7] [RFC] Allow atomic_open() on positive dentry Bernd Schubert
@ 2023-09-21  6:20   ` Amir Goldstein
  0 siblings, 0 replies; 23+ messages in thread
From: Amir Goldstein @ 2023-09-21  6:20 UTC (permalink / raw)
  To: Bernd Schubert
  Cc: linux-fsdevel, bernd.schubert, miklos, dsingh, Christian Brauner,
	Al Viro

On Wed, Sep 20, 2023 at 8:51 PM Bernd Schubert <bschubert@ddn.com> wrote:
>
> From: Miklos Szeredi <miklos@szeredi.hu>
>
> atomic_open() will do an open-by-name or create-and-open
> depending on the flags.
>
> If file was created, then the old positive dentry is obviously
> stale, so it will be invalidated and a new one will be allocated.
>
> If not created, then check whether it's the same inode (same as in
> ->d_revalidate()) and if not, invalidate & allocate new dentry.
>
> Co-developed-by: Bernd Schubert <bschubert@ddn.com>
> Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
> Signed-off-by: Bernd Schubert <bschubert@ddn.com>
> Cc: Christian Brauner <brauner@kernel.org>
> Cc: Al Viro <viro@zeniv.linux.org.uk>
> Cc: Dharmendra Singh <dsingh@ddn.com>
> Cc: linux-fsdevel@vger.kernel.org
> ---
>  fs/namei.c            | 25 ++++++++++++++++++++-----
>  include/linux/namei.h |  7 +++++++
>  2 files changed, 27 insertions(+), 5 deletions(-)
>
> diff --git a/fs/namei.c b/fs/namei.c
> index e56ff39a79bc..f01b278ac0ef 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -858,7 +858,7 @@ static inline int d_revalidate(struct dentry *dentry, unsigned int flags)
>         if (unlikely(dentry->d_flags & DCACHE_OP_REVALIDATE))
>                 return dentry->d_op->d_revalidate(dentry, flags);
>         else
> -               return 1;
> +               return D_REVALIDATE_VALID;
>  }
>
>  /**
> @@ -1611,10 +1611,11 @@ struct dentry *lookup_one_qstr_excl(const struct qstr *name,
>  }
>  EXPORT_SYMBOL(lookup_one_qstr_excl);
>
> -static struct dentry *lookup_fast(struct nameidata *nd)
> +static struct dentry *lookup_fast(struct nameidata *nd, bool *atomic_revalidate)
>  {
>         struct dentry *dentry, *parent = nd->path.dentry;
>         int status = 1;
> +       *atomic_revalidate = false;
>
>         /*
>          * Rename seqlock is not required here because in the off chance
> @@ -1656,6 +1657,10 @@ static struct dentry *lookup_fast(struct nameidata *nd)
>                 dput(dentry);
>                 return ERR_PTR(status);
>         }
> +
> +       if (status == D_REVALIDATE_ATOMIC)
> +               *atomic_revalidate = true;
> +
>         return dentry;
>  }
>
> @@ -1981,6 +1986,7 @@ static const char *handle_dots(struct nameidata *nd, int type)
>  static const char *walk_component(struct nameidata *nd, int flags)
>  {
>         struct dentry *dentry;
> +       bool atomic_revalidate;
>         /*
>          * "." and ".." are special - ".." especially so because it has
>          * to be able to know about the current root directory and
> @@ -1991,7 +1997,7 @@ static const char *walk_component(struct nameidata *nd, int flags)
>                         put_link(nd);
>                 return handle_dots(nd, nd->last_type);
>         }
> -       dentry = lookup_fast(nd);
> +       dentry = lookup_fast(nd, &atomic_revalidate);
>         if (IS_ERR(dentry))
>                 return ERR_CAST(dentry);
>         if (unlikely(!dentry)) {
> @@ -1999,6 +2005,9 @@ static const char *walk_component(struct nameidata *nd, int flags)
>                 if (IS_ERR(dentry))
>                         return ERR_CAST(dentry);
>         }
> +
> +       WARN_ON(atomic_revalidate);
> +

WARN_ON_ONCE should be enough in most cases
to signal bad code without spamming the log.

Is there an error path that can be taken in this case?

Thanks,
Amir.

>         if (!(flags & WALK_MORE) && nd->depth)
>                 put_link(nd);
>         return step_into(nd, flags, dentry);
> @@ -3430,7 +3439,7 @@ static struct dentry *lookup_open(struct nameidata *nd, struct file *file,
>                 dput(dentry);
>                 dentry = NULL;
>         }
> -       if (dentry->d_inode) {
> +       if (dentry->d_inode && error != D_REVALIDATE_ATOMIC) {
>                 /* Cached positive dentry: will open in f_op->open */
>                 return dentry;
>         }
> @@ -3523,12 +3532,18 @@ static const char *open_last_lookups(struct nameidata *nd,
>         }
>
>         if (!(open_flag & O_CREAT)) {
> +               bool atomic_revalidate;
> +
>                 if (nd->last.name[nd->last.len])
>                         nd->flags |= LOOKUP_FOLLOW | LOOKUP_DIRECTORY;
>                 /* we _can_ be in RCU mode here */
> -               dentry = lookup_fast(nd);
> +               dentry = lookup_fast(nd, &atomic_revalidate);
>                 if (IS_ERR(dentry))
>                         return ERR_CAST(dentry);
> +               if (dentry && unlikely(atomic_revalidate)) {
> +                       dput(dentry);
> +                       dentry = NULL;
> +               }
>                 if (likely(dentry))
>                         goto finish_lookup;
>
> diff --git a/include/linux/namei.h b/include/linux/namei.h
> index 1463cbda4888..a70e87d2b2a9 100644
> --- a/include/linux/namei.h
> +++ b/include/linux/namei.h
> @@ -47,6 +47,13 @@ enum {LAST_NORM, LAST_ROOT, LAST_DOT, LAST_DOTDOT};
>  /* LOOKUP_* flags which do scope-related checks based on the dirfd. */
>  #define LOOKUP_IS_SCOPED (LOOKUP_BENEATH | LOOKUP_IN_ROOT)
>
> +/* ->d_revalidate return codes */
> +enum {
> +       D_REVALIDATE_INVALID = 0, /* invalid dentry */
> +       D_REVALIDATE_VALID   = 1, /* valid dentry */
> +       D_REVALIDATE_ATOMIC =  2, /* atomic_open will revalidate */
> +};
> +
>  extern int path_pts(struct path *path);
>
>  extern int user_path_at_empty(int, const char __user *, unsigned, struct path *, int *empty);
> --
> 2.39.2
>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v9 4/7] vfs: Optimize atomic_open() on positive dentry
  2023-09-21  6:16   ` Amir Goldstein
@ 2023-09-21  8:09     ` Bernd Schubert
  2023-09-21  8:29       ` Amir Goldstein
  0 siblings, 1 reply; 23+ messages in thread
From: Bernd Schubert @ 2023-09-21  8:09 UTC (permalink / raw)
  To: Amir Goldstein, Bernd Schubert
  Cc: linux-fsdevel, miklos, dsingh, Christian Brauner, Al Viro



On 9/21/23 08:16, Amir Goldstein wrote:
> On Thu, Sep 21, 2023 at 12:48 AM Bernd Schubert <bschubert@ddn.com> wrote:
>>
>> This was suggested by Miklos based on review from the previous
>> patch that introduced atomic open for positive dentries.
>> In open_last_lookups() the dentry was not used anymore when atomic
>> revalidate was available, which required to release the dentry,
>> then code fall through to lookup_open was done, which resulted
>> in another d_lookup and also d_revalidate. All of that can
>> be avoided by the new atomic_revalidate_open() function.
>>
>> Another included change is the introduction of an enum as
>> d_revalidate return code.
>>
>> Signed-off-by: Bernd Schubert <bschubert@ddn.com>
>> Cc: Miklos Szeredi <miklos@szeredi.hu>
>> Cc: Dharmendra Singh <dsingh@ddn.com>
>> Cc: Christian Brauner <brauner@kernel.org>
>> Cc: Al Viro <viro@zeniv.linux.org.uk>
>> Cc: linux-fsdevel@vger.kernel.org
>> ---
>>   fs/namei.c | 45 +++++++++++++++++++++++++++++++++++++++++++--
>>   1 file changed, 43 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/namei.c b/fs/namei.c
>> index f01b278ac0ef..8ad7a0dfa596 100644
>> --- a/fs/namei.c
>> +++ b/fs/namei.c
>> @@ -3388,6 +3388,44 @@ static struct dentry *atomic_open(struct nameidata *nd, struct dentry *dentry,
>>          return dentry;
>>   }
>>
>> +static struct dentry *atomic_revalidate_open(struct dentry *dentry,
>> +                                            struct nameidata *nd,
>> +                                            struct file *file,
>> +                                            const struct open_flags *op,
>> +                                            bool *got_write)
>> +{
>> +       struct mnt_idmap *idmap;
>> +       struct dentry *dir = nd->path.dentry;
>> +       struct inode *dir_inode = dir->d_inode;
>> +       int open_flag = op->open_flag;
>> +       umode_t mode = op->mode;
>> +
>> +       if (unlikely(IS_DEADDIR(dir_inode)))
>> +               return ERR_PTR(-ENOENT);
>> +
>> +       file->f_mode &= ~FMODE_CREATED;
>> +
>> +       if (unlikely(open_flag & O_CREAT)) {
>> +               WARN_ON(1);
> 
>        if (WARN_ON_ONCE(open_flag & O_CREAT)) {
> 
> is more compact and produces a nicer print

Thanks a lot for your review Amir! Nice, I hadn't noticed that
these macros actually return a value. Also updated the related
fuse patch, which had a similar construct.

> 
>> +               return ERR_PTR(-EINVAL);
>> +       }
>> +
>> +       if (open_flag & (O_TRUNC | O_WRONLY | O_RDWR))
>> +               *got_write = !mnt_want_write(nd->path.mnt);
>> +       else
>> +               *got_write = false;
>> +
>> +       if (!got_write)
>> +               open_flag &= ~O_TRUNC;
>> +
>> +       inode_lock_shared(dir->d_inode);
>> +       dentry = atomic_open(nd, dentry, file, open_flag, mode);
>> +       inode_unlock_shared(dir->d_inode);
>> +
>> +       return dentry;
>> +
>> +}
>> +
>>   /*
>>    * Look up and maybe create and open the last component.
>>    *
>> @@ -3541,8 +3579,10 @@ static const char *open_last_lookups(struct nameidata *nd,
>>                  if (IS_ERR(dentry))
>>                          return ERR_CAST(dentry);
>>                  if (dentry && unlikely(atomic_revalidate)) {
>> -                       dput(dentry);
>> -                       dentry = NULL;
>> +                       BUG_ON(nd->flags & LOOKUP_RCU);
> 
> The assertion means that someone wrote bad code
> it does not mean that the kernel internal state is hopelessly corrupted.
> Please avoid BUG_ON and use WARN_ON_ONCE and if possible
> (seems to be the case here) also take the graceful error path.
> It's better to fail an open than to crash the kernel.

Thanks, updated. I had used BUG_ON because there is a similar BUG_ON a
few lines below. Added another RFC patch to covert that one as well.
I'm going to wait a few days for possible other reviews and will then send
out the new version. The updated v10 branch is here

https://github.com/bsbernd/linux/tree/atomic-open-for-6.5-v10


Thanks,
Bernd

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v9 4/7] vfs: Optimize atomic_open() on positive dentry
  2023-09-21  8:09     ` Bernd Schubert
@ 2023-09-21  8:29       ` Amir Goldstein
  2023-09-21  9:16         ` Bernd Schubert
  0 siblings, 1 reply; 23+ messages in thread
From: Amir Goldstein @ 2023-09-21  8:29 UTC (permalink / raw)
  To: Bernd Schubert
  Cc: Bernd Schubert, linux-fsdevel, miklos, dsingh, Christian Brauner,
	Al Viro

On Thu, Sep 21, 2023 at 11:09 AM Bernd Schubert
<bernd.schubert@fastmail.fm> wrote:
>
>
>
> On 9/21/23 08:16, Amir Goldstein wrote:
> > On Thu, Sep 21, 2023 at 12:48 AM Bernd Schubert <bschubert@ddn.com> wrote:
> >>
> >> This was suggested by Miklos based on review from the previous
> >> patch that introduced atomic open for positive dentries.
> >> In open_last_lookups() the dentry was not used anymore when atomic
> >> revalidate was available, which required to release the dentry,
> >> then code fall through to lookup_open was done, which resulted
> >> in another d_lookup and also d_revalidate. All of that can
> >> be avoided by the new atomic_revalidate_open() function.
> >>
> >> Another included change is the introduction of an enum as
> >> d_revalidate return code.
> >>
> >> Signed-off-by: Bernd Schubert <bschubert@ddn.com>
> >> Cc: Miklos Szeredi <miklos@szeredi.hu>
> >> Cc: Dharmendra Singh <dsingh@ddn.com>
> >> Cc: Christian Brauner <brauner@kernel.org>
> >> Cc: Al Viro <viro@zeniv.linux.org.uk>
> >> Cc: linux-fsdevel@vger.kernel.org
> >> ---
> >>   fs/namei.c | 45 +++++++++++++++++++++++++++++++++++++++++++--
> >>   1 file changed, 43 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/fs/namei.c b/fs/namei.c
> >> index f01b278ac0ef..8ad7a0dfa596 100644
> >> --- a/fs/namei.c
> >> +++ b/fs/namei.c
> >> @@ -3388,6 +3388,44 @@ static struct dentry *atomic_open(struct nameidata *nd, struct dentry *dentry,
> >>          return dentry;
> >>   }
> >>
> >> +static struct dentry *atomic_revalidate_open(struct dentry *dentry,
> >> +                                            struct nameidata *nd,
> >> +                                            struct file *file,
> >> +                                            const struct open_flags *op,
> >> +                                            bool *got_write)
> >> +{
> >> +       struct mnt_idmap *idmap;
> >> +       struct dentry *dir = nd->path.dentry;
> >> +       struct inode *dir_inode = dir->d_inode;
> >> +       int open_flag = op->open_flag;
> >> +       umode_t mode = op->mode;
> >> +
> >> +       if (unlikely(IS_DEADDIR(dir_inode)))
> >> +               return ERR_PTR(-ENOENT);
> >> +
> >> +       file->f_mode &= ~FMODE_CREATED;
> >> +
> >> +       if (unlikely(open_flag & O_CREAT)) {
> >> +               WARN_ON(1);
> >
> >        if (WARN_ON_ONCE(open_flag & O_CREAT)) {
> >
> > is more compact and produces a nicer print
>
> Thanks a lot for your review Amir! Nice, I hadn't noticed that
> these macros actually return a value. Also updated the related
> fuse patch, which had a similar construct.
>
> >
> >> +               return ERR_PTR(-EINVAL);
> >> +       }
> >> +
> >> +       if (open_flag & (O_TRUNC | O_WRONLY | O_RDWR))
> >> +               *got_write = !mnt_want_write(nd->path.mnt);
> >> +       else
> >> +               *got_write = false;
> >> +
> >> +       if (!got_write)
> >> +               open_flag &= ~O_TRUNC;
> >> +
> >> +       inode_lock_shared(dir->d_inode);
> >> +       dentry = atomic_open(nd, dentry, file, open_flag, mode);
> >> +       inode_unlock_shared(dir->d_inode);
> >> +
> >> +       return dentry;
> >> +
> >> +}
> >> +
> >>   /*
> >>    * Look up and maybe create and open the last component.
> >>    *
> >> @@ -3541,8 +3579,10 @@ static const char *open_last_lookups(struct nameidata *nd,
> >>                  if (IS_ERR(dentry))
> >>                          return ERR_CAST(dentry);
> >>                  if (dentry && unlikely(atomic_revalidate)) {
> >> -                       dput(dentry);
> >> -                       dentry = NULL;
> >> +                       BUG_ON(nd->flags & LOOKUP_RCU);
> >
> > The assertion means that someone wrote bad code
> > it does not mean that the kernel internal state is hopelessly corrupted.
> > Please avoid BUG_ON and use WARN_ON_ONCE and if possible
> > (seems to be the case here) also take the graceful error path.
> > It's better to fail an open than to crash the kernel.
>
> Thanks, updated. I had used BUG_ON because there is a similar BUG_ON a
> few lines below.

checkpatch strictly forbids new BUG_ON:
"Do not crash the kernel unless it is absolutely unavoidable-- use
 WARN_ON_ONCE() plus recovery code (if feasible) instead of BUG() or variants"

But it's true that vfs code has several of those.

> Added another RFC patch to covert that one as well.
> I'm going to wait a few days for possible other reviews and will then send
> out the new version. The updated v10 branch is here
>
> https://github.com/bsbernd/linux/tree/atomic-open-for-6.5-v10
>

IIUC, patches 3,4 are independent vfs optimization.
Is that correct?

Since you are going to need review of vfs maintainers
and since Christian would probably want to merge them
via the vfs tree, I think it would be better for you to post
them separately from your series if possible, so Miklos
would be able to pick up the fuse patches when they are
reviewed.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v9 4/7] vfs: Optimize atomic_open() on positive dentry
  2023-09-21  8:29       ` Amir Goldstein
@ 2023-09-21  9:16         ` Bernd Schubert
  0 siblings, 0 replies; 23+ messages in thread
From: Bernd Schubert @ 2023-09-21  9:16 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Bernd Schubert, linux-fsdevel, miklos, dsingh, Christian Brauner,
	Al Viro



On 9/21/23 10:29, Amir Goldstein wrote:
> On Thu, Sep 21, 2023 at 11:09 AM Bernd Schubert
> <bernd.schubert@fastmail.fm> wrote:
>>
>>
>>
>> On 9/21/23 08:16, Amir Goldstein wrote:
>>> On Thu, Sep 21, 2023 at 12:48 AM Bernd Schubert <bschubert@ddn.com> wrote:
>>>>
>>>> This was suggested by Miklos based on review from the previous
>>>> patch that introduced atomic open for positive dentries.
>>>> In open_last_lookups() the dentry was not used anymore when atomic
>>>> revalidate was available, which required to release the dentry,
>>>> then code fall through to lookup_open was done, which resulted
>>>> in another d_lookup and also d_revalidate. All of that can
>>>> be avoided by the new atomic_revalidate_open() function.
>>>>
>>>> Another included change is the introduction of an enum as
>>>> d_revalidate return code.
>>>>
>>>> Signed-off-by: Bernd Schubert <bschubert@ddn.com>
>>>> Cc: Miklos Szeredi <miklos@szeredi.hu>
>>>> Cc: Dharmendra Singh <dsingh@ddn.com>
>>>> Cc: Christian Brauner <brauner@kernel.org>
>>>> Cc: Al Viro <viro@zeniv.linux.org.uk>
>>>> Cc: linux-fsdevel@vger.kernel.org
>>>> ---
>>>>    fs/namei.c | 45 +++++++++++++++++++++++++++++++++++++++++++--
>>>>    1 file changed, 43 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/fs/namei.c b/fs/namei.c
>>>> index f01b278ac0ef..8ad7a0dfa596 100644
>>>> --- a/fs/namei.c
>>>> +++ b/fs/namei.c
>>>> @@ -3388,6 +3388,44 @@ static struct dentry *atomic_open(struct nameidata *nd, struct dentry *dentry,
>>>>           return dentry;
>>>>    }
>>>>
>>>> +static struct dentry *atomic_revalidate_open(struct dentry *dentry,
>>>> +                                            struct nameidata *nd,
>>>> +                                            struct file *file,
>>>> +                                            const struct open_flags *op,
>>>> +                                            bool *got_write)
>>>> +{
>>>> +       struct mnt_idmap *idmap;
>>>> +       struct dentry *dir = nd->path.dentry;
>>>> +       struct inode *dir_inode = dir->d_inode;
>>>> +       int open_flag = op->open_flag;
>>>> +       umode_t mode = op->mode;
>>>> +
>>>> +       if (unlikely(IS_DEADDIR(dir_inode)))
>>>> +               return ERR_PTR(-ENOENT);
>>>> +
>>>> +       file->f_mode &= ~FMODE_CREATED;
>>>> +
>>>> +       if (unlikely(open_flag & O_CREAT)) {
>>>> +               WARN_ON(1);
>>>
>>>         if (WARN_ON_ONCE(open_flag & O_CREAT)) {
>>>
>>> is more compact and produces a nicer print
>>
>> Thanks a lot for your review Amir! Nice, I hadn't noticed that
>> these macros actually return a value. Also updated the related
>> fuse patch, which had a similar construct.
>>
>>>
>>>> +               return ERR_PTR(-EINVAL);
>>>> +       }
>>>> +
>>>> +       if (open_flag & (O_TRUNC | O_WRONLY | O_RDWR))
>>>> +               *got_write = !mnt_want_write(nd->path.mnt);
>>>> +       else
>>>> +               *got_write = false;
>>>> +
>>>> +       if (!got_write)
>>>> +               open_flag &= ~O_TRUNC;
>>>> +
>>>> +       inode_lock_shared(dir->d_inode);
>>>> +       dentry = atomic_open(nd, dentry, file, open_flag, mode);
>>>> +       inode_unlock_shared(dir->d_inode);
>>>> +
>>>> +       return dentry;
>>>> +
>>>> +}
>>>> +
>>>>    /*
>>>>     * Look up and maybe create and open the last component.
>>>>     *
>>>> @@ -3541,8 +3579,10 @@ static const char *open_last_lookups(struct nameidata *nd,
>>>>                   if (IS_ERR(dentry))
>>>>                           return ERR_CAST(dentry);
>>>>                   if (dentry && unlikely(atomic_revalidate)) {
>>>> -                       dput(dentry);
>>>> -                       dentry = NULL;
>>>> +                       BUG_ON(nd->flags & LOOKUP_RCU);
>>>
>>> The assertion means that someone wrote bad code
>>> it does not mean that the kernel internal state is hopelessly corrupted.
>>> Please avoid BUG_ON and use WARN_ON_ONCE and if possible
>>> (seems to be the case here) also take the graceful error path.
>>> It's better to fail an open than to crash the kernel.
>>
>> Thanks, updated. I had used BUG_ON because there is a similar BUG_ON a
>> few lines below.
> 
> checkpatch strictly forbids new BUG_ON:
> "Do not crash the kernel unless it is absolutely unavoidable-- use
>   WARN_ON_ONCE() plus recovery code (if feasible) instead of BUG() or variants"
> 
> But it's true that vfs code has several of those.

Yeah sorry, I had seen the warning, but ignored it, in favor of code
consistency.

> 
>> Added another RFC patch to covert that one as well.
>> I'm going to wait a few days for possible other reviews and will then send
>> out the new version. The updated v10 branch is here
>>
>> https://github.com/bsbernd/linux/tree/atomic-open-for-6.5-v10
>>
> 
> IIUC, patches 3,4 are independent vfs optimization.
> Is that correct?

Hmm, not exactly. Patches 3 is a dependency for patch 5-7. So far
atomic open didn't handle positive dentries / revalidate.
Patch 3 adds support for that, the file system then needs to
signal in its ->d_revalidate return code that it wants
atomic_open to do the revalidation (which adds a bit complexity to
the atomic_open method). Patch 3 has then two cases,
O_CREATE and plain open without O_CREATE. Patch 4 is an
optimization for patch 3, for plain open. I actually start
to wonder if I shouldn't remove plain open from patch 3
and to add that in patch 4. Probably easier to read. Thanks for
making me think about that :)

> 
> Since you are going to need review of vfs maintainers
> and since Christian would probably want to merge them
> via the vfs tree, I think it would be better for you to post
> them separately from your series if possible, so Miklos
> would be able to pick up the fuse patches when they are
> reviewed.

Only the new patch 8 is independent, I'm going to send that out
separately today.


Thanks,
Bernd

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v9 0/7] fuse: full atomic open and atomic-open-revalidate
  2023-09-20 17:34 [PATCH v9 0/7] fuse: full atomic open and atomic-open-revalidate Bernd Schubert
                   ` (6 preceding siblings ...)
  2023-09-20 17:34 ` [PATCH v9 7/7] fuse: Avoid code duplication in atomic open Bernd Schubert
@ 2023-09-21  9:33 ` Amir Goldstein
  2023-09-21 11:59   ` Bernd Schubert
  7 siblings, 1 reply; 23+ messages in thread
From: Amir Goldstein @ 2023-09-21  9:33 UTC (permalink / raw)
  To: Bernd Schubert
  Cc: linux-fsdevel, bernd.schubert, miklos, dsingh, Vivek Goyal,
	Christian Brauner, Al Viro, Horst Birthelmer

On Thu, Sep 21, 2023 at 9:31 AM Bernd Schubert <bschubert@ddn.com> wrote:
>
> In FUSE, as of now, uncached lookups are expensive over the wire.
> E.g additional latencies and stressing (meta data) servers from
> thousands of clients. With atomic-open lookup before open
> can be avoided.
>
> Here is the link to performance numbers
> https://lore.kernel.org/linux-fsdevel/20220322121212.5087-1-dharamhans87@gmail.com/
>
> Here is the libfuse pull request
> https://github.com/libfuse/libfuse/pull/813
>
> The patches are passing passthrough_hp xfstests (libfuse part applied),
> although we had to introduce umount retries into xfstests, as recent
> kernels/xfstests fail umount in some tests with
> EBUSY - independent of atomic open. (Although outstanding for v7)

Hi Bernd!

I was using xfstests to test passthrough_hp (for FUSE kernel passthrough).
FYI, I have made some improvements to the mount helper
in libfuse [1] to support remount, which helps pass a few tests.

So far, I have all the tests in group -g quick.rw pass with the baseline
passthrough_hp (over xfs).

Do you have a baseline for the entire quick/auto group to share with me?
Can you share the patch that you are using to avoid the EBUSY errors?

Note that Chritian has suggested a method to use inotify
IN_UNMOUNT event to wait for sb shutdown in fstests [2].

Thanks,
Amir.

[1] https://github.com/amir73il/libfuse/commits/fuse-backing-fd
[2] https://lore.kernel.org/linux-fsdevel/20230908-verflachen-neudefinition-4da649d673a9@brauner/

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v9 0/7] fuse: full atomic open and atomic-open-revalidate
  2023-09-21  9:33 ` [PATCH v9 0/7] fuse: full atomic open and atomic-open-revalidate Amir Goldstein
@ 2023-09-21 11:59   ` Bernd Schubert
  2023-09-21 14:24     ` Amir Goldstein
  0 siblings, 1 reply; 23+ messages in thread
From: Bernd Schubert @ 2023-09-21 11:59 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: linux-fsdevel@vger.kernel.org, bernd.schubert@fastmail.fm,
	miklos@szeredi.hu, Dharmendra Singh, Vivek Goyal,
	Christian Brauner, Al Viro, Horst Birthelmer

[-- Attachment #1: Type: text/plain, Size: 2693 bytes --]

On 9/21/23 11:33, Amir Goldstein wrote:
> On Thu, Sep 21, 2023 at 9:31 AM Bernd Schubert <bschubert@ddn.com> wrote:
>>
>> In FUSE, as of now, uncached lookups are expensive over the wire.
>> E.g additional latencies and stressing (meta data) servers from
>> thousands of clients. With atomic-open lookup before open
>> can be avoided.
>>
>> Here is the link to performance numbers
>> https://lore.kernel.org/linux-fsdevel/20220322121212.5087-1-dharamhans87@gmail.com/
>>
>> Here is the libfuse pull request
>> https://github.com/libfuse/libfuse/pull/813
>>
>> The patches are passing passthrough_hp xfstests (libfuse part applied),
>> although we had to introduce umount retries into xfstests, as recent
>> kernels/xfstests fail umount in some tests with
>> EBUSY - independent of atomic open. (Although outstanding for v7)
> 
> Hi Bernd!
> 
> I was using xfstests to test passthrough_hp (for FUSE kernel passthrough).
> FYI, I have made some improvements to the mount helper
> in libfuse [1] to support remount, which helps pass a few tests.

Thanks, just asked there to send it separate to upstream.

> 
> So far, I have all the tests in group -g quick.rw pass with the baseline
> passthrough_hp (over xfs).
> 
> Do you have a baseline for the entire quick/auto group to share with me?

Please find my results attached. I have opened a libfuse issue for generic/477,
(open_by_handle_at tests) but I'm not sure if this is passthrough_hp only (it
trusts the passed node id, without checking if there is an inode object for it).
Possibly fuse.ko passes an invalide node id - this is something for a rainy
weekend (or so) to investigate...


> Can you share the patch that you are using to avoid the EBUSY errors?


The simple version to avoid _most_ of EBUSY is this


diff --git a/common/rc b/common/rc
index 741579af..a40fca3b 100644
--- a/common/rc
+++ b/common/rc
@@ -305,6 +305,7 @@ _scratch_mount_idmapped()
  
  _scratch_unmount()
  {
+       sync
         case "$FSTYP" in
         overlay)
                 _overlay_scratch_unmount



The better version is this
https://github.com/kdave/xfstests/commit/33a15af07bb044e2773a83df1c7e0a0df280a4b7

> 
> Note that Chritian has suggested a method to use inotify
> IN_UNMOUNT event to wait for sb shutdown in fstests [2].

Thanks, I had seen the discussion. Although I (silently) wondered if something
like MNT_BLOCk as umount2 flag wouldn't be easier.

> 
> Thanks,
> Amir.
> 
> [1] https://github.com/amir73il/libfuse/commits/fuse-backing-fd
> [2] https://lore.kernel.org/linux-fsdevel/20230908-verflachen-neudefinition-4da649d673a9@brauner/


Thanks,
Bernd

[-- Attachment #2: xfstests-unpatched-6.5.log.2023-09-19-213432.gz --]
[-- Type: application/gzip, Size: 8142 bytes --]

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v9 0/7] fuse: full atomic open and atomic-open-revalidate
  2023-09-21 11:59   ` Bernd Schubert
@ 2023-09-21 14:24     ` Amir Goldstein
  2023-09-21 14:44       ` Bernd Schubert
  2023-09-29 11:34       ` Amir Goldstein
  0 siblings, 2 replies; 23+ messages in thread
From: Amir Goldstein @ 2023-09-21 14:24 UTC (permalink / raw)
  To: Bernd Schubert
  Cc: linux-fsdevel@vger.kernel.org, bernd.schubert@fastmail.fm,
	miklos@szeredi.hu, Dharmendra Singh, Vivek Goyal,
	Christian Brauner, Al Viro, Horst Birthelmer

On Thu, Sep 21, 2023 at 3:00 PM Bernd Schubert <bschubert@ddn.com> wrote:
>
> On 9/21/23 11:33, Amir Goldstein wrote:
> > On Thu, Sep 21, 2023 at 9:31 AM Bernd Schubert <bschubert@ddn.com> wrote:
> >>
> >> In FUSE, as of now, uncached lookups are expensive over the wire.
> >> E.g additional latencies and stressing (meta data) servers from
> >> thousands of clients. With atomic-open lookup before open
> >> can be avoided.
> >>
> >> Here is the link to performance numbers
> >> https://lore.kernel.org/linux-fsdevel/20220322121212.5087-1-dharamhans87@gmail.com/
> >>
> >> Here is the libfuse pull request
> >> https://github.com/libfuse/libfuse/pull/813
> >>
> >> The patches are passing passthrough_hp xfstests (libfuse part applied),
> >> although we had to introduce umount retries into xfstests, as recent
> >> kernels/xfstests fail umount in some tests with
> >> EBUSY - independent of atomic open. (Although outstanding for v7)
> >
> > Hi Bernd!
> >
> > I was using xfstests to test passthrough_hp (for FUSE kernel passthrough).
> > FYI, I have made some improvements to the mount helper
> > in libfuse [1] to support remount, which helps pass a few tests.
>
> Thanks, just asked there to send it separate to upstream.
>
> >
> > So far, I have all the tests in group -g quick.rw pass with the baseline
> > passthrough_hp (over xfs).
> >
> > Do you have a baseline for the entire quick/auto group to share with me?
>
> Please find my results attached.

Not too bad.
3 more tests can pass with my mount helper fix for remount ;)

> I have opened a libfuse issue for generic/477,
> (open_by_handle_at tests) but I'm not sure if this is passthrough_hp only (it
> trusts the passed node id, without checking if there is an inode object for it).
> Possibly fuse.ko passes an invalide node id - this is something for a rainy
> weekend (or so) to investigate...

Stale file handles after mount cycle are expected.
FUSE is not equipped to handle this correctly.

NFS clients may even get access to the wrong inode
after FUSE restart/reexport, if FUSE is exported with the same
NFS fsid.

See this discussion [3] about how this could be solved hackishly
with existing FUSE protocol (for fs that know how to open by ino)
and about the LOOKUP_HANDLE protocol command that is
needed to solve this in a generic way.

>
>
> > Can you share the patch that you are using to avoid the EBUSY errors?
>
>
> The simple version to avoid _most_ of EBUSY is this
>
>
> diff --git a/common/rc b/common/rc
> index 741579af..a40fca3b 100644
> --- a/common/rc
> +++ b/common/rc
> @@ -305,6 +305,7 @@ _scratch_mount_idmapped()
>
>   _scratch_unmount()
>   {
> +       sync
>          case "$FSTYP" in
>          overlay)
>                  _overlay_scratch_unmount
>
>
>
> The better version is this
> https://github.com/kdave/xfstests/commit/33a15af07bb044e2773a83df1c7e0a0df280a4b7
>
> >
> > Note that Chritian has suggested a method to use inotify
> > IN_UNMOUNT event to wait for sb shutdown in fstests [2].
>
> Thanks, I had seen the discussion. Although I (silently) wondered if something
> like MNT_BLOCk as umount2 flag wouldn't be easier.
>

You'd better keep wondering silently unless you want to upset Christian ;)

Thanks,
Amir.

> > [1] https://github.com/amir73il/libfuse/commits/fuse-backing-fd
> > [2] https://lore.kernel.org/linux-fsdevel/20230908-verflachen-neudefinition-4da649d673a9@brauner/
[3] https://lore.kernel.org/linux-fsdevel/CAOQ4uxg_FV8U833qVkgPaAWJ4MNcnGoy9Gci41bmak4_ROSc3g@mail.gmail.com/

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v9 0/7] fuse: full atomic open and atomic-open-revalidate
  2023-09-21 14:24     ` Amir Goldstein
@ 2023-09-21 14:44       ` Bernd Schubert
  2023-09-29 11:34       ` Amir Goldstein
  1 sibling, 0 replies; 23+ messages in thread
From: Bernd Schubert @ 2023-09-21 14:44 UTC (permalink / raw)
  To: Amir Goldstein, Bernd Schubert
  Cc: linux-fsdevel@vger.kernel.org, miklos@szeredi.hu,
	Dharmendra Singh, Vivek Goyal, Christian Brauner, Al Viro,
	Horst Birthelmer



On 9/21/23 16:24, Amir Goldstein wrote:
> On Thu, Sep 21, 2023 at 3:00 PM Bernd Schubert <bschubert@ddn.com> wrote:
>>
>> On 9/21/23 11:33, Amir Goldstein wrote:
>>> On Thu, Sep 21, 2023 at 9:31 AM Bernd Schubert <bschubert@ddn.com> wrote:
>>>>
>>>> In FUSE, as of now, uncached lookups are expensive over the wire.
>>>> E.g additional latencies and stressing (meta data) servers from
>>>> thousands of clients. With atomic-open lookup before open
>>>> can be avoided.
>>>>
>>>> Here is the link to performance numbers
>>>> https://lore.kernel.org/linux-fsdevel/20220322121212.5087-1-dharamhans87@gmail.com/
>>>>
>>>> Here is the libfuse pull request
>>>> https://github.com/libfuse/libfuse/pull/813
>>>>
>>>> The patches are passing passthrough_hp xfstests (libfuse part applied),
>>>> although we had to introduce umount retries into xfstests, as recent
>>>> kernels/xfstests fail umount in some tests with
>>>> EBUSY - independent of atomic open. (Although outstanding for v7)
>>>
>>> Hi Bernd!
>>>
>>> I was using xfstests to test passthrough_hp (for FUSE kernel passthrough).
>>> FYI, I have made some improvements to the mount helper
>>> in libfuse [1] to support remount, which helps pass a few tests.
>>
>> Thanks, just asked there to send it separate to upstream.
>>
>>>
>>> So far, I have all the tests in group -g quick.rw pass with the baseline
>>> passthrough_hp (over xfs).
>>>
>>> Do you have a baseline for the entire quick/auto group to share with me?
>>
>> Please find my results attached.
> 
> Not too bad.
> 3 more tests can pass with my mount helper fix for remount ;)
> 
>> I have opened a libfuse issue for generic/477,
>> (open_by_handle_at tests) but I'm not sure if this is passthrough_hp only (it
>> trusts the passed node id, without checking if there is an inode object for it).
>> Possibly fuse.ko passes an invalide node id - this is something for a rainy
>> weekend (or so) to investigate...
> 
> Stale file handles after mount cycle are expected.
> FUSE is not equipped to handle this correctly.

I know and I don't have a problem with that. Issue is that the test triggers a
heap buffer overflow,  see the ASAN report here

https://github.com/libfuse/libfuse/issues/838

A possible reason might be an invalid node id by open_by_handle_at, or
lookup/release is not right. As I said, will investigate once I have a free
minute.

> 
> NFS clients may even get access to the wrong inode
> after FUSE restart/reexport, if FUSE is exported with the same
> NFS fsid.
> 
> See this discussion [3] about how this could be solved hackishly
> with existing FUSE protocol (for fs that know how to open by ino)
> and about the LOOKUP_HANDLE protocol command that is
> needed to solve this in a generic way.

I will read through it later. I would prefer adding support up to
MAX_HANDLE_SZ - our file systems typically exceed 64 bit inode sizes.
Without having it read, I would just expose exportfs methods to userspace
(which might be the LOOKUP_HANDLE protocol).


> 
>>
>>
>>> Can you share the patch that you are using to avoid the EBUSY errors?
>>
>>
>> The simple version to avoid _most_ of EBUSY is this
>>
>>
>> diff --git a/common/rc b/common/rc
>> index 741579af..a40fca3b 100644
>> --- a/common/rc
>> +++ b/common/rc
>> @@ -305,6 +305,7 @@ _scratch_mount_idmapped()
>>
>>    _scratch_unmount()
>>    {
>> +       sync
>>           case "$FSTYP" in
>>           overlay)
>>                   _overlay_scratch_unmount
>>
>>
>>
>> The better version is this
>> https://github.com/kdave/xfstests/commit/33a15af07bb044e2773a83df1c7e0a0df280a4b7
>>
>>>
>>> Note that Chritian has suggested a method to use inotify
>>> IN_UNMOUNT event to wait for sb shutdown in fstests [2].
>>
>> Thanks, I had seen the discussion. Although I (silently) wondered if something
>> like MNT_BLOCk as umount2 flag wouldn't be easier.
>>
> 
> You'd better keep wondering silently unless you want to upset Christian ;)

Ouch, Christian is in CC, inotify is fine ;)


Thanks,
Bernd


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v9 0/7] fuse: full atomic open and atomic-open-revalidate
  2023-09-21 14:24     ` Amir Goldstein
  2023-09-21 14:44       ` Bernd Schubert
@ 2023-09-29 11:34       ` Amir Goldstein
  1 sibling, 0 replies; 23+ messages in thread
From: Amir Goldstein @ 2023-09-29 11:34 UTC (permalink / raw)
  To: Bernd Schubert
  Cc: linux-fsdevel@vger.kernel.org, bernd.schubert@fastmail.fm,
	miklos@szeredi.hu, Dharmendra Singh, Vivek Goyal,
	Christian Brauner, Al Viro, Horst Birthelmer

On Thu, Sep 21, 2023 at 5:24 PM Amir Goldstein <amir73il@gmail.com> wrote:
>
> On Thu, Sep 21, 2023 at 3:00 PM Bernd Schubert <bschubert@ddn.com> wrote:
> >
> > On 9/21/23 11:33, Amir Goldstein wrote:
> > > On Thu, Sep 21, 2023 at 9:31 AM Bernd Schubert <bschubert@ddn.com> wrote:
> > >>
> > >> In FUSE, as of now, uncached lookups are expensive over the wire.
> > >> E.g additional latencies and stressing (meta data) servers from
> > >> thousands of clients. With atomic-open lookup before open
> > >> can be avoided.
> > >>
> > >> Here is the link to performance numbers
> > >> https://lore.kernel.org/linux-fsdevel/20220322121212.5087-1-dharamhans87@gmail.com/
> > >>
> > >> Here is the libfuse pull request
> > >> https://github.com/libfuse/libfuse/pull/813
> > >>
> > >> The patches are passing passthrough_hp xfstests (libfuse part applied),
> > >> although we had to introduce umount retries into xfstests, as recent
> > >> kernels/xfstests fail umount in some tests with
> > >> EBUSY - independent of atomic open. (Although outstanding for v7)
> > >
> > > Hi Bernd!
> > >
> > > I was using xfstests to test passthrough_hp (for FUSE kernel passthrough).
> > > FYI, I have made some improvements to the mount helper
> > > in libfuse [1] to support remount, which helps pass a few tests.
> >
> > Thanks, just asked there to send it separate to upstream.

Now upstream. Thanks for your help!

> >
> > >
> > > So far, I have all the tests in group -g quick.rw pass with the baseline
> > > passthrough_hp (over xfs).
> > >
> > > Do you have a baseline for the entire quick/auto group to share with me?
> >
> > Please find my results attached.
>
> Not too bad.
> 3 more tests can pass with my mount helper fix for remount ;)
>

FYI, here is a wdiff of my -g auto passthough_hp test run compared to yours:

[-unpatched-6.5-]{+upatched-6.6-rc3+}
Failures: generic/003 [-generic/020-] {+generic/099+} generic/184
generic/192 generic/263 [-generic/294 generic/306-] {+generic/317
generic/318 generic/319 generic/375+} generic/401 {+generic/423+}
generic/426 [-generic/427-] generic/434 [-generic/452-]
{+generic/444+} generic/467 [-generic/468-] generic/477
[-generic/478-] generic/617 {+generic/532+} generic/631 generic/633
generic/683 [-generic/688-]

Some of my {+NEW+} failures are because I have POSIX_ACL support enabled
in Kconfig, so the same tests are [not run] in your results.
I suspect that several permission related tests that PASS for you and FAIL
for me may also be because of enabled POSIX_ACL.
I was also running passthouhg_hp with -odefault_permissions, but AFAIK
this did not change the fstests results.

> >
> >
> > > Can you share the patch that you are using to avoid the EBUSY errors?
> >
> >
> > The simple version to avoid _most_ of EBUSY is this
> >

You know, I am testing passthrough_hp with kernel 6.6-rc3 and I did
not encounter any EBUST errors.

Maybe there is some relevant vfs fix in 6.6-rc3, because you were testing 6.5?
Or maybe it's because my test VM has only 2 cpus.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v9 2/7] fuse: introduce atomic open
       [not found]   ` <7616CA3C-312F-4F9F-9BB3-903D3A77289B@chromium.org>
@ 2023-10-02 22:42     ` Bernd Schubert
  2023-10-12 20:44       ` Bernd Schubert
  0 siblings, 1 reply; 23+ messages in thread
From: Bernd Schubert @ 2023-10-02 22:42 UTC (permalink / raw)
  To: Yuan Yao
  Cc: linux-fsdevel@vger.kernel.org, bernd.schubert@fastmail.fm,
	miklos@szeredi.hu, Dharmendra Singh, Horst Birthelmer,
	Christian Brauner, Al Viro

On 10/2/23 03:57, Yuan Yao wrote:
> Hi, I’m Yuan, particularly interested in this patch set, and I've 
> noticed some ambiguity regarding the behavior of atomic_open for 
> symbolic links.
> 
> 
> I think this part may cause a problem if we atomic_open a symbolic 
> link.The previous behavior for fuse_create_open() will only do lookup 
> but not open the symbolic link. But, with the full atomic open kernel 
> patch. My understanding is that when the kernel performs an atomic_open 
> operation on a symbolic link, the dentry returned from the FUSE server 
> contains the inode pointing to the opened symbolic link. However, after 
> atomic_open() is called, the may_open() function in namei.c checks the 
> node's i_mode and identifies it as a symbolic link, resulting in an 
> ELOOP error.


Thanks for testing it, I didn't manage to look into it yet, but will for 
sure do during the next days (I hope tomorrow).

> 
> 
> My concernn is: what is the expected behavior for opening a symbolic 
> link, both on the kernel side and the server side? Is it possible for 
> the fuse server to return the dentry containing the inode of the link 
> destination instead of the inode of the symbolic link itself?

nfs_atomic_open handles -ELOOP. Possibly fuse server needs to check for 
O_NOFOLLOW, but I'm not sure. Will look into that in the morning.




Thanks,
Bernd

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v9 2/7] fuse: introduce atomic open
  2023-09-20 17:34 ` [PATCH v9 2/7] fuse: introduce atomic open Bernd Schubert
       [not found]   ` <7616CA3C-312F-4F9F-9BB3-903D3A77289B@chromium.org>
@ 2023-10-03  3:38   ` Yuan Yao
  1 sibling, 0 replies; 23+ messages in thread
From: Yuan Yao @ 2023-10-03  3:38 UTC (permalink / raw)
  To: Bernd Schubert
  Cc: linux-fsdevel, bernd.schubert, miklos, dsingh, Horst Birthelmer,
	Christian Brauner, Al Viro, takayas, keiichiw

Sorry for confusing, I’m resending the mail as the previous mail had a formatting issue(not plain text) and was not sent to linux-fsdevel.

Hi, I’m Yuan, particularly interested in this patch set, and I've noticed some ambiguity regarding the behavior of atomic_open for symbolic links.

I think this part may cause a problem if we atomic_open a symbolic link.The previous behavior for fuse_create_open() will only do lookup but not open the symbolic link. But, with the full atomic open kernel patch. My understanding is that when the kernel performs an atomic_open operation on a symbolic link, the dentry returned from the FUSE server contains the inode pointing to the opened symbolic link. However, after atomic_open() is called, the may_open() function in namei.c checks the node's i_mode and identifies it as a symbolic link, resulting in an ELOOP error.

My concernn is: what is the expected behavior for opening a symbolic link, both on the kernel side and the server side? Is it possible for the fuse server to return the dentry containing the inode of the link destination instead of the inode of the symbolic link itself?

> On Sep 21, 2023, at 2:34, Bernd Schubert <bschubert@ddn.com> wrote:
> 
> From: Dharmendra Singh <dsingh@ddn.com>
> 
> This adds full atomic open support, to avoid lookup before open/create.
> If the implementation (fuse server/daemon) does not support atomic open
> it falls back to non-atomic open.
> 
> Co-developed-by: Bernd Schubert <bschubert@ddn.com>
> Signed-off-by: Dharmendra Singh <dsingh@ddn.com>
> Signed-off-by: Horst Birthelmer <hbirthelmer@ddn.com>
> Signed-off-by: Bernd Schubert <bschubert@ddn.com>
> Cc: Miklos Szeredi <miklos@szeredi.hu>
> Cc: Christian Brauner <brauner@kernel.org>
> Cc: Al Viro <viro@zeniv.linux.org.uk>
> Cc: Dharmendra Singh <dsingh@ddn.com>
> Cc: linux-fsdevel@vger.kernel.org
> ---
> fs/fuse/dir.c             | 214 +++++++++++++++++++++++++++++++++++++-
> fs/fuse/fuse_i.h          |   3 +
> include/uapi/linux/fuse.h |   3 +
> 3 files changed, 219 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
> index 542086140781..4cb2809a852d 100644
> --- a/fs/fuse/dir.c
> +++ b/fs/fuse/dir.c
> @@ -722,7 +722,7 @@ static int _fuse_create_open(struct inode *dir, struct dentry *entry,
> 
> static int fuse_mknod(struct mnt_idmap *, struct inode *, struct dentry *,
>      umode_t, dev_t);
> -static int fuse_atomic_open(struct inode *dir, struct dentry *entry,
> +static int fuse_create_open(struct inode *dir, struct dentry *entry,
>    struct file *file, unsigned flags,
>    umode_t mode)
> {
> @@ -768,6 +768,218 @@ static int fuse_atomic_open(struct inode *dir, struct dentry *entry,
> return finish_no_open(file, res);
> }
> 
> +static int _fuse_atomic_open(struct inode *dir, struct dentry *entry,
> +     struct file *file, unsigned flags,
> +     umode_t mode)
> +{
> + int err;
> + struct inode *inode;
> + struct fuse_mount *fm = get_fuse_mount(dir);
> + struct fuse_conn *fc = fm->fc;
> + FUSE_ARGS(args);
> + struct fuse_forget_link *forget;
> + struct fuse_create_in inarg;
> + struct fuse_open_out outopen;
> + struct fuse_entry_out outentry;
> + struct fuse_inode *fi;
> + struct fuse_file *ff;
> + struct dentry *switched_entry = NULL, *alias = NULL;
> + DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
> +
> + /* Expect a negative dentry */
> + if (unlikely(d_inode(entry)))
> + goto fallback;
> +
> + /* Userspace expects S_IFREG in create mode */
> + if ((flags & O_CREAT) && (mode & S_IFMT) != S_IFREG)
> + goto fallback;
> +
> + forget = fuse_alloc_forget();
> + err = -ENOMEM;
> + if (!forget)
> + goto out_err;
> +
> + err = -ENOMEM;
> + ff = fuse_file_alloc(fm);
> + if (!ff)
> + goto out_put_forget_req;
> +
> + if (!fc->dont_mask)
> + mode &= ~current_umask();
> +
> + flags &= ~O_NOCTTY;
> + memset(&inarg, 0, sizeof(inarg));
> + memset(&outentry, 0, sizeof(outentry));
> + inarg.flags = flags;
> + inarg.mode = mode;
> + inarg.umask = current_umask();
> +
> + if (fc->handle_killpriv_v2 && (flags & O_TRUNC) &&
> +    !(flags & O_EXCL) && !capable(CAP_FSETID)) {
> + inarg.open_flags |= FUSE_OPEN_KILL_SUIDGID;
> + }
> +
> + args.opcode = FUSE_OPEN_ATOMIC;
> + args.nodeid = get_node_id(dir);
> + args.in_numargs = 2;
> + args.in_args[0].size = sizeof(inarg);
> + args.in_args[0].value = &inarg;
> + args.in_args[1].size = entry->d_name.len + 1;
> + args.in_args[1].value = entry->d_name.name;
> + args.out_numargs = 2;
> + args.out_args[0].size = sizeof(outentry);
> + args.out_args[0].value = &outentry;
> + args.out_args[1].size = sizeof(outopen);
> + args.out_args[1].value = &outopen;
> +
> + if (flags & O_CREAT) {
> + err = get_create_ext(&args, dir, entry, mode);
> + if (err)
> + goto out_free_ff;
> + }
> +
> + err = fuse_simple_request(fm, &args);
> + free_ext_value(&args);
> + if (err == -ENOSYS) {
> + fc->no_open_atomic = 1;
> + fuse_file_free(ff);
> + kfree(forget);
> + goto fallback;
> + }
> +
> + if (!err && !outentry.nodeid)
> + err = -ENOENT;
> +
> + if (err)
> + goto out_free_ff;
> +
> + err = -EIO;
> + if (invalid_nodeid(outentry.nodeid) || fuse_invalid_attr(&outentry.attr))
> + goto out_free_ff;
> +
> + ff->fh = outopen.fh;
> + ff->nodeid = outentry.nodeid;
> + ff->open_flags = outopen.open_flags;
> + inode = fuse_iget(dir->i_sb, outentry.nodeid, outentry.generation,
> +  &outentry.attr, entry_attr_timeout(&outentry), 0);
> + if (!inode) {
> + flags &= ~(O_CREAT | O_EXCL | O_TRUNC);
> + fuse_sync_release(NULL, ff, flags);
> + fuse_queue_forget(fm->fc, forget, outentry.nodeid, 1);
> + err = -ENOMEM;
> + goto out_err;
> + }
> +
> + /* prevent racing/parallel lookup on a negative hashed */
> + if (!(flags & O_CREAT) && !d_in_lookup(entry)) {
> + d_drop(entry);
> + switched_entry = d_alloc_parallel(entry->d_parent,
> +   &entry->d_name, &wq);
> + if (IS_ERR(switched_entry)) {
> + err = PTR_ERR(switched_entry);
> + goto out_free_ff;
> + }
> +
> + if (unlikely(!d_in_lookup(switched_entry))) {
> + /* fall back */
> + dput(switched_entry);
> + switched_entry = NULL;
> + goto free_and_fallback;
> + }
> +
> + entry = switched_entry;
> + }
> +
> + if (d_really_is_negative(entry)) {
> + d_drop(entry);
> + alias = d_exact_alias(entry, inode);
> + if (!alias) {
> + alias = d_splice_alias(inode, entry);
> + if (IS_ERR(alias)) {
> + /*
> + * Close the file in user space, but do not unlink it,
> + * if it was created - with network file systems other
> + * clients might have already accessed it.
> + */
> + fi = get_fuse_inode(inode);
> + fuse_sync_release(fi, ff, flags);
> + fuse_queue_forget(fm->fc, forget, outentry.nodeid, 1);
> + err = PTR_ERR(alias);
> + goto out_err;
> + }
> + }
> +
> + if (alias)
> + entry = alias;
> + }
> +
> + fuse_change_entry_timeout(entry, &outentry);
> +
> + /*  File was indeed created */
> + if (outopen.open_flags & FOPEN_FILE_CREATED) {
> + if (!(flags & O_CREAT)) {
> + pr_debug("Server side bug, FOPEN_FILE_CREATED set "
> + "without O_CREAT, ignoring.");
> + } else {
> + /* This should be always set when the file is created */
> + fuse_dir_changed(dir);
> + file->f_mode |= FMODE_CREATED;
> + }
> + }
> +
> + if (S_ISDIR(mode))
> + ff->open_flags &= ~FOPEN_DIRECT_IO;
> + err = finish_open(file, entry, generic_file_open);
> + if (err) {
> + fi = get_fuse_inode(inode);
> + fuse_sync_release(fi, ff, flags);
> + } else {
> + file->private_data = ff;
> + fuse_finish_open(inode, file);
> + }
> +
> + kfree(forget);
> +
> + if (switched_entry) {
> + d_lookup_done(switched_entry);
> + dput(switched_entry);
> + }
> +
> + dput(alias);
> +
> + return err;
> +
> +out_free_ff:
> + fuse_file_free(ff);
> +out_put_forget_req:
> + kfree(forget);
> +out_err:
> + if (switched_entry) {
> + d_lookup_done(switched_entry);
> + dput(switched_entry);
> + }
> +
> + return err;
> +
> +free_and_fallback:
> + fuse_file_free(ff);
> + kfree(forget);
> +fallback:
> + return fuse_create_open(dir, entry, file, flags, mode);
> +}
> +
> +static int fuse_atomic_open(struct inode *dir, struct dentry *entry,
> +    struct file *file, unsigned int flags,
> +    umode_t mode)
> +{
> + struct fuse_conn *fc = get_fuse_conn(dir);
> +
> + if (fc->no_open_atomic)
> + return fuse_create_open(dir, entry, file, flags, mode);
> + else
> + return _fuse_atomic_open(dir, entry, file, flags, mode);
> +}
> +
> /*
>  * Code shared between mknod, mkdir, symlink and link
>  */
> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> index 9b7fc7d3c7f1..c838708cfa2b 100644
> --- a/fs/fuse/fuse_i.h
> +++ b/fs/fuse/fuse_i.h
> @@ -672,6 +672,9 @@ struct fuse_conn {
> /** Is open/release not implemented by fs? */
> unsigned no_open:1;
> 
> + /** Is open atomic not implemented by fs? */
> + unsigned no_open_atomic:1;
> +
> /** Is opendir/releasedir not implemented by fs? */
> unsigned no_opendir:1;
> 
> diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
> index b3fcab13fcd3..33fefee42697 100644
> --- a/include/uapi/linux/fuse.h
> +++ b/include/uapi/linux/fuse.h
> @@ -315,6 +315,7 @@ struct fuse_file_lock {
>  * FOPEN_STREAM: the file is stream-like (no file position at all)
>  * FOPEN_NOFLUSH: don't flush data cache on close (unless FUSE_WRITEBACK_CACHE)
>  * FOPEN_PARALLEL_DIRECT_WRITES: Allow concurrent direct writes on the same inode
> + * FOPEN_FILE_CREATED: the file was indeed created
>  */
> #define FOPEN_DIRECT_IO (1 << 0)
> #define FOPEN_KEEP_CACHE (1 << 1)
> @@ -323,6 +324,7 @@ struct fuse_file_lock {
> #define FOPEN_STREAM (1 << 4)
> #define FOPEN_NOFLUSH (1 << 5)
> #define FOPEN_PARALLEL_DIRECT_WRITES (1 << 6)
> +#define FOPEN_FILE_CREATED (1 << 7)
> 
> /**
>  * INIT request/reply flags
> @@ -575,6 +577,7 @@ enum fuse_opcode {
> FUSE_REMOVEMAPPING = 49,
> FUSE_SYNCFS = 50,
> FUSE_TMPFILE = 51,
> + FUSE_OPEN_ATOMIC = 52,
> 
> /* CUSE specific operations */
> CUSE_INIT = 4096,
> -- 
> 2.39.2
> 
> 


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v9 4/7] vfs: Optimize atomic_open() on positive dentry
  2023-09-20 17:34 ` [PATCH v9 4/7] vfs: Optimize " Bernd Schubert
  2023-09-21  6:16   ` Amir Goldstein
@ 2023-10-11  7:17   ` Dan Carpenter
  2023-10-17 14:25     ` Bernd Schubert
  1 sibling, 1 reply; 23+ messages in thread
From: Dan Carpenter @ 2023-10-11  7:17 UTC (permalink / raw)
  To: oe-kbuild, Bernd Schubert, linux-fsdevel
  Cc: lkp, oe-kbuild-all, bernd.schubert, miklos, dsingh,
	Bernd Schubert, Christian Brauner, Al Viro

Hi Bernd,

kernel test robot noticed the following build warnings:

https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Bernd-Schubert/fuse-rename-fuse_create_open/20230921-013805
base:   https://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping.git for-next
patch link:    https://lore.kernel.org/r/20230920173445.3943581-5-bschubert%40ddn.com
patch subject: [PATCH v9 4/7] vfs: Optimize atomic_open() on positive dentry
config: i386-randconfig-141-20230927 (https://download.01.org/0day-ci/archive/20231011/202310111259.WGjXat6p-lkp@intel.com/config)
compiler: gcc-7 (Ubuntu 7.5.0-6ubuntu2) 7.5.0
reproduce: (https://download.01.org/0day-ci/archive/20231011/202310111259.WGjXat6p-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
| Closes: https://lore.kernel.org/r/202310111259.WGjXat6p-lkp@intel.com/

New smatch warnings:
fs/namei.c:3418 atomic_revalidate_open() warn: variable dereferenced before check 'got_write' (see line 3414)

Old smatch warnings:
fs/namei.c:1573 lookup_dcache() warn: passing zero to 'ERR_PTR'
fs/namei.c:1658 lookup_fast() warn: passing zero to 'ERR_PTR'
fs/namei.c:2189 hash_name() error: uninitialized symbol 'bdata'.
fs/namei.c:2600 __kern_path_locked() warn: inconsistent returns '&path->dentry->d_inode->i_rwsem'.
fs/namei.c:3480 lookup_open() error: uninitialized symbol 'error'.

vim +/got_write +3418 fs/namei.c

0bb53ce89df211 Bernd Schubert 2023-09-20  3391  static struct dentry *atomic_revalidate_open(struct dentry *dentry,
0bb53ce89df211 Bernd Schubert 2023-09-20  3392  					     struct nameidata *nd,
0bb53ce89df211 Bernd Schubert 2023-09-20  3393  					     struct file *file,
0bb53ce89df211 Bernd Schubert 2023-09-20  3394  					     const struct open_flags *op,
0bb53ce89df211 Bernd Schubert 2023-09-20  3395  					     bool *got_write)
0bb53ce89df211 Bernd Schubert 2023-09-20  3396  {
0bb53ce89df211 Bernd Schubert 2023-09-20  3397  	struct mnt_idmap *idmap;
0bb53ce89df211 Bernd Schubert 2023-09-20  3398  	struct dentry *dir = nd->path.dentry;
0bb53ce89df211 Bernd Schubert 2023-09-20  3399  	struct inode *dir_inode = dir->d_inode;
0bb53ce89df211 Bernd Schubert 2023-09-20  3400  	int open_flag = op->open_flag;
0bb53ce89df211 Bernd Schubert 2023-09-20  3401  	umode_t mode = op->mode;
0bb53ce89df211 Bernd Schubert 2023-09-20  3402  
0bb53ce89df211 Bernd Schubert 2023-09-20  3403  	if (unlikely(IS_DEADDIR(dir_inode)))
0bb53ce89df211 Bernd Schubert 2023-09-20  3404  		return ERR_PTR(-ENOENT);
0bb53ce89df211 Bernd Schubert 2023-09-20  3405  
0bb53ce89df211 Bernd Schubert 2023-09-20  3406  	file->f_mode &= ~FMODE_CREATED;
0bb53ce89df211 Bernd Schubert 2023-09-20  3407  
0bb53ce89df211 Bernd Schubert 2023-09-20  3408  	if (unlikely(open_flag & O_CREAT)) {
0bb53ce89df211 Bernd Schubert 2023-09-20  3409  		WARN_ON(1);
0bb53ce89df211 Bernd Schubert 2023-09-20  3410  		return ERR_PTR(-EINVAL);
0bb53ce89df211 Bernd Schubert 2023-09-20  3411  	}
0bb53ce89df211 Bernd Schubert 2023-09-20  3412  
0bb53ce89df211 Bernd Schubert 2023-09-20  3413  	if (open_flag & (O_TRUNC | O_WRONLY | O_RDWR))
0bb53ce89df211 Bernd Schubert 2023-09-20 @3414  		*got_write = !mnt_want_write(nd->path.mnt);

Dereferenced

0bb53ce89df211 Bernd Schubert 2023-09-20  3415  	else
0bb53ce89df211 Bernd Schubert 2023-09-20  3416  		*got_write = false;

Here too.

0bb53ce89df211 Bernd Schubert 2023-09-20  3417  
0bb53ce89df211 Bernd Schubert 2023-09-20 @3418  	if (!got_write)

Checked too late.  But I think maybe it was supposed to check
if (!*got_write)?

0bb53ce89df211 Bernd Schubert 2023-09-20  3419  		open_flag &= ~O_TRUNC;
0bb53ce89df211 Bernd Schubert 2023-09-20  3420  
0bb53ce89df211 Bernd Schubert 2023-09-20  3421  	inode_lock_shared(dir->d_inode);
0bb53ce89df211 Bernd Schubert 2023-09-20  3422  	dentry = atomic_open(nd, dentry, file, open_flag, mode);

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v9 2/7] fuse: introduce atomic open
  2023-10-02 22:42     ` Bernd Schubert
@ 2023-10-12 20:44       ` Bernd Schubert
  0 siblings, 0 replies; 23+ messages in thread
From: Bernd Schubert @ 2023-10-12 20:44 UTC (permalink / raw)
  To: Bernd Schubert, Yuan Yao
  Cc: linux-fsdevel@vger.kernel.org, miklos@szeredi.hu,
	Dharmendra Singh, Horst Birthelmer, Christian Brauner, Al Viro

Hi Yuan,

>>
>> My concernn is: what is the expected behavior for opening a symbolic
>> link, both on the kernel side and the server side? Is it possible for
>> the fuse server to return the dentry containing the inode of the link
>> destination instead of the inode of the symbolic link itself?
> 
> nfs_atomic_open handles -ELOOP. Possibly fuse server needs to check for
> O_NOFOLLOW, but I'm not sure. Will look into that in the morning.
> 

thanks again for testing atomic open and reporting the symlink issue. 
And sorry for the delay.

Actually interesting that it was not caught by xfstests (or I somehow 
didn't notice) Horst (in CC) is currently adding fuse specific tests - 
we will catch it in the future. Will also check for existing symlink 
tests - this is actually not fuse specific...

I just pushed an initial fix to

https://github.com/bsbernd/linux/tree/atomic-open-for-6.5-v10
https://github.com/libfuse/libfuse/pull/813

I'm going to add further changes for symlinks, though. Right now atomic 
open just reports -ELOOP and then falls back to fuse_create_open, which 
sends a lookup - atomic_open now adds in overhead - the opposite of what 
I want. I had tried to return the attributes and the error for 
atomic_open, but that is not accepted in fuse_dev_do_write(). It would 
be possible to hack that function with checks for FUSE_OPEN_ATOMIC, but 
it would be rather hackish. So we need to extend fuse_entry_out with the 
error code, or add a new struct.

Thanks,
Bernd

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v9 4/7] vfs: Optimize atomic_open() on positive dentry
  2023-10-11  7:17   ` Dan Carpenter
@ 2023-10-17 14:25     ` Bernd Schubert
  0 siblings, 0 replies; 23+ messages in thread
From: Bernd Schubert @ 2023-10-17 14:25 UTC (permalink / raw)
  To: Dan Carpenter, oe-kbuild@lists.linux.dev,
	linux-fsdevel@vger.kernel.org
  Cc: lkp@intel.com, oe-kbuild-all@lists.linux.dev,
	bernd.schubert@fastmail.fm, miklos@szeredi.hu, Dharmendra Singh,
	Christian Brauner, Al Viro, Yuan Yao

On 10/11/23 09:17, Dan Carpenter wrote:
> Hi Bernd,
> 
> kernel test robot noticed the following build warnings:
> 
> https://git-scm.com/docs/git-format-patch#_base_tree_information]
> 
> url:    https://github.com/intel-lab-lkp/linux/commits/Bernd-Schubert/fuse-rename-fuse_create_open/20230921-013805
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping.git for-next
> patch link:    https://lore.kernel.org/r/20230920173445.3943581-5-bschubert%40ddn.com
> patch subject: [PATCH v9 4/7] vfs: Optimize atomic_open() on positive dentry
> config: i386-randconfig-141-20230927 (https://download.01.org/0day-ci/archive/20231011/202310111259.WGjXat6p-lkp@intel.com/config)
> compiler: gcc-7 (Ubuntu 7.5.0-6ubuntu2) 7.5.0
> reproduce: (https://download.01.org/0day-ci/archive/20231011/202310111259.WGjXat6p-lkp@intel.com/reproduce)
> 
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <lkp@intel.com>
> | Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
> | Closes: https://lore.kernel.org/r/202310111259.WGjXat6p-lkp@intel.com/
> 
> New smatch warnings:
> fs/namei.c:3418 atomic_revalidate_open() warn: variable dereferenced before check 'got_write' (see line 3414)
> 
> Old smatch warnings:
> fs/namei.c:1573 lookup_dcache() warn: passing zero to 'ERR_PTR'
> fs/namei.c:1658 lookup_fast() warn: passing zero to 'ERR_PTR'
> fs/namei.c:2189 hash_name() error: uninitialized symbol 'bdata'.
> fs/namei.c:2600 __kern_path_locked() warn: inconsistent returns '&path->dentry->d_inode->i_rwsem'.
> fs/namei.c:3480 lookup_open() error: uninitialized symbol 'error'.
> 
> vim +/got_write +3418 fs/namei.c

Thanks, fixed in my v10 branch,.


^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2023-10-17 14:27 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-09-20 17:34 [PATCH v9 0/7] fuse: full atomic open and atomic-open-revalidate Bernd Schubert
2023-09-20 17:34 ` [PATCH v9 1/7] fuse: rename fuse_create_open Bernd Schubert
2023-09-20 17:34 ` [PATCH v9 2/7] fuse: introduce atomic open Bernd Schubert
     [not found]   ` <7616CA3C-312F-4F9F-9BB3-903D3A77289B@chromium.org>
2023-10-02 22:42     ` Bernd Schubert
2023-10-12 20:44       ` Bernd Schubert
2023-10-03  3:38   ` Yuan Yao
2023-09-20 17:34 ` [PATCH v9 3/7] [RFC] Allow atomic_open() on positive dentry Bernd Schubert
2023-09-21  6:20   ` Amir Goldstein
2023-09-20 17:34 ` [PATCH v9 4/7] vfs: Optimize " Bernd Schubert
2023-09-21  6:16   ` Amir Goldstein
2023-09-21  8:09     ` Bernd Schubert
2023-09-21  8:29       ` Amir Goldstein
2023-09-21  9:16         ` Bernd Schubert
2023-10-11  7:17   ` Dan Carpenter
2023-10-17 14:25     ` Bernd Schubert
2023-09-20 17:34 ` [PATCH v9 5/7] fuse: Revalidate positive entries in fuse_atomic_open Bernd Schubert
2023-09-20 17:34 ` [PATCH v9 6/7] fuse: Return D_REVALIDATE_ATOMIC for cached dentries Bernd Schubert
2023-09-20 17:34 ` [PATCH v9 7/7] fuse: Avoid code duplication in atomic open Bernd Schubert
2023-09-21  9:33 ` [PATCH v9 0/7] fuse: full atomic open and atomic-open-revalidate Amir Goldstein
2023-09-21 11:59   ` Bernd Schubert
2023-09-21 14:24     ` Amir Goldstein
2023-09-21 14:44       ` Bernd Schubert
2023-09-29 11:34       ` Amir Goldstein

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).