From: Dave Chinner <david@fromorbit.com>
To: linux-fsdevel@vger.kernel.org
Cc: linux-xfs@vger.kernel.org, linux-bcachefs@vger.kernel.org,
kent.overstreet@linux.dev, torvalds@linux-foundation.org
Subject: [PATCH 2/7] vfs: add inode iteration superblock method
Date: Wed, 2 Oct 2024 11:33:19 +1000
Message-ID: <20241002014017.3801899-3-david@fromorbit.com>
In-Reply-To: <20241002014017.3801899-1-david@fromorbit.com>
From: Dave Chinner <dchinner@redhat.com>

Add a new superblock method for iterating all cached inodes in the
inode cache.

This will be used to replace the explicit sb->s_inodes iteration,
and the caller will supply a callback function and a private data
pointer that gets passed to the callback along with each inode that
is iterated.

There are two iteration functions provided. The first is the
interface that everyone should be using - it provides a valid,
unlocked and referenced inode that any inode operation (including
blocking operations) is allowed on. The iterator infrastructure is
responsible for lifecycle management, hence the subsystem callback
only needs to implement the operation it wants to perform on all
inodes.

The second iterator interface is the unsafe variant for internal VFS
use only. It simply iterates all VFS inodes without guaranteeing
any state or taking references. This iteration is done under an RCU
read lock to ensure that the VFS inode is not freed from under
the callback. If the operation wishes to block, it must drop the
RCU context after guaranteeing that the inode will not get freed.

This unsafe iteration mechanism is needed for operations that need
tight control over the state of the inodes they need to operate on.

This mechanism allows the existing sb->s_inodes iteration models
to be maintained, allowing a generic implementation for iterating
all cached inodes on the superblock to be provided.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
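For readers who want to see the callback contract in isolation, here is
a minimal userspace sketch of the control flow in super_iter_inodes().
It is not kernel code: struct mock_inode, the MOCK_I_* bits,
mock_iter_inodes() and both callbacks are hypothetical names made up for
illustration, and the locking, __iget()/iput() and cond_resched() calls
are left out. Only the INO_ITER_* values are taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Return values and flag copied from the patch. */
#define INO_ITER_DONE		0
#define INO_ITER_ABORT		1
#define INO_ITER_REFERENCED	(1U << 0)

/* Hypothetical stand-ins for struct inode and the i_state bits. */
#define MOCK_I_NEW	(1U << 0)
#define MOCK_I_FREEING	(1U << 1)

struct mock_inode {
	unsigned long	ino;
	unsigned int	state;
	int		count;	/* models i_count */
};

typedef int (*mock_iter_fn)(struct mock_inode *inode, void *priv);

/*
 * Userspace model of the loop in super_iter_inodes(). Locking and
 * reference counting are deliberately omitted.
 */
static int mock_iter_inodes(struct mock_inode *inodes, size_t nr,
			    mock_iter_fn iter_fn, void *priv, int flags)
{
	int ret = 0;
	size_t i;

	for (i = 0; i < nr; i++) {
		struct mock_inode *inode = &inodes[i];

		/* Skip inodes being set up or torn down. */
		if (inode->state & (MOCK_I_NEW | MOCK_I_FREEING))
			continue;
		/* Skip unreferenced inodes if the caller asked for that. */
		if ((flags & INO_ITER_REFERENCED) && !inode->count)
			continue;

		ret = iter_fn(inode, priv);
		if (ret == INO_ITER_ABORT) {
			ret = 0;	/* an early stop is not an error */
			break;
		}
		if (ret < 0)
			break;		/* propagate the callback's error */
	}
	return ret;
}

/* Example callback: count inodes and stop once a limit is reached. */
struct count_ctx {
	int seen;
	int limit;
};

static int count_cb(struct mock_inode *inode, void *priv)
{
	struct count_ctx *ctx = priv;

	(void)inode;
	if (++ctx->seen >= ctx->limit)
		return INO_ITER_ABORT;
	return INO_ITER_DONE;
}

/* Example callback: fail on one specific inode number. */
static int fail_on_ino_cb(struct mock_inode *inode, void *priv)
{
	unsigned long bad = *(unsigned long *)priv;

	return inode->ino == bad ? -5 : INO_ITER_DONE;	/* -5 models -EIO */
}
```

The point of the sketch is the three-way return convention: 0 continues,
INO_ITER_ABORT stops the walk and is reported to the caller as success,
and a negative value stops the walk and is returned unchanged.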
fs/internal.h | 2 +
fs/super.c | 105 +++++++++++++++++++++++++++++++++++++++++++++
include/linux/fs.h | 12 ++++++
3 files changed, 119 insertions(+)
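The abort-and-restart pattern that the super_iter_inodes_unsafe()
kerneldoc describes can be sketched in userspace as follows. This is
only an illustration under stated assumptions: struct mock_uinode,
mock_iter_inodes_unsafe(), find_work_cb() and mock_process_all() are
hypothetical names, a plain flag write stands in for the blocking work,
and nothing here models RCU or the list lock. In the kernel the callback
would also have to pin the inode before the caller leaves the atomic
context.

```c
#include <assert.h>
#include <stddef.h>

/* Return values copied from the patch. */
#define INO_ITER_DONE	0
#define INO_ITER_ABORT	1

/* Hypothetical stand-in for struct inode. */
struct mock_uinode {
	unsigned long	ino;
	int		needs_work;
};

typedef int (*mock_uiter_fn)(struct mock_uinode *inode, void *priv);

/* Model of the unsafe walk: no state checks, no references taken. */
static void mock_iter_inodes_unsafe(struct mock_uinode *inodes, size_t nr,
				    mock_uiter_fn iter_fn, void *priv)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		if (iter_fn(&inodes[i], priv) == INO_ITER_ABORT)
			break;
	}
}

struct defer_ctx {
	struct mock_uinode *deferred;
};

/*
 * The callback must not block, so it only records the first inode that
 * needs work and aborts the walk.
 */
static int find_work_cb(struct mock_uinode *inode, void *priv)
{
	struct defer_ctx *ctx = priv;

	if (!inode->needs_work)
		return INO_ITER_DONE;
	ctx->deferred = inode;
	return INO_ITER_ABORT;
}

/*
 * High level loop: do the deferred work outside the walk, then restart.
 * Forward progress comes from clearing needs_work on every pass.
 * Returns the number of walks that were needed.
 */
static int mock_process_all(struct mock_uinode *inodes, size_t nr)
{
	int passes = 0;

	for (;;) {
		struct defer_ctx ctx = { NULL };

		mock_iter_inodes_unsafe(inodes, nr, find_work_cb, &ctx);
		passes++;
		if (!ctx.deferred)
			break;
		/* Stand-in for the blocking operation on the inode. */
		ctx.deferred->needs_work = 0;
	}
	return passes;
}
```

Each restart rescans from the head of the list, so this naive form costs
one extra walk per deferred inode; a real caller would likely want to
batch deferred inodes per pass.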
diff --git a/fs/internal.h b/fs/internal.h
index 37749b429e80..7039d13980c6 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -127,6 +127,8 @@ struct super_block *user_get_super(dev_t, bool excl);
void put_super(struct super_block *sb);
extern bool mount_capable(struct fs_context *);
int sb_init_dio_done_wq(struct super_block *sb);
+void super_iter_inodes_unsafe(struct super_block *sb, ino_iter_fn iter_fn,
+ void *private_data);
/*
* Prepare superblock for changing its read-only state (i.e., either remount
diff --git a/fs/super.c b/fs/super.c
index a16e6a6342e0..20a9446d943a 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -167,6 +167,111 @@ static void super_wake(struct super_block *sb, unsigned int flag)
wake_up_var(&sb->s_flags);
}
+/**
+ * super_iter_inodes - iterate all the cached inodes on a superblock
+ * @sb: superblock to iterate
+ * @iter_fn: callback to run on every inode found.
+ *
+ * This function iterates all cached inodes on a superblock that are not in
+ * the process of being initialised or torn down. It will run @iter_fn() with
+ * a valid, referenced inode, so it is safe for the caller to do anything
+ * it wants with the inode except drop the reference the iterator holds.
+ *
+ */
+int super_iter_inodes(struct super_block *sb, ino_iter_fn iter_fn,
+ void *private_data, int flags)
+{
+ struct inode *inode, *old_inode = NULL;
+ int ret = 0;
+
+ spin_lock(&sb->s_inode_list_lock);
+ list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
+ spin_lock(&inode->i_lock);
+ if (inode->i_state & (I_NEW | I_FREEING | I_WILL_FREE)) {
+ spin_unlock(&inode->i_lock);
+ continue;
+ }
+
+ /*
+ * Skip over zero refcount inode if the caller only wants
+ * referenced inodes to be iterated.
+ */
+ if ((flags & INO_ITER_REFERENCED) &&
+ !atomic_read(&inode->i_count)) {
+ spin_unlock(&inode->i_lock);
+ continue;
+ }
+
+ __iget(inode);
+ spin_unlock(&inode->i_lock);
+ spin_unlock(&sb->s_inode_list_lock);
+ iput(old_inode);
+
+ ret = iter_fn(inode, private_data);
+
+ old_inode = inode;
+ if (ret == INO_ITER_ABORT) {
+ ret = 0;
+ break;
+ }
+ if (ret < 0)
+ break;
+
+ cond_resched();
+ spin_lock(&sb->s_inode_list_lock);
+ }
+ spin_unlock(&sb->s_inode_list_lock);
+ iput(old_inode);
+ return ret;
+}
+
+/**
+ * super_iter_inodes_unsafe - unsafely iterate all the inodes on a superblock
+ * @sb: superblock to iterate
+ * @iter_fn: callback to run on every inode found.
+ *
+ * This is almost certainly not the function you want. It is for internal VFS
+ * operations only. Please use super_iter_inodes() instead. If you must use
+ * this function, please add a comment explaining why it is necessary and the
+ * locking that makes it safe to use this function.
+ *
+ * This function iterates all cached inodes on a superblock that are attached to
+ * the superblock. It will pass each inode to @iter_fn unlocked and without
+ * having performed any existence checks on it.
+ *
+ * @iter_fn must perform all necessary state checks on the inode itself to
+ * ensure safe operation. super_iter_inodes_unsafe() only guarantees that the
+ * inode exists and won't be freed whilst the callback is running.
+ *
+ * @iter_fn must not block. It is run in an atomic context that is not allowed
+ * to sleep to provide the inode existence guarantees. If the callback needs to
+ * do blocking operations it needs to track the inode itself and defer those
+ * operations until after the iteration completes.
+ *
+ * @iter_fn must provide conditional reschedule checks itself. If rescheduling
+ * or deferred processing is needed, it must return INO_ITER_ABORT to return to
+ * the high level function to perform those operations. It can then restart the
+ * iteration again. The high level code must provide forwards progress
+ * guarantees if they are necessary.
+ *
+ */
+void super_iter_inodes_unsafe(struct super_block *sb, ino_iter_fn iter_fn,
+ void *private_data)
+{
+ struct inode *inode;
+ int ret;
+
+ rcu_read_lock();
+ spin_lock(&sb->s_inode_list_lock);
+ list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
+ ret = iter_fn(inode, private_data);
+ if (ret == INO_ITER_ABORT)
+ break;
+ }
+ spin_unlock(&sb->s_inode_list_lock);
+ rcu_read_unlock();
+}
+
/*
* One thing we have to be careful of with a per-sb shrinker is that we don't
* drop the last active reference to the superblock from within the shrinker.
diff --git a/include/linux/fs.h b/include/linux/fs.h
index eae5b67e4a15..0a6a462c45ab 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2213,6 +2213,18 @@ enum freeze_holder {
FREEZE_MAY_NEST = (1U << 2),
};
+/* Inode iteration callback return values */
+#define INO_ITER_DONE 0
+#define INO_ITER_ABORT 1
+
+/* Inode iteration control flags */
+#define INO_ITER_REFERENCED (1U << 0)
+#define INO_ITER_UNSAFE (1U << 1)
+
+typedef int (*ino_iter_fn)(struct inode *inode, void *priv);
+int super_iter_inodes(struct super_block *sb, ino_iter_fn iter_fn,
+ void *private_data, int flags);
+
struct super_operations {
struct inode *(*alloc_inode)(struct super_block *sb);
void (*destroy_inode)(struct inode *);
--
2.45.2