* [PATCH] fs: revamp iput()
[not found] <20250827-kraut-anekdote-35789fddbb0b@brauner>
@ 2025-08-27 16:24 ` Mateusz Guzik
2025-08-30 15:54 ` Mateusz Guzik
0 siblings, 1 reply; 5+ messages in thread
From: Mateusz Guzik @ 2025-08-27 16:24 UTC (permalink / raw)
To: brauner
Cc: viro, jack, linux-kernel, linux-fsdevel, josef, kernel-team,
amir73il, linux-btrfs, linux-ext4, linux-xfs, Mateusz Guzik
The material change is I_DIRTY_TIME handling without a spurious ref
acquire/release cycle.
While here a bunch of smaller changes:
1. predict there is an inode -- bpftrace suggests one is passed vast
majority of the time
2. convert BUG_ON into VFS_BUG_ON_INODE
3. assert on ->i_count
4. assert ->i_lock is not held
5. flip the order of I_DIRTY_TIME and nlink count checks as the former
is less likely to be true
I verified atomic_read(&inode->i_count) does not show up in asm if
debug is disabled.
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
---
The routine kept annoying me, so here is a further revised variant.
I verified this compiles, but I still cannot runtime test. I'm sorry for
that. My signed-off is conditional on a good samaritan making sure it
works :)
diff compared to the thing I sent "informally":
- if (unlikely(!inode))
- asserts
- slightly reworded iput_final commentary
- unlikely() on the second I_DIRTY_TIME check
Given the revamp I think it makes sense to attribute the change to me,
hence a "proper" mail.
The thing surviving from the submission by Josef is:
+ if (atomic_add_unless(&inode->i_count, -1, 1))
+ return;
And of course he is the one who brought up the spurious refcount trip in
the first place.
I'm happy with Reported-by, Co-developed-by or whatever other credit
as you guys see fit.
That aside I think it would be nice if NULL inodes passed to iput
became illegal, but that's a different story for another day.
fs/inode.c | 46 +++++++++++++++++++++++++++++++++++-----------
1 file changed, 35 insertions(+), 11 deletions(-)
diff --git a/fs/inode.c b/fs/inode.c
index 01ebdc40021e..01a554e11279 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -1908,20 +1908,44 @@ static void iput_final(struct inode *inode)
*/
void iput(struct inode *inode)
{
- if (!inode)
+ if (unlikely(!inode))
return;
- BUG_ON(inode->i_state & I_CLEAR);
+
retry:
- if (atomic_dec_and_lock(&inode->i_count, &inode->i_lock)) {
- if (inode->i_nlink && (inode->i_state & I_DIRTY_TIME)) {
- atomic_inc(&inode->i_count);
- spin_unlock(&inode->i_lock);
- trace_writeback_lazytime_iput(inode);
- mark_inode_dirty_sync(inode);
- goto retry;
- }
- iput_final(inode);
+ lockdep_assert_not_held(&inode->i_lock);
+ VFS_BUG_ON_INODE(inode->i_state & I_CLEAR, inode);
+ /*
+ * Note this assert is technically racy as if the count is bogusly
+ * equal to one, then two CPUs racing to further drop it can both
+ * conclude it's fine.
+ */
+ VFS_BUG_ON_INODE(atomic_read(&inode->i_count) < 1, inode);
+
+ if (atomic_add_unless(&inode->i_count, -1, 1))
+ return;
+
+ if ((inode->i_state & I_DIRTY_TIME) && inode->i_nlink) {
+ trace_writeback_lazytime_iput(inode);
+ mark_inode_dirty_sync(inode);
+ goto retry;
}
+
+ spin_lock(&inode->i_lock);
+ if (unlikely((inode->i_state & I_DIRTY_TIME) && inode->i_nlink)) {
+ spin_unlock(&inode->i_lock);
+ goto retry;
+ }
+
+ if (!atomic_dec_and_test(&inode->i_count)) {
+ spin_unlock(&inode->i_lock);
+ return;
+ }
+
+ /*
+ * iput_final() drops ->i_lock, we can't assert on it as the inode may
+ * be deallocated by the time the call returns.
+ */
+ iput_final(inode);
}
EXPORT_SYMBOL(iput);
--
2.43.0
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH] fs: revamp iput()
2025-08-27 16:24 ` [PATCH] fs: revamp iput() Mateusz Guzik
@ 2025-08-30 15:54 ` Mateusz Guzik
2025-09-01 8:50 ` Jan Kara
2025-09-01 10:41 ` Christian Brauner
0 siblings, 2 replies; 5+ messages in thread
From: Mateusz Guzik @ 2025-08-30 15:54 UTC (permalink / raw)
To: brauner
Cc: viro, jack, linux-kernel, linux-fsdevel, josef, kernel-team,
amir73il, linux-btrfs, linux-ext4, linux-xfs
I'm writing a long response to this series, in the meantime I noticed
this bit landed in
https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git/commit/?h=vfs-6.18.inode.refcount.preliminaries&id=3cba19f6a00675fbc2af0987dfc90e216e6cfb74
but with some whitespace issues in comments -- they are indented with
spaces instead of tabs after the opening line.
I verified the mail I sent does not have it, so I'm guessing this was
copy-pasted?
Tabing them by hand does the trick, below is my copy-paste as proof,
please indent by hand in your editor ;)
diff --git a/fs/inode.c b/fs/inode.c
index 2db680a37235..fe4868e2a954 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -1915,10 +1915,10 @@ void iput(struct inode *inode)
lockdep_assert_not_held(&inode->i_lock);
VFS_BUG_ON_INODE(inode->i_state & I_CLEAR, inode);
/*
- * Note this assert is technically racy as if the count is bogusly
- * equal to one, then two CPUs racing to further drop it can both
- * conclude it's fine.
- */
+ * Note this assert is technically racy as if the count is bogusly
+ * equal to one, then two CPUs racing to further drop it can both
+ * conclude it's fine.
+ */
VFS_BUG_ON_INODE(atomic_read(&inode->i_count) < 1, inode);
if (atomic_add_unless(&inode->i_count, -1, 1))
@@ -1942,9 +1942,9 @@ void iput(struct inode *inode)
}
/*
- * iput_final() drops ->i_lock, we can't assert on it as the inode may
- * be deallocated by the time the call returns.
- */
+ * iput_final() drops ->i_lock, we can't assert on it as the inode may
+ * be deallocated by the time the call returns.
+ */
iput_final(inode);
}
EXPORT_SYMBOL(iput);
While here, vim told me about spaces instead of tabs in 2 more spots
in the file. Again to show the lines:
diff --git a/fs/inode.c b/fs/inode.c
index 2db680a37235..833de5457a06 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -550,11 +550,11 @@ static void __inode_add_lru(struct inode *inode,
bool rotate)
struct wait_queue_head *inode_bit_waitqueue(struct wait_bit_queue_entry *wqe,
struct inode *inode, u32 bit)
{
- void *bit_address;
+ void *bit_address;
- bit_address = inode_state_wait_address(inode, bit);
- init_wait_var_entry(wqe, bit_address, 0);
- return __var_waitqueue(bit_address);
+ bit_address = inode_state_wait_address(inode, bit);
+ init_wait_var_entry(wqe, bit_address, 0);
+ return __var_waitqueue(bit_address);
}
EXPORT_SYMBOL(inode_bit_waitqueue);
@@ -2938,7 +2938,7 @@ EXPORT_SYMBOL(mode_strip_sgid);
*/
void dump_inode(struct inode *inode, const char *reason)
{
- pr_warn("%s encountered for inode %px", reason, inode);
+ pr_warn("%s encountered for inode %px", reason, inode);
}
EXPORT_SYMBOL(dump_inode);
Christian, I think it would be the most expedient if you just made
changes on your own with whatever commit message you see fit. No need
to mention I brought this up. If you insist I can send a patch.
On Wed, Aug 27, 2025 at 6:24 PM Mateusz Guzik <mjguzik@gmail.com> wrote:
>
> The material change is I_DIRTY_TIME handling without a spurious ref
> acquire/release cycle.
>
> While here a bunch of smaller changes:
> 1. predict there is an inode -- bpftrace suggests one is passed vast
> majority of the time
> 2. convert BUG_ON into VFS_BUG_ON_INODE
> 3. assert on ->i_count
> 4. assert ->i_lock is not held
> 5. flip the order of I_DIRTY_TIME and nlink count checks as the former
> is less likely to be true
>
> I verified atomic_read(&inode->i_count) does not show up in asm if
> debug is disabled.
>
> Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
> ---
>
> The routine kept annoying me, so here is a further revised variant.
>
> I verified this compiles, but I still cannot runtime test. I'm sorry for
> that. My signed-off is conditional on a good samaritan making sure it
> works :)
>
> diff compared to the thing I sent "informally":
> - if (unlikely(!inode))
> - asserts
> - slightly reworded iput_final commentary
> - unlikely() on the second I_DIRTY_TIME check
>
> Given the revamp I think it makes sense to attribute the change to me,
> hence a "proper" mail.
>
> The thing surviving from the submission by Josef is:
> + if (atomic_add_unless(&inode->i_count, -1, 1))
> + return;
>
> And of course he is the one who brought up the spurious refcount trip in
> the first place.
>
> I'm happy with Reported-by, Co-developed-by or whatever other credit
> as you guys see fit.
>
> That aside I think it would be nice if NULL inodes passed to iput
> became illegal, but that's a different story for another day.
>
> fs/inode.c | 46 +++++++++++++++++++++++++++++++++++-----------
> 1 file changed, 35 insertions(+), 11 deletions(-)
>
> diff --git a/fs/inode.c b/fs/inode.c
> index 01ebdc40021e..01a554e11279 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -1908,20 +1908,44 @@ static void iput_final(struct inode *inode)
> */
> void iput(struct inode *inode)
> {
> - if (!inode)
> + if (unlikely(!inode))
> return;
> - BUG_ON(inode->i_state & I_CLEAR);
> +
> retry:
> - if (atomic_dec_and_lock(&inode->i_count, &inode->i_lock)) {
> - if (inode->i_nlink && (inode->i_state & I_DIRTY_TIME)) {
> - atomic_inc(&inode->i_count);
> - spin_unlock(&inode->i_lock);
> - trace_writeback_lazytime_iput(inode);
> - mark_inode_dirty_sync(inode);
> - goto retry;
> - }
> - iput_final(inode);
> + lockdep_assert_not_held(&inode->i_lock);
> + VFS_BUG_ON_INODE(inode->i_state & I_CLEAR, inode);
> + /*
> + * Note this assert is technically racy as if the count is bogusly
> + * equal to one, then two CPUs racing to further drop it can both
> + * conclude it's fine.
> + */
> + VFS_BUG_ON_INODE(atomic_read(&inode->i_count) < 1, inode);
> +
> + if (atomic_add_unless(&inode->i_count, -1, 1))
> + return;
> +
> + if ((inode->i_state & I_DIRTY_TIME) && inode->i_nlink) {
> + trace_writeback_lazytime_iput(inode);
> + mark_inode_dirty_sync(inode);
> + goto retry;
> }
> +
> + spin_lock(&inode->i_lock);
> + if (unlikely((inode->i_state & I_DIRTY_TIME) && inode->i_nlink)) {
> + spin_unlock(&inode->i_lock);
> + goto retry;
> + }
> +
> + if (!atomic_dec_and_test(&inode->i_count)) {
> + spin_unlock(&inode->i_lock);
> + return;
> + }
> +
> + /*
> + * iput_final() drops ->i_lock, we can't assert on it as the inode may
> + * be deallocated by the time the call returns.
> + */
> + iput_final(inode);
> }
> EXPORT_SYMBOL(iput);
>
> --
> 2.43.0
>
--
Mateusz Guzik <mjguzik gmail.com>
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH] fs: revamp iput()
2025-08-30 15:54 ` Mateusz Guzik
@ 2025-09-01 8:50 ` Jan Kara
2025-09-01 10:39 ` Christian Brauner
2025-09-01 10:41 ` Christian Brauner
1 sibling, 1 reply; 5+ messages in thread
From: Jan Kara @ 2025-09-01 8:50 UTC (permalink / raw)
To: Mateusz Guzik
Cc: brauner, viro, jack, linux-kernel, linux-fsdevel, josef,
kernel-team, amir73il, linux-btrfs, linux-ext4, linux-xfs
On Sat 30-08-25 17:54:35, Mateusz Guzik wrote:
> I'm writing a long response to this series, in the meantime I noticed
> this bit landed in
> https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git/commit/?h=vfs-6.18.inode.refcount.preliminaries&id=3cba19f6a00675fbc2af0987dfc90e216e6cfb74
> but with some whitespace issues in comments -- they are indented with
> spaces instead of tabs after the opening line.
Interesting. I didn't see an email about inclusion. Anyway, the change
looks good to me so Christian, feel free to add:
Reviewed-by: Jan Kara <jack@suse.cz>
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] fs: revamp iput()
2025-09-01 8:50 ` Jan Kara
@ 2025-09-01 10:39 ` Christian Brauner
0 siblings, 0 replies; 5+ messages in thread
From: Christian Brauner @ 2025-09-01 10:39 UTC (permalink / raw)
To: Jan Kara
Cc: Mateusz Guzik, viro, linux-kernel, linux-fsdevel, josef,
kernel-team, amir73il, linux-btrfs, linux-ext4, linux-xfs
On Mon, Sep 01, 2025 at 10:50:59AM +0200, Jan Kara wrote:
> On Sat 30-08-25 17:54:35, Mateusz Guzik wrote:
> > I'm writing a long response to this series, in the meantime I noticed
> > this bit landed in
> > https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git/commit/?h=vfs-6.18.inode.refcount.preliminaries&id=3cba19f6a00675fbc2af0987dfc90e216e6cfb74
> > but with some whitespace issues in comments -- they are indented with
> > spaces instead of tabs after the opening line.
>
> Interesting. I didn't see an email about inclusion. Anyway, the change
Sorry, that waas my bad. I talked with Josef off-list and told him that
I would apply Mateusz suggestions with his CdB and SoB added. I forgot
to repeat that on the list.
> looks good to me so Christian, feel free to add:
>
> Reviewed-by: Jan Kara <jack@suse.cz>
Thanks!
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] fs: revamp iput()
2025-08-30 15:54 ` Mateusz Guzik
2025-09-01 8:50 ` Jan Kara
@ 2025-09-01 10:41 ` Christian Brauner
1 sibling, 0 replies; 5+ messages in thread
From: Christian Brauner @ 2025-09-01 10:41 UTC (permalink / raw)
To: Mateusz Guzik
Cc: viro, jack, linux-kernel, linux-fsdevel, josef, kernel-team,
amir73il, linux-btrfs, linux-ext4, linux-xfs
> Christian, I think it would be the most expedient if you just made
Ok, thanks.
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2025-09-01 10:41 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20250827-kraut-anekdote-35789fddbb0b@brauner>
2025-08-27 16:24 ` [PATCH] fs: revamp iput() Mateusz Guzik
2025-08-30 15:54 ` Mateusz Guzik
2025-09-01 8:50 ` Jan Kara
2025-09-01 10:39 ` Christian Brauner
2025-09-01 10:41 ` Christian Brauner
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).