linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] fs: revamp iput()
       [not found] <20250827-kraut-anekdote-35789fddbb0b@brauner>
@ 2025-08-27 16:24 ` Mateusz Guzik
  2025-08-30 15:54   ` Mateusz Guzik
  0 siblings, 1 reply; 5+ messages in thread
From: Mateusz Guzik @ 2025-08-27 16:24 UTC (permalink / raw)
  To: brauner
  Cc: viro, jack, linux-kernel, linux-fsdevel, josef, kernel-team,
	amir73il, linux-btrfs, linux-ext4, linux-xfs, Mateusz Guzik

The material change is I_DIRTY_TIME handling without a spurious ref
acquire/release cycle.

While here a bunch of smaller changes:
1. predict there is an inode -- bpftrace suggests one is passed vast
   majority of the time
2. convert BUG_ON into VFS_BUG_ON_INODE
3. assert on ->i_count
4. assert ->i_lock is not held
5. flip the order of I_DIRTY_TIME and nlink count checks as the former
   is less likely to be true

I verified atomic_read(&inode->i_count) does not show up in asm if
debug is disabled.

Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
---

The routine kept annoying me, so here is a further revised variant.

I verified this compiles, but I still cannot runtime test. I'm sorry for
that.  My signed-off is conditional on a good samaritan making sure it
works :)

diff compared to the thing I sent "informally":
- if (unlikely(!inode))
- asserts
- slightly reworded iput_final commentary
- unlikely() on the second I_DIRTY_TIME check

Given the revamp I think it makes sense to attribute the change to me,
hence a "proper" mail.

The thing surviving from the submission by Josef is:
+       if (atomic_add_unless(&inode->i_count, -1, 1))
+               return;

And of course he is the one who brought up the spurious refcount trip in
the first place.

I'm happy with Reported-by, Co-developed-by or whatever other credit
as you guys see fit.

That aside I think it would be nice if NULL inodes passed to iput
became illegal, but that's a different story for another day.

 fs/inode.c | 46 +++++++++++++++++++++++++++++++++++-----------
 1 file changed, 35 insertions(+), 11 deletions(-)

diff --git a/fs/inode.c b/fs/inode.c
index 01ebdc40021e..01a554e11279 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -1908,20 +1908,44 @@ static void iput_final(struct inode *inode)
  */
 void iput(struct inode *inode)
 {
-	if (!inode)
+	if (unlikely(!inode))
 		return;
-	BUG_ON(inode->i_state & I_CLEAR);
+
 retry:
-	if (atomic_dec_and_lock(&inode->i_count, &inode->i_lock)) {
-		if (inode->i_nlink && (inode->i_state & I_DIRTY_TIME)) {
-			atomic_inc(&inode->i_count);
-			spin_unlock(&inode->i_lock);
-			trace_writeback_lazytime_iput(inode);
-			mark_inode_dirty_sync(inode);
-			goto retry;
-		}
-		iput_final(inode);
+	lockdep_assert_not_held(&inode->i_lock);
+	VFS_BUG_ON_INODE(inode->i_state & I_CLEAR, inode);
+	/*
+	 * Note this assert is technically racy as if the count is bogusly
+	 * equal to one, then two CPUs racing to further drop it can both
+	 * conclude it's fine.
+	 */
+	VFS_BUG_ON_INODE(atomic_read(&inode->i_count) < 1, inode);
+
+	if (atomic_add_unless(&inode->i_count, -1, 1))
+		return;
+
+	if ((inode->i_state & I_DIRTY_TIME) && inode->i_nlink) {
+		trace_writeback_lazytime_iput(inode);
+		mark_inode_dirty_sync(inode);
+		goto retry;
 	}
+
+	spin_lock(&inode->i_lock);
+	if (unlikely((inode->i_state & I_DIRTY_TIME) && inode->i_nlink)) {
+		spin_unlock(&inode->i_lock);
+		goto retry;
+	}
+
+	if (!atomic_dec_and_test(&inode->i_count)) {
+		spin_unlock(&inode->i_lock);
+		return;
+	}
+
+	/*
+	 * iput_final() drops ->i_lock, we can't assert on it as the inode may
+	 * be deallocated by the time the call returns.
+	 */
+	iput_final(inode);
 }
 EXPORT_SYMBOL(iput);
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] fs: revamp iput()
  2025-08-27 16:24 ` [PATCH] fs: revamp iput() Mateusz Guzik
@ 2025-08-30 15:54   ` Mateusz Guzik
  2025-09-01  8:50     ` Jan Kara
  2025-09-01 10:41     ` Christian Brauner
  0 siblings, 2 replies; 5+ messages in thread
From: Mateusz Guzik @ 2025-08-30 15:54 UTC (permalink / raw)
  To: brauner
  Cc: viro, jack, linux-kernel, linux-fsdevel, josef, kernel-team,
	amir73il, linux-btrfs, linux-ext4, linux-xfs

I'm writing a long response to this series, in the meantime I noticed
this bit landed in
https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git/commit/?h=vfs-6.18.inode.refcount.preliminaries&id=3cba19f6a00675fbc2af0987dfc90e216e6cfb74
but with some whitespace issues in comments -- they are indented with
spaces instead of tabs after the opening line.

I verified the mail I sent does not have it, so I'm guessing this was
copy-pasted?

Tabing them by hand does the trick, below is my copy-paste as proof,
please indent by hand in your editor ;)

diff --git a/fs/inode.c b/fs/inode.c
index 2db680a37235..fe4868e2a954 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -1915,10 +1915,10 @@ void iput(struct inode *inode)
        lockdep_assert_not_held(&inode->i_lock);
        VFS_BUG_ON_INODE(inode->i_state & I_CLEAR, inode);
        /*
-        * Note this assert is technically racy as if the count is bogusly
-        * equal to one, then two CPUs racing to further drop it can both
-        * conclude it's fine.
-        */
+        * Note this assert is technically racy as if the count is bogusly
+        * equal to one, then two CPUs racing to further drop it can both
+        * conclude it's fine.
+        */
        VFS_BUG_ON_INODE(atomic_read(&inode->i_count) < 1, inode);

        if (atomic_add_unless(&inode->i_count, -1, 1))
@@ -1942,9 +1942,9 @@ void iput(struct inode *inode)
        }

        /*
-        * iput_final() drops ->i_lock, we can't assert on it as the inode may
-        * be deallocated by the time the call returns.
-        */
+        * iput_final() drops ->i_lock, we can't assert on it as the inode may
+        * be deallocated by the time the call returns.
+        */
        iput_final(inode);
 }
 EXPORT_SYMBOL(iput);

While here, vim told me about spaces instead of tabs in 2 more spots
in the file. Again to show the lines:

diff --git a/fs/inode.c b/fs/inode.c
index 2db680a37235..833de5457a06 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -550,11 +550,11 @@ static void __inode_add_lru(struct inode *inode,
bool rotate)
 struct wait_queue_head *inode_bit_waitqueue(struct wait_bit_queue_entry *wqe,
                                            struct inode *inode, u32 bit)
 {
-        void *bit_address;
+       void *bit_address;

-        bit_address = inode_state_wait_address(inode, bit);
-        init_wait_var_entry(wqe, bit_address, 0);
-        return __var_waitqueue(bit_address);
+       bit_address = inode_state_wait_address(inode, bit);
+       init_wait_var_entry(wqe, bit_address, 0);
+       return __var_waitqueue(bit_address);
 }
 EXPORT_SYMBOL(inode_bit_waitqueue);
@@ -2938,7 +2938,7 @@ EXPORT_SYMBOL(mode_strip_sgid);
  */
 void dump_inode(struct inode *inode, const char *reason)
 {
-       pr_warn("%s encountered for inode %px", reason, inode);
+       pr_warn("%s encountered for inode %px", reason, inode);
 }

 EXPORT_SYMBOL(dump_inode);

Christian, I think it would be the most expedient if you just made
changes on your own with whatever commit message you see fit. No need
to mention I brought this up. If you insist I can send a patch.

On Wed, Aug 27, 2025 at 6:24 PM Mateusz Guzik <mjguzik@gmail.com> wrote:
>
> The material change is I_DIRTY_TIME handling without a spurious ref
> acquire/release cycle.
>
> While here a bunch of smaller changes:
> 1. predict there is an inode -- bpftrace suggests one is passed vast
>    majority of the time
> 2. convert BUG_ON into VFS_BUG_ON_INODE
> 3. assert on ->i_count
> 4. assert ->i_lock is not held
> 5. flip the order of I_DIRTY_TIME and nlink count checks as the former
>    is less likely to be true
>
> I verified atomic_read(&inode->i_count) does not show up in asm if
> debug is disabled.
>
> Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
> ---
>
> The routine kept annoying me, so here is a further revised variant.
>
> I verified this compiles, but I still cannot runtime test. I'm sorry for
> that.  My signed-off is conditional on a good samaritan making sure it
> works :)
>
> diff compared to the thing I sent "informally":
> - if (unlikely(!inode))
> - asserts
> - slightly reworded iput_final commentary
> - unlikely() on the second I_DIRTY_TIME check
>
> Given the revamp I think it makes sense to attribute the change to me,
> hence a "proper" mail.
>
> The thing surviving from the submission by Josef is:
> +       if (atomic_add_unless(&inode->i_count, -1, 1))
> +               return;
>
> And of course he is the one who brought up the spurious refcount trip in
> the first place.
>
> I'm happy with Reported-by, Co-developed-by or whatever other credit
> as you guys see fit.
>
> That aside I think it would be nice if NULL inodes passed to iput
> became illegal, but that's a different story for another day.
>
>  fs/inode.c | 46 +++++++++++++++++++++++++++++++++++-----------
>  1 file changed, 35 insertions(+), 11 deletions(-)
>
> diff --git a/fs/inode.c b/fs/inode.c
> index 01ebdc40021e..01a554e11279 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -1908,20 +1908,44 @@ static void iput_final(struct inode *inode)
>   */
>  void iput(struct inode *inode)
>  {
> -       if (!inode)
> +       if (unlikely(!inode))
>                 return;
> -       BUG_ON(inode->i_state & I_CLEAR);
> +
>  retry:
> -       if (atomic_dec_and_lock(&inode->i_count, &inode->i_lock)) {
> -               if (inode->i_nlink && (inode->i_state & I_DIRTY_TIME)) {
> -                       atomic_inc(&inode->i_count);
> -                       spin_unlock(&inode->i_lock);
> -                       trace_writeback_lazytime_iput(inode);
> -                       mark_inode_dirty_sync(inode);
> -                       goto retry;
> -               }
> -               iput_final(inode);
> +       lockdep_assert_not_held(&inode->i_lock);
> +       VFS_BUG_ON_INODE(inode->i_state & I_CLEAR, inode);
> +       /*
> +        * Note this assert is technically racy as if the count is bogusly
> +        * equal to one, then two CPUs racing to further drop it can both
> +        * conclude it's fine.
> +        */
> +       VFS_BUG_ON_INODE(atomic_read(&inode->i_count) < 1, inode);
> +
> +       if (atomic_add_unless(&inode->i_count, -1, 1))
> +               return;
> +
> +       if ((inode->i_state & I_DIRTY_TIME) && inode->i_nlink) {
> +               trace_writeback_lazytime_iput(inode);
> +               mark_inode_dirty_sync(inode);
> +               goto retry;
>         }
> +
> +       spin_lock(&inode->i_lock);
> +       if (unlikely((inode->i_state & I_DIRTY_TIME) && inode->i_nlink)) {
> +               spin_unlock(&inode->i_lock);
> +               goto retry;
> +       }
> +
> +       if (!atomic_dec_and_test(&inode->i_count)) {
> +               spin_unlock(&inode->i_lock);
> +               return;
> +       }
> +
> +       /*
> +        * iput_final() drops ->i_lock, we can't assert on it as the inode may
> +        * be deallocated by the time the call returns.
> +        */
> +       iput_final(inode);
>  }
>  EXPORT_SYMBOL(iput);
>
> --
> 2.43.0
>


-- 
Mateusz Guzik <mjguzik gmail.com>

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] fs: revamp iput()
  2025-08-30 15:54   ` Mateusz Guzik
@ 2025-09-01  8:50     ` Jan Kara
  2025-09-01 10:39       ` Christian Brauner
  2025-09-01 10:41     ` Christian Brauner
  1 sibling, 1 reply; 5+ messages in thread
From: Jan Kara @ 2025-09-01  8:50 UTC (permalink / raw)
  To: Mateusz Guzik
  Cc: brauner, viro, jack, linux-kernel, linux-fsdevel, josef,
	kernel-team, amir73il, linux-btrfs, linux-ext4, linux-xfs

On Sat 30-08-25 17:54:35, Mateusz Guzik wrote:
> I'm writing a long response to this series, in the meantime I noticed
> this bit landed in
> https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git/commit/?h=vfs-6.18.inode.refcount.preliminaries&id=3cba19f6a00675fbc2af0987dfc90e216e6cfb74
> but with some whitespace issues in comments -- they are indented with
> spaces instead of tabs after the opening line.

Interesting. I didn't see an email about inclusion. Anyway, the change
looks good to me so Christian, feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

							Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] fs: revamp iput()
  2025-09-01  8:50     ` Jan Kara
@ 2025-09-01 10:39       ` Christian Brauner
  0 siblings, 0 replies; 5+ messages in thread
From: Christian Brauner @ 2025-09-01 10:39 UTC (permalink / raw)
  To: Jan Kara
  Cc: Mateusz Guzik, viro, linux-kernel, linux-fsdevel, josef,
	kernel-team, amir73il, linux-btrfs, linux-ext4, linux-xfs

On Mon, Sep 01, 2025 at 10:50:59AM +0200, Jan Kara wrote:
> On Sat 30-08-25 17:54:35, Mateusz Guzik wrote:
> > I'm writing a long response to this series, in the meantime I noticed
> > this bit landed in
> > https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git/commit/?h=vfs-6.18.inode.refcount.preliminaries&id=3cba19f6a00675fbc2af0987dfc90e216e6cfb74
> > but with some whitespace issues in comments -- they are indented with
> > spaces instead of tabs after the opening line.
> 
> Interesting. I didn't see an email about inclusion. Anyway, the change

Sorry, that waas my bad. I talked with Josef off-list and told him that
I would apply Mateusz suggestions with his CdB and SoB added. I forgot
to repeat that on the list.

> looks good to me so Christian, feel free to add:
> 
> Reviewed-by: Jan Kara <jack@suse.cz>

Thanks!

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] fs: revamp iput()
  2025-08-30 15:54   ` Mateusz Guzik
  2025-09-01  8:50     ` Jan Kara
@ 2025-09-01 10:41     ` Christian Brauner
  1 sibling, 0 replies; 5+ messages in thread
From: Christian Brauner @ 2025-09-01 10:41 UTC (permalink / raw)
  To: Mateusz Guzik
  Cc: viro, jack, linux-kernel, linux-fsdevel, josef, kernel-team,
	amir73il, linux-btrfs, linux-ext4, linux-xfs

> Christian, I think it would be the most expedient if you just made

Ok, thanks.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2025-09-01 10:41 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20250827-kraut-anekdote-35789fddbb0b@brauner>
2025-08-27 16:24 ` [PATCH] fs: revamp iput() Mateusz Guzik
2025-08-30 15:54   ` Mateusz Guzik
2025-09-01  8:50     ` Jan Kara
2025-09-01 10:39       ` Christian Brauner
2025-09-01 10:41     ` Christian Brauner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).