* Re: [PATCH] vfs: fix race between evict_inodes() and find_inode()&iput()
2024-08-23 13:07 [PATCH] vfs: fix race between evict_inodes() and find_inode()&iput() Julian Sun
@ 2024-08-24 4:54 ` Mateusz Guzik
2024-08-26 4:11 ` Julian Sun
2024-08-26 10:50 ` Christian Brauner
2024-08-28 16:36 ` Jan Kara
2 siblings, 1 reply; 6+ messages in thread
From: Mateusz Guzik @ 2024-08-24 4:54 UTC (permalink / raw)
To: Julian Sun
Cc: linux-fsdevel, linux-btrfs, viro, brauner, jack, david,
zhuyifei1999, syzbot+67ba3c42bcbb4665d3ad, stable
On Fri, Aug 23, 2024 at 09:07:30PM +0800, Julian Sun wrote:
> Hi, all
>
> Recently I noticed a bug[1] in btrfs; after digging into it
> I believe it's a race in the vfs.
>
> Let's assume there's an inode (e.g. ino 261) with i_count 1
> on which iput() is called, while a concurrent thread calls
> generic_shutdown_super().
>
> cpu0: cpu1:
> iput() // i_count is 1
> ->spin_lock(inode)
> ->dec i_count to 0
> ->iput_final() generic_shutdown_super()
> ->__inode_add_lru() ->evict_inodes()
> // for some reason[2] ->if (atomic_read(inode->i_count)) continue;
> // return before // inode 261 passed the above check
> // list_lru_add_obj() // and then schedule out
> ->spin_unlock()
> // note here: the inode 261
> // was still on the sb list and hash list,
> // and I_FREEING|I_WILL_FREE had not been set
>
> btrfs_iget()
> // after some function calls
> ->find_inode()
> // found the above inode 261
> ->spin_lock(inode)
> // check I_FREEING|I_WILL_FREE
> // and passed
> ->__iget()
> ->spin_unlock(inode) // schedule back
> ->spin_lock(inode)
> // check (I_NEW|I_FREEING|I_WILL_FREE) flags,
> // passed and set I_FREEING
> iput() ->spin_unlock(inode)
> ->spin_lock(inode) ->evict()
> // dec i_count to 0
> ->iput_final()
> ->spin_unlock()
> ->evict()
>
> Now, we have two threads simultaneously evicting
> the same inode, which may trigger the BUG(inode->i_state & I_CLEAR)
> statement in both clear_inode() and iput().
>
> To fix the bug, recheck inode->i_count after acquiring i_lock.
> The lockless check before taking the lock is kept, because in most
> scenarios it is valid and the overhead of spin_lock() can be reduced.
>
> If there is any misunderstanding, please let me know, thanks.
>
> [1]: https://lore.kernel.org/linux-btrfs/000000000000eabe1d0619c48986@google.com/
> [2]: The reason might be that 1. SB_ACTIVE was removed or 2. mapping_shrinkable()
> returned false when I reproduced the bug.
>
> Reported-by: syzbot+67ba3c42bcbb4665d3ad@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=67ba3c42bcbb4665d3ad
> CC: stable@vger.kernel.org
> Fixes: 63997e98a3be ("split invalidate_inodes()")
> Signed-off-by: Julian Sun <sunjunchao2870@gmail.com>
> ---
> fs/inode.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/fs/inode.c b/fs/inode.c
> index 3a41f83a4ba5..011f630777d0 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -723,6 +723,10 @@ void evict_inodes(struct super_block *sb)
> continue;
>
> spin_lock(&inode->i_lock);
> + if (atomic_read(&inode->i_count)) {
> + spin_unlock(&inode->i_lock);
> + continue;
> + }
> if (inode->i_state & (I_NEW | I_FREEING | I_WILL_FREE)) {
> spin_unlock(&inode->i_lock);
> continue;
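The hunk above applies the common lockless-fast-path-plus-recheck-under-lock pattern. As a rough illustration only, here is a stand-alone userspace C sketch of the decision logic; the names and types are illustrative stand-ins, not kernel API, and the i_lock itself is elided:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <assert.h>

/* Illustrative stand-ins for the inode state bits. */
#define I_NEW        0x1
#define I_FREEING    0x2
#define I_WILL_FREE  0x4

struct fake_inode {
	atomic_int i_count;
	unsigned   i_state;	/* protected by i_lock in the real code */
};

/*
 * Lockless fast path: skip obviously busy inodes without taking the
 * lock.  This check alone is racy -- i_count may drop to 0 right
 * after it is read.
 */
static bool busy_fast_path(struct fake_inode *inode)
{
	return atomic_load(&inode->i_count) != 0;
}

/*
 * Decision made with i_lock held (locking elided in this sketch).
 * The added recheck of i_count closes the window in which iput()
 * dropped the count to 0 but I_FREEING has not been set yet.
 */
static bool evict_should_skip_locked(struct fake_inode *inode)
{
	if (atomic_load(&inode->i_count))	/* the fix: recheck under lock */
		return true;
	if (inode->i_state & (I_NEW | I_FREEING | I_WILL_FREE))
		return true;
	return false;
}
```

The fast-path check only avoids lock acquisition for busy inodes; only the recheck made under the lock is authoritative.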
This looks correct to me, albeit I would argue the commit message is
overly verbose, making it harder to understand the gist of the problem:
evict_inodes() fails to re-check i_count after acquiring the spin lock,
while the flags blocking 0->1 i_count transitions are not set yet,
making it possible to race against such a transition.
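To illustrate that point about 0->1 transitions: eviction must observe i_count == 0 and set I_FREEING under the same lock that lookups hold before resurrecting an inode. A hypothetical userspace sketch, with illustrative names and the locking elided (not kernel code):

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <assert.h>

#define I_FREEING 0x2

struct fake_inode {
	atomic_int i_count;
	unsigned   i_state;	/* i_lock-protected in the real code */
};

/* find_inode()-style lookup: refuse to resurrect a dying inode. */
static bool tryget_locked(struct fake_inode *inode)
{
	if (inode->i_state & I_FREEING)
		return false;			/* lookup must fail/retry */
	atomic_fetch_add(&inode->i_count, 1);	/* the 0->1 transition */
	return true;
}

/* evict_inodes()-style teardown, with the recheck applied. */
static bool start_eviction_locked(struct fake_inode *inode)
{
	if (atomic_load(&inode->i_count))	/* someone resurrected it */
		return false;
	inode->i_state |= I_FREEING;		/* now blocks tryget_locked() */
	return true;
}
```

Whichever side runs first under the lock wins: either the lookup bumps the count and eviction backs off, or eviction sets I_FREEING and the lookup fails. The bug was that neither ordering was enforced.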
The real remark I have here is that evict_inodes(), modulo the bug, is
identical to invalidate_inodes(). Perhaps a separate patch (*not* for
stable) to whack it would be prudent?
^ permalink raw reply [flat|nested] 6+ messages in thread

* Re: [PATCH] vfs: fix race between evict_inodes() and find_inode()&iput()
2024-08-24 4:54 ` Mateusz Guzik
@ 2024-08-26 4:11 ` Julian Sun
2024-08-26 7:05 ` Mateusz Guzik
0 siblings, 1 reply; 6+ messages in thread
From: Julian Sun @ 2024-08-26 4:11 UTC (permalink / raw)
To: Mateusz Guzik
Cc: linux-fsdevel, linux-btrfs, viro, brauner, jack, david,
zhuyifei1999, syzbot+67ba3c42bcbb4665d3ad, stable
On Sat, 2024-08-24 at 06:54 +0200, Mateusz Guzik wrote:
> On Fri, Aug 23, 2024 at 09:07:30PM +0800, Julian Sun wrote:
> > [...]
>
> This looks correct to me, albeit I would argue the commit message is
> overly verbose, making it harder to understand the gist of the problem:
> evict_inodes() fails to re-check i_count after acquiring the spin lock,
> while the flags blocking 0->1 i_count transitions are not set yet,
> making it possible to race against such a transition.
Alright, I think the issue is explained clearly by the commit message
above. If you insist, I can send a v2 that reworks the commit message.
>
> The real remark I have here is that evict_inodes(), modulo the bug,
> is
> identical to invalidate_inodes(). Perhaps a separate patch (*not* for
> stable) to whack it would be prudent?
Agreed. We can replace invalidate_inodes() with evict_inodes() after
this patch.
Thanks,
--
Julian Sun <sunjunchao2870@gmail.com>
* Re: [PATCH] vfs: fix race between evict_inodes() and find_inode()&iput()
2024-08-26 4:11 ` Julian Sun
@ 2024-08-26 7:05 ` Mateusz Guzik
0 siblings, 0 replies; 6+ messages in thread
From: Mateusz Guzik @ 2024-08-26 7:05 UTC (permalink / raw)
To: Julian Sun
Cc: linux-fsdevel, linux-btrfs, viro, brauner, jack, david,
zhuyifei1999, syzbot+67ba3c42bcbb4665d3ad, stable
On Mon, Aug 26, 2024 at 6:11 AM Julian Sun <sunjunchao2870@gmail.com> wrote:
>
> On Sat, 2024-08-24 at 06:54 +0200, Mateusz Guzik wrote:
> > evict_inodes() fails to re-check i_count after acquiring the spin lock,
> > while the flags blocking 0->1 i_count transitions are not set yet,
> > making it possible to race against such a transition.
> Alright, I think the issue is clearly explained through the above
> commit message. If you insist, I can send a patch v2 to reorder the
> commit message.
I'm in no position to insist, merely noting. :)
--
Mateusz Guzik <mjguzik gmail.com>
* Re: [PATCH] vfs: fix race between evict_inodes() and find_inode()&iput()
2024-08-23 13:07 [PATCH] vfs: fix race between evict_inodes() and find_inode()&iput() Julian Sun
2024-08-24 4:54 ` Mateusz Guzik
@ 2024-08-26 10:50 ` Christian Brauner
2024-08-28 16:36 ` Jan Kara
2 siblings, 0 replies; 6+ messages in thread
From: Christian Brauner @ 2024-08-26 10:50 UTC (permalink / raw)
To: Julian Sun
Cc: Christian Brauner, viro, jack, david, zhuyifei1999,
syzbot+67ba3c42bcbb4665d3ad, stable, linux-fsdevel, linux-btrfs
On Fri, 23 Aug 2024 21:07:30 +0800, Julian Sun wrote:
> Recently I noticed a bug[1] in btrfs; after digging into it
> I believe it's a race in the vfs.
>
> Let's assume there's an inode (e.g. ino 261) with i_count 1
> on which iput() is called, while a concurrent thread calls
> generic_shutdown_super().
>
> [...]
Applied to the vfs.misc branch of the vfs/vfs.git tree.
Patches in the vfs.misc branch should appear in linux-next soon.
Please report any outstanding bugs that were missed during review in a
new review to the original patch series allowing us to drop it.
It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible patch trailers will be updated.
Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.
tree: https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs.misc
[1/1] vfs: fix race between evict_inodes() and find_inode()&iput()
https://git.kernel.org/vfs/vfs/c/f37af83281e6
* Re: [PATCH] vfs: fix race between evict_inodes() and find_inode()&iput()
2024-08-23 13:07 [PATCH] vfs: fix race between evict_inodes() and find_inode()&iput() Julian Sun
2024-08-24 4:54 ` Mateusz Guzik
2024-08-26 10:50 ` Christian Brauner
@ 2024-08-28 16:36 ` Jan Kara
2 siblings, 0 replies; 6+ messages in thread
From: Jan Kara @ 2024-08-28 16:36 UTC (permalink / raw)
To: Julian Sun
Cc: linux-fsdevel, linux-btrfs, viro, brauner, jack, david,
zhuyifei1999, syzbot+67ba3c42bcbb4665d3ad, stable
On Fri 23-08-24 21:07:30, Julian Sun wrote:
> [...]
> Reported-by: syzbot+67ba3c42bcbb4665d3ad@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=67ba3c42bcbb4665d3ad
> CC: stable@vger.kernel.org
> Fixes: 63997e98a3be ("split invalidate_inodes()")
> Signed-off-by: Julian Sun <sunjunchao2870@gmail.com>
Thanks for the fix. It looks good to me and I'm curious how come we didn't
hit this earlier ;). Anyway, feel free to add:
Reviewed-by: Jan Kara <jack@suse.cz>
I'd just note that the Fixes tag isn't correct AFAICT. At the time of
commit 63997e98a3be we had a global inode_lock and the
atomic_read(&inode->i_count) check was done under it. I'd say it was
55fa6091d831 ("fs: move i_sb_list out from under inode_lock")
or possibly
250df6ed274d ("fs: protect inode->i_state with inode->i_lock")
(depending on how you look at it) that introduced this problem. Not that it
would matter that much these days :).
Honza
> [...]
--
Jan Kara <jack@suse.com>
SUSE Labs, CR