* [PATCH] vfs: handle __wait_on_freeing_inode() and evict() race
From: Mateusz Guzik @ 2024-07-18 15:18 UTC
To: brauner
Cc: viro, jack, linux-kernel, linux-fsdevel, Dominique Martinet,
Jakub Kicinski, v9fs, Mateusz Guzik
Lockless hash lookup can find and lock the inode after it gets the
I_FREEING flag set, at which point it blocks waiting for teardown in
evict() to finish.
However, the flag is still set even after evict() wakes up all waiters.
This results in a race where if the inode lock is taken late enough, it
can happen after both hash removal and wakeups, meaning there is nobody
to wake the racing thread up.
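Roughly, the interleaving looks like this (an illustrative sketch, not a
captured trace):

  CPU0 (lockless lookup)            CPU1 (eviction)
  ----------------------            ---------------
  find_inode_fast()
    finds the inode in the hash
                                    evict()
                                      remove_inode_hash(inode)
                                      spin_lock(&inode->i_lock)
                                      wake_up_bit(&inode->i_state, __I_NEW)
                                      spin_unlock(&inode->i_lock)
    spin_lock(&inode->i_lock)
    sees I_FREEING set
    __wait_on_freeing_inode()
      sleeps waiting for the __I_NEW wakeup which already happened;
      nobody is left to issue another one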
This worked prior to RCU-based lookup because the entire ordeal was
synchronized with the inode hash lock.
Since unhashing requires the inode lock, we can safely check whether it
happened after acquiring it.
Link: https://lore.kernel.org/v9fs/20240717102458.649b60be@kernel.org/
Reported-by: Dominique Martinet <asmadeus@codewreck.org>
Fixes: 7180f8d91fcb ("vfs: add rcu-based find_inode variants for iget ops")
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
---
The 'fixes' tag is contingent on testing by someone else. :>
I have 0 experience with 9pfs and the docs failed me vs getting it
running on libvirt+qemu, so I gave up on trying to test it myself.
Dominique, you offered to narrow things down here, assuming the offer
stands I would appreciate if you got this sorted out :)
Even if the patch in the current form does not go in, it should be
sufficient to confirm the problem diagnosis is correct.
A debug printk can be added to validate the problematic condition was
encountered, for example:
> diff --git a/fs/inode.c b/fs/inode.c
> index 54e0be80be14..8f61fad0bc69 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -2308,6 +2308,7 @@ static void __wait_on_freeing_inode(struct inode *inode, bool locked)
> if (unlikely(inode_unhashed(inode))) {
> BUG_ON(locked);
> spin_unlock(&inode->i_lock);
> + printk(KERN_EMERG "%s: got unhashed inode %p\n", __func__, inode);
> return;
> }
fs/inode.c | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)
diff --git a/fs/inode.c b/fs/inode.c
index f356fe2ec2b6..54e0be80be14 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -676,6 +676,16 @@ static void evict(struct inode *inode)
remove_inode_hash(inode);
+ /*
+ * Wake up waiters in __wait_on_freeing_inode().
+ *
+ * Lockless hash lookup may end up finding the inode before we removed
+ * it above, but only lock it *after* we are done with the wakeup below.
+ * In this case the potential waiter cannot safely block.
+ *
+ * The inode being unhashed after the call to remove_inode_hash() is
+ * used as an indicator whether blocking on it is safe.
+ */
spin_lock(&inode->i_lock);
wake_up_bit(&inode->i_state, __I_NEW);
BUG_ON(inode->i_state != (I_FREEING | I_CLEAR));
@@ -2291,6 +2301,16 @@ static void __wait_on_freeing_inode(struct inode *inode, bool locked)
{
wait_queue_head_t *wq;
DEFINE_WAIT_BIT(wait, &inode->i_state, __I_NEW);
+
+ /*
+ * Handle racing against evict(), see that routine for more details.
+ */
+ if (unlikely(inode_unhashed(inode))) {
+ BUG_ON(locked);
+ spin_unlock(&inode->i_lock);
+ return;
+ }
+
wq = bit_waitqueue(&inode->i_state, __I_NEW);
prepare_to_wait(wq, &wait.wq_entry, TASK_UNINTERRUPTIBLE);
spin_unlock(&inode->i_lock);
--
2.43.0
* Re: [PATCH] vfs: handle __wait_on_freeing_inode() and evict() race
From: Jan Kara @ 2024-07-18 20:40 UTC
To: Mateusz Guzik
Cc: brauner, viro, jack, linux-kernel, linux-fsdevel,
Dominique Martinet, Jakub Kicinski, v9fs
On Thu 18-07-24 17:18:37, Mateusz Guzik wrote:
> Lockless hash lookup can find and lock the inode after it gets the
> I_FREEING flag set, at which point it blocks waiting for teardown in
> evict() to finish.
>
> However, the flag is still set even after evict() wakes up all waiters.
>
> This results in a race where if the inode lock is taken late enough, it
> can happen after both hash removal and wakeups, meaning there is nobody
> to wake the racing thread up.
>
> This worked prior to RCU-based lookup because the entire ordeal was
> synchronized with the inode hash lock.
>
> Since unhashing requires the inode lock, we can safely check whether it
> happened after acquiring it.
>
> Link: https://lore.kernel.org/v9fs/20240717102458.649b60be@kernel.org/
> Reported-by: Dominique Martinet <asmadeus@codewreck.org>
> Fixes: 7180f8d91fcb ("vfs: add rcu-based find_inode variants for iget ops")
> Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Looks good. Feel free to add:
Reviewed-by: Jan Kara <jack@suse.cz>
Honza
> ---
>
> The 'fixes' tag is contingent on testing by someone else. :>
>
> I have 0 experience with 9pfs and the docs failed me vs getting it
> running on libvirt+qemu, so I gave up on trying to test it myself.
>
> Dominique, you offered to narrow things down here, assuming the offer
> stands I would appreciate if you got this sorted out :)
>
> Even if the patch in the current form does not go in, it should be
> sufficient to confirm the problem diagnosis is correct.
>
> A debug printk can be added to validate the problematic condition was
> encountered, for example:
>
> > diff --git a/fs/inode.c b/fs/inode.c
> > index 54e0be80be14..8f61fad0bc69 100644
> > --- a/fs/inode.c
> > +++ b/fs/inode.c
> > @@ -2308,6 +2308,7 @@ static void __wait_on_freeing_inode(struct inode *inode, bool locked)
> > if (unlikely(inode_unhashed(inode))) {
> > BUG_ON(locked);
> > spin_unlock(&inode->i_lock);
> > + printk(KERN_EMERG "%s: got unhashed inode %p\n", __func__, inode);
> > return;
> > }
>
>
> fs/inode.c | 20 ++++++++++++++++++++
> 1 file changed, 20 insertions(+)
>
> diff --git a/fs/inode.c b/fs/inode.c
> index f356fe2ec2b6..54e0be80be14 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -676,6 +676,16 @@ static void evict(struct inode *inode)
>
> remove_inode_hash(inode);
>
> + /*
> + * Wake up waiters in __wait_on_freeing_inode().
> + *
> + * Lockless hash lookup may end up finding the inode before we removed
> + * it above, but only lock it *after* we are done with the wakeup below.
> + * In this case the potential waiter cannot safely block.
> + *
> + * The inode being unhashed after the call to remove_inode_hash() is
> + * used as an indicator whether blocking on it is safe.
> + */
> spin_lock(&inode->i_lock);
> wake_up_bit(&inode->i_state, __I_NEW);
> BUG_ON(inode->i_state != (I_FREEING | I_CLEAR));
> @@ -2291,6 +2301,16 @@ static void __wait_on_freeing_inode(struct inode *inode, bool locked)
> {
> wait_queue_head_t *wq;
> DEFINE_WAIT_BIT(wait, &inode->i_state, __I_NEW);
> +
> + /*
> + * Handle racing against evict(), see that routine for more details.
> + */
> + if (unlikely(inode_unhashed(inode))) {
> + BUG_ON(locked);
> + spin_unlock(&inode->i_lock);
> + return;
> + }
> +
> wq = bit_waitqueue(&inode->i_state, __I_NEW);
> prepare_to_wait(wq, &wait.wq_entry, TASK_UNINTERRUPTIBLE);
> spin_unlock(&inode->i_lock);
> --
> 2.43.0
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
* Re: [PATCH] vfs: handle __wait_on_freeing_inode() and evict() race
From: Dominique Martinet @ 2024-07-18 21:42 UTC
To: Mateusz Guzik
Cc: brauner, viro, jack, linux-kernel, linux-fsdevel, Jakub Kicinski,
v9fs
Mateusz Guzik wrote on Thu, Jul 18, 2024 at 05:18:37PM +0200:
> Lockless hash lookup can find and lock the inode after it gets the
> I_FREEING flag set, at which point it blocks waiting for teardown in
> evict() to finish.
>
> However, the flag is still set even after evict() wakes up all waiters.
>
> This results in a race where if the inode lock is taken late enough, it
> can happen after both hash removal and wakeups, meaning there is nobody
> to wake the racing thread up.
>
> This worked prior to RCU-based lookup because the entire ordeal was
> synchronized with the inode hash lock.
>
> Since unhashing requires the inode lock, we can safely check whether it
> happened after acquiring it.
>
> Link: https://lore.kernel.org/v9fs/20240717102458.649b60be@kernel.org/
> Reported-by: Dominique Martinet <asmadeus@codewreck.org>
> Fixes: 7180f8d91fcb ("vfs: add rcu-based find_inode variants for iget ops")
> Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
> ---
>
> The 'fixes' tag is contingent on testing by someone else. :>
Thanks for the quick fix!
> I have 0 experience with 9pfs and the docs failed me vs getting it
> running on libvirt+qemu, so I gave up on trying to test it myself.
I hadn't used it until yesterday either, but virtme-ng[1] should be easy
enough to get running without much effort: just cloning it and running
/path/to/virtme-ng/vng from a built linux tree will start a VM with /
mounted as 9p read-only (--rwdir /foo for writing).
[1] https://github.com/arighi/virtme-ng
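Concretely, something along these lines should do it (the paths below are
just placeholders for wherever you cloned/built things):

  $ git clone https://github.com/arighi/virtme-ng
  $ cd /path/to/linux                             # an already-built kernel tree
  $ /path/to/virtme-ng/vng                        # boots that kernel, / mounted read-only over 9p
  $ /path/to/virtme-ng/vng --rwdir /tmp/scratch   # same, with /tmp/scratch writable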
> Dominique, you offered to narrow things down here, assuming the offer
> stands I would appreciate if you got this sorted out :)
Unfortunately I haven't been able to reproduce this :/
I'm not running the exact same workload, but 9p should be instantiating
inodes from just a find in a large tree; I tried running finds in
parallel etc. to no avail.
You mentioned adding some sleep to make this easier to hit; should
something like this help, or did I get this wrong?
----
diff --git a/fs/inode.c b/fs/inode.c
index 54e0be80be14..c2991142a462 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -21,6 +21,7 @@
#include <linux/list_lru.h>
#include <linux/iversion.h>
#include <linux/rw_hint.h>
+#include <linux/delay.h>
#include <trace/events/writeback.h>
#include "internal.h"
@@ -962,6 +963,7 @@ static struct inode *find_inode_fast(struct super_block *sb,
continue;
if (inode->i_sb != sb)
continue;
+ usleep_range(10,100);
spin_lock(&inode->i_lock);
if (inode->i_state & (I_FREEING|I_WILL_FREE)) {
__wait_on_freeing_inode(inode, locked);
----
Unfortunately I've checked with a printk there too and I never get there
in the first place, so hitting this probably requires another race first,
where the lookup finds an inode that's about to be (or has just been)
dropped; none of my "9p stress" workloads seem to be hitting that either...
Could be some scheduling difference, or just that my workloads aren't
appropriate; I need to try running networking tests but ran out of time
for today.
> Even if the patch in the current form does not go in, it should be
> sufficient to confirm the problem diagnosis is correct.
>
> A debug printk can be added to validate the problematic condition was
> encountered, for example:
That was helpful, thanks.
> > diff --git a/fs/inode.c b/fs/inode.c
> > index 54e0be80be14..8f61fad0bc69 100644
> > --- a/fs/inode.c
> > +++ b/fs/inode.c
> > @@ -2308,6 +2308,7 @@ static void __wait_on_freeing_inode(struct inode *inode, bool locked)
> > if (unlikely(inode_unhashed(inode))) {
> > BUG_ON(locked);
> > spin_unlock(&inode->i_lock);
> > + printk(KERN_EMERG "%s: got unhashed inode %p\n", __func__, inode);
> > return;
> > }
--
Dominique Martinet | Asmadeus
* Re: [PATCH] vfs: handle __wait_on_freeing_inode() and evict() race
From: Christian Brauner @ 2024-07-19 12:02 UTC
To: Mateusz Guzik
Cc: Christian Brauner, viro, jack, linux-kernel, linux-fsdevel,
Dominique Martinet, Jakub Kicinski, v9fs
On Thu, 18 Jul 2024 17:18:37 +0200, Mateusz Guzik wrote:
> Lockless hash lookup can find and lock the inode after it gets the
> I_FREEING flag set, at which point it blocks waiting for teardown in
> evict() to finish.
>
> However, the flag is still set even after evict() wakes up all waiters.
>
> This results in a race where if the inode lock is taken late enough, it
> can happen after both hash removal and wakeups, meaning there is nobody
> to wake the racing thread up.
>
> [...]
Thanks for that and the concise explanation!
---
Applied to the vfs.fixes branch of the vfs/vfs.git tree.
Patches in the vfs.fixes branch should appear in linux-next soon.
Please report any outstanding bugs that were missed during review in a
new review to the original patch series allowing us to drop it.
It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible patch trailers will be updated.
Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.
tree: https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs.fixes
[1/1] vfs: handle __wait_on_freeing_inode() and evict() race
https://git.kernel.org/vfs/vfs/c/3ba35ec4b0ed
* Re: [PATCH] vfs: handle __wait_on_freeing_inode() and evict() race
From: Jakub Kicinski @ 2024-07-19 13:25 UTC
To: Mateusz Guzik
Cc: brauner, viro, jack, linux-kernel, linux-fsdevel,
Dominique Martinet, v9fs
On Thu, 18 Jul 2024 17:18:37 +0200 Mateusz Guzik wrote:
> Link: https://lore.kernel.org/v9fs/20240717102458.649b60be@kernel.org/
> Reported-by: Dominique Martinet <asmadeus@codewreck.org>
🧐️ click on that link... Anyway, can confirm, problem goes away:
Tested-by: Jakub Kicinski <kuba@kernel.org>
* Re: [PATCH] vfs: handle __wait_on_freeing_inode() and evict() race
From: Mateusz Guzik @ 2024-07-19 14:28 UTC
To: Jakub Kicinski
Cc: brauner, viro, jack, linux-kernel, linux-fsdevel,
Dominique Martinet, v9fs
On Fri, Jul 19, 2024 at 3:25 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Thu, 18 Jul 2024 17:18:37 +0200 Mateusz Guzik wrote:
> > Link: https://lore.kernel.org/v9fs/20240717102458.649b60be@kernel.org/
> > Reported-by: Dominique Martinet <asmadeus@codewreck.org>
>
> 🧐️ click on that link... Anyway, can confirm, problem goes away:
>
Well, the reporter address can easily be massaged if you want; I grabbed
the person who cc'ed me.
> Tested-by: Jakub Kicinski <kuba@kernel.org>
thanks
--
Mateusz Guzik <mjguzik gmail.com>