linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Mateusz Guzik <mjguzik@gmail.com>
To: brauner@kernel.org
Cc: viro@zeniv.linux.org.uk, jack@suse.cz,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	Dominique Martinet <asmadeus@codewreck.org>,
	Jakub Kicinski <kuba@kernel.org>,
	v9fs@lists.linux.dev, Mateusz Guzik <mjguzik@gmail.com>
Subject: [PATCH] vfs: handle __wait_on_freeing_inode() and evict() race
Date: Thu, 18 Jul 2024 17:18:37 +0200	[thread overview]
Message-ID: <20240718151838.611807-1-mjguzik@gmail.com> (raw)

Lockless hash lookup can find and lock the inode after it gets the
I_FREEING flag set, at which point it blocks waiting for teardown in
evict() to finish.

However, the flag is still set even after evict() wakes up all waiters.

This results in a race where if the inode lock is taken late enough, it
can happen after both hash removal and wakeups, meaning there is nobody
to wake the racing thread up.

This worked prior to RCU-based lookup because the entire ordeal was
synchronized with the inode hash lock.

Since unhashing requires the inode lock, we can safely check whether it
happened after acquiring it.

Link: https://lore.kernel.org/v9fs/20240717102458.649b60be@kernel.org/
Reported-by: Dominique Martinet <asmadeus@codewreck.org>
Fixes: 7180f8d91fcb ("vfs: add rcu-based find_inode variants for iget ops")
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
---

The 'fixes' tag is contingent on testing by someone else. :>

I have 0 experience with 9pfs and the docs failed me vs getting it
running on libvirt+qemu, so I gave up on trying to test it myself.

Dominique, you offered to narrow things down here, assuming the offer
stands I would appreciate if you got this sorted out :)

Even if the patch in the current form does not go in, it should be
sufficient to confirm the problem diagnosis is correct.

A debug printk can be added to validate the problematic condition was
encountered, for example:

> diff --git a/fs/inode.c b/fs/inode.c
> index 54e0be80be14..8f61fad0bc69 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -2308,6 +2308,7 @@ static void __wait_on_freeing_inode(struct inode *inode, bool locked)
>         if (unlikely(inode_unhashed(inode))) {
>                 BUG_ON(locked);
>                 spin_unlock(&inode->i_lock);
> +               printk(KERN_EMERG "%s: got unhashed inode %p\n", __func__, inode);
>                 return;
>         }


 fs/inode.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/fs/inode.c b/fs/inode.c
index f356fe2ec2b6..54e0be80be14 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -676,6 +676,16 @@ static void evict(struct inode *inode)
 
 	remove_inode_hash(inode);
 
+	/*
+	 * Wake up waiters in __wait_on_freeing_inode().
+	 *
+	 * Lockless hash lookup may end up finding the inode before we removed
+	 * it above, but only lock it *after* we are done with the wakeup below.
+	 * In this case the potential waiter cannot safely block.
+	 *
+	 * The inode being unhashed after the call to remove_inode_hash() is
+	 * used as an indicator whether blocking on it is safe.
+	 */
 	spin_lock(&inode->i_lock);
 	wake_up_bit(&inode->i_state, __I_NEW);
 	BUG_ON(inode->i_state != (I_FREEING | I_CLEAR));
@@ -2291,6 +2301,16 @@ static void __wait_on_freeing_inode(struct inode *inode, bool locked)
 {
 	wait_queue_head_t *wq;
 	DEFINE_WAIT_BIT(wait, &inode->i_state, __I_NEW);
+
+	/*
+	 * Handle racing against evict(), see that routine for more details.
+	 */
+	if (unlikely(inode_unhashed(inode))) {
+		BUG_ON(locked);
+		spin_unlock(&inode->i_lock);
+		return;
+	}
+
 	wq = bit_waitqueue(&inode->i_state, __I_NEW);
 	prepare_to_wait(wq, &wait.wq_entry, TASK_UNINTERRUPTIBLE);
 	spin_unlock(&inode->i_lock);
-- 
2.43.0


             reply	other threads:[~2024-07-18 15:18 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-07-18 15:18 Mateusz Guzik [this message]
2024-07-18 20:40 ` [PATCH] vfs: handle __wait_on_freeing_inode() and evict() race Jan Kara
2024-07-18 21:42 ` Dominique Martinet
2024-07-19 12:02 ` Christian Brauner
2024-07-19 13:25 ` Jakub Kicinski
2024-07-19 14:28   ` Mateusz Guzik

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240718151838.611807-1-mjguzik@gmail.com \
    --to=mjguzik@gmail.com \
    --cc=asmadeus@codewreck.org \
    --cc=brauner@kernel.org \
    --cc=jack@suse.cz \
    --cc=kuba@kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=v9fs@lists.linux.dev \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).