public inbox for linux-btrfs@vger.kernel.org
From: Josef Bacik <josef@toxicpanda.com>
To: linux-btrfs@vger.kernel.org
Subject: [PATCH v2 0/2] Fix data race with transaction->state
Date: Fri, 21 Nov 2025 09:59:20 -0500
Message-ID: <cover.1763736921.git.josef@toxicpanda.com>

v1: https://lore.kernel.org/linux-btrfs/cover.1763481355.git.josef@toxicpanda.com/

v1->v2:
- I'm rusty and forgot that READ_ONCE/WRITE_ONCE don't imply SMP consistency;
  fixed the race with a proper locked check.
- Updated the smp_mb usage in start_transaction to use the proper helper.

I want to note that this isn't actually an observed hang. There was a problem
with MMIO-based block IO in my version of QEMU that was making IO just stop,
and I happened to notice this because the hung tasks looked very much like a
deadlock. This fixes a real data race, and we would for sure miss wakeups
without these fixes, but I don't actually have a reproducer for any sort of
deadlock in this area.
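
For illustration only, here is a minimal userspace sketch of the kind of
locked check mentioned in the changelog. It is not the code in these patches:
it assumes a pthread mutex and condition variable as stand-ins for the kernel
lock and wait queue, and every name in it (demo_trans, demo_set_state,
demo_wait_for_state, run_demo) is hypothetical.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-ins for illustration; these are not btrfs structures. */
enum demo_state { DEMO_RUNNING, DEMO_COMMIT_START, DEMO_COMPLETED };

struct demo_trans {
	pthread_mutex_t lock;	/* protects state */
	pthread_cond_t wait;	/* signalled on every state change */
	enum demo_state state;
};

/* Writer: change the state and wake waiters while holding the lock. */
static void demo_set_state(struct demo_trans *t, enum demo_state s)
{
	pthread_mutex_lock(&t->lock);
	t->state = s;
	pthread_cond_broadcast(&t->wait);
	pthread_mutex_unlock(&t->lock);
}

/*
 * Waiter: the check and the sleep happen under the same lock, so a state
 * change cannot slip in between them. The racy pattern would read
 * t->state without the lock, decide to sleep, and only then block; a
 * wakeup sent in that window would be lost.
 */
static void demo_wait_for_state(struct demo_trans *t, enum demo_state s)
{
	pthread_mutex_lock(&t->lock);
	while (t->state < s)
		pthread_cond_wait(&t->wait, &t->lock);
	pthread_mutex_unlock(&t->lock);
}

/* Committer thread: walks the state forward, waking waiters each time. */
static void *demo_committer(void *arg)
{
	struct demo_trans *t = arg;

	demo_set_state(t, DEMO_COMMIT_START);
	demo_set_state(t, DEMO_COMPLETED);
	return NULL;
}

/* Runs one waiter against one committer and returns the final state. */
int run_demo(void)
{
	struct demo_trans t = { .state = DEMO_RUNNING };
	pthread_t thr;
	int ret;

	pthread_mutex_init(&t.lock, NULL);
	pthread_cond_init(&t.wait, NULL);

	ret = pthread_create(&thr, NULL, demo_committer, &t);
	assert(ret == 0);

	demo_wait_for_state(&t, DEMO_COMPLETED);
	pthread_join(thr, NULL);

	ret = t.state;
	pthread_cond_destroy(&t.wait);
	pthread_mutex_destroy(&t.lock);
	return ret;
}
```

Because the waiter re-checks the state in a loop while holding the lock, it
returns whether the committer finishes before or after the waiter starts
waiting. An unlocked check followed by a separate sleep has no such guarantee.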

=== Original email ===

I've been setting up Claude to set up fstests and run VMs automatically and I
kept hitting hangs. This turned out to be a bug with QEMU's microvm, but at
some point I was convinced there was a deadlock involving running out of block
tags, ordered extent completion, and transaction commit. This actually wasn't
the case; however, this data race is in fact real. We can easily miss wakeups
if we have to wait on transaction state to change, because we do it outside of
a lock and we do not have proper barriers around transaction->state. I suspect
this explains the random hangs that I would see in production while at Meta
that would clear up eventually (we do call wakeup on the transaction wait thing
a lot). In any case this is a data race, and even if it wasn't my particular
bug we should fix it. I've run it through fstests a few times, but obviously
spot check it since I'm a little rusty with this stuff at the moment. Thanks,

Josef

Josef Bacik (2):
  btrfs: fix data race on transaction->state
  btrfs: remove useless smp_mb in start_transaction

 fs/btrfs/transaction.c | 21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)

-- 
2.51.1


Thread overview: 8+ messages
2025-11-21 14:59 Josef Bacik [this message]
2025-11-21 14:59 ` [PATCH v2 1/2] btrfs: fix data race on transaction->state Josef Bacik
2025-11-21 15:29   ` Filipe Manana
2025-11-21 20:29     ` Qu Wenruo
2025-11-21 20:51       ` Filipe Manana
2025-11-21 14:59 ` [PATCH v2 2/2] btrfs: remove useless smp_mb in start_transaction Josef Bacik
2025-11-24 17:23   ` David Sterba
2025-11-21 19:49 ` [PATCH v2 0/2] Fix data race with transaction->state Qu Wenruo
