From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
patches@lists.linux.dev, Qu Wenruo <wqu@suse.com>,
Boris Burkov <boris@bur.io>, David Sterba <dsterba@suse.com>,
Rahul Sharma <black.hawk@163.com>
Subject: [PATCH 5.15 18/39] btrfs: fix racy bitfield write in btrfs_clear_space_info_full()
Date: Tue, 17 Feb 2026 21:31:27 +0100 [thread overview]
Message-ID: <20260217200003.630708137@linuxfoundation.org> (raw)
In-Reply-To: <20260217200002.929083107@linuxfoundation.org>
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Boris Burkov <boris@bur.io>
[ Upstream commit 38e818718c5e04961eea0fa8feff3f100ce40408 ]
>From the memory-barriers.txt document regarding memory barrier ordering
guarantees:
(*) These guarantees do not apply to bitfields, because compilers often
generate code to modify these using non-atomic read-modify-write
sequences. Do not attempt to use bitfields to synchronize parallel
algorithms.
(*) Even in cases where bitfields are protected by locks, all fields
in a given bitfield must be protected by one lock. If two fields
in a given bitfield are protected by different locks, the compiler's
non-atomic read-modify-write sequences can cause an update to one
field to corrupt the value of an adjacent field.
btrfs_space_info has a bitfield sharing an underlying word consisting of
the fields full, chunk_alloc, and flush:
struct btrfs_space_info {
struct btrfs_fs_info * fs_info; /* 0 8 */
struct btrfs_space_info * parent; /* 8 8 */
...
int clamp; /* 172 4 */
unsigned int full:1; /* 176: 0 4 */
unsigned int chunk_alloc:1; /* 176: 1 4 */
unsigned int flush:1; /* 176: 2 4 */
...
Therefore, to be safe from parallel read-modify-writes losing a write to
one of the bitfield members protected by a lock, all writes to all the
bitfields must use the lock. They almost universally do, except for
btrfs_clear_space_info_full() which iterates over the space_infos and
writes out found->full = 0 without a lock.
Imagine that we have one thread completing a transaction in which we
finished deleting a block_group and are thus calling
btrfs_clear_space_info_full() while simultaneously the data reclaim
ticket infrastructure is running do_async_reclaim_data_space():
T1 T2
btrfs_commit_transaction
btrfs_clear_space_info_full
data_sinfo->full = 0
READ: full:0, chunk_alloc:0, flush:1
do_async_reclaim_data_space(data_sinfo)
spin_lock(&space_info->lock);
if(list_empty(tickets))
space_info->flush = 0;
READ: full: 0, chunk_alloc:0, flush:1
MOD/WRITE: full: 0, chunk_alloc:0, flush:0
spin_unlock(&space_info->lock);
return;
MOD/WRITE: full:0, chunk_alloc:0, flush:1
and now data_sinfo->flush is 1 but the reclaim worker has exited. This
breaks the invariant that flush is 0 iff there is no work queued or
running. Once this invariant is violated, future allocations that go
into __reserve_bytes() will add tickets to space_info->tickets but will
see space_info->flush is set to 1 and not queue the work. After this,
they will block forever on the resulting ticket, as it is now impossible
to kick the worker again.
I also confirmed by looking at the assembly of the affected kernel that
it is doing RMW operations. For example, to set the flush (3rd) bit to 0,
the assembly is:
andb $0xfb,0x60(%rbx)
and similarly for setting the full (1st) bit to 0:
andb $0xfe,-0x20(%rax)
So I think this is really a bug on practical systems. I have observed
a number of systems in this exact state, but am currently unable to
reproduce it.
Rather than leaving this footgun lying around for the future, take
advantage of the fact that there is room in the struct anyway, and that
it is already quite large and simply change the three bitfield members to
bools. This avoids writes to space_info->full having any effect on
writes to space_info->flush, regardless of locking.
Fixes: 957780eb2788 ("Btrfs: introduce ticketed enospc infrastructure")
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[ The context change is due to the commit cc0517fe779f
("btrfs: tweak extent/chunk allocation for space_info sub-space")
and the commit 45a59513b4b2
("btrfs: add support for reclaiming from sub-space space_info")
in v6.16 which are irrelevant to the logic of this patch. ]
Signed-off-by: Rahul Sharma <black.hawk@163.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
fs/btrfs/block-group.c | 6 +++---
fs/btrfs/space-info.c | 20 ++++++++++----------
fs/btrfs/space-info.h | 6 +++---
3 files changed, 16 insertions(+), 16 deletions(-)
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -3699,7 +3699,7 @@ int btrfs_chunk_alloc(struct btrfs_trans
mutex_unlock(&fs_info->chunk_mutex);
} else {
/* Proceed with allocation */
- space_info->chunk_alloc = 1;
+ space_info->chunk_alloc = true;
wait_for_alloc = false;
spin_unlock(&space_info->lock);
}
@@ -3735,7 +3735,7 @@ int btrfs_chunk_alloc(struct btrfs_trans
spin_lock(&space_info->lock);
if (ret < 0) {
if (ret == -ENOSPC)
- space_info->full = 1;
+ space_info->full = true;
else
goto out;
} else {
@@ -3745,7 +3745,7 @@ int btrfs_chunk_alloc(struct btrfs_trans
space_info->force_alloc = CHUNK_ALLOC_NO_FORCE;
out:
- space_info->chunk_alloc = 0;
+ space_info->chunk_alloc = false;
spin_unlock(&space_info->lock);
mutex_unlock(&fs_info->chunk_mutex);
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -178,7 +178,7 @@ void btrfs_clear_space_info_full(struct
struct btrfs_space_info *found;
list_for_each_entry(found, head, list)
- found->full = 0;
+ found->full = false;
}
static int create_space_info(struct btrfs_fs_info *info, u64 flags)
@@ -271,7 +271,7 @@ void btrfs_update_space_info(struct btrf
found->bytes_readonly += bytes_readonly;
found->bytes_zone_unusable += bytes_zone_unusable;
if (total_bytes > 0)
- found->full = 0;
+ found->full = false;
btrfs_try_granting_tickets(info, found);
spin_unlock(&found->lock);
*space_info = found;
@@ -941,7 +941,7 @@ static void btrfs_async_reclaim_metadata
spin_lock(&space_info->lock);
to_reclaim = btrfs_calc_reclaim_metadata_size(fs_info, space_info);
if (!to_reclaim) {
- space_info->flush = 0;
+ space_info->flush = false;
spin_unlock(&space_info->lock);
return;
}
@@ -953,7 +953,7 @@ static void btrfs_async_reclaim_metadata
flush_space(fs_info, space_info, to_reclaim, flush_state, false);
spin_lock(&space_info->lock);
if (list_empty(&space_info->tickets)) {
- space_info->flush = 0;
+ space_info->flush = false;
spin_unlock(&space_info->lock);
return;
}
@@ -996,7 +996,7 @@ static void btrfs_async_reclaim_metadata
flush_state = FLUSH_DELAYED_ITEMS_NR;
commit_cycles--;
} else {
- space_info->flush = 0;
+ space_info->flush = false;
}
} else {
flush_state = FLUSH_DELAYED_ITEMS_NR;
@@ -1158,7 +1158,7 @@ static void btrfs_async_reclaim_data_spa
spin_lock(&space_info->lock);
if (list_empty(&space_info->tickets)) {
- space_info->flush = 0;
+ space_info->flush = false;
spin_unlock(&space_info->lock);
return;
}
@@ -1169,7 +1169,7 @@ static void btrfs_async_reclaim_data_spa
flush_space(fs_info, space_info, U64_MAX, ALLOC_CHUNK_FORCE, false);
spin_lock(&space_info->lock);
if (list_empty(&space_info->tickets)) {
- space_info->flush = 0;
+ space_info->flush = false;
spin_unlock(&space_info->lock);
return;
}
@@ -1182,7 +1182,7 @@ static void btrfs_async_reclaim_data_spa
data_flush_states[flush_state], false);
spin_lock(&space_info->lock);
if (list_empty(&space_info->tickets)) {
- space_info->flush = 0;
+ space_info->flush = false;
spin_unlock(&space_info->lock);
return;
}
@@ -1199,7 +1199,7 @@ static void btrfs_async_reclaim_data_spa
if (maybe_fail_all_tickets(fs_info, space_info))
flush_state = 0;
else
- space_info->flush = 0;
+ space_info->flush = false;
} else {
flush_state = 0;
}
@@ -1510,7 +1510,7 @@ static int __reserve_bytes(struct btrfs_
*/
maybe_clamp_preempt(fs_info, space_info);
- space_info->flush = 1;
+ space_info->flush = true;
trace_btrfs_trigger_flush(fs_info,
space_info->flags,
orig_bytes, flush,
--- a/fs/btrfs/space-info.h
+++ b/fs/btrfs/space-info.h
@@ -28,11 +28,11 @@ struct btrfs_space_info {
flushing. The value is >> clamp, so turns
out to be a 2^clamp divisor. */
- unsigned int full:1; /* indicates that we cannot allocate any more
+ bool full; /* indicates that we cannot allocate any more
chunks for this space */
- unsigned int chunk_alloc:1; /* set if we are allocating a chunk */
+ bool chunk_alloc; /* set if we are allocating a chunk */
- unsigned int flush:1; /* set if we are trying to make space */
+ bool flush; /* set if we are trying to make space */
unsigned int force_alloc; /* set if we need to force a chunk
alloc for this space */
next prev parent reply other threads:[~2026-02-17 20:43 UTC|newest]
Thread overview: 46+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-02-17 20:31 [PATCH 5.15 00/39] 5.15.201-rc1 review Greg Kroah-Hartman
2026-02-17 20:31 ` [PATCH 5.15 01/39] crypto: octeontx - Fix length check to avoid truncation in ucode_load_store Greg Kroah-Hartman
2026-02-17 20:31 ` [PATCH 5.15 02/39] crypto: omap - Allocate OMAP_CRYPTO_FORCE_COPY scatterlists correctly Greg Kroah-Hartman
2026-02-17 20:31 ` [PATCH 5.15 03/39] crypto: virtio - Add spinlock protection with virtqueue notification Greg Kroah-Hartman
2026-02-17 20:31 ` [PATCH 5.15 04/39] nilfs2: Fix potential block overflow that cause system hang Greg Kroah-Hartman
2026-02-17 20:31 ` [PATCH 5.15 05/39] scsi: qla2xxx: Validate sp before freeing associated memory Greg Kroah-Hartman
2026-02-17 20:31 ` [PATCH 5.15 06/39] scsi: qla2xxx: Delay module unload while fabric scan in progress Greg Kroah-Hartman
2026-02-17 20:31 ` [PATCH 5.15 07/39] scsi: qla2xxx: Query FW again before proceeding with login Greg Kroah-Hartman
2026-02-17 20:31 ` [PATCH 5.15 08/39] gpio: omap: do not register driver in probe() Greg Kroah-Hartman
2026-02-17 20:31 ` [PATCH 5.15 09/39] ALSA: hda/realtek: Fix headset mic for TongFang X6AR55xU Greg Kroah-Hartman
2026-02-17 20:31 ` [PATCH 5.15 10/39] gpio: sprd: Change sprd_gpio lock to raw_spin_lock Greg Kroah-Hartman
2026-02-17 20:31 ` [PATCH 5.15 11/39] romfs: check sb_set_blocksize() return value Greg Kroah-Hartman
2026-02-17 20:31 ` [PATCH 5.15 12/39] =?UTF-8?q?drm/tegra:=20hdmi:=20sor:=20Fix=20error:=20variable=20?= =?UTF-8?q?=E2=80=98j=E2=80=99=20set=20but=20not=20used?= Greg Kroah-Hartman
2026-02-17 20:31 ` [PATCH 5.15 13/39] platform/x86: classmate-laptop: Add missing NULL pointer checks Greg Kroah-Hartman
2026-02-17 20:31 ` [PATCH 5.15 14/39] platform/x86: panasonic-laptop: Fix sysfs group leak in error path Greg Kroah-Hartman
2026-02-17 20:31 ` [PATCH 5.15 15/39] ASoC: fsl_xcvr: fix missing lock in fsl_xcvr_mode_put() Greg Kroah-Hartman
2026-02-17 20:31 ` [PATCH 5.15 16/39] gpiolib: acpi: Fix gpio count with string references Greg Kroah-Hartman
2026-02-17 20:31 ` [PATCH 5.15 17/39] Revert "wireguard: device: enable threaded NAPI" Greg Kroah-Hartman
2026-02-17 20:31 ` Greg Kroah-Hartman [this message]
2026-02-17 20:31 ` [PATCH 5.15 19/39] smb: client: set correct id, uid and cruid for multiuser automounts Greg Kroah-Hartman
2026-02-17 20:31 ` [PATCH 5.15 20/39] net: dsa: free routing table on probe failure Greg Kroah-Hartman
2026-02-17 20:31 ` [PATCH 5.15 21/39] selftests: mptcp: pm: ensure unknown flags are ignored Greg Kroah-Hartman
2026-02-17 20:31 ` [PATCH 5.15 22/39] mptcp: fix race in mptcp_pm_nl_flush_addrs_doit() Greg Kroah-Hartman
2026-02-17 20:31 ` [PATCH 5.15 23/39] crypto: virtio - Remove duplicated virtqueue_kick in virtio_crypto_skcipher_crypt_req Greg Kroah-Hartman
2026-02-17 20:31 ` [PATCH 5.15 24/39] smb: server: fix leak of active_num_conn in ksmbd_tcp_new_connection() Greg Kroah-Hartman
2026-02-17 20:31 ` [PATCH 5.15 25/39] bus: fsl-mc: Replace snprintf and sprintf with sysfs_emit in sysfs show functions Greg Kroah-Hartman
2026-02-17 20:31 ` [PATCH 5.15 26/39] bus: fsl-mc: fix use-after-free in driver_override_show() Greg Kroah-Hartman
2026-02-17 20:31 ` [PATCH 5.15 27/39] scsi: qla2xxx: Fix bsg_done() causing double free Greg Kroah-Hartman
2026-02-17 20:31 ` [PATCH 5.15 28/39] scsi: qla2xxx: Use named initializers for port_[d]state_str Greg Kroah-Hartman
2026-02-17 20:31 ` [PATCH 5.15 29/39] scsi: qla2xxx: Remove dead code (GNN ID) Greg Kroah-Hartman
2026-02-17 20:31 ` [PATCH 5.15 30/39] scsi: qla2xxx: Reduce fabric scan duplicate code Greg Kroah-Hartman
2026-02-17 20:31 ` [PATCH 5.15 31/39] scsi: qla2xxx: Free sp in error path to fix system crash Greg Kroah-Hartman
2026-02-17 20:31 ` [PATCH 5.15 32/39] PCI: endpoint: Automatically create a function specific attributes group Greg Kroah-Hartman
2026-02-17 20:31 ` [PATCH 5.15 33/39] PCI: endpoint: Remove unused field in struct pci_epf_group Greg Kroah-Hartman
2026-02-17 20:31 ` [PATCH 5.15 34/39] PCI: endpoint: Avoid creating sub-groups asynchronously Greg Kroah-Hartman
2026-02-17 20:31 ` [PATCH 5.15 35/39] fbdev: rivafb: fix divide error in nv3_arb() Greg Kroah-Hartman
2026-02-17 20:31 ` [PATCH 5.15 36/39] fbdev: smscufx: properly copy ioctl memory to kernelspace Greg Kroah-Hartman
2026-02-17 20:31 ` [PATCH 5.15 37/39] f2fs: fix to avoid UAF in f2fs_write_end_io() Greg Kroah-Hartman
2026-02-17 20:31 ` [PATCH 5.15 38/39] f2fs: fix out-of-bounds access in sysfs attribute read/write Greg Kroah-Hartman
2026-02-17 20:31 ` [PATCH 5.15 39/39] USB: serial: option: add Telit FN920C04 RNDIS compositions Greg Kroah-Hartman
2026-02-17 22:22 ` [PATCH 5.15 00/39] 5.15.201-rc1 review Florian Fainelli
2026-02-18 8:22 ` Jon Hunter
2026-02-18 9:09 ` Brett A C Sheffield
2026-02-18 11:59 ` Mark Brown
2026-02-18 16:27 ` Vijayendra Suman
2026-02-19 6:51 ` Ron Economos
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260217200003.630708137@linuxfoundation.org \
--to=gregkh@linuxfoundation.org \
--cc=black.hawk@163.com \
--cc=boris@bur.io \
--cc=dsterba@suse.com \
--cc=patches@lists.linux.dev \
--cc=stable@vger.kernel.org \
--cc=wqu@suse.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox