* [PATCH v3 0/2] btrfs: fix use-after-free in btrfs_encoded_read_endio @ 2024-11-13 17:16 Johannes Thumshirn 2024-11-13 17:16 ` [PATCH v3 1/2] " Johannes Thumshirn 2024-11-13 17:16 ` [PATCH v3 2/2] btrfs: simplify waiting for encoded read endios Johannes Thumshirn 0 siblings, 2 replies; 7+ messages in thread From: Johannes Thumshirn @ 2024-11-13 17:16 UTC (permalink / raw) To: linux-btrfs Cc: Filipe Manana, Damien Le Moal, Johannes Thumshirn, Johannes Thumshirn, Mark Harmstone, Omar Sandoval Shinichiro reported an occasional memory corruption in our CI system with btrfs/248 that led to panics. He also managed to reproduce this corruption reliably on one host. See patch 1/2 for details on the corruption and the fix; patch 2/2 is a cleanup Damien suggested on top of the fix to make the code more obvious. Changes to v2: - Make patch 1/2 only do the atomic_dec_and_test() as a minimal viable fix - Make patch 2/2 only do completion and refcount_t Link to v2: https://lore.kernel.org/linux-btrfs/cover.1731407982.git.jth@kernel.org Changes to v1: - Update commit message of patch 1/1 - Prevent double-free of 'priv' in case of io_uring in 2/2 - Use wait_for_completion_io() in 2/2 - Convert priv->pending from atomic_t to refcount_t, calling it refs, in 2/2 Link to v1: https://lore.kernel.org/linux-btrfs/cover.1731316882.git.jth@kernel.org Johannes Thumshirn (2): btrfs: fix use-after-free in btrfs_encoded_read_endio btrfs: simplify waiting for encoded read endios fs/btrfs/inode.c | 22 +++++++++++----------- 1 file changed, 11 insertions(+), 11 deletions(-) -- 2.43.0 ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH v3 1/2] btrfs: fix use-after-free in btrfs_encoded_read_endio 2024-11-13 17:16 [PATCH v3 0/2] btrfs: fix use-after-free in btrfs_encoded_read_endio Johannes Thumshirn @ 2024-11-13 17:16 ` Johannes Thumshirn 2024-11-13 19:30 ` Filipe Manana 2024-11-13 21:11 ` Qu Wenruo 2024-11-13 17:16 ` [PATCH v3 2/2] btrfs: simplify waiting for encoded read endios Johannes Thumshirn 1 sibling, 2 replies; 7+ messages in thread From: Johannes Thumshirn @ 2024-11-13 17:16 UTC (permalink / raw) To: linux-btrfs Cc: Filipe Manana, Damien Le Moal, Johannes Thumshirn, Mark Harmstone, Omar Sandoval, Shinichiro Kawasaki, Damien Le Moal From: Johannes Thumshirn <johannes.thumshirn@wdc.com> Shinichiro reported the following use-after-free that sometimes happens in our CI system when running fstests' btrfs/284 on a TCMU runner device: ================================================================== BUG: KASAN: slab-use-after-free in lock_release+0x708/0x780 Read of size 8 at addr ffff888106a83f18 by task kworker/u80:6/219 CPU: 8 UID: 0 PID: 219 Comm: kworker/u80:6 Not tainted 6.12.0-rc6-kts+ #15 Hardware name: Supermicro Super Server/X11SPi-TF, BIOS 3.3 02/21/2020 Workqueue: btrfs-endio btrfs_end_bio_work [btrfs] Call Trace: <TASK> dump_stack_lvl+0x6e/0xa0 ? lock_release+0x708/0x780 print_report+0x174/0x505 ? lock_release+0x708/0x780 ? __virt_addr_valid+0x224/0x410 ? lock_release+0x708/0x780 kasan_report+0xda/0x1b0 ? lock_release+0x708/0x780 ? __wake_up+0x44/0x60 lock_release+0x708/0x780 ? __pfx_lock_release+0x10/0x10 ? __pfx_do_raw_spin_lock+0x10/0x10 ? lock_is_held_type+0x9a/0x110 _raw_spin_unlock_irqrestore+0x1f/0x60 __wake_up+0x44/0x60 btrfs_encoded_read_endio+0x14b/0x190 [btrfs] btrfs_check_read_bio+0x8d9/0x1360 [btrfs] ? lock_release+0x1b0/0x780 ? trace_lock_acquire+0x12f/0x1a0 ? __pfx_btrfs_check_read_bio+0x10/0x10 [btrfs] ? process_one_work+0x7e3/0x1460 ? lock_acquire+0x31/0xc0 ? process_one_work+0x7e3/0x1460 process_one_work+0x85c/0x1460 ? 
__pfx_process_one_work+0x10/0x10 ? assign_work+0x16c/0x240 worker_thread+0x5e6/0xfc0 ? __pfx_worker_thread+0x10/0x10 kthread+0x2c3/0x3a0 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x31/0x70 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1a/0x30 </TASK> Allocated by task 3661: kasan_save_stack+0x30/0x50 kasan_save_track+0x14/0x30 __kasan_kmalloc+0xaa/0xb0 btrfs_encoded_read_regular_fill_pages+0x16c/0x6d0 [btrfs] send_extent_data+0xf0f/0x24a0 [btrfs] process_extent+0x48a/0x1830 [btrfs] changed_cb+0x178b/0x2ea0 [btrfs] btrfs_ioctl_send+0x3bf9/0x5c20 [btrfs] _btrfs_ioctl_send+0x117/0x330 [btrfs] btrfs_ioctl+0x184a/0x60a0 [btrfs] __x64_sys_ioctl+0x12e/0x1a0 do_syscall_64+0x95/0x180 entry_SYSCALL_64_after_hwframe+0x76/0x7e Freed by task 3661: kasan_save_stack+0x30/0x50 kasan_save_track+0x14/0x30 kasan_save_free_info+0x3b/0x70 __kasan_slab_free+0x4f/0x70 kfree+0x143/0x490 btrfs_encoded_read_regular_fill_pages+0x531/0x6d0 [btrfs] send_extent_data+0xf0f/0x24a0 [btrfs] process_extent+0x48a/0x1830 [btrfs] changed_cb+0x178b/0x2ea0 [btrfs] btrfs_ioctl_send+0x3bf9/0x5c20 [btrfs] _btrfs_ioctl_send+0x117/0x330 [btrfs] btrfs_ioctl+0x184a/0x60a0 [btrfs] __x64_sys_ioctl+0x12e/0x1a0 do_syscall_64+0x95/0x180 entry_SYSCALL_64_after_hwframe+0x76/0x7e The buggy address belongs to the object at ffff888106a83f00 which belongs to the cache kmalloc-rnd-07-96 of size 96 The buggy address is located 24 bytes inside of freed 96-byte region [ffff888106a83f00, ffff888106a83f60) The buggy address belongs to the physical page: page: refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff888106a83800 pfn:0x106a83 flags: 0x17ffffc0000000(node=0|zone=2|lastcpupid=0x1fffff) page_type: f5(slab) raw: 0017ffffc0000000 ffff888100053680 ffffea0004917200 0000000000000004 raw: ffff888106a83800 0000000080200019 00000001f5000000 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff888106a83e00: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc 
ffff888106a83e80: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc >ffff888106a83f00: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc ^ ffff888106a83f80: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc ffff888106a84000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ================================================================== Further analyzing the trace and the crash dump's vmcore file shows that the wake_up() call in btrfs_encoded_read_endio() is calling wake_up() on the wait_queue that is in the private data passed to the end_io handler. Commit 4ff47df40447 ("btrfs: move priv off stack in btrfs_encoded_read_regular_fill_pages()") moved 'struct btrfs_encoded_read_private' off the stack. Before that commit one can see a corruption of the private data when analyzing the vmcore after a crash: *(struct btrfs_encoded_read_private *)0xffff88815626eec8 = { .wait = (wait_queue_head_t){ .lock = (spinlock_t){ .rlock = (struct raw_spinlock){ .raw_lock = (arch_spinlock_t){ .val = (atomic_t){ .counter = (int)-2005885696, }, .locked = (u8)0, .pending = (u8)157, .locked_pending = (u16)40192, .tail = (u16)34928, }, .magic = (unsigned int)536325682, .owner_cpu = (unsigned int)29, .owner = (void *)__SCT__tp_func_btrfs_transaction_commit+0x0 = 0x0, .dep_map = (struct lockdep_map){ .key = (struct lock_class_key *)0xffff8881575a3b6c, .class_cache = (struct lock_class *[2]){ 0xffff8882a71985c0, 0xffffea00066f5d40 }, .name = (const char *)0xffff88815626f100 = "", .wait_type_outer = (u8)37, .wait_type_inner = (u8)178, .lock_type = (u8)154, }, }, .__padding = (u8 [24]){ 0, 157, 112, 136, 50, 174, 247, 31, 29 }, .dep_map = (struct lockdep_map){ .key = (struct lock_class_key *)0xffff8881575a3b6c, .class_cache = (struct lock_class *[2]){ 0xffff8882a71985c0, 0xffffea00066f5d40 }, .name = (const char *)0xffff88815626f100 = "", .wait_type_outer = (u8)37, .wait_type_inner = (u8)178, .lock_type = (u8)154, }, }, .head = (struct list_head){ .next = (struct list_head *)0x112cca, .prev = 
(struct list_head *)0x47, }, }, .pending = (atomic_t){ .counter = (int)-1491499288, }, .status = (blk_status_t)130, } Here we can see several indicators of in-memory data corruption, e.g. the large negative atomic values of ->pending or ->wait->lock->rlock->raw_lock->val, as well as the bogus spinlock magic 0x1ff7ae32 (decimal 536325682 above) instead of 0xdead4ead, or the bogus pointer values for ->wait->head. To fix this, change atomic_dec_return() to atomic_dec_and_test(): atomic_dec_return() is defined as two instructions on x86_64, whereas atomic_dec_and_test() is defined as a single atomic operation. The former can lead to a situation where the counter value is already decremented but the if statement in btrfs_encoded_read_endio() has not been completely processed, i.e. the test for 0 has not yet run. If another thread meanwhile continues executing btrfs_encoded_read_regular_fill_pages(), the atomic_dec_return() there can observe the already updated ->pending counter and proceed to free the private data. When the endio handler then resumes, the test for 0 succeeds and the wait_queue is woken up, resulting in a use-after-free. Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com> Suggested-by: Damien Le Moal <Damien.LeMoal@wdc.com> Fixes: 1881fba89bd5 ("btrfs: add BTRFS_IOC_ENCODED_READ ioctl") Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> --- fs/btrfs/inode.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 22b8e2764619..fdad1adee1a3 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -9089,7 +9089,7 @@ static void btrfs_encoded_read_endio(struct btrfs_bio *bbio) */ WRITE_ONCE(priv->status, bbio->bio.bi_status); } - if (atomic_dec_return(&priv->pending) == 0) { + if (atomic_dec_and_test(&priv->pending)) { int err = blk_status_to_errno(READ_ONCE(priv->status)); if (priv->uring_ctx) { -- 2.43.0 ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH v3 1/2] btrfs: fix use-after-free in btrfs_encoded_read_endio 2024-11-13 17:16 ` [PATCH v3 1/2] " Johannes Thumshirn @ 2024-11-13 19:30 ` Filipe Manana 2024-11-13 21:11 ` Qu Wenruo 1 sibling, 0 replies; 7+ messages in thread From: Filipe Manana @ 2024-11-13 19:30 UTC (permalink / raw) To: Johannes Thumshirn Cc: linux-btrfs, Filipe Manana, Damien Le Moal, Johannes Thumshirn, Mark Harmstone, Omar Sandoval, Shinichiro Kawasaki, Damien Le Moal On Wed, Nov 13, 2024 at 5:17 PM Johannes Thumshirn <jth@kernel.org> wrote: > > From: Johannes Thumshirn <johannes.thumshirn@wdc.com> > > Shinichiro reported the following use-after free that sometimes is > happening in our CI system when running fstests' btrfs/284 on a TCMU > runner device: > > ================================================================== > BUG: KASAN: slab-use-after-free in lock_release+0x708/0x780 > Read of size 8 at addr ffff888106a83f18 by task kworker/u80:6/219 > > CPU: 8 UID: 0 PID: 219 Comm: kworker/u80:6 Not tainted 6.12.0-rc6-kts+ #15 > Hardware name: Supermicro Super Server/X11SPi-TF, BIOS 3.3 02/21/2020 > Workqueue: btrfs-endio btrfs_end_bio_work [btrfs] > Call Trace: > <TASK> > dump_stack_lvl+0x6e/0xa0 > ? lock_release+0x708/0x780 > print_report+0x174/0x505 > ? lock_release+0x708/0x780 > ? __virt_addr_valid+0x224/0x410 > ? lock_release+0x708/0x780 > kasan_report+0xda/0x1b0 > ? lock_release+0x708/0x780 > ? __wake_up+0x44/0x60 > lock_release+0x708/0x780 > ? __pfx_lock_release+0x10/0x10 > ? __pfx_do_raw_spin_lock+0x10/0x10 > ? lock_is_held_type+0x9a/0x110 > _raw_spin_unlock_irqrestore+0x1f/0x60 > __wake_up+0x44/0x60 > btrfs_encoded_read_endio+0x14b/0x190 [btrfs] > btrfs_check_read_bio+0x8d9/0x1360 [btrfs] > ? lock_release+0x1b0/0x780 > ? trace_lock_acquire+0x12f/0x1a0 > ? __pfx_btrfs_check_read_bio+0x10/0x10 [btrfs] > ? process_one_work+0x7e3/0x1460 > ? lock_acquire+0x31/0xc0 > ? process_one_work+0x7e3/0x1460 > process_one_work+0x85c/0x1460 > ? __pfx_process_one_work+0x10/0x10 > ? 
assign_work+0x16c/0x240 > worker_thread+0x5e6/0xfc0 > ? __pfx_worker_thread+0x10/0x10 > kthread+0x2c3/0x3a0 > ? __pfx_kthread+0x10/0x10 > ret_from_fork+0x31/0x70 > ? __pfx_kthread+0x10/0x10 > ret_from_fork_asm+0x1a/0x30 > </TASK> > > Allocated by task 3661: > kasan_save_stack+0x30/0x50 > kasan_save_track+0x14/0x30 > __kasan_kmalloc+0xaa/0xb0 > btrfs_encoded_read_regular_fill_pages+0x16c/0x6d0 [btrfs] > send_extent_data+0xf0f/0x24a0 [btrfs] > process_extent+0x48a/0x1830 [btrfs] > changed_cb+0x178b/0x2ea0 [btrfs] > btrfs_ioctl_send+0x3bf9/0x5c20 [btrfs] > _btrfs_ioctl_send+0x117/0x330 [btrfs] > btrfs_ioctl+0x184a/0x60a0 [btrfs] > __x64_sys_ioctl+0x12e/0x1a0 > do_syscall_64+0x95/0x180 > entry_SYSCALL_64_after_hwframe+0x76/0x7e > > Freed by task 3661: > kasan_save_stack+0x30/0x50 > kasan_save_track+0x14/0x30 > kasan_save_free_info+0x3b/0x70 > __kasan_slab_free+0x4f/0x70 > kfree+0x143/0x490 > btrfs_encoded_read_regular_fill_pages+0x531/0x6d0 [btrfs] > send_extent_data+0xf0f/0x24a0 [btrfs] > process_extent+0x48a/0x1830 [btrfs] > changed_cb+0x178b/0x2ea0 [btrfs] > btrfs_ioctl_send+0x3bf9/0x5c20 [btrfs] > _btrfs_ioctl_send+0x117/0x330 [btrfs] > btrfs_ioctl+0x184a/0x60a0 [btrfs] > __x64_sys_ioctl+0x12e/0x1a0 > do_syscall_64+0x95/0x180 > entry_SYSCALL_64_after_hwframe+0x76/0x7e > > The buggy address belongs to the object at ffff888106a83f00 > which belongs to the cache kmalloc-rnd-07-96 of size 96 > The buggy address is located 24 bytes inside of > freed 96-byte region [ffff888106a83f00, ffff888106a83f60) > > The buggy address belongs to the physical page: > page: refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff888106a83800 pfn:0x106a83 > flags: 0x17ffffc0000000(node=0|zone=2|lastcpupid=0x1fffff) > page_type: f5(slab) > raw: 0017ffffc0000000 ffff888100053680 ffffea0004917200 0000000000000004 > raw: ffff888106a83800 0000000080200019 00000001f5000000 0000000000000000 > page dumped because: kasan: bad access detected > > Memory state around the buggy address: > 
ffff888106a83e00: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc > ffff888106a83e80: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc > >ffff888106a83f00: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc > ^ > ffff888106a83f80: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc > ffff888106a84000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > ================================================================== > > Further analyzing the trace and the crash dump's vmcore file shows that > the wake_up() call in btrfs_encoded_read_endio() is calling wake_up() on > the wait_queue that is in the private data passed to the end_io handler. > > Commit 4ff47df40447 ("btrfs: move priv off stack in > btrfs_encoded_read_regular_fill_pages()") moved 'struct > btrfs_encoded_read_private' off the stack. > > Before that commit one can see a corruption of the private data when > analyzing the vmcore after a crash: > > *(struct btrfs_encoded_read_private *)0xffff88815626eec8 = { > .wait = (wait_queue_head_t){ > .lock = (spinlock_t){ > .rlock = (struct raw_spinlock){ > .raw_lock = (arch_spinlock_t){ > .val = (atomic_t){ > .counter = (int)-2005885696, > }, > .locked = (u8)0, > .pending = (u8)157, > .locked_pending = (u16)40192, > .tail = (u16)34928, > }, > .magic = (unsigned int)536325682, > .owner_cpu = (unsigned int)29, > .owner = (void *)__SCT__tp_func_btrfs_transaction_commit+0x0 = 0x0, > .dep_map = (struct lockdep_map){ > .key = (struct lock_class_key *)0xffff8881575a3b6c, > .class_cache = (struct lock_class *[2]){ 0xffff8882a71985c0, 0xffffea00066f5d40 }, > .name = (const char *)0xffff88815626f100 = "", > .wait_type_outer = (u8)37, > .wait_type_inner = (u8)178, > .lock_type = (u8)154, > }, > }, > .__padding = (u8 [24]){ 0, 157, 112, 136, 50, 174, 247, 31, 29 }, > .dep_map = (struct lockdep_map){ > .key = (struct lock_class_key *)0xffff8881575a3b6c, > .class_cache = (struct lock_class *[2]){ 0xffff8882a71985c0, 0xffffea00066f5d40 }, > .name = (const char *)0xffff88815626f100 = "", > 
.wait_type_outer = (u8)37, > .wait_type_inner = (u8)178, > .lock_type = (u8)154, > }, > }, > .head = (struct list_head){ > .next = (struct list_head *)0x112cca, > .prev = (struct list_head *)0x47, > }, > }, > .pending = (atomic_t){ > .counter = (int)-1491499288, > }, > .status = (blk_status_t)130, > } > > Here we can see several indicators of in-memory data corruption, e.g. the > large negative atomic values of ->pending or > ->wait->lock->rlock->raw_lock->val, as well as the bogus spinlock magic > 0x1ff7ae32 (decimal 536325682 above) instead of 0xdead4ead or the bogus > pointer values for ->wait->head. > > To fix this, change atomic_dec_return() to atomic_dec_and_test() to fix the > corruption, as atomic_dec_return() is defined as two instructions on > x86_64, whereas atomic_dec_and_test() is defined as a single atomic > operation. This can lead to a situation where counter value is already > decremented but the if statement in btrfs_encoded_read_endio() is not > completely processed, i.e. the 0 test has not completed. If another thread > continues executing btrfs_encoded_read_regular_fill_pages() the > atomic_dec_return() there can see an already updated ->pending counter and > continues by freeing the private data. Continuing in the endio handler the > test for 0 succeeds and the wait_queue is woken up, resulting in a > use-after-free. > > Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com> > Suggested-by: Damien Le Moal <Damien.LeMoal@wdc.com> > Fixes: 1881fba89bd5 ("btrfs: add BTRFS_IOC_ENCODED_READ ioctl") > Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Looks good now, thanks. 
> --- > fs/btrfs/inode.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c > index 22b8e2764619..fdad1adee1a3 100644 > --- a/fs/btrfs/inode.c > +++ b/fs/btrfs/inode.c > @@ -9089,7 +9089,7 @@ static void btrfs_encoded_read_endio(struct btrfs_bio *bbio) > */ > WRITE_ONCE(priv->status, bbio->bio.bi_status); > } > - if (atomic_dec_return(&priv->pending) == 0) { > + if (atomic_dec_and_test(&priv->pending)) { > int err = blk_status_to_errno(READ_ONCE(priv->status)); > > if (priv->uring_ctx) { > -- > 2.43.0 > > ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v3 1/2] btrfs: fix use-after-free in btrfs_encoded_read_endio 2024-11-13 17:16 ` [PATCH v3 1/2] " Johannes Thumshirn 2024-11-13 19:30 ` Filipe Manana @ 2024-11-13 21:11 ` Qu Wenruo 1 sibling, 0 replies; 7+ messages in thread From: Qu Wenruo @ 2024-11-13 21:11 UTC (permalink / raw) To: Johannes Thumshirn, linux-btrfs Cc: Filipe Manana, Damien Le Moal, Johannes Thumshirn, Mark Harmstone, Omar Sandoval, Shinichiro Kawasaki, Damien Le Moal On 2024/11/14 03:46, Johannes Thumshirn wrote: > From: Johannes Thumshirn <johannes.thumshirn@wdc.com> > > Shinichiro reported the following use-after free that sometimes is > happening in our CI system when running fstests' btrfs/284 on a TCMU > runner device: > > ================================================================== > BUG: KASAN: slab-use-after-free in lock_release+0x708/0x780 > Read of size 8 at addr ffff888106a83f18 by task kworker/u80:6/219 > > CPU: 8 UID: 0 PID: 219 Comm: kworker/u80:6 Not tainted 6.12.0-rc6-kts+ #15 > Hardware name: Supermicro Super Server/X11SPi-TF, BIOS 3.3 02/21/2020 > Workqueue: btrfs-endio btrfs_end_bio_work [btrfs] > Call Trace: > <TASK> > dump_stack_lvl+0x6e/0xa0 > ? lock_release+0x708/0x780 > print_report+0x174/0x505 > ? lock_release+0x708/0x780 > ? __virt_addr_valid+0x224/0x410 > ? lock_release+0x708/0x780 > kasan_report+0xda/0x1b0 > ? lock_release+0x708/0x780 > ? __wake_up+0x44/0x60 > lock_release+0x708/0x780 > ? __pfx_lock_release+0x10/0x10 > ? __pfx_do_raw_spin_lock+0x10/0x10 > ? lock_is_held_type+0x9a/0x110 > _raw_spin_unlock_irqrestore+0x1f/0x60 > __wake_up+0x44/0x60 > btrfs_encoded_read_endio+0x14b/0x190 [btrfs] > btrfs_check_read_bio+0x8d9/0x1360 [btrfs] > ? lock_release+0x1b0/0x780 > ? trace_lock_acquire+0x12f/0x1a0 > ? __pfx_btrfs_check_read_bio+0x10/0x10 [btrfs] > ? process_one_work+0x7e3/0x1460 > ? lock_acquire+0x31/0xc0 > ? process_one_work+0x7e3/0x1460 > process_one_work+0x85c/0x1460 > ? __pfx_process_one_work+0x10/0x10 > ? 
assign_work+0x16c/0x240 > worker_thread+0x5e6/0xfc0 > ? __pfx_worker_thread+0x10/0x10 > kthread+0x2c3/0x3a0 > ? __pfx_kthread+0x10/0x10 > ret_from_fork+0x31/0x70 > ? __pfx_kthread+0x10/0x10 > ret_from_fork_asm+0x1a/0x30 > </TASK> > > Allocated by task 3661: > kasan_save_stack+0x30/0x50 > kasan_save_track+0x14/0x30 > __kasan_kmalloc+0xaa/0xb0 > btrfs_encoded_read_regular_fill_pages+0x16c/0x6d0 [btrfs] > send_extent_data+0xf0f/0x24a0 [btrfs] > process_extent+0x48a/0x1830 [btrfs] > changed_cb+0x178b/0x2ea0 [btrfs] > btrfs_ioctl_send+0x3bf9/0x5c20 [btrfs] > _btrfs_ioctl_send+0x117/0x330 [btrfs] > btrfs_ioctl+0x184a/0x60a0 [btrfs] > __x64_sys_ioctl+0x12e/0x1a0 > do_syscall_64+0x95/0x180 > entry_SYSCALL_64_after_hwframe+0x76/0x7e > > Freed by task 3661: > kasan_save_stack+0x30/0x50 > kasan_save_track+0x14/0x30 > kasan_save_free_info+0x3b/0x70 > __kasan_slab_free+0x4f/0x70 > kfree+0x143/0x490 > btrfs_encoded_read_regular_fill_pages+0x531/0x6d0 [btrfs] > send_extent_data+0xf0f/0x24a0 [btrfs] > process_extent+0x48a/0x1830 [btrfs] > changed_cb+0x178b/0x2ea0 [btrfs] > btrfs_ioctl_send+0x3bf9/0x5c20 [btrfs] > _btrfs_ioctl_send+0x117/0x330 [btrfs] > btrfs_ioctl+0x184a/0x60a0 [btrfs] > __x64_sys_ioctl+0x12e/0x1a0 > do_syscall_64+0x95/0x180 > entry_SYSCALL_64_after_hwframe+0x76/0x7e > > The buggy address belongs to the object at ffff888106a83f00 > which belongs to the cache kmalloc-rnd-07-96 of size 96 > The buggy address is located 24 bytes inside of > freed 96-byte region [ffff888106a83f00, ffff888106a83f60) > > The buggy address belongs to the physical page: > page: refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff888106a83800 pfn:0x106a83 > flags: 0x17ffffc0000000(node=0|zone=2|lastcpupid=0x1fffff) > page_type: f5(slab) > raw: 0017ffffc0000000 ffff888100053680 ffffea0004917200 0000000000000004 > raw: ffff888106a83800 0000000080200019 00000001f5000000 0000000000000000 > page dumped because: kasan: bad access detected > > Memory state around the buggy address: > 
ffff888106a83e00: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc > ffff888106a83e80: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc > >ffff888106a83f00: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc > ^ > ffff888106a83f80: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc > ffff888106a84000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > ================================================================== > > Further analyzing the trace and the crash dump's vmcore file shows that > the wake_up() call in btrfs_encoded_read_endio() is calling wake_up() on > the wait_queue that is in the private data passed to the end_io handler. > > Commit 4ff47df40447 ("btrfs: move priv off stack in > btrfs_encoded_read_regular_fill_pages()") moved 'struct > btrfs_encoded_read_private' off the stack. > > Before that commit one can see a corruption of the private data when > analyzing the vmcore after a crash: > > *(struct btrfs_encoded_read_private *)0xffff88815626eec8 = { > .wait = (wait_queue_head_t){ > .lock = (spinlock_t){ > .rlock = (struct raw_spinlock){ > .raw_lock = (arch_spinlock_t){ > .val = (atomic_t){ > .counter = (int)-2005885696, > }, > .locked = (u8)0, > .pending = (u8)157, > .locked_pending = (u16)40192, > .tail = (u16)34928, > }, > .magic = (unsigned int)536325682, > .owner_cpu = (unsigned int)29, > .owner = (void *)__SCT__tp_func_btrfs_transaction_commit+0x0 = 0x0, > .dep_map = (struct lockdep_map){ > .key = (struct lock_class_key *)0xffff8881575a3b6c, > .class_cache = (struct lock_class *[2]){ 0xffff8882a71985c0, 0xffffea00066f5d40 }, > .name = (const char *)0xffff88815626f100 = "", > .wait_type_outer = (u8)37, > .wait_type_inner = (u8)178, > .lock_type = (u8)154, > }, > }, > .__padding = (u8 [24]){ 0, 157, 112, 136, 50, 174, 247, 31, 29 }, > .dep_map = (struct lockdep_map){ > .key = (struct lock_class_key *)0xffff8881575a3b6c, > .class_cache = (struct lock_class *[2]){ 0xffff8882a71985c0, 0xffffea00066f5d40 }, > .name = (const char *)0xffff88815626f100 = "", > 
.wait_type_outer = (u8)37, > .wait_type_inner = (u8)178, > .lock_type = (u8)154, > }, > }, > .head = (struct list_head){ > .next = (struct list_head *)0x112cca, > .prev = (struct list_head *)0x47, > }, > }, > .pending = (atomic_t){ > .counter = (int)-1491499288, > }, > .status = (blk_status_t)130, > } > > Here we can see several indicators of in-memory data corruption, e.g. the > large negative atomic values of ->pending or > ->wait->lock->rlock->raw_lock->val, as well as the bogus spinlock magic > 0x1ff7ae32 (decimal 536325682 above) instead of 0xdead4ead or the bogus > pointer values for ->wait->head. > > To fix this, change atomic_dec_return() to atomic_dec_and_test() to fix the > corruption, as atomic_dec_return() is defined as two instructions on > x86_64, whereas atomic_dec_and_test() is defined as a single atomic > operation. This can lead to a situation where counter value is already > decremented but the if statement in btrfs_encoded_read_endio() is not > completely processed, i.e. the 0 test has not completed. If another thread > continues executing btrfs_encoded_read_regular_fill_pages() the > atomic_dec_return() there can see an already updated ->pending counter and > continues by freeing the private data. Continuing in the endio handler the > test for 0 succeeds and the wait_queue is woken up, resulting in a > use-after-free. > > Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com> > Suggested-by: Damien Le Moal <Damien.LeMoal@wdc.com> > Fixes: 1881fba89bd5 ("btrfs: add BTRFS_IOC_ENCODED_READ ioctl") > Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Although you reworked the whole @pending handling in the next patch, which is pretty good, I assume this is a hotfix which needs to be backported. In that case, I'm wondering whether we should also fix the atomic_dec_return() in btrfs_encoded_read_regular_fill_pages(), just in case. Otherwise looks good to me. 
Reviewed-by: Qu Wenruo <wqu@suse.com> Thanks, Qu > --- > fs/btrfs/inode.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c > index 22b8e2764619..fdad1adee1a3 100644 > --- a/fs/btrfs/inode.c > +++ b/fs/btrfs/inode.c > @@ -9089,7 +9089,7 @@ static void btrfs_encoded_read_endio(struct btrfs_bio *bbio) > */ > WRITE_ONCE(priv->status, bbio->bio.bi_status); > } > - if (atomic_dec_return(&priv->pending) == 0) { > + if (atomic_dec_and_test(&priv->pending)) { > int err = blk_status_to_errno(READ_ONCE(priv->status)); > > if (priv->uring_ctx) { ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH v3 2/2] btrfs: simplify waiting for encoded read endios 2024-11-13 17:16 [PATCH v3 0/2] btrfs: fix use-after-free in btrfs_encoded_read_endio Johannes Thumshirn 2024-11-13 17:16 ` [PATCH v3 1/2] " Johannes Thumshirn @ 2024-11-13 17:16 ` Johannes Thumshirn 2024-11-13 19:29 ` Filipe Manana 2024-11-13 21:11 ` Qu Wenruo 1 sibling, 2 replies; 7+ messages in thread From: Johannes Thumshirn @ 2024-11-13 17:16 UTC (permalink / raw) To: linux-btrfs Cc: Filipe Manana, Damien Le Moal, Johannes Thumshirn, Mark Harmstone, Omar Sandoval From: Johannes Thumshirn <johannes.thumshirn@wdc.com> Simplify the I/O completion path for encoded reads by using a completion instead of a wait_queue. Furthermore use refcount_t instead of atomic_t for reference counting the private data. Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> --- fs/btrfs/inode.c | 23 ++++++++++++----------- 1 file changed, 12 insertions(+), 11 deletions(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index fdad1adee1a3..3ba78dc3abaa 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -3,6 +3,7 @@ * Copyright (C) 2007 Oracle. All rights reserved. 
*/ +#include "linux/completion.h" #include <crypto/hash.h> #include <linux/kernel.h> #include <linux/bio.h> @@ -9068,9 +9069,9 @@ static ssize_t btrfs_encoded_read_inline( } struct btrfs_encoded_read_private { - wait_queue_head_t wait; + struct completion done; void *uring_ctx; - atomic_t pending; + refcount_t pending_refs; blk_status_t status; }; @@ -9089,14 +9090,14 @@ static void btrfs_encoded_read_endio(struct btrfs_bio *bbio) */ WRITE_ONCE(priv->status, bbio->bio.bi_status); } - if (atomic_dec_and_test(&priv->pending)) { + if (refcount_dec_and_test(&priv->pending_refs)) { int err = blk_status_to_errno(READ_ONCE(priv->status)); if (priv->uring_ctx) { btrfs_uring_read_extent_endio(priv->uring_ctx, err); kfree(priv); } else { - wake_up(&priv->wait); + complete(&priv->done); } } bio_put(&bbio->bio); @@ -9116,8 +9117,8 @@ int btrfs_encoded_read_regular_fill_pages(struct btrfs_inode *inode, if (!priv) return -ENOMEM; - init_waitqueue_head(&priv->wait); - atomic_set(&priv->pending, 1); + init_completion(&priv->done); + refcount_set(&priv->pending_refs, 1); priv->status = 0; priv->uring_ctx = uring_ctx; @@ -9130,7 +9131,7 @@ int btrfs_encoded_read_regular_fill_pages(struct btrfs_inode *inode, size_t bytes = min_t(u64, disk_io_size, PAGE_SIZE); if (bio_add_page(&bbio->bio, pages[i], bytes, 0) < bytes) { - atomic_inc(&priv->pending); + refcount_inc(&priv->pending_refs); btrfs_submit_bbio(bbio, 0); bbio = btrfs_bio_alloc(BIO_MAX_VECS, REQ_OP_READ, fs_info, @@ -9145,11 +9146,11 @@ int btrfs_encoded_read_regular_fill_pages(struct btrfs_inode *inode, disk_io_size -= bytes; } while (disk_io_size); - atomic_inc(&priv->pending); + refcount_inc(&priv->pending_refs); btrfs_submit_bbio(bbio, 0); if (uring_ctx) { - if (atomic_dec_return(&priv->pending) == 0) { + if (refcount_dec_and_test(&priv->pending_refs)) { ret = blk_status_to_errno(READ_ONCE(priv->status)); btrfs_uring_read_extent_endio(uring_ctx, ret); kfree(priv); @@ -9158,8 +9159,8 @@ int 
btrfs_encoded_read_regular_fill_pages(struct btrfs_inode *inode, return -EIOCBQUEUED; } else { - if (atomic_dec_return(&priv->pending) != 0) - io_wait_event(priv->wait, !atomic_read(&priv->pending)); + if (!refcount_dec_and_test(&priv->pending_refs)) + wait_for_completion_io(&priv->done); /* See btrfs_encoded_read_endio() for ordering. */ ret = blk_status_to_errno(READ_ONCE(priv->status)); kfree(priv); -- 2.43.0 ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH v3 2/2] btrfs: simplify waiting for encoded read endios
  2024-11-13 17:16 ` [PATCH v3 2/2] btrfs: simplify waiting for encoded read endios Johannes Thumshirn
@ 2024-11-13 19:29 ` Filipe Manana
  2024-11-13 21:11 ` Qu Wenruo
  1 sibling, 0 replies; 7+ messages in thread
From: Filipe Manana @ 2024-11-13 19:29 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: linux-btrfs, Filipe Manana, Damien Le Moal, Johannes Thumshirn,
      Mark Harmstone, Omar Sandoval

On Wed, Nov 13, 2024 at 5:17 PM Johannes Thumshirn <jth@kernel.org> wrote:
>
> From: Johannes Thumshirn <johannes.thumshirn@wdc.com>
>
> Simplify the I/O completion path for encoded reads by using a
> completion instead of a wait_queue.
>
> Furthermore use refcount_t instead of atomic_t for reference counting the
> private data.
>
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

Reviewed-by: Filipe Manana <fdmanana@suse.com>

Looks good now, thanks.

> ---
>  fs/btrfs/inode.c | 23 ++++++++++++-----------
>  1 file changed, 12 insertions(+), 11 deletions(-)
>
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index fdad1adee1a3..3ba78dc3abaa 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -3,6 +3,7 @@
>   * Copyright (C) 2007 Oracle. All rights reserved.
>   */
>
> +#include "linux/completion.h"
>  #include <crypto/hash.h>
>  #include <linux/kernel.h>
>  #include <linux/bio.h>
> @@ -9068,9 +9069,9 @@ static ssize_t btrfs_encoded_read_inline(
>  }
>
>  struct btrfs_encoded_read_private {
> -	wait_queue_head_t wait;
> +	struct completion done;
>  	void *uring_ctx;
> -	atomic_t pending;
> +	refcount_t pending_refs;
>  	blk_status_t status;
>  };
>
> @@ -9089,14 +9090,14 @@ static void btrfs_encoded_read_endio(struct btrfs_bio *bbio)
>  		 */
>  		WRITE_ONCE(priv->status, bbio->bio.bi_status);
>  	}
> -	if (atomic_dec_and_test(&priv->pending)) {
> +	if (refcount_dec_and_test(&priv->pending_refs)) {
>  		int err = blk_status_to_errno(READ_ONCE(priv->status));
>
>  		if (priv->uring_ctx) {
>  			btrfs_uring_read_extent_endio(priv->uring_ctx, err);
>  			kfree(priv);
>  		} else {
> -			wake_up(&priv->wait);
> +			complete(&priv->done);
>  		}
>  	}
>  	bio_put(&bbio->bio);
> @@ -9116,8 +9117,8 @@ int btrfs_encoded_read_regular_fill_pages(struct btrfs_inode *inode,
>  	if (!priv)
>  		return -ENOMEM;
>
> -	init_waitqueue_head(&priv->wait);
> -	atomic_set(&priv->pending, 1);
> +	init_completion(&priv->done);
> +	refcount_set(&priv->pending_refs, 1);
>  	priv->status = 0;
>  	priv->uring_ctx = uring_ctx;
>
> @@ -9130,7 +9131,7 @@ int btrfs_encoded_read_regular_fill_pages(struct btrfs_inode *inode,
>  		size_t bytes = min_t(u64, disk_io_size, PAGE_SIZE);
>
>  		if (bio_add_page(&bbio->bio, pages[i], bytes, 0) < bytes) {
> -			atomic_inc(&priv->pending);
> +			refcount_inc(&priv->pending_refs);
>  			btrfs_submit_bbio(bbio, 0);
>
>  			bbio = btrfs_bio_alloc(BIO_MAX_VECS, REQ_OP_READ, fs_info,
> @@ -9145,11 +9146,11 @@ int btrfs_encoded_read_regular_fill_pages(struct btrfs_inode *inode,
>  		disk_io_size -= bytes;
>  	} while (disk_io_size);
>
> -	atomic_inc(&priv->pending);
> +	refcount_inc(&priv->pending_refs);
>  	btrfs_submit_bbio(bbio, 0);
>
>  	if (uring_ctx) {
> -		if (atomic_dec_return(&priv->pending) == 0) {
> +		if (refcount_dec_and_test(&priv->pending_refs)) {
>  			ret = blk_status_to_errno(READ_ONCE(priv->status));
>  			btrfs_uring_read_extent_endio(uring_ctx, ret);
>  			kfree(priv);
> @@ -9158,8 +9159,8 @@ int btrfs_encoded_read_regular_fill_pages(struct btrfs_inode *inode,
>
>  		return -EIOCBQUEUED;
>  	} else {
> -		if (atomic_dec_return(&priv->pending) != 0)
> -			io_wait_event(priv->wait, !atomic_read(&priv->pending));
> +		if (!refcount_dec_and_test(&priv->pending_refs))
> +			wait_for_completion_io(&priv->done);
>  		/* See btrfs_encoded_read_endio() for ordering. */
>  		ret = blk_status_to_errno(READ_ONCE(priv->status));
>  		kfree(priv);
> --
> 2.43.0

--
Filipe David Manana,

"Whether you think you can, or you think you can't — you're right."
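[Editorial note, not part of the thread] The reference-counting pattern the patch settles on can be sketched in plain userspace C. This is not the kernel code: `refcount_t`/`struct completion` are replaced with C11 stand-ins, the bio completions run sequentially instead of from interrupt/workqueue context, and the struct and function names only mirror the patch for readability.

```c
/*
 * Userspace sketch of the pattern from the patch: pending_refs starts at 1
 * for the submitter, each in-flight "bio" holds one extra reference, and
 * whoever drops the last reference signals completion. In this sequential
 * simulation the submitter always drops last, so the wait loop is never
 * entered; in the kernel the endio side may finish first and complete().
 */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct encoded_read_private {
	atomic_int pending_refs;	/* stands in for refcount_t */
	bool done;			/* stands in for struct completion */
	int status;			/* first error seen, 0 if none */
};

/* One call per completed "bio"; returns true if it signalled completion. */
static bool endio(struct encoded_read_private *priv, int status)
{
	if (status)
		priv->status = status;	/* only errors overwrite the status */
	if (atomic_fetch_sub(&priv->pending_refs, 1) == 1) {
		priv->done = true;	/* complete(&priv->done) in the kernel */
		return true;
	}
	return false;
}

/* Submitter: one ref per bio on top of the initial self-reference. */
static int submit_and_wait(struct encoded_read_private *priv, int nr_bios,
			   const int *bio_status)
{
	atomic_init(&priv->pending_refs, 1);
	priv->done = false;
	priv->status = 0;

	for (int i = 0; i < nr_bios; i++)
		atomic_fetch_add(&priv->pending_refs, 1);

	/* Simulate the completions (irq/workqueue context in the kernel). */
	for (int i = 0; i < nr_bios; i++)
		endio(priv, bio_status[i]);

	/* Drop the submitter's own reference; wait only if not last. */
	if (atomic_fetch_sub(&priv->pending_refs, 1) != 1)
		while (!priv->done)
			;	/* wait_for_completion_io() in the kernel */
	return priv->status;
}
```

The initial count of 1 is the point of the design: it prevents the count from ever hitting zero while the submitter is still adding bios, which is what makes the final `refcount_dec_and_test()` (rather than a decrement plus a separate re-read) a safe place to free or complete.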
* Re: [PATCH v3 2/2] btrfs: simplify waiting for encoded read endios
  2024-11-13 17:16 ` [PATCH v3 2/2] btrfs: simplify waiting for encoded read endios Johannes Thumshirn
  2024-11-13 19:29 ` Filipe Manana
@ 2024-11-13 21:11 ` Qu Wenruo
  1 sibling, 0 replies; 7+ messages in thread
From: Qu Wenruo @ 2024-11-13 21:11 UTC (permalink / raw)
  To: Johannes Thumshirn, linux-btrfs
  Cc: Filipe Manana, Damien Le Moal, Johannes Thumshirn,
      Mark Harmstone, Omar Sandoval

On 2024/11/14 03:46, Johannes Thumshirn wrote:
> From: Johannes Thumshirn <johannes.thumshirn@wdc.com>
>
> Simplify the I/O completion path for encoded reads by using a
> completion instead of a wait_queue.
>
> Furthermore use refcount_t instead of atomic_t for reference counting the
> private data.
>
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

Reviewed-by: Qu Wenruo <wqu@suse.com>

Thanks,
Qu

> ---
>  fs/btrfs/inode.c | 23 ++++++++++++-----------
>  1 file changed, 12 insertions(+), 11 deletions(-)
>
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index fdad1adee1a3..3ba78dc3abaa 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -3,6 +3,7 @@
>   * Copyright (C) 2007 Oracle. All rights reserved.
>   */
>
> +#include "linux/completion.h"
>  #include <crypto/hash.h>
>  #include <linux/kernel.h>
>  #include <linux/bio.h>
> @@ -9068,9 +9069,9 @@ static ssize_t btrfs_encoded_read_inline(
>  }
>
>  struct btrfs_encoded_read_private {
> -	wait_queue_head_t wait;
> +	struct completion done;
>  	void *uring_ctx;
> -	atomic_t pending;
> +	refcount_t pending_refs;
>  	blk_status_t status;
>  };
>
> @@ -9089,14 +9090,14 @@ static void btrfs_encoded_read_endio(struct btrfs_bio *bbio)
>  		 */
>  		WRITE_ONCE(priv->status, bbio->bio.bi_status);
>  	}
> -	if (atomic_dec_and_test(&priv->pending)) {
> +	if (refcount_dec_and_test(&priv->pending_refs)) {
>  		int err = blk_status_to_errno(READ_ONCE(priv->status));
>
>  		if (priv->uring_ctx) {
>  			btrfs_uring_read_extent_endio(priv->uring_ctx, err);
>  			kfree(priv);
>  		} else {
> -			wake_up(&priv->wait);
> +			complete(&priv->done);
>  		}
>  	}
>  	bio_put(&bbio->bio);
> @@ -9116,8 +9117,8 @@ int btrfs_encoded_read_regular_fill_pages(struct btrfs_inode *inode,
>  	if (!priv)
>  		return -ENOMEM;
>
> -	init_waitqueue_head(&priv->wait);
> -	atomic_set(&priv->pending, 1);
> +	init_completion(&priv->done);
> +	refcount_set(&priv->pending_refs, 1);
>  	priv->status = 0;
>  	priv->uring_ctx = uring_ctx;
>
> @@ -9130,7 +9131,7 @@ int btrfs_encoded_read_regular_fill_pages(struct btrfs_inode *inode,
>  		size_t bytes = min_t(u64, disk_io_size, PAGE_SIZE);
>
>  		if (bio_add_page(&bbio->bio, pages[i], bytes, 0) < bytes) {
> -			atomic_inc(&priv->pending);
> +			refcount_inc(&priv->pending_refs);
>  			btrfs_submit_bbio(bbio, 0);
>
>  			bbio = btrfs_bio_alloc(BIO_MAX_VECS, REQ_OP_READ, fs_info,
> @@ -9145,11 +9146,11 @@ int btrfs_encoded_read_regular_fill_pages(struct btrfs_inode *inode,
>  		disk_io_size -= bytes;
>  	} while (disk_io_size);
>
> -	atomic_inc(&priv->pending);
> +	refcount_inc(&priv->pending_refs);
>  	btrfs_submit_bbio(bbio, 0);
>
>  	if (uring_ctx) {
> -		if (atomic_dec_return(&priv->pending) == 0) {
> +		if (refcount_dec_and_test(&priv->pending_refs)) {
>  			ret = blk_status_to_errno(READ_ONCE(priv->status));
>  			btrfs_uring_read_extent_endio(uring_ctx, ret);
>  			kfree(priv);
> @@ -9158,8 +9159,8 @@ int btrfs_encoded_read_regular_fill_pages(struct btrfs_inode *inode,
>
>  		return -EIOCBQUEUED;
>  	} else {
> -		if (atomic_dec_return(&priv->pending) != 0)
> -			io_wait_event(priv->wait, !atomic_read(&priv->pending));
> +		if (!refcount_dec_and_test(&priv->pending_refs))
> +			wait_for_completion_io(&priv->done);
>  		/* See btrfs_encoded_read_endio() for ordering. */
>  		ret = blk_status_to_errno(READ_ONCE(priv->status));
>  		kfree(priv);
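[Editorial note, not part of the thread] The quoted comment "See btrfs_encoded_read_endio() for ordering." relies on the memory-ordering guarantees of `refcount_dec_and_test()`: the kernel documents it as providing release ordering on the decrement and acquire ordering when the count reaches zero, so a plain `status` written before the drop is visible to whoever observes zero. A C11 analogue of that pairing, shown sequentially just to illustrate the shape (the variable and function names here are illustrative, not from the patch):

```c
/*
 * Release/acquire pairing behind the WRITE_ONCE()/READ_ONCE() on
 * priv->status: publish the status, then drop the reference with
 * release semantics; the thread that takes the count to zero issues
 * an acquire fence before reading the status back.
 */
#include <stdatomic.h>

static atomic_int pending = ATOMIC_VAR_INIT(2);
static int status;	/* plain store; ordered by the refcount drop */

/* Returns the published status if this drop was the last one, else 0. */
static int drop_ref_with_status(int st)
{
	if (st)
		status = st;		/* WRITE_ONCE() in the kernel */
	if (atomic_fetch_sub_explicit(&pending, 1,
				      memory_order_release) == 1) {
		/* Final decrement: pairs with the releases above. */
		atomic_thread_fence(memory_order_acquire);
		return status;		/* READ_ONCE() in the kernel */
	}
	return 0;	/* not last; another dropper will read status */
}
```

This is also why the patch's switch from `atomic_dec_return()` to `refcount_dec_and_test()` loses nothing: the refcount primitive already carries the ordering the old open-coded comment was pointing at.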
end of thread, other threads:[~2024-11-13 21:12 UTC | newest]

Thread overview: 7+ messages -- links below jump to the message on this page --
2024-11-13 17:16 [PATCH v3 0/2] btrfs: fix use-after-free in btrfs_encoded_read_endio Johannes Thumshirn
2024-11-13 17:16 ` [PATCH v3 1/2] " Johannes Thumshirn
2024-11-13 19:30   ` Filipe Manana
2024-11-13 21:11   ` Qu Wenruo
2024-11-13 17:16 ` [PATCH v3 2/2] btrfs: simplify waiting for encoded read endios Johannes Thumshirn
2024-11-13 19:29   ` Filipe Manana
2024-11-13 21:11   ` Qu Wenruo