* [v3 PATCH] ocfs2: fix recursive semaphore deadlock in fiemap call
[not found] ` <DM4PR10MB7476AAACC3BCFAA61077E606893BA@DM4PR10MB7476.namprd10.prod.outlook.com>
@ 2025-08-28 16:28 ` Mark Tinguely
2025-08-28 19:22 ` [External] : " Mark Tinguely
[not found] ` <93589648-9595-4fd1-a80f-5d1995c23fd6@linux.alibaba.com>
1 sibling, 1 reply; 3+ messages in thread
From: Mark Tinguely @ 2025-08-28 16:28 UTC (permalink / raw)
To: ocfs2-devel@lists.linux.dev
syzbot detected a OCFS2 hang due to a recursive semaphore on a
FS_IOC_FIEMAP of the extent list on a specially crafted mmap file.
context_switch kernel/sched/core.c:5357 [inline]
__schedule+0x1798/0x4cc0 kernel/sched/core.c:6961
__schedule_loop kernel/sched/core.c:7043 [inline]
schedule+0x165/0x360 kernel/sched/core.c:7058
schedule_preempt_disabled+0x13/0x30 kernel/sched/core.c:7115
rwsem_down_write_slowpath+0x872/0xfe0 kernel/locking/rwsem.c:1185
__down_write_common kernel/locking/rwsem.c:1317 [inline]
__down_write kernel/locking/rwsem.c:1326 [inline]
down_write+0x1ab/0x1f0 kernel/locking/rwsem.c:1591
ocfs2_page_mkwrite+0x2ff/0xc40 fs/ocfs2/mmap.c:142
do_page_mkwrite+0x14d/0x310 mm/memory.c:3361
wp_page_shared mm/memory.c:3762 [inline]
do_wp_page+0x268d/0x5800 mm/memory.c:3981
handle_pte_fault mm/memory.c:6068 [inline]
__handle_mm_fault+0x1033/0x5440 mm/memory.c:6195
handle_mm_fault+0x40a/0x8e0 mm/memory.c:6364
do_user_addr_fault+0x764/0x1390 arch/x86/mm/fault.c:1387
handle_page_fault arch/x86/mm/fault.c:1476 [inline]
exc_page_fault+0x76/0xf0 arch/x86/mm/fault.c:1532
asm_exc_page_fault+0x26/0x30 arch/x86/include/asm/idtentry.h:623
RIP: 0010:copy_user_generic arch/x86/include/asm/uaccess_64.h:126 [inline]
RIP: 0010:raw_copy_to_user arch/x86/include/asm/uaccess_64.h:147 [inline]
RIP: 0010:_inline_copy_to_user include/linux/uaccess.h:197 [inline]
RIP: 0010:_copy_to_user+0x85/0xb0 lib/usercopy.c:26
Code: e8 00 bc f7 fc 4d 39 fc 72 3d 4d 39 ec 77 38 e8 91 b9 f7 fc 4c 89
f7 89 de e8 47 25 5b fd 0f 01 cb 4c 89 ff 48 89 d9 4c 89 f6 <f3> a4 0f
1f 00 48 89 cb 0f 01 ca 48 89 d8 5b 41 5c 41 5d 41 5e 41
RSP: 0018:ffffc9000403f950 EFLAGS: 00050256
RAX: ffffffff84c7f101 RBX: 0000000000000038 RCX: 0000000000000038
RDX: 0000000000000000 RSI: ffffc9000403f9e0 RDI: 0000200000000060
RBP: ffffc9000403fa90 R08: ffffc9000403fa17 R09: 1ffff92000807f42
R10: dffffc0000000000 R11: fffff52000807f43 R12: 0000200000000098
R13: 00007ffffffff000 R14: ffffc9000403f9e0 R15: 0000200000000060
copy_to_user include/linux/uaccess.h:225 [inline]
fiemap_fill_next_extent+0x1c0/0x390 fs/ioctl.c:145
ocfs2_fiemap+0x888/0xc90 fs/ocfs2/extent_map.c:806
ioctl_fiemap fs/ioctl.c:220 [inline]
do_vfs_ioctl+0x1173/0x1430 fs/ioctl.c:532
__do_sys_ioctl fs/ioctl.c:596 [inline]
__se_sys_ioctl+0x82/0x170 fs/ioctl.c:584
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xfa/0x3b0 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f5f13850fd9
RSP: 002b:00007ffe3b3518b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000200000000000 RCX: 00007f5f13850fd9
RDX: 0000200000000040 RSI: 00000000c020660b RDI: 0000000000000004
RBP: 6165627472616568 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe3b3518f0
R13: 00007ffe3b351b18 R14: 431bde82d7b634db R15: 00007f5f1389a03b
ocfs2_fiemap() takes a read lock of the ip_alloc_sem semaphore (since
v2.6.22-527-g7307de80510a) and calls fiemap_fill_next_extent()
to read the extent list of this running mmap executable.
The user supplied buffer to hold the fiemap information page faults
calling ocfs2_page_mkwrite() which will take a write lock (since
v2.6.27-38-g00dc417fa3e7) of the same semaphore. This recursive
semaphore will hold filesystem locks and causes a hang of the
fileystem.
The ip_alloc_sem protects the inode extent list and size.
Release the read semphore before calling fiemap_fill_next_extent()
in ocfs2_fiemap() and ocfs2_fiemap_inline(). ocfs2_fiemap_inline()
now returns with semaphore released.
Reported-by: syzbot+541dcc6ee768f77103e7@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=541dcc6ee768f77103e7
<https://syzkaller.appspot.com/bug?extid=541dcc6ee768f77103e7>
Signed-off-by: Mark Tinguely <mark.tinguely@oracle.com>
---
v3: also release semaphore in ocfs2_fiemap_inline() with semaphore released
move semaphore before calling fiemap_fill_next_extent()
change the error path to release the semaphore to avoid an unnecessary
semaphore down/up in the last loop..
v2: leave the write semaphore in ocfs2_page_mkwrite() but remove the
read semaphore before calling fiemap_fill_next_extent().
Add a Closes: line
review noted missing semaphore change to ocfs2_fiemap_inline()
suggest moving locks to fiemap_fill_next_extent() call.
v1: change the ip_alloc_sem from write to read.
rejected in review.
---
fs/ocfs2/extent_map.c | 27 +++++++++++++++++++++------
1 file changed, 21 insertions(+), 6 deletions(-)
diff --git a/fs/ocfs2/extent_map.c b/fs/ocfs2/extent_map.c
index 930150ed5db1..20045fb1bb70 100644
--- a/fs/ocfs2/extent_map.c
+++ b/fs/ocfs2/extent_map.c
@@ -706,6 +706,8 @@ int ocfs2_extent_map_get_blocks(struct inode *inode,
u64 v_blkno, u64 *p_blkno,
* it not only handles the fiemap for inlined files, but also deals
* with the fast symlink, cause they have no difference for extent
* mapping per se.
+ *
+ * Called with ip_alloc_sem taken and returned with it released.
*/
static int ocfs2_fiemap_inline(struct inode *inode, struct
buffer_head *di_bh,
struct fiemap_extent_info *fieinfo,
@@ -732,10 +734,16 @@ static int ocfs2_fiemap_inline(struct inode
*inode, struct buffer_head *di_bh,
phys += offsetof(struct ocfs2_dinode,
id2.i_data.id_data);
+ /* prevent deadlock to a mmap executable page fault */
+ up_read(&OCFS2_I(inode)->ip_alloc_sem);
+
ret = fiemap_fill_next_extent(fieinfo, 0, phys, id_count,
flags);
if (ret < 0)
return ret;
+ } else {
+ /* return with the semaphore released */
+ up_read(&OCFS2_I(inode)->ip_alloc_sem);
}
return 0;
@@ -769,6 +777,7 @@ int ocfs2_fiemap(struct inode *inode, struct
fiemap_extent_info *fieinfo,
*/
if ((OCFS2_I(inode)->ip_dyn_features & OCFS2_INLINE_DATA_FL) ||
ocfs2_inode_is_fast_symlink(inode)) {
+ /* ip_alloc_sem is released in ocfs2_fiemap_inline() */
ret = ocfs2_fiemap_inline(inode, di_bh, fieinfo,
map_start);
goto out_unlock;
}
@@ -784,6 +793,7 @@ int ocfs2_fiemap(struct inode *inode, struct
fiemap_extent_info *fieinfo,
&hole_size, &rec,
&is_last);
if (ret) {
mlog_errno(ret);
+ up_read(&OCFS2_I(inode)->ip_alloc_sem);
goto out_unlock;
}
@@ -803,22 +813,27 @@ int ocfs2_fiemap(struct inode *inode, struct
fiemap_extent_info *fieinfo,
phys_bytes = le64_to_cpu(rec.e_blkno) <<
osb->sb->s_blocksize_bits;
virt_bytes = (u64)le32_to_cpu(rec.e_cpos) <<
osb->s_clustersize_bits;
+ /* prevent deadlock to a mmap executable page fault */
+ up_read(&OCFS2_I(inode)->ip_alloc_sem);
+
ret = fiemap_fill_next_extent(fieinfo, virt_bytes,
phys_bytes,
len_bytes, fe_flags);
- if (ret)
- break;
+ if (ret) {
+ if (ret > 0)
+ ret = 0;
+ goto out_unlock;
+ }
+
+ down_read(&OCFS2_I(inode)->ip_alloc_sem);
cpos = le32_to_cpu(rec.e_cpos)+
le16_to_cpu(rec.e_leaf_clusters);
}
- if (ret > 0)
- ret = 0;
+ up_read(&OCFS2_I(inode)->ip_alloc_sem);
out_unlock:
brelse(di_bh);
- up_read(&OCFS2_I(inode)->ip_alloc_sem);
-
ocfs2_inode_unlock(inode, 0);
out:
--
2.39.5 (Apple Git-154)
^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [External] : [v3 PATCH] ocfs2: fix recursive semaphore deadlock in fiemap call
2025-08-28 16:28 ` [v3 PATCH] ocfs2: fix recursive semaphore deadlock in fiemap call Mark Tinguely
@ 2025-08-28 19:22 ` Mark Tinguely
0 siblings, 0 replies; 3+ messages in thread
From: Mark Tinguely @ 2025-08-28 19:22 UTC (permalink / raw)
To: ocfs2-devel@lists.linux.dev
On 8/28/25 11:28 AM, Mark Tinguely wrote:
> syzbot detected a OCFS2 hang due to a recursive semaphore on a
> FS_IOC_FIEMAP of the extent list on a specially crafted mmap file.
>
> context_switch kernel/sched/core.c:5357 [inline]
> __schedule+0x1798/0x4cc0 kernel/sched/core.c:6961
> __schedule_loop kernel/sched/core.c:7043 [inline]
> schedule+0x165/0x360 kernel/sched/core.c:7058
> schedule_preempt_disabled+0x13/0x30 kernel/sched/core.c:7115
> rwsem_down_write_slowpath+0x872/0xfe0 kernel/locking/rwsem.c:1185
> __down_write_common kernel/locking/rwsem.c:1317 [inline]
> __down_write kernel/locking/rwsem.c:1326 [inline]
> down_write+0x1ab/0x1f0 kernel/locking/rwsem.c:1591
> ocfs2_page_mkwrite+0x2ff/0xc40 fs/ocfs2/mmap.c:142
> do_page_mkwrite+0x14d/0x310 mm/memory.c:3361
> wp_page_shared mm/memory.c:3762 [inline]
> do_wp_page+0x268d/0x5800 mm/memory.c:3981
> handle_pte_fault mm/memory.c:6068 [inline]
> __handle_mm_fault+0x1033/0x5440 mm/memory.c:6195
> handle_mm_fault+0x40a/0x8e0 mm/memory.c:6364
> do_user_addr_fault+0x764/0x1390 arch/x86/mm/fault.c:1387
> handle_page_fault arch/x86/mm/fault.c:1476 [inline]
> exc_page_fault+0x76/0xf0 arch/x86/mm/fault.c:1532
> asm_exc_page_fault+0x26/0x30 arch/x86/include/asm/idtentry.h:623
> RIP: 0010:copy_user_generic arch/x86/include/asm/uaccess_64.h:126 [inline]
> RIP: 0010:raw_copy_to_user arch/x86/include/asm/uaccess_64.h:147 [inline]
> RIP: 0010:_inline_copy_to_user include/linux/uaccess.h:197 [inline]
> RIP: 0010:_copy_to_user+0x85/0xb0 lib/usercopy.c:26
> Code: e8 00 bc f7 fc 4d 39 fc 72 3d 4d 39 ec 77 38 e8 91 b9 f7 fc 4c 89
> f7 89 de e8 47 25 5b fd 0f 01 cb 4c 89 ff 48 89 d9 4c 89 f6 <f3> a4 0f
> 1f 00 48 89 cb 0f 01 ca 48 89 d8 5b 41 5c 41 5d 41 5e 41
> RSP: 0018:ffffc9000403f950 EFLAGS: 00050256
> RAX: ffffffff84c7f101 RBX: 0000000000000038 RCX: 0000000000000038
> RDX: 0000000000000000 RSI: ffffc9000403f9e0 RDI: 0000200000000060
> RBP: ffffc9000403fa90 R08: ffffc9000403fa17 R09: 1ffff92000807f42
> R10: dffffc0000000000 R11: fffff52000807f43 R12: 0000200000000098
> R13: 00007ffffffff000 R14: ffffc9000403f9e0 R15: 0000200000000060
> copy_to_user include/linux/uaccess.h:225 [inline]
> fiemap_fill_next_extent+0x1c0/0x390 fs/ioctl.c:145
> ocfs2_fiemap+0x888/0xc90 fs/ocfs2/extent_map.c:806
> ioctl_fiemap fs/ioctl.c:220 [inline]
> do_vfs_ioctl+0x1173/0x1430 fs/ioctl.c:532
> __do_sys_ioctl fs/ioctl.c:596 [inline]
> __se_sys_ioctl+0x82/0x170 fs/ioctl.c:584
> do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
> do_syscall_64+0xfa/0x3b0 arch/x86/entry/syscall_64.c:94
> entry_SYSCALL_64_after_hwframe+0x77/0x7f
> RIP: 0033:0x7f5f13850fd9
> RSP: 002b:00007ffe3b3518b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> RAX: ffffffffffffffda RBX: 0000200000000000 RCX: 00007f5f13850fd9
> RDX: 0000200000000040 RSI: 00000000c020660b RDI: 0000000000000004
> RBP: 6165627472616568 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe3b3518f0
> R13: 00007ffe3b351b18 R14: 431bde82d7b634db R15: 00007f5f1389a03b
>
> ocfs2_fiemap() takes a read lock of the ip_alloc_sem semaphore (since
> v2.6.22-527-g7307de80510a) and calls fiemap_fill_next_extent()
> to read the extent list of this running mmap executable.
> The user supplied buffer to hold the fiemap information page faults
> calling ocfs2_page_mkwrite() which will take a write lock (since
> v2.6.27-38-g00dc417fa3e7) of the same semaphore. This recursive
> semaphore will hold filesystem locks and causes a hang of the
> fileystem.
>
> The ip_alloc_sem protects the inode extent list and size.
> Release the read semphore before calling fiemap_fill_next_extent()
> in ocfs2_fiemap() and ocfs2_fiemap_inline(). ocfs2_fiemap_inline()
> now returns with semaphore released.
>
> Reported-by: syzbot+541dcc6ee768f77103e7@syzkaller.appspotmail.com
> Closes: https://urldefense.com/v3/__https://syzkaller.appspot.com/bug?
> extid=541dcc6ee768f77103e7__;!!ACWV5N9M2RV99hQ!KtyP2cHen-
> zXLVmJXh4CmpXisEbTC01mhqQIZPQb-
> tKoEgZT2UhmXJNaiY4HXGpqTR9pYFHNSWzNyNyxGosOgxburSg$ <https://
> urldefense.com/v3/__https://syzkaller.appspot.com/bug?
> extid=541dcc6ee768f77103e7__;!!ACWV5N9M2RV99hQ!KtyP2cHen-
> zXLVmJXh4CmpXisEbTC01mhqQIZPQb-
> tKoEgZT2UhmXJNaiY4HXGpqTR9pYFHNSWzNyNyxGosOgxburSg$ >
>
> Signed-off-by: Mark Tinguely <mark.tinguely@oracle.com>
> ---
> v3: also release semaphore in ocfs2_fiemap_inline() with semaphore released
> move semaphore before calling fiemap_fill_next_extent()
> change the error path to release the semaphore to avoid an
> unnecessary
> semaphore down/up in the last loop..
>
> v2: leave the write semaphore in ocfs2_page_mkwrite() but remove the
> read semaphore before calling fiemap_fill_next_extent().
> Add a Closes: line
> review noted missing semaphore change to ocfs2_fiemap_inline()
> suggest moving locks to fiemap_fill_next_extent() call.
>
> v1: change the ip_alloc_sem from write to read.
> rejected in review.
> ---
> fs/ocfs2/extent_map.c | 27 +++++++++++++++++++++------
> 1 file changed, 21 insertions(+), 6 deletions(-)
>
> diff --git a/fs/ocfs2/extent_map.c b/fs/ocfs2/extent_map.c
> index 930150ed5db1..20045fb1bb70 100644
> --- a/fs/ocfs2/extent_map.c
> +++ b/fs/ocfs2/extent_map.c
> @@ -706,6 +706,8 @@ int ocfs2_extent_map_get_blocks(struct inode *inode,
> u64 v_blkno, u64 *p_blkno,
> * it not only handles the fiemap for inlined files, but also deals
> * with the fast symlink, cause they have no difference for extent
> * mapping per se.
> + *
> + * Called with ip_alloc_sem taken and returned with it released.
> */
> static int ocfs2_fiemap_inline(struct inode *inode, struct
> buffer_head *di_bh,
> struct fiemap_extent_info *fieinfo,
> @@ -732,10 +734,16 @@ static int ocfs2_fiemap_inline(struct inode
> *inode, struct buffer_head *di_bh,
> phys += offsetof(struct ocfs2_dinode,
> id2.i_data.id_data);
>
> + /* prevent deadlock to a mmap executable page fault */
> + up_read(&OCFS2_I(inode)->ip_alloc_sem);
> +
> ret = fiemap_fill_next_extent(fieinfo, 0, phys,
> id_count,
> flags);
> if (ret < 0)
> return ret;
> + } else {
> + /* return with the semaphore released */
> + up_read(&OCFS2_I(inode)->ip_alloc_sem);
> }
>
> return 0;
> @@ -769,6 +777,7 @@ int ocfs2_fiemap(struct inode *inode, struct
> fiemap_extent_info *fieinfo,
> */
> if ((OCFS2_I(inode)->ip_dyn_features & OCFS2_INLINE_DATA_FL) ||
> ocfs2_inode_is_fast_symlink(inode)) {
> + /* ip_alloc_sem is released in ocfs2_fiemap_inline() */
> ret = ocfs2_fiemap_inline(inode, di_bh, fieinfo,
> map_start);
> goto out_unlock;
> }
> @@ -784,6 +793,7 @@ int ocfs2_fiemap(struct inode *inode, struct
> fiemap_extent_info *fieinfo,
> &hole_size, &rec,
> &is_last);
> if (ret) {
> mlog_errno(ret);
> + up_read(&OCFS2_I(inode)->ip_alloc_sem);
not sure what happened to the white space here.
we can remove and change the goto to a new lable, say out_upsema ...
> goto out_unlock;
> }
>
> @@ -803,22 +813,27 @@ int ocfs2_fiemap(struct inode *inode, struct
> fiemap_extent_info *fieinfo,
> phys_bytes = le64_to_cpu(rec.e_blkno) << osb->sb-
> >s_blocksize_bits;
> virt_bytes = (u64)le32_to_cpu(rec.e_cpos) << osb-
> >s_clustersize_bits;
>
> + /* prevent deadlock to a mmap executable page fault */
> + up_read(&OCFS2_I(inode)->ip_alloc_sem);
> +
> ret = fiemap_fill_next_extent(fieinfo, virt_bytes,
> phys_bytes,
> len_bytes, fe_flags);
> - if (ret)
> - break;
> + if (ret) {
> + if (ret > 0)
> + ret = 0;
> + goto out_unlock;
> + }
> +
> + down_read(&OCFS2_I(inode)->ip_alloc_sem);
>
> cpos = le32_to_cpu(rec.e_cpos)+
> le16_to_cpu(rec.e_leaf_clusters);
> }
>
> - if (ret > 0)
> - ret = 0;
... could put the new out_upseama label here
> + up_read(&OCFS2_I(inode)->ip_alloc_sem);
>
> out_unlock:
> brelse(di_bh);
>
> - up_read(&OCFS2_I(inode)->ip_alloc_sem);
> -
> ocfs2_inode_unlock(inode, 0);
> out:
>
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [External] : Re: [v3 PATCH] ocfs2: fix recursive semaphore deadlock in fiemap call
[not found] ` <93589648-9595-4fd1-a80f-5d1995c23fd6@linux.alibaba.com>
@ 2025-08-29 13:28 ` Mark Tinguely
0 siblings, 0 replies; 3+ messages in thread
From: Mark Tinguely @ 2025-08-29 13:28 UTC (permalink / raw)
To: Joseph Qi
Cc: Mark Fasheh, heming.zhao@suse.com,
syzbot+541dcc6ee768f77103e7@syzkaller.appspotmail.com,
ocfs2-devel@lists.linux.dev
On 8/28/25 9:10 PM, Joseph Qi wrote:
>
>
> On 2025/8/28 23:53, Mark Tinguely wrote:
>> syzbot detected a OCFS2 hang due to a recursive semaphore on a
>> FS_IOC_FIEMAP of the extent list on a specially crafted mmap file.
>>
>> context_switch kernel/sched/core.c:5357 [inline]
>> __schedule+0x1798/0x4cc0 kernel/sched/core.c:6961
>> __schedule_loop kernel/sched/core.c:7043 [inline]
>> schedule+0x165/0x360 kernel/sched/core.c:7058
>> schedule_preempt_disabled+0x13/0x30 kernel/sched/core.c:7115
>> rwsem_down_write_slowpath+0x872/0xfe0 kernel/locking/rwsem.c:1185
>> __down_write_common kernel/locking/rwsem.c:1317 [inline]
>> __down_write kernel/locking/rwsem.c:1326 [inline]
>> down_write+0x1ab/0x1f0 kernel/locking/rwsem.c:1591
>> ocfs2_page_mkwrite+0x2ff/0xc40 fs/ocfs2/mmap.c:142
>> do_page_mkwrite+0x14d/0x310 mm/memory.c:3361
>> wp_page_shared mm/memory.c:3762 [inline]
>> do_wp_page+0x268d/0x5800 mm/memory.c:3981
>> handle_pte_fault mm/memory.c:6068 [inline]
>> __handle_mm_fault+0x1033/0x5440 mm/memory.c:6195
>> handle_mm_fault+0x40a/0x8e0 mm/memory.c:6364
>> do_user_addr_fault+0x764/0x1390 arch/x86/mm/fault.c:1387
>> handle_page_fault arch/x86/mm/fault.c:1476 [inline]
>> exc_page_fault+0x76/0xf0 arch/x86/mm/fault.c:1532
>> asm_exc_page_fault+0x26/0x30 arch/x86/include/asm/idtentry.h:623
>> RIP: 0010:copy_user_generic arch/x86/include/asm/uaccess_64.h:126 [inline]
>> RIP: 0010:raw_copy_to_user arch/x86/include/asm/uaccess_64.h:147 [inline]
>> RIP: 0010:_inline_copy_to_user include/linux/uaccess.h:197 [inline]
>> RIP: 0010:_copy_to_user+0x85/0xb0 lib/usercopy.c:26
>> Code: e8 00 bc f7 fc 4d 39 fc 72 3d 4d 39 ec 77 38 e8 91 b9 f7 fc 4c 89
>> f7 89 de e8 47 25 5b fd 0f 01 cb 4c 89 ff 48 89 d9 4c 89 f6 <f3> a4 0f
>> 1f 00 48 89 cb 0f 01 ca 48 89 d8 5b 41 5c 41 5d 41 5e 41
>> RSP: 0018:ffffc9000403f950 EFLAGS: 00050256
>> RAX: ffffffff84c7f101 RBX: 0000000000000038 RCX: 0000000000000038
>> RDX: 0000000000000000 RSI: ffffc9000403f9e0 RDI: 0000200000000060
>> RBP: ffffc9000403fa90 R08: ffffc9000403fa17 R09: 1ffff92000807f42
>> R10: dffffc0000000000 R11: fffff52000807f43 R12: 0000200000000098
>> R13: 00007ffffffff000 R14: ffffc9000403f9e0 R15: 0000200000000060
>> copy_to_user include/linux/uaccess.h:225 [inline]
>> fiemap_fill_next_extent+0x1c0/0x390 fs/ioctl.c:145
>> ocfs2_fiemap+0x888/0xc90 fs/ocfs2/extent_map.c:806
>> ioctl_fiemap fs/ioctl.c:220 [inline]
>> do_vfs_ioctl+0x1173/0x1430 fs/ioctl.c:532
>> __do_sys_ioctl fs/ioctl.c:596 [inline]
>> __se_sys_ioctl+0x82/0x170 fs/ioctl.c:584
>> do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
>> do_syscall_64+0xfa/0x3b0 arch/x86/entry/syscall_64.c:94
>> entry_SYSCALL_64_after_hwframe+0x77/0x7f
>> RIP: 0033:0x7f5f13850fd9
>> RSP: 002b:00007ffe3b3518b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
>> RAX: ffffffffffffffda RBX: 0000200000000000 RCX: 00007f5f13850fd9
>> RDX: 0000200000000040 RSI: 00000000c020660b RDI: 0000000000000004
>> RBP: 6165627472616568 R08: 0000000000000000 R09: 0000000000000000
>> R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe3b3518f0
>> R13: 00007ffe3b351b18 R14: 431bde82d7b634db R15: 00007f5f1389a03b
>>
>> ocfs2_fiemap() takes a read lock of the ip_alloc_sem semaphore (since
>> v2.6.22-527-g7307de80510a) and calls fiemap_fill_next_extent()
>> to read the extent list of this running mmap executable.
>> The user supplied buffer to hold the fiemap information page faults
>> calling ocfs2_page_mkwrite() which will take a write lock (since
>> v2.6.27-38-g00dc417fa3e7) of the same semaphore. This recursive
>> semaphore will hold filesystem locks and causes a hang of the
>> fileystem.
>>
>> The ip_alloc_sem protects the inode extent list and size.
>> Release the read semphore before calling fiemap_fill_next_extent()
>> in ocfs2_fiemap() and ocfs2_fiemap_inline(). ocfs2_fiemap_inline()
>> now returns with semaphore released.
>>
>> Reported-by: syzbot+541dcc6ee768f77103e7@syzkaller.appspotmail.com
>> Closes: https://urldefense.com/v3/__https://syzkaller.appspot.com/bug?extid=541dcc6ee768f77103e7__;!!ACWV5N9M2RV99hQ!NTMKVnLmJgrhlYd54LbJrSh-LYdJHOiPt6yBu8E6fL4ZIyBsfKWEPjK4RjXsQYcKPJyOSy5qkOZa9KpJyZhjl4WebbHI$
>>
>> Signed-off-by: Mark Tinguely <mark.tinguely@oracle.com>
>> ---
>> v3: also release semaphore in ocfs2_fiemap_inline() with semaphore released
>> move semaphore before calling fiemap_fill_next_extent()
>> change the error path to release the semaphore to avoid an unnecessary
>> semaphore down/up in the last loop..
>>
>> v2: leave the write semaphore in ocfs2_page_mkwrite() but remove the
>> read semaphore before calling fiemap_fill_next_extent().
>> Add a Closes: line
>> review noted missing semaphore change to ocfs2_fiemap_inline()
>> suggest moving locks to fiemap_fill_next_extent() call.
>>
>> v1: change the ip_alloc_sem from write to read.
>> rejected in review.
>> ---
>> fs/ocfs2/extent_map.c | 27 +++++++++++++++++++++------
>> 1 file changed, 21 insertions(+), 6 deletions(-)
>>
>> diff --git a/fs/ocfs2/extent_map.c b/fs/ocfs2/extent_map.c
>> index 930150ed5db1..20045fb1bb70 100644
>> --- a/fs/ocfs2/extent_map.c
>> +++ b/fs/ocfs2/extent_map.c
>> @@ -706,6 +706,8 @@ int ocfs2_extent_map_get_blocks(struct inode *inode, u64 v_blkno, u64 *p_blkno,
>> * it not only handles the fiemap for inlined files, but also deals
>> * with the fast symlink, cause they have no difference for extent
>> * mapping per se.
>> + *
>> + * Called with ip_alloc_sem taken and returned with it released.
>
> This makes things complicated.
>
>> */
>> static int ocfs2_fiemap_inline(struct inode *inode, struct buffer_head *di_bh,
>> struct fiemap_extent_info *fieinfo,
>> @@ -732,10 +734,16 @@ static int ocfs2_fiemap_inline(struct inode *inode, struct buffer_head *di_bh,
>> phys += offsetof(struct ocfs2_dinode,
>> id2.i_data.id_data);
>>
>> + /* prevent deadlock to a mmap executable page fault */
>> + up_read(&OCFS2_I(inode)->ip_alloc_sem);
>> +
>> ret = fiemap_fill_next_extent(fieinfo, 0, phys, id_count,
>> flags);
>> if (ret < 0)
>> return ret;
>> + } else {
>> + /* return with the semaphore released */
>> + up_read(&OCFS2_I(inode)->ip_alloc_sem);
>> }
>>
>> return 0;
>> @@ -769,6 +777,7 @@ int ocfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
>> */
>> if ((OCFS2_I(inode)->ip_dyn_features & OCFS2_INLINE_DATA_FL) ||
>> ocfs2_inode_is_fast_symlink(inode)) {
>> + /* ip_alloc_sem is released in ocfs2_fiemap_inline() */
>> ret = ocfs2_fiemap_inline(inode, di_bh, fieinfo, map_start);
>> goto out_unlock;
>> }
>> @@ -784,6 +793,7 @@ int ocfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
>> &hole_size, &rec, &is_last);
>> if (ret) {
>> mlog_errno(ret);
>> + up_read(&OCFS2_I(inode)->ip_alloc_sem);
>> goto out_unlock;
>> }
>>
>> @@ -803,22 +813,27 @@ int ocfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
>> phys_bytes = le64_to_cpu(rec.e_blkno) << osb->sb->s_blocksize_bits;
>> virt_bytes = (u64)le32_to_cpu(rec.e_cpos) << osb->s_clustersize_bits;
>>
>> + /* prevent deadlock to a mmap executable page fault */
>> + up_read(&OCFS2_I(inode)->ip_alloc_sem);
>> +
>> ret = fiemap_fill_next_extent(fieinfo, virt_bytes, phys_bytes,
>> len_bytes, fe_flags);
>> - if (ret)
>> - break;
>> + if (ret) {
>> + if (ret > 0)
>> + ret = 0;
>> + goto out_unlock;
>> + }
>> +
>> + down_read(&OCFS2_I(inode)->ip_alloc_sem);
>>
>> cpos = le32_to_cpu(rec.e_cpos)+ le16_to_cpu(rec.e_leaf_clusters);
>> }
>>
>> - if (ret > 0)
>> - ret = 0;
>> + up_read(&OCFS2_I(inode)->ip_alloc_sem);
>>
>> out_unlock:
>> brelse(di_bh);
>>
>> - up_read(&OCFS2_I(inode)->ip_alloc_sem);
>> -
>> ocfs2_inode_unlock(inode, 0);
>> out:
>>
>> --
>> 2.39.5 (Apple Git-154)
>>
>
> Could we simplify it in the following way?
>
> diff --git a/fs/ocfs2/extent_map.c b/fs/ocfs2/extent_map.c
> index 930150ed5db1..1ab4bfa6fc24 100644
> --- a/fs/ocfs2/extent_map.c
> +++ b/fs/ocfs2/extent_map.c
> @@ -706,6 +706,8 @@ int ocfs2_extent_map_get_blocks(struct inode *inode, u64 v_blkno, u64 *p_blkno,
> * it not only handles the fiemap for inlined files, but also deals
> * with the fast symlink, cause they have no difference for extent
> * mapping per se.
> + *
> + * Must be called with ip_alloc_sem read lock held.
> */
> static int ocfs2_fiemap_inline(struct inode *inode, struct buffer_head *di_bh,
> struct fiemap_extent_info *fieinfo,
> @@ -718,6 +720,8 @@ static int ocfs2_fiemap_inline(struct inode *inode, struct buffer_head *di_bh,
> u32 flags = FIEMAP_EXTENT_DATA_INLINE|FIEMAP_EXTENT_LAST;
> struct ocfs2_inode_info *oi = OCFS2_I(inode);
>
> + lockdep_assert_held_read(&oi->ip_alloc_sem);
> +
> di = (struct ocfs2_dinode *)di_bh->b_data;
> if (ocfs2_inode_is_fast_symlink(inode))
> id_count = ocfs2_fast_symlink_chars(inode->i_sb);
> @@ -732,8 +736,11 @@ static int ocfs2_fiemap_inline(struct inode *inode, struct buffer_head *di_bh,
> phys += offsetof(struct ocfs2_dinode,
> id2.i_data.id_data);
>
> + /* Release ip_alloc_sem to prevent deadlock on fault */
> + up_read(&oi->ip_alloc_sem);
> ret = fiemap_fill_next_extent(fieinfo, 0, phys, id_count,
> flags);
> + down_read(&oi->ip_alloc_sem);
> if (ret < 0)
> return ret;
> }
> @@ -803,8 +810,11 @@ int ocfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
> phys_bytes = le64_to_cpu(rec.e_blkno) << osb->sb->s_blocksize_bits;
> virt_bytes = (u64)le32_to_cpu(rec.e_cpos) << osb->s_clustersize_bits;
>
> + /* Release ip_alloc_sem to prevent deadlock on fault */
> + up_read(&OCFS2_I(inode)->ip_alloc_sem);
> ret = fiemap_fill_next_extent(fieinfo, virt_bytes, phys_bytes,
> len_bytes, fe_flags);
> + down_read(&OCFS2_I(inode)->ip_alloc_sem);
> if (ret)
> break;
>
> Thanks,
> Joseph
>
yes. I was being fancy to avoid an unnecessary semaphore lock/unlock
rounnd trip for other writers. I suspected that this was going to be the
next request.
Mark.
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2025-08-29 13:29 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20250828154733.13591-1-mark.tinguely@oracle.com>
[not found] ` <DM4PR10MB7476AAACC3BCFAA61077E606893BA@DM4PR10MB7476.namprd10.prod.outlook.com>
2025-08-28 16:28 ` [v3 PATCH] ocfs2: fix recursive semaphore deadlock in fiemap call Mark Tinguely
2025-08-28 19:22 ` [External] : " Mark Tinguely
[not found] ` <93589648-9595-4fd1-a80f-5d1995c23fd6@linux.alibaba.com>
2025-08-29 13:28 ` [External] : " Mark Tinguely
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).