ocfs2-devel.oss.oracle.com archive mirror
 help / color / mirror / Atom feed
* [v3 PATCH] ocfs2: fix recursive semaphore deadlock in fiemap call
       [not found] ` <DM4PR10MB7476AAACC3BCFAA61077E606893BA@DM4PR10MB7476.namprd10.prod.outlook.com>
@ 2025-08-28 16:28   ` Mark Tinguely
  2025-08-28 19:22     ` [External] : " Mark Tinguely
       [not found]   ` <93589648-9595-4fd1-a80f-5d1995c23fd6@linux.alibaba.com>
  1 sibling, 1 reply; 3+ messages in thread
From: Mark Tinguely @ 2025-08-28 16:28 UTC (permalink / raw)
  To: ocfs2-devel@lists.linux.dev

syzbot detected a OCFS2 hang due to a recursive semaphore on a
FS_IOC_FIEMAP of the extent list on a specially crafted mmap file.

context_switch kernel/sched/core.c:5357 [inline]
    __schedule+0x1798/0x4cc0 kernel/sched/core.c:6961
    __schedule_loop kernel/sched/core.c:7043 [inline]
    schedule+0x165/0x360 kernel/sched/core.c:7058
    schedule_preempt_disabled+0x13/0x30 kernel/sched/core.c:7115
    rwsem_down_write_slowpath+0x872/0xfe0 kernel/locking/rwsem.c:1185
    __down_write_common kernel/locking/rwsem.c:1317 [inline]
    __down_write kernel/locking/rwsem.c:1326 [inline]
    down_write+0x1ab/0x1f0 kernel/locking/rwsem.c:1591
    ocfs2_page_mkwrite+0x2ff/0xc40 fs/ocfs2/mmap.c:142
    do_page_mkwrite+0x14d/0x310 mm/memory.c:3361
    wp_page_shared mm/memory.c:3762 [inline]
    do_wp_page+0x268d/0x5800 mm/memory.c:3981
    handle_pte_fault mm/memory.c:6068 [inline]
    __handle_mm_fault+0x1033/0x5440 mm/memory.c:6195
    handle_mm_fault+0x40a/0x8e0 mm/memory.c:6364
    do_user_addr_fault+0x764/0x1390 arch/x86/mm/fault.c:1387
    handle_page_fault arch/x86/mm/fault.c:1476 [inline]
    exc_page_fault+0x76/0xf0 arch/x86/mm/fault.c:1532
    asm_exc_page_fault+0x26/0x30 arch/x86/include/asm/idtentry.h:623
RIP: 0010:copy_user_generic arch/x86/include/asm/uaccess_64.h:126 [inline]
RIP: 0010:raw_copy_to_user arch/x86/include/asm/uaccess_64.h:147 [inline]
RIP: 0010:_inline_copy_to_user include/linux/uaccess.h:197 [inline]
RIP: 0010:_copy_to_user+0x85/0xb0 lib/usercopy.c:26
Code: e8 00 bc f7 fc 4d 39 fc 72 3d 4d 39 ec 77 38 e8 91 b9 f7 fc 4c 89
f7 89 de e8 47 25 5b fd 0f 01 cb 4c 89 ff 48 89 d9 4c 89 f6 <f3> a4 0f
1f 00 48 89 cb 0f 01 ca 48 89 d8 5b 41 5c 41 5d 41 5e 41
RSP: 0018:ffffc9000403f950 EFLAGS: 00050256
RAX: ffffffff84c7f101 RBX: 0000000000000038 RCX: 0000000000000038
RDX: 0000000000000000 RSI: ffffc9000403f9e0 RDI: 0000200000000060
RBP: ffffc9000403fa90 R08: ffffc9000403fa17 R09: 1ffff92000807f42
R10: dffffc0000000000 R11: fffff52000807f43 R12: 0000200000000098
R13: 00007ffffffff000 R14: ffffc9000403f9e0 R15: 0000200000000060
    copy_to_user include/linux/uaccess.h:225 [inline]
    fiemap_fill_next_extent+0x1c0/0x390 fs/ioctl.c:145
    ocfs2_fiemap+0x888/0xc90 fs/ocfs2/extent_map.c:806
    ioctl_fiemap fs/ioctl.c:220 [inline]
    do_vfs_ioctl+0x1173/0x1430 fs/ioctl.c:532
    __do_sys_ioctl fs/ioctl.c:596 [inline]
    __se_sys_ioctl+0x82/0x170 fs/ioctl.c:584
    do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
    do_syscall_64+0xfa/0x3b0 arch/x86/entry/syscall_64.c:94
    entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f5f13850fd9
RSP: 002b:00007ffe3b3518b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000200000000000 RCX: 00007f5f13850fd9
RDX: 0000200000000040 RSI: 00000000c020660b RDI: 0000000000000004
RBP: 6165627472616568 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe3b3518f0
R13: 00007ffe3b351b18 R14: 431bde82d7b634db R15: 00007f5f1389a03b

ocfs2_fiemap() takes a read lock of the ip_alloc_sem semaphore (since
v2.6.22-527-g7307de80510a) and calls fiemap_fill_next_extent()
to read the extent list of this running mmap executable.
The user supplied buffer to hold the fiemap information page faults
calling ocfs2_page_mkwrite() which will take a write lock (since
v2.6.27-38-g00dc417fa3e7) of the same semaphore. This recursive
semaphore will hold filesystem locks and causes a hang of the
fileystem.

The ip_alloc_sem protects the inode extent list and size.
Release the read semphore before calling fiemap_fill_next_extent()
in ocfs2_fiemap() and ocfs2_fiemap_inline(). ocfs2_fiemap_inline()
now returns with semaphore released.

Reported-by: syzbot+541dcc6ee768f77103e7@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=541dcc6ee768f77103e7 
<https://syzkaller.appspot.com/bug?extid=541dcc6ee768f77103e7>

Signed-off-by: Mark Tinguely <mark.tinguely@oracle.com>
---
v3: also release semaphore in ocfs2_fiemap_inline() with semaphore released
      move semaphore before calling fiemap_fill_next_extent()
      change the error path to release the semaphore to avoid an unnecessary
      semaphore down/up in the last loop..

v2: leave the write semaphore in ocfs2_page_mkwrite() but remove the
      read semaphore before calling fiemap_fill_next_extent().
      Add a Closes: line
      review noted missing semaphore change to ocfs2_fiemap_inline()
             suggest moving locks to fiemap_fill_next_extent() call.

v1: change the ip_alloc_sem from write to read.
      rejected in review.
---
   fs/ocfs2/extent_map.c | 27 +++++++++++++++++++++------
   1 file changed, 21 insertions(+), 6 deletions(-)

diff --git a/fs/ocfs2/extent_map.c b/fs/ocfs2/extent_map.c
index 930150ed5db1..20045fb1bb70 100644
--- a/fs/ocfs2/extent_map.c
+++ b/fs/ocfs2/extent_map.c
@@ -706,6 +706,8 @@ int ocfs2_extent_map_get_blocks(struct inode *inode, 
u64 v_blkno, u64 *p_blkno,
    * it not only handles the fiemap for inlined files, but also deals
    * with the fast symlink, cause they have no difference for extent
    * mapping per se.
+ *
+ * Called with ip_alloc_sem taken and returned with it released.
    */
   static int ocfs2_fiemap_inline(struct inode *inode, struct 
buffer_head *di_bh,
                                  struct fiemap_extent_info *fieinfo,
@@ -732,10 +734,16 @@ static int ocfs2_fiemap_inline(struct inode 
*inode, struct buffer_head *di_bh,
                           phys += offsetof(struct ocfs2_dinode,
                                            id2.i_data.id_data);

+               /* prevent deadlock to a mmap executable page fault */
+               up_read(&OCFS2_I(inode)->ip_alloc_sem);
+
                   ret = fiemap_fill_next_extent(fieinfo, 0, phys, id_count,
                                                 flags);
                   if (ret < 0)
                           return ret;
+       } else {
+               /* return with the semaphore released */
+               up_read(&OCFS2_I(inode)->ip_alloc_sem);
           }

           return 0;
@@ -769,6 +777,7 @@ int ocfs2_fiemap(struct inode *inode, struct 
fiemap_extent_info *fieinfo,
            */
           if ((OCFS2_I(inode)->ip_dyn_features & OCFS2_INLINE_DATA_FL) ||
               ocfs2_inode_is_fast_symlink(inode)) {
+               /* ip_alloc_sem is released in ocfs2_fiemap_inline() */
                   ret = ocfs2_fiemap_inline(inode, di_bh, fieinfo, 
map_start);
                   goto out_unlock;
           }
@@ -784,6 +793,7 @@ int ocfs2_fiemap(struct inode *inode, struct 
fiemap_extent_info *fieinfo,
                                                    &hole_size, &rec, 
&is_last);
                   if (ret) {
                           mlog_errno(ret);
+                       up_read(&OCFS2_I(inode)->ip_alloc_sem);
                           goto out_unlock;
                   }

@@ -803,22 +813,27 @@ int ocfs2_fiemap(struct inode *inode, struct 
fiemap_extent_info *fieinfo,
                   phys_bytes = le64_to_cpu(rec.e_blkno) << 
osb->sb->s_blocksize_bits;
                   virt_bytes = (u64)le32_to_cpu(rec.e_cpos) << 
osb->s_clustersize_bits;

+               /* prevent deadlock to a mmap executable page fault */
+               up_read(&OCFS2_I(inode)->ip_alloc_sem);
+
                   ret = fiemap_fill_next_extent(fieinfo, virt_bytes, 
phys_bytes,
                                                 len_bytes, fe_flags);
-               if (ret)
-                       break;
+               if (ret) {
+                       if (ret > 0)
+                               ret = 0;
+                       goto out_unlock;
+               }
+
+               down_read(&OCFS2_I(inode)->ip_alloc_sem);

                   cpos = le32_to_cpu(rec.e_cpos)+ 
le16_to_cpu(rec.e_leaf_clusters);
           }

-       if (ret > 0)
-               ret = 0;
+       up_read(&OCFS2_I(inode)->ip_alloc_sem);

   out_unlock:
           brelse(di_bh);

-       up_read(&OCFS2_I(inode)->ip_alloc_sem);
-
           ocfs2_inode_unlock(inode, 0);
   out:

-- 
2.39.5 (Apple Git-154)


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [External] : [v3 PATCH] ocfs2: fix recursive semaphore deadlock in fiemap call
  2025-08-28 16:28   ` [v3 PATCH] ocfs2: fix recursive semaphore deadlock in fiemap call Mark Tinguely
@ 2025-08-28 19:22     ` Mark Tinguely
  0 siblings, 0 replies; 3+ messages in thread
From: Mark Tinguely @ 2025-08-28 19:22 UTC (permalink / raw)
  To: ocfs2-devel@lists.linux.dev

On 8/28/25 11:28 AM, Mark Tinguely wrote:
> syzbot detected a OCFS2 hang due to a recursive semaphore on a
> FS_IOC_FIEMAP of the extent list on a specially crafted mmap file.
> 
> context_switch kernel/sched/core.c:5357 [inline]
>     __schedule+0x1798/0x4cc0 kernel/sched/core.c:6961
>     __schedule_loop kernel/sched/core.c:7043 [inline]
>     schedule+0x165/0x360 kernel/sched/core.c:7058
>     schedule_preempt_disabled+0x13/0x30 kernel/sched/core.c:7115
>     rwsem_down_write_slowpath+0x872/0xfe0 kernel/locking/rwsem.c:1185
>     __down_write_common kernel/locking/rwsem.c:1317 [inline]
>     __down_write kernel/locking/rwsem.c:1326 [inline]
>     down_write+0x1ab/0x1f0 kernel/locking/rwsem.c:1591
>     ocfs2_page_mkwrite+0x2ff/0xc40 fs/ocfs2/mmap.c:142
>     do_page_mkwrite+0x14d/0x310 mm/memory.c:3361
>     wp_page_shared mm/memory.c:3762 [inline]
>     do_wp_page+0x268d/0x5800 mm/memory.c:3981
>     handle_pte_fault mm/memory.c:6068 [inline]
>     __handle_mm_fault+0x1033/0x5440 mm/memory.c:6195
>     handle_mm_fault+0x40a/0x8e0 mm/memory.c:6364
>     do_user_addr_fault+0x764/0x1390 arch/x86/mm/fault.c:1387
>     handle_page_fault arch/x86/mm/fault.c:1476 [inline]
>     exc_page_fault+0x76/0xf0 arch/x86/mm/fault.c:1532
>     asm_exc_page_fault+0x26/0x30 arch/x86/include/asm/idtentry.h:623
> RIP: 0010:copy_user_generic arch/x86/include/asm/uaccess_64.h:126 [inline]
> RIP: 0010:raw_copy_to_user arch/x86/include/asm/uaccess_64.h:147 [inline]
> RIP: 0010:_inline_copy_to_user include/linux/uaccess.h:197 [inline]
> RIP: 0010:_copy_to_user+0x85/0xb0 lib/usercopy.c:26
> Code: e8 00 bc f7 fc 4d 39 fc 72 3d 4d 39 ec 77 38 e8 91 b9 f7 fc 4c 89
> f7 89 de e8 47 25 5b fd 0f 01 cb 4c 89 ff 48 89 d9 4c 89 f6 <f3> a4 0f
> 1f 00 48 89 cb 0f 01 ca 48 89 d8 5b 41 5c 41 5d 41 5e 41
> RSP: 0018:ffffc9000403f950 EFLAGS: 00050256
> RAX: ffffffff84c7f101 RBX: 0000000000000038 RCX: 0000000000000038
> RDX: 0000000000000000 RSI: ffffc9000403f9e0 RDI: 0000200000000060
> RBP: ffffc9000403fa90 R08: ffffc9000403fa17 R09: 1ffff92000807f42
> R10: dffffc0000000000 R11: fffff52000807f43 R12: 0000200000000098
> R13: 00007ffffffff000 R14: ffffc9000403f9e0 R15: 0000200000000060
>     copy_to_user include/linux/uaccess.h:225 [inline]
>     fiemap_fill_next_extent+0x1c0/0x390 fs/ioctl.c:145
>     ocfs2_fiemap+0x888/0xc90 fs/ocfs2/extent_map.c:806
>     ioctl_fiemap fs/ioctl.c:220 [inline]
>     do_vfs_ioctl+0x1173/0x1430 fs/ioctl.c:532
>     __do_sys_ioctl fs/ioctl.c:596 [inline]
>     __se_sys_ioctl+0x82/0x170 fs/ioctl.c:584
>     do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
>     do_syscall_64+0xfa/0x3b0 arch/x86/entry/syscall_64.c:94
>     entry_SYSCALL_64_after_hwframe+0x77/0x7f
> RIP: 0033:0x7f5f13850fd9
> RSP: 002b:00007ffe3b3518b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> RAX: ffffffffffffffda RBX: 0000200000000000 RCX: 00007f5f13850fd9
> RDX: 0000200000000040 RSI: 00000000c020660b RDI: 0000000000000004
> RBP: 6165627472616568 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe3b3518f0
> R13: 00007ffe3b351b18 R14: 431bde82d7b634db R15: 00007f5f1389a03b
> 
> ocfs2_fiemap() takes a read lock of the ip_alloc_sem semaphore (since
> v2.6.22-527-g7307de80510a) and calls fiemap_fill_next_extent()
> to read the extent list of this running mmap executable.
> The user supplied buffer to hold the fiemap information page faults
> calling ocfs2_page_mkwrite() which will take a write lock (since
> v2.6.27-38-g00dc417fa3e7) of the same semaphore. This recursive
> semaphore will hold filesystem locks and causes a hang of the
> fileystem.
> 
> The ip_alloc_sem protects the inode extent list and size.
> Release the read semphore before calling fiemap_fill_next_extent()
> in ocfs2_fiemap() and ocfs2_fiemap_inline(). ocfs2_fiemap_inline()
> now returns with semaphore released.
> 
> Reported-by: syzbot+541dcc6ee768f77103e7@syzkaller.appspotmail.com
> Closes: https://urldefense.com/v3/__https://syzkaller.appspot.com/bug? 
> extid=541dcc6ee768f77103e7__;!!ACWV5N9M2RV99hQ!KtyP2cHen- 
> zXLVmJXh4CmpXisEbTC01mhqQIZPQb- 
> tKoEgZT2UhmXJNaiY4HXGpqTR9pYFHNSWzNyNyxGosOgxburSg$ <https:// 
> urldefense.com/v3/__https://syzkaller.appspot.com/bug? 
> extid=541dcc6ee768f77103e7__;!!ACWV5N9M2RV99hQ!KtyP2cHen- 
> zXLVmJXh4CmpXisEbTC01mhqQIZPQb- 
> tKoEgZT2UhmXJNaiY4HXGpqTR9pYFHNSWzNyNyxGosOgxburSg$ >
> 
> Signed-off-by: Mark Tinguely <mark.tinguely@oracle.com>
> ---
> v3: also release semaphore in ocfs2_fiemap_inline() with semaphore released
>       move semaphore before calling fiemap_fill_next_extent()
>       change the error path to release the semaphore to avoid an 
> unnecessary
>       semaphore down/up in the last loop..
> 
> v2: leave the write semaphore in ocfs2_page_mkwrite() but remove the
>       read semaphore before calling fiemap_fill_next_extent().
>       Add a Closes: line
>       review noted missing semaphore change to ocfs2_fiemap_inline()
>              suggest moving locks to fiemap_fill_next_extent() call.
> 
> v1: change the ip_alloc_sem from write to read.
>       rejected in review.
> ---
>    fs/ocfs2/extent_map.c | 27 +++++++++++++++++++++------
>    1 file changed, 21 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/ocfs2/extent_map.c b/fs/ocfs2/extent_map.c
> index 930150ed5db1..20045fb1bb70 100644
> --- a/fs/ocfs2/extent_map.c
> +++ b/fs/ocfs2/extent_map.c
> @@ -706,6 +706,8 @@ int ocfs2_extent_map_get_blocks(struct inode *inode, 
> u64 v_blkno, u64 *p_blkno,
>     * it not only handles the fiemap for inlined files, but also deals
>     * with the fast symlink, cause they have no difference for extent
>     * mapping per se.
> + *
> + * Called with ip_alloc_sem taken and returned with it released.
>     */
>    static int ocfs2_fiemap_inline(struct inode *inode, struct 
> buffer_head *di_bh,
>                                   struct fiemap_extent_info *fieinfo,
> @@ -732,10 +734,16 @@ static int ocfs2_fiemap_inline(struct inode 
> *inode, struct buffer_head *di_bh,
>                            phys += offsetof(struct ocfs2_dinode,
>                                             id2.i_data.id_data);
> 
> +               /* prevent deadlock to a mmap executable page fault */
> +               up_read(&OCFS2_I(inode)->ip_alloc_sem);
> +
>                    ret = fiemap_fill_next_extent(fieinfo, 0, phys, 
> id_count,
>                                                  flags);
>                    if (ret < 0)
>                            return ret;
> +       } else {
> +               /* return with the semaphore released */
> +               up_read(&OCFS2_I(inode)->ip_alloc_sem);
>            }
> 
>            return 0;
> @@ -769,6 +777,7 @@ int ocfs2_fiemap(struct inode *inode, struct 
> fiemap_extent_info *fieinfo,
>             */
>            if ((OCFS2_I(inode)->ip_dyn_features & OCFS2_INLINE_DATA_FL) ||
>                ocfs2_inode_is_fast_symlink(inode)) {
> +               /* ip_alloc_sem is released in ocfs2_fiemap_inline() */
>                    ret = ocfs2_fiemap_inline(inode, di_bh, fieinfo, 
> map_start);
>                    goto out_unlock;
>            }
> @@ -784,6 +793,7 @@ int ocfs2_fiemap(struct inode *inode, struct 
> fiemap_extent_info *fieinfo,
>                                                     &hole_size, &rec, 
> &is_last);
>                    if (ret) {
>                            mlog_errno(ret);
> +                       up_read(&OCFS2_I(inode)->ip_alloc_sem);

not sure what happened to the white space here.
we can remove and change the goto to a new lable, say out_upsema ...

>                            goto out_unlock;
>                    }
> 
> @@ -803,22 +813,27 @@ int ocfs2_fiemap(struct inode *inode, struct 
> fiemap_extent_info *fieinfo,
>                    phys_bytes = le64_to_cpu(rec.e_blkno) << osb->sb- 
>  >s_blocksize_bits;
>                    virt_bytes = (u64)le32_to_cpu(rec.e_cpos) << osb- 
>  >s_clustersize_bits;
> 
> +               /* prevent deadlock to a mmap executable page fault */
> +               up_read(&OCFS2_I(inode)->ip_alloc_sem);
> +
>                    ret = fiemap_fill_next_extent(fieinfo, virt_bytes, 
> phys_bytes,
>                                                  len_bytes, fe_flags);
> -               if (ret)
> -                       break;
> +               if (ret) {
> +                       if (ret > 0)
> +                               ret = 0;
> +                       goto out_unlock;
> +               }
> +
> +               down_read(&OCFS2_I(inode)->ip_alloc_sem);
> 
>                    cpos = le32_to_cpu(rec.e_cpos)+ 
> le16_to_cpu(rec.e_leaf_clusters);
>            }
> 
> -       if (ret > 0)
> -               ret = 0;

... could put the new out_upseama label here

> +       up_read(&OCFS2_I(inode)->ip_alloc_sem);
> 
>    out_unlock:
>            brelse(di_bh);
> 
> -       up_read(&OCFS2_I(inode)->ip_alloc_sem);
> -
>            ocfs2_inode_unlock(inode, 0);
>    out:
> 


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [External] : Re: [v3 PATCH] ocfs2: fix recursive semaphore deadlock in fiemap call
       [not found]   ` <93589648-9595-4fd1-a80f-5d1995c23fd6@linux.alibaba.com>
@ 2025-08-29 13:28     ` Mark Tinguely
  0 siblings, 0 replies; 3+ messages in thread
From: Mark Tinguely @ 2025-08-29 13:28 UTC (permalink / raw)
  To: Joseph Qi
  Cc: Mark Fasheh, heming.zhao@suse.com,
	syzbot+541dcc6ee768f77103e7@syzkaller.appspotmail.com,
	ocfs2-devel@lists.linux.dev

On 8/28/25 9:10 PM, Joseph Qi wrote:
> 
> 
> On 2025/8/28 23:53, Mark Tinguely wrote:
>> syzbot detected a OCFS2 hang due to a recursive semaphore on a
>> FS_IOC_FIEMAP of the extent list on a specially crafted mmap file.
>>
>> context_switch kernel/sched/core.c:5357 [inline]
>>    __schedule+0x1798/0x4cc0 kernel/sched/core.c:6961
>>    __schedule_loop kernel/sched/core.c:7043 [inline]
>>    schedule+0x165/0x360 kernel/sched/core.c:7058
>>    schedule_preempt_disabled+0x13/0x30 kernel/sched/core.c:7115
>>    rwsem_down_write_slowpath+0x872/0xfe0 kernel/locking/rwsem.c:1185
>>    __down_write_common kernel/locking/rwsem.c:1317 [inline]
>>    __down_write kernel/locking/rwsem.c:1326 [inline]
>>    down_write+0x1ab/0x1f0 kernel/locking/rwsem.c:1591
>>    ocfs2_page_mkwrite+0x2ff/0xc40 fs/ocfs2/mmap.c:142
>>    do_page_mkwrite+0x14d/0x310 mm/memory.c:3361
>>    wp_page_shared mm/memory.c:3762 [inline]
>>    do_wp_page+0x268d/0x5800 mm/memory.c:3981
>>    handle_pte_fault mm/memory.c:6068 [inline]
>>    __handle_mm_fault+0x1033/0x5440 mm/memory.c:6195
>>    handle_mm_fault+0x40a/0x8e0 mm/memory.c:6364
>>    do_user_addr_fault+0x764/0x1390 arch/x86/mm/fault.c:1387
>>    handle_page_fault arch/x86/mm/fault.c:1476 [inline]
>>    exc_page_fault+0x76/0xf0 arch/x86/mm/fault.c:1532
>>    asm_exc_page_fault+0x26/0x30 arch/x86/include/asm/idtentry.h:623
>> RIP: 0010:copy_user_generic arch/x86/include/asm/uaccess_64.h:126 [inline]
>> RIP: 0010:raw_copy_to_user arch/x86/include/asm/uaccess_64.h:147 [inline]
>> RIP: 0010:_inline_copy_to_user include/linux/uaccess.h:197 [inline]
>> RIP: 0010:_copy_to_user+0x85/0xb0 lib/usercopy.c:26
>> Code: e8 00 bc f7 fc 4d 39 fc 72 3d 4d 39 ec 77 38 e8 91 b9 f7 fc 4c 89
>> f7 89 de e8 47 25 5b fd 0f 01 cb 4c 89 ff 48 89 d9 4c 89 f6 <f3> a4 0f
>> 1f 00 48 89 cb 0f 01 ca 48 89 d8 5b 41 5c 41 5d 41 5e 41
>> RSP: 0018:ffffc9000403f950 EFLAGS: 00050256
>> RAX: ffffffff84c7f101 RBX: 0000000000000038 RCX: 0000000000000038
>> RDX: 0000000000000000 RSI: ffffc9000403f9e0 RDI: 0000200000000060
>> RBP: ffffc9000403fa90 R08: ffffc9000403fa17 R09: 1ffff92000807f42
>> R10: dffffc0000000000 R11: fffff52000807f43 R12: 0000200000000098
>> R13: 00007ffffffff000 R14: ffffc9000403f9e0 R15: 0000200000000060
>>    copy_to_user include/linux/uaccess.h:225 [inline]
>>    fiemap_fill_next_extent+0x1c0/0x390 fs/ioctl.c:145
>>    ocfs2_fiemap+0x888/0xc90 fs/ocfs2/extent_map.c:806
>>    ioctl_fiemap fs/ioctl.c:220 [inline]
>>    do_vfs_ioctl+0x1173/0x1430 fs/ioctl.c:532
>>    __do_sys_ioctl fs/ioctl.c:596 [inline]
>>    __se_sys_ioctl+0x82/0x170 fs/ioctl.c:584
>>    do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
>>    do_syscall_64+0xfa/0x3b0 arch/x86/entry/syscall_64.c:94
>>    entry_SYSCALL_64_after_hwframe+0x77/0x7f
>> RIP: 0033:0x7f5f13850fd9
>> RSP: 002b:00007ffe3b3518b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
>> RAX: ffffffffffffffda RBX: 0000200000000000 RCX: 00007f5f13850fd9
>> RDX: 0000200000000040 RSI: 00000000c020660b RDI: 0000000000000004
>> RBP: 6165627472616568 R08: 0000000000000000 R09: 0000000000000000
>> R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe3b3518f0
>> R13: 00007ffe3b351b18 R14: 431bde82d7b634db R15: 00007f5f1389a03b
>>
>> ocfs2_fiemap() takes a read lock of the ip_alloc_sem semaphore (since
>> v2.6.22-527-g7307de80510a) and calls fiemap_fill_next_extent()
>> to read the extent list of this running mmap executable.
>> The user supplied buffer to hold the fiemap information page faults
>> calling ocfs2_page_mkwrite() which will take a write lock (since
>> v2.6.27-38-g00dc417fa3e7) of the same semaphore. This recursive
>> semaphore will hold filesystem locks and causes a hang of the
>> fileystem.
>>
>> The ip_alloc_sem protects the inode extent list and size.
>> Release the read semphore before calling fiemap_fill_next_extent()
>> in ocfs2_fiemap() and ocfs2_fiemap_inline(). ocfs2_fiemap_inline()
>> now returns with semaphore released.
>>
>> Reported-by: syzbot+541dcc6ee768f77103e7@syzkaller.appspotmail.com
>> Closes: https://urldefense.com/v3/__https://syzkaller.appspot.com/bug?extid=541dcc6ee768f77103e7__;!!ACWV5N9M2RV99hQ!NTMKVnLmJgrhlYd54LbJrSh-LYdJHOiPt6yBu8E6fL4ZIyBsfKWEPjK4RjXsQYcKPJyOSy5qkOZa9KpJyZhjl4WebbHI$
>>
>> Signed-off-by: Mark Tinguely <mark.tinguely@oracle.com>
>> ---
>> v3: also release semaphore in ocfs2_fiemap_inline() with semaphore released
>>      move semaphore before calling fiemap_fill_next_extent()
>>      change the error path to release the semaphore to avoid an unnecessary
>>      semaphore down/up in the last loop..
>>
>> v2: leave the write semaphore in ocfs2_page_mkwrite() but remove the
>>      read semaphore before calling fiemap_fill_next_extent().
>>      Add a Closes: line
>>      review noted missing semaphore change to ocfs2_fiemap_inline()
>>             suggest moving locks to fiemap_fill_next_extent() call.
>>
>> v1: change the ip_alloc_sem from write to read.
>>      rejected in review.
>> ---
>>   fs/ocfs2/extent_map.c | 27 +++++++++++++++++++++------
>>   1 file changed, 21 insertions(+), 6 deletions(-)
>>
>> diff --git a/fs/ocfs2/extent_map.c b/fs/ocfs2/extent_map.c
>> index 930150ed5db1..20045fb1bb70 100644
>> --- a/fs/ocfs2/extent_map.c
>> +++ b/fs/ocfs2/extent_map.c
>> @@ -706,6 +706,8 @@ int ocfs2_extent_map_get_blocks(struct inode *inode, u64 v_blkno, u64 *p_blkno,
>>    * it not only handles the fiemap for inlined files, but also deals
>>    * with the fast symlink, cause they have no difference for extent
>>    * mapping per se.
>> + *
>> + * Called with ip_alloc_sem taken and returned with it released.
> 
> This makes things complicated.
> 
>>    */
>>   static int ocfs2_fiemap_inline(struct inode *inode, struct buffer_head *di_bh,
>>                                  struct fiemap_extent_info *fieinfo,
>> @@ -732,10 +734,16 @@ static int ocfs2_fiemap_inline(struct inode *inode, struct buffer_head *di_bh,
>>                           phys += offsetof(struct ocfs2_dinode,
>>                                            id2.i_data.id_data);
>>
>> +               /* prevent deadlock to a mmap executable page fault */
>> +               up_read(&OCFS2_I(inode)->ip_alloc_sem);
>> +
>>                   ret = fiemap_fill_next_extent(fieinfo, 0, phys, id_count,
>>                                                 flags);
>>                   if (ret < 0)
>>                           return ret;
>> +       } else {
>> +               /* return with the semaphore released */
>> +               up_read(&OCFS2_I(inode)->ip_alloc_sem);
>>           }
>>
>>           return 0;
>> @@ -769,6 +777,7 @@ int ocfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
>>            */
>>           if ((OCFS2_I(inode)->ip_dyn_features & OCFS2_INLINE_DATA_FL) ||
>>               ocfs2_inode_is_fast_symlink(inode)) {
>> +               /* ip_alloc_sem is released in ocfs2_fiemap_inline() */
>>                   ret = ocfs2_fiemap_inline(inode, di_bh, fieinfo, map_start);
>>                   goto out_unlock;
>>           }
>> @@ -784,6 +793,7 @@ int ocfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
>>                                                    &hole_size, &rec, &is_last);
>>                   if (ret) {
>>                           mlog_errno(ret);
>> +                       up_read(&OCFS2_I(inode)->ip_alloc_sem);
>>                           goto out_unlock;
>>                   }
>>
>> @@ -803,22 +813,27 @@ int ocfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
>>                   phys_bytes = le64_to_cpu(rec.e_blkno) << osb->sb->s_blocksize_bits;
>>                   virt_bytes = (u64)le32_to_cpu(rec.e_cpos) << osb->s_clustersize_bits;
>>
>> +               /* prevent deadlock to a mmap executable page fault */
>> +               up_read(&OCFS2_I(inode)->ip_alloc_sem);
>> +
>>                   ret = fiemap_fill_next_extent(fieinfo, virt_bytes, phys_bytes,
>>                                                 len_bytes, fe_flags);
>> -               if (ret)
>> -                       break;
>> +               if (ret) {
>> +                       if (ret > 0)
>> +                               ret = 0;
>> +                       goto out_unlock;
>> +               }
>> +
>> +               down_read(&OCFS2_I(inode)->ip_alloc_sem);
>>
>>                   cpos = le32_to_cpu(rec.e_cpos)+ le16_to_cpu(rec.e_leaf_clusters);
>>           }
>>
>> -       if (ret > 0)
>> -               ret = 0;
>> +       up_read(&OCFS2_I(inode)->ip_alloc_sem);
>>
>>   out_unlock:
>>           brelse(di_bh);
>>
>> -       up_read(&OCFS2_I(inode)->ip_alloc_sem);
>> -
>>           ocfs2_inode_unlock(inode, 0);
>>   out:
>>
>> --
>> 2.39.5 (Apple Git-154)
>>
> 
> Could we simplify it in the following way?
> 
> diff --git a/fs/ocfs2/extent_map.c b/fs/ocfs2/extent_map.c
> index 930150ed5db1..1ab4bfa6fc24 100644
> --- a/fs/ocfs2/extent_map.c
> +++ b/fs/ocfs2/extent_map.c
> @@ -706,6 +706,8 @@ int ocfs2_extent_map_get_blocks(struct inode *inode, u64 v_blkno, u64 *p_blkno,
>    * it not only handles the fiemap for inlined files, but also deals
>    * with the fast symlink, cause they have no difference for extent
>    * mapping per se.
> + *
> + * Must be called with ip_alloc_sem read lock held.
>    */
>   static int ocfs2_fiemap_inline(struct inode *inode, struct buffer_head *di_bh,
>   			       struct fiemap_extent_info *fieinfo,
> @@ -718,6 +720,8 @@ static int ocfs2_fiemap_inline(struct inode *inode, struct buffer_head *di_bh,
>   	u32 flags = FIEMAP_EXTENT_DATA_INLINE|FIEMAP_EXTENT_LAST;
>   	struct ocfs2_inode_info *oi = OCFS2_I(inode);
>   
> +	lockdep_assert_held_read(&oi->ip_alloc_sem);
> +
>   	di = (struct ocfs2_dinode *)di_bh->b_data;
>   	if (ocfs2_inode_is_fast_symlink(inode))
>   		id_count = ocfs2_fast_symlink_chars(inode->i_sb);
> @@ -732,8 +736,11 @@ static int ocfs2_fiemap_inline(struct inode *inode, struct buffer_head *di_bh,
>   			phys += offsetof(struct ocfs2_dinode,
>   					 id2.i_data.id_data);
>   
> +		/* Release ip_alloc_sem to prevent deadlock on fault */
> +		up_read(&oi->ip_alloc_sem);
>   		ret = fiemap_fill_next_extent(fieinfo, 0, phys, id_count,
>   					      flags);
> +		down_read(&oi->ip_alloc_sem);
>   		if (ret < 0)
>   			return ret;
>   	}
> @@ -803,8 +810,11 @@ int ocfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
>   		phys_bytes = le64_to_cpu(rec.e_blkno) << osb->sb->s_blocksize_bits;
>   		virt_bytes = (u64)le32_to_cpu(rec.e_cpos) << osb->s_clustersize_bits;
>   
> +		/* Release ip_alloc_sem to prevent deadlock on fault */
> +		up_read(&OCFS2_I(inode)->ip_alloc_sem);
>   		ret = fiemap_fill_next_extent(fieinfo, virt_bytes, phys_bytes,
>   					      len_bytes, fe_flags);
> +		down_read(&OCFS2_I(inode)->ip_alloc_sem);
>   		if (ret)
>   			break;
> 
> Thanks,
> Joseph
> 

yes. I was being fancy to avoid an unnecessary semaphore lock/unlock 
rounnd trip for other writers. I suspected that this was going to be the 
next request.

Mark.

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2025-08-29 13:29 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20250828154733.13591-1-mark.tinguely@oracle.com>
     [not found] ` <DM4PR10MB7476AAACC3BCFAA61077E606893BA@DM4PR10MB7476.namprd10.prod.outlook.com>
2025-08-28 16:28   ` [v3 PATCH] ocfs2: fix recursive semaphore deadlock in fiemap call Mark Tinguely
2025-08-28 19:22     ` [External] : " Mark Tinguely
     [not found]   ` <93589648-9595-4fd1-a80f-5d1995c23fd6@linux.alibaba.com>
2025-08-29 13:28     ` [External] : " Mark Tinguely

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).