[PATCH] ocfs2: fix deadlock in dio write orphan cleanup path


* [PATCH] ocfs2: fix deadlock in dio write orphan cleanup path
@ 2026-06-20  8:08 Deepanshu Kartikey
  2026-06-20  8:20 ` sashiko-bot
  2026-06-20 17:59 ` Matthew Wilcox
  0 siblings, 2 replies; 4+ messages in thread
From: Deepanshu Kartikey @ 2026-06-20  8:08 UTC (permalink / raw)
  To: mark, jlbec, joseph.qi, bigeasy, clrkwllms, rostedt
  Cc: ocfs2-devel, linux-kernel, linux-rt-devel, Deepanshu Kartikey,
	syzbot+ce129763ce7d7e914739

PREEMPT_RT's rtmutex PI chain walker detected a lock dependency
cycle in a single thread:

  ocfs2_file_write_iter()
    inode_lock(file_inode)             [Lock A]
      ocfs2_dio_end_io_write()
        ocfs2_inode_lock()             [Lock B]
          ocfs2_del_inode_from_orphan()
            inode_lock(orphan_dir)     [Lock D] <- cycle detected!

The problem is lock ordering: Lock B is held when Lock D is
acquired. Recovery paths acquire these locks in a different
order, creating a potential cycle in the lock dependency graph.

Fix this by releasing Lock B (ocfs2_inode_unlock + brelse(di_bh))
BEFORE calling ocfs2_del_inode_from_orphan(). Pass NULL for di_bh
to signal that ocfs2_del_inode_from_orphan() should acquire its
own fresh cluster lock and di_bh internally.

This ensures consistent lock ordering:
  Before: B held -> D acquired         (inconsistent)
  After:  B released -> B' fresh -> D  (consistent)

Reported-by: syzbot+ce129763ce7d7e914739@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=ce129763ce7d7e914739
Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com>
---
 fs/ocfs2/aops.c  | 21 +++++++++++++++------
 fs/ocfs2/namei.c | 17 ++++++++++++++++-
 2 files changed, 31 insertions(+), 7 deletions(-)

diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
index 6ec198bdab12..15b059a23ebc 100644
--- a/fs/ocfs2/aops.c
+++ b/fs/ocfs2/aops.c
@@ -2280,6 +2280,7 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
 	handle_t *handle = NULL;
 	loff_t end = offset + bytes;
 	int ret = 0, credits = 0, batch = 0;
+	bool orphaned = false;
 
 	ocfs2_init_dealloc_ctxt(&dealloc);
 
@@ -2371,17 +2372,25 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
 		ocfs2_commit_trans(osb, handle);
 unlock:
 	up_write(&oi->ip_alloc_sem);
+	/*
+	 * Release the cluster lock and di_bh BEFORE calling
+	 * ocfs2_del_inode_from_orphan(). That function will acquire
+	 * inode_lock(orphan_dir_inode) which would cause an AB-BA
+	 * deadlock with recovery paths that hold orphan_dir lock
+	 * before acquiring the file inode lock.
+	 */
+	orphaned = (!ret && dwc->dw_orphaned);
+	ocfs2_inode_unlock(inode, 1);
+	brelse(di_bh);
+	di_bh = NULL;
 
-	/* everything looks good, let's start the cleanup */
-	if (!ret && dwc->dw_orphaned) {
+	/* everything looks good, let's start the orphan cleanup */
+	if (orphaned) {
 		BUG_ON(dwc->dw_writer_pid != task_pid_nr(current));
-
-		ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh, 0, 0);
+		ret = ocfs2_del_inode_from_orphan(osb, inode, NULL, 0, 0);
 		if (ret < 0)
 			mlog_errno(ret);
 	}
-	ocfs2_inode_unlock(inode, 1);
-	brelse(di_bh);
 out:
 	if (data_ac)
 		ocfs2_free_alloc_context(data_ac);
diff --git a/fs/ocfs2/namei.c b/fs/ocfs2/namei.c
index 1277666c77cd..25bbe2a9776e 100644
--- a/fs/ocfs2/namei.c
+++ b/fs/ocfs2/namei.c
@@ -2712,10 +2712,21 @@ int ocfs2_del_inode_from_orphan(struct ocfs2_super *osb,
 {
 	struct inode *orphan_dir_inode = NULL;
 	struct buffer_head *orphan_dir_bh = NULL;
-	struct ocfs2_dinode *di = (struct ocfs2_dinode *)di_bh->b_data;
+	struct ocfs2_dinode *di;
 	handle_t *handle = NULL;
 	int status = 0;
+	struct buffer_head *local_di_bh = NULL;
 
+	if (!di_bh) {
+		status = ocfs2_inode_lock(inode, &local_di_bh, 1);
+		if (status < 0) {
+			mlog_errno(status);
+			return status;
+		}
+		di_bh = local_di_bh;
+	}
+
+	di = (struct ocfs2_dinode *)di_bh->b_data;
 	orphan_dir_inode = ocfs2_get_system_file_inode(osb,
 			ORPHAN_DIR_SYSTEM_INODE,
 			le16_to_cpu(di->i_dio_orphaned_slot));
@@ -2779,6 +2790,10 @@ int ocfs2_del_inode_from_orphan(struct ocfs2_super *osb,
 	iput(orphan_dir_inode);
 
 bail:
+	if (local_di_bh) {
+		ocfs2_inode_unlock(inode, 1);
+		brelse(local_di_bh);
+	}
 	return status;
 }
 
-- 
2.43.0



* Re: [PATCH] ocfs2: fix deadlock in dio write orphan cleanup path
  2026-06-20  8:08 [PATCH] ocfs2: fix deadlock in dio write orphan cleanup path Deepanshu Kartikey
@ 2026-06-20  8:20 ` sashiko-bot
  2026-06-20 17:59 ` Matthew Wilcox
  1 sibling, 0 replies; 4+ messages in thread
From: sashiko-bot @ 2026-06-20  8:20 UTC (permalink / raw)
  To: Deepanshu Kartikey; +Cc: linux-kernel, linux-rt-devel

Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:

Pre-existing issues:
- [High] The patch completely fails to resolve the AB-BA deadlock between the write path and the orphan recovery path, as the exact same lock dependency is immediately recreated.
- [High] The identical lock dependency cycle (A -> B -> D) exists unmitigated in the orphan addition path.
- [High] Directory `i_rwsem` operations use the wrong subclass, violating VFS lock ordering rules and hiding parent-child deadlocks.
--

commit a3ea3ef9d81bdb971acfa026b14e772f33af54fa
Author: Deepanshu Kartikey <kartikey406@gmail.com>

ocfs2: fix deadlock in dio write orphan cleanup path

This commit attempts to fix an AB-BA deadlock in the direct IO orphan cleanup
path by releasing the file's cluster lock before calling
ocfs2_del_inode_from_orphan(). It passes NULL for di_bh to signal the
function to acquire its own lock.

Closes: https://syzkaller.appspot.com/bug?extid=ce129763ce7d7e914739

> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
> index 6ec198bdab121..095cb3e44c65e 100644
> --- a/fs/ocfs2/aops.c
> +++ b/fs/ocfs2/aops.c

[ ... ]

> @@ -2371,17 +2372,25 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
>  		ocfs2_commit_trans(osb, handle);
>  unlock:
>  	up_write(&oi->ip_alloc_sem);
> +	/*
> +	 * Release the cluster lock and di_bh BEFORE calling
> +	 * ocfs2_del_inode_from_orphan(). That function will acquire
> +	 * inode_lock(orphan_dir_inode) which would cause an AB-BA
> +	 * deadlock with recovery paths that hold orphan_dir lock
> +	 * before acquiring the file inode lock.
> +	 */
> +	orphaned = (!ret && dwc->dw_orphaned);
> +	ocfs2_inode_unlock(inode, 1);
> +	brelse(di_bh);
> +	di_bh = NULL;
>  
> -	/* everything looks good, let's start the cleanup */
> -	if (!ret && dwc->dw_orphaned) {
> +	/* everything looks good, let's start the orphan cleanup */
> +	if (orphaned) {
>  		BUG_ON(dwc->dw_writer_pid != task_pid_nr(current));
> -
> -		ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh, 0, 0);
> +		ret = ocfs2_del_inode_from_orphan(osb, inode, NULL, 0, 0);
>  		if (ret < 0)
>  			mlog_errno(ret);
>  	}
> -	ocfs2_inode_unlock(inode, 1);
> -	brelse(di_bh);
>  out:

[ ... ]

> diff --git a/fs/ocfs2/namei.c b/fs/ocfs2/namei.c
> index 1277666c77cd2..25bbe2a9776ef 100644
> --- a/fs/ocfs2/namei.c
> +++ b/fs/ocfs2/namei.c
> @@ -2712,10 +2712,21 @@ int ocfs2_del_inode_from_orphan(struct ocfs2_super *osb,
>  {
>  	struct inode *orphan_dir_inode = NULL;
>  	struct buffer_head *orphan_dir_bh = NULL;
> -	struct ocfs2_dinode *di = (struct ocfs2_dinode *)di_bh->b_data;
> +	struct ocfs2_dinode *di;
>  	handle_t *handle = NULL;
>  	int status = 0;
> +	struct buffer_head *local_di_bh = NULL;
>  
> +	if (!di_bh) {
> +		status = ocfs2_inode_lock(inode, &local_di_bh, 1);

[Severity: High]
This is a pre-existing issue, but is the AB-BA deadlock between the write
path and the orphan recovery path truly resolved?

When di_bh is NULL, this immediately reacquires the exact same
ocfs2_inode_lock on the file inode (Lock B) before the orphan directory's
VFS lock (Lock D) is acquired later in this function:

fs/ocfs2/namei.c:ocfs2_del_inode_from_orphan() {
    ...
    inode_lock(orphan_dir_inode);
    ...
}

Since the lock acquisition order remains exactly the same (B -> D) as it
was before the patch, and the recovery path acquires these locks in the
reverse order (D -> B), doesn't the deadlock cycle remain unmitigated?
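
The ordering argument above can be sketched with a toy userspace model.
This is not kernel code and not how lockdep or the rtmutex chain walker
is implemented; the names order_graph, toy_acquire() and toy_release()
are hypothetical. The model records an edge X -> Y whenever Y is taken
while X is held, and suggests that dropping and re-taking Lock B before
Lock D leaves the recorded edges unchanged.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy model only: NOT the kernel's lockdep or rtmutex code. */
enum { LOCK_A, LOCK_B, LOCK_D, NLOCKS };

struct order_graph {
	bool held[NLOCKS];
	/* edge[x][y] is set when y was acquired while x was held */
	bool edge[NLOCKS][NLOCKS];
};

static void graph_init(struct order_graph *g)
{
	memset(g, 0, sizeof(*g));
}

static void toy_acquire(struct order_graph *g, int lock)
{
	assert(lock >= 0 && lock < NLOCKS);
	for (int i = 0; i < NLOCKS; i++)
		if (g->held[i])
			g->edge[i][lock] = true;
	g->held[lock] = true;
}

static void toy_release(struct order_graph *g, int lock)
{
	assert(lock >= 0 && lock < NLOCKS);
	g->held[lock] = false;
}

/* Lock sequence described for the unpatched write path: A, B, D. */
static void write_path_before(struct order_graph *g)
{
	toy_acquire(g, LOCK_A);
	toy_acquire(g, LOCK_B);
	toy_acquire(g, LOCK_D);
	toy_release(g, LOCK_D);
	toy_release(g, LOCK_B);
	toy_release(g, LOCK_A);
}

/* Patched sequence: B is dropped, then re-taken before D. */
static void write_path_after(struct order_graph *g)
{
	toy_acquire(g, LOCK_A);
	toy_acquire(g, LOCK_B);
	toy_release(g, LOCK_B);
	toy_acquire(g, LOCK_B);	/* fresh cluster lock taken in the callee */
	toy_acquire(g, LOCK_D);
	toy_release(g, LOCK_D);
	toy_release(g, LOCK_B);
	toy_release(g, LOCK_A);
}

/* True when both graphs recorded exactly the same ordering edges. */
static bool same_edges(const struct order_graph *a,
		       const struct order_graph *b)
{
	return memcmp(a->edge, b->edge, sizeof(a->edge)) == 0;
}
```

Running both paths on freshly initialised graphs and comparing them with
same_edges() gives identical edge sets in this model, and both contain
B -> D.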

Also, as a related pre-existing issue, couldn't the identical lock
dependency cycle (A -> B -> D) still occur unmitigated in the orphan
addition path?

During direct IO writes, ocfs2_dio_get_block() calls
ocfs2_add_inode_to_orphan() while holding the file's VFS lock (Lock A).
Inside ocfs2_add_inode_to_orphan(), the file's cluster lock (Lock B) is
acquired:

fs/ocfs2/namei.c:ocfs2_add_inode_to_orphan() {
    status = ocfs2_inode_lock(inode, &di_bh, 1);
    ...
    status = ocfs2_prepare_orphan_dir(osb, &orphan_dir_inode,
                                      OCFS2_I(inode)->ip_blkno,
                                      orphan_name,
                                      &orphan_insert,
                                      true);
}

This calls ocfs2_prepare_orphan_dir(), which acquires the orphan
directory's VFS lock (Lock D) via inode_lock(). This seems to establish
the exact same A -> B -> D deadlock cycle against the recovery path.

Finally, this is a pre-existing issue, but might the use of inode_lock()
on the orphan directory violate VFS lock ordering rules?

When modifying a directory, the VFS requires acquiring the directory's
i_rwsem with subclass I_MUTEX_PARENT. In ocfs2_del_inode_from_orphan()
and ocfs2_lookup_lock_orphan_dir(), the default inode_lock() is used
instead:

fs/ocfs2/namei.c:ocfs2_del_inode_from_orphan() {
    ...
    inode_lock(orphan_dir_inode);
    ...
}

Because the caller already holds the file's inode_lock (also subclass 0),
acquiring the orphan directory's lock using subclass 0 can create a
lockdep AA self-deadlock or cycle warning. Furthermore, taking a child's
lock before its parent directory's lock intrinsically violates the VFS
hierarchy (I_MUTEX_PARENT -> I_MUTEX_NORMAL). Should this be using
inode_lock_nested(..., I_MUTEX_PARENT) instead?

> +		if (status < 0) {
> +			mlog_errno(status);
> +			return status;
> +		}
> +		di_bh = local_di_bh;
> +	}
> +
> +	di = (struct ocfs2_dinode *)di_bh->b_data;
>  	orphan_dir_inode = ocfs2_get_system_file_inode(osb,
>  			ORPHAN_DIR_SYSTEM_INODE,
>  			le16_to_cpu(di->i_dio_orphaned_slot));

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260620080802.35165-1-kartikey406@gmail.com?part=1


* Re: [PATCH] ocfs2: fix deadlock in dio write orphan cleanup path
  2026-06-20  8:08 [PATCH] ocfs2: fix deadlock in dio write orphan cleanup path Deepanshu Kartikey
  2026-06-20  8:20 ` sashiko-bot
@ 2026-06-20 17:59 ` Matthew Wilcox
  2026-06-20 23:26   ` Deepanshu Kartikey
  1 sibling, 1 reply; 4+ messages in thread
From: Matthew Wilcox @ 2026-06-20 17:59 UTC (permalink / raw)
  To: Deepanshu Kartikey
  Cc: mark, jlbec, joseph.qi, bigeasy, clrkwllms, rostedt, ocfs2-devel,
	linux-kernel, linux-rt-devel, syzbot+ce129763ce7d7e914739

On Sat, Jun 20, 2026 at 01:38:02PM +0530, Deepanshu Kartikey wrote:
> PREEMPT_RT's rtmutex PI chain walker detected a lock dependency
> cycle in a single thread:
> 
>   ocfs2_file_write_iter()
>     inode_lock(file_inode)             [Lock A]
>       ocfs2_dio_end_io_write()
>         ocfs2_inode_lock()             [Lock B]
>           ocfs2_del_inode_from_orphan()
>             inode_lock(orphan_dir)     [Lock D] <- cycle detected!

This seems like a false positive.  You can't call write_iter() on
a directory, and orphan_dir is always a directory.

I would suggest that the easiest way to make this warning go away is to
replace inode_lock(orphan_dir) with inode_lock_nested(orphan_dir,
I_MUTEX_NONDIR2).  It's a bit quirky because, well, orphan_dir is a
directory.  We could add a seventh lock class to
inode_i_mutex_lock_class, but that feels a bit excessive.
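
A toy sketch of why a different subclass quiets the warning: lockdep
distinguishes lock classes by key plus subclass, so the same key with a
different subclass is tracked as a separate class. Everything below is
illustrative userspace C, not kernel code. The TOY_ names are hypothetical
copies of enum inode_i_mutex_lock_class, and the single shared key is an
assumption made only for the example; this thread does not establish
which keys the two i_rwsem instances actually use.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Illustrative copy of the six subclasses in
 * enum inode_i_mutex_lock_class (include/linux/fs.h).
 */
enum toy_subclass {
	TOY_NORMAL,
	TOY_PARENT,
	TOY_CHILD,
	TOY_XATTR,
	TOY_NONDIR2,
	TOY_PARENT2,
};

/* Toy stand-in for a lockdep class: a key plus a subclass. */
struct toy_lock_class {
	int key;
	enum toy_subclass subclass;
};

/* ASSUMPTION: both inodes share one key; chosen only for this sketch. */
#define TOY_I_RWSEM_KEY 1

/* Models inode_lock(): always subclass 0. */
static struct toy_lock_class toy_inode_lock(void)
{
	return (struct toy_lock_class){
		.key = TOY_I_RWSEM_KEY,
		.subclass = TOY_NORMAL,
	};
}

/* Models inode_lock_nested(): the caller picks the subclass. */
static struct toy_lock_class toy_inode_lock_nested(enum toy_subclass s)
{
	assert(s >= TOY_NORMAL && s <= TOY_PARENT2);
	return (struct toy_lock_class){
		.key = TOY_I_RWSEM_KEY,
		.subclass = s,
	};
}

/* Two acquisitions collide only if key AND subclass both match. */
static bool toy_same_class(struct toy_lock_class a, struct toy_lock_class b)
{
	return a.key == b.key && a.subclass == b.subclass;
}
```

In this model, two plain toy_inode_lock() acquisitions land in the same
class, while toy_inode_lock_nested(TOY_NONDIR2) lands in a different one,
which is the effect the suggested annotation relies on.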



* Re: [PATCH] ocfs2: fix deadlock in dio write orphan cleanup path
  2026-06-20 17:59 ` Matthew Wilcox
@ 2026-06-20 23:26   ` Deepanshu Kartikey
  0 siblings, 0 replies; 4+ messages in thread
From: Deepanshu Kartikey @ 2026-06-20 23:26 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: mark, jlbec, joseph.qi, bigeasy, clrkwllms, rostedt, ocfs2-devel,
	linux-kernel, linux-rt-devel, syzbot+ce129763ce7d7e914739

On Sat, Jun 20, 2026 at 11:29 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> This seems like a false positive.  You can't call write_iter() on
> a directory, and orphan_dir is always a directory.
>
> I would suggest that the easiest way to make this warning go away is to
> replace inode_lock(orphan_dir) with inode_lock_nested(orphan_dir,
> I_MUTEX_NONDIR2).  It's a bit quirky because, well, orphan_dir is a
> directory.  We could add a seventh lock class to
> inode_i_mutex_lock_class, but that feels a bit excessive.
>

Thanks for the review. I have sent v2 of the patch, which uses
inode_lock_nested(orphan_dir_inode, I_MUTEX_NONDIR2).
