The Linux Kernel Mailing List
From: Marco Elver <elver@google.com>
To: Heming Zhao <heming.zhao@suse.com>
Cc: Mark Fasheh <mark@fasheh.com>, Joel Becker <jlbec@evilplan.org>,
	Joseph Qi <joseph.qi@linux.alibaba.com>,
	ocfs2-devel@lists.linux.dev, linux-kernel@vger.kernel.org,
	kasan-dev@googlegroups.com
Subject: Re: [PATCH] ocfs2: fix orphan inode disk leak in ocfs2_dio_end_io() on I/O error
Date: Fri, 12 Jun 2026 14:58:41 +0200
Message-ID: <aiwCgXY4tChR3MiU@elver.google.com>
In-Reply-To: <aitf5rampsYpes2t@p15>

On Fri, Jun 12, 2026 at 09:27AM +0800, Heming Zhao wrote:
> On Thu, Jun 11, 2026 at 05:01:50PM +0200, Marco Elver wrote:
> > When an extending direct I/O write or a direct I/O write racing with an
> > unlink is initiated, ocfs2_direct_IO() places the user inode into the
> > system orphan directory and sets the OCFS2_DIO_ORPHANED_FL flag to
> > ensure defined behavior and crash consistency.
> > 
> > However, if the direct I/O request encounters an error or is
> > cancelled asynchronously (bytes <= 0), the VFS completion hook
> > ocfs2_dio_end_io() bypasses ocfs2_dio_end_io_write() entirely and
> > executes ocfs2_dio_free_write_ctx().  This completely omits the teardown
> > of the orphan entry, leaking the user inode in the orphan directory and
> > leaving the OCFS2_DIO_ORPHANED_FL disk flag set.
> > 
> > Because the OCFS2_DIO_ORPHANED_FL flag remains active, subsequent VFS
> > final inode eviction (ocfs2_delete_inode) observes the flag, assumes a
> > direct I/O write is actively in progress, and refuses to wipe the inode.
> > This results in an irrecoverable disk storage and resource leak that can
> > only be reclaimed if the cluster unmounts or crashes.
> > 
> > Fix this by ensuring that ocfs2_dio_end_io() inspects dw_orphaned even
> > when an I/O error occurs, and calls ocfs2_del_inode_from_orphan() to
> > remove the inode from the orphan directory before destroying the
> > in-memory write context.
> > 
> > Fixes: 5040f8df56fb ("ocfs2: free up write context when direct IO failed")
> > Assisted-by: Antigravity:Gemini
> > Signed-off-by: Marco Elver <elver@google.com>
> > ---
> >  fs/ocfs2/aops.c | 17 +++++++++++++++--
> >  1 file changed, 15 insertions(+), 2 deletions(-)
> > 
> > diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
> > index 4acdbb70882c..ad3f2057e26e 100644
> > --- a/fs/ocfs2/aops.c
> > +++ b/fs/ocfs2/aops.c
> > @@ -2419,11 +2419,24 @@ static int ocfs2_dio_end_io(struct kiocb *iocb,
> >  		mlog_ratelimited(ML_ERROR, "Direct IO failed, bytes = %lld",
> >  				 (long long)bytes);
> >  	if (private) {
> > -		if (bytes > 0)
> > +		if (bytes > 0) {
> >  			ret = ocfs2_dio_end_io_write(inode, private, offset,
> >  						     bytes);
> > -		else
> > +		} else {
> > +			struct ocfs2_dio_write_ctxt *dwc = private;
> > +
> > +			if (dwc->dw_orphaned) {
> > +				struct buffer_head *di_bh = NULL;
> > +
> > +				if (ocfs2_inode_lock(inode, &di_bh, 1) == 0) {
> > +					ocfs2_del_inode_from_orphan(OCFS2_SB(inode->i_sb),
> > +								    inode, di_bh, 0, 0);
> > +					ocfs2_inode_unlock(inode, 1);
> > +					brelse(di_bh);
> > +				}
> 
> Calling only ocfs2_del_inode_from_orphan() without ocfs2_truncate_file() will
> leave stale blocks beyond the EOF.

Right.

> I think the existing OCFS2 code already handles error/crash cases for orphaned
> inodes, and this "leaking" behavior is by design.
> Please refer to ocfs2_recover_orphans() and ocfs2_add_inode_to_orphan().

Periodic orphan scans skip direct I/O orphan entries to avoid racing
with active direct I/O on live nodes:

In fs/ocfs2/journal.c:ocfs2_orphan_filldir():

	/* do not include dio entry in case of orphan scan */
	if ((p->orphan_reco_type == ORPHAN_NO_NEED_TRUNCATE) &&
			(!strncmp(name, OCFS2_DIO_ORPHAN_PREFIX,
			OCFS2_DIO_ORPHAN_PREFIX_LEN)))
		return true;

Is something else recovering them?

Thread overview: 4+ messages
2026-06-11 15:01 [PATCH] ocfs2: fix orphan inode disk leak in ocfs2_dio_end_io() on I/O error Marco Elver
2026-06-12  1:27 ` Heming Zhao
2026-06-12 12:58   ` Marco Elver [this message]
2026-06-15 15:09     ` Heming Zhao
