From: Marco Elver <elver@google.com>
To: Heming Zhao <heming.zhao@suse.com>
Cc: Mark Fasheh <mark@fasheh.com>, Joel Becker <jlbec@evilplan.org>,
	Joseph Qi <joseph.qi@linux.alibaba.com>,
	ocfs2-devel@lists.linux.dev, linux-kernel@vger.kernel.org,
	kasan-dev@googlegroups.com
Subject: Re: [PATCH] ocfs2: fix orphan inode disk leak in ocfs2_dio_end_io() on I/O error
Date: Fri, 12 Jun 2026 14:58:41 +0200
Message-ID: <aiwCgXY4tChR3MiU@elver.google.com>
In-Reply-To: <aitf5rampsYpes2t@p15>

On Fri, Jun 12, 2026 at 09:27AM +0800, Heming Zhao wrote:
> On Thu, Jun 11, 2026 at 05:01:50PM +0200, Marco Elver wrote:
> > When an extending direct I/O write or a direct I/O write racing with an
> > unlink is initiated, ocfs2_direct_IO() places the user inode into the
> > system orphan directory and sets the OCFS2_DIO_ORPHANED_FL flag to
> > ensure defined behavior and crash consistency.
> >
> > However, if the direct I/O request encounters an error or gets
> > asynchronous cancellation (bytes <= 0), the VFS completion hook
> > ocfs2_dio_end_io() bypasses ocfs2_dio_end_io_write() entirely and
> > executes ocfs2_dio_free_write_ctx(). This completely omits the teardown
> > of the orphan entry, leaking the user inode in the orphan directory and
> > leaving the OCFS2_DIO_ORPHANED_FL disk flag set.
> >
> > Because the OCFS2_DIO_ORPHANED_FL flag remains active, subsequent VFS
> > final inode eviction (ocfs2_delete_inode) observes the flag, assumes a
> > direct I/O write is actively in progress, and refuses to wipe the inode.
> > This results in an irrecoverable disk storage and resource leak that can
> > only be reclaimed if the cluster unmounts or crashes.
> >
> > Fix this by ensuring that ocfs2_dio_end_io() inspects dw_orphaned even
> > when an I/O error occurs, and executes ocfs2_del_inode_from_orphan() to
> > liberate the inode before destroying the in-memory write context.
> >
> > Fixes: 5040f8df56fb ("ocfs2: free up write context when direct IO failed")
> > Assisted-by: Antigravity:Gemini
> > Signed-off-by: Marco Elver <elver@google.com>
> > ---
> >  fs/ocfs2/aops.c | 17 +++++++++++++++--
> >  1 file changed, 15 insertions(+), 2 deletions(-)
> >
> > diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
> > index 4acdbb70882c..ad3f2057e26e 100644
> > --- a/fs/ocfs2/aops.c
> > +++ b/fs/ocfs2/aops.c
> > @@ -2419,11 +2419,24 @@ static int ocfs2_dio_end_io(struct kiocb *iocb,
> >  		mlog_ratelimited(ML_ERROR, "Direct IO failed, bytes = %lld",
> >  				 (long long)bytes);
> >  	if (private) {
> > -		if (bytes > 0)
> > +		if (bytes > 0) {
> >  			ret = ocfs2_dio_end_io_write(inode, private, offset,
> >  						     bytes);
> > -		else
> > +		} else {
> > +			struct ocfs2_dio_write_ctxt *dwc = private;
> > +
> > +			if (dwc->dw_orphaned) {
> > +				struct buffer_head *di_bh = NULL;
> > +
> > +				if (ocfs2_inode_lock(inode, &di_bh, 1) == 0) {
> > +					ocfs2_del_inode_from_orphan(OCFS2_SB(inode->i_sb),
> > +								    inode, di_bh, 0, 0);
> > +					ocfs2_inode_unlock(inode, 1);
> > +					brelse(di_bh);
> > +				}
>
> Calling only ocfs2_del_inode_from_orphan() without ocfs2_truncate_file() will
> leave stale blocks beyond the EOF.

Right.

> I think the existing OCFS2 code already handles error/crash cases for orphaned
> inodes, and this "leaking" behavior is by design.
> please refer to ocfs2_recover_orphans() and ocfs2_add_inode_to_orphan().

Periodic scans skip direct I/O entries to avoid racing with active
direct I/O on live nodes:

In fs/ocfs2/journal.c:ocfs2_orphan_filldir():

	/* do not include dio entry in case of orphan scan */
	if ((p->orphan_reco_type == ORPHAN_NO_NEED_TRUNCATE) &&
			(!strncmp(name, OCFS2_DIO_ORPHAN_PREFIX,
			OCFS2_DIO_ORPHAN_PREFIX_LEN)))
		return true;

Is something else recovering them?
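
To make the effect of that check concrete, here is a small userspace sketch of the same predicate. It is not kernel code: the function name orphan_scan_skips() is made up for illustration, and the prefix value "dio-" with length 4 and the enum layout are my assumptions about what fs/ocfs2 defines, so please check them against the tree.

```c
#include <stdbool.h>
#include <string.h>

/* Assumed values; believed to match the fs/ocfs2 definitions, not verified here. */
#define DIO_ORPHAN_PREFIX     "dio-"
#define DIO_ORPHAN_PREFIX_LEN 4

/* Assumed to mirror enum ocfs2_orphan_reco_type. */
enum orphan_reco_type {
	ORPHAN_NO_NEED_TRUNCATE = 0,	/* the type the filldir check tests for */
	ORPHAN_NEED_TRUNCATE,		/* any other recovery type */
};

/*
 * Hypothetical helper: returns true when the orphan directory walk would
 * skip the entry, i.e. the walk is a scan (ORPHAN_NO_NEED_TRUNCATE) and
 * the entry name carries the direct I/O prefix.
 */
bool orphan_scan_skips(enum orphan_reco_type type, const char *name)
{
	return type == ORPHAN_NO_NEED_TRUNCATE &&
	       strncmp(name, DIO_ORPHAN_PREFIX, DIO_ORPHAN_PREFIX_LEN) == 0;
}
```

Under these assumptions, a name such as "dio-0000000000001234" is skipped by a scan of type ORPHAN_NO_NEED_TRUNCATE but not by one of type ORPHAN_NEED_TRUNCATE, and names without the prefix are never skipped by this check.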