From: Dave Chinner <david@fromorbit.com>
To: Jan Kara <jack@suse.cz>
Cc: Ted Tso <tytso@mit.edu>,
linux-ext4@vger.kernel.org,
"Ritesh Harjani (IBM)" <ritesh.list@gmail.com>,
Gao Xiang <hsiangkao@linux.alibaba.com>
Subject: Re: [PATCH] ext4: Properly sync file size update after O_SYNC direct IO
Date: Fri, 13 Oct 2023 10:33:08 +1100 [thread overview]
Message-ID: <ZSiCNPAFuKZbmFPg@dread.disaster.area> (raw)
In-Reply-To: <20231012085937.pzfsttumi6q4g3tm@quack3>
On Thu, Oct 12, 2023 at 10:59:37AM +0200, Jan Kara wrote:
> On Thu 12-10-23 11:26:15, Dave Chinner wrote:
> > On Wed, Oct 11, 2023 at 04:21:55PM +0200, Jan Kara wrote:
> > > Gao Xiang has reported that on ext4 O_SYNC direct IO does not properly
> > > sync file size update and thus if we crash at unfortunate moment, the
> > > file can have smaller size although O_SYNC IO has reported successful
> > > completion. The problem happens because update of on-disk inode size is
> > > handled in ext4_dio_write_iter() *after* iomap_dio_rw() (and thus
> > > dio_complete() in particular) has returned and generic_file_sync() gets
> > > called by dio_complete(). Fix the problem by handling on-disk inode size
> > > update directly in our ->end_io completion handler.
> > >
> > > References: https://lore.kernel.org/all/02d18236-26ef-09b0-90ad-030c4fe3ee20@linux.alibaba.com
> > > Reported-by: Gao Xiang <hsiangkao@linux.alibaba.com>
> > > Signed-off-by: Jan Kara <jack@suse.cz>
> > > ---
> > > fs/ext4/file.c | 139 ++++++++++++++++++-------------------------------
> > > 1 file changed, 52 insertions(+), 87 deletions(-)
> > .....
> > > @@ -388,9 +342,28 @@ static ssize_t ext4_handle_inode_extension(struct inode *inode, loff_t offset,
> > > */
> > > if (inode->i_nlink)
> > > ext4_orphan_del(NULL, inode);
> > > + return;
> > > }
> > > + /*
> > > + * If i_disksize got extended due to writeback of delalloc blocks while
> > > + * the DIO was running we could fail to cleanup the orphan list in
> > > + * ext4_handle_inode_extension(). Do it now.
> > > + */
> > > + if (!list_empty(&EXT4_I(inode)->i_orphan) && inode->i_nlink) {
> > > + handle_t *handle = ext4_journal_start(inode, EXT4_HT_INODE, 2);
> >
> > So this has to be called after the DIO write completes and calls
> > ext4_handle_inode_extension()?
>
> Yes, if the write was setup as extending one ('extend' is set to true in
> ext4_dio_write_iter()).
Then that is worth a comment to document the constraint for anyone
that is trying to understand how ext4 is using the iomap DIO code.
> > > @@ -606,9 +570,8 @@ static ssize_t ext4_dio_write_iter(struct kiocb *iocb, struct iov_iter *from)
> > > dio_flags, NULL, 0);
> > > if (ret == -ENOTBLK)
> > > ret = 0;
> > > -
> > > if (extend)
> > > - ret = ext4_handle_inode_extension(inode, offset, ret, count);
> > > + ext4_inode_extension_cleanup(inode, ret);
> >
> > Because this doesn't wait for AIO DIO to complete and actually
> > extend the file before running the cleanup code...
>
> As Gao wrote, ext4 sets IOMAP_DIO_FORCE_WAIT if 'extend' is set (see
> ext4_dio_write_checks()) so if we get to calling
> ext4_inode_extension_cleanup() we are guaranteed the IO has already
> completed.
Ugh. That's a pretty nasty undocumented landmine. It definitely
needs a comment (or better, a WARN_ON_ONCE()) to document that this
code -only- works if AIO is disabled. This isn't for ext4
developers, it's for people working on the iomap code to understand
that ext4 has some really non-obvious constraints in it's DIO code
paths and that's why the landmine is not being stepped on....
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
next prev parent reply other threads:[~2023-10-12 23:33 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-10-11 14:21 [PATCH] ext4: Properly sync file size update after O_SYNC direct IO Jan Kara
2023-10-12 0:26 ` Dave Chinner
2023-10-12 2:21 ` Gao Xiang
2023-10-12 8:59 ` Jan Kara
2023-10-12 23:33 ` Dave Chinner [this message]
2023-10-13 10:14 ` Jan Kara
2023-10-12 15:25 ` Ritesh Harjani
2023-10-12 16:17 ` Jan Kara
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZSiCNPAFuKZbmFPg@dread.disaster.area \
--to=david@fromorbit.com \
--cc=hsiangkao@linux.alibaba.com \
--cc=jack@suse.cz \
--cc=linux-ext4@vger.kernel.org \
--cc=ritesh.list@gmail.com \
--cc=tytso@mit.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.