Re: [PATCH v5 1/2] iomap: fix zero padding data issue in concurrent append writes

public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed

From: Brian Foster <bfoster@redhat.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Long Li <leo.lilong@huawei.com>,
	brauner@kernel.org, djwong@kernel.org, cem@kernel.org,
	linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	yi.zhang@huawei.com, houtao1@huawei.com, yangerkun@huawei.com
Subject: Re: [PATCH v5 1/2] iomap: fix zero padding data issue in concurrent append writes
Date: Wed, 4 Dec 2024 07:05:16 -0500	[thread overview]
Message-ID: <Z1BFfIk_iDAh2uwF@bfoster> (raw)
In-Reply-To: <Z090Jd06yjgh_Q-y@dread.disaster.area>

On Wed, Dec 04, 2024 at 08:12:05AM +1100, Dave Chinner wrote:
> On Tue, Dec 03, 2024 at 09:54:41AM -0500, Brian Foster wrote:
> > On Tue, Dec 03, 2024 at 01:08:38PM +1100, Dave Chinner wrote:
> > > On Mon, Dec 02, 2024 at 10:26:14AM -0500, Brian Foster wrote:
> > > > On Sat, Nov 30, 2024 at 09:39:29PM +0800, Long Li wrote:
> > > We hold the MMAP_LOCK (filemap_invalidate_lock()) so no new pages
> > > can be instantiated over the range whilst we are running
> > > xfs_itruncate_extents(). hence once truncate_setsize() returns, we
> > > are guaranteed that there will be no IO in progress or can be
> > > started over the range we are removing.
> > > 
> > > Really, the issue is that writeback mappings have to be able to
> > > handle the range being mapped suddenly appear to be beyond EOF.
> > > This behaviour is a longstanding writeback constraint, and is what
> > > iomap_writepage_handle_eof() is attempting to handle.
> > > 
> > > We handle this by only sampling i_size_read() whilst we have the
> > > folio locked and can determine the action we should take with that
> > > folio (i.e. nothing, partial zeroing, or skip altogether). Once
> > > we've made the decision that the folio is within EOF and taken
> > > action on it (i.e. moved the folio to writeback state), we cannot
> > > then resample the inode size because a truncate may have started
> > > and changed the inode size.
> > > 
> > > We have to complete the mapping of the folio to disk blocks - the
> > > disk block mapping is guaranteed to be valid for the life of the IO
> > > because the folio is locked and under writeback - and submit the IO
> > > so that truncate_pagecache() will unblock and invalidate the folio
> > > when the IO completes.
> > > 
> > > Hence writeback vs truncate serialisation is really dependent on
> > > only sampling the inode size -once- whilst the dirty folio we are
> > > writing back is locked.
> > > 
> > 
> > Not sure I see how this is a serialization dependency given that
> > writeback completion also samples i_size.
> 
> Ah, I didn't explain what I meant very clearly, did I?
> 
> What I mean was we can't sample i_size in the IO path without
> specific checking/serialisation against truncate operations. And
> that means once we have partially zeroed the contents of a EOF
> straddling folio, we can't then sample the EOF again to determine
> the length of valid data in the folio as this can race with truncate
> and result in a different size for the data in the folio than we
> prepared it for.
> 

Ok, I think we're just saying the same thing using different words.

> > But no matter, it seems a
> > reasonable implementation to me to make the submission path consistent
> > in handling eof.
> 
> Yes, the IO completion path does sample it again via xfs_new_eof().
> However, as per above, it has specific checking for truncate down
> races and handles them:
> 
> /*
>  * If this I/O goes past the on-disk inode size update it unless it would
>  * be past the current in-core inode size.
>  */
> static inline xfs_fsize_t
> xfs_new_eof(struct xfs_inode *ip, xfs_fsize_t new_size)
> {
>         xfs_fsize_t i_size = i_size_read(VFS_I(ip));
> 
> >>>>    if (new_size > i_size || new_size < 0)
> >>>>            new_size = i_size;
>         return new_size > ip->i_disk_size ? new_size : 0;
> }
> 
> If we have a truncate_setsize() called for a truncate down whilst
> this IO is in progress, then xfs_new_eof() will see the new, smaller
> inode isize. The clamp on new_size handles this situation, and we
> then only triggers an update if the on-disk size is still smaller
> than the new truncated size (i.e. the IO being completed is still
> partially within the new EOF from the truncate down).
> 
> So I don't think there's an issue here at all at IO completion;
> it handles truncate down races cleanly...
> 

Agree.. this was kind of the point of the submit side trimming. I'm not
sure a second sample of i_size on submission for trimming purposes
affects this in any problematic way either.

Brian

> > I wonder if this could just use end_pos returned from
> > iomap_writepage_handle_eof()?
> 
> Yeah, that was what I was thinking, but I haven't looked at the code
> for long enough to have any real idea of whether that is sufficient
> or not.
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
>

next prev parent reply	other threads:[~2024-12-04 12:03 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-11-27  6:35 [PATCH v5 1/2] iomap: fix zero padding data issue in concurrent append writes Long Li
2024-11-27  6:35 ` [PATCH v5 2/2] xfs: clean up xfs_end_ioend() to reuse local variables Long Li
2024-11-27 16:07   ` Darrick J. Wong
2024-11-27 16:28 ` [PATCH v5 1/2] iomap: fix zero padding data issue in concurrent append writes Darrick J. Wong
2024-11-28  6:38   ` Long Li
2024-11-28  3:33 ` Christoph Hellwig
2024-11-30 13:39 ` Long Li
2024-12-02 15:26   ` Brian Foster
2024-12-03  2:08     ` Dave Chinner
2024-12-03 14:54       ` Brian Foster
2024-12-03 21:12         ` Dave Chinner
2024-12-04 12:05           ` Brian Foster [this message]
2024-12-04  9:06         ` Long Li
2024-12-04 12:05           ` Brian Foster
2024-12-06  3:36             ` Long Li
2024-12-04  9:00       ` Long Li
2024-12-04 12:17         ` Brian Foster
2024-12-05 12:47           ` Long Li

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z1BFfIk_iDAh2uwF@bfoster \
    --to=bfoster@redhat.com \
    --cc=brauner@kernel.org \
    --cc=cem@kernel.org \
    --cc=david@fromorbit.com \
    --cc=djwong@kernel.org \
    --cc=houtao1@huawei.com \
    --cc=leo.lilong@huawei.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=yangerkun@huawei.com \
    --cc=yi.zhang@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox