From: Brian Foster <bfoster@redhat.com>
To: Christoph Hellwig <hch@infradead.org>
Cc: linux-fsdevel@vger.kernel.org, linux-xfs@vger.kernel.org
Subject: Re: [PATCH 2/2] iomap: zero cached pages over unwritten extents on zero range
Date: Tue, 20 Oct 2020 12:21:50 -0400
Message-ID: <20201020162150.GB1272590@bfoster>
In-Reply-To: <20201019180144.GC1232435@bfoster>

On Mon, Oct 19, 2020 at 02:01:44PM -0400, Brian Foster wrote:
> On Mon, Oct 19, 2020 at 12:55:19PM -0400, Brian Foster wrote:
> > On Thu, Oct 15, 2020 at 10:49:01AM +0100, Christoph Hellwig wrote:
> > > > +iomap_zero_range_skip_uncached(struct inode *inode, loff_t *pos,
> > > > +		loff_t *count, loff_t *written)
> > > > +{
> > > > +	unsigned dirty_offset, bytes = 0;
> > > > +
> > > > +	dirty_offset = page_cache_seek_hole_data(inode, *pos, *count,
> > > > +				SEEK_DATA);
> > > > +	if (dirty_offset == -ENOENT)
> > > > +		bytes = *count;
> > > > +	else if (dirty_offset > *pos)
> > > > +		bytes = dirty_offset - *pos;
> > > > +
> > > > +	if (bytes) {
> > > > +		*pos += bytes;
> > > > +		*count -= bytes;
> > > > +		*written += bytes;
> > > > +	}
> > >
> > > I find the calling conventions weird. why not return bytes and
> > > keep the increments/decrements of the three variables in the caller?
> > >
> >
> > No particular reason. IIRC I had it both ways and just landed on this.
> > I'd change it, but as mentioned in the patch 1 thread I don't think this
> > patch is sufficient (with or without patch 1) anyways because the page
> > can also have been reclaimed before we get here.
> >
>
> Christoph,
>
> What do you think about introducing behavior specific to
> iomap_truncate_page() to unconditionally write zeroes over unwritten
> extents? AFAICT that addresses the race and was historical XFS behavior
> (via block_truncate_page()) before iomap, so is not without precedent.
> What I'd probably do is bury the caller's did_zero parameter into a new
> internal struct iomap_zero_data to pass down into
> iomap_zero_range_actor(), then extend that structure with a
> 'zero_unwritten' field such that iomap_zero_range_actor() can do this:
>

Ugh, so the above doesn't quite describe historical behavior.
block_truncate_page() converts an unwritten block if a page exists
(dirty or not), but bails out if a page doesn't exist. We could still do
the above, but if we wanted something more intelligent I think we need
to check for a page before we get the mapping to know whether we can
safely skip an unwritten block or need to write over it. Otherwise if we
check for a page within the actor, we have no way of knowing whether
there was a (possibly dirty) page that had been written back and/or
reclaimed since ->iomap_begin(). If we check for the page first, I think
that the iolock/mmaplock in the truncate path ensures that a page can't
be added before we complete. We might be able to take that further and
check for a dirty || writeback page, but that might be safer as a
separate patch. See the (compile tested only) diff below for an idea of
what I was thinking.
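
To make the intended decision concrete, here is a minimal userspace sketch of the skip-or-zero choice described above. It is a model under stated assumptions, not kernel code: the enum and the must_zero() helper are hypothetical stand-ins, and had_page is assumed to have been sampled before the extent lookup, so a page that was written back or reclaimed afterwards still counts.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for IOMAP_HOLE, IOMAP_UNWRITTEN and IOMAP_MAPPED. */
enum extent_type { EXT_HOLE, EXT_UNWRITTEN, EXT_MAPPED };

/*
 * Model of the decision only. had_page is assumed to have been sampled
 * before the extent type was looked up.
 */
static bool must_zero(enum extent_type type, bool had_page)
{
	switch (type) {
	case EXT_HOLE:
		return false;		/* nothing on disk, nothing to zero */
	case EXT_UNWRITTEN:
		return had_page;	/* skip only if no page was cached */
	case EXT_MAPPED:
		return true;		/* written blocks are always zeroed */
	}
	assert(!"unknown extent type");
	return true;			/* be conservative if asserts are off */
}
```

The only case that depends on the page cache is the unwritten one, which is why the sampling order relative to the mapping lookup matters.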

Brian

--- 8< ---
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index bcfc288dba3f..2cdfcff02307 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1000,17 +1000,56 @@ iomap_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero,
 }
 EXPORT_SYMBOL_GPL(iomap_zero_range);

+struct iomap_trunc_priv {
+	bool *did_zero;
+	bool has_page;
+};
+
+static loff_t
+iomap_truncate_page_actor(struct inode *inode, loff_t pos, loff_t count,
+		void *data, struct iomap *iomap, struct iomap *srcmap)
+{
+	struct iomap_trunc_priv *priv = data;
+	unsigned offset;
+	int status;
+
+	if (srcmap->type == IOMAP_HOLE)
+		return count;
+	if (srcmap->type == IOMAP_UNWRITTEN && !priv->has_page)
+		return count;
+
+	offset = offset_in_page(pos);
+	if (IS_DAX(inode))
+		status = dax_iomap_zero(pos, offset, count, iomap);
+	else
+		status = iomap_zero(inode, pos, offset, count, iomap, srcmap);
+	if (status < 0)
+		return status;
+
+	if (priv->did_zero)
+		*priv->did_zero = true;
+	return count;
+}
+
 int
 iomap_truncate_page(struct inode *inode, loff_t pos, bool *did_zero,
 		const struct iomap_ops *ops)
 {
+	struct iomap_trunc_priv priv = { .did_zero = did_zero };
 	unsigned int blocksize = i_blocksize(inode);
 	unsigned int off = pos & (blocksize - 1);
+	loff_t ret;

 	/* Block boundary? Nothing to do */
 	if (!off)
 		return 0;
-	return iomap_zero_range(inode, pos, blocksize - off, did_zero, ops);
+
+	priv.has_page = filemap_range_has_page(inode->i_mapping, pos, pos);
+	ret = iomap_apply(inode, pos, blocksize - off, IOMAP_ZERO, ops, &priv,
+			iomap_truncate_page_actor);
+	if (ret <= 0)
+		return ret;
+	return 0;
 }
 EXPORT_SYMBOL_GPL(iomap_truncate_page);
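
As a worked example of the range arithmetic in iomap_truncate_page(), the following userspace sketch computes how many bytes lie between pos and the end of its block. The helper name partial_block_len() is hypothetical, and the sketch assumes a power-of-two block size, which the mask expression requires.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Userspace model: bytes from pos to the end of the block containing it,
 * or 0 when pos already sits on a block boundary. Mirrors
 * "off = pos & (blocksize - 1)" and the "blocksize - off" length above.
 */
static uint64_t partial_block_len(uint64_t pos, unsigned int blocksize)
{
	unsigned int off;

	/* The mask only yields the in-block offset for powers of two. */
	assert(blocksize != 0 && (blocksize & (blocksize - 1)) == 0);

	off = pos & (blocksize - 1);
	return off ? blocksize - off : 0;
}
```

For instance, with a 4096-byte block and pos = 5000, the in-block offset is 904, so 3192 bytes remain to the end of the block; pos = 4096 is block aligned and yields 0.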