From: Brian Foster <bfoster@redhat.com>
To: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: linux-fsdevel@vger.kernel.org, linux-xfs@vger.kernel.org
Subject: Re: [PATCH v2] iomap: zero cached page over unwritten extent on truncate page
Date: Wed, 21 Oct 2020 12:59:07 -0400 [thread overview]
Message-ID: <20201021165907.GA1328297@bfoster> (raw)
In-Reply-To: <20201021162547.GL9832@magnolia>
On Wed, Oct 21, 2020 at 09:25:47AM -0700, Darrick J. Wong wrote:
> On Wed, Oct 21, 2020 at 09:33:29AM -0400, Brian Foster wrote:
> > iomap_truncate_page() relies on zero range and zero range
> > unconditionally skips unwritten mappings. This is normally not a
> > problem as most users synchronize in-core state to the underlying
> > block mapping by flushing pagecache prior to calling into iomap.
> > This is not the case for iomap_truncate_page(), however. XFS calls
> > iomap_truncate_page() on truncate down before flushing the new EOF
> > page of the file. This means that if the new EOF block is unwritten
> > but covered by a dirty or writeback page (i.e. awaiting unwritten
> > conversion after writeback), iomap fails to zero that page. The
> > subsequent truncate_setsize() call does perform page zeroing, but
> > doesn't dirty the page. Therefore if the new EOF page is written
> > back after calling into iomap but before the pagecache truncate, the
> > post-EOF zeroing is lost on page reclaim. This exposes stale
> > post-EOF data on mapped reads.
> >
> > Rework iomap_truncate_page() to check pagecache state before calling
> > into iomap_apply() and use that info to determine whether we can
> > safely skip zeroing unwritten extents. The filesystem has locked out
> > concurrent I/O and mapped operations at this point but is not
> > serialized against writeback, unwritten extent conversion (I/O
> > completion) or page reclaim. Therefore if a page does not exist
> > before we acquire the mapping, we can be certain that an unwritten
> > extent cannot be converted before we return and thus it is safe to
> > skip. If a page does exist over an unwritten block, it could be in
> > the dirty or writeback states, convert the underlying mapping at any
> > time, and thus should be explicitly written to avoid racing with
> > writeback. Finally, since iomap_truncate_page() only targets the new
> > EOF block and must now pass additional state to the actor, open code
> > the zeroing actor instead of plumbing through zero range.
> >
> > This does have the tradeoff that an existing clean page is dirtied
> > and causes unwritten conversion, but this is analogous to historical
> > behavior implemented by block_truncate_page(). This patch restores
> > historical behavior to address the data exposure problem and leaves
> > filtering out the clean page case for a separate patch.
> > Fixes: 68a9f5e7007c ("xfs: implement iomap based buffered write path")
> > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > ---
> >
> > v2:
> > - Rework to check for cached page explicitly and avoid use of seek data.
> > v1: https://lore.kernel.org/linux-fsdevel/20201012140350.950064-1-bfoster@redhat.com/
>
> Has the reproducer listed in that email been turned into a fstest case
> yet? :)
>
Heh.. that reproducer actually required customization to manufacture the
problem. I'll have to think more about a generic reproducer.
FWIW, I've also just come across a similar post-eof data exposure
failure with this patch, so I'll need to dig into that first and
foremost and figure out whether this is still incorrect/insufficient for
some reason...
> >
> > fs/iomap/buffered-io.c | 41 ++++++++++++++++++++++++++++++++++++++++-
> > 1 file changed, 40 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> > index bcfc288dba3f..2cdfcff02307 100644
> > --- a/fs/iomap/buffered-io.c
> > +++ b/fs/iomap/buffered-io.c
> > @@ -1000,17 +1000,56 @@ iomap_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero,
> > }
> > EXPORT_SYMBOL_GPL(iomap_zero_range);
> >
> > +struct iomap_trunc_priv {
> > + bool *did_zero;
> > + bool has_page;
> > +};
> > +
> > +static loff_t
> > +iomap_truncate_page_actor(struct inode *inode, loff_t pos, loff_t count,
> > + void *data, struct iomap *iomap, struct iomap *srcmap)
> > +{
> > + struct iomap_trunc_priv *priv = data;
> > + unsigned offset;
> > + int status;
> > +
> > + if (srcmap->type == IOMAP_HOLE)
> > + return count;
> > + if (srcmap->type == IOMAP_UNWRITTEN && !priv->has_page)
> > + return count;
> > +
> > + offset = offset_in_page(pos);
> > + if (IS_DAX(inode))
> > + status = dax_iomap_zero(pos, offset, count, iomap);
> > + else
> > + status = iomap_zero(inode, pos, offset, count, iomap, srcmap);
> > + if (status < 0)
> > + return status;
> > +
> > + if (priv->did_zero)
> > + *priv->did_zero = true;
> > + return count;
> > +}
> > +
> > int
> > iomap_truncate_page(struct inode *inode, loff_t pos, bool *did_zero,
> > const struct iomap_ops *ops)
> > {
> > + struct iomap_trunc_priv priv = { .did_zero = did_zero };
> > unsigned int blocksize = i_blocksize(inode);
> > unsigned int off = pos & (blocksize - 1);
> > + loff_t ret;
> >
> > /* Block boundary? Nothing to do */
> > if (!off)
> > return 0;
> > - return iomap_zero_range(inode, pos, blocksize - off, did_zero, ops);
> > +
> > + priv.has_page = filemap_range_has_page(inode->i_mapping, pos, pos);
>
> Er... shouldn't that second 'pos' be 'pos + blocksize - off - 1', like
> the apply call below? I guess it doesn't matter since we're only
> interested in the page at pos, but the double usage of pos caught my
> eye.
>
Yeah. I'll fix that up.
> I also wonder, can you move this into the actor so that you can pass
> *did_zero straight through without the two-item struct?
>
I don't think so because the idea was to explicitly check for page
presence prior to getting the mapping.
Brian
> --D
>
> > + ret = iomap_apply(inode, pos, blocksize - off, IOMAP_ZERO, ops, &priv,
> > + iomap_truncate_page_actor);
> > + if (ret <= 0)
> > + return ret;
> > + return 0;
> > }
> > EXPORT_SYMBOL_GPL(iomap_truncate_page);
> >
> > --
> > 2.25.4
> >
>
next prev parent reply other threads:[~2020-10-21 16:59 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-10-21 13:33 [PATCH v2] iomap: zero cached page over unwritten extent on truncate page Brian Foster
2020-10-21 16:25 ` Darrick J. Wong
2020-10-21 16:59 ` Brian Foster [this message]
2020-10-22 15:38 ` Brian Foster
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20201021165907.GA1328297@bfoster \
--to=bfoster@redhat.com \
--cc=darrick.wong@oracle.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.