From: Matthew Wilcox <willy@infradead.org>
To: Brian Foster <bfoster@redhat.com>
Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
"Darrick J. Wong" <darrick.wong@oracle.com>,
Christoph Hellwig <hch@infradead.org>, Qian Cai <cai@redhat.com>
Subject: Re: [PATCH] iomap: Set all uptodate bits for an Uptodate page
Date: Thu, 24 Sep 2020 18:56:38 +0100 [thread overview]
Message-ID: <20200924175638.GA32101@casper.infradead.org> (raw)
In-Reply-To: <20200924172653.GC2603692@bfoster>
On Thu, Sep 24, 2020 at 01:26:53PM -0400, Brian Foster wrote:
> On Thu, Sep 24, 2020 at 04:22:11PM +0100, Matthew Wilcox wrote:
> > On Thu, Sep 24, 2020 at 11:12:59AM -0400, Brian Foster wrote:
> > > On Thu, Sep 24, 2020 at 02:59:00PM +0100, Matthew Wilcox wrote:
> > > > On Thu, Sep 24, 2020 at 09:12:35AM -0400, Brian Foster wrote:
> > > > > On Thu, Sep 24, 2020 at 01:56:08PM +0100, Matthew Wilcox (Oracle) wrote:
> > > > > > For filesystems with block size < page size, we need to set all the
> > > > > > per-block uptodate bits if the page was already uptodate at the time
> > > > > > we create the per-block metadata. This can happen if the page is
> > > > > > invalidated (eg by a write to drop_caches) but ultimately not removed
> > > > > > from the page cache.
> > > > > >
> > > > > > This is a data corruption issue as page writeback skips blocks which
> > > > > > are marked !uptodate.
> > > > >
> > > > > Thanks. Based on my testing of clearing PageUptodate here I suspect this
> > > > > will similarly prevent the problem, but I'll give this a test
> > > > > nonetheless.
> > > > >
> > > > > I am a little curious why we'd prefer to fill the iop here rather than
> > > > > just clear the page state if the iop data has been released. If the page
> > > > > is partially uptodate, then we end up having to re-read the page
> > > > > anyways, right? OTOH, I guess this behavior is more consistent with page
> > > > > size == block size filesystems where iop wouldn't exist and we just go
> > > > > by page state, so perhaps that makes more sense.
> > > >
> > > > Well, it's _true_ ... the PageUptodate bit means that every byte in this
> > > > page is at least as new as every byte on storage. There's no need to
> > > > re-read it, which is what we'll do if we ClearPageUptodate.
> > >
> > > Yes, of course. I'm just noting the inconsistent behavior between a full
> > > and partially uptodate page.
> >
> > Heh, well, we have no way of knowing. We literally just threw away
> > the information about which blocks are uptodate. So the best we can
> > do is work with the single bit we have. We do know that there are no
> > dirty blocks left on the page at this point (... maybe we should add a
> > VM_BUG_ON(!PageUptodate && PageDirty)).
> >
>
> Right..
>
> > Something we could do is summarise the block uptodate information in
> > the 32/64 bits of page_private without setting PagePrivate. That would
> > cause us to still allocate an iop so we can track reads/writes, but we
> > might be able to avoid a few reads.
> >
> > But I don't think it's worth it. Partially uptodate pages are not what
> > we should be optimising for; we should try to get & keep pages uptodate.
> > After all, it's a page cache ;-)
> >
>
> Fair enough. I was thinking about whether we could ensure the page is
> released if releasepage() effectively invalidated the page content (or
> avoid the release if we know the mapping won't be removed), but that
> appears to be nontrivial given the refcount interdependencies between
> page private and removing the mapping. I.e., the private data can hold a
> reference on the page and remove_mapping() wants to assume that the
> caller and page cache hold the last references on the page.
We could fix that -- remove_mapping() could take into account
page_has_private() in its call to page_ref_freeze() -- ie:
- refcount = 1 + compound_nr(page);
+ retcount = 1 + compound_nr(page) + page_has_private(page);
like some other parts of the VM do. And then the filesystem could
detach_page_private() in its aops->freepage() (which XFS aops don't
currently use).
That change might be a little larger than would be appreciated for a
data corruption fix going back two years. And there's already other
reasons for wanting to be able to create an iop for an Uptodate page
(ie the THP patchset).
next prev parent reply other threads:[~2020-09-24 17:56 UTC|newest]
Thread overview: 43+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-09-24 12:56 [PATCH] iomap: Set all uptodate bits for an Uptodate page Matthew Wilcox (Oracle)
2020-09-24 13:12 ` Brian Foster
2020-09-24 13:59 ` Matthew Wilcox
2020-09-24 14:47 ` Gao Xiang
2020-09-24 15:12 ` Brian Foster
2020-09-24 15:22 ` Matthew Wilcox
2020-09-24 17:26 ` Brian Foster
2020-09-24 17:56 ` Matthew Wilcox [this message]
2020-09-24 15:08 ` Sedat Dilek
2020-09-24 15:15 ` Matthew Wilcox
2020-09-24 15:21 ` Sedat Dilek
2020-09-24 15:27 ` Matthew Wilcox
2020-09-24 16:19 ` Sedat Dilek
2020-09-24 16:36 ` Matthew Wilcox
2020-09-24 18:27 ` Sedat Dilek
2020-09-24 18:44 ` Matthew Wilcox
2020-09-24 18:47 ` Qian Cai
2020-09-24 19:54 ` Sedat Dilek
2020-09-24 20:02 ` Matthew Wilcox
2020-09-24 20:04 ` Sedat Dilek
2020-09-24 23:57 ` Matthew Wilcox
2020-09-25 2:13 ` Sedat Dilek
2020-09-25 10:44 ` Sedat Dilek
2020-09-25 11:12 ` Sedat Dilek
2020-09-25 13:24 ` Sedat Dilek
2020-09-25 13:36 ` Sedat Dilek
2020-09-25 13:46 ` Matthew Wilcox
2020-09-25 14:01 ` Sedat Dilek
2020-09-25 15:53 ` Matthew Wilcox
2020-09-26 19:12 ` Sedat Dilek
2020-09-27 11:31 ` Sedat Dilek
2020-09-27 12:04 ` Matthew Wilcox
2020-09-27 12:34 ` Sedat Dilek
2020-09-27 12:45 ` Sedat Dilek
2020-09-27 13:48 ` Sedat Dilek
2020-09-27 13:54 ` Matthew Wilcox
2020-09-27 14:02 ` Sedat Dilek
2020-09-27 15:19 ` Sedat Dilek
2020-10-03 18:52 ` Sedat Dilek
2020-10-04 4:13 ` Matthew Wilcox
2020-10-04 10:35 ` Sedat Dilek
2020-09-25 18:17 ` Darrick J. Wong
2020-09-28 6:41 ` Christoph Hellwig
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200924175638.GA32101@casper.infradead.org \
--to=willy@infradead.org \
--cc=bfoster@redhat.com \
--cc=cai@redhat.com \
--cc=darrick.wong@oracle.com \
--cc=hch@infradead.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).