From: Brian Foster <bfoster@redhat.com>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH] [RFC] xfs: make xfs_writepage_map extent map centric
Date: Mon, 16 Oct 2017 08:17:54 -0400 [thread overview]
Message-ID: <20171016121754.GD58994@bfoster.bfoster> (raw)
In-Reply-To: <20171014222532.GB15067@dastard>
On Sun, Oct 15, 2017 at 09:25:32AM +1100, Dave Chinner wrote:
> On Fri, Oct 13, 2017 at 09:13:48AM -0400, Brian Foster wrote:
> > On Fri, Oct 13, 2017 at 10:22:21AM +1100, Dave Chinner wrote:
> > > On Thu, Oct 12, 2017 at 12:14:37PM -0400, Brian Foster wrote:
> > > > On Wed, Oct 11, 2017 at 07:11:31PM +1100, Dave Chinner wrote:
> > > > > From: Dave Chinner <dchinner@redhat.com>
> > > > >
> > > > > xfs_writepage_map() iterates over the bufferheads on a page to decide
> > > > > what sort of IO to do and what actions to take. However, when it comes
> > > > > to reflink and deciding when it needs to execute a COW operation, we no
> > > > > longer look at the bufferhead state but instead we ignore than and look up
> > > > > internal state held in teh COW fork extent list.
> > >
> > > [....]
> > >
> > > > > +
> > > > > + if (!wpc->imap_valid || wpc->io_type == XFS_IO_HOLE) {
> > > > > + /*
> > > > > + * set_page_dirty dirties all buffers in a page, independent
> > > > > + * of their state. The dirty state however is entirely
> > > > > + * meaningless for holes (!mapped && uptodate), so check we did
> > > > > + * have a buffer covering a hole here and continue.
> > > > > + */
> > > > > + ASSERT(!buffer_mapped(bh));
> > > > > + continue;
> > > > > }
> > > >
> > > > IIRC, the old (broken) write over hole situation would allocate a block,
> > > > potentially yell about a block reservation overrun and ultimately write
> > > > the page.
> > >
> > > Which was [always] the wrong thing to do.
> > >
> > > However, we were forced to do this historically because it covered
> > > bugs and deficiencies in the front end mapping code, such as before
> > > we had ->page_mkwrite notification. In those ancient times, mmap()
> > > could not do delayed allocation or discover unwritten extents
> > > correctly. So back in those days we had to do unreserved
> > > allocations in writeback and risk transaction overruns and ENOSPC in
> > > writeback because mmap writes were unable to do the right thing.
> > >
> > > Nowdays, we've closed off all those issues and the only situation we
> > > can get unreserved allocation into a hole is when we've got a
> > > mismatch between bufferhead and extent state. This should never
> > > happen anymore, and silently allocating blocks for ranges that may
> > > or may not have valid data in them and writing it to disk is
> > > exactly the wrong thing to be doing now.
> > >
> >
> > "This should never happen ..."
> >
> > Famous last words? ;)
> >
> > Yeah, I understand it's a broken case, but I'm slightly concerned over
> > the fact that it has happened in the past due to filesystem bugs. I
> > think due to more than one problem as well, but my memory is hazy.
> >
> > Note that I'm not suggesting we go ahead and write the page like we used
> > to do, but rather that we at least have a defensive check for a
> > discrepancy between pagecache state that suggests a buffer is dirty and
> > extent state says there is no place to write.
>
> This code does not checking against "buffer_dirty" because we've
> never done that in the writeback path - we've always written clean,
> uptodate, mapped sub-page buffers if the page is dirty.
>
> So the only case left to defend against here is mapped buffers over
> a hole or an invalid extent. The writeback path currently checks
> that doesn't happen with an ASSERT(), and the code above retains
> that ASSERT(). It will go away completely when we move away from
> bufferheads....
>
> > If so, throw a warning
> > (either xfs_alert() or WARN_ON()) to indicate something went wrong
> > because we tossed a dirty page without writing it anywhere. In other
> > words, all I'm really saying here is that I don't think an assert is
> > good enough.
>
> It's the same as what we've always had to defend against this "we
> fucked up the higher layer mapping" type of bug. I'm not creating
> any new situation here, so I don't think it warrants a new warning
> that will simply go away with bufferheads...
>
Ok, then it may not be ideal to rely on the buffer_head to detect the
error.
> > > > Unless I've missed that
> > > > being handled somewhere, that seems like a slightly more severe error
> > > > from the user perspective. Perhaps we should have a noiser warning here
> > > > or even consider that a writeback error?
> > >
> > > No, because it's a valid state to have cached data over a hole if
> > > it's been read. Hence the assert to ensure what we got was from a
> > > previous read into a hole - if a mod we make to the higher level
> > > code screws up, we'll catch it pretty quickly during testing...
> >
> > We should be able to distinguish between page/buffer state that
> > indicates we have valid data over a hole vs. valid+dirty data over a
> > hole in order to flag the latter as a problem.
>
> Except that, as above, we treat clean/dirty data exactly the same in
> writepage - we write it if it's mapped over a valid extent. The code
> catches the only case left that is invalid and it doesn't matter if
> the buffer is clean or dirty - in both cases it is invalid....
>
This is simply not the case as it pertains to how the current code deals
with holes. Initially I was thinking we still quietly wrote the page,
but that is incorrect too. Christoph added the XFS_BMAPI_DELALLOC flag
in commit d2b3964a ("xfs: fix COW writeback race") that alters this
behavior.
The current behavior of the broken dirty page over a hole case is that
we no longer silenty attempt a hole -> real allocation. Instead,
writeback fails with -EIO. With this patch, that exact same broken case
fails the assert, but otherwise skips the dirty page without any error
or notification to userspace. IOW, by skipping the
iomap_write_allocate() code unconditionally we've effectively removed an
error check and made the pre-existing assert logic responsible for it.
So regardless of what specific buffer state checks pre-existed or not,
this patch subtly changes an existing error condition to an assertion
without any explicit intent/justification. I can understand not wanting
to add new logic related to buffer heads, but I don't think "buffer
heads suck" is a good enough reason alone to just drop the error. The
safer approach is to retain the error so that it's clear we need to find
a better way to detect it before we eliminate buffer_heads.
It would be nice if we could just rely on the page state to detect the
error. For example, is there any real non-erroneous case for having a
dirty page completely over a hole within EOF of a file? Unfortunately
that doesn't help us in the sub-page blocksize case where some of the
page might be valid, but perhaps that is a reasonable compromise. If so,
I think that should be part of a patch that actually eliminates
buffer_heads rather than reworking writepage because it at least gives
us more time to think about the problem.
Brian
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
next prev parent reply other threads:[~2017-10-16 12:17 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-10-11 8:11 [PATCH] [RFC] xfs: make xfs_writepage_map extent map centric Dave Chinner
2017-10-12 16:14 ` Brian Foster
2017-10-12 23:22 ` Dave Chinner
2017-10-13 13:13 ` Brian Foster
2017-10-14 22:25 ` Dave Chinner
2017-10-16 12:17 ` Brian Foster [this message]
2017-10-17 1:58 ` Dave Chinner
2017-10-17 14:44 ` Brian Foster
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20171016121754.GD58994@bfoster.bfoster \
--to=bfoster@redhat.com \
--cc=david@fromorbit.com \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).