All of lore.kernel.org
 help / color / mirror / Atom feed
From: Brian Foster <bfoster@redhat.com>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH] [RFC] xfs: make xfs_writepage_map extent map centric
Date: Mon, 16 Oct 2017 08:17:54 -0400	[thread overview]
Message-ID: <20171016121754.GD58994@bfoster.bfoster> (raw)
In-Reply-To: <20171014222532.GB15067@dastard>

On Sun, Oct 15, 2017 at 09:25:32AM +1100, Dave Chinner wrote:
> On Fri, Oct 13, 2017 at 09:13:48AM -0400, Brian Foster wrote:
> > On Fri, Oct 13, 2017 at 10:22:21AM +1100, Dave Chinner wrote:
> > > On Thu, Oct 12, 2017 at 12:14:37PM -0400, Brian Foster wrote:
> > > > On Wed, Oct 11, 2017 at 07:11:31PM +1100, Dave Chinner wrote:
> > > > > From: Dave Chinner <dchinner@redhat.com>
> > > > > 
> > > > > xfs_writepage_map() iterates over the bufferheads on a page to decide
> > > > > what sort of IO to do and what actions to take. However, when it comes
> > > > > to reflink and deciding when it needs to execute a COW operation, we no
> > > > > longer look at the bufferhead state but instead we ignore than and look up
> > > > > internal state held in teh COW fork extent list.
> > > 
> > > [....]
> > > 
> > > > > +
> > > > > +		if (!wpc->imap_valid || wpc->io_type == XFS_IO_HOLE) {
> > > > > +			/*
> > > > > +			 * set_page_dirty dirties all buffers in a page, independent
> > > > > +			 * of their state.  The dirty state however is entirely
> > > > > +			 * meaningless for holes (!mapped && uptodate), so check we did
> > > > > +			 * have a buffer covering a hole here and continue.
> > > > > +			 */
> > > > > +			ASSERT(!buffer_mapped(bh));
> > > > > +			continue;
> > > > >  		}
> > > > 
> > > > IIRC, the old (broken) write over hole situation would allocate a block,
> > > > potentially yell about a block reservation overrun and ultimately write
> > > > the page.
> > > 
> > > Which was [always] the wrong thing to do.
> > > 
> > > However, we were forced to do this historically because it covered
> > > bugs and deficiencies in the front end mapping code, such as before
> > > we had ->page_mkwrite notification. In those ancient times, mmap()
> > > could not do delayed allocation or discover unwritten extents
> > > correctly.  So back in those days we had to do unreserved
> > > allocations in writeback and risk transaction overruns and ENOSPC in
> > > writeback because mmap writes were unable to do the right thing.
> > > 
> > > Nowdays, we've closed off all those issues and the only situation we
> > > can get unreserved allocation into a hole is when we've got a
> > > mismatch between bufferhead and extent state. This should never
> > > happen anymore, and silently allocating blocks for ranges that may
> > > or may not have valid data in them and writing it to disk is
> > > exactly the wrong thing to be doing now.
> > > 
> > 
> > "This should never happen ..."
> > 
> > Famous last words? ;) 
> > 
> > Yeah, I understand it's a broken case, but I'm slightly concerned over
> > the fact that it has happened in the past due to filesystem bugs. I
> > think due to more than one problem as well, but my memory is hazy.
> > 
> > Note that I'm not suggesting we go ahead and write the page like we used
> > to do, but rather that we at least have a defensive check for a
> > discrepancy between pagecache state that suggests a buffer is dirty and
> > extent state says there is no place to write.
> 
> This code does not checking against "buffer_dirty" because we've
> never done that in the writeback path - we've always written clean,
> uptodate, mapped sub-page buffers if the page is dirty.
> 
> So the only case left to defend against here is mapped buffers over
> a hole or an invalid extent. The writeback path currently checks
> that doesn't happen with an ASSERT(), and the code above retains
> that ASSERT(). It will go away completely when we move away from
> bufferheads....
> 
> > If so, throw a warning
> > (either xfs_alert() or WARN_ON()) to indicate something went wrong
> > because we tossed a dirty page without writing it anywhere. In other
> > words, all I'm really saying here is that I don't think an assert is
> > good enough.
> 
> It's the same as what we've always had to defend against this "we
> fucked up the higher layer mapping" type of bug. I'm not creating
> any new situation here, so I don't think it warrants a new warning
> that will simply go away with bufferheads...
> 

Ok, then it may not be ideal to rely on the buffer_head to detect the
error.

> > > > Unless I've missed that
> > > > being handled somewhere, that seems like a slightly more severe error
> > > > from the user perspective. Perhaps we should have a noiser warning here
> > > > or even consider that a writeback error?
> > > 
> > > No, because it's a valid state to have cached data over a hole if
> > > it's been read. Hence the assert to ensure what we got was from a
> > > previous read into a hole - if a mod we make to the higher level
> > > code screws up, we'll catch it pretty quickly during testing...
> > 
> > We should be able to distinguish between page/buffer state that
> > indicates we have valid data over a hole vs. valid+dirty data over a
> > hole in order to flag the latter as a problem.
> 
> Except that, as above, we treat clean/dirty data exactly the same in
> writepage - we write it if it's mapped over a valid extent. The code
> catches the only case left that is invalid and it doesn't matter if
> the buffer is clean or dirty - in both cases it is invalid....
> 

This is simply not the case as it pertains to how the current code deals
with holes. Initially I was thinking we still quietly wrote the page,
but that is incorrect too. Christoph added the XFS_BMAPI_DELALLOC flag
in commit d2b3964a ("xfs: fix COW writeback race") that alters this
behavior.

The current behavior of the broken dirty page over a hole case is that
we no longer silenty attempt a hole -> real allocation. Instead,
writeback fails with -EIO. With this patch, that exact same broken case
fails the assert, but otherwise skips the dirty page without any error
or notification to userspace. IOW, by skipping the
iomap_write_allocate() code unconditionally we've effectively removed an
error check and made the pre-existing assert logic responsible for it.

So regardless of what specific buffer state checks pre-existed or not,
this patch subtly changes an existing error condition to an assertion
without any explicit intent/justification. I can understand not wanting
to add new logic related to buffer heads, but I don't think "buffer
heads suck" is a good enough reason alone to just drop the error. The
safer approach is to retain the error so that it's clear we need to find
a better way to detect it before we eliminate buffer_heads.

It would be nice if we could just rely on the page state to detect the
error. For example, is there any real non-erroneous case for having a
dirty page completely over a hole within EOF of a file? Unfortunately
that doesn't help us in the sub-page blocksize case where some of the
page might be valid, but perhaps that is a reasonable compromise. If so,
I think that should be part of a patch that actually eliminates
buffer_heads rather than reworking writepage because it at least gives
us more time to think about the problem.

Brian

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

  reply	other threads:[~2017-10-16 12:17 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-10-11  8:11 [PATCH] [RFC] xfs: make xfs_writepage_map extent map centric Dave Chinner
2017-10-12 16:14 ` Brian Foster
2017-10-12 23:22   ` Dave Chinner
2017-10-13 13:13     ` Brian Foster
2017-10-14 22:25       ` Dave Chinner
2017-10-16 12:17         ` Brian Foster [this message]
2017-10-17  1:58           ` Dave Chinner
2017-10-17 14:44             ` Brian Foster

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20171016121754.GD58994@bfoster.bfoster \
    --to=bfoster@redhat.com \
    --cc=david@fromorbit.com \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.