linux-xfs.vger.kernel.org archive mirror
From: Brian Foster <bfoster@redhat.com>
To: Dave Chinner <david@fromorbit.com>
Cc: bugzilla-daemon@bugzilla.kernel.org, linux-xfs@vger.kernel.org
Subject: Re: [Bug 202053] [xfstests generic/464]: XFS corruption and Assertion failed: 0, file: fs/xfs/xfs_super.c, line: 985
Date: Tue, 8 Jan 2019 09:54:49 -0500	[thread overview]
Message-ID: <20190108145449.GB6330@bfoster> (raw)
In-Reply-To: <20190108054639.GV4205@dastard>

On Tue, Jan 08, 2019 at 04:46:39PM +1100, Dave Chinner wrote:
> On Mon, Jan 07, 2019 at 09:41:04AM -0500, Brian Foster wrote:
> > On Sun, Jan 06, 2019 at 08:31:20AM +1100, Dave Chinner wrote:
> > > On Fri, Jan 04, 2019 at 07:32:17AM -0500, Brian Foster wrote:
> > > > On Tue, Dec 25, 2018 at 06:10:59AM +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> > > > - writepages is in progress on a particular file that has decently sized
> > > >   post-eof speculative preallocation
> > > > - writepages gets to the point where it looks up or allocates a new imap
> > > >   that includes the preallocation, the allocation/lookup result is
> > > >   stored in wpc
> > > > - the file is closed by one process, killing off preallocation, then
> > > >   immediately appended to by another, updating the file size by a few
> > > >   bytes
> > > > - writepages comes back around to xfs_map_blocks() and trims imap to the
> > > >   current size, but imap still includes one block of the original speculative
> > > >   prealloc (that was truncated and recreated) because the size increased
> > > >   between the time imap was stored and trimmed
> > > 
> > > I'm betting hole punch can cause similar problems with cached maps
> > > in writepage. I wrote a patch yonks ago to put a generation number
> > > in the extent fork and to store it in the cached map, and to
> > > invalidate the cached map if they didn't match.
> > > 
> > 
> > Isn't hole punch already serialized with writeback? I thought the issue
> 
> Yes, dirty pages over the range of the hole are flushed before we
> punch the hole. And we do that after preventing new pages from being
> dirtied. But this doesn't prevent background writeback from running
> at the same time on regions of the file outside the range to be hole
> punched. It also does not prevent writeback from caching a map that
> covers the range that has a hole punched in the middle of it.
> 
> Say the file has one large allocated extent and all the pages are
> in cache and dirty (e.g. a full-file overwrite). Hole punch runs,
> locks out new writes and cleans the middle of the file. At the same
> time, background writeback starts at offset zero and caches the
> single extent that maps the entire file. Hole punch then locks the
> extent list, punches out the middle of the file and drops all its
> locks. Writeback is still writing pages at the start of the file,
> oblivious to the hole that has just been punched that made its
> cached extent map stale.
> 
> App then dirties a new page over the hole, creating a delalloc
> extent.
> 

Ah right, thanks. I had just lost track of this. I was wondering why we
didn't have a test for this problem and then I realized I wrote one[1]
over a year ago and it just never landed. :P

The test uses truncate/pwrite rather than hole punch, but it triggers
the same fundamental problem. It looks like the work left off at trying
to find a set of parameters that reproduces the problem reasonably
reliably on various test setups without taking forever.

Anyway, I have a small series that includes a stable fix and then
replaces the whole EOF trim band-aid with an invalidation fix based on
Christoph's aforementioned fork seqno patch. It needs more massage and
testing, but I'll revisit the fstest and include that as well.

Brian

[1] https://marc.info/?l=fstests&m=150902929900510&w=2


> If writeback hasn't yet reached the hole and skipped over it because
> there are no dirty pages in that range (say it blocked in the block
> device because of device congestion or throttling), it will see this
> newly dirtied page and check that it is within the range of the
> cached map. Which it is, because the cached map spans the entire
> file. So writeback will map it directly to a bio because it considers
> the cached map to be correct.
> 
> However, the hole punch made the cached map stale, and the newly
> dirtied page needs to call xfs_iomap_write_allocate() to convert the
> delalloc extent. But because we had no way to detect that the cached
> map no longer reflected the inode's extent map, writeback will
> incorrectly map the newly dirtied data to a freed (and potentially
> reallocated) block on disk. i.e. it's a use after free situation.
> 
> FWIW, the recent range_cyclic changes we made removed one of the
> vectors for this problem (skip over hole, go back to start, hit
> newly dirtied page and write with stale cached map), but the problem
> still exists if the cached extent is large enough and we block
> writeback lower in the IO stack....
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com


Thread overview: 29+ messages
2018-12-24  7:16 [Bug 202053] New: [xfstests generic/464]: XFS corruption and Assertion failed: 0, file: fs/xfs/xfs_super.c, line: 985 bugzilla-daemon
2018-12-24  7:19 ` [Bug 202053] " bugzilla-daemon
2018-12-24 10:40 ` bugzilla-daemon
2018-12-24 10:43 ` bugzilla-daemon
2018-12-24 10:49 ` bugzilla-daemon
2018-12-25  6:10 ` bugzilla-daemon
2019-01-04 12:32   ` Brian Foster
2019-01-04 12:52     ` Brian Foster
2019-01-05 21:31     ` Dave Chinner
2019-01-06 21:57       ` Dave Chinner
2019-01-07 14:41         ` Brian Foster
2019-01-07 19:11           ` Brian Foster
2019-01-08  5:55             ` Dave Chinner
2019-01-08 14:57               ` Brian Foster
2019-01-07 14:41       ` Brian Foster
2019-01-08  5:46         ` Dave Chinner
2019-01-08 14:54           ` Brian Foster [this message]
2019-01-04 12:40 ` bugzilla-daemon
2019-01-04 12:52 ` bugzilla-daemon
2019-01-05 21:31 ` bugzilla-daemon
2019-01-06 21:57 ` bugzilla-daemon
2019-01-07  2:35 ` bugzilla-daemon
2019-01-07 14:41 ` bugzilla-daemon
2019-01-07 14:41 ` bugzilla-daemon
2019-01-07 19:11 ` bugzilla-daemon
2019-01-08  5:46 ` bugzilla-daemon
2019-01-08  5:55 ` bugzilla-daemon
2019-01-08 14:54 ` bugzilla-daemon
2019-01-08 14:57 ` bugzilla-daemon
