Re: Oops while rebalancing, now unmountable.

linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Christoph Hellwig <hch@infradead.org>
To: Chris Mason <chris.mason@oracle.com>
Cc: Christoph Hellwig <hch@infradead.org>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Shane Shrybman <shrybman@teksavvy.com>,
	linux-btrfs <linux-btrfs@vger.kernel.org>,
	linux-mm <linux-mm@kvack.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>
Subject: Re: Oops while rebalancing, now unmountable.
Date: Mon, 15 Nov 2010 14:03:35 -0500	[thread overview]
Message-ID: <20101115190335.GA11374@infradead.org> (raw)
In-Reply-To: <1289845457-sup-9432@think>

On Mon, Nov 15, 2010 at 01:46:02PM -0500, Chris Mason wrote:
> For the metadata blocks, btrfs gets into a problematic lock inversion
> where it needs to record that a block has been written so that it will
> be properly recowed when someone tries to change it again.
> 
> Basically the rule for btree_writepage:
> 
> 1) lock the extent buffer (different from the page)
> 2) mark the metadata block as written
> 3) lock the page
> 4) call writepage
> 
> Btrfs does this correctly everywhere it uses writepage, and everyone
> else either uses writepages or is PF_MEMALLOC, except for the page
> migration code, which just jumps to step 4.
>
> So, my current fix adds a migrate page hook and adds a warning into the
> code to make sure we protest loudly when the block isn't marked as
> written.  Since this shakedown worked well, I'm changing the warning to
> a BUG().
> 

This sounds to me like you shouldn't bother to use ->writepage
for the case that adheres to your locking protocol, but just call into
extent_write_full_page directly.  ->writepage is supposed to directly
callable from the VM, and not require filesystems specific calling
conventions.  Just calling extent_write_full_page directly and
making btree_writepage do the PF_MEMALLOC unconditionally should
also fix the page migration corruption.  And at the same time
making btree_writepage future proof.

Btw, magic like the one there currently does need at least a long
describing comment.

> The check for kupdate in btree_writepages is different.  Once we write
> something, we have to do a good amount of work in order to modify it
> again.  The btrfs log commits make sure that we write metadata from time
> to time, so we don't really need help from the flusher threads unless.
>
> We also don't want to waste time writing metadata from
> balance_dirty_pages.  It'll just make more allocations later as we
> wander around and recow things, and it is much more likely to be seeky
> than the file IO.  So we setup a threshold where we don't bother doing
> metadata IO unless there is a good amount pending.
> 
> I'm fine with removing the metadata writepage entirely, it didn't use to
> have this many rules and it seems like a better idea to have it not
> there at all.

for_kupdate only covers a tiny subset of the flusher threads, as it's
only set for the older_than_this still writeback.  It doesn't cover
regular percentage background reclaim not other asynchronous activity
from the flusher threads, like wakeup_flusher_threads or the laptop-mode
I/O completion.

At the very least it should check for_kupdate || for_background to cover
all background writeback, which is what the few other uses of
for_kupdate already do, but I suspect you simply want to not mark
the btree inode as hashed in the inode hash and skip background
writeback completely.

next prev parent reply	other threads:[~2010-11-15 19:03 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-11-08 17:10 Oops while rebalancing, now unmountable Shane Shrybman
2010-11-08 17:55 ` Chris Mason
2010-11-08 20:39   ` Shane Shrybman
2010-11-08 21:04     ` Chris Mason
2010-11-08 21:25       ` Shane Shrybman
2010-11-09 13:42 ` Chris Mason
2010-11-09 18:21   ` Shane Shrybman
2010-11-14 19:55     ` Shane Shrybman
2010-11-14 20:42       ` Andrea Arcangeli
2010-11-14 22:00         ` Christoph Hellwig
2010-11-14 22:12           ` Andrea Arcangeli
2010-11-15 18:23             ` Christoph Hellwig
2010-11-15 18:46               ` Chris Mason
2010-11-15 19:03                 ` Christoph Hellwig [this message]
2010-11-16 21:48                 ` Shane Shrybman
2010-11-15 18:46               ` Andrea Arcangeli
2010-11-15 19:03                 ` Chris Mason
2010-11-15 19:16                   ` Andrea Arcangeli
2010-11-15 19:12                 ` Christoph Hellwig
2010-11-15 19:18                   ` Chris Mason
2010-11-15 19:29                   ` Andrea Arcangeli
2010-11-15 20:54                     ` Christoph Hellwig

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20101115190335.GA11374@infradead.org \
    --to=hch@infradead.org \
    --cc=aarcange@redhat.com \
    --cc=chris.mason@oracle.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=shrybman@teksavvy.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).