linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Chris Mason <chris.mason@oracle.com>
To: Andrea Arcangeli <aarcange@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>,
	Shane Shrybman <shrybman@teksavvy.com>,
	linux-btrfs <linux-btrfs@vger.kernel.org>,
	linux-mm <linux-mm@kvack.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>
Subject: Re: Oops while rebalancing, now unmountable.
Date: Mon, 15 Nov 2010 14:03:55 -0500	[thread overview]
Message-ID: <1289847339-sup-4591@think> (raw)
In-Reply-To: <20101115184657.GJ6809@random.random>

Excerpts from Andrea Arcangeli's message of 2010-11-15 13:46:57 -0500:
> On Mon, Nov 15, 2010 at 01:23:14PM -0500, Christoph Hellwig wrote:
> > On Sun, Nov 14, 2010 at 11:12:22PM +0100, Andrea Arcangeli wrote:
> > > +static int btree_migratepage(struct address_space *mapping,
> > > +                       struct page *newpage, struct page *page)
> > > +{
> > > +       /*
> > > +        * we can't safely write a btree page from here,
> > > +        * we haven't done the locking hook
> > > +        */
> > > +       if (PageDirty(page))
> > > +               return -EAGAIN;
> > >=20
> > > fallback_migrate_page would call writeout() which is apparently not
> > > ok in btrfs for locking issues leading to corruption.
> >=20
> > Hmm, it seems the issue for that particular problem is indeedin btrfs=
.
>=20
> I've been reading the writeout() in mm/migrate.c and I wonder if maybe
> that should have been WB_SYNC_ALL or if we miss a
> wait_on_page_writeback in after ->writepage() returns? Can you have a
> look there? We check the PG_writeback bit when the page is not dirty
> (well before fallback_migrate_page is called), but after calling
> writeout() we don't return to wait on PG_writeback. We make sure to
> hold the page lock after ->writepage returns but that doesn't mean
> PG_writeback isn't still set.

It always returns either -EIO or -EAGAIN, so the caller will try again
and then end up waiting on PageWriteback?

>=20
> > If it needs external locking for writing out data it should not
> > implement ->writepage to start with.  Chris, can you explain what's
> > going on with the btree code? It's pretty funny both in the
> > btree_writepage which goes directly into extent_write_full_page
> > if PF_MEMALLOC is not set, but otherwise does much more complicated
> > work, and also in btree_writepages which skips various WB_SYNC_NONE,
> > including the very weird check for for_kupdate.
> >=20
> > What's the story behing all this and the corruption that Andrea found=
?
> >=20
> > > > Btw, what codepath does THP call migrate_pages from?  If you don'=
t
> > > > use an explicit thread writeout will be a no-op on btrfs and XFS,=
 too.
> > >=20
> > > THP never calls migrate_pages, it's memory compaction that calls it
> > > from inside alloc_pages(order=3D9). It got noticed only with THP be=
cause
> > > it makes more frequent hugepage allocations than nr_hugepages in
> > > hugetlbfs (and maybe there are more THP users already).
> >=20
> > Well, s/THP/compaction/ and the same problem applies.  The modern
> > filesystem all have stopped from writeback happening either at all
> > or at least for the delalloc case from direct reclaim.  Calling
> > into this code from alloc_pages for filesystem backed pages is thus
> > rather pointless.
>=20
> Compaction practically only happens in the context of the task
> allocating memory (in my tree it is also used by kswapd). Not
> immediate to ask a separate daemon to invoke it. Not sure why this
> should screw delalloc. Compaction isn't freeing any memory at all,
> it's not reclaim. It just defragments and moves stuff around and it
> may have to write dirty pages to do so.

The short version is that when the VM jumps in and starts doing single
page IO, we badly fragment files.  The FS wants writepages for
everything, so that we can do smart delayed allocation and so that we
can write out things in units bigger than 4KB.

We most recently hashed this out in threads with Mel Gorman around
balance_dirty_pages, but it has a very big impact on performance.

-chris

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=3Dmailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2010-11-15 19:03 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-11-08 17:10 Oops while rebalancing, now unmountable Shane Shrybman
2010-11-08 17:55 ` Chris Mason
2010-11-08 20:39   ` Shane Shrybman
2010-11-08 21:04     ` Chris Mason
2010-11-08 21:25       ` Shane Shrybman
2010-11-09 13:42 ` Chris Mason
2010-11-09 18:21   ` Shane Shrybman
2010-11-14 19:55     ` Shane Shrybman
2010-11-14 20:42       ` Andrea Arcangeli
2010-11-14 22:00         ` Christoph Hellwig
2010-11-14 22:12           ` Andrea Arcangeli
2010-11-15 18:23             ` Christoph Hellwig
2010-11-15 18:46               ` Chris Mason
2010-11-15 19:03                 ` Christoph Hellwig
2010-11-16 21:48                 ` Shane Shrybman
2010-11-15 18:46               ` Andrea Arcangeli
2010-11-15 19:03                 ` Chris Mason [this message]
2010-11-15 19:16                   ` Andrea Arcangeli
2010-11-15 19:12                 ` Christoph Hellwig
2010-11-15 19:18                   ` Chris Mason
2010-11-15 19:29                   ` Andrea Arcangeli
2010-11-15 20:54                     ` Christoph Hellwig

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1289847339-sup-4591@think \
    --to=chris.mason@oracle.com \
    --cc=aarcange@redhat.com \
    --cc=hch@infradead.org \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=shrybman@teksavvy.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).