linux-btrfs.vger.kernel.org archive mirror
From: Chris Mason <chris.mason@oracle.com>
To: Liu Bo <liubo2009@cn.fujitsu.com>
Cc: Josef Bacik <josef@redhat.com>, linux-btrfs@vger.kernel.org
Subject: Re: [PATCH] Btrfs: complete page writeback before doing ordered extents
Date: Tue, 24 Apr 2012 10:15:29 -0400
Message-ID: <20120424141529.GI22794@shiny>
In-Reply-To: <4F9606EF.2080005@cn.fujitsu.com>

On Tue, Apr 24, 2012 at 09:50:39AM +0800, Liu Bo wrote:
> On 04/24/2012 01:33 AM, Josef Bacik wrote:
> 
> > We can deadlock waiting for pages to end writeback because we are doing an
> > allocation while holding a tree lock since the ordered extent stuff will
> > require tree locks.  A quick easy way to fix this is to end page writeback
> > before we do our ordered io stuff, which works fine since we don't really
> > need the page for this to work.  Eventually we want to make this work happen
> > as soon as the io is completed and then push the ordered extent stuff off to
> > a worker thread, but at this stage we need this deadlock fixed while changing
> > as little as possible.  Thanks,
> > 
> 
> 
> Hi Josef,
> 
> I'm ok with the patch, but could you show us more details about the deadlock between allocation and endio?

Josef and I have been talking about this one off-list for a while.  It's
a deadlock I tracked down in my overnight stress runs.

Basically what we have is the io-less dirty throttling code saying there
are too many pages in writeback, and so new allocations are backing up
and waiting for pages to leave writeback.

But the pages can't leave writeback because we're waiting on more memory
to complete the metadata changes at endio time.  Strictly speaking, the
VM is doing something wrong here: our NOFS allocations shouldn't be
waiting for writeback to finish.

But strictly speaking, we're doing something wrong too: we're doing too
many allocations with pages tied up in writeback.

So this patch splits ending page writeback from the metadata changes.
We're still doing the metadata changes after the IO is complete, but
we're doing them after we've let the pages go.

-chris

Thread overview: 4+ messages
2012-04-23 17:33 [PATCH] Btrfs: complete page writeback before doing ordered extents Josef Bacik
2012-04-24  1:50 ` Liu Bo
2012-04-24 14:15   ` Chris Mason [this message]
2012-04-25  7:52     ` Liu Bo
