From: Chris Mason <chris.mason@oracle.com>
To: Miklos Szeredi <miklos@szeredi.hu>
Cc: akpm@linux-foundation.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: dirty balancing deadlock
Date: Sun, 18 Feb 2007 20:01:02 -0500
Message-ID: <20070219010102.GC9289@think.oraclecorp.com>
In-Reply-To: <E1HIwnX-0005Sr-00@dorka.pomaz.szeredi.hu>

On Mon, Feb 19, 2007 at 01:54:31AM +0100, Miklos Szeredi wrote:
> > > > > > If so, writes to B will decrease the dirty memory threshold.
> > > > >
> > > > > > Yes, but not by enough. Say A dirties 1100 pages, limit is 1000.
> > > > > Some pages queued for writeback (doesn't matter how much). B writes
> > > > > back 1, 1099 dirty remain in A, zero in B. balance_dirty_pages() for
> > > > > B doesn't know that there's nothing more to write back for B, it's
> > > > > just waiting there for those 1099, which'll never get written.
> > > >
> > > > hm, OK, arguable. I guess something like this..
> > >
> > > Doesn't help the fuse case, but does seem to help the loopback mount
> > > one.
> > >
> > > For fuse it's worse with the patch: now the write triggered by the
> > > balance recurses into fuse, with disastrous results, since the fuse
> > > writeback is now blocked on the userspace queue.
> > >
> > > fusexmp_fh_no D 40136678 0 505 494 506 504 (NOTLB)
> > > 08982b78 00000001 00000000 08f9f9b4 0805d8cb 089a75f8 08982b78 08f98000
> > > 08f98000 08f9f9dc 0805a38a 089a7100 08982680 08f9f9cc 08f98000 08f98000
> > > 085d8300 08982680 089a7100 08f9fa34 08183006 089a7100 08982680 089a7100
> > > Call Trace:
> > > 08f9f9a0: [<0805d8cb>] switch_to_skas+0x3b/0x83
> > > 08f9f9b8: [<0805a38a>] _switch_to+0x49/0x99
> > > 08f9f9e0: [<08183006>] schedule+0x246/0x547
> > > 08f9fa38: [<08103c7e>] fuse_get_req_wp+0xe9/0x14a
> > > 08f9fa70: [<08103d2e>] fuse_writepage+0x4f/0x12c
> >
> > In general, writepage is supposed to do work without blocking on
> > expensive locks that will get pdflush and dirty reclaim stuck in this
> > fashion. You'll probably have to take the same approach reiserfs does
> > in data=journal mode, which is leaving the page dirty if fuse_get_req_wp
> > is going to block without making progress.
>
> Pdflush, and dirty reclaim set wbc->nonblocking to true.
> balance_dirty_pages and fsync don't. The problem here is that
> Andrew's patch is wrong to let balance_dirty_pages() try to write back
> pages from a different queue.

Async or sync, writepage is supposed to either make progress or bail.
Loopback aside, if the fuse call is blocking long term, you're going to
run into problems.

-chris
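
The "make progress or bail" rule above can be illustrated with a toy
userspace model. This is only a sketch, not kernel code: Page, RequestPool,
writepage() and writeback() are hypothetical names invented for the
example, and RequestPool merely stands in for whatever queue
fuse_get_req_wp would otherwise sleep on. The point it shows is that a
writepage which cannot get a request leaves the page dirty and returns,
so the caller never sleeps waiting on the userspace side.

```python
from dataclasses import dataclass


@dataclass
class Page:
    index: int
    dirty: bool = True


@dataclass
class RequestPool:
    """Hypothetical stand-in for the request queue a writepage might wait on."""
    free: int

    def try_get(self) -> bool:
        # Non-blocking acquire: succeed only if a request slot is free now.
        if self.free > 0:
            self.free -= 1
            return True
        return False


def writepage(page: Page, pool: RequestPool) -> bool:
    """Return True if the page was submitted, False if we bailed."""
    if not pool.try_get():
        # Getting a request would block without making progress:
        # leave the page dirty and return instead of sleeping.
        page.dirty = True
        return False
    # A request was available, so the write can be issued.
    page.dirty = False
    return True


def writeback(pages: list, pool: RequestPool) -> int:
    """Try every dirty page once; return how many were submitted."""
    written = 0
    for page in pages:
        if page.dirty and writepage(page, pool):
            written += 1
    return written
```

With two free requests and five dirty pages, writeback() submits two pages
and returns with the other three still dirty, ready to be retried later,
rather than stalling on the third.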