From: Andrew Morton <akpm@linux-foundation.org>
To: Miklos Szeredi <miklos@szeredi.hu>
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: [patch 03/22] fix deadlock in balance_dirty_pages
Date: Thu, 1 Mar 2007 00:27:04 -0800
Message-ID: <20070301002704.8fbbcda2.akpm@linux-foundation.org>
In-Reply-To: <E1HMfp2-0001fc-00@dorka.pomaz.szeredi.hu>
On Thu, 01 Mar 2007 08:35:28 +0100 Miklos Szeredi <miklos@szeredi.hu> wrote:
> > > This deadlock happens when dirty pages from one filesystem are
> > > written back through another filesystem. It is easiest to demonstrate
> > > with fuse, although it could affect loopback mounts as well (see
> > > following patches).
> > >
> > > Let's call the filesystems A(bove) and B(elow). Process Pr_a is
> > > writing to A, and process Pr_b is writing to B.
> > >
> > > Pr_a is bash-shared-mapping. Pr_b is the fuse filesystem daemon
> > > (fusexmp_fh); for simplicity let's assume that Pr_b is
> > > single-threaded.
> > >
> > > These are the simplified stack traces of these processes after the
> > > deadlock:
> > >
> > > Pr_a (bash-shared-mapping):
> > >
> > > (block on queue)
> > > fuse_writepage
> > > generic_writepages
> > > writeback_inodes
> > > balance_dirty_pages
> > > balance_dirty_pages_ratelimited_nr
> > > set_page_dirty_mapping_balance
> > > do_no_page
> > >
> > >
> > > Pr_b (fusexmp_fh):
> > >
> > > io_schedule_timeout
> > > congestion_wait
> > > balance_dirty_pages
> > > balance_dirty_pages_ratelimited_nr
> > > generic_file_buffered_write
> > > generic_file_aio_write
> > > ext3_file_write
> > > do_sync_write
> > > vfs_write
> > > sys_pwrite64
> > >
> > >
> > > Thanks to the aggressive nature of Pr_a, it can happen that
> > >
> > > nr_file_dirty > dirty_thresh + margin
> > >
> > > This is due to both nr_dirty growing and dirty_thresh shrinking, which
> > > in turn is due to nr_file_mapped rapidly growing. The exact size of
> > > the margin at which the deadlock happens is not known, but it's around
> > > 100 pages.
> > >
> > > At this point Pr_a enters balance_dirty_pages and starts to write back
> > > some of its dirty pages. After submitting some requests, it blocks
> > > on the request queue.
> > >
> > > The first write request will trigger Pr_b to perform a write()
> > > syscall. This will submit a write request to the block device and
> > > then may enter balance_dirty_pages().
> > >
> > > The condition for exiting balance_dirty_pages() is
> > >
> > > - either that write_chunk pages have been written
> > >
> > > - or nr_file_dirty + nr_writeback < dirty_thresh
> > >
> > > It is entirely possible that fewer than write_chunk pages were written,
> > > in which case balance_dirty_pages() will not exit even after all the
> > > submitted requests have been successfully completed.
> > >
> > > Which means that the write() syscall does not return.
> >
> > But the balance_dirty_pages() loop does more than just wait for those two
> > conditions. It will also submit _more_ dirty pages for writeout. ie: it
> > should be feeding more of file A's pages into writepage.
> >
> > Why isn't that happening?
>
> All of A's data is actually written by B. So just submitting more
> pages to some queue doesn't help, it will just make the queue longer.
>
> If the queue length were not limited, and B had limitless
> threads, and the write() didn't exclude other writes to the same
> file (i_mutex), then there would be no deadlock.
>
> But for fuse the first and the last conditions aren't met.
>
> For the loop device the second condition isn't met: loop is
> single-threaded.
Sigh. What's this about i_mutex? That appears to be some critical
information which _still_ isn't being communicated.
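For illustration only, the two exit conditions quoted above can be put into
a toy userspace model. This is a sketch in Python, not kernel code. The
function name pr_b_can_exit, its parameters, and all the numbers
(including write_chunk = 48) are assumptions made up for the example. It
also assumes that B's own I/O completes instantly and that A's pending pages
cannot make progress until Pr_b returns to userspace.

```python
def pr_b_can_exit(a_pending, b_dirty, dirty_thresh, write_chunk, max_rounds=5):
    """Toy model of Pr_b inside the throttling loop (hypothetical helper).

    a_pending:    A's dirty + writeback pages; in this model they only make
                  progress once Pr_b returns to userspace, so they never shrink.
    b_dirty:      B's dirty pages, which Pr_b can write back itself.
    Returns True if one of the two exit conditions is met within max_rounds,
    False if the loop is still spinning after that.
    """
    pages_written = 0
    for _ in range(max_rounds):
        # Write back as many of B's pages as are available, up to write_chunk.
        n = min(b_dirty, write_chunk - pages_written)
        b_dirty -= n
        pages_written += n  # assumption: B's I/O completes immediately

        # Exit condition 1: write_chunk pages have been written.
        if pages_written >= write_chunk:
            return True
        # Exit condition 2: dirty + writeback pages are below the threshold.
        if a_pending + b_dirty < dirty_thresh:
            return True
        # The kernel would wait for congestion to clear here, then retry.
    return False


if __name__ == "__main__":
    # B has too few dirty pages to reach write_chunk, A keeps the total high.
    print(pr_b_can_exit(a_pending=1090, b_dirty=10, dirty_thresh=1000, write_chunk=48))   # False
    # B has enough dirty pages of its own: exits via condition 1.
    print(pr_b_can_exit(a_pending=1090, b_dirty=100, dirty_thresh=1000, write_chunk=48))  # True
    # Total is under the threshold: exits via condition 2.
    print(pr_b_can_exit(a_pending=900, b_dirty=10, dirty_thresh=1000, write_chunk=48))    # True
```

In the first call neither condition can ever become true under the model's
assumptions, which is the situation the quoted description ends with: the
write() syscall does not return.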