From: Chris Friesen <chris.friesen@windriver.com>
To: Austin Schuh <austin@peloton-tech.com>, <pavel@pavlinux.ru>
Cc: "J. Bruce Fields" <bfields@fieldses.org>,
<linux-ext4@vger.kernel.org>, <tytso@mit.edu>,
<adilger.kernel@dilger.ca>,
rt-users <linux-rt-users@vger.kernel.org>
Subject: RT/ext4/jbd2 circular dependency (was: Re: Hang writing to nfs-mounted filesystem from client)
Date: Thu, 23 Oct 2014 11:54:55 -0600
Message-ID: <544940EF.7090907@windriver.com>
In-Reply-To: <CANGgnMbQmsdMDJUx7Bop9Xs=jQMmAJgWRjhXVFUGx-DwF=inYw@mail.gmail.com>
On 10/17/2014 12:55 PM, Austin Schuh wrote:
> Use the 121 patch. This sounds very similar to the issue that I helped
> debug with XFS. There ended up being a deadlock due to a bug in the
> kernel work queues. You can search the RT archives for more info.
I can confirm that the problem still shows up with the rt121 patch. (And
also with Paul Gortmaker's proposed 3.4.103-rt127 patch.)
We added some instrumentation and it looks like we've tracked down the problem.
Figuring out how to fix it is proving to be tricky.
Basically it looks like we have a circular dependency involving the
inode->i_data_sem rt_mutex, the PG_writeback bit, and the BJ_Shadow list. It
goes something like this:
jbd2_journal_commit_transaction:
1) set page for writeback (set PG_writeback bit)
2) put jbd2 journal head on BJ_Shadow list
3) sleep on PG_writeback bit waiting for page writeback complete
ext4_da_writepages:
1) ext4_map_blocks() acquires inode->i_data_sem for writing
2) do_get_write_access() sleeps waiting for jbd2 journal head to come off
the BJ_Shadow list
At this point the flush code can't run because it can't acquire
inode->i_data_sem for reading, so the page will never get written out.
Deadlock.
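To make the cycle explicit, the dependency above can be written as a wait-for graph and checked for a loop. This is only an illustrative sketch, not kernel code: the task labels (jbd2_commit, writeback, ext4_writer) are made-up names for the three parties described above, and each edge means "cannot make progress until the other side does".

```python
# Hypothetical wait-for graph for the scenario described above.
# The node names are labels invented for this sketch, not kernel symbols.
WAITS_FOR = {
    # commit thread sleeps on PG_writeback, which the flush path must clear
    "jbd2_commit": ["writeback"],
    # flush path needs inode->i_data_sem for reading, held by the writer
    "writeback": ["ext4_writer"],
    # writer holds i_data_sem and waits for the jh to leave BJ_Shadow,
    # which only happens once the commit thread moves on
    "ext4_writer": ["jbd2_commit"],
}


def find_cycle(graph):
    """Return one cycle as a list of nodes (first == last), or None."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the DFS stack, finished
    state = {}
    stack = []

    def visit(node):
        state[node] = GREY
        stack.append(node)
        for nxt in graph.get(node, []):
            seen = state.get(nxt, WHITE)
            if seen == GREY:
                # Back edge: everything from nxt to the top of the stack loops.
                return stack[stack.index(nxt):] + [nxt]
            if seen == WHITE:
                found = visit(nxt)
                if found:
                    return found
        stack.pop()
        state[node] = BLACK
        return None

    for start in graph:
        if state.get(start, WHITE) == WHITE:
            cycle = visit(start)
            if cycle:
                return cycle
    return None


cycle = find_cycle(WAITS_FOR)
print(" -> ".join(cycle))
```

Removing any one edge (for example, if the writer did not hold i_data_sem while waiting on BJ_Shadow) makes find_cycle() return None in this model.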
The following is a more detailed timeline with information from added trace
events:
nfsd-2012 [003] ....1.. 8612.903541: ext4_map_blocks_down_write: dev 147,3 ino 2097161
nfsd-2012 [003] ....1.. 8612.903546: ext4_map_blocks_up_write: dev 147,3 ino 2097161
nfsd-2012 [003] ....1.. 8612.903559: ext4_map_blocks_down_write: dev 147,3 ino 2097161
nfsd-2012 [003] ....1.. 8612.903565: ext4_map_blocks_up_write: dev 147,3 ino 2097161
nfsd-2012 [003] ....1.. 8612.903611: ext4_map_blocks_down_write: dev 147,3 ino 2097161
nfsd-2012 [003] ....1.. 8612.903616: ext4_map_blocks_up_write: dev 147,3 ino 2097161
<...>-5960 [004] ....1.. 8612.903628: jbd2_submit_inode_data: dev 147,3 ino 2097160
<...>-5960 [004] ....111 8612.903651: jbd2_list_add_bjshadow: adding jh ffff880415350000 to transaction ffff880415391180 BJ_Shadow list
<...>-5960 [004] ....111 8612.903653: jbd2_list_add_bjshadow: adding jh ffff8803eb08dbd0 to transaction ffff880415391180 BJ_Shadow list
<...>-5960 [004] ....111 8612.903655: jbd2_list_add_bjshadow: adding jh ffff8803eb08d150 to transaction ffff880415391180 BJ_Shadow list
<...>-5960 [004] ....111 8612.903656: jbd2_list_add_bjshadow: adding jh ffff8803eb08d0e0 to transaction ffff880415391180 BJ_Shadow list
<...>-5960 [004] ....111 8612.903657: jbd2_list_add_bjshadow: adding jh ffff88031c9449a0 to transaction ffff880415391180 BJ_Shadow list
nfsd-2012 [003] ....1.. 8612.903658: ext4_map_blocks_down_write: dev 147,3 ino 2097161
<...>-5960 [004] ....111 8612.903658: jbd2_list_add_bjshadow: adding jh ffff88031c944310 to transaction ffff880415391180 BJ_Shadow list
nfsd-2012 [003] ....1.. 8612.903665: ext4_map_blocks_up_write: dev 147,3 ino 2097161
<...>-5960 [004] ....1.. 8612.903696: jbd2_finish_inode_data: dev 147,3 ino 2097161
nfsd-2012 [003] ....1.. 8612.903706: ext4_map_blocks_down_write: dev 147,3 ino 2097161
nfsd-2012 [003] ....1.. 8612.903714: ext4_map_blocks_up_write: dev 147,3 ino 2097161
nfsd-2012 [003] ....1.. 8612.903802: ext4_map_blocks_down_write: dev 147,3 ino 2097161
nfsd-2012 [003] ....1.. 8612.903814: ext4_map_blocks_up_write: dev 147,3 ino 2097161
nfsd-2012 [003] ....1.. 8612.903960: ext4_map_blocks_down_write: dev 147,3 ino 2097161
nfsd-2012 [003] ....1.. 8612.903983: ext4_map_blocks_up_write: dev 147,3 ino 2097161
nfsd-2012 [003] ....1.. 8612.904311: ext4_map_blocks_down_write: dev 147,3 ino 2097161
nfsd-2012 [003] ....1.. 8612.904318: ext4_map_blocks_up_write: dev 147,3 ino 2097161
nfsd-2012 [003] ....1.. 8612.904331: ext4_map_blocks_down_write: dev 147,3 ino 2097161
nfsd-2012 [003] ....1.. 8612.904337: ext4_map_blocks_up_write: dev 147,3 ino 2097161
nfsd-2012 [003] ....1.. 8612.904399: ext4_map_blocks_down_write: dev 147,3 ino 2097161
nfsd-2012 [003] ....1.. 8612.904408: ext4_map_blocks_up_write: dev 147,3 ino 2097161
nfsd-2012 [003] ....1.. 8612.904678: ext4_map_blocks_down_write: dev 147,3 ino 2097161
nfsd-2012 [003] ....1.. 8612.904772: ext4_map_blocks_up_write: dev 147,3 ino 2097161
<...>-2015 [007] ....1.. 8612.934515: ext4_map_blocks_down_write: dev 147,3 ino 2097161
<...>-2015 [007] ....1.. 8612.934525: jbd2_list_sleep_bjshadow: waiting for jh ffff8803eb08dbd0 from transaction ffff880415391180 to be removed from BJ_Shadow list
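For anyone who wants to scan a longer trace for the same pattern, something like the following could pair up the down_write/up_write events and report which pid took the write lock without releasing it. It is a rough sketch that assumes the line layout shown above and the event names from our added trace points; the regular expression and the helper name are mine and have only been thought through against these sample lines.

```python
import re

# Layout assumed from the excerpt above:
#   task-pid [cpu] flags timestamp: event: details
LINE_RE = re.compile(
    r"^\s*(?P<task>\S+)-(?P<pid>\d+)\s+\[(?P<cpu>\d+)\]\s+\S+\s+"
    r"(?P<ts>\d+\.\d+):\s+(?P<event>\w+):\s*(?P<rest>.*)$"
)


def unreleased_write_locks(lines):
    """Return {pid: timestamp} for down_write events with no later up_write."""
    held = {}
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip anything that is not a trace line
        pid, event = int(m["pid"]), m["event"]
        if event == "ext4_map_blocks_down_write":
            held[pid] = m["ts"]
        elif event == "ext4_map_blocks_up_write":
            held.pop(pid, None)
    return held


# Last four lines of the trace above.
SAMPLE = [
    "nfsd-2012 [003] ....1.. 8612.904678: ext4_map_blocks_down_write: dev 147,3 ino 2097161",
    "nfsd-2012 [003] ....1.. 8612.904772: ext4_map_blocks_up_write: dev 147,3 ino 2097161",
    "<...>-2015 [007] ....1.. 8612.934515: ext4_map_blocks_down_write: dev 147,3 ino 2097161",
    "<...>-2015 [007] ....1.. 8612.934525: jbd2_list_sleep_bjshadow: waiting for jh "
    "ffff8803eb08dbd0 from transaction ffff880415391180 to be removed from BJ_Shadow list",
]

still_held = unreleased_write_locks(SAMPLE)
print(still_held)
```

On these lines only pid 2015 is left holding the lock, which matches the timeline below.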
Timeline:
pid 5960 is [jbd2/drbd3-8]
pid 2015 is [nfsd]
pid 2012 is [nfsd]
pid 5960:
8612.903628: jbd2_submit_inode_data for inode 2097160.
This is right before calling journal_submit_inode_data_buffers(), which
ends up calling set_page_writeback().
8612.903653: Add jh ffff8803eb08dbd0 to BJ_Shadow list.
This is in jbd2_journal_write_metadata_buffer() right before calling
__jbd2_journal_file_buffer().
8612.903696: jbd2_finish_inode_data for inode 2097161.
This is in journal_finish_inode_data_buffers(), right before calling
filemap_fdatawait(), which ends up calling wait_on_page_bit(page,
PG_writeback).
<we see no more logs for pid 5960 after this>
pid 2015:
8612.934515: takes write lock on inode->i_data_sem for inode 2097161
8612.934525: goes to sleep waiting for jh ffff8803eb08dbd0 to be removed from
BJ_Shadow list
<we see no more logs for pid 2015 after this>
pid 2012:
8617.963896: hits 5-sec retry limit and stops the trace. This means it blocked
trying to get a read lock on inode->i_data_sem for inode 2097161
at time 8612.963.
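The timestamps can be cross-checked with a few subtractions. This is just the arithmetic from the timeline above, written out with Decimal so that the microsecond digits are not disturbed by float rounding; the variable names are mine.

```python
from decimal import Decimal

shadow_add = Decimal("8612.903653")   # jh ffff8803eb08dbd0 added to BJ_Shadow (pid 5960)
shadow_wait = Decimal("8612.934525")  # pid 2015 starts waiting on that jh
trace_stop = Decimal("8617.963896")   # pid 2012 hits the retry limit
retry_limit = Decimal("5")            # seconds, per the timeline above

# When pid 2012 must have started blocking on inode->i_data_sem.
blocked_at = trace_stop - retry_limit

print(blocked_at)                # 8612.963896
print(shadow_wait - shadow_add)  # 0.030872
print(blocked_at - shadow_wait)  # 0.029371
```

So pid 2012 blocked roughly 29 ms after pid 2015 went to sleep holding the write lock, which is consistent with the ordering described above.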
Chris