From: Chris Friesen <chris.friesen@windriver.com>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: Austin Schuh <austin@peloton-tech.com>, <pavel@pavlinux.ru>,
	"J. Bruce Fields" <bfields@fieldses.org>,
	<linux-ext4@vger.kernel.org>, <tytso@mit.edu>,
	<adilger.kernel@dilger.ca>,
	rt-users <linux-rt-users@vger.kernel.org>
Subject: Re: RT/ext4/jbd2 circular dependency
Date: Wed, 29 Oct 2014 13:11:22 -0600
Message-ID: <54513BDA.1050804@windriver.com>
In-Reply-To: <alpine.DEB.2.11.1410291854090.5308@nanos>

On 10/29/2014 12:05 PM, Thomas Gleixner wrote:
> On Mon, 27 Oct 2014, Chris Friesen wrote:
>> There are details (stack traces, etc.) in the first message in the thread:
>> http://www.spinics.net/lists/linux-rt-users/msg12261.html
>>
>>
>> Originally we had thought that nfsd might have been implicated somehow, but it
>> seems like it was just a trigger (possibly by increasing the rate of sync
>> I/O).
>>
>> In the interest of full disclosure I should point out that we're using a
>> modified kernel so there is a chance that we have introduced the problem
>> ourselves.  That said, we have not made significant changes to either ext4 or
>> jbd2.  (Just a couple of minor cherry-picked bugfixes.)
>
> I don't think it's an ext4/jbd2 problem.

If we turn off journalling in ext4, we can't reproduce the problem.  Not
conclusive, I'll admit, but interesting.

>> The relevant code paths are:
>>
>> Journal commit.  The important thing here is that we set the PG_writeback on a
>> page, put the jbd2 journal head on BJ_Shadow list, then sleep waiting for page
>> writeback complete.  If the page writeback never completes, then the journal
>> head never comes off the BJ_Shadow list.
>
> And that's what you need to investigate.
>
> The rest of the threads being stuck waiting for the journal writeback
> or inode->sem are just the consequence of it and have nothing to do
> with the root cause of the problem.
>
> ftrace with the block/writeback/jbd2/ext4/sched tracepoints enabled
> should provide a first insight into the issue.
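
As a rough illustration of the tracing setup suggested above, here is a
minimal Python sketch that switches on whole tracepoint groups by writing
"1" to events/<group>/enable under the tracing directory.  The function
name enable_event_groups and the group list are assumptions made for this
example; the tracing directory is passed in by the caller because its
mount point varies between systems.

```python
from pathlib import Path

# Event groups named in the suggestion above. "jbd2" is assumed to be the
# group name used for the journalling layer's tracepoints.
EVENT_GROUPS = ["block", "writeback", "jbd2", "ext4", "sched"]


def enable_event_groups(tracing_root, groups=EVENT_GROUPS):
    """Write "1" to events/<group>/enable for every group that exists.

    Returns the list of groups that were actually enabled, so the caller
    can see which groups are missing on the running kernel.
    """
    root = Path(tracing_root)
    enabled = []
    for group in groups:
        enable_file = root / "events" / group / "enable"
        if enable_file.is_file():
            enable_file.write_text("1")
            enabled.append(group)
    return enabled
```

On a system with debugfs mounted in the usual place, this would typically
be called as root with "/sys/kernel/debug/tracing" as the argument.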

It seems plausible that page writeback never completes because it is
blocked trying to take inode->i_data_sem for reading, as seen in the
following stack trace (from a hung system):

[<ffffffff8109cd0c>] rt_down_read+0x2c/0x40
[<ffffffff8120ac91>] ext4_map_blocks+0x41/0x270
[<ffffffff8120f0dc>] mpage_da_map_and_submit+0xac/0x4c0
[<ffffffff8120f9c9>] write_cache_pages_da+0x3f9/0x420
[<ffffffff8120fd30>] ext4_da_writepages+0x340/0x720
[<ffffffff8111a5f4>] do_writepages+0x24/0x40
[<ffffffff81191b71>] writeback_single_inode+0x181/0x4b0
[<ffffffff811922a2>] writeback_sb_inodes+0x1b2/0x290
[<ffffffff8119241e>] __writeback_inodes_wb+0x9e/0xd0
[<ffffffff811928e3>] wb_writeback+0x223/0x3f0
[<ffffffff81192b4f>] wb_check_old_data_flush+0x9f/0xb0
[<ffffffff8119403f>] wb_do_writeback+0x12f/0x250
[<ffffffff811941f4>] bdi_writeback_thread+0x94/0x320
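
The suspected dependency can be written down as a small wait-for graph
and checked for a cycle.  The sketch below is only a model: the first two
edges come from the journal commit description and the stack trace above,
while the third edge (whoever holds i_data_sem is itself waiting for the
buffer to leave BJ_Shadow) is an assumption that is not shown in this
message.  The names WAIT_FOR and find_cycle are made up for illustration.

```python
# Wait-for edges: "X": "Y" means X cannot make progress until Y does.
WAIT_FOR = {
    # Commit sleeps until page writeback clears PG_writeback.
    "jbd2 commit": "page writeback",
    # Writeback blocks in rt_down_read() called from ext4_map_blocks().
    "page writeback": "i_data_sem holder",
    # ASSUMPTION: the holder waits for the buffer to come off BJ_Shadow.
    "i_data_sem holder": "jbd2 commit",
}


def find_cycle(wait_for, start):
    """Follow wait-for edges from start; return the cycle as a list, or None."""
    seen = []
    node = start
    while node in wait_for:
        if node in seen:
            # Revisited a node: everything from its first visit is the cycle.
            return seen[seen.index(node):]
        seen.append(node)
        node = wait_for[node]
    return None
```

If the assumed third edge is removed from the table, find_cycle returns
None, which matches the point that the hang only needs explaining once
something holding the semaphore also depends on the commit.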

I have ftrace logs for two of the three components that we think are
involved.  I don't have ftrace logs for the above writeback case.  My
instrumentation was set up to end tracing when a task blocked for 5
seconds trying to get inode->i_data_sem, and it happened to be an nfsd
task instead of the page writeback code.  I could conceivably modify the
instrumentation to trigger only when page writeback blocks.
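
The narrower trigger could look something like the predicate below.  This
is a hypothetical Python sketch of the decision only, not the actual
instrumentation: it assumes the blocked task's stack is available as a
list of function names, and it uses bdi_writeback_thread from the trace
above to recognise the writeback path.  The name should_stop_tracing and
its parameters are invented for this example.

```python
# Stop tracing once a task has been blocked on the semaphore this long.
BLOCK_THRESHOLD_SECONDS = 5.0


def should_stop_tracing(blocked_seconds, stack_functions,
                        required_function="bdi_writeback_thread"):
    """Return True only for long waits that come from the writeback path.

    blocked_seconds: how long the task has waited for inode->i_data_sem.
    stack_functions: function names from the blocked task's stack trace.
    required_function: frame that must be present for the trigger to fire.
    """
    if blocked_seconds < BLOCK_THRESHOLD_SECONDS:
        return False
    return required_function in stack_functions
```

With this filter, an nfsd task blocked for more than 5 seconds would no
longer end the trace, because its stack would not contain the writeback
thread's entry point.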


For what it's worth, I'm currently testing a backport of commit b34090e 
from mainline (which in turn required backporting commits e5a120a and 
f5113ef).  It switches from using the BJ_Shadow list to using the 
BH_Shadow flag on the buffer head.  More interestingly, waiters now get 
woken up from journal_end_buffer_io_sync() instead of from 
jbd2_journal_commit_transaction().

So far this seems to be helping a lot.  The system has lasted about 15x
as long under stress as it did without the patches.
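
The wakeup change in the backport described above can be pictured with a
toy model in which the I/O completion path clears a per-buffer shadow
flag and wakes waiters itself, rather than leaving that to the commit
thread.  This is a user-space Python sketch under that reading of the
patches, not kernel code; the class BufferHead and its methods are
hypothetical stand-ins.

```python
import threading


class BufferHead:
    """Toy stand-in for a buffer head carrying a BH_Shadow-style flag."""

    def __init__(self):
        self.shadow = True
        self._cond = threading.Condition()

    def wait_on_shadow(self, timeout=None):
        """Block until the shadow flag is cleared.

        Returns True if the flag was cleared, False if the wait timed out.
        """
        with self._cond:
            return self._cond.wait_for(lambda: not self.shadow, timeout=timeout)

    def end_io(self):
        """I/O completion path: clear the flag and wake waiters directly."""
        with self._cond:
            self.shadow = False
            self._cond.notify_all()
```

In this model a waiter depends only on end_io() being called for its own
buffer, so it does not have to wait for a separate thread to finish
processing every other buffer first.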

Chris
