public inbox for linux-ext4@vger.kernel.org
From: Chris Friesen <chris.friesen@windriver.com>
To: "Theodore Y. Ts'o" <tytso@mit.edu>
Cc: Jan Kara <jack@suse.cz>, linux-ext4@vger.kernel.org
Subject: Re: looking for assistance with jbd2 (and other processes) hung trying to write to disk
Date: Tue, 10 Nov 2020 14:14:26 -0600
Message-ID: <4145280c-3f5e-58aa-62e8-f2a13a9f979e@windriver.com>
In-Reply-To: <20201110194623.GC2951190@mit.edu>

On 11/10/2020 1:46 PM, Theodore Y. Ts'o wrote:
> On Tue, Nov 10, 2020 at 09:57:39AM -0600, Chris Friesen wrote:

>> Just to be sure, I'm looking for whoever has the BH_Lock bit set on the
>> buffer_head "b_state" field, right?  I don't see any ownership field the way
>> we have for mutexes.  Is there some way to find out who would have locked
>> the buffer?
> 
> It's quite possible that the buffer was locked as part of doing I/O,
> and we are just waiting for the I/O to complete.  An example of this
> is in journal_submit_commit_record(), where we lock the buffer using
> lock_buffer(), and then call submit_bh() to submit the buffer for I/O.
> When the I/O is completed, the buffer head will be unlocked, and we
> can check the buffer_uptodate flag to see if the I/O completed
> successfully.  (See journal_wait_on_commit_record() for an example of
> this.)

Running "ps -m 'jbd2'" in the crashdump shows jbd2/nvme2n1p4- in the 
uninterruptible state, with a "last run" timestamp of over 9 minutes 
before the crash.  Same for a number of jbd2/dm* tasks.  This seems like 
a very long time to wait for I/O to complete, which is why I'm assuming 
something's gone off the rails.

> So the first thing I'd suggest doing is looking at the console output
> or dmesg output from the crashdump to see if there are any clues in
> terms of kernel messages from the device driver before things locked
> up.  This could be as simple as the device falling off the bus, in
> which case there might be some kernel error messages from the block
> layer or device driver that would give some insight.

The timeline looks like this (CPUs 0, 1, 24, and 25 are the housekeeping
CPUs):

The only device-related issue I see is this, just a bit over 9 minutes 
before the eventual panic.  There are no dmesg logs in the crashdump 
for the couple of hours before it.
[119982.636995] WARNING: CPU: 1 PID: 21 at net/sched/sch_generic.c:360 dev_watchdog+0x268/0x280
[119982.636997] NETDEV WATCHDOG: mh0 (iavf): transmit queue 3 timed out

Then I see rcu_sched self-detecting stalls:
[120024.146369] INFO: rcu_sched self-detected stall on CPU { 25} (t=60000 jiffies g=10078853 c=10078852 q=250)
[120203.725976] INFO: rcu_sched self-detected stall on CPU { 25} (t=240003 jiffies g=10078853 c=10078852 q=361)
[120383.305584] INFO: rcu_sched self-detected stall on CPU { 25} (t=420006 jiffies g=10078853 c=10078852 q=401)

The actual panic is here:
[120536.886219] Kernel panic - not syncing: Software Watchdog Timer expired
[120536.886221] CPU: 1 PID: 21 Comm: ktimersoftd/1 Kdump: loaded Tainted: G        W  O   ------------ T 3.10.0-1127.rt56.1093.el7.tis.2.x86_64 #1


Chris

Thread overview: 7+ messages
2020-11-09 21:11 looking for assistance with jbd2 (and other processes) hung trying to write to disk Chris Friesen
2020-11-10 11:42 ` Jan Kara
2020-11-10 15:57   ` Chris Friesen
2020-11-10 19:46     ` Theodore Y. Ts'o
2020-11-10 20:14       ` Chris Friesen [this message]
2020-11-11 15:57     ` Jan Kara
2020-11-11 16:24       ` Chris Friesen
