From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: "Bruno Prémont" <bonbons@sysophe.eu>
Cc: Josh Triplett <josh@joshtriplett.org>,
Steven Rostedt <rostedt@goodmis.org>,
Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
Lai Jiangshan <jiangshanlai@gmail.com>,
linux-kernel@vger.kernel.org
Subject: Re: RCU stall/SOFT-Lockup on 4.11.3/4.13.11 after multiple days uptime
Date: Sun, 12 Nov 2017 09:17:51 -0800
Message-ID: <20171112171751.GA3624@linux.vnet.ibm.com>
In-Reply-To: <20171112120915.3072b927@neptune.home>

On Sun, Nov 12, 2017 at 12:09:28PM +0100, Bruno Prémont wrote:
> On Sat, 11 November 2017 "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> > On Sat, Nov 11, 2017 at 08:38:32PM +0100, Bruno Prémont wrote:
> > > Hi,
> > >
> > > On a single-CPU KVM-based virtual machine I'm suffering from RCU stall
> > > and soft-lockup. 4.10.x kernels run fine (4.10.12) but starting with
> > > 4.11.x (4.11.3, 4.13.11) I'm getting system freezes for no apparent
> > > reason.
> > >
> > > All the info I have is the following console dump (from 4.13.11):
> > > [526415.290012] INFO: rcu_sched self-detected stall on CPU
> > > [526415.290012] o0-...: (745847 ticks this GP) idle=ba2/2/0 softirq=37393463/37393463 fqs=0
> > > [526415.290012] o (t=745854 jiffies g=23779976 c=23779975 q=32)
> > > [526415.290012] rcu_sched kthread starved for 745854 jiffies! g23779976 c23779975 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x0
> >
> > The above line says that the rcu_sched kthread asked to sleep for three
> > jiffies, but ended up sleeping for more than 745,854 jiffies.
> >
> > If your system does not let RCU's kernel threads run, RCU cannot
> > help you much.
> >
> > The ->state of 0x0 indicates that the kthread is in fact runnable, but
> > did not get a chance to run. Was the system heavily loaded to the
> > point where you would expect a kthread to remain preempted for many
> > minutes?
> >
> > I am guessing that the answer is no, given that CPU 0 is actually idle
> > (idle=ba2/2/0). Seems unlikely, but I have to ask: Did you bind the
> > kthread to a specific CPU?
>
> The system should be lightly loaded (about 5-10% CPU usage on average), so
> plenty of time for RCU to do its work.
>
> I didn't bind processes (be it userspace process or kthread) to a specific
> CPU, thus it's all auto-configured.
>
> I guess the question then is what the system is busy with or waiting for
> that prevents RCU from getting its work done...
> Shouldn't the watchdog print a trace of where CPU#0 is stuck? If so I might need
> to check at which log level and make sure that loglevel reaches console.
> Nothing did hit the disk though.

Do you have a high-speed interface to capture and store console output?
(As in something that can handle, say, 50MB in a reasonable period of
time.)

Thanx, Paul

> Thanks,
> Bruno
>
> > Thanx, Paul
> >
> > > [526440.020015] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:0]
> > > [526468.020005] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:0]
> > > [526478.320009] INFO: rcu_sched self-detected stall on CPU
> > > [526478.320009] o0-...: (752143 ticks this GP) idle=ba2/2/0 softirq=37393463/37393463 fqs=0
> > > [526478.320009] o (t=752157 jiffies g=23779976 c=23779975 q=32)
> > > [526478.320009] rcu_sched kthread starved for 752157 jiffies! g23779976 c23779975 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x0
> > > [526504.020016] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:0]
> > > [526532.020007] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:0]
> > > ...
> > >
> > > Attached is kernel config (4.13.11).
> > >
> > >
> > > The output obtained with 4.11.3 was:
> > > [ 280.680010] INFO: rcu_sched self-detected stall on CPU
> > > [ 280.680021] o0-...: (27312 ticks this GP) idle=b11/2/0 softirq=6119/6119 fqs=0
> > > [ 280.680021] o (t=27312 jiffies g=441 c=440 q=0)
> > > [ 280.680021] rcu_sched kthread starved for 27312 jiffies! g441 c440 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x0
> > > ...
> > >
> > >
> > > As it's a remote VM for which I don't have access to the host I have few
> > > options for further digging (can't trigger sysrq's).
> > >
> > >
> > > Same kernel (4.13.11) seems to be running just fine on another KVM-based VM that
> > > has two CPUs.
> > >
> > >
> > > Does it ring a bell or is there some info that might be of any use,
> > > assuming I can obtain it?
> > >
> > > Bruno
> >
> >
>
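
For readers following along, the "kthread starved" lines quoted above can be picked apart mechanically. The sketch below is only an illustration and is not part of the original exchange. It parses the two 4.13.11 lines from the console dump, reads ->state=0x0 as "runnable" (as explained in the reply, matching TASK_RUNNING being 0 in the kernel), and estimates the tick rate from the difference between the two reports. The helper names (parse_starved_line, estimate_hz) are hypothetical, and the HZ estimate assumes that the printk timestamps are in seconds and that jiffies advanced steadily between the two reports.

```python
import re

# Two "kthread starved" lines copied from the 4.13.11 console dump quoted above.
LINES = [
    "[526415.290012] rcu_sched kthread starved for 745854 jiffies! "
    "g23779976 c23779975 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x0",
    "[526478.320009] rcu_sched kthread starved for 752157 jiffies! "
    "g23779976 c23779975 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x0",
]

# Field layout taken from the lines above; the group names are our own labels.
PATTERN = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+(?P<flavor>\w+) kthread starved for "
    r"(?P<jiffies>\d+) jiffies! g(?P<g>\d+) c(?P<c>\d+) "
    r"f(?P<flags>0x[0-9a-f]+) (?P<gp_state>\w+)\((?P<gp_state_num>\d+)\) "
    r"->state=(?P<task_state>0x[0-9a-f]+)"
)


def parse_starved_line(line):
    """Hypothetical helper: split one 'kthread starved' line into fields."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("not a 'kthread starved' line: %r" % line)
    fields = match.groupdict()
    return {
        "ts": float(fields["ts"]),            # printk timestamp, seconds
        "flavor": fields["flavor"],           # e.g. rcu_sched
        "jiffies": int(fields["jiffies"]),    # how long the kthread was starved
        "g": int(fields["g"]),
        "c": int(fields["c"]),
        "gp_state": fields["gp_state"],       # e.g. RCU_GP_WAIT_FQS
        "task_state": int(fields["task_state"], 16),
    }


def estimate_hz(first, second):
    """Assumption: jiffies and printk time advance together between reports."""
    return (second["jiffies"] - first["jiffies"]) / (second["ts"] - first["ts"])


first, second = (parse_starved_line(line) for line in LINES)
hz = estimate_hz(first, second)
stall_seconds = first["jiffies"] / hz
runnable = first["task_state"] == 0  # 0x0 is TASK_RUNNING: runnable, not asleep

print("estimated HZ: %.1f" % hz)
print("kthread starved for about %.0f minutes" % (stall_seconds / 60))
print("kthread runnable: %s" % runnable)
```

Under those assumptions the two reports are 6303 jiffies and about 63 seconds apart, which suggests a tick rate of about 100 per second, so 745854 jiffies is roughly two hours during which the grace-period kthread was runnable but never ran.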
Thread overview: 7+ messages
2017-11-11 19:38 RCU stall/SOFT-Lockup on 4.11.3/4.13.11 after multiple days uptime Bruno Prémont
2017-11-12 1:21 ` Paul E. McKenney
2017-11-12 11:09 ` Bruno Prémont
2017-11-12 17:17 ` Paul E. McKenney [this message]
2017-11-12 17:29 ` Bruno Prémont
2017-11-12 18:30 ` Bruno Prémont
2017-11-13 13:14 ` Paul E. McKenney