From: Sasha Levin <levinsasha928@gmail.com>
To: paulmck@linux.vnet.ibm.com
Cc: Ingo Molnar <mingo@elte.hu>, Yinghai Lu <yinghai@kernel.org>,
Avi Kivity <avi@redhat.com>,
Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>,
Pekka Enberg <penberg@kernel.org>,
kvm@vger.kernel.org, asias.hejun@gmail.com, gorcunov@gmail.com,
prasadjoshi124@gmail.com, takuya.yoshikawa@gmail.com
Subject: Re: [PATCH v2 6/8] kvm tools: Add rwlock wrapper
Date: Sat, 04 Jun 2011 09:26:15 +0300 [thread overview]
Message-ID: <1307168775.27498.2.camel@lappy> (raw)
In-Reply-To: <20110603230510.GU2333@linux.vnet.ibm.com>
On Fri, 2011-06-03 at 16:05 -0700, Paul E. McKenney wrote:
> On Sat, Jun 04, 2011 at 01:54:45AM +0300, Sasha Levin wrote:
> > On Fri, 2011-06-03 at 14:20 -0700, Paul E. McKenney wrote:
> > > On Sat, Jun 04, 2011 at 12:03:59AM +0300, Sasha Levin wrote:
> > > > On Fri, 2011-06-03 at 13:22 -0700, Paul E. McKenney wrote:
> > > > > On Fri, Jun 03, 2011 at 10:56:20PM +0300, Sasha Levin wrote:
> > > > > > On Fri, 2011-06-03 at 12:31 -0700, Paul E. McKenney wrote:
> > > > > > > On Fri, Jun 03, 2011 at 10:54:19AM +0300, Sasha Levin wrote:
> > > > > > > > On Fri, 2011-06-03 at 09:34 +0200, Ingo Molnar wrote:
> > > > > > > > > * Sasha Levin <levinsasha928@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > > with no apparent progress being made.
> > > > > > > > > >
> > > > > > > > > > Since it's something that worked in 2.6.37, I've looked into it to
> > > > > > > > > > find what might have caused this issue.
> > > > > > > > > >
> > > > > > > > > > I've bisected guest kernels and found that the problem starts with:
> > > > > > > > > >
> > > > > > > > > > a26ac2455ffcf3be5c6ef92bc6df7182700f2114 is the first bad commit
> > > > > > > > > > commit a26ac2455ffcf3be5c6ef92bc6df7182700f2114
> > > > > > > > > > Author: Paul E. McKenney <paul.mckenney@linaro.org>
> > > > > > > > > > Date: Wed Jan 12 14:10:23 2011 -0800
> > > > > > > > > >
> > > > > > > > > > rcu: move TREE_RCU from softirq to kthread
> > > > > > > > > >
> > > > > > > > > > Ingo, could you confirm that the problem goes away for you when you
> > > > > > > > > > use an earlier commit?
> > > > > > > > >
> > > > > > > > > testing will have to wait, but there's a recent upstream fix:
> > > > > > > > >
> > > > > > > > > d72bce0e67e8: rcu: Cure load woes
> > > > > > > > >
> > > > > > > > > That *might* perhaps address this problem too.
> > > > > > > > >
> > > > > > > > I've re-tested with Linus's current git, the problem is still there.
> > > > > > > >
> > > > > > > > > If not then this appears to be some sort of RCU related livelock with
> > > > > > > > > brutally overcommitted vcpus. On native this would show up too, in a
> > > > > > > > > less drastic form, as a spurious bootup delay.
> > > > > > > >
> > > > > > > > I don't think it was overcommited by *that* much. With that commit it
> > > > > > > > usually hangs at 20-40 vcpus, while without it I can go up to 255.
> > > > > > >
> > > > > > > Here is a diagnostic patch, untested. It assumes that your system
> > > > > > > has only a few CPUs (maybe 8-16) and that timers are still running.
> > > > > > > It dumps out some RCU state if grace periods extend for more than
> > > > > > > a few seconds.
> > > > > > >
> > > > > > > To activate it, call rcu_diag_timer_start() from process context.
> > > > > > > To stop it, call rcu_diag_timer_stop(), also from process context.
> > > > > >
> > > > > > Since the hang happens in guest kernel very early during boot, I can't
> > > > > > call rcu_diag_timer_start(). What would be a good place to put the
> > > > > > _start() code instead?
> > > > >
> > > > > Assuming that the failure occurs in the host OS rather than in the guest
> > > > > OS, I suggest placing rcu_diag_timer_start() in the host code that starts
> > > > > up the guest.
> > > > >
> > > > > On the other hand, if the failure is occuring in the guest OS, then
> > > > > I suggest placing the call to rcu_diag_timer_start() just after timer
> > > > > initialization -- that is, assuming that interrupts are enabled at the
> > > > > time of the failure. If interrupts are not yet enabled at the time of
> > > > > the failure, color me clueless.
> > > >
> > > > It happens in the guest OS, the bisection was done on a guest kernel.
> > > >
> > > > I've placed the rcu debug _start() right at the end of init_timers()
> > > > which happens before the hang, but I'm not seeing any output from the
> > > > debug function.
> > >
> > > Sounds like the hang is happening before timers start? Are scheduling
> > > clock interrupts happening in the guest, as in scheduler_tick()? If so,
> > > I could just as easily put the debug there.
> >
> > Yes, scheduler_tick() is running before the hang.
> > Should I just call the handler function directly from there?
>
> I suggest calling it no more than once every few seconds, but yes.
> (Otherwise you will get a very full dmesg.)
Here is the last group of printk(), basically it just repeats itself
when it's stuck:
KVM setup async PF for cpu 19
0000000000000000
0000000000000000 0000000000000000 0000000000000000 0000000000000000
0000000000000000 0000000000000000 0000000000000000 0000000000000000
Call Trace:
rcu_diag: rcuc17 2/46/0
kworker/0:1 R running task 0 0 14 0x00000008
0000000000000000 0000000000000000 0000000000000000 0000000000000000
0000000000000000 0000000000000000 0000000000000000 0000000000000000
0000000000000000 0000000000000000 0000000000000000 0000000000000000
Call Trace:
rcu_diag: rcuc18 2/23/0
kworker/0:1 R running task 0 0 14 0x00000008
0000000000000000 0000000000000000 0000000000000000 0000000000000000
0000000000000000 0000000000000000 0000000000000000 0000000000000000
0000000000000000 0000000000000000 0000000000000000 0000000000000000
Call Trace:
rcu_diag: rcuc19 0/0/0
kworker/0:1 R running task 0 0 14 0x00000008
0000000000000000 0000000000000000 0000000000000000 0000000000000000
0000000000000000 0000000000000000 0000000000000000 0000000000000000
0000000000000000 0000000000000000 0000000000000000 0000000000000000
Call Trace:
[ 0.527044] #20
0000000000000000
Call Trace:
rcu_diag: rcuc3 0/0/0
kworker/0:1 R running task 0 0 14 0x00000008
0000000000000000 0000000000000000 0000000000000000 0000000000000000
0000000000000000 0000000000000000 0000000000000000 0000000000000000
0000000000000000 0000000000000000 0000000000000000 0000000000000000
Call Trace:
rcu_diag: rcuc4 2/358/0
kworker/0:1 R running task 0 0 14 0x00000008
0000000000000000 0000000000000000 0000000000000000 0000000000000000
0000000000000000 0000000000000000 0000000000000000 0000000000000000
0000000000000000 0000000000000000 0000000000000000 0000000000000000
Call Trace:
rcu_diag: rcuc5 2/340/1
kworker/0:1 R running task 0 0 14 0x00000008
0000000000000000 0000000000000000 0000000000000000 0000000000000000
0000000000000000 0000000000000000 0000000000000000 0000000000000000
0000000000000000 0000000000000000 0000000000000000 0000000000000000
Call Trace:
rcu_diag: rcuc6 2/321/1
kworker/0:1 R running task 0 0 14 0x00000008
0000000000000000 0000000000000000 0000000000000000 0000000000000000
0000000000000000 0000000000000000 0000000000000000 0000000000000000
0000000000000000 0000000000000000 0000000000000000 0000000000000000
Call Trace:
rcu_diag: rcuc7 2/301/1
kworker/0:1 R running task 0 0 14 0x00000008
0000000000000000 0000000000000000 0000000000000000 0000000000000000
0000000000000000 0000000000000000 0000000000000000 0000000000000000
0000000000000000 0000000000000000 0000000000000000 0000000000000000
Call Trace:
rcu_diag: rcuc8 2/277/0
kworker/0:1 R running task 0 0 14 0x00000008
0000000000000000 0000000000000000 0000000000000000 0000000000000000
0000000000000000 0000000000000000 0000000000000000 0000000000000000
0000000000000000 0000000000000000 0000000000000000 0000000000000000
Call Trace:
rcu_diag: rcuc9 2/255/0
kworker/0:1 R running task 0 0 14 0x00000008
0000000000000000 0000000000000000 0000000000000000 0000000000000000
0000000000000000 0000000000000000 0000000000000000 0000000000000000
0000000000000000 0000000000000000 0000000000000000 0000000000000000
Call Trace:
rcu_diag: rcuc10 2/226/0
kworker/0:1 R running task 0 0 14 0x00000008
0000000000000000 0000000000000000 0000000000000000 0000000000000000
0000000000000000 0000000000000000 0000000000000000 0000000000000000
0000000000000000 0000000000000000 0000000000000000 0000000000000000
Call Trace:
rcu_diag: rcuc11 2/214/1
kworker/0:1 R running task 0 0 14 0x00000008
0000000000000000 0000000000000000 0000000000000000 0000000000000000
0000000000000000 0000000000000000 0000000000000000 0000000000000000
0000000000000000 0000000000000000 0000000000000000 0000000000000000
Call Trace:
rcu_diag: rcuc12 2/193/1
kworker/0:1 R running task 0 0 14 0x00000008
0000000000000000 0000000000000000 0000000000000000 0000000000000000
0000000000000000 0000000000000000 0000000000000000 0000000000000000
0000000000000000 0000000000000000 0000000000000000 0000000000000000
Call Trace:
rcu_diag: rcuc13 2/172/0
kworker/0:1 R running task 0 0 14 0x00000008
0000000000000000 0000000000000000 0000000000000000 0000000000000000
0000000000000000 0000000000000000 0000000000000000 0000000000000000
0000000000000000 0000000000000000 0000000000000000 0000000000000000
Call Trace:
rcu_diag: rcuc14 2/146/0
kworker/0:1 R running task 0 0 14 0x00000008
0000000000000000 0000000000000000 0000000000000000 0000000000000000
0000000000000000 0000000000000000 0000000000000000 0000000000000000
0000000000000000 0000000000000000 0000000000000000 0000000000000000
Call Trace:
rcu_diag: rcuc15 2/130/0
kworker/0:1 R running task 0 0 14 0x00000008
0000000000000000 0000000000000000 0000000000000000 0000000000000000
0000000000000000 0000000000000000 0000000000000000 0000000000000000
0000000000000000 0000000000000000 0000000000000000 0000000000000000
Call Trace:
rcu_diag: rcuc16 2/107/0
kworker/0:1 R running task 0 0 14 0x00000008
0000000000000000 0000000000000000 0000000000000000 0000000000000000
0000000000000000 0000000000000000 0000000000000000 0000000000000000
0000000000000000 0000000000000000 0000000000000000 0000000000000000
Call Trace:
rcu_diag: rcuc17 2/86/0
kworker/0:1 R running task 0 0 14 0x00000008
0000000000000000 0000000000000000 0000000000000000 0000000000000000
0000000000000000 0000000000000000 0000000000000000 0000000000000000
0000000000000000 0000000000000000 0000000000000000 0000000000000000
Call Trace:
rcu_diag: rcuc18 2/62/1
kworker/0:1 R running task 0 0 14 0x00000008
0000000000000000 0000000000000000 0000000000000000 0000000000000000
0000000000000000 0000000000000000 0000000000000000 0000000000000000
0000000000000000 0000000000000000 0000000000000000 0000000000000000
Call Trace:
rcu_diag: rcuc19 2/39/0
kworker/0:1 R running task 0 0 14 0x00000008
0000000000000000 0000000000000000 0000000000000000 0000000000000000
0000000000000000 0000000000000000 0000000000000000 0000000000000000
0000000000000000 0000000000000000 0000000000000000 0000000000000000
Call Trace:
--
Sasha.
next prev parent reply other threads:[~2011-06-04 6:26 UTC|newest]
Thread overview: 72+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-05-30 8:30 [PATCH v2 1/8] kvm tools: Use correct value for user signal base Sasha Levin
2011-05-30 8:30 ` [PATCH v2 2/8] kvm tools: Remove wrong global definition of kvm Sasha Levin
2011-05-30 8:38 ` Ingo Molnar
2011-05-30 8:59 ` Sasha Levin
2011-05-30 8:30 ` [PATCH v2 3/8] kvm tools: Allow pausing guests Sasha Levin
2011-05-30 8:39 ` Ingo Molnar
2011-05-30 8:30 ` [PATCH v2 4/8] kvm tools: Pause/resume guest using SIGUSR2 Sasha Levin
2011-05-30 8:41 ` Ingo Molnar
2011-05-30 8:30 ` [PATCH v2 5/8] kvm tools: Add a brlock Sasha Levin
2011-05-30 8:30 ` [PATCH v2 6/8] kvm tools: Add rwlock wrapper Sasha Levin
2011-05-30 8:43 ` Ingo Molnar
2011-05-30 9:29 ` Pekka Enberg
2011-05-30 9:34 ` Sasha Levin
2011-05-30 9:40 ` Pekka Enberg
2011-05-30 9:46 ` Sasha Levin
2011-05-30 9:48 ` Pekka Enberg
2011-05-30 9:54 ` Ingo Molnar
2011-05-30 11:11 ` Takuya Yoshikawa
2011-05-30 11:12 ` Sasha Levin
2011-05-30 11:26 ` Takuya Yoshikawa
2011-05-30 11:39 ` Avi Kivity
2011-05-30 11:49 ` Ingo Molnar
2011-05-30 11:55 ` Pekka Enberg
2011-05-30 11:58 ` Sasha Levin
2011-05-30 12:20 ` Ingo Molnar
2011-05-30 12:22 ` Sasha Levin
2011-05-30 12:25 ` Avi Kivity
2011-05-30 12:23 ` Avi Kivity
2011-05-30 12:30 ` Pekka Enberg
2011-05-30 12:32 ` Avi Kivity
2011-05-30 14:10 ` Ingo Molnar
2011-05-30 14:30 ` Avi Kivity
2011-05-30 14:43 ` Ingo Molnar
2011-05-30 14:50 ` Avi Kivity
2011-05-30 19:32 ` Ingo Molnar
2011-05-30 12:04 ` Avi Kivity
2011-05-30 12:36 ` Ingo Molnar
2011-05-30 12:44 ` Avi Kivity
2011-05-30 12:46 ` Pekka Enberg
2011-05-30 12:48 ` Avi Kivity
2011-05-30 13:05 ` Sasha Levin
2011-06-03 7:27 ` Sasha Levin
2011-06-03 7:34 ` Ingo Molnar
2011-06-03 7:54 ` Sasha Levin
2011-06-03 19:31 ` Paul E. McKenney
2011-06-03 19:56 ` Sasha Levin
2011-06-03 20:22 ` Paul E. McKenney
2011-06-03 21:03 ` Sasha Levin
2011-06-03 21:20 ` Paul E. McKenney
2011-06-03 22:54 ` Sasha Levin
2011-06-03 23:05 ` Paul E. McKenney
2011-06-04 6:26 ` Sasha Levin [this message]
2011-06-04 16:30 ` Paul E. McKenney
2011-06-14 22:26 ` Sasha Levin
2011-06-14 23:42 ` Paul E. McKenney
2011-06-15 1:25 ` Sasha Levin
2011-06-15 4:22 ` Paul E. McKenney
2011-06-05 12:12 ` Avi Kivity
2011-05-30 14:16 ` Takuya Yoshikawa
2011-05-30 9:56 ` Ingo Molnar
2011-05-30 10:05 ` Sasha Levin
2011-05-30 10:13 ` Ingo Molnar
2011-05-30 10:22 ` Sasha Levin
2011-05-30 10:30 ` Ingo Molnar
2011-05-30 10:41 ` Sasha Levin
2011-05-30 8:30 ` [PATCH v2 7/8] kvm tools: Add debug mode to brlock Sasha Levin
2011-05-30 8:30 ` [PATCH v2 8/8] kvm tools: Use brlock in MMIO and IOPORT Sasha Levin
2011-05-30 8:47 ` Ingo Molnar
2011-05-30 8:56 ` Sasha Levin
2011-05-30 8:35 ` [PATCH v2 1/8] kvm tools: Use correct value for user signal base Ingo Molnar
2011-05-30 8:40 ` Sasha Levin
2011-05-30 8:49 ` Ingo Molnar
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1307168775.27498.2.camel@lappy \
--to=levinsasha928@gmail.com \
--cc=asias.hejun@gmail.com \
--cc=avi@redhat.com \
--cc=gorcunov@gmail.com \
--cc=kvm@vger.kernel.org \
--cc=mingo@elte.hu \
--cc=paulmck@linux.vnet.ibm.com \
--cc=penberg@kernel.org \
--cc=prasadjoshi124@gmail.com \
--cc=takuya.yoshikawa@gmail.com \
--cc=yinghai@kernel.org \
--cc=yoshikawa.takuya@oss.ntt.co.jp \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.