linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Fengguang Wu <fengguang.wu@intel.com>
To: Avi Kivity <avi@redhat.com>
Cc: paulmck@linux.vnet.ibm.com, Josh Boyer <jwboyer@redhat.com>,
	Christian Hoffmann <email@christianhoffmann.info>,
	LKML <linux-kernel@vger.kernel.org>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	johnstul@us.ibm.com, tglx@linutronix.de
Subject: Re: INFO: rcu_preempt detected stalls on CPUs/tasks: { 1} (detected by 0, t=10002 jiffies)
Date: Sun, 30 Sep 2012 19:59:37 +0800	[thread overview]
Message-ID: <20120930115937.GA21989@localhost> (raw)
In-Reply-To: <50682DDE.7010002@redhat.com>

On Sun, Sep 30, 2012 at 01:32:46PM +0200, Avi Kivity wrote:
> On 09/30/2012 01:23 PM, Fengguang Wu wrote:
> > On Sun, Sep 30, 2012 at 01:10:55PM +0200, Avi Kivity wrote:
> >> On 09/28/2012 05:35 AM, Paul E. McKenney wrote:
> >> > On Thu, Sep 27, 2012 at 12:40:44PM +0800, Fengguang Wu wrote:
> >> >> On Wed, Sep 26, 2012 at 09:28:50PM -0700, Paul E. McKenney wrote:
> >> >> > On Thu, Sep 27, 2012 at 10:54:00AM +0800, Fengguang Wu wrote:
> >> >> > > On Wed, Sep 26, 2012 at 09:45:43AM -0700, Paul E. McKenney wrote:
> >> >> > > > On Wed, Sep 26, 2012 at 04:15:01PM +0800, Fengguang Wu wrote:
> >> > 
> >> > [ . . . ]
> >> > 
> >> >> > > > But could you also please send your .config file and a description of
> >> >> > > 
> >> >> > > .config attached.
> >> >> > > 
> >> >> > > > the workload you are running?
> >> >> > > 
> >> >> > > It's basically the below commands. The exact initrd is not relevant in
> >> >> > > this case because it's a boot time warning before user space is
> >> >> > > started. The stalls roughly happen 1 time on every 10 boots.
> >> >> > 
> >> >> > Yow!!!
> >> >> > 
> >> >> > You have severe cross-CPU time-synchronization problems.  See for
> >> >> > example the first dmesg, with the relevant part extracted right here.
> >> >> > One CPU believes that it is about 37 seconds past boot, and the other
> >> >> > CPU beleives that it is about 137 seconds past boot.  Given that large
> >> >> > of a time difference, an RCU CPU stall warning is expected behavior.
> >> >> 
> >> >> Good spot! Yeah I noticed that huge timestamp gap, however didn't take
> >> >> it seriously enough..
> >> >> 
> >> >> > Get your two CPUs in agreement about what time it is, and I bet that
> >> >> > the CPU stall warnings will go away.
> >> >> 
> >> >> Possibly KVM related? Because the warnings show up in many test boxes
> >> >> running KVM and so is not likely some hardware specific issue.
> >> > 
> >> > I vaguely recall seeing something recently.  But let's ask the KVM and
> >> > timekeeping guys.
> >> 
> >> >From the logs it looks like hpet (why not kvmclock?) is used for the
> >> clock, it should not generate such drifts since it is a global clock.
> >> Can you verify current_clocksource on a boot that actually failed (in
> >> case the clocksource is switched during runtime)?
> > 
> > I've checked out the dmesg that's cited by Paul, attached. Yes it
> > contains lines
> > 
> > [    4.970051] Switching to clocksource hpet
> > 
> > and then
> > 
> > [    7.250353] Switching to clocksource tsc
> > 
> > And there is no kvm-clock lines. Oh well for this particular kernel:
> > 
> 
> Ah, tsc will certainly break on kvm if the hardware doesn't provide a
> constant tsc source.  I'm surprised the guest kernel didn't detect it
> and switch back to hpet though.

Thanks, it's good to know the root cause. All the dmesgs show the same hpet+tsc
switching pattern (and never switch back):

$ grep Switching dmesg-kvm_bisect2-inn-*21    
dmesg-kvm_bisect2-inn-41931-2012-09-27-10-37-51-3.6.0-rc7-bisect2-00078-g593d100-21:[    4.111415] Switching to clocksource hpet
dmesg-kvm_bisect2-inn-41931-2012-09-27-10-37-51-3.6.0-rc7-bisect2-00078-g593d100-21:[    6.550098] Switching to clocksource tsc
dmesg-kvm_bisect2-inn-41931-2012-09-27-10-41-48-3.6.0-rc7-bisect2-00078-g593d100-21:[    3.927716] Switching to clocksource hpet
dmesg-kvm_bisect2-inn-41931-2012-09-27-10-41-48-3.6.0-rc7-bisect2-00078-g593d100-21:[    6.030117] Switching to clocksource tsc
dmesg-kvm_bisect2-inn-42322-2012-09-27-10-35-17-3.6.0-rc7-bisect2-00078-g593d100-21:[    3.587408] Switching to clocksource hpet
dmesg-kvm_bisect2-inn-42322-2012-09-27-10-35-17-3.6.0-rc7-bisect2-00078-g593d100-21:[    5.812400] Switching to clocksource tsc
dmesg-kvm_bisect2-inn-42322-2012-09-27-10-43-56-3.6.0-rc7-bisect2-00078-g593d100-21:[    4.294842] Switching to clocksource hpet
dmesg-kvm_bisect2-inn-42322-2012-09-27-10-43-56-3.6.0-rc7-bisect2-00078-g593d100-21:[    6.491696] Switching to clocksource tsc
dmesg-kvm_bisect2-inn-42322-2012-09-27-10-47-03-3.6.0-rc7-bisect2-00078-g593d100-21:[    4.745749] Switching to clocksource hpet
dmesg-kvm_bisect2-inn-42322-2012-09-27-10-47-03-3.6.0-rc7-bisect2-00078-g593d100-21:[    7.193121] Switching to clocksource tsc
dmesg-kvm_bisect2-inn-42527-2012-09-27-10-38-38-3.6.0-rc7-bisect2-00078-g593d100-21:[    4.970051] Switching to clocksource hpet
dmesg-kvm_bisect2-inn-42527-2012-09-27-10-38-38-3.6.0-rc7-bisect2-00078-g593d100-21:[    7.250353] Switching to clocksource tsc

And these are the stall times:

$ grep -hC1 stalls dmesg-kvm_bisect2-inn-*21
[   36.122624] bus: 'platform': really_probe: probing driver i8042 with device i8042
[   36.106893] INFO: rcu_preempt self-detected stall on CPU[  210.200388] INFO: rcu_preempt detected stalls on CPUs/tasks: { 1} (detected by 0, t=17413 jiffies)
[  210.200417] Pid: 0, comm: swapper/0 Not tainted 3.6.0-rc7-bisect2-00078-g593d100 #21
--
[   35.403888] bus: 'platform': really_probe: probing driver i8042 with device i8042
[  212.130131] INFO: rcu_preempt detected stalls on CPUs/tasks:[  212.131029] rcu-torture: rtc: c1a5e988 ver: 228 tfle: 0 rta: 228 rtaf: 162 rtf: 206 rtmbe: 0 rtbke: 0 rtbre: 0 rtbf: 0 rtb: 0 nt: 988 onoff: 0/0:0/0 -1,0:-1,0 0:0 (HZ=100) barrier: 0/0:0
[  212.131029] rcu-torture: Reader Pipe:  16284405 36 0 0 0 0 0 0 0 0 0
--
[   35.458755] bus: 'platform': really_probe: probing driver i8042 with device i8042
[   35.448369] INFO: rcu_preempt self-detected stall on CPU { 1} [  210.260157] INFO: rcu_preempt detected stalls on CPUs/tasks: { 1} (detected by 0, t=17485 jiffies)
[  210.260179] Pid: 0, comm: swapper/0 Not tainted 3.6.0-rc7-bisect2-00078-g593d100 #21
--
[   36.079480] INFO: rcu_preempt self-detected stall on CPU { 1
[  223.710182] INFO: rcu_preempt detected stalls on CPUs/tasks: { 1} (detected by 0, t=18767 jiffies)
[  223.710201] Pid: 0, comm: swapper/0 Not tainted 3.6.0-rc7-bisect2-00078-g593d100 #21
--
[   37.831174] kobject: ':t-0000168' (c793a24c): fill_kobj_path: path = '/kernel/slab/:t-0000168'
[  136.910092] INFO: rcu_preempt detected stalls on CPUs/tasks: {} (detected by 1, t=10002 jiffies)
[  136.910092] INFO: Stall ended before state dump start
--
[   37.672159] kobject: 'iscsi_session' (cc2c0c0c): fill_kobj_path: path = '/class/iscsi_session'
[   37.672791] INFO: rcu_preempt self-detected stall on CPU { 1[  137.900041] INFO: rcu_preempt detected stalls on CPUs/tasks: { 1} (detected by 0, t=10028 jiffies)
[  137.900041] Pid: 41, comm: rcu_torture_rea Not tainted 3.6.0-rc7-bisect2-00078-g593d100 #21

Thanks,
Fengguang

  reply	other threads:[~2012-09-30 12:17 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-09-25 11:19 INFO: rcu_preempt detected stalls on CPUs/tasks: { 1} (detected by 0, t=10002 jiffies) Fengguang Wu
2012-09-25 15:07 ` Paul E. McKenney
     [not found]   ` <20120926042237.GA18287@localhost>
2012-09-26  4:34     ` Paul E. McKenney
2012-09-26  8:15       ` Fengguang Wu
2012-09-26 16:45         ` Paul E. McKenney
     [not found]           ` <20120927025400.GA11434@localhost>
     [not found]             ` <20120927042849.GA6556@linux.vnet.ibm.com>
     [not found]               ` <20120927044044.GA22753@localhost>
     [not found]                 ` <20120928033511.GB2480@linux.vnet.ibm.com>
2012-09-30 11:10                   ` Avi Kivity
2012-09-30 11:18                     ` Fengguang Wu
2012-09-30 11:31                       ` Avi Kivity
2012-09-30 11:23                     ` Fengguang Wu
2012-09-30 11:32                       ` Avi Kivity
2012-09-30 11:59                         ` Fengguang Wu [this message]
2012-10-08 23:40                           ` John Stultz

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120930115937.GA21989@localhost \
    --to=fengguang.wu@intel.com \
    --cc=avi@redhat.com \
    --cc=email@christianhoffmann.info \
    --cc=johnstul@us.ibm.com \
    --cc=jwboyer@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).