linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: John Stultz <johnstul@us.ibm.com>
To: Fengguang Wu <fengguang.wu@intel.com>
Cc: Avi Kivity <avi@redhat.com>,
	paulmck@linux.vnet.ibm.com, Josh Boyer <jwboyer@redhat.com>,
	Christian Hoffmann <email@christianhoffmann.info>,
	LKML <linux-kernel@vger.kernel.org>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	tglx@linutronix.de
Subject: Re: INFO: rcu_preempt detected stalls on CPUs/tasks: { 1} (detected by 0, t=10002 jiffies)
Date: Mon, 08 Oct 2012 16:40:06 -0700	[thread overview]
Message-ID: <50736456.1010407@us.ibm.com> (raw)
In-Reply-To: <20120930115937.GA21989@localhost>

On 09/30/2012 04:59 AM, Fengguang Wu wrote:
> On Sun, Sep 30, 2012 at 01:32:46PM +0200, Avi Kivity wrote:
>> On 09/30/2012 01:23 PM, Fengguang Wu wrote:
>>> On Sun, Sep 30, 2012 at 01:10:55PM +0200, Avi Kivity wrote:
>>>> On 09/28/2012 05:35 AM, Paul E. McKenney wrote:
>>>>> On Thu, Sep 27, 2012 at 12:40:44PM +0800, Fengguang Wu wrote:
>>>>>> On Wed, Sep 26, 2012 at 09:28:50PM -0700, Paul E. McKenney wrote:
>>>>>>> On Thu, Sep 27, 2012 at 10:54:00AM +0800, Fengguang Wu wrote:
>>>>>>>> On Wed, Sep 26, 2012 at 09:45:43AM -0700, Paul E. McKenney wrote:
>>>>>>>>> On Wed, Sep 26, 2012 at 04:15:01PM +0800, Fengguang Wu wrote:
>>>>> [ . . . ]
>>>>>
>>>>>>>>> But could you also please send your .config file and a description of
>>>>>>>> .config attached.
>>>>>>>>
>>>>>>>>> the workload you are running?
>>>>>>>> It's basically the below commands. The exact initrd is not relevant in
>>>>>>>> this case because it's a boot time warning before user space is
>>>>>>>> started. The stalls roughly happen 1 time on every 10 boots.
>>>>>>> Yow!!!
>>>>>>>
>>>>>>> You have severe cross-CPU time-synchronization problems.  See for
>>>>>>> example the first dmesg, with the relevant part extracted right here.
>>>>>>> One CPU believes that it is about 37 seconds past boot, and the other
>>>>>>> CPU beleives that it is about 137 seconds past boot.  Given that large
>>>>>>> of a time difference, an RCU CPU stall warning is expected behavior.
>>>>>> Good spot! Yeah I noticed that huge timestamp gap, however didn't take
>>>>>> it seriously enough..
>>>>>>
>>>>>>> Get your two CPUs in agreement about what time it is, and I bet that
>>>>>>> the CPU stall warnings will go away.
>>>>>> Possibly KVM related? Because the warnings show up in many test boxes
>>>>>> running KVM and so is not likely some hardware specific issue.
>>>>> I vaguely recall seeing something recently.  But let's ask the KVM and
>>>>> timekeeping guys.
>>>> >From the logs it looks like hpet (why not kvmclock?) is used for the
>>>> clock, it should not generate such drifts since it is a global clock.
>>>> Can you verify current_clocksource on a boot that actually failed (in
>>>> case the clocksource is switched during runtime)?
>>> I've checked out the dmesg that's cited by Paul, attached. Yes it
>>> contains lines
>>>
>>> [    4.970051] Switching to clocksource hpet
>>>
>>> and then
>>>
>>> [    7.250353] Switching to clocksource tsc
>>>
>>> And there is no kvm-clock lines. Oh well for this particular kernel:
>>>
>> Ah, tsc will certainly break on kvm if the hardware doesn't provide a
>> constant tsc source.  I'm surprised the guest kernel didn't detect it
>> and switch back to hpet though.
> Thanks, it's good to know the root cause. All the dmesgs show the same hpet+tsc
> switching pattern (and never switch back):
>
> $ grep Switching dmesg-kvm_bisect2-inn-*21
> dmesg-kvm_bisect2-inn-41931-2012-09-27-10-37-51-3.6.0-rc7-bisect2-00078-g593d100-21:[    4.111415] Switching to clocksource hpet
> dmesg-kvm_bisect2-inn-41931-2012-09-27-10-37-51-3.6.0-rc7-bisect2-00078-g593d100-21:[    6.550098] Switching to clocksource tsc

Is this still an open issue? Fengguang's mail sounds like its resolved, 
but I'm not sure it is.

The switching from HPET -> TSC  I believe is expected, as the refined 
calibration will delay the TSC from being registered for a few seconds.  
However, its unclear why the TSC, if it is faulty, isn't being caught 
and demoted by the clocksource watchdog.

I'm also curious why this originally bisected down to 
06ae115a1d551cd952d8  (when using the kvm clock) if it was more of a 
hardware issue. And in those logs, I don't see the printk time-stamp 
inconsistencies that were alluded to in this thread.

Fengguang: Is this still reproducible? Do you have any details (dmesg) 
about host system as well?

thanks
-john


      reply	other threads:[~2012-10-08 23:40 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-09-25 11:19 INFO: rcu_preempt detected stalls on CPUs/tasks: { 1} (detected by 0, t=10002 jiffies) Fengguang Wu
2012-09-25 15:07 ` Paul E. McKenney
     [not found]   ` <20120926042237.GA18287@localhost>
2012-09-26  4:34     ` Paul E. McKenney
2012-09-26  8:15       ` Fengguang Wu
2012-09-26 16:45         ` Paul E. McKenney
     [not found]           ` <20120927025400.GA11434@localhost>
     [not found]             ` <20120927042849.GA6556@linux.vnet.ibm.com>
     [not found]               ` <20120927044044.GA22753@localhost>
     [not found]                 ` <20120928033511.GB2480@linux.vnet.ibm.com>
2012-09-30 11:10                   ` Avi Kivity
2012-09-30 11:18                     ` Fengguang Wu
2012-09-30 11:31                       ` Avi Kivity
2012-09-30 11:23                     ` Fengguang Wu
2012-09-30 11:32                       ` Avi Kivity
2012-09-30 11:59                         ` Fengguang Wu
2012-10-08 23:40                           ` John Stultz [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=50736456.1010407@us.ibm.com \
    --to=johnstul@us.ibm.com \
    --cc=avi@redhat.com \
    --cc=email@christianhoffmann.info \
    --cc=fengguang.wu@intel.com \
    --cc=jwboyer@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).