From: Prarit Bhargava <prarit@redhat.com>
To: John Stultz <johnstul@us.ibm.com>
Cc: linux-kernel@vger.kernel.org,
	Thomas Gleixner <tglx@linutronix.de>,
	Salman Qazi <sqazi@google.com>,
	stable@kernel.org
Subject: Re: [PATCH] clocksource, prevent overflow in clocksource_cyc2ns
Date: Thu, 05 Apr 2012 07:00:27 -0400
Message-ID: <4F7D7B4B.7050203@redhat.com>
In-Reply-To: <4F7CF094.5020201@us.ibm.com>



On 04/04/2012 09:08 PM, John Stultz wrote:
> On 04/04/2012 11:33 AM, Prarit Bhargava wrote:
>>> One idea might be to replace the cyc2ns w/ mult_frac in only the watchdog code.
>>> I need to think on that some more (and maybe have you provide some debug output)
>>> to really understand how that's solving the issue for you, but it would be able
>>> to be done w/o affecting the other assumptions of the timekeeping core.
>>>
>> Hey John,
>>
>> After reading the initial part of your reply I was thinking about calling
>> mult_frac() directly from the watchdog code as well.
>>
>> Here's some debug output I cobbled together to get an idea of how quickly the
>> overflow was happening.
>>
>> [    5.435323] clocksource_watchdog: {0} cs tsc csfirst 227349443638728 mask 0xFFFFFFFFFFFFFFFF mult 797281036 shift 31
>> [    5.444930] clocksource_watchdog: {0} wd hpet wdfirst 78332535 mask 0xFFFFFFFF mult 292935555 shift 22
>>
>> These, of course, are just the basic data from the clocksources tsc and hpet.
> 
> If I'm doing the math right, these are ~2.7 GHz cpus?

Yes.

> 
> So what kernel version are you using?

I was on an earlier version of Fedora (F16) ... but I'll jump forward and see if
I can still hit it.

> 
> In trying to reproduce this locally against Linus' HEAD on a much smaller system
> (single core + HT 1.6 GHz), I got:
> [    6.611366] clocksource_watchdog: {0} cs tsc csfirst 36177888648 mask ffffffffffffffff mult 10485747 shift 24
> [    6.611596] clocksource_watchdog: {0} wd hpet wdfirst 169168400 mask ffffffff mult 2684354560 shift 26
> 
> Note the smaller shift values. Not too long ago the shift calculation was
> adjusted to allow for longer periods between interrupts, so I suspect you're on
> an older kernel.
> 
> Further, using your debug patch on my system, it was well beyond 10 minutes
> before the debug overflow occurred.  And similarly I couldn't trip the watchdog
> trigger using sysrq-t (but again, only two threads here, so not nearly as much
> data to print as you have).

I'm going to try this on a 32-cpu system (running the previously mentioned test)
with linux.git HEAD.

> 
> Could you verify that the issue you're seeing is still present w/ current
> mainline?  Please don't take this as me dismissing your problem!  As I mentioned

Absolutely :)  I didn't take it that way at all.  When I get in this AM I'll
bang out a test and see if I can cause this to happen with sysrq-t.  Keep in
mind that 10000 threads is the *minimum* I was able to cause this with, which is
only ~315 threads/cpu, which isn't a lot :/.  At that number of threads the dump
takes about 6 mins.  Doubling it, IIRC, exceeded 10 mins.

> earlier there are some known issues w/ the clocksource watchdog code. But I want
> to narrow down whether your problem is currently present in mainline or only in
> older kernels, as that will help us find the proper fix.

Thanks John,

P.

> 
> thanks
> -john
> 
