public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Peter Zijlstra <peterz@infradead.org>
To: David Miller <davem@davemloft.net>
Cc: mingo@elte.hu, tglx@linutronix.de, linux-kernel@vger.kernel.org
Subject: Re: cpu_clock() in NMIs
Date: Mon, 14 Dec 2009 10:02:39 +0100	[thread overview]
Message-ID: <1260781359.4165.42.camel@twins> (raw)
In-Reply-To: <20091213.182502.215092085.davem@davemloft.net>

On Sun, 2009-12-13 at 18:25 -0800, David Miller wrote:
> The background is that I was trying to resolve a sparc64 perf
> issue when I discovered this problem.
> 
> On sparc64 I implement pseudo NMIs by simply running the kernel
> at IRQ level 14 when local_irq_disable() is called, this allows
> performance counter events to still come in at IRQ level 15.
> 
> This doesn't work if any code in an NMI handler does local_irq_save()
> or local_irq_disable() since the "disable" will kick us back to cpu
> IRQ level 14 thus letting NMIs back in and we recurse.
> 
> The only path which that does that in the perf event IRQ handling path
> is the code supporting frequency based events.  It uses cpu_clock().
> 
> cpu_clock() simply invokes sched_clock() with IRQs disabled.
> 
> And that's a fundamental bug all on it's own, particularly for the
> HAVE_UNSTABLE_SCHED_CLOCK case.  NMIs can thus get into the
> sched_clock() code interrupting the local IRQ disable code sections
> of it.
> 
> Furthermore, for the not-HAVE_UNSTABLE_SCHED_CLOCK case, the IRQ
> disabling done by cpu_clock() is just pure overhead and completely
> unnecessary.
> 
> So the core problem is that sched_clock() is not NMI safe, but we
> are invoking it from NMI contexts in the perf events code (via
> cpu_clock()).
> 
> A less important issue is the overhead of IRQ disabling when it isn't
> necessary in cpu_clock().  Maybe something simple like the patch below
> to handle that.

I'm not sure, traditionally sched_clock() was always called with IRQs
disabled, and on eg. x86 that is needed because we read the TSC and then
scale that value depending on CPUfreq state, which can be changed by a
CPUfreq interrupt. Allowing NMIs in between isn't really a problem,
allowing IRQs in is.

Now, the SPARC implementation might be good without IRQs disabled, but
we should at least look at all other arches before we do what you
propose below. As it removes the IRQ disable from the callsites whereas
it previously always had that.




  reply	other threads:[~2009-12-14  9:00 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-12-14  2:25 cpu_clock() in NMIs David Miller
2009-12-14  9:02 ` Peter Zijlstra [this message]
2009-12-14 19:09   ` David Miller
2009-12-14 19:32     ` Peter Zijlstra
2009-12-15  5:36       ` David Miller
2009-12-14 19:57     ` Mike Frysinger
2009-12-15  8:11 ` [tip:sched/urgent] sched: Fix cpu_clock() in NMIs, on !CONFIG_HAVE_UNSTABLE_SCHED_CLOCK tip-bot for David Miller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1260781359.4165.42.camel@twins \
    --to=peterz@infradead.org \
    --cc=davem@davemloft.net \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox