public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
To: Rusty Russell <rusty@rustcorp.com.au>
Cc: akpm@linux-foundation.org, linux-kernel@vger.kernel.org,
	Andi Kleen <ak@muc.de>, "H. Peter Anvin" <hpa@zytor.com>,
	Chuck Ebbert <cebbert@redhat.com>,
	Christoph Hellwig <hch@infradead.org>,
	Jeremy Fitzhardinge <jeremy@goop.org>,
	Ingo Molnar <mingo@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [patch 5/8] Immediate Values - x86 Optimization
Date: Thu, 15 Nov 2007 00:37:51 -0500	[thread overview]
Message-ID: <20071115053751.GB26338@Krystal> (raw)
In-Reply-To: <200711151545.19359.rusty@rustcorp.com.au>

* Rusty Russell (rusty@rustcorp.com.au) wrote:
> On Thursday 15 November 2007 15:06:10 Mathieu Desnoyers wrote:
> > * Rusty Russell (rusty@rustcorp.com.au) wrote:
> > > A stop_machine (or lightweight variant using IPI) would be sufficient and
> > > vastly simpler.  Trying to patch NMI handlers while they're running is
> > > already crazy.
> >
> > I wouldn't mind if it was limited to the code within do_nmi(), but then
> > we would have to accept potential GPF if
> >
> > A - the NMI or MCE code calls any external kernel code (printk,
> >   notify_die, spin_lock/unlock, die_nmi, lapic_wd_event (perfctr code,
> >   calls printk too for debugging)...
> 
> Sure, but as I pointed out previously, such calls are already best effort.  
> You can do very little safely from do_nmi(), and calling printk isn't one of 
> them,

yes and no.. do_nmi uses the "bust spinlocks" exactly for this. So this
is ok by design. Other than this, we can end up mixing up the console
data output with different sources of characters, but I doubt something
really bad can happen (like a deadlock).

> nor is grabbing a spinlock (well, actually you could as long as it's 
> *only* used by NMI handlers.  See any of those?).

Yup, see arch/x86/kernel/nmi_64.c : nmi_watchdog_tick()

It defines a spinlock to "Serialise the printks". I guess it's good to
protect against other nmi watchdogs running on other CPUs concurrently,
I guess.

> 
> > Therefore, if one decides to use the immediate values to
> > leave dormant spinlock instrumentation in the kernel, I wouldn't want it
> > to have undesirable side-effects (GPF) when the instrumentation is
> > being enabled, as rare as it could be.
> 
> It's overengineered, since it's less likely than deadlock already.
> 

So should we put a warning telling "enabling tracing or profiling on a
production system that also uses NMI watchdog could potentially cause a
crash" ? The rarer a bug is, the more difficult it is to debug. It does
not make the bug hurt less when it happens.

The normal thing to do when a potential deadlock is detected is to fix
it, not to leave it there under the premise that it doesn't matter since
it happens rarely. In our case, where we know there is a potential race,
I don't see any reason not to make sure it never happens. What's the
cost of it ?

arch/x86/kernel/immediate.o : 2.4K

let's compare..

kernel/stop_machine.o : 3.9K

so I think that code size is not an issue there, especially since the
immediate values are not meant to be deployed on embedded systems.

> > > I'd keep this version up your sleeve for they day when it's needed.
> >
> > If we choose to go this way, stop_machine would have to do a sync_core()
> > on every CPU before it reactivates interrupts for this to respect
> > Intel's errata.
> 
> Yes, I don't think stop_machine is actually what you want anyway, since you 
> are happy to run in interrupt context.  An IPI-based scheme is probably 
> better, and also has the side effect of iret doing the sync you need, IIUC.
> 

Yup, looping in IPIs with interrupts disabled should do the job there.
It's just awful for interrupt latency on large SMP systems :( Being
currently bad at it is not a reason to make it worse. If we have a CPU
that is within a high latency irq disable region when we send the IPI,
we can easily end up waiting for this critical section to end with
interrupts disabled on all CPUs. The fact that we would wait for the
longest interrupt disable region with IRQs disabled implies that we
increase the maximum latency of the system, by design. I'm not sure I
would like to be the new longest record-beating IRQ off region.

Mathieu

> Hope that clarifies,
> Rusty.

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

  reply	other threads:[~2007-11-15  5:43 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-11-13 18:58 [patch 0/8] Immediate Values (now with merged x86 support) Mathieu Desnoyers
2007-11-13 18:58 ` [patch 1/8] Immediate Values - Architecture Independent Code Mathieu Desnoyers
2007-11-13 18:58 ` [patch 2/8] Immediate Values - Kconfig menu in EMBEDDED Mathieu Desnoyers
2007-11-13 18:58 ` [patch 3/8] Immediate Values - Move Kprobes x86 restore_interrupt to kdebug.h Mathieu Desnoyers
2007-11-13 18:58 ` [patch 4/8] Add asm-compat.h to x86 Mathieu Desnoyers
2007-11-13 19:05   ` H. Peter Anvin
2007-11-13 20:37     ` [patch 4/8] Add asm-compat.h to x86 -> use new asm.h instead Mathieu Desnoyers
2007-11-13 18:58 ` [patch 5/8] Immediate Values - x86 Optimization Mathieu Desnoyers
2007-11-13 19:07   ` H. Peter Anvin
2007-11-13 19:24     ` Mathieu Desnoyers
2007-11-13 19:36       ` H. Peter Anvin
2007-11-13 19:45         ` Mathieu Desnoyers
2007-11-13 19:56           ` H. Peter Anvin
2007-11-13 20:40             ` [patch 5/8] Immediate Values - x86 Optimization (update) Mathieu Desnoyers
2007-11-13 21:26               ` H. Peter Anvin
2007-11-13 22:02                 ` Mathieu Desnoyers
2007-11-13 22:35                   ` H. Peter Anvin
2007-11-14  0:34                     ` Mathieu Desnoyers
2007-11-14  1:02                       ` H. Peter Anvin
2007-11-14  1:44                         ` [patch 5/8] Immediate Values - x86 Optimization (update 2) Mathieu Desnoyers
2007-11-14  2:58                           ` H. Peter Anvin
2007-11-14 14:16                             ` Mathieu Desnoyers
2007-11-14 18:52                             ` [PATCH] Immediate Values x86 Optimization Declare Discarded Instruction Mathieu Desnoyers
2007-11-14 19:00                               ` H. Peter Anvin
2007-11-14 19:16                                 ` [PATCH] Add __discard section to x86 Mathieu Desnoyers
2007-11-14 19:33                                   ` H. Peter Anvin
2007-11-15  3:08   ` [patch 5/8] Immediate Values - x86 Optimization Rusty Russell
2007-11-15  4:06     ` Mathieu Desnoyers
2007-11-15  4:45       ` Rusty Russell
2007-11-15  5:37         ` Mathieu Desnoyers [this message]
2007-11-15 11:06           ` Rusty Russell
2007-11-16 14:03             ` [patch 5/8] Immediate Values - x86 Optimization (simplified) Mathieu Desnoyers
2007-11-18 23:11               ` Rusty Russell
2007-11-19 14:28                 ` Mathieu Desnoyers
2007-11-19 23:06                   ` Rusty Russell
2007-11-20 17:02                     ` Mathieu Desnoyers
2007-11-19 19:15                 ` Mathieu Desnoyers
2007-11-13 18:58 ` [patch 6/8] Immediate Values - Powerpc Optimization Mathieu Desnoyers
2007-11-13 18:58 ` [patch 7/8] Immediate Values - Documentation Mathieu Desnoyers
2007-11-13 18:58 ` [patch 8/8] Scheduler Profiling - Use Immediate Values Mathieu Desnoyers

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20071115053751.GB26338@Krystal \
    --to=mathieu.desnoyers@polymtl.ca \
    --cc=ak@muc.de \
    --cc=akpm@linux-foundation.org \
    --cc=cebbert@redhat.com \
    --cc=hch@infradead.org \
    --cc=hpa@zytor.com \
    --cc=jeremy@goop.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=rusty@rustcorp.com.au \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox