From: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
To: Andi Kleen <ak@muc.de>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>,
systemtap@sources.redhat.com, prasanna@in.ibm.com,
ananth@in.ibm.com, anil.s.keshavamurthy@intel.com,
akpm@linux-foundation.org, linux-kernel@vger.kernel.org,
hch@infradead.org
Subject: Re: [patch 05/10] Linux Kernel Markers - i386 optimized version
Date: Fri, 11 May 2007 14:02:07 -0400
Message-ID: <20070511180207.GA25516@Krystal>
In-Reply-To: <20070511060444.GA35262@muc.de>
* Andi Kleen (ak@muc.de) wrote:
> On Thu, May 10, 2007 at 12:59:18PM -0400, Mathieu Desnoyers wrote:
> > * Alan Cox (alan@lxorguk.ukuu.org.uk) wrote:
> > > > * First issue : Impact on the system. If we try to make this system
> > > > scale, we will create very long irq disable sections. The expected
> > > > duration is the worse case IPI latency plus the time it takes to CPU A
> > > > to change the variable. We therefore directly grow the worse case
> > > > system's interrupt latency.
> > >
> > > Not a huge problem. It doesn't scale in really horrible ways and the IPI
> > > latency on a PIV or later is actually very good. Also the impact is less
> > > than you might think as on huge huge boxes you want multiple copies of
> > > the kernel text pages to reduce NUMA traffic, so you only have to sync
> > > the group of processors involved
>
> I agree with Alan and disagree with you on the impact on the system.
>
I just want to make sure I understand your disagreement. You do not seem
to provide any counter-argument to the following technical fact: the
proposed algorithm will increase the worst-case interrupt latency of the
kernel.

The IPI might be fast, but I have seen interrupts being disabled for
quite a long time in some kernel code paths. Keeping interrupts disabled
on _each cpu_ while it runs an IPI handler that waits to be synchronized
with the other CPUs has this side-effect. Therefore, if I understand
correctly, your position is that the worst-case interrupt latency in the
Linux kernel is not important. Since I have some difficulty agreeing
with that position, I'll leave the debate about the importance of such
side-effects to others, since it is mostly a political issue.

Or maybe I am not understanding you correctly?
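
To make the latency argument concrete, here is a minimal user-space
sketch of the model I have in mind. It is not kernel code: the function
name and the microsecond units are assumptions made for illustration
only. Each CPU enters the IPI handler at some time after the IPI is
sent (late if it was running with interrupts disabled), then spins with
interrupts off until the slowest CPU has checked in and the patching CPU
has finished.

```c
#include <assert.h>
#include <stddef.h>

/*
 * User-space model only (hypothetical names, not a kernel API).
 *
 * arrival_us[i]: time at which CPU i enters the IPI handler, measured
 *                from the moment the IPI is sent. A CPU that was running
 *                with interrupts disabled arrives late.
 * patch_us:      time the patching CPU needs once every CPU has arrived.
 *
 * Returns the longest time any CPU spends in the handler with interrupts
 * disabled: it waits for the slowest CPU, then for the patch to complete.
 */
static unsigned long worst_irq_off_window_us(const unsigned long *arrival_us,
                                             size_t ncpus,
                                             unsigned long patch_us)
{
    unsigned long first, last;
    size_t i;

    assert(arrival_us != NULL && ncpus > 0);

    first = last = arrival_us[0];
    for (i = 1; i < ncpus; i++) {
        if (arrival_us[i] < first)
            first = arrival_us[i];
        if (arrival_us[i] > last)
            last = arrival_us[i];
    }

    /* The earliest CPU spins from `first` until `last + patch_us`. */
    return (last - first) + patch_us;
}
```

In this model a single CPU that sits in a long irq-off section delays
every other CPU by roughly the length of that section, which is why the
speed of the IPI itself does not bound the result.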
> > >
> > > > * Second issue : irq disabling does not protect us from NMI and traps.
> > > > We cannot use this algorithm to mark these code segments.
> > >
> > > If you synchronize all the other processors and disable local interrupts
> > > then the only traps you have to worry about are those you cause, and the
> > > only person taking the trap will be you so you're ok.
> > >
> > > NMI is hard but NMI is a special case not worth solving IMHO.
> > >
> >
> > Not caring about NMIs may have more impact than one could expect. You
> > have to be aware that (at least) the following code is executed in NMI
> > context. Trying to patch any of these functions could result in a dying
> > CPU :
>
> There is a function to disable the nmi watchdog temporarily now
>
As you point out below, NMI is only one example of an event that irq
disabling cannot protect against; MCE and SMI are others. See below for
the rest of the discussion about this point.
>
> > In entry.S, there is also a call to local_irq_enable(), which falls into
> > lockdep code.
>
> ??
>
arch/i386/kernel/entry.S :

iret_exc:
        TRACE_IRQS_ON
        ENABLE_INTERRUPTS(CLBR_NONE)
        pushl $0                        # no error code
        pushl $do_iret_error
        jmp error_code

include/asm-i386/irqflags.h

# define TRACE_IRQS_ON                          \
        pushl %eax;                             \
        pushl %ecx;                             \
        pushl %edx;                             \
        call trace_hardirqs_on;                 \
        popl %edx;                              \
        popl %ecx;                              \
        popl %eax;

Which falls into the lockdep.c code.
> >
> > Tracing those core kernel functions is a fundamental need of crash
> > tracing. So, in my point of view, it is not "just" about tracing NMIs,
> > but it's about tracing code that can be touched by NMIs.
>
> You only need to handle the erratas during the modification, not during
> the whole lifetime of the marker.
>
I agree with you, but you need to make the modification of every callee
of functions such as printk() safe in order to be able to trace them
later.
> The only frequent NMIs are watchdog and oprofile which both can
> be stopped. Other NMIs are very infrequent.
If we race with an "infrequent" NMI with this algorithm, it will result
in an unspecified trap, most likely a GPF. So having a solution that is
correct most of the time is not an option here. It will not just cause a
glitch, but bring the whole system down.
>
> BTW if you worry about NMI you would need to worry about machine
> check and SMI too.
>
Absolutely. I use NMIs as an example of these conditions, but MCE and
SMI also present the same issue.
So we get another example (I am sure we could easily find more):

arch/i386/kernel/cpu/mcheck/p4.c:intel_machine_check
  printk (and everything that printk calls)
    vprintk
    printk_clock
    sched_clock
    release_console_sem
    call_console_drivers
    .. therefore serial port code..
    wake_up_klogd
    wake_up_interruptible
    try_to_wake_up

So the mere fact that the MCE handler uses printk means that scheduler
code gets executed. If you plan not to support NMI, MCE or SMI, you have
to forbid instrumentation of any of those code paths.
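
As a rough illustration of what "forbid instrumentation of any of those
code paths" implies, here is a small user-space sketch that computes
which functions are reachable from the MCE handler. The caller/callee
edges are my assumed reading of the call chain listed above, not
something extracted from the sources, and the helper names
(reachable(), must_not_instrument()) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One caller -> callee edge of an illustrative call graph. */
struct edge {
    const char *caller;
    const char *callee;
};

/*
 * ASSUMPTION: these edges are one plausible reading of the call chain in
 * this mail. They are not generated from the kernel sources.
 */
static const struct edge mce_edges[] = {
    { "intel_machine_check",   "printk" },
    { "printk",                "vprintk" },
    { "vprintk",               "printk_clock" },
    { "printk_clock",          "sched_clock" },
    { "vprintk",               "release_console_sem" },
    { "release_console_sem",   "call_console_drivers" },
    { "release_console_sem",   "wake_up_klogd" },
    { "wake_up_klogd",         "wake_up_interruptible" },
    { "wake_up_interruptible", "try_to_wake_up" },
};
#define N_EDGES (sizeof(mce_edges) / sizeof(mce_edges[0]))

/* Returns 1 if `target` is `from` itself or can be reached from it. */
static int reachable(const char *from, const char *target, unsigned depth)
{
    size_t i;

    if (strcmp(from, target) == 0)
        return 1;
    if (depth == 0)
        return 0;       /* bound the recursion in case of cycles */

    for (i = 0; i < N_EDGES; i++) {
        if (strcmp(mce_edges[i].caller, from) == 0 &&
            reachable(mce_edges[i].callee, target, depth - 1))
            return 1;
    }
    return 0;
}

/* A function reachable from the MCE handler cannot be safely patched. */
static int must_not_instrument(const char *fn)
{
    assert(fn != NULL);
    return reachable("intel_machine_check", fn, (unsigned)N_EDGES);
}
```

With these edges, must_not_instrument("try_to_wake_up") is true, so a
scheduler function ends up on the forbidden list only because the MCE
handler calls printk.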
Mathieu
--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68