From: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
To: Chuck Ebbert <cebbert@redhat.com>
Cc: akpm@linux-foundation.org, linux-kernel@vger.kernel.org
Subject: Re: [patch 4/8] Immediate Value - i386 Optimization
Date: Sun, 17 Jun 2007 13:50:09 -0400 [thread overview]
Message-ID: <20070617175009.GA931@Krystal> (raw)
In-Reply-To: <46730C5A.7080401@redhat.com>
* Chuck Ebbert (cebbert@redhat.com) wrote:
> On 06/15/2007 04:23 PM, Mathieu Desnoyers wrote:
> > i386 optimization of the immediate values which uses a movl with code patching
> > to set/unset the value used to populate the register used for the branch test.
> >
> > Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
> > ---
> > arch/i386/kernel/Makefile | 1
> > arch/i386/kernel/immediate.c | 177 +++++++++++++++++++++++++++++++++++++++++++
> > include/asm-i386/immediate.h | 72 +++++++++++++++++
> > 3 files changed, 249 insertions(+), 1 deletion(-)
> >
> > Index: linux-2.6-lttng/include/asm-i386/immediate.h
> > ===================================================================
> > --- linux-2.6-lttng.orig/include/asm-i386/immediate.h 2007-06-15 16:13:55.000000000 -0400
> > +++ linux-2.6-lttng/include/asm-i386/immediate.h 2007-06-15 16:14:04.000000000 -0400
> > @@ -1 +1,71 @@
> > -#include <asm-generic/immediate.h>
> > +#ifndef _ASM_I386_IMMEDIATE_H
> > +#define _ASM_I386_IMMEDIATE_H
> > +
> > +/*
> > + * Immediate values. i386 architecture optimizations.
> > + *
> > + * (C) Copyright 2006 Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
> > + *
> > + * This file is released under the GPLv2.
> > + * See the file COPYING for more details.
> > + */
> > +
> > +#define IF_DEFAULT (IF_OPTIMIZED | IF_LOCKDEP)
> > +
> > +/*
> > + * Optimized version of the immediate. Passing the flags as a pointer to
> > + * the inline assembly to trick it into putting the flags value as third
> > + * parameter in the structure.
> > + */
> > +#define immediate_optimized(flags, var) \
> > + ({ \
> > + int condition; \
> > + asm ( ".section __immediate, \"a\", @progbits;\n\t" \
> > + ".long %1, 0f, %2;\n\t" \
> > + ".previous;\n\t" \
> > + "0:\n\t" \
> > + "movl %3,%0;\n\t" \
> > + : "=r" (condition) \
> > + : "m" (var), \
> > + "m" (*(char*)flags), \
> > + "i" (0)); \
> > + (condition); \
>
Hi Chuck,
> Unnecessary parens.
>
ok, fixed in each immediate.h.
> > + })
> > +
> > +/*
> > + * immediate macro selecting the generic or optimized version of immediate,
> > + * depending on the flags specified. It is a macro because we need to pass the
> > + * name to immediate_optimized() and immediate_generic() so they can declare a
> > + * static variable with it.
> > + */
> > +#define _immediate(flags, var) \
> > +({ \
> > + (((flags) & IF_LOCKDEP) && ((flags) & IF_OPTIMIZED)) ? \
> > + immediate_optimized(flags, var) : \
> > + immediate_generic(flags, var); \
> > +})
> > +
> > +/* immediate with default behavior */
> > +#define immediate(var) _immediate(IF_DEFAULT, var)
> > +
> > +/*
> > + * Architecture dependant immediate information, used internally for immediate
> > + * activation.
> > + */
> > +
> > +/*
> > + * Offset of the immediate value from the start of the movl instruction, in
> > + * bytes. We point to the first lower byte of the 4 bytes immediate value. Only
> > + * changing one byte makes sure we do an atomic memory write, independently of
> > + * the alignment of the 4 bytes in the load immediate instruction.
> > + */
> > +#define IMMEDIATE_OPTIMIZED_ENABLE_IMMEDIATE_OFFSET 1
> > +#define IMMEDIATE_OPTIMIZED_ENABLE_TYPE unsigned char
> > +/* Dereference enable as lvalue from a pointer to its instruction */
> > +#define IMMEDIATE_OPTIMIZED_ENABLE(a) \
> > + (*(IMMEDIATE_OPTIMIZED_ENABLE_TYPE*) \
> > + ((char*)(a)+IMMEDIATE_OPTIMIZED_ENABLE_IMMEDIATE_OFFSET))
> > +
> > +extern int immediate_optimized_set_enable(void *address, char enable);
> > +
> > +#endif /* _ASM_I386_IMMEDIATE_H */
> > Index: linux-2.6-lttng/arch/i386/kernel/Makefile
> > ===================================================================
> > --- linux-2.6-lttng.orig/arch/i386/kernel/Makefile 2007-06-15 16:13:51.000000000 -0400
> > +++ linux-2.6-lttng/arch/i386/kernel/Makefile 2007-06-15 16:14:04.000000000 -0400
> > @@ -35,6 +35,7 @@
> > obj-y += sysenter.o vsyscall.o
> > obj-$(CONFIG_ACPI_SRAT) += srat.o
> > obj-$(CONFIG_EFI) += efi.o efi_stub.o
> > +obj-$(CONFIG_IMMEDIATE) += immediate.o
> > obj-$(CONFIG_DOUBLEFAULT) += doublefault.o
> > obj-$(CONFIG_SERIAL_8250) += legacy_serial.o
> > obj-$(CONFIG_VM86) += vm86.o
> > Index: linux-2.6-lttng/arch/i386/kernel/immediate.c
> > ===================================================================
> > --- /dev/null 1970-01-01 00:00:00.000000000 +0000
> > +++ linux-2.6-lttng/arch/i386/kernel/immediate.c 2007-06-15 16:14:04.000000000 -0400
> > @@ -0,0 +1,177 @@
> > +/*
> > + * Immediate Value - i386 architecture specific code.
> > + *
> > + * Rationale
> > + *
> > + * Required because of :
> > + * - Erratum 49 fix for Intel PIII.
> > + * - Still present on newer processors : Intel Core 2 Duo Processor for Intel
> > + * Centrino Duo Processor Technology Specification Update, AH33.
> > + * Unsynchronized Cross-Modifying Code Operations Can Cause Unexpected
> > + * Instruction Execution Results.
> > + *
> > + * Permits immediate value modification by XMC with correct serialization.
> > + *
> > + * Reentrant for NMI and trap handler instrumentation. Permits XMC to a
> > + * location that has preemption enabled because it involves no temporary or
> > + * reused data structure.
> > + *
> > + * Quoting Richard J Moore, source of the information motivating this
> > + * implementation which differs from the one proposed by Intel which is not
> > + * suitable for kernel context (does not support NMI and would require disabling
> > + * interrupts on every CPU for a long period) :
> > + *
> > + * "There is another issue to consider when looking into using probes other
> > + * then int3:
> > + *
> > + * Intel erratum 54 - Unsynchronized Cross-modifying code - refers to the
> > + * practice of modifying code on one processor where another has prefetched
> > + * the unmodified version of the code. Intel states that unpredictable general
> > + * protection faults may result if a synchronizing instruction (iret, int,
> > + * int3, cpuid, etc ) is not executed on the second processor before it
> > + * executes the pre-fetched out-of-date copy of the instruction.
> > + *
> > + * When we became aware of this I had a long discussion with Intel's
> > + * microarchitecture guys. It turns out that the reason for this erratum
> > + * (which incidentally Intel does not intend to fix) is because the trace
> > + * cache - the stream of micorops resulting from instruction interpretation -
> > + * cannot guaranteed to be valid. Reading between the lines I assume this
> > + * issue arises because of optimization done in the trace cache, where it is
> > + * no longer possible to identify the original instruction boundaries. If the
> > + * CPU discoverers that the trace cache has been invalidated because of
> > + * unsynchronized cross-modification then instruction execution will be
> > + * aborted with a GPF. Further discussion with Intel revealed that replacing
> > + * the first opcode byte with an int3 would not be subject to this erratum.
> > + *
> > + * So, is cmpxchg reliable? One has to guarantee more than mere atomicity."
> > + *
> > + * Overall design
> > + *
> > + * The algorithm proposed by Intel applies not so well in kernel context: it
> > + * would imply disabling interrupts and looping on every CPUs while modifying
> > + * the code and would not support instrumentation of code called from interrupt
> > + * sources that cannot be disabled.
> > + *
> > + * Therefore, we use a different algorithm to respect Intel's erratum (see the
> > + * quoted discussion above). We make sure that no CPU sees an out-of-date copy
> > + * of a pre-fetched instruction by 1 - using a breakpoint, which skips the
> > + * instruction that is going to be modified, 2 - issuing an IPI to every CPU to
> > + * execute a sync_core(), to make sure that even when the breakpoint is removed,
> > + * no cpu could possibly still have the out-of-date copy of the instruction,
> > + * modify the now unused 2nd byte of the instruction, and then put back the
> > + * original 1st byte of the instruction.
> > + *
> > + * It has exactly the same intent as the algorithm proposed by Intel, but
> > + * it has less side-effects, scales better and supports NMI, SMI and MCE.
>
> Algorithm looks plausible, was it really tested?
>
Yes, but I can't reproduce the erroneous condition on my setup. I just
realized that I did not test it after my last changes. Therefore, see
the following comments inline for the upcoming fixes :
> > + *
> > + * Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
> > + */
> > +
> > +#include <linux/notifier.h>
> > +#include <linux/preempt.h>
> > +#include <linux/smp.h>
> > +#include <linux/notifier.h>
> > +#include <linux/module.h>
> > +#include <linux/immediate.h>
> > +#include <linux/kdebug.h>
> > +
> > +#include <asm/cacheflush.h>
> > +
> > +#define BREAKPOINT_INSTRUCTION 0xcc
> > +#define BREAKPOINT_INS_LEN 1
> > +
> > +static long target_eip;
> > +
> > +static void immediate_synchronize_core(void *info)
> > +{
> > + sync_core(); /* use cpuid to stop speculative execution */
> > +}
> > +
> > +/*
> > + * The eip value points right after the breakpoint instruction, in the second
> > + * byte of the movb. Incrementing it of 1 byte makes the code resume right after
> > + * the movb instruction, effectively skipping this instruction.
> > + *
> > + * We simply skip the 2 bytes load immediate here, leaving the register in an
> > + * undefined state. We don't care about the content (0 or !0), because we are
> > + * changing the value 0->1 or 1->0. This small window of undefined value
> > + * doesn't matter.
> > + */
> > +static int immediate_notifier(struct notifier_block *nb,
> > + unsigned long val, void *data)
> > +{
> > + enum die_val die_val = (enum die_val) val;
> > + struct die_args *args = data;
> > +
> > + if (!args->regs || user_mode_vm(args->regs))
> > + return NOTIFY_DONE;
> > +
> > + if (die_val == DIE_INT3 && args->regs->eip == target_eip) {
> > + args->regs->eip += 1; /* Skip the next byte of load immediate */
>
> <nitpick>
> args->regs->eip++;
>
Should now be args->regs->eip += 4; because I jump over the last 4 bytes
of the 5 bytes long movl instead of jumping over the last byte of the 2 bytes
movb I used previously.
I also have to use the int3 handler installed by CONFIG_KPROBES when
CONFIG_IMMEDIATE is used so I do not depend on KPROBES for i386. It will
also be fixed in the next release.
> > + return NOTIFY_STOP;
> > + }
> > + return NOTIFY_DONE;
> > +}
> > +
> > +static struct notifier_block immediate_notify = {
> > + .notifier_call = immediate_notifier,
> > + .priority = 0x7fffffff, /* we need to be notified first */
> > +};
> > +
> > +/*
> > + * The address is not aligned. We can only change 1 byte of the value
> > + * atomically.
> > + * Must be called with immediate_mutex held.
> > + */
> > +int immediate_optimized_set_enable(void *address, char enable)
> > +{
> > + char saved_byte;
> > + int ret;
> > + char *dest = address;
> > +
> > + if (!(enable ^ dest[1])) /* Must be a state change 0<->1 to execute */
> > + return 0;
> > +
> > +#if defined(CONFIG_DEBUG_RODATA) || defined(CONFIG_DEBUG_PAGEALLOC)
> > + /* Make sure this page is writable */
> > + change_page_attr(virt_to_page(address), 1, PAGE_KERNEL_EXEC);
> > + global_flush_tlb();
> > +#endif
>
> Can't we have a macro or inline to do this, and the setting back
> to read-only? kprobes also has the same ugly #if blocks...
>
I guess it would be better to share a common macro between kprobes and
immediate for this.
> Hmm, what happens if you race with kprobes setting a probe on
> the same page? Couldn't it unexpectedly become read-only?
>
Sure it can. Is there any spinlock already there that could be used by
different objects and would fit this purpose?
> > + target_eip = (long)address + BREAKPOINT_INS_LEN;
> > + /* register_die_notifier has memory barriers */
> > + register_die_notifier(&immediate_notify);
> > + saved_byte = *dest;
> > + *dest = BREAKPOINT_INSTRUCTION;
> > + wmb();
> > + /*
> > + * Execute serializing instruction on each CPU.
> > + * Acts as a memory barrier.
> > + */
> > + ret = on_each_cpu(immediate_synchronize_core, NULL, 1, 1);
> > + BUG_ON(ret != 0);
> > +
> > + dest[1] = enable;
> > + wmb();
> > + *dest = saved_byte;
> > + /*
> > + * Wait for all int3 handlers to end
> > + * (interrupts are disabled in int3).
> > + * This CPU is clearly not in a int3 handler,
> > + * because int3 handler is not preemptible and
> > + * there cannot be any more int3 handler called
> > + * for this site, because we placed the original
> > + * instruction back.
> > + * synchronize_sched has memory barriers.
> > + */
> > + synchronize_sched();
> > + unregister_die_notifier(&immediate_notify);
> > + /* unregister_die_notifier has memory barriers */
> > + target_eip = 0;
> > +#if defined(CONFIG_DEBUG_RODATA)
> > + /* Set this page back to RX */
> > + change_page_attr(virt_to_page(address), 1, PAGE_KERNEL_RX);
> > + global_flush_tlb();
> > +#endif
> > + return 0;
> > +}
> > +EXPORT_SYMBOL_GPL(immediate_optimized_set_enable);
> >
>
--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
next prev parent reply other threads:[~2007-06-17 18:19 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-06-15 20:23 [patch 0/8] Immediate values for fast branches Mathieu Desnoyers
2007-06-15 20:23 ` [patch 1/8] Immediate Value - Architecture Independent Code Mathieu Desnoyers
2007-06-15 20:23 ` [patch 2/8] Immediate Values - Non Optimized Architectures Mathieu Desnoyers
2007-06-15 20:23 ` [patch 3/8] Immediate Value - Add kconfig menus Mathieu Desnoyers
2007-06-15 20:23 ` [patch 4/8] Immediate Value - i386 Optimization Mathieu Desnoyers
2007-06-15 22:02 ` Chuck Ebbert
2007-06-17 17:50 ` Mathieu Desnoyers [this message]
2007-06-18 14:57 ` [patch 4/8] Immediate Value - i386 Optimization; kprobes Mathieu Desnoyers
2007-06-18 18:44 ` Chuck Ebbert
2007-06-18 18:56 ` Andrew Morton
2007-06-18 19:27 ` Mathieu Desnoyers
2007-06-18 19:32 ` Andi Kleen
2007-06-18 20:16 ` Chuck Ebbert
2007-06-19 10:06 ` [patch 1/2] kprobes i386 quick fix mark-ro-data S. P. Prasanna
2007-06-19 10:08 ` [patch 2/2] kprobes x86_64 " S. P. Prasanna
2007-06-19 13:21 ` Arjan van de Ven
2007-06-19 13:30 ` Mathieu Desnoyers
2007-06-19 13:44 ` Arjan van de Ven
2007-06-19 14:31 ` S. P. Prasanna
2007-06-19 16:47 ` [patch 1/2] kprobes i386 " Andi Kleen
2007-06-15 20:23 ` [patch 5/8] Immediate Value - PowerPC Optimization Mathieu Desnoyers
2007-06-15 20:23 ` [patch 6/8] Immediate Value - Documentation Mathieu Desnoyers
2007-06-15 20:23 ` [patch 7/8] F00F bug fixup for i386 - use immediate values Mathieu Desnoyers
2007-06-15 20:23 ` [patch 8/8] Scheduler profiling - Use " Mathieu Desnoyers
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20070617175009.GA931@Krystal \
--to=mathieu.desnoyers@polymtl.ca \
--cc=akpm@linux-foundation.org \
--cc=cebbert@redhat.com \
--cc=linux-kernel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.