public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
To: Chuck Ebbert <cebbert@redhat.com>
Cc: akpm@linux-foundation.org, linux-kernel@vger.kernel.org
Subject: Re: [patch 4/8] Immediate Value - i386 Optimization
Date: Sun, 17 Jun 2007 13:50:09 -0400	[thread overview]
Message-ID: <20070617175009.GA931@Krystal> (raw)
In-Reply-To: <46730C5A.7080401@redhat.com>

* Chuck Ebbert (cebbert@redhat.com) wrote:
> On 06/15/2007 04:23 PM, Mathieu Desnoyers wrote:
> > i386 optimization of the immediate values which uses a movl with code patching
> > to set/unset the value used to populate the register used for the branch test.
> > 
> > Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
> > ---
> >  arch/i386/kernel/Makefile    |    1 
> >  arch/i386/kernel/immediate.c |  177 +++++++++++++++++++++++++++++++++++++++++++
> >  include/asm-i386/immediate.h |   72 +++++++++++++++++
> >  3 files changed, 249 insertions(+), 1 deletion(-)
> > 
> > Index: linux-2.6-lttng/include/asm-i386/immediate.h
> > ===================================================================
> > --- linux-2.6-lttng.orig/include/asm-i386/immediate.h	2007-06-15 16:13:55.000000000 -0400
> > +++ linux-2.6-lttng/include/asm-i386/immediate.h	2007-06-15 16:14:04.000000000 -0400
> > @@ -1 +1,71 @@
> > -#include <asm-generic/immediate.h>
> > +#ifndef _ASM_I386_IMMEDIATE_H
> > +#define _ASM_I386_IMMEDIATE_H
> > +
> > +/*
> > + * Immediate values. i386 architecture optimizations.
> > + *
> > + * (C) Copyright 2006 Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
> > + *
> > + * This file is released under the GPLv2.
> > + * See the file COPYING for more details.
> > + */
> > +
> > +#define IF_DEFAULT (IF_OPTIMIZED | IF_LOCKDEP)
> > +
> > +/*
> > + * Optimized version of the immediate. Passing the flags as a pointer to
> > + * the inline assembly to trick it into putting the flags value as third
> > + * parameter in the structure.
> > + */
> > +#define immediate_optimized(flags, var)					\
> > +	({								\
> > +		int condition;						\
> > +		asm (	".section __immediate, \"a\", @progbits;\n\t"	\
> > +					".long %1, 0f, %2;\n\t"		\
> > +					".previous;\n\t"		\
> > +					"0:\n\t"			\
> > +					"movl %3,%0;\n\t"		\
> > +				: "=r" (condition)			\
> > +				: "m" (var),				\
> > +				  "m" (*(char*)flags),			\
> > +				  "i" (0));				\
> > +		(condition);						\
> 
Hi Chuck,

> Unnecessary parens.
> 

ok, fixed in each immediate.h.

> > +	})
> > +
> > +/*
> > + * immediate macro selecting the generic or optimized version of immediate,
> > + * depending on the flags specified. It is a macro because we need to pass the
> > + * name to immediate_optimized() and immediate_generic() so they can declare a
> > + * static variable with it.
> > + */
> > +#define _immediate(flags, var)						\
> > +({									\
> > +	(((flags) & IF_LOCKDEP) && ((flags) & IF_OPTIMIZED)) ?		\
> > +		immediate_optimized(flags, var) :			\
> > +		immediate_generic(flags, var);				\
> > +})
> > +
> > +/* immediate with default behavior */
> > +#define immediate(var)	_immediate(IF_DEFAULT, var)
> > +
> > +/*
> > + * Architecture dependant immediate information, used internally for immediate
> > + * activation.
> > + */
> > +
> > +/*
> > + * Offset of the immediate value from the start of the movl instruction, in
> > + * bytes. We point to the first lower byte of the 4 bytes immediate value. Only
> > + * changing one byte makes sure we do an atomic memory write, independently of
> > + * the alignment of the 4 bytes in the load immediate instruction.
> > + */
> > +#define IMMEDIATE_OPTIMIZED_ENABLE_IMMEDIATE_OFFSET 1
> > +#define IMMEDIATE_OPTIMIZED_ENABLE_TYPE unsigned char
> > +/* Dereference enable as lvalue from a pointer to its instruction */
> > +#define IMMEDIATE_OPTIMIZED_ENABLE(a)					\
> > +	(*(IMMEDIATE_OPTIMIZED_ENABLE_TYPE*)				\
> > +		((char*)(a)+IMMEDIATE_OPTIMIZED_ENABLE_IMMEDIATE_OFFSET))
> > +
> > +extern int immediate_optimized_set_enable(void *address, char enable);
> > +
> > +#endif /* _ASM_I386_IMMEDIATE_H */
> > Index: linux-2.6-lttng/arch/i386/kernel/Makefile
> > ===================================================================
> > --- linux-2.6-lttng.orig/arch/i386/kernel/Makefile	2007-06-15 16:13:51.000000000 -0400
> > +++ linux-2.6-lttng/arch/i386/kernel/Makefile	2007-06-15 16:14:04.000000000 -0400
> > @@ -35,6 +35,7 @@
> >  obj-y				+= sysenter.o vsyscall.o
> >  obj-$(CONFIG_ACPI_SRAT) 	+= srat.o
> >  obj-$(CONFIG_EFI) 		+= efi.o efi_stub.o
> > +obj-$(CONFIG_IMMEDIATE)		+= immediate.o
> >  obj-$(CONFIG_DOUBLEFAULT) 	+= doublefault.o
> >  obj-$(CONFIG_SERIAL_8250)	+= legacy_serial.o
> >  obj-$(CONFIG_VM86)		+= vm86.o
> > Index: linux-2.6-lttng/arch/i386/kernel/immediate.c
> > ===================================================================
> > --- /dev/null	1970-01-01 00:00:00.000000000 +0000
> > +++ linux-2.6-lttng/arch/i386/kernel/immediate.c	2007-06-15 16:14:04.000000000 -0400
> > @@ -0,0 +1,177 @@
> > +/*
> > + * Immediate Value - i386 architecture specific code.
> > + *
> > + * Rationale
> > + *
> > + * Required because of :
> > + * - Erratum 49 fix for Intel PIII.
> > + * - Still present on newer processors : Intel Core 2 Duo Processor for Intel
> > + *   Centrino Duo Processor Technology Specification Update, AH33.
> > + *   Unsynchronized Cross-Modifying Code Operations Can Cause Unexpected
> > + *   Instruction Execution Results.
> > + *
> > + * Permits immediate value modification by XMC with correct serialization.
> > + *
> > + * Reentrant for NMI and trap handler instrumentation. Permits XMC to a
> > + * location that has preemption enabled because it involves no temporary or
> > + * reused data structure.
> > + *
> > + * Quoting Richard J Moore, source of the information motivating this
> > + * implementation which differs from the one proposed by Intel which is not
> > + * suitable for kernel context (does not support NMI and would require disabling
> > + * interrupts on every CPU for a long period) :
> > + *
> > + * "There is another issue to consider when looking into using probes other
> > + * then int3:
> > + *
> > + * Intel erratum 54 - Unsynchronized Cross-modifying code - refers to the
> > + * practice of modifying code on one processor where another has prefetched
> > + * the unmodified version of the code. Intel states that unpredictable general
> > + * protection faults may result if a synchronizing instruction (iret, int,
> > + * int3, cpuid, etc ) is not executed on the second processor before it
> > + * executes the pre-fetched out-of-date copy of the instruction.
> > + *
> > + * When we became aware of this I had a long discussion with Intel's
> > + * microarchitecture guys. It turns out that the reason for this erratum
> > + * (which incidentally Intel does not intend to fix) is because the trace
> > + * cache - the stream of micorops resulting from instruction interpretation -
> > + * cannot guaranteed to be valid. Reading between the lines I assume this
> > + * issue arises because of optimization done in the trace cache, where it is
> > + * no longer possible to identify the original instruction boundaries. If the
> > + * CPU discoverers that the trace cache has been invalidated because of
> > + * unsynchronized cross-modification then instruction execution will be
> > + * aborted with a GPF. Further discussion with Intel revealed that replacing
> > + * the first opcode byte with an int3 would not be subject to this erratum.
> > + *
> > + * So, is cmpxchg reliable? One has to guarantee more than mere atomicity."
> > + *
> > + * Overall design
> > + *
> > + * The algorithm proposed by Intel applies not so well in kernel context: it
> > + * would imply disabling interrupts and looping on every CPUs while modifying
> > + * the code and would not support instrumentation of code called from interrupt
> > + * sources that cannot be disabled.
> > + *
> > + * Therefore, we use a different algorithm to respect Intel's erratum (see the
> > + * quoted discussion above). We make sure that no CPU sees an out-of-date copy
> > + * of a pre-fetched instruction by 1 - using a breakpoint, which skips the
> > + * instruction that is going to be modified, 2 - issuing an IPI to every CPU to
> > + * execute a sync_core(), to make sure that even when the breakpoint is removed,
> > + * no cpu could possibly still have the out-of-date copy of the instruction,
> > + * modify the now unused 2nd byte of the instruction, and then put back the
> > + * original 1st byte of the instruction.
> > + *
> > + * It has exactly the same intent as the algorithm proposed by Intel, but
> > + * it has less side-effects, scales better and supports NMI, SMI and MCE.
> 
> Algorithm looks plausible, was it really tested?
> 

Yes, but I can't reproduce the erroneous condition on my setup. I just
realized that I did not test it after my last changes. Therefore, see
the following comments inline for the upcoming fixes :

> > + *
> > + * Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
> > + */
> > +
> > +#include <linux/notifier.h>
> > +#include <linux/preempt.h>
> > +#include <linux/smp.h>
> > +#include <linux/notifier.h>
> > +#include <linux/module.h>
> > +#include <linux/immediate.h>
> > +#include <linux/kdebug.h>
> > +
> > +#include <asm/cacheflush.h>
> > +
> > +#define BREAKPOINT_INSTRUCTION  0xcc
> > +#define BREAKPOINT_INS_LEN 1
> > +
> > +static long target_eip;
> > +
> > +static void immediate_synchronize_core(void *info)
> > +{
> > +	sync_core();	/* use cpuid to stop speculative execution */
> > +}
> > +
> > +/*
> > + * The eip value points right after the breakpoint instruction, in the second
> > + * byte of the movb. Incrementing it of 1 byte makes the code resume right after
> > + * the movb instruction, effectively skipping this instruction.
> > + *
> > + * We simply skip the 2 bytes load immediate here, leaving the register in an
> > + * undefined state. We don't care about the content (0 or !0), because we are
> > + * changing the value 0->1 or 1->0. This small window of undefined value
> > + * doesn't matter.
> > + */
> > +static int immediate_notifier(struct notifier_block *nb,
> > +	unsigned long val, void *data)
> > +{
> > +	enum die_val die_val = (enum die_val) val;
> > +	struct die_args *args = data;
> > +
> > +	if (!args->regs || user_mode_vm(args->regs))
> > +		return NOTIFY_DONE;
> > +
> > +	if (die_val == DIE_INT3	&& args->regs->eip == target_eip) {
> > +		args->regs->eip += 1; /* Skip the next byte of load immediate */
> 
> <nitpick>
> 		args->regs->eip++;
> 

Should now be args->regs->eip += 4; because I jump over the last 4 bytes
of the 5 bytes long movl instead of jumping over the last byte of the 2 bytes
movb I used previously.

I also have to use the int3 handler installed by CONFIG_KPROBES when
CONFIG_IMMEDIATE is used so I do not depend on KPROBES for i386. It will
also be fixed in the next release.

> > +		return NOTIFY_STOP;
> > +	}
> > +	return NOTIFY_DONE;
> > +}
> > +
> > +static struct notifier_block immediate_notify = {
> > +	.notifier_call = immediate_notifier,
> > +	.priority = 0x7fffffff,	/* we need to be notified first */
> > +};
> > +
> > +/*
> > + * The address is not aligned. We can only change 1 byte of the value
> > + * atomically.
> > + * Must be called with immediate_mutex held.
> > + */
> > +int immediate_optimized_set_enable(void *address, char enable)
> > +{
> > +	char saved_byte;
> > +	int ret;
> > +	char *dest = address;
> > +
> > +	if (!(enable ^ dest[1])) /* Must be a state change 0<->1 to execute */
> > +		return 0;
> > +
> > +#if defined(CONFIG_DEBUG_RODATA) || defined(CONFIG_DEBUG_PAGEALLOC)
> > +	/* Make sure this page is writable */
> > +	change_page_attr(virt_to_page(address), 1, PAGE_KERNEL_EXEC);
> > +	global_flush_tlb();
> > +#endif
> 
> Can't we have a macro or inline to do this, and the setting back
> to read-only? kprobes also has the same ugly #if blocks...
> 
I guess it would be better to share a common macro between kprobes and
immediate for this.

> Hmm, what happens if you race with kprobes setting a probe on
> the same page? Couldn't it unexpectedly become read-only?
> 

Sure it can. Is there any spinlock already there that could be used by
different objects and would fit this purpose?

> > +	target_eip = (long)address + BREAKPOINT_INS_LEN;
> > +	/* register_die_notifier has memory barriers */
> > +	register_die_notifier(&immediate_notify);
> > +	saved_byte = *dest;
> > +	*dest = BREAKPOINT_INSTRUCTION;
> > +	wmb();
> > +	/*
> > +	 * Execute serializing instruction on each CPU.
> > +	 * Acts as a memory barrier.
> > +	 */
> > +	ret = on_each_cpu(immediate_synchronize_core, NULL, 1, 1);
> > +	BUG_ON(ret != 0);
> > +
> > +	dest[1] = enable;
> > +	wmb();
> > +	*dest = saved_byte;
> > +		/*
> > +		 * Wait for all int3 handlers to end
> > +		 * (interrupts are disabled in int3).
> > +		 * This CPU is clearly not in a int3 handler,
> > +		 * because int3 handler is not preemptible and
> > +		 * there cannot be any more int3 handler called
> > +		 * for this site, because we placed the original
> > +		 * instruction back.
> > +		 * synchronize_sched has memory barriers.
> > +		 */
> > +	synchronize_sched();
> > +	unregister_die_notifier(&immediate_notify);
> > +	/* unregister_die_notifier has memory barriers */
> > +	target_eip = 0;
> > +#if defined(CONFIG_DEBUG_RODATA)
> > +	/* Set this page back to RX */
> > +	change_page_attr(virt_to_page(address), 1, PAGE_KERNEL_RX);
> > +	global_flush_tlb();
> > +#endif
> > +	return 0;
> > +}
> > +EXPORT_SYMBOL_GPL(immediate_optimized_set_enable);
> > 
> 

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

  reply	other threads:[~2007-06-17 18:19 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-06-15 20:23 [patch 0/8] Immediate values for fast branches Mathieu Desnoyers
2007-06-15 20:23 ` [patch 1/8] Immediate Value - Architecture Independent Code Mathieu Desnoyers
2007-06-15 20:23 ` [patch 2/8] Immediate Values - Non Optimized Architectures Mathieu Desnoyers
2007-06-15 20:23 ` [patch 3/8] Immediate Value - Add kconfig menus Mathieu Desnoyers
2007-06-15 20:23 ` [patch 4/8] Immediate Value - i386 Optimization Mathieu Desnoyers
2007-06-15 22:02   ` Chuck Ebbert
2007-06-17 17:50     ` Mathieu Desnoyers [this message]
2007-06-18 14:57     ` [patch 4/8] Immediate Value - i386 Optimization; kprobes Mathieu Desnoyers
2007-06-18 18:44       ` Chuck Ebbert
2007-06-18 18:56         ` Andrew Morton
2007-06-18 19:27           ` Mathieu Desnoyers
2007-06-18 19:32           ` Andi Kleen
2007-06-18 20:16             ` Chuck Ebbert
2007-06-19 10:06             ` [patch 1/2] kprobes i386 quick fix mark-ro-data S. P. Prasanna
2007-06-19 10:08               ` [patch 2/2] kprobes x86_64 " S. P. Prasanna
2007-06-19 13:21                 ` Arjan van de Ven
2007-06-19 13:30                   ` Mathieu Desnoyers
2007-06-19 13:44                     ` Arjan van de Ven
2007-06-19 14:31                       ` S. P. Prasanna
2007-06-19 16:47               ` [patch 1/2] kprobes i386 " Andi Kleen
2007-06-15 20:23 ` [patch 5/8] Immediate Value - PowerPC Optimization Mathieu Desnoyers
2007-06-15 20:23 ` [patch 6/8] Immediate Value - Documentation Mathieu Desnoyers
2007-06-15 20:23 ` [patch 7/8] F00F bug fixup for i386 - use immediate values Mathieu Desnoyers
2007-06-15 20:23 ` [patch 8/8] Scheduler profiling - Use " Mathieu Desnoyers

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20070617175009.GA931@Krystal \
    --to=mathieu.desnoyers@polymtl.ca \
    --cc=akpm@linux-foundation.org \
    --cc=cebbert@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox