public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Andrew Morton <akpm@linux-foundation.org>
To: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: linux-kernel@vger.kernel.org
Subject: Re: [patch 5/9] Conditional Calls - i386 Optimization
Date: Wed, 30 May 2007 13:33:28 -0700	[thread overview]
Message-ID: <20070530133328.03352d4b.akpm@linux-foundation.org> (raw)
In-Reply-To: <20070530140228.349272966@polymtl.ca>

On Wed, 30 May 2007 10:00:30 -0400
Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> wrote:

> i386 optimization of the cond_calls which uses a movb with code patching to
> set/unset the value used to populate the register used for the branch test.
> 
> +/*
> + * Conditional function calls. i386 architecture optimizations.
> + *
> + * (C) Copyright 2006 Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
> + *
> + * This file is released under the GPLv2.
> + * See the file COPYING for more details.
> + */
> +
> +
> +#ifdef CONFIG_COND_CALL
> +
> +#define CF_DEFAULT (CF_OPTIMIZED | CF_LOCKDEP | CF_PRINTK)
> +
> +/* Optimized version of the cond_call. Passing the flags as a pointer to
> + * the inline assembly to trick it into putting the flags value as third
> + * parameter in the structure. */
> +#define cond_call_optimized(flags, name, func) \
> +	({ \
> +		static const char __cstrtab_name_##name[] \
> +		__attribute__((section("__cond_call_strings"))) = #name; \
> +		char condition; \
> +		asm (	".section __cond_call, \"a\", @progbits;\n\t" \
> +					".long %1, 0f, %2;\n\t" \
> +					".previous;\n\t" \
> +					".align 2\n\t" \
> +					"0:\n\t" \
> +					"movb %3,%0;\n\t" \
> +				: "=r" (condition) \
> +				: "m" (*__cstrtab_name_##name), \
> +				  "m" (*(char*)flags), \
> +				  "i" ((flags) & CF_STATIC_ENABLE)); \
> +		(likely(!condition)) ? \
> +			(__typeof__(func))0 : \
> +			(func); \
> +	})

People often tab the \ characters across to the right, make them line up
nicely.  It does look better.

> +/* cond_call macro selecting the generic or optimized version of cond_call,
> + * depending on the flags specified. */
> +#define _cond_call(flags, name, func) \
> +({ \
> +	(((flags) & CF_LOCKDEP) && ((flags) & CF_OPTIMIZED)) ? \
> +		cond_call_optimized(flags, name, func) : \
> +		cond_call_generic(flags, name, func); \
> +})

So it is a macro instead of a static inline because we need to emit an
entry into __cond_call_strings once per callsite, yes?

> +/* Architecture dependant cond_call information, used internally for cond_call
> + * activation. */
> +
> +/* Offset of the immediate value from the start of the movb instruction, in
> + * bytes. */
> +#define COND_CALL_OPTIMIZED_ENABLE_IMMEDIATE_OFFSET 1
> +#define COND_CALL_OPTIMIZED_ENABLE_TYPE unsigned char
> +/* Dereference enable as lvalue from a pointer to its instruction */
> +#define COND_CALL_OPTIMIZED_ENABLE(a) \
> +	*(COND_CALL_OPTIMIZED_ENABLE_TYPE*) \
> +		((char*)a+COND_CALL_OPTIMIZED_ENABLE_IMMEDIATE_OFFSET)

I suspect this is underparenthesised.

> +extern int cond_call_optimized_set_enable(void *address, char enable);
> +
> +#endif
> +#endif //_ASM_I386_CONDCALL_H
> --- /dev/null	1970-01-01 00:00:00.000000000 +0000
> +++ linux-2.6-lttng/arch/i386/kernel/condcall.c	2007-05-17 01:52:38.000000000 -0400
> @@ -0,0 +1,140 @@
> +/* condcall.c
> + *
> + * - Erratum 49 fix for Intel PIII.
> + * - Still present on newer processors : Intel Core 2 Duo Processor for Intel
> + *   Centrino Duo Processor Technology Specification Update, AH33.
> + *   Unsynchronized Cross-Modifying Code Operations Can Cause Unexpected
> + *   Instruction Execution Results.
> + *
> + * Permits cond_call activation by XMC with correct serialization.
> + *
> + * Reentrant for NMI and trap handler instrumentation. Permits XMC to a
> + * location that has preemption enabled because it involves no temporary or
> + * reused data structure.
> + *
> + * Quoting Richard J Moore, source of the information motivating this
> + * implementation which differs from the one proposed by Intel which is not
> + * suitable for kernel context (does not support NMI and would require disabling
> + * interrupts on every CPU for a long period) :
> + *
> + * "There is another issue to consider when looking into using probes other
> + * then int3:
> + *
> + * Intel erratum 54 - Unsynchronized Cross-modifying code - refers to the
> + * practice of modifying code on one processor where another has prefetched
> + * the unmodified version of the code. Intel states that unpredictable general
> + * protection faults may result if a synchronizing instruction (iret, int,
> + * int3, cpuid, etc ) is not executed on the second processor before it
> + * executes the pre-fetched out-of-date copy of the instruction.
> + *
> + * When we became aware of this I had a long discussion with Intel's
> + * microarchitecture guys. It turns out that the reason for this erratum
> + * (which incidentally Intel does not intend to fix) is because the trace
> + * cache - the stream of micorops resulting from instruction interpretation -
> + * cannot guaranteed to be valid. Reading between the lines I assume this
> + * issue arises because of optimization done in the trace cache, where it is
> + * no longer possible to identify the original instruction boundaries. If the
> + * CPU discoverers that the trace cache has been invalidated because of
> + * unsynchronized cross-modification then instruction execution will be
> + * aborted with a GPF. Further discussion with Intel revealed that replacing
> + * the first opcode byte with an int3 would not be subject to this erratum.
> + *
> + * So, is cmpxchg reliable? One has to guarantee more than mere atomicity."
> + *
> + * Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
> + */
> +
> +#include <linux/notifier.h>
> +#include <linux/mutex.h>
> +#include <linux/preempt.h>
> +#include <linux/smp.h>
> +#include <linux/notifier.h>
> +#include <linux/module.h>
> +#include <linux/condcall.h>
> +#include <linux/kdebug.h>
> +
> +#include <asm/cacheflush.h>
> +
> +#define BREAKPOINT_INSTRUCTION  0xcc
> +#define BREAKPOINT_INS_LEN 1
> +
> +static DEFINE_MUTEX(cond_call_mutex);
> +static long target_eip = 0;

s/= 0//

> +static void cond_call_synchronize_core(void *info)
> +{
> +	sync_core();	/* use cpuid to stop speculative execution */
> +}
> +
> +/* The eip value points right after the breakpoint instruction,

What breakpoint insn?  How did it get there?

>  in the second
> + * byte of the movb. Incrementing it of 1 byte makes the code resume right after
> + * the movb instruction, effectively skipping this instruction.
> + *
> + * We simply skip the 2 bytes load immediate here, leaving the register in an
> + * undefined state. We don't care about the content (0 or !0), because we are
> + * changing the value 0->1 or 1->0. This small window of undefined value
> + * doesn't matter.
> + */
> +static int cond_call_notifier(struct notifier_block *nb,
> +	unsigned long val, void *data)
> +{
> +	enum die_val die_val = (enum die_val) val;
> +	struct die_args *args = (struct die_args *)data;

Unneeded cast of void*

> +	if (!args->regs || user_mode_vm(args->regs))
> +		return NOTIFY_DONE;
> +
> +	if (die_val == DIE_INT3	&& args->regs->eip == target_eip) {
> +		args->regs->eip += 1; /* Skip the next byte of load immediate */
> +		return NOTIFY_STOP;
> +	}
> +	return NOTIFY_DONE;
> +}
> +
> +static struct notifier_block cond_call_notify = {
> +	.notifier_call = cond_call_notifier,
> +	.priority = 0x7fffffff,	/* we need to be notified first */
> +};
> +
> +int cond_call_optimized_set_enable(void *address, char enable)
> +{
> +	char saved_byte;
> +	int ret;
> +	char *dest = address;
> +
> +	mutex_lock(&cond_call_mutex);
> +
> +	if (!(enable ^ dest[1])) /* Must be a state change 0<->1 to execute */
> +		goto end;
> +
> +	target_eip = (long)address + BREAKPOINT_INS_LEN;
> +	/* register_die_notifier has memory barriers */
> +	register_die_notifier(&cond_call_notify);
> +	saved_byte = *dest;
> +	*dest = BREAKPOINT_INSTRUCTION;
> +	wmb();
> +	/* Execute serializing instruction on each CPU.
> +	 * Acts as a memory barrier. */
> +	ret = on_each_cpu(cond_call_synchronize_core, NULL, 1, 1);
> +	BUG_ON(ret != 0);
> +
> +	dest[1] = enable;
> +	wmb();
> +	*dest = saved_byte;
> +		/* Wait for all int3 handlers to end
> +		   (interrupts are disabled in int3).
> +		   This CPU is clearly not in a int3 handler,
> +		   because int3 handler is not preemptible and
> +		   there cannot be any more int3 handler called
> +		   for this site, because we placed the original
> +		   instruction back.
> +		   synchronize_sched has memory barriers */
> +	synchronize_sched();
> +	unregister_die_notifier(&cond_call_notify);
> +	/* unregister_die_notifier has memory barriers */
> +	target_eip = 0;
> +end:
> +	mutex_unlock(&cond_call_mutex);
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(cond_call_optimized_set_enable);

I'm not really able to review this material very usefully without having
seen an overall description of what it all does and how it all works.

What happens here when the text section is marked read-only, and when
CONFIG_DEBUG_PAGEALLOC is in use?


  reply	other threads:[~2007-05-30 20:34 UTC|newest]

Thread overview: 57+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-05-30 14:00 [patch 0/9] Conditional Calls - for 2.6.22-rc2-mm1 Mathieu Desnoyers
2007-05-30 14:00 ` [patch 1/9] Conditional Calls - Architecture Independent Code Mathieu Desnoyers
2007-05-30 20:32   ` Andrew Morton
2007-05-31 16:34     ` Mathieu Desnoyers
2007-05-31 13:47   ` Andi Kleen
2007-06-05 18:40     ` Mathieu Desnoyers
2007-06-04 19:01   ` Adrian Bunk
2007-06-13 15:57     ` Mathieu Desnoyers
2007-06-13 21:51       ` Adrian Bunk
2007-06-14 16:02         ` Mathieu Desnoyers
2007-06-14 21:06           ` Adrian Bunk
2007-06-20 21:59             ` Mathieu Desnoyers
2007-06-21 13:00               ` Adrian Bunk
2007-06-21 13:55                 ` Mathieu Desnoyers
2007-05-30 14:00 ` [patch 2/9] Conditional Calls - Hash Table Mathieu Desnoyers
2007-05-30 20:32   ` Andrew Morton
2007-05-31 13:42   ` Andi Kleen
2007-06-01 16:08     ` Matt Mackall
2007-06-01 16:46       ` Mathieu Desnoyers
2007-06-01 17:07         ` Matt Mackall
2007-06-01 17:45           ` Andi Kleen
2007-06-01 18:06             ` Mathieu Desnoyers
2007-06-01 18:49               ` Matt Mackall
2007-06-01 19:35               ` Andi Kleen
2007-06-01 20:33                 ` Mathieu Desnoyers
2007-06-01 20:44                   ` Andi Kleen
2007-06-04 22:26                     ` Mathieu Desnoyers
2007-06-01 18:03           ` Mathieu Desnoyers
2007-05-30 14:00 ` [patch 3/9] Conditional Calls - Non Optimized Architectures Mathieu Desnoyers
2007-05-30 14:00 ` [patch 4/9] Conditional Calls - Add kconfig menus Mathieu Desnoyers
2007-05-30 14:00 ` [patch 5/9] Conditional Calls - i386 Optimization Mathieu Desnoyers
2007-05-30 20:33   ` Andrew Morton [this message]
2007-05-31 13:54   ` Andi Kleen
2007-06-05 19:02     ` Mathieu Desnoyers
2007-05-30 14:00 ` [patch 6/9] Conditional Calls - PowerPC Optimization Mathieu Desnoyers
2007-05-30 14:00 ` [patch 7/9] Conditional Calls - Documentation Mathieu Desnoyers
2007-05-30 14:00 ` [patch 8/9] F00F bug fixup for i386 - use conditional calls Mathieu Desnoyers
2007-05-30 20:33   ` Andrew Morton
2007-05-31 21:07     ` Mathieu Desnoyers
2007-05-31 21:21       ` Andrew Morton
2007-05-31 21:38         ` Mathieu Desnoyers
2007-05-30 14:00 ` [patch 9/9] Scheduler profiling - Use " Mathieu Desnoyers
2007-05-30 20:34   ` Andrew Morton
2007-06-01 15:54     ` Mathieu Desnoyers
2007-06-01 16:19       ` Andrew Morton
2007-06-01 16:33         ` Mathieu Desnoyers
2007-05-30 20:59   ` William Lee Irwin III
2007-05-31 21:12     ` Mathieu Desnoyers
2007-05-31 23:41       ` William Lee Irwin III
2007-06-04 22:20         ` Mathieu Desnoyers
2007-05-30 21:44   ` Matt Mackall
2007-05-31 21:36     ` Mathieu Desnoyers
2007-05-31 13:39   ` Andi Kleen
2007-05-31 22:07     ` Mathieu Desnoyers
2007-05-31 22:33       ` Andi Kleen
2007-06-04 22:20         ` Mathieu Desnoyers
  -- strict thread matches above, loose matches on Subject: below --
2007-05-29 18:33 [patch 0/9] Conditional Calls Mathieu Desnoyers
2007-05-29 18:34 ` [patch 5/9] Conditional Calls - i386 Optimization Mathieu Desnoyers

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20070530133328.03352d4b.akpm@linux-foundation.org \
    --to=akpm@linux-foundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mathieu.desnoyers@polymtl.ca \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox