* [patch 00/10] Immediate Values
@ 2007-07-03 16:40 Mathieu Desnoyers
2007-07-03 16:40 ` [patch 01/10] Immediate values - Global modules list and module mutex Mathieu Desnoyers
` (9 more replies)
0 siblings, 10 replies; 51+ messages in thread
From: Mathieu Desnoyers @ 2007-07-03 16:40 UTC (permalink / raw)
To: akpm, linux-kernel
Hi,
This is the update to the Immediate Values patch. It provides a value through a
load immediate instruction instead of a data load, which could hurt the
data cache. It aims to provide a very efficient way to branch over
compiled-in but mostly inactive code.
This release takes care of the modifications suggested in the previous round of
review (thanks to all reviewers!).
It applies on 2.6.22-rc6-mm1, and depends on the text edit lock patch.
Mathieu
--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
* [patch 01/10] Immediate values - Global modules list and module mutex
From: Mathieu Desnoyers @ 2007-07-03 16:40 UTC (permalink / raw)
To: akpm, linux-kernel; +Cc: Mathieu Desnoyers
[-- Attachment #1: immediate-values-global-modules-list-and-mutex.patch --]
[-- Type: text/plain, Size: 1259 bytes --]
Remove "static" from module_mutex and the modules list so they can be used by
other built-in objects in the kernel. Otherwise, all code depending on the
module list would have to be put in kernel/module.c. Since immediate values
depend on the module list but are logically separate from it, it
makes sense to implement them in their own file.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
---
kernel/module.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
Index: linux-2.6-lttng/kernel/module.c
===================================================================
--- linux-2.6-lttng.orig/kernel/module.c 2007-07-03 11:48:36.000000000 -0400
+++ linux-2.6-lttng/kernel/module.c 2007-07-03 11:48:45.000000000 -0400
@@ -65,8 +65,8 @@
static DEFINE_SPINLOCK(modlist_lock);
/* List of modules, protected by module_mutex AND modlist_lock */
-static DEFINE_MUTEX(module_mutex);
-static LIST_HEAD(modules);
+DEFINE_MUTEX(module_mutex);
+LIST_HEAD(modules);
static DECLARE_MUTEX(notify_mutex);
static BLOCKING_NOTIFIER_HEAD(module_notify_list);
* [patch 02/10] Immediate Value - Architecture Independent Code
From: Mathieu Desnoyers @ 2007-07-03 16:40 UTC (permalink / raw)
To: akpm, linux-kernel; +Cc: Mathieu Desnoyers
[-- Attachment #1: immediate-values-architecture-independent-code.patch --]
[-- Type: text/plain, Size: 17263 bytes --]
Immediate values (previously known as conditional calls) provide a fast
condition for an if() statement that guards a compiled-in block meant to be
dynamically enabled at runtime. When the block is disabled, it has a very small
footprint: an immediate value is loaded and fed to the branch. In the disabled
state, the branch skips the whole block (if the block consists of a function
call, it skips the argument setup and the call itself).
It can be used to compile in kernel code that is seldom meant to be
dynamically activated. This is the case for CPU-specific workarounds, profiling,
tracing, etc.
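As a rough illustration of the pattern described above (a user-space sketch,
not code from this series): in the generic flavor the condition reduces to an
ordinary test of the variable. The names trace_enabled, trace_event() and
hot_path() are hypothetical, and unlikely() is redefined locally with the
GCC/Clang builtin because the kernel header is not available in user space.

```c
#include <assert.h>
#include <stddef.h>

/* User-space stand-in for the kernel's unlikely() branch hint. */
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Same shape as immediate_t in the patch. */
typedef struct { int value; } immediate_t;

/* Generic flavor: the condition is a plain load of the variable. */
#define immediate(var) (unlikely((var).value))

static inline void immediate_arm(immediate_t *var)
{
	assert(var != NULL);
	var->value = 1;
}

static inline void immediate_disarm(immediate_t *var)
{
	assert(var != NULL);
	var->value = 0;
}

/* Hypothetical instrumentation site. */
static immediate_t trace_enabled;	/* zero-initialized: disabled */
static int trace_calls;			/* accumulates traced arguments */

static void trace_event(int arg)
{
	trace_calls += arg;
}

static int hot_path(int x)
{
	/* When disabled, the argument setup and the call are both skipped. */
	if (immediate(trace_enabled))
		trace_event(x);
	return x * 2;
}
```

Calling hot_path() before immediate_arm(&trace_enabled) leaves trace_calls
untouched; once armed, each call adds its argument to trace_calls until
immediate_disarm() is called.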
There is a generic immediate() version, which uses standard global variables,
and optimized per-architecture immediate() implementations, which use a load
immediate to remove a data cache hit. When the immediate() functionality is
disabled in the kernel, it falls back to global variables.
It adds a new rodata section "__immediate" to place the pointers to the enable
value. The immediate() activation functions sit in kernel/immediate.c.
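The section-based registration can be tried in user space with a sketch like
the following. It assumes GCC or Clang targeting ELF with a GNU ld compatible
linker, which defines __start_<name> and __stop_<name> symbols for any section
whose name is a valid C identifier. As a simplification, the section holds
plain pointers rather than the struct __immediate records used by the patch,
so that entries are packed back to back. REGISTER_IMMEDIATE(),
immediate_count() and immediate_set_all() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int value; } immediate_t;

/* Place a pointer to `var` in the "__immediate" section. */
#define REGISTER_IMMEDIATE(var)					\
	static immediate_t *__imm_ptr_##var			\
	__attribute__((used, section("__immediate"))) = &(var)

/* Section bounds, provided by the linker (assumption: ELF + GNU ld or lld). */
extern immediate_t *__start___immediate[];
extern immediate_t *__stop___immediate[];

static immediate_t imm_a;
static immediate_t imm_b;
REGISTER_IMMEDIATE(imm_a);
REGISTER_IMMEDIATE(imm_b);

/* Number of registered immediates in this program. */
static size_t immediate_count(void)
{
	return (size_t)(__stop___immediate - __start___immediate);
}

/* Walk the section and set every registered immediate to 0 or 1. */
static void immediate_set_all(int value)
{
	immediate_t **iter;

	assert(value == 0 || value == 1);
	for (iter = __start___immediate; iter < __stop___immediate; iter++)
		(*iter)->value = value;
}
```

Iterating between the two linker symbols follows the same idea as the
__start___immediate/__stop___immediate walk in kernel/immediate.c, without
needing a central table that every user has to edit.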
Immediate values refer to the memory address of a previously declared integer.
This integer holds the state of the associated immediate values
and must be accessed through the API found in linux/immediate.h.
At module load time, each immediate value is checked to see if it must be
enabled. This is the case if the variable it refers to is exported from
another module and already enabled.
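The load-time check can be modeled in user space roughly as follows. This is a
sketch under stated assumptions, not the kernel code: the patched instruction
operand is represented by a plain int (the architecture helpers that read and
write it are not part of this message), and struct imm_site, update_range(),
module_setup() and the variables are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int value; } immediate_t;

#define IF_OPTIMIZED (1 << 0)

/* Simplified stand-in for struct __immediate. */
struct imm_site {
	immediate_t *var;	/* variable the site depends on */
	int *enable;		/* stand-in for the operand patched at the site */
	int flags;
};

/* Bring every optimized site in [begin, end) in line with its variable. */
static void update_range(const struct imm_site *begin,
			 const struct imm_site *end)
{
	const struct imm_site *iter;

	assert(begin <= end);
	for (iter = begin; iter < end; iter++) {
		if (!(iter->flags & IF_OPTIMIZED))
			continue;	/* generic sites read the variable directly */
		if (*iter->enable != iter->var->value)
			*iter->enable = iter->var->value;
	}
}

/* Hypothetical module with one optimized and one generic site. */
static immediate_t shared_var;
static int mod_site_a;
static int mod_site_b;
static const struct imm_site mod_sites[] = {
	{ &shared_var, &mod_site_a, IF_OPTIMIZED },
	{ &shared_var, &mod_site_b, 0 },
};

/* Plays the role of module_immediate_setup() for the module above. */
static void module_setup(void)
{
	update_range(mod_sites,
		     mod_sites + sizeof(mod_sites) / sizeof(mod_sites[0]));
}
```

If shared_var was armed before the module is "loaded", module_setup() copies
the enabled state into the optimized site, while the generic site is left
alone because it reads shared_var directly at run time.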
In their current implementation, immediate values should not be used to store
data other than 0 or !0. The goal of the optimized implementations
is to get the smallest instruction, with the lowest impact on normal function
behavior, and to provide an int3 handler teardown that supports preemptible
code. Also, since the i386 architecture implementation depends on the 0 vs !0
characteristic of the variable, it is recommended to use these variables only
as the condition of an if() statement.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
---
include/asm-generic/vmlinux.lds.h | 7 +
include/linux/immediate.h | 116 ++++++++++++++++++++++
include/linux/module.h | 10 +
kernel/Makefile | 1
kernel/immediate.c | 197 ++++++++++++++++++++++++++++++++++++++
kernel/module.c | 18 +++
6 files changed, 349 insertions(+)
Index: linux-2.6-lttng/include/linux/immediate.h
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/linux/immediate.h 2007-07-03 11:51:20.000000000 -0400
@@ -0,0 +1,116 @@
+#ifndef _LINUX_IMMEDIATE_H
+#define _LINUX_IMMEDIATE_H
+
+/*
+ * Immediate values, can be set at runtime. Only set to 0 or 1.
+ *
+ * (C) Copyright 2007 Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
+ *
+ * This file is released under the GPLv2.
+ * See the file COPYING for more details.
+ */
+
+#ifdef __KERNEL__
+
+struct module;
+
+/* Always access this type with the provided functions. */
+typedef struct { int value; } immediate_t;
+
+struct __immediate {
+ /* Identifier variable (immediate_t *) of the immediate value */
+ immediate_t *var;
+ /*
+ * Pointer to the memory location that holds the value (data or
+ * immediate value within the load immediate instruction).
+ */
+ void *enable;
+ /* Immediate value flags, see the list below */
+ int flags;
+};
+
+/*
+ * Immediate value flags : selects the mechanism used to set the immediate
+ * value. This is primarily used at reentrancy-unfriendly sites.
+ *
+ * On an architecture that has optimized immediate values implemented,
+ * the IF_OPTIMIZED flag distinguishes between optimized and non-optimized
+ * immediate value statements: typically, reentrancy-unfriendly sites should
+ * declare their immediate values without the IF_OPTIMIZED flag.
+ */
+#define IF_OPTIMIZED (1 << 0) /* Use optimized immediate */
+#define IF_LOCKDEP (1 << 1) /* Can trigger lockdep at patch site */
+#define _IF_NR 2
+
+/*
+ * In order to support embedded systems with read-only memory for the text
+ * segment, the choice to disable the "optimized" immediate values is left as a
+ * config option even if the architecture has the optimized flavor.
+ *
+ * This include scheme is used to support both the generic and optimized version
+ * at the same time : if a _immediate is declared with the IF_OPTIMIZED flag
+ * unset, it will use the generic version. This is useful when we must place
+ * immediates in locations that present specific reentrancy issues, such as
+ * some trap handlers, in the lockdep code and some of the scheduler code. The
+ * optimized version, when it uses the i386 mechanism to ensure correct
+ * cross-cpu code modification, can trigger a trap, which will call into lockdep
+ * and might have other side-effects.
+ */
+
+#ifdef CONFIG_IMMEDIATE
+
+#include <asm/immediate.h> /* optimized immediate flavor */
+
+/*
+ * Generic immediate flavor always available.
+ *
+ * Note : the empty asm volatile with read constraint is used here instead of a
+ * "used" attribute to fix a gcc 4.1.x bug.
+ * Quoting Jeremy Fitzhardinge <jeremy@goop.org> :
+ * "There's a gcc bug which ignores the attribute for local-scope static
+ * variables: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29299"
+ */
+#define immediate_generic(flags, var) \
+ ({ \
+ static const struct __immediate __immediate_info \
+ __attribute__((section("__immediate"))) = \
+ { &(var), NULL, (flags) & ~IF_OPTIMIZED } ; \
+ asm volatile ( "" : : "i" (&__immediate_info)); \
+ (var).value; \
+ })
+
+extern void immediate_arm(immediate_t *var);
+extern void immediate_disarm(immediate_t *var);
+extern int immediate_list(void);
+extern void module_immediate_setup(struct module *mod);
+extern void __immediate_update(immediate_t *var, int value);
+#else /* !CONFIG_IMMEDIATE */
+
+#include <asm-generic/immediate.h> /* fallback on generic immediate */
+
+#define immediate_generic(flags, var) (unlikely((var).value))
+
+static inline void immediate_arm(immediate_t *var)
+{
+ var->value = 1;
+}
+
+static inline void immediate_disarm(immediate_t *var)
+{
+ var->value = 0;
+}
+static inline void module_immediate_setup(struct module *mod) { }
+static inline void __immediate_update(immediate_t *var, int value)
+{
+ var->value = value;
+}
+#endif /* CONFIG_IMMEDIATE */
+
+/* immediate_query : Returns 1 if enabled, 0 if disabled or not present */
+static inline int immediate_query(immediate_t *var)
+{
+ return var->value;
+}
+
+#endif /* __KERNEL__ */
+#endif
Index: linux-2.6-lttng/include/asm-generic/vmlinux.lds.h
===================================================================
--- linux-2.6-lttng.orig/include/asm-generic/vmlinux.lds.h 2007-07-03 11:47:41.000000000 -0400
+++ linux-2.6-lttng/include/asm-generic/vmlinux.lds.h 2007-07-03 11:51:20.000000000 -0400
@@ -122,6 +122,13 @@
VMLINUX_SYMBOL(__stop___kcrctab_gpl_future) = .; \
} \
\
+ /* Immediate values: pointers */ \
+ __immediate : AT(ADDR(__immediate) - LOAD_OFFSET) { \
+ VMLINUX_SYMBOL(__start___immediate) = .; \
+ *(__immediate) \
+ VMLINUX_SYMBOL(__stop___immediate) = .; \
+ } \
+ \
/* Kernel symbol table: strings */ \
__ksymtab_strings : AT(ADDR(__ksymtab_strings) - LOAD_OFFSET) { \
*(__ksymtab_strings) \
Index: linux-2.6-lttng/include/linux/module.h
===================================================================
--- linux-2.6-lttng.orig/include/linux/module.h 2007-07-03 11:47:41.000000000 -0400
+++ linux-2.6-lttng/include/linux/module.h 2007-07-03 11:51:20.000000000 -0400
@@ -15,6 +15,7 @@
#include <linux/stringify.h>
#include <linux/kobject.h>
#include <linux/moduleparam.h>
+#include <linux/immediate.h>
#include <asm/local.h>
#include <asm/module.h>
@@ -67,6 +68,10 @@
/* Archs provide a method of finding the correct exception table. */
struct exception_table_entry;
+/* Protects the list of modules. */
+extern struct mutex module_mutex;
+extern struct list_head modules;
+
const struct exception_table_entry *
search_extable(const struct exception_table_entry *first,
const struct exception_table_entry *last,
@@ -370,6 +375,11 @@
/* The command line arguments (may be mangled). People like
keeping pointers to this stuff */
char *args;
+
+#ifdef CONFIG_IMMEDIATE
+ const struct __immediate *immediates;
+ unsigned int num_immediates;
+#endif
};
#ifndef MODULE_ARCH_INIT
#define MODULE_ARCH_INIT {}
Index: linux-2.6-lttng/kernel/module.c
===================================================================
--- linux-2.6-lttng.orig/kernel/module.c 2007-07-03 11:48:45.000000000 -0400
+++ linux-2.6-lttng/kernel/module.c 2007-07-03 11:51:20.000000000 -0400
@@ -32,6 +32,7 @@
#include <linux/cpu.h>
#include <linux/moduleparam.h>
#include <linux/errno.h>
+#include <linux/immediate.h>
#include <linux/err.h>
#include <linux/vermagic.h>
#include <linux/notifier.h>
@@ -1623,6 +1624,9 @@
unsigned int unusedcrcindex;
unsigned int unusedgplindex;
unsigned int unusedgplcrcindex;
+ unsigned int immediateindex = 0;
+ unsigned int markersindex = 0;
+ unsigned int markersstringsindex = 0;
struct module *mod;
long err = 0;
void *percpu = NULL, *ptr = NULL; /* Stops spurious gcc warning */
@@ -1719,6 +1723,9 @@
#ifdef ARCH_UNWIND_SECTION_NAME
unwindex = find_sec(hdr, sechdrs, secstrings, ARCH_UNWIND_SECTION_NAME);
#endif
+#ifdef CONFIG_IMMEDIATE
+ immediateindex = find_sec(hdr, sechdrs, secstrings, "__immediate");
+#endif
/* Don't keep modinfo section */
sechdrs[infoindex].sh_flags &= ~(unsigned long)SHF_ALLOC;
@@ -1729,6 +1736,8 @@
#endif
if (unwindex)
sechdrs[unwindex].sh_flags |= SHF_ALLOC;
+ if (immediateindex)
+ sechdrs[immediateindex].sh_flags |= SHF_ALLOC;
/* Check module struct version now, before we try to use module. */
if (!check_modstruct_version(sechdrs, versindex, mod)) {
@@ -1869,6 +1878,13 @@
mod->gpl_future_syms = (void *)sechdrs[gplfutureindex].sh_addr;
if (gplfuturecrcindex)
mod->gpl_future_crcs = (void *)sechdrs[gplfuturecrcindex].sh_addr;
+#ifdef CONFIG_IMMEDIATE
+ if (immediateindex) {
+ mod->immediates = (void *)sechdrs[immediateindex].sh_addr;
+ mod->num_immediates =
+ sechdrs[immediateindex].sh_size / sizeof(*mod->immediates);
+ }
+#endif
mod->unused_syms = (void *)sechdrs[unusedindex].sh_addr;
if (unusedcrcindex)
@@ -1934,6 +1950,8 @@
}
#endif
+ module_immediate_setup(mod);
+
err = module_finalize(hdr, sechdrs, mod);
if (err < 0)
goto cleanup;
Index: linux-2.6-lttng/kernel/Makefile
===================================================================
--- linux-2.6-lttng.orig/kernel/Makefile 2007-07-03 11:47:41.000000000 -0400
+++ linux-2.6-lttng/kernel/Makefile 2007-07-03 11:51:20.000000000 -0400
@@ -58,6 +58,7 @@
obj-$(CONFIG_SYSCTL) += utsname_sysctl.o
obj-$(CONFIG_TASK_DELAY_ACCT) += delayacct.o
obj-$(CONFIG_TASKSTATS) += taskstats.o tsacct.o
+obj-$(CONFIG_IMMEDIATE) += immediate.o
ifneq ($(CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER),y)
# According to Alan Modra <alan@linuxcare.com.au>, the -fno-omit-frame-pointer is
Index: linux-2.6-lttng/kernel/immediate.c
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/kernel/immediate.c 2007-07-03 11:51:20.000000000 -0400
@@ -0,0 +1,197 @@
+/*
+ * Copyright (C) 2007 Mathieu Desnoyers
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ */
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/immediate.h>
+
+extern const struct __immediate __start___immediate[];
+extern const struct __immediate __stop___immediate[];
+
+/*
+ * module_mutex nests inside immediate_mutex. immediate_mutex protects builtin
+ * immediates and module immediates.
+ */
+DEFINE_MUTEX(immediate_mutex);
+
+/*
+ * Sets a range of immediates to an enabled state : set the enable bit.
+ */
+static void _immediate_update_range(
+ const struct __immediate *begin, const struct __immediate *end)
+{
+ const struct __immediate *iter;
+ int enable;
+
+ for (iter = begin; iter < end; iter++) {
+ if (!(iter->flags & IF_OPTIMIZED))
+ continue;
+ enable = immediate_query(iter->var);
+ if (enable != IMMEDIATE_OPTIMIZED_ENABLE(iter->enable))
+ immediate_optimized_set_enable(iter->enable, enable);
+ }
+}
+
+#ifdef CONFIG_MODULES
+/*
+ * Setup the immediate according to the variable upon which it depends. Called
+ * by load_module with module_mutex held. This mutex protects against concurrent
+ * modifications to modules' immediates. Therefore, since
+ * module_immediate_setup() does not modify builtin immediates, it does not need
+ * to take the immediate_mutex.
+ */
+void module_immediate_setup(struct module *mod)
+{
+ _immediate_update_range(mod->immediates,
+ mod->immediates+mod->num_immediates);
+}
+#endif
+
+/*
+ * Provides a listing of the immediates present in the kernel with their type
+ * (optimized or generic) and state (enabled or disabled).
+ */
+static int _immediate_list_range(const struct __immediate *begin,
+ const struct __immediate *end)
+{
+ const struct __immediate *iter;
+ int found = 0;
+
+ for (iter = begin; iter < end; iter++) {
+ printk("variable %p \n", iter->var);
+ if (iter->flags & IF_OPTIMIZED)
+ printk(" enable %u optimized\n",
+ IMMEDIATE_OPTIMIZED_ENABLE(iter->enable));
+ else
+ printk(" enable %u generic\n",
+ immediate_query(iter->var));
+ found++;
+ }
+ return found;
+}
+
+#ifdef CONFIG_MODULES
+static inline void __immediate_update_modules(immediate_t *var, int value)
+{
+ struct module *mod;
+
+ list_for_each_entry(mod, &modules, list) {
+ if (mod->taints)
+ continue;
+ _immediate_update_range(mod->immediates,
+ mod->immediates+mod->num_immediates);
+ }
+}
+#else
+static inline void __immediate_update_modules(immediate_t *var, int value) { }
+#endif
+
+/*
+ * Calls _immediate_update_range for the core immediates and modules immediates.
+ */
+void __immediate_update(immediate_t *var, int value)
+{
+
+ var->value = value;
+ /* Core kernel immediates */
+ _immediate_update_range(__start___immediate, __stop___immediate);
+ /* immediates in modules. */
+ __immediate_update_modules(var, value);
+}
+
+#ifdef CONFIG_MODULES
+/*
+ * Takes module_mutex.
+ */
+void _immediate_update(immediate_t *var, int value)
+{
+ mutex_lock(&module_mutex);
+ __immediate_update(var, value);
+ mutex_unlock(&module_mutex);
+}
+#else
+void _immediate_update(immediate_t *var, int value)
+{
+ __immediate_update(var, value);
+}
+#endif
+
+/* immediate enabling/disabling use the immediate_mutex to synchronize. */
+void immediate_arm(immediate_t *var)
+{
+ mutex_lock(&immediate_mutex);
+ _immediate_update(var, 1);
+ mutex_unlock(&immediate_mutex);
+}
+EXPORT_SYMBOL_GPL(immediate_arm);
+
+/* immediate enabling/disabling use the immediate_mutex to synchronize. */
+void immediate_disarm(immediate_t *var)
+{
+ mutex_lock(&immediate_mutex);
+ _immediate_update(var, 0);
+ mutex_unlock(&immediate_mutex);
+}
+EXPORT_SYMBOL_GPL(immediate_disarm);
+
+#ifdef CONFIG_MODULES
+static inline int immediate_list_modules(void)
+{
+ int found = 0;
+ struct module *mod;
+
+ printk("Listing module immediate values\n");
+ mutex_lock(&module_mutex);
+ list_for_each_entry(mod, &modules, list) {
+ if (!mod->taints) {
+ printk("Listing immediate values for module %s\n",
+ mod->name);
+ found += _immediate_list_range(mod->immediates,
+ mod->immediates+mod->num_immediates);
+ }
+ }
+ mutex_unlock(&module_mutex);
+ return found;
+}
+#else
+static inline int immediate_list_modules(void)
+{
+ return 0;
+}
+#endif
+
+/*
+ * Calls _immediate_list_range for the core and module immediates.
+ * Immediate value listing uses the module_mutex to synchronize.
+ * Takes the immediate mutex to protect against builtin immediate modification
+ * and takes the module_mutex to protect against module list modification.
+ * TODO : should output this listing to a procfs file.
+ */
+int immediate_list(void)
+{
+ int found = 0;
+
+ mutex_lock(&immediate_mutex);
+ /* Core kernel immediates */
+ printk("Listing kernel immediate values\n");
+ found += _immediate_list_range(__start___immediate, __stop___immediate);
+ /* immediates in modules. */
+ found += immediate_list_modules();
+ mutex_unlock(&immediate_mutex);
+ return found;
+}
+EXPORT_SYMBOL_GPL(immediate_list);
* [patch 03/10] Immediate Values - Non Optimized Architectures
From: Mathieu Desnoyers @ 2007-07-03 16:40 UTC (permalink / raw)
To: akpm, linux-kernel; +Cc: Mathieu Desnoyers
[-- Attachment #1: immediate-values-non-optimized-architectures.patch --]
[-- Type: text/plain, Size: 9601 bytes --]
Architecture-agnostic, generic version of the immediate values. It uses a
global variable to mimic the immediate values.
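The macro layering in asm-generic/immediate.h can be sketched in user space as
below. This is a simplified model under stated assumptions: immediate_generic()
here only records the masked flags in a hypothetical last_site_flags variable
and reads the value, whereas the version in patch 02 also emits a section
entry. site_default() and site_optimized() are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int value; } immediate_t;

#define IF_OPTIMIZED	(1 << 0)	/* as in linux/immediate.h */
#define IF_DEFAULT	0		/* as in asm-generic/immediate.h */

/* Flags the most recent site ended up with (illustration only). */
static int last_site_flags = -1;

/* Simplified generic flavor: drop IF_OPTIMIZED, then load the variable. */
#define immediate_generic(flags, var) \
	(last_site_flags = (flags) & ~IF_OPTIMIZED, (var).value)

/* No optimized version available: every spelling maps to the generic one. */
#define immediate_optimized	immediate_generic
#define _immediate(flags, var)	immediate_generic(flags, var)
#define immediate(var)		_immediate(IF_DEFAULT, var)

static int site_default(immediate_t *v)
{
	assert(v != NULL);
	return immediate(*v);
}

static int site_optimized(immediate_t *v)
{
	assert(v != NULL);
	return immediate_optimized(IF_OPTIMIZED, *v);
}
```

On an architecture that only includes the generic header, both helpers end up
as the same plain load, and a site that asked for IF_OPTIMIZED is recorded
without that flag.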
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
---
include/asm-alpha/immediate.h | 1 +
include/asm-arm/immediate.h | 1 +
include/asm-arm26/immediate.h | 1 +
include/asm-cris/immediate.h | 1 +
include/asm-frv/immediate.h | 1 +
include/asm-generic/immediate.h | 14 ++++++++++++++
include/asm-h8300/immediate.h | 1 +
include/asm-i386/immediate.h | 1 +
include/asm-ia64/immediate.h | 1 +
include/asm-m32r/immediate.h | 1 +
include/asm-m68k/immediate.h | 1 +
include/asm-m68knommu/immediate.h | 1 +
include/asm-mips/immediate.h | 1 +
include/asm-parisc/immediate.h | 1 +
include/asm-powerpc/immediate.h | 1 +
include/asm-ppc/immediate.h | 1 +
include/asm-s390/immediate.h | 1 +
include/asm-sh/immediate.h | 1 +
include/asm-sh64/immediate.h | 1 +
include/asm-sparc/immediate.h | 1 +
include/asm-sparc64/immediate.h | 1 +
include/asm-um/immediate.h | 1 +
include/asm-v850/immediate.h | 1 +
include/asm-x86_64/immediate.h | 1 +
include/asm-xtensa/immediate.h | 1 +
25 files changed, 38 insertions(+)
Index: linux-2.6-lttng/include/asm-generic/immediate.h
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/asm-generic/immediate.h 2007-06-15 16:13:55.000000000 -0400
@@ -0,0 +1,14 @@
+#ifndef _ASM_GENERIC_IMMEDIATE_H
+#define _ASM_GENERIC_IMMEDIATE_H
+
+/* Default flags, used by _immediate() */
+#define IF_DEFAULT 0
+
+/* Fallback on the generic immediate, since no optimized version is available */
+#define immediate_optimized immediate_generic
+#define _immediate(flags, var) immediate_generic(flags, var)
+
+/* immediate with default behavior */
+#define immediate(var) _immediate(IF_DEFAULT, var)
+
+#endif /* _ASM_GENERIC_IMMEDIATE_H */
Index: linux-2.6-lttng/include/asm-alpha/immediate.h
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/asm-alpha/immediate.h 2007-06-15 16:13:55.000000000 -0400
@@ -0,0 +1 @@
+#include <asm-generic/immediate.h>
Index: linux-2.6-lttng/include/asm-arm/immediate.h
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/asm-arm/immediate.h 2007-06-15 16:13:55.000000000 -0400
@@ -0,0 +1 @@
+#include <asm-generic/immediate.h>
Index: linux-2.6-lttng/include/asm-arm26/immediate.h
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/asm-arm26/immediate.h 2007-06-15 16:13:55.000000000 -0400
@@ -0,0 +1 @@
+#include <asm-generic/immediate.h>
Index: linux-2.6-lttng/include/asm-cris/immediate.h
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/asm-cris/immediate.h 2007-06-15 16:13:55.000000000 -0400
@@ -0,0 +1 @@
+#include <asm-generic/immediate.h>
Index: linux-2.6-lttng/include/asm-frv/immediate.h
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/asm-frv/immediate.h 2007-06-15 16:13:55.000000000 -0400
@@ -0,0 +1 @@
+#include <asm-generic/immediate.h>
Index: linux-2.6-lttng/include/asm-h8300/immediate.h
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/asm-h8300/immediate.h 2007-06-15 16:13:55.000000000 -0400
@@ -0,0 +1 @@
+#include <asm-generic/immediate.h>
Index: linux-2.6-lttng/include/asm-ia64/immediate.h
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/asm-ia64/immediate.h 2007-06-15 16:13:55.000000000 -0400
@@ -0,0 +1 @@
+#include <asm-generic/immediate.h>
Index: linux-2.6-lttng/include/asm-m32r/immediate.h
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/asm-m32r/immediate.h 2007-06-15 16:13:55.000000000 -0400
@@ -0,0 +1 @@
+#include <asm-generic/immediate.h>
Index: linux-2.6-lttng/include/asm-m68k/immediate.h
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/asm-m68k/immediate.h 2007-06-15 16:13:55.000000000 -0400
@@ -0,0 +1 @@
+#include <asm-generic/immediate.h>
Index: linux-2.6-lttng/include/asm-m68knommu/immediate.h
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/asm-m68knommu/immediate.h 2007-06-15 16:13:55.000000000 -0400
@@ -0,0 +1 @@
+#include <asm-generic/immediate.h>
Index: linux-2.6-lttng/include/asm-mips/immediate.h
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/asm-mips/immediate.h 2007-06-15 16:13:55.000000000 -0400
@@ -0,0 +1 @@
+#include <asm-generic/immediate.h>
Index: linux-2.6-lttng/include/asm-parisc/immediate.h
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/asm-parisc/immediate.h 2007-06-15 16:13:55.000000000 -0400
@@ -0,0 +1 @@
+#include <asm-generic/immediate.h>
Index: linux-2.6-lttng/include/asm-ppc/immediate.h
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/asm-ppc/immediate.h 2007-06-15 16:13:55.000000000 -0400
@@ -0,0 +1 @@
+#include <asm-generic/immediate.h>
Index: linux-2.6-lttng/include/asm-s390/immediate.h
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/asm-s390/immediate.h 2007-06-15 16:13:55.000000000 -0400
@@ -0,0 +1 @@
+#include <asm-generic/immediate.h>
Index: linux-2.6-lttng/include/asm-sh/immediate.h
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/asm-sh/immediate.h 2007-06-15 16:13:55.000000000 -0400
@@ -0,0 +1 @@
+#include <asm-generic/immediate.h>
Index: linux-2.6-lttng/include/asm-sh64/immediate.h
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/asm-sh64/immediate.h 2007-06-15 16:13:55.000000000 -0400
@@ -0,0 +1 @@
+#include <asm-generic/immediate.h>
Index: linux-2.6-lttng/include/asm-sparc/immediate.h
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/asm-sparc/immediate.h 2007-06-15 16:13:55.000000000 -0400
@@ -0,0 +1 @@
+#include <asm-generic/immediate.h>
Index: linux-2.6-lttng/include/asm-sparc64/immediate.h
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/asm-sparc64/immediate.h 2007-06-15 16:13:55.000000000 -0400
@@ -0,0 +1 @@
+#include <asm-generic/immediate.h>
Index: linux-2.6-lttng/include/asm-um/immediate.h
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/asm-um/immediate.h 2007-06-15 16:13:55.000000000 -0400
@@ -0,0 +1 @@
+#include <asm-generic/immediate.h>
Index: linux-2.6-lttng/include/asm-v850/immediate.h
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/asm-v850/immediate.h 2007-06-15 16:13:55.000000000 -0400
@@ -0,0 +1 @@
+#include <asm-generic/immediate.h>
Index: linux-2.6-lttng/include/asm-x86_64/immediate.h
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/asm-x86_64/immediate.h 2007-06-15 16:13:55.000000000 -0400
@@ -0,0 +1 @@
+#include <asm-generic/immediate.h>
Index: linux-2.6-lttng/include/asm-xtensa/immediate.h
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/asm-xtensa/immediate.h 2007-06-15 16:13:55.000000000 -0400
@@ -0,0 +1 @@
+#include <asm-generic/immediate.h>
Index: linux-2.6-lttng/include/asm-i386/immediate.h
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/asm-i386/immediate.h 2007-06-15 16:13:55.000000000 -0400
@@ -0,0 +1 @@
+#include <asm-generic/immediate.h>
Index: linux-2.6-lttng/include/asm-powerpc/immediate.h
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/asm-powerpc/immediate.h 2007-06-15 16:13:55.000000000 -0400
@@ -0,0 +1 @@
+#include <asm-generic/immediate.h>
* [patch 04/10] Immediate Value - Add kconfig menus
From: Mathieu Desnoyers @ 2007-07-03 16:40 UTC (permalink / raw)
To: akpm, linux-kernel; +Cc: Mathieu Desnoyers, Adrian Bunk, Andi Kleen
[-- Attachment #1: immediate-values-kconfig-menus.patch --]
[-- Type: text/plain, Size: 14249 bytes --]
Immediate values provide a way to compile in kernel features that can be
enabled dynamically, with a very small footprint when disabled.
This patch:
Add Kconfig menus for the immediate values code.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Adrian Bunk <bunk@stusta.de>
CC: Andi Kleen <andi@firstfloor.org>
---
arch/alpha/Kconfig | 6 ++++++
arch/arm/Kconfig | 6 ++++++
arch/arm26/Kconfig | 6 ++++++
arch/avr32/Kconfig.debug | 7 +++++++
arch/cris/Kconfig | 6 ++++++
arch/frv/Kconfig | 6 ++++++
arch/h8300/Kconfig | 6 ++++++
arch/i386/Kconfig | 2 ++
arch/ia64/Kconfig | 3 +++
arch/m32r/Kconfig | 6 ++++++
arch/m68k/Kconfig | 6 ++++++
arch/m68knommu/Kconfig | 6 ++++++
arch/mips/Kconfig | 6 ++++++
arch/parisc/Kconfig | 6 ++++++
arch/powerpc/Kconfig | 3 +++
arch/ppc/Kconfig | 6 ++++++
arch/s390/Kconfig | 2 ++
arch/sh/Kconfig | 6 ++++++
arch/sh64/Kconfig | 6 ++++++
arch/sparc/Kconfig | 2 ++
arch/sparc64/Kconfig | 3 +++
arch/um/Kconfig | 6 ++++++
arch/v850/Kconfig | 6 ++++++
arch/x86_64/Kconfig | 3 +++
arch/xtensa/Kconfig | 6 ++++++
kernel/Kconfig.immediate | 9 +++++++++
26 files changed, 136 insertions(+)
Index: linux-2.6-lttng/arch/alpha/Kconfig
===================================================================
--- linux-2.6-lttng.orig/arch/alpha/Kconfig 2007-06-17 13:21:09.000000000 -0400
+++ linux-2.6-lttng/arch/alpha/Kconfig 2007-06-17 13:25:17.000000000 -0400
@@ -653,6 +653,12 @@
source "arch/alpha/oprofile/Kconfig"
+menu "Instrumentation Support"
+
+source "kernel/Kconfig.immediate"
+
+endmenu
+
source "arch/alpha/Kconfig.debug"
# DUMMY_CONSOLE may be defined in drivers/video/console/Kconfig
Index: linux-2.6-lttng/arch/arm/Kconfig
===================================================================
--- linux-2.6-lttng.orig/arch/arm/Kconfig 2007-06-17 13:21:09.000000000 -0400
+++ linux-2.6-lttng/arch/arm/Kconfig 2007-06-17 13:25:17.000000000 -0400
@@ -1046,6 +1046,12 @@
source "arch/arm/oprofile/Kconfig"
+menu "Instrumentation Support"
+
+source "kernel/Kconfig.immediate"
+
+endmenu
+
source "arch/arm/Kconfig.debug"
source "security/Kconfig"
Index: linux-2.6-lttng/arch/arm26/Kconfig
===================================================================
--- linux-2.6-lttng.orig/arch/arm26/Kconfig 2007-06-17 13:21:09.000000000 -0400
+++ linux-2.6-lttng/arch/arm26/Kconfig 2007-06-17 13:25:17.000000000 -0400
@@ -244,6 +244,12 @@
source "drivers/usb/Kconfig"
+menu "Instrumentation Support"
+
+source "kernel/Kconfig.immediate"
+
+endmenu
+
source "arch/arm26/Kconfig.debug"
source "security/Kconfig"
Index: linux-2.6-lttng/arch/cris/Kconfig
===================================================================
--- linux-2.6-lttng.orig/arch/cris/Kconfig 2007-06-17 13:21:09.000000000 -0400
+++ linux-2.6-lttng/arch/cris/Kconfig 2007-06-17 13:25:17.000000000 -0400
@@ -198,6 +198,12 @@
source "drivers/usb/Kconfig"
+menu "Instrumentation Support"
+
+source "kernel/Kconfig.immediate"
+
+endmenu
+
source "arch/cris/Kconfig.debug"
source "security/Kconfig"
Index: linux-2.6-lttng/arch/frv/Kconfig
===================================================================
--- linux-2.6-lttng.orig/arch/frv/Kconfig 2007-06-17 13:21:09.000000000 -0400
+++ linux-2.6-lttng/arch/frv/Kconfig 2007-06-17 13:25:17.000000000 -0400
@@ -375,6 +375,12 @@
source "fs/Kconfig"
+menu "Instrumentation Support"
+
+source "kernel/Kconfig.immediate"
+
+endmenu
+
source "arch/frv/Kconfig.debug"
source "security/Kconfig"
Index: linux-2.6-lttng/arch/h8300/Kconfig
===================================================================
--- linux-2.6-lttng.orig/arch/h8300/Kconfig 2007-06-17 13:21:09.000000000 -0400
+++ linux-2.6-lttng/arch/h8300/Kconfig 2007-06-17 13:25:17.000000000 -0400
@@ -223,6 +223,12 @@
source "fs/Kconfig"
+menu "Instrumentation Support"
+
+source "kernel/Kconfig.immediate"
+
+endmenu
+
source "arch/h8300/Kconfig.debug"
source "security/Kconfig"
Index: linux-2.6-lttng/arch/i386/Kconfig
===================================================================
--- linux-2.6-lttng.orig/arch/i386/Kconfig 2007-06-17 13:21:09.000000000 -0400
+++ linux-2.6-lttng/arch/i386/Kconfig 2007-06-17 13:25:17.000000000 -0400
@@ -1235,6 +1235,8 @@
for kernel debugging, non-intrusive instrumentation and testing.
If in doubt, say "N".
+source "kernel/Kconfig.immediate"
+
endif # INSTRUMENTATION
source "arch/i386/Kconfig.debug"
Index: linux-2.6-lttng/arch/ia64/Kconfig
===================================================================
--- linux-2.6-lttng.orig/arch/ia64/Kconfig 2007-06-17 13:21:09.000000000 -0400
+++ linux-2.6-lttng/arch/ia64/Kconfig 2007-06-17 13:25:17.000000000 -0400
@@ -593,6 +593,9 @@
a probepoint and specifies the callback. Kprobes is useful
for kernel debugging, non-intrusive instrumentation and testing.
If in doubt, say "N".
+
+source "kernel/Kconfig.immediate"
+
endmenu
source "arch/ia64/Kconfig.debug"
Index: linux-2.6-lttng/arch/m32r/Kconfig
===================================================================
--- linux-2.6-lttng.orig/arch/m32r/Kconfig 2007-06-17 13:21:09.000000000 -0400
+++ linux-2.6-lttng/arch/m32r/Kconfig 2007-06-17 13:25:17.000000000 -0400
@@ -408,6 +408,12 @@
source "arch/m32r/oprofile/Kconfig"
+menu "Instrumentation Support"
+
+source "kernel/Kconfig.immediate"
+
+endmenu
+
source "arch/m32r/Kconfig.debug"
source "security/Kconfig"
Index: linux-2.6-lttng/arch/m68k/Kconfig
===================================================================
--- linux-2.6-lttng.orig/arch/m68k/Kconfig 2007-06-17 13:21:09.000000000 -0400
+++ linux-2.6-lttng/arch/m68k/Kconfig 2007-06-17 13:25:17.000000000 -0400
@@ -679,6 +679,12 @@
source "fs/Kconfig"
+menu "Instrumentation Support"
+
+source "kernel/Kconfig.immediate"
+
+endmenu
+
source "arch/m68k/Kconfig.debug"
source "security/Kconfig"
Index: linux-2.6-lttng/arch/m68knommu/Kconfig
===================================================================
--- linux-2.6-lttng.orig/arch/m68knommu/Kconfig 2007-06-17 13:21:09.000000000 -0400
+++ linux-2.6-lttng/arch/m68knommu/Kconfig 2007-06-17 13:25:17.000000000 -0400
@@ -668,6 +668,12 @@
source "fs/Kconfig"
+menu "Instrumentation Support"
+
+source "kernel/Kconfig.immediate"
+
+endmenu
+
source "arch/m68knommu/Kconfig.debug"
source "security/Kconfig"
Index: linux-2.6-lttng/arch/mips/Kconfig
===================================================================
--- linux-2.6-lttng.orig/arch/mips/Kconfig 2007-06-17 13:21:09.000000000 -0400
+++ linux-2.6-lttng/arch/mips/Kconfig 2007-06-17 13:25:17.000000000 -0400
@@ -1957,6 +1957,12 @@
source "arch/mips/oprofile/Kconfig"
+menu "Instrumentation Support"
+
+source "kernel/Kconfig.immediate"
+
+endmenu
+
source "arch/mips/Kconfig.debug"
source "security/Kconfig"
Index: linux-2.6-lttng/arch/parisc/Kconfig
===================================================================
--- linux-2.6-lttng.orig/arch/parisc/Kconfig 2007-06-17 13:21:09.000000000 -0400
+++ linux-2.6-lttng/arch/parisc/Kconfig 2007-06-17 13:25:17.000000000 -0400
@@ -269,6 +269,12 @@
source "arch/parisc/oprofile/Kconfig"
+menu "Instrumentation Support"
+
+source "kernel/Kconfig.immediate"
+
+endmenu
+
source "arch/parisc/Kconfig.debug"
source "security/Kconfig"
Index: linux-2.6-lttng/arch/powerpc/Kconfig
===================================================================
--- linux-2.6-lttng.orig/arch/powerpc/Kconfig 2007-06-17 13:21:09.000000000 -0400
+++ linux-2.6-lttng/arch/powerpc/Kconfig 2007-06-17 13:25:17.000000000 -0400
@@ -905,6 +905,9 @@
a probepoint and specifies the callback. Kprobes is useful
for kernel debugging, non-intrusive instrumentation and testing.
If in doubt, say "N".
+
+source "kernel/Kconfig.immediate"
+
endmenu
source "arch/powerpc/Kconfig.debug"
Index: linux-2.6-lttng/arch/ppc/Kconfig
===================================================================
--- linux-2.6-lttng.orig/arch/ppc/Kconfig 2007-06-17 13:21:09.000000000 -0400
+++ linux-2.6-lttng/arch/ppc/Kconfig 2007-06-17 13:25:17.000000000 -0400
@@ -1451,8 +1451,14 @@
source "lib/Kconfig"
+menu "Instrumentation Support"
+
source "arch/powerpc/oprofile/Kconfig"
+source "kernel/Kconfig.immediate"
+
+endmenu
+
source "arch/ppc/Kconfig.debug"
source "security/Kconfig"
Index: linux-2.6-lttng/arch/s390/Kconfig
===================================================================
--- linux-2.6-lttng.orig/arch/s390/Kconfig 2007-06-17 13:21:09.000000000 -0400
+++ linux-2.6-lttng/arch/s390/Kconfig 2007-06-17 13:25:17.000000000 -0400
@@ -547,6 +547,8 @@
for kernel debugging, non-intrusive instrumentation and testing.
If in doubt, say "N".
+source "kernel/Kconfig.immediate"
+
endmenu
source "arch/s390/Kconfig.debug"
Index: linux-2.6-lttng/arch/sh/Kconfig
===================================================================
--- linux-2.6-lttng.orig/arch/sh/Kconfig 2007-06-17 13:21:09.000000000 -0400
+++ linux-2.6-lttng/arch/sh/Kconfig 2007-06-17 13:25:17.000000000 -0400
@@ -721,6 +721,12 @@
source "arch/sh/oprofile/Kconfig"
+menu "Instrumentation Support"
+
+source "kernel/Kconfig.immediate"
+
+endmenu
+
source "arch/sh/Kconfig.debug"
source "security/Kconfig"
Index: linux-2.6-lttng/arch/sh64/Kconfig
===================================================================
--- linux-2.6-lttng.orig/arch/sh64/Kconfig 2007-06-17 13:21:09.000000000 -0400
+++ linux-2.6-lttng/arch/sh64/Kconfig 2007-06-17 13:25:17.000000000 -0400
@@ -284,6 +284,12 @@
source "arch/sh64/oprofile/Kconfig"
+menu "Instrumentation Support"
+
+source "kernel/Kconfig.immediate"
+
+endmenu
+
source "arch/sh64/Kconfig.debug"
source "security/Kconfig"
Index: linux-2.6-lttng/arch/sparc/Kconfig
===================================================================
--- linux-2.6-lttng.orig/arch/sparc/Kconfig 2007-06-17 13:21:09.000000000 -0400
+++ linux-2.6-lttng/arch/sparc/Kconfig 2007-06-17 13:25:17.000000000 -0400
@@ -313,6 +313,8 @@
source "arch/sparc/oprofile/Kconfig"
+source "kernel/Kconfig.immediate"
+
endmenu
source "arch/sparc/Kconfig.debug"
Index: linux-2.6-lttng/arch/sparc64/Kconfig
===================================================================
--- linux-2.6-lttng.orig/arch/sparc64/Kconfig 2007-06-17 13:21:09.000000000 -0400
+++ linux-2.6-lttng/arch/sparc64/Kconfig 2007-06-17 13:25:17.000000000 -0400
@@ -438,6 +438,9 @@
a probepoint and specifies the callback. Kprobes is useful
for kernel debugging, non-intrusive instrumentation and testing.
If in doubt, say "N".
+
+source "kernel/Kconfig.immediate"
+
endmenu
source "arch/sparc64/Kconfig.debug"
Index: linux-2.6-lttng/arch/um/Kconfig
===================================================================
--- linux-2.6-lttng.orig/arch/um/Kconfig 2007-06-17 13:21:09.000000000 -0400
+++ linux-2.6-lttng/arch/um/Kconfig 2007-06-17 13:25:17.000000000 -0400
@@ -335,4 +335,10 @@
bool
default n
+menu "Instrumentation Support"
+
+source "kernel/Kconfig.immediate"
+
+endmenu
+
source "arch/um/Kconfig.debug"
Index: linux-2.6-lttng/arch/v850/Kconfig
===================================================================
--- linux-2.6-lttng.orig/arch/v850/Kconfig 2007-06-17 13:21:09.000000000 -0400
+++ linux-2.6-lttng/arch/v850/Kconfig 2007-06-17 13:25:17.000000000 -0400
@@ -331,6 +331,12 @@
source "drivers/usb/Kconfig"
+menu "Instrumentation Support"
+
+source "kernel/Kconfig.immediate"
+
+endmenu
+
source "arch/v850/Kconfig.debug"
source "security/Kconfig"
Index: linux-2.6-lttng/arch/x86_64/Kconfig
===================================================================
--- linux-2.6-lttng.orig/arch/x86_64/Kconfig 2007-06-17 13:21:09.000000000 -0400
+++ linux-2.6-lttng/arch/x86_64/Kconfig 2007-06-17 13:25:17.000000000 -0400
@@ -792,6 +792,9 @@
a probepoint and specifies the callback. Kprobes is useful
for kernel debugging, non-intrusive instrumentation and testing.
If in doubt, say "N".
+
+source "kernel/Kconfig.immediate"
+
endmenu
source "arch/x86_64/Kconfig.debug"
Index: linux-2.6-lttng/arch/xtensa/Kconfig
===================================================================
--- linux-2.6-lttng.orig/arch/xtensa/Kconfig 2007-06-17 13:21:09.000000000 -0400
+++ linux-2.6-lttng/arch/xtensa/Kconfig 2007-06-17 13:25:17.000000000 -0400
@@ -251,6 +251,12 @@
provide one yourself.
endmenu
+menu "Instrumentation Support"
+
+source "kernel/Kconfig.immediate"
+
+endmenu
+
source "arch/xtensa/Kconfig.debug"
source "security/Kconfig"
Index: linux-2.6-lttng/kernel/Kconfig.immediate
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/kernel/Kconfig.immediate 2007-06-17 13:25:27.000000000 -0400
@@ -0,0 +1,9 @@
+# Immediate values configuration
+
+config IMMEDIATE
+ bool "Use self-modifying code to provide fast immediate values"
+ help
+ Provides a way to use immediate values acting as global values to
+ dynamically enable kernel features while having a very small
+ footprint when disabled. You may want to disable this feature if you
+ run your kernel code on a read-only rom/flash.
Index: linux-2.6-lttng/arch/avr32/Kconfig.debug
===================================================================
--- linux-2.6-lttng.orig/arch/avr32/Kconfig.debug 2007-06-17 13:21:09.000000000 -0400
+++ linux-2.6-lttng/arch/avr32/Kconfig.debug 2007-06-17 13:25:17.000000000 -0400
@@ -6,6 +6,9 @@
source "lib/Kconfig.debug"
+menu "Instrumentation Support"
+ depends on EXPERIMENTAL
+
config KPROBES
bool "Kprobes"
depends on DEBUG_KERNEL
@@ -16,4 +19,8 @@
for kernel debugging, non-intrusive instrumentation and testing.
If in doubt, say "N".
+source "kernel/Kconfig.immediate"
+
+endmenu
+
endmenu
--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
* [patch 05/10] Immediate Values - kprobe header fix
2007-07-03 16:40 [patch 00/10] Immediate Values Mathieu Desnoyers
` (3 preceding siblings ...)
2007-07-03 16:40 ` [patch 04/10] Immediate Value - Add kconfig menus Mathieu Desnoyers
@ 2007-07-03 16:40 ` Mathieu Desnoyers
2007-07-03 16:40 ` [patch 06/10] Immediate Value - i386 Optimization Mathieu Desnoyers
` (4 subsequent siblings)
9 siblings, 0 replies; 51+ messages in thread
From: Mathieu Desnoyers @ 2007-07-03 16:40 UTC (permalink / raw)
To: akpm, linux-kernel
Cc: Mathieu Desnoyers, prasanna, ananth, anil.s.keshavamurthy, davem
[-- Attachment #1: immediate-values-kprobes-headers.patch --]
[-- Type: text/plain, Size: 1296 bytes --]
Since immediate values rely on the same int3 handler that kprobes implements
on i386, the architecture-specific defines for the kprobes trap handler
(especially restore_interrupts()) must be available even when CONFIG_KPROBES is
not selected.
That kind of ifdef around a whole header does not make sense in the first place
anyway.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: prasanna@in.ibm.com
CC: ananth@in.ibm.com
CC: anil.s.keshavamurthy@intel.com
CC: davem@davemloft.net
---
include/linux/kprobes.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
Index: linux-2.6-lttng/include/linux/kprobes.h
===================================================================
--- linux-2.6-lttng.orig/include/linux/kprobes.h 2007-06-17 13:41:35.000000000 -0400
+++ linux-2.6-lttng/include/linux/kprobes.h 2007-06-17 13:41:55.000000000 -0400
@@ -37,9 +37,9 @@
#include <linux/rcupdate.h>
#include <linux/mutex.h>
-#ifdef CONFIG_KPROBES
#include <asm/kprobes.h>
+#ifdef CONFIG_KPROBES
/* kprobe_status settings */
#define KPROBE_HIT_ACTIVE 0x00000001
#define KPROBE_HIT_SS 0x00000002
--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
* [patch 06/10] Immediate Value - i386 Optimization
2007-07-03 16:40 [patch 00/10] Immediate Values Mathieu Desnoyers
` (4 preceding siblings ...)
2007-07-03 16:40 ` [patch 05/10] Immediate Values - kprobe header fix Mathieu Desnoyers
@ 2007-07-03 16:40 ` Mathieu Desnoyers
2007-07-03 18:45 ` H. Peter Anvin
2007-07-03 16:40 ` [patch 07/10] Immediate Value - PowerPC Optimization Mathieu Desnoyers
` (3 subsequent siblings)
9 siblings, 1 reply; 51+ messages in thread
From: Mathieu Desnoyers @ 2007-07-03 16:40 UTC (permalink / raw)
To: akpm, linux-kernel; +Cc: Mathieu Desnoyers
[-- Attachment #1: immediate-values-i386-optimization.patch --]
[-- Type: text/plain, Size: 12004 bytes --]
i386 optimization of the immediate values which uses a movl with code patching
to set/unset the value used to populate the register used for the branch test.
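The patching order used by this patch can be sketched in user space on a plain
byte buffer. This is only an illustration under stated assumptions: the names
sketch_set_enable(), sketch_read_imm() and the serialize callback are
hypothetical and not part of the patch, the buffer stands in for kernel text,
and the callback stands in for the sync_core() IPI sent to every CPU. No
breakpoint handler or text locking is modelled, and unlike the kernel function
the sketch reports whether it changed anything.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_INT3        0xcc /* x86 one-byte breakpoint opcode */
#define SKETCH_IMM_OFFSET  1    /* low immediate byte follows the opcode byte */

/* Hypothetical hook standing in for the cross-CPU sync_core() IPI. */
typedef void (*sketch_serialize_fn)(void);

/*
 * Flip the low immediate byte of a 5-byte "movl $imm32, %reg" held in insn[].
 * Returns 1 if the byte was changed, 0 if it already had the requested value.
 */
int sketch_set_enable(uint8_t *insn, uint8_t enable,
                      sketch_serialize_fn serialize)
{
    uint8_t saved_opcode;

    if (insn[SKETCH_IMM_OFFSET] == enable)
        return 0;                       /* not a state change, nothing to do */

    saved_opcode = insn[0];
    insn[0] = SKETCH_INT3;              /* 1: hide the instruction behind int3 */
    if (serialize != NULL)
        serialize();                    /* 2: every CPU would serialize here */
    assert(insn[0] == SKETCH_INT3);     /* immediate is only touched while hidden */
    insn[SKETCH_IMM_OFFSET] = enable;   /* 3: single-byte store of the new value */
    insn[0] = saved_opcode;             /* 4: put the original opcode back */
    return 1;
}

/* Decode the little-endian 32-bit immediate that follows the opcode byte. */
uint32_t sketch_read_imm(const uint8_t *insn)
{
    return (uint32_t)insn[1] | ((uint32_t)insn[2] << 8) |
           ((uint32_t)insn[3] << 16) | ((uint32_t)insn[4] << 24);
}
```

Starting from the bytes b8 00 00 00 00 (movl $0, %eax), enabling leaves the
opcode byte intact and turns the immediate into 1. Writing a single byte is
what keeps the store atomic whatever the alignment of the instruction.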
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
---
arch/i386/kernel/Makefile | 1
arch/i386/kernel/immediate.c | 171 +++++++++++++++++++++++++++++++++++++++++++
arch/i386/kernel/traps.c | 8 +-
include/asm-i386/immediate.h | 72 +++++++++++++++++-
4 files changed, 247 insertions(+), 5 deletions(-)
Index: linux-2.6-lttng/include/asm-i386/immediate.h
===================================================================
--- linux-2.6-lttng.orig/include/asm-i386/immediate.h 2007-06-19 17:02:14.000000000 -0400
+++ linux-2.6-lttng/include/asm-i386/immediate.h 2007-06-19 17:02:15.000000000 -0400
@@ -1 +1,71 @@
-#include <asm-generic/immediate.h>
+#ifndef _ASM_I386_IMMEDIATE_H
+#define _ASM_I386_IMMEDIATE_H
+
+/*
+ * Immediate values. i386 architecture optimizations.
+ *
+ * (C) Copyright 2006 Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
+ *
+ * This file is released under the GPLv2.
+ * See the file COPYING for more details.
+ */
+
+#define IF_DEFAULT (IF_OPTIMIZED | IF_LOCKDEP)
+
+/*
+ * Optimized version of the immediate. Passing the flags as a pointer to
+ * the inline assembly to trick it into putting the flags value as third
+ * parameter in the structure.
+ */
+#define immediate_optimized(flags, var) \
+ ({ \
+ int condition; \
+ asm ( ".section __immediate, \"a\", @progbits;\n\t" \
+ ".long %1, 0f, %2;\n\t" \
+ ".previous;\n\t" \
+ "0:\n\t" \
+ "movl %3,%0;\n\t" \
+ : "=r" (condition) \
+ : "m" (var), \
+ "m" (*(char*)flags), \
+ "i" (0)); \
+ condition; \
+ })
+
+/*
+ * immediate macro selecting the generic or optimized version of immediate,
+ * depending on the flags specified. It is a macro because we need to pass the
+ * name to immediate_optimized() and immediate_generic() so they can declare a
+ * static variable with it.
+ */
+#define _immediate(flags, var) \
+({ \
+ (((flags) & IF_LOCKDEP) && ((flags) & IF_OPTIMIZED)) ? \
+ immediate_optimized(flags, var) : \
+ immediate_generic(flags, var); \
+})
+
+/* immediate with default behavior */
+#define immediate(var) _immediate(IF_DEFAULT, var)
+
+/*
+ * Architecture dependent immediate information, used internally for immediate
+ * activation.
+ */
+
+/*
+ * Offset of the immediate value from the start of the movl instruction, in
+ * bytes. We point to the first lower byte of the 4 bytes immediate value. Only
+ * changing one byte makes sure we do an atomic memory write, independently of
+ * the alignment of the 4 bytes in the load immediate instruction.
+ */
+#define IMMEDIATE_OPTIMIZED_ENABLE_IMMEDIATE_OFFSET 1
+#define IMMEDIATE_OPTIMIZED_ENABLE_TYPE unsigned char
+/* Dereference enable as lvalue from a pointer to its instruction */
+#define IMMEDIATE_OPTIMIZED_ENABLE(a) \
+ (*(IMMEDIATE_OPTIMIZED_ENABLE_TYPE*) \
+ ((char*)(a)+IMMEDIATE_OPTIMIZED_ENABLE_IMMEDIATE_OFFSET))
+
+extern int immediate_optimized_set_enable(void *address, char enable);
+
+#endif /* _ASM_I386_IMMEDIATE_H */
Index: linux-2.6-lttng/arch/i386/kernel/Makefile
===================================================================
--- linux-2.6-lttng.orig/arch/i386/kernel/Makefile 2007-06-19 17:00:55.000000000 -0400
+++ linux-2.6-lttng/arch/i386/kernel/Makefile 2007-06-19 17:02:15.000000000 -0400
@@ -35,6 +35,7 @@
obj-y += sysenter.o vsyscall.o
obj-$(CONFIG_ACPI_SRAT) += srat.o
obj-$(CONFIG_EFI) += efi.o efi_stub.o
+obj-$(CONFIG_IMMEDIATE) += immediate.o
obj-$(CONFIG_DOUBLEFAULT) += doublefault.o
obj-$(CONFIG_SERIAL_8250) += legacy_serial.o
obj-$(CONFIG_VM86) += vm86.o
Index: linux-2.6-lttng/arch/i386/kernel/immediate.c
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/arch/i386/kernel/immediate.c 2007-06-19 17:02:43.000000000 -0400
@@ -0,0 +1,171 @@
+/*
+ * Immediate Value - i386 architecture specific code.
+ *
+ * Rationale
+ *
+ * Required because of :
+ * - Erratum 49 fix for Intel PIII.
+ * - Still present on newer processors : Intel Core 2 Duo Processor for Intel
+ * Centrino Duo Processor Technology Specification Update, AH33.
+ * Unsynchronized Cross-Modifying Code Operations Can Cause Unexpected
+ * Instruction Execution Results.
+ *
+ * Permits immediate value modification by XMC with correct serialization.
+ *
+ * Reentrant for NMI and trap handler instrumentation. Permits XMC to a
+ * location that has preemption enabled because it involves no temporary or
+ * reused data structure.
+ *
+ * Quoting Richard J Moore, source of the information motivating this
+ * implementation which differs from the one proposed by Intel which is not
+ * suitable for kernel context (does not support NMI and would require disabling
+ * interrupts on every CPU for a long period) :
+ *
+ * "There is another issue to consider when looking into using probes other
+ * than int3:
+ *
+ * Intel erratum 54 - Unsynchronized Cross-modifying code - refers to the
+ * practice of modifying code on one processor where another has prefetched
+ * the unmodified version of the code. Intel states that unpredictable general
+ * protection faults may result if a synchronizing instruction (iret, int,
+ * int3, cpuid, etc ) is not executed on the second processor before it
+ * executes the pre-fetched out-of-date copy of the instruction.
+ *
+ * When we became aware of this I had a long discussion with Intel's
+ * microarchitecture guys. It turns out that the reason for this erratum
+ * (which incidentally Intel does not intend to fix) is because the trace
+ * cache - the stream of micro-ops resulting from instruction interpretation -
+ * cannot be guaranteed to be valid. Reading between the lines I assume this
+ * issue arises because of optimization done in the trace cache, where it is
+ * no longer possible to identify the original instruction boundaries. If the
+ * CPU discovers that the trace cache has been invalidated because of
+ * unsynchronized cross-modification then instruction execution will be
+ * aborted with a GPF. Further discussion with Intel revealed that replacing
+ * the first opcode byte with an int3 would not be subject to this erratum.
+ *
+ * So, is cmpxchg reliable? One has to guarantee more than mere atomicity."
+ *
+ * Overall design
+ *
+ * The algorithm proposed by Intel does not apply well in kernel context: it
+ * would imply disabling interrupts and looping on every CPU while modifying
+ * the code and would not support instrumentation of code called from interrupt
+ * sources that cannot be disabled.
+ *
+ * Therefore, we use a different algorithm to respect Intel's erratum (see the
+ * quoted discussion above). We make sure that no CPU sees an out-of-date copy
+ * of a pre-fetched instruction by 1 - using a breakpoint, which skips the
+ * instruction that is going to be modified, 2 - issuing an IPI to every CPU to
+ * execute a sync_core(), to make sure that even when the breakpoint is removed,
+ * no CPU could possibly still have the out-of-date copy of the instruction,
+ * 3 - modifying the now unused 2nd byte of the instruction, and 4 - putting
+ * back the original 1st byte of the instruction.
+ *
+ * It has exactly the same intent as the algorithm proposed by Intel, but
+ * it has fewer side effects, scales better and supports NMI, SMI and MCE.
+ *
+ * Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
+ */
+
+#include <linux/notifier.h>
+#include <linux/preempt.h>
+#include <linux/smp.h>
+#include <linux/notifier.h>
+#include <linux/module.h>
+#include <linux/immediate.h>
+#include <linux/kdebug.h>
+#include <linux/rcupdate.h>
+
+#include <asm/cacheflush.h>
+
+#define BREAKPOINT_INSTRUCTION 0xcc
+#define BREAKPOINT_INS_LEN 1
+
+static long target_eip;
+
+static void immediate_synchronize_core(void *info)
+{
+ sync_core(); /* use cpuid to stop speculative execution */
+}
+
+/*
+ * The eip value points right after the breakpoint instruction, in the second
+ * byte of the movl. Incrementing it by 4 bytes makes the code resume right
+ * after the movl instruction, effectively skipping this instruction.
+ *
+ * We simply skip the 4 bytes load immediate here, leaving the register in an
+ * undefined state. We don't care about the content (0 or !0), because we are
+ * changing the value 0->1 or 1->0. This small window of undefined value
+ * doesn't matter.
+ */
+static int immediate_notifier(struct notifier_block *nb,
+ unsigned long val, void *data)
+{
+ enum die_val die_val = (enum die_val) val;
+ struct die_args *args = data;
+
+ if (!args->regs || user_mode_vm(args->regs))
+ return NOTIFY_DONE;
+
+ if (die_val == DIE_INT3 && args->regs->eip == target_eip) {
+ args->regs->eip += 4; /* Skip the rest of the load immediate */
+ return NOTIFY_STOP;
+ }
+ return NOTIFY_DONE;
+}
+
+static struct notifier_block immediate_notify = {
+ .notifier_call = immediate_notifier,
+ .priority = 0x7fffffff, /* we need to be notified first */
+};
+
+/*
+ * The address is not aligned. We can only change 1 byte of the value
+ * atomically.
+ * Must be called with immediate_mutex held.
+ */
+int immediate_optimized_set_enable(void *address, char enable)
+{
+ char saved_byte;
+ int ret;
+ char *dest = address;
+
+ if (!(enable ^ dest[1])) /* Must be a state change 0<->1 to execute */
+ return 0;
+
+ /* We plan to write only on the 1st 2 bytes of the movl */
+ kernel_text_lock((unsigned long)address, 2);
+ target_eip = (long)address + BREAKPOINT_INS_LEN;
+ /* register_die_notifier has memory barriers */
+ register_die_notifier(&immediate_notify);
+ saved_byte = *dest;
+ *dest = BREAKPOINT_INSTRUCTION;
+ wmb();
+ /*
+ * Execute serializing instruction on each CPU.
+ * Acts as a memory barrier.
+ */
+ ret = on_each_cpu(immediate_synchronize_core, NULL, 1, 1);
+ BUG_ON(ret != 0);
+
+ dest[1] = enable;
+ wmb();
+ *dest = saved_byte;
+ kernel_text_unlock((unsigned long)address, 2);
+ /*
+ * Wait for all int3 handlers to end
+ * (interrupts are disabled in int3).
+ * This CPU is clearly not in a int3 handler,
+ * because int3 handler is not preemptible and
+ * there cannot be any more int3 handler called
+ * for this site, because we placed the original
+ * instruction back.
+ * synchronize_sched has memory barriers.
+ */
+ synchronize_sched();
+ unregister_die_notifier(&immediate_notify);
+ /* unregister_die_notifier has memory barriers */
+ target_eip = 0;
+ return 0;
+}
+EXPORT_SYMBOL_GPL(immediate_optimized_set_enable);
Index: linux-2.6-lttng/arch/i386/kernel/traps.c
===================================================================
--- linux-2.6-lttng.orig/arch/i386/kernel/traps.c 2007-06-19 17:00:55.000000000 -0400
+++ linux-2.6-lttng/arch/i386/kernel/traps.c 2007-06-19 17:02:15.000000000 -0400
@@ -628,7 +628,7 @@
}
DO_VM86_ERROR_INFO( 0, SIGFPE, "divide error", divide_error, FPE_INTDIV, regs->eip)
-#ifndef CONFIG_KPROBES
+#if !defined(CONFIG_KPROBES) && !defined(CONFIG_IMMEDIATE)
DO_VM86_ERROR( 3, SIGTRAP, "int3", int3)
#endif
DO_VM86_ERROR( 4, SIGSEGV, "overflow", overflow)
@@ -848,14 +848,14 @@
nmi_exit();
}
-#ifdef CONFIG_KPROBES
+#if defined(CONFIG_KPROBES) || defined(CONFIG_IMMEDIATE)
fastcall void __kprobes do_int3(struct pt_regs *regs, long error_code)
{
if (notify_die(DIE_INT3, "int3", regs, error_code, 3, SIGTRAP)
== NOTIFY_STOP)
return;
- /* This is an interrupt gate, because kprobes wants interrupts
- disabled. Normal trap handlers don't. */
+ /* This is an interrupt gate, because kprobes and immediate values want
+ * interrupts disabled. Normal trap handlers don't. */
restore_interrupts(regs);
do_trap(3, SIGTRAP, "int3", 1, regs, error_code, NULL);
}
--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
* [patch 07/10] Immediate Value - PowerPC Optimization
2007-07-03 16:40 [patch 00/10] Immediate Values Mathieu Desnoyers
` (5 preceding siblings ...)
2007-07-03 16:40 ` [patch 06/10] Immediate Value - i386 Optimization Mathieu Desnoyers
@ 2007-07-03 16:40 ` Mathieu Desnoyers
2007-07-03 16:40 ` [patch 08/10] Immediate Value - Documentation Mathieu Desnoyers
` (2 subsequent siblings)
9 siblings, 0 replies; 51+ messages in thread
From: Mathieu Desnoyers @ 2007-07-03 16:40 UTC (permalink / raw)
To: akpm, linux-kernel; +Cc: Mathieu Desnoyers
[-- Attachment #1: immediate-values-powerpc-optimization.patch --]
[-- Type: text/plain, Size: 4571 bytes --]
PowerPC optimization of the immediate values which uses a li instruction,
patched with an immediate value 0 or 1 to set the immediate value.
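As a rough user-space illustration of what gets patched: li rD,value is the
addi rD,0,value encoding, a 4-byte big-endian word whose signed 16-bit
immediate occupies the last two bytes, hence the byte offset of 2 used in the
header. The helper names below (sketch_ppc_set_enable(), sketch_ppc_word()) are
hypothetical, the buffer stands in for kernel text, and the instruction cache
flush that the kernel needs after the store is only mentioned in a comment.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PPC_INSN_SIZE   4 /* every PowerPC instruction is 4 bytes */
#define SKETCH_PPC_IMM_OFFSET  2 /* SIMM field: last two bytes of the word */

/*
 * Rewrite the 16-bit immediate of an "li" (addi rD,0,SIMM) instruction stored
 * in big-endian byte order in insn[]. The whole instruction is copied out,
 * edited, and copied back, mirroring the copy/modify/copy shape of the patch.
 */
void sketch_ppc_set_enable(uint8_t *insn, uint16_t enable)
{
    uint8_t newi[SKETCH_PPC_INSN_SIZE];

    assert(insn != 0);
    memcpy(newi, insn, sizeof(newi));
    newi[SKETCH_PPC_IMM_OFFSET]     = (uint8_t)(enable >> 8);   /* high byte */
    newi[SKETCH_PPC_IMM_OFFSET + 1] = (uint8_t)(enable & 0xff); /* low byte */
    memcpy(insn, newi, sizeof(newi));
    /* Kernel code would now flush the icache over [insn, insn + 4). */
}

/* Reassemble the big-endian instruction word for inspection. */
uint32_t sketch_ppc_word(const uint8_t *insn)
{
    return ((uint32_t)insn[0] << 24) | ((uint32_t)insn[1] << 16) |
           ((uint32_t)insn[2] << 8) | (uint32_t)insn[3];
}
```

With the bytes 38 60 00 00 (li r3,0), enabling yields the word 0x38600001,
which is li r3,1. The scratch copy has to be as large as the full instruction,
since four bytes are copied in each direction.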
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
---
arch/powerpc/kernel/Makefile | 1
arch/powerpc/kernel/immediate.c | 29 ++++++++++++++++
include/asm-powerpc/immediate.h | 69 +++++++++++++++++++++++++++++++++++++++-
3 files changed, 98 insertions(+), 1 deletion(-)
Index: linux-2.6-lttng/include/asm-powerpc/immediate.h
===================================================================
--- linux-2.6-lttng.orig/include/asm-powerpc/immediate.h 2007-06-18 19:15:39.000000000 -0400
+++ linux-2.6-lttng/include/asm-powerpc/immediate.h 2007-06-18 19:15:46.000000000 -0400
@@ -1 +1,68 @@
-#include <asm-generic/immediate.h>
+#ifndef _ASM_POWERPC_IMMEDIATE_H
+#define _ASM_POWERPC_IMMEDIATE_H
+
+/*
+ * Immediate values. PowerPC architecture optimizations.
+ *
+ * (C) Copyright 2006 Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
+ *
+ * This file is released under the GPLv2.
+ * See the file COPYING for more details.
+ */
+
+#include <asm/asm-compat.h>
+
+#define IF_DEFAULT (IF_OPTIMIZED | IF_LOCKDEP)
+
+/* Optimized version of the immediate */
+#define immediate_optimized(flags, var) \
+ ({ \
+ char condition; \
+ asm ( ".section __immediate, \"a\", @progbits;\n\t" \
+ PPC_LONG "%1, 0f, %2;\n\t" \
+ ".previous;\n\t" \
+ ".align 4\n\t" \
+ "0:\n\t" \
+ "li %0,%3;\n\t" \
+ : "=r" (condition) \
+ : "i" (&var), \
+ "i" (flags), \
+ "i" (0)); \
+ condition; \
+ })
+
+/*
+ * immediate macro selecting the generic or optimized version of immediate,
+ * depending on the flags specified. It is a macro because we need to pass the
+ * name to immediate_optimized() and immediate_generic() so they can declare a
+ * static variable with it.
+ */
+#define _immediate(flags, var) \
+({ \
+ ((flags) & IF_OPTIMIZED) ? \
+ immediate_optimized(flags, var) : \
+ immediate_generic(flags, var); \
+})
+
+/* immediate with default behavior */
+#define immediate(var) _immediate(IF_DEFAULT, var)
+
+/*
+ * Architecture dependent immediate information, used internally for immediate
+ * activation.
+ */
+
+/*
+ * Offset of the immediate value from the start of the addi instruction (result
+ * of the li mnemonic), in bytes.
+ */
+#define IMMEDIATE_OPTIMIZED_ENABLE_IMMEDIATE_OFFSET 2
+#define IMMEDIATE_OPTIMIZED_ENABLE_TYPE unsigned short
+/* Dereference enable as lvalue from a pointer to its instruction */
+#define IMMEDIATE_OPTIMIZED_ENABLE(a) \
+ (*(IMMEDIATE_OPTIMIZED_ENABLE_TYPE*) \
+ ((char*)(a)+IMMEDIATE_OPTIMIZED_ENABLE_IMMEDIATE_OFFSET))
+
+extern int immediate_optimized_set_enable(void *address, char enable);
+
+#endif /* _ASM_POWERPC_IMMEDIATE_H */
Index: linux-2.6-lttng/arch/powerpc/kernel/Makefile
===================================================================
--- linux-2.6-lttng.orig/arch/powerpc/kernel/Makefile 2007-06-18 19:15:39.000000000 -0400
+++ linux-2.6-lttng/arch/powerpc/kernel/Makefile 2007-06-18 19:15:46.000000000 -0400
@@ -96,3 +96,4 @@
extra-$(CONFIG_PPC_FPU) += fpu.o
extra-$(CONFIG_PPC64) += entry_64.o
+obj-$(CONFIG_IMMEDIATE) += immediate.o
Index: linux-2.6-lttng/arch/powerpc/kernel/immediate.c
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/arch/powerpc/kernel/immediate.c 2007-06-18 19:27:52.000000000 -0400
@@ -0,0 +1,29 @@
+/*
+ * Powerpc optimized immediate values enabling/disabling.
+ *
+ * Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
+ */
+
+#include <linux/module.h>
+#include <linux/immediate.h>
+#include <linux/string.h>
+#include <asm/cacheflush.h>
+#include <asm/page.h>
+
+/*
+ * The address is aligned on 4 bytes boundary: the 4 bytes instruction we are
+ * changing fits within one page.
+ */
+int immediate_optimized_set_enable(void *address, char enable)
+{
+ char newi[IMMEDIATE_OPTIMIZED_ENABLE_IMMEDIATE_OFFSET
+ + sizeof(IMMEDIATE_OPTIMIZED_ENABLE_TYPE)];
+ int size = sizeof(newi);
+
+ memcpy(newi, address, size);
+ IMMEDIATE_OPTIMIZED_ENABLE(&newi[0]) = enable;
+ memcpy(address, newi, size);
+ flush_icache_range((unsigned long)address, (unsigned long)address + size);
+ return 0;
+}
+EXPORT_SYMBOL_GPL(immediate_optimized_set_enable);
--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
^ permalink raw reply [flat|nested] 51+ messages in thread
* [patch 08/10] Immediate Value - Documentation
2007-07-03 16:40 [patch 00/10] Immediate Values Mathieu Desnoyers
` (6 preceding siblings ...)
2007-07-03 16:40 ` [patch 07/10] Immediate Value - PowerPC Optimization Mathieu Desnoyers
@ 2007-07-03 16:40 ` Mathieu Desnoyers
2007-07-03 16:40 ` [patch 09/10] F00F bug fixup for i386 - use immediate values Mathieu Desnoyers
2007-07-03 16:40 ` [patch 10/10] Scheduler profiling - Use " Mathieu Desnoyers
9 siblings, 0 replies; 51+ messages in thread
From: Mathieu Desnoyers @ 2007-07-03 16:40 UTC (permalink / raw)
To: akpm, linux-kernel; +Cc: Mathieu Desnoyers
[-- Attachment #1: immediate-values-documentation.patch --]
[-- Type: text/plain, Size: 4142 bytes --]
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
---
Documentation/immediate.txt | 103 ++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 103 insertions(+)
Index: linux-2.6-lttng/Documentation/immediate.txt
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/Documentation/immediate.txt 2007-06-15 16:14:05.000000000 -0400
@@ -0,0 +1,103 @@
+ Using the Immediate Values
+
+ Mathieu Desnoyers
+
+
+This document introduces Immediate Values and their use.
+
+* Purpose of immediate values
+
+An immediate value is used to compile into the kernel a branch that is disabled
+by default and has almost no measurable performance impact while disabled.
+It can then be enabled dynamically at runtime.
+
+It can be used for code compiled into the kernel that is seldom meant to be
+dynamically activated. This is the case for CPU-specific workarounds,
+profiling, tracing, etc.
+
+This infrastructure is specialized in supporting dynamic patching of the values
+in the instruction stream when multiple CPUs are running without disturbing the
+normal system behavior.
+
+
+* Usage
+
+In order to use the macro immediate, you should include linux/immediate.h.
+
+#include <linux/immediate.h>
+
+immediate_t __read_mostly this_immediate;
+EXPORT_SYMBOL(this_immediate);
+
+
+Then add to your code:
+
+if (unlikely(immediate(this_immediate))) {
+ some code...
+}
+
+Then, at runtime:
+
+Use immediate_arm(&this_immediate) to activate the immediate value.
+
+Use immediate_disarm(&this_immediate) to deactivate the immediate value.
+
+Use immediate_query(&this_immediate) to query the immediate value state.
+
+The immediate mechanism supports inserting multiple instances of the same
+immediate. Immediate values can be put in inline functions, inlined static
+functions, and unrolled loops.
+
+
+* Optimization for a given architecture
+
+One can implement optimized immediate values for a given architecture by
+replacing asm-$ARCH/immediate.h.
+
+The IF_* flags can be used to control the type of immediate value. See the
+include/linux/immediate.h header for the list of flags. They can be specified as
+the first parameter of the _immediate() macro, as in the following example,
+which uses flags to declare an immediate that always uses the generic version of
+the immediates. This can be useful when immediates are placed in kernel code
+presenting particular reentrancy challenges.
+
+if (unlikely(_immediate(IF_DEFAULT & ~IF_OPTIMIZED, this_immediate))) {
+ some code...
+}
+
+
+* Performance improvement
+
+Result of a small test comparing:
+
+1 - Branch depending on a cache miss (has to fetch from memory, caused by a
+ 128-byte stride). This test is likely to resemble the side effect the
+ original profile_hit code was causing, under the assumption that the
+ kernel is already using the L1 and L2 caches at their full capacity
+ and that a supplementary data load would cause cache
+ thrashing.
+2 - Branch depending on L1 cache hit. Just for comparison.
+3 - Branch depending on a load immediate in the instruction stream.
+
+It has been compiled with gcc -O2. Tests done on a 3GHz P4.
+
+In the first test series, the branch is not taken:
+
+number of tests : 1000
+number of branches per test : 81920
+memory hit cycles per iteration (mean) : 48.252
+L1 cache hit cycles per iteration (mean) : 16.1693
+instruction stream based test, cycles per iteration (mean) : 16.0432
+
+
+In the second test series, the branch is taken and an integer is
+incremented within the block:
+
+number of tests : 1000
+number of branches per test : 81920
+memory hit cycles per iteration (mean) : 48.2691
+L1 cache hit cycles per iteration (mean) : 16.396
+instruction stream based test, cycles per iteration (mean) : 16.0441
+
+Therefore, the memory fetch based test is about 200% slower than the
+load immediate based test (roughly 48 versus 16 cycles per iteration).
--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
^ permalink raw reply [flat|nested] 51+ messages in thread
* [patch 09/10] F00F bug fixup for i386 - use immediate values
2007-07-03 16:40 [patch 00/10] Immediate Values Mathieu Desnoyers
` (7 preceding siblings ...)
2007-07-03 16:40 ` [patch 08/10] Immediate Value - Documentation Mathieu Desnoyers
@ 2007-07-03 16:40 ` Mathieu Desnoyers
2007-07-04 20:43 ` Alexey Dobriyan
2007-07-03 16:40 ` [patch 10/10] Scheduler profiling - Use " Mathieu Desnoyers
9 siblings, 1 reply; 51+ messages in thread
From: Mathieu Desnoyers @ 2007-07-03 16:40 UTC (permalink / raw)
To: akpm, linux-kernel; +Cc: Mathieu Desnoyers
[-- Attachment #1: f00f-bug-use-immediate-values.patch --]
[-- Type: text/plain, Size: 2754 bytes --]
Use the faster immediate values for F00F bug handling in do_page_fault.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
---
arch/i386/kernel/traps.c | 4 ++++
arch/i386/mm/fault.c | 3 ++-
include/asm-i386/processor.h | 3 +++
3 files changed, 9 insertions(+), 1 deletion(-)
Index: linux-2.6-lttng/arch/i386/kernel/traps.c
===================================================================
--- linux-2.6-lttng.orig/arch/i386/kernel/traps.c 2007-06-15 16:13:50.000000000 -0400
+++ linux-2.6-lttng/arch/i386/kernel/traps.c 2007-06-15 16:14:06.000000000 -0400
@@ -31,6 +31,7 @@
#include <linux/uaccess.h>
#include <linux/nmi.h>
#include <linux/bug.h>
+#include <linux/immediate.h>
#ifdef CONFIG_EISA
#include <linux/ioport.h>
@@ -1149,6 +1150,8 @@
#endif /* CONFIG_MATH_EMULATION */
#ifdef CONFIG_X86_F00F_BUG
+immediate_t __read_mostly f00f_bug_fix;
+
void __init trap_init_f00f_bug(void)
{
__set_fixmap(FIX_F00F_IDT, __pa(&idt_table), PAGE_KERNEL_RO);
@@ -1159,6 +1162,7 @@
*/
idt_descr.address = fix_to_virt(FIX_F00F_IDT);
load_idt(&idt_descr);
+ immediate_arm(&f00f_bug_fix);
}
#endif
Index: linux-2.6-lttng/arch/i386/mm/fault.c
===================================================================
--- linux-2.6-lttng.orig/arch/i386/mm/fault.c 2007-06-15 16:13:51.000000000 -0400
+++ linux-2.6-lttng/arch/i386/mm/fault.c 2007-06-15 16:14:06.000000000 -0400
@@ -25,6 +25,7 @@
#include <linux/kprobes.h>
#include <linux/uaccess.h>
#include <linux/kdebug.h>
+#include <linux/immediate.h>
#include <asm/system.h>
#include <asm/desc.h>
@@ -477,7 +478,7 @@
/*
* Pentium F0 0F C7 C8 bug workaround.
*/
- if (boot_cpu_data.f00f_bug) {
+ if (unlikely(immediate(f00f_bug_fix))) {
unsigned long nr;
nr = (address - idt_descr.address) >> 3;
Index: linux-2.6-lttng/include/asm-i386/processor.h
===================================================================
--- linux-2.6-lttng.orig/include/asm-i386/processor.h 2007-06-15 16:13:51.000000000 -0400
+++ linux-2.6-lttng/include/asm-i386/processor.h 2007-06-15 16:14:06.000000000 -0400
@@ -21,6 +21,7 @@
#include <asm/percpu.h>
#include <linux/cpumask.h>
#include <linux/init.h>
+#include <linux/immediate.h>
#include <asm/processor-flags.h>
/* flag for disabling the tsc */
@@ -102,6 +103,8 @@
extern struct tss_struct doublefault_tss;
DECLARE_PER_CPU(struct tss_struct, init_tss);
+extern immediate_t __read_mostly f00f_bug_fix;
+
#ifdef CONFIG_SMP
extern struct cpuinfo_x86 cpu_data[];
#define current_cpu_data cpu_data[smp_processor_id()]
--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
^ permalink raw reply [flat|nested] 51+ messages in thread
* [patch 10/10] Scheduler profiling - Use immediate values
2007-07-03 16:40 [patch 00/10] Immediate Values Mathieu Desnoyers
` (8 preceding siblings ...)
2007-07-03 16:40 ` [patch 09/10] F00F bug fixup for i386 - use immediate values Mathieu Desnoyers
@ 2007-07-03 16:40 ` Mathieu Desnoyers
2007-07-03 18:11 ` Alexey Dobriyan
2007-07-04 20:35 ` Alexey Dobriyan
9 siblings, 2 replies; 51+ messages in thread
From: Mathieu Desnoyers @ 2007-07-03 16:40 UTC (permalink / raw)
To: akpm, linux-kernel; +Cc: Mathieu Desnoyers
[-- Attachment #1: profiling-use-immediate-values.patch --]
[-- Type: text/plain, Size: 6593 bytes --]
Use immediate values with lower d-cache hit in optimized version as a
condition for scheduler profiling call.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
---
drivers/kvm/svm.c | 2 +-
drivers/kvm/vmx.c | 2 +-
include/linux/profile.h | 9 +++------
kernel/profile.c | 37 +++++++++++++++++++++++++------------
kernel/sched.c | 3 ++-
5 files changed, 32 insertions(+), 21 deletions(-)
Index: linux-2.6-lttng/kernel/profile.c
===================================================================
--- linux-2.6-lttng.orig/kernel/profile.c 2007-06-29 12:21:08.000000000 -0400
+++ linux-2.6-lttng/kernel/profile.c 2007-06-29 14:26:42.000000000 -0400
@@ -23,6 +23,7 @@
#include <linux/profile.h>
#include <linux/highmem.h>
#include <linux/mutex.h>
+#include <linux/immediate.h>
#include <asm/sections.h>
#include <asm/semaphore.h>
#include <asm/irq_regs.h>
@@ -42,9 +43,6 @@
static atomic_t *prof_buffer;
static unsigned long prof_len, prof_shift;
-int prof_on __read_mostly;
-EXPORT_SYMBOL_GPL(prof_on);
-
static cpumask_t prof_cpu_mask = CPU_MASK_ALL;
#ifdef CONFIG_SMP
static DEFINE_PER_CPU(struct profile_hit *[2], cpu_profile_hits);
@@ -52,6 +50,12 @@
static DEFINE_MUTEX(profile_flip_mutex);
#endif /* CONFIG_SMP */
+/* Immediate values */
+immediate_t __read_mostly sleep_profiling, sched_profiling, kvm_profiling,
+ cpu_profiling;
+EXPORT_SYMBOL_GPL(kvm_profiling);
+EXPORT_SYMBOL_GPL(cpu_profiling);
+
static int __init profile_setup(char * str)
{
static char __initdata schedstr[] = "schedule";
@@ -60,7 +64,7 @@
int par;
if (!strncmp(str, sleepstr, strlen(sleepstr))) {
- prof_on = SLEEP_PROFILING;
+ immediate_arm(&sleep_profiling);
if (str[strlen(sleepstr)] == ',')
str += strlen(sleepstr) + 1;
if (get_option(&str, &par))
@@ -69,7 +73,7 @@
"kernel sleep profiling enabled (shift: %ld)\n",
prof_shift);
} else if (!strncmp(str, schedstr, strlen(schedstr))) {
- prof_on = SCHED_PROFILING;
+ immediate_arm(&sched_profiling);
if (str[strlen(schedstr)] == ',')
str += strlen(schedstr) + 1;
if (get_option(&str, &par))
@@ -78,7 +82,7 @@
"kernel schedule profiling enabled (shift: %ld)\n",
prof_shift);
} else if (!strncmp(str, kvmstr, strlen(kvmstr))) {
- prof_on = KVM_PROFILING;
+ immediate_arm(&kvm_profiling);
if (str[strlen(kvmstr)] == ',')
str += strlen(kvmstr) + 1;
if (get_option(&str, &par))
@@ -88,7 +92,7 @@
prof_shift);
} else if (get_option(&str, &par)) {
prof_shift = par;
- prof_on = CPU_PROFILING;
+ immediate_arm(&cpu_profiling);
printk(KERN_INFO "kernel profiling enabled (shift: %ld)\n",
prof_shift);
}
@@ -99,7 +103,10 @@
void __init profile_init(void)
{
- if (!prof_on)
+ if (!immediate_query(&sleep_profiling) &&
+ !immediate_query(&sched_profiling) &&
+ !immediate_query(&kvm_profiling) &&
+ !immediate_query(&cpu_profiling))
return;
/* only text is profiled */
@@ -288,7 +295,7 @@
int i, j, cpu;
struct profile_hit *hits;
- if (prof_on != type || !prof_buffer)
+ if (!prof_buffer)
return;
pc = min((pc - (unsigned long)_stext) >> prof_shift, prof_len - 1);
i = primary = (pc & (NR_PROFILE_GRP - 1)) << PROFILE_GRPSHIFT;
@@ -398,7 +405,7 @@
{
unsigned long pc;
- if (prof_on != type || !prof_buffer)
+ if (!prof_buffer)
return;
pc = ((unsigned long)__pc - (unsigned long)_stext) >> prof_shift;
atomic_add(nr_hits, &prof_buffer[min(pc, prof_len - 1)]);
@@ -555,7 +562,10 @@
}
return 0;
out_cleanup:
- prof_on = 0;
+ immediate_disarm(&sleep_profiling);
+ immediate_disarm(&sched_profiling);
+ immediate_disarm(&kvm_profiling);
+ immediate_disarm(&cpu_profiling);
smp_mb();
on_each_cpu(profile_nop, NULL, 0, 1);
for_each_online_cpu(cpu) {
@@ -582,7 +592,10 @@
{
struct proc_dir_entry *entry;
- if (!prof_on)
+ if (!immediate_query(&sleep_profiling) &&
+ !immediate_query(&sched_profiling) &&
+ !immediate_query(&kvm_profiling) &&
+ !immediate_query(&cpu_profiling))
return 0;
if (create_hash_tables())
return -1;
Index: linux-2.6-lttng/include/linux/profile.h
===================================================================
--- linux-2.6-lttng.orig/include/linux/profile.h 2007-06-29 10:21:44.000000000 -0400
+++ linux-2.6-lttng/include/linux/profile.h 2007-06-29 14:26:42.000000000 -0400
@@ -10,7 +10,8 @@
#include <asm/errno.h>
-extern int prof_on __read_mostly;
+extern immediate_t __read_mostly sleep_profiling, sched_profiling, kvm_profiling,
+ cpu_profiling;
#define CPU_PROFILING 1
#define SCHED_PROFILING 2
@@ -35,11 +36,7 @@
*/
static inline void profile_hit(int type, void *ip)
{
- /*
- * Speedup for the common (no profiling enabled) case:
- */
- if (unlikely(prof_on == type))
- profile_hits(type, ip, 1);
+ profile_hits(type, ip, 1);
}
#ifdef CONFIG_PROC_FS
Index: linux-2.6-lttng/kernel/sched.c
===================================================================
--- linux-2.6-lttng.orig/kernel/sched.c 2007-06-29 14:16:23.000000000 -0400
+++ linux-2.6-lttng/kernel/sched.c 2007-06-29 14:27:26.000000000 -0400
@@ -3241,7 +3241,8 @@
if (unlikely(in_atomic_preempt_off()) && unlikely(!prev->exit_state))
__schedule_bug(prev);
- profile_hit(SCHED_PROFILING, __builtin_return_address(0));
+ if (unlikely(immediate(sched_profiling)))
+ profile_hit(SCHED_PROFILING, __builtin_return_address(0));
schedstat_inc(this_rq(), sched_cnt);
}
Index: linux-2.6-lttng/drivers/kvm/svm.c
===================================================================
--- linux-2.6-lttng.orig/drivers/kvm/svm.c 2007-06-29 12:27:40.000000000 -0400
+++ linux-2.6-lttng/drivers/kvm/svm.c 2007-06-29 14:26:42.000000000 -0400
@@ -1654,7 +1654,7 @@
/*
* Profile KVM exit RIPs:
*/
- if (unlikely(prof_on == KVM_PROFILING))
+ if (unlikely(immediate(kvm_profiling)))
profile_hit(KVM_PROFILING,
(void *)(unsigned long)vcpu->svm->vmcb->save.rip);
Index: linux-2.6-lttng/drivers/kvm/vmx.c
===================================================================
--- linux-2.6-lttng.orig/drivers/kvm/vmx.c 2007-06-29 12:27:40.000000000 -0400
+++ linux-2.6-lttng/drivers/kvm/vmx.c 2007-06-29 14:26:42.000000000 -0400
@@ -2156,7 +2156,7 @@
/*
* Profile KVM exit RIPs:
*/
- if (unlikely(prof_on == KVM_PROFILING))
+ if (unlikely(immediate(kvm_profiling)))
profile_hit(KVM_PROFILING, (void *)vmcs_readl(GUEST_RIP));
vcpu->launched = 1;
--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [patch 10/10] Scheduler profiling - Use immediate values
2007-07-03 16:40 ` [patch 10/10] Scheduler profiling - Use " Mathieu Desnoyers
@ 2007-07-03 18:11 ` Alexey Dobriyan
2007-07-03 18:57 ` Mathieu Desnoyers
2007-07-04 20:35 ` Alexey Dobriyan
1 sibling, 1 reply; 51+ messages in thread
From: Alexey Dobriyan @ 2007-07-03 18:11 UTC (permalink / raw)
To: Mathieu Desnoyers; +Cc: akpm, linux-kernel
On Tue, Jul 03, 2007 at 12:40:56PM -0400, Mathieu Desnoyers wrote:
> Use immediate values with lower d-cache hit in optimized version as a
> condition for scheduler profiling call.
How much difference in performance do you see?
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [patch 06/10] Immediate Value - i386 Optimization
2007-07-03 16:40 ` [patch 06/10] Immediate Value - i386 Optimization Mathieu Desnoyers
@ 2007-07-03 18:45 ` H. Peter Anvin
2007-07-03 19:16 ` Mathieu Desnoyers
0 siblings, 1 reply; 51+ messages in thread
From: H. Peter Anvin @ 2007-07-03 18:45 UTC (permalink / raw)
To: Mathieu Desnoyers; +Cc: akpm, linux-kernel
What is not clear to me is the exact code that is generated by these
macros. Nor can I find it anywhere in the documentation.
Could you please describe this in some detail? In particular, it seems
that the uses of these are largely as branch targets, where the extra
indirection over modifying the jump target directly seems wasted.
-hpa
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [patch 10/10] Scheduler profiling - Use immediate values
2007-07-03 18:11 ` Alexey Dobriyan
@ 2007-07-03 18:57 ` Mathieu Desnoyers
2007-07-04 14:23 ` Adrian Bunk
` (2 more replies)
0 siblings, 3 replies; 51+ messages in thread
From: Mathieu Desnoyers @ 2007-07-03 18:57 UTC (permalink / raw)
To: Alexey Dobriyan; +Cc: akpm, linux-kernel
* Alexey Dobriyan (adobriyan@gmail.com) wrote:
> On Tue, Jul 03, 2007 at 12:40:56PM -0400, Mathieu Desnoyers wrote:
> > Use immediate values with lower d-cache hit in optimized version as a
> > condition for scheduler profiling call.
>
> How much difference in performance do you see?
>
Hi Alexey,
Please have a look at Documentation/immediate.txt for that information.
Also note that the main advantage of the load immediate is to free a
cache line. Therefore, I guess the best way to quantify the improvement
it brings at one single site is not in terms of cycles, but in terms of
number of cache lines used by the scheduler code. Since memory bandwidth
seems to be an increasing bottleneck (CPU frequency increases faster
than the available memory bandwidth), it makes sense to free as many
cache lines as we can.
Measuring the overall impact of this single modification on the system
shows a difference, for one site, that falls within the standard
deviation of the normal samples. It will become significant when the
number of immediate values used instead of global variables on hot
kernel paths (weighted by the frequency at which the data is
accessed) starts to be significant compared to the L1 data cache
size. We could characterize this in memory to L1 cache transfers per
second.
On 3GHz P4:
memory read: ~48 cycles
So we can definitely say that 48*HZ (approximation of the frequency at
which the scheduler is called) won't make much difference, but as it
grows, it will.
On a 1000HZ system, it results in:
48000 cycles/second, or 16µs/second, or a 0.0016% speedup.
However, if we place this in code called much more often, such as
do_page_fault, we get, with a hypothetical scenario of approximately
100000 page faults per second:
4800000 cycles/s, 1.6ms/second or a 0.16% speedup.
So as the number of immediate values used increases, the overall memory
bandwidth required by the kernel will go down.
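As a rough check of the arithmetic above, here is a minimal C sketch that
converts a per-call saving into a fraction of CPU time. It only assumes the
figures quoted in this message (48 cycles per avoided memory read, a 3GHz
clock); the helper names are made up for illustration.

```c
#include <assert.h>

/* Cycles saved per second if every call avoids one memory read. */
static double saved_cycles_per_sec(double cycles_per_miss, double calls_per_sec)
{
	return cycles_per_miss * calls_per_sec;
}

/*
 * Fraction of one CPU's time that those cycles represent at the given
 * clock rate (multiply by 100 to get a percentage).
 */
static double saved_fraction(double cycles_per_miss, double calls_per_sec,
			     double cpu_hz)
{
	assert(cpu_hz > 0.0);
	return saved_cycles_per_sec(cycles_per_miss, calls_per_sec) / cpu_hz;
}
```

With 1000 calls per second this gives 48000 cycles, a fraction of 0.000016
of each second; with 100000 calls per second the fraction is 0.0016.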
Mathieu
--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [patch 06/10] Immediate Value - i386 Optimization
2007-07-03 18:45 ` H. Peter Anvin
@ 2007-07-03 19:16 ` Mathieu Desnoyers
2007-07-03 20:18 ` H. Peter Anvin
0 siblings, 1 reply; 51+ messages in thread
From: Mathieu Desnoyers @ 2007-07-03 19:16 UTC (permalink / raw)
To: H. Peter Anvin; +Cc: akpm, linux-kernel
* H. Peter Anvin (hpa@zytor.com) wrote:
> What is not clear to me is the exact code that is generated by these
> macros. Nor can I find it anywhere in the documentation.
>
> Could you please describe this in some detail? In particular, it seems
> that the uses of these are largely as branch targets, where the extra
> indirection over modifying the jump target directly seems wasted.
>
Hi Peter,
I understand your concern. If you find a way to let the code be compiled
by gcc, put at the end of the functions (never being a branch target)
and then, dynamically, get the address of the branch instruction and
patch it, all that in cooperation with gcc, I would be glad to hear from
it. What I found is that gcc lets us do anything that touches
variables/registers in an inline assembly, but does not permit to place
branch instructions ourselves; it does not expect the execution flow to
be changed in inline asms.
Here is an objdump of the interesting bits on an immediate value placed
in schedule (inline schedule_debug).
00000000 <schedule>:
0: 55 push %ebp
1: 89 e5 mov %esp,%ebp
3: 57 push %edi
4: 56 push %esi
5: 53 push %ebx
6: 83 ec 40 sub $0x40,%esp
9: b8 01 00 00 00 mov $0x1,%eax
e: e8 fc ff ff ff call f <schedule+0xf>
13: e8 fc ff ff ff call 14 <schedule+0x14>
18: 89 45 dc mov %eax,0xffffffdc(%ebp)
1b: b8 00 00 00 00 mov $0x0,%eax
20: 8b 4d dc mov 0xffffffdc(%ebp),%ecx
23: 8b 14 8d 00 00 00 00 mov 0x0(,%ecx,4),%edx
2a: 01 d0 add %edx,%eax
2c: 89 45 d0 mov %eax,0xffffffd0(%ebp)
2f: b8 00 00 00 00 mov $0x0,%eax
34: c7 44 02 04 01 00 00 movl $0x1,0x4(%edx,%eax,1)
3b: 00
3c: 8b 5d d0 mov 0xffffffd0(%ebp),%ebx
3f: 8b 9b f0 03 00 00 mov 0x3f0(%ebx),%ebx
45: 89 5d c8 mov %ebx,0xffffffc8(%ebp)
48: 81 c3 94 01 00 00 add $0x194,%ebx
4e: 89 5d cc mov %ebx,0xffffffcc(%ebp)
51: 8b 45 c8 mov 0xffffffc8(%ebp),%eax
54: 8b 40 14 mov 0x14(%eax),%eax
57: 85 c0 test %eax,%eax
59: 0f 89 30 03 00 00 jns 38f <schedule+0x38f>
5f: 89 e0 mov %esp,%eax
61: 25 00 e0 ff ff and $0xffffe000,%eax
66: 8b 40 14 mov 0x14(%eax),%eax
69: 25 ff ff ff ef and $0xefffffff,%eax
6e: 83 e8 01 sub $0x1,%eax
71: 0f 85 fb 02 00 00 jne 372 <schedule+0x372>
<branch site>
77: b8 00 00 00 00 mov $0x0,%eax
7c: 85 c0 test %eax,%eax
7e: 0f 85 16 03 00 00 jne 39a <schedule+0x39a>
here, we just loaded 0 in eax (movl used to make sure we populate the
whole register so we do not stall the pipeline).
When we activate the site,
line 77 becomes: b8 01 00 00 00 mov $0x1,%eax
</branch site>
84: 8b 45 d0 mov 0xffffffd0(%ebp),%eax
87: e8 fc ff ff ff call 88 <schedule+0x88>
8c: 8b 4d c8 mov 0xffffffc8(%ebp),%ecx
8f: 8b 41 04 mov 0x4(%ecx),%eax
92: f0 0f ba 70 08 02 lock btrl $0x2,0x8(%eax)
...
<profile_hit inline function>
39a: 8b 55 04 mov 0x4(%ebp),%edx
39d: b9 01 00 00 00 mov $0x1,%ecx
3a2: b8 02 00 00 00 mov $0x2,%eax
3a7: e8 fc ff ff ff call 3a8 <schedule+0x3a8>
3ac: e9 d3 fc ff ff jmp 84 <schedule+0x84>
</profile_hit inline function>
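The activation step shown in the branch site above amounts to rewriting the
32-bit immediate that follows the b8 opcode. A minimal user-space sketch of
that byte edit, on a plain buffer rather than live kernel text, could look as
follows. The function name and constants are hypothetical, and the sketch
deliberately ignores what the real patch has to handle (the text edit lock,
atomicity with respect to other CPUs, instruction cache coherency).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MOV_EAX_IMM32_OPCODE	0xb8	/* "mov $imm32,%eax", as at offset 77 */
#define MOV_EAX_IMM32_OFFSET	1	/* immediate follows the 1-byte opcode */

/*
 * Rewrite the 32-bit immediate of a "mov $imm32,%eax" stored in insn[0..4].
 * Returns 0 on success, -1 if insn does not start with the expected opcode.
 */
static int set_mov_eax_imm32(uint8_t *insn, uint32_t value)
{
	int i;

	assert(insn != NULL);
	if (insn[0] != MOV_EAX_IMM32_OPCODE)
		return -1;
	/* x86 immediates are little-endian: least significant byte first. */
	for (i = 0; i < 4; i++)
		insn[MOV_EAX_IMM32_OFFSET + i] = (uint8_t)(value >> (8 * i));
	return 0;
}
```

Applied to the bytes b8 00 00 00 00 with value 1, this produces
b8 01 00 00 00, which is the activated form listed in the dump.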
--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [patch 06/10] Immediate Value - i386 Optimization
2007-07-03 19:16 ` Mathieu Desnoyers
@ 2007-07-03 20:18 ` H. Peter Anvin
2007-07-03 20:37 ` Chuck Ebbert
2007-07-03 20:43 ` Mathieu Desnoyers
0 siblings, 2 replies; 51+ messages in thread
From: H. Peter Anvin @ 2007-07-03 20:18 UTC (permalink / raw)
To: Mathieu Desnoyers; +Cc: akpm, linux-kernel
Mathieu Desnoyers wrote:
>
> Hi Peter,
>
> I understand your concern. If you find a way to let the code be compiled
> by gcc, put at the end of the functions (never being a branch target)
> and then, dynamically, get the address of the branch instruction and
> patch it, all that in cooperation with gcc, I would be glad to hear from
> it. What I found is that gcc lets us do anything that touches
> variables/registers in an inline assembly, but does not permit to place
> branch instructions ourselves; it does not expect the execution flow to
> be changed in inline asms.
>
I believe this is correct. It probably would require requesting a gcc
builtin, which might be worthwhile to do if we
> <branch site>
> 77: b8 00 00 00 00 mov $0x0,%eax
> 7c: 85 c0 test %eax,%eax
> 7e: 0f 85 16 03 00 00 jne 39a <schedule+0x39a>
> here, we just loaded 0 in eax (movl used to make sure we populate the
> whole register so we do not stall the pipeline)
> When we activate the site,
> line 77 becomes: b8 01 00 00 00 mov $0x1,%eax
> </branch site>
One could, though, use an indirect jump to achieve, if not as good, at
least most of the effect:
movl $<patchable>,<reg>
jmp *<reg>
Some x86 cores will be able to detect the movl...jmp forwarding, and
collapse it into a known branch target; however, on the ones that can't,
it might be worse, since one would have to rely on the indirect branch
predictor.
This would, however, provide infrastructure that could be combined with
a future gcc builtin.
-hpa
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [patch 06/10] Immediate Value - i386 Optimization
2007-07-03 20:18 ` H. Peter Anvin
@ 2007-07-03 20:37 ` Chuck Ebbert
2007-07-03 21:30 ` H. Peter Anvin
2007-07-03 23:10 ` Jeremy Fitzhardinge
2007-07-03 20:43 ` Mathieu Desnoyers
1 sibling, 2 replies; 51+ messages in thread
From: Chuck Ebbert @ 2007-07-03 20:37 UTC (permalink / raw)
To: H. Peter Anvin; +Cc: Mathieu Desnoyers, akpm, linux-kernel
On 07/03/2007 04:18 PM, H. Peter Anvin wrote:
>
> One could, though, use an indirect jump to achieve, if not as good, at
> least most of the effect:
>
> movl $<patchable>,<reg>
> jmp *<reg>
>
Yeah, but there's this GCC bug:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=22448
You can't even dereference labels in an ASM statement.
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [patch 06/10] Immediate Value - i386 Optimization
2007-07-03 20:18 ` H. Peter Anvin
2007-07-03 20:37 ` Chuck Ebbert
@ 2007-07-03 20:43 ` Mathieu Desnoyers
2007-07-03 21:30 ` H. Peter Anvin
1 sibling, 1 reply; 51+ messages in thread
From: Mathieu Desnoyers @ 2007-07-03 20:43 UTC (permalink / raw)
To: H. Peter Anvin; +Cc: akpm, linux-kernel
* H. Peter Anvin (hpa@zytor.com) wrote:
> Mathieu Desnoyers wrote:
> >
> > Hi Peter,
> >
> > I understand your concern. If you find a way to let the code be compiled
> > by gcc, put at the end of the functions (never being a branch target)
> > and then, dynamically, get the address of the branch instruction and
> > patch it, all that in cooperation with gcc, I would be glad to hear from
> > it. What I found is that gcc lets us do anything that touches
> > variables/registers in an inline assembly, but does not permit to place
> > branch instructions ourselves; it does not expect the execution flow to
> > be changed in inline asms.
> >
>
> I believe this is correct. It probably would require requesting a gcc
> builtin, which might be worthwhile to do if we
>
> > <branch site>
> > 77: b8 00 00 00 00 mov $0x0,%eax
> > 7c: 85 c0 test %eax,%eax
> > 7e: 0f 85 16 03 00 00 jne 39a <schedule+0x39a>
> > here, we just loaded 0 in eax (movl used to make sure we populate the
> > whole register so we do not stall the pipeline)
> > When we activate the site,
> > line 77 becomes: b8 01 00 00 00 mov $0x1,%eax
> > </branch site>
>
> One could, though, use an indirect jump to achieve, if not as good, at
> least most of the effect:
>
> movl $<patchable>,<reg>
> jmp *<reg>
>
Using a jmp *<reg> will instruct gcc not to inline inline functions and
restrict loop unrolling (but the latter is not used in the linux
kernel). We would have to compute different $<patchable> for every site
generated by putting an immediate in an inline function.
> Some x86 cores will be able to detect the movl...jmp forwarding, and
> collapse it into a known branch target; however, on the ones that can't,
> it might be worse, since one would have to rely on the indirect branch
> predictor.
>
> This would, however, provide infrastructure that could be combined with
> a future gcc builtin.
>
If we can change the compiler, here is what we could do:
Tell GCC to put NOPs that could be altered by a branch alternative to
some specified code. We should be able to get the instruction pointers
(think of inlines) to these nop/branch instructions so we can change
them dynamically.
Something like:
immediate_t myfunc_cond;
inline void myfunction(void) {
static void *insn; /* pointer to nops/branch instruction */
static void *target_inactive, *target_active;
__builtin_polymorphic_if(&insn, &myfunc_cond) {
/* Do something */
} else {
...
}
}
I could then save all the insns into my immediate value section and
later activate them by looking up all of those that refer to myfunc_cond.
The default behavior would be to branch to the target_inactive, and we
could change insn to jump to target_active dynamically.
Note that we should align the jump instruction so the address could be
changed atomically in the general case (on x86 and x86_64, we have to
use an int3 bypass anyway, so we don't really care).
Also, we should find a way to let gcc tell us what type of jump it had
to use depending on how far the target of the branch is.
I suspect this would be inherently tricky. If someone is ready to do
this and tells me "yes, it will be there in 1 month", I am more than
ready to switch my markers to this and help, but since the core of my
work is kernel tracing, I have neither the time nor the resources to
tackle this problem.
In the event that someone answers "we'll do this in the following 3
years", I might consider changing the if (immediate(var)) into an
immediate_if (var) so we can later proceed to the change with simple
ifdefs without rewriting all the kernel code that would use it.
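To illustrate the immediate_if idea, here is a sketch of how such a wrapper
could be layered on the existing if (immediate(var)) form. Everything in it is
an assumption for illustration: the immediate_t layout, the load-based
immediate() stand-in and the arm/disarm helpers are user-space substitutes,
not the definitions from this patch set, and immediate_if itself does not
exist yet.

```c
#include <assert.h>
#include <stddef.h>

/* User-space stand-ins; the real definitions live in linux/immediate.h. */
typedef struct { int value; } immediate_t;	/* hypothetical layout */
#define unlikely(x)	__builtin_expect(!!(x), 0)	/* GCC/Clang builtin */
#define immediate(var)	((var).value)	/* generic, load-based fallback */

/*
 * Hypothetical wrapper: call sites write immediate_if (var) { ... } so the
 * expansion could later be swapped for a compiler builtin behind an ifdef.
 */
#define immediate_if(var)	if (unlikely(immediate(var)))

static void immediate_arm(immediate_t *v)
{
	assert(v != NULL);
	v->value = 1;
}

static void immediate_disarm(immediate_t *v)
{
	assert(v != NULL);
	v->value = 0;
}

static immediate_t myfunc_cond;
static int hits;	/* counts how often the guarded block ran */

static void myfunction(void)
{
	immediate_if (myfunc_cond) {
		hits++;
	} else {
		/* common, inactive path: nothing to do */
	}
}
```

Because the macro expands to a plain if, existing else branches keep working,
and only the macro body would need to change if a builtin became available.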
Mathieu
--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [patch 06/10] Immediate Value - i386 Optimization
2007-07-03 20:43 ` Mathieu Desnoyers
@ 2007-07-03 21:30 ` H. Peter Anvin
0 siblings, 0 replies; 51+ messages in thread
From: H. Peter Anvin @ 2007-07-03 21:30 UTC (permalink / raw)
To: Mathieu Desnoyers; +Cc: akpm, linux-kernel
Mathieu Desnoyers wrote:
>
> If we can change the compiler, here is what we could do:
>
> Tell GCC to put NOPs that could be altered by a branch alternative to
> some specified code. We should be able to get the instruction pointers
> (think of inlines) to these nop/branch instructions so we can change
> them dynamically.
>
Changing the compiler should be perfectly feasible, *BUT* I think we
need a transitional solution that works on existing compilers.
> I suspect this would be inherently tricky. If someone is ready to do
> this and tells me "yes, it will be there in 1 month", I am more than
> ready to switch my markers to this and help, but since the core of my
> work is kernel tracing, I have neither the time nor the resources to
> tackle this problem.
>
> In the event that someone answers "we'll do this in the following 3
> years", I might consider changing the if (immediate(var)) into an
> immediate_if (var) so we can later proceed to the change with simple
> ifdefs without rewriting all the kernel code that would use it.
This is much more of "we'll do that in the following 1-2 years", since
we have to deal with a full gcc development cycle. However, I really
want to see this being implemented in a way that would let us DTRT in
the long run.
-hpa
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [patch 06/10] Immediate Value - i386 Optimization
2007-07-03 20:37 ` Chuck Ebbert
@ 2007-07-03 21:30 ` H. Peter Anvin
2007-07-03 23:10 ` Jeremy Fitzhardinge
1 sibling, 0 replies; 51+ messages in thread
From: H. Peter Anvin @ 2007-07-03 21:30 UTC (permalink / raw)
To: Chuck Ebbert; +Cc: Mathieu Desnoyers, akpm, linux-kernel
Chuck Ebbert wrote:
> On 07/03/2007 04:18 PM, H. Peter Anvin wrote:
>> One could, though, use an indirect jump to achieve, if not as good, at
>> least most of the effect:
>>
>> movl $<patchable>,<reg>
>> jmp *<reg>
>>
>
> Yeah, but there's this GCC bug:
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=22448
>
> You can't even dereference labels in an ASM statement.
I wouldn't do that, though, for the existing compiler. Instead, I
would do:
void (*func)(void); /* or what's appropriate */
asm(<magic movl> : "=rm" (func));
func();
-hpa
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [patch 06/10] Immediate Value - i386 Optimization
2007-07-03 20:37 ` Chuck Ebbert
2007-07-03 21:30 ` H. Peter Anvin
@ 2007-07-03 23:10 ` Jeremy Fitzhardinge
1 sibling, 0 replies; 51+ messages in thread
From: Jeremy Fitzhardinge @ 2007-07-03 23:10 UTC (permalink / raw)
To: Chuck Ebbert; +Cc: H. Peter Anvin, Mathieu Desnoyers, akpm, linux-kernel
Chuck Ebbert wrote:
> On 07/03/2007 04:18 PM, H. Peter Anvin wrote:
>
>> One could, though, use an indirect jump to achieve, if not as good, at
>> least most of the effect:
>>
>> movl $<patchable>,<reg>
>> jmp *<reg>
>>
>>
>
> Yeah, but there's this GCC bug:
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=22448
>
> You can't even dereference labels in an ASM statement.
I was told in absolute terms that any use of &&label other than to pass
it to goto was not supported, and would not be supported.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29305
Seems that passing to an asm() falls into the same class of problem I
had. I think the underlying problem is that if the code containing the
label is in an inlined function or unrolled loop, the reference can't be
resolved properly anyway.
J
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [patch 10/10] Scheduler profiling - Use immediate values
2007-07-03 18:57 ` Mathieu Desnoyers
@ 2007-07-04 14:23 ` Adrian Bunk
2007-07-04 20:31 ` Alexey Dobriyan
2007-07-05 20:21 ` Andrew Morton
2 siblings, 0 replies; 51+ messages in thread
From: Adrian Bunk @ 2007-07-04 14:23 UTC (permalink / raw)
To: Mathieu Desnoyers; +Cc: Alexey Dobriyan, akpm, linux-kernel
On Tue, Jul 03, 2007 at 02:57:48PM -0400, Mathieu Desnoyers wrote:
> * Alexey Dobriyan (adobriyan@gmail.com) wrote:
> > On Tue, Jul 03, 2007 at 12:40:56PM -0400, Mathieu Desnoyers wrote:
> > > Use immediate values with lower d-cache hit in optimized version as a
> > > condition for scheduler profiling call.
> >
> > How much difference in performance do you see?
> >
>
> Hi Alexey,
>
> Please have a look at Documentation/immediate.txt for that information.
> Also note that the main advantage of the load immediate is to free a
> cache line. Therefore, I guess the best way to quantify the improvement
> it brings at one single site is not in terms of cycles, but in terms of
> number of cache lines used by the scheduler code. Since memory bandwidth
> seems to be an increasing bottleneck (CPU frequency increases faster
> than the available memory bandwidth), it makes sense to free as much
> cache lines as we can.
>
> Measuring the overall impact on the system of this single modification
> results in the difference brought by one site within the standard
> deviation of the normal samples. It will become significant when the
> number of immediate values used instead of global variables at hot
> kernel paths (need to ponder with the frequency at which the data is
> accessed) will start to be significant compared to the L1 data cache
> size. We could characterize this in memory to L1 cache transfers per
> seconds.
>
> On 3GHz P4:
>
> memory read: ~48 cycles
>
> So we can definitely say that 48*HZ (approximation of the frequency at
> which the scheduler is called) won't make much difference, but as it
> grows, it will.
>
> On a 1000HZ system, it results in:
>
> 48000 cycles/second, or 16µs/second, or 0.000016% speedup.
>
> However, if we place this in code called much more often, such as
> do_page_fault, we get, with an hypotetical scenario of approximation
> of 100000 page faults per second:
>
> 4800000 cycles/s, 1.6ms/second or 0.0016% speedup.
>
> So as the number of immediate values used increase, the overall memory
> bandwidth required by the kernel will go down.
Might make a nice scientific paper, but even according to your own
optimistic numbers it's not realistic that you will ever achieve any
visible improvement even if you'd find 100 places in hotpaths you could
mark this way.
And a better direction for hotpaths seems to be Andi's __cold/COLD in
-mm, without adding a separate framework for doing such things.
> Mathieu
cu
Adrian
--
"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [patch 10/10] Scheduler profiling - Use immediate values
2007-07-03 18:57 ` Mathieu Desnoyers
2007-07-04 14:23 ` Adrian Bunk
@ 2007-07-04 20:31 ` Alexey Dobriyan
2007-07-05 20:21 ` Andrew Morton
2 siblings, 0 replies; 51+ messages in thread
From: Alexey Dobriyan @ 2007-07-04 20:31 UTC (permalink / raw)
To: Mathieu Desnoyers; +Cc: akpm, linux-kernel
On Tue, Jul 03, 2007 at 02:57:48PM -0400, Mathieu Desnoyers wrote:
> * Alexey Dobriyan (adobriyan@gmail.com) wrote:
> > On Tue, Jul 03, 2007 at 12:40:56PM -0400, Mathieu Desnoyers wrote:
> > > Use immediate values with lower d-cache hit in optimized version as a
> > > condition for scheduler profiling call.
> >
> > How much difference in performance do you see?
> >
>
> Hi Alexey,
>
> Please have a look at Documentation/immediate.txt for that information.
> Also note that the main advantage of the load immediate is to free a
> cache line. Therefore, I guess the best way to quantify the improvement
> it brings at one single site is not in terms of cycles, but in terms of
> number of cache lines used by the scheduler code. Since memory bandwidth
> seems to be an increasing bottleneck (CPU frequency increases faster
> than the available memory bandwidth), it makes sense to free as much
> cache lines as we can.
>
> Measuring the overall impact on the system of this single modification
> results in the difference brought by one site within the standard
> deviation of the normal samples. It will become significant when the
> number of immediate values used instead of global variables at hot
> kernel paths (need to ponder with the frequency at which the data is
> accessed) will start to be significant compared to the L1 data cache
> size.
L1 cache is 8K here. Just how many such variables should exist?
On hot paths!
> We could characterize this in memory to L1 cache transfers per
> seconds.
>
> On 3GHz P4:
>
> memory read: ~48 cycles
>
> So we can definitely say that 48*HZ (approximation of the frequency at
> which the scheduler is called) won't make much difference, but as it
> grows, it will.
>
> On a 1000HZ system, it results in:
>
> 48000 cycles/second, or 16µs/second, or 0.000016% speedup.
>
> However, if we place this in code called much more often, such as
> do_page_fault, we get, with an hypotetical scenario of approximation
> of 100000 page faults per second:
>
> 4800000 cycles/s, 1.6ms/second or 0.0016% speedup.
>
> So as the number of immediate values used increase, the overall memory
> bandwidth required by the kernel will go down.
Adding so much infrastructure for something that you can't even measure
is totally unjustified.
There are already too many places where unlikely() and __read_mostly are
used just because they can be used, so adding yet another such very
specific, let's call it annotation, seems wrong to me.
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [patch 10/10] Scheduler profiling - Use immediate values
2007-07-03 16:40 ` [patch 10/10] Scheduler profiling - Use " Mathieu Desnoyers
2007-07-03 18:11 ` Alexey Dobriyan
@ 2007-07-04 20:35 ` Alexey Dobriyan
2007-07-04 22:41 ` Andi Kleen
1 sibling, 1 reply; 51+ messages in thread
From: Alexey Dobriyan @ 2007-07-04 20:35 UTC (permalink / raw)
To: Mathieu Desnoyers; +Cc: akpm, linux-kernel
On Tue, Jul 03, 2007 at 12:40:56PM -0400, Mathieu Desnoyers wrote:
> Use immediate values with lower d-cache hit in optimized version as a
> condition for scheduler profiling call.
I think it's better to put profile.c under CONFIG_PROFILING as
_expected_, so CONFIG_PROFILING=n users won't get any overhead, immediate or
not. That's what I'm going to do after test-booting a bunch of kernels.
Thus, enabling CONFIG_PROFILING option will buy you some overhead,
again, as _expected_.
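
A minimal sketch of the compile-time approach described above, assuming the
usual pattern of an empty inline stub when the option is off. The names
sketch_profile_hit(), sketch_schedule() and sketch_hits are hypothetical and
do not correspond to the real profile.c interface.

```c
#include <assert.h>

static unsigned long sketch_hits;	/* illustrative counter */

#ifdef CONFIG_PROFILING
/* Profiling built in: the hook does real work. */
static inline void sketch_profile_hit(unsigned long pc)
{
	assert(pc != 0);
	sketch_hits++;
}
#else
/* Profiling configured out: the hook compiles to nothing. */
static inline void sketch_profile_hit(unsigned long pc)
{
	(void)pc;
}
#endif

static void sketch_schedule(void)
{
	sketch_profile_hit(1);	/* the call site is identical in both builds */
}
```

The cost moves entirely to build time: a CONFIG_PROFILING=n kernel pays
nothing, but profiling cannot be enabled without booting a different kernel.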
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [patch 09/10] F00F bug fixup for i386 - use immediate values
2007-07-03 16:40 ` [patch 09/10] F00F bug fixup for i386 - use immediate values Mathieu Desnoyers
@ 2007-07-04 20:43 ` Alexey Dobriyan
0 siblings, 0 replies; 51+ messages in thread
From: Alexey Dobriyan @ 2007-07-04 20:43 UTC (permalink / raw)
To: Mathieu Desnoyers; +Cc: akpm, linux-kernel
On Tue, Jul 03, 2007 at 12:40:55PM -0400, Mathieu Desnoyers wrote:
> Use the faster immediate values for F00F bug handling in do_page_fault.
> --- linux-2.6-lttng.orig/arch/i386/mm/fault.c
> +++ linux-2.6-lttng/arch/i386/mm/fault.c
> @@ -25,6 +25,7 @@
> #include <linux/kprobes.h>
> #include <linux/uaccess.h>
> #include <linux/kdebug.h>
> +#include <linux/immediate.h>
>
> #include <asm/system.h>
> #include <asm/desc.h>
> @@ -477,7 +478,7 @@
> /*
> * Pentium F0 0F C7 C8 bug workaround.
> */
> - if (boot_cpu_data.f00f_bug) {
> + if (unlikely(immediate(f00f_bug_fix))) {
As I understand it, this never triggers in normal valid and invalid
pagefaults, because even invalid userspace pagefaults take a branch
slightly earlier, where they catch SIGSEGV. Nobody cares about performance
if you reach this branch.
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [patch 10/10] Scheduler profiling - Use immediate values
2007-07-04 20:35 ` Alexey Dobriyan
@ 2007-07-04 22:41 ` Andi Kleen
0 siblings, 0 replies; 51+ messages in thread
From: Andi Kleen @ 2007-07-04 22:41 UTC (permalink / raw)
To: Alexey Dobriyan; +Cc: Mathieu Desnoyers, akpm, linux-kernel
Alexey Dobriyan <adobriyan@gmail.com> writes:
> On Tue, Jul 03, 2007 at 12:40:56PM -0400, Mathieu Desnoyers wrote:
> > Use immediate values with lower d-cache hit in optimized version as a
> > condition for scheduler profiling call.
>
> I think it's better to put profile.c under CONFIG_PROFILING as
> _expected_, so CONFIG_PROFILING=n users won't get any overhead, immediate or
> not. That's what I'm going to do after test-booting bunch of kernels.
No, it's better to handle this efficiently at runtime e.g. for
distribution kernels. Mathieu's patch is good
-Andi
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [patch 10/10] Scheduler profiling - Use immediate values
2007-07-03 18:57 ` Mathieu Desnoyers
2007-07-04 14:23 ` Adrian Bunk
2007-07-04 20:31 ` Alexey Dobriyan
@ 2007-07-05 20:21 ` Andrew Morton
2007-07-05 20:29 ` Andrew Morton
2007-07-06 11:44 ` Andi Kleen
2 siblings, 2 replies; 51+ messages in thread
From: Andrew Morton @ 2007-07-05 20:21 UTC (permalink / raw)
To: Mathieu Desnoyers; +Cc: Alexey Dobriyan, linux-kernel
On Tue, 3 Jul 2007 14:57:48 -0400
Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> wrote:
> Measuring the overall impact on the system of this single modification
> results in the difference brought by one site within the standard
> deviation of the normal samples. It will become significant when the
> number of immediate values used instead of global variables at hot
> kernel paths (need to ponder with the frequency at which the data is
> accessed) will start to be significant compared to the L1 data cache
> size. We could characterize this in memory to L1 cache transfers per
> seconds.
>
> On 3GHz P4:
>
> memory read: ~48 cycles
>
> So we can definitely say that 48*HZ (approximation of the frequency at
> which the scheduler is called) won't make much difference, but as it
> grows, it will.
>
> On a 1000HZ system, it results in:
>
> 48000 cycles/second, or 16µs/second, or 0.000016% speedup.
>
> However, if we place this in code called much more often, such as
> do_page_fault, we get, with an hypotetical scenario of approximation
> of 100000 page faults per second:
>
> 4800000 cycles/s, 1.6ms/second or 0.0016% speedup.
>
> So as the number of immediate values used increase, the overall memory
> bandwidth required by the kernel will go down.
Is that 48 cycles measured when the target of the read is in L1 cache, as
it would be in any situation which we actually care about? I guess so...
Boy, this is a tiny optimisation and boy, you added a pile of tricky new
code to obtain it.
Frankly, I'm thinking that life would be simpler if we just added static
markers and stopped trying to add lots of tricksy
maintenance-load-increasing things like this.
Ho hum. Need more convincing, please.
Also: a while back (maybe as much as a year) we had an extensive discussion
regarding whether we want static markers at all in the kernel. The
eventual outcome was, I believe, "yes".
But our reasons for making that decision appear to have been lost. So if I
were to send the markers patches to Linus and he were to ask me "why are
you sending these", I'd be forced to answer "I don't know". This is not a
good situation.
Please prepare and maintain a short document which describes the
justification for making all these changes to the kernel. The changelog
for the main markers patch would be an appropriate place for this. The
target audience would be kernel developers and it should capture the pro-
and con- arguments which were raised during that discussion.
Basically: tell us why we should merge _any_ of this stuff, because I for
one have forgotten. Thanks.
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [patch 10/10] Scheduler profiling - Use immediate values
2007-07-05 20:21 ` Andrew Morton
@ 2007-07-05 20:29 ` Andrew Morton
2007-07-05 20:41 ` Mathieu Desnoyers
2007-07-06 11:44 ` Andi Kleen
1 sibling, 1 reply; 51+ messages in thread
From: Andrew Morton @ 2007-07-05 20:29 UTC (permalink / raw)
To: Mathieu Desnoyers, Alexey Dobriyan, linux-kernel
On Thu, 5 Jul 2007 13:21:20 -0700
Andrew Morton <akpm@linux-foundation.org> wrote:
> Please prepare and maintain a short document which describes the
> justification for making all these changes to the kernel.
oh, you did. It's there in the add-kconfig-stuff patch.
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [patch 10/10] Scheduler profiling - Use immediate values
2007-07-05 20:29 ` Andrew Morton
@ 2007-07-05 20:41 ` Mathieu Desnoyers
0 siblings, 0 replies; 51+ messages in thread
From: Mathieu Desnoyers @ 2007-07-05 20:41 UTC (permalink / raw)
To: Andrew Morton; +Cc: Alexey Dobriyan, linux-kernel
* Andrew Morton (akpm@linux-foundation.org) wrote:
> On Thu, 5 Jul 2007 13:21:20 -0700
> Andrew Morton <akpm@linux-foundation.org> wrote:
>
> > Please prepare and maintain a short document which describes the
> > justification for making all these changes to the kernel.
>
> oh, you did. It's there in the add-kconfig-stuff patch.
Yes, if you feel it should be put in a different patch header, I'll move
it.
--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [patch 10/10] Scheduler profiling - Use immediate values
2007-07-05 20:21 ` Andrew Morton
2007-07-05 20:29 ` Andrew Morton
@ 2007-07-06 11:44 ` Andi Kleen
2007-07-06 17:50 ` Li, Tong N
2007-07-06 22:14 ` [patch 10/10] " Chuck Ebbert
1 sibling, 2 replies; 51+ messages in thread
From: Andi Kleen @ 2007-07-06 11:44 UTC (permalink / raw)
To: Andrew Morton; +Cc: Mathieu Desnoyers, Alexey Dobriyan, linux-kernel
Andrew Morton <akpm@linux-foundation.org> writes:
> Is that 48 cycles measured when the target of the read is in L1 cache, as
> it would be in any situation which we actually care about? I guess so...
The normal situation is that a big database or other bloated software runs,
clears all the dcaches, then enters the kernel. The kernel has a cache miss
on all its data. But icache access is faster because the CPU prefetches.
We've had cases like this -- e.g. the additional dcache line
accesses that were added by the new time code in vgettimeofday()
were visible in macro benchmarks.
Also cache misses in this situation tend to be much more than 48 cycles
(even a K8 with an integrated memory controller and the fastest DIMMs is
slower than that). Mathieu probably measured an L2 miss, not a load from RAM.
Load from RAM can be hundreds of ns in the worst case.
I think the optimization is a good idea, although I dislike that it is
complicated for the dynamic markers. If it was just static it would be
much simpler.
-Andi
^ permalink raw reply [flat|nested] 51+ messages in thread
* RE: [patch 10/10] Scheduler profiling - Use immediate values
2007-07-06 11:44 ` Andi Kleen
@ 2007-07-06 17:50 ` Li, Tong N
2007-07-06 20:03 ` Andi Kleen
2007-07-07 1:50 ` [patch 10/10] *Tests* " Mathieu Desnoyers
2007-07-06 22:14 ` [patch 10/10] " Chuck Ebbert
1 sibling, 2 replies; 51+ messages in thread
From: Li, Tong N @ 2007-07-06 17:50 UTC (permalink / raw)
To: Andi Kleen, Andrew Morton
Cc: Mathieu Desnoyers, Alexey Dobriyan, linux-kernel
> Also cache misses in this situation tend to be much more than 48 cycles
> (even an K8 with integrated memory controller with fastest DIMMs is
> slower than that) Mathieu probably measured an L2 miss, not a load from RAM.
> Load from RAM can be hundreds of ns in the worst case.
>
The 48 cycles sounds to me like a memory load in an unloaded system, but
it is quite low. I wonder how it was measured...
tong
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [patch 10/10] Scheduler profiling - Use immediate values
2007-07-06 17:50 ` Li, Tong N
@ 2007-07-06 20:03 ` Andi Kleen
2007-07-06 20:57 ` Li, Tong N
2007-07-07 1:50 ` [patch 10/10] *Tests* " Mathieu Desnoyers
1 sibling, 1 reply; 51+ messages in thread
From: Andi Kleen @ 2007-07-06 20:03 UTC (permalink / raw)
To: Li, Tong N
Cc: Andi Kleen, Andrew Morton, Mathieu Desnoyers, Alexey Dobriyan,
linux-kernel
On Fri, Jul 06, 2007 at 10:50:30AM -0700, Li, Tong N wrote:
> > Also cache misses in this situation tend to be much more than 48 cycles
> > (even an K8 with integrated memory controller with fastest DIMMs is
> > slower than that) Mathieu probably measured an L2 miss, not a load
^^^^^^^
I meant L2 cache hit of course
> > from RAM.
> > Load from RAM can be hundreds of ns in the worst case.
> >
>
> The 48 cycles sounds to me like a memory load in an unloaded system, but
> it is quite low. I wonder how it was measured...
I found that memory latency is difficult to measure in modern x86
CPUs because they have very clever prefetchers that can often
outwit benchmarks.
Another trap on P4 is that RDTSC is actually quite slow and synchronizes
the CPU; that can add large measurement errors.
-Andi
^ permalink raw reply [flat|nested] 51+ messages in thread
* RE: [patch 10/10] Scheduler profiling - Use immediate values
2007-07-06 20:03 ` Andi Kleen
@ 2007-07-06 20:57 ` Li, Tong N
2007-07-06 21:03 ` Mathieu Desnoyers
0 siblings, 1 reply; 51+ messages in thread
From: Li, Tong N @ 2007-07-06 20:57 UTC (permalink / raw)
To: Andi Kleen
Cc: Andrew Morton, Mathieu Desnoyers, Alexey Dobriyan, linux-kernel
> I found that memory latency is difficult to measure in modern x86
> CPUs because they have very clever prefetchers that can often
> outwit benchmarks.
A pointer-chasing program that accesses a random sequence of addresses
can usually produce a good estimate of memory latency. Also, prefetching
can be turned off in the BIOS or by modifying the MSRs.
> Another trap on P4 is that RDTSC is actually quite slow and synchronizes
> the CPU; that can add large measurement errors.
>
> -Andi
The cost can be amortized if the portion of memory accesses is long
enough.
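
A minimal sketch of such a pointer-chasing loop, not the test program used in
this thread. It assumes the table is linked into a single random cycle
(Sattolo's algorithm) so that every load depends on the previous one; the
names build_chain() and chase() are illustrative, and rand() is only a
stand-in for a better generator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Link the n slots of next[] into one random cycle (Sattolo's algorithm),
 * so that following next[] visits every slot exactly once before repeating.
 */
static void build_chain(size_t *next, size_t n, unsigned int seed)
{
	size_t i, j, tmp;

	assert(n > 1);
	for (i = 0; i < n; i++)
		next[i] = i;
	srand(seed);
	for (i = n - 1; i > 0; i--) {
		j = (size_t)rand() % i;	/* j < i keeps a single cycle */
		tmp = next[i];
		next[i] = next[j];
		next[j] = tmp;
	}
}

/* Each load depends on the previous one, which defeats simple prefetching. */
static size_t chase(const size_t *next, size_t start, size_t steps)
{
	size_t pos = start;

	while (steps--)
		pos = next[pos];
	return pos;
}
```

To amortize the timestamp cost, read the cycle counter once before and once
after a single chase() call covering many steps, then divide by the step
count. A real measurement would also size the table well beyond L2 and space
the slots at least one cache line apart.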
tong
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [patch 10/10] Scheduler profiling - Use immediate values
2007-07-06 20:57 ` Li, Tong N
@ 2007-07-06 21:03 ` Mathieu Desnoyers
0 siblings, 0 replies; 51+ messages in thread
From: Mathieu Desnoyers @ 2007-07-06 21:03 UTC (permalink / raw)
To: Li, Tong N; +Cc: Andi Kleen, Andrew Morton, Alexey Dobriyan, linux-kernel
* Li, Tong N (tong.n.li@intel.com) wrote:
> > I found that memory latency is difficult to measure in modern x86
> > CPUs because they have very clever prefetchers that can often
> > outwit benchmarks.
>
> A pointer-chasing program that accesses a random sequence of addresses
> usually can produce a good estimate on memory latency. Also, prefetching
> can be turned off in BIOS or by modifying the MSRs.
>
> > Another trap on P4 is that RDTSC is actually quite slow and synchronizes
> > the CPU; that can add large measurement errors.
> >
> > -Andi
>
> The cost can be amortized if the portion of memory accesses is long
> enough.
>
> tong
>
That's what I am currently doing.. the results are coming in a few
moments... :)
--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [patch 10/10] Scheduler profiling - Use immediate values
2007-07-06 11:44 ` Andi Kleen
2007-07-06 17:50 ` Li, Tong N
@ 2007-07-06 22:14 ` Chuck Ebbert
2007-07-06 23:28 ` Adrian Bunk
1 sibling, 1 reply; 51+ messages in thread
From: Chuck Ebbert @ 2007-07-06 22:14 UTC (permalink / raw)
To: Andi Kleen
Cc: Andrew Morton, Mathieu Desnoyers, Alexey Dobriyan, linux-kernel
On 07/06/2007 07:44 AM, Andi Kleen wrote:
> I think the optimization is a good idea, although i dislike it
> that it is complicated for the dynamic markers. If it was just
> static it would be much simpler.
>
Another thing to consider is that there might be hundreds of these
probes/tracepoints active in an instrumented kernel. The overhead
adds up fast, so the gain may be worth all the pain.
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [patch 10/10] Scheduler profiling - Use immediate values
2007-07-06 22:14 ` [patch 10/10] " Chuck Ebbert
@ 2007-07-06 23:28 ` Adrian Bunk
2007-07-06 23:38 ` Dave Jones
2007-07-06 23:43 ` Mathieu Desnoyers
0 siblings, 2 replies; 51+ messages in thread
From: Adrian Bunk @ 2007-07-06 23:28 UTC (permalink / raw)
To: Chuck Ebbert
Cc: Andi Kleen, Andrew Morton, Mathieu Desnoyers, Alexey Dobriyan,
linux-kernel
On Fri, Jul 06, 2007 at 06:14:10PM -0400, Chuck Ebbert wrote:
> On 07/06/2007 07:44 AM, Andi Kleen wrote:
> > I think the optimization is a good idea, although i dislike it
> > that it is complicated for the dynamic markers. If it was just
> > static it would be much simpler.
>
> Another thing to consider is that there might be hundreds of these
> probes/tracepoints active in an instrumented kernel. The overhead
> adds up fast, so the gain may be worth all the pain.
Only if you want to squeeze the last bit of performance out of
_debugging_ functionality.
You avoid all the pain if you simply don't use debugging functionality
on production systems.
cu
Adrian
--
"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [patch 10/10] Scheduler profiling - Use immediate values
2007-07-06 23:28 ` Adrian Bunk
@ 2007-07-06 23:38 ` Dave Jones
2007-07-07 0:10 ` Adrian Bunk
2007-07-06 23:43 ` Mathieu Desnoyers
1 sibling, 1 reply; 51+ messages in thread
From: Dave Jones @ 2007-07-06 23:38 UTC (permalink / raw)
To: Adrian Bunk
Cc: Chuck Ebbert, Andi Kleen, Andrew Morton, Mathieu Desnoyers,
Alexey Dobriyan, linux-kernel
On Sat, Jul 07, 2007 at 01:28:43AM +0200, Adrian Bunk wrote:
> Only if you want to squeeze the last bit of performance out of
> _debugging_ functionality.
>
> You avoid all the pain if you simply don't use debugging functionality
> on production systems.
I think you're mixing up profiling and debugging. The former is
extremely valuable under production systems.
Dave
--
http://www.codemonkey.org.uk
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [patch 10/10] Scheduler profiling - Use immediate values
2007-07-06 23:28 ` Adrian Bunk
2007-07-06 23:38 ` Dave Jones
@ 2007-07-06 23:43 ` Mathieu Desnoyers
2007-07-07 2:25 ` Adrian Bunk
1 sibling, 1 reply; 51+ messages in thread
From: Mathieu Desnoyers @ 2007-07-06 23:43 UTC (permalink / raw)
To: Adrian Bunk
Cc: Chuck Ebbert, Andi Kleen, Andrew Morton, Alexey Dobriyan,
linux-kernel, mbligh
* Adrian Bunk (bunk@stusta.de) wrote:
> On Fri, Jul 06, 2007 at 06:14:10PM -0400, Chuck Ebbert wrote:
> > On 07/06/2007 07:44 AM, Andi Kleen wrote:
> > > I think the optimization is a good idea, although i dislike it
> > > that it is complicated for the dynamic markers. If it was just
> > > static it would be much simpler.
> >
> > Another thing to consider is that there might be hundreds of these
> > probes/tracepoints active in an instrumented kernel. The overhead
> > adds up fast, so the gain may be worth all the pain.
>
> Only if you want to squeeze the last bit of performance out of
> _debugging_ functionality.
>
> You avoid all the pain if you simply don't use debugging functionality
> on production systems.
>
Adrian,
Please have a look at my markers posts, especially:
http://www.ussg.iu.edu/hypermail/linux/kernel/0707.0/0669.html
And also look into OLS 2007 proceedings for Martin Bligh's paper on
Debugging Google sized clusters. It basically makes the case for adding
functionality to debug _user space_ problems on production systems that
can be turned on dynamically.
Mathieu
> cu
> Adrian
>
> --
>
> "Is there not promise of rain?" Ling Tan asked suddenly out
> of the darkness. There had been need of rain for many days.
> "Only a promise," Lao Er said.
> Pearl S. Buck - Dragon Seed
>
--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [patch 10/10] Scheduler profiling - Use immediate values
2007-07-06 23:38 ` Dave Jones
@ 2007-07-07 0:10 ` Adrian Bunk
2007-07-07 15:45 ` Frank Ch. Eigler
0 siblings, 1 reply; 51+ messages in thread
From: Adrian Bunk @ 2007-07-07 0:10 UTC (permalink / raw)
To: Dave Jones, Chuck Ebbert, Andi Kleen, Andrew Morton,
Mathieu Desnoyers, Alexey Dobriyan, linux-kernel
On Fri, Jul 06, 2007 at 07:38:27PM -0400, Dave Jones wrote:
> On Sat, Jul 07, 2007 at 01:28:43AM +0200, Adrian Bunk wrote:
>
> > Only if you want to squeeze the last bit of performance out of
> > _debugging_ functionality.
> >
> > You avoid all the pain if you simply don't use debugging functionality
> > on production systems.
>
> I think you're mixing up profiling and debugging. The former is
> extremely valuable under production systems.
profiling = debugging of performance problems
My words were perhaps a bit sloppy, but profiling isn't part of normal
operation and if people use a separate kernel for such purposes we don't
need infrastructure for reducing performance penalties of enabled debug
options.
> Dave
cu
Adrian
--
"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [patch 10/10] *Tests* Scheduler profiling - Use immediate values
2007-07-06 17:50 ` Li, Tong N
2007-07-06 20:03 ` Andi Kleen
@ 2007-07-07 1:50 ` Mathieu Desnoyers
2007-07-07 6:08 ` Li, Tong N
1 sibling, 1 reply; 51+ messages in thread
From: Mathieu Desnoyers @ 2007-07-07 1:50 UTC (permalink / raw)
To: Li, Tong N; +Cc: Andi Kleen, Andrew Morton, Alexey Dobriyan, linux-kernel
* Li, Tong N (tong.n.li@intel.com) wrote:
> > Also cache misses in this situation tend to be much more than 48 cycles
> > (even an K8 with integrated memory controller with fastest DIMMs is
> > slower than that) Mathieu probably measured an L2 miss, not a load from RAM.
> > Load from RAM can be hundreds of ns in the worst case.
> >
>
> The 48 cycles sounds to me like a memory load in an unloaded system, but
> it is quite low. I wonder how it was measured...
>
> tong
>
Ah, excellent point.
I just re-read through my test case, and I modified it so it initializes
the memory first, instead of using the zero page, and now does a
pseudo-random walk instead of going through the L2 cache lines
sequentially... that will help. Especially since the memory is dual
channel and the cpu does some prefetching. The code follows this
email for easier review and test.
I also did a test that characterizes the speed impact of my LTTng marker
set in both scenarios with a memory I/O hungry program, also provided
below.
* Memory hit for a data-based branch
Here are the results now, on my 3GHz Pentium 4:
number of tests : 100
number of branches per test : 100000
memory hit cycles per iteration (mean) : 636.611
L1 cache hit cycles per iteration (mean) : 89.6413
instruction stream based test, cycles per iteration (mean) : 85.3438
Just getting the pointer from a modulo on a pseudo-random value, doing
nothing with it, cycles per iteration (mean) : 77.5044
So:
Base case: 77.50 cycles
instruction stream based test: +7.8394 cycles
L1 cache hit based test: +12.1369 cycles
Memory load based test: +559.1066 cycles
So let's say we have a ping flood coming at
(14014 packets transmitted, 14014 received, 0% packet loss, time 1826ms)
7674 packets per second. If we put 2 markers for irq entry/exit, it
brings us to 15348 marker sites executed per second.
(15348 exec/s) * (559 cycles/exec) / (3G cycles/s) = 0.0029
We therefore have a 0.29% slowdown just on this case.
Compared to this, the instruction stream based test will cause a
slowdown of:
(15348 exec/s) * (7.84 cycles/exec) / (3G cycles/s) = 0.00004
For a 0.004% slowdown.
If we plan to use this for memory allocation, spinlock, and all sorts of
very high event rate tracing, we can assume it will execute 10 to 100
times more sites per second, which brings us to 0.4% slowdown with the
instruction stream based test compared to 29% slowdown with the memory
load based test on a system with high memory pressure.
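
The estimates above all follow one formula: sites executed per second times
extra cycles per site, divided by the CPU frequency. A small sketch of that
arithmetic, with slowdown_fraction() as an illustrative name:

```c
#include <assert.h>

/* Fraction of CPU time spent on the checks: (sites/s * cycles/site) / (cycles/s). */
static double slowdown_fraction(double sites_per_sec, double cycles_per_site,
				double cpu_hz)
{
	assert(cpu_hz > 0.0);
	return sites_per_sec * cycles_per_site / cpu_hz;
}
```

With the figures above, slowdown_fraction(15348, 559, 3e9) is about 0.0029
(0.29%) for the memory load case and slowdown_fraction(15348, 7.84, 3e9) is
about 0.00004 (0.004%) for the instruction stream case. Multiplying the site
rate by 100 gives the 29% and 0.4% figures.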
* Markers impact under heavy memory load
Running a kernel with my LTTng instrumentation set, in a test that
generates memory pressure (from userspace) by trashing L1 and L2 caches
between calls to getppid() (note: syscall_trace is active and calls
a marker upon syscall entry and syscall exit; markers are disarmed).
This test is done in user-space, so there are some delays due to IRQs
coming and to the scheduler. (UP 2.6.22-rc6-mm1 kernel, task with -20
nice level)
My first set of results, linear cache trashing, turned out not to be
very interesting: it seems the linearity of the memset on a full array
is somehow detected, so it does not "really" trash the caches.
Now the most interesting result : Random walk L1 and L2 trashing
surrounding a getppid() call.
- Markers compiled out (but syscall_trace execution forced)
number of tests : 10000
No memory pressure
Reading timestamps takes 108.033 cycles
getppid : 1681.4 cycles
With memory pressure
Reading timestamps takes 102.938 cycles
getppid : 15691.6 cycles
- With the immediate values based markers:
number of tests : 10000
No memory pressure
Reading timestamps takes 108.006 cycles
getppid : 1681.84 cycles
With memory pressure
Reading timestamps takes 100.291 cycles
getppid : 11793 cycles
- With global variables based markers:
number of tests : 10000
No memory pressure
Reading timestamps takes 107.999 cycles
getppid : 1669.06 cycles
With memory pressure
Reading timestamps takes 102.839 cycles
getppid : 12535 cycles
The result is quite interesting in that the kernel is slower without
markers than with markers. I explain it by the fact that the data
accessed is not laid out in the same manner in the cache lines when the
markers are compiled in or out. It seems that compiling in the markers
aligns the function's data better in this case.
But since the interesting comparison is between the immediate values and
global variables based markers, and because they share the same memory
layout, except for the movl being replaced by a movz, we see that the
global variable based markers (2 markers) add 742 cycles (12535 - 11793
under memory pressure) to each system call (syscall entry and exit are
traced and memory locations for both global variables lie on the same
cache line).
Therefore, not only is it interesting to use the immediate values to
dynamically activate dormant code such as the markers, but I think it
should also be considered as a replacement for many of the "read mostly"
static variables.
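To illustrate the difference between the two kinds of test, here is a
user-space sketch in the spirit of the test program below. It is not the
patch's implementation: the names marker_enabled, cond_global(),
cond_immediate() and check_disarmed() are hypothetical, the immediate is
fixed at compile time rather than patched at runtime, and the inline
assembly is only used on x86 (other architectures fall back to a plain
constant).

```c
#include <assert.h>

/* Hypothetical "read mostly" flag living in data memory. */
volatile char marker_enabled;

/* Memory load based test: the flag is fetched through the data cache. */
char cond_global(void)
{
	return marker_enabled;
}

/*
 * Instruction stream based test: the value is an immediate operand of
 * the mov instruction, so it arrives with the code and needs no data
 * cache access.
 */
char cond_immediate(void)
{
	char condition;

#if defined(__i386__) || defined(__x86_64__)
	/* "q" asks for a byte-addressable register, "i" for an immediate. */
	__asm__ ("movb %1, %0" : "=q" (condition) : "i" (0));
#else
	condition = 0;	/* no inline assembly on other architectures */
#endif
	return condition;
}

/* Both tests must agree while the marker is disarmed. */
void check_disarmed(void)
{
	assert(!cond_global() && !cond_immediate());
}
```

Arming the marker would mean storing to marker_enabled in the first case
and rewriting the instruction's immediate byte in the second, which is
the part this sketch leaves out.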
Mathieu
Testing memory timings, in user-space:
----------------------------------------------------------------------------
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
typedef unsigned long long cycles_t;
#define barrier() __asm__ __volatile__("": : :"memory")
#define rdtsc(low,high) \
__asm__ __volatile__("rdtsc" : "=a" (low), "=d" (high))
#define rdtscl(low) \
__asm__ __volatile__("rdtsc" : "=a" (low) : : "edx")
#define rdtscll(val) \
__asm__ __volatile__("rdtsc" : "=A" (val))
static inline cycles_t get_cycles(void)
{
unsigned long long ret = 0;
barrier();
rdtscll(ret);
barrier();
return ret;
}
#define L1_CACHELINE_SIZE 64
#define L2_CACHELINE_SIZE 128
#define ARRAY_SIZE 524288000
#define NR_TESTS 100
#define NR_ITER 100000
char array[ARRAY_SIZE];
unsigned int glob = 0;
int main(int argc, char **argv)
{
char *testval;
cycles_t time1, time2;
double cycles_per_iter;
unsigned int i, j;
srandom(get_cycles());
printf("number of tests : %d\n", NR_TESTS);
printf("number of branches per test : %d\n", NR_ITER);
memset(array, 0, ARRAY_SIZE);
cycles_per_iter = 0.0;
for (i=0; i<NR_TESTS; i++) {
time1 = get_cycles();
for (j = 0; j < NR_ITER; j++) {
testval = &array[random() % ARRAY_SIZE];
if (*testval)
glob++;
}
time2 = get_cycles();
cycles_per_iter += (time2 - time1)/(double)NR_ITER;
}
cycles_per_iter /= (double)NR_TESTS;
printf("memory hit cycles per iteration (mean) : %g\n", cycles_per_iter);
cycles_per_iter = 0.0;
for (i=0; i<NR_TESTS; i++) {
testval = &array[random() % L1_CACHELINE_SIZE];
if (*testval)
glob++;
time1 = get_cycles();
for (j = 0; j < NR_ITER; j++) {
testval = &array[random() % L1_CACHELINE_SIZE];
if (*testval)
glob++;
}
time2 = get_cycles();
cycles_per_iter += (time2 - time1)/(double)NR_ITER;
}
cycles_per_iter /= (double)NR_TESTS;
printf("L1 cache hit cycles per iteration (mean) : %g\n", cycles_per_iter);
cycles_per_iter = 0.0;
for (i=0; i<NR_TESTS; i++) {
time1 = get_cycles();
for (j = 0; j < NR_ITER; j++) {
char condition;
asm ( ".align 2\n\t"
"0:\n\t"
"movb %1,%0;\n\t"
: "=r" (condition)
: "i" (0));
testval = &array[random() % ARRAY_SIZE];
if (condition)
glob++;
}
time2 = get_cycles();
cycles_per_iter += (time2 - time1)/(double)NR_ITER;
}
cycles_per_iter /= (double)NR_TESTS;
printf("instruction stream based test, cycles per iteration (mean) : %g\n", cycles_per_iter);
cycles_per_iter = 0.0;
for (i=0; i<NR_TESTS; i++) {
time1 = get_cycles();
for (j = 0; j < NR_ITER; j++) {
testval = &array[random() % ARRAY_SIZE];
}
time2 = get_cycles();
cycles_per_iter += (time2 - time1)/(double)NR_ITER;
}
cycles_per_iter /= (double)NR_TESTS;
printf("Just getting the pointer, doing nothing with it, cycles per iteration (mean) : %g\n", cycles_per_iter);
return 0;
}
----------------------------------------------------------------------------
And here is the syscall timing under heavy memory I/O load: (gcc -O2)
----------------------------------------------------------------------------
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>	/* getppid() */
typedef unsigned long long cycles_t;
#define barrier() __asm__ __volatile__("": : :"memory")
#define rdtsc(low,high) \
__asm__ __volatile__("rdtsc" : "=a" (low), "=d" (high))
#define rdtscl(low) \
__asm__ __volatile__("rdtsc" : "=a" (low) : : "edx")
#define rdtscll(val) \
__asm__ __volatile__("rdtsc" : "=A" (val))
static inline cycles_t get_cycles(void)
{
unsigned long long ret = 0;
barrier();
rdtscll(ret);
barrier();
return ret;
}
#define L1_CACHELINE_SIZE 64
#define L2_CACHELINE_SIZE 128
#define ARRAY_SIZE 1048576 /* 1 MB, size of L2 cache */
char array[ARRAY_SIZE];
#define NR_TESTS 10000
int main(int argc, char **argv)
{
cycles_t time1, time2;
double cycles_per_iter;
unsigned int i, j;
pid_t pid;
printf("number of tests : %d\n", NR_TESTS);
srandom(get_cycles());
printf("No memory pressure\n");
cycles_per_iter = 0.0;
for (i=0; i<NR_TESTS; i++) {
time1 = get_cycles();
time2 = get_cycles();
cycles_per_iter += (time2 - time1);
}
cycles_per_iter /= (double)NR_TESTS;
printf("Reading timestamps takes %g cycles\n", cycles_per_iter);
cycles_per_iter = 0.0;
for (i=0; i<NR_TESTS; i++) {
time1 = get_cycles();
pid = getppid();
time2 = get_cycles();
cycles_per_iter += (time2 - time1);
}
cycles_per_iter /= (double)NR_TESTS;
printf("getppid : %g cycles\n", cycles_per_iter);
printf("With memory pressure\n");
cycles_per_iter = 0.0;
for (i=0; i<NR_TESTS; i++) {
//memset(array, 1, ARRAY_SIZE);
for (j=0; j<ARRAY_SIZE*3; j++)
array[random() % ARRAY_SIZE] = 1;
time1 = get_cycles();
time2 = get_cycles();
cycles_per_iter += (time2 - time1);
}
cycles_per_iter /= (double)NR_TESTS;
printf("Reading timestamps takes %g cycles\n", cycles_per_iter);
cycles_per_iter = 0.0;
for (i=0; i<NR_TESTS; i++) {
//memset(array, 1, ARRAY_SIZE);
for (j=0; j<ARRAY_SIZE*3; j++)
array[random() % ARRAY_SIZE] = 1;
time1 = get_cycles();
pid = getppid();
time2 = get_cycles();
cycles_per_iter += (time2 - time1);
}
cycles_per_iter /= (double)NR_TESTS;
printf("getppid : %g cycles\n", cycles_per_iter);
return 0;
}
----------------------------------------------------------------------------
--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
* Re: [patch 10/10] Scheduler profiling - Use immediate values
2007-07-06 23:43 ` Mathieu Desnoyers
@ 2007-07-07 2:25 ` Adrian Bunk
2007-07-07 2:35 ` Mathieu Desnoyers
0 siblings, 1 reply; 51+ messages in thread
From: Adrian Bunk @ 2007-07-07 2:25 UTC (permalink / raw)
To: Mathieu Desnoyers
Cc: Chuck Ebbert, Andi Kleen, Andrew Morton, Alexey Dobriyan,
linux-kernel, mbligh
On Fri, Jul 06, 2007 at 07:43:15PM -0400, Mathieu Desnoyers wrote:
> * Adrian Bunk (bunk@stusta.de) wrote:
> > On Fri, Jul 06, 2007 at 06:14:10PM -0400, Chuck Ebbert wrote:
> > > On 07/06/2007 07:44 AM, Andi Kleen wrote:
> > > > I think the optimization is a good idea, although i dislike it
> > > > that it is complicated for the dynamic markers. If it was just
> > > > static it would be much simpler.
> > >
> > > Another thing to consider is that there might be hundreds of these
> > > probes/tracepoints active in an instrumented kernel. The overhead
> > > adds up fast, so the gain may be worth all the pain.
> >
> > Only if you want to squeeze the last bit of performance out of
> > _debugging_ functionality.
> >
> > You avoid all the pain if you simply don't use debugging functionality
> > on production systems.
> >
>
> Adrian,
>
> Please have a look at my markers posts, especially:
>
> http://www.ussg.iu.edu/hypermail/linux/kernel/0707.0/0669.html
>
> And also look into OLS 2007 proceedings for Martin Bligh's paper on
> Debugging Google sized clusters. It basically makes the case for adding
> functionality to debug _user space_ problems on production systems that
> can be turned on dynamically.
Using a different kernel for tracing still fulfills all the requirements
listed in section 5 of your paper...
> Mathieu
cu
Adrian
--
"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed
* Re: [patch 10/10] Scheduler profiling - Use immediate values
2007-07-07 2:25 ` Adrian Bunk
@ 2007-07-07 2:35 ` Mathieu Desnoyers
2007-07-07 4:03 ` Adrian Bunk
0 siblings, 1 reply; 51+ messages in thread
From: Mathieu Desnoyers @ 2007-07-07 2:35 UTC (permalink / raw)
To: Adrian Bunk
Cc: Chuck Ebbert, Andi Kleen, Andrew Morton, Alexey Dobriyan,
linux-kernel, mbligh
* Adrian Bunk (bunk@stusta.de) wrote:
> On Fri, Jul 06, 2007 at 07:43:15PM -0400, Mathieu Desnoyers wrote:
> > * Adrian Bunk (bunk@stusta.de) wrote:
> > > On Fri, Jul 06, 2007 at 06:14:10PM -0400, Chuck Ebbert wrote:
> > > > On 07/06/2007 07:44 AM, Andi Kleen wrote:
> > > > > I think the optimization is a good idea, although i dislike it
> > > > > that it is complicated for the dynamic markers. If it was just
> > > > > static it would be much simpler.
> > > >
> > > > Another thing to consider is that there might be hundreds of these
> > > > probes/tracepoints active in an instrumented kernel. The overhead
> > > > adds up fast, so the gain may be worth all the pain.
> > >
> > > Only if you want to squeeze the last bit of performance out of
> > > _debugging_ functionality.
> > >
> > > You avoid all the pain if you simply don't use debugging functionality
> > > on production systems.
> > >
> >
> > Adrian,
> >
> > Please have a look at my markers posts, especially:
> >
> > http://www.ussg.iu.edu/hypermail/linux/kernel/0707.0/0669.html
> >
> > And also look into OLS 2007 proceedings for Martin Bligh's paper on
> > Debugging Google sized clusters. It basically makes the case for adding
> > functionality to debug _user space_ problems on production systems that
> > can be turned on dynamically.
>
> Using a different kernel for tracing still fulfills all the requirements
> listed in section 5 of your paper...
>
Not exactly. I assume you understand that rebooting 1000 live production
servers to find the source of a rare bug or the cause of a performance
issue is out of the question.
Moreover, strategies like enabling flight recorder traces on a few nodes
on demand to detect performance problems can only be deployed in
production environment if they are part of the standard production
kernel.
Also, managing two different kernels is often out of the question. Not
only is it a maintenance burden, but just switching to the "debug" kernel
can impact the system's behavior so badly that it could make the problem
disappear.
Mathieu
--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
* Re: [patch 10/10] Scheduler profiling - Use immediate values
2007-07-07 2:35 ` Mathieu Desnoyers
@ 2007-07-07 4:03 ` Adrian Bunk
2007-07-07 5:02 ` Willy Tarreau
0 siblings, 1 reply; 51+ messages in thread
From: Adrian Bunk @ 2007-07-07 4:03 UTC (permalink / raw)
To: Mathieu Desnoyers
Cc: Chuck Ebbert, Andi Kleen, Andrew Morton, Alexey Dobriyan,
linux-kernel, mbligh
On Fri, Jul 06, 2007 at 10:35:11PM -0400, Mathieu Desnoyers wrote:
> * Adrian Bunk (bunk@stusta.de) wrote:
> > On Fri, Jul 06, 2007 at 07:43:15PM -0400, Mathieu Desnoyers wrote:
> > > * Adrian Bunk (bunk@stusta.de) wrote:
> > > > On Fri, Jul 06, 2007 at 06:14:10PM -0400, Chuck Ebbert wrote:
> > > > > On 07/06/2007 07:44 AM, Andi Kleen wrote:
> > > > > > I think the optimization is a good idea, although i dislike it
> > > > > > that it is complicated for the dynamic markers. If it was just
> > > > > > static it would be much simpler.
> > > > >
> > > > > Another thing to consider is that there might be hundreds of these
> > > > > probes/tracepoints active in an instrumented kernel. The overhead
> > > > > adds up fast, so the gain may be worth all the pain.
> > > >
> > > > Only if you want to squeeze the last bit of performance out of
> > > > _debugging_ functionality.
> > > >
> > > > You avoid all the pain if you simply don't use debugging functionality
> > > > on production systems.
> > > >
> > >
> > > Adrian,
> > >
> > > Please have a look at my markers posts, especially:
> > >
> > > http://www.ussg.iu.edu/hypermail/linux/kernel/0707.0/0669.html
> > >
> > > And also look into OLS 2007 proceedings for Martin Bligh's paper on
> > > Debugging Google sized clusters. It basically makes the case for adding
> > > functionality to debug _user space_ problems on production systems that
> > > can be turned on dynamically.
> >
> > Using a different kernel for tracing still fulfills all the requirements
> > listed in section 5 of your paper...
> >
>
> Not exactly. I assume you understand that rebooting 1000 live production
> servers to find the source of a rare bug or the cause of a performance
> issue is out of the question.
>
> Moreover, strategies like enabling flight recorder traces on a few nodes
> on demand to detect performance problems can only be deployed in
> production environment if they are part of the standard production
> kernel.
>
> Also, managing two different kernels is often out of the question. Not
> only is it a maintenance burden, but just switching to the "debug" kernel
> can impact the system's behavior so badly that it could make the problem
> disappear.
As can turning tracing on at runtime.
And you can always define requirements in a way that your solution is
the only one...
Let's go to a different point:
Your paper says "When not running, must have zero effective impact."
How big is the measured impact of your markers, when not in use, without
any immediate voodoo?
You have sent many numbers about micro-benchmarks and theoretical
numbers, but if you have sent the interesting numbers comparing
1. MARKERS=n
2. MARKERS=y, IMMEDIATE=n
3. MARKERS=y, IMMEDIATE=y
in actual benchmark testing I must have missed them.
Does 3. have a measurable and effective advantage over 2. or are you
optimizing for some 0.01% or 1% performance difference without any
effective impact and therefore not required for the goals outlined in
your paper?
> Mathieu
cu
Adrian
--
"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed
* Re: [patch 10/10] Scheduler profiling - Use immediate values
2007-07-07 4:03 ` Adrian Bunk
@ 2007-07-07 5:02 ` Willy Tarreau
0 siblings, 0 replies; 51+ messages in thread
From: Willy Tarreau @ 2007-07-07 5:02 UTC (permalink / raw)
To: Adrian Bunk
Cc: Mathieu Desnoyers, Chuck Ebbert, Andi Kleen, Andrew Morton,
Alexey Dobriyan, linux-kernel, mbligh
On Sat, Jul 07, 2007 at 06:03:07AM +0200, Adrian Bunk wrote:
> On Fri, Jul 06, 2007 at 10:35:11PM -0400, Mathieu Desnoyers wrote:
> > * Adrian Bunk (bunk@stusta.de) wrote:
> > > On Fri, Jul 06, 2007 at 07:43:15PM -0400, Mathieu Desnoyers wrote:
> > > > * Adrian Bunk (bunk@stusta.de) wrote:
> > > > > On Fri, Jul 06, 2007 at 06:14:10PM -0400, Chuck Ebbert wrote:
> > > > > > On 07/06/2007 07:44 AM, Andi Kleen wrote:
> > > > > > > I think the optimization is a good idea, although i dislike it
> > > > > > > that it is complicated for the dynamic markers. If it was just
> > > > > > > static it would be much simpler.
> > > > > >
> > > > > > Another thing to consider is that there might be hundreds of these
> > > > > > probes/tracepoints active in an instrumented kernel. The overhead
> > > > > > adds up fast, so the gain may be worth all the pain.
> > > > >
> > > > > Only if you want to squeeze the last bit of performance out of
> > > > > _debugging_ functionality.
> > > > >
> > > > > You avoid all the pain if you simply don't use debugging functionality
> > > > > on production systems.
> > > > >
> > > >
> > > > Adrian,
> > > >
> > > > Please have a look at my markers posts, especially:
> > > >
> > > > http://www.ussg.iu.edu/hypermail/linux/kernel/0707.0/0669.html
> > > >
> > > > And also look into OLS 2007 proceedings for Martin Bligh's paper on
> > > > Debugging Google sized clusters. It basically makes the case for adding
> > > > functionality to debug _user space_ problems on production systems that
> > > > can be turned on dynamically.
> > >
> > > Using a different kernel for tracing still fulfills all the requirements
> > > listed in section 5 of your paper...
> > >
> >
> > Not exactly. I assume you understand that rebooting 1000 live production
> > servers to find the source of a rare bug or the cause of a performance
> > issue is out of the question.
> >
> > Moreover, strategies like enabling flight recorder traces on a few nodes
> > on demand to detect performance problems can only be deployed in
> > production environment if they are part of the standard production
> > kernel.
> >
> > Also, managing two different kernels is often out of the question. Not
> > only is it a maintenance burden, but just switching to the "debug" kernel
> > can impact the system's behavior so badly that it could make the problem
> > disappear.
>
> As can turning tracing on at runtime.
>
> And you can always define requirements in a way that your solution is
> the only one...
On large production environments, you always lose a certain percentage of
machines at each reboot. Most often, it's the CR2032 lithium battery which
is dead and which causes all or parts of the CMOS settings to vanish, hanging
the system at boot. Then you play with the ON/OFF switch and a small
percentage of the power supplies refuse to restart and some disks refuse
to spin up.
Fortunately this does not happen with all machines, but if you have such
problems with 1% of your machines, you lose 10 machines when you reboot
1000 of them. Those problems require a lot of manpower, which explains
why such systems are rarely updated.
Causing that much trouble just to enable debugging is clearly
unacceptable, and your debug kernel will simply never be used. Not to
mention the fact that people will never trust it because it's almost
never used !
Regards,
Willy
* RE: [patch 10/10] *Tests* Scheduler profiling - Use immediate values
2007-07-07 1:50 ` [patch 10/10] *Tests* " Mathieu Desnoyers
@ 2007-07-07 6:08 ` Li, Tong N
2007-07-11 5:02 ` Mathieu Desnoyers
0 siblings, 1 reply; 51+ messages in thread
From: Li, Tong N @ 2007-07-07 6:08 UTC (permalink / raw)
To: Mathieu Desnoyers
Cc: Andi Kleen, Andrew Morton, Alexey Dobriyan, linux-kernel
Mathieu,
> cycles_per_iter = 0.0;
> for (i=0; i<NR_TESTS; i++) {
> time1 = get_cycles();
> for (j = 0; j < NR_ITER; j++) {
> testval = &array[random() % ARRAY_SIZE];
> }
> time2 = get_cycles();
> cycles_per_iter += (time2 - time1)/(double)NR_ITER;
> }
> cycles_per_iter /= (double)NR_TESTS;
> printf("Just getting the pointer, doing nothing with it, cycles per iteration (mean) : %g\n", cycles_per_iter);
>
Some comments on the code:
1. random() is counted in cycles_per_iter, which can skew the results.
You could pre-compute the random addresses and store them in an array.
Then, during the actual timing, walk the array:
index = 0;
for (i = 0; i < ARRAY_SIZE; i++)
index = *(int *)(array + index * CACHE_LINE_SIZE);
2. You may want to flush the cache before the timing starts.
3. You want to access memory at the cache-line granularity to avoid
addresses falling into the same line (and thus unwanted hits).
If you do these, I expect you'll get a higher memory latency.
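Points 1 and 3 can be combined by chaining the cache lines into one
random cycle ahead of time, so that the timed loop contains only
dependent loads. A minimal sketch, assuming 64-byte lines and a buffer
suitably aligned for size_t (for instance from malloc()); build_chase()
and chase() are hypothetical names, and rand() stands in for random().
Flushing the cache (point 2) is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define CACHE_LINE_SIZE 64	/* assumed line size, in bytes */

/*
 * Link the nr_lines cache lines of buf into a single random cycle: the
 * first size_t of each line holds the index of the next line to visit.
 * Sattolo's shuffle guarantees one cycle that covers every line.
 * Assumes nr_lines does not exceed RAND_MAX. Returns 0 on success.
 */
int build_chase(char *buf, size_t nr_lines)
{
	size_t *next, i, j, tmp;

	assert(nr_lines >= 2);
	next = malloc(nr_lines * sizeof(*next));
	if (!next)
		return -1;
	for (i = 0; i < nr_lines; i++)
		next[i] = i;
	for (i = nr_lines - 1; i > 0; i--) {
		j = (size_t)rand() % i;		/* j in [0, i - 1] */
		tmp = next[i];
		next[i] = next[j];
		next[j] = tmp;
	}
	for (i = 0; i < nr_lines; i++)
		*(size_t *)(buf + i * CACHE_LINE_SIZE) = next[i];
	free(next);
	return 0;
}

/* Timed part: each load depends on the previous one, one per line. */
size_t chase(const char *buf, size_t steps)
{
	size_t index = 0;

	while (steps--)
		index = *(const size_t *)(buf + index * CACHE_LINE_SIZE);
	return index;
}
```

The return value of chase() should be consumed (stored or printed) so the
compiler cannot discard the loop; after exactly nr_lines steps the walk
is back at line 0.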
tong
* Re: [patch 10/10] Scheduler profiling - Use immediate values
2007-07-07 0:10 ` Adrian Bunk
@ 2007-07-07 15:45 ` Frank Ch. Eigler
2007-07-07 17:01 ` Adrian Bunk
0 siblings, 1 reply; 51+ messages in thread
From: Frank Ch. Eigler @ 2007-07-07 15:45 UTC (permalink / raw)
To: Adrian Bunk
Cc: Dave Jones, Chuck Ebbert, Andi Kleen, Andrew Morton,
Mathieu Desnoyers, Alexey Dobriyan, linux-kernel
Adrian Bunk <bunk@stusta.de> writes:
> [...]
> profiling = debugging of performance problems
Indeed.
> My words were perhaps a bit sloppy, but profiling isn't part of
> normal operation and if people use a separate kernel for such
> purposes we don't need infrastructure for reducing performance
> penalties of enabled debug options.
Things are not so simple. One might not know that one has a
performance problem until one tries some analysis tools. Rebooting
into different kernels just to investigate does not work generally:
the erroneous phenomenon may have been short lived; the debug kernel,
being "only" for debugging, may not be well tested and hence not
sufficiently trustworthy.
Your question asking for an actual performance impact of dormant hooks
is OTOH entirely legitimate. It clearly depends on the placement of
those hooks and thus their encounter rate, more so than their
underlying technology (markers with whatever optimizations). If the
cost is small enough, you will likely find that people will be willing
to pay a small fraction of average performance, in order to eke out
large gains when finding occasional e.g. algorithmic bugs.
- FChE
* Re: [patch 10/10] Scheduler profiling - Use immediate values
2007-07-07 15:45 ` Frank Ch. Eigler
@ 2007-07-07 17:01 ` Adrian Bunk
2007-07-07 17:20 ` Willy Tarreau
2007-07-07 17:55 ` Frank Ch. Eigler
0 siblings, 2 replies; 51+ messages in thread
From: Adrian Bunk @ 2007-07-07 17:01 UTC (permalink / raw)
To: Frank Ch. Eigler
Cc: Dave Jones, Chuck Ebbert, Andi Kleen, Andrew Morton,
Mathieu Desnoyers, Alexey Dobriyan, linux-kernel
On Sat, Jul 07, 2007 at 11:45:20AM -0400, Frank Ch. Eigler wrote:
> Adrian Bunk <bunk@stusta.de> writes:
>
> > [...]
> > profiling = debugging of performance problems
>
> Indeed.
>
> > My words were perhaps a bit sloppy, but profiling isn't part of
> > normal operation and if people use a separate kernel for such
> > purposes we don't need infrastructure for reducing performance
> > penalties of enabled debug options.
>
> Things are not so simple. One might not know that one has a
> performance problem until one tries some analysis tools. Rebooting
> into different kernels just to investigate does not work generally:
> the erroneous phenomenon may have been short lived; the debug kernel,
> being "only" for debugging, may not be well tested => sufficiently
> trustworthy.
I'm not getting this:
You'll only start looking into an analysis tool if you have a
performance problem, IOW if you are not satisfied with the performance.
And the debug code will not have been tested on this machine no matter
whether it's enabled through a compile option or at runtime.
> Your question asking for an actual performance impact of dormant hooks
> is OTOH entirely legitimate. It clearly depends on the placement of
> those hooks and thus their encounter rate, more so than their
> underlying technology (markers with whatever optimizations). If the
> cost is small enough, you will likely find that people will be willing
> to pay a small fraction of average performance, in order to eke out
> large gains when finding occasional e.g. algorithmic bugs.
If you might be able to get a big part of tracing and other debug code
enabled with a performance penalty of a few percent of _kernel_
performance, then you might get much debugging aid without any effective
impact on application performance.
You always have to decide between some debug code and some small bit of
performance. There's a reason why options to disable things like BUG()
or printk() are in the kernel config menus hidden behind CONFIG_EMBEDDED
although they obviously have some performance impact.
> - FChE
cu
Adrian
--
"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed
* Re: [patch 10/10] Scheduler profiling - Use immediate values
2007-07-07 17:01 ` Adrian Bunk
@ 2007-07-07 17:20 ` Willy Tarreau
2007-07-07 17:59 ` Adrian Bunk
2007-07-07 17:55 ` Frank Ch. Eigler
1 sibling, 1 reply; 51+ messages in thread
From: Willy Tarreau @ 2007-07-07 17:20 UTC (permalink / raw)
To: Adrian Bunk
Cc: Frank Ch. Eigler, Dave Jones, Chuck Ebbert, Andi Kleen,
Andrew Morton, Mathieu Desnoyers, Alexey Dobriyan, linux-kernel
On Sat, Jul 07, 2007 at 07:01:57PM +0200, Adrian Bunk wrote:
> On Sat, Jul 07, 2007 at 11:45:20AM -0400, Frank Ch. Eigler wrote:
> > Adrian Bunk <bunk@stusta.de> writes:
> >
> > > [...]
> > > profiling = debugging of performance problems
> >
> > Indeed.
> >
> > > My words were perhaps a bit sloppy, but profiling isn't part of
> > > normal operation and if people use a separate kernel for such
> > > purposes we don't need infrastructure for reducing performance
> > > penalties of enabled debug options.
> >
> > Things are not so simple. One might not know that one has a
> > performance problem until one tries some analysis tools. Rebooting
> > into different kernels just to investigate does not work generally:
> > the erroneous phenomenon may have been short lived; the debug kernel,
> > being "only" for debugging, may not be well tested => sufficiently
> > trustworthy.
>
> I'm not getting this:
>
> You'll only start looking into an analysis tool if you have a
> performance problem, IOW if you are not satisfied with the performance.
>
> And the debug code will not have been tested on this machine no matter
> whether it's enabled through a compile option or at runtime.
At least all the rest of the code will be untouched and you will not have
to reboot the machine. If you reboot to another kernel, nothing ensures
that you will have the same code sequences (in fact, gcc will reorder
some parts of code such as loops just because of an additional 'if').
So you know that the non-debug code you run will remain untouched. *This*
is important, because the debug code is not there to debug itself, but
to debug the rest.
> > Your question asking for an actual performance impact of dormant hooks
> > is OTOH entirely legitimate. It clearly depends on the placement of
> > those hooks and thus their encounter rate, more so than their
> > underlying technology (markers with whatever optimizations). If the
> > cost is small enough, you will likely find that people will be willing
> > to pay a small fraction of average performance, in order to eke out
> > large gains when finding occasional e.g. algorithmic bugs.
>
> If you might be able to get a big part of tracing and other debug code
> enabled with a performance penalty of a few percent of _kernel_
> performance, then you might get much debugging aid without any effective
> impact on application performance.
It largely depends on the application. Applications which require a lot
of system calls will be more sensitive to kernel debugging. Common sense
also implies that such applications will be the ones for which kernel
debugging will be relevant.
> You always have to decide between some debug code and some small bit of
> performance. There's a reason why options to disable things like BUG()
> or printk() are in the kernel config menus hidden behind CONFIG_EMBEDDED
> although they obviously have some performance impact.
It is not for CPU performance that they can be disabled, but for code
size, which is a real problem on embedded systems. While you often have
mem/cpu_mhz ratios around 1GB/1GHz on servers and desktops, you more often
have ratios like 16MB/500MHz which is 1:32 of the former. That's why you
optimize for size at the expense of speed on such systems.
>
> > - FChE
>
> cu
> Adrian
Regards,
Willy
* Re: [patch 10/10] Scheduler profiling - Use immediate values
2007-07-07 17:01 ` Adrian Bunk
2007-07-07 17:20 ` Willy Tarreau
@ 2007-07-07 17:55 ` Frank Ch. Eigler
1 sibling, 0 replies; 51+ messages in thread
From: Frank Ch. Eigler @ 2007-07-07 17:55 UTC (permalink / raw)
To: Adrian Bunk
Cc: Dave Jones, Chuck Ebbert, Andi Kleen, Andrew Morton,
Mathieu Desnoyers, Alexey Dobriyan, linux-kernel
Hi, Adrian -
On Sat, Jul 07, 2007 at 07:01:57PM +0200, Adrian Bunk wrote:
> [...]
> > Things are not so simple. One might not know that one has a
> > performance problem until one tries some analysis tools. Rebooting
> > into different kernels just to investigate does not work generally [...]
>
> I'm not getting this:
>
> You'll only start looking into an analysis tool if you have a
> performance problem, IOW if you are not satisfied with the
> performance.
There may be people whose jobs entail continually suspecting
performance problems. Or one may run instrumentation code on a
long-term basis specifically to locate performance spikes.
> And the debug code will not have been tested on this machine no matter
> whether it's enabled through a compile option or at runtime.
There is a big difference in favour of the latter. The additional
instrumentation code may be small enough to inspect carefully. The
rest of the kernel would be unaffected.
> [...] If you might be able to get a big part of tracing and other
> debug code enabled with a performance penalty of a few percent of
> _kernel_ performance, then you might get much debugging aid without
> any effective impact on application performance.
Agreed.
- FChE
* Re: [patch 10/10] Scheduler profiling - Use immediate values
2007-07-07 17:20 ` Willy Tarreau
@ 2007-07-07 17:59 ` Adrian Bunk
0 siblings, 0 replies; 51+ messages in thread
From: Adrian Bunk @ 2007-07-07 17:59 UTC (permalink / raw)
To: Willy Tarreau
Cc: Frank Ch. Eigler, Dave Jones, Chuck Ebbert, Andi Kleen,
Andrew Morton, Mathieu Desnoyers, Alexey Dobriyan, linux-kernel
On Sat, Jul 07, 2007 at 07:20:12PM +0200, Willy Tarreau wrote:
> On Sat, Jul 07, 2007 at 07:01:57PM +0200, Adrian Bunk wrote:
> > On Sat, Jul 07, 2007 at 11:45:20AM -0400, Frank Ch. Eigler wrote:
>...
> > You always have to decide between some debug code and some small bit of
> > performance. There's a reason why options to disable things like BUG()
> > or printk() are in the kernel config menus hidden behind CONFIG_EMBEDDED
> > although they obviously have some performance impact.
>
> It is not for the CPU performance they can be disabled, but for the code
> size which is a real problem on embedded system. While you often have
> mem/cpu_mhz ratios around 1GB/1GHz on servers and desktops, you more often
> have ratios like 16MB/500MHz which is 1:32 of the former. That's why you
> optimize for size at the expense of speed on such systems.
The latter is not true for my two examples.
CONFIG_PRINTK=n, CONFIG_BUG=n will obviously make the kernel both
smaller and faster. [1]
> Regards,
> Willy
cu
Adrian
[1] faster due to less code to execute and positive cache effects due to
the smaller code [2]
[2] whether the "faster" is big enough that it is in any way measurable
is a different question
--
"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed
* Re: [patch 10/10] *Tests* Scheduler profiling - Use immediate values
2007-07-07 6:08 ` Li, Tong N
@ 2007-07-11 5:02 ` Mathieu Desnoyers
0 siblings, 0 replies; 51+ messages in thread
From: Mathieu Desnoyers @ 2007-07-11 5:02 UTC (permalink / raw)
To: Li, Tong N; +Cc: Andi Kleen, Andrew Morton, Alexey Dobriyan, linux-kernel
Hi,
* Li, Tong N (tong.n.li@intel.com) wrote:
> Mathieu,
>
> > cycles_per_iter = 0.0;
> > for (i = 0; i < NR_TESTS; i++) {
> > 	time1 = get_cycles();
> > 	for (j = 0; j < NR_ITER; j++) {
> > 		testval = &array[random() % ARRAY_SIZE];
> > 	}
> > 	time2 = get_cycles();
> > 	cycles_per_iter += (time2 - time1)/(double)NR_ITER;
> > }
> > cycles_per_iter /= (double)NR_TESTS;
> > printf("Just getting the pointer, doing nothing with it, cycles per iteration (mean) : %g\n",
> > 	cycles_per_iter);
> >
>
> Some comments on the code:
>
> 1. random() is counted in cycles_per_iter, which can skew the results.
> You could pre-compute the random addresses and store them in an array.
> Then, during the actual timing, walk the array:
>
> index = 0;
> for (i = 0; i < ARRAY_SIZE; i++)
> 	index = *(int *)(array + index * CACHE_LINE_SIZE);
>
> 2. You may want to flush the cache before the timing starts.
>
> 3. You want to access memory at the cache-line granularity to avoid
> addresses falling into the same line (and thus unwanted hits).
>
This is true; my test code was not perfect. Thanks for the hints.
The improvements you propose will clearly speed up my test program
quite a bit, but I doubt that they will show even higher memory
latencies. Although calling random() at each memory access is slow, it
should give a good enough dispersion. And since I do 3 cache-thrashing
passes in my code, I make sure that every cache line is
thrashed. In fact, since I do multiple accesses to each cache line (as
you noted in point 3), it takes more time, but makes it more certain
that I hit all of them at least once.
> If you do these, I expect you'll get a higher memory latency.
>
I will use these comments in my next tests, thanks. :) However, I
remain confident that the numbers I got from my run hold.
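For reference, a rough userspace sketch of how the three suggestions
quoted above might be combined. This is not the program that produced
the numbers in this thread. The helper names (build_chain, thrash_cache,
chase, time_chase_ns), the 64-byte CACHE_LINE_SIZE, and the use of
clock_gettime() in place of get_cycles() are all assumptions made for
illustration. The chain is built with Sattolo's shuffle so that the walk
touches every cache line exactly once.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

#define CACHE_LINE_SIZE 64	/* assumed line size, not from the original test */

/*
 * Point 1: pre-compute the random order outside the timed region.
 * The first word of each line holds the index of the next line to visit.
 * Sattolo's shuffle gives a single cycle, so a walk starting at line 0
 * visits every line once and ends back at line 0.
 */
static void build_chain(unsigned char *array, size_t nlines, unsigned int seed)
{
	size_t *perm = malloc(nlines * sizeof(*perm));
	size_t i, j, tmp;

	if (!perm)
		abort();
	for (i = 0; i < nlines; i++)
		perm[i] = i;
	srand(seed);
	for (i = nlines - 1; i > 0; i--) {
		j = (size_t)rand() % i;	/* j < i (Sattolo), never j == i */
		tmp = perm[i];
		perm[i] = perm[j];
		perm[j] = tmp;
	}
	/* Point 3: one entry per cache line, so no two entries share a line. */
	for (i = 0; i < nlines; i++)
		*(uint32_t *)(array + i * CACHE_LINE_SIZE) = (uint32_t)perm[i];
	free(perm);
}

/* Point 2: evict the chain by writing one byte per line of a scratch buffer. */
static void thrash_cache(size_t bytes)
{
	volatile unsigned char *junk = malloc(bytes);
	size_t i;

	if (!junk)
		abort();
	for (i = 0; i < bytes; i += CACHE_LINE_SIZE)
		junk[i] = (unsigned char)i;
	free((void *)junk);
}

/* Dependent loads: each address comes from the previous load's result. */
static uint32_t chase(const unsigned char *array, size_t nlines)
{
	uint32_t index = 0;
	size_t i;

	for (i = 0; i < nlines; i++)
		index = *(const volatile uint32_t *)
			(array + (size_t)index * CACHE_LINE_SIZE);
	return index;
}

/* Mean nanoseconds per access for one full walk of the chain. */
static double time_chase_ns(const unsigned char *array, size_t nlines)
{
	struct timespec t1, t2;
	uint32_t end;

	clock_gettime(CLOCK_MONOTONIC, &t1);
	end = chase(array, nlines);
	clock_gettime(CLOCK_MONOTONIC, &t2);
	assert(end == 0);	/* a full cycle returns to the start */
	(void)end;
	return ((t2.tv_sec - t1.tv_sec) * 1e9 +
		(t2.tv_nsec - t1.tv_nsec)) / (double)nlines;
}
```

A caller would allocate the array with aligned_alloc(CACHE_LINE_SIZE,
nlines * CACHE_LINE_SIZE), call build_chain() once, then call
thrash_cache() with a size larger than the last-level cache before each
time_chase_ns() run. Because every load depends on the previous one, the
walk measures latency rather than bandwidth.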
Mathieu
> tong
>
--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
Thread overview: 51+ messages
2007-07-03 16:40 [patch 00/10] Immediate Values Mathieu Desnoyers
2007-07-03 16:40 ` [patch 01/10] Immediate values - Global modules list and module mutex Mathieu Desnoyers
2007-07-03 16:40 ` [patch 02/10] Immediate Value - Architecture Independent Code Mathieu Desnoyers
2007-07-03 16:40 ` [patch 03/10] Immediate Values - Non Optimized Architectures Mathieu Desnoyers
2007-07-03 16:40 ` [patch 04/10] Immediate Value - Add kconfig menus Mathieu Desnoyers
2007-07-03 16:40 ` [patch 05/10] Immediate Values - kprobe header fix Mathieu Desnoyers
2007-07-03 16:40 ` [patch 06/10] Immediate Value - i386 Optimization Mathieu Desnoyers
2007-07-03 18:45 ` H. Peter Anvin
2007-07-03 19:16 ` Mathieu Desnoyers
2007-07-03 20:18 ` H. Peter Anvin
2007-07-03 20:37 ` Chuck Ebbert
2007-07-03 21:30 ` H. Peter Anvin
2007-07-03 23:10 ` Jeremy Fitzhardinge
2007-07-03 20:43 ` Mathieu Desnoyers
2007-07-03 21:30 ` H. Peter Anvin
2007-07-03 16:40 ` [patch 07/10] Immediate Value - PowerPC Optimization Mathieu Desnoyers
2007-07-03 16:40 ` [patch 08/10] Immediate Value - Documentation Mathieu Desnoyers
2007-07-03 16:40 ` [patch 09/10] F00F bug fixup for i386 - use immediate values Mathieu Desnoyers
2007-07-04 20:43 ` Alexey Dobriyan
2007-07-03 16:40 ` [patch 10/10] Scheduler profiling - Use " Mathieu Desnoyers
2007-07-03 18:11 ` Alexey Dobriyan
2007-07-03 18:57 ` Mathieu Desnoyers
2007-07-04 14:23 ` Adrian Bunk
2007-07-04 20:31 ` Alexey Dobriyan
2007-07-05 20:21 ` Andrew Morton
2007-07-05 20:29 ` Andrew Morton
2007-07-05 20:41 ` Mathieu Desnoyers
2007-07-06 11:44 ` Andi Kleen
2007-07-06 17:50 ` Li, Tong N
2007-07-06 20:03 ` Andi Kleen
2007-07-06 20:57 ` Li, Tong N
2007-07-06 21:03 ` Mathieu Desnoyers
2007-07-07 1:50 ` [patch 10/10] *Tests* " Mathieu Desnoyers
2007-07-07 6:08 ` Li, Tong N
2007-07-11 5:02 ` Mathieu Desnoyers
2007-07-06 22:14 ` [patch 10/10] " Chuck Ebbert
2007-07-06 23:28 ` Adrian Bunk
2007-07-06 23:38 ` Dave Jones
2007-07-07 0:10 ` Adrian Bunk
2007-07-07 15:45 ` Frank Ch. Eigler
2007-07-07 17:01 ` Adrian Bunk
2007-07-07 17:20 ` Willy Tarreau
2007-07-07 17:59 ` Adrian Bunk
2007-07-07 17:55 ` Frank Ch. Eigler
2007-07-06 23:43 ` Mathieu Desnoyers
2007-07-07 2:25 ` Adrian Bunk
2007-07-07 2:35 ` Mathieu Desnoyers
2007-07-07 4:03 ` Adrian Bunk
2007-07-07 5:02 ` Willy Tarreau
2007-07-04 20:35 ` Alexey Dobriyan
2007-07-04 22:41 ` Andi Kleen