* [patch 00/10] Create instrumentation/ kernel directory
From: Mathieu Desnoyers @ 2007-10-29 14:26 UTC
To: linux-kernel
Hi,
Since we already have the Instrumentation menu in
kernel/Kconfig.instrumentation and instrumentation code all over the
kernel tree:
arch/*/oprofile/*.c
kernel/kprobes.c
arch/*/kernel/kprobes.c
kernel/marker.c
kernel/profile.c
kernel/lockdep.c
mm/vmstat.c
block/blktrace.c
We could move them to
instrumentation/
arch/*/instrumentation/
We could then also move the kprobes and marker samples under
instrumentation/samples/
I submit these patches for review and comments, especially regarding the
kernel build integration: instrumentation/ is currently built as core-y, and
arch/*/instrumentation/ as drivers-y because of oprofile.
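For reference, here is a rough sketch of the kind of kbuild wiring this
implies. It is an illustration only, not the exact hunks from the series
(those live in the patches themselves):

	# top-level Makefile (sketch)
	core-y		+= instrumentation/

	# arch/$(ARCH)/Makefile (sketch) - the per-arch directory rides
	# along with drivers-y, as the oprofile directories already do:
	drivers-y	+= arch/$(ARCH)/instrumentation/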
Mathieu
--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
* [patch 01/10] Create instrumentation directory
From: Mathieu Desnoyers @ 2007-10-29 14:26 UTC
To: linux-kernel; +Cc: Mathieu Desnoyers, Randy Dunlap, Sam Ravnborg
[-- Attachment #1: create-instrumentation-dir.patch --]
[-- Type: text/plain, Size: 17371 bytes --]
Since we already have the Instrumentation menu in
kernel/Kconfig.instrumentation and instrumentation code all over the
kernel tree:
arch/*/oprofile/*.c
kernel/kprobes.c
arch/*/kernel/kprobes.c
kernel/marker.c
kernel/profile.c
kernel/lockdep.c
mm/vmstat.c
block/blktrace.c
We could move them to
instrumentation/
arch/*/instrumentation/
We could then also move the kprobes and marker samples under
instrumentation/samples/
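For context, a marker site (what kernel/marker.c implements and what the
samples exercise) is simply a trace_mark() statement placed in kernel code;
it compiles down to the "empty function call" described in the MARKERS help
text below. A minimal sketch, with a hypothetical marker name and caller:

	#include <linux/marker.h>

	static void mydriver_do_work(int error)
	{
		/* ... */
		/* Expands to an empty call unless a probe is dynamically
		 * connected to the "mydriver_work_done" marker. */
		trace_mark(mydriver_work_done, "error %d", error);
	}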
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Randy Dunlap <rdunlap@xenotime.net>
CC: Sam Ravnborg <sam@ravnborg.org>
---
arch/alpha/Kconfig | 2 -
arch/arm/Kconfig | 2 -
arch/blackfin/Kconfig | 2 -
arch/cris/Kconfig | 2 -
arch/frv/Kconfig | 2 -
arch/h8300/Kconfig | 2 -
arch/i386/Kconfig | 2 -
arch/ia64/Kconfig | 2 -
arch/m32r/Kconfig | 2 -
arch/m68k/Kconfig | 2 -
arch/m68knommu/Kconfig | 2 -
arch/mips/Kconfig | 2 -
arch/parisc/Kconfig | 2 -
arch/powerpc/Kconfig | 2 -
arch/ppc/Kconfig | 2 -
arch/s390/Kconfig | 2 -
arch/sh/Kconfig | 2 -
arch/sh64/Kconfig | 2 -
arch/sparc/Kconfig | 2 -
arch/sparc64/Kconfig | 2 -
arch/um/Kconfig | 2 -
arch/v850/Kconfig | 2 -
arch/x86_64/Kconfig | 2 -
arch/xtensa/Kconfig | 2 -
instrumentation/Kconfig | 49 +++++++++++++++++++++++++++++++++++++++++
instrumentation/Makefile | 3 ++
kernel/Kconfig.instrumentation | 49 -----------------------------------------
27 files changed, 76 insertions(+), 73 deletions(-)
Index: linux-2.6-lttng.stable/instrumentation/Kconfig
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng.stable/instrumentation/Kconfig 2007-10-29 09:51:12.000000000 -0400
@@ -0,0 +1,49 @@
+menuconfig INSTRUMENTATION
+ bool "Instrumentation Support"
+ default y
+ ---help---
+ Say Y here to get to see options related to performance measurement,
+ system-wide debugging, and testing. This option alone does not add any
+ kernel code.
+
+ If you say N, all options in this submenu will be skipped and
+ disabled. If you're trying to debug the kernel itself, go see the
+ Kernel Hacking menu.
+
+if INSTRUMENTATION
+
+config PROFILING
+ bool "Profiling support (EXPERIMENTAL)"
+ help
+ Say Y here to enable the extended profiling support mechanisms used
+ by profilers such as OProfile.
+
+config OPROFILE
+ tristate "OProfile system profiling (EXPERIMENTAL)"
+ depends on PROFILING
+ depends on ALPHA || ARM || BLACKFIN || X86_32 || IA64 || M32R || MIPS || PARISC || PPC || S390 || SUPERH || SPARC || X86_64
+ help
+ OProfile is a profiling system capable of profiling the
+ whole system, including the kernel, kernel modules, libraries,
+ and applications.
+
+ If unsure, say N.
+
+config KPROBES
+ bool "Kprobes"
+ depends on KALLSYMS && MODULES
+ depends on X86_32 || IA64 || PPC || S390 || SPARC64 || X86_64 || AVR32
+ help
+ Kprobes allows you to trap at almost any kernel address and
+ execute a callback function. register_kprobe() establishes
+ a probepoint and specifies the callback. Kprobes is useful
+ for kernel debugging, non-intrusive instrumentation and testing.
+ If in doubt, say "N".
+
+config MARKERS
+ bool "Activate markers"
+ help
+ Place an empty function call at each marker site. These calls
+ can be dynamically connected to a probe function at run time.
+
+endif # INSTRUMENTATION
Index: linux-2.6-lttng.stable/kernel/Kconfig.instrumentation
===================================================================
--- linux-2.6-lttng.stable.orig/kernel/Kconfig.instrumentation 2007-10-29 09:51:10.000000000 -0400
+++ /dev/null 1970-01-01 00:00:00.000000000 +0000
@@ -1,49 +0,0 @@
-menuconfig INSTRUMENTATION
- bool "Instrumentation Support"
- default y
- ---help---
- Say Y here to get to see options related to performance measurement,
- system-wide debugging, and testing. This option alone does not add any
- kernel code.
-
- If you say N, all options in this submenu will be skipped and
- disabled. If you're trying to debug the kernel itself, go see the
- Kernel Hacking menu.
-
-if INSTRUMENTATION
-
-config PROFILING
- bool "Profiling support (EXPERIMENTAL)"
- help
- Say Y here to enable the extended profiling support mechanisms used
- by profilers such as OProfile.
-
-config OPROFILE
- tristate "OProfile system profiling (EXPERIMENTAL)"
- depends on PROFILING
- depends on ALPHA || ARM || BLACKFIN || X86_32 || IA64 || M32R || MIPS || PARISC || PPC || S390 || SUPERH || SPARC || X86_64
- help
- OProfile is a profiling system capable of profiling the
- whole system, include the kernel, kernel modules, libraries,
- and applications.
-
- If unsure, say N.
-
-config KPROBES
- bool "Kprobes"
- depends on KALLSYMS && MODULES
- depends on X86_32 || IA64 || PPC || S390 || SPARC64 || X86_64 || AVR32
- help
- Kprobes allows you to trap at almost any kernel address and
- execute a callback function. register_kprobe() establishes
- a probepoint and specifies the callback. Kprobes is useful
- for kernel debugging, non-intrusive instrumentation and testing.
- If in doubt, say "N".
-
-config MARKERS
- bool "Activate markers"
- help
- Place an empty function call at each marker site. Can be
- dynamically changed for a probe function.
-
-endif # INSTRUMENTATION
Index: linux-2.6-lttng.stable/arch/alpha/Kconfig
===================================================================
--- linux-2.6-lttng.stable.orig/arch/alpha/Kconfig 2007-10-29 09:51:10.000000000 -0400
+++ linux-2.6-lttng.stable/arch/alpha/Kconfig 2007-10-29 09:51:12.000000000 -0400
@@ -654,7 +654,7 @@ source "drivers/Kconfig"
source "fs/Kconfig"
-source "kernel/Kconfig.instrumentation"
+source "instrumentation/Kconfig"
source "arch/alpha/Kconfig.debug"
Index: linux-2.6-lttng.stable/arch/arm/Kconfig
===================================================================
--- linux-2.6-lttng.stable.orig/arch/arm/Kconfig 2007-10-29 09:51:10.000000000 -0400
+++ linux-2.6-lttng.stable/arch/arm/Kconfig 2007-10-29 09:51:12.000000000 -0400
@@ -1068,7 +1068,7 @@ endmenu
source "fs/Kconfig"
-source "kernel/Kconfig.instrumentation"
+source "instrumentation/Kconfig"
source "arch/arm/Kconfig.debug"
Index: linux-2.6-lttng.stable/arch/blackfin/Kconfig
===================================================================
--- linux-2.6-lttng.stable.orig/arch/blackfin/Kconfig 2007-10-29 09:51:10.000000000 -0400
+++ linux-2.6-lttng.stable/arch/blackfin/Kconfig 2007-10-29 09:51:12.000000000 -0400
@@ -1062,7 +1062,7 @@ source "drivers/Kconfig"
source "fs/Kconfig"
-source "kernel/Kconfig.instrumentation"
+source "instrumentation/Kconfig"
menu "Kernel hacking"
Index: linux-2.6-lttng.stable/arch/cris/Kconfig
===================================================================
--- linux-2.6-lttng.stable.orig/arch/cris/Kconfig 2007-10-29 09:51:10.000000000 -0400
+++ linux-2.6-lttng.stable/arch/cris/Kconfig 2007-10-29 09:51:12.000000000 -0400
@@ -196,7 +196,7 @@ source "sound/Kconfig"
source "drivers/usb/Kconfig"
-source "kernel/Kconfig.instrumentation"
+source "instrumentation/Kconfig"
source "arch/cris/Kconfig.debug"
Index: linux-2.6-lttng.stable/arch/frv/Kconfig
===================================================================
--- linux-2.6-lttng.stable.orig/arch/frv/Kconfig 2007-10-29 09:51:10.000000000 -0400
+++ linux-2.6-lttng.stable/arch/frv/Kconfig 2007-10-29 09:51:12.000000000 -0400
@@ -375,7 +375,7 @@ source "drivers/Kconfig"
source "fs/Kconfig"
-source "kernel/Kconfig.instrumentation"
+source "instrumentation/Kconfig"
source "arch/frv/Kconfig.debug"
Index: linux-2.6-lttng.stable/arch/h8300/Kconfig
===================================================================
--- linux-2.6-lttng.stable.orig/arch/h8300/Kconfig 2007-10-29 09:51:10.000000000 -0400
+++ linux-2.6-lttng.stable/arch/h8300/Kconfig 2007-10-29 09:51:12.000000000 -0400
@@ -223,7 +223,7 @@ endmenu
source "fs/Kconfig"
-source "kernel/Kconfig.instrumentation"
+source "instrumentation/Kconfig"
source "arch/h8300/Kconfig.debug"
Index: linux-2.6-lttng.stable/arch/i386/Kconfig
===================================================================
--- linux-2.6-lttng.stable.orig/arch/i386/Kconfig 2007-10-29 09:51:10.000000000 -0400
+++ linux-2.6-lttng.stable/arch/i386/Kconfig 2007-10-29 09:51:12.000000000 -0400
@@ -1270,7 +1270,7 @@ source "drivers/Kconfig"
source "fs/Kconfig"
-source "kernel/Kconfig.instrumentation"
+source "instrumentation/Kconfig"
source "arch/i386/Kconfig.debug"
Index: linux-2.6-lttng.stable/arch/ia64/Kconfig
===================================================================
--- linux-2.6-lttng.stable.orig/arch/ia64/Kconfig 2007-10-29 09:51:10.000000000 -0400
+++ linux-2.6-lttng.stable/arch/ia64/Kconfig 2007-10-29 09:51:12.000000000 -0400
@@ -592,7 +592,7 @@ config IRQ_PER_CPU
source "arch/ia64/hp/sim/Kconfig"
-source "kernel/Kconfig.instrumentation"
+source "instrumentation/Kconfig"
source "arch/ia64/Kconfig.debug"
Index: linux-2.6-lttng.stable/arch/m32r/Kconfig
===================================================================
--- linux-2.6-lttng.stable.orig/arch/m32r/Kconfig 2007-10-29 09:51:10.000000000 -0400
+++ linux-2.6-lttng.stable/arch/m32r/Kconfig 2007-10-29 09:51:12.000000000 -0400
@@ -426,7 +426,7 @@ source "drivers/Kconfig"
source "fs/Kconfig"
-source "kernel/Kconfig.instrumentation"
+source "instrumentation/Kconfig"
source "arch/m32r/Kconfig.debug"
Index: linux-2.6-lttng.stable/arch/m68k/Kconfig
===================================================================
--- linux-2.6-lttng.stable.orig/arch/m68k/Kconfig 2007-10-29 09:51:10.000000000 -0400
+++ linux-2.6-lttng.stable/arch/m68k/Kconfig 2007-10-29 09:51:12.000000000 -0400
@@ -683,7 +683,7 @@ endmenu
source "fs/Kconfig"
-source "kernel/Kconfig.instrumentation"
+source "instrumentation/Kconfig"
source "arch/m68k/Kconfig.debug"
Index: linux-2.6-lttng.stable/arch/m68knommu/Kconfig
===================================================================
--- linux-2.6-lttng.stable.orig/arch/m68knommu/Kconfig 2007-10-29 09:51:10.000000000 -0400
+++ linux-2.6-lttng.stable/arch/m68knommu/Kconfig 2007-10-29 09:51:12.000000000 -0400
@@ -707,7 +707,7 @@ source "drivers/Kconfig"
source "fs/Kconfig"
-source "kernel/Kconfig.instrumentation"
+source "instrumentation/Kconfig"
source "arch/m68knommu/Kconfig.debug"
Index: linux-2.6-lttng.stable/arch/mips/Kconfig
===================================================================
--- linux-2.6-lttng.stable.orig/arch/mips/Kconfig 2007-10-29 09:51:10.000000000 -0400
+++ linux-2.6-lttng.stable/arch/mips/Kconfig 2007-10-29 09:51:12.000000000 -0400
@@ -2009,7 +2009,7 @@ source "drivers/Kconfig"
source "fs/Kconfig"
-source "kernel/Kconfig.instrumentation"
+source "instrumentation/Kconfig"
source "arch/mips/Kconfig.debug"
Index: linux-2.6-lttng.stable/arch/parisc/Kconfig
===================================================================
--- linux-2.6-lttng.stable.orig/arch/parisc/Kconfig 2007-10-29 09:51:10.000000000 -0400
+++ linux-2.6-lttng.stable/arch/parisc/Kconfig 2007-10-29 09:51:12.000000000 -0400
@@ -267,7 +267,7 @@ source "drivers/Kconfig"
source "fs/Kconfig"
-source "kernel/Kconfig.instrumentation"
+source "instrumentation/Kconfig"
source "arch/parisc/Kconfig.debug"
Index: linux-2.6-lttng.stable/arch/powerpc/Kconfig
===================================================================
--- linux-2.6-lttng.stable.orig/arch/powerpc/Kconfig 2007-10-29 09:51:10.000000000 -0400
+++ linux-2.6-lttng.stable/arch/powerpc/Kconfig 2007-10-29 09:51:12.000000000 -0400
@@ -669,7 +669,7 @@ source "arch/powerpc/sysdev/qe_lib/Kconf
source "lib/Kconfig"
-source "kernel/Kconfig.instrumentation"
+source "instrumentation/Kconfig"
source "arch/powerpc/Kconfig.debug"
Index: linux-2.6-lttng.stable/arch/ppc/Kconfig
===================================================================
--- linux-2.6-lttng.stable.orig/arch/ppc/Kconfig 2007-10-29 09:51:10.000000000 -0400
+++ linux-2.6-lttng.stable/arch/ppc/Kconfig 2007-10-29 09:51:12.000000000 -0400
@@ -1317,7 +1317,7 @@ endmenu
source "lib/Kconfig"
-source "kernel/Kconfig.instrumentation"
+source "instrumentation/Kconfig"
source "arch/ppc/Kconfig.debug"
Index: linux-2.6-lttng.stable/arch/s390/Kconfig
===================================================================
--- linux-2.6-lttng.stable.orig/arch/s390/Kconfig 2007-10-29 09:51:10.000000000 -0400
+++ linux-2.6-lttng.stable/arch/s390/Kconfig 2007-10-29 09:51:12.000000000 -0400
@@ -529,7 +529,7 @@ source "drivers/Kconfig"
source "fs/Kconfig"
-source "kernel/Kconfig.instrumentation"
+source "instrumentation/Kconfig"
source "arch/s390/Kconfig.debug"
Index: linux-2.6-lttng.stable/arch/sh/Kconfig
===================================================================
--- linux-2.6-lttng.stable.orig/arch/sh/Kconfig 2007-10-29 09:51:10.000000000 -0400
+++ linux-2.6-lttng.stable/arch/sh/Kconfig 2007-10-29 09:51:12.000000000 -0400
@@ -758,7 +758,7 @@ source "drivers/Kconfig"
source "fs/Kconfig"
-source "kernel/Kconfig.instrumentation"
+source "instrumentation/Kconfig"
source "arch/sh/Kconfig.debug"
Index: linux-2.6-lttng.stable/arch/sh64/Kconfig
===================================================================
--- linux-2.6-lttng.stable.orig/arch/sh64/Kconfig 2007-10-29 09:51:10.000000000 -0400
+++ linux-2.6-lttng.stable/arch/sh64/Kconfig 2007-10-29 09:51:12.000000000 -0400
@@ -284,7 +284,7 @@ source "drivers/Kconfig"
source "fs/Kconfig"
-source "kernel/Kconfig.instrumentation"
+source "instrumentation/Kconfig"
source "arch/sh64/Kconfig.debug"
Index: linux-2.6-lttng.stable/arch/sparc/Kconfig
===================================================================
--- linux-2.6-lttng.stable.orig/arch/sparc/Kconfig 2007-10-29 09:51:10.000000000 -0400
+++ linux-2.6-lttng.stable/arch/sparc/Kconfig 2007-10-29 09:51:12.000000000 -0400
@@ -320,7 +320,7 @@ endmenu
source "fs/Kconfig"
-source "kernel/Kconfig.instrumentation"
+source "instrumentation/Kconfig"
source "arch/sparc/Kconfig.debug"
Index: linux-2.6-lttng.stable/arch/sparc64/Kconfig
===================================================================
--- linux-2.6-lttng.stable.orig/arch/sparc64/Kconfig 2007-10-29 09:51:10.000000000 -0400
+++ linux-2.6-lttng.stable/arch/sparc64/Kconfig 2007-10-29 09:51:12.000000000 -0400
@@ -462,7 +462,7 @@ source "drivers/sbus/char/Kconfig"
source "fs/Kconfig"
-source "kernel/Kconfig.instrumentation"
+source "instrumentation/Kconfig"
source "arch/sparc64/Kconfig.debug"
Index: linux-2.6-lttng.stable/arch/um/Kconfig
===================================================================
--- linux-2.6-lttng.stable.orig/arch/um/Kconfig 2007-10-29 09:51:10.000000000 -0400
+++ linux-2.6-lttng.stable/arch/um/Kconfig 2007-10-29 09:51:12.000000000 -0400
@@ -289,6 +289,6 @@ config INPUT
bool
default n
-source "kernel/Kconfig.instrumentation"
+source "instrumentation/Kconfig"
source "arch/um/Kconfig.debug"
Index: linux-2.6-lttng.stable/arch/v850/Kconfig
===================================================================
--- linux-2.6-lttng.stable.orig/arch/v850/Kconfig 2007-10-29 09:51:10.000000000 -0400
+++ linux-2.6-lttng.stable/arch/v850/Kconfig 2007-10-29 09:51:12.000000000 -0400
@@ -331,7 +331,7 @@ source "sound/Kconfig"
source "drivers/usb/Kconfig"
-source "kernel/Kconfig.instrumentation"
+source "instrumentation/Kconfig"
source "arch/v850/Kconfig.debug"
Index: linux-2.6-lttng.stable/arch/x86_64/Kconfig
===================================================================
--- linux-2.6-lttng.stable.orig/arch/x86_64/Kconfig 2007-10-29 09:51:10.000000000 -0400
+++ linux-2.6-lttng.stable/arch/x86_64/Kconfig 2007-10-29 09:51:12.000000000 -0400
@@ -833,7 +833,7 @@ source "drivers/firmware/Kconfig"
source fs/Kconfig
-source "kernel/Kconfig.instrumentation"
+source "instrumentation/Kconfig"
source "arch/x86_64/Kconfig.debug"
Index: linux-2.6-lttng.stable/arch/xtensa/Kconfig
===================================================================
--- linux-2.6-lttng.stable.orig/arch/xtensa/Kconfig 2007-10-29 09:51:10.000000000 -0400
+++ linux-2.6-lttng.stable/arch/xtensa/Kconfig 2007-10-29 09:51:12.000000000 -0400
@@ -251,7 +251,7 @@ config EMBEDDED_RAMDISK_IMAGE
provide one yourself.
endmenu
-source "kernel/Kconfig.instrumentation"
+source "instrumentation/Kconfig"
source "arch/xtensa/Kconfig.debug"
Index: linux-2.6-lttng.stable/instrumentation/Makefile
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng.stable/instrumentation/Makefile 2007-10-29 09:51:12.000000000 -0400
@@ -0,0 +1,3 @@
+#
+# Makefile for the linux kernel instrumentation
+#
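The Makefile starts out with just this header; later patches in the series
populate it as code moves in. A sketch of the kind of rules it gains once
kprobes and markers are moved (the exact lines are in the follow-up patches):

	obj-$(CONFIG_KPROBES)	+= kprobes.o
	obj-$(CONFIG_MARKERS)	+= marker.o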
--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
* [patch 03/10] Move kprobes to instrumentation/ and arch/*/instrumentation/
From: Mathieu Desnoyers @ 2007-10-29 14:26 UTC
To: linux-kernel
Cc: Mathieu Desnoyers, Prasanna S Panchamukhi,
Ananth N Mavinakayanahalli, Anil S Keshavamurthy, David S. Miller
[-- Attachment #1: move-kprobes-to-instrumentation.patch --]
[-- Type: text/plain, Size: 329234 bytes --]
Consolidate the kprobes code in the instrumentation directory:
- instrumentation/ is compiled as core-y.
- Note: arch/*/instrumentation/kprobes.c is now compiled as drivers-y (it was
core-y before); this is caused by oprofile being built as drivers-y.
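The API itself is unchanged by the move. As a reminder, here is a minimal
sketch of a module registering a probe; the probed symbol and the handler are
hypothetical and not part of this patch:

	#include <linux/module.h>
	#include <linux/kprobes.h>

	/* Called just before the probed instruction executes. */
	static int my_pre_handler(struct kprobe *p, struct pt_regs *regs)
	{
		printk(KERN_INFO "kprobe hit at %p\n", p->addr);
		return 0;	/* 0: single-step the original instruction */
	}

	static struct kprobe my_kp = {
		.symbol_name	= "do_fork",	/* resolved via kallsyms */
		.pre_handler	= my_pre_handler,
	};

	static int __init my_init(void)
	{
		return register_kprobe(&my_kp);
	}

	static void __exit my_exit(void)
	{
		unregister_kprobe(&my_kp);
	}

	module_init(my_init);
	module_exit(my_exit);
	MODULE_LICENSE("GPL");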
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Prasanna S Panchamukhi <prasanna@in.ibm.com>
CC: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
CC: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
CC: David S. Miller <davem@davemloft.net>
---
Makefile | 3
arch/avr32/instrumentation/Makefile | 2
arch/avr32/instrumentation/kprobes.c | 268 ++++++++
arch/avr32/kernel/Makefile | 1
arch/avr32/kernel/kprobes.c | 268 --------
arch/ia64/instrumentation/Makefile | 2
arch/ia64/instrumentation/kprobes.c | 1027 +++++++++++++++++++++++++++++++
arch/ia64/kernel/kprobes.c | 1027 -------------------------------
arch/powerpc/instrumentation/Makefile | 2
arch/powerpc/instrumentation/kprobes.c | 559 +++++++++++++++++
arch/powerpc/kernel/kprobes.c | 559 -----------------
arch/s390/instrumentation/Makefile | 2
arch/s390/instrumentation/kprobes.c | 672 ++++++++++++++++++++
arch/s390/kernel/kprobes.c | 672 --------------------
arch/sparc64/instrumentation/Makefile | 2
arch/sparc64/instrumentation/kprobes.c | 487 +++++++++++++++
arch/sparc64/kernel/kprobes.c | 487 ---------------
arch/x86/instrumentation/Makefile | 6
arch/x86/instrumentation/kprobes_32.c | 763 +++++++++++++++++++++++
arch/x86/instrumentation/kprobes_64.c | 761 +++++++++++++++++++++++
arch/x86/kernel/Makefile_32 | 1
arch/x86/kernel/Makefile_64 | 1
arch/x86/kernel/kprobes_32.c | 763 -----------------------
arch/x86/kernel/kprobes_64.c | 761 -----------------------
instrumentation/Makefile | 2
instrumentation/kprobes.c | 1063 +++++++++++++++++++++++++++++++++
kernel/Makefile | 1
kernel/kprobes.c | 1063 ---------------------------------
28 files changed, 5620 insertions(+), 5605 deletions(-)
Index: linux-2.6-lttng.stable/instrumentation/kprobes.c
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng.stable/instrumentation/kprobes.c 2007-10-29 09:51:38.000000000 -0400
@@ -0,0 +1,1063 @@
+/*
+ * Kernel Probes (KProbes)
+ * kernel/kprobes.c
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) IBM Corporation, 2002, 2004
+ *
+ * 2002-Oct Created by Vamsi Krishna S <vamsi_krishna@in.ibm.com> Kernel
+ * Probes initial implementation (includes suggestions from
+ * Rusty Russell).
+ * 2004-Aug Updated by Prasanna S Panchamukhi <prasanna@in.ibm.com> with
+ * hlists and exceptions notifier as suggested by Andi Kleen.
+ * 2004-July Suparna Bhattacharya <suparna@in.ibm.com> added jumper probes
+ * interface to access function arguments.
+ * 2004-Sep Prasanna S Panchamukhi <prasanna@in.ibm.com> Changed Kprobes
+ * exceptions notifier to be first on the priority list.
+ * 2005-May Hien Nguyen <hien@us.ibm.com>, Jim Keniston
+ * <jkenisto@us.ibm.com> and Prasanna S Panchamukhi
+ * <prasanna@in.ibm.com> added function-return probes.
+ */
+#include <linux/kprobes.h>
+#include <linux/hash.h>
+#include <linux/init.h>
+#include <linux/slab.h>
+#include <linux/stddef.h>
+#include <linux/module.h>
+#include <linux/moduleloader.h>
+#include <linux/kallsyms.h>
+#include <linux/freezer.h>
+#include <linux/seq_file.h>
+#include <linux/debugfs.h>
+#include <linux/kdebug.h>
+
+#include <asm-generic/sections.h>
+#include <asm/cacheflush.h>
+#include <asm/errno.h>
+#include <asm/uaccess.h>
+
+#define KPROBE_HASH_BITS 6
+#define KPROBE_TABLE_SIZE (1 << KPROBE_HASH_BITS)
+
+
+/*
+ * Some oddball architectures like 64bit powerpc have function descriptors
+ * so this must be overridable.
+ */
+#ifndef kprobe_lookup_name
+#define kprobe_lookup_name(name, addr) \
+ addr = ((kprobe_opcode_t *)(kallsyms_lookup_name(name)))
+#endif
+
+static struct hlist_head kprobe_table[KPROBE_TABLE_SIZE];
+static struct hlist_head kretprobe_inst_table[KPROBE_TABLE_SIZE];
+
+/* NOTE: change this value only with kprobe_mutex held */
+static bool kprobe_enabled;
+
+DEFINE_MUTEX(kprobe_mutex); /* Protects kprobe_table */
+DEFINE_SPINLOCK(kretprobe_lock); /* Protects kretprobe_inst_table */
+static DEFINE_PER_CPU(struct kprobe *, kprobe_instance) = NULL;
+
+#ifdef __ARCH_WANT_KPROBES_INSN_SLOT
+/*
+ * kprobe->ainsn.insn points to the copy of the instruction to be
+ * single-stepped. x86_64, POWER4 and above have no-exec support and
+ * stepping on the instruction on a vmalloced/kmalloced/data page
+ * is a recipe for disaster
+ */
+#define INSNS_PER_PAGE (PAGE_SIZE/(MAX_INSN_SIZE * sizeof(kprobe_opcode_t)))
+
+struct kprobe_insn_page {
+ struct hlist_node hlist;
+ kprobe_opcode_t *insns; /* Page of instruction slots */
+ char slot_used[INSNS_PER_PAGE];
+ int nused;
+ int ngarbage;
+};
+
+enum kprobe_slot_state {
+ SLOT_CLEAN = 0,
+ SLOT_DIRTY = 1,
+ SLOT_USED = 2,
+};
+
+static struct hlist_head kprobe_insn_pages;
+static int kprobe_garbage_slots;
+static int collect_garbage_slots(void);
+
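+/*
+ * Ensure that no task was preempted in the middle of a freed instruction
+ * slot: on CONFIG_PREEMPT kernels with power management, freeze all tasks
+ * and fail if any of them is still running; otherwise a
+ * synchronize_sched() is sufficient.
+ */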
+static int __kprobes check_safety(void)
+{
+ int ret = 0;
+#if defined(CONFIG_PREEMPT) && defined(CONFIG_PM)
+ ret = freeze_processes();
+ if (ret == 0) {
+ struct task_struct *p, *q;
+ do_each_thread(p, q) {
+ if (p != current && p->state == TASK_RUNNING &&
+ p->pid != 0) {
+ printk("Check failed: %s is running\n",p->comm);
+ ret = -1;
+ goto loop_end;
+ }
+ } while_each_thread(p, q);
+ }
+loop_end:
+ thaw_processes();
+#else
+ synchronize_sched();
+#endif
+ return ret;
+}
+
+/**
+ * get_insn_slot() - Find a slot on an executable page for an instruction.
+ * We allocate an executable page if there's no room on existing ones.
+ */
+kprobe_opcode_t __kprobes *get_insn_slot(void)
+{
+ struct kprobe_insn_page *kip;
+ struct hlist_node *pos;
+
+ retry:
+ hlist_for_each_entry(kip, pos, &kprobe_insn_pages, hlist) {
+ if (kip->nused < INSNS_PER_PAGE) {
+ int i;
+ for (i = 0; i < INSNS_PER_PAGE; i++) {
+ if (kip->slot_used[i] == SLOT_CLEAN) {
+ kip->slot_used[i] = SLOT_USED;
+ kip->nused++;
+ return kip->insns + (i * MAX_INSN_SIZE);
+ }
+ }
+ /* Surprise! No unused slots. Fix kip->nused. */
+ kip->nused = INSNS_PER_PAGE;
+ }
+ }
+
+ /* If there are any garbage slots, collect them and try again. */
+ if (kprobe_garbage_slots && collect_garbage_slots() == 0) {
+ goto retry;
+ }
+ /* All out of space. Need to allocate a new page. Use slot 0. */
+ kip = kmalloc(sizeof(struct kprobe_insn_page), GFP_KERNEL);
+ if (!kip)
+ return NULL;
+
+ /*
+ * Use module_alloc so this page is within +/- 2GB of where the
+ * kernel image and loaded module images reside. This is required
+ * so x86_64 can correctly handle the %rip-relative fixups.
+ */
+ kip->insns = module_alloc(PAGE_SIZE);
+ if (!kip->insns) {
+ kfree(kip);
+ return NULL;
+ }
+ INIT_HLIST_NODE(&kip->hlist);
+ hlist_add_head(&kip->hlist, &kprobe_insn_pages);
+ memset(kip->slot_used, SLOT_CLEAN, INSNS_PER_PAGE);
+ kip->slot_used[0] = SLOT_USED;
+ kip->nused = 1;
+ kip->ngarbage = 0;
+ return kip->insns;
+}
+
+/* Return 1 if all garbage is collected, otherwise 0. */
+static int __kprobes collect_one_slot(struct kprobe_insn_page *kip, int idx)
+{
+ kip->slot_used[idx] = SLOT_CLEAN;
+ kip->nused--;
+ if (kip->nused == 0) {
+ /*
+ * Page is no longer in use. Free it unless
+ * it's the last one. We keep the last one
+ * so as not to have to set it up again the
+ * next time somebody inserts a probe.
+ */
+ hlist_del(&kip->hlist);
+ if (hlist_empty(&kprobe_insn_pages)) {
+ INIT_HLIST_NODE(&kip->hlist);
+ hlist_add_head(&kip->hlist,
+ &kprobe_insn_pages);
+ } else {
+ module_free(NULL, kip->insns);
+ kfree(kip);
+ }
+ return 1;
+ }
+ return 0;
+}
+
+static int __kprobes collect_garbage_slots(void)
+{
+ struct kprobe_insn_page *kip;
+ struct hlist_node *pos, *next;
+
+ /* Ensure no one is preempted on the garbage slots */
+ if (check_safety() != 0)
+ return -EAGAIN;
+
+ hlist_for_each_entry_safe(kip, pos, next, &kprobe_insn_pages, hlist) {
+ int i;
+ if (kip->ngarbage == 0)
+ continue;
+ kip->ngarbage = 0; /* we will collect all garbage */
+ for (i = 0; i < INSNS_PER_PAGE; i++) {
+ if (kip->slot_used[i] == SLOT_DIRTY &&
+ collect_one_slot(kip, i))
+ break;
+ }
+ }
+ kprobe_garbage_slots = 0;
+ return 0;
+}
+
+void __kprobes free_insn_slot(kprobe_opcode_t * slot, int dirty)
+{
+ struct kprobe_insn_page *kip;
+ struct hlist_node *pos;
+
+ hlist_for_each_entry(kip, pos, &kprobe_insn_pages, hlist) {
+ if (kip->insns <= slot &&
+ slot < kip->insns + (INSNS_PER_PAGE * MAX_INSN_SIZE)) {
+ int i = (slot - kip->insns) / MAX_INSN_SIZE;
+ if (dirty) {
+ kip->slot_used[i] = SLOT_DIRTY;
+ kip->ngarbage++;
+ } else {
+ collect_one_slot(kip, i);
+ }
+ break;
+ }
+ }
+
+ if (dirty && ++kprobe_garbage_slots > INSNS_PER_PAGE)
+ collect_garbage_slots();
+}
+#endif
+
+/* We have preemption disabled, so it is safe to use the __ versions */
+static inline void set_kprobe_instance(struct kprobe *kp)
+{
+ __get_cpu_var(kprobe_instance) = kp;
+}
+
+static inline void reset_kprobe_instance(void)
+{
+ __get_cpu_var(kprobe_instance) = NULL;
+}
+
+/*
+ * This routine is called either:
+ * - under the kprobe_mutex - during kprobe_[un]register()
+ * OR
+ * - with preemption disabled - from arch/xxx/kernel/kprobes.c
+ */
+struct kprobe __kprobes *get_kprobe(void *addr)
+{
+ struct hlist_head *head;
+ struct hlist_node *node;
+ struct kprobe *p;
+
+ head = &kprobe_table[hash_ptr(addr, KPROBE_HASH_BITS)];
+ hlist_for_each_entry_rcu(p, node, head, hlist) {
+ if (p->addr == addr)
+ return p;
+ }
+ return NULL;
+}
+
+/*
+ * Aggregate handlers for multiple kprobes support - these handlers
+ * take care of invoking the individual kprobe handlers on p->list
+ */
+static int __kprobes aggr_pre_handler(struct kprobe *p, struct pt_regs *regs)
+{
+ struct kprobe *kp;
+
+ list_for_each_entry_rcu(kp, &p->list, list) {
+ if (kp->pre_handler) {
+ set_kprobe_instance(kp);
+ if (kp->pre_handler(kp, regs))
+ return 1;
+ }
+ reset_kprobe_instance();
+ }
+ return 0;
+}
+
+static void __kprobes aggr_post_handler(struct kprobe *p, struct pt_regs *regs,
+ unsigned long flags)
+{
+ struct kprobe *kp;
+
+ list_for_each_entry_rcu(kp, &p->list, list) {
+ if (kp->post_handler) {
+ set_kprobe_instance(kp);
+ kp->post_handler(kp, regs, flags);
+ reset_kprobe_instance();
+ }
+ }
+}
+
+static int __kprobes aggr_fault_handler(struct kprobe *p, struct pt_regs *regs,
+ int trapnr)
+{
+ struct kprobe *cur = __get_cpu_var(kprobe_instance);
+
+ /*
+ * if we faulted "during" the execution of a user specified
+ * probe handler, invoke just that probe's fault handler
+ */
+ if (cur && cur->fault_handler) {
+ if (cur->fault_handler(cur, regs, trapnr))
+ return 1;
+ }
+ return 0;
+}
+
+static int __kprobes aggr_break_handler(struct kprobe *p, struct pt_regs *regs)
+{
+ struct kprobe *cur = __get_cpu_var(kprobe_instance);
+ int ret = 0;
+
+ if (cur && cur->break_handler) {
+ if (cur->break_handler(cur, regs))
+ ret = 1;
+ }
+ reset_kprobe_instance();
+ return ret;
+}
+
+/* Walks the list and increments nmissed count for multiprobe case */
+void __kprobes kprobes_inc_nmissed_count(struct kprobe *p)
+{
+ struct kprobe *kp;
+ if (p->pre_handler != aggr_pre_handler) {
+ p->nmissed++;
+ } else {
+ list_for_each_entry_rcu(kp, &p->list, list)
+ kp->nmissed++;
+ }
+ return;
+}
+
+/* Called with kretprobe_lock held */
+void __kprobes recycle_rp_inst(struct kretprobe_instance *ri,
+ struct hlist_head *head)
+{
+ /* remove rp inst off the rprobe_inst_table */
+ hlist_del(&ri->hlist);
+ if (ri->rp) {
+ /* remove rp inst off the used list */
+ hlist_del(&ri->uflist);
+ /* put rp inst back onto the free list */
+ INIT_HLIST_NODE(&ri->uflist);
+ hlist_add_head(&ri->uflist, &ri->rp->free_instances);
+ } else
+ /* Unregistering */
+ hlist_add_head(&ri->hlist, head);
+}
+
+struct hlist_head __kprobes *kretprobe_inst_table_head(struct task_struct *tsk)
+{
+ return &kretprobe_inst_table[hash_ptr(tsk, KPROBE_HASH_BITS)];
+}
+
+/*
+ * This function is called from finish_task_switch when task tk becomes dead,
+ * so that we can recycle any function-return probe instances associated
+ * with this task. These left over instances represent probed functions
+ * that have been called but will never return.
+ */
+void __kprobes kprobe_flush_task(struct task_struct *tk)
+{
+ struct kretprobe_instance *ri;
+ struct hlist_head *head, empty_rp;
+ struct hlist_node *node, *tmp;
+ unsigned long flags = 0;
+
+ INIT_HLIST_HEAD(&empty_rp);
+ spin_lock_irqsave(&kretprobe_lock, flags);
+ head = kretprobe_inst_table_head(tk);
+ hlist_for_each_entry_safe(ri, node, tmp, head, hlist) {
+ if (ri->task == tk)
+ recycle_rp_inst(ri, &empty_rp);
+ }
+ spin_unlock_irqrestore(&kretprobe_lock, flags);
+
+ hlist_for_each_entry_safe(ri, node, tmp, &empty_rp, hlist) {
+ hlist_del(&ri->hlist);
+ kfree(ri);
+ }
+}
+
+static inline void free_rp_inst(struct kretprobe *rp)
+{
+ struct kretprobe_instance *ri;
+ struct hlist_node *pos, *next;
+
+ hlist_for_each_entry_safe(ri, pos, next, &rp->free_instances, uflist) {
+ hlist_del(&ri->uflist);
+ kfree(ri);
+ }
+}
+
+/*
+ * Keep all fields in the kprobe consistent
+ */
+static inline void copy_kprobe(struct kprobe *old_p, struct kprobe *p)
+{
+ memcpy(&p->opcode, &old_p->opcode, sizeof(kprobe_opcode_t));
+ memcpy(&p->ainsn, &old_p->ainsn, sizeof(struct arch_specific_insn));
+}
+
+/*
+* Add the new probe to old_p->list. Fail if this is the
+* second jprobe at the address - two jprobes can't coexist
+*/
+static int __kprobes add_new_kprobe(struct kprobe *old_p, struct kprobe *p)
+{
+ if (p->break_handler) {
+ if (old_p->break_handler)
+ return -EEXIST;
+ list_add_tail_rcu(&p->list, &old_p->list);
+ old_p->break_handler = aggr_break_handler;
+ } else
+ list_add_rcu(&p->list, &old_p->list);
+ if (p->post_handler && !old_p->post_handler)
+ old_p->post_handler = aggr_post_handler;
+ return 0;
+}
+
+/*
+ * Fill in the required fields of the "manager kprobe". Replace the
+ * earlier kprobe in the hlist with the manager kprobe
+ */
+static inline void add_aggr_kprobe(struct kprobe *ap, struct kprobe *p)
+{
+ copy_kprobe(p, ap);
+ flush_insn_slot(ap);
+ ap->addr = p->addr;
+ ap->pre_handler = aggr_pre_handler;
+ ap->fault_handler = aggr_fault_handler;
+ if (p->post_handler)
+ ap->post_handler = aggr_post_handler;
+ if (p->break_handler)
+ ap->break_handler = aggr_break_handler;
+
+ INIT_LIST_HEAD(&ap->list);
+ list_add_rcu(&p->list, &ap->list);
+
+ hlist_replace_rcu(&p->hlist, &ap->hlist);
+}
+
+/*
+ * This is the second or subsequent kprobe at the address - handle
+ * the intricacies
+ */
+static int __kprobes register_aggr_kprobe(struct kprobe *old_p,
+ struct kprobe *p)
+{
+ int ret = 0;
+ struct kprobe *ap;
+
+ if (old_p->pre_handler == aggr_pre_handler) {
+ copy_kprobe(old_p, p);
+ ret = add_new_kprobe(old_p, p);
+ } else {
+ ap = kzalloc(sizeof(struct kprobe), GFP_KERNEL);
+ if (!ap)
+ return -ENOMEM;
+ add_aggr_kprobe(ap, old_p);
+ copy_kprobe(ap, p);
+ ret = add_new_kprobe(ap, p);
+ }
+ return ret;
+}
+
+static int __kprobes in_kprobes_functions(unsigned long addr)
+{
+ if (addr >= (unsigned long)__kprobes_text_start &&
+ addr < (unsigned long)__kprobes_text_end)
+ return -EINVAL;
+ return 0;
+}
+
+static int __kprobes __register_kprobe(struct kprobe *p,
+ unsigned long called_from)
+{
+ int ret = 0;
+ struct kprobe *old_p;
+ struct module *probed_mod;
+
+ /*
+ * If we have a symbol_name argument look it up,
+ * and add it to the address. That way the addr
+ * field can either be global or relative to a symbol.
+ */
+ if (p->symbol_name) {
+ if (p->addr)
+ return -EINVAL;
+ kprobe_lookup_name(p->symbol_name, p->addr);
+ }
+
+ if (!p->addr)
+ return -EINVAL;
+ p->addr = (kprobe_opcode_t *)(((char *)p->addr)+ p->offset);
+
+ if (!kernel_text_address((unsigned long) p->addr) ||
+ in_kprobes_functions((unsigned long) p->addr))
+ return -EINVAL;
+
+ p->mod_refcounted = 0;
+
+ /*
+ * Check if we are probing a module.
+ */
+ probed_mod = module_text_address((unsigned long) p->addr);
+ if (probed_mod) {
+ struct module *calling_mod = module_text_address(called_from);
+ /*
+ * We must allow modules to probe themselves and in this case
+ * avoid incrementing the module refcount, so as to allow
+ * unloading of self probing modules.
+ */
+ if (calling_mod && calling_mod != probed_mod) {
+ if (unlikely(!try_module_get(probed_mod)))
+ return -EINVAL;
+ p->mod_refcounted = 1;
+ } else
+ probed_mod = NULL;
+ }
+
+ p->nmissed = 0;
+ mutex_lock(&kprobe_mutex);
+ old_p = get_kprobe(p->addr);
+ if (old_p) {
+ ret = register_aggr_kprobe(old_p, p);
+ goto out;
+ }
+
+ ret = arch_prepare_kprobe(p);
+ if (ret)
+ goto out;
+
+ INIT_HLIST_NODE(&p->hlist);
+ hlist_add_head_rcu(&p->hlist,
+ &kprobe_table[hash_ptr(p->addr, KPROBE_HASH_BITS)]);
+
+ if (kprobe_enabled)
+ arch_arm_kprobe(p);
+
+out:
+ mutex_unlock(&kprobe_mutex);
+
+ if (ret && probed_mod)
+ module_put(probed_mod);
+ return ret;
+}
+
+int __kprobes register_kprobe(struct kprobe *p)
+{
+ return __register_kprobe(p, (unsigned long)__builtin_return_address(0));
+}
+
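+/*
+ * Remove kprobe p. If it is the only probe left at the address, disarm
+ * the breakpoint and unhash the entry; otherwise only unlink p from the
+ * aggregate probe's list and clear any break/post handlers that no
+ * remaining probe needs.
+ */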
+void __kprobes unregister_kprobe(struct kprobe *p)
+{
+ struct module *mod;
+ struct kprobe *old_p, *list_p;
+ int cleanup_p;
+
+ mutex_lock(&kprobe_mutex);
+ old_p = get_kprobe(p->addr);
+ if (unlikely(!old_p)) {
+ mutex_unlock(&kprobe_mutex);
+ return;
+ }
+ if (p != old_p) {
+ list_for_each_entry_rcu(list_p, &old_p->list, list)
+ if (list_p == p)
+ /* kprobe p is a valid probe */
+ goto valid_p;
+ mutex_unlock(&kprobe_mutex);
+ return;
+ }
+valid_p:
+ if (old_p == p ||
+ (old_p->pre_handler == aggr_pre_handler &&
+ p->list.next == &old_p->list && p->list.prev == &old_p->list)) {
+ /*
+ * Only probe on the hash list. Disarm only if kprobes are
+ * enabled - otherwise, the breakpoint would already have
+ * been removed. We save on flushing icache.
+ */
+ if (kprobe_enabled)
+ arch_disarm_kprobe(p);
+ hlist_del_rcu(&old_p->hlist);
+ cleanup_p = 1;
+ } else {
+ list_del_rcu(&p->list);
+ cleanup_p = 0;
+ }
+
+ mutex_unlock(&kprobe_mutex);
+
+ synchronize_sched();
+ if (p->mod_refcounted) {
+ mod = module_text_address((unsigned long)p->addr);
+ if (mod)
+ module_put(mod);
+ }
+
+ if (cleanup_p) {
+ if (p != old_p) {
+ list_del_rcu(&p->list);
+ kfree(old_p);
+ }
+ arch_remove_kprobe(p);
+ } else {
+ mutex_lock(&kprobe_mutex);
+ if (p->break_handler)
+ old_p->break_handler = NULL;
+ if (p->post_handler){
+ list_for_each_entry_rcu(list_p, &old_p->list, list){
+ if (list_p->post_handler){
+ cleanup_p = 2;
+ break;
+ }
+ }
+ if (cleanup_p == 0)
+ old_p->post_handler = NULL;
+ }
+ mutex_unlock(&kprobe_mutex);
+ }
+}
+
+static struct notifier_block kprobe_exceptions_nb = {
+ .notifier_call = kprobe_exceptions_notify,
+ .priority = 0x7fffffff /* we need to be notified first */
+};
+
+unsigned long __weak arch_deref_entry_point(void *entry)
+{
+ return (unsigned long)entry;
+}
+
+int __kprobes register_jprobe(struct jprobe *jp)
+{
+ unsigned long addr = arch_deref_entry_point(jp->entry);
+
+ if (!kernel_text_address(addr))
+ return -EINVAL;
+
+ /* Todo: Verify probepoint is a function entry point */
+ jp->kp.pre_handler = setjmp_pre_handler;
+ jp->kp.break_handler = longjmp_break_handler;
+
+ return __register_kprobe(&jp->kp,
+ (unsigned long)__builtin_return_address(0));
+}
+
+void __kprobes unregister_jprobe(struct jprobe *jp)
+{
+ unregister_kprobe(&jp->kp);
+}
+
+#ifdef ARCH_SUPPORTS_KRETPROBES
+
+/*
+ * This kprobe pre_handler is registered with every kretprobe. When the
+ * probe is hit, it sets up the return probe.
+ */
+static int __kprobes pre_handler_kretprobe(struct kprobe *p,
+ struct pt_regs *regs)
+{
+ struct kretprobe *rp = container_of(p, struct kretprobe, kp);
+ unsigned long flags = 0;
+
+ /* TODO: consider swapping the RA only after the last pre_handler has fired */
+ spin_lock_irqsave(&kretprobe_lock, flags);
+ if (!hlist_empty(&rp->free_instances)) {
+ struct kretprobe_instance *ri;
+
+ ri = hlist_entry(rp->free_instances.first,
+ struct kretprobe_instance, uflist);
+ ri->rp = rp;
+ ri->task = current;
+ arch_prepare_kretprobe(ri, regs);
+
+ /* XXX(hch): why is there no hlist_move_head? */
+ hlist_del(&ri->uflist);
+ hlist_add_head(&ri->uflist, &ri->rp->used_instances);
+ hlist_add_head(&ri->hlist, kretprobe_inst_table_head(ri->task));
+ } else
+ rp->nmissed++;
+ spin_unlock_irqrestore(&kretprobe_lock, flags);
+ return 0;
+}
+
+int __kprobes register_kretprobe(struct kretprobe *rp)
+{
+ int ret = 0;
+ struct kretprobe_instance *inst;
+ int i;
+ void *addr = rp->kp.addr;
+
+ if (kretprobe_blacklist_size) {
+ if (addr == NULL)
+ kprobe_lookup_name(rp->kp.symbol_name, addr);
+ addr += rp->kp.offset;
+
+ for (i = 0; kretprobe_blacklist[i].name != NULL; i++) {
+ if (kretprobe_blacklist[i].addr == addr)
+ return -EINVAL;
+ }
+ }
+
+ rp->kp.pre_handler = pre_handler_kretprobe;
+ rp->kp.post_handler = NULL;
+ rp->kp.fault_handler = NULL;
+ rp->kp.break_handler = NULL;
+
+ /* Pre-allocate memory for max kretprobe instances */
+ if (rp->maxactive <= 0) {
+#ifdef CONFIG_PREEMPT
+ rp->maxactive = max(10, 2 * NR_CPUS);
+#else
+ rp->maxactive = NR_CPUS;
+#endif
+ }
+ INIT_HLIST_HEAD(&rp->used_instances);
+ INIT_HLIST_HEAD(&rp->free_instances);
+ for (i = 0; i < rp->maxactive; i++) {
+ inst = kmalloc(sizeof(struct kretprobe_instance), GFP_KERNEL);
+ if (inst == NULL) {
+ free_rp_inst(rp);
+ return -ENOMEM;
+ }
+ INIT_HLIST_NODE(&inst->uflist);
+ hlist_add_head(&inst->uflist, &rp->free_instances);
+ }
+
+ rp->nmissed = 0;
+ /* Establish function entry probe point */
+ if ((ret = __register_kprobe(&rp->kp,
+ (unsigned long)__builtin_return_address(0))) != 0)
+ free_rp_inst(rp);
+ return ret;
+}
+
+#else /* ARCH_SUPPORTS_KRETPROBES */
+
+int __kprobes register_kretprobe(struct kretprobe *rp)
+{
+ return -ENOSYS;
+}
+
+static int __kprobes pre_handler_kretprobe(struct kprobe *p,
+ struct pt_regs *regs)
+{
+ return 0;
+}
+
+#endif /* ARCH_SUPPORTS_KRETPROBES */
+
+void __kprobes unregister_kretprobe(struct kretprobe *rp)
+{
+ unsigned long flags;
+ struct kretprobe_instance *ri;
+ struct hlist_node *pos, *next;
+
+ unregister_kprobe(&rp->kp);
+
+ /* No race here */
+ spin_lock_irqsave(&kretprobe_lock, flags);
+ hlist_for_each_entry_safe(ri, pos, next, &rp->used_instances, uflist) {
+ ri->rp = NULL;
+ hlist_del(&ri->uflist);
+ }
+ spin_unlock_irqrestore(&kretprobe_lock, flags);
+ free_rp_inst(rp);
+}
+
+static int __init init_kprobes(void)
+{
+ int i, err = 0;
+
+ /* FIXME allocate the probe table, currently defined statically */
+ /* initialize all list heads */
+ for (i = 0; i < KPROBE_TABLE_SIZE; i++) {
+ INIT_HLIST_HEAD(&kprobe_table[i]);
+ INIT_HLIST_HEAD(&kretprobe_inst_table[i]);
+ }
+
+ if (kretprobe_blacklist_size) {
+ /* lookup the function address from its name */
+ for (i = 0; kretprobe_blacklist[i].name != NULL; i++) {
+ kprobe_lookup_name(kretprobe_blacklist[i].name,
+ kretprobe_blacklist[i].addr);
+ if (!kretprobe_blacklist[i].addr)
+ printk("kretprobe: lookup failed: %s\n",
+ kretprobe_blacklist[i].name);
+ }
+ }
+
+ /* By default, kprobes are enabled */
+ kprobe_enabled = true;
+
+ err = arch_init_kprobes();
+ if (!err)
+ err = register_die_notifier(&kprobe_exceptions_nb);
+
+ return err;
+}
+
+#ifdef CONFIG_DEBUG_FS
+static void __kprobes report_probe(struct seq_file *pi, struct kprobe *p,
+ const char *sym, int offset,char *modname)
+{
+ char *kprobe_type;
+
+ if (p->pre_handler == pre_handler_kretprobe)
+ kprobe_type = "r";
+ else if (p->pre_handler == setjmp_pre_handler)
+ kprobe_type = "j";
+ else
+ kprobe_type = "k";
+ if (sym)
+ seq_printf(pi, "%p %s %s+0x%x %s\n", p->addr, kprobe_type,
+ sym, offset, (modname ? modname : " "));
+ else
+ seq_printf(pi, "%p %s %p\n", p->addr, kprobe_type, p->addr);
+}
+
+static void __kprobes *kprobe_seq_start(struct seq_file *f, loff_t *pos)
+{
+ return (*pos < KPROBE_TABLE_SIZE) ? pos : NULL;
+}
+
+static void __kprobes *kprobe_seq_next(struct seq_file *f, void *v, loff_t *pos)
+{
+ (*pos)++;
+ if (*pos >= KPROBE_TABLE_SIZE)
+ return NULL;
+ return pos;
+}
+
+static void __kprobes kprobe_seq_stop(struct seq_file *f, void *v)
+{
+ /* Nothing to do */
+}
+
+static int __kprobes show_kprobe_addr(struct seq_file *pi, void *v)
+{
+ struct hlist_head *head;
+ struct hlist_node *node;
+ struct kprobe *p, *kp;
+ const char *sym = NULL;
+ unsigned int i = *(loff_t *) v;
+ unsigned long offset = 0;
+ char *modname, namebuf[128];
+
+ head = &kprobe_table[i];
+ preempt_disable();
+ hlist_for_each_entry_rcu(p, node, head, hlist) {
+ sym = kallsyms_lookup((unsigned long)p->addr, NULL,
+ &offset, &modname, namebuf);
+ if (p->pre_handler == aggr_pre_handler) {
+ list_for_each_entry_rcu(kp, &p->list, list)
+ report_probe(pi, kp, sym, offset, modname);
+ } else
+ report_probe(pi, p, sym, offset, modname);
+ }
+ preempt_enable();
+ return 0;
+}
+
+static struct seq_operations kprobes_seq_ops = {
+ .start = kprobe_seq_start,
+ .next = kprobe_seq_next,
+ .stop = kprobe_seq_stop,
+ .show = show_kprobe_addr
+};
+
+static int __kprobes kprobes_open(struct inode *inode, struct file *filp)
+{
+ return seq_open(filp, &kprobes_seq_ops);
+}
+
+static struct file_operations debugfs_kprobes_operations = {
+ .open = kprobes_open,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = seq_release,
+};
+
+static void __kprobes enable_all_kprobes(void)
+{
+ struct hlist_head *head;
+ struct hlist_node *node;
+ struct kprobe *p;
+ unsigned int i;
+
+ mutex_lock(&kprobe_mutex);
+
+ /* If kprobes are already enabled, just return */
+ if (kprobe_enabled)
+ goto already_enabled;
+
+ for (i = 0; i < KPROBE_TABLE_SIZE; i++) {
+ head = &kprobe_table[i];
+ hlist_for_each_entry_rcu(p, node, head, hlist)
+ arch_arm_kprobe(p);
+ }
+
+ kprobe_enabled = true;
+ printk(KERN_INFO "Kprobes globally enabled\n");
+
+already_enabled:
+ mutex_unlock(&kprobe_mutex);
+ return;
+}
+
+static void __kprobes disable_all_kprobes(void)
+{
+ struct hlist_head *head;
+ struct hlist_node *node;
+ struct kprobe *p;
+ unsigned int i;
+
+ mutex_lock(&kprobe_mutex);
+
+ /* If kprobes are already disabled, just return */
+ if (!kprobe_enabled)
+ goto already_disabled;
+
+ kprobe_enabled = false;
+ printk(KERN_INFO "Kprobes globally disabled\n");
+ for (i = 0; i < KPROBE_TABLE_SIZE; i++) {
+ head = &kprobe_table[i];
+ hlist_for_each_entry_rcu(p, node, head, hlist) {
+ if (!arch_trampoline_kprobe(p))
+ arch_disarm_kprobe(p);
+ }
+ }
+
+ mutex_unlock(&kprobe_mutex);
+ /* Allow all currently running kprobes to complete */
+ synchronize_sched();
+ return;
+
+already_disabled:
+ mutex_unlock(&kprobe_mutex);
+ return;
+}
+
+/*
+ * XXX: The debugfs bool file interface doesn't allow for callbacks
+ * when the bool state is switched. We can reuse that facility when
+ * available
+ */
+static ssize_t read_enabled_file_bool(struct file *file,
+ char __user *user_buf, size_t count, loff_t *ppos)
+{
+ char buf[3];
+
+ if (kprobe_enabled)
+ buf[0] = '1';
+ else
+ buf[0] = '0';
+ buf[1] = '\n';
+ buf[2] = 0x00;
+ return simple_read_from_buffer(user_buf, count, ppos, buf, 2);
+}
+
+static ssize_t write_enabled_file_bool(struct file *file,
+ const char __user *user_buf, size_t count, loff_t *ppos)
+{
+ char buf[32];
+ int buf_size;
+
+ buf_size = min(count, (sizeof(buf)-1));
+ if (copy_from_user(buf, user_buf, buf_size))
+ return -EFAULT;
+
+ switch (buf[0]) {
+ case 'y':
+ case 'Y':
+ case '1':
+ enable_all_kprobes();
+ break;
+ case 'n':
+ case 'N':
+ case '0':
+ disable_all_kprobes();
+ break;
+ }
+
+ return count;
+}
+
+static struct file_operations fops_kp = {
+ .read = read_enabled_file_bool,
+ .write = write_enabled_file_bool,
+};
+
+static int __kprobes debugfs_kprobe_init(void)
+{
+ struct dentry *dir, *file;
+ unsigned int value = 1;
+
+ dir = debugfs_create_dir("kprobes", NULL);
+ if (!dir)
+ return -ENOMEM;
+
+ file = debugfs_create_file("list", 0444, dir, NULL,
+ &debugfs_kprobes_operations);
+ if (!file) {
+ debugfs_remove(dir);
+ return -ENOMEM;
+ }
+
+ file = debugfs_create_file("enabled", 0600, dir,
+ &value, &fops_kp);
+ if (!file) {
+ debugfs_remove(dir);
+ return -ENOMEM;
+ }
+
+ return 0;
+}
+
+late_initcall(debugfs_kprobe_init);
+#endif /* CONFIG_DEBUG_FS */
+
+module_init(init_kprobes);
+
+EXPORT_SYMBOL_GPL(register_kprobe);
+EXPORT_SYMBOL_GPL(unregister_kprobe);
+EXPORT_SYMBOL_GPL(register_jprobe);
+EXPORT_SYMBOL_GPL(unregister_jprobe);
+#ifdef CONFIG_KPROBES
+EXPORT_SYMBOL_GPL(jprobe_return);
+#endif
+
+#ifdef CONFIG_KPROBES
+EXPORT_SYMBOL_GPL(register_kretprobe);
+EXPORT_SYMBOL_GPL(unregister_kretprobe);
+#endif
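For completeness, the kretprobe side of the API moved here can be used along
these lines; a hypothetical sketch (probed symbol and handler are not part of
the patch), registered and unregistered from module init/exit as above:

	#include <linux/module.h>
	#include <linux/kprobes.h>

	static int my_ret_handler(struct kretprobe_instance *ri,
				  struct pt_regs *regs)
	{
		/* ri->rp points back to the kretprobe we registered */
		printk(KERN_INFO "%s returned\n", ri->rp->kp.symbol_name);
		return 0;
	}

	static struct kretprobe my_rp = {
		.handler	= my_ret_handler,
		.kp.symbol_name	= "do_fork",
		/* upper bound on concurrently active instances; <= 0 lets
		 * register_kretprobe() pick an NR_CPUS-based default */
		.maxactive	= 20,
	};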
Index: linux-2.6-lttng.stable/kernel/kprobes.c
===================================================================
--- linux-2.6-lttng.stable.orig/kernel/kprobes.c 2007-10-29 09:51:06.000000000 -0400
+++ /dev/null 1970-01-01 00:00:00.000000000 +0000
@@ -1,1063 +0,0 @@
-/*
- * Kernel Probes (KProbes)
- * kernel/kprobes.c
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, write to the Free Software
- * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
- *
- * Copyright (C) IBM Corporation, 2002, 2004
- *
- * 2002-Oct Created by Vamsi Krishna S <vamsi_krishna@in.ibm.com> Kernel
- * Probes initial implementation (includes suggestions from
- * Rusty Russell).
- * 2004-Aug Updated by Prasanna S Panchamukhi <prasanna@in.ibm.com> with
- * hlists and exceptions notifier as suggested by Andi Kleen.
- * 2004-July Suparna Bhattacharya <suparna@in.ibm.com> added jumper probes
- * interface to access function arguments.
- * 2004-Sep Prasanna S Panchamukhi <prasanna@in.ibm.com> Changed Kprobes
- * exceptions notifier to be first on the priority list.
- * 2005-May Hien Nguyen <hien@us.ibm.com>, Jim Keniston
- * <jkenisto@us.ibm.com> and Prasanna S Panchamukhi
- * <prasanna@in.ibm.com> added function-return probes.
- */
-#include <linux/kprobes.h>
-#include <linux/hash.h>
-#include <linux/init.h>
-#include <linux/slab.h>
-#include <linux/stddef.h>
-#include <linux/module.h>
-#include <linux/moduleloader.h>
-#include <linux/kallsyms.h>
-#include <linux/freezer.h>
-#include <linux/seq_file.h>
-#include <linux/debugfs.h>
-#include <linux/kdebug.h>
-
-#include <asm-generic/sections.h>
-#include <asm/cacheflush.h>
-#include <asm/errno.h>
-#include <asm/uaccess.h>
-
-#define KPROBE_HASH_BITS 6
-#define KPROBE_TABLE_SIZE (1 << KPROBE_HASH_BITS)
-
-
-/*
- * Some oddball architectures like 64bit powerpc have function descriptors
- * so this must be overridable.
- */
-#ifndef kprobe_lookup_name
-#define kprobe_lookup_name(name, addr) \
- addr = ((kprobe_opcode_t *)(kallsyms_lookup_name(name)))
-#endif
-
-static struct hlist_head kprobe_table[KPROBE_TABLE_SIZE];
-static struct hlist_head kretprobe_inst_table[KPROBE_TABLE_SIZE];
-
-/* NOTE: change this value only with kprobe_mutex held */
-static bool kprobe_enabled;
-
-DEFINE_MUTEX(kprobe_mutex); /* Protects kprobe_table */
-DEFINE_SPINLOCK(kretprobe_lock); /* Protects kretprobe_inst_table */
-static DEFINE_PER_CPU(struct kprobe *, kprobe_instance) = NULL;
-
-#ifdef __ARCH_WANT_KPROBES_INSN_SLOT
-/*
- * kprobe->ainsn.insn points to the copy of the instruction to be
- * single-stepped. x86_64, POWER4 and above have no-exec support and
- * stepping on the instruction on a vmalloced/kmalloced/data page
- * is a recipe for disaster
- */
-#define INSNS_PER_PAGE (PAGE_SIZE/(MAX_INSN_SIZE * sizeof(kprobe_opcode_t)))
-
-struct kprobe_insn_page {
- struct hlist_node hlist;
- kprobe_opcode_t *insns; /* Page of instruction slots */
- char slot_used[INSNS_PER_PAGE];
- int nused;
- int ngarbage;
-};
-
-enum kprobe_slot_state {
- SLOT_CLEAN = 0,
- SLOT_DIRTY = 1,
- SLOT_USED = 2,
-};
-
-static struct hlist_head kprobe_insn_pages;
-static int kprobe_garbage_slots;
-static int collect_garbage_slots(void);
-
-static int __kprobes check_safety(void)
-{
- int ret = 0;
-#if defined(CONFIG_PREEMPT) && defined(CONFIG_PM)
- ret = freeze_processes();
- if (ret == 0) {
- struct task_struct *p, *q;
- do_each_thread(p, q) {
- if (p != current && p->state == TASK_RUNNING &&
- p->pid != 0) {
- printk("Check failed: %s is running\n",p->comm);
- ret = -1;
- goto loop_end;
- }
- } while_each_thread(p, q);
- }
-loop_end:
- thaw_processes();
-#else
- synchronize_sched();
-#endif
- return ret;
-}
-
-/**
- * get_insn_slot() - Find a slot on an executable page for an instruction.
- * We allocate an executable page if there's no room on existing ones.
- */
-kprobe_opcode_t __kprobes *get_insn_slot(void)
-{
- struct kprobe_insn_page *kip;
- struct hlist_node *pos;
-
- retry:
- hlist_for_each_entry(kip, pos, &kprobe_insn_pages, hlist) {
- if (kip->nused < INSNS_PER_PAGE) {
- int i;
- for (i = 0; i < INSNS_PER_PAGE; i++) {
- if (kip->slot_used[i] == SLOT_CLEAN) {
- kip->slot_used[i] = SLOT_USED;
- kip->nused++;
- return kip->insns + (i * MAX_INSN_SIZE);
- }
- }
- /* Surprise! No unused slots. Fix kip->nused. */
- kip->nused = INSNS_PER_PAGE;
- }
- }
-
- /* If there are any garbage slots, collect it and try again. */
- if (kprobe_garbage_slots && collect_garbage_slots() == 0) {
- goto retry;
- }
- /* All out of space. Need to allocate a new page. Use slot 0. */
- kip = kmalloc(sizeof(struct kprobe_insn_page), GFP_KERNEL);
- if (!kip)
- return NULL;
-
- /*
- * Use module_alloc so this page is within +/- 2GB of where the
- * kernel image and loaded module images reside. This is required
- * so x86_64 can correctly handle the %rip-relative fixups.
- */
- kip->insns = module_alloc(PAGE_SIZE);
- if (!kip->insns) {
- kfree(kip);
- return NULL;
- }
- INIT_HLIST_NODE(&kip->hlist);
- hlist_add_head(&kip->hlist, &kprobe_insn_pages);
- memset(kip->slot_used, SLOT_CLEAN, INSNS_PER_PAGE);
- kip->slot_used[0] = SLOT_USED;
- kip->nused = 1;
- kip->ngarbage = 0;
- return kip->insns;
-}
-
-/* Return 1 if all garbages are collected, otherwise 0. */
-static int __kprobes collect_one_slot(struct kprobe_insn_page *kip, int idx)
-{
- kip->slot_used[idx] = SLOT_CLEAN;
- kip->nused--;
- if (kip->nused == 0) {
- /*
- * Page is no longer in use. Free it unless
- * it's the last one. We keep the last one
- * so as not to have to set it up again the
- * next time somebody inserts a probe.
- */
- hlist_del(&kip->hlist);
- if (hlist_empty(&kprobe_insn_pages)) {
- INIT_HLIST_NODE(&kip->hlist);
- hlist_add_head(&kip->hlist,
- &kprobe_insn_pages);
- } else {
- module_free(NULL, kip->insns);
- kfree(kip);
- }
- return 1;
- }
- return 0;
-}
-
-static int __kprobes collect_garbage_slots(void)
-{
- struct kprobe_insn_page *kip;
- struct hlist_node *pos, *next;
-
- /* Ensure no-one is preepmted on the garbages */
- if (check_safety() != 0)
- return -EAGAIN;
-
- hlist_for_each_entry_safe(kip, pos, next, &kprobe_insn_pages, hlist) {
- int i;
- if (kip->ngarbage == 0)
- continue;
- kip->ngarbage = 0; /* we will collect all garbages */
- for (i = 0; i < INSNS_PER_PAGE; i++) {
- if (kip->slot_used[i] == SLOT_DIRTY &&
- collect_one_slot(kip, i))
- break;
- }
- }
- kprobe_garbage_slots = 0;
- return 0;
-}
-
-void __kprobes free_insn_slot(kprobe_opcode_t * slot, int dirty)
-{
- struct kprobe_insn_page *kip;
- struct hlist_node *pos;
-
- hlist_for_each_entry(kip, pos, &kprobe_insn_pages, hlist) {
- if (kip->insns <= slot &&
- slot < kip->insns + (INSNS_PER_PAGE * MAX_INSN_SIZE)) {
- int i = (slot - kip->insns) / MAX_INSN_SIZE;
- if (dirty) {
- kip->slot_used[i] = SLOT_DIRTY;
- kip->ngarbage++;
- } else {
- collect_one_slot(kip, i);
- }
- break;
- }
- }
-
- if (dirty && ++kprobe_garbage_slots > INSNS_PER_PAGE)
- collect_garbage_slots();
-}
-#endif
-
-/* We have preemption disabled.. so it is safe to use __ versions */
-static inline void set_kprobe_instance(struct kprobe *kp)
-{
- __get_cpu_var(kprobe_instance) = kp;
-}
-
-static inline void reset_kprobe_instance(void)
-{
- __get_cpu_var(kprobe_instance) = NULL;
-}
-
-/*
- * This routine is called either:
- * - under the kprobe_mutex - during kprobe_[un]register()
- * OR
- * - with preemption disabled - from arch/xxx/kernel/kprobes.c
- */
-struct kprobe __kprobes *get_kprobe(void *addr)
-{
- struct hlist_head *head;
- struct hlist_node *node;
- struct kprobe *p;
-
- head = &kprobe_table[hash_ptr(addr, KPROBE_HASH_BITS)];
- hlist_for_each_entry_rcu(p, node, head, hlist) {
- if (p->addr == addr)
- return p;
- }
- return NULL;
-}
-
-/*
- * Aggregate handlers for multiple kprobes support - these handlers
- * take care of invoking the individual kprobe handlers on p->list
- */
-static int __kprobes aggr_pre_handler(struct kprobe *p, struct pt_regs *regs)
-{
- struct kprobe *kp;
-
- list_for_each_entry_rcu(kp, &p->list, list) {
- if (kp->pre_handler) {
- set_kprobe_instance(kp);
- if (kp->pre_handler(kp, regs))
- return 1;
- }
- reset_kprobe_instance();
- }
- return 0;
-}
-
-static void __kprobes aggr_post_handler(struct kprobe *p, struct pt_regs *regs,
- unsigned long flags)
-{
- struct kprobe *kp;
-
- list_for_each_entry_rcu(kp, &p->list, list) {
- if (kp->post_handler) {
- set_kprobe_instance(kp);
- kp->post_handler(kp, regs, flags);
- reset_kprobe_instance();
- }
- }
-}
-
-static int __kprobes aggr_fault_handler(struct kprobe *p, struct pt_regs *regs,
- int trapnr)
-{
- struct kprobe *cur = __get_cpu_var(kprobe_instance);
-
- /*
-	 * If we faulted "during" the execution of a user-specified
- * probe handler, invoke just that probe's fault handler
- */
- if (cur && cur->fault_handler) {
- if (cur->fault_handler(cur, regs, trapnr))
- return 1;
- }
- return 0;
-}
-
-static int __kprobes aggr_break_handler(struct kprobe *p, struct pt_regs *regs)
-{
- struct kprobe *cur = __get_cpu_var(kprobe_instance);
- int ret = 0;
-
- if (cur && cur->break_handler) {
- if (cur->break_handler(cur, regs))
- ret = 1;
- }
- reset_kprobe_instance();
- return ret;
-}
-
-/* Walks the list and increments nmissed count for multiprobe case */
-void __kprobes kprobes_inc_nmissed_count(struct kprobe *p)
-{
- struct kprobe *kp;
- if (p->pre_handler != aggr_pre_handler) {
- p->nmissed++;
- } else {
- list_for_each_entry_rcu(kp, &p->list, list)
- kp->nmissed++;
- }
- return;
-}
-
-/* Called with kretprobe_lock held */
-void __kprobes recycle_rp_inst(struct kretprobe_instance *ri,
- struct hlist_head *head)
-{
- /* remove rp inst off the rprobe_inst_table */
- hlist_del(&ri->hlist);
- if (ri->rp) {
- /* remove rp inst off the used list */
- hlist_del(&ri->uflist);
- /* put rp inst back onto the free list */
- INIT_HLIST_NODE(&ri->uflist);
- hlist_add_head(&ri->uflist, &ri->rp->free_instances);
- } else
- /* Unregistering */
- hlist_add_head(&ri->hlist, head);
-}
-
-struct hlist_head __kprobes *kretprobe_inst_table_head(struct task_struct *tsk)
-{
- return &kretprobe_inst_table[hash_ptr(tsk, KPROBE_HASH_BITS)];
-}
-
-/*
- * This function is called from finish_task_switch when task tk becomes dead,
- * so that we can recycle any function-return probe instances associated
- * with this task. These left over instances represent probed functions
- * that have been called but will never return.
- */
-void __kprobes kprobe_flush_task(struct task_struct *tk)
-{
- struct kretprobe_instance *ri;
- struct hlist_head *head, empty_rp;
- struct hlist_node *node, *tmp;
- unsigned long flags = 0;
-
- INIT_HLIST_HEAD(&empty_rp);
- spin_lock_irqsave(&kretprobe_lock, flags);
- head = kretprobe_inst_table_head(tk);
- hlist_for_each_entry_safe(ri, node, tmp, head, hlist) {
- if (ri->task == tk)
- recycle_rp_inst(ri, &empty_rp);
- }
- spin_unlock_irqrestore(&kretprobe_lock, flags);
-
- hlist_for_each_entry_safe(ri, node, tmp, &empty_rp, hlist) {
- hlist_del(&ri->hlist);
- kfree(ri);
- }
-}
-
-static inline void free_rp_inst(struct kretprobe *rp)
-{
- struct kretprobe_instance *ri;
- struct hlist_node *pos, *next;
-
- hlist_for_each_entry_safe(ri, pos, next, &rp->free_instances, uflist) {
- hlist_del(&ri->uflist);
- kfree(ri);
- }
-}
-
-/*
- * Keep all fields in the kprobe consistent
- */
-static inline void copy_kprobe(struct kprobe *old_p, struct kprobe *p)
-{
- memcpy(&p->opcode, &old_p->opcode, sizeof(kprobe_opcode_t));
- memcpy(&p->ainsn, &old_p->ainsn, sizeof(struct arch_specific_insn));
-}
-
-/*
- * Add the new probe to old_p->list. Fail if this is the
- * second jprobe at the address - two jprobes can't coexist
- */
-static int __kprobes add_new_kprobe(struct kprobe *old_p, struct kprobe *p)
-{
- if (p->break_handler) {
- if (old_p->break_handler)
- return -EEXIST;
- list_add_tail_rcu(&p->list, &old_p->list);
- old_p->break_handler = aggr_break_handler;
- } else
- list_add_rcu(&p->list, &old_p->list);
- if (p->post_handler && !old_p->post_handler)
- old_p->post_handler = aggr_post_handler;
- return 0;
-}
-
-/*
- * Fill in the required fields of the "manager kprobe". Replace the
- * earlier kprobe in the hlist with the manager kprobe
- */
-static inline void add_aggr_kprobe(struct kprobe *ap, struct kprobe *p)
-{
- copy_kprobe(p, ap);
- flush_insn_slot(ap);
- ap->addr = p->addr;
- ap->pre_handler = aggr_pre_handler;
- ap->fault_handler = aggr_fault_handler;
- if (p->post_handler)
- ap->post_handler = aggr_post_handler;
- if (p->break_handler)
- ap->break_handler = aggr_break_handler;
-
- INIT_LIST_HEAD(&ap->list);
- list_add_rcu(&p->list, &ap->list);
-
- hlist_replace_rcu(&p->hlist, &ap->hlist);
-}
-
-/*
- * This is the second or subsequent kprobe at the address - handle
- * the intricacies
- */
-static int __kprobes register_aggr_kprobe(struct kprobe *old_p,
- struct kprobe *p)
-{
- int ret = 0;
- struct kprobe *ap;
-
- if (old_p->pre_handler == aggr_pre_handler) {
- copy_kprobe(old_p, p);
- ret = add_new_kprobe(old_p, p);
- } else {
- ap = kzalloc(sizeof(struct kprobe), GFP_KERNEL);
- if (!ap)
- return -ENOMEM;
- add_aggr_kprobe(ap, old_p);
- copy_kprobe(ap, p);
- ret = add_new_kprobe(ap, p);
- }
- return ret;
-}
-
-static int __kprobes in_kprobes_functions(unsigned long addr)
-{
- if (addr >= (unsigned long)__kprobes_text_start &&
- addr < (unsigned long)__kprobes_text_end)
- return -EINVAL;
- return 0;
-}
-
-static int __kprobes __register_kprobe(struct kprobe *p,
- unsigned long called_from)
-{
- int ret = 0;
- struct kprobe *old_p;
- struct module *probed_mod;
-
- /*
- * If we have a symbol_name argument look it up,
- * and add it to the address. That way the addr
- * field can either be global or relative to a symbol.
- */
- if (p->symbol_name) {
- if (p->addr)
- return -EINVAL;
- kprobe_lookup_name(p->symbol_name, p->addr);
- }
-
- if (!p->addr)
- return -EINVAL;
- p->addr = (kprobe_opcode_t *)(((char *)p->addr)+ p->offset);
-
- if (!kernel_text_address((unsigned long) p->addr) ||
- in_kprobes_functions((unsigned long) p->addr))
- return -EINVAL;
-
- p->mod_refcounted = 0;
-
- /*
-	 * Check whether we are probing a module.
- */
- probed_mod = module_text_address((unsigned long) p->addr);
- if (probed_mod) {
- struct module *calling_mod = module_text_address(called_from);
- /*
-		 * We must allow modules to probe themselves and in this case
-		 * avoid incrementing the module refcount, so as to allow
-		 * unloading of self-probing modules.
- */
- if (calling_mod && calling_mod != probed_mod) {
- if (unlikely(!try_module_get(probed_mod)))
- return -EINVAL;
- p->mod_refcounted = 1;
- } else
- probed_mod = NULL;
- }
-
- p->nmissed = 0;
- mutex_lock(&kprobe_mutex);
- old_p = get_kprobe(p->addr);
- if (old_p) {
- ret = register_aggr_kprobe(old_p, p);
- goto out;
- }
-
- ret = arch_prepare_kprobe(p);
- if (ret)
- goto out;
-
- INIT_HLIST_NODE(&p->hlist);
- hlist_add_head_rcu(&p->hlist,
- &kprobe_table[hash_ptr(p->addr, KPROBE_HASH_BITS)]);
-
- if (kprobe_enabled)
- arch_arm_kprobe(p);
-
-out:
- mutex_unlock(&kprobe_mutex);
-
- if (ret && probed_mod)
- module_put(probed_mod);
- return ret;
-}
-
-int __kprobes register_kprobe(struct kprobe *p)
-{
- return __register_kprobe(p, (unsigned long)__builtin_return_address(0));
-}
-
-void __kprobes unregister_kprobe(struct kprobe *p)
-{
- struct module *mod;
- struct kprobe *old_p, *list_p;
- int cleanup_p;
-
- mutex_lock(&kprobe_mutex);
- old_p = get_kprobe(p->addr);
- if (unlikely(!old_p)) {
- mutex_unlock(&kprobe_mutex);
- return;
- }
- if (p != old_p) {
- list_for_each_entry_rcu(list_p, &old_p->list, list)
- if (list_p == p)
- /* kprobe p is a valid probe */
- goto valid_p;
- mutex_unlock(&kprobe_mutex);
- return;
- }
-valid_p:
- if (old_p == p ||
- (old_p->pre_handler == aggr_pre_handler &&
- p->list.next == &old_p->list && p->list.prev == &old_p->list)) {
- /*
-		 * This is the only probe on the hash list. Disarm it only
-		 * if kprobes are enabled - otherwise, the breakpoint would
-		 * already have been removed and we save on flushing icache.
- */
- if (kprobe_enabled)
- arch_disarm_kprobe(p);
- hlist_del_rcu(&old_p->hlist);
- cleanup_p = 1;
- } else {
- list_del_rcu(&p->list);
- cleanup_p = 0;
- }
-
- mutex_unlock(&kprobe_mutex);
-
- synchronize_sched();
- if (p->mod_refcounted) {
- mod = module_text_address((unsigned long)p->addr);
- if (mod)
- module_put(mod);
- }
-
- if (cleanup_p) {
- if (p != old_p) {
- list_del_rcu(&p->list);
- kfree(old_p);
- }
- arch_remove_kprobe(p);
- } else {
- mutex_lock(&kprobe_mutex);
- if (p->break_handler)
- old_p->break_handler = NULL;
-		if (p->post_handler) {
-			list_for_each_entry_rcu(list_p, &old_p->list, list) {
-				if (list_p->post_handler) {
- cleanup_p = 2;
- break;
- }
- }
- if (cleanup_p == 0)
- old_p->post_handler = NULL;
- }
- mutex_unlock(&kprobe_mutex);
- }
-}
-
-static struct notifier_block kprobe_exceptions_nb = {
- .notifier_call = kprobe_exceptions_notify,
- .priority = 0x7fffffff /* we need to be notified first */
-};
-
-unsigned long __weak arch_deref_entry_point(void *entry)
-{
- return (unsigned long)entry;
-}
-
-int __kprobes register_jprobe(struct jprobe *jp)
-{
- unsigned long addr = arch_deref_entry_point(jp->entry);
-
- if (!kernel_text_address(addr))
- return -EINVAL;
-
-	/* TODO: verify that the probe point is a function entry point */
- jp->kp.pre_handler = setjmp_pre_handler;
- jp->kp.break_handler = longjmp_break_handler;
-
- return __register_kprobe(&jp->kp,
- (unsigned long)__builtin_return_address(0));
-}
-
-void __kprobes unregister_jprobe(struct jprobe *jp)
-{
- unregister_kprobe(&jp->kp);
-}
-
-#ifdef ARCH_SUPPORTS_KRETPROBES
-
-/*
- * This kprobe pre_handler is registered with every kretprobe. When probe
- * hits it will set up the return probe.
- */
-static int __kprobes pre_handler_kretprobe(struct kprobe *p,
- struct pt_regs *regs)
-{
- struct kretprobe *rp = container_of(p, struct kretprobe, kp);
- unsigned long flags = 0;
-
-	/* TODO: consider swapping the RA only after the last pre_handler has fired */
- spin_lock_irqsave(&kretprobe_lock, flags);
- if (!hlist_empty(&rp->free_instances)) {
- struct kretprobe_instance *ri;
-
- ri = hlist_entry(rp->free_instances.first,
- struct kretprobe_instance, uflist);
- ri->rp = rp;
- ri->task = current;
- arch_prepare_kretprobe(ri, regs);
-
- /* XXX(hch): why is there no hlist_move_head? */
- hlist_del(&ri->uflist);
- hlist_add_head(&ri->uflist, &ri->rp->used_instances);
- hlist_add_head(&ri->hlist, kretprobe_inst_table_head(ri->task));
- } else
- rp->nmissed++;
- spin_unlock_irqrestore(&kretprobe_lock, flags);
- return 0;
-}
-
-int __kprobes register_kretprobe(struct kretprobe *rp)
-{
- int ret = 0;
- struct kretprobe_instance *inst;
- int i;
- void *addr = rp->kp.addr;
-
- if (kretprobe_blacklist_size) {
- if (addr == NULL)
- kprobe_lookup_name(rp->kp.symbol_name, addr);
- addr += rp->kp.offset;
-
- for (i = 0; kretprobe_blacklist[i].name != NULL; i++) {
- if (kretprobe_blacklist[i].addr == addr)
- return -EINVAL;
- }
- }
-
- rp->kp.pre_handler = pre_handler_kretprobe;
- rp->kp.post_handler = NULL;
- rp->kp.fault_handler = NULL;
- rp->kp.break_handler = NULL;
-
- /* Pre-allocate memory for max kretprobe instances */
- if (rp->maxactive <= 0) {
-#ifdef CONFIG_PREEMPT
- rp->maxactive = max(10, 2 * NR_CPUS);
-#else
- rp->maxactive = NR_CPUS;
-#endif
- }
- INIT_HLIST_HEAD(&rp->used_instances);
- INIT_HLIST_HEAD(&rp->free_instances);
- for (i = 0; i < rp->maxactive; i++) {
- inst = kmalloc(sizeof(struct kretprobe_instance), GFP_KERNEL);
- if (inst == NULL) {
- free_rp_inst(rp);
- return -ENOMEM;
- }
- INIT_HLIST_NODE(&inst->uflist);
- hlist_add_head(&inst->uflist, &rp->free_instances);
- }
-
- rp->nmissed = 0;
- /* Establish function entry probe point */
- if ((ret = __register_kprobe(&rp->kp,
- (unsigned long)__builtin_return_address(0))) != 0)
- free_rp_inst(rp);
- return ret;
-}
-
-#else /* ARCH_SUPPORTS_KRETPROBES */
-
-int __kprobes register_kretprobe(struct kretprobe *rp)
-{
- return -ENOSYS;
-}
-
-static int __kprobes pre_handler_kretprobe(struct kprobe *p,
- struct pt_regs *regs)
-{
- return 0;
-}
-
-#endif /* ARCH_SUPPORTS_KRETPROBES */
-
-void __kprobes unregister_kretprobe(struct kretprobe *rp)
-{
- unsigned long flags;
- struct kretprobe_instance *ri;
- struct hlist_node *pos, *next;
-
- unregister_kprobe(&rp->kp);
-
- /* No race here */
- spin_lock_irqsave(&kretprobe_lock, flags);
- hlist_for_each_entry_safe(ri, pos, next, &rp->used_instances, uflist) {
- ri->rp = NULL;
- hlist_del(&ri->uflist);
- }
- spin_unlock_irqrestore(&kretprobe_lock, flags);
- free_rp_inst(rp);
-}
-
-static int __init init_kprobes(void)
-{
- int i, err = 0;
-
- /* FIXME allocate the probe table, currently defined statically */
- /* initialize all list heads */
- for (i = 0; i < KPROBE_TABLE_SIZE; i++) {
- INIT_HLIST_HEAD(&kprobe_table[i]);
- INIT_HLIST_HEAD(&kretprobe_inst_table[i]);
- }
-
- if (kretprobe_blacklist_size) {
- /* lookup the function address from its name */
- for (i = 0; kretprobe_blacklist[i].name != NULL; i++) {
- kprobe_lookup_name(kretprobe_blacklist[i].name,
- kretprobe_blacklist[i].addr);
- if (!kretprobe_blacklist[i].addr)
- printk("kretprobe: lookup failed: %s\n",
- kretprobe_blacklist[i].name);
- }
- }
-
- /* By default, kprobes are enabled */
- kprobe_enabled = true;
-
- err = arch_init_kprobes();
- if (!err)
- err = register_die_notifier(&kprobe_exceptions_nb);
-
- return err;
-}
-
-#ifdef CONFIG_DEBUG_FS
-static void __kprobes report_probe(struct seq_file *pi, struct kprobe *p,
- const char *sym, int offset,char *modname)
-{
- char *kprobe_type;
-
- if (p->pre_handler == pre_handler_kretprobe)
- kprobe_type = "r";
- else if (p->pre_handler == setjmp_pre_handler)
- kprobe_type = "j";
- else
- kprobe_type = "k";
- if (sym)
- seq_printf(pi, "%p %s %s+0x%x %s\n", p->addr, kprobe_type,
- sym, offset, (modname ? modname : " "));
- else
- seq_printf(pi, "%p %s %p\n", p->addr, kprobe_type, p->addr);
-}
-
-static void __kprobes *kprobe_seq_start(struct seq_file *f, loff_t *pos)
-{
- return (*pos < KPROBE_TABLE_SIZE) ? pos : NULL;
-}
-
-static void __kprobes *kprobe_seq_next(struct seq_file *f, void *v, loff_t *pos)
-{
- (*pos)++;
- if (*pos >= KPROBE_TABLE_SIZE)
- return NULL;
- return pos;
-}
-
-static void __kprobes kprobe_seq_stop(struct seq_file *f, void *v)
-{
- /* Nothing to do */
-}
-
-static int __kprobes show_kprobe_addr(struct seq_file *pi, void *v)
-{
- struct hlist_head *head;
- struct hlist_node *node;
- struct kprobe *p, *kp;
- const char *sym = NULL;
- unsigned int i = *(loff_t *) v;
- unsigned long offset = 0;
- char *modname, namebuf[128];
-
- head = &kprobe_table[i];
- preempt_disable();
- hlist_for_each_entry_rcu(p, node, head, hlist) {
- sym = kallsyms_lookup((unsigned long)p->addr, NULL,
- &offset, &modname, namebuf);
- if (p->pre_handler == aggr_pre_handler) {
- list_for_each_entry_rcu(kp, &p->list, list)
- report_probe(pi, kp, sym, offset, modname);
- } else
- report_probe(pi, p, sym, offset, modname);
- }
- preempt_enable();
- return 0;
-}
-
-static struct seq_operations kprobes_seq_ops = {
- .start = kprobe_seq_start,
- .next = kprobe_seq_next,
- .stop = kprobe_seq_stop,
- .show = show_kprobe_addr
-};
-
-static int __kprobes kprobes_open(struct inode *inode, struct file *filp)
-{
- return seq_open(filp, &kprobes_seq_ops);
-}
-
-static struct file_operations debugfs_kprobes_operations = {
- .open = kprobes_open,
- .read = seq_read,
- .llseek = seq_lseek,
- .release = seq_release,
-};
-
-static void __kprobes enable_all_kprobes(void)
-{
- struct hlist_head *head;
- struct hlist_node *node;
- struct kprobe *p;
- unsigned int i;
-
- mutex_lock(&kprobe_mutex);
-
- /* If kprobes are already enabled, just return */
- if (kprobe_enabled)
- goto already_enabled;
-
- for (i = 0; i < KPROBE_TABLE_SIZE; i++) {
- head = &kprobe_table[i];
- hlist_for_each_entry_rcu(p, node, head, hlist)
- arch_arm_kprobe(p);
- }
-
- kprobe_enabled = true;
- printk(KERN_INFO "Kprobes globally enabled\n");
-
-already_enabled:
- mutex_unlock(&kprobe_mutex);
- return;
-}
-
-static void __kprobes disable_all_kprobes(void)
-{
- struct hlist_head *head;
- struct hlist_node *node;
- struct kprobe *p;
- unsigned int i;
-
- mutex_lock(&kprobe_mutex);
-
- /* If kprobes are already disabled, just return */
- if (!kprobe_enabled)
- goto already_disabled;
-
- kprobe_enabled = false;
- printk(KERN_INFO "Kprobes globally disabled\n");
- for (i = 0; i < KPROBE_TABLE_SIZE; i++) {
- head = &kprobe_table[i];
- hlist_for_each_entry_rcu(p, node, head, hlist) {
- if (!arch_trampoline_kprobe(p))
- arch_disarm_kprobe(p);
- }
- }
-
- mutex_unlock(&kprobe_mutex);
- /* Allow all currently running kprobes to complete */
- synchronize_sched();
- return;
-
-already_disabled:
- mutex_unlock(&kprobe_mutex);
- return;
-}
-
-/*
- * XXX: The debugfs bool file interface doesn't allow for callbacks
- * when the bool state is switched. We can reuse that facility when
- * available
- */
-static ssize_t read_enabled_file_bool(struct file *file,
- char __user *user_buf, size_t count, loff_t *ppos)
-{
- char buf[3];
-
- if (kprobe_enabled)
- buf[0] = '1';
- else
- buf[0] = '0';
- buf[1] = '\n';
- buf[2] = 0x00;
- return simple_read_from_buffer(user_buf, count, ppos, buf, 2);
-}
-
-static ssize_t write_enabled_file_bool(struct file *file,
- const char __user *user_buf, size_t count, loff_t *ppos)
-{
- char buf[32];
- int buf_size;
-
- buf_size = min(count, (sizeof(buf)-1));
- if (copy_from_user(buf, user_buf, buf_size))
- return -EFAULT;
-
- switch (buf[0]) {
- case 'y':
- case 'Y':
- case '1':
- enable_all_kprobes();
- break;
- case 'n':
- case 'N':
- case '0':
- disable_all_kprobes();
- break;
- }
-
- return count;
-}
-
-static struct file_operations fops_kp = {
- .read = read_enabled_file_bool,
- .write = write_enabled_file_bool,
-};
-
-static int __kprobes debugfs_kprobe_init(void)
-{
- struct dentry *dir, *file;
- unsigned int value = 1;
-
- dir = debugfs_create_dir("kprobes", NULL);
- if (!dir)
- return -ENOMEM;
-
- file = debugfs_create_file("list", 0444, dir, NULL,
- &debugfs_kprobes_operations);
- if (!file) {
- debugfs_remove(dir);
- return -ENOMEM;
- }
-
- file = debugfs_create_file("enabled", 0600, dir,
- &value, &fops_kp);
- if (!file) {
- debugfs_remove(dir);
- return -ENOMEM;
- }
-
- return 0;
-}
-
-late_initcall(debugfs_kprobe_init);
-#endif /* CONFIG_DEBUG_FS */
-
-module_init(init_kprobes);
-
-EXPORT_SYMBOL_GPL(register_kprobe);
-EXPORT_SYMBOL_GPL(unregister_kprobe);
-EXPORT_SYMBOL_GPL(register_jprobe);
-EXPORT_SYMBOL_GPL(unregister_jprobe);
-#ifdef CONFIG_KPROBES
-EXPORT_SYMBOL_GPL(jprobe_return);
-#endif
-
-#ifdef CONFIG_KPROBES
-EXPORT_SYMBOL_GPL(register_kretprobe);
-EXPORT_SYMBOL_GPL(unregister_kretprobe);
-#endif
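
[Not part of the patch, for orientation only: the file moved above
implements the generic kprobes API (register_kprobe() and friends). A
consumer module looks roughly like the minimal sketch below; the probed
symbol "do_fork" and all module names are purely illustrative.]

	#include <linux/module.h>
	#include <linux/kernel.h>
	#include <linux/kprobes.h>

	/* Runs just before the probed instruction is single-stepped. */
	static int handler_pre(struct kprobe *p, struct pt_regs *regs)
	{
		printk(KERN_INFO "kprobe hit at %p\n", p->addr);
		return 0;	/* 0: continue with normal kprobe processing */
	}

	static struct kprobe kp = {
		.symbol_name	= "do_fork",	/* resolved by kprobe_lookup_name() */
		.pre_handler	= handler_pre,
	};

	static int __init kprobe_sketch_init(void)
	{
		return register_kprobe(&kp);
	}

	static void __exit kprobe_sketch_exit(void)
	{
		unregister_kprobe(&kp);
	}

	module_init(kprobe_sketch_init);
	module_exit(kprobe_sketch_exit);
	MODULE_LICENSE("GPL");
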
Index: linux-2.6-lttng.stable/instrumentation/Makefile
===================================================================
--- linux-2.6-lttng.stable.orig/instrumentation/Makefile 2007-10-29 09:51:12.000000000 -0400
+++ linux-2.6-lttng.stable/instrumentation/Makefile 2007-10-29 09:51:46.000000000 -0400
@@ -1,3 +1,5 @@
#
# Makefile for the linux kernel instrumentation
#
+
+obj-$(CONFIG_KPROBES) += kprobes.o
Index: linux-2.6-lttng.stable/kernel/Makefile
===================================================================
--- linux-2.6-lttng.stable.orig/kernel/Makefile 2007-10-29 09:51:06.000000000 -0400
+++ linux-2.6-lttng.stable/kernel/Makefile 2007-10-29 09:51:38.000000000 -0400
@@ -47,7 +47,6 @@ obj-$(CONFIG_STOP_MACHINE) += stop_machi
obj-$(CONFIG_AUDIT) += audit.o auditfilter.o
obj-$(CONFIG_AUDITSYSCALL) += auditsc.o
obj-$(CONFIG_AUDIT_TREE) += audit_tree.o
-obj-$(CONFIG_KPROBES) += kprobes.o
obj-$(CONFIG_SYSFS) += ksysfs.o
obj-$(CONFIG_DETECT_SOFTLOCKUP) += softlockup.o
obj-$(CONFIG_GENERIC_HARDIRQS) += irq/
Index: linux-2.6-lttng.stable/Makefile
===================================================================
--- linux-2.6-lttng.stable.orig/Makefile 2007-10-29 09:51:22.000000000 -0400
+++ linux-2.6-lttng.stable/Makefile 2007-10-29 09:51:38.000000000 -0400
@@ -574,7 +574,8 @@ export mod_strip_cmd
ifeq ($(KBUILD_EXTMOD),)
-core-y += kernel/ mm/ fs/ ipc/ security/ crypto/ block/
+core-y += kernel/ mm/ fs/ ipc/ security/ crypto/ block/ \
+ instrumentation/
vmlinux-dirs := $(patsubst %/,%,$(filter %/, $(init-y) $(init-m) \
$(core-y) $(core-m) $(drivers-y) $(drivers-m) \
Index: linux-2.6-lttng.stable/arch/avr32/instrumentation/kprobes.c
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng.stable/arch/avr32/instrumentation/kprobes.c 2007-10-29 09:51:38.000000000 -0400
@@ -0,0 +1,268 @@
+/*
+ * Kernel Probes (KProbes)
+ *
+ * Copyright (C) 2005-2006 Atmel Corporation
+ *
+ * Based on arch/ppc64/kernel/kprobes.c
+ * Copyright (C) IBM Corporation, 2002, 2004
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/kprobes.h>
+#include <linux/ptrace.h>
+
+#include <asm/cacheflush.h>
+#include <linux/kdebug.h>
+#include <asm/ocd.h>
+
+DEFINE_PER_CPU(struct kprobe *, current_kprobe);
+static unsigned long kprobe_status;
+static struct pt_regs jprobe_saved_regs;
+
+struct kretprobe_blackpoint kretprobe_blacklist[] = {{NULL, NULL}};
+
+int __kprobes arch_prepare_kprobe(struct kprobe *p)
+{
+ int ret = 0;
+
+ if ((unsigned long)p->addr & 0x01) {
+ printk("Attempt to register kprobe at an unaligned address\n");
+ ret = -EINVAL;
+ }
+
+ /* XXX: Might be a good idea to check if p->addr is a valid
+ * kernel address as well... */
+
+ if (!ret) {
+ pr_debug("copy kprobe at %p\n", p->addr);
+ memcpy(p->ainsn.insn, p->addr, MAX_INSN_SIZE * sizeof(kprobe_opcode_t));
+ p->opcode = *p->addr;
+ }
+
+ return ret;
+}
+
+void __kprobes arch_arm_kprobe(struct kprobe *p)
+{
+ pr_debug("arming kprobe at %p\n", p->addr);
+ *p->addr = BREAKPOINT_INSTRUCTION;
+ flush_icache_range((unsigned long)p->addr,
+ (unsigned long)p->addr + sizeof(kprobe_opcode_t));
+}
+
+void __kprobes arch_disarm_kprobe(struct kprobe *p)
+{
+ pr_debug("disarming kprobe at %p\n", p->addr);
+ *p->addr = p->opcode;
+ flush_icache_range((unsigned long)p->addr,
+ (unsigned long)p->addr + sizeof(kprobe_opcode_t));
+}
+
+static void __kprobes prepare_singlestep(struct kprobe *p, struct pt_regs *regs)
+{
+ unsigned long dc;
+
+ pr_debug("preparing to singlestep over %p (PC=%08lx)\n",
+ p->addr, regs->pc);
+
+ BUG_ON(!(sysreg_read(SR) & SYSREG_BIT(SR_D)));
+
+ dc = __mfdr(DBGREG_DC);
+ dc |= DC_SS;
+ __mtdr(DBGREG_DC, dc);
+
+ /*
+ * We must run the instruction from its original location
+ * since it may actually reference PC.
+ *
+ * TODO: Do the instruction replacement directly in icache.
+ */
+ *p->addr = p->opcode;
+ flush_icache_range((unsigned long)p->addr,
+ (unsigned long)p->addr + sizeof(kprobe_opcode_t));
+}
+
+static void __kprobes resume_execution(struct kprobe *p, struct pt_regs *regs)
+{
+ unsigned long dc;
+
+ pr_debug("resuming execution at PC=%08lx\n", regs->pc);
+
+ dc = __mfdr(DBGREG_DC);
+ dc &= ~DC_SS;
+ __mtdr(DBGREG_DC, dc);
+
+ *p->addr = BREAKPOINT_INSTRUCTION;
+ flush_icache_range((unsigned long)p->addr,
+ (unsigned long)p->addr + sizeof(kprobe_opcode_t));
+}
+
+static void __kprobes set_current_kprobe(struct kprobe *p)
+{
+ __get_cpu_var(current_kprobe) = p;
+}
+
+static int __kprobes kprobe_handler(struct pt_regs *regs)
+{
+ struct kprobe *p;
+ void *addr = (void *)regs->pc;
+ int ret = 0;
+
+ pr_debug("kprobe_handler: kprobe_running=%p\n",
+ kprobe_running());
+
+ /*
+ * We don't want to be preempted for the entire
+ * duration of kprobe processing
+ */
+ preempt_disable();
+
+ /* Check that we're not recursing */
+ if (kprobe_running()) {
+ p = get_kprobe(addr);
+ if (p) {
+ if (kprobe_status == KPROBE_HIT_SS) {
+ printk("FIXME: kprobe hit while single-stepping!\n");
+ goto no_kprobe;
+ }
+
+ printk("FIXME: kprobe hit while handling another kprobe\n");
+ goto no_kprobe;
+ } else {
+ p = kprobe_running();
+ if (p->break_handler && p->break_handler(p, regs))
+ goto ss_probe;
+ }
+		/* If it's not ours, it can't be a delete race (we hold the lock). */
+ goto no_kprobe;
+ }
+
+ p = get_kprobe(addr);
+ if (!p)
+ goto no_kprobe;
+
+ kprobe_status = KPROBE_HIT_ACTIVE;
+ set_current_kprobe(p);
+ if (p->pre_handler && p->pre_handler(p, regs))
+ /* handler has already set things up, so skip ss setup */
+ return 1;
+
+ss_probe:
+ prepare_singlestep(p, regs);
+ kprobe_status = KPROBE_HIT_SS;
+ return 1;
+
+no_kprobe:
+ preempt_enable_no_resched();
+ return ret;
+}
+
+static int __kprobes post_kprobe_handler(struct pt_regs *regs)
+{
+ struct kprobe *cur = kprobe_running();
+
+ pr_debug("post_kprobe_handler, cur=%p\n", cur);
+
+ if (!cur)
+ return 0;
+
+ if (cur->post_handler) {
+ kprobe_status = KPROBE_HIT_SSDONE;
+ cur->post_handler(cur, regs, 0);
+ }
+
+ resume_execution(cur, regs);
+ reset_current_kprobe();
+ preempt_enable_no_resched();
+
+ return 1;
+}
+
+int __kprobes kprobe_fault_handler(struct pt_regs *regs, int trapnr)
+{
+ struct kprobe *cur = kprobe_running();
+
+ pr_debug("kprobe_fault_handler: trapnr=%d\n", trapnr);
+
+ if (cur->fault_handler && cur->fault_handler(cur, regs, trapnr))
+ return 1;
+
+ if (kprobe_status & KPROBE_HIT_SS) {
+ resume_execution(cur, regs);
+ preempt_enable_no_resched();
+ }
+ return 0;
+}
+
+/*
+ * Wrapper routine for handling exceptions.
+ */
+int __kprobes kprobe_exceptions_notify(struct notifier_block *self,
+ unsigned long val, void *data)
+{
+ struct die_args *args = (struct die_args *)data;
+ int ret = NOTIFY_DONE;
+
+ pr_debug("kprobe_exceptions_notify: val=%lu, data=%p\n",
+ val, data);
+
+ switch (val) {
+ case DIE_BREAKPOINT:
+ if (kprobe_handler(args->regs))
+ ret = NOTIFY_STOP;
+ break;
+ case DIE_SSTEP:
+ if (post_kprobe_handler(args->regs))
+ ret = NOTIFY_STOP;
+ break;
+ default:
+ break;
+ }
+
+ return ret;
+}
+
+int __kprobes setjmp_pre_handler(struct kprobe *p, struct pt_regs *regs)
+{
+ struct jprobe *jp = container_of(p, struct jprobe, kp);
+
+ memcpy(&jprobe_saved_regs, regs, sizeof(struct pt_regs));
+
+ /*
+ * TODO: We should probably save some of the stack here as
+ * well, since gcc may pass arguments on the stack for certain
+ * functions (lots of arguments, large aggregates, varargs)
+ */
+
+ /* setup return addr to the jprobe handler routine */
+ regs->pc = (unsigned long)jp->entry;
+ return 1;
+}
+
+void __kprobes jprobe_return(void)
+{
+ asm volatile("breakpoint" ::: "memory");
+}
+
+int __kprobes longjmp_break_handler(struct kprobe *p, struct pt_regs *regs)
+{
+ /*
+	 * FIXME - we should ideally be validating that we got here because
+	 * of the "trap" in jprobe_return() above, before restoring the
+ * saved regs...
+ */
+ memcpy(regs, &jprobe_saved_regs, sizeof(struct pt_regs));
+ return 1;
+}
+
+int __init arch_init_kprobes(void)
+{
+ printk("KPROBES: Enabling monitor mode (MM|DBE)...\n");
+ __mtdr(DBGREG_DC, DC_MM | DC_DBE);
+
+ /* TODO: Register kretprobe trampoline */
+ return 0;
+}
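
[Not part of the patch, for orientation only: setjmp_pre_handler(),
longjmp_break_handler() and jprobe_return() above are the arch half of
jprobes. The consumer side looks roughly like this sketch; the probed
function my_target_fn() is hypothetical, and a real jprobe handler must
mirror the probed function's signature exactly.]

	#include <linux/module.h>
	#include <linux/kernel.h>
	#include <linux/kprobes.h>

	/* Stand-in for a hypothetical long my_target_fn(int arg). */
	static long jmy_target_fn(int arg)
	{
		/* Runs in place of my_target_fn() with its real arguments. */
		printk(KERN_INFO "my_target_fn(%d)\n", arg);
		jprobe_return();	/* mandatory: traps back to the probed code */
		return 0;		/* never reached */
	}

	static struct jprobe jp = {
		.entry		= JPROBE_ENTRY(jmy_target_fn),
		.kp.symbol_name	= "my_target_fn",	/* hypothetical target */
	};

	static int __init jprobe_sketch_init(void)
	{
		return register_jprobe(&jp);
	}

	static void __exit jprobe_sketch_exit(void)
	{
		unregister_jprobe(&jp);
	}

	module_init(jprobe_sketch_init);
	module_exit(jprobe_sketch_exit);
	MODULE_LICENSE("GPL");
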
Index: linux-2.6-lttng.stable/arch/avr32/kernel/kprobes.c
===================================================================
--- linux-2.6-lttng.stable.orig/arch/avr32/kernel/kprobes.c 2007-10-29 09:51:07.000000000 -0400
+++ /dev/null 1970-01-01 00:00:00.000000000 +0000
@@ -1,268 +0,0 @@
-/*
- * Kernel Probes (KProbes)
- *
- * Copyright (C) 2005-2006 Atmel Corporation
- *
- * Based on arch/ppc64/kernel/kprobes.c
- * Copyright (C) IBM Corporation, 2002, 2004
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 as
- * published by the Free Software Foundation.
- */
-
-#include <linux/kprobes.h>
-#include <linux/ptrace.h>
-
-#include <asm/cacheflush.h>
-#include <linux/kdebug.h>
-#include <asm/ocd.h>
-
-DEFINE_PER_CPU(struct kprobe *, current_kprobe);
-static unsigned long kprobe_status;
-static struct pt_regs jprobe_saved_regs;
-
-struct kretprobe_blackpoint kretprobe_blacklist[] = {{NULL, NULL}};
-
-int __kprobes arch_prepare_kprobe(struct kprobe *p)
-{
- int ret = 0;
-
- if ((unsigned long)p->addr & 0x01) {
- printk("Attempt to register kprobe at an unaligned address\n");
- ret = -EINVAL;
- }
-
- /* XXX: Might be a good idea to check if p->addr is a valid
- * kernel address as well... */
-
- if (!ret) {
- pr_debug("copy kprobe at %p\n", p->addr);
- memcpy(p->ainsn.insn, p->addr, MAX_INSN_SIZE * sizeof(kprobe_opcode_t));
- p->opcode = *p->addr;
- }
-
- return ret;
-}
-
-void __kprobes arch_arm_kprobe(struct kprobe *p)
-{
- pr_debug("arming kprobe at %p\n", p->addr);
- *p->addr = BREAKPOINT_INSTRUCTION;
- flush_icache_range((unsigned long)p->addr,
- (unsigned long)p->addr + sizeof(kprobe_opcode_t));
-}
-
-void __kprobes arch_disarm_kprobe(struct kprobe *p)
-{
- pr_debug("disarming kprobe at %p\n", p->addr);
- *p->addr = p->opcode;
- flush_icache_range((unsigned long)p->addr,
- (unsigned long)p->addr + sizeof(kprobe_opcode_t));
-}
-
-static void __kprobes prepare_singlestep(struct kprobe *p, struct pt_regs *regs)
-{
- unsigned long dc;
-
- pr_debug("preparing to singlestep over %p (PC=%08lx)\n",
- p->addr, regs->pc);
-
- BUG_ON(!(sysreg_read(SR) & SYSREG_BIT(SR_D)));
-
- dc = __mfdr(DBGREG_DC);
- dc |= DC_SS;
- __mtdr(DBGREG_DC, dc);
-
- /*
- * We must run the instruction from its original location
- * since it may actually reference PC.
- *
- * TODO: Do the instruction replacement directly in icache.
- */
- *p->addr = p->opcode;
- flush_icache_range((unsigned long)p->addr,
- (unsigned long)p->addr + sizeof(kprobe_opcode_t));
-}
-
-static void __kprobes resume_execution(struct kprobe *p, struct pt_regs *regs)
-{
- unsigned long dc;
-
- pr_debug("resuming execution at PC=%08lx\n", regs->pc);
-
- dc = __mfdr(DBGREG_DC);
- dc &= ~DC_SS;
- __mtdr(DBGREG_DC, dc);
-
- *p->addr = BREAKPOINT_INSTRUCTION;
- flush_icache_range((unsigned long)p->addr,
- (unsigned long)p->addr + sizeof(kprobe_opcode_t));
-}
-
-static void __kprobes set_current_kprobe(struct kprobe *p)
-{
- __get_cpu_var(current_kprobe) = p;
-}
-
-static int __kprobes kprobe_handler(struct pt_regs *regs)
-{
- struct kprobe *p;
- void *addr = (void *)regs->pc;
- int ret = 0;
-
- pr_debug("kprobe_handler: kprobe_running=%p\n",
- kprobe_running());
-
- /*
- * We don't want to be preempted for the entire
- * duration of kprobe processing
- */
- preempt_disable();
-
- /* Check that we're not recursing */
- if (kprobe_running()) {
- p = get_kprobe(addr);
- if (p) {
- if (kprobe_status == KPROBE_HIT_SS) {
- printk("FIXME: kprobe hit while single-stepping!\n");
- goto no_kprobe;
- }
-
- printk("FIXME: kprobe hit while handling another kprobe\n");
- goto no_kprobe;
- } else {
- p = kprobe_running();
- if (p->break_handler && p->break_handler(p, regs))
- goto ss_probe;
- }
-		/* If it's not ours, it can't be a delete race (we hold the lock). */
- goto no_kprobe;
- }
-
- p = get_kprobe(addr);
- if (!p)
- goto no_kprobe;
-
- kprobe_status = KPROBE_HIT_ACTIVE;
- set_current_kprobe(p);
- if (p->pre_handler && p->pre_handler(p, regs))
- /* handler has already set things up, so skip ss setup */
- return 1;
-
-ss_probe:
- prepare_singlestep(p, regs);
- kprobe_status = KPROBE_HIT_SS;
- return 1;
-
-no_kprobe:
- preempt_enable_no_resched();
- return ret;
-}
-
-static int __kprobes post_kprobe_handler(struct pt_regs *regs)
-{
- struct kprobe *cur = kprobe_running();
-
- pr_debug("post_kprobe_handler, cur=%p\n", cur);
-
- if (!cur)
- return 0;
-
- if (cur->post_handler) {
- kprobe_status = KPROBE_HIT_SSDONE;
- cur->post_handler(cur, regs, 0);
- }
-
- resume_execution(cur, regs);
- reset_current_kprobe();
- preempt_enable_no_resched();
-
- return 1;
-}
-
-int __kprobes kprobe_fault_handler(struct pt_regs *regs, int trapnr)
-{
- struct kprobe *cur = kprobe_running();
-
- pr_debug("kprobe_fault_handler: trapnr=%d\n", trapnr);
-
- if (cur->fault_handler && cur->fault_handler(cur, regs, trapnr))
- return 1;
-
- if (kprobe_status & KPROBE_HIT_SS) {
- resume_execution(cur, regs);
- preempt_enable_no_resched();
- }
- return 0;
-}
-
-/*
- * Wrapper routine for handling exceptions.
- */
-int __kprobes kprobe_exceptions_notify(struct notifier_block *self,
- unsigned long val, void *data)
-{
- struct die_args *args = (struct die_args *)data;
- int ret = NOTIFY_DONE;
-
- pr_debug("kprobe_exceptions_notify: val=%lu, data=%p\n",
- val, data);
-
- switch (val) {
- case DIE_BREAKPOINT:
- if (kprobe_handler(args->regs))
- ret = NOTIFY_STOP;
- break;
- case DIE_SSTEP:
- if (post_kprobe_handler(args->regs))
- ret = NOTIFY_STOP;
- break;
- default:
- break;
- }
-
- return ret;
-}
-
-int __kprobes setjmp_pre_handler(struct kprobe *p, struct pt_regs *regs)
-{
- struct jprobe *jp = container_of(p, struct jprobe, kp);
-
- memcpy(&jprobe_saved_regs, regs, sizeof(struct pt_regs));
-
- /*
- * TODO: We should probably save some of the stack here as
- * well, since gcc may pass arguments on the stack for certain
- * functions (lots of arguments, large aggregates, varargs)
- */
-
- /* setup return addr to the jprobe handler routine */
- regs->pc = (unsigned long)jp->entry;
- return 1;
-}
-
-void __kprobes jprobe_return(void)
-{
- asm volatile("breakpoint" ::: "memory");
-}
-
-int __kprobes longjmp_break_handler(struct kprobe *p, struct pt_regs *regs)
-{
- /*
-	 * FIXME - we should ideally be validating that we got here because
-	 * of the "trap" in jprobe_return() above, before restoring the
- * saved regs...
- */
- memcpy(regs, &jprobe_saved_regs, sizeof(struct pt_regs));
- return 1;
-}
-
-int __init arch_init_kprobes(void)
-{
- printk("KPROBES: Enabling monitor mode (MM|DBE)...\n");
- __mtdr(DBGREG_DC, DC_MM | DC_DBE);
-
- /* TODO: Register kretprobe trampoline */
- return 0;
-}
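
[Not part of the patch, for orientation only: the ia64 copy that
follows carries the arch side of kretprobes (arch_prepare_kretprobe()
plus the return trampoline). The generic consumer side looks roughly
like this sketch; the handler name and probed symbol are illustrative.]

	#include <linux/module.h>
	#include <linux/kernel.h>
	#include <linux/kprobes.h>

	/* Invoked via the trampoline when the probed function returns. */
	static int ret_handler(struct kretprobe_instance *ri,
			       struct pt_regs *regs)
	{
		printk(KERN_INFO "returning to %p\n", ri->ret_addr);
		return 0;
	}

	static struct kretprobe rp = {
		.handler	= ret_handler,
		.kp.symbol_name	= "do_fork",	/* illustrative target */
		.maxactive	= 20,		/* instances to pre-allocate */
	};

	static int __init kretprobe_sketch_init(void)
	{
		return register_kretprobe(&rp);
	}

	static void __exit kretprobe_sketch_exit(void)
	{
		unregister_kretprobe(&rp);
	}

	module_init(kretprobe_sketch_init);
	module_exit(kretprobe_sketch_exit);
	MODULE_LICENSE("GPL");
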
Index: linux-2.6-lttng.stable/arch/ia64/instrumentation/kprobes.c
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng.stable/arch/ia64/instrumentation/kprobes.c 2007-10-29 09:51:38.000000000 -0400
@@ -0,0 +1,1027 @@
+/*
+ * Kernel Probes (KProbes)
+ * arch/ia64/instrumentation/kprobes.c
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) IBM Corporation, 2002, 2004
+ * Copyright (C) Intel Corporation, 2005
+ *
+ * 2005-Apr Rusty Lynch <rusty.lynch@intel.com> and Anil S Keshavamurthy
+ * <anil.s.keshavamurthy@intel.com> adapted from i386
+ */
+
+#include <linux/kprobes.h>
+#include <linux/ptrace.h>
+#include <linux/string.h>
+#include <linux/slab.h>
+#include <linux/preempt.h>
+#include <linux/moduleloader.h>
+#include <linux/kdebug.h>
+
+#include <asm/pgtable.h>
+#include <asm/sections.h>
+#include <asm/uaccess.h>
+
+extern void jprobe_inst_return(void);
+
+DEFINE_PER_CPU(struct kprobe *, current_kprobe) = NULL;
+DEFINE_PER_CPU(struct kprobe_ctlblk, kprobe_ctlblk);
+
+struct kretprobe_blackpoint kretprobe_blacklist[] = {{NULL, NULL}};
+
+enum instruction_type {A, I, M, F, B, L, X, u};
+static enum instruction_type bundle_encoding[32][3] = {
+ { M, I, I }, /* 00 */
+ { M, I, I }, /* 01 */
+ { M, I, I }, /* 02 */
+ { M, I, I }, /* 03 */
+ { M, L, X }, /* 04 */
+ { M, L, X }, /* 05 */
+ { u, u, u }, /* 06 */
+ { u, u, u }, /* 07 */
+ { M, M, I }, /* 08 */
+ { M, M, I }, /* 09 */
+ { M, M, I }, /* 0A */
+ { M, M, I }, /* 0B */
+ { M, F, I }, /* 0C */
+ { M, F, I }, /* 0D */
+ { M, M, F }, /* 0E */
+ { M, M, F }, /* 0F */
+ { M, I, B }, /* 10 */
+ { M, I, B }, /* 11 */
+ { M, B, B }, /* 12 */
+ { M, B, B }, /* 13 */
+ { u, u, u }, /* 14 */
+ { u, u, u }, /* 15 */
+ { B, B, B }, /* 16 */
+ { B, B, B }, /* 17 */
+ { M, M, B }, /* 18 */
+ { M, M, B }, /* 19 */
+ { u, u, u }, /* 1A */
+ { u, u, u }, /* 1B */
+ { M, F, B }, /* 1C */
+ { M, F, B }, /* 1D */
+ { u, u, u }, /* 1E */
+ { u, u, u }, /* 1F */
+};
+
+/*
+ * In this function we check to see if the instruction
+ * is IP relative instruction and update the kprobe
+ * inst flag accordingly
+ */
+static void __kprobes update_kprobe_inst_flag(uint template, uint slot,
+ uint major_opcode,
+ unsigned long kprobe_inst,
+ struct kprobe *p)
+{
+ p->ainsn.inst_flag = 0;
+ p->ainsn.target_br_reg = 0;
+ p->ainsn.slot = slot;
+
+ /* Check for Break instruction
+ * Bits 37:40 Major opcode to be zero
+ * Bits 27:32 X6 to be zero
+ * Bits 32:35 X3 to be zero
+ */
+ if ((!major_opcode) && (!((kprobe_inst >> 27) & 0x1FF)) ) {
+ /* is a break instruction */
+ p->ainsn.inst_flag |= INST_FLAG_BREAK_INST;
+ return;
+ }
+
+ if (bundle_encoding[template][slot] == B) {
+ switch (major_opcode) {
+ case INDIRECT_CALL_OPCODE:
+ p->ainsn.inst_flag |= INST_FLAG_FIX_BRANCH_REG;
+ p->ainsn.target_br_reg = ((kprobe_inst >> 6) & 0x7);
+ break;
+ case IP_RELATIVE_PREDICT_OPCODE:
+ case IP_RELATIVE_BRANCH_OPCODE:
+ p->ainsn.inst_flag |= INST_FLAG_FIX_RELATIVE_IP_ADDR;
+ break;
+ case IP_RELATIVE_CALL_OPCODE:
+ p->ainsn.inst_flag |= INST_FLAG_FIX_RELATIVE_IP_ADDR;
+ p->ainsn.inst_flag |= INST_FLAG_FIX_BRANCH_REG;
+ p->ainsn.target_br_reg = ((kprobe_inst >> 6) & 0x7);
+ break;
+ }
+ } else if (bundle_encoding[template][slot] == X) {
+ switch (major_opcode) {
+ case LONG_CALL_OPCODE:
+ p->ainsn.inst_flag |= INST_FLAG_FIX_BRANCH_REG;
+ p->ainsn.target_br_reg = ((kprobe_inst >> 6) & 0x7);
+ break;
+ }
+ }
+ return;
+}
+
+/*
+ * In this function we check to see if the instruction
+ * (qp) cmpx.crel.ctype p1,p2=r2,r3
+ * on which we are inserting kprobe is cmp instruction
+ * with ctype as unc.
+ */
+static uint __kprobes is_cmp_ctype_unc_inst(uint template, uint slot,
+ uint major_opcode,
+ unsigned long kprobe_inst)
+{
+ cmp_inst_t cmp_inst;
+ uint ctype_unc = 0;
+
+ if (!((bundle_encoding[template][slot] == I) ||
+ (bundle_encoding[template][slot] == M)))
+ goto out;
+
+ if (!((major_opcode == 0xC) || (major_opcode == 0xD) ||
+ (major_opcode == 0xE)))
+ goto out;
+
+ cmp_inst.l = kprobe_inst;
+ if ((cmp_inst.f.x2 == 0) || (cmp_inst.f.x2 == 1)) {
+ /* Integer compare - Register Register (A6 type)*/
+ if ((cmp_inst.f.tb == 0) && (cmp_inst.f.ta == 0)
+ &&(cmp_inst.f.c == 1))
+ ctype_unc = 1;
+ } else if ((cmp_inst.f.x2 == 2)||(cmp_inst.f.x2 == 3)) {
+ /* Integer compare - Immediate Register (A8 type)*/
+ if ((cmp_inst.f.ta == 0) &&(cmp_inst.f.c == 1))
+ ctype_unc = 1;
+ }
+out:
+ return ctype_unc;
+}
+
+/*
+ * In this function we check whether the instruction
+ * on which we are inserting the kprobe is supported.
+ * Returns the qp value if supported,
+ * or -EINVAL if unsupported.
+ */
+static int __kprobes unsupported_inst(uint template, uint slot,
+ uint major_opcode,
+ unsigned long kprobe_inst,
+ unsigned long addr)
+{
+ int qp;
+
+ qp = kprobe_inst & 0x3f;
+ if (is_cmp_ctype_unc_inst(template, slot, major_opcode, kprobe_inst)) {
+ if (slot == 1 && qp) {
+ printk(KERN_WARNING "Kprobes on cmp unc"
+ "instruction on slot 1 at <0x%lx>"
+ "is not supported\n", addr);
+ return -EINVAL;
+
+ }
+ qp = 0;
+ }
+ else if (bundle_encoding[template][slot] == I) {
+ if (major_opcode == 0) {
+ /*
+ * Check for Integer speculation instruction
+ * - Bit 33-35 to be equal to 0x1
+ */
+ if (((kprobe_inst >> 33) & 0x7) == 1) {
+ printk(KERN_WARNING
+ "Kprobes on speculation inst at <0x%lx> not supported\n",
+ addr);
+ return -EINVAL;
+ }
+ /*
+ * IP relative mov instruction
+ * - Bit 27-35 to be equal to 0x30
+ */
+ if (((kprobe_inst >> 27) & 0x1FF) == 0x30) {
+ printk(KERN_WARNING
+ "Kprobes on \"mov r1=ip\" at <0x%lx> not supported\n",
+ addr);
+ return -EINVAL;
+
+ }
+ }
+ else if ((major_opcode == 5) && !(kprobe_inst & (0xFUl << 33)) &&
+ (kprobe_inst & (0x1UL << 12))) {
+ /* test bit instructions, tbit,tnat,tf
+ * bit 33-36 to be equal to 0
+ * bit 12 to be equal to 1
+ */
+ if (slot == 1 && qp) {
+ printk(KERN_WARNING "Kprobes on test bit"
+ "instruction on slot at <0x%lx>"
+ "is not supported\n", addr);
+ return -EINVAL;
+ }
+ qp = 0;
+ }
+ }
+ else if (bundle_encoding[template][slot] == B) {
+ if (major_opcode == 7) {
+ /* IP-Relative Predict major code is 7 */
+ printk(KERN_WARNING "Kprobes on IP-Relative"
+ "Predict is not supported\n");
+ return -EINVAL;
+ }
+ else if (major_opcode == 2) {
+ /* Indirect Predict, major code is 2
+ * bit 27-32 to be equal to 10 or 11
+ */
+			int x6 = (kprobe_inst >> 27) & 0x3F;
+			if ((x6 == 0x10) || (x6 == 0x11)) {
+				printk(KERN_WARNING "Kprobes on "
+					"Indirect Predict is not supported\n");
+ return -EINVAL;
+ }
+ }
+ }
+	/* The kernel does not use float instructions; here, for safety, check
+	 * whether it is an fcmp/fclass/float approximation instruction.
+	 */
+ else if (unlikely(bundle_encoding[template][slot] == F)) {
+ if ((major_opcode == 4 || major_opcode == 5) &&
+ (kprobe_inst & (0x1 << 12))) {
+ /* fcmp/fclass unc instruction */
+ if (slot == 1 && qp) {
+ printk(KERN_WARNING "Kprobes on fcmp/fclass "
+ "instruction on slot at <0x%lx> "
+ "is not supported\n", addr);
+ return -EINVAL;
+
+ }
+ qp = 0;
+ }
+ if ((major_opcode == 0 || major_opcode == 1) &&
+ (kprobe_inst & (0x1UL << 33))) {
+ /* float Approximation instruction */
+ if (slot == 1 && qp) {
+ printk(KERN_WARNING "Kprobes on float Approx "
+ "instr at <0x%lx> is not supported\n",
+ addr);
+ return -EINVAL;
+ }
+ qp = 0;
+ }
+ }
+ return qp;
+}
+
+/*
+ * In this function we override the bundle with
+ * the break instruction at the given slot.
+ */
+static void __kprobes prepare_break_inst(uint template, uint slot,
+ uint major_opcode,
+ unsigned long kprobe_inst,
+ struct kprobe *p,
+ int qp)
+{
+ unsigned long break_inst = BREAK_INST;
+ bundle_t *bundle = &p->opcode.bundle;
+
+ /*
+ * Copy the original kprobe_inst qualifying predicate(qp)
+ * to the break instruction
+ */
+ break_inst |= qp;
+
+ switch (slot) {
+ case 0:
+ bundle->quad0.slot0 = break_inst;
+ break;
+ case 1:
+ bundle->quad0.slot1_p0 = break_inst;
+ bundle->quad1.slot1_p1 = break_inst >> (64-46);
+ break;
+ case 2:
+ bundle->quad1.slot2 = break_inst;
+ break;
+ }
+
+ /*
+ * Update the instruction flag, so that we can
+ * emulate the instruction properly after we
+ * single step on original instruction
+ */
+ update_kprobe_inst_flag(template, slot, major_opcode, kprobe_inst, p);
+}
+
+static void __kprobes get_kprobe_inst(bundle_t *bundle, uint slot,
+ unsigned long *kprobe_inst, uint *major_opcode)
+{
+ unsigned long kprobe_inst_p0, kprobe_inst_p1;
+ unsigned int template;
+
+ template = bundle->quad0.template;
+
+ switch (slot) {
+ case 0:
+ *major_opcode = (bundle->quad0.slot0 >> SLOT0_OPCODE_SHIFT);
+ *kprobe_inst = bundle->quad0.slot0;
+ break;
+ case 1:
+ *major_opcode = (bundle->quad1.slot1_p1 >> SLOT1_p1_OPCODE_SHIFT);
+ kprobe_inst_p0 = bundle->quad0.slot1_p0;
+ kprobe_inst_p1 = bundle->quad1.slot1_p1;
+ *kprobe_inst = kprobe_inst_p0 | (kprobe_inst_p1 << (64-46));
+ break;
+ case 2:
+ *major_opcode = (bundle->quad1.slot2 >> SLOT2_OPCODE_SHIFT);
+ *kprobe_inst = bundle->quad1.slot2;
+ break;
+ }
+}
+
+/* Returns non-zero if the addr is in the Interrupt Vector Table */
+static int __kprobes in_ivt_functions(unsigned long addr)
+{
+ return (addr >= (unsigned long)__start_ivt_text
+ && addr < (unsigned long)__end_ivt_text);
+}
+
+static int __kprobes valid_kprobe_addr(int template, int slot,
+ unsigned long addr)
+{
+ if ((slot > 2) || ((bundle_encoding[template][1] == L) && slot > 1)) {
+ printk(KERN_WARNING "Attempting to insert unaligned kprobe "
+ "at 0x%lx\n", addr);
+ return -EINVAL;
+ }
+
+ if (in_ivt_functions(addr)) {
+ printk(KERN_WARNING "Kprobes can't be inserted inside "
+ "IVT functions at 0x%lx\n", addr);
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static void __kprobes save_previous_kprobe(struct kprobe_ctlblk *kcb)
+{
+ unsigned int i;
+ i = atomic_add_return(1, &kcb->prev_kprobe_index);
+ kcb->prev_kprobe[i-1].kp = kprobe_running();
+ kcb->prev_kprobe[i-1].status = kcb->kprobe_status;
+}
+
+static void __kprobes restore_previous_kprobe(struct kprobe_ctlblk *kcb)
+{
+ unsigned int i;
+ i = atomic_sub_return(1, &kcb->prev_kprobe_index);
+ __get_cpu_var(current_kprobe) = kcb->prev_kprobe[i].kp;
+ kcb->kprobe_status = kcb->prev_kprobe[i].status;
+}
+
+static void __kprobes set_current_kprobe(struct kprobe *p,
+ struct kprobe_ctlblk *kcb)
+{
+ __get_cpu_var(current_kprobe) = p;
+}
+
+static void kretprobe_trampoline(void)
+{
+}
+
+/*
+ * At this point the target function has been tricked into
+ * returning into our trampoline. Look up the associated instance
+ * and then:
+ * - call the handler function
+ * - cleanup by marking the instance as unused
+ * - long jump back to the original return address
+ */
+int __kprobes trampoline_probe_handler(struct kprobe *p, struct pt_regs *regs)
+{
+ struct kretprobe_instance *ri = NULL;
+ struct hlist_head *head, empty_rp;
+ struct hlist_node *node, *tmp;
+ unsigned long flags, orig_ret_address = 0;
+ unsigned long trampoline_address =
+ ((struct fnptr *)kretprobe_trampoline)->ip;
+
+ INIT_HLIST_HEAD(&empty_rp);
+ spin_lock_irqsave(&kretprobe_lock, flags);
+ head = kretprobe_inst_table_head(current);
+
+ /*
+ * It is possible to have multiple instances associated with a given
+	 * task either because multiple functions in the call path
+	 * have a return probe installed on them, and/or because more than
+	 * one return probe was registered for a target function.
+ *
+ * We can handle this because:
+ * - instances are always inserted at the head of the list
+ * - when multiple return probes are registered for the same
+ * function, the first instance's ret_addr will point to the
+ * real return address, and all the rest will point to
+ * kretprobe_trampoline
+ */
+ hlist_for_each_entry_safe(ri, node, tmp, head, hlist) {
+ if (ri->task != current)
+ /* another task is sharing our hash bucket */
+ continue;
+
+ if (ri->rp && ri->rp->handler)
+ ri->rp->handler(ri, regs);
+
+ orig_ret_address = (unsigned long)ri->ret_addr;
+ recycle_rp_inst(ri, &empty_rp);
+
+ if (orig_ret_address != trampoline_address)
+ /*
+ * This is the real return address. Any other
+ * instances associated with this task are for
+ * other calls deeper on the call stack
+ */
+ break;
+ }
+
+ kretprobe_assert(ri, orig_ret_address, trampoline_address);
+
+ regs->cr_iip = orig_ret_address;
+
+ reset_current_kprobe();
+ spin_unlock_irqrestore(&kretprobe_lock, flags);
+ preempt_enable_no_resched();
+
+ hlist_for_each_entry_safe(ri, node, tmp, &empty_rp, hlist) {
+ hlist_del(&ri->hlist);
+ kfree(ri);
+ }
+ /*
+ * By returning a non-zero value, we are telling
+ * kprobe_handler() that we don't want the post_handler
+ * to run (and have re-enabled preemption)
+ */
+ return 1;
+}
+
+/* Called with kretprobe_lock held */
+void __kprobes arch_prepare_kretprobe(struct kretprobe_instance *ri,
+ struct pt_regs *regs)
+{
+ ri->ret_addr = (kprobe_opcode_t *)regs->b0;
+
+ /* Replace the return addr with trampoline addr */
+ regs->b0 = ((struct fnptr *)kretprobe_trampoline)->ip;
+}
+
+int __kprobes arch_prepare_kprobe(struct kprobe *p)
+{
+ unsigned long addr = (unsigned long) p->addr;
+ unsigned long *kprobe_addr = (unsigned long *)(addr & ~0xFULL);
+	unsigned long kprobe_inst = 0;
+ unsigned int slot = addr & 0xf, template, major_opcode = 0;
+ bundle_t *bundle;
+ int qp;
+
+ bundle = &((kprobe_opcode_t *)kprobe_addr)->bundle;
+ template = bundle->quad0.template;
+
+	if (valid_kprobe_addr(template, slot, addr))
+ return -EINVAL;
+
+ /* Move to slot 2, if bundle is MLX type and kprobe slot is 1 */
+ if (slot == 1 && bundle_encoding[template][1] == L)
+ slot++;
+
+ /* Get kprobe_inst and major_opcode from the bundle */
+ get_kprobe_inst(bundle, slot, &kprobe_inst, &major_opcode);
+
+ qp = unsupported_inst(template, slot, major_opcode, kprobe_inst, addr);
+ if (qp < 0)
+ return -EINVAL;
+
+ p->ainsn.insn = get_insn_slot();
+ if (!p->ainsn.insn)
+ return -ENOMEM;
+ memcpy(&p->opcode, kprobe_addr, sizeof(kprobe_opcode_t));
+ memcpy(p->ainsn.insn, kprobe_addr, sizeof(kprobe_opcode_t));
+
+ prepare_break_inst(template, slot, major_opcode, kprobe_inst, p, qp);
+
+ return 0;
+}
+
+void __kprobes arch_arm_kprobe(struct kprobe *p)
+{
+ unsigned long arm_addr;
+ bundle_t *src, *dest;
+
+ arm_addr = ((unsigned long)p->addr) & ~0xFUL;
+ dest = &((kprobe_opcode_t *)arm_addr)->bundle;
+ src = &p->opcode.bundle;
+
+ flush_icache_range((unsigned long)p->ainsn.insn,
+ (unsigned long)p->ainsn.insn + sizeof(kprobe_opcode_t));
+ switch (p->ainsn.slot) {
+ case 0:
+ dest->quad0.slot0 = src->quad0.slot0;
+ break;
+ case 1:
+ dest->quad1.slot1_p1 = src->quad1.slot1_p1;
+ break;
+ case 2:
+ dest->quad1.slot2 = src->quad1.slot2;
+ break;
+ }
+ flush_icache_range(arm_addr, arm_addr + sizeof(kprobe_opcode_t));
+}
+
+void __kprobes arch_disarm_kprobe(struct kprobe *p)
+{
+ unsigned long arm_addr;
+ bundle_t *src, *dest;
+
+ arm_addr = ((unsigned long)p->addr) & ~0xFUL;
+ dest = &((kprobe_opcode_t *)arm_addr)->bundle;
+ /* p->ainsn.insn contains the original unaltered kprobe_opcode_t */
+ src = &p->ainsn.insn->bundle;
+ switch (p->ainsn.slot) {
+ case 0:
+ dest->quad0.slot0 = src->quad0.slot0;
+ break;
+ case 1:
+ dest->quad1.slot1_p1 = src->quad1.slot1_p1;
+ break;
+ case 2:
+ dest->quad1.slot2 = src->quad1.slot2;
+ break;
+ }
+ flush_icache_range(arm_addr, arm_addr + sizeof(kprobe_opcode_t));
+}
+
+void __kprobes arch_remove_kprobe(struct kprobe *p)
+{
+ mutex_lock(&kprobe_mutex);
+ free_insn_slot(p->ainsn.insn, 0);
+ mutex_unlock(&kprobe_mutex);
+}
+/*
+ * We are resuming execution after a single step fault, so the pt_regs
+ * structure reflects the register state after we executed the instruction
+ * located in the kprobe (p->ainsn.insn.bundle). We still need to adjust
+ * the ip to point back to the original address. To set the IP address
+ * back to the original address, handle the cases where we need to fix up
+ * the relative IP address and/or fix up the branch register.
+ */
+static void __kprobes resume_execution(struct kprobe *p, struct pt_regs *regs)
+{
+ unsigned long bundle_addr = (unsigned long) (&p->ainsn.insn->bundle);
+ unsigned long resume_addr = (unsigned long)p->addr & ~0xFULL;
+ unsigned long template;
+ int slot = ((unsigned long)p->addr & 0xf);
+
+ template = p->ainsn.insn->bundle.quad0.template;
+
+ if (slot == 1 && bundle_encoding[template][1] == L)
+ slot = 2;
+
+ if (p->ainsn.inst_flag) {
+
+ if (p->ainsn.inst_flag & INST_FLAG_FIX_RELATIVE_IP_ADDR) {
+ /* Fix relative IP address */
+ regs->cr_iip = (regs->cr_iip - bundle_addr) +
+ resume_addr;
+ }
+
+ if (p->ainsn.inst_flag & INST_FLAG_FIX_BRANCH_REG) {
+ /*
+ * Fix target branch register, software convention is
+			 * to use either b0, b6 or b7, so we check
+			 * only those registers.
+ */
+ switch (p->ainsn.target_br_reg) {
+ case 0:
+ if ((regs->b0 == bundle_addr) ||
+ (regs->b0 == bundle_addr + 0x10)) {
+ regs->b0 = (regs->b0 - bundle_addr) +
+ resume_addr;
+ }
+ break;
+ case 6:
+ if ((regs->b6 == bundle_addr) ||
+ (regs->b6 == bundle_addr + 0x10)) {
+ regs->b6 = (regs->b6 - bundle_addr) +
+ resume_addr;
+ }
+ break;
+ case 7:
+ if ((regs->b7 == bundle_addr) ||
+ (regs->b7 == bundle_addr + 0x10)) {
+ regs->b7 = (regs->b7 - bundle_addr) +
+ resume_addr;
+ }
+ break;
+ } /* end switch */
+ }
+ goto turn_ss_off;
+ }
+
+ if (slot == 2) {
+ if (regs->cr_iip == bundle_addr + 0x10) {
+ regs->cr_iip = resume_addr + 0x10;
+ }
+ } else {
+ if (regs->cr_iip == bundle_addr) {
+ regs->cr_iip = resume_addr;
+ }
+ }
+
+turn_ss_off:
+ /* Turn off Single Step bit */
+ ia64_psr(regs)->ss = 0;
+}
+
+static void __kprobes prepare_ss(struct kprobe *p, struct pt_regs *regs)
+{
+ unsigned long bundle_addr = (unsigned long) &p->ainsn.insn->bundle;
+ unsigned long slot = (unsigned long)p->addr & 0xf;
+
+ /* single step inline if break instruction */
+ if (p->ainsn.inst_flag == INST_FLAG_BREAK_INST)
+ regs->cr_iip = (unsigned long)p->addr & ~0xFULL;
+ else
+ regs->cr_iip = bundle_addr & ~0xFULL;
+
+ if (slot > 2)
+ slot = 0;
+
+ ia64_psr(regs)->ri = slot;
+
+ /* turn on single stepping */
+ ia64_psr(regs)->ss = 1;
+}
+
+static int __kprobes is_ia64_break_inst(struct pt_regs *regs)
+{
+ unsigned int slot = ia64_psr(regs)->ri;
+ unsigned int template, major_opcode;
+ unsigned long kprobe_inst;
+ unsigned long *kprobe_addr = (unsigned long *)regs->cr_iip;
+ bundle_t bundle;
+
+ memcpy(&bundle, kprobe_addr, sizeof(bundle_t));
+ template = bundle.quad0.template;
+
+ /* Move to slot 2, if bundle is MLX type and kprobe slot is 1 */
+ if (slot == 1 && bundle_encoding[template][1] == L)
+ slot++;
+
+ /* Get Kprobe probe instruction at given slot*/
+ get_kprobe_inst(&bundle, slot, &kprobe_inst, &major_opcode);
+
+ /* For break instruction,
+ * Bits 37:40 Major opcode to be zero
+ * Bits 27:32 X6 to be zero
+ * Bits 32:35 X3 to be zero
+ */
+ if (major_opcode || ((kprobe_inst >> 27) & 0x1FF) ) {
+ /* Not a break instruction */
+ return 0;
+ }
+
+ /* Is a break instruction */
+ return 1;
+}
+
+static int __kprobes pre_kprobes_handler(struct die_args *args)
+{
+ struct kprobe *p;
+ int ret = 0;
+ struct pt_regs *regs = args->regs;
+ kprobe_opcode_t *addr = (kprobe_opcode_t *)instruction_pointer(regs);
+ struct kprobe_ctlblk *kcb;
+
+ /*
+ * We don't want to be preempted for the entire
+ * duration of kprobe processing
+ */
+ preempt_disable();
+ kcb = get_kprobe_ctlblk();
+
+ /* Handle recursion cases */
+ if (kprobe_running()) {
+ p = get_kprobe(addr);
+ if (p) {
+ if ((kcb->kprobe_status == KPROBE_HIT_SS) &&
+ (p->ainsn.inst_flag == INST_FLAG_BREAK_INST)) {
+ ia64_psr(regs)->ss = 0;
+ goto no_kprobe;
+ }
+ /* We have reentered the pre_kprobe_handler(), since
+ * another probe was hit while within the handler.
+			 * Here we save the original kprobe variables and
+			 * just single-step on the instruction of the new probe
+ * without calling any user handlers.
+ */
+ save_previous_kprobe(kcb);
+ set_current_kprobe(p, kcb);
+ kprobes_inc_nmissed_count(p);
+ prepare_ss(p, regs);
+ kcb->kprobe_status = KPROBE_REENTER;
+ return 1;
+ } else if (args->err == __IA64_BREAK_JPROBE) {
+ /*
+ * jprobe instrumented function just completed
+ */
+ p = __get_cpu_var(current_kprobe);
+ if (p->break_handler && p->break_handler(p, regs)) {
+ goto ss_probe;
+ }
+ } else if (!is_ia64_break_inst(regs)) {
+ /* The breakpoint instruction was removed by
+			 * another cpu right after we hit it; no further
+ * handling of this interrupt is appropriate
+ */
+ ret = 1;
+ goto no_kprobe;
+ } else {
+ /* Not our break */
+ goto no_kprobe;
+ }
+ }
+
+ p = get_kprobe(addr);
+ if (!p) {
+ if (!is_ia64_break_inst(regs)) {
+ /*
+ * The breakpoint instruction was removed right
+ * after we hit it. Another cpu has removed
+ * either a probepoint or a debugger breakpoint
+ * at this address. In either case, no further
+ * handling of this interrupt is appropriate.
+ */
+ ret = 1;
+
+ }
+
+		/* Not one of our breaks; let the kernel handle it */
+ goto no_kprobe;
+ }
+
+ set_current_kprobe(p, kcb);
+ kcb->kprobe_status = KPROBE_HIT_ACTIVE;
+
+ if (p->pre_handler && p->pre_handler(p, regs))
+ /*
+ * Our pre-handler is specifically requesting that we just
+ * do a return. This is used for both the jprobe pre-handler
+ * and the kretprobe trampoline
+ */
+ return 1;
+
+ss_probe:
+ prepare_ss(p, regs);
+ kcb->kprobe_status = KPROBE_HIT_SS;
+ return 1;
+
+no_kprobe:
+ preempt_enable_no_resched();
+ return ret;
+}
+
+static int __kprobes post_kprobes_handler(struct pt_regs *regs)
+{
+ struct kprobe *cur = kprobe_running();
+ struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
+
+ if (!cur)
+ return 0;
+
+ if ((kcb->kprobe_status != KPROBE_REENTER) && cur->post_handler) {
+ kcb->kprobe_status = KPROBE_HIT_SSDONE;
+ cur->post_handler(cur, regs, 0);
+ }
+
+ resume_execution(cur, regs);
+
+	/* Restore the original saved kprobe variables and continue. */
+ if (kcb->kprobe_status == KPROBE_REENTER) {
+ restore_previous_kprobe(kcb);
+ goto out;
+ }
+ reset_current_kprobe();
+
+out:
+ preempt_enable_no_resched();
+ return 1;
+}
+
+int __kprobes kprobes_fault_handler(struct pt_regs *regs, int trapnr)
+{
+ struct kprobe *cur = kprobe_running();
+ struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
+
+
+ switch(kcb->kprobe_status) {
+ case KPROBE_HIT_SS:
+ case KPROBE_REENTER:
+ /*
+ * We are here because the instruction being single
+ * stepped caused a page fault. We reset the current
+ * kprobe and the instruction pointer points back to
+ * the probe address and allow the page fault handler
+ * to continue as a normal page fault.
+ */
+ regs->cr_iip = ((unsigned long)cur->addr) & ~0xFULL;
+ ia64_psr(regs)->ri = ((unsigned long)cur->addr) & 0xf;
+ if (kcb->kprobe_status == KPROBE_REENTER)
+ restore_previous_kprobe(kcb);
+ else
+ reset_current_kprobe();
+ preempt_enable_no_resched();
+ break;
+ case KPROBE_HIT_ACTIVE:
+ case KPROBE_HIT_SSDONE:
+ /*
+		 * We increment the nmissed count for accounting;
+		 * we could also use the npre/npostfault counts to
+		 * account for these specific fault cases.
+ */
+ kprobes_inc_nmissed_count(cur);
+
+ /*
+		 * We come here because instructions in the pre/post
+		 * handler caused the page fault. This could happen
+		 * if the handler tries to access user space via
+		 * copy_from_user(), get_user(), etc. Let the
+ * user-specified handler try to fix it first.
+ */
+ if (cur->fault_handler && cur->fault_handler(cur, regs, trapnr))
+ return 1;
+ /*
+ * In case the user-specified fault handler returned
+ * zero, try to fix up.
+ */
+ if (ia64_done_with_exception(regs))
+ return 1;
+
+ /*
+ * Let ia64_do_page_fault() fix it.
+ */
+ break;
+ default:
+ break;
+ }
+
+ return 0;
+}
+
+int __kprobes kprobe_exceptions_notify(struct notifier_block *self,
+ unsigned long val, void *data)
+{
+ struct die_args *args = (struct die_args *)data;
+ int ret = NOTIFY_DONE;
+
+ if (args->regs && user_mode(args->regs))
+ return ret;
+
+	switch (val) {
+ case DIE_BREAK:
+ /* err is break number from ia64_bad_break() */
+ if ((args->err >> 12) == (__IA64_BREAK_KPROBE >> 12)
+ || args->err == __IA64_BREAK_JPROBE
+ || args->err == 0)
+ if (pre_kprobes_handler(args))
+ ret = NOTIFY_STOP;
+ break;
+ case DIE_FAULT:
+ /* err is vector number from ia64_fault() */
+ if (args->err == 36)
+ if (post_kprobes_handler(args->regs))
+ ret = NOTIFY_STOP;
+ break;
+ default:
+ break;
+ }
+ return ret;
+}
+
+struct param_bsp_cfm {
+ unsigned long ip;
+ unsigned long *bsp;
+ unsigned long cfm;
+};
+
+static void ia64_get_bsp_cfm(struct unw_frame_info *info, void *arg)
+{
+ unsigned long ip;
+ struct param_bsp_cfm *lp = arg;
+
+ do {
+ unw_get_ip(info, &ip);
+ if (ip == 0)
+ break;
+ if (ip == lp->ip) {
+ unw_get_bsp(info, (unsigned long*)&lp->bsp);
+ unw_get_cfm(info, (unsigned long*)&lp->cfm);
+ return;
+ }
+ } while (unw_unwind(info) >= 0);
+ lp->bsp = NULL;
+ lp->cfm = 0;
+ return;
+}
+
+unsigned long arch_deref_entry_point(void *entry)
+{
+ return ((struct fnptr *)entry)->ip;
+}
+
+int __kprobes setjmp_pre_handler(struct kprobe *p, struct pt_regs *regs)
+{
+ struct jprobe *jp = container_of(p, struct jprobe, kp);
+ unsigned long addr = arch_deref_entry_point(jp->entry);
+ struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
+ struct param_bsp_cfm pa;
+ int bytes;
+
+ /*
+	 * The callee owns the argument space and could overwrite it,
+	 * e.g. via tail-call optimization. So, to be absolutely safe,
+	 * we save the argument space before transferring control
+	 * to the instrumented jprobe function, which runs in
+	 * process context.
+ */
+ pa.ip = regs->cr_iip;
+ unw_init_running(ia64_get_bsp_cfm, &pa);
+ bytes = (char *)ia64_rse_skip_regs(pa.bsp, pa.cfm & 0x3f)
+ - (char *)pa.bsp;
+	memcpy(kcb->jprobes_saved_stacked_regs, pa.bsp, bytes);
+ kcb->bsp = pa.bsp;
+ kcb->cfm = pa.cfm;
+
+ /* save architectural state */
+ kcb->jprobe_saved_regs = *regs;
+
+ /* after rfi, execute the jprobe instrumented function */
+ regs->cr_iip = addr & ~0xFULL;
+ ia64_psr(regs)->ri = addr & 0xf;
+ regs->r1 = ((struct fnptr *)(jp->entry))->gp;
+
+ /*
+ * fix the return address to our jprobe_inst_return() function
+ * in the jprobes.S file
+ */
+ regs->b0 = ((struct fnptr *)(jprobe_inst_return))->ip;
+
+ return 1;
+}
+
+int __kprobes longjmp_break_handler(struct kprobe *p, struct pt_regs *regs)
+{
+ struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
+ int bytes;
+
+ /* restoring architectural state */
+ *regs = kcb->jprobe_saved_regs;
+
+ /* restoring the original argument space */
+ flush_register_stack();
+ bytes = (char *)ia64_rse_skip_regs(kcb->bsp, kcb->cfm & 0x3f)
+ - (char *)kcb->bsp;
+	memcpy(kcb->bsp, kcb->jprobes_saved_stacked_regs, bytes);
+ invalidate_stacked_regs();
+
+ preempt_enable_no_resched();
+ return 1;
+}
+
+static struct kprobe trampoline_p = {
+ .pre_handler = trampoline_probe_handler
+};
+
+int __init arch_init_kprobes(void)
+{
+ trampoline_p.addr =
+ (kprobe_opcode_t *)((struct fnptr *)kretprobe_trampoline)->ip;
+ return register_kprobe(&trampoline_p);
+}
+
+int __kprobes arch_trampoline_kprobe(struct kprobe *p)
+{
+ if (p->addr ==
+ (kprobe_opcode_t *)((struct fnptr *)kretprobe_trampoline)->ip)
+ return 1;
+
+ return 0;
+}
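
For readers reviewing the jprobe entry/exit handlers moved above, here is a
minimal sketch of a jprobe client built only on the generic kprobes API.
It is illustrative and not part of this patch; probing do_fork() and the
exact signature shown are assumptions for the example.

/*
 * Illustrative sketch (not part of this patch): a jprobe on do_fork().
 * The handler must mirror the probed function's signature and must end
 * with jprobe_return(), which triggers longjmp_break_handler() above.
 */
#include <linux/module.h>
#include <linux/kprobes.h>

static long jdo_fork(unsigned long clone_flags, unsigned long stack_start,
		     struct pt_regs *regs, unsigned long stack_size,
		     int __user *parent_tidptr, int __user *child_tidptr)
{
	printk(KERN_INFO "jprobe: clone_flags=0x%lx\n", clone_flags);
	jprobe_return();	/* mandatory; never returns normally */
	return 0;
}

static struct jprobe my_jprobe = {
	.entry = JPROBE_ENTRY(jdo_fork),
	.kp.symbol_name = "do_fork",
};

static int __init jprobe_example_init(void)
{
	return register_jprobe(&my_jprobe);
}

static void __exit jprobe_example_exit(void)
{
	unregister_jprobe(&my_jprobe);
}

module_init(jprobe_example_init);
module_exit(jprobe_example_exit);
MODULE_LICENSE("GPL");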
Index: linux-2.6-lttng.stable/arch/ia64/kernel/kprobes.c
===================================================================
--- linux-2.6-lttng.stable.orig/arch/ia64/kernel/kprobes.c 2007-10-29 09:51:07.000000000 -0400
+++ /dev/null 1970-01-01 00:00:00.000000000 +0000
@@ -1,1027 +0,0 @@
-/*
- * Kernel Probes (KProbes)
- * arch/ia64/kernel/kprobes.c
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, write to the Free Software
- * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
- *
- * Copyright (C) IBM Corporation, 2002, 2004
- * Copyright (C) Intel Corporation, 2005
- *
- * 2005-Apr Rusty Lynch <rusty.lynch@intel.com> and Anil S Keshavamurthy
- * <anil.s.keshavamurthy@intel.com> adapted from i386
- */
-
-#include <linux/kprobes.h>
-#include <linux/ptrace.h>
-#include <linux/string.h>
-#include <linux/slab.h>
-#include <linux/preempt.h>
-#include <linux/moduleloader.h>
-#include <linux/kdebug.h>
-
-#include <asm/pgtable.h>
-#include <asm/sections.h>
-#include <asm/uaccess.h>
-
-extern void jprobe_inst_return(void);
-
-DEFINE_PER_CPU(struct kprobe *, current_kprobe) = NULL;
-DEFINE_PER_CPU(struct kprobe_ctlblk, kprobe_ctlblk);
-
-struct kretprobe_blackpoint kretprobe_blacklist[] = {{NULL, NULL}};
-
-enum instruction_type {A, I, M, F, B, L, X, u};
-static enum instruction_type bundle_encoding[32][3] = {
- { M, I, I }, /* 00 */
- { M, I, I }, /* 01 */
- { M, I, I }, /* 02 */
- { M, I, I }, /* 03 */
- { M, L, X }, /* 04 */
- { M, L, X }, /* 05 */
- { u, u, u }, /* 06 */
- { u, u, u }, /* 07 */
- { M, M, I }, /* 08 */
- { M, M, I }, /* 09 */
- { M, M, I }, /* 0A */
- { M, M, I }, /* 0B */
- { M, F, I }, /* 0C */
- { M, F, I }, /* 0D */
- { M, M, F }, /* 0E */
- { M, M, F }, /* 0F */
- { M, I, B }, /* 10 */
- { M, I, B }, /* 11 */
- { M, B, B }, /* 12 */
- { M, B, B }, /* 13 */
- { u, u, u }, /* 14 */
- { u, u, u }, /* 15 */
- { B, B, B }, /* 16 */
- { B, B, B }, /* 17 */
- { M, M, B }, /* 18 */
- { M, M, B }, /* 19 */
- { u, u, u }, /* 1A */
- { u, u, u }, /* 1B */
- { M, F, B }, /* 1C */
- { M, F, B }, /* 1D */
- { u, u, u }, /* 1E */
- { u, u, u }, /* 1F */
-};
-
-/*
- * In this function we check to see if the instruction
- * is IP relative instruction and update the kprobe
- * inst flag accordingly
- */
-static void __kprobes update_kprobe_inst_flag(uint template, uint slot,
- uint major_opcode,
- unsigned long kprobe_inst,
- struct kprobe *p)
-{
- p->ainsn.inst_flag = 0;
- p->ainsn.target_br_reg = 0;
- p->ainsn.slot = slot;
-
- /* Check for Break instruction
- * Bits 37:40 Major opcode to be zero
- * Bits 27:32 X6 to be zero
- * Bits 32:35 X3 to be zero
- */
- if ((!major_opcode) && (!((kprobe_inst >> 27) & 0x1FF)) ) {
- /* is a break instruction */
- p->ainsn.inst_flag |= INST_FLAG_BREAK_INST;
- return;
- }
-
- if (bundle_encoding[template][slot] == B) {
- switch (major_opcode) {
- case INDIRECT_CALL_OPCODE:
- p->ainsn.inst_flag |= INST_FLAG_FIX_BRANCH_REG;
- p->ainsn.target_br_reg = ((kprobe_inst >> 6) & 0x7);
- break;
- case IP_RELATIVE_PREDICT_OPCODE:
- case IP_RELATIVE_BRANCH_OPCODE:
- p->ainsn.inst_flag |= INST_FLAG_FIX_RELATIVE_IP_ADDR;
- break;
- case IP_RELATIVE_CALL_OPCODE:
- p->ainsn.inst_flag |= INST_FLAG_FIX_RELATIVE_IP_ADDR;
- p->ainsn.inst_flag |= INST_FLAG_FIX_BRANCH_REG;
- p->ainsn.target_br_reg = ((kprobe_inst >> 6) & 0x7);
- break;
- }
- } else if (bundle_encoding[template][slot] == X) {
- switch (major_opcode) {
- case LONG_CALL_OPCODE:
- p->ainsn.inst_flag |= INST_FLAG_FIX_BRANCH_REG;
- p->ainsn.target_br_reg = ((kprobe_inst >> 6) & 0x7);
- break;
- }
- }
- return;
-}
-
-/*
- * In this function we check to see if the instruction
- * (qp) cmpx.crel.ctype p1,p2=r2,r3
- * on which we are inserting kprobe is cmp instruction
- * with ctype as unc.
- */
-static uint __kprobes is_cmp_ctype_unc_inst(uint template, uint slot,
- uint major_opcode,
- unsigned long kprobe_inst)
-{
- cmp_inst_t cmp_inst;
- uint ctype_unc = 0;
-
- if (!((bundle_encoding[template][slot] == I) ||
- (bundle_encoding[template][slot] == M)))
- goto out;
-
- if (!((major_opcode == 0xC) || (major_opcode == 0xD) ||
- (major_opcode == 0xE)))
- goto out;
-
- cmp_inst.l = kprobe_inst;
- if ((cmp_inst.f.x2 == 0) || (cmp_inst.f.x2 == 1)) {
- /* Integer compare - Register Register (A6 type)*/
- if ((cmp_inst.f.tb == 0) && (cmp_inst.f.ta == 0)
- &&(cmp_inst.f.c == 1))
- ctype_unc = 1;
- } else if ((cmp_inst.f.x2 == 2)||(cmp_inst.f.x2 == 3)) {
- /* Integer compare - Immediate Register (A8 type)*/
- if ((cmp_inst.f.ta == 0) &&(cmp_inst.f.c == 1))
- ctype_unc = 1;
- }
-out:
- return ctype_unc;
-}
-
-/*
- * In this function we check to see if the instruction
- * on which we are inserting kprobe is supported.
- * Returns qp value if supported
- * Returns -EINVAL if unsupported
- */
-static int __kprobes unsupported_inst(uint template, uint slot,
- uint major_opcode,
- unsigned long kprobe_inst,
- unsigned long addr)
-{
- int qp;
-
- qp = kprobe_inst & 0x3f;
- if (is_cmp_ctype_unc_inst(template, slot, major_opcode, kprobe_inst)) {
- if (slot == 1 && qp) {
- printk(KERN_WARNING "Kprobes on cmp unc"
- "instruction on slot 1 at <0x%lx>"
- "is not supported\n", addr);
- return -EINVAL;
-
- }
- qp = 0;
- }
- else if (bundle_encoding[template][slot] == I) {
- if (major_opcode == 0) {
- /*
- * Check for Integer speculation instruction
- * - Bit 33-35 to be equal to 0x1
- */
- if (((kprobe_inst >> 33) & 0x7) == 1) {
- printk(KERN_WARNING
- "Kprobes on speculation inst at <0x%lx> not supported\n",
- addr);
- return -EINVAL;
- }
- /*
- * IP relative mov instruction
- * - Bit 27-35 to be equal to 0x30
- */
- if (((kprobe_inst >> 27) & 0x1FF) == 0x30) {
- printk(KERN_WARNING
- "Kprobes on \"mov r1=ip\" at <0x%lx> not supported\n",
- addr);
- return -EINVAL;
-
- }
- }
- else if ((major_opcode == 5) && !(kprobe_inst & (0xFUl << 33)) &&
- (kprobe_inst & (0x1UL << 12))) {
- /* test bit instructions, tbit,tnat,tf
- * bit 33-36 to be equal to 0
- * bit 12 to be equal to 1
- */
- if (slot == 1 && qp) {
- printk(KERN_WARNING "Kprobes on test bit"
- "instruction on slot at <0x%lx>"
- "is not supported\n", addr);
- return -EINVAL;
- }
- qp = 0;
- }
- }
- else if (bundle_encoding[template][slot] == B) {
- if (major_opcode == 7) {
- /* IP-Relative Predict major code is 7 */
- printk(KERN_WARNING "Kprobes on IP-Relative"
- "Predict is not supported\n");
- return -EINVAL;
- }
- else if (major_opcode == 2) {
- /* Indirect Predict, major code is 2
- * bit 27-32 to be equal to 10 or 11
- */
- int x6=(kprobe_inst >> 27) & 0x3F;
- if ((x6 == 0x10) || (x6 == 0x11)) {
- printk(KERN_WARNING "Kprobes on"
- "Indirect Predict is not supported\n");
- return -EINVAL;
- }
- }
- }
- /* kernel does not use float instruction, here for safety kprobe
- * will judge whether it is fcmp/flass/float approximation instruction
- */
- else if (unlikely(bundle_encoding[template][slot] == F)) {
- if ((major_opcode == 4 || major_opcode == 5) &&
- (kprobe_inst & (0x1 << 12))) {
- /* fcmp/fclass unc instruction */
- if (slot == 1 && qp) {
- printk(KERN_WARNING "Kprobes on fcmp/fclass "
- "instruction on slot at <0x%lx> "
- "is not supported\n", addr);
- return -EINVAL;
-
- }
- qp = 0;
- }
- if ((major_opcode == 0 || major_opcode == 1) &&
- (kprobe_inst & (0x1UL << 33))) {
- /* float Approximation instruction */
- if (slot == 1 && qp) {
- printk(KERN_WARNING "Kprobes on float Approx "
- "instr at <0x%lx> is not supported\n",
- addr);
- return -EINVAL;
- }
- qp = 0;
- }
- }
- return qp;
-}
-
-/*
- * In this function we override the bundle with
- * the break instruction at the given slot.
- */
-static void __kprobes prepare_break_inst(uint template, uint slot,
- uint major_opcode,
- unsigned long kprobe_inst,
- struct kprobe *p,
- int qp)
-{
- unsigned long break_inst = BREAK_INST;
- bundle_t *bundle = &p->opcode.bundle;
-
- /*
- * Copy the original kprobe_inst qualifying predicate(qp)
- * to the break instruction
- */
- break_inst |= qp;
-
- switch (slot) {
- case 0:
- bundle->quad0.slot0 = break_inst;
- break;
- case 1:
- bundle->quad0.slot1_p0 = break_inst;
- bundle->quad1.slot1_p1 = break_inst >> (64-46);
- break;
- case 2:
- bundle->quad1.slot2 = break_inst;
- break;
- }
-
- /*
- * Update the instruction flag, so that we can
- * emulate the instruction properly after we
- * single step on original instruction
- */
- update_kprobe_inst_flag(template, slot, major_opcode, kprobe_inst, p);
-}
-
-static void __kprobes get_kprobe_inst(bundle_t *bundle, uint slot,
- unsigned long *kprobe_inst, uint *major_opcode)
-{
- unsigned long kprobe_inst_p0, kprobe_inst_p1;
- unsigned int template;
-
- template = bundle->quad0.template;
-
- switch (slot) {
- case 0:
- *major_opcode = (bundle->quad0.slot0 >> SLOT0_OPCODE_SHIFT);
- *kprobe_inst = bundle->quad0.slot0;
- break;
- case 1:
- *major_opcode = (bundle->quad1.slot1_p1 >> SLOT1_p1_OPCODE_SHIFT);
- kprobe_inst_p0 = bundle->quad0.slot1_p0;
- kprobe_inst_p1 = bundle->quad1.slot1_p1;
- *kprobe_inst = kprobe_inst_p0 | (kprobe_inst_p1 << (64-46));
- break;
- case 2:
- *major_opcode = (bundle->quad1.slot2 >> SLOT2_OPCODE_SHIFT);
- *kprobe_inst = bundle->quad1.slot2;
- break;
- }
-}
-
-/* Returns non-zero if the addr is in the Interrupt Vector Table */
-static int __kprobes in_ivt_functions(unsigned long addr)
-{
- return (addr >= (unsigned long)__start_ivt_text
- && addr < (unsigned long)__end_ivt_text);
-}
-
-static int __kprobes valid_kprobe_addr(int template, int slot,
- unsigned long addr)
-{
- if ((slot > 2) || ((bundle_encoding[template][1] == L) && slot > 1)) {
- printk(KERN_WARNING "Attempting to insert unaligned kprobe "
- "at 0x%lx\n", addr);
- return -EINVAL;
- }
-
- if (in_ivt_functions(addr)) {
- printk(KERN_WARNING "Kprobes can't be inserted inside "
- "IVT functions at 0x%lx\n", addr);
- return -EINVAL;
- }
-
- return 0;
-}
-
-static void __kprobes save_previous_kprobe(struct kprobe_ctlblk *kcb)
-{
- unsigned int i;
- i = atomic_add_return(1, &kcb->prev_kprobe_index);
- kcb->prev_kprobe[i-1].kp = kprobe_running();
- kcb->prev_kprobe[i-1].status = kcb->kprobe_status;
-}
-
-static void __kprobes restore_previous_kprobe(struct kprobe_ctlblk *kcb)
-{
- unsigned int i;
- i = atomic_sub_return(1, &kcb->prev_kprobe_index);
- __get_cpu_var(current_kprobe) = kcb->prev_kprobe[i].kp;
- kcb->kprobe_status = kcb->prev_kprobe[i].status;
-}
-
-static void __kprobes set_current_kprobe(struct kprobe *p,
- struct kprobe_ctlblk *kcb)
-{
- __get_cpu_var(current_kprobe) = p;
-}
-
-static void kretprobe_trampoline(void)
-{
-}
-
-/*
- * At this point the target function has been tricked into
- * returning into our trampoline. Lookup the associated instance
- * and then:
- * - call the handler function
- * - cleanup by marking the instance as unused
- * - long jump back to the original return address
- */
-int __kprobes trampoline_probe_handler(struct kprobe *p, struct pt_regs *regs)
-{
- struct kretprobe_instance *ri = NULL;
- struct hlist_head *head, empty_rp;
- struct hlist_node *node, *tmp;
- unsigned long flags, orig_ret_address = 0;
- unsigned long trampoline_address =
- ((struct fnptr *)kretprobe_trampoline)->ip;
-
- INIT_HLIST_HEAD(&empty_rp);
- spin_lock_irqsave(&kretprobe_lock, flags);
- head = kretprobe_inst_table_head(current);
-
- /*
- * It is possible to have multiple instances associated with a given
- * task either because an multiple functions in the call path
- * have a return probe installed on them, and/or more then one return
- * return probe was registered for a target function.
- *
- * We can handle this because:
- * - instances are always inserted at the head of the list
- * - when multiple return probes are registered for the same
- * function, the first instance's ret_addr will point to the
- * real return address, and all the rest will point to
- * kretprobe_trampoline
- */
- hlist_for_each_entry_safe(ri, node, tmp, head, hlist) {
- if (ri->task != current)
- /* another task is sharing our hash bucket */
- continue;
-
- if (ri->rp && ri->rp->handler)
- ri->rp->handler(ri, regs);
-
- orig_ret_address = (unsigned long)ri->ret_addr;
- recycle_rp_inst(ri, &empty_rp);
-
- if (orig_ret_address != trampoline_address)
- /*
- * This is the real return address. Any other
- * instances associated with this task are for
- * other calls deeper on the call stack
- */
- break;
- }
-
- kretprobe_assert(ri, orig_ret_address, trampoline_address);
-
- regs->cr_iip = orig_ret_address;
-
- reset_current_kprobe();
- spin_unlock_irqrestore(&kretprobe_lock, flags);
- preempt_enable_no_resched();
-
- hlist_for_each_entry_safe(ri, node, tmp, &empty_rp, hlist) {
- hlist_del(&ri->hlist);
- kfree(ri);
- }
- /*
- * By returning a non-zero value, we are telling
- * kprobe_handler() that we don't want the post_handler
- * to run (and have re-enabled preemption)
- */
- return 1;
-}
-
-/* Called with kretprobe_lock held */
-void __kprobes arch_prepare_kretprobe(struct kretprobe_instance *ri,
- struct pt_regs *regs)
-{
- ri->ret_addr = (kprobe_opcode_t *)regs->b0;
-
- /* Replace the return addr with trampoline addr */
- regs->b0 = ((struct fnptr *)kretprobe_trampoline)->ip;
-}
-
-int __kprobes arch_prepare_kprobe(struct kprobe *p)
-{
- unsigned long addr = (unsigned long) p->addr;
- unsigned long *kprobe_addr = (unsigned long *)(addr & ~0xFULL);
- unsigned long kprobe_inst=0;
- unsigned int slot = addr & 0xf, template, major_opcode = 0;
- bundle_t *bundle;
- int qp;
-
- bundle = &((kprobe_opcode_t *)kprobe_addr)->bundle;
- template = bundle->quad0.template;
-
- if(valid_kprobe_addr(template, slot, addr))
- return -EINVAL;
-
- /* Move to slot 2, if bundle is MLX type and kprobe slot is 1 */
- if (slot == 1 && bundle_encoding[template][1] == L)
- slot++;
-
- /* Get kprobe_inst and major_opcode from the bundle */
- get_kprobe_inst(bundle, slot, &kprobe_inst, &major_opcode);
-
- qp = unsupported_inst(template, slot, major_opcode, kprobe_inst, addr);
- if (qp < 0)
- return -EINVAL;
-
- p->ainsn.insn = get_insn_slot();
- if (!p->ainsn.insn)
- return -ENOMEM;
- memcpy(&p->opcode, kprobe_addr, sizeof(kprobe_opcode_t));
- memcpy(p->ainsn.insn, kprobe_addr, sizeof(kprobe_opcode_t));
-
- prepare_break_inst(template, slot, major_opcode, kprobe_inst, p, qp);
-
- return 0;
-}
-
-void __kprobes arch_arm_kprobe(struct kprobe *p)
-{
- unsigned long arm_addr;
- bundle_t *src, *dest;
-
- arm_addr = ((unsigned long)p->addr) & ~0xFUL;
- dest = &((kprobe_opcode_t *)arm_addr)->bundle;
- src = &p->opcode.bundle;
-
- flush_icache_range((unsigned long)p->ainsn.insn,
- (unsigned long)p->ainsn.insn + sizeof(kprobe_opcode_t));
- switch (p->ainsn.slot) {
- case 0:
- dest->quad0.slot0 = src->quad0.slot0;
- break;
- case 1:
- dest->quad1.slot1_p1 = src->quad1.slot1_p1;
- break;
- case 2:
- dest->quad1.slot2 = src->quad1.slot2;
- break;
- }
- flush_icache_range(arm_addr, arm_addr + sizeof(kprobe_opcode_t));
-}
-
-void __kprobes arch_disarm_kprobe(struct kprobe *p)
-{
- unsigned long arm_addr;
- bundle_t *src, *dest;
-
- arm_addr = ((unsigned long)p->addr) & ~0xFUL;
- dest = &((kprobe_opcode_t *)arm_addr)->bundle;
- /* p->ainsn.insn contains the original unaltered kprobe_opcode_t */
- src = &p->ainsn.insn->bundle;
- switch (p->ainsn.slot) {
- case 0:
- dest->quad0.slot0 = src->quad0.slot0;
- break;
- case 1:
- dest->quad1.slot1_p1 = src->quad1.slot1_p1;
- break;
- case 2:
- dest->quad1.slot2 = src->quad1.slot2;
- break;
- }
- flush_icache_range(arm_addr, arm_addr + sizeof(kprobe_opcode_t));
-}
-
-void __kprobes arch_remove_kprobe(struct kprobe *p)
-{
- mutex_lock(&kprobe_mutex);
- free_insn_slot(p->ainsn.insn, 0);
- mutex_unlock(&kprobe_mutex);
-}
-/*
- * We are resuming execution after a single step fault, so the pt_regs
- * structure reflects the register state after we executed the instruction
- * located in the kprobe (p->ainsn.insn.bundle). We still need to adjust
- * the ip to point back to the original stack address. To set the IP address
- * to original stack address, handle the case where we need to fixup the
- * relative IP address and/or fixup branch register.
- */
-static void __kprobes resume_execution(struct kprobe *p, struct pt_regs *regs)
-{
- unsigned long bundle_addr = (unsigned long) (&p->ainsn.insn->bundle);
- unsigned long resume_addr = (unsigned long)p->addr & ~0xFULL;
- unsigned long template;
- int slot = ((unsigned long)p->addr & 0xf);
-
- template = p->ainsn.insn->bundle.quad0.template;
-
- if (slot == 1 && bundle_encoding[template][1] == L)
- slot = 2;
-
- if (p->ainsn.inst_flag) {
-
- if (p->ainsn.inst_flag & INST_FLAG_FIX_RELATIVE_IP_ADDR) {
- /* Fix relative IP address */
- regs->cr_iip = (regs->cr_iip - bundle_addr) +
- resume_addr;
- }
-
- if (p->ainsn.inst_flag & INST_FLAG_FIX_BRANCH_REG) {
- /*
- * Fix target branch register, software convention is
- * to use either b0 or b6 or b7, so just checking
- * only those registers
- */
- switch (p->ainsn.target_br_reg) {
- case 0:
- if ((regs->b0 == bundle_addr) ||
- (regs->b0 == bundle_addr + 0x10)) {
- regs->b0 = (regs->b0 - bundle_addr) +
- resume_addr;
- }
- break;
- case 6:
- if ((regs->b6 == bundle_addr) ||
- (regs->b6 == bundle_addr + 0x10)) {
- regs->b6 = (regs->b6 - bundle_addr) +
- resume_addr;
- }
- break;
- case 7:
- if ((regs->b7 == bundle_addr) ||
- (regs->b7 == bundle_addr + 0x10)) {
- regs->b7 = (regs->b7 - bundle_addr) +
- resume_addr;
- }
- break;
- } /* end switch */
- }
- goto turn_ss_off;
- }
-
- if (slot == 2) {
- if (regs->cr_iip == bundle_addr + 0x10) {
- regs->cr_iip = resume_addr + 0x10;
- }
- } else {
- if (regs->cr_iip == bundle_addr) {
- regs->cr_iip = resume_addr;
- }
- }
-
-turn_ss_off:
- /* Turn off Single Step bit */
- ia64_psr(regs)->ss = 0;
-}
-
-static void __kprobes prepare_ss(struct kprobe *p, struct pt_regs *regs)
-{
- unsigned long bundle_addr = (unsigned long) &p->ainsn.insn->bundle;
- unsigned long slot = (unsigned long)p->addr & 0xf;
-
- /* single step inline if break instruction */
- if (p->ainsn.inst_flag == INST_FLAG_BREAK_INST)
- regs->cr_iip = (unsigned long)p->addr & ~0xFULL;
- else
- regs->cr_iip = bundle_addr & ~0xFULL;
-
- if (slot > 2)
- slot = 0;
-
- ia64_psr(regs)->ri = slot;
-
- /* turn on single stepping */
- ia64_psr(regs)->ss = 1;
-}
-
-static int __kprobes is_ia64_break_inst(struct pt_regs *regs)
-{
- unsigned int slot = ia64_psr(regs)->ri;
- unsigned int template, major_opcode;
- unsigned long kprobe_inst;
- unsigned long *kprobe_addr = (unsigned long *)regs->cr_iip;
- bundle_t bundle;
-
- memcpy(&bundle, kprobe_addr, sizeof(bundle_t));
- template = bundle.quad0.template;
-
- /* Move to slot 2, if bundle is MLX type and kprobe slot is 1 */
- if (slot == 1 && bundle_encoding[template][1] == L)
- slot++;
-
- /* Get Kprobe probe instruction at given slot*/
- get_kprobe_inst(&bundle, slot, &kprobe_inst, &major_opcode);
-
- /* For break instruction,
- * Bits 37:40 Major opcode to be zero
- * Bits 27:32 X6 to be zero
- * Bits 32:35 X3 to be zero
- */
- if (major_opcode || ((kprobe_inst >> 27) & 0x1FF) ) {
- /* Not a break instruction */
- return 0;
- }
-
- /* Is a break instruction */
- return 1;
-}
-
-static int __kprobes pre_kprobes_handler(struct die_args *args)
-{
- struct kprobe *p;
- int ret = 0;
- struct pt_regs *regs = args->regs;
- kprobe_opcode_t *addr = (kprobe_opcode_t *)instruction_pointer(regs);
- struct kprobe_ctlblk *kcb;
-
- /*
- * We don't want to be preempted for the entire
- * duration of kprobe processing
- */
- preempt_disable();
- kcb = get_kprobe_ctlblk();
-
- /* Handle recursion cases */
- if (kprobe_running()) {
- p = get_kprobe(addr);
- if (p) {
- if ((kcb->kprobe_status == KPROBE_HIT_SS) &&
- (p->ainsn.inst_flag == INST_FLAG_BREAK_INST)) {
- ia64_psr(regs)->ss = 0;
- goto no_kprobe;
- }
- /* We have reentered the pre_kprobe_handler(), since
- * another probe was hit while within the handler.
- * We here save the original kprobes variables and
- * just single step on the instruction of the new probe
- * without calling any user handlers.
- */
- save_previous_kprobe(kcb);
- set_current_kprobe(p, kcb);
- kprobes_inc_nmissed_count(p);
- prepare_ss(p, regs);
- kcb->kprobe_status = KPROBE_REENTER;
- return 1;
- } else if (args->err == __IA64_BREAK_JPROBE) {
- /*
- * jprobe instrumented function just completed
- */
- p = __get_cpu_var(current_kprobe);
- if (p->break_handler && p->break_handler(p, regs)) {
- goto ss_probe;
- }
- } else if (!is_ia64_break_inst(regs)) {
- /* The breakpoint instruction was removed by
- * another cpu right after we hit, no further
- * handling of this interrupt is appropriate
- */
- ret = 1;
- goto no_kprobe;
- } else {
- /* Not our break */
- goto no_kprobe;
- }
- }
-
- p = get_kprobe(addr);
- if (!p) {
- if (!is_ia64_break_inst(regs)) {
- /*
- * The breakpoint instruction was removed right
- * after we hit it. Another cpu has removed
- * either a probepoint or a debugger breakpoint
- * at this address. In either case, no further
- * handling of this interrupt is appropriate.
- */
- ret = 1;
-
- }
-
- /* Not one of our break, let kernel handle it */
- goto no_kprobe;
- }
-
- set_current_kprobe(p, kcb);
- kcb->kprobe_status = KPROBE_HIT_ACTIVE;
-
- if (p->pre_handler && p->pre_handler(p, regs))
- /*
- * Our pre-handler is specifically requesting that we just
- * do a return. This is used for both the jprobe pre-handler
- * and the kretprobe trampoline
- */
- return 1;
-
-ss_probe:
- prepare_ss(p, regs);
- kcb->kprobe_status = KPROBE_HIT_SS;
- return 1;
-
-no_kprobe:
- preempt_enable_no_resched();
- return ret;
-}
-
-static int __kprobes post_kprobes_handler(struct pt_regs *regs)
-{
- struct kprobe *cur = kprobe_running();
- struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
-
- if (!cur)
- return 0;
-
- if ((kcb->kprobe_status != KPROBE_REENTER) && cur->post_handler) {
- kcb->kprobe_status = KPROBE_HIT_SSDONE;
- cur->post_handler(cur, regs, 0);
- }
-
- resume_execution(cur, regs);
-
- /*Restore back the original saved kprobes variables and continue. */
- if (kcb->kprobe_status == KPROBE_REENTER) {
- restore_previous_kprobe(kcb);
- goto out;
- }
- reset_current_kprobe();
-
-out:
- preempt_enable_no_resched();
- return 1;
-}
-
-int __kprobes kprobes_fault_handler(struct pt_regs *regs, int trapnr)
-{
- struct kprobe *cur = kprobe_running();
- struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
-
-
- switch(kcb->kprobe_status) {
- case KPROBE_HIT_SS:
- case KPROBE_REENTER:
- /*
- * We are here because the instruction being single
- * stepped caused a page fault. We reset the current
- * kprobe and the instruction pointer points back to
- * the probe address and allow the page fault handler
- * to continue as a normal page fault.
- */
- regs->cr_iip = ((unsigned long)cur->addr) & ~0xFULL;
- ia64_psr(regs)->ri = ((unsigned long)cur->addr) & 0xf;
- if (kcb->kprobe_status == KPROBE_REENTER)
- restore_previous_kprobe(kcb);
- else
- reset_current_kprobe();
- preempt_enable_no_resched();
- break;
- case KPROBE_HIT_ACTIVE:
- case KPROBE_HIT_SSDONE:
- /*
- * We increment the nmissed count for accounting,
- * we can also use npre/npostfault count for accouting
- * these specific fault cases.
- */
- kprobes_inc_nmissed_count(cur);
-
- /*
- * We come here because instructions in the pre/post
- * handler caused the page_fault, this could happen
- * if handler tries to access user space by
- * copy_from_user(), get_user() etc. Let the
- * user-specified handler try to fix it first.
- */
- if (cur->fault_handler && cur->fault_handler(cur, regs, trapnr))
- return 1;
- /*
- * In case the user-specified fault handler returned
- * zero, try to fix up.
- */
- if (ia64_done_with_exception(regs))
- return 1;
-
- /*
- * Let ia64_do_page_fault() fix it.
- */
- break;
- default:
- break;
- }
-
- return 0;
-}
-
-int __kprobes kprobe_exceptions_notify(struct notifier_block *self,
- unsigned long val, void *data)
-{
- struct die_args *args = (struct die_args *)data;
- int ret = NOTIFY_DONE;
-
- if (args->regs && user_mode(args->regs))
- return ret;
-
- switch(val) {
- case DIE_BREAK:
- /* err is break number from ia64_bad_break() */
- if ((args->err >> 12) == (__IA64_BREAK_KPROBE >> 12)
- || args->err == __IA64_BREAK_JPROBE
- || args->err == 0)
- if (pre_kprobes_handler(args))
- ret = NOTIFY_STOP;
- break;
- case DIE_FAULT:
- /* err is vector number from ia64_fault() */
- if (args->err == 36)
- if (post_kprobes_handler(args->regs))
- ret = NOTIFY_STOP;
- break;
- default:
- break;
- }
- return ret;
-}
-
-struct param_bsp_cfm {
- unsigned long ip;
- unsigned long *bsp;
- unsigned long cfm;
-};
-
-static void ia64_get_bsp_cfm(struct unw_frame_info *info, void *arg)
-{
- unsigned long ip;
- struct param_bsp_cfm *lp = arg;
-
- do {
- unw_get_ip(info, &ip);
- if (ip == 0)
- break;
- if (ip == lp->ip) {
- unw_get_bsp(info, (unsigned long*)&lp->bsp);
- unw_get_cfm(info, (unsigned long*)&lp->cfm);
- return;
- }
- } while (unw_unwind(info) >= 0);
- lp->bsp = NULL;
- lp->cfm = 0;
- return;
-}
-
-unsigned long arch_deref_entry_point(void *entry)
-{
- return ((struct fnptr *)entry)->ip;
-}
-
-int __kprobes setjmp_pre_handler(struct kprobe *p, struct pt_regs *regs)
-{
- struct jprobe *jp = container_of(p, struct jprobe, kp);
- unsigned long addr = arch_deref_entry_point(jp->entry);
- struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
- struct param_bsp_cfm pa;
- int bytes;
-
- /*
- * Callee owns the argument space and could overwrite it, eg
- * tail call optimization. So to be absolutely safe
- * we save the argument space before transferring the control
- * to instrumented jprobe function which runs in
- * the process context
- */
- pa.ip = regs->cr_iip;
- unw_init_running(ia64_get_bsp_cfm, &pa);
- bytes = (char *)ia64_rse_skip_regs(pa.bsp, pa.cfm & 0x3f)
- - (char *)pa.bsp;
- memcpy( kcb->jprobes_saved_stacked_regs,
- pa.bsp,
- bytes );
- kcb->bsp = pa.bsp;
- kcb->cfm = pa.cfm;
-
- /* save architectural state */
- kcb->jprobe_saved_regs = *regs;
-
- /* after rfi, execute the jprobe instrumented function */
- regs->cr_iip = addr & ~0xFULL;
- ia64_psr(regs)->ri = addr & 0xf;
- regs->r1 = ((struct fnptr *)(jp->entry))->gp;
-
- /*
- * fix the return address to our jprobe_inst_return() function
- * in the jprobes.S file
- */
- regs->b0 = ((struct fnptr *)(jprobe_inst_return))->ip;
-
- return 1;
-}
-
-int __kprobes longjmp_break_handler(struct kprobe *p, struct pt_regs *regs)
-{
- struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
- int bytes;
-
- /* restoring architectural state */
- *regs = kcb->jprobe_saved_regs;
-
- /* restoring the original argument space */
- flush_register_stack();
- bytes = (char *)ia64_rse_skip_regs(kcb->bsp, kcb->cfm & 0x3f)
- - (char *)kcb->bsp;
- memcpy( kcb->bsp,
- kcb->jprobes_saved_stacked_regs,
- bytes );
- invalidate_stacked_regs();
-
- preempt_enable_no_resched();
- return 1;
-}
-
-static struct kprobe trampoline_p = {
- .pre_handler = trampoline_probe_handler
-};
-
-int __init arch_init_kprobes(void)
-{
- trampoline_p.addr =
- (kprobe_opcode_t *)((struct fnptr *)kretprobe_trampoline)->ip;
- return register_kprobe(&trampoline_p);
-}
-
-int __kprobes arch_trampoline_kprobe(struct kprobe *p)
-{
- if (p->addr ==
- (kprobe_opcode_t *)((struct fnptr *)kretprobe_trampoline)->ip)
- return 1;
-
- return 0;
-}
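
The hunk above removes the old copy of the ia64 code, including the
kretprobe trampoline machinery. As a reference point, a minimal kretprobe
client looks roughly like the sketch below; it is illustrative only and
not part of this patch, and the do_fork target and maxactive value are
arbitrary choices.

/*
 * Illustrative sketch (not part of this patch): a return probe whose
 * handler fires from trampoline_probe_handler() when do_fork() returns.
 */
#include <linux/module.h>
#include <linux/kprobes.h>

static int ret_handler(struct kretprobe_instance *ri, struct pt_regs *regs)
{
	/* ri->ret_addr was recorded by arch_prepare_kretprobe() */
	printk(KERN_INFO "do_fork returned to %p\n", ri->ret_addr);
	return 0;
}

static struct kretprobe my_kretprobe = {
	.handler	= ret_handler,
	.kp.symbol_name	= "do_fork",
	.maxactive	= 20,	/* concurrent instances to track */
};

static int __init kretprobe_example_init(void)
{
	return register_kretprobe(&my_kretprobe);
}

static void __exit kretprobe_example_exit(void)
{
	unregister_kretprobe(&my_kretprobe);
}

module_init(kretprobe_example_init);
module_exit(kretprobe_example_exit);
MODULE_LICENSE("GPL");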
Index: linux-2.6-lttng.stable/arch/powerpc/instrumentation/kprobes.c
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng.stable/arch/powerpc/instrumentation/kprobes.c 2007-10-29 09:51:38.000000000 -0400
@@ -0,0 +1,559 @@
+/*
+ * Kernel Probes (KProbes)
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) IBM Corporation, 2002, 2004
+ *
+ * 2002-Oct Created by Vamsi Krishna S <vamsi_krishna@in.ibm.com> Kernel
+ * Probes initial implementation ( includes contributions from
+ * Rusty Russell).
+ * 2004-July Suparna Bhattacharya <suparna@in.ibm.com> added jumper probes
+ * interface to access function arguments.
+ * 2004-Nov Ananth N Mavinakayanahalli <ananth@in.ibm.com> kprobes port
+ * for PPC64
+ */
+
+#include <linux/kprobes.h>
+#include <linux/ptrace.h>
+#include <linux/preempt.h>
+#include <linux/module.h>
+#include <linux/kdebug.h>
+#include <asm/cacheflush.h>
+#include <asm/sstep.h>
+#include <asm/uaccess.h>
+
+DEFINE_PER_CPU(struct kprobe *, current_kprobe) = NULL;
+DEFINE_PER_CPU(struct kprobe_ctlblk, kprobe_ctlblk);
+
+struct kretprobe_blackpoint kretprobe_blacklist[] = {{NULL, NULL}};
+
+int __kprobes arch_prepare_kprobe(struct kprobe *p)
+{
+ int ret = 0;
+ kprobe_opcode_t insn = *p->addr;
+
+ if ((unsigned long)p->addr & 0x03) {
+ printk("Attempt to register kprobe at an unaligned address\n");
+ ret = -EINVAL;
+ } else if (IS_MTMSRD(insn) || IS_RFID(insn) || IS_RFI(insn)) {
+ printk("Cannot register a kprobe on rfi/rfid or mtmsr[d]\n");
+ ret = -EINVAL;
+ }
+
+ /* insn must be on a special executable page on ppc64 */
+ if (!ret) {
+ p->ainsn.insn = get_insn_slot();
+ if (!p->ainsn.insn)
+ ret = -ENOMEM;
+ }
+
+ if (!ret) {
+ memcpy(p->ainsn.insn, p->addr,
+ MAX_INSN_SIZE * sizeof(kprobe_opcode_t));
+ p->opcode = *p->addr;
+ flush_icache_range((unsigned long)p->ainsn.insn,
+ (unsigned long)p->ainsn.insn + sizeof(kprobe_opcode_t));
+ }
+
+ p->ainsn.boostable = 0;
+ return ret;
+}
+
+void __kprobes arch_arm_kprobe(struct kprobe *p)
+{
+ *p->addr = BREAKPOINT_INSTRUCTION;
+ flush_icache_range((unsigned long) p->addr,
+ (unsigned long) p->addr + sizeof(kprobe_opcode_t));
+}
+
+void __kprobes arch_disarm_kprobe(struct kprobe *p)
+{
+ *p->addr = p->opcode;
+ flush_icache_range((unsigned long) p->addr,
+ (unsigned long) p->addr + sizeof(kprobe_opcode_t));
+}
+
+void __kprobes arch_remove_kprobe(struct kprobe *p)
+{
+ mutex_lock(&kprobe_mutex);
+ free_insn_slot(p->ainsn.insn, 0);
+ mutex_unlock(&kprobe_mutex);
+}
+
+static void __kprobes prepare_singlestep(struct kprobe *p, struct pt_regs *regs)
+{
+ regs->msr |= MSR_SE;
+
+ /*
+ * On powerpc we should single step on the original
+ * instruction even if the probed insn is a trap
+	 * variant, as values in regs could play a part in
+	 * whether the trap is taken or not
+ */
+ regs->nip = (unsigned long)p->ainsn.insn;
+}
+
+static void __kprobes save_previous_kprobe(struct kprobe_ctlblk *kcb)
+{
+ kcb->prev_kprobe.kp = kprobe_running();
+ kcb->prev_kprobe.status = kcb->kprobe_status;
+ kcb->prev_kprobe.saved_msr = kcb->kprobe_saved_msr;
+}
+
+static void __kprobes restore_previous_kprobe(struct kprobe_ctlblk *kcb)
+{
+ __get_cpu_var(current_kprobe) = kcb->prev_kprobe.kp;
+ kcb->kprobe_status = kcb->prev_kprobe.status;
+ kcb->kprobe_saved_msr = kcb->prev_kprobe.saved_msr;
+}
+
+static void __kprobes set_current_kprobe(struct kprobe *p, struct pt_regs *regs,
+ struct kprobe_ctlblk *kcb)
+{
+ __get_cpu_var(current_kprobe) = p;
+ kcb->kprobe_saved_msr = regs->msr;
+}
+
+/* Called with kretprobe_lock held */
+void __kprobes arch_prepare_kretprobe(struct kretprobe_instance *ri,
+ struct pt_regs *regs)
+{
+ ri->ret_addr = (kprobe_opcode_t *)regs->link;
+
+ /* Replace the return addr with trampoline addr */
+ regs->link = (unsigned long)kretprobe_trampoline;
+}
+
+static int __kprobes kprobe_handler(struct pt_regs *regs)
+{
+ struct kprobe *p;
+ int ret = 0;
+ unsigned int *addr = (unsigned int *)regs->nip;
+ struct kprobe_ctlblk *kcb;
+
+ /*
+ * We don't want to be preempted for the entire
+ * duration of kprobe processing
+ */
+ preempt_disable();
+ kcb = get_kprobe_ctlblk();
+
+ /* Check we're not actually recursing */
+ if (kprobe_running()) {
+ p = get_kprobe(addr);
+ if (p) {
+ kprobe_opcode_t insn = *p->ainsn.insn;
+ if (kcb->kprobe_status == KPROBE_HIT_SS &&
+ is_trap(insn)) {
+ regs->msr &= ~MSR_SE;
+ regs->msr |= kcb->kprobe_saved_msr;
+ goto no_kprobe;
+ }
+ /* We have reentered the kprobe_handler(), since
+ * another probe was hit while within the handler.
+			 * Here we save the original kprobes variables and
+			 * just single-step on the instruction of the new probe
+ * without calling any user handlers.
+ */
+ save_previous_kprobe(kcb);
+ set_current_kprobe(p, regs, kcb);
+ kcb->kprobe_saved_msr = regs->msr;
+ kprobes_inc_nmissed_count(p);
+ prepare_singlestep(p, regs);
+ kcb->kprobe_status = KPROBE_REENTER;
+ return 1;
+ } else {
+ if (*addr != BREAKPOINT_INSTRUCTION) {
+				/* If it is a trap variant, it does not belong to us */
+ kprobe_opcode_t cur_insn = *addr;
+ if (is_trap(cur_insn))
+ goto no_kprobe;
+				/* The breakpoint instruction was removed by
+				 * another cpu right after we hit it; no further
+				 * handling of this interrupt is appropriate.
+				 */
+ ret = 1;
+ goto no_kprobe;
+ }
+ p = __get_cpu_var(current_kprobe);
+ if (p->break_handler && p->break_handler(p, regs)) {
+ goto ss_probe;
+ }
+ }
+ goto no_kprobe;
+ }
+
+ p = get_kprobe(addr);
+ if (!p) {
+ if (*addr != BREAKPOINT_INSTRUCTION) {
+ /*
+ * PowerPC has multiple variants of the "trap"
+ * instruction. If the current instruction is a
+ * trap variant, it could belong to someone else
+ */
+ kprobe_opcode_t cur_insn = *addr;
+ if (is_trap(cur_insn))
+ goto no_kprobe;
+ /*
+ * The breakpoint instruction was removed right
+ * after we hit it. Another cpu has removed
+ * either a probepoint or a debugger breakpoint
+ * at this address. In either case, no further
+ * handling of this interrupt is appropriate.
+ */
+ ret = 1;
+ }
+ /* Not one of ours: let kernel handle it */
+ goto no_kprobe;
+ }
+
+ kcb->kprobe_status = KPROBE_HIT_ACTIVE;
+ set_current_kprobe(p, regs, kcb);
+ if (p->pre_handler && p->pre_handler(p, regs))
+ /* handler has already set things up, so skip ss setup */
+ return 1;
+
+ss_probe:
+ if (p->ainsn.boostable >= 0) {
+ unsigned int insn = *p->ainsn.insn;
+
+ /* regs->nip is also adjusted if emulate_step returns 1 */
+ ret = emulate_step(regs, insn);
+ if (ret > 0) {
+ /*
+ * Once this instruction has been boosted
+ * successfully, set the boostable flag
+ */
+ if (unlikely(p->ainsn.boostable == 0))
+ p->ainsn.boostable = 1;
+
+ if (p->post_handler)
+ p->post_handler(p, regs, 0);
+
+ kcb->kprobe_status = KPROBE_HIT_SSDONE;
+ reset_current_kprobe();
+ preempt_enable_no_resched();
+ return 1;
+ } else if (ret < 0) {
+ /*
+ * We don't allow kprobes on mtmsr(d)/rfi(d), etc.
+			 * So, we should never get here... but it's still
+			 * good to catch them, just in case...
+ */
+ printk("Can't step on instruction %x\n", insn);
+ BUG();
+ } else if (ret == 0)
+ /* This instruction can't be boosted */
+ p->ainsn.boostable = -1;
+ }
+ prepare_singlestep(p, regs);
+ kcb->kprobe_status = KPROBE_HIT_SS;
+ return 1;
+
+no_kprobe:
+ preempt_enable_no_resched();
+ return ret;
+}
+
+/*
+ * Function return probe trampoline:
+ * - init_kprobes() establishes a probepoint here
+ * - When the probed function returns, this probe
+ * causes the handlers to fire
+ */
+void kretprobe_trampoline_holder(void)
+{
+ asm volatile(".global kretprobe_trampoline\n"
+ "kretprobe_trampoline:\n"
+ "nop\n");
+}
+
+/*
+ * Called when the probe at kretprobe trampoline is hit
+ */
+int __kprobes trampoline_probe_handler(struct kprobe *p, struct pt_regs *regs)
+{
+ struct kretprobe_instance *ri = NULL;
+ struct hlist_head *head, empty_rp;
+ struct hlist_node *node, *tmp;
+ unsigned long flags, orig_ret_address = 0;
+	unsigned long trampoline_address = (unsigned long)&kretprobe_trampoline;
+
+ INIT_HLIST_HEAD(&empty_rp);
+ spin_lock_irqsave(&kretprobe_lock, flags);
+ head = kretprobe_inst_table_head(current);
+
+ /*
+ * It is possible to have multiple instances associated with a given
+	 * task, either because multiple functions in the call path
+	 * have a return probe installed on them, and/or because more
+	 * than one return probe was registered for a target function.
+ *
+ * We can handle this because:
+ * - instances are always inserted at the head of the list
+ * - when multiple return probes are registered for the same
+ * function, the first instance's ret_addr will point to the
+ * real return address, and all the rest will point to
+ * kretprobe_trampoline
+ */
+ hlist_for_each_entry_safe(ri, node, tmp, head, hlist) {
+ if (ri->task != current)
+ /* another task is sharing our hash bucket */
+ continue;
+
+ if (ri->rp && ri->rp->handler)
+ ri->rp->handler(ri, regs);
+
+ orig_ret_address = (unsigned long)ri->ret_addr;
+ recycle_rp_inst(ri, &empty_rp);
+
+ if (orig_ret_address != trampoline_address)
+ /*
+ * This is the real return address. Any other
+ * instances associated with this task are for
+ * other calls deeper on the call stack
+ */
+ break;
+ }
+
+ kretprobe_assert(ri, orig_ret_address, trampoline_address);
+ regs->nip = orig_ret_address;
+
+ reset_current_kprobe();
+ spin_unlock_irqrestore(&kretprobe_lock, flags);
+ preempt_enable_no_resched();
+
+ hlist_for_each_entry_safe(ri, node, tmp, &empty_rp, hlist) {
+ hlist_del(&ri->hlist);
+ kfree(ri);
+ }
+ /*
+ * By returning a non-zero value, we are telling
+ * kprobe_handler() that we don't want the post_handler
+ * to run (and have re-enabled preemption)
+ */
+ return 1;
+}
+
+/*
+ * Called after single-stepping. p->addr is the address of the
+ * instruction whose first byte has been replaced by the "breakpoint"
+ * instruction. To avoid the SMP problems that can occur when we
+ * temporarily put back the original opcode to single-step, we
+ * single-stepped a copy of the instruction. The address of this
+ * copy is p->ainsn.insn.
+ */
+static void __kprobes resume_execution(struct kprobe *p, struct pt_regs *regs)
+{
+ int ret;
+ unsigned int insn = *p->ainsn.insn;
+
+ regs->nip = (unsigned long)p->addr;
+ ret = emulate_step(regs, insn);
+ if (ret == 0)
+ regs->nip = (unsigned long)p->addr + 4;
+}
+
+static int __kprobes post_kprobe_handler(struct pt_regs *regs)
+{
+ struct kprobe *cur = kprobe_running();
+ struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
+
+ if (!cur)
+ return 0;
+
+ if ((kcb->kprobe_status != KPROBE_REENTER) && cur->post_handler) {
+ kcb->kprobe_status = KPROBE_HIT_SSDONE;
+ cur->post_handler(cur, regs, 0);
+ }
+
+ resume_execution(cur, regs);
+ regs->msr |= kcb->kprobe_saved_msr;
+
+	/* Restore the original saved kprobes variables and continue. */
+ if (kcb->kprobe_status == KPROBE_REENTER) {
+ restore_previous_kprobe(kcb);
+ goto out;
+ }
+ reset_current_kprobe();
+out:
+ preempt_enable_no_resched();
+
+ /*
+	 * if somebody else is single-stepping across a probe point, msr
+	 * will have SE set; in that case, continue the remaining processing
+	 * of do_debug, as if this were not a probe hit.
+ */
+ if (regs->msr & MSR_SE)
+ return 0;
+
+ return 1;
+}
+
+int __kprobes kprobe_fault_handler(struct pt_regs *regs, int trapnr)
+{
+ struct kprobe *cur = kprobe_running();
+ struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
+ const struct exception_table_entry *entry;
+
+	switch (kcb->kprobe_status) {
+ case KPROBE_HIT_SS:
+ case KPROBE_REENTER:
+ /*
+		 * We are here because the instruction being single-stepped
+		 * caused a page fault. We reset the current kprobe so that
+		 * the nip points back to the probe address, and allow the
+		 * page fault handler to continue as a normal page fault.
+ */
+ regs->nip = (unsigned long)cur->addr;
+ regs->msr &= ~MSR_SE;
+ regs->msr |= kcb->kprobe_saved_msr;
+ if (kcb->kprobe_status == KPROBE_REENTER)
+ restore_previous_kprobe(kcb);
+ else
+ reset_current_kprobe();
+ preempt_enable_no_resched();
+ break;
+ case KPROBE_HIT_ACTIVE:
+ case KPROBE_HIT_SSDONE:
+ /*
+		 * We increment the nmissed count for accounting;
+		 * we could also use the npre/npostfault counts to
+		 * account for these specific fault cases.
+ */
+ kprobes_inc_nmissed_count(cur);
+
+ /*
+ * We come here because instructions in the pre/post
+		 * handler caused the page fault. This could happen
+		 * if the handler tries to access user space via
+		 * copy_from_user(), get_user(), etc. Let the
+ * user-specified handler try to fix it first.
+ */
+ if (cur->fault_handler && cur->fault_handler(cur, regs, trapnr))
+ return 1;
+
+ /*
+ * In case the user-specified fault handler returned
+ * zero, try to fix up.
+ */
+ if ((entry = search_exception_tables(regs->nip)) != NULL) {
+ regs->nip = entry->fixup;
+ return 1;
+ }
+
+ /*
+		 * fixup_exception() could not handle it;
+		 * let do_page_fault() fix it.
+ */
+ break;
+ default:
+ break;
+ }
+ return 0;
+}
+
+/*
+ * Wrapper routine for handling exceptions.
+ */
+int __kprobes kprobe_exceptions_notify(struct notifier_block *self,
+ unsigned long val, void *data)
+{
+ struct die_args *args = (struct die_args *)data;
+ int ret = NOTIFY_DONE;
+
+ if (args->regs && user_mode(args->regs))
+ return ret;
+
+ switch (val) {
+ case DIE_BPT:
+ if (kprobe_handler(args->regs))
+ ret = NOTIFY_STOP;
+ break;
+ case DIE_SSTEP:
+ if (post_kprobe_handler(args->regs))
+ ret = NOTIFY_STOP;
+ break;
+ default:
+ break;
+ }
+ return ret;
+}
+
+#ifdef CONFIG_PPC64
+unsigned long arch_deref_entry_point(void *entry)
+{
+ return (unsigned long)(((func_descr_t *)entry)->entry);
+}
+#endif
+
+int __kprobes setjmp_pre_handler(struct kprobe *p, struct pt_regs *regs)
+{
+ struct jprobe *jp = container_of(p, struct jprobe, kp);
+ struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
+
+ memcpy(&kcb->jprobe_saved_regs, regs, sizeof(struct pt_regs));
+
+ /* setup return addr to the jprobe handler routine */
+ regs->nip = arch_deref_entry_point(jp->entry);
+#ifdef CONFIG_PPC64
+ regs->gpr[2] = (unsigned long)(((func_descr_t *)jp->entry)->toc);
+#endif
+
+ return 1;
+}
+
+void __kprobes jprobe_return(void)
+{
+ asm volatile("trap" ::: "memory");
+}
+
+void __kprobes jprobe_return_end(void)
+{
+}
+
+int __kprobes longjmp_break_handler(struct kprobe *p, struct pt_regs *regs)
+{
+ struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
+
+ /*
+	 * FIXME - we should ideally validate that we got here because
+	 * of the "trap" in jprobe_return() above, before restoring the
+ * saved regs...
+ */
+ memcpy(regs, &kcb->jprobe_saved_regs, sizeof(struct pt_regs));
+ preempt_enable_no_resched();
+ return 1;
+}
+
+static struct kprobe trampoline_p = {
+ .addr = (kprobe_opcode_t *) &kretprobe_trampoline,
+ .pre_handler = trampoline_probe_handler
+};
+
+int __init arch_init_kprobes(void)
+{
+ return register_kprobe(&trampoline_p);
+}
+
+int __kprobes arch_trampoline_kprobe(struct kprobe *p)
+{
+ if (p->addr == (kprobe_opcode_t *)&kretprobe_trampoline)
+ return 1;
+
+ return 0;
+}
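
To make the kprobe_handler()/post_kprobe_handler() flow above concrete
(including the boosted path, where emulate_step() lets the post handler
run without a single-step trap), here is a minimal sketch of a plain
kprobe client. It is illustrative only, not part of this patch, and the
probed symbol is an arbitrary example.

/*
 * Illustrative sketch (not part of this patch): a kprobe with pre/post
 * handlers driven by the handler paths moved above.
 */
#include <linux/module.h>
#include <linux/kprobes.h>

static int pre(struct kprobe *p, struct pt_regs *regs)
{
	printk(KERN_INFO "pre_handler: hit at %p\n", p->addr);
	return 0;	/* 0 means: go on to single-step (or emulate) */
}

static void post(struct kprobe *p, struct pt_regs *regs, unsigned long flags)
{
	printk(KERN_INFO "post_handler: done at %p\n", p->addr);
}

static struct kprobe my_kprobe = {
	.symbol_name	= "do_fork",	/* arbitrary example target */
	.pre_handler	= pre,
	.post_handler	= post,
};

static int __init kprobe_example_init(void)
{
	return register_kprobe(&my_kprobe);
}

static void __exit kprobe_example_exit(void)
{
	unregister_kprobe(&my_kprobe);
}

module_init(kprobe_example_init);
module_exit(kprobe_example_exit);
MODULE_LICENSE("GPL");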
Index: linux-2.6-lttng.stable/arch/powerpc/kernel/kprobes.c
===================================================================
--- linux-2.6-lttng.stable.orig/arch/powerpc/kernel/kprobes.c 2007-10-29 09:51:07.000000000 -0400
+++ /dev/null 1970-01-01 00:00:00.000000000 +0000
@@ -1,559 +0,0 @@
-/*
- * Kernel Probes (KProbes)
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, write to the Free Software
- * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
- *
- * Copyright (C) IBM Corporation, 2002, 2004
- *
- * 2002-Oct Created by Vamsi Krishna S <vamsi_krishna@in.ibm.com> Kernel
- * Probes initial implementation ( includes contributions from
- * Rusty Russell).
- * 2004-July Suparna Bhattacharya <suparna@in.ibm.com> added jumper probes
- * interface to access function arguments.
- * 2004-Nov Ananth N Mavinakayanahalli <ananth@in.ibm.com> kprobes port
- * for PPC64
- */
-
-#include <linux/kprobes.h>
-#include <linux/ptrace.h>
-#include <linux/preempt.h>
-#include <linux/module.h>
-#include <linux/kdebug.h>
-#include <asm/cacheflush.h>
-#include <asm/sstep.h>
-#include <asm/uaccess.h>
-
-DEFINE_PER_CPU(struct kprobe *, current_kprobe) = NULL;
-DEFINE_PER_CPU(struct kprobe_ctlblk, kprobe_ctlblk);
-
-struct kretprobe_blackpoint kretprobe_blacklist[] = {{NULL, NULL}};
-
-int __kprobes arch_prepare_kprobe(struct kprobe *p)
-{
- int ret = 0;
- kprobe_opcode_t insn = *p->addr;
-
- if ((unsigned long)p->addr & 0x03) {
- printk("Attempt to register kprobe at an unaligned address\n");
- ret = -EINVAL;
- } else if (IS_MTMSRD(insn) || IS_RFID(insn) || IS_RFI(insn)) {
- printk("Cannot register a kprobe on rfi/rfid or mtmsr[d]\n");
- ret = -EINVAL;
- }
-
- /* insn must be on a special executable page on ppc64 */
- if (!ret) {
- p->ainsn.insn = get_insn_slot();
- if (!p->ainsn.insn)
- ret = -ENOMEM;
- }
-
- if (!ret) {
- memcpy(p->ainsn.insn, p->addr,
- MAX_INSN_SIZE * sizeof(kprobe_opcode_t));
- p->opcode = *p->addr;
- flush_icache_range((unsigned long)p->ainsn.insn,
- (unsigned long)p->ainsn.insn + sizeof(kprobe_opcode_t));
- }
-
- p->ainsn.boostable = 0;
- return ret;
-}
-
-void __kprobes arch_arm_kprobe(struct kprobe *p)
-{
- *p->addr = BREAKPOINT_INSTRUCTION;
- flush_icache_range((unsigned long) p->addr,
- (unsigned long) p->addr + sizeof(kprobe_opcode_t));
-}
-
-void __kprobes arch_disarm_kprobe(struct kprobe *p)
-{
- *p->addr = p->opcode;
- flush_icache_range((unsigned long) p->addr,
- (unsigned long) p->addr + sizeof(kprobe_opcode_t));
-}
-
-void __kprobes arch_remove_kprobe(struct kprobe *p)
-{
- mutex_lock(&kprobe_mutex);
- free_insn_slot(p->ainsn.insn, 0);
- mutex_unlock(&kprobe_mutex);
-}
-
-static void __kprobes prepare_singlestep(struct kprobe *p, struct pt_regs *regs)
-{
- regs->msr |= MSR_SE;
-
- /*
- * On powerpc we should single step on the original
- * instruction even if the probed insn is a trap
- * variant as values in regs could play a part in
- * if the trap is taken or not
- */
- regs->nip = (unsigned long)p->ainsn.insn;
-}
-
-static void __kprobes save_previous_kprobe(struct kprobe_ctlblk *kcb)
-{
- kcb->prev_kprobe.kp = kprobe_running();
- kcb->prev_kprobe.status = kcb->kprobe_status;
- kcb->prev_kprobe.saved_msr = kcb->kprobe_saved_msr;
-}
-
-static void __kprobes restore_previous_kprobe(struct kprobe_ctlblk *kcb)
-{
- __get_cpu_var(current_kprobe) = kcb->prev_kprobe.kp;
- kcb->kprobe_status = kcb->prev_kprobe.status;
- kcb->kprobe_saved_msr = kcb->prev_kprobe.saved_msr;
-}
-
-static void __kprobes set_current_kprobe(struct kprobe *p, struct pt_regs *regs,
- struct kprobe_ctlblk *kcb)
-{
- __get_cpu_var(current_kprobe) = p;
- kcb->kprobe_saved_msr = regs->msr;
-}
-
-/* Called with kretprobe_lock held */
-void __kprobes arch_prepare_kretprobe(struct kretprobe_instance *ri,
- struct pt_regs *regs)
-{
- ri->ret_addr = (kprobe_opcode_t *)regs->link;
-
- /* Replace the return addr with trampoline addr */
- regs->link = (unsigned long)kretprobe_trampoline;
-}
-
-static int __kprobes kprobe_handler(struct pt_regs *regs)
-{
- struct kprobe *p;
- int ret = 0;
- unsigned int *addr = (unsigned int *)regs->nip;
- struct kprobe_ctlblk *kcb;
-
- /*
- * We don't want to be preempted for the entire
- * duration of kprobe processing
- */
- preempt_disable();
- kcb = get_kprobe_ctlblk();
-
- /* Check we're not actually recursing */
- if (kprobe_running()) {
- p = get_kprobe(addr);
- if (p) {
- kprobe_opcode_t insn = *p->ainsn.insn;
- if (kcb->kprobe_status == KPROBE_HIT_SS &&
- is_trap(insn)) {
- regs->msr &= ~MSR_SE;
- regs->msr |= kcb->kprobe_saved_msr;
- goto no_kprobe;
- }
- /* We have reentered the kprobe_handler(), since
- * another probe was hit while within the handler.
- * We here save the original kprobes variables and
- * just single step on the instruction of the new probe
- * without calling any user handlers.
- */
- save_previous_kprobe(kcb);
- set_current_kprobe(p, regs, kcb);
- kcb->kprobe_saved_msr = regs->msr;
- kprobes_inc_nmissed_count(p);
- prepare_singlestep(p, regs);
- kcb->kprobe_status = KPROBE_REENTER;
- return 1;
- } else {
- if (*addr != BREAKPOINT_INSTRUCTION) {
- /* If trap variant, then it belongs not to us */
- kprobe_opcode_t cur_insn = *addr;
- if (is_trap(cur_insn))
- goto no_kprobe;
- /* The breakpoint instruction was removed by
- * another cpu right after we hit, no further
- * handling of this interrupt is appropriate
- */
- ret = 1;
- goto no_kprobe;
- }
- p = __get_cpu_var(current_kprobe);
- if (p->break_handler && p->break_handler(p, regs)) {
- goto ss_probe;
- }
- }
- goto no_kprobe;
- }
-
- p = get_kprobe(addr);
- if (!p) {
- if (*addr != BREAKPOINT_INSTRUCTION) {
- /*
- * PowerPC has multiple variants of the "trap"
- * instruction. If the current instruction is a
- * trap variant, it could belong to someone else
- */
- kprobe_opcode_t cur_insn = *addr;
- if (is_trap(cur_insn))
- goto no_kprobe;
- /*
- * The breakpoint instruction was removed right
- * after we hit it. Another cpu has removed
- * either a probepoint or a debugger breakpoint
- * at this address. In either case, no further
- * handling of this interrupt is appropriate.
- */
- ret = 1;
- }
- /* Not one of ours: let kernel handle it */
- goto no_kprobe;
- }
-
- kcb->kprobe_status = KPROBE_HIT_ACTIVE;
- set_current_kprobe(p, regs, kcb);
- if (p->pre_handler && p->pre_handler(p, regs))
- /* handler has already set things up, so skip ss setup */
- return 1;
-
-ss_probe:
- if (p->ainsn.boostable >= 0) {
- unsigned int insn = *p->ainsn.insn;
-
- /* regs->nip is also adjusted if emulate_step returns 1 */
- ret = emulate_step(regs, insn);
- if (ret > 0) {
- /*
- * Once this instruction has been boosted
- * successfully, set the boostable flag
- */
- if (unlikely(p->ainsn.boostable == 0))
- p->ainsn.boostable = 1;
-
- if (p->post_handler)
- p->post_handler(p, regs, 0);
-
- kcb->kprobe_status = KPROBE_HIT_SSDONE;
- reset_current_kprobe();
- preempt_enable_no_resched();
- return 1;
- } else if (ret < 0) {
- /*
- * We don't allow kprobes on mtmsr(d)/rfi(d), etc.
- * So, we should never get here... but, its still
- * good to catch them, just in case...
- */
- printk("Can't step on instruction %x\n", insn);
- BUG();
- } else if (ret == 0)
- /* This instruction can't be boosted */
- p->ainsn.boostable = -1;
- }
- prepare_singlestep(p, regs);
- kcb->kprobe_status = KPROBE_HIT_SS;
- return 1;
-
-no_kprobe:
- preempt_enable_no_resched();
- return ret;
-}
-
-/*
- * Function return probe trampoline:
- * - init_kprobes() establishes a probepoint here
- * - When the probed function returns, this probe
- * causes the handlers to fire
- */
-void kretprobe_trampoline_holder(void)
-{
- asm volatile(".global kretprobe_trampoline\n"
- "kretprobe_trampoline:\n"
- "nop\n");
-}
-
-/*
- * Called when the probe at kretprobe trampoline is hit
- */
-int __kprobes trampoline_probe_handler(struct kprobe *p, struct pt_regs *regs)
-{
- struct kretprobe_instance *ri = NULL;
- struct hlist_head *head, empty_rp;
- struct hlist_node *node, *tmp;
- unsigned long flags, orig_ret_address = 0;
- unsigned long trampoline_address =(unsigned long)&kretprobe_trampoline;
-
- INIT_HLIST_HEAD(&empty_rp);
- spin_lock_irqsave(&kretprobe_lock, flags);
- head = kretprobe_inst_table_head(current);
-
- /*
- * It is possible to have multiple instances associated with a given
- * task either because an multiple functions in the call path
- * have a return probe installed on them, and/or more then one return
- * return probe was registered for a target function.
- *
- * We can handle this because:
- * - instances are always inserted at the head of the list
- * - when multiple return probes are registered for the same
- * function, the first instance's ret_addr will point to the
- * real return address, and all the rest will point to
- * kretprobe_trampoline
- */
- hlist_for_each_entry_safe(ri, node, tmp, head, hlist) {
- if (ri->task != current)
- /* another task is sharing our hash bucket */
- continue;
-
- if (ri->rp && ri->rp->handler)
- ri->rp->handler(ri, regs);
-
- orig_ret_address = (unsigned long)ri->ret_addr;
- recycle_rp_inst(ri, &empty_rp);
-
- if (orig_ret_address != trampoline_address)
- /*
- * This is the real return address. Any other
- * instances associated with this task are for
- * other calls deeper on the call stack
- */
- break;
- }
-
- kretprobe_assert(ri, orig_ret_address, trampoline_address);
- regs->nip = orig_ret_address;
-
- reset_current_kprobe();
- spin_unlock_irqrestore(&kretprobe_lock, flags);
- preempt_enable_no_resched();
-
- hlist_for_each_entry_safe(ri, node, tmp, &empty_rp, hlist) {
- hlist_del(&ri->hlist);
- kfree(ri);
- }
- /*
- * By returning a non-zero value, we are telling
- * kprobe_handler() that we don't want the post_handler
- * to run (and have re-enabled preemption)
- */
- return 1;
-}
-
-/*
- * Called after single-stepping. p->addr is the address of the
- * instruction whose first byte has been replaced by the "breakpoint"
- * instruction. To avoid the SMP problems that can occur when we
- * temporarily put back the original opcode to single-step, we
- * single-stepped a copy of the instruction. The address of this
- * copy is p->ainsn.insn.
- */
-static void __kprobes resume_execution(struct kprobe *p, struct pt_regs *regs)
-{
- int ret;
- unsigned int insn = *p->ainsn.insn;
-
- regs->nip = (unsigned long)p->addr;
- ret = emulate_step(regs, insn);
- if (ret == 0)
- regs->nip = (unsigned long)p->addr + 4;
-}
-
-static int __kprobes post_kprobe_handler(struct pt_regs *regs)
-{
- struct kprobe *cur = kprobe_running();
- struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
-
- if (!cur)
- return 0;
-
- if ((kcb->kprobe_status != KPROBE_REENTER) && cur->post_handler) {
- kcb->kprobe_status = KPROBE_HIT_SSDONE;
- cur->post_handler(cur, regs, 0);
- }
-
- resume_execution(cur, regs);
- regs->msr |= kcb->kprobe_saved_msr;
-
- /*Restore back the original saved kprobes variables and continue. */
- if (kcb->kprobe_status == KPROBE_REENTER) {
- restore_previous_kprobe(kcb);
- goto out;
- }
- reset_current_kprobe();
-out:
- preempt_enable_no_resched();
-
- /*
- * if somebody else is singlestepping across a probe point, msr
- * will have SE set, in which case, continue the remaining processing
- * of do_debug, as if this is not a probe hit.
- */
- if (regs->msr & MSR_SE)
- return 0;
-
- return 1;
-}
-
-int __kprobes kprobe_fault_handler(struct pt_regs *regs, int trapnr)
-{
- struct kprobe *cur = kprobe_running();
- struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
- const struct exception_table_entry *entry;
-
- switch(kcb->kprobe_status) {
- case KPROBE_HIT_SS:
- case KPROBE_REENTER:
- /*
- * We are here because the instruction being single
- * stepped caused a page fault. We reset the current
- * kprobe and the nip points back to the probe address
- * and allow the page fault handler to continue as a
- * normal page fault.
- */
- regs->nip = (unsigned long)cur->addr;
- regs->msr &= ~MSR_SE;
- regs->msr |= kcb->kprobe_saved_msr;
- if (kcb->kprobe_status == KPROBE_REENTER)
- restore_previous_kprobe(kcb);
- else
- reset_current_kprobe();
- preempt_enable_no_resched();
- break;
- case KPROBE_HIT_ACTIVE:
- case KPROBE_HIT_SSDONE:
- /*
- * We increment the nmissed count for accounting,
- * we can also use npre/npostfault count for accouting
- * these specific fault cases.
- */
- kprobes_inc_nmissed_count(cur);
-
- /*
- * We come here because instructions in the pre/post
- * handler caused the page_fault, this could happen
- * if handler tries to access user space by
- * copy_from_user(), get_user() etc. Let the
- * user-specified handler try to fix it first.
- */
- if (cur->fault_handler && cur->fault_handler(cur, regs, trapnr))
- return 1;
-
- /*
- * In case the user-specified fault handler returned
- * zero, try to fix up.
- */
- if ((entry = search_exception_tables(regs->nip)) != NULL) {
- regs->nip = entry->fixup;
- return 1;
- }
-
- /*
- * fixup_exception() could not handle it,
- * Let do_page_fault() fix it.
- */
- break;
- default:
- break;
- }
- return 0;
-}
-
-/*
- * Wrapper routine to for handling exceptions.
- */
-int __kprobes kprobe_exceptions_notify(struct notifier_block *self,
- unsigned long val, void *data)
-{
- struct die_args *args = (struct die_args *)data;
- int ret = NOTIFY_DONE;
-
- if (args->regs && user_mode(args->regs))
- return ret;
-
- switch (val) {
- case DIE_BPT:
- if (kprobe_handler(args->regs))
- ret = NOTIFY_STOP;
- break;
- case DIE_SSTEP:
- if (post_kprobe_handler(args->regs))
- ret = NOTIFY_STOP;
- break;
- default:
- break;
- }
- return ret;
-}
-
-#ifdef CONFIG_PPC64
-unsigned long arch_deref_entry_point(void *entry)
-{
- return (unsigned long)(((func_descr_t *)entry)->entry);
-}
-#endif
-
-int __kprobes setjmp_pre_handler(struct kprobe *p, struct pt_regs *regs)
-{
- struct jprobe *jp = container_of(p, struct jprobe, kp);
- struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
-
- memcpy(&kcb->jprobe_saved_regs, regs, sizeof(struct pt_regs));
-
- /* setup return addr to the jprobe handler routine */
- regs->nip = arch_deref_entry_point(jp->entry);
-#ifdef CONFIG_PPC64
- regs->gpr[2] = (unsigned long)(((func_descr_t *)jp->entry)->toc);
-#endif
-
- return 1;
-}
-
-void __kprobes jprobe_return(void)
-{
- asm volatile("trap" ::: "memory");
-}
-
-void __kprobes jprobe_return_end(void)
-{
-};
-
-int __kprobes longjmp_break_handler(struct kprobe *p, struct pt_regs *regs)
-{
- struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
-
- /*
- * FIXME - we should ideally be validating that we got here 'cos
- * of the "trap" in jprobe_return() above, before restoring the
- * saved regs...
- */
- memcpy(regs, &kcb->jprobe_saved_regs, sizeof(struct pt_regs));
- preempt_enable_no_resched();
- return 1;
-}
-
-static struct kprobe trampoline_p = {
- .addr = (kprobe_opcode_t *) &kretprobe_trampoline,
- .pre_handler = trampoline_probe_handler
-};
-
-int __init arch_init_kprobes(void)
-{
- return register_kprobe(&trampoline_p);
-}
-
-int __kprobes arch_trampoline_kprobe(struct kprobe *p)
-{
- if (p->addr == (kprobe_opcode_t *)&kretprobe_trampoline)
- return 1;
-
- return 0;
-}
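
For reviewers tracing where the powerpc trampoline code above ends up being
used, here is a minimal sketch of a module that exercises it through the
generic kretprobe API. The target symbol, message and maxactive value are
illustrative assumptions, not part of this patch:

#include <linux/module.h>
#include <linux/kprobes.h>

/* Illustrative handler: runs from trampoline_probe_handler() above,
 * after arch_prepare_kretprobe() has stashed the real return address
 * in ri->ret_addr.
 */
static int ret_handler(struct kretprobe_instance *ri, struct pt_regs *regs)
{
	printk(KERN_INFO "probed function returned to %p\n", ri->ret_addr);
	return 0;
}

static struct kretprobe my_kretprobe = {
	.handler	= ret_handler,
	.kp.symbol_name	= "do_fork",	/* assumed example target */
	.maxactive	= 20,	/* concurrent instances kept in the hash table */
};

static int __init rp_init(void)
{
	return register_kretprobe(&my_kretprobe);
}

static void __exit rp_exit(void)
{
	unregister_kretprobe(&my_kretprobe);
}

module_init(rp_init);
module_exit(rp_exit);
MODULE_LICENSE("GPL");
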
Index: linux-2.6-lttng.stable/arch/s390/instrumentation/kprobes.c
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng.stable/arch/s390/instrumentation/kprobes.c 2007-10-29 09:51:38.000000000 -0400
@@ -0,0 +1,672 @@
+/*
+ * Kernel Probes (KProbes)
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) IBM Corporation, 2002, 2006
+ *
+ * s390 port, used ppc64 as template. Mike Grundy <grundym@us.ibm.com>
+ */
+
+#include <linux/kprobes.h>
+#include <linux/ptrace.h>
+#include <linux/preempt.h>
+#include <linux/stop_machine.h>
+#include <linux/kdebug.h>
+#include <asm/cacheflush.h>
+#include <asm/sections.h>
+#include <asm/uaccess.h>
+#include <linux/module.h>
+
+DEFINE_PER_CPU(struct kprobe *, current_kprobe) = NULL;
+DEFINE_PER_CPU(struct kprobe_ctlblk, kprobe_ctlblk);
+
+struct kretprobe_blackpoint kretprobe_blacklist[] = {{NULL, NULL}};
+
+int __kprobes arch_prepare_kprobe(struct kprobe *p)
+{
+ /* Make sure the probe isn't going on a difficult instruction */
+ if (is_prohibited_opcode((kprobe_opcode_t *) p->addr))
+ return -EINVAL;
+
+ if ((unsigned long)p->addr & 0x01) {
+ printk("Attempt to register kprobe at an unaligned address\n");
+ return -EINVAL;
+ }
+
+ /* Use the get_insn_slot() facility for correctness */
+ if (!(p->ainsn.insn = get_insn_slot()))
+ return -ENOMEM;
+
+ memcpy(p->ainsn.insn, p->addr, MAX_INSN_SIZE * sizeof(kprobe_opcode_t));
+
+ get_instruction_type(&p->ainsn);
+ p->opcode = *p->addr;
+ return 0;
+}
+
+int __kprobes is_prohibited_opcode(kprobe_opcode_t *instruction)
+{
+ switch (*(__u8 *) instruction) {
+ case 0x0c: /* bassm */
+ case 0x0b: /* bsm */
+ case 0x83: /* diag */
+ case 0x44: /* ex */
+ return -EINVAL;
+ }
+ switch (*(__u16 *) instruction) {
+ case 0x0101: /* pr */
+ case 0xb25a: /* bsa */
+ case 0xb240: /* bakr */
+ case 0xb258: /* bsg */
+ case 0xb218: /* pc */
+ case 0xb228: /* pt */
+ return -EINVAL;
+ }
+ return 0;
+}
+
+void __kprobes get_instruction_type(struct arch_specific_insn *ainsn)
+{
+ /* default fixup method */
+ ainsn->fixup = FIXUP_PSW_NORMAL;
+
+ /* save r1 operand */
+ ainsn->reg = (*ainsn->insn & 0xf0) >> 4;
+
+ /* save the instruction length (pop 5-5) in bytes */
+ switch (*(__u8 *) (ainsn->insn) >> 6) {
+ case 0:
+ ainsn->ilen = 2;
+ break;
+ case 1:
+ case 2:
+ ainsn->ilen = 4;
+ break;
+ case 3:
+ ainsn->ilen = 6;
+ break;
+ }
+
+ switch (*(__u8 *) ainsn->insn) {
+ case 0x05: /* balr */
+ case 0x0d: /* basr */
+ ainsn->fixup = FIXUP_RETURN_REGISTER;
+ /* if r2 = 0, no branch will be taken */
+ if ((*ainsn->insn & 0x0f) == 0)
+ ainsn->fixup |= FIXUP_BRANCH_NOT_TAKEN;
+ break;
+ case 0x06: /* bctr */
+ case 0x07: /* bcr */
+ ainsn->fixup = FIXUP_BRANCH_NOT_TAKEN;
+ break;
+ case 0x45: /* bal */
+ case 0x4d: /* bas */
+ ainsn->fixup = FIXUP_RETURN_REGISTER;
+ break;
+ case 0x47: /* bc */
+ case 0x46: /* bct */
+ case 0x86: /* bxh */
+ case 0x87: /* bxle */
+ ainsn->fixup = FIXUP_BRANCH_NOT_TAKEN;
+ break;
+ case 0x82: /* lpsw */
+ ainsn->fixup = FIXUP_NOT_REQUIRED;
+ break;
+ case 0xb2: /* lpswe */
+ if (*(((__u8 *) ainsn->insn) + 1) == 0xb2) {
+ ainsn->fixup = FIXUP_NOT_REQUIRED;
+ }
+ break;
+ case 0xa7: /* bras */
+ if ((*ainsn->insn & 0x0f) == 0x05) {
+ ainsn->fixup |= FIXUP_RETURN_REGISTER;
+ }
+ break;
+ case 0xc0:
+ if ((*ainsn->insn & 0x0f) == 0x00 /* larl */
+ || (*ainsn->insn & 0x0f) == 0x05) /* brasl */
+ ainsn->fixup |= FIXUP_RETURN_REGISTER;
+ break;
+ case 0xeb:
+ if (*(((__u8 *) ainsn->insn) + 5 ) == 0x44 || /* bxhg */
+ *(((__u8 *) ainsn->insn) + 5) == 0x45) {/* bxleg */
+ ainsn->fixup = FIXUP_BRANCH_NOT_TAKEN;
+ }
+ break;
+ case 0xe3: /* bctg */
+ if (*(((__u8 *) ainsn->insn) + 5) == 0x46) {
+ ainsn->fixup = FIXUP_BRANCH_NOT_TAKEN;
+ }
+ break;
+ }
+}
+
+static int __kprobes swap_instruction(void *aref)
+{
+ struct ins_replace_args *args = aref;
+ u32 *addr;
+ u32 instr;
+ int err = -EFAULT;
+
+ /*
+	 * address translation to exchange the instruction. Since stura
+	 * always operates on four bytes but we only want to exchange two,
+	 * we do some calculations to get things right. In addition we
+ * bytes do some calculations to get things right. In addition we
+ * shall not cross any page boundaries (vmalloc area!) when writing
+ * the new instruction.
+ */
+ addr = (u32 *)((unsigned long)args->ptr & -4UL);
+ if ((unsigned long)args->ptr & 2)
+ instr = ((*addr) & 0xffff0000) | args->new;
+ else
+ instr = ((*addr) & 0x0000ffff) | args->new << 16;
+
+ asm volatile(
+ " lra %1,0(%1)\n"
+ "0: stura %2,%1\n"
+ "1: la %0,0\n"
+ "2:\n"
+ EX_TABLE(0b,2b)
+ : "+d" (err)
+ : "a" (addr), "d" (instr)
+ : "memory", "cc");
+
+ return err;
+}
+
+void __kprobes arch_arm_kprobe(struct kprobe *p)
+{
+ struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
+ unsigned long status = kcb->kprobe_status;
+ struct ins_replace_args args;
+
+ args.ptr = p->addr;
+ args.old = p->opcode;
+ args.new = BREAKPOINT_INSTRUCTION;
+
+ kcb->kprobe_status = KPROBE_SWAP_INST;
+ stop_machine_run(swap_instruction, &args, NR_CPUS);
+ kcb->kprobe_status = status;
+}
+
+void __kprobes arch_disarm_kprobe(struct kprobe *p)
+{
+ struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
+ unsigned long status = kcb->kprobe_status;
+ struct ins_replace_args args;
+
+ args.ptr = p->addr;
+ args.old = BREAKPOINT_INSTRUCTION;
+ args.new = p->opcode;
+
+ kcb->kprobe_status = KPROBE_SWAP_INST;
+ stop_machine_run(swap_instruction, &args, NR_CPUS);
+ kcb->kprobe_status = status;
+}
+
+void __kprobes arch_remove_kprobe(struct kprobe *p)
+{
+ mutex_lock(&kprobe_mutex);
+ free_insn_slot(p->ainsn.insn, 0);
+ mutex_unlock(&kprobe_mutex);
+}
+
+static void __kprobes prepare_singlestep(struct kprobe *p, struct pt_regs *regs)
+{
+ per_cr_bits kprobe_per_regs[1];
+
+ memset(kprobe_per_regs, 0, sizeof(per_cr_bits));
+ regs->psw.addr = (unsigned long)p->ainsn.insn | PSW_ADDR_AMODE;
+
+ /* Set up the per control reg info, will pass to lctl */
+ kprobe_per_regs[0].em_instruction_fetch = 1;
+ kprobe_per_regs[0].starting_addr = (unsigned long)p->ainsn.insn;
+ kprobe_per_regs[0].ending_addr = (unsigned long)p->ainsn.insn + 1;
+
+ /* Set the PER control regs, turns on single step for this address */
+ __ctl_load(kprobe_per_regs, 9, 11);
+ regs->psw.mask |= PSW_MASK_PER;
+ regs->psw.mask &= ~(PSW_MASK_IO | PSW_MASK_EXT | PSW_MASK_MCHECK);
+}
+
+static void __kprobes save_previous_kprobe(struct kprobe_ctlblk *kcb)
+{
+ kcb->prev_kprobe.kp = kprobe_running();
+ kcb->prev_kprobe.status = kcb->kprobe_status;
+ kcb->prev_kprobe.kprobe_saved_imask = kcb->kprobe_saved_imask;
+ memcpy(kcb->prev_kprobe.kprobe_saved_ctl, kcb->kprobe_saved_ctl,
+ sizeof(kcb->kprobe_saved_ctl));
+}
+
+static void __kprobes restore_previous_kprobe(struct kprobe_ctlblk *kcb)
+{
+ __get_cpu_var(current_kprobe) = kcb->prev_kprobe.kp;
+ kcb->kprobe_status = kcb->prev_kprobe.status;
+ kcb->kprobe_saved_imask = kcb->prev_kprobe.kprobe_saved_imask;
+ memcpy(kcb->kprobe_saved_ctl, kcb->prev_kprobe.kprobe_saved_ctl,
+ sizeof(kcb->kprobe_saved_ctl));
+}
+
+static void __kprobes set_current_kprobe(struct kprobe *p, struct pt_regs *regs,
+ struct kprobe_ctlblk *kcb)
+{
+ __get_cpu_var(current_kprobe) = p;
+ /* Save the interrupt and per flags */
+ kcb->kprobe_saved_imask = regs->psw.mask &
+ (PSW_MASK_PER | PSW_MASK_IO | PSW_MASK_EXT | PSW_MASK_MCHECK);
+ /* Save the control regs that govern PER */
+ __ctl_store(kcb->kprobe_saved_ctl, 9, 11);
+}
+
+/* Called with kretprobe_lock held */
+void __kprobes arch_prepare_kretprobe(struct kretprobe_instance *ri,
+ struct pt_regs *regs)
+{
+ ri->ret_addr = (kprobe_opcode_t *) regs->gprs[14];
+
+ /* Replace the return addr with trampoline addr */
+ regs->gprs[14] = (unsigned long)&kretprobe_trampoline;
+}
+
+static int __kprobes kprobe_handler(struct pt_regs *regs)
+{
+ struct kprobe *p;
+ int ret = 0;
+ unsigned long *addr = (unsigned long *)
+ ((regs->psw.addr & PSW_ADDR_INSN) - 2);
+ struct kprobe_ctlblk *kcb;
+
+ /*
+ * We don't want to be preempted for the entire
+ * duration of kprobe processing
+ */
+ preempt_disable();
+ kcb = get_kprobe_ctlblk();
+
+ /* Check we're not actually recursing */
+ if (kprobe_running()) {
+ p = get_kprobe(addr);
+ if (p) {
+ if (kcb->kprobe_status == KPROBE_HIT_SS &&
+ *p->ainsn.insn == BREAKPOINT_INSTRUCTION) {
+ regs->psw.mask &= ~PSW_MASK_PER;
+ regs->psw.mask |= kcb->kprobe_saved_imask;
+ goto no_kprobe;
+ }
+ /* We have reentered the kprobe_handler(), since
+ * another probe was hit while within the handler.
+			 * Here we save the original kprobe variables and
+ * just single step on the instruction of the new probe
+ * without calling any user handlers.
+ */
+ save_previous_kprobe(kcb);
+ set_current_kprobe(p, regs, kcb);
+ kprobes_inc_nmissed_count(p);
+ prepare_singlestep(p, regs);
+ kcb->kprobe_status = KPROBE_REENTER;
+ return 1;
+ } else {
+ p = __get_cpu_var(current_kprobe);
+ if (p->break_handler && p->break_handler(p, regs)) {
+ goto ss_probe;
+ }
+ }
+ goto no_kprobe;
+ }
+
+ p = get_kprobe(addr);
+ if (!p)
+ /*
+ * No kprobe at this address. The fault has not been
+ * caused by a kprobe breakpoint. The race of breakpoint
+ * vs. kprobe remove does not exist because on s390 we
+ * use stop_machine_run to arm/disarm the breakpoints.
+ */
+ goto no_kprobe;
+
+ kcb->kprobe_status = KPROBE_HIT_ACTIVE;
+ set_current_kprobe(p, regs, kcb);
+ if (p->pre_handler && p->pre_handler(p, regs))
+ /* handler has already set things up, so skip ss setup */
+ return 1;
+
+ss_probe:
+ prepare_singlestep(p, regs);
+ kcb->kprobe_status = KPROBE_HIT_SS;
+ return 1;
+
+no_kprobe:
+ preempt_enable_no_resched();
+ return ret;
+}
+
+/*
+ * Function return probe trampoline:
+ * - init_kprobes() establishes a probepoint here
+ * - When the probed function returns, this probe
+ * causes the handlers to fire
+ */
+void kretprobe_trampoline_holder(void)
+{
+ asm volatile(".global kretprobe_trampoline\n"
+ "kretprobe_trampoline: bcr 0,0\n");
+}
+
+/*
+ * Called when the probe at kretprobe trampoline is hit
+ */
+static int __kprobes trampoline_probe_handler(struct kprobe *p,
+ struct pt_regs *regs)
+{
+ struct kretprobe_instance *ri = NULL;
+ struct hlist_head *head, empty_rp;
+ struct hlist_node *node, *tmp;
+ unsigned long flags, orig_ret_address = 0;
+ unsigned long trampoline_address = (unsigned long)&kretprobe_trampoline;
+
+ INIT_HLIST_HEAD(&empty_rp);
+ spin_lock_irqsave(&kretprobe_lock, flags);
+ head = kretprobe_inst_table_head(current);
+
+ /*
+ * It is possible to have multiple instances associated with a given
+	 * task either because multiple functions in the call path
+	 * have a return probe installed on them, and/or more than one
+	 * return probe was registered for a target function.
+ *
+ * We can handle this because:
+ * - instances are always inserted at the head of the list
+ * - when multiple return probes are registered for the same
+ * function, the first instance's ret_addr will point to the
+ * real return address, and all the rest will point to
+ * kretprobe_trampoline
+ */
+ hlist_for_each_entry_safe(ri, node, tmp, head, hlist) {
+ if (ri->task != current)
+ /* another task is sharing our hash bucket */
+ continue;
+
+ if (ri->rp && ri->rp->handler)
+ ri->rp->handler(ri, regs);
+
+ orig_ret_address = (unsigned long)ri->ret_addr;
+ recycle_rp_inst(ri, &empty_rp);
+
+ if (orig_ret_address != trampoline_address) {
+ /*
+ * This is the real return address. Any other
+ * instances associated with this task are for
+ * other calls deeper on the call stack
+ */
+ break;
+ }
+ }
+ kretprobe_assert(ri, orig_ret_address, trampoline_address);
+ regs->psw.addr = orig_ret_address | PSW_ADDR_AMODE;
+
+ reset_current_kprobe();
+ spin_unlock_irqrestore(&kretprobe_lock, flags);
+ preempt_enable_no_resched();
+
+ hlist_for_each_entry_safe(ri, node, tmp, &empty_rp, hlist) {
+ hlist_del(&ri->hlist);
+ kfree(ri);
+ }
+ /*
+ * By returning a non-zero value, we are telling
+ * kprobe_handler() that we don't want the post_handler
+ * to run (and have re-enabled preemption)
+ */
+ return 1;
+}
+
+/*
+ * Called after single-stepping. p->addr is the address of the
+ * instruction whose first byte has been replaced by the "breakpoint"
+ * instruction. To avoid the SMP problems that can occur when we
+ * temporarily put back the original opcode to single-step, we
+ * single-stepped a copy of the instruction. The address of this
+ * copy is p->ainsn.insn.
+ */
+static void __kprobes resume_execution(struct kprobe *p, struct pt_regs *regs)
+{
+ struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
+
+ regs->psw.addr &= PSW_ADDR_INSN;
+
+ if (p->ainsn.fixup & FIXUP_PSW_NORMAL)
+ regs->psw.addr = (unsigned long)p->addr +
+ ((unsigned long)regs->psw.addr -
+ (unsigned long)p->ainsn.insn);
+
+ if (p->ainsn.fixup & FIXUP_BRANCH_NOT_TAKEN)
+ if ((unsigned long)regs->psw.addr -
+ (unsigned long)p->ainsn.insn == p->ainsn.ilen)
+ regs->psw.addr = (unsigned long)p->addr + p->ainsn.ilen;
+
+ if (p->ainsn.fixup & FIXUP_RETURN_REGISTER)
+ regs->gprs[p->ainsn.reg] = ((unsigned long)p->addr +
+ (regs->gprs[p->ainsn.reg] -
+ (unsigned long)p->ainsn.insn))
+ | PSW_ADDR_AMODE;
+
+ regs->psw.addr |= PSW_ADDR_AMODE;
+ /* turn off PER mode */
+ regs->psw.mask &= ~PSW_MASK_PER;
+ /* Restore the original per control regs */
+ __ctl_load(kcb->kprobe_saved_ctl, 9, 11);
+ regs->psw.mask |= kcb->kprobe_saved_imask;
+}
+
+static int __kprobes post_kprobe_handler(struct pt_regs *regs)
+{
+ struct kprobe *cur = kprobe_running();
+ struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
+
+ if (!cur)
+ return 0;
+
+ if ((kcb->kprobe_status != KPROBE_REENTER) && cur->post_handler) {
+ kcb->kprobe_status = KPROBE_HIT_SSDONE;
+ cur->post_handler(cur, regs, 0);
+ }
+
+ resume_execution(cur, regs);
+
+	/* Restore the original saved kprobe variables and continue. */
+ if (kcb->kprobe_status == KPROBE_REENTER) {
+ restore_previous_kprobe(kcb);
+ goto out;
+ }
+ reset_current_kprobe();
+out:
+ preempt_enable_no_resched();
+
+ /*
+ * if somebody else is singlestepping across a probe point, psw mask
+ * will have PER set, in which case, continue the remaining processing
+ * of do_single_step, as if this is not a probe hit.
+ */
+ if (regs->psw.mask & PSW_MASK_PER) {
+ return 0;
+ }
+
+ return 1;
+}
+
+int __kprobes kprobe_fault_handler(struct pt_regs *regs, int trapnr)
+{
+ struct kprobe *cur = kprobe_running();
+ struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
+ const struct exception_table_entry *entry;
+
+ switch(kcb->kprobe_status) {
+ case KPROBE_SWAP_INST:
+ /* We are here because the instruction replacement failed */
+ return 0;
+ case KPROBE_HIT_SS:
+ case KPROBE_REENTER:
+ /*
+ * We are here because the instruction being single
+ * stepped caused a page fault. We reset the current
+		 * kprobe so that the psw address points back to the probe
+		 * address, and allow the page fault handler to continue as a
+ * normal page fault.
+ */
+ regs->psw.addr = (unsigned long)cur->addr | PSW_ADDR_AMODE;
+ regs->psw.mask &= ~PSW_MASK_PER;
+ regs->psw.mask |= kcb->kprobe_saved_imask;
+ if (kcb->kprobe_status == KPROBE_REENTER)
+ restore_previous_kprobe(kcb);
+ else
+ reset_current_kprobe();
+ preempt_enable_no_resched();
+ break;
+ case KPROBE_HIT_ACTIVE:
+ case KPROBE_HIT_SSDONE:
+ /*
+		 * We increment the nmissed count for accounting;
+		 * we could also use the npre/npostfault counts to account
+		 * for these specific fault cases.
+ */
+ kprobes_inc_nmissed_count(cur);
+
+ /*
+ * We come here because instructions in the pre/post
+		 * handler caused the page fault; this could happen
+		 * if the handler tries to access user space by
+ * copy_from_user(), get_user() etc. Let the
+ * user-specified handler try to fix it first.
+ */
+ if (cur->fault_handler && cur->fault_handler(cur, regs, trapnr))
+ return 1;
+
+ /*
+ * In case the user-specified fault handler returned
+ * zero, try to fix up.
+ */
+ entry = search_exception_tables(regs->psw.addr & PSW_ADDR_INSN);
+ if (entry) {
+ regs->psw.addr = entry->fixup | PSW_ADDR_AMODE;
+ return 1;
+ }
+
+ /*
+		 * fixup_exception() could not handle it;
+		 * let do_page_fault() fix it.
+ */
+ break;
+ default:
+ break;
+ }
+ return 0;
+}
+
+/*
+ * Wrapper routine for handling exceptions.
+ */
+int __kprobes kprobe_exceptions_notify(struct notifier_block *self,
+ unsigned long val, void *data)
+{
+ struct die_args *args = (struct die_args *)data;
+ int ret = NOTIFY_DONE;
+
+ switch (val) {
+ case DIE_BPT:
+ if (kprobe_handler(args->regs))
+ ret = NOTIFY_STOP;
+ break;
+ case DIE_SSTEP:
+ if (post_kprobe_handler(args->regs))
+ ret = NOTIFY_STOP;
+ break;
+ case DIE_TRAP:
+ /* kprobe_running() needs smp_processor_id() */
+ preempt_disable();
+ if (kprobe_running() &&
+ kprobe_fault_handler(args->regs, args->trapnr))
+ ret = NOTIFY_STOP;
+ preempt_enable();
+ break;
+ default:
+ break;
+ }
+ return ret;
+}
+
+int __kprobes setjmp_pre_handler(struct kprobe *p, struct pt_regs *regs)
+{
+ struct jprobe *jp = container_of(p, struct jprobe, kp);
+ unsigned long addr;
+ struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
+
+ memcpy(&kcb->jprobe_saved_regs, regs, sizeof(struct pt_regs));
+
+ /* setup return addr to the jprobe handler routine */
+ regs->psw.addr = (unsigned long)(jp->entry) | PSW_ADDR_AMODE;
+
+ /* r14 is the function return address */
+ kcb->jprobe_saved_r14 = (unsigned long)regs->gprs[14];
+ /* r15 is the stack pointer */
+ kcb->jprobe_saved_r15 = (unsigned long)regs->gprs[15];
+ addr = (unsigned long)kcb->jprobe_saved_r15;
+
+ memcpy(kcb->jprobes_stack, (kprobe_opcode_t *) addr,
+ MIN_STACK_SIZE(addr));
+ return 1;
+}
+
+void __kprobes jprobe_return(void)
+{
+ asm volatile(".word 0x0002");
+}
+
+void __kprobes jprobe_return_end(void)
+{
+ asm volatile("bcr 0,0");
+}
+
+int __kprobes longjmp_break_handler(struct kprobe *p, struct pt_regs *regs)
+{
+ struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
+ unsigned long stack_addr = (unsigned long)(kcb->jprobe_saved_r15);
+
+ /* Put the regs back */
+ memcpy(regs, &kcb->jprobe_saved_regs, sizeof(struct pt_regs));
+ /* put the stack back */
+ memcpy((kprobe_opcode_t *) stack_addr, kcb->jprobes_stack,
+ MIN_STACK_SIZE(stack_addr));
+ preempt_enable_no_resched();
+ return 1;
+}
+
+static struct kprobe trampoline_p = {
+	.addr = (kprobe_opcode_t *) &kretprobe_trampoline,
+ .pre_handler = trampoline_probe_handler
+};
+
+int __init arch_init_kprobes(void)
+{
+ return register_kprobe(&trampoline_p);
+}
+
+int __kprobes arch_trampoline_kprobe(struct kprobe *p)
+{
+	if (p->addr == (kprobe_opcode_t *) &kretprobe_trampoline)
+ return 1;
+ return 0;
+}
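
The s390 file above supplies the arch side of the generic kprobes API:
arch_arm_kprobe() swaps the breakpoint in under stop_machine_run(),
kprobe_handler() fields the trap, and PER performs the single step before
post_kprobe_handler() runs. For reference, a minimal sketch of the
generic-side consumer; the target symbol is an illustrative assumption,
not part of this patch:

#include <linux/module.h>
#include <linux/kprobes.h>

static int pre(struct kprobe *p, struct pt_regs *regs)
{
	/* Runs from kprobe_handler() above, with preemption disabled
	 * and the per-cpu current_kprobe set. Returning 0 asks the
	 * arch code to go on and single-step the copied instruction.
	 */
	return 0;
}

static void post(struct kprobe *p, struct pt_regs *regs, unsigned long flags)
{
	/* Runs from post_kprobe_handler() once the single step of
	 * p->ainsn.insn has completed and been fixed up.
	 */
}

static struct kprobe kp = {
	.symbol_name	= "do_exit",	/* assumed example target */
	.pre_handler	= pre,
	.post_handler	= post,
};

static int __init kp_init(void)
{
	return register_kprobe(&kp);
}

static void __exit kp_exit(void)
{
	unregister_kprobe(&kp);
}

module_init(kp_init);
module_exit(kp_exit);
MODULE_LICENSE("GPL");
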
Index: linux-2.6-lttng.stable/arch/s390/kernel/kprobes.c
===================================================================
--- linux-2.6-lttng.stable.orig/arch/s390/kernel/kprobes.c 2007-10-29 09:51:07.000000000 -0400
+++ /dev/null 1970-01-01 00:00:00.000000000 +0000
@@ -1,672 +0,0 @@
-/*
- * Kernel Probes (KProbes)
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, write to the Free Software
- * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
- *
- * Copyright (C) IBM Corporation, 2002, 2006
- *
- * s390 port, used ppc64 as template. Mike Grundy <grundym@us.ibm.com>
- */
-
-#include <linux/kprobes.h>
-#include <linux/ptrace.h>
-#include <linux/preempt.h>
-#include <linux/stop_machine.h>
-#include <linux/kdebug.h>
-#include <asm/cacheflush.h>
-#include <asm/sections.h>
-#include <asm/uaccess.h>
-#include <linux/module.h>
-
-DEFINE_PER_CPU(struct kprobe *, current_kprobe) = NULL;
-DEFINE_PER_CPU(struct kprobe_ctlblk, kprobe_ctlblk);
-
-struct kretprobe_blackpoint kretprobe_blacklist[] = {{NULL, NULL}};
-
-int __kprobes arch_prepare_kprobe(struct kprobe *p)
-{
- /* Make sure the probe isn't going on a difficult instruction */
- if (is_prohibited_opcode((kprobe_opcode_t *) p->addr))
- return -EINVAL;
-
- if ((unsigned long)p->addr & 0x01) {
- printk("Attempt to register kprobe at an unaligned address\n");
- return -EINVAL;
- }
-
- /* Use the get_insn_slot() facility for correctness */
- if (!(p->ainsn.insn = get_insn_slot()))
- return -ENOMEM;
-
- memcpy(p->ainsn.insn, p->addr, MAX_INSN_SIZE * sizeof(kprobe_opcode_t));
-
- get_instruction_type(&p->ainsn);
- p->opcode = *p->addr;
- return 0;
-}
-
-int __kprobes is_prohibited_opcode(kprobe_opcode_t *instruction)
-{
- switch (*(__u8 *) instruction) {
- case 0x0c: /* bassm */
- case 0x0b: /* bsm */
- case 0x83: /* diag */
- case 0x44: /* ex */
- return -EINVAL;
- }
- switch (*(__u16 *) instruction) {
- case 0x0101: /* pr */
- case 0xb25a: /* bsa */
- case 0xb240: /* bakr */
- case 0xb258: /* bsg */
- case 0xb218: /* pc */
- case 0xb228: /* pt */
- return -EINVAL;
- }
- return 0;
-}
-
-void __kprobes get_instruction_type(struct arch_specific_insn *ainsn)
-{
- /* default fixup method */
- ainsn->fixup = FIXUP_PSW_NORMAL;
-
- /* save r1 operand */
- ainsn->reg = (*ainsn->insn & 0xf0) >> 4;
-
- /* save the instruction length (pop 5-5) in bytes */
- switch (*(__u8 *) (ainsn->insn) >> 6) {
- case 0:
- ainsn->ilen = 2;
- break;
- case 1:
- case 2:
- ainsn->ilen = 4;
- break;
- case 3:
- ainsn->ilen = 6;
- break;
- }
-
- switch (*(__u8 *) ainsn->insn) {
- case 0x05: /* balr */
- case 0x0d: /* basr */
- ainsn->fixup = FIXUP_RETURN_REGISTER;
- /* if r2 = 0, no branch will be taken */
- if ((*ainsn->insn & 0x0f) == 0)
- ainsn->fixup |= FIXUP_BRANCH_NOT_TAKEN;
- break;
- case 0x06: /* bctr */
- case 0x07: /* bcr */
- ainsn->fixup = FIXUP_BRANCH_NOT_TAKEN;
- break;
- case 0x45: /* bal */
- case 0x4d: /* bas */
- ainsn->fixup = FIXUP_RETURN_REGISTER;
- break;
- case 0x47: /* bc */
- case 0x46: /* bct */
- case 0x86: /* bxh */
- case 0x87: /* bxle */
- ainsn->fixup = FIXUP_BRANCH_NOT_TAKEN;
- break;
- case 0x82: /* lpsw */
- ainsn->fixup = FIXUP_NOT_REQUIRED;
- break;
- case 0xb2: /* lpswe */
- if (*(((__u8 *) ainsn->insn) + 1) == 0xb2) {
- ainsn->fixup = FIXUP_NOT_REQUIRED;
- }
- break;
- case 0xa7: /* bras */
- if ((*ainsn->insn & 0x0f) == 0x05) {
- ainsn->fixup |= FIXUP_RETURN_REGISTER;
- }
- break;
- case 0xc0:
- if ((*ainsn->insn & 0x0f) == 0x00 /* larl */
- || (*ainsn->insn & 0x0f) == 0x05) /* brasl */
- ainsn->fixup |= FIXUP_RETURN_REGISTER;
- break;
- case 0xeb:
- if (*(((__u8 *) ainsn->insn) + 5 ) == 0x44 || /* bxhg */
- *(((__u8 *) ainsn->insn) + 5) == 0x45) {/* bxleg */
- ainsn->fixup = FIXUP_BRANCH_NOT_TAKEN;
- }
- break;
- case 0xe3: /* bctg */
- if (*(((__u8 *) ainsn->insn) + 5) == 0x46) {
- ainsn->fixup = FIXUP_BRANCH_NOT_TAKEN;
- }
- break;
- }
-}
-
-static int __kprobes swap_instruction(void *aref)
-{
- struct ins_replace_args *args = aref;
- u32 *addr;
- u32 instr;
- int err = -EFAULT;
-
- /*
- * Text segment is read-only, hence we use stura to bypass dynamic
- * address translation to exchange the instruction. Since stura
- * always operates on four bytes, but we only want to exchange two
- * bytes do some calculations to get things right. In addition we
- * shall not cross any page boundaries (vmalloc area!) when writing
- * the new instruction.
- */
- addr = (u32 *)((unsigned long)args->ptr & -4UL);
- if ((unsigned long)args->ptr & 2)
- instr = ((*addr) & 0xffff0000) | args->new;
- else
- instr = ((*addr) & 0x0000ffff) | args->new << 16;
-
- asm volatile(
- " lra %1,0(%1)\n"
- "0: stura %2,%1\n"
- "1: la %0,0\n"
- "2:\n"
- EX_TABLE(0b,2b)
- : "+d" (err)
- : "a" (addr), "d" (instr)
- : "memory", "cc");
-
- return err;
-}
-
-void __kprobes arch_arm_kprobe(struct kprobe *p)
-{
- struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
- unsigned long status = kcb->kprobe_status;
- struct ins_replace_args args;
-
- args.ptr = p->addr;
- args.old = p->opcode;
- args.new = BREAKPOINT_INSTRUCTION;
-
- kcb->kprobe_status = KPROBE_SWAP_INST;
- stop_machine_run(swap_instruction, &args, NR_CPUS);
- kcb->kprobe_status = status;
-}
-
-void __kprobes arch_disarm_kprobe(struct kprobe *p)
-{
- struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
- unsigned long status = kcb->kprobe_status;
- struct ins_replace_args args;
-
- args.ptr = p->addr;
- args.old = BREAKPOINT_INSTRUCTION;
- args.new = p->opcode;
-
- kcb->kprobe_status = KPROBE_SWAP_INST;
- stop_machine_run(swap_instruction, &args, NR_CPUS);
- kcb->kprobe_status = status;
-}
-
-void __kprobes arch_remove_kprobe(struct kprobe *p)
-{
- mutex_lock(&kprobe_mutex);
- free_insn_slot(p->ainsn.insn, 0);
- mutex_unlock(&kprobe_mutex);
-}
-
-static void __kprobes prepare_singlestep(struct kprobe *p, struct pt_regs *regs)
-{
- per_cr_bits kprobe_per_regs[1];
-
- memset(kprobe_per_regs, 0, sizeof(per_cr_bits));
- regs->psw.addr = (unsigned long)p->ainsn.insn | PSW_ADDR_AMODE;
-
- /* Set up the per control reg info, will pass to lctl */
- kprobe_per_regs[0].em_instruction_fetch = 1;
- kprobe_per_regs[0].starting_addr = (unsigned long)p->ainsn.insn;
- kprobe_per_regs[0].ending_addr = (unsigned long)p->ainsn.insn + 1;
-
- /* Set the PER control regs, turns on single step for this address */
- __ctl_load(kprobe_per_regs, 9, 11);
- regs->psw.mask |= PSW_MASK_PER;
- regs->psw.mask &= ~(PSW_MASK_IO | PSW_MASK_EXT | PSW_MASK_MCHECK);
-}
-
-static void __kprobes save_previous_kprobe(struct kprobe_ctlblk *kcb)
-{
- kcb->prev_kprobe.kp = kprobe_running();
- kcb->prev_kprobe.status = kcb->kprobe_status;
- kcb->prev_kprobe.kprobe_saved_imask = kcb->kprobe_saved_imask;
- memcpy(kcb->prev_kprobe.kprobe_saved_ctl, kcb->kprobe_saved_ctl,
- sizeof(kcb->kprobe_saved_ctl));
-}
-
-static void __kprobes restore_previous_kprobe(struct kprobe_ctlblk *kcb)
-{
- __get_cpu_var(current_kprobe) = kcb->prev_kprobe.kp;
- kcb->kprobe_status = kcb->prev_kprobe.status;
- kcb->kprobe_saved_imask = kcb->prev_kprobe.kprobe_saved_imask;
- memcpy(kcb->kprobe_saved_ctl, kcb->prev_kprobe.kprobe_saved_ctl,
- sizeof(kcb->kprobe_saved_ctl));
-}
-
-static void __kprobes set_current_kprobe(struct kprobe *p, struct pt_regs *regs,
- struct kprobe_ctlblk *kcb)
-{
- __get_cpu_var(current_kprobe) = p;
- /* Save the interrupt and per flags */
- kcb->kprobe_saved_imask = regs->psw.mask &
- (PSW_MASK_PER | PSW_MASK_IO | PSW_MASK_EXT | PSW_MASK_MCHECK);
- /* Save the control regs that govern PER */
- __ctl_store(kcb->kprobe_saved_ctl, 9, 11);
-}
-
-/* Called with kretprobe_lock held */
-void __kprobes arch_prepare_kretprobe(struct kretprobe_instance *ri,
- struct pt_regs *regs)
-{
- ri->ret_addr = (kprobe_opcode_t *) regs->gprs[14];
-
- /* Replace the return addr with trampoline addr */
- regs->gprs[14] = (unsigned long)&kretprobe_trampoline;
-}
-
-static int __kprobes kprobe_handler(struct pt_regs *regs)
-{
- struct kprobe *p;
- int ret = 0;
- unsigned long *addr = (unsigned long *)
- ((regs->psw.addr & PSW_ADDR_INSN) - 2);
- struct kprobe_ctlblk *kcb;
-
- /*
- * We don't want to be preempted for the entire
- * duration of kprobe processing
- */
- preempt_disable();
- kcb = get_kprobe_ctlblk();
-
- /* Check we're not actually recursing */
- if (kprobe_running()) {
- p = get_kprobe(addr);
- if (p) {
- if (kcb->kprobe_status == KPROBE_HIT_SS &&
- *p->ainsn.insn == BREAKPOINT_INSTRUCTION) {
- regs->psw.mask &= ~PSW_MASK_PER;
- regs->psw.mask |= kcb->kprobe_saved_imask;
- goto no_kprobe;
- }
- /* We have reentered the kprobe_handler(), since
- * another probe was hit while within the handler.
- * We here save the original kprobes variables and
- * just single step on the instruction of the new probe
- * without calling any user handlers.
- */
- save_previous_kprobe(kcb);
- set_current_kprobe(p, regs, kcb);
- kprobes_inc_nmissed_count(p);
- prepare_singlestep(p, regs);
- kcb->kprobe_status = KPROBE_REENTER;
- return 1;
- } else {
- p = __get_cpu_var(current_kprobe);
- if (p->break_handler && p->break_handler(p, regs)) {
- goto ss_probe;
- }
- }
- goto no_kprobe;
- }
-
- p = get_kprobe(addr);
- if (!p)
- /*
- * No kprobe at this address. The fault has not been
- * caused by a kprobe breakpoint. The race of breakpoint
- * vs. kprobe remove does not exist because on s390 we
- * use stop_machine_run to arm/disarm the breakpoints.
- */
- goto no_kprobe;
-
- kcb->kprobe_status = KPROBE_HIT_ACTIVE;
- set_current_kprobe(p, regs, kcb);
- if (p->pre_handler && p->pre_handler(p, regs))
- /* handler has already set things up, so skip ss setup */
- return 1;
-
-ss_probe:
- prepare_singlestep(p, regs);
- kcb->kprobe_status = KPROBE_HIT_SS;
- return 1;
-
-no_kprobe:
- preempt_enable_no_resched();
- return ret;
-}
-
-/*
- * Function return probe trampoline:
- * - init_kprobes() establishes a probepoint here
- * - When the probed function returns, this probe
- * causes the handlers to fire
- */
-void kretprobe_trampoline_holder(void)
-{
- asm volatile(".global kretprobe_trampoline\n"
- "kretprobe_trampoline: bcr 0,0\n");
-}
-
-/*
- * Called when the probe at kretprobe trampoline is hit
- */
-static int __kprobes trampoline_probe_handler(struct kprobe *p,
- struct pt_regs *regs)
-{
- struct kretprobe_instance *ri = NULL;
- struct hlist_head *head, empty_rp;
- struct hlist_node *node, *tmp;
- unsigned long flags, orig_ret_address = 0;
- unsigned long trampoline_address = (unsigned long)&kretprobe_trampoline;
-
- INIT_HLIST_HEAD(&empty_rp);
- spin_lock_irqsave(&kretprobe_lock, flags);
- head = kretprobe_inst_table_head(current);
-
- /*
- * It is possible to have multiple instances associated with a given
- * task either because an multiple functions in the call path
- * have a return probe installed on them, and/or more then one return
- * return probe was registered for a target function.
- *
- * We can handle this because:
- * - instances are always inserted at the head of the list
- * - when multiple return probes are registered for the same
- * function, the first instance's ret_addr will point to the
- * real return address, and all the rest will point to
- * kretprobe_trampoline
- */
- hlist_for_each_entry_safe(ri, node, tmp, head, hlist) {
- if (ri->task != current)
- /* another task is sharing our hash bucket */
- continue;
-
- if (ri->rp && ri->rp->handler)
- ri->rp->handler(ri, regs);
-
- orig_ret_address = (unsigned long)ri->ret_addr;
- recycle_rp_inst(ri, &empty_rp);
-
- if (orig_ret_address != trampoline_address) {
- /*
- * This is the real return address. Any other
- * instances associated with this task are for
- * other calls deeper on the call stack
- */
- break;
- }
- }
- kretprobe_assert(ri, orig_ret_address, trampoline_address);
- regs->psw.addr = orig_ret_address | PSW_ADDR_AMODE;
-
- reset_current_kprobe();
- spin_unlock_irqrestore(&kretprobe_lock, flags);
- preempt_enable_no_resched();
-
- hlist_for_each_entry_safe(ri, node, tmp, &empty_rp, hlist) {
- hlist_del(&ri->hlist);
- kfree(ri);
- }
- /*
- * By returning a non-zero value, we are telling
- * kprobe_handler() that we don't want the post_handler
- * to run (and have re-enabled preemption)
- */
- return 1;
-}
-
-/*
- * Called after single-stepping. p->addr is the address of the
- * instruction whose first byte has been replaced by the "breakpoint"
- * instruction. To avoid the SMP problems that can occur when we
- * temporarily put back the original opcode to single-step, we
- * single-stepped a copy of the instruction. The address of this
- * copy is p->ainsn.insn.
- */
-static void __kprobes resume_execution(struct kprobe *p, struct pt_regs *regs)
-{
- struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
-
- regs->psw.addr &= PSW_ADDR_INSN;
-
- if (p->ainsn.fixup & FIXUP_PSW_NORMAL)
- regs->psw.addr = (unsigned long)p->addr +
- ((unsigned long)regs->psw.addr -
- (unsigned long)p->ainsn.insn);
-
- if (p->ainsn.fixup & FIXUP_BRANCH_NOT_TAKEN)
- if ((unsigned long)regs->psw.addr -
- (unsigned long)p->ainsn.insn == p->ainsn.ilen)
- regs->psw.addr = (unsigned long)p->addr + p->ainsn.ilen;
-
- if (p->ainsn.fixup & FIXUP_RETURN_REGISTER)
- regs->gprs[p->ainsn.reg] = ((unsigned long)p->addr +
- (regs->gprs[p->ainsn.reg] -
- (unsigned long)p->ainsn.insn))
- | PSW_ADDR_AMODE;
-
- regs->psw.addr |= PSW_ADDR_AMODE;
- /* turn off PER mode */
- regs->psw.mask &= ~PSW_MASK_PER;
- /* Restore the original per control regs */
- __ctl_load(kcb->kprobe_saved_ctl, 9, 11);
- regs->psw.mask |= kcb->kprobe_saved_imask;
-}
-
-static int __kprobes post_kprobe_handler(struct pt_regs *regs)
-{
- struct kprobe *cur = kprobe_running();
- struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
-
- if (!cur)
- return 0;
-
- if ((kcb->kprobe_status != KPROBE_REENTER) && cur->post_handler) {
- kcb->kprobe_status = KPROBE_HIT_SSDONE;
- cur->post_handler(cur, regs, 0);
- }
-
- resume_execution(cur, regs);
-
- /*Restore back the original saved kprobes variables and continue. */
- if (kcb->kprobe_status == KPROBE_REENTER) {
- restore_previous_kprobe(kcb);
- goto out;
- }
- reset_current_kprobe();
-out:
- preempt_enable_no_resched();
-
- /*
- * if somebody else is singlestepping across a probe point, psw mask
- * will have PER set, in which case, continue the remaining processing
- * of do_single_step, as if this is not a probe hit.
- */
- if (regs->psw.mask & PSW_MASK_PER) {
- return 0;
- }
-
- return 1;
-}
-
-int __kprobes kprobe_fault_handler(struct pt_regs *regs, int trapnr)
-{
- struct kprobe *cur = kprobe_running();
- struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
- const struct exception_table_entry *entry;
-
- switch(kcb->kprobe_status) {
- case KPROBE_SWAP_INST:
- /* We are here because the instruction replacement failed */
- return 0;
- case KPROBE_HIT_SS:
- case KPROBE_REENTER:
- /*
- * We are here because the instruction being single
- * stepped caused a page fault. We reset the current
- * kprobe and the nip points back to the probe address
- * and allow the page fault handler to continue as a
- * normal page fault.
- */
- regs->psw.addr = (unsigned long)cur->addr | PSW_ADDR_AMODE;
- regs->psw.mask &= ~PSW_MASK_PER;
- regs->psw.mask |= kcb->kprobe_saved_imask;
- if (kcb->kprobe_status == KPROBE_REENTER)
- restore_previous_kprobe(kcb);
- else
- reset_current_kprobe();
- preempt_enable_no_resched();
- break;
- case KPROBE_HIT_ACTIVE:
- case KPROBE_HIT_SSDONE:
- /*
- * We increment the nmissed count for accounting,
- * we can also use npre/npostfault count for accouting
- * these specific fault cases.
- */
- kprobes_inc_nmissed_count(cur);
-
- /*
- * We come here because instructions in the pre/post
- * handler caused the page_fault, this could happen
- * if handler tries to access user space by
- * copy_from_user(), get_user() etc. Let the
- * user-specified handler try to fix it first.
- */
- if (cur->fault_handler && cur->fault_handler(cur, regs, trapnr))
- return 1;
-
- /*
- * In case the user-specified fault handler returned
- * zero, try to fix up.
- */
- entry = search_exception_tables(regs->psw.addr & PSW_ADDR_INSN);
- if (entry) {
- regs->psw.addr = entry->fixup | PSW_ADDR_AMODE;
- return 1;
- }
-
- /*
- * fixup_exception() could not handle it,
- * Let do_page_fault() fix it.
- */
- break;
- default:
- break;
- }
- return 0;
-}
-
-/*
- * Wrapper routine to for handling exceptions.
- */
-int __kprobes kprobe_exceptions_notify(struct notifier_block *self,
- unsigned long val, void *data)
-{
- struct die_args *args = (struct die_args *)data;
- int ret = NOTIFY_DONE;
-
- switch (val) {
- case DIE_BPT:
- if (kprobe_handler(args->regs))
- ret = NOTIFY_STOP;
- break;
- case DIE_SSTEP:
- if (post_kprobe_handler(args->regs))
- ret = NOTIFY_STOP;
- break;
- case DIE_TRAP:
- /* kprobe_running() needs smp_processor_id() */
- preempt_disable();
- if (kprobe_running() &&
- kprobe_fault_handler(args->regs, args->trapnr))
- ret = NOTIFY_STOP;
- preempt_enable();
- break;
- default:
- break;
- }
- return ret;
-}
-
-int __kprobes setjmp_pre_handler(struct kprobe *p, struct pt_regs *regs)
-{
- struct jprobe *jp = container_of(p, struct jprobe, kp);
- unsigned long addr;
- struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
-
- memcpy(&kcb->jprobe_saved_regs, regs, sizeof(struct pt_regs));
-
- /* setup return addr to the jprobe handler routine */
- regs->psw.addr = (unsigned long)(jp->entry) | PSW_ADDR_AMODE;
-
- /* r14 is the function return address */
- kcb->jprobe_saved_r14 = (unsigned long)regs->gprs[14];
- /* r15 is the stack pointer */
- kcb->jprobe_saved_r15 = (unsigned long)regs->gprs[15];
- addr = (unsigned long)kcb->jprobe_saved_r15;
-
- memcpy(kcb->jprobes_stack, (kprobe_opcode_t *) addr,
- MIN_STACK_SIZE(addr));
- return 1;
-}
-
-void __kprobes jprobe_return(void)
-{
- asm volatile(".word 0x0002");
-}
-
-void __kprobes jprobe_return_end(void)
-{
- asm volatile("bcr 0,0");
-}
-
-int __kprobes longjmp_break_handler(struct kprobe *p, struct pt_regs *regs)
-{
- struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
- unsigned long stack_addr = (unsigned long)(kcb->jprobe_saved_r15);
-
- /* Put the regs back */
- memcpy(regs, &kcb->jprobe_saved_regs, sizeof(struct pt_regs));
- /* put the stack back */
- memcpy((kprobe_opcode_t *) stack_addr, kcb->jprobes_stack,
- MIN_STACK_SIZE(stack_addr));
- preempt_enable_no_resched();
- return 1;
-}
-
-static struct kprobe trampoline_p = {
- .addr = (kprobe_opcode_t *) & kretprobe_trampoline,
- .pre_handler = trampoline_probe_handler
-};
-
-int __init arch_init_kprobes(void)
-{
- return register_kprobe(&trampoline_p);
-}
-
-int __kprobes arch_trampoline_kprobe(struct kprobe *p)
-{
- if (p->addr == (kprobe_opcode_t *) & kretprobe_trampoline)
- return 1;
- return 0;
-}
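
Each per-arch kprobe_fault_handler() above first offers a fault raised from
a probe's own pre/post handlers to that probe's optional fault_handler,
before falling back to the exception tables. A short sketch of what such a
handler looks like; the target symbol and message are assumptions, not part
of this patch:

#include <linux/module.h>
#include <linux/kprobes.h>

/* Returning 0 lets the normal fixup / do_page_fault() path run;
 * returning 1 claims the fault as handled.
 */
static int handler_fault(struct kprobe *p, struct pt_regs *regs, int trapnr)
{
	printk(KERN_INFO "fault %d while handling probe at %p\n",
	       trapnr, p->addr);
	return 0;
}

static struct kprobe kp = {
	.symbol_name	= "do_exit",	/* assumed example target */
	.fault_handler	= handler_fault,
};

static int __init kpf_init(void)
{
	return register_kprobe(&kp);
}

static void __exit kpf_exit(void)
{
	unregister_kprobe(&kp);
}

module_init(kpf_init);
module_exit(kpf_exit);
MODULE_LICENSE("GPL");
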
Index: linux-2.6-lttng.stable/arch/sparc64/instrumentation/kprobes.c
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng.stable/arch/sparc64/instrumentation/kprobes.c 2007-10-29 09:51:38.000000000 -0400
@@ -0,0 +1,487 @@
+/* arch/sparc64/kernel/kprobes.c
+ *
+ * Copyright (C) 2004 David S. Miller <davem@davemloft.net>
+ */
+
+#include <linux/kernel.h>
+#include <linux/kprobes.h>
+#include <linux/module.h>
+#include <linux/kdebug.h>
+#include <asm/signal.h>
+#include <asm/cacheflush.h>
+#include <asm/uaccess.h>
+
+/* We do not have hardware single-stepping on sparc64.
+ * So we implement software single-stepping with breakpoint
+ * traps. The top-level scheme is similar to that used
+ * in the x86 kprobes implementation.
+ *
+ * In the kprobe->ainsn.insn[] array we store the original
+ * instruction at index zero and a break instruction at
+ * index one.
+ *
+ * When we hit a kprobe we:
+ * - Run the pre-handler
+ * - Remember "regs->tnpc" and interrupt level stored in
+ * "regs->tstate" so we can restore them later
+ * - Disable PIL interrupts
+ * - Set regs->tpc to point to kprobe->ainsn.insn[0]
+ * - Set regs->tnpc to point to kprobe->ainsn.insn[1]
+ * - Mark that we are actively in a kprobe
+ *
+ * At this point we wait for the second breakpoint at
+ * kprobe->ainsn.insn[1] to hit. When it does we:
+ * - Run the post-handler
+ * - Set regs->tpc to "remembered" regs->tnpc stored above,
+ * restore the PIL interrupt level in "regs->tstate" as well
+ * - Make any adjustments necessary to regs->tnpc in order
+ * to handle relative branches correctly. See below.
+ * - Mark that we are no longer actively in a kprobe.
+ */
+
+DEFINE_PER_CPU(struct kprobe *, current_kprobe) = NULL;
+DEFINE_PER_CPU(struct kprobe_ctlblk, kprobe_ctlblk);
+
+struct kretprobe_blackpoint kretprobe_blacklist[] = {{NULL, NULL}};
+
+int __kprobes arch_prepare_kprobe(struct kprobe *p)
+{
+ p->ainsn.insn[0] = *p->addr;
+ flushi(&p->ainsn.insn[0]);
+
+ p->ainsn.insn[1] = BREAKPOINT_INSTRUCTION_2;
+ flushi(&p->ainsn.insn[1]);
+
+ p->opcode = *p->addr;
+ return 0;
+}
+
+void __kprobes arch_arm_kprobe(struct kprobe *p)
+{
+ *p->addr = BREAKPOINT_INSTRUCTION;
+ flushi(p->addr);
+}
+
+void __kprobes arch_disarm_kprobe(struct kprobe *p)
+{
+ *p->addr = p->opcode;
+ flushi(p->addr);
+}
+
+static void __kprobes save_previous_kprobe(struct kprobe_ctlblk *kcb)
+{
+ kcb->prev_kprobe.kp = kprobe_running();
+ kcb->prev_kprobe.status = kcb->kprobe_status;
+ kcb->prev_kprobe.orig_tnpc = kcb->kprobe_orig_tnpc;
+ kcb->prev_kprobe.orig_tstate_pil = kcb->kprobe_orig_tstate_pil;
+}
+
+static void __kprobes restore_previous_kprobe(struct kprobe_ctlblk *kcb)
+{
+ __get_cpu_var(current_kprobe) = kcb->prev_kprobe.kp;
+ kcb->kprobe_status = kcb->prev_kprobe.status;
+ kcb->kprobe_orig_tnpc = kcb->prev_kprobe.orig_tnpc;
+ kcb->kprobe_orig_tstate_pil = kcb->prev_kprobe.orig_tstate_pil;
+}
+
+static void __kprobes set_current_kprobe(struct kprobe *p, struct pt_regs *regs,
+ struct kprobe_ctlblk *kcb)
+{
+ __get_cpu_var(current_kprobe) = p;
+ kcb->kprobe_orig_tnpc = regs->tnpc;
+ kcb->kprobe_orig_tstate_pil = (regs->tstate & TSTATE_PIL);
+}
+
+static void __kprobes prepare_singlestep(struct kprobe *p, struct pt_regs *regs,
+ struct kprobe_ctlblk *kcb)
+{
+ regs->tstate |= TSTATE_PIL;
+
+	/* single-step inline if it is a breakpoint instruction */
+ if (p->opcode == BREAKPOINT_INSTRUCTION) {
+ regs->tpc = (unsigned long) p->addr;
+ regs->tnpc = kcb->kprobe_orig_tnpc;
+ } else {
+ regs->tpc = (unsigned long) &p->ainsn.insn[0];
+ regs->tnpc = (unsigned long) &p->ainsn.insn[1];
+ }
+}
+
+static int __kprobes kprobe_handler(struct pt_regs *regs)
+{
+ struct kprobe *p;
+ void *addr = (void *) regs->tpc;
+ int ret = 0;
+ struct kprobe_ctlblk *kcb;
+
+ /*
+ * We don't want to be preempted for the entire
+ * duration of kprobe processing
+ */
+ preempt_disable();
+ kcb = get_kprobe_ctlblk();
+
+ if (kprobe_running()) {
+ p = get_kprobe(addr);
+ if (p) {
+ if (kcb->kprobe_status == KPROBE_HIT_SS) {
+ regs->tstate = ((regs->tstate & ~TSTATE_PIL) |
+ kcb->kprobe_orig_tstate_pil);
+ goto no_kprobe;
+ }
+ /* We have reentered the kprobe_handler(), since
+ * another probe was hit while within the handler.
+			 * Here we save the original kprobe variables and
+ * just single step on the instruction of the new probe
+ * without calling any user handlers.
+ */
+ save_previous_kprobe(kcb);
+ set_current_kprobe(p, regs, kcb);
+ kprobes_inc_nmissed_count(p);
+ kcb->kprobe_status = KPROBE_REENTER;
+ prepare_singlestep(p, regs, kcb);
+ return 1;
+ } else {
+ if (*(u32 *)addr != BREAKPOINT_INSTRUCTION) {
+ /* The breakpoint instruction was removed by
+				 * another cpu right after we hit it; no further
+				 * handling of this interrupt is appropriate.
+ */
+ ret = 1;
+ goto no_kprobe;
+ }
+ p = __get_cpu_var(current_kprobe);
+ if (p->break_handler && p->break_handler(p, regs))
+ goto ss_probe;
+ }
+ goto no_kprobe;
+ }
+
+ p = get_kprobe(addr);
+ if (!p) {
+ if (*(u32 *)addr != BREAKPOINT_INSTRUCTION) {
+ /*
+ * The breakpoint instruction was removed right
+ * after we hit it. Another cpu has removed
+ * either a probepoint or a debugger breakpoint
+ * at this address. In either case, no further
+ * handling of this interrupt is appropriate.
+ */
+ ret = 1;
+ }
+ /* Not one of ours: let kernel handle it */
+ goto no_kprobe;
+ }
+
+ set_current_kprobe(p, regs, kcb);
+ kcb->kprobe_status = KPROBE_HIT_ACTIVE;
+ if (p->pre_handler && p->pre_handler(p, regs))
+ return 1;
+
+ss_probe:
+ prepare_singlestep(p, regs, kcb);
+ kcb->kprobe_status = KPROBE_HIT_SS;
+ return 1;
+
+no_kprobe:
+ preempt_enable_no_resched();
+ return ret;
+}
+
+/* If INSN is a relative control transfer instruction,
+ * return the corrected branch destination value.
+ *
+ * regs->tpc and regs->tnpc still hold the values of the
+ * program counters at the time of trap due to the execution
+ * of the BREAKPOINT_INSTRUCTION_2 at p->ainsn.insn[1]
+ *
+ */
+static unsigned long __kprobes relbranch_fixup(u32 insn, struct kprobe *p,
+ struct pt_regs *regs)
+{
+ unsigned long real_pc = (unsigned long) p->addr;
+
+ /* Branch not taken, no mods necessary. */
+ if (regs->tnpc == regs->tpc + 0x4UL)
+ return real_pc + 0x8UL;
+
+ /* The three cases are call, branch w/prediction,
+ * and traditional branch.
+ */
+ if ((insn & 0xc0000000) == 0x40000000 ||
+ (insn & 0xc1c00000) == 0x00400000 ||
+ (insn & 0xc1c00000) == 0x00800000) {
+ unsigned long ainsn_addr;
+
+ ainsn_addr = (unsigned long) &p->ainsn.insn[0];
+
+ /* The instruction did all the work for us
+ * already, just apply the offset to the correct
+ * instruction location.
+ */
+ return (real_pc + (regs->tnpc - ainsn_addr));
+ }
+
+ /* It is jmpl or some other absolute PC modification instruction,
+ * leave NPC as-is.
+ */
+ return regs->tnpc;
+}
+
+/* If INSN is an instruction which writes its PC location
+ * into a destination register, fix that up.
+ */
+static void __kprobes retpc_fixup(struct pt_regs *regs, u32 insn,
+ unsigned long real_pc)
+{
+ unsigned long *slot = NULL;
+
+ /* Simplest case is 'call', which always uses %o7 */
+ if ((insn & 0xc0000000) == 0x40000000) {
+		slot = &regs->u_regs[UREG_I7];
+ }
+
+ /* 'jmpl' encodes the register inside of the opcode */
+ if ((insn & 0xc1f80000) == 0x81c00000) {
+ unsigned long rd = ((insn >> 25) & 0x1f);
+
+ if (rd <= 15) {
+			slot = &regs->u_regs[rd];
+ } else {
+ /* Hard case, it goes onto the stack. */
+ flushw_all();
+
+ rd -= 16;
+ slot = (unsigned long *)
+ (regs->u_regs[UREG_FP] + STACK_BIAS);
+ slot += rd;
+ }
+ }
+ if (slot != NULL)
+ *slot = real_pc;
+}
+
+/*
+ * Called after single-stepping. p->addr is the address of the
+ * instruction which has been replaced by the breakpoint
+ * instruction. To avoid the SMP problems that can occur when we
+ * temporarily put back the original opcode to single-step, we
+ * single-stepped a copy of the instruction. The address of this
+ * copy is &p->ainsn.insn[0].
+ *
+ * This function prepares to return from the post-single-step
+ * breakpoint trap.
+ */
+static void __kprobes resume_execution(struct kprobe *p,
+ struct pt_regs *regs, struct kprobe_ctlblk *kcb)
+{
+ u32 insn = p->ainsn.insn[0];
+
+ regs->tnpc = relbranch_fixup(insn, p, regs);
+
+ /* This assignment must occur after relbranch_fixup() */
+ regs->tpc = kcb->kprobe_orig_tnpc;
+
+ retpc_fixup(regs, insn, (unsigned long) p->addr);
+
+ regs->tstate = ((regs->tstate & ~TSTATE_PIL) |
+ kcb->kprobe_orig_tstate_pil);
+}
+
+static int __kprobes post_kprobe_handler(struct pt_regs *regs)
+{
+ struct kprobe *cur = kprobe_running();
+ struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
+
+ if (!cur)
+ return 0;
+
+ if ((kcb->kprobe_status != KPROBE_REENTER) && cur->post_handler) {
+ kcb->kprobe_status = KPROBE_HIT_SSDONE;
+ cur->post_handler(cur, regs, 0);
+ }
+
+ resume_execution(cur, regs, kcb);
+
+	/* Restore the original saved kprobe variables and continue. */
+ if (kcb->kprobe_status == KPROBE_REENTER) {
+ restore_previous_kprobe(kcb);
+ goto out;
+ }
+ reset_current_kprobe();
+out:
+ preempt_enable_no_resched();
+
+ return 1;
+}
+
+int __kprobes kprobe_fault_handler(struct pt_regs *regs, int trapnr)
+{
+ struct kprobe *cur = kprobe_running();
+ struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
+ const struct exception_table_entry *entry;
+
+ switch(kcb->kprobe_status) {
+ case KPROBE_HIT_SS:
+ case KPROBE_REENTER:
+ /*
+ * We are here because the instruction being single
+ * stepped caused a page fault. We reset the current
+		 * kprobe so that the tpc points back to the probe
+		 * address, and allow the page fault handler to continue as a
+ * normal page fault.
+ */
+ regs->tpc = (unsigned long)cur->addr;
+ regs->tnpc = kcb->kprobe_orig_tnpc;
+ regs->tstate = ((regs->tstate & ~TSTATE_PIL) |
+ kcb->kprobe_orig_tstate_pil);
+ if (kcb->kprobe_status == KPROBE_REENTER)
+ restore_previous_kprobe(kcb);
+ else
+ reset_current_kprobe();
+ preempt_enable_no_resched();
+ break;
+ case KPROBE_HIT_ACTIVE:
+ case KPROBE_HIT_SSDONE:
+ /*
+		 * We increment the nmissed count for accounting;
+		 * we could also use the npre/npostfault counts to account
+		 * for these specific fault cases.
+ */
+ kprobes_inc_nmissed_count(cur);
+
+ /*
+ * We come here because instructions in the pre/post
+		 * handler caused the page fault; this could happen
+		 * if the handler tries to access user space by
+ * copy_from_user(), get_user() etc. Let the
+ * user-specified handler try to fix it first.
+ */
+ if (cur->fault_handler && cur->fault_handler(cur, regs, trapnr))
+ return 1;
+
+ /*
+ * In case the user-specified fault handler returned
+ * zero, try to fix up.
+ */
+
+ entry = search_exception_tables(regs->tpc);
+ if (entry) {
+ regs->tpc = entry->fixup;
+ regs->tnpc = regs->tpc + 4;
+ return 1;
+ }
+
+ /*
+		 * fixup_exception() could not handle it;
+		 * let do_page_fault() fix it.
+ */
+ break;
+ default:
+ break;
+ }
+
+ return 0;
+}
+
+/*
+ * Wrapper routine for handling exceptions.
+ */
+int __kprobes kprobe_exceptions_notify(struct notifier_block *self,
+ unsigned long val, void *data)
+{
+ struct die_args *args = (struct die_args *)data;
+ int ret = NOTIFY_DONE;
+
+ if (args->regs && user_mode(args->regs))
+ return ret;
+
+ switch (val) {
+ case DIE_DEBUG:
+ if (kprobe_handler(args->regs))
+ ret = NOTIFY_STOP;
+ break;
+ case DIE_DEBUG_2:
+ if (post_kprobe_handler(args->regs))
+ ret = NOTIFY_STOP;
+ break;
+ default:
+ break;
+ }
+ return ret;
+}
+
+asmlinkage void __kprobes kprobe_trap(unsigned long trap_level,
+ struct pt_regs *regs)
+{
+ BUG_ON(trap_level != 0x170 && trap_level != 0x171);
+
+ if (user_mode(regs)) {
+ local_irq_enable();
+ bad_trap(regs, trap_level);
+ return;
+ }
+
+ /* trap_level == 0x170 --> ta 0x70
+ * trap_level == 0x171 --> ta 0x71
+ */
+ if (notify_die((trap_level == 0x170) ? DIE_DEBUG : DIE_DEBUG_2,
+ (trap_level == 0x170) ? "debug" : "debug_2",
+ regs, 0, trap_level, SIGTRAP) != NOTIFY_STOP)
+ bad_trap(regs, trap_level);
+}
+
+/* Jprobes support. */
+int __kprobes setjmp_pre_handler(struct kprobe *p, struct pt_regs *regs)
+{
+ struct jprobe *jp = container_of(p, struct jprobe, kp);
+ struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
+
+ memcpy(&(kcb->jprobe_saved_regs), regs, sizeof(*regs));
+
+ regs->tpc = (unsigned long) jp->entry;
+ regs->tnpc = ((unsigned long) jp->entry) + 0x4UL;
+ regs->tstate |= TSTATE_PIL;
+
+ return 1;
+}
+
+void __kprobes jprobe_return(void)
+{
+ struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
+ register unsigned long orig_fp asm("g1");
+
+ orig_fp = kcb->jprobe_saved_regs.u_regs[UREG_FP];
+ __asm__ __volatile__("\n"
+"1: cmp %%sp, %0\n\t"
+ "blu,a,pt %%xcc, 1b\n\t"
+ " restore\n\t"
+ ".globl jprobe_return_trap_instruction\n"
+"jprobe_return_trap_instruction:\n\t"
+ "ta 0x70"
+ : /* no outputs */
+ : "r" (orig_fp));
+}
+
+extern void jprobe_return_trap_instruction(void);
+
+extern void __show_regs(struct pt_regs * regs);
+
+int __kprobes longjmp_break_handler(struct kprobe *p, struct pt_regs *regs)
+{
+ u32 *addr = (u32 *) regs->tpc;
+ struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
+
+ if (addr == (u32 *) jprobe_return_trap_instruction) {
+ memcpy(regs, &(kcb->jprobe_saved_regs), sizeof(*regs));
+ preempt_enable_no_resched();
+ return 1;
+ }
+ return 0;
+}
+
+/* architecture specific initialization */
+int arch_init_kprobes(void)
+{
+ return 0;
+}
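
For context: the handlers above are only the architecture-specific half of
kprobes; a client module drives them through the generic API. A minimal
client looks roughly like the sketch below (illustrative only, not part of
this patch; the probed symbol "do_fork" and the printk text are
placeholders):

	#include <linux/module.h>
	#include <linux/kprobes.h>

	static int handler_pre(struct kprobe *p, struct pt_regs *regs)
	{
		printk(KERN_INFO "pre_handler: probe hit at %p\n", p->addr);
		return 0;	/* 0: proceed to single-step the probed insn */
	}

	static struct kprobe kp = {
		.symbol_name	= "do_fork",	/* placeholder symbol */
		.pre_handler	= handler_pre,
	};

	static int __init kprobe_example_init(void)
	{
		return register_kprobe(&kp);
	}

	static void __exit kprobe_example_exit(void)
	{
		unregister_kprobe(&kp);
	}

	module_init(kprobe_example_init);
	module_exit(kprobe_example_exit);
	MODULE_LICENSE("GPL");
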
Index: linux-2.6-lttng.stable/arch/sparc64/kernel/kprobes.c
===================================================================
--- linux-2.6-lttng.stable.orig/arch/sparc64/kernel/kprobes.c 2007-10-29 09:51:07.000000000 -0400
+++ /dev/null 1970-01-01 00:00:00.000000000 +0000
@@ -1,487 +0,0 @@
-/* arch/sparc64/kernel/kprobes.c
- *
- * Copyright (C) 2004 David S. Miller <davem@davemloft.net>
- */
-
-#include <linux/kernel.h>
-#include <linux/kprobes.h>
-#include <linux/module.h>
-#include <linux/kdebug.h>
-#include <asm/signal.h>
-#include <asm/cacheflush.h>
-#include <asm/uaccess.h>
-
-/* We do not have hardware single-stepping on sparc64.
- * So we implement software single-stepping with breakpoint
- * traps. The top-level scheme is similar to that used
- * in the x86 kprobes implementation.
- *
- * In the kprobe->ainsn.insn[] array we store the original
- * instruction at index zero and a break instruction at
- * index one.
- *
- * When we hit a kprobe we:
- * - Run the pre-handler
- * - Remember "regs->tnpc" and interrupt level stored in
- * "regs->tstate" so we can restore them later
- * - Disable PIL interrupts
- * - Set regs->tpc to point to kprobe->ainsn.insn[0]
- * - Set regs->tnpc to point to kprobe->ainsn.insn[1]
- * - Mark that we are actively in a kprobe
- *
- * At this point we wait for the second breakpoint at
- * kprobe->ainsn.insn[1] to hit. When it does we:
- * - Run the post-handler
- * - Set regs->tpc to "remembered" regs->tnpc stored above,
- * restore the PIL interrupt level in "regs->tstate" as well
- * - Make any adjustments necessary to regs->tnpc in order
- * to handle relative branches correctly. See below.
- * - Mark that we are no longer actively in a kprobe.
- */
-
-DEFINE_PER_CPU(struct kprobe *, current_kprobe) = NULL;
-DEFINE_PER_CPU(struct kprobe_ctlblk, kprobe_ctlblk);
-
-struct kretprobe_blackpoint kretprobe_blacklist[] = {{NULL, NULL}};
-
-int __kprobes arch_prepare_kprobe(struct kprobe *p)
-{
- p->ainsn.insn[0] = *p->addr;
- flushi(&p->ainsn.insn[0]);
-
- p->ainsn.insn[1] = BREAKPOINT_INSTRUCTION_2;
- flushi(&p->ainsn.insn[1]);
-
- p->opcode = *p->addr;
- return 0;
-}
-
-void __kprobes arch_arm_kprobe(struct kprobe *p)
-{
- *p->addr = BREAKPOINT_INSTRUCTION;
- flushi(p->addr);
-}
-
-void __kprobes arch_disarm_kprobe(struct kprobe *p)
-{
- *p->addr = p->opcode;
- flushi(p->addr);
-}
-
-static void __kprobes save_previous_kprobe(struct kprobe_ctlblk *kcb)
-{
- kcb->prev_kprobe.kp = kprobe_running();
- kcb->prev_kprobe.status = kcb->kprobe_status;
- kcb->prev_kprobe.orig_tnpc = kcb->kprobe_orig_tnpc;
- kcb->prev_kprobe.orig_tstate_pil = kcb->kprobe_orig_tstate_pil;
-}
-
-static void __kprobes restore_previous_kprobe(struct kprobe_ctlblk *kcb)
-{
- __get_cpu_var(current_kprobe) = kcb->prev_kprobe.kp;
- kcb->kprobe_status = kcb->prev_kprobe.status;
- kcb->kprobe_orig_tnpc = kcb->prev_kprobe.orig_tnpc;
- kcb->kprobe_orig_tstate_pil = kcb->prev_kprobe.orig_tstate_pil;
-}
-
-static void __kprobes set_current_kprobe(struct kprobe *p, struct pt_regs *regs,
- struct kprobe_ctlblk *kcb)
-{
- __get_cpu_var(current_kprobe) = p;
- kcb->kprobe_orig_tnpc = regs->tnpc;
- kcb->kprobe_orig_tstate_pil = (regs->tstate & TSTATE_PIL);
-}
-
-static void __kprobes prepare_singlestep(struct kprobe *p, struct pt_regs *regs,
- struct kprobe_ctlblk *kcb)
-{
- regs->tstate |= TSTATE_PIL;
-
- /*single step inline, if it a breakpoint instruction*/
- if (p->opcode == BREAKPOINT_INSTRUCTION) {
- regs->tpc = (unsigned long) p->addr;
- regs->tnpc = kcb->kprobe_orig_tnpc;
- } else {
- regs->tpc = (unsigned long) &p->ainsn.insn[0];
- regs->tnpc = (unsigned long) &p->ainsn.insn[1];
- }
-}
-
-static int __kprobes kprobe_handler(struct pt_regs *regs)
-{
- struct kprobe *p;
- void *addr = (void *) regs->tpc;
- int ret = 0;
- struct kprobe_ctlblk *kcb;
-
- /*
- * We don't want to be preempted for the entire
- * duration of kprobe processing
- */
- preempt_disable();
- kcb = get_kprobe_ctlblk();
-
- if (kprobe_running()) {
- p = get_kprobe(addr);
- if (p) {
- if (kcb->kprobe_status == KPROBE_HIT_SS) {
- regs->tstate = ((regs->tstate & ~TSTATE_PIL) |
- kcb->kprobe_orig_tstate_pil);
- goto no_kprobe;
- }
- /* We have reentered the kprobe_handler(), since
- * another probe was hit while within the handler.
- * We here save the original kprobes variables and
- * just single step on the instruction of the new probe
- * without calling any user handlers.
- */
- save_previous_kprobe(kcb);
- set_current_kprobe(p, regs, kcb);
- kprobes_inc_nmissed_count(p);
- kcb->kprobe_status = KPROBE_REENTER;
- prepare_singlestep(p, regs, kcb);
- return 1;
- } else {
- if (*(u32 *)addr != BREAKPOINT_INSTRUCTION) {
- /* The breakpoint instruction was removed by
- * another cpu right after we hit, no further
- * handling of this interrupt is appropriate
- */
- ret = 1;
- goto no_kprobe;
- }
- p = __get_cpu_var(current_kprobe);
- if (p->break_handler && p->break_handler(p, regs))
- goto ss_probe;
- }
- goto no_kprobe;
- }
-
- p = get_kprobe(addr);
- if (!p) {
- if (*(u32 *)addr != BREAKPOINT_INSTRUCTION) {
- /*
- * The breakpoint instruction was removed right
- * after we hit it. Another cpu has removed
- * either a probepoint or a debugger breakpoint
- * at this address. In either case, no further
- * handling of this interrupt is appropriate.
- */
- ret = 1;
- }
- /* Not one of ours: let kernel handle it */
- goto no_kprobe;
- }
-
- set_current_kprobe(p, regs, kcb);
- kcb->kprobe_status = KPROBE_HIT_ACTIVE;
- if (p->pre_handler && p->pre_handler(p, regs))
- return 1;
-
-ss_probe:
- prepare_singlestep(p, regs, kcb);
- kcb->kprobe_status = KPROBE_HIT_SS;
- return 1;
-
-no_kprobe:
- preempt_enable_no_resched();
- return ret;
-}
-
-/* If INSN is a relative control transfer instruction,
- * return the corrected branch destination value.
- *
- * regs->tpc and regs->tnpc still hold the values of the
- * program counters at the time of trap due to the execution
- * of the BREAKPOINT_INSTRUCTION_2 at p->ainsn.insn[1]
- *
- */
-static unsigned long __kprobes relbranch_fixup(u32 insn, struct kprobe *p,
- struct pt_regs *regs)
-{
- unsigned long real_pc = (unsigned long) p->addr;
-
- /* Branch not taken, no mods necessary. */
- if (regs->tnpc == regs->tpc + 0x4UL)
- return real_pc + 0x8UL;
-
- /* The three cases are call, branch w/prediction,
- * and traditional branch.
- */
- if ((insn & 0xc0000000) == 0x40000000 ||
- (insn & 0xc1c00000) == 0x00400000 ||
- (insn & 0xc1c00000) == 0x00800000) {
- unsigned long ainsn_addr;
-
- ainsn_addr = (unsigned long) &p->ainsn.insn[0];
-
- /* The instruction did all the work for us
- * already, just apply the offset to the correct
- * instruction location.
- */
- return (real_pc + (regs->tnpc - ainsn_addr));
- }
-
- /* It is jmpl or some other absolute PC modification instruction,
- * leave NPC as-is.
- */
- return regs->tnpc;
-}
-
-/* If INSN is an instruction which writes it's PC location
- * into a destination register, fix that up.
- */
-static void __kprobes retpc_fixup(struct pt_regs *regs, u32 insn,
- unsigned long real_pc)
-{
- unsigned long *slot = NULL;
-
- /* Simplest case is 'call', which always uses %o7 */
- if ((insn & 0xc0000000) == 0x40000000) {
-		slot = &regs->u_regs[UREG_I7];
- }
-
- /* 'jmpl' encodes the register inside of the opcode */
- if ((insn & 0xc1f80000) == 0x81c00000) {
- unsigned long rd = ((insn >> 25) & 0x1f);
-
- if (rd <= 15) {
-			slot = &regs->u_regs[rd];
- } else {
- /* Hard case, it goes onto the stack. */
- flushw_all();
-
- rd -= 16;
- slot = (unsigned long *)
- (regs->u_regs[UREG_FP] + STACK_BIAS);
- slot += rd;
- }
- }
- if (slot != NULL)
- *slot = real_pc;
-}
-
-/*
- * Called after single-stepping. p->addr is the address of the
- * instruction which has been replaced by the breakpoint
- * instruction. To avoid the SMP problems that can occur when we
- * temporarily put back the original opcode to single-step, we
- * single-stepped a copy of the instruction. The address of this
- * copy is &p->ainsn.insn[0].
- *
- * This function prepares to return from the post-single-step
- * breakpoint trap.
- */
-static void __kprobes resume_execution(struct kprobe *p,
- struct pt_regs *regs, struct kprobe_ctlblk *kcb)
-{
- u32 insn = p->ainsn.insn[0];
-
- regs->tnpc = relbranch_fixup(insn, p, regs);
-
- /* This assignment must occur after relbranch_fixup() */
- regs->tpc = kcb->kprobe_orig_tnpc;
-
- retpc_fixup(regs, insn, (unsigned long) p->addr);
-
- regs->tstate = ((regs->tstate & ~TSTATE_PIL) |
- kcb->kprobe_orig_tstate_pil);
-}
-
-static int __kprobes post_kprobe_handler(struct pt_regs *regs)
-{
- struct kprobe *cur = kprobe_running();
- struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
-
- if (!cur)
- return 0;
-
- if ((kcb->kprobe_status != KPROBE_REENTER) && cur->post_handler) {
- kcb->kprobe_status = KPROBE_HIT_SSDONE;
- cur->post_handler(cur, regs, 0);
- }
-
- resume_execution(cur, regs, kcb);
-
- /*Restore back the original saved kprobes variables and continue. */
- if (kcb->kprobe_status == KPROBE_REENTER) {
- restore_previous_kprobe(kcb);
- goto out;
- }
- reset_current_kprobe();
-out:
- preempt_enable_no_resched();
-
- return 1;
-}
-
-int __kprobes kprobe_fault_handler(struct pt_regs *regs, int trapnr)
-{
- struct kprobe *cur = kprobe_running();
- struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
- const struct exception_table_entry *entry;
-
- switch(kcb->kprobe_status) {
- case KPROBE_HIT_SS:
- case KPROBE_REENTER:
- /*
- * We are here because the instruction being single
- * stepped caused a page fault. We reset the current
- * kprobe and the tpc points back to the probe address
- * and allow the page fault handler to continue as a
- * normal page fault.
- */
- regs->tpc = (unsigned long)cur->addr;
- regs->tnpc = kcb->kprobe_orig_tnpc;
- regs->tstate = ((regs->tstate & ~TSTATE_PIL) |
- kcb->kprobe_orig_tstate_pil);
- if (kcb->kprobe_status == KPROBE_REENTER)
- restore_previous_kprobe(kcb);
- else
- reset_current_kprobe();
- preempt_enable_no_resched();
- break;
- case KPROBE_HIT_ACTIVE:
- case KPROBE_HIT_SSDONE:
- /*
- * We increment the nmissed count for accounting,
- * we can also use npre/npostfault count for accouting
- * these specific fault cases.
- */
- kprobes_inc_nmissed_count(cur);
-
- /*
- * We come here because instructions in the pre/post
- * handler caused the page_fault, this could happen
- * if handler tries to access user space by
- * copy_from_user(), get_user() etc. Let the
- * user-specified handler try to fix it first.
- */
- if (cur->fault_handler && cur->fault_handler(cur, regs, trapnr))
- return 1;
-
- /*
- * In case the user-specified fault handler returned
- * zero, try to fix up.
- */
-
- entry = search_exception_tables(regs->tpc);
- if (entry) {
- regs->tpc = entry->fixup;
- regs->tnpc = regs->tpc + 4;
- return 1;
- }
-
- /*
- * fixup_exception() could not handle it,
- * Let do_page_fault() fix it.
- */
- break;
- default:
- break;
- }
-
- return 0;
-}
-
-/*
- * Wrapper routine to for handling exceptions.
- */
-int __kprobes kprobe_exceptions_notify(struct notifier_block *self,
- unsigned long val, void *data)
-{
- struct die_args *args = (struct die_args *)data;
- int ret = NOTIFY_DONE;
-
- if (args->regs && user_mode(args->regs))
- return ret;
-
- switch (val) {
- case DIE_DEBUG:
- if (kprobe_handler(args->regs))
- ret = NOTIFY_STOP;
- break;
- case DIE_DEBUG_2:
- if (post_kprobe_handler(args->regs))
- ret = NOTIFY_STOP;
- break;
- default:
- break;
- }
- return ret;
-}
-
-asmlinkage void __kprobes kprobe_trap(unsigned long trap_level,
- struct pt_regs *regs)
-{
- BUG_ON(trap_level != 0x170 && trap_level != 0x171);
-
- if (user_mode(regs)) {
- local_irq_enable();
- bad_trap(regs, trap_level);
- return;
- }
-
- /* trap_level == 0x170 --> ta 0x70
- * trap_level == 0x171 --> ta 0x71
- */
- if (notify_die((trap_level == 0x170) ? DIE_DEBUG : DIE_DEBUG_2,
- (trap_level == 0x170) ? "debug" : "debug_2",
- regs, 0, trap_level, SIGTRAP) != NOTIFY_STOP)
- bad_trap(regs, trap_level);
-}
-
-/* Jprobes support. */
-int __kprobes setjmp_pre_handler(struct kprobe *p, struct pt_regs *regs)
-{
- struct jprobe *jp = container_of(p, struct jprobe, kp);
- struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
-
- memcpy(&(kcb->jprobe_saved_regs), regs, sizeof(*regs));
-
- regs->tpc = (unsigned long) jp->entry;
- regs->tnpc = ((unsigned long) jp->entry) + 0x4UL;
- regs->tstate |= TSTATE_PIL;
-
- return 1;
-}
-
-void __kprobes jprobe_return(void)
-{
- struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
- register unsigned long orig_fp asm("g1");
-
- orig_fp = kcb->jprobe_saved_regs.u_regs[UREG_FP];
- __asm__ __volatile__("\n"
-"1: cmp %%sp, %0\n\t"
- "blu,a,pt %%xcc, 1b\n\t"
- " restore\n\t"
- ".globl jprobe_return_trap_instruction\n"
-"jprobe_return_trap_instruction:\n\t"
- "ta 0x70"
- : /* no outputs */
- : "r" (orig_fp));
-}
-
-extern void jprobe_return_trap_instruction(void);
-
-extern void __show_regs(struct pt_regs * regs);
-
-int __kprobes longjmp_break_handler(struct kprobe *p, struct pt_regs *regs)
-{
- u32 *addr = (u32 *) regs->tpc;
- struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
-
- if (addr == (u32 *) jprobe_return_trap_instruction) {
- memcpy(regs, &(kcb->jprobe_saved_regs), sizeof(*regs));
- preempt_enable_no_resched();
- return 1;
- }
- return 0;
-}
-
-/* architecture specific initialization */
-int arch_init_kprobes(void)
-{
- return 0;
-}
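
The setjmp_pre_handler()/longjmp_break_handler() pair above is the
architecture side of jprobes. From the client side, usage looks roughly
like this (a sketch only; the probed function and its 2.6-era do_fork
signature are illustrative assumptions):

	#include <linux/kprobes.h>

	/* The jprobe entry point must mirror the probed function's signature. */
	static long jdo_fork(unsigned long clone_flags, unsigned long stack_start,
			     struct pt_regs *regs, unsigned long stack_size,
			     int __user *parent_tidptr, int __user *child_tidptr)
	{
		printk(KERN_INFO "jprobe: clone_flags=0x%lx\n", clone_flags);
		jprobe_return();	/* mandatory: hands control back */
		return 0;		/* never reached */
	}

	static struct jprobe jp = {
		.entry	= JPROBE_ENTRY(jdo_fork),
		.kp	= { .symbol_name = "do_fork", },	/* placeholder */
	};

	/* register_jprobe(&jp) in module init, unregister_jprobe(&jp) on exit. */
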
Index: linux-2.6-lttng.stable/arch/avr32/instrumentation/Makefile
===================================================================
--- linux-2.6-lttng.stable.orig/arch/avr32/instrumentation/Makefile 2007-10-29 09:51:29.000000000 -0400
+++ linux-2.6-lttng.stable/arch/avr32/instrumentation/Makefile 2007-10-29 09:51:38.000000000 -0400
@@ -1,3 +1,5 @@
#
# avr32 instrumentation Makefile
#
+
+obj-$(CONFIG_KPROBES) += kprobes.o
Index: linux-2.6-lttng.stable/arch/avr32/kernel/Makefile
===================================================================
--- linux-2.6-lttng.stable.orig/arch/avr32/kernel/Makefile 2007-10-29 09:51:07.000000000 -0400
+++ linux-2.6-lttng.stable/arch/avr32/kernel/Makefile 2007-10-29 09:51:38.000000000 -0400
@@ -10,4 +10,3 @@ obj-y += setup.o traps.o semaphore.o
obj-y += signal.o sys_avr32.o process.o time.o
obj-y += init_task.o switch_to.o cpu.o
obj-$(CONFIG_MODULES) += module.o avr32_ksyms.o
-obj-$(CONFIG_KPROBES) += kprobes.o
Index: linux-2.6-lttng.stable/arch/ia64/instrumentation/Makefile
===================================================================
--- linux-2.6-lttng.stable.orig/arch/ia64/instrumentation/Makefile 2007-10-29 09:51:29.000000000 -0400
+++ linux-2.6-lttng.stable/arch/ia64/instrumentation/Makefile 2007-10-29 09:51:38.000000000 -0400
@@ -8,3 +8,5 @@ DRIVER_OBJS := $(addprefix ../../../driv
oprofile-y := $(DRIVER_OBJS) init.o backtrace.o
oprofile-$(CONFIG_PERFMON) += perfmon.o
+
+obj-$(CONFIG_KPROBES) += kprobes.o jprobes.o
Index: linux-2.6-lttng.stable/arch/powerpc/instrumentation/Makefile
===================================================================
--- linux-2.6-lttng.stable.orig/arch/powerpc/instrumentation/Makefile 2007-10-29 09:51:29.000000000 -0400
+++ linux-2.6-lttng.stable/arch/powerpc/instrumentation/Makefile 2007-10-29 09:51:38.000000000 -0400
@@ -17,3 +17,5 @@ oprofile-$(CONFIG_OPROFILE_CELL) += op_m
oprofile-$(CONFIG_PPC64) += op_model_rs64.o op_model_power4.o op_model_pa6t.o
oprofile-$(CONFIG_FSL_BOOKE) += op_model_fsl_booke.o
oprofile-$(CONFIG_6xx) += op_model_7450.o
+
+obj-$(CONFIG_KPROBES) += kprobes.o
Index: linux-2.6-lttng.stable/arch/s390/instrumentation/Makefile
===================================================================
--- linux-2.6-lttng.stable.orig/arch/s390/instrumentation/Makefile 2007-10-29 09:51:29.000000000 -0400
+++ linux-2.6-lttng.stable/arch/s390/instrumentation/Makefile 2007-10-29 09:51:38.000000000 -0400
@@ -7,3 +7,5 @@ DRIVER_OBJS = $(addprefix ../../../drive
timer_int.o )
oprofile-y := $(DRIVER_OBJS) init.o backtrace.o
+
+obj-$(CONFIG_KPROBES) += kprobes.o
Index: linux-2.6-lttng.stable/arch/sparc64/instrumentation/Makefile
===================================================================
--- linux-2.6-lttng.stable.orig/arch/sparc64/instrumentation/Makefile 2007-10-29 09:51:29.000000000 -0400
+++ linux-2.6-lttng.stable/arch/sparc64/instrumentation/Makefile 2007-10-29 09:51:38.000000000 -0400
@@ -7,3 +7,5 @@ DRIVER_OBJS = $(addprefix ../../../drive
timer_int.o )
oprofile-y := $(DRIVER_OBJS) init.o
+
+obj-$(CONFIG_KPROBES) += kprobes.o
Index: linux-2.6-lttng.stable/arch/x86/instrumentation/kprobes_32.c
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng.stable/arch/x86/instrumentation/kprobes_32.c 2007-10-29 09:51:38.000000000 -0400
@@ -0,0 +1,763 @@
+/*
+ * Kernel Probes (KProbes)
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) IBM Corporation, 2002, 2004
+ *
+ * 2002-Oct Created by Vamsi Krishna S <vamsi_krishna@in.ibm.com> Kernel
+ * Probes initial implementation ( includes contributions from
+ * Rusty Russell).
+ * 2004-July Suparna Bhattacharya <suparna@in.ibm.com> added jumper probes
+ * interface to access function arguments.
+ * 2005-May Hien Nguyen <hien@us.ibm.com>, Jim Keniston
+ * <jkenisto@us.ibm.com> and Prasanna S Panchamukhi
+ * <prasanna@in.ibm.com> added function-return probes.
+ */
+
+#include <linux/kprobes.h>
+#include <linux/ptrace.h>
+#include <linux/preempt.h>
+#include <linux/kdebug.h>
+#include <asm/cacheflush.h>
+#include <asm/desc.h>
+#include <asm/uaccess.h>
+#include <asm/alternative.h>
+
+void jprobe_return_end(void);
+
+DEFINE_PER_CPU(struct kprobe *, current_kprobe) = NULL;
+DEFINE_PER_CPU(struct kprobe_ctlblk, kprobe_ctlblk);
+
+struct kretprobe_blackpoint kretprobe_blacklist[] = {
+ {"__switch_to", }, /* This function switches only current task, but
+ doesn't switch kernel stack.*/
+ {NULL, NULL} /* Terminator */
+};
+const int kretprobe_blacklist_size = ARRAY_SIZE(kretprobe_blacklist);
+
+/* insert a jmp code */
+static __always_inline void set_jmp_op(void *from, void *to)
+{
+ struct __arch_jmp_op {
+ char op;
+ long raddr;
+ } __attribute__((packed)) *jop;
+ jop = (struct __arch_jmp_op *)from;
+ jop->raddr = (long)(to) - ((long)(from) + 5);
+ jop->op = RELATIVEJUMP_INSTRUCTION;
+}
+
+/*
+ * returns non-zero if opcodes can be boosted.
+ */
+static __always_inline int can_boost(kprobe_opcode_t *opcodes)
+{
+#define W(row,b0,b1,b2,b3,b4,b5,b6,b7,b8,b9,ba,bb,bc,bd,be,bf) \
+ (((b0##UL << 0x0)|(b1##UL << 0x1)|(b2##UL << 0x2)|(b3##UL << 0x3) | \
+ (b4##UL << 0x4)|(b5##UL << 0x5)|(b6##UL << 0x6)|(b7##UL << 0x7) | \
+ (b8##UL << 0x8)|(b9##UL << 0x9)|(ba##UL << 0xa)|(bb##UL << 0xb) | \
+ (bc##UL << 0xc)|(bd##UL << 0xd)|(be##UL << 0xe)|(bf##UL << 0xf)) \
+ << (row % 32))
+ /*
+ * Undefined/reserved opcodes, conditional jump, Opcode Extension
+	 * Groups, and some special opcodes cannot be boosted.
+ */
+ static const unsigned long twobyte_is_boostable[256 / 32] = {
+ /* 0 1 2 3 4 5 6 7 8 9 a b c d e f */
+ /* ------------------------------- */
+ W(0x00, 0,0,1,1,0,0,1,0,1,1,0,0,0,0,0,0)| /* 00 */
+ W(0x10, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0), /* 10 */
+ W(0x20, 1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0)| /* 20 */
+ W(0x30, 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0), /* 30 */
+ W(0x40, 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1)| /* 40 */
+ W(0x50, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0), /* 50 */
+ W(0x60, 1,1,1,1,1,1,1,1,1,1,1,1,0,0,1,1)| /* 60 */
+ W(0x70, 0,0,0,0,1,1,1,1,0,0,0,0,0,0,1,1), /* 70 */
+ W(0x80, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)| /* 80 */
+ W(0x90, 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1), /* 90 */
+ W(0xa0, 1,1,0,1,1,1,0,0,1,1,0,1,1,1,0,1)| /* a0 */
+ W(0xb0, 1,1,1,1,1,1,1,1,0,0,0,1,1,1,1,1), /* b0 */
+ W(0xc0, 1,1,0,0,0,0,0,0,1,1,1,1,1,1,1,1)| /* c0 */
+ W(0xd0, 0,1,1,1,0,1,0,0,1,1,0,1,1,1,0,1), /* d0 */
+ W(0xe0, 0,1,1,0,0,1,0,0,1,1,0,1,1,1,0,1)| /* e0 */
+ W(0xf0, 0,1,1,1,0,1,0,0,1,1,1,0,1,1,1,0) /* f0 */
+ /* ------------------------------- */
+ /* 0 1 2 3 4 5 6 7 8 9 a b c d e f */
+ };
+#undef W
+ kprobe_opcode_t opcode;
+ kprobe_opcode_t *orig_opcodes = opcodes;
+retry:
+ if (opcodes - orig_opcodes > MAX_INSN_SIZE - 1)
+ return 0;
+ opcode = *(opcodes++);
+
+ /* 2nd-byte opcode */
+ if (opcode == 0x0f) {
+ if (opcodes - orig_opcodes > MAX_INSN_SIZE - 1)
+ return 0;
+ return test_bit(*opcodes, twobyte_is_boostable);
+ }
+
+ switch (opcode & 0xf0) {
+ case 0x60:
+ if (0x63 < opcode && opcode < 0x67)
+ goto retry; /* prefixes */
+ /* can't boost Address-size override and bound */
+ return (opcode != 0x62 && opcode != 0x67);
+ case 0x70:
+ return 0; /* can't boost conditional jump */
+ case 0xc0:
+ /* can't boost software-interruptions */
+ return (0xc1 < opcode && opcode < 0xcc) || opcode == 0xcf;
+ case 0xd0:
+ /* can boost AA* and XLAT */
+ return (opcode == 0xd4 || opcode == 0xd5 || opcode == 0xd7);
+ case 0xe0:
+ /* can boost in/out and absolute jmps */
+ return ((opcode & 0x04) || opcode == 0xea);
+ case 0xf0:
+ if ((opcode & 0x0c) == 0 && opcode != 0xf1)
+ goto retry; /* lock/rep(ne) prefix */
+		/* clear and set flags can be boosted */
+ return (opcode == 0xf5 || (0xf7 < opcode && opcode < 0xfe));
+ default:
+ if (opcode == 0x26 || opcode == 0x36 || opcode == 0x3e)
+ goto retry; /* prefixes */
+ /* can't boost CS override and call */
+ return (opcode != 0x2e && opcode != 0x9a);
+ }
+}
+
+/*
+ * returns non-zero if opcode modifies the interrupt flag.
+ */
+static int __kprobes is_IF_modifier(kprobe_opcode_t opcode)
+{
+ switch (opcode) {
+ case 0xfa: /* cli */
+ case 0xfb: /* sti */
+ case 0xcf: /* iret/iretd */
+ case 0x9d: /* popf/popfd */
+ return 1;
+ }
+ return 0;
+}
+
+int __kprobes arch_prepare_kprobe(struct kprobe *p)
+{
+ /* insn: must be on special executable page on i386. */
+ p->ainsn.insn = get_insn_slot();
+ if (!p->ainsn.insn)
+ return -ENOMEM;
+
+ memcpy(p->ainsn.insn, p->addr, MAX_INSN_SIZE * sizeof(kprobe_opcode_t));
+ p->opcode = *p->addr;
+ if (can_boost(p->addr)) {
+ p->ainsn.boostable = 0;
+ } else {
+ p->ainsn.boostable = -1;
+ }
+ return 0;
+}
+
+void __kprobes arch_arm_kprobe(struct kprobe *p)
+{
+ text_poke(p->addr, ((unsigned char []){BREAKPOINT_INSTRUCTION}), 1);
+}
+
+void __kprobes arch_disarm_kprobe(struct kprobe *p)
+{
+ text_poke(p->addr, &p->opcode, 1);
+}
+
+void __kprobes arch_remove_kprobe(struct kprobe *p)
+{
+ mutex_lock(&kprobe_mutex);
+ free_insn_slot(p->ainsn.insn, (p->ainsn.boostable == 1));
+ mutex_unlock(&kprobe_mutex);
+}
+
+static void __kprobes save_previous_kprobe(struct kprobe_ctlblk *kcb)
+{
+ kcb->prev_kprobe.kp = kprobe_running();
+ kcb->prev_kprobe.status = kcb->kprobe_status;
+ kcb->prev_kprobe.old_eflags = kcb->kprobe_old_eflags;
+ kcb->prev_kprobe.saved_eflags = kcb->kprobe_saved_eflags;
+}
+
+static void __kprobes restore_previous_kprobe(struct kprobe_ctlblk *kcb)
+{
+ __get_cpu_var(current_kprobe) = kcb->prev_kprobe.kp;
+ kcb->kprobe_status = kcb->prev_kprobe.status;
+ kcb->kprobe_old_eflags = kcb->prev_kprobe.old_eflags;
+ kcb->kprobe_saved_eflags = kcb->prev_kprobe.saved_eflags;
+}
+
+static void __kprobes set_current_kprobe(struct kprobe *p, struct pt_regs *regs,
+ struct kprobe_ctlblk *kcb)
+{
+ __get_cpu_var(current_kprobe) = p;
+ kcb->kprobe_saved_eflags = kcb->kprobe_old_eflags
+ = (regs->eflags & (TF_MASK | IF_MASK));
+ if (is_IF_modifier(p->opcode))
+ kcb->kprobe_saved_eflags &= ~IF_MASK;
+}
+
+static void __kprobes prepare_singlestep(struct kprobe *p, struct pt_regs *regs)
+{
+ regs->eflags |= TF_MASK;
+ regs->eflags &= ~IF_MASK;
+ /*single step inline if the instruction is an int3*/
+ if (p->opcode == BREAKPOINT_INSTRUCTION)
+ regs->eip = (unsigned long)p->addr;
+ else
+ regs->eip = (unsigned long)p->ainsn.insn;
+}
+
+/* Called with kretprobe_lock held */
+void __kprobes arch_prepare_kretprobe(struct kretprobe_instance *ri,
+ struct pt_regs *regs)
+{
+	unsigned long *sara = (unsigned long *)&regs->esp;
+
+ ri->ret_addr = (kprobe_opcode_t *) *sara;
+
+ /* Replace the return addr with trampoline addr */
+ *sara = (unsigned long) &kretprobe_trampoline;
+}
+
+/*
+ * Interrupts are disabled on entry as trap3 is an interrupt gate and they
+ * remain disabled throughout this function.
+ */
+static int __kprobes kprobe_handler(struct pt_regs *regs)
+{
+ struct kprobe *p;
+ int ret = 0;
+ kprobe_opcode_t *addr;
+ struct kprobe_ctlblk *kcb;
+
+ addr = (kprobe_opcode_t *)(regs->eip - sizeof(kprobe_opcode_t));
+
+ /*
+ * We don't want to be preempted for the entire
+ * duration of kprobe processing
+ */
+ preempt_disable();
+ kcb = get_kprobe_ctlblk();
+
+ /* Check we're not actually recursing */
+ if (kprobe_running()) {
+ p = get_kprobe(addr);
+ if (p) {
+ if (kcb->kprobe_status == KPROBE_HIT_SS &&
+ *p->ainsn.insn == BREAKPOINT_INSTRUCTION) {
+ regs->eflags &= ~TF_MASK;
+ regs->eflags |= kcb->kprobe_saved_eflags;
+ goto no_kprobe;
+ }
+ /* We have reentered the kprobe_handler(), since
+ * another probe was hit while within the handler.
+			 * Here we save the original kprobes variables and
+			 * just single-step the instruction of the new probe
+ * without calling any user handlers.
+ */
+ save_previous_kprobe(kcb);
+ set_current_kprobe(p, regs, kcb);
+ kprobes_inc_nmissed_count(p);
+ prepare_singlestep(p, regs);
+ kcb->kprobe_status = KPROBE_REENTER;
+ return 1;
+ } else {
+ if (*addr != BREAKPOINT_INSTRUCTION) {
+ /* The breakpoint instruction was removed by
+ * another cpu right after we hit, no further
+ * handling of this interrupt is appropriate
+ */
+ regs->eip -= sizeof(kprobe_opcode_t);
+ ret = 1;
+ goto no_kprobe;
+ }
+ p = __get_cpu_var(current_kprobe);
+ if (p->break_handler && p->break_handler(p, regs)) {
+ goto ss_probe;
+ }
+ }
+ goto no_kprobe;
+ }
+
+ p = get_kprobe(addr);
+ if (!p) {
+ if (*addr != BREAKPOINT_INSTRUCTION) {
+ /*
+ * The breakpoint instruction was removed right
+ * after we hit it. Another cpu has removed
+ * either a probepoint or a debugger breakpoint
+ * at this address. In either case, no further
+ * handling of this interrupt is appropriate.
+ * Back up over the (now missing) int3 and run
+ * the original instruction.
+ */
+ regs->eip -= sizeof(kprobe_opcode_t);
+ ret = 1;
+ }
+ /* Not one of ours: let kernel handle it */
+ goto no_kprobe;
+ }
+
+ set_current_kprobe(p, regs, kcb);
+ kcb->kprobe_status = KPROBE_HIT_ACTIVE;
+
+ if (p->pre_handler && p->pre_handler(p, regs))
+ /* handler has already set things up, so skip ss setup */
+ return 1;
+
+ss_probe:
+#if !defined(CONFIG_PREEMPT) || defined(CONFIG_PM)
+ if (p->ainsn.boostable == 1 && !p->post_handler){
+ /* Boost up -- we can execute copied instructions directly */
+ reset_current_kprobe();
+ regs->eip = (unsigned long)p->ainsn.insn;
+ preempt_enable_no_resched();
+ return 1;
+ }
+#endif
+ prepare_singlestep(p, regs);
+ kcb->kprobe_status = KPROBE_HIT_SS;
+ return 1;
+
+no_kprobe:
+ preempt_enable_no_resched();
+ return ret;
+}
+
+/*
+ * For function-return probes, init_kprobes() establishes a probepoint
+ * here. When a retprobed function returns, this probe is hit and
+ * trampoline_probe_handler() runs, calling the kretprobe's handler.
+ */
+ void __kprobes kretprobe_trampoline_holder(void)
+ {
+ asm volatile ( ".global kretprobe_trampoline\n"
+ "kretprobe_trampoline: \n"
+ " pushf\n"
+ /* skip cs, eip, orig_eax */
+ " subl $12, %esp\n"
+ " pushl %fs\n"
+ " pushl %ds\n"
+ " pushl %es\n"
+ " pushl %eax\n"
+ " pushl %ebp\n"
+ " pushl %edi\n"
+ " pushl %esi\n"
+ " pushl %edx\n"
+ " pushl %ecx\n"
+ " pushl %ebx\n"
+ " movl %esp, %eax\n"
+ " call trampoline_handler\n"
+ /* move eflags to cs */
+ " movl 52(%esp), %edx\n"
+ " movl %edx, 48(%esp)\n"
+ /* save true return address on eflags */
+ " movl %eax, 52(%esp)\n"
+ " popl %ebx\n"
+ " popl %ecx\n"
+ " popl %edx\n"
+ " popl %esi\n"
+ " popl %edi\n"
+ " popl %ebp\n"
+ " popl %eax\n"
+ /* skip eip, orig_eax, es, ds, fs */
+ " addl $20, %esp\n"
+ " popf\n"
+ " ret\n");
+}
+
+/*
+ * Called from kretprobe_trampoline
+ */
+fastcall void *__kprobes trampoline_handler(struct pt_regs *regs)
+{
+ struct kretprobe_instance *ri = NULL;
+ struct hlist_head *head, empty_rp;
+ struct hlist_node *node, *tmp;
+ unsigned long flags, orig_ret_address = 0;
+ unsigned long trampoline_address =(unsigned long)&kretprobe_trampoline;
+
+ INIT_HLIST_HEAD(&empty_rp);
+ spin_lock_irqsave(&kretprobe_lock, flags);
+ head = kretprobe_inst_table_head(current);
+ /* fixup registers */
+ regs->xcs = __KERNEL_CS | get_kernel_rpl();
+ regs->eip = trampoline_address;
+ regs->orig_eax = 0xffffffff;
+
+ /*
+ * It is possible to have multiple instances associated with a given
+	 * task, either because multiple functions in the call path
+	 * have a return probe installed on them, and/or more than one
+	 * return probe was registered for a target function.
+ *
+ * We can handle this because:
+ * - instances are always inserted at the head of the list
+ * - when multiple return probes are registered for the same
+ * function, the first instance's ret_addr will point to the
+ * real return address, and all the rest will point to
+ * kretprobe_trampoline
+ */
+ hlist_for_each_entry_safe(ri, node, tmp, head, hlist) {
+ if (ri->task != current)
+ /* another task is sharing our hash bucket */
+ continue;
+
+ if (ri->rp && ri->rp->handler){
+ __get_cpu_var(current_kprobe) = &ri->rp->kp;
+ get_kprobe_ctlblk()->kprobe_status = KPROBE_HIT_ACTIVE;
+ ri->rp->handler(ri, regs);
+ __get_cpu_var(current_kprobe) = NULL;
+ }
+
+ orig_ret_address = (unsigned long)ri->ret_addr;
+ recycle_rp_inst(ri, &empty_rp);
+
+ if (orig_ret_address != trampoline_address)
+ /*
+ * This is the real return address. Any other
+ * instances associated with this task are for
+ * other calls deeper on the call stack
+ */
+ break;
+ }
+
+ kretprobe_assert(ri, orig_ret_address, trampoline_address);
+ spin_unlock_irqrestore(&kretprobe_lock, flags);
+
+ hlist_for_each_entry_safe(ri, node, tmp, &empty_rp, hlist) {
+ hlist_del(&ri->hlist);
+ kfree(ri);
+ }
+ return (void*)orig_ret_address;
+}
+
+/*
+ * Called after single-stepping. p->addr is the address of the
+ * instruction whose first byte has been replaced by the "int 3"
+ * instruction. To avoid the SMP problems that can occur when we
+ * temporarily put back the original opcode to single-step, we
+ * single-stepped a copy of the instruction. The address of this
+ * copy is p->ainsn.insn.
+ *
+ * This function prepares to return from the post-single-step
+ * interrupt. We have to fix up the stack as follows:
+ *
+ * 0) Except in the case of absolute or indirect jump or call instructions,
+ * the new eip is relative to the copied instruction. We need to make
+ * it relative to the original instruction.
+ *
+ * 1) If the single-stepped instruction was pushfl, then the TF and IF
+ * flags are set in the just-pushed eflags, and may need to be cleared.
+ *
+ * 2) If the single-stepped instruction was a call, the return address
+ * that is atop the stack is the address following the copied instruction.
+ * We need to make it the address following the original instruction.
+ *
+ * This function also checks instruction size for preparing direct execution.
+ */
+static void __kprobes resume_execution(struct kprobe *p,
+ struct pt_regs *regs, struct kprobe_ctlblk *kcb)
+{
+	unsigned long *tos = (unsigned long *)&regs->esp;
+ unsigned long copy_eip = (unsigned long)p->ainsn.insn;
+ unsigned long orig_eip = (unsigned long)p->addr;
+
+ regs->eflags &= ~TF_MASK;
+ switch (p->ainsn.insn[0]) {
+ case 0x9c: /* pushfl */
+ *tos &= ~(TF_MASK | IF_MASK);
+ *tos |= kcb->kprobe_old_eflags;
+ break;
+ case 0xc2: /* iret/ret/lret */
+ case 0xc3:
+ case 0xca:
+ case 0xcb:
+ case 0xcf:
+ case 0xea: /* jmp absolute -- eip is correct */
+ /* eip is already adjusted, no more changes required */
+ p->ainsn.boostable = 1;
+ goto no_change;
+ case 0xe8: /* call relative - Fix return addr */
+ *tos = orig_eip + (*tos - copy_eip);
+ break;
+ case 0x9a: /* call absolute -- same as call absolute, indirect */
+ *tos = orig_eip + (*tos - copy_eip);
+ goto no_change;
+ case 0xff:
+ if ((p->ainsn.insn[1] & 0x30) == 0x10) {
+ /*
+ * call absolute, indirect
+ * Fix return addr; eip is correct.
+ * But this is not boostable
+ */
+ *tos = orig_eip + (*tos - copy_eip);
+ goto no_change;
+ } else if (((p->ainsn.insn[1] & 0x31) == 0x20) || /* jmp near, absolute indirect */
+ ((p->ainsn.insn[1] & 0x31) == 0x21)) { /* jmp far, absolute indirect */
+ /* eip is correct. And this is boostable */
+ p->ainsn.boostable = 1;
+ goto no_change;
+ }
+ default:
+ break;
+ }
+
+ if (p->ainsn.boostable == 0) {
+ if ((regs->eip > copy_eip) &&
+ (regs->eip - copy_eip) + 5 < MAX_INSN_SIZE) {
+ /*
+ * These instructions can be executed directly if it
+ * jumps back to correct address.
+ */
+ set_jmp_op((void *)regs->eip,
+ (void *)orig_eip + (regs->eip - copy_eip));
+ p->ainsn.boostable = 1;
+ } else {
+ p->ainsn.boostable = -1;
+ }
+ }
+
+ regs->eip = orig_eip + (regs->eip - copy_eip);
+
+no_change:
+ return;
+}
+
+/*
+ * Interrupts are disabled on entry as trap1 is an interrupt gate and they
+ * remain disabled throughout this function.
+ */
+static int __kprobes post_kprobe_handler(struct pt_regs *regs)
+{
+ struct kprobe *cur = kprobe_running();
+ struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
+
+ if (!cur)
+ return 0;
+
+ if ((kcb->kprobe_status != KPROBE_REENTER) && cur->post_handler) {
+ kcb->kprobe_status = KPROBE_HIT_SSDONE;
+ cur->post_handler(cur, regs, 0);
+ }
+
+ resume_execution(cur, regs, kcb);
+ regs->eflags |= kcb->kprobe_saved_eflags;
+#ifdef CONFIG_TRACE_IRQFLAGS_SUPPORT
+ if (raw_irqs_disabled_flags(regs->eflags))
+ trace_hardirqs_off();
+ else
+ trace_hardirqs_on();
+#endif
+
+	/* Restore the original saved kprobes variables and continue. */
+ if (kcb->kprobe_status == KPROBE_REENTER) {
+ restore_previous_kprobe(kcb);
+ goto out;
+ }
+ reset_current_kprobe();
+out:
+ preempt_enable_no_resched();
+
+ /*
+ * if somebody else is singlestepping across a probe point, eflags
+ * will have TF set, in which case, continue the remaining processing
+ * of do_debug, as if this is not a probe hit.
+ */
+ if (regs->eflags & TF_MASK)
+ return 0;
+
+ return 1;
+}
+
+int __kprobes kprobe_fault_handler(struct pt_regs *regs, int trapnr)
+{
+ struct kprobe *cur = kprobe_running();
+ struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
+
+ switch(kcb->kprobe_status) {
+ case KPROBE_HIT_SS:
+ case KPROBE_REENTER:
+ /*
+ * We are here because the instruction being single
+ * stepped caused a page fault. We reset the current
+		 * kprobe, point eip back to the probe address, and
+		 * allow the page fault handler to continue as a
+ * normal page fault.
+ */
+ regs->eip = (unsigned long)cur->addr;
+ regs->eflags |= kcb->kprobe_old_eflags;
+ if (kcb->kprobe_status == KPROBE_REENTER)
+ restore_previous_kprobe(kcb);
+ else
+ reset_current_kprobe();
+ preempt_enable_no_resched();
+ break;
+ case KPROBE_HIT_ACTIVE:
+ case KPROBE_HIT_SSDONE:
+ /*
+		 * We increment the nmissed count for accounting;
+		 * the npre/npostfault counts could also be used to
+		 * account for these specific fault cases.
+ */
+ kprobes_inc_nmissed_count(cur);
+
+ /*
+ * We come here because instructions in the pre/post
+		 * handler caused the page fault; this could happen
+		 * if the handler tries to access user space via
+		 * copy_from_user(), get_user(), etc. Let the
+ * user-specified handler try to fix it first.
+ */
+ if (cur->fault_handler && cur->fault_handler(cur, regs, trapnr))
+ return 1;
+
+ /*
+ * In case the user-specified fault handler returned
+ * zero, try to fix up.
+ */
+ if (fixup_exception(regs))
+ return 1;
+
+ /*
+		 * fixup_exception() could not handle it;
+		 * let do_page_fault() fix it.
+ */
+ break;
+ default:
+ break;
+ }
+ return 0;
+}
+
+/*
+ * Wrapper routine for handling exceptions.
+ */
+int __kprobes kprobe_exceptions_notify(struct notifier_block *self,
+ unsigned long val, void *data)
+{
+ struct die_args *args = (struct die_args *)data;
+ int ret = NOTIFY_DONE;
+
+ if (args->regs && user_mode_vm(args->regs))
+ return ret;
+
+ switch (val) {
+ case DIE_INT3:
+ if (kprobe_handler(args->regs))
+ ret = NOTIFY_STOP;
+ break;
+ case DIE_DEBUG:
+ if (post_kprobe_handler(args->regs))
+ ret = NOTIFY_STOP;
+ break;
+ case DIE_GPF:
+ /* kprobe_running() needs smp_processor_id() */
+ preempt_disable();
+ if (kprobe_running() &&
+ kprobe_fault_handler(args->regs, args->trapnr))
+ ret = NOTIFY_STOP;
+ preempt_enable();
+ break;
+ default:
+ break;
+ }
+ return ret;
+}
+
+int __kprobes setjmp_pre_handler(struct kprobe *p, struct pt_regs *regs)
+{
+ struct jprobe *jp = container_of(p, struct jprobe, kp);
+ unsigned long addr;
+ struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
+
+ kcb->jprobe_saved_regs = *regs;
+	kcb->jprobe_saved_esp = &regs->esp;
+ addr = (unsigned long)(kcb->jprobe_saved_esp);
+
+ /*
+ * TBD: As Linus pointed out, gcc assumes that the callee
+ * owns the argument space and could overwrite it, e.g.
+ * tailcall optimization. So, to be absolutely safe
+ * we also save and restore enough stack bytes to cover
+ * the argument area.
+ */
+ memcpy(kcb->jprobes_stack, (kprobe_opcode_t *)addr,
+ MIN_STACK_SIZE(addr));
+ regs->eflags &= ~IF_MASK;
+ trace_hardirqs_off();
+ regs->eip = (unsigned long)(jp->entry);
+ return 1;
+}
+
+void __kprobes jprobe_return(void)
+{
+ struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
+
+ asm volatile (" xchgl %%ebx,%%esp \n"
+ " int3 \n"
+ " .globl jprobe_return_end \n"
+ " jprobe_return_end: \n"
+ " nop \n"::"b"
+ (kcb->jprobe_saved_esp):"memory");
+}
+
+int __kprobes longjmp_break_handler(struct kprobe *p, struct pt_regs *regs)
+{
+ struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
+ u8 *addr = (u8 *) (regs->eip - 1);
+ unsigned long stack_addr = (unsigned long)(kcb->jprobe_saved_esp);
+ struct jprobe *jp = container_of(p, struct jprobe, kp);
+
+ if ((addr > (u8 *) jprobe_return) && (addr < (u8 *) jprobe_return_end)) {
+		if (&regs->esp != kcb->jprobe_saved_esp) {
+ struct pt_regs *saved_regs =
+ container_of(kcb->jprobe_saved_esp,
+ struct pt_regs, esp);
+ printk("current esp %p does not match saved esp %p\n",
+			       &regs->esp, kcb->jprobe_saved_esp);
+ printk("Saved registers for jprobe %p\n", jp);
+ show_registers(saved_regs);
+ printk("Current registers\n");
+ show_registers(regs);
+ BUG();
+ }
+ *regs = kcb->jprobe_saved_regs;
+ memcpy((kprobe_opcode_t *) stack_addr, kcb->jprobes_stack,
+ MIN_STACK_SIZE(stack_addr));
+ preempt_enable_no_resched();
+ return 1;
+ }
+ return 0;
+}
+
+int __kprobes arch_trampoline_kprobe(struct kprobe *p)
+{
+ return 0;
+}
+
+int __init arch_init_kprobes(void)
+{
+ return 0;
+}
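
The kretprobe_trampoline machinery above is what implements function-return
probes; clients only see struct kretprobe. A rough sketch of the client
side (illustrative only; i386 pt_regs assumed, so the return value is read
from %eax, and "do_fork" is again a placeholder):

	#include <linux/kprobes.h>

	static int ret_handler(struct kretprobe_instance *ri, struct pt_regs *regs)
	{
		/* On i386 the probed function's return value is in %eax. */
		printk(KERN_INFO "do_fork returned %ld\n", (long)regs->eax);
		return 0;
	}

	static struct kretprobe krp = {
		.handler	= ret_handler,
		.kp		= { .symbol_name = "do_fork", },
		.maxactive	= 20,	/* max concurrent instances tracked */
	};

	/* register_kretprobe(&krp) in module init, unregister_kretprobe(&krp). */
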
Index: linux-2.6-lttng.stable/arch/x86/instrumentation/kprobes_64.c
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng.stable/arch/x86/instrumentation/kprobes_64.c 2007-10-29 09:51:38.000000000 -0400
@@ -0,0 +1,761 @@
+/*
+ * Kernel Probes (KProbes)
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) IBM Corporation, 2002, 2004
+ *
+ * 2002-Oct Created by Vamsi Krishna S <vamsi_krishna@in.ibm.com> Kernel
+ * Probes initial implementation ( includes contributions from
+ * Rusty Russell).
+ * 2004-July Suparna Bhattacharya <suparna@in.ibm.com> added jumper probes
+ * interface to access function arguments.
+ * 2004-Oct Jim Keniston <kenistoj@us.ibm.com> and Prasanna S Panchamukhi
+ * <prasanna@in.ibm.com> adapted for x86_64
+ * 2005-Mar Roland McGrath <roland@redhat.com>
+ * Fixed to handle %rip-relative addressing mode correctly.
+ * 2005-May Rusty Lynch <rusty.lynch@intel.com>
+ * Added function return probes functionality
+ */
+
+#include <linux/kprobes.h>
+#include <linux/ptrace.h>
+#include <linux/string.h>
+#include <linux/slab.h>
+#include <linux/preempt.h>
+#include <linux/module.h>
+#include <linux/kdebug.h>
+
+#include <asm/pgtable.h>
+#include <asm/uaccess.h>
+#include <asm/alternative.h>
+
+void jprobe_return_end(void);
+static void __kprobes arch_copy_kprobe(struct kprobe *p);
+
+DEFINE_PER_CPU(struct kprobe *, current_kprobe) = NULL;
+DEFINE_PER_CPU(struct kprobe_ctlblk, kprobe_ctlblk);
+
+struct kretprobe_blackpoint kretprobe_blacklist[] = {
+ {"__switch_to", }, /* This function switches only current task, but
+ doesn't switch kernel stack.*/
+ {NULL, NULL} /* Terminator */
+};
+const int kretprobe_blacklist_size = ARRAY_SIZE(kretprobe_blacklist);
+
+/*
+ * returns non-zero if opcode modifies the interrupt flag.
+ */
+static __always_inline int is_IF_modifier(kprobe_opcode_t *insn)
+{
+ switch (*insn) {
+ case 0xfa: /* cli */
+ case 0xfb: /* sti */
+ case 0xcf: /* iret/iretd */
+ case 0x9d: /* popf/popfd */
+ return 1;
+ }
+
+ if (*insn >= 0x40 && *insn <= 0x4f && *++insn == 0xcf)
+ return 1;
+ return 0;
+}
+
+int __kprobes arch_prepare_kprobe(struct kprobe *p)
+{
+ /* insn: must be on special executable page on x86_64. */
+ p->ainsn.insn = get_insn_slot();
+ if (!p->ainsn.insn) {
+ return -ENOMEM;
+ }
+ arch_copy_kprobe(p);
+ return 0;
+}
+
+/*
+ * Determine if the instruction uses the %rip-relative addressing mode.
+ * If it does, return the address of the 32-bit displacement word.
+ * If not, return null.
+ */
+static s32 __kprobes *is_riprel(u8 *insn)
+{
+#define W(row,b0,b1,b2,b3,b4,b5,b6,b7,b8,b9,ba,bb,bc,bd,be,bf) \
+ (((b0##UL << 0x0)|(b1##UL << 0x1)|(b2##UL << 0x2)|(b3##UL << 0x3) | \
+ (b4##UL << 0x4)|(b5##UL << 0x5)|(b6##UL << 0x6)|(b7##UL << 0x7) | \
+ (b8##UL << 0x8)|(b9##UL << 0x9)|(ba##UL << 0xa)|(bb##UL << 0xb) | \
+ (bc##UL << 0xc)|(bd##UL << 0xd)|(be##UL << 0xe)|(bf##UL << 0xf)) \
+ << (row % 64))
+ static const u64 onebyte_has_modrm[256 / 64] = {
+ /* 0 1 2 3 4 5 6 7 8 9 a b c d e f */
+ /* ------------------------------- */
+ W(0x00, 1,1,1,1,0,0,0,0,1,1,1,1,0,0,0,0)| /* 00 */
+ W(0x10, 1,1,1,1,0,0,0,0,1,1,1,1,0,0,0,0)| /* 10 */
+ W(0x20, 1,1,1,1,0,0,0,0,1,1,1,1,0,0,0,0)| /* 20 */
+ W(0x30, 1,1,1,1,0,0,0,0,1,1,1,1,0,0,0,0), /* 30 */
+ W(0x40, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)| /* 40 */
+ W(0x50, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)| /* 50 */
+ W(0x60, 0,0,1,1,0,0,0,0,0,1,0,1,0,0,0,0)| /* 60 */
+ W(0x70, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0), /* 70 */
+ W(0x80, 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1)| /* 80 */
+ W(0x90, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)| /* 90 */
+ W(0xa0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)| /* a0 */
+ W(0xb0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0), /* b0 */
+ W(0xc0, 1,1,0,0,1,1,1,1,0,0,0,0,0,0,0,0)| /* c0 */
+ W(0xd0, 1,1,1,1,0,0,0,0,1,1,1,1,1,1,1,1)| /* d0 */
+ W(0xe0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)| /* e0 */
+ W(0xf0, 0,0,0,0,0,0,1,1,0,0,0,0,0,0,1,1) /* f0 */
+ /* ------------------------------- */
+ /* 0 1 2 3 4 5 6 7 8 9 a b c d e f */
+ };
+ static const u64 twobyte_has_modrm[256 / 64] = {
+ /* 0 1 2 3 4 5 6 7 8 9 a b c d e f */
+ /* ------------------------------- */
+ W(0x00, 1,1,1,1,0,0,0,0,0,0,0,0,0,1,0,1)| /* 0f */
+ W(0x10, 1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0)| /* 1f */
+ W(0x20, 1,1,1,1,1,0,1,0,1,1,1,1,1,1,1,1)| /* 2f */
+ W(0x30, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0), /* 3f */
+ W(0x40, 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1)| /* 4f */
+ W(0x50, 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1)| /* 5f */
+ W(0x60, 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1)| /* 6f */
+ W(0x70, 1,1,1,1,1,1,1,0,0,0,0,0,1,1,1,1), /* 7f */
+ W(0x80, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)| /* 8f */
+ W(0x90, 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1)| /* 9f */
+ W(0xa0, 0,0,0,1,1,1,1,1,0,0,0,1,1,1,1,1)| /* af */
+ W(0xb0, 1,1,1,1,1,1,1,1,0,0,1,1,1,1,1,1), /* bf */
+ W(0xc0, 1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0)| /* cf */
+ W(0xd0, 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1)| /* df */
+ W(0xe0, 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1)| /* ef */
+ W(0xf0, 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0) /* ff */
+ /* ------------------------------- */
+ /* 0 1 2 3 4 5 6 7 8 9 a b c d e f */
+ };
+#undef W
+ int need_modrm;
+
+ /* Skip legacy instruction prefixes. */
+ while (1) {
+ switch (*insn) {
+ case 0x66:
+ case 0x67:
+ case 0x2e:
+ case 0x3e:
+ case 0x26:
+ case 0x64:
+ case 0x65:
+ case 0x36:
+ case 0xf0:
+ case 0xf3:
+ case 0xf2:
+ ++insn;
+ continue;
+ }
+ break;
+ }
+
+ /* Skip REX instruction prefix. */
+ if ((*insn & 0xf0) == 0x40)
+ ++insn;
+
+ if (*insn == 0x0f) { /* Two-byte opcode. */
+ ++insn;
+ need_modrm = test_bit(*insn, twobyte_has_modrm);
+ } else { /* One-byte opcode. */
+ need_modrm = test_bit(*insn, onebyte_has_modrm);
+ }
+
+ if (need_modrm) {
+ u8 modrm = *++insn;
+ if ((modrm & 0xc7) == 0x05) { /* %rip+disp32 addressing mode */
+ /* Displacement follows ModRM byte. */
+ return (s32 *) ++insn;
+ }
+ }
+
+ /* No %rip-relative addressing mode here. */
+ return NULL;
+}
+
+static void __kprobes arch_copy_kprobe(struct kprobe *p)
+{
+ s32 *ripdisp;
+ memcpy(p->ainsn.insn, p->addr, MAX_INSN_SIZE);
+ ripdisp = is_riprel(p->ainsn.insn);
+ if (ripdisp) {
+ /*
+ * The copied instruction uses the %rip-relative
+ * addressing mode. Adjust the displacement for the
+ * difference between the original location of this
+ * instruction and the location of the copy that will
+ * actually be run. The tricky bit here is making sure
+ * that the sign extension happens correctly in this
+ * calculation, since we need a signed 32-bit result to
+ * be sign-extended to 64 bits when it's added to the
+ * %rip value and yield the same 64-bit result that the
+ * sign-extension of the original signed 32-bit
+ * displacement would have given.
+ */
+ s64 disp = (u8 *) p->addr + *ripdisp - (u8 *) p->ainsn.insn;
+ BUG_ON((s64) (s32) disp != disp); /* Sanity check. */
+ *ripdisp = disp;
+ }
+ p->opcode = *p->addr;
+}
+
+void __kprobes arch_arm_kprobe(struct kprobe *p)
+{
+ text_poke(p->addr, ((unsigned char []){BREAKPOINT_INSTRUCTION}), 1);
+}
+
+void __kprobes arch_disarm_kprobe(struct kprobe *p)
+{
+ text_poke(p->addr, &p->opcode, 1);
+}
+
+void __kprobes arch_remove_kprobe(struct kprobe *p)
+{
+ mutex_lock(&kprobe_mutex);
+ free_insn_slot(p->ainsn.insn, 0);
+ mutex_unlock(&kprobe_mutex);
+}
+
+static void __kprobes save_previous_kprobe(struct kprobe_ctlblk *kcb)
+{
+ kcb->prev_kprobe.kp = kprobe_running();
+ kcb->prev_kprobe.status = kcb->kprobe_status;
+ kcb->prev_kprobe.old_rflags = kcb->kprobe_old_rflags;
+ kcb->prev_kprobe.saved_rflags = kcb->kprobe_saved_rflags;
+}
+
+static void __kprobes restore_previous_kprobe(struct kprobe_ctlblk *kcb)
+{
+ __get_cpu_var(current_kprobe) = kcb->prev_kprobe.kp;
+ kcb->kprobe_status = kcb->prev_kprobe.status;
+ kcb->kprobe_old_rflags = kcb->prev_kprobe.old_rflags;
+ kcb->kprobe_saved_rflags = kcb->prev_kprobe.saved_rflags;
+}
+
+static void __kprobes set_current_kprobe(struct kprobe *p, struct pt_regs *regs,
+ struct kprobe_ctlblk *kcb)
+{
+ __get_cpu_var(current_kprobe) = p;
+ kcb->kprobe_saved_rflags = kcb->kprobe_old_rflags
+ = (regs->eflags & (TF_MASK | IF_MASK));
+ if (is_IF_modifier(p->ainsn.insn))
+ kcb->kprobe_saved_rflags &= ~IF_MASK;
+}
+
+static void __kprobes prepare_singlestep(struct kprobe *p, struct pt_regs *regs)
+{
+ regs->eflags |= TF_MASK;
+ regs->eflags &= ~IF_MASK;
+ /*single step inline if the instruction is an int3*/
+ if (p->opcode == BREAKPOINT_INSTRUCTION)
+ regs->rip = (unsigned long)p->addr;
+ else
+ regs->rip = (unsigned long)p->ainsn.insn;
+}
+
+/* Called with kretprobe_lock held */
+void __kprobes arch_prepare_kretprobe(struct kretprobe_instance *ri,
+ struct pt_regs *regs)
+{
+ unsigned long *sara = (unsigned long *)regs->rsp;
+
+ ri->ret_addr = (kprobe_opcode_t *) *sara;
+ /* Replace the return addr with trampoline addr */
+ *sara = (unsigned long) &kretprobe_trampoline;
+}
+
+int __kprobes kprobe_handler(struct pt_regs *regs)
+{
+ struct kprobe *p;
+ int ret = 0;
+ kprobe_opcode_t *addr = (kprobe_opcode_t *)(regs->rip - sizeof(kprobe_opcode_t));
+ struct kprobe_ctlblk *kcb;
+
+ /*
+ * We don't want to be preempted for the entire
+ * duration of kprobe processing
+ */
+ preempt_disable();
+ kcb = get_kprobe_ctlblk();
+
+ /* Check we're not actually recursing */
+ if (kprobe_running()) {
+ p = get_kprobe(addr);
+ if (p) {
+ if (kcb->kprobe_status == KPROBE_HIT_SS &&
+ *p->ainsn.insn == BREAKPOINT_INSTRUCTION) {
+ regs->eflags &= ~TF_MASK;
+ regs->eflags |= kcb->kprobe_saved_rflags;
+ goto no_kprobe;
+ } else if (kcb->kprobe_status == KPROBE_HIT_SSDONE) {
+ /* TODO: Provide re-entrancy from
+ * post_kprobes_handler() and avoid exception
+ * stack corruption while single-stepping on
+ * the instruction of the new probe.
+ */
+ arch_disarm_kprobe(p);
+ regs->rip = (unsigned long)p->addr;
+ reset_current_kprobe();
+ ret = 1;
+ } else {
+ /* We have reentered the kprobe_handler(), since
+ * another probe was hit while within the
+ * handler. We here save the original kprobe
+ * variables and just single step on instruction
+ * of the new probe without calling any user
+ * handlers.
+ */
+ save_previous_kprobe(kcb);
+ set_current_kprobe(p, regs, kcb);
+ kprobes_inc_nmissed_count(p);
+ prepare_singlestep(p, regs);
+ kcb->kprobe_status = KPROBE_REENTER;
+ return 1;
+ }
+ } else {
+ if (*addr != BREAKPOINT_INSTRUCTION) {
+ /* The breakpoint instruction was removed by
+ * another cpu right after we hit, no further
+ * handling of this interrupt is appropriate
+ */
+ regs->rip = (unsigned long)addr;
+ ret = 1;
+ goto no_kprobe;
+ }
+ p = __get_cpu_var(current_kprobe);
+ if (p->break_handler && p->break_handler(p, regs)) {
+ goto ss_probe;
+ }
+ }
+ goto no_kprobe;
+ }
+
+ p = get_kprobe(addr);
+ if (!p) {
+ if (*addr != BREAKPOINT_INSTRUCTION) {
+ /*
+ * The breakpoint instruction was removed right
+ * after we hit it. Another cpu has removed
+ * either a probepoint or a debugger breakpoint
+ * at this address. In either case, no further
+ * handling of this interrupt is appropriate.
+ * Back up over the (now missing) int3 and run
+ * the original instruction.
+ */
+ regs->rip = (unsigned long)addr;
+ ret = 1;
+ }
+ /* Not one of ours: let kernel handle it */
+ goto no_kprobe;
+ }
+
+ set_current_kprobe(p, regs, kcb);
+ kcb->kprobe_status = KPROBE_HIT_ACTIVE;
+
+ if (p->pre_handler && p->pre_handler(p, regs))
+ /* handler has already set things up, so skip ss setup */
+ return 1;
+
+ss_probe:
+ prepare_singlestep(p, regs);
+ kcb->kprobe_status = KPROBE_HIT_SS;
+ return 1;
+
+no_kprobe:
+ preempt_enable_no_resched();
+ return ret;
+}
+
+/*
+ * For function-return probes, init_kprobes() establishes a probepoint
+ * here. When a retprobed function returns, this probe is hit and
+ * trampoline_probe_handler() runs, calling the kretprobe's handler.
+ */
+ void kretprobe_trampoline_holder(void)
+ {
+ asm volatile ( ".global kretprobe_trampoline\n"
+ "kretprobe_trampoline: \n"
+ "nop\n");
+ }
+
+/*
+ * Called when we hit the probe point at kretprobe_trampoline
+ */
+int __kprobes trampoline_probe_handler(struct kprobe *p, struct pt_regs *regs)
+{
+ struct kretprobe_instance *ri = NULL;
+ struct hlist_head *head, empty_rp;
+ struct hlist_node *node, *tmp;
+ unsigned long flags, orig_ret_address = 0;
+ unsigned long trampoline_address =(unsigned long)&kretprobe_trampoline;
+
+ INIT_HLIST_HEAD(&empty_rp);
+ spin_lock_irqsave(&kretprobe_lock, flags);
+ head = kretprobe_inst_table_head(current);
+
+ /*
+ * It is possible to have multiple instances associated with a given
+	 * task, either because multiple functions in the call path
+	 * have a return probe installed on them, and/or more than one
+	 * return probe was registered for a target function.
+ *
+ * We can handle this because:
+ * - instances are always inserted at the head of the list
+ * - when multiple return probes are registered for the same
+ * function, the first instance's ret_addr will point to the
+ * real return address, and all the rest will point to
+ * kretprobe_trampoline
+ */
+ hlist_for_each_entry_safe(ri, node, tmp, head, hlist) {
+ if (ri->task != current)
+ /* another task is sharing our hash bucket */
+ continue;
+
+ if (ri->rp && ri->rp->handler)
+ ri->rp->handler(ri, regs);
+
+ orig_ret_address = (unsigned long)ri->ret_addr;
+ recycle_rp_inst(ri, &empty_rp);
+
+ if (orig_ret_address != trampoline_address)
+ /*
+ * This is the real return address. Any other
+ * instances associated with this task are for
+ * other calls deeper on the call stack
+ */
+ break;
+ }
+
+ kretprobe_assert(ri, orig_ret_address, trampoline_address);
+ regs->rip = orig_ret_address;
+
+ reset_current_kprobe();
+ spin_unlock_irqrestore(&kretprobe_lock, flags);
+ preempt_enable_no_resched();
+
+ hlist_for_each_entry_safe(ri, node, tmp, &empty_rp, hlist) {
+ hlist_del(&ri->hlist);
+ kfree(ri);
+ }
+ /*
+ * By returning a non-zero value, we are telling
+ * kprobe_handler() that we don't want the post_handler
+ * to run (and have re-enabled preemption)
+ */
+ return 1;
+}
+
+/*
+ * Called after single-stepping. p->addr is the address of the
+ * instruction whose first byte has been replaced by the "int 3"
+ * instruction. To avoid the SMP problems that can occur when we
+ * temporarily put back the original opcode to single-step, we
+ * single-stepped a copy of the instruction. The address of this
+ * copy is p->ainsn.insn.
+ *
+ * This function prepares to return from the post-single-step
+ * interrupt. We have to fix up the stack as follows:
+ *
+ * 0) Except in the case of absolute or indirect jump or call instructions,
+ * the new rip is relative to the copied instruction. We need to make
+ * it relative to the original instruction.
+ *
+ * 1) If the single-stepped instruction was pushfl, then the TF and IF
+ * flags are set in the just-pushed eflags, and may need to be cleared.
+ *
+ * 2) If the single-stepped instruction was a call, the return address
+ * that is atop the stack is the address following the copied instruction.
+ * We need to make it the address following the original instruction.
+ */
+static void __kprobes resume_execution(struct kprobe *p,
+ struct pt_regs *regs, struct kprobe_ctlblk *kcb)
+{
+ unsigned long *tos = (unsigned long *)regs->rsp;
+ unsigned long next_rip = 0;
+ unsigned long copy_rip = (unsigned long)p->ainsn.insn;
+ unsigned long orig_rip = (unsigned long)p->addr;
+ kprobe_opcode_t *insn = p->ainsn.insn;
+
+ /*skip the REX prefix*/
+ if (*insn >= 0x40 && *insn <= 0x4f)
+ insn++;
+
+ switch (*insn) {
+ case 0x9c: /* pushfl */
+ *tos &= ~(TF_MASK | IF_MASK);
+ *tos |= kcb->kprobe_old_rflags;
+ break;
+ case 0xc3: /* ret/lret */
+ case 0xcb:
+ case 0xc2:
+ case 0xca:
+ regs->eflags &= ~TF_MASK;
+ /* rip is already adjusted, no more changes required*/
+ return;
+ case 0xe8: /* call relative - Fix return addr */
+ *tos = orig_rip + (*tos - copy_rip);
+ break;
+ case 0xff:
+ if ((insn[1] & 0x30) == 0x10) {
+ /* call absolute, indirect */
+ /* Fix return addr; rip is correct. */
+ next_rip = regs->rip;
+ *tos = orig_rip + (*tos - copy_rip);
+ } else if (((insn[1] & 0x31) == 0x20) || /* jmp near, absolute indirect */
+ ((insn[1] & 0x31) == 0x21)) { /* jmp far, absolute indirect */
+ /* rip is correct. */
+ next_rip = regs->rip;
+ }
+ break;
+ case 0xea: /* jmp absolute -- rip is correct */
+ next_rip = regs->rip;
+ break;
+ default:
+ break;
+ }
+
+ regs->eflags &= ~TF_MASK;
+ if (next_rip) {
+ regs->rip = next_rip;
+ } else {
+ regs->rip = orig_rip + (regs->rip - copy_rip);
+ }
+}
+
+int __kprobes post_kprobe_handler(struct pt_regs *regs)
+{
+ struct kprobe *cur = kprobe_running();
+ struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
+
+ if (!cur)
+ return 0;
+
+ if ((kcb->kprobe_status != KPROBE_REENTER) && cur->post_handler) {
+ kcb->kprobe_status = KPROBE_HIT_SSDONE;
+ cur->post_handler(cur, regs, 0);
+ }
+
+ resume_execution(cur, regs, kcb);
+ regs->eflags |= kcb->kprobe_saved_rflags;
+#ifdef CONFIG_TRACE_IRQFLAGS_SUPPORT
+ if (raw_irqs_disabled_flags(regs->eflags))
+ trace_hardirqs_off();
+ else
+ trace_hardirqs_on();
+#endif
+
+ /* Restore the original saved kprobes variables and continue. */
+ if (kcb->kprobe_status == KPROBE_REENTER) {
+ restore_previous_kprobe(kcb);
+ goto out;
+ }
+ reset_current_kprobe();
+out:
+ preempt_enable_no_resched();
+
+ /*
+ * if somebody else is singlestepping across a probe point, eflags
+ * will have TF set, in which case, continue the remaining processing
+ * of do_debug, as if this is not a probe hit.
+ */
+ if (regs->eflags & TF_MASK)
+ return 0;
+
+ return 1;
+}
+
+int __kprobes kprobe_fault_handler(struct pt_regs *regs, int trapnr)
+{
+ struct kprobe *cur = kprobe_running();
+ struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
+ const struct exception_table_entry *fixup;
+
+ switch(kcb->kprobe_status) {
+ case KPROBE_HIT_SS:
+ case KPROBE_REENTER:
+ /*
+ * We are here because the instruction being single
+ * stepped caused a page fault. We reset the current
+	 * kprobe, point rip back to the probe address, and
+	 * allow the page fault handler to continue as a
+ * normal page fault.
+ */
+ regs->rip = (unsigned long)cur->addr;
+ regs->eflags |= kcb->kprobe_old_rflags;
+ if (kcb->kprobe_status == KPROBE_REENTER)
+ restore_previous_kprobe(kcb);
+ else
+ reset_current_kprobe();
+ preempt_enable_no_resched();
+ break;
+ case KPROBE_HIT_ACTIVE:
+ case KPROBE_HIT_SSDONE:
+ /*
+	 * We increment the nmissed count for accounting;
+	 * we could also use the npre/npostfault counts to
+	 * account for these specific fault cases.
+ */
+ kprobes_inc_nmissed_count(cur);
+
+ /*
+ * We come here because instructions in the pre/post
+	 * handler caused the page fault; this could happen
+	 * if the handler tries to access user space via
+	 * copy_from_user(), get_user(), etc. Let the
+ * user-specified handler try to fix it first.
+ */
+ if (cur->fault_handler && cur->fault_handler(cur, regs, trapnr))
+ return 1;
+
+ /*
+ * In case the user-specified fault handler returned
+ * zero, try to fix up.
+ */
+ fixup = search_exception_tables(regs->rip);
+ if (fixup) {
+ regs->rip = fixup->fixup;
+ return 1;
+ }
+
+ /*
+ * fixup() could not handle it,
+ * Let do_page_fault() fix it.
+ */
+ break;
+ default:
+ break;
+ }
+ return 0;
+}
+
+/*
+ * Wrapper routine for handling exceptions.
+ */
+int __kprobes kprobe_exceptions_notify(struct notifier_block *self,
+ unsigned long val, void *data)
+{
+ struct die_args *args = (struct die_args *)data;
+ int ret = NOTIFY_DONE;
+
+ if (args->regs && user_mode(args->regs))
+ return ret;
+
+ switch (val) {
+ case DIE_INT3:
+ if (kprobe_handler(args->regs))
+ ret = NOTIFY_STOP;
+ break;
+ case DIE_DEBUG:
+ if (post_kprobe_handler(args->regs))
+ ret = NOTIFY_STOP;
+ break;
+ case DIE_GPF:
+ /* kprobe_running() needs smp_processor_id() */
+ preempt_disable();
+ if (kprobe_running() &&
+ kprobe_fault_handler(args->regs, args->trapnr))
+ ret = NOTIFY_STOP;
+ preempt_enable();
+ break;
+ default:
+ break;
+ }
+ return ret;
+}
+
+int __kprobes setjmp_pre_handler(struct kprobe *p, struct pt_regs *regs)
+{
+ struct jprobe *jp = container_of(p, struct jprobe, kp);
+ unsigned long addr;
+ struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
+
+ kcb->jprobe_saved_regs = *regs;
+ kcb->jprobe_saved_rsp = (long *) regs->rsp;
+ addr = (unsigned long)(kcb->jprobe_saved_rsp);
+ /*
+ * As Linus pointed out, gcc assumes that the callee
+ * owns the argument space and could overwrite it, e.g.
+ * tailcall optimization. So, to be absolutely safe
+ * we also save and restore enough stack bytes to cover
+ * the argument area.
+ */
+ memcpy(kcb->jprobes_stack, (kprobe_opcode_t *)addr,
+ MIN_STACK_SIZE(addr));
+ regs->eflags &= ~IF_MASK;
+ trace_hardirqs_off();
+ regs->rip = (unsigned long)(jp->entry);
+ return 1;
+}
+
+void __kprobes jprobe_return(void)
+{
+ struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
+
+ asm volatile (" xchg %%rbx,%%rsp \n"
+ " int3 \n"
+ " .globl jprobe_return_end \n"
+ " jprobe_return_end: \n"
+ " nop \n"::"b"
+ (kcb->jprobe_saved_rsp):"memory");
+}
+
+int __kprobes longjmp_break_handler(struct kprobe *p, struct pt_regs *regs)
+{
+ struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
+ u8 *addr = (u8 *) (regs->rip - 1);
+ unsigned long stack_addr = (unsigned long)(kcb->jprobe_saved_rsp);
+ struct jprobe *jp = container_of(p, struct jprobe, kp);
+
+ if ((addr > (u8 *) jprobe_return) && (addr < (u8 *) jprobe_return_end)) {
+ if ((long *)regs->rsp != kcb->jprobe_saved_rsp) {
+ struct pt_regs *saved_regs =
+ container_of(kcb->jprobe_saved_rsp,
+ struct pt_regs, rsp);
+ printk("current rsp %p does not match saved rsp %p\n",
+ (long *)regs->rsp, kcb->jprobe_saved_rsp);
+ printk("Saved registers for jprobe %p\n", jp);
+ show_registers(saved_regs);
+ printk("Current registers\n");
+ show_registers(regs);
+ BUG();
+ }
+ *regs = kcb->jprobe_saved_regs;
+ memcpy((kprobe_opcode_t *) stack_addr, kcb->jprobes_stack,
+ MIN_STACK_SIZE(stack_addr));
+ preempt_enable_no_resched();
+ return 1;
+ }
+ return 0;
+}
+
+static struct kprobe trampoline_p = {
+ .addr = (kprobe_opcode_t *) &kretprobe_trampoline,
+ .pre_handler = trampoline_probe_handler
+};
+
+int __init arch_init_kprobes(void)
+{
+ return register_kprobe(&trampoline_p);
+}
+
+int __kprobes arch_trampoline_kprobe(struct kprobe *p)
+{
+ if (p->addr == (kprobe_opcode_t *)&kretprobe_trampoline)
+ return 1;
+
+ return 0;
+}
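(A note for reviewers tracing the move: trampoline_probe_handler() above is
the piece a kretprobe client ultimately drives. Below is a minimal sketch of
such a client against the API as it stands in this tree -- the probed symbol,
module names and maxactive value are purely illustrative, not part of this
patch:)

#include <linux/module.h>
#include <linux/kprobes.h>

/* Called once per return of the probed function; one kretprobe_instance
 * per outstanding call sits in the per-task hash walked above. */
static int ret_handler(struct kretprobe_instance *ri, struct pt_regs *regs)
{
	printk(KERN_INFO "probed function returns to %p\n", ri->ret_addr);
	return 0;
}

static struct kretprobe my_kretprobe = {
	.handler	= ret_handler,
	.kp.symbol_name	= "do_fork",	/* illustrative target */
	.maxactive	= 20,		/* bounds concurrent instances */
};

static int __init rp_example_init(void)
{
	return register_kretprobe(&my_kretprobe);
}

static void __exit rp_example_exit(void)
{
	unregister_kretprobe(&my_kretprobe);
}

module_init(rp_example_init);
module_exit(rp_example_exit);
MODULE_LICENSE("GPL");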
Index: linux-2.6-lttng.stable/arch/x86/kernel/kprobes_32.c
===================================================================
--- linux-2.6-lttng.stable.orig/arch/x86/kernel/kprobes_32.c 2007-10-29 09:51:07.000000000 -0400
+++ /dev/null 1970-01-01 00:00:00.000000000 +0000
@@ -1,763 +0,0 @@
-/*
- * Kernel Probes (KProbes)
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, write to the Free Software
- * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
- *
- * Copyright (C) IBM Corporation, 2002, 2004
- *
- * 2002-Oct Created by Vamsi Krishna S <vamsi_krishna@in.ibm.com> Kernel
- * Probes initial implementation ( includes contributions from
- * Rusty Russell).
- * 2004-July Suparna Bhattacharya <suparna@in.ibm.com> added jumper probes
- * interface to access function arguments.
- * 2005-May Hien Nguyen <hien@us.ibm.com>, Jim Keniston
- * <jkenisto@us.ibm.com> and Prasanna S Panchamukhi
- * <prasanna@in.ibm.com> added function-return probes.
- */
-
-#include <linux/kprobes.h>
-#include <linux/ptrace.h>
-#include <linux/preempt.h>
-#include <linux/kdebug.h>
-#include <asm/cacheflush.h>
-#include <asm/desc.h>
-#include <asm/uaccess.h>
-#include <asm/alternative.h>
-
-void jprobe_return_end(void);
-
-DEFINE_PER_CPU(struct kprobe *, current_kprobe) = NULL;
-DEFINE_PER_CPU(struct kprobe_ctlblk, kprobe_ctlblk);
-
-struct kretprobe_blackpoint kretprobe_blacklist[] = {
- {"__switch_to", }, /* This function switches only current task, but
- doesn't switch kernel stack.*/
- {NULL, NULL} /* Terminator */
-};
-const int kretprobe_blacklist_size = ARRAY_SIZE(kretprobe_blacklist);
-
-/* insert a jmp code */
-static __always_inline void set_jmp_op(void *from, void *to)
-{
- struct __arch_jmp_op {
- char op;
- long raddr;
- } __attribute__((packed)) *jop;
- jop = (struct __arch_jmp_op *)from;
- jop->raddr = (long)(to) - ((long)(from) + 5);
- jop->op = RELATIVEJUMP_INSTRUCTION;
-}
-
-/*
- * returns non-zero if opcodes can be boosted.
- */
-static __always_inline int can_boost(kprobe_opcode_t *opcodes)
-{
-#define W(row,b0,b1,b2,b3,b4,b5,b6,b7,b8,b9,ba,bb,bc,bd,be,bf) \
- (((b0##UL << 0x0)|(b1##UL << 0x1)|(b2##UL << 0x2)|(b3##UL << 0x3) | \
- (b4##UL << 0x4)|(b5##UL << 0x5)|(b6##UL << 0x6)|(b7##UL << 0x7) | \
- (b8##UL << 0x8)|(b9##UL << 0x9)|(ba##UL << 0xa)|(bb##UL << 0xb) | \
- (bc##UL << 0xc)|(bd##UL << 0xd)|(be##UL << 0xe)|(bf##UL << 0xf)) \
- << (row % 32))
- /*
-	 * Undefined/reserved opcodes, conditional jumps, Opcode Extension
-	 * Groups, and some special opcodes cannot be boosted.
- */
- static const unsigned long twobyte_is_boostable[256 / 32] = {
- /* 0 1 2 3 4 5 6 7 8 9 a b c d e f */
- /* ------------------------------- */
- W(0x00, 0,0,1,1,0,0,1,0,1,1,0,0,0,0,0,0)| /* 00 */
- W(0x10, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0), /* 10 */
- W(0x20, 1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0)| /* 20 */
- W(0x30, 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0), /* 30 */
- W(0x40, 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1)| /* 40 */
- W(0x50, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0), /* 50 */
- W(0x60, 1,1,1,1,1,1,1,1,1,1,1,1,0,0,1,1)| /* 60 */
- W(0x70, 0,0,0,0,1,1,1,1,0,0,0,0,0,0,1,1), /* 70 */
- W(0x80, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)| /* 80 */
- W(0x90, 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1), /* 90 */
- W(0xa0, 1,1,0,1,1,1,0,0,1,1,0,1,1,1,0,1)| /* a0 */
- W(0xb0, 1,1,1,1,1,1,1,1,0,0,0,1,1,1,1,1), /* b0 */
- W(0xc0, 1,1,0,0,0,0,0,0,1,1,1,1,1,1,1,1)| /* c0 */
- W(0xd0, 0,1,1,1,0,1,0,0,1,1,0,1,1,1,0,1), /* d0 */
- W(0xe0, 0,1,1,0,0,1,0,0,1,1,0,1,1,1,0,1)| /* e0 */
- W(0xf0, 0,1,1,1,0,1,0,0,1,1,1,0,1,1,1,0) /* f0 */
- /* ------------------------------- */
- /* 0 1 2 3 4 5 6 7 8 9 a b c d e f */
- };
-#undef W
- kprobe_opcode_t opcode;
- kprobe_opcode_t *orig_opcodes = opcodes;
-retry:
- if (opcodes - orig_opcodes > MAX_INSN_SIZE - 1)
- return 0;
- opcode = *(opcodes++);
-
- /* 2nd-byte opcode */
- if (opcode == 0x0f) {
- if (opcodes - orig_opcodes > MAX_INSN_SIZE - 1)
- return 0;
- return test_bit(*opcodes, twobyte_is_boostable);
- }
-
- switch (opcode & 0xf0) {
- case 0x60:
- if (0x63 < opcode && opcode < 0x67)
- goto retry; /* prefixes */
- /* can't boost Address-size override and bound */
- return (opcode != 0x62 && opcode != 0x67);
- case 0x70:
- return 0; /* can't boost conditional jump */
- case 0xc0:
- /* can't boost software-interruptions */
- return (0xc1 < opcode && opcode < 0xcc) || opcode == 0xcf;
- case 0xd0:
- /* can boost AA* and XLAT */
- return (opcode == 0xd4 || opcode == 0xd5 || opcode == 0xd7);
- case 0xe0:
- /* can boost in/out and absolute jmps */
- return ((opcode & 0x04) || opcode == 0xea);
- case 0xf0:
- if ((opcode & 0x0c) == 0 && opcode != 0xf1)
- goto retry; /* lock/rep(ne) prefix */
-		/* clear and set flags can be boosted */
- return (opcode == 0xf5 || (0xf7 < opcode && opcode < 0xfe));
- default:
- if (opcode == 0x26 || opcode == 0x36 || opcode == 0x3e)
- goto retry; /* prefixes */
- /* can't boost CS override and call */
- return (opcode != 0x2e && opcode != 0x9a);
- }
-}
-
-/*
- * returns non-zero if opcode modifies the interrupt flag.
- */
-static int __kprobes is_IF_modifier(kprobe_opcode_t opcode)
-{
- switch (opcode) {
- case 0xfa: /* cli */
- case 0xfb: /* sti */
- case 0xcf: /* iret/iretd */
- case 0x9d: /* popf/popfd */
- return 1;
- }
- return 0;
-}
-
-int __kprobes arch_prepare_kprobe(struct kprobe *p)
-{
- /* insn: must be on special executable page on i386. */
- p->ainsn.insn = get_insn_slot();
- if (!p->ainsn.insn)
- return -ENOMEM;
-
- memcpy(p->ainsn.insn, p->addr, MAX_INSN_SIZE * sizeof(kprobe_opcode_t));
- p->opcode = *p->addr;
- if (can_boost(p->addr)) {
- p->ainsn.boostable = 0;
- } else {
- p->ainsn.boostable = -1;
- }
- return 0;
-}
-
-void __kprobes arch_arm_kprobe(struct kprobe *p)
-{
- text_poke(p->addr, ((unsigned char []){BREAKPOINT_INSTRUCTION}), 1);
-}
-
-void __kprobes arch_disarm_kprobe(struct kprobe *p)
-{
- text_poke(p->addr, &p->opcode, 1);
-}
-
-void __kprobes arch_remove_kprobe(struct kprobe *p)
-{
- mutex_lock(&kprobe_mutex);
- free_insn_slot(p->ainsn.insn, (p->ainsn.boostable == 1));
- mutex_unlock(&kprobe_mutex);
-}
-
-static void __kprobes save_previous_kprobe(struct kprobe_ctlblk *kcb)
-{
- kcb->prev_kprobe.kp = kprobe_running();
- kcb->prev_kprobe.status = kcb->kprobe_status;
- kcb->prev_kprobe.old_eflags = kcb->kprobe_old_eflags;
- kcb->prev_kprobe.saved_eflags = kcb->kprobe_saved_eflags;
-}
-
-static void __kprobes restore_previous_kprobe(struct kprobe_ctlblk *kcb)
-{
- __get_cpu_var(current_kprobe) = kcb->prev_kprobe.kp;
- kcb->kprobe_status = kcb->prev_kprobe.status;
- kcb->kprobe_old_eflags = kcb->prev_kprobe.old_eflags;
- kcb->kprobe_saved_eflags = kcb->prev_kprobe.saved_eflags;
-}
-
-static void __kprobes set_current_kprobe(struct kprobe *p, struct pt_regs *regs,
- struct kprobe_ctlblk *kcb)
-{
- __get_cpu_var(current_kprobe) = p;
- kcb->kprobe_saved_eflags = kcb->kprobe_old_eflags
- = (regs->eflags & (TF_MASK | IF_MASK));
- if (is_IF_modifier(p->opcode))
- kcb->kprobe_saved_eflags &= ~IF_MASK;
-}
-
-static void __kprobes prepare_singlestep(struct kprobe *p, struct pt_regs *regs)
-{
- regs->eflags |= TF_MASK;
- regs->eflags &= ~IF_MASK;
- /*single step inline if the instruction is an int3*/
- if (p->opcode == BREAKPOINT_INSTRUCTION)
- regs->eip = (unsigned long)p->addr;
- else
- regs->eip = (unsigned long)p->ainsn.insn;
-}
-
-/* Called with kretprobe_lock held */
-void __kprobes arch_prepare_kretprobe(struct kretprobe_instance *ri,
- struct pt_regs *regs)
-{
-	unsigned long *sara = (unsigned long *)&regs->esp;
-
- ri->ret_addr = (kprobe_opcode_t *) *sara;
-
- /* Replace the return addr with trampoline addr */
- *sara = (unsigned long) &kretprobe_trampoline;
-}
-
-/*
- * Interrupts are disabled on entry as trap3 is an interrupt gate and they
- * remain disabled throughout this function.
- */
-static int __kprobes kprobe_handler(struct pt_regs *regs)
-{
- struct kprobe *p;
- int ret = 0;
- kprobe_opcode_t *addr;
- struct kprobe_ctlblk *kcb;
-
- addr = (kprobe_opcode_t *)(regs->eip - sizeof(kprobe_opcode_t));
-
- /*
- * We don't want to be preempted for the entire
- * duration of kprobe processing
- */
- preempt_disable();
- kcb = get_kprobe_ctlblk();
-
- /* Check we're not actually recursing */
- if (kprobe_running()) {
- p = get_kprobe(addr);
- if (p) {
- if (kcb->kprobe_status == KPROBE_HIT_SS &&
- *p->ainsn.insn == BREAKPOINT_INSTRUCTION) {
- regs->eflags &= ~TF_MASK;
- regs->eflags |= kcb->kprobe_saved_eflags;
- goto no_kprobe;
- }
- /* We have reentered the kprobe_handler(), since
- * another probe was hit while within the handler.
-			 * Here we save the original kprobes variables and
- * just single step on the instruction of the new probe
- * without calling any user handlers.
- */
- save_previous_kprobe(kcb);
- set_current_kprobe(p, regs, kcb);
- kprobes_inc_nmissed_count(p);
- prepare_singlestep(p, regs);
- kcb->kprobe_status = KPROBE_REENTER;
- return 1;
- } else {
- if (*addr != BREAKPOINT_INSTRUCTION) {
- /* The breakpoint instruction was removed by
- * another cpu right after we hit, no further
- * handling of this interrupt is appropriate
- */
- regs->eip -= sizeof(kprobe_opcode_t);
- ret = 1;
- goto no_kprobe;
- }
- p = __get_cpu_var(current_kprobe);
- if (p->break_handler && p->break_handler(p, regs)) {
- goto ss_probe;
- }
- }
- goto no_kprobe;
- }
-
- p = get_kprobe(addr);
- if (!p) {
- if (*addr != BREAKPOINT_INSTRUCTION) {
- /*
- * The breakpoint instruction was removed right
- * after we hit it. Another cpu has removed
- * either a probepoint or a debugger breakpoint
- * at this address. In either case, no further
- * handling of this interrupt is appropriate.
- * Back up over the (now missing) int3 and run
- * the original instruction.
- */
- regs->eip -= sizeof(kprobe_opcode_t);
- ret = 1;
- }
- /* Not one of ours: let kernel handle it */
- goto no_kprobe;
- }
-
- set_current_kprobe(p, regs, kcb);
- kcb->kprobe_status = KPROBE_HIT_ACTIVE;
-
- if (p->pre_handler && p->pre_handler(p, regs))
- /* handler has already set things up, so skip ss setup */
- return 1;
-
-ss_probe:
-#if !defined(CONFIG_PREEMPT) || defined(CONFIG_PM)
- if (p->ainsn.boostable == 1 && !p->post_handler){
- /* Boost up -- we can execute copied instructions directly */
- reset_current_kprobe();
- regs->eip = (unsigned long)p->ainsn.insn;
- preempt_enable_no_resched();
- return 1;
- }
-#endif
- prepare_singlestep(p, regs);
- kcb->kprobe_status = KPROBE_HIT_SS;
- return 1;
-
-no_kprobe:
- preempt_enable_no_resched();
- return ret;
-}
-
-/*
- * For function-return probes, init_kprobes() establishes a probepoint
- * here. When a retprobed function returns, this probe is hit and
- * trampoline_probe_handler() runs, calling the kretprobe's handler.
- */
- void __kprobes kretprobe_trampoline_holder(void)
- {
- asm volatile ( ".global kretprobe_trampoline\n"
- "kretprobe_trampoline: \n"
- " pushf\n"
- /* skip cs, eip, orig_eax */
- " subl $12, %esp\n"
- " pushl %fs\n"
- " pushl %ds\n"
- " pushl %es\n"
- " pushl %eax\n"
- " pushl %ebp\n"
- " pushl %edi\n"
- " pushl %esi\n"
- " pushl %edx\n"
- " pushl %ecx\n"
- " pushl %ebx\n"
- " movl %esp, %eax\n"
- " call trampoline_handler\n"
- /* move eflags to cs */
- " movl 52(%esp), %edx\n"
- " movl %edx, 48(%esp)\n"
- /* save true return address on eflags */
- " movl %eax, 52(%esp)\n"
- " popl %ebx\n"
- " popl %ecx\n"
- " popl %edx\n"
- " popl %esi\n"
- " popl %edi\n"
- " popl %ebp\n"
- " popl %eax\n"
- /* skip eip, orig_eax, es, ds, fs */
- " addl $20, %esp\n"
- " popf\n"
- " ret\n");
-}
-
-/*
- * Called from kretprobe_trampoline
- */
-fastcall void *__kprobes trampoline_handler(struct pt_regs *regs)
-{
- struct kretprobe_instance *ri = NULL;
- struct hlist_head *head, empty_rp;
- struct hlist_node *node, *tmp;
- unsigned long flags, orig_ret_address = 0;
- unsigned long trampoline_address =(unsigned long)&kretprobe_trampoline;
-
- INIT_HLIST_HEAD(&empty_rp);
- spin_lock_irqsave(&kretprobe_lock, flags);
- head = kretprobe_inst_table_head(current);
- /* fixup registers */
- regs->xcs = __KERNEL_CS | get_kernel_rpl();
- regs->eip = trampoline_address;
- regs->orig_eax = 0xffffffff;
-
- /*
- * It is possible to have multiple instances associated with a given
-	 * task either because multiple functions in the call path
-	 * have a return probe installed on them, and/or more than one
-	 * return probe was registered for a target function.
- *
- * We can handle this because:
- * - instances are always inserted at the head of the list
- * - when multiple return probes are registered for the same
- * function, the first instance's ret_addr will point to the
- * real return address, and all the rest will point to
- * kretprobe_trampoline
- */
- hlist_for_each_entry_safe(ri, node, tmp, head, hlist) {
- if (ri->task != current)
- /* another task is sharing our hash bucket */
- continue;
-
- if (ri->rp && ri->rp->handler){
- __get_cpu_var(current_kprobe) = &ri->rp->kp;
- get_kprobe_ctlblk()->kprobe_status = KPROBE_HIT_ACTIVE;
- ri->rp->handler(ri, regs);
- __get_cpu_var(current_kprobe) = NULL;
- }
-
- orig_ret_address = (unsigned long)ri->ret_addr;
- recycle_rp_inst(ri, &empty_rp);
-
- if (orig_ret_address != trampoline_address)
- /*
- * This is the real return address. Any other
- * instances associated with this task are for
- * other calls deeper on the call stack
- */
- break;
- }
-
- kretprobe_assert(ri, orig_ret_address, trampoline_address);
- spin_unlock_irqrestore(&kretprobe_lock, flags);
-
- hlist_for_each_entry_safe(ri, node, tmp, &empty_rp, hlist) {
- hlist_del(&ri->hlist);
- kfree(ri);
- }
- return (void*)orig_ret_address;
-}
-
-/*
- * Called after single-stepping. p->addr is the address of the
- * instruction whose first byte has been replaced by the "int 3"
- * instruction. To avoid the SMP problems that can occur when we
- * temporarily put back the original opcode to single-step, we
- * single-stepped a copy of the instruction. The address of this
- * copy is p->ainsn.insn.
- *
- * This function prepares to return from the post-single-step
- * interrupt. We have to fix up the stack as follows:
- *
- * 0) Except in the case of absolute or indirect jump or call instructions,
- * the new eip is relative to the copied instruction. We need to make
- * it relative to the original instruction.
- *
- * 1) If the single-stepped instruction was pushfl, then the TF and IF
- * flags are set in the just-pushed eflags, and may need to be cleared.
- *
- * 2) If the single-stepped instruction was a call, the return address
- * that is atop the stack is the address following the copied instruction.
- * We need to make it the address following the original instruction.
- *
- * This function also checks instruction size for preparing direct execution.
- */
-static void __kprobes resume_execution(struct kprobe *p,
- struct pt_regs *regs, struct kprobe_ctlblk *kcb)
-{
-	unsigned long *tos = (unsigned long *)&regs->esp;
- unsigned long copy_eip = (unsigned long)p->ainsn.insn;
- unsigned long orig_eip = (unsigned long)p->addr;
-
- regs->eflags &= ~TF_MASK;
- switch (p->ainsn.insn[0]) {
- case 0x9c: /* pushfl */
- *tos &= ~(TF_MASK | IF_MASK);
- *tos |= kcb->kprobe_old_eflags;
- break;
- case 0xc2: /* iret/ret/lret */
- case 0xc3:
- case 0xca:
- case 0xcb:
- case 0xcf:
- case 0xea: /* jmp absolute -- eip is correct */
- /* eip is already adjusted, no more changes required */
- p->ainsn.boostable = 1;
- goto no_change;
- case 0xe8: /* call relative - Fix return addr */
- *tos = orig_eip + (*tos - copy_eip);
- break;
- case 0x9a: /* call absolute -- same as call absolute, indirect */
- *tos = orig_eip + (*tos - copy_eip);
- goto no_change;
- case 0xff:
- if ((p->ainsn.insn[1] & 0x30) == 0x10) {
- /*
- * call absolute, indirect
- * Fix return addr; eip is correct.
- * But this is not boostable
- */
- *tos = orig_eip + (*tos - copy_eip);
- goto no_change;
- } else if (((p->ainsn.insn[1] & 0x31) == 0x20) || /* jmp near, absolute indirect */
- ((p->ainsn.insn[1] & 0x31) == 0x21)) { /* jmp far, absolute indirect */
- /* eip is correct. And this is boostable */
- p->ainsn.boostable = 1;
- goto no_change;
- }
- default:
- break;
- }
-
- if (p->ainsn.boostable == 0) {
- if ((regs->eip > copy_eip) &&
- (regs->eip - copy_eip) + 5 < MAX_INSN_SIZE) {
- /*
- * These instructions can be executed directly if it
- * jumps back to correct address.
- */
- set_jmp_op((void *)regs->eip,
- (void *)orig_eip + (regs->eip - copy_eip));
- p->ainsn.boostable = 1;
- } else {
- p->ainsn.boostable = -1;
- }
- }
-
- regs->eip = orig_eip + (regs->eip - copy_eip);
-
-no_change:
- return;
-}
-
-/*
- * Interrupts are disabled on entry as trap1 is an interrupt gate and they
- * remain disabled throughout this function.
- */
-static int __kprobes post_kprobe_handler(struct pt_regs *regs)
-{
- struct kprobe *cur = kprobe_running();
- struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
-
- if (!cur)
- return 0;
-
- if ((kcb->kprobe_status != KPROBE_REENTER) && cur->post_handler) {
- kcb->kprobe_status = KPROBE_HIT_SSDONE;
- cur->post_handler(cur, regs, 0);
- }
-
- resume_execution(cur, regs, kcb);
- regs->eflags |= kcb->kprobe_saved_eflags;
-#ifdef CONFIG_TRACE_IRQFLAGS_SUPPORT
- if (raw_irqs_disabled_flags(regs->eflags))
- trace_hardirqs_off();
- else
- trace_hardirqs_on();
-#endif
-
-	/* Restore the original saved kprobes variables and continue. */
- if (kcb->kprobe_status == KPROBE_REENTER) {
- restore_previous_kprobe(kcb);
- goto out;
- }
- reset_current_kprobe();
-out:
- preempt_enable_no_resched();
-
- /*
- * if somebody else is singlestepping across a probe point, eflags
- * will have TF set, in which case, continue the remaining processing
- * of do_debug, as if this is not a probe hit.
- */
- if (regs->eflags & TF_MASK)
- return 0;
-
- return 1;
-}
-
-int __kprobes kprobe_fault_handler(struct pt_regs *regs, int trapnr)
-{
- struct kprobe *cur = kprobe_running();
- struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
-
- switch(kcb->kprobe_status) {
- case KPROBE_HIT_SS:
- case KPROBE_REENTER:
- /*
- * We are here because the instruction being single
- * stepped caused a page fault. We reset the current
-		 * kprobe, point eip back to the probe address, and
-		 * allow the page fault handler to continue as a
- * normal page fault.
- */
- regs->eip = (unsigned long)cur->addr;
- regs->eflags |= kcb->kprobe_old_eflags;
- if (kcb->kprobe_status == KPROBE_REENTER)
- restore_previous_kprobe(kcb);
- else
- reset_current_kprobe();
- preempt_enable_no_resched();
- break;
- case KPROBE_HIT_ACTIVE:
- case KPROBE_HIT_SSDONE:
- /*
-		 * We increment the nmissed count for accounting;
-		 * we could also use the npre/npostfault counts to
-		 * account for these specific fault cases.
- */
- kprobes_inc_nmissed_count(cur);
-
- /*
- * We come here because instructions in the pre/post
-		 * handler caused the page fault; this could happen
-		 * if the handler tries to access user space via
-		 * copy_from_user(), get_user(), etc. Let the
- * user-specified handler try to fix it first.
- */
- if (cur->fault_handler && cur->fault_handler(cur, regs, trapnr))
- return 1;
-
- /*
- * In case the user-specified fault handler returned
- * zero, try to fix up.
- */
- if (fixup_exception(regs))
- return 1;
-
- /*
- * fixup_exception() could not handle it,
- * Let do_page_fault() fix it.
- */
- break;
- default:
- break;
- }
- return 0;
-}
-
-/*
- * Wrapper routine for handling exceptions.
- */
-int __kprobes kprobe_exceptions_notify(struct notifier_block *self,
- unsigned long val, void *data)
-{
- struct die_args *args = (struct die_args *)data;
- int ret = NOTIFY_DONE;
-
- if (args->regs && user_mode_vm(args->regs))
- return ret;
-
- switch (val) {
- case DIE_INT3:
- if (kprobe_handler(args->regs))
- ret = NOTIFY_STOP;
- break;
- case DIE_DEBUG:
- if (post_kprobe_handler(args->regs))
- ret = NOTIFY_STOP;
- break;
- case DIE_GPF:
- /* kprobe_running() needs smp_processor_id() */
- preempt_disable();
- if (kprobe_running() &&
- kprobe_fault_handler(args->regs, args->trapnr))
- ret = NOTIFY_STOP;
- preempt_enable();
- break;
- default:
- break;
- }
- return ret;
-}
-
-int __kprobes setjmp_pre_handler(struct kprobe *p, struct pt_regs *regs)
-{
- struct jprobe *jp = container_of(p, struct jprobe, kp);
- unsigned long addr;
- struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
-
- kcb->jprobe_saved_regs = *regs;
-	kcb->jprobe_saved_esp = &regs->esp;
- addr = (unsigned long)(kcb->jprobe_saved_esp);
-
- /*
- * TBD: As Linus pointed out, gcc assumes that the callee
- * owns the argument space and could overwrite it, e.g.
- * tailcall optimization. So, to be absolutely safe
- * we also save and restore enough stack bytes to cover
- * the argument area.
- */
- memcpy(kcb->jprobes_stack, (kprobe_opcode_t *)addr,
- MIN_STACK_SIZE(addr));
- regs->eflags &= ~IF_MASK;
- trace_hardirqs_off();
- regs->eip = (unsigned long)(jp->entry);
- return 1;
-}
-
-void __kprobes jprobe_return(void)
-{
- struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
-
- asm volatile (" xchgl %%ebx,%%esp \n"
- " int3 \n"
- " .globl jprobe_return_end \n"
- " jprobe_return_end: \n"
- " nop \n"::"b"
- (kcb->jprobe_saved_esp):"memory");
-}
-
-int __kprobes longjmp_break_handler(struct kprobe *p, struct pt_regs *regs)
-{
- struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
- u8 *addr = (u8 *) (regs->eip - 1);
- unsigned long stack_addr = (unsigned long)(kcb->jprobe_saved_esp);
- struct jprobe *jp = container_of(p, struct jprobe, kp);
-
- if ((addr > (u8 *) jprobe_return) && (addr < (u8 *) jprobe_return_end)) {
-		if (&regs->esp != kcb->jprobe_saved_esp) {
- struct pt_regs *saved_regs =
- container_of(kcb->jprobe_saved_esp,
- struct pt_regs, esp);
- printk("current esp %p does not match saved esp %p\n",
-			       &regs->esp, kcb->jprobe_saved_esp);
- printk("Saved registers for jprobe %p\n", jp);
- show_registers(saved_regs);
- printk("Current registers\n");
- show_registers(regs);
- BUG();
- }
- *regs = kcb->jprobe_saved_regs;
- memcpy((kprobe_opcode_t *) stack_addr, kcb->jprobes_stack,
- MIN_STACK_SIZE(stack_addr));
- preempt_enable_no_resched();
- return 1;
- }
- return 0;
-}
-
-int __kprobes arch_trampoline_kprobe(struct kprobe *p)
-{
- return 0;
-}
-
-int __init arch_init_kprobes(void)
-{
- return 0;
-}
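(Similarly, the setjmp_pre_handler()/longjmp_break_handler() pair being moved
above is what backs the jprobe interface. A hedged sketch of a jprobe client
follows -- "my_target" and its signature are placeholders, not a real kernel
symbol:)

#include <linux/module.h>
#include <linux/kprobes.h>

/* The handler must mirror the probed function's signature exactly,
 * since it executes on the probed function's argument area. */
static long jp_handler(int arg)
{
	printk(KERN_INFO "my_target entered, arg=%d\n", arg);
	jprobe_return();	/* traps back through the int3 above */
	return 0;		/* never reached */
}

static struct jprobe my_jprobe = {
	.entry		= (kprobe_opcode_t *)jp_handler,
	.kp.symbol_name	= "my_target",	/* placeholder symbol */
};

static int __init jp_example_init(void)
{
	return register_jprobe(&my_jprobe);
}

static void __exit jp_example_exit(void)
{
	unregister_jprobe(&my_jprobe);
}

module_init(jp_example_init);
module_exit(jp_example_exit);
MODULE_LICENSE("GPL");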
Index: linux-2.6-lttng.stable/arch/x86/kernel/kprobes_64.c
===================================================================
--- linux-2.6-lttng.stable.orig/arch/x86/kernel/kprobes_64.c 2007-10-29 09:51:07.000000000 -0400
+++ /dev/null 1970-01-01 00:00:00.000000000 +0000
@@ -1,761 +0,0 @@
-/*
- * Kernel Probes (KProbes)
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, write to the Free Software
- * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
- *
- * Copyright (C) IBM Corporation, 2002, 2004
- *
- * 2002-Oct Created by Vamsi Krishna S <vamsi_krishna@in.ibm.com> Kernel
- * Probes initial implementation ( includes contributions from
- * Rusty Russell).
- * 2004-July Suparna Bhattacharya <suparna@in.ibm.com> added jumper probes
- * interface to access function arguments.
- * 2004-Oct Jim Keniston <kenistoj@us.ibm.com> and Prasanna S Panchamukhi
- * <prasanna@in.ibm.com> adapted for x86_64
- * 2005-Mar Roland McGrath <roland@redhat.com>
- * Fixed to handle %rip-relative addressing mode correctly.
- * 2005-May Rusty Lynch <rusty.lynch@intel.com>
- * Added function return probes functionality
- */
-
-#include <linux/kprobes.h>
-#include <linux/ptrace.h>
-#include <linux/string.h>
-#include <linux/slab.h>
-#include <linux/preempt.h>
-#include <linux/module.h>
-#include <linux/kdebug.h>
-
-#include <asm/pgtable.h>
-#include <asm/uaccess.h>
-#include <asm/alternative.h>
-
-void jprobe_return_end(void);
-static void __kprobes arch_copy_kprobe(struct kprobe *p);
-
-DEFINE_PER_CPU(struct kprobe *, current_kprobe) = NULL;
-DEFINE_PER_CPU(struct kprobe_ctlblk, kprobe_ctlblk);
-
-struct kretprobe_blackpoint kretprobe_blacklist[] = {
- {"__switch_to", }, /* This function switches only current task, but
- doesn't switch kernel stack.*/
- {NULL, NULL} /* Terminator */
-};
-const int kretprobe_blacklist_size = ARRAY_SIZE(kretprobe_blacklist);
-
-/*
- * returns non-zero if opcode modifies the interrupt flag.
- */
-static __always_inline int is_IF_modifier(kprobe_opcode_t *insn)
-{
- switch (*insn) {
- case 0xfa: /* cli */
- case 0xfb: /* sti */
- case 0xcf: /* iret/iretd */
- case 0x9d: /* popf/popfd */
- return 1;
- }
-
- if (*insn >= 0x40 && *insn <= 0x4f && *++insn == 0xcf)
- return 1;
- return 0;
-}
-
-int __kprobes arch_prepare_kprobe(struct kprobe *p)
-{
- /* insn: must be on special executable page on x86_64. */
- p->ainsn.insn = get_insn_slot();
- if (!p->ainsn.insn) {
- return -ENOMEM;
- }
- arch_copy_kprobe(p);
- return 0;
-}
-
-/*
- * Determine if the instruction uses the %rip-relative addressing mode.
- * If it does, return the address of the 32-bit displacement word.
- * If not, return null.
- */
-static s32 __kprobes *is_riprel(u8 *insn)
-{
-#define W(row,b0,b1,b2,b3,b4,b5,b6,b7,b8,b9,ba,bb,bc,bd,be,bf) \
- (((b0##UL << 0x0)|(b1##UL << 0x1)|(b2##UL << 0x2)|(b3##UL << 0x3) | \
- (b4##UL << 0x4)|(b5##UL << 0x5)|(b6##UL << 0x6)|(b7##UL << 0x7) | \
- (b8##UL << 0x8)|(b9##UL << 0x9)|(ba##UL << 0xa)|(bb##UL << 0xb) | \
- (bc##UL << 0xc)|(bd##UL << 0xd)|(be##UL << 0xe)|(bf##UL << 0xf)) \
- << (row % 64))
- static const u64 onebyte_has_modrm[256 / 64] = {
- /* 0 1 2 3 4 5 6 7 8 9 a b c d e f */
- /* ------------------------------- */
- W(0x00, 1,1,1,1,0,0,0,0,1,1,1,1,0,0,0,0)| /* 00 */
- W(0x10, 1,1,1,1,0,0,0,0,1,1,1,1,0,0,0,0)| /* 10 */
- W(0x20, 1,1,1,1,0,0,0,0,1,1,1,1,0,0,0,0)| /* 20 */
- W(0x30, 1,1,1,1,0,0,0,0,1,1,1,1,0,0,0,0), /* 30 */
- W(0x40, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)| /* 40 */
- W(0x50, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)| /* 50 */
- W(0x60, 0,0,1,1,0,0,0,0,0,1,0,1,0,0,0,0)| /* 60 */
- W(0x70, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0), /* 70 */
- W(0x80, 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1)| /* 80 */
- W(0x90, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)| /* 90 */
- W(0xa0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)| /* a0 */
- W(0xb0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0), /* b0 */
- W(0xc0, 1,1,0,0,1,1,1,1,0,0,0,0,0,0,0,0)| /* c0 */
- W(0xd0, 1,1,1,1,0,0,0,0,1,1,1,1,1,1,1,1)| /* d0 */
- W(0xe0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)| /* e0 */
- W(0xf0, 0,0,0,0,0,0,1,1,0,0,0,0,0,0,1,1) /* f0 */
- /* ------------------------------- */
- /* 0 1 2 3 4 5 6 7 8 9 a b c d e f */
- };
- static const u64 twobyte_has_modrm[256 / 64] = {
- /* 0 1 2 3 4 5 6 7 8 9 a b c d e f */
- /* ------------------------------- */
- W(0x00, 1,1,1,1,0,0,0,0,0,0,0,0,0,1,0,1)| /* 0f */
- W(0x10, 1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0)| /* 1f */
- W(0x20, 1,1,1,1,1,0,1,0,1,1,1,1,1,1,1,1)| /* 2f */
- W(0x30, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0), /* 3f */
- W(0x40, 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1)| /* 4f */
- W(0x50, 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1)| /* 5f */
- W(0x60, 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1)| /* 6f */
- W(0x70, 1,1,1,1,1,1,1,0,0,0,0,0,1,1,1,1), /* 7f */
- W(0x80, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)| /* 8f */
- W(0x90, 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1)| /* 9f */
- W(0xa0, 0,0,0,1,1,1,1,1,0,0,0,1,1,1,1,1)| /* af */
- W(0xb0, 1,1,1,1,1,1,1,1,0,0,1,1,1,1,1,1), /* bf */
- W(0xc0, 1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0)| /* cf */
- W(0xd0, 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1)| /* df */
- W(0xe0, 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1)| /* ef */
- W(0xf0, 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0) /* ff */
- /* ------------------------------- */
- /* 0 1 2 3 4 5 6 7 8 9 a b c d e f */
- };
-#undef W
- int need_modrm;
-
- /* Skip legacy instruction prefixes. */
- while (1) {
- switch (*insn) {
- case 0x66:
- case 0x67:
- case 0x2e:
- case 0x3e:
- case 0x26:
- case 0x64:
- case 0x65:
- case 0x36:
- case 0xf0:
- case 0xf3:
- case 0xf2:
- ++insn;
- continue;
- }
- break;
- }
-
- /* Skip REX instruction prefix. */
- if ((*insn & 0xf0) == 0x40)
- ++insn;
-
- if (*insn == 0x0f) { /* Two-byte opcode. */
- ++insn;
- need_modrm = test_bit(*insn, twobyte_has_modrm);
- } else { /* One-byte opcode. */
- need_modrm = test_bit(*insn, onebyte_has_modrm);
- }
-
- if (need_modrm) {
- u8 modrm = *++insn;
- if ((modrm & 0xc7) == 0x05) { /* %rip+disp32 addressing mode */
- /* Displacement follows ModRM byte. */
- return (s32 *) ++insn;
- }
- }
-
- /* No %rip-relative addressing mode here. */
- return NULL;
-}
-
-static void __kprobes arch_copy_kprobe(struct kprobe *p)
-{
- s32 *ripdisp;
- memcpy(p->ainsn.insn, p->addr, MAX_INSN_SIZE);
- ripdisp = is_riprel(p->ainsn.insn);
- if (ripdisp) {
- /*
- * The copied instruction uses the %rip-relative
- * addressing mode. Adjust the displacement for the
- * difference between the original location of this
- * instruction and the location of the copy that will
- * actually be run. The tricky bit here is making sure
- * that the sign extension happens correctly in this
- * calculation, since we need a signed 32-bit result to
- * be sign-extended to 64 bits when it's added to the
- * %rip value and yield the same 64-bit result that the
- * sign-extension of the original signed 32-bit
- * displacement would have given.
- */
- s64 disp = (u8 *) p->addr + *ripdisp - (u8 *) p->ainsn.insn;
- BUG_ON((s64) (s32) disp != disp); /* Sanity check. */
- *ripdisp = disp;
- }
- p->opcode = *p->addr;
-}
-
-void __kprobes arch_arm_kprobe(struct kprobe *p)
-{
- text_poke(p->addr, ((unsigned char []){BREAKPOINT_INSTRUCTION}), 1);
-}
-
-void __kprobes arch_disarm_kprobe(struct kprobe *p)
-{
- text_poke(p->addr, &p->opcode, 1);
-}
-
-void __kprobes arch_remove_kprobe(struct kprobe *p)
-{
- mutex_lock(&kprobe_mutex);
- free_insn_slot(p->ainsn.insn, 0);
- mutex_unlock(&kprobe_mutex);
-}
-
-static void __kprobes save_previous_kprobe(struct kprobe_ctlblk *kcb)
-{
- kcb->prev_kprobe.kp = kprobe_running();
- kcb->prev_kprobe.status = kcb->kprobe_status;
- kcb->prev_kprobe.old_rflags = kcb->kprobe_old_rflags;
- kcb->prev_kprobe.saved_rflags = kcb->kprobe_saved_rflags;
-}
-
-static void __kprobes restore_previous_kprobe(struct kprobe_ctlblk *kcb)
-{
- __get_cpu_var(current_kprobe) = kcb->prev_kprobe.kp;
- kcb->kprobe_status = kcb->prev_kprobe.status;
- kcb->kprobe_old_rflags = kcb->prev_kprobe.old_rflags;
- kcb->kprobe_saved_rflags = kcb->prev_kprobe.saved_rflags;
-}
-
-static void __kprobes set_current_kprobe(struct kprobe *p, struct pt_regs *regs,
- struct kprobe_ctlblk *kcb)
-{
- __get_cpu_var(current_kprobe) = p;
- kcb->kprobe_saved_rflags = kcb->kprobe_old_rflags
- = (regs->eflags & (TF_MASK | IF_MASK));
- if (is_IF_modifier(p->ainsn.insn))
- kcb->kprobe_saved_rflags &= ~IF_MASK;
-}
-
-static void __kprobes prepare_singlestep(struct kprobe *p, struct pt_regs *regs)
-{
- regs->eflags |= TF_MASK;
- regs->eflags &= ~IF_MASK;
- /*single step inline if the instruction is an int3*/
- if (p->opcode == BREAKPOINT_INSTRUCTION)
- regs->rip = (unsigned long)p->addr;
- else
- regs->rip = (unsigned long)p->ainsn.insn;
-}
-
-/* Called with kretprobe_lock held */
-void __kprobes arch_prepare_kretprobe(struct kretprobe_instance *ri,
- struct pt_regs *regs)
-{
- unsigned long *sara = (unsigned long *)regs->rsp;
-
- ri->ret_addr = (kprobe_opcode_t *) *sara;
- /* Replace the return addr with trampoline addr */
- *sara = (unsigned long) &kretprobe_trampoline;
-}
-
-int __kprobes kprobe_handler(struct pt_regs *regs)
-{
- struct kprobe *p;
- int ret = 0;
- kprobe_opcode_t *addr = (kprobe_opcode_t *)(regs->rip - sizeof(kprobe_opcode_t));
- struct kprobe_ctlblk *kcb;
-
- /*
- * We don't want to be preempted for the entire
- * duration of kprobe processing
- */
- preempt_disable();
- kcb = get_kprobe_ctlblk();
-
- /* Check we're not actually recursing */
- if (kprobe_running()) {
- p = get_kprobe(addr);
- if (p) {
- if (kcb->kprobe_status == KPROBE_HIT_SS &&
- *p->ainsn.insn == BREAKPOINT_INSTRUCTION) {
- regs->eflags &= ~TF_MASK;
- regs->eflags |= kcb->kprobe_saved_rflags;
- goto no_kprobe;
- } else if (kcb->kprobe_status == KPROBE_HIT_SSDONE) {
- /* TODO: Provide re-entrancy from
- * post_kprobes_handler() and avoid exception
- * stack corruption while single-stepping on
- * the instruction of the new probe.
- */
- arch_disarm_kprobe(p);
- regs->rip = (unsigned long)p->addr;
- reset_current_kprobe();
- ret = 1;
- } else {
- /* We have reentered the kprobe_handler(), since
- * another probe was hit while within the
-				 * handler. Here we save the original kprobe
-				 * variables and just single step on the instruction
- * of the new probe without calling any user
- * handlers.
- */
- save_previous_kprobe(kcb);
- set_current_kprobe(p, regs, kcb);
- kprobes_inc_nmissed_count(p);
- prepare_singlestep(p, regs);
- kcb->kprobe_status = KPROBE_REENTER;
- return 1;
- }
- } else {
- if (*addr != BREAKPOINT_INSTRUCTION) {
- /* The breakpoint instruction was removed by
- * another cpu right after we hit, no further
- * handling of this interrupt is appropriate
- */
- regs->rip = (unsigned long)addr;
- ret = 1;
- goto no_kprobe;
- }
- p = __get_cpu_var(current_kprobe);
- if (p->break_handler && p->break_handler(p, regs)) {
- goto ss_probe;
- }
- }
- goto no_kprobe;
- }
-
- p = get_kprobe(addr);
- if (!p) {
- if (*addr != BREAKPOINT_INSTRUCTION) {
- /*
- * The breakpoint instruction was removed right
- * after we hit it. Another cpu has removed
- * either a probepoint or a debugger breakpoint
- * at this address. In either case, no further
- * handling of this interrupt is appropriate.
- * Back up over the (now missing) int3 and run
- * the original instruction.
- */
- regs->rip = (unsigned long)addr;
- ret = 1;
- }
- /* Not one of ours: let kernel handle it */
- goto no_kprobe;
- }
-
- set_current_kprobe(p, regs, kcb);
- kcb->kprobe_status = KPROBE_HIT_ACTIVE;
-
- if (p->pre_handler && p->pre_handler(p, regs))
- /* handler has already set things up, so skip ss setup */
- return 1;
-
-ss_probe:
- prepare_singlestep(p, regs);
- kcb->kprobe_status = KPROBE_HIT_SS;
- return 1;
-
-no_kprobe:
- preempt_enable_no_resched();
- return ret;
-}
-
-/*
- * For function-return probes, init_kprobes() establishes a probepoint
- * here. When a retprobed function returns, this probe is hit and
- * trampoline_probe_handler() runs, calling the kretprobe's handler.
- */
- void kretprobe_trampoline_holder(void)
- {
- asm volatile ( ".global kretprobe_trampoline\n"
- "kretprobe_trampoline: \n"
- "nop\n");
- }
-
-/*
- * Called when we hit the probe point at kretprobe_trampoline
- */
-int __kprobes trampoline_probe_handler(struct kprobe *p, struct pt_regs *regs)
-{
- struct kretprobe_instance *ri = NULL;
- struct hlist_head *head, empty_rp;
- struct hlist_node *node, *tmp;
- unsigned long flags, orig_ret_address = 0;
- unsigned long trampoline_address =(unsigned long)&kretprobe_trampoline;
-
- INIT_HLIST_HEAD(&empty_rp);
- spin_lock_irqsave(&kretprobe_lock, flags);
- head = kretprobe_inst_table_head(current);
-
- /*
- * It is possible to have multiple instances associated with a given
-	 * task either because multiple functions in the call path
-	 * have a return probe installed on them, and/or more than one
-	 * return probe was registered for a target function.
- *
- * We can handle this because:
- * - instances are always inserted at the head of the list
- * - when multiple return probes are registered for the same
- * function, the first instance's ret_addr will point to the
- * real return address, and all the rest will point to
- * kretprobe_trampoline
- */
- hlist_for_each_entry_safe(ri, node, tmp, head, hlist) {
- if (ri->task != current)
- /* another task is sharing our hash bucket */
- continue;
-
- if (ri->rp && ri->rp->handler)
- ri->rp->handler(ri, regs);
-
- orig_ret_address = (unsigned long)ri->ret_addr;
- recycle_rp_inst(ri, &empty_rp);
-
- if (orig_ret_address != trampoline_address)
- /*
- * This is the real return address. Any other
- * instances associated with this task are for
- * other calls deeper on the call stack
- */
- break;
- }
-
- kretprobe_assert(ri, orig_ret_address, trampoline_address);
- regs->rip = orig_ret_address;
-
- reset_current_kprobe();
- spin_unlock_irqrestore(&kretprobe_lock, flags);
- preempt_enable_no_resched();
-
- hlist_for_each_entry_safe(ri, node, tmp, &empty_rp, hlist) {
- hlist_del(&ri->hlist);
- kfree(ri);
- }
- /*
- * By returning a non-zero value, we are telling
- * kprobe_handler() that we don't want the post_handler
- * to run (and have re-enabled preemption)
- */
- return 1;
-}
-
-/*
- * Called after single-stepping. p->addr is the address of the
- * instruction whose first byte has been replaced by the "int 3"
- * instruction. To avoid the SMP problems that can occur when we
- * temporarily put back the original opcode to single-step, we
- * single-stepped a copy of the instruction. The address of this
- * copy is p->ainsn.insn.
- *
- * This function prepares to return from the post-single-step
- * interrupt. We have to fix up the stack as follows:
- *
- * 0) Except in the case of absolute or indirect jump or call instructions,
- * the new rip is relative to the copied instruction. We need to make
- * it relative to the original instruction.
- *
- * 1) If the single-stepped instruction was pushfl, then the TF and IF
- * flags are set in the just-pushed eflags, and may need to be cleared.
- *
- * 2) If the single-stepped instruction was a call, the return address
- * that is atop the stack is the address following the copied instruction.
- * We need to make it the address following the original instruction.
- */
-static void __kprobes resume_execution(struct kprobe *p,
- struct pt_regs *regs, struct kprobe_ctlblk *kcb)
-{
- unsigned long *tos = (unsigned long *)regs->rsp;
- unsigned long next_rip = 0;
- unsigned long copy_rip = (unsigned long)p->ainsn.insn;
- unsigned long orig_rip = (unsigned long)p->addr;
- kprobe_opcode_t *insn = p->ainsn.insn;
-
- /*skip the REX prefix*/
- if (*insn >= 0x40 && *insn <= 0x4f)
- insn++;
-
- switch (*insn) {
- case 0x9c: /* pushfl */
- *tos &= ~(TF_MASK | IF_MASK);
- *tos |= kcb->kprobe_old_rflags;
- break;
- case 0xc3: /* ret/lret */
- case 0xcb:
- case 0xc2:
- case 0xca:
- regs->eflags &= ~TF_MASK;
- /* rip is already adjusted, no more changes required*/
- return;
- case 0xe8: /* call relative - Fix return addr */
- *tos = orig_rip + (*tos - copy_rip);
- break;
- case 0xff:
- if ((insn[1] & 0x30) == 0x10) {
- /* call absolute, indirect */
- /* Fix return addr; rip is correct. */
- next_rip = regs->rip;
- *tos = orig_rip + (*tos - copy_rip);
- } else if (((insn[1] & 0x31) == 0x20) || /* jmp near, absolute indirect */
- ((insn[1] & 0x31) == 0x21)) { /* jmp far, absolute indirect */
- /* rip is correct. */
- next_rip = regs->rip;
- }
- break;
- case 0xea: /* jmp absolute -- rip is correct */
- next_rip = regs->rip;
- break;
- default:
- break;
- }
-
- regs->eflags &= ~TF_MASK;
- if (next_rip) {
- regs->rip = next_rip;
- } else {
- regs->rip = orig_rip + (regs->rip - copy_rip);
- }
-}
-
-int __kprobes post_kprobe_handler(struct pt_regs *regs)
-{
- struct kprobe *cur = kprobe_running();
- struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
-
- if (!cur)
- return 0;
-
- if ((kcb->kprobe_status != KPROBE_REENTER) && cur->post_handler) {
- kcb->kprobe_status = KPROBE_HIT_SSDONE;
- cur->post_handler(cur, regs, 0);
- }
-
- resume_execution(cur, regs, kcb);
- regs->eflags |= kcb->kprobe_saved_rflags;
-#ifdef CONFIG_TRACE_IRQFLAGS_SUPPORT
- if (raw_irqs_disabled_flags(regs->eflags))
- trace_hardirqs_off();
- else
- trace_hardirqs_on();
-#endif
-
- /* Restore the original saved kprobes variables and continue. */
- if (kcb->kprobe_status == KPROBE_REENTER) {
- restore_previous_kprobe(kcb);
- goto out;
- }
- reset_current_kprobe();
-out:
- preempt_enable_no_resched();
-
- /*
- * if somebody else is singlestepping across a probe point, eflags
- * will have TF set, in which case, continue the remaining processing
- * of do_debug, as if this is not a probe hit.
- */
- if (regs->eflags & TF_MASK)
- return 0;
-
- return 1;
-}
-
-int __kprobes kprobe_fault_handler(struct pt_regs *regs, int trapnr)
-{
- struct kprobe *cur = kprobe_running();
- struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
- const struct exception_table_entry *fixup;
-
- switch(kcb->kprobe_status) {
- case KPROBE_HIT_SS:
- case KPROBE_REENTER:
- /*
- * We are here because the instruction being single
- * stepped caused a page fault. We reset the current
-		 * kprobe, point rip back to the probe address, and
-		 * allow the page fault handler to continue as a
- * normal page fault.
- */
- regs->rip = (unsigned long)cur->addr;
- regs->eflags |= kcb->kprobe_old_rflags;
- if (kcb->kprobe_status == KPROBE_REENTER)
- restore_previous_kprobe(kcb);
- else
- reset_current_kprobe();
- preempt_enable_no_resched();
- break;
- case KPROBE_HIT_ACTIVE:
- case KPROBE_HIT_SSDONE:
- /*
-		 * We increment the nmissed count for accounting;
-		 * we could also use the npre/npostfault counts to
-		 * account for these specific fault cases.
- */
- kprobes_inc_nmissed_count(cur);
-
- /*
- * We come here because instructions in the pre/post
-		 * handler caused the page fault; this could happen
-		 * if the handler tries to access user space via
-		 * copy_from_user(), get_user(), etc. Let the
- * user-specified handler try to fix it first.
- */
- if (cur->fault_handler && cur->fault_handler(cur, regs, trapnr))
- return 1;
-
- /*
- * In case the user-specified fault handler returned
- * zero, try to fix up.
- */
- fixup = search_exception_tables(regs->rip);
- if (fixup) {
- regs->rip = fixup->fixup;
- return 1;
- }
-
- /*
- * fixup() could not handle it,
- * Let do_page_fault() fix it.
- */
- break;
- default:
- break;
- }
- return 0;
-}
-
-/*
- * Wrapper routine for handling exceptions.
- */
-int __kprobes kprobe_exceptions_notify(struct notifier_block *self,
- unsigned long val, void *data)
-{
- struct die_args *args = (struct die_args *)data;
- int ret = NOTIFY_DONE;
-
- if (args->regs && user_mode(args->regs))
- return ret;
-
- switch (val) {
- case DIE_INT3:
- if (kprobe_handler(args->regs))
- ret = NOTIFY_STOP;
- break;
- case DIE_DEBUG:
- if (post_kprobe_handler(args->regs))
- ret = NOTIFY_STOP;
- break;
- case DIE_GPF:
- /* kprobe_running() needs smp_processor_id() */
- preempt_disable();
- if (kprobe_running() &&
- kprobe_fault_handler(args->regs, args->trapnr))
- ret = NOTIFY_STOP;
- preempt_enable();
- break;
- default:
- break;
- }
- return ret;
-}
-
-int __kprobes setjmp_pre_handler(struct kprobe *p, struct pt_regs *regs)
-{
- struct jprobe *jp = container_of(p, struct jprobe, kp);
- unsigned long addr;
- struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
-
- kcb->jprobe_saved_regs = *regs;
- kcb->jprobe_saved_rsp = (long *) regs->rsp;
- addr = (unsigned long)(kcb->jprobe_saved_rsp);
- /*
- * As Linus pointed out, gcc assumes that the callee
- * owns the argument space and could overwrite it, e.g.
- * tailcall optimization. So, to be absolutely safe
- * we also save and restore enough stack bytes to cover
- * the argument area.
- */
- memcpy(kcb->jprobes_stack, (kprobe_opcode_t *)addr,
- MIN_STACK_SIZE(addr));
- regs->eflags &= ~IF_MASK;
- trace_hardirqs_off();
- regs->rip = (unsigned long)(jp->entry);
- return 1;
-}
-
-void __kprobes jprobe_return(void)
-{
- struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
-
- asm volatile (" xchg %%rbx,%%rsp \n"
- " int3 \n"
- " .globl jprobe_return_end \n"
- " jprobe_return_end: \n"
- " nop \n"::"b"
- (kcb->jprobe_saved_rsp):"memory");
-}
-
-int __kprobes longjmp_break_handler(struct kprobe *p, struct pt_regs *regs)
-{
- struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
- u8 *addr = (u8 *) (regs->rip - 1);
- unsigned long stack_addr = (unsigned long)(kcb->jprobe_saved_rsp);
- struct jprobe *jp = container_of(p, struct jprobe, kp);
-
- if ((addr > (u8 *) jprobe_return) && (addr < (u8 *) jprobe_return_end)) {
- if ((long *)regs->rsp != kcb->jprobe_saved_rsp) {
- struct pt_regs *saved_regs =
- container_of(kcb->jprobe_saved_rsp,
- struct pt_regs, rsp);
- printk("current rsp %p does not match saved rsp %p\n",
- (long *)regs->rsp, kcb->jprobe_saved_rsp);
- printk("Saved registers for jprobe %p\n", jp);
- show_registers(saved_regs);
- printk("Current registers\n");
- show_registers(regs);
- BUG();
- }
- *regs = kcb->jprobe_saved_regs;
- memcpy((kprobe_opcode_t *) stack_addr, kcb->jprobes_stack,
- MIN_STACK_SIZE(stack_addr));
- preempt_enable_no_resched();
- return 1;
- }
- return 0;
-}
-
-static struct kprobe trampoline_p = {
- .addr = (kprobe_opcode_t *) &kretprobe_trampoline,
- .pre_handler = trampoline_probe_handler
-};
-
-int __init arch_init_kprobes(void)
-{
- return register_kprobe(&trampoline_p);
-}
-
-int __kprobes arch_trampoline_kprobe(struct kprobe *p)
-{
- if (p->addr == (kprobe_opcode_t *)&kretprobe_trampoline)
- return 1;
-
- return 0;
-}
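(For completeness: the plain kprobe entry points both deleted files implement
-- kprobe_handler() and post_kprobe_handler() -- are exercised like this. The
target symbol is illustrative; moving the files changes nothing in the
client-side API:)

#include <linux/module.h>
#include <linux/kprobes.h>

static int pre_handler(struct kprobe *p, struct pt_regs *regs)
{
	/* runs from kprobe_handler(), before the copied insn is stepped */
	return 0;
}

static void post_handler(struct kprobe *p, struct pt_regs *regs,
			 unsigned long flags)
{
	/* runs from post_kprobe_handler() once the single-step completes */
}

static struct kprobe kp = {
	.symbol_name	= "do_fork",	/* illustrative target */
	.pre_handler	= pre_handler,
	.post_handler	= post_handler,
};

static int __init kp_example_init(void)
{
	return register_kprobe(&kp);
}

static void __exit kp_example_exit(void)
{
	unregister_kprobe(&kp);
}

module_init(kp_example_init);
module_exit(kp_example_exit);
MODULE_LICENSE("GPL");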
Index: linux-2.6-lttng.stable/arch/x86/instrumentation/Makefile
===================================================================
--- linux-2.6-lttng.stable.orig/arch/x86/instrumentation/Makefile 2007-10-29 09:51:29.000000000 -0400
+++ linux-2.6-lttng.stable/arch/x86/instrumentation/Makefile 2007-10-29 09:51:38.000000000 -0400
@@ -10,3 +10,9 @@ oprofile-y := $(DRIVER_OBJS) init.o b
oprofile-$(CONFIG_X86_LOCAL_APIC) += nmi_int.o op_model_athlon.o \
op_model_ppro.o op_model_p4.o
oprofile-$(CONFIG_X86_IO_APIC) += nmi_timer_int.o
+
+ifeq ($(CONFIG_X86_32),y)
+obj-$(CONFIG_KPROBES) += kprobes_32.o
+else
+obj-$(CONFIG_KPROBES) += kprobes_64.o
+endif
Index: linux-2.6-lttng.stable/arch/x86/kernel/Makefile_32
===================================================================
--- linux-2.6-lttng.stable.orig/arch/x86/kernel/Makefile_32 2007-10-29 09:51:07.000000000 -0400
+++ linux-2.6-lttng.stable/arch/x86/kernel/Makefile_32 2007-10-29 09:51:38.000000000 -0400
@@ -30,7 +30,6 @@ obj-$(CONFIG_KEXEC) += machine_kexec_32
obj-$(CONFIG_CRASH_DUMP) += crash_dump_32.o
obj-$(CONFIG_X86_NUMAQ) += numaq_32.o
obj-$(CONFIG_X86_SUMMIT_NUMA) += summit_32.o
-obj-$(CONFIG_KPROBES) += kprobes_32.o
obj-$(CONFIG_MODULES) += module_32.o
obj-y += sysenter_32.o vsyscall_32.o
obj-$(CONFIG_ACPI_SRAT) += srat_32.o
Index: linux-2.6-lttng.stable/arch/x86/kernel/Makefile_64
===================================================================
--- linux-2.6-lttng.stable.orig/arch/x86/kernel/Makefile_64 2007-10-29 09:51:07.000000000 -0400
+++ linux-2.6-lttng.stable/arch/x86/kernel/Makefile_64 2007-10-29 09:51:38.000000000 -0400
@@ -28,7 +28,6 @@ obj-$(CONFIG_EARLY_PRINTK) += early_prin
obj-$(CONFIG_IOMMU) += pci-gart_64.o aperture_64.o
obj-$(CONFIG_CALGARY_IOMMU) += pci-calgary_64.o tce_64.o
obj-$(CONFIG_SWIOTLB) += pci-swiotlb_64.o
-obj-$(CONFIG_KPROBES) += kprobes_64.o
obj-$(CONFIG_X86_PM_TIMER) += pmtimer_64.o
obj-$(CONFIG_X86_VSMP) += vsmp_64.o
obj-$(CONFIG_K8_NB) += k8.o
--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
^ permalink raw reply [flat|nested] 13+ messages in thread
* [patch 04/10] Move Linux Kernel Markers to instrumentation directory
2007-10-29 14:26 [patch 00/10] [PATCH] Create instrumentation/ kernel directory Mathieu Desnoyers
2007-10-29 14:26 ` [patch 01/10] Create instrumentation directory Mathieu Desnoyers
2007-10-29 14:26 ` [patch 03/10] Move kprobes to instrumentation/ and arch/*/instrumentation/ Mathieu Desnoyers
@ 2007-10-29 14:26 ` Mathieu Desnoyers
2007-10-29 14:26 ` [patch 05/10] Move profiling to instrumentation Mathieu Desnoyers
` (6 subsequent siblings)
9 siblings, 0 replies; 13+ messages in thread
From: Mathieu Desnoyers @ 2007-10-29 14:26 UTC (permalink / raw)
To: linux-kernel; +Cc: Mathieu Desnoyers
[-- Attachment #1: move-markers-to-instrumentation.patch --]
[-- Type: text/plain, Size: 30622 bytes --]
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
---
instrumentation/Makefile | 1
instrumentation/marker.c | 525 +++++++++++++++++++++++++++++++++++++++++++++++
kernel/Makefile | 1
kernel/marker.c | 525 -----------------------------------------------
4 files changed, 526 insertions(+), 526 deletions(-)
Index: linux-2.6-lttng.stable/instrumentation/Makefile
===================================================================
--- linux-2.6-lttng.stable.orig/instrumentation/Makefile 2007-10-29 09:51:46.000000000 -0400
+++ linux-2.6-lttng.stable/instrumentation/Makefile 2007-10-29 09:52:09.000000000 -0400
@@ -3,3 +3,4 @@
#
obj-$(CONFIG_KPROBES) += kprobes.o
+obj-$(CONFIG_MARKERS) += marker.o
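(Review aid: the code being moved below is consumed roughly as follows. The
marker name, format string and module boilerplate are made up for
illustration; the probe signature mirrors __mark_empty_function() in
marker.c:)

#include <linux/module.h>
#include <linux/marker.h>

/* An instrumented kernel site drops a marker like this: */
static void instrumented_function(int value)
{
	trace_mark(subsys_event, "value %d", value);
}

/* A consumer attaches a probe matching the site's format string. */
static void probe_subsys_event(const struct marker *mdata, void *private,
			       const char *fmt, ...)
{
	/* variadic arguments follow fmt ("value %d") */
}

static int __init probe_example_init(void)
{
	return marker_probe_register("subsys_event", "value %d",
				     probe_subsys_event, NULL);
}

static void __exit probe_example_exit(void)
{
	marker_probe_unregister("subsys_event");
}

module_init(probe_example_init);
module_exit(probe_example_exit);
MODULE_LICENSE("GPL");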
Index: linux-2.6-lttng.stable/instrumentation/marker.c
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng.stable/instrumentation/marker.c 2007-10-29 09:51:51.000000000 -0400
@@ -0,0 +1,525 @@
+/*
+ * Copyright (C) 2007 Mathieu Desnoyers
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ */
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/types.h>
+#include <linux/jhash.h>
+#include <linux/list.h>
+#include <linux/rcupdate.h>
+#include <linux/marker.h>
+#include <linux/err.h>
+
+extern struct marker __start___markers[];
+extern struct marker __stop___markers[];
+
+/*
+ * module_mutex nests inside markers_mutex. Markers mutex protects the builtin
+ * and module markers, the hash table and deferred_sync.
+ */
+static DEFINE_MUTEX(markers_mutex);
+
+/*
+ * Marker deferred synchronization.
+ * Upon marker probe_unregister, we delay call to synchronize_sched() to
+ * accelerate mass unregistration (only when there is no more reference to a
+ * given module do we call synchronize_sched()). However, we need to make sure
+ * every critical region has ended before we re-arm a marker that has been
+ * unregistered and then registered back with a different probe data.
+ */
+static int deferred_sync;
+
+/*
+ * Marker hash table, containing the active markers.
+ * Protected by module_mutex.
+ */
+#define MARKER_HASH_BITS 6
+#define MARKER_TABLE_SIZE (1 << MARKER_HASH_BITS)
+
+struct marker_entry {
+ struct hlist_node hlist;
+ char *format;
+ marker_probe_func *probe;
+ void *private;
+ int refcount; /* Number of times armed. 0 if disarmed. */
+ char name[0]; /* Contains name'\0'format'\0' */
+};
+
+static struct hlist_head marker_table[MARKER_TABLE_SIZE];
+
+/**
+ * __mark_empty_function - Empty probe callback
+ * @mdata: pointer of type const struct marker
+ * @fmt: format string
+ * @...: variable argument list
+ *
+ * Empty callback provided as a probe to the markers. By providing this to a
+ * disabled marker, we make sure the execution flow is always valid even
+ * though the function pointer change and the marker enabling are two distinct
+ * operations that modify the execution flow of preemptible code.
+ */
+void __mark_empty_function(const struct marker *mdata, void *private,
+ const char *fmt, ...)
+{
+}
+EXPORT_SYMBOL_GPL(__mark_empty_function);
+
+/*
+ * Get marker if the marker is present in the marker hash table.
+ * Must be called with markers_mutex held.
+ * Returns NULL if not present.
+ */
+static struct marker_entry *get_marker(const char *name)
+{
+ struct hlist_head *head;
+ struct hlist_node *node;
+ struct marker_entry *e;
+ u32 hash = jhash(name, strlen(name), 0);
+
+ head = &marker_table[hash & ((1 << MARKER_HASH_BITS)-1)];
+ hlist_for_each_entry(e, node, head, hlist) {
+ if (!strcmp(name, e->name))
+ return e;
+ }
+ return NULL;
+}
+
+/*
+ * Add the marker to the marker hash table. Must be called with markers_mutex
+ * held.
+ */
+static int add_marker(const char *name, const char *format,
+ marker_probe_func *probe, void *private)
+{
+ struct hlist_head *head;
+ struct hlist_node *node;
+ struct marker_entry *e;
+ size_t name_len = strlen(name) + 1;
+ size_t format_len = 0;
+ u32 hash = jhash(name, name_len-1, 0);
+
+ if (format)
+ format_len = strlen(format) + 1;
+ head = &marker_table[hash & ((1 << MARKER_HASH_BITS)-1)];
+ hlist_for_each_entry(e, node, head, hlist) {
+ if (!strcmp(name, e->name)) {
+ printk(KERN_NOTICE
+ "Marker %s busy, probe %p already installed\n",
+ name, e->probe);
+ return -EBUSY; /* Already there */
+ }
+ }
+ /*
+ * Using kmalloc here to allocate a variable length element. Could
+ * cause some memory fragmentation if overused.
+ */
+ e = kmalloc(sizeof(struct marker_entry) + name_len + format_len,
+ GFP_KERNEL);
+ if (!e)
+ return -ENOMEM;
+ memcpy(&e->name[0], name, name_len);
+ if (format) {
+ e->format = &e->name[name_len];
+ memcpy(e->format, format, format_len);
+ trace_mark(core_marker_format, "name %s format %s",
+ e->name, e->format);
+ } else
+ e->format = NULL;
+ e->probe = probe;
+ e->private = private;
+ e->refcount = 0;
+ hlist_add_head(&e->hlist, head);
+ return 0;
+}
+
+/*
+ * Remove the marker from the marker hash table. Must be called with
+ * markers_mutex held.
+ */
+static void *remove_marker(const char *name)
+{
+ struct hlist_head *head;
+ struct hlist_node *node;
+ struct marker_entry *e;
+ int found = 0;
+ size_t len = strlen(name) + 1;
+ void *private = NULL;
+ u32 hash = jhash(name, len-1, 0);
+
+ head = &marker_table[hash & ((1 << MARKER_HASH_BITS)-1)];
+ hlist_for_each_entry(e, node, head, hlist) {
+ if (!strcmp(name, e->name)) {
+ found = 1;
+ break;
+ }
+ }
+ if (found) {
+ private = e->private;
+ hlist_del(&e->hlist);
+ kfree(e);
+ }
+ return private;
+}
+
+/*
+ * Set the mark_entry format to the format found in the element.
+ */
+static int marker_set_format(struct marker_entry **entry, const char *format)
+{
+ struct marker_entry *e;
+ size_t name_len = strlen((*entry)->name) + 1;
+ size_t format_len = strlen(format) + 1;
+
+ e = kmalloc(sizeof(struct marker_entry) + name_len + format_len,
+ GFP_KERNEL);
+ if (!e)
+ return -ENOMEM;
+ memcpy(&e->name[0], (*entry)->name, name_len);
+ e->format = &e->name[name_len];
+ memcpy(e->format, format, format_len);
+ e->probe = (*entry)->probe;
+ e->private = (*entry)->private;
+ e->refcount = (*entry)->refcount;
+ hlist_add_before(&e->hlist, &(*entry)->hlist);
+ hlist_del(&(*entry)->hlist);
+ kfree(*entry);
+ *entry = e;
+ trace_mark(core_marker_format, "name %s format %s",
+ e->name, e->format);
+ return 0;
+}
+
+/*
+ * Sets the probe callback corresponding to one marker.
+ */
+static int set_marker(struct marker_entry **entry, struct marker *elem)
+{
+ int ret;
+ WARN_ON(strcmp((*entry)->name, elem->name) != 0);
+
+ if ((*entry)->format) {
+ if (strcmp((*entry)->format, elem->format) != 0) {
+ printk(KERN_NOTICE
+ "Format mismatch for probe %s "
+ "(%s), marker (%s)\n",
+ (*entry)->name,
+ (*entry)->format,
+ elem->format);
+ return -EPERM;
+ }
+ } else {
+ ret = marker_set_format(entry, elem->format);
+ if (ret)
+ return ret;
+ }
+ elem->call = (*entry)->probe;
+ elem->private = (*entry)->private;
+ elem->state = 1;
+ return 0;
+}
+
+/*
+ * Disable a marker and its probe callback.
+ * Note: only a synchronize_sched() issued after setting elem->call to the
+ * empty function ensures that the original callback is not used anymore.
+ * This works because preemption is disabled around the call site.
+ */
+static void disable_marker(struct marker *elem)
+{
+ elem->state = 0;
+ elem->call = __mark_empty_function;
+ /*
+ * Leave the private data and id there, because removal is racy and
+ * should be done only after a synchronize_sched(). These are never used
+ * until the next initialization anyway.
+ */
+}
+
+/**
+ * marker_update_probe_range - Update a probe range
+ * @begin: beginning of the range
+ * @end: end of the range
+ * @probe_module: module address of the probe being updated
+ * @refcount: number of references left to the given probe_module (out)
+ *
+ * Updates the probe callback corresponding to a range of markers.
+ * Must be called with markers_mutex held.
+ */
+void marker_update_probe_range(struct marker *begin,
+ struct marker *end, struct module *probe_module,
+ int *refcount)
+{
+ struct marker *iter;
+ struct marker_entry *mark_entry;
+
+ for (iter = begin; iter < end; iter++) {
+ mark_entry = get_marker(iter->name);
+ if (mark_entry && mark_entry->refcount) {
+ set_marker(&mark_entry, iter);
+ /*
+ * ignore error, continue
+ */
+ if (probe_module)
+ if (probe_module ==
+ __module_text_address((unsigned long)mark_entry->probe))
+ (*refcount)++;
+ } else {
+ disable_marker(iter);
+ }
+ }
+}
+
+/*
+ * Update probes, removing the faulty probes.
+ * Issues a synchronize_sched() when no reference to the module passed
+ * as parameter is found in the probes so the probe module can be
+ * safely unloaded from now on.
+ */
+static void marker_update_probes(struct module *probe_module)
+{
+ int refcount = 0;
+
+ mutex_lock(&markers_mutex);
+ /* Core kernel markers */
+ marker_update_probe_range(__start___markers,
+ __stop___markers, probe_module, &refcount);
+ /* Markers in modules. */
+ module_update_markers(probe_module, &refcount);
+ if (probe_module && refcount == 0) {
+ synchronize_sched();
+ deferred_sync = 0;
+ }
+ mutex_unlock(&markers_mutex);
+}
+
+/**
+ * marker_probe_register - Connect a probe to a marker
+ * @name: marker name
+ * @format: format string
+ * @probe: probe handler
+ * @private: probe private data
+ *
+ * private data must be a valid allocated memory address, or NULL.
+ * Returns 0 if ok, error value on error.
+ */
+int marker_probe_register(const char *name, const char *format,
+ marker_probe_func *probe, void *private)
+{
+ struct marker_entry *entry;
+ int ret = 0, need_update = 0;
+
+ mutex_lock(&markers_mutex);
+ entry = get_marker(name);
+ if (entry && entry->refcount) {
+ ret = -EBUSY;
+ goto end;
+ }
+ if (deferred_sync) {
+ synchronize_sched();
+ deferred_sync = 0;
+ }
+ ret = add_marker(name, format, probe, private);
+ if (ret)
+ goto end;
+ need_update = 1;
+end:
+ mutex_unlock(&markers_mutex);
+ if (need_update)
+ marker_update_probes(NULL);
+ return ret;
+}
+EXPORT_SYMBOL_GPL(marker_probe_register);
+
+/**
+ * marker_probe_unregister - Disconnect a probe from a marker
+ * @name: marker name
+ *
+ * Returns the private data given to marker_probe_register, or an ERR_PTR().
+ */
+void *marker_probe_unregister(const char *name)
+{
+ struct module *probe_module;
+ struct marker_entry *entry;
+ void *private;
+ int need_update = 0;
+
+ mutex_lock(&markers_mutex);
+ entry = get_marker(name);
+ if (!entry) {
+ private = ERR_PTR(-ENOENT);
+ goto end;
+ }
+ entry->refcount = 0;
+ /* In which module is the probe handler? */
+ probe_module = __module_text_address((unsigned long)entry->probe);
+ private = remove_marker(name);
+ deferred_sync = 1;
+ need_update = 1;
+end:
+ mutex_unlock(&markers_mutex);
+ if (need_update)
+ marker_update_probes(probe_module);
+ return private;
+}
+EXPORT_SYMBOL_GPL(marker_probe_unregister);
+
+/**
+ * marker_probe_unregister_private_data - Disconnect a probe from a marker
+ * @private: probe private data
+ *
+ * Unregister a marker by providing the registered private data.
+ * Returns the private data given to marker_probe_register, or an ERR_PTR().
+ */
+void *marker_probe_unregister_private_data(void *private)
+{
+ struct module *probe_module;
+ struct hlist_head *head;
+ struct hlist_node *node;
+ struct marker_entry *entry;
+ int found = 0;
+ unsigned int i;
+ int need_update = 0;
+
+ mutex_lock(&markers_mutex);
+ for (i = 0; i < MARKER_TABLE_SIZE; i++) {
+ head = &marker_table[i];
+ hlist_for_each_entry(entry, node, head, hlist) {
+ if (entry->private == private) {
+ found = 1;
+ goto iter_end;
+ }
+ }
+ }
+iter_end:
+ if (!found) {
+ private = ERR_PTR(-ENOENT);
+ goto end;
+ }
+ entry->refcount = 0;
+ /* In which module is the probe handler? */
+ probe_module = __module_text_address((unsigned long)entry->probe);
+ private = remove_marker(entry->name);
+ deferred_sync = 1;
+ need_update = 1;
+end:
+ mutex_unlock(&markers_mutex);
+ if (need_update)
+ marker_update_probes(probe_module);
+ return private;
+}
+EXPORT_SYMBOL_GPL(marker_probe_unregister_private_data);
+
+/**
+ * marker_arm - Arm a marker
+ * @name: marker name
+ *
+ * Activate a marker. It keeps a reference count of the arm/disarm
+ * operations performed.
+ * Returns 0 if ok, error value on error.
+ */
+int marker_arm(const char *name)
+{
+ struct marker_entry *entry;
+ int ret = 0, need_update = 0;
+
+ mutex_lock(&markers_mutex);
+ entry = get_marker(name);
+ if (!entry) {
+ ret = -ENOENT;
+ goto end;
+ }
+ /*
+ * Only need to update probes when refcount passes from 0 to 1.
+ */
+ if (entry->refcount++)
+ goto end;
+ need_update = 1;
+end:
+ mutex_unlock(&markers_mutex);
+ if (need_update)
+ marker_update_probes(NULL);
+ return ret;
+}
+EXPORT_SYMBOL_GPL(marker_arm);
+
+/**
+ * marker_disarm - Disarm a marker
+ * @name: marker name
+ *
+ * Disarm a marker. It keeps a reference count of the arm/disarm operations
+ * performed.
+ * Returns 0 if ok, error value on error.
+ */
+int marker_disarm(const char *name)
+{
+ struct marker_entry *entry;
+ int ret = 0, need_update = 0;
+
+ mutex_lock(&markers_mutex);
+ entry = get_marker(name);
+ if (!entry) {
+ ret = -ENOENT;
+ goto end;
+ }
+ /*
+ * Only permit decrementing the refcount if it is higher than 0.
+ * Do probe update only on 1 -> 0 transition.
+ */
+ if (entry->refcount) {
+ if (--entry->refcount)
+ goto end;
+ } else {
+ ret = -EPERM;
+ goto end;
+ }
+ need_update = 1;
+end:
+ mutex_unlock(&markers_mutex);
+ if (need_update)
+ marker_update_probes(NULL);
+ return ret;
+}
+EXPORT_SYMBOL_GPL(marker_disarm);
+
+/**
+ * marker_get_private_data - Get a marker's probe private data
+ * @name: marker name
+ *
+ * Returns the private data pointer, or an ERR_PTR.
+ * The private data pointer should _only_ be dereferenced if the caller is the
+ * owner of the data, or its content could vanish. This is mostly used to
+ * confirm that a caller is the owner of a registered probe.
+ */
+void *marker_get_private_data(const char *name)
+{
+ struct hlist_head *head;
+ struct hlist_node *node;
+ struct marker_entry *e;
+ size_t name_len = strlen(name) + 1;
+ u32 hash = jhash(name, name_len-1, 0);
+ int found = 0;
+
+ head = &marker_table[hash & ((1 << MARKER_HASH_BITS)-1)];
+ hlist_for_each_entry(e, node, head, hlist) {
+ if (!strcmp(name, e->name)) {
+ found = 1;
+ return e->private;
+ }
+ }
+ return ERR_PTR(-ENOENT);
+}
+EXPORT_SYMBOL_GPL(marker_get_private_data);
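
The API exported above is consumed by probe modules. A minimal sketch of
one, assuming the interface exactly as moved here (the marker name
"subsys_event", its format string and the module boilerplate are
illustrative, not taken from the tree):

	#include <linux/module.h>
	#include <linux/marker.h>

	static void my_probe(const struct marker *mdata, void *private,
			     const char *fmt, ...)
	{
		/* Runs with preemption disabled at the marker site. */
	}

	static int __init my_probe_init(void)
	{
		int ret;

		ret = marker_probe_register("subsys_event", "value %d",
					    my_probe, NULL);
		if (ret)
			return ret;
		/* Sites stay disabled until the 0 -> 1 arming transition. */
		return marker_arm("subsys_event");
	}

	static void __exit my_probe_exit(void)
	{
		marker_disarm("subsys_event");
		/*
		 * Defers synchronize_sched() until no reference to this
		 * module remains in the probes.
		 */
		marker_probe_unregister("subsys_event");
	}

	module_init(my_probe_init);
	module_exit(my_probe_exit);
	MODULE_LICENSE("GPL");
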
Index: linux-2.6-lttng.stable/kernel/Makefile
===================================================================
--- linux-2.6-lttng.stable.orig/kernel/Makefile 2007-10-29 09:51:38.000000000 -0400
+++ linux-2.6-lttng.stable/kernel/Makefile 2007-10-29 09:51:51.000000000 -0400
@@ -56,7 +56,6 @@ obj-$(CONFIG_RELAY) += relay.o
obj-$(CONFIG_SYSCTL) += utsname_sysctl.o
obj-$(CONFIG_TASK_DELAY_ACCT) += delayacct.o
obj-$(CONFIG_TASKSTATS) += taskstats.o tsacct.o
-obj-$(CONFIG_MARKERS) += marker.o
ifneq ($(CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER),y)
# According to Alan Modra <alan@linuxcare.com.au>, the -fno-omit-frame-pointer is
Index: linux-2.6-lttng.stable/kernel/marker.c
===================================================================
--- linux-2.6-lttng.stable.orig/kernel/marker.c 2007-10-29 09:51:06.000000000 -0400
+++ /dev/null 1970-01-01 00:00:00.000000000 +0000
@@ -1,525 +0,0 @@
-/*
- * Copyright (C) 2007 Mathieu Desnoyers
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, write to the Free Software
- * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
- */
-#include <linux/module.h>
-#include <linux/mutex.h>
-#include <linux/types.h>
-#include <linux/jhash.h>
-#include <linux/list.h>
-#include <linux/rcupdate.h>
-#include <linux/marker.h>
-#include <linux/err.h>
-
-extern struct marker __start___markers[];
-extern struct marker __stop___markers[];
-
-/*
- * module_mutex nests inside markers_mutex. Markers mutex protects the builtin
- * and module markers, the hash table and deferred_sync.
- */
-static DEFINE_MUTEX(markers_mutex);
-
-/*
- * Marker deferred synchronization.
- * Upon marker probe_unregister, we delay call to synchronize_sched() to
- * accelerate mass unregistration (only when there is no more reference to a
- * given module do we call synchronize_sched()). However, we need to make sure
- * every critical region has ended before we re-arm a marker that has been
- * unregistered and then registered back with a different probe data.
- */
-static int deferred_sync;
-
-/*
- * Marker hash table, containing the active markers.
- * Protected by module_mutex.
- */
-#define MARKER_HASH_BITS 6
-#define MARKER_TABLE_SIZE (1 << MARKER_HASH_BITS)
-
-struct marker_entry {
- struct hlist_node hlist;
- char *format;
- marker_probe_func *probe;
- void *private;
- int refcount; /* Number of times armed. 0 if disarmed. */
- char name[0]; /* Contains name'\0'format'\0' */
-};
-
-static struct hlist_head marker_table[MARKER_TABLE_SIZE];
-
-/**
- * __mark_empty_function - Empty probe callback
- * @mdata: pointer of type const struct marker
- * @fmt: format string
- * @...: variable argument list
- *
- * Empty callback provided as a probe to the markers. By providing this to a
- * disabled marker, we make sure the execution flow is always valid even
- * though the function pointer change and the marker enabling are two distinct
- * operations that modify the execution flow of preemptible code.
- */
-void __mark_empty_function(const struct marker *mdata, void *private,
- const char *fmt, ...)
-{
-}
-EXPORT_SYMBOL_GPL(__mark_empty_function);
-
-/*
- * Get marker if the marker is present in the marker hash table.
- * Must be called with markers_mutex held.
- * Returns NULL if not present.
- */
-static struct marker_entry *get_marker(const char *name)
-{
- struct hlist_head *head;
- struct hlist_node *node;
- struct marker_entry *e;
- u32 hash = jhash(name, strlen(name), 0);
-
- head = &marker_table[hash & ((1 << MARKER_HASH_BITS)-1)];
- hlist_for_each_entry(e, node, head, hlist) {
- if (!strcmp(name, e->name))
- return e;
- }
- return NULL;
-}
-
-/*
- * Add the marker to the marker hash table. Must be called with markers_mutex
- * held.
- */
-static int add_marker(const char *name, const char *format,
- marker_probe_func *probe, void *private)
-{
- struct hlist_head *head;
- struct hlist_node *node;
- struct marker_entry *e;
- size_t name_len = strlen(name) + 1;
- size_t format_len = 0;
- u32 hash = jhash(name, name_len-1, 0);
-
- if (format)
- format_len = strlen(format) + 1;
- head = &marker_table[hash & ((1 << MARKER_HASH_BITS)-1)];
- hlist_for_each_entry(e, node, head, hlist) {
- if (!strcmp(name, e->name)) {
- printk(KERN_NOTICE
- "Marker %s busy, probe %p already installed\n",
- name, e->probe);
- return -EBUSY; /* Already there */
- }
- }
- /*
- * Using kmalloc here to allocate a variable length element. Could
- * cause some memory fragmentation if overused.
- */
- e = kmalloc(sizeof(struct marker_entry) + name_len + format_len,
- GFP_KERNEL);
- if (!e)
- return -ENOMEM;
- memcpy(&e->name[0], name, name_len);
- if (format) {
- e->format = &e->name[name_len];
- memcpy(e->format, format, format_len);
- trace_mark(core_marker_format, "name %s format %s",
- e->name, e->format);
- } else
- e->format = NULL;
- e->probe = probe;
- e->private = private;
- e->refcount = 0;
- hlist_add_head(&e->hlist, head);
- return 0;
-}
-
-/*
- * Remove the marker from the marker hash table. Must be called with
- * markers_mutex held.
- */
-static void *remove_marker(const char *name)
-{
- struct hlist_head *head;
- struct hlist_node *node;
- struct marker_entry *e;
- int found = 0;
- size_t len = strlen(name) + 1;
- void *private = NULL;
- u32 hash = jhash(name, len-1, 0);
-
- head = &marker_table[hash & ((1 << MARKER_HASH_BITS)-1)];
- hlist_for_each_entry(e, node, head, hlist) {
- if (!strcmp(name, e->name)) {
- found = 1;
- break;
- }
- }
- if (found) {
- private = e->private;
- hlist_del(&e->hlist);
- kfree(e);
- }
- return private;
-}
-
-/*
- * Set the mark_entry format to the format found in the element.
- */
-static int marker_set_format(struct marker_entry **entry, const char *format)
-{
- struct marker_entry *e;
- size_t name_len = strlen((*entry)->name) + 1;
- size_t format_len = strlen(format) + 1;
-
- e = kmalloc(sizeof(struct marker_entry) + name_len + format_len,
- GFP_KERNEL);
- if (!e)
- return -ENOMEM;
- memcpy(&e->name[0], (*entry)->name, name_len);
- e->format = &e->name[name_len];
- memcpy(e->format, format, format_len);
- e->probe = (*entry)->probe;
- e->private = (*entry)->private;
- e->refcount = (*entry)->refcount;
- hlist_add_before(&e->hlist, &(*entry)->hlist);
- hlist_del(&(*entry)->hlist);
- kfree(*entry);
- *entry = e;
- trace_mark(core_marker_format, "name %s format %s",
- e->name, e->format);
- return 0;
-}
-
-/*
- * Sets the probe callback corresponding to one marker.
- */
-static int set_marker(struct marker_entry **entry, struct marker *elem)
-{
- int ret;
- WARN_ON(strcmp((*entry)->name, elem->name) != 0);
-
- if ((*entry)->format) {
- if (strcmp((*entry)->format, elem->format) != 0) {
- printk(KERN_NOTICE
- "Format mismatch for probe %s "
- "(%s), marker (%s)\n",
- (*entry)->name,
- (*entry)->format,
- elem->format);
- return -EPERM;
- }
- } else {
- ret = marker_set_format(entry, elem->format);
- if (ret)
- return ret;
- }
- elem->call = (*entry)->probe;
- elem->private = (*entry)->private;
- elem->state = 1;
- return 0;
-}
-
-/*
- * Disable a marker and its probe callback.
- * Note: only a synchronize_sched() issued after setting elem->call to the
- * empty function ensures that the original callback is not used anymore.
- * This works because preemption is disabled around the call site.
- */
-static void disable_marker(struct marker *elem)
-{
- elem->state = 0;
- elem->call = __mark_empty_function;
- /*
- * Leave the private data and id there, because removal is racy and
- * should be done only after a synchronize_sched(). These are never used
- * until the next initialization anyway.
- */
-}
-
-/**
- * marker_update_probe_range - Update a probe range
- * @begin: beginning of the range
- * @end: end of the range
- * @probe_module: module address of the probe being updated
- * @refcount: number of references left to the given probe_module (out)
- *
- * Updates the probe callback corresponding to a range of markers.
- * Must be called with markers_mutex held.
- */
-void marker_update_probe_range(struct marker *begin,
- struct marker *end, struct module *probe_module,
- int *refcount)
-{
- struct marker *iter;
- struct marker_entry *mark_entry;
-
- for (iter = begin; iter < end; iter++) {
- mark_entry = get_marker(iter->name);
- if (mark_entry && mark_entry->refcount) {
- set_marker(&mark_entry, iter);
- /*
- * ignore error, continue
- */
- if (probe_module)
- if (probe_module ==
- __module_text_address((unsigned long)mark_entry->probe))
- (*refcount)++;
- } else {
- disable_marker(iter);
- }
- }
-}
-
-/*
- * Update probes, removing the faulty probes.
- * Issues a synchronize_sched() when no reference to the module passed
- * as parameter is found in the probes so the probe module can be
- * safely unloaded from now on.
- */
-static void marker_update_probes(struct module *probe_module)
-{
- int refcount = 0;
-
- mutex_lock(&markers_mutex);
- /* Core kernel markers */
- marker_update_probe_range(__start___markers,
- __stop___markers, probe_module, &refcount);
- /* Markers in modules. */
- module_update_markers(probe_module, &refcount);
- if (probe_module && refcount == 0) {
- synchronize_sched();
- deferred_sync = 0;
- }
- mutex_unlock(&markers_mutex);
-}
-
-/**
- * marker_probe_register - Connect a probe to a marker
- * @name: marker name
- * @format: format string
- * @probe: probe handler
- * @private: probe private data
- *
- * private data must be a valid allocated memory address, or NULL.
- * Returns 0 if ok, error value on error.
- */
-int marker_probe_register(const char *name, const char *format,
- marker_probe_func *probe, void *private)
-{
- struct marker_entry *entry;
- int ret = 0, need_update = 0;
-
- mutex_lock(&markers_mutex);
- entry = get_marker(name);
- if (entry && entry->refcount) {
- ret = -EBUSY;
- goto end;
- }
- if (deferred_sync) {
- synchronize_sched();
- deferred_sync = 0;
- }
- ret = add_marker(name, format, probe, private);
- if (ret)
- goto end;
- need_update = 1;
-end:
- mutex_unlock(&markers_mutex);
- if (need_update)
- marker_update_probes(NULL);
- return ret;
-}
-EXPORT_SYMBOL_GPL(marker_probe_register);
-
-/**
- * marker_probe_unregister - Disconnect a probe from a marker
- * @name: marker name
- *
- * Returns the private data given to marker_probe_register, or an ERR_PTR().
- */
-void *marker_probe_unregister(const char *name)
-{
- struct module *probe_module;
- struct marker_entry *entry;
- void *private;
- int need_update = 0;
-
- mutex_lock(&markers_mutex);
- entry = get_marker(name);
- if (!entry) {
- private = ERR_PTR(-ENOENT);
- goto end;
- }
- entry->refcount = 0;
- /* In which module is the probe handler? */
- probe_module = __module_text_address((unsigned long)entry->probe);
- private = remove_marker(name);
- deferred_sync = 1;
- need_update = 1;
-end:
- mutex_unlock(&markers_mutex);
- if (need_update)
- marker_update_probes(probe_module);
- return private;
-}
-EXPORT_SYMBOL_GPL(marker_probe_unregister);
-
-/**
- * marker_probe_unregister_private_data - Disconnect a probe from a marker
- * @private: probe private data
- *
- * Unregister a marker by providing the registered private data.
- * Returns the private data given to marker_probe_register, or an ERR_PTR().
- */
-void *marker_probe_unregister_private_data(void *private)
-{
- struct module *probe_module;
- struct hlist_head *head;
- struct hlist_node *node;
- struct marker_entry *entry;
- int found = 0;
- unsigned int i;
- int need_update = 0;
-
- mutex_lock(&markers_mutex);
- for (i = 0; i < MARKER_TABLE_SIZE; i++) {
- head = &marker_table[i];
- hlist_for_each_entry(entry, node, head, hlist) {
- if (entry->private == private) {
- found = 1;
- goto iter_end;
- }
- }
- }
-iter_end:
- if (!found) {
- private = ERR_PTR(-ENOENT);
- goto end;
- }
- entry->refcount = 0;
- /* In which module is the probe handler? */
- probe_module = __module_text_address((unsigned long)entry->probe);
- private = remove_marker(entry->name);
- deferred_sync = 1;
- need_update = 1;
-end:
- mutex_unlock(&markers_mutex);
- if (need_update)
- marker_update_probes(probe_module);
- return private;
-}
-EXPORT_SYMBOL_GPL(marker_probe_unregister_private_data);
-
-/**
- * marker_arm - Arm a marker
- * @name: marker name
- *
- * Activate a marker. It keeps a reference count of the arm/disarm
- * operations performed.
- * Returns 0 if ok, error value on error.
- */
-int marker_arm(const char *name)
-{
- struct marker_entry *entry;
- int ret = 0, need_update = 0;
-
- mutex_lock(&markers_mutex);
- entry = get_marker(name);
- if (!entry) {
- ret = -ENOENT;
- goto end;
- }
- /*
- * Only need to update probes when refcount passes from 0 to 1.
- */
- if (entry->refcount++)
- goto end;
- need_update = 1;
-end:
- mutex_unlock(&markers_mutex);
- if (need_update)
- marker_update_probes(NULL);
- return ret;
-}
-EXPORT_SYMBOL_GPL(marker_arm);
-
-/**
- * marker_disarm - Disarm a marker
- * @name: marker name
- *
- * Disarm a marker. It keeps a reference count of the arm/disarm operations
- * performed.
- * Returns 0 if ok, error value on error.
- */
-int marker_disarm(const char *name)
-{
- struct marker_entry *entry;
- int ret = 0, need_update = 0;
-
- mutex_lock(&markers_mutex);
- entry = get_marker(name);
- if (!entry) {
- ret = -ENOENT;
- goto end;
- }
- /*
- * Only permit decrementing the refcount if it is higher than 0.
- * Do probe update only on 1 -> 0 transition.
- */
- if (entry->refcount) {
- if (--entry->refcount)
- goto end;
- } else {
- ret = -EPERM;
- goto end;
- }
- need_update = 1;
-end:
- mutex_unlock(&markers_mutex);
- if (need_update)
- marker_update_probes(NULL);
- return ret;
-}
-EXPORT_SYMBOL_GPL(marker_disarm);
-
-/**
- * marker_get_private_data - Get a marker's probe private data
- * @name: marker name
- *
- * Returns the private data pointer, or an ERR_PTR.
- * The private data pointer should _only_ be dereferenced if the caller is the
- * owner of the data, or its content could vanish. This is mostly used to
- * confirm that a caller is the owner of a registered probe.
- */
-void *marker_get_private_data(const char *name)
-{
- struct hlist_head *head;
- struct hlist_node *node;
- struct marker_entry *e;
- size_t name_len = strlen(name) + 1;
- u32 hash = jhash(name, name_len-1, 0);
- int found = 0;
-
- head = &marker_table[hash & ((1 << MARKER_HASH_BITS)-1)];
- hlist_for_each_entry(e, node, head, hlist) {
- if (!strcmp(name, e->name)) {
- found = 1;
- return e->private;
- }
- }
- return ERR_PTR(-ENOENT);
-}
-EXPORT_SYMBOL_GPL(marker_get_private_data);
--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
^ permalink raw reply [flat|nested] 13+ messages in thread
* [patch 05/10] Move profiling to instrumentation
2007-10-29 14:26 [patch 00/10] [PATCH] Create instrumentation/ kernel directory Mathieu Desnoyers
` (2 preceding siblings ...)
2007-10-29 14:26 ` [patch 04/10] Move Linux Kernel Markers to instrumentation directory Mathieu Desnoyers
@ 2007-10-29 14:26 ` Mathieu Desnoyers
2007-10-29 14:59 ` Arjan van de Ven
2007-10-29 14:26 ` [patch 06/10] Move lockdep " Mathieu Desnoyers
` (5 subsequent siblings)
9 siblings, 1 reply; 13+ messages in thread
From: Mathieu Desnoyers @ 2007-10-29 14:26 UTC (permalink / raw)
To: linux-kernel
Cc: Mathieu Desnoyers, Ingo Molnar, William L. Irwin,
Arjan van de Ven
[-- Attachment #1: move-profile-to-instrumentation.patch --]
[-- Type: text/plain, Size: 35975 bytes --]
Move kernel/profile.c to instrumentation/.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Ingo Molnar <mingo@redhat.com>
CC: William L. Irwin <wli@holomorphy.com>
CC: Arjan van de Ven <arjan@infradead.org>
---
instrumentation/Makefile | 2
instrumentation/profile.c | 597 ++++++++++++++++++++++++++++++++++++++++++++++
kernel/Makefile | 2
kernel/profile.c | 597 ----------------------------------------------
4 files changed, 600 insertions(+), 598 deletions(-)
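
The profiling code is likewise moved verbatim. For context on the
timer_hook indirection it carries: oprofile is the in-tree client, and a
hook is installed roughly as sketched below (illustrative only, not
oprofile's actual registration code):

	#include <linux/module.h>
	#include <linux/profile.h>
	#include <asm/ptrace.h>

	static int my_tick(struct pt_regs *regs)
	{
		/* Called from profile_tick() on CPU_PROFILING ticks. */
		return 0;
	}

	static int __init my_hook_init(void)
	{
		/* Only one client at a time: -EBUSY if already taken. */
		return register_timer_hook(my_tick);
	}

	static void __exit my_hook_exit(void)
	{
		/* Waits for in-flight ticks via synchronize_sched(). */
		unregister_timer_hook(my_tick);
	}

	module_init(my_hook_init);
	module_exit(my_hook_exit);
	MODULE_LICENSE("GPL");
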
Index: linux-2.6-lttng.stable/instrumentation/Makefile
===================================================================
--- linux-2.6-lttng.stable.orig/instrumentation/Makefile 2007-10-29 09:52:09.000000000 -0400
+++ linux-2.6-lttng.stable/instrumentation/Makefile 2007-10-29 09:52:30.000000000 -0400
@@ -2,5 +2,7 @@
# Makefile for the linux kernel instrumentation
#
+obj-y += profile.o
+
obj-$(CONFIG_KPROBES) += kprobes.o
obj-$(CONFIG_MARKERS) += marker.o
Index: linux-2.6-lttng.stable/instrumentation/profile.c
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng.stable/instrumentation/profile.c 2007-10-29 09:52:13.000000000 -0400
@@ -0,0 +1,597 @@
+/*
+ * linux/kernel/profile.c
+ * Simple profiling. Manages a direct-mapped profile hit count buffer,
+ * with configurable resolution, support for restricting the cpus on
+ * which profiling is done, and switching between cpu time and
+ * schedule() calls via kernel command line parameters passed at boot.
+ *
+ * Scheduler profiling support, Arjan van de Ven and Ingo Molnar,
+ * Red Hat, July 2004
+ * Consolidation of architecture support code for profiling,
+ * William Irwin, Oracle, July 2004
+ * Amortized hit count accounting via per-cpu open-addressed hashtables
+ * to resolve timer interrupt livelocks, William Irwin, Oracle, 2004
+ */
+
+#include <linux/module.h>
+#include <linux/profile.h>
+#include <linux/bootmem.h>
+#include <linux/notifier.h>
+#include <linux/mm.h>
+#include <linux/cpumask.h>
+#include <linux/cpu.h>
+#include <linux/profile.h>
+#include <linux/highmem.h>
+#include <linux/mutex.h>
+#include <asm/sections.h>
+#include <asm/semaphore.h>
+#include <asm/irq_regs.h>
+#include <asm/ptrace.h>
+
+struct profile_hit {
+ u32 pc, hits;
+};
+#define PROFILE_GRPSHIFT 3
+#define PROFILE_GRPSZ (1 << PROFILE_GRPSHIFT)
+#define NR_PROFILE_HIT (PAGE_SIZE/sizeof(struct profile_hit))
+#define NR_PROFILE_GRP (NR_PROFILE_HIT/PROFILE_GRPSZ)
+
+/* Oprofile timer tick hook */
+static int (*timer_hook)(struct pt_regs *) __read_mostly;
+
+static atomic_t *prof_buffer;
+static unsigned long prof_len, prof_shift;
+
+int prof_on __read_mostly;
+EXPORT_SYMBOL_GPL(prof_on);
+
+static cpumask_t prof_cpu_mask = CPU_MASK_ALL;
+#ifdef CONFIG_SMP
+static DEFINE_PER_CPU(struct profile_hit *[2], cpu_profile_hits);
+static DEFINE_PER_CPU(int, cpu_profile_flip);
+static DEFINE_MUTEX(profile_flip_mutex);
+#endif /* CONFIG_SMP */
+
+static int __init profile_setup(char * str)
+{
+ static char __initdata schedstr[] = "schedule";
+ static char __initdata sleepstr[] = "sleep";
+ static char __initdata kvmstr[] = "kvm";
+ int par;
+
+ if (!strncmp(str, sleepstr, strlen(sleepstr))) {
+ prof_on = SLEEP_PROFILING;
+ if (str[strlen(sleepstr)] == ',')
+ str += strlen(sleepstr) + 1;
+ if (get_option(&str, &par))
+ prof_shift = par;
+ printk(KERN_INFO
+ "kernel sleep profiling enabled (shift: %ld)\n",
+ prof_shift);
+ } else if (!strncmp(str, schedstr, strlen(schedstr))) {
+ prof_on = SCHED_PROFILING;
+ if (str[strlen(schedstr)] == ',')
+ str += strlen(schedstr) + 1;
+ if (get_option(&str, &par))
+ prof_shift = par;
+ printk(KERN_INFO
+ "kernel schedule profiling enabled (shift: %ld)\n",
+ prof_shift);
+ } else if (!strncmp(str, kvmstr, strlen(kvmstr))) {
+ prof_on = KVM_PROFILING;
+ if (str[strlen(kvmstr)] == ',')
+ str += strlen(kvmstr) + 1;
+ if (get_option(&str, &par))
+ prof_shift = par;
+ printk(KERN_INFO
+ "kernel KVM profiling enabled (shift: %ld)\n",
+ prof_shift);
+ } else if (get_option(&str, &par)) {
+ prof_shift = par;
+ prof_on = CPU_PROFILING;
+ printk(KERN_INFO "kernel profiling enabled (shift: %ld)\n",
+ prof_shift);
+ }
+ return 1;
+}
+__setup("profile=", profile_setup);
+
+
+void __init profile_init(void)
+{
+ if (!prof_on)
+ return;
+
+ /* only text is profiled */
+ prof_len = (_etext - _stext) >> prof_shift;
+ prof_buffer = alloc_bootmem(prof_len*sizeof(atomic_t));
+}
+
+/* Profile event notifications */
+
+#ifdef CONFIG_PROFILING
+
+static BLOCKING_NOTIFIER_HEAD(task_exit_notifier);
+static ATOMIC_NOTIFIER_HEAD(task_free_notifier);
+static BLOCKING_NOTIFIER_HEAD(munmap_notifier);
+
+void profile_task_exit(struct task_struct * task)
+{
+ blocking_notifier_call_chain(&task_exit_notifier, 0, task);
+}
+
+int profile_handoff_task(struct task_struct * task)
+{
+ int ret;
+ ret = atomic_notifier_call_chain(&task_free_notifier, 0, task);
+ return (ret == NOTIFY_OK) ? 1 : 0;
+}
+
+void profile_munmap(unsigned long addr)
+{
+ blocking_notifier_call_chain(&munmap_notifier, 0, (void *)addr);
+}
+
+int task_handoff_register(struct notifier_block * n)
+{
+ return atomic_notifier_chain_register(&task_free_notifier, n);
+}
+
+int task_handoff_unregister(struct notifier_block * n)
+{
+ return atomic_notifier_chain_unregister(&task_free_notifier, n);
+}
+
+int profile_event_register(enum profile_type type, struct notifier_block * n)
+{
+ int err = -EINVAL;
+
+ switch (type) {
+ case PROFILE_TASK_EXIT:
+ err = blocking_notifier_chain_register(
+ &task_exit_notifier, n);
+ break;
+ case PROFILE_MUNMAP:
+ err = blocking_notifier_chain_register(
+ &munmap_notifier, n);
+ break;
+ }
+
+ return err;
+}
+
+
+int profile_event_unregister(enum profile_type type, struct notifier_block * n)
+{
+ int err = -EINVAL;
+
+ switch (type) {
+ case PROFILE_TASK_EXIT:
+ err = blocking_notifier_chain_unregister(
+ &task_exit_notifier, n);
+ break;
+ case PROFILE_MUNMAP:
+ err = blocking_notifier_chain_unregister(
+ &munmap_notifier, n);
+ break;
+ }
+
+ return err;
+}
+
+int register_timer_hook(int (*hook)(struct pt_regs *))
+{
+ if (timer_hook)
+ return -EBUSY;
+ timer_hook = hook;
+ return 0;
+}
+
+void unregister_timer_hook(int (*hook)(struct pt_regs *))
+{
+ WARN_ON(hook != timer_hook);
+ timer_hook = NULL;
+ /* make sure all CPUs see the NULL hook */
+ synchronize_sched(); /* Allow ongoing interrupts to complete. */
+}
+
+EXPORT_SYMBOL_GPL(register_timer_hook);
+EXPORT_SYMBOL_GPL(unregister_timer_hook);
+EXPORT_SYMBOL_GPL(task_handoff_register);
+EXPORT_SYMBOL_GPL(task_handoff_unregister);
+EXPORT_SYMBOL_GPL(profile_event_register);
+EXPORT_SYMBOL_GPL(profile_event_unregister);
+
+#endif /* CONFIG_PROFILING */
+
+
+#ifdef CONFIG_SMP
+/*
+ * Each cpu has a pair of open-addressed hashtables for pending
+ * profile hits. read_profile() IPI's all cpus to request them
+ * to flip buffers and flushes their contents to prof_buffer itself.
+ * Flip requests are serialized by the profile_flip_mutex. The sole
+ * use of having a second hashtable is for avoiding cacheline
+ * contention that would otherwise happen during flushes of pending
+ * profile hits required for the accuracy of reported profile hits
+ * and so resurrect the interrupt livelock issue.
+ *
+ * The open-addressed hashtables are indexed by profile buffer slot
+ * and hold the number of pending hits to that profile buffer slot on
+ * a cpu in an entry. When the hashtable overflows, all pending hits
+ * are accounted to their corresponding profile buffer slots with
+ * atomic_add() and the hashtable emptied. As numerous pending hits
+ * may be accounted to a profile buffer slot in a hashtable entry,
+ * this amortizes a number of atomic profile buffer increments likely
+ * to be far larger than the number of entries in the hashtable,
+ * particularly given that the number of distinct profile buffer
+ * positions to which hits are accounted during short intervals (e.g.
+ * several seconds) is usually very small. Exclusion from buffer
+ * flipping is provided by interrupt disablement (note that for
+ * SCHED_PROFILING or SLEEP_PROFILING profile_hit() may be called from
+ * process context).
+ * The hash function is meant to be lightweight as opposed to strong,
+ * and was vaguely inspired by ppc64 firmware-supported inverted
+ * pagetable hash functions, but uses a full hashtable full of finite
+ * collision chains, not just pairs of them.
+ *
+ * -- wli
+ */
+static void __profile_flip_buffers(void *unused)
+{
+ int cpu = smp_processor_id();
+
+ per_cpu(cpu_profile_flip, cpu) = !per_cpu(cpu_profile_flip, cpu);
+}
+
+static void profile_flip_buffers(void)
+{
+ int i, j, cpu;
+
+ mutex_lock(&profile_flip_mutex);
+ j = per_cpu(cpu_profile_flip, get_cpu());
+ put_cpu();
+ on_each_cpu(__profile_flip_buffers, NULL, 0, 1);
+ for_each_online_cpu(cpu) {
+ struct profile_hit *hits = per_cpu(cpu_profile_hits, cpu)[j];
+ for (i = 0; i < NR_PROFILE_HIT; ++i) {
+ if (!hits[i].hits) {
+ if (hits[i].pc)
+ hits[i].pc = 0;
+ continue;
+ }
+ atomic_add(hits[i].hits, &prof_buffer[hits[i].pc]);
+ hits[i].hits = hits[i].pc = 0;
+ }
+ }
+ mutex_unlock(&profile_flip_mutex);
+}
+
+static void profile_discard_flip_buffers(void)
+{
+ int i, cpu;
+
+ mutex_lock(&profile_flip_mutex);
+ i = per_cpu(cpu_profile_flip, get_cpu());
+ put_cpu();
+ on_each_cpu(__profile_flip_buffers, NULL, 0, 1);
+ for_each_online_cpu(cpu) {
+ struct profile_hit *hits = per_cpu(cpu_profile_hits, cpu)[i];
+ memset(hits, 0, NR_PROFILE_HIT*sizeof(struct profile_hit));
+ }
+ mutex_unlock(&profile_flip_mutex);
+}
+
+void profile_hits(int type, void *__pc, unsigned int nr_hits)
+{
+ unsigned long primary, secondary, flags, pc = (unsigned long)__pc;
+ int i, j, cpu;
+ struct profile_hit *hits;
+
+ if (prof_on != type || !prof_buffer)
+ return;
+ pc = min((pc - (unsigned long)_stext) >> prof_shift, prof_len - 1);
+ i = primary = (pc & (NR_PROFILE_GRP - 1)) << PROFILE_GRPSHIFT;
+ secondary = (~(pc << 1) & (NR_PROFILE_GRP - 1)) << PROFILE_GRPSHIFT;
+ cpu = get_cpu();
+ hits = per_cpu(cpu_profile_hits, cpu)[per_cpu(cpu_profile_flip, cpu)];
+ if (!hits) {
+ put_cpu();
+ return;
+ }
+ /*
+ * We buffer the global profiler buffer into a per-CPU
+ * queue and thus reduce the number of global (and possibly
+ * NUMA-alien) accesses. The write-queue is self-coalescing:
+ */
+ local_irq_save(flags);
+ do {
+ for (j = 0; j < PROFILE_GRPSZ; ++j) {
+ if (hits[i + j].pc == pc) {
+ hits[i + j].hits += nr_hits;
+ goto out;
+ } else if (!hits[i + j].hits) {
+ hits[i + j].pc = pc;
+ hits[i + j].hits = nr_hits;
+ goto out;
+ }
+ }
+ i = (i + secondary) & (NR_PROFILE_HIT - 1);
+ } while (i != primary);
+
+ /*
+ * Add the current hit(s) and flush the write-queue out
+ * to the global buffer:
+ */
+ atomic_add(nr_hits, &prof_buffer[pc]);
+ for (i = 0; i < NR_PROFILE_HIT; ++i) {
+ atomic_add(hits[i].hits, &prof_buffer[hits[i].pc]);
+ hits[i].pc = hits[i].hits = 0;
+ }
+out:
+ local_irq_restore(flags);
+ put_cpu();
+}
+
+static int __devinit profile_cpu_callback(struct notifier_block *info,
+ unsigned long action, void *__cpu)
+{
+ int node, cpu = (unsigned long)__cpu;
+ struct page *page;
+
+ switch (action) {
+ case CPU_UP_PREPARE:
+ case CPU_UP_PREPARE_FROZEN:
+ node = cpu_to_node(cpu);
+ per_cpu(cpu_profile_flip, cpu) = 0;
+ if (!per_cpu(cpu_profile_hits, cpu)[1]) {
+ page = alloc_pages_node(node,
+ GFP_KERNEL | __GFP_ZERO,
+ 0);
+ if (!page)
+ return NOTIFY_BAD;
+ per_cpu(cpu_profile_hits, cpu)[1] = page_address(page);
+ }
+ if (!per_cpu(cpu_profile_hits, cpu)[0]) {
+ page = alloc_pages_node(node,
+ GFP_KERNEL | __GFP_ZERO,
+ 0);
+ if (!page)
+ goto out_free;
+ per_cpu(cpu_profile_hits, cpu)[0] = page_address(page);
+ }
+ break;
+ out_free:
+ page = virt_to_page(per_cpu(cpu_profile_hits, cpu)[1]);
+ per_cpu(cpu_profile_hits, cpu)[1] = NULL;
+ __free_page(page);
+ return NOTIFY_BAD;
+ case CPU_ONLINE:
+ case CPU_ONLINE_FROZEN:
+ cpu_set(cpu, prof_cpu_mask);
+ break;
+ case CPU_UP_CANCELED:
+ case CPU_UP_CANCELED_FROZEN:
+ case CPU_DEAD:
+ case CPU_DEAD_FROZEN:
+ cpu_clear(cpu, prof_cpu_mask);
+ if (per_cpu(cpu_profile_hits, cpu)[0]) {
+ page = virt_to_page(per_cpu(cpu_profile_hits, cpu)[0]);
+ per_cpu(cpu_profile_hits, cpu)[0] = NULL;
+ __free_page(page);
+ }
+ if (per_cpu(cpu_profile_hits, cpu)[1]) {
+ page = virt_to_page(per_cpu(cpu_profile_hits, cpu)[1]);
+ per_cpu(cpu_profile_hits, cpu)[1] = NULL;
+ __free_page(page);
+ }
+ break;
+ }
+ return NOTIFY_OK;
+}
+#else /* !CONFIG_SMP */
+#define profile_flip_buffers() do { } while (0)
+#define profile_discard_flip_buffers() do { } while (0)
+#define profile_cpu_callback NULL
+
+void profile_hits(int type, void *__pc, unsigned int nr_hits)
+{
+ unsigned long pc;
+
+ if (prof_on != type || !prof_buffer)
+ return;
+ pc = ((unsigned long)__pc - (unsigned long)_stext) >> prof_shift;
+ atomic_add(nr_hits, &prof_buffer[min(pc, prof_len - 1)]);
+}
+#endif /* !CONFIG_SMP */
+
+EXPORT_SYMBOL_GPL(profile_hits);
+
+void profile_tick(int type)
+{
+ struct pt_regs *regs = get_irq_regs();
+
+ if (type == CPU_PROFILING && timer_hook)
+ timer_hook(regs);
+ if (!user_mode(regs) && cpu_isset(smp_processor_id(), prof_cpu_mask))
+ profile_hit(type, (void *)profile_pc(regs));
+}
+
+#ifdef CONFIG_PROC_FS
+#include <linux/proc_fs.h>
+#include <asm/uaccess.h>
+#include <asm/ptrace.h>
+
+static int prof_cpu_mask_read_proc (char *page, char **start, off_t off,
+ int count, int *eof, void *data)
+{
+ int len = cpumask_scnprintf(page, count, *(cpumask_t *)data);
+ if (count - len < 2)
+ return -EINVAL;
+ len += sprintf(page + len, "\n");
+ return len;
+}
+
+static int prof_cpu_mask_write_proc (struct file *file, const char __user *buffer,
+ unsigned long count, void *data)
+{
+ cpumask_t *mask = (cpumask_t *)data;
+ unsigned long full_count = count, err;
+ cpumask_t new_value;
+
+ err = cpumask_parse_user(buffer, count, new_value);
+ if (err)
+ return err;
+
+ *mask = new_value;
+ return full_count;
+}
+
+void create_prof_cpu_mask(struct proc_dir_entry *root_irq_dir)
+{
+ struct proc_dir_entry *entry;
+
+ /* create /proc/irq/prof_cpu_mask */
+ if (!(entry = create_proc_entry("prof_cpu_mask", 0600, root_irq_dir)))
+ return;
+ entry->data = (void *)&prof_cpu_mask;
+ entry->read_proc = prof_cpu_mask_read_proc;
+ entry->write_proc = prof_cpu_mask_write_proc;
+}
+
+/*
+ * This function accesses profiling information. The returned data is
+ * binary: the sampling step and the actual contents of the profile
+ * buffer. Use of the program readprofile is recommended in order to
+ * get meaningful info out of these data.
+ */
+static ssize_t
+read_profile(struct file *file, char __user *buf, size_t count, loff_t *ppos)
+{
+ unsigned long p = *ppos;
+ ssize_t read;
+ char * pnt;
+ unsigned int sample_step = 1 << prof_shift;
+
+ profile_flip_buffers();
+ if (p >= (prof_len+1)*sizeof(unsigned int))
+ return 0;
+ if (count > (prof_len+1)*sizeof(unsigned int) - p)
+ count = (prof_len+1)*sizeof(unsigned int) - p;
+ read = 0;
+
+ while (p < sizeof(unsigned int) && count > 0) {
+ if (put_user(*((char *)(&sample_step)+p),buf))
+ return -EFAULT;
+ buf++; p++; count--; read++;
+ }
+ pnt = (char *)prof_buffer + p - sizeof(atomic_t);
+ if (copy_to_user(buf,(void *)pnt,count))
+ return -EFAULT;
+ read += count;
+ *ppos += read;
+ return read;
+}
+
+/*
+ * Writing to /proc/profile resets the counters
+ *
+ * Writing a 'profiling multiplier' value into it also re-sets the profiling
+ * interrupt frequency, on architectures that support this.
+ */
+static ssize_t write_profile(struct file *file, const char __user *buf,
+ size_t count, loff_t *ppos)
+{
+#ifdef CONFIG_SMP
+ extern int setup_profiling_timer (unsigned int multiplier);
+
+ if (count == sizeof(int)) {
+ unsigned int multiplier;
+
+ if (copy_from_user(&multiplier, buf, sizeof(int)))
+ return -EFAULT;
+
+ if (setup_profiling_timer(multiplier))
+ return -EINVAL;
+ }
+#endif
+ profile_discard_flip_buffers();
+ memset(prof_buffer, 0, prof_len * sizeof(atomic_t));
+ return count;
+}
+
+static const struct file_operations proc_profile_operations = {
+ .read = read_profile,
+ .write = write_profile,
+};
+
+#ifdef CONFIG_SMP
+static void __init profile_nop(void *unused)
+{
+}
+
+static int __init create_hash_tables(void)
+{
+ int cpu;
+
+ for_each_online_cpu(cpu) {
+ int node = cpu_to_node(cpu);
+ struct page *page;
+
+ page = alloc_pages_node(node,
+ GFP_KERNEL | __GFP_ZERO | GFP_THISNODE,
+ 0);
+ if (!page)
+ goto out_cleanup;
+ per_cpu(cpu_profile_hits, cpu)[1]
+ = (struct profile_hit *)page_address(page);
+ page = alloc_pages_node(node,
+ GFP_KERNEL | __GFP_ZERO | GFP_THISNODE,
+ 0);
+ if (!page)
+ goto out_cleanup;
+ per_cpu(cpu_profile_hits, cpu)[0]
+ = (struct profile_hit *)page_address(page);
+ }
+ return 0;
+out_cleanup:
+ prof_on = 0;
+ smp_mb();
+ on_each_cpu(profile_nop, NULL, 0, 1);
+ for_each_online_cpu(cpu) {
+ struct page *page;
+
+ if (per_cpu(cpu_profile_hits, cpu)[0]) {
+ page = virt_to_page(per_cpu(cpu_profile_hits, cpu)[0]);
+ per_cpu(cpu_profile_hits, cpu)[0] = NULL;
+ __free_page(page);
+ }
+ if (per_cpu(cpu_profile_hits, cpu)[1]) {
+ page = virt_to_page(per_cpu(cpu_profile_hits, cpu)[1]);
+ per_cpu(cpu_profile_hits, cpu)[1] = NULL;
+ __free_page(page);
+ }
+ }
+ return -1;
+}
+#else
+#define create_hash_tables() ({ 0; })
+#endif
+
+static int __init create_proc_profile(void)
+{
+ struct proc_dir_entry *entry;
+
+ if (!prof_on)
+ return 0;
+ if (create_hash_tables())
+ return -1;
+ if (!(entry = create_proc_entry("profile", S_IWUSR | S_IRUGO, NULL)))
+ return 0;
+ entry->proc_fops = &proc_profile_operations;
+ entry->size = (1+prof_len) * sizeof(atomic_t);
+ hotcpu_notifier(profile_cpu_callback, 0);
+ return 0;
+}
+module_init(create_proc_profile);
+#endif /* CONFIG_PROC_FS */
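
To make the /proc/profile layout implemented by read_profile() above
concrete: after booting with e.g. profile=2, the file starts with one
unsigned int holding the sample step (1 << prof_shift), followed by one
hit counter per profiled text slot between _stext and _etext. The
userspace sketch below decodes it; it is illustrative only, assumes
sizeof(atomic_t) == sizeof(unsigned int) (true on the usual
architectures), and readprofile(1) remains the recommended tool since it
maps slots back to symbols via System.map.

	#include <stdio.h>

	int main(void)
	{
		FILE *f = fopen("/proc/profile", "rb");
		unsigned int step, hits;
		unsigned long slot = 0;

		if (!f)
			return 1;
		if (fread(&step, sizeof(step), 1, f) != 1)
			return 1;
		printf("sample step: %u bytes of text per counter\n", step);
		while (fread(&hits, sizeof(hits), 1, f) == 1) {
			if (hits)
				printf("_stext+0x%lx: %u hits\n",
				       slot * step, hits);
			slot++;
		}
		fclose(f);
		return 0;
	}
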
Index: linux-2.6-lttng.stable/kernel/Makefile
===================================================================
--- linux-2.6-lttng.stable.orig/kernel/Makefile 2007-10-29 09:51:51.000000000 -0400
+++ linux-2.6-lttng.stable/kernel/Makefile 2007-10-29 09:52:13.000000000 -0400
@@ -2,7 +2,7 @@
# Makefile for the linux kernel.
#
-obj-y = sched.o fork.o exec_domain.o panic.o printk.o profile.o \
+obj-y = sched.o fork.o exec_domain.o panic.o printk.o \
exit.o itimer.o time.o softirq.o resource.o \
sysctl.o capability.o ptrace.o timer.o user.o user_namespace.o \
signal.o sys.o kmod.o workqueue.o pid.o \
Index: linux-2.6-lttng.stable/kernel/profile.c
===================================================================
--- linux-2.6-lttng.stable.orig/kernel/profile.c 2007-10-29 09:51:06.000000000 -0400
+++ /dev/null 1970-01-01 00:00:00.000000000 +0000
@@ -1,597 +0,0 @@
-/*
- * linux/kernel/profile.c
- * Simple profiling. Manages a direct-mapped profile hit count buffer,
- * with configurable resolution, support for restricting the cpus on
- * which profiling is done, and switching between cpu time and
- * schedule() calls via kernel command line parameters passed at boot.
- *
- * Scheduler profiling support, Arjan van de Ven and Ingo Molnar,
- * Red Hat, July 2004
- * Consolidation of architecture support code for profiling,
- * William Irwin, Oracle, July 2004
- * Amortized hit count accounting via per-cpu open-addressed hashtables
- * to resolve timer interrupt livelocks, William Irwin, Oracle, 2004
- */
-
-#include <linux/module.h>
-#include <linux/profile.h>
-#include <linux/bootmem.h>
-#include <linux/notifier.h>
-#include <linux/mm.h>
-#include <linux/cpumask.h>
-#include <linux/cpu.h>
-#include <linux/profile.h>
-#include <linux/highmem.h>
-#include <linux/mutex.h>
-#include <asm/sections.h>
-#include <asm/semaphore.h>
-#include <asm/irq_regs.h>
-#include <asm/ptrace.h>
-
-struct profile_hit {
- u32 pc, hits;
-};
-#define PROFILE_GRPSHIFT 3
-#define PROFILE_GRPSZ (1 << PROFILE_GRPSHIFT)
-#define NR_PROFILE_HIT (PAGE_SIZE/sizeof(struct profile_hit))
-#define NR_PROFILE_GRP (NR_PROFILE_HIT/PROFILE_GRPSZ)
-
-/* Oprofile timer tick hook */
-static int (*timer_hook)(struct pt_regs *) __read_mostly;
-
-static atomic_t *prof_buffer;
-static unsigned long prof_len, prof_shift;
-
-int prof_on __read_mostly;
-EXPORT_SYMBOL_GPL(prof_on);
-
-static cpumask_t prof_cpu_mask = CPU_MASK_ALL;
-#ifdef CONFIG_SMP
-static DEFINE_PER_CPU(struct profile_hit *[2], cpu_profile_hits);
-static DEFINE_PER_CPU(int, cpu_profile_flip);
-static DEFINE_MUTEX(profile_flip_mutex);
-#endif /* CONFIG_SMP */
-
-static int __init profile_setup(char * str)
-{
- static char __initdata schedstr[] = "schedule";
- static char __initdata sleepstr[] = "sleep";
- static char __initdata kvmstr[] = "kvm";
- int par;
-
- if (!strncmp(str, sleepstr, strlen(sleepstr))) {
- prof_on = SLEEP_PROFILING;
- if (str[strlen(sleepstr)] == ',')
- str += strlen(sleepstr) + 1;
- if (get_option(&str, &par))
- prof_shift = par;
- printk(KERN_INFO
- "kernel sleep profiling enabled (shift: %ld)\n",
- prof_shift);
- } else if (!strncmp(str, schedstr, strlen(schedstr))) {
- prof_on = SCHED_PROFILING;
- if (str[strlen(schedstr)] == ',')
- str += strlen(schedstr) + 1;
- if (get_option(&str, &par))
- prof_shift = par;
- printk(KERN_INFO
- "kernel schedule profiling enabled (shift: %ld)\n",
- prof_shift);
- } else if (!strncmp(str, kvmstr, strlen(kvmstr))) {
- prof_on = KVM_PROFILING;
- if (str[strlen(kvmstr)] == ',')
- str += strlen(kvmstr) + 1;
- if (get_option(&str, &par))
- prof_shift = par;
- printk(KERN_INFO
- "kernel KVM profiling enabled (shift: %ld)\n",
- prof_shift);
- } else if (get_option(&str, &par)) {
- prof_shift = par;
- prof_on = CPU_PROFILING;
- printk(KERN_INFO "kernel profiling enabled (shift: %ld)\n",
- prof_shift);
- }
- return 1;
-}
-__setup("profile=", profile_setup);
-
-
-void __init profile_init(void)
-{
- if (!prof_on)
- return;
-
- /* only text is profiled */
- prof_len = (_etext - _stext) >> prof_shift;
- prof_buffer = alloc_bootmem(prof_len*sizeof(atomic_t));
-}
-
-/* Profile event notifications */
-
-#ifdef CONFIG_PROFILING
-
-static BLOCKING_NOTIFIER_HEAD(task_exit_notifier);
-static ATOMIC_NOTIFIER_HEAD(task_free_notifier);
-static BLOCKING_NOTIFIER_HEAD(munmap_notifier);
-
-void profile_task_exit(struct task_struct * task)
-{
- blocking_notifier_call_chain(&task_exit_notifier, 0, task);
-}
-
-int profile_handoff_task(struct task_struct * task)
-{
- int ret;
- ret = atomic_notifier_call_chain(&task_free_notifier, 0, task);
- return (ret == NOTIFY_OK) ? 1 : 0;
-}
-
-void profile_munmap(unsigned long addr)
-{
- blocking_notifier_call_chain(&munmap_notifier, 0, (void *)addr);
-}
-
-int task_handoff_register(struct notifier_block * n)
-{
- return atomic_notifier_chain_register(&task_free_notifier, n);
-}
-
-int task_handoff_unregister(struct notifier_block * n)
-{
- return atomic_notifier_chain_unregister(&task_free_notifier, n);
-}
-
-int profile_event_register(enum profile_type type, struct notifier_block * n)
-{
- int err = -EINVAL;
-
- switch (type) {
- case PROFILE_TASK_EXIT:
- err = blocking_notifier_chain_register(
- &task_exit_notifier, n);
- break;
- case PROFILE_MUNMAP:
- err = blocking_notifier_chain_register(
- &munmap_notifier, n);
- break;
- }
-
- return err;
-}
-
-
-int profile_event_unregister(enum profile_type type, struct notifier_block * n)
-{
- int err = -EINVAL;
-
- switch (type) {
- case PROFILE_TASK_EXIT:
- err = blocking_notifier_chain_unregister(
- &task_exit_notifier, n);
- break;
- case PROFILE_MUNMAP:
- err = blocking_notifier_chain_unregister(
- &munmap_notifier, n);
- break;
- }
-
- return err;
-}
-
-int register_timer_hook(int (*hook)(struct pt_regs *))
-{
- if (timer_hook)
- return -EBUSY;
- timer_hook = hook;
- return 0;
-}
-
-void unregister_timer_hook(int (*hook)(struct pt_regs *))
-{
- WARN_ON(hook != timer_hook);
- timer_hook = NULL;
- /* make sure all CPUs see the NULL hook */
- synchronize_sched(); /* Allow ongoing interrupts to complete. */
-}
-
-EXPORT_SYMBOL_GPL(register_timer_hook);
-EXPORT_SYMBOL_GPL(unregister_timer_hook);
-EXPORT_SYMBOL_GPL(task_handoff_register);
-EXPORT_SYMBOL_GPL(task_handoff_unregister);
-EXPORT_SYMBOL_GPL(profile_event_register);
-EXPORT_SYMBOL_GPL(profile_event_unregister);
-
-#endif /* CONFIG_PROFILING */
-
-
-#ifdef CONFIG_SMP
-/*
- * Each cpu has a pair of open-addressed hashtables for pending
- * profile hits. read_profile() IPI's all cpus to request them
- * to flip buffers and flushes their contents to prof_buffer itself.
- * Flip requests are serialized by the profile_flip_mutex. The sole
- * use of having a second hashtable is for avoiding cacheline
- * contention that would otherwise happen during flushes of pending
- * profile hits required for the accuracy of reported profile hits
- * and so resurrect the interrupt livelock issue.
- *
- * The open-addressed hashtables are indexed by profile buffer slot
- * and hold the number of pending hits to that profile buffer slot on
- * a cpu in an entry. When the hashtable overflows, all pending hits
- * are accounted to their corresponding profile buffer slots with
- * atomic_add() and the hashtable emptied. As numerous pending hits
- * may be accounted to a profile buffer slot in a hashtable entry,
- * this amortizes a number of atomic profile buffer increments likely
- * to be far larger than the number of entries in the hashtable,
- * particularly given that the number of distinct profile buffer
- * positions to which hits are accounted during short intervals (e.g.
- * several seconds) is usually very small. Exclusion from buffer
- * flipping is provided by interrupt disablement (note that for
- * SCHED_PROFILING or SLEEP_PROFILING profile_hit() may be called from
- * process context).
- * The hash function is meant to be lightweight as opposed to strong,
- * and was vaguely inspired by ppc64 firmware-supported inverted
- * pagetable hash functions, but uses a full hashtable full of finite
- * collision chains, not just pairs of them.
- *
- * -- wli
- */
-static void __profile_flip_buffers(void *unused)
-{
- int cpu = smp_processor_id();
-
- per_cpu(cpu_profile_flip, cpu) = !per_cpu(cpu_profile_flip, cpu);
-}
-
-static void profile_flip_buffers(void)
-{
- int i, j, cpu;
-
- mutex_lock(&profile_flip_mutex);
- j = per_cpu(cpu_profile_flip, get_cpu());
- put_cpu();
- on_each_cpu(__profile_flip_buffers, NULL, 0, 1);
- for_each_online_cpu(cpu) {
- struct profile_hit *hits = per_cpu(cpu_profile_hits, cpu)[j];
- for (i = 0; i < NR_PROFILE_HIT; ++i) {
- if (!hits[i].hits) {
- if (hits[i].pc)
- hits[i].pc = 0;
- continue;
- }
- atomic_add(hits[i].hits, &prof_buffer[hits[i].pc]);
- hits[i].hits = hits[i].pc = 0;
- }
- }
- mutex_unlock(&profile_flip_mutex);
-}
-
-static void profile_discard_flip_buffers(void)
-{
- int i, cpu;
-
- mutex_lock(&profile_flip_mutex);
- i = per_cpu(cpu_profile_flip, get_cpu());
- put_cpu();
- on_each_cpu(__profile_flip_buffers, NULL, 0, 1);
- for_each_online_cpu(cpu) {
- struct profile_hit *hits = per_cpu(cpu_profile_hits, cpu)[i];
- memset(hits, 0, NR_PROFILE_HIT*sizeof(struct profile_hit));
- }
- mutex_unlock(&profile_flip_mutex);
-}
-
-void profile_hits(int type, void *__pc, unsigned int nr_hits)
-{
- unsigned long primary, secondary, flags, pc = (unsigned long)__pc;
- int i, j, cpu;
- struct profile_hit *hits;
-
- if (prof_on != type || !prof_buffer)
- return;
- pc = min((pc - (unsigned long)_stext) >> prof_shift, prof_len - 1);
- i = primary = (pc & (NR_PROFILE_GRP - 1)) << PROFILE_GRPSHIFT;
- secondary = (~(pc << 1) & (NR_PROFILE_GRP - 1)) << PROFILE_GRPSHIFT;
- cpu = get_cpu();
- hits = per_cpu(cpu_profile_hits, cpu)[per_cpu(cpu_profile_flip, cpu)];
- if (!hits) {
- put_cpu();
- return;
- }
- /*
- * We buffer the global profiler buffer into a per-CPU
- * queue and thus reduce the number of global (and possibly
- * NUMA-alien) accesses. The write-queue is self-coalescing:
- */
- local_irq_save(flags);
- do {
- for (j = 0; j < PROFILE_GRPSZ; ++j) {
- if (hits[i + j].pc == pc) {
- hits[i + j].hits += nr_hits;
- goto out;
- } else if (!hits[i + j].hits) {
- hits[i + j].pc = pc;
- hits[i + j].hits = nr_hits;
- goto out;
- }
- }
- i = (i + secondary) & (NR_PROFILE_HIT - 1);
- } while (i != primary);
-
- /*
- * Add the current hit(s) and flush the write-queue out
- * to the global buffer:
- */
- atomic_add(nr_hits, &prof_buffer[pc]);
- for (i = 0; i < NR_PROFILE_HIT; ++i) {
- atomic_add(hits[i].hits, &prof_buffer[hits[i].pc]);
- hits[i].pc = hits[i].hits = 0;
- }
-out:
- local_irq_restore(flags);
- put_cpu();
-}
-
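
For illustration, a minimal user-space model of the open-addressed
accumulation that profile_hits() performs above -- the constants and
types here are simplified stand-ins, not the kernel's, and the
table-full flush to prof_buffer is only noted as a comment:

#include <stdio.h>

#define NR_HIT   64                 /* stand-in for NR_PROFILE_HIT */
#define GRPSHIFT 2                  /* stand-in for PROFILE_GRPSHIFT */
#define GRPSZ    (1 << GRPSHIFT)    /* slots probed per group */
#define NR_GRP   (NR_HIT / GRPSZ)   /* stand-in for NR_PROFILE_GRP */

struct hit { unsigned long pc; unsigned int hits; };
static struct hit table[NR_HIT];

static void record_hit(unsigned long pc, unsigned int nr)
{
	unsigned long primary = (pc & (NR_GRP - 1)) << GRPSHIFT;
	unsigned long secondary = (~(pc << 1) & (NR_GRP - 1)) << GRPSHIFT;
	unsigned long i = primary;
	int j;

	do {
		for (j = 0; j < GRPSZ; j++) {
			if (table[i + j].pc == pc) {
				table[i + j].hits += nr;  /* coalesce */
				return;
			}
			if (!table[i + j].hits) {         /* free slot */
				table[i + j].pc = pc;
				table[i + j].hits = nr;
				return;
			}
		}
		i = (i + secondary) & (NR_HIT - 1);       /* next group */
	} while (i != primary);
	/* table full: this is where the kernel flushes to prof_buffer */
}

int main(void)
{
	record_hit(0x1234, 1);
	record_hit(0x1234, 2);	/* coalesces with the entry above */
	printf("hits at 0x1234: %u\n",
	       table[(0x1234 & (NR_GRP - 1)) << GRPSHIFT].hits);
	return 0;
}
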
-static int __devinit profile_cpu_callback(struct notifier_block *info,
- unsigned long action, void *__cpu)
-{
- int node, cpu = (unsigned long)__cpu;
- struct page *page;
-
- switch (action) {
- case CPU_UP_PREPARE:
- case CPU_UP_PREPARE_FROZEN:
- node = cpu_to_node(cpu);
- per_cpu(cpu_profile_flip, cpu) = 0;
- if (!per_cpu(cpu_profile_hits, cpu)[1]) {
- page = alloc_pages_node(node,
- GFP_KERNEL | __GFP_ZERO,
- 0);
- if (!page)
- return NOTIFY_BAD;
- per_cpu(cpu_profile_hits, cpu)[1] = page_address(page);
- }
- if (!per_cpu(cpu_profile_hits, cpu)[0]) {
- page = alloc_pages_node(node,
- GFP_KERNEL | __GFP_ZERO,
- 0);
- if (!page)
- goto out_free;
- per_cpu(cpu_profile_hits, cpu)[0] = page_address(page);
- }
- break;
- out_free:
- page = virt_to_page(per_cpu(cpu_profile_hits, cpu)[1]);
- per_cpu(cpu_profile_hits, cpu)[1] = NULL;
- __free_page(page);
- return NOTIFY_BAD;
- case CPU_ONLINE:
- case CPU_ONLINE_FROZEN:
- cpu_set(cpu, prof_cpu_mask);
- break;
- case CPU_UP_CANCELED:
- case CPU_UP_CANCELED_FROZEN:
- case CPU_DEAD:
- case CPU_DEAD_FROZEN:
- cpu_clear(cpu, prof_cpu_mask);
- if (per_cpu(cpu_profile_hits, cpu)[0]) {
- page = virt_to_page(per_cpu(cpu_profile_hits, cpu)[0]);
- per_cpu(cpu_profile_hits, cpu)[0] = NULL;
- __free_page(page);
- }
- if (per_cpu(cpu_profile_hits, cpu)[1]) {
- page = virt_to_page(per_cpu(cpu_profile_hits, cpu)[1]);
- per_cpu(cpu_profile_hits, cpu)[1] = NULL;
- __free_page(page);
- }
- break;
- }
- return NOTIFY_OK;
-}
-#else /* !CONFIG_SMP */
-#define profile_flip_buffers() do { } while (0)
-#define profile_discard_flip_buffers() do { } while (0)
-#define profile_cpu_callback NULL
-
-void profile_hits(int type, void *__pc, unsigned int nr_hits)
-{
- unsigned long pc;
-
- if (prof_on != type || !prof_buffer)
- return;
- pc = ((unsigned long)__pc - (unsigned long)_stext) >> prof_shift;
- atomic_add(nr_hits, &prof_buffer[min(pc, prof_len - 1)]);
-}
-#endif /* !CONFIG_SMP */
-
-EXPORT_SYMBOL_GPL(profile_hits);
-
-void profile_tick(int type)
-{
- struct pt_regs *regs = get_irq_regs();
-
- if (type == CPU_PROFILING && timer_hook)
- timer_hook(regs);
- if (!user_mode(regs) && cpu_isset(smp_processor_id(), prof_cpu_mask))
- profile_hit(type, (void *)profile_pc(regs));
-}
-
-#ifdef CONFIG_PROC_FS
-#include <linux/proc_fs.h>
-#include <asm/uaccess.h>
-#include <asm/ptrace.h>
-
-static int prof_cpu_mask_read_proc (char *page, char **start, off_t off,
- int count, int *eof, void *data)
-{
- int len = cpumask_scnprintf(page, count, *(cpumask_t *)data);
- if (count - len < 2)
- return -EINVAL;
- len += sprintf(page + len, "\n");
- return len;
-}
-
-static int prof_cpu_mask_write_proc (struct file *file, const char __user *buffer,
- unsigned long count, void *data)
-{
- cpumask_t *mask = (cpumask_t *)data;
- unsigned long full_count = count, err;
- cpumask_t new_value;
-
- err = cpumask_parse_user(buffer, count, new_value);
- if (err)
- return err;
-
- *mask = new_value;
- return full_count;
-}
-
-void create_prof_cpu_mask(struct proc_dir_entry *root_irq_dir)
-{
- struct proc_dir_entry *entry;
-
- /* create /proc/irq/prof_cpu_mask */
- if (!(entry = create_proc_entry("prof_cpu_mask", 0600, root_irq_dir)))
- return;
- entry->data = (void *)&prof_cpu_mask;
- entry->read_proc = prof_cpu_mask_read_proc;
- entry->write_proc = prof_cpu_mask_write_proc;
-}
-
-/*
- * This function accesses profiling information. The returned data is
- * binary: the sampling step and the actual contents of the profile
- * buffer. Use of the program readprofile is recommended in order to
- * get meaningful info out of these data.
- */
-static ssize_t
-read_profile(struct file *file, char __user *buf, size_t count, loff_t *ppos)
-{
- unsigned long p = *ppos;
- ssize_t read;
- char * pnt;
- unsigned int sample_step = 1 << prof_shift;
-
- profile_flip_buffers();
- if (p >= (prof_len+1)*sizeof(unsigned int))
- return 0;
- if (count > (prof_len+1)*sizeof(unsigned int) - p)
- count = (prof_len+1)*sizeof(unsigned int) - p;
- read = 0;
-
- while (p < sizeof(unsigned int) && count > 0) {
- if (put_user(*((char *)(&sample_step)+p),buf))
- return -EFAULT;
- buf++; p++; count--; read++;
- }
- pnt = (char *)prof_buffer + p - sizeof(atomic_t);
- if (copy_to_user(buf,(void *)pnt,count))
- return -EFAULT;
- read += count;
- *ppos += read;
- return read;
-}
-
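
A sketch of a user-space consumer of this format (readprofile(8) is
the real tool; this assumes atomic_t is a plain 32-bit counter on the
target architecture): the first unsigned int is the sampling step,
followed by the per-slot counters.

#include <stdio.h>

int main(void)
{
	FILE *f = fopen("/proc/profile", "rb");
	unsigned int step, count;
	unsigned long slot = 0;

	if (!f)
		return 1;
	if (fread(&step, sizeof(step), 1, f) != 1)
		return 1;
	printf("sampling step: %u bytes of kernel text per slot\n", step);
	while (fread(&count, sizeof(count), 1, f) == 1) {
		if (count)
			printf("slot %lu: %u hits\n", slot, count);
		slot++;
	}
	fclose(f);
	return 0;
}
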
-/*
- * Writing to /proc/profile resets the counters
- *
- * Writing a 'profiling multiplier' value into it also re-sets the profiling
- * interrupt frequency, on architectures that support this.
- */
-static ssize_t write_profile(struct file *file, const char __user *buf,
- size_t count, loff_t *ppos)
-{
-#ifdef CONFIG_SMP
- extern int setup_profiling_timer (unsigned int multiplier);
-
- if (count == sizeof(int)) {
- unsigned int multiplier;
-
- if (copy_from_user(&multiplier, buf, sizeof(int)))
- return -EFAULT;
-
- if (setup_profiling_timer(multiplier))
- return -EINVAL;
- }
-#endif
- profile_discard_flip_buffers();
- memset(prof_buffer, 0, prof_len * sizeof(atomic_t));
- return count;
-}
-
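
And the corresponding write side, sketched from user space; the
multiplier value is arbitrary, and any write -- of whatever size --
resets the counters:

#include <unistd.h>
#include <fcntl.h>

int main(void)
{
	int fd = open("/proc/profile", O_WRONLY);
	unsigned int multiplier = 2;	/* illustrative value */

	if (fd < 0)
		return 1;
	/* any write resets the counters; a sizeof(int)-sized write on
	 * SMP additionally reprograms the profiling timer multiplier */
	write(fd, &multiplier, sizeof(multiplier));
	close(fd);
	return 0;
}
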
-static const struct file_operations proc_profile_operations = {
- .read = read_profile,
- .write = write_profile,
-};
-
-#ifdef CONFIG_SMP
-static void __init profile_nop(void *unused)
-{
-}
-
-static int __init create_hash_tables(void)
-{
- int cpu;
-
- for_each_online_cpu(cpu) {
- int node = cpu_to_node(cpu);
- struct page *page;
-
- page = alloc_pages_node(node,
- GFP_KERNEL | __GFP_ZERO | GFP_THISNODE,
- 0);
- if (!page)
- goto out_cleanup;
- per_cpu(cpu_profile_hits, cpu)[1]
- = (struct profile_hit *)page_address(page);
- page = alloc_pages_node(node,
- GFP_KERNEL | __GFP_ZERO | GFP_THISNODE,
- 0);
- if (!page)
- goto out_cleanup;
- per_cpu(cpu_profile_hits, cpu)[0]
- = (struct profile_hit *)page_address(page);
- }
- return 0;
-out_cleanup:
- prof_on = 0;
- smp_mb();
- on_each_cpu(profile_nop, NULL, 0, 1);
- for_each_online_cpu(cpu) {
- struct page *page;
-
- if (per_cpu(cpu_profile_hits, cpu)[0]) {
- page = virt_to_page(per_cpu(cpu_profile_hits, cpu)[0]);
- per_cpu(cpu_profile_hits, cpu)[0] = NULL;
- __free_page(page);
- }
- if (per_cpu(cpu_profile_hits, cpu)[1]) {
- page = virt_to_page(per_cpu(cpu_profile_hits, cpu)[1]);
- per_cpu(cpu_profile_hits, cpu)[1] = NULL;
- __free_page(page);
- }
- }
- return -1;
-}
-#else
-#define create_hash_tables() ({ 0; })
-#endif
-
-static int __init create_proc_profile(void)
-{
- struct proc_dir_entry *entry;
-
- if (!prof_on)
- return 0;
- if (create_hash_tables())
- return -1;
- if (!(entry = create_proc_entry("profile", S_IWUSR | S_IRUGO, NULL)))
- return 0;
- entry->proc_fops = &proc_profile_operations;
- entry->size = (1+prof_len) * sizeof(atomic_t);
- hotcpu_notifier(profile_cpu_callback, 0);
- return 0;
-}
-module_init(create_proc_profile);
-#endif /* CONFIG_PROC_FS */
--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
^ permalink raw reply [flat|nested] 13+ messages in thread
* [patch 06/10] Move lockdep to instrumentation
2007-10-29 14:26 [patch 00/10] [PATCH] Create instrumentation/ kernel directory Mathieu Desnoyers
` (3 preceding siblings ...)
2007-10-29 14:26 ` [patch 05/10] Move profiling to instrumentation Mathieu Desnoyers
@ 2007-10-29 14:26 ` Mathieu Desnoyers
2007-10-29 14:26 ` [patch 07/10] Move vmstat to instrumentation/ Mathieu Desnoyers
` (4 subsequent siblings)
9 siblings, 0 replies; 13+ messages in thread
From: Mathieu Desnoyers @ 2007-10-29 14:26 UTC (permalink / raw)
To: linux-kernel; +Cc: Mathieu Desnoyers, Ingo Molnar, Peter Zijlstra
[-- Attachment #1: move-lockdep-to-instrumentation.patch --]
[-- Type: text/plain, Size: 212350 bytes --]
Move lockdep.c, lockdep_internals.h and lockdep_proc.c to instrumentation/.
Should we move the lockdep Kconfig menu entry to Instrumentation support? It is
currently automatically selected by "Kernel debugging" in Kernel hacking.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Ingo Molnar <mingo@redhat.com>
CC: Peter Zijlstra <pzijlstr@redhat.com>
---
instrumentation/Makefile | 4
instrumentation/lockdep.c | 3217 ++++++++++++++++++++++++++++++++++++
instrumentation/lockdep_internals.h | 78
instrumentation/lockdep_proc.c | 683 +++++++
kernel/Makefile | 4
kernel/lockdep.c | 3217 ------------------------------------
kernel/lockdep_internals.h | 78
kernel/lockdep_proc.c | 683 -------
8 files changed, 3982 insertions(+), 3982 deletions(-)
Index: linux-2.6-lttng.stable/instrumentation/Makefile
===================================================================
--- linux-2.6-lttng.stable.orig/instrumentation/Makefile 2007-10-29 09:52:30.000000000 -0400
+++ linux-2.6-lttng.stable/instrumentation/Makefile 2007-10-29 09:52:58.000000000 -0400
@@ -6,3 +6,7 @@ obj-y += profile.o
obj-$(CONFIG_KPROBES) += kprobes.o
obj-$(CONFIG_MARKERS) += marker.o
+obj-$(CONFIG_LOCKDEP) += lockdep.o
+ifeq ($(CONFIG_PROC_FS),y)
+obj-$(CONFIG_LOCKDEP) += lockdep_proc.o
+endif
Index: linux-2.6-lttng.stable/instrumentation/lockdep.c
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng.stable/instrumentation/lockdep.c 2007-10-29 09:52:35.000000000 -0400
@@ -0,0 +1,3217 @@
+/*
+ * kernel/lockdep.c
+ *
+ * Runtime locking correctness validator
+ *
+ * Started by Ingo Molnar:
+ *
+ * Copyright (C) 2006,2007 Red Hat, Inc., Ingo Molnar <mingo@redhat.com>
+ * Copyright (C) 2007 Red Hat, Inc., Peter Zijlstra <pzijlstr@redhat.com>
+ *
+ * this code maps all the lock dependencies as they occur in a live kernel
+ * and will warn about the following classes of locking bugs:
+ *
+ * - lock inversion scenarios
+ * - circular lock dependencies
+ * - hardirq/softirq safe/unsafe locking bugs
+ *
+ * Bugs are reported even if the current locking scenario does not cause
+ * any deadlock at this point.
+ *
+ * I.e. if anytime in the past two locks were taken in a different order,
+ * even if it happened for another task, even if those were different
+ * locks (but of the same class as this lock), this code will detect it.
+ *
+ * Thanks to Arjan van de Ven for coming up with the initial idea of
+ * mapping lock dependencies runtime.
+ */
+#include <linux/mutex.h>
+#include <linux/sched.h>
+#include <linux/delay.h>
+#include <linux/module.h>
+#include <linux/proc_fs.h>
+#include <linux/seq_file.h>
+#include <linux/spinlock.h>
+#include <linux/kallsyms.h>
+#include <linux/interrupt.h>
+#include <linux/stacktrace.h>
+#include <linux/debug_locks.h>
+#include <linux/irqflags.h>
+#include <linux/utsname.h>
+#include <linux/hash.h>
+
+#include <asm/sections.h>
+
+#include "lockdep_internals.h"
+
+#ifdef CONFIG_PROVE_LOCKING
+int prove_locking = 1;
+module_param(prove_locking, int, 0644);
+#else
+#define prove_locking 0
+#endif
+
+#ifdef CONFIG_LOCK_STAT
+int lock_stat = 1;
+module_param(lock_stat, int, 0644);
+#else
+#define lock_stat 0
+#endif
+
+/*
+ * lockdep_lock: protects the lockdep graph, the hashes and the
+ * class/list/hash allocators.
+ *
+ * This is one of the rare exceptions where it's justified
+ * to use a raw spinlock - we really don't want the spinlock
+ * code to recurse back into the lockdep code...
+ */
+static raw_spinlock_t lockdep_lock = (raw_spinlock_t)__RAW_SPIN_LOCK_UNLOCKED;
+
+static int graph_lock(void)
+{
+ __raw_spin_lock(&lockdep_lock);
+ /*
+ * Make sure that if another CPU detected a bug while
+	 * walking the graph we don't change it (while the other
+ * CPU is busy printing out stuff with the graph lock
+ * dropped already)
+ */
+ if (!debug_locks) {
+ __raw_spin_unlock(&lockdep_lock);
+ return 0;
+ }
+ return 1;
+}
+
+static inline int graph_unlock(void)
+{
+ if (debug_locks && !__raw_spin_is_locked(&lockdep_lock))
+ return DEBUG_LOCKS_WARN_ON(1);
+
+ __raw_spin_unlock(&lockdep_lock);
+ return 0;
+}
+
+/*
+ * Turn lock debugging off and return with 0 if it was off already,
+ * and also release the graph lock:
+ */
+static inline int debug_locks_off_graph_unlock(void)
+{
+ int ret = debug_locks_off();
+
+ __raw_spin_unlock(&lockdep_lock);
+
+ return ret;
+}
+
+static int lockdep_initialized;
+
+unsigned long nr_list_entries;
+static struct lock_list list_entries[MAX_LOCKDEP_ENTRIES];
+
+/*
+ * All data structures here are protected by the global debug_lock.
+ *
+ * Mutex key structs only get allocated, once during bootup, and never
+ * get freed - this significantly simplifies the debugging code.
+ */
+unsigned long nr_lock_classes;
+static struct lock_class lock_classes[MAX_LOCKDEP_KEYS];
+
+#ifdef CONFIG_LOCK_STAT
+static DEFINE_PER_CPU(struct lock_class_stats[MAX_LOCKDEP_KEYS], lock_stats);
+
+static int lock_contention_point(struct lock_class *class, unsigned long ip)
+{
+ int i;
+
+ for (i = 0; i < ARRAY_SIZE(class->contention_point); i++) {
+ if (class->contention_point[i] == 0) {
+ class->contention_point[i] = ip;
+ break;
+ }
+ if (class->contention_point[i] == ip)
+ break;
+ }
+
+ return i;
+}
+
+static void lock_time_inc(struct lock_time *lt, s64 time)
+{
+ if (time > lt->max)
+ lt->max = time;
+
+ if (time < lt->min || !lt->min)
+ lt->min = time;
+
+ lt->total += time;
+ lt->nr++;
+}
+
+static inline void lock_time_add(struct lock_time *src, struct lock_time *dst)
+{
+ dst->min += src->min;
+ dst->max += src->max;
+ dst->total += src->total;
+ dst->nr += src->nr;
+}
+
+struct lock_class_stats lock_stats(struct lock_class *class)
+{
+ struct lock_class_stats stats;
+ int cpu, i;
+
+ memset(&stats, 0, sizeof(struct lock_class_stats));
+ for_each_possible_cpu(cpu) {
+ struct lock_class_stats *pcs =
+ &per_cpu(lock_stats, cpu)[class - lock_classes];
+
+ for (i = 0; i < ARRAY_SIZE(stats.contention_point); i++)
+ stats.contention_point[i] += pcs->contention_point[i];
+
+ lock_time_add(&pcs->read_waittime, &stats.read_waittime);
+ lock_time_add(&pcs->write_waittime, &stats.write_waittime);
+
+ lock_time_add(&pcs->read_holdtime, &stats.read_holdtime);
+ lock_time_add(&pcs->write_holdtime, &stats.write_holdtime);
+
+ for (i = 0; i < ARRAY_SIZE(stats.bounces); i++)
+ stats.bounces[i] += pcs->bounces[i];
+ }
+
+ return stats;
+}
+
+void clear_lock_stats(struct lock_class *class)
+{
+ int cpu;
+
+ for_each_possible_cpu(cpu) {
+ struct lock_class_stats *cpu_stats =
+ &per_cpu(lock_stats, cpu)[class - lock_classes];
+
+ memset(cpu_stats, 0, sizeof(struct lock_class_stats));
+ }
+ memset(class->contention_point, 0, sizeof(class->contention_point));
+}
+
+static struct lock_class_stats *get_lock_stats(struct lock_class *class)
+{
+ return &get_cpu_var(lock_stats)[class - lock_classes];
+}
+
+static void put_lock_stats(struct lock_class_stats *stats)
+{
+ put_cpu_var(lock_stats);
+}
+
+static void lock_release_holdtime(struct held_lock *hlock)
+{
+ struct lock_class_stats *stats;
+ s64 holdtime;
+
+ if (!lock_stat)
+ return;
+
+ holdtime = sched_clock() - hlock->holdtime_stamp;
+
+ stats = get_lock_stats(hlock->class);
+ if (hlock->read)
+ lock_time_inc(&stats->read_holdtime, holdtime);
+ else
+ lock_time_inc(&stats->write_holdtime, holdtime);
+ put_lock_stats(stats);
+}
+#else
+static inline void lock_release_holdtime(struct held_lock *hlock)
+{
+}
+#endif
+
+/*
+ * We keep a global list of all lock classes. The list only grows,
+ * never shrinks. The list is only accessed with the lockdep
+ * spinlock lock held.
+ */
+LIST_HEAD(all_lock_classes);
+
+/*
+ * The lockdep classes are in a hash-table as well, for fast lookup:
+ */
+#define CLASSHASH_BITS (MAX_LOCKDEP_KEYS_BITS - 1)
+#define CLASSHASH_SIZE (1UL << CLASSHASH_BITS)
+#define __classhashfn(key) hash_long((unsigned long)key, CLASSHASH_BITS)
+#define classhashentry(key) (classhash_table + __classhashfn((key)))
+
+static struct list_head classhash_table[CLASSHASH_SIZE];
+
+/*
+ * We put the lock dependency chains into a hash-table as well, to cache
+ * their existence:
+ */
+#define CHAINHASH_BITS (MAX_LOCKDEP_CHAINS_BITS-1)
+#define CHAINHASH_SIZE (1UL << CHAINHASH_BITS)
+#define __chainhashfn(chain) hash_long(chain, CHAINHASH_BITS)
+#define chainhashentry(chain) (chainhash_table + __chainhashfn((chain)))
+
+static struct list_head chainhash_table[CHAINHASH_SIZE];
+
+/*
+ * The hash key of the lock dependency chains is a hash itself too:
+ * it's a hash of all locks taken up to that lock, including that lock.
+ * It's a 64-bit hash, because it's important for the keys to be
+ * unique.
+ */
+#define iterate_chain_key(key1, key2) \
+ (((key1) << MAX_LOCKDEP_KEYS_BITS) ^ \
+ ((key1) >> (64-MAX_LOCKDEP_KEYS_BITS)) ^ \
+ (key2))
+
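
A toy version of this fold shows why the chain key is order-sensitive;
13 here is an arbitrary stand-in for MAX_LOCKDEP_KEYS_BITS, which is
version- and configuration-dependent:

#include <stdio.h>
#include <stdint.h>

#define KEY_BITS 13	/* stand-in for MAX_LOCKDEP_KEYS_BITS */

static uint64_t iterate(uint64_t key, uint64_t id)
{
	return (key << KEY_BITS) ^ (key >> (64 - KEY_BITS)) ^ id;
}

int main(void)
{
	uint64_t ab = iterate(iterate(0, 1), 2);	/* class 1, then 2 */
	uint64_t ba = iterate(iterate(0, 2), 1);	/* the inverse order */

	printf("1->2: %016llx\n2->1: %016llx\n",
	       (unsigned long long)ab, (unsigned long long)ba);
	return 0;
}
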
+void lockdep_off(void)
+{
+ current->lockdep_recursion++;
+}
+
+EXPORT_SYMBOL(lockdep_off);
+
+void lockdep_on(void)
+{
+ current->lockdep_recursion--;
+}
+
+EXPORT_SYMBOL(lockdep_on);
+
+/*
+ * Debugging switches:
+ */
+
+#define VERBOSE 0
+#define VERY_VERBOSE 0
+
+#if VERBOSE
+# define HARDIRQ_VERBOSE 1
+# define SOFTIRQ_VERBOSE 1
+#else
+# define HARDIRQ_VERBOSE 0
+# define SOFTIRQ_VERBOSE 0
+#endif
+
+#if VERBOSE || HARDIRQ_VERBOSE || SOFTIRQ_VERBOSE
+/*
+ * Quick filtering for interesting events:
+ */
+static int class_filter(struct lock_class *class)
+{
+#if 0
+ /* Example */
+ if (class->name_version == 1 &&
+ !strcmp(class->name, "lockname"))
+ return 1;
+ if (class->name_version == 1 &&
+ !strcmp(class->name, "&struct->lockfield"))
+ return 1;
+#endif
+	/* Filter everything else. Return 1 here to allow everything else. */
+ return 0;
+}
+#endif
+
+static int verbose(struct lock_class *class)
+{
+#if VERBOSE
+ return class_filter(class);
+#endif
+ return 0;
+}
+
+/*
+ * Stack-trace: tightly packed array of stack backtrace
+ * addresses. Protected by the graph_lock.
+ */
+unsigned long nr_stack_trace_entries;
+static unsigned long stack_trace[MAX_STACK_TRACE_ENTRIES];
+
+static int save_trace(struct stack_trace *trace)
+{
+ trace->nr_entries = 0;
+ trace->max_entries = MAX_STACK_TRACE_ENTRIES - nr_stack_trace_entries;
+ trace->entries = stack_trace + nr_stack_trace_entries;
+
+ trace->skip = 3;
+
+ save_stack_trace(trace);
+
+ trace->max_entries = trace->nr_entries;
+
+ nr_stack_trace_entries += trace->nr_entries;
+
+ if (nr_stack_trace_entries == MAX_STACK_TRACE_ENTRIES) {
+ if (!debug_locks_off_graph_unlock())
+ return 0;
+
+ printk("BUG: MAX_STACK_TRACE_ENTRIES too low!\n");
+ printk("turning off the locking correctness validator.\n");
+ dump_stack();
+
+ return 0;
+ }
+
+ return 1;
+}
+
+unsigned int nr_hardirq_chains;
+unsigned int nr_softirq_chains;
+unsigned int nr_process_chains;
+unsigned int max_lockdep_depth;
+unsigned int max_recursion_depth;
+
+#ifdef CONFIG_DEBUG_LOCKDEP
+/*
+ * We cannot printk in early bootup code. Not even early_printk()
+ * might work. So we mark any initialization errors and printk
+ * about it later on, in lockdep_info().
+ */
+static int lockdep_init_error;
+static unsigned long lockdep_init_trace_data[20];
+static struct stack_trace lockdep_init_trace = {
+ .max_entries = ARRAY_SIZE(lockdep_init_trace_data),
+ .entries = lockdep_init_trace_data,
+};
+
+/*
+ * Various lockdep statistics:
+ */
+atomic_t chain_lookup_hits;
+atomic_t chain_lookup_misses;
+atomic_t hardirqs_on_events;
+atomic_t hardirqs_off_events;
+atomic_t redundant_hardirqs_on;
+atomic_t redundant_hardirqs_off;
+atomic_t softirqs_on_events;
+atomic_t softirqs_off_events;
+atomic_t redundant_softirqs_on;
+atomic_t redundant_softirqs_off;
+atomic_t nr_unused_locks;
+atomic_t nr_cyclic_checks;
+atomic_t nr_cyclic_check_recursions;
+atomic_t nr_find_usage_forwards_checks;
+atomic_t nr_find_usage_forwards_recursions;
+atomic_t nr_find_usage_backwards_checks;
+atomic_t nr_find_usage_backwards_recursions;
+# define debug_atomic_inc(ptr) atomic_inc(ptr)
+# define debug_atomic_dec(ptr) atomic_dec(ptr)
+# define debug_atomic_read(ptr) atomic_read(ptr)
+#else
+# define debug_atomic_inc(ptr) do { } while (0)
+# define debug_atomic_dec(ptr) do { } while (0)
+# define debug_atomic_read(ptr) 0
+#endif
+
+/*
+ * Locking printouts:
+ */
+
+static const char *usage_str[] =
+{
+ [LOCK_USED] = "initial-use ",
+ [LOCK_USED_IN_HARDIRQ] = "in-hardirq-W",
+ [LOCK_USED_IN_SOFTIRQ] = "in-softirq-W",
+ [LOCK_ENABLED_SOFTIRQS] = "softirq-on-W",
+ [LOCK_ENABLED_HARDIRQS] = "hardirq-on-W",
+ [LOCK_USED_IN_HARDIRQ_READ] = "in-hardirq-R",
+ [LOCK_USED_IN_SOFTIRQ_READ] = "in-softirq-R",
+ [LOCK_ENABLED_SOFTIRQS_READ] = "softirq-on-R",
+ [LOCK_ENABLED_HARDIRQS_READ] = "hardirq-on-R",
+};
+
+const char * __get_key_name(struct lockdep_subclass_key *key, char *str)
+{
+ return kallsyms_lookup((unsigned long)key, NULL, NULL, NULL, str);
+}
+
+void
+get_usage_chars(struct lock_class *class, char *c1, char *c2, char *c3, char *c4)
+{
+ *c1 = '.', *c2 = '.', *c3 = '.', *c4 = '.';
+
+ if (class->usage_mask & LOCKF_USED_IN_HARDIRQ)
+ *c1 = '+';
+ else
+ if (class->usage_mask & LOCKF_ENABLED_HARDIRQS)
+ *c1 = '-';
+
+ if (class->usage_mask & LOCKF_USED_IN_SOFTIRQ)
+ *c2 = '+';
+ else
+ if (class->usage_mask & LOCKF_ENABLED_SOFTIRQS)
+ *c2 = '-';
+
+ if (class->usage_mask & LOCKF_ENABLED_HARDIRQS_READ)
+ *c3 = '-';
+ if (class->usage_mask & LOCKF_USED_IN_HARDIRQ_READ) {
+ *c3 = '+';
+ if (class->usage_mask & LOCKF_ENABLED_HARDIRQS_READ)
+ *c3 = '?';
+ }
+
+ if (class->usage_mask & LOCKF_ENABLED_SOFTIRQS_READ)
+ *c4 = '-';
+ if (class->usage_mask & LOCKF_USED_IN_SOFTIRQ_READ) {
+ *c4 = '+';
+ if (class->usage_mask & LOCKF_ENABLED_SOFTIRQS_READ)
+ *c4 = '?';
+ }
+}
+
+static void print_lock_name(struct lock_class *class)
+{
+ char str[KSYM_NAME_LEN], c1, c2, c3, c4;
+ const char *name;
+
+ get_usage_chars(class, &c1, &c2, &c3, &c4);
+
+ name = class->name;
+ if (!name) {
+ name = __get_key_name(class->key, str);
+ printk(" (%s", name);
+ } else {
+ printk(" (%s", name);
+ if (class->name_version > 1)
+ printk("#%d", class->name_version);
+ if (class->subclass)
+ printk("/%d", class->subclass);
+ }
+ printk("){%c%c%c%c}", c1, c2, c3, c4);
+}
+
+static void print_lockdep_cache(struct lockdep_map *lock)
+{
+ const char *name;
+ char str[KSYM_NAME_LEN];
+
+ name = lock->name;
+ if (!name)
+ name = __get_key_name(lock->key->subkeys, str);
+
+ printk("%s", name);
+}
+
+static void print_lock(struct held_lock *hlock)
+{
+ print_lock_name(hlock->class);
+ printk(", at: ");
+ print_ip_sym(hlock->acquire_ip);
+}
+
+static void lockdep_print_held_locks(struct task_struct *curr)
+{
+ int i, depth = curr->lockdep_depth;
+
+ if (!depth) {
+ printk("no locks held by %s/%d.\n", curr->comm, task_pid_nr(curr));
+ return;
+ }
+ printk("%d lock%s held by %s/%d:\n",
+ depth, depth > 1 ? "s" : "", curr->comm, task_pid_nr(curr));
+
+ for (i = 0; i < depth; i++) {
+ printk(" #%d: ", i);
+ print_lock(curr->held_locks + i);
+ }
+}
+
+static void print_lock_class_header(struct lock_class *class, int depth)
+{
+ int bit;
+
+ printk("%*s->", depth, "");
+ print_lock_name(class);
+ printk(" ops: %lu", class->ops);
+ printk(" {\n");
+
+ for (bit = 0; bit < LOCK_USAGE_STATES; bit++) {
+ if (class->usage_mask & (1 << bit)) {
+ int len = depth;
+
+ len += printk("%*s %s", depth, "", usage_str[bit]);
+ len += printk(" at:\n");
+ print_stack_trace(class->usage_traces + bit, len);
+ }
+ }
+ printk("%*s }\n", depth, "");
+
+ printk("%*s ... key at: ",depth,"");
+ print_ip_sym((unsigned long)class->key);
+}
+
+/*
+ * printk all lock dependencies starting at <entry>:
+ */
+static void print_lock_dependencies(struct lock_class *class, int depth)
+{
+ struct lock_list *entry;
+
+ if (DEBUG_LOCKS_WARN_ON(depth >= 20))
+ return;
+
+ print_lock_class_header(class, depth);
+
+ list_for_each_entry(entry, &class->locks_after, entry) {
+ if (DEBUG_LOCKS_WARN_ON(!entry->class))
+ return;
+
+ print_lock_dependencies(entry->class, depth + 1);
+
+ printk("%*s ... acquired at:\n",depth,"");
+ print_stack_trace(&entry->trace, 2);
+ printk("\n");
+ }
+}
+
+static void print_kernel_version(void)
+{
+ printk("%s %.*s\n", init_utsname()->release,
+ (int)strcspn(init_utsname()->version, " "),
+ init_utsname()->version);
+}
+
+static int very_verbose(struct lock_class *class)
+{
+#if VERY_VERBOSE
+ return class_filter(class);
+#endif
+ return 0;
+}
+
+/*
+ * Is this the address of a static object:
+ */
+static int static_obj(void *obj)
+{
+ unsigned long start = (unsigned long) &_stext,
+ end = (unsigned long) &_end,
+ addr = (unsigned long) obj;
+#ifdef CONFIG_SMP
+ int i;
+#endif
+
+ /*
+ * static variable?
+ */
+ if ((addr >= start) && (addr < end))
+ return 1;
+
+#ifdef CONFIG_SMP
+ /*
+ * percpu var?
+ */
+ for_each_possible_cpu(i) {
+ start = (unsigned long) &__per_cpu_start + per_cpu_offset(i);
+ end = (unsigned long) &__per_cpu_start + PERCPU_ENOUGH_ROOM
+ + per_cpu_offset(i);
+
+ if ((addr >= start) && (addr < end))
+ return 1;
+ }
+#endif
+
+ /*
+ * module var?
+ */
+ return is_module_address(addr);
+}
+
+/*
+ * To make lock name printouts unique, we calculate a unique
+ * class->name_version generation counter:
+ */
+static int count_matching_names(struct lock_class *new_class)
+{
+ struct lock_class *class;
+ int count = 0;
+
+ if (!new_class->name)
+ return 0;
+
+ list_for_each_entry(class, &all_lock_classes, lock_entry) {
+ if (new_class->key - new_class->subclass == class->key)
+ return class->name_version;
+ if (class->name && !strcmp(class->name, new_class->name))
+ count = max(count, class->name_version);
+ }
+
+ return count + 1;
+}
+
+/*
+ * Register a lock's class in the hash-table, if the class is not present
+ * yet. Otherwise we look it up. We cache the result in the lock object
+ * itself, so actual lookup of the hash should be once per lock object.
+ */
+static inline struct lock_class *
+look_up_lock_class(struct lockdep_map *lock, unsigned int subclass)
+{
+ struct lockdep_subclass_key *key;
+ struct list_head *hash_head;
+ struct lock_class *class;
+
+#ifdef CONFIG_DEBUG_LOCKDEP
+ /*
+ * If the architecture calls into lockdep before initializing
+ * the hashes then we'll warn about it later. (we cannot printk
+ * right now)
+ */
+ if (unlikely(!lockdep_initialized)) {
+ lockdep_init();
+ lockdep_init_error = 1;
+ save_stack_trace(&lockdep_init_trace);
+ }
+#endif
+
+ /*
+ * Static locks do not have their class-keys yet - for them the key
+ * is the lock object itself:
+ */
+ if (unlikely(!lock->key))
+ lock->key = (void *)lock;
+
+ /*
+ * NOTE: the class-key must be unique. For dynamic locks, a static
+ * lock_class_key variable is passed in through the mutex_init()
+ * (or spin_lock_init()) call - which acts as the key. For static
+ * locks we use the lock object itself as the key.
+ */
+ BUILD_BUG_ON(sizeof(struct lock_class_key) >
+ sizeof(struct lockdep_map));
+
+ key = lock->key->subkeys + subclass;
+
+ hash_head = classhashentry(key);
+
+ /*
+ * We can walk the hash lockfree, because the hash only
+ * grows, and we are careful when adding entries to the end:
+ */
+ list_for_each_entry(class, hash_head, hash_entry) {
+ if (class->key == key) {
+ WARN_ON_ONCE(class->name != lock->name);
+ return class;
+ }
+ }
+
+ return NULL;
+}
+
+/*
+ * Register a lock's class in the hash-table, if the class is not present
+ * yet. Otherwise we look it up. We cache the result in the lock object
+ * itself, so actual lookup of the hash should be once per lock object.
+ */
+static inline struct lock_class *
+register_lock_class(struct lockdep_map *lock, unsigned int subclass, int force)
+{
+ struct lockdep_subclass_key *key;
+ struct list_head *hash_head;
+ struct lock_class *class;
+ unsigned long flags;
+
+ class = look_up_lock_class(lock, subclass);
+ if (likely(class))
+ return class;
+
+ /*
+ * Debug-check: all keys must be persistent!
+ */
+ if (!static_obj(lock->key)) {
+ debug_locks_off();
+ printk("INFO: trying to register non-static key.\n");
+ printk("the code is fine but needs lockdep annotation.\n");
+ printk("turning off the locking correctness validator.\n");
+ dump_stack();
+
+ return NULL;
+ }
+
+ key = lock->key->subkeys + subclass;
+ hash_head = classhashentry(key);
+
+ raw_local_irq_save(flags);
+ if (!graph_lock()) {
+ raw_local_irq_restore(flags);
+ return NULL;
+ }
+ /*
+ * We have to do the hash-walk again, to avoid races
+ * with another CPU:
+ */
+ list_for_each_entry(class, hash_head, hash_entry)
+ if (class->key == key)
+ goto out_unlock_set;
+ /*
+ * Allocate a new key from the static array, and add it to
+ * the hash:
+ */
+ if (nr_lock_classes >= MAX_LOCKDEP_KEYS) {
+ if (!debug_locks_off_graph_unlock()) {
+ raw_local_irq_restore(flags);
+ return NULL;
+ }
+ raw_local_irq_restore(flags);
+
+ printk("BUG: MAX_LOCKDEP_KEYS too low!\n");
+ printk("turning off the locking correctness validator.\n");
+ return NULL;
+ }
+ class = lock_classes + nr_lock_classes++;
+ debug_atomic_inc(&nr_unused_locks);
+ class->key = key;
+ class->name = lock->name;
+ class->subclass = subclass;
+ INIT_LIST_HEAD(&class->lock_entry);
+ INIT_LIST_HEAD(&class->locks_before);
+ INIT_LIST_HEAD(&class->locks_after);
+ class->name_version = count_matching_names(class);
+ /*
+ * We use RCU's safe list-add method to make
+ * parallel walking of the hash-list safe:
+ */
+ list_add_tail_rcu(&class->hash_entry, hash_head);
+
+ if (verbose(class)) {
+ graph_unlock();
+ raw_local_irq_restore(flags);
+
+ printk("\nnew class %p: %s", class->key, class->name);
+ if (class->name_version > 1)
+ printk("#%d", class->name_version);
+ printk("\n");
+ dump_stack();
+
+ raw_local_irq_save(flags);
+ if (!graph_lock()) {
+ raw_local_irq_restore(flags);
+ return NULL;
+ }
+ }
+out_unlock_set:
+ graph_unlock();
+ raw_local_irq_restore(flags);
+
+ if (!subclass || force)
+ lock->class_cache = class;
+
+ if (DEBUG_LOCKS_WARN_ON(class->subclass != subclass))
+ return NULL;
+
+ return class;
+}
+
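
The lookup/re-check/insert shape of register_lock_class() is a common
pattern worth seeing in isolation. A self-contained, single-threaded
model follows -- the types, pool size and names are illustrative only,
with the kernel's locking points marked in comments:

#include <stdio.h>
#include <string.h>

#define POOL_SIZE 8	/* stand-in for MAX_LOCKDEP_KEYS */

struct lclass { const char *key; struct lclass *next; };

static struct lclass pool[POOL_SIZE];	/* static storage, never freed */
static int nr_used;
static struct lclass *hash_head;

static struct lclass *lookup(const char *key)
{
	struct lclass *c;

	/* the kernel walks this lockless: the list only ever grows */
	for (c = hash_head; c; c = c->next)
		if (!strcmp(c->key, key))
			return c;
	return NULL;
}

static struct lclass *find_or_register(const char *key)
{
	struct lclass *c = lookup(key);

	if (c)
		return c;
	/* graph_lock() would be taken here ... */
	c = lookup(key);	/* re-check: another CPU may have raced us */
	if (!c) {
		if (nr_used >= POOL_SIZE)
			return NULL;	/* "MAX_LOCKDEP_KEYS too low!" */
		c = &pool[nr_used++];
		c->key = key;
		c->next = hash_head;	/* list_add_tail_rcu() in the kernel */
		hash_head = c;
	}
	/* ... and graph_unlock() released here */
	return c;
}

int main(void)
{
	struct lclass *a = find_or_register("some_lock");
	struct lclass *b = find_or_register("some_lock");

	printf("second lookup hit the cache: %s\n", a == b ? "yes" : "no");
	return 0;
}
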
+#ifdef CONFIG_PROVE_LOCKING
+/*
+ * Allocate a lockdep entry. (assumes the graph_lock is held, returns
+ * with NULL on failure)
+ */
+static struct lock_list *alloc_list_entry(void)
+{
+ if (nr_list_entries >= MAX_LOCKDEP_ENTRIES) {
+ if (!debug_locks_off_graph_unlock())
+ return NULL;
+
+ printk("BUG: MAX_LOCKDEP_ENTRIES too low!\n");
+ printk("turning off the locking correctness validator.\n");
+ return NULL;
+ }
+ return list_entries + nr_list_entries++;
+}
+
+/*
+ * Add a new dependency to the head of the list:
+ */
+static int add_lock_to_list(struct lock_class *class, struct lock_class *this,
+ struct list_head *head, unsigned long ip, int distance)
+{
+ struct lock_list *entry;
+ /*
+ * Lock not present yet - get a new dependency struct and
+ * add it to the list:
+ */
+ entry = alloc_list_entry();
+ if (!entry)
+ return 0;
+
+ entry->class = this;
+ entry->distance = distance;
+ if (!save_trace(&entry->trace))
+ return 0;
+
+ /*
+ * Since we never remove from the dependency list, the list can
+ * be walked lockless by other CPUs, it's only allocation
+ * that must be protected by the spinlock. But this also means
+ * we must make new entries visible only once writes to the
+ * entry become visible - hence the RCU op:
+ */
+ list_add_tail_rcu(&entry->entry, head);
+
+ return 1;
+}
+
+/*
+ * Recursive, forwards-direction lock-dependency checking, used for
+ * both noncyclic checking and for hardirq-unsafe/softirq-unsafe
+ * checking.
+ *
+ * (to keep the stackframe of the recursive functions small we
+ * use these global variables, and we also mark various helper
+ * functions as noinline.)
+ */
+static struct held_lock *check_source, *check_target;
+
+/*
+ * Print a dependency chain entry (this is only done when a deadlock
+ * has been detected):
+ */
+static noinline int
+print_circular_bug_entry(struct lock_list *target, unsigned int depth)
+{
+ if (debug_locks_silent)
+ return 0;
+ printk("\n-> #%u", depth);
+ print_lock_name(target->class);
+ printk(":\n");
+ print_stack_trace(&target->trace, 6);
+
+ return 0;
+}
+
+/*
+ * When a circular dependency is detected, print the
+ * header first:
+ */
+static noinline int
+print_circular_bug_header(struct lock_list *entry, unsigned int depth)
+{
+ struct task_struct *curr = current;
+
+ if (!debug_locks_off_graph_unlock() || debug_locks_silent)
+ return 0;
+
+ printk("\n=======================================================\n");
+ printk( "[ INFO: possible circular locking dependency detected ]\n");
+ print_kernel_version();
+ printk( "-------------------------------------------------------\n");
+ printk("%s/%d is trying to acquire lock:\n",
+ curr->comm, task_pid_nr(curr));
+ print_lock(check_source);
+ printk("\nbut task is already holding lock:\n");
+ print_lock(check_target);
+ printk("\nwhich lock already depends on the new lock.\n\n");
+ printk("\nthe existing dependency chain (in reverse order) is:\n");
+
+ print_circular_bug_entry(entry, depth);
+
+ return 0;
+}
+
+static noinline int print_circular_bug_tail(void)
+{
+ struct task_struct *curr = current;
+ struct lock_list this;
+
+ if (debug_locks_silent)
+ return 0;
+
+ this.class = check_source->class;
+ if (!save_trace(&this.trace))
+ return 0;
+
+ print_circular_bug_entry(&this, 0);
+
+ printk("\nother info that might help us debug this:\n\n");
+ lockdep_print_held_locks(curr);
+
+ printk("\nstack backtrace:\n");
+ dump_stack();
+
+ return 0;
+}
+
+#define RECURSION_LIMIT 40
+
+static int noinline print_infinite_recursion_bug(void)
+{
+ if (!debug_locks_off_graph_unlock())
+ return 0;
+
+ WARN_ON(1);
+
+ return 0;
+}
+
+/*
+ * Prove that the dependency graph starting at <entry> can not
+ * lead to <target>. Print an error and return 0 if it does.
+ */
+static noinline int
+check_noncircular(struct lock_class *source, unsigned int depth)
+{
+ struct lock_list *entry;
+
+ debug_atomic_inc(&nr_cyclic_check_recursions);
+ if (depth > max_recursion_depth)
+ max_recursion_depth = depth;
+ if (depth >= RECURSION_LIMIT)
+ return print_infinite_recursion_bug();
+ /*
+ * Check this lock's dependency list:
+ */
+ list_for_each_entry(entry, &source->locks_after, entry) {
+ if (entry->class == check_target->class)
+ return print_circular_bug_header(entry, depth+1);
+ debug_atomic_inc(&nr_cyclic_checks);
+ if (!check_noncircular(entry->class, depth+1))
+ return print_circular_bug_entry(entry, depth+1);
+ }
+ return 1;
+}
+
+#ifdef CONFIG_TRACE_IRQFLAGS
+/*
+ * Forwards and backwards subgraph searching, for the purposes of
+ * proving that two subgraphs can be connected by a new dependency
+ * without creating any illegal irq-safe -> irq-unsafe lock dependency.
+ */
+static enum lock_usage_bit find_usage_bit;
+static struct lock_class *forwards_match, *backwards_match;
+
+/*
+ * Find a node in the forwards-direction dependency sub-graph starting
+ * at <source> that matches <find_usage_bit>.
+ *
+ * Return 2 if such a node exists in the subgraph, and put that node
+ * into <forwards_match>.
+ *
+ * Return 1 otherwise and keep <forwards_match> unchanged.
+ * Return 0 on error.
+ */
+static noinline int
+find_usage_forwards(struct lock_class *source, unsigned int depth)
+{
+ struct lock_list *entry;
+ int ret;
+
+ if (depth > max_recursion_depth)
+ max_recursion_depth = depth;
+ if (depth >= RECURSION_LIMIT)
+ return print_infinite_recursion_bug();
+
+ debug_atomic_inc(&nr_find_usage_forwards_checks);
+ if (source->usage_mask & (1 << find_usage_bit)) {
+ forwards_match = source;
+ return 2;
+ }
+
+ /*
+ * Check this lock's dependency list:
+ */
+ list_for_each_entry(entry, &source->locks_after, entry) {
+ debug_atomic_inc(&nr_find_usage_forwards_recursions);
+ ret = find_usage_forwards(entry->class, depth+1);
+ if (ret == 2 || ret == 0)
+ return ret;
+ }
+ return 1;
+}
+
+/*
+ * Find a node in the backwards-direction dependency sub-graph starting
+ * at <source> that matches <find_usage_bit>.
+ *
+ * Return 2 if such a node exists in the subgraph, and put that node
+ * into <backwards_match>.
+ *
+ * Return 1 otherwise and keep <backwards_match> unchanged.
+ * Return 0 on error.
+ */
+static noinline int
+find_usage_backwards(struct lock_class *source, unsigned int depth)
+{
+ struct lock_list *entry;
+ int ret;
+
+ if (!__raw_spin_is_locked(&lockdep_lock))
+ return DEBUG_LOCKS_WARN_ON(1);
+
+ if (depth > max_recursion_depth)
+ max_recursion_depth = depth;
+ if (depth >= RECURSION_LIMIT)
+ return print_infinite_recursion_bug();
+
+ debug_atomic_inc(&nr_find_usage_backwards_checks);
+ if (source->usage_mask & (1 << find_usage_bit)) {
+ backwards_match = source;
+ return 2;
+ }
+
+ /*
+ * Check this lock's dependency list:
+ */
+ list_for_each_entry(entry, &source->locks_before, entry) {
+ debug_atomic_inc(&nr_find_usage_backwards_recursions);
+ ret = find_usage_backwards(entry->class, depth+1);
+ if (ret == 2 || ret == 0)
+ return ret;
+ }
+ return 1;
+}
+
+static int
+print_bad_irq_dependency(struct task_struct *curr,
+ struct held_lock *prev,
+ struct held_lock *next,
+ enum lock_usage_bit bit1,
+ enum lock_usage_bit bit2,
+ const char *irqclass)
+{
+ if (!debug_locks_off_graph_unlock() || debug_locks_silent)
+ return 0;
+
+ printk("\n======================================================\n");
+ printk( "[ INFO: %s-safe -> %s-unsafe lock order detected ]\n",
+ irqclass, irqclass);
+ print_kernel_version();
+ printk( "------------------------------------------------------\n");
+ printk("%s/%d [HC%u[%lu]:SC%u[%lu]:HE%u:SE%u] is trying to acquire:\n",
+ curr->comm, task_pid_nr(curr),
+ curr->hardirq_context, hardirq_count() >> HARDIRQ_SHIFT,
+ curr->softirq_context, softirq_count() >> SOFTIRQ_SHIFT,
+ curr->hardirqs_enabled,
+ curr->softirqs_enabled);
+ print_lock(next);
+
+ printk("\nand this task is already holding:\n");
+ print_lock(prev);
+ printk("which would create a new lock dependency:\n");
+ print_lock_name(prev->class);
+ printk(" ->");
+ print_lock_name(next->class);
+ printk("\n");
+
+ printk("\nbut this new dependency connects a %s-irq-safe lock:\n",
+ irqclass);
+ print_lock_name(backwards_match);
+ printk("\n... which became %s-irq-safe at:\n", irqclass);
+
+ print_stack_trace(backwards_match->usage_traces + bit1, 1);
+
+ printk("\nto a %s-irq-unsafe lock:\n", irqclass);
+ print_lock_name(forwards_match);
+ printk("\n... which became %s-irq-unsafe at:\n", irqclass);
+ printk("...");
+
+ print_stack_trace(forwards_match->usage_traces + bit2, 1);
+
+ printk("\nother info that might help us debug this:\n\n");
+ lockdep_print_held_locks(curr);
+
+ printk("\nthe %s-irq-safe lock's dependencies:\n", irqclass);
+ print_lock_dependencies(backwards_match, 0);
+
+ printk("\nthe %s-irq-unsafe lock's dependencies:\n", irqclass);
+ print_lock_dependencies(forwards_match, 0);
+
+ printk("\nstack backtrace:\n");
+ dump_stack();
+
+ return 0;
+}
+
+static int
+check_usage(struct task_struct *curr, struct held_lock *prev,
+ struct held_lock *next, enum lock_usage_bit bit_backwards,
+ enum lock_usage_bit bit_forwards, const char *irqclass)
+{
+ int ret;
+
+ find_usage_bit = bit_backwards;
+ /* fills in <backwards_match> */
+ ret = find_usage_backwards(prev->class, 0);
+ if (!ret || ret == 1)
+ return ret;
+
+ find_usage_bit = bit_forwards;
+ ret = find_usage_forwards(next->class, 0);
+ if (!ret || ret == 1)
+ return ret;
+ /* ret == 2 */
+ return print_bad_irq_dependency(curr, prev, next,
+ bit_backwards, bit_forwards, irqclass);
+}
+
+static int
+check_prev_add_irq(struct task_struct *curr, struct held_lock *prev,
+ struct held_lock *next)
+{
+ /*
+ * Prove that the new dependency does not connect a hardirq-safe
+ * lock with a hardirq-unsafe lock - to achieve this we search
+ * the backwards-subgraph starting at <prev>, and the
+ * forwards-subgraph starting at <next>:
+ */
+ if (!check_usage(curr, prev, next, LOCK_USED_IN_HARDIRQ,
+ LOCK_ENABLED_HARDIRQS, "hard"))
+ return 0;
+
+ /*
+ * Prove that the new dependency does not connect a hardirq-safe-read
+ * lock with a hardirq-unsafe lock - to achieve this we search
+ * the backwards-subgraph starting at <prev>, and the
+ * forwards-subgraph starting at <next>:
+ */
+ if (!check_usage(curr, prev, next, LOCK_USED_IN_HARDIRQ_READ,
+ LOCK_ENABLED_HARDIRQS, "hard-read"))
+ return 0;
+
+ /*
+ * Prove that the new dependency does not connect a softirq-safe
+ * lock with a softirq-unsafe lock - to achieve this we search
+ * the backwards-subgraph starting at <prev>, and the
+ * forwards-subgraph starting at <next>:
+ */
+ if (!check_usage(curr, prev, next, LOCK_USED_IN_SOFTIRQ,
+ LOCK_ENABLED_SOFTIRQS, "soft"))
+ return 0;
+ /*
+ * Prove that the new dependency does not connect a softirq-safe-read
+ * lock with a softirq-unsafe lock - to achieve this we search
+ * the backwards-subgraph starting at <prev>, and the
+ * forwards-subgraph starting at <next>:
+ */
+ if (!check_usage(curr, prev, next, LOCK_USED_IN_SOFTIRQ_READ,
+ LOCK_ENABLED_SOFTIRQS, "soft"))
+ return 0;
+
+ return 1;
+}
+
+static void inc_chains(void)
+{
+ if (current->hardirq_context)
+ nr_hardirq_chains++;
+ else {
+ if (current->softirq_context)
+ nr_softirq_chains++;
+ else
+ nr_process_chains++;
+ }
+}
+
+#else
+
+static inline int
+check_prev_add_irq(struct task_struct *curr, struct held_lock *prev,
+ struct held_lock *next)
+{
+ return 1;
+}
+
+static inline void inc_chains(void)
+{
+ nr_process_chains++;
+}
+
+#endif
+
+static int
+print_deadlock_bug(struct task_struct *curr, struct held_lock *prev,
+ struct held_lock *next)
+{
+ if (!debug_locks_off_graph_unlock() || debug_locks_silent)
+ return 0;
+
+ printk("\n=============================================\n");
+ printk( "[ INFO: possible recursive locking detected ]\n");
+ print_kernel_version();
+ printk( "---------------------------------------------\n");
+ printk("%s/%d is trying to acquire lock:\n",
+ curr->comm, task_pid_nr(curr));
+ print_lock(next);
+ printk("\nbut task is already holding lock:\n");
+ print_lock(prev);
+
+ printk("\nother info that might help us debug this:\n");
+ lockdep_print_held_locks(curr);
+
+ printk("\nstack backtrace:\n");
+ dump_stack();
+
+ return 0;
+}
+
+/*
+ * Check whether we are holding such a class already.
+ *
+ * (Note that this has to be done separately, because the graph cannot
+ * detect such classes of deadlocks.)
+ *
+ * Returns: 0 on deadlock detected, 1 on OK, 2 on recursive read
+ */
+static int
+check_deadlock(struct task_struct *curr, struct held_lock *next,
+ struct lockdep_map *next_instance, int read)
+{
+ struct held_lock *prev;
+ int i;
+
+ for (i = 0; i < curr->lockdep_depth; i++) {
+ prev = curr->held_locks + i;
+ if (prev->class != next->class)
+ continue;
+ /*
+ * Allow read-after-read recursion of the same
+ * lock class (i.e. read_lock(lock)+read_lock(lock)):
+ */
+ if ((read == 2) && prev->read)
+ return 2;
+ return print_deadlock_bug(curr, prev, next);
+ }
+ return 1;
+}
+
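
The read-after-read case permitted here can be modeled with a POSIX
rwlock, which likewise lets one thread hold multiple read locks on the
same object:

#include <pthread.h>
#include <stdio.h>

int main(void)
{
	pthread_rwlock_t rw = PTHREAD_RWLOCK_INITIALIZER;

	pthread_rwlock_rdlock(&rw);
	pthread_rwlock_rdlock(&rw);	/* read recursion: allowed */
	printf("read-after-read ok\n");
	pthread_rwlock_unlock(&rw);
	pthread_rwlock_unlock(&rw);
	/*
	 * Re-acquiring the same lock for write while still holding it
	 * would self-deadlock -- the analogue of the
	 * print_deadlock_bug() report above.
	 */
	return 0;
}
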
+/*
+ * There was a chain-cache miss, and we are about to add a new dependency
+ * to a previous lock. We recursively validate the following rules:
+ *
+ * - would the adding of the <prev> -> <next> dependency create a
+ * circular dependency in the graph? [== circular deadlock]
+ *
+ * - does the new prev->next dependency connect any hardirq-safe lock
+ * (in the full backwards-subgraph starting at <prev>) with any
+ * hardirq-unsafe lock (in the full forwards-subgraph starting at
+ * <next>)? [== illegal lock inversion with hardirq contexts]
+ *
+ * - does the new prev->next dependency connect any softirq-safe lock
+ * (in the full backwards-subgraph starting at <prev>) with any
+ * softirq-unsafe lock (in the full forwards-subgraph starting at
+ * <next>)? [== illegal lock inversion with softirq contexts]
+ *
+ * any of these scenarios could lead to a deadlock.
+ *
+ * Then if all the validations pass, we add the forwards and backwards
+ * dependency.
+ */
+static int
+check_prev_add(struct task_struct *curr, struct held_lock *prev,
+ struct held_lock *next, int distance)
+{
+ struct lock_list *entry;
+ int ret;
+
+ /*
+ * Prove that the new <prev> -> <next> dependency would not
+ * create a circular dependency in the graph. (We do this by
+ * forward-recursing into the graph starting at <next>, and
+ * checking whether we can reach <prev>.)
+ *
+ * We are using global variables to control the recursion, to
+ * keep the stackframe size of the recursive functions low:
+ */
+ check_source = next;
+ check_target = prev;
+ if (!(check_noncircular(next->class, 0)))
+ return print_circular_bug_tail();
+
+ if (!check_prev_add_irq(curr, prev, next))
+ return 0;
+
+ /*
+ * For recursive read-locks we do all the dependency checks,
+	 * but we don't store read-triggered dependencies (only
+ * write-triggered dependencies). This ensures that only the
+ * write-side dependencies matter, and that if for example a
+ * write-lock never takes any other locks, then the reads are
+ * equivalent to a NOP.
+ */
+ if (next->read == 2 || prev->read == 2)
+ return 1;
+ /*
+ * Is the <prev> -> <next> dependency already present?
+ *
+ * (this may occur even though this is a new chain: consider
+ * e.g. the L1 -> L2 -> L3 -> L4 and the L5 -> L1 -> L2 -> L3
+ * chains - the second one will be new, but L1 already has
+ * L2 added to its dependency list, due to the first chain.)
+ */
+ list_for_each_entry(entry, &prev->class->locks_after, entry) {
+ if (entry->class == next->class) {
+ if (distance == 1)
+ entry->distance = 1;
+ return 2;
+ }
+ }
+
+ /*
+ * Ok, all validations passed, add the new lock
+ * to the previous lock's dependency list:
+ */
+ ret = add_lock_to_list(prev->class, next->class,
+ &prev->class->locks_after, next->acquire_ip, distance);
+
+ if (!ret)
+ return 0;
+
+ ret = add_lock_to_list(next->class, prev->class,
+ &next->class->locks_before, next->acquire_ip, distance);
+ if (!ret)
+ return 0;
+
+ /*
+ * Debugging printouts:
+ */
+ if (verbose(prev->class) || verbose(next->class)) {
+ graph_unlock();
+ printk("\n new dependency: ");
+ print_lock_name(prev->class);
+ printk(" => ");
+ print_lock_name(next->class);
+ printk("\n");
+ dump_stack();
+ return graph_lock();
+ }
+ return 1;
+}
+
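
The circular-dependency rule is the classic AB-BA inversion; here is a
miniature pthread rendition (run sequentially, so no deadlock actually
occurs at runtime -- precisely the situation lockdep still reports):

#include <pthread.h>

pthread_mutex_t A = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t B = PTHREAD_MUTEX_INITIALIZER;

void *t1(void *arg)
{
	pthread_mutex_lock(&A);
	pthread_mutex_lock(&B);	/* records the A -> B dependency */
	pthread_mutex_unlock(&B);
	pthread_mutex_unlock(&A);
	return arg;
}

void *t2(void *arg)
{
	pthread_mutex_lock(&B);
	pthread_mutex_lock(&A);	/* B -> A: closes the cycle */
	pthread_mutex_unlock(&A);
	pthread_mutex_unlock(&B);
	return arg;
}

int main(void)
{
	t1(NULL);	/* run back-to-back: nothing deadlocks, but the */
	t2(NULL);	/* inverse ordering is exactly what gets flagged */
	return 0;
}
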
+/*
+ * Add the dependency to all directly-previous locks that are 'relevant'.
+ * The ones that are relevant are (in increasing distance from curr):
+ * all consecutive trylock entries and the final non-trylock entry - or
+ * the end of this context's lock-chain - whichever comes first.
+ */
+static int
+check_prevs_add(struct task_struct *curr, struct held_lock *next)
+{
+ int depth = curr->lockdep_depth;
+ struct held_lock *hlock;
+
+ /*
+ * Debugging checks.
+ *
+ * Depth must not be zero for a non-head lock:
+ */
+ if (!depth)
+ goto out_bug;
+ /*
+ * At least two relevant locks must exist for this
+ * to be a head:
+ */
+ if (curr->held_locks[depth].irq_context !=
+ curr->held_locks[depth-1].irq_context)
+ goto out_bug;
+
+ for (;;) {
+ int distance = curr->lockdep_depth - depth + 1;
+ hlock = curr->held_locks + depth-1;
+ /*
+ * Only non-recursive-read entries get new dependencies
+ * added:
+ */
+ if (hlock->read != 2) {
+ if (!check_prev_add(curr, hlock, next, distance))
+ return 0;
+ /*
+ * Stop after the first non-trylock entry,
+ * as non-trylock entries have added their
+ * own direct dependencies already, so this
+ * lock is connected to them indirectly:
+ */
+ if (!hlock->trylock)
+ break;
+ }
+ depth--;
+ /*
+ * End of lock-stack?
+ */
+ if (!depth)
+ break;
+ /*
+ * Stop the search if we cross into another context:
+ */
+ if (curr->held_locks[depth].irq_context !=
+ curr->held_locks[depth-1].irq_context)
+ break;
+ }
+ return 1;
+out_bug:
+ if (!debug_locks_off_graph_unlock())
+ return 0;
+
+ WARN_ON(1);
+
+ return 0;
+}
+
+unsigned long nr_lock_chains;
+static struct lock_chain lock_chains[MAX_LOCKDEP_CHAINS];
+
+/*
+ * Look up a dependency chain. If the key is not present yet then
+ * add it and return 1 - in this case the new dependency chain is
+ * validated. If the key is already hashed, return 0.
+ * (On return with 1 graph_lock is held.)
+ */
+static inline int lookup_chain_cache(u64 chain_key, struct lock_class *class)
+{
+ struct list_head *hash_head = chainhashentry(chain_key);
+ struct lock_chain *chain;
+
+ if (DEBUG_LOCKS_WARN_ON(!irqs_disabled()))
+ return 0;
+ /*
+ * We can walk it lock-free, because entries only get added
+ * to the hash:
+ */
+ list_for_each_entry(chain, hash_head, entry) {
+ if (chain->chain_key == chain_key) {
+cache_hit:
+ debug_atomic_inc(&chain_lookup_hits);
+ if (very_verbose(class))
+ printk("\nhash chain already cached, key: "
+ "%016Lx tail class: [%p] %s\n",
+ (unsigned long long)chain_key,
+ class->key, class->name);
+ return 0;
+ }
+ }
+ if (very_verbose(class))
+ printk("\nnew hash chain, key: %016Lx tail class: [%p] %s\n",
+ (unsigned long long)chain_key, class->key, class->name);
+ /*
+ * Allocate a new chain entry from the static array, and add
+ * it to the hash:
+ */
+ if (!graph_lock())
+ return 0;
+ /*
+ * We have to walk the chain again locked - to avoid duplicates:
+ */
+ list_for_each_entry(chain, hash_head, entry) {
+ if (chain->chain_key == chain_key) {
+ graph_unlock();
+ goto cache_hit;
+ }
+ }
+ if (unlikely(nr_lock_chains >= MAX_LOCKDEP_CHAINS)) {
+ if (!debug_locks_off_graph_unlock())
+ return 0;
+
+ printk("BUG: MAX_LOCKDEP_CHAINS too low!\n");
+ printk("turning off the locking correctness validator.\n");
+ return 0;
+ }
+ chain = lock_chains + nr_lock_chains++;
+ chain->chain_key = chain_key;
+ list_add_tail_rcu(&chain->entry, hash_head);
+ debug_atomic_inc(&chain_lookup_misses);
+ inc_chains();
+
+ return 1;
+}
+
+static int validate_chain(struct task_struct *curr, struct lockdep_map *lock,
+ struct held_lock *hlock, int chain_head, u64 chain_key)
+{
+ /*
+ * Trylock needs to maintain the stack of held locks, but it
+ * does not add new dependencies, because trylock can be done
+ * in any order.
+ *
+ * We look up the chain_key and do the O(N^2) check and update of
+ * the dependencies only if this is a new dependency chain.
+ * (If lookup_chain_cache() returns with 1 it acquires
+ * graph_lock for us)
+ */
+ if (!hlock->trylock && (hlock->check == 2) &&
+ lookup_chain_cache(chain_key, hlock->class)) {
+ /*
+ * Check whether last held lock:
+ *
+ * - is irq-safe, if this lock is irq-unsafe
+ * - is softirq-safe, if this lock is hardirq-unsafe
+ *
+ * And check whether the new lock's dependency graph
+ * could lead back to the previous lock.
+ *
+	 * any of these scenarios could lead to a deadlock. If
+	 * all validations pass, we go on to add the dependency.
+ */
+ int ret = check_deadlock(curr, hlock, lock, hlock->read);
+
+ if (!ret)
+ return 0;
+ /*
+ * Mark recursive read, as we jump over it when
+ * building dependencies (just like we jump over
+ * trylock entries):
+ */
+ if (ret == 2)
+ hlock->read = 2;
+ /*
+ * Add dependency only if this lock is not the head
+ * of the chain, and if it's not a secondary read-lock:
+ */
+ if (!chain_head && ret != 2)
+ if (!check_prevs_add(curr, hlock))
+ return 0;
+ graph_unlock();
+ } else
+ /* after lookup_chain_cache(): */
+ if (unlikely(!debug_locks))
+ return 0;
+
+ return 1;
+}
+#else
+static inline int validate_chain(struct task_struct *curr,
+ struct lockdep_map *lock, struct held_lock *hlock,
+ int chain_head, u64 chain_key)
+{
+ return 1;
+}
+#endif
+
+/*
+ * We are building curr_chain_key incrementally, so double-check
+ * it from scratch, to make sure that it's done correctly:
+ */
+static void check_chain_key(struct task_struct *curr)
+{
+#ifdef CONFIG_DEBUG_LOCKDEP
+ struct held_lock *hlock, *prev_hlock = NULL;
+ unsigned int i, id;
+ u64 chain_key = 0;
+
+ for (i = 0; i < curr->lockdep_depth; i++) {
+ hlock = curr->held_locks + i;
+ if (chain_key != hlock->prev_chain_key) {
+ debug_locks_off();
+ printk("hm#1, depth: %u [%u], %016Lx != %016Lx\n",
+ curr->lockdep_depth, i,
+ (unsigned long long)chain_key,
+ (unsigned long long)hlock->prev_chain_key);
+ WARN_ON(1);
+ return;
+ }
+ id = hlock->class - lock_classes;
+ if (DEBUG_LOCKS_WARN_ON(id >= MAX_LOCKDEP_KEYS))
+ return;
+
+ if (prev_hlock && (prev_hlock->irq_context !=
+ hlock->irq_context))
+ chain_key = 0;
+ chain_key = iterate_chain_key(chain_key, id);
+ prev_hlock = hlock;
+ }
+ if (chain_key != curr->curr_chain_key) {
+ debug_locks_off();
+ printk("hm#2, depth: %u [%u], %016Lx != %016Lx\n",
+ curr->lockdep_depth, i,
+ (unsigned long long)chain_key,
+ (unsigned long long)curr->curr_chain_key);
+ WARN_ON(1);
+ }
+#endif
+}
+
+static int
+print_usage_bug(struct task_struct *curr, struct held_lock *this,
+ enum lock_usage_bit prev_bit, enum lock_usage_bit new_bit)
+{
+ if (!debug_locks_off_graph_unlock() || debug_locks_silent)
+ return 0;
+
+ printk("\n=================================\n");
+ printk( "[ INFO: inconsistent lock state ]\n");
+ print_kernel_version();
+ printk( "---------------------------------\n");
+
+ printk("inconsistent {%s} -> {%s} usage.\n",
+ usage_str[prev_bit], usage_str[new_bit]);
+
+ printk("%s/%d [HC%u[%lu]:SC%u[%lu]:HE%u:SE%u] takes:\n",
+ curr->comm, task_pid_nr(curr),
+ trace_hardirq_context(curr), hardirq_count() >> HARDIRQ_SHIFT,
+ trace_softirq_context(curr), softirq_count() >> SOFTIRQ_SHIFT,
+ trace_hardirqs_enabled(curr),
+ trace_softirqs_enabled(curr));
+ print_lock(this);
+
+ printk("{%s} state was registered at:\n", usage_str[prev_bit]);
+ print_stack_trace(this->class->usage_traces + prev_bit, 1);
+
+ print_irqtrace_events(curr);
+ printk("\nother info that might help us debug this:\n");
+ lockdep_print_held_locks(curr);
+
+ printk("\nstack backtrace:\n");
+ dump_stack();
+
+ return 0;
+}
+
+/*
+ * Print out an error if an invalid bit is set:
+ */
+static inline int
+valid_state(struct task_struct *curr, struct held_lock *this,
+ enum lock_usage_bit new_bit, enum lock_usage_bit bad_bit)
+{
+ if (unlikely(this->class->usage_mask & (1 << bad_bit)))
+ return print_usage_bug(curr, this, bad_bit, new_bit);
+ return 1;
+}
+
+static int mark_lock(struct task_struct *curr, struct held_lock *this,
+ enum lock_usage_bit new_bit);
+
+#ifdef CONFIG_TRACE_IRQFLAGS
+
+/*
+ * print irq inversion bug:
+ */
+static int
+print_irq_inversion_bug(struct task_struct *curr, struct lock_class *other,
+ struct held_lock *this, int forwards,
+ const char *irqclass)
+{
+ if (!debug_locks_off_graph_unlock() || debug_locks_silent)
+ return 0;
+
+ printk("\n=========================================================\n");
+ printk( "[ INFO: possible irq lock inversion dependency detected ]\n");
+ print_kernel_version();
+ printk( "---------------------------------------------------------\n");
+ printk("%s/%d just changed the state of lock:\n",
+ curr->comm, task_pid_nr(curr));
+ print_lock(this);
+ if (forwards)
+ printk("but this lock took another, %s-irq-unsafe lock in the past:\n", irqclass);
+ else
+ printk("but this lock was taken by another, %s-irq-safe lock in the past:\n", irqclass);
+ print_lock_name(other);
+ printk("\n\nand interrupts could create inverse lock ordering between them.\n\n");
+
+ printk("\nother info that might help us debug this:\n");
+ lockdep_print_held_locks(curr);
+
+ printk("\nthe first lock's dependencies:\n");
+ print_lock_dependencies(this->class, 0);
+
+ printk("\nthe second lock's dependencies:\n");
+ print_lock_dependencies(other, 0);
+
+ printk("\nstack backtrace:\n");
+ dump_stack();
+
+ return 0;
+}
+
+/*
+ * Prove that in the forwards-direction subgraph starting at <this>
+ * there is no lock matching <mask>:
+ */
+static int
+check_usage_forwards(struct task_struct *curr, struct held_lock *this,
+ enum lock_usage_bit bit, const char *irqclass)
+{
+ int ret;
+
+ find_usage_bit = bit;
+ /* fills in <forwards_match> */
+ ret = find_usage_forwards(this->class, 0);
+ if (!ret || ret == 1)
+ return ret;
+
+ return print_irq_inversion_bug(curr, forwards_match, this, 1, irqclass);
+}
+
+/*
+ * Prove that in the backwards-direction subgraph starting at <this>
+ * there is no lock matching <mask>:
+ */
+static int
+check_usage_backwards(struct task_struct *curr, struct held_lock *this,
+ enum lock_usage_bit bit, const char *irqclass)
+{
+ int ret;
+
+ find_usage_bit = bit;
+ /* fills in <backwards_match> */
+ ret = find_usage_backwards(this->class, 0);
+ if (!ret || ret == 1)
+ return ret;
+
+ return print_irq_inversion_bug(curr, backwards_match, this, 0, irqclass);
+}
+
+void print_irqtrace_events(struct task_struct *curr)
+{
+ printk("irq event stamp: %u\n", curr->irq_events);
+ printk("hardirqs last enabled at (%u): ", curr->hardirq_enable_event);
+ print_ip_sym(curr->hardirq_enable_ip);
+ printk("hardirqs last disabled at (%u): ", curr->hardirq_disable_event);
+ print_ip_sym(curr->hardirq_disable_ip);
+ printk("softirqs last enabled at (%u): ", curr->softirq_enable_event);
+ print_ip_sym(curr->softirq_enable_ip);
+ printk("softirqs last disabled at (%u): ", curr->softirq_disable_event);
+ print_ip_sym(curr->softirq_disable_ip);
+}
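+
+/*
+ * The four stamps printed above are recorded by trace_hardirqs_on()/
+ * trace_hardirqs_off() and trace_softirqs_on()/trace_softirqs_off()
+ * below, so a report shows exactly where each irq state last changed.
+ */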
+
+static int hardirq_verbose(struct lock_class *class)
+{
+#if HARDIRQ_VERBOSE
+ return class_filter(class);
+#endif
+ return 0;
+}
+
+static int softirq_verbose(struct lock_class *class)
+{
+#if SOFTIRQ_VERBOSE
+ return class_filter(class);
+#endif
+ return 0;
+}
+
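+/*
+ * With STRICT_READ_CHECKS set, the state transitions below also
+ * validate the *_READ usage bits (e.g. a hardirq-safe lock must
+ * never have taken a hardirq-unsafe-read lock), not just the
+ * write-side bits.
+ */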
+#define STRICT_READ_CHECKS 1
+
+static int mark_lock_irq(struct task_struct *curr, struct held_lock *this,
+ enum lock_usage_bit new_bit)
+{
+ int ret = 1;
+
+ switch(new_bit) {
+ case LOCK_USED_IN_HARDIRQ:
+ if (!valid_state(curr, this, new_bit, LOCK_ENABLED_HARDIRQS))
+ return 0;
+ if (!valid_state(curr, this, new_bit,
+ LOCK_ENABLED_HARDIRQS_READ))
+ return 0;
+ /*
+ * just marked it hardirq-safe, check that this lock
+ * took no hardirq-unsafe lock in the past:
+ */
+ if (!check_usage_forwards(curr, this,
+ LOCK_ENABLED_HARDIRQS, "hard"))
+ return 0;
+#if STRICT_READ_CHECKS
+ /*
+ * just marked it hardirq-safe, check that this lock
+ * took no hardirq-unsafe-read lock in the past:
+ */
+ if (!check_usage_forwards(curr, this,
+ LOCK_ENABLED_HARDIRQS_READ, "hard-read"))
+ return 0;
+#endif
+ if (hardirq_verbose(this->class))
+ ret = 2;
+ break;
+ case LOCK_USED_IN_SOFTIRQ:
+ if (!valid_state(curr, this, new_bit, LOCK_ENABLED_SOFTIRQS))
+ return 0;
+ if (!valid_state(curr, this, new_bit,
+ LOCK_ENABLED_SOFTIRQS_READ))
+ return 0;
+ /*
+ * just marked it softirq-safe, check that this lock
+ * took no softirq-unsafe lock in the past:
+ */
+ if (!check_usage_forwards(curr, this,
+ LOCK_ENABLED_SOFTIRQS, "soft"))
+ return 0;
+#if STRICT_READ_CHECKS
+ /*
+ * just marked it softirq-safe, check that this lock
+ * took no softirq-unsafe-read lock in the past:
+ */
+ if (!check_usage_forwards(curr, this,
+ LOCK_ENABLED_SOFTIRQS_READ, "soft-read"))
+ return 0;
+#endif
+ if (softirq_verbose(this->class))
+ ret = 2;
+ break;
+ case LOCK_USED_IN_HARDIRQ_READ:
+ if (!valid_state(curr, this, new_bit, LOCK_ENABLED_HARDIRQS))
+ return 0;
+ /*
+ * just marked it hardirq-read-safe, check that this lock
+ * took no hardirq-unsafe lock in the past:
+ */
+ if (!check_usage_forwards(curr, this,
+ LOCK_ENABLED_HARDIRQS, "hard"))
+ return 0;
+ if (hardirq_verbose(this->class))
+ ret = 2;
+ break;
+ case LOCK_USED_IN_SOFTIRQ_READ:
+ if (!valid_state(curr, this, new_bit, LOCK_ENABLED_SOFTIRQS))
+ return 0;
+ /*
+ * just marked it softirq-read-safe, check that this lock
+ * took no softirq-unsafe lock in the past:
+ */
+ if (!check_usage_forwards(curr, this,
+ LOCK_ENABLED_SOFTIRQS, "soft"))
+ return 0;
+ if (softirq_verbose(this->class))
+ ret = 2;
+ break;
+ case LOCK_ENABLED_HARDIRQS:
+ if (!valid_state(curr, this, new_bit, LOCK_USED_IN_HARDIRQ))
+ return 0;
+ if (!valid_state(curr, this, new_bit,
+ LOCK_USED_IN_HARDIRQ_READ))
+ return 0;
+ /*
+ * just marked it hardirq-unsafe, check that no hardirq-safe
+ * lock in the system ever took it in the past:
+ */
+ if (!check_usage_backwards(curr, this,
+ LOCK_USED_IN_HARDIRQ, "hard"))
+ return 0;
+#if STRICT_READ_CHECKS
+ /*
+ * just marked it hardirq-unsafe, check that no
+ * hardirq-safe-read lock in the system ever took
+ * it in the past:
+ */
+ if (!check_usage_backwards(curr, this,
+ LOCK_USED_IN_HARDIRQ_READ, "hard-read"))
+ return 0;
+#endif
+ if (hardirq_verbose(this->class))
+ ret = 2;
+ break;
+ case LOCK_ENABLED_SOFTIRQS:
+ if (!valid_state(curr, this, new_bit, LOCK_USED_IN_SOFTIRQ))
+ return 0;
+ if (!valid_state(curr, this, new_bit,
+ LOCK_USED_IN_SOFTIRQ_READ))
+ return 0;
+ /*
+ * just marked it softirq-unsafe, check that no softirq-safe
+ * lock in the system ever took it in the past:
+ */
+ if (!check_usage_backwards(curr, this,
+ LOCK_USED_IN_SOFTIRQ, "soft"))
+ return 0;
+#if STRICT_READ_CHECKS
+ /*
+ * just marked it softirq-unsafe, check that no
+ * softirq-safe-read lock in the system ever took
+ * it in the past:
+ */
+ if (!check_usage_backwards(curr, this,
+ LOCK_USED_IN_SOFTIRQ_READ, "soft-read"))
+ return 0;
+#endif
+ if (softirq_verbose(this->class))
+ ret = 2;
+ break;
+ case LOCK_ENABLED_HARDIRQS_READ:
+ if (!valid_state(curr, this, new_bit, LOCK_USED_IN_HARDIRQ))
+ return 0;
+#if STRICT_READ_CHECKS
+ /*
+ * just marked it hardirq-read-unsafe, check that no
+ * hardirq-safe lock in the system ever took it in the past:
+ */
+ if (!check_usage_backwards(curr, this,
+ LOCK_USED_IN_HARDIRQ, "hard"))
+ return 0;
+#endif
+ if (hardirq_verbose(this->class))
+ ret = 2;
+ break;
+ case LOCK_ENABLED_SOFTIRQS_READ:
+ if (!valid_state(curr, this, new_bit, LOCK_USED_IN_SOFTIRQ))
+ return 0;
+#if STRICT_READ_CHECKS
+ /*
+ * just marked it softirq-read-unsafe, check that no
+ * softirq-safe lock in the system ever took it in the past:
+ */
+ if (!check_usage_backwards(curr, this,
+ LOCK_USED_IN_SOFTIRQ, "soft"))
+ return 0;
+#endif
+ if (softirq_verbose(this->class))
+ ret = 2;
+ break;
+ default:
+ WARN_ON(1);
+ break;
+ }
+
+ return ret;
+}
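+
+/*
+ * In short: marking a lock irq-safe (the LOCK_USED_IN_* cases above)
+ * searches the dependency graph forwards for an irq-unsafe lock it
+ * ever took, while marking it irq-unsafe (the LOCK_ENABLED_* cases)
+ * searches backwards for an irq-safe lock that ever took it. Either
+ * match means an interrupt could create inverse lock ordering.
+ */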
+
+/*
+ * Mark all held locks with a usage bit:
+ */
+static int
+mark_held_locks(struct task_struct *curr, int hardirq)
+{
+ enum lock_usage_bit usage_bit;
+ struct held_lock *hlock;
+ int i;
+
+ for (i = 0; i < curr->lockdep_depth; i++) {
+ hlock = curr->held_locks + i;
+
+ if (hardirq) {
+ if (hlock->read)
+ usage_bit = LOCK_ENABLED_HARDIRQS_READ;
+ else
+ usage_bit = LOCK_ENABLED_HARDIRQS;
+ } else {
+ if (hlock->read)
+ usage_bit = LOCK_ENABLED_SOFTIRQS_READ;
+ else
+ usage_bit = LOCK_ENABLED_SOFTIRQS;
+ }
+ if (!mark_lock(curr, hlock, usage_bit))
+ return 0;
+ }
+
+ return 1;
+}
+
+/*
+ * Debugging helper: via this flag we know that we are in
+ * 'early bootup code', and will warn about any invalid irqs-on event:
+ */
+static int early_boot_irqs_enabled;
+
+void early_boot_irqs_off(void)
+{
+ early_boot_irqs_enabled = 0;
+}
+
+void early_boot_irqs_on(void)
+{
+ early_boot_irqs_enabled = 1;
+}
+
+/*
+ * Hardirqs will be enabled:
+ */
+void trace_hardirqs_on(void)
+{
+ struct task_struct *curr = current;
+ unsigned long ip;
+
+ if (unlikely(!debug_locks || current->lockdep_recursion))
+ return;
+
+ if (DEBUG_LOCKS_WARN_ON(unlikely(!early_boot_irqs_enabled)))
+ return;
+
+ if (unlikely(curr->hardirqs_enabled)) {
+ debug_atomic_inc(&redundant_hardirqs_on);
+ return;
+ }
+ /* we'll do an OFF -> ON transition: */
+ curr->hardirqs_enabled = 1;
+ ip = (unsigned long) __builtin_return_address(0);
+
+ if (DEBUG_LOCKS_WARN_ON(!irqs_disabled()))
+ return;
+ if (DEBUG_LOCKS_WARN_ON(current->hardirq_context))
+ return;
+ /*
+ * We are going to turn hardirqs on, so set the
+ * usage bit for all held locks:
+ */
+ if (!mark_held_locks(curr, 1))
+ return;
+ /*
+ * If we have softirqs enabled, then set the usage
+ * bit for all held locks. (disabled hardirqs prevented
+ * this bit from being set before)
+ */
+ if (curr->softirqs_enabled)
+ if (!mark_held_locks(curr, 0))
+ return;
+
+ curr->hardirq_enable_ip = ip;
+ curr->hardirq_enable_event = ++curr->irq_events;
+ debug_atomic_inc(&hardirqs_on_events);
+}
+
+EXPORT_SYMBOL(trace_hardirqs_on);
+
+/*
+ * Hardirqs were disabled:
+ */
+void trace_hardirqs_off(void)
+{
+ struct task_struct *curr = current;
+
+ if (unlikely(!debug_locks || current->lockdep_recursion))
+ return;
+
+ if (DEBUG_LOCKS_WARN_ON(!irqs_disabled()))
+ return;
+
+ if (curr->hardirqs_enabled) {
+ /*
+ * We have done an ON -> OFF transition:
+ */
+ curr->hardirqs_enabled = 0;
+ curr->hardirq_disable_ip = _RET_IP_;
+ curr->hardirq_disable_event = ++curr->irq_events;
+ debug_atomic_inc(&hardirqs_off_events);
+ } else
+ debug_atomic_inc(&redundant_hardirqs_off);
+}
+
+EXPORT_SYMBOL(trace_hardirqs_off);
+
+/*
+ * Softirqs will be enabled:
+ */
+void trace_softirqs_on(unsigned long ip)
+{
+ struct task_struct *curr = current;
+
+ if (unlikely(!debug_locks))
+ return;
+
+ if (DEBUG_LOCKS_WARN_ON(!irqs_disabled()))
+ return;
+
+ if (curr->softirqs_enabled) {
+ debug_atomic_inc(&redundant_softirqs_on);
+ return;
+ }
+
+ /*
+ * We'll do an OFF -> ON transition:
+ */
+ curr->softirqs_enabled = 1;
+ curr->softirq_enable_ip = ip;
+ curr->softirq_enable_event = ++curr->irq_events;
+ debug_atomic_inc(&softirqs_on_events);
+ /*
+ * We are going to turn softirqs on, so set the
+ * usage bit for all held locks, if hardirqs are
+ * enabled too:
+ */
+ if (curr->hardirqs_enabled)
+ mark_held_locks(curr, 0);
+}
+
+/*
+ * Softirqs were disabled:
+ */
+void trace_softirqs_off(unsigned long ip)
+{
+ struct task_struct *curr = current;
+
+ if (unlikely(!debug_locks))
+ return;
+
+ if (DEBUG_LOCKS_WARN_ON(!irqs_disabled()))
+ return;
+
+ if (curr->softirqs_enabled) {
+ /*
+ * We have done an ON -> OFF transition:
+ */
+ curr->softirqs_enabled = 0;
+ curr->softirq_disable_ip = ip;
+ curr->softirq_disable_event = ++curr->irq_events;
+ debug_atomic_inc(&softirqs_off_events);
+ DEBUG_LOCKS_WARN_ON(!softirq_count());
+ } else
+ debug_atomic_inc(&redundant_softirqs_off);
+}
+
+static int mark_irqflags(struct task_struct *curr, struct held_lock *hlock)
+{
+ /*
+	 * If this is a non-trylock acquisition in a hardirq or softirq
+	 * context, mark the lock as used in that context:
+ */
+ if (!hlock->trylock) {
+ if (hlock->read) {
+ if (curr->hardirq_context)
+ if (!mark_lock(curr, hlock,
+ LOCK_USED_IN_HARDIRQ_READ))
+ return 0;
+ if (curr->softirq_context)
+ if (!mark_lock(curr, hlock,
+ LOCK_USED_IN_SOFTIRQ_READ))
+ return 0;
+ } else {
+ if (curr->hardirq_context)
+ if (!mark_lock(curr, hlock, LOCK_USED_IN_HARDIRQ))
+ return 0;
+ if (curr->softirq_context)
+ if (!mark_lock(curr, hlock, LOCK_USED_IN_SOFTIRQ))
+ return 0;
+ }
+ }
+ if (!hlock->hardirqs_off) {
+ if (hlock->read) {
+ if (!mark_lock(curr, hlock,
+ LOCK_ENABLED_HARDIRQS_READ))
+ return 0;
+ if (curr->softirqs_enabled)
+ if (!mark_lock(curr, hlock,
+ LOCK_ENABLED_SOFTIRQS_READ))
+ return 0;
+ } else {
+ if (!mark_lock(curr, hlock,
+ LOCK_ENABLED_HARDIRQS))
+ return 0;
+ if (curr->softirqs_enabled)
+ if (!mark_lock(curr, hlock,
+ LOCK_ENABLED_SOFTIRQS))
+ return 0;
+ }
+ }
+
+ return 1;
+}
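+
+/*
+ * Net effect: every acquisition records both where the lock is used
+ * (the USED_IN_* bits) and which irq types were enabled while it was
+ * taken (the ENABLED_* bits); mark_lock_irq() cross-checks the two
+ * sets against each other.
+ */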
+
+static int separate_irq_context(struct task_struct *curr,
+ struct held_lock *hlock)
+{
+ unsigned int depth = curr->lockdep_depth;
+
+ /*
+ * Keep track of points where we cross into an interrupt context:
+ */
+ hlock->irq_context = 2*(curr->hardirq_context ? 1 : 0) +
+ curr->softirq_context;
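+	/*
+	 * The resulting value is 0 in process context, 1 in softirq
+	 * context and 2 or 3 once a hardirq is involved.
+	 */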
+ if (depth) {
+ struct held_lock *prev_hlock;
+
+ prev_hlock = curr->held_locks + depth-1;
+ /*
+ * If we cross into another context, reset the
+		 * hash key (this also prevents checking and adding
+		 * the dependency to 'prev'):
+ */
+ if (prev_hlock->irq_context != hlock->irq_context)
+ return 1;
+ }
+ return 0;
+}
+
+#else
+
+static inline
+int mark_lock_irq(struct task_struct *curr, struct held_lock *this,
+ enum lock_usage_bit new_bit)
+{
+ WARN_ON(1);
+ return 1;
+}
+
+static inline int mark_irqflags(struct task_struct *curr,
+ struct held_lock *hlock)
+{
+ return 1;
+}
+
+static inline int separate_irq_context(struct task_struct *curr,
+ struct held_lock *hlock)
+{
+ return 0;
+}
+
+#endif
+
+/*
+ * Mark a lock with a usage bit, and validate the state transition:
+ */
+static int mark_lock(struct task_struct *curr, struct held_lock *this,
+ enum lock_usage_bit new_bit)
+{
+	unsigned int new_mask = 1 << new_bit;
+	int ret = 1;
+
+ /*
+ * If already set then do not dirty the cacheline,
+ * nor do any checks:
+ */
+ if (likely(this->class->usage_mask & new_mask))
+ return 1;
+
+ if (!graph_lock())
+ return 0;
+ /*
+	 * Make sure we didn't race:
+ */
+ if (unlikely(this->class->usage_mask & new_mask)) {
+ graph_unlock();
+ return 1;
+ }
+
+ this->class->usage_mask |= new_mask;
+
+ if (!save_trace(this->class->usage_traces + new_bit))
+ return 0;
+
+ switch (new_bit) {
+ case LOCK_USED_IN_HARDIRQ:
+ case LOCK_USED_IN_SOFTIRQ:
+ case LOCK_USED_IN_HARDIRQ_READ:
+ case LOCK_USED_IN_SOFTIRQ_READ:
+ case LOCK_ENABLED_HARDIRQS:
+ case LOCK_ENABLED_SOFTIRQS:
+ case LOCK_ENABLED_HARDIRQS_READ:
+ case LOCK_ENABLED_SOFTIRQS_READ:
+ ret = mark_lock_irq(curr, this, new_bit);
+ if (!ret)
+ return 0;
+ break;
+ case LOCK_USED:
+ /*
+ * Add it to the global list of classes:
+ */
+ list_add_tail_rcu(&this->class->lock_entry, &all_lock_classes);
+ debug_atomic_dec(&nr_unused_locks);
+ break;
+ default:
+ if (!debug_locks_off_graph_unlock())
+ return 0;
+ WARN_ON(1);
+ return 0;
+ }
+
+ graph_unlock();
+
+ /*
+ * We must printk outside of the graph_lock:
+ */
+ if (ret == 2) {
+ printk("\nmarked lock as {%s}:\n", usage_str[new_bit]);
+ print_lock(this);
+ print_irqtrace_events(curr);
+ dump_stack();
+ }
+
+ return ret;
+}
+
+/*
+ * Initialize a lock instance's lock-class mapping info:
+ */
+void lockdep_init_map(struct lockdep_map *lock, const char *name,
+ struct lock_class_key *key, int subclass)
+{
+ if (unlikely(!debug_locks))
+ return;
+
+ if (DEBUG_LOCKS_WARN_ON(!key))
+ return;
+ if (DEBUG_LOCKS_WARN_ON(!name))
+ return;
+ /*
+ * Sanity check, the lock-class key must be persistent:
+ */
+ if (!static_obj(key)) {
+ printk("BUG: key %p not in .data!\n", key);
+ DEBUG_LOCKS_WARN_ON(1);
+ return;
+ }
+ lock->name = name;
+ lock->key = key;
+ lock->class_cache = NULL;
+#ifdef CONFIG_LOCK_STAT
+ lock->cpu = raw_smp_processor_id();
+#endif
+ if (subclass)
+ register_lock_class(lock, subclass, 1);
+}
+
+EXPORT_SYMBOL_GPL(lockdep_init_map);
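+
+/*
+ * Lock initializers such as spin_lock_init() and mutex_init() supply
+ * a static struct lock_class_key here, which is why the static_obj()
+ * check above rejects keys living on the stack or in heap memory.
+ */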
+
+/*
+ * This gets called for every mutex_lock*()/spin_lock*() operation.
+ * We maintain the dependency maps and validate the locking attempt:
+ */
+static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
+ int trylock, int read, int check, int hardirqs_off,
+ unsigned long ip)
+{
+ struct task_struct *curr = current;
+ struct lock_class *class = NULL;
+ struct held_lock *hlock;
+ unsigned int depth, id;
+ int chain_head = 0;
+ u64 chain_key;
+
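+	/*
+	 * Without CONFIG_PROVE_LOCKING, downgrade full dependency
+	 * validation (check == 2) to simple sanity checks:
+	 */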
+ if (!prove_locking)
+ check = 1;
+
+ if (unlikely(!debug_locks))
+ return 0;
+
+ if (DEBUG_LOCKS_WARN_ON(!irqs_disabled()))
+ return 0;
+
+ if (unlikely(subclass >= MAX_LOCKDEP_SUBCLASSES)) {
+ debug_locks_off();
+ printk("BUG: MAX_LOCKDEP_SUBCLASSES too low!\n");
+ printk("turning off the locking correctness validator.\n");
+ return 0;
+ }
+
+ if (!subclass)
+ class = lock->class_cache;
+ /*
+ * Not cached yet or subclass?
+ */
+ if (unlikely(!class)) {
+ class = register_lock_class(lock, subclass, 0);
+ if (!class)
+ return 0;
+ }
+ debug_atomic_inc((atomic_t *)&class->ops);
+ if (very_verbose(class)) {
+ printk("\nacquire class [%p] %s", class->key, class->name);
+ if (class->name_version > 1)
+ printk("#%d", class->name_version);
+ printk("\n");
+ dump_stack();
+ }
+
+ /*
+ * Add the lock to the list of currently held locks.
+	 * (we don't increase the depth just yet, not until the
+ * dependency checks are done)
+ */
+ depth = curr->lockdep_depth;
+ if (DEBUG_LOCKS_WARN_ON(depth >= MAX_LOCK_DEPTH))
+ return 0;
+
+ hlock = curr->held_locks + depth;
+
+ hlock->class = class;
+ hlock->acquire_ip = ip;
+ hlock->instance = lock;
+ hlock->trylock = trylock;
+ hlock->read = read;
+ hlock->check = check;
+ hlock->hardirqs_off = hardirqs_off;
+#ifdef CONFIG_LOCK_STAT
+ hlock->waittime_stamp = 0;
+ hlock->holdtime_stamp = sched_clock();
+#endif
+
+ if (check == 2 && !mark_irqflags(curr, hlock))
+ return 0;
+
+ /* mark it as used: */
+ if (!mark_lock(curr, hlock, LOCK_USED))
+ return 0;
+
+ /*
+	 * Calculate the chain hash: it's the combined hash of all the
+ * lock keys along the dependency chain. We save the hash value
+ * at every step so that we can get the current hash easily
+ * after unlock. The chain hash is then used to cache dependency
+ * results.
+ *
+	 * The 'key ID' (the class index) is the most compact value we
+	 * can use to drive the hash, so we use it rather than class->key.
+ */
+ id = class - lock_classes;
+ if (DEBUG_LOCKS_WARN_ON(id >= MAX_LOCKDEP_KEYS))
+ return 0;
+
+ chain_key = curr->curr_chain_key;
+ if (!depth) {
+ if (DEBUG_LOCKS_WARN_ON(chain_key != 0))
+ return 0;
+ chain_head = 1;
+ }
+
+ hlock->prev_chain_key = chain_key;
+ if (separate_irq_context(curr, hlock)) {
+ chain_key = 0;
+ chain_head = 1;
+ }
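+	/*
+	 * iterate_chain_key() rotates the running 64-bit key left by
+	 * MAX_LOCKDEP_KEYS_BITS and xors in the new class id, so the
+	 * final chain_key identifies the whole stack of held classes:
+	 */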
+ chain_key = iterate_chain_key(chain_key, id);
+
+ if (!validate_chain(curr, lock, hlock, chain_head, chain_key))
+ return 0;
+
+ curr->curr_chain_key = chain_key;
+ curr->lockdep_depth++;
+ check_chain_key(curr);
+#ifdef CONFIG_DEBUG_LOCKDEP
+ if (unlikely(!debug_locks))
+ return 0;
+#endif
+ if (unlikely(curr->lockdep_depth >= MAX_LOCK_DEPTH)) {
+ debug_locks_off();
+ printk("BUG: MAX_LOCK_DEPTH too low!\n");
+ printk("turning off the locking correctness validator.\n");
+ return 0;
+ }
+
+ if (unlikely(curr->lockdep_depth > max_lockdep_depth))
+ max_lockdep_depth = curr->lockdep_depth;
+
+ return 1;
+}
+
+static int
+print_unlock_inbalance_bug(struct task_struct *curr, struct lockdep_map *lock,
+ unsigned long ip)
+{
+ if (!debug_locks_off())
+ return 0;
+ if (debug_locks_silent)
+ return 0;
+
+ printk("\n=====================================\n");
+ printk( "[ BUG: bad unlock balance detected! ]\n");
+ printk( "-------------------------------------\n");
+ printk("%s/%d is trying to release lock (",
+ curr->comm, task_pid_nr(curr));
+ print_lockdep_cache(lock);
+ printk(") at:\n");
+ print_ip_sym(ip);
+ printk("but there are no more locks to release!\n");
+ printk("\nother info that might help us debug this:\n");
+ lockdep_print_held_locks(curr);
+
+ printk("\nstack backtrace:\n");
+ dump_stack();
+
+ return 0;
+}
+
+/*
+ * Common debugging checks for both nested and non-nested unlock:
+ */
+static int check_unlock(struct task_struct *curr, struct lockdep_map *lock,
+ unsigned long ip)
+{
+ if (unlikely(!debug_locks))
+ return 0;
+ if (DEBUG_LOCKS_WARN_ON(!irqs_disabled()))
+ return 0;
+
+ if (curr->lockdep_depth <= 0)
+ return print_unlock_inbalance_bug(curr, lock, ip);
+
+ return 1;
+}
+
+/*
+ * Remove the lock from the list of currently held locks in a
+ * potentially non-nested (out of order) manner. This is a
+ * relatively rare operation, as all the unlock APIs default
+ * to nested mode (which uses lock_release()):
+ */
+static int
+lock_release_non_nested(struct task_struct *curr,
+ struct lockdep_map *lock, unsigned long ip)
+{
+ struct held_lock *hlock, *prev_hlock;
+ unsigned int depth;
+ int i;
+
+ /*
+ * Check whether the lock exists in the current stack
+ * of held locks:
+ */
+ depth = curr->lockdep_depth;
+ if (DEBUG_LOCKS_WARN_ON(!depth))
+ return 0;
+
+ prev_hlock = NULL;
+ for (i = depth-1; i >= 0; i--) {
+ hlock = curr->held_locks + i;
+ /*
+ * We must not cross into another context:
+ */
+ if (prev_hlock && prev_hlock->irq_context != hlock->irq_context)
+ break;
+ if (hlock->instance == lock)
+ goto found_it;
+ prev_hlock = hlock;
+ }
+ return print_unlock_inbalance_bug(curr, lock, ip);
+
+found_it:
+ lock_release_holdtime(hlock);
+
+ /*
+ * We have the right lock to unlock, 'hlock' points to it.
+ * Now we remove it from the stack, and add back the other
+ * entries (if any), recalculating the hash along the way:
+ */
+ curr->lockdep_depth = i;
+ curr->curr_chain_key = hlock->prev_chain_key;
+
+ for (i++; i < depth; i++) {
+ hlock = curr->held_locks + i;
+ if (!__lock_acquire(hlock->instance,
+ hlock->class->subclass, hlock->trylock,
+ hlock->read, hlock->check, hlock->hardirqs_off,
+ hlock->acquire_ip))
+ return 0;
+ }
+
+ if (DEBUG_LOCKS_WARN_ON(curr->lockdep_depth != depth - 1))
+ return 0;
+ return 1;
+}
+
+/*
+ * Remove the lock from the list of currently held locks - this gets
+ * called on mutex_unlock()/spin_unlock*() (or on a failed
+ * mutex_lock_interruptible()). This is done for unlocks that nest
+ * perfectly. (i.e. the current top of the lock-stack is unlocked)
+ */
+static int lock_release_nested(struct task_struct *curr,
+ struct lockdep_map *lock, unsigned long ip)
+{
+ struct held_lock *hlock;
+ unsigned int depth;
+
+ /*
+ * Pop off the top of the lock stack:
+ */
+ depth = curr->lockdep_depth - 1;
+ hlock = curr->held_locks + depth;
+
+ /*
+ * Is the unlock non-nested:
+ */
+ if (hlock->instance != lock)
+ return lock_release_non_nested(curr, lock, ip);
+ curr->lockdep_depth--;
+
+ if (DEBUG_LOCKS_WARN_ON(!depth && (hlock->prev_chain_key != 0)))
+ return 0;
+
+ curr->curr_chain_key = hlock->prev_chain_key;
+
+ lock_release_holdtime(hlock);
+
+#ifdef CONFIG_DEBUG_LOCKDEP
+ hlock->prev_chain_key = 0;
+ hlock->class = NULL;
+ hlock->acquire_ip = 0;
+ hlock->irq_context = 0;
+#endif
+ return 1;
+}
+
+/*
+ * Remove the lock from the list of currently held locks - this gets
+ * called on mutex_unlock()/spin_unlock*() (or on a failed
+ * mutex_lock_interruptible()). Handles both perfectly nested and
+ * non-nested (out of order) unlocks.
+ */
+ */
+static void
+__lock_release(struct lockdep_map *lock, int nested, unsigned long ip)
+{
+ struct task_struct *curr = current;
+
+ if (!check_unlock(curr, lock, ip))
+ return;
+
+ if (nested) {
+ if (!lock_release_nested(curr, lock, ip))
+ return;
+ } else {
+ if (!lock_release_non_nested(curr, lock, ip))
+ return;
+ }
+
+ check_chain_key(curr);
+}
+
+/*
+ * Check whether we follow the irq-flags state precisely:
+ */
+static void check_flags(unsigned long flags)
+{
+#if defined(CONFIG_DEBUG_LOCKDEP) && defined(CONFIG_TRACE_IRQFLAGS)
+ if (!debug_locks)
+ return;
+
+ if (irqs_disabled_flags(flags))
+ DEBUG_LOCKS_WARN_ON(current->hardirqs_enabled);
+ else
+ DEBUG_LOCKS_WARN_ON(!current->hardirqs_enabled);
+
+ /*
+	 * We don't accurately track softirq state in e.g.
+ * hardirq contexts (such as on 4KSTACKS), so only
+ * check if not in hardirq contexts:
+ */
+ if (!hardirq_count()) {
+ if (softirq_count())
+ DEBUG_LOCKS_WARN_ON(current->softirqs_enabled);
+ else
+ DEBUG_LOCKS_WARN_ON(!current->softirqs_enabled);
+ }
+
+ if (!debug_locks)
+ print_irqtrace_events(current);
+#endif
+}
+
+/*
+ * We are not always called with irqs disabled - do that here,
+ * and also avoid lockdep recursion:
+ */
+void lock_acquire(struct lockdep_map *lock, unsigned int subclass,
+ int trylock, int read, int check, unsigned long ip)
+{
+ unsigned long flags;
+
+ if (unlikely(!lock_stat && !prove_locking))
+ return;
+
+ if (unlikely(current->lockdep_recursion))
+ return;
+
+ raw_local_irq_save(flags);
+ check_flags(flags);
+
+ current->lockdep_recursion = 1;
+ __lock_acquire(lock, subclass, trylock, read, check,
+ irqs_disabled_flags(flags), ip);
+ current->lockdep_recursion = 0;
+ raw_local_irq_restore(flags);
+}
+
+EXPORT_SYMBOL_GPL(lock_acquire);
+
+void lock_release(struct lockdep_map *lock, int nested, unsigned long ip)
+{
+ unsigned long flags;
+
+ if (unlikely(!lock_stat && !prove_locking))
+ return;
+
+ if (unlikely(current->lockdep_recursion))
+ return;
+
+ raw_local_irq_save(flags);
+ check_flags(flags);
+ current->lockdep_recursion = 1;
+ __lock_release(lock, nested, ip);
+ current->lockdep_recursion = 0;
+ raw_local_irq_restore(flags);
+}
+
+EXPORT_SYMBOL_GPL(lock_release);
+
+#ifdef CONFIG_LOCK_STAT
+static int
+print_lock_contention_bug(struct task_struct *curr, struct lockdep_map *lock,
+ unsigned long ip)
+{
+ if (!debug_locks_off())
+ return 0;
+ if (debug_locks_silent)
+ return 0;
+
+ printk("\n=================================\n");
+ printk( "[ BUG: bad contention detected! ]\n");
+ printk( "---------------------------------\n");
+ printk("%s/%d is trying to contend lock (",
+ curr->comm, task_pid_nr(curr));
+ print_lockdep_cache(lock);
+ printk(") at:\n");
+ print_ip_sym(ip);
+ printk("but there are no locks held!\n");
+ printk("\nother info that might help us debug this:\n");
+ lockdep_print_held_locks(curr);
+
+ printk("\nstack backtrace:\n");
+ dump_stack();
+
+ return 0;
+}
+
+static void
+__lock_contended(struct lockdep_map *lock, unsigned long ip)
+{
+ struct task_struct *curr = current;
+ struct held_lock *hlock, *prev_hlock;
+ struct lock_class_stats *stats;
+ unsigned int depth;
+ int i, point;
+
+ depth = curr->lockdep_depth;
+ if (DEBUG_LOCKS_WARN_ON(!depth))
+ return;
+
+ prev_hlock = NULL;
+ for (i = depth-1; i >= 0; i--) {
+ hlock = curr->held_locks + i;
+ /*
+ * We must not cross into another context:
+ */
+ if (prev_hlock && prev_hlock->irq_context != hlock->irq_context)
+ break;
+ if (hlock->instance == lock)
+ goto found_it;
+ prev_hlock = hlock;
+ }
+ print_lock_contention_bug(curr, lock, ip);
+ return;
+
+found_it:
+ hlock->waittime_stamp = sched_clock();
+
+ point = lock_contention_point(hlock->class, ip);
+
+ stats = get_lock_stats(hlock->class);
+ if (point < ARRAY_SIZE(stats->contention_point))
+		stats->contention_point[point]++;
+ if (lock->cpu != smp_processor_id())
+ stats->bounces[bounce_contended + !!hlock->read]++;
+ put_lock_stats(stats);
+}
+
+static void
+__lock_acquired(struct lockdep_map *lock)
+{
+ struct task_struct *curr = current;
+ struct held_lock *hlock, *prev_hlock;
+ struct lock_class_stats *stats;
+ unsigned int depth;
+ u64 now;
+ s64 waittime = 0;
+ int i, cpu;
+
+ depth = curr->lockdep_depth;
+ if (DEBUG_LOCKS_WARN_ON(!depth))
+ return;
+
+ prev_hlock = NULL;
+ for (i = depth-1; i >= 0; i--) {
+ hlock = curr->held_locks + i;
+ /*
+ * We must not cross into another context:
+ */
+ if (prev_hlock && prev_hlock->irq_context != hlock->irq_context)
+ break;
+ if (hlock->instance == lock)
+ goto found_it;
+ prev_hlock = hlock;
+ }
+ print_lock_contention_bug(curr, lock, _RET_IP_);
+ return;
+
+found_it:
+ cpu = smp_processor_id();
+ if (hlock->waittime_stamp) {
+ now = sched_clock();
+ waittime = now - hlock->waittime_stamp;
+ hlock->holdtime_stamp = now;
+ }
+
+ stats = get_lock_stats(hlock->class);
+ if (waittime) {
+ if (hlock->read)
+ lock_time_inc(&stats->read_waittime, waittime);
+ else
+ lock_time_inc(&stats->write_waittime, waittime);
+ }
+ if (lock->cpu != cpu)
+ stats->bounces[bounce_acquired + !!hlock->read]++;
+ put_lock_stats(stats);
+
+ lock->cpu = cpu;
+}
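+
+/*
+ * Timeline: __lock_contended() stamps waittime_stamp when contention
+ * starts; __lock_acquired() turns the elapsed time into read/write
+ * wait-time stats and restarts holdtime_stamp, which
+ * lock_release_holdtime() consumes at release time.
+ */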
+
+void lock_contended(struct lockdep_map *lock, unsigned long ip)
+{
+ unsigned long flags;
+
+ if (unlikely(!lock_stat))
+ return;
+
+ if (unlikely(current->lockdep_recursion))
+ return;
+
+ raw_local_irq_save(flags);
+ check_flags(flags);
+ current->lockdep_recursion = 1;
+ __lock_contended(lock, ip);
+ current->lockdep_recursion = 0;
+ raw_local_irq_restore(flags);
+}
+EXPORT_SYMBOL_GPL(lock_contended);
+
+void lock_acquired(struct lockdep_map *lock)
+{
+ unsigned long flags;
+
+ if (unlikely(!lock_stat))
+ return;
+
+ if (unlikely(current->lockdep_recursion))
+ return;
+
+ raw_local_irq_save(flags);
+ check_flags(flags);
+ current->lockdep_recursion = 1;
+ __lock_acquired(lock);
+ current->lockdep_recursion = 0;
+ raw_local_irq_restore(flags);
+}
+EXPORT_SYMBOL_GPL(lock_acquired);
+#endif
+
+/*
+ * Used by the testsuite, sanitize the validator state
+ * after a simulated failure:
+ */
+
+void lockdep_reset(void)
+{
+ unsigned long flags;
+ int i;
+
+ raw_local_irq_save(flags);
+ current->curr_chain_key = 0;
+ current->lockdep_depth = 0;
+ current->lockdep_recursion = 0;
+ memset(current->held_locks, 0, MAX_LOCK_DEPTH*sizeof(struct held_lock));
+ nr_hardirq_chains = 0;
+ nr_softirq_chains = 0;
+ nr_process_chains = 0;
+ debug_locks = 1;
+ for (i = 0; i < CHAINHASH_SIZE; i++)
+ INIT_LIST_HEAD(chainhash_table + i);
+ raw_local_irq_restore(flags);
+}
+
+static void zap_class(struct lock_class *class)
+{
+ int i;
+
+ /*
+ * Remove all dependencies this lock is
+ * involved in:
+ */
+ for (i = 0; i < nr_list_entries; i++) {
+ if (list_entries[i].class == class)
+ list_del_rcu(&list_entries[i].entry);
+ }
+ /*
+ * Unhash the class and remove it from the all_lock_classes list:
+ */
+ list_del_rcu(&class->hash_entry);
+ list_del_rcu(&class->lock_entry);
+}
+
+static inline int within(void *addr, void *start, unsigned long size)
+{
+ return addr >= start && addr < start + size;
+}
+
+void lockdep_free_key_range(void *start, unsigned long size)
+{
+ struct lock_class *class, *next;
+ struct list_head *head;
+ unsigned long flags;
+ int i;
+
+ raw_local_irq_save(flags);
+ graph_lock();
+
+ /*
+ * Unhash all classes that were created by this module:
+ */
+ for (i = 0; i < CLASSHASH_SIZE; i++) {
+ head = classhash_table + i;
+ if (list_empty(head))
+ continue;
+ list_for_each_entry_safe(class, next, head, hash_entry)
+ if (within(class->key, start, size))
+ zap_class(class);
+ }
+
+ graph_unlock();
+ raw_local_irq_restore(flags);
+}
+
+void lockdep_reset_lock(struct lockdep_map *lock)
+{
+ struct lock_class *class, *next;
+ struct list_head *head;
+ unsigned long flags;
+ int i, j;
+
+ raw_local_irq_save(flags);
+
+ /*
+ * Remove all classes this lock might have:
+ */
+ for (j = 0; j < MAX_LOCKDEP_SUBCLASSES; j++) {
+ /*
+ * If the class exists we look it up and zap it:
+ */
+ class = look_up_lock_class(lock, j);
+ if (class)
+ zap_class(class);
+ }
+ /*
+ * Debug check: in the end all mapped classes should
+ * be gone.
+ */
+ graph_lock();
+ for (i = 0; i < CLASSHASH_SIZE; i++) {
+ head = classhash_table + i;
+ if (list_empty(head))
+ continue;
+ list_for_each_entry_safe(class, next, head, hash_entry) {
+ if (unlikely(class == lock->class_cache)) {
+ if (debug_locks_off_graph_unlock())
+ WARN_ON(1);
+ goto out_restore;
+ }
+ }
+ }
+ graph_unlock();
+
+out_restore:
+ raw_local_irq_restore(flags);
+}
+
+void lockdep_init(void)
+{
+ int i;
+
+ /*
+ * Some architectures have their own start_kernel()
+ * code which calls lockdep_init(), while we also
+	 * call lockdep_init() from start_kernel() itself,
+ * and we want to initialize the hashes only once:
+ */
+ if (lockdep_initialized)
+ return;
+
+ for (i = 0; i < CLASSHASH_SIZE; i++)
+ INIT_LIST_HEAD(classhash_table + i);
+
+ for (i = 0; i < CHAINHASH_SIZE; i++)
+ INIT_LIST_HEAD(chainhash_table + i);
+
+ lockdep_initialized = 1;
+}
+
+void __init lockdep_info(void)
+{
+ printk("Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar\n");
+
+ printk("... MAX_LOCKDEP_SUBCLASSES: %lu\n", MAX_LOCKDEP_SUBCLASSES);
+ printk("... MAX_LOCK_DEPTH: %lu\n", MAX_LOCK_DEPTH);
+ printk("... MAX_LOCKDEP_KEYS: %lu\n", MAX_LOCKDEP_KEYS);
+ printk("... CLASSHASH_SIZE: %lu\n", CLASSHASH_SIZE);
+ printk("... MAX_LOCKDEP_ENTRIES: %lu\n", MAX_LOCKDEP_ENTRIES);
+ printk("... MAX_LOCKDEP_CHAINS: %lu\n", MAX_LOCKDEP_CHAINS);
+ printk("... CHAINHASH_SIZE: %lu\n", CHAINHASH_SIZE);
+
+ printk(" memory used by lock dependency info: %lu kB\n",
+ (sizeof(struct lock_class) * MAX_LOCKDEP_KEYS +
+ sizeof(struct list_head) * CLASSHASH_SIZE +
+ sizeof(struct lock_list) * MAX_LOCKDEP_ENTRIES +
+ sizeof(struct lock_chain) * MAX_LOCKDEP_CHAINS +
+ sizeof(struct list_head) * CHAINHASH_SIZE) / 1024);
+
+ printk(" per task-struct memory footprint: %lu bytes\n",
+ sizeof(struct held_lock) * MAX_LOCK_DEPTH);
+
+#ifdef CONFIG_DEBUG_LOCKDEP
+ if (lockdep_init_error) {
+ printk("WARNING: lockdep init error! Arch code didn't call lockdep_init() early enough?\n");
+ printk("Call stack leading to lockdep invocation was:\n");
+ print_stack_trace(&lockdep_init_trace, 0);
+ }
+#endif
+}
+
+static inline int in_range(const void *start, const void *addr, const void *end)
+{
+ return addr >= start && addr <= end;
+}
+
+static void
+print_freed_lock_bug(struct task_struct *curr, const void *mem_from,
+ const void *mem_to, struct held_lock *hlock)
+{
+ if (!debug_locks_off())
+ return;
+ if (debug_locks_silent)
+ return;
+
+ printk("\n=========================\n");
+ printk( "[ BUG: held lock freed! ]\n");
+ printk( "-------------------------\n");
+ printk("%s/%d is freeing memory %p-%p, with a lock still held there!\n",
+ curr->comm, task_pid_nr(curr), mem_from, mem_to-1);
+ print_lock(hlock);
+ lockdep_print_held_locks(curr);
+
+ printk("\nstack backtrace:\n");
+ dump_stack();
+}
+
+/*
+ * Called when kernel memory is freed (or unmapped), or if a lock
+ * is destroyed or reinitialized - this code checks whether there is
+ * any held lock in the memory range of <from> to <to>:
+ */
+void debug_check_no_locks_freed(const void *mem_from, unsigned long mem_len)
+{
+ const void *mem_to = mem_from + mem_len, *lock_from, *lock_to;
+ struct task_struct *curr = current;
+ struct held_lock *hlock;
+ unsigned long flags;
+ int i;
+
+ if (unlikely(!debug_locks))
+ return;
+
+ local_irq_save(flags);
+ for (i = 0; i < curr->lockdep_depth; i++) {
+ hlock = curr->held_locks + i;
+
+ lock_from = (void *)hlock->instance;
+ lock_to = (void *)(hlock->instance + 1);
+
+ if (!in_range(mem_from, lock_from, mem_to) &&
+ !in_range(mem_from, lock_to, mem_to))
+ continue;
+
+ print_freed_lock_bug(curr, mem_from, mem_to, hlock);
+ break;
+ }
+ local_irq_restore(flags);
+}
+EXPORT_SYMBOL_GPL(debug_check_no_locks_freed);
+
+static void print_held_locks_bug(struct task_struct *curr)
+{
+ if (!debug_locks_off())
+ return;
+ if (debug_locks_silent)
+ return;
+
+ printk("\n=====================================\n");
+ printk( "[ BUG: lock held at task exit time! ]\n");
+ printk( "-------------------------------------\n");
+ printk("%s/%d is exiting with locks still held!\n",
+ curr->comm, task_pid_nr(curr));
+ lockdep_print_held_locks(curr);
+
+ printk("\nstack backtrace:\n");
+ dump_stack();
+}
+
+void debug_check_no_locks_held(struct task_struct *task)
+{
+ if (unlikely(task->lockdep_depth > 0))
+ print_held_locks_bug(task);
+}
+
+void debug_show_all_locks(void)
+{
+ struct task_struct *g, *p;
+ int count = 10;
+ int unlock = 1;
+
+ if (unlikely(!debug_locks)) {
+ printk("INFO: lockdep is turned off.\n");
+ return;
+ }
+ printk("\nShowing all locks held in the system:\n");
+
+ /*
+ * Here we try to get the tasklist_lock as hard as possible,
+ * if not successful after 2 seconds we ignore it (but keep
+ * trying). This is to enable a debug printout even if a
+ * tasklist_lock-holding task deadlocks or crashes.
+ */
+retry:
+ if (!read_trylock(&tasklist_lock)) {
+ if (count == 10)
+ printk("hm, tasklist_lock locked, retrying... ");
+ if (count) {
+ count--;
+ printk(" #%d", 10-count);
+ mdelay(200);
+ goto retry;
+ }
+ printk(" ignoring it.\n");
+ unlock = 0;
+ }
+ if (count != 10)
+ printk(" locked it.\n");
+
+ do_each_thread(g, p) {
+ if (p->lockdep_depth)
+ lockdep_print_held_locks(p);
+ if (!unlock)
+ if (read_trylock(&tasklist_lock))
+ unlock = 1;
+ } while_each_thread(g, p);
+
+ printk("\n");
+ printk("=============================================\n\n");
+
+ if (unlock)
+ read_unlock(&tasklist_lock);
+}
+
+EXPORT_SYMBOL_GPL(debug_show_all_locks);
+
+void debug_show_held_locks(struct task_struct *task)
+{
+ if (unlikely(!debug_locks)) {
+ printk("INFO: lockdep is turned off.\n");
+ return;
+ }
+ lockdep_print_held_locks(task);
+}
+
+EXPORT_SYMBOL_GPL(debug_show_held_locks);
+
+void lockdep_sys_exit(void)
+{
+ struct task_struct *curr = current;
+
+ if (unlikely(curr->lockdep_depth)) {
+ if (!debug_locks_off())
+ return;
+ printk("\n================================================\n");
+ printk( "[ BUG: lock held when returning to user space! ]\n");
+ printk( "------------------------------------------------\n");
+ printk("%s/%d is leaving the kernel with locks still held!\n",
+ curr->comm, curr->pid);
+ lockdep_print_held_locks(curr);
+ }
+}
Index: linux-2.6-lttng.stable/kernel/lockdep.c
===================================================================
--- linux-2.6-lttng.stable.orig/kernel/lockdep.c 2007-10-29 09:51:06.000000000 -0400
+++ /dev/null 1970-01-01 00:00:00.000000000 +0000
@@ -1,3217 +0,0 @@
-/*
- * kernel/lockdep.c
- *
- * Runtime locking correctness validator
- *
- * Started by Ingo Molnar:
- *
- * Copyright (C) 2006,2007 Red Hat, Inc., Ingo Molnar <mingo@redhat.com>
- * Copyright (C) 2007 Red Hat, Inc., Peter Zijlstra <pzijlstr@redhat.com>
- *
- * this code maps all the lock dependencies as they occur in a live kernel
- * and will warn about the following classes of locking bugs:
- *
- * - lock inversion scenarios
- * - circular lock dependencies
- * - hardirq/softirq safe/unsafe locking bugs
- *
- * Bugs are reported even if the current locking scenario does not cause
- * any deadlock at this point.
- *
- * I.e. if anytime in the past two locks were taken in a different order,
- * even if it happened for another task, even if those were different
- * locks (but of the same class as this lock), this code will detect it.
- *
- * Thanks to Arjan van de Ven for coming up with the initial idea of
- * mapping lock dependencies runtime.
- */
-#include <linux/mutex.h>
-#include <linux/sched.h>
-#include <linux/delay.h>
-#include <linux/module.h>
-#include <linux/proc_fs.h>
-#include <linux/seq_file.h>
-#include <linux/spinlock.h>
-#include <linux/kallsyms.h>
-#include <linux/interrupt.h>
-#include <linux/stacktrace.h>
-#include <linux/debug_locks.h>
-#include <linux/irqflags.h>
-#include <linux/utsname.h>
-#include <linux/hash.h>
-
-#include <asm/sections.h>
-
-#include "lockdep_internals.h"
-
-#ifdef CONFIG_PROVE_LOCKING
-int prove_locking = 1;
-module_param(prove_locking, int, 0644);
-#else
-#define prove_locking 0
-#endif
-
-#ifdef CONFIG_LOCK_STAT
-int lock_stat = 1;
-module_param(lock_stat, int, 0644);
-#else
-#define lock_stat 0
-#endif
-
-/*
- * lockdep_lock: protects the lockdep graph, the hashes and the
- * class/list/hash allocators.
- *
- * This is one of the rare exceptions where it's justified
- * to use a raw spinlock - we really dont want the spinlock
- * code to recurse back into the lockdep code...
- */
-static raw_spinlock_t lockdep_lock = (raw_spinlock_t)__RAW_SPIN_LOCK_UNLOCKED;
-
-static int graph_lock(void)
-{
- __raw_spin_lock(&lockdep_lock);
- /*
- * Make sure that if another CPU detected a bug while
- * walking the graph we dont change it (while the other
- * CPU is busy printing out stuff with the graph lock
- * dropped already)
- */
- if (!debug_locks) {
- __raw_spin_unlock(&lockdep_lock);
- return 0;
- }
- return 1;
-}
-
-static inline int graph_unlock(void)
-{
- if (debug_locks && !__raw_spin_is_locked(&lockdep_lock))
- return DEBUG_LOCKS_WARN_ON(1);
-
- __raw_spin_unlock(&lockdep_lock);
- return 0;
-}
-
-/*
- * Turn lock debugging off and return with 0 if it was off already,
- * and also release the graph lock:
- */
-static inline int debug_locks_off_graph_unlock(void)
-{
- int ret = debug_locks_off();
-
- __raw_spin_unlock(&lockdep_lock);
-
- return ret;
-}
-
-static int lockdep_initialized;
-
-unsigned long nr_list_entries;
-static struct lock_list list_entries[MAX_LOCKDEP_ENTRIES];
-
-/*
- * All data structures here are protected by the global debug_lock.
- *
- * Mutex key structs only get allocated, once during bootup, and never
- * get freed - this significantly simplifies the debugging code.
- */
-unsigned long nr_lock_classes;
-static struct lock_class lock_classes[MAX_LOCKDEP_KEYS];
-
-#ifdef CONFIG_LOCK_STAT
-static DEFINE_PER_CPU(struct lock_class_stats[MAX_LOCKDEP_KEYS], lock_stats);
-
-static int lock_contention_point(struct lock_class *class, unsigned long ip)
-{
- int i;
-
- for (i = 0; i < ARRAY_SIZE(class->contention_point); i++) {
- if (class->contention_point[i] == 0) {
- class->contention_point[i] = ip;
- break;
- }
- if (class->contention_point[i] == ip)
- break;
- }
-
- return i;
-}
-
-static void lock_time_inc(struct lock_time *lt, s64 time)
-{
- if (time > lt->max)
- lt->max = time;
-
- if (time < lt->min || !lt->min)
- lt->min = time;
-
- lt->total += time;
- lt->nr++;
-}
-
-static inline void lock_time_add(struct lock_time *src, struct lock_time *dst)
-{
- dst->min += src->min;
- dst->max += src->max;
- dst->total += src->total;
- dst->nr += src->nr;
-}
-
-struct lock_class_stats lock_stats(struct lock_class *class)
-{
- struct lock_class_stats stats;
- int cpu, i;
-
- memset(&stats, 0, sizeof(struct lock_class_stats));
- for_each_possible_cpu(cpu) {
- struct lock_class_stats *pcs =
- &per_cpu(lock_stats, cpu)[class - lock_classes];
-
- for (i = 0; i < ARRAY_SIZE(stats.contention_point); i++)
- stats.contention_point[i] += pcs->contention_point[i];
-
- lock_time_add(&pcs->read_waittime, &stats.read_waittime);
- lock_time_add(&pcs->write_waittime, &stats.write_waittime);
-
- lock_time_add(&pcs->read_holdtime, &stats.read_holdtime);
- lock_time_add(&pcs->write_holdtime, &stats.write_holdtime);
-
- for (i = 0; i < ARRAY_SIZE(stats.bounces); i++)
- stats.bounces[i] += pcs->bounces[i];
- }
-
- return stats;
-}
-
-void clear_lock_stats(struct lock_class *class)
-{
- int cpu;
-
- for_each_possible_cpu(cpu) {
- struct lock_class_stats *cpu_stats =
- &per_cpu(lock_stats, cpu)[class - lock_classes];
-
- memset(cpu_stats, 0, sizeof(struct lock_class_stats));
- }
- memset(class->contention_point, 0, sizeof(class->contention_point));
-}
-
-static struct lock_class_stats *get_lock_stats(struct lock_class *class)
-{
- return &get_cpu_var(lock_stats)[class - lock_classes];
-}
-
-static void put_lock_stats(struct lock_class_stats *stats)
-{
- put_cpu_var(lock_stats);
-}
-
-static void lock_release_holdtime(struct held_lock *hlock)
-{
- struct lock_class_stats *stats;
- s64 holdtime;
-
- if (!lock_stat)
- return;
-
- holdtime = sched_clock() - hlock->holdtime_stamp;
-
- stats = get_lock_stats(hlock->class);
- if (hlock->read)
- lock_time_inc(&stats->read_holdtime, holdtime);
- else
- lock_time_inc(&stats->write_holdtime, holdtime);
- put_lock_stats(stats);
-}
-#else
-static inline void lock_release_holdtime(struct held_lock *hlock)
-{
-}
-#endif
-
-/*
- * We keep a global list of all lock classes. The list only grows,
- * never shrinks. The list is only accessed with the lockdep
- * spinlock lock held.
- */
-LIST_HEAD(all_lock_classes);
-
-/*
- * The lockdep classes are in a hash-table as well, for fast lookup:
- */
-#define CLASSHASH_BITS (MAX_LOCKDEP_KEYS_BITS - 1)
-#define CLASSHASH_SIZE (1UL << CLASSHASH_BITS)
-#define __classhashfn(key) hash_long((unsigned long)key, CLASSHASH_BITS)
-#define classhashentry(key) (classhash_table + __classhashfn((key)))
-
-static struct list_head classhash_table[CLASSHASH_SIZE];
-
-/*
- * We put the lock dependency chains into a hash-table as well, to cache
- * their existence:
- */
-#define CHAINHASH_BITS (MAX_LOCKDEP_CHAINS_BITS-1)
-#define CHAINHASH_SIZE (1UL << CHAINHASH_BITS)
-#define __chainhashfn(chain) hash_long(chain, CHAINHASH_BITS)
-#define chainhashentry(chain) (chainhash_table + __chainhashfn((chain)))
-
-static struct list_head chainhash_table[CHAINHASH_SIZE];
-
-/*
- * The hash key of the lock dependency chains is a hash itself too:
- * it's a hash of all locks taken up to that lock, including that lock.
- * It's a 64-bit hash, because it's important for the keys to be
- * unique.
- */
-#define iterate_chain_key(key1, key2) \
- (((key1) << MAX_LOCKDEP_KEYS_BITS) ^ \
- ((key1) >> (64-MAX_LOCKDEP_KEYS_BITS)) ^ \
- (key2))
-
-void lockdep_off(void)
-{
- current->lockdep_recursion++;
-}
-
-EXPORT_SYMBOL(lockdep_off);
-
-void lockdep_on(void)
-{
- current->lockdep_recursion--;
-}
-
-EXPORT_SYMBOL(lockdep_on);
-
-/*
- * Debugging switches:
- */
-
-#define VERBOSE 0
-#define VERY_VERBOSE 0
-
-#if VERBOSE
-# define HARDIRQ_VERBOSE 1
-# define SOFTIRQ_VERBOSE 1
-#else
-# define HARDIRQ_VERBOSE 0
-# define SOFTIRQ_VERBOSE 0
-#endif
-
-#if VERBOSE || HARDIRQ_VERBOSE || SOFTIRQ_VERBOSE
-/*
- * Quick filtering for interesting events:
- */
-static int class_filter(struct lock_class *class)
-{
-#if 0
- /* Example */
- if (class->name_version == 1 &&
- !strcmp(class->name, "lockname"))
- return 1;
- if (class->name_version == 1 &&
- !strcmp(class->name, "&struct->lockfield"))
- return 1;
-#endif
- /* Filter everything else. 1 would be to allow everything else */
- return 0;
-}
-#endif
-
-static int verbose(struct lock_class *class)
-{
-#if VERBOSE
- return class_filter(class);
-#endif
- return 0;
-}
-
-/*
- * Stack-trace: tightly packed array of stack backtrace
- * addresses. Protected by the graph_lock.
- */
-unsigned long nr_stack_trace_entries;
-static unsigned long stack_trace[MAX_STACK_TRACE_ENTRIES];
-
-static int save_trace(struct stack_trace *trace)
-{
- trace->nr_entries = 0;
- trace->max_entries = MAX_STACK_TRACE_ENTRIES - nr_stack_trace_entries;
- trace->entries = stack_trace + nr_stack_trace_entries;
-
- trace->skip = 3;
-
- save_stack_trace(trace);
-
- trace->max_entries = trace->nr_entries;
-
- nr_stack_trace_entries += trace->nr_entries;
-
- if (nr_stack_trace_entries == MAX_STACK_TRACE_ENTRIES) {
- if (!debug_locks_off_graph_unlock())
- return 0;
-
- printk("BUG: MAX_STACK_TRACE_ENTRIES too low!\n");
- printk("turning off the locking correctness validator.\n");
- dump_stack();
-
- return 0;
- }
-
- return 1;
-}
-
-unsigned int nr_hardirq_chains;
-unsigned int nr_softirq_chains;
-unsigned int nr_process_chains;
-unsigned int max_lockdep_depth;
-unsigned int max_recursion_depth;
-
-#ifdef CONFIG_DEBUG_LOCKDEP
-/*
- * We cannot printk in early bootup code. Not even early_printk()
- * might work. So we mark any initialization errors and printk
- * about it later on, in lockdep_info().
- */
-static int lockdep_init_error;
-static unsigned long lockdep_init_trace_data[20];
-static struct stack_trace lockdep_init_trace = {
- .max_entries = ARRAY_SIZE(lockdep_init_trace_data),
- .entries = lockdep_init_trace_data,
-};
-
-/*
- * Various lockdep statistics:
- */
-atomic_t chain_lookup_hits;
-atomic_t chain_lookup_misses;
-atomic_t hardirqs_on_events;
-atomic_t hardirqs_off_events;
-atomic_t redundant_hardirqs_on;
-atomic_t redundant_hardirqs_off;
-atomic_t softirqs_on_events;
-atomic_t softirqs_off_events;
-atomic_t redundant_softirqs_on;
-atomic_t redundant_softirqs_off;
-atomic_t nr_unused_locks;
-atomic_t nr_cyclic_checks;
-atomic_t nr_cyclic_check_recursions;
-atomic_t nr_find_usage_forwards_checks;
-atomic_t nr_find_usage_forwards_recursions;
-atomic_t nr_find_usage_backwards_checks;
-atomic_t nr_find_usage_backwards_recursions;
-# define debug_atomic_inc(ptr) atomic_inc(ptr)
-# define debug_atomic_dec(ptr) atomic_dec(ptr)
-# define debug_atomic_read(ptr) atomic_read(ptr)
-#else
-# define debug_atomic_inc(ptr) do { } while (0)
-# define debug_atomic_dec(ptr) do { } while (0)
-# define debug_atomic_read(ptr) 0
-#endif
-
-/*
- * Locking printouts:
- */
-
-static const char *usage_str[] =
-{
- [LOCK_USED] = "initial-use ",
- [LOCK_USED_IN_HARDIRQ] = "in-hardirq-W",
- [LOCK_USED_IN_SOFTIRQ] = "in-softirq-W",
- [LOCK_ENABLED_SOFTIRQS] = "softirq-on-W",
- [LOCK_ENABLED_HARDIRQS] = "hardirq-on-W",
- [LOCK_USED_IN_HARDIRQ_READ] = "in-hardirq-R",
- [LOCK_USED_IN_SOFTIRQ_READ] = "in-softirq-R",
- [LOCK_ENABLED_SOFTIRQS_READ] = "softirq-on-R",
- [LOCK_ENABLED_HARDIRQS_READ] = "hardirq-on-R",
-};
-
-const char * __get_key_name(struct lockdep_subclass_key *key, char *str)
-{
- return kallsyms_lookup((unsigned long)key, NULL, NULL, NULL, str);
-}
-
-void
-get_usage_chars(struct lock_class *class, char *c1, char *c2, char *c3, char *c4)
-{
- *c1 = '.', *c2 = '.', *c3 = '.', *c4 = '.';
-
- if (class->usage_mask & LOCKF_USED_IN_HARDIRQ)
- *c1 = '+';
- else
- if (class->usage_mask & LOCKF_ENABLED_HARDIRQS)
- *c1 = '-';
-
- if (class->usage_mask & LOCKF_USED_IN_SOFTIRQ)
- *c2 = '+';
- else
- if (class->usage_mask & LOCKF_ENABLED_SOFTIRQS)
- *c2 = '-';
-
- if (class->usage_mask & LOCKF_ENABLED_HARDIRQS_READ)
- *c3 = '-';
- if (class->usage_mask & LOCKF_USED_IN_HARDIRQ_READ) {
- *c3 = '+';
- if (class->usage_mask & LOCKF_ENABLED_HARDIRQS_READ)
- *c3 = '?';
- }
-
- if (class->usage_mask & LOCKF_ENABLED_SOFTIRQS_READ)
- *c4 = '-';
- if (class->usage_mask & LOCKF_USED_IN_SOFTIRQ_READ) {
- *c4 = '+';
- if (class->usage_mask & LOCKF_ENABLED_SOFTIRQS_READ)
- *c4 = '?';
- }
-}
-
-static void print_lock_name(struct lock_class *class)
-{
- char str[KSYM_NAME_LEN], c1, c2, c3, c4;
- const char *name;
-
- get_usage_chars(class, &c1, &c2, &c3, &c4);
-
- name = class->name;
- if (!name) {
- name = __get_key_name(class->key, str);
- printk(" (%s", name);
- } else {
- printk(" (%s", name);
- if (class->name_version > 1)
- printk("#%d", class->name_version);
- if (class->subclass)
- printk("/%d", class->subclass);
- }
- printk("){%c%c%c%c}", c1, c2, c3, c4);
-}
-
-static void print_lockdep_cache(struct lockdep_map *lock)
-{
- const char *name;
- char str[KSYM_NAME_LEN];
-
- name = lock->name;
- if (!name)
- name = __get_key_name(lock->key->subkeys, str);
-
- printk("%s", name);
-}
-
-static void print_lock(struct held_lock *hlock)
-{
- print_lock_name(hlock->class);
- printk(", at: ");
- print_ip_sym(hlock->acquire_ip);
-}
-
-static void lockdep_print_held_locks(struct task_struct *curr)
-{
- int i, depth = curr->lockdep_depth;
-
- if (!depth) {
- printk("no locks held by %s/%d.\n", curr->comm, task_pid_nr(curr));
- return;
- }
- printk("%d lock%s held by %s/%d:\n",
- depth, depth > 1 ? "s" : "", curr->comm, task_pid_nr(curr));
-
- for (i = 0; i < depth; i++) {
- printk(" #%d: ", i);
- print_lock(curr->held_locks + i);
- }
-}
-
-static void print_lock_class_header(struct lock_class *class, int depth)
-{
- int bit;
-
- printk("%*s->", depth, "");
- print_lock_name(class);
- printk(" ops: %lu", class->ops);
- printk(" {\n");
-
- for (bit = 0; bit < LOCK_USAGE_STATES; bit++) {
- if (class->usage_mask & (1 << bit)) {
- int len = depth;
-
- len += printk("%*s %s", depth, "", usage_str[bit]);
- len += printk(" at:\n");
- print_stack_trace(class->usage_traces + bit, len);
- }
- }
- printk("%*s }\n", depth, "");
-
- printk("%*s ... key at: ",depth,"");
- print_ip_sym((unsigned long)class->key);
-}
-
-/*
- * printk all lock dependencies starting at <entry>:
- */
-static void print_lock_dependencies(struct lock_class *class, int depth)
-{
- struct lock_list *entry;
-
- if (DEBUG_LOCKS_WARN_ON(depth >= 20))
- return;
-
- print_lock_class_header(class, depth);
-
- list_for_each_entry(entry, &class->locks_after, entry) {
- if (DEBUG_LOCKS_WARN_ON(!entry->class))
- return;
-
- print_lock_dependencies(entry->class, depth + 1);
-
- printk("%*s ... acquired at:\n",depth,"");
- print_stack_trace(&entry->trace, 2);
- printk("\n");
- }
-}
-
-static void print_kernel_version(void)
-{
- printk("%s %.*s\n", init_utsname()->release,
- (int)strcspn(init_utsname()->version, " "),
- init_utsname()->version);
-}
-
-static int very_verbose(struct lock_class *class)
-{
-#if VERY_VERBOSE
- return class_filter(class);
-#endif
- return 0;
-}
-
-/*
- * Is this the address of a static object:
- */
-static int static_obj(void *obj)
-{
- unsigned long start = (unsigned long) &_stext,
- end = (unsigned long) &_end,
- addr = (unsigned long) obj;
-#ifdef CONFIG_SMP
- int i;
-#endif
-
- /*
- * static variable?
- */
- if ((addr >= start) && (addr < end))
- return 1;
-
-#ifdef CONFIG_SMP
- /*
- * percpu var?
- */
- for_each_possible_cpu(i) {
- start = (unsigned long) &__per_cpu_start + per_cpu_offset(i);
- end = (unsigned long) &__per_cpu_start + PERCPU_ENOUGH_ROOM
- + per_cpu_offset(i);
-
- if ((addr >= start) && (addr < end))
- return 1;
- }
-#endif
-
- /*
- * module var?
- */
- return is_module_address(addr);
-}
-
-/*
- * To make lock name printouts unique, we calculate a unique
- * class->name_version generation counter:
- */
-static int count_matching_names(struct lock_class *new_class)
-{
- struct lock_class *class;
- int count = 0;
-
- if (!new_class->name)
- return 0;
-
- list_for_each_entry(class, &all_lock_classes, lock_entry) {
- if (new_class->key - new_class->subclass == class->key)
- return class->name_version;
- if (class->name && !strcmp(class->name, new_class->name))
- count = max(count, class->name_version);
- }
-
- return count + 1;
-}
-
-/*
- * Register a lock's class in the hash-table, if the class is not present
- * yet. Otherwise we look it up. We cache the result in the lock object
- * itself, so actual lookup of the hash should be once per lock object.
- */
-static inline struct lock_class *
-look_up_lock_class(struct lockdep_map *lock, unsigned int subclass)
-{
- struct lockdep_subclass_key *key;
- struct list_head *hash_head;
- struct lock_class *class;
-
-#ifdef CONFIG_DEBUG_LOCKDEP
- /*
- * If the architecture calls into lockdep before initializing
- * the hashes then we'll warn about it later. (we cannot printk
- * right now)
- */
- if (unlikely(!lockdep_initialized)) {
- lockdep_init();
- lockdep_init_error = 1;
- save_stack_trace(&lockdep_init_trace);
- }
-#endif
-
- /*
- * Static locks do not have their class-keys yet - for them the key
- * is the lock object itself:
- */
- if (unlikely(!lock->key))
- lock->key = (void *)lock;
-
- /*
- * NOTE: the class-key must be unique. For dynamic locks, a static
- * lock_class_key variable is passed in through the mutex_init()
- * (or spin_lock_init()) call - which acts as the key. For static
- * locks we use the lock object itself as the key.
- */
- BUILD_BUG_ON(sizeof(struct lock_class_key) >
- sizeof(struct lockdep_map));
-
- key = lock->key->subkeys + subclass;
-
- hash_head = classhashentry(key);
-
- /*
- * We can walk the hash lockfree, because the hash only
- * grows, and we are careful when adding entries to the end:
- */
- list_for_each_entry(class, hash_head, hash_entry) {
- if (class->key == key) {
- WARN_ON_ONCE(class->name != lock->name);
- return class;
- }
- }
-
- return NULL;
-}
-
-/*
- * Register a lock's class in the hash-table, if the class is not present
- * yet. Otherwise we look it up. We cache the result in the lock object
- * itself, so actual lookup of the hash should be once per lock object.
- */
-static inline struct lock_class *
-register_lock_class(struct lockdep_map *lock, unsigned int subclass, int force)
-{
- struct lockdep_subclass_key *key;
- struct list_head *hash_head;
- struct lock_class *class;
- unsigned long flags;
-
- class = look_up_lock_class(lock, subclass);
- if (likely(class))
- return class;
-
- /*
- * Debug-check: all keys must be persistent!
- */
- if (!static_obj(lock->key)) {
- debug_locks_off();
- printk("INFO: trying to register non-static key.\n");
- printk("the code is fine but needs lockdep annotation.\n");
- printk("turning off the locking correctness validator.\n");
- dump_stack();
-
- return NULL;
- }
-
- key = lock->key->subkeys + subclass;
- hash_head = classhashentry(key);
-
- raw_local_irq_save(flags);
- if (!graph_lock()) {
- raw_local_irq_restore(flags);
- return NULL;
- }
- /*
- * We have to do the hash-walk again, to avoid races
- * with another CPU:
- */
- list_for_each_entry(class, hash_head, hash_entry)
- if (class->key == key)
- goto out_unlock_set;
- /*
- * Allocate a new key from the static array, and add it to
- * the hash:
- */
- if (nr_lock_classes >= MAX_LOCKDEP_KEYS) {
- if (!debug_locks_off_graph_unlock()) {
- raw_local_irq_restore(flags);
- return NULL;
- }
- raw_local_irq_restore(flags);
-
- printk("BUG: MAX_LOCKDEP_KEYS too low!\n");
- printk("turning off the locking correctness validator.\n");
- return NULL;
- }
- class = lock_classes + nr_lock_classes++;
- debug_atomic_inc(&nr_unused_locks);
- class->key = key;
- class->name = lock->name;
- class->subclass = subclass;
- INIT_LIST_HEAD(&class->lock_entry);
- INIT_LIST_HEAD(&class->locks_before);
- INIT_LIST_HEAD(&class->locks_after);
- class->name_version = count_matching_names(class);
- /*
- * We use RCU's safe list-add method to make
- * parallel walking of the hash-list safe:
- */
- list_add_tail_rcu(&class->hash_entry, hash_head);
-
- if (verbose(class)) {
- graph_unlock();
- raw_local_irq_restore(flags);
-
- printk("\nnew class %p: %s", class->key, class->name);
- if (class->name_version > 1)
- printk("#%d", class->name_version);
- printk("\n");
- dump_stack();
-
- raw_local_irq_save(flags);
- if (!graph_lock()) {
- raw_local_irq_restore(flags);
- return NULL;
- }
- }
-out_unlock_set:
- graph_unlock();
- raw_local_irq_restore(flags);
-
- if (!subclass || force)
- lock->class_cache = class;
-
- if (DEBUG_LOCKS_WARN_ON(class->subclass != subclass))
- return NULL;
-
- return class;
-}
-
-#ifdef CONFIG_PROVE_LOCKING
-/*
- * Allocate a lockdep entry. (assumes the graph_lock held, returns
- * with NULL on failure)
- */
-static struct lock_list *alloc_list_entry(void)
-{
- if (nr_list_entries >= MAX_LOCKDEP_ENTRIES) {
- if (!debug_locks_off_graph_unlock())
- return NULL;
-
- printk("BUG: MAX_LOCKDEP_ENTRIES too low!\n");
- printk("turning off the locking correctness validator.\n");
- return NULL;
- }
- return list_entries + nr_list_entries++;
-}
-
-/*
- * Add a new dependency to the head of the list:
- */
-static int add_lock_to_list(struct lock_class *class, struct lock_class *this,
- struct list_head *head, unsigned long ip, int distance)
-{
- struct lock_list *entry;
- /*
- * Lock not present yet - get a new dependency struct and
- * add it to the list:
- */
- entry = alloc_list_entry();
- if (!entry)
- return 0;
-
- entry->class = this;
- entry->distance = distance;
- if (!save_trace(&entry->trace))
- return 0;
-
- /*
- * Since we never remove from the dependency list, the list can
- * be walked lockless by other CPUs, it's only allocation
- * that must be protected by the spinlock. But this also means
- * we must make new entries visible only once writes to the
- * entry become visible - hence the RCU op:
- */
- list_add_tail_rcu(&entry->entry, head);
-
- return 1;
-}
-
-/*
- * Recursive, forwards-direction lock-dependency checking, used for
- * both noncyclic checking and for hardirq-unsafe/softirq-unsafe
- * checking.
- *
- * (to keep the stackframe of the recursive functions small we
- * use these global variables, and we also mark various helper
- * functions as noinline.)
- */
-static struct held_lock *check_source, *check_target;
-
-/*
- * Print a dependency chain entry (this is only done when a deadlock
- * has been detected):
- */
-static noinline int
-print_circular_bug_entry(struct lock_list *target, unsigned int depth)
-{
- if (debug_locks_silent)
- return 0;
- printk("\n-> #%u", depth);
- print_lock_name(target->class);
- printk(":\n");
- print_stack_trace(&target->trace, 6);
-
- return 0;
-}
-
-/*
- * When a circular dependency is detected, print the
- * header first:
- */
-static noinline int
-print_circular_bug_header(struct lock_list *entry, unsigned int depth)
-{
- struct task_struct *curr = current;
-
- if (!debug_locks_off_graph_unlock() || debug_locks_silent)
- return 0;
-
- printk("\n=======================================================\n");
- printk( "[ INFO: possible circular locking dependency detected ]\n");
- print_kernel_version();
- printk( "-------------------------------------------------------\n");
- printk("%s/%d is trying to acquire lock:\n",
- curr->comm, task_pid_nr(curr));
- print_lock(check_source);
- printk("\nbut task is already holding lock:\n");
- print_lock(check_target);
- printk("\nwhich lock already depends on the new lock.\n\n");
- printk("\nthe existing dependency chain (in reverse order) is:\n");
-
- print_circular_bug_entry(entry, depth);
-
- return 0;
-}
-
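-/*
- * Close out the circular-dependency report: print the acquiring
- * lock's entry, the held locks and a stack backtrace:
- */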
-static noinline int print_circular_bug_tail(void)
-{
- struct task_struct *curr = current;
- struct lock_list this;
-
- if (debug_locks_silent)
- return 0;
-
- this.class = check_source->class;
- if (!save_trace(&this.trace))
- return 0;
-
- print_circular_bug_entry(&this, 0);
-
- printk("\nother info that might help us debug this:\n\n");
- lockdep_print_held_locks(curr);
-
- printk("\nstack backtrace:\n");
- dump_stack();
-
- return 0;
-}
-
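-/*
- * Bound the depth of the recursive graph walks below; exceeding this
- * limit is reported as a validator problem via
- * print_infinite_recursion_bug():
- */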
-#define RECURSION_LIMIT 40
-
-static int noinline print_infinite_recursion_bug(void)
-{
- if (!debug_locks_off_graph_unlock())
- return 0;
-
- WARN_ON(1);
-
- return 0;
-}
-
-/*
- * Prove that the dependency graph starting at <entry> can not
- * lead to <target>. Print an error and return 0 if it does.
- */
-static noinline int
-check_noncircular(struct lock_class *source, unsigned int depth)
-{
- struct lock_list *entry;
-
- debug_atomic_inc(&nr_cyclic_check_recursions);
- if (depth > max_recursion_depth)
- max_recursion_depth = depth;
- if (depth >= RECURSION_LIMIT)
- return print_infinite_recursion_bug();
- /*
- * Check this lock's dependency list:
- */
- list_for_each_entry(entry, &source->locks_after, entry) {
- if (entry->class == check_target->class)
- return print_circular_bug_header(entry, depth+1);
- debug_atomic_inc(&nr_cyclic_checks);
- if (!check_noncircular(entry->class, depth+1))
- return print_circular_bug_entry(entry, depth+1);
- }
- return 1;
-}
-
-#ifdef CONFIG_TRACE_IRQFLAGS
-/*
- * Forwards and backwards subgraph searching, for the purposes of
- * proving that two subgraphs can be connected by a new dependency
- * without creating any illegal irq-safe -> irq-unsafe lock dependency.
- */
-static enum lock_usage_bit find_usage_bit;
-static struct lock_class *forwards_match, *backwards_match;
-
-/*
- * Find a node in the forwards-direction dependency sub-graph starting
- * at <source> that matches <find_usage_bit>.
- *
- * Return 2 if such a node exists in the subgraph, and put that node
- * into <forwards_match>.
- *
- * Return 1 otherwise and keep <forwards_match> unchanged.
- * Return 0 on error.
- */
-static noinline int
-find_usage_forwards(struct lock_class *source, unsigned int depth)
-{
- struct lock_list *entry;
- int ret;
-
- if (depth > max_recursion_depth)
- max_recursion_depth = depth;
- if (depth >= RECURSION_LIMIT)
- return print_infinite_recursion_bug();
-
- debug_atomic_inc(&nr_find_usage_forwards_checks);
- if (source->usage_mask & (1 << find_usage_bit)) {
- forwards_match = source;
- return 2;
- }
-
- /*
- * Check this lock's dependency list:
- */
- list_for_each_entry(entry, &source->locks_after, entry) {
- debug_atomic_inc(&nr_find_usage_forwards_recursions);
- ret = find_usage_forwards(entry->class, depth+1);
- if (ret == 2 || ret == 0)
- return ret;
- }
- return 1;
-}
-
-/*
- * Find a node in the backwards-direction dependency sub-graph starting
- * at <source> that matches <find_usage_bit>.
- *
- * Return 2 if such a node exists in the subgraph, and put that node
- * into <backwards_match>.
- *
- * Return 1 otherwise and keep <backwards_match> unchanged.
- * Return 0 on error.
- */
-static noinline int
-find_usage_backwards(struct lock_class *source, unsigned int depth)
-{
- struct lock_list *entry;
- int ret;
-
- if (!__raw_spin_is_locked(&lockdep_lock))
- return DEBUG_LOCKS_WARN_ON(1);
-
- if (depth > max_recursion_depth)
- max_recursion_depth = depth;
- if (depth >= RECURSION_LIMIT)
- return print_infinite_recursion_bug();
-
- debug_atomic_inc(&nr_find_usage_backwards_checks);
- if (source->usage_mask & (1 << find_usage_bit)) {
- backwards_match = source;
- return 2;
- }
-
- /*
- * Check this lock's dependency list:
- */
- list_for_each_entry(entry, &source->locks_before, entry) {
- debug_atomic_inc(&nr_find_usage_backwards_recursions);
- ret = find_usage_backwards(entry->class, depth+1);
- if (ret == 2 || ret == 0)
- return ret;
- }
- return 1;
-}
-
-static int
-print_bad_irq_dependency(struct task_struct *curr,
- struct held_lock *prev,
- struct held_lock *next,
- enum lock_usage_bit bit1,
- enum lock_usage_bit bit2,
- const char *irqclass)
-{
- if (!debug_locks_off_graph_unlock() || debug_locks_silent)
- return 0;
-
- printk("\n======================================================\n");
- printk( "[ INFO: %s-safe -> %s-unsafe lock order detected ]\n",
- irqclass, irqclass);
- print_kernel_version();
- printk( "------------------------------------------------------\n");
- printk("%s/%d [HC%u[%lu]:SC%u[%lu]:HE%u:SE%u] is trying to acquire:\n",
- curr->comm, task_pid_nr(curr),
- curr->hardirq_context, hardirq_count() >> HARDIRQ_SHIFT,
- curr->softirq_context, softirq_count() >> SOFTIRQ_SHIFT,
- curr->hardirqs_enabled,
- curr->softirqs_enabled);
- print_lock(next);
-
- printk("\nand this task is already holding:\n");
- print_lock(prev);
- printk("which would create a new lock dependency:\n");
- print_lock_name(prev->class);
- printk(" ->");
- print_lock_name(next->class);
- printk("\n");
-
- printk("\nbut this new dependency connects a %s-irq-safe lock:\n",
- irqclass);
- print_lock_name(backwards_match);
- printk("\n... which became %s-irq-safe at:\n", irqclass);
-
- print_stack_trace(backwards_match->usage_traces + bit1, 1);
-
- printk("\nto a %s-irq-unsafe lock:\n", irqclass);
- print_lock_name(forwards_match);
- printk("\n... which became %s-irq-unsafe at:\n", irqclass);
- printk("...");
-
- print_stack_trace(forwards_match->usage_traces + bit2, 1);
-
- printk("\nother info that might help us debug this:\n\n");
- lockdep_print_held_locks(curr);
-
- printk("\nthe %s-irq-safe lock's dependencies:\n", irqclass);
- print_lock_dependencies(backwards_match, 0);
-
- printk("\nthe %s-irq-unsafe lock's dependencies:\n", irqclass);
- print_lock_dependencies(forwards_match, 0);
-
- printk("\nstack backtrace:\n");
- dump_stack();
-
- return 0;
-}
-
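-/*
- * Check that adding the prev -> next dependency would not connect a
- * lock marked <bit_backwards> (reachable backwards from prev) to a
- * lock marked <bit_forwards> (reachable forwards from next):
- */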
-static int
-check_usage(struct task_struct *curr, struct held_lock *prev,
- struct held_lock *next, enum lock_usage_bit bit_backwards,
- enum lock_usage_bit bit_forwards, const char *irqclass)
-{
- int ret;
-
- find_usage_bit = bit_backwards;
- /* fills in <backwards_match> */
- ret = find_usage_backwards(prev->class, 0);
- if (!ret || ret == 1)
- return ret;
-
- find_usage_bit = bit_forwards;
- ret = find_usage_forwards(next->class, 0);
- if (!ret || ret == 1)
- return ret;
- /* ret == 2 */
- return print_bad_irq_dependency(curr, prev, next,
- bit_backwards, bit_forwards, irqclass);
-}
-
-static int
-check_prev_add_irq(struct task_struct *curr, struct held_lock *prev,
- struct held_lock *next)
-{
- /*
- * Prove that the new dependency does not connect a hardirq-safe
- * lock with a hardirq-unsafe lock - to achieve this we search
- * the backwards-subgraph starting at <prev>, and the
- * forwards-subgraph starting at <next>:
- */
- if (!check_usage(curr, prev, next, LOCK_USED_IN_HARDIRQ,
- LOCK_ENABLED_HARDIRQS, "hard"))
- return 0;
-
- /*
- * Prove that the new dependency does not connect a hardirq-safe-read
- * lock with a hardirq-unsafe lock - to achieve this we search
- * the backwards-subgraph starting at <prev>, and the
- * forwards-subgraph starting at <next>:
- */
- if (!check_usage(curr, prev, next, LOCK_USED_IN_HARDIRQ_READ,
- LOCK_ENABLED_HARDIRQS, "hard-read"))
- return 0;
-
- /*
- * Prove that the new dependency does not connect a softirq-safe
- * lock with a softirq-unsafe lock - to achieve this we search
- * the backwards-subgraph starting at <prev>, and the
- * forwards-subgraph starting at <next>:
- */
- if (!check_usage(curr, prev, next, LOCK_USED_IN_SOFTIRQ,
- LOCK_ENABLED_SOFTIRQS, "soft"))
- return 0;
- /*
- * Prove that the new dependency does not connect a softirq-safe-read
- * lock with a softirq-unsafe lock - to achieve this we search
- * the backwards-subgraph starting at <prev>, and the
- * forwards-subgraph starting at <next>:
- */
- if (!check_usage(curr, prev, next, LOCK_USED_IN_SOFTIRQ_READ,
- LOCK_ENABLED_SOFTIRQS, "soft"))
- return 0;
-
- return 1;
-}
-
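-/*
- * Credit the new dependency chain to the counter matching the
- * current context:
- */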
-static void inc_chains(void)
-{
- if (current->hardirq_context)
- nr_hardirq_chains++;
- else {
- if (current->softirq_context)
- nr_softirq_chains++;
- else
- nr_process_chains++;
- }
-}
-
-#else
-
-static inline int
-check_prev_add_irq(struct task_struct *curr, struct held_lock *prev,
- struct held_lock *next)
-{
- return 1;
-}
-
-static inline void inc_chains(void)
-{
- nr_process_chains++;
-}
-
-#endif
-
-static int
-print_deadlock_bug(struct task_struct *curr, struct held_lock *prev,
- struct held_lock *next)
-{
- if (!debug_locks_off_graph_unlock() || debug_locks_silent)
- return 0;
-
- printk("\n=============================================\n");
- printk( "[ INFO: possible recursive locking detected ]\n");
- print_kernel_version();
- printk( "---------------------------------------------\n");
- printk("%s/%d is trying to acquire lock:\n",
- curr->comm, task_pid_nr(curr));
- print_lock(next);
- printk("\nbut task is already holding lock:\n");
- print_lock(prev);
-
- printk("\nother info that might help us debug this:\n");
- lockdep_print_held_locks(curr);
-
- printk("\nstack backtrace:\n");
- dump_stack();
-
- return 0;
-}
-
-/*
- * Check whether we are holding such a class already.
- *
- * (Note that this has to be done separately, because the graph cannot
- * detect such classes of deadlocks.)
- *
- * Returns: 0 on deadlock detected, 1 on OK, 2 on recursive read
- */
-static int
-check_deadlock(struct task_struct *curr, struct held_lock *next,
- struct lockdep_map *next_instance, int read)
-{
- struct held_lock *prev;
- int i;
-
- for (i = 0; i < curr->lockdep_depth; i++) {
- prev = curr->held_locks + i;
- if (prev->class != next->class)
- continue;
- /*
- * Allow read-after-read recursion of the same
- * lock class (i.e. read_lock(lock)+read_lock(lock)):
- */
- if ((read == 2) && prev->read)
- return 2;
- return print_deadlock_bug(curr, prev, next);
- }
- return 1;
-}
-
-/*
- * There was a chain-cache miss, and we are about to add a new dependency
- * to a previous lock. We recursively validate the following rules:
- *
- * - would the adding of the <prev> -> <next> dependency create a
- * circular dependency in the graph? [== circular deadlock]
- *
- * - does the new prev->next dependency connect any hardirq-safe lock
- * (in the full backwards-subgraph starting at <prev>) with any
- * hardirq-unsafe lock (in the full forwards-subgraph starting at
- * <next>)? [== illegal lock inversion with hardirq contexts]
- *
- * - does the new prev->next dependency connect any softirq-safe lock
- * (in the full backwards-subgraph starting at <prev>) with any
- * softirq-unsafe lock (in the full forwards-subgraph starting at
- * <next>)? [== illegal lock inversion with softirq contexts]
- *
- * any of these scenarios could lead to a deadlock.
- *
- * Then if all the validations pass, we add the forwards and backwards
- * dependency.
- */
-static int
-check_prev_add(struct task_struct *curr, struct held_lock *prev,
- struct held_lock *next, int distance)
-{
- struct lock_list *entry;
- int ret;
-
- /*
- * Prove that the new <prev> -> <next> dependency would not
- * create a circular dependency in the graph. (We do this by
- * forward-recursing into the graph starting at <next>, and
- * checking whether we can reach <prev>.)
- *
- * We are using global variables to control the recursion, to
- * keep the stackframe size of the recursive functions low:
- */
- check_source = next;
- check_target = prev;
- if (!(check_noncircular(next->class, 0)))
- return print_circular_bug_tail();
-
- if (!check_prev_add_irq(curr, prev, next))
- return 0;
-
- /*
- * For recursive read-locks we do all the dependency checks,
- * but we don't store read-triggered dependencies (only
- * write-triggered dependencies). This ensures that only the
- * write-side dependencies matter, and that if for example a
- * write-lock never takes any other locks, then the reads are
- * equivalent to a NOP.
- */
- if (next->read == 2 || prev->read == 2)
- return 1;
- /*
- * Is the <prev> -> <next> dependency already present?
- *
- * (this may occur even though this is a new chain: consider
- * e.g. the L1 -> L2 -> L3 -> L4 and the L5 -> L1 -> L2 -> L3
- * chains - the second one will be new, but L1 already has
- * L2 added to its dependency list, due to the first chain.)
- */
- list_for_each_entry(entry, &prev->class->locks_after, entry) {
- if (entry->class == next->class) {
- if (distance == 1)
- entry->distance = 1;
- return 2;
- }
- }
-
- /*
- * Ok, all validations passed, add the new lock
- * to the previous lock's dependency list:
- */
- ret = add_lock_to_list(prev->class, next->class,
- &prev->class->locks_after, next->acquire_ip, distance);
-
- if (!ret)
- return 0;
-
- ret = add_lock_to_list(next->class, prev->class,
- &next->class->locks_before, next->acquire_ip, distance);
- if (!ret)
- return 0;
-
- /*
- * Debugging printouts:
- */
- if (verbose(prev->class) || verbose(next->class)) {
- graph_unlock();
- printk("\n new dependency: ");
- print_lock_name(prev->class);
- printk(" => ");
- print_lock_name(next->class);
- printk("\n");
- dump_stack();
- return graph_lock();
- }
- return 1;
-}
-
-/*
- * Add the dependency to all directly-previous locks that are 'relevant'.
- * The ones that are relevant are (in increasing distance from curr):
- * all consecutive trylock entries and the final non-trylock entry - or
- * the end of this context's lock-chain - whichever comes first.
- */
-static int
-check_prevs_add(struct task_struct *curr, struct held_lock *next)
-{
- int depth = curr->lockdep_depth;
- struct held_lock *hlock;
-
- /*
- * Debugging checks.
- *
- * Depth must not be zero for a non-head lock:
- */
- if (!depth)
- goto out_bug;
- /*
- * At least two relevant locks must exist for this
- * to be a head:
- */
- if (curr->held_locks[depth].irq_context !=
- curr->held_locks[depth-1].irq_context)
- goto out_bug;
-
- for (;;) {
- int distance = curr->lockdep_depth - depth + 1;
- hlock = curr->held_locks + depth-1;
- /*
- * Only non-recursive-read entries get new dependencies
- * added:
- */
- if (hlock->read != 2) {
- if (!check_prev_add(curr, hlock, next, distance))
- return 0;
- /*
- * Stop after the first non-trylock entry,
- * as non-trylock entries have added their
- * own direct dependencies already, so this
- * lock is connected to them indirectly:
- */
- if (!hlock->trylock)
- break;
- }
- depth--;
- /*
- * End of lock-stack?
- */
- if (!depth)
- break;
- /*
- * Stop the search if we cross into another context:
- */
- if (curr->held_locks[depth].irq_context !=
- curr->held_locks[depth-1].irq_context)
- break;
- }
- return 1;
-out_bug:
- if (!debug_locks_off_graph_unlock())
- return 0;
-
- WARN_ON(1);
-
- return 0;
-}
-
-unsigned long nr_lock_chains;
-static struct lock_chain lock_chains[MAX_LOCKDEP_CHAINS];
-
-/*
- * Look up a dependency chain. If the key is not present yet then
- * add it and return 1 - in this case the new dependency chain is
- * validated. If the key is already hashed, return 0.
- * (On return with 1 graph_lock is held.)
- */
-static inline int lookup_chain_cache(u64 chain_key, struct lock_class *class)
-{
- struct list_head *hash_head = chainhashentry(chain_key);
- struct lock_chain *chain;
-
- if (DEBUG_LOCKS_WARN_ON(!irqs_disabled()))
- return 0;
- /*
- * We can walk it lock-free, because entries only get added
- * to the hash:
- */
- list_for_each_entry(chain, hash_head, entry) {
- if (chain->chain_key == chain_key) {
-cache_hit:
- debug_atomic_inc(&chain_lookup_hits);
- if (very_verbose(class))
- printk("\nhash chain already cached, key: "
- "%016Lx tail class: [%p] %s\n",
- (unsigned long long)chain_key,
- class->key, class->name);
- return 0;
- }
- }
- if (very_verbose(class))
- printk("\nnew hash chain, key: %016Lx tail class: [%p] %s\n",
- (unsigned long long)chain_key, class->key, class->name);
- /*
- * Allocate a new chain entry from the static array, and add
- * it to the hash:
- */
- if (!graph_lock())
- return 0;
- /*
- * We have to walk the chain again locked - to avoid duplicates:
- */
- list_for_each_entry(chain, hash_head, entry) {
- if (chain->chain_key == chain_key) {
- graph_unlock();
- goto cache_hit;
- }
- }
- if (unlikely(nr_lock_chains >= MAX_LOCKDEP_CHAINS)) {
- if (!debug_locks_off_graph_unlock())
- return 0;
-
- printk("BUG: MAX_LOCKDEP_CHAINS too low!\n");
- printk("turning off the locking correctness validator.\n");
- return 0;
- }
- chain = lock_chains + nr_lock_chains++;
- chain->chain_key = chain_key;
- list_add_tail_rcu(&chain->entry, hash_head);
- debug_atomic_inc(&chain_lookup_misses);
- inc_chains();
-
- return 1;
-}
-
-static int validate_chain(struct task_struct *curr, struct lockdep_map *lock,
- struct held_lock *hlock, int chain_head, u64 chain_key)
-{
- /*
- * Trylock needs to maintain the stack of held locks, but it
- * does not add new dependencies, because trylock can be done
- * in any order.
- *
- * We look up the chain_key and do the O(N^2) check and update of
- * the dependencies only if this is a new dependency chain.
- * (If lookup_chain_cache() returns with 1 it acquires
- * graph_lock for us)
- */
- if (!hlock->trylock && (hlock->check == 2) &&
- lookup_chain_cache(chain_key, hlock->class)) {
- /*
- * Check whether last held lock:
- *
- * - is irq-safe, if this lock is irq-unsafe
- * - is softirq-safe, if this lock is hardirq-unsafe
- *
- * And check whether the new lock's dependency graph
- * could lead back to the previous lock.
- *
- * Any of these scenarios could lead to a deadlock. If all
- * validations pass, the new dependencies are recorded below.
- */
- int ret = check_deadlock(curr, hlock, lock, hlock->read);
-
- if (!ret)
- return 0;
- /*
- * Mark recursive read, as we jump over it when
- * building dependencies (just like we jump over
- * trylock entries):
- */
- if (ret == 2)
- hlock->read = 2;
- /*
- * Add dependency only if this lock is not the head
- * of the chain, and if it's not a secondary read-lock:
- */
- if (!chain_head && ret != 2)
- if (!check_prevs_add(curr, hlock))
- return 0;
- graph_unlock();
- } else
- /* after lookup_chain_cache(): */
- if (unlikely(!debug_locks))
- return 0;
-
- return 1;
-}
-#else
-static inline int validate_chain(struct task_struct *curr,
- struct lockdep_map *lock, struct held_lock *hlock,
- int chain_head, u64 chain_key)
-{
- return 1;
-}
-#endif
-
-/*
- * We are building curr_chain_key incrementally, so double-check
- * it from scratch, to make sure that it's done correctly:
- */
-static void check_chain_key(struct task_struct *curr)
-{
-#ifdef CONFIG_DEBUG_LOCKDEP
- struct held_lock *hlock, *prev_hlock = NULL;
- unsigned int i, id;
- u64 chain_key = 0;
-
- for (i = 0; i < curr->lockdep_depth; i++) {
- hlock = curr->held_locks + i;
- if (chain_key != hlock->prev_chain_key) {
- debug_locks_off();
- printk("hm#1, depth: %u [%u], %016Lx != %016Lx\n",
- curr->lockdep_depth, i,
- (unsigned long long)chain_key,
- (unsigned long long)hlock->prev_chain_key);
- WARN_ON(1);
- return;
- }
- id = hlock->class - lock_classes;
- if (DEBUG_LOCKS_WARN_ON(id >= MAX_LOCKDEP_KEYS))
- return;
-
- if (prev_hlock && (prev_hlock->irq_context !=
- hlock->irq_context))
- chain_key = 0;
- chain_key = iterate_chain_key(chain_key, id);
- prev_hlock = hlock;
- }
- if (chain_key != curr->curr_chain_key) {
- debug_locks_off();
- printk("hm#2, depth: %u [%u], %016Lx != %016Lx\n",
- curr->lockdep_depth, i,
- (unsigned long long)chain_key,
- (unsigned long long)curr->curr_chain_key);
- WARN_ON(1);
- }
-#endif
-}
-
-static int
-print_usage_bug(struct task_struct *curr, struct held_lock *this,
- enum lock_usage_bit prev_bit, enum lock_usage_bit new_bit)
-{
- if (!debug_locks_off_graph_unlock() || debug_locks_silent)
- return 0;
-
- printk("\n=================================\n");
- printk( "[ INFO: inconsistent lock state ]\n");
- print_kernel_version();
- printk( "---------------------------------\n");
-
- printk("inconsistent {%s} -> {%s} usage.\n",
- usage_str[prev_bit], usage_str[new_bit]);
-
- printk("%s/%d [HC%u[%lu]:SC%u[%lu]:HE%u:SE%u] takes:\n",
- curr->comm, task_pid_nr(curr),
- trace_hardirq_context(curr), hardirq_count() >> HARDIRQ_SHIFT,
- trace_softirq_context(curr), softirq_count() >> SOFTIRQ_SHIFT,
- trace_hardirqs_enabled(curr),
- trace_softirqs_enabled(curr));
- print_lock(this);
-
- printk("{%s} state was registered at:\n", usage_str[prev_bit]);
- print_stack_trace(this->class->usage_traces + prev_bit, 1);
-
- print_irqtrace_events(curr);
- printk("\nother info that might help us debug this:\n");
- lockdep_print_held_locks(curr);
-
- printk("\nstack backtrace:\n");
- dump_stack();
-
- return 0;
-}
-
-/*
- * Print out an error if an invalid bit is set:
- */
-static inline int
-valid_state(struct task_struct *curr, struct held_lock *this,
- enum lock_usage_bit new_bit, enum lock_usage_bit bad_bit)
-{
- if (unlikely(this->class->usage_mask & (1 << bad_bit)))
- return print_usage_bug(curr, this, bad_bit, new_bit);
- return 1;
-}
-
-static int mark_lock(struct task_struct *curr, struct held_lock *this,
- enum lock_usage_bit new_bit);
-
-#ifdef CONFIG_TRACE_IRQFLAGS
-
-/*
- * print irq inversion bug:
- */
-static int
-print_irq_inversion_bug(struct task_struct *curr, struct lock_class *other,
- struct held_lock *this, int forwards,
- const char *irqclass)
-{
- if (!debug_locks_off_graph_unlock() || debug_locks_silent)
- return 0;
-
- printk("\n=========================================================\n");
- printk( "[ INFO: possible irq lock inversion dependency detected ]\n");
- print_kernel_version();
- printk( "---------------------------------------------------------\n");
- printk("%s/%d just changed the state of lock:\n",
- curr->comm, task_pid_nr(curr));
- print_lock(this);
- if (forwards)
- printk("but this lock took another, %s-irq-unsafe lock in the past:\n", irqclass);
- else
- printk("but this lock was taken by another, %s-irq-safe lock in the past:\n", irqclass);
- print_lock_name(other);
- printk("\n\nand interrupts could create inverse lock ordering between them.\n\n");
-
- printk("\nother info that might help us debug this:\n");
- lockdep_print_held_locks(curr);
-
- printk("\nthe first lock's dependencies:\n");
- print_lock_dependencies(this->class, 0);
-
- printk("\nthe second lock's dependencies:\n");
- print_lock_dependencies(other, 0);
-
- printk("\nstack backtrace:\n");
- dump_stack();
-
- return 0;
-}
-
-/*
- * Prove that in the forwards-direction subgraph starting at <this>
- * there is no lock matching <mask>:
- */
-static int
-check_usage_forwards(struct task_struct *curr, struct held_lock *this,
- enum lock_usage_bit bit, const char *irqclass)
-{
- int ret;
-
- find_usage_bit = bit;
- /* fills in <forwards_match> */
- ret = find_usage_forwards(this->class, 0);
- if (!ret || ret == 1)
- return ret;
-
- return print_irq_inversion_bug(curr, forwards_match, this, 1, irqclass);
-}
-
-/*
- * Prove that in the backwards-direction subgraph starting at <this>
- * there is no lock matching <mask>:
- */
-static int
-check_usage_backwards(struct task_struct *curr, struct held_lock *this,
- enum lock_usage_bit bit, const char *irqclass)
-{
- int ret;
-
- find_usage_bit = bit;
- /* fills in <backwards_match> */
- ret = find_usage_backwards(this->class, 0);
- if (!ret || ret == 1)
- return ret;
-
- return print_irq_inversion_bug(curr, backwards_match, this, 0, irqclass);
-}
-
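-/*
- * Dump the task's most recent hardirq/softirq enable and disable
- * events, to help pinpoint where the irq state last changed:
- */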
-void print_irqtrace_events(struct task_struct *curr)
-{
- printk("irq event stamp: %u\n", curr->irq_events);
- printk("hardirqs last enabled at (%u): ", curr->hardirq_enable_event);
- print_ip_sym(curr->hardirq_enable_ip);
- printk("hardirqs last disabled at (%u): ", curr->hardirq_disable_event);
- print_ip_sym(curr->hardirq_disable_ip);
- printk("softirqs last enabled at (%u): ", curr->softirq_enable_event);
- print_ip_sym(curr->softirq_enable_ip);
- printk("softirqs last disabled at (%u): ", curr->softirq_disable_event);
- print_ip_sym(curr->softirq_disable_ip);
-}
-
-static int hardirq_verbose(struct lock_class *class)
-{
-#if HARDIRQ_VERBOSE
- return class_filter(class);
-#endif
- return 0;
-}
-
-static int softirq_verbose(struct lock_class *class)
-{
-#if SOFTIRQ_VERBOSE
- return class_filter(class);
-#endif
- return 0;
-}
-
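-/*
- * When set, read-side usage bits are validated as strictly as the
- * write-side ones:
- */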
-#define STRICT_READ_CHECKS 1
-
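-/*
- * Handle an irq-related usage-bit transition: check it against the
- * conflicting usage bits, then walk the dependency graph forwards
- * (for -safe bits) or backwards (for -unsafe bits) as appropriate:
- */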
-static int mark_lock_irq(struct task_struct *curr, struct held_lock *this,
- enum lock_usage_bit new_bit)
-{
- int ret = 1;
-
- switch(new_bit) {
- case LOCK_USED_IN_HARDIRQ:
- if (!valid_state(curr, this, new_bit, LOCK_ENABLED_HARDIRQS))
- return 0;
- if (!valid_state(curr, this, new_bit,
- LOCK_ENABLED_HARDIRQS_READ))
- return 0;
- /*
- * just marked it hardirq-safe, check that this lock
- * took no hardirq-unsafe lock in the past:
- */
- if (!check_usage_forwards(curr, this,
- LOCK_ENABLED_HARDIRQS, "hard"))
- return 0;
-#if STRICT_READ_CHECKS
- /*
- * just marked it hardirq-safe, check that this lock
- * took no hardirq-unsafe-read lock in the past:
- */
- if (!check_usage_forwards(curr, this,
- LOCK_ENABLED_HARDIRQS_READ, "hard-read"))
- return 0;
-#endif
- if (hardirq_verbose(this->class))
- ret = 2;
- break;
- case LOCK_USED_IN_SOFTIRQ:
- if (!valid_state(curr, this, new_bit, LOCK_ENABLED_SOFTIRQS))
- return 0;
- if (!valid_state(curr, this, new_bit,
- LOCK_ENABLED_SOFTIRQS_READ))
- return 0;
- /*
- * just marked it softirq-safe, check that this lock
- * took no softirq-unsafe lock in the past:
- */
- if (!check_usage_forwards(curr, this,
- LOCK_ENABLED_SOFTIRQS, "soft"))
- return 0;
-#if STRICT_READ_CHECKS
- /*
- * just marked it softirq-safe, check that this lock
- * took no softirq-unsafe-read lock in the past:
- */
- if (!check_usage_forwards(curr, this,
- LOCK_ENABLED_SOFTIRQS_READ, "soft-read"))
- return 0;
-#endif
- if (softirq_verbose(this->class))
- ret = 2;
- break;
- case LOCK_USED_IN_HARDIRQ_READ:
- if (!valid_state(curr, this, new_bit, LOCK_ENABLED_HARDIRQS))
- return 0;
- /*
- * just marked it hardirq-read-safe, check that this lock
- * took no hardirq-unsafe lock in the past:
- */
- if (!check_usage_forwards(curr, this,
- LOCK_ENABLED_HARDIRQS, "hard"))
- return 0;
- if (hardirq_verbose(this->class))
- ret = 2;
- break;
- case LOCK_USED_IN_SOFTIRQ_READ:
- if (!valid_state(curr, this, new_bit, LOCK_ENABLED_SOFTIRQS))
- return 0;
- /*
- * just marked it softirq-read-safe, check that this lock
- * took no softirq-unsafe lock in the past:
- */
- if (!check_usage_forwards(curr, this,
- LOCK_ENABLED_SOFTIRQS, "soft"))
- return 0;
- if (softirq_verbose(this->class))
- ret = 2;
- break;
- case LOCK_ENABLED_HARDIRQS:
- if (!valid_state(curr, this, new_bit, LOCK_USED_IN_HARDIRQ))
- return 0;
- if (!valid_state(curr, this, new_bit,
- LOCK_USED_IN_HARDIRQ_READ))
- return 0;
- /*
- * just marked it hardirq-unsafe, check that no hardirq-safe
- * lock in the system ever took it in the past:
- */
- if (!check_usage_backwards(curr, this,
- LOCK_USED_IN_HARDIRQ, "hard"))
- return 0;
-#if STRICT_READ_CHECKS
- /*
- * just marked it hardirq-unsafe, check that no
- * hardirq-safe-read lock in the system ever took
- * it in the past:
- */
- if (!check_usage_backwards(curr, this,
- LOCK_USED_IN_HARDIRQ_READ, "hard-read"))
- return 0;
-#endif
- if (hardirq_verbose(this->class))
- ret = 2;
- break;
- case LOCK_ENABLED_SOFTIRQS:
- if (!valid_state(curr, this, new_bit, LOCK_USED_IN_SOFTIRQ))
- return 0;
- if (!valid_state(curr, this, new_bit,
- LOCK_USED_IN_SOFTIRQ_READ))
- return 0;
- /*
- * just marked it softirq-unsafe, check that no softirq-safe
- * lock in the system ever took it in the past:
- */
- if (!check_usage_backwards(curr, this,
- LOCK_USED_IN_SOFTIRQ, "soft"))
- return 0;
-#if STRICT_READ_CHECKS
- /*
- * just marked it softirq-unsafe, check that no
- * softirq-safe-read lock in the system ever took
- * it in the past:
- */
- if (!check_usage_backwards(curr, this,
- LOCK_USED_IN_SOFTIRQ_READ, "soft-read"))
- return 0;
-#endif
- if (softirq_verbose(this->class))
- ret = 2;
- break;
- case LOCK_ENABLED_HARDIRQS_READ:
- if (!valid_state(curr, this, new_bit, LOCK_USED_IN_HARDIRQ))
- return 0;
-#if STRICT_READ_CHECKS
- /*
- * just marked it hardirq-read-unsafe, check that no
- * hardirq-safe lock in the system ever took it in the past:
- */
- if (!check_usage_backwards(curr, this,
- LOCK_USED_IN_HARDIRQ, "hard"))
- return 0;
-#endif
- if (hardirq_verbose(this->class))
- ret = 2;
- break;
- case LOCK_ENABLED_SOFTIRQS_READ:
- if (!valid_state(curr, this, new_bit, LOCK_USED_IN_SOFTIRQ))
- return 0;
-#if STRICT_READ_CHECKS
- /*
- * just marked it softirq-read-unsafe, check that no
- * softirq-safe lock in the system ever took it in the past:
- */
- if (!check_usage_backwards(curr, this,
- LOCK_USED_IN_SOFTIRQ, "soft"))
- return 0;
-#endif
- if (softirq_verbose(this->class))
- ret = 2;
- break;
- default:
- WARN_ON(1);
- break;
- }
-
- return ret;
-}
-
-/*
- * Mark all held locks with a usage bit:
- */
-static int
-mark_held_locks(struct task_struct *curr, int hardirq)
-{
- enum lock_usage_bit usage_bit;
- struct held_lock *hlock;
- int i;
-
- for (i = 0; i < curr->lockdep_depth; i++) {
- hlock = curr->held_locks + i;
-
- if (hardirq) {
- if (hlock->read)
- usage_bit = LOCK_ENABLED_HARDIRQS_READ;
- else
- usage_bit = LOCK_ENABLED_HARDIRQS;
- } else {
- if (hlock->read)
- usage_bit = LOCK_ENABLED_SOFTIRQS_READ;
- else
- usage_bit = LOCK_ENABLED_SOFTIRQS;
- }
- if (!mark_lock(curr, hlock, usage_bit))
- return 0;
- }
-
- return 1;
-}
-
-/*
- * Debugging helper: via this flag we know that we are in
- * 'early bootup code', and will warn about any invalid irqs-on event:
- */
-static int early_boot_irqs_enabled;
-
-void early_boot_irqs_off(void)
-{
- early_boot_irqs_enabled = 0;
-}
-
-void early_boot_irqs_on(void)
-{
- early_boot_irqs_enabled = 1;
-}
-
-/*
- * Hardirqs will be enabled:
- */
-void trace_hardirqs_on(void)
-{
- struct task_struct *curr = current;
- unsigned long ip;
-
- if (unlikely(!debug_locks || current->lockdep_recursion))
- return;
-
- if (DEBUG_LOCKS_WARN_ON(unlikely(!early_boot_irqs_enabled)))
- return;
-
- if (unlikely(curr->hardirqs_enabled)) {
- debug_atomic_inc(&redundant_hardirqs_on);
- return;
- }
- /* we'll do an OFF -> ON transition: */
- curr->hardirqs_enabled = 1;
- ip = (unsigned long) __builtin_return_address(0);
-
- if (DEBUG_LOCKS_WARN_ON(!irqs_disabled()))
- return;
- if (DEBUG_LOCKS_WARN_ON(current->hardirq_context))
- return;
- /*
- * We are going to turn hardirqs on, so set the
- * usage bit for all held locks:
- */
- if (!mark_held_locks(curr, 1))
- return;
- /*
- * If we have softirqs enabled, then set the usage
- * bit for all held locks. (disabled hardirqs prevented
- * this bit from being set before)
- */
- if (curr->softirqs_enabled)
- if (!mark_held_locks(curr, 0))
- return;
-
- curr->hardirq_enable_ip = ip;
- curr->hardirq_enable_event = ++curr->irq_events;
- debug_atomic_inc(&hardirqs_on_events);
-}
-
-EXPORT_SYMBOL(trace_hardirqs_on);
-
-/*
- * Hardirqs were disabled:
- */
-void trace_hardirqs_off(void)
-{
- struct task_struct *curr = current;
-
- if (unlikely(!debug_locks || current->lockdep_recursion))
- return;
-
- if (DEBUG_LOCKS_WARN_ON(!irqs_disabled()))
- return;
-
- if (curr->hardirqs_enabled) {
- /*
- * We have done an ON -> OFF transition:
- */
- curr->hardirqs_enabled = 0;
- curr->hardirq_disable_ip = _RET_IP_;
- curr->hardirq_disable_event = ++curr->irq_events;
- debug_atomic_inc(&hardirqs_off_events);
- } else
- debug_atomic_inc(&redundant_hardirqs_off);
-}
-
-EXPORT_SYMBOL(trace_hardirqs_off);
-
-/*
- * Softirqs will be enabled:
- */
-void trace_softirqs_on(unsigned long ip)
-{
- struct task_struct *curr = current;
-
- if (unlikely(!debug_locks))
- return;
-
- if (DEBUG_LOCKS_WARN_ON(!irqs_disabled()))
- return;
-
- if (curr->softirqs_enabled) {
- debug_atomic_inc(&redundant_softirqs_on);
- return;
- }
-
- /*
- * We'll do an OFF -> ON transition:
- */
- curr->softirqs_enabled = 1;
- curr->softirq_enable_ip = ip;
- curr->softirq_enable_event = ++curr->irq_events;
- debug_atomic_inc(&softirqs_on_events);
- /*
- * We are going to turn softirqs on, so set the
- * usage bit for all held locks, if hardirqs are
- * enabled too:
- */
- if (curr->hardirqs_enabled)
- mark_held_locks(curr, 0);
-}
-
-/*
- * Softirqs were disabled:
- */
-void trace_softirqs_off(unsigned long ip)
-{
- struct task_struct *curr = current;
-
- if (unlikely(!debug_locks))
- return;
-
- if (DEBUG_LOCKS_WARN_ON(!irqs_disabled()))
- return;
-
- if (curr->softirqs_enabled) {
- /*
- * We have done an ON -> OFF transition:
- */
- curr->softirqs_enabled = 0;
- curr->softirq_disable_ip = ip;
- curr->softirq_disable_event = ++curr->irq_events;
- debug_atomic_inc(&softirqs_off_events);
- DEBUG_LOCKS_WARN_ON(!softirq_count());
- } else
- debug_atomic_inc(&redundant_softirqs_off);
-}
-
-static int mark_irqflags(struct task_struct *curr, struct held_lock *hlock)
-{
- /*
- * If non-trylock use in a hardirq or softirq context, then
- * mark the lock as used in these contexts:
- */
- if (!hlock->trylock) {
- if (hlock->read) {
- if (curr->hardirq_context)
- if (!mark_lock(curr, hlock,
- LOCK_USED_IN_HARDIRQ_READ))
- return 0;
- if (curr->softirq_context)
- if (!mark_lock(curr, hlock,
- LOCK_USED_IN_SOFTIRQ_READ))
- return 0;
- } else {
- if (curr->hardirq_context)
- if (!mark_lock(curr, hlock, LOCK_USED_IN_HARDIRQ))
- return 0;
- if (curr->softirq_context)
- if (!mark_lock(curr, hlock, LOCK_USED_IN_SOFTIRQ))
- return 0;
- }
- }
- if (!hlock->hardirqs_off) {
- if (hlock->read) {
- if (!mark_lock(curr, hlock,
- LOCK_ENABLED_HARDIRQS_READ))
- return 0;
- if (curr->softirqs_enabled)
- if (!mark_lock(curr, hlock,
- LOCK_ENABLED_SOFTIRQS_READ))
- return 0;
- } else {
- if (!mark_lock(curr, hlock,
- LOCK_ENABLED_HARDIRQS))
- return 0;
- if (curr->softirqs_enabled)
- if (!mark_lock(curr, hlock,
- LOCK_ENABLED_SOFTIRQS))
- return 0;
- }
- }
-
- return 1;
-}
-
-static int separate_irq_context(struct task_struct *curr,
- struct held_lock *hlock)
-{
- unsigned int depth = curr->lockdep_depth;
-
- /*
- * Keep track of points where we cross into an interrupt context:
- */
- hlock->irq_context = 2*(curr->hardirq_context ? 1 : 0) +
- curr->softirq_context;
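- /*
- * (the value packs the hardirq flag together with the softirq
- * nesting level; two held locks are in the same context exactly
- * when these values are equal)
- */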
- if (depth) {
- struct held_lock *prev_hlock;
-
- prev_hlock = curr->held_locks + depth-1;
- /*
- * If we cross into another context, reset the
- * hash key (this also prevents the checking and the
- * adding of the dependency to 'prev'):
- */
- if (prev_hlock->irq_context != hlock->irq_context)
- return 1;
- }
- return 0;
-}
-
-#else
-
-static inline
-int mark_lock_irq(struct task_struct *curr, struct held_lock *this,
- enum lock_usage_bit new_bit)
-{
- WARN_ON(1);
- return 1;
-}
-
-static inline int mark_irqflags(struct task_struct *curr,
- struct held_lock *hlock)
-{
- return 1;
-}
-
-static inline int separate_irq_context(struct task_struct *curr,
- struct held_lock *hlock)
-{
- return 0;
-}
-
-#endif
-
-/*
- * Mark a lock with a usage bit, and validate the state transition:
- */
-static int mark_lock(struct task_struct *curr, struct held_lock *this,
- enum lock_usage_bit new_bit)
-{
- unsigned int new_mask = 1 << new_bit, ret = 1;
-
- /*
- * If already set then do not dirty the cacheline,
- * nor do any checks:
- */
- if (likely(this->class->usage_mask & new_mask))
- return 1;
-
- if (!graph_lock())
- return 0;
- /*
- * Make sure we didn't race:
- */
- if (unlikely(this->class->usage_mask & new_mask)) {
- graph_unlock();
- return 1;
- }
-
- this->class->usage_mask |= new_mask;
-
- if (!save_trace(this->class->usage_traces + new_bit))
- return 0;
-
- switch (new_bit) {
- case LOCK_USED_IN_HARDIRQ:
- case LOCK_USED_IN_SOFTIRQ:
- case LOCK_USED_IN_HARDIRQ_READ:
- case LOCK_USED_IN_SOFTIRQ_READ:
- case LOCK_ENABLED_HARDIRQS:
- case LOCK_ENABLED_SOFTIRQS:
- case LOCK_ENABLED_HARDIRQS_READ:
- case LOCK_ENABLED_SOFTIRQS_READ:
- ret = mark_lock_irq(curr, this, new_bit);
- if (!ret)
- return 0;
- break;
- case LOCK_USED:
- /*
- * Add it to the global list of classes:
- */
- list_add_tail_rcu(&this->class->lock_entry, &all_lock_classes);
- debug_atomic_dec(&nr_unused_locks);
- break;
- default:
- if (!debug_locks_off_graph_unlock())
- return 0;
- WARN_ON(1);
- return 0;
- }
-
- graph_unlock();
-
- /*
- * We must printk outside of the graph_lock:
- */
- if (ret == 2) {
- printk("\nmarked lock as {%s}:\n", usage_str[new_bit]);
- print_lock(this);
- print_irqtrace_events(curr);
- dump_stack();
- }
-
- return ret;
-}
-
-/*
- * Initialize a lock instance's lock-class mapping info:
- */
-void lockdep_init_map(struct lockdep_map *lock, const char *name,
- struct lock_class_key *key, int subclass)
-{
- if (unlikely(!debug_locks))
- return;
-
- if (DEBUG_LOCKS_WARN_ON(!key))
- return;
- if (DEBUG_LOCKS_WARN_ON(!name))
- return;
- /*
- * Sanity check, the lock-class key must be persistent:
- */
- if (!static_obj(key)) {
- printk("BUG: key %p not in .data!\n", key);
- DEBUG_LOCKS_WARN_ON(1);
- return;
- }
- lock->name = name;
- lock->key = key;
- lock->class_cache = NULL;
-#ifdef CONFIG_LOCK_STAT
- lock->cpu = raw_smp_processor_id();
-#endif
- if (subclass)
- register_lock_class(lock, subclass, 1);
-}
-
-EXPORT_SYMBOL_GPL(lockdep_init_map);
-
-/*
- * This gets called for every mutex_lock*()/spin_lock*() operation.
- * We maintain the dependency maps and validate the locking attempt:
- */
-static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
- int trylock, int read, int check, int hardirqs_off,
- unsigned long ip)
-{
- struct task_struct *curr = current;
- struct lock_class *class = NULL;
- struct held_lock *hlock;
- unsigned int depth, id;
- int chain_head = 0;
- u64 chain_key;
-
- if (!prove_locking)
- check = 1;
-
- if (unlikely(!debug_locks))
- return 0;
-
- if (DEBUG_LOCKS_WARN_ON(!irqs_disabled()))
- return 0;
-
- if (unlikely(subclass >= MAX_LOCKDEP_SUBCLASSES)) {
- debug_locks_off();
- printk("BUG: MAX_LOCKDEP_SUBCLASSES too low!\n");
- printk("turning off the locking correctness validator.\n");
- return 0;
- }
-
- if (!subclass)
- class = lock->class_cache;
- /*
- * Not cached yet or subclass?
- */
- if (unlikely(!class)) {
- class = register_lock_class(lock, subclass, 0);
- if (!class)
- return 0;
- }
- debug_atomic_inc((atomic_t *)&class->ops);
- if (very_verbose(class)) {
- printk("\nacquire class [%p] %s", class->key, class->name);
- if (class->name_version > 1)
- printk("#%d", class->name_version);
- printk("\n");
- dump_stack();
- }
-
- /*
- * Add the lock to the list of currently held locks.
- * (we don't increase the depth just yet, up until the
- * dependency checks are done)
- */
- depth = curr->lockdep_depth;
- if (DEBUG_LOCKS_WARN_ON(depth >= MAX_LOCK_DEPTH))
- return 0;
-
- hlock = curr->held_locks + depth;
-
- hlock->class = class;
- hlock->acquire_ip = ip;
- hlock->instance = lock;
- hlock->trylock = trylock;
- hlock->read = read;
- hlock->check = check;
- hlock->hardirqs_off = hardirqs_off;
-#ifdef CONFIG_LOCK_STAT
- hlock->waittime_stamp = 0;
- hlock->holdtime_stamp = sched_clock();
-#endif
-
- if (check == 2 && !mark_irqflags(curr, hlock))
- return 0;
-
- /* mark it as used: */
- if (!mark_lock(curr, hlock, LOCK_USED))
- return 0;
-
- /*
- * Calculate the chain hash: it's the combined hash of all the
- * lock keys along the dependency chain. We save the hash value
- * at every step so that we can get the current hash easily
- * after unlock. The chain hash is then used to cache dependency
- * results.
- *
- * The 'key ID' is what is the most compact key value to drive
- * the hash, not class->key.
- */
- id = class - lock_classes;
- if (DEBUG_LOCKS_WARN_ON(id >= MAX_LOCKDEP_KEYS))
- return 0;
-
- chain_key = curr->curr_chain_key;
- if (!depth) {
- if (DEBUG_LOCKS_WARN_ON(chain_key != 0))
- return 0;
- chain_head = 1;
- }
-
- hlock->prev_chain_key = chain_key;
- if (separate_irq_context(curr, hlock)) {
- chain_key = 0;
- chain_head = 1;
- }
- chain_key = iterate_chain_key(chain_key, id);
-
- if (!validate_chain(curr, lock, hlock, chain_head, chain_key))
- return 0;
-
- curr->curr_chain_key = chain_key;
- curr->lockdep_depth++;
- check_chain_key(curr);
-#ifdef CONFIG_DEBUG_LOCKDEP
- if (unlikely(!debug_locks))
- return 0;
-#endif
- if (unlikely(curr->lockdep_depth >= MAX_LOCK_DEPTH)) {
- debug_locks_off();
- printk("BUG: MAX_LOCK_DEPTH too low!\n");
- printk("turning off the locking correctness validator.\n");
- return 0;
- }
-
- if (unlikely(curr->lockdep_depth > max_lockdep_depth))
- max_lockdep_depth = curr->lockdep_depth;
-
- return 1;
-}
-
-static int
-print_unlock_inbalance_bug(struct task_struct *curr, struct lockdep_map *lock,
- unsigned long ip)
-{
- if (!debug_locks_off())
- return 0;
- if (debug_locks_silent)
- return 0;
-
- printk("\n=====================================\n");
- printk( "[ BUG: bad unlock balance detected! ]\n");
- printk( "-------------------------------------\n");
- printk("%s/%d is trying to release lock (",
- curr->comm, task_pid_nr(curr));
- print_lockdep_cache(lock);
- printk(") at:\n");
- print_ip_sym(ip);
- printk("but there are no more locks to release!\n");
- printk("\nother info that might help us debug this:\n");
- lockdep_print_held_locks(curr);
-
- printk("\nstack backtrace:\n");
- dump_stack();
-
- return 0;
-}
-
-/*
- * Common debugging checks for both nested and non-nested unlock:
- */
-static int check_unlock(struct task_struct *curr, struct lockdep_map *lock,
- unsigned long ip)
-{
- if (unlikely(!debug_locks))
- return 0;
- if (DEBUG_LOCKS_WARN_ON(!irqs_disabled()))
- return 0;
-
- if (curr->lockdep_depth <= 0)
- return print_unlock_inbalance_bug(curr, lock, ip);
-
- return 1;
-}
-
-/*
- * Remove the lock from the list of currently held locks in a
- * potentially non-nested (out of order) manner. This is a
- * relatively rare operation, as all the unlock APIs default
- * to nested mode (which uses lock_release()):
- */
-static int
-lock_release_non_nested(struct task_struct *curr,
- struct lockdep_map *lock, unsigned long ip)
-{
- struct held_lock *hlock, *prev_hlock;
- unsigned int depth;
- int i;
-
- /*
- * Check whether the lock exists in the current stack
- * of held locks:
- */
- depth = curr->lockdep_depth;
- if (DEBUG_LOCKS_WARN_ON(!depth))
- return 0;
-
- prev_hlock = NULL;
- for (i = depth-1; i >= 0; i--) {
- hlock = curr->held_locks + i;
- /*
- * We must not cross into another context:
- */
- if (prev_hlock && prev_hlock->irq_context != hlock->irq_context)
- break;
- if (hlock->instance == lock)
- goto found_it;
- prev_hlock = hlock;
- }
- return print_unlock_inbalance_bug(curr, lock, ip);
-
-found_it:
- lock_release_holdtime(hlock);
-
- /*
- * We have the right lock to unlock, 'hlock' points to it.
- * Now we remove it from the stack, and add back the other
- * entries (if any), recalculating the hash along the way:
- */
- curr->lockdep_depth = i;
- curr->curr_chain_key = hlock->prev_chain_key;
-
- for (i++; i < depth; i++) {
- hlock = curr->held_locks + i;
- if (!__lock_acquire(hlock->instance,
- hlock->class->subclass, hlock->trylock,
- hlock->read, hlock->check, hlock->hardirqs_off,
- hlock->acquire_ip))
- return 0;
- }
-
- if (DEBUG_LOCKS_WARN_ON(curr->lockdep_depth != depth - 1))
- return 0;
- return 1;
-}
-
-/*
- * Remove the lock from the list of currently held locks - this gets
- * called on mutex_unlock()/spin_unlock*() (or on a failed
- * mutex_lock_interruptible()). This is done for unlocks that nest
- * perfectly. (i.e. the current top of the lock-stack is unlocked)
- */
-static int lock_release_nested(struct task_struct *curr,
- struct lockdep_map *lock, unsigned long ip)
-{
- struct held_lock *hlock;
- unsigned int depth;
-
- /*
- * Pop off the top of the lock stack:
- */
- depth = curr->lockdep_depth - 1;
- hlock = curr->held_locks + depth;
-
- /*
- * Is the unlock non-nested:
- */
- if (hlock->instance != lock)
- return lock_release_non_nested(curr, lock, ip);
- curr->lockdep_depth--;
-
- if (DEBUG_LOCKS_WARN_ON(!depth && (hlock->prev_chain_key != 0)))
- return 0;
-
- curr->curr_chain_key = hlock->prev_chain_key;
-
- lock_release_holdtime(hlock);
-
-#ifdef CONFIG_DEBUG_LOCKDEP
- hlock->prev_chain_key = 0;
- hlock->class = NULL;
- hlock->acquire_ip = 0;
- hlock->irq_context = 0;
-#endif
- return 1;
-}
-
-/*
- * Remove the lock from the list of currently held locks - this gets
- * called on mutex_unlock()/spin_unlock*() (or on a failed
- * mutex_lock_interruptible()). Dispatches to the nested or the
- * non-nested variant above, depending on the caller:
- */
-static void
-__lock_release(struct lockdep_map *lock, int nested, unsigned long ip)
-{
- struct task_struct *curr = current;
-
- if (!check_unlock(curr, lock, ip))
- return;
-
- if (nested) {
- if (!lock_release_nested(curr, lock, ip))
- return;
- } else {
- if (!lock_release_non_nested(curr, lock, ip))
- return;
- }
-
- check_chain_key(curr);
-}
-
-/*
- * Check whether we follow the irq-flags state precisely:
- */
-static void check_flags(unsigned long flags)
-{
-#if defined(CONFIG_DEBUG_LOCKDEP) && defined(CONFIG_TRACE_IRQFLAGS)
- if (!debug_locks)
- return;
-
- if (irqs_disabled_flags(flags))
- DEBUG_LOCKS_WARN_ON(current->hardirqs_enabled);
- else
- DEBUG_LOCKS_WARN_ON(!current->hardirqs_enabled);
-
- /*
- * We don't accurately track softirq state in e.g.
- * hardirq contexts (such as on 4KSTACKS), so only
- * check if not in hardirq contexts:
- */
- if (!hardirq_count()) {
- if (softirq_count())
- DEBUG_LOCKS_WARN_ON(current->softirqs_enabled);
- else
- DEBUG_LOCKS_WARN_ON(!current->softirqs_enabled);
- }
-
- if (!debug_locks)
- print_irqtrace_events(current);
-#endif
-}
-
-/*
- * We are not always called with irqs disabled - do that here,
- * and also avoid lockdep recursion:
- */
-void lock_acquire(struct lockdep_map *lock, unsigned int subclass,
- int trylock, int read, int check, unsigned long ip)
-{
- unsigned long flags;
-
- if (unlikely(!lock_stat && !prove_locking))
- return;
-
- if (unlikely(current->lockdep_recursion))
- return;
-
- raw_local_irq_save(flags);
- check_flags(flags);
-
- current->lockdep_recursion = 1;
- __lock_acquire(lock, subclass, trylock, read, check,
- irqs_disabled_flags(flags), ip);
- current->lockdep_recursion = 0;
- raw_local_irq_restore(flags);
-}
-
-EXPORT_SYMBOL_GPL(lock_acquire);
-
-void lock_release(struct lockdep_map *lock, int nested, unsigned long ip)
-{
- unsigned long flags;
-
- if (unlikely(!lock_stat && !prove_locking))
- return;
-
- if (unlikely(current->lockdep_recursion))
- return;
-
- raw_local_irq_save(flags);
- check_flags(flags);
- current->lockdep_recursion = 1;
- __lock_release(lock, nested, ip);
- current->lockdep_recursion = 0;
- raw_local_irq_restore(flags);
-}
-
-EXPORT_SYMBOL_GPL(lock_release);
-
-#ifdef CONFIG_LOCK_STAT
-static int
-print_lock_contention_bug(struct task_struct *curr, struct lockdep_map *lock,
- unsigned long ip)
-{
- if (!debug_locks_off())
- return 0;
- if (debug_locks_silent)
- return 0;
-
- printk("\n=================================\n");
- printk( "[ BUG: bad contention detected! ]\n");
- printk( "---------------------------------\n");
- printk("%s/%d is trying to contend lock (",
- curr->comm, task_pid_nr(curr));
- print_lockdep_cache(lock);
- printk(") at:\n");
- print_ip_sym(ip);
- printk("but there are no locks held!\n");
- printk("\nother info that might help us debug this:\n");
- lockdep_print_held_locks(curr);
-
- printk("\nstack backtrace:\n");
- dump_stack();
-
- return 0;
-}
-
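-/*
- * Record that the task started waiting on a contended lock, noting
- * the contention point (call site) and any cross-CPU bounce:
- */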
-static void
-__lock_contended(struct lockdep_map *lock, unsigned long ip)
-{
- struct task_struct *curr = current;
- struct held_lock *hlock, *prev_hlock;
- struct lock_class_stats *stats;
- unsigned int depth;
- int i, point;
-
- depth = curr->lockdep_depth;
- if (DEBUG_LOCKS_WARN_ON(!depth))
- return;
-
- prev_hlock = NULL;
- for (i = depth-1; i >= 0; i--) {
- hlock = curr->held_locks + i;
- /*
- * We must not cross into another context:
- */
- if (prev_hlock && prev_hlock->irq_context != hlock->irq_context)
- break;
- if (hlock->instance == lock)
- goto found_it;
- prev_hlock = hlock;
- }
- print_lock_contention_bug(curr, lock, ip);
- return;
-
-found_it:
- hlock->waittime_stamp = sched_clock();
-
- point = lock_contention_point(hlock->class, ip);
-
- stats = get_lock_stats(hlock->class);
- if (point < ARRAY_SIZE(stats->contention_point))
- stats->contention_point[point]++;
- if (lock->cpu != smp_processor_id())
- stats->bounces[bounce_contended + !!hlock->read]++;
- put_lock_stats(stats);
-}
-
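-/*
- * Called once the lock is finally taken: account the wait time and
- * any cross-CPU bounce, and remember which CPU now holds the lock:
- */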
-static void
-__lock_acquired(struct lockdep_map *lock)
-{
- struct task_struct *curr = current;
- struct held_lock *hlock, *prev_hlock;
- struct lock_class_stats *stats;
- unsigned int depth;
- u64 now;
- s64 waittime = 0;
- int i, cpu;
-
- depth = curr->lockdep_depth;
- if (DEBUG_LOCKS_WARN_ON(!depth))
- return;
-
- prev_hlock = NULL;
- for (i = depth-1; i >= 0; i--) {
- hlock = curr->held_locks + i;
- /*
- * We must not cross into another context:
- */
- if (prev_hlock && prev_hlock->irq_context != hlock->irq_context)
- break;
- if (hlock->instance == lock)
- goto found_it;
- prev_hlock = hlock;
- }
- print_lock_contention_bug(curr, lock, _RET_IP_);
- return;
-
-found_it:
- cpu = smp_processor_id();
- if (hlock->waittime_stamp) {
- now = sched_clock();
- waittime = now - hlock->waittime_stamp;
- hlock->holdtime_stamp = now;
- }
-
- stats = get_lock_stats(hlock->class);
- if (waittime) {
- if (hlock->read)
- lock_time_inc(&stats->read_waittime, waittime);
- else
- lock_time_inc(&stats->write_waittime, waittime);
- }
- if (lock->cpu != cpu)
- stats->bounces[bounce_acquired + !!hlock->read]++;
- put_lock_stats(stats);
-
- lock->cpu = cpu;
-}
-
-void lock_contended(struct lockdep_map *lock, unsigned long ip)
-{
- unsigned long flags;
-
- if (unlikely(!lock_stat))
- return;
-
- if (unlikely(current->lockdep_recursion))
- return;
-
- raw_local_irq_save(flags);
- check_flags(flags);
- current->lockdep_recursion = 1;
- __lock_contended(lock, ip);
- current->lockdep_recursion = 0;
- raw_local_irq_restore(flags);
-}
-EXPORT_SYMBOL_GPL(lock_contended);
-
-void lock_acquired(struct lockdep_map *lock)
-{
- unsigned long flags;
-
- if (unlikely(!lock_stat))
- return;
-
- if (unlikely(current->lockdep_recursion))
- return;
-
- raw_local_irq_save(flags);
- check_flags(flags);
- current->lockdep_recursion = 1;
- __lock_acquired(lock);
- current->lockdep_recursion = 0;
- raw_local_irq_restore(flags);
-}
-EXPORT_SYMBOL_GPL(lock_acquired);
-#endif
-
-/*
- * Used by the testsuite, sanitize the validator state
- * after a simulated failure:
- */
-
-void lockdep_reset(void)
-{
- unsigned long flags;
- int i;
-
- raw_local_irq_save(flags);
- current->curr_chain_key = 0;
- current->lockdep_depth = 0;
- current->lockdep_recursion = 0;
- memset(current->held_locks, 0, MAX_LOCK_DEPTH*sizeof(struct held_lock));
- nr_hardirq_chains = 0;
- nr_softirq_chains = 0;
- nr_process_chains = 0;
- debug_locks = 1;
- for (i = 0; i < CHAINHASH_SIZE; i++)
- INIT_LIST_HEAD(chainhash_table + i);
- raw_local_irq_restore(flags);
-}
-
-static void zap_class(struct lock_class *class)
-{
- int i;
-
- /*
- * Remove all dependencies this lock is
- * involved in:
- */
- for (i = 0; i < nr_list_entries; i++) {
- if (list_entries[i].class == class)
- list_del_rcu(&list_entries[i].entry);
- }
- /*
- * Unhash the class and remove it from the all_lock_classes list:
- */
- list_del_rcu(&class->hash_entry);
- list_del_rcu(&class->lock_entry);
-}
-
-static inline int within(void *addr, void *start, unsigned long size)
-{
- return addr >= start && addr < start + size;
-}
-
-void lockdep_free_key_range(void *start, unsigned long size)
-{
- struct lock_class *class, *next;
- struct list_head *head;
- unsigned long flags;
- int i;
-
- raw_local_irq_save(flags);
- graph_lock();
-
- /*
- * Unhash all classes that were created by this module:
- */
- for (i = 0; i < CLASSHASH_SIZE; i++) {
- head = classhash_table + i;
- if (list_empty(head))
- continue;
- list_for_each_entry_safe(class, next, head, hash_entry)
- if (within(class->key, start, size))
- zap_class(class);
- }
-
- graph_unlock();
- raw_local_irq_restore(flags);
-}
-
-void lockdep_reset_lock(struct lockdep_map *lock)
-{
- struct lock_class *class, *next;
- struct list_head *head;
- unsigned long flags;
- int i, j;
-
- raw_local_irq_save(flags);
-
- /*
- * Remove all classes this lock might have:
- */
- for (j = 0; j < MAX_LOCKDEP_SUBCLASSES; j++) {
- /*
- * If the class exists we look it up and zap it:
- */
- class = look_up_lock_class(lock, j);
- if (class)
- zap_class(class);
- }
- /*
- * Debug check: in the end all mapped classes should
- * be gone.
- */
- graph_lock();
- for (i = 0; i < CLASSHASH_SIZE; i++) {
- head = classhash_table + i;
- if (list_empty(head))
- continue;
- list_for_each_entry_safe(class, next, head, hash_entry) {
- if (unlikely(class == lock->class_cache)) {
- if (debug_locks_off_graph_unlock())
- WARN_ON(1);
- goto out_restore;
- }
- }
- }
- graph_unlock();
-
-out_restore:
- raw_local_irq_restore(flags);
-}
-
-void lockdep_init(void)
-{
- int i;
-
- /*
- * Some architectures have their own start_kernel()
- * code which calls lockdep_init(), while we also
- * call lockdep_init() from start_kernel() itself,
- * and we want to initialize the hashes only once:
- */
- if (lockdep_initialized)
- return;
-
- for (i = 0; i < CLASSHASH_SIZE; i++)
- INIT_LIST_HEAD(classhash_table + i);
-
- for (i = 0; i < CHAINHASH_SIZE; i++)
- INIT_LIST_HEAD(chainhash_table + i);
-
- lockdep_initialized = 1;
-}
-
-void __init lockdep_info(void)
-{
- printk("Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar\n");
-
- printk("... MAX_LOCKDEP_SUBCLASSES: %lu\n", MAX_LOCKDEP_SUBCLASSES);
- printk("... MAX_LOCK_DEPTH: %lu\n", MAX_LOCK_DEPTH);
- printk("... MAX_LOCKDEP_KEYS: %lu\n", MAX_LOCKDEP_KEYS);
- printk("... CLASSHASH_SIZE: %lu\n", CLASSHASH_SIZE);
- printk("... MAX_LOCKDEP_ENTRIES: %lu\n", MAX_LOCKDEP_ENTRIES);
- printk("... MAX_LOCKDEP_CHAINS: %lu\n", MAX_LOCKDEP_CHAINS);
- printk("... CHAINHASH_SIZE: %lu\n", CHAINHASH_SIZE);
-
- printk(" memory used by lock dependency info: %lu kB\n",
- (sizeof(struct lock_class) * MAX_LOCKDEP_KEYS +
- sizeof(struct list_head) * CLASSHASH_SIZE +
- sizeof(struct lock_list) * MAX_LOCKDEP_ENTRIES +
- sizeof(struct lock_chain) * MAX_LOCKDEP_CHAINS +
- sizeof(struct list_head) * CHAINHASH_SIZE) / 1024);
-
- printk(" per task-struct memory footprint: %lu bytes\n",
- sizeof(struct held_lock) * MAX_LOCK_DEPTH);
-
-#ifdef CONFIG_DEBUG_LOCKDEP
- if (lockdep_init_error) {
- printk("WARNING: lockdep init error! Arch code didn't call lockdep_init() early enough?\n");
- printk("Call stack leading to lockdep invocation was:\n");
- print_stack_trace(&lockdep_init_trace, 0);
- }
-#endif
-}
-
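-/* (unlike within() above, this range check is inclusive at both ends) */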
-static inline int in_range(const void *start, const void *addr, const void *end)
-{
- return addr >= start && addr <= end;
-}
-
-static void
-print_freed_lock_bug(struct task_struct *curr, const void *mem_from,
- const void *mem_to, struct held_lock *hlock)
-{
- if (!debug_locks_off())
- return;
- if (debug_locks_silent)
- return;
-
- printk("\n=========================\n");
- printk( "[ BUG: held lock freed! ]\n");
- printk( "-------------------------\n");
- printk("%s/%d is freeing memory %p-%p, with a lock still held there!\n",
- curr->comm, task_pid_nr(curr), mem_from, mem_to-1);
- print_lock(hlock);
- lockdep_print_held_locks(curr);
-
- printk("\nstack backtrace:\n");
- dump_stack();
-}
-
-/*
- * Called when kernel memory is freed (or unmapped), or if a lock
- * is destroyed or reinitialized - this code checks whether there is
- * any held lock in the memory range of <from> to <to>:
- */
-void debug_check_no_locks_freed(const void *mem_from, unsigned long mem_len)
-{
- const void *mem_to = mem_from + mem_len, *lock_from, *lock_to;
- struct task_struct *curr = current;
- struct held_lock *hlock;
- unsigned long flags;
- int i;
-
- if (unlikely(!debug_locks))
- return;
-
- local_irq_save(flags);
- for (i = 0; i < curr->lockdep_depth; i++) {
- hlock = curr->held_locks + i;
-
- lock_from = (void *)hlock->instance;
- lock_to = (void *)(hlock->instance + 1);
-
- if (!in_range(mem_from, lock_from, mem_to) &&
- !in_range(mem_from, lock_to, mem_to))
- continue;
-
- print_freed_lock_bug(curr, mem_from, mem_to, hlock);
- break;
- }
- local_irq_restore(flags);
-}
-EXPORT_SYMBOL_GPL(debug_check_no_locks_freed);
-
-static void print_held_locks_bug(struct task_struct *curr)
-{
- if (!debug_locks_off())
- return;
- if (debug_locks_silent)
- return;
-
- printk("\n=====================================\n");
- printk( "[ BUG: lock held at task exit time! ]\n");
- printk( "-------------------------------------\n");
- printk("%s/%d is exiting with locks still held!\n",
- curr->comm, task_pid_nr(curr));
- lockdep_print_held_locks(curr);
-
- printk("\nstack backtrace:\n");
- dump_stack();
-}
-
-void debug_check_no_locks_held(struct task_struct *task)
-{
- if (unlikely(task->lockdep_depth > 0))
- print_held_locks_bug(task);
-}
-
-void debug_show_all_locks(void)
-{
- struct task_struct *g, *p;
- int count = 10;
- int unlock = 1;
-
- if (unlikely(!debug_locks)) {
- printk("INFO: lockdep is turned off.\n");
- return;
- }
- printk("\nShowing all locks held in the system:\n");
-
- /*
- * Here we try to get the tasklist_lock as hard as possible,
- * if not successful after 2 seconds we ignore it (but keep
- * trying). This is to enable a debug printout even if a
- * tasklist_lock-holding task deadlocks or crashes.
- */
-retry:
- if (!read_trylock(&tasklist_lock)) {
- if (count == 10)
- printk("hm, tasklist_lock locked, retrying... ");
- if (count) {
- count--;
- printk(" #%d", 10-count);
- mdelay(200);
- goto retry;
- }
- printk(" ignoring it.\n");
- unlock = 0;
- }
- if (count != 10)
- printk(" locked it.\n");
-
- do_each_thread(g, p) {
- if (p->lockdep_depth)
- lockdep_print_held_locks(p);
- if (!unlock)
- if (read_trylock(&tasklist_lock))
- unlock = 1;
- } while_each_thread(g, p);
-
- printk("\n");
- printk("=============================================\n\n");
-
- if (unlock)
- read_unlock(&tasklist_lock);
-}
-
-EXPORT_SYMBOL_GPL(debug_show_all_locks);
-
-void debug_show_held_locks(struct task_struct *task)
-{
- if (unlikely(!debug_locks)) {
- printk("INFO: lockdep is turned off.\n");
- return;
- }
- lockdep_print_held_locks(task);
-}
-
-EXPORT_SYMBOL_GPL(debug_show_held_locks);
-
-void lockdep_sys_exit(void)
-{
- struct task_struct *curr = current;
-
- if (unlikely(curr->lockdep_depth)) {
- if (!debug_locks_off())
- return;
- printk("\n================================================\n");
- printk( "[ BUG: lock held when returning to user space! ]\n");
- printk( "------------------------------------------------\n");
- printk("%s/%d is leaving the kernel with locks still held!\n",
- curr->comm, curr->pid);
- lockdep_print_held_locks(curr);
- }
-}
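
For reference, debug_check_no_locks_freed() above is exported for
allocators and subsystems that release memory which may embed locks.
A minimal usage sketch (not part of the patch; my_dev and
my_dev_free() are illustrative names, and the declaration is assumed
to come from linux/mm.h):

#include <linux/mm.h>
#include <linux/slab.h>
#include <linux/spinlock.h>

struct my_dev {
	spinlock_t lock;
	int state;
};

static void my_dev_free(struct my_dev *dev)
{
	/* warns if any currently held lock lives inside *dev */
	debug_check_no_locks_freed(dev, sizeof(*dev));
	kfree(dev);
}
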
Index: linux-2.6-lttng.stable/kernel/Makefile
===================================================================
--- linux-2.6-lttng.stable.orig/kernel/Makefile 2007-10-29 09:52:13.000000000 -0400
+++ linux-2.6-lttng.stable/kernel/Makefile 2007-10-29 09:52:35.000000000 -0400
@@ -15,10 +15,6 @@ obj-$(CONFIG_SYSCTL) += sysctl_check.o
obj-$(CONFIG_STACKTRACE) += stacktrace.o
obj-y += time/
obj-$(CONFIG_DEBUG_MUTEXES) += mutex-debug.o
-obj-$(CONFIG_LOCKDEP) += lockdep.o
-ifeq ($(CONFIG_PROC_FS),y)
-obj-$(CONFIG_LOCKDEP) += lockdep_proc.o
-endif
obj-$(CONFIG_FUTEX) += futex.o
ifeq ($(CONFIG_COMPAT),y)
obj-$(CONFIG_FUTEX) += futex_compat.o
Index: linux-2.6-lttng.stable/instrumentation/lockdep_internals.h
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng.stable/instrumentation/lockdep_internals.h 2007-10-29 09:52:35.000000000 -0400
@@ -0,0 +1,78 @@
+/*
+ * kernel/lockdep_internals.h
+ *
+ * Runtime locking correctness validator
+ *
+ * lockdep subsystem internal functions and variables.
+ */
+
+/*
+ * MAX_LOCKDEP_ENTRIES is the maximum number of lock dependencies
+ * we track.
+ *
+ * We use the per-lock dependency maps in two ways: we grow it by adding
+ * every to-be-taken lock to all currently held lock's own dependency
+ * table (if it's not there yet), and we check it for lock order
+ * conflicts and deadlocks.
+ */
+#define MAX_LOCKDEP_ENTRIES 8192UL
+
+#define MAX_LOCKDEP_KEYS_BITS 11
+#define MAX_LOCKDEP_KEYS (1UL << MAX_LOCKDEP_KEYS_BITS)
+
+#define MAX_LOCKDEP_CHAINS_BITS 14
+#define MAX_LOCKDEP_CHAINS (1UL << MAX_LOCKDEP_CHAINS_BITS)
+
+/*
+ * Stack-trace: tightly packed array of stack backtrace
+ * addresses. Protected by the hash_lock.
+ */
+#define MAX_STACK_TRACE_ENTRIES 262144UL
+
+extern struct list_head all_lock_classes;
+
+extern void
+get_usage_chars(struct lock_class *class, char *c1, char *c2, char *c3, char *c4);
+
+extern const char * __get_key_name(struct lockdep_subclass_key *key, char *str);
+
+extern unsigned long nr_lock_classes;
+extern unsigned long nr_list_entries;
+extern unsigned long nr_lock_chains;
+extern unsigned long nr_stack_trace_entries;
+
+extern unsigned int nr_hardirq_chains;
+extern unsigned int nr_softirq_chains;
+extern unsigned int nr_process_chains;
+extern unsigned int max_lockdep_depth;
+extern unsigned int max_recursion_depth;
+
+#ifdef CONFIG_DEBUG_LOCKDEP
+/*
+ * Various lockdep statistics:
+ */
+extern atomic_t chain_lookup_hits;
+extern atomic_t chain_lookup_misses;
+extern atomic_t hardirqs_on_events;
+extern atomic_t hardirqs_off_events;
+extern atomic_t redundant_hardirqs_on;
+extern atomic_t redundant_hardirqs_off;
+extern atomic_t softirqs_on_events;
+extern atomic_t softirqs_off_events;
+extern atomic_t redundant_softirqs_on;
+extern atomic_t redundant_softirqs_off;
+extern atomic_t nr_unused_locks;
+extern atomic_t nr_cyclic_checks;
+extern atomic_t nr_cyclic_check_recursions;
+extern atomic_t nr_find_usage_forwards_checks;
+extern atomic_t nr_find_usage_forwards_recursions;
+extern atomic_t nr_find_usage_backwards_checks;
+extern atomic_t nr_find_usage_backwards_recursions;
+# define debug_atomic_inc(ptr) atomic_inc(ptr)
+# define debug_atomic_dec(ptr) atomic_dec(ptr)
+# define debug_atomic_read(ptr) atomic_read(ptr)
+#else
+# define debug_atomic_inc(ptr) do { } while (0)
+# define debug_atomic_dec(ptr) do { } while (0)
+# define debug_atomic_read(ptr) 0
+#endif
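
The debug_atomic_*() wrappers above let statistics be sprinkled
through the hot paths at zero cost when CONFIG_DEBUG_LOCKDEP is off.
An illustrative call site (a sketch, not code from this patch;
demo_lookup_chain() and find_chain() are stand-ins):

#include <linux/types.h>

static int find_chain(u64 key)
{
	return key & 1;		/* stand-in for the real hash lookup */
}

static int demo_lookup_chain(u64 key)
{
	if (find_chain(key)) {
		debug_atomic_inc(&chain_lookup_hits);
		return 1;
	}
	debug_atomic_inc(&chain_lookup_misses);
	return 0;
}

With CONFIG_DEBUG_LOCKDEP=n both debug_atomic_inc() calls expand to
empty statements, so production builds pay nothing for the counters.
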
Index: linux-2.6-lttng.stable/instrumentation/lockdep_proc.c
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng.stable/instrumentation/lockdep_proc.c 2007-10-29 09:52:35.000000000 -0400
@@ -0,0 +1,683 @@
+/*
+ * kernel/lockdep_proc.c
+ *
+ * Runtime locking correctness validator
+ *
+ * Started by Ingo Molnar:
+ *
+ * Copyright (C) 2006,2007 Red Hat, Inc., Ingo Molnar <mingo@redhat.com>
+ * Copyright (C) 2007 Red Hat, Inc., Peter Zijlstra <pzijlstr@redhat.com>
+ *
+ * Code for /proc/lockdep and /proc/lockdep_stats:
+ *
+ */
+#include <linux/module.h>
+#include <linux/proc_fs.h>
+#include <linux/seq_file.h>
+#include <linux/kallsyms.h>
+#include <linux/debug_locks.h>
+#include <linux/vmalloc.h>
+#include <linux/sort.h>
+#include <asm/uaccess.h>
+#include <asm/div64.h>
+
+#include "lockdep_internals.h"
+
+static void *l_next(struct seq_file *m, void *v, loff_t *pos)
+{
+ struct lock_class *class;
+
+ (*pos)++;
+
+ if (v == SEQ_START_TOKEN)
+ class = m->private;
+ else {
+ class = v;
+
+ if (class->lock_entry.next != &all_lock_classes)
+ class = list_entry(class->lock_entry.next,
+ struct lock_class, lock_entry);
+ else
+ class = NULL;
+ }
+
+ return class;
+}
+
+static void *l_start(struct seq_file *m, loff_t *pos)
+{
+ struct lock_class *class;
+ loff_t i = 0;
+
+ if (*pos == 0)
+ return SEQ_START_TOKEN;
+
+ list_for_each_entry(class, &all_lock_classes, lock_entry) {
+ if (++i == *pos)
+ return class;
+ }
+ return NULL;
+}
+
+static void l_stop(struct seq_file *m, void *v)
+{
+}
+
+static unsigned long count_forward_deps(struct lock_class *class)
+{
+ struct lock_list *entry;
+ unsigned long ret = 1;
+
+ /*
+ * Recurse this class's dependency list:
+ */
+ list_for_each_entry(entry, &class->locks_after, entry)
+ ret += count_forward_deps(entry->class);
+
+ return ret;
+}
+
+static unsigned long count_backward_deps(struct lock_class *class)
+{
+ struct lock_list *entry;
+ unsigned long ret = 1;
+
+ /*
+ * Recurse this class's dependency list:
+ */
+ list_for_each_entry(entry, &class->locks_before, entry)
+ ret += count_backward_deps(entry->class);
+
+ return ret;
+}
+
+static void print_name(struct seq_file *m, struct lock_class *class)
+{
+ char str[128];
+ const char *name = class->name;
+
+ if (!name) {
+ name = __get_key_name(class->key, str);
+ seq_printf(m, "%s", name);
+ } else{
+ seq_printf(m, "%s", name);
+ if (class->name_version > 1)
+ seq_printf(m, "#%d", class->name_version);
+ if (class->subclass)
+ seq_printf(m, "/%d", class->subclass);
+ }
+}
+
+static int l_show(struct seq_file *m, void *v)
+{
+ unsigned long nr_forward_deps, nr_backward_deps;
+ struct lock_class *class = v;
+ struct lock_list *entry;
+ char c1, c2, c3, c4;
+
+ if (v == SEQ_START_TOKEN) {
+ seq_printf(m, "all lock classes:\n");
+ return 0;
+ }
+
+ seq_printf(m, "%p", class->key);
+#ifdef CONFIG_DEBUG_LOCKDEP
+ seq_printf(m, " OPS:%8ld", class->ops);
+#endif
+ nr_forward_deps = count_forward_deps(class);
+ seq_printf(m, " FD:%5ld", nr_forward_deps);
+
+ nr_backward_deps = count_backward_deps(class);
+ seq_printf(m, " BD:%5ld", nr_backward_deps);
+
+ get_usage_chars(class, &c1, &c2, &c3, &c4);
+ seq_printf(m, " %c%c%c%c", c1, c2, c3, c4);
+
+ seq_printf(m, ": ");
+ print_name(m, class);
+ seq_puts(m, "\n");
+
+ list_for_each_entry(entry, &class->locks_after, entry) {
+ if (entry->distance == 1) {
+ seq_printf(m, " -> [%p] ", entry->class);
+ print_name(m, entry->class);
+ seq_puts(m, "\n");
+ }
+ }
+ seq_puts(m, "\n");
+
+ return 0;
+}
+
+static const struct seq_operations lockdep_ops = {
+ .start = l_start,
+ .next = l_next,
+ .stop = l_stop,
+ .show = l_show,
+};
+
+static int lockdep_open(struct inode *inode, struct file *file)
+{
+ int res = seq_open(file, &lockdep_ops);
+ if (!res) {
+ struct seq_file *m = file->private_data;
+
+ if (!list_empty(&all_lock_classes))
+ m->private = list_entry(all_lock_classes.next,
+ struct lock_class, lock_entry);
+ else
+ m->private = NULL;
+ }
+ return res;
+}
+
+static const struct file_operations proc_lockdep_operations = {
+ .open = lockdep_open,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = seq_release,
+};
+
+static void lockdep_stats_debug_show(struct seq_file *m)
+{
+#ifdef CONFIG_DEBUG_LOCKDEP
+ unsigned int hi1 = debug_atomic_read(&hardirqs_on_events),
+ hi2 = debug_atomic_read(&hardirqs_off_events),
+ hr1 = debug_atomic_read(&redundant_hardirqs_on),
+ hr2 = debug_atomic_read(&redundant_hardirqs_off),
+ si1 = debug_atomic_read(&softirqs_on_events),
+ si2 = debug_atomic_read(&softirqs_off_events),
+ sr1 = debug_atomic_read(&redundant_softirqs_on),
+ sr2 = debug_atomic_read(&redundant_softirqs_off);
+
+ seq_printf(m, " chain lookup misses: %11u\n",
+ debug_atomic_read(&chain_lookup_misses));
+ seq_printf(m, " chain lookup hits: %11u\n",
+ debug_atomic_read(&chain_lookup_hits));
+ seq_printf(m, " cyclic checks: %11u\n",
+ debug_atomic_read(&nr_cyclic_checks));
+ seq_printf(m, " cyclic-check recursions: %11u\n",
+ debug_atomic_read(&nr_cyclic_check_recursions));
+ seq_printf(m, " find-mask forwards checks: %11u\n",
+ debug_atomic_read(&nr_find_usage_forwards_checks));
+ seq_printf(m, " find-mask forwards recursions: %11u\n",
+ debug_atomic_read(&nr_find_usage_forwards_recursions));
+ seq_printf(m, " find-mask backwards checks: %11u\n",
+ debug_atomic_read(&nr_find_usage_backwards_checks));
+ seq_printf(m, " find-mask backwards recursions:%11u\n",
+ debug_atomic_read(&nr_find_usage_backwards_recursions));
+
+ seq_printf(m, " hardirq on events: %11u\n", hi1);
+ seq_printf(m, " hardirq off events: %11u\n", hi2);
+ seq_printf(m, " redundant hardirq ons: %11u\n", hr1);
+ seq_printf(m, " redundant hardirq offs: %11u\n", hr2);
+ seq_printf(m, " softirq on events: %11u\n", si1);
+ seq_printf(m, " softirq off events: %11u\n", si2);
+ seq_printf(m, " redundant softirq ons: %11u\n", sr1);
+ seq_printf(m, " redundant softirq offs: %11u\n", sr2);
+#endif
+}
+
+static int lockdep_stats_show(struct seq_file *m, void *v)
+{
+ struct lock_class *class;
+ unsigned long nr_unused = 0, nr_uncategorized = 0,
+ nr_irq_safe = 0, nr_irq_unsafe = 0,
+ nr_softirq_safe = 0, nr_softirq_unsafe = 0,
+ nr_hardirq_safe = 0, nr_hardirq_unsafe = 0,
+ nr_irq_read_safe = 0, nr_irq_read_unsafe = 0,
+ nr_softirq_read_safe = 0, nr_softirq_read_unsafe = 0,
+ nr_hardirq_read_safe = 0, nr_hardirq_read_unsafe = 0,
+ sum_forward_deps = 0, factor = 0;
+
+ list_for_each_entry(class, &all_lock_classes, lock_entry) {
+
+ if (class->usage_mask == 0)
+ nr_unused++;
+ if (class->usage_mask == LOCKF_USED)
+ nr_uncategorized++;
+ if (class->usage_mask & LOCKF_USED_IN_IRQ)
+ nr_irq_safe++;
+ if (class->usage_mask & LOCKF_ENABLED_IRQS)
+ nr_irq_unsafe++;
+ if (class->usage_mask & LOCKF_USED_IN_SOFTIRQ)
+ nr_softirq_safe++;
+ if (class->usage_mask & LOCKF_ENABLED_SOFTIRQS)
+ nr_softirq_unsafe++;
+ if (class->usage_mask & LOCKF_USED_IN_HARDIRQ)
+ nr_hardirq_safe++;
+ if (class->usage_mask & LOCKF_ENABLED_HARDIRQS)
+ nr_hardirq_unsafe++;
+ if (class->usage_mask & LOCKF_USED_IN_IRQ_READ)
+ nr_irq_read_safe++;
+ if (class->usage_mask & LOCKF_ENABLED_IRQS_READ)
+ nr_irq_read_unsafe++;
+ if (class->usage_mask & LOCKF_USED_IN_SOFTIRQ_READ)
+ nr_softirq_read_safe++;
+ if (class->usage_mask & LOCKF_ENABLED_SOFTIRQS_READ)
+ nr_softirq_read_unsafe++;
+ if (class->usage_mask & LOCKF_USED_IN_HARDIRQ_READ)
+ nr_hardirq_read_safe++;
+ if (class->usage_mask & LOCKF_ENABLED_HARDIRQS_READ)
+ nr_hardirq_read_unsafe++;
+
+ sum_forward_deps += count_forward_deps(class);
+ }
+#ifdef CONFIG_DEBUG_LOCKDEP
+ DEBUG_LOCKS_WARN_ON(debug_atomic_read(&nr_unused_locks) != nr_unused);
+#endif
+ seq_printf(m, " lock-classes: %11lu [max: %lu]\n",
+ nr_lock_classes, MAX_LOCKDEP_KEYS);
+ seq_printf(m, " direct dependencies: %11lu [max: %lu]\n",
+ nr_list_entries, MAX_LOCKDEP_ENTRIES);
+ seq_printf(m, " indirect dependencies: %11lu\n",
+ sum_forward_deps);
+
+ /*
+ * Total number of dependencies:
+ *
+ * All irq-safe locks may nest inside irq-unsafe locks,
+ * plus all the other known dependencies:
+ */
+ seq_printf(m, " all direct dependencies: %11lu\n",
+ nr_irq_unsafe * nr_irq_safe +
+ nr_hardirq_unsafe * nr_hardirq_safe +
+ nr_list_entries);
+
+ /*
+ * Estimated factor between direct and indirect
+ * dependencies:
+ */
+ if (nr_list_entries)
+ factor = sum_forward_deps / nr_list_entries;
+
+#ifdef CONFIG_PROVE_LOCKING
+ seq_printf(m, " dependency chains: %11lu [max: %lu]\n",
+ nr_lock_chains, MAX_LOCKDEP_CHAINS);
+#endif
+
+#ifdef CONFIG_TRACE_IRQFLAGS
+ seq_printf(m, " in-hardirq chains: %11u\n",
+ nr_hardirq_chains);
+ seq_printf(m, " in-softirq chains: %11u\n",
+ nr_softirq_chains);
+#endif
+ seq_printf(m, " in-process chains: %11u\n",
+ nr_process_chains);
+ seq_printf(m, " stack-trace entries: %11lu [max: %lu]\n",
+ nr_stack_trace_entries, MAX_STACK_TRACE_ENTRIES);
+ seq_printf(m, " combined max dependencies: %11u\n",
+ (nr_hardirq_chains + 1) *
+ (nr_softirq_chains + 1) *
+ (nr_process_chains + 1)
+ );
+ seq_printf(m, " hardirq-safe locks: %11lu\n",
+ nr_hardirq_safe);
+ seq_printf(m, " hardirq-unsafe locks: %11lu\n",
+ nr_hardirq_unsafe);
+ seq_printf(m, " softirq-safe locks: %11lu\n",
+ nr_softirq_safe);
+ seq_printf(m, " softirq-unsafe locks: %11lu\n",
+ nr_softirq_unsafe);
+ seq_printf(m, " irq-safe locks: %11lu\n",
+ nr_irq_safe);
+ seq_printf(m, " irq-unsafe locks: %11lu\n",
+ nr_irq_unsafe);
+
+ seq_printf(m, " hardirq-read-safe locks: %11lu\n",
+ nr_hardirq_read_safe);
+ seq_printf(m, " hardirq-read-unsafe locks: %11lu\n",
+ nr_hardirq_read_unsafe);
+ seq_printf(m, " softirq-read-safe locks: %11lu\n",
+ nr_softirq_read_safe);
+ seq_printf(m, " softirq-read-unsafe locks: %11lu\n",
+ nr_softirq_read_unsafe);
+ seq_printf(m, " irq-read-safe locks: %11lu\n",
+ nr_irq_read_safe);
+ seq_printf(m, " irq-read-unsafe locks: %11lu\n",
+ nr_irq_read_unsafe);
+
+ seq_printf(m, " uncategorized locks: %11lu\n",
+ nr_uncategorized);
+ seq_printf(m, " unused locks: %11lu\n",
+ nr_unused);
+ seq_printf(m, " max locking depth: %11u\n",
+ max_lockdep_depth);
+ seq_printf(m, " max recursion depth: %11u\n",
+ max_recursion_depth);
+ lockdep_stats_debug_show(m);
+ seq_printf(m, " debug_locks: %11u\n",
+ debug_locks);
+
+ return 0;
+}
+
+static int lockdep_stats_open(struct inode *inode, struct file *file)
+{
+ return single_open(file, lockdep_stats_show, NULL);
+}
+
+static const struct file_operations proc_lockdep_stats_operations = {
+ .open = lockdep_stats_open,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = single_release,
+};
+
+#ifdef CONFIG_LOCK_STAT
+
+struct lock_stat_data {
+ struct lock_class *class;
+ struct lock_class_stats stats;
+};
+
+struct lock_stat_seq {
+ struct lock_stat_data *iter;
+ struct lock_stat_data *iter_end;
+ struct lock_stat_data stats[MAX_LOCKDEP_KEYS];
+};
+
+/*
+ * sort on absolute number of contentions
+ */
+static int lock_stat_cmp(const void *l, const void *r)
+{
+ const struct lock_stat_data *dl = l, *dr = r;
+ unsigned long nl, nr;
+
+ nl = dl->stats.read_waittime.nr + dl->stats.write_waittime.nr;
+ nr = dr->stats.read_waittime.nr + dr->stats.write_waittime.nr;
+
+ return nr - nl;
+}
+
+static void seq_line(struct seq_file *m, char c, int offset, int length)
+{
+ int i;
+
+ for (i = 0; i < offset; i++)
+ seq_puts(m, " ");
+ for (i = 0; i < length; i++)
+ seq_printf(m, "%c", c);
+ seq_puts(m, "\n");
+}
+
+static void snprint_time(char *buf, size_t bufsiz, s64 nr)
+{
+ unsigned long rem;
+
+ rem = do_div(nr, 1000); /* XXX: do_div_signed */
+ snprintf(buf, bufsiz, "%lld.%02d", (long long)nr, ((int)rem+5)/10);
+}
+
+static void seq_time(struct seq_file *m, s64 time)
+{
+ char num[15];
+
+ snprint_time(num, sizeof(num), time);
+ seq_printf(m, " %14s", num);
+}
+
+static void seq_lock_time(struct seq_file *m, struct lock_time *lt)
+{
+ seq_printf(m, "%14lu", lt->nr);
+ seq_time(m, lt->min);
+ seq_time(m, lt->max);
+ seq_time(m, lt->total);
+}
+
+static void seq_stats(struct seq_file *m, struct lock_stat_data *data)
+{
+ char name[39];
+ struct lock_class *class;
+ struct lock_class_stats *stats;
+ int i, namelen;
+
+ class = data->class;
+ stats = &data->stats;
+
+ namelen = 38;
+ if (class->name_version > 1)
+ namelen -= 2; /* XXX truncates versions > 9 */
+ if (class->subclass)
+ namelen -= 2;
+
+ if (!class->name) {
+ char str[KSYM_NAME_LEN];
+ const char *key_name;
+
+ key_name = __get_key_name(class->key, str);
+ snprintf(name, namelen, "%s", key_name);
+ } else {
+ snprintf(name, namelen, "%s", class->name);
+ }
+ namelen = strlen(name);
+ if (class->name_version > 1) {
+ snprintf(name+namelen, 3, "#%d", class->name_version);
+ namelen += 2;
+ }
+ if (class->subclass) {
+ snprintf(name+namelen, 3, "/%d", class->subclass);
+ namelen += 2;
+ }
+
+ if (stats->write_holdtime.nr) {
+ if (stats->read_holdtime.nr)
+ seq_printf(m, "%38s-W:", name);
+ else
+ seq_printf(m, "%40s:", name);
+
+ seq_printf(m, "%14lu ", stats->bounces[bounce_contended_write]);
+ seq_lock_time(m, &stats->write_waittime);
+ seq_printf(m, " %14lu ", stats->bounces[bounce_acquired_write]);
+ seq_lock_time(m, &stats->write_holdtime);
+ seq_puts(m, "\n");
+ }
+
+ if (stats->read_holdtime.nr) {
+ seq_printf(m, "%38s-R:", name);
+ seq_printf(m, "%14lu ", stats->bounces[bounce_contended_read]);
+ seq_lock_time(m, &stats->read_waittime);
+ seq_printf(m, " %14lu ", stats->bounces[bounce_acquired_read]);
+ seq_lock_time(m, &stats->read_holdtime);
+ seq_puts(m, "\n");
+ }
+
+ if (stats->read_waittime.nr + stats->write_waittime.nr == 0)
+ return;
+
+ if (stats->read_holdtime.nr)
+ namelen += 2;
+
+ for (i = 0; i < ARRAY_SIZE(class->contention_point); i++) {
+ char sym[KSYM_SYMBOL_LEN];
+ char ip[32];
+
+ if (class->contention_point[i] == 0)
+ break;
+
+ if (!i)
+ seq_line(m, '-', 40-namelen, namelen);
+
+ sprint_symbol(sym, class->contention_point[i]);
+ snprintf(ip, sizeof(ip), "[<%p>]",
+ (void *)class->contention_point[i]);
+ seq_printf(m, "%40s %14lu %29s %s\n", name,
+ stats->contention_point[i],
+ ip, sym);
+ }
+ if (i) {
+ seq_puts(m, "\n");
+ seq_line(m, '.', 0, 40 + 1 + 10 * (14 + 1));
+ seq_puts(m, "\n");
+ }
+}
+
+static void seq_header(struct seq_file *m)
+{
+ seq_printf(m, "lock_stat version 0.2\n");
+ seq_line(m, '-', 0, 40 + 1 + 10 * (14 + 1));
+ seq_printf(m, "%40s %14s %14s %14s %14s %14s %14s %14s %14s "
+ "%14s %14s\n",
+ "class name",
+ "con-bounces",
+ "contentions",
+ "waittime-min",
+ "waittime-max",
+ "waittime-total",
+ "acq-bounces",
+ "acquisitions",
+ "holdtime-min",
+ "holdtime-max",
+ "holdtime-total");
+ seq_line(m, '-', 0, 40 + 1 + 10 * (14 + 1));
+ seq_printf(m, "\n");
+}
+
+static void *ls_start(struct seq_file *m, loff_t *pos)
+{
+ struct lock_stat_seq *data = m->private;
+
+ if (*pos == 0)
+ return SEQ_START_TOKEN;
+
+ data->iter = data->stats + *pos;
+ if (data->iter >= data->iter_end)
+ data->iter = NULL;
+
+ return data->iter;
+}
+
+static void *ls_next(struct seq_file *m, void *v, loff_t *pos)
+{
+ struct lock_stat_seq *data = m->private;
+
+ (*pos)++;
+
+ if (v == SEQ_START_TOKEN)
+ data->iter = data->stats;
+ else {
+ data->iter = v;
+ data->iter++;
+ }
+
+ if (data->iter == data->iter_end)
+ data->iter = NULL;
+
+ return data->iter;
+}
+
+static void ls_stop(struct seq_file *m, void *v)
+{
+}
+
+static int ls_show(struct seq_file *m, void *v)
+{
+ if (v == SEQ_START_TOKEN)
+ seq_header(m);
+ else
+ seq_stats(m, v);
+
+ return 0;
+}
+
+static struct seq_operations lockstat_ops = {
+ .start = ls_start,
+ .next = ls_next,
+ .stop = ls_stop,
+ .show = ls_show,
+};
+
+static int lock_stat_open(struct inode *inode, struct file *file)
+{
+ int res;
+ struct lock_class *class;
+ struct lock_stat_seq *data = vmalloc(sizeof(struct lock_stat_seq));
+
+ if (!data)
+ return -ENOMEM;
+
+ res = seq_open(file, &lockstat_ops);
+ if (!res) {
+ struct lock_stat_data *iter = data->stats;
+ struct seq_file *m = file->private_data;
+
+ data->iter = iter;
+ list_for_each_entry(class, &all_lock_classes, lock_entry) {
+ iter->class = class;
+ iter->stats = lock_stats(class);
+ iter++;
+ }
+ data->iter_end = iter;
+
+ sort(data->stats, data->iter_end - data->iter,
+ sizeof(struct lock_stat_data),
+ lock_stat_cmp, NULL);
+
+ m->private = data;
+ } else
+ vfree(data);
+
+ return res;
+}
+
+static ssize_t lock_stat_write(struct file *file, const char __user *buf,
+ size_t count, loff_t *ppos)
+{
+ struct lock_class *class;
+ char c;
+
+ if (count) {
+ if (get_user(c, buf))
+ return -EFAULT;
+
+ if (c != '0')
+ return count;
+
+ list_for_each_entry(class, &all_lock_classes, lock_entry)
+ clear_lock_stats(class);
+ }
+ return count;
+}
+
+static int lock_stat_release(struct inode *inode, struct file *file)
+{
+ struct seq_file *seq = file->private_data;
+
+ vfree(seq->private);
+ seq->private = NULL;
+ return seq_release(inode, file);
+}
+
+static const struct file_operations proc_lock_stat_operations = {
+ .open = lock_stat_open,
+ .write = lock_stat_write,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = lock_stat_release,
+};
+#endif /* CONFIG_LOCK_STAT */
+
+static int __init lockdep_proc_init(void)
+{
+ struct proc_dir_entry *entry;
+
+ entry = create_proc_entry("lockdep", S_IRUSR, NULL);
+ if (entry)
+ entry->proc_fops = &proc_lockdep_operations;
+
+ entry = create_proc_entry("lockdep_stats", S_IRUSR, NULL);
+ if (entry)
+ entry->proc_fops = &proc_lockdep_stats_operations;
+
+#ifdef CONFIG_LOCK_STAT
+ entry = create_proc_entry("lock_stat", S_IRUSR, NULL);
+ if (entry)
+ entry->proc_fops = &proc_lock_stat_operations;
+#endif
+
+ return 0;
+}
+
+__initcall(lockdep_proc_init);
+
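
The l_*() and ls_*() functions above follow the standard seq_file
iterator contract: start() positions a cursor at *pos, next() advances
it, show() formats one record, and stop() cleans up. A stripped-down
sketch of the same pattern (illustrative only; the demo_* names are
not from this patch):

#include <linux/kernel.h>
#include <linux/seq_file.h>

static int demo_items[] = { 1, 2, 3 };

static void *demo_start(struct seq_file *m, loff_t *pos)
{
	return *pos < ARRAY_SIZE(demo_items) ? &demo_items[*pos] : NULL;
}

static void *demo_next(struct seq_file *m, void *v, loff_t *pos)
{
	(*pos)++;
	return demo_start(m, pos);
}

static void demo_stop(struct seq_file *m, void *v)
{
}

static int demo_show(struct seq_file *m, void *v)
{
	seq_printf(m, "%d\n", *(int *)v);
	return 0;
}

static const struct seq_operations demo_ops = {
	.start	= demo_start,
	.next	= demo_next,
	.stop	= demo_stop,
	.show	= demo_show,
};
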
Index: linux-2.6-lttng.stable/kernel/lockdep_internals.h
===================================================================
--- linux-2.6-lttng.stable.orig/kernel/lockdep_internals.h 2007-10-29 09:51:06.000000000 -0400
+++ /dev/null 1970-01-01 00:00:00.000000000 +0000
@@ -1,78 +0,0 @@
-/*
- * kernel/lockdep_internals.h
- *
- * Runtime locking correctness validator
- *
- * lockdep subsystem internal functions and variables.
- */
-
-/*
- * MAX_LOCKDEP_ENTRIES is the maximum number of lock dependencies
- * we track.
- *
- * We use the per-lock dependency maps in two ways: we grow it by adding
- * every to-be-taken lock to all currently held lock's own dependency
- * table (if it's not there yet), and we check it for lock order
- * conflicts and deadlocks.
- */
-#define MAX_LOCKDEP_ENTRIES 8192UL
-
-#define MAX_LOCKDEP_KEYS_BITS 11
-#define MAX_LOCKDEP_KEYS (1UL << MAX_LOCKDEP_KEYS_BITS)
-
-#define MAX_LOCKDEP_CHAINS_BITS 14
-#define MAX_LOCKDEP_CHAINS (1UL << MAX_LOCKDEP_CHAINS_BITS)
-
-/*
- * Stack-trace: tightly packed array of stack backtrace
- * addresses. Protected by the hash_lock.
- */
-#define MAX_STACK_TRACE_ENTRIES 262144UL
-
-extern struct list_head all_lock_classes;
-
-extern void
-get_usage_chars(struct lock_class *class, char *c1, char *c2, char *c3, char *c4);
-
-extern const char * __get_key_name(struct lockdep_subclass_key *key, char *str);
-
-extern unsigned long nr_lock_classes;
-extern unsigned long nr_list_entries;
-extern unsigned long nr_lock_chains;
-extern unsigned long nr_stack_trace_entries;
-
-extern unsigned int nr_hardirq_chains;
-extern unsigned int nr_softirq_chains;
-extern unsigned int nr_process_chains;
-extern unsigned int max_lockdep_depth;
-extern unsigned int max_recursion_depth;
-
-#ifdef CONFIG_DEBUG_LOCKDEP
-/*
- * Various lockdep statistics:
- */
-extern atomic_t chain_lookup_hits;
-extern atomic_t chain_lookup_misses;
-extern atomic_t hardirqs_on_events;
-extern atomic_t hardirqs_off_events;
-extern atomic_t redundant_hardirqs_on;
-extern atomic_t redundant_hardirqs_off;
-extern atomic_t softirqs_on_events;
-extern atomic_t softirqs_off_events;
-extern atomic_t redundant_softirqs_on;
-extern atomic_t redundant_softirqs_off;
-extern atomic_t nr_unused_locks;
-extern atomic_t nr_cyclic_checks;
-extern atomic_t nr_cyclic_check_recursions;
-extern atomic_t nr_find_usage_forwards_checks;
-extern atomic_t nr_find_usage_forwards_recursions;
-extern atomic_t nr_find_usage_backwards_checks;
-extern atomic_t nr_find_usage_backwards_recursions;
-# define debug_atomic_inc(ptr) atomic_inc(ptr)
-# define debug_atomic_dec(ptr) atomic_dec(ptr)
-# define debug_atomic_read(ptr) atomic_read(ptr)
-#else
-# define debug_atomic_inc(ptr) do { } while (0)
-# define debug_atomic_dec(ptr) do { } while (0)
-# define debug_atomic_read(ptr) 0
-#endif
Index: linux-2.6-lttng.stable/kernel/lockdep_proc.c
===================================================================
--- linux-2.6-lttng.stable.orig/kernel/lockdep_proc.c 2007-10-29 09:51:06.000000000 -0400
+++ /dev/null 1970-01-01 00:00:00.000000000 +0000
@@ -1,683 +0,0 @@
-/*
- * kernel/lockdep_proc.c
- *
- * Runtime locking correctness validator
- *
- * Started by Ingo Molnar:
- *
- * Copyright (C) 2006,2007 Red Hat, Inc., Ingo Molnar <mingo@redhat.com>
- * Copyright (C) 2007 Red Hat, Inc., Peter Zijlstra <pzijlstr@redhat.com>
- *
- * Code for /proc/lockdep and /proc/lockdep_stats:
- *
- */
-#include <linux/module.h>
-#include <linux/proc_fs.h>
-#include <linux/seq_file.h>
-#include <linux/kallsyms.h>
-#include <linux/debug_locks.h>
-#include <linux/vmalloc.h>
-#include <linux/sort.h>
-#include <asm/uaccess.h>
-#include <asm/div64.h>
-
-#include "lockdep_internals.h"
-
-static void *l_next(struct seq_file *m, void *v, loff_t *pos)
-{
- struct lock_class *class;
-
- (*pos)++;
-
- if (v == SEQ_START_TOKEN)
- class = m->private;
- else {
- class = v;
-
- if (class->lock_entry.next != &all_lock_classes)
- class = list_entry(class->lock_entry.next,
- struct lock_class, lock_entry);
- else
- class = NULL;
- }
-
- return class;
-}
-
-static void *l_start(struct seq_file *m, loff_t *pos)
-{
- struct lock_class *class;
- loff_t i = 0;
-
- if (*pos == 0)
- return SEQ_START_TOKEN;
-
- list_for_each_entry(class, &all_lock_classes, lock_entry) {
- if (++i == *pos)
- return class;
- }
- return NULL;
-}
-
-static void l_stop(struct seq_file *m, void *v)
-{
-}
-
-static unsigned long count_forward_deps(struct lock_class *class)
-{
- struct lock_list *entry;
- unsigned long ret = 1;
-
- /*
- * Recurse this class's dependency list:
- */
- list_for_each_entry(entry, &class->locks_after, entry)
- ret += count_forward_deps(entry->class);
-
- return ret;
-}
-
-static unsigned long count_backward_deps(struct lock_class *class)
-{
- struct lock_list *entry;
- unsigned long ret = 1;
-
- /*
- * Recurse this class's dependency list:
- */
- list_for_each_entry(entry, &class->locks_before, entry)
- ret += count_backward_deps(entry->class);
-
- return ret;
-}
-
-static void print_name(struct seq_file *m, struct lock_class *class)
-{
- char str[128];
- const char *name = class->name;
-
- if (!name) {
- name = __get_key_name(class->key, str);
- seq_printf(m, "%s", name);
- } else{
- seq_printf(m, "%s", name);
- if (class->name_version > 1)
- seq_printf(m, "#%d", class->name_version);
- if (class->subclass)
- seq_printf(m, "/%d", class->subclass);
- }
-}
-
-static int l_show(struct seq_file *m, void *v)
-{
- unsigned long nr_forward_deps, nr_backward_deps;
- struct lock_class *class = v;
- struct lock_list *entry;
- char c1, c2, c3, c4;
-
- if (v == SEQ_START_TOKEN) {
- seq_printf(m, "all lock classes:\n");
- return 0;
- }
-
- seq_printf(m, "%p", class->key);
-#ifdef CONFIG_DEBUG_LOCKDEP
- seq_printf(m, " OPS:%8ld", class->ops);
-#endif
- nr_forward_deps = count_forward_deps(class);
- seq_printf(m, " FD:%5ld", nr_forward_deps);
-
- nr_backward_deps = count_backward_deps(class);
- seq_printf(m, " BD:%5ld", nr_backward_deps);
-
- get_usage_chars(class, &c1, &c2, &c3, &c4);
- seq_printf(m, " %c%c%c%c", c1, c2, c3, c4);
-
- seq_printf(m, ": ");
- print_name(m, class);
- seq_puts(m, "\n");
-
- list_for_each_entry(entry, &class->locks_after, entry) {
- if (entry->distance == 1) {
- seq_printf(m, " -> [%p] ", entry->class);
- print_name(m, entry->class);
- seq_puts(m, "\n");
- }
- }
- seq_puts(m, "\n");
-
- return 0;
-}
-
-static const struct seq_operations lockdep_ops = {
- .start = l_start,
- .next = l_next,
- .stop = l_stop,
- .show = l_show,
-};
-
-static int lockdep_open(struct inode *inode, struct file *file)
-{
- int res = seq_open(file, &lockdep_ops);
- if (!res) {
- struct seq_file *m = file->private_data;
-
- if (!list_empty(&all_lock_classes))
- m->private = list_entry(all_lock_classes.next,
- struct lock_class, lock_entry);
- else
- m->private = NULL;
- }
- return res;
-}
-
-static const struct file_operations proc_lockdep_operations = {
- .open = lockdep_open,
- .read = seq_read,
- .llseek = seq_lseek,
- .release = seq_release,
-};
-
-static void lockdep_stats_debug_show(struct seq_file *m)
-{
-#ifdef CONFIG_DEBUG_LOCKDEP
- unsigned int hi1 = debug_atomic_read(&hardirqs_on_events),
- hi2 = debug_atomic_read(&hardirqs_off_events),
- hr1 = debug_atomic_read(&redundant_hardirqs_on),
- hr2 = debug_atomic_read(&redundant_hardirqs_off),
- si1 = debug_atomic_read(&softirqs_on_events),
- si2 = debug_atomic_read(&softirqs_off_events),
- sr1 = debug_atomic_read(&redundant_softirqs_on),
- sr2 = debug_atomic_read(&redundant_softirqs_off);
-
- seq_printf(m, " chain lookup misses: %11u\n",
- debug_atomic_read(&chain_lookup_misses));
- seq_printf(m, " chain lookup hits: %11u\n",
- debug_atomic_read(&chain_lookup_hits));
- seq_printf(m, " cyclic checks: %11u\n",
- debug_atomic_read(&nr_cyclic_checks));
- seq_printf(m, " cyclic-check recursions: %11u\n",
- debug_atomic_read(&nr_cyclic_check_recursions));
- seq_printf(m, " find-mask forwards checks: %11u\n",
- debug_atomic_read(&nr_find_usage_forwards_checks));
- seq_printf(m, " find-mask forwards recursions: %11u\n",
- debug_atomic_read(&nr_find_usage_forwards_recursions));
- seq_printf(m, " find-mask backwards checks: %11u\n",
- debug_atomic_read(&nr_find_usage_backwards_checks));
- seq_printf(m, " find-mask backwards recursions:%11u\n",
- debug_atomic_read(&nr_find_usage_backwards_recursions));
-
- seq_printf(m, " hardirq on events: %11u\n", hi1);
- seq_printf(m, " hardirq off events: %11u\n", hi2);
- seq_printf(m, " redundant hardirq ons: %11u\n", hr1);
- seq_printf(m, " redundant hardirq offs: %11u\n", hr2);
- seq_printf(m, " softirq on events: %11u\n", si1);
- seq_printf(m, " softirq off events: %11u\n", si2);
- seq_printf(m, " redundant softirq ons: %11u\n", sr1);
- seq_printf(m, " redundant softirq offs: %11u\n", sr2);
-#endif
-}
-
-static int lockdep_stats_show(struct seq_file *m, void *v)
-{
- struct lock_class *class;
- unsigned long nr_unused = 0, nr_uncategorized = 0,
- nr_irq_safe = 0, nr_irq_unsafe = 0,
- nr_softirq_safe = 0, nr_softirq_unsafe = 0,
- nr_hardirq_safe = 0, nr_hardirq_unsafe = 0,
- nr_irq_read_safe = 0, nr_irq_read_unsafe = 0,
- nr_softirq_read_safe = 0, nr_softirq_read_unsafe = 0,
- nr_hardirq_read_safe = 0, nr_hardirq_read_unsafe = 0,
- sum_forward_deps = 0, factor = 0;
-
- list_for_each_entry(class, &all_lock_classes, lock_entry) {
-
- if (class->usage_mask == 0)
- nr_unused++;
- if (class->usage_mask == LOCKF_USED)
- nr_uncategorized++;
- if (class->usage_mask & LOCKF_USED_IN_IRQ)
- nr_irq_safe++;
- if (class->usage_mask & LOCKF_ENABLED_IRQS)
- nr_irq_unsafe++;
- if (class->usage_mask & LOCKF_USED_IN_SOFTIRQ)
- nr_softirq_safe++;
- if (class->usage_mask & LOCKF_ENABLED_SOFTIRQS)
- nr_softirq_unsafe++;
- if (class->usage_mask & LOCKF_USED_IN_HARDIRQ)
- nr_hardirq_safe++;
- if (class->usage_mask & LOCKF_ENABLED_HARDIRQS)
- nr_hardirq_unsafe++;
- if (class->usage_mask & LOCKF_USED_IN_IRQ_READ)
- nr_irq_read_safe++;
- if (class->usage_mask & LOCKF_ENABLED_IRQS_READ)
- nr_irq_read_unsafe++;
- if (class->usage_mask & LOCKF_USED_IN_SOFTIRQ_READ)
- nr_softirq_read_safe++;
- if (class->usage_mask & LOCKF_ENABLED_SOFTIRQS_READ)
- nr_softirq_read_unsafe++;
- if (class->usage_mask & LOCKF_USED_IN_HARDIRQ_READ)
- nr_hardirq_read_safe++;
- if (class->usage_mask & LOCKF_ENABLED_HARDIRQS_READ)
- nr_hardirq_read_unsafe++;
-
- sum_forward_deps += count_forward_deps(class);
- }
-#ifdef CONFIG_DEBUG_LOCKDEP
- DEBUG_LOCKS_WARN_ON(debug_atomic_read(&nr_unused_locks) != nr_unused);
-#endif
- seq_printf(m, " lock-classes: %11lu [max: %lu]\n",
- nr_lock_classes, MAX_LOCKDEP_KEYS);
- seq_printf(m, " direct dependencies: %11lu [max: %lu]\n",
- nr_list_entries, MAX_LOCKDEP_ENTRIES);
- seq_printf(m, " indirect dependencies: %11lu\n",
- sum_forward_deps);
-
- /*
- * Total number of dependencies:
- *
- * All irq-safe locks may nest inside irq-unsafe locks,
- * plus all the other known dependencies:
- */
- seq_printf(m, " all direct dependencies: %11lu\n",
- nr_irq_unsafe * nr_irq_safe +
- nr_hardirq_unsafe * nr_hardirq_safe +
- nr_list_entries);
-
- /*
- * Estimated factor between direct and indirect
- * dependencies:
- */
- if (nr_list_entries)
- factor = sum_forward_deps / nr_list_entries;
-
-#ifdef CONFIG_PROVE_LOCKING
- seq_printf(m, " dependency chains: %11lu [max: %lu]\n",
- nr_lock_chains, MAX_LOCKDEP_CHAINS);
-#endif
-
-#ifdef CONFIG_TRACE_IRQFLAGS
- seq_printf(m, " in-hardirq chains: %11u\n",
- nr_hardirq_chains);
- seq_printf(m, " in-softirq chains: %11u\n",
- nr_softirq_chains);
-#endif
- seq_printf(m, " in-process chains: %11u\n",
- nr_process_chains);
- seq_printf(m, " stack-trace entries: %11lu [max: %lu]\n",
- nr_stack_trace_entries, MAX_STACK_TRACE_ENTRIES);
- seq_printf(m, " combined max dependencies: %11u\n",
- (nr_hardirq_chains + 1) *
- (nr_softirq_chains + 1) *
- (nr_process_chains + 1)
- );
- seq_printf(m, " hardirq-safe locks: %11lu\n",
- nr_hardirq_safe);
- seq_printf(m, " hardirq-unsafe locks: %11lu\n",
- nr_hardirq_unsafe);
- seq_printf(m, " softirq-safe locks: %11lu\n",
- nr_softirq_safe);
- seq_printf(m, " softirq-unsafe locks: %11lu\n",
- nr_softirq_unsafe);
- seq_printf(m, " irq-safe locks: %11lu\n",
- nr_irq_safe);
- seq_printf(m, " irq-unsafe locks: %11lu\n",
- nr_irq_unsafe);
-
- seq_printf(m, " hardirq-read-safe locks: %11lu\n",
- nr_hardirq_read_safe);
- seq_printf(m, " hardirq-read-unsafe locks: %11lu\n",
- nr_hardirq_read_unsafe);
- seq_printf(m, " softirq-read-safe locks: %11lu\n",
- nr_softirq_read_safe);
- seq_printf(m, " softirq-read-unsafe locks: %11lu\n",
- nr_softirq_read_unsafe);
- seq_printf(m, " irq-read-safe locks: %11lu\n",
- nr_irq_read_safe);
- seq_printf(m, " irq-read-unsafe locks: %11lu\n",
- nr_irq_read_unsafe);
-
- seq_printf(m, " uncategorized locks: %11lu\n",
- nr_uncategorized);
- seq_printf(m, " unused locks: %11lu\n",
- nr_unused);
- seq_printf(m, " max locking depth: %11u\n",
- max_lockdep_depth);
- seq_printf(m, " max recursion depth: %11u\n",
- max_recursion_depth);
- lockdep_stats_debug_show(m);
- seq_printf(m, " debug_locks: %11u\n",
- debug_locks);
-
- return 0;
-}
-
-static int lockdep_stats_open(struct inode *inode, struct file *file)
-{
- return single_open(file, lockdep_stats_show, NULL);
-}
-
-static const struct file_operations proc_lockdep_stats_operations = {
- .open = lockdep_stats_open,
- .read = seq_read,
- .llseek = seq_lseek,
- .release = single_release,
-};
-
-#ifdef CONFIG_LOCK_STAT
-
-struct lock_stat_data {
- struct lock_class *class;
- struct lock_class_stats stats;
-};
-
-struct lock_stat_seq {
- struct lock_stat_data *iter;
- struct lock_stat_data *iter_end;
- struct lock_stat_data stats[MAX_LOCKDEP_KEYS];
-};
-
-/*
- * sort on absolute number of contentions
- */
-static int lock_stat_cmp(const void *l, const void *r)
-{
- const struct lock_stat_data *dl = l, *dr = r;
- unsigned long nl, nr;
-
- nl = dl->stats.read_waittime.nr + dl->stats.write_waittime.nr;
- nr = dr->stats.read_waittime.nr + dr->stats.write_waittime.nr;
-
- return nr - nl;
-}
-
-static void seq_line(struct seq_file *m, char c, int offset, int length)
-{
- int i;
-
- for (i = 0; i < offset; i++)
- seq_puts(m, " ");
- for (i = 0; i < length; i++)
- seq_printf(m, "%c", c);
- seq_puts(m, "\n");
-}
-
-static void snprint_time(char *buf, size_t bufsiz, s64 nr)
-{
- unsigned long rem;
-
- rem = do_div(nr, 1000); /* XXX: do_div_signed */
- snprintf(buf, bufsiz, "%lld.%02d", (long long)nr, ((int)rem+5)/10);
-}
-
-static void seq_time(struct seq_file *m, s64 time)
-{
- char num[15];
-
- snprint_time(num, sizeof(num), time);
- seq_printf(m, " %14s", num);
-}
-
-static void seq_lock_time(struct seq_file *m, struct lock_time *lt)
-{
- seq_printf(m, "%14lu", lt->nr);
- seq_time(m, lt->min);
- seq_time(m, lt->max);
- seq_time(m, lt->total);
-}
-
-static void seq_stats(struct seq_file *m, struct lock_stat_data *data)
-{
- char name[39];
- struct lock_class *class;
- struct lock_class_stats *stats;
- int i, namelen;
-
- class = data->class;
- stats = &data->stats;
-
- namelen = 38;
- if (class->name_version > 1)
- namelen -= 2; /* XXX truncates versions > 9 */
- if (class->subclass)
- namelen -= 2;
-
- if (!class->name) {
- char str[KSYM_NAME_LEN];
- const char *key_name;
-
- key_name = __get_key_name(class->key, str);
- snprintf(name, namelen, "%s", key_name);
- } else {
- snprintf(name, namelen, "%s", class->name);
- }
- namelen = strlen(name);
- if (class->name_version > 1) {
- snprintf(name+namelen, 3, "#%d", class->name_version);
- namelen += 2;
- }
- if (class->subclass) {
- snprintf(name+namelen, 3, "/%d", class->subclass);
- namelen += 2;
- }
-
- if (stats->write_holdtime.nr) {
- if (stats->read_holdtime.nr)
- seq_printf(m, "%38s-W:", name);
- else
- seq_printf(m, "%40s:", name);
-
- seq_printf(m, "%14lu ", stats->bounces[bounce_contended_write]);
- seq_lock_time(m, &stats->write_waittime);
- seq_printf(m, " %14lu ", stats->bounces[bounce_acquired_write]);
- seq_lock_time(m, &stats->write_holdtime);
- seq_puts(m, "\n");
- }
-
- if (stats->read_holdtime.nr) {
- seq_printf(m, "%38s-R:", name);
- seq_printf(m, "%14lu ", stats->bounces[bounce_contended_read]);
- seq_lock_time(m, &stats->read_waittime);
- seq_printf(m, " %14lu ", stats->bounces[bounce_acquired_read]);
- seq_lock_time(m, &stats->read_holdtime);
- seq_puts(m, "\n");
- }
-
- if (stats->read_waittime.nr + stats->write_waittime.nr == 0)
- return;
-
- if (stats->read_holdtime.nr)
- namelen += 2;
-
- for (i = 0; i < ARRAY_SIZE(class->contention_point); i++) {
- char sym[KSYM_SYMBOL_LEN];
- char ip[32];
-
- if (class->contention_point[i] == 0)
- break;
-
- if (!i)
- seq_line(m, '-', 40-namelen, namelen);
-
- sprint_symbol(sym, class->contention_point[i]);
- snprintf(ip, sizeof(ip), "[<%p>]",
- (void *)class->contention_point[i]);
- seq_printf(m, "%40s %14lu %29s %s\n", name,
- stats->contention_point[i],
- ip, sym);
- }
- if (i) {
- seq_puts(m, "\n");
- seq_line(m, '.', 0, 40 + 1 + 10 * (14 + 1));
- seq_puts(m, "\n");
- }
-}
-
-static void seq_header(struct seq_file *m)
-{
- seq_printf(m, "lock_stat version 0.2\n");
- seq_line(m, '-', 0, 40 + 1 + 10 * (14 + 1));
- seq_printf(m, "%40s %14s %14s %14s %14s %14s %14s %14s %14s "
- "%14s %14s\n",
- "class name",
- "con-bounces",
- "contentions",
- "waittime-min",
- "waittime-max",
- "waittime-total",
- "acq-bounces",
- "acquisitions",
- "holdtime-min",
- "holdtime-max",
- "holdtime-total");
- seq_line(m, '-', 0, 40 + 1 + 10 * (14 + 1));
- seq_printf(m, "\n");
-}
-
-static void *ls_start(struct seq_file *m, loff_t *pos)
-{
- struct lock_stat_seq *data = m->private;
-
- if (*pos == 0)
- return SEQ_START_TOKEN;
-
- data->iter = data->stats + *pos;
- if (data->iter >= data->iter_end)
- data->iter = NULL;
-
- return data->iter;
-}
-
-static void *ls_next(struct seq_file *m, void *v, loff_t *pos)
-{
- struct lock_stat_seq *data = m->private;
-
- (*pos)++;
-
- if (v == SEQ_START_TOKEN)
- data->iter = data->stats;
- else {
- data->iter = v;
- data->iter++;
- }
-
- if (data->iter == data->iter_end)
- data->iter = NULL;
-
- return data->iter;
-}
-
-static void ls_stop(struct seq_file *m, void *v)
-{
-}
-
-static int ls_show(struct seq_file *m, void *v)
-{
- if (v == SEQ_START_TOKEN)
- seq_header(m);
- else
- seq_stats(m, v);
-
- return 0;
-}
-
-static struct seq_operations lockstat_ops = {
- .start = ls_start,
- .next = ls_next,
- .stop = ls_stop,
- .show = ls_show,
-};
-
-static int lock_stat_open(struct inode *inode, struct file *file)
-{
- int res;
- struct lock_class *class;
- struct lock_stat_seq *data = vmalloc(sizeof(struct lock_stat_seq));
-
- if (!data)
- return -ENOMEM;
-
- res = seq_open(file, &lockstat_ops);
- if (!res) {
- struct lock_stat_data *iter = data->stats;
- struct seq_file *m = file->private_data;
-
- data->iter = iter;
- list_for_each_entry(class, &all_lock_classes, lock_entry) {
- iter->class = class;
- iter->stats = lock_stats(class);
- iter++;
- }
- data->iter_end = iter;
-
- sort(data->stats, data->iter_end - data->iter,
- sizeof(struct lock_stat_data),
- lock_stat_cmp, NULL);
-
- m->private = data;
- } else
- vfree(data);
-
- return res;
-}
-
-static ssize_t lock_stat_write(struct file *file, const char __user *buf,
- size_t count, loff_t *ppos)
-{
- struct lock_class *class;
- char c;
-
- if (count) {
- if (get_user(c, buf))
- return -EFAULT;
-
- if (c != '0')
- return count;
-
- list_for_each_entry(class, &all_lock_classes, lock_entry)
- clear_lock_stats(class);
- }
- return count;
-}
-
-static int lock_stat_release(struct inode *inode, struct file *file)
-{
- struct seq_file *seq = file->private_data;
-
- vfree(seq->private);
- seq->private = NULL;
- return seq_release(inode, file);
-}
-
-static const struct file_operations proc_lock_stat_operations = {
- .open = lock_stat_open,
- .write = lock_stat_write,
- .read = seq_read,
- .llseek = seq_lseek,
- .release = lock_stat_release,
-};
-#endif /* CONFIG_LOCK_STAT */
-
-static int __init lockdep_proc_init(void)
-{
- struct proc_dir_entry *entry;
-
- entry = create_proc_entry("lockdep", S_IRUSR, NULL);
- if (entry)
- entry->proc_fops = &proc_lockdep_operations;
-
- entry = create_proc_entry("lockdep_stats", S_IRUSR, NULL);
- if (entry)
- entry->proc_fops = &proc_lockdep_stats_operations;
-
-#ifdef CONFIG_LOCK_STAT
- entry = create_proc_entry("lock_stat", S_IRUSR, NULL);
- if (entry)
- entry->proc_fops = &proc_lock_stat_operations;
-#endif
-
- return 0;
-}
-
-__initcall(lockdep_proc_init);
-
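
From userspace the moved interface behaves exactly as before: reading
/proc/lock_stat dumps the table built by seq_stats(), and writing the
character '0' clears the statistics, as implemented by
lock_stat_write(). A sketch (assumes CONFIG_LOCK_STAT=y):

#include <stdio.h>

int main(void)
{
	FILE *f = fopen("/proc/lock_stat", "w");

	if (!f) {
		perror("/proc/lock_stat");
		return 1;
	}
	fputc('0', f);	/* resets all contention statistics */
	return fclose(f) ? 1 : 0;
}
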
--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
* [patch 07/10] Move vmstat to instrumentation/.
2007-10-29 14:26 [patch 00/10] [PATCH] Create instrumentation/ kernel directory Mathieu Desnoyers
` (4 preceding siblings ...)
2007-10-29 14:26 ` [patch 06/10] Move lockdep " Mathieu Desnoyers
@ 2007-10-29 14:26 ` Mathieu Desnoyers
2007-10-29 14:26 ` [patch 08/10] Move blktrace Kconfig entry to instrumentation menu Mathieu Desnoyers
` (3 subsequent siblings)
9 siblings, 0 replies; 13+ messages in thread
From: Mathieu Desnoyers @ 2007-10-29 14:26 UTC (permalink / raw)
To: linux-kernel; +Cc: Mathieu Desnoyers, Christoph Lameter
[-- Attachment #1: move-vmstat-to-instrumentation.patch --]
[-- Type: text/plain, Size: 44282 bytes --]
Move mm/vmstat.c to instrumentation/.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Christoph Lameter <christoph@lameter.com>
---
instrumentation/Makefile | 2
instrumentation/vmstat.c | 863 +++++++++++++++++++++++++++++++++++++++++++++++
mm/Makefile | 2
mm/vmstat.c | 863 -----------------------------------------------
4 files changed, 865 insertions(+), 865 deletions(-)
Index: linux-2.6-lttng.stable/instrumentation/Makefile
===================================================================
--- linux-2.6-lttng.stable.orig/instrumentation/Makefile 2007-10-29 09:52:58.000000000 -0400
+++ linux-2.6-lttng.stable/instrumentation/Makefile 2007-10-29 09:53:14.000000000 -0400
@@ -2,7 +2,7 @@
# Makefile for the linux kernel instrumentation
#
-obj-y += profile.o
+obj-y += profile.o vmstat.o
obj-$(CONFIG_KPROBES) += kprobes.o
obj-$(CONFIG_MARKERS) += marker.o
Index: linux-2.6-lttng.stable/mm/Makefile
===================================================================
--- linux-2.6-lttng.stable.orig/mm/Makefile 2007-10-29 09:51:06.000000000 -0400
+++ linux-2.6-lttng.stable/mm/Makefile 2007-10-29 09:53:02.000000000 -0400
@@ -10,7 +10,7 @@ mmu-$(CONFIG_MMU) := fremap.o highmem.o
obj-y := bootmem.o filemap.o mempool.o oom_kill.o fadvise.o \
page_alloc.o page-writeback.o pdflush.o \
readahead.o swap.o truncate.o vmscan.o \
- prio_tree.o util.o mmzone.o vmstat.o backing-dev.o \
+ prio_tree.o util.o mmzone.o backing-dev.o \
page_isolation.o $(mmu-y)
obj-$(CONFIG_BOUNCE) += bounce.o
Index: linux-2.6-lttng.stable/instrumentation/vmstat.c
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng.stable/instrumentation/vmstat.c 2007-10-29 09:53:02.000000000 -0400
@@ -0,0 +1,863 @@
+/*
+ * linux/mm/vmstat.c
+ *
+ * Manages VM statistics
+ * Copyright (C) 1991, 1992, 1993, 1994 Linus Torvalds
+ *
+ * zoned VM statistics
+ * Copyright (C) 2006 Silicon Graphics, Inc.,
+ * Christoph Lameter <christoph@lameter.com>
+ */
+
+#include <linux/mm.h>
+#include <linux/err.h>
+#include <linux/module.h>
+#include <linux/cpu.h>
+#include <linux/sched.h>
+
+#ifdef CONFIG_VM_EVENT_COUNTERS
+DEFINE_PER_CPU(struct vm_event_state, vm_event_states) = {{0}};
+EXPORT_PER_CPU_SYMBOL(vm_event_states);
+
+static void sum_vm_events(unsigned long *ret, cpumask_t *cpumask)
+{
+ int cpu = 0;
+ int i;
+
+ memset(ret, 0, NR_VM_EVENT_ITEMS * sizeof(unsigned long));
+
+ cpu = first_cpu(*cpumask);
+ while (cpu < NR_CPUS) {
+ struct vm_event_state *this = &per_cpu(vm_event_states, cpu);
+
+ cpu = next_cpu(cpu, *cpumask);
+
+ if (cpu < NR_CPUS)
+ prefetch(&per_cpu(vm_event_states, cpu));
+
+
+ for (i = 0; i < NR_VM_EVENT_ITEMS; i++)
+ ret[i] += this->event[i];
+ }
+}
+
+/*
+ * Accumulate the vm event counters across all CPUs.
+ * The result is unavoidably approximate - it can change
+ * during and after execution of this function.
+*/
+void all_vm_events(unsigned long *ret)
+{
+ sum_vm_events(ret, &cpu_online_map);
+}
+EXPORT_SYMBOL_GPL(all_vm_events);
+
+#ifdef CONFIG_HOTPLUG
+/*
+ * Fold the foreign cpu events into our own.
+ *
+ * This is adding to the events on one processor
+ * but keeps the global counts constant.
+ */
+void vm_events_fold_cpu(int cpu)
+{
+ struct vm_event_state *fold_state = &per_cpu(vm_event_states, cpu);
+ int i;
+
+ for (i = 0; i < NR_VM_EVENT_ITEMS; i++) {
+ count_vm_events(i, fold_state->event[i]);
+ fold_state->event[i] = 0;
+ }
+}
+#endif /* CONFIG_HOTPLUG */
+
+#endif /* CONFIG_VM_EVENT_COUNTERS */
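
A sketch of a consumer of the interface above (illustrative only;
assumes CONFIG_VM_EVENT_COUNTERS=y, and snapshot_events() is not from
this patch). As the comment notes, the sum is approximate because
other CPUs keep counting while we iterate:

#include <linux/kernel.h>
#include <linux/vmstat.h>

static void snapshot_events(void)
{
	unsigned long events[NR_VM_EVENT_ITEMS];

	all_vm_events(events);
	printk(KERN_DEBUG "page faults so far: %lu\n", events[PGFAULT]);
}
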
+
+/*
+ * Manage combined zone based / global counters
+ *
+ * vm_stat contains the global counters
+ */
+atomic_long_t vm_stat[NR_VM_ZONE_STAT_ITEMS];
+EXPORT_SYMBOL(vm_stat);
+
+#ifdef CONFIG_SMP
+
+static int calculate_threshold(struct zone *zone)
+{
+ int threshold;
+ int mem; /* memory in 128 MB units */
+
+ /*
+ * The threshold scales with the number of processors and the amount
+ * of memory per zone. More memory means that we can defer updates for
+ * longer; more processors could lead to more contention.
+ * fls() is used to have a cheap way of logarithmic scaling.
+ *
+ * Some sample thresholds:
+ *
+ * Threshold Processors (fls) Zonesize fls(mem+1)
+ * ------------------------------------------------------------------
+ * 8 1 1 0.9-1 GB 4
+ * 16 2 2 0.9-1 GB 4
+ * 20 2 2 1-2 GB 5
+ * 24 2 2 2-4 GB 6
+ * 28 2 2 4-8 GB 7
+ * 32 2 2 8-16 GB 8
+ * 4 2 2 <128M 1
+ * 30 4 3 2-4 GB 5
+ * 48 4 3 8-16 GB 8
+ * 32 8 4 1-2 GB 4
+ * 32 8 4 0.9-1GB 4
+ * 10 16 5 <128M 1
+ * 40 16 5 900M 4
+ * 70 64 7 2-4 GB 5
+ * 84 64 7 4-8 GB 6
+ * 108 512 9 4-8 GB 6
+ * 125 1024 10 8-16 GB 8
+ * 125 1024 10 16-32 GB 9
+ */
+
+ mem = zone->present_pages >> (27 - PAGE_SHIFT);
+
+ threshold = 2 * fls(num_online_cpus()) * (1 + fls(mem));
+
+ /*
+ * Maximum threshold is 125
+ */
+ threshold = min(125, threshold);
+
+ return threshold;
+}
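
To make the scaling concrete: with 4 online CPUs and a 1 GB zone,
mem = 1024 MB / 128 MB = 8, so fls(mem) = 4 and fls(4) = 3, giving
threshold = 2 * 3 * (1 + 4) = 30. Each per-cpu delta may then drift
by up to 30 counts before being folded into the global counter.
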
+
+/*
+ * Refresh the thresholds for each zone.
+ */
+static void refresh_zone_stat_thresholds(void)
+{
+ struct zone *zone;
+ int cpu;
+ int threshold;
+
+ for_each_zone(zone) {
+
+ if (!zone->present_pages)
+ continue;
+
+ threshold = calculate_threshold(zone);
+
+ for_each_online_cpu(cpu)
+ zone_pcp(zone, cpu)->stat_threshold = threshold;
+ }
+}
+
+/*
+ * For use when we know that interrupts are disabled.
+ */
+void __mod_zone_page_state(struct zone *zone, enum zone_stat_item item,
+ int delta)
+{
+ struct per_cpu_pageset *pcp = zone_pcp(zone, smp_processor_id());
+ s8 *p = pcp->vm_stat_diff + item;
+ long x;
+
+ x = delta + *p;
+
+ if (unlikely(x > pcp->stat_threshold || x < -pcp->stat_threshold)) {
+ zone_page_state_add(x, zone, item);
+ x = 0;
+ }
+ *p = x;
+}
+EXPORT_SYMBOL(__mod_zone_page_state);
+
+/*
+ * For an unknown interrupt state
+ */
+void mod_zone_page_state(struct zone *zone, enum zone_stat_item item,
+ int delta)
+{
+ unsigned long flags;
+
+ local_irq_save(flags);
+ __mod_zone_page_state(zone, item, delta);
+ local_irq_restore(flags);
+}
+EXPORT_SYMBOL(mod_zone_page_state);
+
+/*
+ * Optimized increment and decrement functions.
+ *
+ * These are only for a single page and therefore can take a struct page *
+ * argument instead of struct zone *. This allows the inclusion of the code
+ * generated for page_zone(page) into the optimized functions.
+ *
+ * No overflow check is necessary and therefore the differential can be
+ * incremented or decremented in place which may allow the compilers to
+ * generate better code.
+ * The increment or decrement is known and therefore one boundary check can
+ * be omitted.
+ *
+ * NOTE: These functions are very performance sensitive. Change only
+ * with care.
+ *
+ * Some processors have inc/dec instructions that are atomic vs an interrupt.
+ * However, the code must first determine the differential location in a zone
+ * based on the processor number and then inc/dec the counter. There is no
+ * guarantee without disabling preemption that the processor will not change
+ * in between and therefore the atomicity vs. interrupt cannot be exploited
+ * in a useful way here.
+ */
+void __inc_zone_state(struct zone *zone, enum zone_stat_item item)
+{
+ struct per_cpu_pageset *pcp = zone_pcp(zone, smp_processor_id());
+ s8 *p = pcp->vm_stat_diff + item;
+
+ (*p)++;
+
+ if (unlikely(*p > pcp->stat_threshold)) {
+ int overstep = pcp->stat_threshold / 2;
+
+ zone_page_state_add(*p + overstep, zone, item);
+ *p = -overstep;
+ }
+}
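
The overstep trick above spaces the folds further apart: with a
threshold of 32 the counter folds at *p == 33, adds 33 + 16 = 49 to
the zone counter and leaves *p at -16, so the next fold only happens
after 49 further increments rather than 33.
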
+
+void __inc_zone_page_state(struct page *page, enum zone_stat_item item)
+{
+ __inc_zone_state(page_zone(page), item);
+}
+EXPORT_SYMBOL(__inc_zone_page_state);
+
+void __dec_zone_state(struct zone *zone, enum zone_stat_item item)
+{
+ struct per_cpu_pageset *pcp = zone_pcp(zone, smp_processor_id());
+ s8 *p = pcp->vm_stat_diff + item;
+
+ (*p)--;
+
+ if (unlikely(*p < - pcp->stat_threshold)) {
+ int overstep = pcp->stat_threshold / 2;
+
+ zone_page_state_add(*p - overstep, zone, item);
+ *p = overstep;
+ }
+}
+
+void __dec_zone_page_state(struct page *page, enum zone_stat_item item)
+{
+ __dec_zone_state(page_zone(page), item);
+}
+EXPORT_SYMBOL(__dec_zone_page_state);
+
+void inc_zone_state(struct zone *zone, enum zone_stat_item item)
+{
+ unsigned long flags;
+
+ local_irq_save(flags);
+ __inc_zone_state(zone, item);
+ local_irq_restore(flags);
+}
+
+void inc_zone_page_state(struct page *page, enum zone_stat_item item)
+{
+ unsigned long flags;
+ struct zone *zone;
+
+ zone = page_zone(page);
+ local_irq_save(flags);
+ __inc_zone_state(zone, item);
+ local_irq_restore(flags);
+}
+EXPORT_SYMBOL(inc_zone_page_state);
+
+void dec_zone_page_state(struct page *page, enum zone_stat_item item)
+{
+ unsigned long flags;
+
+ local_irq_save(flags);
+ __dec_zone_page_state(page, item);
+ local_irq_restore(flags);
+}
+EXPORT_SYMBOL(dec_zone_page_state);
+
+/*
+ * Update the zone counters for one cpu.
+ *
+ * Note that refresh_cpu_vm_stats strives to only access
+ * node local memory. The per cpu pagesets on remote zones are placed
+ * in the memory local to the processor using that pageset. So the
+ * loop over all zones will access a series of cachelines local to
+ * the processor.
+ *
+ * The call to zone_page_state_add updates the cachelines with the
+ * statistics in the remote zone struct as well as the global cachelines
+ * with the global counters. These could cause remote node cache line
+ * bouncing and will have to be only done when necessary.
+ */
+void refresh_cpu_vm_stats(int cpu)
+{
+ struct zone *zone;
+ int i;
+ unsigned long flags;
+
+ for_each_zone(zone) {
+ struct per_cpu_pageset *p;
+
+ if (!populated_zone(zone))
+ continue;
+
+ p = zone_pcp(zone, cpu);
+
+ for (i = 0; i < NR_VM_ZONE_STAT_ITEMS; i++)
+ if (p->vm_stat_diff[i]) {
+ local_irq_save(flags);
+ zone_page_state_add(p->vm_stat_diff[i],
+ zone, i);
+ p->vm_stat_diff[i] = 0;
+#ifdef CONFIG_NUMA
+ /* 3 seconds idle till flush */
+ p->expire = 3;
+#endif
+ local_irq_restore(flags);
+ }
+#ifdef CONFIG_NUMA
+ /*
+ * Deal with draining the remote pageset of this
+ * processor
+ *
+ * Check if there are pages remaining in this pageset
+ * if not then there is nothing to expire.
+ */
+ if (!p->expire || (!p->pcp[0].count && !p->pcp[1].count))
+ continue;
+
+ /*
+ * We never drain zones local to this processor.
+ */
+ if (zone_to_nid(zone) == numa_node_id()) {
+ p->expire = 0;
+ continue;
+ }
+
+ p->expire--;
+ if (p->expire)
+ continue;
+
+ if (p->pcp[0].count)
+ drain_zone_pages(zone, p->pcp + 0);
+
+ if (p->pcp[1].count)
+ drain_zone_pages(zone, p->pcp + 1);
+#endif
+ }
+}
+
+#endif
+
+#ifdef CONFIG_NUMA
+/*
+ * zonelist = the list of zones passed to the allocator
+ * z = the zone from which the allocation occurred.
+ *
+ * Must be called with interrupts disabled.
+ */
+void zone_statistics(struct zonelist *zonelist, struct zone *z)
+{
+ if (z->zone_pgdat == zonelist->zones[0]->zone_pgdat) {
+ __inc_zone_state(z, NUMA_HIT);
+ } else {
+ __inc_zone_state(z, NUMA_MISS);
+ __inc_zone_state(zonelist->zones[0], NUMA_FOREIGN);
+ }
+ if (z->node == numa_node_id())
+ __inc_zone_state(z, NUMA_LOCAL);
+ else
+ __inc_zone_state(z, NUMA_OTHER);
+}
+#endif
+
+#ifdef CONFIG_PROC_FS
+
+#include <linux/seq_file.h>
+
+static char * const migratetype_names[MIGRATE_TYPES] = {
+ "Unmovable",
+ "Reclaimable",
+ "Movable",
+ "Reserve",
+};
+
+static void *frag_start(struct seq_file *m, loff_t *pos)
+{
+ pg_data_t *pgdat;
+ loff_t node = *pos;
+ for (pgdat = first_online_pgdat();
+ pgdat && node;
+ pgdat = next_online_pgdat(pgdat))
+ --node;
+
+ return pgdat;
+}
+
+static void *frag_next(struct seq_file *m, void *arg, loff_t *pos)
+{
+ pg_data_t *pgdat = (pg_data_t *)arg;
+
+ (*pos)++;
+ return next_online_pgdat(pgdat);
+}
+
+static void frag_stop(struct seq_file *m, void *arg)
+{
+}
+
+/* Walk all the zones in a node and print using a callback */
+static void walk_zones_in_node(struct seq_file *m, pg_data_t *pgdat,
+ void (*print)(struct seq_file *m, pg_data_t *, struct zone *))
+{
+ struct zone *zone;
+ struct zone *node_zones = pgdat->node_zones;
+ unsigned long flags;
+
+ for (zone = node_zones; zone - node_zones < MAX_NR_ZONES; ++zone) {
+ if (!populated_zone(zone))
+ continue;
+
+ spin_lock_irqsave(&zone->lock, flags);
+ print(m, pgdat, zone);
+ spin_unlock_irqrestore(&zone->lock, flags);
+ }
+}
+
+static void frag_show_print(struct seq_file *m, pg_data_t *pgdat,
+ struct zone *zone)
+{
+ int order;
+
+ seq_printf(m, "Node %d, zone %8s ", pgdat->node_id, zone->name);
+ for (order = 0; order < MAX_ORDER; ++order)
+ seq_printf(m, "%6lu ", zone->free_area[order].nr_free);
+ seq_putc(m, '\n');
+}
+
+/*
+ * This walks the free areas for each zone.
+ */
+static int frag_show(struct seq_file *m, void *arg)
+{
+ pg_data_t *pgdat = (pg_data_t *)arg;
+ walk_zones_in_node(m, pgdat, frag_show_print);
+ return 0;
+}
+
+static void pagetypeinfo_showfree_print(struct seq_file *m,
+ pg_data_t *pgdat, struct zone *zone)
+{
+ int order, mtype;
+
+ for (mtype = 0; mtype < MIGRATE_TYPES; mtype++) {
+ seq_printf(m, "Node %4d, zone %8s, type %12s ",
+ pgdat->node_id,
+ zone->name,
+ migratetype_names[mtype]);
+ for (order = 0; order < MAX_ORDER; ++order) {
+ unsigned long freecount = 0;
+ struct free_area *area;
+ struct list_head *curr;
+
+ area = &(zone->free_area[order]);
+
+ list_for_each(curr, &area->free_list[mtype])
+ freecount++;
+ seq_printf(m, "%6lu ", freecount);
+ }
+ seq_putc(m, '\n');
+ }
+}
+
+/* Print out the free pages at each order for each migratetype */
+static int pagetypeinfo_showfree(struct seq_file *m, void *arg)
+{
+ int order;
+ pg_data_t *pgdat = (pg_data_t *)arg;
+
+ /* Print header */
+ seq_printf(m, "%-43s ", "Free pages count per migrate type at order");
+ for (order = 0; order < MAX_ORDER; ++order)
+ seq_printf(m, "%6d ", order);
+ seq_putc(m, '\n');
+
+ walk_zones_in_node(m, pgdat, pagetypeinfo_showfree_print);
+
+ return 0;
+}
+
+static void pagetypeinfo_showblockcount_print(struct seq_file *m,
+ pg_data_t *pgdat, struct zone *zone)
+{
+ int mtype;
+ unsigned long pfn;
+ unsigned long start_pfn = zone->zone_start_pfn;
+ unsigned long end_pfn = start_pfn + zone->spanned_pages;
+ unsigned long count[MIGRATE_TYPES] = { 0, };
+
+ for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) {
+ struct page *page;
+
+ if (!pfn_valid(pfn))
+ continue;
+
+ page = pfn_to_page(pfn);
+ mtype = get_pageblock_migratetype(page);
+
+ count[mtype]++;
+ }
+
+ /* Print counts */
+ seq_printf(m, "Node %d, zone %8s ", pgdat->node_id, zone->name);
+ for (mtype = 0; mtype < MIGRATE_TYPES; mtype++)
+ seq_printf(m, "%12lu ", count[mtype]);
+ seq_putc(m, '\n');
+}
+
+/* Print out the number of pageblocks for each migratetype */
+static int pagetypeinfo_showblockcount(struct seq_file *m, void *arg)
+{
+ int mtype;
+ pg_data_t *pgdat = (pg_data_t *)arg;
+
+ seq_printf(m, "\n%-23s", "Number of blocks type ");
+ for (mtype = 0; mtype < MIGRATE_TYPES; mtype++)
+ seq_printf(m, "%12s ", migratetype_names[mtype]);
+ seq_putc(m, '\n');
+ walk_zones_in_node(m, pgdat, pagetypeinfo_showblockcount_print);
+
+ return 0;
+}
+
+/*
+ * This prints out statistics in relation to grouping pages by mobility.
+ * It is expensive to collect so do not constantly read the file.
+ */
+static int pagetypeinfo_show(struct seq_file *m, void *arg)
+{
+ pg_data_t *pgdat = (pg_data_t *)arg;
+
+ seq_printf(m, "Page block order: %d\n", pageblock_order);
+ seq_printf(m, "Pages per block: %lu\n", pageblock_nr_pages);
+ seq_putc(m, '\n');
+ pagetypeinfo_showfree(m, pgdat);
+ pagetypeinfo_showblockcount(m, pgdat);
+
+ return 0;
+}
+
+const struct seq_operations fragmentation_op = {
+ .start = frag_start,
+ .next = frag_next,
+ .stop = frag_stop,
+ .show = frag_show,
+};
+
+const struct seq_operations pagetypeinfo_op = {
+ .start = frag_start,
+ .next = frag_next,
+ .stop = frag_stop,
+ .show = pagetypeinfo_show,
+};
+
+#ifdef CONFIG_ZONE_DMA
+#define TEXT_FOR_DMA(xx) xx "_dma",
+#else
+#define TEXT_FOR_DMA(xx)
+#endif
+
+#ifdef CONFIG_ZONE_DMA32
+#define TEXT_FOR_DMA32(xx) xx "_dma32",
+#else
+#define TEXT_FOR_DMA32(xx)
+#endif
+
+#ifdef CONFIG_HIGHMEM
+#define TEXT_FOR_HIGHMEM(xx) xx "_high",
+#else
+#define TEXT_FOR_HIGHMEM(xx)
+#endif
+
+#define TEXTS_FOR_ZONES(xx) TEXT_FOR_DMA(xx) TEXT_FOR_DMA32(xx) xx "_normal", \
+ TEXT_FOR_HIGHMEM(xx) xx "_movable",
+
+static const char * const vmstat_text[] = {
+ /* Zoned VM counters */
+ "nr_free_pages",
+ "nr_inactive",
+ "nr_active",
+ "nr_anon_pages",
+ "nr_mapped",
+ "nr_file_pages",
+ "nr_dirty",
+ "nr_writeback",
+ "nr_slab_reclaimable",
+ "nr_slab_unreclaimable",
+ "nr_page_table_pages",
+ "nr_unstable",
+ "nr_bounce",
+ "nr_vmscan_write",
+
+#ifdef CONFIG_NUMA
+ "numa_hit",
+ "numa_miss",
+ "numa_foreign",
+ "numa_interleave",
+ "numa_local",
+ "numa_other",
+#endif
+
+#ifdef CONFIG_VM_EVENT_COUNTERS
+ "pgpgin",
+ "pgpgout",
+ "pswpin",
+ "pswpout",
+
+ TEXTS_FOR_ZONES("pgalloc")
+
+ "pgfree",
+ "pgactivate",
+ "pgdeactivate",
+
+ "pgfault",
+ "pgmajfault",
+
+ TEXTS_FOR_ZONES("pgrefill")
+ TEXTS_FOR_ZONES("pgsteal")
+ TEXTS_FOR_ZONES("pgscan_kswapd")
+ TEXTS_FOR_ZONES("pgscan_direct")
+
+ "pginodesteal",
+ "slabs_scanned",
+ "kswapd_steal",
+ "kswapd_inodesteal",
+ "pageoutrun",
+ "allocstall",
+
+ "pgrotated",
+#endif
+};
+
+static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat,
+ struct zone *zone)
+{
+ int i;
+ seq_printf(m, "Node %d, zone %8s", pgdat->node_id, zone->name);
+ seq_printf(m,
+ "\n pages free %lu"
+ "\n min %lu"
+ "\n low %lu"
+ "\n high %lu"
+ "\n scanned %lu (a: %lu i: %lu)"
+ "\n spanned %lu"
+ "\n present %lu",
+ zone_page_state(zone, NR_FREE_PAGES),
+ zone->pages_min,
+ zone->pages_low,
+ zone->pages_high,
+ zone->pages_scanned,
+ zone->nr_scan_active, zone->nr_scan_inactive,
+ zone->spanned_pages,
+ zone->present_pages);
+
+ for (i = 0; i < NR_VM_ZONE_STAT_ITEMS; i++)
+ seq_printf(m, "\n %-12s %lu", vmstat_text[i],
+ zone_page_state(zone, i));
+
+ seq_printf(m,
+ "\n protection: (%lu",
+ zone->lowmem_reserve[0]);
+ for (i = 1; i < ARRAY_SIZE(zone->lowmem_reserve); i++)
+ seq_printf(m, ", %lu", zone->lowmem_reserve[i]);
+ seq_printf(m,
+ ")"
+ "\n pagesets");
+ for_each_online_cpu(i) {
+ struct per_cpu_pageset *pageset;
+ int j;
+
+ pageset = zone_pcp(zone, i);
+ for (j = 0; j < ARRAY_SIZE(pageset->pcp); j++) {
+ seq_printf(m,
+ "\n cpu: %i pcp: %i"
+ "\n count: %i"
+ "\n high: %i"
+ "\n batch: %i",
+ i, j,
+ pageset->pcp[j].count,
+ pageset->pcp[j].high,
+ pageset->pcp[j].batch);
+ }
+#ifdef CONFIG_SMP
+ seq_printf(m, "\n vm stats threshold: %d",
+ pageset->stat_threshold);
+#endif
+ }
+ seq_printf(m,
+ "\n all_unreclaimable: %u"
+ "\n prev_priority: %i"
+ "\n start_pfn: %lu",
+ zone_is_all_unreclaimable(zone),
+ zone->prev_priority,
+ zone->zone_start_pfn);
+ seq_putc(m, '\n');
+}
+
+/*
+ * Output information about zones in @pgdat.
+ */
+static int zoneinfo_show(struct seq_file *m, void *arg)
+{
+ pg_data_t *pgdat = (pg_data_t *)arg;
+ walk_zones_in_node(m, pgdat, zoneinfo_show_print);
+ return 0;
+}
+
+const struct seq_operations zoneinfo_op = {
+ .start = frag_start, /* iterate over all zones. The same as in
+ * fragmentation. */
+ .next = frag_next,
+ .stop = frag_stop,
+ .show = zoneinfo_show,
+};
+
+static void *vmstat_start(struct seq_file *m, loff_t *pos)
+{
+ unsigned long *v;
+#ifdef CONFIG_VM_EVENT_COUNTERS
+ unsigned long *e;
+#endif
+ int i;
+
+ if (*pos >= ARRAY_SIZE(vmstat_text))
+ return NULL;
+
+#ifdef CONFIG_VM_EVENT_COUNTERS
+ v = kmalloc(NR_VM_ZONE_STAT_ITEMS * sizeof(unsigned long)
+ + sizeof(struct vm_event_state), GFP_KERNEL);
+#else
+ v = kmalloc(NR_VM_ZONE_STAT_ITEMS * sizeof(unsigned long),
+ GFP_KERNEL);
+#endif
+ m->private = v;
+ if (!v)
+ return ERR_PTR(-ENOMEM);
+ for (i = 0; i < NR_VM_ZONE_STAT_ITEMS; i++)
+ v[i] = global_page_state(i);
+#ifdef CONFIG_VM_EVENT_COUNTERS
+ e = v + NR_VM_ZONE_STAT_ITEMS;
+ all_vm_events(e);
+ e[PGPGIN] /= 2; /* sectors -> kbytes */
+ e[PGPGOUT] /= 2;
+#endif
+ return v + *pos;
+}
+
+static void *vmstat_next(struct seq_file *m, void *arg, loff_t *pos)
+{
+ (*pos)++;
+ if (*pos >= ARRAY_SIZE(vmstat_text))
+ return NULL;
+ return (unsigned long *)m->private + *pos;
+}
+
+static int vmstat_show(struct seq_file *m, void *arg)
+{
+ unsigned long *l = arg;
+ unsigned long off = l - (unsigned long *)m->private;
+
+ seq_printf(m, "%s %lu\n", vmstat_text[off], *l);
+ return 0;
+}
+
+static void vmstat_stop(struct seq_file *m, void *arg)
+{
+ kfree(m->private);
+ m->private = NULL;
+}
+
+const struct seq_operations vmstat_op = {
+ .start = vmstat_start,
+ .next = vmstat_next,
+ .stop = vmstat_stop,
+ .show = vmstat_show,
+};
+
+#endif /* CONFIG_PROC_FS */
+
+#ifdef CONFIG_SMP
+static DEFINE_PER_CPU(struct delayed_work, vmstat_work);
+int sysctl_stat_interval __read_mostly = HZ;
+
+static void vmstat_update(struct work_struct *w)
+{
+ refresh_cpu_vm_stats(smp_processor_id());
+ schedule_delayed_work(&__get_cpu_var(vmstat_work),
+ sysctl_stat_interval);
+}
+
+static void __devinit start_cpu_timer(int cpu)
+{
+ struct delayed_work *vmstat_work = &per_cpu(vmstat_work, cpu);
+
+ INIT_DELAYED_WORK_DEFERRABLE(vmstat_work, vmstat_update);
+ schedule_delayed_work_on(cpu, vmstat_work, HZ + cpu);
+}
+
+/*
+ * Use the cpu notifier to ensure that the thresholds are recalculated
+ * when necessary.
+ */
+static int __cpuinit vmstat_cpuup_callback(struct notifier_block *nfb,
+ unsigned long action,
+ void *hcpu)
+{
+ long cpu = (long)hcpu;
+
+ switch (action) {
+ case CPU_ONLINE:
+ case CPU_ONLINE_FROZEN:
+ start_cpu_timer(cpu);
+ break;
+ case CPU_DOWN_PREPARE:
+ case CPU_DOWN_PREPARE_FROZEN:
+ cancel_rearming_delayed_work(&per_cpu(vmstat_work, cpu));
+ per_cpu(vmstat_work, cpu).work.func = NULL;
+ break;
+ case CPU_DOWN_FAILED:
+ case CPU_DOWN_FAILED_FROZEN:
+ start_cpu_timer(cpu);
+ break;
+ case CPU_DEAD:
+ case CPU_DEAD_FROZEN:
+ refresh_zone_stat_thresholds();
+ break;
+ default:
+ break;
+ }
+ return NOTIFY_OK;
+}
+
+static struct notifier_block __cpuinitdata vmstat_notifier =
+ { &vmstat_cpuup_callback, NULL, 0 };
+
+static int __init setup_vmstat(void)
+{
+ int cpu;
+
+ refresh_zone_stat_thresholds();
+ register_cpu_notifier(&vmstat_notifier);
+
+ for_each_online_cpu(cpu)
+ start_cpu_timer(cpu);
+ return 0;
+}
+module_init(setup_vmstat)
+#endif
Index: linux-2.6-lttng.stable/mm/vmstat.c
===================================================================
--- linux-2.6-lttng.stable.orig/mm/vmstat.c 2007-10-29 09:51:06.000000000 -0400
+++ /dev/null 1970-01-01 00:00:00.000000000 +0000
@@ -1,863 +0,0 @@
-/*
- * linux/mm/vmstat.c
- *
- * Manages VM statistics
- * Copyright (C) 1991, 1992, 1993, 1994 Linus Torvalds
- *
- * zoned VM statistics
- * Copyright (C) 2006 Silicon Graphics, Inc.,
- * Christoph Lameter <christoph@lameter.com>
- */
-
-#include <linux/mm.h>
-#include <linux/err.h>
-#include <linux/module.h>
-#include <linux/cpu.h>
-#include <linux/sched.h>
-
-#ifdef CONFIG_VM_EVENT_COUNTERS
-DEFINE_PER_CPU(struct vm_event_state, vm_event_states) = {{0}};
-EXPORT_PER_CPU_SYMBOL(vm_event_states);
-
-static void sum_vm_events(unsigned long *ret, cpumask_t *cpumask)
-{
- int cpu = 0;
- int i;
-
- memset(ret, 0, NR_VM_EVENT_ITEMS * sizeof(unsigned long));
-
- cpu = first_cpu(*cpumask);
- while (cpu < NR_CPUS) {
- struct vm_event_state *this = &per_cpu(vm_event_states, cpu);
-
- cpu = next_cpu(cpu, *cpumask);
-
- if (cpu < NR_CPUS)
- prefetch(&per_cpu(vm_event_states, cpu));
-
-
- for (i = 0; i < NR_VM_EVENT_ITEMS; i++)
- ret[i] += this->event[i];
- }
-}
-
-/*
- * Accumulate the vm event counters across all CPUs.
- * The result is unavoidably approximate - it can change
- * during and after execution of this function.
-*/
-void all_vm_events(unsigned long *ret)
-{
- sum_vm_events(ret, &cpu_online_map);
-}
-EXPORT_SYMBOL_GPL(all_vm_events);
-
-#ifdef CONFIG_HOTPLUG
-/*
- * Fold the foreign cpu events into our own.
- *
- * This is adding to the events on one processor
- * but keeps the global counts constant.
- */
-void vm_events_fold_cpu(int cpu)
-{
- struct vm_event_state *fold_state = &per_cpu(vm_event_states, cpu);
- int i;
-
- for (i = 0; i < NR_VM_EVENT_ITEMS; i++) {
- count_vm_events(i, fold_state->event[i]);
- fold_state->event[i] = 0;
- }
-}
-#endif /* CONFIG_HOTPLUG */
-
-#endif /* CONFIG_VM_EVENT_COUNTERS */
-
-/*
- * Manage combined zone based / global counters
- *
- * vm_stat contains the global counters
- */
-atomic_long_t vm_stat[NR_VM_ZONE_STAT_ITEMS];
-EXPORT_SYMBOL(vm_stat);
-
-#ifdef CONFIG_SMP
-
-static int calculate_threshold(struct zone *zone)
-{
- int threshold;
- int mem; /* memory in 128 MB units */
-
- /*
- * The threshold scales with the number of processors and the amount
- * of memory per zone. More memory means that we can defer updates for
- * longer, more processors could lead to more contention.
- * fls() is used to have a cheap way of logarithmic scaling.
- *
- * Some sample thresholds:
- *
- * Threshold Processors (fls) Zonesize fls(mem+1)
- * ------------------------------------------------------------------
- * 8 1 1 0.9-1 GB 4
- * 16 2 2 0.9-1 GB 4
- * 20 2 2 1-2 GB 5
- * 24 2 2 2-4 GB 6
- * 28 2 2 4-8 GB 7
- * 32 2 2 8-16 GB 8
- * 4 2 2 <128M 1
- * 30 4 3 2-4 GB 5
- * 48 4 3 8-16 GB 8
- * 32 8 4 1-2 GB 4
- * 32 8 4 0.9-1GB 4
- * 10 16 5 <128M 1
- * 40 16 5 900M 4
- * 70 64 7 2-4 GB 5
- * 84 64 7 4-8 GB 6
- * 108 512 9 4-8 GB 6
- * 125 1024 10 8-16 GB 8
- * 125 1024 10 16-32 GB 9
- */
-
- mem = zone->present_pages >> (27 - PAGE_SHIFT);
-
- threshold = 2 * fls(num_online_cpus()) * (1 + fls(mem));
-
- /*
- * Maximum threshold is 125
- */
- threshold = min(125, threshold);
-
- return threshold;
-}
-
-/*
- * Refresh the thresholds for each zone.
- */
-static void refresh_zone_stat_thresholds(void)
-{
- struct zone *zone;
- int cpu;
- int threshold;
-
- for_each_zone(zone) {
-
- if (!zone->present_pages)
- continue;
-
- threshold = calculate_threshold(zone);
-
- for_each_online_cpu(cpu)
- zone_pcp(zone, cpu)->stat_threshold = threshold;
- }
-}
-
-/*
- * For use when we know that interrupts are disabled.
- */
-void __mod_zone_page_state(struct zone *zone, enum zone_stat_item item,
- int delta)
-{
- struct per_cpu_pageset *pcp = zone_pcp(zone, smp_processor_id());
- s8 *p = pcp->vm_stat_diff + item;
- long x;
-
- x = delta + *p;
-
- if (unlikely(x > pcp->stat_threshold || x < -pcp->stat_threshold)) {
- zone_page_state_add(x, zone, item);
- x = 0;
- }
- *p = x;
-}
-EXPORT_SYMBOL(__mod_zone_page_state);
-
-/*
- * For an unknown interrupt state
- */
-void mod_zone_page_state(struct zone *zone, enum zone_stat_item item,
- int delta)
-{
- unsigned long flags;
-
- local_irq_save(flags);
- __mod_zone_page_state(zone, item, delta);
- local_irq_restore(flags);
-}
-EXPORT_SYMBOL(mod_zone_page_state);
-
-/*
- * Optimized increment and decrement functions.
- *
- * These are only for a single page and therefore can take a struct page *
- * argument instead of struct zone *. This allows the inclusion of the code
- * generated for page_zone(page) into the optimized functions.
- *
- * No overflow check is necessary and therefore the differential can be
- * incremented or decremented in place which may allow the compilers to
- * generate better code.
- * The increment or decrement is known and therefore one boundary check can
- * be omitted.
- *
- * NOTE: These functions are very performance sensitive. Change only
- * with care.
- *
- * Some processors have inc/dec instructions that are atomic vs an interrupt.
- * However, the code must first determine the differential location in a zone
- * based on the processor number and then inc/dec the counter. There is no
- * guarantee without disabling preemption that the processor will not change
- * in between and therefore the atomicity vs. interrupt cannot be exploited
- * in a useful way here.
- */
-void __inc_zone_state(struct zone *zone, enum zone_stat_item item)
-{
- struct per_cpu_pageset *pcp = zone_pcp(zone, smp_processor_id());
- s8 *p = pcp->vm_stat_diff + item;
-
- (*p)++;
-
- if (unlikely(*p > pcp->stat_threshold)) {
- int overstep = pcp->stat_threshold / 2;
-
- zone_page_state_add(*p + overstep, zone, item);
- *p = -overstep;
- }
-}
-
-void __inc_zone_page_state(struct page *page, enum zone_stat_item item)
-{
- __inc_zone_state(page_zone(page), item);
-}
-EXPORT_SYMBOL(__inc_zone_page_state);
-
-void __dec_zone_state(struct zone *zone, enum zone_stat_item item)
-{
- struct per_cpu_pageset *pcp = zone_pcp(zone, smp_processor_id());
- s8 *p = pcp->vm_stat_diff + item;
-
- (*p)--;
-
- if (unlikely(*p < - pcp->stat_threshold)) {
- int overstep = pcp->stat_threshold / 2;
-
- zone_page_state_add(*p - overstep, zone, item);
- *p = overstep;
- }
-}
-
-void __dec_zone_page_state(struct page *page, enum zone_stat_item item)
-{
- __dec_zone_state(page_zone(page), item);
-}
-EXPORT_SYMBOL(__dec_zone_page_state);
-
-void inc_zone_state(struct zone *zone, enum zone_stat_item item)
-{
- unsigned long flags;
-
- local_irq_save(flags);
- __inc_zone_state(zone, item);
- local_irq_restore(flags);
-}
-
-void inc_zone_page_state(struct page *page, enum zone_stat_item item)
-{
- unsigned long flags;
- struct zone *zone;
-
- zone = page_zone(page);
- local_irq_save(flags);
- __inc_zone_state(zone, item);
- local_irq_restore(flags);
-}
-EXPORT_SYMBOL(inc_zone_page_state);
-
-void dec_zone_page_state(struct page *page, enum zone_stat_item item)
-{
- unsigned long flags;
-
- local_irq_save(flags);
- __dec_zone_page_state(page, item);
- local_irq_restore(flags);
-}
-EXPORT_SYMBOL(dec_zone_page_state);
-
-/*
- * Update the zone counters for one cpu.
- *
- * Note that refresh_cpu_vm_stats strives to only access
- * node local memory. The per cpu pagesets on remote zones are placed
- * in the memory local to the processor using that pageset. So the
- * loop over all zones will access a series of cachelines local to
- * the processor.
- *
- * The call to zone_page_state_add updates the cachelines with the
- * statistics in the remote zone struct as well as the global cachelines
- * with the global counters. These could cause remote node cache line
- * bouncing and will have to be only done when necessary.
- */
-void refresh_cpu_vm_stats(int cpu)
-{
- struct zone *zone;
- int i;
- unsigned long flags;
-
- for_each_zone(zone) {
- struct per_cpu_pageset *p;
-
- if (!populated_zone(zone))
- continue;
-
- p = zone_pcp(zone, cpu);
-
- for (i = 0; i < NR_VM_ZONE_STAT_ITEMS; i++)
- if (p->vm_stat_diff[i]) {
- local_irq_save(flags);
- zone_page_state_add(p->vm_stat_diff[i],
- zone, i);
- p->vm_stat_diff[i] = 0;
-#ifdef CONFIG_NUMA
- /* 3 seconds idle till flush */
- p->expire = 3;
-#endif
- local_irq_restore(flags);
- }
-#ifdef CONFIG_NUMA
- /*
- * Deal with draining the remote pageset of this
- * processor
- *
- * Check if there are pages remaining in this pageset
- * if not then there is nothing to expire.
- */
- if (!p->expire || (!p->pcp[0].count && !p->pcp[1].count))
- continue;
-
- /*
- * We never drain zones local to this processor.
- */
- if (zone_to_nid(zone) == numa_node_id()) {
- p->expire = 0;
- continue;
- }
-
- p->expire--;
- if (p->expire)
- continue;
-
- if (p->pcp[0].count)
- drain_zone_pages(zone, p->pcp + 0);
-
- if (p->pcp[1].count)
- drain_zone_pages(zone, p->pcp + 1);
-#endif
- }
-}
-
-#endif
-
-#ifdef CONFIG_NUMA
-/*
- * zonelist = the list of zones passed to the allocator
- * z = the zone from which the allocation occurred.
- *
- * Must be called with interrupts disabled.
- */
-void zone_statistics(struct zonelist *zonelist, struct zone *z)
-{
- if (z->zone_pgdat == zonelist->zones[0]->zone_pgdat) {
- __inc_zone_state(z, NUMA_HIT);
- } else {
- __inc_zone_state(z, NUMA_MISS);
- __inc_zone_state(zonelist->zones[0], NUMA_FOREIGN);
- }
- if (z->node == numa_node_id())
- __inc_zone_state(z, NUMA_LOCAL);
- else
- __inc_zone_state(z, NUMA_OTHER);
-}
-#endif
-
-#ifdef CONFIG_PROC_FS
-
-#include <linux/seq_file.h>
-
-static char * const migratetype_names[MIGRATE_TYPES] = {
- "Unmovable",
- "Reclaimable",
- "Movable",
- "Reserve",
-};
-
-static void *frag_start(struct seq_file *m, loff_t *pos)
-{
- pg_data_t *pgdat;
- loff_t node = *pos;
- for (pgdat = first_online_pgdat();
- pgdat && node;
- pgdat = next_online_pgdat(pgdat))
- --node;
-
- return pgdat;
-}
-
-static void *frag_next(struct seq_file *m, void *arg, loff_t *pos)
-{
- pg_data_t *pgdat = (pg_data_t *)arg;
-
- (*pos)++;
- return next_online_pgdat(pgdat);
-}
-
-static void frag_stop(struct seq_file *m, void *arg)
-{
-}
-
-/* Walk all the zones in a node and print using a callback */
-static void walk_zones_in_node(struct seq_file *m, pg_data_t *pgdat,
- void (*print)(struct seq_file *m, pg_data_t *, struct zone *))
-{
- struct zone *zone;
- struct zone *node_zones = pgdat->node_zones;
- unsigned long flags;
-
- for (zone = node_zones; zone - node_zones < MAX_NR_ZONES; ++zone) {
- if (!populated_zone(zone))
- continue;
-
- spin_lock_irqsave(&zone->lock, flags);
- print(m, pgdat, zone);
- spin_unlock_irqrestore(&zone->lock, flags);
- }
-}
-
-static void frag_show_print(struct seq_file *m, pg_data_t *pgdat,
- struct zone *zone)
-{
- int order;
-
- seq_printf(m, "Node %d, zone %8s ", pgdat->node_id, zone->name);
- for (order = 0; order < MAX_ORDER; ++order)
- seq_printf(m, "%6lu ", zone->free_area[order].nr_free);
- seq_putc(m, '\n');
-}
-
-/*
- * This walks the free areas for each zone.
- */
-static int frag_show(struct seq_file *m, void *arg)
-{
- pg_data_t *pgdat = (pg_data_t *)arg;
- walk_zones_in_node(m, pgdat, frag_show_print);
- return 0;
-}
-
-static void pagetypeinfo_showfree_print(struct seq_file *m,
- pg_data_t *pgdat, struct zone *zone)
-{
- int order, mtype;
-
- for (mtype = 0; mtype < MIGRATE_TYPES; mtype++) {
- seq_printf(m, "Node %4d, zone %8s, type %12s ",
- pgdat->node_id,
- zone->name,
- migratetype_names[mtype]);
- for (order = 0; order < MAX_ORDER; ++order) {
- unsigned long freecount = 0;
- struct free_area *area;
- struct list_head *curr;
-
- area = &(zone->free_area[order]);
-
- list_for_each(curr, &area->free_list[mtype])
- freecount++;
- seq_printf(m, "%6lu ", freecount);
- }
- seq_putc(m, '\n');
- }
-}
-
-/* Print out the free pages at each order for each migratetype */
-static int pagetypeinfo_showfree(struct seq_file *m, void *arg)
-{
- int order;
- pg_data_t *pgdat = (pg_data_t *)arg;
-
- /* Print header */
- seq_printf(m, "%-43s ", "Free pages count per migrate type at order");
- for (order = 0; order < MAX_ORDER; ++order)
- seq_printf(m, "%6d ", order);
- seq_putc(m, '\n');
-
- walk_zones_in_node(m, pgdat, pagetypeinfo_showfree_print);
-
- return 0;
-}
-
-static void pagetypeinfo_showblockcount_print(struct seq_file *m,
- pg_data_t *pgdat, struct zone *zone)
-{
- int mtype;
- unsigned long pfn;
- unsigned long start_pfn = zone->zone_start_pfn;
- unsigned long end_pfn = start_pfn + zone->spanned_pages;
- unsigned long count[MIGRATE_TYPES] = { 0, };
-
- for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) {
- struct page *page;
-
- if (!pfn_valid(pfn))
- continue;
-
- page = pfn_to_page(pfn);
- mtype = get_pageblock_migratetype(page);
-
- count[mtype]++;
- }
-
- /* Print counts */
- seq_printf(m, "Node %d, zone %8s ", pgdat->node_id, zone->name);
- for (mtype = 0; mtype < MIGRATE_TYPES; mtype++)
- seq_printf(m, "%12lu ", count[mtype]);
- seq_putc(m, '\n');
-}
-
-/* Print out the number of pageblocks for each migratetype */
-static int pagetypeinfo_showblockcount(struct seq_file *m, void *arg)
-{
- int mtype;
- pg_data_t *pgdat = (pg_data_t *)arg;
-
- seq_printf(m, "\n%-23s", "Number of blocks type ");
- for (mtype = 0; mtype < MIGRATE_TYPES; mtype++)
- seq_printf(m, "%12s ", migratetype_names[mtype]);
- seq_putc(m, '\n');
- walk_zones_in_node(m, pgdat, pagetypeinfo_showblockcount_print);
-
- return 0;
-}
-
-/*
- * This prints out statistics in relation to grouping pages by mobility.
- * It is expensive to collect so do not constantly read the file.
- */
-static int pagetypeinfo_show(struct seq_file *m, void *arg)
-{
- pg_data_t *pgdat = (pg_data_t *)arg;
-
- seq_printf(m, "Page block order: %d\n", pageblock_order);
- seq_printf(m, "Pages per block: %lu\n", pageblock_nr_pages);
- seq_putc(m, '\n');
- pagetypeinfo_showfree(m, pgdat);
- pagetypeinfo_showblockcount(m, pgdat);
-
- return 0;
-}
-
-const struct seq_operations fragmentation_op = {
- .start = frag_start,
- .next = frag_next,
- .stop = frag_stop,
- .show = frag_show,
-};
-
-const struct seq_operations pagetypeinfo_op = {
- .start = frag_start,
- .next = frag_next,
- .stop = frag_stop,
- .show = pagetypeinfo_show,
-};
-
-#ifdef CONFIG_ZONE_DMA
-#define TEXT_FOR_DMA(xx) xx "_dma",
-#else
-#define TEXT_FOR_DMA(xx)
-#endif
-
-#ifdef CONFIG_ZONE_DMA32
-#define TEXT_FOR_DMA32(xx) xx "_dma32",
-#else
-#define TEXT_FOR_DMA32(xx)
-#endif
-
-#ifdef CONFIG_HIGHMEM
-#define TEXT_FOR_HIGHMEM(xx) xx "_high",
-#else
-#define TEXT_FOR_HIGHMEM(xx)
-#endif
-
-#define TEXTS_FOR_ZONES(xx) TEXT_FOR_DMA(xx) TEXT_FOR_DMA32(xx) xx "_normal", \
- TEXT_FOR_HIGHMEM(xx) xx "_movable",
-
-static const char * const vmstat_text[] = {
- /* Zoned VM counters */
- "nr_free_pages",
- "nr_inactive",
- "nr_active",
- "nr_anon_pages",
- "nr_mapped",
- "nr_file_pages",
- "nr_dirty",
- "nr_writeback",
- "nr_slab_reclaimable",
- "nr_slab_unreclaimable",
- "nr_page_table_pages",
- "nr_unstable",
- "nr_bounce",
- "nr_vmscan_write",
-
-#ifdef CONFIG_NUMA
- "numa_hit",
- "numa_miss",
- "numa_foreign",
- "numa_interleave",
- "numa_local",
- "numa_other",
-#endif
-
-#ifdef CONFIG_VM_EVENT_COUNTERS
- "pgpgin",
- "pgpgout",
- "pswpin",
- "pswpout",
-
- TEXTS_FOR_ZONES("pgalloc")
-
- "pgfree",
- "pgactivate",
- "pgdeactivate",
-
- "pgfault",
- "pgmajfault",
-
- TEXTS_FOR_ZONES("pgrefill")
- TEXTS_FOR_ZONES("pgsteal")
- TEXTS_FOR_ZONES("pgscan_kswapd")
- TEXTS_FOR_ZONES("pgscan_direct")
-
- "pginodesteal",
- "slabs_scanned",
- "kswapd_steal",
- "kswapd_inodesteal",
- "pageoutrun",
- "allocstall",
-
- "pgrotated",
-#endif
-};
-
-static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat,
- struct zone *zone)
-{
- int i;
- seq_printf(m, "Node %d, zone %8s", pgdat->node_id, zone->name);
- seq_printf(m,
- "\n pages free %lu"
- "\n min %lu"
- "\n low %lu"
- "\n high %lu"
- "\n scanned %lu (a: %lu i: %lu)"
- "\n spanned %lu"
- "\n present %lu",
- zone_page_state(zone, NR_FREE_PAGES),
- zone->pages_min,
- zone->pages_low,
- zone->pages_high,
- zone->pages_scanned,
- zone->nr_scan_active, zone->nr_scan_inactive,
- zone->spanned_pages,
- zone->present_pages);
-
- for (i = 0; i < NR_VM_ZONE_STAT_ITEMS; i++)
- seq_printf(m, "\n %-12s %lu", vmstat_text[i],
- zone_page_state(zone, i));
-
- seq_printf(m,
- "\n protection: (%lu",
- zone->lowmem_reserve[0]);
- for (i = 1; i < ARRAY_SIZE(zone->lowmem_reserve); i++)
- seq_printf(m, ", %lu", zone->lowmem_reserve[i]);
- seq_printf(m,
- ")"
- "\n pagesets");
- for_each_online_cpu(i) {
- struct per_cpu_pageset *pageset;
- int j;
-
- pageset = zone_pcp(zone, i);
- for (j = 0; j < ARRAY_SIZE(pageset->pcp); j++) {
- seq_printf(m,
- "\n cpu: %i pcp: %i"
- "\n count: %i"
- "\n high: %i"
- "\n batch: %i",
- i, j,
- pageset->pcp[j].count,
- pageset->pcp[j].high,
- pageset->pcp[j].batch);
- }
-#ifdef CONFIG_SMP
- seq_printf(m, "\n vm stats threshold: %d",
- pageset->stat_threshold);
-#endif
- }
- seq_printf(m,
- "\n all_unreclaimable: %u"
- "\n prev_priority: %i"
- "\n start_pfn: %lu",
- zone_is_all_unreclaimable(zone),
- zone->prev_priority,
- zone->zone_start_pfn);
- seq_putc(m, '\n');
-}
-
-/*
- * Output information about zones in @pgdat.
- */
-static int zoneinfo_show(struct seq_file *m, void *arg)
-{
- pg_data_t *pgdat = (pg_data_t *)arg;
- walk_zones_in_node(m, pgdat, zoneinfo_show_print);
- return 0;
-}
-
-const struct seq_operations zoneinfo_op = {
- .start = frag_start, /* iterate over all zones. The same as in
- * fragmentation. */
- .next = frag_next,
- .stop = frag_stop,
- .show = zoneinfo_show,
-};
-
-static void *vmstat_start(struct seq_file *m, loff_t *pos)
-{
- unsigned long *v;
-#ifdef CONFIG_VM_EVENT_COUNTERS
- unsigned long *e;
-#endif
- int i;
-
- if (*pos >= ARRAY_SIZE(vmstat_text))
- return NULL;
-
-#ifdef CONFIG_VM_EVENT_COUNTERS
- v = kmalloc(NR_VM_ZONE_STAT_ITEMS * sizeof(unsigned long)
- + sizeof(struct vm_event_state), GFP_KERNEL);
-#else
- v = kmalloc(NR_VM_ZONE_STAT_ITEMS * sizeof(unsigned long),
- GFP_KERNEL);
-#endif
- m->private = v;
- if (!v)
- return ERR_PTR(-ENOMEM);
- for (i = 0; i < NR_VM_ZONE_STAT_ITEMS; i++)
- v[i] = global_page_state(i);
-#ifdef CONFIG_VM_EVENT_COUNTERS
- e = v + NR_VM_ZONE_STAT_ITEMS;
- all_vm_events(e);
- e[PGPGIN] /= 2; /* sectors -> kbytes */
- e[PGPGOUT] /= 2;
-#endif
- return v + *pos;
-}
-
-static void *vmstat_next(struct seq_file *m, void *arg, loff_t *pos)
-{
- (*pos)++;
- if (*pos >= ARRAY_SIZE(vmstat_text))
- return NULL;
- return (unsigned long *)m->private + *pos;
-}
-
-static int vmstat_show(struct seq_file *m, void *arg)
-{
- unsigned long *l = arg;
- unsigned long off = l - (unsigned long *)m->private;
-
- seq_printf(m, "%s %lu\n", vmstat_text[off], *l);
- return 0;
-}
-
-static void vmstat_stop(struct seq_file *m, void *arg)
-{
- kfree(m->private);
- m->private = NULL;
-}
-
-const struct seq_operations vmstat_op = {
- .start = vmstat_start,
- .next = vmstat_next,
- .stop = vmstat_stop,
- .show = vmstat_show,
-};
-
-#endif /* CONFIG_PROC_FS */
-
-#ifdef CONFIG_SMP
-static DEFINE_PER_CPU(struct delayed_work, vmstat_work);
-int sysctl_stat_interval __read_mostly = HZ;
-
-static void vmstat_update(struct work_struct *w)
-{
- refresh_cpu_vm_stats(smp_processor_id());
- schedule_delayed_work(&__get_cpu_var(vmstat_work),
- sysctl_stat_interval);
-}
-
-static void __devinit start_cpu_timer(int cpu)
-{
- struct delayed_work *vmstat_work = &per_cpu(vmstat_work, cpu);
-
- INIT_DELAYED_WORK_DEFERRABLE(vmstat_work, vmstat_update);
- schedule_delayed_work_on(cpu, vmstat_work, HZ + cpu);
-}
-
-/*
- * Use the cpu notifier to ensure that the thresholds are recalculated
- * when necessary.
- */
-static int __cpuinit vmstat_cpuup_callback(struct notifier_block *nfb,
- unsigned long action,
- void *hcpu)
-{
- long cpu = (long)hcpu;
-
- switch (action) {
- case CPU_ONLINE:
- case CPU_ONLINE_FROZEN:
- start_cpu_timer(cpu);
- break;
- case CPU_DOWN_PREPARE:
- case CPU_DOWN_PREPARE_FROZEN:
- cancel_rearming_delayed_work(&per_cpu(vmstat_work, cpu));
- per_cpu(vmstat_work, cpu).work.func = NULL;
- break;
- case CPU_DOWN_FAILED:
- case CPU_DOWN_FAILED_FROZEN:
- start_cpu_timer(cpu);
- break;
- case CPU_DEAD:
- case CPU_DEAD_FROZEN:
- refresh_zone_stat_thresholds();
- break;
- default:
- break;
- }
- return NOTIFY_OK;
-}
-
-static struct notifier_block __cpuinitdata vmstat_notifier =
- { &vmstat_cpuup_callback, NULL, 0 };
-
-static int __init setup_vmstat(void)
-{
- int cpu;
-
- refresh_zone_stat_thresholds();
- register_cpu_notifier(&vmstat_notifier);
-
- for_each_online_cpu(cpu)
- start_cpu_timer(cpu);
- return 0;
-}
-module_init(setup_vmstat)
-#endif
--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
^ permalink raw reply [flat|nested] 13+ messages in thread
* [patch 08/10] Move blktrace Kconfig entry to instrumentation menu
2007-10-29 14:26 [patch 00/10] [PATCH] Create instrumentation/ kernel directory Mathieu Desnoyers
` (5 preceding siblings ...)
2007-10-29 14:26 ` [patch 07/10] Move vmstat to instrumentation/ Mathieu Desnoyers
@ 2007-10-29 14:26 ` Mathieu Desnoyers
2007-10-29 14:26 ` [patch 09/10] Move blktrace to instrumentation directory Mathieu Desnoyers
` (2 subsequent siblings)
9 siblings, 0 replies; 13+ messages in thread
From: Mathieu Desnoyers @ 2007-10-29 14:26 UTC (permalink / raw)
To: linux-kernel; +Cc: Mathieu Desnoyers, Jens Axboe
[-- Attachment #1: move-blktrace-to-instrumentation-menu.patch --]
[-- Type: text/plain, Size: 2375 bytes --]
This block layer instrumentation should appear in the instrumentation menu, not
in the block layer menu.
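(Side note: instrumentation/Kconfig wraps its entries in an "if
INSTRUMENTATION" block, as the trailing "endif # INSTRUMENTATION" in the
hunk below shows, so after this move BLK_DEV_IO_TRACE is only visible
when Instrumentation Support is enabled. An illustrative way to confirm
where the symbol ends up once the patch is applied:
	grep -n "config BLK_DEV_IO_TRACE" block/Kconfig instrumentation/Kconfig
Only the instrumentation/Kconfig hit should remain.)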
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Jens Axboe <jens.axboe@oracle.com>
---
block/Kconfig | 13 -------------
instrumentation/Kconfig | 13 +++++++++++++
2 files changed, 13 insertions(+), 13 deletions(-)
Index: linux-2.6-lttng.stable/block/Kconfig
===================================================================
--- linux-2.6-lttng.stable.orig/block/Kconfig 2007-10-29 09:12:10.000000000 -0400
+++ linux-2.6-lttng.stable/block/Kconfig 2007-10-29 09:12:29.000000000 -0400
@@ -27,19 +27,6 @@ config LBD
your machine, or if you want to have a raid or loopback device
bigger than 2TB. Otherwise say N.
-config BLK_DEV_IO_TRACE
- bool "Support for tracing block io actions"
- depends on SYSFS
- select RELAY
- select DEBUG_FS
- help
- Say Y here, if you want to be able to trace the block layer actions
- on a given queue. Tracing allows you to see any traffic happening
- on a block device queue. For more information (and the user space
- support tools needed), fetch the blktrace app from:
-
- git://brick.kernel.dk/data/git/blktrace.git
-
config LSF
bool "Support for Large Single Files"
depends on !64BIT
Index: linux-2.6-lttng.stable/instrumentation/Kconfig
===================================================================
--- linux-2.6-lttng.stable.orig/instrumentation/Kconfig 2007-10-29 09:13:07.000000000 -0400
+++ linux-2.6-lttng.stable/instrumentation/Kconfig 2007-10-29 09:13:14.000000000 -0400
@@ -46,4 +46,17 @@ config MARKERS
Place an empty function call at each marker site. Can be
dynamically changed for a probe function.
+config BLK_DEV_IO_TRACE
+ bool "Support for tracing block io actions"
+ depends on SYSFS
+ select RELAY
+ select DEBUG_FS
+ help
+ Say Y here, if you want to be able to trace the block layer actions
+ on a given queue. Tracing allows you to see any traffic happening
+ on a block device queue. For more information (and the user space
+ support tools needed), fetch the blktrace app from:
+
+ git://brick.kernel.dk/data/git/blktrace.git
+
endif # INSTRUMENTATION
--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
^ permalink raw reply [flat|nested] 13+ messages in thread
* [patch 09/10] Move blktrace to instrumentation directory
2007-10-29 14:26 [patch 00/10] [PATCH] Create instrumentation/ kernel directory Mathieu Desnoyers
` (6 preceding siblings ...)
2007-10-29 14:26 ` [patch 08/10] Move blktrace Kconfig entry to instrumentation menu Mathieu Desnoyers
@ 2007-10-29 14:26 ` Mathieu Desnoyers
2007-10-29 14:26 ` [patch 10/10] Move samples to instrumentation/samples Mathieu Desnoyers
2007-10-29 15:58 ` [patch 01/10] Move oprofile to arch/*/instrumentation Mathieu Desnoyers
9 siblings, 0 replies; 13+ messages in thread
From: Mathieu Desnoyers @ 2007-10-29 14:26 UTC (permalink / raw)
To: linux-kernel; +Cc: Mathieu Desnoyers, Jens Axboe
[-- Attachment #1: move-blktrace-to-instrumentation.patch --]
[-- Type: text/plain, Size: 29434 bytes --]
This block IO tracing belongs to the instrumentation category.
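As a reminder of how this facility is driven (a minimal sketch, assuming
the blktrace/blkparse userspace tools from the git tree mentioned in the
Kconfig help are installed, debugfs is mounted, and /dev/sda is just an
example device):
	# mount -t debugfs debugfs /sys/kernel/debug
	# blktrace -d /dev/sda -o - | blkparse -i -
The move itself is mechanical: the file and its
obj-$(CONFIG_BLK_DEV_IO_TRACE) Makefile rule change directories, with no
functional change.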
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Jens Axboe <jens.axboe@oracle.com>
---
block/Makefile | 1 -
block/blktrace.c | 576 ---------------------------------------------
instrumentation/Makefile | 1 +
instrumentation/blktrace.c | 576 +++++++++++++++++++++++++++++++++++++++++++++
4 files changed, 577 insertions(+), 577 deletions(-)
Index: linux-2.6-lttng.stable/block/Makefile
===================================================================
--- linux-2.6-lttng.stable.orig/block/Makefile 2007-10-29 09:51:05.000000000 -0400
+++ linux-2.6-lttng.stable/block/Makefile 2007-10-29 09:53:17.000000000 -0400
@@ -10,5 +10,4 @@ obj-$(CONFIG_IOSCHED_AS) += as-iosched.o
obj-$(CONFIG_IOSCHED_DEADLINE) += deadline-iosched.o
obj-$(CONFIG_IOSCHED_CFQ) += cfq-iosched.o
-obj-$(CONFIG_BLK_DEV_IO_TRACE) += blktrace.o
obj-$(CONFIG_BLOCK_COMPAT) += compat_ioctl.o
Index: linux-2.6-lttng.stable/block/blktrace.c
===================================================================
--- linux-2.6-lttng.stable.orig/block/blktrace.c 2007-10-29 09:51:05.000000000 -0400
+++ /dev/null 1970-01-01 00:00:00.000000000 +0000
@@ -1,576 +0,0 @@
-/*
- * Copyright (C) 2006 Jens Axboe <axboe@kernel.dk>
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 as
- * published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, write to the Free Software
- * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
- *
- */
-#include <linux/kernel.h>
-#include <linux/blkdev.h>
-#include <linux/blktrace_api.h>
-#include <linux/percpu.h>
-#include <linux/init.h>
-#include <linux/mutex.h>
-#include <linux/debugfs.h>
-#include <linux/time.h>
-#include <asm/uaccess.h>
-
-static DEFINE_PER_CPU(unsigned long long, blk_trace_cpu_offset) = { 0, };
-static unsigned int blktrace_seq __read_mostly = 1;
-
-/*
- * Send out a notify message.
- */
-static void trace_note(struct blk_trace *bt, pid_t pid, int action,
- const void *data, size_t len)
-{
- struct blk_io_trace *t;
-
- t = relay_reserve(bt->rchan, sizeof(*t) + len);
- if (t) {
- const int cpu = smp_processor_id();
-
- t->magic = BLK_IO_TRACE_MAGIC | BLK_IO_TRACE_VERSION;
- t->time = cpu_clock(cpu) - per_cpu(blk_trace_cpu_offset, cpu);
- t->device = bt->dev;
- t->action = action;
- t->pid = pid;
- t->cpu = cpu;
- t->pdu_len = len;
- memcpy((void *) t + sizeof(*t), data, len);
- }
-}
-
-/*
- * Send out a notify for this process, if we haven't done so since a trace
- * started
- */
-static void trace_note_tsk(struct blk_trace *bt, struct task_struct *tsk)
-{
- tsk->btrace_seq = blktrace_seq;
- trace_note(bt, tsk->pid, BLK_TN_PROCESS, tsk->comm, sizeof(tsk->comm));
-}
-
-static void trace_note_time(struct blk_trace *bt)
-{
- struct timespec now;
- unsigned long flags;
- u32 words[2];
-
- getnstimeofday(&now);
- words[0] = now.tv_sec;
- words[1] = now.tv_nsec;
-
- local_irq_save(flags);
- trace_note(bt, 0, BLK_TN_TIMESTAMP, words, sizeof(words));
- local_irq_restore(flags);
-}
-
-static int act_log_check(struct blk_trace *bt, u32 what, sector_t sector,
- pid_t pid)
-{
- if (((bt->act_mask << BLK_TC_SHIFT) & what) == 0)
- return 1;
- if (sector < bt->start_lba || sector > bt->end_lba)
- return 1;
- if (bt->pid && pid != bt->pid)
- return 1;
-
- return 0;
-}
-
-/*
- * Data direction bit lookup
- */
-static u32 ddir_act[2] __read_mostly = { BLK_TC_ACT(BLK_TC_READ), BLK_TC_ACT(BLK_TC_WRITE) };
-
-/*
- * Bio action bits of interest
- */
-static u32 bio_act[9] __read_mostly = { 0, BLK_TC_ACT(BLK_TC_BARRIER), BLK_TC_ACT(BLK_TC_SYNC), 0, BLK_TC_ACT(BLK_TC_AHEAD), 0, 0, 0, BLK_TC_ACT(BLK_TC_META) };
-
-/*
- * More could be added as needed, taking care to increment the decrementer
- * to get correct indexing
- */
-#define trace_barrier_bit(rw) \
- (((rw) & (1 << BIO_RW_BARRIER)) >> (BIO_RW_BARRIER - 0))
-#define trace_sync_bit(rw) \
- (((rw) & (1 << BIO_RW_SYNC)) >> (BIO_RW_SYNC - 1))
-#define trace_ahead_bit(rw) \
- (((rw) & (1 << BIO_RW_AHEAD)) << (2 - BIO_RW_AHEAD))
-#define trace_meta_bit(rw) \
- (((rw) & (1 << BIO_RW_META)) >> (BIO_RW_META - 3))
-
-/*
- * The worker for the various blk_add_trace*() types. Fills out a
- * blk_io_trace structure and places it in a per-cpu subbuffer.
- */
-void __blk_add_trace(struct blk_trace *bt, sector_t sector, int bytes,
- int rw, u32 what, int error, int pdu_len, void *pdu_data)
-{
- struct task_struct *tsk = current;
- struct blk_io_trace *t;
- unsigned long flags;
- unsigned long *sequence;
- pid_t pid;
- int cpu;
-
- if (unlikely(bt->trace_state != Blktrace_running))
- return;
-
- what |= ddir_act[rw & WRITE];
- what |= bio_act[trace_barrier_bit(rw)];
- what |= bio_act[trace_sync_bit(rw)];
- what |= bio_act[trace_ahead_bit(rw)];
- what |= bio_act[trace_meta_bit(rw)];
-
- pid = tsk->pid;
- if (unlikely(act_log_check(bt, what, sector, pid)))
- return;
-
- /*
- * A word about the locking here - we disable interrupts to reserve
- * some space in the relay per-cpu buffer, to prevent an irq
- * from coming in and stepping on our toes. Once reserved, it's
- * enough to get preemption disabled to prevent read of this data
- * before we are through filling it. get_cpu()/put_cpu() does this
- * for us
- */
- local_irq_save(flags);
-
- if (unlikely(tsk->btrace_seq != blktrace_seq))
- trace_note_tsk(bt, tsk);
-
- t = relay_reserve(bt->rchan, sizeof(*t) + pdu_len);
- if (t) {
- cpu = smp_processor_id();
- sequence = per_cpu_ptr(bt->sequence, cpu);
-
- t->magic = BLK_IO_TRACE_MAGIC | BLK_IO_TRACE_VERSION;
- t->sequence = ++(*sequence);
- t->time = cpu_clock(cpu) - per_cpu(blk_trace_cpu_offset, cpu);
- t->sector = sector;
- t->bytes = bytes;
- t->action = what;
- t->pid = pid;
- t->device = bt->dev;
- t->cpu = cpu;
- t->error = error;
- t->pdu_len = pdu_len;
-
- if (pdu_len)
- memcpy((void *) t + sizeof(*t), pdu_data, pdu_len);
- }
-
- local_irq_restore(flags);
-}
-
-EXPORT_SYMBOL_GPL(__blk_add_trace);
-
-static struct dentry *blk_tree_root;
-static struct mutex blk_tree_mutex;
-static unsigned int root_users;
-
-static inline void blk_remove_root(void)
-{
- if (blk_tree_root) {
- debugfs_remove(blk_tree_root);
- blk_tree_root = NULL;
- }
-}
-
-static void blk_remove_tree(struct dentry *dir)
-{
- mutex_lock(&blk_tree_mutex);
- debugfs_remove(dir);
- if (--root_users == 0)
- blk_remove_root();
- mutex_unlock(&blk_tree_mutex);
-}
-
-static struct dentry *blk_create_tree(const char *blk_name)
-{
- struct dentry *dir = NULL;
-
- mutex_lock(&blk_tree_mutex);
-
- if (!blk_tree_root) {
- blk_tree_root = debugfs_create_dir("block", NULL);
- if (!blk_tree_root)
- goto err;
- }
-
- dir = debugfs_create_dir(blk_name, blk_tree_root);
- if (dir)
- root_users++;
- else
- blk_remove_root();
-
-err:
- mutex_unlock(&blk_tree_mutex);
- return dir;
-}
-
-static void blk_trace_cleanup(struct blk_trace *bt)
-{
- relay_close(bt->rchan);
- debugfs_remove(bt->dropped_file);
- blk_remove_tree(bt->dir);
- free_percpu(bt->sequence);
- kfree(bt);
-}
-
-static int blk_trace_remove(struct request_queue *q)
-{
- struct blk_trace *bt;
-
- bt = xchg(&q->blk_trace, NULL);
- if (!bt)
- return -EINVAL;
-
- if (bt->trace_state == Blktrace_setup ||
- bt->trace_state == Blktrace_stopped)
- blk_trace_cleanup(bt);
-
- return 0;
-}
-
-static int blk_dropped_open(struct inode *inode, struct file *filp)
-{
- filp->private_data = inode->i_private;
-
- return 0;
-}
-
-static ssize_t blk_dropped_read(struct file *filp, char __user *buffer,
- size_t count, loff_t *ppos)
-{
- struct blk_trace *bt = filp->private_data;
- char buf[16];
-
- snprintf(buf, sizeof(buf), "%u\n", atomic_read(&bt->dropped));
-
- return simple_read_from_buffer(buffer, count, ppos, buf, strlen(buf));
-}
-
-static const struct file_operations blk_dropped_fops = {
- .owner = THIS_MODULE,
- .open = blk_dropped_open,
- .read = blk_dropped_read,
-};
-
-/*
- * Keep track of how many times we encountered a full subbuffer, to aid
- * the user space app in telling how many lost events there were.
- */
-static int blk_subbuf_start_callback(struct rchan_buf *buf, void *subbuf,
- void *prev_subbuf, size_t prev_padding)
-{
- struct blk_trace *bt;
-
- if (!relay_buf_full(buf))
- return 1;
-
- bt = buf->chan->private_data;
- atomic_inc(&bt->dropped);
- return 0;
-}
-
-static int blk_remove_buf_file_callback(struct dentry *dentry)
-{
- debugfs_remove(dentry);
- return 0;
-}
-
-static struct dentry *blk_create_buf_file_callback(const char *filename,
- struct dentry *parent,
- int mode,
- struct rchan_buf *buf,
- int *is_global)
-{
- return debugfs_create_file(filename, mode, parent, buf,
- &relay_file_operations);
-}
-
-static struct rchan_callbacks blk_relay_callbacks = {
- .subbuf_start = blk_subbuf_start_callback,
- .create_buf_file = blk_create_buf_file_callback,
- .remove_buf_file = blk_remove_buf_file_callback,
-};
-
-/*
- * Setup everything required to start tracing
- */
-int do_blk_trace_setup(struct request_queue *q, struct block_device *bdev,
- struct blk_user_trace_setup *buts)
-{
- struct blk_trace *old_bt, *bt = NULL;
- struct dentry *dir = NULL;
- char b[BDEVNAME_SIZE];
- int ret, i;
-
- if (!buts->buf_size || !buts->buf_nr)
- return -EINVAL;
-
- strcpy(buts->name, bdevname(bdev, b));
-
- /*
- * some device names have larger paths - convert the slashes
- * to underscores for this to work as expected
- */
- for (i = 0; i < strlen(buts->name); i++)
- if (buts->name[i] == '/')
- buts->name[i] = '_';
-
- ret = -ENOMEM;
- bt = kzalloc(sizeof(*bt), GFP_KERNEL);
- if (!bt)
- goto err;
-
- bt->sequence = alloc_percpu(unsigned long);
- if (!bt->sequence)
- goto err;
-
- ret = -ENOENT;
- dir = blk_create_tree(buts->name);
- if (!dir)
- goto err;
-
- bt->dir = dir;
- bt->dev = bdev->bd_dev;
- atomic_set(&bt->dropped, 0);
-
- ret = -EIO;
- bt->dropped_file = debugfs_create_file("dropped", 0444, dir, bt, &blk_dropped_fops);
- if (!bt->dropped_file)
- goto err;
-
- bt->rchan = relay_open("trace", dir, buts->buf_size,
- buts->buf_nr, &blk_relay_callbacks, bt);
- if (!bt->rchan)
- goto err;
-
- bt->act_mask = buts->act_mask;
- if (!bt->act_mask)
- bt->act_mask = (u16) -1;
-
- bt->start_lba = buts->start_lba;
- bt->end_lba = buts->end_lba;
- if (!bt->end_lba)
- bt->end_lba = -1ULL;
-
- bt->pid = buts->pid;
- bt->trace_state = Blktrace_setup;
-
- ret = -EBUSY;
- old_bt = xchg(&q->blk_trace, bt);
- if (old_bt) {
- (void) xchg(&q->blk_trace, old_bt);
- goto err;
- }
-
- return 0;
-err:
- if (dir)
- blk_remove_tree(dir);
- if (bt) {
- if (bt->dropped_file)
- debugfs_remove(bt->dropped_file);
- free_percpu(bt->sequence);
- if (bt->rchan)
- relay_close(bt->rchan);
- kfree(bt);
- }
- return ret;
-}
-
-static int blk_trace_setup(struct request_queue *q, struct block_device *bdev,
- char __user *arg)
-{
- struct blk_user_trace_setup buts;
- int ret;
-
- ret = copy_from_user(&buts, arg, sizeof(buts));
- if (ret)
- return -EFAULT;
-
- ret = do_blk_trace_setup(q, bdev, &buts);
- if (ret)
- return ret;
-
- if (copy_to_user(arg, &buts, sizeof(buts)))
- return -EFAULT;
-
- return 0;
-}
-
-static int blk_trace_startstop(struct request_queue *q, int start)
-{
- struct blk_trace *bt;
- int ret;
-
- if ((bt = q->blk_trace) == NULL)
- return -EINVAL;
-
- /*
- * For starting a trace, we can transition from a setup or stopped
- * trace. For stopping a trace, the state must be running
- */
- ret = -EINVAL;
- if (start) {
- if (bt->trace_state == Blktrace_setup ||
- bt->trace_state == Blktrace_stopped) {
- blktrace_seq++;
- smp_mb();
- bt->trace_state = Blktrace_running;
-
- trace_note_time(bt);
- ret = 0;
- }
- } else {
- if (bt->trace_state == Blktrace_running) {
- bt->trace_state = Blktrace_stopped;
- relay_flush(bt->rchan);
- ret = 0;
- }
- }
-
- return ret;
-}
-
-/**
- * blk_trace_ioctl: - handle the ioctls associated with tracing
- * @bdev: the block device
- * @cmd: the ioctl cmd
- * @arg: the argument data, if any
- *
- **/
-int blk_trace_ioctl(struct block_device *bdev, unsigned cmd, char __user *arg)
-{
- struct request_queue *q;
- int ret, start = 0;
-
- q = bdev_get_queue(bdev);
- if (!q)
- return -ENXIO;
-
- mutex_lock(&bdev->bd_mutex);
-
- switch (cmd) {
- case BLKTRACESETUP:
- ret = blk_trace_setup(q, bdev, arg);
- break;
- case BLKTRACESTART:
- start = 1;
- case BLKTRACESTOP:
- ret = blk_trace_startstop(q, start);
- break;
- case BLKTRACETEARDOWN:
- ret = blk_trace_remove(q);
- break;
- default:
- ret = -ENOTTY;
- break;
- }
-
- mutex_unlock(&bdev->bd_mutex);
- return ret;
-}
-
-/**
- * blk_trace_shutdown: - stop and cleanup trace structures
- * @q: the request queue associated with the device
- *
- **/
-void blk_trace_shutdown(struct request_queue *q)
-{
- if (q->blk_trace) {
- blk_trace_startstop(q, 0);
- blk_trace_remove(q);
- }
-}
-
-/*
- * Average offset over two calls to cpu_clock() with a gettimeofday()
- * in the middle
- */
-static void blk_check_time(unsigned long long *t, int this_cpu)
-{
- unsigned long long a, b;
- struct timeval tv;
-
- a = cpu_clock(this_cpu);
- do_gettimeofday(&tv);
- b = cpu_clock(this_cpu);
-
- *t = tv.tv_sec * 1000000000 + tv.tv_usec * 1000;
- *t -= (a + b) / 2;
-}
-
-/*
- * calibrate our inter-CPU timings
- */
-static void blk_trace_check_cpu_time(void *data)
-{
- unsigned long long *t;
- int this_cpu = get_cpu();
-
- t = &per_cpu(blk_trace_cpu_offset, this_cpu);
-
- /*
- * Just call it twice, hopefully the second call will be cache hot
- * and a little more precise
- */
- blk_check_time(t, this_cpu);
- blk_check_time(t, this_cpu);
-
- put_cpu();
-}
-
-static void blk_trace_set_ht_offsets(void)
-{
-#if defined(CONFIG_SCHED_SMT)
- int cpu, i;
-
- /*
- * now make sure HT siblings have the same time offset
- */
- preempt_disable();
- for_each_online_cpu(cpu) {
- unsigned long long *cpu_off, *sibling_off;
-
- for_each_cpu_mask(i, per_cpu(cpu_sibling_map, cpu)) {
- if (i == cpu)
- continue;
-
- cpu_off = &per_cpu(blk_trace_cpu_offset, cpu);
- sibling_off = &per_cpu(blk_trace_cpu_offset, i);
- *sibling_off = *cpu_off;
- }
- }
- preempt_enable();
-#endif
-}
-
-static __init int blk_trace_init(void)
-{
- mutex_init(&blk_tree_mutex);
- on_each_cpu(blk_trace_check_cpu_time, NULL, 1, 1);
- blk_trace_set_ht_offsets();
-
- return 0;
-}
-
-module_init(blk_trace_init);
-
Index: linux-2.6-lttng.stable/instrumentation/Makefile
===================================================================
--- linux-2.6-lttng.stable.orig/instrumentation/Makefile 2007-10-29 09:53:14.000000000 -0400
+++ linux-2.6-lttng.stable/instrumentation/Makefile 2007-10-29 09:53:24.000000000 -0400
@@ -10,3 +10,4 @@ obj-$(CONFIG_LOCKDEP) += lockdep.o
ifeq ($(CONFIG_PROC_FS),y)
obj-$(CONFIG_LOCKDEP) += lockdep_proc.o
endif
+obj-$(CONFIG_BLK_DEV_IO_TRACE) += blktrace.o
Index: linux-2.6-lttng.stable/instrumentation/blktrace.c
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng.stable/instrumentation/blktrace.c 2007-10-29 09:53:17.000000000 -0400
@@ -0,0 +1,576 @@
+/*
+ * Copyright (C) 2006 Jens Axboe <axboe@kernel.dk>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ *
+ */
+#include <linux/kernel.h>
+#include <linux/blkdev.h>
+#include <linux/blktrace_api.h>
+#include <linux/percpu.h>
+#include <linux/init.h>
+#include <linux/mutex.h>
+#include <linux/debugfs.h>
+#include <linux/time.h>
+#include <asm/uaccess.h>
+
+static DEFINE_PER_CPU(unsigned long long, blk_trace_cpu_offset) = { 0, };
+static unsigned int blktrace_seq __read_mostly = 1;
+
+/*
+ * Send out a notify message.
+ */
+static void trace_note(struct blk_trace *bt, pid_t pid, int action,
+ const void *data, size_t len)
+{
+ struct blk_io_trace *t;
+
+ t = relay_reserve(bt->rchan, sizeof(*t) + len);
+ if (t) {
+ const int cpu = smp_processor_id();
+
+ t->magic = BLK_IO_TRACE_MAGIC | BLK_IO_TRACE_VERSION;
+ t->time = cpu_clock(cpu) - per_cpu(blk_trace_cpu_offset, cpu);
+ t->device = bt->dev;
+ t->action = action;
+ t->pid = pid;
+ t->cpu = cpu;
+ t->pdu_len = len;
+ memcpy((void *) t + sizeof(*t), data, len);
+ }
+}
+
+/*
+ * Send out a notify for this process, if we haven't done so since a trace
+ * started
+ */
+static void trace_note_tsk(struct blk_trace *bt, struct task_struct *tsk)
+{
+ tsk->btrace_seq = blktrace_seq;
+ trace_note(bt, tsk->pid, BLK_TN_PROCESS, tsk->comm, sizeof(tsk->comm));
+}
+
+static void trace_note_time(struct blk_trace *bt)
+{
+ struct timespec now;
+ unsigned long flags;
+ u32 words[2];
+
+ getnstimeofday(&now);
+ words[0] = now.tv_sec;
+ words[1] = now.tv_nsec;
+
+ local_irq_save(flags);
+ trace_note(bt, 0, BLK_TN_TIMESTAMP, words, sizeof(words));
+ local_irq_restore(flags);
+}
+
+static int act_log_check(struct blk_trace *bt, u32 what, sector_t sector,
+ pid_t pid)
+{
+ if (((bt->act_mask << BLK_TC_SHIFT) & what) == 0)
+ return 1;
+ if (sector < bt->start_lba || sector > bt->end_lba)
+ return 1;
+ if (bt->pid && pid != bt->pid)
+ return 1;
+
+ return 0;
+}
+
+/*
+ * Data direction bit lookup
+ */
+static u32 ddir_act[2] __read_mostly = { BLK_TC_ACT(BLK_TC_READ), BLK_TC_ACT(BLK_TC_WRITE) };
+
+/*
+ * Bio action bits of interest
+ */
+static u32 bio_act[9] __read_mostly = { 0, BLK_TC_ACT(BLK_TC_BARRIER), BLK_TC_ACT(BLK_TC_SYNC), 0, BLK_TC_ACT(BLK_TC_AHEAD), 0, 0, 0, BLK_TC_ACT(BLK_TC_META) };
+
+/*
+ * More could be added as needed, taking care to increment the decrementer
+ * to get correct indexing
+ */
+#define trace_barrier_bit(rw) \
+ (((rw) & (1 << BIO_RW_BARRIER)) >> (BIO_RW_BARRIER - 0))
+#define trace_sync_bit(rw) \
+ (((rw) & (1 << BIO_RW_SYNC)) >> (BIO_RW_SYNC - 1))
+#define trace_ahead_bit(rw) \
+ (((rw) & (1 << BIO_RW_AHEAD)) << (2 - BIO_RW_AHEAD))
+#define trace_meta_bit(rw) \
+ (((rw) & (1 << BIO_RW_META)) >> (BIO_RW_META - 3))
+
+/*
+ * The worker for the various blk_add_trace*() types. Fills out a
+ * blk_io_trace structure and places it in a per-cpu subbuffer.
+ */
+void __blk_add_trace(struct blk_trace *bt, sector_t sector, int bytes,
+ int rw, u32 what, int error, int pdu_len, void *pdu_data)
+{
+ struct task_struct *tsk = current;
+ struct blk_io_trace *t;
+ unsigned long flags;
+ unsigned long *sequence;
+ pid_t pid;
+ int cpu;
+
+ if (unlikely(bt->trace_state != Blktrace_running))
+ return;
+
+ what |= ddir_act[rw & WRITE];
+ what |= bio_act[trace_barrier_bit(rw)];
+ what |= bio_act[trace_sync_bit(rw)];
+ what |= bio_act[trace_ahead_bit(rw)];
+ what |= bio_act[trace_meta_bit(rw)];
+
+ pid = tsk->pid;
+ if (unlikely(act_log_check(bt, what, sector, pid)))
+ return;
+
+ /*
+ * A word about the locking here - we disable interrupts to reserve
+ * some space in the relay per-cpu buffer, to prevent an irq
+ * from coming in and stepping on our toes. Once reserved, it's
+ * enough to get preemption disabled to prevent read of this data
+ * before we are through filling it. get_cpu()/put_cpu() does this
+ * for us
+ */
+ local_irq_save(flags);
+
+ if (unlikely(tsk->btrace_seq != blktrace_seq))
+ trace_note_tsk(bt, tsk);
+
+ t = relay_reserve(bt->rchan, sizeof(*t) + pdu_len);
+ if (t) {
+ cpu = smp_processor_id();
+ sequence = per_cpu_ptr(bt->sequence, cpu);
+
+ t->magic = BLK_IO_TRACE_MAGIC | BLK_IO_TRACE_VERSION;
+ t->sequence = ++(*sequence);
+ t->time = cpu_clock(cpu) - per_cpu(blk_trace_cpu_offset, cpu);
+ t->sector = sector;
+ t->bytes = bytes;
+ t->action = what;
+ t->pid = pid;
+ t->device = bt->dev;
+ t->cpu = cpu;
+ t->error = error;
+ t->pdu_len = pdu_len;
+
+ if (pdu_len)
+ memcpy((void *) t + sizeof(*t), pdu_data, pdu_len);
+ }
+
+ local_irq_restore(flags);
+}
+
+EXPORT_SYMBOL_GPL(__blk_add_trace);
+
+static struct dentry *blk_tree_root;
+static struct mutex blk_tree_mutex;
+static unsigned int root_users;
+
+static inline void blk_remove_root(void)
+{
+ if (blk_tree_root) {
+ debugfs_remove(blk_tree_root);
+ blk_tree_root = NULL;
+ }
+}
+
+static void blk_remove_tree(struct dentry *dir)
+{
+ mutex_lock(&blk_tree_mutex);
+ debugfs_remove(dir);
+ if (--root_users == 0)
+ blk_remove_root();
+ mutex_unlock(&blk_tree_mutex);
+}
+
+static struct dentry *blk_create_tree(const char *blk_name)
+{
+ struct dentry *dir = NULL;
+
+ mutex_lock(&blk_tree_mutex);
+
+ if (!blk_tree_root) {
+ blk_tree_root = debugfs_create_dir("block", NULL);
+ if (!blk_tree_root)
+ goto err;
+ }
+
+ dir = debugfs_create_dir(blk_name, blk_tree_root);
+ if (dir)
+ root_users++;
+ else
+ blk_remove_root();
+
+err:
+ mutex_unlock(&blk_tree_mutex);
+ return dir;
+}
+
+static void blk_trace_cleanup(struct blk_trace *bt)
+{
+ relay_close(bt->rchan);
+ debugfs_remove(bt->dropped_file);
+ blk_remove_tree(bt->dir);
+ free_percpu(bt->sequence);
+ kfree(bt);
+}
+
+static int blk_trace_remove(struct request_queue *q)
+{
+ struct blk_trace *bt;
+
+ bt = xchg(&q->blk_trace, NULL);
+ if (!bt)
+ return -EINVAL;
+
+ if (bt->trace_state == Blktrace_setup ||
+ bt->trace_state == Blktrace_stopped)
+ blk_trace_cleanup(bt);
+
+ return 0;
+}
+
+static int blk_dropped_open(struct inode *inode, struct file *filp)
+{
+ filp->private_data = inode->i_private;
+
+ return 0;
+}
+
+static ssize_t blk_dropped_read(struct file *filp, char __user *buffer,
+ size_t count, loff_t *ppos)
+{
+ struct blk_trace *bt = filp->private_data;
+ char buf[16];
+
+ snprintf(buf, sizeof(buf), "%u\n", atomic_read(&bt->dropped));
+
+ return simple_read_from_buffer(buffer, count, ppos, buf, strlen(buf));
+}
+
+static const struct file_operations blk_dropped_fops = {
+ .owner = THIS_MODULE,
+ .open = blk_dropped_open,
+ .read = blk_dropped_read,
+};
+
+/*
+ * Keep track of how many times we encountered a full subbuffer, to aid
+ * the user space app in telling how many lost events there were.
+ */
+static int blk_subbuf_start_callback(struct rchan_buf *buf, void *subbuf,
+ void *prev_subbuf, size_t prev_padding)
+{
+ struct blk_trace *bt;
+
+ if (!relay_buf_full(buf))
+ return 1;
+
+ bt = buf->chan->private_data;
+ atomic_inc(&bt->dropped);
+ return 0;
+}
+
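/*
 * Editor's note: per the relay interface, returning 1 from a
 * subbuf_start callback lets relay switch to the next sub-buffer, while
 * returning 0 discards the event that triggered the switch -- which is
 * why the dropped counter is incremented on the buffer-full path above.
 */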
+static int blk_remove_buf_file_callback(struct dentry *dentry)
+{
+ debugfs_remove(dentry);
+ return 0;
+}
+
+static struct dentry *blk_create_buf_file_callback(const char *filename,
+ struct dentry *parent,
+ int mode,
+ struct rchan_buf *buf,
+ int *is_global)
+{
+ return debugfs_create_file(filename, mode, parent, buf,
+ &relay_file_operations);
+}
+
+static struct rchan_callbacks blk_relay_callbacks = {
+ .subbuf_start = blk_subbuf_start_callback,
+ .create_buf_file = blk_create_buf_file_callback,
+ .remove_buf_file = blk_remove_buf_file_callback,
+};
+
+/*
+ * Setup everything required to start tracing
+ */
+int do_blk_trace_setup(struct request_queue *q, struct block_device *bdev,
+ struct blk_user_trace_setup *buts)
+{
+ struct blk_trace *old_bt, *bt = NULL;
+ struct dentry *dir = NULL;
+ char b[BDEVNAME_SIZE];
+ int ret, i;
+
+ if (!buts->buf_size || !buts->buf_nr)
+ return -EINVAL;
+
+ strcpy(buts->name, bdevname(bdev, b));
+
+ /*
+	 * some device names contain slashes in their path - convert the
+	 * slashes to underscores for this to work as expected
+ */
+ for (i = 0; i < strlen(buts->name); i++)
+ if (buts->name[i] == '/')
+ buts->name[i] = '_';
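	/*
	 * Editor's note: e.g. a name such as "cciss/c0d0p1" (a plausible
	 * illustration, not taken from this patch) becomes "cciss_c0d0p1",
	 * which is usable as a flat debugfs directory name.
	 */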
+
+ ret = -ENOMEM;
+ bt = kzalloc(sizeof(*bt), GFP_KERNEL);
+ if (!bt)
+ goto err;
+
+ bt->sequence = alloc_percpu(unsigned long);
+ if (!bt->sequence)
+ goto err;
+
+ ret = -ENOENT;
+ dir = blk_create_tree(buts->name);
+ if (!dir)
+ goto err;
+
+ bt->dir = dir;
+ bt->dev = bdev->bd_dev;
+ atomic_set(&bt->dropped, 0);
+
+ ret = -EIO;
+ bt->dropped_file = debugfs_create_file("dropped", 0444, dir, bt, &blk_dropped_fops);
+ if (!bt->dropped_file)
+ goto err;
+
+ bt->rchan = relay_open("trace", dir, buts->buf_size,
+ buts->buf_nr, &blk_relay_callbacks, bt);
+ if (!bt->rchan)
+ goto err;
+
+ bt->act_mask = buts->act_mask;
+ if (!bt->act_mask)
+ bt->act_mask = (u16) -1;
+
+ bt->start_lba = buts->start_lba;
+ bt->end_lba = buts->end_lba;
+ if (!bt->end_lba)
+ bt->end_lba = -1ULL;
+
+ bt->pid = buts->pid;
+ bt->trace_state = Blktrace_setup;
+
+ ret = -EBUSY;
+ old_bt = xchg(&q->blk_trace, bt);
+ if (old_bt) {
+ (void) xchg(&q->blk_trace, old_bt);
+ goto err;
+ }
+
+ return 0;
+err:
+ if (dir)
+ blk_remove_tree(dir);
+ if (bt) {
+ if (bt->dropped_file)
+ debugfs_remove(bt->dropped_file);
+ free_percpu(bt->sequence);
+ if (bt->rchan)
+ relay_close(bt->rchan);
+ kfree(bt);
+ }
+ return ret;
+}
+
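/*
 * Editor's note: the xchg() above atomically publishes the new trace on
 * the queue.  If another tracer raced in first, old_bt comes back
 * non-NULL; the previous pointer is then swapped back in and the setup
 * fails with -EBUSY.
 */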
+static int blk_trace_setup(struct request_queue *q, struct block_device *bdev,
+ char __user *arg)
+{
+ struct blk_user_trace_setup buts;
+ int ret;
+
+ ret = copy_from_user(&buts, arg, sizeof(buts));
+ if (ret)
+ return -EFAULT;
+
+ ret = do_blk_trace_setup(q, bdev, &buts);
+ if (ret)
+ return ret;
+
+ if (copy_to_user(arg, &buts, sizeof(buts)))
+ return -EFAULT;
+
+ return 0;
+}
+
+static int blk_trace_startstop(struct request_queue *q, int start)
+{
+ struct blk_trace *bt;
+ int ret;
+
+ if ((bt = q->blk_trace) == NULL)
+ return -EINVAL;
+
+ /*
+ * For starting a trace, we can transition from a setup or stopped
+ * trace. For stopping a trace, the state must be running
+ */
+ ret = -EINVAL;
+ if (start) {
+ if (bt->trace_state == Blktrace_setup ||
+ bt->trace_state == Blktrace_stopped) {
+ blktrace_seq++;
+ smp_mb();
+ bt->trace_state = Blktrace_running;
+
+ trace_note_time(bt);
+ ret = 0;
+ }
+ } else {
+ if (bt->trace_state == Blktrace_running) {
+ bt->trace_state = Blktrace_stopped;
+ relay_flush(bt->rchan);
+ ret = 0;
+ }
+ }
+
+ return ret;
+}
+
+/**
+ * blk_trace_ioctl: - handle the ioctls associated with tracing
+ * @bdev: the block device
+ * @cmd: the ioctl cmd
+ * @arg: the argument data, if any
+ *
+ **/
+int blk_trace_ioctl(struct block_device *bdev, unsigned cmd, char __user *arg)
+{
+ struct request_queue *q;
+ int ret, start = 0;
+
+ q = bdev_get_queue(bdev);
+ if (!q)
+ return -ENXIO;
+
+ mutex_lock(&bdev->bd_mutex);
+
+ switch (cmd) {
+ case BLKTRACESETUP:
+ ret = blk_trace_setup(q, bdev, arg);
+ break;
+ case BLKTRACESTART:
+		start = 1;
+		/* fall through */
+ case BLKTRACESTOP:
+ ret = blk_trace_startstop(q, start);
+ break;
+ case BLKTRACETEARDOWN:
+ ret = blk_trace_remove(q);
+ break;
+ default:
+ ret = -ENOTTY;
+ break;
+ }
+
+ mutex_unlock(&bdev->bd_mutex);
+ return ret;
+}
+
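/*
 * Editor's note: a minimal user-space sketch of driving this ioctl
 * interface.  It assumes the BLKTRACESETUP/BLKTRACESTART/BLKTRACESTOP
 * ioctl numbers and struct blk_user_trace_setup exported through
 * <linux/blktrace_api.h> and <linux/fs.h>; the field values are
 * illustrative only and error handling is omitted for brevity.
 */
#if 0	/* illustration, not part of the patch */
#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/fs.h>
#include <linux/blktrace_api.h>

static int start_block_trace(const char *dev)
{
	struct blk_user_trace_setup buts;
	int fd = open(dev, O_RDONLY);

	if (fd < 0)
		return fd;

	memset(&buts, 0, sizeof(buts));
	buts.buf_size = 512 * 1024;	/* bytes per relay sub-buffer */
	buts.buf_nr = 4;		/* number of sub-buffers */
	buts.act_mask = 0;		/* 0 means "trace everything" */

	ioctl(fd, BLKTRACESETUP, &buts);	/* -> Blktrace_setup */
	ioctl(fd, BLKTRACESTART);		/* -> Blktrace_running */
	return fd;
}
#endif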
+/**
+ * blk_trace_shutdown: - stop and cleanup trace structures
+ * @q: the request queue associated with the device
+ *
+ **/
+void blk_trace_shutdown(struct request_queue *q)
+{
+ if (q->blk_trace) {
+ blk_trace_startstop(q, 0);
+ blk_trace_remove(q);
+ }
+}
+
+/*
+ * Average offset over two calls to cpu_clock() with a gettimeofday()
+ * in the middle
+ */
+static void blk_check_time(unsigned long long *t, int this_cpu)
+{
+ unsigned long long a, b;
+ struct timeval tv;
+
+ a = cpu_clock(this_cpu);
+ do_gettimeofday(&tv);
+ b = cpu_clock(this_cpu);
+
+ *t = tv.tv_sec * 1000000000 + tv.tv_usec * 1000;
+ *t -= (a + b) / 2;
+}
+
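/*
 * Editor's note: a worked example with made-up numbers.  If the two
 * cpu_clock() reads return a = 1000 and b = 3000 (ns), the
 * do_gettimeofday() call between them is assumed to have sampled wall
 * time at roughly (a + b) / 2 = 2000 on the cpu_clock() timeline;
 * taking the midpoint rather than a or b halves the worst-case error
 * from the gettimeofday() latency.  With a wall-clock reading of
 * 5000 ns, the stored per-cpu value is 5000 - 2000 = 3000.
 */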
+/*
+ * calibrate our inter-CPU timings
+ */
+static void blk_trace_check_cpu_time(void *data)
+{
+ unsigned long long *t;
+ int this_cpu = get_cpu();
+
+ t = &per_cpu(blk_trace_cpu_offset, this_cpu);
+
+ /*
+ * Just call it twice, hopefully the second call will be cache hot
+ * and a little more precise
+ */
+ blk_check_time(t, this_cpu);
+ blk_check_time(t, this_cpu);
+
+ put_cpu();
+}
+
+static void blk_trace_set_ht_offsets(void)
+{
+#if defined(CONFIG_SCHED_SMT)
+ int cpu, i;
+
+ /*
+ * now make sure HT siblings have the same time offset
+ */
+ preempt_disable();
+ for_each_online_cpu(cpu) {
+ unsigned long long *cpu_off, *sibling_off;
+
+ for_each_cpu_mask(i, per_cpu(cpu_sibling_map, cpu)) {
+ if (i == cpu)
+ continue;
+
+ cpu_off = &per_cpu(blk_trace_cpu_offset, cpu);
+ sibling_off = &per_cpu(blk_trace_cpu_offset, i);
+ *sibling_off = *cpu_off;
+ }
+ }
+ preempt_enable();
+#endif
+}
+
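/*
 * Editor's note: the assumption behind the loop above is that
 * hyper-threaded siblings run off the same clock hardware, so copying
 * one sibling's calibrated offset to the other avoids spurious skew
 * between events traced on the two logical CPUs.
 */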
+static __init int blk_trace_init(void)
+{
+ mutex_init(&blk_tree_mutex);
+ on_each_cpu(blk_trace_check_cpu_time, NULL, 1, 1);
+ blk_trace_set_ht_offsets();
+
+ return 0;
+}
+
+module_init(blk_trace_init);
+
--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
^ permalink raw reply [flat|nested] 13+ messages in thread
* [patch 10/10] Move samples to instrumentation/samples
2007-10-29 14:26 [patch 00/10] [PATCH] Create instrumentation/ kernel directory Mathieu Desnoyers
` (7 preceding siblings ...)
2007-10-29 14:26 ` [patch 09/10] Move blktrace to instrumentation directory Mathieu Desnoyers
@ 2007-10-29 14:26 ` Mathieu Desnoyers
2007-10-29 15:58 ` [patch 01/10] Move oprofile to arch/*/instrumentation Mathieu Desnoyers
9 siblings, 0 replies; 13+ messages in thread
From: Mathieu Desnoyers @ 2007-10-29 14:26 UTC (permalink / raw)
To: linux-kernel; +Cc: Mathieu Desnoyers, Sam Ravnborg, Randy Dunlap
[-- Attachment #1: move-samples-to-instrumentation-samples.patch --]
[-- Type: text/plain, Size: 13390 bytes --]
Move the markers samples to instrumentation/samples.
The samples were previously built late in the kernel build; they are now
built at the same time as the rest of instrumentation/ (core-y).
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Sam Ravnborg <sam@ravnborg.org>
CC: Randy Dunlap <rdunlap@xenotime.net>
---
Makefile | 3
instrumentation/Kconfig | 6 +
instrumentation/Makefile | 2
instrumentation/samples/Makefile | 4 +
instrumentation/samples/marker-example.c | 54 +++++++++++++++++
instrumentation/samples/probe-example.c | 98 +++++++++++++++++++++++++++++++
lib/Kconfig.debug | 2
samples/Kconfig | 16 -----
samples/Makefile | 3
samples/markers/Makefile | 4 -
samples/markers/marker-example.c | 54 -----------------
samples/markers/probe-example.c | 98 -------------------------------
12 files changed, 164 insertions(+), 180 deletions(-)
Index: linux-2.6-lttng.stable/instrumentation/Kconfig
===================================================================
--- linux-2.6-lttng.stable.orig/instrumentation/Kconfig 2007-10-29 10:01:09.000000000 -0400
+++ linux-2.6-lttng.stable/instrumentation/Kconfig 2007-10-29 10:03:18.000000000 -0400
@@ -59,4 +59,10 @@ config BLK_DEV_IO_TRACE
git://brick.kernel.dk/data/git/blktrace.git
+config SAMPLE_MARKERS
+ tristate "Build markers examples -- loadable modules only"
+ depends on MARKERS && m
+ help
+	  This builds the markers example modules.
+
endif # INSTRUMENTATION
Index: linux-2.6-lttng.stable/samples/Kconfig
===================================================================
--- linux-2.6-lttng.stable.orig/samples/Kconfig 2007-10-29 10:01:09.000000000 -0400
+++ /dev/null 1970-01-01 00:00:00.000000000 +0000
@@ -1,16 +0,0 @@
-# samples/Kconfig
-
-menuconfig SAMPLES
- bool "Sample kernel code"
- help
- You can build and test sample kernel code here.
-
-if SAMPLES
-
-config SAMPLE_MARKERS
- tristate "Build markers examples -- loadable modules only"
- depends on MARKERS && m
- help
- This build markers example modules.
-
-endif # SAMPLES
Index: linux-2.6-lttng.stable/instrumentation/Makefile
===================================================================
--- linux-2.6-lttng.stable.orig/instrumentation/Makefile 2007-10-29 10:03:14.000000000 -0400
+++ linux-2.6-lttng.stable/instrumentation/Makefile 2007-10-29 10:03:18.000000000 -0400
@@ -11,3 +11,5 @@ ifeq ($(CONFIG_PROC_FS),y)
obj-$(CONFIG_LOCKDEP) += lockdep_proc.o
endif
obj-$(CONFIG_BLK_DEV_IO_TRACE) += blktrace.o
+
+obj-y += samples/
Index: linux-2.6-lttng.stable/instrumentation/samples/Makefile
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng.stable/instrumentation/samples/Makefile 2007-10-29 10:17:36.000000000 -0400
@@ -0,0 +1,4 @@
+# builds the markers example kernel modules;
+# then to use one (as root): insmod <module_name.ko>
+
+obj-$(CONFIG_SAMPLE_MARKERS) += probe-example.o marker-example.o
Index: linux-2.6-lttng.stable/samples/Makefile
===================================================================
--- linux-2.6-lttng.stable.orig/samples/Makefile 2007-10-29 10:01:09.000000000 -0400
+++ /dev/null 1970-01-01 00:00:00.000000000 +0000
@@ -1,3 +0,0 @@
-# Makefile for Linux samples code
-
-obj-$(CONFIG_SAMPLES) += markers/
Index: linux-2.6-lttng.stable/samples/markers/marker-example.c
===================================================================
--- linux-2.6-lttng.stable.orig/samples/markers/marker-example.c 2007-10-29 10:01:09.000000000 -0400
+++ /dev/null 1970-01-01 00:00:00.000000000 +0000
@@ -1,54 +0,0 @@
-/* marker-example.c
- *
- * Executes a marker when /proc/marker-example is opened.
- *
- * (C) Copyright 2007 Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
- *
- * This file is released under the GPLv2.
- * See the file COPYING for more details.
- */
-
-#include <linux/module.h>
-#include <linux/marker.h>
-#include <linux/sched.h>
-#include <linux/proc_fs.h>
-
-struct proc_dir_entry *pentry_example;
-
-static int my_open(struct inode *inode, struct file *file)
-{
- int i;
-
- trace_mark(subsystem_event, "%d %s", 123, "example string");
- for (i = 0; i < 10; i++)
- trace_mark(subsystem_eventb, MARK_NOARGS);
- return -EPERM;
-}
-
-static struct file_operations mark_ops = {
- .open = my_open,
-};
-
-static int example_init(void)
-{
- printk(KERN_ALERT "example init\n");
- pentry_example = create_proc_entry("marker-example", 0444, NULL);
- if (pentry_example)
- pentry_example->proc_fops = &mark_ops;
- else
- return -EPERM;
- return 0;
-}
-
-static void example_exit(void)
-{
- printk(KERN_ALERT "example exit\n");
- remove_proc_entry("marker-example", NULL);
-}
-
-module_init(example_init)
-module_exit(example_exit)
-
-MODULE_LICENSE("GPL");
-MODULE_AUTHOR("Mathieu Desnoyers");
-MODULE_DESCRIPTION("Marker example");
Index: linux-2.6-lttng.stable/samples/markers/probe-example.c
===================================================================
--- linux-2.6-lttng.stable.orig/samples/markers/probe-example.c 2007-10-29 10:01:09.000000000 -0400
+++ /dev/null 1970-01-01 00:00:00.000000000 +0000
@@ -1,98 +0,0 @@
-/* probe-example.c
- *
- * Connects two functions to marker call sites.
- *
- * (C) Copyright 2007 Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
- *
- * This file is released under the GPLv2.
- * See the file COPYING for more details.
- */
-
-#include <linux/sched.h>
-#include <linux/kernel.h>
-#include <linux/module.h>
-#include <linux/marker.h>
-#include <asm/atomic.h>
-
-struct probe_data {
- const char *name;
- const char *format;
- marker_probe_func *probe_func;
-};
-
-void probe_subsystem_event(const struct marker *mdata, void *private,
- const char *format, ...)
-{
- va_list ap;
- /* Declare args */
- unsigned int value;
- const char *mystr;
-
- /* Assign args */
- va_start(ap, format);
- value = va_arg(ap, typeof(value));
- mystr = va_arg(ap, typeof(mystr));
-
- /* Call printk */
- printk(KERN_DEBUG "Value %u, string %s\n", value, mystr);
-
- /* or count, check rights, serialize data in a buffer */
-
- va_end(ap);
-}
-
-atomic_t eventb_count = ATOMIC_INIT(0);
-
-void probe_subsystem_eventb(const struct marker *mdata, void *private,
- const char *format, ...)
-{
- /* Increment counter */
- atomic_inc(&eventb_count);
-}
-
-static struct probe_data probe_array[] =
-{
- { .name = "subsystem_event",
- .format = "%d %s",
- .probe_func = probe_subsystem_event },
- { .name = "subsystem_eventb",
- .format = MARK_NOARGS,
- .probe_func = probe_subsystem_eventb },
-};
-
-static int __init probe_init(void)
-{
- int result;
- int i;
-
- for (i = 0; i < ARRAY_SIZE(probe_array); i++) {
- result = marker_probe_register(probe_array[i].name,
- probe_array[i].format,
- probe_array[i].probe_func, &probe_array[i]);
- if (result)
- printk(KERN_INFO "Unable to register probe %s\n",
- probe_array[i].name);
- result = marker_arm(probe_array[i].name);
- if (result)
- printk(KERN_INFO "Unable to arm probe %s\n",
- probe_array[i].name);
- }
- return 0;
-}
-
-static void __exit probe_fini(void)
-{
- int i;
-
- for (i = 0; i < ARRAY_SIZE(probe_array); i++)
- marker_probe_unregister(probe_array[i].name);
- printk(KERN_INFO "Number of event b : %u\n",
- atomic_read(&eventb_count));
-}
-
-module_init(probe_init);
-module_exit(probe_fini);
-
-MODULE_LICENSE("GPL");
-MODULE_AUTHOR("Mathieu Desnoyers");
-MODULE_DESCRIPTION("SUBSYSTEM Probe");
Index: linux-2.6-lttng.stable/lib/Kconfig.debug
===================================================================
--- linux-2.6-lttng.stable.orig/lib/Kconfig.debug 2007-10-29 10:03:13.000000000 -0400
+++ linux-2.6-lttng.stable/lib/Kconfig.debug 2007-10-29 10:03:18.000000000 -0400
@@ -508,5 +508,3 @@ config FAULT_INJECTION_STACKTRACE_FILTER
select FRAME_POINTER
help
Provide stacktrace filter for fault-injection capabilities
-
-source "samples/Kconfig"
Index: linux-2.6-lttng.stable/Makefile
===================================================================
--- linux-2.6-lttng.stable.orig/Makefile 2007-10-29 10:01:09.000000000 -0400
+++ linux-2.6-lttng.stable/Makefile 2007-10-29 10:03:18.000000000 -0400
@@ -775,9 +775,6 @@ vmlinux: $(vmlinux-lds) $(vmlinux-init)
ifdef CONFIG_HEADERS_CHECK
$(Q)$(MAKE) -f $(srctree)/Makefile headers_check
endif
-ifdef CONFIG_SAMPLES
- $(Q)$(MAKE) $(build)=samples
-endif
$(call vmlinux-modpost)
$(call if_changed_rule,vmlinux__)
$(Q)rm -f .old_version
Index: linux-2.6-lttng.stable/samples/markers/Makefile
===================================================================
--- linux-2.6-lttng.stable.orig/samples/markers/Makefile 2007-10-29 10:01:09.000000000 -0400
+++ /dev/null 1970-01-01 00:00:00.000000000 +0000
@@ -1,4 +0,0 @@
-# builds the kprobes example kernel modules;
-# then to use one (as root): insmod <module_name.ko>
-
-obj-$(CONFIG_SAMPLE_MARKERS) += probe-example.o marker-example.o
Index: linux-2.6-lttng.stable/instrumentation/samples/marker-example.c
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng.stable/instrumentation/samples/marker-example.c 2007-10-29 10:03:18.000000000 -0400
@@ -0,0 +1,54 @@
+/* marker-example.c
+ *
+ * Executes a marker when /proc/marker-example is opened.
+ *
+ * (C) Copyright 2007 Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
+ *
+ * This file is released under the GPLv2.
+ * See the file COPYING for more details.
+ */
+
+#include <linux/module.h>
+#include <linux/marker.h>
+#include <linux/sched.h>
+#include <linux/proc_fs.h>
+
+struct proc_dir_entry *pentry_example;
+
+static int my_open(struct inode *inode, struct file *file)
+{
+ int i;
+
+ trace_mark(subsystem_event, "%d %s", 123, "example string");
+ for (i = 0; i < 10; i++)
+ trace_mark(subsystem_eventb, MARK_NOARGS);
+ return -EPERM;
+}
+
+static struct file_operations mark_ops = {
+ .open = my_open,
+};
+
+static int example_init(void)
+{
+ printk(KERN_ALERT "example init\n");
+ pentry_example = create_proc_entry("marker-example", 0444, NULL);
+ if (pentry_example)
+ pentry_example->proc_fops = &mark_ops;
+ else
+ return -EPERM;
+ return 0;
+}
+
+static void example_exit(void)
+{
+ printk(KERN_ALERT "example exit\n");
+ remove_proc_entry("marker-example", NULL);
+}
+
+module_init(example_init)
+module_exit(example_exit)
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Mathieu Desnoyers");
+MODULE_DESCRIPTION("Marker example");
Index: linux-2.6-lttng.stable/instrumentation/samples/probe-example.c
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng.stable/instrumentation/samples/probe-example.c 2007-10-29 10:03:18.000000000 -0400
@@ -0,0 +1,98 @@
+/* probe-example.c
+ *
+ * Connects two functions to marker call sites.
+ *
+ * (C) Copyright 2007 Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
+ *
+ * This file is released under the GPLv2.
+ * See the file COPYING for more details.
+ */
+
+#include <linux/sched.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/marker.h>
+#include <asm/atomic.h>
+
+struct probe_data {
+ const char *name;
+ const char *format;
+ marker_probe_func *probe_func;
+};
+
+void probe_subsystem_event(const struct marker *mdata, void *private,
+ const char *format, ...)
+{
+ va_list ap;
+ /* Declare args */
+ unsigned int value;
+ const char *mystr;
+
+ /* Assign args */
+ va_start(ap, format);
+ value = va_arg(ap, typeof(value));
+ mystr = va_arg(ap, typeof(mystr));
+
+ /* Call printk */
+ printk(KERN_DEBUG "Value %u, string %s\n", value, mystr);
+
+ /* or count, check rights, serialize data in a buffer */
+
+ va_end(ap);
+}
+
+atomic_t eventb_count = ATOMIC_INIT(0);
+
+void probe_subsystem_eventb(const struct marker *mdata, void *private,
+ const char *format, ...)
+{
+ /* Increment counter */
+ atomic_inc(&eventb_count);
+}
+
+static struct probe_data probe_array[] =
+{
+ { .name = "subsystem_event",
+ .format = "%d %s",
+ .probe_func = probe_subsystem_event },
+ { .name = "subsystem_eventb",
+ .format = MARK_NOARGS,
+ .probe_func = probe_subsystem_eventb },
+};
+
+static int __init probe_init(void)
+{
+ int result;
+ int i;
+
+ for (i = 0; i < ARRAY_SIZE(probe_array); i++) {
+ result = marker_probe_register(probe_array[i].name,
+ probe_array[i].format,
+ probe_array[i].probe_func, &probe_array[i]);
+ if (result)
+ printk(KERN_INFO "Unable to register probe %s\n",
+ probe_array[i].name);
+ result = marker_arm(probe_array[i].name);
+ if (result)
+ printk(KERN_INFO "Unable to arm probe %s\n",
+ probe_array[i].name);
+ }
+ return 0;
+}
+
+static void __exit probe_fini(void)
+{
+ int i;
+
+ for (i = 0; i < ARRAY_SIZE(probe_array); i++)
+ marker_probe_unregister(probe_array[i].name);
+ printk(KERN_INFO "Number of event b : %u\n",
+ atomic_read(&eventb_count));
+}
+
+module_init(probe_init);
+module_exit(probe_fini);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Mathieu Desnoyers");
+MODULE_DESCRIPTION("SUBSYSTEM Probe");
--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [patch 05/10] Move profiling to instrumentation
2007-10-29 14:26 ` [patch 05/10] Move profiling to instrumentation Mathieu Desnoyers
@ 2007-10-29 14:59 ` Arjan van de Ven
2007-10-29 15:07 ` Mathieu Desnoyers
0 siblings, 1 reply; 13+ messages in thread
From: Arjan van de Ven @ 2007-10-29 14:59 UTC (permalink / raw)
To: Mathieu Desnoyers
Cc: linux-kernel, Mathieu Desnoyers, Ingo Molnar, William L. Irwin
On Mon, 29 Oct 2007 10:26:51 -0400
Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> wrote:
> Move kernel/profile.c to instrumentation/.
I have one question... can this be done via a git move command so that
history is properly maintained?
--
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [patch 05/10] Move profiling to instrumentation
2007-10-29 14:59 ` Arjan van de Ven
@ 2007-10-29 15:07 ` Mathieu Desnoyers
0 siblings, 0 replies; 13+ messages in thread
From: Mathieu Desnoyers @ 2007-10-29 15:07 UTC (permalink / raw)
To: Arjan van de Ven; +Cc: linux-kernel, Ingo Molnar, William L. Irwin
* Arjan van de Ven (arjan@infradead.org) wrote:
> On Mon, 29 Oct 2007 10:26:51 -0400
> Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> wrote:
>
> > Move kernel/profile.c to instrumentation/.
>
>
> I have one question... can this be done via a git move command so that
> history is properly maintained?
>
Moving kernel/profile.c to instrumentation/profile.c should be done
through a git mv; moving the Makefile lines should be done through a
patch.
But since I use quilt to manage my patchset, I don't have that ability.
Should I switch to git to perform this change and make everyone's life
easier, or should I simply ask the maintainers to kindly perform the
git mv in their trees?
Mathieu
--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
^ permalink raw reply [flat|nested] 13+ messages in thread
* [patch 01/10] Move oprofile to arch/*/instrumentation
2007-10-29 14:26 [patch 00/10] [PATCH] Create instrumentation/ kernel directory Mathieu Desnoyers
` (8 preceding siblings ...)
2007-10-29 14:26 ` [patch 10/10] Move samples to instrumentation/samples Mathieu Desnoyers
@ 2007-10-29 15:58 ` Mathieu Desnoyers
9 siblings, 0 replies; 13+ messages in thread
From: Mathieu Desnoyers @ 2007-10-29 15:58 UTC (permalink / raw)
To: linux-kernel; +Cc: Philippe Elie
This patch is way too big for LKML (524k). I'll split it up per
architecture for the next submission.
Here is the changelog:
Move oprofile to arch/*/instrumentation
- Move oprofile to arch/*/instrumentation
- Fix the per-architecture Makefiles to follow this change
- Did _not_ fix whitespace in oprofile, so the file moves are easy to follow.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Philippe Elie <phil.el@wanadoo.fr>
---
arch/alpha/Makefile | 2
arch/alpha/instrumentation/Makefile | 19
arch/alpha/instrumentation/common.c | 189 +++
arch/alpha/instrumentation/op_impl.h | 55
arch/alpha/instrumentation/op_model_ev4.c | 116 ++
arch/alpha/instrumentation/op_model_ev5.c | 211 +++
arch/alpha/instrumentation/op_model_ev6.c | 103 +
arch/alpha/instrumentation/op_model_ev67.c | 263 ++++
arch/alpha/oprofile/Makefile | 19
arch/alpha/oprofile/common.c | 189 ---
arch/alpha/oprofile/op_impl.h | 55
arch/alpha/oprofile/op_model_ev4.c | 116 --
arch/alpha/oprofile/op_model_ev5.c | 211 ---
arch/alpha/oprofile/op_model_ev6.c | 103 -
arch/alpha/oprofile/op_model_ev67.c | 263 ----
arch/arm/Makefile | 2
arch/arm/instrumentation/Makefile | 13
arch/arm/instrumentation/backtrace.c | 81 +
arch/arm/instrumentation/common.c | 179 +++
arch/arm/instrumentation/op_arm_model.h | 34
arch/arm/instrumentation/op_counter.h | 27
arch/arm/instrumentation/op_model_arm11_core.c | 162 ++
arch/arm/instrumentation/op_model_arm11_core.h | 45
arch/arm/instrumentation/op_model_mpcore.c | 303 +++++
arch/arm/instrumentation/op_model_mpcore.h | 61 +
arch/arm/instrumentation/op_model_v6.c | 67 +
arch/arm/instrumentation/op_model_xscale.c | 441 ++++++++
arch/arm/oprofile/Makefile | 13
arch/arm/oprofile/backtrace.c | 81 -
arch/arm/oprofile/common.c | 179 ---
arch/arm/oprofile/op_arm_model.h | 34
arch/arm/oprofile/op_counter.h | 27
arch/arm/oprofile/op_model_arm11_core.c | 162 --
arch/arm/oprofile/op_model_arm11_core.h | 45
arch/arm/oprofile/op_model_mpcore.c | 303 -----
arch/arm/oprofile/op_model_mpcore.h | 61 -
arch/arm/oprofile/op_model_v6.c | 67 -
arch/arm/oprofile/op_model_xscale.c | 441 --------
arch/avr32/Makefile | 1
arch/avr32/instrumentation/Makefile | 3
arch/blackfin/Makefile | 3
arch/blackfin/instrumentation/Makefile | 14
arch/blackfin/instrumentation/common.c | 168 +++
arch/blackfin/instrumentation/op_blackfin.h | 98 +
arch/blackfin/instrumentation/op_model_bf533.c | 161 ++
arch/blackfin/instrumentation/timer_int.c | 73 +
arch/blackfin/oprofile/Makefile | 14
arch/blackfin/oprofile/common.c | 168 ---
arch/blackfin/oprofile/op_blackfin.h | 98 -
arch/blackfin/oprofile/op_model_bf533.c | 161 --
arch/blackfin/oprofile/timer_int.c | 73 -
arch/i386/Makefile | 2
arch/ia64/Makefile | 2
arch/ia64/instrumentation/Makefile | 10
arch/ia64/instrumentation/backtrace.c | 150 ++
arch/ia64/instrumentation/init.c | 38
arch/ia64/instrumentation/perfmon.c | 99 +
arch/ia64/oprofile/Makefile | 10
arch/ia64/oprofile/backtrace.c | 150 --
arch/ia64/oprofile/init.c | 38
arch/ia64/oprofile/perfmon.c | 99 -
arch/m32r/Makefile | 2
arch/m32r/instrumentation/Makefile | 9
arch/m32r/instrumentation/init.c | 22
arch/m32r/oprofile/Makefile | 9
arch/m32r/oprofile/init.c | 22
arch/mips/Makefile | 2
arch/mips/instrumentation/Makefile | 17
arch/mips/instrumentation/common.c | 123 ++
arch/mips/instrumentation/op_impl.h | 40
arch/mips/instrumentation/op_model_mipsxx.c | 348 ++++++
arch/mips/instrumentation/op_model_rm9000.c | 138 ++
arch/mips/oprofile/Makefile | 17
arch/mips/oprofile/common.c | 123 --
arch/mips/oprofile/op_impl.h | 40
arch/mips/oprofile/op_model_mipsxx.c | 348 ------
arch/mips/oprofile/op_model_rm9000.c | 138 --
arch/parisc/Makefile | 2
arch/parisc/instrumentation/Makefile | 9
arch/parisc/instrumentation/init.c | 23
arch/parisc/oprofile/Makefile | 9
arch/parisc/oprofile/init.c | 23
arch/powerpc/Makefile | 2
arch/powerpc/instrumentation/Makefile | 19
arch/powerpc/instrumentation/backtrace.c | 127 ++
arch/powerpc/instrumentation/common.c | 232 ++++
arch/powerpc/instrumentation/op_model_7450.c | 212 +++
arch/powerpc/instrumentation/op_model_cell.c | 1211 ++++++++++++++++++++++
arch/powerpc/instrumentation/op_model_fsl_booke.c | 371 ++++++
arch/powerpc/instrumentation/op_model_pa6t.c | 240 ++++
arch/powerpc/instrumentation/op_model_power4.c | 317 +++++
arch/powerpc/instrumentation/op_model_rs64.c | 224 ++++
arch/powerpc/oprofile/Makefile | 19
arch/powerpc/oprofile/backtrace.c | 127 --
arch/powerpc/oprofile/common.c | 232 ----
arch/powerpc/oprofile/op_model_7450.c | 212 ---
arch/powerpc/oprofile/op_model_cell.c | 1211 ----------------------
arch/powerpc/oprofile/op_model_fsl_booke.c | 371 ------
arch/powerpc/oprofile/op_model_pa6t.c | 240 ----
arch/powerpc/oprofile/op_model_power4.c | 317 -----
arch/powerpc/oprofile/op_model_rs64.c | 224 ----
arch/ppc/Makefile | 2
arch/s390/Makefile | 2
arch/s390/instrumentation/Makefile | 9
arch/s390/instrumentation/backtrace.c | 79 +
arch/s390/instrumentation/init.c | 26
arch/s390/oprofile/Makefile | 9
arch/s390/oprofile/backtrace.c | 79 -
arch/s390/oprofile/init.c | 26
arch/sh/Makefile | 2
arch/sh/instrumentation/Makefile | 17
arch/sh/instrumentation/op_model_null.c | 23
arch/sh/instrumentation/op_model_sh7750.c | 281 +++++
arch/sh/oprofile/Makefile | 17
arch/sh/oprofile/op_model_null.c | 23
arch/sh/oprofile/op_model_sh7750.c | 281 -----
arch/sh64/Makefile | 2
arch/sh64/instrumentation/Makefile | 12
arch/sh64/instrumentation/op_model_null.c | 23
arch/sh64/oprofile/Makefile | 12
arch/sh64/oprofile/op_model_null.c | 23
arch/sparc/Makefile | 2
arch/sparc/instrumentation/Makefile | 9
arch/sparc/instrumentation/init.c | 23
arch/sparc/oprofile/Makefile | 9
arch/sparc/oprofile/init.c | 23
arch/sparc64/Makefile | 2
arch/sparc64/instrumentation/Makefile | 9
arch/sparc64/instrumentation/init.c | 23
arch/sparc64/oprofile/Makefile | 9
arch/sparc64/oprofile/init.c | 23
arch/x86/instrumentation/Makefile | 12
arch/x86/instrumentation/backtrace.c | 91 +
arch/x86/instrumentation/init.c | 48
arch/x86/instrumentation/nmi_int.c | 477 ++++++++
arch/x86/instrumentation/nmi_timer_int.c | 69 +
arch/x86/instrumentation/op_counter.h | 29
arch/x86/instrumentation/op_model_athlon.c | 180 +++
arch/x86/instrumentation/op_model_p4.c | 722 +++++++++++++
arch/x86/instrumentation/op_model_ppro.c | 192 +++
arch/x86/instrumentation/op_x86_model.h | 51
arch/x86/oprofile/Makefile | 12
arch/x86/oprofile/backtrace.c | 91 -
arch/x86/oprofile/init.c | 48
arch/x86/oprofile/nmi_int.c | 477 --------
arch/x86/oprofile/nmi_timer_int.c | 69 -
arch/x86/oprofile/op_counter.h | 29
arch/x86/oprofile/op_model_athlon.c | 180 ---
arch/x86/oprofile/op_model_p4.c | 722 -------------
arch/x86/oprofile/op_model_ppro.c | 192 ---
arch/x86/oprofile/op_x86_model.h | 51
arch/x86_64/Makefile | 2
152 files changed, 9287 insertions(+), 9284 deletions(-)
--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2007-10-29 15:58 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-10-29 14:26 [patch 00/10] [PATCH] Create instrumentation/ kernel directory Mathieu Desnoyers
2007-10-29 14:26 ` [patch 01/10] Create instrumentation directory Mathieu Desnoyers
2007-10-29 14:26 ` [patch 03/10] Move kprobes to instrumentation/ and arch/*/instrumentation/ Mathieu Desnoyers
2007-10-29 14:26 ` [patch 04/10] Move Linux Kernel Markers to instrumentation directory Mathieu Desnoyers
2007-10-29 14:26 ` [patch 05/10] Move profiling to instrumentation Mathieu Desnoyers
2007-10-29 14:59 ` Arjan van de Ven
2007-10-29 15:07 ` Mathieu Desnoyers
2007-10-29 14:26 ` [patch 06/10] Move lockdep " Mathieu Desnoyers
2007-10-29 14:26 ` [patch 07/10] Move vmstat to instrumentation/ Mathieu Desnoyers
2007-10-29 14:26 ` [patch 08/10] Move blktrace Kconfig entry to instrumentation menu Mathieu Desnoyers
2007-10-29 14:26 ` [patch 09/10] Move blktrace to instrumentation directory Mathieu Desnoyers
2007-10-29 14:26 ` [patch 10/10] Move samples to instrumentation/samples Mathieu Desnoyers
2007-10-29 15:58 ` [patch 01/10] Move oprofile to arch/*/instrumentation Mathieu Desnoyers