* [patch 0/4] Port KVM-trace to tracepoints
@ 2008-07-17 15:57 Mathieu Desnoyers
2008-07-17 15:57 ` [patch 1/4] kvm move VMCS Encodings to system headers Mathieu Desnoyers
` (4 more replies)
0 siblings, 5 replies; 40+ messages in thread
From: Mathieu Desnoyers @ 2008-07-17 15:57 UTC (permalink / raw)
To: akpm, Ingo Molnar, linux-kernel, Peter Zijlstra, Avi Kivity, kvm
Hi,
Here is a port of KVM-trace, currently using macros on top of the Linux Markers,
to tracepoints. Note that I cleaned up the instrumentation too, so stuff like
KVMTRACE_3D(CR_WRITE, vcpu, (u32)cr,
(u32)kvm_register_read(vcpu, reg),
(u32)((u64)kvm_register_read(vcpu, reg) >> 32),
handler);
Sprinkled all across the KVM C code becomes :
trace_kvm_cr_write(vcpu, cr, reg);
Which looks much nicer, IMHO.
It applies on top of linux-next patch-v2.6.26-next-20080715.
Mathieu
--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
^ permalink raw reply [flat|nested] 40+ messages in thread
* [patch 1/4] kvm move VMCS Encodings to system headers
2008-07-17 15:57 [patch 0/4] Port KVM-trace to tracepoints Mathieu Desnoyers
@ 2008-07-17 15:57 ` Mathieu Desnoyers
2008-07-17 15:57 ` [patch 2/4] kvm move VMCS read " Mathieu Desnoyers
` (3 subsequent siblings)
4 siblings, 0 replies; 40+ messages in thread
From: Mathieu Desnoyers @ 2008-07-17 15:57 UTC (permalink / raw)
To: akpm, Ingo Molnar, linux-kernel, Peter Zijlstra, Avi Kivity, kvm
Cc: Mathieu Desnoyers, Feng(Eric) Liu
[-- Attachment #1: kvm-move-vmcs-encodings-to-system-headers.patch --]
[-- Type: text/plain, Size: 14937 bytes --]
The VMCS Encodings will be needed by the kvm-trace probes. Put them in
system-side headers instead of local header.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: 'Peter Zijlstra' <peterz@infradead.org>
CC: 'Feng(Eric) Liu' <eric.e.liu@intel.com>
CC: Avi Kivity <avi@qumranet.com>
CC: kvm@vger.kernel.org
---
arch/x86/kvm/vmx.h | 142 ---------------------------------------------
include/asm-x86/kvm_host.h | 142 +++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 142 insertions(+), 142 deletions(-)
Index: linux-2.6-lttng/include/asm-x86/kvm_host.h
===================================================================
--- linux-2.6-lttng.orig/include/asm-x86/kvm_host.h 2008-07-17 11:15:31.000000000 -0400
+++ linux-2.6-lttng/include/asm-x86/kvm_host.h 2008-07-17 11:16:30.000000000 -0400
@@ -719,4 +719,146 @@ asmlinkage void kvm_handle_fault_on_rebo
KVM_EX_ENTRY " 666b, 667b \n\t" \
".popsection"
+/* VMCS Encodings */
+enum vmcs_field {
+ VIRTUAL_PROCESSOR_ID = 0x00000000,
+ GUEST_ES_SELECTOR = 0x00000800,
+ GUEST_CS_SELECTOR = 0x00000802,
+ GUEST_SS_SELECTOR = 0x00000804,
+ GUEST_DS_SELECTOR = 0x00000806,
+ GUEST_FS_SELECTOR = 0x00000808,
+ GUEST_GS_SELECTOR = 0x0000080a,
+ GUEST_LDTR_SELECTOR = 0x0000080c,
+ GUEST_TR_SELECTOR = 0x0000080e,
+ HOST_ES_SELECTOR = 0x00000c00,
+ HOST_CS_SELECTOR = 0x00000c02,
+ HOST_SS_SELECTOR = 0x00000c04,
+ HOST_DS_SELECTOR = 0x00000c06,
+ HOST_FS_SELECTOR = 0x00000c08,
+ HOST_GS_SELECTOR = 0x00000c0a,
+ HOST_TR_SELECTOR = 0x00000c0c,
+ IO_BITMAP_A = 0x00002000,
+ IO_BITMAP_A_HIGH = 0x00002001,
+ IO_BITMAP_B = 0x00002002,
+ IO_BITMAP_B_HIGH = 0x00002003,
+ MSR_BITMAP = 0x00002004,
+ MSR_BITMAP_HIGH = 0x00002005,
+ VM_EXIT_MSR_STORE_ADDR = 0x00002006,
+ VM_EXIT_MSR_STORE_ADDR_HIGH = 0x00002007,
+ VM_EXIT_MSR_LOAD_ADDR = 0x00002008,
+ VM_EXIT_MSR_LOAD_ADDR_HIGH = 0x00002009,
+ VM_ENTRY_MSR_LOAD_ADDR = 0x0000200a,
+ VM_ENTRY_MSR_LOAD_ADDR_HIGH = 0x0000200b,
+ TSC_OFFSET = 0x00002010,
+ TSC_OFFSET_HIGH = 0x00002011,
+ VIRTUAL_APIC_PAGE_ADDR = 0x00002012,
+ VIRTUAL_APIC_PAGE_ADDR_HIGH = 0x00002013,
+ APIC_ACCESS_ADDR = 0x00002014,
+ APIC_ACCESS_ADDR_HIGH = 0x00002015,
+ EPT_POINTER = 0x0000201a,
+ EPT_POINTER_HIGH = 0x0000201b,
+ GUEST_PHYSICAL_ADDRESS = 0x00002400,
+ GUEST_PHYSICAL_ADDRESS_HIGH = 0x00002401,
+ VMCS_LINK_POINTER = 0x00002800,
+ VMCS_LINK_POINTER_HIGH = 0x00002801,
+ GUEST_IA32_DEBUGCTL = 0x00002802,
+ GUEST_IA32_DEBUGCTL_HIGH = 0x00002803,
+ GUEST_PDPTR0 = 0x0000280a,
+ GUEST_PDPTR0_HIGH = 0x0000280b,
+ GUEST_PDPTR1 = 0x0000280c,
+ GUEST_PDPTR1_HIGH = 0x0000280d,
+ GUEST_PDPTR2 = 0x0000280e,
+ GUEST_PDPTR2_HIGH = 0x0000280f,
+ GUEST_PDPTR3 = 0x00002810,
+ GUEST_PDPTR3_HIGH = 0x00002811,
+ PIN_BASED_VM_EXEC_CONTROL = 0x00004000,
+ CPU_BASED_VM_EXEC_CONTROL = 0x00004002,
+ EXCEPTION_BITMAP = 0x00004004,
+ PAGE_FAULT_ERROR_CODE_MASK = 0x00004006,
+ PAGE_FAULT_ERROR_CODE_MATCH = 0x00004008,
+ CR3_TARGET_COUNT = 0x0000400a,
+ VM_EXIT_CONTROLS = 0x0000400c,
+ VM_EXIT_MSR_STORE_COUNT = 0x0000400e,
+ VM_EXIT_MSR_LOAD_COUNT = 0x00004010,
+ VM_ENTRY_CONTROLS = 0x00004012,
+ VM_ENTRY_MSR_LOAD_COUNT = 0x00004014,
+ VM_ENTRY_INTR_INFO_FIELD = 0x00004016,
+ VM_ENTRY_EXCEPTION_ERROR_CODE = 0x00004018,
+ VM_ENTRY_INSTRUCTION_LEN = 0x0000401a,
+ TPR_THRESHOLD = 0x0000401c,
+ SECONDARY_VM_EXEC_CONTROL = 0x0000401e,
+ VM_INSTRUCTION_ERROR = 0x00004400,
+ VM_EXIT_REASON = 0x00004402,
+ VM_EXIT_INTR_INFO = 0x00004404,
+ VM_EXIT_INTR_ERROR_CODE = 0x00004406,
+ IDT_VECTORING_INFO_FIELD = 0x00004408,
+ IDT_VECTORING_ERROR_CODE = 0x0000440a,
+ VM_EXIT_INSTRUCTION_LEN = 0x0000440c,
+ VMX_INSTRUCTION_INFO = 0x0000440e,
+ GUEST_ES_LIMIT = 0x00004800,
+ GUEST_CS_LIMIT = 0x00004802,
+ GUEST_SS_LIMIT = 0x00004804,
+ GUEST_DS_LIMIT = 0x00004806,
+ GUEST_FS_LIMIT = 0x00004808,
+ GUEST_GS_LIMIT = 0x0000480a,
+ GUEST_LDTR_LIMIT = 0x0000480c,
+ GUEST_TR_LIMIT = 0x0000480e,
+ GUEST_GDTR_LIMIT = 0x00004810,
+ GUEST_IDTR_LIMIT = 0x00004812,
+ GUEST_ES_AR_BYTES = 0x00004814,
+ GUEST_CS_AR_BYTES = 0x00004816,
+ GUEST_SS_AR_BYTES = 0x00004818,
+ GUEST_DS_AR_BYTES = 0x0000481a,
+ GUEST_FS_AR_BYTES = 0x0000481c,
+ GUEST_GS_AR_BYTES = 0x0000481e,
+ GUEST_LDTR_AR_BYTES = 0x00004820,
+ GUEST_TR_AR_BYTES = 0x00004822,
+ GUEST_INTERRUPTIBILITY_INFO = 0x00004824,
+ GUEST_ACTIVITY_STATE = 0X00004826,
+ GUEST_SYSENTER_CS = 0x0000482A,
+ HOST_IA32_SYSENTER_CS = 0x00004c00,
+ CR0_GUEST_HOST_MASK = 0x00006000,
+ CR4_GUEST_HOST_MASK = 0x00006002,
+ CR0_READ_SHADOW = 0x00006004,
+ CR4_READ_SHADOW = 0x00006006,
+ CR3_TARGET_VALUE0 = 0x00006008,
+ CR3_TARGET_VALUE1 = 0x0000600a,
+ CR3_TARGET_VALUE2 = 0x0000600c,
+ CR3_TARGET_VALUE3 = 0x0000600e,
+ EXIT_QUALIFICATION = 0x00006400,
+ GUEST_LINEAR_ADDRESS = 0x0000640a,
+ GUEST_CR0 = 0x00006800,
+ GUEST_CR3 = 0x00006802,
+ GUEST_CR4 = 0x00006804,
+ GUEST_ES_BASE = 0x00006806,
+ GUEST_CS_BASE = 0x00006808,
+ GUEST_SS_BASE = 0x0000680a,
+ GUEST_DS_BASE = 0x0000680c,
+ GUEST_FS_BASE = 0x0000680e,
+ GUEST_GS_BASE = 0x00006810,
+ GUEST_LDTR_BASE = 0x00006812,
+ GUEST_TR_BASE = 0x00006814,
+ GUEST_GDTR_BASE = 0x00006816,
+ GUEST_IDTR_BASE = 0x00006818,
+ GUEST_DR7 = 0x0000681a,
+ GUEST_RSP = 0x0000681c,
+ GUEST_RIP = 0x0000681e,
+ GUEST_RFLAGS = 0x00006820,
+ GUEST_PENDING_DBG_EXCEPTIONS = 0x00006822,
+ GUEST_SYSENTER_ESP = 0x00006824,
+ GUEST_SYSENTER_EIP = 0x00006826,
+ HOST_CR0 = 0x00006c00,
+ HOST_CR3 = 0x00006c02,
+ HOST_CR4 = 0x00006c04,
+ HOST_FS_BASE = 0x00006c06,
+ HOST_GS_BASE = 0x00006c08,
+ HOST_TR_BASE = 0x00006c0a,
+ HOST_GDTR_BASE = 0x00006c0c,
+ HOST_IDTR_BASE = 0x00006c0e,
+ HOST_IA32_SYSENTER_ESP = 0x00006c10,
+ HOST_IA32_SYSENTER_EIP = 0x00006c12,
+ HOST_RSP = 0x00006c14,
+ HOST_RIP = 0x00006c16,
+};
+
#endif
Index: linux-2.6-lttng/arch/x86/kvm/vmx.h
===================================================================
--- linux-2.6-lttng.orig/arch/x86/kvm/vmx.h 2008-07-17 11:15:17.000000000 -0400
+++ linux-2.6-lttng/arch/x86/kvm/vmx.h 2008-07-17 11:16:30.000000000 -0400
@@ -68,148 +68,6 @@
#define VM_ENTRY_SMM 0x00000400
#define VM_ENTRY_DEACT_DUAL_MONITOR 0x00000800
-/* VMCS Encodings */
-enum vmcs_field {
- VIRTUAL_PROCESSOR_ID = 0x00000000,
- GUEST_ES_SELECTOR = 0x00000800,
- GUEST_CS_SELECTOR = 0x00000802,
- GUEST_SS_SELECTOR = 0x00000804,
- GUEST_DS_SELECTOR = 0x00000806,
- GUEST_FS_SELECTOR = 0x00000808,
- GUEST_GS_SELECTOR = 0x0000080a,
- GUEST_LDTR_SELECTOR = 0x0000080c,
- GUEST_TR_SELECTOR = 0x0000080e,
- HOST_ES_SELECTOR = 0x00000c00,
- HOST_CS_SELECTOR = 0x00000c02,
- HOST_SS_SELECTOR = 0x00000c04,
- HOST_DS_SELECTOR = 0x00000c06,
- HOST_FS_SELECTOR = 0x00000c08,
- HOST_GS_SELECTOR = 0x00000c0a,
- HOST_TR_SELECTOR = 0x00000c0c,
- IO_BITMAP_A = 0x00002000,
- IO_BITMAP_A_HIGH = 0x00002001,
- IO_BITMAP_B = 0x00002002,
- IO_BITMAP_B_HIGH = 0x00002003,
- MSR_BITMAP = 0x00002004,
- MSR_BITMAP_HIGH = 0x00002005,
- VM_EXIT_MSR_STORE_ADDR = 0x00002006,
- VM_EXIT_MSR_STORE_ADDR_HIGH = 0x00002007,
- VM_EXIT_MSR_LOAD_ADDR = 0x00002008,
- VM_EXIT_MSR_LOAD_ADDR_HIGH = 0x00002009,
- VM_ENTRY_MSR_LOAD_ADDR = 0x0000200a,
- VM_ENTRY_MSR_LOAD_ADDR_HIGH = 0x0000200b,
- TSC_OFFSET = 0x00002010,
- TSC_OFFSET_HIGH = 0x00002011,
- VIRTUAL_APIC_PAGE_ADDR = 0x00002012,
- VIRTUAL_APIC_PAGE_ADDR_HIGH = 0x00002013,
- APIC_ACCESS_ADDR = 0x00002014,
- APIC_ACCESS_ADDR_HIGH = 0x00002015,
- EPT_POINTER = 0x0000201a,
- EPT_POINTER_HIGH = 0x0000201b,
- GUEST_PHYSICAL_ADDRESS = 0x00002400,
- GUEST_PHYSICAL_ADDRESS_HIGH = 0x00002401,
- VMCS_LINK_POINTER = 0x00002800,
- VMCS_LINK_POINTER_HIGH = 0x00002801,
- GUEST_IA32_DEBUGCTL = 0x00002802,
- GUEST_IA32_DEBUGCTL_HIGH = 0x00002803,
- GUEST_PDPTR0 = 0x0000280a,
- GUEST_PDPTR0_HIGH = 0x0000280b,
- GUEST_PDPTR1 = 0x0000280c,
- GUEST_PDPTR1_HIGH = 0x0000280d,
- GUEST_PDPTR2 = 0x0000280e,
- GUEST_PDPTR2_HIGH = 0x0000280f,
- GUEST_PDPTR3 = 0x00002810,
- GUEST_PDPTR3_HIGH = 0x00002811,
- PIN_BASED_VM_EXEC_CONTROL = 0x00004000,
- CPU_BASED_VM_EXEC_CONTROL = 0x00004002,
- EXCEPTION_BITMAP = 0x00004004,
- PAGE_FAULT_ERROR_CODE_MASK = 0x00004006,
- PAGE_FAULT_ERROR_CODE_MATCH = 0x00004008,
- CR3_TARGET_COUNT = 0x0000400a,
- VM_EXIT_CONTROLS = 0x0000400c,
- VM_EXIT_MSR_STORE_COUNT = 0x0000400e,
- VM_EXIT_MSR_LOAD_COUNT = 0x00004010,
- VM_ENTRY_CONTROLS = 0x00004012,
- VM_ENTRY_MSR_LOAD_COUNT = 0x00004014,
- VM_ENTRY_INTR_INFO_FIELD = 0x00004016,
- VM_ENTRY_EXCEPTION_ERROR_CODE = 0x00004018,
- VM_ENTRY_INSTRUCTION_LEN = 0x0000401a,
- TPR_THRESHOLD = 0x0000401c,
- SECONDARY_VM_EXEC_CONTROL = 0x0000401e,
- VM_INSTRUCTION_ERROR = 0x00004400,
- VM_EXIT_REASON = 0x00004402,
- VM_EXIT_INTR_INFO = 0x00004404,
- VM_EXIT_INTR_ERROR_CODE = 0x00004406,
- IDT_VECTORING_INFO_FIELD = 0x00004408,
- IDT_VECTORING_ERROR_CODE = 0x0000440a,
- VM_EXIT_INSTRUCTION_LEN = 0x0000440c,
- VMX_INSTRUCTION_INFO = 0x0000440e,
- GUEST_ES_LIMIT = 0x00004800,
- GUEST_CS_LIMIT = 0x00004802,
- GUEST_SS_LIMIT = 0x00004804,
- GUEST_DS_LIMIT = 0x00004806,
- GUEST_FS_LIMIT = 0x00004808,
- GUEST_GS_LIMIT = 0x0000480a,
- GUEST_LDTR_LIMIT = 0x0000480c,
- GUEST_TR_LIMIT = 0x0000480e,
- GUEST_GDTR_LIMIT = 0x00004810,
- GUEST_IDTR_LIMIT = 0x00004812,
- GUEST_ES_AR_BYTES = 0x00004814,
- GUEST_CS_AR_BYTES = 0x00004816,
- GUEST_SS_AR_BYTES = 0x00004818,
- GUEST_DS_AR_BYTES = 0x0000481a,
- GUEST_FS_AR_BYTES = 0x0000481c,
- GUEST_GS_AR_BYTES = 0x0000481e,
- GUEST_LDTR_AR_BYTES = 0x00004820,
- GUEST_TR_AR_BYTES = 0x00004822,
- GUEST_INTERRUPTIBILITY_INFO = 0x00004824,
- GUEST_ACTIVITY_STATE = 0X00004826,
- GUEST_SYSENTER_CS = 0x0000482A,
- HOST_IA32_SYSENTER_CS = 0x00004c00,
- CR0_GUEST_HOST_MASK = 0x00006000,
- CR4_GUEST_HOST_MASK = 0x00006002,
- CR0_READ_SHADOW = 0x00006004,
- CR4_READ_SHADOW = 0x00006006,
- CR3_TARGET_VALUE0 = 0x00006008,
- CR3_TARGET_VALUE1 = 0x0000600a,
- CR3_TARGET_VALUE2 = 0x0000600c,
- CR3_TARGET_VALUE3 = 0x0000600e,
- EXIT_QUALIFICATION = 0x00006400,
- GUEST_LINEAR_ADDRESS = 0x0000640a,
- GUEST_CR0 = 0x00006800,
- GUEST_CR3 = 0x00006802,
- GUEST_CR4 = 0x00006804,
- GUEST_ES_BASE = 0x00006806,
- GUEST_CS_BASE = 0x00006808,
- GUEST_SS_BASE = 0x0000680a,
- GUEST_DS_BASE = 0x0000680c,
- GUEST_FS_BASE = 0x0000680e,
- GUEST_GS_BASE = 0x00006810,
- GUEST_LDTR_BASE = 0x00006812,
- GUEST_TR_BASE = 0x00006814,
- GUEST_GDTR_BASE = 0x00006816,
- GUEST_IDTR_BASE = 0x00006818,
- GUEST_DR7 = 0x0000681a,
- GUEST_RSP = 0x0000681c,
- GUEST_RIP = 0x0000681e,
- GUEST_RFLAGS = 0x00006820,
- GUEST_PENDING_DBG_EXCEPTIONS = 0x00006822,
- GUEST_SYSENTER_ESP = 0x00006824,
- GUEST_SYSENTER_EIP = 0x00006826,
- HOST_CR0 = 0x00006c00,
- HOST_CR3 = 0x00006c02,
- HOST_CR4 = 0x00006c04,
- HOST_FS_BASE = 0x00006c06,
- HOST_GS_BASE = 0x00006c08,
- HOST_TR_BASE = 0x00006c0a,
- HOST_GDTR_BASE = 0x00006c0c,
- HOST_IDTR_BASE = 0x00006c0e,
- HOST_IA32_SYSENTER_ESP = 0x00006c10,
- HOST_IA32_SYSENTER_EIP = 0x00006c12,
- HOST_RSP = 0x00006c14,
- HOST_RIP = 0x00006c16,
-};
-
#define VMX_EXIT_REASONS_FAILED_VMENTRY 0x80000000
#define EXIT_REASON_EXCEPTION_NMI 0
--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
^ permalink raw reply [flat|nested] 40+ messages in thread
* [patch 2/4] kvm move VMCS read to system headers
2008-07-17 15:57 [patch 0/4] Port KVM-trace to tracepoints Mathieu Desnoyers
2008-07-17 15:57 ` [patch 1/4] kvm move VMCS Encodings to system headers Mathieu Desnoyers
@ 2008-07-17 15:57 ` Mathieu Desnoyers
2008-07-17 15:57 ` [patch 3/4] KVM move register read-write " Mathieu Desnoyers
` (2 subsequent siblings)
4 siblings, 0 replies; 40+ messages in thread
From: Mathieu Desnoyers @ 2008-07-17 15:57 UTC (permalink / raw)
To: akpm, Ingo Molnar, linux-kernel, Peter Zijlstra, Avi Kivity, kvm
Cc: Mathieu Desnoyers, Feng(Eric) Liu
[-- Attachment #1: kvm-move-vmcs-read-to-system-headers.patch --]
[-- Type: text/plain, Size: 2932 bytes --]
The VMCS read will be needed by the kvm-trace probes. Put them in
static inline functions in system-side headers instead of in the C file.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: 'Peter Zijlstra' <peterz@infradead.org>
CC: 'Feng(Eric) Liu' <eric.e.liu@intel.com>
CC: Avi Kivity <avi@qumranet.com>
CC: kvm@vger.kernel.org
---
arch/x86/kvm/vmx.c | 28 ----------------------------
include/asm-x86/kvm_host.h | 30 ++++++++++++++++++++++++++++++
2 files changed, 30 insertions(+), 28 deletions(-)
Index: linux-2.6-lttng/arch/x86/kvm/vmx.c
===================================================================
--- linux-2.6-lttng.orig/arch/x86/kvm/vmx.c 2008-07-17 11:41:54.000000000 -0400
+++ linux-2.6-lttng/arch/x86/kvm/vmx.c 2008-07-17 11:41:58.000000000 -0400
@@ -388,34 +388,6 @@ static inline void ept_sync_individual_a
}
}
-static unsigned long vmcs_readl(unsigned long field)
-{
- unsigned long value;
-
- asm volatile (__ex(ASM_VMX_VMREAD_RDX_RAX)
- : "=a"(value) : "d"(field) : "cc");
- return value;
-}
-
-static u16 vmcs_read16(unsigned long field)
-{
- return vmcs_readl(field);
-}
-
-static u32 vmcs_read32(unsigned long field)
-{
- return vmcs_readl(field);
-}
-
-static u64 vmcs_read64(unsigned long field)
-{
-#ifdef CONFIG_X86_64
- return vmcs_readl(field);
-#else
- return vmcs_readl(field) | ((u64)vmcs_readl(field+1) << 32);
-#endif
-}
-
static noinline void vmwrite_error(unsigned long field, unsigned long value)
{
printk(KERN_ERR "vmwrite error: reg %lx value %lx (err %d)\n",
Index: linux-2.6-lttng/include/asm-x86/kvm_host.h
===================================================================
--- linux-2.6-lttng.orig/include/asm-x86/kvm_host.h 2008-07-17 11:41:57.000000000 -0400
+++ linux-2.6-lttng/include/asm-x86/kvm_host.h 2008-07-17 11:42:17.000000000 -0400
@@ -719,6 +719,8 @@ asmlinkage void kvm_handle_fault_on_rebo
KVM_EX_ENTRY " 666b, 667b \n\t" \
".popsection"
+#define __ex(x) __kvm_handle_fault_on_reboot(x)
+
/* VMCS Encodings */
enum vmcs_field {
VIRTUAL_PROCESSOR_ID = 0x00000000,
@@ -861,4 +863,32 @@ enum vmcs_field {
HOST_RIP = 0x00006c16,
};
+static inline unsigned long vmcs_readl(unsigned long field)
+{
+ unsigned long value;
+
+ asm volatile (__ex(ASM_VMX_VMREAD_RDX_RAX)
+ : "=a"(value) : "d"(field) : "cc");
+ return value;
+}
+
+static inline u16 vmcs_read16(unsigned long field)
+{
+ return vmcs_readl(field);
+}
+
+static inline u32 vmcs_read32(unsigned long field)
+{
+ return vmcs_readl(field);
+}
+
+static inline u64 vmcs_read64(unsigned long field)
+{
+#ifdef CONFIG_X86_64
+ return vmcs_readl(field);
+#else
+ return vmcs_readl(field) | ((u64)vmcs_readl(field+1) << 32);
+#endif
+}
+
#endif
--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
^ permalink raw reply [flat|nested] 40+ messages in thread
* [patch 3/4] KVM move register read-write to system headers
2008-07-17 15:57 [patch 0/4] Port KVM-trace to tracepoints Mathieu Desnoyers
2008-07-17 15:57 ` [patch 1/4] kvm move VMCS Encodings to system headers Mathieu Desnoyers
2008-07-17 15:57 ` [patch 2/4] kvm move VMCS read " Mathieu Desnoyers
@ 2008-07-17 15:57 ` Mathieu Desnoyers
2008-07-17 15:57 ` [patch 4/4] KVM-trace port to tracepoints Mathieu Desnoyers
2008-07-22 18:42 ` [patch 0/4] Port KVM-trace " Avi Kivity
4 siblings, 0 replies; 40+ messages in thread
From: Mathieu Desnoyers @ 2008-07-17 15:57 UTC (permalink / raw)
To: akpm, Ingo Molnar, linux-kernel, Peter Zijlstra, Avi Kivity, kvm
Cc: Mathieu Desnoyers, Feng(Eric) Liu
[-- Attachment #1: kvm-move-register-read-write-to-system-headers.patch --]
[-- Type: text/plain, Size: 5850 bytes --]
Needed by kvm_tracer probes, which are outside of arch/x86/kvm.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: 'Peter Zijlstra' <peterz@infradead.org>
CC: 'Feng(Eric) Liu' <eric.e.liu@intel.com>
CC: Avi Kivity <avi@qumranet.com>
CC: kvm@vger.kernel.org
---
arch/x86/kvm/kvm_cache_regs.h | 32 --------------------------------
arch/x86/kvm/lapic.c | 2 +-
arch/x86/kvm/svm.c | 2 +-
arch/x86/kvm/vmx.c | 2 +-
arch/x86/kvm/x86.c | 2 +-
arch/x86/kvm/x86_emulate.c | 2 +-
include/asm-x86/kvm_cache_regs.h | 32 ++++++++++++++++++++++++++++++++
7 files changed, 37 insertions(+), 37 deletions(-)
Index: linux-2.6-lttng/arch/x86/kvm/kvm_cache_regs.h
===================================================================
--- linux-2.6-lttng.orig/arch/x86/kvm/kvm_cache_regs.h 2008-07-17 11:44:43.000000000 -0400
+++ /dev/null 1970-01-01 00:00:00.000000000 +0000
@@ -1,32 +0,0 @@
-#ifndef ASM_KVM_CACHE_REGS_H
-#define ASM_KVM_CACHE_REGS_H
-
-static inline unsigned long kvm_register_read(struct kvm_vcpu *vcpu,
- enum kvm_reg reg)
-{
- if (!test_bit(reg, (unsigned long *)&vcpu->arch.regs_avail))
- kvm_x86_ops->cache_reg(vcpu, reg);
-
- return vcpu->arch.regs[reg];
-}
-
-static inline void kvm_register_write(struct kvm_vcpu *vcpu,
- enum kvm_reg reg,
- unsigned long val)
-{
- vcpu->arch.regs[reg] = val;
- __set_bit(reg, (unsigned long *)&vcpu->arch.regs_dirty);
- __set_bit(reg, (unsigned long *)&vcpu->arch.regs_avail);
-}
-
-static inline unsigned long kvm_rip_read(struct kvm_vcpu *vcpu)
-{
- return kvm_register_read(vcpu, VCPU_REGS_RIP);
-}
-
-static inline void kvm_rip_write(struct kvm_vcpu *vcpu, unsigned long val)
-{
- kvm_register_write(vcpu, VCPU_REGS_RIP, val);
-}
-
-#endif
Index: linux-2.6-lttng/include/asm-x86/kvm_cache_regs.h
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/asm-x86/kvm_cache_regs.h 2008-07-17 11:44:53.000000000 -0400
@@ -0,0 +1,32 @@
+#ifndef ASM_KVM_CACHE_REGS_H
+#define ASM_KVM_CACHE_REGS_H
+
+static inline unsigned long kvm_register_read(struct kvm_vcpu *vcpu,
+ enum kvm_reg reg)
+{
+ if (!test_bit(reg, (unsigned long *)&vcpu->arch.regs_avail))
+ kvm_x86_ops->cache_reg(vcpu, reg);
+
+ return vcpu->arch.regs[reg];
+}
+
+static inline void kvm_register_write(struct kvm_vcpu *vcpu,
+ enum kvm_reg reg,
+ unsigned long val)
+{
+ vcpu->arch.regs[reg] = val;
+ __set_bit(reg, (unsigned long *)&vcpu->arch.regs_dirty);
+ __set_bit(reg, (unsigned long *)&vcpu->arch.regs_avail);
+}
+
+static inline unsigned long kvm_rip_read(struct kvm_vcpu *vcpu)
+{
+ return kvm_register_read(vcpu, VCPU_REGS_RIP);
+}
+
+static inline void kvm_rip_write(struct kvm_vcpu *vcpu, unsigned long val)
+{
+ kvm_register_write(vcpu, VCPU_REGS_RIP, val);
+}
+
+#endif
Index: linux-2.6-lttng/arch/x86/kvm/lapic.c
===================================================================
--- linux-2.6-lttng.orig/arch/x86/kvm/lapic.c 2008-07-17 11:46:14.000000000 -0400
+++ linux-2.6-lttng/arch/x86/kvm/lapic.c 2008-07-17 11:46:29.000000000 -0400
@@ -32,7 +32,7 @@
#include <asm/current.h>
#include <asm/apicdef.h>
#include <asm/atomic.h>
-#include "kvm_cache_regs.h"
+#include <asm/kvm_cache_regs.h>
#include "irq.h"
#define PRId64 "d"
Index: linux-2.6-lttng/arch/x86/kvm/svm.c
===================================================================
--- linux-2.6-lttng.orig/arch/x86/kvm/svm.c 2008-07-17 11:46:14.000000000 -0400
+++ linux-2.6-lttng/arch/x86/kvm/svm.c 2008-07-17 11:46:47.000000000 -0400
@@ -18,7 +18,6 @@
#include "kvm_svm.h"
#include "irq.h"
#include "mmu.h"
-#include "kvm_cache_regs.h"
#include <linux/module.h>
#include <linux/kernel.h>
@@ -26,6 +25,7 @@
#include <linux/highmem.h>
#include <linux/sched.h>
+#include <asm/kvm_cache_regs.h>
#include <asm/desc.h>
#define __ex(x) __kvm_handle_fault_on_reboot(x)
Index: linux-2.6-lttng/arch/x86/kvm/vmx.c
===================================================================
--- linux-2.6-lttng.orig/arch/x86/kvm/vmx.c 2008-07-17 11:46:14.000000000 -0400
+++ linux-2.6-lttng/arch/x86/kvm/vmx.c 2008-07-17 11:47:04.000000000 -0400
@@ -26,9 +26,9 @@
#include <linux/highmem.h>
#include <linux/sched.h>
#include <linux/moduleparam.h>
-#include "kvm_cache_regs.h"
#include "x86.h"
+#include <asm/kvm_cache_regs.h>
#include <asm/io.h>
#include <asm/desc.h>
Index: linux-2.6-lttng/arch/x86/kvm/x86.c
===================================================================
--- linux-2.6-lttng.orig/arch/x86/kvm/x86.c 2008-07-17 11:46:14.000000000 -0400
+++ linux-2.6-lttng/arch/x86/kvm/x86.c 2008-07-17 11:47:12.000000000 -0400
@@ -19,7 +19,6 @@
#include "mmu.h"
#include "i8254.h"
#include "tss.h"
-#include "kvm_cache_regs.h"
#include "x86.h"
#include <linux/clocksource.h>
@@ -30,6 +29,7 @@
#include <linux/mman.h>
#include <linux/highmem.h>
+#include <asm/kvm_cache_regs.h>
#include <asm/uaccess.h>
#include <asm/msr.h>
#include <asm/desc.h>
Index: linux-2.6-lttng/arch/x86/kvm/x86_emulate.c
===================================================================
--- linux-2.6-lttng.orig/arch/x86/kvm/x86_emulate.c 2008-07-17 11:46:14.000000000 -0400
+++ linux-2.6-lttng/arch/x86/kvm/x86_emulate.c 2008-07-17 11:47:24.000000000 -0400
@@ -26,10 +26,10 @@
#define DPRINTF(_f, _a ...) printf(_f , ## _a)
#else
#include <linux/kvm_host.h>
-#include "kvm_cache_regs.h"
#define DPRINTF(x...) do {} while (0)
#endif
#include <linux/module.h>
+#include <asm/kvm_cache_regs.h>
#include <asm/kvm_x86_emulate.h>
/*
--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
^ permalink raw reply [flat|nested] 40+ messages in thread
* [patch 4/4] KVM-trace port to tracepoints
2008-07-17 15:57 [patch 0/4] Port KVM-trace to tracepoints Mathieu Desnoyers
` (2 preceding siblings ...)
2008-07-17 15:57 ` [patch 3/4] KVM move register read-write " Mathieu Desnoyers
@ 2008-07-17 15:57 ` Mathieu Desnoyers
2008-07-17 16:49 ` Jan Kiszka
2008-07-17 16:52 ` Anthony Liguori
2008-07-22 18:42 ` [patch 0/4] Port KVM-trace " Avi Kivity
4 siblings, 2 replies; 40+ messages in thread
From: Mathieu Desnoyers @ 2008-07-17 15:57 UTC (permalink / raw)
To: akpm, Ingo Molnar, linux-kernel, Peter Zijlstra, Avi Kivity, kvm
Cc: Mathieu Desnoyers, Feng(Eric) Liu
[-- Attachment #1: kvm-trace-port-to-tracepoints.patch --]
[-- Type: text/plain, Size: 24667 bytes --]
Port/cleanup of KVM-trace to tracepoints.
Tracepoints allow dormat instrumentation, like the kernel markers, but also
allows to describe the trace points in global headers so they can be easily
managed. They also do not use format strings.
Anything that would involve an action (dereference a pointer, vmcs read, ...)
only required when tracing is placed in the probes created in kvm_trace.c
This patch depends on the "Tracepoints" patch.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: 'Peter Zijlstra' <peterz@infradead.org>
CC: 'Feng(Eric) Liu' <eric.e.liu@intel.com>
CC: Avi Kivity <avi@qumranet.com>
CC: kvm@vger.kernel.org
---
arch/x86/kvm/vmx.c | 38 ++---
arch/x86/kvm/x86.c | 43 ++----
include/trace/kvm.h | 83 ++++++++++++
virt/kvm/kvm_trace.c | 336 +++++++++++++++++++++++++++++++++++++++++++--------
4 files changed, 398 insertions(+), 102 deletions(-)
Index: linux-2.6-lttng/arch/x86/kvm/vmx.c
===================================================================
--- linux-2.6-lttng.orig/arch/x86/kvm/vmx.c 2008-07-17 11:54:36.000000000 -0400
+++ linux-2.6-lttng/arch/x86/kvm/vmx.c 2008-07-17 11:54:54.000000000 -0400
@@ -26,6 +26,7 @@
#include <linux/highmem.h>
#include <linux/sched.h>
#include <linux/moduleparam.h>
+#include <trace/kvm.h>
#include "x86.h"
#include <asm/kvm_cache_regs.h>
@@ -2094,7 +2095,7 @@ static void vmx_inject_irq(struct kvm_vc
{
struct vcpu_vmx *vmx = to_vmx(vcpu);
- KVMTRACE_1D(INJ_VIRQ, vcpu, (u32)irq, handler);
+ trace_kvm_inj_virq(vcpu, irq);
if (vcpu->arch.rmode.active) {
vmx->rmode.irq.pending = true;
@@ -2252,8 +2253,7 @@ static int handle_exception(struct kvm_v
if (vm_need_ept())
BUG();
cr2 = vmcs_readl(EXIT_QUALIFICATION);
- KVMTRACE_3D(PAGE_FAULT, vcpu, error_code, (u32)cr2,
- (u32)((u64)cr2 >> 32), handler);
+ trace_kvm_page_fault(vcpu, error_code, cr2);
return kvm_mmu_page_fault(vcpu, cr2, error_code);
}
@@ -2282,7 +2282,7 @@ static int handle_external_interrupt(str
struct kvm_run *kvm_run)
{
++vcpu->stat.irq_exits;
- KVMTRACE_1D(INTR, vcpu, vmcs_read32(VM_EXIT_INTR_INFO), handler);
+ trace_kvm_intr(vcpu);
return 1;
}
@@ -2340,10 +2340,7 @@ static int handle_cr(struct kvm_vcpu *vc
reg = (exit_qualification >> 8) & 15;
switch ((exit_qualification >> 4) & 3) {
case 0: /* mov to cr */
- KVMTRACE_3D(CR_WRITE, vcpu, (u32)cr,
- (u32)kvm_register_read(vcpu, reg),
- (u32)((u64)kvm_register_read(vcpu, reg) >> 32),
- handler);
+ trace_kvm_cr_write(vcpu, cr, reg);
switch (cr) {
case 0:
kvm_set_cr0(vcpu, kvm_register_read(vcpu, reg));
@@ -2371,23 +2368,19 @@ static int handle_cr(struct kvm_vcpu *vc
vcpu->arch.cr0 &= ~X86_CR0_TS;
vmcs_writel(CR0_READ_SHADOW, vcpu->arch.cr0);
vmx_fpu_activate(vcpu);
- KVMTRACE_0D(CLTS, vcpu, handler);
+ trace_kvm_clts(vcpu);
skip_emulated_instruction(vcpu);
return 1;
case 1: /*mov from cr*/
switch (cr) {
case 3:
kvm_register_write(vcpu, reg, vcpu->arch.cr3);
- KVMTRACE_3D(CR_READ, vcpu, (u32)cr,
- (u32)kvm_register_read(vcpu, reg),
- (u32)((u64)kvm_register_read(vcpu, reg) >> 32),
- handler);
+ trace_kvm_cr_read(vcpu, cr, reg);
skip_emulated_instruction(vcpu);
return 1;
case 8:
kvm_register_write(vcpu, reg, kvm_get_cr8(vcpu));
- KVMTRACE_2D(CR_READ, vcpu, (u32)cr,
- (u32)kvm_register_read(vcpu, reg), handler);
+ trace_kvm_cr_read(vcpu, cr, reg);
skip_emulated_instruction(vcpu);
return 1;
}
@@ -2432,7 +2425,7 @@ static int handle_dr(struct kvm_vcpu *vc
val = 0;
}
kvm_register_write(vcpu, reg, val);
- KVMTRACE_2D(DR_READ, vcpu, (u32)dr, (u32)val, handler);
+ trace_kvm_dr_read(vcpu, dr, val);
} else {
/* mov to dr */
}
@@ -2456,8 +2449,7 @@ static int handle_rdmsr(struct kvm_vcpu
return 1;
}
- KVMTRACE_3D(MSR_READ, vcpu, ecx, (u32)data, (u32)(data >> 32),
- handler);
+ trace_kvm_msr_read(vcpu, ecx, data);
/* FIXME: handling of bits 32:63 of rax, rdx */
vcpu->arch.regs[VCPU_REGS_RAX] = data & -1u;
@@ -2472,8 +2464,7 @@ static int handle_wrmsr(struct kvm_vcpu
u64 data = (vcpu->arch.regs[VCPU_REGS_RAX] & -1u)
| ((u64)(vcpu->arch.regs[VCPU_REGS_RDX] & -1u) << 32);
- KVMTRACE_3D(MSR_WRITE, vcpu, ecx, (u32)data, (u32)(data >> 32),
- handler);
+ trace_kvm_msr_write(vcpu, ecx, data);
if (vmx_set_msr(vcpu, ecx, data) != 0) {
kvm_inject_gp(vcpu, 0);
@@ -2500,7 +2491,7 @@ static int handle_interrupt_window(struc
cpu_based_vm_exec_control &= ~CPU_BASED_VIRTUAL_INTR_PENDING;
vmcs_write32(CPU_BASED_VM_EXEC_CONTROL, cpu_based_vm_exec_control);
- KVMTRACE_0D(PEND_INTR, vcpu, handler);
+ trace_kvm_pend_intr(vcpu);
/*
* If the user space waits to inject interrupts, exit as soon as
@@ -2680,8 +2671,7 @@ static int kvm_handle_exit(struct kvm_ru
struct vcpu_vmx *vmx = to_vmx(vcpu);
u32 vectoring_info = vmx->idt_vectoring_info;
- KVMTRACE_3D(VMEXIT, vcpu, exit_reason, (u32)kvm_rip_read(vcpu),
- (u32)((u64)kvm_rip_read(vcpu) >> 32), entryexit);
+ trace_kvm_vmexit(vcpu, exit_reason);
/* Access CR3 don't cause VMExit in paging mode, so we need
* to sync with guest real CR3. */
@@ -3037,7 +3027,7 @@ static void vmx_vcpu_run(struct kvm_vcpu
/* We need to handle NMIs before interrupts are enabled */
if ((intr_info & INTR_INFO_INTR_TYPE_MASK) == 0x200 &&
(intr_info & INTR_INFO_VALID_MASK)) {
- KVMTRACE_0D(NMI, vcpu, handler);
+ trace_kvm_nmi(vcpu);
asm("int $2");
}
Index: linux-2.6-lttng/arch/x86/kvm/x86.c
===================================================================
--- linux-2.6-lttng.orig/arch/x86/kvm/x86.c 2008-07-17 11:54:36.000000000 -0400
+++ linux-2.6-lttng/arch/x86/kvm/x86.c 2008-07-17 11:54:37.000000000 -0400
@@ -28,6 +28,7 @@
#include <linux/module.h>
#include <linux/mman.h>
#include <linux/highmem.h>
+#include <trace/kvm.h>
#include <asm/kvm_cache_regs.h>
#include <asm/uaccess.h>
@@ -312,10 +313,10 @@ EXPORT_SYMBOL_GPL(kvm_set_cr0);
void kvm_lmsw(struct kvm_vcpu *vcpu, unsigned long msw)
{
- kvm_set_cr0(vcpu, (vcpu->arch.cr0 & ~0x0ful) | (msw & 0x0f));
- KVMTRACE_1D(LMSW, vcpu,
- (u32)((vcpu->arch.cr0 & ~0x0ful) | (msw & 0x0f)),
- handler);
+ unsigned long cr0 = (vcpu->arch.cr0 & ~0x0ful) | (msw & 0x0f);
+
+ kvm_set_cr0(vcpu, cr0);
+ trace_kvm_lmsw(vcpu, cr0);
}
EXPORT_SYMBOL_GPL(kvm_lmsw);
@@ -2045,7 +2046,7 @@ int emulate_invlpg(struct kvm_vcpu *vcpu
int emulate_clts(struct kvm_vcpu *vcpu)
{
- KVMTRACE_0D(CLTS, vcpu, handler);
+ trace_kvm_clts(vcpu);
kvm_x86_ops->set_cr0(vcpu, vcpu->arch.cr0 & ~X86_CR0_TS);
return X86EMUL_CONTINUE;
}
@@ -2345,12 +2346,7 @@ int kvm_emulate_pio(struct kvm_vcpu *vcp
vcpu->arch.pio.guest_page_offset = 0;
vcpu->arch.pio.rep = 0;
- if (vcpu->run->io.direction == KVM_EXIT_IO_IN)
- KVMTRACE_2D(IO_READ, vcpu, vcpu->run->io.port, (u32)size,
- handler);
- else
- KVMTRACE_2D(IO_WRITE, vcpu, vcpu->run->io.port, (u32)size,
- handler);
+ trace_kvm_io(vcpu, size);
val = kvm_register_read(vcpu, VCPU_REGS_RAX);
memcpy(vcpu->arch.pio_data, &val, 4);
@@ -2389,12 +2385,7 @@ int kvm_emulate_pio_string(struct kvm_vc
vcpu->arch.pio.guest_page_offset = offset_in_page(address);
vcpu->arch.pio.rep = rep;
- if (vcpu->run->io.direction == KVM_EXIT_IO_IN)
- KVMTRACE_2D(IO_READ, vcpu, vcpu->run->io.port, (u32)size,
- handler);
- else
- KVMTRACE_2D(IO_WRITE, vcpu, vcpu->run->io.port, (u32)size,
- handler);
+ trace_kvm_io(vcpu, size);
if (!count) {
kvm_x86_ops->skip_emulated_instruction(vcpu);
@@ -2508,7 +2499,7 @@ void kvm_arch_exit(void)
int kvm_emulate_halt(struct kvm_vcpu *vcpu)
{
++vcpu->stat.halt_exits;
- KVMTRACE_0D(HLT, vcpu, handler);
+ trace_kvm_hlt(vcpu);
if (irqchip_in_kernel(vcpu->kvm)) {
vcpu->arch.mp_state = KVM_MP_STATE_HALTED;
up_read(&vcpu->kvm->slots_lock);
@@ -2544,7 +2535,7 @@ int kvm_emulate_hypercall(struct kvm_vcp
a2 = kvm_register_read(vcpu, VCPU_REGS_RDX);
a3 = kvm_register_read(vcpu, VCPU_REGS_RSI);
- KVMTRACE_1D(VMMCALL, vcpu, (u32)nr, handler);
+ trace_kvm_vmmcall(vcpu, nr);
if (!is_long_mode(vcpu)) {
nr &= 0xFFFFFFFF;
@@ -2644,8 +2635,7 @@ unsigned long realmode_get_cr(struct kvm
vcpu_printf(vcpu, "%s: unexpected cr %u\n", __func__, cr);
return 0;
}
- KVMTRACE_3D(CR_READ, vcpu, (u32)cr, (u32)value,
- (u32)((u64)value >> 32), handler);
+ trace_kvm_cr_read_realmode(vcpu, cr, value);
return value;
}
@@ -2653,8 +2643,7 @@ unsigned long realmode_get_cr(struct kvm
void realmode_set_cr(struct kvm_vcpu *vcpu, int cr, unsigned long val,
unsigned long *rflags)
{
- KVMTRACE_3D(CR_WRITE, vcpu, (u32)cr, (u32)val,
- (u32)((u64)val >> 32), handler);
+ trace_kvm_cr_write_realmode(vcpu, cr, val);
switch (cr) {
case 0:
@@ -2745,11 +2734,7 @@ void kvm_emulate_cpuid(struct kvm_vcpu *
kvm_register_write(vcpu, VCPU_REGS_RDX, best->edx);
}
kvm_x86_ops->skip_emulated_instruction(vcpu);
- KVMTRACE_5D(CPUID, vcpu, function,
- (u32)kvm_register_read(vcpu, VCPU_REGS_RAX),
- (u32)kvm_register_read(vcpu, VCPU_REGS_RBX),
- (u32)kvm_register_read(vcpu, VCPU_REGS_RCX),
- (u32)kvm_register_read(vcpu, VCPU_REGS_RDX), handler);
+ trace_kvm_cpuid(vcpu, function);
}
EXPORT_SYMBOL_GPL(kvm_emulate_cpuid);
@@ -2904,8 +2889,8 @@ again:
kvm_guest_enter();
+ trace_kvm_vmentry(vcpu);
- KVMTRACE_0D(VMENTRY, vcpu, entryexit);
kvm_x86_ops->run(vcpu, kvm_run);
vcpu->guest_mode = 0;
Index: linux-2.6-lttng/include/trace/kvm.h
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/trace/kvm.h 2008-07-17 11:54:37.000000000 -0400
@@ -0,0 +1,83 @@
+#ifndef _TRACE_KVM_H
+#define _TRACE_KVM_H
+
+#include <linux/kvm_host.h>
+#include <linux/tracepoint.h>
+
+/*
+ * ENTRYEXIT events.
+ */
+DEFINE_TRACE(kvm_vmentry,
+ TPPROTO(struct kvm_vcpu *vcpu),
+ TPARGS(vcpu));
+DEFINE_TRACE(kvm_vmexit,
+ TPPROTO(struct kvm_vcpu *vcpu, u32 exit_reason),
+ TPARGS(vcpu, exit_reason));
+
+/*
+ * HANDLER events.
+ */
+DEFINE_TRACE(kvm_page_fault,
+ TPPROTO(struct kvm_vcpu *vcpu, u32 error_code, unsigned long cr2),
+ TPARGS(vcpu, error_code, cr2));
+
+DEFINE_TRACE(kvm_inj_virq,
+ TPPROTO(struct kvm_vcpu *vcpu, int irq),
+ TPARGS(vcpu, irq));
+DEFINE_TRACE(kvm_redeliver_evt,
+ TPPROTO(struct kvm_vcpu *vcpu, u32 idtv_info_field),
+ TPARGS(vcpu, idtv_info_field));
+DEFINE_TRACE(kvm_pend_intr,
+ TPPROTO(struct kvm_vcpu *vcpu),
+ TPARGS(vcpu));
+DEFINE_TRACE(kvm_io,
+ TPPROTO(struct kvm_vcpu *vcpu, int size),
+ TPARGS(vcpu, size));
+DEFINE_TRACE(kvm_cr_read,
+ TPPROTO(struct kvm_vcpu *vcpu, int cr, int reg),
+ TPARGS(vcpu, cr, reg));
+DEFINE_TRACE(kvm_cr_write,
+ TPPROTO(struct kvm_vcpu *vcpu, int cr, int reg),
+ TPARGS(vcpu, cr, reg));
+DEFINE_TRACE(kvm_cr_read_realmode,
+ TPPROTO(struct kvm_vcpu *vcpu, int cr, unsigned long value),
+ TPARGS(vcpu, cr, value));
+DEFINE_TRACE(kvm_cr_write_realmode,
+ TPPROTO(struct kvm_vcpu *vcpu, int cr, unsigned long value),
+ TPARGS(vcpu, cr, value));
+DEFINE_TRACE(kvm_dr_read,
+ TPPROTO(struct kvm_vcpu *vcpu, int dr, unsigned long val),
+ TPARGS(vcpu, dr, val));
+/* FIXME : missing dr_write in the original instrumentation */
+DEFINE_TRACE(kvm_msr_read,
+ TPPROTO(struct kvm_vcpu *vcpu, u32 ecx, u64 data),
+ TPARGS(vcpu, ecx, data));
+DEFINE_TRACE(kvm_msr_write,
+ TPPROTO(struct kvm_vcpu *vcpu, u32 ecx, u64 data),
+ TPARGS(vcpu, ecx, data));
+DEFINE_TRACE(kvm_cpuid,
+ TPPROTO(struct kvm_vcpu *vcpu, u32 function),
+ TPARGS(vcpu, function));
+DEFINE_TRACE(kvm_intr,
+ TPPROTO(struct kvm_vcpu *vcpu),
+ TPARGS(vcpu));
+DEFINE_TRACE(kvm_nmi,
+ TPPROTO(struct kvm_vcpu *vcpu),
+ TPARGS(vcpu));
+DEFINE_TRACE(kvm_vmmcall,
+ TPPROTO(struct kvm_vcpu *vcpu, unsigned long nr),
+ TPARGS(vcpu, nr));
+DEFINE_TRACE(kvm_hlt,
+ TPPROTO(struct kvm_vcpu *vcpu),
+ TPARGS(vcpu));
+DEFINE_TRACE(kvm_clts,
+ TPPROTO(struct kvm_vcpu *vcpu),
+ TPARGS(vcpu));
+DEFINE_TRACE(kvm_lmsw,
+ TPPROTO(struct kvm_vcpu *vcpu, unsigned long cr0),
+ TPARGS(vcpu, cr0));
+DEFINE_TRACE(kvm_apic_access,
+ TPPROTO(struct kvm_vcpu *vcpu, unsigned long offset),
+ TPARGS(vcpu, offset));
+
+#endif
Index: linux-2.6-lttng/virt/kvm/kvm_trace.c
===================================================================
--- linux-2.6-lttng.orig/virt/kvm/kvm_trace.c 2008-07-17 11:54:12.000000000 -0400
+++ linux-2.6-lttng/virt/kvm/kvm_trace.c 2008-07-17 11:55:40.000000000 -0400
@@ -6,6 +6,7 @@
* it's possible to reconstruct a chronological record of trace events.
* The implementation refers to blktrace kernel support.
*
+ * Copyright (c) 2008 Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
* Copyright (c) 2008 Intel Corporation
* Copyright (C) 2006 Jens Axboe <axboe@kernel.dk>
*
@@ -19,6 +20,8 @@
#include <linux/debugfs.h>
#include <linux/kvm_host.h>
+#include <trace/kvm.h>
+#include <asm/kvm_cache_regs.h>
#define KVM_TRACE_STATE_RUNNING (1 << 0)
#define KVM_TRACE_STATE_PAUSE (1 << 1)
@@ -32,12 +35,8 @@ struct kvm_trace {
};
static struct kvm_trace *kvm_trace;
-struct kvm_trace_probe {
- const char *name;
- const char *format;
- u32 cycle_in;
- marker_probe_func *probe_func;
-};
+static int register_tracepoints(void);
+static void unregister_tracepoints(void);
static inline int calc_rec_size(int cycle, int extra)
{
@@ -47,48 +46,58 @@ static inline int calc_rec_size(int cycl
return cycle ? rec_size += KVM_TRC_CYCLE_SIZE : rec_size;
}
-static void kvm_add_trace(void *probe_private, void *call_data,
- const char *format, va_list *args)
+static void kvm_add_trace_timestamp(unsigned int evid, struct kvm_vcpu *vcpu,
+ int nr_args, u32 *args)
{
- struct kvm_trace_probe *p = probe_private;
struct kvm_trace *kt = kvm_trace;
struct kvm_trace_rec rec;
- struct kvm_vcpu *vcpu;
- int i, extra, size;
+ int size;
if (unlikely(kt->trace_state != KVM_TRACE_STATE_RUNNING))
return;
- rec.event = va_arg(*args, u32);
- vcpu = va_arg(*args, struct kvm_vcpu *);
+ rec.event = evid;
rec.pid = current->tgid;
rec.vcpu_id = vcpu->vcpu_id;
+ rec.cycle_in = 1;
- extra = va_arg(*args, u32);
- WARN_ON(!(extra <= KVM_TRC_EXTRA_MAX));
- extra = min_t(u32, extra, KVM_TRC_EXTRA_MAX);
- rec.extra_u32 = extra;
-
- rec.cycle_in = p->cycle_in;
-
- if (rec.cycle_in) {
- rec.u.cycle.cycle_u64 = get_cycles();
-
- for (i = 0; i < rec.extra_u32; i++)
- rec.u.cycle.extra_u32[i] = va_arg(*args, u32);
- } else {
- for (i = 0; i < rec.extra_u32; i++)
- rec.u.nocycle.extra_u32[i] = va_arg(*args, u32);
- }
+ WARN_ON(!(nr_args <= KVM_TRC_EXTRA_MAX));
+ nr_args = min_t(u32, nr_args, KVM_TRC_EXTRA_MAX);
+ rec.extra_u32 = nr_args;
+
+ rec.u.cycle.cycle_u64 = get_cycles();
+ if (nr_args > 0)
+ memcpy(rec.u.cycle.extra_u32, args, nr_args * sizeof(u32));
size = calc_rec_size(rec.cycle_in, rec.extra_u32 * sizeof(u32));
relay_write(kt->rchan, &rec, size);
}
-static struct kvm_trace_probe kvm_trace_probes[] = {
- { "kvm_trace_entryexit", "%u %p %u %u %u %u %u %u", 1, kvm_add_trace },
- { "kvm_trace_handler", "%u %p %u %u %u %u %u %u", 0, kvm_add_trace },
-};
+static void kvm_add_trace(unsigned int evid, struct kvm_vcpu *vcpu,
+ int nr_args, u32 *args)
+{
+ struct kvm_trace *kt = kvm_trace;
+ struct kvm_trace_rec rec;
+ int size;
+
+ if (unlikely(kt->trace_state != KVM_TRACE_STATE_RUNNING))
+ return;
+
+ rec.event = evid;
+ rec.pid = current->tgid;
+ rec.vcpu_id = vcpu->vcpu_id;
+ rec.cycle_in = 0;
+
+ WARN_ON(!(nr_args <= KVM_TRC_EXTRA_MAX));
+ nr_args = min_t(u32, nr_args, KVM_TRC_EXTRA_MAX);
+ rec.extra_u32 = nr_args;
+
+ if (nr_args > 0)
+ memcpy(rec.u.nocycle.extra_u32, args, nr_args * sizeof(u32));
+
+ size = calc_rec_size(rec.cycle_in, rec.extra_u32 * sizeof(u32));
+ relay_write(kt->rchan, &rec, size);
+}
static int lost_records_get(void *data, u64 *val)
{
@@ -154,7 +163,7 @@ static struct rchan_callbacks kvm_relay_
static int do_kvm_trace_enable(struct kvm_user_trace_setup *kuts)
{
struct kvm_trace *kt;
- int i, r = -ENOMEM;
+ int r = -ENOMEM;
if (!kuts->buf_size || !kuts->buf_nr)
return -EINVAL;
@@ -177,14 +186,8 @@ static int do_kvm_trace_enable(struct kv
kvm_trace = kt;
- for (i = 0; i < ARRAY_SIZE(kvm_trace_probes); i++) {
- struct kvm_trace_probe *p = &kvm_trace_probes[i];
-
- r = marker_probe_register(p->name, p->format, p->probe_func, p);
- if (r)
- printk(KERN_INFO "Unable to register probe %s\n",
- p->name);
- }
+ r = register_tracepoints();
+ WARN_ON(r);
kvm_trace->trace_state = KVM_TRACE_STATE_RUNNING;
@@ -236,7 +239,6 @@ static int kvm_trace_pause(void)
void kvm_trace_cleanup(void)
{
struct kvm_trace *kt = kvm_trace;
- int i;
if (kt == NULL)
return;
@@ -245,12 +247,7 @@ void kvm_trace_cleanup(void)
kt->trace_state == KVM_TRACE_STATE_PAUSE) {
kt->trace_state = KVM_TRACE_STATE_CLEARUP;
-
- for (i = 0; i < ARRAY_SIZE(kvm_trace_probes); i++) {
- struct kvm_trace_probe *p = &kvm_trace_probes[i];
- marker_probe_unregister(p->name, p->probe_func, p);
- }
-
+ unregister_tracepoints();
relay_close(kt->rchan);
debugfs_remove(kt->lost_file);
kfree(kt);
@@ -280,3 +277,244 @@ int kvm_trace_ioctl(unsigned int ioctl,
return r;
}
+
+/*
+ * Tracepoint probes for KVM-trace.
+ *
+ * The probes end up calling either kvm_add_trace_timestamp (the cycle counter
+ * is read) or kvm_add_trace (no timestamp saved in the trace).
+ */
+
+/*
+ * Probes with timestamp.
+ */
+static void probe_kvm_vmentry(struct kvm_vcpu *vcpu)
+{
+ kvm_add_trace_timestamp(KVM_TRC_VMENTRY, vcpu, 0, NULL);
+}
+
+static void probe_kvm_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason)
+{
+ kvm_add_trace_timestamp(KVM_TRC_VMEXIT, vcpu, 3,
+ (u32 []){ exit_reason,
+ (u32)kvm_rip_read(vcpu),
+ (u32)((u64)kvm_rip_read(vcpu) >> 32) });
+}
+
+/*
+ * Probes without timestamp.
+ */
+static void probe_kvm_page_fault(struct kvm_vcpu *vcpu, u32 error_code,
+ unsigned long cr2)
+{
+ kvm_add_trace(KVM_TRC_PAGE_FAULT, vcpu, 3,
+ (u32 []){ error_code, (u32)cr2, (u32)((u64)cr2 >> 32) });
+}
+
+static void probe_kvm_inj_virq(struct kvm_vcpu *vcpu, int irq)
+{
+ kvm_add_trace(KVM_TRC_VMENTRY, vcpu, 1, (u32 []){ (u32)irq });
+}
+
+static void probe_kvm_redeliver_evt(struct kvm_vcpu *vcpu, u32 idtv_info_field)
+{
+ kvm_add_trace(KVM_TRC_REDELIVER_EVT, vcpu, 1,
+ (u32 []){ idtv_info_field });
+}
+
+static void probe_kvm_pend_intr(struct kvm_vcpu *vcpu)
+{
+ kvm_add_trace(KVM_TRC_PEND_INTR, vcpu, 0, NULL);
+}
+
+static void probe_kvm_io(struct kvm_vcpu *vcpu, int size)
+{
+ if (vcpu->run->io.direction == KVM_EXIT_IO_IN)
+ kvm_add_trace(KVM_TRC_IO_READ, vcpu, 2,
+ (u32 []){ vcpu->run->io.port, (u32)size });
+ else
+ kvm_add_trace(KVM_TRC_IO_WRITE, vcpu, 2,
+ (u32 []){ vcpu->run->io.port, (u32)size });
+}
+
+static void probe_kvm_cr_read(struct kvm_vcpu *vcpu, int cr, int reg)
+{
+ switch (cr) {
+ case 3:
+ kvm_add_trace(KVM_TRC_CR_READ, vcpu, 3,
+ (u32 []){ (u32)cr,
+ (u32)kvm_register_read(vcpu, reg),
+ (u32)((u64)kvm_register_read(vcpu, reg) >> 32) });
+ break;
+ case 8:
+ kvm_add_trace(KVM_TRC_CR_READ, vcpu, 2,
+ (u32 []){ (u32)cr, (u32)kvm_register_read(vcpu, reg) });
+ break;
+ }
+}
+
+static void probe_kvm_cr_write(struct kvm_vcpu *vcpu, int cr, int reg)
+{
+ kvm_add_trace(KVM_TRC_CR_WRITE, vcpu, 3,
+ (u32 []){ (u32)cr,
+ (u32)kvm_register_read(vcpu, reg),
+ (u32)((u64)kvm_register_read(vcpu, reg) >> 32) });
+}
+
+
+static void probe_kvm_cr_read_realmode(struct kvm_vcpu *vcpu, int cr,
+ unsigned long value)
+{
+ kvm_add_trace(KVM_TRC_CR_READ, vcpu, 3,
+ (u32 []){ (u32)cr, (u32)value, (u32)((u64)value >> 32) });
+}
+
+static void probe_kvm_cr_write_realmode(struct kvm_vcpu *vcpu, int cr,
+ unsigned long value)
+{
+ kvm_add_trace(KVM_TRC_CR_WRITE, vcpu, 3,
+ (u32 []){ (u32)cr, (u32)value, (u32)((u64)value >> 32) });
+}
+
+static void probe_kvm_dr_read(struct kvm_vcpu *vcpu, int dr, unsigned long val)
+{
+ kvm_add_trace(KVM_TRC_DR_READ, vcpu, 2,
+ (u32 []){ (u32)dr, (u32)val });
+}
+
+static void probe_kvm_msr_read(struct kvm_vcpu *vcpu, u32 ecx, u64 data)
+{
+ kvm_add_trace(KVM_TRC_MSR_READ, vcpu, 3,
+ (u32 []){ ecx, (u32)data, (u32)(data >> 32) });
+}
+
+static void probe_kvm_msr_write(struct kvm_vcpu *vcpu, u32 ecx, u64 data)
+{
+ kvm_add_trace(KVM_TRC_MSR_WRITE, vcpu, 3,
+ (u32 []){ ecx, (u32)data, (u32)(data >> 32) });
+}
+
+static void probe_kvm_cpuid(struct kvm_vcpu *vcpu, u32 function)
+{
+ kvm_add_trace(KVM_TRC_CPUID, vcpu, 5,
+ (u32 []){ function,
+ (u32)kvm_register_read(vcpu, VCPU_REGS_RAX),
+ (u32)kvm_register_read(vcpu, VCPU_REGS_RBX),
+ (u32)kvm_register_read(vcpu, VCPU_REGS_RCX),
+ (u32)kvm_register_read(vcpu, VCPU_REGS_RDX) });
+}
+
+static void probe_kvm_intr(struct kvm_vcpu *vcpu)
+{
+ kvm_add_trace(KVM_TRC_INTR, vcpu, 1,
+ (u32 []){ vmcs_read32(VM_EXIT_INTR_INFO) });
+}
+
+static void probe_kvm_nmi(struct kvm_vcpu *vcpu)
+{
+ kvm_add_trace(KVM_TRC_NMI, vcpu, 0, NULL);
+}
+
+static void probe_kvm_vmmcall(struct kvm_vcpu *vcpu, unsigned long nr)
+{
+ kvm_add_trace(KVM_TRC_VMMCALL, vcpu, 1, (u32 []){ (u32)nr });
+}
+
+static void probe_kvm_hlt(struct kvm_vcpu *vcpu)
+{
+ kvm_add_trace(KVM_TRC_HLT, vcpu, 0, NULL);
+}
+
+static void probe_kvm_clts(struct kvm_vcpu *vcpu)
+{
+ kvm_add_trace(KVM_TRC_CLTS, vcpu, 0, NULL);
+}
+
+static void probe_kvm_lmsw(struct kvm_vcpu *vcpu, unsigned long cr0)
+{
+ kvm_add_trace(KVM_TRC_LMSW, vcpu, 1, (u32 []){ (u32)cr0 });
+}
+
+static void probe_kvm_apic_access(struct kvm_vcpu *vcpu, unsigned long offset)
+{
+ kvm_add_trace(KVM_TRC_LMSW, vcpu, 1, (u32 []){ (u32)offset });
+}
+
+static int register_tracepoints(void)
+{
+ int ret;
+
+ ret = register_trace_kvm_vmentry(probe_kvm_vmentry);
+ WARN_ON(ret);
+ ret = register_trace_kvm_vmexit(probe_kvm_vmexit);
+ WARN_ON(ret);
+ ret = register_trace_kvm_page_fault(probe_kvm_page_fault);
+ WARN_ON(ret);
+ ret = register_trace_kvm_inj_virq(probe_kvm_inj_virq);
+ WARN_ON(ret);
+ ret = register_trace_kvm_redeliver_evt(probe_kvm_redeliver_evt);
+ WARN_ON(ret);
+ ret = register_trace_kvm_pend_intr(probe_kvm_pend_intr);
+ WARN_ON(ret);
+ ret = register_trace_kvm_io(probe_kvm_io);
+ WARN_ON(ret);
+ ret = register_trace_kvm_cr_read(probe_kvm_cr_read);
+ WARN_ON(ret);
+ ret = register_trace_kvm_cr_write(probe_kvm_cr_write);
+ WARN_ON(ret);
+ ret = register_trace_kvm_cr_read_realmode(probe_kvm_cr_read_realmode);
+ WARN_ON(ret);
+ ret = register_trace_kvm_cr_write_realmode(probe_kvm_cr_write_realmode);
+ WARN_ON(ret);
+ ret = register_trace_kvm_dr_read(probe_kvm_dr_read);
+ WARN_ON(ret);
+ ret = register_trace_kvm_msr_read(probe_kvm_msr_read);
+ WARN_ON(ret);
+ ret = register_trace_kvm_msr_write(probe_kvm_msr_write);
+ WARN_ON(ret);
+ ret = register_trace_kvm_cpuid(probe_kvm_cpuid);
+ WARN_ON(ret);
+ ret = register_trace_kvm_intr(probe_kvm_intr);
+ WARN_ON(ret);
+ ret = register_trace_kvm_nmi(probe_kvm_nmi);
+ WARN_ON(ret);
+ ret = register_trace_kvm_vmmcall(probe_kvm_vmmcall);
+ WARN_ON(ret);
+ ret = register_trace_kvm_hlt(probe_kvm_hlt);
+ WARN_ON(ret);
+ ret = register_trace_kvm_clts(probe_kvm_clts);
+ WARN_ON(ret);
+ ret = register_trace_kvm_lmsw(probe_kvm_lmsw);
+ WARN_ON(ret);
+ ret = register_trace_kvm_apic_access(probe_kvm_apic_access);
+ WARN_ON(ret);
+
+ return 0;
+}
+
+static void unregister_tracepoints(void)
+{
+ unregister_trace_kvm_apic_access(probe_kvm_apic_access);
+ unregister_trace_kvm_lmsw(probe_kvm_lmsw);
+ unregister_trace_kvm_clts(probe_kvm_clts);
+ unregister_trace_kvm_hlt(probe_kvm_hlt);
+ unregister_trace_kvm_vmmcall(probe_kvm_vmmcall);
+ unregister_trace_kvm_nmi(probe_kvm_nmi);
+ unregister_trace_kvm_intr(probe_kvm_intr);
+ unregister_trace_kvm_cpuid(probe_kvm_cpuid);
+ unregister_trace_kvm_msr_write(probe_kvm_msr_write);
+ unregister_trace_kvm_msr_read(probe_kvm_msr_read);
+ unregister_trace_kvm_dr_read(probe_kvm_dr_read);
+ unregister_trace_kvm_cr_write_realmode(probe_kvm_cr_write_realmode);
+ unregister_trace_kvm_cr_read_realmode(probe_kvm_cr_read_realmode);
+ unregister_trace_kvm_cr_write(probe_kvm_cr_write);
+ unregister_trace_kvm_cr_read(probe_kvm_cr_read);
+ unregister_trace_kvm_io(probe_kvm_io);
+ unregister_trace_kvm_pend_intr(probe_kvm_pend_intr);
+ unregister_trace_kvm_redeliver_evt(probe_kvm_redeliver_evt);
+ unregister_trace_kvm_inj_virq(probe_kvm_inj_virq);
+ unregister_trace_kvm_page_fault(probe_kvm_page_fault);
+ unregister_trace_kvm_vmexit(probe_kvm_vmexit);
+ unregister_trace_kvm_vmentry(probe_kvm_vmentry);
+
+}
--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [patch 4/4] KVM-trace port to tracepoints
2008-07-17 15:57 ` [patch 4/4] KVM-trace port to tracepoints Mathieu Desnoyers
@ 2008-07-17 16:49 ` Jan Kiszka
2008-07-17 17:28 ` Mathieu Desnoyers
2008-07-17 16:52 ` Anthony Liguori
1 sibling, 1 reply; 40+ messages in thread
From: Jan Kiszka @ 2008-07-17 16:49 UTC (permalink / raw)
To: Mathieu Desnoyers
Cc: akpm, Ingo Molnar, linux-kernel, Peter Zijlstra, Avi Kivity, kvm,
Feng(Eric) Liu
Mathieu Desnoyers wrote:
> Port/cleanup of KVM-trace to tracepoints.
>
> Tracepoints allow dormat instrumentation, like the kernel markers, but also
> allows to describe the trace points in global headers so they can be easily
> managed. They also do not use format strings.
>
> Anything that would involve an action (dereference a pointer, vmcs read, ...)
> only required when tracing is placed in the probes created in kvm_trace.c
>
> This patch depends on the "Tracepoints" patch.
>
> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
> CC: 'Peter Zijlstra' <peterz@infradead.org>
> CC: 'Feng(Eric) Liu' <eric.e.liu@intel.com>
> CC: Avi Kivity <avi@qumranet.com>
> CC: kvm@vger.kernel.org
> ---
> arch/x86/kvm/vmx.c | 38 ++---
> arch/x86/kvm/x86.c | 43 ++----
> include/trace/kvm.h | 83 ++++++++++++
> virt/kvm/kvm_trace.c | 336 +++++++++++++++++++++++++++++++++++++++++++--------
> 4 files changed, 398 insertions(+), 102 deletions(-)
Is it a specific property of KVM-trace that causes this LOC blow-up? Or
is this a generic side-effect of tracepoints?
[ Hmm, hope I didn't missed too much of the tracepoint discussion... ]
Jan
--
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [patch 4/4] KVM-trace port to tracepoints
2008-07-17 15:57 ` [patch 4/4] KVM-trace port to tracepoints Mathieu Desnoyers
2008-07-17 16:49 ` Jan Kiszka
@ 2008-07-17 16:52 ` Anthony Liguori
2008-07-17 17:04 ` Mathieu Desnoyers
1 sibling, 1 reply; 40+ messages in thread
From: Anthony Liguori @ 2008-07-17 16:52 UTC (permalink / raw)
To: Mathieu Desnoyers
Cc: akpm, Ingo Molnar, linux-kernel, Peter Zijlstra, Avi Kivity, kvm,
Feng(Eric) Liu
Hi Mathieu,
It's difficult to review your patches because they aren't inlined.
At any rate, this patches are unusable with SVM. They try to execute VT
instructions unconditionally. For instance, you changed:
>
> - KVMTRACE_1D(INTR, vcpu, vmcs_read32(VM_EXIT_INTR_INFO), handler);
> + trace_kvm_intr(vcpu);
Which lived in VT-specific code (vmx.c)
To:
> +static void probe_kvm_intr(struct kvm_vcpu *vcpu)
> +{
> + kvm_add_trace(KVM_TRC_INTR, vcpu, 1,
> + (u32 []){ vmcs_read32(VM_EXIT_INTR_INFO) });
> +}
> +
Which lives in common code (kvm_trace.c). But vmcs_read32() is
VT-specific and should not be used in common code so this motion is
wrong. Why not just pass more arguments to probe_kvm_intr()? Then your
first two patches can be dropped completely.
Regards,
Anthony Liguori
Mathieu Desnoyers wrote:
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [patch 4/4] KVM-trace port to tracepoints
2008-07-17 16:52 ` Anthony Liguori
@ 2008-07-17 17:04 ` Mathieu Desnoyers
0 siblings, 0 replies; 40+ messages in thread
From: Mathieu Desnoyers @ 2008-07-17 17:04 UTC (permalink / raw)
To: Anthony Liguori
Cc: akpm, Ingo Molnar, linux-kernel, Peter Zijlstra, Avi Kivity, kvm,
Feng(Eric) Liu
* Anthony Liguori (anthony@codemonkey.ws) wrote:
> Hi Mathieu,
>
Hi Anthony,
> It's difficult to review your patches because they aren't inlined.
>
> At any rate, this patches are unusable with SVM. They try to execute VT
> instructions unconditionally. For instance, you changed:
>>
>> - KVMTRACE_1D(INTR, vcpu, vmcs_read32(VM_EXIT_INTR_INFO), handler);
>> + trace_kvm_intr(vcpu);
>
> Which lived in VT-specific code (vmx.c)
>
> To:
>
>> +static void probe_kvm_intr(struct kvm_vcpu *vcpu)
>> +{
>> + kvm_add_trace(KVM_TRC_INTR, vcpu, 1,
>> + (u32 []){ vmcs_read32(VM_EXIT_INTR_INFO) });
>> +}
>> +
>
> Which lives in common code (kvm_trace.c). But vmcs_read32() is VT-specific
> and should not be used in common code so this motion is wrong. Why not
> just pass more arguments to probe_kvm_intr()? Then your first two patches
> can be dropped completely.
>
Yes, I just noticed that I made a small mistake : the probe code should
actually go in arch/x86/kvm/kvm_trace_probes.c, which is x86-specific.
The reason why I would try to move the vmcs_read32(VM_EXIT_INTR_INFO) to
the probe code is because, unlike the Markers, when a function call with
potential side-effects is put within the arguments, it's really an
argument to a static inline function. In the markers, since it was a
parameter passed to a macro, I could shuffle it into the branch and it
would not be executed when markers were disabled.
However, we don't have this with tracepoints.
kvm-move-register-read-write-to-system-headers.patch becomes
obsolete since I put the probe code in arch/x86/kvm/.
But it would still be required to move vmcs read and encodings to
headers, either to include/asm-x86 or arch/x86/kvm.
Mathieu
> Regards,
>
> Anthony Liguori
>
> Mathieu Desnoyers wrote:
--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [patch 4/4] KVM-trace port to tracepoints
2008-07-17 16:49 ` Jan Kiszka
@ 2008-07-17 17:28 ` Mathieu Desnoyers
2008-07-22 16:04 ` Jan Kiszka
0 siblings, 1 reply; 40+ messages in thread
From: Mathieu Desnoyers @ 2008-07-17 17:28 UTC (permalink / raw)
To: Jan Kiszka
Cc: akpm, Ingo Molnar, linux-kernel, Peter Zijlstra, Avi Kivity, kvm,
Feng(Eric) Liu
* Jan Kiszka (jan.kiszka@siemens.com) wrote:
> Mathieu Desnoyers wrote:
> > Port/cleanup of KVM-trace to tracepoints.
> >
> > Tracepoints allow dormat instrumentation, like the kernel markers, but also
> > allows to describe the trace points in global headers so they can be easily
> > managed. They also do not use format strings.
> >
> > Anything that would involve an action (dereference a pointer, vmcs read, ...)
> > only required when tracing is placed in the probes created in kvm_trace.c
> >
> > This patch depends on the "Tracepoints" patch.
> >
> > Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
> > CC: 'Peter Zijlstra' <peterz@infradead.org>
> > CC: 'Feng(Eric) Liu' <eric.e.liu@intel.com>
> > CC: Avi Kivity <avi@qumranet.com>
> > CC: kvm@vger.kernel.org
> > ---
> > arch/x86/kvm/vmx.c | 38 ++---
> > arch/x86/kvm/x86.c | 43 ++----
> > include/trace/kvm.h | 83 ++++++++++++
> > virt/kvm/kvm_trace.c | 336 +++++++++++++++++++++++++++++++++++++++++++--------
> > 4 files changed, 398 insertions(+), 102 deletions(-)
>
> Is it a specific property of KVM-trace that causes this LOC blow-up? Or
> is this a generic side-effect of tracepoints?
>
> [ Hmm, hope I didn't missed too much of the tracepoint discussion... ]
>
This LOC blow-up is caused by the creation of one probe per
instrumentation site. So instead of placing the argument setup of
everything that goes in the trace (0 to 5 u32 arguments) in the kvm
code, it can be placed separately in a probe object, which could
eventually be a dynamically loadable module.
The primary objective of tracepoints is to make sure the kernel code
does not become harder to read because of added instrumentation and to
provide type-checking at compile-time without needing to put format
strings into the kernel code, which, to some, looks like debugging code.
The other aspect it try to address is maintainability of trace points :
it's much easier to look at all the prototype definitions in
include/trace/*.h and to manage them (and external tracers which would
like to connect on those points) than to try to figure out in which C
files tracing statements has been hidden. We can think of it as a
standard tracing API providing a more or less stable list of kernel
tracepoints.
So, while KVMTRACE_?D() statements suits closely kvm-trace
specificities, it's useless to impose constraints such as splitting
unsigned longs into two u32 for tracers which can support a wider
variety of data types.
After refactoring the patch to put the probes in arch/x86/kvm, the
result is :
arch/x86/kvm/Makefile | 1
arch/x86/kvm/kvm_trace_probes.c | 251 ++++++++++++++++++++++++++++++++++++++++
arch/x86/kvm/vmx.c | 38 ++----
arch/x86/kvm/x86.c | 43 ++----
include/asm-x86/kvm_host.h | 8 +
include/trace/kvm.h | 83 +++++++++++++
virt/kvm/kvm_trace.c | 93 ++++++--------
7 files changed, 414 insertions(+), 103 deletions(-)
So actually, is it better to have less LOC which looks like this :
KVMTRACE_5D(CPUID, vcpu, function,
(u32)kvm_register_read(vcpu, VCPU_REGS_RAX),
(u32)kvm_register_read(vcpu, VCPU_REGS_RBX),
(u32)kvm_register_read(vcpu, VCPU_REGS_RCX),
(u32)kvm_register_read(vcpu, VCPU_REGS_RDX), handler);
or more LOC looking like this :
include/trace/kvm.h:
DEFINE_TRACE(kvm_cpuid,
TPPROTO(struct kvm_vcpu *vcpu, u32 function),
TPARGS(vcpu, function));
arch/x86/kvm/x86.c:
trace_kvm_cpuid(vcpu, function);
arch/x86/kvm/kvm_trace_probes.c:
static void probe_kvm_cpuid(struct kvm_vcpu *vcpu, u32 function)
{
kvm_add_trace(KVM_TRC_CPUID, vcpu, 5,
(u32 []){ function,
(u32)kvm_register_read(vcpu, VCPU_REGS_RAX),
(u32)kvm_register_read(vcpu, VCPU_REGS_RBX),
(u32)kvm_register_read(vcpu, VCPU_REGS_RCX),
(u32)kvm_register_read(vcpu, VCPU_REGS_RDX) });
}
int register_kvm_tracepoints(void)
{
...
ret = register_trace_kvm_cpuid(probe_kvm_cpuid);
WARN_ON(ret);
...
}
void unregister_kvm_tracepoints(void)
{
...
unregister_trace_kvm_cpuid(probe_kvm_cpuid);
...
}
?
Notice that only a single line of code is inserted to the kernel code,
while all the rest sits outsite in a separated probe module. So I think
it's very important to distinguish between LOC which impair kernel code
readability and LOC which sit in their own sandbox.
Mathieu
> Jan
>
> --
> Siemens AG, Corporate Technology, CT SE 2
> Corporate Competence Center Embedded Linux
--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [patch 4/4] KVM-trace port to tracepoints
2008-07-17 17:28 ` Mathieu Desnoyers
@ 2008-07-22 16:04 ` Jan Kiszka
2008-07-22 18:46 ` Avi Kivity
0 siblings, 1 reply; 40+ messages in thread
From: Jan Kiszka @ 2008-07-22 16:04 UTC (permalink / raw)
To: Mathieu Desnoyers
Cc: akpm, Ingo Molnar, linux-kernel, Peter Zijlstra, Avi Kivity, kvm,
Feng(Eric) Liu
Mathieu Desnoyers wrote:
> * Jan Kiszka (jan.kiszka@siemens.com) wrote:
>> Mathieu Desnoyers wrote:
>>> Port/cleanup of KVM-trace to tracepoints.
>>>
>>> Tracepoints allow dormat instrumentation, like the kernel markers, but also
>>> allows to describe the trace points in global headers so they can be easily
>>> managed. They also do not use format strings.
>>>
>>> Anything that would involve an action (dereference a pointer, vmcs read, ...)
>>> only required when tracing is placed in the probes created in kvm_trace.c
>>>
>>> This patch depends on the "Tracepoints" patch.
>>>
>>> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
>>> CC: 'Peter Zijlstra' <peterz@infradead.org>
>>> CC: 'Feng(Eric) Liu' <eric.e.liu@intel.com>
>>> CC: Avi Kivity <avi@qumranet.com>
>>> CC: kvm@vger.kernel.org
>>> ---
>>> arch/x86/kvm/vmx.c | 38 ++---
>>> arch/x86/kvm/x86.c | 43 ++----
>>> include/trace/kvm.h | 83 ++++++++++++
>>> virt/kvm/kvm_trace.c | 336 +++++++++++++++++++++++++++++++++++++++++++--------
>>> 4 files changed, 398 insertions(+), 102 deletions(-)
>> Is it a specific property of KVM-trace that causes this LOC blow-up? Or
>> is this a generic side-effect of tracepoints?
>>
>> [ Hmm, hope I didn't missed too much of the tracepoint discussion... ]
>>
>
> This LOC blow-up is caused by the creation of one probe per
> instrumentation site. So instead of placing the argument setup of
> everything that goes in the trace (0 to 5 u32 arguments) in the kvm
> code, it can be placed separately in a probe object, which could
> eventually be a dynamically loadable module.
>
> The primary objective of tracepoints is to make sure the kernel code
> does not become harder to read because of added instrumentation and to
> provide type-checking at compile-time without needing to put format
> strings into the kernel code, which, to some, looks like debugging code.
> The other aspect it try to address is maintainability of trace points :
> it's much easier to look at all the prototype definitions in
> include/trace/*.h and to manage them (and external tracers which would
> like to connect on those points) than to try to figure out in which C
> files tracing statements has been hidden. We can think of it as a
> standard tracing API providing a more or less stable list of kernel
> tracepoints.
>
> So, while KVMTRACE_?D() statements suits closely kvm-trace
> specificities, it's useless to impose constraints such as splitting
> unsigned longs into two u32 for tracers which can support a wider
> variety of data types.
>
> After refactoring the patch to put the probes in arch/x86/kvm, the
> result is :
>
> arch/x86/kvm/Makefile | 1
> arch/x86/kvm/kvm_trace_probes.c | 251 ++++++++++++++++++++++++++++++++++++++++
> arch/x86/kvm/vmx.c | 38 ++----
> arch/x86/kvm/x86.c | 43 ++----
> include/asm-x86/kvm_host.h | 8 +
> include/trace/kvm.h | 83 +++++++++++++
> virt/kvm/kvm_trace.c | 93 ++++++--------
> 7 files changed, 414 insertions(+), 103 deletions(-)
>
> So actually, is it better to have less LOC which looks like this :
>
> KVMTRACE_5D(CPUID, vcpu, function,
> (u32)kvm_register_read(vcpu, VCPU_REGS_RAX),
> (u32)kvm_register_read(vcpu, VCPU_REGS_RBX),
> (u32)kvm_register_read(vcpu, VCPU_REGS_RCX),
> (u32)kvm_register_read(vcpu, VCPU_REGS_RDX), handler);
>
>
> or more LOC looking like this :
>
> include/trace/kvm.h:
> DEFINE_TRACE(kvm_cpuid,
> TPPROTO(struct kvm_vcpu *vcpu, u32 function),
> TPARGS(vcpu, function));
>
> arch/x86/kvm/x86.c:
> trace_kvm_cpuid(vcpu, function);
>
> arch/x86/kvm/kvm_trace_probes.c:
> static void probe_kvm_cpuid(struct kvm_vcpu *vcpu, u32 function)
> {
> kvm_add_trace(KVM_TRC_CPUID, vcpu, 5,
> (u32 []){ function,
> (u32)kvm_register_read(vcpu, VCPU_REGS_RAX),
> (u32)kvm_register_read(vcpu, VCPU_REGS_RBX),
> (u32)kvm_register_read(vcpu, VCPU_REGS_RCX),
> (u32)kvm_register_read(vcpu, VCPU_REGS_RDX) });
> }
>
> int register_kvm_tracepoints(void)
> {
> ...
> ret = register_trace_kvm_cpuid(probe_kvm_cpuid);
> WARN_ON(ret);
> ...
> }
>
> void unregister_kvm_tracepoints(void)
> {
> ...
> unregister_trace_kvm_cpuid(probe_kvm_cpuid);
> ...
> }
>
> ?
>
> Notice that only a single line of code is inserted to the kernel code,
> while all the rest sits outsite in a separated probe module. So I think
> it's very important to distinguish between LOC which impair kernel code
> readability and LOC which sit in their own sandbox.
That's true - as long as you don't have to add/remove/modify
tracepoints. I had to do this job in the past (not for KVM). Having 1
spot in 1 file (based on generic probes) would be handier in that case
than 5 spots in 3 files. But if the KVM tracepoints are considered
stable in their number and structure, that shouldn't be an issue here.
Jan
--
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [patch 0/4] Port KVM-trace to tracepoints
2008-07-17 15:57 [patch 0/4] Port KVM-trace to tracepoints Mathieu Desnoyers
` (3 preceding siblings ...)
2008-07-17 15:57 ` [patch 4/4] KVM-trace port to tracepoints Mathieu Desnoyers
@ 2008-07-22 18:42 ` Avi Kivity
2008-07-22 19:16 ` Frank Ch. Eigler
4 siblings, 1 reply; 40+ messages in thread
From: Avi Kivity @ 2008-07-22 18:42 UTC (permalink / raw)
To: Mathieu Desnoyers; +Cc: akpm, Ingo Molnar, linux-kernel, Peter Zijlstra, kvm
Mathieu Desnoyers wrote:
> Hi,
>
> Here is a port of KVM-trace, currently using macros on top of the Linux Markers,
> to tracepoints. Note that I cleaned up the instrumentation too, so stuff like
>
> KVMTRACE_3D(CR_WRITE, vcpu, (u32)cr,
> (u32)kvm_register_read(vcpu, reg),
> (u32)((u64)kvm_register_read(vcpu, reg) >> 32),
> handler);
>
> Sprinkled all across the KVM C code becomes :
>
> trace_kvm_cr_write(vcpu, cr, reg);
>
> Which looks much nicer, IMHO.
>
> It applies on top of linux-next patch-v2.6.26-next-20080715.
>
>
It does look nicer; but it means maintaining tracepoints becomes much
harder, because the information is scattered across many files.
kvm tracepoints are heavily tied into the implementation; and making
them harder to write means we will have less information. In fact, I am
contemplating moving in another direction (when looking at the
pgprintk()s scattered around arch/x86/kvm/mmu.c:
kvm_trace("pfentry", "page_fault entry addr %lx error code %x\n",
cr2, error_code);
Unlike printk()s, no actual formatting would occur during runtime.
Instead, at initialization time all the strings would be parsed into a
data structure that describes the data types, and the runtime would
simply consult this structure and copy the arguments into trace
records. User space would also be able to pull this structure and so
recreate the formatted string.
The advantages I see to this are:
- easy to add traces; the most important advantage
- when the code changes, obsolete traces are completely removed
- good performance
- no need to have a formats file in userspace (which is tied to the
kernel version)
- can also send printk()s along, for synchronization with other kvm
and kernel events
In fact, why not convert printk() to do this, making it a
high-performance logging system? Of course it would retain the dmesg
interface, but we could say that messages at the KERN_TRACE level bypass
dmesg and only go to the trace system.
--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [patch 4/4] KVM-trace port to tracepoints
2008-07-22 16:04 ` Jan Kiszka
@ 2008-07-22 18:46 ` Avi Kivity
2008-07-23 7:49 ` Peter Zijlstra
0 siblings, 1 reply; 40+ messages in thread
From: Avi Kivity @ 2008-07-22 18:46 UTC (permalink / raw)
To: Jan Kiszka
Cc: Mathieu Desnoyers, akpm, Ingo Molnar, linux-kernel,
Peter Zijlstra, kvm, Feng(Eric) Liu
Jan Kiszka wrote:
> That's true - as long as you don't have to add/remove/modify
> tracepoints. I had to do this job in the past (not for KVM). Having 1
> spot in 1 file (based on generic probes) would be handier in that case
> than 5 spots in 3 files. But if the KVM tracepoints are considered
> stable in their number and structure, that shouldn't be an issue here.
>
>
Tracepoints aren't stable; they are artefacts of the implementation.
--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [patch 0/4] Port KVM-trace to tracepoints
2008-07-22 18:42 ` [patch 0/4] Port KVM-trace " Avi Kivity
@ 2008-07-22 19:16 ` Frank Ch. Eigler
2008-07-22 19:31 ` Avi Kivity
0 siblings, 1 reply; 40+ messages in thread
From: Frank Ch. Eigler @ 2008-07-22 19:16 UTC (permalink / raw)
To: Avi Kivity
Cc: Mathieu Desnoyers, akpm, Ingo Molnar, linux-kernel,
Peter Zijlstra, kvm
Avi Kivity <avi@qumranet.com> writes:
> [...]
> kvm tracepoints are heavily tied into the implementation; and making
> them harder to write means we will have less information. In fact, I
> am contemplating moving in another direction (when looking at the
> pgprintk()s scattered around arch/x86/kvm/mmu.c:
>
> kvm_trace("pfentry", "page_fault entry addr %lx error code %x\n",
> cr2, error_code);
>
> Unlike printk()s, no actual formatting would occur during runtime.
Have you considered using trace_mark() directly - eliminating the
KVM_TRACEN() middlemen?
> Instead, at initialization time all the strings would be parsed into
> a data structure that describes the data types, and the runtime
> would simply consult this structure and copy the arguments into
> trace records. User space would also be able to pull this structure
> and so recreate the formatted string.
If one really wanted to, one could build such a mechanism on top of
marker-based callbacks.
> The advantages I see to this are:
>
> - easy to add traces; the most important advantage
> - when the code changes, obsolete traces are completely removed
> - good performance
Ditto.
> - no need to have a formats file in userspace (which is tied to the
> kernel version)
OTOH, you'd have the kernel collecting compact binary records
containing just the parameters, which are at least as tied to kernel
version.
> - can also send printk()s along, for synchronization with other kvm
> and kernel events
Ditto. It is elementary to attach a printk-generating marker callback.
- FChE
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [patch 0/4] Port KVM-trace to tracepoints
2008-07-22 19:16 ` Frank Ch. Eigler
@ 2008-07-22 19:31 ` Avi Kivity
2008-07-22 19:54 ` Frank Ch. Eigler
2008-07-22 22:12 ` [patch 0/4] Port KVM-trace to tracepoints -> LTTng ? Mathieu Desnoyers
0 siblings, 2 replies; 40+ messages in thread
From: Avi Kivity @ 2008-07-22 19:31 UTC (permalink / raw)
To: Frank Ch. Eigler
Cc: Mathieu Desnoyers, akpm, Ingo Molnar, linux-kernel,
Peter Zijlstra, kvm
Frank Ch. Eigler wrote:
> Avi Kivity <avi@qumranet.com> writes:
>
>
>> [...]
>> kvm tracepoints are heavily tied into the implementation; and making
>> them harder to write means we will have less information. In fact, I
>> am contemplating moving in another direction (when looking at the
>> pgprintk()s scattered around arch/x86/kvm/mmu.c:
>>
>> kvm_trace("pfentry", "page_fault entry addr %lx error code %x\n",
>> cr2, error_code);
>>
>> Unlike printk()s, no actual formatting would occur during runtime.
>>
>
> Have you considered using trace_mark() directly - eliminating the
> KVM_TRACEN() middlemen?
>
>
Eliminating KVM_TRACEN -- yes. There are too many of them, they aren't
type-aware, and they're in uppercase.
Using trace_mark() directly -- looking at it, seems to fit the
requirements exactly. Should have looked at it earlier. Is there a way
to get a list of all markers?
Perhaps the kvmtrace marker->relay integration should be made a marker
feature, since there is nothing specific to kvm in it.
>> Instead, at initialization time all the strings would be parsed into
>> a data structure that describes the data types, and the runtime
>> would simply consult this structure and copy the arguments into
>> trace records. User space would also be able to pull this structure
>> and so recreate the formatted string.
>>
>
> If one really wanted to, one could build such a mechanism on top of
> marker-based callbacks.
>
>
One does want to.
>> - no need to have a formats file in userspace (which is tied to the
>> kernel version)
>>
>
> OTOH, you'd have the kernel collecting compact binary records
> containing just the parameters, which are at least as tied to kernel
> version.
>
>
Yes, but the userspace side would collect the format strings as well
(just once) and could put them in the same file. The aggregation is
portable across kernel versions.
--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [patch 0/4] Port KVM-trace to tracepoints
2008-07-22 19:31 ` Avi Kivity
@ 2008-07-22 19:54 ` Frank Ch. Eigler
2008-07-22 22:12 ` [patch 0/4] Port KVM-trace to tracepoints -> LTTng ? Mathieu Desnoyers
1 sibling, 0 replies; 40+ messages in thread
From: Frank Ch. Eigler @ 2008-07-22 19:54 UTC (permalink / raw)
To: Avi Kivity
Cc: Mathieu Desnoyers, akpm, Ingo Molnar, linux-kernel,
Peter Zijlstra, kvm
Avi Kivity <avi@qumranet.com> writes:
> [...]
>> Have you considered using trace_mark() directly - eliminating the
>> KVM_TRACEN() middlemen?
> [...]
> Using trace_mark() directly -- looking at it, seems to fit the
> requirements exactly. Should have looked at it earlier. Is there a
> way to get a list of all markers?
>From kernel-space, I can't find just now an API for listing them, but
there probably is / could be one.
>From user-space, lttng tools can probably do it. systemtap can too:
% stap -l 'kernel.mark("*")'
You could prototype binary tracing thusly:
% stap -e 'probe kernel.mark("kvm_foobar") {
printf("%4b%4b%4b", # three 4-byte ints
cpu(), $arg1, $arg2)
}'
> Perhaps the kvmtrace marker->relay integration should be made a marker
> feature, since there is nothing specific to kvm in it.
Right, I believe something like that is in the lttng patch suite.
- FChE
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [patch 0/4] Port KVM-trace to tracepoints -> LTTng ?
2008-07-22 19:31 ` Avi Kivity
2008-07-22 19:54 ` Frank Ch. Eigler
@ 2008-07-22 22:12 ` Mathieu Desnoyers
2008-07-27 10:11 ` Avi Kivity
1 sibling, 1 reply; 40+ messages in thread
From: Mathieu Desnoyers @ 2008-07-22 22:12 UTC (permalink / raw)
To: Avi Kivity
Cc: Frank Ch. Eigler, akpm, Ingo Molnar, linux-kernel, Peter Zijlstra,
kvm
* Avi Kivity (avi@qumranet.com) wrote:
> Frank Ch. Eigler wrote:
>> Avi Kivity <avi@qumranet.com> writes:
>>
>>
>>> [...]
>>> kvm tracepoints are heavily tied into the implementation; and making
>>> them harder to write means we will have less information. In fact, I
>>> am contemplating moving in another direction (when looking at the
>>> pgprintk()s scattered around arch/x86/kvm/mmu.c:
>>>
>>> kvm_trace("pfentry", "page_fault entry addr %lx error code %x\n",
>>> cr2, error_code);
>>>
>>> Unlike printk()s, no actual formatting would occur during runtime.
>>>
>>
>> Have you considered using trace_mark() directly - eliminating the
>> KVM_TRACEN() middlemen?
>>
>>
>
> Eliminating KVM_TRACEN -- yes. There are too many of them, they aren't
> type-aware, and they're in uppercase.
>
> Using trace_mark() directly -- looking at it, seems to fit the requirements
> exactly. Should have looked at it earlier. Is there a way to get a list
> of all markers?
>
> Perhaps the kvmtrace marker->relay integration should be made a marker
> feature, since there is nothing specific to kvm in it.
>
>>> Instead, at initialization time all the strings would be parsed into
>>> a data structure that describes the data types, and the runtime
>>> would simply consult this structure and copy the arguments into
>>> trace records. User space would also be able to pull this structure
>>> and so recreate the formatted string.
>>>
>>
>> If one really wanted to, one could build such a mechanism on top of
>> marker-based callbacks.
>>
>>
>
> One does want to.
>
>>> - no need to have a formats file in userspace (which is tied to the
>>> kernel version)
>>>
>>
>> OTOH, you'd have the kernel collecting compact binary records
>> containing just the parameters, which are at least as tied to kernel
>> version.
>>
>>
>
> Yes, but the userspace side would collect the format strings as well (just
> once) and could put them in the same file. The aggregation is portable
> across kernel versions.
>
Yes,
LTTng does exactly all that.
Please have a look at the current LTTng patchset :
http://ltt.polymtl.ca/lttng/patch-2.6.26-0.11.tar.bz2
The interface to list markers is currently found in /proc/ltt
Commands like :
cat /proc/ltt (list markers)
And echo -n "connect marker_name default dynamic channel_name"
See the script ltt-armall.sh in the package :
http://ltt.polymtl.ca/lttng/ltt-control-0.50-17072008.tar.gz
To see how to arm all markers listed.
General information (compatibility list and quickstart guide are
available at http://ltt.polymtl.ca). Packages also useful : lttv (trace
analyser, including text dump, filtering, gui...) and a userspace marker
package (only supports x86 32/64 currently).
All these packages support any kind of custom markers, because the
marker names/types are listed in the "facilities_*" control tracefile at
trace start, so the traces are self-described. I also list other stuff
(memory maps, irq handler names, system call handler names) at trace
start so we can dynamically have these mapping, independently of the
architecture.
See :
http://ltt.polymtl.ca/packages/lttv-0.10.0-pre14-17072008.tar.gz
http://ltt.polymtl.ca/packages/markers-userspace-0.5.tar.bz2
I'll be more than happy to answer your questions.
Mathieu
> --
> I have a truly marvellous patch that fixes the bug which this
> signature is too narrow to contain.
>
--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [patch 4/4] KVM-trace port to tracepoints
2008-07-22 18:46 ` Avi Kivity
@ 2008-07-23 7:49 ` Peter Zijlstra
2008-07-23 8:08 ` Avi Kivity
0 siblings, 1 reply; 40+ messages in thread
From: Peter Zijlstra @ 2008-07-23 7:49 UTC (permalink / raw)
To: Avi Kivity
Cc: Jan Kiszka, Mathieu Desnoyers, akpm, Ingo Molnar, linux-kernel,
kvm, Feng(Eric) Liu
On Tue, 2008-07-22 at 21:46 +0300, Avi Kivity wrote:
> Jan Kiszka wrote:
> > That's true - as long as you don't have to add/remove/modify
> > tracepoints. I had to do this job in the past (not for KVM). Having 1
> > spot in 1 file (based on generic probes) would be handier in that case
> > than 5 spots in 3 files. But if the KVM tracepoints are considered
> > stable in their number and structure, that shouldn't be an issue here.
> >
> >
>
> Tracepoints aren't stable; they are artefacts of the implementation.
Which IMHO makes it unsuitable for trace_mark() as that will be exported
to user-space, and every time you change your tracepoints you'll change
user visible things - not nice.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [patch 4/4] KVM-trace port to tracepoints
2008-07-23 7:49 ` Peter Zijlstra
@ 2008-07-23 8:08 ` Avi Kivity
2008-07-23 8:55 ` Peter Zijlstra
2008-07-23 13:20 ` Mathieu Desnoyers
0 siblings, 2 replies; 40+ messages in thread
From: Avi Kivity @ 2008-07-23 8:08 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Jan Kiszka, Mathieu Desnoyers, akpm, Ingo Molnar, linux-kernel,
kvm, Feng(Eric) Liu
Peter Zijlstra wrote:
> On Tue, 2008-07-22 at 21:46 +0300, Avi Kivity wrote:
>
>> Jan Kiszka wrote:
>>
>>> That's true - as long as you don't have to add/remove/modify
>>> tracepoints. I had to do this job in the past (not for KVM). Having 1
>>> spot in 1 file (based on generic probes) would be handier in that case
>>> than 5 spots in 3 files. But if the KVM tracepoints are considered
>>> stable in their number and structure, that shouldn't be an issue here.
>>>
>>>
>>>
>> Tracepoints aren't stable; they are artefacts of the implementation.
>>
>
> Which IMHO makes it unsuitable for trace_mark() as that will be exported
> to user-space, and every time you change your tracepoints you'll change
> user visible things - not nice.
>
They are used for debugging (mostly performance related), so changes are
expected.
What uses of trace_mark() depend on a stable interface? blktrace?
--
error compiling committee.c: too many arguments to function
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [patch 4/4] KVM-trace port to tracepoints
2008-07-23 8:08 ` Avi Kivity
@ 2008-07-23 8:55 ` Peter Zijlstra
2008-07-23 9:32 ` Avi Kivity
2008-07-23 10:03 ` Christoph Hellwig
2008-07-23 13:20 ` Mathieu Desnoyers
1 sibling, 2 replies; 40+ messages in thread
From: Peter Zijlstra @ 2008-07-23 8:55 UTC (permalink / raw)
To: Avi Kivity
Cc: Jan Kiszka, Mathieu Desnoyers, akpm, Ingo Molnar, linux-kernel,
kvm, Feng(Eric) Liu
On Wed, 2008-07-23 at 11:08 +0300, Avi Kivity wrote:
> Peter Zijlstra wrote:
> > On Tue, 2008-07-22 at 21:46 +0300, Avi Kivity wrote:
> >
> >> Jan Kiszka wrote:
> >>
> >>> That's true - as long as you don't have to add/remove/modify
> >>> tracepoints. I had to do this job in the past (not for KVM). Having 1
> >>> spot in 1 file (based on generic probes) would be handier in that case
> >>> than 5 spots in 3 files. But if the KVM tracepoints are considered
> >>> stable in their number and structure, that shouldn't be an issue here.
> >>>
> >>>
> >>>
> >> Tracepoints aren't stable; they are artefacts of the implementation.
> >>
> >
> > Which IMHO makes it unsuitable for trace_mark() as that will be exported
> > to user-space, and every time you change your tracepoints you'll change
> > user visible things - not nice.
> >
>
> They are used for debugging (mostly performance related), so changes are
> expected.
>
> What uses of trace_mark() depend on a stable interface? blktrace?
There are currently no trace_mark() sites in the kernel that I'm aware
of (except for the scheduler :-/, and those should be converted to
tracepoints ASAP).
Andrew raised the whole point about trace_mark() generating an
user-visible interface and thus it should be stable, and I agree with
that.
What that means is that trace_mark() can only be used for really stable
points.
This in turn means we might as well use trace points.
Which allows for the conclusion that trace_mark() is not needed and
could be removed from the kernel.
However - it might be handy for ad-hoc debugging purposes that never see
the light of day (linus' git tree in this case). So on those grounds one
could argue against removing trace_mark.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [patch 4/4] KVM-trace port to tracepoints
2008-07-23 8:55 ` Peter Zijlstra
@ 2008-07-23 9:32 ` Avi Kivity
2008-07-23 9:53 ` Peter Zijlstra
2008-07-23 10:03 ` Christoph Hellwig
1 sibling, 1 reply; 40+ messages in thread
From: Avi Kivity @ 2008-07-23 9:32 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Jan Kiszka, Mathieu Desnoyers, akpm, Ingo Molnar, linux-kernel,
kvm, Feng(Eric) Liu
Peter Zijlstra wrote:
> There are currently no trace_mark() sites in the kernel that I'm aware
> of (except for the scheduler :-/, and those should be converted to
> tracepoints ASAP).
>
> Andrew raised the whole point about trace_mark() generating an
> user-visible interface and thus it should be stable, and I agree with
> that.
>
> What that means is that trace_mark() can only be used for really stable
> points.
>
> This in turn means we might as well use trace points.
>
> Which allows for the conclusion that trace_mark() is not needed and
> could be removed from the kernel.
>
> However - it might be handy for ad-hoc debugging purposes that never see
> the light of day (linus' git tree in this case). So on those grounds one
> could argue against removing trace_mark
But trace_mark() is so wonderful. Can't we just declare the tracemarks
as a non-stable interface?
Perhaps add an unstable_trace_mark() to make it clear.
--
error compiling committee.c: too many arguments to function
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [patch 4/4] KVM-trace port to tracepoints
2008-07-23 9:32 ` Avi Kivity
@ 2008-07-23 9:53 ` Peter Zijlstra
2008-07-23 13:15 ` Mathieu Desnoyers
0 siblings, 1 reply; 40+ messages in thread
From: Peter Zijlstra @ 2008-07-23 9:53 UTC (permalink / raw)
To: Avi Kivity
Cc: Jan Kiszka, Mathieu Desnoyers, akpm, Ingo Molnar, linux-kernel,
kvm, Feng(Eric) Liu
On Wed, 2008-07-23 at 12:32 +0300, Avi Kivity wrote:
> Peter Zijlstra wrote:
> > There are currently no trace_mark() sites in the kernel that I'm aware
> > of (except for the scheduler :-/, and those should be converted to
> > tracepoints ASAP).
> >
> > Andrew raised the whole point about trace_mark() generating an
> > user-visible interface and thus it should be stable, and I agree with
> > that.
> >
> > What that means is that trace_mark() can only be used for really stable
> > points.
> >
> > This in turn means we might as well use trace points.
> >
> > Which allows for the conclusion that trace_mark() is not needed and
> > could be removed from the kernel.
> >
> > However - it might be handy for ad-hoc debugging purposes that never see
> > the light of day (linus' git tree in this case). So on those grounds one
> > could argue against removing trace_mark
>
> But trace_mark() is so wonderful.
I guess tastes differ...
> Can't we just declare the tracemarks
> as a non-stable interface?
>
> Perhaps add an unstable_trace_mark() to make it clear.
At the very least it would need its own output channel. But I'm afraid
this will be KS material.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [patch 4/4] KVM-trace port to tracepoints
2008-07-23 8:55 ` Peter Zijlstra
2008-07-23 9:32 ` Avi Kivity
@ 2008-07-23 10:03 ` Christoph Hellwig
2008-07-23 10:08 ` Avi Kivity
1 sibling, 1 reply; 40+ messages in thread
From: Christoph Hellwig @ 2008-07-23 10:03 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Avi Kivity, Jan Kiszka, Mathieu Desnoyers, akpm, Ingo Molnar,
linux-kernel, kvm, Feng(Eric) Liu
On Wed, Jul 23, 2008 at 10:55:03AM +0200, Peter Zijlstra wrote:
> Andrew raised the whole point about trace_mark() generating an
> user-visible interface and thus it should be stable, and I agree with
> that.
I'm not sure why this comes up again and again, but it's utter bullshit.
trace_mark does not generate any kind of user interface, just purely
a kernel internal interface.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [patch 4/4] KVM-trace port to tracepoints
2008-07-23 10:03 ` Christoph Hellwig
@ 2008-07-23 10:08 ` Avi Kivity
2008-07-23 10:13 ` Christoph Hellwig
0 siblings, 1 reply; 40+ messages in thread
From: Avi Kivity @ 2008-07-23 10:08 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Peter Zijlstra, Jan Kiszka, Mathieu Desnoyers, akpm, Ingo Molnar,
linux-kernel, kvm, Feng(Eric) Liu
Christoph Hellwig wrote:
> On Wed, Jul 23, 2008 at 10:55:03AM +0200, Peter Zijlstra wrote:
>
>> Andrew raised the whole point about trace_mark() generating an
>> user-visible interface and thus it should be stable, and I agree with
>> that.
>>
>
> I'm not sure why this comes up again and again, but it's utter bullshit.
> trace_mark does not generate any kind of user interface, just purely
> a kernel internal interface.
>
trace_mark() is implement kvmtrace, which is propagated to userspace.
So while trace_mark() itself is not a userspace interface, one of its
users is.
It's an unstable interface. But so is dmesg; that's the nature of tracing.
--
error compiling committee.c: too many arguments to function
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [patch 4/4] KVM-trace port to tracepoints
2008-07-23 10:08 ` Avi Kivity
@ 2008-07-23 10:13 ` Christoph Hellwig
0 siblings, 0 replies; 40+ messages in thread
From: Christoph Hellwig @ 2008-07-23 10:13 UTC (permalink / raw)
To: Avi Kivity
Cc: Christoph Hellwig, Peter Zijlstra, Jan Kiszka, Mathieu Desnoyers,
akpm, Ingo Molnar, linux-kernel, kvm, Feng(Eric) Liu
On Wed, Jul 23, 2008 at 01:08:41PM +0300, Avi Kivity wrote:
> trace_mark() is implement kvmtrace, which is propagated to userspace.
> So while trace_mark() itself is not a userspace interface, one of its
> users is.
>
> It's an unstable interface. But so is dmesg; that's the nature of tracing.
Trace_mark is as stable as any other kernel interface, and
the data you pass through it is as stable as you want it to. In most
cases like kvmtrace or my spu scheduler tracing code the trace data
is directly forwarded through a userspace interface, and that is as
stable as any freeform interface, e.g. as like printk mentioned above.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [patch 4/4] KVM-trace port to tracepoints
2008-07-23 9:53 ` Peter Zijlstra
@ 2008-07-23 13:15 ` Mathieu Desnoyers
0 siblings, 0 replies; 40+ messages in thread
From: Mathieu Desnoyers @ 2008-07-23 13:15 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Avi Kivity, Jan Kiszka, akpm, Ingo Molnar, linux-kernel, kvm,
Feng(Eric) Liu
* Peter Zijlstra (peterz@infradead.org) wrote:
> On Wed, 2008-07-23 at 12:32 +0300, Avi Kivity wrote:
> > Peter Zijlstra wrote:
> > > There are currently no trace_mark() sites in the kernel that I'm aware
> > > of (except for the scheduler :-/, and those should be converted to
> > > tracepoints ASAP).
> > >
> > > Andrew raised the whole point about trace_mark() generating an
> > > user-visible interface and thus it should be stable, and I agree with
> > > that.
> > >
> > > What that means is that trace_mark() can only be used for really stable
> > > points.
> > >
> > > This in turn means we might as well use trace points.
> > >
> > > Which allows for the conclusion that trace_mark() is not needed and
> > > could be removed from the kernel.
> > >
> > > However - it might be handy for ad-hoc debugging purposes that never see
> > > the light of day (linus' git tree in this case). So on those grounds one
> > > could argue against removing trace_mark
> >
> > But trace_mark() is so wonderful.
>
> I guess tastes differ...
>
> > Can't we just declare the tracemarks
> > as a non-stable interface?
> >
> > Perhaps add an unstable_trace_mark() to make it clear.
>
> At the very least it would need its own output channel. But I'm afraid
> this will be KS material.
>
Hi Peter,
Currently what I have in LTTng includes this output channel. It works
for me, but if I can make it work for others that would be great.
- Tracepoints in kernel code to instrument the kernel.
- LTTng probes connect on those relatively stable tracepoints. They
format the data so it's meaningful to userspace (e.g. extracts the pid
of the prev and next process at sched_switch).
- The LTTng serializer is connected on those markers. It parses the
format string to dynamically reserve space in the relay buffer, write
a timestamp and event ID (one event ID is pre-assigned to a marker
name) and copy the arguments from the stack to the event record (which
has a variable size).
Event IDs and timestamps are added by LTTng, thus not required by
markers. However, one can think of this flow as an efficient and compact
binary data export mechanism to userspace.
Headers exports data type sizes and endianness, a special data channel
exports the mappings between { marker name, ID, format string } so
events are self-described. Therefore, one can add any event he likes and
it will be automatically understood by the tracing toolchain.
If an event is removed or filtered out or modified (by changing its
field name), the userspace trace analyser will detect it and the
specific probe which expects this event will fail to load, leading to
missing analyses, but nothing more than that.
So currently what we would have is, more or less : trace_marks located
within LTTng are kept in sync with userland, but the whole chain also
allows to add "debug-style" trace_marks directly in the kernel code
(this is really useful when trying to perform a low-impact printk-like
runtime bissection of a bug in the kernel code.
I actually see the trace_marks/LTTng combination as a printk which would
extract information in its binary form instead of using text-formatting.
The actual formatting can then be done later, in userland, if ever
needed (many analyses use the raw binary format directly). I guess KS
would be a good opportunity to discuss this interface topic.
Mathieu
--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [patch 4/4] KVM-trace port to tracepoints
2008-07-23 8:08 ` Avi Kivity
2008-07-23 8:55 ` Peter Zijlstra
@ 2008-07-23 13:20 ` Mathieu Desnoyers
1 sibling, 0 replies; 40+ messages in thread
From: Mathieu Desnoyers @ 2008-07-23 13:20 UTC (permalink / raw)
To: Avi Kivity
Cc: Peter Zijlstra, Jan Kiszka, akpm, Ingo Molnar, linux-kernel, kvm,
Feng(Eric) Liu
* Avi Kivity (avi@qumranet.com) wrote:
> Peter Zijlstra wrote:
>> On Tue, 2008-07-22 at 21:46 +0300, Avi Kivity wrote:
>>
>>> Jan Kiszka wrote:
>>>
>>>> That's true - as long as you don't have to add/remove/modify
>>>> tracepoints. I had to do this job in the past (not for KVM). Having 1
>>>> spot in 1 file (based on generic probes) would be handier in that case
>>>> than 5 spots in 3 files. But if the KVM tracepoints are considered
>>>> stable in their number and structure, that shouldn't be an issue here.
>>>>
>>>>
>>> Tracepoints aren't stable; they are artefacts of the implementation.
>>>
>>
>> Which IMHO makes it unsuitable for trace_mark() as that will be exported
>> to user-space, and every time you change your tracepoints you'll change
>> user visible things - not nice.
>>
>
> They are used for debugging (mostly performance related), so changes are
> expected.
>
> What uses of trace_mark() depend on a stable interface? blktrace?
>
Actually, LTTng likes to have the { marker name, field name } pairs
unchanged for the markers it looks for, but that's about it. If a
userspace analysis plugin fails to see a marker (because it is disabled
or changed), it just does not apply its particular analysis on the
trace.
Since the markers and marker types are self-described in the trace,
userspace can detect any change the the present markers, so there is no
need to rely on "version numbers" because we are able to proceed to a
complete marker list verification (names, field names, types) before
starting the trace analysis.
Mathieu
--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [patch 0/4] Port KVM-trace to tracepoints -> LTTng ?
2008-07-22 22:12 ` [patch 0/4] Port KVM-trace to tracepoints -> LTTng ? Mathieu Desnoyers
@ 2008-07-27 10:11 ` Avi Kivity
2008-07-28 0:54 ` [RFC] LTTng merge plan Mathieu Desnoyers
0 siblings, 1 reply; 40+ messages in thread
From: Avi Kivity @ 2008-07-27 10:11 UTC (permalink / raw)
To: Mathieu Desnoyers
Cc: Frank Ch. Eigler, akpm, Ingo Molnar, linux-kernel, Peter Zijlstra,
kvm
Mathieu Desnoyers wrote:
>> Yes, but the userspace side would collect the format strings as well (just
>> once) and could put them in the same file. The aggregation is portable
>> across kernel versions.
>>
>>
>
> Yes,
>
> LTTng does exactly all that.
>
>
>
[snip goodies]
> I'll be more than happy to answer your questions.
>
What's the merge plan for this?
--
error compiling committee.c: too many arguments to function
^ permalink raw reply [flat|nested] 40+ messages in thread
* [RFC] LTTng merge plan
2008-07-27 10:11 ` Avi Kivity
@ 2008-07-28 0:54 ` Mathieu Desnoyers
2008-07-29 16:18 ` Frank Ch. Eigler
0 siblings, 1 reply; 40+ messages in thread
From: Mathieu Desnoyers @ 2008-07-28 0:54 UTC (permalink / raw)
To: Avi Kivity
Cc: Linus Torvalds, Andrew Morton, Ingo Molnar, Peter Zijlstra,
Steven Rostedt, James Bottomley, Frank Ch. Eigler, kvm,
linux-kernel, systemtap-ml, linux-btrace
* Avi Kivity (avi@qumranet.com) wrote:
> Mathieu Desnoyers wrote:
>
>
>>> Yes, but the userspace side would collect the format strings as well
>>> (just once) and could put them in the same file. The aggregation is
>>> portable across kernel versions.
>>>
>>>
>>
>> Yes,
>>
>> LTTng does exactly all that.
>>
>>
>>
>
> [snip goodies]
>
>> I'll be more than happy to answer your questions.
>>
>
> What's the merge plan for this?
>
Hi Avi,
Thanks for asking. Given the amount of expectation from kernel
developers, distributions and users I have seen for kernel tracing at
this year's OLS, I think giving a detailed merge plan for my LTTng work
is appropriate.
Currently, it looks like :
In Ingo's trees :
- Tracepoints, scheduler tracepoints instrumentation, ftrace port to
tracepoints
- Should make it into 2.6.27 since ftrace needs those.
- Immediate Values (faster branch based on load immediate instruction)
Useful for markers and tracepoints, but can also be used for any
compiled-in code that has to be dynamically enabled.
- Aims at 2.6.28
- Text Edit Lock : protection of kernel text modification with a mutex.
Synchronises kprobes and immediate values.
- Aims at 2.6.28
Short-term submission plan
In LTTng patchset
(http://ltt.polymtl.ca/lttng/patch-2.6.26-0.12.tar.bz2)
- Instrumentation
- LTTng tracepoints
- Used by LTTng, SystemTAP and usable specialized probes.
- Port specific sets of tracepoints along with their current users
- ftrace (port currently in Ingo's tree), KVM trace, blktrace.
- Data extraction
- LTTng timestamping
- Based on the CPU cycle counter when synchronized across CPUs.
- Fallback on a simple cache-bouncing atomic counter if no
synchronized fast time source is available. Basically, the idea
is that having the correct event _order_ is more important than
having an approximate time, because this "timestamp" is used to
reorder events which are written in per-CPU buffers. Time updates
can always be recorded as an event in the trace to get an idea of
the kernel time flow.
- LTTng trace management
- netlink interface to start/stop tracing and set the buffer sizes.
- Supports multiple channels (high/medium/low event rate).
Metadata (marker types, list of interrupt handlers...) can be
exported in low event rate channels.
- Supports flight recorder mode (overwriting oldest buffer data),
normal mode (writes to disk, drops events if buffer is full) or
hybrid, or mixed, mode, where the high event rate buffers only
are in flight recorder mode.
- Data relay
- Atomic buffering mechanism which does not call into kernel
primitives except preempt disable. Only touches variables
atomically, does not use any lock. Aims at having minimal
intrusiveness and allowing the largest code coverage (thus not
calling kernel code).
- LTTng marker control
- Currently a /proc/ltt interface with read and write operations to
list markers and connect LTTng probe to individual markers,
specifying in which channel to send the data (I know, should
probably belong to /sys instead, comments welcome) It's not part
of the core marker infrastructure because it depends both on
markers and on the LTTng trace management. It's also responsible
for allocating a numeric ID to a marker which is guaranteed to be
unique as long as there is at least one active trace.
Medium-term submission plan
In LTTng patchset
- Instrumentation
- Userspace tracing interface
- Allow userspace to declare tracepoints and/or markers
- Provide a data extraction interface to collect the tracing data.
- More work needed in this area.
- LTTng statedump
- Exports the kernel data structures to the trace buffers at trace
start. List interrupts, system calls, threads, memory maps, ... It
does not use /proc because :
1 - /proc has nasty races which makes the information "generally
correct" but not more.
2 - /proc exports the information in text format, which is not
as compact as LTTng binary format.
Longer term wishlist
- GCC support for static branch patching
- Improvement on the immediate values for dynamic code activation
A bit more information is available in the slides I just presented at
OLS at :
http://ltt.polymtl.ca/slides/desnoyers-talk-ols2008.pdf
I'll gladly answer to questions/comments.
Mathieu
--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [RFC] LTTng merge plan
2008-07-28 0:54 ` [RFC] LTTng merge plan Mathieu Desnoyers
@ 2008-07-29 16:18 ` Frank Ch. Eigler
2008-07-29 17:01 ` Mathieu Desnoyers
0 siblings, 1 reply; 40+ messages in thread
From: Frank Ch. Eigler @ 2008-07-29 16:18 UTC (permalink / raw)
To: Mathieu Desnoyers
Cc: Avi Kivity, Linus Torvalds, Andrew Morton, Ingo Molnar,
Peter Zijlstra, Steven Rostedt, James Bottomley, kvm,
linux-kernel, systemtap-ml, linux-btrace
Hi, Mathieu -
> [...]
> Currently, [my merge plan] looks like :
>
> In Ingo's trees :
> - Tracepoints, scheduler tracepoints instrumentation, ftrace port to
> tracepoints
> - Should make it into 2.6.27 since ftrace needs those.
> [...]
This is the tracepoints infrastructure, and ...
> Short-term submission plan
>
> In LTTng patchset
> (http://ltt.polymtl.ca/lttng/patch-2.6.26-0.12.tar.bz2)
>
> - Instrumentation
> - LTTng tracepoints
> - Used by LTTng, SystemTAP and usable specialized probes.
> [...]
... this is the "meat", which includes both the tracepoints
and the the tracepoint-to-marker conversion modules, such as those in
"lttng-instrumentation-*-tracepoint-probes.patch", right?
- FChE
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [RFC] LTTng merge plan
2008-07-29 16:18 ` Frank Ch. Eigler
@ 2008-07-29 17:01 ` Mathieu Desnoyers
[not found] ` <20080729211543.GB17097@redhat.com>
0 siblings, 1 reply; 40+ messages in thread
From: Mathieu Desnoyers @ 2008-07-29 17:01 UTC (permalink / raw)
To: Frank Ch. Eigler
Cc: Avi Kivity, Linus Torvalds, Andrew Morton, Ingo Molnar,
Peter Zijlstra, Steven Rostedt, James Bottomley, kvm,
linux-kernel, systemtap-ml, linux-btrace
* Frank Ch. Eigler (fche@redhat.com) wrote:
> Hi, Mathieu -
>
> > [...]
> > Currently, [my merge plan] looks like :
> >
> > In Ingo's trees :
> > - Tracepoints, scheduler tracepoints instrumentation, ftrace port to
> > tracepoints
> > - Should make it into 2.6.27 since ftrace needs those.
> > [...]
>
> This is the tracepoints infrastructure, and ...
>
> > Short-term submission plan
> >
> > In LTTng patchset
> > (http://ltt.polymtl.ca/lttng/patch-2.6.26-0.12.tar.bz2)
> >
> > - Instrumentation
> > - LTTng tracepoints
> > - Used by LTTng, SystemTAP and usable specialized probes.
> > [...]
>
> ... this is the "meat", which includes both the tracepoints
> and the the tracepoint-to-marker conversion modules, such as those in
> "lttng-instrumentation-*-tracepoint-probes.patch", right?
>
> - FChE
Exactly. The tracepoint-to-markers conversion modules bridge between
the in-kernel API (tracepoints) which declares standard location for
kernel instrumentation and user-space visible "markers", so that there
is an indirection level between the instrumented kernel code and what is
visible from userspace.
For instance, we have
in kernel/sched.c:context_switch()
trace_sched_switch(rq, prev, next);
Which exports a struct rq * and two struct task_struct *. They are used
by both ftrace, which needs access to these data structures, and by the
LTTng probes, which extracts the previous PID and state (running,
waiting..) and next PID to be scheduled in.
Basically, these modules turn the data exported by tracepoints,
meaningful only to kernel modules, into data useful for userspace trace
analysis.
Mathieu
--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: module-placed markers/tracepoints
[not found] ` <20080729211543.GB17097@redhat.com>
@ 2008-07-29 22:41 ` Mathieu Desnoyers
2008-07-29 23:01 ` Frank Ch. Eigler
2008-07-30 1:40 ` Rusty Russell
0 siblings, 2 replies; 40+ messages in thread
From: Mathieu Desnoyers @ 2008-07-29 22:41 UTC (permalink / raw)
To: Frank Ch. Eigler; +Cc: Rusty Russell, linux-kernel, akpm
* Frank Ch. Eigler (fche@redhat.com) wrote:
> Hi -
>
> Some locals are wondering -- is there code for (or need for new code
> for) incrementing module reference counts while markers and/or
> tracepoints resident in modules have active clients?
>
> - FChE
Hi Frank,
Probe module unloading is supposed to be done automatically assuming the
following module unload behavior :
1 - Module unload code takes the stop machine lock.
2 - It is assumed that the module exit function is called at that time,
while no preempt-disabled section is executed. This function
unregisters the callbacks.
3 - Given that preemption is disabled around tracepoint/marker calls, no
calls are left to a freed module.
Actually, I notice that it might not be the case. I took for granted
what (I can't remember who ? Rusty ?) told me about module unload code.
If we look at sys_delete_module() :
/* Stop the machine so refcounts can't move and disable module. */
ret = try_stop_module(mod, flags, &forced);
if (ret != 0)
goto out;
/* Never wait if forced. */
if (!forced && module_refcount(mod) != 0)
wait_for_zero_refcount(mod);
mutex_unlock(&module_mutex);
/* Final destruction now noone is using it. */
if (mod->exit != NULL)
mod->exit();
So we see that mod->exit(), which I believe is responsible for calling
the module exit() functions, is called after the stop_machine, not
during the stop machine.
Therefore, what would be needed here is to add a synchronize_sched()
after mod->exit() in module.c or after each tracepoint/marker unregister
in marker.c/tracepoint.c. However, I'd really prefer to add this to
module.c since adding this for _every_ unregister will slow down
processing or multiple probe unregistration too much.
Rusty, am I understanding that correctly ?
Mathieu
--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: module-placed markers/tracepoints
2008-07-29 22:41 ` module-placed markers/tracepoints Mathieu Desnoyers
@ 2008-07-29 23:01 ` Frank Ch. Eigler
2008-07-29 23:19 ` Mathieu Desnoyers
2008-07-30 1:40 ` Rusty Russell
1 sibling, 1 reply; 40+ messages in thread
From: Frank Ch. Eigler @ 2008-07-29 23:01 UTC (permalink / raw)
To: Mathieu Desnoyers; +Cc: Rusty Russell, linux-kernel, akpm
Hi -
On Tue, Jul 29, 2008 at 06:41:16PM -0400, Mathieu Desnoyers wrote:
> > Some locals are wondering -- is there code for (or need for new code
> > for) incrementing module reference counts while markers and/or
> > tracepoints resident in modules have active clients?
>
> Probe module unloading is supposed to be done automatically assuming the
> following module unload behavior [...]
The question was more that if module-placed markers/tracepoints are
armed, is there any mechanism to prevent the modules' unloading. For
kprobes, there is.
- FChE
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: module-placed markers/tracepoints
2008-07-29 23:01 ` Frank Ch. Eigler
@ 2008-07-29 23:19 ` Mathieu Desnoyers
0 siblings, 0 replies; 40+ messages in thread
From: Mathieu Desnoyers @ 2008-07-29 23:19 UTC (permalink / raw)
To: Frank Ch. Eigler; +Cc: Rusty Russell, linux-kernel, akpm
* Frank Ch. Eigler (fche@redhat.com) wrote:
> Hi -
>
> On Tue, Jul 29, 2008 at 06:41:16PM -0400, Mathieu Desnoyers wrote:
> > > Some locals are wondering -- is there code for (or need for new code
> > > for) incrementing module reference counts while markers and/or
> > > tracepoints resident in modules have active clients?
> >
> > Probe module unloading is supposed to be done automatically assuming the
> > following module unload behavior [...]
>
> The question was more that if module-placed markers/tracepoints are
> armed, is there any mechanism to prevent the modules' unloading. For
> kprobes, there is.
>
> - FChE
I see, it's the other way around : declaring a marker/tracepoint in a
module.
When you register to a marker/tracepoint, you actually register the
probe in a hash table. It will be connected to every placed
marker/tracepoint with that given name. Upon module load, markers/tp
that match the name are connected, and upon module unload all markers/tp
are simply freed by module.c : unlike kprobes, there is no need for the
marker/tp infrastructures to keep track of every marker/tp. So we don't
need any refcount on the module which has the marker/tp.
However, the probe question is important; I think a synchronize_sched()
is missing to handle unload of marker/tp probes modules.
Mathieu
--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: module-placed markers/tracepoints
2008-07-29 22:41 ` module-placed markers/tracepoints Mathieu Desnoyers
2008-07-29 23:01 ` Frank Ch. Eigler
@ 2008-07-30 1:40 ` Rusty Russell
2008-07-30 2:27 ` [PATCH] Module : call synchronize_sched() between module exit() and free Mathieu Desnoyers
1 sibling, 1 reply; 40+ messages in thread
From: Rusty Russell @ 2008-07-30 1:40 UTC (permalink / raw)
To: Mathieu Desnoyers; +Cc: Frank Ch. Eigler, linux-kernel, akpm
On Wednesday 30 July 2008 08:41:16 Mathieu Desnoyers wrote:
> Therefore, what would be needed here is to add a synchronize_sched()
> after mod->exit() in module.c or after each tracepoint/marker unregister
> in marker.c/tracepoint.c. However, I'd really prefer to add this to
> module.c since adding this for _every_ unregister will slow down
> processing or multiple probe unregistration too much.
>
> Rusty, am I understanding that correctly ?
>
> Mathieu
Hi Mathieu,
Yes: stop_machine is merely used to atomically check the module refcount
for zero and set the state so it can't be incremented again (ie.
try_module_get will fail).
So placing a tracepoint or marker in a module does not bump the module
refcount? If that's true, then there needs to be some kind of
remove_markers_from_module() call after module->exit(), which should do the
synchronize_sched() or whatever, right?
Rusty.
^ permalink raw reply [flat|nested] 40+ messages in thread
* [PATCH] Module : call synchronize_sched() between module exit() and free.
2008-07-30 1:40 ` Rusty Russell
@ 2008-07-30 2:27 ` Mathieu Desnoyers
2008-07-30 3:04 ` Rusty Russell
2008-07-30 11:40 ` Frank Ch. Eigler
0 siblings, 2 replies; 40+ messages in thread
From: Mathieu Desnoyers @ 2008-07-30 2:27 UTC (permalink / raw)
To: Rusty Russell; +Cc: Frank Ch. Eigler, linux-kernel, akpm
> Hi Mathieu,
>
> Yes: stop_machine is merely used to atomically check the module refcount
> for zero and set the state so it can't be incremented again (ie.
> try_module_get will fail).
>
> So placing a tracepoint or marker in a module does not bump the module
> refcount? If that's true, then there needs to be some kind of
> remove_markers_from_module() call after module->exit(), which should do the
> synchronize_sched() or whatever, right?
>
> Rusty.
Actually, it's not placing a marker/tracepoint in a module which causes
a problem, this is a simple function call after all, and correctly dealt
with by current module.c code.
The problem comes from a probe function (the callback) that would be
registered to be called from a marker and would sit in an unloadable
kernel module. I would not want to tie the refcount of the probe modules
to the fact that they are connected to a marker because it would then
become impossible to unload them due to the fact that unregistration is
done in module exit().
This is one of the reasons why I disable preemption around the marker
site (the function call) : to make sure I can can unregister the
callback, wait for a quiescent state (with synchronize_sched()) and then
free the module memory.
This would give the following supplementary guarantee about module
teardown : every function called with preemption off and unregistered in
the module exit() would reach a quiescent state before the module is
freed. Given this does apply to rarely used code (module unload), I
think it might be ok to simply add a call to synchronize_sched() before
the module memory is freed. Not tying this to markers/tracepoints would
keep the behavior consistant across various build options, which is IMHO
a good thing.
I could also just document that a mandatory "synchronize_sched()" should
be called at the end of the probe module exit() function which makes
sure the probes has reached a quiescent state.
I don't want to add a synchronize_sched() into the marker/tracepoint
probe unregistration code because I want to keep batch probe
unregistration fast enough so it does no take ~5 seconds to unload ~100
probes. (may take longer on a loaded SMP system)
Mathieu
--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Module : call synchronize_sched() between module exit() and free.
2008-07-30 2:27 ` [PATCH] Module : call synchronize_sched() between module exit() and free Mathieu Desnoyers
@ 2008-07-30 3:04 ` Rusty Russell
2008-07-30 4:05 ` Mathieu Desnoyers
2008-07-30 11:40 ` Frank Ch. Eigler
1 sibling, 1 reply; 40+ messages in thread
From: Rusty Russell @ 2008-07-30 3:04 UTC (permalink / raw)
To: Mathieu Desnoyers; +Cc: Frank Ch. Eigler, linux-kernel, akpm
On Wednesday 30 July 2008 12:27:51 Mathieu Desnoyers wrote:
> > Hi Mathieu,
> >
> > Yes: stop_machine is merely used to atomically check the module
> > refcount for zero and set the state so it can't be incremented again (ie.
> > try_module_get will fail).
> >
> > So placing a tracepoint or marker in a module does not bump the module
> > refcount? If that's true, then there needs to be some kind of
> > remove_markers_from_module() call after module->exit(), which should do
> > the synchronize_sched() or whatever, right?
> >
> > Rusty.
>
> Actually, it's not placing a marker/tracepoint in a module which causes
> a problem, this is a simple function call after all, and correctly dealt
> with by current module.c code.
>
> The problem comes from a probe function (the callback) that would be
> registered to be called from a marker and would sit in an unloadable
> kernel module. I would not want to tie the refcount of the probe modules
> to the fact that they are connected to a marker because it would then
> become impossible to unload them due to the fact that unregistration is
> done in module exit().
Hi Mathieu,
Still confused, sorry. Why don't you don't do a synchronize_sched() at
the end of your module's exit routine? "You must be completely finished by
the time ->exit() returns" is the rule so far...
Rusty.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Module : call synchronize_sched() between module exit() and free.
2008-07-30 3:04 ` Rusty Russell
@ 2008-07-30 4:05 ` Mathieu Desnoyers
0 siblings, 0 replies; 40+ messages in thread
From: Mathieu Desnoyers @ 2008-07-30 4:05 UTC (permalink / raw)
To: Rusty Russell; +Cc: Frank Ch. Eigler, linux-kernel, akpm
* Rusty Russell (rusty@rustcorp.com.au) wrote:
> On Wednesday 30 July 2008 12:27:51 Mathieu Desnoyers wrote:
> > > Hi Mathieu,
> > >
> > > Yes: stop_machine is merely used to atomically check the module
> > > refcount for zero and set the state so it can't be incremented again (ie.
> > > try_module_get will fail).
> > >
> > > So placing a tracepoint or marker in a module does not bump the module
> > > refcount? If that's true, then there needs to be some kind of
> > > remove_markers_from_module() call after module->exit(), which should do
> > > the synchronize_sched() or whatever, right?
> > >
> > > Rusty.
> >
> > Actually, it's not placing a marker/tracepoint in a module which causes
> > a problem, this is a simple function call after all, and correctly dealt
> > with by current module.c code.
> >
> > The problem comes from a probe function (the callback) that would be
> > registered to be called from a marker and would sit in an unloadable
> > kernel module. I would not want to tie the refcount of the probe modules
> > to the fact that they are connected to a marker because it would then
> > become impossible to unload them due to the fact that unregistration is
> > done in module exit().
>
> Hi Mathieu,
>
> Still confused, sorry. Why don't you don't do a synchronize_sched() at
> the end of your module's exit routine? "You must be completely finished by
> the time ->exit() returns" is the rule so far...
>
Hi Rusty,
Yes, this is I think the right solution. Sorry for the title of the
previous email : I started writing a patch and changed my mind half-way.
Doing a synchronize_sched() at the end of the exit() function seems like
the right thing to do. I'll have to update the documentation and
examples accordingly.
Thanks,
Mathieu
> Rusty.
--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Module : call synchronize_sched() between module exit() and free.
2008-07-30 2:27 ` [PATCH] Module : call synchronize_sched() between module exit() and free Mathieu Desnoyers
2008-07-30 3:04 ` Rusty Russell
@ 2008-07-30 11:40 ` Frank Ch. Eigler
2008-07-30 14:09 ` Mathieu Desnoyers
1 sibling, 1 reply; 40+ messages in thread
From: Frank Ch. Eigler @ 2008-07-30 11:40 UTC (permalink / raw)
To: Mathieu Desnoyers; +Cc: Rusty Russell, linux-kernel, akpm
Hi -
On Tue, Jul 29, 2008 at 10:27:51PM -0400, Mathieu Desnoyers wrote:
> [...]
> Actually, it's not placing a marker/tracepoint in a module which causes
> a problem, this is a simple function call after all, and correctly dealt
> with by current module.c code.
> [...]
Just to spell it out, it is this scenario I'd like to see documented:
module-foo.c:
foo() { ... trace_mark (foo, "..."); ... }
module-bar.c:
setup() { ... marker_probe_register ("foo" , ..., &foo_handler ); }
teardown() { ... marker_probe_unregister ("foo" , ..., &foo_handler ); }
foo_handler() { }
1) module-foo loads
2) module-bar loads
3) module-bar.c:setup()
4) module-foo unloads
What happens here? Certainly no more calls to foo_handler, but is
that all? (Would it not be desirable for an active marker to cause
module-foo's refcount to increase, so as to prevent unloading at this
time?)
5) module-bar.c:teardown()
Can this teardown code succeed fully even if module-foo is already
dead and gone?
- FChE
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Module : call synchronize_sched() between module exit() and free.
2008-07-30 11:40 ` Frank Ch. Eigler
@ 2008-07-30 14:09 ` Mathieu Desnoyers
2008-07-31 0:54 ` Rusty Russell
0 siblings, 1 reply; 40+ messages in thread
From: Mathieu Desnoyers @ 2008-07-30 14:09 UTC (permalink / raw)
To: Frank Ch. Eigler; +Cc: Rusty Russell, linux-kernel, akpm
* Frank Ch. Eigler (fche@redhat.com) wrote:
> Hi -
>
> On Tue, Jul 29, 2008 at 10:27:51PM -0400, Mathieu Desnoyers wrote:
> > [...]
> > Actually, it's not placing a marker/tracepoint in a module which causes
> > a problem, this is a simple function call after all, and correctly dealt
> > with by current module.c code.
> > [...]
>
> Just to spell it out, it is this scenario I'd like to see documented:
>
> module-foo.c:
> foo() { ... trace_mark (foo, "..."); ... }
>
> module-bar.c:
> setup() { ... marker_probe_register ("foo" , ..., &foo_handler ); }
> teardown() { ... marker_probe_unregister ("foo" , ..., &foo_handler ); }
> foo_handler() { }
>
> 1) module-foo loads
> 2) module-bar loads
> 3) module-bar.c:setup()
> 4) module-foo unloads
>
> What happens here? Certainly no more calls to foo_handler, but is
> that all?
Yep, that's it. However a hash table still keeps track of the "foo"
handler so that it's reconnected whenever module-foo is reloaded.
>(Would it not be desirable for an active marker to cause
> module-foo's refcount to increase, so as to prevent unloading at this
> time?)
No, because I want to be able to unload the marked module and I don't
want the fact that a probe is connected to it to change that.
>
> 5) module-bar.c:teardown()
>
> Can this teardown code succeed fully even if module-foo is already
> dead and gone?
>
Yes.
Actually there is a detail missing here. Your teardown should be :
teardown() { ... marker_probe_unregister ("foo" , ..., &foo_handler );
synchronize_sched(); /* Before returning from exit */ }
This makes sure that every live marker call are finished and that it is
safe to unload module-bar (the probe).
Mathieu
>
> - FChE
--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Module : call synchronize_sched() between module exit() and free.
2008-07-30 14:09 ` Mathieu Desnoyers
@ 2008-07-31 0:54 ` Rusty Russell
0 siblings, 0 replies; 40+ messages in thread
From: Rusty Russell @ 2008-07-31 0:54 UTC (permalink / raw)
To: Mathieu Desnoyers; +Cc: Frank Ch. Eigler, linux-kernel, akpm
On Thursday 31 July 2008 00:09:38 Mathieu Desnoyers wrote:
> * Frank Ch. Eigler (fche@redhat.com) wrote:
> >(Would it not be desirable for an active marker to cause
> > module-foo's refcount to increase, so as to prevent unloading at this
> > time?)
>
> No, because I want to be able to unload the marked module and I don't
> want the fact that a probe is connected to it to change that.
Also you might want to put a marker in the module's exit code.
> Actually there is a detail missing here. Your teardown should be :
> teardown() { ... marker_probe_unregister ("foo" , ..., &foo_handler );
> synchronize_sched(); /* Before returning from exit */ }
>
> This makes sure that every live marker call are finished and that it is
> safe to unload module-bar (the probe).
Perhaps create a synchronize_marker_unregister() wrapper for such uses; it's
better documentation and the caller doesn't have to know exactly how it
works...
Cheers,
Rusty.
^ permalink raw reply [flat|nested] 40+ messages in thread
end of thread, other threads:[~2008-07-31 0:54 UTC | newest]
Thread overview: 40+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-07-17 15:57 [patch 0/4] Port KVM-trace to tracepoints Mathieu Desnoyers
2008-07-17 15:57 ` [patch 1/4] kvm move VMCS Encodings to system headers Mathieu Desnoyers
2008-07-17 15:57 ` [patch 2/4] kvm move VMCS read " Mathieu Desnoyers
2008-07-17 15:57 ` [patch 3/4] KVM move register read-write " Mathieu Desnoyers
2008-07-17 15:57 ` [patch 4/4] KVM-trace port to tracepoints Mathieu Desnoyers
2008-07-17 16:49 ` Jan Kiszka
2008-07-17 17:28 ` Mathieu Desnoyers
2008-07-22 16:04 ` Jan Kiszka
2008-07-22 18:46 ` Avi Kivity
2008-07-23 7:49 ` Peter Zijlstra
2008-07-23 8:08 ` Avi Kivity
2008-07-23 8:55 ` Peter Zijlstra
2008-07-23 9:32 ` Avi Kivity
2008-07-23 9:53 ` Peter Zijlstra
2008-07-23 13:15 ` Mathieu Desnoyers
2008-07-23 10:03 ` Christoph Hellwig
2008-07-23 10:08 ` Avi Kivity
2008-07-23 10:13 ` Christoph Hellwig
2008-07-23 13:20 ` Mathieu Desnoyers
2008-07-17 16:52 ` Anthony Liguori
2008-07-17 17:04 ` Mathieu Desnoyers
2008-07-22 18:42 ` [patch 0/4] Port KVM-trace " Avi Kivity
2008-07-22 19:16 ` Frank Ch. Eigler
2008-07-22 19:31 ` Avi Kivity
2008-07-22 19:54 ` Frank Ch. Eigler
2008-07-22 22:12 ` [patch 0/4] Port KVM-trace to tracepoints -> LTTng ? Mathieu Desnoyers
2008-07-27 10:11 ` Avi Kivity
2008-07-28 0:54 ` [RFC] LTTng merge plan Mathieu Desnoyers
2008-07-29 16:18 ` Frank Ch. Eigler
2008-07-29 17:01 ` Mathieu Desnoyers
[not found] ` <20080729211543.GB17097@redhat.com>
2008-07-29 22:41 ` module-placed markers/tracepoints Mathieu Desnoyers
2008-07-29 23:01 ` Frank Ch. Eigler
2008-07-29 23:19 ` Mathieu Desnoyers
2008-07-30 1:40 ` Rusty Russell
2008-07-30 2:27 ` [PATCH] Module : call synchronize_sched() between module exit() and free Mathieu Desnoyers
2008-07-30 3:04 ` Rusty Russell
2008-07-30 4:05 ` Mathieu Desnoyers
2008-07-30 11:40 ` Frank Ch. Eigler
2008-07-30 14:09 ` Mathieu Desnoyers
2008-07-31 0:54 ` Rusty Russell
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox