* [patch 0/4] Port KVM-trace to tracepoints
@ 2008-07-17 15:57 Mathieu Desnoyers
2008-07-17 15:57 ` [patch 1/4] kvm move VMCS Encodings to system headers Mathieu Desnoyers
` (4 more replies)
0 siblings, 5 replies; 40+ messages in thread
From: Mathieu Desnoyers @ 2008-07-17 15:57 UTC (permalink / raw)
To: akpm, Ingo Molnar, linux-kernel, Peter Zijlstra, Avi Kivity, kvm
Hi,
Here is a port of KVM-trace, currently using macros on top of the Linux Markers,
to tracepoints. Note that I cleaned up the instrumentation too, so stuff like
KVMTRACE_3D(CR_WRITE, vcpu, (u32)cr,
(u32)kvm_register_read(vcpu, reg),
(u32)((u64)kvm_register_read(vcpu, reg) >> 32),
handler);
Sprinkled all across the KVM C code becomes :
trace_kvm_cr_write(vcpu, cr, reg);
Which looks much nicer, IMHO.
It applies on top of linux-next patch-v2.6.26-next-20080715.
Mathieu
--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
^ permalink raw reply [flat|nested] 40+ messages in thread* [patch 1/4] kvm move VMCS Encodings to system headers 2008-07-17 15:57 [patch 0/4] Port KVM-trace to tracepoints Mathieu Desnoyers @ 2008-07-17 15:57 ` Mathieu Desnoyers 2008-07-17 15:57 ` [patch 2/4] kvm move VMCS read " Mathieu Desnoyers ` (3 subsequent siblings) 4 siblings, 0 replies; 40+ messages in thread From: Mathieu Desnoyers @ 2008-07-17 15:57 UTC (permalink / raw) To: akpm, Ingo Molnar, linux-kernel, Peter Zijlstra, Avi Kivity, kvm Cc: Mathieu Desnoyers, Feng(Eric) Liu [-- Attachment #1: kvm-move-vmcs-encodings-to-system-headers.patch --] [-- Type: text/plain, Size: 14937 bytes --] The VMCS Encodings will be needed by the kvm-trace probes. Put them in system-side headers instead of local header. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> CC: 'Peter Zijlstra' <peterz@infradead.org> CC: 'Feng(Eric) Liu' <eric.e.liu@intel.com> CC: Avi Kivity <avi@qumranet.com> CC: kvm@vger.kernel.org --- arch/x86/kvm/vmx.h | 142 --------------------------------------------- include/asm-x86/kvm_host.h | 142 +++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 142 insertions(+), 142 deletions(-) Index: linux-2.6-lttng/include/asm-x86/kvm_host.h =================================================================== --- linux-2.6-lttng.orig/include/asm-x86/kvm_host.h 2008-07-17 11:15:31.000000000 -0400 +++ linux-2.6-lttng/include/asm-x86/kvm_host.h 2008-07-17 11:16:30.000000000 -0400 @@ -719,4 +719,146 @@ asmlinkage void kvm_handle_fault_on_rebo KVM_EX_ENTRY " 666b, 667b \n\t" \ ".popsection" +/* VMCS Encodings */ +enum vmcs_field { + VIRTUAL_PROCESSOR_ID = 0x00000000, + GUEST_ES_SELECTOR = 0x00000800, + GUEST_CS_SELECTOR = 0x00000802, + GUEST_SS_SELECTOR = 0x00000804, + GUEST_DS_SELECTOR = 0x00000806, + GUEST_FS_SELECTOR = 0x00000808, + GUEST_GS_SELECTOR = 0x0000080a, + GUEST_LDTR_SELECTOR = 0x0000080c, + GUEST_TR_SELECTOR = 0x0000080e, + HOST_ES_SELECTOR = 0x00000c00, + HOST_CS_SELECTOR = 0x00000c02, + HOST_SS_SELECTOR = 0x00000c04, + HOST_DS_SELECTOR = 0x00000c06, + HOST_FS_SELECTOR = 0x00000c08, + HOST_GS_SELECTOR = 0x00000c0a, + HOST_TR_SELECTOR = 0x00000c0c, + IO_BITMAP_A = 0x00002000, + IO_BITMAP_A_HIGH = 0x00002001, + IO_BITMAP_B = 0x00002002, + IO_BITMAP_B_HIGH = 0x00002003, + MSR_BITMAP = 0x00002004, + MSR_BITMAP_HIGH = 0x00002005, + VM_EXIT_MSR_STORE_ADDR = 0x00002006, + VM_EXIT_MSR_STORE_ADDR_HIGH = 0x00002007, + VM_EXIT_MSR_LOAD_ADDR = 0x00002008, + VM_EXIT_MSR_LOAD_ADDR_HIGH = 0x00002009, + VM_ENTRY_MSR_LOAD_ADDR = 0x0000200a, + VM_ENTRY_MSR_LOAD_ADDR_HIGH = 0x0000200b, + TSC_OFFSET = 0x00002010, + TSC_OFFSET_HIGH = 0x00002011, + VIRTUAL_APIC_PAGE_ADDR = 0x00002012, + VIRTUAL_APIC_PAGE_ADDR_HIGH = 0x00002013, + APIC_ACCESS_ADDR = 0x00002014, + APIC_ACCESS_ADDR_HIGH = 0x00002015, + EPT_POINTER = 0x0000201a, + EPT_POINTER_HIGH = 0x0000201b, + GUEST_PHYSICAL_ADDRESS = 0x00002400, + GUEST_PHYSICAL_ADDRESS_HIGH = 0x00002401, + VMCS_LINK_POINTER = 0x00002800, + VMCS_LINK_POINTER_HIGH = 0x00002801, + GUEST_IA32_DEBUGCTL = 0x00002802, + GUEST_IA32_DEBUGCTL_HIGH = 0x00002803, + GUEST_PDPTR0 = 0x0000280a, + GUEST_PDPTR0_HIGH = 0x0000280b, + GUEST_PDPTR1 = 0x0000280c, + GUEST_PDPTR1_HIGH = 0x0000280d, + GUEST_PDPTR2 = 0x0000280e, + GUEST_PDPTR2_HIGH = 0x0000280f, + GUEST_PDPTR3 = 0x00002810, + GUEST_PDPTR3_HIGH = 0x00002811, + PIN_BASED_VM_EXEC_CONTROL = 0x00004000, + CPU_BASED_VM_EXEC_CONTROL = 0x00004002, + EXCEPTION_BITMAP = 0x00004004, + PAGE_FAULT_ERROR_CODE_MASK = 0x00004006, + PAGE_FAULT_ERROR_CODE_MATCH = 0x00004008, + CR3_TARGET_COUNT = 0x0000400a, + VM_EXIT_CONTROLS = 0x0000400c, + VM_EXIT_MSR_STORE_COUNT = 0x0000400e, + VM_EXIT_MSR_LOAD_COUNT = 0x00004010, + VM_ENTRY_CONTROLS = 0x00004012, + VM_ENTRY_MSR_LOAD_COUNT = 0x00004014, + VM_ENTRY_INTR_INFO_FIELD = 0x00004016, + VM_ENTRY_EXCEPTION_ERROR_CODE = 0x00004018, + VM_ENTRY_INSTRUCTION_LEN = 0x0000401a, + TPR_THRESHOLD = 0x0000401c, + SECONDARY_VM_EXEC_CONTROL = 0x0000401e, + VM_INSTRUCTION_ERROR = 0x00004400, + VM_EXIT_REASON = 0x00004402, + VM_EXIT_INTR_INFO = 0x00004404, + VM_EXIT_INTR_ERROR_CODE = 0x00004406, + IDT_VECTORING_INFO_FIELD = 0x00004408, + IDT_VECTORING_ERROR_CODE = 0x0000440a, + VM_EXIT_INSTRUCTION_LEN = 0x0000440c, + VMX_INSTRUCTION_INFO = 0x0000440e, + GUEST_ES_LIMIT = 0x00004800, + GUEST_CS_LIMIT = 0x00004802, + GUEST_SS_LIMIT = 0x00004804, + GUEST_DS_LIMIT = 0x00004806, + GUEST_FS_LIMIT = 0x00004808, + GUEST_GS_LIMIT = 0x0000480a, + GUEST_LDTR_LIMIT = 0x0000480c, + GUEST_TR_LIMIT = 0x0000480e, + GUEST_GDTR_LIMIT = 0x00004810, + GUEST_IDTR_LIMIT = 0x00004812, + GUEST_ES_AR_BYTES = 0x00004814, + GUEST_CS_AR_BYTES = 0x00004816, + GUEST_SS_AR_BYTES = 0x00004818, + GUEST_DS_AR_BYTES = 0x0000481a, + GUEST_FS_AR_BYTES = 0x0000481c, + GUEST_GS_AR_BYTES = 0x0000481e, + GUEST_LDTR_AR_BYTES = 0x00004820, + GUEST_TR_AR_BYTES = 0x00004822, + GUEST_INTERRUPTIBILITY_INFO = 0x00004824, + GUEST_ACTIVITY_STATE = 0X00004826, + GUEST_SYSENTER_CS = 0x0000482A, + HOST_IA32_SYSENTER_CS = 0x00004c00, + CR0_GUEST_HOST_MASK = 0x00006000, + CR4_GUEST_HOST_MASK = 0x00006002, + CR0_READ_SHADOW = 0x00006004, + CR4_READ_SHADOW = 0x00006006, + CR3_TARGET_VALUE0 = 0x00006008, + CR3_TARGET_VALUE1 = 0x0000600a, + CR3_TARGET_VALUE2 = 0x0000600c, + CR3_TARGET_VALUE3 = 0x0000600e, + EXIT_QUALIFICATION = 0x00006400, + GUEST_LINEAR_ADDRESS = 0x0000640a, + GUEST_CR0 = 0x00006800, + GUEST_CR3 = 0x00006802, + GUEST_CR4 = 0x00006804, + GUEST_ES_BASE = 0x00006806, + GUEST_CS_BASE = 0x00006808, + GUEST_SS_BASE = 0x0000680a, + GUEST_DS_BASE = 0x0000680c, + GUEST_FS_BASE = 0x0000680e, + GUEST_GS_BASE = 0x00006810, + GUEST_LDTR_BASE = 0x00006812, + GUEST_TR_BASE = 0x00006814, + GUEST_GDTR_BASE = 0x00006816, + GUEST_IDTR_BASE = 0x00006818, + GUEST_DR7 = 0x0000681a, + GUEST_RSP = 0x0000681c, + GUEST_RIP = 0x0000681e, + GUEST_RFLAGS = 0x00006820, + GUEST_PENDING_DBG_EXCEPTIONS = 0x00006822, + GUEST_SYSENTER_ESP = 0x00006824, + GUEST_SYSENTER_EIP = 0x00006826, + HOST_CR0 = 0x00006c00, + HOST_CR3 = 0x00006c02, + HOST_CR4 = 0x00006c04, + HOST_FS_BASE = 0x00006c06, + HOST_GS_BASE = 0x00006c08, + HOST_TR_BASE = 0x00006c0a, + HOST_GDTR_BASE = 0x00006c0c, + HOST_IDTR_BASE = 0x00006c0e, + HOST_IA32_SYSENTER_ESP = 0x00006c10, + HOST_IA32_SYSENTER_EIP = 0x00006c12, + HOST_RSP = 0x00006c14, + HOST_RIP = 0x00006c16, +}; + #endif Index: linux-2.6-lttng/arch/x86/kvm/vmx.h =================================================================== --- linux-2.6-lttng.orig/arch/x86/kvm/vmx.h 2008-07-17 11:15:17.000000000 -0400 +++ linux-2.6-lttng/arch/x86/kvm/vmx.h 2008-07-17 11:16:30.000000000 -0400 @@ -68,148 +68,6 @@ #define VM_ENTRY_SMM 0x00000400 #define VM_ENTRY_DEACT_DUAL_MONITOR 0x00000800 -/* VMCS Encodings */ -enum vmcs_field { - VIRTUAL_PROCESSOR_ID = 0x00000000, - GUEST_ES_SELECTOR = 0x00000800, - GUEST_CS_SELECTOR = 0x00000802, - GUEST_SS_SELECTOR = 0x00000804, - GUEST_DS_SELECTOR = 0x00000806, - GUEST_FS_SELECTOR = 0x00000808, - GUEST_GS_SELECTOR = 0x0000080a, - GUEST_LDTR_SELECTOR = 0x0000080c, - GUEST_TR_SELECTOR = 0x0000080e, - HOST_ES_SELECTOR = 0x00000c00, - HOST_CS_SELECTOR = 0x00000c02, - HOST_SS_SELECTOR = 0x00000c04, - HOST_DS_SELECTOR = 0x00000c06, - HOST_FS_SELECTOR = 0x00000c08, - HOST_GS_SELECTOR = 0x00000c0a, - HOST_TR_SELECTOR = 0x00000c0c, - IO_BITMAP_A = 0x00002000, - IO_BITMAP_A_HIGH = 0x00002001, - IO_BITMAP_B = 0x00002002, - IO_BITMAP_B_HIGH = 0x00002003, - MSR_BITMAP = 0x00002004, - MSR_BITMAP_HIGH = 0x00002005, - VM_EXIT_MSR_STORE_ADDR = 0x00002006, - VM_EXIT_MSR_STORE_ADDR_HIGH = 0x00002007, - VM_EXIT_MSR_LOAD_ADDR = 0x00002008, - VM_EXIT_MSR_LOAD_ADDR_HIGH = 0x00002009, - VM_ENTRY_MSR_LOAD_ADDR = 0x0000200a, - VM_ENTRY_MSR_LOAD_ADDR_HIGH = 0x0000200b, - TSC_OFFSET = 0x00002010, - TSC_OFFSET_HIGH = 0x00002011, - VIRTUAL_APIC_PAGE_ADDR = 0x00002012, - VIRTUAL_APIC_PAGE_ADDR_HIGH = 0x00002013, - APIC_ACCESS_ADDR = 0x00002014, - APIC_ACCESS_ADDR_HIGH = 0x00002015, - EPT_POINTER = 0x0000201a, - EPT_POINTER_HIGH = 0x0000201b, - GUEST_PHYSICAL_ADDRESS = 0x00002400, - GUEST_PHYSICAL_ADDRESS_HIGH = 0x00002401, - VMCS_LINK_POINTER = 0x00002800, - VMCS_LINK_POINTER_HIGH = 0x00002801, - GUEST_IA32_DEBUGCTL = 0x00002802, - GUEST_IA32_DEBUGCTL_HIGH = 0x00002803, - GUEST_PDPTR0 = 0x0000280a, - GUEST_PDPTR0_HIGH = 0x0000280b, - GUEST_PDPTR1 = 0x0000280c, - GUEST_PDPTR1_HIGH = 0x0000280d, - GUEST_PDPTR2 = 0x0000280e, - GUEST_PDPTR2_HIGH = 0x0000280f, - GUEST_PDPTR3 = 0x00002810, - GUEST_PDPTR3_HIGH = 0x00002811, - PIN_BASED_VM_EXEC_CONTROL = 0x00004000, - CPU_BASED_VM_EXEC_CONTROL = 0x00004002, - EXCEPTION_BITMAP = 0x00004004, - PAGE_FAULT_ERROR_CODE_MASK = 0x00004006, - PAGE_FAULT_ERROR_CODE_MATCH = 0x00004008, - CR3_TARGET_COUNT = 0x0000400a, - VM_EXIT_CONTROLS = 0x0000400c, - VM_EXIT_MSR_STORE_COUNT = 0x0000400e, - VM_EXIT_MSR_LOAD_COUNT = 0x00004010, - VM_ENTRY_CONTROLS = 0x00004012, - VM_ENTRY_MSR_LOAD_COUNT = 0x00004014, - VM_ENTRY_INTR_INFO_FIELD = 0x00004016, - VM_ENTRY_EXCEPTION_ERROR_CODE = 0x00004018, - VM_ENTRY_INSTRUCTION_LEN = 0x0000401a, - TPR_THRESHOLD = 0x0000401c, - SECONDARY_VM_EXEC_CONTROL = 0x0000401e, - VM_INSTRUCTION_ERROR = 0x00004400, - VM_EXIT_REASON = 0x00004402, - VM_EXIT_INTR_INFO = 0x00004404, - VM_EXIT_INTR_ERROR_CODE = 0x00004406, - IDT_VECTORING_INFO_FIELD = 0x00004408, - IDT_VECTORING_ERROR_CODE = 0x0000440a, - VM_EXIT_INSTRUCTION_LEN = 0x0000440c, - VMX_INSTRUCTION_INFO = 0x0000440e, - GUEST_ES_LIMIT = 0x00004800, - GUEST_CS_LIMIT = 0x00004802, - GUEST_SS_LIMIT = 0x00004804, - GUEST_DS_LIMIT = 0x00004806, - GUEST_FS_LIMIT = 0x00004808, - GUEST_GS_LIMIT = 0x0000480a, - GUEST_LDTR_LIMIT = 0x0000480c, - GUEST_TR_LIMIT = 0x0000480e, - GUEST_GDTR_LIMIT = 0x00004810, - GUEST_IDTR_LIMIT = 0x00004812, - GUEST_ES_AR_BYTES = 0x00004814, - GUEST_CS_AR_BYTES = 0x00004816, - GUEST_SS_AR_BYTES = 0x00004818, - GUEST_DS_AR_BYTES = 0x0000481a, - GUEST_FS_AR_BYTES = 0x0000481c, - GUEST_GS_AR_BYTES = 0x0000481e, - GUEST_LDTR_AR_BYTES = 0x00004820, - GUEST_TR_AR_BYTES = 0x00004822, - GUEST_INTERRUPTIBILITY_INFO = 0x00004824, - GUEST_ACTIVITY_STATE = 0X00004826, - GUEST_SYSENTER_CS = 0x0000482A, - HOST_IA32_SYSENTER_CS = 0x00004c00, - CR0_GUEST_HOST_MASK = 0x00006000, - CR4_GUEST_HOST_MASK = 0x00006002, - CR0_READ_SHADOW = 0x00006004, - CR4_READ_SHADOW = 0x00006006, - CR3_TARGET_VALUE0 = 0x00006008, - CR3_TARGET_VALUE1 = 0x0000600a, - CR3_TARGET_VALUE2 = 0x0000600c, - CR3_TARGET_VALUE3 = 0x0000600e, - EXIT_QUALIFICATION = 0x00006400, - GUEST_LINEAR_ADDRESS = 0x0000640a, - GUEST_CR0 = 0x00006800, - GUEST_CR3 = 0x00006802, - GUEST_CR4 = 0x00006804, - GUEST_ES_BASE = 0x00006806, - GUEST_CS_BASE = 0x00006808, - GUEST_SS_BASE = 0x0000680a, - GUEST_DS_BASE = 0x0000680c, - GUEST_FS_BASE = 0x0000680e, - GUEST_GS_BASE = 0x00006810, - GUEST_LDTR_BASE = 0x00006812, - GUEST_TR_BASE = 0x00006814, - GUEST_GDTR_BASE = 0x00006816, - GUEST_IDTR_BASE = 0x00006818, - GUEST_DR7 = 0x0000681a, - GUEST_RSP = 0x0000681c, - GUEST_RIP = 0x0000681e, - GUEST_RFLAGS = 0x00006820, - GUEST_PENDING_DBG_EXCEPTIONS = 0x00006822, - GUEST_SYSENTER_ESP = 0x00006824, - GUEST_SYSENTER_EIP = 0x00006826, - HOST_CR0 = 0x00006c00, - HOST_CR3 = 0x00006c02, - HOST_CR4 = 0x00006c04, - HOST_FS_BASE = 0x00006c06, - HOST_GS_BASE = 0x00006c08, - HOST_TR_BASE = 0x00006c0a, - HOST_GDTR_BASE = 0x00006c0c, - HOST_IDTR_BASE = 0x00006c0e, - HOST_IA32_SYSENTER_ESP = 0x00006c10, - HOST_IA32_SYSENTER_EIP = 0x00006c12, - HOST_RSP = 0x00006c14, - HOST_RIP = 0x00006c16, -}; - #define VMX_EXIT_REASONS_FAILED_VMENTRY 0x80000000 #define EXIT_REASON_EXCEPTION_NMI 0 -- Mathieu Desnoyers Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68 ^ permalink raw reply [flat|nested] 40+ messages in thread
* [patch 2/4] kvm move VMCS read to system headers 2008-07-17 15:57 [patch 0/4] Port KVM-trace to tracepoints Mathieu Desnoyers 2008-07-17 15:57 ` [patch 1/4] kvm move VMCS Encodings to system headers Mathieu Desnoyers @ 2008-07-17 15:57 ` Mathieu Desnoyers 2008-07-17 15:57 ` [patch 3/4] KVM move register read-write " Mathieu Desnoyers ` (2 subsequent siblings) 4 siblings, 0 replies; 40+ messages in thread From: Mathieu Desnoyers @ 2008-07-17 15:57 UTC (permalink / raw) To: akpm, Ingo Molnar, linux-kernel, Peter Zijlstra, Avi Kivity, kvm Cc: Mathieu Desnoyers, Feng(Eric) Liu [-- Attachment #1: kvm-move-vmcs-read-to-system-headers.patch --] [-- Type: text/plain, Size: 2932 bytes --] The VMCS read will be needed by the kvm-trace probes. Put them in static inline functions in system-side headers instead of in the C file. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> CC: 'Peter Zijlstra' <peterz@infradead.org> CC: 'Feng(Eric) Liu' <eric.e.liu@intel.com> CC: Avi Kivity <avi@qumranet.com> CC: kvm@vger.kernel.org --- arch/x86/kvm/vmx.c | 28 ---------------------------- include/asm-x86/kvm_host.h | 30 ++++++++++++++++++++++++++++++ 2 files changed, 30 insertions(+), 28 deletions(-) Index: linux-2.6-lttng/arch/x86/kvm/vmx.c =================================================================== --- linux-2.6-lttng.orig/arch/x86/kvm/vmx.c 2008-07-17 11:41:54.000000000 -0400 +++ linux-2.6-lttng/arch/x86/kvm/vmx.c 2008-07-17 11:41:58.000000000 -0400 @@ -388,34 +388,6 @@ static inline void ept_sync_individual_a } } -static unsigned long vmcs_readl(unsigned long field) -{ - unsigned long value; - - asm volatile (__ex(ASM_VMX_VMREAD_RDX_RAX) - : "=a"(value) : "d"(field) : "cc"); - return value; -} - -static u16 vmcs_read16(unsigned long field) -{ - return vmcs_readl(field); -} - -static u32 vmcs_read32(unsigned long field) -{ - return vmcs_readl(field); -} - -static u64 vmcs_read64(unsigned long field) -{ -#ifdef CONFIG_X86_64 - return vmcs_readl(field); -#else - return vmcs_readl(field) | ((u64)vmcs_readl(field+1) << 32); -#endif -} - static noinline void vmwrite_error(unsigned long field, unsigned long value) { printk(KERN_ERR "vmwrite error: reg %lx value %lx (err %d)\n", Index: linux-2.6-lttng/include/asm-x86/kvm_host.h =================================================================== --- linux-2.6-lttng.orig/include/asm-x86/kvm_host.h 2008-07-17 11:41:57.000000000 -0400 +++ linux-2.6-lttng/include/asm-x86/kvm_host.h 2008-07-17 11:42:17.000000000 -0400 @@ -719,6 +719,8 @@ asmlinkage void kvm_handle_fault_on_rebo KVM_EX_ENTRY " 666b, 667b \n\t" \ ".popsection" +#define __ex(x) __kvm_handle_fault_on_reboot(x) + /* VMCS Encodings */ enum vmcs_field { VIRTUAL_PROCESSOR_ID = 0x00000000, @@ -861,4 +863,32 @@ enum vmcs_field { HOST_RIP = 0x00006c16, }; +static inline unsigned long vmcs_readl(unsigned long field) +{ + unsigned long value; + + asm volatile (__ex(ASM_VMX_VMREAD_RDX_RAX) + : "=a"(value) : "d"(field) : "cc"); + return value; +} + +static inline u16 vmcs_read16(unsigned long field) +{ + return vmcs_readl(field); +} + +static inline u32 vmcs_read32(unsigned long field) +{ + return vmcs_readl(field); +} + +static inline u64 vmcs_read64(unsigned long field) +{ +#ifdef CONFIG_X86_64 + return vmcs_readl(field); +#else + return vmcs_readl(field) | ((u64)vmcs_readl(field+1) << 32); +#endif +} + #endif -- Mathieu Desnoyers Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68 ^ permalink raw reply [flat|nested] 40+ messages in thread
* [patch 3/4] KVM move register read-write to system headers 2008-07-17 15:57 [patch 0/4] Port KVM-trace to tracepoints Mathieu Desnoyers 2008-07-17 15:57 ` [patch 1/4] kvm move VMCS Encodings to system headers Mathieu Desnoyers 2008-07-17 15:57 ` [patch 2/4] kvm move VMCS read " Mathieu Desnoyers @ 2008-07-17 15:57 ` Mathieu Desnoyers 2008-07-17 15:57 ` [patch 4/4] KVM-trace port to tracepoints Mathieu Desnoyers 2008-07-22 18:42 ` [patch 0/4] Port KVM-trace " Avi Kivity 4 siblings, 0 replies; 40+ messages in thread From: Mathieu Desnoyers @ 2008-07-17 15:57 UTC (permalink / raw) To: akpm, Ingo Molnar, linux-kernel, Peter Zijlstra, Avi Kivity, kvm Cc: Mathieu Desnoyers, Feng(Eric) Liu [-- Attachment #1: kvm-move-register-read-write-to-system-headers.patch --] [-- Type: text/plain, Size: 5850 bytes --] Needed by kvm_tracer probes, which are outside of arch/x86/kvm. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> CC: 'Peter Zijlstra' <peterz@infradead.org> CC: 'Feng(Eric) Liu' <eric.e.liu@intel.com> CC: Avi Kivity <avi@qumranet.com> CC: kvm@vger.kernel.org --- arch/x86/kvm/kvm_cache_regs.h | 32 -------------------------------- arch/x86/kvm/lapic.c | 2 +- arch/x86/kvm/svm.c | 2 +- arch/x86/kvm/vmx.c | 2 +- arch/x86/kvm/x86.c | 2 +- arch/x86/kvm/x86_emulate.c | 2 +- include/asm-x86/kvm_cache_regs.h | 32 ++++++++++++++++++++++++++++++++ 7 files changed, 37 insertions(+), 37 deletions(-) Index: linux-2.6-lttng/arch/x86/kvm/kvm_cache_regs.h =================================================================== --- linux-2.6-lttng.orig/arch/x86/kvm/kvm_cache_regs.h 2008-07-17 11:44:43.000000000 -0400 +++ /dev/null 1970-01-01 00:00:00.000000000 +0000 @@ -1,32 +0,0 @@ -#ifndef ASM_KVM_CACHE_REGS_H -#define ASM_KVM_CACHE_REGS_H - -static inline unsigned long kvm_register_read(struct kvm_vcpu *vcpu, - enum kvm_reg reg) -{ - if (!test_bit(reg, (unsigned long *)&vcpu->arch.regs_avail)) - kvm_x86_ops->cache_reg(vcpu, reg); - - return vcpu->arch.regs[reg]; -} - -static inline void kvm_register_write(struct kvm_vcpu *vcpu, - enum kvm_reg reg, - unsigned long val) -{ - vcpu->arch.regs[reg] = val; - __set_bit(reg, (unsigned long *)&vcpu->arch.regs_dirty); - __set_bit(reg, (unsigned long *)&vcpu->arch.regs_avail); -} - -static inline unsigned long kvm_rip_read(struct kvm_vcpu *vcpu) -{ - return kvm_register_read(vcpu, VCPU_REGS_RIP); -} - -static inline void kvm_rip_write(struct kvm_vcpu *vcpu, unsigned long val) -{ - kvm_register_write(vcpu, VCPU_REGS_RIP, val); -} - -#endif Index: linux-2.6-lttng/include/asm-x86/kvm_cache_regs.h =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-2.6-lttng/include/asm-x86/kvm_cache_regs.h 2008-07-17 11:44:53.000000000 -0400 @@ -0,0 +1,32 @@ +#ifndef ASM_KVM_CACHE_REGS_H +#define ASM_KVM_CACHE_REGS_H + +static inline unsigned long kvm_register_read(struct kvm_vcpu *vcpu, + enum kvm_reg reg) +{ + if (!test_bit(reg, (unsigned long *)&vcpu->arch.regs_avail)) + kvm_x86_ops->cache_reg(vcpu, reg); + + return vcpu->arch.regs[reg]; +} + +static inline void kvm_register_write(struct kvm_vcpu *vcpu, + enum kvm_reg reg, + unsigned long val) +{ + vcpu->arch.regs[reg] = val; + __set_bit(reg, (unsigned long *)&vcpu->arch.regs_dirty); + __set_bit(reg, (unsigned long *)&vcpu->arch.regs_avail); +} + +static inline unsigned long kvm_rip_read(struct kvm_vcpu *vcpu) +{ + return kvm_register_read(vcpu, VCPU_REGS_RIP); +} + +static inline void kvm_rip_write(struct kvm_vcpu *vcpu, unsigned long val) +{ + kvm_register_write(vcpu, VCPU_REGS_RIP, val); +} + +#endif Index: linux-2.6-lttng/arch/x86/kvm/lapic.c =================================================================== --- linux-2.6-lttng.orig/arch/x86/kvm/lapic.c 2008-07-17 11:46:14.000000000 -0400 +++ linux-2.6-lttng/arch/x86/kvm/lapic.c 2008-07-17 11:46:29.000000000 -0400 @@ -32,7 +32,7 @@ #include <asm/current.h> #include <asm/apicdef.h> #include <asm/atomic.h> -#include "kvm_cache_regs.h" +#include <asm/kvm_cache_regs.h> #include "irq.h" #define PRId64 "d" Index: linux-2.6-lttng/arch/x86/kvm/svm.c =================================================================== --- linux-2.6-lttng.orig/arch/x86/kvm/svm.c 2008-07-17 11:46:14.000000000 -0400 +++ linux-2.6-lttng/arch/x86/kvm/svm.c 2008-07-17 11:46:47.000000000 -0400 @@ -18,7 +18,6 @@ #include "kvm_svm.h" #include "irq.h" #include "mmu.h" -#include "kvm_cache_regs.h" #include <linux/module.h> #include <linux/kernel.h> @@ -26,6 +25,7 @@ #include <linux/highmem.h> #include <linux/sched.h> +#include <asm/kvm_cache_regs.h> #include <asm/desc.h> #define __ex(x) __kvm_handle_fault_on_reboot(x) Index: linux-2.6-lttng/arch/x86/kvm/vmx.c =================================================================== --- linux-2.6-lttng.orig/arch/x86/kvm/vmx.c 2008-07-17 11:46:14.000000000 -0400 +++ linux-2.6-lttng/arch/x86/kvm/vmx.c 2008-07-17 11:47:04.000000000 -0400 @@ -26,9 +26,9 @@ #include <linux/highmem.h> #include <linux/sched.h> #include <linux/moduleparam.h> -#include "kvm_cache_regs.h" #include "x86.h" +#include <asm/kvm_cache_regs.h> #include <asm/io.h> #include <asm/desc.h> Index: linux-2.6-lttng/arch/x86/kvm/x86.c =================================================================== --- linux-2.6-lttng.orig/arch/x86/kvm/x86.c 2008-07-17 11:46:14.000000000 -0400 +++ linux-2.6-lttng/arch/x86/kvm/x86.c 2008-07-17 11:47:12.000000000 -0400 @@ -19,7 +19,6 @@ #include "mmu.h" #include "i8254.h" #include "tss.h" -#include "kvm_cache_regs.h" #include "x86.h" #include <linux/clocksource.h> @@ -30,6 +29,7 @@ #include <linux/mman.h> #include <linux/highmem.h> +#include <asm/kvm_cache_regs.h> #include <asm/uaccess.h> #include <asm/msr.h> #include <asm/desc.h> Index: linux-2.6-lttng/arch/x86/kvm/x86_emulate.c =================================================================== --- linux-2.6-lttng.orig/arch/x86/kvm/x86_emulate.c 2008-07-17 11:46:14.000000000 -0400 +++ linux-2.6-lttng/arch/x86/kvm/x86_emulate.c 2008-07-17 11:47:24.000000000 -0400 @@ -26,10 +26,10 @@ #define DPRINTF(_f, _a ...) printf(_f , ## _a) #else #include <linux/kvm_host.h> -#include "kvm_cache_regs.h" #define DPRINTF(x...) do {} while (0) #endif #include <linux/module.h> +#include <asm/kvm_cache_regs.h> #include <asm/kvm_x86_emulate.h> /* -- Mathieu Desnoyers Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68 ^ permalink raw reply [flat|nested] 40+ messages in thread
* [patch 4/4] KVM-trace port to tracepoints 2008-07-17 15:57 [patch 0/4] Port KVM-trace to tracepoints Mathieu Desnoyers ` (2 preceding siblings ...) 2008-07-17 15:57 ` [patch 3/4] KVM move register read-write " Mathieu Desnoyers @ 2008-07-17 15:57 ` Mathieu Desnoyers 2008-07-17 16:49 ` Jan Kiszka 2008-07-17 16:52 ` Anthony Liguori 2008-07-22 18:42 ` [patch 0/4] Port KVM-trace " Avi Kivity 4 siblings, 2 replies; 40+ messages in thread From: Mathieu Desnoyers @ 2008-07-17 15:57 UTC (permalink / raw) To: akpm, Ingo Molnar, linux-kernel, Peter Zijlstra, Avi Kivity, kvm Cc: Mathieu Desnoyers, Feng(Eric) Liu [-- Attachment #1: kvm-trace-port-to-tracepoints.patch --] [-- Type: text/plain, Size: 24667 bytes --] Port/cleanup of KVM-trace to tracepoints. Tracepoints allow dormat instrumentation, like the kernel markers, but also allows to describe the trace points in global headers so they can be easily managed. They also do not use format strings. Anything that would involve an action (dereference a pointer, vmcs read, ...) only required when tracing is placed in the probes created in kvm_trace.c This patch depends on the "Tracepoints" patch. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> CC: 'Peter Zijlstra' <peterz@infradead.org> CC: 'Feng(Eric) Liu' <eric.e.liu@intel.com> CC: Avi Kivity <avi@qumranet.com> CC: kvm@vger.kernel.org --- arch/x86/kvm/vmx.c | 38 ++--- arch/x86/kvm/x86.c | 43 ++---- include/trace/kvm.h | 83 ++++++++++++ virt/kvm/kvm_trace.c | 336 +++++++++++++++++++++++++++++++++++++++++++-------- 4 files changed, 398 insertions(+), 102 deletions(-) Index: linux-2.6-lttng/arch/x86/kvm/vmx.c =================================================================== --- linux-2.6-lttng.orig/arch/x86/kvm/vmx.c 2008-07-17 11:54:36.000000000 -0400 +++ linux-2.6-lttng/arch/x86/kvm/vmx.c 2008-07-17 11:54:54.000000000 -0400 @@ -26,6 +26,7 @@ #include <linux/highmem.h> #include <linux/sched.h> #include <linux/moduleparam.h> +#include <trace/kvm.h> #include "x86.h" #include <asm/kvm_cache_regs.h> @@ -2094,7 +2095,7 @@ static void vmx_inject_irq(struct kvm_vc { struct vcpu_vmx *vmx = to_vmx(vcpu); - KVMTRACE_1D(INJ_VIRQ, vcpu, (u32)irq, handler); + trace_kvm_inj_virq(vcpu, irq); if (vcpu->arch.rmode.active) { vmx->rmode.irq.pending = true; @@ -2252,8 +2253,7 @@ static int handle_exception(struct kvm_v if (vm_need_ept()) BUG(); cr2 = vmcs_readl(EXIT_QUALIFICATION); - KVMTRACE_3D(PAGE_FAULT, vcpu, error_code, (u32)cr2, - (u32)((u64)cr2 >> 32), handler); + trace_kvm_page_fault(vcpu, error_code, cr2); return kvm_mmu_page_fault(vcpu, cr2, error_code); } @@ -2282,7 +2282,7 @@ static int handle_external_interrupt(str struct kvm_run *kvm_run) { ++vcpu->stat.irq_exits; - KVMTRACE_1D(INTR, vcpu, vmcs_read32(VM_EXIT_INTR_INFO), handler); + trace_kvm_intr(vcpu); return 1; } @@ -2340,10 +2340,7 @@ static int handle_cr(struct kvm_vcpu *vc reg = (exit_qualification >> 8) & 15; switch ((exit_qualification >> 4) & 3) { case 0: /* mov to cr */ - KVMTRACE_3D(CR_WRITE, vcpu, (u32)cr, - (u32)kvm_register_read(vcpu, reg), - (u32)((u64)kvm_register_read(vcpu, reg) >> 32), - handler); + trace_kvm_cr_write(vcpu, cr, reg); switch (cr) { case 0: kvm_set_cr0(vcpu, kvm_register_read(vcpu, reg)); @@ -2371,23 +2368,19 @@ static int handle_cr(struct kvm_vcpu *vc vcpu->arch.cr0 &= ~X86_CR0_TS; vmcs_writel(CR0_READ_SHADOW, vcpu->arch.cr0); vmx_fpu_activate(vcpu); - KVMTRACE_0D(CLTS, vcpu, handler); + trace_kvm_clts(vcpu); skip_emulated_instruction(vcpu); return 1; case 1: /*mov from cr*/ switch (cr) { case 3: kvm_register_write(vcpu, reg, vcpu->arch.cr3); - KVMTRACE_3D(CR_READ, vcpu, (u32)cr, - (u32)kvm_register_read(vcpu, reg), - (u32)((u64)kvm_register_read(vcpu, reg) >> 32), - handler); + trace_kvm_cr_read(vcpu, cr, reg); skip_emulated_instruction(vcpu); return 1; case 8: kvm_register_write(vcpu, reg, kvm_get_cr8(vcpu)); - KVMTRACE_2D(CR_READ, vcpu, (u32)cr, - (u32)kvm_register_read(vcpu, reg), handler); + trace_kvm_cr_read(vcpu, cr, reg); skip_emulated_instruction(vcpu); return 1; } @@ -2432,7 +2425,7 @@ static int handle_dr(struct kvm_vcpu *vc val = 0; } kvm_register_write(vcpu, reg, val); - KVMTRACE_2D(DR_READ, vcpu, (u32)dr, (u32)val, handler); + trace_kvm_dr_read(vcpu, dr, val); } else { /* mov to dr */ } @@ -2456,8 +2449,7 @@ static int handle_rdmsr(struct kvm_vcpu return 1; } - KVMTRACE_3D(MSR_READ, vcpu, ecx, (u32)data, (u32)(data >> 32), - handler); + trace_kvm_msr_read(vcpu, ecx, data); /* FIXME: handling of bits 32:63 of rax, rdx */ vcpu->arch.regs[VCPU_REGS_RAX] = data & -1u; @@ -2472,8 +2464,7 @@ static int handle_wrmsr(struct kvm_vcpu u64 data = (vcpu->arch.regs[VCPU_REGS_RAX] & -1u) | ((u64)(vcpu->arch.regs[VCPU_REGS_RDX] & -1u) << 32); - KVMTRACE_3D(MSR_WRITE, vcpu, ecx, (u32)data, (u32)(data >> 32), - handler); + trace_kvm_msr_write(vcpu, ecx, data); if (vmx_set_msr(vcpu, ecx, data) != 0) { kvm_inject_gp(vcpu, 0); @@ -2500,7 +2491,7 @@ static int handle_interrupt_window(struc cpu_based_vm_exec_control &= ~CPU_BASED_VIRTUAL_INTR_PENDING; vmcs_write32(CPU_BASED_VM_EXEC_CONTROL, cpu_based_vm_exec_control); - KVMTRACE_0D(PEND_INTR, vcpu, handler); + trace_kvm_pend_intr(vcpu); /* * If the user space waits to inject interrupts, exit as soon as @@ -2680,8 +2671,7 @@ static int kvm_handle_exit(struct kvm_ru struct vcpu_vmx *vmx = to_vmx(vcpu); u32 vectoring_info = vmx->idt_vectoring_info; - KVMTRACE_3D(VMEXIT, vcpu, exit_reason, (u32)kvm_rip_read(vcpu), - (u32)((u64)kvm_rip_read(vcpu) >> 32), entryexit); + trace_kvm_vmexit(vcpu, exit_reason); /* Access CR3 don't cause VMExit in paging mode, so we need * to sync with guest real CR3. */ @@ -3037,7 +3027,7 @@ static void vmx_vcpu_run(struct kvm_vcpu /* We need to handle NMIs before interrupts are enabled */ if ((intr_info & INTR_INFO_INTR_TYPE_MASK) == 0x200 && (intr_info & INTR_INFO_VALID_MASK)) { - KVMTRACE_0D(NMI, vcpu, handler); + trace_kvm_nmi(vcpu); asm("int $2"); } Index: linux-2.6-lttng/arch/x86/kvm/x86.c =================================================================== --- linux-2.6-lttng.orig/arch/x86/kvm/x86.c 2008-07-17 11:54:36.000000000 -0400 +++ linux-2.6-lttng/arch/x86/kvm/x86.c 2008-07-17 11:54:37.000000000 -0400 @@ -28,6 +28,7 @@ #include <linux/module.h> #include <linux/mman.h> #include <linux/highmem.h> +#include <trace/kvm.h> #include <asm/kvm_cache_regs.h> #include <asm/uaccess.h> @@ -312,10 +313,10 @@ EXPORT_SYMBOL_GPL(kvm_set_cr0); void kvm_lmsw(struct kvm_vcpu *vcpu, unsigned long msw) { - kvm_set_cr0(vcpu, (vcpu->arch.cr0 & ~0x0ful) | (msw & 0x0f)); - KVMTRACE_1D(LMSW, vcpu, - (u32)((vcpu->arch.cr0 & ~0x0ful) | (msw & 0x0f)), - handler); + unsigned long cr0 = (vcpu->arch.cr0 & ~0x0ful) | (msw & 0x0f); + + kvm_set_cr0(vcpu, cr0); + trace_kvm_lmsw(vcpu, cr0); } EXPORT_SYMBOL_GPL(kvm_lmsw); @@ -2045,7 +2046,7 @@ int emulate_invlpg(struct kvm_vcpu *vcpu int emulate_clts(struct kvm_vcpu *vcpu) { - KVMTRACE_0D(CLTS, vcpu, handler); + trace_kvm_clts(vcpu); kvm_x86_ops->set_cr0(vcpu, vcpu->arch.cr0 & ~X86_CR0_TS); return X86EMUL_CONTINUE; } @@ -2345,12 +2346,7 @@ int kvm_emulate_pio(struct kvm_vcpu *vcp vcpu->arch.pio.guest_page_offset = 0; vcpu->arch.pio.rep = 0; - if (vcpu->run->io.direction == KVM_EXIT_IO_IN) - KVMTRACE_2D(IO_READ, vcpu, vcpu->run->io.port, (u32)size, - handler); - else - KVMTRACE_2D(IO_WRITE, vcpu, vcpu->run->io.port, (u32)size, - handler); + trace_kvm_io(vcpu, size); val = kvm_register_read(vcpu, VCPU_REGS_RAX); memcpy(vcpu->arch.pio_data, &val, 4); @@ -2389,12 +2385,7 @@ int kvm_emulate_pio_string(struct kvm_vc vcpu->arch.pio.guest_page_offset = offset_in_page(address); vcpu->arch.pio.rep = rep; - if (vcpu->run->io.direction == KVM_EXIT_IO_IN) - KVMTRACE_2D(IO_READ, vcpu, vcpu->run->io.port, (u32)size, - handler); - else - KVMTRACE_2D(IO_WRITE, vcpu, vcpu->run->io.port, (u32)size, - handler); + trace_kvm_io(vcpu, size); if (!count) { kvm_x86_ops->skip_emulated_instruction(vcpu); @@ -2508,7 +2499,7 @@ void kvm_arch_exit(void) int kvm_emulate_halt(struct kvm_vcpu *vcpu) { ++vcpu->stat.halt_exits; - KVMTRACE_0D(HLT, vcpu, handler); + trace_kvm_hlt(vcpu); if (irqchip_in_kernel(vcpu->kvm)) { vcpu->arch.mp_state = KVM_MP_STATE_HALTED; up_read(&vcpu->kvm->slots_lock); @@ -2544,7 +2535,7 @@ int kvm_emulate_hypercall(struct kvm_vcp a2 = kvm_register_read(vcpu, VCPU_REGS_RDX); a3 = kvm_register_read(vcpu, VCPU_REGS_RSI); - KVMTRACE_1D(VMMCALL, vcpu, (u32)nr, handler); + trace_kvm_vmmcall(vcpu, nr); if (!is_long_mode(vcpu)) { nr &= 0xFFFFFFFF; @@ -2644,8 +2635,7 @@ unsigned long realmode_get_cr(struct kvm vcpu_printf(vcpu, "%s: unexpected cr %u\n", __func__, cr); return 0; } - KVMTRACE_3D(CR_READ, vcpu, (u32)cr, (u32)value, - (u32)((u64)value >> 32), handler); + trace_kvm_cr_read_realmode(vcpu, cr, value); return value; } @@ -2653,8 +2643,7 @@ unsigned long realmode_get_cr(struct kvm void realmode_set_cr(struct kvm_vcpu *vcpu, int cr, unsigned long val, unsigned long *rflags) { - KVMTRACE_3D(CR_WRITE, vcpu, (u32)cr, (u32)val, - (u32)((u64)val >> 32), handler); + trace_kvm_cr_write_realmode(vcpu, cr, val); switch (cr) { case 0: @@ -2745,11 +2734,7 @@ void kvm_emulate_cpuid(struct kvm_vcpu * kvm_register_write(vcpu, VCPU_REGS_RDX, best->edx); } kvm_x86_ops->skip_emulated_instruction(vcpu); - KVMTRACE_5D(CPUID, vcpu, function, - (u32)kvm_register_read(vcpu, VCPU_REGS_RAX), - (u32)kvm_register_read(vcpu, VCPU_REGS_RBX), - (u32)kvm_register_read(vcpu, VCPU_REGS_RCX), - (u32)kvm_register_read(vcpu, VCPU_REGS_RDX), handler); + trace_kvm_cpuid(vcpu, function); } EXPORT_SYMBOL_GPL(kvm_emulate_cpuid); @@ -2904,8 +2889,8 @@ again: kvm_guest_enter(); + trace_kvm_vmentry(vcpu); - KVMTRACE_0D(VMENTRY, vcpu, entryexit); kvm_x86_ops->run(vcpu, kvm_run); vcpu->guest_mode = 0; Index: linux-2.6-lttng/include/trace/kvm.h =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-2.6-lttng/include/trace/kvm.h 2008-07-17 11:54:37.000000000 -0400 @@ -0,0 +1,83 @@ +#ifndef _TRACE_KVM_H +#define _TRACE_KVM_H + +#include <linux/kvm_host.h> +#include <linux/tracepoint.h> + +/* + * ENTRYEXIT events. + */ +DEFINE_TRACE(kvm_vmentry, + TPPROTO(struct kvm_vcpu *vcpu), + TPARGS(vcpu)); +DEFINE_TRACE(kvm_vmexit, + TPPROTO(struct kvm_vcpu *vcpu, u32 exit_reason), + TPARGS(vcpu, exit_reason)); + +/* + * HANDLER events. + */ +DEFINE_TRACE(kvm_page_fault, + TPPROTO(struct kvm_vcpu *vcpu, u32 error_code, unsigned long cr2), + TPARGS(vcpu, error_code, cr2)); + +DEFINE_TRACE(kvm_inj_virq, + TPPROTO(struct kvm_vcpu *vcpu, int irq), + TPARGS(vcpu, irq)); +DEFINE_TRACE(kvm_redeliver_evt, + TPPROTO(struct kvm_vcpu *vcpu, u32 idtv_info_field), + TPARGS(vcpu, idtv_info_field)); +DEFINE_TRACE(kvm_pend_intr, + TPPROTO(struct kvm_vcpu *vcpu), + TPARGS(vcpu)); +DEFINE_TRACE(kvm_io, + TPPROTO(struct kvm_vcpu *vcpu, int size), + TPARGS(vcpu, size)); +DEFINE_TRACE(kvm_cr_read, + TPPROTO(struct kvm_vcpu *vcpu, int cr, int reg), + TPARGS(vcpu, cr, reg)); +DEFINE_TRACE(kvm_cr_write, + TPPROTO(struct kvm_vcpu *vcpu, int cr, int reg), + TPARGS(vcpu, cr, reg)); +DEFINE_TRACE(kvm_cr_read_realmode, + TPPROTO(struct kvm_vcpu *vcpu, int cr, unsigned long value), + TPARGS(vcpu, cr, value)); +DEFINE_TRACE(kvm_cr_write_realmode, + TPPROTO(struct kvm_vcpu *vcpu, int cr, unsigned long value), + TPARGS(vcpu, cr, value)); +DEFINE_TRACE(kvm_dr_read, + TPPROTO(struct kvm_vcpu *vcpu, int dr, unsigned long val), + TPARGS(vcpu, dr, val)); +/* FIXME : missing dr_write in the original instrumentation */ +DEFINE_TRACE(kvm_msr_read, + TPPROTO(struct kvm_vcpu *vcpu, u32 ecx, u64 data), + TPARGS(vcpu, ecx, data)); +DEFINE_TRACE(kvm_msr_write, + TPPROTO(struct kvm_vcpu *vcpu, u32 ecx, u64 data), + TPARGS(vcpu, ecx, data)); +DEFINE_TRACE(kvm_cpuid, + TPPROTO(struct kvm_vcpu *vcpu, u32 function), + TPARGS(vcpu, function)); +DEFINE_TRACE(kvm_intr, + TPPROTO(struct kvm_vcpu *vcpu), + TPARGS(vcpu)); +DEFINE_TRACE(kvm_nmi, + TPPROTO(struct kvm_vcpu *vcpu), + TPARGS(vcpu)); +DEFINE_TRACE(kvm_vmmcall, + TPPROTO(struct kvm_vcpu *vcpu, unsigned long nr), + TPARGS(vcpu, nr)); +DEFINE_TRACE(kvm_hlt, + TPPROTO(struct kvm_vcpu *vcpu), + TPARGS(vcpu)); +DEFINE_TRACE(kvm_clts, + TPPROTO(struct kvm_vcpu *vcpu), + TPARGS(vcpu)); +DEFINE_TRACE(kvm_lmsw, + TPPROTO(struct kvm_vcpu *vcpu, unsigned long cr0), + TPARGS(vcpu, cr0)); +DEFINE_TRACE(kvm_apic_access, + TPPROTO(struct kvm_vcpu *vcpu, unsigned long offset), + TPARGS(vcpu, offset)); + +#endif Index: linux-2.6-lttng/virt/kvm/kvm_trace.c =================================================================== --- linux-2.6-lttng.orig/virt/kvm/kvm_trace.c 2008-07-17 11:54:12.000000000 -0400 +++ linux-2.6-lttng/virt/kvm/kvm_trace.c 2008-07-17 11:55:40.000000000 -0400 @@ -6,6 +6,7 @@ * it's possible to reconstruct a chronological record of trace events. * The implementation refers to blktrace kernel support. * + * Copyright (c) 2008 Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> * Copyright (c) 2008 Intel Corporation * Copyright (C) 2006 Jens Axboe <axboe@kernel.dk> * @@ -19,6 +20,8 @@ #include <linux/debugfs.h> #include <linux/kvm_host.h> +#include <trace/kvm.h> +#include <asm/kvm_cache_regs.h> #define KVM_TRACE_STATE_RUNNING (1 << 0) #define KVM_TRACE_STATE_PAUSE (1 << 1) @@ -32,12 +35,8 @@ struct kvm_trace { }; static struct kvm_trace *kvm_trace; -struct kvm_trace_probe { - const char *name; - const char *format; - u32 cycle_in; - marker_probe_func *probe_func; -}; +static int register_tracepoints(void); +static void unregister_tracepoints(void); static inline int calc_rec_size(int cycle, int extra) { @@ -47,48 +46,58 @@ static inline int calc_rec_size(int cycl return cycle ? rec_size += KVM_TRC_CYCLE_SIZE : rec_size; } -static void kvm_add_trace(void *probe_private, void *call_data, - const char *format, va_list *args) +static void kvm_add_trace_timestamp(unsigned int evid, struct kvm_vcpu *vcpu, + int nr_args, u32 *args) { - struct kvm_trace_probe *p = probe_private; struct kvm_trace *kt = kvm_trace; struct kvm_trace_rec rec; - struct kvm_vcpu *vcpu; - int i, extra, size; + int size; if (unlikely(kt->trace_state != KVM_TRACE_STATE_RUNNING)) return; - rec.event = va_arg(*args, u32); - vcpu = va_arg(*args, struct kvm_vcpu *); + rec.event = evid; rec.pid = current->tgid; rec.vcpu_id = vcpu->vcpu_id; + rec.cycle_in = 1; - extra = va_arg(*args, u32); - WARN_ON(!(extra <= KVM_TRC_EXTRA_MAX)); - extra = min_t(u32, extra, KVM_TRC_EXTRA_MAX); - rec.extra_u32 = extra; - - rec.cycle_in = p->cycle_in; - - if (rec.cycle_in) { - rec.u.cycle.cycle_u64 = get_cycles(); - - for (i = 0; i < rec.extra_u32; i++) - rec.u.cycle.extra_u32[i] = va_arg(*args, u32); - } else { - for (i = 0; i < rec.extra_u32; i++) - rec.u.nocycle.extra_u32[i] = va_arg(*args, u32); - } + WARN_ON(!(nr_args <= KVM_TRC_EXTRA_MAX)); + nr_args = min_t(u32, nr_args, KVM_TRC_EXTRA_MAX); + rec.extra_u32 = nr_args; + + rec.u.cycle.cycle_u64 = get_cycles(); + if (nr_args > 0) + memcpy(rec.u.cycle.extra_u32, args, nr_args * sizeof(u32)); size = calc_rec_size(rec.cycle_in, rec.extra_u32 * sizeof(u32)); relay_write(kt->rchan, &rec, size); } -static struct kvm_trace_probe kvm_trace_probes[] = { - { "kvm_trace_entryexit", "%u %p %u %u %u %u %u %u", 1, kvm_add_trace }, - { "kvm_trace_handler", "%u %p %u %u %u %u %u %u", 0, kvm_add_trace }, -}; +static void kvm_add_trace(unsigned int evid, struct kvm_vcpu *vcpu, + int nr_args, u32 *args) +{ + struct kvm_trace *kt = kvm_trace; + struct kvm_trace_rec rec; + int size; + + if (unlikely(kt->trace_state != KVM_TRACE_STATE_RUNNING)) + return; + + rec.event = evid; + rec.pid = current->tgid; + rec.vcpu_id = vcpu->vcpu_id; + rec.cycle_in = 0; + + WARN_ON(!(nr_args <= KVM_TRC_EXTRA_MAX)); + nr_args = min_t(u32, nr_args, KVM_TRC_EXTRA_MAX); + rec.extra_u32 = nr_args; + + if (nr_args > 0) + memcpy(rec.u.nocycle.extra_u32, args, nr_args * sizeof(u32)); + + size = calc_rec_size(rec.cycle_in, rec.extra_u32 * sizeof(u32)); + relay_write(kt->rchan, &rec, size); +} static int lost_records_get(void *data, u64 *val) { @@ -154,7 +163,7 @@ static struct rchan_callbacks kvm_relay_ static int do_kvm_trace_enable(struct kvm_user_trace_setup *kuts) { struct kvm_trace *kt; - int i, r = -ENOMEM; + int r = -ENOMEM; if (!kuts->buf_size || !kuts->buf_nr) return -EINVAL; @@ -177,14 +186,8 @@ static int do_kvm_trace_enable(struct kv kvm_trace = kt; - for (i = 0; i < ARRAY_SIZE(kvm_trace_probes); i++) { - struct kvm_trace_probe *p = &kvm_trace_probes[i]; - - r = marker_probe_register(p->name, p->format, p->probe_func, p); - if (r) - printk(KERN_INFO "Unable to register probe %s\n", - p->name); - } + r = register_tracepoints(); + WARN_ON(r); kvm_trace->trace_state = KVM_TRACE_STATE_RUNNING; @@ -236,7 +239,6 @@ static int kvm_trace_pause(void) void kvm_trace_cleanup(void) { struct kvm_trace *kt = kvm_trace; - int i; if (kt == NULL) return; @@ -245,12 +247,7 @@ void kvm_trace_cleanup(void) kt->trace_state == KVM_TRACE_STATE_PAUSE) { kt->trace_state = KVM_TRACE_STATE_CLEARUP; - - for (i = 0; i < ARRAY_SIZE(kvm_trace_probes); i++) { - struct kvm_trace_probe *p = &kvm_trace_probes[i]; - marker_probe_unregister(p->name, p->probe_func, p); - } - + unregister_tracepoints(); relay_close(kt->rchan); debugfs_remove(kt->lost_file); kfree(kt); @@ -280,3 +277,244 @@ int kvm_trace_ioctl(unsigned int ioctl, return r; } + +/* + * Tracepoint probes for KVM-trace. + * + * The probes end up calling either kvm_add_trace_timestamp (the cycle counter + * is read) or kvm_add_trace (no timestamp saved in the trace). + */ + +/* + * Probes with timestamp. + */ +static void probe_kvm_vmentry(struct kvm_vcpu *vcpu) +{ + kvm_add_trace_timestamp(KVM_TRC_VMENTRY, vcpu, 0, NULL); +} + +static void probe_kvm_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason) +{ + kvm_add_trace_timestamp(KVM_TRC_VMEXIT, vcpu, 3, + (u32 []){ exit_reason, + (u32)kvm_rip_read(vcpu), + (u32)((u64)kvm_rip_read(vcpu) >> 32) }); +} + +/* + * Probes without timestamp. + */ +static void probe_kvm_page_fault(struct kvm_vcpu *vcpu, u32 error_code, + unsigned long cr2) +{ + kvm_add_trace(KVM_TRC_PAGE_FAULT, vcpu, 3, + (u32 []){ error_code, (u32)cr2, (u32)((u64)cr2 >> 32) }); +} + +static void probe_kvm_inj_virq(struct kvm_vcpu *vcpu, int irq) +{ + kvm_add_trace(KVM_TRC_VMENTRY, vcpu, 1, (u32 []){ (u32)irq }); +} + +static void probe_kvm_redeliver_evt(struct kvm_vcpu *vcpu, u32 idtv_info_field) +{ + kvm_add_trace(KVM_TRC_REDELIVER_EVT, vcpu, 1, + (u32 []){ idtv_info_field }); +} + +static void probe_kvm_pend_intr(struct kvm_vcpu *vcpu) +{ + kvm_add_trace(KVM_TRC_PEND_INTR, vcpu, 0, NULL); +} + +static void probe_kvm_io(struct kvm_vcpu *vcpu, int size) +{ + if (vcpu->run->io.direction == KVM_EXIT_IO_IN) + kvm_add_trace(KVM_TRC_IO_READ, vcpu, 2, + (u32 []){ vcpu->run->io.port, (u32)size }); + else + kvm_add_trace(KVM_TRC_IO_WRITE, vcpu, 2, + (u32 []){ vcpu->run->io.port, (u32)size }); +} + +static void probe_kvm_cr_read(struct kvm_vcpu *vcpu, int cr, int reg) +{ + switch (cr) { + case 3: + kvm_add_trace(KVM_TRC_CR_READ, vcpu, 3, + (u32 []){ (u32)cr, + (u32)kvm_register_read(vcpu, reg), + (u32)((u64)kvm_register_read(vcpu, reg) >> 32) }); + break; + case 8: + kvm_add_trace(KVM_TRC_CR_READ, vcpu, 2, + (u32 []){ (u32)cr, (u32)kvm_register_read(vcpu, reg) }); + break; + } +} + +static void probe_kvm_cr_write(struct kvm_vcpu *vcpu, int cr, int reg) +{ + kvm_add_trace(KVM_TRC_CR_WRITE, vcpu, 3, + (u32 []){ (u32)cr, + (u32)kvm_register_read(vcpu, reg), + (u32)((u64)kvm_register_read(vcpu, reg) >> 32) }); +} + + +static void probe_kvm_cr_read_realmode(struct kvm_vcpu *vcpu, int cr, + unsigned long value) +{ + kvm_add_trace(KVM_TRC_CR_READ, vcpu, 3, + (u32 []){ (u32)cr, (u32)value, (u32)((u64)value >> 32) }); +} + +static void probe_kvm_cr_write_realmode(struct kvm_vcpu *vcpu, int cr, + unsigned long value) +{ + kvm_add_trace(KVM_TRC_CR_WRITE, vcpu, 3, + (u32 []){ (u32)cr, (u32)value, (u32)((u64)value >> 32) }); +} + +static void probe_kvm_dr_read(struct kvm_vcpu *vcpu, int dr, unsigned long val) +{ + kvm_add_trace(KVM_TRC_DR_READ, vcpu, 2, + (u32 []){ (u32)dr, (u32)val }); +} + +static void probe_kvm_msr_read(struct kvm_vcpu *vcpu, u32 ecx, u64 data) +{ + kvm_add_trace(KVM_TRC_MSR_READ, vcpu, 3, + (u32 []){ ecx, (u32)data, (u32)(data >> 32) }); +} + +static void probe_kvm_msr_write(struct kvm_vcpu *vcpu, u32 ecx, u64 data) +{ + kvm_add_trace(KVM_TRC_MSR_WRITE, vcpu, 3, + (u32 []){ ecx, (u32)data, (u32)(data >> 32) }); +} + +static void probe_kvm_cpuid(struct kvm_vcpu *vcpu, u32 function) +{ + kvm_add_trace(KVM_TRC_CPUID, vcpu, 5, + (u32 []){ function, + (u32)kvm_register_read(vcpu, VCPU_REGS_RAX), + (u32)kvm_register_read(vcpu, VCPU_REGS_RBX), + (u32)kvm_register_read(vcpu, VCPU_REGS_RCX), + (u32)kvm_register_read(vcpu, VCPU_REGS_RDX) }); +} + +static void probe_kvm_intr(struct kvm_vcpu *vcpu) +{ + kvm_add_trace(KVM_TRC_INTR, vcpu, 1, + (u32 []){ vmcs_read32(VM_EXIT_INTR_INFO) }); +} + +static void probe_kvm_nmi(struct kvm_vcpu *vcpu) +{ + kvm_add_trace(KVM_TRC_NMI, vcpu, 0, NULL); +} + +static void probe_kvm_vmmcall(struct kvm_vcpu *vcpu, unsigned long nr) +{ + kvm_add_trace(KVM_TRC_VMMCALL, vcpu, 1, (u32 []){ (u32)nr }); +} + +static void probe_kvm_hlt(struct kvm_vcpu *vcpu) +{ + kvm_add_trace(KVM_TRC_HLT, vcpu, 0, NULL); +} + +static void probe_kvm_clts(struct kvm_vcpu *vcpu) +{ + kvm_add_trace(KVM_TRC_CLTS, vcpu, 0, NULL); +} + +static void probe_kvm_lmsw(struct kvm_vcpu *vcpu, unsigned long cr0) +{ + kvm_add_trace(KVM_TRC_LMSW, vcpu, 1, (u32 []){ (u32)cr0 }); +} + +static void probe_kvm_apic_access(struct kvm_vcpu *vcpu, unsigned long offset) +{ + kvm_add_trace(KVM_TRC_LMSW, vcpu, 1, (u32 []){ (u32)offset }); +} + +static int register_tracepoints(void) +{ + int ret; + + ret = register_trace_kvm_vmentry(probe_kvm_vmentry); + WARN_ON(ret); + ret = register_trace_kvm_vmexit(probe_kvm_vmexit); + WARN_ON(ret); + ret = register_trace_kvm_page_fault(probe_kvm_page_fault); + WARN_ON(ret); + ret = register_trace_kvm_inj_virq(probe_kvm_inj_virq); + WARN_ON(ret); + ret = register_trace_kvm_redeliver_evt(probe_kvm_redeliver_evt); + WARN_ON(ret); + ret = register_trace_kvm_pend_intr(probe_kvm_pend_intr); + WARN_ON(ret); + ret = register_trace_kvm_io(probe_kvm_io); + WARN_ON(ret); + ret = register_trace_kvm_cr_read(probe_kvm_cr_read); + WARN_ON(ret); + ret = register_trace_kvm_cr_write(probe_kvm_cr_write); + WARN_ON(ret); + ret = register_trace_kvm_cr_read_realmode(probe_kvm_cr_read_realmode); + WARN_ON(ret); + ret = register_trace_kvm_cr_write_realmode(probe_kvm_cr_write_realmode); + WARN_ON(ret); + ret = register_trace_kvm_dr_read(probe_kvm_dr_read); + WARN_ON(ret); + ret = register_trace_kvm_msr_read(probe_kvm_msr_read); + WARN_ON(ret); + ret = register_trace_kvm_msr_write(probe_kvm_msr_write); + WARN_ON(ret); + ret = register_trace_kvm_cpuid(probe_kvm_cpuid); + WARN_ON(ret); + ret = register_trace_kvm_intr(probe_kvm_intr); + WARN_ON(ret); + ret = register_trace_kvm_nmi(probe_kvm_nmi); + WARN_ON(ret); + ret = register_trace_kvm_vmmcall(probe_kvm_vmmcall); + WARN_ON(ret); + ret = register_trace_kvm_hlt(probe_kvm_hlt); + WARN_ON(ret); + ret = register_trace_kvm_clts(probe_kvm_clts); + WARN_ON(ret); + ret = register_trace_kvm_lmsw(probe_kvm_lmsw); + WARN_ON(ret); + ret = register_trace_kvm_apic_access(probe_kvm_apic_access); + WARN_ON(ret); + + return 0; +} + +static void unregister_tracepoints(void) +{ + unregister_trace_kvm_apic_access(probe_kvm_apic_access); + unregister_trace_kvm_lmsw(probe_kvm_lmsw); + unregister_trace_kvm_clts(probe_kvm_clts); + unregister_trace_kvm_hlt(probe_kvm_hlt); + unregister_trace_kvm_vmmcall(probe_kvm_vmmcall); + unregister_trace_kvm_nmi(probe_kvm_nmi); + unregister_trace_kvm_intr(probe_kvm_intr); + unregister_trace_kvm_cpuid(probe_kvm_cpuid); + unregister_trace_kvm_msr_write(probe_kvm_msr_write); + unregister_trace_kvm_msr_read(probe_kvm_msr_read); + unregister_trace_kvm_dr_read(probe_kvm_dr_read); + unregister_trace_kvm_cr_write_realmode(probe_kvm_cr_write_realmode); + unregister_trace_kvm_cr_read_realmode(probe_kvm_cr_read_realmode); + unregister_trace_kvm_cr_write(probe_kvm_cr_write); + unregister_trace_kvm_cr_read(probe_kvm_cr_read); + unregister_trace_kvm_io(probe_kvm_io); + unregister_trace_kvm_pend_intr(probe_kvm_pend_intr); + unregister_trace_kvm_redeliver_evt(probe_kvm_redeliver_evt); + unregister_trace_kvm_inj_virq(probe_kvm_inj_virq); + unregister_trace_kvm_page_fault(probe_kvm_page_fault); + unregister_trace_kvm_vmexit(probe_kvm_vmexit); + unregister_trace_kvm_vmentry(probe_kvm_vmentry); + +} -- Mathieu Desnoyers Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68 ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [patch 4/4] KVM-trace port to tracepoints 2008-07-17 15:57 ` [patch 4/4] KVM-trace port to tracepoints Mathieu Desnoyers @ 2008-07-17 16:49 ` Jan Kiszka 2008-07-17 17:28 ` Mathieu Desnoyers 2008-07-17 16:52 ` Anthony Liguori 1 sibling, 1 reply; 40+ messages in thread From: Jan Kiszka @ 2008-07-17 16:49 UTC (permalink / raw) To: Mathieu Desnoyers Cc: akpm, Ingo Molnar, linux-kernel, Peter Zijlstra, Avi Kivity, kvm, Feng(Eric) Liu Mathieu Desnoyers wrote: > Port/cleanup of KVM-trace to tracepoints. > > Tracepoints allow dormat instrumentation, like the kernel markers, but also > allows to describe the trace points in global headers so they can be easily > managed. They also do not use format strings. > > Anything that would involve an action (dereference a pointer, vmcs read, ...) > only required when tracing is placed in the probes created in kvm_trace.c > > This patch depends on the "Tracepoints" patch. > > Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> > CC: 'Peter Zijlstra' <peterz@infradead.org> > CC: 'Feng(Eric) Liu' <eric.e.liu@intel.com> > CC: Avi Kivity <avi@qumranet.com> > CC: kvm@vger.kernel.org > --- > arch/x86/kvm/vmx.c | 38 ++--- > arch/x86/kvm/x86.c | 43 ++---- > include/trace/kvm.h | 83 ++++++++++++ > virt/kvm/kvm_trace.c | 336 +++++++++++++++++++++++++++++++++++++++++++-------- > 4 files changed, 398 insertions(+), 102 deletions(-) Is it a specific property of KVM-trace that causes this LOC blow-up? Or is this a generic side-effect of tracepoints? [ Hmm, hope I didn't missed too much of the tracepoint discussion... ] Jan -- Siemens AG, Corporate Technology, CT SE 2 Corporate Competence Center Embedded Linux ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [patch 4/4] KVM-trace port to tracepoints 2008-07-17 16:49 ` Jan Kiszka @ 2008-07-17 17:28 ` Mathieu Desnoyers 2008-07-22 16:04 ` Jan Kiszka 0 siblings, 1 reply; 40+ messages in thread From: Mathieu Desnoyers @ 2008-07-17 17:28 UTC (permalink / raw) To: Jan Kiszka Cc: akpm, Ingo Molnar, linux-kernel, Peter Zijlstra, Avi Kivity, kvm, Feng(Eric) Liu * Jan Kiszka (jan.kiszka@siemens.com) wrote: > Mathieu Desnoyers wrote: > > Port/cleanup of KVM-trace to tracepoints. > > > > Tracepoints allow dormat instrumentation, like the kernel markers, but also > > allows to describe the trace points in global headers so they can be easily > > managed. They also do not use format strings. > > > > Anything that would involve an action (dereference a pointer, vmcs read, ...) > > only required when tracing is placed in the probes created in kvm_trace.c > > > > This patch depends on the "Tracepoints" patch. > > > > Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> > > CC: 'Peter Zijlstra' <peterz@infradead.org> > > CC: 'Feng(Eric) Liu' <eric.e.liu@intel.com> > > CC: Avi Kivity <avi@qumranet.com> > > CC: kvm@vger.kernel.org > > --- > > arch/x86/kvm/vmx.c | 38 ++--- > > arch/x86/kvm/x86.c | 43 ++---- > > include/trace/kvm.h | 83 ++++++++++++ > > virt/kvm/kvm_trace.c | 336 +++++++++++++++++++++++++++++++++++++++++++-------- > > 4 files changed, 398 insertions(+), 102 deletions(-) > > Is it a specific property of KVM-trace that causes this LOC blow-up? Or > is this a generic side-effect of tracepoints? > > [ Hmm, hope I didn't missed too much of the tracepoint discussion... ] > This LOC blow-up is caused by the creation of one probe per instrumentation site. So instead of placing the argument setup of everything that goes in the trace (0 to 5 u32 arguments) in the kvm code, it can be placed separately in a probe object, which could eventually be a dynamically loadable module. The primary objective of tracepoints is to make sure the kernel code does not become harder to read because of added instrumentation and to provide type-checking at compile-time without needing to put format strings into the kernel code, which, to some, looks like debugging code. The other aspect it try to address is maintainability of trace points : it's much easier to look at all the prototype definitions in include/trace/*.h and to manage them (and external tracers which would like to connect on those points) than to try to figure out in which C files tracing statements has been hidden. We can think of it as a standard tracing API providing a more or less stable list of kernel tracepoints. So, while KVMTRACE_?D() statements suits closely kvm-trace specificities, it's useless to impose constraints such as splitting unsigned longs into two u32 for tracers which can support a wider variety of data types. After refactoring the patch to put the probes in arch/x86/kvm, the result is : arch/x86/kvm/Makefile | 1 arch/x86/kvm/kvm_trace_probes.c | 251 ++++++++++++++++++++++++++++++++++++++++ arch/x86/kvm/vmx.c | 38 ++---- arch/x86/kvm/x86.c | 43 ++---- include/asm-x86/kvm_host.h | 8 + include/trace/kvm.h | 83 +++++++++++++ virt/kvm/kvm_trace.c | 93 ++++++-------- 7 files changed, 414 insertions(+), 103 deletions(-) So actually, is it better to have less LOC which looks like this : KVMTRACE_5D(CPUID, vcpu, function, (u32)kvm_register_read(vcpu, VCPU_REGS_RAX), (u32)kvm_register_read(vcpu, VCPU_REGS_RBX), (u32)kvm_register_read(vcpu, VCPU_REGS_RCX), (u32)kvm_register_read(vcpu, VCPU_REGS_RDX), handler); or more LOC looking like this : include/trace/kvm.h: DEFINE_TRACE(kvm_cpuid, TPPROTO(struct kvm_vcpu *vcpu, u32 function), TPARGS(vcpu, function)); arch/x86/kvm/x86.c: trace_kvm_cpuid(vcpu, function); arch/x86/kvm/kvm_trace_probes.c: static void probe_kvm_cpuid(struct kvm_vcpu *vcpu, u32 function) { kvm_add_trace(KVM_TRC_CPUID, vcpu, 5, (u32 []){ function, (u32)kvm_register_read(vcpu, VCPU_REGS_RAX), (u32)kvm_register_read(vcpu, VCPU_REGS_RBX), (u32)kvm_register_read(vcpu, VCPU_REGS_RCX), (u32)kvm_register_read(vcpu, VCPU_REGS_RDX) }); } int register_kvm_tracepoints(void) { ... ret = register_trace_kvm_cpuid(probe_kvm_cpuid); WARN_ON(ret); ... } void unregister_kvm_tracepoints(void) { ... unregister_trace_kvm_cpuid(probe_kvm_cpuid); ... } ? Notice that only a single line of code is inserted to the kernel code, while all the rest sits outsite in a separated probe module. So I think it's very important to distinguish between LOC which impair kernel code readability and LOC which sit in their own sandbox. Mathieu > Jan > > -- > Siemens AG, Corporate Technology, CT SE 2 > Corporate Competence Center Embedded Linux -- Mathieu Desnoyers OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68 ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [patch 4/4] KVM-trace port to tracepoints 2008-07-17 17:28 ` Mathieu Desnoyers @ 2008-07-22 16:04 ` Jan Kiszka 2008-07-22 18:46 ` Avi Kivity 0 siblings, 1 reply; 40+ messages in thread From: Jan Kiszka @ 2008-07-22 16:04 UTC (permalink / raw) To: Mathieu Desnoyers Cc: akpm, Ingo Molnar, linux-kernel, Peter Zijlstra, Avi Kivity, kvm, Feng(Eric) Liu Mathieu Desnoyers wrote: > * Jan Kiszka (jan.kiszka@siemens.com) wrote: >> Mathieu Desnoyers wrote: >>> Port/cleanup of KVM-trace to tracepoints. >>> >>> Tracepoints allow dormat instrumentation, like the kernel markers, but also >>> allows to describe the trace points in global headers so they can be easily >>> managed. They also do not use format strings. >>> >>> Anything that would involve an action (dereference a pointer, vmcs read, ...) >>> only required when tracing is placed in the probes created in kvm_trace.c >>> >>> This patch depends on the "Tracepoints" patch. >>> >>> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> >>> CC: 'Peter Zijlstra' <peterz@infradead.org> >>> CC: 'Feng(Eric) Liu' <eric.e.liu@intel.com> >>> CC: Avi Kivity <avi@qumranet.com> >>> CC: kvm@vger.kernel.org >>> --- >>> arch/x86/kvm/vmx.c | 38 ++--- >>> arch/x86/kvm/x86.c | 43 ++---- >>> include/trace/kvm.h | 83 ++++++++++++ >>> virt/kvm/kvm_trace.c | 336 +++++++++++++++++++++++++++++++++++++++++++-------- >>> 4 files changed, 398 insertions(+), 102 deletions(-) >> Is it a specific property of KVM-trace that causes this LOC blow-up? Or >> is this a generic side-effect of tracepoints? >> >> [ Hmm, hope I didn't missed too much of the tracepoint discussion... ] >> > > This LOC blow-up is caused by the creation of one probe per > instrumentation site. So instead of placing the argument setup of > everything that goes in the trace (0 to 5 u32 arguments) in the kvm > code, it can be placed separately in a probe object, which could > eventually be a dynamically loadable module. > > The primary objective of tracepoints is to make sure the kernel code > does not become harder to read because of added instrumentation and to > provide type-checking at compile-time without needing to put format > strings into the kernel code, which, to some, looks like debugging code. > The other aspect it try to address is maintainability of trace points : > it's much easier to look at all the prototype definitions in > include/trace/*.h and to manage them (and external tracers which would > like to connect on those points) than to try to figure out in which C > files tracing statements has been hidden. We can think of it as a > standard tracing API providing a more or less stable list of kernel > tracepoints. > > So, while KVMTRACE_?D() statements suits closely kvm-trace > specificities, it's useless to impose constraints such as splitting > unsigned longs into two u32 for tracers which can support a wider > variety of data types. > > After refactoring the patch to put the probes in arch/x86/kvm, the > result is : > > arch/x86/kvm/Makefile | 1 > arch/x86/kvm/kvm_trace_probes.c | 251 ++++++++++++++++++++++++++++++++++++++++ > arch/x86/kvm/vmx.c | 38 ++---- > arch/x86/kvm/x86.c | 43 ++---- > include/asm-x86/kvm_host.h | 8 + > include/trace/kvm.h | 83 +++++++++++++ > virt/kvm/kvm_trace.c | 93 ++++++-------- > 7 files changed, 414 insertions(+), 103 deletions(-) > > So actually, is it better to have less LOC which looks like this : > > KVMTRACE_5D(CPUID, vcpu, function, > (u32)kvm_register_read(vcpu, VCPU_REGS_RAX), > (u32)kvm_register_read(vcpu, VCPU_REGS_RBX), > (u32)kvm_register_read(vcpu, VCPU_REGS_RCX), > (u32)kvm_register_read(vcpu, VCPU_REGS_RDX), handler); > > > or more LOC looking like this : > > include/trace/kvm.h: > DEFINE_TRACE(kvm_cpuid, > TPPROTO(struct kvm_vcpu *vcpu, u32 function), > TPARGS(vcpu, function)); > > arch/x86/kvm/x86.c: > trace_kvm_cpuid(vcpu, function); > > arch/x86/kvm/kvm_trace_probes.c: > static void probe_kvm_cpuid(struct kvm_vcpu *vcpu, u32 function) > { > kvm_add_trace(KVM_TRC_CPUID, vcpu, 5, > (u32 []){ function, > (u32)kvm_register_read(vcpu, VCPU_REGS_RAX), > (u32)kvm_register_read(vcpu, VCPU_REGS_RBX), > (u32)kvm_register_read(vcpu, VCPU_REGS_RCX), > (u32)kvm_register_read(vcpu, VCPU_REGS_RDX) }); > } > > int register_kvm_tracepoints(void) > { > ... > ret = register_trace_kvm_cpuid(probe_kvm_cpuid); > WARN_ON(ret); > ... > } > > void unregister_kvm_tracepoints(void) > { > ... > unregister_trace_kvm_cpuid(probe_kvm_cpuid); > ... > } > > ? > > Notice that only a single line of code is inserted to the kernel code, > while all the rest sits outsite in a separated probe module. So I think > it's very important to distinguish between LOC which impair kernel code > readability and LOC which sit in their own sandbox. That's true - as long as you don't have to add/remove/modify tracepoints. I had to do this job in the past (not for KVM). Having 1 spot in 1 file (based on generic probes) would be handier in that case than 5 spots in 3 files. But if the KVM tracepoints are considered stable in their number and structure, that shouldn't be an issue here. Jan -- Siemens AG, Corporate Technology, CT SE 2 Corporate Competence Center Embedded Linux ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [patch 4/4] KVM-trace port to tracepoints 2008-07-22 16:04 ` Jan Kiszka @ 2008-07-22 18:46 ` Avi Kivity 2008-07-23 7:49 ` Peter Zijlstra 0 siblings, 1 reply; 40+ messages in thread From: Avi Kivity @ 2008-07-22 18:46 UTC (permalink / raw) To: Jan Kiszka Cc: Mathieu Desnoyers, akpm, Ingo Molnar, linux-kernel, Peter Zijlstra, kvm, Feng(Eric) Liu Jan Kiszka wrote: > That's true - as long as you don't have to add/remove/modify > tracepoints. I had to do this job in the past (not for KVM). Having 1 > spot in 1 file (based on generic probes) would be handier in that case > than 5 spots in 3 files. But if the KVM tracepoints are considered > stable in their number and structure, that shouldn't be an issue here. > > Tracepoints aren't stable; they are artefacts of the implementation. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [patch 4/4] KVM-trace port to tracepoints 2008-07-22 18:46 ` Avi Kivity @ 2008-07-23 7:49 ` Peter Zijlstra 2008-07-23 8:08 ` Avi Kivity 0 siblings, 1 reply; 40+ messages in thread From: Peter Zijlstra @ 2008-07-23 7:49 UTC (permalink / raw) To: Avi Kivity Cc: Jan Kiszka, Mathieu Desnoyers, akpm, Ingo Molnar, linux-kernel, kvm, Feng(Eric) Liu On Tue, 2008-07-22 at 21:46 +0300, Avi Kivity wrote: > Jan Kiszka wrote: > > That's true - as long as you don't have to add/remove/modify > > tracepoints. I had to do this job in the past (not for KVM). Having 1 > > spot in 1 file (based on generic probes) would be handier in that case > > than 5 spots in 3 files. But if the KVM tracepoints are considered > > stable in their number and structure, that shouldn't be an issue here. > > > > > > Tracepoints aren't stable; they are artefacts of the implementation. Which IMHO makes it unsuitable for trace_mark() as that will be exported to user-space, and every time you change your tracepoints you'll change user visible things - not nice. ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [patch 4/4] KVM-trace port to tracepoints 2008-07-23 7:49 ` Peter Zijlstra @ 2008-07-23 8:08 ` Avi Kivity 2008-07-23 8:55 ` Peter Zijlstra 2008-07-23 13:20 ` Mathieu Desnoyers 0 siblings, 2 replies; 40+ messages in thread From: Avi Kivity @ 2008-07-23 8:08 UTC (permalink / raw) To: Peter Zijlstra Cc: Jan Kiszka, Mathieu Desnoyers, akpm, Ingo Molnar, linux-kernel, kvm, Feng(Eric) Liu Peter Zijlstra wrote: > On Tue, 2008-07-22 at 21:46 +0300, Avi Kivity wrote: > >> Jan Kiszka wrote: >> >>> That's true - as long as you don't have to add/remove/modify >>> tracepoints. I had to do this job in the past (not for KVM). Having 1 >>> spot in 1 file (based on generic probes) would be handier in that case >>> than 5 spots in 3 files. But if the KVM tracepoints are considered >>> stable in their number and structure, that shouldn't be an issue here. >>> >>> >>> >> Tracepoints aren't stable; they are artefacts of the implementation. >> > > Which IMHO makes it unsuitable for trace_mark() as that will be exported > to user-space, and every time you change your tracepoints you'll change > user visible things - not nice. > They are used for debugging (mostly performance related), so changes are expected. What uses of trace_mark() depend on a stable interface? blktrace? -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [patch 4/4] KVM-trace port to tracepoints 2008-07-23 8:08 ` Avi Kivity @ 2008-07-23 8:55 ` Peter Zijlstra 2008-07-23 9:32 ` Avi Kivity 2008-07-23 10:03 ` Christoph Hellwig 2008-07-23 13:20 ` Mathieu Desnoyers 1 sibling, 2 replies; 40+ messages in thread From: Peter Zijlstra @ 2008-07-23 8:55 UTC (permalink / raw) To: Avi Kivity Cc: Jan Kiszka, Mathieu Desnoyers, akpm, Ingo Molnar, linux-kernel, kvm, Feng(Eric) Liu On Wed, 2008-07-23 at 11:08 +0300, Avi Kivity wrote: > Peter Zijlstra wrote: > > On Tue, 2008-07-22 at 21:46 +0300, Avi Kivity wrote: > > > >> Jan Kiszka wrote: > >> > >>> That's true - as long as you don't have to add/remove/modify > >>> tracepoints. I had to do this job in the past (not for KVM). Having 1 > >>> spot in 1 file (based on generic probes) would be handier in that case > >>> than 5 spots in 3 files. But if the KVM tracepoints are considered > >>> stable in their number and structure, that shouldn't be an issue here. > >>> > >>> > >>> > >> Tracepoints aren't stable; they are artefacts of the implementation. > >> > > > > Which IMHO makes it unsuitable for trace_mark() as that will be exported > > to user-space, and every time you change your tracepoints you'll change > > user visible things - not nice. > > > > They are used for debugging (mostly performance related), so changes are > expected. > > What uses of trace_mark() depend on a stable interface? blktrace? There are currently no trace_mark() sites in the kernel that I'm aware of (except for the scheduler :-/, and those should be converted to tracepoints ASAP). Andrew raised the whole point about trace_mark() generating an user-visible interface and thus it should be stable, and I agree with that. What that means is that trace_mark() can only be used for really stable points. This in turn means we might as well use trace points. Which allows for the conclusion that trace_mark() is not needed and could be removed from the kernel. However - it might be handy for ad-hoc debugging purposes that never see the light of day (linus' git tree in this case). So on those grounds one could argue against removing trace_mark. ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [patch 4/4] KVM-trace port to tracepoints 2008-07-23 8:55 ` Peter Zijlstra @ 2008-07-23 9:32 ` Avi Kivity 2008-07-23 9:53 ` Peter Zijlstra 2008-07-23 10:03 ` Christoph Hellwig 1 sibling, 1 reply; 40+ messages in thread From: Avi Kivity @ 2008-07-23 9:32 UTC (permalink / raw) To: Peter Zijlstra Cc: Jan Kiszka, Mathieu Desnoyers, akpm, Ingo Molnar, linux-kernel, kvm, Feng(Eric) Liu Peter Zijlstra wrote: > There are currently no trace_mark() sites in the kernel that I'm aware > of (except for the scheduler :-/, and those should be converted to > tracepoints ASAP). > > Andrew raised the whole point about trace_mark() generating an > user-visible interface and thus it should be stable, and I agree with > that. > > What that means is that trace_mark() can only be used for really stable > points. > > This in turn means we might as well use trace points. > > Which allows for the conclusion that trace_mark() is not needed and > could be removed from the kernel. > > However - it might be handy for ad-hoc debugging purposes that never see > the light of day (linus' git tree in this case). So on those grounds one > could argue against removing trace_mark But trace_mark() is so wonderful. Can't we just declare the tracemarks as a non-stable interface? Perhaps add an unstable_trace_mark() to make it clear. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [patch 4/4] KVM-trace port to tracepoints 2008-07-23 9:32 ` Avi Kivity @ 2008-07-23 9:53 ` Peter Zijlstra 2008-07-23 13:15 ` Mathieu Desnoyers 0 siblings, 1 reply; 40+ messages in thread From: Peter Zijlstra @ 2008-07-23 9:53 UTC (permalink / raw) To: Avi Kivity Cc: Jan Kiszka, Mathieu Desnoyers, akpm, Ingo Molnar, linux-kernel, kvm, Feng(Eric) Liu On Wed, 2008-07-23 at 12:32 +0300, Avi Kivity wrote: > Peter Zijlstra wrote: > > There are currently no trace_mark() sites in the kernel that I'm aware > > of (except for the scheduler :-/, and those should be converted to > > tracepoints ASAP). > > > > Andrew raised the whole point about trace_mark() generating an > > user-visible interface and thus it should be stable, and I agree with > > that. > > > > What that means is that trace_mark() can only be used for really stable > > points. > > > > This in turn means we might as well use trace points. > > > > Which allows for the conclusion that trace_mark() is not needed and > > could be removed from the kernel. > > > > However - it might be handy for ad-hoc debugging purposes that never see > > the light of day (linus' git tree in this case). So on those grounds one > > could argue against removing trace_mark > > But trace_mark() is so wonderful. I guess tastes differ... > Can't we just declare the tracemarks > as a non-stable interface? > > Perhaps add an unstable_trace_mark() to make it clear. At the very least it would need its own output channel. But I'm afraid this will be KS material. ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [patch 4/4] KVM-trace port to tracepoints 2008-07-23 9:53 ` Peter Zijlstra @ 2008-07-23 13:15 ` Mathieu Desnoyers 0 siblings, 0 replies; 40+ messages in thread From: Mathieu Desnoyers @ 2008-07-23 13:15 UTC (permalink / raw) To: Peter Zijlstra Cc: Avi Kivity, Jan Kiszka, akpm, Ingo Molnar, linux-kernel, kvm, Feng(Eric) Liu * Peter Zijlstra (peterz@infradead.org) wrote: > On Wed, 2008-07-23 at 12:32 +0300, Avi Kivity wrote: > > Peter Zijlstra wrote: > > > There are currently no trace_mark() sites in the kernel that I'm aware > > > of (except for the scheduler :-/, and those should be converted to > > > tracepoints ASAP). > > > > > > Andrew raised the whole point about trace_mark() generating an > > > user-visible interface and thus it should be stable, and I agree with > > > that. > > > > > > What that means is that trace_mark() can only be used for really stable > > > points. > > > > > > This in turn means we might as well use trace points. > > > > > > Which allows for the conclusion that trace_mark() is not needed and > > > could be removed from the kernel. > > > > > > However - it might be handy for ad-hoc debugging purposes that never see > > > the light of day (linus' git tree in this case). So on those grounds one > > > could argue against removing trace_mark > > > > But trace_mark() is so wonderful. > > I guess tastes differ... > > > Can't we just declare the tracemarks > > as a non-stable interface? > > > > Perhaps add an unstable_trace_mark() to make it clear. > > At the very least it would need its own output channel. But I'm afraid > this will be KS material. > Hi Peter, Currently what I have in LTTng includes this output channel. It works for me, but if I can make it work for others that would be great. - Tracepoints in kernel code to instrument the kernel. - LTTng probes connect on those relatively stable tracepoints. They format the data so it's meaningful to userspace (e.g. extracts the pid of the prev and next process at sched_switch). - The LTTng serializer is connected on those markers. It parses the format string to dynamically reserve space in the relay buffer, write a timestamp and event ID (one event ID is pre-assigned to a marker name) and copy the arguments from the stack to the event record (which has a variable size). Event IDs and timestamps are added by LTTng, thus not required by markers. However, one can think of this flow as an efficient and compact binary data export mechanism to userspace. Headers exports data type sizes and endianness, a special data channel exports the mappings between { marker name, ID, format string } so events are self-described. Therefore, one can add any event he likes and it will be automatically understood by the tracing toolchain. If an event is removed or filtered out or modified (by changing its field name), the userspace trace analyser will detect it and the specific probe which expects this event will fail to load, leading to missing analyses, but nothing more than that. So currently what we would have is, more or less : trace_marks located within LTTng are kept in sync with userland, but the whole chain also allows to add "debug-style" trace_marks directly in the kernel code (this is really useful when trying to perform a low-impact printk-like runtime bissection of a bug in the kernel code. I actually see the trace_marks/LTTng combination as a printk which would extract information in its binary form instead of using text-formatting. The actual formatting can then be done later, in userland, if ever needed (many analyses use the raw binary format directly). I guess KS would be a good opportunity to discuss this interface topic. Mathieu -- Mathieu Desnoyers OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68 ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [patch 4/4] KVM-trace port to tracepoints 2008-07-23 8:55 ` Peter Zijlstra 2008-07-23 9:32 ` Avi Kivity @ 2008-07-23 10:03 ` Christoph Hellwig 2008-07-23 10:08 ` Avi Kivity 1 sibling, 1 reply; 40+ messages in thread From: Christoph Hellwig @ 2008-07-23 10:03 UTC (permalink / raw) To: Peter Zijlstra Cc: Avi Kivity, Jan Kiszka, Mathieu Desnoyers, akpm, Ingo Molnar, linux-kernel, kvm, Feng(Eric) Liu On Wed, Jul 23, 2008 at 10:55:03AM +0200, Peter Zijlstra wrote: > Andrew raised the whole point about trace_mark() generating an > user-visible interface and thus it should be stable, and I agree with > that. I'm not sure why this comes up again and again, but it's utter bullshit. trace_mark does not generate any kind of user interface, just purely a kernel internal interface. ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [patch 4/4] KVM-trace port to tracepoints 2008-07-23 10:03 ` Christoph Hellwig @ 2008-07-23 10:08 ` Avi Kivity 2008-07-23 10:13 ` Christoph Hellwig 0 siblings, 1 reply; 40+ messages in thread From: Avi Kivity @ 2008-07-23 10:08 UTC (permalink / raw) To: Christoph Hellwig Cc: Peter Zijlstra, Jan Kiszka, Mathieu Desnoyers, akpm, Ingo Molnar, linux-kernel, kvm, Feng(Eric) Liu Christoph Hellwig wrote: > On Wed, Jul 23, 2008 at 10:55:03AM +0200, Peter Zijlstra wrote: > >> Andrew raised the whole point about trace_mark() generating an >> user-visible interface and thus it should be stable, and I agree with >> that. >> > > I'm not sure why this comes up again and again, but it's utter bullshit. > trace_mark does not generate any kind of user interface, just purely > a kernel internal interface. > trace_mark() is implement kvmtrace, which is propagated to userspace. So while trace_mark() itself is not a userspace interface, one of its users is. It's an unstable interface. But so is dmesg; that's the nature of tracing. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [patch 4/4] KVM-trace port to tracepoints 2008-07-23 10:08 ` Avi Kivity @ 2008-07-23 10:13 ` Christoph Hellwig 0 siblings, 0 replies; 40+ messages in thread From: Christoph Hellwig @ 2008-07-23 10:13 UTC (permalink / raw) To: Avi Kivity Cc: Christoph Hellwig, Peter Zijlstra, Jan Kiszka, Mathieu Desnoyers, akpm, Ingo Molnar, linux-kernel, kvm, Feng(Eric) Liu On Wed, Jul 23, 2008 at 01:08:41PM +0300, Avi Kivity wrote: > trace_mark() is implement kvmtrace, which is propagated to userspace. > So while trace_mark() itself is not a userspace interface, one of its > users is. > > It's an unstable interface. But so is dmesg; that's the nature of tracing. Trace_mark is as stable as any other kernel interface, and the data you pass through it is as stable as you want it to. In most cases like kvmtrace or my spu scheduler tracing code the trace data is directly forwarded through a userspace interface, and that is as stable as any freeform interface, e.g. as like printk mentioned above. ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [patch 4/4] KVM-trace port to tracepoints 2008-07-23 8:08 ` Avi Kivity 2008-07-23 8:55 ` Peter Zijlstra @ 2008-07-23 13:20 ` Mathieu Desnoyers 1 sibling, 0 replies; 40+ messages in thread From: Mathieu Desnoyers @ 2008-07-23 13:20 UTC (permalink / raw) To: Avi Kivity Cc: Peter Zijlstra, Jan Kiszka, akpm, Ingo Molnar, linux-kernel, kvm, Feng(Eric) Liu * Avi Kivity (avi@qumranet.com) wrote: > Peter Zijlstra wrote: >> On Tue, 2008-07-22 at 21:46 +0300, Avi Kivity wrote: >> >>> Jan Kiszka wrote: >>> >>>> That's true - as long as you don't have to add/remove/modify >>>> tracepoints. I had to do this job in the past (not for KVM). Having 1 >>>> spot in 1 file (based on generic probes) would be handier in that case >>>> than 5 spots in 3 files. But if the KVM tracepoints are considered >>>> stable in their number and structure, that shouldn't be an issue here. >>>> >>>> >>> Tracepoints aren't stable; they are artefacts of the implementation. >>> >> >> Which IMHO makes it unsuitable for trace_mark() as that will be exported >> to user-space, and every time you change your tracepoints you'll change >> user visible things - not nice. >> > > They are used for debugging (mostly performance related), so changes are > expected. > > What uses of trace_mark() depend on a stable interface? blktrace? > Actually, LTTng likes to have the { marker name, field name } pairs unchanged for the markers it looks for, but that's about it. If a userspace analysis plugin fails to see a marker (because it is disabled or changed), it just does not apply its particular analysis on the trace. Since the markers and marker types are self-described in the trace, userspace can detect any change the the present markers, so there is no need to rely on "version numbers" because we are able to proceed to a complete marker list verification (names, field names, types) before starting the trace analysis. Mathieu -- Mathieu Desnoyers OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68 ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [patch 4/4] KVM-trace port to tracepoints 2008-07-17 15:57 ` [patch 4/4] KVM-trace port to tracepoints Mathieu Desnoyers 2008-07-17 16:49 ` Jan Kiszka @ 2008-07-17 16:52 ` Anthony Liguori 2008-07-17 17:04 ` Mathieu Desnoyers 1 sibling, 1 reply; 40+ messages in thread From: Anthony Liguori @ 2008-07-17 16:52 UTC (permalink / raw) To: Mathieu Desnoyers Cc: akpm, Ingo Molnar, linux-kernel, Peter Zijlstra, Avi Kivity, kvm, Feng(Eric) Liu Hi Mathieu, It's difficult to review your patches because they aren't inlined. At any rate, this patches are unusable with SVM. They try to execute VT instructions unconditionally. For instance, you changed: > > - KVMTRACE_1D(INTR, vcpu, vmcs_read32(VM_EXIT_INTR_INFO), handler); > + trace_kvm_intr(vcpu); Which lived in VT-specific code (vmx.c) To: > +static void probe_kvm_intr(struct kvm_vcpu *vcpu) > +{ > + kvm_add_trace(KVM_TRC_INTR, vcpu, 1, > + (u32 []){ vmcs_read32(VM_EXIT_INTR_INFO) }); > +} > + Which lives in common code (kvm_trace.c). But vmcs_read32() is VT-specific and should not be used in common code so this motion is wrong. Why not just pass more arguments to probe_kvm_intr()? Then your first two patches can be dropped completely. Regards, Anthony Liguori Mathieu Desnoyers wrote: ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [patch 4/4] KVM-trace port to tracepoints 2008-07-17 16:52 ` Anthony Liguori @ 2008-07-17 17:04 ` Mathieu Desnoyers 0 siblings, 0 replies; 40+ messages in thread From: Mathieu Desnoyers @ 2008-07-17 17:04 UTC (permalink / raw) To: Anthony Liguori Cc: akpm, Ingo Molnar, linux-kernel, Peter Zijlstra, Avi Kivity, kvm, Feng(Eric) Liu * Anthony Liguori (anthony@codemonkey.ws) wrote: > Hi Mathieu, > Hi Anthony, > It's difficult to review your patches because they aren't inlined. > > At any rate, this patches are unusable with SVM. They try to execute VT > instructions unconditionally. For instance, you changed: >> >> - KVMTRACE_1D(INTR, vcpu, vmcs_read32(VM_EXIT_INTR_INFO), handler); >> + trace_kvm_intr(vcpu); > > Which lived in VT-specific code (vmx.c) > > To: > >> +static void probe_kvm_intr(struct kvm_vcpu *vcpu) >> +{ >> + kvm_add_trace(KVM_TRC_INTR, vcpu, 1, >> + (u32 []){ vmcs_read32(VM_EXIT_INTR_INFO) }); >> +} >> + > > Which lives in common code (kvm_trace.c). But vmcs_read32() is VT-specific > and should not be used in common code so this motion is wrong. Why not > just pass more arguments to probe_kvm_intr()? Then your first two patches > can be dropped completely. > Yes, I just noticed that I made a small mistake : the probe code should actually go in arch/x86/kvm/kvm_trace_probes.c, which is x86-specific. The reason why I would try to move the vmcs_read32(VM_EXIT_INTR_INFO) to the probe code is because, unlike the Markers, when a function call with potential side-effects is put within the arguments, it's really an argument to a static inline function. In the markers, since it was a parameter passed to a macro, I could shuffle it into the branch and it would not be executed when markers were disabled. However, we don't have this with tracepoints. kvm-move-register-read-write-to-system-headers.patch becomes obsolete since I put the probe code in arch/x86/kvm/. But it would still be required to move vmcs read and encodings to headers, either to include/asm-x86 or arch/x86/kvm. Mathieu > Regards, > > Anthony Liguori > > Mathieu Desnoyers wrote: -- Mathieu Desnoyers OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68 ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [patch 0/4] Port KVM-trace to tracepoints 2008-07-17 15:57 [patch 0/4] Port KVM-trace to tracepoints Mathieu Desnoyers ` (3 preceding siblings ...) 2008-07-17 15:57 ` [patch 4/4] KVM-trace port to tracepoints Mathieu Desnoyers @ 2008-07-22 18:42 ` Avi Kivity 2008-07-22 19:16 ` Frank Ch. Eigler 4 siblings, 1 reply; 40+ messages in thread From: Avi Kivity @ 2008-07-22 18:42 UTC (permalink / raw) To: Mathieu Desnoyers; +Cc: akpm, Ingo Molnar, linux-kernel, Peter Zijlstra, kvm Mathieu Desnoyers wrote: > Hi, > > Here is a port of KVM-trace, currently using macros on top of the Linux Markers, > to tracepoints. Note that I cleaned up the instrumentation too, so stuff like > > KVMTRACE_3D(CR_WRITE, vcpu, (u32)cr, > (u32)kvm_register_read(vcpu, reg), > (u32)((u64)kvm_register_read(vcpu, reg) >> 32), > handler); > > Sprinkled all across the KVM C code becomes : > > trace_kvm_cr_write(vcpu, cr, reg); > > Which looks much nicer, IMHO. > > It applies on top of linux-next patch-v2.6.26-next-20080715. > > It does look nicer; but it means maintaining tracepoints becomes much harder, because the information is scattered across many files. kvm tracepoints are heavily tied into the implementation; and making them harder to write means we will have less information. In fact, I am contemplating moving in another direction (when looking at the pgprintk()s scattered around arch/x86/kvm/mmu.c: kvm_trace("pfentry", "page_fault entry addr %lx error code %x\n", cr2, error_code); Unlike printk()s, no actual formatting would occur during runtime. Instead, at initialization time all the strings would be parsed into a data structure that describes the data types, and the runtime would simply consult this structure and copy the arguments into trace records. User space would also be able to pull this structure and so recreate the formatted string. The advantages I see to this are: - easy to add traces; the most important advantage - when the code changes, obsolete traces are completely removed - good performance - no need to have a formats file in userspace (which is tied to the kernel version) - can also send printk()s along, for synchronization with other kvm and kernel events In fact, why not convert printk() to do this, making it a high-performance logging system? Of course it would retain the dmesg interface, but we could say that messages at the KERN_TRACE level bypass dmesg and only go to the trace system. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [patch 0/4] Port KVM-trace to tracepoints 2008-07-22 18:42 ` [patch 0/4] Port KVM-trace " Avi Kivity @ 2008-07-22 19:16 ` Frank Ch. Eigler 2008-07-22 19:31 ` Avi Kivity 0 siblings, 1 reply; 40+ messages in thread From: Frank Ch. Eigler @ 2008-07-22 19:16 UTC (permalink / raw) To: Avi Kivity Cc: Mathieu Desnoyers, akpm, Ingo Molnar, linux-kernel, Peter Zijlstra, kvm Avi Kivity <avi@qumranet.com> writes: > [...] > kvm tracepoints are heavily tied into the implementation; and making > them harder to write means we will have less information. In fact, I > am contemplating moving in another direction (when looking at the > pgprintk()s scattered around arch/x86/kvm/mmu.c: > > kvm_trace("pfentry", "page_fault entry addr %lx error code %x\n", > cr2, error_code); > > Unlike printk()s, no actual formatting would occur during runtime. Have you considered using trace_mark() directly - eliminating the KVM_TRACEN() middlemen? > Instead, at initialization time all the strings would be parsed into > a data structure that describes the data types, and the runtime > would simply consult this structure and copy the arguments into > trace records. User space would also be able to pull this structure > and so recreate the formatted string. If one really wanted to, one could build such a mechanism on top of marker-based callbacks. > The advantages I see to this are: > > - easy to add traces; the most important advantage > - when the code changes, obsolete traces are completely removed > - good performance Ditto. > - no need to have a formats file in userspace (which is tied to the > kernel version) OTOH, you'd have the kernel collecting compact binary records containing just the parameters, which are at least as tied to kernel version. > - can also send printk()s along, for synchronization with other kvm > and kernel events Ditto. It is elementary to attach a printk-generating marker callback. - FChE ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [patch 0/4] Port KVM-trace to tracepoints 2008-07-22 19:16 ` Frank Ch. Eigler @ 2008-07-22 19:31 ` Avi Kivity 2008-07-22 19:54 ` Frank Ch. Eigler 2008-07-22 22:12 ` [patch 0/4] Port KVM-trace to tracepoints -> LTTng ? Mathieu Desnoyers 0 siblings, 2 replies; 40+ messages in thread From: Avi Kivity @ 2008-07-22 19:31 UTC (permalink / raw) To: Frank Ch. Eigler Cc: Mathieu Desnoyers, akpm, Ingo Molnar, linux-kernel, Peter Zijlstra, kvm Frank Ch. Eigler wrote: > Avi Kivity <avi@qumranet.com> writes: > > >> [...] >> kvm tracepoints are heavily tied into the implementation; and making >> them harder to write means we will have less information. In fact, I >> am contemplating moving in another direction (when looking at the >> pgprintk()s scattered around arch/x86/kvm/mmu.c: >> >> kvm_trace("pfentry", "page_fault entry addr %lx error code %x\n", >> cr2, error_code); >> >> Unlike printk()s, no actual formatting would occur during runtime. >> > > Have you considered using trace_mark() directly - eliminating the > KVM_TRACEN() middlemen? > > Eliminating KVM_TRACEN -- yes. There are too many of them, they aren't type-aware, and they're in uppercase. Using trace_mark() directly -- looking at it, seems to fit the requirements exactly. Should have looked at it earlier. Is there a way to get a list of all markers? Perhaps the kvmtrace marker->relay integration should be made a marker feature, since there is nothing specific to kvm in it. >> Instead, at initialization time all the strings would be parsed into >> a data structure that describes the data types, and the runtime >> would simply consult this structure and copy the arguments into >> trace records. User space would also be able to pull this structure >> and so recreate the formatted string. >> > > If one really wanted to, one could build such a mechanism on top of > marker-based callbacks. > > One does want to. >> - no need to have a formats file in userspace (which is tied to the >> kernel version) >> > > OTOH, you'd have the kernel collecting compact binary records > containing just the parameters, which are at least as tied to kernel > version. > > Yes, but the userspace side would collect the format strings as well (just once) and could put them in the same file. The aggregation is portable across kernel versions. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [patch 0/4] Port KVM-trace to tracepoints 2008-07-22 19:31 ` Avi Kivity @ 2008-07-22 19:54 ` Frank Ch. Eigler 2008-07-22 22:12 ` [patch 0/4] Port KVM-trace to tracepoints -> LTTng ? Mathieu Desnoyers 1 sibling, 0 replies; 40+ messages in thread From: Frank Ch. Eigler @ 2008-07-22 19:54 UTC (permalink / raw) To: Avi Kivity Cc: Mathieu Desnoyers, akpm, Ingo Molnar, linux-kernel, Peter Zijlstra, kvm Avi Kivity <avi@qumranet.com> writes: > [...] >> Have you considered using trace_mark() directly - eliminating the >> KVM_TRACEN() middlemen? > [...] > Using trace_mark() directly -- looking at it, seems to fit the > requirements exactly. Should have looked at it earlier. Is there a > way to get a list of all markers? >From kernel-space, I can't find just now an API for listing them, but there probably is / could be one. >From user-space, lttng tools can probably do it. systemtap can too: % stap -l 'kernel.mark("*")' You could prototype binary tracing thusly: % stap -e 'probe kernel.mark("kvm_foobar") { printf("%4b%4b%4b", # three 4-byte ints cpu(), $arg1, $arg2) }' > Perhaps the kvmtrace marker->relay integration should be made a marker > feature, since there is nothing specific to kvm in it. Right, I believe something like that is in the lttng patch suite. - FChE ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [patch 0/4] Port KVM-trace to tracepoints -> LTTng ? 2008-07-22 19:31 ` Avi Kivity 2008-07-22 19:54 ` Frank Ch. Eigler @ 2008-07-22 22:12 ` Mathieu Desnoyers 2008-07-27 10:11 ` Avi Kivity 1 sibling, 1 reply; 40+ messages in thread From: Mathieu Desnoyers @ 2008-07-22 22:12 UTC (permalink / raw) To: Avi Kivity Cc: Frank Ch. Eigler, akpm, Ingo Molnar, linux-kernel, Peter Zijlstra, kvm * Avi Kivity (avi@qumranet.com) wrote: > Frank Ch. Eigler wrote: >> Avi Kivity <avi@qumranet.com> writes: >> >> >>> [...] >>> kvm tracepoints are heavily tied into the implementation; and making >>> them harder to write means we will have less information. In fact, I >>> am contemplating moving in another direction (when looking at the >>> pgprintk()s scattered around arch/x86/kvm/mmu.c: >>> >>> kvm_trace("pfentry", "page_fault entry addr %lx error code %x\n", >>> cr2, error_code); >>> >>> Unlike printk()s, no actual formatting would occur during runtime. >>> >> >> Have you considered using trace_mark() directly - eliminating the >> KVM_TRACEN() middlemen? >> >> > > Eliminating KVM_TRACEN -- yes. There are too many of them, they aren't > type-aware, and they're in uppercase. > > Using trace_mark() directly -- looking at it, seems to fit the requirements > exactly. Should have looked at it earlier. Is there a way to get a list > of all markers? > > Perhaps the kvmtrace marker->relay integration should be made a marker > feature, since there is nothing specific to kvm in it. > >>> Instead, at initialization time all the strings would be parsed into >>> a data structure that describes the data types, and the runtime >>> would simply consult this structure and copy the arguments into >>> trace records. User space would also be able to pull this structure >>> and so recreate the formatted string. >>> >> >> If one really wanted to, one could build such a mechanism on top of >> marker-based callbacks. >> >> > > One does want to. > >>> - no need to have a formats file in userspace (which is tied to the >>> kernel version) >>> >> >> OTOH, you'd have the kernel collecting compact binary records >> containing just the parameters, which are at least as tied to kernel >> version. >> >> > > Yes, but the userspace side would collect the format strings as well (just > once) and could put them in the same file. The aggregation is portable > across kernel versions. > Yes, LTTng does exactly all that. Please have a look at the current LTTng patchset : http://ltt.polymtl.ca/lttng/patch-2.6.26-0.11.tar.bz2 The interface to list markers is currently found in /proc/ltt Commands like : cat /proc/ltt (list markers) And echo -n "connect marker_name default dynamic channel_name" See the script ltt-armall.sh in the package : http://ltt.polymtl.ca/lttng/ltt-control-0.50-17072008.tar.gz To see how to arm all markers listed. General information (compatibility list and quickstart guide are available at http://ltt.polymtl.ca). Packages also useful : lttv (trace analyser, including text dump, filtering, gui...) and a userspace marker package (only supports x86 32/64 currently). All these packages support any kind of custom markers, because the marker names/types are listed in the "facilities_*" control tracefile at trace start, so the traces are self-described. I also list other stuff (memory maps, irq handler names, system call handler names) at trace start so we can dynamically have these mapping, independently of the architecture. See : http://ltt.polymtl.ca/packages/lttv-0.10.0-pre14-17072008.tar.gz http://ltt.polymtl.ca/packages/markers-userspace-0.5.tar.bz2 I'll be more than happy to answer your questions. Mathieu > -- > I have a truly marvellous patch that fixes the bug which this > signature is too narrow to contain. > -- Mathieu Desnoyers OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68 ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [patch 0/4] Port KVM-trace to tracepoints -> LTTng ? 2008-07-22 22:12 ` [patch 0/4] Port KVM-trace to tracepoints -> LTTng ? Mathieu Desnoyers @ 2008-07-27 10:11 ` Avi Kivity 2008-07-28 0:54 ` [RFC] LTTng merge plan Mathieu Desnoyers 0 siblings, 1 reply; 40+ messages in thread From: Avi Kivity @ 2008-07-27 10:11 UTC (permalink / raw) To: Mathieu Desnoyers Cc: Frank Ch. Eigler, akpm, Ingo Molnar, linux-kernel, Peter Zijlstra, kvm Mathieu Desnoyers wrote: >> Yes, but the userspace side would collect the format strings as well (just >> once) and could put them in the same file. The aggregation is portable >> across kernel versions. >> >> > > Yes, > > LTTng does exactly all that. > > > [snip goodies] > I'll be more than happy to answer your questions. > What's the merge plan for this? -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 40+ messages in thread
* [RFC] LTTng merge plan 2008-07-27 10:11 ` Avi Kivity @ 2008-07-28 0:54 ` Mathieu Desnoyers 2008-07-29 16:18 ` Frank Ch. Eigler 0 siblings, 1 reply; 40+ messages in thread From: Mathieu Desnoyers @ 2008-07-28 0:54 UTC (permalink / raw) To: Avi Kivity Cc: Linus Torvalds, Andrew Morton, Ingo Molnar, Peter Zijlstra, Steven Rostedt, James Bottomley, Frank Ch. Eigler, kvm, linux-kernel, systemtap-ml, linux-btrace * Avi Kivity (avi@qumranet.com) wrote: > Mathieu Desnoyers wrote: > > >>> Yes, but the userspace side would collect the format strings as well >>> (just once) and could put them in the same file. The aggregation is >>> portable across kernel versions. >>> >>> >> >> Yes, >> >> LTTng does exactly all that. >> >> >> > > [snip goodies] > >> I'll be more than happy to answer your questions. >> > > What's the merge plan for this? > Hi Avi, Thanks for asking. Given the amount of expectation from kernel developers, distributions and users I have seen for kernel tracing at this year's OLS, I think giving a detailed merge plan for my LTTng work is appropriate. Currently, it looks like : In Ingo's trees : - Tracepoints, scheduler tracepoints instrumentation, ftrace port to tracepoints - Should make it into 2.6.27 since ftrace needs those. - Immediate Values (faster branch based on load immediate instruction) Useful for markers and tracepoints, but can also be used for any compiled-in code that has to be dynamically enabled. - Aims at 2.6.28 - Text Edit Lock : protection of kernel text modification with a mutex. Synchronises kprobes and immediate values. - Aims at 2.6.28 Short-term submission plan In LTTng patchset (http://ltt.polymtl.ca/lttng/patch-2.6.26-0.12.tar.bz2) - Instrumentation - LTTng tracepoints - Used by LTTng, SystemTAP and usable specialized probes. - Port specific sets of tracepoints along with their current users - ftrace (port currently in Ingo's tree), KVM trace, blktrace. - Data extraction - LTTng timestamping - Based on the CPU cycle counter when synchronized across CPUs. - Fallback on a simple cache-bouncing atomic counter if no synchronized fast time source is available. Basically, the idea is that having the correct event _order_ is more important than having an approximate time, because this "timestamp" is used to reorder events which are written in per-CPU buffers. Time updates can always be recorded as an event in the trace to get an idea of the kernel time flow. - LTTng trace management - netlink interface to start/stop tracing and set the buffer sizes. - Supports multiple channels (high/medium/low event rate). Metadata (marker types, list of interrupt handlers...) can be exported in low event rate channels. - Supports flight recorder mode (overwriting oldest buffer data), normal mode (writes to disk, drops events if buffer is full) or hybrid, or mixed, mode, where the high event rate buffers only are in flight recorder mode. - Data relay - Atomic buffering mechanism which does not call into kernel primitives except preempt disable. Only touches variables atomically, does not use any lock. Aims at having minimal intrusiveness and allowing the largest code coverage (thus not calling kernel code). - LTTng marker control - Currently a /proc/ltt interface with read and write operations to list markers and connect LTTng probe to individual markers, specifying in which channel to send the data (I know, should probably belong to /sys instead, comments welcome) It's not part of the core marker infrastructure because it depends both on markers and on the LTTng trace management. It's also responsible for allocating a numeric ID to a marker which is guaranteed to be unique as long as there is at least one active trace. Medium-term submission plan In LTTng patchset - Instrumentation - Userspace tracing interface - Allow userspace to declare tracepoints and/or markers - Provide a data extraction interface to collect the tracing data. - More work needed in this area. - LTTng statedump - Exports the kernel data structures to the trace buffers at trace start. List interrupts, system calls, threads, memory maps, ... It does not use /proc because : 1 - /proc has nasty races which makes the information "generally correct" but not more. 2 - /proc exports the information in text format, which is not as compact as LTTng binary format. Longer term wishlist - GCC support for static branch patching - Improvement on the immediate values for dynamic code activation A bit more information is available in the slides I just presented at OLS at : http://ltt.polymtl.ca/slides/desnoyers-talk-ols2008.pdf I'll gladly answer to questions/comments. Mathieu -- Mathieu Desnoyers OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68 ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [RFC] LTTng merge plan 2008-07-28 0:54 ` [RFC] LTTng merge plan Mathieu Desnoyers @ 2008-07-29 16:18 ` Frank Ch. Eigler 2008-07-29 17:01 ` Mathieu Desnoyers 0 siblings, 1 reply; 40+ messages in thread From: Frank Ch. Eigler @ 2008-07-29 16:18 UTC (permalink / raw) To: Mathieu Desnoyers Cc: Avi Kivity, Linus Torvalds, Andrew Morton, Ingo Molnar, Peter Zijlstra, Steven Rostedt, James Bottomley, kvm, linux-kernel, systemtap-ml, linux-btrace Hi, Mathieu - > [...] > Currently, [my merge plan] looks like : > > In Ingo's trees : > - Tracepoints, scheduler tracepoints instrumentation, ftrace port to > tracepoints > - Should make it into 2.6.27 since ftrace needs those. > [...] This is the tracepoints infrastructure, and ... > Short-term submission plan > > In LTTng patchset > (http://ltt.polymtl.ca/lttng/patch-2.6.26-0.12.tar.bz2) > > - Instrumentation > - LTTng tracepoints > - Used by LTTng, SystemTAP and usable specialized probes. > [...] ... this is the "meat", which includes both the tracepoints and the the tracepoint-to-marker conversion modules, such as those in "lttng-instrumentation-*-tracepoint-probes.patch", right? - FChE ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [RFC] LTTng merge plan 2008-07-29 16:18 ` Frank Ch. Eigler @ 2008-07-29 17:01 ` Mathieu Desnoyers [not found] ` <20080729211543.GB17097@redhat.com> 0 siblings, 1 reply; 40+ messages in thread From: Mathieu Desnoyers @ 2008-07-29 17:01 UTC (permalink / raw) To: Frank Ch. Eigler Cc: Avi Kivity, Linus Torvalds, Andrew Morton, Ingo Molnar, Peter Zijlstra, Steven Rostedt, James Bottomley, kvm, linux-kernel, systemtap-ml, linux-btrace * Frank Ch. Eigler (fche@redhat.com) wrote: > Hi, Mathieu - > > > [...] > > Currently, [my merge plan] looks like : > > > > In Ingo's trees : > > - Tracepoints, scheduler tracepoints instrumentation, ftrace port to > > tracepoints > > - Should make it into 2.6.27 since ftrace needs those. > > [...] > > This is the tracepoints infrastructure, and ... > > > Short-term submission plan > > > > In LTTng patchset > > (http://ltt.polymtl.ca/lttng/patch-2.6.26-0.12.tar.bz2) > > > > - Instrumentation > > - LTTng tracepoints > > - Used by LTTng, SystemTAP and usable specialized probes. > > [...] > > ... this is the "meat", which includes both the tracepoints > and the the tracepoint-to-marker conversion modules, such as those in > "lttng-instrumentation-*-tracepoint-probes.patch", right? > > - FChE Exactly. The tracepoint-to-markers conversion modules bridge between the in-kernel API (tracepoints) which declares standard location for kernel instrumentation and user-space visible "markers", so that there is an indirection level between the instrumented kernel code and what is visible from userspace. For instance, we have in kernel/sched.c:context_switch() trace_sched_switch(rq, prev, next); Which exports a struct rq * and two struct task_struct *. They are used by both ftrace, which needs access to these data structures, and by the LTTng probes, which extracts the previous PID and state (running, waiting..) and next PID to be scheduled in. Basically, these modules turn the data exported by tracepoints, meaningful only to kernel modules, into data useful for userspace trace analysis. Mathieu -- Mathieu Desnoyers OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68 ^ permalink raw reply [flat|nested] 40+ messages in thread
[parent not found: <20080729211543.GB17097@redhat.com>]
* Re: module-placed markers/tracepoints [not found] ` <20080729211543.GB17097@redhat.com> @ 2008-07-29 22:41 ` Mathieu Desnoyers 2008-07-29 23:01 ` Frank Ch. Eigler 2008-07-30 1:40 ` Rusty Russell 0 siblings, 2 replies; 40+ messages in thread From: Mathieu Desnoyers @ 2008-07-29 22:41 UTC (permalink / raw) To: Frank Ch. Eigler; +Cc: Rusty Russell, linux-kernel, akpm * Frank Ch. Eigler (fche@redhat.com) wrote: > Hi - > > Some locals are wondering -- is there code for (or need for new code > for) incrementing module reference counts while markers and/or > tracepoints resident in modules have active clients? > > - FChE Hi Frank, Probe module unloading is supposed to be done automatically assuming the following module unload behavior : 1 - Module unload code takes the stop machine lock. 2 - It is assumed that the module exit function is called at that time, while no preempt-disabled section is executed. This function unregisters the callbacks. 3 - Given that preemption is disabled around tracepoint/marker calls, no calls are left to a freed module. Actually, I notice that it might not be the case. I took for granted what (I can't remember who ? Rusty ?) told me about module unload code. If we look at sys_delete_module() : /* Stop the machine so refcounts can't move and disable module. */ ret = try_stop_module(mod, flags, &forced); if (ret != 0) goto out; /* Never wait if forced. */ if (!forced && module_refcount(mod) != 0) wait_for_zero_refcount(mod); mutex_unlock(&module_mutex); /* Final destruction now noone is using it. */ if (mod->exit != NULL) mod->exit(); So we see that mod->exit(), which I believe is responsible for calling the module exit() functions, is called after the stop_machine, not during the stop machine. Therefore, what would be needed here is to add a synchronize_sched() after mod->exit() in module.c or after each tracepoint/marker unregister in marker.c/tracepoint.c. However, I'd really prefer to add this to module.c since adding this for _every_ unregister will slow down processing or multiple probe unregistration too much. Rusty, am I understanding that correctly ? Mathieu -- Mathieu Desnoyers OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68 ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: module-placed markers/tracepoints 2008-07-29 22:41 ` module-placed markers/tracepoints Mathieu Desnoyers @ 2008-07-29 23:01 ` Frank Ch. Eigler 2008-07-29 23:19 ` Mathieu Desnoyers 2008-07-30 1:40 ` Rusty Russell 1 sibling, 1 reply; 40+ messages in thread From: Frank Ch. Eigler @ 2008-07-29 23:01 UTC (permalink / raw) To: Mathieu Desnoyers; +Cc: Rusty Russell, linux-kernel, akpm Hi - On Tue, Jul 29, 2008 at 06:41:16PM -0400, Mathieu Desnoyers wrote: > > Some locals are wondering -- is there code for (or need for new code > > for) incrementing module reference counts while markers and/or > > tracepoints resident in modules have active clients? > > Probe module unloading is supposed to be done automatically assuming the > following module unload behavior [...] The question was more that if module-placed markers/tracepoints are armed, is there any mechanism to prevent the modules' unloading. For kprobes, there is. - FChE ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: module-placed markers/tracepoints 2008-07-29 23:01 ` Frank Ch. Eigler @ 2008-07-29 23:19 ` Mathieu Desnoyers 0 siblings, 0 replies; 40+ messages in thread From: Mathieu Desnoyers @ 2008-07-29 23:19 UTC (permalink / raw) To: Frank Ch. Eigler; +Cc: Rusty Russell, linux-kernel, akpm * Frank Ch. Eigler (fche@redhat.com) wrote: > Hi - > > On Tue, Jul 29, 2008 at 06:41:16PM -0400, Mathieu Desnoyers wrote: > > > Some locals are wondering -- is there code for (or need for new code > > > for) incrementing module reference counts while markers and/or > > > tracepoints resident in modules have active clients? > > > > Probe module unloading is supposed to be done automatically assuming the > > following module unload behavior [...] > > The question was more that if module-placed markers/tracepoints are > armed, is there any mechanism to prevent the modules' unloading. For > kprobes, there is. > > - FChE I see, it's the other way around : declaring a marker/tracepoint in a module. When you register to a marker/tracepoint, you actually register the probe in a hash table. It will be connected to every placed marker/tracepoint with that given name. Upon module load, markers/tp that match the name are connected, and upon module unload all markers/tp are simply freed by module.c : unlike kprobes, there is no need for the marker/tp infrastructures to keep track of every marker/tp. So we don't need any refcount on the module which has the marker/tp. However, the probe question is important; I think a synchronize_sched() is missing to handle unload of marker/tp probes modules. Mathieu -- Mathieu Desnoyers OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68 ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: module-placed markers/tracepoints 2008-07-29 22:41 ` module-placed markers/tracepoints Mathieu Desnoyers 2008-07-29 23:01 ` Frank Ch. Eigler @ 2008-07-30 1:40 ` Rusty Russell 2008-07-30 2:27 ` [PATCH] Module : call synchronize_sched() between module exit() and free Mathieu Desnoyers 1 sibling, 1 reply; 40+ messages in thread From: Rusty Russell @ 2008-07-30 1:40 UTC (permalink / raw) To: Mathieu Desnoyers; +Cc: Frank Ch. Eigler, linux-kernel, akpm On Wednesday 30 July 2008 08:41:16 Mathieu Desnoyers wrote: > Therefore, what would be needed here is to add a synchronize_sched() > after mod->exit() in module.c or after each tracepoint/marker unregister > in marker.c/tracepoint.c. However, I'd really prefer to add this to > module.c since adding this for _every_ unregister will slow down > processing or multiple probe unregistration too much. > > Rusty, am I understanding that correctly ? > > Mathieu Hi Mathieu, Yes: stop_machine is merely used to atomically check the module refcount for zero and set the state so it can't be incremented again (ie. try_module_get will fail). So placing a tracepoint or marker in a module does not bump the module refcount? If that's true, then there needs to be some kind of remove_markers_from_module() call after module->exit(), which should do the synchronize_sched() or whatever, right? Rusty. ^ permalink raw reply [flat|nested] 40+ messages in thread
* [PATCH] Module : call synchronize_sched() between module exit() and free. 2008-07-30 1:40 ` Rusty Russell @ 2008-07-30 2:27 ` Mathieu Desnoyers 2008-07-30 3:04 ` Rusty Russell 2008-07-30 11:40 ` Frank Ch. Eigler 0 siblings, 2 replies; 40+ messages in thread From: Mathieu Desnoyers @ 2008-07-30 2:27 UTC (permalink / raw) To: Rusty Russell; +Cc: Frank Ch. Eigler, linux-kernel, akpm > Hi Mathieu, > > Yes: stop_machine is merely used to atomically check the module refcount > for zero and set the state so it can't be incremented again (ie. > try_module_get will fail). > > So placing a tracepoint or marker in a module does not bump the module > refcount? If that's true, then there needs to be some kind of > remove_markers_from_module() call after module->exit(), which should do the > synchronize_sched() or whatever, right? > > Rusty. Actually, it's not placing a marker/tracepoint in a module which causes a problem, this is a simple function call after all, and correctly dealt with by current module.c code. The problem comes from a probe function (the callback) that would be registered to be called from a marker and would sit in an unloadable kernel module. I would not want to tie the refcount of the probe modules to the fact that they are connected to a marker because it would then become impossible to unload them due to the fact that unregistration is done in module exit(). This is one of the reasons why I disable preemption around the marker site (the function call) : to make sure I can can unregister the callback, wait for a quiescent state (with synchronize_sched()) and then free the module memory. This would give the following supplementary guarantee about module teardown : every function called with preemption off and unregistered in the module exit() would reach a quiescent state before the module is freed. Given this does apply to rarely used code (module unload), I think it might be ok to simply add a call to synchronize_sched() before the module memory is freed. Not tying this to markers/tracepoints would keep the behavior consistant across various build options, which is IMHO a good thing. I could also just document that a mandatory "synchronize_sched()" should be called at the end of the probe module exit() function which makes sure the probes has reached a quiescent state. I don't want to add a synchronize_sched() into the marker/tracepoint probe unregistration code because I want to keep batch probe unregistration fast enough so it does no take ~5 seconds to unload ~100 probes. (may take longer on a loaded SMP system) Mathieu -- Mathieu Desnoyers OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68 ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Module : call synchronize_sched() between module exit() and free. 2008-07-30 2:27 ` [PATCH] Module : call synchronize_sched() between module exit() and free Mathieu Desnoyers @ 2008-07-30 3:04 ` Rusty Russell 2008-07-30 4:05 ` Mathieu Desnoyers 2008-07-30 11:40 ` Frank Ch. Eigler 1 sibling, 1 reply; 40+ messages in thread From: Rusty Russell @ 2008-07-30 3:04 UTC (permalink / raw) To: Mathieu Desnoyers; +Cc: Frank Ch. Eigler, linux-kernel, akpm On Wednesday 30 July 2008 12:27:51 Mathieu Desnoyers wrote: > > Hi Mathieu, > > > > Yes: stop_machine is merely used to atomically check the module > > refcount for zero and set the state so it can't be incremented again (ie. > > try_module_get will fail). > > > > So placing a tracepoint or marker in a module does not bump the module > > refcount? If that's true, then there needs to be some kind of > > remove_markers_from_module() call after module->exit(), which should do > > the synchronize_sched() or whatever, right? > > > > Rusty. > > Actually, it's not placing a marker/tracepoint in a module which causes > a problem, this is a simple function call after all, and correctly dealt > with by current module.c code. > > The problem comes from a probe function (the callback) that would be > registered to be called from a marker and would sit in an unloadable > kernel module. I would not want to tie the refcount of the probe modules > to the fact that they are connected to a marker because it would then > become impossible to unload them due to the fact that unregistration is > done in module exit(). Hi Mathieu, Still confused, sorry. Why don't you don't do a synchronize_sched() at the end of your module's exit routine? "You must be completely finished by the time ->exit() returns" is the rule so far... Rusty. ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Module : call synchronize_sched() between module exit() and free. 2008-07-30 3:04 ` Rusty Russell @ 2008-07-30 4:05 ` Mathieu Desnoyers 0 siblings, 0 replies; 40+ messages in thread From: Mathieu Desnoyers @ 2008-07-30 4:05 UTC (permalink / raw) To: Rusty Russell; +Cc: Frank Ch. Eigler, linux-kernel, akpm * Rusty Russell (rusty@rustcorp.com.au) wrote: > On Wednesday 30 July 2008 12:27:51 Mathieu Desnoyers wrote: > > > Hi Mathieu, > > > > > > Yes: stop_machine is merely used to atomically check the module > > > refcount for zero and set the state so it can't be incremented again (ie. > > > try_module_get will fail). > > > > > > So placing a tracepoint or marker in a module does not bump the module > > > refcount? If that's true, then there needs to be some kind of > > > remove_markers_from_module() call after module->exit(), which should do > > > the synchronize_sched() or whatever, right? > > > > > > Rusty. > > > > Actually, it's not placing a marker/tracepoint in a module which causes > > a problem, this is a simple function call after all, and correctly dealt > > with by current module.c code. > > > > The problem comes from a probe function (the callback) that would be > > registered to be called from a marker and would sit in an unloadable > > kernel module. I would not want to tie the refcount of the probe modules > > to the fact that they are connected to a marker because it would then > > become impossible to unload them due to the fact that unregistration is > > done in module exit(). > > Hi Mathieu, > > Still confused, sorry. Why don't you don't do a synchronize_sched() at > the end of your module's exit routine? "You must be completely finished by > the time ->exit() returns" is the rule so far... > Hi Rusty, Yes, this is I think the right solution. Sorry for the title of the previous email : I started writing a patch and changed my mind half-way. Doing a synchronize_sched() at the end of the exit() function seems like the right thing to do. I'll have to update the documentation and examples accordingly. Thanks, Mathieu > Rusty. -- Mathieu Desnoyers OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68 ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Module : call synchronize_sched() between module exit() and free. 2008-07-30 2:27 ` [PATCH] Module : call synchronize_sched() between module exit() and free Mathieu Desnoyers 2008-07-30 3:04 ` Rusty Russell @ 2008-07-30 11:40 ` Frank Ch. Eigler 2008-07-30 14:09 ` Mathieu Desnoyers 1 sibling, 1 reply; 40+ messages in thread From: Frank Ch. Eigler @ 2008-07-30 11:40 UTC (permalink / raw) To: Mathieu Desnoyers; +Cc: Rusty Russell, linux-kernel, akpm Hi - On Tue, Jul 29, 2008 at 10:27:51PM -0400, Mathieu Desnoyers wrote: > [...] > Actually, it's not placing a marker/tracepoint in a module which causes > a problem, this is a simple function call after all, and correctly dealt > with by current module.c code. > [...] Just to spell it out, it is this scenario I'd like to see documented: module-foo.c: foo() { ... trace_mark (foo, "..."); ... } module-bar.c: setup() { ... marker_probe_register ("foo" , ..., &foo_handler ); } teardown() { ... marker_probe_unregister ("foo" , ..., &foo_handler ); } foo_handler() { } 1) module-foo loads 2) module-bar loads 3) module-bar.c:setup() 4) module-foo unloads What happens here? Certainly no more calls to foo_handler, but is that all? (Would it not be desirable for an active marker to cause module-foo's refcount to increase, so as to prevent unloading at this time?) 5) module-bar.c:teardown() Can this teardown code succeed fully even if module-foo is already dead and gone? - FChE ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Module : call synchronize_sched() between module exit() and free. 2008-07-30 11:40 ` Frank Ch. Eigler @ 2008-07-30 14:09 ` Mathieu Desnoyers 2008-07-31 0:54 ` Rusty Russell 0 siblings, 1 reply; 40+ messages in thread From: Mathieu Desnoyers @ 2008-07-30 14:09 UTC (permalink / raw) To: Frank Ch. Eigler; +Cc: Rusty Russell, linux-kernel, akpm * Frank Ch. Eigler (fche@redhat.com) wrote: > Hi - > > On Tue, Jul 29, 2008 at 10:27:51PM -0400, Mathieu Desnoyers wrote: > > [...] > > Actually, it's not placing a marker/tracepoint in a module which causes > > a problem, this is a simple function call after all, and correctly dealt > > with by current module.c code. > > [...] > > Just to spell it out, it is this scenario I'd like to see documented: > > module-foo.c: > foo() { ... trace_mark (foo, "..."); ... } > > module-bar.c: > setup() { ... marker_probe_register ("foo" , ..., &foo_handler ); } > teardown() { ... marker_probe_unregister ("foo" , ..., &foo_handler ); } > foo_handler() { } > > 1) module-foo loads > 2) module-bar loads > 3) module-bar.c:setup() > 4) module-foo unloads > > What happens here? Certainly no more calls to foo_handler, but is > that all? Yep, that's it. However a hash table still keeps track of the "foo" handler so that it's reconnected whenever module-foo is reloaded. >(Would it not be desirable for an active marker to cause > module-foo's refcount to increase, so as to prevent unloading at this > time?) No, because I want to be able to unload the marked module and I don't want the fact that a probe is connected to it to change that. > > 5) module-bar.c:teardown() > > Can this teardown code succeed fully even if module-foo is already > dead and gone? > Yes. Actually there is a detail missing here. Your teardown should be : teardown() { ... marker_probe_unregister ("foo" , ..., &foo_handler ); synchronize_sched(); /* Before returning from exit */ } This makes sure that every live marker call are finished and that it is safe to unload module-bar (the probe). Mathieu > > - FChE -- Mathieu Desnoyers OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68 ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Module : call synchronize_sched() between module exit() and free. 2008-07-30 14:09 ` Mathieu Desnoyers @ 2008-07-31 0:54 ` Rusty Russell 0 siblings, 0 replies; 40+ messages in thread From: Rusty Russell @ 2008-07-31 0:54 UTC (permalink / raw) To: Mathieu Desnoyers; +Cc: Frank Ch. Eigler, linux-kernel, akpm On Thursday 31 July 2008 00:09:38 Mathieu Desnoyers wrote: > * Frank Ch. Eigler (fche@redhat.com) wrote: > >(Would it not be desirable for an active marker to cause > > module-foo's refcount to increase, so as to prevent unloading at this > > time?) > > No, because I want to be able to unload the marked module and I don't > want the fact that a probe is connected to it to change that. Also you might want to put a marker in the module's exit code. > Actually there is a detail missing here. Your teardown should be : > teardown() { ... marker_probe_unregister ("foo" , ..., &foo_handler ); > synchronize_sched(); /* Before returning from exit */ } > > This makes sure that every live marker call are finished and that it is > safe to unload module-bar (the probe). Perhaps create a synchronize_marker_unregister() wrapper for such uses; it's better documentation and the caller doesn't have to know exactly how it works... Cheers, Rusty. ^ permalink raw reply [flat|nested] 40+ messages in thread
end of thread, other threads:[~2008-07-31 0:54 UTC | newest]
Thread overview: 40+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-07-17 15:57 [patch 0/4] Port KVM-trace to tracepoints Mathieu Desnoyers
2008-07-17 15:57 ` [patch 1/4] kvm move VMCS Encodings to system headers Mathieu Desnoyers
2008-07-17 15:57 ` [patch 2/4] kvm move VMCS read " Mathieu Desnoyers
2008-07-17 15:57 ` [patch 3/4] KVM move register read-write " Mathieu Desnoyers
2008-07-17 15:57 ` [patch 4/4] KVM-trace port to tracepoints Mathieu Desnoyers
2008-07-17 16:49 ` Jan Kiszka
2008-07-17 17:28 ` Mathieu Desnoyers
2008-07-22 16:04 ` Jan Kiszka
2008-07-22 18:46 ` Avi Kivity
2008-07-23 7:49 ` Peter Zijlstra
2008-07-23 8:08 ` Avi Kivity
2008-07-23 8:55 ` Peter Zijlstra
2008-07-23 9:32 ` Avi Kivity
2008-07-23 9:53 ` Peter Zijlstra
2008-07-23 13:15 ` Mathieu Desnoyers
2008-07-23 10:03 ` Christoph Hellwig
2008-07-23 10:08 ` Avi Kivity
2008-07-23 10:13 ` Christoph Hellwig
2008-07-23 13:20 ` Mathieu Desnoyers
2008-07-17 16:52 ` Anthony Liguori
2008-07-17 17:04 ` Mathieu Desnoyers
2008-07-22 18:42 ` [patch 0/4] Port KVM-trace " Avi Kivity
2008-07-22 19:16 ` Frank Ch. Eigler
2008-07-22 19:31 ` Avi Kivity
2008-07-22 19:54 ` Frank Ch. Eigler
2008-07-22 22:12 ` [patch 0/4] Port KVM-trace to tracepoints -> LTTng ? Mathieu Desnoyers
2008-07-27 10:11 ` Avi Kivity
2008-07-28 0:54 ` [RFC] LTTng merge plan Mathieu Desnoyers
2008-07-29 16:18 ` Frank Ch. Eigler
2008-07-29 17:01 ` Mathieu Desnoyers
[not found] ` <20080729211543.GB17097@redhat.com>
2008-07-29 22:41 ` module-placed markers/tracepoints Mathieu Desnoyers
2008-07-29 23:01 ` Frank Ch. Eigler
2008-07-29 23:19 ` Mathieu Desnoyers
2008-07-30 1:40 ` Rusty Russell
2008-07-30 2:27 ` [PATCH] Module : call synchronize_sched() between module exit() and free Mathieu Desnoyers
2008-07-30 3:04 ` Rusty Russell
2008-07-30 4:05 ` Mathieu Desnoyers
2008-07-30 11:40 ` Frank Ch. Eigler
2008-07-30 14:09 ` Mathieu Desnoyers
2008-07-31 0:54 ` Rusty Russell
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox