* Measuring KVM Performance using Hardware Performance Counters

From: Xin Tong @ 2014-01-30  5:06 UTC
To: kvm

Hi,

I would like to measure the performance of KVM using hardware
performance counters, and I have some questions:

1. If I want to measure the amount of time spent in instruction and
   device emulation, should I use oprofile on the KVM process on the
   host machine?
2. What about the amount of time spent in TLB misses? KVM uses nested
   paging, under which a TLB miss can be expensive.
3. Are there existing ways/measurements I can look into?

Thank you
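A minimal sketch of the kind of host-side counter measurement question 2
is asking about, assuming a perf-enabled host kernel; the generic TLB
event names below may be unavailable or map to different raw events
depending on the CPU, and <qemu-pid> is a placeholder:

  # Count generic TLB-miss events for a running guest from the host,
  # attaching to the qemu-system-x86_64 process for a 60-second window.
  # Event names and availability depend on the host CPU and kernel.
  perf stat -e dTLB-load-misses,dTLB-store-misses,iTLB-load-misses \
      -p <qemu-pid> -- sleep 60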
* Re: Measuring KVM Performance using Hardware Performance Counters

From: Xin Tong @ 2014-02-02  2:08 UTC
To: kvm

I am getting very strange profile results when I run operf on the Linux
host and profile a KVM virtual machine running the DaCapo eclipse
benchmark. I expected a lot of the time to be spent in
qemu-system-x86_64, since the instructions from the eclipse benchmark
should be attributed to the qemu-system-x86_64 process, but the results
say otherwise. Any suggestions?

CPU_CLK_UNHALTED     %      Module                                                      Symbol/Functions
1054                 31.49  /kvm                                                        /kvm
 645                 19.27  /kvm_intel                                                  /kvm_intel
 248                  7.41  /home/xtong/qemu-kvm-1.2.0/install/bin/qemu-system-x86_64   /home/xtong/qemu-kvm-1.2.0/install/bin/qemu-system-x86_64
 215                  6.42  /usr/lib/debug/boot/vmlinux-3.11.0-13-generic               native_write_msr_safe
 101                  3.02  /usr/lib/debug/boot/vmlinux-3.11.0-13-generic               native_read_msr_safe
  93                  2.78  /lib/x86_64-linux-gnu/libc-2.15.so                          /lib/x86_64-linux-gnu/libc-2.15.so
  77                  2.30  /lib/x86_64-linux-gnu/libglib-2.0.so.0.3200.4               /lib/x86_64-linux-gnu/libglib-2.0.so.0.3200.4
  65                  1.94  /usr/lib/debug/boot/vmlinux-3.11.0-13-generic               __srcu_read_lock
  48                  1.43  /usr/lib/debug/boot/vmlinux-3.11.0-13-generic               __srcu_read_unlock
  40                  1.20  /usr/lib/debug/boot/vmlinux-3.11.0-13-generic               native_load_tr_desc
  29                  0.87  /usr/lib/debug/boot/vmlinux-3.11.0-13-generic               __ticket_spin_lock
  25                  0.75  /usr/lib/debug/boot/vmlinux-3.11.0-13-generic               guest_exit
  21                  0.63  /usr/lib/debug/boot/vmlinux-3.11.0-13-generic               native_load_gdt
...

Thank you.

On Wed, Jan 29, 2014 at 11:06 PM, Xin Tong <xerox.time.tech@gmail.com> wrote:
> Hi,
>
> I would like to measure the performance of KVM using hardware
> performance counters, and I have some questions:
>
> 1. If I want to measure the amount of time spent in instruction and
>    device emulation, should I use oprofile on the KVM process on the
>    host machine?
> 2. What about the amount of time spent in TLB misses? KVM uses nested
>    paging, under which a TLB miss can be expensive.
> 3. Are there existing ways/measurements I can look into?
>
> Thank you
* Re: Measuring KVM Performance using Hardware Performance Counters

From: Paolo Bonzini @ 2014-02-02 11:37 UTC
To: Xin Tong, kvm

On 02/02/2014 03:08, Xin Tong wrote:
> I am getting very strange profile results when I run operf on the
> Linux host and profile a KVM virtual machine running the DaCapo
> eclipse benchmark. I expected a lot of the time to be spent in
> qemu-system-x86_64, since the instructions from the eclipse benchmark
> should be attributed to the qemu-system-x86_64 process, but the
> results say otherwise. Any suggestions?

Most of the time should be spent running the guest.  This is in the
context of the qemu-system-x86_64 process, but it is not running code
from the qemu executable.

What likely happens is that when the profiling counter expires, it
causes the VM to exit before the profiling interrupt is delivered.  The
sample then gets attributed to the kvm_intel module.

31.5% in the KVM module seems like a lot.  Can you detail the results
further, so that I can see the function names for kvm and kvm_intel?

Paolo
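One hedged sketch of how host-side samples can be attributed to guest
symbols instead of being lumped into kvm_intel is perf's KVM support,
assuming a perf build with that support and that the guest's
/proc/kallsyms and /proc/modules have been copied to the host; both
paths below are placeholders:

  # Record host and guest samples system-wide, resolving guest symbols
  # from files copied out of the guest beforehand.
  perf kvm --host --guest --guestkallsyms=/tmp/guest-kallsyms \
      --guestmodules=/tmp/guest-modules record -a -- sleep 60
  perf kvm --host --guest --guestkallsyms=/tmp/guest-kallsyms \
      --guestmodules=/tmp/guest-modules report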
[parent not found: <CA+JLOitGuTcX9coon0EKWBuhzuw4_7BxEab3=sk80J_WFakqhA@mail.gmail.com>]
[parent not found: <52EF4F48.7080609@redhat.com>]
* Re: Measuring KVM Performance using Hardware Performance Counters

From: Xin Tong @ 2014-02-03 17:06 UTC
To: Paolo Bonzini, kvm

This is a profile taken of KVM on the host. The KVM guest Linux is
running SPEC CPU2006 403.gcc with the reference data set. The commands
used to record and report the profile are:

sudo operf -s --separate-thread --event=CPU_CLK_UNHALTED:5000000 --vmlinux=/home/xtong/xtong-kernel/vmlinux
sudo opreport --image-path /home/xtong/xtong-kernel -l -g -d --xml -o current.opm

I have a few questions:

1. Why are there 3 qemu-system-x86_64 threads, and why do all of them
   take a non-trivial amount of time, i.e. 71.24%, 16.25% and 12.51%
   respectively?
2. Why does qemu-system-x86_64 take only 8.10% of the time? I would
   imagine most of the time should be spent in qemu-system-x86_64,
   since 403.gcc does not do much I/O.
3. Why is so much time spent in vmlinux? The symbols for vmlinux are
   listed below.
4. Why is so much time spent in kvm_intel and kvm? The symbols for
   both are listed below.

Overall Profile: current.opm (1549.0 ticks) [default counter: CPU_CLK_UNHALTED]

Process > Core > Thread > Module
/.../qemu-system-x86_64 [PID 2534] (1231 ticks/79.47%)
  Core0 (1231 ticks/100.00%)
    /.../qemu-system-x86_64 TID 2537 [TID 2537] (877 ticks/71.24%)
      /.../vmlinux (395 ticks/45.04%)
      /kvm (198 ticks/22.58%)
      /kvm_intel (153 ticks/17.45%)
      /.../qemu-system-x86_64 (71 ticks/8.10%)
      /.../libc-2.15.so (47 ticks/5.36%)
      /.../libpthread-2.15.so (12 ticks/1.37%)
      /libahci (1 ticks/0.11%)
    /.../qemu-system-x86_64 TID 2658 [TID 2658] (200 ticks/16.25%)
      /.../vmlinux (190 ticks/95.00%)
      /.../libpthread-2.15.so (6 ticks/3.00%)
      /.../libc-2.15.so (2 ticks/1.00%)
      /.../qemu-system-x86_64 (1 ticks/0.50%)
      [vdso] (tgid:2534 range:0x7fff10588000-0x7fff10589fff) (1 ticks/0.50%)
    /.../qemu-system-x86_64 TID 2534 [TID 2534] (154 ticks/12.51%)
/ksmd [PID 53] (83 ticks/5.36%)
/.../top [PID 2617] (43 ticks/2.78%)
/.../unity-2d-shell [PID 1807] (41 ticks/2.65%)

/.../vmlinux (395 ticks/45.04%):
CPU_CLK_UNHALTED     %      Symbol/Functions
82                   20.76  native_write_msr_safe
46                   11.65  native_read_msr_safe
25                    6.33  __srcu_read_lock
16                    4.05  native_load_tr_desc
12                    3.04  __srcu_read_unlock
 9                    2.28  native_load_gdt
 8                    2.03  fget_light
 8                    2.03  _raw_spin_lock_irq
 7                    1.77  memcpy
 6                    1.52  _raw_spin_lock
 6                    1.52  __set_current_blocked
 5                    1.27  user_return_notifier_unregister
 5                    1.27  update_cfs_shares
 5                    1.27  page_fault
 5                    1.27  bsearch

/kvm (198 ticks/22.58%):
CPU_CLK_UNHALTED     %      Symbol/Functions
42                   21.21  kvm_arch_vcpu_ioctl_run
 9                    4.55  kvm_set_shared_msr
 7                    3.54  x86_emulate_insn
 7                    3.54  paging64_walk_addr_generic
 7                    3.54  kvm_on_user_return
 7                    3.54  do_insn_fetch
 6                    3.03  gfn_to_memslot
 6                    3.03  decode_operand
 5                    2.53  kvm_io_bus_sort_cmp
 5                    2.53  kvm_arch_vcpu_load
 5                    2.53  kvm_apic_accept_pic_intr
 4                    2.02  kvm_fetch_guest_virt
 4                    2.02  kvm_arch_vcpu_runnable
 4                    2.02  emulator_read_gpr
 4                    2.02  apic_has_pending_timer
 3                    1.52  x86_decode_insn

/kvm_intel (153 ticks/17.45%):
CPU_CLK_UNHALTED     %      Symbol/Functions
92                   60.13  vmx_vcpu_run
 9                    5.88  vmx_save_host_state
 7                    4.58  vmx_handle_external_intr
 7                    4.58  vmcs_writel
 6                    3.92  vmx_handle_exit
 6                    3.92  __vmx_load_host_state.part.43
 4                    2.61  clear_atomic_switch_msr
 3                    1.96  vmx_vm_has_apicv
 3                    1.96  vmx_get_rflags
 2                    1.31  vmx_read_guest_seg_selector
 2                    1.31  vmcs_clear_bits
 2                    1.31  skip_emulated_instruction
 1                    0.65  vmx_set_rflags
 1                    0.65  vmx_vcpu_load

Thank you,
Xin

On Mon, Feb 3, 2014 at 2:11 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 02/02/2014 16:47, Xin Tong wrote:
>
>> On Sun, Feb 2, 2014 at 5:37 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>>
>>> On 02/02/2014 03:08, Xin Tong wrote:
>>>>
>>>> I am getting very strange profile results when I run operf on the
>>>> Linux host and profile a KVM virtual machine running the DaCapo
>>>> eclipse benchmark. I expected a lot of the time to be spent in
>>>> qemu-system-x86_64, since the instructions from the eclipse
>>>> benchmark should be attributed to the qemu-system-x86_64 process,
>>>> but the results say otherwise. Any suggestions?
>>>
>>> Most of the time should be spent running the guest.  This is in the
>>> context of the qemu-system-x86_64 process, but it is not running
>>> code from the qemu executable.
>>
>> That is what I was trying to say; you said it better.
>>
>>> What likely happens is that when the profiling counter expires, it
>>> causes the VM to exit before the profiling interrupt is delivered.
>>> The sample then gets attributed to the kvm_intel module.
>>
>> Could not the kvm module read the counters and inject a counter
>> overflow interrupt into the guest? What are some ways to make the
>> profile more accurate? This is collected on an Intel Haswell host
>> machine.
>
> Yes, it can.  But then you have to run perf/operf in the guest, not in
> the host.
>
> For this to work, you need to specify "-cpu host" on the QEMU command
> line.
>
> Paolo
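As a concrete illustration of the "-cpu host" suggestion quoted at the
end of the message above, here is a sketch of the guest-side profiling
setup. It assumes the disk image used elsewhere in this thread, that the
guest has perf installed, and that the host kernel and QEMU version
expose a virtual PMU with "-cpu host"; the benchmark command is a
placeholder:

  # Start the guest with the host CPU model so the guest can see a
  # (virtual) PMU and run its own profiler.
  qemu-system-x86_64 -enable-kvm -cpu host -m 2048 \
      ~/disks/ubuntu-natty.qcow2 -nographic

  # Then, inside the guest, samples resolve against guest symbols:
  perf record -e cycles -- ./run_benchmark   # ./run_benchmark is a placeholder
  perf report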
* Re: Measuring KVM Performance using Hardware Performance Counters

From: Paolo Bonzini @ 2014-02-04  3:31 UTC
To: Xin Tong, kvm

On 03/02/2014 18:06, Xin Tong wrote:
> /.../qemu-system-x86_64 TID 2537 [TID 2537] (877 ticks/71.24%)

This is the CPU thread (it calls into the KVM modules).

>   /.../vmlinux (395 ticks/45.04%)
>   /kvm (198 ticks/22.58%)
>   /kvm_intel (153 ticks/17.45%)
>   /.../qemu-system-x86_64 (71 ticks/8.10%)
>   /.../libc-2.15.so (47 ticks/5.36%)
>   /.../libpthread-2.15.so (12 ticks/1.37%)
>   /libahci (1 ticks/0.11%)
> /.../qemu-system-x86_64 TID 2658 [TID 2658] (200 ticks/16.25%)

This is probably the VNC thread.

>   /.../vmlinux (190 ticks/95.00%)
>   /.../libpthread-2.15.so (6 ticks/3.00%)
>   /.../libc-2.15.so (2 ticks/1.00%)
>   /.../qemu-system-x86_64 (1 ticks/0.50%)
>   [vdso] (tgid:2534 range:0x7fff10588000-0x7fff10589fff) (1 ticks/0.50%)
> /.../qemu-system-x86_64 TID 2534 [TID 2534] (154 ticks/12.51%)

This is the main thread (lowest TID of all).

> /ksmd [PID 53] (83 ticks/5.36%)
> /.../top [PID 2617] (43 ticks/2.78%)
> /.../unity-2d-shell [PID 1807] (41 ticks/2.65%)

Now, going to your questions:

> 2. Why does qemu-system-x86_64 take only 8.10% of the time? I would
>    imagine most of the time should be spent in qemu-system-x86_64,
>    since 403.gcc does not do much I/O.

This is the same thing I already answered: the time spent running the
guest is in the context of the qemu-system-x86_64 process, but it is
not running code from the qemu executable.  In fact, 8.10% spent in
qemu-system-x86_64 is a lot; I would expect much less.

> 3. Why is so much time spent in vmlinux? The symbols for vmlinux are
>    listed below.
> 4. Why is so much time spent in kvm_intel and kvm? The symbols for
>    both are listed below.

Again, I think I already answered this.  The time spent in vmx_vcpu_run
is the actual time spent in the guest (17.45% * 60.13% = roughly 10%).
Everything else is overhead introduced by either virtualization or
profiling.

What SPEC scores are you getting from bare metal, KVM without oprofile,
and KVM with oprofile?  Is the profile substantially different if you
use perf instead of oprofile?

Can you run "scripts/kvm/kvm_stat" (from the QEMU tree) while your
guest is running (under oprofile) and paste the output from that tool?

Thanks,

Paolo
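A hedged alternative to scripts/kvm/kvm_stat for the exit accounting
requested above, assuming the host kernel exposes the kvm tracepoints
and a reasonably recent perf; <qemu-pid> is a placeholder:

  # Count VM entries/exits system-wide over a 60-second window
  # (roughly the same counters that kvm_stat displays).
  perf stat -e 'kvm:kvm_entry' -e 'kvm:kvm_exit' -a sleep 60

  # Or break exits down by exit reason with perf's kvm stat subcommand
  # (stop recording with Ctrl-C, then report):
  perf kvm stat record -p <qemu-pid>
  perf kvm stat report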
* Re: Measuring KVM Performance using Hardware Performance Counters

From: Xin Tong @ 2014-02-04 20:53 UTC
To: Paolo Bonzini
Cc: kvm

On Mon, Feb 3, 2014 at 9:31 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 03/02/2014 18:06, Xin Tong wrote:
>
>> /.../qemu-system-x86_64 TID 2537 [TID 2537] (877 ticks/71.24%)
>
> This is the CPU thread (it calls into the KVM modules).

This is the thread that calls kvm_cpu_exec(CPUArchState *env) and the
ioctl.

>>   /.../vmlinux (395 ticks/45.04%)
>>   /kvm (198 ticks/22.58%)
>>   /kvm_intel (153 ticks/17.45%)
>>   /.../qemu-system-x86_64 (71 ticks/8.10%)
>>   /.../libc-2.15.so (47 ticks/5.36%)
>>   /.../libpthread-2.15.so (12 ticks/1.37%)
>>   /libahci (1 ticks/0.11%)
>> /.../qemu-system-x86_64 TID 2658 [TID 2658] (200 ticks/16.25%)
>
> This is probably the VNC thread.
>
>>   /.../vmlinux (190 ticks/95.00%)
>>   /.../libpthread-2.15.so (6 ticks/3.00%)
>>   /.../libc-2.15.so (2 ticks/1.00%)
>>   /.../qemu-system-x86_64 (1 ticks/0.50%)
>>   [vdso] (tgid:2534 range:0x7fff10588000-0x7fff10589fff) (1 ticks/0.50%)
>> /.../qemu-system-x86_64 TID 2534 [TID 2534] (154 ticks/12.51%)
>
> This is the main thread (lowest TID of all).

Is this the thread that executes main_loop_wait and handles I/O
emulation?

>> /ksmd [PID 53] (83 ticks/5.36%)
>> /.../top [PID 2617] (43 ticks/2.78%)
>> /.../unity-2d-shell [PID 1807] (41 ticks/2.65%)
>
> Now, going to your questions:
>
>> 2. Why does qemu-system-x86_64 take only 8.10% of the time? I would
>>    imagine most of the time should be spent in qemu-system-x86_64,
>>    since 403.gcc does not do much I/O.
>
> This is the same thing I already answered: the time spent running the
> guest is in the context of the qemu-system-x86_64 process, but it is
> not running code from the qemu executable.  In fact, 8.10% spent in
> qemu-system-x86_64 is a lot; I would expect much less.
>
>> 3. Why is so much time spent in vmlinux? The symbols for vmlinux are
>>    listed below.
>> 4. Why is so much time spent in kvm_intel and kvm? The symbols for
>>    both are listed below.
>
> Again, I think I already answered this.  The time spent in
> vmx_vcpu_run is the actual time spent in the guest (17.45% * 60.13% =
> roughly 10%).

I see, so this 10% is the actual useful GCC work done from the
perspective of the guest, and the 8.10% is the time spent in the
qemu-system-x86_64 process.

> Everything else is overhead introduced by either virtualization or
> profiling.
>
> What SPEC scores are you getting from bare metal, KVM without
> oprofile, and KVM with oprofile?  Is the profile substantially
> different if you use perf instead of oprofile?
>
> Can you run "scripts/kvm/kvm_stat" (from the QEMU tree) while your
> guest is running (under oprofile) and paste the output from that tool?

The data below is from a new 403.gcc run on a Haswell Linux machine
with the ref input data set. The profile was taken over the entire GCC
run.

Host Linux kernel (compiled myself) version:
Linux aristotle 3.13.1+ #3 SMP Mon Feb 3 11:16:00 EST 2014 x86_64 x86_64 x86_64 GNU/Linux

Command to run kvm:
kvm ~/disks/ubuntu-natty.qcow2 -nographic

Commands to profile with oprofile:
sudo operf -s --separate-thread --event=CPU_CLK_UNHALTED:5000000 --vmlinux=/home/xtong/xtong-kernel/vmlinux
sudo opreport --image-path /home/xtong/xtong-kernel -l -g -d --xml -o current.opm

Profiled KVM performance (lower is better):               1005 seconds
Non-profiled KVM performance (lower is better):            998 seconds
Non-profiled run on the host machine (lower is better):    686 seconds
(The virtualization overhead is still quite large in spite of the
hardware extensions!)

Running "sudo scripts/kvm/kvm_stat" gave:

kvm statistics
 kvm_exit                 105    16
 kvm_entry                105    16
 kvm_apic                  70    10
 kvm_emulate_insn          35     5
 kvm_eoi                   35     5
 kvm_mmio                  35     5
 kvm_inj_virq              35     5
 kvm_apic_accept_irq       35     5

Overall:
Core0 (53973 ticks/100.00%)
  /.../qemu-system-x86_64 TID 8861 [TID 8861] (38676 ticks/71.66%)
    /.../vmlinux (17426 ticks/45.06%)
    /kvm (7768 ticks/20.08%)
    /kvm_intel (6882 ticks/17.79%)
    /.../qemu-system-x86_64 (3535 ticks/9.14%)
    /.../libc-2.15.so (2384 ticks/6.16%)
    /.../libpthread-2.15.so (537 ticks/1.39%)
    /.../librt-2.15.so (82 ticks/0.21%)
    /.../libglib-2.0.so.0.3200.4 (41 ticks/0.11%)
    [vdso] (tgid:8858 range:0x7fffe31fe000-0x7fffe31fffff) (16 ticks/0.04%)
    /libahci (4 ticks/0.01%)
    /r8169 (1 ticks/0.00%)
  /.../qemu-system-x86_64 TID 8858 [TID 8858] (6745 ticks/12.50%)
  /.../qemu-system-x86_64 TID 8880 [TID 8880] (2945 ticks/5.46%)
  /.../qemu-system-x86_64 TID 8882 [TID 8882] (2835 ticks/5.25%)
  /.../qemu-system-x86_64 TID 8860 [TID 8860] (2772 ticks/5.14%)

vmlinux (45.06%):
CPU_CLK_UNHALTED     %      Symbol/Functions
4583                 26.30  native_write_msr_safe
2067                 11.86  native_read_msr_safe
1148                  6.59  __srcu_read_lock
 852                  4.89  __srcu_read_unlock
 758                  4.35  native_load_tr_desc
 527                  3.02  native_load_gdt
 492                  2.82  fget_light
 313                  1.80  __set_current_blocked
 270                  1.55  _raw_spin_lock_irq
 198                  1.14  recalc_sigpending
 194                  1.11  system_call
 174                  1.00  memcpy
 164                  0.94  rcu_note_context_switch
 164                  0.94  fput
 160                  0.92  user_return_notifier_unregister
 158                  0.91  SyS_ioctl

kvm (20.08%):
CPU_CLK_UNHALTED     %      Symbol/Functions
1949                 25.09  kvm_arch_vcpu_ioctl_run
 421                  5.42  kvm_set_shared_msr
 394                  5.07  paging64_walk_addr_generic
 271                  3.49  kvm_on_user_return
 256                  3.30  x86_emulate_instruction
 249                  3.21  x86_emulate_insn
 221                  2.85  em_in
 212                  2.73  x86_decode_insn
 156                  2.01  init_emulate_ctxt
 145                  1.87  emulator_pio_in_out.isra.104

kvm_intel (17.79%):
CPU_CLK_UNHALTED     %      Symbol/Functions
4006                 58.21  vmx_vcpu_run
 540                  7.85  vmcs_writel
 472                  6.86  vmx_save_host_state
 275                  4.00  __vmx_load_host_state.part.43
 151                  2.19  vmx_handle_exit
 149                  2.17  vmcs_clear_bits
 130                  1.89  clear_atomic_switch_msr
 109                  1.58  vmx_set_interrupt_shadow
 109                  1.58  vmx_handle_external_intr
 102                  1.48  vmx_get_rflags

So, 58.21% * 17.79% = 10.3% of the time is spent executing the code
from 403.gcc (i.e. doing useful work); is this correct? If that is the
case, why do I not see a ~10x performance degradation? Also, I see that
a lot of time is spent in native_write_msr_safe and
native_read_msr_safe in vmlinux; what are these for?

Thanks,
Xin
Thread overview: 6+ messages
2014-01-30 5:06 Measuring KVM Performance using Hardware Performance Counters Xin Tong
2014-02-02 2:08 ` Xin Tong
2014-02-02 11:37 ` Paolo Bonzini
[not found] ` <CA+JLOitGuTcX9coon0EKWBuhzuw4_7BxEab3=sk80J_WFakqhA@mail.gmail.com>
[not found] ` <52EF4F48.7080609@redhat.com>
2014-02-03 17:06 ` Xin Tong
2014-02-04 3:31 ` Paolo Bonzini
2014-02-04 20:53 ` Xin Tong