From mboxrd@z Thu Jan 1 00:00:00 1970
From: Avi Kivity
Subject: Re: [PATCH] KVM: Use thread debug register storage instead of kvm specific data
Date: Sun, 06 Sep 2009 11:21:15 +0300
Message-ID: <4AA370FB.9050007@redhat.com>
References: <1251798248-13164-1-git-send-email-avi@redhat.com> <4A9CEDBE.8010902@redhat.com> <1251828730.9683.129.camel@twinturbo.austin.ibm.com> <4A9D6693.7040401@redhat.com> <1252075697.22211.43.camel@twinturbo.austin.ibm.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Marcelo Tosatti , kvm@vger.kernel.org, Jan Kiszka
To: habanero@linux.vnet.ibm.com
Return-path:
Received: from mx1.redhat.com ([209.132.183.28]:27041 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750710AbZIFIVT (ORCPT ); Sun, 6 Sep 2009 04:21:19 -0400
In-Reply-To: <1252075697.22211.43.camel@twinturbo.austin.ibm.com>
Sender: kvm-owner@vger.kernel.org
List-ID:

On 09/04/2009 05:48 PM, Andrew Theurer wrote:
>
>> Still not idle=poll, it may shave off 0.2%.
>>
> Won't this affect SMT in a negative way?  (OK, I am not running SMT now,
> but eventually we will be)  A long time ago, we tested P4's with HT, and
> a polling idle in one thread always negatively impacted performance in
> the sibling thread.
>

Sorry, I meant idle=halt.  idle=poll is too wasteful to be used.

> FWIW, I did try idle=halt, and it was slightly worse.
>

Interesting, I've heard that mwait latency is bad for spinlocks, guess it's fine for idle.

>
>> profile1 is qemu-kvm-87
>> profile2 is qemu-master
>> Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 10000000
>> total samples (ts1) for profile1 is 1616921
>> total samples (ts2) for profile2 is 1752347 (includes multiplier of 0.995420)
>> functions which have a abs(pct2-pct1)< 0.06 are not displayed
>>
>>                                 pct2:   pct1:
>>                                  100*    100*   pct2
>>        s1        s2     s2/s1  s2/ts1  s1/ts1  -pct1 symbol               bin
>> --------- --------- --------- ------- ------- ------ ------               ---
>>    879611    907883    1.03/1  56.149  54.400  1.749 vmx_vcpu_run         kvm
>>       614     11553   18.82/1   0.715   0.038  0.677 gfn_to_memslot_unali kvm.ko
>>     34511     44922    1.30/1   2.778   2.134  0.644 phys_page_find_alloc qemu
>>      2866      9334    3.26/1   0.577   0.177  0.400 paging64_walk_addr   kvm.ko
>>     11139     17200    1.54/1   1.064   0.689  0.375 copy_user_generic_st vmlinux
>>      3100      7108    2.29/1   0.440   0.192  0.248 x86_decode_insn      kvm.ko
>>      8169     11873    1.45/1   0.734   0.505  0.229 virtqueue_avail_byte qemu
>>      1103      4540    4.12/1   0.281   0.068  0.213 kvm_read_guest       kvm.ko
>>     17427     20401    1.17/1   1.262   1.078  0.184 memcpy               libc
>>         0      2905             0.180   0.000  0.180 gfn_to_pfn           kvm.ko
>>      1831      4328    2.36/1   0.268   0.113  0.154 x86_emulate_insn     kvm.ko
>>        65      2431   37.41/1   0.150   0.004  0.146 emulator_read_emulat kvm.ko
>>     14922     17196    1.15/1   1.064   0.923  0.141 qemu_get_ram_ptr     qemu
>>       545      2724    5.00/1   0.168   0.034  0.135 emulate_instruction  kvm.ko
>>       599      2464    4.11/1   0.152   0.037  0.115 kvm_read_guest_page  kvm.ko
>>       503      2355    4.68/1   0.146   0.031  0.115 gfn_to_hva           kvm.ko
>>      1076      2918    2.71/1   0.181   0.067  0.114 memcpy_c             vmlinux
>>       594      2241    3.77/1   0.139   0.037  0.102 next_segment         kvm.ko
>>      1680      3248    1.93/1   0.201   0.104  0.097 pipe_poll            vmlinux
>>         0      1463             0.090   0.000  0.090 subpage_readl        qemu
>>         0      1363             0.084   0.000  0.084 msix_enabled         qemu
>>       527      1883    3.57/1   0.116   0.033  0.084 paging64_gpte_to_gfn kvm.ko
>>       962      2223    2.31/1   0.138   0.059  0.078 do_insn_fetch        kvm.ko
>>       348      1605    4.61/1   0.099   0.022  0.078 is_rsvd_bits_set     kvm.ko
>>       520      1763    3.39/1   0.109   0.032  0.077 unalias_gfn          kvm.ko
>>         1      1163 1163.65/1   0.072   0.000  0.072 tdp_page_fault       kvm.ko
>>      3827      4912    1.28/1   0.304   0.237  0.067 __down_read          vmlinux
>>         0      1014             0.063   0.000  0.063 mapping_level        kvm.ko
>>       973         0             0.000   0.060 -0.060 pm_ioport_readl      qemu
>>      1635       528    1/3.09   0.033   0.101 -0.068 ioport_read          qemu
>>      2179      1017    1/2.14   0.063   0.135 -0.072 kvm_emulate_pio      kvm.ko
>>     25141     23722    1/1.06   1.467   1.555 -0.088 native_write_msr_saf vmlinux
>>      1560         0             0.000   0.096 -0.096 eventfd_poll         vmlinux
>>                               ------- ------- ------
>>                               105.100  97.450  7.650
>>
>
> 18x more samples for gfn_to_memslot_unali*, 37x for
> emulator_read_emula*, and more CPU time in guest mode.
>

And 5x more instructions emulated.  I wonder where that comes from.

> One other thing:  So far I have not been using preadv/pwritev.  I assume
> I need a more recent glibc (on 2.5 now) for qemu to take advantage of
> this?
>
>

Yes, but it should be easy to write a LD_PRELOAD hack that will work with your current glibc.  It should certainly improve things.

-- 
error compiling committee.c: too many arguments to function
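
For illustration only, here is a rough sketch of what an LD_PRELOAD shim of that kind might look like. It is not code from this thread. It assumes a kernel that already provides the preadv/pwritev system calls (they went in with 2.6.30), an x86_64 host (the fallback syscall numbers 295 and 296 are the x86_64 ones), and a qemu binary that actually references the preadv and pwritev symbols. The file name preadv_shim.c and the shim_selftest() helper are made up for the example.

```c
/* preadv_shim.c: hypothetical LD_PRELOAD shim, a sketch only.
 * Build: gcc -shared -fPIC -o preadv_shim.so preadv_shim.c
 * Use:   LD_PRELOAD=./preadv_shim.so <qemu command line>
 */
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stdlib.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Old kernel headers may not define these; the values are for x86_64 only. */
#if defined(__x86_64__)
# ifndef SYS_preadv
#  define SYS_preadv 295
# endif
# ifndef SYS_pwritev
#  define SYS_pwritev 296
# endif
#endif
#if !defined(SYS_preadv) || !defined(SYS_pwritev)
# error "no preadv/pwritev syscall numbers known for this architecture"
#endif

/* The kernel takes the file offset as a (low, high) pair of longs. */
static long off_lo(off_t off)
{
    return (long)off;
}

static long off_hi(off_t off)
{
    /* On 64-bit the low word already carries the whole offset. */
    if (sizeof(long) >= sizeof(long long))
        return 0;
    return (long)((unsigned long long)off >> 32);
}

/* Exported under the libc names so the dynamic linker resolves to them. */
ssize_t preadv(int fd, const struct iovec *iov, int iovcnt, off_t offset)
{
    return syscall(SYS_preadv, fd, iov, iovcnt, off_lo(offset), off_hi(offset));
}

ssize_t pwritev(int fd, const struct iovec *iov, int iovcnt, off_t offset)
{
    return syscall(SYS_pwritev, fd, iov, iovcnt, off_lo(offset), off_hi(offset));
}

/* Hypothetical round-trip check: scatter-write two buffers at an offset,
 * gather-read them back, and compare. Returns 0 on success, -1 otherwise. */
int shim_selftest(void)
{
    char path[] = "/tmp/preadv_shimXXXXXX";
    char a[] = "abcd", b[] = "efgh", ra[4], rb[4];
    struct iovec out[2] = {
        { .iov_base = a, .iov_len = 4 }, { .iov_base = b, .iov_len = 4 }
    };
    struct iovec in[2] = {
        { .iov_base = ra, .iov_len = 4 }, { .iov_base = rb, .iov_len = 4 }
    };
    int fd = mkstemp(path);
    int ok;

    if (fd < 0)
        return -1;
    unlink(path);
    ok = pwritev(fd, out, 2, 16) == 8 &&
         preadv(fd, in, 2, 16) == 8 &&
         memcmp(ra, a, 4) == 0 && memcmp(rb, b, 4) == 0;
    close(fd);
    return ok ? 0 : -1;
}
```

The wrappers go straight to syscall() so that nothing newer than the installed glibc is needed at run time. Whether qemu picks them up still depends on how it was configured and built, so treat this as a starting point rather than a drop-in fix.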