From: Andrew Theurer
Subject: Re: [PATCH] KVM: Use thread debug register storage instead of kvm specific data
Date: Fri, 04 Sep 2009 11:08:51 -0500
Message-ID: <4AA13B93.2090501@linux.vnet.ibm.com>
References: <1251798248-13164-1-git-send-email-avi@redhat.com> <4A9D6693.7040401@redhat.com> <1252075697.22211.43.camel@twinturbo.austin.ibm.com> <200909041030.39291.iggy@theiggy.com>
In-Reply-To: <200909041030.39291.iggy@theiggy.com>
To: Brian Jackson
Cc: kvm@vger.kernel.org

Brian Jackson wrote:
> On Friday 04 September 2009 09:48:17 am Andrew Theurer wrote:
>
>>> Still not idle=poll, it may shave off 0.2%.
>> Won't this affect SMT in a negative way? (OK, I am not running SMT now,
>> but eventually we will be.) A long time ago, we tested P4s with HT, and
>> a polling idle in one thread always negatively impacted performance in
>> the sibling thread.
>>
>> FWIW, I did try idle=halt, and it was slightly worse.
>>
>> I did get a chance to try the latest qemu (master and next heads).
>> I have been running into a problem with the virtio storage driver
>> (viostor) for Windows on anything much newer than kvm-87. I compiled the
>> driver from the new git tree and it installed OK, but I still got the
>> same error. Finally, I removed the serial number feature from virtio-blk
>> in qemu, and now the driver works in Windows.
>
> What were the symptoms you were seeing (i.e. define "a problem")?

Device Manager reports that "a problem code 10" occurred, and the driver
cannot initialize. Vadim Rozenfeld informed me:

> There is a sanity check in the code, which checks the I/O range and fails
> if it is not equal to 40h. Recent virtio-blk devices have an I/O range
> equal to 0x400 (serial number feature). So our signed viostor driver will
> fail on the latest KVMs. This problem was fixed and committed to SVN some
> time ago.

I assumed the fix was to the virtio Windows driver, but I could not get the
driver I compiled from the latest git to work either (it only worked on
qemu-kvm-87). So I just backed out the serial number feature in qemu, and it
worked. FWIW, the Linux virtio-blk driver never had a problem.

>
>> So, not really any good news on performance with the latest qemu builds.
>> Performance is slightly worse (CPU utilization, in %):
>>
>> qemu-kvm-87
>> user  nice  system   irq  softirq  guest   idle  iowait
>> 5.79  0.00    9.28  0.08     1.00  20.81  58.78    4.26
>> total busy: 36.97
>>
>> qemu-kvm-88-905-g6025b2d (master)
>> user  nice  system   irq  softirq  guest   idle  iowait
>> 6.57  0.00   10.86  0.08     1.02  21.35  55.90    4.21
>> total busy: 39.89
>>
>> qemu-kvm-88-910-gbf8a05b (next)
>> user  nice  system   irq  softirq  guest   idle  iowait
>> 6.60  0.00   10.91  0.09     1.03  21.35  55.71    4.31
>> total busy: 39.98
>>
>> diff of profiles, p1=qemu-kvm-87, p2=qemu-master
>>
>
>> 18x more samples for gfn_to_memslot_unali*, 37x for
>> emulator_read_emula*, and more CPU time in guest mode.
>>
>> One other thing I decided to try was some CPU binding. I know this is
>> not practical for production, but I wanted to see if there's any benefit
>> at all.
>> One reason was that a coworker here tried binding the qemu thread for
>> the vcpu and the qemu IO thread to the same CPU. On a networking test,
>> guest->local-host, throughput was up about 2x. Obviously there was a nice
>> effect of being on the same cache. I wondered whether, even without
>> full-bore throughput tests, we could see any benefit here. So, I bound
>> each pair of VMs to a dedicated core. What I saw was about a 6%
>> improvement in performance. For a system which has pretty incredible
>> memory performance and is not that busy, I was surprised that I got 6%.
>> I am not advocating binding, but I do wonder: on 1-way VMs, if we keep
>> all the qemu threads together on the same CPU, but still allow the
>> scheduler to move them (all of them at once) to different CPUs over
>> time, would we see the same benefit?
>>
>> One other thing: so far I have not been using preadv/pwritev. I assume
>> I need a more recent glibc (on 2.5 now) for qemu to take advantage of
>> this?
>
> Getting p(read|write)v working almost doubled my virtio-net throughput in a
> Linux guest. Not quite as much in Windows guests. Yes, you need glibc-2.10. I
> think some distros might have backported it to 2.9. You will also need some
> support for it in your system includes.

Thanks, I will try a newer glibc, or maybe just move to a newer Linux
installation which happens to have a newer glibc.

-Andrew
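Vadim's explanation of the Code 10 failure comes down to an exact-size comparison on the device's I/O range. The sketch below is purely hypothetical: the names `io_range_ok_strict`, `io_range_ok_tolerant` and `LEGACY_VIRTIO_BLK_IO_SIZE` are made up for illustration, and this is neither the viostor source nor the fix committed to SVN. It only contrasts an equality check, which rejects the 0x400-byte range that virtio-blk exposes once the serial number feature is present, with a minimum-size check that accepts it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for illustration only; not the viostor source. */
#define LEGACY_VIRTIO_BLK_IO_SIZE 0x40u   /* 40h, the size the old check expected */

/* Exact-size check: rejects any device whose I/O range differs from 0x40,
   including the 0x400-byte range of newer virtio-blk devices. */
static bool io_range_ok_strict(uint32_t io_len)
{
    return io_len == LEGACY_VIRTIO_BLK_IO_SIZE;
}

/* Minimum-size check: accepts any range that is at least 0x40 bytes,
   assuming the driver only uses registers within the first 0x40 bytes. */
static bool io_range_ok_tolerant(uint32_t io_len)
{
    return io_len >= LEGACY_VIRTIO_BLK_IO_SIZE;
}
```

With the strict version, a device that grows its register window for a new feature is rejected outright, which matches the symptom described above; the tolerant version only fails when the window is too small to hold the registers.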
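The "total busy" figures in the CPU utilization tables appear to be the sum of every column except idle and iowait. For qemu-kvm-87, 5.79 + 0.00 + 9.28 + 0.08 + 1.00 + 20.81 = 36.96, which agrees with the reported 36.97 to within rounding of the individual columns. A minimal C sketch of that calculation; the struct and function names are illustrative and not taken from any tool used in the thread:

```c
#include <assert.h>

/* One row of the CPU utilization table; every field is a percentage. */
struct cpu_util {
    double user, nice, system, irq, softirq, guest, idle, iowait;
};

/* "total busy": all time that is neither idle nor waiting on I/O. */
static double total_busy(const struct cpu_util *u)
{
    return u->user + u->nice + u->system + u->irq + u->softirq + u->guest;
}
```

Applied to the three rows this gives 36.96, 39.88 and 39.98, so master and next each spend roughly 3 more percentage points of CPU than qemu-kvm-87 on the same workload.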
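The binding experiment pins threads to a CPU. From the shell, `taskset -pc <cpu> <tid>` (util-linux) does this for an existing thread; inside a program, the Linux `sched_setaffinity(2)` call does the same for the calling thread when passed a pid of 0. A minimal sketch, assuming Linux and glibc; the helper names are hypothetical, and the sketch only pins the calling thread rather than locating qemu's vcpu or IO threads:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/* Return the lowest-numbered CPU the calling thread may run on, or -1. */
static int first_allowed_cpu(void)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &set))
            return cpu;
    return -1;
}

/* Restrict the calling thread to a single CPU. Returns 0 on success. */
static int pin_self_to_cpu(int cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set);
}

/* Nonzero if the calling thread's affinity mask is exactly { cpu }. */
static int pinned_to(int cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return 0;
    return CPU_COUNT(&set) == 1 && CPU_ISSET(cpu, &set);
}
```

Giving every thread of a 1-way VM the same single-CPU mask is the static form of the experiment. The alternative raised in the mail, keeping the threads together while letting the scheduler move them as a group, cannot be expressed by a plain affinity mask: a mask with several CPUs lets each thread migrate independently.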
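preadv(2) and pwritev(2) combine positioned I/O (like pread/pwrite, they leave the file offset unchanged) with scatter/gather buffers (like readv/writev), so a request spanning several buffers can be issued in one system call. They first appeared in Linux 2.6.30, and library support was added in glibc 2.10, which matches the version Brian mentions. A minimal sketch, assuming Linux and glibc 2.10 or later; the helper names are illustrative, and this is not qemu's code:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Gather two buffers into one write at `offset` (file offset unchanged). */
static ssize_t write_two_at(int fd, off_t offset,
                            const void *a, size_t a_len,
                            const void *b, size_t b_len)
{
    struct iovec iov[2] = {
        { .iov_base = (void *)a, .iov_len = a_len },
        { .iov_base = (void *)b, .iov_len = b_len },
    };
    return pwritev(fd, iov, 2, offset);
}

/* Scatter one read at `offset` into two buffers (file offset unchanged). */
static ssize_t read_two_at(int fd, off_t offset,
                           void *a, size_t a_len,
                           void *b, size_t b_len)
{
    struct iovec iov[2] = {
        { .iov_base = a, .iov_len = a_len },
        { .iov_base = b, .iov_len = b_len },
    };
    return preadv(fd, iov, 2, offset);
}

/* Round trip through a temporary file: 0 if the data read back matches. */
static int demo_roundtrip(void)
{
    char path[] = "/tmp/preadv-demo-XXXXXX";
    const char hdr[4] = {'H', 'D', 'R', ':'};
    const char body[5] = {'h', 'e', 'l', 'l', 'o'};
    char got_hdr[4], got_body[5];
    int rc = 1;

    int fd = mkstemp(path);
    if (fd < 0)
        return rc;
    unlink(path);  /* the file is freed once fd is closed */

    if (write_two_at(fd, 512, hdr, sizeof hdr, body, sizeof body) == 9 &&
        read_two_at(fd, 512, got_hdr, sizeof got_hdr,
                    got_body, sizeof got_body) == 9 &&
        memcmp(hdr, got_hdr, sizeof hdr) == 0 &&
        memcmp(body, got_body, sizeof body) == 0)
        rc = 0;

    close(fd);
    return rc;
}
```

Without these calls, an emulator has to either issue one pread/pwrite per buffer or copy the buffers into a bounce buffer first, which is one plausible reason why enabling them can make a noticeable throughput difference.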