Message-ID: <4B4B4199.9050603@redhat.com>
Date: Mon, 11 Jan 2010 17:19:53 +0200
From: Avi Kivity
Subject: Re: [Qemu-devel] Re: [RFC][PATCH] performance improvement for windows guests, running on top of virtio block device
References: <1263195647.2005.44.camel@localhost> <4B4AE1BD.4000400@redhat.com>
 <20100111134248.GA25622@lst.de> <4B4B2C5F.7050403@codemonkey.ws>
 <4B4B35AF.3010706@redhat.com> <4B4B3796.1010106@codemonkey.ws>
 <4B4B39D4.8060405@redhat.com> <4B4B4013.9030706@codemonkey.ws>
In-Reply-To: <4B4B4013.9030706@codemonkey.ws>
List-Id: qemu-devel.nongnu.org
To: Anthony Liguori
Cc: qemu-devel, Dor Laor, Christoph Hellwig, Vadim Rozenfeld

On 01/11/2010 05:13 PM, Anthony Liguori wrote:
> On 01/11/2010 08:46 AM, Avi Kivity wrote:
>> On 01/11/2010 04:37 PM, Anthony Liguori wrote:
>>>> That has the downside of bouncing a cache line on unrelated exits.
>>>
>>> The read and write sides of the ring are widely separated in
>>> physical memory specifically to avoid cache line bouncing.
>>
>> I meant, exits on random vcpus will cause the cacheline containing
>> the notification disable flag to bounce around. As it is, we read it
>> on the vcpu that owns the queue and write it on that vcpu or the I/O
>> thread.
>
> Bottom halves are always run from the IO thread.

Okay, so that won't be an issue.

>>>> It probably doesn't matter with qemu as it is now, since it will
>>>> bounce qemu_mutex, but it will hurt with large guests (especially
>>>> if they have many rings).
>>>>
>>>> IMO we should get things to work well without riding on unrelated
>>>> exits, especially as we're trying to reduce those exits.
>>>
>>> A block I/O request can potentially be very, very long lived. By
>>> serializing requests like this, there's a high likelihood that it's
>>> going to kill performance with anything capable of processing
>>> multiple requests.
>>
>> Right, that's why I suggested having a queue depth at which disabling
>> notification kicks in. The patch hardcodes this depth to 1,
>> unpatched qemu is infinite, a good value is probably spindle count +
>> VAT.
>
> That means we would need a user visible option which is quite
> unfortunate.

We could guess it, perhaps.

> Also, that logic only really makes sense with cache=off. With
> cache=writethrough, you can get pathological cases where you have an
> uncached access followed by cached accesses. In fact, with
> read-ahead, this is probably not an uncommon scenario.

So you'd increase the disable depth when cache!=none.

>>> OTOH, if we aggressively poll the ring when we have an opportunity
>>> to, there's very little down side to that and it addresses the
>>> serialization problem.
>>
>> But we can't guarantee that we'll get those opportunities, so it
>> doesn't address the problem in a general way.
>> A guest that doesn't use hpet and only has a single virtio-blk device
>> will not have any reason to exit to qemu.
>
> We can mitigate this with a timer but honestly, we need to do perf
> measurements to see. My feeling is that we will need some more
> aggressive form of polling than just waiting for IO completion. I
> don't think queue depth is enough because it assumes that all requests
> are equal. When dealing with cache=off or even just storage with its
> own cache, that's simply not the case.

Maybe we can adapt behaviour dynamically based on how fast results come in.

-- 
error compiling committee.c: too many arguments to function
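
For illustration only, here is a rough model of the "disable depth" idea
discussed above: guest notifications stay enabled until a threshold number
of requests is in flight, then are suppressed until the count drops below it
again. This is a toy sketch, not qemu code. The class and field names
(NotifyPolicy, disable_depth, in_flight) are made up for the example, and it
treats all requests as equal, which is exactly the simplification Anthony
objects to.

```python
from dataclasses import dataclass


@dataclass
class NotifyPolicy:
    """Toy model of depth-based notification suppression (hypothetical names)."""

    # Number of in-flight requests at which notifications get disabled.
    # 1 models the patch under discussion; infinity models unpatched qemu.
    disable_depth: float
    in_flight: int = 0
    notifications_enabled: bool = True

    def submit(self, n: int = 1) -> None:
        """The guest queued n more requests."""
        self.in_flight += n
        self._update()

    def complete(self, n: int = 1) -> None:
        """The host finished n requests."""
        self.in_flight = max(0, self.in_flight - n)
        self._update()

    def _update(self) -> None:
        # Notifications are wanted only while we are below the threshold.
        self.notifications_enabled = self.in_flight < self.disable_depth


if __name__ == "__main__":
    patched = NotifyPolicy(disable_depth=1)
    patched.submit()
    print("depth=1, one request in flight:", patched.notifications_enabled)

    unpatched = NotifyPolicy(disable_depth=float("inf"))
    unpatched.submit(1000)
    print("depth=inf, 1000 in flight:", unpatched.notifications_enabled)
```

Guessing the depth, or adapting it to how fast results come in, would mean
changing disable_depth at runtime. How to do that is the open question in
this thread, so the sketch leaves it as a plain parameter.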