From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <4B4B4013.9030706@codemonkey.ws>
Date: Mon, 11 Jan 2010 09:13:23 -0600
From: Anthony Liguori
MIME-Version: 1.0
Subject: Re: [Qemu-devel] Re: [RFC][PATCH] performance improvement for windows guests, running on top of virtio block device
References: <1263195647.2005.44.camel@localhost> <4B4AE1BD.4000400@redhat.com> <20100111134248.GA25622@lst.de> <4B4B2C5F.7050403@codemonkey.ws> <4B4B35AF.3010706@redhat.com> <4B4B3796.1010106@codemonkey.ws> <4B4B39D4.8060405@redhat.com>
In-Reply-To: <4B4B39D4.8060405@redhat.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
List-Id: qemu-devel.nongnu.org
To: Avi Kivity
Cc: qemu-devel, Dor Laor, Christoph Hellwig, Vadim Rozenfeld

On 01/11/2010 08:46 AM, Avi Kivity wrote:
> On 01/11/2010 04:37 PM, Anthony Liguori wrote:
>>> That has the downside of bouncing a cache line on unrelated exits.
>>
>> The read and write sides of the ring are widely separated in physical
>> memory specifically to avoid cache line bouncing.
>
> I meant, exits on random vcpus will cause the cacheline containing the
> notification disable flag to bounce around. As it is, we read it on
> the vcpu that owns the queue and write it on that vcpu or the I/O thread.

Bottom halves are always run from the IO thread.

>>> It probably doesn't matter with qemu as it is now, since it will
>>> bounce qemu_mutex, but it will hurt with large guests (especially if
>>> they have many rings).
>>>
>>> IMO we should get things to work well without riding on unrelated
>>> exits, especially as we're trying to reduce those exits.
>>
>> A block I/O request can potentially be very, very long lived. By
>> serializing requests like this, there's a high likelihood that it's
>> going to kill performance with anything capable of processing
>> multiple requests.
>
> Right, that's why I suggested having a queue depth at which disabling
> notification kicks in. The patch hardcodes this depth to 1, unpatched
> qemu is infinite, a good value is probably spindle count + VAT.

That means we would need a user-visible option, which is quite unfortunate.

Also, that logic only really makes sense with cache=off. With
cache=writethrough, you can get pathological cases where you have an
uncached access followed by cached accesses. In fact, with read-ahead,
this is probably not an uncommon scenario.

>> OTOH, if we aggressively poll the ring when we have an opportunity
>> to, there's very little down side to that and it addresses the
>> serialization problem.
>
> But we can't guarantee that we'll get those opportunities, so it
> doesn't address the problem in a general way. A guest that doesn't
> use hpet and only has a single virtio-blk device will not have any
> reason to exit to qemu.

We can mitigate this with a timer but honestly, we need to do perf
measurements to see. My feeling is that we will need some more
aggressive form of polling than just waiting for IO completion.
I don't think queue depth is enough because it assumes that all requests
are equal. When dealing with cache=off or even just storage with its own
cache, that's simply not the case.

Regards,

Anthony Liguori
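
For readers following the thread, the queue-depth idea under discussion can be sketched roughly as below. This is a toy model only, not code from the patch or from qemu: the class ToyQueue, its fields, and the helper submit_burst are all hypothetical names made up for illustration. It assumes notifications are suppressed once the number of in-flight requests reaches a threshold, and that the ring is polled again on each I/O completion.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyQueue:
    """Toy model of a request ring (hypothetical, not qemu's virtqueue)."""
    depth_threshold: Optional[int]  # None models the "infinite" unpatched case
    in_flight: int = 0              # requests handed to the backend
    pending: deque = field(default_factory=deque)  # on the ring, not yet seen
    notifications: int = 0          # guest kicks that were actually delivered

    def notify_enabled(self) -> bool:
        # Notifications stay enabled until in-flight requests reach the threshold.
        return self.depth_threshold is None or self.in_flight < self.depth_threshold

    def drain(self) -> None:
        # Host side: pick up everything currently sitting on the ring.
        while self.pending:
            self.pending.popleft()
            self.in_flight += 1

    def guest_submit(self, req: str) -> None:
        # Guest side: queue a request and kick only if notifications are enabled.
        self.pending.append(req)
        if self.notify_enabled():
            self.notifications += 1
            self.drain()

    def complete_one(self) -> None:
        # An I/O completion is the only other point where the ring is polled.
        self.in_flight -= 1
        self.drain()


def submit_burst(threshold: Optional[int], count: int = 4) -> ToyQueue:
    """Submit `count` requests back to back, before any completion arrives."""
    q = ToyQueue(depth_threshold=threshold)
    for i in range(count):
        q.guest_submit(f"req{i}")
    return q
```

In this model a threshold of 1 saves kicks, but every request after the first sits on the ring until the first one completes, however long that takes. That is the serialization concern raised above, and the model treats every request as equally expensive, which is exactly the assumption questioned for cache=off and storage with its own cache.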