From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43)
	id 1NULxl-0002ET-Iq
	for qemu-devel@nongnu.org; Mon, 11 Jan 2010 10:13:49 -0500
Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43)
	id 1NULxc-0001ux-Gf
	for qemu-devel@nongnu.org; Mon, 11 Jan 2010 10:13:45 -0500
Received: from [199.232.76.173] (port=52020 helo=monty-python.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1NULxc-0001uX-Bk
	for qemu-devel@nongnu.org; Mon, 11 Jan 2010 10:13:40 -0500
Received: from mail-gx0-f223.google.com ([209.85.217.223]:62391)
	by monty-python.gnu.org with esmtp (Exim 4.60)
	(envelope-from <anthony@codemonkey.ws>) id 1NULxc-0001pG-0R
	for qemu-devel@nongnu.org; Mon, 11 Jan 2010 10:13:40 -0500
Received: by gxk23 with SMTP id 23so24772374gxk.2
	for <qemu-devel@nongnu.org>; Mon, 11 Jan 2010 07:13:25 -0800 (PST)
Message-ID: <4B4B4013.9030706@codemonkey.ws>
Date: Mon, 11 Jan 2010 09:13:23 -0600
From: Anthony Liguori <anthony@codemonkey.ws>
MIME-Version: 1.0
Subject: Re: [Qemu-devel] Re: [RFC][PATCH] performance improvement for windows
	guests, running on top of virtio block device
References: <1263195647.2005.44.camel@localhost> <4B4AE1BD.4000400@redhat.com>
	<20100111134248.GA25622@lst.de>
	<4B4B2C5F.7050403@codemonkey.ws> <4B4B35AF.3010706@redhat.com>
	<4B4B3796.1010106@codemonkey.ws> <4B4B39D4.8060405@redhat.com>
In-Reply-To: <4B4B39D4.8060405@redhat.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
List-Id: qemu-devel.nongnu.org
List-Unsubscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/pipermail/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Avi Kivity <avi@redhat.com>
Cc: qemu-devel <qemu-devel@nongnu.org>, Dor Laor <dlaor@redhat.com>, Christoph Hellwig <hch@lst.de>, Vadim Rozenfeld <vrozenfe@redhat.com>

On 01/11/2010 08:46 AM, Avi Kivity wrote:
> On 01/11/2010 04:37 PM, Anthony Liguori wrote:
>>> That has the downside of bouncing a cache line on unrelated exits.
>>
>>
>> The read and write sides of the ring are widely separated in physical 
>> memory specifically to avoid cache line bouncing.
>
> I meant, exits on random vcpus will cause the cacheline containing the 
> notification disable flag to bounce around.  As it is, we read it on 
> the vcpu that owns the queue and write it on that vcpu or the I/O thread.

Bottom halves are always run from the IO thread.

>>>   It probably doesn't matter with qemu as it is now, since it will 
>>> bounce qemu_mutex, but it will hurt with large guests (especially if 
>>> they have many rings).
>>>
>>> IMO we should get things to work well without riding on unrelated 
>>> exits, especially as we're trying to reduce those exits.
>>
>> A block I/O request can potentially be very, very long lived.  By 
>> serializing requests like this, there's a high likelihood that it's 
>> going to kill performance with anything capable of processing 
>> multiple requests.
>
> Right, that's why I suggested having a queue depth at which disabling 
> notification kicks in.  The patch hardcodes this depth to 1, unpatched 
> qemu is infinite, a good value is probably spindle count + VAT.

That means we would need a user-visible option, which is quite unfortunate.

Also, that logic only really makes sense with cache=off.  With 
cache=writethrough, you can get pathological cases where you have an 
uncached access followed by cached accesses.  In fact, with read-ahead, 
this is probably not an uncommon scenario.
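
To make the pathological case concrete, here is a toy model (not QEMU 
code; the function name, the in-order pickup rule and the microsecond 
timings are all assumptions made up for illustration).  One slow uncached 
read is followed by two reads that would hit the host cache.  With the 
threshold at 1, the cached reads sit in the ring until the slow one 
completes:

```python
import heapq

def completion_times(requests, depth=None):
    """Return the completion time of each request, in submission order.

    requests: list of (submit_time, service_time) pairs, sorted by submit_time.
    depth:    in-flight count at which the host stops taking notifications;
              None means no limit (the unpatched behaviour in this model).

    Assumed model: requests are picked up in ring order. Once `depth`
    requests are in flight, later requests wait in the ring until a
    completion makes the host look at the ring again.
    """
    in_flight = []   # min-heap of completion times of requests being serviced
    finished = []
    clock = 0        # time at which the host picks up the current request
    for submit, service in requests:
        clock = max(clock, submit)
        # Retire everything that has already completed by now.
        while in_flight and in_flight[0] <= clock:
            heapq.heappop(in_flight)
        # At the threshold: pickup is delayed until the earliest completion.
        while depth is not None and len(in_flight) >= depth:
            clock = max(clock, heapq.heappop(in_flight))
        finish = clock + service
        heapq.heappush(in_flight, finish)
        finished.append(finish)
    return finished

# Times in microseconds: a 10 ms uncached read, then two 100 us cached reads.
reqs = [(0, 10000), (1000, 100), (2000, 100)]
print(completion_times(reqs, depth=1))   # [10000, 10100, 10200]
print(completion_times(reqs))            # [10000, 1100, 2100]
```

In this model the cached reads finish roughly 8-9 ms later with the 
threshold at 1 than with no threshold, even though each needs only 
100 us of service time.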

>> OTOH, if we aggressively poll the ring when we have an opportunity 
>> to, there's very little down side to that and it addresses the 
>> serialization problem.
>
> But we can't guarantee that we'll get those opportunities, so it 
> doesn't address the problem in a general way.  A guest that doesn't 
> use hpet and only has a single virtio-blk device will not have any 
> reason to exit to qemu.

We can mitigate this with a timer but honestly, we need to do perf 
measurements to see.  My feeling is that we will need some more 
aggressive form of polling than just waiting for IO completion.  I don't 
think queue depth is enough because it assumes that all requests are 
equal.  When dealing with cache=off or even just storage with its own 
cache, that's simply not the case.
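
As a rough sketch of what the timer buys us (again a toy model, not QEMU 
code; `pickup_time` and the fixed poll period are hypothetical), a request 
queued while notifications are disabled is noticed at the next I/O 
completion or the next timer tick, whichever comes first:

```python
def pickup_time(submit, next_completion, poll_period=None):
    """Time at which the host notices a request queued with notifications off.

    submit:          time the guest put the request on the ring
    next_completion: time the next in-flight request completes
    poll_period:     interval of a hypothetical polling timer, with ticks at
                     multiples of the period; None means no timer at all
    """
    candidates = [next_completion]
    if poll_period is not None:
        # First tick at or after the submit time (integer ceiling division).
        ticks = -(-submit // poll_period)
        candidates.append(ticks * poll_period)
    return min(candidates)

# Microseconds: a request queued at 1.1 ms behind a read finishing at 10 ms.
print(pickup_time(1100, 10000))                   # 10000
print(pickup_time(1100, 10000, poll_period=250))  # 1250
```

The timer bounds the added latency by the poll period no matter how long 
the in-flight request takes.  Whether a period short enough to help is 
cheap enough to run is exactly what the perf measurements would have to 
show.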

Regards,

Anthony Liguori