From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43)
	id 1NUM8F-0006YA-44
	for qemu-devel@nongnu.org; Mon, 11 Jan 2010 10:24:39 -0500
Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43)
	id 1NUM8A-0006XE-GZ
	for qemu-devel@nongnu.org; Mon, 11 Jan 2010 10:24:38 -0500
Received: from [199.232.76.173] (port=50671 helo=monty-python.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1NUM8A-0006XB-8B
	for qemu-devel@nongnu.org; Mon, 11 Jan 2010 10:24:34 -0500
Received: from mx20.gnu.org ([199.232.41.8]:10510)
	by monty-python.gnu.org with esmtps (TLS-1.0:RSA_AES_256_CBC_SHA1:32)
	(Exim 4.60) (envelope-from <avi@redhat.com>) id 1NUM89-0005MK-S6
	for qemu-devel@nongnu.org; Mon, 11 Jan 2010 10:24:33 -0500
Received: from mx1.redhat.com ([209.132.183.28])
	by mx20.gnu.org with esmtp (Exim 4.60)
	(envelope-from <avi@redhat.com>) id 1NUM3q-0008Be-PI
	for qemu-devel@nongnu.org; Mon, 11 Jan 2010 10:20:07 -0500
Message-ID: <4B4B4199.9050603@redhat.com>
Date: Mon, 11 Jan 2010 17:19:53 +0200
From: Avi Kivity <avi@redhat.com>
MIME-Version: 1.0
Subject: Re: [Qemu-devel] Re: [RFC][PATCH] performance improvement for windows
	guests, running on top of virtio block device
References: <1263195647.2005.44.camel@localhost> <4B4AE1BD.4000400@redhat.com>
	<20100111134248.GA25622@lst.de>
	<4B4B2C5F.7050403@codemonkey.ws> <4B4B35AF.3010706@redhat.com>
	<4B4B3796.1010106@codemonkey.ws> <4B4B39D4.8060405@redhat.com>
	<4B4B4013.9030706@codemonkey.ws>
In-Reply-To: <4B4B4013.9030706@codemonkey.ws>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
List-Id: qemu-devel.nongnu.org
List-Unsubscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/pipermail/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Anthony Liguori <anthony@codemonkey.ws>
Cc: qemu-devel <qemu-devel@nongnu.org>, Dor Laor <dlaor@redhat.com>, Christoph Hellwig <hch@lst.de>, Vadim Rozenfeld <vrozenfe@redhat.com>

On 01/11/2010 05:13 PM, Anthony Liguori wrote:
> On 01/11/2010 08:46 AM, Avi Kivity wrote:
>> On 01/11/2010 04:37 PM, Anthony Liguori wrote:
>>>> That has the downside of bouncing a cache line on unrelated exits.
>>>
>>>
>>> The read and write sides of the ring are widely separated in 
>>> physical memory specifically to avoid cache line bouncing.
>>
>> I meant, exits on random vcpus will cause the cacheline containing 
>> the notification disable flag to bounce around.  As it is, we read it 
>> on the vcpu that owns the queue and write it on that vcpu or the I/O 
>> thread.
>
> Bottom halves are always run from the IO thread.

Okay, so that won't be an issue.

>>>>   It probably doesn't matter with qemu as it is now, since it will 
>>>> bounce qemu_mutex, but it will hurt with large guests (especially 
>>>> if they have many rings).
>>>>
>>>> IMO we should get things to work well without riding on unrelated 
>>>> exits, especially as we're trying to reduce those exits.
>>>
>>> A block I/O request can potentially be very, very long lived.  By 
>>> serializing requests like this, there's a high likelihood that it's 
>>> going to kill performance with anything capable of processing 
>>> multiple requests.
>>
>> Right, that's why I suggested having a queue depth at which disabling 
>> notification kicks in.  The patch hardcodes this depth to 1, 
>> unpatched qemu is infinite, a good value is probably spindle count + 
>> VAT.
>
> That means we would need a user visible option which is quite 
> unfortunate.

We could guess it, perhaps.

> Also, that logic only really makes sense with cache=off.  With 
> cache=writethrough, you can get pathological cases where you have an 
> uncached access followed by cached accesses.  In fact, with 
> read-ahead, this is probably not an uncommon scenario.

So you'd increase the disable depth when cache!=none.
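
For illustration only, the depth idea can be modelled in a few lines of Python. This is not code from qemu or from the patch. The names NotifyPolicy, spindles and cached_factor are hypothetical, and the multiplier of 4 is an arbitrary placeholder that would need measurement.

```python
from dataclasses import dataclass


@dataclass
class NotifyPolicy:
    """Toy model: suppress guest notifications once enough requests are in flight."""

    spindles: int = 1          # hypothetical hint; the thread suggests guessing it
    cache_mode: str = "none"   # modelled on the cache= setting discussed in the thread
    cached_factor: int = 4     # assumption: arbitrary multiplier, not a measured value

    def disable_depth(self) -> int:
        # The patch under discussion hardcodes this depth to 1;
        # unpatched qemu behaves as if it were infinite.
        depth = max(1, self.spindles)
        if self.cache_mode != "none":
            # Cached and uncached requests differ widely in cost, so keep
            # notifications enabled for longer before serializing.
            depth *= self.cached_factor
        return depth

    def notifications_enabled(self, in_flight: int) -> bool:
        # Notifications stay on until the in-flight count reaches the depth.
        return in_flight < self.disable_depth()
```

With the defaults this reproduces the patch's behaviour (depth 1); with spindles=4 and cache_mode="writethrough" the depth in this model becomes 16.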

>>> OTOH, if we aggressively poll the ring when we have an opportunity 
>>> to, there's very little down side to that and it addresses the 
>>> serialization problem.
>>
>> But we can't guarantee that we'll get those opportunities, so it 
>> doesn't address the problem in a general way.  A guest that doesn't 
>> use hpet and only has a single virtio-blk device will not have any 
>> reason to exit to qemu.
>
> We can mitigate this with a timer but honestly, we need to do perf 
> measurements to see.  My feeling is that we will need some more 
> aggressive form of polling than just waiting for IO completion.  I 
> don't think queue depth is enough because it assumes that all requests 
> are equal.  When dealing with cache=off or even just storage with its 
> own cache, that's simply not the case.

Maybe we can adapt behaviour dynamically based on how fast results come in.
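
One possible reading of "adapt dynamically", sketched in Python under stated assumptions: track a moving average of completion latency and raise the disable depth while completions are slow, then let it decay once they are fast again. The class AdaptiveDepth, the 1000 us threshold, the smoothing factor and the doubling/decrement rule are all hypothetical choices, not something proposed in the thread or present in qemu.

```python
class AdaptiveDepth:
    """Toy model: tune the notification-disable depth from completion latency."""

    def __init__(self, min_depth=1, max_depth=64, slow_us=1000.0, alpha=0.25):
        self.min_depth = min_depth
        self.max_depth = max_depth
        self.slow_us = slow_us    # assumed boundary between "fast" and "slow"
        self.alpha = alpha        # weight of the newest sample in the average
        self.depth = min_depth
        self.avg_us = None        # no samples seen yet

    def on_completion(self, latency_us: float) -> int:
        # Exponentially weighted moving average of completion latency.
        if self.avg_us is None:
            self.avg_us = latency_us
        else:
            self.avg_us = self.alpha * latency_us + (1 - self.alpha) * self.avg_us

        if self.avg_us > self.slow_us:
            # Slow completions: serializing behind them is costly, so keep
            # notifications enabled up to a larger depth.
            self.depth = min(self.max_depth, self.depth * 2)
        elif self.depth > self.min_depth:
            # Fast completions: polling at completion time happens soon
            # anyway, so drift back toward the minimum depth.
            self.depth -= 1
        return self.depth
```

Whether the depth should grow or shrink with latency, and how quickly, is exactly the kind of question the perf measurements mentioned above would have to settle.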

-- 
error compiling committee.c: too many arguments to function