From: Avi Kivity
Subject: Re: [PATCH 0/6] Kill off the virtio_net tx mitigation timer
Date: Mon, 03 Nov 2008 17:19:44 +0200
Message-ID: <490F1690.6060509@redhat.com>
In-Reply-To: <1225724694.5904.63.camel@blaa>
To: Mark McLoughlin
Cc: kvm@vger.kernel.org

Mark McLoughlin wrote:
>> But it will increase overhead, since suddenly we aren't queueing
>> anymore. One vmexit per small packet.
>>
>
> Yes in theory, but the packet copies are acting to mitigate exits, since
> we don't re-enable notifications again until we're sure the ring is
> empty.
>

You mean the guest and the copy proceed in parallel, and while they do,
exits are disabled?

> With copyless, though, we'd have an unacceptable vmexit rate.
>

Right.

>> If the timer affects latency, then something is very wrong. We're
>> lacking an adjustable window.
>>
>> The way I see it, the notification window should be adjusted according
>> to the current workload. If the link is idle, the window should be one
>> packet -- notify as soon as something is queued. As the workload
>> increases, the window grows to (safety_factor * packet_rate *
>> allowable_latency). The timer is set to allowable_latency to catch
>> changes in workload.
>>
>> For example:
>>
>> - allowable_latency 1ms (implies 1K vmexits/sec desired)
>> - current packet_rate 20K packets/sec
>> - safety_factor 0.8
>>
>> So we request notifications every 0.8 * 20K * 1ms = 16 packets, and set
>> the timer to 1ms. Usually we get a notification every 16 packets, just
>> before timer expiration. If the workload increases, we get
>> notifications sooner, so we increase the window. If the workload drops,
>> the timer fires and we decrease the window.
>>
>> The timer should never fire in an all-out benchmark, or in a ping test.
>>
>
> Yeah, I do like the sound of this.
>
> However, since it requires a new guest feature and I don't expect it'll
> improve the situation over the proposed patch until we have copyless
> transmit, I think we should do this as part of the copyless effort.
>

Hopefully copyless and this can be done in parallel. I think they have
value independently.

> One thing I'd worry about with this scheme is all-out receive - e.g. any
> delay in returning a TCP ACK to the sending side might cause us to hit
> the TCP window size.
>

Consider a real NIC, which also has ACK latency determined by the queue
length. The proposal doesn't change that, except momentarily when
transitioning from high throughput to low throughput. In any case,
latency is never more than allowable_latency (not counting time spent in
the guest network stack queues, but we aren't responsible for that).

(One day we can add a queue for ACKs and other high-priority traffic,
but we have enough on our hands now.)

>> We're hurting our cache, and this won't work well with many NICs. At
>> the very least this should be done in a dedicated thread.
>>
>
> A thread per NIC is doable, but it'd be especially tricky on the receive
> side without more "short-cut the one producer, one consumer case" work.
>

We can start with transmit. I'm somewhat worried about further divergence
from qemu mainline (just completed a merge...).
-- 
error compiling committee.c: too many arguments to function