From: Avi Kivity
Subject: Re: [PATCH 0/6] Kill off the virtio_net tx mitigation timer
Date: Mon, 03 Nov 2008 17:19:44 +0200
Message-ID: <490F1690.6060509@redhat.com>
In-Reply-To: <1225724694.5904.63.camel@blaa>
To: Mark McLoughlin
Cc: kvm@vger.kernel.org

Mark McLoughlin wrote:
>> But it will increase overhead, since suddenly we aren't queueing
>> anymore. One vmexit per small packet.
>>
>
> Yes in theory, but the packet copies are acting to mitigate exits, since
> we don't re-enable notifications again until we're sure the ring is
> empty.
>

You mean the guest and the copy proceed in parallel, and while they do,
exits are disabled?

> With copyless, though, we'd have an unacceptable vmexit rate.
>

Right.

>> If the timer affects latency, then something is very wrong. We're
>> lacking an adjustable window.
>>
>> The way I see it, the notification window should be adjusted according
>> to the current workload. If the link is idle, the window should be one
>> packet -- notify as soon as something is queued. As the workload
>> increases, the window grows to (safety_factor * packet_rate *
>> allowable_latency). The timer is set to allowable_latency to catch
>> changes in workload.
>>
>> For example:
>>
>> - allowable_latency 1ms (implies 1K vmexits/sec desired)
>> - current packet_rate 20K packets/sec
>> - safety_factor 0.8
>>
>> So we request notifications every 0.8 * 20K * 1ms = 16 packets, and set
>> the timer to 1ms. Usually we get a notification every 16 packets, just
>> before timer expiration. If the workload increases, we get
>> notifications sooner, so we increase the window. If the workload drops,
>> the timer fires and we decrease the window.
>>
>> The timer should never fire in an all-out benchmark, or in a ping test.
>>
>
> Yeah, I do like the sound of this.
>
> However, since it requires a new guest feature and I don't expect it'll
> improve the situation over the proposed patch until we have copyless
> transmit, I think we should do this as part of the copyless effort.
>

Hopefully copyless and this can be done in parallel. I think they have
value independently.

> One thing I'd worry about with this scheme is all-out receive - e.g. any
> delay in returning a TCP ACK to the sending side might cause us to hit
> the TCP window size.
>

Consider a real NIC, which also has ACK latency determined by the queue
length. The proposal doesn't change that, except momentarily when
transitioning from high throughput to low throughput. In any case,
latency is never more than allowable_latency (not counting time spent in
the guest network stack queues, but we aren't responsible for that).

(One day we can add a queue for ACKs and other high-priority traffic,
but we have enough on our hands now.)

>> We're hurting our cache, and this won't work well with many NICs. At
>> the very least this should be done in a dedicated thread.
>>
>
> A thread per NIC is doable, but it'd be especially tricky on the receive
> side without more "short-cut the one producer, one consumer case" work.
>

We can start with transmit. I'm somewhat worried about further divergence
from qemu mainline (just completed a merge...).
-- 
error compiling committee.c: too many arguments to function