From mboxrd@z Thu Jan 1 00:00:00 1970
From: Mark McLoughlin
Subject: Re: [PATCH 0/6] Kill off the virtio_net tx mitigation timer
Date: Mon, 03 Nov 2008 15:04:54 +0000
Message-ID: <1225724694.5904.63.camel@blaa>
References: <> <1225389113-28332-1-git-send-email-markmc@redhat.com>
 <490D7754.4070807@redhat.com> <1225715009.5904.39.camel@blaa>
 <490EF141.8040005@redhat.com>
Reply-To: Mark McLoughlin
Mime-Version: 1.0
Content-Type: text/plain
Content-Transfer-Encoding: 7bit
Cc: kvm@vger.kernel.org
To: Avi Kivity
Return-path:
Received: from mx2.redhat.com ([66.187.237.31]:48162 "EHLO mx2.redhat.com"
 rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
 id S1756338AbYKCPFx (ORCPT ); Mon, 3 Nov 2008 10:05:53 -0500
Received: from int-mx2.corp.redhat.com (int-mx2.corp.redhat.com
 [172.16.27.26]) by mx2.redhat.com (8.13.8/8.13.8) with ESMTP
 id mA3F5r49001043 for ; Mon, 3 Nov 2008 10:05:53 -0500
In-Reply-To: <490EF141.8040005@redhat.com>
Sender: kvm-owner@vger.kernel.org
List-ID:

On Mon, 2008-11-03 at 14:40 +0200, Avi Kivity wrote:
> Mark McLoughlin wrote:
> > On Sun, 2008-11-02 at 11:48 +0200, Avi Kivity wrote:
> >
> >> Mark McLoughlin wrote:
> >>> The main patch in this series is 5/6 - it just kills off the
> >>> virtio_net tx mitigation timer and does all the tx I/O in the
> >>> I/O thread.
> >>>
> >> What will it do to small packet, multi-flow loads (simulated by
> >> ping -f -l 30 $external)?
> >
> > It should improve the latency - the packets will be flushed more
> > quickly than the 150us timeout without blocking the guest.
>
> But it will increase overhead, since suddenly we aren't queueing
> anymore. One vmexit per small packet.

Yes, in theory, but the packet copies are acting to mitigate exits
since we don't re-enable notifications again until we're sure the ring
is empty. With copyless, though, we'd have an unacceptable vmexit rate.

> >> Where does the benefit come from?
> >
> > There are two things going on here, I think.
> >
> > First is that the timer affects latency, removing the timeout helps
> > that.
>
> If the timer affects latency, then something is very wrong. We're
> lacking an adjustable window.
>
> The way I see it, the notification window should be adjusted according
> to the current workload. If the link is idle, the window should be one
> packet -- notify as soon as something is queued. As the workload
> increases, the window increases to (safety_factor * allowable_latency *
> packet_rate). The timer is set to allowable_latency to catch changes
> in workload.
>
> For example:
>
> - allowable_latency 1ms (implies 1K vmexits/sec desired)
> - current packet_rate 20K packets/sec
> - safety_factor 0.8
>
> So we request notifications every 0.8 * 20K * 1ms = 16 packets, and
> set the timer to 1ms. Usually we get a notification every 16 packets,
> just before timer expiration. If the workload increases, we get
> notifications sooner, so we increase the window. If the workload
> drops, the timer fires and we decrease the window.
>
> The timer should never fire on an all-out benchmark, or in a ping
> test.

Yeah, I do like the sound of this. However, since it requires a new
guest feature and I don't expect it'll improve the situation over the
proposed patch until we have copyless transmit, I think we should do
this as part of the copyless effort.

One thing I'd worry about with this scheme is all-out receive - e.g.
any delay in returning a TCP ACK to the sending side might cause us to
hit the TCP window size.

> > Second is that currently when we fill up the ring we block the
> > guest vcpu and flush. Thus, while we're copying an entire ring full
> > of packets the guest isn't making progress. Doing the copying in
> > the I/O thread helps there.
>
> We're hurting our cache, and this won't work well with many nics. At
> the very least this should be done in a dedicated thread.
A thread per nic is doable, but it'd be especially tricky on the
receive side without more "short-cut the one producer, one consumer
case" work.

Cheers,
Mark.