From mboxrd@z Thu Jan 1 00:00:00 1970
From: Mark McLoughlin
Subject: Re: [PATCH 0/6] Kill off the virtio_net tx mitigation timer
Date: Mon, 03 Nov 2008 15:04:54 +0000
Message-ID: <1225724694.5904.63.camel@blaa>
References: <> <1225389113-28332-1-git-send-email-markmc@redhat.com>
 <490D7754.4070807@redhat.com> <1225715009.5904.39.camel@blaa>
 <490EF141.8040005@redhat.com>
Reply-To: Mark McLoughlin
Mime-Version: 1.0
Content-Type: text/plain
Content-Transfer-Encoding: 7bit
Cc: kvm@vger.kernel.org
To: Avi Kivity
Return-path:
Received: from mx2.redhat.com ([66.187.237.31]:48162 "EHLO mx2.redhat.com"
 rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
 id S1756338AbYKCPFx (ORCPT ); Mon, 3 Nov 2008 10:05:53 -0500
Received: from int-mx2.corp.redhat.com (int-mx2.corp.redhat.com
 [172.16.27.26]) by mx2.redhat.com (8.13.8/8.13.8) with ESMTP
 id mA3F5r49001043 for ; Mon, 3 Nov 2008 10:05:53 -0500
In-Reply-To: <490EF141.8040005@redhat.com>
Sender: kvm-owner@vger.kernel.org
List-ID:

On Mon, 2008-11-03 at 14:40 +0200, Avi Kivity wrote:
> Mark McLoughlin wrote:
> > On Sun, 2008-11-02 at 11:48 +0200, Avi Kivity wrote:
> >
> >> Mark McLoughlin wrote:
> >>> The main patch in this series is 5/6 - it just kills off the
> >>> virtio_net tx mitigation timer and does all the tx I/O in the
> >>> I/O thread.
> >>>
> >> What will it do to small packet, multi-flow loads (simulated by
> >> ping -f -l 30 $external)?
> >
> > It should improve the latency - the packets will be flushed more
> > quickly than the 150us timeout without blocking the guest.
>
> But it will increase overhead, since suddenly we aren't queueing
> anymore. One vmexit per small packet.

Yes, in theory, but the packet copies are acting to mitigate exits
since we don't re-enable notifications again until we're sure the ring
is empty. With copyless, though, we'd have an unacceptable vmexit rate.

> >> Where does the benefit come from?
> >
> > There are two things going on here, I think.
> >
> > First is that the timer affects latency, removing the timeout helps
> > that.
>
> If the timer affects latency, then something is very wrong. We're
> lacking an adjustable window.
>
> The way I see it, the notification window should be adjusted according
> to the current workload. If the link is idle, the window should be one
> packet -- notify as soon as something is queued. As the workload
> increases, the window increases to (safety_factor * allowable_latency *
> packet_rate). The timer is set to allowable_latency to catch changes
> in workload.
>
> For example:
>
> - allowable_latency 1ms (implies 1K vmexits/sec desired)
> - current packet_rate 20K packets/sec
> - safety_factor 0.8
>
> So we request notifications every 0.8 * 20K * 1ms = 16 packets, and
> set the timer to 1ms. Usually we get a notification every 16 packets,
> just before timer expiration. If the workload increases, we get
> notifications sooner, so we increase the window. If the workload
> drops, the timer fires and we decrease the window.
>
> The timer should never fire on an all-out benchmark, or in a ping
> test.

Yeah, I do like the sound of this. However, since it requires a new
guest feature and I don't expect it'll improve the situation over the
proposed patch until we have copyless transmit, I think we should do
this as part of the copyless effort.

One thing I'd worry about with this scheme is all-out receive - e.g.
any delay in returning a TCP ACK to the sending side might cause us to
hit the TCP window size.

> > Second is that currently when we fill up the ring we block the
> > guest vcpu and flush. Thus, while we're copying an entire ring full
> > of packets the guest isn't making progress. Doing the copying in
> > the I/O thread helps there.
>
> We're hurting our cache, and this won't work well with many nics. At
> the very least this should be done in a dedicated thread.
A thread per nic is doable, but it'd be especially tricky on the
receive side without more "short-cut the one producer, one consumer
case" work.

Cheers,
Mark.