From mboxrd@z Thu Jan  1 00:00:00 1970
References: <56A7A1A8.4060704@windriver.com> <56A7A3C7.6090006@redhat.com>
	<56A7AB12.8060303@windriver.com>
From: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <56A7AD8C.50606@redhat.com>
Date: Tue, 26 Jan 2016 18:31:56 +0100
MIME-Version: 1.0
In-Reply-To: <56A7AB12.8060303@windriver.com>
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] high outage times for qemu virtio network links
 during live migration, trying to debug
List-Id: <qemu-devel.nongnu.org>
To: Chris Friesen <chris.friesen@windriver.com>, libvir-list@redhat.com, qemu-devel@nongnu.org


On 26/01/2016 18:21, Chris Friesen wrote:
>>>
>>> My question is, why doesn't qemu continue processing virtio packets
>>> while the dirty page scanning and memory transfer over the network is
>>> proceeding?
>>
>> QEMU (or vhost) _are_ processing virtio traffic, because otherwise you'd
>> have no delay---only dropped packets.  Or am I missing something?
> 
> I have separate timestamps embedded in the packet for when it was sent
> and when it was echoed back by the target (which is the one being
> migrated).  What I'm seeing is that packets to the guest are being sent
> every msec, but they get delayed somewhere for over a second on the way
> to the destination VM while the migration is in progress.  Once the
> migration is over, a bunch of packets get delivered to the app in the
> guest and are then processed all at once and echoed back to the sender
> in a big burst (and a bunch of packets are dropped, presumably due to a
> buffer overflowing somewhere).

That doesn't exclude a bug somewhere in the net/ code, and it doesn't
pinpoint the delay to QEMU or vhost-net.

In any case, what I would do is use tracing at all levels (guest
kernel, QEMU, host kernel) for packet rx and tx, and find out at which
layer the hiccup appears.
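
Once you have traces from each level, comparing them could look roughly
like the sketch below. It assumes you have already reduced each trace to
one timestamp per packet sequence number per layer, and that the guest
and host clocks are comparable. The layer names and the input layout are
made up for illustration; they are not real tracepoint names.

```python
from statistics import median

# Hypothetical layer names, ordered along the receive path toward the guest app.
LAYERS = ["host_nic", "host_tap", "qemu_virtio", "guest_kernel", "guest_app"]


def hop_delays(stamps):
    """Return {(src, dst): delay} for one packet's per-layer timestamps.

    Hops where either endpoint has no timestamp (e.g. a dropped packet)
    are skipped.
    """
    return {
        (src, dst): stamps[dst] - stamps[src]
        for src, dst in zip(LAYERS, LAYERS[1:])
        if src in stamps and dst in stamps
    }


def worst_hop(packets):
    """Find the hop with the largest median delay.

    packets maps a sequence number to a {layer: timestamp_in_seconds} dict.
    Returns ((src, dst), median_delay), or None if no hop could be measured.
    """
    per_hop = {}
    for stamps in packets.values():
        for hop, delay in hop_delays(stamps).items():
            per_hop.setdefault(hop, []).append(delay)
    if not per_hop:
        return None
    medians = {hop: median(delays) for hop, delays in per_hop.items()}
    hop = max(medians, key=medians.get)
    return hop, medians[hop]
```

Running worst_hop() separately on packets captured during the migration
and on packets captured outside it should show which hop accounts for
the extra second.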

Paolo

> For comparison, we have a DPDK-based fastpath NIC type that we added
> (sort of like vhost-net), and it continues to process packets while the
> dirty page scanning is going on.  Only the actual cutover affects it.