From: Paolo Bonzini
Date: Tue, 26 Jan 2016 18:31:56 +0100
Subject: Re: [Qemu-devel] high outage times for qemu virtio network links during live migration, trying to debug
To: Chris Friesen, libvir-list@redhat.com, qemu-devel@nongnu.org

On 26/01/2016 18:21, Chris Friesen wrote:
>>>
>>> My question is, why doesn't qemu continue processing virtio packets
>>> while the dirty page scanning and memory transfer over the network is
>>> proceeding?
>>
>> QEMU (or vhost) _are_ processing virtio traffic, because otherwise you'd
>> have no delay---only dropped packets. Or am I missing something?
>
> I have separate timestamps embedded in the packet for when it was sent
> and when it was echoed back by the target (which is the one being
> migrated). What I'm seeing is that packets to the guest are being sent
> every msec, but they get delayed somewhere for over a second on the way
> to the destination VM while the migration is in progress. Once the
> migration is over, a bunch of packets get delivered to the app in the
> guest and are then processed all at once and echoed back to the sender
> in a big burst (and a bunch of packets are dropped, presumably due to a
> buffer overflowing somewhere).

That doesn't exclude a bug somewhere in net/ code. It doesn't pinpoint
it to QEMU or vhost-net.

In any case, what I would do is to use tracing at all levels (guest
kernel, QEMU, host kernel) for packet rx and tx, and find out at which
layer the hiccup appears.

Paolo

> For comparison, we have a DPDK-based fastpath NIC type that we added
> (sort of like vhost-net), and it continues to process packets while the
> dirty page scanning is going on. Only the actual cutover affects it.
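A minimal sketch of the "trace every layer and find where the hiccup appears" approach, assuming each layer's trace (host kernel, QEMU or vhost, guest kernel, guest app) has already been reduced to one timestamp per packet, for example keyed by a sequence number embedded in the packet, and that all timestamps share a common clock base. The layer labels in `RX_PATH` and the helpers `worst_hop`/`blame_hops` are hypothetical illustrations, not real QEMU or kernel tracepoint names; the sketch only reports which hop the large delays fall into, not why.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical layer labels along the receive path toward the guest, in the
# order a packet is expected to traverse them. These are illustrative names,
# not real QEMU or kernel tracepoint names.
RX_PATH: List[str] = [
    "host_kernel_rx",
    "qemu_or_vhost_rx",
    "guest_kernel_rx",
    "guest_app_rx",
]


def worst_hop(stamps: Dict[str, float], path: Sequence[str]) -> Tuple[str, float]:
    """Return (hop, delay) for the slowest hop of one packet.

    `stamps` maps each layer label to the time, in seconds, at which that
    layer's trace saw the packet. All timestamps are assumed to share a
    common clock base.
    """
    worst = ("", float("-inf"))
    for src, dst in zip(path, path[1:]):
        delay = stamps[dst] - stamps[src]
        if delay > worst[1]:
            worst = (f"{src} -> {dst}", delay)
    return worst


def blame_hops(
    packets: Sequence[Dict[str, float]],
    path: Sequence[str] = RX_PATH,
    threshold: float = 0.1,
) -> Dict[str, int]:
    """Count, per hop, the packets whose slowest hop exceeded `threshold` s."""
    counts: Dict[str, int] = {}
    for stamps in packets:
        hop, delay = worst_hop(stamps, path)
        if delay > threshold:
            counts[hop] = counts.get(hop, 0) + 1
    return counts


# Two made-up packets: the second stalls between QEMU/vhost and the guest
# kernel, roughly like the >1 s delay described in the thread.
packets = [
    {"host_kernel_rx": 0.000, "qemu_or_vhost_rx": 0.0001,
     "guest_kernel_rx": 0.0002, "guest_app_rx": 0.0003},
    {"host_kernel_rx": 0.001, "qemu_or_vhost_rx": 0.0011,
     "guest_kernel_rx": 1.2, "guest_app_rx": 1.2001},
]
print(blame_hops(packets))  # {'qemu_or_vhost_rx -> guest_kernel_rx': 1}
```

If guest and host clocks are not synchronized, hops that cross the host/guest boundary are less trustworthy than hops measured entirely on one side, so the attribution should be read with that caveat.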