From: "Wangxin (Alexander)" <wangxinxin.wang@huawei.com>
To: qemu-devel@nongnu.org
Subject: [RFC] migration: stop/start device at the end of live migration concurrently
Date: Mon, 1 Mar 2021 15:09:00 +0000
Cc: "Wuchenye (karot, Cloud Infrastructure Service Product Dept)",
    "Zhoujian (jay)", dgilbert@redhat.com, quintela@redhat.com

Hi all,

We found that the downtime of live migration reaches several seconds when
migrating a huge VM with 224 vCPUs, 180 GiB of RAM, 16 vhost-user NICs
(32 queues each) and 24 vhost-user-blk disks (4 queues each). Most of that
time is spent stopping the devices on the source and starting them on the
destination.

Our idea is to stop and start the devices from multiple threads at the end
of migration. Specifically, we create a thread pool at the beginning of
live migration; when vm_state_notify invokes a device's
virtio_vmstate_change callback to stop or start it, the migration thread
submits a request to the thread pool so that the callbacks are handled
concurrently. (A rough, self-contained sketch of this fan-out pattern is
appended at the end of this mail.)

We live migrated the VM and measured the time spent in each stage of
stopping/starting the devices:

         Stage                          Original   With state change concurrently
    Src  disk  get vring base               36ms          18ms
               disable guest notify         48ms          32ms
               disable host notify         300ms         120ms
         net   get vring base             1376ms         294ms
               disable host notify        1011ms         116ms
               disable guest notify         59ms          40ms

    Dst  net   enable guest notify         310ms          97ms
               set memtable                 48ms          20ms
               enable host notify         2022ms         114ms
         disk  enable host notify          312ms          78ms
               enable guest notify          32ms          23ms
               set memtable                 16ms          10ms

         Total downtime                   5600ms         962ms

However, there are some side effects:

1. When host notify or guest notify is disabled concurrently, the VM
   crashes because the same notifier can be disabled from two different
   threads. We currently add two separate locks to avoid this (see the
   second sketch appended below), but this is a hack and may introduce
   other problems.

2. Since the migration thread already holds the QEMU BQL when it stops
   devices in migration_completion, the following scenario deadlocks
   (a minimal standalone reproduction is appended as the third sketch):

   migration thread [thread 1]
     set_up_multithread
     ...
     migration_completion()
       qemu_mutex_lock_iothread()           # takes the QEMU BQL
       vm_stop_force_state()
         ...
         submit stop-device requests to the thread pool

   thread pool worker
     virtio_vmstate_change
       virtio_set_status
         ...
         memory_region_transaction_begin
           ...
           prepare_mmio_access
             qemu_mutex_iothread_locked()   # N (per-thread flag)
             qemu_mutex_lock_iothread()     # deadlock

For now we add yet another lock to replace the BQL in this path, but we do
not think this is reliable enough: there is a risk that other code also
takes the QEMU BQL while devices are being stopped. My question is: how
should the conflict with the QEMU BQL be handled properly?

Any advice will be appreciated, thanks.
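
Sketch 1: the fan-out pattern described above, as a minimal self-contained
program using GLib's GThreadPool for illustration only. DeviceReq,
stop_one_device and the device count are made-up stand-ins for the real
vm_state_notify callbacks; this is the shape of the change, not the actual
patch:

    /* Build with: gcc sketch1.c $(pkg-config --cflags --libs glib-2.0) */
    #include <glib.h>
    #include <stdio.h>

    typedef struct {
        int id;        /* stand-in for one VirtIODevice */
        int running;   /* target run state: 0 = stop, 1 = start */
    } DeviceReq;

    /* Worker: in QEMU this would run one device's vm-state-change
     * callback, e.g. virtio_vmstate_change(). */
    static void stop_one_device(gpointer data, gpointer user_data)
    {
        DeviceReq *req = data;
        printf("device %d -> %s\n", req->id, req->running ? "start" : "stop");
        g_free(req);
    }

    int main(void)
    {
        /* Non-exclusive pool with 16 workers; in the real flow it is
         * created once at the beginning of live migration. */
        GThreadPool *pool = g_thread_pool_new(stop_one_device, NULL,
                                              16, FALSE, NULL);

        for (int i = 0; i < 40; i++) {        /* 16 nics + 24 disks */
            DeviceReq *req = g_new0(DeviceReq, 1);
            req->id = i;
            req->running = 0;
            g_thread_pool_push(pool, req, NULL);
        }

        /* Wait until every queued request has been handled, then free
         * the pool: migration must not proceed before all devices have
         * been stopped. */
        g_thread_pool_free(pool, FALSE, TRUE);
        return 0;
    }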
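
Sketch 2: the extra locking for side effect 1. This only illustrates the
idea of serializing notifier teardown per device; the lock placement and
the names (Dev, disable_host_notify, disable_guest_notify) are
placeholders, not the actual hack:

    #include <pthread.h>

    typedef struct {
        pthread_mutex_t host_notify_lock;   /* guards host notifier state */
        pthread_mutex_t guest_notify_lock;  /* guards guest notifier state */
        /* ... notifier fds, queue state, ... */
    } Dev;

    /* Two workers touching the same device now take the same lock, so
     * the same host notifier can never be disabled twice in parallel. */
    static void disable_host_notify(Dev *dev)
    {
        pthread_mutex_lock(&dev->host_notify_lock);
        /* ... unmap/close the host notifier ... */
        pthread_mutex_unlock(&dev->host_notify_lock);
    }

    /* Same idea for the guest notifier. */
    static void disable_guest_notify(Dev *dev)
    {
        pthread_mutex_lock(&dev->guest_notify_lock);
        /* ... mask/close the guest notifier ... */
        pthread_mutex_unlock(&dev->guest_notify_lock);
    }

    int main(void)
    {
        Dev dev = {
            .host_notify_lock  = PTHREAD_MUTEX_INITIALIZER,
            .guest_notify_lock = PTHREAD_MUTEX_INITIALIZER,
        };
        disable_host_notify(&dev);
        disable_guest_notify(&dev);
        return 0;
    }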
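
Sketch 3: a standalone reproduction of the deadlock in side effect 2. The
key point is that qemu_mutex_iothread_locked() is backed by a per-thread
flag, so a pool worker always sees "not locked" and then blocks on the BQL
that the migration thread already holds. All names below are stand-ins for
the QEMU code, not the code itself; running the program (built with
-pthread) hangs at pthread_join(), which is exactly the deadlock:

    #include <pthread.h>
    #include <stdbool.h>

    static pthread_mutex_t bql = PTHREAD_MUTEX_INITIALIZER;
    static __thread bool bql_locked;   /* per-thread, like QEMU's flag */

    static void lock_bql(void)
    {
        pthread_mutex_lock(&bql);
        bql_locked = true;
    }

    static void unlock_bql(void)
    {
        bql_locked = false;
        pthread_mutex_unlock(&bql);
    }

    /* Worker path, mirroring prepare_mmio_access(). */
    static void *worker(void *arg)
    {
        (void)arg;
        if (!bql_locked) {   /* flag is thread-local, so false here (# N) */
            lock_bql();      /* blocks forever: thread 1 owns the BQL */
        }
        /* ... memory_region_transaction_begin() etc. ... */
        unlock_bql();
        return NULL;
    }

    int main(void)
    {
        pthread_t t;

        lock_bql();              /* migration_completion() takes the BQL */
        pthread_create(&t, NULL, worker, NULL);
        pthread_join(&t, NULL);  /* waits for the worker -> deadlock */
        unlock_bql();
        return 0;
    }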