From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 1C742C00140 for ; Tue, 26 Jul 2022 16:19:18 +0000 (UTC) Received: from localhost ([::1]:56026 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1oGNHB-0002vr-6u for qemu-devel@archiver.kernel.org; Tue, 26 Jul 2022 12:19:17 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:34860) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1oGN9e-0002du-K8 for qemu-devel@nongnu.org; Tue, 26 Jul 2022 12:11:30 -0400 Received: from mx0b-00069f02.pphosted.com ([205.220.177.32]:40810) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1oGN9b-00069k-Mz for qemu-devel@nongnu.org; Tue, 26 Jul 2022 12:11:30 -0400 Received: from pps.filterd (m0246630.ppops.net [127.0.0.1]) by mx0b-00069f02.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 26QFnPjs006246; Tue, 26 Jul 2022 16:10:47 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : mime-version : content-type : content-transfer-encoding; s=corp-2022-7-12; bh=DkYrxW9EsyqEnUWotH853gQMGa486bfxXa6V4VW0KFk=; b=iPVw0LVKUuMw8S7r2dbLw2vwD9s5hq/ZR8eAMPbDV0ysC1PV9Jgr7woP0j+IIkvdhhnk bviN67ywUYs5bXPSiVt8JPh3SlcUoMJtw3wq531qmdZapUUGdWRheSoMlwqL/02b8yVY d7ZNCPIIWDzuKwdbqJWYhQ25VET7hlpUiRMx48NJKkbGl447xL+SRPYkPZ6P4fS9j9fr QvW/W7z5Cli5gqfUG1PI2JIcm77s/MLdVSmNWHr8+JhcLmNB61rwBSvbmO1XRRXRYTIw fu/w/Mo5CGCeoOPY168xSOh5aOPq0dreaoGja2Idf6mpnQfSGwj1XvRR/7BTqD5KpNKQ RQ== Received: from phxpaimrmta03.imrmtpd1.prodappphxaev1.oraclevcn.com (phxpaimrmta03.appoci.oracle.com [138.1.37.129]) by mx0b-00069f02.pphosted.com (PPS) with ESMTPS id 3hg940pty4-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Tue, 26 Jul 2022 16:10:47 +0000 Received: from pps.filterd (phxpaimrmta03.imrmtpd1.prodappphxaev1.oraclevcn.com [127.0.0.1]) by phxpaimrmta03.imrmtpd1.prodappphxaev1.oraclevcn.com (8.17.1.5/8.17.1.5) with ESMTP id 26QEsIAQ006240; Tue, 26 Jul 2022 16:10:46 GMT Received: from pps.reinject (localhost [127.0.0.1]) by phxpaimrmta03.imrmtpd1.prodappphxaev1.oraclevcn.com (PPS) with ESMTPS id 3hh65bq10u-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Tue, 26 Jul 2022 16:10:46 +0000 Received: from phxpaimrmta03.imrmtpd1.prodappphxaev1.oraclevcn.com (phxpaimrmta03.imrmtpd1.prodappphxaev1.oraclevcn.com [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 26QG5uS9023334; Tue, 26 Jul 2022 16:10:45 GMT Received: from ca-dev63.us.oracle.com (ca-dev63.us.oracle.com [10.211.8.221]) by phxpaimrmta03.imrmtpd1.prodappphxaev1.oraclevcn.com (PPS) with ESMTP id 3hh65bq0y5-1; Tue, 26 Jul 2022 16:10:45 +0000 From: Steve Sistare To: qemu-devel@nongnu.org Cc: Paolo Bonzini , Stefan Hajnoczi , =?UTF-8?q?Marc-Andr=C3=A9=20Lureau?= , =?UTF-8?q?Alex=20Benn=C3=A9e?= , "Dr. David Alan Gilbert" , "Michael S. Tsirkin" , Marcel Apfelbaum , Alex Williamson , "Daniel P. Berrange" , Juan Quintela , Markus Armbruster , Eric Blake , Jason Zeng , Zheng Chuan , Steve Sistare , Mark Kanda , Guoyi Tu , Peter Maydell , =?UTF-8?q?Philippe=20Mathieu-Daud=C3=A9?= , Igor Mammedov , David Hildenbrand , John Snow , Peng Liang Subject: [PATCH V9 00/46] Live Update Date: Tue, 26 Jul 2022 09:09:57 -0700 Message-Id: <1658851843-236870-1-git-send-email-steven.sistare@oracle.com> X-Mailer: git-send-email 1.8.3.1 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.883,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-07-26_04,2022-07-26_01,2022-06-22_01 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 mlxscore=0 malwarescore=0 phishscore=0 spamscore=0 mlxlogscore=999 suspectscore=0 bulkscore=0 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2206140000 definitions=main-2207260063 X-Proofpoint-GUID: iUmkETpK0u8ZuLI8ygIL-gn7EyHkBF_n X-Proofpoint-ORIG-GUID: iUmkETpK0u8ZuLI8ygIL-gn7EyHkBF_n Received-SPF: pass client-ip=205.220.177.32; envelope-from=steven.sistare@oracle.com; helo=mx0b-00069f02.pphosted.com X-Spam_score_int: -27 X-Spam_score: -2.8 X-Spam_bar: -- X-Spam_report: (-2.8 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_MED=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H2=-0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" This version of the live update patch series integrates live update into the live migration framework. The new interfaces are: * mode (migration parameter) * cpr-exec-args (migration parameter) * file (migration URI) * migrate-mode-enable (command-line argument) * only-cpr-capable (command-line argument) Provide the cpr-exec and cpr-reboot migration modes for live update. These save and restore VM state, with minimal guest pause time, so that qemu may be updated to a new version in between. The caller sets the mode parameter before invoking the migrate or migrate-incoming commands. In cpr-reboot mode, the migrate command saves state to a file, allowing one to quit qemu, reboot to an updated kernel, start an updated version of qemu, and resume via the migrate-incoming command. The caller must specify a migration URI that writes to and reads from a file. Unlike normal mode, the use of certain local storage options does not block the migration, but the caller must not modify guest block devices between the quit and restart. The guest RAM memory-backend must be shared, and the @x-ignore-shared migration capability must be set, to avoid saving it to the file. Guest RAM must be non-volatile across reboot, which can be achieved by backing it with a dax device, or /dev/shm PKRAM as proposed in https://lore.kernel.org/lkml/1617140178-8773-1-git-send-email-anthony.yznaga@oracle.com but this is not enforced. The restarted qemu arguments must match those used to initially start qemu, plus the -incoming option. The reboot mode supports vfio devices if the caller first suspends the guest, such as by issuing guest-suspend-ram to the qemu guest agent. The guest drivers' suspend methods flush outstanding requests and re-initialize the devices, and thus there is no device state to save and restore. After issuing migrate-incoming, the caller must issue a system_wakeup command to resume. In cpr-exec mode, the migrate command saves state to a file and directly exec's a new version of qemu on the same host, replacing the original process while retaining its PID. The caller must specify a migration URI that writes to and reads from a file, and resumes execution via the migrate-incoming command. Arguments for the new qemu process are taken from the cpr-exec-args migration parameter, and must include the -incoming option. Guest RAM must be backed by a memory backend with share=on, but cannot be memory-backend-ram. The memory is re-mmap'd in the updated process, so guest ram is efficiently preserved in place, albeit with new virtual addresses. In addition, the '-migrate-mode-enable cpr-exec' option is required. This causes secondary guest ram blocks (those not specified on the command line) to be allocated by mmap'ing a memfd. The memfds are kept open across exec, their values are saved in special cpr state which is retrieved after exec, and they are re-mmap'd. Since guest RAM is not copied, and storage blocks are not migrated, the caller must disable all capabilities related to page and block copy. The implementation ignores all related parameters. The exec mode supports vfio devices by preserving the vfio container, group, device, and event descriptors across the qemu re-exec, and by updating DMA mapping virtual addresses using VFIO_DMA_UNMAP_FLAG_VADDR and VFIO_DMA_MAP_FLAG_VADDR as defined in https://lore.kernel.org/kvm/1611939252-7240-1-git-send-email-steven.sistare@oracle.com and integrated in Linux kernel 5.12. Here is an example of updating qemu from v7.0.50 to v7.0.51 using exec mode. The software update is performed while the guest is running to minimize downtime. window 1 | window 2 | # qemu-system-$arch ... | -migrate-mode-enable cpr-exec | QEMU 7.0.50 monitor - type 'help' ... | (qemu) info status | VM status: running | | # yum update qemu (qemu) migrate_set_parameter mode cpr-exec | (qemu) migrate_set_parameter cpr-exec-args | qemu-system-$arch ... -incoming defer | (qemu) migrate -d file:/tmp/qemu.sav | QEMU 7.0.51 monitor - type 'help' ... | (qemu) info status | VM status: paused (inmigrate) | (qemu) migrate_incoming file:/tmp/qemu.sav | (qemu) info status | VM status: running | Here is an example of updating the host kernel using reboot mode. window 1 | window 2 | # qemu-system-$arch ... mem-path=/dev/dax0.0 | -migrate-mode-enable cpr-reboot | QEMU 7.0.50 monitor - type 'help' ... | (qemu) info status | VM status: running | | # yum update kernel-uek (qemu) migrate_set_parameter mode cpr-reboot | (qemu) migrate -d file:/tmp/qemu.sav | (qemu) quit | | # systemctl kexec | kexec_core: Starting new kernel | ... | | # qemu-system-$arch mem-path=/dev/dax0.0 ... | -incoming defer | QEMU 7.0.51 monitor - type 'help' ... | (qemu) info status | VM status: paused (inmigrate) | (qemu) migrate_incoming file:/tmp/qemu.sav | (qemu) info status | VM status: running | Changes from V8 to V9: vfio: - free all cpr state during unwind in vfio_connect_container - change cpr_resave_fd to return void, and avoid new unwind cases - delete incorrect .unmigratable=1 in vmstate handlers - add route batching in vfio_claim_vectors - simplified vfio intx cpr code - fix commit message for 'recover from unmap-all-vaddr failure' - verify suspended runstate for cpr-reboot mode Other: - delete cpr-save, cpr-exec, cpr-load - delete ram block vmstate handlers that were added in V8 - rename cpr-enable option to migrate-mode-enable - add file URI for migration - add mode and cpr-exec-args migration parameters - add per-mode migration blockers - add mode checks in migration notifiers - fix suspended runstate during migration - replace RAM_ANON flag with RAM_NAMED_FILE - support memory-backend-epc Steve Sistare (44): migration: fix populate_vfio_info --- reboot mode --- memory: RAM_NAMED_FILE flag migration: file URI migration: mode parameter migration: migrate-enable-mode option migration: simplify blockers migration: per-mode blockers cpr: relax some blockers cpr: reboot mode qdev-properties: strList --- exec mode --- qapi: strList_from_string qapi: QAPI_LIST_LENGTH qapi: strv_from_strList qapi: strList unit tests migration: cpr-exec-args parameter migration: simplify notifiers migration: check mode in notifiers memory: flat section iterator oslib: qemu_clear_cloexec vl: helper to request re-exec cpr: preserve extra state cpr: exec mode cpr: add exec-mode blockers cpr: ram block blockers cpr: only-cpr-capable cpr: Mismatched GPAs fix hostmem-memfd: cpr support hostmem-epc: cpr support pci: export msix_is_pending --- vfio for exec --- vfio-pci: refactor for cpr vfio-pci: cpr part 1 (fd and dma) vfio-pci: cpr part 2 (msi) vfio-pci: cpr part 3 (intx) vfio-pci: recover from unmap-all-vaddr failure chardev: cpr framework --- misc for exec --- chardev: cpr for simple devices chardev: cpr for pty python/machine: QEMUMachine full_args python/machine: QEMUMachine reopen_qmp_connection tests/avocado: add cpr regression test vl: start on wakeup request --- vfio for reboot --- migration: fix suspended runstate migration: notifier error reporting vfio: allow cpr-reboot migration if suspended Mark Kanda, Steve Sistare (2): vhost: reset vhost devices for cpr chardev: cpr for sockets MAINTAINERS | 14 ++ accel/xen/xen-all.c | 3 + backends/hostmem-epc.c | 18 +- backends/hostmem-file.c | 1 + backends/hostmem-memfd.c | 22 ++- backends/tpm/tpm_emulator.c | 11 +- block/parallels.c | 7 +- block/qcow.c | 7 +- block/vdi.c | 7 +- block/vhdx.c | 7 +- block/vmdk.c | 7 +- block/vpc.c | 7 +- block/vvfat.c | 7 +- chardev/char-mux.c | 1 + chardev/char-null.c | 1 + chardev/char-pty.c | 16 +- chardev/char-serial.c | 1 + chardev/char-socket.c | 48 +++++ chardev/char-stdio.c | 31 +++ chardev/char.c | 49 ++++- dump/dump.c | 4 +- gdbstub.c | 1 + hmp-commands.hx | 2 +- hw/9pfs/9p.c | 11 +- hw/core/qdev-properties-system.c | 12 ++ hw/core/qdev-properties.c | 44 +++++ hw/display/virtio-gpu-base.c | 8 +- hw/intc/arm_gic_kvm.c | 3 +- hw/intc/arm_gicv3_its_kvm.c | 3 +- hw/intc/arm_gicv3_kvm.c | 3 +- hw/misc/ivshmem.c | 8 +- hw/net/virtio-net.c | 10 +- hw/pci/msix.c | 2 +- hw/pci/pci.c | 12 ++ hw/ppc/pef.c | 2 +- hw/ppc/spapr.c | 2 +- hw/ppc/spapr_events.c | 2 +- hw/ppc/spapr_rtas.c | 2 +- hw/remote/proxy.c | 7 +- hw/s390x/s390-virtio-ccw.c | 9 +- hw/scsi/vhost-scsi.c | 9 +- hw/vfio/common.c | 235 +++++++++++++++++++---- hw/vfio/cpr.c | 177 ++++++++++++++++++ hw/vfio/meson.build | 1 + hw/vfio/migration.c | 23 +-- hw/vfio/pci.c | 336 ++++++++++++++++++++++++++++----- hw/vfio/trace-events | 1 + hw/virtio/vhost-vdpa.c | 6 +- hw/virtio/vhost.c | 32 +++- include/chardev/char-socket.h | 1 + include/chardev/char.h | 5 + include/exec/memory.h | 48 +++++ include/exec/ram_addr.h | 1 + include/exec/ramblock.h | 1 + include/hw/pci/msix.h | 1 + include/hw/qdev-properties-system.h | 4 + include/hw/qdev-properties.h | 3 + include/hw/vfio/vfio-common.h | 12 ++ include/hw/virtio/vhost.h | 1 + include/migration/blocker.h | 69 ++++++- include/migration/cpr-state.h | 30 +++ include/migration/cpr.h | 20 ++ include/migration/misc.h | 13 +- include/migration/vmstate.h | 2 + include/qapi/util.h | 28 +++ include/qemu/osdep.h | 9 + include/sysemu/runstate.h | 2 + migration/cpr-state.c | 362 ++++++++++++++++++++++++++++++++++++ migration/cpr.c | 85 +++++++++ migration/file.c | 62 ++++++ migration/file.h | 14 ++ migration/meson.build | 3 + migration/migration.c | 268 +++++++++++++++++++++++--- migration/ram.c | 24 ++- migration/target.c | 1 + migration/trace-events | 12 ++ monitor/hmp-cmds.c | 59 +++--- monitor/hmp.c | 3 + monitor/qmp.c | 4 + python/qemu/machine/machine.py | 14 ++ qapi/char.json | 7 +- qapi/migration.json | 68 ++++++- qapi/qapi-util.c | 37 ++++ qemu-options.hx | 50 ++++- replay/replay.c | 4 + softmmu/memory.c | 31 ++- softmmu/physmem.c | 100 +++++++++- softmmu/runstate.c | 42 ++++- softmmu/vl.c | 10 + stubs/cpr-state.c | 26 +++ stubs/meson.build | 2 + stubs/migr-blocker.c | 9 +- stubs/migration.c | 33 ++++ target/i386/kvm/kvm.c | 8 +- target/i386/nvmm/nvmm-all.c | 4 +- target/i386/sev.c | 2 +- target/i386/whpx/whpx-all.c | 3 +- tests/avocado/cpr.py | 176 ++++++++++++++++++ tests/unit/meson.build | 1 + tests/unit/test-strlist.c | 81 ++++++++ trace-events | 1 + ui/spice-core.c | 5 +- ui/vdagent.c | 5 +- util/oslib-posix.c | 9 + util/oslib-win32.c | 4 + 105 files changed, 2781 insertions(+), 330 deletions(-) create mode 100644 hw/vfio/cpr.c create mode 100644 include/migration/cpr-state.h create mode 100644 include/migration/cpr.h create mode 100644 migration/cpr-state.c create mode 100644 migration/cpr.c create mode 100644 migration/file.c create mode 100644 migration/file.h create mode 100644 stubs/cpr-state.c create mode 100644 stubs/migration.c create mode 100644 tests/avocado/cpr.py create mode 100644 tests/unit/test-strlist.c -- 1.8.3.1