From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 4A957C433EF for ; Wed, 5 Jan 2022 21:17:41 +0000 (UTC) Received: from localhost ([::1]:47980 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1n5Df9-0007VS-37 for qemu-devel@archiver.kernel.org; Wed, 05 Jan 2022 16:17:39 -0500 Received: from eggs.gnu.org ([209.51.188.92]:58970) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1n5Dc3-0005MV-UB for qemu-devel@nongnu.org; Wed, 05 Jan 2022 16:14:28 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.129.124]:46639) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1n5Dbw-00062F-NI for qemu-devel@nongnu.org; Wed, 05 Jan 2022 16:14:26 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1641417259; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=ONsG7DRoO6uq1S6/vcW0ccNEbLbV//wEK4iHgyh5ssA=; b=eiHWXh0JPJZh00VNP6B7SdIJI57e+Ii6qrZSDopC5iXU57gC3mXRz/c0dANpf7waAJoVkc fN0iuCeYHi6VBprJtZnbx1yW5QfHk5xeTDG+J/diZH3o2j8n+fXyOATC4a7ji3Ur/KiHMT bemBJflHpdCq5lop3hXQRCCWAvUrxXw= Received: from mail-wm1-f72.google.com (mail-wm1-f72.google.com [209.85.128.72]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-669-hWMxfR6TNs-1hVHyRfVdAw-1; Wed, 05 Jan 2022 16:14:17 -0500 X-MC-Unique: hWMxfR6TNs-1hVHyRfVdAw-1 Received: by mail-wm1-f72.google.com with SMTP id l20-20020a05600c1d1400b003458e02cea0so2384466wms.7 for ; Wed, 05 Jan 2022 13:14:16 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=ONsG7DRoO6uq1S6/vcW0ccNEbLbV//wEK4iHgyh5ssA=; b=dnPUtTNIgp4tZVTRDihkHQn9Ahbq3Y+1G09ACfhcIF4/OulVrWEZd0wEkE+3dhllH1 4HJYb518MQPluSAH9RCeYBA4h6RjGz6YvBjHxc1iI/qUQEsl+M2Ztt13ClQeZ8p5+6IM ED7/0TMCB7xXO4/MxB9DlbHgjhCpf9R4mxMx+4evv7ah7cRyYs5ndnxS5CDN2DEoifTR W4ROwucAXKeH3CudXkz5nXJUckAnGJ4MR90kHmD5o1j7gKMfqT4cijJwFPoBqxXybBwK TzJ1KS0YZ7veMnUvo7oEZSkQZLZmBYr7xlaI1Y83BHLdJpUyLj7o/6L6DuSqVaU8MwOO uiGQ== X-Gm-Message-State: AOAM532EjV4kKhglIOEeUfkoTZhTWt/bE4aQvjXwyQjHRyTxucNvgabr Sp4ejNlRuhgyMWtYPxPdjD9feQr6S4Bf6RDYo9Zoh/Nw+acVWflr0D7I1ebi8bvi4TW3NQiAB89 x6bskuBhZhAdBeqo= X-Received: by 2002:a5d:46d0:: with SMTP id g16mr50564272wrs.624.1641417255771; Wed, 05 Jan 2022 13:14:15 -0800 (PST) X-Google-Smtp-Source: ABdhPJyv1+s/Fy8QGDf1RXDpAe4JyRP5OiyK+fWSUvmj4w9MscDfvfOzyXLbfFFNuN8qZXSkgSTnwQ== X-Received: by 2002:a5d:46d0:: with SMTP id g16mr50564237wrs.624.1641417255501; Wed, 05 Jan 2022 13:14:15 -0800 (PST) Received: from redhat.com ([2a03:c5c0:207e:991b:6857:5652:b903:a63b]) by smtp.gmail.com with ESMTPSA id q8sm87551wrx.59.2022.01.05.13.14.11 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 05 Jan 2022 13:14:14 -0800 (PST) Date: Wed, 5 Jan 2022 16:14:09 -0500 From: "Michael S. Tsirkin" To: Steven Sistare Subject: Re: [PATCH V7 19/29] vfio-pci: cpr part 1 (fd and dma) Message-ID: <20220105161046-mutt-send-email-mst@kernel.org> References: <1640199934-455149-1-git-send-email-steven.sistare@oracle.com> <1640199934-455149-20-git-send-email-steven.sistare@oracle.com> <20211222181003-mutt-send-email-mst@kernel.org> MIME-Version: 1.0 In-Reply-To: Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=mst@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Received-SPF: pass client-ip=170.10.129.124; envelope-from=mst@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -31 X-Spam_score: -3.2 X-Spam_bar: --- X-Spam_report: (-3.2 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.372, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H3=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: "Daniel P. Berrange" , Juan Quintela , Jason Zeng , Alex =?iso-8859-1?Q?Benn=E9e?= , qemu-devel@nongnu.org, Eric Blake , "Dr. David Alan Gilbert" , Zheng Chuan , Alex Williamson , Stefan Hajnoczi , =?iso-8859-1?Q?Marc-Andr=E9?= Lureau , Paolo Bonzini , Philippe =?iso-8859-1?Q?Mathieu-Daud=E9?= , Markus Armbruster Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" On Wed, Jan 05, 2022 at 12:24:21PM -0500, Steven Sistare wrote: > On 12/22/2021 6:15 PM, Michael S. Tsirkin wrote: > > On Wed, Dec 22, 2021 at 11:05:24AM -0800, Steve Sistare wrote: > >> Enable vfio-pci devices to be saved and restored across an exec restart > >> of qemu. > >> > >> At vfio creation time, save the value of vfio container, group, and device > >> descriptors in cpr state. > >> > >> In cpr-save and cpr-exec, suspend the use of virtual addresses in DMA > >> mappings with VFIO_DMA_UNMAP_FLAG_VADDR, because guest ram will be remapped > >> at a different VA after exec. DMA to already-mapped pages continues. Save > >> the msi message area as part of vfio-pci vmstate, save the interrupt and > >> notifier eventfd's in cpr state, and clear the close-on-exec flag for the > >> vfio descriptors. The flag is not cleared earlier because the descriptors > >> should not persist across miscellaneous fork and exec calls that may be > >> performed during normal operation. > >> > >> On qemu restart, vfio_realize() finds the saved descriptors, uses > >> the descriptors, and notes that the device is being reused. Device and > >> iommu state is already configured, so operations in vfio_realize that > >> would modify the configuration are skipped for a reused device, including > >> vfio ioctl's and writes to PCI configuration space. The result is that > >> vfio_realize constructs qemu data structures that reflect the current > >> state of the device. However, the reconstruction is not complete until > >> cpr-load is called. cpr-load loads the msi data and finds eventfds in cpr > >> state. It rebuilds vector data structures and attaches the interrupts to > >> the new KVM instance. cpr-load then invokes the main vfio listener callback, > >> which walks the flattened ranges of the vfio_address_spaces and calls > >> VFIO_DMA_MAP_FLAG_VADDR to inform the kernel of the new VA's. Lastly, it > >> starts the VM and suppresses vfio pci device reset. > >> > >> This functionality is delivered by 3 patches for clarity. Part 1 handles > >> device file descriptors and DMA. Part 2 adds eventfd and MSI/MSI-X vector > >> support. Part 3 adds INTX support. > >> > >> Signed-off-by: Steve Sistare > >> --- > >> MAINTAINERS | 1 + > >> hw/pci/pci.c | 10 ++++ > >> hw/vfio/common.c | 115 ++++++++++++++++++++++++++++++++++++++---- > >> hw/vfio/cpr.c | 94 ++++++++++++++++++++++++++++++++++ > >> hw/vfio/meson.build | 1 + > >> hw/vfio/pci.c | 77 ++++++++++++++++++++++++++++ > >> hw/vfio/trace-events | 1 + > >> include/hw/pci/pci.h | 1 + > >> include/hw/vfio/vfio-common.h | 8 +++ > >> include/migration/cpr.h | 3 ++ > >> migration/cpr.c | 10 +++- > >> migration/target.c | 14 +++++ > >> 12 files changed, 324 insertions(+), 11 deletions(-) > >> create mode 100644 hw/vfio/cpr.c > >> > >> diff --git a/MAINTAINERS b/MAINTAINERS > >> index cfe7480..feed239 100644 > >> --- a/MAINTAINERS > >> +++ b/MAINTAINERS > >> @@ -2992,6 +2992,7 @@ CPR > >> M: Steve Sistare > >> M: Mark Kanda > >> S: Maintained > >> +F: hw/vfio/cpr.c > >> F: include/migration/cpr.h > >> F: migration/cpr.c > >> F: qapi/cpr.json > >> diff --git a/hw/pci/pci.c b/hw/pci/pci.c > >> index 0fd21e1..e35df4f 100644 > >> --- a/hw/pci/pci.c > >> +++ b/hw/pci/pci.c > >> @@ -307,6 +307,16 @@ static void pci_do_device_reset(PCIDevice *dev) > >> { > >> int r; > >> > >> + /* > >> + * A reused vfio-pci device is already configured, so do not reset it > >> + * during qemu_system_reset prior to cpr-load, else interrupts may be > >> + * lost. By contrast, pure-virtual pci devices may be reset here and > >> + * updated with new state in cpr-load with no ill effects. > >> + */ > >> + if (dev->reused) { > >> + return; > >> + } > >> + > >> pci_device_deassert_intx(dev); > >> assert(dev->irq_state == 0); > >> > > > > > > Hmm that's a weird thing to do. I suspect this works because > > "reused" means something like "in the process of being restored"? > > Because clearly, we do not want to skip this part e.g. when > > guest resets the device. > > Exactly. vfio_realize sets the flag if it detects the device is reused during > a restart, and vfio_pci_post_load clears the reused flag. > > > So a better name could be called for, but really I don't > > love how vfio gets to poke at internal PCI state. > > I'd rather we found a way just not to call this function. > > If we can't, maybe an explicit API, and make it > > actually say what it's doing? > > How about: > > pci_set_restore(PCIDevice *dev) { dev->restore = true; } > pci_clr_restore(PCIDevice *dev) { dev->restore = false; } > > vfio_realize() > pci_set_restore(pdev) > > vfio_pci_post_load() > pci_clr_restore(pdev) > > pci_do_device_reset() > if (dev->restore) > return; > > - Steve Not too bad. I'd like a better definition of what dev->restore is exactly and to add them in comments near where it is defined and used. E.g. does this mean "device is being restored because of qemu restart"? Do we need a per device flag for this thing or would a global "qemu restart in progress" flag be enough? -- MST