qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: "Cédric Le Goater" <clg@redhat.com>
To: Avihai Horon <avihaih@nvidia.com>,
	Alex Williamson <alex.williamson@redhat.com>
Cc: qemu-devel@nongnu.org, Halil Pasic <pasic@linux.ibm.com>,
	Christian Borntraeger <borntraeger@linux.ibm.com>,
	Eric Farman <farman@linux.ibm.com>,
	Richard Henderson <richard.henderson@linaro.org>,
	David Hildenbrand <david@redhat.com>,
	Ilya Leoshkevich <iii@linux.ibm.com>,
	Thomas Huth <thuth@redhat.com>,
	Juan Quintela <quintela@redhat.com>,
	"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	Cornelia Huck <cohuck@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Stefan Hajnoczi <stefanha@redhat.com>, Fam Zheng <fam@euphon.net>,
	Eric Blake <eblake@redhat.com>,
	Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>,
	John Snow <jsnow@redhat.com>,
	qemu-s390x@nongnu.org, qemu-block@nongnu.org,
	Yishai Hadas <yishaih@nvidia.com>,
	Jason Gunthorpe <jgg@nvidia.com>,
	Maor Gottlieb <maorg@nvidia.com>,
	Kirti Wankhede <kwankhede@nvidia.com>,
	Tarun Gupta <targupta@nvidia.com>,
	Joao Martins <joao.m.martins@oracle.com>
Subject: Re: [PATCH v9 07/14] vfio/migration: Block multiple devices migration
Date: Wed, 8 Feb 2023 17:44:24 +0100	[thread overview]
Message-ID: <fcb9bda8-6d95-6109-ae5b-beeb9aa63af2@redhat.com> (raw)
In-Reply-To: <238b17d1-17a3-e5d1-2973-4bda83928d6e@nvidia.com>

On 2/8/23 14:08, Avihai Horon wrote:
> 
> On 08/02/2023 0:34, Alex Williamson wrote:
>> External email: Use caution opening links or attachments
>>
>>
>> On Mon, 6 Feb 2023 14:31:30 +0200
>> Avihai Horon <avihaih@nvidia.com> wrote:
>>
>>> Currently VFIO migration doesn't implement some kind of intermediate
>>> quiescent state in which P2P DMAs are quiesced before stopping or
>>> running the device. This can cause problems in multi-device migration
>>> where the devices are doing P2P DMAs, since the devices are not stopped
>>> together at the same time.
>>>
>>> Until such support is added, block migration of multiple devices.
>>>
>>> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
>>> ---
>>>   include/hw/vfio/vfio-common.h |  2 ++
>>>   hw/vfio/common.c              | 51 +++++++++++++++++++++++++++++++++++
>>>   hw/vfio/migration.c           |  6 +++++
>>>   3 files changed, 59 insertions(+)
>>>
>>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
>>> index e573f5a9f1..56b1683824 100644
>>> --- a/include/hw/vfio/vfio-common.h
>>> +++ b/include/hw/vfio/vfio-common.h
>>> @@ -218,6 +218,8 @@ typedef QLIST_HEAD(VFIOGroupList, VFIOGroup) VFIOGroupList;
>>>   extern VFIOGroupList vfio_group_list;
>>>
>>>   bool vfio_mig_active(void);
>>> +int vfio_block_multiple_devices_migration(Error **errp);
>>> +void vfio_unblock_multiple_devices_migration(void);
>>>   int64_t vfio_mig_bytes_transferred(void);
>>>
>>>   #ifdef CONFIG_LINUX
>>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>>> index 3a35f4afad..01db41b735 100644
>>> --- a/hw/vfio/common.c
>>> +++ b/hw/vfio/common.c
>>> @@ -41,6 +41,7 @@
>>>   #include "qapi/error.h"
>>>   #include "migration/migration.h"
>>>   #include "migration/misc.h"
>>> +#include "migration/blocker.h"
>>>   #include "sysemu/tpm.h"
>>>
>>>   VFIOGroupList vfio_group_list =
>>> @@ -337,6 +338,56 @@ bool vfio_mig_active(void)
>>>       return true;
>>>   }
>>>
>>> +Error *multiple_devices_migration_blocker;

static ?

So we have two migration blockers, one per device and one global. I guess
it is easier that way than trying to update the per device Error*.

C.


>>> +
>>> +static unsigned int vfio_migratable_device_num(void)
>>> +{
>>> +    VFIOGroup *group;
>>> +    VFIODevice *vbasedev;
>>> +    unsigned int device_num = 0;
>>> +
>>> +    QLIST_FOREACH(group, &vfio_group_list, next) {
>>> +        QLIST_FOREACH(vbasedev, &group->device_list, next) {
>>> +            if (vbasedev->migration) {
>>> +                device_num++;
>>> +            }
>>> +        }
>>> +    }
>>> +
>>> +    return device_num;
>>> +}
>>> +
>>> +int vfio_block_multiple_devices_migration(Error **errp)
>>> +{
>>> +    int ret;
>>> +
>>> +    if (vfio_migratable_device_num() != 2) {
>>> +        return 0;
>>> +    }
>>> +
>>> +    error_setg(&multiple_devices_migration_blocker,
>>> +               "Migration is currently not supported with multiple "
>>> +               "VFIO devices");
>>> +    ret = migrate_add_blocker(multiple_devices_migration_blocker, errp);
>>> +    if (ret < 0) {
>>> +        error_free(multiple_devices_migration_blocker);
>>> +        multiple_devices_migration_blocker = NULL;
>>> +    }
>>> +
>>> +    return ret;
>>> +}
>>> +
>>> +void vfio_unblock_multiple_devices_migration(void)
>>> +{
>>> +    if (vfio_migratable_device_num() != 2) {
>>> +        return;
>>> +    }
>>> +
>>> +    migrate_del_blocker(multiple_devices_migration_blocker);
>>> +    error_free(multiple_devices_migration_blocker);
>>> +    multiple_devices_migration_blocker = NULL;
>>> +}
>> A couple awkward things here.  First I wish we could do something
>> cleaner or more intuitive than the != 2 test.  I get that we're trying
>> to do this on the addition of the 2nd device supporting migration, or
>> the removal of the next to last device independent of all other devices,
>> but I wonder if it wouldn't be better to remove the multiple-device
>> blocker after migration is torn down for the device so we can test
>> device >1 or ==1 in combination with whether
>> multiple_devices_migration_blocker is NULL.
>>
>> Which comes to the second awkwardness, if we fail to add the blocker we
>> free and clear the blocker, but when we tear down the device due to that
>> failure we'll remove the blocker that doesn't exist, free NULL, and
>> clear it again.  Thanks to the glib slist the migration blocker is
>> using, I think that all works, but I'd rather not be dependent on that
>> implementation to avoid a segfault here.  Incorporating a test of
>> multiple_devices_migration_blocker as above would avoid this too.
> 
> You mean something like this?
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 3a35f4afad..f3e08eff58 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> 
> [...]
> 
> +int vfio_block_multiple_devices_migration(Error **errp)
> +{
> +    int ret;
> +
> +    if (vfio_migratable_device_num() <= 1 ||
> +        multiple_devices_migration_blocker) {
> +        return 0;
> +    }
> +
> +    error_setg(&multiple_devices_migration_blocker,
> +               "Migration is currently not supported with multiple "
> +               "VFIO devices");
> +    ret = migrate_add_blocker(multiple_devices_migration_blocker, errp);
> +    if (ret < 0) {
> +        error_free(multiple_devices_migration_blocker);
> +        multiple_devices_migration_blocker = NULL;
> +    }
> +
> +    return ret;
> +}
> +
> +void vfio_unblock_multiple_devices_migration(void)
> +{
> +    if (vfio_migratable_device_num() > 1 ||
> +        !multiple_devices_migration_blocker) {
> +        return;
> +    }
> +
> +    migrate_del_blocker(multiple_devices_migration_blocker);
> +    error_free(multiple_devices_migration_blocker);
> +    multiple_devices_migration_blocker = NULL;
> +}
> +
>   static bool vfio_devices_all_dirty_tracking(VFIOContainer *container)
>   {
>       VFIOGroup *group;
> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> index 552c2313b2..15b446c0ec 100644
> --- a/hw/vfio/migration.c
> +++ b/hw/vfio/migration.c
> @@ -880,6 +880,11 @@ int vfio_migration_probe(VFIODevice *vbasedev, Error **errp)
>           goto add_blocker;
>       }
> 
> +    ret = vfio_block_multiple_devices_migration(errp);
> +    if (ret) {
> +        return ret;
> +    }
> +
>       trace_vfio_migration_probe(vbasedev->name, info->index);
>       g_free(info);
>       return 0;
> @@ -906,6 +911,7 @@ void vfio_migration_finalize(VFIODevice *vbasedev)
>           qemu_del_vm_change_state_handler(migration->vm_state);
>           unregister_savevm(VMSTATE_IF(vbasedev->dev), "vfio", vbasedev);
>           vfio_migration_exit(vbasedev);
> +        vfio_unblock_multiple_devices_migration();
>       }
> 
>       if (vbasedev->migration_blocker) {
> 
> 
> Maybe also negate the if conditions and put the add/remove blocker code inside it? Is it more readable this way?
> E.g.:
> 
> +int vfio_block_multiple_devices_migration(Error **errp)
> +{
> +    int ret = 0;
> +
> +    if (vfio_migratable_device_num() > 1 &&
> +        !multiple_devices_migration_blocker) {
> +        error_setg(&multiple_devices_migration_blocker,
> +                   "Migration is currently not supported with multiple "
> +                   "VFIO devices");
> +        ret = migrate_add_blocker(multiple_devices_migration_blocker, errp);
> +        if (ret < 0) {
> +            error_free(multiple_devices_migration_blocker);
> +            multiple_devices_migration_blocker = NULL;
> +        }
> +    }
> +
> +    return ret;
> +}
> +
> +void vfio_unblock_multiple_devices_migration(void)
> +{
> +    if (vfio_migratable_device_num() <= 1 &&
> +        multiple_devices_migration_blocker) {
> +        migrate_del_blocker(multiple_devices_migration_blocker);
> +        error_free(multiple_devices_migration_blocker);
> +        multiple_devices_migration_blocker = NULL;
> +    }
> +}
> 
> Thanks.
> 



  reply	other threads:[~2023-02-08 16:45 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-02-06 12:31 [PATCH v9 00/14] vfio/migration: Implement VFIO migration protocol v2 Avihai Horon
2023-02-06 12:31 ` [PATCH v9 01/14] linux-headers: Update to v6.2-rc1 Avihai Horon
2023-02-06 12:31 ` [PATCH v9 02/14] migration: No save_live_pending() method uses the QEMUFile parameter Avihai Horon
2023-02-06 12:31 ` [PATCH v9 03/14] vfio/migration: Fix NULL pointer dereference bug Avihai Horon
2023-02-06 12:31 ` [PATCH v9 04/14] vfio/migration: Allow migration without VFIO IOMMU dirty tracking support Avihai Horon
2023-02-06 12:31 ` [PATCH v9 05/14] migration/qemu-file: Add qemu_file_get_to_fd() Avihai Horon
2023-02-06 12:31 ` [PATCH v9 06/14] vfio/common: Change vfio_devices_all_running_and_saving() logic to equivalent one Avihai Horon
2023-02-06 12:31 ` [PATCH v9 07/14] vfio/migration: Block multiple devices migration Avihai Horon
2023-02-07 22:34   ` Alex Williamson
2023-02-08 13:08     ` Avihai Horon
2023-02-08 16:44       ` Cédric Le Goater [this message]
2023-02-08 17:16         ` Avihai Horon
2023-02-08 17:22       ` Alex Williamson
2023-02-08 17:35         ` Avihai Horon
2023-02-06 12:31 ` [PATCH v9 08/14] vfio/migration: Move migration v1 logic to vfio_migration_init() Avihai Horon
2023-02-06 12:31 ` [PATCH v9 09/14] vfio/migration: Rename functions/structs related to v1 protocol Avihai Horon
2023-02-06 12:31 ` [PATCH v9 10/14] vfio/migration: Implement VFIO migration protocol v2 Avihai Horon
2023-02-07 23:42   ` Alex Williamson
2023-02-08 13:15     ` Avihai Horon
2023-02-06 12:31 ` [PATCH v9 11/14] vfio/migration: Optimize vfio_save_pending() Avihai Horon
2023-02-06 12:31 ` [PATCH v9 12/14] vfio/migration: Remove VFIO migration protocol v1 Avihai Horon
2023-02-06 12:31 ` [PATCH v9 13/14] vfio: Alphabetize migration section of VFIO trace-events file Avihai Horon
2023-02-06 12:31 ` [PATCH v9 14/14] docs/devel: Align VFIO migration docs to v2 protocol Avihai Horon
2023-02-07 23:49   ` Alex Williamson
2023-02-08 13:18     ` Avihai Horon
2023-02-08 17:25   ` Cédric Le Goater
2023-02-08 17:40     ` Avihai Horon

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=fcb9bda8-6d95-6109-ae5b-beeb9aa63af2@redhat.com \
    --to=clg@redhat.com \
    --cc=alex.williamson@redhat.com \
    --cc=avihaih@nvidia.com \
    --cc=borntraeger@linux.ibm.com \
    --cc=cohuck@redhat.com \
    --cc=david@redhat.com \
    --cc=dgilbert@redhat.com \
    --cc=eblake@redhat.com \
    --cc=fam@euphon.net \
    --cc=farman@linux.ibm.com \
    --cc=iii@linux.ibm.com \
    --cc=jgg@nvidia.com \
    --cc=joao.m.martins@oracle.com \
    --cc=jsnow@redhat.com \
    --cc=kwankhede@nvidia.com \
    --cc=maorg@nvidia.com \
    --cc=mst@redhat.com \
    --cc=pasic@linux.ibm.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=qemu-s390x@nongnu.org \
    --cc=quintela@redhat.com \
    --cc=richard.henderson@linaro.org \
    --cc=stefanha@redhat.com \
    --cc=targupta@nvidia.com \
    --cc=thuth@redhat.com \
    --cc=vsementsov@yandex-team.ru \
    --cc=yishaih@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).