From: Kirti Wankhede <kwankhede@nvidia.com>
To: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: alex.williamson@redhat.com, cjia@nvidia.com,
qemu-devel@nongnu.org, kvm@vger.kernel.org, quintela@redhat.com
Subject: Re: [RFC PATCH v1 1/4] VFIO KABI for migration interface
Date: Thu, 18 Oct 2018 02:47:22 +0530 [thread overview]
Message-ID: <2abff2cd-6bdc-69d9-28b3-72baab47bf81@nvidia.com> (raw)
In-Reply-To: <20181017100952.GA2531@work-vm>
On 10/17/2018 3:39 PM, Dr. David Alan Gilbert wrote:
> * Kirti Wankhede (kwankhede@nvidia.com) wrote:
>> - Added vfio_device_migration_info structure to use interact with vendor
>> driver.
>> - Different flags are used to get or set migration related information
>> from/to vendor driver.
>> Flag VFIO_MIGRATION_PROBE: To query if feature is supported
>> Flag VFIO_MIGRATION_GET_REGION: To get migration region info
>> Flag VFIO_MIGRATION_SET_STATE: To convey device state in vendor driver
>> Flag VFIO_MIGRATION_GET_PENDING: To get pending bytes yet to be migrated
>> from vendor driver
>> Flag VFIO_MIGRATION_GET_BUFFER: On this flag, vendor driver should write
>> data to migration region and return number of bytes written in the region
>> Flag VFIO_MIGRATION_SET_BUFFER: In migration resume path, user space app
>> writes to migration region and communicates it to vendor driver with
>> this ioctl with this flag.
>> Flag VFIO_MIGRATION_GET_DIRTY_PFNS: Get bitmap of dirty pages from vendor
>> driver from given start address
>>
>> - Added enum for possible device states.
>>
>> Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
>> Reviewed-by: Neo Jia <cjia@nvidia.com>
>> ---
>> linux-headers/linux/vfio.h | 91 ++++++++++++++++++++++++++++++++++++++++++++++
>> 1 file changed, 91 insertions(+)
>>
>> diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
>> index 3615a269d378..8e9045ed9aa8 100644
>> --- a/linux-headers/linux/vfio.h
>> +++ b/linux-headers/linux/vfio.h
>> @@ -602,6 +602,97 @@ struct vfio_device_ioeventfd {
>>
>> #define VFIO_DEVICE_IOEVENTFD _IO(VFIO_TYPE, VFIO_BASE + 16)
>>
>> +/**
>> + * VFIO_DEVICE_MIGRATION_INFO - _IOW(VFIO_TYPE, VFIO_BASE + 17,
>> + * struct vfio_device_migration_info)
>
> This is a little odd; what you really have here is 7 different
> operations that can be performed; they're independent operations all
> relating to part of migration; so instead of a 'flag' it's more of an
> operation type, and there's no reason to make them individual bit flags
> - for edxample there's no reason you'd want to OR together
> MIGRATION_GET_REGION and MIGRATION_GET_PENDING - they're just
> independent.
> (Similarly you could just make them 7 ioctls)
>
Right, all flags are independent commands. But I tried to use one ioctl
number.
> I guess what you're trying to do here is a direct 1-1 mapping of qemu's
> struct SaveVMHandlers interface to devices.
> That's not necessarily a bad idea, but remember it's not a stable
> interface, so QEMU will change it over time and you'll have to then
> figure out how to shim these interfaces to it, so it's worth keeping
> that in mind, just in case you can make these interfaces more general.
>
Alex suggested using sparse region instead of ioctl, I'm making note of
your point to define structures when implementing this interface using
sparse mmap region.
>> + * Flag VFIO_MIGRATION_PROBE:
>> + * To query if feature is supported
>> + *
>> + * Flag VFIO_MIGRATION_GET_REGION:
>> + * To get migration region info
>> + * region_index [output] : region index to be used for migration region
>> + * size [output]: size of migration region
>> + *
>> + * Flag VFIO_MIGRATION_SET_STATE:
>> + * To set device state in vendor driver
>> + * device_state [input] : User space app sends device state to vendor
>> + * driver on state change
>> + *
>> + * Flag VFIO_MIGRATION_GET_PENDING:
>> + * To get pending bytes yet to be migrated from vendor driver
>> + * threshold_size [Input] : threshold of buffer in User space app.
>> + * pending_precopy_only [output] : pending data which must be migrated in
>> + * precopy phase or in stopped state, in other words - before target
>> + * vm start
>> + * pending_compatible [output] : pending data which may be migrated in any
>> + * phase
>> + * pending_postcopy_only [output] : pending data which must be migrated in
>> + * postcopy phase or in stopped state, in other words - after source
>> + * vm stop
>> + * Sum of pending_precopy_only, pending_compatible and
>> + * pending_postcopy_only is the whole amount of pending data.
>> + *
>> + * Flag VFIO_MIGRATION_GET_BUFFER:
>> + * On this flag, vendor driver should write data to migration region and
>> + * return number of bytes written in the region.
>> + * bytes_written [output] : number of bytes written in migration buffer by
>> + * vendor driver
>> + *
>> + * Flag VFIO_MIGRATION_SET_BUFFER
>> + * In migration resume path, user space app writes to migration region and
>> + * communicates it to vendor driver with this ioctl with this flag.
>> + * bytes_written [Input] : number of bytes written in migration buffer by
>> + * user space app.
>
> I guess we call getbuffer/setbuffer multiple times. Is there anything
> that needs to be passed to get the data to line up (e.g. offset into the
> data)
>
I think, vendor driver can keep track of offset as it knows what data
copied and what is remaining.
>> + * Flag VFIO_MIGRATION_GET_DIRTY_PFNS
>> + * Get bitmap of dirty pages from vendor driver from given start address.
>> + * start_addr [Input] : start address
>> + * pfn_count [Input] : Total pfn count from start_addr for which dirty
>> + * bitmap is requested
>> + * dirty_bitmap [Output] : bitmap memory allocated by user space
>> + * application, vendor driver should return the bitmap with bits set
>> + * only for pages to be marked dirty.
>> + * Return: 0 on success, -errno on failure.
>> + */
>> +
>> +struct vfio_device_migration_info {
>> + __u32 argsz;
>> + __u32 flags;
>> +#define VFIO_MIGRATION_PROBE (1 << 0)
>> +#define VFIO_MIGRATION_GET_REGION (1 << 1)
>> +#define VFIO_MIGRATION_SET_STATE (1 << 2)
>> +#define VFIO_MIGRATION_GET_PENDING (1 << 3)
>> +#define VFIO_MIGRATION_GET_BUFFER (1 << 4)
>> +#define VFIO_MIGRATION_SET_BUFFER (1 << 5)
>> +#define VFIO_MIGRATION_GET_DIRTY_PFNS (1 << 6)
>> + __u32 region_index; /* region index */
>> + __u64 size; /* size */
>> + __u32 device_state; /* VFIO device state */
>> + __u64 pending_precopy_only;
>> + __u64 pending_compatible;
>> + __u64 pending_postcopy_only;
>> + __u64 threshold_size;
>> + __u64 bytes_written;
>> + __u64 start_addr;
>> + __u64 pfn_count;
>> + __u8 dirty_bitmap[];
>> +};
>
> This feels like it should be separate types for the different calls.
>
> Also, is there no state to tell the userspace what version of migration
> data we have? What happens if I try and load a migration state
> from an older driver into a newer one, or the other way around,
> or into a destination card that's not the same as the source.
>
As I mentioned in other reply, this RFC is mainly focused on core logic
of live migration which include implementation for pre-copy, stop-n-copy
and resume phases, defining device states.
>> +enum {
>> + VFIO_DEVICE_STATE_NONE,
>> + VFIO_DEVICE_STATE_RUNNING,
>> + VFIO_DEVICE_STATE_MIGRATION_SETUP,
>> + VFIO_DEVICE_STATE_MIGRATION_PRECOPY_ACTIVE,
>> + VFIO_DEVICE_STATE_MIGRATION_STOPNCOPY_ACTIVE,
>> + VFIO_DEVICE_STATE_MIGRATION_SAVE_COMPLETED,
>> + VFIO_DEVICE_STATE_MIGRATION_RESUME,
>> + VFIO_DEVICE_STATE_MIGRATION_RESUME_COMPLETED,
>> + VFIO_DEVICE_STATE_MIGRATION_FAILED,
>> + VFIO_DEVICE_STATE_MIGRATION_CANCELLED,
>> +};
>
> That's an interesting merge of QEMU's run-state and it's migration
> state.
>
> You probably need to define which transitions you take as legal.
>
In reply to Alex, I had listed down allowable state transitions.
Thanks,
Kirti
> Dave
>
>> +
>> +#define VFIO_DEVICE_MIGRATION_INFO _IO(VFIO_TYPE, VFIO_BASE + 17)
>> +
>> /* -------- API for Type1 VFIO IOMMU -------- */
>>
>> /**
>> --
>> 2.7.0
>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
next prev parent reply other threads:[~2018-10-17 21:17 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-10-16 18:12 [RFC PATCH v1 0/4] Add migration support for VFIO device Kirti Wankhede
2018-10-16 18:12 ` [RFC PATCH v1 1/4] VFIO KABI for migration interface Kirti Wankhede
2018-10-16 22:34 ` Alex Williamson
2018-10-17 20:47 ` Kirti Wankhede
2018-10-17 23:14 ` Alex Williamson
2018-10-18 1:49 ` Gonglei (Arei)
2018-10-17 9:06 ` Christoph Hellwig
2018-10-17 10:09 ` Dr. David Alan Gilbert
2018-10-17 21:17 ` Kirti Wankhede [this message]
2018-10-18 9:23 ` Dr. David Alan Gilbert
2018-10-16 18:12 ` [RFC PATCH v1 2/4] Add migration functions for VFIO devices Kirti Wankhede
2018-10-17 9:00 ` Cornelia Huck
2018-10-17 21:03 ` Kirti Wankhede
2018-10-16 18:12 ` [RFC PATCH v1 3/4] Add vfio_listerner_log_sync to mark dirty pages Kirti Wankhede
2018-10-16 18:12 ` [RFC PATCH v1 4/4] Make vfio-pci device migration capable Kirti Wankhede
2018-10-17 8:49 ` [RFC PATCH v1 0/4] Add migration support for VFIO device Cornelia Huck
2018-10-17 20:59 ` Kirti Wankhede
2018-10-18 2:41 ` Tian, Kevin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=2abff2cd-6bdc-69d9-28b3-72baab47bf81@nvidia.com \
--to=kwankhede@nvidia.com \
--cc=alex.williamson@redhat.com \
--cc=cjia@nvidia.com \
--cc=dgilbert@redhat.com \
--cc=kvm@vger.kernel.org \
--cc=qemu-devel@nongnu.org \
--cc=quintela@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox