From: "Michael S. Tsirkin" <mst@redhat.com>
To: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Cc: "Maxime Coquelin" <maxime.coquelin@redhat.com>,
dev@dpdk.org, "Stephen Hemminger" <stephen@networkplumber.org>,
qemu-devel@nongnu.org, libvir-list@redhat.com,
vpp-dev@lists.fd.io,
"Marc-André Lureau" <marcandre.lureau@redhat.com>
Subject: Re: dpdk/vpp and cross-version migration for vhost
Date: Tue, 22 Nov 2016 16:53:05 +0200
Message-ID: <20161122164143-mutt-send-email-mst@kernel.org>
In-Reply-To: <20161122130223.GW5048@yliu-dev.sh.intel.com>
On Tue, Nov 22, 2016 at 09:02:23PM +0800, Yuanhan Liu wrote:
> On Thu, Nov 17, 2016 at 07:37:09PM +0200, Michael S. Tsirkin wrote:
> > On Thu, Nov 17, 2016 at 05:49:36PM +0800, Yuanhan Liu wrote:
> > > On Thu, Nov 17, 2016 at 09:47:09AM +0100, Maxime Coquelin wrote:
> > > >
> > > >
> > > > On 11/17/2016 09:29 AM, Yuanhan Liu wrote:
> > > > >As usual, sorry for the late response :/
> > > > >
> > > > >On Thu, Oct 13, 2016 at 08:50:52PM +0300, Michael S. Tsirkin wrote:
> > > > >>Hi!
> > > > >>So it looks like we face a problem with cross-version
> > > > >>migration when using vhost. It's not new but became more
> > > > >>acute with the advent of vhost user.
> > > > >>
> > > > >>For users to be able to migrate between different versions
> > > > >>of the hypervisor the interface exposed to guests
> > > > >>by hypervisor must stay unchanged.
> > > > >>
> > > > >>The problem is that a qemu device is connected
> > > > >>to a backend in another process, so the interface
> > > > >>exposed to guests depends on the capabilities of that
> > > > >>process.
> > > > >>
> > > > >>Specifically, for vhost user interface based on virtio, this includes
> > > > >>the "host features" bitmap that defines the interface, as well as more
> > > > >>host values such as the max ring size. Adding new features/changing
> > > > >>values to this interface is required to make progress, but on the other
> > > > >>hand we need ability to get the old host features to be compatible.
> > > > >
> > > > >It looks like the same issue as vhost-user reconnect to me. For example,
> > > > >
> > > > >- start dpdk 16.07 & qemu 2.5
> > > > >- kill dpdk
> > > > >- start dpdk 16.11
> > > > >
> > > > >Though DPDK 16.11 has more features than DPDK 16.07 (say, indirect),
> > > > >the above should work, because qemu saves the negotiated features before
> > > > >the disconnect and restores them after the reconnection.
> > > > >
> > > > > commit a463215b087c41d7ca94e51aa347cde523831873
> > > > > Author: Marc-André Lureau <marcandre.lureau@redhat.com>
> > > > > Date: Mon Jun 6 18:45:05 2016 +0200
> > > > >
> > > > > vhost-net: save & restore vhost-user acked features
> > > > >
> > > > > The initial vhost-user connection sets the features to be negotiated
> > > > > with the driver. Renegotiation isn't possible without device reset.
> > > > >
> > > > > To handle reconnection of vhost-user backend, ensure the same set of
> > > > > features are provided, and reuse already acked features.
> > > > >
> > > > > Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
> > > > >
> > > > >
> > > > >So could we do something similar to vhost-user reconnect? I mean, save
> > > > >the acked features before migration and restore them after it. This
> > > > >should be able to keep compatibility. If the user downgrades the DPDK
> > > > >version, that could also be easily detected, and we then exit with an
> > > > >error to the user: migration failed due to incompatible vhost features.
> > > > >
> > > > >Just some rough thoughts. Makes tiny sense?
> > > >
> > > > My understanding is that the management tool has to know whether
> > > > versions are compatible before initiating the migration:
> > >
> > > Makes sense. How about getting and restoring the acked features through
> > > qemu command lines then, say, through the monitor interface?
> > >
> > > With that, it would be something like:
> > >
> > > - start vhost-user backend (DPDK, VPP, or whatever) & qemu in the src host
> > >
> > > - read the acked features (through monitor interface)
> > >
> > > - start vhost-user backend in the dst host
> > >
> > > - start qemu in the dst host with the just queried acked features
> > >
> > > > QEMU then is expected to use this feature set for the later vhost-user
> > > > feature negotiation. Exit if feature compatibility is broken.
> > >
> > > Thoughts?
> > >
> > > --yliu
> >
> >
> > You keep assuming that you have the VM started first and
> > figure out things afterwards, but this does not work.
> >
> > Think about a cluster of machines. You want to start a VM in
> > a way that will ensure compatibility with all hosts
> > in a cluster.
>
> I see. I was thinking more about the case where the dst
> host (including the qemu and dpdk combo) is given, and
> we then determine whether the migration will be successful
> or not.
>
> And you are saying that we need to know which host could
> be a good candidate before starting the migration. In that
> case, we indeed need some input from both qemu and the
> vhost-user backend.
>
> For DPDK, I think it could be simple: just as you said, it
> could be either a tiny script, or even a macro defined in
> a source code file (we extend it every time we add a
> new feature) for libvirt to read. Or something
> else.
There's the issue of APIs that tweak features, as Maxime
suggested. Maybe the only thing to do is to deprecate them,
but I feel some way for an application to pass info into
the guest might be beneficial.
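
To make the feature-bit part of this concrete, here is a minimal
sketch of the check a management layer could run: every feature the
guest has acked must still be offered by the destination backend.
This is illustrative Python, not QEMU, DPDK or libvirt code. The bit
positions are the ones defined by the virtio spec; the function names
and the two backend feature sets are made up for the example.

```python
# Illustrative only: not QEMU, DPDK or libvirt code.
# Feature bit positions as defined by the virtio spec.
VIRTIO_NET_F_MTU = 3
VIRTIO_F_ANY_LAYOUT = 27
VIRTIO_RING_F_INDIRECT_DESC = 28


def feature_mask(*bits):
    """Build a feature bitmap from bit positions."""
    mask = 0
    for bit in bits:
        mask |= 1 << bit
    return mask


def missing_features(acked, offered):
    """Bits the guest has acked that the destination does not offer."""
    return acked & ~offered


def can_migrate(acked, offered):
    """The guest interface is preserved only if nothing acked is missing."""
    return missing_features(acked, offered) == 0


# Hypothetical feature sets for an older and a newer backend.
old_backend = feature_mask(VIRTIO_F_ANY_LAYOUT)
new_backend = feature_mask(VIRTIO_F_ANY_LAYOUT,
                           VIRTIO_RING_F_INDIRECT_DESC,
                           VIRTIO_NET_F_MTU)

# A guest started on the newer backend that acked indirect descriptors.
acked = feature_mask(VIRTIO_F_ANY_LAYOUT, VIRTIO_RING_F_INDIRECT_DESC)

print(can_migrate(acked, new_backend))            # newer -> newer
print(can_migrate(acked, old_backend))            # newer -> older
print(hex(missing_features(acked, old_backend)))  # which bits block it
```

A check like this only covers the feature bitmap. It also only works
after the guest has acked something, so by itself it does not tell
management how to start a VM that can run on every host in a cluster.
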
> > If you don't, guest visible interface will change
> > and you won't be able to migrate.
> >
> > It does not make sense to discuss feature bits specifically
> > since that is not the only part of interface.
> > For example, max ring size supported might change.
>
> I don't quite understand why we have to consider the max ring
> size here. Isn't it a virtio device attribute, for which QEMU could
> provide such compatibility information?
>
> I mean, DPDK is supposed to support various vring sizes; it's up to
> QEMU to give a specific value.
If the backend supports s/g lists of any size up to 2^16, there's no
issue. ATM some backends might be assuming s/g lists of up to 1K
entries, since QEMU never supported bigger ones. We might classify
this as a bug, or we might not, and add a feature flag instead.
But it's just an example. There might be more values at issue
in the future.
> > Let me describe how it works in qemu/libvirt.
> > When you install a VM, you can specify compatibility
> > level (aka "machine type"), and you can query the supported compatibility
> > levels. Management uses that to find the supported compatibility
> > and stores the compatibility in XML that is migrated with the VM.
> > There's also a way to find the latest level which is the
> > default unless overridden by user, again this level
> > is recorded and then
> > - management can make sure migration destination is compatible
> > - management can avoid migration to hosts without that support
>
> Thanks for the info, it helps.
>
> ...
> > > > >>As version here is an opaque string for libvirt and qemu,
> > > > >>anything can be used - but I suggest either a list
> > > > >>of values defining the interface, e.g.
> > > > >>any_layout=on,max_ring=256
> > > > >>or a version including the name and vendor of the backend,
> > > > >>e.g. "org.dpdk.v4.5.6".
>
> The version scheme may not be ideal here. Assume a QEMU is supposed
> to work with a specific DPDK version; however, the user may disable some
> newer features through the qemu command line, so that it could also work
> with an older DPDK version. Using the version scheme would not allow us
> to do such a migration to an older DPDK version. MTU is a good example
> here: when the MTU feature is provided by QEMU but is actually disabled
> by the user, it could also work with an older DPDK without MTU support.
>
> --yliu
OK, so does a list of values look better to you then?
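
For illustration, a list of values could be checked roughly like this.
This is a sketch in Python under some assumptions: the string format is
the "any_layout=on,max_ring=256" one suggested in the original mail, the
"mtu" key and both helper names are hypothetical, an on/off value means
"feature exposed to the guest or not", and a numeric value on the host
side means "the largest value this host can provide".

```python
# Illustrative only: the key names and semantics are assumptions.

def parse_interface(spec):
    """Parse 'key=value,...' into a dict; on/off -> bool, numbers -> int."""
    result = {}
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition("=")
        if value in ("on", "off"):
            result[key] = (value == "on")
        else:
            result[key] = int(value)
    return result


def host_supports(host_caps, wanted):
    """True if a host with host_caps can expose the wanted interface."""
    for key, want in wanted.items():
        have = host_caps.get(key)
        if isinstance(want, bool):
            # A feature that is off for the guest needs no host support.
            if want and have is not True:
                return False
        else:
            # Numeric value: the host limit must be at least what we expose.
            if isinstance(have, bool) or not isinstance(have, int):
                return False
            if have < want:
                return False
    return True


# Interface recorded for the VM: MTU is disabled by the user.
vm = parse_interface("any_layout=on,max_ring=256,mtu=off")

# Hypothetical capabilities of a host with an older and a newer backend.
old_host = parse_interface("any_layout=on,max_ring=1024")
new_host = parse_interface("any_layout=on,max_ring=1024,mtu=on")

print(host_supports(old_host, vm))   # MTU is off, so the old host will do
print(host_supports(new_host, vm))
print(host_supports(old_host, parse_interface("any_layout=on,mtu=on")))
```

With a list like this, the MTU case above works out: a VM that has MTU
disabled can still go to a host whose backend has no MTU support, which
a single version string would not allow.
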
> > > > >>
> > > > >>Note that typically the list of supported versions can only be
> > > > >>extended, not shrunk. Also, if the host/guest interface
> > > > >>does not change, don't change the current version as
> > > > >>this just creates work for everyone.
> > > > >>
> > > > >>Thoughts? Would this work well for management? dpdk? vpp?
> > > > >>
> > > > >>Thanks!
> > > > >>
> > > > >>--
> > > > >>MST