From: Peter Xu <peterx@redhat.com>
To: Akihiko Odaki <akihiko.odaki@daynix.com>
Cc: "Jason Wang" <jasowang@redhat.com>,
"Daniel P. Berrangé" <berrange@redhat.com>,
"Thomas Huth" <thuth@redhat.com>,
"Michael S. Tsirkin" <mst@redhat.com>,
"Yuri Benditovich" <yuri.benditovich@daynix.com>,
eduardo@habkost.net, marcel.apfelbaum@gmail.com,
philmd@linaro.org, wangyanan55@huawei.com,
dmitry.fleytman@gmail.com, sriram.yagnaraman@est.tech,
sw@weilnetz.de, qemu-devel@nongnu.org, yan@daynix.com,
"Fabiano Rosas" <farosas@suse.de>,
devel@lists.libvirt.org
Subject: Re: [PATCH v2 4/4] virtio-net: Add support for USO features
Date: Mon, 29 Jul 2024 10:29:29 -0400 [thread overview]
Message-ID: <ZqenSQHzniN14g7G@x1n> (raw)
In-Reply-To: <f1d0621b-84f0-4c2c-b4f4-f8ebd494ec48@daynix.com>
On Mon, Jul 29, 2024 at 01:45:12PM +0900, Akihiko Odaki wrote:
> On 2024/07/29 12:50, Jason Wang wrote:
> > On Sun, Jul 28, 2024 at 11:19 PM Akihiko Odaki <akihiko.odaki@daynix.com> wrote:
> > >
> > > On 2024/07/27 5:47, Peter Xu wrote:
> > > > On Fri, Jul 26, 2024 at 04:17:12PM +0100, Daniel P. Berrangé wrote:
> > > > > On Fri, Jul 26, 2024 at 10:43:42AM -0400, Peter Xu wrote:
> > > > > > On Fri, Jul 26, 2024 at 09:48:02AM +0100, Daniel P. Berrangé wrote:
> > > > > > > On Fri, Jul 26, 2024 at 09:03:24AM +0200, Thomas Huth wrote:
> > > > > > > > On 26/07/2024 08.08, Michael S. Tsirkin wrote:
> > > > > > > > > On Thu, Jul 25, 2024 at 06:18:20PM -0400, Peter Xu wrote:
> > > > > > > > > > On Tue, Aug 01, 2023 at 01:31:48AM +0300, Yuri Benditovich wrote:
> > > > > > > > > > > USO features of virtio-net device depend on kernel ability
> > > > > > > > > > > to support them, for backward compatibility by default the
> > > > > > > > > > > features are disabled on 8.0 and earlier.
> > > > > > > > > > >
> > > > > > > > > > > Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com>
> > > > > > > > > > > Signed-off-by: Andrew Melnychecnko <andrew@daynix.com>
> > > > > > > > > >
> > > > > > > > > > Looks like this patch broke migration when the VM starts on a host that has
> > > > > > > > > > USO supported, to another host that doesn't..
> > > > > > > > >
> > > > > > > > > This was always the case with all offloads. The answer at the moment is,
> > > > > > > > > don't do this.
> > > > > > > >
> > > > > > > > May I ask for my understanding:
> > > > > > > > "don't do this" = don't automatically enable/disable virtio features in QEMU
> > > > > > > > depending on host kernel features, or "don't do this" = don't try to migrate
> > > > > > > > between machines that have different host kernel features?
> > > > > > > >
> > > > > > > > > Long term, we need to start exposing management APIs
> > > > > > > > > to discover this, and management has to disable unsupported features.
> > > > > > > >
> > > > > > > > Ack, this likely needs some treatments from the libvirt side, too.
> > > > > > >
> > > > > > > When QEMU automatically toggles machine type featuers based on host
> > > > > > > kernel, relying on libvirt to then disable them again is impractical,
> > > > > > > as we cannot assume that the libvirt people are using knows about
> > > > > > > newly introduced features. Even if libvirt is updated to know about
> > > > > > > it, people can easily be using a previous libvirt release.
> > > > > > >
> > > > > > > QEMU itself needs to make the machine types do that they are there
> > > > > > > todo, which is to define a stable machine ABI.
> > > > > > >
> > > > > > > What QEMU is missing here is a "platform ABI" concept, to encode
> > > > > > > sets of features which are tied to specific platform generations.
> > > > > > > As long as we don't have that we'll keep having these broken
> > > > > > > migration problems from machine types dynamically changing instead
> > > > > > > of providing a stable guest ABI.
> > > > > >
> > > > > > Any more elaboration on this idea? Would it be easily feasible in
> > > > > > implementation?
> > > > >
> > > > > In terms of launching QEMU I'd imagine:
> > > > >
> > > > > $QEMU -machine pc-q35-9.1 -platform linux-6.9 ...args...
> > > > >
> > > > > Any virtual machine HW features which are tied to host kernel features
> > > > > would have their defaults set based on the requested -platform. The
> > > > > -machine will be fully invariant wrt the host kernel.
> > > > >
> > > > > You would have -platform hlep to list available platforms, and
> > > > > corresonding QMP "query-platforms" command to list what platforms
> > > > > are supported on a given host OS.
> > > > >
> > > > > Downstream distros can provide their own platforms definitions
> > > > > (eg "linux-rhel-9.5") if they have kernels whose feature set
> > > > > diverges from upstream due to backports.
> > > > >
> > > > > Mgmt apps won't need to be taught about every single little QEMU
> > > > > setting whose default is derived from the kernel. Individual
> > > > > defaults are opaque and controlled by the requested platform.
> > > > >
> > > > > Live migration has clearly defined semantics, and mgmt app can
> > > > > use query-platforms to validate two hosts are compatible.
> > > > >
> > > > > Omitting -platform should pick the very latest platform that is
> > > > > cmpatible with the current host (not neccessarily the latest
> > > > > platform built-in to QEMU).
> > > >
> > > > This seems to add one more layer to maintain, and so far I don't know
> > > > whether it's a must.
> > > >
> > > > To put it simple, can we simply rely on qemu cmdline as "the guest ABI"? I
> > > > thought it was mostly the case already, except some extremely rare
> > > > outliers.
> > > >
> > > > When we have one host that boots up a VM using:
> > > >
> > > > $QEMU1 $cmdline
> > > >
> > > > Then another host boots up:
> > > >
> > > > $QEMU2 $cmdline -incoming XXX
> > > >
> > > > Then migration should succeed if $cmdline is exactly the same, and the VM
> > > > can boot up all fine without errors on both sides.
> > > >
> > > > AFAICT this has nothing to do with what kernel is underneath, even not
> > > > Linux? I think either QEMU1 / QEMU2 has the option to fail. But if it
> > > > didn't, I thought the ABI should be guaranteed.
> > > >
> > > > That's why I think this is a migration violation, as 99.99% of other device
> > > > properties should be following this rule. The issue here is, we have the
> > > > same virtio-net-pci cmdline on both sides in this case, but the ABI got
> > > > break.
> > > >
> > > > That's also why I was suggesting if the property contributes to the guest
> > > > ABI, then AFAIU QEMU needs to:
> > > >
> > > > - Firstly, never quietly flipping any bit that affects the ABI...
> > > >
> > > > - Have a default value of off, then QEMU will always allow the VM to boot
> > > > by default, while advanced users can opt-in on new features. We can't
> > > > make this ON by default otherwise some VMs can already fail to boot,
> > >
> > > It may not be necessary the case that old features are supported by
> > > every systems. In an extreme case, a user may migrate a VM from Linux to
> > > Windows, which probably doesn't support any offloading at all. A more
> > > convincing scenario is RSS offloading with eBPF; using eBPF requires a
> > > privilege so we cannot assume it is always available even on the latest
> > > version of Linux.
> >
> > I don't get why eBPF matters here. It is something that is not noticed
> > by the guest and we have a fallback anyhow.
> >
> > >
> > > >
> > > > - If the host doesn't support the feature while the cmdline enabled it,
> > > > it needs to fail QEMU boot rather than flipping, so that it says "hey,
> > > > this host does not support running such VM specified, due to XXX
> > > > feature missing".
> > >
> > > This is handled in:
> > >
> > > "virtio-net: Convert feature properties to OnOffAuto"
> > > https://patchew.org/QEMU/20240714-auto-v3-0-e27401aabab3@daynix.com/
> >
> > I may miss something but I think "Auto" doesn't make sense to libvirt.
>
> The point is libvirt can explicitly set "on" to avoid the "auto" behavior.
> libvirt does not have to use the "auto" value.
>
> libvirt can still use "auto" if desired. virDomainNetDefParseXMLDriver() in
> libvirt actually parses tristate values (libvirt uses "default" instead of
> "auto" as the mnemonic) for these features though "default" is currently
> disabled by the schema (src/conf/schemas/domaincommon.rng). Allowing user to
> specify "default" is only a matter of editing the schema. Of course
> specifying "default" will make the VM unsafe for migration.
Isn't keeping the default AUTO the same as before when it used to be ON? I
mean, AUTO in a qemu cmdline doesn't guarantee guest API either.
Indeed it looks like it's a step forward to make ON having the clear
semantics of "fail when unsupported". It's just that I am not sure how
useful is AUTO here, because anyway we'll need to break ON semantics even
with AUTO, so that an old QEMU script with USO=ON used to boot on old
kernels but not it won't.
What I was trying to say is whether we should make the default parameter to
be migratable. IOW, it looks to me AUTO should deserve a migration
blocker when chosen.
After all, Libvirt hopefully shouldn't use AUTO at all but only ON/OFF,
while any user when not caring much on these perf details should always use
OFF on any kernel dependent features that may affect the guest ABI.
Thanks,
--
Peter Xu
next prev parent reply other threads:[~2024-07-29 14:30 UTC|newest]
Thread overview: 107+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-07-31 22:31 [PATCH v2 0/4] virtio-net: add USO feature (UDP segmentation offload) Yuri Benditovich
2023-07-31 22:31 ` [PATCH v2 1/4] tap: Add USO support to tap device Yuri Benditovich
2023-07-31 22:31 ` [PATCH v2 2/4] tap: Add check for USO features Yuri Benditovich
2023-07-31 22:31 ` [PATCH v2 3/4] virtio-net: Add USO flags to vhost support Yuri Benditovich
2023-07-31 22:31 ` [PATCH v2 4/4] virtio-net: Add support for USO features Yuri Benditovich
2023-08-02 5:17 ` Akihiko Odaki
2024-07-25 22:18 ` Peter Xu
2024-07-26 2:12 ` Jason Wang
2024-07-26 15:01 ` Peter Xu
2024-07-26 6:08 ` Michael S. Tsirkin
2024-07-26 7:03 ` Thomas Huth
2024-07-26 7:25 ` Michael S. Tsirkin
2024-07-26 11:32 ` Peter Xu
2024-07-26 17:39 ` Thomas Huth
2024-07-26 20:55 ` Peter Xu
2024-08-01 5:41 ` Michael S. Tsirkin
2024-07-26 8:48 ` Daniel P. Berrangé
2024-07-26 14:43 ` Peter Xu
2024-07-26 15:17 ` Daniel P. Berrangé
2024-07-26 20:47 ` Peter Xu
2024-07-28 15:18 ` Akihiko Odaki
2024-07-29 3:50 ` Jason Wang
2024-07-29 4:45 ` Akihiko Odaki
2024-07-29 14:29 ` Peter Xu [this message]
2024-07-29 16:43 ` Akihiko Odaki
2024-07-30 2:04 ` Jason Wang
2024-07-30 2:57 ` Akihiko Odaki
2024-07-30 3:03 ` Jason Wang
2024-07-30 3:11 ` Akihiko Odaki
2024-07-30 3:17 ` Jason Wang
2024-07-30 3:28 ` Akihiko Odaki
2024-07-30 3:45 ` Jason Wang
2024-07-30 10:23 ` Akihiko Odaki
2024-07-30 11:52 ` Yuri Benditovich
2024-07-31 2:05 ` Jason Wang
2024-07-29 15:58 ` Daniel P. Berrangé
2024-07-29 17:00 ` Peter Xu
2024-07-29 17:23 ` Akihiko Odaki
2024-07-30 18:02 ` Peter Xu
2024-07-29 17:26 ` Daniel P. Berrangé
2024-07-30 18:13 ` Peter Xu
2024-07-30 18:46 ` Daniel P. Berrangé
2024-07-30 19:11 ` Peter Xu
2024-07-30 19:22 ` Michael S. Tsirkin
2024-07-30 20:03 ` Peter Xu
2024-07-30 21:32 ` Michael S. Tsirkin
2024-07-30 22:01 ` Peter Xu
2024-07-31 2:01 ` Jason Wang
2024-07-31 7:04 ` Daniel P. Berrangé
2024-07-31 7:41 ` Michael S. Tsirkin
2024-07-31 12:57 ` Peter Xu
2024-08-01 2:28 ` Jason Wang
2024-08-01 5:28 ` Akihiko Odaki
2024-08-01 5:34 ` Michael S. Tsirkin
2024-08-01 5:51 ` Michael S. Tsirkin
2024-08-01 15:36 ` Peter Xu
2024-08-01 15:39 ` Michael S. Tsirkin
2024-08-01 15:45 ` Daniel P. Berrangé
2024-08-01 15:50 ` Michael S. Tsirkin
2024-08-01 15:58 ` Daniel P. Berrangé
2024-08-01 5:05 ` Akihiko Odaki
2024-08-01 15:13 ` Peter Xu
2024-08-01 15:15 ` Michael S. Tsirkin
2024-08-01 15:25 ` Daniel P. Berrangé
2024-08-02 4:30 ` Akihiko Odaki
2024-08-02 13:21 ` Michael S. Tsirkin
2024-08-02 15:05 ` Peter Xu
2024-08-02 15:54 ` Akihiko Odaki
2024-08-02 16:26 ` Peter Xu
2024-08-02 16:40 ` Michael S. Tsirkin
2024-08-02 20:56 ` Peter Xu
2024-08-04 6:49 ` Akihiko Odaki
2024-08-04 13:08 ` Peter Xu
2024-08-04 13:41 ` Michael S. Tsirkin
2024-08-05 7:27 ` Akihiko Odaki
2024-08-06 20:41 ` Peter Xu
2024-08-08 11:43 ` Akihiko Odaki
2024-08-08 13:55 ` Peter Xu
2024-08-08 14:45 ` Michael S. Tsirkin
2024-08-09 10:28 ` Akihiko Odaki
2024-08-05 7:30 ` Michael S. Tsirkin
2024-08-05 7:53 ` Akihiko Odaki
2024-08-05 8:23 ` Michael S. Tsirkin
2024-08-05 9:37 ` Akihiko Odaki
2024-08-05 10:08 ` Michael S. Tsirkin
2024-08-06 7:35 ` Akihiko Odaki
2024-08-06 13:29 ` Michael S. Tsirkin
2024-08-08 10:52 ` Akihiko Odaki
2024-08-08 10:54 ` Michael S. Tsirkin
2024-08-08 11:03 ` Akihiko Odaki
2024-08-08 11:12 ` Michael S. Tsirkin
2024-08-08 11:32 ` Akihiko Odaki
2024-08-08 14:21 ` Peter Xu
2024-08-08 14:15 ` Peter Xu
2024-08-08 14:47 ` Michael S. Tsirkin
2024-08-08 15:25 ` Peter Xu
2024-08-09 12:50 ` Fabiano Rosas
2024-08-18 5:04 ` Akihiko Odaki
2024-08-18 7:03 ` Michael S. Tsirkin
2024-08-19 4:27 ` Akihiko Odaki
2024-08-11 7:35 ` Michael S. Tsirkin
2024-08-18 5:09 ` Akihiko Odaki
2024-07-29 17:02 ` Akihiko Odaki
2024-08-01 5:38 ` Michael S. Tsirkin
2024-07-29 3:52 ` Jason Wang
2023-08-09 20:21 ` [PATCH v2 0/4] virtio-net: add USO feature (UDP segmentation offload) Yuri Benditovich
2023-08-10 3:14 ` Jason Wang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZqenSQHzniN14g7G@x1n \
--to=peterx@redhat.com \
--cc=akihiko.odaki@daynix.com \
--cc=berrange@redhat.com \
--cc=devel@lists.libvirt.org \
--cc=dmitry.fleytman@gmail.com \
--cc=eduardo@habkost.net \
--cc=farosas@suse.de \
--cc=jasowang@redhat.com \
--cc=marcel.apfelbaum@gmail.com \
--cc=mst@redhat.com \
--cc=philmd@linaro.org \
--cc=qemu-devel@nongnu.org \
--cc=sriram.yagnaraman@est.tech \
--cc=sw@weilnetz.de \
--cc=thuth@redhat.com \
--cc=wangyanan55@huawei.com \
--cc=yan@daynix.com \
--cc=yuri.benditovich@daynix.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).