Re: [PATCH v2 4/4] virtio-net: Add support for USO features

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: Akihiko Odaki <akihiko.odaki@daynix.com>
To: Jason Wang <jasowang@redhat.com>
Cc: "Peter Xu" <peterx@redhat.com>,
	"Daniel P. Berrangé" <berrange@redhat.com>,
	"Thomas Huth" <thuth@redhat.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	"Yuri Benditovich" <yuri.benditovich@daynix.com>,
	eduardo@habkost.net, marcel.apfelbaum@gmail.com,
	philmd@linaro.org, wangyanan55@huawei.com,
	dmitry.fleytman@gmail.com, sriram.yagnaraman@est.tech,
	sw@weilnetz.de, qemu-devel@nongnu.org, yan@daynix.com,
	"Fabiano Rosas" <farosas@suse.de>,
	devel@lists.libvirt.org
Subject: Re: [PATCH v2 4/4] virtio-net: Add support for USO features
Date: Tue, 30 Jul 2024 12:11:51 +0900	[thread overview]
Message-ID: <bc2ef42e-9003-4a3e-aee5-e65d34f205bf@daynix.com> (raw)
In-Reply-To: <CACGkMEvgYTjQ5orDJMbmE8-Kwqko9dFPerTnxsRgs0EtnmKKTQ@mail.gmail.com>

On 2024/07/30 12:03, Jason Wang wrote:
> On Tue, Jul 30, 2024 at 10:57 AM Akihiko Odaki <akihiko.odaki@daynix.com> wrote:
>>
>> On 2024/07/30 11:04, Jason Wang wrote:
>>> On Tue, Jul 30, 2024 at 12:43 AM Akihiko Odaki <akihiko.odaki@daynix.com> wrote:
>>>>
>>>> On 2024/07/29 23:29, Peter Xu wrote:
>>>>> On Mon, Jul 29, 2024 at 01:45:12PM +0900, Akihiko Odaki wrote:
>>>>>> On 2024/07/29 12:50, Jason Wang wrote:
>>>>>>> On Sun, Jul 28, 2024 at 11:19 PM Akihiko Odaki <akihiko.odaki@daynix.com> wrote:
>>>>>>>>
>>>>>>>> On 2024/07/27 5:47, Peter Xu wrote:
>>>>>>>>> On Fri, Jul 26, 2024 at 04:17:12PM +0100, Daniel P. Berrangé wrote:
>>>>>>>>>> On Fri, Jul 26, 2024 at 10:43:42AM -0400, Peter Xu wrote:
>>>>>>>>>>> On Fri, Jul 26, 2024 at 09:48:02AM +0100, Daniel P. Berrangé wrote:
>>>>>>>>>>>> On Fri, Jul 26, 2024 at 09:03:24AM +0200, Thomas Huth wrote:
>>>>>>>>>>>>> On 26/07/2024 08.08, Michael S. Tsirkin wrote:
>>>>>>>>>>>>>> On Thu, Jul 25, 2024 at 06:18:20PM -0400, Peter Xu wrote:
>>>>>>>>>>>>>>> On Tue, Aug 01, 2023 at 01:31:48AM +0300, Yuri Benditovich wrote:
>>>>>>>>>>>>>>>> USO features of virtio-net device depend on kernel ability
>>>>>>>>>>>>>>>> to support them, for backward compatibility by default the
>>>>>>>>>>>>>>>> features are disabled on 8.0 and earlier.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com>
>>>>>>>>>>>>>>>> Signed-off-by: Andrew Melnychecnko <andrew@daynix.com>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Looks like this patch broke migration when the VM starts on a host that has
>>>>>>>>>>>>>>> USO supported, to another host that doesn't..
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This was always the case with all offloads. The answer at the moment is,
>>>>>>>>>>>>>> don't do this.
>>>>>>>>>>>>>
>>>>>>>>>>>>> May I ask for my understanding:
>>>>>>>>>>>>> "don't do this" = don't automatically enable/disable virtio features in QEMU
>>>>>>>>>>>>> depending on host kernel features, or "don't do this" = don't try to migrate
>>>>>>>>>>>>> between machines that have different host kernel features?
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Long term, we need to start exposing management APIs
>>>>>>>>>>>>>> to discover this, and management has to disable unsupported features.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Ack, this likely needs some treatments from the libvirt side, too.
>>>>>>>>>>>>
>>>>>>>>>>>> When QEMU automatically toggles machine type featuers based on host
>>>>>>>>>>>> kernel, relying on libvirt to then disable them again is impractical,
>>>>>>>>>>>> as we cannot assume that the libvirt people are using knows about
>>>>>>>>>>>> newly introduced features. Even if libvirt is updated to know about
>>>>>>>>>>>> it, people can easily be using a previous libvirt release.
>>>>>>>>>>>>
>>>>>>>>>>>> QEMU itself needs to make the machine types do that they are there
>>>>>>>>>>>> todo, which is to define a stable machine ABI.
>>>>>>>>>>>>
>>>>>>>>>>>> What QEMU is missing here is a "platform ABI" concept, to encode
>>>>>>>>>>>> sets of features which are tied to specific platform generations.
>>>>>>>>>>>> As long as we don't have that we'll keep having these broken
>>>>>>>>>>>> migration problems from machine types dynamically changing instead
>>>>>>>>>>>> of providing a stable guest ABI.
>>>>>>>>>>>
>>>>>>>>>>> Any more elaboration on this idea?  Would it be easily feasible in
>>>>>>>>>>> implementation?
>>>>>>>>>>
>>>>>>>>>> In terms of launching QEMU I'd imagine:
>>>>>>>>>>
>>>>>>>>>>        $QEMU -machine pc-q35-9.1 -platform linux-6.9 ...args...
>>>>>>>>>>
>>>>>>>>>> Any virtual machine HW features which are tied to host kernel features
>>>>>>>>>> would have their defaults set based on the requested -platform. The
>>>>>>>>>> -machine will be fully invariant wrt the host kernel.
>>>>>>>>>>
>>>>>>>>>> You would have -platform hlep to list available platforms, and
>>>>>>>>>> corresonding QMP "query-platforms" command to list what platforms
>>>>>>>>>> are supported on a given host OS.
>>>>>>>>>>
>>>>>>>>>> Downstream distros can provide their own platforms definitions
>>>>>>>>>> (eg "linux-rhel-9.5") if they have kernels whose feature set
>>>>>>>>>> diverges from upstream due to backports.
>>>>>>>>>>
>>>>>>>>>> Mgmt apps won't need to be taught about every single little QEMU
>>>>>>>>>> setting whose default is derived from the kernel. Individual
>>>>>>>>>> defaults are opaque and controlled by the requested platform.
>>>>>>>>>>
>>>>>>>>>> Live migration has clearly defined semantics, and mgmt app can
>>>>>>>>>> use query-platforms to validate two hosts are compatible.
>>>>>>>>>>
>>>>>>>>>> Omitting -platform should pick the very latest platform that is
>>>>>>>>>> cmpatible with the current host (not neccessarily the latest
>>>>>>>>>> platform built-in to QEMU).
>>>>>>>>>
>>>>>>>>> This seems to add one more layer to maintain, and so far I don't know
>>>>>>>>> whether it's a must.
>>>>>>>>>
>>>>>>>>> To put it simple, can we simply rely on qemu cmdline as "the guest ABI"?  I
>>>>>>>>> thought it was mostly the case already, except some extremely rare
>>>>>>>>> outliers.
>>>>>>>>>
>>>>>>>>> When we have one host that boots up a VM using:
>>>>>>>>>
>>>>>>>>>        $QEMU1 $cmdline
>>>>>>>>>
>>>>>>>>> Then another host boots up:
>>>>>>>>>
>>>>>>>>>        $QEMU2 $cmdline -incoming XXX
>>>>>>>>>
>>>>>>>>> Then migration should succeed if $cmdline is exactly the same, and the VM
>>>>>>>>> can boot up all fine without errors on both sides.
>>>>>>>>>
>>>>>>>>> AFAICT this has nothing to do with what kernel is underneath, even not
>>>>>>>>> Linux?  I think either QEMU1 / QEMU2 has the option to fail.  But if it
>>>>>>>>> didn't, I thought the ABI should be guaranteed.
>>>>>>>>>
>>>>>>>>> That's why I think this is a migration violation, as 99.99% of other device
>>>>>>>>> properties should be following this rule.  The issue here is, we have the
>>>>>>>>> same virtio-net-pci cmdline on both sides in this case, but the ABI got
>>>>>>>>> break.
>>>>>>>>>
>>>>>>>>> That's also why I was suggesting if the property contributes to the guest
>>>>>>>>> ABI, then AFAIU QEMU needs to:
>>>>>>>>>
>>>>>>>>>        - Firstly, never quietly flipping any bit that affects the ABI...
>>>>>>>>>
>>>>>>>>>        - Have a default value of off, then QEMU will always allow the VM to boot
>>>>>>>>>          by default, while advanced users can opt-in on new features.  We can't
>>>>>>>>>          make this ON by default otherwise some VMs can already fail to boot,
>>>>>>>>
>>>>>>>> It may not be necessary the case that old features are supported by
>>>>>>>> every systems. In an extreme case, a user may migrate a VM from Linux to
>>>>>>>> Windows, which probably doesn't support any offloading at all. A more
>>>>>>>> convincing scenario is RSS offloading with eBPF; using eBPF requires a
>>>>>>>> privilege so we cannot assume it is always available even on the latest
>>>>>>>> version of Linux.
>>>>>>>
>>>>>>> I don't get why eBPF matters here. It is something that is not noticed
>>>>>>> by the guest and we have a fallback anyhow.
>>>>
>>>> It is noticeable for the guest, and the fallback is not effective with
>>>> vhost.
>>>
>>> It's a bug then. Qemu can fallback to tuntap if it sees issues in vhost.
>>
>> We can certainly fallback to in-QEMU RSS by disabling vhost, but I would
>> not say lack of such fallback is a bug.
> 
> Such fallback is by design since the introduction of vhost.
> 
>> We don't provide in-QEMU
>> fallback for other offloads.
> 
> Yes but what I want to say is that eBPF RSS is different from those
> segmentation offloads. And technically, Qemu can do fallback for
> offloads (as RSC did).

Well, I couldn't find any code disabling vhost for the in-QEMU RSC 
implementation.

Looking at the code, I also found the case of vhost-vdpa. vhost can be 
simply disabled if it is backed by tuntap, but it is not the case for vDPA.

Regards,
Akihiko Odaki

next prev parent reply	other threads:[~2024-07-30  3:17 UTC|newest]

Thread overview: 107+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-07-31 22:31 [PATCH v2 0/4] virtio-net: add USO feature (UDP segmentation offload) Yuri Benditovich
2023-07-31 22:31 ` [PATCH v2 1/4] tap: Add USO support to tap device Yuri Benditovich
2023-07-31 22:31 ` [PATCH v2 2/4] tap: Add check for USO features Yuri Benditovich
2023-07-31 22:31 ` [PATCH v2 3/4] virtio-net: Add USO flags to vhost support Yuri Benditovich
2023-07-31 22:31 ` [PATCH v2 4/4] virtio-net: Add support for USO features Yuri Benditovich
2023-08-02  5:17   ` Akihiko Odaki
2024-07-25 22:18   ` Peter Xu
2024-07-26  2:12     ` Jason Wang
2024-07-26 15:01       ` Peter Xu
2024-07-26  6:08     ` Michael S. Tsirkin
2024-07-26  7:03       ` Thomas Huth
2024-07-26  7:25         ` Michael S. Tsirkin
2024-07-26 11:32           ` Peter Xu
2024-07-26 17:39           ` Thomas Huth
2024-07-26 20:55             ` Peter Xu
2024-08-01  5:41             ` Michael S. Tsirkin
2024-07-26  8:48         ` Daniel P. Berrangé
2024-07-26 14:43           ` Peter Xu
2024-07-26 15:17             ` Daniel P. Berrangé
2024-07-26 20:47               ` Peter Xu
2024-07-28 15:18                 ` Akihiko Odaki
2024-07-29  3:50                   ` Jason Wang
2024-07-29  4:45                     ` Akihiko Odaki
2024-07-29 14:29                       ` Peter Xu
2024-07-29 16:43                         ` Akihiko Odaki
2024-07-30  2:04                           ` Jason Wang
2024-07-30  2:57                             ` Akihiko Odaki
2024-07-30  3:03                               ` Jason Wang
2024-07-30  3:11                                 ` Akihiko Odaki [this message]
2024-07-30  3:17                                   ` Jason Wang
2024-07-30  3:28                                     ` Akihiko Odaki
2024-07-30  3:45                                       ` Jason Wang
2024-07-30 10:23                                         ` Akihiko Odaki
2024-07-30 11:52                                           ` Yuri Benditovich
2024-07-31  2:05                                           ` Jason Wang
2024-07-29 15:58                 ` Daniel P. Berrangé
2024-07-29 17:00                   ` Peter Xu
2024-07-29 17:23                     ` Akihiko Odaki
2024-07-30 18:02                       ` Peter Xu
2024-07-29 17:26                     ` Daniel P. Berrangé
2024-07-30 18:13                       ` Peter Xu
2024-07-30 18:46                         ` Daniel P. Berrangé
2024-07-30 19:11                           ` Peter Xu
2024-07-30 19:22                             ` Michael S. Tsirkin
2024-07-30 20:03                               ` Peter Xu
2024-07-30 21:32                                 ` Michael S. Tsirkin
2024-07-30 22:01                                   ` Peter Xu
2024-07-31  2:01                                   ` Jason Wang
2024-07-31  7:04                                   ` Daniel P. Berrangé
2024-07-31  7:41                                     ` Michael S. Tsirkin
2024-07-31 12:57                                       ` Peter Xu
2024-08-01  2:28                                         ` Jason Wang
2024-08-01  5:28                                           ` Akihiko Odaki
2024-08-01  5:34                                         ` Michael S. Tsirkin
2024-08-01  5:51                                         ` Michael S. Tsirkin
2024-08-01 15:36                                           ` Peter Xu
2024-08-01 15:39                                             ` Michael S. Tsirkin
2024-08-01 15:45                                           ` Daniel P. Berrangé
2024-08-01 15:50                                             ` Michael S. Tsirkin
2024-08-01 15:58                                               ` Daniel P. Berrangé
2024-08-01  5:05                             ` Akihiko Odaki
2024-08-01 15:13                               ` Peter Xu
2024-08-01 15:15                                 ` Michael S. Tsirkin
2024-08-01 15:25                                   ` Daniel P. Berrangé
2024-08-02  4:30                                 ` Akihiko Odaki
2024-08-02 13:21                                   ` Michael S. Tsirkin
2024-08-02 15:05                                   ` Peter Xu
2024-08-02 15:54                                     ` Akihiko Odaki
2024-08-02 16:26                                       ` Peter Xu
2024-08-02 16:40                                         ` Michael S. Tsirkin
2024-08-02 20:56                                           ` Peter Xu
2024-08-04  6:49                                         ` Akihiko Odaki
2024-08-04 13:08                                           ` Peter Xu
2024-08-04 13:41                                             ` Michael S. Tsirkin
2024-08-05  7:27                                             ` Akihiko Odaki
2024-08-06 20:41                                               ` Peter Xu
2024-08-08 11:43                                                 ` Akihiko Odaki
2024-08-08 13:55                                                   ` Peter Xu
2024-08-08 14:45                                                     ` Michael S. Tsirkin
2024-08-09 10:28                                                     ` Akihiko Odaki
2024-08-05  7:30                                           ` Michael S. Tsirkin
2024-08-05  7:53                                             ` Akihiko Odaki
2024-08-05  8:23                                               ` Michael S. Tsirkin
2024-08-05  9:37                                                 ` Akihiko Odaki
2024-08-05 10:08                                                   ` Michael S. Tsirkin
2024-08-06  7:35                                                     ` Akihiko Odaki
2024-08-06 13:29                                                       ` Michael S. Tsirkin
2024-08-08 10:52                                                         ` Akihiko Odaki
2024-08-08 10:54                                                           ` Michael S. Tsirkin
2024-08-08 11:03                                                             ` Akihiko Odaki
2024-08-08 11:12                                                               ` Michael S. Tsirkin
2024-08-08 11:32                                                                 ` Akihiko Odaki
2024-08-08 14:21                                                                   ` Peter Xu
2024-08-08 14:15                                                                 ` Peter Xu
2024-08-08 14:47                                                                   ` Michael S. Tsirkin
2024-08-08 15:25                                                                     ` Peter Xu
2024-08-09 12:50                                                                       ` Fabiano Rosas
2024-08-18  5:04                                                                         ` Akihiko Odaki
2024-08-18  7:03                                                                           ` Michael S. Tsirkin
2024-08-19  4:27                                                                             ` Akihiko Odaki
2024-08-11  7:35                                                                       ` Michael S. Tsirkin
2024-08-18  5:09                                                                       ` Akihiko Odaki
2024-07-29 17:02                   ` Akihiko Odaki
2024-08-01  5:38                     ` Michael S. Tsirkin
2024-07-29  3:52       ` Jason Wang
2023-08-09 20:21 ` [PATCH v2 0/4] virtio-net: add USO feature (UDP segmentation offload) Yuri Benditovich
2023-08-10  3:14   ` Jason Wang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=bc2ef42e-9003-4a3e-aee5-e65d34f205bf@daynix.com \
    --to=akihiko.odaki@daynix.com \
    --cc=berrange@redhat.com \
    --cc=devel@lists.libvirt.org \
    --cc=dmitry.fleytman@gmail.com \
    --cc=eduardo@habkost.net \
    --cc=farosas@suse.de \
    --cc=jasowang@redhat.com \
    --cc=marcel.apfelbaum@gmail.com \
    --cc=mst@redhat.com \
    --cc=peterx@redhat.com \
    --cc=philmd@linaro.org \
    --cc=qemu-devel@nongnu.org \
    --cc=sriram.yagnaraman@est.tech \
    --cc=sw@weilnetz.de \
    --cc=thuth@redhat.com \
    --cc=wangyanan55@huawei.com \
    --cc=yan@daynix.com \
    --cc=yuri.benditovich@daynix.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).