From: "Michael S. Tsirkin" <mst@redhat.com>
To: Parav Pandit <parav@nvidia.com>
Cc: Paolo Abeni <pabeni@redhat.com>,
"hengqi@linux.alibaba.com" <hengqi@linux.alibaba.com>,
"virtio-comment@lists.linux.dev" <virtio-comment@lists.linux.dev>,
"cohuck@redhat.com" <cohuck@redhat.com>,
"mvaralar@redhat.com" <mvaralar@redhat.com>,
Jason Wang <jasowang@redhat.com>,
Shahaf Shuler <shahafs@nvidia.com>,
Willem de Bruijn <willemb@google.com>,
Daniel Verkamp <dverkamp@chromium.org>
Subject: Re: [PATCH v1] virtio-net: Fix to avoid using reserved feature bits
Date: Tue, 6 May 2025 03:56:05 -0400 [thread overview]
Message-ID: <20250506035434-mutt-send-email-mst@kernel.org> (raw)
In-Reply-To: <PH8PR12MB7208BA2F9A48EEBF42B15AE0DC892@PH8PR12MB7208.namprd12.prod.outlook.com>
On Tue, May 06, 2025 at 06:15:41AM +0000, Parav Pandit wrote:
> > From: Michael S. Tsirkin <mst@redhat.com>
> > Sent: Thursday, May 1, 2025 7:13 PM
> > To: Parav Pandit <parav@nvidia.com>
> > Cc: Paolo Abeni <pabeni@redhat.com>; hengqi@linux.alibaba.com; virtio-
> > comment@lists.linux.dev; cohuck@redhat.com; mvaralar@redhat.com; Jason
> > Wang <jasowang@redhat.com>; Shahaf Shuler <shahafs@nvidia.com>;
> > Willem de Bruijn <willemb@google.com>; Daniel Verkamp
> > <dverkamp@chromium.org>
> > Subject: Re: [PATCH v1] virtio-net: Fix to avoid using reserved feature bits
> >
> > On Wed, Apr 30, 2025 at 10:54:12AM +0000, Parav Pandit wrote:
> > >
> > >
> > > > From: Paolo Abeni <pabeni@redhat.com>
> > > > Sent: Wednesday, April 30, 2025 3:42 PM
> > > >
> > > > On 4/30/25 6:44 AM, Parav Pandit wrote:
> > > > >> From: Michael S. Tsirkin <mst@redhat.com>
> > > > >> Sent: Wednesday, April 30, 2025 2:14 AM
> > > > >>
> > > > >> On Mon, Apr 28, 2025 at 10:39:59AM +0200, Paolo Abeni wrote:
> > > > >>> On 4/23/25 8:07 PM, Michael S. Tsirkin wrote:
> > > > >>>> On Wed, Apr 23, 2025 at 09:29:11AM -0700, Daniel Verkamp wrote:
> > > > >>>>> On Tue, Apr 22, 2025 at 10:46 PM Michael S. Tsirkin
> > > > >>>>> <mst@redhat.com>
> > > > >> wrote:
> > > > >>>>>> I'm afraid we'll have to bite the bullet.
> > > > >>>>>
> > > > >>>>> One other issue with bits > 63 is that the vhost-user protocol
> > > > >>>>> VHOST_USER_GET_FEATURES and VHOST_USER_SET_FEATURES
> > > > >> messages use
> > > > >>>>> u64 to represent the features, so vhost-user-net devices can't
> > > > >>>>> query or enable these features. vhost-user is outside the
> > > > >>>>> scope of the virtio spec, though, and I think it's reasonable
> > > > >>>>> to extend the protocol to enable high feature bits rather than
> > avoiding them forever.
> > > > >>>>
> > > > >>>> Yes you would use VHOST_USER_SET_PROTOCOL_FEATURES to make
> > > > >>>> VHOST_USER_GET_FEATURES return two u64s, or even a new
> > message
> > > > >> returning an array.
> > > > >>>
> > > > >>> I think that additionally the VIRTIO_NET_CTRL_GUEST_OFFLOADS_SET
> > > > >>> command will need some clarification, as in the current text
> > > > >>> looks a bit
> > > > >>> inconsistent:
> > > > >>>
> > > > >>> """
> > > > >>> // in Offloads State Configuration / Setting Offloads State:
> > > > >>>
> > > > >>> #define VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO 46
> > > > >>>
> > > > >>> // ...
> > > > >>>
> > > > >>> The class VIRTIO_NET_CTRL_GUEST_OFFLOADS has one command:
> > > > >>> VIRTIO_NET_CTRL_GUEST_OFFLOADS_SET applies the new offloads
> > > > >> configuration.
> > > > >>>
> > > > >>> le64 value passed as command data is a bitmask, bits set define
> > > > >>> offloads to be enabled, bits cleared - offloads to be disabled.
> > > > >>>
> > > > >>> There is a corresponding device feature for each offload. Upon
> > > > >>> feature negotiation corresponding offload gets enabled to
> > > > >>> preserve backward compatibility """
> > > > >>>
> > > > >>> The "corresponding device feature" has the same numerical value
> > > > >>> of the selected offloads, except for UDP tunnels related one
> > > > >>> (which are mapped to bits corresponding to reserved features).
> > > > >>>
> > > > >>> It's unclear to me which should be the better way to address
> > > > >>> this inconsistency.
> > > > >>>
> > > > >>> /P
> > > > >>
> > > > >>
> > > > >> Parav, what's your take here? Given your change broke
> > > > >> VIRTIO_NET_CTRL_GUEST_OFFLOADS_SET, do you want to revert it?
> > > > >
> > > > > I see two options.
> > > > > Opt_1:
> > > > > Open source Linux kernel driver and DPDK PMD has not used
> > > > > RSS_CONTEXT
> > > > yet.
> > > > > If Heng from Alibaba acks that they do not have any internal
> > > > implementation either, it may be safe to shift _all_ feature > 63 to
> > > > lower position.
> > > > > We can get Yuri's feedback, if at all windows driver has used RSS
> > context.
> > > > >
> > > > > And once for all we mark it that feature bits are limited to 0-63.
> > > > > There is enough infrastructure in place in virtio spec to not try
> > > > > to squeeze
> > > > things in feature bits.
> > > > > And these 4 bits are good example of it already, which could have
> > > > > been
> > > > negotiated/communicated at later phase of driver at runtime.
> > > > > Only bit required was a bit to expand vnet header size at early stage.
> > > > >
> > > > > Advantage: brings the good practice to adapt to the modern and
> > > > > efficient
> > > > driver->device interface.
> > > > > Risk: May break RSS_CONTEXT (risk looks low)
> > > > >
> > > > > Opt_2. VIRTIO_NET_CTRL_GUEST_OFFLOADS command text to be
> > updated
> > > > to
> > > > > indicate that,
> > > > >
> > > > > Below defines corresponds to respective feature bits 65 to 68.
> > > > > There is still
> > > > one to one mapping, its just position is different inside the class.
> > > > > This is clarification text to be added and sw can adjust for it.
> > > > >
> > > > > #define VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO 46 #define
> > > > > VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM 47 #define
> > > > > VIRTIO_NET_F_GUEST_USO4 54 #define VIRTIO_NET_F_GUEST_USO6 55
> > > > >
> > > > > Advantages of 2nd option are:
> > > > > a. featuring bits remain open upto 127.
> > > > > b. Does not break RSS_CONTEXT.
> > > > >
> > > > > Both options are practical to me.
> > > > > I prefer #1, if Heng acks it, but also ok for #2.
> > > >
> > > > I spent quite of bit of time trying to evaluate the scope of
> > > > features bit expansion (implied by the option 2 above).
> > > >
> > > > While strictly speaking I haven't hit yet a complete blocker,
> > > > implementation- wise it's going to be huge and error prone, as great
> > > > deal of both the kernel and the user-space/qemu infrastructure
> > > > hard-codes the
> > > > 64 bit limit.
> > > >
> > > > Even exposing the feature extensions only the the virtio-net device
> > > > (AFAICS it will "minimize" the code churn) a lot of code and devices
> > > > implementations are going to be impacted.
> > > >
> > > > I expect a far away in time timeline for implementations based on option
> > 2.
> > > >
> > > > /P
> > > Given that its painful enough, we can update the virtio spec to limit to 64-bit
> > features as option_1.
> > >
> > > A virtio spec diff like below,
> > > - 0 to 23, and 50 to 127 Feature bits for the specific device type
> > > + 0 to 23, and 50 to 63 Feature bits for the specific device type
> > >
> > > Otherwise, we are delaying the problem to next feature, which is not good.
> > >
> > > Heng,
> > > Yuri already acked, so we are good from Windows side.
> > > With that Windows, Linux kernel, DPDK PMD are safe.
> > >
> > > Can you also please confirm that there are no users within Alibaba relying
> > on RSS_CONTEXT, so that shifting the feature bit will not break any existing
> > functionality?
> > > I think it will not break, because we can update the non-open-source
> > software to detect RSS_CONTEXT on a new bit.
> > >
> > > For example, driver can do,
> > > Enable_rss_context if bit X OR Bit 64 is set.
> >
> >
> > sorry if I am unclear, I think it's too late to move RSS really. It has been out
> > there for two years. The reason we can be reasonably sure no one
> > implemented the offloads so far is because the set offloads command is
> > broken and no one complained. We need to fix them in the spec, current one
> > is broken. So the tunnel offloads can move if it's convenient.
>
> There are few proposals on table.
>
> 1. From Paolo,
> - 0 to 23, and 50 to 127 Feature bits for the specific device type
> + 0 to 23, and 46 to 127 Feature bits for the specific device type
>
> This does not have good reason of why it should still be 127.
>
> 2. From me:
> - 0 to 23, and 50 to 127 Feature bits for the specific device type
> + 0 to 23, and 45 to 64 Feature bits for the specific device type
>
> This is an extension of Paolo, to justify that implementing feature bits is extremely hard even for experts as pointed by Paolo.
> It is worth to not extend it further.
> RSS can be negotiated via new bit 44 in future bit as OR of 44 and 64 so that more wider users (Linux, freebsd, qnx, Windows, dpdk pmd) can pick 44.
>
> 3. From me:
> Keep the feature bits encoding as is up to 127 bits, because may be there is (unknown and weird) value in having 127 feature bits.
> (unknown because the reasoning of #1 and #3 mismatch).
> In that case,
> VIRTIO_NET_CTRL_GUEST_OFFLOADS command text to be updated to indicate feature fits A to D map to
> VIRTIO_NET_F_CTRL_GUEST_OFFLOADS.offloads bits A' to D'.
>
> I am fine with option #2 and #3.
> Doing #1 for sure is wrong.
> Wrong because it delays the problem of #1 from this to another feature [A] who's voting already completed.
>
> [A] https://lore.kernel.org/virtio-comment/DM4PR18MB4269F73B786E83EF68A70F37DFB82@DM4PR18MB4269.namprd18.prod.outlook.com/T/#t
#3 seems more conservate.
--
MST
next prev parent reply other threads:[~2025-05-06 7:56 UTC|newest]
Thread overview: 36+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-01-26 6:20 [PATCH v1] virtio-net: Fix to avoid using reserved feature bits Parav Pandit
2025-01-26 9:19 ` Michael S. Tsirkin
2025-01-26 16:44 ` Parav Pandit
2025-01-26 16:50 ` Michael S. Tsirkin
2025-01-27 9:21 ` Cornelia Huck
2025-01-27 12:54 ` Parav Pandit
2025-04-22 17:49 ` Paolo Abeni
2025-04-23 5:46 ` Michael S. Tsirkin
2025-04-23 16:05 ` Paolo Abeni
2025-04-28 9:13 ` Michael S. Tsirkin
2025-04-28 17:07 ` Paolo Abeni
2025-04-28 17:18 ` Michael S. Tsirkin
2025-04-23 16:29 ` Daniel Verkamp
2025-04-23 18:07 ` Michael S. Tsirkin
2025-04-28 8:39 ` Paolo Abeni
2025-04-28 8:47 ` Michael S. Tsirkin
2025-04-29 20:43 ` Michael S. Tsirkin
2025-04-30 4:44 ` Parav Pandit
2025-04-30 5:25 ` Yuri Benditovich
2025-04-30 5:44 ` Parav Pandit
2025-04-30 10:12 ` Paolo Abeni
2025-04-30 10:54 ` Parav Pandit
2025-05-01 13:42 ` Michael S. Tsirkin
2025-05-01 15:57 ` Paolo Abeni
2025-05-06 6:15 ` Parav Pandit
2025-05-06 7:56 ` Michael S. Tsirkin [this message]
2025-05-06 8:56 ` Parav Pandit
2025-05-06 14:38 ` Paolo Abeni
2025-05-06 15:00 ` Parav Pandit
2025-05-06 15:40 ` Paolo Abeni
2025-05-06 16:20 ` Parav Pandit
2025-05-07 9:57 ` Paolo Abeni
2025-05-08 6:15 ` Michael S. Tsirkin
2025-05-19 8:57 ` Paolo Abeni
2025-05-19 9:04 ` Parav Pandit
2025-05-19 9:24 ` Paolo Abeni
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250506035434-mutt-send-email-mst@kernel.org \
--to=mst@redhat.com \
--cc=cohuck@redhat.com \
--cc=dverkamp@chromium.org \
--cc=hengqi@linux.alibaba.com \
--cc=jasowang@redhat.com \
--cc=mvaralar@redhat.com \
--cc=pabeni@redhat.com \
--cc=parav@nvidia.com \
--cc=shahafs@nvidia.com \
--cc=virtio-comment@lists.linux.dev \
--cc=willemb@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox