public inbox for virtio-comment@lists.linux.dev
 help / color / mirror / Atom feed
From: "Michael S. Tsirkin" <mst@redhat.com>
To: Parav Pandit <parav@nvidia.com>
Cc: Paolo Abeni <pabeni@redhat.com>,
	"hengqi@linux.alibaba.com" <hengqi@linux.alibaba.com>,
	"virtio-comment@lists.linux.dev" <virtio-comment@lists.linux.dev>,
	"cohuck@redhat.com" <cohuck@redhat.com>,
	"mvaralar@redhat.com" <mvaralar@redhat.com>,
	Jason Wang <jasowang@redhat.com>,
	Shahaf Shuler <shahafs@nvidia.com>,
	Willem de Bruijn <willemb@google.com>,
	Daniel Verkamp <dverkamp@chromium.org>
Subject: Re: [PATCH v1] virtio-net: Fix to avoid using reserved feature bits
Date: Thu, 1 May 2025 09:42:47 -0400	[thread overview]
Message-ID: <20250501093933-mutt-send-email-mst@kernel.org> (raw)
In-Reply-To: <CY8PR12MB7195E87CBF30EB5C7705835DDC832@CY8PR12MB7195.namprd12.prod.outlook.com>

On Wed, Apr 30, 2025 at 10:54:12AM +0000, Parav Pandit wrote:
> 
> 
> > From: Paolo Abeni <pabeni@redhat.com>
> > Sent: Wednesday, April 30, 2025 3:42 PM
> > 
> > On 4/30/25 6:44 AM, Parav Pandit wrote:
> > >> From: Michael S. Tsirkin <mst@redhat.com>
> > >> Sent: Wednesday, April 30, 2025 2:14 AM
> > >>
> > >> On Mon, Apr 28, 2025 at 10:39:59AM +0200, Paolo Abeni wrote:
> > >>> On 4/23/25 8:07 PM, Michael S. Tsirkin wrote:
> > >>>> On Wed, Apr 23, 2025 at 09:29:11AM -0700, Daniel Verkamp wrote:
> > >>>>> On Tue, Apr 22, 2025 at 10:46 PM Michael S. Tsirkin
> > >>>>> <mst@redhat.com>
> > >> wrote:
> > >>>>>> I'm afraid we'll have to bite the bullet.
> > >>>>>
> > >>>>> One other issue with bits > 63 is that the vhost-user protocol
> > >>>>> VHOST_USER_GET_FEATURES and VHOST_USER_SET_FEATURES
> > >> messages use
> > >>>>> u64 to represent the features, so vhost-user-net devices can't
> > >>>>> query or enable these features. vhost-user is outside the scope of
> > >>>>> the virtio spec, though, and I think it's reasonable to extend the
> > >>>>> protocol to enable high feature bits rather than avoiding them forever.
> > >>>>
> > >>>> Yes you would use VHOST_USER_SET_PROTOCOL_FEATURES to make
> > >>>> VHOST_USER_GET_FEATURES return two u64s, or even a new message
> > >> returning an array.
> > >>>
> > >>> I think that additionally the VIRTIO_NET_CTRL_GUEST_OFFLOADS_SET
> > >>> command will need some clarification, as in the current text looks a
> > >>> bit
> > >>> inconsistent:
> > >>>
> > >>> """
> > >>> // in Offloads State Configuration / Setting Offloads State:
> > >>>
> > >>> #define VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO  46
> > >>>
> > >>> // ...
> > >>>
> > >>> The class VIRTIO_NET_CTRL_GUEST_OFFLOADS has one command:
> > >>> VIRTIO_NET_CTRL_GUEST_OFFLOADS_SET applies the new offloads
> > >> configuration.
> > >>>
> > >>> le64 value passed as command data is a bitmask, bits set define
> > >>> offloads to be enabled, bits cleared - offloads to be disabled.
> > >>>
> > >>> There is a corresponding device feature for each offload. Upon
> > >>> feature negotiation corresponding offload gets enabled to preserve
> > >>> backward compatibility """
> > >>>
> > >>> The "corresponding device feature" has the same numerical value of
> > >>> the selected offloads, except for UDP tunnels related one (which are
> > >>> mapped to bits corresponding to reserved features).
> > >>>
> > >>> It's unclear to me which should be the better way to address this
> > >>> inconsistency.
> > >>>
> > >>> /P
> > >>
> > >>
> > >> Parav, what's your take here? Given your change broke
> > >> VIRTIO_NET_CTRL_GUEST_OFFLOADS_SET, do you want to revert it?
> > >
> > > I see two options.
> > > Opt_1:
> > > Open source Linux kernel driver and DPDK PMD has not used RSS_CONTEXT
> > yet.
> > > If Heng from Alibaba acks that they do not have any internal
> > implementation either, it may be safe to shift _all_ feature > 63 to lower
> > position.
> > > We can get Yuri's feedback, if at all windows driver has used RSS context.
> > >
> > > And once for all we mark it that feature bits are limited to 0-63.
> > > There is enough infrastructure in place in virtio spec to not try to squeeze
> > things in feature bits.
> > > And these 4 bits are good example of it already, which could have been
> > negotiated/communicated at later phase of driver at runtime.
> > > Only bit required was a bit to expand vnet header size at early stage.
> > >
> > > Advantage: brings the good practice to adapt to the modern and efficient
> > driver->device interface.
> > > Risk: May break RSS_CONTEXT (risk looks low)
> > >
> > > Opt_2. VIRTIO_NET_CTRL_GUEST_OFFLOADS command text to be updated
> > to
> > > indicate that,
> > >
> > > Below defines corresponds to respective feature bits 65 to 68. There is still
> > one to one mapping, its just position is different inside the class.
> > > This is clarification text to be added and sw can adjust for it.
> > >
> > > #define VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO 46 #define
> > > VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM 47 #define
> > > VIRTIO_NET_F_GUEST_USO4 54 #define VIRTIO_NET_F_GUEST_USO6 55
> > >
> > > Advantages of 2nd option are:
> > > a. featuring bits remain open upto 127.
> > > b. Does not break RSS_CONTEXT.
> > >
> > > Both options are practical to me.
> > > I prefer #1, if Heng acks it, but also ok for #2.
> > 
> > I spent quite of bit of time trying to evaluate the scope of features bit
> > expansion (implied by the option 2 above).
> > 
> > While strictly speaking I haven't hit yet a complete blocker, implementation-
> > wise it's going to be huge and error prone, as great deal of both the kernel
> > and the user-space/qemu infrastructure hard-codes the
> > 64 bit limit.
> > 
> > Even exposing the feature extensions only the the virtio-net device (AFAICS it
> > will "minimize" the code churn) a lot of code and devices implementations are
> > going to be impacted.
> > 
> > I expect a far away in time timeline for implementations based on option 2.
> > 
> > /P
> Given that its painful enough, we can update the virtio spec to limit to 64-bit features as option_1.
> 
> A virtio spec diff like below,
> - 0 to 23, and 50 to 127 Feature bits for the specific device type
> + 0 to 23, and 50 to 63 Feature bits for the specific device type
> 
> Otherwise, we are delaying the problem to next feature, which is not good.
> 
> Heng,
> Yuri already acked, so we are good from Windows side.
> With that Windows, Linux kernel, DPDK PMD are safe.
> 
> Can you also please confirm that there are no users within Alibaba relying on RSS_CONTEXT, so that shifting the feature bit will not break any existing functionality?
> I think it will not break, because we can update the non-open-source software to detect RSS_CONTEXT on a new bit.
> 
> For example, driver can do,
> Enable_rss_context if bit X OR Bit 64 is set.


sorry if I am unclear, I think it's too late to move RSS really. It has
been out there for two years. The reason we can be reasonably sure no
one implemented the offloads so far is because the set offloads command
is broken and no one complained.  We need to fix them in the spec,
current one is broken. So the tunnel offloads can move if
it's convenient.

-- 
MST


  reply	other threads:[~2025-05-01 13:42 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-01-26  6:20 [PATCH v1] virtio-net: Fix to avoid using reserved feature bits Parav Pandit
2025-01-26  9:19 ` Michael S. Tsirkin
2025-01-26 16:44   ` Parav Pandit
2025-01-26 16:50     ` Michael S. Tsirkin
2025-01-27  9:21       ` Cornelia Huck
2025-01-27 12:54         ` Parav Pandit
2025-04-22 17:49 ` Paolo Abeni
2025-04-23  5:46   ` Michael S. Tsirkin
2025-04-23 16:05     ` Paolo Abeni
2025-04-28  9:13       ` Michael S. Tsirkin
2025-04-28 17:07         ` Paolo Abeni
2025-04-28 17:18           ` Michael S. Tsirkin
2025-04-23 16:29     ` Daniel Verkamp
2025-04-23 18:07       ` Michael S. Tsirkin
2025-04-28  8:39         ` Paolo Abeni
2025-04-28  8:47           ` Michael S. Tsirkin
2025-04-29 20:43           ` Michael S. Tsirkin
2025-04-30  4:44             ` Parav Pandit
2025-04-30  5:25               ` Yuri Benditovich
2025-04-30  5:44                 ` Parav Pandit
2025-04-30 10:12               ` Paolo Abeni
2025-04-30 10:54                 ` Parav Pandit
2025-05-01 13:42                   ` Michael S. Tsirkin [this message]
2025-05-01 15:57                     ` Paolo Abeni
2025-05-06  6:15                     ` Parav Pandit
2025-05-06  7:56                       ` Michael S. Tsirkin
2025-05-06  8:56                         ` Parav Pandit
2025-05-06 14:38                           ` Paolo Abeni
2025-05-06 15:00                             ` Parav Pandit
2025-05-06 15:40                               ` Paolo Abeni
2025-05-06 16:20                                 ` Parav Pandit
2025-05-07  9:57                                   ` Paolo Abeni
2025-05-08  6:15                                     ` Michael S. Tsirkin
2025-05-19  8:57                                     ` Paolo Abeni
2025-05-19  9:04                                       ` Parav Pandit
2025-05-19  9:24                                         ` Paolo Abeni

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250501093933-mutt-send-email-mst@kernel.org \
    --to=mst@redhat.com \
    --cc=cohuck@redhat.com \
    --cc=dverkamp@chromium.org \
    --cc=hengqi@linux.alibaba.com \
    --cc=jasowang@redhat.com \
    --cc=mvaralar@redhat.com \
    --cc=pabeni@redhat.com \
    --cc=parav@nvidia.com \
    --cc=shahafs@nvidia.com \
    --cc=virtio-comment@lists.linux.dev \
    --cc=willemb@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox