From: "Michael S. Tsirkin" <mst@redhat.com>
To: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Cc: Jason Wang <jasowang@redhat.com>,
	virtio-dev@lists.oasis-open.org, parav@nvidia.com,
	virtio-comment@lists.oasis-open.org, "Zhu,
	Lingshan" <lingshan.zhu@intel.com>
Subject: Re: [virtio-comment] [RFC PATCH] admin-queue: bind the group member to the device
Date: Wed, 28 Jun 2023 11:55:08 -0400
Message-ID: <20230628114143-mutt-send-email-mst@kernel.org>
In-Reply-To: <1687932392.6613173-2-xuanzhuo@linux.alibaba.com>

On Wed, Jun 28, 2023 at 02:06:32PM +0800, Xuan Zhuo wrote:
> On Wed, 28 Jun 2023 10:49:45 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > On Tue, Jun 27, 2023 at 6:54 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >
> > > On Tue, 27 Jun 2023 17:00:06 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > On Tue, Jun 27, 2023 at 4:28 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > >
> > > > >
> > > > > Thanks, Parav, for pointing this out. There may be some gaps in how we
> > > > > understand the use case.
> > > > >
> > > > > Let me describe our use case, which I think is simple and should be easy to
> > > > > understand.
> > > > >
> > > > > First, the user (customer) purchases a bare metal machine.
> > > > >
> > > > > ## Bare metal machine
> > > > >
> > > > > Let me briefly explain the characteristics of a bare metal machine. It is not a
> > > > > virtual machine but a physical one. What sets it apart from an ordinary
> > > > > physical machine is that a DPU-like device is attached to its PCI bus. This
> > > > > DPU provides devices such as virtio-blk/net to the host over PCI. These devices
> > > > > are managed by the vendor and must be purchased and created on the vendor's
> > > > > management platform.
> > > > >
> > > > > ## DPU
> > > > >
> > > > > The DPU runs a software implementation that responds to PCI operations. But
> > > > > as mentioned above, resources such as network cards must be purchased and
> > > > > created before they exist. So users can create a VF, which is just a
> > > > > PCI-level operation, but there may be no corresponding backend behind it.
> > > > >
> > > > > ## Management Platform
> > > > >
> > > > > Devices are created and configured on the management platform.
> > > > >
> > > > > After the user completes the purchase on the management platform (an
> > > > > independent platform provided by the vendor that has nothing to do with
> > > > > virtio), a corresponding device implementation appears in the DPU. It
> > > > > includes the user's configuration, the available bandwidth and other
> > > > > information.
> > > > >
> > > > > ## Usage
> > > > >
> > > > > Since the user works directly on the host, the user can create VMs and pass
> > > > > through a PF or VF to a VM. Or the user can run a large number of Docker
> > > > > containers, each using its own virtio-net device for performance.
> > > > >
> > > > > Users use VFs because we need a large number of virtio-net devices; the
> > > > > number reaches 1k+.
> > > > >
> > > > > In this scenario we need to bind a VF to a backend device, because we
> > > > > cannot automatically create the virtio-net backend device when the user
> > > > > creates a VF.
> > > > >
> > > > > ## Migration
> > > > >
> > > > > In addition, consider migration. If a VM is migrated from another host, its
> > > > > virtio device is of course migrated to the DPU as well. The newly created VF
> > > > > can then be used by the VM only after it is bound to the migrated device. We
> > > > > do not want this VF to be a brand-new device.
> > > > >
> > > > > ## Abstraction
> > > > >
> > > > > So this is how I understand the process of creating a VF:
> > > > >
> > > > > 1. Create a PCI VF. At this point there may be no backend virtio device, or
> > > > >     only a default backend that does not fully meet our expectations.
> > > > > 2. Create a device, or migrate one.
> > > > > 3. Bind the backend virtio device to the VF.
> > > >
> > > > 3) should come before 2)?
> > > >
> > > > Who is going to do 3), by the way? Is it the user? If so, for example, if a
> > > > user wants another 4-queue virtio-net device, how does the user learn its
> > > > ID after the purchase?
> > >
> > > The ID comes from the management platform.
> >
> > So the binding could be done via that management platform, in which case
> > this becomes a cloud-vendor-specific interface.
> 
> In our scenario, the user does the binding in the OS, using this ID and the VF ID.
> 
> >
> > >
> > > >
> > > > >
> > > > > In most scenarios, the first step may be enough. We can fine-tune this
> > > > > default device, for example by changing its MAC. In the future, we can use
> > > > > the admin queue to modify its MSI-X vectors and other configuration.
> > > > >
> > > > > But we should allow binding a backend virtio device to a specific VF. This
> > > > > is useful for live migration and for virtio devices with special configurations.
> > > >
> > > > All of these could be addressed if a dynamic provisioning model is
> > > > implemented (SIOV or transport virtqueue). Trying to have a workaround
> > > > in SR-IOV might be tricky.
> > >
> > >
> > > An SR-IOV VF is a native PCI device; that is its advantage.
> >
> > The problem is that it doesn't support flexible provisioning, e.g.
> > creating and destroying a single VF.
> 
> YES. ^_^!!

So sure, create it. Once you have created it, you can
use the VF# to talk to it.


I *suspect* that what this ID does is replace provisioning commands.

So instead of saying "create VF#3 with MAC 0xABC and 0x1000 VQs",
you would have management say "ID 0xFACE refers to MAC 0xABC and 0x1000 VQs",
and later you would say "bind VF#3 to ID 0xFACE", and that would
set it up.

Is that it?

But why is it important to do it in two steps like this,
as opposed to one? I have no idea.
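To make the comparison concrete, here is a minimal C sketch of both models, assuming a device that simply keeps per-VF configuration in a table. All names (struct vf_config, provision_vf, define_id, bind_vf) are hypothetical and come from neither the virtio spec nor the RFC patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* HYPOTHETICAL provisioning parameters, for illustration only. */
struct vf_config {
    uint8_t  mac[6];
    uint16_t num_vqs;
};

#define MAX_VFS 8
#define MAX_IDS 8

/* Device-side state: the configuration applied to each VF. */
static struct vf_config vf_table[MAX_VFS];
static int vf_configured[MAX_VFS];

/* Two-step model only: maps a management-assigned ID to a config. */
static struct {
    uint32_t id;
    struct vf_config cfg;
    int in_use;
} id_table[MAX_IDS];

static int same_config(const struct vf_config *a, const struct vf_config *b)
{
    return memcmp(a->mac, b->mac, sizeof a->mac) == 0 && a->num_vqs == b->num_vqs;
}

/* One step: "create VF#vf with this MAC and this many VQs". */
static int provision_vf(unsigned vf, const struct vf_config *cfg)
{
    assert(cfg != NULL);
    if (vf >= MAX_VFS)
        return -1;
    vf_table[vf] = *cfg;
    vf_configured[vf] = 1;
    return 0;
}

/* Two steps, part 1: "ID id refers to this MAC and this many VQs". */
static int define_id(uint32_t id, const struct vf_config *cfg)
{
    for (int i = 0; i < MAX_IDS; i++) {
        if (!id_table[i].in_use) {
            id_table[i].id = id;
            id_table[i].cfg = *cfg;
            id_table[i].in_use = 1;
            return 0;
        }
    }
    return -1; /* table full */
}

/* Two steps, part 2: "bind VF#vf to ID id" applies the stored config. */
static int bind_vf(unsigned vf, uint32_t id)
{
    for (int i = 0; i < MAX_IDS; i++) {
        if (id_table[i].in_use && id_table[i].id == id)
            return provision_vf(vf, &id_table[i].cfg);
    }
    return -1; /* unknown ID */
}
```

In this model the bind step ends in exactly the same per-VF state as the one-step call; the only difference is whether the configuration travels with the command or is looked up by ID.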

> 
> >
> > >
> > >
> > > >
> > > > >
> > > > > The design of virtio itself has two layers, and virtio should by nature allow
> > > > > switching the transport layer. This is our advantage.
> > > >
> > > > So this is not about switching the transport layer, but about
> > > > binding/unbinding virtio devices to a VF?
> > >
> > > YES.
> > >
> > > >
> > > > Is a new capability or similar admin cmd sufficient in this case?
> > >
> > > Either is OK.
> > >
> > >
> > > >
> > > > struct virtio_pci_bind_cap {
> > > >         struct virtio_pci_cap cap;
> > > >         u16 bind; // virtio_device_id
> > > >         u16 unbind; // virtio_device_id
> > > > };
> > >
> > > You mean that "bind" and "unbind" are writable?
> 
> This is a good idea.
> 
> Thanks.

So this steals valuable room from the limited PCI config space, with no
error handling and no filtering... Ugh.  Let's not put a round peg in a
square hole.

For management, I think we should use admin commands. They were built for
the management use case.
Config space (PCI and virtio) is better suited to the driver slow path.
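For comparison, a minimal sketch of a bind request as an admin command rather than a config space capability. The header is modeled on the device-readable part of struct virtio_admin_cmd in the virtio specification (opcode, group_type, reserved bytes, group_member_id), assuming a little-endian host so the le16/le64 fields map to plain integers. The opcode HYPOTHETICAL_ADMIN_CMD_BIND, its payload, the provisioned-ID list and the 1-based VF numbering are assumptions made for illustration. The point is that a command returns a status (error handling) and the device can reject members or IDs it does not own (filtering), without spending PCI config space.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Modeled on the device-readable header of struct virtio_admin_cmd.
 * Assumes a little-endian host, so le16/le64 map to plain integers. */
struct admin_cmd_hdr {
    uint16_t opcode;
    uint16_t group_type;        /* 1 = SR-IOV group */
    uint8_t  reserved1[12];
    uint64_t group_member_id;   /* which VF the command targets */
};
_Static_assert(sizeof(struct admin_cmd_hdr) == 24, "header is 24 bytes");

#define ADMIN_GROUP_TYPE_SRIOV 1

/* HYPOTHETICAL opcode and payload for the proposed bind command. */
#define HYPOTHETICAL_ADMIN_CMD_BIND 0x1000
struct hypothetical_bind_data {
    uint32_t device_id;         /* ID obtained from the management platform */
};

/* errno-style status values: 0 means success. */
#define ADMIN_STATUS_OK      0
#define ADMIN_STATUS_EINVAL 22

#define NUM_VFS 4               /* assumption: VFs numbered 1..NUM_VFS */

/* HYPOTHETICAL device state: IDs the management platform has provisioned,
 * and the ID currently bound to each VF (0 = unbound). */
static const uint32_t provisioned_ids[] = { 0xFACE, 0xBEEF };
static uint32_t vf_binding[NUM_VFS];

static int id_is_provisioned(uint32_t id)
{
    for (size_t i = 0; i < sizeof provisioned_ids / sizeof provisioned_ids[0]; i++)
        if (provisioned_ids[i] == id)
            return 1;
    return 0;
}

/* Returns a status code instead of silently accepting a register write. */
static uint16_t handle_bind(const struct admin_cmd_hdr *hdr,
                            const struct hypothetical_bind_data *data)
{
    assert(hdr != NULL && data != NULL);
    if (hdr->opcode != HYPOTHETICAL_ADMIN_CMD_BIND ||
        hdr->group_type != ADMIN_GROUP_TYPE_SRIOV)
        return ADMIN_STATUS_EINVAL;
    /* Filtering: reject members outside the group. */
    if (hdr->group_member_id < 1 || hdr->group_member_id > NUM_VFS)
        return ADMIN_STATUS_EINVAL;
    /* Error handling: reject IDs the device does not know about. */
    if (!id_is_provisioned(data->device_id))
        return ADMIN_STATUS_EINVAL;
    vf_binding[hdr->group_member_id - 1] = data->device_id;
    return ADMIN_STATUS_OK;
}
```

A rejected command leaves vf_binding untouched and tells the caller why, which a writable u16 bind field in a capability cannot do.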

-- 
MST


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/
