From: "Michael S. Tsirkin" <mst@redhat.com>
To: Eugenio Perez Martin <eperezma@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>,
	Xuan Zhuo <xuanzhuo@linux.alibaba.com>,
	virtio-comment@lists.oasis-open.org,
	Laurent Vivier <lvivier@redhat.com>, Cindy Lu <lulu@redhat.com>,
	cohuck@redhat.com, alvaro.karsz@solid-run.com,
	Liuxiangdong <liuxiangdong5@huawei.com>,
	Gautam Dawar <gdawar@xilinx.com>,
	longpeng2@huawei.com, Dragos Tatulea <dtatulea@nvidia.com>,
	parav@nvidia.com, stefanha@redhat.com,
	Harpreet Singh Anand <hanand@xilinx.com>,
	Stefano Garzarella <sgarzare@redhat.com>,
	Heng Qi <hengqi@linux.alibaba.com>,
	Zhu Lingshan <lingshan.zhu@intel.com>,
	Shannon Nelson <snelson@pensando.io>,
	mgurtovoy@nvidia.com, si-wei.liu@oracle.com
Subject: Re: [virtio-comment] Re: [PATCH 0/2] Selective queue enabling
Date: Thu, 8 Jun 2023 18:08:39 -0400
Message-ID: <20230608180617-mutt-send-email-mst@kernel.org>
In-Reply-To: <CAJaqyWf+vn06NHcQY208dwyB=WH3PQ3A7z53o0BTy102XrT+3Q@mail.gmail.com>

On Thu, Jun 08, 2023 at 10:36:19AM +0200, Eugenio Perez Martin wrote:
> On Thu, Jun 8, 2023 at 9:19 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Thu, Jun 08, 2023 at 08:43:18AM +0200, Eugenio Perez Martin wrote:
> > > On Thu, Jun 8, 2023 at 8:04 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > On Thu, Jun 08, 2023 at 08:44:41AM +0800, Jason Wang wrote:
> > > > > On Thu, Jun 8, 2023 at 4:27 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > >
> > > > > > On Wed, Jun 07, 2023 at 11:41:39AM +0200, Eugenio Perez Martin wrote:
> > > > > > > On Wed, Jun 7, 2023 at 10:59 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > >
> > > > > > > > On Wed, Jun 07, 2023 at 10:47:12AM +0200, Eugenio Perez Martin wrote:
> > > > > > > > > On Wed, Jun 7, 2023 at 10:23 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > > > >
> > > > > > > > > > On Wed, 7 Jun 2023 07:35:58 +0200, Eugenio Perez Martin <eperezma@redhat.com> wrote:
> > > > > > > > > > > On Tue, Jun 6, 2023 at 9:10 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > On Tue, Jun 06, 2023 at 07:55:09PM +0200, Eugenio Pérez wrote:
> > > > > > > > > > > > > > This series allows the driver to start the device (i.e. set DRIVER_OK) with only
> > > > > > > > > > > > > > some queues enabled, and then enable other queues later.
> > > > > > > > > > > > >
> > > > > > > > > > > > > This is the current way to migrate net device state through control
> > > > > > > > > > > > > virtqueue, in a software assisted framework with vDPA:
> > > > > > > > > > > > > * First, only net CVQ is enabled at DRIVER_OK
> > > > > > > > > > > > > * All the control commands (mac address, mq, etc) needed for the device
> > > > > > > > > > > > > to behave the same as the source of migration are sent
> > > > > > > > > > > > > * Finally all the dataplane queues are enabled.
> > > > > > > > > > > >
> > > > > > > > > > > > In my opinion, this is somewhat problematic. Specifically, currently
> > > > > > > > > > > > devices tend to deduce how many queues are needed by looking
> > > > > > > > > > > > at the state at DRIVER_OK time.
> > > > > > > > > > > >
> > > > > > > > > > > > Question: what is wrong with enabling queues initially and then
> > > > > > > > > > > > doing a reset right after DRIVER_OK? You can even allocate
> > > > > > > > > > > > memory for just one queue (zeroing it out).
> > > > > > > > > > > >
> > > > > > > > > > > > Granted this looks kind of ugly but side-steps this problem with
> > > > > > > > > > > > no need for spec changes.
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > The problem is that the rx queues can start receiving, as the guest
> > > > > > > > > > > already has buffers there.
> > > > > > > > > >
> > > > > > > > > > Can we reset the vq before filling buffers to it?
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > They are passthrough from the guest so there is a window where the
> > > > > > > > > device can process rx descriptors.
> > > > > > > >
> > > > > > > > Not if there are no descriptors there.
> > > > > > > >
> > > > > > >
> > > > > > > But the migration is driven by the hypervisor, and it cannot control
> > > > > > > that. The guest will likely have rx descriptors available.
> > > > > >
> > > > > > Maybe I misunderstand. Is hypervisor driving cvq while guest is driving
> > > > > > rx queues?
> > > > >
> > > > > No, hypervisor tries to restore virtqueue states via cvq before guest can drive.
> > > >
> > > > So cvq maps to hypervisor memory?
> > > >
> > >
> > > From the device POV, yes, the CVQ vring is in hypervisor memory, not
> > > in the guest's. That allows the hypervisor to send the CVQ
> > > commands to restore the device state on the destination, without
> > > guest intervention.
> > >
> > > Data vqs, on the other hand, are passthrough. The device talks
> > > directly to the guest's vrings.
> > >
> > > Currently, this is done by emulating CVQ in the host's kernel and then
> > > translating the commands in the way that best suits the vdpa device,
> > > using its vendor vdpa driver in the host. Other methods like PASID are
> > > possible too.
> > >
> > > Thanks!
> >
> > OK. So my suggestion is simple: map data vrings to a zero page in
> > hypervisor memory initially. Later reset and map to guest.
> >
> 
> The idea is interesting, but we lose the net configuration in a device
> reset, so we need to send it again.

Ring reset, not device reset.

> In the case of qemu+vDPA maybe it is possible with queue_reset, like:
> * Map all the guest pages as usual.
> * Map a new zero page and forbid the guest from writing to that page. Even in
> the vIOMMU case, we can send all the CVQ commands before allowing the
> guest to modify mappings. The guest has no way to write to that page
> through the device as, well, dataplane is not initialized. It's the
> way DPDK shadow virtqueues worked, so it should be valid.
> * Reset the queues to the guest passthrough.
> 
> I don't like the complexity of it, but I like that it requires even fewer
> changes to the device / spec.

This is exactly what I meant.
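
A minimal sketch of that sequence as a toy model in Python, only to make
the ordering explicit. Nothing here is a real vDPA or QEMU API: the
classes, the ZERO_PAGE label and restore_on_destination() are all
hypothetical names, and the "device" is just a couple of dataclasses.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical label for a hypervisor-owned, all-zero vring page.
ZERO_PAGE = "zero-page"


@dataclass
class ToyQueue:
    ring: Optional[str] = None   # where the vring currently points
    enabled: bool = False

    def enable(self, ring: str) -> None:
        self.ring, self.enabled = ring, True

    def ring_reset(self) -> None:
        # Per-queue reset: only this queue's state is cleared.
        self.ring, self.enabled = None, False


@dataclass
class ToyNetDevice:
    data_queues: List[ToyQueue]
    net_config: Dict[str, object] = field(default_factory=dict)
    driver_ok: bool = False

    def cvq_command(self, key: str, value: object) -> None:
        # Control commands are only accepted once the device is started.
        assert self.driver_ok, "CVQ is usable only after DRIVER_OK"
        self.net_config[key] = value

    def device_reset(self) -> None:
        # Full device reset: queues AND net configuration are lost.
        for q in self.data_queues:
            q.ring_reset()
        self.net_config.clear()
        self.driver_ok = False


def restore_on_destination(dev: ToyNetDevice,
                           guest_rings: List[str],
                           saved_config: Dict[str, object]) -> None:
    # 1. Point every data vring at the zero page: no descriptors are
    #    available, so the device has nothing to process.
    for q in dev.data_queues:
        q.enable(ZERO_PAGE)
    dev.driver_ok = True
    # 2. Replay the saved configuration through the control virtqueue.
    for key, value in saved_config.items():
        dev.cvq_command(key, value)
    # 3. Ring-reset each data queue and re-enable it on the guest's vring.
    for q, ring in zip(dev.data_queues, guest_rings):
        q.ring_reset()
        q.enable(ring)
```

The point of the model is step 3: a ring reset leaves net_config
untouched, whereas device_reset() would force the CVQ commands to be
sent again.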

> >
> > If that does not work, then I am not sure this proposal is enough
> > since I think devices want to have a specific point in time
> > where they know which queues are going to be used.
> 
> In the case of net, this should not be a problem since the spec
> mandates 2 if !cvq, 3 if cvq but !mq, and max_virtqueue_pairs*2+1 if
> cvq and mq. virtio-blk also has num_queues.

This is max, not how many there are in practice.
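
To spell out what that rule computes, here is a small sketch of the
bound quoted above. The function and parameter names are illustrative,
not taken from the spec text.

```python
def max_net_virtqueues(has_cvq: bool, has_mq: bool,
                       max_virtqueue_pairs: int = 1) -> int:
    """Upper bound on virtio-net virtqueues, per the rule quoted above."""
    if not has_cvq:
        return 2                        # one rx queue + one tx queue
    if not has_mq:
        return 3                        # rx, tx and the control virtqueue
    return max_virtqueue_pairs * 2 + 1  # all rx/tx pairs plus the cvq


print(max_net_virtqueues(has_cvq=True, has_mq=True, max_virtqueue_pairs=8))  # 17
```

This only bounds how many queues may exist; it says nothing about
which of them the driver will actually enable.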

> Is it even valid to not enable some of the queues?

Yes, and Linux will do that if max_virtqueue_pairs > #CPUs.
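
As a rough illustration of the arithmetic, assuming the driver simply
caps the number of queue pairs at the number of online CPUs (an
assumption for this sketch; the actual driver logic may differ, and
both function names are made up):

```python
def queue_pairs_in_use(max_virtqueue_pairs: int, online_cpus: int) -> int:
    # Assumed policy: never use more rx/tx pairs than there are CPUs.
    return min(max_virtqueue_pairs, online_cpus)


def data_queues_left_disabled(max_virtqueue_pairs: int, online_cpus: int) -> int:
    # Each unused pair leaves one rx and one tx queue never enabled.
    used = queue_pairs_in_use(max_virtqueue_pairs, online_cpus)
    return 2 * (max_virtqueue_pairs - used)


print(queue_pairs_in_use(16, 4))         # 4
print(data_queues_left_disabled(16, 4))  # 24
```

So a device offering 16 pairs to a 4-CPU guest would see 24 data
queues that are never enabled under this assumption.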

> I've always felt that queue_enable has been redundant before
> queue_reset for this reason actually.
> 
> > Maybe we could use e.g. bit 1 in queue_enable to signal that?
> >
> 
> I'm totally ok to go in that direction.
> 
> Thanks!
> 
> >
> > > >
> > > >
> > > > > So in this case if RX queues are enabled at the same time, the device
> > > > > may try to queue packets to queue 0.
> > > > >
> > > > > Thanks
> > > > >
> > > > > > How do you do this - they are DMA from same VF no?
> > > > > >
> > > > > >
> > > > > > > >
> > > > > > > > > > > Apart from that, the back and forth
> > > > > > > > > > > introduces latencies.
> > > > > > > > > > >
> > > > > > > > > > > Maybe a better angle is to start all the queues as if they're reset,
> > > > > > > > > > > write 1 just to CVQ, configure the device, and then write 1 to all
> > > > > > > > > > > dataplane vqs?
> > > > > > > > > >
> > > > > > > > > > write to what?
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > Sorry I was unclear, I mean to enable the vqs writing 1 to queue_enable.
> > > > > > > > >
> > > > > > > > > Thanks!
> > > > > > > > >
> > > > > > > > > > Thanks.
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Thanks!
> > > > > > > > > > >
> > > > > > > > > > > > > Eugenio Pérez (2):
> > > > > > > > > > > > >   virtio: introduce selective queue enabling
> > > > > > > > > > > > >   virtio: pci support virtqueue selective enabling
> > > > > > > > > > > > >
> > > > > > > > > > > > >  content.tex       | 15 +++++++++++++--
> > > > > > > > > > > > >  transport-pci.tex |  4 ++++
> > > > > > > > > > > > >  2 files changed, 17 insertions(+), 2 deletions(-)
> > > > > > > > > > > > >
> > > > > > > > > > > > > --
> > > > > > > > > > > > > 2.31.1
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > This publicly archived list offers a means to provide input to the
> > > > > > > > > > OASIS Virtual I/O Device (VIRTIO) TC.
> > > > > > > > > >
> > > > > > > > > > In order to verify user consent to the Feedback License terms and
> > > > > > > > > > to minimize spam in the list archive, subscription is required
> > > > > > > > > > before posting.
> > > > > > > > > >
> > > > > > > > > > Subscribe: virtio-comment-subscribe@lists.oasis-open.org
> > > > > > > > > > Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
> > > > > > > > > > List help: virtio-comment-help@lists.oasis-open.org
> > > > > > > > > > List archive: https://lists.oasis-open.org/archives/virtio-comment/
> > > > > > > > > > Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
> > > > > > > > > > List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
> > > > > > > > > > Committee: https://www.oasis-open.org/committees/virtio/
> > > > > > > > > > Join OASIS: https://www.oasis-open.org/join/
> > > > > > > > > >
> > > > > > > >
> > > > > >
> > > >
> >

