From: "Michael S. Tsirkin" <mst@redhat.com>
To: Maxime Coquelin <maxime.coquelin@redhat.com>
Cc: Eugenio Perez Martin <eperezma@redhat.com>,
Jason Wang <jasowang@redhat.com>, Cindy Lu <lulu@redhat.com>,
Stefano Garzarella <sgarzare@redhat.com>,
qemu-level <qemu-devel@nongnu.org>,
Laurent Vivier <lvivier@redhat.com>,
Juan Quintela <quintela@redhat.com>,
Eelco Chaudron <echaudro@redhat.com>
Subject: Re: Emulating device configuration / max_virtqueue_pairs in vhost-vdpa and vhost-user
Date: Fri, 10 Mar 2023 05:49:40 -0500 [thread overview]
Message-ID: <20230310054745-mutt-send-email-mst@kernel.org> (raw)
In-Reply-To: <29f5522c-c305-9bf5-3283-e096b38261ef@redhat.com>
On Fri, Mar 10, 2023 at 11:33:42AM +0100, Maxime Coquelin wrote:
>
>
> On 3/8/23 13:15, Michael S. Tsirkin wrote:
> > On Wed, Mar 08, 2023 at 11:33:45AM +0100, Maxime Coquelin wrote:
> > > Hello Michael,
> > >
> > > On 2/1/23 12:20, Michael S. Tsirkin wrote:
> > > > On Wed, Feb 01, 2023 at 12:14:18PM +0100, Maxime Coquelin wrote:
> > > > > Thanks Eugenio for working on this.
> > > > >
> > > > > On 1/31/23 20:10, Eugenio Perez Martin wrote:
> > > > > > Hi,
> > > > > >
> > > > > > The current approach of offering an emulated CVQ to the guest and mapping
> > > > > > the commands to vhost-user does not scale well:
> > > > > > * Some devices already offer it, so the transformation is redundant.
> > > > > > * There is no support for commands with variable length (RSS?)
> > > > > >
> > > > > > We can solve both of these by offering it through vhost-user the same
> > > > > > way vhost-vdpa does. With this approach qemu needs to track the
> > > > > > commands, for the same reason as with vhost-vdpa: qemu needs to track
> > > > > > the device status for live migration. vhost-user should use the same
> > > > > > SVQ code for this, so we avoid duplication.
> > > > > >
> > > > > > One of the challenges here is to know which virtqueue to shadow /
> > > > > > isolate. The vhost-user device may not have the same number of queues
> > > > > > as the device frontend:
> > > > > > * The backend's count depends on the actual vhost-user device; qemu
> > > > > > fetches it with VHOST_USER_GET_QUEUE_NUM at the moment.
> > > > > > * The qemu device frontend's count is set by the netdev queues=
> > > > > > cmdline parameter.
> > > > > >
> > > > > > For the device, the CVQ is the last one it offers, but for the guest
> > > > > > it is the last one offered in config space.
> > > > > >
> > > > > > Creating a new vhost-user command to decrease that maximum number of
> > > > > > queues may be an option. But we can do it without adding more commands,
> > > > > > by remapping the CVQ index at virtqueue setup. I think it should be
> > > > > > doable using (struct vhost_dev).vq_index and maybe a few adjustments
> > > > > > here and there.
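> > > > > >
> > > > > > A rough sketch of what that remapping could look like (purely
> > > > > > illustrative, hypothetical helper, not actual QEMU code):
> > > > > >
> > > > > >     /* Map a guest-visible vq index to the backend's vq index.
> > > > > >      * Data queues keep their index; the guest CVQ (the last queue
> > > > > >      * the guest sees) is remapped to the backend's last queue. */
> > > > > >     static unsigned vhost_user_remap_vq_index(unsigned guest_idx,
> > > > > >                                               unsigned guest_nvqs,
> > > > > >                                               unsigned backend_nvqs)
> > > > > >     {
> > > > > >         if (guest_idx == guest_nvqs - 1) {
> > > > > >             return backend_nvqs - 1;
> > > > > >         }
> > > > > >         return guest_idx;
> > > > > >     }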
> > > > > >
> > > > > > Thoughts?
> > > > >
> > > > > I am fine with both proposals.
> > > > > I think index remapping will require a bit more rework in the DPDK
> > > > > Vhost-user library, but nothing insurmountable.
> > > > >
> > > > > I am currently working on a PoC adding support for VDUSE in the DPDK
> > > > > Vhost library, and recently added control queue support. We can reuse it
> > > > > if we want to prototype your proposal.
> > > > >
> > > > > Maxime
> > > > >
> > > > > > Thanks!
> > > > > >
> > > >
> > > >
> > > > technically the backend knows how many vqs there are, the last one is
> > > > the cvq... not sure we need full-blown remapping ...
> > > >
> > >
> > > Before VHOST_USER_PROTOCOL_F_STATUS was supported by qemu (very
> > > recently, v7.2.0), we had no way for the backend to be sure the
> > > frontend would not configure new queue pairs; this is not defined in the
> > > spec AFAICT [0]. In the DPDK Vhost library, we notify the application it
> > > can start to use the device once the first queue pair is set up and
> > > enabled, then we notify the application when new queues are ready to be
> > > processed. In this case, I think we cannot deduce whether a queue is a
> > > data or a control queue when it is set up.
> > >
> > > When VHOST_USER_PROTOCOL_F_STATUS is supported, we know no more queues
> > > will be configured once the DRIVER_OK status is set. In that case, we
> > > can deduce at DRIVER_OK time that the last queue set up is the control
> > > queue, if the number of queues is odd.
> > >
> > > Using index remapping, we would know directly at queue setup time
> > > whether a queue is a data or a control queue based on its index value,
> > > i.e. whether the index equals the maximum queue index supported by the
> > > backend. But thinking about it again, we should probably at least back
> > > this with a protocol feature to avoid issues with legacy backends.
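> > >
> > > A minimal sketch of the two checks (illustrative only, not DPDK Vhost
> > > library code; the names are made up):
> > >
> > >     #include <stdbool.h>
> > >
> > >     /* With F_STATUS: at DRIVER_OK time, an odd number of queues set up
> > >      * means the last one is the control queue. */
> > >     static bool last_queue_is_cvq_at_driver_ok(unsigned nr_queues_setup)
> > >     {
> > >         return (nr_queues_setup & 1) != 0;
> > >     }
> > >
> > >     /* With index remapping: known directly at queue setup time. */
> > >     static bool queue_is_cvq(unsigned vq_index, unsigned backend_max_vq_index)
> > >     {
> > >         return vq_index == backend_max_vq_index;
> > >     }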
> > >
> > > I hope this clarifies things; let me know if anything is unclear.
> > >
> > > Thanks,
> > > Maxime
> > >
> > > [0]:
> > > https://elixir.bootlin.com/qemu/latest/source/docs/interop/vhost-user.rst
> >
> >
> > OK maybe document this.
>
> Sure, working on it... But I just found a discrepancy related to
> VHOST_USER_GET_QUEUE_NUM between the spec and the frontend/backend
> implementations.
>
> In the spec [0], the VHOST_USER_GET_QUEUE_NUM reply is the number of queues.
> In the Qemu Vhost-user Net frontend [1], VHOST_USER_GET_QUEUE_NUM is handled
> as the number of queue *pairs*, as does the DPDK Vhost library.
> The Vhost-user-bridge Qemu test application handles it as the number of queues.
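>
> To illustrate the two readings (a hedged sketch only, not actual code from
> either project; the helper name is made up):
>
>     #include <stdbool.h>
>
>     /* Number of virtqueues usable by the frontend, depending on how the
>      * VHOST_USER_GET_QUEUE_NUM reply is interpreted. */
>     static unsigned max_virtqueues(unsigned get_queue_num_reply,
>                                    bool reply_is_pairs)
>     {
>         /* QEMU's net frontend and the DPDK Vhost library read the reply
>          * as queue pairs; the spec wording and vhost-user-bridge read it
>          * as the total number of queues. */
>         return reply_is_pairs ? get_queue_num_reply * 2
>                               : get_queue_num_reply;
>     }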
Weird, how does Vhost-user-bridge work then? I guess it just
ignores the extra queues that were not initialized?
> Other device types seem to handle it as the number of queues, which
> makes sense since they don't have the notion of queue pair.
>
> Fixing the QEMU and DPDK implementations would require a new protocol
> feature bit, so as not to break compatibility with older versions.
>
> So maybe we should state in the spec that, for network devices, the
> VHOST_USER_GET_QUEUE_NUM reply represents the number of queue pairs, and
> also fix vhost-user-bridge to reply with the number of queue pairs?
>
> Maxime
Not sure we need to fix vhost-user-bridge - it seems to work?
In any case let's add a protocol feature to fix it for net maybe?
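
Something along these lines maybe (the name and bit number are purely
illustrative, not an allocated protocol feature):

    /* Hypothetical: when negotiated, the VHOST_USER_GET_QUEUE_NUM reply
     * is defined as the number of queue pairs for net devices. */
    #define VHOST_USER_PROTOCOL_F_NET_QUEUE_PAIRS 18
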
> [0]: https://elixir.bootlin.com/qemu/latest/source/docs/interop/vhost-user.rst#L1091
> [1]: https://elixir.bootlin.com/qemu/latest/source/net/vhost-user.c#L69
Thread overview: 23+ messages
2023-01-31 19:10 Emulating device configuration / max_virtqueue_pairs in vhost-vdpa and vhost-user Eugenio Perez Martin
2023-01-31 19:11 ` Eugenio Perez Martin
2023-01-31 21:32 ` Michael S. Tsirkin
2023-02-01 3:29 ` Jason Wang
2023-02-01 7:49 ` Eugenio Perez Martin
2023-02-01 10:44 ` Michael S. Tsirkin
2023-02-02 3:41 ` Jason Wang
2023-02-02 18:32 ` Eugenio Perez Martin
2023-02-01 10:53 ` Michael S. Tsirkin
2023-02-01 3:27 ` Jason Wang
2023-02-01 6:55 ` Eugenio Perez Martin
2023-02-02 3:02 ` Jason Wang
2023-02-01 11:14 ` Maxime Coquelin
2023-02-01 11:19 ` Eugenio Perez Martin
2023-03-02 8:48 ` Maxime Coquelin
2023-02-01 11:20 ` Michael S. Tsirkin
2023-02-01 11:48 ` Eugenio Perez Martin
2023-02-02 3:44 ` Jason Wang
2023-02-02 18:37 ` Eugenio Perez Martin
2023-03-08 10:33 ` Maxime Coquelin
2023-03-08 12:15 ` Michael S. Tsirkin
2023-03-10 10:33 ` Maxime Coquelin
2023-03-10 10:49 ` Michael S. Tsirkin [this message]