From: "Michael S. Tsirkin" <mst@redhat.com>
To: Eugenio Perez Martin <eperezma@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>,
qemu-devel@nongnu.org, Parav Pandit <parav@mellanox.com>,
Zhu Lingshan <lingshan.zhu@intel.com>,
longpeng2@huawei.com, Stefano Garzarella <sgarzare@redhat.com>,
Gautam Dawar <gdawar@xilinx.com>,
"Gonglei (Arei)" <arei.gonglei@huawei.com>,
Harpreet Singh Anand <hanand@xilinx.com>,
alvaro.karsz@solid-run.com,
Liuxiangdong <liuxiangdong5@huawei.com>,
Dragos Tatulea <dtatulea@nvidia.com>,
si-wei.liu@oracle.com, Shannon Nelson <snelson@pensando.io>,
Lei Yang <leiyang@redhat.com>,
Laurent Vivier <lvivier@redhat.com>, Cindy Lu <lulu@redhat.com>
Subject: Re: [PATCH v3 5/5] vdpa: move CVQ isolation check to net_init_vhost_vdpa
Date: Thu, 18 May 2023 17:22:33 -0400
Message-ID: <20230518172138-mutt-send-email-mst@kernel.org>
In-Reply-To: <CAJaqyWe=cop=M_kz7JazvnCboaMAqA4xuVO7WBS9rks83JHgkw@mail.gmail.com>
On Thu, May 18, 2023 at 08:36:22AM +0200, Eugenio Perez Martin wrote:
> On Thu, May 18, 2023 at 7:50 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> > On Wed, May 17, 2023 at 2:30 PM Eugenio Perez Martin
> > <eperezma@redhat.com> wrote:
> > >
> > > On Wed, May 17, 2023 at 5:59 AM Jason Wang <jasowang@redhat.com> wrote:
> > > >
> > > > On Tue, May 9, 2023 at 11:44 PM Eugenio Pérez <eperezma@redhat.com> wrote:
> > > > >
> > > > > Evaluating it at start time instead of initialization time may make the
> > > > > guest capable of dynamically adding or removing migration blockers.
> > > > >
> > > > > Also, moving to initialization reduces the number of ioctls in the
> > > > > migration, reducing failure possibilities.
> > > > >
> > > > > As a drawback we need to check for CVQ isolation twice: one time with no
> > > > > MQ negotiated and another one acking it, as long as the device supports
> > > > > it. This is because Vring ASID / group management is based on vq
> > > > > indexes, but we don't know the index of CVQ before negotiating MQ.
> > > > >
> > > > > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > > > > ---
> > > > > v2: Take out the reset of the device from vhost_vdpa_cvq_is_isolated
> > > > > > v3: Only record cvq_isolated, true if the device has cvq isolated in
> > > > > > both !MQ and MQ configurations.
> > > > > ---
> > > > > net/vhost-vdpa.c | 178 +++++++++++++++++++++++++++++++++++------------
> > > > > 1 file changed, 135 insertions(+), 43 deletions(-)
> > > > >
> > > > > diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> > > > > index 3fb833fe76..29054b77a9 100644
> > > > > --- a/net/vhost-vdpa.c
> > > > > +++ b/net/vhost-vdpa.c
> > > > > @@ -43,6 +43,10 @@ typedef struct VhostVDPAState {
> > > > >
> > > > > /* The device always have SVQ enabled */
> > > > > bool always_svq;
> > > > > +
> > > > > + /* The device can isolate CVQ in its own ASID */
> > > > > + bool cvq_isolated;
> > > > > +
> > > > > bool started;
> > > > > } VhostVDPAState;
> > > > >
> > > > > @@ -362,15 +366,8 @@ static NetClientInfo net_vhost_vdpa_info = {
> > > > > .check_peer_type = vhost_vdpa_check_peer_type,
> > > > > };
> > > > >
> > > > > -/**
> > > > > - * Get vring virtqueue group
> > > > > - *
> > > > > - * @device_fd vdpa device fd
> > > > > - * @vq_index Virtqueue index
> > > > > - *
> > > > > - * Return -errno in case of error, or vq group if success.
> > > > > - */
> > > > > -static int64_t vhost_vdpa_get_vring_group(int device_fd, unsigned vq_index)
> > > > > +static int64_t vhost_vdpa_get_vring_group(int device_fd, unsigned vq_index,
> > > > > + Error **errp)
> > > > > {
> > > > > struct vhost_vring_state state = {
> > > > > .index = vq_index,
> > > > > @@ -379,8 +376,7 @@ static int64_t vhost_vdpa_get_vring_group(int device_fd, unsigned vq_index)
> > > > >
> > > > > if (unlikely(r < 0)) {
> > > > > r = -errno;
> > > > > - error_report("Cannot get VQ %u group: %s", vq_index,
> > > > > - g_strerror(errno));
> > > > > + error_setg_errno(errp, errno, "Cannot get VQ %u group", vq_index);
> > > > > return r;
> > > > > }
> > > > >
> > > > > @@ -480,9 +476,9 @@ static int vhost_vdpa_net_cvq_start(NetClientState *nc)
> > > > > {
> > > > > VhostVDPAState *s, *s0;
> > > > > struct vhost_vdpa *v;
> > > > > - uint64_t backend_features;
> > > > > int64_t cvq_group;
> > > > > - int cvq_index, r;
> > > > > + int r;
> > > > > + Error *err = NULL;
> > > > >
> > > > > assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
> > > > >
> > > > > @@ -502,41 +498,22 @@ static int vhost_vdpa_net_cvq_start(NetClientState *nc)
> > > > > /*
> > > > > * If we early return in these cases SVQ will not be enabled. The migration
> > > > > * will be blocked as long as vhost-vdpa backends will not offer _F_LOG.
> > > > > - *
> > > > > - * Calling VHOST_GET_BACKEND_FEATURES as they are not available in v->dev
> > > > > - * yet.
> > > > > */
> > > > > - r = ioctl(v->device_fd, VHOST_GET_BACKEND_FEATURES, &backend_features);
> > > > > - if (unlikely(r < 0)) {
> > > > > - error_report("Cannot get vdpa backend_features: %s(%d)",
> > > > > - g_strerror(errno), errno);
> > > > > - return -1;
> > > > > + if (!vhost_vdpa_net_valid_svq_features(v->dev->features, NULL)) {
> > > > > + return 0;
> > > > > }
> > > > > - if (!(backend_features & BIT_ULL(VHOST_BACKEND_F_IOTLB_ASID)) ||
> > > > > - !vhost_vdpa_net_valid_svq_features(v->dev->features, NULL)) {
> > > > > +
> > > > > + if (!s->cvq_isolated) {
> > > > > return 0;
> > > > > }
> > > > >
> > > > > - /*
> > > > > - * Check if all the virtqueues of the virtio device are in a different vq
> > > > > - * than the last vq. VQ group of last group passed in cvq_group.
> > > > > - */
> > > > > - cvq_index = v->dev->vq_index_end - 1;
> > > > > - cvq_group = vhost_vdpa_get_vring_group(v->device_fd, cvq_index);
> > > > > + cvq_group = vhost_vdpa_get_vring_group(v->device_fd,
> > > > > + v->dev->vq_index_end - 1,
> > > > > + &err);
> > > > > if (unlikely(cvq_group < 0)) {
> > > > > + error_report_err(err);
> > > > > return cvq_group;
> > > > > }
> > > > > - for (int i = 0; i < cvq_index; ++i) {
> > > > > - int64_t group = vhost_vdpa_get_vring_group(v->device_fd, i);
> > > > > -
> > > > > - if (unlikely(group < 0)) {
> > > > > - return group;
> > > > > - }
> > > > > -
> > > > > - if (group == cvq_group) {
> > > > > - return 0;
> > > > > - }
> > > > > - }
> > > > >
> > > > > r = vhost_vdpa_set_address_space_id(v, cvq_group, VHOST_VDPA_NET_CVQ_ASID);
> > > > > if (unlikely(r < 0)) {
> > > > > @@ -799,6 +776,111 @@ static const VhostShadowVirtqueueOps vhost_vdpa_net_svq_ops = {
> > > > > .avail_handler = vhost_vdpa_net_handle_ctrl_avail,
> > > > > };
> > > > >
> > > > > +/**
> > > > > + * Probe the device to check control virtqueue is isolated.
> > > > > + *
> > > > > + * @device_fd vhost-vdpa file descriptor
> > > > > + * @features features to negotiate
> > > > > + * @cvq_index Control vq index
> > > > > + *
> > > > > + * Returns -1 in case of error, 0 if false and 1 if true
> > > > > + */
> > > > > +static int vhost_vdpa_cvq_is_isolated(int device_fd, uint64_t features,
> > > > > + unsigned cvq_index, Error **errp)
> > > > > +{
> > > > > + int64_t cvq_group;
> > > > > + int r;
> > > > > +
> > > > > + r = vhost_vdpa_set_dev_features_fd(device_fd, features);
> > > > > + if (unlikely(r < 0)) {
> > > > > + error_setg_errno(errp, -r, "Cannot set device features");
> > > > > + return r;
> > > > > + }
> > > > > +
> > > > > + cvq_group = vhost_vdpa_get_vring_group(device_fd, cvq_index, errp);
> > > > > + if (unlikely(cvq_group < 0)) {
> > > > > + return cvq_group;
> > > > > + }
> > > > > +
> > > > > + for (int i = 0; i < cvq_index; ++i) {
> > > > > + int64_t group = vhost_vdpa_get_vring_group(device_fd, i, errp);
> > > > > +
> > > > > + if (unlikely(group < 0)) {
> > > > > + return group;
> > > > > + }
> > > > > +
> > > > > + if (group == (int64_t)cvq_group) {
> > > > > + return 0;
> > > > > + }
> > > > > + }
> > > > > +
> > > > > + return 1;
> > > > > +}
> > > > > +
> > > > > +/**
> > > > > + * Probe if CVQ is isolated when the device is MQ and when it is not MQ
> > > > > + *
> > > > > + * @device_fd The vdpa device fd
> > > > > + * @features Features offered by the device.
> > > > > + * @cvq_index The control vq index if mq is negotiated. Ignored
> > > > > + * otherwise.
> > > > > + *
> > > > > + * Returns <0 in case of failure, 0 if false and 1 if true.
> > > > > + */
> > > > > +static int vhost_vdpa_probe_cvq_isolation(int device_fd, uint64_t features,
> > > > > + int cvq_index, Error **errp)
> > > > > +{
> > > > > + uint64_t backend_features;
> > > > > + int r;
> > > > > +
> > > > > + ERRP_GUARD();
> > > > > +
> > > > > + r = ioctl(device_fd, VHOST_GET_BACKEND_FEATURES, &backend_features);
> > > > > + if (unlikely(r < 0)) {
> > > > > + error_setg_errno(errp, errno, "Cannot get vdpa backend_features");
> > > > > + return r;
> > > > > + }
> > > > > +
> > > > > + if (!(backend_features & BIT_ULL(VHOST_BACKEND_F_IOTLB_ASID))) {
> > > > > + return 0;
> > > > > + }
> > > > > +
> > > > > + r = vhost_vdpa_cvq_is_isolated(device_fd,
> > > > > + features & ~BIT_ULL(VIRTIO_NET_F_MQ), 2,
> > > > > + errp);
> > > > > + if (unlikely(r < 0)) {
> > > > > + if (r != -ENOTSUP) {
> > > > > + return r;
> > > > > + }
> > > > > +
> > > > > + /*
> > > > > > + * The kernel reports VHOST_BACKEND_F_IOTLB_ASID if the vdpa frontend
> > > > > > + * supports ASID even if the parent driver does not. The CVQ cannot be
> > > > > > + * isolated in this case.
> > > > > + */
> > > > > + error_free(*errp);
> > > > > + *errp = NULL;
> > > > > + return 0;
> > > > > + }
> > > > > +
> > > > > + if (r == 0) {
> > > > > + return 0;
> > > > > + }
> > > > > +
> > > > > + vhost_vdpa_reset_status_fd(device_fd);
> > > > > + if (!(features & BIT_ULL(VIRTIO_NET_F_MQ))) {
> > > > > + return 0;
> > > > > + }
> > > > > +
> > > > > + r = vhost_vdpa_cvq_is_isolated(device_fd, features, cvq_index * 2, errp);
> > > >
> > > > I think checking this once should be sufficient. That is to say, it
> > > > should be a bug if there's hardware that puts cvq in a dedicated group
> > > > in MQ but not in SQ.
> > > >
> > >
> > > This is checking the NIC is not buggy :). Otherwise, we're giving
> > > the guest access to the CVQ shadow vring. And, currently, SVQ code
> > > assumes only QEMU can access it.
> >
> > Just to make sure we are on the same page: I meant that the hardware
> > must be buggy if the isolation of cvq is not consistent between
> > single and multiqueue.
> >
>
> Yes, I got you.
>
> The problem with that particular bug is that we will handle the guest's
> vring with the bad IOVA tree. Since QEMU is not sanitizing those
> descriptors anymore, the device can be used to write to QEMU memory.
> At this time only the SVQ vring and in buffers should be writable this
> way, so it's not a big deal.
>
> This can also happen if the device is buggy in other ways. For
> example, reporting that CVQ is isolated at VHOST_VDPA_GET_VRING_GROUP
> but then handling maps ignoring the ASID parameter. There is no
> protection for that, so I agree this double check makes little sense.

OK, so you will repost with this check removed?

> > >
> > > But maybe this made more sense in previous versions, where the series
> > > also cached the cvq group here. If I understand you correctly, it is
> > > enough to check that CVQ is isolated in SQ, and assume it will be
> > > isolated also in MQ, right? I can modify the patch that way if you
> > > confirm this.
> >
> > I think so, or just negotiate with what hardware provides us and check.
> >
>
> To always probe with SQ makes the code simpler, but let me know if you
> think there are advantages to probing otherwise.
>
> Thanks!
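
For reference, the group comparison that the probe performs (and that
would run only once if the MQ check is dropped) can be sketched as
below. This is a minimal, self-contained illustration and not the QEMU
code: MockVdpaDev, mock_get_vring_group and mock_cvq_is_isolated are
hypothetical names, and the per-vq group table stands in for the
VHOST_VDPA_GET_VRING_GROUP ioctl so the logic can be exercised without
a device.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a vhost-vdpa device: the group of each
 * virtqueue comes from a caller-provided table instead of the
 * VHOST_VDPA_GET_VRING_GROUP ioctl.
 */
typedef struct {
    const int64_t *vq_groups; /* group id per vq index */
    size_t n_vqs;             /* number of entries in vq_groups */
} MockVdpaDev;

/* Return the group of vq_index, or -1 if the index is out of range. */
static int64_t mock_get_vring_group(const MockVdpaDev *dev, size_t vq_index)
{
    assert(dev != NULL && dev->vq_groups != NULL);

    if (vq_index >= dev->n_vqs) {
        return -1;
    }
    return dev->vq_groups[vq_index];
}

/*
 * Return 1 if the vq at cvq_index shares its group with no
 * lower-indexed vq, 0 if it shares a group, and -1 on error.
 */
static int mock_cvq_is_isolated(const MockVdpaDev *dev, size_t cvq_index)
{
    int64_t cvq_group = mock_get_vring_group(dev, cvq_index);

    if (cvq_group < 0) {
        return -1;
    }

    for (size_t i = 0; i < cvq_index; ++i) {
        int64_t group = mock_get_vring_group(dev, i);

        if (group < 0) {
            return -1;
        }
        if (group == cvq_group) {
            return 0;
        }
    }

    return 1;
}
```

With a single queue pair the CVQ sits at index 2, so a table such as
{0, 0, 1} reports the CVQ as isolated and {0, 0, 0} does not. As noted
in the thread, this only trusts what the device reports; it cannot
detect a device that ignores the ASID when handling maps.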