From: "Michael S. Tsirkin" <mst@redhat.com>
To: Eugenio Perez Martin <eperezma@redhat.com>
Cc: linux-kernel@vger.kernel.org, kvm list <kvm@vger.kernel.org>,
virtualization@lists.linux-foundation.org,
netdev@vger.kernel.org, Jason Wang <jasowang@redhat.com>
Subject: Re: [PATCH RFC v7 03/14] vhost: use batched get_vq_desc version
Date: Tue, 16 Jun 2020 18:08:04 -0400
Message-ID: <20200616180136-mutt-send-email-mst@kernel.org>
In-Reply-To: <CAJaqyWeX7knekVPcsZ2+AAf8zvZhPvt46fZncAsLhwYJ3eUa1g@mail.gmail.com>
On Tue, Jun 16, 2020 at 05:23:43PM +0200, Eugenio Perez Martin wrote:
> On Mon, Jun 15, 2020 at 6:05 PM Eugenio Pérez <eperezma@redhat.com> wrote:
> >
> > On Thu, 2020-06-11 at 07:30 -0400, Michael S. Tsirkin wrote:
> > > On Wed, Jun 10, 2020 at 06:18:32PM +0200, Eugenio Perez Martin wrote:
> > > > On Wed, Jun 10, 2020 at 5:13 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > On Wed, Jun 10, 2020 at 02:37:50PM +0200, Eugenio Perez Martin wrote:
> > > > > > > +/* This function returns a value > 0 if a descriptor was found, or 0 if none were found.
> > > > > > > + * A negative code is returned on error. */
> > > > > > > +static int fetch_descs(struct vhost_virtqueue *vq)
> > > > > > > +{
> > > > > > > + int ret;
> > > > > > > +
> > > > > > > + if (unlikely(vq->first_desc >= vq->ndescs)) {
> > > > > > > + vq->first_desc = 0;
> > > > > > > + vq->ndescs = 0;
> > > > > > > + }
> > > > > > > +
> > > > > > > + if (vq->ndescs)
> > > > > > > + return 1;
> > > > > > > +
> > > > > > > + for (ret = 1;
> > > > > > > + ret > 0 && vq->ndescs <= vhost_vq_num_batch_descs(vq);
> > > > > > > + ret = fetch_buf(vq))
> > > > > > > + ;
> > > > > >
> > > > > > (Expanding comment in V6):
> > > > > >
> > > > > > We get an infinite loop this way:
> > > > > > * vq->ndescs == 0, so we call fetch_buf() here
> > > > > > * fetch_buf gets fewer than vhost_vq_num_batch_descs(vq) descriptors, so ret = 1
> > > > > > * This loop calls fetch_buf again, but vq->ndescs > 0 (and avail_idx ==
> > > > > > last_avail_idx), so it just returns 1
> > > > >
> > > > > That's what
> > > > > [PATCH RFC v7 08/14] fixup! vhost: use batched get_vq_desc version
> > > > > is supposed to fix.
> > > > >
> > > >
> > > > Sorry, I forgot to include that fixup.
> > > >
> > > > With it I don't see CPU stalls, but with that version latency has
> > > > increased a lot and I see packet lost:
> > > > + ping -c 5 10.200.0.1
> > > > PING 10.200.0.1 (10.200.0.1) 56(84) bytes of data.
> > > > From 10.200.0.2 icmp_seq=1 Destination Host Unreachable
> > > > From 10.200.0.2 icmp_seq=2 Destination Host Unreachable
> > > > From 10.200.0.2 icmp_seq=3 Destination Host Unreachable
> > > > 64 bytes from 10.200.0.1: icmp_seq=5 ttl=64 time=6848 ms
> > > >
> > > > --- 10.200.0.1 ping statistics ---
> > > > 5 packets transmitted, 1 received, +3 errors, 80% packet loss, time 76ms
> > > > rtt min/avg/max/mdev = 6848.316/6848.316/6848.316/0.000 ms, pipe 4
> > > > --
> > > >
> > > > I cannot even use netperf.
> > >
> > > OK so that's the bug to try to find and fix I think.
> > >
> > >
> > > > If I modify with my proposed version:
> > > > + ping -c 5 10.200.0.1
> > > > PING 10.200.0.1 (10.200.0.1) 56(84) bytes of data.
> > > > 64 bytes from 10.200.0.1: icmp_seq=1 ttl=64 time=7.07 ms
> > > > 64 bytes from 10.200.0.1: icmp_seq=2 ttl=64 time=0.358 ms
> > > > 64 bytes from 10.200.0.1: icmp_seq=3 ttl=64 time=5.35 ms
> > > > 64 bytes from 10.200.0.1: icmp_seq=4 ttl=64 time=2.27 ms
> > > > 64 bytes from 10.200.0.1: icmp_seq=5 ttl=64 time=0.426 ms
> > >
> > > Not sure which version this is.
> > >
> > > > [root@localhost ~]# netperf -H 10.200.0.1 -p 12865 -l 10 -t TCP_STREAM
> > > > MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
> > > > 10.200.0.1 () port 0 AF_INET
> > > > Recv   Send    Send
> > > > Socket Socket  Message  Elapsed
> > > > Size   Size    Size     Time     Throughput
> > > > bytes  bytes   bytes    secs.    10^6bits/sec
> > > >
> > > > 131072  16384  16384    10.01    4742.36
> > > > [root@localhost ~]# netperf -H 10.200.0.1 -p 12865 -l 10 -t UDP_STREAM
> > > > MIGRATED UDP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
> > > > 10.200.0.1 () port 0 AF_INET
> > > > Socket  Message  Elapsed      Messages
> > > > Size    Size     Time         Okay Errors   Throughput
> > > > bytes   bytes    secs            #      #   10^6bits/sec
> > > >
> > > > 212992   65507   10.00        9214      0     482.83
> > > > 212992           10.00        9214            482.83
> > > >
> > > > I will compare with the non-batch version for reference, but the
> > > > difference between the two is noticeable. Maybe it's worth finding a
> > > > good value for the if() inside fetch_buf?
> > > >
> > > > Thanks!
> > > >
> > >
> > > I don't think it's performance, I think it's a bug somewhere,
> > > e.g. maybe we corrupt a packet, or stall the queue, or
> > > something like this.
> > >
> > > Let's do this, I will squash the fixups and post v8 so you can bisect
> > > and then debug cleanly.
> >
> > OK, so if we apply the patch proposed in v7 08/14 (or the v8 patchset that was sent), this is what happens:
> >
> > 1. Userland (virtio_test in my case) adds just one buffer to the vq and kicks.
> > 2. The vhost module reaches fetch_descs, called from vhost_get_vq_desc. From there we call fetch_buf in a for loop.
> > 3. The first call to fetch_buf correctly returns one buffer. However, the second call returns 0,
> > because vq->avail_idx == vq->last_avail_idx, so it takes the "vq->avail_idx == last_avail_idx" code path.
> > 4. fetch_descs assigns ret = 0, so it returns 0. vhost_get_vq_desc then jumps to err and signals that there is no new buffer
> > (returning vq->num).
> >
> > So, to fix it while keeping the batching, maybe we could return vq->ndescs when ret == 0:
> >
> > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > index c0dfb5e3d2af..5993d4f34ca9 100644
> > --- a/drivers/vhost/vhost.c
> > +++ b/drivers/vhost/vhost.c
> > @@ -2315,7 +2327,8 @@ static int fetch_descs(struct vhost_virtqueue *vq)
> >
> > /* On success we expect some descs */
> > BUG_ON(ret > 0 && !vq->ndescs);
> > - return ret;
> > + return ret ?: vq->ndescs;
I'd rather we used standard C. Also, ret < 0 needs
to be handled. And what if fetching some descs fails
but others succeed?
What do we want to do then?
Maybe:
	return vq->ndescs ? vq->ndescs : ret;
> > }
> >
> > /* Reverse the effects of fetch_descs */
> > --
> >
> > Another possibility could be to return different codes from fetch_buf, but I find the suggested modification easier.
> >
> > What do you think?
> >
> > Thanks!
> >
>
> Hi!
>
> I can send a proposed RFC v9 if that is more convenient for you.
>
> Thanks!
Excellent, pls go ahead!
And can you include the performance numbers?
It's enough to test the final version.
--
MST