From: "Michael S. Tsirkin" <mst@redhat.com>
To: "Steven Luong (sluong)" <sluong@cisco.com>
Cc: "Jerome Tollet (jtollet)" <jtollet@cisco.com>,
"Damjan Marion (damarion)" <damarion@cisco.com>,
"virtualization@lists.linux-foundation.org"
<virtualization@lists.linux-foundation.org>
Subject: Re: kernel vhost demands an interrupt from guest when the ring is full in order to enable guest to submit new packets to the queue
Date: Mon, 17 Dec 2018 19:15:55 -0500 [thread overview]
Message-ID: <20181217191142-mutt-send-email-mst@kernel.org> (raw)
In-Reply-To: <4FE702F9-8680-461F-822B-03E81082F4F2@cisco.com>
On Mon, Dec 17, 2018 at 11:56:59PM +0000, Steven Luong (sluong) wrote:
>
>
> On 12/17/18, 2:55 PM, "Michael S. Tsirkin" <mst@redhat.com> wrote:
>
> On Thu, Dec 13, 2018 at 11:24:28PM +0000, Steven Luong (sluong) via Virtualization wrote:
> > Folks,
> >
> >
> >
> > We came across a memory race condition between the VPP vhost driver and the kernel
> > vhost. VPP is running a tap interface over the vhost backend. In this case, VPP is
> > acting as the vhost driver and the kernel vhost is acting as the vhost
> > device.
> >
> >
> >
> > In the kernel vhost's TX traffic direction, which is VPP's RX traffic direction,
> > the kernel vhost is the producer and VPP is the consumer.
>
> Looking at the vhost-net kernel code, it seems to use the
> same terminology as virtio-net.
> Can you please clarify which ring number you use, 0 or 1?
>
> <SVL> 0. </SVL>
>
> I will assume ring 0 below.
>
>
>
> > Kernel vhost places
> > traffic in kernel vhost's TX vring. VPP removes traffic from VPP's RX vring.
>
>
> So VPP makes buffers available and vhost kernel uses them?
>
> <SVL> Correct. </SVL>
>
> > It
> > is inevitable that the vring may become full under heavy traffic, and the kernel
> > vhost may have to stop and wait for the guest (VPP) to empty the vring and
> > refill it with descriptors. When that happens, kernel vhost clears
> > the bit in the vring's used flags to request an interrupt/notification. Due to a
> > shared-memory race condition, VPP may miss the clearing of the vring's used
> > flags by kernel vhost and not send kernel vhost an interrupt after VPP
> > empties and refills the vring with new descriptors.
>
> So kernel vhost's wakeups are signalled through eventfd - I assume
> this is what you mean by an interrupt?
>
>
> To prevent the race that you describe, vhost has this code:
>
> /* OK, now we need to know about added descriptors. */
> if (!headcount) {
>         if (unlikely(busyloop_intr)) {
>                 vhost_poll_queue(&vq->poll);
>         } else if (unlikely(vhost_enable_notify(&net->dev, vq))) {
>                 /* They have slipped one in as we were
>                  * doing that: check again. */
>                 vhost_disable_notify(&net->dev, vq);
>                 continue;
>         }
>         /* Nothing new?  Wait for eventfd to tell us
>          * they refilled. */
>         goto out;
> }
>
> So the ring should be re-checked after notifications are enabled.
> If the ring is no longer empty, vhost will proceed to handle it.
> This code has been around for many years.
>
> Thus, if VPP doesn't work with it, I suspect it does something
> differently; e.g., is it possible that it is missing a memory
> barrier after writing out the available buffers and before
> checking the flags?
>
> <SVL> Can the race condition be avoided totally? Here is the scenario.
>
> t0: VPP refills the vring with new descriptors but has not yet set avail->idx to make them available to kernel vhost.
> t1: kernel vhost checks the vring, still sees no space available, but has not yet executed the line of code that clears used->flags.
> t2: VPP executes sfence and sets avail->idx to make the descriptors available to kernel vhost.
At this point you need a full memory barrier, e.g. mfence or a locked
instruction such as xchg; sfence only orders stores against other stores.
Otherwise the read of used->flags can get reordered before the write to
avail->idx.
> t3: VPP checks used->flags; it is still 1 (VRING_USED_F_NO_NOTIFY set), so VPP skips sending the interrupt.
> t4: kernel vhost clears used->flags to indicate that an interrupt is required, but VPP has just missed it.
Fine, but at this point kernel vhost will re-check the ring using
vhost_get_avail() and process the buffers, so the lack of a notification
isn't a problem:
bool vhost_enable_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
{
        __virtio16 avail_idx;
        int r;

        if (!(vq->used_flags & VRING_USED_F_NO_NOTIFY))
                return false;
        vq->used_flags &= ~VRING_USED_F_NO_NOTIFY;
        if (!vhost_has_feature(vq, VIRTIO_RING_F_EVENT_IDX)) {
                r = vhost_update_used_flags(vq);
                if (r) {
                        vq_err(vq, "Failed to enable notification at %p: %d\n",
                               &vq->used->flags, r);
                        return false;
                }
        } else {
                r = vhost_update_avail_event(vq, vq->avail_idx);
                if (r) {
                        vq_err(vq, "Failed to update avail event index at %p: %d\n",
                               vhost_avail_event(vq), r);
                        return false;
                }
        }
        /* They could have slipped one in as we were doing that: make
         * sure it's written, then check again. */
        smp_mb();
        r = vhost_get_avail(vq, avail_idx, &vq->avail->idx);
        if (r) {
                vq_err(vq, "Failed to check avail idx at %p: %d\n",
                       &vq->avail->idx, r);
                return false;
        }

        return vhost16_to_cpu(vq, avail_idx) != vq->avail_idx;
}
> The result is that kernel vhost got stuck even though the vring was filled with new descriptors.
It shouldn't get stuck. Maybe you are missing a full fence before reading used->flags?
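
To make this concrete, the driver-side kick path needs a full barrier
between publishing avail->idx and reading used->flags. Below is a minimal
sketch of that sequence in C11; publish_and_kick() and kick() are names
made up for illustration, and the struct vring layout is assumed to be
the usual one from linux/virtio_ring.h. This is not VPP code:

#include <stdatomic.h>
#include <stdint.h>
#include <linux/virtio_ring.h>  /* struct vring, VRING_USED_F_NO_NOTIFY */

extern void kick(struct vring *vq);  /* hypothetical kick helper */

/* Steps t2/t3 from the timeline above, done in a safe order. */
static void publish_and_kick(struct vring *vq, uint16_t new_avail_idx)
{
        /* t2: publish the refilled descriptors. */
        vq->avail->idx = new_avail_idx;

        /* Full barrier (mfence or a locked instruction on x86): orders
         * the store to avail->idx before the load of used->flags below.
         * sfence is not enough, it only orders stores against stores. */
        atomic_thread_fence(memory_order_seq_cst);

        /* t3: only now test whether the device wants a kick. */
        if (!(vq->used->flags & VRING_USED_F_NO_NOTIFY))
                kick(vq);  /* e.g. write to the kick eventfd */
}

With that ordering, either VPP sees the cleared flag and kicks, or the
re-check in vhost_enable_notify() (quoted above) sees the new avail->idx,
so the stall window is closed from both sides.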
> Steven
>
> </SVL>
>
>
> > Unfortunately, this missed
> > notification causes the kernel vhost to be stuck, because once the kernel
> > vhost is waiting for an interrupt/notification from the guest, only an
> > interrupt/notification from the guest can wake it up again and let the
> > guest's new packets in the vring be processed. This design seems vulnerable.
>
> The protocol right now relies on notifications never being lost.
>
> I can imagine an alternative where the device also re-checks
> the rings periodically, but that would involve running
> timers on the host side, which is only possible if there's a
> free processor that can handle them. Further,
> it would still lead to stalls and packet drops, which would
> be harder to debug.
>
> That's just my $.02; please feel free to take it up with the virtio TC,
> maybe others will feel differently.
>
> > Should the kernel vhost
> > totally depend on the notification from the guest? Quoting the text from
> >
> >
> >
> > http://docs.oasis-open.org/virtio/virtio/v1.0/virtio-v1.0.html
> >
> >
> >
> > /* The device uses this in used->flags to advise the driver: don’t kick me
> > * when you add a buffer. It’s unreliable, so it’s simply an
> > * optimization. */
> > #define VIRTQ_USED_F_NO_NOTIFY 1
> >
> >
> >
> > I interpret that to mean the notification is simply an optimization, not a
> > reliable notification mechanism.
>
> What was meant, I think, is that the suppression is unreliable: the driver
> may still kick after the device has set the flag, so the device must
> tolerate spurious notifications. A notification the device asked for, by
> clearing the flag, must still arrive.
>
> > So the kernel vhost should not bet the farm on it.
> >
> >
> >
> > We encounter the same race condition in kernel vhost's RX traffic direction,
> > which causes the kernel vhost to be stuck due to a missed interrupt/
> > notification, although it is less frequent.
>
>
> For the reverse direction the spec actually has some pseudo-code and a suggestion
> about handling that race:
>
> For optimal performance, a driver MAY disable used buffer notifications while processing the used
> ring, but beware the problem of missing notifications between emptying the ring and reenabling
> notifications. This is usually handled by re-checking for more used buffers after notifications are
> re-enabled:
>
> virtq_disable_used_buffer_notifications(vq);
>
> for (;;) {
>         if (vq->last_seen_used == le16_to_cpu(virtq->used.idx)) {
>                 virtq_enable_used_buffer_notifications(vq);
>                 mb();
>
>                 if (vq->last_seen_used == le16_to_cpu(virtq->used.idx))
>                         break;
>
>                 virtq_disable_used_buffer_notifications(vq);
>         }
>
>         struct virtq_used_elem *e = virtq.used->ring[vq->last_seen_used%vsz];
>         process_buffer(e);
>         vq->last_seen_used++;
> }
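
The same pattern in compilable form, with a C11 seq_cst fence standing in
for the spec's mb(). The struct virtq layout and the helper names here
are stand-ins in the spec's own naming style, not from any real driver:

#include <stdatomic.h>
#include <endian.h>

/* Sketch: drain the used ring with notifications disabled, re-checking
 * after re-enabling them so a buffer that slipped in is not missed. */
static void drain_used_ring(struct virtq *vq, unsigned int vsz)
{
        virtq_disable_used_buffer_notifications(vq);

        for (;;) {
                if (vq->last_seen_used == le16toh(vq->used->idx)) {
                        /* Ring looks empty: re-enable notifications,
                         * then re-check before going to sleep. */
                        virtq_enable_used_buffer_notifications(vq);
                        atomic_thread_fence(memory_order_seq_cst); /* mb() */
                        if (vq->last_seen_used == le16toh(vq->used->idx))
                                break;  /* really empty: wait for a notification */
                        virtq_disable_used_buffer_notifications(vq);
                }
                process_buffer(&vq->used->ring[vq->last_seen_used % vsz]);
                vq->last_seen_used++;
        }
}

This is the same enable-then-recheck dance as vhost_enable_notify() above,
just on the driver's side of the used ring.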
>
>
>
> >
> >
> > Steven
> >
>