From: "Michael S. Tsirkin" <mst@redhat.com>
To: Ilya Maximets <i.maximets@ovn.org>
Cc: Stefan Hajnoczi <stefanha@gmail.com>,
qemu-devel@nongnu.org, Jason Wang <jasowang@redhat.com>,
Stefan Hajnoczi <stefanha@redhat.com>
Subject: Re: [PATCH] virtio: use shadow_avail_idx while checking number of heads
Date: Mon, 25 Sep 2023 18:24:33 -0400 [thread overview]
Message-ID: <20230925182026-mutt-send-email-mst@kernel.org> (raw)
In-Reply-To: <c12950b0-8222-23db-0f25-55a106b98e0c@ovn.org>
On Tue, Sep 26, 2023 at 12:13:11AM +0200, Ilya Maximets wrote:
> On 9/25/23 23:24, Michael S. Tsirkin wrote:
> > On Mon, Sep 25, 2023 at 10:58:05PM +0200, Ilya Maximets wrote:
> >> On 9/25/23 17:38, Stefan Hajnoczi wrote:
> >>> On Mon, 25 Sept 2023 at 11:36, Ilya Maximets <i.maximets@ovn.org> wrote:
> >>>>
> >>>> On 9/25/23 17:12, Stefan Hajnoczi wrote:
> >>>>> On Mon, 25 Sept 2023 at 11:02, Ilya Maximets <i.maximets@ovn.org> wrote:
> >>>>>>
> >>>>>> On 9/25/23 16:23, Stefan Hajnoczi wrote:
> >>>>>>> On Fri, 25 Aug 2023 at 13:04, Ilya Maximets <i.maximets@ovn.org> wrote:
> >>>>>>>>
> >>>>>>>> We do not need the most up to date number of heads, we only want to
> >>>>>>>> know if there is at least one.
> >>>>>>>>
> >>>>>>>> Use shadow variable as long as it is not equal to the last available
> >>>>>>>> index checked. This avoids expensive qatomic dereference of the
> >>>>>>>> RCU-protected memory region cache as well as the memory access itself
> >>>>>>>> and the subsequent memory barrier.
> >>>>>>>>
> >>>>>>>> The change improves performance of the af-xdp network backend by 2-3%.
> >>>>>>>>
> >>>>>>>> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
> >>>>>>>> ---
> >>>>>>>> hw/virtio/virtio.c | 10 +++++++++-
> >>>>>>>> 1 file changed, 9 insertions(+), 1 deletion(-)
> >>>>>>>>
> >>>>>>>> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> >>>>>>>> index 309038fd46..04bf7cc977 100644
> >>>>>>>> --- a/hw/virtio/virtio.c
> >>>>>>>> +++ b/hw/virtio/virtio.c
> >>>>>>>> @@ -999,7 +999,15 @@ void virtqueue_push(VirtQueue *vq, const VirtQueueElement *elem,
> >>>>>>>> /* Called within rcu_read_lock(). */
> >>>>>>>> static int virtqueue_num_heads(VirtQueue *vq, unsigned int idx)
> >>>>>>>> {
> >>>>>>>> - uint16_t num_heads = vring_avail_idx(vq) - idx;
> >>>>>>>> + uint16_t num_heads;
> >>>>>>>> +
> >>>>>>>> + if (vq->shadow_avail_idx != idx) {
> >>>>>>>> + num_heads = vq->shadow_avail_idx - idx;
> >>>>>>>> +
> >>>>>>>> + return num_heads;
> >>>>>>>
> >>>>>>> This still needs to check num_heads > vq->vring.num and return -EINVAL
> >>>>>>> as is done below.
> >>>>>>
> >>>>>> Hmm, yeas, you're right. If the value was incorrect initially, the shadow
> >>>>>> will be incorrect. However, I think we should just not return here in this
> >>>>>> case and let vring_avail_idx() to grab an actual new value below. Otherwise
> >>>>>> we may never break out of this error.
> >>>>>>
> >>>>>> Does that make sense?
> >>>>>
> >>>>> No, because virtio_error() marks the device as broken. The device
> >>>>> requires a reset in order to function again. Fetching
> >>>>> vring_avail_idx() again won't help.
> >>>>
> >>>> OK, I see. In this case we're talking about situation where
> >>>> vring_avail_idx() was called in some other place and stored a bad value
> >>>> in the shadow variable, then virtqueue_num_heads() got called. Right?
> >>
> >> Hmm, I suppose we also need a read barrier after all even if we use
> >> a shadow index. Assuming the index is correct, but the shadow variable
> >> was updated by a call outside of this function, then we may miss a
> >> barrier and read the descriptor out of order, in theory. Read barrier
> >> is going to be a compiler barrier on x86, so the performance gain from
> >> this patch should still be mostly there. I'll test that.
> >
> > I can't say I understand generally. shadow is under qemu control,
> > I don't think it can be updated concurrently by multiple CPUs.
>
> It can't, I agree. Scenario I'm thinking about is the following:
>
> 1. vring_avail_idx() is called from one of the places other than
> virtqueue_num_heads(). Shadow is updated with the current value.
> Some users of vring_avail_idx() do not use barriers after the call.
>
> 2. virtqueue_split_get_avail_bytes() is called.
>
> 3. virtqueue_split_get_avail_bytes() calls virtqueue_num_heads().
>
> 4. virtqueue_num_heads() checks the shadow and returns early.
>
> 5. virtqueue_split_get_avail_bytes() calls vring_split_desc_read() and
> reads the descriptor.
>
> If between steps 1 and 5 we do not have a read barrier, we potentially
> risk reading descriptor data that is not yet fully written, because
> there is no guarantee that reading the last_avail_idx on step 1 wasn't
> reordered with the descriptor read.
>
> In current code we always have smp_rmb() in virtqueue_num_heads().
> But if we return from this function without a barrier, we may have an
> issue, IIUC.
>
> I agree that it's kind of a very unlikely scenario and we will probably
> have a control dependency between steps 1 and 5 that will prevent the
> issue, but it might be safer to just have an explicit barrier in
> virtqueue_num_heads().
>
> Does that make sense? Or am I missing something else here?
Aha, got it. Good point, yes. Pls document in a code comment.
> >
> >
> >>>>
> >>>> AFAIU, we can still just fall through here and let vring_avail_idx()
> >>>> to read the index again and fail the existing check. That would happen
> >>>> today without this patch applied.
> >>>
> >>> Yes, that is fine.
> >>>
> >>>>
> >>>> I'm jut trying to avoid duplication of the virtio_error call, i.e.:
> >>>>
> >>>> if (vq->shadow_avail_idx != idx) {
> >>>> num_heads = vq->shadow_avail_idx - idx;
> >>>>
> >>>> /* Check it isn't doing very strange things with descriptor numbers. */
> >>>> if (num_heads > vq->vring.num) {
> >>>> virtio_error(vq->vdev, "Guest moved used index from %u to %u",
> >>>> idx, vq->shadow_avail_idx);
> >>>> return -EINVAL;
> >>>> }
> >>>> return num_heads;
> >>>> }
> >>>>
> >>>> vs
> >>>>
> >>>> if (vq->shadow_avail_idx != idx) {
> >>>> num_heads = vq->shadow_avail_idx - idx;
> >>>>
> >>>> /* Only use the shadow value if it was good initially. */
> >>>> if (num_heads <= vq->vring.num) {
> >>>> return num_heads;
> >>>> }
> >>>> }
> >>>>
> >>>>
> >>>> What do you think?
> >>>
> >>> Sounds good.
> >>>
> >>>>
> >>>> Best regards, Ilya Maximets.
> >
next prev parent reply other threads:[~2023-09-25 22:25 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-08-25 17:04 [PATCH] virtio: use shadow_avail_idx while checking number of heads Ilya Maximets
2023-09-25 11:20 ` Ilya Maximets
2023-09-25 14:23 ` Stefan Hajnoczi
2023-09-25 15:02 ` Ilya Maximets
2023-09-25 15:12 ` Stefan Hajnoczi
2023-09-25 15:36 ` Ilya Maximets
2023-09-25 15:38 ` Stefan Hajnoczi
2023-09-25 20:58 ` Ilya Maximets
2023-09-25 21:24 ` Michael S. Tsirkin
2023-09-25 22:13 ` Ilya Maximets
2023-09-25 22:24 ` Michael S. Tsirkin [this message]
2023-09-27 14:01 ` Ilya Maximets
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230925182026-mutt-send-email-mst@kernel.org \
--to=mst@redhat.com \
--cc=i.maximets@ovn.org \
--cc=jasowang@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=stefanha@gmail.com \
--cc=stefanha@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.