From: David Hildenbrand <david@redhat.com>
To: Halil Pasic <pasic@linux.ibm.com>
Cc: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org,
virtualization@lists.linux.dev, kvm@vger.kernel.org,
Chandra Merla <cmerla@redhat.com>,
Stable@vger.kernel.org, Cornelia Huck <cohuck@redhat.com>,
Thomas Huth <thuth@redhat.com>,
Eric Farman <farman@linux.ibm.com>,
Heiko Carstens <hca@linux.ibm.com>,
Vasily Gorbik <gor@linux.ibm.com>,
Alexander Gordeev <agordeev@linux.ibm.com>,
Christian Borntraeger <borntraeger@linux.ibm.com>,
Sven Schnelle <svens@linux.ibm.com>,
"Michael S. Tsirkin" <mst@redhat.com>,
Wei Wang <wei.w.wang@intel.com>
Subject: Re: [PATCH v1] s390/virtio_ccw: don't allocate/assign airqs for non-existing queues
Date: Fri, 4 Apr 2025 12:00:55 +0200 [thread overview]
Message-ID: <4a33daa3-7415-411e-a491-07635e3cfdc4@redhat.com> (raw)
In-Reply-To: <20250404063619.0fa60a41.pasic@linux.ibm.com>
On 04.04.25 06:36, Halil Pasic wrote:
> On Thu, 3 Apr 2025 16:28:31 +0200
> David Hildenbrand <david@redhat.com> wrote:
>
>>> Sorry I have to have a look at that discussion. Maybe it will answer
>>> some my questions.
>>
>> Yes, I think so.
>>
>>>
>>>> Let's fix it without affecting existing setups for now by properly
>>>> ignoring the non-existing queues, so the indicator bits will match
>>>> the queue indexes.
>>>
>>> Just one question. My understanding is that the crux is that Linux
>>> and QEMU (or the driver and the device) disagree at which index
>>> reporting_vq is actually sitting. Is that right?
>>
>> I thought I made it clear: this is only about the airq indicator bit.
>> That's where both disagree.
>>
>> Not the actual queue index (see above).
>
> I did some more research including having a look at that discussion. Let
> me try to sum up how did we end up here.
Let me add some more details after digging as well:
>
> Before commit a229989d975e ("virtio: don't allocate vqs when names[i] =
> NULL") the kernel behavior used to be in spec, but QEMU and possibly
> other hypervisor were out of spec and things did not work.
It all started with VIRTIO_BALLOON_F_FREE_PAGE_HINT. Before that,
we only had the single optional VIRTIO_BALLOON_F_STATS_VQ queue at the very
end. So there was no possibility for holes "in-between".
In the Linux driver, we created the stats queue only if the feature bit
VIRTIO_BALLOON_F_STATS_VQ was actually around:
nvqs = virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ) ? 3 : 2;
err = virtio_find_vqs(vb->vdev, nvqs, vqs, callbacks, names, NULL);
That changed with VIRTIO_BALLOON_F_FREE_PAGE_HINT, because we would
unconditionally create 4 queues. QEMU always supported the first 3 queues
unconditionally, but old QEMU did obviously not support the (new)
VIRTIO_BALLOON_F_FREE_PAGE_HINT queue.
390x didn't particularly like getting queried for non-existing
queues. [1] So the fix was not for a hypervisor that was out of spec, but
because quering non-existing queues didn't work.
The fix implied that if VIRTIO_BALLOON_F_STATS_VQ is missing, suddenly the queue
index of VIRTIO_BALLOON_F_FREE_PAGE_HINT changed as well.
Again, as QEMU always implemented the 3 first queues unconditionally, this was
not a problem.
[1] https://lore.kernel.org/all/c6746307-fae5-7652-af8d-19f560fc31d9@de.ibm.com/#t
>
> Possibly because of the complexity of fixing the hypervisor(s) commit
> a229989d975e ("virtio: don't allocate vqs when names[i] = NULL") opted
> for changing the guest side so that it does not fit the spec but fits
> the hypervisor(s). It unfortunately also broke notifiers (for the with
> holes) scenario for virtio-ccw only.
Yes, it broke the notifiers.
But note that everything was in spec at that point, because we only documented
"free_page_vq == 3" in the spec *2 years later*, in 2020:
commit 38448268eba0c105200d131c3f7f660129a4d673
Author: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Date: Tue Aug 25 07:45:02 2020 -0700
content: Document balloon feature free page hints
Free page hints allow the balloon driver to provide information on what
pages are not currently in use so that we can avoid the cost of copying
them in migration scenarios. Add a feature description for free page hints
describing basic functioning and requirements.
At that point, what we documented in the spec *did not match reality* in
Linux. QEMU was fully compatible, because VIRTIO_BALLOON_F_STATS_VQ is
unconditionally set.
QEMU and Linux kept using that queue index assignment model, and the spec
was wrong (out of sync?) at that point. The spec got more wrong with
commit d917d4a8d552c003e046b0e3b1b529d98f7e695b
Author: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Date: Tue Aug 25 07:45:17 2020 -0700
content: Document balloon feature free page reporting
Free page reporting is a feature that allows the guest to proactively
report unused pages to the host. By making use of this feature is is
possible to reduce the overall memory footprint of the guest in cases where
some significant portion of the memory is idle. Add documentation for the
free page reporting feature describing the functionality and requirements.
Where we documented VIRTIO_BALLOON_F_REPORTING after the changes were added to
QEMU+Linux implementation, so the spec did not reflect reality.
I'll note also cloud-hypervisor [2] today follows that model.
In particular, it *only* supports VIRTIO_BALLOON_F_REPORTING, turning
the queue index of VIRTIO_BALLOON_F_REPORTING into *2* instead of documented
in the spec to be *4*.
So in reality, we can see VIRTIO_BALLOON_F_REPORTING to be either 2/3/4, depending
on the availability of the other two features/queues.
[2] https://github.com/cloud-hypervisor/cloud-hypervisor/blob/main/virtio-devices/src/balloon.rs
>
> Now we had another look at this, and have concluded that fixing the
> hypervisor(s) and fixing the kernel, and making sure that the fixed
> kernel can tolerate the old broken hypervisor(s) is way to complicated
> if possible at all. So we decided to give the spec a reality check and
> fix the notifier bit assignment for virtio-ccw which is broken beyond
> doubt if we accept that the correct virtqueue index is the one that the
> hypervisor(s) use and not the one that the spec says they should use.
In case of virtio-balloon, it's unfortunate that it went that way, but the
spec simply did not / does not reflect reality when it was added to the spec.
>
> With the spec fixed, the whole notion of "holes" will be something that
> does not make sense any more. With that the merit of the kernel interface
> virtio_find_vqs() supporting "holes" is quite questionable. Now we need
> it because the drivers within the Linux kernel still think of the queues
> in terms of the current spec, i.e. they try to have the "holes" as
> mandated by the spec, and the duty of making it work with the broken
> device implementations falls to the transports.
>
Right, the "holes" only exist in the input array.
> Under the assumption that the spec is indeed going to be fixed:
>
> Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
Thanks!
--
Cheers,
David / dhildenb
next prev parent reply other threads:[~2025-04-04 10:01 UTC|newest]
Thread overview: 63+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-04-02 20:36 [PATCH v1] s390/virtio_ccw: don't allocate/assign airqs for non-existing queues David Hildenbrand
2025-04-03 9:44 ` Thomas Huth
2025-04-03 12:45 ` Cornelia Huck
2025-04-03 12:57 ` Michael S. Tsirkin
2025-04-03 13:12 ` Christian Borntraeger
2025-04-03 14:18 ` Halil Pasic
2025-04-03 14:28 ` David Hildenbrand
2025-04-04 4:36 ` Halil Pasic
2025-04-04 10:00 ` David Hildenbrand [this message]
2025-04-04 10:55 ` David Hildenbrand
2025-04-04 13:36 ` Halil Pasic
2025-04-04 13:48 ` David Hildenbrand
2025-04-04 14:00 ` Halil Pasic
2025-04-04 14:17 ` David Hildenbrand
2025-04-04 15:39 ` Halil Pasic
2025-04-04 16:49 ` David Hildenbrand
2025-04-04 17:36 ` David Hildenbrand
2025-04-07 7:52 ` Michael S. Tsirkin
2025-04-07 8:17 ` David Hildenbrand
2025-04-07 8:34 ` Michael S. Tsirkin
2025-04-07 8:44 ` David Hildenbrand
2025-04-07 8:49 ` Michael S. Tsirkin
2025-04-07 8:54 ` David Hildenbrand
2025-04-07 8:58 ` Michael S. Tsirkin
2025-04-07 9:11 ` David Hildenbrand
2025-04-07 9:13 ` David Hildenbrand
2025-04-07 13:13 ` David Hildenbrand
2025-04-07 17:39 ` Daniel Verkamp
2025-04-07 18:47 ` David Hildenbrand
2025-04-07 21:09 ` Daniel Verkamp
2025-04-09 11:02 ` David Hildenbrand
2025-04-07 21:20 ` Michael S. Tsirkin
2025-04-09 10:46 ` David Hildenbrand
2025-04-09 10:56 ` Michael S. Tsirkin
2025-04-09 11:12 ` David Hildenbrand
2025-04-09 12:07 ` Michael S. Tsirkin
2025-04-09 12:24 ` David Hildenbrand
2025-04-09 16:08 ` Michael S. Tsirkin
2025-04-07 9:37 ` Michael S. Tsirkin
2025-04-07 13:12 ` Halil Pasic
2025-04-07 13:17 ` David Hildenbrand
2025-04-07 13:28 ` Cornelia Huck
2025-04-07 13:32 ` Michael S. Tsirkin
2025-04-07 17:26 ` Halil Pasic
2025-04-07 8:38 ` David Hildenbrand
2025-04-07 8:44 ` Michael S. Tsirkin
2025-04-07 8:50 ` David Hildenbrand
2025-04-07 9:22 ` David Hildenbrand
2025-04-07 8:41 ` Michael S. Tsirkin
2025-04-06 18:42 ` Michael S. Tsirkin
2025-04-07 7:18 ` David Hildenbrand
2025-04-07 8:54 ` Michael S. Tsirkin
2025-04-07 9:08 ` David Hildenbrand
2025-04-06 15:40 ` Michael S. Tsirkin
2025-04-03 14:35 ` Michael S. Tsirkin
2025-04-04 4:02 ` Halil Pasic
2025-04-04 5:33 ` Michael S. Tsirkin
2025-04-04 12:05 ` Halil Pasic
2025-04-10 18:44 ` David Hildenbrand
2025-04-11 11:11 ` Christian Borntraeger
2025-04-11 12:42 ` Heiko Carstens
2025-04-11 12:47 ` Christian Borntraeger
2025-04-11 13:34 ` David Hildenbrand
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4a33daa3-7415-411e-a491-07635e3cfdc4@redhat.com \
--to=david@redhat.com \
--cc=Stable@vger.kernel.org \
--cc=agordeev@linux.ibm.com \
--cc=borntraeger@linux.ibm.com \
--cc=cmerla@redhat.com \
--cc=cohuck@redhat.com \
--cc=farman@linux.ibm.com \
--cc=gor@linux.ibm.com \
--cc=hca@linux.ibm.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-s390@vger.kernel.org \
--cc=mst@redhat.com \
--cc=pasic@linux.ibm.com \
--cc=svens@linux.ibm.com \
--cc=thuth@redhat.com \
--cc=virtualization@lists.linux.dev \
--cc=wei.w.wang@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox