public inbox for linux-s390@vger.kernel.org
 help / color / mirror / Atom feed
From: David Hildenbrand <david@redhat.com>
To: Halil Pasic <pasic@linux.ibm.com>
Cc: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org,
	virtualization@lists.linux.dev, kvm@vger.kernel.org,
	Chandra Merla <cmerla@redhat.com>,
	Stable@vger.kernel.org, Cornelia Huck <cohuck@redhat.com>,
	Thomas Huth <thuth@redhat.com>,
	Eric Farman <farman@linux.ibm.com>,
	Heiko Carstens <hca@linux.ibm.com>,
	Vasily Gorbik <gor@linux.ibm.com>,
	Alexander Gordeev <agordeev@linux.ibm.com>,
	Christian Borntraeger <borntraeger@linux.ibm.com>,
	Sven Schnelle <svens@linux.ibm.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	Wei Wang <wei.w.wang@intel.com>
Subject: Re: [PATCH v1] s390/virtio_ccw: don't allocate/assign airqs for non-existing queues
Date: Fri, 4 Apr 2025 12:55:09 +0200	[thread overview]
Message-ID: <d54fbf56-b462-4eea-a86e-3a0defb6298b@redhat.com> (raw)
In-Reply-To: <4a33daa3-7415-411e-a491-07635e3cfdc4@redhat.com>

On 04.04.25 12:00, David Hildenbrand wrote:
> On 04.04.25 06:36, Halil Pasic wrote:
>> On Thu, 3 Apr 2025 16:28:31 +0200
>> David Hildenbrand <david@redhat.com> wrote:
>>
>>>> Sorry I have to have a look at that discussion. Maybe it will answer
>>>> some my questions.
>>>
>>> Yes, I think so.
>>>
>>>>     
>>>>> Let's fix it without affecting existing setups for now by properly
>>>>> ignoring the non-existing queues, so the indicator bits will match
>>>>> the queue indexes.
>>>>
>>>> Just one question. My understanding is that the crux is that Linux
>>>> and QEMU (or the driver and the device) disagree at which index
>>>> reporting_vq is actually sitting. Is that right?
>>>
>>> I thought I made it clear: this is only about the airq indicator bit.
>>> That's where both disagree.
>>>
>>> Not the actual queue index (see above).
>>
>> I did some more research including having a look at that discussion. Let
>> me try to sum up how did we end up here.
> 
> Let me add some more details after digging as well:
> 
>>
>> Before commit a229989d975e ("virtio: don't allocate vqs when names[i] =
>> NULL") the kernel behavior used to be in spec, but QEMU and possibly
>> other hypervisor were out of spec and things did not work.
> 
> It all started with VIRTIO_BALLOON_F_FREE_PAGE_HINT. Before that,
> we only had the single optional VIRTIO_BALLOON_F_STATS_VQ queue at the very
> end. So there was no possibility for holes "in-between".
> 
> In the Linux driver, we created the stats queue only if the feature bit
> VIRTIO_BALLOON_F_STATS_VQ was actually around:
> 
> 	nvqs = virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ) ? 3 : 2;
> 	err = virtio_find_vqs(vb->vdev, nvqs, vqs, callbacks, names, NULL);
> 
> That changed with VIRTIO_BALLOON_F_FREE_PAGE_HINT, because we would
> unconditionally create 4 queues. QEMU always supported the first 3 queues
> unconditionally, but old QEMU did obviously not support the (new)
> VIRTIO_BALLOON_F_FREE_PAGE_HINT queue.
> 
> 390x didn't particularly like getting queried for non-existing
> queues. [1] So the fix was not for a hypervisor that was out of spec, but
> because quering non-existing queues didn't work.
> 
> The fix implied that if VIRTIO_BALLOON_F_STATS_VQ is missing, suddenly the queue
> index of VIRTIO_BALLOON_F_FREE_PAGE_HINT changed as well.
> 
> Again, as QEMU always implemented the 3 first queues unconditionally, this was
> not a problem.
> 
> [1] https://lore.kernel.org/all/c6746307-fae5-7652-af8d-19f560fc31d9@de.ibm.com/#t
> 
>>
>> Possibly because of the complexity of fixing the hypervisor(s) commit
>> a229989d975e ("virtio: don't allocate vqs when names[i] = NULL") opted
>> for changing the guest side so that it does not fit the spec but fits
>> the hypervisor(s). It unfortunately also broke notifiers (for the with
>> holes) scenario for virtio-ccw only.
> 
> Yes, it broke the notifiers.
> 
> But note that everything was in spec at that point, because we only documented
> "free_page_vq == 3" in the spec *2 years later*, in 2020:
> 
> commit 38448268eba0c105200d131c3f7f660129a4d673
> Author: Alexander Duyck <alexander.h.duyck@linux.intel.com>
> Date:   Tue Aug 25 07:45:02 2020 -0700
> 
>       content: Document balloon feature free page hints
>       
>       Free page hints allow the balloon driver to provide information on what
>       pages are not currently in use so that we can avoid the cost of copying
>       them in migration scenarios. Add a feature description for free page hints
>       describing basic functioning and requirements.
>       
> At that point, what we documented in the spec *did not match reality* in
> Linux. QEMU was fully compatible, because VIRTIO_BALLOON_F_STATS_VQ is
> unconditionally set.
> 
> 
> QEMU and Linux kept using that queue index assignment model, and the spec
> was wrong (out of sync?) at that point. The spec got more wrong with
> 
> commit d917d4a8d552c003e046b0e3b1b529d98f7e695b
> Author: Alexander Duyck <alexander.h.duyck@linux.intel.com>
> Date:   Tue Aug 25 07:45:17 2020 -0700
> 
>       content: Document balloon feature free page reporting
>       
>       Free page reporting is a feature that allows the guest to proactively
>       report unused pages to the host. By making use of this feature is is
>       possible to reduce the overall memory footprint of the guest in cases where
>       some significant portion of the memory is idle. Add documentation for the
>       free page reporting feature describing the functionality and requirements.
> 
> Where we documented VIRTIO_BALLOON_F_REPORTING after the changes were added to
> QEMU+Linux implementation, so the spec did not reflect reality.
> 
> I'll note also cloud-hypervisor [2] today follows that model.
> 
> In particular, it *only* supports VIRTIO_BALLOON_F_REPORTING, turning
> the queue index of VIRTIO_BALLOON_F_REPORTING into *2* instead of documented
> in the spec to be *4*.
> 
> So in reality, we can see VIRTIO_BALLOON_F_REPORTING to be either 2/3/4, depending
> on the availability of the other two features/queues.
> 
> [2] https://github.com/cloud-hypervisor/cloud-hypervisor/blob/main/virtio-devices/src/balloon.rs
> 
> 
>>
>> Now we had another look at this, and have concluded that fixing the
>> hypervisor(s) and fixing the kernel, and making sure that the fixed
>> kernel can tolerate the old broken hypervisor(s) is way to complicated
>> if possible at all. So we decided to give the spec a reality check and
>> fix the notifier bit assignment for virtio-ccw which is broken beyond
>> doubt if we accept that the correct virtqueue index is the one that the
>> hypervisor(s) use and not the one that the spec says they should use.
> 
> In case of virtio-balloon, it's unfortunate that it went that way, but the
> spec simply did not / does not reflect reality when it was added to the spec.
> 
>>
>> With the spec fixed, the whole notion of "holes" will be something that
>> does not make sense any more. With that the merit of the kernel interface
>> virtio_find_vqs() supporting "holes" is quite questionable. Now we need
>> it because the drivers within the Linux kernel still think of the queues
>> in terms of the current spec, i.e. they try to have the "holes" as
>> mandated by the spec, and the duty of making it work with the broken
>> device implementations falls to the transports.
>>
> 
> Right, the "holes" only exist in the input array.
> 
>> Under the assumption that the spec is indeed going to be fixed:

For virito-balloon, we should probably do the following:

 From 38e340c2bb53c2a7cc7c675f5dfdd44ecf7701d9 Mon Sep 17 00:00:00 2001
From: David Hildenbrand <david@redhat.com>
Date: Fri, 4 Apr 2025 12:53:16 +0200
Subject: [PATCH] virtio-balloon: Fix queue index assignment for
  non-existing queues

Signed-off-by: David Hildenbrand <david@redhat.com>
---
  device-types/balloon/description.tex | 22 ++++++++++++++++------
  1 file changed, 16 insertions(+), 6 deletions(-)

diff --git a/device-types/balloon/description.tex b/device-types/balloon/description.tex
index a1d9603..a7396ff 100644
--- a/device-types/balloon/description.tex
+++ b/device-types/balloon/description.tex
@@ -16,6 +16,21 @@ \subsection{Device ID}\label{sec:Device Types / Memory Balloon Device / Device I
    5
  
  \subsection{Virtqueues}\label{sec:Device Types / Memory Balloon Device / Virtqueues}
+
+\begin{description}
+\item[inflateq] Exists unconditionally.
+\item[deflateq] Exists unconditionally.
+\item[statsq] Only exists if VIRTIO_BALLOON_F_STATS_VQ is set.
+\item[free_page_vq] Only exists if VIRTIO_BALLOON_F_FREE_PAGE_HINT is set.
+\item[reporting_vq] Only exists if VIRTIO_BALLOON_F_PAGE_REPORTING is set.
+\end{description}
+
+\begin{note}
+Virtqueue indexes are assigned sequentially for existing queues, starting
+with index 0; consequently, if a virtqueue does not exist, it does not get
+an index assigned. Assuming all virtqueues exist for a device, the indexes
+are:
+
  \begin{description}
  \item[0] inflateq
  \item[1] deflateq
@@ -23,12 +38,7 @@ \subsection{Virtqueues}\label{sec:Device Types / Memory Balloon Device / Virtque
  \item[3] free_page_vq
  \item[4] reporting_vq
  \end{description}
-
-  statsq only exists if VIRTIO_BALLOON_F_STATS_VQ is set.
-
-  free_page_vq only exists if VIRTIO_BALLOON_F_FREE_PAGE_HINT is set.
-
-  reporting_vq only exists if VIRTIO_BALLOON_F_PAGE_REPORTING is set.
+\end{note}
  
  \subsection{Feature bits}\label{sec:Device Types / Memory Balloon Device / Feature bits}
  \begin{description}
-- 
2.48.1


If something along these lines sounds reasonable, I can send a proper patch to the
proper audience.

-- 
Cheers,

David / dhildenb


  reply	other threads:[~2025-04-04 10:55 UTC|newest]

Thread overview: 63+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-04-02 20:36 [PATCH v1] s390/virtio_ccw: don't allocate/assign airqs for non-existing queues David Hildenbrand
2025-04-03  9:44 ` Thomas Huth
2025-04-03 12:45 ` Cornelia Huck
2025-04-03 12:57 ` Michael S. Tsirkin
2025-04-03 13:12 ` Christian Borntraeger
2025-04-03 14:18 ` Halil Pasic
2025-04-03 14:28   ` David Hildenbrand
2025-04-04  4:36     ` Halil Pasic
2025-04-04 10:00       ` David Hildenbrand
2025-04-04 10:55         ` David Hildenbrand [this message]
2025-04-04 13:36           ` Halil Pasic
2025-04-04 13:48             ` David Hildenbrand
2025-04-04 14:00               ` Halil Pasic
2025-04-04 14:17                 ` David Hildenbrand
2025-04-04 15:39                   ` Halil Pasic
2025-04-04 16:49                     ` David Hildenbrand
2025-04-04 17:36                       ` David Hildenbrand
2025-04-07  7:52                     ` Michael S. Tsirkin
2025-04-07  8:17                       ` David Hildenbrand
2025-04-07  8:34                         ` Michael S. Tsirkin
2025-04-07  8:44                           ` David Hildenbrand
2025-04-07  8:49                             ` Michael S. Tsirkin
2025-04-07  8:54                               ` David Hildenbrand
2025-04-07  8:58                                 ` Michael S. Tsirkin
2025-04-07  9:11                                   ` David Hildenbrand
2025-04-07  9:13                                     ` David Hildenbrand
2025-04-07 13:13                                       ` David Hildenbrand
2025-04-07 17:39                                         ` Daniel Verkamp
2025-04-07 18:47                                           ` David Hildenbrand
2025-04-07 21:09                                             ` Daniel Verkamp
2025-04-09 11:02                                               ` David Hildenbrand
2025-04-07 21:20                                             ` Michael S. Tsirkin
2025-04-09 10:46                                               ` David Hildenbrand
2025-04-09 10:56                                                 ` Michael S. Tsirkin
2025-04-09 11:12                                                   ` David Hildenbrand
2025-04-09 12:07                                                     ` Michael S. Tsirkin
2025-04-09 12:24                                                       ` David Hildenbrand
2025-04-09 16:08                                                         ` Michael S. Tsirkin
2025-04-07  9:37                                     ` Michael S. Tsirkin
2025-04-07 13:12                           ` Halil Pasic
2025-04-07 13:17                             ` David Hildenbrand
2025-04-07 13:28                               ` Cornelia Huck
2025-04-07 13:32                                 ` Michael S. Tsirkin
2025-04-07 17:26                                 ` Halil Pasic
2025-04-07  8:38                         ` David Hildenbrand
2025-04-07  8:44                           ` Michael S. Tsirkin
2025-04-07  8:50                             ` David Hildenbrand
2025-04-07  9:22                             ` David Hildenbrand
2025-04-07  8:41                     ` Michael S. Tsirkin
2025-04-06 18:42               ` Michael S. Tsirkin
2025-04-07  7:18                 ` David Hildenbrand
2025-04-07  8:54                   ` Michael S. Tsirkin
2025-04-07  9:08                     ` David Hildenbrand
2025-04-06 15:40           ` Michael S. Tsirkin
2025-04-03 14:35   ` Michael S. Tsirkin
2025-04-04  4:02     ` Halil Pasic
2025-04-04  5:33       ` Michael S. Tsirkin
2025-04-04 12:05         ` Halil Pasic
2025-04-10 18:44 ` David Hildenbrand
2025-04-11 11:11   ` Christian Borntraeger
2025-04-11 12:42     ` Heiko Carstens
2025-04-11 12:47       ` Christian Borntraeger
2025-04-11 13:34       ` David Hildenbrand

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d54fbf56-b462-4eea-a86e-3a0defb6298b@redhat.com \
    --to=david@redhat.com \
    --cc=Stable@vger.kernel.org \
    --cc=agordeev@linux.ibm.com \
    --cc=borntraeger@linux.ibm.com \
    --cc=cmerla@redhat.com \
    --cc=cohuck@redhat.com \
    --cc=farman@linux.ibm.com \
    --cc=gor@linux.ibm.com \
    --cc=hca@linux.ibm.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=mst@redhat.com \
    --cc=pasic@linux.ibm.com \
    --cc=svens@linux.ibm.com \
    --cc=thuth@redhat.com \
    --cc=virtualization@lists.linux.dev \
    --cc=wei.w.wang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox