All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Michael S. Tsirkin" <mst@redhat.com>
To: David Hildenbrand <david@redhat.com>
Cc: Halil Pasic <pasic@linux.ibm.com>,
	linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org,
	virtualization@lists.linux.dev, kvm@vger.kernel.org,
	Chandra Merla <cmerla@redhat.com>,
	Stable@vger.kernel.org, Cornelia Huck <cohuck@redhat.com>,
	Thomas Huth <thuth@redhat.com>,
	Eric Farman <farman@linux.ibm.com>,
	Heiko Carstens <hca@linux.ibm.com>,
	Vasily Gorbik <gor@linux.ibm.com>,
	Alexander Gordeev <agordeev@linux.ibm.com>,
	Christian Borntraeger <borntraeger@linux.ibm.com>,
	Sven Schnelle <svens@linux.ibm.com>,
	Wei Wang <wei.w.wang@intel.com>
Subject: Re: [PATCH v1] s390/virtio_ccw: don't allocate/assign airqs for non-existing queues
Date: Sun, 6 Apr 2025 11:40:33 -0400	[thread overview]
Message-ID: <20250406113926-mutt-send-email-mst@kernel.org> (raw)
In-Reply-To: <d54fbf56-b462-4eea-a86e-3a0defb6298b@redhat.com>

On Fri, Apr 04, 2025 at 12:55:09PM +0200, David Hildenbrand wrote:
> On 04.04.25 12:00, David Hildenbrand wrote:
> > On 04.04.25 06:36, Halil Pasic wrote:
> > > On Thu, 3 Apr 2025 16:28:31 +0200
> > > David Hildenbrand <david@redhat.com> wrote:
> > > 
> > > > > Sorry I have to have a look at that discussion. Maybe it will answer
> > > > > some my questions.
> > > > 
> > > > Yes, I think so.
> > > > 
> > > > > > Let's fix it without affecting existing setups for now by properly
> > > > > > ignoring the non-existing queues, so the indicator bits will match
> > > > > > the queue indexes.
> > > > > 
> > > > > Just one question. My understanding is that the crux is that Linux
> > > > > and QEMU (or the driver and the device) disagree at which index
> > > > > reporting_vq is actually sitting. Is that right?
> > > > 
> > > > I thought I made it clear: this is only about the airq indicator bit.
> > > > That's where both disagree.
> > > > 
> > > > Not the actual queue index (see above).
> > > 
> > > I did some more research including having a look at that discussion. Let
> > > me try to sum up how did we end up here.
> > 
> > Let me add some more details after digging as well:
> > 
> > > 
> > > Before commit a229989d975e ("virtio: don't allocate vqs when names[i] =
> > > NULL") the kernel behavior used to be in spec, but QEMU and possibly
> > > other hypervisor were out of spec and things did not work.
> > 
> > It all started with VIRTIO_BALLOON_F_FREE_PAGE_HINT. Before that,
> > we only had the single optional VIRTIO_BALLOON_F_STATS_VQ queue at the very
> > end. So there was no possibility for holes "in-between".
> > 
> > In the Linux driver, we created the stats queue only if the feature bit
> > VIRTIO_BALLOON_F_STATS_VQ was actually around:
> > 
> > 	nvqs = virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ) ? 3 : 2;
> > 	err = virtio_find_vqs(vb->vdev, nvqs, vqs, callbacks, names, NULL);
> > 
> > That changed with VIRTIO_BALLOON_F_FREE_PAGE_HINT, because we would
> > unconditionally create 4 queues. QEMU always supported the first 3 queues
> > unconditionally, but old QEMU did obviously not support the (new)
> > VIRTIO_BALLOON_F_FREE_PAGE_HINT queue.
> > 
> > 390x didn't particularly like getting queried for non-existing
> > queues. [1] So the fix was not for a hypervisor that was out of spec, but
> > because quering non-existing queues didn't work.
> > 
> > The fix implied that if VIRTIO_BALLOON_F_STATS_VQ is missing, suddenly the queue
> > index of VIRTIO_BALLOON_F_FREE_PAGE_HINT changed as well.
> > 
> > Again, as QEMU always implemented the 3 first queues unconditionally, this was
> > not a problem.
> > 
> > [1] https://lore.kernel.org/all/c6746307-fae5-7652-af8d-19f560fc31d9@de.ibm.com/#t
> > 
> > > 
> > > Possibly because of the complexity of fixing the hypervisor(s) commit
> > > a229989d975e ("virtio: don't allocate vqs when names[i] = NULL") opted
> > > for changing the guest side so that it does not fit the spec but fits
> > > the hypervisor(s). It unfortunately also broke notifiers (for the with
> > > holes) scenario for virtio-ccw only.
> > 
> > Yes, it broke the notifiers.
> > 
> > But note that everything was in spec at that point, because we only documented
> > "free_page_vq == 3" in the spec *2 years later*, in 2020:
> > 
> > commit 38448268eba0c105200d131c3f7f660129a4d673
> > Author: Alexander Duyck <alexander.h.duyck@linux.intel.com>
> > Date:   Tue Aug 25 07:45:02 2020 -0700
> > 
> >       content: Document balloon feature free page hints
> >       Free page hints allow the balloon driver to provide information on what
> >       pages are not currently in use so that we can avoid the cost of copying
> >       them in migration scenarios. Add a feature description for free page hints
> >       describing basic functioning and requirements.
> > At that point, what we documented in the spec *did not match reality* in
> > Linux. QEMU was fully compatible, because VIRTIO_BALLOON_F_STATS_VQ is
> > unconditionally set.
> > 
> > 
> > QEMU and Linux kept using that queue index assignment model, and the spec
> > was wrong (out of sync?) at that point. The spec got more wrong with
> > 
> > commit d917d4a8d552c003e046b0e3b1b529d98f7e695b
> > Author: Alexander Duyck <alexander.h.duyck@linux.intel.com>
> > Date:   Tue Aug 25 07:45:17 2020 -0700
> > 
> >       content: Document balloon feature free page reporting
> >       Free page reporting is a feature that allows the guest to proactively
> >       report unused pages to the host. By making use of this feature is is
> >       possible to reduce the overall memory footprint of the guest in cases where
> >       some significant portion of the memory is idle. Add documentation for the
> >       free page reporting feature describing the functionality and requirements.
> > 
> > Where we documented VIRTIO_BALLOON_F_REPORTING after the changes were added to
> > QEMU+Linux implementation, so the spec did not reflect reality.
> > 
> > I'll note also cloud-hypervisor [2] today follows that model.
> > 
> > In particular, it *only* supports VIRTIO_BALLOON_F_REPORTING, turning
> > the queue index of VIRTIO_BALLOON_F_REPORTING into *2* instead of documented
> > in the spec to be *4*.
> > 
> > So in reality, we can see VIRTIO_BALLOON_F_REPORTING to be either 2/3/4, depending
> > on the availability of the other two features/queues.
> > 
> > [2] https://github.com/cloud-hypervisor/cloud-hypervisor/blob/main/virtio-devices/src/balloon.rs
> > 
> > 
> > > 
> > > Now we had another look at this, and have concluded that fixing the
> > > hypervisor(s) and fixing the kernel, and making sure that the fixed
> > > kernel can tolerate the old broken hypervisor(s) is way to complicated
> > > if possible at all. So we decided to give the spec a reality check and
> > > fix the notifier bit assignment for virtio-ccw which is broken beyond
> > > doubt if we accept that the correct virtqueue index is the one that the
> > > hypervisor(s) use and not the one that the spec says they should use.
> > 
> > In case of virtio-balloon, it's unfortunate that it went that way, but the
> > spec simply did not / does not reflect reality when it was added to the spec.
> > 
> > > 
> > > With the spec fixed, the whole notion of "holes" will be something that
> > > does not make sense any more. With that the merit of the kernel interface
> > > virtio_find_vqs() supporting "holes" is quite questionable. Now we need
> > > it because the drivers within the Linux kernel still think of the queues
> > > in terms of the current spec, i.e. they try to have the "holes" as
> > > mandated by the spec, and the duty of making it work with the broken
> > > device implementations falls to the transports.
> > > 
> > 
> > Right, the "holes" only exist in the input array.
> > 
> > > Under the assumption that the spec is indeed going to be fixed:
> 
> For virito-balloon, we should probably do the following:
> 
> From 38e340c2bb53c2a7cc7c675f5dfdd44ecf7701d9 Mon Sep 17 00:00:00 2001
> From: David Hildenbrand <david@redhat.com>
> Date: Fri, 4 Apr 2025 12:53:16 +0200
> Subject: [PATCH] virtio-balloon: Fix queue index assignment for
>  non-existing queues
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  device-types/balloon/description.tex | 22 ++++++++++++++++------
>  1 file changed, 16 insertions(+), 6 deletions(-)
> 
> diff --git a/device-types/balloon/description.tex b/device-types/balloon/description.tex
> index a1d9603..a7396ff 100644
> --- a/device-types/balloon/description.tex
> +++ b/device-types/balloon/description.tex
> @@ -16,6 +16,21 @@ \subsection{Device ID}\label{sec:Device Types / Memory Balloon Device / Device I
>    5
>  \subsection{Virtqueues}\label{sec:Device Types / Memory Balloon Device / Virtqueues}
> +
> +\begin{description}
> +\item[inflateq] Exists unconditionally.
> +\item[deflateq] Exists unconditionally.
> +\item[statsq] Only exists if VIRTIO_BALLOON_F_STATS_VQ is set.
> +\item[free_page_vq] Only exists if VIRTIO_BALLOON_F_FREE_PAGE_HINT is set.
> +\item[reporting_vq] Only exists if VIRTIO_BALLOON_F_PAGE_REPORTING is set.
> +\end{description}
> +
> +\begin{note}
> +Virtqueue indexes are assigned sequentially for existing queues, starting
> +with index 0; consequently, if a virtqueue does not exist, it does not get
> +an index assigned. Assuming all virtqueues exist for a device, the indexes
> +are:
> +
>  \begin{description}
>  \item[0] inflateq
>  \item[1] deflateq
> @@ -23,12 +38,7 @@ \subsection{Virtqueues}\label{sec:Device Types / Memory Balloon Device / Virtque
>  \item[3] free_page_vq
>  \item[4] reporting_vq
>  \end{description}
> -
> -  statsq only exists if VIRTIO_BALLOON_F_STATS_VQ is set.
> -
> -  free_page_vq only exists if VIRTIO_BALLOON_F_FREE_PAGE_HINT is set.
> -
> -  reporting_vq only exists if VIRTIO_BALLOON_F_PAGE_REPORTING is set.
> +\end{note}
>  \subsection{Feature bits}\label{sec:Device Types / Memory Balloon Device / Feature bits}
>  \begin{description}
> -- 
> 2.48.1
> 
> 
> If something along these lines sounds reasonable, I can send a proper patch to the
> proper audience.


Indeed, but do we want to add a note about previous spec versions
saying something different? Maybe, with a hint how devices following
old spec can be detected?



> -- 
> Cheers,
> 
> David / dhildenb


  parent reply	other threads:[~2025-04-06 15:40 UTC|newest]

Thread overview: 63+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-04-02 20:36 [PATCH v1] s390/virtio_ccw: don't allocate/assign airqs for non-existing queues David Hildenbrand
2025-04-03  9:44 ` Thomas Huth
2025-04-03 12:45 ` Cornelia Huck
2025-04-03 12:57 ` Michael S. Tsirkin
2025-04-03 13:12 ` Christian Borntraeger
2025-04-03 14:18 ` Halil Pasic
2025-04-03 14:28   ` David Hildenbrand
2025-04-04  4:36     ` Halil Pasic
2025-04-04 10:00       ` David Hildenbrand
2025-04-04 10:55         ` David Hildenbrand
2025-04-04 13:36           ` Halil Pasic
2025-04-04 13:48             ` David Hildenbrand
2025-04-04 14:00               ` Halil Pasic
2025-04-04 14:17                 ` David Hildenbrand
2025-04-04 15:39                   ` Halil Pasic
2025-04-04 16:49                     ` David Hildenbrand
2025-04-04 17:36                       ` David Hildenbrand
2025-04-07  7:52                     ` Michael S. Tsirkin
2025-04-07  8:17                       ` David Hildenbrand
2025-04-07  8:34                         ` Michael S. Tsirkin
2025-04-07  8:44                           ` David Hildenbrand
2025-04-07  8:49                             ` Michael S. Tsirkin
2025-04-07  8:54                               ` David Hildenbrand
2025-04-07  8:58                                 ` Michael S. Tsirkin
2025-04-07  9:11                                   ` David Hildenbrand
2025-04-07  9:13                                     ` David Hildenbrand
2025-04-07 13:13                                       ` David Hildenbrand
2025-04-07 17:39                                         ` Daniel Verkamp
2025-04-07 18:47                                           ` David Hildenbrand
2025-04-07 21:09                                             ` Daniel Verkamp
2025-04-09 11:02                                               ` David Hildenbrand
2025-04-07 21:20                                             ` Michael S. Tsirkin
2025-04-09 10:46                                               ` David Hildenbrand
2025-04-09 10:56                                                 ` Michael S. Tsirkin
2025-04-09 11:12                                                   ` David Hildenbrand
2025-04-09 12:07                                                     ` Michael S. Tsirkin
2025-04-09 12:24                                                       ` David Hildenbrand
2025-04-09 16:08                                                         ` Michael S. Tsirkin
2025-04-07  9:37                                     ` Michael S. Tsirkin
2025-04-07 13:12                           ` Halil Pasic
2025-04-07 13:17                             ` David Hildenbrand
2025-04-07 13:28                               ` Cornelia Huck
2025-04-07 13:32                                 ` Michael S. Tsirkin
2025-04-07 17:26                                 ` Halil Pasic
2025-04-07  8:38                         ` David Hildenbrand
2025-04-07  8:44                           ` Michael S. Tsirkin
2025-04-07  8:50                             ` David Hildenbrand
2025-04-07  9:22                             ` David Hildenbrand
2025-04-07  8:41                     ` Michael S. Tsirkin
2025-04-06 18:42               ` Michael S. Tsirkin
2025-04-07  7:18                 ` David Hildenbrand
2025-04-07  8:54                   ` Michael S. Tsirkin
2025-04-07  9:08                     ` David Hildenbrand
2025-04-06 15:40           ` Michael S. Tsirkin [this message]
2025-04-03 14:35   ` Michael S. Tsirkin
2025-04-04  4:02     ` Halil Pasic
2025-04-04  5:33       ` Michael S. Tsirkin
2025-04-04 12:05         ` Halil Pasic
2025-04-10 18:44 ` David Hildenbrand
2025-04-11 11:11   ` Christian Borntraeger
2025-04-11 12:42     ` Heiko Carstens
2025-04-11 12:47       ` Christian Borntraeger
2025-04-11 13:34       ` David Hildenbrand

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250406113926-mutt-send-email-mst@kernel.org \
    --to=mst@redhat.com \
    --cc=Stable@vger.kernel.org \
    --cc=agordeev@linux.ibm.com \
    --cc=borntraeger@linux.ibm.com \
    --cc=cmerla@redhat.com \
    --cc=cohuck@redhat.com \
    --cc=david@redhat.com \
    --cc=farman@linux.ibm.com \
    --cc=gor@linux.ibm.com \
    --cc=hca@linux.ibm.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=pasic@linux.ibm.com \
    --cc=svens@linux.ibm.com \
    --cc=thuth@redhat.com \
    --cc=virtualization@lists.linux.dev \
    --cc=wei.w.wang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.