All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] virtio_balloon: quiesce balloon work on device shutdown
@ 2026-06-22 13:37 Denis V. Lunev
  2026-06-22 13:37 ` [PATCH 1/2] virtio_balloon: factor out virtballoon_quiesce() Denis V. Lunev
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Denis V. Lunev @ 2026-06-22 13:37 UTC (permalink / raw)
  To: mst, david; +Cc: virtualization, linux-kernel, Denis V. Lunev

Since commit 8bd2fa086a04 ("virtio: break and reset virtio devices on
device_shutdown()") the virtio bus breaks and resets every virtio device
during device_shutdown(), i.e. on reboot and kexec. virtio_balloon has no
.shutdown of its own, so that generic path runs while the balloon's
asynchronous work is still armed: the free page reporting worker, the
inflate/deflate and stats workers, the OOM notifier and the free page
shrinker.

Once the device has been broken, virtqueue_add_inbuf() in
virtballoon_free_page_report() returns -EIO and trips its WARN_ON_ONCE().
On a kernel booted with panic_on_warn that turns an ordinary reboot into a
fatal panic in the middle of device_shutdown(), so the machine never
reaches the new kernel. The inflate/deflate and OOM paths do not warn but
are no better off: they call wait_event(vb->acked, ...) and would block
forever on a queue that can no longer complete.

This was hit in the field as an intermittent failure of a virtualization
cluster upgrade: guest storage nodes were rebooted via kexec into the new
kernel, and the ones whose free page reporting happened to run during
device_shutdown() panicked (the guests run with panic_on_warn) and never
came back, stalling the rolling upgrade. The crash dump showed the WARN at
virtio_balloon.c:216 in a page_reporting kworker, with all the balloon
virtqueues already broken.

Patch 1 factors the teardown out of virtballoon_remove() into a
virtballoon_quiesce() helper (no functional change). Patch 2 adds a
virtio_balloon .shutdown handler that quiesces via that helper while the
device is still alive, then breaks and resets it the way the generic
virtio_dev_shutdown() would.

Relaxing the single WARN_ON_ONCE() instead was considered and rejected: it
would silence the panic but leave the inflate/deflate and OOM paths
hanging on the broken device. The device has to be quiesced, not just kept
quiet.

Validated by churning balloon inflate/deflate from the host while
kexec-rebooting the guest in a loop under panic_on_warn: the unpatched
module reproduces the WARN within a couple of cycles, while the patched
module survives many consecutive kexec cycles cleanly (12/12 in the final
run, 0 WARNs). checkpatch is clean on both patches.

Denis V. Lunev (2):
  virtio_balloon: factor out virtballoon_quiesce()
  virtio_balloon: quiesce balloon work before device shutdown

 drivers/virtio/virtio_balloon.c | 37 ++++++++++++++++++++++++++-------
 1 file changed, 30 insertions(+), 7 deletions(-)

-- 
2.53.0


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/2] virtio_balloon: factor out virtballoon_quiesce()
  2026-06-22 13:37 [PATCH 0/2] virtio_balloon: quiesce balloon work on device shutdown Denis V. Lunev
@ 2026-06-22 13:37 ` Denis V. Lunev
  2026-06-22 14:46   ` David Hildenbrand (Arm)
  2026-06-22 13:37 ` [PATCH 2/2] virtio_balloon: quiesce balloon work before device shutdown Denis V. Lunev
  2026-06-22 14:29 ` [PATCH 0/2] virtio_balloon: quiesce balloon work on " David Hildenbrand (Arm)
  2 siblings, 1 reply; 9+ messages in thread
From: Denis V. Lunev @ 2026-06-22 13:37 UTC (permalink / raw)
  To: mst, david; +Cc: virtualization, linux-kernel, Denis V. Lunev

virtballoon_remove() stops all of the balloon's asynchronous work (the
free page reporting worker, the inflate/deflate and stats workers, the
OOM notifier and the free page shrinker) before tearing the device
down. A following change needs the same teardown from a .shutdown
handler, so move it into a virtballoon_quiesce() helper.

No functional change.

Signed-off-by: Denis V. Lunev <den@openvz.org>
---
 drivers/virtio/virtio_balloon.c | 27 ++++++++++++++++++++-------
 1 file changed, 20 insertions(+), 7 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 088b3a0e6ce6..5b02d9191ac6 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -1098,26 +1098,39 @@ static void remove_common(struct virtio_balloon *vb)
 	vb->vdev->config->del_vqs(vb->vdev);
 }
 
-static void virtballoon_remove(struct virtio_device *vdev)
+/*
+ * Stop all asynchronous balloon work. The device must still be alive so that
+ * in-flight requests can drain via the host before it is reset or freed.
+ */
+static void virtballoon_quiesce(struct virtio_balloon *vb)
 {
-	struct virtio_balloon *vb = vdev->priv;
+	struct virtio_device *vdev = vb->vdev;
 
-	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_REPORTING))
+	if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_REPORTING))
 		page_reporting_unregister(&vb->pr_dev_info);
-	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
+	if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
 		unregister_oom_notifier(&vb->oom_nb);
-	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT))
+	if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT))
 		virtio_balloon_unregister_shrinker(vb);
+
 	spin_lock_irq(&vb->stop_update_lock);
 	vb->stop_update = true;
 	spin_unlock_irq(&vb->stop_update_lock);
 	cancel_work_sync(&vb->update_balloon_size_work);
 	cancel_work_sync(&vb->update_balloon_stats_work);
 
-	if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
+	if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT))
 		cancel_work_sync(&vb->report_free_page_work);
+}
+
+static void virtballoon_remove(struct virtio_device *vdev)
+{
+	struct virtio_balloon *vb = vdev->priv;
+
+	virtballoon_quiesce(vb);
+
+	if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT))
 		destroy_workqueue(vb->balloon_wq);
-	}
 
 	remove_common(vb);
 	mutex_destroy(&vb->balloon_lock);
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 2/2] virtio_balloon: quiesce balloon work before device shutdown
  2026-06-22 13:37 [PATCH 0/2] virtio_balloon: quiesce balloon work on device shutdown Denis V. Lunev
  2026-06-22 13:37 ` [PATCH 1/2] virtio_balloon: factor out virtballoon_quiesce() Denis V. Lunev
@ 2026-06-22 13:37 ` Denis V. Lunev
  2026-06-22 14:38   ` David Hildenbrand (Arm)
  2026-06-22 14:29 ` [PATCH 0/2] virtio_balloon: quiesce balloon work on " David Hildenbrand (Arm)
  2 siblings, 1 reply; 9+ messages in thread
From: Denis V. Lunev @ 2026-06-22 13:37 UTC (permalink / raw)
  To: mst, david; +Cc: virtualization, linux-kernel, Denis V. Lunev

Commit 8bd2fa086a04 ("virtio: break and reset virtio devices on
device_shutdown()") added a generic virtio bus .shutdown handler that
breaks and resets every virtio device during device_shutdown(), i.e. on
reboot and kexec.

virtio_balloon provides no .shutdown of its own, so that generic path
runs while the balloon's asynchronous work is still armed. Once the
device has been broken, virtqueue_add_inbuf() in
virtballoon_free_page_report() returns -EIO and trips its
WARN_ON_ONCE(). On a kernel booted with panic_on_warn that turns an
ordinary reboot, for example a kexec based upgrade, into a fatal panic
in the middle of device_shutdown(), so the machine never reaches the
new kernel.

Relaxing that single WARN_ON_ONCE() would only hide the symptom: the
inflate/deflate and OOM paths do not warn, they call
wait_event(vb->acked, ...) and would instead block forever on a broken
queue that can no longer complete. The device has to be quiesced, not
just kept quiet.

Add a .shutdown handler that quiesces the balloon via the shared
virtballoon_quiesce() helper while the device is still alive, and only
then breaks and resets it. The break and reset are repeated here rather
than reused from virtio_dev_shutdown(): drv->shutdown replaces the
generic handler rather than augmenting it, so that drivers such as
virtio-gpu can opt out of the reset. Unlike virtballoon_remove() the
balloon workqueue is not destroyed, as shutdown does not free the
device and cancel_work_sync() together with stop_update already prevent
any further work from being queued.

Fixes: 8bd2fa086a04 ("virtio: break and reset virtio devices on device_shutdown()")
Signed-off-by: Denis V. Lunev <den@openvz.org>
---
 drivers/virtio/virtio_balloon.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 5b02d9191ac6..e35ada767b4b 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -1137,6 +1137,15 @@ static void virtballoon_remove(struct virtio_device *vdev)
 	kfree(vb);
 }
 
+static void virtballoon_shutdown(struct virtio_device *vdev)
+{
+	virtballoon_quiesce(vdev->priv);
+
+	virtio_break_device(vdev);
+	virtio_synchronize_cbs(vdev);
+	vdev->config->reset(vdev);
+}
+
 #ifdef CONFIG_PM_SLEEP
 static int virtballoon_freeze(struct virtio_device *vdev)
 {
@@ -1202,6 +1211,7 @@ static struct virtio_driver virtio_balloon_driver = {
 	.validate =	virtballoon_validate,
 	.probe =	virtballoon_probe,
 	.remove =	virtballoon_remove,
+	.shutdown =	virtballoon_shutdown,
 	.config_changed = virtballoon_changed,
 #ifdef CONFIG_PM_SLEEP
 	.freeze	=	virtballoon_freeze,
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/2] virtio_balloon: quiesce balloon work on device shutdown
  2026-06-22 13:37 [PATCH 0/2] virtio_balloon: quiesce balloon work on device shutdown Denis V. Lunev
  2026-06-22 13:37 ` [PATCH 1/2] virtio_balloon: factor out virtballoon_quiesce() Denis V. Lunev
  2026-06-22 13:37 ` [PATCH 2/2] virtio_balloon: quiesce balloon work before device shutdown Denis V. Lunev
@ 2026-06-22 14:29 ` David Hildenbrand (Arm)
  2026-06-22 14:33   ` Denis V. Lunev
  2 siblings, 1 reply; 9+ messages in thread
From: David Hildenbrand (Arm) @ 2026-06-22 14:29 UTC (permalink / raw)
  To: Denis V. Lunev, mst; +Cc: virtualization, linux-kernel

On 6/22/26 15:37, Denis V. Lunev wrote:
> Since commit 8bd2fa086a04 ("virtio: break and reset virtio devices on
> device_shutdown()") the virtio bus breaks and resets every virtio device
> during device_shutdown(), i.e. on reboot and kexec. virtio_balloon has no
> .shutdown of its own, so that generic path runs while the balloon's
> asynchronous work is still armed: the free page reporting worker, the
> inflate/deflate and stats workers, the OOM notifier and the free page
> shrinker.
> 
> Once the device has been broken, virtqueue_add_inbuf() in
> virtballoon_free_page_report() returns -EIO and trips its WARN_ON_ONCE().
> On a kernel booted with panic_on_warn that turns an ordinary reboot into a
> fatal panic in the middle of device_shutdown(), so the machine never
> reaches the new kernel. The inflate/deflate and OOM paths do not warn but
> are no better off: they call wait_event(vb->acked, ...) and would block
> forever on a queue that can no longer complete.
> 
> This was hit in the field as an intermittent failure of a virtualization
> cluster upgrade: guest storage nodes were rebooted via kexec into the new
> kernel, and the ones whose free page reporting happened to run during
> device_shutdown() panicked (the guests run with panic_on_warn) and never
> came back, stalling the rolling upgrade. The crash dump showed the WARN at
> virtio_balloon.c:216 in a page_reporting kworker, with all the balloon
> virtqueues already broken.
> 
> Patch 1 factors the teardown out of virtballoon_remove() into a
> virtballoon_quiesce() helper (no functional change). Patch 2 adds a
> virtio_balloon .shutdown handler that quiesces via that helper while the
> device is still alive, then breaks and resets it the way the generic
> virtio_dev_shutdown() would.
> 
> Relaxing the single WARN_ON_ONCE() instead was considered and rejected: it
> would silence the panic but leave the inflate/deflate and OOM paths
> hanging on the broken device. The device has to be quiesced, not just kept
> quiet.

Do you have a link to that discussion you could add?


-- 
Cheers,

David

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/2] virtio_balloon: quiesce balloon work on device shutdown
  2026-06-22 14:29 ` [PATCH 0/2] virtio_balloon: quiesce balloon work on " David Hildenbrand (Arm)
@ 2026-06-22 14:33   ` Denis V. Lunev
  0 siblings, 0 replies; 9+ messages in thread
From: Denis V. Lunev @ 2026-06-22 14:33 UTC (permalink / raw)
  To: David Hildenbrand (Arm), Denis V. Lunev, mst; +Cc: virtualization, linux-kernel

On 6/22/26 16:29, David Hildenbrand (Arm) wrote:
> This email originated from an IP that might not be authorized by the domain it was sent from.
> Do not click links or open attachments unless it is an email you expected to receive.
> On 6/22/26 15:37, Denis V. Lunev wrote:
>> Since commit 8bd2fa086a04 ("virtio: break and reset virtio devices on
>> device_shutdown()") the virtio bus breaks and resets every virtio device
>> during device_shutdown(), i.e. on reboot and kexec. virtio_balloon has no
>> .shutdown of its own, so that generic path runs while the balloon's
>> asynchronous work is still armed: the free page reporting worker, the
>> inflate/deflate and stats workers, the OOM notifier and the free page
>> shrinker.
>>
>> Once the device has been broken, virtqueue_add_inbuf() in
>> virtballoon_free_page_report() returns -EIO and trips its WARN_ON_ONCE().
>> On a kernel booted with panic_on_warn that turns an ordinary reboot into a
>> fatal panic in the middle of device_shutdown(), so the machine never
>> reaches the new kernel. The inflate/deflate and OOM paths do not warn but
>> are no better off: they call wait_event(vb->acked, ...) and would block
>> forever on a queue that can no longer complete.
>>
>> This was hit in the field as an intermittent failure of a virtualization
>> cluster upgrade: guest storage nodes were rebooted via kexec into the new
>> kernel, and the ones whose free page reporting happened to run during
>> device_shutdown() panicked (the guests run with panic_on_warn) and never
>> came back, stalling the rolling upgrade. The crash dump showed the WARN at
>> virtio_balloon.c:216 in a page_reporting kworker, with all the balloon
>> virtqueues already broken.
>>
>> Patch 1 factors the teardown out of virtballoon_remove() into a
>> virtballoon_quiesce() helper (no functional change). Patch 2 adds a
>> virtio_balloon .shutdown handler that quiesces via that helper while the
>> device is still alive, then breaks and resets it the way the generic
>> virtio_dev_shutdown() would.
>>
>> Relaxing the single WARN_ON_ONCE() instead was considered and rejected: it
>> would silence the panic but leave the inflate/deflate and OOM paths
>> hanging on the broken device. The device has to be quiesced, not just kept
>> quiet.
> Do you have a link to that discussion you could add?
>
>
that was internal discussion for that fix. We have faced this
during tests, made the fix and validated it before sending.

Thus - sorry, this is my mistake.

Den

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/2] virtio_balloon: quiesce balloon work before device shutdown
  2026-06-22 13:37 ` [PATCH 2/2] virtio_balloon: quiesce balloon work before device shutdown Denis V. Lunev
@ 2026-06-22 14:38   ` David Hildenbrand (Arm)
  2026-06-22 14:58     ` Michael S. Tsirkin
  0 siblings, 1 reply; 9+ messages in thread
From: David Hildenbrand (Arm) @ 2026-06-22 14:38 UTC (permalink / raw)
  To: Denis V. Lunev, mst; +Cc: virtualization, linux-kernel

On 6/22/26 15:37, Denis V. Lunev wrote:
> Commit 8bd2fa086a04 ("virtio: break and reset virtio devices on
> device_shutdown()") added a generic virtio bus .shutdown handler that
> breaks and resets every virtio device during device_shutdown(), i.e. on
> reboot and kexec.
> 
> virtio_balloon provides no .shutdown of its own, so that generic path
> runs while the balloon's asynchronous work is still armed. Once the
> device has been broken, virtqueue_add_inbuf() in
> virtballoon_free_page_report() returns -EIO and trips its
> WARN_ON_ONCE(). On a kernel booted with panic_on_warn that turns an
> ordinary reboot, for example a kexec based upgrade, into a fatal panic
> in the middle of device_shutdown(), so the machine never reaches the
> new kernel.
> 
> Relaxing that single WARN_ON_ONCE() would only hide the symptom: the
> inflate/deflate and OOM paths do not warn, they call
> wait_event(vb->acked, ...) and would instead block forever on a broken
> queue that can no longer complete. The device has to be quiesced, not
> just kept quiet.

Ah, so

	/* We should always be able to add one buffer to an empty queue. */
	virtqueue_add_outbuf(vq, &sg, 1, vb, GFP_KERNEL);

is not actually correct.

Yeah, quiescing sounds cleaner, although I am thinking whether we should also
warn if virtqueue_add_outbuf() fails, similar to what we do in
virtballoon_free_page_report().

> 
> Add a .shutdown handler that quiesces the balloon via the shared
> virtballoon_quiesce() helper while the device is still alive, and only
> then breaks and resets it. The break and reset are repeated here rather
> than reused from virtio_dev_shutdown(): drv->shutdown replaces the
> generic handler rather than augmenting it, so that drivers such as
> virtio-gpu can opt out of the reset. Unlike virtballoon_remove() the
> balloon workqueue is not destroyed, as shutdown does not free the
> device and cancel_work_sync() together with stop_update already prevent
> any further work from being queued.
> 
> Fixes: 8bd2fa086a04 ("virtio: break and reset virtio devices on device_shutdown()")
> Signed-off-by: Denis V. Lunev <den@openvz.org>
> ---
>  drivers/virtio/virtio_balloon.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
> index 5b02d9191ac6..e35ada767b4b 100644
> --- a/drivers/virtio/virtio_balloon.c
> +++ b/drivers/virtio/virtio_balloon.c
> @@ -1137,6 +1137,15 @@ static void virtballoon_remove(struct virtio_device *vdev)
>  	kfree(vb);
>  }
>  
> +static void virtballoon_shutdown(struct virtio_device *vdev)
> +{
> +	virtballoon_quiesce(vdev->priv);
> +
> +	virtio_break_device(vdev);
> +	virtio_synchronize_cbs(vdev);
> +	vdev->config->reset(vdev);

I guess it would be good if we wouldn't have to copy what the default handler
does, but could instead just have it in a reusable core function?

> +}
> +

-- 
Cheers,

David

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] virtio_balloon: factor out virtballoon_quiesce()
  2026-06-22 13:37 ` [PATCH 1/2] virtio_balloon: factor out virtballoon_quiesce() Denis V. Lunev
@ 2026-06-22 14:46   ` David Hildenbrand (Arm)
  2026-06-22 14:59     ` Michael S. Tsirkin
  0 siblings, 1 reply; 9+ messages in thread
From: David Hildenbrand (Arm) @ 2026-06-22 14:46 UTC (permalink / raw)
  To: Denis V. Lunev, mst; +Cc: virtualization, linux-kernel

On 6/22/26 15:37, Denis V. Lunev wrote:
> virtballoon_remove() stops all of the balloon's asynchronous work (the
> free page reporting worker, the inflate/deflate and stats workers, the
> OOM notifier and the free page shrinker) before tearing the device
> down. A following change needs the same teardown from a .shutdown
> handler, so move it into a virtballoon_quiesce() helper.
> 
> No functional change.
> 
> Signed-off-by: Denis V. Lunev <den@openvz.org>
> ---
>  drivers/virtio/virtio_balloon.c | 27 ++++++++++++++++++++-------
>  1 file changed, 20 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
> index 088b3a0e6ce6..5b02d9191ac6 100644
> --- a/drivers/virtio/virtio_balloon.c
> +++ b/drivers/virtio/virtio_balloon.c
> @@ -1098,26 +1098,39 @@ static void remove_common(struct virtio_balloon *vb)
>  	vb->vdev->config->del_vqs(vb->vdev);
>  }
>  
> -static void virtballoon_remove(struct virtio_device *vdev)
> +/*
> + * Stop all asynchronous balloon work. The device must still be alive so that
> + * in-flight requests can drain via the host before it is reset or freed.
> + */
> +static void virtballoon_quiesce(struct virtio_balloon *vb)
>  {
> -	struct virtio_balloon *vb = vdev->priv;
> +	struct virtio_device *vdev = vb->vdev;
>  
> -	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_REPORTING))
> +	if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_REPORTING))
>  		page_reporting_unregister(&vb->pr_dev_info);

Most functions here grab mutexes and can sleep. I assume that's fine in shutdown
context.

Nothing jumped at me


Sashiko asks whether ->remove (hotunplug/driver-unloading) can follow a
->shutdown callback. I don't know, seems questionable if this could happen and
should probably be handled by the core?

If it's possible you'd have to remember that stuff was already quiesce'ed. Maybe
simply by checking early in virtballoon_quiesce() whether vb->stop_update ==
true already.

-- 
Cheers,

David

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/2] virtio_balloon: quiesce balloon work before device shutdown
  2026-06-22 14:38   ` David Hildenbrand (Arm)
@ 2026-06-22 14:58     ` Michael S. Tsirkin
  0 siblings, 0 replies; 9+ messages in thread
From: Michael S. Tsirkin @ 2026-06-22 14:58 UTC (permalink / raw)
  To: David Hildenbrand (Arm); +Cc: Denis V. Lunev, virtualization, linux-kernel

On Mon, Jun 22, 2026 at 04:38:54PM +0200, David Hildenbrand (Arm) wrote:
> On 6/22/26 15:37, Denis V. Lunev wrote:
> > Commit 8bd2fa086a04 ("virtio: break and reset virtio devices on
> > device_shutdown()") added a generic virtio bus .shutdown handler that
> > breaks and resets every virtio device during device_shutdown(), i.e. on
> > reboot and kexec.
> > 
> > virtio_balloon provides no .shutdown of its own, so that generic path
> > runs while the balloon's asynchronous work is still armed. Once the
> > device has been broken, virtqueue_add_inbuf() in
> > virtballoon_free_page_report() returns -EIO and trips its
> > WARN_ON_ONCE(). On a kernel booted with panic_on_warn that turns an
> > ordinary reboot, for example a kexec based upgrade, into a fatal panic
> > in the middle of device_shutdown(), so the machine never reaches the
> > new kernel.
> > 
> > Relaxing that single WARN_ON_ONCE() would only hide the symptom: the
> > inflate/deflate and OOM paths do not warn, they call
> > wait_event(vb->acked, ...) and would instead block forever on a broken
> > queue that can no longer complete. The device has to be quiesced, not
> > just kept quiet.
> 
> Ah, so
> 
> 	/* We should always be able to add one buffer to an empty queue. */
> 	virtqueue_add_outbuf(vq, &sg, 1, vb, GFP_KERNEL);
> 
> is not actually correct.

Yes - we can't if the device is completely gone)

> Yeah, quiescing sounds cleaner, although I am thinking whether we should also
> warn if virtqueue_add_outbuf() fails, similar to what we do in
> virtballoon_free_page_report().
> 
> > 
> > Add a .shutdown handler that quiesces the balloon via the shared
> > virtballoon_quiesce() helper while the device is still alive, and only
> > then breaks and resets it. The break and reset are repeated here rather
> > than reused from virtio_dev_shutdown(): drv->shutdown replaces the
> > generic handler rather than augmenting it, so that drivers such as
> > virtio-gpu can opt out of the reset. Unlike virtballoon_remove() the
> > balloon workqueue is not destroyed, as shutdown does not free the
> > device and cancel_work_sync() together with stop_update already prevent
> > any further work from being queued.
> > 
> > Fixes: 8bd2fa086a04 ("virtio: break and reset virtio devices on device_shutdown()")
> > Signed-off-by: Denis V. Lunev <den@openvz.org>
> > ---
> >  drivers/virtio/virtio_balloon.c | 10 ++++++++++
> >  1 file changed, 10 insertions(+)
> > 
> > diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
> > index 5b02d9191ac6..e35ada767b4b 100644
> > --- a/drivers/virtio/virtio_balloon.c
> > +++ b/drivers/virtio/virtio_balloon.c
> > @@ -1137,6 +1137,15 @@ static void virtballoon_remove(struct virtio_device *vdev)
> >  	kfree(vb);
> >  }
> >  
> > +static void virtballoon_shutdown(struct virtio_device *vdev)
> > +{
> > +	virtballoon_quiesce(vdev->priv);
> > +
> > +	virtio_break_device(vdev);
> > +	virtio_synchronize_cbs(vdev);
> > +	vdev->config->reset(vdev);
> 
> I guess it would be good if we wouldn't have to copy what the default handler
> does, but could instead just have it in a reusable core function?
> 
> > +}
> > +
> 
> -- 
> Cheers,
> 
> David


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] virtio_balloon: factor out virtballoon_quiesce()
  2026-06-22 14:46   ` David Hildenbrand (Arm)
@ 2026-06-22 14:59     ` Michael S. Tsirkin
  0 siblings, 0 replies; 9+ messages in thread
From: Michael S. Tsirkin @ 2026-06-22 14:59 UTC (permalink / raw)
  To: David Hildenbrand (Arm); +Cc: Denis V. Lunev, virtualization, linux-kernel

On Mon, Jun 22, 2026 at 04:46:48PM +0200, David Hildenbrand (Arm) wrote:
> On 6/22/26 15:37, Denis V. Lunev wrote:
> > virtballoon_remove() stops all of the balloon's asynchronous work (the
> > free page reporting worker, the inflate/deflate and stats workers, the
> > OOM notifier and the free page shrinker) before tearing the device
> > down. A following change needs the same teardown from a .shutdown
> > handler, so move it into a virtballoon_quiesce() helper.
> > 
> > No functional change.
> > 
> > Signed-off-by: Denis V. Lunev <den@openvz.org>
> > ---
> >  drivers/virtio/virtio_balloon.c | 27 ++++++++++++++++++++-------
> >  1 file changed, 20 insertions(+), 7 deletions(-)
> > 
> > diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
> > index 088b3a0e6ce6..5b02d9191ac6 100644
> > --- a/drivers/virtio/virtio_balloon.c
> > +++ b/drivers/virtio/virtio_balloon.c
> > @@ -1098,26 +1098,39 @@ static void remove_common(struct virtio_balloon *vb)
> >  	vb->vdev->config->del_vqs(vb->vdev);
> >  }
> >  
> > -static void virtballoon_remove(struct virtio_device *vdev)
> > +/*
> > + * Stop all asynchronous balloon work. The device must still be alive so that
> > + * in-flight requests can drain via the host before it is reset or freed.
> > + */
> > +static void virtballoon_quiesce(struct virtio_balloon *vb)
> >  {
> > -	struct virtio_balloon *vb = vdev->priv;
> > +	struct virtio_device *vdev = vb->vdev;
> >  
> > -	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_REPORTING))
> > +	if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_REPORTING))
> >  		page_reporting_unregister(&vb->pr_dev_info);
> 
> Most functions here grab mutexes and can sleep. I assume that's fine in shutdown
> context.
> 
> Nothing jumped at me
> 
> 
> Sashiko asks whether ->remove (hotunplug/driver-unloading) can follow a
> ->shutdown callback. I don't know, seems questionable if this could happen and
> should probably be handled by the core?
> 
> If it's possible you'd have to remember that stuff was already quiesce'ed. Maybe
> simply by checking early in virtballoon_quiesce() whether vb->stop_update ==
> true already.
> 
> -- 
> Cheers,
> 
> David

I don't think it can happen.

-- 
MST


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2026-06-22 14:59 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-22 13:37 [PATCH 0/2] virtio_balloon: quiesce balloon work on device shutdown Denis V. Lunev
2026-06-22 13:37 ` [PATCH 1/2] virtio_balloon: factor out virtballoon_quiesce() Denis V. Lunev
2026-06-22 14:46   ` David Hildenbrand (Arm)
2026-06-22 14:59     ` Michael S. Tsirkin
2026-06-22 13:37 ` [PATCH 2/2] virtio_balloon: quiesce balloon work before device shutdown Denis V. Lunev
2026-06-22 14:38   ` David Hildenbrand (Arm)
2026-06-22 14:58     ` Michael S. Tsirkin
2026-06-22 14:29 ` [PATCH 0/2] virtio_balloon: quiesce balloon work on " David Hildenbrand (Arm)
2026-06-22 14:33   ` Denis V. Lunev

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.