Linux virtualization list
* [PATCH] virtio_console: add timeout to __send_to_port() spin loop
@ 2026-04-21  6:57 Peng Yang
  2026-04-21  7:38 ` Arnd Bergmann
  0 siblings, 1 reply; 6+ messages in thread
From: Peng Yang @ 2026-04-21  6:57 UTC (permalink / raw)
  To: Amit Shah, Arnd Bergmann, Greg Kroah-Hartman
  Cc: kernel, virtualization, linux-kernel, Peng Yang

__send_to_port() busy-waits in virtqueue_get_buf() while holding
outvq_lock with IRQs disabled. If the host stops draining the TX
virtqueue, this loop never terminates.

This was observed during secondary VM boot: virtio_mem plugged memory
in multiple iterations, each emitting dev_info() messages through the
hvc console. A writev() on the hvc TTY entered __send_to_port() and
stalled in the spin loop. When the watchdog bark ISR fired on another
CPU, it attempted printk(), which tried to acquire outvq_lock through
the same path and spun indefinitely. With all CPUs stuck, the watchdog
could not be serviced and triggered a bite.

Add a 200 ms deadline using ktime_get_mono_fast_ns() to bound the spin
loop. ktime_get_mono_fast_ns() reads the hardware counter directly and
is safe to call with IRQs disabled and spinlocks held.

The 200 ms value is chosen to be far above normal host response latency
(microseconds) to avoid spurious exits, yet well below the watchdog
bark-to-bite window (typically 3 s) so that CPUs can escape the loop
and complete the bark handler before a bite occurs.

Signed-off-by: Peng Yang <peng.yang@oss.qualcomm.com>
---
 drivers/char/virtio_console.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
index 9a33217c68d9..b3535681dfe1 100644
--- a/drivers/char/virtio_console.c
+++ b/drivers/char/virtio_console.c
@@ -27,6 +27,7 @@
 #include <linux/module.h>
 #include <linux/dma-mapping.h>
 #include <linux/string_choices.h>
+#include <linux/timekeeping.h>
 #include "../tty/hvc/hvc_console.h"
 
 #define is_rproc_enabled IS_ENABLED(CONFIG_REMOTEPROC)
@@ -601,6 +602,7 @@ static ssize_t __send_to_port(struct port *port, struct scatterlist *sg,
 	int err;
 	unsigned long flags;
 	unsigned int len;
+	u64 deadline;
 
 	out_vq = port->out_vq;
 
@@ -632,10 +634,18 @@ static ssize_t __send_to_port(struct port *port, struct scatterlist *sg,
 	 * buffer and relax the spinning requirement.  The downside is
 	 * we need to kmalloc a GFP_ATOMIC buffer each time the
 	 * console driver writes something out.
+	 *
+	 * To avoid spinning forever if the host stops processing the
+	 * TX virtqueue (e.g. during VM shutdown), a 200ms deadline is
+	 * used to break out of the loop as a fallback.
 	 */
-	while (!virtqueue_get_buf(out_vq, &len)
-		&& !virtqueue_is_broken(out_vq))
+	deadline = ktime_get_mono_fast_ns() + 200ULL * NSEC_PER_MSEC;
+	while (!virtqueue_get_buf(out_vq, &len) &&
+	       !virtqueue_is_broken(out_vq)) {
+		if (ktime_get_mono_fast_ns() >= deadline)
+			break;
 		cpu_relax();
+	}
 done:
 	spin_unlock_irqrestore(&port->outvq_lock, flags);
 

---
base-commit: 97e797263a5e963da3d1e66e743fd518567dfe37
change-id: 20260420-add_timeout_to___send_to_port-104ce7bcf241

Best regards,
--  
Peng Yang <peng.yang@oss.qualcomm.com>



* Re: [PATCH] virtio_console: add timeout to __send_to_port() spin loop
  2026-04-21  6:57 [PATCH] virtio_console: add timeout to __send_to_port() spin loop Peng Yang
@ 2026-04-21  7:38 ` Arnd Bergmann
  2026-04-21  9:18   ` Peng Yang
  0 siblings, 1 reply; 6+ messages in thread
From: Arnd Bergmann @ 2026-04-21  7:38 UTC (permalink / raw)
  To: Peng Yang, Amit Shah, Greg Kroah-Hartman
  Cc: kernel, virtualization, linux-kernel

On Tue, Apr 21, 2026, at 08:57, Peng Yang wrote:
> __send_to_port() busy-waits in virtqueue_get_buf() while holding
> outvq_lock with IRQs disabled. If the host stops draining the TX
> virtqueue, this loop never terminates.
>
> This was observed during secondary VM boot: virtio_mem plugged memory
> in multiple iterations, each emitting dev_info() messages through the
> hvc console. A writev() on the hvc TTY entered __send_to_port() and
> stalled in the spin loop. When the watchdog bark ISR fired on another
> CPU, it attempted printk(), which tried to acquire outvq_lock through
> the same path and spun indefinitely. With all CPUs stuck, the watchdog
> could not be serviced and triggered a bite.
>
> Add a 200 ms deadline using ktime_get_mono_fast_ns() to bound the spin
> loop. ktime_get_mono_fast_ns() reads the hardware counter directly and
> is safe to call with IRQs disabled and spinlocks held.
>
> The 200 ms value is chosen to be far above normal host response latency
> (microseconds) to avoid spurious exits, yet well below the watchdog
> bark-to-bite window (typically 3 s) so that CPUs can escape the loop
> and complete the bark handler before a bite occurs.

Which host implementation do you use? The way the virtio_console
driver works really assumes that virtqueue_kick() consumes the
buffer synchronously. Even though that is not how virtio is
specified, this does tend to work. ;-)

> @@ -632,10 +634,18 @@ static ssize_t __send_to_port(struct port *port, struct scatterlist *sg,
>  	 * buffer and relax the spinning requirement.  The downside is
> 	 * we need to kmalloc a GFP_ATOMIC buffer each time the
> 	 * console driver writes something out.
> +	 *
> +	 * To avoid spinning forever if the host stops processing the
> +	 * TX virtqueue (e.g. during VM shutdown), a 200ms deadline is
> +	 * used to break out of the loop as a fallback.
 	 */

Did you by any chance mean to use microseconds instead of milliseconds?
Waiting this long with interrupts disabled likely breaks a lot
of assumptions, e.g. in the scheduler. If you have to deal with
a hypervisor that does not handle the console output synchronously,
the alternative suggested in the existing comment would likely
be more appropriate.

      Arnd


* Re: [PATCH] virtio_console: add timeout to __send_to_port() spin loop
  2026-04-21  7:38 ` Arnd Bergmann
@ 2026-04-21  9:18   ` Peng Yang
  2026-04-21 12:14     ` Arnd Bergmann
  0 siblings, 1 reply; 6+ messages in thread
From: Peng Yang @ 2026-04-21  9:18 UTC (permalink / raw)
  To: Arnd Bergmann, Amit Shah, Greg Kroah-Hartman
  Cc: kernel, virtualization, linux-kernel, kernel



On 4/21/2026 3:38 PM, Arnd Bergmann wrote:
> On Tue, Apr 21, 2026, at 08:57, Peng Yang wrote:
>> __send_to_port() busy-waits in virtqueue_get_buf() while holding
>> outvq_lock with IRQs disabled. If the host stops draining the TX
>> virtqueue, this loop never terminates.
>>
>> This was observed during secondary VM boot: virtio_mem plugged memory
>> in multiple iterations, each emitting dev_info() messages through the
>> hvc console. A writev() on the hvc TTY entered __send_to_port() and
>> stalled in the spin loop. When the watchdog bark ISR fired on another
>> CPU, it attempted printk(), which tried to acquire outvq_lock through
>> the same path and spun indefinitely. With all CPUs stuck, the watchdog
>> could not be serviced and triggered a bite.
>>
>> Add a 200 ms deadline using ktime_get_mono_fast_ns() to bound the spin
>> loop. ktime_get_mono_fast_ns() reads the hardware counter directly and
>> is safe to call with IRQs disabled and spinlocks held.
>>
>> The 200 ms value is chosen to be far above normal host response latency
>> (microseconds) to avoid spurious exits, yet well below the watchdog
>> bark-to-bite window (typically 3 s) so that CPUs can escape the loop
>> and complete the bark handler before a bite occurs.
> 
> Which host implementation do you use? The way the virtio_console
> driver works really assumes that virtqueue_kick() consumes the
> buffer synchronously. Even though that is not how virtio is
> specified, this does tend to work. ;-)
> 
We are using crosvm as the host VMM with its virtio-console backend,
running on Android. The trigger is Android host reboot/shutdown: when
Android initiates a reboot, the crosvm process exits and tears down
the virtio-console backend. At that point, the TX virtqueue is no
longer being drained by the host and will never be consumed again.

The crash dump from the actual failure confirms the exact deadlock
scenario:

Core 3 holds outvq_lock and spins forever in virtqueue_get_buf waiting
for the host to consume the buffer:

virtqueue_get_buf
__send_to_port
put_chars
hvc_push
hvc_write
n_tty_write
  <- writev() syscall

Core 0 has a watchdog bark ISR fire and attempts printk, holds the
console lock, but spins on _raw_spin_lock_irqsave waiting to acquire
outvq_lock:

queued_spin_lock_slowpath
_raw_spin_lock_irqsave
__send_to_port
put_chars
hvc_console_print
console_flush_all
console_unlock
vprintk_emit
  <- printk (watchdog bark handler)

Core 1 has a virtio_mem worker calling _dev_info, spinning inside
vprintk_emit waiting to acquire the console lock which is held by Core
0:

vprintk_emit       <- spinning for console lock
dev_vprintk_emit
dev_printk_emit
__dev_printk
_dev_info
  <- virtio_mem worker

Core 2 (khvcd kernel thread) is also blocked in __hvc_poll trying to
acquire outvq_lock:

queued_spin_lock_slowpath
_raw_spin_lock_irqsave
__hvc_poll
khvcd

With all CPUs stuck, the watchdog cannot be serviced and a bite occurs
before the graceful shutdown can complete.

The 200 ms timeout is intended as a bounded escape from this "backend
already gone" scenario. It is well above normal crosvm response latency
(microseconds) to avoid false exits under normal operation, and well
below the watchdog bark-to-bite window so that CPUs can escape the loop
and allow the graceful shutdown to proceed.
>> @@ -632,10 +634,18 @@ static ssize_t __send_to_port(struct port *port, struct scatterlist *sg,
>>  	 * buffer and relax the spinning requirement.  The downside is
>> 	 * we need to kmalloc a GFP_ATOMIC buffer each time the
>> 	 * console driver writes something out.
>> +	 *
>> +	 * To avoid spinning forever if the host stops processing the
>> +	 * TX virtqueue (e.g. during VM shutdown), a 200ms deadline is
>> +	 * used to break out of the loop as a fallback.
>  	 */
> 
> Did you by any chance mean to use microseconds instead of milliseconds?
> Waiting this long with interrupts disabled likely breaks a lot
> of assumptions, e.g. in the scheduler. If you have to deal with
> a hypervisor that does not handle the console output synchronously,
> the alternative suggested in the existing comment would likely
> be more appropriate.
> 
>       Arnd

The unit is intentionally milliseconds, not microseconds. The timeout
only triggers in the abnormal case where crosvm has already exited —
under normal operation crosvm drains the TX virtqueue in microseconds
and the loop exits immediately. The 200 ms value is chosen to be well
below the watchdog bark-to-bite window (typically a few seconds) so
that CPUs can escape the spin loop and allow the watchdog bark handler
to complete before a bite occurs.

You are right that holding IRQs disabled for up to 200 ms is far from
ideal and breaks scheduler assumptions. We considered the alternative
suggested in the existing comment — copying data to a GFP_ATOMIC buffer
to avoid the spin entirely — but that approach has a fundamental limitation
in our case: the spin loop is not just in __send_to_port called from the hvc
TTY path, but also in send_control_msg, and more critically, the deadlock
occurs because printk itself ends up calling put_chars → __send_to_port
while holding the console lock with IRQs disabled. Refactoring to eliminate
the spin loop entirely would require a more invasive rework of the console
write path, which is beyond the scope of this fix.

The 200 ms timeout is intended as a minimal, targeted workaround to prevent
the watchdog bite in our specific scenario. We are open to suggestions on a
better long-term approach.


* Re: [PATCH] virtio_console: add timeout to __send_to_port() spin loop
  2026-04-21  9:18   ` Peng Yang
@ 2026-04-21 12:14     ` Arnd Bergmann
  2026-04-21 14:11       ` Peng Yang
  2026-04-21 14:23       ` Peng Yang
  0 siblings, 2 replies; 6+ messages in thread
From: Arnd Bergmann @ 2026-04-21 12:14 UTC (permalink / raw)
  To: Peng Yang, Amit Shah, Greg Kroah-Hartman
  Cc: kernel, virtualization, linux-kernel, kernel

On Tue, Apr 21, 2026, at 11:18, Peng Yang wrote:
> On 4/21/2026 3:38 PM, Arnd Bergmann wrote:
>> 
>> Which host implementation do you use? The way the virtio_console
>> driver works really assumes that virtqueue_kick() consumes the
>> buffer synchronously. Even though that is not how virtio is
>> specified, this does tend to work. ;-)
>> 
> We are using crosvm as the host VMM with its virtio-console backend,
> running on Android. The trigger is Android host reboot/shutdown: when
> Android initiates a reboot, the crosvm process exits and tears down
> the virtio-console backend. At that point, the TX virtqueue is no
> longer being drained by the host and will never be consumed again.

I see, so the normal behavior is likely just fine, but the error
handling is what goes wrong. Maybe there is a way for the guest
to detect the device being torn down already so it does not
actually have to wait any more?

> The crash dump from the actual failure confirms the exact deadlock
> scenario:
>
> Core 3 holds outvq_lock and spins forever in virtqueue_get_buf waiting
> for the host to consume the buffer:
>
> virtqueue_get_buf
> __send_to_port
> put_chars
> hvc_push
> hvc_write
> n_tty_write
>   <- writev() syscall

This current loop here is

        while (!virtqueue_get_buf(out_vq, &len)
                && !virtqueue_is_broken(out_vq))
                cpu_relax();

which looks like the virtqueue_is_broken() check is meant to
catch this exact case. Do you know why this does not break
out of the loop after crosvm tears down the virtio-console
device?

> Core 0 has a watchdog bark ISR fire and attempts printk, holds the
> console lock, but spins on _raw_spin_lock_irqsave waiting to acquire
> outvq_lock:
>
> queued_spin_lock_slowpath
> _raw_spin_lock_irqsave
> __send_to_port
> put_chars
> hvc_console_print
> console_flush_all
> console_unlock
> vprintk_emit
>   <- printk (watchdog bark handler)

My first thought here was that __send_to_port() should perhaps
release the lock during the while() loop, which should avoid
blocking the other threads on the spin_lock_irqsave() but
would not avoid blocking on the loop.

> The 200 ms timeout is intended as a minimal, targeted workaround to prevent
> the watchdog bite in our specific scenario. We are open to suggestions on a
> better long-term approach.

Not sure how to do it, but I think finding a way to call
virtio_break_device() at the point the host device goes away is
the best solution here. Ideally there would just be a notification
from the host, but since __send_to_port() may be called with
interrupts disabled and may be running on the only CPU, that
would still be unreliable.

Maybe there is a way for virtio_console to read a status
register in the virtio config that tells it whether the
host has turned it off? I was thinking vdev->config->get_status(vdev)
but that seems to only get updated by the guest.

      Arnd


* Re: [PATCH] virtio_console: add timeout to __send_to_port() spin loop
  2026-04-21 12:14     ` Arnd Bergmann
@ 2026-04-21 14:11       ` Peng Yang
  2026-04-21 14:23       ` Peng Yang
  1 sibling, 0 replies; 6+ messages in thread
From: Peng Yang @ 2026-04-21 14:11 UTC (permalink / raw)
  To: Arnd Bergmann, Amit Shah, Greg Kroah-Hartman
  Cc: kernel, virtualization, linux-kernel, kernel



On 4/21/2026 8:14 PM, Arnd Bergmann wrote:
> On Tue, Apr 21, 2026, at 11:18, Peng Yang wrote:
>> On 4/21/2026 3:38 PM, Arnd Bergmann wrote:
>>>
>>> Which host implementation do you use? The way the virtio_console
>>> driver works really assumes that virtqueue_kick() consumes the
>>> buffer synchronously. Even though that is not how virtio is
>>> specified, this does tend to work. ;-)
>>>
>> We are using crosvm as the host VMM with its virtio-console backend,
>> running on Android. The trigger is Android host reboot/shutdown: when
>> Android initiates a reboot, the crosvm process exits and tears down
>> the virtio-console backend. At that point, the TX virtqueue is no
>> longer being drained by the host and will never be consumed again.
> 
> I see, so the normal behavior is likely just fine, but the error
> handling is what goes wrong. Maybe there is a way for the guest
> to detect the device being turn down already so it does not
> actually have to wait any more?
> 
Yes, exactly. Normal operation is fine; the problem is purely in the
error/teardown path.
We investigated both the virtqueue_is_broken() path and the virtio config
status register path. Neither works in our scenario, for the reasons
explained below.

>> The crash dump from the actual failure confirms the exact deadlock
>> scenario:
>>
>> Core 3 holds outvq_lock and spins forever in virtqueue_get_buf waiting
>> for the host to consume the buffer:
>>
>> virtqueue_get_buf
>> __send_to_port
>> put_chars
>> hvc_push
>> hvc_write
>> n_tty_write
>>   <- writev() syscall
> 
> This current loop here is
> 
>         while (!virtqueue_get_buf(out_vq, &len)
>                 && !virtqueue_is_broken(out_vq))
>                 cpu_relax();
> 
> which looks like the virtqueue_is_broken() check is meant to
> catch this exact case. Do you know why this does not break
> out of the loop after crosvm tears down the virtio-console
> device?
> 
virtqueue_is_broken() only reads the guest-side vq->broken flag, which is
set either by virtio_break_device() or by a failed virtqueue_notify() kick.
Neither happens here:
- When the host VMM exits, it does so as a pure userspace process termination.
No PCI interrupt or notification is sent to the guest, so
virtio_break_device() is never called from the guest side.
- __send_to_port() runs with IRQs disabled and outvq_lock held. Even if the
host were to send a config change interrupt, it cannot be delivered in this
context, so the async chain virtio_config_changed() → config_intr()
→ virtio_break_device() is completely blocked.

>> Core 0 has a watchdog bark ISR fire and attempts printk, holds the
>> console lock, but spins on _raw_spin_lock_irqsave waiting to acquire
>> outvq_lock:
>>
>> queued_spin_lock_slowpath
>> _raw_spin_lock_irqsave
>> __send_to_port
>> put_chars
>> hvc_console_print
>> console_flush_all
>> console_unlock
>> vprintk_emit
>>   <- printk (watchdog bark handler)
> 
> My first thought here was that __send_to_port() should perhaps
> release the lock during the while() loop, which should avoid
> blocking the other threads on the spin_lock_irqsave() but
> would not avoid blocking on the loop.
> 
Releasing the lock during the spin would unblock other CPUs waiting on outvq_lock
(e.g. the watchdog bark handler trying to printk). However it does not fix the
root issue — the CPU holding the lock would still spin forever. It also introduces
a TOCTOU race: another thread could modify the port or queue state between the
unlock and re-lock. The timeout avoids both problems by bounding the spin duration
without releasing the lock.

>> The 200 ms timeout is intended as a minimal, targeted workaround to prevent
>> the watchdog bite in our specific scenario. We are open to suggestions on a
>> better long-term approach.
> 
> Not sure how to do it, but I think finding a way to call
> virtio_break_device() at the point the host device goes away is
> the best solution here. Ideally there would just be a notification
> from the host, but since __send_to_port() may be called with
> interrupts disabled and may be running on the only CPU, that
> would still be unreliable.
> 
> Maybe there is a way for virtio_console to read a status
> register in the virtio config that tells it whether the
> host has turned it off? I was thinking vdev->config->get_status(vdev)
> but that seems to only get updated by the guest.
> 
>       Arnd

We checked this. In our host VMM implementation, VIRTIO_CONFIG_S_NEEDS_RESET is
never set on the teardown path — the device simply stops responding without
updating any status register. So polling vp_modern_get_status() inside the spin
loop would not help here.
We agree that the ideal long-term fix is for the host to trigger virtio_break_device()
via a clean PCI hot-unplug sequence, but that is not possible in a crash or forced
reboot scenario.
The 200ms value is chosen to be well above normal host response time (microseconds)
to avoid false positives, while remaining well below the watchdog bark-to-bite window
(3 seconds) to ensure all CPUs can exit the loop and complete the bark handler before
a bite occurs.



* Re: [PATCH] virtio_console: add timeout to __send_to_port() spin loop
  2026-04-21 12:14     ` Arnd Bergmann
  2026-04-21 14:11       ` Peng Yang
@ 2026-04-21 14:23       ` Peng Yang
  1 sibling, 0 replies; 6+ messages in thread
From: Peng Yang @ 2026-04-21 14:23 UTC (permalink / raw)
  To: Arnd Bergmann, Amit Shah, Greg Kroah-Hartman
  Cc: kernel, virtualization, linux-kernel, kernel



On 4/21/2026 8:14 PM, Arnd Bergmann wrote:
> On Tue, Apr 21, 2026, at 11:18, Peng Yang wrote:
>> On 4/21/2026 3:38 PM, Arnd Bergmann wrote:
>>>
>>> Which host implementation do you use? The way the virtio_console
>>> driver works really assumes that virtqueue_kick() consumes the
>>> buffer synchronously. Even though that is not how virtio is
>>> specified, this does tend to work. ;-)
>>>
>> We are using crosvm as the host VMM with its virtio-console backend,
>> running on Android. The trigger is Android host reboot/shutdown: when
>> Android initiates a reboot, the crosvm process exits and tears down
>> the virtio-console backend. At that point, the TX virtqueue is no
>> longer being drained by the host and will never be consumed again.
> 
> I see, so the normal behavior is likely just fine, but the error
> handling is what goes wrong. Maybe there is a way for the guest
> to detect the device being torn down already so it does not
> actually have to wait any more?
> 
Yes, exactly. Normal operation is fine; the problem is purely
in the error/teardown path.
We investigated both the virtqueue_is_broken() path and the
virtio config status register path. Neither works in our
scenario, for the reasons explained below.

>> The crash dump from the actual failure confirms the exact deadlock
>> scenario:
>>
>> Core 3 holds outvq_lock and spins forever in virtqueue_get_buf waiting
>> for the host to consume the buffer:
>>
>> virtqueue_get_buf
>> __send_to_port
>> put_chars
>> hvc_push
>> hvc_write
>> n_tty_write
>>   <- writev() syscall
> 
> This current loop here is
> 
>         while (!virtqueue_get_buf(out_vq, &len)
>                 && !virtqueue_is_broken(out_vq))
>                 cpu_relax();
> 
> which looks like the virtqueue_is_broken() check is meant to
> catch this exact case. Do you know why this does not break
> out of the loop after crosvm tears down the virtio-console
> device?
> 
virtqueue_is_broken() only reads the guest-side vq->broken
flag, which is set either by virtio_break_device() or by a
failed virtqueue_notify() kick. Neither happens here:

- When the host VMM exits, it does so as a pure userspace
  process termination. No PCI interrupt or notification is
  sent to the guest, so virtio_break_device() is never
  called from the guest side.
- __send_to_port() runs with IRQs disabled and outvq_lock
  held. Even if the host were to send a config change
  interrupt, it cannot be delivered in this context, so the
  async chain virtio_config_changed() -> config_intr()
  -> virtio_break_device() is completely blocked.

As a result, vq->broken remains false forever and the loop
never exits.

>> Core 0 has a watchdog bark ISR fire and attempts printk, holds the
>> console lock, but spins on _raw_spin_lock_irqsave waiting to acquire
>> outvq_lock:
>>
>> queued_spin_lock_slowpath
>> _raw_spin_lock_irqsave
>> __send_to_port
>> put_chars
>> hvc_console_print
>> console_flush_all
>> console_unlock
>> vprintk_emit
>>   <- printk (watchdog bark handler)
> 
> My first thought here was that __send_to_port() should perhaps
> release the lock during the while() loop, which should avoid
> blocking the other threads on the spin_lock_irqsave() but
> would not avoid blocking on the loop.
> 
Releasing the lock during the spin would unblock other CPUs
waiting on outvq_lock (e.g. the watchdog bark handler trying
to printk). However it does not fix the root issue — the CPU
holding the lock would still spin forever. It also introduces
a TOCTOU race: another thread could modify the port or queue
state between the unlock and re-lock. The timeout avoids both
problems by bounding the spin duration without releasing the
lock.

>> The 200 ms timeout is intended as a minimal, targeted workaround to prevent
>> the watchdog bite in our specific scenario. We are open to suggestions on a
>> better long-term approach.
> 
> Not sure how to do it, but I think finding a way to call
> virtio_break_device() at the point the host device goes away is
> the best solution here. Ideally there would just be a notification
> from the host, but since __send_to_port() may be called with
> interrupts disabled and may be running on the only CPU, that
> would still be unreliable.
> 
> Maybe there is a way for virtio_console to read a status
> register in the virtio config that tells it whether the
> host has turned it off? I was thinking vdev->config->get_status(vdev)
> but that seems to only get updated by the guest.
> 
>       Arnd

We checked this. In our host VMM implementation,
VIRTIO_CONFIG_S_NEEDS_RESET is never set on the teardown
path — the device simply stops responding without updating
any status register. So polling vp_modern_get_status()
inside the spin loop would not help here.

We agree that the ideal long-term fix is for the host to
trigger virtio_break_device() via a clean PCI hot-unplug
sequence, but that is not possible in a crash or forced
reboot scenario.

The 200ms value is chosen to be well above normal host
response time (microseconds) to avoid false positives, while
remaining well below the watchdog bark-to-bite window
(3 seconds) to ensure all CPUs can exit the loop and complete
the bark handler before a bite occurs.

