All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Roger Pau Monné" <roger.pau@citrix.com>
To: Jan Beulich <jbeulich@suse.com>
Cc: Oleksii Kurochko <oleksii.kurochko@gmail.com>,
	Andrew Cooper <andrew.cooper3@citrix.com>,
	Anthony PERARD <anthony.perard@vates.tech>,
	Michal Orzel <michal.orzel@amd.com>,
	Julien Grall <julien@xen.org>,
	Stefano Stabellini <sstabellini@kernel.org>,
	xen-devel@lists.xenproject.org
Subject: Re: [PATCH for-4.22] char/ns16550: bound execution time of ns16550_interrupt()
Date: Tue, 23 Jun 2026 16:16:22 +0200	[thread overview]
Message-ID: <ajqVNtt02XMUGk-X@macbook.local> (raw)
In-Reply-To: <06c8caab-4722-4b02-991f-bdc4a4ac0ea6@suse.com>

On Tue, Jun 23, 2026 at 03:44:06PM +0200, Jan Beulich wrote:
> On 23.06.2026 12:31, Roger Pau Monne wrote:
> > The current logic in ns16550_interrupt() will loop until the device sets
> > the NOINT in IIR.  At least on the Lenovo ThinkSystem SR630 V4 the flow
> > control of the serial-over-lan emulated UART seems to be broken, as it
> > doesn't set the NOINT bit consistently.  The Transmitter Holding Register
> > Empty in LSR also seems to not be properly signaled, as even with it set
> > writes to the transmit register take ~6ms.  This leads to the watchdog
> > triggering very easily on such system.
> > 
> > Introduce an upper bound on the execution time of ns16550_interrupt(), this
> > is currently set as 4x the polling interval, which is calculated as the
> > time to fill RX FIFO and/or empty TX FIFO.  The current maximum is 5ms.
> > Once the timeout triggers the interrupt is disabled and the uart is
> > switched to polling mode.
> > 
> > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> > ---
> > There's a possible alternative approach to solve this by moving the actual
> > interrupt processing to a softirq tasklet and disabling the interrupt
> > source until the processing is done, likely unifying the logic with the
> > timer task.  However that's a bigger change, and too risky for 4.22 at this
> > point.
> 
> +1
> 
> > --- a/xen/drivers/char/ns16550.c
> > +++ b/xen/drivers/char/ns16550.c
> > @@ -62,6 +62,7 @@ static struct ns16550 {
> >  #endif
> >      unsigned int timeout_ms;
> >      bool intr_works;
> > +    bool force_polling;
> >      bool dw_usr_bsy;
> >  #ifdef NS16550_PCI
> >      /* PCI card parameters. */
> > @@ -190,12 +191,41 @@ static void cf_check ns16550_interrupt(int irq, void *dev_id)
> >  {
> >      struct serial_port *port = dev_id;
> >      struct ns16550 *uart = port->uart;
> > +    /* Set quite arbitrarily as 4x the time to drain the TX or fill RX FIFOs. */
> 
> Nit: I'd drop the latter of the two "the".
> 
> > +    const s_time_t timeout = NOW() + min(MILLISECS(uart->timeout_ms * 4),
> > +                                         MILLISECS(5));
> 
> MILLISECS(min(uart->timeout_ms * 4, 5U)) ?

Bah, yes, sorry, I've added the min() later and clearly didn't apply
much brainpower about it's position.

> 
> > +    if ( uart->force_polling )
> > +        return;
> 
> As the IRQ was disabled, is this even possible? I.e. should this be some
> kind of assertion or alike?

Hm, I wasn't setting IRQ_DISABLED before, and hence needed this guard.
But now with IRQ_DISABLED being set in ->status do_IRQ() should filter
any stray interrupts.  I will attempt to add an ASSERT_UNREACHABLE()
here.

> >      uart->intr_works = 1;
> >  
> >      while ( !(ns_read_reg(uart, UART_IIR) & UART_IIR_NOINT) )
> >      {
> >          u8 lsr = ns_read_reg(uart, UART_LSR);
> > +        s_time_t now = NOW();
> > +
> > +        /* Break out of the loop if spending too much time. */
> > +        if ( now > timeout )
> > +        {
> > +            struct irq_desc *desc = irq_to_desc(irq);
> > +
> > +            /* Disable the interrupt source - it's never shared. */
> > +            spin_lock_irq(&desc->lock);
> 
> This needs to be spin_lock_irqsave() - we may not rely on IRQs being on
> when we make it here. However, ...

I was relying on do_IRQ() unconditionally enabling IRQs before calling
the handler, but it's safer to use the _irqsave() variant.

> > +            desc->status |= IRQ_DISABLED;
> > +            if ( desc->handler->disable )
> > +                desc->handler->disable(desc);
> > +            spin_unlock_irq(&desc->lock);
> 
> ... all of this open-coding is quite bad anyway. We should probably add
> a helper for this in IRQ handling code.

Can add, no problem.

> > +            /* Disable interrupt generation on the device and arm the timer. */
> > +            uart->force_polling = true;
> > +            ns_write_reg(uart, UART_IER, 0);
> > +            set_timer(&uart->timer, now + MILLISECS(uart->timeout_ms));
> > +            printk(XENLOG_WARNING
> > +                   "uart interrupt taking too long, switched to polling\n");
> 
> Probably it is indeed best to keep this simple, but: A single instance of
> this taking e.g. just over 5ms (perhaps with a low baud rate) may not be
> indicative of an actual issue.

It's all fiddly, yes, I could add a counter and maybe only disable if
we have a certain amount of executions of ns16550_interrupt()
exceeding the timeout, yet I wanted to keep this simple.

> To alleviate this as least some, perhaps
> besides capping at 5ms we should also make sure that the timeout used isn't
> below ->timeout_ms?

OK, let me see what I can do.

Thanks, Roger.


  reply	other threads:[~2026-06-23 14:16 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-23 10:31 [PATCH for-4.22] char/ns16550: bound execution time of ns16550_interrupt() Roger Pau Monne
2026-06-23 13:36 ` Oleksii Kurochko
2026-06-23 13:46   ` Jan Beulich
2026-06-23 14:19     ` Roger Pau Monné
2026-06-23 13:44 ` Jan Beulich
2026-06-23 14:16   ` Roger Pau Monné [this message]
2026-06-23 14:27     ` Jan Beulich
2026-06-23 15:54       ` Roger Pau Monné

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ajqVNtt02XMUGk-X@macbook.local \
    --to=roger.pau@citrix.com \
    --cc=andrew.cooper3@citrix.com \
    --cc=anthony.perard@vates.tech \
    --cc=jbeulich@suse.com \
    --cc=julien@xen.org \
    --cc=michal.orzel@amd.com \
    --cc=oleksii.kurochko@gmail.com \
    --cc=sstabellini@kernel.org \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.