linux-serial.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] IRQ handling race and spurious IIR read in serial/8250.c
@ 2009-03-12 17:57 Markus Armbruster
  2009-03-12 18:09 ` Alan Cox
  0 siblings, 1 reply; 5+ messages in thread
From: Markus Armbruster @ 2009-03-12 17:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-serial, xen-devel@lists.xensource.com, Ian Jackson,
	Anders Kaseorg, Jeremy Fitzhardinge, Andrew Morton

From: Ian Jackson <ian.jackson@eu.citrix.com>

Do not read IIR in serial8250_start_tx when UART_BUG_TXEN

Reading the IIR clears some oustanding interrupts so it is not safe.
Instead, simply transmit immediately if the buffer is empty without
regard to IIR.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>

---
Ian Jackson recently debugged a problem reported by Anders Kaseorg, and
posted a fix (see http://lkml.org/lkml/2009/2/11/240).  As far as I can
see, the patch has been dropped on the floor.

I also experienced the problem, and Ian's patch fixes it for me.

The bug bites Xen HVM guests.  Output to the serial console stalls until
some input happens.  As Ian's analysis quoted below shows, it is a race
condition that could theoretically bite elsewhere as well.

Ian Jackson explained:
> The bugs in detail (this discussion applies to 2.6.20 and also to
> 2.6.28.4):
>
>  1. The hunk of serial8250_startup I quote below attempts to discover
>     whether writing the IER re-asserts the THRI (transmit ready)
>     interrupt.  However the spinlock that it has taken out,
>     port->lock, is not the one that the IRQ service routine takes
>     before reading the IIR (i->lock).  As a result, on an SMP system
>     the generated interrupt races with the straight-line code in
>     serial8250_startup.
>
>     If serial8250_startup loses the race (perhaps because the system
>     is a VM and its VCPU got preempted), UART_BUG_TXEN is spuriously
>     added to bugs.  This is quite unlikely in a normal system but in
>     certain Xen configurations, particularly ones where there is CPU
>     pressure, we may lose the race every time.
>
>     It is not exactly clear to me how this ought to be resolved.  One
>     possibility is that the UART_BUG_TXEN problem might be worked
>     around perfectly well by the new and very similar workaround
>     UART_BUG_THRE[1] in 2.6.21ish in which case it could just be
>     removed.
>      2. UART_BUG_TXEN's workaround appears to be intended to be
> harmless.
>     However what it actually does is to read the IIR, thus clearing
>     any actual interrupt (including incidentally non-THRI), and then
>     only perform the intended servicing if the interrupt was _not_
>     asserted.  That is, it breaks on any serial port with the bug.
>
>     As far as I can see there is not much use in UART_BUG_TXEN reading
>     IIR at all, so a suitable change if we want to keep UART_BUG_TXEN
>     might be the first patch I enclose below (again, not compiled
>     or tested).
>
>     If UART_BUG_TXEN is retained something along these lines should be
>     done at the very least.
>
> Ian.
>
> [1] http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=40b36daad0ac704e6d5c1b75789f371ef5b053c1
>     in which case UART

--- ../linux-2.6.28.4/drivers/serial/8250.c~	2009-02-06 21:47:45.000000000 +0000
+++ ../linux-2.6.28.4/drivers/serial/8250.c	2009-02-11 15:55:24.000000000 +0000
@@ -1257,14 +1257,12 @@
 		serial_out(up, UART_IER, up->ier);
 
 		if (up->bugs & UART_BUG_TXEN) {
-			unsigned char lsr, iir;
+			unsigned char lsr;
 			lsr = serial_in(up, UART_LSR);
 			up->lsr_saved_flags |= lsr & LSR_SAVE_FLAGS;
-			iir = serial_in(up, UART_IIR) & 0x0f;
 			if ((up->port.type == PORT_RM9000) ?
-				(lsr & UART_LSR_THRE &&
-				(iir == UART_IIR_NO_INT || iir == UART_IIR_THRI)) :
-				(lsr & UART_LSR_TEMT && iir & UART_IIR_NO_INT))
+				(lsr & UART_LSR_THRE) :
+				(lsr & UART_LSR_TEMT))
 				transmit_chars(up);
 		}
 	}



^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] IRQ handling race and spurious IIR read in serial/8250.c
  2009-03-12 17:57 [PATCH] IRQ handling race and spurious IIR read in serial/8250.c Markus Armbruster
@ 2009-03-12 18:09 ` Alan Cox
  2009-03-12 19:30   ` Ian Jackson
  0 siblings, 1 reply; 5+ messages in thread
From: Alan Cox @ 2009-03-12 18:09 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Jeremy Fitzhardinge, Ian, Jackson, linux-kernel, Anders Kaseorg,
	xen-devel@lists.xensource.com, linux-serial, Andrew Morton

> Ian Jackson recently debugged a problem reported by Anders Kaseorg, and
> posted a fix (see http://lkml.org/lkml/2009/2/11/240).  As far as I can
> see, the patch has been dropped on the floor.

It was proposed as a band aid fix not a real fix. The real fix involves
finishing sorting out any locking issues Ian identified. I've not seen a
patch for that yet.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] IRQ handling race and spurious IIR read in serial/8250.c
  2009-03-12 18:09 ` Alan Cox
@ 2009-03-12 19:30   ` Ian Jackson
  2009-03-12 21:38     ` Alan Cox
  0 siblings, 1 reply; 5+ messages in thread
From: Ian Jackson @ 2009-03-12 19:30 UTC (permalink / raw)
  To: Alan Cox
  Cc: Jeremy Fitzhardinge, xen-devel@lists.xensource.com,
	linux-kernel@vger.kernel.org, Markus Armbruster, Anders Kaseorg,
	linux-serial@vger.kernel.org, Andrew Morton

Alan Cox writes ("Re: [PATCH] IRQ handling race and spurious IIR read in serial/8250.c"):
> > Ian Jackson recently debugged a problem reported by Anders Kaseorg, and
> > posted a fix (see http://lkml.org/lkml/2009/2/11/240).  As far as I can
> > see, the patch has been dropped on the floor.
> 
> It was proposed as a band aid fix not a real fix. The real fix involves
> finishing sorting out any locking issues Ian identified. I've not seen a
> patch for that yet.

The patch I have provided and which Markus has reposted has these
properties:

 1. It is definitely correct and does not introduce any new bugs.

 2. It makes the existing bug definitely go away.  That is, after my
    patch is applied the code drives the UART correctly - that is,
    according to the specification and in a manner guaranteed to be
    reliable - even though the code may still mistakenly set a bug
    flag;

 3. Without it the code is definitely wrong;

 4. Any fix which does not involve completely removing UART_BUG_TXEN
    will need my change.   

Or to put it another way

                        UART_BUG_TXEN           UART_BUG_TXEN
                        incorrectly set         behaviour

    Current code         Sometimes in VMs        Breaks correct systems
                         Rarely in baremetal?    (observed in production)

    With my patch        Sometimes in VMs        Always correct.
                         (behaviour unchanged)   Minor possible
                                                  performance problem.

    Perfect code         Perhaps feature         If retained, should
                         should be removed?       be correct.  Minor
                                                  performance impact
                                                  is acceptable.

My patch is therefore a strict improvement and also a likely component
of many of the possible ultimate fixes; the other ultimate fixes
involve removing entirely the code which I am now proposing to patch.

Please do not block this bugfix just because I haven't been able to
conclusively determine whether the buggy-without-my-patch workaround
should be entirely removed, and just because I do not fix every other
bug in the same area.

In particular, the fact that the detection of UART_BUG_TXEN remains
buggy is not a reason to block a patch which makes UART_BUG_TXEN's
effects correct.

As you can see `perfect code' is not yet attainable because the people
who originally implemented UART_BUG_TXEN don't seem to be around right
now to ask, and we don't have a 100% clear description of the buggy
behaviour it is trying to detect, and so we don't know whether it can
safely be removed.

Ian.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Re: [PATCH] IRQ handling race and spurious IIR read in serial/8250.c
  2009-03-12 19:30   ` Ian Jackson
@ 2009-03-12 21:38     ` Alan Cox
  2009-06-19  1:39       ` [Xen-devel] " Robert Evans
  0 siblings, 1 reply; 5+ messages in thread
From: Alan Cox @ 2009-03-12 21:38 UTC (permalink / raw)
  To: Ian Jackson
  Cc: Jeremy Fitzhardinge, xen-devel@lists.xensource.com,
	Markus Armbruster, linux-kernel@vger.kernel.org, Anders Kaseorg,
	linux-serial@vger.kernel.org, Andrew Morton

>  1. It is definitely correct and does not introduce any new bugs.

Right ;)

>  2. It makes the existing bug definitely go away.  That is, after my
>     patch is applied the code drives the UART correctly - that is,
>     according to the specification and in a manner guaranteed to be

The UART hardware and specifications diverge - often quite spectacularly.

>  4. Any fix which does not involve completely removing UART_BUG_TXEN
>     will need my change.   

Or the locking fix

It was described as a band aid so I treated it as such. Regardless of the
right thing to do long term its not a 2.6.29 candidate at this point but
certainly something that wants looking into in 2.6.30

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Xen-devel] Re: [PATCH] IRQ handling race and spurious IIR read in serial/8250.c
  2009-03-12 21:38     ` Alan Cox
@ 2009-06-19  1:39       ` Robert Evans
  0 siblings, 0 replies; 5+ messages in thread
From: Robert Evans @ 2009-06-19  1:39 UTC (permalink / raw)
  To: Alan Cox
  Cc: Ian Jackson, Jeremy Fitzhardinge, xen-devel@lists.xensource.com,
	Markus Armbruster, Anders Kaseorg, linux-serial@vger.kernel.org,
	Andrew Morton

On Thu, Mar 12, 2009 at 5:38 PM, Alan Cox<alan@lxorguk.ukuu.org.uk> wrote:
...
>>  4. Any fix which does not involve completely removing UART_BUG_TXEN
>>     will need my change.
>
> Or the locking fix
>
> It was described as a band aid so I treated it as such. Regardless of the
> right thing to do long term its not a 2.6.29 candidate at this point but
> certainly something that wants looking into in 2.6.30

Stratus has experienced this same problem on bare metal.  An analysis
and proposed fixes were posted to this list on 2008-08-13.  See
  http://marc.info/?l=linux-serial&m=121863945506976&w=2

The first proposed fix takes the port's irq lock when starting up the
UART to provide mutual exclusion between the "quick test" in the
startup code and the interrupt service routine.  Stratus has quite a
bit or runtime with this locking fix on kernels of the 2.6.18 vintage
with good success.

Is this a "right thing" fix that could be considered for inclusion upstream?

--- linux-2.6.18-92-old/drivers/serial/8250.c   2008-08-13
14:07:08.000000000 +0000
+++ linux-2.6.18-92-new/drivers/serial/8250.c   2008-08-13
14:04:12.000000000 +0000
@@ -1749,65 +1749,73 @@

               timeout = timeout > 6 ? (timeout / 2 - 2) : 1;

               up->timer.data = (unsigned long)up;
               mod_timer(&up->timer, jiffies + timeout);
       } else {
               retval = serial_link_irq_chain(up);
               if (retval)
                       return retval;
       }

       /*
        * Now, initialize the UART
        */
       serial_outp(up, UART_LCR, UART_LCR_WLEN8);

-       spin_lock_irqsave(&up->port.lock, flags);
+       if (is_real_interrupt(up->port.irq)) {
+               spin_lock_irqsave(&irq_lists[up->port.irq].lock, flags);
+               spin_lock(&up->port.lock);
+       } else
+               spin_lock_irqsave(&up->port.lock, flags);
       if (up->port.flags & UPF_FOURPORT) {
               if (!is_real_interrupt(up->port.irq))
                       up->port.mctrl |= TIOCM_OUT1;
       } else
               /*
                * Most PC uarts need OUT2 raised to enable interrupts.
                */
               if (is_real_interrupt(up->port.irq))
                       up->port.mctrl |= TIOCM_OUT2;

       serial8250_set_mctrl(&up->port, up->port.mctrl);

       /*
        * Do a quick test to see if we receive an
        * interrupt when we enable the TX irq.
        */
       serial_outp(up, UART_IER, UART_IER_THRI);
       lsr = serial_in(up, UART_LSR);
       iir = serial_in(up, UART_IIR);
       serial_outp(up, UART_IER, 0);

       if (lsr & UART_LSR_TEMT && iir & UART_IIR_NO_INT) {
               if (!(up->bugs & UART_BUG_TXEN)) {
                       up->bugs |= UART_BUG_TXEN;
                       pr_debug("ttyS%d - enabling bad tx status workarounds\n",
                                port->line);
               }
       } else {
               up->bugs &= ~UART_BUG_TXEN;
       }

-       spin_unlock_irqrestore(&up->port.lock, flags);
+       if (is_real_interrupt(up->port.irq)) {
+               spin_unlock(&up->port.lock);
+               spin_unlock_irqrestore(&irq_lists[up->port.irq].lock, flags);
+       } else
+               spin_unlock_irqrestore(&up->port.lock, flags);

       /*
        * Finally, enable interrupts.  Note: Modem status interrupts
        * are set via set_termios(), which will be occurring imminently
        * anyway, so we don't enable them here.
        */
       up->ier = UART_IER_RLSI | UART_IER_RDI;
       serial_outp(up, UART_IER, up->ier);

       if (up->port.flags & UPF_FOURPORT) {
               unsigned int icp;
               /*
                * Enable interrupts on the AST Fourport board
                */
               icp = (up->port.iobase & 0xfe0) | 0x01f;
               outb_p(0x80, icp);


Robert N. Evans
Software Engineer
STRATUS TECHNOLOGIES
111 Powdermill Road,
Maynard, MA 01754-3409  U.S.A.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2009-06-19  1:39 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-03-12 17:57 [PATCH] IRQ handling race and spurious IIR read in serial/8250.c Markus Armbruster
2009-03-12 18:09 ` Alan Cox
2009-03-12 19:30   ` Ian Jackson
2009-03-12 21:38     ` Alan Cox
2009-06-19  1:39       ` [Xen-devel] " Robert Evans

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).