linux-serial.vger.kernel.org archive mirror
* 8250 misses interrupt, stalls
@ 2008-08-08  5:27 Jeff DeFouw
  2008-08-08 12:36 ` Kohne, Mike
  0 siblings, 1 reply; 4+ messages in thread
From: Jeff DeFouw @ 2008-08-08  5:27 UTC (permalink / raw)
  To: linux-serial

I have a Celeron ETX processor module with a Winbond W83627HF SuperIO 
chip, providing two 16550A compatible UARTs, which connects over LPC to 
an 82801DB south bridge.  I'm only using the first serial port at the 
moment.  The modules 8250 and 8250_pnp are loaded, and the serial port 
is on standard resources (irq 4).  After one of my complex applications 
at work runs for a few minutes, the serial port suddenly stops.  The 
UART IIR indicates an interrupt is pending, and the LSR indicates data 
is waiting to be received (as well as sent), but the interrupt handler 
is not being called.  If while it's stuck I reset the enabled interrupts 
(save IER, clear IER, restore IER) the I/O resumes.  I don't see 
anything wrong with the way the UART is being serviced.  The kernel is 
patched to 2.6.25.11, configured by Debian as SMP, no PREEMPT, shared 
IRQ (nothing else is on irq 4) and without any external modules loaded.  
It's a Celeron without any HT or multiple cores, so it's uniprocessor.  
I haven't been able to make a simple test case.  The real program is 
kind of a loopback that's continuously receiving and sending data at 
115200 8N1 with no flow control.  The full data rate isn't being used, 
and I'm not getting any overruns (until it stalls).  A simple loopback 
program doesn't demonstrate the problem, but the real application runs 
into it every time within 10 minutes (usually under 5).

-- 
Jeff DeFouw <jeffd@i2k.com>


* RE: 8250 misses interrupt, stalls
  2008-08-08  5:27 8250 misses interrupt, stalls Jeff DeFouw
@ 2008-08-08 12:36 ` Kohne, Mike
  2008-08-13 14:56   ` [PATCH/RFC] " Robert Evans
  2008-08-20 15:29   ` Jeff DeFouw
  0 siblings, 2 replies; 4+ messages in thread
From: Kohne, Mike @ 2008-08-08 12:36 UTC (permalink / raw)
  To: Jeff DeFouw, linux-serial

I have a system (single core CPU, Winbond 83627HF, but with extra serial
ports provided by a SCH3116 - another superio chip). In my box, the
extra serial ports refuse to work (I never see interrupts) when I use
SMP kernels. If I rebuild the kernel without SMP, my serial ports work
fine. I've never gotten a sufficient explanation as to why turning on
an SMP kernel would screw up these serial ports. We don't have the heavy
(continuous) use that you do, so we would never have noticed the kind of
errors you see.

Now obviously there's a wide difference between my behavior (they never
work) and yours (they work for a while), but you might want to try
building a non-SMP kernel (I took the distributed kernel config as my
starting point, and just turned off SMP using make menuconfig). 

I'd be interested to know if this had any effect on your problem. 

Good luck!



* [PATCH/RFC] RE: 8250 misses interrupt, stalls
  2008-08-08 12:36 ` Kohne, Mike
@ 2008-08-13 14:56   ` Robert Evans
  2008-08-20 15:29   ` Jeff DeFouw
  1 sibling, 0 replies; 4+ messages in thread
From: Robert Evans @ 2008-08-13 14:56 UTC (permalink / raw)
  To: linux-serial

At Stratus we have also seen missed interrupts in 8250.c.  We
concluded that these are caused by a false positive in the serial
driver's UART_BUG_TXEN detection on SMP systems, as explained below,
along with patches to fix the problem.

Symptom and Environment:

On an SMP x86_64 system that has 8 CPU cores, we see serial output to
16550A UARTs stalling.  When this situation occurs, the associated
tty_struct does not have stopped flags set and the uart_info->xmit
buffer is not empty.  If interrupts were occurring the data should be
sent to the UART.  Because the data is not being sent, it seems a
"transmitter holding register empty" interrupt (THRI) is getting lost
and therefore outgoing data stops.

We only see this bug on SMP systems.  When we boot a multi-core system
with maxcpus=1 the problem does not occur.  In these tests, a serial
console is on ttyS0 which uses IRQ4.  ttyS1 is used by pppd and it
uses IRQ3.  Neither IRQ is shared.  We can test by rebooting the
system and after the init scripts have run, looking at whether the
UART_BUG_TXEN bit is set for each tty.  With maxcpus=1 UART_BUG_TXEN
never gets set.  Without maxcpus on the kernel command line, we see
false positives in setting UART_BUG_TXEN perhaps 75 to 95 percent of
the time on ttyS1, and about 20 percent of the time on ttyS0.

The particular UARTs used in the Stratus system are part of the
PC87427 Server IO chip.  This chip was designed by National
Semiconductor and documentation on the NS web site says the chip is
compatible with 16450 and 16550A.  That business has since been
transferred from NS to Winbond, so those parts now come from Winbond.
This is a modern UART implementation that should not have the
UART_BUG_TXEN.

We saw this problem in Red Hat Enterprise Linux 5.2, with kernel
linux-2.6.18-92.el5.  However from source code examination, it seems
this would also occur in linux 2.6.24 and later.

Analysis:
(line numbers from linux v2.6.24.4)

How UART_BUG_TXEN gets set due to a false positive on SMP systems ---
The UART is initialized by function serial8250_startup() in 8250.c.
At line 1860 the call to serial_link_irq_chain(up) connects the IRQ to
the ISR in this driver.  It is relevant that the ISR reads the IIR
before it tries to acquire the up->port.lock spinlock, and reading the
IIR would clear THRI if it is the interrupt cause, thus breaking this
detection logic that comes a few lines later in serial8250_startup().
Line 1881 is the last step necessary for the ISR to be entered.

in serial8250_startup:
1860                retval = serial_link_irq_chain(up);
...
1874        } else
1875                /*
1876                 * Most PC uarts need OUT2 raised to enable interrupts.
1877                 */
1878                if (is_real_interrupt(up->port.irq))
1879                        up->port.mctrl |= TIOCM_OUT2;
1880
1881        serial8250_set_mctrl(&up->port, up->port.mctrl);
1882
1883        /*
1884         * Do a quick test to see if we receive an
1885         * interrupt when we enable the TX irq.
1886         */
1887        serial_outp(up, UART_IER, UART_IER_THRI);
1888        lsr = serial_in(up, UART_LSR);
1889        iir = serial_in(up, UART_IIR);
1890        serial_outp(up, UART_IER, 0);
1891
1892        if (lsr & UART_LSR_TEMT && iir & UART_IIR_NO_INT) {
1893                if (!(up->bugs & UART_BUG_TXEN)) {
1894                        up->bugs |= UART_BUG_TXEN;
1895                        pr_debug("ttyS%d - enabling bad tx status workarounds\n",
1896                                 port->line);
1897                }
1898        } else {
1899                up->bugs &= ~UART_BUG_TXEN;
1900        }
1901
1902        spin_unlock_irqrestore(&up->port.lock, flags);

Line 1887 causes an interrupt, and the problem can occur when the ISR
is entered on another processor.  If the ISR reads the IIR, clearing
THRI, before the IIR is read on line 1889, a false positive is detected.
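
To make the interleaving concrete, here is a small user-space sketch of
the quick test.  It is only a toy model, not driver code: the toy_*
names and the isr_runs_first parameter are made up for illustration,
and the model assumes an idle transmitter and the usual 16550A rule
that reading the IIR clears a pending THRI.  The register bit values
are the ones from <linux/serial_reg.h>.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit values as defined in <linux/serial_reg.h>. */
#define UART_IER_THRI   0x02 /* enable THR-empty interrupt */
#define UART_IIR_NO_INT 0x01 /* no interrupt pending */
#define UART_IIR_THRI   0x02 /* THR-empty interrupt pending */
#define UART_LSR_THRE   0x20 /* transmit holding register empty */
#define UART_LSR_TEMT   0x40 /* transmitter empty */

/* Hypothetical toy model of an idle 16550A. */
struct toy_uart {
	uint8_t ier;
	bool thri_pending; /* latched THR-empty interrupt */
};

static void toy_write_ier(struct toy_uart *u, uint8_t val)
{
	/* Enabling THRI on an idle transmitter latches the interrupt. */
	if ((val & UART_IER_THRI) && !(u->ier & UART_IER_THRI))
		u->thri_pending = true;
	if (!(val & UART_IER_THRI))
		u->thri_pending = false;
	u->ier = val;
}

static uint8_t toy_read_iir(struct toy_uart *u)
{
	assert(u != NULL);
	if (u->thri_pending) {
		u->thri_pending = false; /* the read clears THRI */
		return UART_IIR_THRI;
	}
	return UART_IIR_NO_INT;
}

/*
 * Mirrors lines 1887-1892.  isr_runs_first models the ISR on another
 * CPU reading the IIR between lines 1887 and 1889.  Returns true when
 * the test would set UART_BUG_TXEN.
 */
bool toy_quick_test_flags_txen(bool isr_runs_first)
{
	struct toy_uart u = { 0, false };
	uint8_t lsr, iir;

	toy_write_ier(&u, UART_IER_THRI);       /* line 1887 */
	if (isr_runs_first)
		(void)toy_read_iir(&u);         /* ISR on the other CPU */
	lsr = UART_LSR_TEMT | UART_LSR_THRE;    /* line 1888, idle port */
	iir = toy_read_iir(&u);                 /* line 1889 */
	toy_write_ier(&u, 0);                   /* line 1890 */

	return (lsr & UART_LSR_TEMT) && (iir & UART_IIR_NO_INT);
}
```

In this model the flag is set only when the ISR gets to the IIR first,
which matches the observation that maxcpus=1 never sets UART_BUG_TXEN.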


How incorrectly detecting UART_BUG_TXEN causes output to stall ---

When usermode has more characters for the UART to transmit, the
characters are placed into the uart_info->xmit circular buffer and
serial8250_start_tx() gets called.  That function flows through the
following code path:

in serial8250_start_tx:
1241        struct uart_8250_port *up = (struct uart_8250_port *)port;
1242
1243        if (!(up->ier & UART_IER_THRI)) {
1244                up->ier |= UART_IER_THRI;
1245                serial_out(up, UART_IER, up->ier);
1246
1247                if (up->bugs & UART_BUG_TXEN) {
1248                        unsigned char lsr, iir;
1249                        lsr = serial_in(up, UART_LSR);
1250                        up->lsr_saved_flags |= lsr & LSR_SAVE_FLAGS;
1251                        iir = serial_in(up, UART_IIR) & 0x0f;
1252                        if ((up->port.type == PORT_RM9000) ?
1253                                (lsr & UART_LSR_THRE &&
1254                                (iir == UART_IIR_NO_INT || iir == UART_IIR_THRI)) :
1255                                (lsr & UART_LSR_TEMT && iir & UART_IIR_NO_INT))
1256                                transmit_chars(up);
1257                }
1258        }

On NON-buggy UARTs line 1245 causes a THRI interrupt request.  If the
ISR has not yet read the IIR by the time line 1251 reads it, the value
read on line 1251 can indicate that THRI is pending; in this case,
reading the IIR clears the THRI status, causing the interrupt to get
lost.  The value read from the IIR is then not UART_IIR_NO_INT, so
line 1256 is bypassed; with no characters sent to the transmitter in
the UART, another THRI interrupt request does not occur.  Subsequent
calls to this routine do nothing because the UART_IER_THRI bit is
already set.  This causes output to stall.
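
The stall can be reproduced in a user-space toy model along the same
lines.  Again this is a sketch under assumptions, not the driver: the
toy_* names are hypothetical, the transmitter is assumed idle, the ISR
is assumed to run only after start_tx has already read the IIR, and
transmit_chars is simplified to drain the whole buffer and clear
UART_IER_THRI (as the real code does via stop_tx once the buffer is
empty).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit values as defined in <linux/serial_reg.h>. */
#define UART_IER_THRI   0x02
#define UART_IIR_NO_INT 0x01
#define UART_IIR_THRI   0x02
#define UART_LSR_THRE   0x20
#define UART_LSR_TEMT   0x40

/* Hypothetical toy port: a non-buggy UART plus driver state. */
struct toy_port {
	uint8_t ier;
	bool thri_pending; /* latched THR-empty interrupt */
	bool bug_txen;     /* UART_BUG_TXEN as (mis)detected at startup */
	int queued;        /* bytes waiting in the xmit buffer */
};

static uint8_t toy_read_iir(struct toy_port *p)
{
	if (p->thri_pending) {
		p->thri_pending = false; /* the read clears THRI */
		return UART_IIR_THRI;
	}
	return UART_IIR_NO_INT;
}

static void toy_transmit_chars(struct toy_port *p)
{
	p->queued = 0;                      /* drain the buffer */
	p->ier &= (uint8_t)~UART_IER_THRI;  /* buffer empty: stop tx */
}

static void toy_start_tx(struct toy_port *p)
{
	if (!(p->ier & UART_IER_THRI)) {
		p->ier |= UART_IER_THRI;
		p->thri_pending = true;     /* line 1245 raises THRI */
		if (p->bug_txen) {
			uint8_t lsr = UART_LSR_TEMT | UART_LSR_THRE;
			uint8_t iir = toy_read_iir(p) & 0x0f; /* line 1251 */
			if ((lsr & UART_LSR_TEMT) && (iir & UART_IIR_NO_INT))
				toy_transmit_chars(p);
		}
	}
}

static void toy_isr(struct toy_port *p)
{
	if (!(toy_read_iir(p) & UART_IIR_NO_INT))
		toy_transmit_chars(p);
}

/* Queue nbytes twice and return how many bytes are still stuck. */
int toy_send(bool bug_txen, int nbytes)
{
	struct toy_port p = { 0, false, bug_txen, 0 };

	assert(nbytes >= 0);
	p.queued += nbytes;
	toy_start_tx(&p);
	toy_isr(&p);
	p.queued += nbytes;
	toy_start_tx(&p); /* no-op if UART_IER_THRI is still set */
	toy_isr(&p);
	return p.queued;
}
```

With bug_txen clear the ISR sees THRI and drains both batches.  With
bug_txen set on this non-buggy model, line 1251 swallows the THRI, the
ISR finds nothing pending, and the second start_tx does nothing because
UART_IER_THRI is already set, so everything stays queued.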


Proposed fixes ---

Here are two patches that are alternatives for fixing this problem:


The first patch was tested at Red Hat and Stratus.  It has been
released to users of Red Hat Enterprise Linux 5.2.
This patch takes the port's irq lock when starting up the UART to
provide mutual exclusion between the "quick test" in the startup code
and the interrupt service routine.

--- linux-2.6.18-92-old/drivers/serial/8250.c   2008-08-13 14:07:08.000000000 +0000
+++ linux-2.6.18-92-new/drivers/serial/8250.c   2008-08-13 14:04:12.000000000 +0000
@@ -1749,65 +1749,73 @@

               timeout = timeout > 6 ? (timeout / 2 - 2) : 1;

               up->timer.data = (unsigned long)up;
               mod_timer(&up->timer, jiffies + timeout);
       } else {
               retval = serial_link_irq_chain(up);
               if (retval)
                       return retval;
       }

       /*
        * Now, initialize the UART
        */
       serial_outp(up, UART_LCR, UART_LCR_WLEN8);

-       spin_lock_irqsave(&up->port.lock, flags);
+       if (is_real_interrupt(up->port.irq)) {
+               spin_lock_irqsave(&irq_lists[up->port.irq].lock, flags);
+               spin_lock(&up->port.lock);
+       } else
+               spin_lock_irqsave(&up->port.lock, flags);
       if (up->port.flags & UPF_FOURPORT) {
               if (!is_real_interrupt(up->port.irq))
                       up->port.mctrl |= TIOCM_OUT1;
       } else
               /*
                * Most PC uarts need OUT2 raised to enable interrupts.
                */
               if (is_real_interrupt(up->port.irq))
                       up->port.mctrl |= TIOCM_OUT2;

       serial8250_set_mctrl(&up->port, up->port.mctrl);

       /*
        * Do a quick test to see if we receive an
        * interrupt when we enable the TX irq.
        */
       serial_outp(up, UART_IER, UART_IER_THRI);
       lsr = serial_in(up, UART_LSR);
       iir = serial_in(up, UART_IIR);
       serial_outp(up, UART_IER, 0);

       if (lsr & UART_LSR_TEMT && iir & UART_IIR_NO_INT) {
               if (!(up->bugs & UART_BUG_TXEN)) {
                       up->bugs |= UART_BUG_TXEN;
                       pr_debug("ttyS%d - enabling bad tx status workarounds\n",
                                port->line);
               }
       } else {
               up->bugs &= ~UART_BUG_TXEN;
       }

-       spin_unlock_irqrestore(&up->port.lock, flags);
+       if (is_real_interrupt(up->port.irq)) {
+               spin_unlock(&up->port.lock);
+               spin_unlock_irqrestore(&irq_lists[up->port.irq].lock, flags);
+       } else
+               spin_unlock_irqrestore(&up->port.lock, flags);

       /*
        * Finally, enable interrupts.  Note: Modem status interrupts
        * are set via set_termios(), which will be occurring imminently
        * anyway, so we don't enable them here.
        */
       up->ier = UART_IER_RLSI | UART_IER_RDI;
       serial_outp(up, UART_IER, up->ier);

       if (up->port.flags & UPF_FOURPORT) {
               unsigned int icp;
               /*
                * Enable interrupts on the AST Fourport board
                */
               icp = (up->port.iobase & 0xfe0) | 0x01f;
               outb_p(0x80, icp);






The second patch blocks the 16550A UART from asserting its IRQ during
the quick test in serial8250_startup previously discussed.  This patch
has only had limited testing at Stratus.

--- linux-2.6.24.4.old/drivers/serial/8250.c    2008-03-24 18:49:18.000000000 +0000
+++ linux-2.6.24.4.new/drivers/serial/8250.c    2008-04-16 19:40:41.000000000 +0000
@@ -1876,21 +1876,25 @@
                * Most PC uarts need OUT2 raised to enable interrupts.
                */
               if (is_real_interrupt(up->port.irq))
                       up->port.mctrl |= TIOCM_OUT2;

-       serial8250_set_mctrl(&up->port, up->port.mctrl);
+       /* Block IRQ to avoid false positive if SMP */
+       serial8250_set_mctrl(&up->port, 0);

       /*
        * Do a quick test to see if we receive an
        * interrupt when we enable the TX irq.
        */
       serial_outp(up, UART_IER, UART_IER_THRI);
       lsr = serial_in(up, UART_LSR);
       iir = serial_in(up, UART_IIR);
       serial_outp(up, UART_IER, 0);

+       /* Enable this UART to assert its IRQ */
+       serial8250_set_mctrl(&up->port, up->port.mctrl);
+
       if (lsr & UART_LSR_TEMT && iir & UART_IIR_NO_INT) {
               if (!(up->bugs & UART_BUG_TXEN)) {
                       up->bugs |= UART_BUG_TXEN;
                       pr_debug("ttyS%d - enabling bad tx status workarounds\n",
                                port->line);


Robert N. Evans
Software Engineer
STRATUS TECHNOLOGIES
111 Powdermill Road,
Maynard, MA 01754-3409  U.S.A.


* Re: 8250 misses interrupt, stalls
  2008-08-08 12:36 ` Kohne, Mike
  2008-08-13 14:56   ` [PATCH/RFC] " Robert Evans
@ 2008-08-20 15:29   ` Jeff DeFouw
  1 sibling, 0 replies; 4+ messages in thread
From: Jeff DeFouw @ 2008-08-20 15:29 UTC (permalink / raw)
  To: Kohne, Mike; +Cc: linux-serial

On Fri, Aug 08, 2008 at 05:36:58AM -0700, Kohne, Mike wrote:
> I have a system (single core CPU, Winbond 83627HF, but with extra serial
> ports provided by a SCH3116 - another superio chip). In my box, the
> extra serial ports refuse to work (I never see interrupts) when I use
> SMP kernels. If I rebuild the kernel without SMP, my serial ports work
> fine. I've never gotten a sufficient explanation as to why turning on
> an SMP kernel would screw up these serial ports. We don't have the heavy
> (continuous) use that you do, so we would never have noticed the kind of
> errors you see.
> 
> Now obviously there's a wide difference between my behavior (they never
> work) and yours (they work for a while), but you might want to try
> building a non-SMP kernel (I took the distributed kernel config as my
> starting point, and just turned off SMP using make menuconfig). 
> 
> I'd be interested to know if this had any effect on your problem. 

I tried a stripped down UP kernel, but it didn't fix the problem.  It 
took about an hour to fail, compared to the usual under 10 minutes.  I 
also tried adding counters all over the interrupt paths and couldn't 
find any interrupts getting lost in the kernel.  It's looking like 
hardware for me.  I'll have to do some sort of workaround with timers.
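
For what it's worth, one possible shape for such a timer workaround is
sketched below.  This is only an assumption about how it could look,
not what was actually implemented: the toy_* names are hypothetical,
the stall rule (received data waiting across two consecutive ticks with
no interrupt handled in between) is a guess at a reasonable heuristic,
and the register write is modeled on a plain struct instead of real
port I/O.  The recovery step is the save IER, clear IER, restore IER
sequence that was found to resume I/O.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define UART_LSR_DR 0x01 /* receiver data ready, per <linux/serial_reg.h> */

/* Hypothetical per-port state kept between timer ticks. */
struct toy_watch {
	unsigned long last_irq_count; /* interrupts handled as of last tick */
	bool data_was_waiting;        /* LSR had DR set on the last tick */
};

/*
 * Call once per timer tick with a fresh LSR value and the running count
 * of handled interrupts.  Returns true when received data has been
 * waiting for a whole tick without any interrupt being handled.
 */
bool toy_watch_tick(struct toy_watch *w, uint8_t lsr, unsigned long irq_count)
{
	bool waiting, stalled;

	assert(w != NULL);
	waiting = (lsr & UART_LSR_DR) != 0;
	stalled = waiting && w->data_was_waiting &&
		  irq_count == w->last_irq_count;

	w->data_was_waiting = waiting;
	w->last_irq_count = irq_count;
	return stalled;
}

/* Hypothetical stand-in for the UART's IER register. */
struct toy_regs {
	uint8_t ier;
	unsigned int writes; /* number of IER writes, for inspection */
};

/* Recovery step: save IER, clear IER, restore IER. */
void toy_ier_kick(struct toy_regs *r)
{
	uint8_t saved = r->ier;

	r->ier = 0;
	r->writes++;
	r->ier = saved;
	r->writes++;
}
```

In a real driver the LSR read would also have to preserve the error
bits it clears, and the kick would need to run under the port lock.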

-- 
Jeff DeFouw <jeffd@i2k.com>

