* UV serial regression since 2.6.36
@ 2011-12-06 22:34 Jiri Slaby
2012-01-04 18:41 ` Jiri Slaby
2012-01-12 21:55 ` [PATCH] TTY: fix UV serial console regression Jiri Slaby
0 siblings, 2 replies; 7+ messages in thread
From: Jiri Slaby @ 2011-12-06 22:34 UTC
To: LKML, Alan Cox, Arnd Bergmann, Greg KH, Jiri Slaby
Hi,
In short, the issue causes serial port traffic to stop after some time
on UV machines. Only when serial is used as a console. It worked
perfectly with 2.6.32. It's a standard 16550 on 0x3f8 (well, it's
emulated to be at that port). Also in the PNP subsys.
To reproduce it, the 'debug' kernel parameter must not be used.
The root cause seems to be that the serial "chip" there has problems
with interrupts. It generates some but doesn't indicate it's the source
or somebody clears it (NOINT is set). And when it is supposed to
generate one (on THRE), it does not. So there are bytes in the TX buffer
which are never sent. Until the port is kicked e.g. by "echo h >
/proc/sysrq-trigger".
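For illustration only, a minimal user-space sketch of what the symptom above
means in terms of register bits. The bit values are the standard 16550 ones
(the same values as in <linux/serial_reg.h>); the helper names and the
"queued" byte count are hypothetical and not taken from the 8250 driver:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Standard 16550 register bits (same values as in <linux/serial_reg.h>). */
#define IIR_NO_INT 0x01 /* IIR bit 0 set: no interrupt pending ("NOINT") */
#define IER_THRI   0x02 /* IER: THRE (transmitter empty) interrupt enabled */
#define LSR_THRE   0x20 /* LSR: transmit holding register is empty */

/* Hypothetical helper: does the port claim to be the interrupt source? */
static bool uart_claims_irq(uint8_t iir)
{
	return !(iir & IIR_NO_INT);
}

/*
 * Hypothetical helper describing the stall: bytes are queued, the THRE
 * interrupt is enabled and the holding register is empty, yet the port
 * reports that no interrupt is pending, so nobody pushes the next byte.
 */
static bool tx_looks_stalled(uint8_t iir, uint8_t ier, uint8_t lsr,
			     unsigned int queued)
{
	return queued > 0 && (ier & IER_THRI) && (lsr & LSR_THRE) &&
	       (iir & IIR_NO_INT);
}

/* Small self-check of the two helpers; call it from any test harness. */
void uart_sketch_self_check(void)
{
	assert(!uart_claims_irq(IIR_NO_INT));
	assert(tx_looks_stalled(IIR_NO_INT, IER_THRI, LSR_THRE, 1));
}
```

In this picture, the sysrq "kick" helps simply because something writes to
the port again, not because the missing interrupt ever arrives.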
The unhandled interrupts were always an issue, but somehow hidden. It
became a real problem with patches post 2.6.32.
I bisected it to these two commits _together_:
commit 3f582b8c11014e4ce310d9839fb335164195333f
Author: Arnd Bergmann <arnd@arndb.de>
Date: Tue Jun 29 22:31:40 2010 +0200
serial: fix termios settings in open
AND
commit 74c2107759dc6efaa1b9127014be58a742a1e7ac
Author: Alan Cox <alan@linux.intel.com>
Date: Tue Jun 1 22:53:00 2010 +0200
serial: Use block_til_ready helper
Reverting those two on top of 2.6.37 makes it work again. Reverting them
on top of 3.0, together with three of my own commits which removed
update_set_termios completely, makes it work again as well.
I didn't look closely at why those patches cause that. But things like the
c_cflag copy in uart_update_termios and the "tty->termios->c_cflag & CBAUD"
test in tty_port_block_til_ready don't look good to me...
<rant from a person who spent 2 weeks with that bug :( >
I had to bisect three times. First to find a patch which allows booting
on <2.6.36 kernels on that machine. Second to narrow down the interval
-- it pointed to a merge commit. The third one to find the culprit.
The third one was unsuccessful because of two commits causing the issue.
Yes, hunting this crap
Now I'm going on vacation till the end of the year, so I will be responding
only rarely.
</rant>
thanks,
--
js
suse labs
* Re: UV serial regression since 2.6.36
2011-12-06 22:34 UV serial regression since 2.6.36 Jiri Slaby
@ 2012-01-04 18:41 ` Jiri Slaby
2012-01-04 20:53 ` Greg KH
2012-01-12 21:55 ` [PATCH] TTY: fix UV serial console regression Jiri Slaby
1 sibling, 1 reply; 7+ messages in thread
From: Jiri Slaby @ 2012-01-04 18:41 UTC
To: Jiri Slaby; +Cc: LKML, Alan Cox, Arnd Bergmann, Greg KH
On 12/06/2011 11:34 PM, Jiri Slaby wrote:
> Hi,
>
> In short, the issue causes serial port traffic to stop after some time
> on UV machines. Only when serial is used as a console. It worked
> perfectly with 2.6.32. It's a standard 16550 on 0x3f8 (well, it's
> emulated to be at that port). Also in the PNP subsys.
>
> To reproduce it, the 'debug' kernel parameter must not be used.
>
> The root cause seems to be that the serial "chip" there has problems
> with interrupts. It generates some but doesn't indicate it's the source
> or somebody clears it (NOINT is set). And when it is supposed to
> generate one (on THRE), it does not. So there are bytes in the TX buffer
> which are never sent. Until the port is kicked e.g. by "echo h >
> /proc/sysrq-trigger".
>
> The unhandled interrupts were always an issue, but somehow hidden. It
> became a real problem with patches post 2.6.32.
>
> I bisected it to these two commits _together_:
>
> commit 3f582b8c11014e4ce310d9839fb335164195333f
> Author: Arnd Bergmann <arnd@arndb.de>
> Date: Tue Jun 29 22:31:40 2010 +0200
>
> serial: fix termios settings in open
>
> AND
>
> commit 74c2107759dc6efaa1b9127014be58a742a1e7ac
> Author: Alan Cox <alan@linux.intel.com>
> Date: Tue Jun 1 22:53:00 2010 +0200
>
> serial: Use block_til_ready helper
>
> Reverting those two on top of 2.6.37 makes it work again. Reverting them
> on top of 3.0, together with three of my own commits which removed
> update_set_termios completely, makes it work again as well.
>
> I didn't look closely at why those patches cause that. But things like the
> c_cflag copy in uart_update_termios and the "tty->termios->c_cflag & CBAUD"
> test in tty_port_block_til_ready don't look good to me...
My vacation is over. I suppose you haven't had a chance to take a look
into the regression? (I'm asking so that we don't duplicate the effort in
case any of you is looking into it right now.)
> thanks,
--
js
suse labs
* Re: UV serial regression since 2.6.36
2012-01-04 18:41 ` Jiri Slaby
@ 2012-01-04 20:53 ` Greg KH
0 siblings, 0 replies; 7+ messages in thread
From: Greg KH @ 2012-01-04 20:53 UTC
To: Jiri Slaby; +Cc: Jiri Slaby, LKML, Alan Cox, Arnd Bergmann
On Wed, Jan 04, 2012 at 07:41:47PM +0100, Jiri Slaby wrote:
> On 12/06/2011 11:34 PM, Jiri Slaby wrote:
> > Hi,
> >
> > In short, the issue causes serial port traffic to stop after some time
> > on UV machines. Only when serial is used as a console. It worked
> > perfectly with 2.6.32. It's a standard 16550 on 0x3f8 (well, it's
> > emulated to be at that port). Also in the PNP subsys.
> >
> > To reproduce it, the 'debug' kernel parameter must not be used.
> >
> > The root cause seems to be that the serial "chip" there has problems
> > with interrupts. It generates some but doesn't indicate it's the source
> > or somebody clears it (NOINT is set). And when it is supposed to
> > generate one (on THRE), it does not. So there are bytes in the TX buffer
> > which are never sent. Until the port is kicked e.g. by "echo h >
> > /proc/sysrq-trigger".
> >
> > The unhandled interrupts were always an issue, but somehow hidden. It
> > became a real problem with patches post 2.6.32.
> >
> > I bisected it to these two commits _together_:
> >
> > commit 3f582b8c11014e4ce310d9839fb335164195333f
> > Author: Arnd Bergmann <arnd@arndb.de>
> > Date: Tue Jun 29 22:31:40 2010 +0200
> >
> > serial: fix termios settings in open
> >
> > AND
> >
> > commit 74c2107759dc6efaa1b9127014be58a742a1e7ac
> > Author: Alan Cox <alan@linux.intel.com>
> > Date: Tue Jun 1 22:53:00 2010 +0200
> >
> > serial: Use block_til_ready helper
> >
> > Reverting those two on top of 2.6.37 makes it work again. Reverting them
> > on top of 3.0, together with three of my own commits which removed
> > update_set_termios completely, makes it work again as well.
> >
> > I didn't look closely at why those patches cause that. But things like the
> > c_cflag copy in uart_update_termios and the "tty->termios->c_cflag & CBAUD"
> > test in tty_port_block_til_ready don't look good to me...
>
> My vacation is over. I suppose you haven't had a chance to take a look
> into the regression? (I'm asking so that we don't duplicate the effort in
> case any of you is looking into it right now.)
I haven't, sorry, as I was on vacation as well.
greg k-h
* [PATCH] TTY: fix UV serial console regression
2011-12-06 22:34 UV serial regression since 2.6.36 Jiri Slaby
2012-01-04 18:41 ` Jiri Slaby
@ 2012-01-12 21:55 ` Jiri Slaby
2012-01-12 22:00 ` Greg KH
2012-01-12 23:41 ` Alan Cox
1 sibling, 2 replies; 7+ messages in thread
From: Jiri Slaby @ 2012-01-12 21:55 UTC
To: gregkh
Cc: Arnd Bergmann, alan, linux-serial, jirislaby, linux-kernel,
Jiri Slaby, 3.0 3.1 3.2
Commit 74c2107759d (serial: Use block_til_ready helper) and its fixup
3f582b8c110 (serial: fix termios settings in open) introduced a
regression on UV systems. The serial eventually freezes while being
used. It's completely unpredictable and sometimes needs a heap of
traffic to happen first.
To reproduce this, yast installation was used as it turned out to be
pretty reliable in reproducing. Especially during installation process
where one doesn't have an SSH daemon running. And no monitor as the HW
is completely headless. So this was fun to find. Given the machine
doesn't boot on vanilla before 2.6.36 final. (And the commits above
are older.)
Unless there is some bad race in the code, the hardware seems to be
pretty broken. Otherwise pure MSR read should not cause such a bug,
or?
So to prevent the bug, revert to the old behavior. I.e. read modem
status only if we really have to -- for non-CLOCAL set serials.
Non-CLOCAL works on this hardware OK, I tried. See? I don't.
And document that shit.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: 3.0 3.1 3.2 <stable@vger.kernel.org>
References: https://lkml.org/lkml/2011/12/6/573
References: https://bugzilla.novell.com/show_bug.cgi?id=718518
---
 drivers/tty/tty_port.c |   12 +++++++-----
 1 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/tty/tty_port.c b/drivers/tty/tty_port.c
index 7a4fb1c..1109e1b 100644
--- a/drivers/tty/tty_port.c
+++ b/drivers/tty/tty_port.c
@@ -227,7 +227,6 @@ int tty_port_block_til_ready(struct tty_port *port,
 	int do_clocal = 0, retval;
 	unsigned long flags;
 	DEFINE_WAIT(wait);
-	int cd;
 
 	/* block if port is in the process of being closed */
 	if (tty_hung_up_p(filp) || port->flags & ASYNC_CLOSING) {
@@ -284,11 +283,14 @@ int tty_port_block_til_ready(struct tty_port *port,
 				retval = -ERESTARTSYS;
 			break;
 		}
-		/* Probe the carrier. For devices with no carrier detect this
-		   will always return true */
-		cd = tty_port_carrier_raised(port);
+		/*
+		 * Probe the carrier. For devices with no carrier detect
+		 * tty_port_carrier_raised will always return true.
+		 * Never ask drivers if CLOCAL is set, this causes troubles
+		 * on some hardware.
+		 */
 		if (!(port->flags & ASYNC_CLOSING) &&
-				(do_clocal || cd))
+		    (do_clocal || tty_port_carrier_raised(port)))
 			break;
 		if (signal_pending(current)) {
 			retval = -ERESTARTSYS;
--
1.7.8.3
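The effect of the patch above comes down to C short-circuit evaluation: once
the carrier probe sits on the right-hand side of "||", it is never called
when do_clocal is set. A minimal user-space sketch of the two shapes follows.
struct fake_port, fake_carrier_raised(), ready_old() and ready_new() are
hypothetical stand-ins for illustration, not kernel code; the counter merely
models "one modem status read per probe":

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a port; counts how often the "MSR" is read. */
struct fake_port {
	bool carrier;
	unsigned int msr_reads;
};

/* Stand-in for tty_port_carrier_raised(): each call costs one MSR read. */
static bool fake_carrier_raised(struct fake_port *p)
{
	p->msr_reads++;
	return p->carrier;
}

/* Pre-patch shape: probe unconditionally, then test the result. */
static bool ready_old(struct fake_port *p, bool closing, bool do_clocal)
{
	bool cd = fake_carrier_raised(p);

	return !closing && (do_clocal || cd);
}

/* Post-patch shape: the probe is skipped whenever do_clocal is set. */
static bool ready_new(struct fake_port *p, bool closing, bool do_clocal)
{
	return !closing && (do_clocal || fake_carrier_raised(p));
}

/* Small self-check; call it from any test harness. */
void clocal_sketch_self_check(void)
{
	struct fake_port p = { .carrier = false, .msr_reads = 0 };

	assert(ready_old(&p, false, true) && p.msr_reads == 1);
	p.msr_reads = 0;
	assert(ready_new(&p, false, true) && p.msr_reads == 0);
}
```

Both variants return the same answer in every case; only the number of
hardware reads differs. With CLOCAL clear, ready_new() still probes the
carrier exactly as before.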
* Re: [PATCH] TTY: fix UV serial console regression
2012-01-12 21:55 ` [PATCH] TTY: fix UV serial console regression Jiri Slaby
@ 2012-01-12 22:00 ` Greg KH
2012-01-12 23:41 ` Alan Cox
1 sibling, 0 replies; 7+ messages in thread
From: Greg KH @ 2012-01-12 22:00 UTC
To: Jiri Slaby
Cc: Arnd Bergmann, alan, linux-serial, jirislaby, linux-kernel,
3.0 3.1 3.2
On Thu, Jan 12, 2012 at 10:55:15PM +0100, Jiri Slaby wrote:
> Commit 74c2107759d (serial: Use block_til_ready helper) and its fixup
> 3f582b8c110 (serial: fix termios settings in open) introduced a
> regression on UV systems. The serial eventually freezes while being
> used. It's completely unpredictable and sometimes needs a heap of
> traffic to happen first.
>
> To reproduce this, yast installation was used as it turned out to be
> pretty reliable in reproducing. Especially during installation process
> where one doesn't have an SSH daemon running. And no monitor as the HW
> is completely headless. So this was fun to find. Given the machine
> doesn't boot on vanilla before 2.6.36 final. (And the commits above
> are older.)
>
> Unless there is some bad race in the code, the hardware seems to be
> pretty broken. Otherwise pure MSR read should not cause such a bug,
> or?
>
> So to prevent the bug, revert to the old behavior. I.e. read modem
> status only if we really have to -- for non-CLOCAL set serials.
> Non-CLOCAL works on this hardware OK, I tried. See? I don't.
>
> And document that shit.
Thanks for tracking this down, I'll queue it up and get it to Linus
after 3.3-rc1 is out.
greg k-h
* Re: [PATCH] TTY: fix UV serial console regression
2012-01-12 21:55 ` [PATCH] TTY: fix UV serial console regression Jiri Slaby
2012-01-12 22:00 ` Greg KH
@ 2012-01-12 23:41 ` Alan Cox
2012-01-16 13:28 ` Jiri Slaby
1 sibling, 1 reply; 7+ messages in thread
From: Alan Cox @ 2012-01-12 23:41 UTC
To: Jiri Slaby
Cc: gregkh, Arnd Bergmann, alan, linux-serial, jirislaby,
linux-kernel, 3.0 3.1 3.2
> Unless there is some bad race in the code, the hardware seems to be
> pretty broken. Otherwise pure MSR read should not cause such a bug,
> or?
UV serial being what actual hardware and system ?
> So to prevent the bug, revert to the old behavior. I.e. read modem
> status only if we really have to -- for non-CLOCAL set serials.
> Non-CLOCAL works on this hardware OK, I tried. See? I don't.
The old behaviour is rather driver dependent, e.g. on whether it used
serial_core or not.
Doesn't look like an unreasonable change though.
Alan
* Re: [PATCH] TTY: fix UV serial console regression
2012-01-12 23:41 ` Alan Cox
@ 2012-01-16 13:28 ` Jiri Slaby
0 siblings, 0 replies; 7+ messages in thread
From: Jiri Slaby @ 2012-01-16 13:28 UTC
To: Alan Cox
Cc: Jiri Slaby, gregkh, Arnd Bergmann, alan, linux-serial,
linux-kernel, 3.0 3.1 3.2
On 01/13/2012 12:41 AM, Alan Cox wrote:
>> Unless there is some bad race in the code, the hardware seems to
>> be pretty broken. Otherwise pure MSR read should not cause such a
>> bug, or?
>
> UV serial being what actual hardware and system ?
What exact parameters are you interested in? dmidecode on this machine
says it's an Altix XE310.
thanks,
--
js