* serial8250: bogus low_latency destabilizes kernel, need sanity check @ 2014-01-14 0:35 Pavel Roskin 2014-01-14 12:08 ` One Thousand Gnomes 0 siblings, 1 reply; 6+ messages in thread From: Pavel Roskin @ 2014-01-14 0:35 UTC (permalink / raw) To: Greg Kroah-Hartman, Jiri Slaby, linux-kernel Hello! I've been debugging an instability of a kernel on some 32-bit x86 embedded system. The kernel would just hang randomly. I had to enable most debug options to find the reason. The system has several serial ports, including ttyS4. There is also a file called /etc/serial.conf that contains a line /dev/ttyS4 uart 16550a irq 17 baud_base 921600 port 0xd000 low_latency That file is processed by the setserial utility on startup that makes the port as low_latency. And then the kernel reports this: BUG: sleeping function called from invalid context at /root/src/linux-3.12.6/kernel/mutex.c:616 in_atomic(): 1, irqs_disabled(): 1, pid: 0, name: swapper/0 INFO: lockdep is turned off. irq event stamp: 296476 hardirqs last enabled at (296475): [<c10b4ff1>] tick_nohz_idle_exit+0x151/0x1b0 hardirqs last disabled at (296476): [<c1588f45>] _raw_spin_lock_irq+0x15/0x80 softirqs last enabled at (296458): [<c1049c9d>] __do_softirq+0x2ad/0x3b0 softirqs last disabled at (296421): [<c1004637>] do_softirq+0x97/0xf0 CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.12.6 #3 Hardware name: RadiSys SandyBridge Platform/S-CEQM67-i5-2515EE , BIOS 20.02.01 08/29/2012 00000000 00000000 f480de5c c1580991 c17d7a00 f480de84 c1076726 c1710154 00000001 00000001 00000000 c17d7cfc f495b000 f3df097c 00000000 f480decc c1584223 f495b058 00000001 f495b018 f480deb8 c134cf7c 00000001 00000002 Call Trace: [<c1580991>] dump_stack+0x4b/0x66 [<c1076726>] __might_sleep+0x166/0x210 [<c1584223>] mutex_lock_nested+0x23/0x380 [<c134cf7c>] ? ldsem_down_read_trylock+0x7c/0xa0 [<c134acf2>] ? tty_ldisc_ref+0x22/0x50 [<c134acf2>] ? tty_ldisc_ref+0x22/0x50 [<c134bc5e>] flush_to_ldisc+0x3e/0x100 [<c134bd60>] tty_flip_buffer_push+0x40/0x50 [<c1361a15>] serial8250_rx_chars+0xc5/0x200 [<c158902b>] ? _raw_spin_lock_irqsave+0x7b/0x90 [<c1362e67>] ? serial8250_handle_irq+0x37/0xa0 [<c1362eb1>] serial8250_handle_irq+0x81/0xa0 [<c1362eec>] serial8250_default_handle_irq+0x1c/0x20 [<c1360d0c>] serial8250_interrupt+0x5c/0xd0 [<c10a3744>] handle_irq_event_percpu+0x54/0x390 [<c10a67e6>] ? handle_fasteoi_irq+0x16/0xe0 [<c10a3ab1>] ? handle_irq_event+0x31/0x60 [<c10a3aba>] handle_irq_event+0x3a/0x60 [<c10a67d0>] ? unmask_irq+0x30/0x30 [<c10a681e>] handle_fasteoi_irq+0x4e/0xe0 <IRQ> [<c15922d2>] ? do_IRQ+0x42/0xc0 That's a backtrace for Linux 3.12.6, but 3.13-rc8 does the same thing. serial8250_handle_irq() tries to use the DMA and fails, so it calls serial8250_rx_chars(). That function calls tty_flip_buffer_push(). The comment above tty_flip_buffer_push() says: "This function must not be called from IRQ context if port->low_latency is set" And that's precisely what we are doing. Sure, root can damage the system by using incorrect configuration files. However, I think we need some sanity checking. After all, the device may degrade and stop working as a low-latency port, and we don't want the whole system to hang because of that. Maybe we should unset the low_latency flag as soon as DMA fails? There are two flags, one is state->uart_port->flags and the other is port->low_latency. I guess we need to unset both. -- Regards, Pavel Roskin ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: serial8250: bogus low_latency destabilizes kernel, need sanity check 2014-01-14 0:35 serial8250: bogus low_latency destabilizes kernel, need sanity check Pavel Roskin @ 2014-01-14 12:08 ` One Thousand Gnomes 2014-01-14 16:24 ` Pavel Roskin 0 siblings, 1 reply; 6+ messages in thread From: One Thousand Gnomes @ 2014-01-14 12:08 UTC (permalink / raw) To: Pavel Roskin; +Cc: Greg Kroah-Hartman, Jiri Slaby, linux-kernel > Maybe we should unset the low_latency flag as soon as DMA fails? There > are two flags, one is state->uart_port->flags and the other is > port->low_latency. I guess we need to unset both. Well low latency and DMA are pretty much exclusive in the real world so probably DMA ports shouldn't allow low_latency to be set at all in DMA mode. Alan ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: serial8250: bogus low_latency destabilizes kernel, need sanity check 2014-01-14 12:08 ` One Thousand Gnomes @ 2014-01-14 16:24 ` Pavel Roskin 2014-02-01 15:09 ` Peter Hurley 0 siblings, 1 reply; 6+ messages in thread From: Pavel Roskin @ 2014-01-14 16:24 UTC (permalink / raw) To: One Thousand Gnomes; +Cc: Greg Kroah-Hartman, Jiri Slaby, linux-kernel Hi Alan, Quoting One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>: >> Maybe we should unset the low_latency flag as soon as DMA fails? There >> are two flags, one is state->uart_port->flags and the other is >> port->low_latency. I guess we need to unset both. > > Well low latency and DMA are pretty much exclusive in the real world so > probably DMA ports shouldn't allow low_latency to be set at all in DMA > mode. That's a useful insight. I assumed exactly the opposite. But how should a real low_latency device work? Are they supported by 8250_core? Do they have hard IRQs? Are those IRQs handled by serial8250_handle_irq()? If DMA is not used, then serial8250_rx_chars() is the only way to receive data. But serial8250_rx_chars() calls tty_flip_buffer_push() unconditionally, and the later should not be called from the IRQ context for low_latency devices, if the comment about it is to be trusted. -- Regards, Pavel Roskin ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: serial8250: bogus low_latency destabilizes kernel, need sanity check 2014-01-14 16:24 ` Pavel Roskin @ 2014-02-01 15:09 ` Peter Hurley 2014-02-03 11:10 ` One Thousand Gnomes 0 siblings, 1 reply; 6+ messages in thread From: Peter Hurley @ 2014-02-01 15:09 UTC (permalink / raw) To: Pavel Roskin, One Thousand Gnomes Cc: Greg Kroah-Hartman, Jiri Slaby, linux-kernel On 01/14/2014 11:24 AM, Pavel Roskin wrote: > Hi Alan, > > Quoting One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>: > >>> Maybe we should unset the low_latency flag as soon as DMA fails? There >>> are two flags, one is state->uart_port->flags and the other is >>> port->low_latency. I guess we need to unset both. >> >> Well low latency and DMA are pretty much exclusive in the real world so >> probably DMA ports shouldn't allow low_latency to be set at all in DMA >> mode. > > That's a useful insight. I assumed exactly the opposite. The meaning of low_latency has migrated since 2.6.28 Although it used to mean 'process the input immediately even in interrupt context', it has become 'the device doesn't receive data in interrupt context so process the data now'. > But how should a real low_latency device work? A low_latency device does not receive data in interrupt context; ie., tty_flip_buffers() cannot be called from an irq handler or with interrupts disabled. > Are they supported by 8250_core? yes > Do they have hard IRQs? possibly but not for rx > Are those IRQs handled by serial8250_handle_irq()? possibly, but again not for rx > If DMA is not used, then serial8250_rx_chars() is the only way to receive data. > But serial8250_rx_chars() calls tty_flip_buffer_push() unconditionally, > and the later should not be called from the IRQ context for low_latency devices, > if the comment about it is to be trusted. Exactly. Perhaps we should unconditionally unset low_latency (or remove it entirely). Real low latency can be addressed by using the -RT kernel. Regards, Peter Hurley ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: serial8250: bogus low_latency destabilizes kernel, need sanity check 2014-02-01 15:09 ` Peter Hurley @ 2014-02-03 11:10 ` One Thousand Gnomes 2014-02-04 12:42 ` Peter Hurley 0 siblings, 1 reply; 6+ messages in thread From: One Thousand Gnomes @ 2014-02-03 11:10 UTC (permalink / raw) To: Peter Hurley; +Cc: Pavel Roskin, Greg Kroah-Hartman, Jiri Slaby, linux-kernel On Sat, 01 Feb 2014 10:09:03 -0500 Peter Hurley <peter@hurleysoftware.com> wrote: > On 01/14/2014 11:24 AM, Pavel Roskin wrote: > > Hi Alan, > > > > Quoting One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>: > > > >>> Maybe we should unset the low_latency flag as soon as DMA fails? There > >>> are two flags, one is state->uart_port->flags and the other is > >>> port->low_latency. I guess we need to unset both. > >> > >> Well low latency and DMA are pretty much exclusive in the real world so > >> probably DMA ports shouldn't allow low_latency to be set at all in DMA > >> mode. > > > > That's a useful insight. I assumed exactly the opposite. > > The meaning of low_latency has migrated since 2.6.28 Not really. The meaning of low latency was always "get the turn around time for command/response protocols down as low as possible". DMA driven serial usually reports a transfer completion on a watermark or a timeout, so tends to work very badly within the Linux definition of 'low latency' for tty. What it does has certainly changed but thats implementation detail. > Perhaps we should unconditionally unset low_latency (or remove it entirely). > Real low latency can be addressed by using the -RT kernel. Just saying "use -RT" would be a regression and actually hurt quite a few annoying "simple protocol" using tools for all sorts of control systems. We are talking about milliseconds not microseconds here. The expected behaviour in low_latency is probably best described as data arrives processed wakeup and to avoid the case of data arrives queued for back end [up to 10mS delay, but typically 1-2mS] processed wakeup which multipled over a 50,000 S record download is a lot of time Everything else is not user visible so can be changed freely to get that assumption to work (including ending up not needing it in the first place). Getting tty to the point everything but N_TTY canonical mode is a fast path would probably eliminate the need nicely - I don't know of any use cases that expect ICANON, ECHO or I*/O* processing for low latency. ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: serial8250: bogus low_latency destabilizes kernel, need sanity check 2014-02-03 11:10 ` One Thousand Gnomes @ 2014-02-04 12:42 ` Peter Hurley 0 siblings, 0 replies; 6+ messages in thread From: Peter Hurley @ 2014-02-04 12:42 UTC (permalink / raw) To: One Thousand Gnomes Cc: Pavel Roskin, Greg Kroah-Hartman, Jiri Slaby, linux-kernel On 02/03/2014 06:10 AM, One Thousand Gnomes wrote: > On Sat, 01 Feb 2014 10:09:03 -0500 > Peter Hurley <peter@hurleysoftware.com> wrote: > >> On 01/14/2014 11:24 AM, Pavel Roskin wrote: >>> Hi Alan, >>> >>> Quoting One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>: >>> >>>>> Maybe we should unset the low_latency flag as soon as DMA fails? There >>>>> are two flags, one is state->uart_port->flags and the other is >>>>> port->low_latency. I guess we need to unset both. >>>> >>>> Well low latency and DMA are pretty much exclusive in the real world so >>>> probably DMA ports shouldn't allow low_latency to be set at all in DMA >>>> mode. >>> >>> That's a useful insight. I assumed exactly the opposite. >> >> The meaning of low_latency has migrated since 2.6.28 > > Not really. The meaning of low latency was always "get the turn around > time for command/response protocols down as low as possible". DMA driven > serial usually reports a transfer completion on a watermark or a timeout, > so tends to work very badly within the Linux definition of 'low latency' > for tty. > > What it does has certainly changed but thats implementation detail. I meant the meaning as interpreted by the kernel, not the ideal meaning nor its original intent. >> Perhaps we should unconditionally unset low_latency (or remove it entirely). >> Real low latency can be addressed by using the -RT kernel. > > Just saying "use -RT" would be a regression and actually hurt quite a few > annoying "simple protocol" using tools for all sorts of control systems. > We are talking about milliseconds not microseconds here. Ok, fair enough. [Although my gut feeling is that nominal overhead is more like sub 10 usecs, and only when the scheduler is I/O-bound does worst case get near 1 msec.] > The expected behaviour in low_latency is probably best described as > > data arrives > processed > wakeup low_latency cannot guarantee that data will be processed, only that it will not wait. Examples: 1) SLIP is changing the mtu size. In this case, data will be dropped because, since the net queue is stopped, no data is taken up but any data passed to the receive_buf() is assumed to have been consumed. 2) tty buffers are being flushed. There may or may not be any data to process but there's no way to know without waiting. 3) termios is changing/has been changed. Depending on the line discipline, data may or may not be processed until termios changes complete. etc. > and to avoid the case of > > data arrives > queued for back end > [up to 10mS delay, but typically 1-2mS] > processed > wakeup > > > which multipled over a 50,000 S record download is a lot of time > > Everything else is not user visible so can be changed freely to get that > assumption to work (including ending up not needing it in the first > place). > > Getting tty to the point everything but N_TTY canonical mode is a fast > path would probably eliminate the need nicely - I don't know of any use > cases that expect ICANON, ECHO or I*/O* processing for low latency. Easier said than done. For example, what happens if termios is changing? Presumably, data cannot be processed at that time. So the line discipline returns early without having processed the data. [For example, the receive_buf() path could use trylocks and abort, rather than waiting.] But then, what restarts the attempt to process the data and can that wait? Similarly for throttling. Unthrottling may be in progress; and even though in progress, the condition that prompted the unthrottle may no longer be true and throttling must be done. Ok, it can't happen right now, but then when? Regards, Peter Hurley ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2014-02-04 12:42 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2014-01-14 0:35 serial8250: bogus low_latency destabilizes kernel, need sanity check Pavel Roskin 2014-01-14 12:08 ` One Thousand Gnomes 2014-01-14 16:24 ` Pavel Roskin 2014-02-01 15:09 ` Peter Hurley 2014-02-03 11:10 ` One Thousand Gnomes 2014-02-04 12:42 ` Peter Hurley
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox