* tty latency and RT
From: André Pribil @ 2018-07-06 8:51 UTC
To: linux-rt-users@vger.kernel.org
Hello,
are there any plans to improve the latency with serial ttys and RT?
I'm seeing very high latencies (up to 100 ms) when a high-priority
RT thread accesses a serial tty port. I'm working with 4.14.52-rt34
on an ARM system.
The reason for this issue seems to be that a kworker thread running
at normal priority is involved here. When the high-priority RT
thread needs to wait for this kworker thread, the kworker can easily
be blocked by other, lower-priority RT threads running in the system.
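For illustration, the scenario boils down to something like this (a
minimal sketch, not the actual application; the device name and
priority are just examples): a SCHED_FIFO thread blocking in read()
on a serial port, fed by the normal-priority flush_to_ldisc() kworker.

#include <fcntl.h>
#include <pthread.h>
#include <sched.h>
#include <unistd.h>

static void *uart_reader(void *arg)
{
        char buf[64];
        int fd = open("/dev/ttyS0", O_RDONLY | O_NOCTTY);

        if (fd < 0)
                return NULL;
        for (;;) {
                /* read() returns only after the normal-priority kworker
                 * has flushed the flip buffer to the line discipline. */
                ssize_t n = read(fd, buf, sizeof(buf));
                if (n <= 0)
                        break;
        }
        close(fd);
        return NULL;
}

int main(void)
{
        pthread_t tid;
        pthread_attr_t attr;
        struct sched_param sp = { .sched_priority = 80 };

        pthread_attr_init(&attr);
        pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
        pthread_attr_setschedpolicy(&attr, SCHED_FIFO);
        pthread_attr_setschedparam(&attr, &sp);
        /* SCHED_FIFO needs root or CAP_SYS_NICE */
        if (pthread_create(&tid, &attr, uart_reader, NULL) != 0)
                return 1;
        pthread_join(tid, NULL);
        return 0;
}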
I have seen that Steven Walter proposed a change in 2015, where
this kworker thread is replaced by a kthread with RT priority if
the low_latency mode is selected.
See: https://www.spinics.net/lists/linux-serial/msg17782.html
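(For reference: low_latency is the per-port flag that userspace can
already set via the TIOCSSERIAL ioctl, roughly like this; a sketch
with error handling trimmed.)

#include <sys/ioctl.h>
#include <linux/serial.h>

static int set_low_latency(int fd)
{
        struct serial_struct ss;

        if (ioctl(fd, TIOCGSERIAL, &ss) < 0)
                return -1;
        ss.flags |= ASYNC_LOW_LATENCY;
        return ioctl(fd, TIOCSSERIAL, &ss);
}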
The follow-up discussion in this thread also proposes a solution that
allows low_latency ports to directly execute the worker.
As interrupts are already threaded in PREEMPT_RT, wouldn't that be a
solution?
Please excuse me if I got something wrong.
Thanks in advance.
Andre
* Re: tty latency and RT
From: John Ogness @ 2018-07-06 9:16 UTC
To: André Pribil; +Cc: linux-rt-users@vger.kernel.org
Hi,
On 2018-07-06, André Pribil <a.pribil@beck-ipc.com> wrote:
> are there any plans to improve the latency with serial ttys and RT?
>
> I'm seeing very high latencies (up to 100 ms) when a high-priority
> RT thread accesses a serial tty port. I'm working with 4.14.52-rt34
> on an ARM system.
As you later point out, this is a problem with the tty subsystem
(kworker) and is not RT-specific. This problem needs to be fixed in
mainline.
> The reason for this issue seems to be that a kworker thread running
> at normal priority is involved here. When the high-priority RT
> thread needs to wait for this kworker thread, the kworker can easily
> be blocked by other, lower-priority RT threads running in the system.
>
> I have seen that Steven Walter proposed a change in 2015, where
> this kworker thread is replaced by a kthread with RT priority if
> the low_latency mode is selected.
> See: https://www.spinics.net/lists/linux-serial/msg17782.html
>
> The follow-up discussion in this thread also proposes a solution that
> allows low_latency ports to directly execute the worker.
> As interrupts are already threaded in PREEMPT_RT, wouldn't that be a
> solution?
The patches and discussions in that thread provide some good
groundwork for a real solution. Someone needs to take more time to
work with the maintainers to get it into mainline.
John Ogness
* Re: tty latency and RT
From: Esben Haabendal @ 2018-07-09 9:55 UTC
To: John Ogness; +Cc: André Pribil, linux-rt-users@vger.kernel.org
John Ogness <john.ogness@linutronix.de> writes:
> Hi,
>
> On 2018-07-06, André Pribil <a.pribil@beck-ipc.com> wrote:
>> are there any plans to improve the latency with serial ttys and RT?
>>
>> I'm seeing very high latencies (up to 100 ms) when a high-priority
>> RT thread accesses a serial tty port. I'm working with 4.14.52-rt34
>> on an ARM system.
>
> As you later point out, this is a problem with the tty subsystem
> (kworker) and is not RT-specific. This problem needs to be fixed in
> mainline.
>
>> The reason for this issue seems to be that a kworker thread running
>> at normal priority is involved here. When the high-priority RT
>> thread needs to wait for this kworker thread, the kworker can easily
>> be blocked by other, lower-priority RT threads running in the system.
>>
>> I have seen that Steven Walter proposed a change in 2015, where
>> this kworker thread is replaced by a kthread with RT priority if
>> the low_latency mode is selected.
>> See: https://www.spinics.net/lists/linux-serial/msg17782.html
>>
>> The follow-up discussion in this thread also proposes a solution that
>> allows low_latency ports to directly execute the worker.
>> As interrupts are already threaded in PREEMPT_RT, wouldn't that be a
>> solution?
>
> The patches and discussions in that thread provide some good
> groundwork for a real solution. Someone needs to take more time to
> work with the maintainers to get it into mainline.
I am using the following patch.
Not sure if it is worth proposing it for mainline inclusion, though.
RFC:
commit ca633caa8a18e8043f7f59cf9b3b9cbe2097a5ee
Author: Esben Haabendal <eha@deif.com>
Date: Fri Jun 1 13:03:41 2018 +0200
tty: Re-introduce annoying historical pain in the butt
Since a9c3f68f3cd8d55f809fbdb0c138ed061ea1bd25, use of UART for real-time
applications has been problematic.
When all CPU cores are running SCHED_FIFO tasks, tty buffers will be
stalled, causing unbounded latency of UART RX. For applications with
real-time requirements on reaction to UART RX, this is of course
unacceptable.
This change reuses the old low_latency flag, but only respects it for
drivers using a threaded interrupt handler. Drivers caring about
real-time requirements should do that anyway.
Signed-off-by: Esben Haabendal <eha@deif.com>
diff --git a/drivers/tty/tty_buffer.c b/drivers/tty/tty_buffer.c
index f8eba1c5412f..eb9ad23341c1 100644
--- a/drivers/tty/tty_buffer.c
+++ b/drivers/tty/tty_buffer.c
@@ -399,6 +399,7 @@ EXPORT_SYMBOL(__tty_insert_flip_char);
 void tty_schedule_flip(struct tty_port *port)
 {
        struct tty_bufhead *buf = &port->buf;
+       WARN_ON(port->low_latency);
 
        /* paired w/ acquire in flush_to_ldisc(); ensures
         * flush_to_ldisc() sees buffer data.
@@ -408,6 +409,16 @@ void tty_schedule_flip(struct tty_port *port)
 }
 EXPORT_SYMBOL(tty_schedule_flip);
 
+static void flush_to_ldisc(struct work_struct *work);
+
+void tty_do_flip(struct tty_port *port)
+{
+       struct tty_bufhead *buf = &port->buf;
+       smp_store_release(&buf->tail->commit, buf->tail->used);
+       flush_to_ldisc(&buf->work);
+}
+EXPORT_SYMBOL(tty_do_flip);
+
 /**
  * tty_prepare_flip_string - make room for characters
  * @port: tty port
@@ -543,7 +554,18 @@ static void flush_to_ldisc(struct work_struct *work)
 void tty_flip_buffer_push(struct tty_port *port)
 {
-       tty_schedule_flip(port);
+       if (in_interrupt() || !port->low_latency)
+               /* Schedule buffer flip on workqueue if running in hardirq
+                * context, or if low_latency mode is not enabled.
+                */
+               tty_schedule_flip(port);
+       else
+               /* Flip buffer immediately when the low_latency flag is
+                * set and running in a threaded interrupt handler.
+                * Without this, latency can increase dramatically when
+                * one or more SCHED_FIFO tasks are hogging the CPU(s).
+                */
+               tty_do_flip(port);
 }
 EXPORT_SYMBOL(tty_flip_buffer_push);
 
diff --git a/include/linux/tty_flip.h b/include/linux/tty_flip.h
index 767f62086bd9..005443e83b2d 100644
--- a/include/linux/tty_flip.h
+++ b/include/linux/tty_flip.h
@@ -13,6 +13,7 @@ extern int tty_prepare_flip_string(struct tty_port *port,
                                unsigned char **chars, size_t size);
 extern void tty_flip_buffer_push(struct tty_port *port);
 void tty_schedule_flip(struct tty_port *port);
+void tty_do_flip(struct tty_port *port);
 int __tty_insert_flip_char(struct tty_port *port, unsigned char ch, char flag);
 
 static inline int tty_insert_flip_char(struct tty_port *port,
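
For context, the path this change takes effect on looks roughly like
this in a driver with a threaded RX handler (a generic sketch, not
from any particular driver; struct my_uart and the my_uart_* helpers
are hypothetical):

#include <linux/interrupt.h>
#include <linux/tty_flip.h>

struct my_uart {
        struct tty_port port;
        /* ... device state, register base, etc. ... */
};

static irqreturn_t my_uart_rx_thread(int irq, void *dev_id)
{
        struct my_uart *uart = dev_id;
        struct tty_port *port = &uart->port;

        while (my_uart_rx_ready(uart))          /* hypothetical helpers */
                tty_insert_flip_char(port, my_uart_rx_byte(uart), TTY_NORMAL);

        /* A threaded handler runs in task context, so in_interrupt() is
         * false here; with port->low_latency set, the patched
         * tty_flip_buffer_push() calls tty_do_flip() and the data reaches
         * the line discipline in this (RT-prioritizable) thread instead
         * of a kworker. */
        tty_flip_buffer_push(port);
        return IRQ_HANDLED;
}

/* registered with something like:
 *      request_threaded_irq(irq, NULL, my_uart_rx_thread,
 *                           IRQF_ONESHOT, "my-uart", uart);
 */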
* RE: tty latency and RT
From: André Pribil @ 2018-07-09 11:37 UTC
To: Esben Haabendal, John Ogness; +Cc: linux-rt-users@vger.kernel.org
Esben Haabendal [mailto:esbenhaabendal@gmail.com] writes:
> I am using the following patch.
> Not sure if it is worth proposing it for mainline inclusion, though.
Thanks a lot. That's what I was looking for.
Andre
* Re: tty latency and RT
From: Sebastian Andrzej Siewior @ 2018-07-11 8:09 UTC
To: Esben Haabendal
Cc: John Ogness, André Pribil, linux-rt-users@vger.kernel.org
On 2018-07-09 11:55:08 [+0200], Esben Haabendal wrote:
> I am using the following patch.
> Not sure if it is worth proposing it for mainline inclusion, though.
> RFC:
| ======================================================
| WARNING: possible circular locking dependency detected
| 4.16.18-rt9+ #187 Not tainted
| ------------------------------------------------------
| sshd/3205 is trying to acquire lock:
| (&buf->lock){+.+.}, at: [< (ptrval)>] flush_to_ldisc+0x1e/0xa0
|
| but task is already holding lock:
| (&ldata->output_lock){+.+.}, at: [< (ptrval)>] n_tty_write+0x12a/0x480
|
| which lock already depends on the new lock.
|
|
| the existing dependency chain (in reverse order) is:
|
| -> #2 (&ldata->output_lock){+.+.}:
| _mutex_lock+0x26/0x40
| n_tty_write+0x12a/0x480
| tty_write+0x1b3/0x320
| redirected_tty_write+0x9a/0xb0
| do_iter_write+0x159/0x1a0
| vfs_writev+0x93/0x110
| do_writev+0x5f/0xf0
| SyS_writev+0xb/0x10
| do_syscall_64+0x73/0x220
| entry_SYSCALL_64_after_hwframe+0x42/0xb7
|
| -> #1 (&tty->termios_rwsem){++++}:
| down_write+0x39/0x50
| n_tty_flush_buffer+0x19/0xf0
| tty_buffer_flush+0x71/0x90
| tty_ldisc_flush+0x1d/0x40
| tty_port_close_start.part.5+0xa0/0x1b0
| tty_port_close+0x29/0x60
| uart_close+0x26/0x70
| tty_release+0xfc/0x4f0
| __fput+0xf1/0x200
| ____fput+0x9/0x10
| task_work_run+0x8b/0xc0
| exit_to_usermode_loop+0xbc/0xc0
| do_syscall_64+0x21b/0x220
| entry_SYSCALL_64_after_hwframe+0x42/0xb7
|
| -> #0 (&buf->lock){+.+.}:
| lock_acquire+0x95/0x240
| _mutex_lock+0x26/0x40
| flush_to_ldisc+0x1e/0xa0
| tty_flip_buffer_push+0x28/0x40
| pty_write+0x4e/0x60
| n_tty_write+0x1ae/0x480
| tty_write+0x1b3/0x320
| __vfs_write+0x35/0x160
| vfs_write+0xc1/0x1c0
| SyS_write+0x53/0xc0
| do_syscall_64+0x73/0x220
| entry_SYSCALL_64_after_hwframe+0x42/0xb7
|
| other info that might help us debug this:
|
| Chain exists of:
| &buf->lock --> &tty->termios_rwsem --> &ldata->output_lock
|
| Possible unsafe locking scenario:
|
|        CPU0                    CPU1
|        ----                    ----
|   lock(&ldata->output_lock);
|                                lock(&tty->termios_rwsem);
|                                lock(&ldata->output_lock);
|   lock(&buf->lock);
|
| *** DEADLOCK ***
|
| 4 locks held by sshd/3205:
| #0: (&tty->ldisc_sem){++++}, at: [< (ptrval)>] ldsem_down_read+0x2d/0x40
| #1: (&tty->atomic_write_lock){+.+.}, at: [< (ptrval)>] tty_write_lock+0x19/0x50
| #2: (&o_tty->termios_rwsem/1){++++}, at: [< (ptrval)>] n_tty_write+0x9a/0x480
| #3: (&ldata->output_lock){+.+.}, at: [< (ptrval)>] n_tty_write+0x12a/0x480
|
| stack backtrace:
| CPU: 7 PID: 3205 Comm: sshd Not tainted 4.16.18-rt9+ #187
| Call Trace:
| dump_stack+0x70/0xa7
| print_circular_bug.isra.37+0x1d8/0x1e6
| __lock_acquire+0x1284/0x1340
| ? flush_to_ldisc+0x1e/0xa0
| lock_acquire+0x95/0x240
| ? lock_acquire+0x95/0x240
| ? flush_to_ldisc+0x1e/0xa0
| _mutex_lock+0x26/0x40
| ? flush_to_ldisc+0x1e/0xa0
| flush_to_ldisc+0x1e/0xa0
| tty_flip_buffer_push+0x28/0x40
| pty_write+0x4e/0x60
| n_tty_write+0x1ae/0x480
| ? do_wait_intr_irq+0xc0/0xc0
| tty_write+0x1b3/0x320
| ? process_echoes+0x60/0x60
| __vfs_write+0x35/0x160
| ? SYSC_newfstat+0x44/0x70
| vfs_write+0xc1/0x1c0
| SyS_write+0x53/0xc0
| do_syscall_64+0x73/0x220
| entry_SYSCALL_64_after_hwframe+0x42/0xb7
| RIP: 0033:0x7f0cbc297134
| RSP: 002b:00007ffc66125a38 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
| RAX: ffffffffffffffda RBX: 0000000000000052 RCX: 00007f0cbc297134
| RDX: 0000000000000052 RSI: 00005639334fbf80 RDI: 0000000000000001
| RBP: 00005639334fbf80 R08: 0000000000000000 R09: 00007f0cbc564090
| R10: 0000000000000000 R11: 0000000000000246 R12: 00007f0cbc564760
| R13: 0000000000000052 R14: 00007f0cbc55f760 R15: 0000000000000052
Sebastian
* Re: tty latency and RT
From: Esben Haabendal @ 2018-07-13 8:33 UTC
To: Sebastian Andrzej Siewior
Cc: John Ogness, André Pribil, linux-rt-users@vger.kernel.org
Sebastian Andrzej Siewior <bigeasy@linutronix.de> writes:
> On 2018-07-09 11:55:08 [+0200], Esben Haabendal wrote:
>> I am using the following patch.
>> Not sure if it is worth proposing it for mainline inclusion, though.
>> RFC:
>
> | ======================================================
> | WARNING: possible circular locking dependency detected
> | 4.16.18-rt9+ #187 Not tainted
> | ------------------------------------------------------
[...]
> | Chain exists of:
> |   &buf->lock --> &tty->termios_rwsem --> &ldata->output_lock
> |
> | Possible unsafe locking scenario:
> |
> |        CPU0                    CPU1
> |        ----                    ----
> |   lock(&ldata->output_lock);
> |                                lock(&tty->termios_rwsem);
> |                                lock(&ldata->output_lock);
> |   lock(&buf->lock);
> |
> |  *** DEADLOCK ***
Yes, it seems like there is:
1. tty_buffer_flush(), which results in
       mutex_lock(&buf->lock);
       down_write(&tty->termios_rwsem);   # in n_tty_flush_buffer()
   and with my patch:
2. n_tty_write(), which results in
       down_read(&tty->termios_rwsem);
       mutex_lock(&ldata->output_lock);
       mutex_lock(&buf->lock);            # in flush_to_ldisc() (from tty_do_flip())
So both the
       &buf->lock --> &tty->termios_rwsem
chain and the
       &tty->termios_rwsem --> &ldata->output_lock --> &buf->lock
chain exist, which I guess is what lockdep refers to, and which indeed
looks like a deadlock waiting to happen.
But is the lock ordering in tty_buffer_flush() really needed?
Could we move the
       if (ld && ld->ops->flush_buffer)
               ld->ops->flush_buffer(tty);
lines out of the &buf->lock section? I don't think that any code
outside of tty_buffer.c should touch the port->buf internals, and thus
it should not need to hold &buf->lock.
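Concretely, something like this (an untested sketch against the
4.14-era tty_buffer_flush(); whether the ldisc flush is safe to run
outside &buf->lock is exactly what upstream would need to confirm):

void tty_buffer_flush(struct tty_struct *tty, struct tty_ldisc *ld)
{
        struct tty_port *port = tty->port;
        struct tty_bufhead *buf = &port->buf;
        struct tty_buffer *next;

        atomic_inc(&buf->priority);

        mutex_lock(&buf->lock);
        /* paired w/ release in __tty_buffer_request_room; ensures there
         * are no pending memory accesses to the freed buffer
         */
        while ((next = smp_load_acquire(&buf->head->next)) != NULL) {
                tty_buffer_free(port, buf->head);
                buf->head = next;
        }
        buf->head->read = buf->head->commit;
        atomic_dec(&buf->priority);
        mutex_unlock(&buf->lock);

        /* Now outside &buf->lock: n_tty_flush_buffer() takes
         * tty->termios_rwsem, so this drops the
         * &buf->lock --> &tty->termios_rwsem dependency that closes
         * the lockdep cycle above. */
        if (ld && ld->ops->flush_buffer)
                ld->ops->flush_buffer(tty);
}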
/Esben
* Re: tty latency and RT
From: Sebastian Andrzej Siewior @ 2018-07-18 16:49 UTC
To: Esben Haabendal
Cc: John Ogness, André Pribil, linux-rt-users@vger.kernel.org
On 2018-07-13 10:33:54 [+0200], Esben Haabendal wrote:
> But is the lock ordering in tty_buffer_flush() really needed?
> Could we move the
>
>        if (ld && ld->ops->flush_buffer)
>                ld->ops->flush_buffer(tty);
>
> lines out of the &buf->lock section? I don't think that any code
> outside of tty_buffer.c should touch the port->buf internals, and thus
> it should not need to hold &buf->lock.
You could try this out and see what happens. If you attempt to change
the locking, then I suggest you involve upstream. I haven't looked too
closely into it. I remember Peter wasn't too happy about the locking
while the "low latency" shortcut was still in, and he eventually
removed it.
> /Esben
Sebastian