public inbox for linux-serial@vger.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH v1 0/2] Fix serial console for PREEMPT_RT
@ 2024-01-16  7:32 Leonardo Bras
  2024-01-16  7:32 ` [RFC PATCH v1 2/2] serial/8250: Avoid getting lock in RT atomic context Leonardo Bras
  2024-01-16  7:38 ` [RFC PATCH v1 0/2] Fix serial console for PREEMPT_RT Leonardo Bras
  0 siblings, 2 replies; 5+ messages in thread
From: Leonardo Bras @ 2024-01-16  7:32 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Jiri Slaby, Thomas Gleixner,
	Ilpo Järvinen, Andy Shevchenko, Florian Fainelli,
	John Ogness, Tony Lindgren, Marcelo Tosatti
  Cc: Leonardo Bras, linux-kernel, linux-serial

While dealing with a bug that breaks the serial console from qemu (8250)
after printing a lot of data, I found some issues on this driver on RT
as well as spurious IRQ behaviors that don't seem to be adeqate for RT.

Comments:
Patch #1:
I found out this driver get an IRQ request for every tx byte, but the
handler is able to deal with sending multiple bytes per "thread wake up". 

Since the irqs_unhandled keep growing, and theads_handled don't change
as often, after some intense load (tx ~300kBytes) the serial will 
disable the IRQ line for this driver, which ultimately breaks the console.

My fist solution kept track of how many requests given handler dealt with,
which got added to theads_handled. On note_interrupt I got the diff from
theads_handled_last and subtracted that diff from irqs_unhandled.

This solution required a change in the irqreturn_t typedef and a bunch of
helpers and defines, as well as adapting the 8250 driver. 
At the end seemed like a overcomplicated solution for the issue, but it
can be an alternative if the current solution is considered imprecise.

Mu cyrrent solution on patch #1 is much simpler, just keeping the
IRQ enabled as long as the irq_thread deal with any IRQ request before
irqs_unhandled hitting the limit value.

Patch #2:
In RT, the 8250 driver has an issue if it's interrupted while holding the
port->lock. If the interruption needs to printk() anything, it
will try to get the port->lock, which is busy, so spin_lock() will try
to reschedule the interruption, which is in atomic context, and will 
trigger a bug.

This bug reproduces quite often, like in 50% of tests I did. 

The only thing I could think of for fixing this is using in_atomic()
when PREEMPT_RT=y, so it makes use of the same mechanism as for
oops_in_progress to avoid getting the lock if it's busy. It's working
just fine.

Yeah, I got the warning in checkpath:
"ERROR: do not use in_atomic in drivers"

So I need some feedback on what to do to avoid this bug, if not 
by using in_atomic() at this driver.

Since this one is linked to the console, any printk will try to get
this drivers port->lock, and so it's kind of hard to avoid this accesses.

I though on doing an interface for spin_lock_only_if_can_sleep() but
it seemed overkill.

Please provide comments / feedback.

Thanks!
Leo


Leonardo Bras (2):
  irq/spurious: Reset irqs_unhandled if an irq_thread handles one IRQ
    request
  serial/8250: Avoid getting lock in RT atomic context

 drivers/tty/serial/8250/8250_port.c | 2 +-
 kernel/irq/spurious.c               | 8 ++++++++
 2 files changed, 9 insertions(+), 1 deletion(-)


base-commit: 052d534373b7ed33712a63d5e17b2b6cdbce84fd
-- 
2.43.0


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [RFC PATCH v1 2/2] serial/8250: Avoid getting lock in RT atomic context
  2024-01-16  7:32 [RFC PATCH v1 0/2] Fix serial console for PREEMPT_RT Leonardo Bras
@ 2024-01-16  7:32 ` Leonardo Bras
  2024-01-16  7:49   ` Jiri Slaby
  2024-01-16  7:38 ` [RFC PATCH v1 0/2] Fix serial console for PREEMPT_RT Leonardo Bras
  1 sibling, 1 reply; 5+ messages in thread
From: Leonardo Bras @ 2024-01-16  7:32 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Jiri Slaby, Thomas Gleixner,
	Ilpo Järvinen, Andy Shevchenko, Florian Fainelli,
	John Ogness, Tony Lindgren, Marcelo Tosatti
  Cc: Leonardo Bras, linux-kernel, linux-serial

With PREEMPT_RT enabled, a spin_lock_irqsave() becomes a possibly sleeping
spin_lock(), without preempt_disable() or irq_disable().

This allows a task T1 to get preempted or interrupted while holding the
port->lock. If the preempting task T2 need the lock, spin_lock() code
will schedule T1 back until it finishes using the lock, and then go back to
T2.

There is an issue if a T1 holding port->lock is interrupted by an
IRQ, and this IRQ handler needs to get port->lock for writting (printk):
spin_lock() code will try to reschedule the interrupt handler, which is in
atomic context, causing a BUG() for trying to reschedule/sleep in atomic
context.

So for the case (PREEMPT_RT && in_atomic()) try to get the lock, and if it
fails proceed anyway, just like it's done in oops_in_progress case.

Signed-off-by: Leonardo Bras <leobras@redhat.com>
---
 drivers/tty/serial/8250/8250_port.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/tty/serial/8250/8250_port.c b/drivers/tty/serial/8250/8250_port.c
index 8ca061d3bbb92..8480832846319 100644
--- a/drivers/tty/serial/8250/8250_port.c
+++ b/drivers/tty/serial/8250/8250_port.c
@@ -3397,7 +3397,7 @@ void serial8250_console_write(struct uart_8250_port *up, const char *s,
 
 	touch_nmi_watchdog();
 
-	if (oops_in_progress)
+	if (oops_in_progress || (IS_ENABLED(CONFIG_PREEMPT_RT) && in_atomic())
 		locked = uart_port_trylock_irqsave(port, &flags);
 	else
 		uart_port_lock_irqsave(port, &flags);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [RFC PATCH v1 0/2] Fix serial console for PREEMPT_RT
  2024-01-16  7:32 [RFC PATCH v1 0/2] Fix serial console for PREEMPT_RT Leonardo Bras
  2024-01-16  7:32 ` [RFC PATCH v1 2/2] serial/8250: Avoid getting lock in RT atomic context Leonardo Bras
@ 2024-01-16  7:38 ` Leonardo Bras
  1 sibling, 0 replies; 5+ messages in thread
From: Leonardo Bras @ 2024-01-16  7:38 UTC (permalink / raw)
  To: Leonardo Bras
  Cc: Greg Kroah-Hartman, Jiri Slaby, Thomas Gleixner,
	Ilpo Järvinen, Andy Shevchenko, Florian Fainelli,
	John Ogness, Tony Lindgren, Marcelo Tosatti, linux-kernel,
	linux-serial

On Tue, Jan 16, 2024 at 04:32:31AM -0300, Leonardo Bras wrote:
> While dealing with a bug that breaks the serial console from qemu (8250)
> after printing a lot of data, I found some issues on this driver on RT
> as well as spurious IRQ behaviors that don't seem to be adeqate for RT.
> 
> Comments:
> Patch #1:
> I found out this driver get an IRQ request for every tx byte, but the
> handler is able to deal with sending multiple bytes per "thread wake up". 
> 
> Since the irqs_unhandled keep growing, and theads_handled don't change
> as often, after some intense load (tx ~300kBytes) the serial will 
> disable the IRQ line for this driver, which ultimately breaks the console.
> 
> My fist solution kept track of how many requests given handler dealt with,
> which got added to theads_handled. On note_interrupt I got the diff from
> theads_handled_last and subtracted that diff from irqs_unhandled.
> 
> This solution required a change in the irqreturn_t typedef and a bunch of
> helpers and defines, as well as adapting the 8250 driver. 
> At the end seemed like a overcomplicated solution for the issue, but it
> can be an alternative if the current solution is considered imprecise.
> 
> Mu cyrrent solution on patch #1 is much simpler, just keeping the
> IRQ enabled as long as the irq_thread deal with any IRQ request before
> irqs_unhandled hitting the limit value.
> 
> Patch #2:
> In RT, the 8250 driver has an issue if it's interrupted while holding the
> port->lock. If the interruption needs to printk() anything, it
> will try to get the port->lock, which is busy, so spin_lock() will try
> to reschedule the interruption, which is in atomic context, and will 
> trigger a bug.
> 
> This bug reproduces quite often, like in 50% of tests I did. 
> 
> The only thing I could think of for fixing this is using in_atomic()
> when PREEMPT_RT=y, so it makes use of the same mechanism as for
> oops_in_progress to avoid getting the lock if it's busy. It's working
> just fine.
> 
> Yeah, I got the warning in checkpath:
> "ERROR: do not use in_atomic in drivers"
> 
> So I need some feedback on what to do to avoid this bug, if not 
> by using in_atomic() at this driver.
> 
> Since this one is linked to the console, any printk will try to get
> this drivers port->lock, and so it's kind of hard to avoid this accesses.
> 
> I though on doing an interface for spin_lock_only_if_can_sleep() but
> it seemed overkill.
> 
> Please provide comments / feedback.
> 
> Thanks!
> Leo
> 
> 
> Leonardo Bras (2):
>   irq/spurious: Reset irqs_unhandled if an irq_thread handles one IRQ
>     request
>   serial/8250: Avoid getting lock in RT atomic context
> 
>  drivers/tty/serial/8250/8250_port.c | 2 +-
>  kernel/irq/spurious.c               | 8 ++++++++
>  2 files changed, 9 insertions(+), 1 deletion(-)
> 
> 
> base-commit: 052d534373b7ed33712a63d5e17b2b6cdbce84fd
> -- 
> 2.43.0
> 

Resent this one with the correct order / patches.


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC PATCH v1 2/2] serial/8250: Avoid getting lock in RT atomic context
  2024-01-16  7:32 ` [RFC PATCH v1 2/2] serial/8250: Avoid getting lock in RT atomic context Leonardo Bras
@ 2024-01-16  7:49   ` Jiri Slaby
  2024-01-16 18:24     ` Leonardo Bras
  0 siblings, 1 reply; 5+ messages in thread
From: Jiri Slaby @ 2024-01-16  7:49 UTC (permalink / raw)
  To: Leonardo Bras, Greg Kroah-Hartman, Thomas Gleixner,
	Ilpo Järvinen, Andy Shevchenko, Florian Fainelli,
	John Ogness, Tony Lindgren, Marcelo Tosatti, Steven Rostedt
  Cc: linux-kernel, linux-serial

On 16. 01. 24, 8:32, Leonardo Bras wrote:
> With PREEMPT_RT enabled, a spin_lock_irqsave() becomes a possibly sleeping
> spin_lock(), without preempt_disable() or irq_disable().
> 
> This allows a task T1 to get preempted or interrupted while holding the
> port->lock. If the preempting task T2 need the lock, spin_lock() code
> will schedule T1 back until it finishes using the lock, and then go back to
> T2.
> 
> There is an issue if a T1 holding port->lock is interrupted by an
> IRQ, and this IRQ handler needs to get port->lock for writting (printk):
> spin_lock() code will try to reschedule the interrupt handler, which is in
> atomic context, causing a BUG() for trying to reschedule/sleep in atomic
> context.
> 
> So for the case (PREEMPT_RT && in_atomic()) try to get the lock, and if it
> fails proceed anyway, just like it's done in oops_in_progress case.

Hmm, that appears incorrect to me.

Perhaps we need a raw spin lock? Or maybe I am totally off, as my RT 
knowledge is close to zero.

This needs advices from RT folks...

> Signed-off-by: Leonardo Bras <leobras@redhat.com>
> ---
>   drivers/tty/serial/8250/8250_port.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/tty/serial/8250/8250_port.c b/drivers/tty/serial/8250/8250_port.c
> index 8ca061d3bbb92..8480832846319 100644
> --- a/drivers/tty/serial/8250/8250_port.c
> +++ b/drivers/tty/serial/8250/8250_port.c
> @@ -3397,7 +3397,7 @@ void serial8250_console_write(struct uart_8250_port *up, const char *s,
>   
>   	touch_nmi_watchdog();
>   
> -	if (oops_in_progress)
> +	if (oops_in_progress || (IS_ENABLED(CONFIG_PREEMPT_RT) && in_atomic())
>   		locked = uart_port_trylock_irqsave(port, &flags);
>   	else
>   		uart_port_lock_irqsave(port, &flags);

-- 
js
suse labs


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC PATCH v1 2/2] serial/8250: Avoid getting lock in RT atomic context
  2024-01-16  7:49   ` Jiri Slaby
@ 2024-01-16 18:24     ` Leonardo Bras
  0 siblings, 0 replies; 5+ messages in thread
From: Leonardo Bras @ 2024-01-16 18:24 UTC (permalink / raw)
  To: Jiri Slaby
  Cc: Leonardo Bras, Greg Kroah-Hartman, Thomas Gleixner,
	Ilpo Järvinen, Andy Shevchenko, Florian Fainelli,
	John Ogness, Tony Lindgren, Marcelo Tosatti, Steven Rostedt,
	linux-kernel, linux-serial

On Tue, Jan 16, 2024 at 08:49:14AM +0100, Jiri Slaby wrote:
> On 16. 01. 24, 8:32, Leonardo Bras wrote:
> > With PREEMPT_RT enabled, a spin_lock_irqsave() becomes a possibly sleeping
> > spin_lock(), without preempt_disable() or irq_disable().
> > 
> > This allows a task T1 to get preempted or interrupted while holding the
> > port->lock. If the preempting task T2 need the lock, spin_lock() code
> > will schedule T1 back until it finishes using the lock, and then go back to
> > T2.
> > 
> > There is an issue if a T1 holding port->lock is interrupted by an
> > IRQ, and this IRQ handler needs to get port->lock for writting (printk):
> > spin_lock() code will try to reschedule the interrupt handler, which is in
> > atomic context, causing a BUG() for trying to reschedule/sleep in atomic
> > context.
> > 
> > So for the case (PREEMPT_RT && in_atomic()) try to get the lock, and if it
> > fails proceed anyway, just like it's done in oops_in_progress case.
> 
> Hmm, that appears incorrect to me.
> 
> Perhaps we need a raw spin lock? Or maybe I am totally off, as my RT
> knowledge is close to zero.

If we have a raw_spin_lock_irqsave() here, it would hurt RT by a lot since 
disabling interrupts is usually bad at the RT kernel, and serial console 
can be used a lot.

> 
> This needs advices from RT folks...

Agree. All help is welcome in this case!

Thanks!
Leo

> 
> > Signed-off-by: Leonardo Bras <leobras@redhat.com>
> > ---
> >   drivers/tty/serial/8250/8250_port.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/tty/serial/8250/8250_port.c b/drivers/tty/serial/8250/8250_port.c
> > index 8ca061d3bbb92..8480832846319 100644
> > --- a/drivers/tty/serial/8250/8250_port.c
> > +++ b/drivers/tty/serial/8250/8250_port.c
> > @@ -3397,7 +3397,7 @@ void serial8250_console_write(struct uart_8250_port *up, const char *s,
> >   	touch_nmi_watchdog();
> > -	if (oops_in_progress)
> > +	if (oops_in_progress || (IS_ENABLED(CONFIG_PREEMPT_RT) && in_atomic())
> >   		locked = uart_port_trylock_irqsave(port, &flags);
> >   	else
> >   		uart_port_lock_irqsave(port, &flags);
> 
> -- 
> js
> suse labs
> 


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2024-01-16 18:24 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-01-16  7:32 [RFC PATCH v1 0/2] Fix serial console for PREEMPT_RT Leonardo Bras
2024-01-16  7:32 ` [RFC PATCH v1 2/2] serial/8250: Avoid getting lock in RT atomic context Leonardo Bras
2024-01-16  7:49   ` Jiri Slaby
2024-01-16 18:24     ` Leonardo Bras
2024-01-16  7:38 ` [RFC PATCH v1 0/2] Fix serial console for PREEMPT_RT Leonardo Bras

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox