linux-amlogic.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH printk v5 1/1] printk: extend console_lock for per-console locking
       [not found]         ` <CGME20220427070833eucas1p27a32ce7c41c0da26f05bd52155f0031c@eucas1p2.samsung.com>
@ 2022-04-27  7:08           ` Marek Szyprowski
  2022-04-27  7:38             ` Petr Mladek
  0 siblings, 1 reply; 33+ messages in thread
From: Marek Szyprowski @ 2022-04-27  7:08 UTC (permalink / raw)
  To: Petr Mladek, John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel,
	Greg Kroah-Hartman, linux-amlogic

Hi,

On 26.04.2022 15:16, Petr Mladek wrote:
> On Tue 2022-04-26 14:07:42, Petr Mladek wrote:
>> On Mon 2022-04-25 23:04:28, John Ogness wrote:
>>> Currently threaded console printers synchronize against each
>>> other using console_lock(). However, different console drivers
>>> are unrelated and do not require any synchronization between
>>> each other. Removing the synchronization between the threaded
>>> console printers will allow each console to print at its own
>>> speed.
>>>
>>> But the threaded consoles printers do still need to synchronize
>>> against console_lock() callers. Introduce a per-console mutex
>>> and a new console boolean field @blocked to provide this
>>> synchronization.
>>>
>>> console_lock() is modified so that it must acquire the mutex
>>> of each console in order to set the @blocked field. Console
>>> printing threads will acquire their mutex while printing a
>>> record. If @blocked was set, the thread will go back to sleep
>>> instead of printing.
>>>
>>> The reason for the @blocked boolean field is so that
>>> console_lock() callers do not need to acquire multiple console
>>> mutexes simultaneously, which would introduce unnecessary
>>> complexity due to nested mutex locking. Also, a new field
>>> was chosen instead of adding a new @flags value so that the
>>> blocked status could be checked without concern of reading
>>> inconsistent values due to @flags updates from other contexts.
>>>
>>> Threaded console printers also need to synchronize against
>>> console_trylock() callers. Since console_trylock() may be
>>> called from any context, the per-console mutex cannot be used
>>> for this synchronization. (mutex_trylock() cannot be called
>>> from atomic contexts.) Introduce a global atomic counter to
>>> identify if any threaded printers are active. The threaded
>>> printers will also check the atomic counter to identify if the
>>> console has been locked by another task via console_trylock().
>>>
>>> Note that @console_sem is still used to provide synchronization
>>> between console_lock() and console_trylock() callers.
>>>
>>> A locking overview for console_lock(), console_trylock(), and the
>>> threaded printers is as follows (pseudo code):
>>>
>>> console_lock()
>>> {
>>>          down(&console_sem);
>>>          for_each_console(con) {
>>>                  mutex_lock(&con->lock);
>>>                  con->blocked = true;
>>>                  mutex_unlock(&con->lock);
>>>          }
>>>          /* console_lock acquired */
>>> }
>>>
>>> console_trylock()
>>> {
>>>          if (down_trylock(&console_sem) == 0) {
>>>                  if (atomic_cmpxchg(&console_kthreads_active, 0, -1) == 0) {
>>>                          /* console_lock acquired */
>>>                  }
>>>          }
>>> }
>>>
>>> threaded_printer()
>>> {
>>>          mutex_lock(&con->lock);
>>>          if (!con->blocked) {
>>> 		/* console_lock() callers blocked */
>>>
>>>                  if (atomic_inc_unless_negative(&console_kthreads_active)) {
>>>                          /* console_trylock() callers blocked */
>>>
>>>                          con->write();
>>>
>>>                          atomic_dec(&console_lock_count);
>>>                  }
>>>          }
>>>          mutex_unlock(&con->lock);
>>> }
>>>
>>> The console owner and waiter logic now only applies between contexts
>>> that have taken the console_lock via console_trylock(). Threaded
>>> printers never take the console_lock, so they do not have a
>>> console_lock to handover. Tasks that have used console_lock() will
>>> block the threaded printers using a mutex and if the console_lock
>>> is handed over to an atomic context, it would be unable to unblock
>>> the threaded printers. However, the console_trylock() case is
>>> really the only scenario that is interesting for handovers anyway.
>>>
>>> @panic_console_dropped must change to atomic_t since it is no longer
>>> protected exclusively by the console_lock.
>>>
>>> Since threaded printers remain asleep if they see that the console
>>> is locked, they now must be explicitly woken in __console_unlock().
>>> This means wake_up_klogd() calls following a console_unlock() are
>>> no longer necessary and are removed.
>>>
>>> Also note that threaded printers no longer need to check
>>> @console_suspended. The check for the @blocked field implicitly
>>> covers the suspended console case.
>>>
>>> Signed-off-by: John Ogness <john.ogness@linutronix.de>
>> Nice, it it better than v4. I am going to push this for linux-next.
>>
>> Reviewed-by: Petr Mladek <pmladek@suse.com>
> JFYI, I have just pushed this patch instead of the one
> from v4 into printk/linux.git, branch rework/kthreads.
>
> It means that this branch has been rebased. It will be
> used in the next refresh of linux-next.

This patchset landed in linux next-20220426. In my tests I've found that 
it causes deadlock on all my Amlogic Meson G12B/SM1 based boards: Odroid 
C4/N2 and Khadas VIM3/VIM3l. The deadlock happens when system boots to 
userspace and getty (with automated login) is executed. I even see the 
bash prompt, but then the console is freezed. Reverting this patch 
(e00cc0e1cbf4) on top of linux-next (together with 6b3d71e87892 to make 
revert clean) fixes the issue.


Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


_______________________________________________
linux-amlogic mailing list
linux-amlogic@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-amlogic

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH printk v5 1/1] printk: extend console_lock for per-console locking
  2022-04-27  7:08           ` [PATCH printk v5 1/1] printk: extend console_lock for per-console locking Marek Szyprowski
@ 2022-04-27  7:38             ` Petr Mladek
  2022-04-27 11:44               ` Marek Szyprowski
  0 siblings, 1 reply; 33+ messages in thread
From: Petr Mladek @ 2022-04-27  7:38 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: John Ogness, Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman, linux-amlogic

On Wed 2022-04-27 09:08:33, Marek Szyprowski wrote:
> Hi,
> 
> On 26.04.2022 15:16, Petr Mladek wrote:
> > On Tue 2022-04-26 14:07:42, Petr Mladek wrote:
> >> On Mon 2022-04-25 23:04:28, John Ogness wrote:
> >>> Currently threaded console printers synchronize against each
> >>> other using console_lock(). However, different console drivers
> >>> are unrelated and do not require any synchronization between
> >>> each other. Removing the synchronization between the threaded
> >>> console printers will allow each console to print at its own
> >>> speed.
> >>>
> >>> But the threaded consoles printers do still need to synchronize
> >>> against console_lock() callers. Introduce a per-console mutex
> >>> and a new console boolean field @blocked to provide this
> >>> synchronization.
> >>>
> >>> console_lock() is modified so that it must acquire the mutex
> >>> of each console in order to set the @blocked field. Console
> >>> printing threads will acquire their mutex while printing a
> >>> record. If @blocked was set, the thread will go back to sleep
> >>> instead of printing.
> >>>
> >>> The reason for the @blocked boolean field is so that
> >>> console_lock() callers do not need to acquire multiple console
> >>> mutexes simultaneously, which would introduce unnecessary
> >>> complexity due to nested mutex locking. Also, a new field
> >>> was chosen instead of adding a new @flags value so that the
> >>> blocked status could be checked without concern of reading
> >>> inconsistent values due to @flags updates from other contexts.
> >>>
> >>> Threaded console printers also need to synchronize against
> >>> console_trylock() callers. Since console_trylock() may be
> >>> called from any context, the per-console mutex cannot be used
> >>> for this synchronization. (mutex_trylock() cannot be called
> >>> from atomic contexts.) Introduce a global atomic counter to
> >>> identify if any threaded printers are active. The threaded
> >>> printers will also check the atomic counter to identify if the
> >>> console has been locked by another task via console_trylock().
> >>>
> >>> Note that @console_sem is still used to provide synchronization
> >>> between console_lock() and console_trylock() callers.
> >>>
> >>> A locking overview for console_lock(), console_trylock(), and the
> >>> threaded printers is as follows (pseudo code):
> >>>
> >>> console_lock()
> >>> {
> >>>          down(&console_sem);
> >>>          for_each_console(con) {
> >>>                  mutex_lock(&con->lock);
> >>>                  con->blocked = true;
> >>>                  mutex_unlock(&con->lock);
> >>>          }
> >>>          /* console_lock acquired */
> >>> }
> >>>
> >>> console_trylock()
> >>> {
> >>>          if (down_trylock(&console_sem) == 0) {
> >>>                  if (atomic_cmpxchg(&console_kthreads_active, 0, -1) == 0) {
> >>>                          /* console_lock acquired */
> >>>                  }
> >>>          }
> >>> }
> >>>
> >>> threaded_printer()
> >>> {
> >>>          mutex_lock(&con->lock);
> >>>          if (!con->blocked) {
> >>> 		/* console_lock() callers blocked */
> >>>
> >>>                  if (atomic_inc_unless_negative(&console_kthreads_active)) {
> >>>                          /* console_trylock() callers blocked */
> >>>
> >>>                          con->write();
> >>>
> >>>                          atomic_dec(&console_lock_count);
> >>>                  }
> >>>          }
> >>>          mutex_unlock(&con->lock);
> >>> }
> >>>
> >>> The console owner and waiter logic now only applies between contexts
> >>> that have taken the console_lock via console_trylock(). Threaded
> >>> printers never take the console_lock, so they do not have a
> >>> console_lock to handover. Tasks that have used console_lock() will
> >>> block the threaded printers using a mutex and if the console_lock
> >>> is handed over to an atomic context, it would be unable to unblock
> >>> the threaded printers. However, the console_trylock() case is
> >>> really the only scenario that is interesting for handovers anyway.
> >>>
> >>> @panic_console_dropped must change to atomic_t since it is no longer
> >>> protected exclusively by the console_lock.
> >>>
> >>> Since threaded printers remain asleep if they see that the console
> >>> is locked, they now must be explicitly woken in __console_unlock().
> >>> This means wake_up_klogd() calls following a console_unlock() are
> >>> no longer necessary and are removed.
> >>>
> >>> Also note that threaded printers no longer need to check
> >>> @console_suspended. The check for the @blocked field implicitly
> >>> covers the suspended console case.
> >>>
> >>> Signed-off-by: John Ogness <john.ogness@linutronix.de>
> >> Nice, it it better than v4. I am going to push this for linux-next.
> >>
> >> Reviewed-by: Petr Mladek <pmladek@suse.com>
> > JFYI, I have just pushed this patch instead of the one
> > from v4 into printk/linux.git, branch rework/kthreads.
> >
> > It means that this branch has been rebased. It will be
> > used in the next refresh of linux-next.
> 
> This patchset landed in linux next-20220426. In my tests I've found that 
> it causes deadlock on all my Amlogic Meson G12B/SM1 based boards: Odroid 
> C4/N2 and Khadas VIM3/VIM3l. The deadlock happens when system boots to 
> userspace and getty (with automated login) is executed. I even see the 
> bash prompt, but then the console is freezed. Reverting this patch 
> (e00cc0e1cbf4) on top of linux-next (together with 6b3d71e87892 to make 
> revert clean) fixes the issue.

Thanks a lot for the report!

Just by chance, do you have the log from the dead-locked boot stored
in userspace and can you share it? I mean the log stored in
/var/log/dmesg or journaltctl.

In the worst case, it might help to see log from the boot with
the reverted patch. I would help us to see the ordering of various
console-related operations on your system.

And regarding the console. Is it the graphics console (ttyX)
or a serial one (ttyS) or yet another one?

Best Regards,
Petr

_______________________________________________
linux-amlogic mailing list
linux-amlogic@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-amlogic

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH printk v5 1/1] printk: extend console_lock for per-console locking
  2022-04-27  7:38             ` Petr Mladek
@ 2022-04-27 11:44               ` Marek Szyprowski
  2022-04-27 16:15                 ` John Ogness
  0 siblings, 1 reply; 33+ messages in thread
From: Marek Szyprowski @ 2022-04-27 11:44 UTC (permalink / raw)
  To: Petr Mladek
  Cc: John Ogness, Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman, linux-amlogic

Hi,

On 27.04.2022 09:38, Petr Mladek wrote:
> On Wed 2022-04-27 09:08:33, Marek Szyprowski wrote:
>> On 26.04.2022 15:16, Petr Mladek wrote:
>>> On Tue 2022-04-26 14:07:42, Petr Mladek wrote:
>>>> On Mon 2022-04-25 23:04:28, John Ogness wrote:
>>>>> Currently threaded console printers synchronize against each
>>>>> other using console_lock(). However, different console drivers
>>>>> are unrelated and do not require any synchronization between
>>>>> each other. Removing the synchronization between the threaded
>>>>> console printers will allow each console to print at its own
>>>>> speed.
>>>>>
>>>>> But the threaded consoles printers do still need to synchronize
>>>>> against console_lock() callers. Introduce a per-console mutex
>>>>> and a new console boolean field @blocked to provide this
>>>>> synchronization.
>>>>>
>>>>> console_lock() is modified so that it must acquire the mutex
>>>>> of each console in order to set the @blocked field. Console
>>>>> printing threads will acquire their mutex while printing a
>>>>> record. If @blocked was set, the thread will go back to sleep
>>>>> instead of printing.
>>>>>
>>>>> The reason for the @blocked boolean field is so that
>>>>> console_lock() callers do not need to acquire multiple console
>>>>> mutexes simultaneously, which would introduce unnecessary
>>>>> complexity due to nested mutex locking. Also, a new field
>>>>> was chosen instead of adding a new @flags value so that the
>>>>> blocked status could be checked without concern of reading
>>>>> inconsistent values due to @flags updates from other contexts.
>>>>>
>>>>> Threaded console printers also need to synchronize against
>>>>> console_trylock() callers. Since console_trylock() may be
>>>>> called from any context, the per-console mutex cannot be used
>>>>> for this synchronization. (mutex_trylock() cannot be called
>>>>> from atomic contexts.) Introduce a global atomic counter to
>>>>> identify if any threaded printers are active. The threaded
>>>>> printers will also check the atomic counter to identify if the
>>>>> console has been locked by another task via console_trylock().
>>>>>
>>>>> Note that @console_sem is still used to provide synchronization
>>>>> between console_lock() and console_trylock() callers.
>>>>>
>>>>> A locking overview for console_lock(), console_trylock(), and the
>>>>> threaded printers is as follows (pseudo code):
>>>>>
>>>>> console_lock()
>>>>> {
>>>>>           down(&console_sem);
>>>>>           for_each_console(con) {
>>>>>                   mutex_lock(&con->lock);
>>>>>                   con->blocked = true;
>>>>>                   mutex_unlock(&con->lock);
>>>>>           }
>>>>>           /* console_lock acquired */
>>>>> }
>>>>>
>>>>> console_trylock()
>>>>> {
>>>>>           if (down_trylock(&console_sem) == 0) {
>>>>>                   if (atomic_cmpxchg(&console_kthreads_active, 0, -1) == 0) {
>>>>>                           /* console_lock acquired */
>>>>>                   }
>>>>>           }
>>>>> }
>>>>>
>>>>> threaded_printer()
>>>>> {
>>>>>           mutex_lock(&con->lock);
>>>>>           if (!con->blocked) {
>>>>> 		/* console_lock() callers blocked */
>>>>>
>>>>>                   if (atomic_inc_unless_negative(&console_kthreads_active)) {
>>>>>                           /* console_trylock() callers blocked */
>>>>>
>>>>>                           con->write();
>>>>>
>>>>>                           atomic_dec(&console_lock_count);
>>>>>                   }
>>>>>           }
>>>>>           mutex_unlock(&con->lock);
>>>>> }
>>>>>
>>>>> The console owner and waiter logic now only applies between contexts
>>>>> that have taken the console_lock via console_trylock(). Threaded
>>>>> printers never take the console_lock, so they do not have a
>>>>> console_lock to handover. Tasks that have used console_lock() will
>>>>> block the threaded printers using a mutex and if the console_lock
>>>>> is handed over to an atomic context, it would be unable to unblock
>>>>> the threaded printers. However, the console_trylock() case is
>>>>> really the only scenario that is interesting for handovers anyway.
>>>>>
>>>>> @panic_console_dropped must change to atomic_t since it is no longer
>>>>> protected exclusively by the console_lock.
>>>>>
>>>>> Since threaded printers remain asleep if they see that the console
>>>>> is locked, they now must be explicitly woken in __console_unlock().
>>>>> This means wake_up_klogd() calls following a console_unlock() are
>>>>> no longer necessary and are removed.
>>>>>
>>>>> Also note that threaded printers no longer need to check
>>>>> @console_suspended. The check for the @blocked field implicitly
>>>>> covers the suspended console case.
>>>>>
>>>>> Signed-off-by: John Ogness <john.ogness@linutronix.de>
>>>> Nice, it it better than v4. I am going to push this for linux-next.
>>>>
>>>> Reviewed-by: Petr Mladek <pmladek@suse.com>
>>> JFYI, I have just pushed this patch instead of the one
>>> from v4 into printk/linux.git, branch rework/kthreads.
>>>
>>> It means that this branch has been rebased. It will be
>>> used in the next refresh of linux-next.
>> This patchset landed in linux next-20220426. In my tests I've found that
>> it causes deadlock on all my Amlogic Meson G12B/SM1 based boards: Odroid
>> C4/N2 and Khadas VIM3/VIM3l. The deadlock happens when system boots to
>> userspace and getty (with automated login) is executed. I even see the
>> bash prompt, but then the console is freezed. Reverting this patch
>> (e00cc0e1cbf4) on top of linux-next (together with 6b3d71e87892 to make
>> revert clean) fixes the issue.
> Thanks a lot for the report!
>
> Just by chance, do you have the log from the dead-locked boot stored
> in userspace and can you share it? I mean the log stored in
> /var/log/dmesg or journaltctl.

If there would be any messages, I expect them to be visible on the 
serial kernel console.

> In the worst case, it might help to see log from the boot with
> the reverted patch. I would help us to see the ordering of various
> console-related operations on your system.
>
> And regarding the console. Is it the graphics console (ttyX)
> or a serial one (ttyS) or yet another one?

Serial console, /dev/ttyAML0 with kernel console enabled. Later a DRM 
driver is loaded (meson_drm), which initializes its fbdev emulation with 
its console.

However it looks that I've trusted automatic bisect a bit too much and 
had a bit of luck while checking the reverts on top of linux-next. The 
issue is not 100% reproducible, so I've did this bisection again 
manually with more tries. The real commit causing the issue is 
09c5ba0aa2fc ("printk: add kthread console printers"). Reverting the 
following 3 commits 6b3d71e878920b085dd823bc422951bb6f143505, 
e00cc0e1cbf4ea5a63d66c8de8d79519855fb231 and 
09c5ba0aa2fcfdadb17d045c3ee6f86d69270df7 on top of linux-next makes the 
system fully operational again.

I've also tried to disable the DRM driver and its fbdev and console (by 
adding modprobe.blacklist=meson_drm to kernel cmdline), but this didn't 
help. Here is the full serial console log:

https://pastebin.com/E5CDH88L

If there is anything you would like me to try, let me know.


Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


_______________________________________________
linux-amlogic mailing list
linux-amlogic@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-amlogic

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH printk v5 1/1] printk: extend console_lock for per-console locking
  2022-04-27 11:44               ` Marek Szyprowski
@ 2022-04-27 16:15                 ` John Ogness
  2022-04-27 16:48                   ` Petr Mladek
                                     ` (2 more replies)
  0 siblings, 3 replies; 33+ messages in thread
From: John Ogness @ 2022-04-27 16:15 UTC (permalink / raw)
  To: Marek Szyprowski, Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel,
	Greg Kroah-Hartman, linux-amlogic

Hi Marek,

On 2022-04-27, Marek Szyprowski <m.szyprowski@samsung.com> wrote:
> Here is the full serial console log:
>
> https://pastebin.com/E5CDH88L

Here are a few ideas from me:

1. For next-20220427 the printk-threaded series was slightly changed. I
do not expect it to work any different, but I would prefer we are
debugging the current version. If possible, could you move to
next-20220427?

2. I noticed you boot with the kernel boot arguments "earlycon" and
"no_console_suspend". Could you try booting without this? I expect this
will make no difference.

3. It looks like the problem happens quite late in the boot process. I
expect it is due to some userspace process that is running that is
interacting with printk (either /dev/kmsg or /proc/kmsg) and is causing
problems. If you boot with init=/bin/sh then I expect the system is
running fine. (You don't have much of a system running, but it should
not hang.) We need to isolate which userspace process is triggering the
issue.

4. Have you tried issuing magic sysrq commands on the serial line? (For
example, sending a break signal and then the letter 't' or sending a
break signal and then the letter 'c'?) That might trigger various dumps
so that we can see the system state.

5. You are not running a VT console, so the graphics driver should not
be affecting the printk subsystem at all. I expect your autologin is
also starting various services and programs. If you disable the
automatic login and instead manually login (perhaps as another user) can
you manually start those services one at a time to see at what point the
system hangs?

Thanks for you help with this!

John Ogness

_______________________________________________
linux-amlogic mailing list
linux-amlogic@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-amlogic

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH printk v5 1/1] printk: extend console_lock for per-console locking
  2022-04-27 16:15                 ` John Ogness
@ 2022-04-27 16:48                   ` Petr Mladek
  2022-04-28 14:54                   ` Petr Mladek
  2022-04-29 13:53                   ` Marek Szyprowski
  2 siblings, 0 replies; 33+ messages in thread
From: Petr Mladek @ 2022-04-27 16:48 UTC (permalink / raw)
  To: John Ogness
  Cc: Marek Szyprowski, Sergey Senozhatsky, Steven Rostedt,
	Thomas Gleixner, linux-kernel, Greg Kroah-Hartman, linux-amlogic

On Wed 2022-04-27 18:21:16, John Ogness wrote:
> Hi Marek,
> 
> On 2022-04-27, Marek Szyprowski <m.szyprowski@samsung.com> wrote:
> > Here is the full serial console log:
> >
> > https://pastebin.com/E5CDH88L
> 
> Here are a few ideas from me:
> 
> 1. For next-20220427 the printk-threaded series was slightly changed. I
> do not expect it to work any different, but I would prefer we are
> debugging the current version. If possible, could you move to
> next-20220427?
> 
> 2. I noticed you boot with the kernel boot arguments "earlycon" and
> "no_console_suspend". Could you try booting without this? I expect this
> will make no difference.
> 
> 3. It looks like the problem happens quite late in the boot process. I
> expect it is due to some userspace process that is running that is
> interacting with printk (either /dev/kmsg or /proc/kmsg) and is causing
> problems. If you boot with init=/bin/sh then I expect the system is
> running fine. (You don't have much of a system running, but it should
> not hang.) We need to isolate which userspace process is triggering the
> issue.

Interesting idea.


> 4. Have you tried issuing magic sysrq commands on the serial line? (For
> example, sending a break signal and then the letter 't' or sending a
> break signal and then the letter 'c'?) That might trigger various dumps
> so that we can see the system state.

I see that sshd is started. If you are able to connect the system
with the frozen login via ssh then it might be easier to trigger
sysrq via procfs, for example:

  #> echo t >/proc/sysrq-trigger

"sysrq t" should print state of all processes. It might show what process
is hanging and where.


> 5. You are not running a VT console, so the graphics driver should not
> be affecting the printk subsystem at all. I expect your autologin is
> also starting various services and programs. If you disable the
> automatic login and instead manually login (perhaps as another user) can
> you manually start those services one at a time to see at what point the
> system hangs?

Yeah, I am not able to reproduce it and some more clues would help.

Best Regards,
Petr

_______________________________________________
linux-amlogic mailing list
linux-amlogic@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-amlogic

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH printk v5 1/1] printk: extend console_lock for per-console locking
  2022-04-27 16:15                 ` John Ogness
  2022-04-27 16:48                   ` Petr Mladek
@ 2022-04-28 14:54                   ` Petr Mladek
  2022-04-29 13:53                   ` Marek Szyprowski
  2 siblings, 0 replies; 33+ messages in thread
From: Petr Mladek @ 2022-04-28 14:54 UTC (permalink / raw)
  To: John Ogness
  Cc: Marek Szyprowski, Sergey Senozhatsky, Steven Rostedt,
	Thomas Gleixner, linux-kernel, Greg Kroah-Hartman, linux-amlogic

On Wed 2022-04-27 18:21:16, John Ogness wrote:
> Hi Marek,
> 
> On 2022-04-27, Marek Szyprowski <m.szyprowski@samsung.com> wrote:
> > Here is the full serial console log:
> >
> > https://pastebin.com/E5CDH88L
> 
> Here are a few ideas from me:
> 
> 3. It looks like the problem happens quite late in the boot process. I
> expect it is due to some userspace process that is running that is
> interacting with printk (either /dev/kmsg or /proc/kmsg) and is causing
> problems.

I did not find an real issue in the code handling /dev/kmsg,
/proc/kmsg, or syslog sycall API. There might be just few
small changes:

    1. There is an increased number of spurious wakeups because
       log_wait is shared between upstream readers and printk kthreads.
       And we newly wake up waiters from both vprintk_emit()
       and __console_unlock() code paths.

       It might affect especially the pooling APIs: kmsg_pool(),
       devkmsg_pool()). They might return 0 more often than before.


    2. 4th patch replaced wake_up_interruptible(&log_wait) with
       wake_up_interruptible_all(&log_wait). As a result, all
       readers are woken at the same time.

       It is perfectly fine because the log buffer is lockless.
       And all readers should be either independent or synchronized
       against each other.


Any of the above changes should not introduce new problems. But
they might make some old problem (race) more visible.

I spent quite some time reviewing the code and testing. But I neither
see any problem nor I am able to reproduce it. Some more clues
from Marek would be needed.

Best Regards,
Petr

_______________________________________________
linux-amlogic mailing list
linux-amlogic@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-amlogic

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH printk v5 1/1] printk: extend console_lock for per-console locking
  2022-04-27 16:15                 ` John Ogness
  2022-04-27 16:48                   ` Petr Mladek
  2022-04-28 14:54                   ` Petr Mladek
@ 2022-04-29 13:53                   ` Marek Szyprowski
  2022-04-30 16:00                     ` John Ogness
  2 siblings, 1 reply; 33+ messages in thread
From: Marek Szyprowski @ 2022-04-29 13:53 UTC (permalink / raw)
  To: John Ogness, Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel,
	Greg Kroah-Hartman, linux-amlogic

Hi John,

On 27.04.2022 18:15, John Ogness wrote:
> On 2022-04-27, Marek Szyprowski <m.szyprowski@samsung.com> wrote:
>> Here is the full serial console log:
>>
>> https://protect2.fireeye.com/v1/url?k=087c101e-57e728e3-087d9b51-000babff317b-69d8576a8b9d481f&q=1&e=5f72c413-9d23-4e64-98e4-377fcc2038de&u=https%3A%2F%2Fpastebin.com%2FE5CDH88L
> Here are a few ideas from me:
>
> 1. For next-20220427 the printk-threaded series was slightly changed. I
> do not expect it to work any different, but I would prefer we are
> debugging the current version. If possible, could you move to
> next-20220427?

I've moved to next-20220429. Nothing changed compared to next-20220427.


> 2. I noticed you boot with the kernel boot arguments "earlycon" and
> "no_console_suspend". Could you try booting without this? I expect this
> will make no difference.

Well, nothing changed.


> 3. It looks like the problem happens quite late in the boot process. I
> expect it is due to some userspace process that is running that is
> interacting with printk (either /dev/kmsg or /proc/kmsg) and is causing
> problems. If you boot with init=/bin/sh then I expect the system is
> running fine. (You don't have much of a system running, but it should
> not hang.) We need to isolate which userspace process is triggering the
> issue.

The same issue happens if I boot with init=/bin/bash


> 4. Have you tried issuing magic sysrq commands on the serial line? (For
> example, sending a break signal and then the letter 't' or sending a
> break signal and then the letter 'c'?) That might trigger various dumps
> so that we can see the system state.
>
> 5. You are not running a VT console, so the graphics driver should not
> be affecting the printk subsystem at all. I expect your autologin is
> also starting various services and programs. If you disable the
> automatic login and instead manually login (perhaps as another user) can
> you manually start those services one at a time to see at what point the
> system hangs?
>
> Thanks for you help with this!

I found something really interesting. When lockup happens, I'm still 
able to log via ssh and trigger any magic sysrq action via 
/proc/sysrq-trigger (triggering it from UART console via break doesn't 
work).

It turned out that the UART console is somehow blocked, but it receives 
and buffers all the input. For example after issuing "echo 
 >/proc/sysrq-trigger" from the ssh console, the UART console has been 
updated and I see the magic sysrq banner and then all the commands I 
blindly typed in the UART console! However this doesn't unblock the console.

Here is the output of 't' magic sys request:

https://pastebin.com/fjbRuy4f

If you have any more suggestion what to check let me know.

This issue must be somehow related to the way the UART driver works on 
the Amlogic Meson boards. The other boards based on different SoCs 
(Exynos, QCOM, BCM) I have in my test farm (with the same userspace and 
configuration) work fine with those patches.

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


_______________________________________________
linux-amlogic mailing list
linux-amlogic@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-amlogic

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH printk v5 1/1] printk: extend console_lock for per-console locking
  2022-04-29 13:53                   ` Marek Szyprowski
@ 2022-04-30 16:00                     ` John Ogness
  2022-05-02  9:19                       ` Marek Szyprowski
  0 siblings, 1 reply; 33+ messages in thread
From: John Ogness @ 2022-04-30 16:00 UTC (permalink / raw)
  To: Marek Szyprowski, Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel,
	Greg Kroah-Hartman, linux-amlogic

On 2022-04-29, Marek Szyprowski <m.szyprowski@samsung.com> wrote:
> The same issue happens if I boot with init=/bin/bash

Very interesting. Since you are seeing all the output up until you try
doing something, I guess receiving UART data is triggering the issue.

> I found something really interesting. When lockup happens, I'm still
> able to log via ssh and trigger any magic sysrq action via
> /proc/sysrq-trigger.

If you boot the system and directly login via ssh (without sending any
data via serial), can you trigger printk output on the UART? For
example, with:

    echo hello > /dev/kmsg

(You might need to increase your loglevel to see it.)

> It turned out that the UART console is somehow blocked, but it
> receives and buffers all the input. For example after issuing "echo
>  >/proc/sysrq-trigger" from the ssh console, the UART console has been 
> updated and I see the magic sysrq banner and then all the commands I 
> blindly typed in the UART console! However this doesn't unblock the
> console.

sysrq falls back to direct printing. This would imply that the kthread
printer is somehow unable to print.

> Here is the output of 't' magic sys request:
>
> https://pastebin.com/fjbRuy4f

It looks like the call trace for the printing kthread (pr/ttyAML0) is
corrupt.

Could you post your kernel config?

John

_______________________________________________
linux-amlogic mailing list
linux-amlogic@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-amlogic

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH printk v5 1/1] printk: extend console_lock for per-console locking
  2022-04-30 16:00                     ` John Ogness
@ 2022-05-02  9:19                       ` Marek Szyprowski
  2022-05-02 13:11                         ` John Ogness
  2022-05-02 13:17                         ` Petr Mladek
  0 siblings, 2 replies; 33+ messages in thread
From: Marek Szyprowski @ 2022-05-02  9:19 UTC (permalink / raw)
  To: John Ogness, Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel,
	Greg Kroah-Hartman, linux-amlogic

Hi John,

On 30.04.2022 18:00, John Ogness wrote:
> On 2022-04-29, Marek Szyprowski <m.szyprowski@samsung.com> wrote:
>> The same issue happens if I boot with init=/bin/bash
> Very interesting. Since you are seeing all the output up until you try
> doing something, I guess receiving UART data is triggering the issue.

Right, this is how it looks like.

>> I found something really interesting. When lockup happens, I'm still
>> able to log via ssh and trigger any magic sysrq action via
>> /proc/sysrq-trigger.
> If you boot the system and directly login via ssh (without sending any
> data via serial), can you trigger printk output on the UART? For
> example, with:
>
>      echo hello > /dev/kmsg
>
> (You might need to increase your loglevel to see it.)

Data written to /dev/kmsg and all kernel logs were always displayed 
correctly. Also data written directly to /dev/ttyAML0 is displayed 
properly on the console. The latter doesn't however trigger the input 
related activity.

It looks that the data read from the uart is delivered only if other 
activity happens on the kernel console. If I type 'reboot' and press 
enter, nothing happens immediately. If I type 'date >/dev/ttyAML0' via 
ssh then, I only see the date printed on the console. However if I type 
'date >/dev/kmsg', the the date is printed and reboot happens.


>> It turned out that the UART console is somehow blocked, but it
>> receives and buffers all the input. For example after issuing "echo
>>   >/proc/sysrq-trigger" from the ssh console, the UART console has been
>> updated and I see the magic sysrq banner and then all the commands I
>> blindly typed in the UART console! However this doesn't unblock the
>> console.
> sysrq falls back to direct printing. This would imply that the kthread
> printer is somehow unable to print.
>
>> Here is the output of 't' magic sys request:
>>
>> https://protect2.fireeye.com/v1/url?k=8649f24d-e73258c4-86487902-74fe48600034-a2ca6bb18361467d&q=1&e=1bc4226f-a422-42b9-95e8-128845b8609f&u=https%3A%2F%2Fpastebin.com%2FfjbRuy4f
> It looks like the call trace for the printing kthread (pr/ttyAML0) is
> corrupt.

Right, good catch. For comparison, here is a 't' sysrq result from the 
'working' serial console (next-20220429), which happens usually 1 of 4 
boots:

https://pastebin.com/mp8zGFbW


> Could you post your kernel config?

https://pastebin.com/GUWGdCHX

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


_______________________________________________
linux-amlogic mailing list
linux-amlogic@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-amlogic

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH printk v5 1/1] printk: extend console_lock for per-console locking
  2022-05-02  9:19                       ` Marek Szyprowski
@ 2022-05-02 13:11                         ` John Ogness
  2022-05-02 22:29                           ` Marek Szyprowski
  2022-06-08 15:10                           ` Geert Uytterhoeven
  2022-05-02 13:17                         ` Petr Mladek
  1 sibling, 2 replies; 33+ messages in thread
From: John Ogness @ 2022-05-02 13:11 UTC (permalink / raw)
  To: Marek Szyprowski, Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel,
	Greg Kroah-Hartman, linux-amlogic

On 2022-05-02, Marek Szyprowski <m.szyprowski@samsung.com> wrote:
> Data written to /dev/kmsg and all kernel logs were always displayed
> correctly. Also data written directly to /dev/ttyAML0 is displayed
> properly on the console. The latter doesn't however trigger the input
> related activity.
>
> It looks that the data read from the uart is delivered only if other
> activity happens on the kernel console. If I type 'reboot' and press
> enter, nothing happens immediately. If I type 'date >/dev/ttyAML0' via
> ssh then, I only see the date printed on the console. However if I
> type 'date >/dev/kmsg', the the date is printed and reboot happens.

I suppose if you login via ssh and check /proc/interrupts, then type
some things over serial, then check /proc/interrupts again, you will see
there have been no interrupts for the uart. But interrupts for other
devices are happening. Is this correct?

> For comparison, here is a 't' sysrq result from the 'working' serial
> console (next-20220429), which happens usually 1 of 4 boots:
>
> https://pastebin.com/mp8zGFbW

This still looks odd to me. We should be seeing a trace originating from
ret_from_fork+0x10/0x20 and kthread+0x118/0x11c.

I wonder if the early creation of the thread is somehow causing
problems. Could you try the following patch to see if it makes a
difference? I would also like to see the sysrq-t output with this patch
applied:

---------------- BEGIN PATCH ---------------
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 2311a0ad584a..c4362d25de22 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -3837,7 +3837,7 @@ static int __init printk_activate_kthreads(void)
 
 	return 0;
 }
-early_initcall(printk_activate_kthreads);
+late_initcall(printk_activate_kthreads);
 
 #if defined CONFIG_PRINTK
 /* If @con is specified, only wait for that console. Otherwise wait for all. */
---------------- END PATCH ---------------

Thanks for your help with this!

John

_______________________________________________
linux-amlogic mailing list
linux-amlogic@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-amlogic

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH printk v5 1/1] printk: extend console_lock for per-console locking
  2022-05-02  9:19                       ` Marek Szyprowski
  2022-05-02 13:11                         ` John Ogness
@ 2022-05-02 13:17                         ` Petr Mladek
  2022-05-02 23:13                           ` Marek Szyprowski
  1 sibling, 1 reply; 33+ messages in thread
From: Petr Mladek @ 2022-05-02 13:17 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: John Ogness, Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman, linux-amlogic

On Mon 2022-05-02 11:19:07, Marek Szyprowski wrote:
> Hi John,
> 
> On 30.04.2022 18:00, John Ogness wrote:
> > On 2022-04-29, Marek Szyprowski <m.szyprowski@samsung.com> wrote:
> >> The same issue happens if I boot with init=/bin/bash
> > Very interesting. Since you are seeing all the output up until you try
> > doing something, I guess receiving UART data is triggering the issue.
> 
> Right, this is how it looks like.
> 
> >> I found something really interesting. When lockup happens, I'm still
> >> able to log via ssh and trigger any magic sysrq action via
> >> /proc/sysrq-trigger.
> > If you boot the system and directly login via ssh (without sending any
> > data via serial), can you trigger printk output on the UART? For
> > example, with:
> >
> >      echo hello > /dev/kmsg
> >
> > (You might need to increase your loglevel to see it.)
> 
> Data written to /dev/kmsg and all kernel logs were always displayed 
> correctly. Also data written directly to /dev/ttyAML0 is displayed 
> properly on the console. The latter doesn't however trigger the input 
> related activity.
> 
> It looks that the data read from the uart is delivered only if other 
> activity happens on the kernel console. If I type 'reboot' and press 
> enter, nothing happens immediately. If I type 'date >/dev/ttyAML0' via 
> ssh then, I only see the date printed on the console. However if I type 
> 'date >/dev/kmsg', the the date is printed and reboot happens.

This is really interesting.

'date >/dev/kmsg' should be handled like a normal printk().
It should get pushed to the console using printk kthread,
that calls call_console_driver() that calls con->write()
callback. In our case, it should be meson_serial_console_write().

I am not sure if meson_serial_console_write() is used also
when writing via /dev/ttyAML0.

> 
> >> It turned out that the UART console is somehow blocked, but it
> >> receives and buffers all the input. For example after issuing "echo
> >>   >/proc/sysrq-trigger" from the ssh console, the UART console has been
> >> updated and I see the magic sysrq banner and then all the commands I
> >> blindly typed in the UART console! However this doesn't unblock the
> >> console.
> > sysrq falls back to direct printing. This would imply that the kthread
> > printer is somehow unable to print.
> >
> >> Here is the output of 't' magic sys request:
> >>
> >> https://protect2.fireeye.com/v1/url?k=8649f24d-e73258c4-86487902-74fe48600034-a2ca6bb18361467d&q=1&e=1bc4226f-a422-42b9-95e8-128845b8609f&u=https%3A%2F%2Fpastebin.com%2FfjbRuy4f
> > It looks like the call trace for the printing kthread (pr/ttyAML0) is
> > corrupt.
> 
> Right, good catch. For comparison, here is a 't' sysrq result from the 
> 'working' serial console (next-20220429), which happens usually 1 of 4 
> boots:
> 
> https://pastebin.com/mp8zGFbW

Strange. The backtrace is weird here too:

[   50.514509] task:pr/ttyAML0      state:R  running task     stack:    0 pid:   65 ppid:     2 flags:0x00000008
[   50.514540] Call trace:
[   50.514548]  __switch_to+0xe8/0x160
[   50.514561]  meson_serial_console+0x78/0x118

There should be kthread() and printk_kthread_func() on the stack.

Hmm,  meson_serial_console+0x78/0x118 is weird on its own.
meson_serial_console is the name of the structure. I would
expect a name of the .write callback, like
meson_serial_console_write()

Best Regards,
Petr

_______________________________________________
linux-amlogic mailing list
linux-amlogic@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-amlogic

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH printk v5 1/1] printk: extend console_lock for per-console locking
  2022-05-02 13:11                         ` John Ogness
@ 2022-05-02 22:29                           ` Marek Szyprowski
  2022-05-04  5:56                             ` John Ogness
  2022-06-08 15:10                           ` Geert Uytterhoeven
  1 sibling, 1 reply; 33+ messages in thread
From: Marek Szyprowski @ 2022-05-02 22:29 UTC (permalink / raw)
  To: John Ogness, Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel,
	Greg Kroah-Hartman, linux-amlogic

Hi John,

On 02.05.2022 15:11, John Ogness wrote:
> On 2022-05-02, Marek Szyprowski <m.szyprowski@samsung.com> wrote:
>> Data written to /dev/kmsg and all kernel logs were always displayed
>> correctly. Also data written directly to /dev/ttyAML0 is displayed
>> properly on the console. The latter doesn't however trigger the input
>> related activity.
>>
>> It looks that the data read from the uart is delivered only if other
>> activity happens on the kernel console. If I type 'reboot' and press
>> enter, nothing happens immediately. If I type 'date >/dev/ttyAML0' via
>> ssh then, I only see the date printed on the console. However if I
>> type 'date >/dev/kmsg', the the date is printed and reboot happens.
> I suppose if you login via ssh and check /proc/interrupts, then type
> some things over serial, then check /proc/interrupts again, you will see
> there have been no interrupts for the uart. But interrupts for other
> devices are happening. Is this correct?

Right. The counter for ttyAML0 is not increased when lockup happens and 
I type something to the uart console.

>> For comparison, here is a 't' sysrq result from the 'working' serial
>> console (next-20220429), which happens usually 1 of 4 boots:
>>
>> https://protect2.fireeye.com/v1/url?k=3ef0fd63-5f7be855-3ef1762c-000babff9b5d-2e40dc5adc30a14c&q=1&e=1469838f-8586-403e-bd4d-922675d8b658&u=https%3A%2F%2Fpastebin.com%2Fmp8zGFbW
> This still looks odd to me. We should be seeing a trace originating from
> ret_from_fork+0x10/0x20 and kthread+0x118/0x11c.
>
> I wonder if the early creation of the thread is somehow causing
> problems. Could you try the following patch to see if it makes a
> difference? I would also like to see the sysrq-t output with this patch
> applied:
>
> ---------------- BEGIN PATCH ---------------
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 2311a0ad584a..c4362d25de22 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -3837,7 +3837,7 @@ static int __init printk_activate_kthreads(void)
>   
>   	return 0;
>   }
> -early_initcall(printk_activate_kthreads);
> +late_initcall(printk_activate_kthreads);
>   
>   #if defined CONFIG_PRINTK
>   /* If @con is specified, only wait for that console. Otherwise wait for all. */
> ---------------- END PATCH ---------------
>
> Thanks for your help with this!

Well, nothing has changed. The lockup still happens.

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


_______________________________________________
linux-amlogic mailing list
linux-amlogic@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-amlogic

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH printk v5 1/1] printk: extend console_lock for per-console locking
  2022-05-02 13:17                         ` Petr Mladek
@ 2022-05-02 23:13                           ` Marek Szyprowski
  2022-05-03  6:49                             ` Petr Mladek
  2022-05-04 21:11                             ` John Ogness
  0 siblings, 2 replies; 33+ messages in thread
From: Marek Szyprowski @ 2022-05-02 23:13 UTC (permalink / raw)
  To: Petr Mladek
  Cc: John Ogness, Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman, linux-amlogic

Hi Petr,

On 02.05.2022 15:17, Petr Mladek wrote:
> On Mon 2022-05-02 11:19:07, Marek Szyprowski wrote:
>> On 30.04.2022 18:00, John Ogness wrote:
>>> On 2022-04-29, Marek Szyprowski <m.szyprowski@samsung.com> wrote:
>>>> The same issue happens if I boot with init=/bin/bash
>>> Very interesting. Since you are seeing all the output up until you try
>>> doing something, I guess receiving UART data is triggering the issue.
>> Right, this is how it looks like.
>>
>>>> I found something really interesting. When lockup happens, I'm still
>>>> able to log via ssh and trigger any magic sysrq action via
>>>> /proc/sysrq-trigger.
>>> If you boot the system and directly login via ssh (without sending any
>>> data via serial), can you trigger printk output on the UART? For
>>> example, with:
>>>
>>>       echo hello > /dev/kmsg
>>>
>>> (You might need to increase your loglevel to see it.)
>> Data written to /dev/kmsg and all kernel logs were always displayed
>> correctly. Also data written directly to /dev/ttyAML0 is displayed
>> properly on the console. The latter doesn't however trigger the input
>> related activity.
>>
>> It looks that the data read from the uart is delivered only if other
>> activity happens on the kernel console. If I type 'reboot' and press
>> enter, nothing happens immediately. If I type 'date >/dev/ttyAML0' via
>> ssh then, I only see the date printed on the console. However if I type
>> 'date >/dev/kmsg', the the date is printed and reboot happens.
> This is really interesting.
>
> 'date >/dev/kmsg' should be handled like a normal printk().
> It should get pushed to the console using printk kthread,
> that calls call_console_driver() that calls con->write()
> callback. In our case, it should be meson_serial_console_write().
>
> I am not sure if meson_serial_console_write() is used also
> when writing via /dev/ttyAML0.
>
>>>> It turned out that the UART console is somehow blocked, but it
>>>> receives and buffers all the input. For example after issuing "echo
>>>>    >/proc/sysrq-trigger" from the ssh console, the UART console has been
>>>> updated and I see the magic sysrq banner and then all the commands I
>>>> blindly typed in the UART console! However this doesn't unblock the
>>>> console.
>>> sysrq falls back to direct printing. This would imply that the kthread
>>> printer is somehow unable to print.
>>>
>>>> Here is the output of 't' magic sys request:
>>>>
>>>> https://protect2.fireeye.com/v1/url?k=8649f24d-e73258c4-86487902-74fe48600034-a2ca6bb18361467d&q=1&e=1bc4226f-a422-42b9-95e8-128845b8609f&u=https%3A%2F%2Fpastebin.com%2FfjbRuy4f
>>> It looks like the call trace for the printing kthread (pr/ttyAML0) is
>>> corrupt.
>> Right, good catch. For comparison, here is a 't' sysrq result from the
>> 'working' serial console (next-20220429), which happens usually 1 of 4
>> boots:
>>
>> https://protect2.fireeye.com/v1/url?k=610106f1-008a13b6-61008dbe-000babff99aa-11083c39c44861df&q=1&e=2eafad9e-c5d2-4696-9d78-f3b5885256dc&u=https%3A%2F%2Fpastebin.com%2Fmp8zGFbW
> Strange. The backtrace is weird here too:
>
> [   50.514509] task:pr/ttyAML0      state:R  running task     stack:    0 pid:   65 ppid:     2 flags:0x00000008
> [   50.514540] Call trace:
> [   50.514548]  __switch_to+0xe8/0x160
> [   50.514561]  meson_serial_console+0x78/0x118
>
> There should be kthread() and printk_kthread_func() on the stack.
>
> Hmm,  meson_serial_console+0x78/0x118 is weird on its own.
> meson_serial_console is the name of the structure. I would
> expect a name of the .write callback, like
> meson_serial_console_write()

Well, this sounds a bit like a stack corruption. For the reference, I've 
checked what magic sysrq 't' reports for my other test boards:

RaspberryPi4:

[  166.702431] task:pr/ttyS0        state:R stack:    0 pid:   64 
ppid:     2 flags:0x00000008
[  166.711069] Call trace:
[  166.713647]  __switch_to+0xe8/0x160
[  166.717216]  __schedule+0x2f4/0x9f0
[  166.720862]  log_wait+0x0/0x50
[  166.724081] task:vfio-irqfd-clea state:I stack:    0 pid:   65 
ppid:     2 flags:0x00000008
[  166.732698] Call trace:


ARM Juno R1:

[   74.356562] task:pr/ttyAMA0      state:R  running task stack:    0 
pid:   47 ppid:     2 flags:0x00000008
[   74.356605] Call trace:
[   74.356617]  __switch_to+0xe8/0x160
[   74.356637]  amba_console+0x78/0x118
[   74.356657] task:kworker/2:1     state:I stack:    0 pid:   48 
ppid:     2 flags:0x00000008
[   74.356695] Workqueue:  0x0 (mm_percpu_wq)
[   74.356738] Call trace:


QEMU virt/arm64:

[  174.155760] task:pr/ttyAMA0      state:S stack:    0 pid:   26 
ppid:     2 flags:0x00000008
[  174.156305] Call trace:
[  174.156529]  __switch_to+0xe8/0x160
[  174.157131]  0xffff5ebbbfdd62d8


In the last case it doesn't happen always. In the other runs I got the 
following log from QEMU virt/arm64:

[  200.537579] task:pr/ttyAMA0      state:S stack:    0 pid:   26 
ppid:     2 flags:0x00000008
[  200.538121] Call trace:
[  200.538341]  __switch_to+0xe8/0x160
[  200.538583]  __schedule+0x2f4/0x9f0
[  200.538822]  schedule+0x54/0xd0
[  200.539047]  printk_kthread_func+0x2d8/0x3bc
[  200.539301]  kthread+0x118/0x11c
[  200.539523]  ret_from_fork+0x10/0x20


I hope that at least the qemu case will let you to analyze it by 
yourself. I run my test system with the following command:

qemu-system-aarch64 -kernel virt/Image -append "console=ttyAMA0 
root=/dev/vda rootwait" -M virt -cpu cortex-a57 -smp 2 -m 1024 -device 
virtio-blk-device,drive=virtio-blk0 -drive 
file=qemu-virt-rootfs.raw,id=virtio-blk0,if=none,format=raw -netdev 
user,id=user -device virtio-net-device,netdev=user -display none


Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


_______________________________________________
linux-amlogic mailing list
linux-amlogic@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-amlogic

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH printk v5 1/1] printk: extend console_lock for per-console locking
  2022-05-02 23:13                           ` Marek Szyprowski
@ 2022-05-03  6:49                             ` Petr Mladek
  2022-05-04  6:05                               ` Marek Szyprowski
  2022-05-04 21:11                             ` John Ogness
  1 sibling, 1 reply; 33+ messages in thread
From: Petr Mladek @ 2022-05-03  6:49 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: John Ogness, Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman, linux-amlogic

On Tue 2022-05-03 01:13:19, Marek Szyprowski wrote:
> Hi Petr,
> 
> On 02.05.2022 15:17, Petr Mladek wrote:
> > On Mon 2022-05-02 11:19:07, Marek Szyprowski wrote:
> >> On 30.04.2022 18:00, John Ogness wrote:
> >>> On 2022-04-29, Marek Szyprowski <m.szyprowski@samsung.com> wrote:
> >>>> The same issue happens if I boot with init=/bin/bash
> >>> Very interesting. Since you are seeing all the output up until you try
> >>> doing something, I guess receiving UART data is triggering the issue.
> >> Right, this is how it looks like.
> >>
> >>>> I found something really interesting. When lockup happens, I'm still
> >>>> able to log via ssh and trigger any magic sysrq action via
> >>>> /proc/sysrq-trigger.
> >>> If you boot the system and directly login via ssh (without sending any
> >>> data via serial), can you trigger printk output on the UART? For
> >>> example, with:
> >>>
> >>>       echo hello > /dev/kmsg
> >>>
> >>> (You might need to increase your loglevel to see it.)
> >> Data written to /dev/kmsg and all kernel logs were always displayed
> >> correctly. Also data written directly to /dev/ttyAML0 is displayed
> >> properly on the console. The latter doesn't however trigger the input
> >> related activity.
> >>
> >> It looks that the data read from the uart is delivered only if other
> >> activity happens on the kernel console. If I type 'reboot' and press
> >> enter, nothing happens immediately. If I type 'date >/dev/ttyAML0' via
> >> ssh then, I only see the date printed on the console. However if I type
> >> 'date >/dev/kmsg', the the date is printed and reboot happens.
> > This is really interesting.
> >
> > 'date >/dev/kmsg' should be handled like a normal printk().
> > It should get pushed to the console using printk kthread,
> > that calls call_console_driver() that calls con->write()
> > callback. In our case, it should be meson_serial_console_write().
> >
> > I am not sure if meson_serial_console_write() is used also
> > when writing via /dev/ttyAML0.
> >
> >>>> It turned out that the UART console is somehow blocked, but it
> >>>> receives and buffers all the input. For example after issuing "echo
> >>>>    >/proc/sysrq-trigger" from the ssh console, the UART console has been
> >>>> updated and I see the magic sysrq banner and then all the commands I
> >>>> blindly typed in the UART console! However this doesn't unblock the
> >>>> console.
> >>> sysrq falls back to direct printing. This would imply that the kthread
> >>> printer is somehow unable to print.
> >>>
> >>>> Here is the output of 't' magic sys request:
> >>>>
> >>>> https://protect2.fireeye.com/v1/url?k=8649f24d-e73258c4-86487902-74fe48600034-a2ca6bb18361467d&q=1&e=1bc4226f-a422-42b9-95e8-128845b8609f&u=https%3A%2F%2Fpastebin.com%2FfjbRuy4f
> >>> It looks like the call trace for the printing kthread (pr/ttyAML0) is
> >>> corrupt.
> >> Right, good catch. For comparison, here is a 't' sysrq result from the
> >> 'working' serial console (next-20220429), which happens usually 1 of 4
> >> boots:
> >>
> >> https://protect2.fireeye.com/v1/url?k=610106f1-008a13b6-61008dbe-000babff99aa-11083c39c44861df&q=1&e=2eafad9e-c5d2-4696-9d78-f3b5885256dc&u=https%3A%2F%2Fpastebin.com%2Fmp8zGFbW
> > Strange. The backtrace is weird here too:
> >
> > [   50.514509] task:pr/ttyAML0      state:R  running task     stack:    0 pid:   65 ppid:     2 flags:0x00000008
> > [   50.514540] Call trace:
> > [   50.514548]  __switch_to+0xe8/0x160
> > [   50.514561]  meson_serial_console+0x78/0x118
> >
> > There should be kthread() and printk_kthread_func() on the stack.
> >
> > Hmm,  meson_serial_console+0x78/0x118 is weird on its own.
> > meson_serial_console is the name of the structure. I would
> > expect a name of the .write callback, like
> > meson_serial_console_write()
> 
> Well, this sounds a bit like a stack corruption. For the reference, I've 
> checked what magic sysrq 't' reports for my other test boards:
> 
> RaspberryPi4:
> 
> [  166.702431] task:pr/ttyS0        state:R stack:    0 pid:   64 
> ppid:     2 flags:0x00000008
> [  166.711069] Call trace:
> [  166.713647]  __switch_to+0xe8/0x160
> [  166.717216]  __schedule+0x2f4/0x9f0
> [  166.720862]  log_wait+0x0/0x50
> [  166.724081] task:vfio-irqfd-clea state:I stack:    0 pid:   65 
> ppid:     2 flags:0x00000008
> [  166.732698] Call trace:
> 
> 
> ARM Juno R1:
> 
> [   74.356562] task:pr/ttyAMA0      state:R  running task stack:    0 
> pid:   47 ppid:     2 flags:0x00000008
> [   74.356605] Call trace:
> [   74.356617]  __switch_to+0xe8/0x160
> [   74.356637]  amba_console+0x78/0x118
> [   74.356657] task:kworker/2:1     state:I stack:    0 pid:   48 
> ppid:     2 flags:0x00000008
> [   74.356695] Workqueue:  0x0 (mm_percpu_wq)
> [   74.356738] Call trace:
> 
> 
> QEMU virt/arm64:
> 
> [  174.155760] task:pr/ttyAMA0      state:S stack:    0 pid:   26 
> ppid:     2 flags:0x00000008
> [  174.156305] Call trace:
> [  174.156529]  __switch_to+0xe8/0x160
> [  174.157131]  0xffff5ebbbfdd62d8

You mentioned in the other mail that the other boards work as
expected. I mean that console gets stuck only on the meson board.
Is it true, please?

The stack looks really weird. But another weird thing is that
even the meson board is able to show the messages, for example,
using echo hello >/dev/kmsg. It suggests that the kthreads
somehow work.

There is also a possibility that this code path is optimized
some special way and the unwinder has troubles to show
the stack correctly.


> In the last case it doesn't happen always. In the other runs I got the 
> following log from QEMU virt/arm64:
> 
> [  200.537579] task:pr/ttyAMA0      state:S stack:    0 pid:   26 
> ppid:     2 flags:0x00000008
> [  200.538121] Call trace:
> [  200.538341]  __switch_to+0xe8/0x160
> [  200.538583]  __schedule+0x2f4/0x9f0
> [  200.538822]  schedule+0x54/0xd0
> [  200.539047]  printk_kthread_func+0x2d8/0x3bc
> [  200.539301]  kthread+0x118/0x11c
> [  200.539523]  ret_from_fork+0x10/0x20

This is what I would expect when the kthread is in an interruptible
sleep waiting for new strings to handle.

BTW: This is what I see on my x86_64 test system:

[61892.932242] task:pr/tty0         state:S stack:    0 pid:   14 ppid:     2 flags:0x00004000
[61892.932250] Call Trace:
[61892.932253]  <TASK>
[61892.932263]  __schedule+0x376/0xbb0
[61892.932284]  schedule+0x44/0xb0
[61892.932290]  printk_kthread_func+0x18f/0x370
[61892.932303]  ? schedstat_stop+0x10/0x10
[61892.932316]  ? console_start+0x30/0x30
[61892.932322]  kthread+0xf2/0x120
[61892.932327]  ? kthread_complete_and_exit+0x20/0x20
[61892.932338]  ret_from_fork+0x1f/0x30
[61892.932370]  </TASK>
[61892.932373] task:pr/ttyS0        state:R  running task     stack:    0 pid:   15 ppid:     2 flags:0x00004000
[61892.932391] Call Trace:
[61892.932398]  <TASK>
[61892.932427]  ? printk_kthread_func+0x15b/0x370
[61892.932436]  ? schedstat_stop+0x10/0x10
[61892.932449]  ? console_start+0x30/0x30
[61892.932455]  ? kthread+0xf2/0x120
[61892.932460]  ? kthread_complete_and_exit+0x20/0x20
[61892.932471]  ? ret_from_fork+0x1f/0x30
[61892.932502]  </TASK>


pr/tty0 is in the interruptible sleep and the stack looks reasonable.

pr/ttyS0 is in runnable state and the stack is weird. The '?' means
that this address was found on the stack, it belongs to some
function but the unwinder is not able to assign it to the current
call path by going back via the return addresses stored on the stack.


> I hope that at least the qemu case will let you to analyze it by 
> yourself. I run my test system with the following command:
> 
> qemu-system-aarch64 -kernel virt/Image -append "console=ttyAMA0 
> root=/dev/vda rootwait" -M virt -cpu cortex-a57 -smp 2 -m 1024 -device 
> virtio-blk-device,drive=virtio-blk0 -drive 
> file=qemu-virt-rootfs.raw,id=virtio-blk0,if=none,format=raw -netdev 
> user,id=user -device virtio-net-device,netdev=user -display none

Thanks a lot for all the information. It is really helpful.

Best Regards,
Petr

_______________________________________________
linux-amlogic mailing list
linux-amlogic@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-amlogic

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH printk v5 1/1] printk: extend console_lock for per-console locking
  2022-05-02 22:29                           ` Marek Szyprowski
@ 2022-05-04  5:56                             ` John Ogness
  2022-05-04  6:52                               ` Marek Szyprowski
  0 siblings, 1 reply; 33+ messages in thread
From: John Ogness @ 2022-05-04  5:56 UTC (permalink / raw)
  To: Marek Szyprowski, Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel,
	Greg Kroah-Hartman, linux-amlogic

On 2022-05-03, Marek Szyprowski <m.szyprowski@samsung.com> wrote:
>> I suppose if you login via ssh and check /proc/interrupts, then type
>> some things over serial, then check /proc/interrupts again, you will
>> see there have been no interrupts for the uart. But interrupts for
>> other devices are happening. Is this correct?
>
> Right. The counter for ttyAML0 is not increased when lockup happens
> and I type something to the uart console.

Hmmm. This would imply that the interrupts are disabled fo the UART.

Just to be sure that we haven't corrupted something in the driver, if
you make the following change, everything works, right?

--------- BEGIN PATCH ------
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index c7973266b176..1eaa323e335c 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -3578,7 +3578,7 @@ static int __init printk_activate_kthreads(void)
 	struct console *con;
 
 	console_lock();
-	printk_kthreads_available = true;
+	//printk_kthreads_available = true;
 	for_each_console(con)
 		printk_start_kthread(con);
 	console_unlock();
--------- END PATCH ------

The above change will cause the kthreads to not print and instead always
fallback to the direct method.

John

_______________________________________________
linux-amlogic mailing list
linux-amlogic@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-amlogic

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH printk v5 1/1] printk: extend console_lock for per-console locking
  2022-05-03  6:49                             ` Petr Mladek
@ 2022-05-04  6:05                               ` Marek Szyprowski
  0 siblings, 0 replies; 33+ messages in thread
From: Marek Szyprowski @ 2022-05-04  6:05 UTC (permalink / raw)
  To: Petr Mladek
  Cc: John Ogness, Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman, linux-amlogic

Hi,

On 03.05.2022 08:49, Petr Mladek wrote:
> On Tue 2022-05-03 01:13:19, Marek Szyprowski wrote:
>> On 02.05.2022 15:17, Petr Mladek wrote:
>>> On Mon 2022-05-02 11:19:07, Marek Szyprowski wrote:
>>>> On 30.04.2022 18:00, John Ogness wrote:
>>>>> On 2022-04-29, Marek Szyprowski <m.szyprowski@samsung.com> wrote:
>>>>>> The same issue happens if I boot with init=/bin/bash
>>>>> Very interesting. Since you are seeing all the output up until you try
>>>>> doing something, I guess receiving UART data is triggering the issue.
>>>> Right, this is how it looks like.
>>>>
>>>>>> I found something really interesting. When lockup happens, I'm still
>>>>>> able to log via ssh and trigger any magic sysrq action via
>>>>>> /proc/sysrq-trigger.
>>>>> If you boot the system and directly login via ssh (without sending any
>>>>> data via serial), can you trigger printk output on the UART? For
>>>>> example, with:
>>>>>
>>>>>        echo hello > /dev/kmsg
>>>>>
>>>>> (You might need to increase your loglevel to see it.)
>>>> Data written to /dev/kmsg and all kernel logs were always displayed
>>>> correctly. Also data written directly to /dev/ttyAML0 is displayed
>>>> properly on the console. The latter doesn't however trigger the input
>>>> related activity.
>>>>
>>>> It looks that the data read from the uart is delivered only if other
>>>> activity happens on the kernel console. If I type 'reboot' and press
>>>> enter, nothing happens immediately. If I type 'date >/dev/ttyAML0' via
>>>> ssh then, I only see the date printed on the console. However if I type
>>>> 'date >/dev/kmsg', the the date is printed and reboot happens.
>>> This is really interesting.
>>>
>>> 'date >/dev/kmsg' should be handled like a normal printk().
>>> It should get pushed to the console using printk kthread,
>>> that calls call_console_driver() that calls con->write()
>>> callback. In our case, it should be meson_serial_console_write().
>>>
>>> I am not sure if meson_serial_console_write() is used also
>>> when writing via /dev/ttyAML0.
>>>
>>>>>> It turned out that the UART console is somehow blocked, but it
>>>>>> receives and buffers all the input. For example after issuing "echo
>>>>>>     >/proc/sysrq-trigger" from the ssh console, the UART console has been
>>>>>> updated and I see the magic sysrq banner and then all the commands I
>>>>>> blindly typed in the UART console! However this doesn't unblock the
>>>>>> console.
>>>>> sysrq falls back to direct printing. This would imply that the kthread
>>>>> printer is somehow unable to print.
>>>>>
>>>>>> Here is the output of 't' magic sys request:
>>>>>>
>>>>>> https://protect2.fireeye.com/v1/url?k=8649f24d-e73258c4-86487902-74fe48600034-a2ca6bb18361467d&q=1&e=1bc4226f-a422-42b9-95e8-128845b8609f&u=https%3A%2F%2Fpastebin.com%2FfjbRuy4f
>>>>> It looks like the call trace for the printing kthread (pr/ttyAML0) is
>>>>> corrupt.
>>>> Right, good catch. For comparison, here is a 't' sysrq result from the
>>>> 'working' serial console (next-20220429), which happens usually 1 of 4
>>>> boots:
>>>>
>>>> https://protect2.fireeye.com/v1/url?k=610106f1-008a13b6-61008dbe-000babff99aa-11083c39c44861df&q=1&e=2eafad9e-c5d2-4696-9d78-f3b5885256dc&u=https%3A%2F%2Fpastebin.com%2Fmp8zGFbW
>>> Strange. The backtrace is weird here too:
>>>
>>> [   50.514509] task:pr/ttyAML0      state:R  running task     stack:    0 pid:   65 ppid:     2 flags:0x00000008
>>> [   50.514540] Call trace:
>>> [   50.514548]  __switch_to+0xe8/0x160
>>> [   50.514561]  meson_serial_console+0x78/0x118
>>>
>>> There should be kthread() and printk_kthread_func() on the stack.
>>>
>>> Hmm,  meson_serial_console+0x78/0x118 is weird on its own.
>>> meson_serial_console is the name of the structure. I would
>>> expect a name of the .write callback, like
>>> meson_serial_console_write()
>> Well, this sounds a bit like a stack corruption. For the reference, I've
>> checked what magic sysrq 't' reports for my other test boards:
>>
>> RaspberryPi4:
>>
>> [  166.702431] task:pr/ttyS0        state:R stack:    0 pid:   64
>> ppid:     2 flags:0x00000008
>> [  166.711069] Call trace:
>> [  166.713647]  __switch_to+0xe8/0x160
>> [  166.717216]  __schedule+0x2f4/0x9f0
>> [  166.720862]  log_wait+0x0/0x50
>> [  166.724081] task:vfio-irqfd-clea state:I stack:    0 pid:   65
>> ppid:     2 flags:0x00000008
>> [  166.732698] Call trace:
>>
>>
>> ARM Juno R1:
>>
>> [   74.356562] task:pr/ttyAMA0      state:R  running task stack:    0
>> pid:   47 ppid:     2 flags:0x00000008
>> [   74.356605] Call trace:
>> [   74.356617]  __switch_to+0xe8/0x160
>> [   74.356637]  amba_console+0x78/0x118
>> [   74.356657] task:kworker/2:1     state:I stack:    0 pid:   48
>> ppid:     2 flags:0x00000008
>> [   74.356695] Workqueue:  0x0 (mm_percpu_wq)
>> [   74.356738] Call trace:
>>
>>
>> QEMU virt/arm64:
>>
>> [  174.155760] task:pr/ttyAMA0      state:S stack:    0 pid:   26
>> ppid:     2 flags:0x00000008
>> [  174.156305] Call trace:
>> [  174.156529]  __switch_to+0xe8/0x160
>> [  174.157131]  0xffff5ebbbfdd62d8
> You mentioned in the other mail that the other boards work as
> expected. I mean that console gets stuck only on the meson board.
> Is it true, please?

Right. Even on Meson based boards the console is operational about 1 of 
4 boots.


> The stack looks really weird. But another weird thing is that
> even the meson board is able to show the messages, for example,
> using echo hello >/dev/kmsg. It suggests that the kthreads
> somehow work.
>
> There is also a possibility that this code path is optimized
> some special way and the unwinder has troubles to show
> the stack correctly.

I doubt that this is a result of the compiler's optimization. See my 
logs from QEMU's virt machine. I've managed to capture 2 states of 
ttyAMA0 task. One shows some kind of stack corruption imho. It doesn't 
happen always though.

 > ...

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


_______________________________________________
linux-amlogic mailing list
linux-amlogic@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-amlogic

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH printk v5 1/1] printk: extend console_lock for per-console locking
  2022-05-04  5:56                             ` John Ogness
@ 2022-05-04  6:52                               ` Marek Szyprowski
  0 siblings, 0 replies; 33+ messages in thread
From: Marek Szyprowski @ 2022-05-04  6:52 UTC (permalink / raw)
  To: John Ogness, Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel,
	Greg Kroah-Hartman, linux-amlogic

Hi John,

On 04.05.2022 07:56, John Ogness wrote:
> On 2022-05-03, Marek Szyprowski <m.szyprowski@samsung.com> wrote:
>>> I suppose if you login via ssh and check /proc/interrupts, then type
>>> some things over serial, then check /proc/interrupts again, you will
>>> see there have been no interrupts for the uart. But interrupts for
>>> other devices are happening. Is this correct?
>> Right. The counter for ttyAML0 is not increased when lockup happens
>> and I type something to the uart console.
> Hmmm. This would imply that the interrupts are disabled fo the UART.
>
> Just to be sure that we haven't corrupted something in the driver, if
> you make the following change, everything works, right?
>
> --------- BEGIN PATCH ------
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index c7973266b176..1eaa323e335c 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -3578,7 +3578,7 @@ static int __init printk_activate_kthreads(void)
>   	struct console *con;
>   
>   	console_lock();
> -	printk_kthreads_available = true;
> +	//printk_kthreads_available = true;
>   	for_each_console(con)
>   		printk_start_kthread(con);
>   	console_unlock();
> --------- END PATCH ------
>
> The above change will cause the kthreads to not print and instead always
> fallback to the direct method.

With the above change console always works.

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


_______________________________________________
linux-amlogic mailing list
linux-amlogic@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-amlogic

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH printk v5 1/1] printk: extend console_lock for per-console locking
  2022-05-02 23:13                           ` Marek Szyprowski
  2022-05-03  6:49                             ` Petr Mladek
@ 2022-05-04 21:11                             ` John Ogness
  2022-05-04 22:42                               ` John Ogness
  1 sibling, 1 reply; 33+ messages in thread
From: John Ogness @ 2022-05-04 21:11 UTC (permalink / raw)
  To: Marek Szyprowski, Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel,
	Greg Kroah-Hartman, linux-amlogic

On 2022-05-03, Marek Szyprowski <m.szyprowski@samsung.com> wrote:
> QEMU virt/arm64:
>
> [  174.155760] task:pr/ttyAMA0      state:S stack:    0 pid:   26 
> ppid:     2 flags:0x00000008
> [  174.156305] Call trace:
> [  174.156529]  __switch_to+0xe8/0x160
> [  174.157131]  0xffff5ebbbfdd62d8

I can reproduce the apparent stack corruption with qemu:

[    5.545268] task:pr/ttyAMA0      state:S stack:    0 pid:   26 ppid:     2 flags:0x00000008
[    5.545520] Call trace:
[    5.545620]  __switch_to+0x104/0x160
[    5.545796]  __schedule+0x2f4/0x9f0
[    5.546122]  schedule+0x54/0xd0
[    5.546206]  0x0

When it happens, the printk-kthread is the only one with the corrupted
stack. It seems I am doing something wrong when creating the kthread? I
will investigate this.

Thanks Marek for helping us to narrow this down.

John

_______________________________________________
linux-amlogic mailing list
linux-amlogic@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-amlogic

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH printk v5 1/1] printk: extend console_lock for per-console locking
  2022-05-04 21:11                             ` John Ogness
@ 2022-05-04 22:42                               ` John Ogness
  2022-05-05 22:33                                 ` John Ogness
  0 siblings, 1 reply; 33+ messages in thread
From: John Ogness @ 2022-05-04 22:42 UTC (permalink / raw)
  To: Marek Szyprowski, Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel,
	Greg Kroah-Hartman, linux-amlogic

On 2022-05-04, John Ogness <john.ogness@linutronix.de> wrote:
> I can reproduce the apparent stack corruption with qemu:
>
> [    5.545268] task:pr/ttyAMA0      state:S stack:    0 pid:   26 ppid:     2 flags:0x00000008
> [    5.545520] Call trace:
> [    5.545620]  __switch_to+0x104/0x160
> [    5.545796]  __schedule+0x2f4/0x9f0
> [    5.546122]  schedule+0x54/0xd0
> [    5.546206]  0x0

I believe I am chasing a ghost. I can rather easily reproduce these
strange call traces, but if another sysrq-t is sent afterwards, the call
trace is OK. Also, I added trace_dump_stack() into the printk-kthread
main loop to dump the stack on every iteration. There I never see any
corruption, even though the timestamps are near the sysrq-t dump showing
corruption. Moving trace_dump_stack() into
amba-pl011:pl011_console_write() also showed no stack corruption at very
near times when sysrq-t did.

And it should be noted that the console-hanging issues reported in this
thread _cannot_ be reproduced with qemu.

So I will stop focussing on this "corrupt stack" thing and instead
investigate what the meson driver is doing that causes it to get
stuck. Since interrupts do not even fire, I'm guessing that the RX
interrupts are not being re-enabled (AML_UART_RX_INT_EN) for some code
path. This bit is only explicitly set once, in
meson_uart_startup(). Whenever the bit is cleared, later the previous
value is restored. This is assumed to mean the interrupt gets
re-enabled. But if there is some code path where multiple CPUs can
modify the register, then the interrupt could end up permanently
disabled.

I will go through and check if all access to AML_UART_CONTROL is
protected by port->lock.

John

_______________________________________________
linux-amlogic mailing list
linux-amlogic@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-amlogic

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH printk v5 1/1] printk: extend console_lock for per-console locking
  2022-05-04 22:42                               ` John Ogness
@ 2022-05-05 22:33                                 ` John Ogness
  2022-05-06  6:43                                   ` Marek Szyprowski
  0 siblings, 1 reply; 33+ messages in thread
From: John Ogness @ 2022-05-05 22:33 UTC (permalink / raw)
  To: Marek Szyprowski, Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel,
	Greg Kroah-Hartman, linux-amlogic

Hi Marek,

On 2022-05-05, John Ogness <john.ogness@linutronix.de> wrote:
> I will go through and check if all access to AML_UART_CONTROL is
> protected by port->lock.

The startup() callback of the uart_ops is not called with the port
locked. I'm having difficulties identifying if the startup() callback
can occur after the console was already registered via meson_uart_init()
and could be actively printing, but I see other serial drivers are
protecting their registers in the startup() callback with the
port->lock.

Could you try booting the meson hardware with the following change? (And
removing any previous debug changes I posted?)

John

diff --git a/drivers/tty/serial/meson_uart.c b/drivers/tty/serial/meson_uart.c
index 2bf1c57e0981..f551b8603817 100644
--- a/drivers/tty/serial/meson_uart.c
+++ b/drivers/tty/serial/meson_uart.c
@@ -267,9 +267,12 @@ static void meson_uart_reset(struct uart_port *port)
 
 static int meson_uart_startup(struct uart_port *port)
 {
+	unsigned long flags;
 	u32 val;
 	int ret = 0;
 
+	spin_lock_irqsave(&port->lock, flags);
+
 	val = readl(port->membase + AML_UART_CONTROL);
 	val |= AML_UART_CLEAR_ERR;
 	writel(val, port->membase + AML_UART_CONTROL);
@@ -285,6 +288,8 @@ static int meson_uart_startup(struct uart_port *port)
 	val = (AML_UART_RECV_IRQ(1) | AML_UART_XMIT_IRQ(port->fifosize / 2));
 	writel(val, port->membase + AML_UART_MISC);
 
+	spin_unlock_irqrestore(&port->lock, flags);
+
 	ret = request_irq(port->irq, meson_uart_interrupt, 0,
 			  port->name, port);
 

_______________________________________________
linux-amlogic mailing list
linux-amlogic@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-amlogic

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH printk v5 1/1] printk: extend console_lock for per-console locking
  2022-05-05 22:33                                 ` John Ogness
@ 2022-05-06  6:43                                   ` Marek Szyprowski
  2022-05-06  7:55                                     ` Neil Armstrong
                                                       ` (2 more replies)
  0 siblings, 3 replies; 33+ messages in thread
From: Marek Szyprowski @ 2022-05-06  6:43 UTC (permalink / raw)
  To: John Ogness, Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel,
	Greg Kroah-Hartman, linux-amlogic

Hi John,

On 06.05.2022 00:33, John Ogness wrote:
> On 2022-05-05, John Ogness <john.ogness@linutronix.de> wrote:
>> I will go through and check if all access to AML_UART_CONTROL is
>> protected by port->lock.
> The startup() callback of the uart_ops is not called with the port
> locked. I'm having difficulties identifying if the startup() callback
> can occur after the console was already registered via meson_uart_init()
> and could be actively printing, but I see other serial drivers are
> protecting their registers in the startup() callback with the
> port->lock.
>
> Could you try booting the meson hardware with the following change? (And
> removing any previous debug changes I posted?)

Bingo! It looks that the startup() is called when getty initializes 
console. This fixed the issues observed on the Amlogic Meson based boards.

Feel free to add:

Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


_______________________________________________
linux-amlogic mailing list
linux-amlogic@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-amlogic

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH printk v5 1/1] printk: extend console_lock for per-console locking
  2022-05-06  6:43                                   ` Marek Szyprowski
@ 2022-05-06  7:55                                     ` Neil Armstrong
  2022-05-08 11:02                                       ` John Ogness
  2022-05-06  8:16                                     ` Petr Mladek
  2022-05-06  9:20                                     ` John Ogness
  2 siblings, 1 reply; 33+ messages in thread
From: Neil Armstrong @ 2022-05-06  7:55 UTC (permalink / raw)
  To: Marek Szyprowski, John Ogness, Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel,
	Greg Kroah-Hartman, linux-amlogic

Hi,

On 06/05/2022 08:43, Marek Szyprowski wrote:
> Hi John,
> 
> On 06.05.2022 00:33, John Ogness wrote:
>> On 2022-05-05, John Ogness <john.ogness@linutronix.de> wrote:
>>> I will go through and check if all access to AML_UART_CONTROL is
>>> protected by port->lock.
>> The startup() callback of the uart_ops is not called with the port
>> locked. I'm having difficulties identifying if the startup() callback
>> can occur after the console was already registered via meson_uart_init()
>> and could be actively printing, but I see other serial drivers are
>> protecting their registers in the startup() callback with the
>> port->lock.
>>
>> Could you try booting the meson hardware with the following change? (And
>> removing any previous debug changes I posted?)
> 
> Bingo! It looks that the startup() is called when getty initializes
> console. This fixed the issues observed on the Amlogic Meson based boards.
> 
> Feel free to add:
> 
> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
> 
> Best regards

Thanks all for figuring out the issue, perhaps other uart drivers could fall
in the same issue if startup code isn't protected with lock ?

Neil

_______________________________________________
linux-amlogic mailing list
linux-amlogic@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-amlogic

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH printk v5 1/1] printk: extend console_lock for per-console locking
  2022-05-06  6:43                                   ` Marek Szyprowski
  2022-05-06  7:55                                     ` Neil Armstrong
@ 2022-05-06  8:16                                     ` Petr Mladek
  2022-05-06  9:20                                     ` John Ogness
  2 siblings, 0 replies; 33+ messages in thread
From: Petr Mladek @ 2022-05-06  8:16 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: John Ogness, Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Greg Kroah-Hartman, linux-amlogic

On Fri 2022-05-06 08:43:02, Marek Szyprowski wrote:
> Hi John,
> 
> On 06.05.2022 00:33, John Ogness wrote:
> > On 2022-05-05, John Ogness <john.ogness@linutronix.de> wrote:
> >> I will go through and check if all access to AML_UART_CONTROL is
> >> protected by port->lock.
> > The startup() callback of the uart_ops is not called with the port
> > locked. I'm having difficulties identifying if the startup() callback
> > can occur after the console was already registered via meson_uart_init()
> > and could be actively printing, but I see other serial drivers are
> > protecting their registers in the startup() callback with the
> > port->lock.

I guess that it is used by the early console before the racy
code is called.

> > Could you try booting the meson hardware with the following change? (And
> > removing any previous debug changes I posted?)
> 
> Bingo! It looks that the startup() is called when getty initializes 
> console. This fixed the issues observed on the Amlogic Meson based boards.
> 
> Feel free to add:
> 
> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>

Uff, it is a huge relief that it has got fixed.

Best Regards,
Petr

_______________________________________________
linux-amlogic mailing list
linux-amlogic@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-amlogic

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH printk v5 1/1] printk: extend console_lock for per-console locking
  2022-05-06  6:43                                   ` Marek Szyprowski
  2022-05-06  7:55                                     ` Neil Armstrong
  2022-05-06  8:16                                     ` Petr Mladek
@ 2022-05-06  9:20                                     ` John Ogness
  2 siblings, 0 replies; 33+ messages in thread
From: John Ogness @ 2022-05-06  9:20 UTC (permalink / raw)
  To: Marek Szyprowski, Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel,
	Greg Kroah-Hartman, linux-amlogic

On 2022-05-06, Marek Szyprowski <m.szyprowski@samsung.com> wrote:
>> Could you try booting the meson hardware with the following change? (And
>> removing any previous debug changes I posted?)
>
> Bingo! It looks that the startup() is called when getty initializes 
> console. This fixed the issues observed on the Amlogic Meson based boards.
>
> Feel free to add:
>
> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>

Thanks Marek. I will post an official patch on the correct people/lists.

John

_______________________________________________
linux-amlogic mailing list
linux-amlogic@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-amlogic

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH printk v5 1/1] printk: extend console_lock for per-console locking
  2022-05-06  7:55                                     ` Neil Armstrong
@ 2022-05-08 11:02                                       ` John Ogness
  0 siblings, 0 replies; 33+ messages in thread
From: John Ogness @ 2022-05-08 11:02 UTC (permalink / raw)
  To: Neil Armstrong, Marek Szyprowski, Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel,
	Greg Kroah-Hartman, linux-amlogic

Hi Neil,

On 2022-05-06, Neil Armstrong <narmstrong@baylibre.com> wrote:
> Thanks all for figuring out the issue, perhaps other uart drivers
> could fall in the same issue if startup code isn't protected with
> lock?

When preparing for the official patch submission [0], I needed quite a
bit of time to understand why another function (meson_uart_reset) should
not and cannot acquire the port->lock.

I then started investigating some other drivers and indeed I see lots of
potential problems. Any console initializing port->lock from the
driver's probe() is probably wrong (and there are lots of them). But as
I've learned with the meson driver, the details are subtle. Each driver
will need to be carefully evaluated to see if it is actually safe.

uart_ops->startup() is called without holding port->lock. If the device
is a console, it is already registered and printing.

driver->probe() is called without holding port->lock. If the device is a
console, it is already registered and printing.

For both functions, port->lock might not be initialized yet, so blindly
acquiring it is wrong.

Note that this is not related to the introduction of kthread printing.

I've put it on my TODO list to go through the ~76 console drivers to
investigate their startup() and probe() implementations. But I will not
be able to do this quickly. My time might be better spent writing to all
the maintainers asking them to please verify the usage.

John Ogness

[0] https://lore.kernel.org/lkml/20220508103547.626355-1-john.ogness@linutronix.de

_______________________________________________
linux-amlogic mailing list
linux-amlogic@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-amlogic

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH printk v5 1/1] printk: extend console_lock for per-console locking
  2022-05-02 13:11                         ` John Ogness
  2022-05-02 22:29                           ` Marek Szyprowski
@ 2022-06-08 15:10                           ` Geert Uytterhoeven
  2022-06-09 11:19                             ` John Ogness
  1 sibling, 1 reply; 33+ messages in thread
From: Geert Uytterhoeven @ 2022-06-08 15:10 UTC (permalink / raw)
  To: John Ogness
  Cc: Marek Szyprowski, Petr Mladek, Sergey Senozhatsky, Steven Rostedt,
	Thomas Gleixner, Linux Kernel Mailing List, Greg Kroah-Hartman,
	open list:ARM/Amlogic Meson...

Hi John,

On Mon, May 2, 2022 at 3:19 PM John Ogness <john.ogness@linutronix.de> wrote:
> On 2022-05-02, Marek Szyprowski <m.szyprowski@samsung.com> wrote:
> > Data written to /dev/kmsg and all kernel logs were always displayed
> > correctly. Also data written directly to /dev/ttyAML0 is displayed
> > properly on the console. The latter doesn't however trigger the input
> > related activity.
> >
> > It looks that the data read from the uart is delivered only if other
> > activity happens on the kernel console. If I type 'reboot' and press
> > enter, nothing happens immediately. If I type 'date >/dev/ttyAML0' via
> > ssh then, I only see the date printed on the console. However if I
> > type 'date >/dev/kmsg', the the date is printed and reboot happens.
>
> I suppose if you login via ssh and check /proc/interrupts, then type
> some things over serial, then check /proc/interrupts again, you will see
> there have been no interrupts for the uart. But interrupts for other
> devices are happening. Is this correct?
>
> > For comparison, here is a 't' sysrq result from the 'working' serial
> > console (next-20220429), which happens usually 1 of 4 boots:
> >
> > https://pastebin.com/mp8zGFbW
>
> This still looks odd to me. We should be seeing a trace originating from
> ret_from_fork+0x10/0x20 and kthread+0x118/0x11c.
>
> I wonder if the early creation of the thread is somehow causing
> problems. Could you try the following patch to see if it makes a
> difference? I would also like to see the sysrq-t output with this patch
> applied:

On one board, I'm seeing a new splat during early boot, pointing to
printk_activate_kthreads:

    Calibrating delay loop (skipped), value calculated using timer
frequency.. 48.00 BogoMIPS (lpj=96000)
    pid_max: default: 32768 minimum: 301
    Mount-cache hash table entries: 4096 (order: 3, 32768 bytes, linear)
    Mountpoint-cache hash table entries: 4096 (order: 3, 32768 bytes, linear)

    =============================
    [ BUG: Invalid wait context ]
    5.19.0-rc1-ebisu-00802-g06a0dd60d6e4 #431 Not tainted
    -----------------------------
    swapper/0/1 is trying to lock:
    ffffffc00910bac8 (base_crng.lock){....}-{3:3}, at:
crng_make_state+0x148/0x1e4
    other info that might help us debug this:
    context-{5:5}
    2 locks held by swapper/0/1:
     #0: ffffffc008f8ae00 (console_lock){+.+.}-{0:0}, at:
printk_activate_kthreads+0x10/0x54
     #1: ffffffc009da4a28 (&meta->lock){....}-{2:2}, at:
__kfence_alloc+0x378/0x5c4
    stack backtrace:
    CPU: 0 PID: 1 Comm: swapper/0 Not tainted
5.19.0-rc1-ebisu-00802-g06a0dd60d6e4 #431
    Hardware name: Renesas Ebisu-4D board based on r8a77990 (DT)
    Call trace:
     dump_backtrace.part.0+0x98/0xc0
     show_stack+0x14/0x28
     dump_stack_lvl+0xac/0xec
     dump_stack+0x14/0x2c
     __lock_acquire+0x388/0x10a0
     lock_acquire+0x190/0x2c0
     _raw_spin_lock_irqsave+0x6c/0x94
     crng_make_state+0x148/0x1e4
     _get_random_bytes.part.0+0x4c/0xe8
     get_random_u32+0x4c/0x140
     __kfence_alloc+0x460/0x5c4
     kmem_cache_alloc_trace+0x194/0x1dc
     __kthread_create_on_node+0x5c/0x1a8
     kthread_create_on_node+0x58/0x7c
     printk_start_kthread.part.0+0x34/0xa8
     printk_activate_kthreads+0x4c/0x54
     do_one_initcall+0xec/0x278
     kernel_init_freeable+0x11c/0x214
     kernel_init+0x24/0x124
     ret_from_fork+0x10/0x20
    rcu: Hierarchical SRCU implementation.
    printk: console [tty0] printing thread started
    EFI services will not be available.
    smp: Bringing up secondary CPUs ...
    Detected VIPT I-cache on CPU1
    CPU1: Booted secondary processor 0x0000000001 [0x410fd034]
    smp: Brought up 1 node, 2 CPUs
    SMP: Total of 2 processors activated.

> ---------------- BEGIN PATCH ---------------
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 2311a0ad584a..c4362d25de22 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -3837,7 +3837,7 @@ static int __init printk_activate_kthreads(void)
>
>         return 0;
>  }
> -early_initcall(printk_activate_kthreads);
> +late_initcall(printk_activate_kthreads);
>
>  #if defined CONFIG_PRINTK
>  /* If @con is specified, only wait for that console. Otherwise wait for all. */
> ---------------- END PATCH ---------------

Doesn't seem to make much of a difference, only a slightly different
backtrace, compared to the above:

     Mount-cache hash table entries: 4096 (order: 3, 32768 bytes, linear)
     Mountpoint-cache hash table entries: 4096 (order: 3, 32768 bytes, linear)
    +rcu: Hierarchical SRCU implementation.

     =============================
     [ BUG: Invalid wait context ]
    -5.19.0-rc1-ebisu-00802-g06a0dd60d6e4 #431 Not tainted
    +5.19.0-rc1-ebisu-00802-g06a0dd60d6e4-dirty #433 Not tainted
     -----------------------------
     swapper/0/1 is trying to lock:
     ffffffc00910bac8 (base_crng.lock){....}-{3:3}, at:
crng_make_state+0x148/0x1e4
     other info that might help us debug this:
     context-{5:5}
    -2 locks held by swapper/0/1:
    - #0: ffffffc008f8ae00 (console_lock){+.+.}-{0:0}, at:
printk_activate_kthreads+0x10/0x54
    - #1: ffffffc009da4a28 (&meta->lock){....}-{2:2}, at:
__kfence_alloc+0x378/0x5c4
    +1 lock held by swapper/0/1:
    + #0: ffffffc009da4a28 (&meta->lock){....}-{2:2}, at:
__kfence_alloc+0x378/0x5c4
     stack backtrace:
    -CPU: 0 PID: 1 Comm: swapper/0 Not tainted
5.19.0-rc1-ebisu-00802-g06a0dd60d6e4 #431
    +CPU: 0 PID: 1 Comm: swapper/0 Not tainted
5.19.0-rc1-ebisu-00802-g06a0dd60d6e4-dirty #433
     Hardware name: Renesas Ebisu-4D board based on r8a77990 (DT)
     Call trace:
      dump_backtrace.part.0+0x98/0xc0
    @@ -33,20 +32,14 @@ Call trace:
      kmem_cache_alloc_trace+0x194/0x1dc
      __kthread_create_on_node+0x5c/0x1a8
      kthread_create_on_node+0x58/0x7c
    - printk_start_kthread.part.0+0x34/0xa8
    - printk_activate_kthreads+0x4c/0x54
    + rcu_spawn_gp_kthread+0x54/0x208
      do_one_initcall+0xec/0x278
      kernel_init_freeable+0x11c/0x214
      kernel_init+0x24/0x124
      ret_from_fork+0x10/0x20
    -rcu: Hierarchical SRCU implementation.
    -printk: console [tty0] printing thread started
     EFI services will not be available.
     smp: Bringing up secondary CPUs ...
     Detected VIPT I-cache on CPU1
     ...
    +printk: console [tty0] printing thread started

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

_______________________________________________
linux-amlogic mailing list
linux-amlogic@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-amlogic

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH printk v5 1/1] printk: extend console_lock for per-console locking
  2022-06-08 15:10                           ` Geert Uytterhoeven
@ 2022-06-09 11:19                             ` John Ogness
  2022-06-09 11:58                               ` Jason A. Donenfeld
  0 siblings, 1 reply; 33+ messages in thread
From: John Ogness @ 2022-06-09 11:19 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Marek Szyprowski, Petr Mladek, Sergey Senozhatsky, Steven Rostedt,
	Thomas Gleixner, Linux Kernel Mailing List, Greg Kroah-Hartman,
	open list:ARM/Amlogic Meson..., Theodore Ts'o,
	Jason A. Donenfeld, Alexander Potapenko, Marco Elver, kasan-dev

(Added RANDOM NUMBER DRIVER and KFENCE people.)

Hi Geert,

On 2022-06-08, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
>     =============================
>     [ BUG: Invalid wait context ]
>     5.19.0-rc1-ebisu-00802-g06a0dd60d6e4 #431 Not tainted
>     -----------------------------
>     swapper/0/1 is trying to lock:
>     ffffffc00910bac8 (base_crng.lock){....}-{3:3}, at:
> crng_make_state+0x148/0x1e4
>     other info that might help us debug this:
>     context-{5:5}
>     2 locks held by swapper/0/1:
>      #0: ffffffc008f8ae00 (console_lock){+.+.}-{0:0}, at:
> printk_activate_kthreads+0x10/0x54
>      #1: ffffffc009da4a28 (&meta->lock){....}-{2:2}, at:
> __kfence_alloc+0x378/0x5c4
>     stack backtrace:
>     CPU: 0 PID: 1 Comm: swapper/0 Not tainted
> 5.19.0-rc1-ebisu-00802-g06a0dd60d6e4 #431
>     Hardware name: Renesas Ebisu-4D board based on r8a77990 (DT)
>     Call trace:
>      dump_backtrace.part.0+0x98/0xc0
>      show_stack+0x14/0x28
>      dump_stack_lvl+0xac/0xec
>      dump_stack+0x14/0x2c
>      __lock_acquire+0x388/0x10a0
>      lock_acquire+0x190/0x2c0
>      _raw_spin_lock_irqsave+0x6c/0x94
>      crng_make_state+0x148/0x1e4
>      _get_random_bytes.part.0+0x4c/0xe8
>      get_random_u32+0x4c/0x140
>      __kfence_alloc+0x460/0x5c4
>      kmem_cache_alloc_trace+0x194/0x1dc
>      __kthread_create_on_node+0x5c/0x1a8
>      kthread_create_on_node+0x58/0x7c
>      printk_start_kthread.part.0+0x34/0xa8
>      printk_activate_kthreads+0x4c/0x54
>      do_one_initcall+0xec/0x278
>      kernel_init_freeable+0x11c/0x214
>      kernel_init+0x24/0x124
>      ret_from_fork+0x10/0x20

I am guessing you have CONFIG_PROVE_RAW_LOCK_NESTING enabled?

We are seeing a spinlock (base_crng.lock) taken while holding a
raw_spinlock (meta->lock).

kfence_guarded_alloc()
  raw_spin_trylock_irqsave(&meta->lock, flags)
    prandom_u32_max()
      prandom_u32()
        get_random_u32()
          get_random_bytes()
            _get_random_bytes()
              crng_make_state()
                spin_lock_irqsave(&base_crng.lock, flags);

I expect it is allowed to create kthreads via kthread_run() in
early_initcalls.

John Ogness

_______________________________________________
linux-amlogic mailing list
linux-amlogic@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-amlogic

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH printk v5 1/1] printk: extend console_lock for per-console locking
  2022-06-09 11:19                             ` John Ogness
@ 2022-06-09 11:58                               ` Jason A. Donenfeld
  2022-06-09 12:18                                 ` Dmitry Vyukov
  2022-06-09 12:18                                 ` Jason A. Donenfeld
  0 siblings, 2 replies; 33+ messages in thread
From: Jason A. Donenfeld @ 2022-06-09 11:58 UTC (permalink / raw)
  To: John Ogness
  Cc: Geert Uytterhoeven, Marek Szyprowski, Petr Mladek,
	Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	Linux Kernel Mailing List, Greg Kroah-Hartman,
	open list:ARM/Amlogic Meson..., Theodore Ts'o,
	Alexander Potapenko, Marco Elver, kasan-dev

Hi John,

On Thu, Jun 09, 2022 at 01:25:15PM +0206, John Ogness wrote:
> (Added RANDOM NUMBER DRIVER and KFENCE people.)

Thanks.

> I am guessing you have CONFIG_PROVE_RAW_LOCK_NESTING enabled?
> 
> We are seeing a spinlock (base_crng.lock) taken while holding a
> raw_spinlock (meta->lock).
> 
> kfence_guarded_alloc()
>   raw_spin_trylock_irqsave(&meta->lock, flags)
>     prandom_u32_max()
>       prandom_u32()
>         get_random_u32()
>           get_random_bytes()
>             _get_random_bytes()
>               crng_make_state()
>                 spin_lock_irqsave(&base_crng.lock, flags);
> 
> I expect it is allowed to create kthreads via kthread_run() in
> early_initcalls.

AFAIK, CONFIG_PROVE_RAW_LOCK_NESTING is useful for teasing out cases
where RT's raw spinlocks will nest wrong with RT's sleeping spinlocks.
But nobody who wants an RT kernel will be using KFENCE. So this seems
like a non-issue? Maybe just add a `depends on !KFENCE` to
PROVE_RAW_LOCK_NESTING?

Jason

_______________________________________________
linux-amlogic mailing list
linux-amlogic@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-amlogic

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH printk v5 1/1] printk: extend console_lock for per-console locking
  2022-06-09 11:58                               ` Jason A. Donenfeld
@ 2022-06-09 12:18                                 ` Dmitry Vyukov
  2022-06-09 12:27                                   ` Jason A. Donenfeld
  2022-06-09 12:18                                 ` Jason A. Donenfeld
  1 sibling, 1 reply; 33+ messages in thread
From: Dmitry Vyukov @ 2022-06-09 12:18 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: John Ogness, Geert Uytterhoeven, Marek Szyprowski, Petr Mladek,
	Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	Linux Kernel Mailing List, Greg Kroah-Hartman,
	open list:ARM/Amlogic Meson..., Theodore Ts'o,
	Alexander Potapenko, Marco Elver, kasan-dev

On Thu, 9 Jun 2022 at 13:59, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>
> Hi John,
>
> On Thu, Jun 09, 2022 at 01:25:15PM +0206, John Ogness wrote:
> > (Added RANDOM NUMBER DRIVER and KFENCE people.)
>
> Thanks.
>
> > I am guessing you have CONFIG_PROVE_RAW_LOCK_NESTING enabled?
> >
> > We are seeing a spinlock (base_crng.lock) taken while holding a
> > raw_spinlock (meta->lock).
> >
> > kfence_guarded_alloc()
> >   raw_spin_trylock_irqsave(&meta->lock, flags)
> >     prandom_u32_max()
> >       prandom_u32()
> >         get_random_u32()
> >           get_random_bytes()
> >             _get_random_bytes()
> >               crng_make_state()
> >                 spin_lock_irqsave(&base_crng.lock, flags);
> >
> > I expect it is allowed to create kthreads via kthread_run() in
> > early_initcalls.
>
> AFAIK, CONFIG_PROVE_RAW_LOCK_NESTING is useful for teasing out cases
> where RT's raw spinlocks will nest wrong with RT's sleeping spinlocks.
> But nobody who wants an RT kernel will be using KFENCE. So this seems
> like a non-issue? Maybe just add a `depends on !KFENCE` to
> PROVE_RAW_LOCK_NESTING?

Don't know if there are other good solutions (of similar simplicity).
But fwiw this is not about the target production environment. Real
production uses of RT kernels will probably not enable LOCKDEP,
PROVE_RAW_LOCK_NESTING and other debugging configs.
This is about detecting as many bugs as possible in testing
environments. And testing environments can well have both LOCKDEP and
KFENCE enabled. Any such limitation will require doubling the number
of tested configurations.

Btw, should this new CONFIG_PROVE_RAW_LOCK_NESTING be generally
enabled on testing systems? We don't have it enabled on syzbot.

_______________________________________________
linux-amlogic mailing list
linux-amlogic@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-amlogic

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH printk v5 1/1] printk: extend console_lock for per-console locking
  2022-06-09 11:58                               ` Jason A. Donenfeld
  2022-06-09 12:18                                 ` Dmitry Vyukov
@ 2022-06-09 12:18                                 ` Jason A. Donenfeld
  1 sibling, 0 replies; 33+ messages in thread
From: Jason A. Donenfeld @ 2022-06-09 12:18 UTC (permalink / raw)
  To: John Ogness
  Cc: Geert Uytterhoeven, Marek Szyprowski, Petr Mladek,
	Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	Linux Kernel Mailing List, Greg Kroah-Hartman,
	open list:ARM/Amlogic Meson..., Theodore Ts'o,
	Alexander Potapenko, Marco Elver, kasan-dev

Hey again,

On Thu, Jun 09, 2022 at 01:58:44PM +0200, Jason A. Donenfeld wrote:
> Hi John,
> 
> On Thu, Jun 09, 2022 at 01:25:15PM +0206, John Ogness wrote:
> > (Added RANDOM NUMBER DRIVER and KFENCE people.)
> 
> Thanks.
> 
> > I am guessing you have CONFIG_PROVE_RAW_LOCK_NESTING enabled?
> > 
> > We are seeing a spinlock (base_crng.lock) taken while holding a
> > raw_spinlock (meta->lock).
> > 
> > kfence_guarded_alloc()
> >   raw_spin_trylock_irqsave(&meta->lock, flags)
> >     prandom_u32_max()
> >       prandom_u32()
> >         get_random_u32()
> >           get_random_bytes()
> >             _get_random_bytes()
> >               crng_make_state()
> >                 spin_lock_irqsave(&base_crng.lock, flags);
> > 
> > I expect it is allowed to create kthreads via kthread_run() in
> > early_initcalls.
> 
> AFAIK, CONFIG_PROVE_RAW_LOCK_NESTING is useful for teasing out cases
> where RT's raw spinlocks will nest wrong with RT's sleeping spinlocks.
> But nobody who wants an RT kernel will be using KFENCE. So this seems
> like a non-issue? Maybe just add a `depends on !KFENCE` to
> PROVE_RAW_LOCK_NESTING?

On second thought, the fix is trivial:
https://lore.kernel.org/lkml/20220609121709.12939-1-Jason@zx2c4.com/

Jason

_______________________________________________
linux-amlogic mailing list
linux-amlogic@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-amlogic

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH printk v5 1/1] printk: extend console_lock for per-console locking
  2022-06-09 12:18                                 ` Dmitry Vyukov
@ 2022-06-09 12:27                                   ` Jason A. Donenfeld
  2022-06-09 12:32                                     ` Dmitry Vyukov
  2022-06-17 16:51                                     ` Sebastian Andrzej Siewior
  0 siblings, 2 replies; 33+ messages in thread
From: Jason A. Donenfeld @ 2022-06-09 12:27 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: John Ogness, Geert Uytterhoeven, Marek Szyprowski, Petr Mladek,
	Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	Linux Kernel Mailing List, Greg Kroah-Hartman,
	open list:ARM/Amlogic Meson..., Theodore Ts'o,
	Alexander Potapenko, Marco Elver, kasan-dev, bigeasy

Hi Dmitry,

On Thu, Jun 09, 2022 at 02:18:19PM +0200, Dmitry Vyukov wrote:
> > AFAIK, CONFIG_PROVE_RAW_LOCK_NESTING is useful for teasing out cases
> > where RT's raw spinlocks will nest wrong with RT's sleeping spinlocks.
> > But nobody who wants an RT kernel will be using KFENCE. So this seems
> > like a non-issue? Maybe just add a `depends on !KFENCE` to
> > PROVE_RAW_LOCK_NESTING?
> 
> Don't know if there are other good solutions (of similar simplicity).

Fortunately, I found one that solves things without needing to
compromise on anything:
https://lore.kernel.org/lkml/20220609121709.12939-1-Jason@zx2c4.com/

> Btw, should this new CONFIG_PROVE_RAW_LOCK_NESTING be generally
> enabled on testing systems? We don't have it enabled on syzbot.

Last time I spoke with RT people about this, the goal was eventually to
*always* enable it when lock proving is enabled, but there are too many
bugs and cases now to do that, so it's an opt-in. I might be
misremembering, though, so CC'ing Sebastian in case he wants to chime
in.

Jason

_______________________________________________
linux-amlogic mailing list
linux-amlogic@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-amlogic

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH printk v5 1/1] printk: extend console_lock for per-console locking
  2022-06-09 12:27                                   ` Jason A. Donenfeld
@ 2022-06-09 12:32                                     ` Dmitry Vyukov
  2022-06-17 16:51                                     ` Sebastian Andrzej Siewior
  1 sibling, 0 replies; 33+ messages in thread
From: Dmitry Vyukov @ 2022-06-09 12:32 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: John Ogness, Geert Uytterhoeven, Marek Szyprowski, Petr Mladek,
	Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	Linux Kernel Mailing List, Greg Kroah-Hartman,
	open list:ARM/Amlogic Meson..., Theodore Ts'o,
	Alexander Potapenko, Marco Elver, kasan-dev, bigeasy

On Thu, 9 Jun 2022 at 14:27, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>
> Hi Dmitry,
>
> On Thu, Jun 09, 2022 at 02:18:19PM +0200, Dmitry Vyukov wrote:
> > > AFAIK, CONFIG_PROVE_RAW_LOCK_NESTING is useful for teasing out cases
> > > where RT's raw spinlocks will nest wrong with RT's sleeping spinlocks.
> > > But nobody who wants an RT kernel will be using KFENCE. So this seems
> > > like a non-issue? Maybe just add a `depends on !KFENCE` to
> > > PROVE_RAW_LOCK_NESTING?
> >
> > Don't know if there are other good solutions (of similar simplicity).
>
> Fortunately, I found one that solves things without needing to
> compromise on anything:
> https://lore.kernel.org/lkml/20220609121709.12939-1-Jason@zx2c4.com/

Cool! Thanks!

> > Btw, should this new CONFIG_PROVE_RAW_LOCK_NESTING be generally
> > enabled on testing systems? We don't have it enabled on syzbot.
>
> Last time I spoke with RT people about this, the goal was eventually to
> *always* enable it when lock proving is enabled, but there are too many
> bugs and cases now to do that, so it's an opt-in. I might be
> misremembering, though, so CC'ing Sebastian in case he wants to chime
> in.

OK, we will wait then.
Little point in doubling the number of reports for known issues.

_______________________________________________
linux-amlogic mailing list
linux-amlogic@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-amlogic

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH printk v5 1/1] printk: extend console_lock for per-console locking
  2022-06-09 12:27                                   ` Jason A. Donenfeld
  2022-06-09 12:32                                     ` Dmitry Vyukov
@ 2022-06-17 16:51                                     ` Sebastian Andrzej Siewior
  1 sibling, 0 replies; 33+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-06-17 16:51 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: Dmitry Vyukov, John Ogness, Geert Uytterhoeven, Marek Szyprowski,
	Petr Mladek, Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	Linux Kernel Mailing List, Greg Kroah-Hartman,
	open list:ARM/Amlogic Meson..., Theodore Ts'o,
	Alexander Potapenko, Marco Elver, kasan-dev

On 2022-06-09 14:27:11 [+0200], Jason A. Donenfeld wrote:
> Hi Dmitry,
> 
> On Thu, Jun 09, 2022 at 02:18:19PM +0200, Dmitry Vyukov wrote:
> > > AFAIK, CONFIG_PROVE_RAW_LOCK_NESTING is useful for teasing out cases
> > > where RT's raw spinlocks will nest wrong with RT's sleeping spinlocks.
> > > But nobody who wants an RT kernel will be using KFENCE. So this seems
> > > like a non-issue? Maybe just add a `depends on !KFENCE` to
> > > PROVE_RAW_LOCK_NESTING?
> > 
> > Don't know if there are other good solutions (of similar simplicity).
> 
> Fortunately, I found one that solves things without needing to
> compromise on anything:
> https://lore.kernel.org/lkml/20220609121709.12939-1-Jason@zx2c4.com/
> 
> > Btw, should this new CONFIG_PROVE_RAW_LOCK_NESTING be generally
> > enabled on testing systems? We don't have it enabled on syzbot.
> 
> Last time I spoke with RT people about this, the goal was eventually to
> *always* enable it when lock proving is enabled, but there are too many
> bugs and cases now to do that, so it's an opt-in. I might be
> misremembering, though, so CC'ing Sebastian in case he wants to chime
> in.

That is basically still the case. If CONFIG_PROVE_RAW_LOCK_NESTING yells
then there will be yelling on PREEMPT_RT, too. We would like to get
things fixed ;)

Without going through this thread, John is looking at printk and printk
triggers a few of those. That is one of reasons why this is not enabled
by default.

> Jason

Sebastian

_______________________________________________
linux-amlogic mailing list
linux-amlogic@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-amlogic

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2022-06-17 16:52 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20220421212250.565456-1-john.ogness@linutronix.de>
     [not found] ` <20220421212250.565456-15-john.ogness@linutronix.de>
     [not found]   ` <878rrs6ft7.fsf@jogness.linutronix.de>
     [not found]     ` <Ymfgis0EAw0Oxoa5@alley>
     [not found]       ` <Ymfwk+X0CHq6ex3s@alley>
     [not found]         ` <CGME20220427070833eucas1p27a32ce7c41c0da26f05bd52155f0031c@eucas1p2.samsung.com>
2022-04-27  7:08           ` [PATCH printk v5 1/1] printk: extend console_lock for per-console locking Marek Szyprowski
2022-04-27  7:38             ` Petr Mladek
2022-04-27 11:44               ` Marek Szyprowski
2022-04-27 16:15                 ` John Ogness
2022-04-27 16:48                   ` Petr Mladek
2022-04-28 14:54                   ` Petr Mladek
2022-04-29 13:53                   ` Marek Szyprowski
2022-04-30 16:00                     ` John Ogness
2022-05-02  9:19                       ` Marek Szyprowski
2022-05-02 13:11                         ` John Ogness
2022-05-02 22:29                           ` Marek Szyprowski
2022-05-04  5:56                             ` John Ogness
2022-05-04  6:52                               ` Marek Szyprowski
2022-06-08 15:10                           ` Geert Uytterhoeven
2022-06-09 11:19                             ` John Ogness
2022-06-09 11:58                               ` Jason A. Donenfeld
2022-06-09 12:18                                 ` Dmitry Vyukov
2022-06-09 12:27                                   ` Jason A. Donenfeld
2022-06-09 12:32                                     ` Dmitry Vyukov
2022-06-17 16:51                                     ` Sebastian Andrzej Siewior
2022-06-09 12:18                                 ` Jason A. Donenfeld
2022-05-02 13:17                         ` Petr Mladek
2022-05-02 23:13                           ` Marek Szyprowski
2022-05-03  6:49                             ` Petr Mladek
2022-05-04  6:05                               ` Marek Szyprowski
2022-05-04 21:11                             ` John Ogness
2022-05-04 22:42                               ` John Ogness
2022-05-05 22:33                                 ` John Ogness
2022-05-06  6:43                                   ` Marek Szyprowski
2022-05-06  7:55                                     ` Neil Armstrong
2022-05-08 11:02                                       ` John Ogness
2022-05-06  8:16                                     ` Petr Mladek
2022-05-06  9:20                                     ` John Ogness

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).