problems with 4.14.6-rt7

linux-rt-users.vger.kernel.org archive mirror

* problems with 4.14.6-rt7
From: Tim Sander @ 2018-01-08 14:56 UTC
  To: linux-rt-users; +Cc: Sebastian Andrzej Siewior

Hi

I am currently using 4.14.6-rt7 on a DE0-Nano-SoC (Intel/Altera ARMv7 dual-core)
board. I have added a small driver which adds some functionality such as IIO. As soon
as I have some realtime load on one core, I see messages like:

Showing busy workqueues and worker pools:
workqueue events_freezable: flags=0x4
  pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256
    pending: mmc_rescan
workqueue events_power_efficient: flags=0x80
  pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=2/256
    pending: phy_state_machine, neigh_periodic_work
workqueue mm_percpu_wq: flags=0x8
  pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256
    pending: vmstat_update
BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=0 stuck for 68s!
Showing busy workqueues and worker pools:
workqueue mm_percpu_wq: flags=0x8
  pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256
    pending: vmstat_update
BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=0 stuck for 98s!
Showing busy workqueues and worker pools:
workqueue mm_percpu_wq: flags=0x8
  pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256
    pending: vmstat_update

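To follow how such a stall develops, a small helper can pull the pool and the stall duration out of the watchdog lines. This is a hypothetical sketch (the name parse_lockups is made up), and its regular expression only assumes the exact line format shown in the log above.

```python
import re
from typing import List, Tuple

# Hypothetical helper; the pattern is fitted to lines such as
# "BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=0 stuck for 68s!"
LOCKUP_RE = re.compile(
    r"BUG: workqueue lockup - pool cpus=(?P<cpus>\S+) .*stuck for (?P<secs>\d+)s!"
)


def parse_lockups(log: str) -> List[Tuple[str, int]]:
    """Return (cpus, seconds stuck) for every workqueue lockup line in log."""
    return [(m.group("cpus"), int(m.group("secs"))) for m in LOCKUP_RE.finditer(log)]


log = """\
BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=0 stuck for 68s!
BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=0 stuck for 98s!
"""
print(parse_lockups(log))  # [('1', 68), ('1', 98)]
```

Applied to the two reports above, it shows the same pool on CPU 1 stuck for 68 s and then 98 s, i.e. the pending vmstat_update work on that CPU never got to run in between.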
On other occasions I have seen complete system lockups where the only message I get is:
INFO: task systemd:1 blocked for more than 120 seconds.
      Tainted: G           O    4.14.6-rt7 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
systemd         D    0     1      0 0x00001000
[<80793c70>] (__schedule) from [<80793f30>] (schedule+0x64/0xfc)
[<80793f30>] (schedule) from [<80796940>] (__write_rt_lock+0x13c/0x214)
[<80796940>] (__write_rt_lock) from [<80798464>] (rt_write_lock+0x24/0x28)
[<80798464>] (rt_write_lock) from [<8011f4b4>] (copy_process.part.5+0xfc4/0x17d8)
[<8011f4b4>] (copy_process.part.5) from [<8011fe58>] (_do_fork+0xc8/0x474)
[<8011fe58>] (_do_fork) from [<80120320>] (SyS_clone+0x30/0x38)
[<80120320>] (SyS_clone) from [<80107e60>] (ret_fast_syscall+0x0/0x28)
INFO: task systemd:1 blocked for more than 120 seconds.

I know that I am really using up all of the realtime budget for one core on this
system, but shouldn't RT throttling kick in and give these starving processes
some CPU time?
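For reference, the throttling expected here is controlled by the sysctls /proc/sys/kernel/sched_rt_runtime_us and /proc/sys/kernel/sched_rt_period_us (defaults 950000 and 1000000); writing -1 to the runtime disables throttling. Below is a minimal sketch of the per-CPU budget they imply; the helper names are hypothetical, and reading the files assumes a Linux system.

```python
from pathlib import Path
from typing import Optional


def read_sysctl(name: str) -> int:
    """Read an integer from /proc/sys/kernel/<name> (Linux only)."""
    return int(Path("/proc/sys/kernel", name).read_text())


def rt_budget(runtime_us: int, period_us: int) -> Optional[dict]:
    """Per-period RT budget; None means RT throttling is disabled (runtime -1)."""
    if runtime_us < 0:
        return None
    return {
        "rt_share": runtime_us / period_us,     # fraction RT tasks may use
        "reserved_us": period_us - runtime_us,  # left over for non-RT tasks
    }


# Kernel defaults: 950 ms of RT runtime in every 1 s period.
print(rt_budget(950_000, 1_000_000))  # {'rt_share': 0.95, 'reserved_us': 50000}
```

On a live system the call would be rt_budget(read_sysctl("sched_rt_runtime_us"), read_sysctl("sched_rt_period_us")). With the defaults, the naive expectation is about 50 ms per second left on each CPU for non-RT work such as the stuck workqueue workers.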

Best regards
Tim



* Re: problems with 4.14.6-rt7
From: Sebastian Andrzej Siewior @ 2018-01-24 14:18 UTC
  To: Tim Sander; +Cc: linux-rt-users

On 2018-01-08 15:56:17 [+0100], Tim Sander wrote:
> Hi
Hi,

> I am currently using 4.14.6-rt7 on a DE0-Nano-SoC (Intel/Altera ARMv7 dual-core)
> board. I have added a small driver which adds some functionality such as IIO. As soon
> as I have some realtime load on one core, I see messages like:
…
> I know that I am really using up all of the realtime budget for one core on this
> system, but shouldn't RT throttling kick in and give these starving processes
> some CPU time?

You should see something like
	sched: RT throttling activated

once the RT tasks are throttled. Do you?
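A quick way to check a saved dmesg capture for that notice is a plain substring search. This is only a sketch with hypothetical function names; note that, as far as I know, the kernel prints this notice only once per boot, so it may sit in an earlier part of the log rather than next to the lockup reports.

```python
import subprocess


def rt_throttling_seen(log: str) -> bool:
    """True if the kernel log text contains the RT throttling notice."""
    return "sched: RT throttling activated" in log


def current_dmesg() -> str:
    """Return the kernel ring buffer via the dmesg tool (may need root)."""
    return subprocess.run(["dmesg"], capture_output=True, text=True, check=True).stdout


# Illustrative inputs: the notice itself and a lockup line from this thread.
print(rt_throttling_seen("sched: RT throttling activated\n"))  # True
print(rt_throttling_seen(
    "BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=0 stuck for 56s!\n"
))  # False
```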

> Best regards
> Tim

Sebastian


* Re: problems with 4.14.6-rt7
From: Tim Sander @ 2018-01-26 16:01 UTC
  To: Sebastian Andrzej Siewior; +Cc: linux-rt-users

Hi Sebastian

On Wednesday, 24 January 2018, 15:18:50 CET, Sebastian Andrzej Siewior wrote:
> On 2018-01-08 15:56:17 [+0100], Tim Sander wrote:
> > I am currently using 4.14.6-rt7 on a DE0-Nano-SoC (Intel/Altera ARMv7
> > dual-core) board. I have added a small driver which adds some
> > functionality such as IIO. As soon as I have some realtime load on one
> > core, I see messages like:
> …
> 
> > I know that I am really using up all of the realtime budget for one core
> > on this system, but shouldn't RT throttling kick in and give these
> > starving processes some CPU time?
> 
> You should see something like
> 	sched: RT throttling activated
> 
> once the RT tasks are throttled. Do you?
I just double-checked with your 4.14.15-rt12 release. I have not seen
any throttling messages. (The doubled "cut here" line is not a copy error on my
part but verbatim kernel output.)

Here is the dmesg output of a test run:

BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=0 stuck for 56s!
Showing busy workqueues and worker pools:
workqueue mm_percpu_wq: flags=0x8
  pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256
    pending: vmstat_update
------------[ cut here ]------------
------------[ cut here ]------------
WARNING: CPU: 1 PID: 215 at kernel/workqueue.c:1699 worker_thread+0x390/0x614
Modules linked in: serdes(O)
CPU: 1 PID: 215 Comm: kworker/0:2 Tainted: G           O    4.14.15-rt12 #1
Hardware name: Altera SOCFPGA
[<801116d0>] (unwind_backtrace) from [<8010cae4>] (show_stack+0x20/0x24)
[<8010cae4>] (show_stack) from [<8077ef10>] (dump_stack+0x90/0xa4)
[<8077ef10>] (dump_stack) from [<80120e9c>] (__warn+0xf8/0x110)
[<80120e9c>] (__warn) from [<80120f84>] (warn_slowpath_null+0x30/0x38)
[<80120f84>] (warn_slowpath_null) from [<8013d76c>] (worker_thread+0x390/0x614)
[<8013d76c>] (worker_thread) from [<801437fc>] (kthread+0x13c/0x16c)
[<801437fc>] (kthread) from [<80107f1c>] (ret_from_fork+0x14/0x38)
---[ end trace 0000000000000002 ]---
WARNING: CPU: 0 PID: 3 at kernel/kthread.c:370 __kthread_bind_mask+0x74/0x78
Modules linked in: serdes(O)
CPU: 0 PID: 3 Comm: kworker/0:0 Tainted: G        W  O    4.14.15-rt12 #1
Hardware name: Altera SOCFPGA
[<801116d0>] (unwind_backtrace) from [<8010cae4>] (show_stack+0x20/0x24)
[<8010cae4>] (show_stack) from [<8077ef10>] (dump_stack+0x90/0xa4)
[<8077ef10>] (dump_stack) from [<80120e9c>] (__warn+0xf8/0x110)
[<80120e9c>] (__warn) from [<80120f84>] (warn_slowpath_null+0x30/0x38)
[<80120f84>] (warn_slowpath_null) from [<80143ee4>] (__kthread_bind_mask+0x74/0x78)
[<80143ee4>] (__kthread_bind_mask) from [<80144680>] (kthread_bind_mask+0x1c/0x20)
[<80144680>] (kthread_bind_mask) from [<8013ac4c>] (create_worker+0xec/0x180)
[<8013ac4c>] (create_worker) from [<8013d7d0>] (worker_thread+0x3f4/0x614)
[<8013d7d0>] (worker_thread) from [<801437fc>] (kthread+0x13c/0x16c)
[<801437fc>] (kthread) from [<80107f1c>] (ret_from_fork+0x14/0x38)
---[ end trace 0000000000000003 ]---
------------[ cut here ]------------
WARNING: CPU: 0 PID: 3 at kernel/workqueue.c:1657 worker_enter_idle+0x168/0x1d8
Modules linked in: serdes(O)
CPU: 0 PID: 3 Comm: kworker/0:0 Tainted: G        W  O    4.14.15-rt12 #1
Hardware name: Altera SOCFPGA
[<801116d0>] (unwind_backtrace) from [<8010cae4>] (show_stack+0x20/0x24)
[<8010cae4>] (show_stack) from [<8077ef10>] (dump_stack+0x90/0xa4)
[<8077ef10>] (dump_stack) from [<80120e9c>] (__warn+0xf8/0x110)
[<80120e9c>] (__warn) from [<80120f84>] (warn_slowpath_null+0x30/0x38)
[<80120f84>] (warn_slowpath_null) from [<80139964>] (worker_enter_idle+0x168/0x1d8)
[<80139964>] (worker_enter_idle) from [<8013ac78>] (create_worker+0x118/0x180)
[<8013ac78>] (create_worker) from [<8013d7d0>] (worker_thread+0x3f4/0x614)
[<8013d7d0>] (worker_thread) from [<801437fc>] (kthread+0x13c/0x16c)
[<801437fc>] (kthread) from [<80107f1c>] (ret_from_fork+0x14/0x38)
---[ end trace 0000000000000004 ]---

Best regards
Tim



* Re: problems with 4.14.6-rt7
From: Sebastian Andrzej Siewior @ 2018-02-20 18:48 UTC
  To: Tim Sander; +Cc: linux-rt-users

On 2018-01-26 17:01:09 [+0100], Tim Sander wrote:
> Hi Sebastian
Hi Tim,

> I just double-checked with your 4.14.15-rt12 release. I have not seen
> any throttling messages. (The doubled "cut here" line is not a copy error on my
> part but verbatim kernel output.)
> 
> Here is the dmesg output of a test run:
> 
> BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=0 stuck for 56s!
> Showing busy workqueues and worker pools:
> workqueue mm_percpu_wq: flags=0x8
>   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256
>     pending: vmstat_update

I don't see all of this, but I don't see the lockdep detector unless I
boot UP. This bothers me a little; let me look into why…

Sebastian


* Re: problems with 4.14.6-rt7
From: Sebastian Andrzej Siewior @ 2018-02-20 19:36 UTC
  To: Tim Sander; +Cc: linux-rt-users, Peter Zijlstra, tglx

On 2018-02-20 19:48:06 [+0100], To Tim Sander wrote:
> On 2018-01-26 17:01:09 [+0100], Tim Sander wrote:
> > Hi Sebastian
Hi Tim,

> > I just double-checked with your 4.14.15-rt12 release. I have not seen
> > any throttling messages. (The doubled "cut here" line is not a copy error on my
> > part but verbatim kernel output.)
> > 
> > Here is the dmesg output of a test run:
> > 
> > BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=0 stuck for 56s!
> > Showing busy workqueues and worker pools:
> > workqueue mm_percpu_wq: flags=0x8
> >   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256
> >     pending: vmstat_update
> 
> I don't see all of this, but I don't see the lockdep detector unless I
> boot UP. This bothers me a little; let me look into why…

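As I just learned (or rather, have been told): on SMP, RT_RUNTIME_SHARE
effectively disables RT throttling if an RT thread goes haywire, because the
CPU running it can borrow the unused RT runtime of the other CPUs. This can be
undone via /sys/kernel/debug/sched_features (by writing NO_RT_RUNTIME_SHARE to it).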
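The sched_features file lists every scheduler feature as a space-separated token, with disabled ones prefixed by NO_, and writing a name (or its NO_ form) toggles it. Below is a minimal sketch, assuming debugfs is mounted at /sys/kernel/debug, the kernel is built with CONFIG_SCHED_DEBUG, and the script runs as root; the helper names are hypothetical.

```python
from pathlib import Path
from typing import Optional

SCHED_FEATURES = Path("/sys/kernel/debug/sched_features")


def feature_state(features: str, name: str) -> Optional[bool]:
    """True/False if `name` is listed as enabled/disabled, None if absent."""
    tokens = features.split()
    if name in tokens:
        return True
    if "NO_" + name in tokens:
        return False
    return None


def disable_rt_runtime_share(path: Path = SCHED_FEATURES) -> None:
    """Clear RT_RUNTIME_SHARE by writing its NO_ form (needs root)."""
    path.write_text("NO_RT_RUNTIME_SHARE")


# Illustrative excerpt of the file, not a capture from the board in this thread.
excerpt = "GENTLE_FAIR_SLEEPERS START_DEBIT RT_RUNTIME_SHARE NO_LB_MIN"
print(feature_state(excerpt, "RT_RUNTIME_SHARE"))  # True
```

On the affected board one would check feature_state(SCHED_FEATURES.read_text(), "RT_RUNTIME_SHARE"), call disable_rt_runtime_share(), and then expect the "sched: RT throttling activated" notice once the RT load exceeds its budget.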

Without throttling, the workqueue code never gets onto the CPU and
complains that it is stuck. With throttling, this wouldn't happen. However,
throttling stops everything in the RT class, which means threaded interrupts
are not processed either, so a UP system still remains "dead" (you can't
access it via network/serial, most timers won't expire, …). The benefit is
limited.
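A simplified model of why sharing defeats throttling on this dual-core board. As I read it, it mirrors the arithmetic of do_balance_runtime() in kernel/sched/rt.c (borrow 1/weight of each other CPU's unused RT runtime, never exceeding one full period) but leaves out locking, per-period refills and accounting; the function name is hypothetical.

```python
from typing import List


def effective_rt_runtime(own_runtime: int, period: int,
                         donor_spare: List[int], share: bool = True) -> int:
    """RT runtime one CPU may use per period, optionally borrowing from others."""
    if not share:  # NO_RT_RUNTIME_SHARE: stick to the local budget
        return own_runtime
    weight = len(donor_spare) + 1  # CPUs in the root domain, this one included
    runtime = own_runtime
    for spare in donor_spare:
        diff = spare // weight  # borrow a fraction of the donor's unused runtime
        if runtime + diff > period:
            diff = period - runtime  # never more than one full period
        runtime += diff
        if runtime == period:
            break
    return runtime


PERIOD, RUNTIME = 1_000_000, 950_000
# CPU 1 runs the RT hog; CPU 0 has no RT load, so its whole RT budget is spare.
for share in (True, False):
    rt = effective_rt_runtime(RUNTIME, PERIOD, [RUNTIME], share)
    print(share, rt, PERIOD - rt)  # last column: time left for non-RT work on CPU 1
# True 1000000 0
# False 950000 50000
```

With sharing, nothing is left for the kworkers on CPU 1, which fits the stuck vmstat_update above; with NO_RT_RUNTIME_SHARE the local 50 ms per period remains for non-RT tasks.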

Sebastian

