* [RT] oops in 4.1.3-rt3
From: Clark Williams @ 2015-08-03 19:27 UTC (permalink / raw)
To: Sebastian Andrzej Siewior; +Cc: RT, LKML
Sebastian,
Below is a traceback I hit while running 4.1.3-rt3 on my Lenovo T530.
I was doing my normal things: playing music, copying files over the LAN,
doing compiles, doing email, etc., so I can't really point you at a
reproducer. The graphics system mostly stayed up, but the actual trace I
hit scrolled off. This was extracted from the syslog with 'journalctl'.
If I encounter it again, I'll see if I can force a kdump.
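[A sketch of how the journal extraction and a forced kdump might look. These
are illustrative commands, not Clark's; `kdumpctl` is the Fedora/RHEL tooling
and assumes a crashkernel memory reservation is already configured.]

```shell
# Pull the kernel messages of the previous (crashed) boot out of the journal:
journalctl -k -b -1 --no-pager > oops.log

# With the kdump service armed, a panic can be forced via magic SysRq so
# that a vmcore is captured for post-mortem analysis (destructive!):
kdumpctl status
echo c > /proc/sysrq-trigger
```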
Clark
Jul 28 20:29:07 sluggy systemd-udevd[831]: worker [13548] did not accept message -111 (Connection refused), kill it
Jul 28 20:29:07 sluggy kernel: general protection fault: 0000 [#1] PREEMPT SMP
Jul 28 20:29:07 sluggy kernel: Modules linked in: hidp cmac rfcomm fuse xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun
Jul 28 20:29:07 sluggy kernel: snd_hda_codec snd_usbmidi_lib snd_rawmidi snd_hda_core videodev iwlwifi btusb snd_hwdep btbcm
Jul 28 20:29:07 sluggy kernel: CPU: 4 PID: 13548 Comm: systemd-udevd Not tainted 4.1.3-rt3 #3
Jul 28 20:29:07 sluggy kernel: Hardware name: LENOVO 24293E8/24293E8, BIOS G4ET94WW (2.54 ) 05/23/2013
Jul 28 20:29:07 sluggy kernel: task: ffff8802ffb0aa40 ti: ffff88007871c000 task.ti: ffff88007871c000
Jul 28 20:29:07 sluggy kernel: RIP: 0010:[<ffffffff8168a35d>] [<ffffffff8168a35d>] sock_prot_inuse_add+0x2d/0x50
Jul 28 20:29:07 sluggy kernel: RSP: 0018:ffff88007871fb68 EFLAGS: 00010296
Jul 28 20:29:07 sluggy kernel: RAX: 0000000000000004 RBX: 00000000ffffffff RCX: 0000000000000000
Jul 28 20:29:07 sluggy kernel: RDX: ffffffff81810428 RSI: ffffffff81ae0d87 RDI: ffffffff81a7f8c0
Jul 28 20:29:07 sluggy kernel: RBP: ffff88007871fb88 R08: 0000000000000001 R09: 0000000000000000
Jul 28 20:29:07 sluggy kernel: R10: ffff880088ebb390 R11: 0000000000000001 R12: 20303d6469752031
Jul 28 20:29:08 sluggy kernel: R13: ffffffff81cea7a0 R14: 0000000000000000 R15: 0000000000000018
Jul 28 20:29:08 sluggy kernel: FS: 0000000000000000(0000) GS:ffff88041e300000(0000) knlGS:0000000000000000
Jul 28 20:29:08 sluggy kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 28 20:29:08 sluggy kernel: CR2: 000055da9fd30060 CR3: 0000000001c0e000 CR4: 00000000001407e0
Jul 28 20:29:08 sluggy kernel: Stack:
Jul 28 20:29:08 sluggy kernel: ffff88031abe42d8 ffff88031abe42d8 ffff88031abe4000 ffff88031abe45d8
Jul 28 20:29:08 sluggy kernel: ffff88007871fbf8 ffffffff816db0b6 ffff880407f06400 ffff88040afbd1d0
Jul 28 20:29:08 sluggy kernel: 0000000100000000 20303d6469752031 0000000fffffef45 00000000fa86c4d0
Jul 28 20:29:08 sluggy kernel: Call Trace:
Jul 28 20:29:08 sluggy kernel: [<ffffffff816db0b6>] netlink_release+0x316/0x540
Jul 28 20:29:08 sluggy kernel: [<ffffffff8168615f>] sock_release+0x1f/0x90
Jul 28 20:29:08 sluggy kernel: [<ffffffff816861e2>] sock_close+0x12/0x20
Jul 28 20:29:08 sluggy kernel: [<ffffffff8123899b>] __fput+0xcb/0x1e0
Jul 28 20:29:08 sluggy kernel: [<ffffffff81238afe>] ____fput+0xe/0x10
Jul 28 20:29:08 sluggy kernel: [<ffffffff810c4db4>] task_work_run+0xd4/0xf0
Jul 28 20:29:08 sluggy kernel: [<ffffffff810a8b57>] do_exit+0x3f7/0xbe0
Jul 28 20:29:08 sluggy kernel: [<ffffffff810a93e0>] do_group_exit+0x50/0xd0
Jul 28 20:29:08 sluggy kernel: [<ffffffff810b5fbb>] get_signal+0x29b/0x6e0
Jul 28 20:29:08 sluggy kernel: [<ffffffff810165b7>] do_signal+0x37/0x7c0
Jul 28 20:29:08 sluggy kernel: [<ffffffff812833b3>] ? ep_poll+0x3b3/0x460
Jul 28 20:29:08 sluggy kernel: [<ffffffff81157816>] ? __audit_syscall_exit+0x1f6/0x290
Jul 28 20:29:08 sluggy kernel: [<ffffffff81016de8>] do_notify_resume+0xa8/0x100
Jul 28 20:29:08 sluggy kernel: [<ffffffff817cc1be>] int_signal+0x12/0x17
Jul 28 20:29:08 sluggy kernel: Code: 44 00 00 55 48 89 e5 41 55 41 54 53 49 89 fc 49 89 f5 48 c7 c7 87 0d ae 81 89 d3 48 83 ec
Jul 28 20:29:08 sluggy kernel: RIP [<ffffffff8168a35d>] sock_prot_inuse_add+0x2d/0x50
Jul 28 20:29:08 sluggy kernel: RSP <ffff88007871fb68>
Jul 28 20:29:08 sluggy kernel: ---[ end trace 0000000000000002 ]---
* Re: [RT] oops in 4.1.3-rt3
From: Nicholas Mc Guire @ 2015-08-04 8:08 UTC (permalink / raw)
To: Clark Williams; +Cc: Sebastian Andrzej Siewior, RT, LKML
On Mon, 03 Aug 2015, Clark Williams wrote:
> Sebastian,
>
> Below is a traceback I hit while running 4.1.3-rt3 on my Lenovo T530.
>
> I was doing my normal things: playing music, copying files over the LAN,
> doing compiles, doing email, etc., so I can't really point you at a
> reproducer. The graphics system mostly stayed up, but the actual trace I
> hit scrolled off. This was extracted from the syslog with 'journalctl'.
>
> If I encounter it again, I'll see if I can force a kdump.
>
I have an 8-core i7 here running 4.1.3-rt3 that has been more or less idle
for about 3 days: only one ssh login over the network running top, no other
user logged in - just the default processes of Debian 8.1 active - and it
slowly grinds to a halt (no RT user-space tasks running). The last screen
that top displayed was:
<snip>
3 root -2 0 0 0 0 R 60.3 0.0 5756:16 ksoftirqd/0
23 root -2 0 0 0 0 R 50.8 0.0 4948:08 ksoftirqd/2
17 root -2 0 0 0 0 S 50.6 0.0 4897:13 ksoftirqd/1
29 root -2 0 0 0 0 S 50.4 0.0 4953:24 ksoftirqd/3
10 root 20 0 0 0 0 R 17.0 0.0 1836:35 rcuc/0
1 root 20 0 177192 5628 3040 R 15.0 0.0 1715:07 systemd
16 root 20 0 0 0 0 R 14.8 0.0 1780:42 rcuc/1
7 root 20 0 0 0 0 R 12.6 0.0 717:42.70 rcu_preempt
15052 hofrat 20 0 23640 2792 2352 R 11.5 0.0 0:54.54 top
1205 root 20 0 0 0 0 R 10.0 0.0 812:43.02 kworker/0:2
1524 message+ 20 0 43544 4624 3000 R 8.2 0.0 1049:19 dbus-daemon
2070 root 20 0 307568 10492 9420 S 8.0 0.1 1023:26 packagekitd
<snip>
But I did not get any traces in syslog, and no details that are of any help.
It would be interesting, though, to know if there is a similarity in the top
tasks on the system.
thx!
hofrat
* Re: [RT] oops in 4.1.3-rt3
From: Thomas Gleixner @ 2015-08-04 12:05 UTC (permalink / raw)
To: Nicholas Mc Guire; +Cc: Clark Williams, Sebastian Andrzej Siewior, RT, LKML
On Tue, 4 Aug 2015, Nicholas Mc Guire wrote:
> <snip>
> 3 root -2 0 0 0 0 R 60.3 0.0 5756:16 ksoftirqd/0
> 23 root -2 0 0 0 0 R 50.8 0.0 4948:08 ksoftirqd/2
> 17 root -2 0 0 0 0 S 50.6 0.0 4897:13 ksoftirqd/1
> 29 root -2 0 0 0 0 S 50.4 0.0 4953:24 ksoftirqd/3
So ksoftirqd eats 50+% CPU on each core. On an idle system!?! Any
chance that you can get a function trace snapshot out of it?
Thanks,
tglx
* Re: [RT] oops in 4.1.3-rt3
From: Sebastian Andrzej Siewior @ 2015-08-16 11:33 UTC (permalink / raw)
To: Clark Williams; +Cc: RT, LKML
* Clark Williams | 2015-08-03 14:27:00 [-0500]:
>Sebastian,
Hi Clark,
>Below is a traceback I hit while running 4.1.3-rt3 on my Lenovo T530.
>
>I was doing my normal things: playing music, copying files over the LAN,
>doing compiles, doing email, etc., so I can't really point you at a
>reproducer. The graphics system mostly stayed up, but the actual trace I
>hit scrolled off. This was extracted from the syslog with 'journalctl'.
>
>If I encounter it again, I'll see if I can force a kdump.
So it did not happen again? Is it possible to run addr2line against this
kernel (at the faulting $eip)? sock_prot_inuse_add() does "just" __this_cpu_add().
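[The symbol resolution Sebastian is asking for might look like this; the
`vmlinux` path and the debug-info assumption are illustrative, not from the
thread.]

```shell
# Hypothetical sketch: resolving the oops RIP back to a source line.
# Requires a vmlinux with debug info for exactly this kernel build:
#   addr2line -f -i -e vmlinux ffffffff8168a35d
# The oops reports RIP as sock_prot_inuse_add+0x2d, so the function's
# start address can be cross-checked by subtracting the offset
# (low 32 bits shown here):
printf '%x\n' $((0x8168a35d - 0x2d))   # prints 8168a330
```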
>Clark
Sebastian
* Re: [RT] oops in 4.1.3-rt3
From: Sebastian Andrzej Siewior @ 2015-08-16 11:58 UTC (permalink / raw)
To: Thomas Gleixner; +Cc: Nicholas Mc Guire, Clark Williams, RT, LKML
* Thomas Gleixner | 2015-08-04 14:05:10 [+0200]:
>On Tue, 4 Aug 2015, Nicholas Mc Guire wrote:
>> <snip>
>> 3 root -2 0 0 0 0 R 60.3 0.0 5756:16 ksoftirqd/0
>> 23 root -2 0 0 0 0 R 50.8 0.0 4948:08 ksoftirqd/2
>> 17 root -2 0 0 0 0 S 50.6 0.0 4897:13 ksoftirqd/1
>> 29 root -2 0 0 0 0 S 50.4 0.0 4953:24 ksoftirqd/3
>
>So ksoftirqd eats 50+% CPU on each core. On an idle system!?! Any
>chance that you can get a function trace snapshot out of it?
This might be a .config thing, a device driver, or user-land behaviour.
Usually ksoftirqd should remain (almost) idle, because most of the BH work
stays in task context and not in ksoftirqd; tasklets, for instance, are
what should still run in ksoftirqd. So a function trace, or event tracing
to figure out what is scheduling the softirq, might give a pointer.
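[The event tracing suggested here can be sketched roughly as below. This is
an illustrative sequence, not from the mail; it needs root and assumes
tracefs is mounted under /sys/kernel/debug/tracing, as was typical for
kernels of that era.]

```shell
# Enable the softirq trace events, capture a few seconds of activity,
# then look at which tasks raise softirqs and which vectors fire:
cd /sys/kernel/debug/tracing
echo 1 > events/irq/softirq_raise/enable
echo 1 > events/irq/softirq_entry/enable
echo 1 > tracing_on
sleep 5
echo 0 > tracing_on
grep softirq_raise trace | head   # raising task name/PID plus softirq vector
echo 0 > events/irq/softirq_raise/enable
echo 0 > events/irq/softirq_entry/enable
```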
I have an AMD box here with
3 root -2 0 0 0 0 S 1.3 0.0 126:16.31 ksoftirqd/0
18 root -2 0 0 0 0 S 1.3 0.0 136:55.85 ksoftirqd/1
30 root -2 0 0 0 0 S 1.3 0.0 138:50.52 ksoftirqd/3
24 root -2 0 0 0 0 S 1.0 0.0 143:54.28 ksoftirqd/2
with 15 days of uptime. That one percent CPU usage on an idle system looks
odd here, too.
>Thanks,
>
> tglx
Sebastian
* Re: [RT] oops in 4.1.3-rt3
From: Alex Goebel @ 2015-08-17 3:33 UTC (permalink / raw)
To: linux-rt-users
Hi,
Sebastian Andrzej Siewior <bigeasy <at> linutronix.de> writes:
> This might be a .config thing, a device driver, or user-land behaviour.
> Usually ksoftirqd should remain (almost) idle, because most of the BH work
> stays in task context and not in ksoftirqd; tasklets, for instance, are
> what should still run in ksoftirqd. So a function trace, or event tracing
> to figure out what is scheduling the softirq, might give a pointer.
I have a problem that might be related, or at least I suffer from the same
symptoms:
Running various Debian 4.1 (and 4.0 as well, IIRC) -rt kernels on a Core
i7-720QM ThinkPad: the machine boots up, I can log in, and everything seems
normal, but after a few minutes, without my doing anything special, things
get sluggish and within seconds everything grinds to a complete halt. In the
last screen update I got, though, no ksoftirqd was at a double-digit CPU
percentage.
Below is what I get in the logs. Note that the machine doesn't actually shut
down; it just freezes and needs a hard reset. The same kernels without -rt
work fine.
Not sure if this is really related, but perhaps it helps.
Cheers,
Alex
Aug 10 17:04:52 linux kernel: [ 189.542192] INFO: rcu_preempt detected stalls on CPUs/tasks: { 3} (detected by 0, t=5252 jiffies, g=3874, c=3873, q=8201)
Aug 10 17:04:52 linux kernel: [ 189.542194] Task dump for CPU 3:
Aug 10 17:04:52 linux kernel: [ 189.542200] swapper/3       R  running task        0     0      1 0x00200000
Aug 10 17:04:52 linux kernel: [ 189.542206]  ffffffff8136e2fa ffffffff8101f9e5 ffff8803333eee00 0000000008004000
Aug 10 17:04:52 linux kernel: [ 189.542209]  0000000000000004 0000000000000004 ffffffff8145baf4 0000000000000000
Aug 10 17:04:52 linux kernel: [ 189.542212]  0000000000000000 ffff88033338f020 ffff8803333eee00 ffffffff8190b640
Aug 10 17:04:52 linux kernel: [ 189.542213] Call Trace:
Aug 10 17:04:52 linux kernel: [ 189.542224]  [<ffffffff8136e2fa>] ? intel_idle+0x12a/0x140
Aug 10 17:04:52 linux kernel: [ 189.542230]  [<ffffffff8101f9e5>] ? read_tsc+0x5/0x10
Aug 10 17:04:52 linux kernel: [ 189.542238]  [<ffffffff8145baf4>] ? cpuidle_enter_state+0x74/0x270
Aug 10 17:04:52 linux kernel: [ 189.542244]  [<ffffffff810b9298>] ? cpu_startup_entry+0x3a8/0x4e0
Aug 10 17:04:52 linux kernel: [ 189.542249]  [<ffffffff8104e0d9>] ? start_secondary+0x169/0x190
Aug 10 17:04:52 linux kernel: [ 189.542254] rcu_preempt kthread starved for 5251 jiffies!
Aug 10 17:05:01 linux CRON[4531]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Aug 10 17:05:33 linux kernel: [ 210.562994] INFO: rcu_preempt detected stalls on CPUs/tasks: { 3} (detected by 1, t=5252 jiffies, g=3875, c=3874, q=14365)
Aug 10 17:05:33 linux kernel: [ 210.562997] Task dump for CPU 3:
Aug 10 17:05:33 linux kernel: [ 210.563003] swapper/3       R  running task        0     0      1 0x00200000
Aug 10 17:05:33 linux kernel: [ 210.563081]  ffffffff8136e2fa ffffffff8101f9e5 ffff8803333eee00 0000000008004000
Aug 10 17:05:33 linux kernel: [ 210.563085]  0000000000000004 0000000000000004 ffffffff8145baf4 0000000000000000
Aug 10 17:05:33 linux kernel: [ 210.563090]  0000000000000000 ffff88033338f020 ffff8803333eee00 ffffffff8190b640
Aug 10 17:05:33 linux kernel: [ 210.563091] Call Trace:
Aug 10 17:05:33 linux kernel: [ 210.563103]  [<ffffffff8136e2fa>] ? intel_idle+0x12a/0x140
Aug 10 17:05:33 linux kernel: [ 210.563111]  [<ffffffff8101f9e5>] ? read_tsc+0x5/0x10
Aug 10 17:05:33 linux kernel: [ 210.563121]  [<ffffffff8145baf4>] ? cpuidle_enter_state+0x74/0x270
Aug 10 17:05:33 linux kernel: [ 210.563133]  [<ffffffff810b9298>] ? cpu_startup_entry+0x3a8/0x4e0
Aug 10 17:05:33 linux kernel: [ 210.563139]  [<ffffffff8104e0d9>] ? start_secondary+0x169/0x190
Aug 10 17:05:33 linux kernel: [ 210.563143] rcu_preempt kthread starved for 5252 jiffies!
Aug 10 17:05:34 linux shutdown[4689]: shutting down for system halt
Aug 10 17:05:34 linux init: Switching to runlevel: 0
Aug 10 17:05:34 linux shutdown[4822]: shutting down for system halt
* Re: [RT] oops in 4.1.3-rt3
From: Thomas Gleixner @ 2015-08-19 20:58 UTC (permalink / raw)
To: Sebastian Andrzej Siewior; +Cc: Nicholas Mc Guire, Clark Williams, RT, LKML
On Sun, 16 Aug 2015, Sebastian Andrzej Siewior wrote:
> * Thomas Gleixner | 2015-08-04 14:05:10 [+0200]:
>
> >On Tue, 4 Aug 2015, Nicholas Mc Guire wrote:
> >> <snip>
> >> 3 root -2 0 0 0 0 R 60.3 0.0 5756:16 ksoftirqd/0
> >> 23 root -2 0 0 0 0 R 50.8 0.0 4948:08 ksoftirqd/2
> >> 17 root -2 0 0 0 0 S 50.6 0.0 4897:13 ksoftirqd/1
> >> 29 root -2 0 0 0 0 S 50.4 0.0 4953:24 ksoftirqd/3
> >
> >So ksoftirqd eats 50+% CPU on each core. On an idle system!?! Any
> >chance that you can get a function trace snapshot out of it?
>
> This might be a .config thing, a device driver, or user-land behaviour.
> Usually ksoftirqd should remain (almost) idle, because most of the BH work
> stays in task context and not in ksoftirqd; tasklets, for instance, are
> what should still run in ksoftirqd. So a function trace, or event tracing
> to figure out what is scheduling the softirq, might give a pointer.
> I have an AMD box here with
>
> 3 root -2 0 0 0 0 S 1.3 0.0 126:16.31 ksoftirqd/0
> 18 root -2 0 0 0 0 S 1.3 0.0 136:55.85 ksoftirqd/1
> 30 root -2 0 0 0 0 S 1.3 0.0 138:50.52 ksoftirqd/3
> 24 root -2 0 0 0 0 S 1.0 0.0 143:54.28 ksoftirqd/2
>
> with 15 days of uptime. That one percent CPU usage on an idle system looks
> odd here, too.
Not really. If ksoftirqd takes 13us per invocation and you run with
HZ=1000 then this is reasonable.
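[tglx's estimate checks out arithmetically; the 13us-per-invocation figure
is his illustrative assumption, not a measurement.]

```shell
# 13 us of ksoftirqd work per timer tick, at HZ=1000 ticks per second,
# is 13 * 1000 = 13,000 us of CPU per 1,000,000 us of wall time:
awk 'BEGIN { printf "%.1f%%\n", 13 * 1000 / 1000000 * 100 }'   # prints 1.3%
```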
Thanks,
tglx
* Re: [RT] oops in 4.1.3-rt3
From: Clark Williams @ 2016-02-04 20:59 UTC (permalink / raw)
To: Sebastian Andrzej Siewior; +Cc: RT, LKML
On Mon, 3 Aug 2015 14:27:00 -0500
Clark Williams <williams@redhat.com> wrote:
> Sebastian,
>
> Below is a traceback I hit while running 4.1.3-rt3 on my Lenovo T530.
>
> I was doing my normal things: playing music, copying files over the LAN,
> doing compiles, doing email, etc., so I can't really point you at a
> reproducer. The graphics system mostly stayed up, but the actual trace I
> hit scrolled off. This was extracted from the syslog with 'journalctl'.
>
> If I encounter it again, I'll see if I can force a kdump.
>
I have been unable to reproduce this bug. It happened during normal use and I haven't changed work habits, so I'd say drop this from the -rt open-issues section until we can get an actual reproducer or at least until the frequency goes up.
Thanks,
Clark
* Re: [RT] oops in 4.1.3-rt3
From: Sebastian Andrzej Siewior @ 2016-03-09 13:21 UTC (permalink / raw)
To: Clark Williams; +Cc: RT, LKML
* Clark Williams | 2016-02-04 14:59:31 [-0600]:
>On Mon, 3 Aug 2015 14:27:00 -0500
>I have been unable to reproduce this bug. It happened during normal use and I haven't changed work habits, so I'd say drop this from the -rt open-issues section until we can get an actual reproducer or at least until the frequency goes up.
I moved it to [0] and dropped it from my announce mail.
[0] https://bugzilla.kernel.org/show_bug.cgi?id=114131
>Thanks,
>Clark
Sebastian
Thread overview: 9+ messages
-- links below jump to the message on this page --
2015-08-03 19:27 [RT] oops in 4.1.3-rt3 Clark Williams
2015-08-04 8:08 ` Nicholas Mc Guire
2015-08-04 12:05 ` Thomas Gleixner
2015-08-16 11:58 ` Sebastian Andrzej Siewior
2015-08-17 3:33 ` Alex Goebel
2015-08-19 20:58 ` Thomas Gleixner
2015-08-16 11:33 ` Sebastian Andrzej Siewior
2016-02-04 20:59 ` Clark Williams
2016-03-09 13:21 ` Sebastian Andrzej Siewior