* 2.6.31-rc1-mmotm0702 - ps command hangs inside kernel
@ 2009-07-13 20:54 Valdis.Kletnieks
  2009-07-13 21:38 ` Andrew Morton
  0 siblings, 1 reply; 7+ messages in thread

From: Valdis.Kletnieks @ 2009-07-13 20:54 UTC (permalink / raw)
  To: linux-kernel, Andrew Morton

[-- Attachment #1: Type: text/plain, Size: 7668 bytes --]

Several times recently, I've had the 'ps' command hang inside the kernel
for extended periods of time - usually around 1100 seconds, but today I
had one that hung there for 2351 seconds.

Info: It's always reading the same file - /proc/<pid>/status, where the
pid is the 'pcscd' daemon. And eventually, it manages to exit on its own.
Whatever is getting horked up resets itself - subsequent 'ps' commands
work just fine.

My best guess is that it's getting onto a squirrely vma in
get_stack_usage_in_bytes() in the for/follow_page loop. Possibly highly
relevant - /usr/sbin/pcscd is one of the few 32-bit binaries running in
a mostly 64-bit runtime.

Here's the traceback of the ps, as reported by 2 alt-sysrq-t several
minutes apart:

ps    R  running task     3936 26646  26580 0x00000080
 ffff88005a599be8 ffffffff81499842 ffff88005a599b78 ffffffff8103589b
 ffffffff81035805 ffff88007e71ea40 ffff880002120f80 ffff8800657d8fe0
 000000000000df78 ffff8800657d8fe0 000000000000df78 ffff8800657d8fe8
Call Trace:
 [<ffffffff81499842>] ? thread_return+0xb6/0xfa
 [<ffffffff8103589b>] ? finish_task_switch+0xd1/0xf4
 [<ffffffff81035805>] ? finish_task_switch+0x3b/0xf4
 [<ffffffff81065f3f>] ? trace_hardirqs_on_caller+0x1f/0x145
 [<ffffffff8149b2a6>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff8101f76b>] ? smp_apic_timer_interrupt+0x81/0x8f
 [<ffffffff81046231>] ? irq_exit+0xaf/0xb4
 [<ffffffff8100bd80>] restore_args+0x0/0x30
 [<ffffffff81065f3f>] ? trace_hardirqs_on_caller+0x1f/0x145
 [<ffffffff8149b2a6>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff8101f76b>] ? smp_apic_timer_interrupt+0x81/0x8f
 [<ffffffff810b8222>] ? IS_ERR+0x25/0x2c
 [<ffffffff810b8f22>] ? follow_page+0x28/0x2e3
 [<ffffffff81121a04>] ? proc_pid_status+0x60c/0x694
 [<ffffffff81066072>] ? trace_hardirqs_on+0xd/0xf
 [<ffffffff8111e42c>] ? proc_single_show+0x57/0x74
 [<ffffffff810e90b4>] ? seq_read+0x249/0x49b
 [<ffffffff8116c840>] ? security_file_permission+0x11/0x13
 [<ffffffff810d05e1>] ? vfs_read+0xe0/0x141
 [<ffffffff810d8191>] ? path_put+0x1d/0x21
 [<ffffffff810d06f8>] ? sys_read+0x45/0x69
 [<ffffffff8100b2ab>] ? system_call_fastpath+0x16/0x1b

ps    R  running task     3936 26646  26580 0x00000080
 ffff88005a599bd8 ffffffff81499842 ffff88000213bf80 0000000000000000
 ffff88005a599b78 ffffffff8103589b ffffffff81035805 ffff88007e71ea40
 ffff88000213bf80 ffff8800657d8fe0 000000000000df78 ffff8800657d8fe8
Call Trace:
 [<ffffffff81499842>] ? thread_return+0xb6/0xfa
 [<ffffffff8103589b>] ? finish_task_switch+0xd1/0xf4
 [<ffffffff81035805>] ? finish_task_switch+0x3b/0xf4
 [<ffffffff81065f3f>] ? trace_hardirqs_on_caller+0x1f/0x145
 [<ffffffff8149b2a6>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff8101f76b>] smp_apic_timer_interrupt+0x81/0x8f
 [<ffffffff81046231>] ? irq_exit+0xaf/0xb4
 [<ffffffff8100bd80>] ? restore_args+0x0/0x30
 [<ffffffff810b8222>] ? IS_ERR+0x25/0x2c
 [<ffffffff810b8222>] ? IS_ERR+0x25/0x2c
 [<ffffffff810b8f22>] ? follow_page+0x28/0x2e3
 [<ffffffff810b91d9>] ? follow_page+0x2df/0x2e3
 [<ffffffff811219d8>] ? proc_pid_status+0x5e0/0x694
 [<ffffffff81066072>] ? trace_hardirqs_on+0xd/0xf
 [<ffffffff8111e42c>] ? proc_single_show+0x57/0x74
 [<ffffffff810e90b4>] ? seq_read+0x249/0x49b
 [<ffffffff8116c840>] ? security_file_permission+0x11/0x13
 [<ffffffff810d05e1>] ? vfs_read+0xe0/0x141
 [<ffffffff810d8191>] ? path_put+0x1d/0x21
 [<ffffffff810d06f8>] ? sys_read+0x45/0x69
 [<ffffffff8100b2ab>] ? system_call_fastpath+0x16/0x1b

Here's the output of 'cat /proc/2031/maps' currently - I didn't think to
get one during the last event:

# cat /proc/2031/maps
08047000-0805e000 r-xp 00000000 fe:05 51682 /usr/sbin/pcscd
0805e000-0805f000 rw-p 00016000 fe:05 51682 /usr/sbin/pcscd
0805f000-080e8000 rw-p 00000000 00:00 0
09990000-099b1000 rw-p 00000000 00:00 0 [heap]
44368000-4436f000 r-xp 00000000 fe:05 20682 /usr/lib/libusb-0.1.so.4.4.4
4436f000-44371000 rw-p 00006000 fe:05 20682 /usr/lib/libusb-0.1.so.4.4.4
45c81000-45cab000 r-xp 00000000 fe:00 8715 /lib/libgcc_s-4.4.0-20090708.so.1
45cab000-45cac000 rw-p 0002a000 fe:00 8715 /lib/libgcc_s-4.4.0-20090708.so.1
47c00000-47c20000 r-xp 00000000 fe:00 8266 /lib/ld-2.10.1.so
47c20000-47c21000 r--p 0001f000 fe:00 8266 /lib/ld-2.10.1.so
47c21000-47c22000 rw-p 00020000 fe:00 8266 /lib/ld-2.10.1.so
47c24000-47d8f000 r-xp 00000000 fe:00 8360 /lib/libc-2.10.1.so
47d8f000-47d90000 ---p 0016b000 fe:00 8360 /lib/libc-2.10.1.so
47d90000-47d92000 r--p 0016b000 fe:00 8360 /lib/libc-2.10.1.so
47d92000-47d93000 rw-p 0016d000 fe:00 8360 /lib/libc-2.10.1.so
47d93000-47d96000 rw-p 00000000 00:00 0
47d98000-47d9b000 r-xp 00000000 fe:00 8388 /lib/libdl-2.10.1.so
47d9b000-47d9c000 r--p 00002000 fe:00 8388 /lib/libdl-2.10.1.so
47d9c000-47d9d000 rw-p 00003000 fe:00 8388 /lib/libdl-2.10.1.so
47d9f000-47db5000 r-xp 00000000 fe:00 8675 /lib/libpthread-2.10.1.so
47db5000-47db6000 ---p 00016000 fe:00 8675 /lib/libpthread-2.10.1.so
47db6000-47db7000 r--p 00016000 fe:00 8675 /lib/libpthread-2.10.1.so
47db7000-47db8000 rw-p 00017000 fe:00 8675 /lib/libpthread-2.10.1.so
47db8000-47dba000 rw-p 00000000 00:00 0
f6c00000-f6c21000 rw-p 00000000 00:00 0
f6c21000-f6d00000 ---p 00000000 00:00 0
f6dff000-f6e00000 ---p 00000000 00:00 0
f6e00000-f7600000 rw-p 00000000 00:00 0
f7600000-f7621000 rw-p 00000000 00:00 0
f7621000-f7700000 ---p 00000000 00:00 0
f773b000-f773c000 ---p 00000000 00:00 0
f773c000-f7f3f000 rw-p 00000000 00:00 0
f7f50000-f7f51000 rw-s 0000f000 fe:09 86049 /var/run/pcscd.pub
f7f51000-f7f52000 rw-s 0000e000 fe:09 86049 /var/run/pcscd.pub
f7f52000-f7f53000 rw-s 0000d000 fe:09 86049 /var/run/pcscd.pub
f7f53000-f7f54000 rw-s 0000c000 fe:09 86049 /var/run/pcscd.pub
f7f54000-f7f55000 rw-s 0000b000 fe:09 86049 /var/run/pcscd.pub
f7f55000-f7f56000 rw-s 0000a000 fe:09 86049 /var/run/pcscd.pub
f7f56000-f7f57000 rw-s 00009000 fe:09 86049 /var/run/pcscd.pub
f7f57000-f7f58000 rw-s 00008000 fe:09 86049 /var/run/pcscd.pub
f7f58000-f7f59000 rw-s 00007000 fe:09 86049 /var/run/pcscd.pub
f7f59000-f7f5a000 rw-s 00006000 fe:09 86049 /var/run/pcscd.pub
f7f5a000-f7f5b000 rw-s 00005000 fe:09 86049 /var/run/pcscd.pub
f7f5b000-f7f5c000 rw-s 00004000 fe:09 86049 /var/run/pcscd.pub
f7f5c000-f7f5d000 rw-s 00003000 fe:09 86049 /var/run/pcscd.pub
f7f5d000-f7f5e000 rw-s 00002000 fe:09 86049 /var/run/pcscd.pub
f7f5e000-f7f5f000 rw-s 00001000 fe:09 86049 /var/run/pcscd.pub
f7f5f000-f7f60000 rw-s 00000000 fe:09 86049 /var/run/pcscd.pub
f7f60000-f7f61000 r-xp 00000000 00:00 0 [vdso]
fffe8000-ffffd000 rw-p 00000000 00:00 0 [stack]

[-- Attachment #2: Type: application/pgp-signature, Size: 226 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread
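[Editor's note: get_stack_usage_in_bytes() itself is not shown in this thread, so the loop below is only a guess at its shape based on the description above ("the for/follow_page loop") - a userspace model, not the mmotm code. The point it illustrates is that a per-page probe loop costs one follow_page() call per page in the range, so its runtime depends entirely on the loop bounds being what the caller assumed:]

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE 4096UL

/* Toy residency map standing in for the page tables: entry i nonzero
 * means the i-th page of the walked range is present. */
static const unsigned char *resident;
static unsigned long nresident;

static void set_residency(const unsigned char *map, unsigned long n)
{
	resident = map;
	nresident = n;
}

/* Stand-in for follow_page(): does this page of the range have a
 * physical page behind it? */
static int fake_follow_page(unsigned long idx)
{
	return idx < nresident && resident[idx];
}

/*
 * Guessed shape of the suspect loop: probe the range one page at a
 * time until the first non-present page, counting touched pages.
 * The cost is linear in (end - start) / PAGE_SIZE - harmless for a
 * small stack vma, painful if the bounds are not what was assumed.
 */
static unsigned long count_touched_pages(unsigned long start,
					 unsigned long end)
{
	unsigned long addr, pages = 0;

	for (addr = start; addr < end; addr += PAGE_SIZE) {
		if (!fake_follow_page((addr - start) / PAGE_SIZE))
			break;
		pages++;
	}
	return pages;
}
```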
* Re: 2.6.31-rc1-mmotm0702 - ps command hangs inside kernel
  2009-07-13 20:54 2.6.31-rc1-mmotm0702 - ps command hangs inside kernel Valdis.Kletnieks
@ 2009-07-13 21:38 ` Andrew Morton
  2009-07-14  5:31   ` Stefani Seibold
  0 siblings, 1 reply; 7+ messages in thread

From: Andrew Morton @ 2009-07-13 21:38 UTC (permalink / raw)
  To: Valdis.Kletnieks; +Cc: linux-kernel, Stefani Seibold

On Mon, 13 Jul 2009 16:54:51 -0400
Valdis.Kletnieks@vt.edu wrote:

> Several times recently, I've had the 'ps' command hang inside the kernel
> for extended periods of time - usually around 1100 seconds, but today I
> had one that hung there for 2351 seconds.
>
> Info: It's always reading the same file - /proc/<pid>/status, where the
> pid is the 'pcscd' daemon. And eventually, it manages to exit on its own.
> Whatever is getting horked up resets itself - subsequent 'ps' commands work
> just fine.
>
> My best guess is that it's getting onto a squirrely vma in
> get_stack_usage_in_bytes() in the for/follow_page loop. Possibly highly
> relevant - /usr/sbin/pcscd is one of the few 32-bit binaries running in
> a mostly 64-bit runtime.

OK, thanks for the analysis.

> Here's the traceback of the ps, as reported by 2 alt-sysrq-t several
> minutes apart:
>
> ps    R  running task     3936 26646  26580 0x00000080
>  ffff88005a599be8 ffffffff81499842 ffff88005a599b78 ffffffff8103589b
>  ffffffff81035805 ffff88007e71ea40 ffff880002120f80 ffff8800657d8fe0
>  000000000000df78 ffff8800657d8fe0 000000000000df78 ffff8800657d8fe8
> Call Trace:
> [<ffffffff81499842>] ? thread_return+0xb6/0xfa
> [<ffffffff8103589b>] ? finish_task_switch+0xd1/0xf4
> [<ffffffff81035805>] ? finish_task_switch+0x3b/0xf4
> [<ffffffff81065f3f>] ? trace_hardirqs_on_caller+0x1f/0x145
> [<ffffffff8149b2a6>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> [<ffffffff8101f76b>] ? smp_apic_timer_interrupt+0x81/0x8f
> [<ffffffff81046231>] ? irq_exit+0xaf/0xb4
> [<ffffffff8100bd80>] restore_args+0x0/0x30
> [<ffffffff81065f3f>] ? trace_hardirqs_on_caller+0x1f/0x145
> [<ffffffff8149b2a6>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> [<ffffffff8101f76b>] ? smp_apic_timer_interrupt+0x81/0x8f
> [<ffffffff810b8222>] ? IS_ERR+0x25/0x2c
> [<ffffffff810b8f22>] ? follow_page+0x28/0x2e3
> [<ffffffff81121a04>] ? proc_pid_status+0x60c/0x694
> [<ffffffff81066072>] ? trace_hardirqs_on+0xd/0xf
> [<ffffffff8111e42c>] ? proc_single_show+0x57/0x74
> [<ffffffff810e90b4>] ? seq_read+0x249/0x49b
> [<ffffffff8116c840>] ? security_file_permission+0x11/0x13
> [<ffffffff810d05e1>] ? vfs_read+0xe0/0x141
> [<ffffffff810d8191>] ? path_put+0x1d/0x21
> [<ffffffff810d06f8>] ? sys_read+0x45/0x69
> [<ffffffff8100b2ab>] ? system_call_fastpath+0x16/0x1b
>
> ps    R  running task     3936 26646  26580 0x00000080
>  ffff88005a599bd8 ffffffff81499842 ffff88000213bf80 0000000000000000
>  ffff88005a599b78 ffffffff8103589b ffffffff81035805 ffff88007e71ea40
>  ffff88000213bf80 ffff8800657d8fe0 000000000000df78 ffff8800657d8fe8
> Call Trace:
> [<ffffffff81499842>] ? thread_return+0xb6/0xfa
> [<ffffffff8103589b>] ? finish_task_switch+0xd1/0xf4
> [<ffffffff81035805>] ? finish_task_switch+0x3b/0xf4
> [<ffffffff81065f3f>] ? trace_hardirqs_on_caller+0x1f/0x145
> [<ffffffff8149b2a6>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> [<ffffffff8101f76b>] smp_apic_timer_interrupt+0x81/0x8f
> [<ffffffff81046231>] ? irq_exit+0xaf/0xb4
> [<ffffffff8100bd80>] ? restore_args+0x0/0x30
> [<ffffffff810b8222>] ? IS_ERR+0x25/0x2c
> [<ffffffff810b8222>] ? IS_ERR+0x25/0x2c
> [<ffffffff810b8f22>] ? follow_page+0x28/0x2e3
> [<ffffffff810b91d9>] ? follow_page+0x2df/0x2e3
> [<ffffffff811219d8>] ? proc_pid_status+0x5e0/0x694
> [<ffffffff81066072>] ? trace_hardirqs_on+0xd/0xf
> [<ffffffff8111e42c>] ? proc_single_show+0x57/0x74
> [<ffffffff810e90b4>] ? seq_read+0x249/0x49b
> [<ffffffff8116c840>] ? security_file_permission+0x11/0x13
> [<ffffffff810d05e1>] ? vfs_read+0xe0/0x141
> [<ffffffff810d8191>] ? path_put+0x1d/0x21
> [<ffffffff810d06f8>] ? sys_read+0x45/0x69
> [<ffffffff8100b2ab>] ? system_call_fastpath+0x16/0x1b

Another possibility is that pcscd has gone off on a long in-kernel sulk
(polling hardware?) while holding some lock which ps needs (eg, mmap_sem).

It would be useful if you can grab a pcscd backtrace during the stall.

I have a new version of those patches in my inbox to look at - I'm a
bit swamped in backlog at present. I could drop the old version from
my tree for a while, but it's unclear how that would help us find the
cause of the problem.

> Here's the output of 'cat /proc/2031/maps' currently - I didn't think to
> get one during the last event:
>
> # cat /proc/2031/maps
> 08047000-0805e000 r-xp 00000000 fe:05 51682 /usr/sbin/pcscd
> 0805e000-0805f000 rw-p 00016000 fe:05 51682 /usr/sbin/pcscd
> 0805f000-080e8000 rw-p 00000000 00:00 0
> 09990000-099b1000 rw-p 00000000 00:00 0 [heap]
> 44368000-4436f000 r-xp 00000000 fe:05 20682 /usr/lib/libusb-0.1.so.4.4.4
> 4436f000-44371000 rw-p 00006000 fe:05 20682 /usr/lib/libusb-0.1.so.4.4.4
> 45c81000-45cab000 r-xp 00000000 fe:00 8715 /lib/libgcc_s-4.4.0-20090708.so.1
> 45cab000-45cac000 rw-p 0002a000 fe:00 8715 /lib/libgcc_s-4.4.0-20090708.so.1
> 47c00000-47c20000 r-xp 00000000 fe:00 8266 /lib/ld-2.10.1.so
> 47c20000-47c21000 r--p 0001f000 fe:00 8266 /lib/ld-2.10.1.so
> 47c21000-47c22000 rw-p 00020000 fe:00 8266 /lib/ld-2.10.1.so
> 47c24000-47d8f000 r-xp 00000000 fe:00 8360 /lib/libc-2.10.1.so
> 47d8f000-47d90000 ---p 0016b000 fe:00 8360 /lib/libc-2.10.1.so
> 47d90000-47d92000 r--p 0016b000 fe:00 8360 /lib/libc-2.10.1.so
> 47d92000-47d93000 rw-p 0016d000 fe:00 8360 /lib/libc-2.10.1.so
> 47d93000-47d96000 rw-p 00000000 00:00 0
> 47d98000-47d9b000 r-xp 00000000 fe:00 8388 /lib/libdl-2.10.1.so
> 47d9b000-47d9c000 r--p 00002000 fe:00 8388 /lib/libdl-2.10.1.so
> 47d9c000-47d9d000 rw-p 00003000 fe:00 8388 /lib/libdl-2.10.1.so
> 47d9f000-47db5000 r-xp 00000000 fe:00 8675 /lib/libpthread-2.10.1.so
> 47db5000-47db6000 ---p 00016000 fe:00 8675 /lib/libpthread-2.10.1.so
> 47db6000-47db7000 r--p 00016000 fe:00 8675 /lib/libpthread-2.10.1.so
> 47db7000-47db8000 rw-p 00017000 fe:00 8675 /lib/libpthread-2.10.1.so
> 47db8000-47dba000 rw-p 00000000 00:00 0
> f6c00000-f6c21000 rw-p 00000000 00:00 0
> f6c21000-f6d00000 ---p 00000000 00:00 0
> f6dff000-f6e00000 ---p 00000000 00:00 0
> f6e00000-f7600000 rw-p 00000000 00:00 0
> f7600000-f7621000 rw-p 00000000 00:00 0
> f7621000-f7700000 ---p 00000000 00:00 0
> f773b000-f773c000 ---p 00000000 00:00 0
> f773c000-f7f3f000 rw-p 00000000 00:00 0
> f7f50000-f7f51000 rw-s 0000f000 fe:09 86049 /var/run/pcscd.pub
> f7f51000-f7f52000 rw-s 0000e000 fe:09 86049 /var/run/pcscd.pub
> f7f52000-f7f53000 rw-s 0000d000 fe:09 86049 /var/run/pcscd.pub
> f7f53000-f7f54000 rw-s 0000c000 fe:09 86049 /var/run/pcscd.pub
> f7f54000-f7f55000 rw-s 0000b000 fe:09 86049 /var/run/pcscd.pub
> f7f55000-f7f56000 rw-s 0000a000 fe:09 86049 /var/run/pcscd.pub
> f7f56000-f7f57000 rw-s 00009000 fe:09 86049 /var/run/pcscd.pub
> f7f57000-f7f58000 rw-s 00008000 fe:09 86049 /var/run/pcscd.pub
> f7f58000-f7f59000 rw-s 00007000 fe:09 86049 /var/run/pcscd.pub
> f7f59000-f7f5a000 rw-s 00006000 fe:09 86049 /var/run/pcscd.pub
> f7f5a000-f7f5b000 rw-s 00005000 fe:09 86049 /var/run/pcscd.pub
> f7f5b000-f7f5c000 rw-s 00004000 fe:09 86049 /var/run/pcscd.pub
> f7f5c000-f7f5d000 rw-s 00003000 fe:09 86049 /var/run/pcscd.pub
> f7f5d000-f7f5e000 rw-s 00002000 fe:09 86049 /var/run/pcscd.pub
> f7f5e000-f7f5f000 rw-s 00001000 fe:09 86049 /var/run/pcscd.pub
> f7f5f000-f7f60000 rw-s 00000000 fe:09 86049 /var/run/pcscd.pub
> f7f60000-f7f61000 r-xp 00000000 00:00 0 [vdso]
> fffe8000-ffffd000 rw-p 00000000 00:00 0 [stack]
* Re: 2.6.31-rc1-mmotm0702 - ps command hangs inside kernel
  2009-07-13 21:38 ` Andrew Morton
@ 2009-07-14  5:31   ` Stefani Seibold
  2009-07-16 19:12     ` Valdis.Kletnieks
  0 siblings, 1 reply; 7+ messages in thread

From: Stefani Seibold @ 2009-07-14 5:31 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Valdis.Kletnieks, linux-kernel

Hi,

On Monday, 13.07.2009, at 14:38 -0700, Andrew Morton wrote:
> On Mon, 13 Jul 2009 16:54:51 -0400
> Valdis.Kletnieks@vt.edu wrote:
>
> > Several times recently, I've had the 'ps' command hang inside the kernel
> > for extended periods of time - usually around 1100 seconds, but today I
> > had one that hung there for 2351 seconds.
> >
> > Info: It's always reading the same file - /proc/<pid>/status, where the
> > pid is the 'pcscd' daemon. And eventually, it manages to exit on its own.
> > Whatever is getting horked up resets itself - subsequent 'ps' commands work
> > just fine.
> >
> > My best guess is that it's getting onto a squirrely vma in
> > get_stack_usage_in_bytes() in the for/follow_page loop. Possibly highly
> > relevant - /usr/sbin/pcscd is one of the few 32-bit binaries running in
> > a mostly 64-bit runtime.

I am the author of get_stack_usage_in_bytes(). Because I currently have
no 64-bit machine running, I am not able to analyse your problem. Does it
only happen with 32-bit applications on a 64-bit kernel? Does it affect
only pcscd?

Can you give me more accurate information about what exactly the problem
is?

> OK, thanks for the analysis.
>
> > Here's the traceback of the ps, as reported by 2 alt-sysrq-t several
> > minutes apart:
> >
> > ps    R  running task     3936 26646  26580 0x00000080
> >  ffff88005a599bd8 ffffffff81499842 ffff88000213bf80 0000000000000000
> >  ffff88005a599b78 ffffffff8103589b ffffffff81035805 ffff88007e71ea40
> >  ffff88000213bf80 ffff8800657d8fe0 000000000000df78 ffff8800657d8fe8
> > Call Trace:
> > [<ffffffff81499842>] ? thread_return+0xb6/0xfa
> > [<ffffffff8103589b>] ? finish_task_switch+0xd1/0xf4
> > [<ffffffff81035805>] ? finish_task_switch+0x3b/0xf4
> > [<ffffffff81065f3f>] ? trace_hardirqs_on_caller+0x1f/0x145
> > [<ffffffff8149b2a6>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> > [<ffffffff8101f76b>] smp_apic_timer_interrupt+0x81/0x8f
> > [<ffffffff81046231>] ? irq_exit+0xaf/0xb4
> > [<ffffffff8100bd80>] ? restore_args+0x0/0x30
> > [<ffffffff810b8222>] ? IS_ERR+0x25/0x2c
> > [<ffffffff810b8222>] ? IS_ERR+0x25/0x2c
> > [<ffffffff810b8f22>] ? follow_page+0x28/0x2e3
> > [<ffffffff810b91d9>] ? follow_page+0x2df/0x2e3
> > [<ffffffff811219d8>] ? proc_pid_status+0x5e0/0x694
> > [<ffffffff81066072>] ? trace_hardirqs_on+0xd/0xf
> > [<ffffffff8111e42c>] ? proc_single_show+0x57/0x74
> > [<ffffffff810e90b4>] ? seq_read+0x249/0x49b
> > [<ffffffff8116c840>] ? security_file_permission+0x11/0x13
> > [<ffffffff810d05e1>] ? vfs_read+0xe0/0x141
> > [<ffffffff810d8191>] ? path_put+0x1d/0x21
> > [<ffffffff810d06f8>] ? sys_read+0x45/0x69
> > [<ffffffff8100b2ab>] ? system_call_fastpath+0x16/0x1b

The double follow_page looks strange to me... I will have a look at it.

> Another possibility is that pcscd has gone off on a long in-kernel sulk
> (polling hardware?) while holding some lock which ps needs (eg, mmap_sem).
>
> It would be useful if you can grab a pcscd backtrace during the stall.
>
> I have a new version of those patches in my inbox to look at - I'm a
> bit swamped in backlog at present. I could drop the old version from
> my tree for a while, but it's unclear how that would help us find the
> cause of the problem.
> > Here's the output of 'cat /proc/2031/maps' currently - I didn't think to
> > get one during the last event:
> >
> > # cat /proc/2031/maps
> > 08047000-0805e000 r-xp 00000000 fe:05 51682 /usr/sbin/pcscd
> > 0805e000-0805f000 rw-p 00016000 fe:05 51682 /usr/sbin/pcscd
> > 0805f000-080e8000 rw-p 00000000 00:00 0
> > 09990000-099b1000 rw-p 00000000 00:00 0 [heap]
> > 44368000-4436f000 r-xp 00000000 fe:05 20682 /usr/lib/libusb-0.1.so.4.4.4
> > 4436f000-44371000 rw-p 00006000 fe:05 20682 /usr/lib/libusb-0.1.so.4.4.4
> > 45c81000-45cab000 r-xp 00000000 fe:00 8715 /lib/libgcc_s-4.4.0-20090708.so.1
> > 45cab000-45cac000 rw-p 0002a000 fe:00 8715 /lib/libgcc_s-4.4.0-20090708.so.1
> > 47c00000-47c20000 r-xp 00000000 fe:00 8266 /lib/ld-2.10.1.so
> > 47c20000-47c21000 r--p 0001f000 fe:00 8266 /lib/ld-2.10.1.so
> > 47c21000-47c22000 rw-p 00020000 fe:00 8266 /lib/ld-2.10.1.so
> > 47c24000-47d8f000 r-xp 00000000 fe:00 8360 /lib/libc-2.10.1.so
> > 47d8f000-47d90000 ---p 0016b000 fe:00 8360 /lib/libc-2.10.1.so
> > 47d90000-47d92000 r--p 0016b000 fe:00 8360 /lib/libc-2.10.1.so
> > 47d92000-47d93000 rw-p 0016d000 fe:00 8360 /lib/libc-2.10.1.so
> > 47d93000-47d96000 rw-p 00000000 00:00 0
> > 47d98000-47d9b000 r-xp 00000000 fe:00 8388 /lib/libdl-2.10.1.so
> > 47d9b000-47d9c000 r--p 00002000 fe:00 8388 /lib/libdl-2.10.1.so
> > 47d9c000-47d9d000 rw-p 00003000 fe:00 8388 /lib/libdl-2.10.1.so
> > 47d9f000-47db5000 r-xp 00000000 fe:00 8675 /lib/libpthread-2.10.1.so
> > 47db5000-47db6000 ---p 00016000 fe:00 8675 /lib/libpthread-2.10.1.so
> > 47db6000-47db7000 r--p 00016000 fe:00 8675 /lib/libpthread-2.10.1.so
> > 47db7000-47db8000 rw-p 00017000 fe:00 8675 /lib/libpthread-2.10.1.so
> > 47db8000-47dba000 rw-p 00000000 00:00 0
> > f6c00000-f6c21000 rw-p 00000000 00:00 0
> > f6c21000-f6d00000 ---p 00000000 00:00 0
> > f6dff000-f6e00000 ---p 00000000 00:00 0
> > f6e00000-f7600000 rw-p 00000000 00:00 0
> > f7600000-f7621000 rw-p 00000000 00:00 0
> > f7621000-f7700000 ---p 00000000 00:00 0
> > f773b000-f773c000 ---p 00000000 00:00 0
> > f773c000-f7f3f000 rw-p 00000000 00:00 0
> > f7f50000-f7f51000 rw-s 0000f000 fe:09 86049 /var/run/pcscd.pub
> > f7f51000-f7f52000 rw-s 0000e000 fe:09 86049 /var/run/pcscd.pub
> > f7f52000-f7f53000 rw-s 0000d000 fe:09 86049 /var/run/pcscd.pub
> > f7f53000-f7f54000 rw-s 0000c000 fe:09 86049 /var/run/pcscd.pub
> > f7f54000-f7f55000 rw-s 0000b000 fe:09 86049 /var/run/pcscd.pub
> > f7f55000-f7f56000 rw-s 0000a000 fe:09 86049 /var/run/pcscd.pub
> > f7f56000-f7f57000 rw-s 00009000 fe:09 86049 /var/run/pcscd.pub
> > f7f57000-f7f58000 rw-s 00008000 fe:09 86049 /var/run/pcscd.pub
> > f7f58000-f7f59000 rw-s 00007000 fe:09 86049 /var/run/pcscd.pub
> > f7f59000-f7f5a000 rw-s 00006000 fe:09 86049 /var/run/pcscd.pub
> > f7f5a000-f7f5b000 rw-s 00005000 fe:09 86049 /var/run/pcscd.pub
> > f7f5b000-f7f5c000 rw-s 00004000 fe:09 86049 /var/run/pcscd.pub
> > f7f5c000-f7f5d000 rw-s 00003000 fe:09 86049 /var/run/pcscd.pub
> > f7f5d000-f7f5e000 rw-s 00002000 fe:09 86049 /var/run/pcscd.pub
> > f7f5e000-f7f5f000 rw-s 00001000 fe:09 86049 /var/run/pcscd.pub
> > f7f5f000-f7f60000 rw-s 00000000 fe:09 86049 /var/run/pcscd.pub
> > f7f60000-f7f61000 r-xp 00000000 00:00 0 [vdso]
> > fffe8000-ffffd000 rw-p 00000000 00:00 0 [stack]
* Re: 2.6.31-rc1-mmotm0702 - ps command hangs inside kernel
  2009-07-14  5:31 ` Stefani Seibold
@ 2009-07-16 19:12   ` Valdis.Kletnieks
  2009-07-16 20:24     ` Stefani Seibold
  0 siblings, 1 reply; 7+ messages in thread

From: Valdis.Kletnieks @ 2009-07-16 19:12 UTC (permalink / raw)
  To: Stefani Seibold; +Cc: Andrew Morton, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 11966 bytes --]

On Tue, 14 Jul 2009 07:31:19 +0200, Stefani Seibold said:
> On Monday, 13.07.2009, at 14:38 -0700, Andrew Morton wrote:
> > On Mon, 13 Jul 2009 16:54:51 -0400
> > Valdis.Kletnieks@vt.edu wrote:
> >
> > > Several times recently, I've had the 'ps' command hang inside the kernel
> > > for extended periods of time - usually around 1100 seconds, but today I
> > > had one that hung there for 2351 seconds.
>
> I am the author of get_stack_usage_in_bytes(). Because I currently have
> no 64-bit machine running, I am not able to analyse your problem. Does it
> only happen with 32-bit applications on a 64-bit kernel? Does it affect
> only pcscd?

I've only seen it happen to pcscd. However, most of the time it's one of
the very few 32-bit apps running on my laptop (I've got exactly *one* legacy
app for a secure-token that is stuck in 32-bit land). So I can't tell if it's
a generic 32-bit issue.

> Can you give me more accurate information about what exactly the problem
> is?
>
> > OK, thanks for the analysis.
> >
> > > Here's the traceback of the ps, as reported by 2 alt-sysrq-t several
> > > minutes apart:
> > >
> > > ps    R  running task     3936 26646  26580 0x00000080
> > >  ffff88005a599bd8 ffffffff81499842 ffff88000213bf80 0000000000000000
> > >  ffff88005a599b78 ffffffff8103589b ffffffff81035805 ffff88007e71ea40
> > >  ffff88000213bf80 ffff8800657d8fe0 000000000000df78 ffff8800657d8fe8
> > > Call Trace:
> > > [<ffffffff81499842>] ? thread_return+0xb6/0xfa
> > > [<ffffffff8103589b>] ? finish_task_switch+0xd1/0xf4
> > > [<ffffffff81035805>] ? finish_task_switch+0x3b/0xf4
> > > [<ffffffff81065f3f>] ? trace_hardirqs_on_caller+0x1f/0x145
> > > [<ffffffff8149b2a6>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> > > [<ffffffff8101f76b>] smp_apic_timer_interrupt+0x81/0x8f
> > > [<ffffffff81046231>] ? irq_exit+0xaf/0xb4
> > > [<ffffffff8100bd80>] ? restore_args+0x0/0x30
> > > [<ffffffff810b8222>] ? IS_ERR+0x25/0x2c
> > > [<ffffffff810b8222>] ? IS_ERR+0x25/0x2c
> > > [<ffffffff810b8f22>] ? follow_page+0x28/0x2e3
> > > [<ffffffff810b91d9>] ? follow_page+0x2df/0x2e3
> > > [<ffffffff811219d8>] ? proc_pid_status+0x5e0/0x694
> > > [<ffffffff81066072>] ? trace_hardirqs_on+0xd/0xf
> > > [<ffffffff8111e42c>] ? proc_single_show+0x57/0x74
> > > [<ffffffff810e90b4>] ? seq_read+0x249/0x49b
> > > [<ffffffff8116c840>] ? security_file_permission+0x11/0x13
> > > [<ffffffff810d05e1>] ? vfs_read+0xe0/0x141
> > > [<ffffffff810d8191>] ? path_put+0x1d/0x21
> > > [<ffffffff810d06f8>] ? sys_read+0x45/0x69
> > > [<ffffffff8100b2ab>] ? system_call_fastpath+0x16/0x1b
>
> The double follow_page looks strange to me... I will have a look at it.

It's possible that one of the two follow_page() entries is stale and just
happened to be left on the stack. A large chunk of proc_pid_status() is
inlined, so it's possible that two calls were made and left their return
addresses in different locations on the stack.

I am pretty sure that follow_page+0x28 is the correct one, as I see it
in 2 more tracebacks today (see below)...

Got a better idea than just sticking in a printk() to dump the starting
values of stkpage and vma->vm_start in get_stack_usage_in_bytes()? I
suspect that for a 32-bit process, those 2 values aren't lined up the way
we think they are, and as a result that for loop ends up walking across
a *lot* of pages unintentionally.

> > Another possibility is that pcscd has gone off on a long in-kernel sulk
> > (polling hardware?) while holding some lock which ps needs (eg, mmap_sem).
> >
> > It would be useful if you can grab a pcscd backtrace during the stall.
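[Editor's note: the arithmetic behind that suspicion is easy to check against the maps dump earlier in the thread, which shows the 32-bit stack vma at fffe8000-ffffd000. If the walk covers just that vma it is 21 pages; if the start address were mistakenly far below it - the low bound below is illustrative, borrowed from the lowest mapping in the dump, not a claim about what the code actually does - a per-page walk would cover about a million 4 KiB pages:]

```c
#include <assert.h>

#define PAGE_SIZE 4096UL

/* Number of 4 KiB pages a per-page walk covers between two addresses. */
static unsigned long pages_between(unsigned long start, unsigned long end)
{
	return (end - start) / PAGE_SIZE;
}
```

With addresses taken from the maps dump, `pages_between(0xfffe8000, 0xffffd000)` is 21, while a hypothetical walk from the lowest mapping, `pages_between(0x08047000, 0xffffd000)`, is over a million iterations - enough to explain a multi-minute stall if each step probes the page tables.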
Got bit again this morning - here are the relevant tracebacks. Looks like
pcscd is off sulking in select() and nanosleep(), which are pretty normal
places for programs to go sulk. (pcscd has 2 threads, apparently. I had
2 ps commands hung up, thus the two entries for it)

pcscd S 0000000000000000  4656  2100      1 0x00020080
 ffff88007da3f948 0000000000000046 ffff88007da3f8b8 ffffffff81052b2e
 ffff88007f90e340 ffff88007e08db30 0000000000000000 0000000000000001
 ffff88007da3f8d8 ffff880074ee4fa0 000000000000df78 ffff880074ee4fa0
Call Trace:
 [<ffffffff81052b2e>] ? queue_work_on+0x5e/0x6c
 [<ffffffff81056bec>] ? add_wait_queue+0x1b/0x42
 [<ffffffff8149a70b>] schedule_hrtimeout_range+0x3f/0x11f
 [<ffffffff8149bb91>] ? _spin_unlock_irqrestore+0x72/0x80
 [<ffffffff81056c0b>] ? add_wait_queue+0x3a/0x42
 [<ffffffff81065f3f>] ? trace_hardirqs_on_caller+0x1f/0x145
 [<ffffffff81066072>] ? trace_hardirqs_on+0xd/0xf
 [<ffffffff810de040>] poll_schedule_timeout+0x33/0x4c
 [<ffffffff810deb82>] do_select+0x4b7/0x4f5
 [<ffffffff810df064>] ? __pollwait+0x0/0xc7
 [<ffffffff810df12b>] ? pollwake+0x0/0x51
 [<ffffffff810a8d3a>] ? get_page_from_freelist+0x38a/0x63c
 [<ffffffff811bb5c4>] ? list_del+0xbc/0xea
 [<ffffffff810a7561>] ? __rmqueue+0x124/0x2bf
 [<ffffffff81034de2>] ? get_parent_ip+0x11/0x42
 [<ffffffff8149e770>] ? sub_preempt_count+0x35/0x49
 [<ffffffff810a8ed8>] ? get_page_from_freelist+0x528/0x63c
 [<ffffffff8105b908>] ? __update_sched_clock+0x2f/0x8e
 [<ffffffff8105bb72>] ? sched_clock_cpu+0x20b/0x219
 [<ffffffff81065f3f>] ? trace_hardirqs_on_caller+0x1f/0x145
 [<ffffffff81066072>] ? trace_hardirqs_on+0xd/0xf
 [<ffffffff810542bf>] ? alloc_pid+0x2ce/0x3e3
 [<ffffffff810542bf>] ? alloc_pid+0x2ce/0x3e3
 [<ffffffff81034de2>] ? get_parent_ip+0x11/0x42
 [<ffffffff81065f3f>] ? trace_hardirqs_on_caller+0x1f/0x145
 [<ffffffff81107bf7>] compat_core_sys_select+0x183/0x23d
 [<ffffffff8149e770>] ? sub_preempt_count+0x35/0x49
 [<ffffffff8149bb91>] ? _spin_unlock_irqrestore+0x72/0x80
 [<ffffffff81030930>] ? task_rq_unlock+0xc/0xe
 [<ffffffff8103ab25>] ? wake_up_new_task+0x169/0x174
 [<ffffffff8103eaf6>] ? do_fork+0x37a/0x424
 [<ffffffff8105e2e2>] ? current_kernel_time+0x28/0x50
 [<ffffffff81107f48>] compat_sys_select+0x96/0xbe
 [<ffffffff8107d8b0>] ? audit_syscall_entry+0x170/0x19c
 [<ffffffff8102b538>] sysenter_dispatch+0x7/0x2c

pcscd S ffff88007e1fbf48  5640  2106      1 0x00020080
 ffff88007e1fbe28 0000000000000046 ffff88007e1fbd88 ffffffff8149bb91
 ffff88007e1fbe78 0000000000000001 ffff88007e1fbe18 ffffffff81059d91
 0000000000000000 ffff880074d6ac20 000000000000df78 ffff880074d6ac20
Call Trace:
 [<ffffffff8149bb91>] ? _spin_unlock_irqrestore+0x72/0x80
 [<ffffffff81059d91>] ? __hrtimer_start_range_ns+0x35b/0x36d
 [<ffffffff8149a882>] do_nanosleep+0x88/0xee
 [<ffffffff81059e75>] hrtimer_nanosleep+0xac/0x121
 [<ffffffff8105935d>] ? hrtimer_wakeup+0x0/0x21
 [<ffffffff81075da2>] compat_sys_nanosleep+0x7b/0xe1
 [<ffffffff8102b538>] sysenter_dispatch+0x7/0x2c
 [<ffffffff8149b2a6>] ? trace_hardirqs_on_thunk+0x3a/0x3f

ps    R  running task     3936 45836  45832 0x00000080
 ffff88004dc09b98 ffffffff81065f3f 0000000000000001 00000388525af000
 ffff88004dc09bb8 ffffffff81065f3f 0000000000000000 000003886bc2b000
 ffff88004dc09ce8 ffffffff8149b2a6 0000000000000000 ffff88000212cf68
Call Trace:
 [<ffffffff81065f3f>] ? trace_hardirqs_on_caller+0x1f/0x145
 [<ffffffff81065f3f>] trace_hardirqs_on_caller+0x1f/0x145
 [<ffffffff8149b2a6>] trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff81065f3f>] ? trace_hardirqs_on_caller+0x1f/0x145
 [<ffffffff8149b2a6>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff8101f76b>] ? smp_apic_timer_interrupt+0x81/0x8f
 [<ffffffff810b8222>] ? IS_ERR+0x25/0x2c
 [<ffffffff810b8f22>] ? follow_page+0x28/0x2e3
 [<ffffffff811219d8>] proc_pid_status+0x5e0/0x694
 [<ffffffff81066072>] ? trace_hardirqs_on+0xd/0xf
 [<ffffffff8111e42c>] proc_single_show+0x57/0x74
 [<ffffffff810e90b4>] seq_read+0x249/0x49b
 [<ffffffff8116c840>] ? security_file_permission+0x11/0x13
 [<ffffffff810d05e1>] vfs_read+0xe0/0x141
 [<ffffffff810d8191>] ? path_put+0x1d/0x21
 [<ffffffff810d06f8>] sys_read+0x45/0x69
 [<ffffffff8100b2ab>] system_call_fastpath+0x16/0x1b

ps    R  running task     5584 45948  45868 0x00000080
 ffff88004de81bb8 0000000000000046 ffff88004de81b18 ffffffff81035142
 ffff880002120f80 0000000000000000 ffff88004de81b68 ffffffff8103589b
 ffffffff81035805 ffff8800653fc460 000000000000df78 ffff8800653fc468
Call Trace:
 [<ffffffff81035142>] ? mmdrop+0x2b/0x3d
 [<ffffffff8103589b>] ? finish_task_switch+0xd1/0xf4
 [<ffffffff81035805>] ? finish_task_switch+0x3b/0xf4
 [<ffffffff81499842>] ? thread_return+0xb6/0xfa
 [<ffffffff814999d1>] preempt_schedule_irq+0x56/0x74
 [<ffffffff8100be96>] retint_kernel+0x26/0x30
 [<ffffffff81094996>] ? ftrace_likely_update+0x12/0x14
 [<ffffffff8100bd80>] ? restore_args+0x0/0x30
 [<ffffffff810b8222>] IS_ERR+0x25/0x2c
 [<ffffffff810b8f22>] follow_page+0x28/0x2e3
 [<ffffffff81094990>] ? ftrace_likely_update+0xc/0x14
 [<ffffffff811219d8>] proc_pid_status+0x5e0/0x694
 [<ffffffff81066072>] ? trace_hardirqs_on+0xd/0xf
 [<ffffffff8111e42c>] proc_single_show+0x57/0x74
 [<ffffffff810e90b4>] seq_read+0x249/0x49b
 [<ffffffff8116c840>] ? security_file_permission+0x11/0x13
 [<ffffffff810d05e1>] vfs_read+0xe0/0x141
 [<ffffffff810d8191>] ? path_put+0x1d/0x21
 [<ffffffff810d06f8>] sys_read+0x45/0x69
 [<ffffffff8100b2ab>] system_call_fastpath+0x16/0x1b

Just for comparison, here are the pcscd stack traces right now, when things
are working just fine. One thread in select, one in nanosleep, just like
when it was broken.

pcscd S 0000000000000000  4656  2000      1 0x00020080
 ffff880073037948 0000000000000046 ffff8800730378a8 ffffffff81030461
 0000000000000000 0000000000000000 ffff880000000000 ffffffff81499842
 ffff8800730378d8 ffff88007ebf9120 000000000000df78 ffff88007ebf9120
Call Trace:
 [<ffffffff81030461>] ? need_resched+0x3a/0x40
 [<ffffffff81499842>] ? thread_return+0xb6/0xfa
 [<ffffffff81056bec>] ? add_wait_queue+0x1b/0x42
 [<ffffffff8149a70b>] schedule_hrtimeout_range+0x3f/0x11f
 [<ffffffff8149bb91>] ? _spin_unlock_irqrestore+0x72/0x80
 [<ffffffff81056c0b>] ? add_wait_queue+0x3a/0x42
 [<ffffffff810df122>] ? __pollwait+0xbe/0xc7
 [<ffffffff810de040>] poll_schedule_timeout+0x33/0x4c
 [<ffffffff810deb82>] do_select+0x4b7/0x4f5
 [<ffffffff810df064>] ? __pollwait+0x0/0xc7
 [<ffffffff810df12b>] ? pollwake+0x0/0x51
 [<ffffffff810a8d3a>] ? get_page_from_freelist+0x38a/0x63c
 [<ffffffff811bb5c4>] ? list_del+0xbc/0xea
 [<ffffffff810a7561>] ? __rmqueue+0x124/0x2bf
 [<ffffffff81034de2>] ? get_parent_ip+0x11/0x42
 [<ffffffff8149e770>] ? sub_preempt_count+0x35/0x49
 [<ffffffff810a8ed8>] ? get_page_from_freelist+0x528/0x63c
 [<ffffffff81035805>] ? finish_task_switch+0x3b/0xf4
 [<ffffffff81065f3f>] ? trace_hardirqs_on_caller+0x1f/0x145
 [<ffffffff8149bc01>] ? _spin_unlock_irq+0x62/0x6f
 [<ffffffff8103589b>] ? finish_task_switch+0xd1/0xf4
 [<ffffffff81035805>] ? finish_task_switch+0x3b/0xf4
 [<ffffffff81030461>] ? need_resched+0x3a/0x40
 [<ffffffff81499842>] ? thread_return+0xb6/0xfa
 [<ffffffff81065f3f>] ? trace_hardirqs_on_caller+0x1f/0x145
 [<ffffffff81107bf7>] compat_core_sys_select+0x183/0x23d
 [<ffffffff81499a4d>] ? preempt_schedule+0x5e/0x67
 [<ffffffff8149bb9a>] ? _spin_unlock_irqrestore+0x7b/0x80
 [<ffffffff81030930>] ? task_rq_unlock+0xc/0xe
 [<ffffffff8103ab25>] ? wake_up_new_task+0x169/0x174
 [<ffffffff8103eaf6>] ? do_fork+0x37a/0x424
 [<ffffffff81107f48>] compat_sys_select+0x96/0xbe
 [<ffffffff8107d80b>] ? audit_syscall_entry+0xcb/0x19c
 [<ffffffff8102b538>] sysenter_dispatch+0x7/0x2c

pcscd S ffff88007cf99f48  4896  2006      1 0x00020080
 ffff88007cf99e28 0000000000000046 ffff88007cf99d88 ffffffff8149bb91
 ffff88007cf99e78 0000000000000001 ffff88007cf99e18 ffffffff81059d91
 0000000000000000 ffff88007e4da3e0 000000000000df78 ffff88007e4da3e0
Call Trace:
 [<ffffffff8149bb91>] ? _spin_unlock_irqrestore+0x72/0x80
 [<ffffffff81059d91>] ? __hrtimer_start_range_ns+0x35b/0x36d
 [<ffffffff8149a882>] do_nanosleep+0x88/0xee
 [<ffffffff81059e75>] hrtimer_nanosleep+0xac/0x121
 [<ffffffff8105935d>] ? hrtimer_wakeup+0x0/0x21
 [<ffffffff81075da2>] compat_sys_nanosleep+0x7b/0xe1
 [<ffffffff8102b538>] sysenter_dispatch+0x7/0x2c
 [<ffffffff8149b2a6>] ? trace_hardirqs_on_thunk+0x3a/0x3f

[-- Attachment #2: Type: application/pgp-signature, Size: 226 bytes --]
* Re: 2.6.31-rc1-mmotm0702 - ps command hangs inside kernel
  2009-07-16 19:12       ` Valdis.Kletnieks
@ 2009-07-16 20:24         ` Stefani Seibold
  2009-07-16 20:44           ` Andrew Morton
  0 siblings, 1 reply; 7+ messages in thread
From: Stefani Seibold @ 2009-07-16 20:24 UTC (permalink / raw)
  To: Valdis.Kletnieks; +Cc: Andrew Morton, linux-kernel

On Thu, 16 Jul 2009, 15:12 -0400 Valdis.Kletnieks@vt.edu said:
> On Tue, 14 Jul 2009 07:31:19 +0200, Stefani Seibold said:
> > Am Montag, den 13.07.2009, 14:38 -0700 schrieb Andrew Morton:
> > > On Mon, 13 Jul 2009 16:54:51 -0400 Valdis.Kletnieks@vt.edu wrote:
> > >
> > > > Several times recently, I've had the 'ps' command hang inside the kernel
> > > > for extended periods of time - usually around 1100 seconds, but today I
> > > > had one that hung there for 2351 seconds.
> >
> > i am the author of the get_stack_usage_bytes(). Because i have currently
> > no 64bit machine running, i am not able to analyse your problem. Does it
> > only happen on 32bit application on a 64bit kernel? Is it only affected
> > to pcsd?
>
> I've only seen it happen to pcscd. However, most of the time it's one of
> the very few 32-bit apps running on my laptop (I've got exactly *one* legacy
> app for a secure-token that is stuck in 32-bit land). So I can't tell if it's
> a generic 32-bit issue.
>
> It's possible that one of the two follow_page() entries is stale and just
> happened to be left on the stack. A large chunk of proc_pid_status() is
> inlined, so it's possible that two calls were made and left their return
> addresses in different locations on the stack.
>
> I am pretty sure that follow_page+0x28 is the correct one, as I see it
> in 2 more tracebacks today (see below)...

The stack trace makes it look like an old version is included in the
2.6.31-rc1-mmotm0702 patches. I switched to the walk_page_range() function
in patch version V0.9, dated Jun 10 2009.
Here is the link to the lkml patchwork:

http://patchwork.kernel.org/patch/32210/

I do the map examination in exactly the same way as the function used for
/proc/<pid>/smaps, so i think this version should do it without side effects.

Can you tell me where you downloaded the 2.6.31-rc1-mmotm0702 patch?

> ps            R  running task     3936 45836  45832 0x00000080
>  ffff88004dc09b98 ffffffff81065f3f 0000000000000001 00000388525af000
>  ffff88004dc09bb8 ffffffff81065f3f 0000000000000000 000003886bc2b000
>  ffff88004dc09ce8 ffffffff8149b2a6 0000000000000000 ffff88000212cf68
> Call Trace:
>  [<ffffffff81065f3f>] ? trace_hardirqs_on_caller+0x1f/0x145
>  [<ffffffff81065f3f>] trace_hardirqs_on_caller+0x1f/0x145
>  [<ffffffff8149b2a6>] trace_hardirqs_on_thunk+0x3a/0x3f
>  [<ffffffff81065f3f>] ? trace_hardirqs_on_caller+0x1f/0x145
>  [<ffffffff8149b2a6>] ? trace_hardirqs_on_thunk+0x3a/0x3f
>  [<ffffffff8101f76b>] ? smp_apic_timer_interrupt+0x81/0x8f
>  [<ffffffff810b8222>] ? IS_ERR+0x25/0x2c
>  [<ffffffff810b8f22>] ? follow_page+0x28/0x2e3
>  [<ffffffff811219d8>] proc_pid_status+0x5e0/0x694
>  [<ffffffff81066072>] ? trace_hardirqs_on+0xd/0xf
>  [<ffffffff8111e42c>] proc_single_show+0x57/0x74
>  [<ffffffff810e90b4>] seq_read+0x249/0x49b
>  [<ffffffff8116c840>] ? security_file_permission+0x11/0x13
>  [<ffffffff810d05e1>] vfs_read+0xe0/0x141
>  [<ffffffff810d8191>] ? path_put+0x1d/0x21
>  [<ffffffff810d06f8>] sys_read+0x45/0x69
>  [<ffffffff8100b2ab>] system_call_fastpath+0x16/0x1b

^ permalink raw reply	[flat|nested] 7+ messages in thread
* Re: 2.6.31-rc1-mmotm0702 - ps command hangs inside kernel
  2009-07-16 20:24         ` Stefani Seibold
@ 2009-07-16 20:44           ` Andrew Morton
  2009-07-16 21:21             ` Valdis.Kletnieks
  0 siblings, 1 reply; 7+ messages in thread
From: Andrew Morton @ 2009-07-16 20:44 UTC (permalink / raw)
  To: Stefani Seibold; +Cc: Valdis.Kletnieks, linux-kernel

On Thu, 16 Jul 2009 22:24:47 +0200 Stefani Seibold <stefani@seibold.net> wrote:

> On Thu, 16 Jul 2009, 15:12 -0400 Valdis.Kletnieks@vt.edu said:
> > On Tue, 14 Jul 2009 07:31:19 +0200, Stefani Seibold said:
> > > Am Montag, den 13.07.2009, 14:38 -0700 schrieb Andrew Morton:
> > > > On Mon, 13 Jul 2009 16:54:51 -0400 Valdis.Kletnieks@vt.edu wrote:
> > > >
> > > > > Several times recently, I've had the 'ps' command hang inside the kernel
> > > > > for extended periods of time - usually around 1100 seconds, but today I
> > > > > had one that hung there for 2351 seconds.
> > >
> > > i am the author of the get_stack_usage_bytes(). Because i have currently
> > > no 64bit machine running, i am not able to analyse your problem. Does it
> > > only happen on 32bit application on a 64bit kernel? Is it only affected
> > > to pcsd?
> >
> > I've only seen it happen to pcscd. However, most of the time it's one of
> > the very few 32-bit apps running on my laptop (I've got exactly *one* legacy
> > app for a secure-token that is stuck in 32-bit land). So I can't tell if it's
> > a generic 32-bit issue.
> >
> > It's possible that one of the two follow_page() entries is stale and just
> > happened to be left on the stack. A large chunk of proc_pid_status() is
> > inlined, so it's possible that two calls were made and left their return
> > addresses in different locations on the stack.
> >
> > I am pretty sure that follow_page+0x28 is the correct one, as I see it
> > in 2 more tracebacks today (see below)...
>
> The stack trace makes it look like an old version is included in the
> 2.6.31-rc1-mmotm0702 patches.
>
> I switched to the walk_page_range() function in patch version V0.9,
> dated Jun 10 2009. Here is the link to the lkml patchwork:
>
> http://patchwork.kernel.org/patch/32210/
>
> I do the map examination in exactly the same way as the function used
> for /proc/<pid>/smaps, so i think this version should do it without
> side effects.
>
> Can you tell me where you downloaded the 2.6.31-rc1-mmotm0702 patch?

It would have been version 08. I've now updated to v11.

^ permalink raw reply	[flat|nested] 7+ messages in thread
* Re: 2.6.31-rc1-mmotm0702 - ps command hangs inside kernel
  2009-07-16 20:44           ` Andrew Morton
@ 2009-07-16 21:21             ` Valdis.Kletnieks
  0 siblings, 0 replies; 7+ messages in thread
From: Valdis.Kletnieks @ 2009-07-16 21:21 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Stefani Seibold, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 457 bytes --]

On Thu, 16 Jul 2009 13:44:01 PDT, Andrew Morton said:
> On Thu, 16 Jul 2009 22:24:47 +0200 Stefani Seibold <stefani@seibold.net> wrote:
> > The stack trace makes it look like an old version is included in the
> > 2.6.31-rc1-mmotm0702 patches.
> It would have been version 08. I've now updated to v11.

Did -11 make it into the mmotm0715 tarball? If so, I'll go build that, and if
the problem doesn't re-manifest in a week or so we can call it fixed.

[-- Attachment #2: Type: application/pgp-signature, Size: 226 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread
end of thread, other threads:[~2009-07-16 21:21 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-07-13 20:54 2.6.31-rc1-mmotm0702 - ps command hangs inside kernel Valdis.Kletnieks
2009-07-13 21:38 ` Andrew Morton
2009-07-14  5:31   ` Stefani Seibold
2009-07-16 19:12     ` Valdis.Kletnieks
2009-07-16 20:24       ` Stefani Seibold
2009-07-16 20:44         ` Andrew Morton
2009-07-16 21:21           ` Valdis.Kletnieks