* 2.6.31-rc1-mmotm0702 - ps command hangs inside kernel
@ 2009-07-13 20:54 Valdis.Kletnieks
  2009-07-13 21:38 ` Andrew Morton
  0 siblings, 1 reply; 7+ messages in thread

From: Valdis.Kletnieks @ 2009-07-13 20:54 UTC (permalink / raw)
  To: linux-kernel, Andrew Morton

[-- Attachment #1: Type: text/plain, Size: 7668 bytes --]

Several times recently, I've had the 'ps' command hang inside the kernel
for extended periods of time - usually around 1100 seconds, but today I
had one that hung there for 2351 seconds.

Info: It's always reading the same file - /proc/<pid>/status, where the
pid is the 'pcscd' daemon. And eventually, it manages to exit on its own.
Whatever is getting horked up resets itself - subsequent 'ps' commands
work just fine.

My best guess is that it's getting onto a squirrely vma in
get_stack_usage_in_bytes() in the for/follow_page loop. Possibly highly
relevant - /usr/sbin/pcscd is one of the few 32-bit binaries running in
a mostly 64-bit runtime.

Here's the traceback of the ps, as reported by 2 alt-sysrq-t several
minutes apart:

ps    R  running task     3936 26646  26580 0x00000080
 ffff88005a599be8 ffffffff81499842 ffff88005a599b78 ffffffff8103589b
 ffffffff81035805 ffff88007e71ea40 ffff880002120f80 ffff8800657d8fe0
 000000000000df78 ffff8800657d8fe0 000000000000df78 ffff8800657d8fe8
Call Trace:
 [<ffffffff81499842>] ? thread_return+0xb6/0xfa
 [<ffffffff8103589b>] ? finish_task_switch+0xd1/0xf4
 [<ffffffff81035805>] ? finish_task_switch+0x3b/0xf4
 [<ffffffff81065f3f>] ? trace_hardirqs_on_caller+0x1f/0x145
 [<ffffffff8149b2a6>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff8101f76b>] ? smp_apic_timer_interrupt+0x81/0x8f
 [<ffffffff81046231>] ? irq_exit+0xaf/0xb4
 [<ffffffff8100bd80>] restore_args+0x0/0x30
 [<ffffffff81065f3f>] ? trace_hardirqs_on_caller+0x1f/0x145
 [<ffffffff8149b2a6>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff8101f76b>] ? smp_apic_timer_interrupt+0x81/0x8f
 [<ffffffff810b8222>] ? IS_ERR+0x25/0x2c
 [<ffffffff810b8f22>] ? follow_page+0x28/0x2e3
 [<ffffffff81121a04>] ? proc_pid_status+0x60c/0x694
 [<ffffffff81066072>] ? trace_hardirqs_on+0xd/0xf
 [<ffffffff8111e42c>] ? proc_single_show+0x57/0x74
 [<ffffffff810e90b4>] ? seq_read+0x249/0x49b
 [<ffffffff8116c840>] ? security_file_permission+0x11/0x13
 [<ffffffff810d05e1>] ? vfs_read+0xe0/0x141
 [<ffffffff810d8191>] ? path_put+0x1d/0x21
 [<ffffffff810d06f8>] ? sys_read+0x45/0x69
 [<ffffffff8100b2ab>] ? system_call_fastpath+0x16/0x1b

ps    R  running task     3936 26646  26580 0x00000080
 ffff88005a599bd8 ffffffff81499842 ffff88000213bf80 0000000000000000
 ffff88005a599b78 ffffffff8103589b ffffffff81035805 ffff88007e71ea40
 ffff88000213bf80 ffff8800657d8fe0 000000000000df78 ffff8800657d8fe8
Call Trace:
 [<ffffffff81499842>] ? thread_return+0xb6/0xfa
 [<ffffffff8103589b>] ? finish_task_switch+0xd1/0xf4
 [<ffffffff81035805>] ? finish_task_switch+0x3b/0xf4
 [<ffffffff81065f3f>] ? trace_hardirqs_on_caller+0x1f/0x145
 [<ffffffff8149b2a6>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff8101f76b>] smp_apic_timer_interrupt+0x81/0x8f
 [<ffffffff81046231>] ? irq_exit+0xaf/0xb4
 [<ffffffff8100bd80>] ? restore_args+0x0/0x30
 [<ffffffff810b8222>] ? IS_ERR+0x25/0x2c
 [<ffffffff810b8222>] ? IS_ERR+0x25/0x2c
 [<ffffffff810b8f22>] ? follow_page+0x28/0x2e3
 [<ffffffff810b91d9>] ? follow_page+0x2df/0x2e3
 [<ffffffff811219d8>] ? proc_pid_status+0x5e0/0x694
 [<ffffffff81066072>] ? trace_hardirqs_on+0xd/0xf
 [<ffffffff8111e42c>] ? proc_single_show+0x57/0x74
 [<ffffffff810e90b4>] ? seq_read+0x249/0x49b
 [<ffffffff8116c840>] ? security_file_permission+0x11/0x13
 [<ffffffff810d05e1>] ? vfs_read+0xe0/0x141
 [<ffffffff810d8191>] ? path_put+0x1d/0x21
 [<ffffffff810d06f8>] ? sys_read+0x45/0x69
 [<ffffffff8100b2ab>] ? system_call_fastpath+0x16/0x1b

Here's the output of 'cat /proc/2031/maps' currently - I didn't think to
get one during the last event:

# cat /proc/2031/maps
08047000-0805e000 r-xp 00000000 fe:05 51682 /usr/sbin/pcscd
0805e000-0805f000 rw-p 00016000 fe:05 51682 /usr/sbin/pcscd
0805f000-080e8000 rw-p 00000000 00:00 0
09990000-099b1000 rw-p 00000000 00:00 0 [heap]
44368000-4436f000 r-xp 00000000 fe:05 20682 /usr/lib/libusb-0.1.so.4.4.4
4436f000-44371000 rw-p 00006000 fe:05 20682 /usr/lib/libusb-0.1.so.4.4.4
45c81000-45cab000 r-xp 00000000 fe:00 8715 /lib/libgcc_s-4.4.0-20090708.so.1
45cab000-45cac000 rw-p 0002a000 fe:00 8715 /lib/libgcc_s-4.4.0-20090708.so.1
47c00000-47c20000 r-xp 00000000 fe:00 8266 /lib/ld-2.10.1.so
47c20000-47c21000 r--p 0001f000 fe:00 8266 /lib/ld-2.10.1.so
47c21000-47c22000 rw-p 00020000 fe:00 8266 /lib/ld-2.10.1.so
47c24000-47d8f000 r-xp 00000000 fe:00 8360 /lib/libc-2.10.1.so
47d8f000-47d90000 ---p 0016b000 fe:00 8360 /lib/libc-2.10.1.so
47d90000-47d92000 r--p 0016b000 fe:00 8360 /lib/libc-2.10.1.so
47d92000-47d93000 rw-p 0016d000 fe:00 8360 /lib/libc-2.10.1.so
47d93000-47d96000 rw-p 00000000 00:00 0
47d98000-47d9b000 r-xp 00000000 fe:00 8388 /lib/libdl-2.10.1.so
47d9b000-47d9c000 r--p 00002000 fe:00 8388 /lib/libdl-2.10.1.so
47d9c000-47d9d000 rw-p 00003000 fe:00 8388 /lib/libdl-2.10.1.so
47d9f000-47db5000 r-xp 00000000 fe:00 8675 /lib/libpthread-2.10.1.so
47db5000-47db6000 ---p 00016000 fe:00 8675 /lib/libpthread-2.10.1.so
47db6000-47db7000 r--p 00016000 fe:00 8675 /lib/libpthread-2.10.1.so
47db7000-47db8000 rw-p 00017000 fe:00 8675 /lib/libpthread-2.10.1.so
47db8000-47dba000 rw-p 00000000 00:00 0
f6c00000-f6c21000 rw-p 00000000 00:00 0
f6c21000-f6d00000 ---p 00000000 00:00 0
f6dff000-f6e00000 ---p 00000000 00:00 0
f6e00000-f7600000 rw-p 00000000 00:00 0
f7600000-f7621000 rw-p 00000000 00:00 0
f7621000-f7700000 ---p 00000000 00:00 0
f773b000-f773c000 ---p 00000000 00:00 0
f773c000-f7f3f000 rw-p 00000000 00:00 0
f7f50000-f7f51000 rw-s 0000f000 fe:09 86049 /var/run/pcscd.pub
f7f51000-f7f52000 rw-s 0000e000 fe:09 86049 /var/run/pcscd.pub
f7f52000-f7f53000 rw-s 0000d000 fe:09 86049 /var/run/pcscd.pub
f7f53000-f7f54000 rw-s 0000c000 fe:09 86049 /var/run/pcscd.pub
f7f54000-f7f55000 rw-s 0000b000 fe:09 86049 /var/run/pcscd.pub
f7f55000-f7f56000 rw-s 0000a000 fe:09 86049 /var/run/pcscd.pub
f7f56000-f7f57000 rw-s 00009000 fe:09 86049 /var/run/pcscd.pub
f7f57000-f7f58000 rw-s 00008000 fe:09 86049 /var/run/pcscd.pub
f7f58000-f7f59000 rw-s 00007000 fe:09 86049 /var/run/pcscd.pub
f7f59000-f7f5a000 rw-s 00006000 fe:09 86049 /var/run/pcscd.pub
f7f5a000-f7f5b000 rw-s 00005000 fe:09 86049 /var/run/pcscd.pub
f7f5b000-f7f5c000 rw-s 00004000 fe:09 86049 /var/run/pcscd.pub
f7f5c000-f7f5d000 rw-s 00003000 fe:09 86049 /var/run/pcscd.pub
f7f5d000-f7f5e000 rw-s 00002000 fe:09 86049 /var/run/pcscd.pub
f7f5e000-f7f5f000 rw-s 00001000 fe:09 86049 /var/run/pcscd.pub
f7f5f000-f7f60000 rw-s 00000000 fe:09 86049 /var/run/pcscd.pub
f7f60000-f7f61000 r-xp 00000000 00:00 0 [vdso]
fffe8000-ffffd000 rw-p 00000000 00:00 0 [stack]

[-- Attachment #2: Type: application/pgp-signature, Size: 226 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread
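[Editor's note: get_stack_usage_in_bytes() itself is not shown in this thread, so the loop below is only a guess at its shape based on the description above ("the for/follow_page loop") - a userspace model, not the mmotm code. The point it illustrates is that a per-page probe loop costs one follow_page() call per page in the range, so its runtime depends entirely on the loop bounds being what the caller assumed:]

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE 4096UL

/* Toy residency map standing in for the page tables: entry i nonzero
 * means the i-th page of the walked range is present. */
static const unsigned char *resident;
static unsigned long nresident;

static void set_residency(const unsigned char *map, unsigned long n)
{
	resident = map;
	nresident = n;
}

/* Stand-in for follow_page(): does this page of the range have a
 * physical page behind it? */
static int fake_follow_page(unsigned long idx)
{
	return idx < nresident && resident[idx];
}

/*
 * Guessed shape of the suspect loop: probe the range one page at a
 * time until the first non-present page, counting touched pages.
 * The cost is linear in (end - start) / PAGE_SIZE - harmless for a
 * small stack vma, painful if the bounds are not what was assumed.
 */
static unsigned long count_touched_pages(unsigned long start,
					 unsigned long end)
{
	unsigned long addr, pages = 0;

	for (addr = start; addr < end; addr += PAGE_SIZE) {
		if (!fake_follow_page((addr - start) / PAGE_SIZE))
			break;
		pages++;
	}
	return pages;
}
```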
* Re: 2.6.31-rc1-mmotm0702 - ps command hangs inside kernel
  2009-07-13 20:54 2.6.31-rc1-mmotm0702 - ps command hangs inside kernel Valdis.Kletnieks
@ 2009-07-13 21:38 ` Andrew Morton
  2009-07-14  5:31   ` Stefani Seibold
  0 siblings, 1 reply; 7+ messages in thread

From: Andrew Morton @ 2009-07-13 21:38 UTC (permalink / raw)
  To: Valdis.Kletnieks; +Cc: linux-kernel, Stefani Seibold

On Mon, 13 Jul 2009 16:54:51 -0400
Valdis.Kletnieks@vt.edu wrote:

> Several times recently, I've had the 'ps' command hang inside the kernel
> for extended periods of time - usually around 1100 seconds, but today I
> had one that hung there for 2351 seconds.
>
> Info: It's always reading the same file - /proc/<pid>/status, where the
> pid is the 'pcscd' daemon. And eventually, it manages to exit on its own.
> Whatever is getting horked up resets itself - subsequent 'ps' commands work
> just fine.
>
> My best guess is that it's getting onto a squirrely vma in
> get_stack_usage_in_bytes() in the for/follow_page loop. Possibly highly
> relevant - /usr/sbin/pcscd is one of the few 32-bit binaries running in
> a mostly 64-bit runtime.

OK, thanks for the analysis.

> Here's the traceback of the ps, as reported by 2 alt-sysrq-t several
> minutes apart:
>
> ps    R  running task     3936 26646  26580 0x00000080
>  ffff88005a599be8 ffffffff81499842 ffff88005a599b78 ffffffff8103589b
>  ffffffff81035805 ffff88007e71ea40 ffff880002120f80 ffff8800657d8fe0
>  000000000000df78 ffff8800657d8fe0 000000000000df78 ffff8800657d8fe8
> Call Trace:
> [<ffffffff81499842>] ? thread_return+0xb6/0xfa
> [<ffffffff8103589b>] ? finish_task_switch+0xd1/0xf4
> [<ffffffff81035805>] ? finish_task_switch+0x3b/0xf4
> [<ffffffff81065f3f>] ? trace_hardirqs_on_caller+0x1f/0x145
> [<ffffffff8149b2a6>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> [<ffffffff8101f76b>] ? smp_apic_timer_interrupt+0x81/0x8f
> [<ffffffff81046231>] ? irq_exit+0xaf/0xb4
> [<ffffffff8100bd80>] restore_args+0x0/0x30
> [<ffffffff81065f3f>] ? trace_hardirqs_on_caller+0x1f/0x145
> [<ffffffff8149b2a6>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> [<ffffffff8101f76b>] ? smp_apic_timer_interrupt+0x81/0x8f
> [<ffffffff810b8222>] ? IS_ERR+0x25/0x2c
> [<ffffffff810b8f22>] ? follow_page+0x28/0x2e3
> [<ffffffff81121a04>] ? proc_pid_status+0x60c/0x694
> [<ffffffff81066072>] ? trace_hardirqs_on+0xd/0xf
> [<ffffffff8111e42c>] ? proc_single_show+0x57/0x74
> [<ffffffff810e90b4>] ? seq_read+0x249/0x49b
> [<ffffffff8116c840>] ? security_file_permission+0x11/0x13
> [<ffffffff810d05e1>] ? vfs_read+0xe0/0x141
> [<ffffffff810d8191>] ? path_put+0x1d/0x21
> [<ffffffff810d06f8>] ? sys_read+0x45/0x69
> [<ffffffff8100b2ab>] ? system_call_fastpath+0x16/0x1b
>
> ps    R  running task     3936 26646  26580 0x00000080
>  ffff88005a599bd8 ffffffff81499842 ffff88000213bf80 0000000000000000
>  ffff88005a599b78 ffffffff8103589b ffffffff81035805 ffff88007e71ea40
>  ffff88000213bf80 ffff8800657d8fe0 000000000000df78 ffff8800657d8fe8
> Call Trace:
> [<ffffffff81499842>] ? thread_return+0xb6/0xfa
> [<ffffffff8103589b>] ? finish_task_switch+0xd1/0xf4
> [<ffffffff81035805>] ? finish_task_switch+0x3b/0xf4
> [<ffffffff81065f3f>] ? trace_hardirqs_on_caller+0x1f/0x145
> [<ffffffff8149b2a6>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> [<ffffffff8101f76b>] smp_apic_timer_interrupt+0x81/0x8f
> [<ffffffff81046231>] ? irq_exit+0xaf/0xb4
> [<ffffffff8100bd80>] ? restore_args+0x0/0x30
> [<ffffffff810b8222>] ? IS_ERR+0x25/0x2c
> [<ffffffff810b8222>] ? IS_ERR+0x25/0x2c
> [<ffffffff810b8f22>] ? follow_page+0x28/0x2e3
> [<ffffffff810b91d9>] ? follow_page+0x2df/0x2e3
> [<ffffffff811219d8>] ? proc_pid_status+0x5e0/0x694
> [<ffffffff81066072>] ? trace_hardirqs_on+0xd/0xf
> [<ffffffff8111e42c>] ? proc_single_show+0x57/0x74
> [<ffffffff810e90b4>] ? seq_read+0x249/0x49b
> [<ffffffff8116c840>] ? security_file_permission+0x11/0x13
> [<ffffffff810d05e1>] ? vfs_read+0xe0/0x141
> [<ffffffff810d8191>] ? path_put+0x1d/0x21
> [<ffffffff810d06f8>] ? sys_read+0x45/0x69
> [<ffffffff8100b2ab>] ? system_call_fastpath+0x16/0x1b

Another possibility is that pcscd has gone off on a long in-kernel sulk
(polling hardware?) while holding some lock which ps needs (eg, mmap_sem).

It would be useful if you can grab a pcscd backtrace during the stall.

I have a new version of those patches in my inbox to look at - I'm a
bit swamped in backlog at present. I could drop the old version from
my tree for a while, but it's unclear how that would help us find the
cause of the problem.

> Here's the output of 'cat /proc/2031/maps' currently - I didn't think to
> get one during the last event:
>
> # cat /proc/2031/maps
> 08047000-0805e000 r-xp 00000000 fe:05 51682 /usr/sbin/pcscd
> 0805e000-0805f000 rw-p 00016000 fe:05 51682 /usr/sbin/pcscd
> 0805f000-080e8000 rw-p 00000000 00:00 0
> 09990000-099b1000 rw-p 00000000 00:00 0 [heap]
> 44368000-4436f000 r-xp 00000000 fe:05 20682 /usr/lib/libusb-0.1.so.4.4.4
> 4436f000-44371000 rw-p 00006000 fe:05 20682 /usr/lib/libusb-0.1.so.4.4.4
> 45c81000-45cab000 r-xp 00000000 fe:00 8715 /lib/libgcc_s-4.4.0-20090708.so.1
> 45cab000-45cac000 rw-p 0002a000 fe:00 8715 /lib/libgcc_s-4.4.0-20090708.so.1
> 47c00000-47c20000 r-xp 00000000 fe:00 8266 /lib/ld-2.10.1.so
> 47c20000-47c21000 r--p 0001f000 fe:00 8266 /lib/ld-2.10.1.so
> 47c21000-47c22000 rw-p 00020000 fe:00 8266 /lib/ld-2.10.1.so
> 47c24000-47d8f000 r-xp 00000000 fe:00 8360 /lib/libc-2.10.1.so
> 47d8f000-47d90000 ---p 0016b000 fe:00 8360 /lib/libc-2.10.1.so
> 47d90000-47d92000 r--p 0016b000 fe:00 8360 /lib/libc-2.10.1.so
> 47d92000-47d93000 rw-p 0016d000 fe:00 8360 /lib/libc-2.10.1.so
> 47d93000-47d96000 rw-p 00000000 00:00 0
> 47d98000-47d9b000 r-xp 00000000 fe:00 8388 /lib/libdl-2.10.1.so
> 47d9b000-47d9c000 r--p 00002000 fe:00 8388 /lib/libdl-2.10.1.so
> 47d9c000-47d9d000 rw-p 00003000 fe:00 8388 /lib/libdl-2.10.1.so
> 47d9f000-47db5000 r-xp 00000000 fe:00 8675 /lib/libpthread-2.10.1.so
> 47db5000-47db6000 ---p 00016000 fe:00 8675 /lib/libpthread-2.10.1.so
> 47db6000-47db7000 r--p 00016000 fe:00 8675 /lib/libpthread-2.10.1.so
> 47db7000-47db8000 rw-p 00017000 fe:00 8675 /lib/libpthread-2.10.1.so
> 47db8000-47dba000 rw-p 00000000 00:00 0
> f6c00000-f6c21000 rw-p 00000000 00:00 0
> f6c21000-f6d00000 ---p 00000000 00:00 0
> f6dff000-f6e00000 ---p 00000000 00:00 0
> f6e00000-f7600000 rw-p 00000000 00:00 0
> f7600000-f7621000 rw-p 00000000 00:00 0
> f7621000-f7700000 ---p 00000000 00:00 0
> f773b000-f773c000 ---p 00000000 00:00 0
> f773c000-f7f3f000 rw-p 00000000 00:00 0
> f7f50000-f7f51000 rw-s 0000f000 fe:09 86049 /var/run/pcscd.pub
> f7f51000-f7f52000 rw-s 0000e000 fe:09 86049 /var/run/pcscd.pub
> f7f52000-f7f53000 rw-s 0000d000 fe:09 86049 /var/run/pcscd.pub
> f7f53000-f7f54000 rw-s 0000c000 fe:09 86049 /var/run/pcscd.pub
> f7f54000-f7f55000 rw-s 0000b000 fe:09 86049 /var/run/pcscd.pub
> f7f55000-f7f56000 rw-s 0000a000 fe:09 86049 /var/run/pcscd.pub
> f7f56000-f7f57000 rw-s 00009000 fe:09 86049 /var/run/pcscd.pub
> f7f57000-f7f58000 rw-s 00008000 fe:09 86049 /var/run/pcscd.pub
> f7f58000-f7f59000 rw-s 00007000 fe:09 86049 /var/run/pcscd.pub
> f7f59000-f7f5a000 rw-s 00006000 fe:09 86049 /var/run/pcscd.pub
> f7f5a000-f7f5b000 rw-s 00005000 fe:09 86049 /var/run/pcscd.pub
> f7f5b000-f7f5c000 rw-s 00004000 fe:09 86049 /var/run/pcscd.pub
> f7f5c000-f7f5d000 rw-s 00003000 fe:09 86049 /var/run/pcscd.pub
> f7f5d000-f7f5e000 rw-s 00002000 fe:09 86049 /var/run/pcscd.pub
> f7f5e000-f7f5f000 rw-s 00001000 fe:09 86049 /var/run/pcscd.pub
> f7f5f000-f7f60000 rw-s 00000000 fe:09 86049 /var/run/pcscd.pub
> f7f60000-f7f61000 r-xp 00000000 00:00 0 [vdso]
> fffe8000-ffffd000 rw-p 00000000 00:00 0 [stack]
* Re: 2.6.31-rc1-mmotm0702 - ps command hangs inside kernel
  2009-07-13 21:38 ` Andrew Morton
@ 2009-07-14  5:31   ` Stefani Seibold
  2009-07-16 19:12     ` Valdis.Kletnieks
  0 siblings, 1 reply; 7+ messages in thread

From: Stefani Seibold @ 2009-07-14 5:31 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Valdis.Kletnieks, linux-kernel

Hi,

On Monday, 13.07.2009, at 14:38 -0700, Andrew Morton wrote:
> On Mon, 13 Jul 2009 16:54:51 -0400
> Valdis.Kletnieks@vt.edu wrote:
>
> > Several times recently, I've had the 'ps' command hang inside the kernel
> > for extended periods of time - usually around 1100 seconds, but today I
> > had one that hung there for 2351 seconds.
> >
> > Info: It's always reading the same file - /proc/<pid>/status, where the
> > pid is the 'pcscd' daemon. And eventually, it manages to exit on its own.
> > Whatever is getting horked up resets itself - subsequent 'ps' commands work
> > just fine.
> >
> > My best guess is that it's getting onto a squirrely vma in
> > get_stack_usage_in_bytes() in the for/follow_page loop. Possibly highly
> > relevant - /usr/sbin/pcscd is one of the few 32-bit binaries running in
> > a mostly 64-bit runtime.

I am the author of get_stack_usage_in_bytes(). Because I currently have
no 64-bit machine running, I am not able to analyse your problem. Does it
only happen with 32-bit applications on a 64-bit kernel? Does it affect
only pcscd?

Can you give me more accurate information about what exactly the problem
is?

> OK, thanks for the analysis.
>
> > Here's the traceback of the ps, as reported by 2 alt-sysrq-t several
> > minutes apart:
> >
> > ps    R  running task     3936 26646  26580 0x00000080
> >  ffff88005a599bd8 ffffffff81499842 ffff88000213bf80 0000000000000000
> >  ffff88005a599b78 ffffffff8103589b ffffffff81035805 ffff88007e71ea40
> >  ffff88000213bf80 ffff8800657d8fe0 000000000000df78 ffff8800657d8fe8
> > Call Trace:
> > [<ffffffff81499842>] ? thread_return+0xb6/0xfa
> > [<ffffffff8103589b>] ? finish_task_switch+0xd1/0xf4
> > [<ffffffff81035805>] ? finish_task_switch+0x3b/0xf4
> > [<ffffffff81065f3f>] ? trace_hardirqs_on_caller+0x1f/0x145
> > [<ffffffff8149b2a6>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> > [<ffffffff8101f76b>] smp_apic_timer_interrupt+0x81/0x8f
> > [<ffffffff81046231>] ? irq_exit+0xaf/0xb4
> > [<ffffffff8100bd80>] ? restore_args+0x0/0x30
> > [<ffffffff810b8222>] ? IS_ERR+0x25/0x2c
> > [<ffffffff810b8222>] ? IS_ERR+0x25/0x2c
> > [<ffffffff810b8f22>] ? follow_page+0x28/0x2e3
> > [<ffffffff810b91d9>] ? follow_page+0x2df/0x2e3
> > [<ffffffff811219d8>] ? proc_pid_status+0x5e0/0x694
> > [<ffffffff81066072>] ? trace_hardirqs_on+0xd/0xf
> > [<ffffffff8111e42c>] ? proc_single_show+0x57/0x74
> > [<ffffffff810e90b4>] ? seq_read+0x249/0x49b
> > [<ffffffff8116c840>] ? security_file_permission+0x11/0x13
> > [<ffffffff810d05e1>] ? vfs_read+0xe0/0x141
> > [<ffffffff810d8191>] ? path_put+0x1d/0x21
> > [<ffffffff810d06f8>] ? sys_read+0x45/0x69
> > [<ffffffff8100b2ab>] ? system_call_fastpath+0x16/0x1b

The double follow_page looks strange to me... I will have a look at it.

> Another possibility is that pcscd has gone off on a long in-kernel sulk
> (polling hardware?) while holding some lock which ps needs (eg, mmap_sem).
>
> It would be useful if you can grab a pcscd backtrace during the stall.
>
> I have a new version of those patches in my inbox to look at - I'm a
> bit swamped in backlog at present. I could drop the old version from
> my tree for a while, but it's unclear how that would help us find the
> cause of the problem.
> > Here's the output of 'cat /proc/2031/maps' currently - I didn't think to
> > get one during the last event:
> >
> > # cat /proc/2031/maps
> > 08047000-0805e000 r-xp 00000000 fe:05 51682 /usr/sbin/pcscd
> > 0805e000-0805f000 rw-p 00016000 fe:05 51682 /usr/sbin/pcscd
> > 0805f000-080e8000 rw-p 00000000 00:00 0
> > 09990000-099b1000 rw-p 00000000 00:00 0 [heap]
> > 44368000-4436f000 r-xp 00000000 fe:05 20682 /usr/lib/libusb-0.1.so.4.4.4
> > 4436f000-44371000 rw-p 00006000 fe:05 20682 /usr/lib/libusb-0.1.so.4.4.4
> > 45c81000-45cab000 r-xp 00000000 fe:00 8715 /lib/libgcc_s-4.4.0-20090708.so.1
> > 45cab000-45cac000 rw-p 0002a000 fe:00 8715 /lib/libgcc_s-4.4.0-20090708.so.1
> > 47c00000-47c20000 r-xp 00000000 fe:00 8266 /lib/ld-2.10.1.so
> > 47c20000-47c21000 r--p 0001f000 fe:00 8266 /lib/ld-2.10.1.so
> > 47c21000-47c22000 rw-p 00020000 fe:00 8266 /lib/ld-2.10.1.so
> > 47c24000-47d8f000 r-xp 00000000 fe:00 8360 /lib/libc-2.10.1.so
> > 47d8f000-47d90000 ---p 0016b000 fe:00 8360 /lib/libc-2.10.1.so
> > 47d90000-47d92000 r--p 0016b000 fe:00 8360 /lib/libc-2.10.1.so
> > 47d92000-47d93000 rw-p 0016d000 fe:00 8360 /lib/libc-2.10.1.so
> > 47d93000-47d96000 rw-p 00000000 00:00 0
> > 47d98000-47d9b000 r-xp 00000000 fe:00 8388 /lib/libdl-2.10.1.so
> > 47d9b000-47d9c000 r--p 00002000 fe:00 8388 /lib/libdl-2.10.1.so
> > 47d9c000-47d9d000 rw-p 00003000 fe:00 8388 /lib/libdl-2.10.1.so
> > 47d9f000-47db5000 r-xp 00000000 fe:00 8675 /lib/libpthread-2.10.1.so
> > 47db5000-47db6000 ---p 00016000 fe:00 8675 /lib/libpthread-2.10.1.so
> > 47db6000-47db7000 r--p 00016000 fe:00 8675 /lib/libpthread-2.10.1.so
> > 47db7000-47db8000 rw-p 00017000 fe:00 8675 /lib/libpthread-2.10.1.so
> > 47db8000-47dba000 rw-p 00000000 00:00 0
> > f6c00000-f6c21000 rw-p 00000000 00:00 0
> > f6c21000-f6d00000 ---p 00000000 00:00 0
> > f6dff000-f6e00000 ---p 00000000 00:00 0
> > f6e00000-f7600000 rw-p 00000000 00:00 0
> > f7600000-f7621000 rw-p 00000000 00:00 0
> > f7621000-f7700000 ---p 00000000 00:00 0
> > f773b000-f773c000 ---p 00000000 00:00 0
> > f773c000-f7f3f000 rw-p 00000000 00:00 0
> > f7f50000-f7f51000 rw-s 0000f000 fe:09 86049 /var/run/pcscd.pub
> > f7f51000-f7f52000 rw-s 0000e000 fe:09 86049 /var/run/pcscd.pub
> > f7f52000-f7f53000 rw-s 0000d000 fe:09 86049 /var/run/pcscd.pub
> > f7f53000-f7f54000 rw-s 0000c000 fe:09 86049 /var/run/pcscd.pub
> > f7f54000-f7f55000 rw-s 0000b000 fe:09 86049 /var/run/pcscd.pub
> > f7f55000-f7f56000 rw-s 0000a000 fe:09 86049 /var/run/pcscd.pub
> > f7f56000-f7f57000 rw-s 00009000 fe:09 86049 /var/run/pcscd.pub
> > f7f57000-f7f58000 rw-s 00008000 fe:09 86049 /var/run/pcscd.pub
> > f7f58000-f7f59000 rw-s 00007000 fe:09 86049 /var/run/pcscd.pub
> > f7f59000-f7f5a000 rw-s 00006000 fe:09 86049 /var/run/pcscd.pub
> > f7f5a000-f7f5b000 rw-s 00005000 fe:09 86049 /var/run/pcscd.pub
> > f7f5b000-f7f5c000 rw-s 00004000 fe:09 86049 /var/run/pcscd.pub
> > f7f5c000-f7f5d000 rw-s 00003000 fe:09 86049 /var/run/pcscd.pub
> > f7f5d000-f7f5e000 rw-s 00002000 fe:09 86049 /var/run/pcscd.pub
> > f7f5e000-f7f5f000 rw-s 00001000 fe:09 86049 /var/run/pcscd.pub
> > f7f5f000-f7f60000 rw-s 00000000 fe:09 86049 /var/run/pcscd.pub
> > f7f60000-f7f61000 r-xp 00000000 00:00 0 [vdso]
> > fffe8000-ffffd000 rw-p 00000000 00:00 0 [stack]
* Re: 2.6.31-rc1-mmotm0702 - ps command hangs inside kernel
  2009-07-14  5:31 ` Stefani Seibold
@ 2009-07-16 19:12   ` Valdis.Kletnieks
  2009-07-16 20:24     ` Stefani Seibold
  0 siblings, 1 reply; 7+ messages in thread

From: Valdis.Kletnieks @ 2009-07-16 19:12 UTC (permalink / raw)
  To: Stefani Seibold; +Cc: Andrew Morton, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 11966 bytes --]

On Tue, 14 Jul 2009 07:31:19 +0200, Stefani Seibold said:
> On Monday, 13.07.2009, at 14:38 -0700, Andrew Morton wrote:
> > On Mon, 13 Jul 2009 16:54:51 -0400
> > Valdis.Kletnieks@vt.edu wrote:
> >
> > > Several times recently, I've had the 'ps' command hang inside the kernel
> > > for extended periods of time - usually around 1100 seconds, but today I
> > > had one that hung there for 2351 seconds.
>
> I am the author of get_stack_usage_in_bytes(). Because I currently have
> no 64-bit machine running, I am not able to analyse your problem. Does it
> only happen with 32-bit applications on a 64-bit kernel? Does it affect
> only pcscd?

I've only seen it happen to pcscd. However, most of the time it's one of
the very few 32-bit apps running on my laptop (I've got exactly *one* legacy
app for a secure-token that is stuck in 32-bit land). So I can't tell if it's
a generic 32-bit issue.

> Can you give me more accurate information about what exactly the problem
> is?
>
> > OK, thanks for the analysis.
> >
> > > Here's the traceback of the ps, as reported by 2 alt-sysrq-t several
> > > minutes apart:
> > >
> > > ps    R  running task     3936 26646  26580 0x00000080
> > >  ffff88005a599bd8 ffffffff81499842 ffff88000213bf80 0000000000000000
> > >  ffff88005a599b78 ffffffff8103589b ffffffff81035805 ffff88007e71ea40
> > >  ffff88000213bf80 ffff8800657d8fe0 000000000000df78 ffff8800657d8fe8
> > > Call Trace:
> > > [<ffffffff81499842>] ? thread_return+0xb6/0xfa
> > > [<ffffffff8103589b>] ? finish_task_switch+0xd1/0xf4
> > > [<ffffffff81035805>] ? finish_task_switch+0x3b/0xf4
> > > [<ffffffff81065f3f>] ? trace_hardirqs_on_caller+0x1f/0x145
> > > [<ffffffff8149b2a6>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> > > [<ffffffff8101f76b>] smp_apic_timer_interrupt+0x81/0x8f
> > > [<ffffffff81046231>] ? irq_exit+0xaf/0xb4
> > > [<ffffffff8100bd80>] ? restore_args+0x0/0x30
> > > [<ffffffff810b8222>] ? IS_ERR+0x25/0x2c
> > > [<ffffffff810b8222>] ? IS_ERR+0x25/0x2c
> > > [<ffffffff810b8f22>] ? follow_page+0x28/0x2e3
> > > [<ffffffff810b91d9>] ? follow_page+0x2df/0x2e3
> > > [<ffffffff811219d8>] ? proc_pid_status+0x5e0/0x694
> > > [<ffffffff81066072>] ? trace_hardirqs_on+0xd/0xf
> > > [<ffffffff8111e42c>] ? proc_single_show+0x57/0x74
> > > [<ffffffff810e90b4>] ? seq_read+0x249/0x49b
> > > [<ffffffff8116c840>] ? security_file_permission+0x11/0x13
> > > [<ffffffff810d05e1>] ? vfs_read+0xe0/0x141
> > > [<ffffffff810d8191>] ? path_put+0x1d/0x21
> > > [<ffffffff810d06f8>] ? sys_read+0x45/0x69
> > > [<ffffffff8100b2ab>] ? system_call_fastpath+0x16/0x1b
>
> The double follow_page looks strange to me... I will have a look at it.

It's possible that one of the two follow_page() entries is stale and just
happened to be left on the stack. A large chunk of proc_pid_status() is
inlined, so it's possible that two calls were made and left their return
addresses in different locations on the stack.

I am pretty sure that follow_page+0x28 is the correct one, as I see it
in 2 more tracebacks today (see below)...

Got a better idea than just sticking in a printk() to dump the starting
values of stkpage and vma->vm_start in get_stack_usage_in_bytes()? I
suspect that for a 32-bit process, those 2 values aren't lined up the way
we think they are, and as a result that for loop ends up walking across
a *lot* of pages unintentionally.

> > Another possibility is that pcscd has gone off on a long in-kernel sulk
> > (polling hardware?) while holding some lock which ps needs (eg, mmap_sem).
> >
> > It would be useful if you can grab a pcscd backtrace during the stall.
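[Editor's note: the arithmetic behind that suspicion is easy to check against the maps dump earlier in the thread, which shows the 32-bit stack vma at fffe8000-ffffd000. If the walk covers just that vma it is 21 pages; if the start address were mistakenly far below it - the low bound below is illustrative, borrowed from the lowest mapping in the dump, not a claim about what the code actually does - a per-page walk would cover about a million 4 KiB pages:]

```c
#include <assert.h>

#define PAGE_SIZE 4096UL

/* Number of 4 KiB pages a per-page walk covers between two addresses. */
static unsigned long pages_between(unsigned long start, unsigned long end)
{
	return (end - start) / PAGE_SIZE;
}
```

With addresses taken from the maps dump, `pages_between(0xfffe8000, 0xffffd000)` is 21, while a hypothetical walk from the lowest mapping, `pages_between(0x08047000, 0xffffd000)`, is over a million iterations - enough to explain a multi-minute stall if each step probes the page tables.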
Got bit again this morning - here are the relevant tracebacks. Looks like
pcscd is off sulking in select() and nanosleep(), which are pretty normal
places for programs to go sulk. (pcscd has 2 threads, apparently. I had
2 ps commands hung up, thus the two entries for it)

pcscd S 0000000000000000  4656  2100      1 0x00020080
 ffff88007da3f948 0000000000000046 ffff88007da3f8b8 ffffffff81052b2e
 ffff88007f90e340 ffff88007e08db30 0000000000000000 0000000000000001
 ffff88007da3f8d8 ffff880074ee4fa0 000000000000df78 ffff880074ee4fa0
Call Trace:
 [<ffffffff81052b2e>] ? queue_work_on+0x5e/0x6c
 [<ffffffff81056bec>] ? add_wait_queue+0x1b/0x42
 [<ffffffff8149a70b>] schedule_hrtimeout_range+0x3f/0x11f
 [<ffffffff8149bb91>] ? _spin_unlock_irqrestore+0x72/0x80
 [<ffffffff81056c0b>] ? add_wait_queue+0x3a/0x42
 [<ffffffff81065f3f>] ? trace_hardirqs_on_caller+0x1f/0x145
 [<ffffffff81066072>] ? trace_hardirqs_on+0xd/0xf
 [<ffffffff810de040>] poll_schedule_timeout+0x33/0x4c
 [<ffffffff810deb82>] do_select+0x4b7/0x4f5
 [<ffffffff810df064>] ? __pollwait+0x0/0xc7
 [<ffffffff810df12b>] ? pollwake+0x0/0x51
 [<ffffffff810a8d3a>] ? get_page_from_freelist+0x38a/0x63c
 [<ffffffff811bb5c4>] ? list_del+0xbc/0xea
 [<ffffffff810a7561>] ? __rmqueue+0x124/0x2bf
 [<ffffffff81034de2>] ? get_parent_ip+0x11/0x42
 [<ffffffff8149e770>] ? sub_preempt_count+0x35/0x49
 [<ffffffff810a8ed8>] ? get_page_from_freelist+0x528/0x63c
 [<ffffffff8105b908>] ? __update_sched_clock+0x2f/0x8e
 [<ffffffff8105bb72>] ? sched_clock_cpu+0x20b/0x219
 [<ffffffff81065f3f>] ? trace_hardirqs_on_caller+0x1f/0x145
 [<ffffffff81066072>] ? trace_hardirqs_on+0xd/0xf
 [<ffffffff810542bf>] ? alloc_pid+0x2ce/0x3e3
 [<ffffffff810542bf>] ? alloc_pid+0x2ce/0x3e3
 [<ffffffff81034de2>] ? get_parent_ip+0x11/0x42
 [<ffffffff81065f3f>] ? trace_hardirqs_on_caller+0x1f/0x145
 [<ffffffff81107bf7>] compat_core_sys_select+0x183/0x23d
 [<ffffffff8149e770>] ? sub_preempt_count+0x35/0x49
 [<ffffffff8149bb91>] ? _spin_unlock_irqrestore+0x72/0x80
 [<ffffffff81030930>] ? task_rq_unlock+0xc/0xe
 [<ffffffff8103ab25>] ? wake_up_new_task+0x169/0x174
 [<ffffffff8103eaf6>] ? do_fork+0x37a/0x424
 [<ffffffff8105e2e2>] ? current_kernel_time+0x28/0x50
 [<ffffffff81107f48>] compat_sys_select+0x96/0xbe
 [<ffffffff8107d8b0>] ? audit_syscall_entry+0x170/0x19c
 [<ffffffff8102b538>] sysenter_dispatch+0x7/0x2c

pcscd S ffff88007e1fbf48  5640  2106      1 0x00020080
 ffff88007e1fbe28 0000000000000046 ffff88007e1fbd88 ffffffff8149bb91
 ffff88007e1fbe78 0000000000000001 ffff88007e1fbe18 ffffffff81059d91
 0000000000000000 ffff880074d6ac20 000000000000df78 ffff880074d6ac20
Call Trace:
 [<ffffffff8149bb91>] ? _spin_unlock_irqrestore+0x72/0x80
 [<ffffffff81059d91>] ? __hrtimer_start_range_ns+0x35b/0x36d
 [<ffffffff8149a882>] do_nanosleep+0x88/0xee
 [<ffffffff81059e75>] hrtimer_nanosleep+0xac/0x121
 [<ffffffff8105935d>] ? hrtimer_wakeup+0x0/0x21
 [<ffffffff81075da2>] compat_sys_nanosleep+0x7b/0xe1
 [<ffffffff8102b538>] sysenter_dispatch+0x7/0x2c
 [<ffffffff8149b2a6>] ? trace_hardirqs_on_thunk+0x3a/0x3f

ps    R  running task     3936 45836  45832 0x00000080
 ffff88004dc09b98 ffffffff81065f3f 0000000000000001 00000388525af000
 ffff88004dc09bb8 ffffffff81065f3f 0000000000000000 000003886bc2b000
 ffff88004dc09ce8 ffffffff8149b2a6 0000000000000000 ffff88000212cf68
Call Trace:
 [<ffffffff81065f3f>] ? trace_hardirqs_on_caller+0x1f/0x145
 [<ffffffff81065f3f>] trace_hardirqs_on_caller+0x1f/0x145
 [<ffffffff8149b2a6>] trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff81065f3f>] ? trace_hardirqs_on_caller+0x1f/0x145
 [<ffffffff8149b2a6>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff8101f76b>] ? smp_apic_timer_interrupt+0x81/0x8f
 [<ffffffff810b8222>] ? IS_ERR+0x25/0x2c
 [<ffffffff810b8f22>] ? follow_page+0x28/0x2e3
 [<ffffffff811219d8>] proc_pid_status+0x5e0/0x694
 [<ffffffff81066072>] ? trace_hardirqs_on+0xd/0xf
 [<ffffffff8111e42c>] proc_single_show+0x57/0x74
 [<ffffffff810e90b4>] seq_read+0x249/0x49b
 [<ffffffff8116c840>] ? security_file_permission+0x11/0x13
 [<ffffffff810d05e1>] vfs_read+0xe0/0x141
 [<ffffffff810d8191>] ? path_put+0x1d/0x21
 [<ffffffff810d06f8>] sys_read+0x45/0x69
 [<ffffffff8100b2ab>] system_call_fastpath+0x16/0x1b

ps    R  running task     5584 45948  45868 0x00000080
 ffff88004de81bb8 0000000000000046 ffff88004de81b18 ffffffff81035142
 ffff880002120f80 0000000000000000 ffff88004de81b68 ffffffff8103589b
 ffffffff81035805 ffff8800653fc460 000000000000df78 ffff8800653fc468
Call Trace:
 [<ffffffff81035142>] ? mmdrop+0x2b/0x3d
 [<ffffffff8103589b>] ? finish_task_switch+0xd1/0xf4
 [<ffffffff81035805>] ? finish_task_switch+0x3b/0xf4
 [<ffffffff81499842>] ? thread_return+0xb6/0xfa
 [<ffffffff814999d1>] preempt_schedule_irq+0x56/0x74
 [<ffffffff8100be96>] retint_kernel+0x26/0x30
 [<ffffffff81094996>] ? ftrace_likely_update+0x12/0x14
 [<ffffffff8100bd80>] ? restore_args+0x0/0x30
 [<ffffffff810b8222>] IS_ERR+0x25/0x2c
 [<ffffffff810b8f22>] follow_page+0x28/0x2e3
 [<ffffffff81094990>] ? ftrace_likely_update+0xc/0x14
 [<ffffffff811219d8>] proc_pid_status+0x5e0/0x694
 [<ffffffff81066072>] ? trace_hardirqs_on+0xd/0xf
 [<ffffffff8111e42c>] proc_single_show+0x57/0x74
 [<ffffffff810e90b4>] seq_read+0x249/0x49b
 [<ffffffff8116c840>] ? security_file_permission+0x11/0x13
 [<ffffffff810d05e1>] vfs_read+0xe0/0x141
 [<ffffffff810d8191>] ? path_put+0x1d/0x21
 [<ffffffff810d06f8>] sys_read+0x45/0x69
 [<ffffffff8100b2ab>] system_call_fastpath+0x16/0x1b

Just for comparison, here are the pcscd stack traces right now, when things
are working just fine. One thread in select, one in nanosleep, just like
when it was broken.

pcscd S 0000000000000000  4656  2000      1 0x00020080
 ffff880073037948 0000000000000046 ffff8800730378a8 ffffffff81030461
 0000000000000000 0000000000000000 ffff880000000000 ffffffff81499842
 ffff8800730378d8 ffff88007ebf9120 000000000000df78 ffff88007ebf9120
Call Trace:
 [<ffffffff81030461>] ? need_resched+0x3a/0x40
 [<ffffffff81499842>] ? thread_return+0xb6/0xfa
 [<ffffffff81056bec>] ? add_wait_queue+0x1b/0x42
 [<ffffffff8149a70b>] schedule_hrtimeout_range+0x3f/0x11f
 [<ffffffff8149bb91>] ? _spin_unlock_irqrestore+0x72/0x80
 [<ffffffff81056c0b>] ? add_wait_queue+0x3a/0x42
 [<ffffffff810df122>] ? __pollwait+0xbe/0xc7
 [<ffffffff810de040>] poll_schedule_timeout+0x33/0x4c
 [<ffffffff810deb82>] do_select+0x4b7/0x4f5
 [<ffffffff810df064>] ? __pollwait+0x0/0xc7
 [<ffffffff810df12b>] ? pollwake+0x0/0x51
 [<ffffffff810a8d3a>] ? get_page_from_freelist+0x38a/0x63c
 [<ffffffff811bb5c4>] ? list_del+0xbc/0xea
 [<ffffffff810a7561>] ? __rmqueue+0x124/0x2bf
 [<ffffffff81034de2>] ? get_parent_ip+0x11/0x42
 [<ffffffff8149e770>] ? sub_preempt_count+0x35/0x49
 [<ffffffff810a8ed8>] ? get_page_from_freelist+0x528/0x63c
 [<ffffffff81035805>] ? finish_task_switch+0x3b/0xf4
 [<ffffffff81065f3f>] ? trace_hardirqs_on_caller+0x1f/0x145
 [<ffffffff8149bc01>] ? _spin_unlock_irq+0x62/0x6f
 [<ffffffff8103589b>] ? finish_task_switch+0xd1/0xf4
 [<ffffffff81035805>] ? finish_task_switch+0x3b/0xf4
 [<ffffffff81030461>] ? need_resched+0x3a/0x40
 [<ffffffff81499842>] ? thread_return+0xb6/0xfa
 [<ffffffff81065f3f>] ? trace_hardirqs_on_caller+0x1f/0x145
 [<ffffffff81107bf7>] compat_core_sys_select+0x183/0x23d
 [<ffffffff81499a4d>] ? preempt_schedule+0x5e/0x67
 [<ffffffff8149bb9a>] ? _spin_unlock_irqrestore+0x7b/0x80
 [<ffffffff81030930>] ? task_rq_unlock+0xc/0xe
 [<ffffffff8103ab25>] ? wake_up_new_task+0x169/0x174
 [<ffffffff8103eaf6>] ? do_fork+0x37a/0x424
 [<ffffffff81107f48>] compat_sys_select+0x96/0xbe
 [<ffffffff8107d80b>] ? audit_syscall_entry+0xcb/0x19c
 [<ffffffff8102b538>] sysenter_dispatch+0x7/0x2c

pcscd S ffff88007cf99f48  4896  2006      1 0x00020080
 ffff88007cf99e28 0000000000000046 ffff88007cf99d88 ffffffff8149bb91
 ffff88007cf99e78 0000000000000001 ffff88007cf99e18 ffffffff81059d91
 0000000000000000 ffff88007e4da3e0 000000000000df78 ffff88007e4da3e0
Call Trace:
 [<ffffffff8149bb91>] ? _spin_unlock_irqrestore+0x72/0x80
 [<ffffffff81059d91>] ? __hrtimer_start_range_ns+0x35b/0x36d
 [<ffffffff8149a882>] do_nanosleep+0x88/0xee
 [<ffffffff81059e75>] hrtimer_nanosleep+0xac/0x121
 [<ffffffff8105935d>] ? hrtimer_wakeup+0x0/0x21
 [<ffffffff81075da2>] compat_sys_nanosleep+0x7b/0xe1
 [<ffffffff8102b538>] sysenter_dispatch+0x7/0x2c
 [<ffffffff8149b2a6>] ? trace_hardirqs_on_thunk+0x3a/0x3f

[-- Attachment #2: Type: application/pgp-signature, Size: 226 bytes --]
* Re: 2.6.31-rc1-mmotm0702 - ps command hangs inside kernel
  2009-07-16 19:12       ` Valdis.Kletnieks
@ 2009-07-16 20:24         ` Stefani Seibold
  2009-07-16 20:44           ` Andrew Morton
  0 siblings, 1 reply; 7+ messages in thread
From: Stefani Seibold @ 2009-07-16 20:24 UTC (permalink / raw)
  To: Valdis.Kletnieks; +Cc: Andrew Morton, linux-kernel

On Thu, 16 Jul 2009, 15:12 -0400 Valdis.Kletnieks@vt.edu said:
> On Tue, 14 Jul 2009 07:31:19 +0200, Stefani Seibold said:
> > Am Montag, den 13.07.2009, 14:38 -0700 schrieb Andrew Morton:
> > > On Mon, 13 Jul 2009 16:54:51 -0400 Valdis.Kletnieks@vt.edu wrote:
> > >
> > > > Several times recently, I've had the 'ps' command hang inside the kernel
> > > > for extended periods of time - usually around 1100 seconds, but today I
> > > > had one that hung there for 2351 seconds.
> >
> > i am the author of the get_stack_usage_bytes(). Because i have currently
> > no 64bit machine running, i am not able to analyse your problem. Does it
> > only happen on 32bit application on a 64bit kernel? Is it only affected
> > to pcsd?
>
> I've only seen it happen to pcscd. However, most of the time it's one of
> the very few 32-bit apps running on my laptop (I've got exactly *one* legacy
> app for a secure-token that is stuck in 32-bit land). So I can't tell if it's
> a generic 32-bit issue.
>
> It's possible that one of the two follow_page() entries is stale and just
> happened to be left on the stack. A large chunk of proc_pid_status() is
> inlined, so it's possible that two calls were made and left their return
> addresses in different locations on the stack.
>
> I am pretty sure that follow_page+0x28 is the correct one, as I see it
> in 2 more tracebacks today (see below)...

The stack trace makes it look like an old version is included in the
2.6.31-rc1-mmotm0702 patches. I switched to the walk_page_range() function
in patch version V0.9, dated Jun 10 2009.
Here is the link to the lkml patchwork:

http://patchwork.kernel.org/patch/32210/

I do the map examination in exactly the same way as the function used for
/proc/<pid>/smaps, so i think this version should do it without side effects.

Can you tell me where you downloaded the 2.6.31-rc1-mmotm0702 patch?

> ps            R  running task     3936 45836  45832 0x00000080
>  ffff88004dc09b98 ffffffff81065f3f 0000000000000001 00000388525af000
>  ffff88004dc09bb8 ffffffff81065f3f 0000000000000000 000003886bc2b000
>  ffff88004dc09ce8 ffffffff8149b2a6 0000000000000000 ffff88000212cf68
> Call Trace:
>  [<ffffffff81065f3f>] ? trace_hardirqs_on_caller+0x1f/0x145
>  [<ffffffff81065f3f>] trace_hardirqs_on_caller+0x1f/0x145
>  [<ffffffff8149b2a6>] trace_hardirqs_on_thunk+0x3a/0x3f
>  [<ffffffff81065f3f>] ? trace_hardirqs_on_caller+0x1f/0x145
>  [<ffffffff8149b2a6>] ? trace_hardirqs_on_thunk+0x3a/0x3f
>  [<ffffffff8101f76b>] ? smp_apic_timer_interrupt+0x81/0x8f
>  [<ffffffff810b8222>] ? IS_ERR+0x25/0x2c
>  [<ffffffff810b8f22>] ? follow_page+0x28/0x2e3
>  [<ffffffff811219d8>] proc_pid_status+0x5e0/0x694
>  [<ffffffff81066072>] ? trace_hardirqs_on+0xd/0xf
>  [<ffffffff8111e42c>] proc_single_show+0x57/0x74
>  [<ffffffff810e90b4>] seq_read+0x249/0x49b
>  [<ffffffff8116c840>] ? security_file_permission+0x11/0x13
>  [<ffffffff810d05e1>] vfs_read+0xe0/0x141
>  [<ffffffff810d8191>] ? path_put+0x1d/0x21
>  [<ffffffff810d06f8>] sys_read+0x45/0x69
>  [<ffffffff8100b2ab>] system_call_fastpath+0x16/0x1b

^ permalink raw reply	[flat|nested] 7+ messages in thread
* Re: 2.6.31-rc1-mmotm0702 - ps command hangs inside kernel
  2009-07-16 20:24         ` Stefani Seibold
@ 2009-07-16 20:44           ` Andrew Morton
  2009-07-16 21:21             ` Valdis.Kletnieks
  0 siblings, 1 reply; 7+ messages in thread
From: Andrew Morton @ 2009-07-16 20:44 UTC (permalink / raw)
  To: Stefani Seibold; +Cc: Valdis.Kletnieks, linux-kernel

On Thu, 16 Jul 2009 22:24:47 +0200 Stefani Seibold <stefani@seibold.net> wrote:

> On Thu, 16 Jul 2009, 15:12 -0400 Valdis.Kletnieks@vt.edu said:
> > On Tue, 14 Jul 2009 07:31:19 +0200, Stefani Seibold said:
> > > Am Montag, den 13.07.2009, 14:38 -0700 schrieb Andrew Morton:
> > > > On Mon, 13 Jul 2009 16:54:51 -0400 Valdis.Kletnieks@vt.edu wrote:
> > > >
> > > > > Several times recently, I've had the 'ps' command hang inside the kernel
> > > > > for extended periods of time - usually around 1100 seconds, but today I
> > > > > had one that hung there for 2351 seconds.
> > >
> > > i am the author of the get_stack_usage_bytes(). Because i have currently
> > > no 64bit machine running, i am not able to analyse your problem. Does it
> > > only happen on 32bit application on a 64bit kernel? Is it only affected
> > > to pcsd?
> >
> > I've only seen it happen to pcscd. However, most of the time it's one of
> > the very few 32-bit apps running on my laptop (I've got exactly *one* legacy
> > app for a secure-token that is stuck in 32-bit land). So I can't tell if it's
> > a generic 32-bit issue.
> >
> > It's possible that one of the two follow_page() entries is stale and just
> > happened to be left on the stack. A large chunk of proc_pid_status() is
> > inlined, so it's possible that two calls were made and left their return
> > addresses in different locations on the stack.
> >
> > I am pretty sure that follow_page+0x28 is the correct one, as I see it
> > in 2 more tracebacks today (see below)...
>
> The stack trace makes it look like an old version is included in the
> 2.6.31-rc1-mmotm0702 patches.
>
> I switched to the walk_page_range() function in patch version V0.9,
> dated Jun 10 2009. Here is the link to the lkml patchwork:
>
> http://patchwork.kernel.org/patch/32210/
>
> I do the map examination in exactly the same way as the function used
> for /proc/<pid>/smaps, so i think this version should do it without
> side effects.
>
> Can you tell me where you downloaded the 2.6.31-rc1-mmotm0702 patch?

It would have been version 08. I've now updated to v11.

^ permalink raw reply	[flat|nested] 7+ messages in thread
* Re: 2.6.31-rc1-mmotm0702 - ps command hangs inside kernel
  2009-07-16 20:44           ` Andrew Morton
@ 2009-07-16 21:21             ` Valdis.Kletnieks
  0 siblings, 0 replies; 7+ messages in thread
From: Valdis.Kletnieks @ 2009-07-16 21:21 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Stefani Seibold, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 457 bytes --]

On Thu, 16 Jul 2009 13:44:01 PDT, Andrew Morton said:
> On Thu, 16 Jul 2009 22:24:47 +0200 Stefani Seibold <stefani@seibold.net> wrote:
> > The stack trace makes it look like an old version is included in the
> > 2.6.31-rc1-mmotm0702 patches.
> It would have been version 08. I've now updated to v11.

Did -11 make it into the mmotm0715 tarball? If so, I'll go build that, and if
the problem doesn't re-manifest in a week or so we can call it fixed.

[-- Attachment #2: Type: application/pgp-signature, Size: 226 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread
end of thread, other threads:[~2009-07-16 21:21 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-07-13 20:54 2.6.31-rc1-mmotm0702 - ps command hangs inside kernel Valdis.Kletnieks
2009-07-13 21:38 ` Andrew Morton
2009-07-14  5:31   ` Stefani Seibold
2009-07-16 19:12     ` Valdis.Kletnieks
2009-07-16 20:24       ` Stefani Seibold
2009-07-16 20:44         ` Andrew Morton
2009-07-16 21:21           ` Valdis.Kletnieks