public inbox for linux-kernel@vger.kernel.org
* Processes stuck in unkillable D state (now seen in 2.6.7-mm6)
@ 2004-07-08 22:15 Rob Mueller
  2004-07-09 12:58 ` Chris Mason
  0 siblings, 1 reply; 10+ messages in thread
From: Rob Mueller @ 2004-07-08 22:15 UTC (permalink / raw)
  To: linux-kernel; +Cc: Chris Mason

This is an update to a thread I started last week about processes getting 
stuck in D state.

About 2 days ago, we upgraded to 2.6.7-mm6. Things have generally been
running fine, but today some processes again got stuck in an unkillable D
state. This time, rather than 1 process getting stuck, about 20 got stuck
in a relatively short period of time (it seems to have been over about
half an hour). All of the stuck processes are cyrus imapd processes.

I've tried to get sysreq-t output, but as this machine is still up and
running, it has about 2500 processes on it, and I can't seem to get
consistent sysreq-t output. I set the kernel log buffer shift to 17 (128k),
but that definitely doesn't seem to be enough. I notice that the output
also gets written to /var/log/messages, and I get more output there, but it
still doesn't seem to be a complete process list, and each time I do a
sysreq-t, I get a different number of procs (though always incomplete) in
the output. Anyway, I've done sysreq-t twice, and got the output from
dmesg -s 1000000 and /var/log/messages. Since the output is so big, I've
put the files, and the kernel config, here:

http://robm.fastmail.fm/kernel/t1/
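
For a rough sense of why 128k falls short, here is a minimal sketch of the
arithmetic. It assumes the 17 is a power-of-two shift (in the style of the
kernel's CONFIG_LOG_BUF_SHIFT option), and the per-task dump size of 1000
bytes is a made-up illustrative figure, not a measurement from this machine.

```python
# Back-of-the-envelope sizing for the kernel log buffer.
# ASSUMED_BYTES_PER_TASK is hypothetical; measure a real sysreq-t entry
# before relying on the resulting shift.

LOG_BUF_SHIFT = 17             # value mentioned in the message (128k)
NUM_PROCS = 2500               # approximate process count on the machine
ASSUMED_BYTES_PER_TASK = 1000  # hypothetical size of one task's dump


def buffer_bytes(shift):
    """Size in bytes of a log buffer configured with the given shift."""
    return 1 << shift


def bytes_per_task(shift, num_procs):
    """Average bytes available per task if every task must fit."""
    return buffer_bytes(shift) // num_procs


def min_shift(num_procs, per_task_bytes):
    """Smallest shift whose buffer holds num_procs * per_task_bytes bytes."""
    needed = num_procs * per_task_bytes
    return max(needed - 1, 0).bit_length()


if __name__ == "__main__":
    print(buffer_bytes(LOG_BUF_SHIFT))               # 131072
    print(bytes_per_task(LOG_BUF_SHIFT, NUM_PROCS))  # 52
    print(min_shift(NUM_PROCS, ASSUMED_BYTES_PER_TASK))
```

At roughly 52 bytes per process, a 128k buffer cannot hold even the header
line plus one stack line for each task, which is consistent with the
truncated dumps described above.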

Process IDs that are definitely stuck are:

1013, 13389, 13469, 16056, 17340, 18489, 21341, 22661, 23976, 29138, 29752, 
30330, 31106, 31956, 32559, 32575, 3753, 5926, 6052, 8857, 9914

But as mentioned above, you won't find most of these in the sysreq-t output,
I presume because the buffer isn't big enough. Still, hopefully the ones you
can see there will provide some useful information. (FYI, searching for
imapd\s+D in the sysreq-t output, rather than for the individual pids, seems
to be a quicker way of finding the problem procs.)
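
A minimal sketch of that search, which also reports which of the known stuck
pids are missing from a dump. The regular expression is an assumption based
only on the two task header lines quoted in this message (name, state, an
8-digit hex word, a number, then the pid and parent pid); the function names
are hypothetical.

```python
import re

# Task header as it appears in the quoted dumps, e.g.
#   imapd         D F1778660     0  3753   1906          3754   809 (NOTLB)
# Assumed layout: name, state, hex word, number, pid, parent pid, ...
TASK_D = re.compile(r"(\S+)\s+D\s+[0-9A-Fa-f]{8}\s+\d+\s+(\d+)\s+\d+")

# Pids listed above as definitely stuck.
KNOWN_STUCK = {
    1013, 13389, 13469, 16056, 17340, 18489, 21341, 22661, 23976, 29138,
    29752, 30330, 31106, 31956, 32559, 32575, 3753, 5926, 6052, 8857, 9914,
}


def d_state_tasks(text):
    """Return (name, pid) for every task header line showing state D."""
    found = []
    for line in text.splitlines():
        m = TASK_D.search(line)  # search() tolerates a syslog prefix
        if m:
            found.append((m.group(1), int(m.group(2))))
    return found


def missing_pids(text, known=KNOWN_STUCK):
    """Known stuck pids that do not appear as D-state tasks in the dump."""
    seen = {pid for _, pid in d_state_tasks(text)}
    return sorted(known - seen)


if __name__ == "__main__":
    sample = (
        "imapd         D F1778660     0  3753   1906          3754   809 (NOTLB)\n"
        "eb15adb8 00000086 00000020 f1778660 c0310318 c43fc600 08155888 0000002d\n"
        "imapd         D E59812C0     0 22661   1906         23248 22592 (NOTLB)\n"
    )
    print(d_state_tasks(sample))
    print(missing_pids(sample))
```

To use it on a real dump, read the file (for example sysreqmsglog1.txt) into
a string and pass it to both functions.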

Having had a quick look myself, I see some odd things in there. For
instance, from sysreqmsglog1.txt:

   imapd         D F1778660     0  3753   1906          3754   809 (NOTLB)
   eb15adb8 00000086 00000020 f1778660 c0310318 c43fc600 08155888 0000002d
          f567d380 f7b97480 c42c3d20 00000000 0001ece6 6051d45f 00007c67 c42c3d20
          c03d8180 f1778660 f1778810 f78ad9cc 00000003 f78ad9cc f78ad9cc c025d40c
   Call Trace:
    [<c0310318>] memcpy_fromiovec+0x38/0x60
    [<c025d40c>] generic_unplug_device+0x2c/0x40
    [<c037a288>] io_schedule+0x28/0x40
    [<c012e17c>] __lock_page+0xbc/0xe0
    [<c012deb0>] page_wake_function+0x0/0x50
    [<c012deb0>] page_wake_function+0x0/0x50
    [<c012f1a1>] filemap_nopage+0x231/0x360
    [<c013dd58>] do_no_page+0xb8/0x3a0
    [<c013bbbb>] pte_alloc_map+0xdb/0xf0
    [<c013e1ee>] handle_mm_fault+0xbe/0x1a0
    [<c0112c62>] do_page_fault+0x172/0x5ec
    [<c012435b>] do_sigaction+0x19b/0x210
    [<c0120dac>] update_process_times+0x2c/0x40
    [<c0110230>] smp_apic_timer_interrupt+0x140/0x150
    [<c0112af0>] do_page_fault+0x0/0x5ec
    [<c0104b19>] error_code+0x2d/0x38

   imapd         D E59812C0     0 22661   1906         23248 22592 (NOTLB)
   d54f5db8 00000086 f7b7de18 e59812c0 d54f5d94 c04b0dc0 00000020 00000000
          c42c3060 f71696f0 c42c3d20 00000000 0002cda6 891b682d 00007b15 c42c3d20
          f71696f0 e59812c0 e5981470 00000003 c025d3bb f78ad9cc f78ad9cc c025d40c
   Call Trace:
    [<c025d3bb>] __generic_unplug_device+0x1b/0x40
    [<c025d40c>] generic_unplug_device+0x2c/0x40
    [<c037a288>] io_schedule+0x28/0x40
    [<c012e17c>] __lock_page+0xbc/0xe0
    [<c012deb0>] page_wake_function+0x0/0x50
    [<c012deb0>] page_wake_function+0x0/0x50
    [<c012f1a1>] filemap_nopage+0x231/0x360
    [<c013dd58>] do_no_page+0xb8/0x3a0
    [<c013bbbb>] pte_alloc_map+0xdb/0xf0
    [<c013e1ee>] handle_mm_fault+0xbe/0x1a0
    [<c0112af0>] do_page_fault+0x0/0x5ec
    [<c0104a5a>] apic_timer_interrupt+0x1a/0x20
    [<c0112c62>] do_page_fault+0x172/0x5ec
    [<c012435b>] do_sigaction+0x19b/0x210
    [<c0124693>] sys_rt_sigaction+0x53/0x90
    [<c030c631>] sys_socketcall+0x111/0x200
    [<c0112af0>] do_page_fault+0x0/0x5ec
    [<c0104b19>] error_code+0x2d/0x38

Those calls into "generic_unplug_device" look really strange to me...
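
To check whether frames such as generic_unplug_device recur across the stuck
tasks, here is a minimal sketch that extracts the function names from each
call trace and lists those present in every trace. The frame pattern is
assumed from the trace lines quoted above, and the helper names are
hypothetical.

```python
import re
from collections import Counter

# A frame as printed in the traces above, e.g.
#   [<c025d40c>] generic_unplug_device+0x2c/0x40
FRAME = re.compile(r"\[<[0-9a-f]{8}>\]\s+(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def frames(trace_text):
    """Function names in one call trace, in the order they were printed."""
    return [m.group(1) for m in FRAME.finditer(trace_text)]


def shared_frames(traces):
    """Functions that appear in every trace, ordered as in the first one."""
    if not traces:
        return []
    counts = Counter()
    for trace in traces:
        counts.update(set(frames(trace)))  # count each function once per trace
    ordered = dict.fromkeys(frames(traces[0]))  # de-duplicate, keep order
    return [name for name in ordered if counts[name] == len(traces)]


if __name__ == "__main__":
    trace_a = """
    [<c0310318>] memcpy_fromiovec+0x38/0x60
    [<c025d40c>] generic_unplug_device+0x2c/0x40
    [<c037a288>] io_schedule+0x28/0x40
    [<c012e17c>] __lock_page+0xbc/0xe0
    """
    trace_b = """
    [<c025d3bb>] __generic_unplug_device+0x1b/0x40
    [<c025d40c>] generic_unplug_device+0x2c/0x40
    [<c037a288>] io_schedule+0x28/0x40
    [<c012e17c>] __lock_page+0xbc/0xe0
    """
    print(shared_frames([trace_a, trace_b]))
```

The sample input is shortened to the first four frames of each trace quoted
above; feeding it the full traces works the same way.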

Rob


* Re: Processes stuck in unkillable D state (now seen in 2.6.7-mm6)
@ 2004-07-09  9:52 Rob Mueller
  0 siblings, 0 replies; 10+ messages in thread
From: Rob Mueller @ 2004-07-09  9:52 UTC (permalink / raw)
  To: linux-kernel; +Cc: Chris Mason

As an update, this machine eventually ended up dying pretty horribly. When
we found it, there were hundreds of procs stuck in D state, and our
automated "ping" script was reporting all sorts of problems. Anyway, we
killed off as many processes as possible, and did a sysreq-t before trying
to reboot. The soft reboot failed, and a hard reboot was required.

The newly gathered sysreq output has been placed in the files 
sysreqdmesg3.txt and sysreqmsglog3.txt here:

http://robm.fastmail.fm/kernel/t1/

Rob



Thread overview: 10+ messages
2004-07-08 22:15 Processes stuck in unkillable D state (now seen in 2.6.7-mm6) Rob Mueller
2004-07-09 12:58 ` Chris Mason
2004-07-12 19:53   ` Rob Mueller
2004-07-12 20:11     ` William Lee Irwin III
2004-07-12 20:14       ` Rob Mueller
2004-07-12 20:25         ` William Lee Irwin III
2004-07-20 19:51     ` Chris Mason
2004-07-20 21:19       ` Rob Mueller
2004-07-15 16:12   ` Processes stuck in unkillable D state (now seen in 2.6.8-rc1) Rob Mueller
  -- strict thread matches above, loose matches on Subject: below --
2004-07-09  9:52 Processes stuck in unkillable D state (now seen in 2.6.7-mm6) Rob Mueller
