public inbox for linux-kernel@vger.kernel.org
* 2.6.1 IO lockup on SMP systems
@ 2004-01-31 16:40 Sergey S. Kostyliov
  2004-02-01  0:17 ` Andrew Morton
  0 siblings, 1 reply; 24+ messages in thread
From: Sergey S. Kostyliov @ 2004-01-31 16:40 UTC
  To: linux-kernel; +Cc: anton

Hello all,

I have experienced lockups on three of my servers running 2.6.1. It doesn't
look like a deadlock: the box is still pingable, and all TCP ports which were
in the listen state before a lockup remain in the listen state, but I can't get
any data from these ports. According to sar(1), the systems were not overloaded
right before a lockup. There are also no entries in any of the user service
logs for almost 10 hours after the lockup.

So I think this is an IO lockup. On the other hand, it doesn't look like a bug
in a particular controller driver, because the controllers are different on
each box. Finally, it doesn't look like a bug in a particular IO scheduler,
because two of the boxes were running with "deadline" and one with "as". Of
course, these assumptions are valid only if all the lockups I have seen have
the same cause.

All three boxes are SMP. Unfortunately, all are remote and aren't attached
to a serial console yet (this is planned for the next couple of weeks).

1) ope
01:02.1 RAID bus controller: Mylex Corporation: Unknown device 0050 (rev 02)
elevator=deadline
.config:	http://sysadminday.org.ru/2.6.1-io_lockup/ope/.config
lspci:		http://sysadminday.org.ru/2.6.1-io_lockup/ope/lspci
lspci -vvn:	http://sysadminday.org.ru/2.6.1-io_lockup/ope/lspci_-vvn

2) white
02:04.0 RAID bus controller: American Megatrends Inc. MegaRAID (rev 02)
elevator=deadline
.config:	http://sysadminday.org.ru/2.6.1-io_lockup/white/.config
lspci:		http://sysadminday.org.ru/2.6.1-io_lockup/white/lspci
lspci -vvn:	http://sysadminday.org.ru/2.6.1-io_lockup/white/lspci_-vvn

3) tiny
02:00.0 Unknown mass storage controller: Compaq Computer Corporation Smart-2/P RAID Controller (rev 03)
03:00.0 Unknown mass storage controller: Compaq Computer Corporation Smart-2/P RAID Controller (rev 03)
elevator=as
.config:	http://sysadminday.org.ru/2.6.1-io_lockup/tiny/.config
lspci:		http://sysadminday.org.ru/2.6.1-io_lockup/tiny/lspci
lspci -vvn:	http://sysadminday.org.ru/2.6.1-io_lockup/tiny/lspci_-vvn

Any hints will be appreciated.

-- 
                   Best regards,
                   Sergey S. Kostyliov <rathamahata@php4.ru>
                   Public PGP key: http://sysadminday.org.ru/rathamahata.asc



* Re: 2.6.1 IO lockup on SMP systems
  2004-01-31 16:40 2.6.1 IO lockup on SMP systems Sergey S. Kostyliov
@ 2004-02-01  0:17 ` Andrew Morton
  2004-02-21 16:45   ` Sergey S. Kostyliov
  0 siblings, 1 reply; 24+ messages in thread
From: Andrew Morton @ 2004-02-01  0:17 UTC
  To: Sergey S. Kostyliov; +Cc: linux-kernel, anton

"Sergey S. Kostyliov" <rathamahata@php4.ru> wrote:
>
> I have experienced lockups on three of my servers running 2.6.1. It doesn't
> look like a deadlock: the box is still pingable, and all TCP ports which were
> in the listen state before a lockup remain in the listen state, but I can't get
> any data from these ports. According to sar(1), the systems were not overloaded
> right before a lockup. There are also no entries in any of the user service
> logs for almost 10 hours after the lockup.

Please ensure that CONFIG_KALLSYMS is enabled, then generate an all-tasks
backtrace of a locked machine with sysrq-T or `echo t >
/proc/sysrq-trigger'.  Then send us the resulting trace.

You may need a serial console to be able to capture all the output.
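
For unattended boxes, the `echo t > /proc/sysrq-trigger' step can be scripted.
The following is only a sketch in Python, under the assumptions that it runs
as root and that the kernel has magic SysRq support built in; `trigger_sysrq`
and its `path` argument are names made up for illustration, and the resulting
trace still has to be collected from the serial console or the kernel log.

```python
def trigger_sysrq(key="t", path="/proc/sysrq-trigger"):
    """Write one SysRq command character to the trigger file.

    'path' is a parameter only so that the function can be tried against
    an ordinary file; writing to the real trigger file needs root.
    """
    if len(key) != 1:
        raise ValueError("SysRq command must be a single character")
    with open(path, "w") as f:
        f.write(key)
```

Calling `trigger_sysrq("t")` as root is meant to do the same thing as the
shell command quoted above; nothing here captures the output by itself.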

Also, it would be useful to know what sort of load the machines are under,
and what filesystems are in use.

Thanks.



* Re: 2.6.1 IO lockup on SMP systems
  2004-02-01  0:17 ` Andrew Morton
@ 2004-02-21 16:45   ` Sergey S. Kostyliov
  2004-02-21 19:30     ` Andrew Morton
  0 siblings, 1 reply; 24+ messages in thread
From: Sergey S. Kostyliov @ 2004-02-21 16:45 UTC
  To: Andrew Morton; +Cc: linux-kernel, anton

Hello Andrew,

On Sunday 01 February 2004 03:17, Andrew Morton wrote:
> "Sergey S. Kostyliov" <rathamahata@php4.ru> wrote:
> >
> > I have experienced lockups on three of my servers running 2.6.1. It doesn't
> > look like a deadlock: the box is still pingable, and all TCP ports which were
> > in the listen state before a lockup remain in the listen state, but I can't get
> > any data from these ports. According to sar(1), the systems were not overloaded
> > right before a lockup. There are also no entries in any of the user service
> > logs for almost 10 hours after the lockup.
> 
> Please ensure that CONFIG_KALLSYMS is enabled, then generate an all-tasks
> backtrace of a locked machine with sysrq-T or `echo t >
> /proc/sysrq-trigger'.  Then send us the resulting trace.

I've just reproduced this lockup with 2.6.3.

> 
> You may need a serial console to be able to capture all the output.
> 
> Also, it would be useful to know what sort of load the machines are under,
> and what filesystems are in use.

The machine is an HTTP server. The main applications are:
1) apache 1.3 which serves php pages (mod_php):
	 15.3 requests/sec - 111.9 kB/second - 7.3 kB/request
	 54 requests currently being processed, 19 idle servers
2) mysql:
	Threads: 19  Questions: 26922012  Slow queries: 9799  Opens: 64980
	Flush tables: 1  Open tables: 630  Queries per second avg: 143.547

In general, this is an IO-bound machine. All filesystems are reiserfs.

Here is the sysrq-T output obtained from a locked box via serial console:

SysRq : Show State

                         free                        sibling
  task             PC    stack   pid father child younger older
init          D 28E916FC    24     1      0     2               (NOTLB)
c244fcf0 00000086 d8460080 28e916fc 00003243 c2422bc0 f77fbd00 00000096
d8460080 2ede4081 00003243 c02af980 00000001 2ede4181 00003243 d8460080
d84600a0 c2422bc0 000017a2 2ede43e1 00003243 c244dac8 03471525 c244fd04
Call Trace:
[<c012b62c>] schedule_timeout+0x6c/0xc0
[<c0142b11>] wakeup_bdflush+0x21/0x40
[<c012b5b0>] process_timeout+0x0/0x10
[<c011eb7b>] io_schedule_timeout+0x2b/0x40
[<c01f54a4>] blk_congestion_wait+0x84/0xa0
[<c011fda0>] autoremove_wake_function+0x0/0x50
[<c011fda0>] autoremove_wake_function+0x0/0x50
[<c0148f9f>] try_to_free_pages+0xef/0x190
[<c014187c>] __alloc_pages+0x1dc/0x350
[<c013e6a9>] filemap_nopage+0x329/0x3d0
[<c0157728>] read_swap_cache_async+0xb8/0xd0
[<c014c903>] swapin_readahead+0x43/0x90
[<c014cb98>] do_swap_page+0x248/0x320
[<c014d4d0>] handle_mm_fault+0xe0/0x1b0
[<c011a94c>] do_page_fault+0x33c/0x530
[<c0171334>] sys_select+0x264/0x520
[<c011a610>] do_page_fault+0x0/0x530
[<c0109d25>] error_code+0x2d/0x38

migration/0   S 00000001    12     2      1             3       (L-TLB)
c245dfc4 00000046 c241abc0 00000001 00000003 c245df98 c02ab560 f7da6d00
69378bf7 247a42e0 00000000 e11a4de2 f77e1f58 c245c000 f77e1f50 00000292
c245dfc4 c241abc0 00001801 e11a54a6 00000001 c244ce68 c241b4ec c245c000
Call Trace:
[<c011f4af>] migration_thread+0xdf/0x160
[<c011f3d0>] migration_thread+0x0/0x160
[<c0106f79>] kernel_thread_helper+0x5/0xc

ksoftirqd/0   S 00000001    24     3      1             4     2 (L-TLB)
c245bfd8 00000046 c241abc0 00000001 00000003 f3fc8ca8 f63f62e0 c241c54c
c245bf94 c245bf94 c241c55c 00000000 cebbd940 253f991d 000019b1 f5f766d0
f5f766f0 c241abc0 0000010b 8d075105 000019ea c244c838 c245a000 c245a000
Call Trace:
[<c0126d22>] ksoftirqd+0xe2/0x100
[<c0126c40>] ksoftirqd+0x0/0x100
[<c0106f79>] kernel_thread_helper+0x5/0xc

migration/1   S 00000001     8     4      1             5     3 (L-TLB)
c2459fc4 00000046 c2422bc0 00000001 00000003 c02aeedc c02ab560 c0336c60
c012bb70 c02aeedc c02aeed8 c2458000 c0123735 00000082 c02aba30 00000008
c2458000 c2422bc0 00004b63 0295e0d2 00000000 c244c208 c24234ec c2458000
Call Trace:
[<c012bb70>] free_uid+0x20/0x90
[<c0123735>] reparent_to_init+0x105/0x1a0
[<c011f4af>] migration_thread+0xdf/0x160
[<c011f3d0>] migration_thread+0x0/0x160
[<c0106f79>] kernel_thread_helper+0x5/0xc

ksoftirqd/1   S C0134355    24     5      1             6     4 (L-TLB)
c2455fd8 00000046 c03385e0 c0134355 02002bfd cad564c8 ed2ed0e0 c242454c
c2455f94 c2455f94 c242455c 00000000 c2454000 c033759c c0126a03 eca2f350
eca2f370 c2422bc0 0000025a 31392d56 000019e4 c2457ae8 c2454000 c2454000
Call Trace:
[<c0134355>] rcu_process_callbacks+0x155/0x190
[<c0126a03>] tasklet_action+0x73/0xe0
[<c0126d22>] ksoftirqd+0xe2/0x100
[<c0126c40>] ksoftirqd+0x0/0x100
[<c0106f79>] kernel_thread_helper+0x5/0xc

events/0      S 00000001     0     6      1 14588       7     5 (L-TLB)
f7f93f70 00000046 c241abc0 00000001 00000003 0000000b f77fb8c0 c02b8124
00000246 c241b520 c0353e40 00000000 f7fcbbe4 f7f92000 f7fcbbe0 00000092
f7f93f70 c241abc0 000001c9 25a4fd1b 00003243 c24574b8 f7f92000 f7fcbbcc
Call Trace:
[<c01333e5>] worker_thread+0x285/0x2b0
[<c01e5a60>] console_callback+0x0/0xe0
[<c011d8e0>] default_wake_function+0x0/0x20
[<c0109192>] ret_from_fork+0x6/0x14
[<c011d8e0>] default_wake_function+0x0/0x20
[<c0133160>] worker_thread+0x0/0x2b0
[<c0106f79>] kernel_thread_helper+0x5/0xc



events/1      S ED2ED520     0     7      1             8     6 (L-TLB)
f7f91f70 00000046 00000000 ed2ed520 00000000 e07e9e88 ed2ed520 00000000
00000000 f630a080 0000007b 0000007b f630a080 f630a0a0 c2422bc0 f6258d20
f6258d40 c2422bc0 0000006b ecdabbfb 000019ef c2456e88 f7f90000 f7fcbc2c
Call Trace:
[<c01333e5>] worker_thread+0x285/0x2b0
[<c0132e00>] __call_usermodehelper+0x0/0x70
[<c011d8e0>] default_wake_function+0x0/0x20
[<c0109192>] ret_from_fork+0x6/0x14
[<c011d8e0>] default_wake_function+0x0/0x20
[<c0133160>] worker_thread+0x0/0x2b0
[<c0106f79>] kernel_thread_helper+0x5/0xc

kblockd/0     S 00000001    24     8      1             9     7 (L-TLB)
c2527f70 00000046 c241abc0 00000001 00000003 00000001 f776f2a0 f7fa8000
c02027ec c2772e00 f3dbde28 f7c37834 f7fcb3a4 c2526000 f7fcb3a0 00000092
c2527f70 c241abc0 0000067d 03fc798c 00002b4f c2456858 c2526000 f7fcb38c
Call Trace:
[<c02027ec>] DAC960_process_queue+0x1c/0x170
[<c01333e5>] worker_thread+0x285/0x2b0
[<c01f4670>] blk_unplug_work+0x0/0x20
[<c011d8e0>] default_wake_function+0x0/0x20
[<c0109192>] ret_from_fork+0x6/0x14
[<c011d8e0>] default_wake_function+0x0/0x20
[<c0133160>] worker_thread+0x0/0x2b0
[<c0106f79>] kernel_thread_helper+0x5/0xc

kblockd/1     S C2772E00     8     9      1            13     8 (L-TLB)
c2525f70 00000046 c01f29d6 c2772e00 00000003 00000003 d1a49ae0 f7fa8000
c02027ec c2772e00 ecffc5d8 c2763c60 f7fcb404 c2524000 f7fcb400 c25026b0
c25026d0 c2422bc0 00000961 ff665670 00002b4e c2456228 c2524000 f7fcb3ec
Call Trace:
[<c01f29d6>] elv_next_request+0x16/0x110
[<c02027ec>] DAC960_process_queue+0x1c/0x170
[<c01333e5>] worker_thread+0x285/0x2b0
[<c01f4670>] blk_unplug_work+0x0/0x20
[<c011d8e0>] default_wake_function+0x0/0x20
[<c0109192>] ret_from_fork+0x6/0x14
[<c011d8e0>] default_wake_function+0x0/0x20
[<c0133160>] worker_thread+0x0/0x2b0
[<c0106f79>] kernel_thread_helper+0x5/0xc

kswapd0       S C23FFE98     0    13      1            10     9 (L-TLB)
f7dfbf04 00000046 c23fff38 c23ffe98 000000d0 00000200 f77a06c0 c02b0280
00000002 00000000 c0149200 00000100 c02b0280 000000d0 00000200 f72ecce0
f72ecd00 c2422bc0 0000b6a2 df9ac558 00003243 c2502878 f7dfa000 f7dfbf20
Call Trace:
[<c0149200>] balance_pgdat+0x1c0/0x250
[<c014939b>] kswapd+0x10b/0x160
[<c011fda0>] autoremove_wake_function+0x0/0x50
[<c0109192>] ret_from_fork+0x6/0x14
[<c011fda0>] autoremove_wake_function+0x0/0x50
[<c0149290>] kswapd+0x0/0x160
[<c0106f79>] kernel_thread_helper+0x5/0xc

kirqd         S 00000001     8    10      1            14    13 (L-TLB)
c2501fa0 00000046 c2422bc0 00000001 00000003 00000000 d1a49040 00000000
c0109c28 00000000 000000d5 005d2025 c244d2d0 4926873b 03471a9a f77ac6f0
f77ac710 c2422bc0 000006fd 881c4eb1 00003243 c2503b08 03472e23 c2501fb4
Call Trace:
[<c0109c28>] common_interrupt+0x18/0x20
[<c012b62c>] schedule_timeout+0x6c/0xc0
[<c012b5b0>] process_timeout+0x0/0x10
[<c0118ee7>] balanced_irq+0x57/0x80
[<c0118e90>] balanced_irq+0x0/0x80
[<c0106f79>] kernel_thread_helper+0x5/0xc

aio/0         S 00000082     0    14      1            15    10 (L-TLB)
f7da9f70 00000046 00000001 00000082 00000001 c244ff68 c02ab560 f7da9f4c
c011d93a c244d900 00000003 00000000 c244ff68 f7da8000 00010000 c244d900
c244d920 c241abc0 000027fb 1965fa0c 00000000 c2502248 f7da8000 00000000
Call Trace:
[<c011d93a>] __wake_up_common+0x3a/0x70
[<c01333e5>] worker_thread+0x285/0x2b0
[<c011d8e0>] default_wake_function+0x0/0x20
[<c0109192>] ret_from_fork+0x6/0x14
[<c011d8e0>] default_wake_function+0x0/0x20
[<c0133160>] worker_thread+0x0/0x2b0
[<c0106f79>] kernel_thread_helper+0x5/0xc

aio/1         S 00000001     0    15      1            16    14 (L-TLB)
f7da5f70 00000046 c2422bc0 00000001 00000003 c244ff68 c02ab560 f7da5f4c
c011d93a c244d900 00000003 00000000 c244ff68 f7da4000 00010000 f7da7960
f7dd7c04 c2422bc0 0000241f 19668d09 00000000 f7da7b28 f7da4000 00000000
Call Trace:
[<c011d93a>] __wake_up_common+0x3a/0x70
[<c01333e5>] worker_thread+0x285/0x2b0
[<c011d8e0>] default_wake_function+0x0/0x20
[<c0109192>] ret_from_fork+0x6/0x14
[<c011d8e0>] default_wake_function+0x0/0x20
[<c0133160>] worker_thread+0x0/0x2b0
[<c0106f79>] kernel_thread_helper+0x5/0xc


kseriod       S 00000001  1688    16      1            17    15 (L-TLB)
c26c9fb0 00000046 c2422bc0 00000001 00000003 e2024870 f68ad760 00000002
e2024000 012aeedc e20248b0 c02be660 c02be8a0 c02be820 00000286 c021ab1a
c02be8a0 c2422bc0 018bda82 3f31580d 0000317e f7da6268 c26c8000 ffffe000
Call Trace:
[<c021ab1a>] serio_find_dev+0x6a/0x70
[<c021adb6>] serio_thread+0x146/0x180
[<c0109192>] ret_from_fork+0x6/0x14
[<c011d8e0>] default_wake_function+0x0/0x20
[<c021ac70>] serio_thread+0x0/0x180
[<c0106f79>] kernel_thread_helper+0x5/0xc

reiserfs/0    S 00000003     0    17      1            18    16 (L-TLB)
c2697f70 00000046 f880b38c 00000003 00000001 00000000 ecec66e0 f880b398
f880b34c 00000292 c01b824f f8831c20 c26dce44 c2696000 c26dce40 cdc9aca0
cdc9acc0 c241abc0 00001bcf c3d3faeb 00001a46 f7da6898 c2696000 c26dce2c
Call Trace:
[<c01b824f>] kupdate_one_transaction+0x12f/0x250
[<c01333e5>] worker_thread+0x285/0x2b0
[<c01b97c0>] reiserfs_journal_commit_task_func+0x0/0x100
[<c011d8e0>] default_wake_function+0x0/0x20
[<c0109192>] ret_from_fork+0x6/0x14
[<c011d8e0>] default_wake_function+0x0/0x20
[<c0133160>] worker_thread+0x0/0x2b0
[<c0106f79>] kernel_thread_helper+0x5/0xc

reiserfs/1    S 00000003     0    18      1            23    17 (L-TLB)
f77e1f70 00000046 f8a272ec 00000003 00000001 00000000 f776fb20 00000000
f8a272ac f77a11f0 c01b824f f77a11f0 c26dcea4 f77e0000 c26dcea0 f38aace0
f38aad00 c2422bc0 00000448 e234458a 00001a55 f7da6ec8 f77e0000 c26dce8c
Call Trace:
[<c01b824f>] kupdate_one_transaction+0x12f/0x250
[<c01333e5>] worker_thread+0x285/0x2b0
[<c01b97c0>] reiserfs_journal_commit_task_func+0x0/0x100
[<c011d8e0>] default_wake_function+0x0/0x20
[<c0109192>] ret_from_fork+0x6/0x14
[<c011d8e0>] default_wake_function+0x0/0x20
[<c0133160>] worker_thread+0x0/0x2b0
[<c0106f79>] kernel_thread_helper+0x5/0xc

devfsd        D 25935C12    16    23      1           610    18 (NOTLB)
f7683bcc 00000086 f5f6b980 25935c12 00003243 c241abc0 f77fb8c0 00000096
f5f6b980 25935ac8 00003243 c02af980 00000001 25935c12 00003243 f5f6b980
f5f6b9a0 c241abc0 00002372 25935f22 00003243 f7757538 03471489 f7683be0
Call Trace:
[<c012b62c>] schedule_timeout+0x6c/0xc0
[<c0142b11>] wakeup_bdflush+0x21/0x40
[<c012b5b0>] process_timeout+0x0/0x10
[<c011eb7b>] io_schedule_timeout+0x2b/0x40
[<c01f54a4>] blk_congestion_wait+0x84/0xa0
[<c011fda0>] autoremove_wake_function+0x0/0x50
[<c011fda0>] autoremove_wake_function+0x0/0x50
[<c0148f9f>] try_to_free_pages+0xef/0x190
[<c014187c>] __alloc_pages+0x1dc/0x350
[<c0157728>] read_swap_cache_async+0xb8/0xd0
[<c014c903>] swapin_readahead+0x43/0x90
[<c014cb98>] do_swap_page+0x248/0x320
[<c014d4d0>] handle_mm_fault+0xe0/0x1b0
[<c011a94c>] do_page_fault+0x33c/0x530
[<c0141740>] __alloc_pages+0xa0/0x350
[<c011b858>] recalc_task_prio+0xa8/0x1d0
[<c011d503>] schedule+0x373/0x700
[<c011a610>] do_page_fault+0x0/0x530
[<c0109d25>] error_code+0x2d/0x38
[<c01cc1f6>] __copy_to_user_ll+0x46/0x80
[<c01c161e>] devfsd_read+0x42e/0x4e0
[<c011d8e0>] default_wake_function+0x0/0x20
[<c011d8e0>] default_wake_function+0x0/0x20
[<c015c4a8>] vfs_read+0xb8/0x130
[<c015c752>] sys_read+0x42/0x70
[<c01092bb>] syscall_call+0x7/0xb

syslogd       D 00000001     0   610      1           616    23 (NOTLB)
f71cdcf0 00000086 c241abc0 00000001 00000003 c2422bc0 f77a04a0 00000096
f5f6b980 24068d22 00003243 c02af980 00000001 00000096 f71cc000 f71cc000
f71cdd04 c241abc0 00001cf0 2406959c 00003243 f7da74f8 0347146f f71cdd04
Call Trace:
[<c012b62c>] schedule_timeout+0x6c/0xc0
[<c0142b11>] wakeup_bdflush+0x21/0x40
[<c012b5b0>] process_timeout+0x0/0x10
[<c011eb7b>] io_schedule_timeout+0x2b/0x40
[<c01f54a4>] blk_congestion_wait+0x84/0xa0
[<c011fda0>] autoremove_wake_function+0x0/0x50
[<c011fda0>] autoremove_wake_function+0x0/0x50
[<c0148f9f>] try_to_free_pages+0xef/0x190
[<c014187c>] __alloc_pages+0x1dc/0x350
[<c013e6a9>] filemap_nopage+0x329/0x3d0
[<c0157728>] read_swap_cache_async+0xb8/0xd0
[<c014c903>] swapin_readahead+0x43/0x90
[<c014cb98>] do_swap_page+0x248/0x320
[<c014d4d0>] handle_mm_fault+0xe0/0x1b0
[<c011a94c>] do_page_fault+0x33c/0x530
[<c0125b1e>] do_setitimer+0x1be/0x1f0
[<c01086c0>] sys_sigreturn+0xf0/0x110
[<c011a610>] do_page_fault+0x0/0x530
[<c0109d25>] error_code+0x2d/0x38

klogd         D B043E3E1     0   616      1           699   610 (NOTLB)
f7769cf0 00000086 d8460080 b6390e66 00003244 c2422bc0 f7306d60 00000096
d8460080 b6390d6f 00003244 c02af980 00000001 b6390e66 00003244 d8460080
d84600a0 c2422bc0 00001736 bc2e3c36 00003244 f7628838 03472f32 f7769d04
Call Trace:
[<c012b62c>] schedule_timeout+0x6c/0xc0
[<c0142b11>] wakeup_bdflush+0x21/0x40
[<c012b5b0>] process_timeout+0x0/0x10
[<c011eb7b>] io_schedule_timeout+0x2b/0x40
[<c01f54a4>] blk_congestion_wait+0x84/0xa0
[<c011fda0>] autoremove_wake_function+0x0/0x50
[<c011fda0>] autoremove_wake_function+0x0/0x50
[<c0148f9f>] try_to_free_pages+0xef/0x190
[<c014187c>] __alloc_pages+0x1dc/0x350
[<c0157728>] read_swap_cache_async+0xb8/0xd0
[<c014c903>] swapin_readahead+0x43/0x90
[<c014cb98>] do_swap_page+0x248/0x320
[<c014d4d0>] handle_mm_fault+0xe0/0x1b0
[<c011a94c>] do_page_fault+0x33c/0x530
[<c011d8e0>] default_wake_function+0x0/0x20
[<c012b4b0>] do_timer+0xc0/0xd0
[<c015c4c2>] vfs_read+0xd2/0x130
[<c015c752>] sys_read+0x42/0x70
[<c011a610>] do_page_fault+0x0/0x530
[<c0109d25>] error_code+0x2d/0x38

ntpd          S 00000001    16   699      1           718   616 (NOTLB)
f735deb0 00000086 c2422bc0 00000001 00000003 00000000 f77a06c0 c02afe00
00000000 00000000 08098478 00000000 f72ecce0 00000010 c02b0700 00000000
000000d0 c2422bc0 00000260 ce28bcca 00003244 f72ecea8 00000000 7fffffff
Call Trace:
[<c012b67e>] schedule_timeout+0xbe/0xc0
[<c022485b>] datagram_poll+0x2b/0xca
[<c021e809>] sock_poll+0x29/0x40
[<c0170f21>] do_select+0x1a1/0x310
[<c0170bb0>] __pollwait+0x0/0xd0
[<c01713cb>] sys_select+0x2fb/0x520
[<c01092bb>] syscall_call+0x7/0xb

sshd          D EBC82985     0   718      1  1051     741   699 (NOTLB)
f77bfc84 00000082 d8460080 ebc82985 00003244 c2422bc0 f77fb040 00000082
d8460080 ebc82884 00003244 c02af980 00000001 f1bd4c48 00003244 d8460080
d84600a0 c2422bc0 00001749 f1bd4e95 00003244 f71f3b28 034732b5 f77bfc98
Call Trace:
[<c012b62c>] schedule_timeout+0x6c/0xc0
[<c0142b11>] wakeup_bdflush+0x21/0x40
[<c012b5b0>] process_timeout+0x0/0x10
[<c011eb7b>] io_schedule_timeout+0x2b/0x40
[<c01f54a4>] blk_congestion_wait+0x84/0xa0
[<c011fda0>] autoremove_wake_function+0x0/0x50
[<c011fda0>] autoremove_wake_function+0x0/0x50
[<c0148f9f>] try_to_free_pages+0xef/0x190
[<c014187c>] __alloc_pages+0x1dc/0x350
[<c0143bc2>] do_page_cache_readahead+0x172/0x1e0
[<c013e511>] filemap_nopage+0x191/0x3d0
[<c013e380>] filemap_nopage+0x0/0x3d0
[<c014cfd3>] do_no_page+0xd3/0x3c0
[<c014acc7>] pte_alloc_map+0xc7/0x110
[<c014d4f6>] handle_mm_fault+0x106/0x1b0
[<c011a94c>] do_page_fault+0x33c/0x530
[<c0171334>] sys_select+0x264/0x520
[<c011a610>] do_page_fault+0x0/0x530
[<c0109d25>] error_code+0x2d/0x38



xinetd        D 00000001     0   741      1           758   718 (NOTLB)
f7627c84 00000086 c241abc0 00000001 00000003 f7626000 f77fb6a0 212e541d
00003243 f5f6b980 f5f6b9a0 c241abc0 00015dbd 212e559d 00003243 f7626000
f7627c98 c241abc0 000004c7 212e61bf 00003243 f72ff4b8 03471440 f7627c98
Call Trace:
[<c012b62c>] schedule_timeout+0x6c/0xc0
[<c0142b11>] wakeup_bdflush+0x21/0x40
[<c012b5b0>] process_timeout+0x0/0x10
[<c011eb7b>] io_schedule_timeout+0x2b/0x40
[<c01f54a4>] blk_congestion_wait+0x84/0xa0
[<c011fda0>] autoremove_wake_function+0x0/0x50
[<c011fda0>] autoremove_wake_function+0x0/0x50
[<c0148f9f>] try_to_free_pages+0xef/0x190
[<c014187c>] __alloc_pages+0x1dc/0x350
[<c0143bc2>] do_page_cache_readahead+0x172/0x1e0
[<c01f5d56>] generic_make_request+0x106/0x190
[<c013e511>] filemap_nopage+0x191/0x3d0
[<c013e380>] filemap_nopage+0x0/0x3d0
[<c014cfd3>] do_no_page+0xd3/0x3c0
[<c014acc7>] pte_alloc_map+0xc7/0x110
[<c014d4f6>] handle_mm_fault+0x106/0x1b0
[<c011a94c>] do_page_fault+0x33c/0x530
[<c01cc1fa>] __copy_to_user_ll+0x4a/0x80
[<c0171334>] sys_select+0x264/0x520
[<c011d8e0>] default_wake_function+0x0/0x20
[<c011a610>] do_page_fault+0x0/0x530
[<c0109d25>] error_code+0x2d/0x38

svscan        D 00000001     0   758      1   759     785   741 (NOTLB)
f736fcf0 00000082 c241abc0 00000001 00000003 c2422bc0 f776f080 00000096
d8460080 24345b50 00003243 c02af980 00000001 00000096 f736e000 f736e000
f736fd04 c241abc0 00001e2b 24346317 00003243 f72ec878 03471472 f736fd04
Call Trace:
[<c012b62c>] schedule_timeout+0x6c/0xc0
[<c0142b11>] wakeup_bdflush+0x21/0x40
[<c012b5b0>] process_timeout+0x0/0x10
[<c011eb7b>] io_schedule_timeout+0x2b/0x40
[<c01f54a4>] blk_congestion_wait+0x84/0xa0
[<c011fda0>] autoremove_wake_function+0x0/0x50
[<c011fda0>] autoremove_wake_function+0x0/0x50
[<c0148f9f>] try_to_free_pages+0xef/0x190
[<c014187c>] __alloc_pages+0x1dc/0x350
[<c013e6a9>] filemap_nopage+0x329/0x3d0
[<c0157728>] read_swap_cache_async+0xb8/0xd0
[<c014c903>] swapin_readahead+0x43/0x90
[<c014cb98>] do_swap_page+0x248/0x320
[<c014d4d0>] handle_mm_fault+0xe0/0x1b0
[<c011a94c>] do_page_fault+0x33c/0x530
[<c012b636>] schedule_timeout+0x76/0xc0
[<c013027e>] sys_rt_sigaction+0xfe/0x120
[<c012b5b0>] process_timeout+0x0/0x10
[<c012b85e>] sys_nanosleep+0x10e/0x1c0
[<c011a610>] do_page_fault+0x0/0x530
[<c0109d25>] error_code+0x2d/0x38

supervise     D 00000001     0   759    758   761     760       (NOTLB)
f6e4fcf0 00000086 c2422bc0 00000001 00000003 f6e4e000 f77a0060 606e7502
00003245 d8460080 d84600a0 c2422bc0 0000dc5d 606e7682 00003245 f6e4e000
f6e4fd04 c2422bc0 00000496 6672deb3 00003245 f72ffae8 03473a5b f6e4fd04
Call Trace:
[<c012b62c>] schedule_timeout+0x6c/0xc0
[<c0142b11>] wakeup_bdflush+0x21/0x40
[<c012b5b0>] process_timeout+0x0/0x10
[<c011eb7b>] io_schedule_timeout+0x2b/0x40
[<c01f54a4>] blk_congestion_wait+0x84/0xa0
[<c011fda0>] autoremove_wake_function+0x0/0x50
[<c011fda0>] autoremove_wake_function+0x0/0x50
[<c0148f9f>] try_to_free_pages+0xef/0x190
[<c014187c>] __alloc_pages+0x1dc/0x350
[<c013e6a9>] filemap_nopage+0x329/0x3d0
[<c0157728>] read_swap_cache_async+0xb8/0xd0
[<c014c903>] swapin_readahead+0x43/0x90
[<c014cb98>] do_swap_page+0x248/0x320
[<c014d4d0>] handle_mm_fault+0xe0/0x1b0
[<c011a94c>] do_page_fault+0x33c/0x530
[<c0170ba4>] poll_freewait+0x44/0x50
[<c01719d2>] sys_poll+0x272/0x2c0
[<c0170bb0>] __pollwait+0x0/0xd0
[<c011a610>] do_page_fault+0x0/0x530
[<c0109d25>] error_code+0x2d/0x38

supervise     D 00000001     0   760    758   762           759 (NOTLB)
f6e4dc84 00000082 c241abc0 00000001 00000003 c2422bc0 f7306b40 00000082
f5f6b980 211bd57b 00003243 c02af980 00000001 00000082 f6e4c000 f5f83310
f5f83330 c241abc0 0001b8e4 211bdfd4 00003243 f72fe228 0347143e f6e4dc98
Call Trace:
[<c012b62c>] schedule_timeout+0x6c/0xc0
[<c0142b11>] wakeup_bdflush+0x21/0x40
[<c012b5b0>] process_timeout+0x0/0x10
[<c011eb7b>] io_schedule_timeout+0x2b/0x40
[<c01f54a4>] blk_congestion_wait+0x84/0xa0
[<c011fda0>] autoremove_wake_function+0x0/0x50
[<c011fda0>] autoremove_wake_function+0x0/0x50
[<c0148f9f>] try_to_free_pages+0xef/0x190
[<c014187c>] __alloc_pages+0x1dc/0x350
[<c0143bc2>] do_page_cache_readahead+0x172/0x1e0
[<c013e511>] filemap_nopage+0x191/0x3d0
[<c013e380>] filemap_nopage+0x0/0x3d0
[<c014cfd3>] do_no_page+0xd3/0x3c0
[<c014acc7>] pte_alloc_map+0xc7/0x110
[<c014d4f6>] handle_mm_fault+0x106/0x1b0
[<c011a94c>] do_page_fault+0x33c/0x530
[<c0170ba4>] poll_freewait+0x44/0x50
[<c01719d2>] sys_poll+0x272/0x2c0
[<c0170bb0>] __pollwait+0x0/0xd0
[<c011a610>] do_page_fault+0x0/0x530
[<c0109d25>] error_code+0x2d/0x38

dnscache      D A631E98B     0   761    759                     (NOTLB)
f6e2bc84 00000086 d8460080 ac271597 00003245 c2422bc0 f776f2a0 00000082
d8460080 ac271452 00003245 c02af980 00000001 ac271597 00003245 d8460080
d84600a0 c2422bc0 00001661 ac27189b 00003245 f77ad518 03473f51 f6e2bc98
Call Trace:
[<c012b62c>] schedule_timeout+0x6c/0xc0
[<c0142b11>] wakeup_bdflush+0x21/0x40
[<c012b5b0>] process_timeout+0x0/0x10
[<c011eb7b>] io_schedule_timeout+0x2b/0x40
[<c01f54a4>] blk_congestion_wait+0x84/0xa0
[<c011fda0>] autoremove_wake_function+0x0/0x50
[<c011fda0>] autoremove_wake_function+0x0/0x50
[<c0148f9f>] try_to_free_pages+0xef/0x190
[<c014187c>] __alloc_pages+0x1dc/0x350
[<c0143bc2>] do_page_cache_readahead+0x172/0x1e0
[<c013e511>] filemap_nopage+0x191/0x3d0
[<c013e380>] filemap_nopage+0x0/0x3d0
[<c014cfd3>] do_no_page+0xd3/0x3c0
[<c014acc7>] pte_alloc_map+0xc7/0x110
[<c014d4f6>] handle_mm_fault+0x106/0x1b0
[<c011a94c>] do_page_fault+0x33c/0x530
[<c0170ba4>] poll_freewait+0x44/0x50
[<c01719d2>] sys_poll+0x272/0x2c0
[<c0170bb0>] __pollwait+0x0/0xd0
[<c011a610>] do_page_fault+0x0/0x530
[<c0109d25>] error_code+0x2d/0x38

multilog      S 00000001  4036   762    760                     (NOTLB)
f6de9eb4 00000086 c2422bc0 00000001 00000003 c013d359 f77a0b00 00000000
f7629900 c1a5c288 c02b0e40 c1a5c288 0001b38a 0001b38a 00000292 c0157bdf
c034dac0 c2422bc0 00000d4b d73a9523 00001a38 f7629ac8 f739b66c f739b600
Call Trace:
[<c013d359>] __lock_page+0xb9/0xd0
[<c0157bdf>] swap_free+0x2f/0x50
[<c0169f8e>] pipe_wait+0x7e/0xa0
[<c011fda0>] autoremove_wake_function+0x0/0x50
[<c011fda0>] autoremove_wake_function+0x0/0x50
[<c016a19f>] pipe_readv+0x1ef/0x2f0
[<c016a2d8>] pipe_read+0x38/0x40
[<c015c4a8>] vfs_read+0xb8/0x130
[<c012f0af>] sys_rt_sigprocmask+0xbf/0x190
[<c015c752>] sys_read+0x42/0x70
[<c01092bb>] syscall_call+0x7/0xb

httpd         D 24562C9D     0   785      1  2898     828   758 (NOTLB)
f72a9c84 00000082 00000000 24562c9d 00003243 c2422bc0 f776fb20 00000082
f5f6b980 24562c9d 00003243 c02af980 00000001 00000082 f72a8000 f7756d40
f7756d60 c241abc0 000072d3 24563c20 00003243 f72fe858 03471475 f72a9c98
Call Trace:
[<c012b62c>] schedule_timeout+0x6c/0xc0
[<c0142b11>] wakeup_bdflush+0x21/0x40
[<c012b5b0>] process_timeout+0x0/0x10
[<c011eb7b>] io_schedule_timeout+0x2b/0x40
[<c01f54a4>] blk_congestion_wait+0x84/0xa0
[<c011fda0>] autoremove_wake_function+0x0/0x50
[<c011fda0>] autoremove_wake_function+0x0/0x50
[<c0148f9f>] try_to_free_pages+0xef/0x190
[<c014187c>] __alloc_pages+0x1dc/0x350
[<c0143bc2>] do_page_cache_readahead+0x172/0x1e0
[<c013d1b5>] unlock_page+0x15/0x60
[<c013e511>] filemap_nopage+0x191/0x3d0
[<c013e380>] filemap_nopage+0x0/0x3d0
[<c014cfd3>] do_no_page+0xd3/0x3c0
[<c0157bdf>] swap_free+0x2f/0x50
[<c014acc7>] pte_alloc_map+0xc7/0x110
[<c014d4f6>] handle_mm_fault+0x106/0x1b0
[<c011a94c>] do_page_fault+0x33c/0x530
[<c0170bb0>] __pollwait+0x0/0xd0
[<c0171334>] sys_select+0x264/0x520
[<c0109d25>] error_code+0x2d/0x38



mysqld_safe   S 00000000     0   835      1   924     851   828 (NOTLB)
f693df50 00000086 7dda6067 00000000 f696bb60 f696bb80 f696bb60 f77acd20
c011a94c f696bb60 f69a9620 080c9f9c 00000001 00000001 080c9f9c f71f26d0
f71f26f0 c241abc0 0001663f 2f2e593c 00000008 f77acee8 fffffe00 f693c000
Call Trace:
[<c011a94c>] do_page_fault+0x33c/0x530
[<c01254ab>] sys_wait4+0x1bb/0x290
[<c011d8e0>] default_wake_function+0x0/0x20
[<c012f0f3>] sys_rt_sigprocmask+0x103/0x190
[<c011d8e0>] default_wake_function+0x0/0x20
[<c01092bb>] syscall_call+0x7/0xb

qmail-send    D 3F7BF822     0   851      1   864     900   835 (NOTLB)
f7603c84 00000086 d8460080 457110f1 00003246 c2422bc0 f696b0c0 00000082
d8460080 45710fc1 00003246 c02af980 00000001 457110f1 00003246 d8460080
d84600a0 c2422bc0 00001710 4b664525 00003246 f72ec248 0347495e f7603c98
Call Trace:
[<c012b62c>] schedule_timeout+0x6c/0xc0
[<c0142b11>] wakeup_bdflush+0x21/0x40
[<c012b5b0>] process_timeout+0x0/0x10
[<c011eb7b>] io_schedule_timeout+0x2b/0x40
[<c01f54a4>] blk_congestion_wait+0x84/0xa0
[<c011fda0>] autoremove_wake_function+0x0/0x50
[<c011fda0>] autoremove_wake_function+0x0/0x50
[<c0148f9f>] try_to_free_pages+0xef/0x190
[<c014187c>] __alloc_pages+0x1dc/0x350
[<c0143bc2>] do_page_cache_readahead+0x172/0x1e0
[<c014ca8f>] do_swap_page+0x13f/0x320
[<c013e511>] filemap_nopage+0x191/0x3d0
[<c013e380>] filemap_nopage+0x0/0x3d0
[<c014cfd3>] do_no_page+0xd3/0x3c0
[<c014acc7>] pte_alloc_map+0xc7/0x110
[<c014d4f6>] handle_mm_fault+0x106/0x1b0
[<c011a94c>] do_page_fault+0x33c/0x530
[<c0171334>] sys_select+0x264/0x520
[<c011a610>] do_page_fault+0x0/0x530
[<c0109d25>] error_code+0x2d/0x38

splogger      S 114C3021  5660   864    851           865       (NOTLB)
f684beb4 00000086 f7da7330 114c3021 02002c19 00000003 f68addc0 00000009
f684bea4 00000000 c021dd7c f684d9a0 00000000 f7140580 f684bf90 f7da7330
f7da7350 c241abc0 00000a82 c14af7c2 000019e4 f684db68 c26e038c c26e0320
Call Trace:
[<c021dd7c>] sockfd_lookup+0x1c/0x80
[<c0169f8e>] pipe_wait+0x7e/0xa0
[<c011fda0>] autoremove_wake_function+0x0/0x50
[<c01790da>] update_atime+0x9a/0xe0
[<c011fda0>] autoremove_wake_function+0x0/0x50
[<c016a19f>] pipe_readv+0x1ef/0x2f0
[<c016a2d8>] pipe_read+0x38/0x40
[<c015c4a8>] vfs_read+0xb8/0x130
[<c010fa00>] do_gettimeofday+0x20/0xc0
[<c015c752>] sys_read+0x42/0x70
[<c01092bb>] syscall_call+0x7/0xb

qmail-lspawn  S 00000001     0   865    851           866   864 (NOTLB)
f685feb0 00000086 c241abc0 00000001 00000003 c138a310 f689b960 000001d5
f685ff40 00000000 c0141740 c02afe00 00000000 00000000 c14176c1 00000000
f71f2d00 c241abc0 000028b6 c141be0d 000019e4 f71f2ec8 00000000 7fffffff
Call Trace:
[<c0141740>] __alloc_pages+0xa0/0x350
[<c012b67e>] schedule_timeout+0xbe/0xc0

-- 
                   Best regards,
                   Sergey S. Kostyliov <rathamahata@php4.ru>
                   Public PGP key: http://sysadminday.org.ru/rathamahata.asc


* Re: 2.6.1 IO lockup on SMP systems
  2004-02-21 16:45   ` Sergey S. Kostyliov
@ 2004-02-21 19:30     ` Andrew Morton
  2004-02-22 17:39       ` Alexander Y. Fomichev
  0 siblings, 1 reply; 24+ messages in thread
From: Andrew Morton @ 2004-02-21 19:30 UTC
  To: Sergey S. Kostyliov; +Cc: linux-kernel, anton

"Sergey S. Kostyliov" <rathamahata@php4.ru> wrote:
>
> Hello Andrew,
> 
> On Sunday 01 February 2004 03:17, Andrew Morton wrote:
> > "Sergey S. Kostyliov" <rathamahata@php4.ru> wrote:
> > >
> > > I have experienced lockups on three of my servers running 2.6.1. It doesn't
> > > look like a deadlock: the box is still pingable, and all TCP ports which were
> > > in the listen state before a lockup remain in the listen state, but I can't get
> > > any data from these ports. According to sar(1), the systems were not overloaded
> > > right before a lockup. There are also no entries in any of the user service
> > > logs for almost 10 hours after the lockup.
> > 
> > Please ensure that CONFIG_KALLSYMS is enabled, then generate an all-tasks
> > backtrace of a locked machine with sysrq-T or `echo t >
> > /proc/sysrq-trigger'.  Then send us the resulting trace.
> 
> I've just reproduced this lockup with 2.6.3.
> 
> > 
> > You may need a serial console to be able to capture all the output.
> > 
> > Also, it would be useful to know what sort of load the machines are under,
> > and what filesystems are in use.
> 
> The machine is a http server. The main applications are:
> 1) apache 1.3 which serves php pages (mod_php):
> 	 15.3 requests/sec - 111.9 kB/second - 7.3 kB/request
> 	 54 requests currently being processed, 19 idle servers
> 2) mysql:
> 	Threads: 19  Questions: 26922012  Slow queries: 9799  Opens: 64980
> 	Flush tables: 1  Open tables: 630  Queries per second avg: 143.547
> 
> This is an IO bound machine in general. All filesystems are reiserfs.
> 
> Here is a sysrq-T output obtained from a locked box via serial console:

OK, so everything is stuck trying to allocate memory.  Perhaps you ran out
of swap space, or some process has gone berserk allocating memory.

How much memory does the machine have, and how much swap space?

I suggest that you run a `vmstat 30' trace on a terminal somewhere, see what
it says prior to the hangs.  Also capture the sysrq-M output after it has
hung.

It would be useful to monitor the contents of /proc/vmstat also.

And perhaps keep top running in `sort by memory usage' mode.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: 2.6.1 IO lockup on SMP systems
  2004-02-21 19:30     ` Andrew Morton
@ 2004-02-22 17:39       ` Alexander Y. Fomichev
  2004-02-23 17:27         ` Sergey S. Kostyliov
  0 siblings, 1 reply; 24+ messages in thread
From: Alexander Y. Fomichev @ 2004-02-22 17:39 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Sergey S. Kostyliov, linux-kernel, anton

On Saturday 21 February 2004 22:30, Andrew Morton wrote:
> "Sergey S. Kostyliov" <rathamahata@php4.ru> wrote:
> > Hello Andrew,
> >
> > On Sunday 01 February 2004 03:17, Andrew Morton wrote:
> > > "Sergey S. Kostyliov" <rathamahata@php4.ru> wrote:
> > > > I had experienced a lockups on three of my servers with 2.6.1. It
> > > > doesn't look like a deadlock, the box is still pingable and all tcp
> > > > ports which were in listen state before a lockup are remains in
> > > > listen state, but I can't get any data from this ports. According to
> > > > sar(1) systems had not been overloaded right before a lockup. And
> > > > there is no log entries in all user services logs for almost 10 hours
> > > > after lockup.
> > >
> > > Please ensure that CONFIG_KALLSYMS is enabled, then generate an
> > > > all-tasks backtrace of a locked machine with sysrq-T or `echo t >
> > > /proc/sysrq-trigger'.  Then send us the resulting trace.
> >
> > I've just reproduced this lockup with 2.6.3.
> >
> > > You may need a serial console to be able to capture all the output.
> > >
> > > Also, it would be useful to know what sort of load the machines are
> > > under, and what filesystems are in use.
> >
> > The machine is a http server. The main applications are:
> > 1) apache 1.3 which serves php pages (mod_php):
> > 	 15.3 requests/sec - 111.9 kB/second - 7.3 kB/request
> > 	 54 requests currently being processed, 19 idle servers
> > 2) mysql:
> > 	Threads: 19  Questions: 26922012  Slow queries: 9799  Opens: 64980
> > 	Flush tables: 1  Open tables: 630  Queries per second avg: 143.547
> >
> > This is an IO bound machine in general. All filesystems are reiserfs.
> >
> > Here is a sysrq-T output obtained from a locked box via serial console:
>
> OK, so everything is stuck trying to allocate memory.  Perhaps you ran out
> of swap space, or some process has gone berserk allocating memory.
>
> How much memory does the machine have, and how much swap space?
>
# free
             total       used       free     shared    buffers     cached
Mem:       2073868    2067508       6360          0     232708     897828
-/+ buffers/cache:     936972    1136896
Swap:      1535976       5228    1530748

> I suggest that you run a `vmstat 30' trace on a terminal somewhere, see
> what it says prior to the hangs. 
OK. We'll try to get it next time.

> Also capture the sysrq-M output after it 
> has hung.
>
These "showmem" and "showreg" outputs were taken just before the
"SysRq: Show State" output in the previous message.

SysRq : Show Memory
    Mem-info:
    DMA per-cpu:
    cpu 0 hot: low 2, high 6, batch 1
    cpu 0 cold: low 0, high 2, batch 1
    cpu 1 hot: low 2, high 6, batch 1
    cpu 1 cold: low 0, high 2, batch 1
    Normal per-cpu:
    cpu 0 hot: low 32, high 96, batch 16
    cpu 0 cold: low 0, high 32, batch 16
    cpu 1 hot: low 32, high 96, batch 16
    cpu 1 cold: low 0, high 32, batch 16
    HighMem per-cpu:
    cpu 0 hot: low 32, high 96, batch 16
    cpu 0 cold: low 0, high 32, batch 16
    cpu 1 hot: low 32, high 96, batch 16
    cpu 1 cold: low 0, high 32, batch 16

    Free pages:        3172kB (512kB HighMem)
    Active:1783 inactive:87 dirty:0 writeback:0 unstable:0 free:793
    DMA free:1292kB min:16kB low:32kB high:48kB active:3748kB inactive:0kB
    Normal free:1368kB min:936kB low:1872kB high:2808kB active:1368kB inactive:356kB
    HighMem free:512kB min:512kB low:1024kB high:1536kB active:2008kB inactive:0kB
    DMA: 151*4kB 70*8kB 6*16kB 1*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1292kB
    Normal: 192*4kB 9*8kB 1*16kB 0*32kB 0*64kB 0*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 1368kB
    HighMem: 0*4kB 2*8kB 3*16kB 0*32kB 1*64kB 1*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 512kB
    Swap cache: add 1140128, delete 1140063, find 459572/584559, race 145+217
    Free swap:       384364kB
    524288 pages of RAM
    294912 pages of HIGHMEM
    5821 reserved pages
    976 pages shared
    65 pages swap cached
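
The per-order free lists in the dump above can be cross-checked against the
per-zone "free:" figures by summing count*size over each line. The following
is only a minimal sketch in shell with awk; buddy_total is a hypothetical
helper name, not part of any existing tool, and it assumes each zone's list
has been joined onto a single line first.

```shell
# Hypothetical helper: sum the "count*sizekB" terms of each sysrq-M
# free-list line read from stdin and print one total (in kB) per line.
buddy_total() {
    awk '{
        total = 0
        for (i = 1; i <= NF; i++)
            if ($i ~ /^[0-9]+[*][0-9]+kB$/) {
                split($i, part, "[*]")
                total += part[1] * (part[2] + 0)   # "4kB" + 0 evaluates to 4
            }
        print total
    }'
}
```

Feeding it the Normal zone line (192*4kB 9*8kB 1*16kB ... 1*512kB ...) gives
1368, which agrees with "Normal free:1368kB".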


SysRq : Show Regs

Pid: 0, comm:              swapper
EIP: 0060:[<c0106d1c>] CPU: 0
EIP is at default_idle+0x2c/0x40
 EFLAGS: 00000246    Not tainted
 EAX: 00000000 EBX: c02e6000 ECX: c0106cf0 EDX: c02e6000
 ESI: c02e6000 EDI: c0105000 EBP: 0008e000 DS: 007b ES: 007b
 CR0: 8005003b CR2: bffff7e0 CR3: 2d021000 CR4: 00000690
 Call Trace:
  [<c0106dab>] cpu_idle+0x3b/0x50
  [<c02e88e9>] start_kernel+0x179/0x1a0
  [<c02e84a0>] unknown_bootoption+0x0/0x120


> It would be useful to monitor the contents of /proc/vmstat also.
>
> And perhaps keep top running in `sort by memory usage' mode.
OK, we'll try that too.

-- 
< on behalf of "Sergey S. Kostyliov" <rathamahata@php4.ru> >

Best regards.
        Alexander Y. Fomichev <gluk@php4.ru>
        Public PGP key: http://sysadminday.org.ru/gluk.asc

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: 2.6.1 IO lockup on SMP systems
  2004-02-22 17:39       ` Alexander Y. Fomichev
@ 2004-02-23 17:27         ` Sergey S. Kostyliov
  2004-02-23 21:30           ` Mike Fedyk
  2004-02-23 22:26           ` Andrew Morton
  0 siblings, 2 replies; 24+ messages in thread
From: Sergey S. Kostyliov @ 2004-02-23 17:27 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, Alexander Y. Fomichev, anton

Hello Andrew,

This has now happened for the third time.

> > > I've just reproduced this lockup with 2.6.3.
> > >
> > > > You may need a serial console to be able to capture all the output.
> > > >
> > > > Also, it would be useful to know what sort of load the machines are
> > > > under, and what filesystems are in use.
> > >
> > > The machine is a http server. The main applications are:
> > > 1) apache 1.3 which serves php pages (mod_php):
> > > 	 15.3 requests/sec - 111.9 kB/second - 7.3 kB/request
> > > 	 54 requests currently being processed, 19 idle servers
> > > 2) mysql:
> > > 	Threads: 19  Questions: 26922012  Slow queries: 9799  Opens: 64980
> > > 	Flush tables: 1  Open tables: 630  Queries per second avg: 143.547
> > >
> > > This is an IO bound machine in general. All filesystems are reiserfs.
> > >
> > > Here is a sysrq-T output obtained from a locked box via serial console:
> >
> > OK, so everything is stuck trying to allocate memory.  Perhaps you ran out
> > of swap space, or some process has gone berserk allocating memory.

Memory exhaustion is indeed possible on this box. I'll double-check the
ulimit and /etc/security/limits.conf settings. The only thing that worries
me is that this box had been running for months without any problems on
2.4.23aa1.

I have added another 2 GB of swap space (I hope this gives us enough time
to find the memory-hungry process(es)).

> >
> > How much memory does the machine have, and how much swap space?
> >
> # free
>              total       used       free     shared    buffers     cached
> Mem:       2073868    2067508       6360          0     232708     897828
> -/+ buffers/cache:     936972    1136896
> Swap:      1535976       5228    1530748
> 
> > I suggest that you run a `vmstat 30' trace on a terminal somewhere, see
> > what it says prior to the hangs. 
> OK. We'll try to get it next time.

Here it is:
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 1  0 551920   8108 203744 933532    0    0     4    68 1214   426  5  1 92  2
 0  0 551928   7140 203756 930316    0    0    17    61 1240   529  8  1 89  2
 0  0 551976   5788 203772 928224    1    6   360   139 1297   317  7  2 83  8
 0  0 551968   7588 203812 923504    0    0    19   125 1303   308  8  2 87  4
 0  1 551976  10444 203892 914100    0    0    25   127 1433   438 10  3 85  3
 0  0 551976   9220 204004 914804    0    0   123   126 1278   325  6  1 88  5
 0  0 551976   8108 204044 912248    0    0    38    69 1279   291  6  1 91  2
 0  1 551976  11828 204144 912320    1    0   135    94 1249   296  6  1 89  3
 0  5 562204   3280 203952 157084    1  566   305   674 1281   313  6  4 73 17
 0 18 598224   4276   1888  33356   91 2734   233  2761 1090   199  0  2  0 97
 1 38 662520   2760   2104  30520  110 3721   261  3738 1161   831  1  2  0 97
10 41 699936   2772   1920  28716  123 2924   249  2946 1103  1273  0  3  0 97
 0 39 748588   2956   1956  22668  160 3313   245  3331 1056  1047  0  2  0 98
 0 38 796100   3108   1888  21348  321 3191   430  3206 1045  1002  0  2  0 97
 4 43 844532   3308   1956  17644  518 3719   670  3733 1357   999  0  2  0 98
 0 51 882596   2940   2052  13960  520 2796   705  2810 1048  1182  0  2  0 98
 3 59 913392   2456   2048  10900 1013 2524  1308  2542 1144   601  0  2  0 98
 5 71 937816   2760   2072   8584 1534 2681  1860  2702 1234   607  0  2  0 97

> 
> > Also capture the sysrq-M output after it 
> > has hung.
> >
> This "showmem" && "showreg" have been taken just before
> "SysRq: Show State" from previous message.
> 
> SysRq : Show Memory
>     Mem-info:
>     DMA per-cpu:
>     cpu 0 hot: low 2, high 6, batch 1
>     cpu 0 cold: low 0, high 2, batch 1
>     cpu 1 hot: low 2, high 6, batch 1
>     cpu 1 cold: low 0, high 2, batch 1
>     Normal per-cpu:
>     cpu 0 hot: low 32, high 96, batch 16
>     cpu 0 cold: low 0, high 32, batch 16
>     cpu 1 hot: low 32, high 96, batch 16
>     cpu 1 cold: low 0, high 32, batch 16
>     HighMem per-cpu:
>     cpu 0 hot: low 32, high 96, batch 16
>     cpu 0 cold: low 0, high 32, batch 16
>     cpu 1 hot: low 32, high 96, batch 16
>     cpu 1 cold: low 0, high 32, batch 16
> 
>     Free pages:        3172kB (512kB HighMem)
>     Active:1783 inactive:87 dirty:0 writeback:0 unstable:0 free:793
>     DMA free:1292kB min:16kB low:32kB high:48kB active:3748kB inactive:0kB
>     Normal free:1368kB min:936kB low:1872kB high:2808kB active:1368kB inactive:356kB
>     HighMem free:512kB min:512kB low:1024kB high:1536kB active:2008kB inactive:0kB
>     DMA: 151*4kB 70*8kB 6*16kB 1*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1292kB
>     Normal: 192*4kB 9*8kB 1*16kB 0*32kB 0*64kB 0*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 1368kB
>     HighMem: 0*4kB 2*8kB 3*16kB 0*32kB 1*64kB 1*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 512kB
>     Swap cache: add 1140128, delete 1140063, find 459572/584559, race 145+217
>     Free swap:       384364kB
>     524288 pages of RAM
>     294912 pages of HIGHMEM
>     5821 reserved pages
>     976 pages shared
>     65 pages swap cached
> 
> 
> SysRq : Show Regs
> 
> Pid: 0, comm:              swapper
> EIP: 0060:[<c0106d1c>] CPU: 0
> EIP is at default_idle+0x2c/0x40
>  EFLAGS: 00000246    Not tainted
>  EAX: 00000000 EBX: c02e6000 ECX: c0106cf0 EDX: c02e6000
>  ESI: c02e6000 EDI: c0105000 EBP: 0008e000 DS: 007b ES: 007b
>  CR0: 8005003b CR2: bffff7e0 CR3: 2d021000 CR4: 00000690
>  Call Trace:
>   [<c0106dab>] cpu_idle+0x3b/0x50
>    [<c02e88e9>] start_kernel+0x179/0x1a0
>     [<c02e84a0>] unknown_bootoption+0x0/0x120

I forgot to switch on output capture in minicom, so the sysrq-M output
was scrolled off the terminal by the subsequent sysrq-T output, most
of which was in turn scrolled off as well. The part of the sysrq-T
output that remained is almost the same as the previous one.

> 
> > It would be useful to monitor the contents of /proc/vmstat also.

The last /proc/vmstat snapshot was taken 5 minutes before the actual lockup.
(It looks like the simple "while true; do date; cat /proc/vmstat; sleep 10; done"
script suffers from the same memory exhaustion problem.)

Mon Feb 23 17:41:34 MSK 2004
nr_dirty 136
nr_writeback 0
nr_unstable 0
nr_page_table_pages 987
nr_mapped 227018
nr_slab 13041
pgpgin 8593704
pgpgout 4349808
pswpin 169183
pswpout 183480
pgalloc 20244471
pgfree 20247061
pgactivate 548813
pgdeactivate 628769
pgfault 25756129
pgmajfault 67820
pgscan 4570640
pgrefill 2934423
pgsteal 2024118
pginodesteal 0
kswapd_steal 1886046
kswapd_inodesteal 891
pageoutrun 10047
allocstall 3930
pgrotated 178662

Mon Feb 23 17:41:44 MSK 2004
nr_dirty 339
nr_writeback 0
nr_unstable 0
nr_page_table_pages 991
nr_mapped 226443
nr_slab 13036
pgpgin 8593956
pgpgout 4351080
pswpin 169186
pswpout 183480
pgalloc 20250240
pgfree 20253382
pgactivate 549009
pgdeactivate 628769
pgfault 25764719
pgmajfault 67827
pgscan 4570640
pgrefill 2934423
pgsteal 2024118
pginodesteal 0
kswapd_steal 1886046
kswapd_inodesteal 891
pageoutrun 10047
allocstall 3930
pgrotated 178662

Mon Feb 23 17:41:54 MSK 2004
nr_dirty 505
nr_writeback 0
nr_unstable 0
nr_page_table_pages 993
nr_mapped 226477
nr_slab 13049
pgpgin 8594244
pgpgout 4352144
pswpin 169186
pswpout 183480
pgalloc 20256355
pgfree 20259400
pgactivate 549048
pgdeactivate 628769
pgfault 25772385
pgmajfault 67837
pgscan 4570640
pgrefill 2934423
pgsteal 2024118
pginodesteal 0
kswapd_steal 1886046
kswapd_inodesteal 891
pageoutrun 10047
allocstall 3930
pgrotated 178662

Mon Feb 23 17:42:15 MSK 2004
nr_dirty 0
nr_writeback 765
nr_unstable 0
nr_page_table_pages 1044
nr_mapped 209677
nr_slab 4672
pgpgin 8605592
pgpgout 4424120
pswpin 169454
pswpout 201127
pgalloc 20561829
pgfree 20563033
pgactivate 601317
pgdeactivate 778533
pgfault 25777874
pgmajfault 68001
pgscan 5399589
pgrefill 3543496
pgsteal 2300249
pginodesteal 0
kswapd_steal 2058168
kswapd_inodesteal 14284
pageoutrun 10114
allocstall 7008
pgrotated 193130

Mon Feb 23 17:42:47 MSK 2004
nr_dirty 1
nr_writeback 597
nr_unstable 0
nr_page_table_pages 1213
nr_mapped 190410
nr_slab 4640
pgpgin 8614032
pgpgout 4500108
pswpin 170334
pswpout 219922
pgalloc 20588517
pgfree 20589474
pgactivate 711818
pgdeactivate 908805
pgfault 25783157
pgmajfault 68204
pgscan 5667215
pgrefill 3774369
pgsteal 2322731
pginodesteal 0
kswapd_steal 2066149
kswapd_inodesteal 14352
pageoutrun 10167
allocstall 7383
pgrotated 209403
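
Because most of these /proc/vmstat fields are cumulative counters, the
interesting signal is the change between two snapshots (for example,
allocstall goes from 3930 to 7008 and pswpout from 183480 to 201127 between
17:41:54 and 17:42:15). A rough sketch of a diff helper follows; the name
vmstat_delta and the idea of saving each snapshot to its own file are
assumptions for illustration only.

```shell
# Hypothetical helper: print how much each /proc/vmstat field changed
# between two saved snapshots (old file first, new file second).
# Lines that are not "name number" pairs, such as the date stamp, are skipped.
vmstat_delta() {
    awk 'NF == 2 && $2 ~ /^[0-9]+$/ {
        if (FILENAME == ARGV[1]) old[$1] = $2
        else if ($1 in old)      print $1, $2 - old[$1]
    }' "$1" "$2"
}
```

Called as `vmstat_delta snap-17:41:54 snap-17:42:15` (hypothetical file
names), it prints one "field delta" pair per line. Note that the nr_* fields
are gauges rather than counters, so their deltas can be negative.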

> >
> > And perhaps keep top running in `sort by memory usage' mode.
> ok, we'll try too.

Unfortunately the top output is not very useful because the mysql threads
hide the real problem. I'll try to run top in batch mode next time.

top - 17:47:03 up  7:10,  3 users,  load average: 124.72, 66.96, 27.71
Tasks: 219 total,   1 running, 218 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.2% us,  2.1% sy,  0.0% ni,  0.0% id, 97.6% wa,  0.1% hi,  0.0% si
Mem:   2073868k total,  2070796k used,     3072k free,     1996k buffers
Swap:  1535976k total,   944520k used,   591456k free,     6884k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
  896 mysql     15   0 1013m  21m 4896 S  0.1  1.1   0:05.64 mysqld
  939 mysql     16   0 1013m  21m 4896 S  0.0  1.1   0:00.05 mysqld
  940 mysql     20   0 1013m  21m 4896 S  0.0  1.1   0:00.00 mysqld
  941 mysql     17   0 1013m  21m 4896 S  0.0  1.1   0:00.00 mysqld
  942 mysql     15   0 1013m  21m 4896 D  0.4  1.1   1:26.19 mysqld
  943 mysql     16   0 1013m  21m 4896 S  0.0  1.1   0:00.00 mysqld
  961 mysql     17   0 1013m  21m 4896 D  0.0  1.1   0:00.00 mysqld
  962 mysql     15   0 1013m  21m 4896 D  0.0  1.1   0:13.12 mysqld
  971 mysql     16   0 1013m  21m 4896 S  0.0  1.1   0:00.05 mysqld
  972 mysql     16   0 1013m  21m 4896 S  0.0  1.1   0:03.12 mysqld
27314 mysql     15   0 1013m  21m 4896 D  0.0  1.1   0:11.73 mysqld
27325 mysql     15   0 1013m  21m 4896 S  0.0  1.1   0:08.70 mysqld
27339 mysql     15   0 1013m  21m 4896 S  0.0  1.1   0:07.78 mysqld
27361 mysql     16   0 1013m  21m 4896 S  0.0  1.1   0:10.05 mysqld
27375 mysql     15   0 1013m  21m 4896 S  0.0  1.1   0:10.61 mysqld
27390 mysql     16   0 1013m  21m 4896 S  0.0  1.1   0:11.44 mysqld
27392 mysql     16   0 1013m  21m 4896 S  0.0  1.1   0:09.20 mysqld
27393 mysql     16   0 1013m  21m 4896 S  0.0  1.1   0:12.23 mysqld
 3671 mysql     16   0 1013m  21m 4896 D  0.0  1.1   0:00.04 mysqld
 3672 mysql     18   0 1013m  21m 4896 D  0.0  1.1   0:00.11 mysqld
 3691 mysql     16   0 1013m  21m 4896 D  0.1  1.1   0:00.02 mysqld
 3704 mysql     17   0 1013m  21m 4896 S  0.0  1.1   0:00.02 mysqld

Thank you for your help!

-- 
                   Best regards,
                   Sergey S. Kostyliov <rathamahata@php4.ru>
                   Public PGP key: http://sysadminday.org.ru/rathamahata.asc

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: 2.6.1 IO lockup on SMP systems
  2004-02-23 17:27         ` Sergey S. Kostyliov
@ 2004-02-23 21:30           ` Mike Fedyk
  2004-02-24 11:56             ` Sergey S. Kostyliov
  2004-02-23 22:26           ` Andrew Morton
  1 sibling, 1 reply; 24+ messages in thread
From: Mike Fedyk @ 2004-02-23 21:30 UTC (permalink / raw)
  To: Sergey S. Kostyliov
  Cc: Andrew Morton, linux-kernel, Alexander Y. Fomichev, anton

Sergey S. Kostyliov wrote:
> Hello Andrew,
> 
> Now this happens for the third time.
> 
> 
>>>>I've just reproduced this lockup with 2.6.3.
>>>>
>>>>
>>>>>You may need a serial console to be able to capture all the output.
>>>>>
>>>>>Also, it would be useful to know what sort of load the machines are
>>>>>under, and what filesystems are in use.
>>>>
>>>>The machine is a http server. The main applications are:
>>>>1) apache 1.3 which serves php pages (mod_php):
>>>>	 15.3 requests/sec - 111.9 kB/second - 7.3 kB/request
>>>>	 54 requests currently being processed, 19 idle servers
>>>>2) mysql:
>>>>	Threads: 19  Questions: 26922012  Slow queries: 9799  Opens: 64980
>>>>	Flush tables: 1  Open tables: 630  Queries per second avg: 143.547
>>>>
>>>>This is an IO bound machine in general. All filesystems are reiserfs.
>>>>
>>>>Here is a sysrq-T output obtained from a locked box via serial console:
>>>
>>>OK, so everything is stuck trying to allocate memory.  Perhaps you ran out
>>>of swap space, or some process has gone berserk allocating memory.
> 
> 
> The memory exhaustion is indeed possible for this box. I'll double check
> ulimit and /etc/security/limits.conf stuff. The only thing which worries
> me that this box had been running for months without any problems with
> 2.4.23aa1.
> 
> I have added another 2Gb to swap space (hope this give enough time
> to find the memory hungry process(es)).

Also check how much memory is being used for slab in /proc/meminfo.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: 2.6.1 IO lockup on SMP systems
  2004-02-23 17:27         ` Sergey S. Kostyliov
  2004-02-23 21:30           ` Mike Fedyk
@ 2004-02-23 22:26           ` Andrew Morton
  2004-02-24  7:23             ` Marcelo Tosatti
  2004-02-24 11:54             ` Sergey S. Kostyliov
  1 sibling, 2 replies; 24+ messages in thread
From: Andrew Morton @ 2004-02-23 22:26 UTC (permalink / raw)
  To: Sergey S. Kostyliov; +Cc: linux-kernel, gluk, anton

"Sergey S. Kostyliov" <rathamahata@php4.ru> wrote:
>
> > > OK, so everything is stuck trying to allocate memory.  Perhaps you ran out
> > > of swap space, or some process has gone berserk allocating memory.
> 
> The memory exhaustion is indeed possible for this box. I'll double check
> ulimit and /etc/security/limits.conf stuff. The only thing which worries
> me that this box had been running for months without any problems with
> 2.4.23aa1.

It is conceivable that you have some application which runs OK on 2.4.x but
has some subtle bug which causes the app to go crazy on a 2.6 kernel
consuming lots of memory.  Or there's a bug in the 2.6 kernel ;)

> I have added another 2Gb to swap space (hope this give enough time
> to find the memory hungry process(es)).
> 
> > >
> > > How much memory does the machine have, and how much swap space?
> > >
> > # free
> >              total       used       free     shared    buffers     cached
> > Mem:       2073868    2067508       6360          0     232708     897828
> > -/+ buffers/cache:     936972    1136896
> > Swap:      1535976       5228    1530748
> > 
> > > I suggest that you run a `vmstat 30' trace on a terminal somewhere, see
> > > what it says prior to the hangs. 
> > Ok.We'll try to get it next time.
> 
> Here it is:
> procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
>  1  0 551920   8108 203744 933532    0    0     4    68 1214   426  5  1 92  2
>  0  0 551928   7140 203756 930316    0    0    17    61 1240   529  8  1 89  2
>  0  0 551976   5788 203772 928224    1    6   360   139 1297   317  7  2 83  8
>  0  0 551968   7588 203812 923504    0    0    19   125 1303   308  8  2 87  4
>  0  1 551976  10444 203892 914100    0    0    25   127 1433   438 10  3 85  3
>  0  0 551976   9220 204004 914804    0    0   123   126 1278   325  6  1 88  5
>  0  0 551976   8108 204044 912248    0    0    38    69 1279   291  6  1 91  2
>  0  1 551976  11828 204144 912320    1    0   135    94 1249   296  6  1 89  3
>  0  5 562204   3280 203952 157084    1  566   305   674 1281   313  6  4 73 17
>  0 18 598224   4276   1888  33356   91 2734   233  2761 1090   199  0  2  0 97
>  1 38 662520   2760   2104  30520  110 3721   261  3738 1161   831  1  2  0 97
> 10 41 699936   2772   1920  28716  123 2924   249  2946 1103  1273  0  3  0 97
>  0 39 748588   2956   1956  22668  160 3313   245  3331 1056  1047  0  2  0 98
>  0 38 796100   3108   1888  21348  321 3191   430  3206 1045  1002  0  2  0 97
>  4 43 844532   3308   1956  17644  518 3719   670  3733 1357   999  0  2  0 98
>  0 51 882596   2940   2052  13960  520 2796   705  2810 1048  1182  0  2  0 98
>  3 59 913392   2456   2048  10900 1013 2524  1308  2542 1144   601  0  2  0 98
>  5 71 937816   2760   2072   8584 1534 2681  1860  2702 1234   607  0  2  0 97

OK, so it's doing a lot of swapping and your swap utilisation is
continuously increasing.  I would suspect an application or kernel memory
leak.
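
One way to make that growth explicit is to print the change in the swpd
column between consecutive samples. This is only a sketch: swap_growth is a
hypothetical helper name, and it assumes unquoted `vmstat 30' output on
stdin with swpd as the third column, as in the trace above.

```shell
# Hypothetical helper: read vmstat output on stdin and print the change in
# the swpd column (third field, in kB) from each sample to the next.
# Header lines are skipped because their first field is not a number.
swap_growth() {
    awk '$1 ~ /^[0-9]+$/ {
        if (seen) print $3 - prev
        prev = $3
        seen = 1
    }'
}
```

With a 30 second interval, each printed value is the swap growth in kB per
30 seconds; a long run of large positive values is the pattern described
above.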

I suggest you keep that `vmstat 30' running all the time.  When the machine
dies, take a look at the final 20 lines.

Also, run

	while true
	do
		cat /proc/meminfo
		sleep 10
	done

and record the info which that leaves behind when the machine locks up. 
This should tell us whether it is an application or kernel memory leak.  If
it is indeed a leak.
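
A timestamped variant of the loop above makes it easier to line the snapshots
up with the vmstat trace afterwards. This is a sketch under assumptions:
log_snapshot is a hypothetical helper name and the log path in the usage
note is only an example.

```shell
# Hypothetical helper: append one timestamped copy of a file (for example
# /proc/meminfo) to a log file, followed by a blank separator line.
log_snapshot() {
    {
        date
        cat "$1"
        echo
    } >> "$2"
}
```

It could be driven with
`while true; do log_snapshot /proc/meminfo /var/tmp/meminfo.log; sleep 10; done`.
Under severe memory pressure the logger itself may stall, so the last
snapshot can predate the hang by some time.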


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: 2.6.1 IO lockup on SMP systems
  2004-02-24  7:23             ` Marcelo Tosatti
@ 2004-02-24  6:53               ` Andrew Morton
  0 siblings, 0 replies; 24+ messages in thread
From: Andrew Morton @ 2004-02-24  6:53 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: rathamahata, linux-kernel, gluk, anton

Marcelo Tosatti <marcelo.tosatti@cyclades.com> wrote:
>
>  > Also, run
>  >
>  > 	while true
>  > 	do
>  > 		cat /proc/meminfo
>  > 		sleep 10
>  > 	done
>  >
>  > and record the info which that leaves behind when the machine locks up.
>  > This should tell us whether it is an application or kernel memory leak.  If
>  > it is indeed a leak.
> 
>  Hi Andrew,
> 
>  Care to explain to me why the kernel should hang because of an application
>  leak?

It shouldn't - the oom killer should have done something.  But we'll
address that once we've confirmed that something really is leaking.

>  The hang looks wrong even if the leak is in a userspace app, yes?

Probably, yes.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: 2.6.1 IO lockup on SMP systems
  2004-02-23 22:26           ` Andrew Morton
@ 2004-02-24  7:23             ` Marcelo Tosatti
  2004-02-24  6:53               ` Andrew Morton
  2004-02-24 11:54             ` Sergey S. Kostyliov
  1 sibling, 1 reply; 24+ messages in thread
From: Marcelo Tosatti @ 2004-02-24  7:23 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Sergey S. Kostyliov, linux-kernel, gluk, anton



On Mon, 23 Feb 2004, Andrew Morton wrote:

> "Sergey S. Kostyliov" <rathamahata@php4.ru> wrote:
> >
> > > > OK, so everything is stuck trying to allocate memory.  Perhaps you ran out
> > > > of swap space, or some process has gone berserk allocating memory.
> >
> > The memory exhaustion is indeed possible for this box. I'll double check
> > ulimit and /etc/security/limits.conf stuff. The only thing which worries
> > me that this box had been running for months without any problems with
> > 2.4.23aa1.
>
> It is conceivable that you have some application which runs OK on 2.4.x but
> has some subtle bug which causes the app to go crazy on a 2.6 kernel
> consuming lots of memory.  Or there's a bug in the 2.6 kernel ;)
>
> > I have added another 2Gb to swap space (hope this give enough time
> > to find the memory hungry process(es)).
> >
> > > >
> > > > How much memory does the machine have, and how much swap space?
> > > >
> > > # free
> > >              total       used       free     shared    buffers     cached
> > > Mem:       2073868    2067508       6360          0     232708     897828
> > > -/+ buffers/cache:     936972    1136896
> > > Swap:      1535976       5228    1530748
> > >
> > > > I suggest that you run a `vmstat 30' trace on a terminal somewhere, see
> > > > what it says prior to the hangs.
> > > Ok.We'll try to get it next time.
> >
> > Here it is:
> > procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
> >  r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
> >  1  0 551920   8108 203744 933532    0    0     4    68 1214   426  5  1 92  2
> >  0  0 551928   7140 203756 930316    0    0    17    61 1240   529  8  1 89  2
> >  0  0 551976   5788 203772 928224    1    6   360   139 1297   317  7  2 83  8
> >  0  0 551968   7588 203812 923504    0    0    19   125 1303   308  8  2 87  4
> >  0  1 551976  10444 203892 914100    0    0    25   127 1433   438 10  3 85  3
> >  0  0 551976   9220 204004 914804    0    0   123   126 1278   325  6  1 88  5
> >  0  0 551976   8108 204044 912248    0    0    38    69 1279   291  6  1 91  2
> >  0  1 551976  11828 204144 912320    1    0   135    94 1249   296  6  1 89  3
> >  0  5 562204   3280 203952 157084    1  566   305   674 1281   313  6  4 73 17
> >  0 18 598224   4276   1888  33356   91 2734   233  2761 1090   199  0  2  0 97
> >  1 38 662520   2760   2104  30520  110 3721   261  3738 1161   831  1  2  0 97
> > 10 41 699936   2772   1920  28716  123 2924   249  2946 1103  1273  0  3  0 97
> >  0 39 748588   2956   1956  22668  160 3313   245  3331 1056  1047  0  2  0 98
> >  0 38 796100   3108   1888  21348  321 3191   430  3206 1045  1002  0  2  0 97
> >  4 43 844532   3308   1956  17644  518 3719   670  3733 1357   999  0  2  0 98
> >  0 51 882596   2940   2052  13960  520 2796   705  2810 1048  1182  0  2  0 98
> >  3 59 913392   2456   2048  10900 1013 2524  1308  2542 1144   601  0  2  0 98
> >  5 71 937816   2760   2072   8584 1534 2681  1860  2702 1234   607  0  2  0 97
>
> OK, so it's doing a lot of swapping and your swap utilisation is
> continuously increasing.  I would suspect an application or kernel memory
> leak.
>
> I suggest you keep that `vmstat 30' running all the time.  When the machine
> dies, take a look at the final 20 lines.
>
> Also, run
>
> 	while true
> 	do
> 		cat /proc/meminfo
> 		sleep 10
> 	done
>
> and record the info which that leaves behind when the machine locks up.
> This should tell us whether it is an application or kernel memory leak.  If
> it is indeed a leak.

Hi Andrew,

Care to explain to me why the kernel should hang because of an application
leak?

The hang looks wrong even if the leak is in a userspace app, yes?

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: 2.6.1 IO lockup on SMP systems
  2004-02-23 22:26           ` Andrew Morton
  2004-02-24  7:23             ` Marcelo Tosatti
@ 2004-02-24 11:54             ` Sergey S. Kostyliov
  2004-02-26 12:19               ` Sergey S. Kostyliov
  1 sibling, 1 reply; 24+ messages in thread
From: Sergey S. Kostyliov @ 2004-02-24 11:54 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, gluk, anton

On Tuesday 24 February 2004 01:26, Andrew Morton wrote:
> "Sergey S. Kostyliov" <rathamahata@php4.ru> wrote:

<cut>

> > The memory exhaustion is indeed possible for this box. I'll double check
> > ulimit and /etc/security/limits.conf stuff. The only thing which worries
> > me that this box had been running for months without any problems with
> > 2.4.23aa1.
> 
> It is conceivable that you have some application which runs OK on 2.4.x but
> has some subtle bug which causes the app to go crazy on a 2.6 kernel
> consuming lots of memory.  Or there's a bug in the 2.6 kernel ;)
> 
> > I have added another 2Gb to swap space (hope this give enough time
> > to find the memory hungry process(es)).

<cut>

> 
> OK, so it's doing a lot of swapping and your swap utilisation is
> continuously increasing.  I would suspect an application or kernel memory
> leak.
> 
> I suggest you keep that `vmstat 30' running all the time.  When the machine
> dies, take a look at the final 20 lines.

Here is the data from the last lockup:

1) last 20 entries of the `vmstat 30':
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 1  1 116676   7752 266156 621360    8    1  1031   186 1364   444 53  5 30 12
 1  0 116656   7512 266316 617716    2    3   334    79 1355   334 59  4 34  3
 1  0 116240   8072 266800 616444   17    1   539   302 1397   464 59  7 29  6
 1  0 116216  13320 266948 614044    1    1  1229    92 1505   587 61  6 27  6
 2  0 116208   8344 267152 618048    1    0   436   143 1367   386 58  5 32  5
 1  1 116216   6024 267308 619188    0   59  4574   164 1554   742 61  6 20 12
 1  1 116284   6468 267736 614028    4    2  1087   117 1458   529 60  7 27  6
 1  0 116280   6336 267888 617860    1    0  1225   101 1419   542 59  6 30  6
 2  1 116472   7264 268148 619288    0    4  7788   100 1645   950 33  6 29 33
 1  1 116728   5976 268296 617112    0    7  7799    86 1566   815 30  6 32 32
 2  0 116752   6080 268488 615992    6    8  7434   136 1627   910 34  7 25 34
 0  1 116944   6368 268588 615420    1    4  7601    95 1696   952 39  6 25 30
 1  0 116968  30600 268896 585832    0    4  2212   176 1584   642 62  7 16 15
 0  1 116968   6128 269064 604912    0    0  1410    67 1460   532 60  5 29  6
 1  0 116964   6280 269308 604008    0    4  7449   106 1561   819 35  5 30 30
 0  1 116976   6080 269400 603208    1    0  7317   121 1535   762 31  6 31 32
 1 16 331784   4452   2488  25132   30 7369  1916  7441 1177   333  7  6  6 81
 1 26 627540   3116   2172  23156  134 10159   217 10173 1159   200  0  4  0 96
 5 29 884564   3144   2036  16032  468 9443   622  9471 1106   435  0  5  0 95
 0 50 1097880   2800   2108   8592  484 7141   794  7164 1119   831  0  6  0 94

2) sysrq-M (This one looks strange to me because of
"Free swap:       2326708kB")

SysRq : Show Memory
Mem-info:
DMA per-cpu:
cpu 0 hot: low 2, high 6, batch 1
cpu 0 cold: low 0, high 2, batch 1
cpu 1 hot: low 2, high 6, batch 1
cpu 1 cold: low 0, high 2, batch 1
Normal per-cpu:
cpu 0 hot: low 32, high 96, batch 16
cpu 0 cold: low 0, high 32, batch 16
cpu 1 hot: low 32, high 96, batch 16
cpu 1 cold: low 0, high 32, batch 16
HighMem per-cpu:
cpu 0 hot: low 32, high 96, batch 16
cpu 0 cold: low 0, high 32, batch 16
cpu 1 hot: low 32, high 96, batch 16
cpu 1 cold: low 0, high 32, batch 16

Free pages:        2136kB (512kB HighMem)
Active:832 inactive:103 dirty:0 writeback:0 unstable:0 free:534
DMA free:256kB min:16kB low:32kB high:48kB active:0kB inactive:0kB
Normal free:1368kB min:936kB low:1872kB high:2808kB active:1380kB inactive:352kB
HighMem free:512kB min:512kB low:1024kB high:1536kB active:2008kB inactive:0kB
DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 256kB
Normal: 170*4kB 10*8kB 0*16kB 1*32kB 1*64kB 0*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 1368kB
HighMem: 8*4kB 0*8kB 2*16kB 0*32kB 1*64kB 1*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 512kB
Swap cache: add 392088, delete 392033, find 22279/32705, race 16+47
Free swap:       2326708kB
524288 pages of RAM
294912 pages of HIGHMEM
5821 reserved pages
860 pages shared
55 pages swap cached

3) sysrq-T:
http://sysadminday.org.ru/2.6.3-io_lockup/ope/sysrq-T

4) The last 4 copies of /proc/vmstat:

Tue Feb 24 02:36:53 MSK 2004
nr_dirty 320
nr_writeback 0
nr_unstable 0
nr_page_table_pages 822
nr_mapped 289207
nr_slab 11709
pgpgin 15829228
pgpgout 18320340
pswpin 25882
pswpout 37006
pgalloc 28844087
pgfree 28845931
pgactivate 923552
pgdeactivate 760039
pgfault 25500106
pgmajfault 66503
pgscan 7611061
pgrefill 4989936
pgsteal 5628844
pginodesteal 0
kswapd_steal 5211828
kswapd_inodesteal 2958
pageoutrun 33148
allocstall 12799
pgrotated 205322

Tue Feb 24 02:37:03 MSK 2004
nr_dirty 566
nr_writeback 0
nr_unstable 0
nr_page_table_pages 823
nr_mapped 289174
nr_slab 11733
pgpgin 15917192
pgpgout 18321888
pswpin 25882
pswpout 37006
pgalloc 28886326
pgfree 28888201
pgactivate 923806
pgdeactivate 760254
pgfault 25519499
pgmajfault 66550
pgscan 7633363
pgrefill 5008883
pgsteal 5650891
pginodesteal 0
kswapd_steal 5233875
kswapd_inodesteal 2958
pageoutrun 33287
allocstall 12799
pgrotated 205322

Tue Feb 24 02:37:23 MSK 2004
nr_dirty 4
nr_writeback 4559
nr_unstable 0
nr_page_table_pages 962
nr_mapped 197703
nr_slab 4887
pgpgin 15935652
pgpgout 18698124
pswpin 26444
pswpout 130749
pgalloc 29203531
pgfree 29204764
pgactivate 927401
pgdeactivate 944643
pgfault 25525960
pgmajfault 66694
pgscan 9534651
pgrefill 6027760
pgsteal 5952333
pginodesteal 0
kswapd_steal 5421086
kswapd_inodesteal 4181
pageoutrun 33500
allocstall 16189
pgrotated 292969

Tue Feb 24 02:38:16 MSK 2004
nr_dirty 0
nr_writeback 1805
nr_unstable 0
nr_page_table_pages 1433
nr_mapped 102046
nr_slab 4782
pgpgin 15956340
pgpgout 19099784
pswpin 30206
pswpout 230912
pgalloc 29315002
pgfree 29316033
pgactivate 1082560
pgdeactivate 1202414
pgfault 25537663
pgmajfault 67369
pgscan 11280124
pgrefill 6802507
pgsteal 6058697
pginodesteal 0
kswapd_steal 5476702
kswapd_inodesteal 4257
pageoutrun 33668
allocstall 17610
pgrotated 391878
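
A minimal sketch (not part of the original report) of how the counter deltas between two of the snapshots above could be computed, here in Python. The helper names `parse_vmstat` and `diff_counters` are hypothetical; the sample values are copied from the 02:37:03 and 02:37:23 snapshots.

```python
def parse_vmstat(text):
    """Parse 'name value' lines (the /proc/vmstat layout) into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def diff_counters(before, after):
    """Return after - before for every counter present in both snapshots."""
    return {name: after[name] - before[name] for name in after if name in before}


# A few counters copied from the 02:37:03 and 02:37:23 snapshots above.
SNAP_023703 = """\
pswpin 25882
pswpout 37006
pgscan 7633363
allocstall 12799
"""
SNAP_023723 = """\
pswpin 26444
pswpout 130749
pgscan 9534651
allocstall 16189
"""

delta = diff_counters(parse_vmstat(SNAP_023703), parse_vmstat(SNAP_023723))

if __name__ == "__main__":
    for name, value in sorted(delta.items()):
        print(f"{name:12s} +{value}")
```

Between those two snapshots, taken 20 seconds apart, pswpout grows by 93743 and allocstall by 3390, while neither counter moved at all in the preceding 10 seconds.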

5) Full top output:

top - 02:39:00 up  8:22,  3 users,  load average: 76.16, 25.71, 10.41
Tasks: 225 total,   1 running, 224 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.4% us,  5.3% sy,  0.0% ni,  0.2% id, 93.9% wa,  0.2% hi,  0.0% si
Mem:   2073868k total,  2071260k used,     2608k free,     2104k buffers
Swap:  3583968k total,  1097884k used,  2486084k free,     8604k cached

25123 mysql     15   0 1002m 142m 4896 D  1.0  7.0  10:35.25 mysqld
25122 mysql     15   0 1002m 142m 4896 D  0.0  7.0   0:05.91 mysqld
24132 mysql     15   0 1002m 142m 4896 D  0.0  7.0   0:28.97 mysqld
24129 mysql     15   0 1002m 142m 4896 S  0.0  7.0   0:05.90 mysqld
24125 mysql     15   0 1002m 142m 4896 D  0.1  7.0   0:07.59 mysqld
 5420 mysql     16   0 1002m 142m 4896 S  0.0  7.0   0:50.44 mysqld
 4748 mysql     15   0 1002m 142m 4896 D  0.0  7.0   3:10.94 mysqld
 4746 mysql     15   0 1002m 142m 4896 S  0.0  7.0   2:52.37 mysqld
  970 mysql     16   0 1002m 142m 4896 S  0.0  7.0   0:19.57 mysqld
  969 mysql     16   0 1002m 142m 4896 S  0.0  7.0   0:17.52 mysqld
  968 mysql     15   0 1002m 142m 4896 D  0.1  7.0   0:15.47 mysqld
  967 mysql     16   0 1002m 142m 4896 S  0.0  7.0   0:00.00 mysqld
  958 mysql     16   0 1002m 142m 4896 S  0.0  7.0   0:15.64 mysqld
  957 mysql     15   0 1002m 142m 4896 S  0.0  7.0   2:17.52 mysqld
  956 mysql     16   0 1002m 142m 4896 S  0.0  7.0   0:00.16 mysqld
  955 mysql     17   0 1002m 142m 4896 S  0.0  7.0   0:00.00 mysqld
  954 mysql     15   0 1002m 142m 4896 S  0.0  7.0   0:00.01 mysqld
  898 mysql     15   0 1002m 142m 4896 D  0.0  7.0   0:03.57 mysqld
30132 pricemat  25   0 88976  12m 1944 S  0.0  0.6   2:29.37 make_words
29381 apache    15   0 57948 3200  41m D  0.0  0.2   0:26.14 httpd
29652 apache    15   0 57920 3188  41m S  0.0  0.2   0:19.16 httpd
31015 apache    15   0 56456 2484  41m D  0.0  0.1   0:14.68 httpd
29155 apache    15   0 55064 2916  42m S  0.0  0.1   0:21.33 httpd
30281 apache    15   0 54756 5096  41m D  0.0  0.2   0:11.47 httpd
29638 apache    16   0 54744 3816  42m S  0.0  0.2   0:17.62 httpd
29540 apache    15   0 54436 4732  41m D  0.0  0.2   0:19.75 httpd
30153 apache    15   0 54404 3472  41m S  0.0  0.2   0:12.79 httpd
30123 apache    15   0 54356 4024  41m D  0.0  0.2   0:13.18 httpd
30116 apache    15   0 54316 3352  41m D  0.0  0.2   0:11.75 httpd
29647 apache    15   0 54308 4224  41m D  0.0  0.2   0:17.31 httpd
30134 apache    15   0 53416 2968  41m D  0.0  0.1   0:14.14 httpd
29651 apache    15   0 53040 3220  41m D  0.0  0.2   0:17.58 httpd
29013 apache    15   0 52888 4552  41m S  0.0  0.2   0:13.12 httpd
30619 apache    15   0 52824 3584  41m D  0.0  0.2   0:05.70 httpd
28174 apache    15   0 52692 3956  41m D  0.0  0.2   0:17.85 httpd
30926 apache    15   0 52572 2960  41m S  0.0  0.1   0:04.82 httpd
30117 apache    15   0 52464 4356  41m D  0.1  0.2   0:12.74 httpd
30135 apache    15   0 52392 3984  41m D  0.0  0.2   0:11.73 httpd
30126 apache    15   0 52380 4076  41m D  0.0  0.2   0:13.88 httpd
30133 apache    15   0 52340 2856  41m D  0.0  0.1   0:13.50 httpd
31136 apache    15   0 52316 2596  41m D  0.0  0.1   0:00.90 httpd
30127 apache    15   0 52312 3044  41m D  0.0  0.1   0:12.13 httpd
30136 apache    15   0 52208 2780  41m D  0.0  0.1   0:13.56 httpd
31138 apache    15   0 52116 3272  41m D  0.1  0.2   0:00.78 httpd
31137 apache    15   0 52068 2420  41m D  0.0  0.1   0:00.99 httpd
31289 apache    16   0 51476 1900  41m D  0.0  0.1   0:00.00 httpd
31273 apache    17   0 51360 2188  41m D  0.0  0.1   0:00.01 httpd
31261 apache    16   0 51252 1740  41m D  0.0  0.1   0:00.02 httpd
31234 apache    16   0 51220 1520  41m D  0.1  0.1   0:00.02 httpd
31208 apache    16   0 51220 1888  41m D  0.0  0.1   0:00.02 httpd
31276 apache    16   0 51144 1920  41m D  0.0  0.1   0:00.00 httpd
31274 apache    16   0 51144 1952  41m D  0.0  0.1   0:00.00 httpd
31258 apache    18   0 51144 2068  41m D  0.0  0.1   0:00.02 httpd
31255 apache    18   0 51144 2068  41m D  0.0  0.1   0:00.01 httpd
31254 apache    16   0 51144 2012  41m D  0.0  0.1   0:00.01 httpd
31252 apache    17   0 51144 1996  41m D  0.0  0.1   0:00.01 httpd
31251 apache    16   0 51144 2012  41m D  0.0  0.1   0:00.01 httpd
31238 apache    16   0 51144 2068  41m D  0.0  0.1   0:00.01 httpd
31212 apache    17   0 51144 2028  41m D  0.0  0.1   0:00.01 httpd
31288 apache    17   0 51140 2056  41m D  0.0  0.1   0:00.01 httpd
31287 apache    16   0 51140 2020  41m D  0.0  0.1   0:00.00 httpd
31227 apache    18   0 51140 2084  41m D  0.0  0.1   0:00.00 httpd
31201 apache    16   0 51140 2024  41m D  0.0  0.1   0:00.01 httpd
31225 apache    16   0 51136 1768  41m D  0.0  0.1   0:00.01 httpd
31300 apache    16   0 51132 1700  41m D  0.0  0.1   0:00.00 httpd
31285 apache    16   0 51132 2112  41m D  0.0  0.1   0:00.00 httpd
31283 apache    16   0 51132 1708  41m D  0.0  0.1   0:00.00 httpd
31280 apache    16   0 51132 1692  41m D  0.0  0.1   0:00.00 httpd
31272 apache    18   0 51132 1828  41m D  0.0  0.1   0:00.00 httpd
31257 apache    16   0 51132 2012  41m D  0.0  0.1   0:00.00 httpd
31207 apache    16   0 51132 1708  41m D  0.0  0.1   0:00.00 httpd
31243 apache    16   0 51128 1856  41m D  0.0  0.1   0:00.00 httpd
31296 apache    16   0 51120 1844  41m D  0.0  0.1   0:00.00 httpd
31295 apache    16   0 51120 1632  41m D  0.0  0.1   0:00.00 httpd
31284 apache    16   0 51120 1640  41m D  0.0  0.1   0:00.01 httpd
31277 apache    16   0 51120 1616  41m D  0.0  0.1   0:00.00 httpd
31271 apache    18   0 51120 1656  41m D  0.0  0.1   0:00.00 httpd
31220 apache    16   0 51120 1620  41m D  0.0  0.1   0:00.01 httpd
31206 apache    16   0 51108 1944  41m D  0.0  0.1   0:00.01 httpd
31249 apache    17   0 51104 1788  41m D  0.0  0.1   0:00.01 httpd
31237 apache    16   0 51104 1848  41m D  0.1  0.1   0:00.02 httpd
31253 apache    16   0 51100 2140  41m D  0.0  0.1   0:00.01 httpd
31203 apache    17   0 51100 1608  41m D  0.0  0.1   0:00.01 httpd
31211 apache    16   0 51096 2004  41m D  0.0  0.1   0:00.01 httpd
31298 apache    17   0 51092 2004  41m D  0.0  0.1   0:00.00 httpd
31282 apache    16   0 51092 2084  41m D  0.0  0.1   0:00.00 httpd
31267 apache    18   0 51092 2056  41m D  0.0  0.1   0:00.01 httpd
31313 apache    18   0 51088 1512  41m D  0.0  0.1   0:00.00 httpd
31312 apache    16   0 51088 1508  41m D  0.0  0.1   0:00.00 httpd
31310 apache    17   0 51088 1512  41m D  0.0  0.1   0:00.00 httpd
31286 apache    16   0 51088 1680  41m D  0.0  0.1   0:00.01 httpd
31281 apache    15   0 51088 1268  41m D  0.0  0.1   0:00.00 httpd
31269 apache    18   0 51088 1824  41m D  0.0  0.1   0:00.00 httpd
31268 apache    17   0 51088 1776  41m D  0.1  0.1   0:00.04 httpd
31248 apache    16   0 51088 1600  41m S  0.1  0.1   0:00.02 httpd
31242 apache    16   0 51088 1336  41m D  0.0  0.1   0:00.00 httpd
31241 apache    15   0 51088 1636  41m S  0.0  0.1   0:00.01 httpd
31236 apache    18   0 51088 1752  41m D  0.0  0.1   0:00.00 httpd
31233 apache    16   0 51088 1376  41m S  0.0  0.1   0:00.00 httpd
31231 apache    16   0 51088 1196  41m D  0.0  0.1   0:00.00 httpd
31217 apache    16   0 51088 1636  41m S  0.0  0.1   0:00.01 httpd
31214 apache    16   0 51088 1428  41m S  0.0  0.1   0:00.01 httpd
31210 apache    16   0 51088 1320  41m S  0.0  0.1   0:00.00 httpd
31205 apache    18   0 51088 1648  41m D  0.0  0.1   0:00.01 httpd
31204 apache    16   0 51088 1268  41m S  0.0  0.1   0:00.00 httpd
31235 apache    16   0 51080 1364  41m D  0.0  0.1   0:00.00 httpd
31232 apache    16   0 51080 1484  41m D  0.0  0.1   0:00.03 httpd
31219 apache    18   0 51080 1800  41m D  0.0  0.1   0:00.01 httpd
31315 apache    18   0 51076 1384  41m D  0.0  0.1   0:00.00 httpd
31314 apache    16   0 51076 1316  41m D  0.0  0.1   0:00.00 httpd
31311 apache    18   0 51076 1464  41m D  0.0  0.1   0:00.01 httpd
31309 apache    18   0 51076 1384  41m D  0.0  0.1   0:00.00 httpd
31308 apache    18   0 51076 1304  41m D  0.0  0.1   0:00.00 httpd
31306 apache    17   0 51076 1420  41m D  0.0  0.1   0:00.00 httpd
31305 apache    18   0 51076 1320  41m D  0.0  0.1   0:00.01 httpd
31304 apache    18   0 51076 1280  41m D  0.0  0.1   0:00.00 httpd
31303 apache    18   0 51076 1380  41m D  0.0  0.1   0:00.00 httpd
31302 apache    17   0 51076 1308  41m D  0.0  0.1   0:00.00 httpd
31301 apache    15   0 51076 1292  41m D  0.0  0.1   0:00.00 httpd
31297 apache    16   0 51076 1348  41m D  0.0  0.1   0:00.00 httpd
31279 apache    16   0 51076 1292  41m S  0.0  0.1   0:00.00 httpd
31278 apache    15   0 51076 1204  41m D  0.0  0.1   0:00.00 httpd
31275 apache    15   0 51076 1196  41m D  0.0  0.1   0:00.00 httpd
31260 apache    16   0 51076 1548  41m S  0.0  0.1   0:00.02 httpd
31259 apache    18   0 51076 1536  41m S  0.0  0.1   0:00.00 httpd
31256 apache    18   0 51076 1444  41m S  0.0  0.1   0:00.00 httpd
31250 apache    16   0 51076 1484  41m S  0.0  0.1   0:00.00 httpd
31247 apache    16   0 51076 1292  41m D  0.0  0.1   0:00.01 httpd
31246 apache    16   0 51076 1296  41m S  0.0  0.1   0:00.01 httpd
31245 apache    18   0 51076 1172  41m D  0.0  0.1   0:00.00 httpd
31244 apache    15   0 51076 1412  41m S  0.0  0.1   0:00.00 httpd
31240 apache    16   0 51076 1500  41m S  0.0  0.1   0:00.01 httpd
31239 apache    15   0 51076 1548  41m D  0.0  0.1   0:00.01 httpd
31230 apache    18   0 51076 1300  41m D  0.0  0.1   0:00.00 httpd
31229 apache    18   0 51076 1304  41m D  0.0  0.1   0:00.00 httpd
31228 apache    16   0 51076 1424  41m S  0.0  0.1   0:00.00 httpd
31226 apache    16   0 51076 1760  41m D  0.0  0.1   0:00.01 httpd
31223 apache    18   0 51076 1216  41m D  0.0  0.1   0:00.00 httpd
31218 apache    18   0 51076 1704  41m D  0.0  0.1   0:00.01 httpd
31216 apache    18   0 51076 1208  41m D  0.0  0.1   0:00.00 httpd
31215 apache    16   0 51076 1240  41m D  0.0  0.1   0:00.00 httpd
31202 apache    17   0 51076 1620  41m D  0.0  0.1   0:00.01 httpd
31325 root      17   0 51064 1320  41m D  0.0  0.1   0:00.00 httpd
31324 root      17   0 51064 1320  41m D  0.0  0.1   0:00.00 httpd
31323 root      15   0 51064 1320  41m D  0.0  0.1   0:00.00 httpd
31322 root      15   0 51064 1320  41m D  0.0  0.1   0:00.00 httpd
31319 root      18   0 51064 1288  41m D  0.0  0.1   0:00.00 httpd
31318 root      17   0 51064 1312  41m D  0.0  0.1   0:00.00 httpd
31316 root      18   0 51064 1328  41m D  0.0  0.1   0:00.00 httpd
  794 root      17   0 51064 1192  41m S  0.1  0.1   0:01.67 httpd
23885 pricemat  16   0  5652 1124 4892 S  0.0  0.1   0:00.02 php
23980 pricemat  17   0  5648  652 4892 S  0.0  0.0   0:00.01 php
 1430 root      15   0  3780  260 3112 S  0.0  0.0   0:00.61 sshd
 8273 root      15   0  3716  468 3112 S  0.0  0.0   0:01.26 sshd
  994 root      16   0  3660  516 3112 S  0.0  0.0   0:10.24 sshd
 2147 root      15   0  3572  176 3112 S  0.0  0.0   0:00.12 sshd
 2129 root      16   0  3572  156 3112 S  0.0  0.0   0:00.76 sshd
 1919 root      15   0  3532  128 3112 S  0.0  0.0   0:00.11 sshd
 1480 root      16   0  3488   84 2224 S  0.0  0.0   0:00.83 bash
 2991 rathamah  16   0  3336  420 2052 S  0.0  0.0   0:00.04 bash
 1431 rathamah  16   0  2828  880 2052 S  0.0  0.0   0:00.04 bash
  770 dnscache  15   0  2712   24 1412 S  0.0  0.0   0:17.59 dnscache
  728 root      16   0  2672  176 2560 S  0.0  0.0   0:01.72 sshd
 1001 rathamah  16   0  2588   48 2052 S  0.0  0.0   0:00.03 bash
 2957 root      17   0  2388   40 1984 S  0.0  0.0   0:00.02 login
  846 root      20   0  2284  252 2120 S  0.0  0.0   0:00.02 mysqld_safe
  750 root      16   0  2212   44 1900 S  0.0  0.0   0:00.00 xinetd
 1478 root      16   0  2148  216 1788 S  0.0  0.0   0:00.00 su
 1062 rathamah  15   0  2132  508 1728 D  0.4  0.0   2:37.73 top
 8278 rathamah  15   0  2028  560 1728 R  0.2  0.0   0:18.48 top
31292 mobilius  18   0  1972  124 1884 S  0.0  0.0   0:00.00 lacheck.sh
31263 mobilius  18   0  1972  112 1884 S  0.0  0.0   0:00.01 sh
 2131 rathamah  15   0  1964   80 1884 S  0.0  0.0   0:04.17 proc_vmstat.sh
 1192 apache    16   0  1952   44 1816 S  0.0  0.0   0:00.60 cache_clean
  708 ntp       16   0  1936 1928 1792 S  0.0  0.1   0:00.10 ntpd
  619 root      15   0  1840  304 1624 S  0.0  0.0   0:06.43 syslogd
  837 root      15   0  1728  136 1536 S  0.0  0.0   0:00.14 crond
23884 root      16   0  1620  160 1536 S  0.0  0.0   0:00.00 crond
31264 root      18   0  1616  108 1536 S  0.0  0.0   0:00.00 crond
31262 root      17   0  1616   96 1536 S  0.0  0.0   0:00.00 crond
 1088 root      23   0  1616   84 1536 S  0.0  0.0   0:00.00 crond
  625 root      16   0  1532  188 1364 S  0.0  0.0   0:00.19 klogd
 2149 rathamah  15   0  1464   88 1404 S  0.0  0.0   0:00.02 vmstat
  863 qmails    15   0  1444  196 1388 S  0.0  0.0   0:45.57 qmail-send
31321 megashop  18   0  1436  200 1388 D  0.0  0.0   0:00.00 qmail-inject
31293 megashop  18   0  1436  200 1388 D  0.0  0.0   0:00.01 qmail-inject
23939 pricemat  16   0  1436    8 1388 S  0.0  0.0   0:00.00 qmail-inject
   23 root      17   0  1436  432 1392 S  0.0  0.0   0:00.66 devfsd
31317 rathamah  17   0  1432  124 1420 D  0.0  0.0   0:00.01 date
 1201 urs       15   0  1420  100 1392 D  0.0  0.0   0:00.54 tcpserver
 1187 urs       18   0  1420   44 1392 S  0.0  0.0   0:00.00 tcpserver
  767 root      15   0  1420   76 1356 S  0.0  0.0   0:00.12 svscan
    1 root      15   0  1420  424 1372 D  0.0  0.0   0:04.26 init
  866 qmaill    15   0  1412  292 1352 S  0.0  0.0   0:00.62 splogger
  867 root      15   0  1404   96 1360 S  0.0  0.0   0:00.15 qmail-lspawn
  868 qmailr    16   0  1400   96 1356 S  0.0  0.0   0:00.17 qmail-rspawn
  771 dnslog    15   0  1400   16 1368 S  0.0  0.0   0:10.64 multilog
 1261 root      16   0  1392   40 1352 S  0.0  0.0   0:00.00 mingetty
  919 root      16   0  1392   36 1352 S  0.0  0.0   0:00.00 mingetty
  916 root      16   0  1392   92 1352 S  0.0  0.0   0:00.00 mingetty
  915 root      16   0  1392   64 1352 S  0.0  0.0   0:00.00 mingetty
  914 root      16   0  1392   80 1352 S  0.0  0.0   0:00.00 mingetty
  913 root      16   0  1392   76 1352 S  0.0  0.0   0:00.00 mingetty
  869 qmailq    15   0  1388   84 1356 S  0.0  0.0   0:00.66 qmail-clean
  769 root      16   0  1388   12 1360 S  0.0  0.0   0:00.00 supervise
  768 root      16   0  1388   12 1360 S  0.0  0.0   0:00.00 supervise
31307 mobilius  18   0   376  116  348 D  0.0  0.0   0:00.00 awk
31320 root      15   0     0    0    0 D  0.0  0.0   0:00.00 pdflush
31222 root      15   0     0    0    0 D  0.0  0.0   0:00.01 pdflush
24026 root      15   0     0    0    0 D  0.0  0.0   0:05.24 pdflush
   18 root       5 -10     0    0    0 S  0.0  0.0   0:00.04 reiserfs/1
   17 root       5 -10     0    0    0 S  0.0  0.0   0:00.02 reiserfs/0
   16 root      18   0     0    0    0 S  0.0  0.0   0:00.15 kseriod
   15 root      15 -10     0    0    0 S  0.0  0.0   0:00.00 aio/1
   14 root      10 -10     0    0    0 S  0.0  0.0   0:00.00 aio/0
   13 root      15   0     0    0    0 D  8.9  0.0   0:23.43 kswapd0
   10 root      15   0     0    0    0 S  0.0  0.0   0:00.00 kirqd
    9 root       5 -10     0    0    0 S  0.0  0.0   0:00.01 kblockd/1
    8 root       5 -10     0    0    0 S  0.0  0.0   0:00.01 kblockd/0
    7 root       5 -10     0    0    0 S  0.0  0.0   0:00.02 events/1
    6 root       5 -10     0    0    0 S  0.0  0.0   0:00.03 events/0
    5 root      34  19     0    0    0 S  0.0  0.0   0:00.00 ksoftirqd/1
    4 root      RT   0     0    0    0 S  0.0  0.0   0:00.00 migration/1
    3 root      34  19     0    0    0 S  0.0  0.0   0:00.00 ksoftirqd/0
    2 root      RT   0     0    0    0 S  0.0  0.0   0:00.00 migration/0

> 
> Also, run
> 
> 	while true
> 	do
> 		cat /proc/meminfo
> 		sleep 10
> 	done
> 
> and record the info which that leaves behind when the machine locks up. 
> This should tell us whether it is an application or kernel memory leak.  If
> it is indeed a leak.

Will do this next time.
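
For reference, a rough Python equivalent of the loop quoted above that also timestamps each snapshot and forces it to disk, so that the last entries have a better chance of surviving a hang. This is only a sketch: the names `log_snapshot` and `run`, the default log path, and the decision to fsync are assumptions made for illustration, not something taken from this thread.

```python
import os
import time


def log_snapshot(log_path, source="/proc/meminfo"):
    """Append one timestamped copy of `source` to `log_path` and sync it to disk."""
    with open(source) as src:
        snapshot = src.read()
    with open(log_path, "a") as log:
        log.write(time.strftime("%a %b %d %H:%M:%S %Z %Y") + "\n")
        log.write(snapshot)
        log.write("\n")
        log.flush()
        os.fsync(log.fileno())  # push the data out before the next sleep


def run(log_path="meminfo.log", interval=10, count=None, source="/proc/meminfo"):
    """Take `count` snapshots (forever if count is None), `interval` seconds apart."""
    taken = 0
    while count is None or taken < count:
        log_snapshot(log_path, source)
        taken += 1
        time.sleep(interval)
```

Calling `run()` with no arguments would mimic the quoted loop. Note that on a box already stuck in memory reclaim the logger itself may block, just as `date` and `top` sit in D state in the top output above, so the final snapshot can still be missing.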

> 
> 
> 

-- 
                   Best regards,
                   Sergey S. Kostyliov <rathamahata@php4.ru>
                   Public PGP key: http://sysadminday.org.ru/rathamahata.asc

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: 2.6.1 IO lockup on SMP systems
  2004-02-23 21:30           ` Mike Fedyk
@ 2004-02-24 11:56             ` Sergey S. Kostyliov
  0 siblings, 0 replies; 24+ messages in thread
From: Sergey S. Kostyliov @ 2004-02-24 11:56 UTC (permalink / raw)
  To: Mike Fedyk; +Cc: Andrew Morton, linux-kernel, Alexander Y. Fomichev, anton

On Tuesday 24 February 2004 00:30, Mike Fedyk wrote:
> Sergey S. Kostyliov wrote:
> > Hello Andrew,
> > 
> > This has now happened for the third time.
> > 
> > 
> >>>>I've just reproduced this lockup with 2.6.3.
> >>>>
> >>>>
> >>>>>You may need a serial console to be able to capture all the output.
> >>>>>
> >>>>>Also, it would be useful to know what sort of load the machines are
> >>>>>under, and what filesystems are in use.
> >>>>
> >>>>The machine is a http server. The main applications are:
> >>>>1) apache 1.3 which serves php pages (mod_php):
> >>>>	 15.3 requests/sec - 111.9 kB/second - 7.3 kB/request
> >>>>	 54 requests currently being processed, 19 idle servers
> >>>>2) mysql:
> >>>>	Threads: 19  Questions: 26922012  Slow queries: 9799  Opens: 64980
> >>>>	Flush tables: 1  Open tables: 630  Queries per second avg: 143.547
> >>>>
> >>>>This is an IO bound machine in general. All filesystems are reiserfs.
> >>>>
> >>>>Here is a sysrq-T output obtained from a locked box via serial console:
> >>>
> >>>OK, so everything is stuck trying to allocate memory.  Perhaps you ran out
> >>>of swapspace, or some process has gone berzerk allocating memory.
> > 
> > 
> > Memory exhaustion is indeed possible on this box. I'll double-check the
> > ulimit and /etc/security/limits.conf settings. The only thing that worries
> > me is that this box had been running for months without any problems on
> > 2.4.23aa1.
> > 
> > I have added another 2 GB of swap space (I hope this gives enough time
> > to find the memory-hungry process(es)).
> 
> Also check how much memory is being used for slab in /proc/meminfo

Thanks for the hint, will do this next time. 

-- 
                   Best regards,
                   Sergey S. Kostyliov <rathamahata@php4.ru>
                   Public PGP key: http://sysadminday.org.ru/rathamahata.asc

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: 2.6.1 IO lockup on SMP systems
  2004-02-24 11:54             ` Sergey S. Kostyliov
@ 2004-02-26 12:19               ` Sergey S. Kostyliov
  2004-02-26 12:53                 ` Andrew Morton
  0 siblings, 1 reply; 24+ messages in thread
From: Sergey S. Kostyliov @ 2004-02-26 12:19 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, gluk, anton, Mike Fedyk

On Tuesday 24 February 2004 14:54, Sergey S. Kostyliov wrote:
> On Tuesday 24 February 2004 01:26, Andrew Morton wrote:
> > "Sergey S. Kostyliov" <rathamahata@php4.ru> wrote:
> 
> <cut>
> 
> > > Memory exhaustion is indeed possible on this box. I'll double-check the
> > > ulimit and /etc/security/limits.conf settings. The only thing that worries
> > > me is that this box had been running for months without any problems on
> > > 2.4.23aa1.
> > 
> > It is conceivable that you have some application which runs OK on 2.4.x but
> > has some subtle bug which causes the app to go crazy on a 2.6 kernel
> > consuming lots of memory.  Or there's a bug in the 2.6 kernel ;)
> > 
> > > I have added another 2 GB of swap space (I hope this gives enough time
> > > to find the memory-hungry process(es)).
> 
> <cut>
> 
> > 
> > OK, so it's doing a lot of swapping and your swap utilisation is
> > continuously increasing.  I would suspect an application or kernel memory
> > leak.
> > 
> > I suggest you keep that `vmstat 30' running all the time.  When the machine
> > dies, take a look at the final 20 lines.
> 
> Here is from the last lockup:

Yet another lockup has just occurred. I could be wrong, but from the
/proc/meminfo content it doesn't look like a memory leak (neither kernel
nor userspace), does it?

1) Last 3 /proc/meminfo snapshots before the hang:
==================================================
Thu Feb 26 04:58:34 MSK 2004
MemTotal:      2073868 kB
MemFree:          7008 kB
Buffers:        223100 kB
Cached:         593368 kB
SwapCached:     748824 kB
Active:        1776280 kB
Inactive:       226160 kB
HighTotal:     1179648 kB
HighFree:         2560 kB
LowTotal:       894220 kB
LowFree:          4448 kB
SwapTotal:     3583968 kB
SwapFree:      2675616 kB
Dirty:            2156 kB
Writeback:           0 kB
Mapped:        1219740 kB
Slab:            43668 kB
Committed_AS:  1846968 kB
PageTables:       4020 kB
VmallocTotal:   114680 kB
VmallocUsed:      7448 kB
VmallocChunk:   107232 kB

Thu Feb 26 04:59:05 MSK 2004
MemTotal:      2073868 kB
MemFree:          3972 kB
Buffers:          2268 kB
Cached:          36132 kB
SwapCached:     726940 kB
Active:        1157256 kB
Inactive:         3696 kB
HighTotal:     1179648 kB
HighFree:          704 kB
LowTotal:       894220 kB
LowFree:          3268 kB
SwapTotal:     3583968 kB
SwapFree:      2633444 kB
Dirty:              20 kB
Writeback:        3376 kB
Mapped:        1154812 kB
Slab:            27996 kB
Committed_AS:  1851456 kB
PageTables:       4052 kB
VmallocTotal:   114680 kB
VmallocUsed:      7448 kB
VmallocChunk:   107232 kB

Thu Feb 26 05:00:15 MSK 2004
MemTotal:      2073868 kB
MemFree:          2528 kB
Buffers:          2180 kB
Cached:          34216 kB
SwapCached:     643808 kB
Active:         999316 kB
Inactive:        12088 kB
HighTotal:     1179648 kB
HighFree:          576 kB
LowTotal:       894220 kB
LowFree:          1952 kB
SwapTotal:     3583968 kB
SwapFree:      2559796 kB
Dirty:               0 kB
Writeback:        3052 kB
Mapped:        1001208 kB
Slab:            23932 kB
Committed_AS:  1979784 kB
PageTables:       4840 kB
VmallocTotal:   114680 kB
VmallocUsed:      7448 kB
VmallocChunk:   107232 kB

2) sysrq-M:
===========
SysRq : Show Memory
Mem-info:
DMA per-cpu:
cpu 0 hot: low 2, high 6, batch 1
cpu 0 cold: low 0, high 2, batch 1
cpu 1 hot: low 2, high 6, batch 1
cpu 1 cold: low 0, high 2, batch 1
Normal per-cpu:
cpu 0 hot: low 32, high 96, batch 16
cpu 0 cold: low 0, high 32, batch 16
cpu 1 hot: low 32, high 96, batch 16
cpu 1 cold: low 0, high 32, batch 16
HighMem per-cpu:
cpu 0 hot: low 32, high 96, batch 16
cpu 0 cold: low 0, high 32, batch 16
cpu 1 hot: low 32, high 96, batch 16
cpu 1 cold: low 0, high 32, batch 16

Free pages:        2120kB (512kB HighMem)
Active:1067 inactive:93 dirty:0 writeback:0 unstable:0 free:530
DMA free:176kB min:16kB low:32kB high:48kB active:884kB inactive:0kB
Normal free:1432kB min:936kB low:1872kB high:2808kB active:1376kB inactive:372kB
HighMem free:512kB min:512kB low:1024kB high:1536kB active:2008kB inactive:0kB
DMA: 0*4kB 0*8kB 1*16kB 1*32kB 0*64kB 1*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 176kB
Normal: 248*4kB 3*8kB 0*16kB 1*32kB 6*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1432kB
HighMem: 0*4kB 0*8kB 0*16kB 2*32kB 1*64kB 1*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 512kB
Swap cache: add 1726105, delete 1726052, find 1388170/1627421, race 19+488
Free swap:       2195688kB
524288 pages of RAM
294912 pages of HIGHMEM
5821 reserved pages
993 pages shared
54 pages swap cached

3) sysrq-T:
===========
http://sysadminday.org.ru/2.6.3-lockup/20040226/sysrq-T

4) `vmstat 30':
===============
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 0 19 1255096   1952   1996  19920  426 1763   505  1778 1068   172  0  1  0 99
 0 24 1260156   1944   2028  19816  374 1650   463  1670 1067   165  0  1  0 99
 0 18 1266576   1880   2000  18960  372 1835   449  1847 1072   177  0  1  0 99
 0 19 1274696   2904   1960  17892  366 2002   422  2007 1054   179  0  1  0 99
 0 14 1279000   2896   1916  17356  203 1683   243  1693 1037   137  0  1  0 99
 0 19 1288068   2472   1912  16608  180 2074   220  2085 1048   138  0  1  0 99
 1 13 1294388   2152   1932  16404  253 1841   302  1849 1037   117  0  1  0 99
 0 17 1301552   2328   1956  15684  318 1866   375  1880 1037   162  0  1  0 99
 0 18 1307280   2448   1956  15024  331 1697   408  1714 1041   155  0  1  0 99
 0 20 1312696   2184   1852  13948  480 1720   549  1732 1041   166  0  1  0 99
 0 21 1321756   2308   1952  13400  435 2012   572  2028 1048   191  0  1  0 99
 0 20 1330740   2372   1840  12152  509 1920   564  1939 1045   162  0  1  0 99
 0 19 1336432   2616   1844  11252  513 1697   568  1704 1043   135  0  1  0 99
 0 20 1342256   2364   1896  10704  520 1810   573  1816 1042   185  0  1  0 99
 0 17 1350608   2868   1796  10112  368 2079   412  2092 1040   133  0  1  0 99
 0 19 1356100   2176   1988   9120  401 1668   533  1677 1039   161  0  1  0 99
 0 20 1359692   2248   2004   8876  369 1500   482  1514 1039   169  0  1  0 99
 0 19 1364868   2696   1904   8428  455 1643   604  1658 1038   172  0  1  0 99
 0 20 1371124   2876   1920   7212  537 2133   745  2147 1312   209  0  1  0 99
 0 20 1378172   3192   1832   6036  614 1623   793  1631 1042   180  0  1  0 99

-- 
                   Best regards,
                   Sergey S. Kostyliov <rathamahata@php4.ru>
                   Public PGP key: http://sysadminday.org.ru/rathamahata.asc

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: 2.6.1 IO lockup on SMP systems
  2004-02-26 12:19               ` Sergey S. Kostyliov
@ 2004-02-26 12:53                 ` Andrew Morton
  2004-02-26 13:11                   ` Andrew Morton
  2004-02-26 14:30                   ` Sergey S. Kostyliov
  0 siblings, 2 replies; 24+ messages in thread
From: Andrew Morton @ 2004-02-26 12:53 UTC (permalink / raw)
  To: Sergey S. Kostyliov; +Cc: linux-kernel, gluk, anton, mfedyk

"Sergey S. Kostyliov" <rathamahata@php4.ru> wrote:
>
> Yet another lockup has just occurred. I could be wrong, but from the
> /proc/meminfo content it doesn't look like a memory leak (neither kernel
> nor userspace), does it?

I think it's a kernel leak.

> Thu Feb 26 05:00:15 MSK 2004
> MemTotal:      2073868 kB
> MemFree:          2528 kB
> Buffers:          2180 kB
> Cached:          34216 kB
> SwapCached:     643808 kB
> Active:         999316 kB
> Inactive:        12088 kB
> HighTotal:     1179648 kB
> HighFree:          576 kB
> LowTotal:       894220 kB
> LowFree:          1952 kB
> SwapTotal:     3583968 kB
> SwapFree:      2559796 kB
> Dirty:               0 kB
> Writeback:        3052 kB
> Mapped:        1001208 kB
> Slab:            23932 kB
> Committed_AS:  1979784 kB
> PageTables:       4840 kB
> VmallocTotal:   114680 kB
> VmallocUsed:      7448 kB
> VmallocChunk:   107232 kB

A gig of mapped memory, most of it in swapcache.  That's probably all
highmem.  Only a gig of memory on the page LRU.  Where is the rest?  Lost.

Almost no pagecache at all, slab is small.
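
A quick sketch of the arithmetic behind that, in Python, using the numbers from the 05:00:15 snapshot quoted above. Which fields count as "accounted for" is an assumption made for this illustration (free pages, both LRU lists, slab and page tables), and the variable names are made up here.

```python
# Values in kB, copied from the 05:00:15 /proc/meminfo snapshot.
meminfo = {
    "MemTotal": 2073868,
    "MemFree": 2528,
    "Active": 999316,
    "Inactive": 12088,
    "Slab": 23932,
    "PageTables": 4840,
}

# Pages on the LRU lists.
lru_kb = meminfo["Active"] + meminfo["Inactive"]

# Assumption: free + LRU + slab + page tables is treated as "accounted for".
accounted_kb = lru_kb + meminfo["MemFree"] + meminfo["Slab"] + meminfo["PageTables"]
unaccounted_kb = meminfo["MemTotal"] - accounted_kb

if __name__ == "__main__":
    print(f"on the LRU:    {lru_kb} kB")
    print(f"accounted for: {accounted_kb} kB")
    print(f"unaccounted:   {unaccounted_kb} kB")
```

With these inputs about 1 GB of the 2 GB of RAM is on the LRU, and roughly another 1 GB (1031164 kB) does not show up in any of the listed fields.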

> 3) sysrq-T:
> ===========
> http://sysadminday.org.ru/2.6.3-lockup/20040226/sysrq-T

hm, you have 34 instances of crond running.   How odd.

> 4) `vmstat 30':
> ===============
> procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
>  0 19 1255096   1952   1996  19920  426 1763   505  1778 1068   172  0  1  0 99
>  0 24 1260156   1944   2028  19816  374 1650   463  1670 1067   165  0  1  0 99

Again, all your memory has vanished.

I'd say that we've leaked everything in lowmem and everyone is stuck trying
to reclaim some lowmem memory.  Not sure why the oom-killer didn't do
anything.  I haven't tested it in a year - maybe it broke.

So.  What are you using which is different from everyone else?  DAC960 I
see.  What about firewall setups, NIC drivers, RAID/MD/etc?  Anything in
there which isn't a mainstream thing?


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: 2.6.1 IO lockup on SMP systems
  2004-02-26 12:53                 ` Andrew Morton
@ 2004-02-26 13:11                   ` Andrew Morton
  2004-02-26 14:37                     ` Dave Jones
  2004-02-26 14:30                   ` Sergey S. Kostyliov
  1 sibling, 1 reply; 24+ messages in thread
From: Andrew Morton @ 2004-02-26 13:11 UTC (permalink / raw)
  To: rathamahata, linux-kernel, gluk, anton, mfedyk

Andrew Morton <akpm@osdl.org> wrote:
>
> Not sure why the oom-killer didn't do anything.

There's still free swap space.  The oom-killer has problems.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: 2.6.1 IO lockup on SMP systems
  2004-02-26 12:53                 ` Andrew Morton
  2004-02-26 13:11                   ` Andrew Morton
@ 2004-02-26 14:30                   ` Sergey S. Kostyliov
  2004-02-26 20:03                     ` Andrew Morton
  1 sibling, 1 reply; 24+ messages in thread
From: Sergey S. Kostyliov @ 2004-02-26 14:30 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, gluk, anton, mfedyk

On Thursday 26 February 2004 15:53, Andrew Morton wrote:
> "Sergey S. Kostyliov" <rathamahata@php4.ru> wrote:
> >
> > Yet another lockup has just occurred. I could be wrong, but from the
> > /proc/meminfo content it doesn't look like a memory leak (neither kernel
> > nor userspace), does it?
> 
> I think it's a kernel leak.
> 
> > Thu Feb 26 05:00:15 MSK 2004
> > MemTotal:      2073868 kB
> > MemFree:          2528 kB
> > Buffers:          2180 kB
> > Cached:          34216 kB
> > SwapCached:     643808 kB
> > Active:         999316 kB
> > Inactive:        12088 kB
> > HighTotal:     1179648 kB
> > HighFree:          576 kB
> > LowTotal:       894220 kB
> > LowFree:          1952 kB
> > SwapTotal:     3583968 kB
> > SwapFree:      2559796 kB
> > Dirty:               0 kB
> > Writeback:        3052 kB
> > Mapped:        1001208 kB
> > Slab:            23932 kB
> > Committed_AS:  1979784 kB
> > PageTables:       4840 kB
> > VmallocTotal:   114680 kB
> > VmallocUsed:      7448 kB
> > VmallocChunk:   107232 kB
> 
> A gig of mapped memory, most of it in swapcache.  That's probably all
> highmem.  Only a gig of memory on the page LRU.  Where is the rest?  Lost.
> 
> Almost no pagecache at all, slab is small.
> 
> > 3) sysrq-T:
> > ===========
> > http://sysadminday.org.ru/2.6.3-lockup/20040226/sysrq-T
> 
> hm, you have 34 instances of crond running.   How odd.
> 
> > 3) `vmstat 30':
> > ===============
> > procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
> >  r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
> >  0 19 1255096   1952   1996  19920  426 1763   505  1778 1068   172  0  1  0 99
> >  0 24 1260156   1944   2028  19816  374 1650   463  1670 1067   165  0  1  0 99
> 
> Again, all your memory has vanished.
> 
> I'd say that we've leaked everything in lowmem and everyone is stuck trying
> to reclaim some lowmem memory.  Not sure why the oom-killer didn't do
> anything.  I haven't tested it in a year - maybe it broke.
> 
> So.  What are you using which is different from everyone else?  DAC960 I
> see.  What about firewall setups, NIC drivers, RAID/MD/etc?  Anything in
> there which isn't a mainstream thing?

Iptables (ipt_REJECT, ipt_state, ip_conntrack, iptable_filter modules)
is used as the firewall.

I think the NICs are pretty ordinary:
00:04.0 Ethernet controller: Intel Corporation 82557 [Ethernet Pro 100] (rev 08)
00:05.0 Ethernet controller: Intel Corporation 82557 [Ethernet Pro 100] (rev 08)
handled by Intel e100 driver.

Only plain partitions (there is no md, dm, or anything like that):
[rathamahata@ope rathamahata]$ mount
/dev/rd/host0/target0/part1 on / type reiserfs (rw)
none on /proc type proc (rw)
none on /dev/pts type devpts (rw,gid=5,mode=620)
/dev/rd/host0/target1/part2 on /usr/local type reiserfs (rw)
/dev/rd/host0/target3/part1 on /var type reiserfs (rw,noatime,nodiratime)
/dev/rd/host0/target7/part1 on /var/www/html/fo type reiserfs (rw,noatime,nodiratime)
/dev/rd/host0/target2/part1 on /home type reiserfs (rw,noatime,nodiratime)
/dev/rd/host0/target4/part1 on /var/lib/innodb/1 type reiserfs (rw,noatime,nodiratime,notail)
/dev/rd/host0/target5/part1 on /var/lib/innodb/2 type reiserfs (rw,noatime,nodiratime,notail)
/dev/rd/host0/target6/part1 on /var/lib/oracle/db04 type reiserfs (rw,noatime,nodiratime,notail)
sysfs on /sys type sysfs (rw)

Here is a .config:
CONFIG_X86=y
CONFIG_MMU=y
CONFIG_UID16=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_EXPERIMENTAL=y
CONFIG_CLEAN_COMPILE=y
CONFIG_STANDALONE=y
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSCTL=y
CONFIG_LOG_BUF_SHIFT=15
CONFIG_KALLSYMS=y
CONFIG_FUTEX=y
CONFIG_EPOLL=y
CONFIG_IOSCHED_NOOP=y
CONFIG_IOSCHED_AS=y
CONFIG_IOSCHED_DEADLINE=y
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
CONFIG_OBSOLETE_MODPARM=y
CONFIG_KMOD=y
CONFIG_X86_PC=y
CONFIG_MPENTIUMIII=y
CONFIG_X86_CMPXCHG=y
CONFIG_X86_XADD=y
CONFIG_X86_L1_CACHE_SHIFT=5
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_X86_WP_WORKS_OK=y
CONFIG_X86_INVLPG=y
CONFIG_X86_BSWAP=y
CONFIG_X86_POPAD_OK=y
CONFIG_X86_GOOD_APIC=y
CONFIG_X86_INTEL_USERCOPY=y
CONFIG_X86_USE_PPRO_CHECKSUM=y
CONFIG_SMP=y
CONFIG_NR_CPUS=2
CONFIG_PREEMPT=y
CONFIG_X86_LOCAL_APIC=y
CONFIG_X86_IO_APIC=y
CONFIG_X86_TSC=y
CONFIG_MICROCODE=m
CONFIG_X86_MSR=m
CONFIG_X86_CPUID=m
CONFIG_HIGHMEM4G=y
CONFIG_HIGHMEM=y
CONFIG_MTRR=y
CONFIG_HAVE_DEC_LOCK=y
CONFIG_PM=y
CONFIG_ACPI_BOOT=y
CONFIG_APM=y
CONFIG_PCI=y
CONFIG_PCI_GOANY=y
CONFIG_PCI_BIOS=y
CONFIG_PCI_DIRECT=y
CONFIG_PCI_NAMES=y
CONFIG_BINFMT_ELF=y
CONFIG_BLK_DEV_DAC960=y
CONFIG_BLK_DEV_LOOP=m
CONFIG_NET=y
CONFIG_PACKET=y
CONFIG_UNIX=y
CONFIG_INET=y
CONFIG_IP_MULTICAST=y
CONFIG_NETFILTER=y
CONFIG_IP_NF_CONNTRACK=m
CONFIG_IP_NF_FTP=m
CONFIG_IP_NF_IPTABLES=y
CONFIG_IP_NF_MATCH_LIMIT=m
CONFIG_IP_NF_MATCH_IPRANGE=m
CONFIG_IP_NF_MATCH_MAC=m
CONFIG_IP_NF_MATCH_PKTTYPE=m
CONFIG_IP_NF_MATCH_MARK=m
CONFIG_IP_NF_MATCH_MULTIPORT=m
CONFIG_IP_NF_MATCH_TOS=m
CONFIG_IP_NF_MATCH_RECENT=m
CONFIG_IP_NF_MATCH_ECN=m
CONFIG_IP_NF_MATCH_DSCP=m
CONFIG_IP_NF_MATCH_LENGTH=m
CONFIG_IP_NF_MATCH_TTL=m
CONFIG_IP_NF_MATCH_TCPMSS=m
CONFIG_IP_NF_MATCH_HELPER=m
CONFIG_IP_NF_MATCH_STATE=m
CONFIG_IP_NF_MATCH_CONNTRACK=m
CONFIG_IP_NF_MATCH_OWNER=m
CONFIG_IP_NF_FILTER=m
CONFIG_IP_NF_TARGET_REJECT=m
CONFIG_IP_NF_MANGLE=m
CONFIG_IP_NF_TARGET_TOS=m
CONFIG_IP_NF_TARGET_ECN=m
CONFIG_IP_NF_TARGET_MARK=m
CONFIG_IP_NF_TARGET_CLASSIFY=m
CONFIG_IP_NF_TARGET_LOG=m
CONFIG_IP_NF_TARGET_ULOG=m
CONFIG_IP_NF_TARGET_TCPMSS=m
CONFIG_IPV6_SCTP__=y
CONFIG_NETDEVICES=y
CONFIG_DUMMY=m
CONFIG_NET_ETHERNET=y
CONFIG_MII=y
CONFIG_NET_PCI=y
CONFIG_E100=y
CONFIG_INPUT=y
CONFIG_INPUT_MOUSEDEV=y
CONFIG_INPUT_MOUSEDEV_PSAUX=y
CONFIG_INPUT_MOUSEDEV_SCREEN_X=1024
CONFIG_INPUT_MOUSEDEV_SCREEN_Y=768
CONFIG_SOUND_GAMEPORT=y
CONFIG_SERIO=y
CONFIG_SERIO_I8042=y
CONFIG_INPUT_KEYBOARD=y
CONFIG_KEYBOARD_ATKBD=y
CONFIG_INPUT_MOUSE=y
CONFIG_MOUSE_PS2=y
CONFIG_VT=y
CONFIG_VT_CONSOLE=y
CONFIG_HW_CONSOLE=y
CONFIG_SERIAL_8250=y
CONFIG_SERIAL_8250_CONSOLE=y
CONFIG_SERIAL_8250_NR_UARTS=4
CONFIG_SERIAL_CORE=y
CONFIG_SERIAL_CORE_CONSOLE=y
CONFIG_UNIX98_PTYS=y
CONFIG_UNIX98_PTY_COUNT=256
CONFIG_VIDEO_SELECT=y
CONFIG_VGA_CONSOLE=y
CONFIG_DUMMY_CONSOLE=y
CONFIG_EXT2_FS=m
CONFIG_EXT3_FS=m
CONFIG_EXT3_FS_XATTR=y
CONFIG_JBD=m
CONFIG_FS_MBCACHE=m
CONFIG_REISERFS_FS=y
CONFIG_PROC_FS=y
CONFIG_PROC_KCORE=y
CONFIG_DEVFS_FS=y
CONFIG_DEVPTS_FS=y
CONFIG_TMPFS=y
CONFIG_RAMFS=y
CONFIG_MSDOS_PARTITION=y
CONFIG_NLS=y
CONFIG_NLS_DEFAULT="iso8859-1"
CONFIG_NLS_CODEPAGE_437=y
CONFIG_NLS_ISO8859_1=y
CONFIG_DEBUG_KERNEL=y
CONFIG_MAGIC_SYSRQ=y
CONFIG_X86_FIND_SMP_CONFIG=y
CONFIG_X86_MPPARSE=y
CONFIG_X86_SMP=y
CONFIG_X86_HT=y
CONFIG_X86_BIOS_REBOOT=y
CONFIG_X86_TRAMPOLINE=y
CONFIG_PC=y


-- 
                   Best regards,
                   Sergey S. Kostyliov <rathamahata@php4.ru>
                   Public PGP key: http://sysadminday.org.ru/rathamahata.asc

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: 2.6.1 IO lockup on SMP systems
  2004-02-26 13:11                   ` Andrew Morton
@ 2004-02-26 14:37                     ` Dave Jones
  2004-02-26 15:37                       ` Arjan van de Ven
  0 siblings, 1 reply; 24+ messages in thread
From: Dave Jones @ 2004-02-26 14:37 UTC (permalink / raw)
  To: Andrew Morton; +Cc: rathamahata, linux-kernel, gluk, anton, mfedyk

On Thu, Feb 26, 2004 at 05:11:35AM -0800, Andrew Morton wrote:
 > Andrew Morton <akpm@osdl.org> wrote:
 > >
 > > Not sure why the oom-killer didn't do anything.
 > 
 > There's still free swap space.  The oom-killer has problems.

That sounds odd. Surely if we have free swap, we don't
want the oom-killer to do anything ?

		Dave


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: 2.6.1 IO lockup on SMP systems
  2004-02-26 14:37                     ` Dave Jones
@ 2004-02-26 15:37                       ` Arjan van de Ven
  0 siblings, 0 replies; 24+ messages in thread
From: Arjan van de Ven @ 2004-02-26 15:37 UTC (permalink / raw)
  To: Dave Jones; +Cc: Andrew Morton, rathamahata, linux-kernel, gluk, anton, mfedyk

[-- Attachment #1: Type: text/plain, Size: 560 bytes --]

On Thu, 2004-02-26 at 15:37, Dave Jones wrote:
> On Thu, Feb 26, 2004 at 05:11:35AM -0800, Andrew Morton wrote:
>  > Andrew Morton <akpm@osdl.org> wrote:
>  > >
>  > > Not sure why the oom-killer didn't do anything.
>  > 
>  > There's still free swap space.  The oom-killer has problems.
> 
> That sounds odd. Surely if we have free swap, we don't
> want the oom-killer to do anything ?

with highmem it's not so easy :)
the lowzone can be entirely pinned by pagetables and such and the
highmem zone can be free... and still you want to oomkill.
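
A hedged sketch of that point in Python: a check that only looks at free
swap misses the case where lowmem is exhausted. The 1% threshold and the
function name are made up for illustration; this does not reproduce the
kernel's actual oom-killer logic.

```python
def lowmem_exhausted(low_free_kb, low_total_kb, threshold=0.01):
    """Return True when free lowmem falls below `threshold` of total lowmem.

    The threshold is an arbitrary illustrative value, not a kernel constant.
    """
    return low_free_kb < low_total_kb * threshold


# Values from the /proc/meminfo snapshot quoted earlier in this thread.
low_free, low_total = 1952, 894220
swap_free = 2559796

print("swap still free:", swap_free > 0)
print("lowmem exhausted:", lowmem_exhausted(low_free, low_total))
```

Both conditions hold at once for that snapshot, which is the situation
described above: plenty of swap, but almost no usable lowmem.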

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: 2.6.1 IO lockup on SMP systems
  2004-02-26 14:30                   ` Sergey S. Kostyliov
@ 2004-02-26 20:03                     ` Andrew Morton
  2004-02-28 14:56                       ` Sergey S. Kostyliov
  0 siblings, 1 reply; 24+ messages in thread
From: Andrew Morton @ 2004-02-26 20:03 UTC (permalink / raw)
  To: Sergey S. Kostyliov; +Cc: linux-kernel, gluk, anton, mfedyk

"Sergey S. Kostyliov" <rathamahata@php4.ru> wrote:
>
> > So.  What are you using which is different from everyone else?  DAC960 I
>  > see.  What about firewall setups, NIC drivers, RAID/MD/etc?  Anything in
>  > there which isn't a mainstream thing?
> 
>  Iptables (ipt_REJECT, ipt_state, ip_conntrack, iptable_filter modules)
>  is used as the firewall.
> 
>  I think the NICs are pretty ordinary:
>  00:04.0 Ethernet controller: Intel Corporation 82557 [Ethernet Pro 100] (rev 08)
>  00:05.0 Ethernet controller: Intel Corporation 82557 [Ethernet Pro 100] (rev 08)
>  handled by Intel e100 driver.
> 
>  Only plain partitions (there is no md, dm, or anything like that):
>  [rathamahata@ope rathamahata]$ mount
>  /dev/rd/host0/target0/part1 on / type reiserfs (rw)
>  none on /proc type proc (rw)
>  none on /dev/pts type devpts (rw,gid=5,mode=620)
>  /dev/rd/host0/target1/part2 on /usr/local type reiserfs (rw)
>  /dev/rd/host0/target3/part1 on /var type reiserfs (rw,noatime,nodiratime)
>  /dev/rd/host0/target7/part1 on /var/www/html/fo type reiserfs (rw,noatime,nodiratime)
>  /dev/rd/host0/target2/part1 on /home type reiserfs (rw,noatime,nodiratime)
>  /dev/rd/host0/target4/part1 on /var/lib/innodb/1 type reiserfs (rw,noatime,nodiratime,notail)
>  /dev/rd/host0/target5/part1 on /var/lib/innodb/2 type reiserfs (rw,noatime,nodiratime,notail)
>  /dev/rd/host0/target6/part1 on /var/lib/oracle/db04 type reiserfs (rw,noatime,nodiratime,notail)
>  sysfs on /sys type sysfs (rw)

OK, thanks.  Is there any possibility that you can run without iptables for
a while, see if that fixes it?


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: 2.6.1 IO lockup on SMP systems
  2004-02-26 20:03                     ` Andrew Morton
@ 2004-02-28 14:56                       ` Sergey S. Kostyliov
  2004-04-08  9:08                         ` 2.6.X kernel memory leak? (was: Re: 2.6.1 IO lockup on SMP systems) Sergey S. Kostyliov
  0 siblings, 1 reply; 24+ messages in thread
From: Sergey S. Kostyliov @ 2004-02-28 14:56 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, gluk, anton, mfedyk

On Thursday 26 February 2004 23:03, Andrew Morton wrote:
<cut>
> OK, thanks.  Is there any possibility that you can run without iptables for
> a while, see if that fixes it?

I recompiled 2.6.3 without iptables support. Unfortunately that doesn't
solve the problem; the machine still hangs.

1) sysrq-M:

SysRq : Show Memory
Mem-info:
DMA per-cpu:
cpu 0 hot: low 2, high 6, batch 1
cpu 0 cold: low 0, high 2, batch 1
cpu 1 hot: low 2, high 6, batch 1
cpu 1 cold: low 0, high 2, batch 1
Normal per-cpu:
cpu 0 hot: low 32, high 96, batch 16
cpu 0 cold: low 0, high 32, batch 16
cpu 1 hot: low 32, high 96, batch 16
cpu 1 cold: low 0, high 32, batch 16
HighMem per-cpu:
cpu 0 hot: low 32, high 96, batch 16
cpu 0 cold: low 0, high 32, batch 16
cpu 1 hot: low 32, high 96, batch 16
cpu 1 cold: low 0, high 32, batch 16

Free pages:        3276kB (512kB HighMem)
Active:820 inactive:195 dirty:0 writeback:0 unstable:0 free:819
DMA free:1348kB min:16kB low:32kB high:48kB active:316kB inactive:0kB
Normal free:1416kB min:936kB low:1872kB high:2808kB active:1388kB inactive:348kB
HighMem free:512kB min:512kB low:1024kB high:1536kB active:1604kB inactive:404kB
DMA: 75*4kB 69*8kB 21*16kB 3*32kB 1*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1348kB
Normal: 98*4kB 20*8kB 0*16kB 1*32kB 1*64kB 0*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 1416kB
HighMem: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 2*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 512kB
Swap cache: add 342862, delete 342774, find 15349/23980, race 14+29
Free swap:       2473044kB
524288 pages of RAM
294912 pages of HIGHMEM
5814 reserved pages
899 pages shared
89 pages swap cached
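
As a sanity check on the dump above, the per-zone free-list lines can be
summed and compared with the "Free pages" total. The following Python
sketch does that; the function name is hypothetical and the three lines
are copied from the sysrq-M output above.

```python
import re


def buddy_total_kb(line):
    """Sum the 'count*sizekB' terms of one per-zone free-list line."""
    terms = re.findall(r"(\d+)\*(\d+)kB", line)
    return sum(int(count) * int(size) for count, size in terms)


LINES = [
    "DMA: 75*4kB 69*8kB 21*16kB 3*32kB 1*64kB 0*128kB 0*256kB "
    "0*512kB 0*1024kB 0*2048kB 0*4096kB = 1348kB",
    "Normal: 98*4kB 20*8kB 0*16kB 1*32kB 1*64kB 0*128kB 1*256kB "
    "1*512kB 0*1024kB 0*2048kB 0*4096kB = 1416kB",
    "HighMem: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 2*256kB "
    "0*512kB 0*1024kB 0*2048kB 0*4096kB = 512kB",
]

totals = {line.split(":", 1)[0]: buddy_total_kb(line) for line in LINES}
for zone, kb in totals.items():
    print(f"{zone}: {kb} kB free")
print("all zones:", sum(totals.values()), "kB")
```

The per-zone sums agree with the totals printed by the kernel, so the
dump is internally consistent: only about 3MB is free across all zones.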

2) /proc/meminfo before a lockup
Sat Feb 28 06:42:33 MSK 2004
MemTotal:      2073896 kB
MemFree:          3452 kB
Buffers:          2240 kB
Cached:          29648 kB
SwapCached:      21084 kB
Active:         627896 kB
Inactive:        17340 kB
HighTotal:     1179648 kB
HighFree:          576 kB
LowTotal:       894248 kB
LowFree:          2876 kB
SwapTotal:     3583968 kB
SwapFree:      3095996 kB
Dirty:               0 kB
Writeback:       14104 kB
Mapped:         625540 kB
Slab:            19044 kB
Committed_AS:  1767368 kB
PageTables:       4316 kB
VmallocTotal:   114680 kB
VmallocUsed:      7448 kB
VmallocChunk:   107232 kB

-- 
                   Best regards,
                   Sergey S. Kostyliov <rathamahata@php4.ru>
                   Public PGP key: http://sysadminday.org.ru/rathamahata.asc

^ permalink raw reply	[flat|nested] 24+ messages in thread

* 2.6.X kernel memory leak? (was: Re: 2.6.1 IO lockup on SMP systems)
  2004-02-28 14:56                       ` Sergey S. Kostyliov
@ 2004-04-08  9:08                         ` Sergey S. Kostyliov
  2004-04-09  7:17                           ` 2.6.X kernel memory leak? Sergey S. Kostyliov
  0 siblings, 1 reply; 24+ messages in thread
From: Sergey S. Kostyliov @ 2004-04-08  9:08 UTC (permalink / raw)
  To: linux-kernel; +Cc: Anton Kovalenko

Hello all,

On Saturday 28 February 2004 17:56, Sergey S. Kostyliov wrote:
> On Thursday 26 February 2004 23:03, Andrew Morton wrote:
> <cut>
> > OK, thanks.  Is there any possibility that you can run without iptables for
> > a while, see if that fixes it?
> 
> I recompiled 2.6.3 without iptables support. Unfortunately that doesn't
> solve the problem; the machine still hangs.

It looks like the problem hasn't gone away in recent kernels. The visible
symptoms haven't changed: the machine is pingable, and TCP ports which were
in the LISTEN state remain in LISTEN after the lockup, but nothing else
responds.

The latest lockup is on a different machine than in my previous reports,
so I suspect this is not a hardware issue. The kernel is 2.6.5-aa3, but
I believe Andrea's changes are not related to this problem.

sysrq-M
	http://sysadminday.org.ru/2.6.X-lockup/terror/20040408/sysrq-M

sysrq-T
	http://sysadminday.org.ru/2.6.X-lockup/terror/20040408/sysrq-T

.config
	http://sysadminday.org.ru/2.6.X-lockup/terror/.config

`lspci -vv'
	http://sysadminday.org.ru/2.6.X-lockup/terror/lspci_-vv

`dmesg'
	http://sysadminday.org.ru/2.6.X-lockup/terror/dmesg

/etc/fstab
	http://sysadminday.org.ru/2.6.X-lockup/terror/fstab


-- 
                   Best regards,
                   Sergey S. Kostyliov <rathamahata@php4.ru>
                   Public PGP key: http://sysadminday.org.ru/rathamahata.asc

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: 2.6.X kernel memory leak?
  2004-04-08  9:08                         ` 2.6.X kernel memory leak? (was: Re: 2.6.1 IO lockup on SMP systems) Sergey S. Kostyliov
@ 2004-04-09  7:17                           ` Sergey S. Kostyliov
  2004-04-09  9:09                             ` Andrew Morton
  0 siblings, 1 reply; 24+ messages in thread
From: Sergey S. Kostyliov @ 2004-04-09  7:17 UTC (permalink / raw)
  To: linux-kernel; +Cc: Anton Kovalenko

On Thursday 08 April 2004 13:08, Sergey S. Kostyliov wrote:
> Hello all,
> 
> On Saturday 28 February 2004 17:56, Sergey S. Kostyliov wrote:
> > On Thursday 26 February 2004 23:03, Andrew Morton wrote:
> > <cut>
> > > OK, thanks.  Is there any possibility that you can run without iptables for
> > > a while, see if that fixes it?
> > 
> > I recompiled 2.6.3 without iptables support. Unfortunately that doesn't
> > solve the problem; the machine still hangs.
> 
> It looks like the problem hasn't gone away in recent kernels. The visible
> symptoms haven't changed: the machine is pingable, and TCP ports which were
> in the LISTEN state remain in LISTEN after the lockup, but nothing else
> responds.
> 
> The latest lockup is on a different machine than in my previous reports,
> so I suspect this is not a hardware issue. The kernel is 2.6.5-aa3, but
> I believe Andrea's changes are not related to this problem.
> 
> sysrq-M
> 	http://sysadminday.org.ru/2.6.X-lockup/terror/20040408/sysrq-M
> 
> sysrq-T
> 	http://sysadminday.org.ru/2.6.X-lockup/terror/20040408/sysrq-T
> 
> .config
> 	http://sysadminday.org.ru/2.6.X-lockup/terror/.config
> 
> `lspci -vv'
> 	http://sysadminday.org.ru/2.6.X-lockup/terror/lspci_-vv
> 
> `dmesg'
> 	http://sysadminday.org.ru/2.6.X-lockup/terror/dmesg
> 
> /etc/fstab
> 	http://sysadminday.org.ru/2.6.X-lockup/terror/fstab
> 
> 

And here is part of the sysrq-T output for the third machine, which has just
locked up; the kernel is 2.6.5-rc3-aa2.

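A dump like the one that follows is easier to read once it is reduced to
"which tasks are in D state, and are they waiting for memory?". The Python
sketch below is one way to do that. The regular expressions are an
assumption about the 2.6-era i386 sysrq-T layout seen in this message, and
the short sample is abridged from the traces below. Note that these call
traces may include stale stack entries, so a function name appearing in a
trace is a hint rather than proof.

```python
import re

# Task header, e.g. "proftpd       D 00000000     0  3328      1 ..."
HEADER = re.compile(r"^(\S+)\s+([A-Z])\s+[0-9A-Fa-f]{8}\s+\d+\s+(\d+)")
# Trace frame, e.g. " [<c020e8ab>] blk_congestion_wait+0x7b/0x90"
FRAME = re.compile(r"\[<[0-9a-f]{8}>\]\s+(\w+)\+0x")


def parse_tasks(dump):
    """Split a sysrq-T dump into tasks with name, state, pid and frames."""
    tasks = []
    for line in dump.splitlines():
        header = HEADER.match(line)
        if header:
            tasks.append({
                "name": header.group(1),
                "state": header.group(2),
                "pid": int(header.group(3)),
                "frames": [],
            })
            continue
        frame = FRAME.search(line)
        if frame and tasks:
            tasks[-1]["frames"].append(frame.group(1))
    return tasks


SAMPLE = """\
multilog      S F7BF3D60     0  3302   3288                     (NOTLB)
Call Trace:
 [<c016ee7c>] pipe_wait+0x7c/0xa0
 [<c0160f18>] sys_read+0x38/0x60

proftpd       D 00000000     0  3328      1          3364  3282 (NOTLB)
Call Trace:
 [<c011bfa8>] io_schedule_timeout+0x28/0x40
 [<c020e8ab>] blk_congestion_wait+0x7b/0x90
 [<c014205b>] __alloc_pages+0x29b/0x330
"""

# Tasks in uninterruptible sleep whose trace passes through the allocator.
stuck = [t for t in parse_tasks(SAMPLE)
         if t["state"] == "D" and "__alloc_pages" in t["frames"]]
for task in stuck:
    print(task["name"], task["pid"])
```
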
multilog      S F7BF3D60     0  3302   3288                     (NOTLB)
f7b83ed8 00000082 00000001 f7bf3d60 f7b83e9c c011a771 f7a4db80 00000000
       00000003 f7bf3d58 f7b82000 00000282 f7aaece0 00000000 0804ea70 f7aaece0
       f7aaed00 c180dbe0 0000111c 19e0b9c6 0001faed f7a89a70 f7b83f00 f7a6bb80
Call Trace:
 [<c011a771>] __wake_up_common+0x31/0x60
 [<c016ee7c>] pipe_wait+0x7c/0xa0
 [<c011d4a0>] autoremove_wake_function+0x0/0x40
 [<c011d4a0>] autoremove_wake_function+0x0/0x40
 [<c016f07a>] pipe_readv+0x1da/0x2c0
 [<c016f180>] pipe_read+0x20/0x30
 [<c0160cc0>] vfs_read+0xb0/0x110
 [<c0160f18>] sys_read+0x38/0x60
 [<c01073c9>] sysenter_past_esp+0x52/0x71

qmail-lspawn  S C030D340     0  3325   3301          3326       (NOTLB)
f74c5ea4 00000082 c0117444 c030d340 00000246 01470f60 f7cd8b80 c030d6d0
       00000000 c030d6c0 c1382d20 00000000 00000000 19c98941 0001faed f7aaece0
       f7aaed00 c1815be0 00004ec0 19ca1051 0001faed f7bb3a10 00000010 f74c5eb4
Call Trace:
 [<c0117444>] do_page_fault+0x304/0x4ef
 [<c0129693>] schedule_timeout+0xc3/0xd0
 [<c0175a90>] __pollwait+0x80/0xd0
 [<c016f5d2>] pipe_poll+0x32/0x90
 [<c0175da2>] do_select+0x1c2/0x330
 [<c0175a10>] __pollwait+0x0/0xd0
 [<c017620e>] sys_select+0x2de/0x4d0
 [<c016030f>] filp_close+0x4f/0x80
 [<c01073c9>] sysenter_past_esp+0x52/0x71

qmail-rspawn  S C030D300     0  3326   3301          3327  3325 (NOTLB)
f74d9ea4 00000082 f74d8000 c030d300 00000246 01468f60 f7a9eb80 c181756c
       f74d9e58 c030d680 c11654c0 00000000 00000000 c0118397 00000000 f7aaece0
       f7aaed00 c180dbe0 0000f336 ad9386e9 000010b2 f747bad0 cbf9ff0c f74d9eb4
Call Trace:
 [<c0118397>] recalc_task_prio+0x97/0x1c0
 [<c0129693>] schedule_timeout+0xc3/0xd0
 [<c0175a90>] __pollwait+0x80/0xd0
 [<c016f5d2>] pipe_poll+0x32/0x90
 [<c0175da2>] do_select+0x1c2/0x330
 [<c0175a10>] __pollwait+0x0/0xd0
 [<c017620e>] sys_select+0x2de/0x4d0
 [<c016030f>] filp_close+0x4f/0x80
 [<c01073c9>] sysenter_past_esp+0x52/0x71

qmail-clean   S 00000012     0  3327   3301                3326 (NOTLB)
f7445ed8 00000082 f7445f00 00000012 c01bfa2f 00000000 f7a9e280 f7445ea8
       c0118397 b1f8808e 3cc9b81f f7a9e940 19ec7e20 0001faed c180dbe0 e8af92d0
       e8af92f0 c1815be0 00008b3e 19ecb28e 0001faed f747b500 00000082 f74dbf00
Call Trace:
 [<c01bfa2f>] do_journal_end+0xcf/0xbe0
 [<c0118397>] recalc_task_prio+0x97/0x1c0
 [<c016ee7c>] pipe_wait+0x7c/0xa0
 [<c011d4a0>] autoremove_wake_function+0x0/0x40
 [<c011d4a0>] autoremove_wake_function+0x0/0x40
 [<c016f07a>] pipe_readv+0x1da/0x2c0
 [<c016f42d>] pipe_writev+0x29d/0x360
 [<c016f180>] pipe_read+0x20/0x30
 [<c0160cc0>] vfs_read+0xb0/0x110
 [<c0160f18>] sys_read+0x38/0x60
 [<c01073c9>] sysenter_past_esp+0x52/0x71

proftpd       D 00000000     0  3328      1          3364  3282 (NOTLB)
f7413d34 00000086 00000000 00000000 00000000 00000000 f7a9edc0 00000000
       00000000 00000000 00000000 00000000 f7412000 00000000 00000246 f73d0da0
       f73d0dc0 c180dbe0 000005ed 980d0112 000222af f747af30 00000000 c030dc20
Call Trace:
 [<c0129642>] schedule_timeout+0x72/0xd0
 [<c01295c0>] process_timeout+0x0/0x10
 [<c011bfa8>] io_schedule_timeout+0x28/0x40
 [<c020e8ab>] blk_congestion_wait+0x7b/0x90
 [<c011d4a0>] autoremove_wake_function+0x0/0x40
 [<c011d4a0>] autoremove_wake_function+0x0/0x40
 [<c014205b>] __alloc_pages+0x29b/0x330
 [<c0144e8d>] do_page_cache_readahead+0x1cd/0x280
 [<c011d4a0>] autoremove_wake_function+0x0/0x40
 [<c013e49f>] filemap_nopage+0x17f/0x460
 [<c015004b>] do_no_page+0xdb/0x680
 [<c013cc31>] unlock_page+0x11/0x60
 [<c014f435>] do_wp_page+0x4c5/0x570
 [<c015081c>] handle_mm_fault+0xec/0x1d0
 [<c0117444>] do_page_fault+0x304/0x4ef
 [<c010f555>] convert_fxsr_from_user+0x15/0xe0
 [<c010f92c>] restore_i387+0x8c/0x90
 [<c01066b4>] restore_sigcontext+0x114/0x130
 [<c01067b2>] sys_sigreturn+0xe2/0x150
 [<c0117140>] do_page_fault+0x0/0x4ef
 [<c0107e85>] error_code+0x2d/0x38

sshd          D 00000000     0  3364      1  3238    3391  3328 (NOTLB)
f73f7d18 00000082 00000000 00000000 00000000 00000000 f7a5d040 00000000
       00000000 00000000 00000000 00000000 f73f6000 00000000 00000246 00000000
       ffffffff c1815be0 00000149 de751563 000222af f740cf50 00000000 c030dc20
Call Trace:
 [<c0129642>] schedule_timeout+0x72/0xd0
 [<c01295c0>] process_timeout+0x0/0x10
 [<c011bfa8>] io_schedule_timeout+0x28/0x40
 [<c020e8ab>] blk_congestion_wait+0x7b/0x90
 [<c011d4a0>] autoremove_wake_function+0x0/0x40
 [<c011d4a0>] autoremove_wake_function+0x0/0x40
 [<c014205b>] __alloc_pages+0x29b/0x330
 [<c015bc11>] read_swap_cache_async+0x101/0x10d
 [<c014f79f>] swapin_readahead+0x2f/0xd0
 [<c014fb57>] do_swap_page+0x317/0x430
 [<c014d835>] pte_alloc_map+0xc5/0x130
 [<c01507f8>] handle_mm_fault+0xc8/0x1d0
 [<c0117444>] do_page_fault+0x304/0x4ef
 [<c0129693>] schedule_timeout+0xc3/0xd0
 [<c0175a90>] __pollwait+0x80/0xd0
 [<c028bd2d>] tcp_poll+0x1d/0x170
 [<c0175a04>] do_select+0x1e7/0x330
 [<c0117140>] do_page_fault+0x0/0x4ef
 [<c0107e85>] error_code+0x2d/0x38
 [<c017007b>] get_write_access+0x4b/0xe0
 [<c01cf8e0>] __copy_to_user_ll+0x40/0x60
 [<c01762e7>] sys_select+0x3b7/0x4d0
 [<c010f92c>] restore_i387+0x8c/0x90
 [<c01066b4>] restore_sigcontext+0x114/0x130
 [<c01073c9>] sysenter_past_esp+0x52/0x71

cron          D 00000000     0  3391      1  6677    3401  3364 (NOTLB)
f73cbd34 00000082 00000000 00000000 00000000 00000000 f7bd5280 00000000
       00000000 00000000 00000000 00000000 f73ca000 00000000 00000246 00000000
       ffffffff c180dbe0 00000180 980ff35e 000222af f73d0f70 00000000 c030dc20
Call Trace:
 [<c0129642>] schedule_timeout+0x72/0xd0
 [<c01295c0>] process_timeout+0x0/0x10
 [<c011bfa8>] io_schedule_timeout+0x28/0x40
 [<c020e8ab>] blk_congestion_wait+0x7b/0x90
 [<c011d4a0>] autoremove_wake_function+0x0/0x40
 [<c011d4a0>] autoremove_wake_function+0x0/0x40
 [<c014205b>] __alloc_pages+0x29b/0x330
 [<c011a2c9>] schedule+0x389/0x7a0
 [<c0144e8d>] do_page_cache_readahead+0x1cd/0x280
 [<c013e49f>] filemap_nopage+0x17f/0x460
 [<c015004b>] do_no_page+0xdb/0x680
 [<c015081c>] handle_mm_fault+0xec/0x1d0
 [<c0117444>] do_page_fault+0x304/0x4ef
 [<c016c10b>] sys_stat64+0x2b/0x30
 [<c0117140>] do_page_fault+0x0/0x4ef
 [<c0107e85>] error_code+0x2d/0x38

agetty        D 00000000     0  3401      1          3402  3391 (NOTLB)
f7badc54 00000086 00000000 00000000 00000000 00000000 f7b07dc0 00000000
       00000000 00000000 00000000 00000000 f7bac000 c180e540 c01287ac 00000000
       ffffffff c180dbe0 00018704 9823a1c3 000222af f7a88ed0 00000000 c030dc20
Call Trace:
 [<c01287ac>] __mod_timer+0x23c/0x370
 [<c0129642>] schedule_timeout+0x72/0xd0
 [<c01295c0>] process_timeout+0x0/0x10
 [<c011bfa8>] io_schedule_timeout+0x28/0x40
 [<c020e8ab>] blk_congestion_wait+0x7b/0x90
 [<c011d4a0>] autoremove_wake_function+0x0/0x40
 [<c011d4a0>] autoremove_wake_function+0x0/0x40
 [<c014205b>] __alloc_pages+0x29b/0x330
 [<c015bc11>] read_swap_cache_async+0x101/0x10d
 [<c014f79f>] swapin_readahead+0x2f/0xd0
 [<c014fb57>] do_swap_page+0x317/0x430
 [<c014d835>] pte_alloc_map+0xc5/0x130
 [<c01507f8>] handle_mm_fault+0xc8/0x1d0
 [<c0117444>] do_page_fault+0x304/0x4ef
 [<c011a2c9>] schedule+0x389/0x7a0
 [<c01bee6f>] journal_end+0xf/0x20
 [<c01aeac7>] reiserfs_dirty_inode+0x77/0x110
 [<c0117140>] do_page_fault+0x0/0x4ef
 [<c0107e85>] error_code+0x2d/0x38
 [<c025007b>] sg_res_in_use+0x6b/0x80
 [<c01cf8e0>] __copy_to_user_ll+0x40/0x60
 [<c01fa91d>] read_chan+0x5dd/0xb00
 [<c011a730>] default_wake_function+0x0/0x10
 [<c011a730>] default_wake_function+0x0/0x10
 [<c014e246>] unmap_vmas+0xf6/0x310
 [<c011a730>] default_wake_function+0x0/0x10
 [<c01f46dd>] tty_write+0x1ad/0x360
 [<c01f44f6>] tty_read+0x176/0x1b0
 [<c0160cc0>] vfs_read+0xb0/0x110
 [<c0160f18>] sys_read+0x38/0x60
 [<c01073c9>] sysenter_past_esp+0x52/0x71

agetty        S F7D2E800     0  3402      1          3403  3401 (NOTLB)
f7a87e58 00000082 00000000 f7d2e800 f7a87e20 f78200d8 f7b07b80 f7820114
       c01bee6f 00000000 c01aeac7 000000ff 00000000 c02d88f7 00000000 00000001
       0064d901 c180dbe0 000850e1 ef9caa45 00000013 f7a414e0 00000286 f7d2e800
Call Trace:
 [<c01bee6f>] journal_end+0xf/0x20
 [<c01aeac7>] reiserfs_dirty_inode+0x77/0x110
 [<c0129693>] schedule_timeout+0xc3/0xd0
 [<c0205ba3>] do_con_write+0x2b3/0x740
 [<c01facaa>] read_chan+0x96a/0xb00
 [<c011a730>] default_wake_function+0x0/0x10
 [<c011a730>] default_wake_function+0x0/0x10
 [<c014e246>] unmap_vmas+0xf6/0x310
 [<c011a730>] default_wake_function+0x0/0x10
 [<c01f46dd>] tty_write+0x1ad/0x360
 [<c01f44f6>] tty_read+0x176/0x1b0
 [<c0160cc0>] vfs_read+0xb0/0x110
 [<c0160f18>] sys_read+0x38/0x60
 [<c01073c9>] sysenter_past_esp+0x52/0x71

agetty        S 00003500     0  3403      1          3404  3402 (NOTLB)
f73e5e58 00000082 00000000 00003500 175c6fc1 f7de0844 f7b07940 e05da8c0
       00000011 00000000 f7de9220 c192f000 c0283e93 c192f000 00000000 f7de9220
       f7de0830 c180dbe0 000b78d0 ef8aa30a 00000013 f740c980 00000286 f7d2e800
Call Trace:
 [<c0283e93>] ip_local_deliver+0xd3/0x1f0
 [<c0129693>] schedule_timeout+0xc3/0xd0
 [<c0205ba3>] do_con_write+0x2b3/0x740
 [<c01facaa>] read_chan+0x96a/0xb00
 [<c011a730>] default_wake_function+0x0/0x10
 [<c011a730>] default_wake_function+0x0/0x10
 [<c014e246>] unmap_vmas+0xf6/0x310
 [<c011a730>] default_wake_function+0x0/0x10
 [<c01f46dd>] tty_write+0x1ad/0x360
 [<c01f44f6>] tty_read+0x176/0x1b0
 [<c0160cc0>] vfs_read+0xb0/0x110
 [<c0160f18>] sys_read+0x38/0x60
 [<c01073c9>] sysenter_past_esp+0x52/0x71

agetty        S 00000000     0  3404      1          3405  3403 (NOTLB)
f74c3e58 00000082 000001ff 00000000 00000003 00000000 f7a4d280 00000000
       00020000 00000000 f74c3e6c 000000ff 00000000 00000000 00000000 00000003
       00000286 c1815be0 0007e435 ef918c51 00000013 f7bb3440 00000286 00000000
Call Trace:
 [<c0129693>] schedule_timeout+0xc3/0xd0
 [<c0205ba3>] do_con_write+0x2b3/0x740
 [<c01facaa>] read_chan+0x96a/0xb00
 [<c011a730>] default_wake_function+0x0/0x10
 [<c011a730>] default_wake_function+0x0/0x10
 [<c014e246>] unmap_vmas+0xf6/0x310
 [<c011a730>] default_wake_function+0x0/0x10
 [<c01f46dd>] tty_write+0x1ad/0x360
 [<c01f44f6>] tty_read+0x176/0x1b0
 [<c0160cc0>] vfs_read+0xb0/0x110
 [<c0160f18>] sys_read+0x38/0x60
 [<c01073c9>] sysenter_past_esp+0x52/0x71

agetty        S 00000000     0  3405      1          3406  3404 (NOTLB)
f73cfe58 00000086 000001ff 00000000 00000004 00000000 f7a6a4c0 00000000
       00020000 00000000 f73cfe6c 000000ff 00000000 00000000 00000000 00000004
       00000286 c180dbe0 00084d71 efb34714 00000013 f73d1b10 00000286 00000000
Call Trace:
 [<c0129693>] schedule_timeout+0xc3/0xd0
 [<c0205ba3>] do_con_write+0x2b3/0x740
 [<c01facaa>] read_chan+0x96a/0xb00
 [<c011a730>] default_wake_function+0x0/0x10
 [<c011a730>] default_wake_function+0x0/0x10
 [<c014e246>] unmap_vmas+0xf6/0x310
 [<c011a730>] default_wake_function+0x0/0x10
 [<c01f46dd>] tty_write+0x1ad/0x360
 [<c01f44f6>] tty_read+0x176/0x1b0
 [<c0160cc0>] vfs_read+0xb0/0x110
 [<c0160f18>] sys_read+0x38/0x60
 [<c01073c9>] sysenter_past_esp+0x52/0x71

agetty        S F7D2E800     0  3406      1          4611  3405 (NOTLB)
f73e3e58 00000082 00000000 f7d2e800 f73e3e20 f78200d8 f7cd8040 f7820114
       c01bee6f 00000000 c01aeac7 000000ff 00000000 c02d88f7 00000000 00000001
       0064d901 c1815be0 0007cc5c efa78898 00000013 f740c3b0 00000286 f7d2e800
Call Trace:
 [<c01bee6f>] journal_end+0xf/0x20
 [<c01aeac7>] reiserfs_dirty_inode+0x77/0x110
 [<c0129693>] schedule_timeout+0xc3/0xd0
 [<c0205ba3>] do_con_write+0x2b3/0x740
 [<c01facaa>] read_chan+0x96a/0xb00
 [<c011a730>] default_wake_function+0x0/0x10
 [<c011a730>] default_wake_function+0x0/0x10
 [<c014e246>] unmap_vmas+0xf6/0x310
 [<c011a730>] default_wake_function+0x0/0x10
 [<c01f46dd>] tty_write+0x1ad/0x360
 [<c01f44f6>] tty_read+0x176/0x1b0
 [<c0160cc0>] vfs_read+0xb0/0x110
 [<c0160f18>] sys_read+0x38/0x60
 [<c01073c9>] sysenter_past_esp+0x52/0x71

ntpd          D 00000000     0  4611      1                3406 (NOTLB)
ce397bd0 00000082 00000000 00000000 00000000 00000000 f7bd5b80 00000000
       00000000 00000000 00000000 00000000 ce396000 00000000 00000246 f7bb2100
       f7bb2120 c1815be0 0000018d f19a9e2b 000222af cc3de3b0 00000000 c030dc20
Call Trace:
 [<c0129642>] schedule_timeout+0x72/0xd0
 [<c01295c0>] process_timeout+0x0/0x10
 [<c011bfa8>] io_schedule_timeout+0x28/0x40
 [<c020e8ab>] blk_congestion_wait+0x7b/0x90
 [<c011d4a0>] autoremove_wake_function+0x0/0x40
 [<c011d4a0>] autoremove_wake_function+0x0/0x40
 [<c014205b>] __alloc_pages+0x29b/0x330
 [<c013cf9d>] find_lock_page+0x4d/0x270
 [<c013f42c>] generic_file_aio_write_nolock+0x33c/0xba0
 [<c0148054>] mark_page_accessed+0x34/0x40
 [<c0164052>] __find_get_block+0x62/0xc0
 [<c0164052>] __find_get_block+0x62/0xc0
 [<c01b6c92>] search_by_key+0x642/0xe10
 [<c013fced>] generic_file_write_nolock+0x5d/0x80
 [<c019fa27>] reiserfs_find_entry+0x97/0x150
 [<c013fddf>] generic_file_write+0x3f/0x60
 [<c01aa31f>] reiserfs_file_write+0x7ff/0x810
 [<c019fbfc>] reiserfs_lookup+0x11c/0x1f0
 [<c0130dd9>] in_group_p+0x39/0x70
 [<c016ff29>] vfs_permission+0x79/0x140
 [<c017a24c>] dput+0x1c/0x3a0
 [<c01701fa>] path_release+0xa/0x30
 [<c0171be7>] open_namei+0xb7/0x3e0
 [<c015fc6d>] filp_open+0x2d/0x60
 [<c0160e80>] vfs_write+0xb0/0x110
 [<c0160f78>] sys_write+0x38/0x60
 [<c01073c9>] sysenter_past_esp+0x52/0x71

httpd         D 00000000     0 12739   3100         12740       (NOTLB)
ee10dc70 00000086 00000000 00000000 00000000 00000000 f7a9c4c0 00000000
       00000000 00000000 00000000 00000000 ee10c000 9821ec28 000222af eff200a0
       eff200c0 c180dbe0 00000644 9821eff8 000222af f714e3f0 00000000 c030dc20
Call Trace:
 [<c0129642>] schedule_timeout+0x72/0xd0
 [<c01295c0>] process_timeout+0x0/0x10
 [<c011bfa8>] io_schedule_timeout+0x28/0x40
 [<c020e8ab>] blk_congestion_wait+0x7b/0x90
 [<c011d4a0>] autoremove_wake_function+0x0/0x40
 [<c011d4a0>] autoremove_wake_function+0x0/0x40
 [<c014205b>] __alloc_pages+0x29b/0x330
 [<c015bc11>] read_swap_cache_async+0x101/0x10d
 [<c014f79f>] swapin_readahead+0x2f/0xd0
 [<c014fb57>] do_swap_page+0x317/0x430
 [<c014d835>] pte_alloc_map+0xc5/0x130
 [<c01507f8>] handle_mm_fault+0xc8/0x1d0
 [<c0117444>] do_page_fault+0x304/0x4ef
 [<c0129693>] schedule_timeout+0xc3/0xd0
 [<c02688fa>] lock_sock+0x6a/0xc0
 [<c0268e09>] __kfree_skb+0x79/0x100
 [<c02905fc>] wait_for_connect+0xec/0x110
 [<c011d4a0>] autoremove_wake_function+0x0/0x40
 [<c0117140>] do_page_fault+0x0/0x4ef
 [<c0107e85>] error_code+0x2d/0x38
 [<c01cf529>] __get_user_4+0x11/0x17
 [<c0264955>] move_addr_to_user+0x25/0x90
 [<c017dab0>] new_inode+0x10/0xc0
 [<c0265f7c>] sys_accept+0xec/0x160
 [<c028fd7b>] tcp_close+0x36b/0x720
 [<c0266b05>] sys_socketcall+0xf5/0x2a0
 [<c01073c9>] sysenter_past_esp+0x52/0x71

httpd         D 00000000     0 12740   3100         12741 12739 (NOTLB)
dc433c70 00000086 00000000 00000000 00000000 00000000 e45efdc0 00000000
       00000000 00000000 00000000 00000000 dc432000 00000000 00000246 f714e220
       f714e240 c180dbe0 0000022d 981ea476 000222af eff219b0 00000000 c030dc20
Call Trace:
 [<c0129642>] schedule_timeout+0x72/0xd0
 [<c01295c0>] process_timeout+0x0/0x10
 [<c011bfa8>] io_schedule_timeout+0x28/0x40
 [<c020e8ab>] blk_congestion_wait+0x7b/0x90
 [<c011d4a0>] autoremove_wake_function+0x0/0x40
 [<c011d4a0>] autoremove_wake_function+0x0/0x40
 [<c014205b>] __alloc_pages+0x29b/0x330
 [<c015bc11>] read_swap_cache_async+0x101/0x10d
 [<c014f79f>] swapin_readahead+0x2f/0xd0
 [<c014fb57>] do_swap_page+0x317/0x430
 [<c014d835>] pte_alloc_map+0xc5/0x130
 [<c01507f8>] handle_mm_fault+0xc8/0x1d0
 [<c0117444>] do_page_fault+0x304/0x4ef
 [<c0129693>] schedule_timeout+0xc3/0xd0
 [<c02688fa>] lock_sock+0x6a/0xc0
 [<c0268e09>] __kfree_skb+0x79/0x100
 [<c02905fc>] wait_for_connect+0xec/0x110
 [<c011d4a0>] autoremove_wake_function+0x0/0x40
 [<c0117140>] do_page_fault+0x0/0x4ef
 [<c0107e85>] error_code+0x2d/0x38
 [<c01cf529>] __get_user_4+0x11/0x17
 [<c0264955>] move_addr_to_user+0x25/0x90
 [<c017dab0>] new_inode+0x10/0xc0
 [<c0265f7c>] sys_accept+0xec/0x160
 [<c028fd7b>] tcp_close+0x36b/0x720
 [<c0266b05>] sys_socketcall+0xf5/0x2a0
 [<c01073c9>] sysenter_past_esp+0x52/0x71

httpd         D 00000000     0 12741   3100         12742 12740 (NOTLB)
f580dc70 00000086 00000000 00000000 00000000 00000000 f7a6a700 00000000
       00000000 00000000 00000000 00000000 f580c000 00000000 00000246 00000000
       ffffffff c1815be0 00000182 fdf459c5 000222af f7a63a90 00000000 c030dc20
Call Trace:
 [<c0129642>] schedule_timeout+0x72/0xd0
 [<c01295c0>] process_timeout+0x0/0x10
 [<c011bfa8>] io_schedule_timeout+0x28/0x40
 [<c020e8ab>] blk_congestion_wait+0x7b/0x90
 [<c011d4a0>] autoremove_wake_function+0x0/0x40
 [<c011d4a0>] autoremove_wake_function+0x0/0x40
 [<c014205b>] __alloc_pages+0x29b/0x330
 [<c015bc11>] read_swap_cache_async+0x101/0x10d
 [<c014f79f>] swapin_readahead+0x2f/0xd0
 [<c014fb57>] do_swap_page+0x317/0x430
 [<c014d835>] pte_alloc_map+0xc5/0x130
 [<c01507f8>] handle_mm_fault+0xc8/0x1d0
 [<c0117444>] do_page_fault+0x304/0x4ef
 [<c0129693>] schedule_timeout+0xc3/0xd0
 [<c02688fa>] lock_sock+0x6a/0xc0
 [<c0268e09>] __kfree_skb+0x79/0x100
 [<c02905fc>] wait_for_connect+0xec/0x110
 [<c011d4a0>] autoremove_wake_function+0x0/0x40
 [<c0117140>] do_page_fault+0x0/0x4ef
 [<c0107e85>] error_code+0x2d/0x38
 [<c01cf529>] __get_user_4+0x11/0x17
 [<c0264955>] move_addr_to_user+0x25/0x90
 [<c017dab0>] new_inode+0x10/0xc0
 [<c0265f7c>] sys_accept+0xec/0x160
 [<c028fd7b>] tcp_close+0x36b/0x720
 [<c0266b05>] sys_socketcall+0xf5/0x2a0
 [<c01073c9>] sysenter_past_esp+0x52/0x71

httpd         R running     0 12742   3100         12743 12741 (NOTLB)
httpd         D 00000000     0 12743   3100         13713 12742 (NOTLB)
f7501c70 00000086 00000000 00000000 00000000 00000000 f7a9e700 00000000
       00000000 00000000 00000000 00000000 f7500000 00000000 00000246 f73d07d0
       f73d07f0 c1815be0 0000015c 0315f596 000222b0 f7c9d9f0 00000000 c030dc20
Call Trace:
 [<c0129642>] schedule_timeout+0x72/0xd0
 [<c01295c0>] process_timeout+0x0/0x10
 [<c011bfa8>] io_schedule_timeout+0x28/0x40
 [<c020e8ab>] blk_congestion_wait+0x7b/0x90
 [<c011d4a0>] autoremove_wake_function+0x0/0x40
 [<c011d4a0>] autoremove_wake_function+0x0/0x40
 [<c014205b>] __alloc_pages+0x29b/0x330
 [<c015bc11>] read_swap_cache_async+0x101/0x10d
 [<c014f79f>] swapin_readahead+0x2f/0xd0
 [<c014fb57>] do_swap_page+0x317/0x430
 [<c014d835>] pte_alloc_map+0xc5/0x130
 [<c01507f8>] handle_mm_fault+0xc8/0x1d0
 [<c0117444>] do_page_fault+0x304/0x4ef
 [<c0129693>] schedule_timeout+0xc3/0xd0
 [<c02688fa>] lock_sock+0x6a/0xc0
 [<c0268e09>] __kfree_skb+0x79/0x100
 [<c02905fc>] wait_for_connect+0xec/0x110
 [<c011d4a0>] autoremove_wake_function+0x0/0x40
 [<c0117140>] do_page_fault+0x0/0x4ef
 [<c0107e85>] error_code+0x2d/0x38
 [<c01cf529>] __get_user_4+0x11/0x17
 [<c0264955>] move_addr_to_user+0x25/0x90
 [<c017dab0>] new_inode+0x10/0xc0
 [<c0265f7c>] sys_accept+0xec/0x160
 [<c028fd7b>] tcp_close+0x36b/0x720
 [<c0266b05>] sys_socketcall+0xf5/0x2a0
 [<c01073c9>] sysenter_past_esp+0x52/0x71

httpd         D 00000000     0 13713   3100         19047 12743 (NOTLB)
df9a5c70 00000082 00000000 00000000 00000000 00000000 e45ef280 00000000
       00000000 00000000 00000000 00000000 df9a4000 00000000 00000246 00000000
       ffffffff c180dbe0 000001f6 976392ca 000222af f714e9c0 00000000 c030dc20
Call Trace:
 [<c0129642>] schedule_timeout+0x72/0xd0
 [<c01295c0>] process_timeout+0x0/0x10
 [<c011bfa8>] io_schedule_timeout+0x28/0x40
 [<c020e8ab>] blk_congestion_wait+0x7b/0x90
 [<c011d4a0>] autoremove_wake_function+0x0/0x40
 [<c011d4a0>] autoremove_wake_function+0x0/0x40
 [<c014205b>] __alloc_pages+0x29b/0x330
 [<c015bc11>] read_swap_cache_async+0x101/0x10d
 [<c014f79f>] swapin_readahead+0x2f/0xd0
 [<c014fb57>] do_swap_page+0x317/0x430
 [<c014d835>] pte_alloc_map+0xc5/0x130
 [<c01507f8>] handle_mm_fault+0xc8/0x1d0
 [<c0117444>] do_page_fault+0x304/0x4ef
 [<c0129693>] schedule_timeout+0xc3/0xd0
 [<c02688fa>] lock_sock+0x6a/0xc0
 [<c0268e09>] __kfree_skb+0x79/0x100
 [<c02905fc>] wait_for_connect+0xec/0x110
 [<c011d4a0>] autoremove_wake_function+0x0/0x40
 [<c0117140>] do_page_fault+0x0/0x4ef
 [<c0107e85>] error_code+0x2d/0x38
 [<c01cf529>] __get_user_4+0x11/0x17
 [<c0264955>] move_addr_to_user+0x25/0x90
 [<c017dab0>] new_inode+0x10/0xc0
 [<c0265f7c>] sys_accept+0xec/0x160
 [<c014e121>] unmap_page_range+0x31/0x60
 [<c014e246>] unmap_vmas+0xf6/0x310
 [<c01488ff>] __pagevec_lru_add_active+0x13f/0x1b0
 [<c017a24c>] dput+0x1c/0x3a0
 [<c0161d39>] __fput+0xb9/0x120
 [<c0266b05>] sys_socketcall+0xf5/0x2a0
 [<c0152ce4>] do_munmap+0x154/0x1b0
 [<c01073c9>] sysenter_past_esp+0x52/0x71

httpd         D 00000000     0 19047   3100               13713 (NOTLB)
c5e5dc70 00000082 00000000 00000000 00000000 00000000 f7a9e040 00000000
       00000000 00000000 00000000 00000000 c5e5c000 00000000 00000246 f740cd80
       f740cda0 c1815be0 00000160 0cfd8041 000222b0 eff20840 00000000 c030dc20
Call Trace:
 [<c0129642>] schedule_timeout+0x72/0xd0
 [<c01295c0>] process_timeout+0x0/0x10
 [<c011bfa8>] io_schedule_timeout+0x28/0x40
 [<c020e8ab>] blk_congestion_wait+0x7b/0x90
 [<c011d4a0>] autoremove_wake_function+0x0/0x40
 [<c011d4a0>] autoremove_wake_function+0x0/0x40
 [<c014205b>] __alloc_pages+0x29b/0x330
 [<c015bc11>] read_swap_cache_async+0x101/0x10d
 [<c014f79f>] swapin_readahead+0x2f/0xd0
 [<c014fb57>] do_swap_page+0x317/0x430
 [<c014d835>] pte_alloc_map+0xc5/0x130
 [<c01507f8>] handle_mm_fault+0xc8/0x1d0
 [<c0117444>] do_page_fault+0x304/0x4ef
 [<c0129693>] schedule_timeout+0xc3/0xd0
 [<c02688fa>] lock_sock+0x6a/0xc0
 [<c0268e09>] __kfree_skb+0x79/0x100
 [<c02905fc>] wait_for_connect+0xec/0x110
 [<c011d4a0>] autoremove_wake_function+0x0/0x40
 [<c0117140>] do_page_fault+0x0/0x4ef
 [<c0107e85>] error_code+0x2d/0x38
 [<c01cf529>] __get_user_4+0x11/0x17
 [<c0264955>] move_addr_to_user+0x25/0x90
 [<c017dab0>] new_inode+0x10/0xc0
 [<c0265f7c>] sys_accept+0xec/0x160
 [<c028fd7b>] tcp_close+0x36b/0x720
 [<c0266b05>] sys_socketcall+0xf5/0x2a0
 [<c01073c9>] sysenter_past_esp+0x52/0x71

sshd          S C3331DA4     0  3238   3364  3247    3756       (NOTLB)
c3331d7c 00000086 c3331e50 c3331da4 00000000 c3331e50 c1fa6dc0 00000000
       000475c4 00000000 fac86840 00000000 00000000 c1064f80 00000001 c3331e50
       c01b637e c1815be0 0004594a 64ab66fa 0001d39c f7aafa50 c3331da8 e06aaa38
Call Trace:
 [<c01b637e>] pathrelse+0x1e/0x30
 [<c0129693>] schedule_timeout+0xc3/0xd0
 [<c013fced>] generic_file_write_nolock+0x5d/0x80
 [<c02c090a>] unix_stream_data_wait+0xfa/0x180
 [<c011d4a0>] autoremove_wake_function+0x0/0x40
 [<c011d4a0>] autoremove_wake_function+0x0/0x40
 [<c02c1003>] unix_stream_recvmsg+0x673/0x710
 [<c01aa31f>] reiserfs_file_write+0x7ff/0x810
 [<c0265030>] sock_aio_read+0xb0/0xd0
 [<c0160bcd>] do_sync_read+0x6d/0xb0
 [<c01f556e>] release_dev+0x33e/0x7e0
 [<c015fdec>] dentry_open+0x14c/0x220
 [<c015fc8f>] filp_open+0x4f/0x60
 [<c0160cf7>] vfs_read+0xe7/0x110
 [<c0161d39>] __fput+0xb9/0x120
 [<c0160f18>] sys_read+0x38/0x60
 [<c01073c9>] sysenter_past_esp+0x52/0x71

sshd          D 00000000     0  3247   3238  3248               (NOTLB)
c309dd18 00000086 00000000 00000000 00000000 00000000 c1fa64c0 00000000
       00000000 00000000 00000000 00000000 c309c000 00000000 00000246 00000000
       ffffffff c1815be0 c030dc20
Call Trace:
 [<c0129642>] schedule_timeout+0x72/0xd0
 [<c01295c0>] process_timeout+0x0/0x10
 [<c011bfa8>] io_schedule_timeout+0x28/0x40
 [<c020e8ab>] blk_congestion_wait+0x7b/0x90
 [<c011d4a0>] autoremove_wake_function+0x0/0x40
 [<c011d4a0>] autoremove_wake_function+0x0/0x40
 [<c014205b>] __alloc_pages+0x29b/0x330
 [<c015bc11>] read_swap_cache_async+0x101/0x10d
 [<c014f79f>] swapin_readahead+0x2f/0xd0
 [<c014fb57>] do_swap_page+0x317/0x430
 [<c014d835>] pte_alloc_map+0xc5/0x130
 [<c01507f8>] handle_mm_fault+0xc8/0x1d0
 [<c0117444>] do_page_fault+0x304/0x4ef
 [<c01fc19a>] pty_chars_in_buffer+0x1a/0x40
 [<c01fc175>] pty_write_room+0x25/0x30
 [<c0175a04>] poll_freewait+0x44/0x50
 [<c0175dc7>] do_select+0x1e7/0x330
 [<c0117140>] do_page_fault+0x0/0x4ef
 [<c0107e85>] error_code+0x2d/0x38
 [<c017007b>] get_write_access+0x4b/0xe0
 [<c01cf8e0>] __copy_to_user_ll+0x40/0x60
 [<c01762e7>] sys_select+0x3b7/0x4d0
 [<c0160f78>] sys_write+0x38/0x60
 [<c01073c9>] sysenter_past_esp+0x52/0x71

bash          S C030D340     0  3248   3247                     (NOTLB)
c757be58 00000086 00000010 c030d340 00000246 01470f60 f7a5d4c0 00000000
       00000000 00000010 c1817708 00000000
       f406c260 c180dbe0 0001e5df 4853c415 0001e873 c19633c0 c0129026 00000001
Call Trace:
 [<c0129026>] update_wall_time+0x16/0x40
 [<c0129693>] schedule_timeout+0xc3/0xd0
 [<c01f81b4>] opost_block+0xf4/0x1b0
 [<c01facaa>] read_chan+0x96a/0xb00
 [<c014f435>] do_wp_page+0x4c5/0x570
 [<c011a730>] default_wake_function+0x0/0x10
 [<c011a730>] default_wake_function+0x0/0x10
 [<c011a730>] default_wake_function+0x0/0x10
 [<c01f46dd>] tty_write+0x1ad/0x360
 [<c01f44f6>] tty_read+0x176/0x1b0
 [<c0160cc0>] vfs_read+0xb0/0x110
 [<c0160f18>] sys_read+0x38/0x60
 [<c01073c9>] sysenter_past_esp+0x52/0x71

sshd          S EDEF5DA4     0  3756   3364  3759    3914  3238 (NOTLB)
edef5d7c 00000082 edef5e50 edef5da4 c010dfe0 c03c6cb0 f7a9cb80 e089c860
       00000620 c192f240 c0268b62 d7c83812 d7c83812 d7c83812 00000000 00000246
       f7fa3190 c1815be0 000024d4 56c3dc45 0001da75 f70aa9e0 f70ab980 f70aa810
Call Trace:
 [<c010dfe0>] do_gettimeofday+0x20/0xc0
 [<c0268b62>] alloc_skb+0x32/0xd0
 [<c0129693>] schedule_timeout+0xc3/0xd0
 [<c02c090a>] unix_stream_data_wait+0xfa/0x180
 [<c011d4a0>] autoremove_wake_function+0x0/0x40
 [<c0107365>] need_resched+0x27/0x32
 [<c011d4a0>] autoremove_wake_function+0x0/0x40
 [<c02c1003>] unix_stream_recvmsg+0x673/0x710
 [<c013dc64>] file_read_actor+0xc4/0xd0
 [<c01aa31f>] reiserfs_file_write+0x7ff/0x810
 [<c0265030>] sock_aio_read+0xb0/0xd0
 [<c0160bcd>] do_sync_read+0x6d/0xb0
 [<c01f556e>] release_dev+0x33e/0x7e0
 [<c015fdec>] dentry_open+0x14c/0x220
 [<c015fc8f>] filp_open+0x4f/0x60
 [<c0160cf7>] vfs_read+0xe7/0x110
 [<c0161d39>] __fput+0xb9/0x120
 [<c0160f18>] sys_read+0x38/0x60
 [<c01073c9>] sysenter_past_esp+0x52/0x71

sshd          D 00000000     0  3759   3756  3760               (NOTLB)
d49bfd18 00000086 00000000 00000000 00000000 00000000 c1fa6700 00000000
       00000000 00000000 00000000 00000000 d49be000 00000000 00000246 00000000
       ffffffff c1815be0 00000bad 1ca15353 000222b0 f7c9ce50 00000000 c030dc20
Call Trace:
 [<c0129642>] schedule_timeout+0x72/0xd0
 [<c01295c0>] process_timeout+0x0/0x10
 [<c011bfa8>] io_schedule_timeout+0x28/0x40
 [<c020e8ab>] blk_congestion_wait+0x7b/0x90
 [<c011d4a0>] autoremove_wake_function+0x0/0x40
 [<c011d4a0>] autoremove_wake_function+0x0/0x40
 [<c014205b>] __alloc_pages+0x29b/0x330
 [<c015bc11>] read_swap_cache_async+0x101/0x10d
 [<c014f79f>] swapin_readahead+0x2f/0xd0
 [<c014fb57>] do_swap_page+0x317/0x430
 [<c014d835>] pte_alloc_map+0xc5/0x130
 [<c01507f8>] handle_mm_fault+0xc8/0x1d0
 [<c0117444>] do_page_fault+0x304/0x4ef
 [<c01fc19a>] pty_chars_in_buffer+0x1a/0x40
 [<c01fc175>] pty_write_room+0x25/0x30
 [<c0175a04>] poll_freewait+0x44/0x50
 [<c0175dc7>] do_select+0x1e7/0x330
 [<c0117140>] do_page_fault+0x0/0x4ef
 [<c0107e85>] error_code+0x2d/0x38
 [<c017007b>] get_write_access+0x4b/0xe0
 [<c01cf8e0>] __copy_to_user_ll+0x40/0x60
 [<c01762e7>] sys_select+0x3b7/0x4d0
 [<c0160f78>] sys_write+0x38/0x60
 [<c01073c9>] sysenter_past_esp+0x52/0x71

bash          S 00000246     0  3760   3759                     (NOTLB)
e7045e58 00000086 c030d300 00000246 01468f60 c0141c4f f7cd84c0 db8215b4
       c030d680 c12446c0 00000000 00000000 00000082 c1914c00 d20a9000 e7045e94
       e7045e6c c180dbe0 0008cdf5 57ea2b52 0001db26 f714fb30 c013ce05 c011a771
Call Trace:
 [<c0141c4f>] buffered_rmqueue+0x10f/0x280
 [<c013ce05>] find_get_page+0x35/0xc0
 [<c011a771>] __wake_up_common+0x31/0x60
 [<c0129693>] schedule_timeout+0xc3/0xd0
 [<c01f81b4>] opost_block+0xf4/0x1b0
 [<c01facaa>] read_chan+0x96a/0xb00
 [<c011a730>] default_wake_function+0x0/0x10
 [<c011a730>] default_wake_function+0x0/0x10
 [<c011a730>] default_wake_function+0x0/0x10
 [<c01f46dd>] tty_write+0x1ad/0x360
 [<c01f44f6>] tty_read+0x176/0x1b0
 [<c0160cc0>] vfs_read+0xb0/0x110
 [<c0160f18>] sys_read+0x38/0x60
 [<c01073c9>] sysenter_past_esp+0x52/0x71

sshd          S D806FDA4     0  3914   3364  3917          3756 (NOTLB)
d806fd7c 00000086 d806fe50 d806fda4 00000000 d806fe50 f7a6adc0 00000000
       00047c9c 00000000 1ec86840 d806fd5c f7aae140 d5d0fb1e 02036e86 d19acde0
       d19ace00 c180dbe0 0000201b 11d21020 000203d5 eff20e10 c180dbe0 d806fd98
Call Trace:
 [<c0129693>] schedule_timeout+0xc3/0xd0
 [<c0118397>] recalc_task_prio+0x97/0x1c0
 [<c02c090a>] unix_stream_data_wait+0xfa/0x180
 [<c011d4a0>] autoremove_wake_function+0x0/0x40
 [<c011d4b1>] autoremove_wake_function+0x11/0x40
 [<c011d4a0>] autoremove_wake_function+0x0/0x40
 [<c0268b62>] alloc_skb+0x32/0xd0
 [<c02c1003>] unix_stream_recvmsg+0x673/0x710
 [<c0265030>] sock_aio_read+0xb0/0xd0
 [<c0160bcd>] do_sync_read+0x6d/0xb0
 [<c0129026>] update_wall_time+0x16/0x40
 [<c0160cf7>] vfs_read+0xe7/0x110
 [<c0160f18>] sys_read+0x38/0x60
 [<c01073c9>] sysenter_past_esp+0x52/0x71

sshd          D 00000000     0  3917   3914  3918               (NOTLB)
f682dd18 00000000 00000000 00000000 00000000 f7a9c040
       00000000 00000000 00000000 00000000 f682c000 00000000 00000246 00000000
       ffffffff c180dbe0 000014af 982fad90 000222af f7aae310 00000000 c030dc20
Call Trace:
 [<c0129642>] schedule_timeout+0x72/0xd0
 [<c01295c0>] process_timeout+0x0/0x10
 [<c011bfa8>] io_schedule_timeout+0x28/0x40
 [<c020e8ab>] blk_congestion_wait+0x7b/0x90
 [<c011d4a0>] autoremove_wake_function+0x0/0x40
 [<c011d4a0>] autoremove_wake_function+0x0/0x40
 [<c014205b>] __alloc_pages+0x29b/0x330
 [<c01cf8f9>] __copy_to_user_ll+0x59/0x60
 [<c015bc11>] read_swap_cache_async+0x101/0x10d
 [<c014f79f>] swapin_readahead+0x2f/0xd0
 [<c014fb57>] do_swap_page+0x317/0x430
 [<c014d835>] pte_alloc_map+0xc5/0x130
 [<c01507f8>] handle_mm_fault+0xc8/0x1d0
 [<c0117444>] do_page_fault+0x304/0x4ef
 [<c01fc19a>] pty_chars_in_buffer+0x1a/0x40
 [<c01fc175>] pty_write_room+0x25/0x30
 [<c0175a04>] poll_freewait+0x44/0x50
 [<c0175dc7>] do_select+0x1e7/0x330
 [<c0117140>] do_page_fault+0x0/0x4ef
 [<c0107e85>] error_code+0x2d/0x38
 [<c017007b>] get_write_access+0x4b/0xe0
 [<c01cf8e0>] __copy_to_user_ll+0x40/0x60
 [<c01762e7>] sys_select+0x3b7/0x4d0
 [<c0114246>] smp_apic_timer_interrupt+0xd6/0x140
 [<c01073c9>] sysenter_past_esp+0x52/0x71

bash          S 00000246     0  3918   3917                     (NOTLB)
ef5dde58 00000082 c030d780 00000246 c030d780 c0141c4f f7bd5040 db8215b4
       c030db00 c17735c0 00000000 00000000 0000038e c1914c00 eff81000 d19ac810
       d19ac830 c180dbe0 000cb843 74fcf82c 0001dc80 f7a40940 c013ce05 00000000
Call Trace:
 [<c0141c4f>] buffered_rmqueue+0x10f/0x280
 [<c013ce05>] find_get_page+0x35/0xc0
 [<c0129693>] schedule_timeout+0xc3/0xd0
 [<c01f81b4>] opost_block+0xf4/0x1b0
 [<c01facaa>] read_chan+0x96a/0xb00
 [<c011a730>] default_wake_function+0x0/0x10
 [<c011a730>] default_wake_function+0x0/0x10
 [<c011a730>] default_wake_function+0x0/0x10
 [<c01f46dd>] tty_write+0x1ad/0x360
 [<c01f44f6>] tty_read+0x176/0x1b0
 [<c0160cc0>] vfs_read+0xb0/0x110
 [<c0160f18>] sys_read+0x38/0x60
 [<c01073c9>] sysenter_past_esp+0x52/0x71

pdflush       S 00000000     0  3951      6                  24 (L-TLB)
c268df78 00000046 00000000 00000000 00000000 00000000 f7a4d940 00000000
       00000000 00000000 00000000 00000000 00000000 00000000 00000000 f73d07d0
       f73d07f0 c1815be0 00000110 29477494 000222b0 f7bb28a0 00000000 00000000
Call Trace:
 [<c0144105>] __pdflush+0xd5/0x380
 [<c011a771>] __wake_up_common+0x31/0x60
 [<c01443b0>] pdflush+0x0/0x10
 [<c01443ba>] pdflush+0xa/0x10
 [<c01443b0>] pdflush+0x0/0x10
 [<c0135e94>] kthread+0xa4/0xb0
 [<c0135df0>] kthread+0x0/0xb0
 [<c0104ec5>] kernel_thread_helper+0x5/0x10

pdflush       S 00000000     0  6583      7                     (L-TLB)
c4ae1f78 00000046 00000000 00000000 00000000 00000000 f7a9c040 00000000
       00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
       00000000 c180dbe0 00000285 9884aecd 000222af eff20270 00000000 00000000
Call Trace:
 [<c0144105>] __pdflush+0xd5/0x380
 [<c011a771>] __wake_up_common+0x31/0x60
 [<c01443b0>] pdflush+0x0/0x10
 [<c01443ba>] pdflush+0xa/0x10
 [<c01443b0>] pdflush+0x0/0x10
 [<c0135e94>] kthread+0xa4/0xb0
 [<c0135df0>] kthread+0x0/0xb0
 [<c0104ec5>] kernel_thread_helper+0x5/0x10

cron          S C030D300     0  6677   3391  6678    6760       (NOTLB)
f4233ed8 00000082 c0141e77 c030d300 00000010 00000001 e45efb80 d19ac240
       f7b074c0 f7a85f0c c011a2c9 f4233f04 00000082 d19ac240 00000010 e7892280
       e78922a0 c1815be0 0000eff8 924d6713 00020568 d19ac410 f7fffaa0 f7a62180
Call Trace:
 [<c0141e77>] __alloc_pages+0xb7/0x330
 [<c011a2c9>] schedule+0x389/0x7a0
 [<c016ee7c>] pipe_wait+0x7c/0xa0
 [<c011d4a0>] autoremove_wake_function+0x0/0x40
 [<c01cf8f9>] __copy_to_user_ll+0x59/0x60
 [<c011d4a0>] autoremove_wake_function+0x0/0x40
 [<c016f07a>] pipe_readv+0x1da/0x2c0
 [<c016f180>] pipe_read+0x20/0x30
 [<c0160cc0>] vfs_read+0xb0/0x110
 [<c0160f18>] sys_read+0x38/0x60
 [<c01073c9>] sysenter_past_esp+0x52/0x71

sh            S E0B63080     0  6678   6677  6679               (NOTLB)
caa4df48 c01508a3 e78483bc e45ef940 2c5fa065
       c011db30 f6f71380 e45ef940 e45ef960 f6f71380 d19ad3b0 c0117444 f73d0da0
       f73d0dc0 c1815be0 0002185a 91df5d00 00020568 d19ad580 f1499544 00000001
Call Trace:
 [<c01508a3>] handle_mm_fault+0x173/0x1d0
 [<c011db30>] copy_mm+0x250/0x570
 [<c0117444>] do_page_fault+0x304/0x4ef
 [<c012354b>] sys_wait4+0x1bb/0x280
 [<c011a730>] default_wake_function+0x0/0x10
 [<c011a730>] default_wake_function+0x0/0x10
 [<c0123635>] sys_waitpid+0x25/0x29
 [<c01073c9>] sysenter_past_esp+0x52/0x71

daily_reports S F0D7B080     0  6679   6678  6689               (NOTLB)
e9afbf48 00000086 080f1e34 f0d7b080 c01508a3 edc523c4 f7cd8700 347c9065
       c011db30 f6f71770 f7cd8700 f7cd8720 f6f71770 d19ac810 c0117444 00000001
       aea87c72 c180dbe0 00016174 5e27e5ad 0002057f d19ac9e0 f712a584 00000000
Call Trace:
 [<c01508a3>] handle_mm_fault+0x173/0x1d0
 [<c011db30>] copy_mm+0x250/0x570
 [<c0117444>] do_page_fault+0x304/0x4ef
 [<c012354b>] sys_wait4+0x1bb/0x280
 [<c011a730>] default_wake_function+0x0/0x10
 [<c011a730>] default_wake_function+0x0/0x10
 [<c0123635>] sys_waitpid+0x25/0x29
 [<c01073c9>] sysenter_past_esp+0x52/0x71

mysql         S C015081C     0  6689   6679                     (NOTLB)
f5fe5d7c 00000086 d57ed400 c015081c 00000001 f4182998 f7d66dc0 c01bebe9
       ccadf818 f7d66dc0 f7d66de0 ccadf818 f73d07d0 603d05d3 0002057f f73d07d0
       f73d07f0 c1815be0 000021e1 605a5570 0002057f e78935c0 00000000 f5fe5df8
Call Trace:
 [<c015081c>] handle_mm_fault+0xec/0x1d0
 [<c01bebe9>] journal_mark_dirty+0x159/0x2e0
 [<c0129693>] schedule_timeout+0xc3/0xd0
 [<c0141e77>] __alloc_pages+0xb7/0x330
 [<c011d4b1>] autoremove_wake_function+0x11/0x40
 [<c02c090a>] unix_stream_data_wait+0xfa/0x180
 [<c011d4a0>] autoremove_wake_function+0x0/0x40
 [<c011d4a0>] autoremove_wake_function+0x0/0x40
 [<c0268b62>] alloc_skb+0x32/0xd0
 [<c02c1003>] unix_stream_recvmsg+0x673/0x710
 [<c0265030>] sock_aio_read+0xb0/0xd0
 [<c0160bcd>] do_sync_read+0x6d/0xb0
 [<c014e246>] unmap_vmas+0xf6/0x310
 [<c01488ff>] __pagevec_lru_add_active+0x13f/0x1b0
 [<c012eb45>] sys_rt_sigaction+0xd5/0xf0
 [<c0160cf7>] vfs_read+0xe7/0x110
 [<c017489b>] do_fcntl+0x11b/0x1d0
 [<c0160f18>] sys_read+0x38/0x60
 [<c01073c9>] sysenter_past_esp+0x52/0x71

cron          S C030D300     0  6760   3391  6761          6677 (NOTLB)
eb401ed8 00000082 c0141e77 c030d300 00000010 00000001 f7a4d040 e7892280
       f7a4d040 d4323c28 c011a2c9 eb401f04 00000082 e7892280 00000010 c1962c20
       c1962c40 c1815be0 0000d409 4121cbab 0002062c e7892450 f7fffaa0 c1962c20
Call Trace:
 [<c0141e77>] __alloc_pages+0xb7/0x330
 [<c011a2c9>] schedule+0x389/0x7a0
 [<c016ee7c>] pipe_wait+0x7c/0xa0
 [<c011d4a0>] autoremove_wake_function+0x0/0x40
 [<c01cf8f9>] __copy_to_user_ll+0x59/0x60
 [<c011d4a0>] autoremove_wake_function+0x0/0x40
 [<c016f07a>] pipe_readv+0x1da/0x2c0
 [<c016f180>] pipe_read+0x20/0x30
 [<c0160cc0>] vfs_read+0xb0/0x110
 [<c0160f18>] sys_read+0x38/0x60
 [<c01073c9>] sysenter_past_esp+0x52/0x71

sh            S F7B97860     0  6761   6760  6763               (NOTLB)
d4323f48 00000086 0002062c f7b97860 f7b97880 c1815be0 f7a4d4c0 4190165b
       0002062c c1962df4 d4323f7c d4322000 d4322000 d4323f7c d4322000 e8af8160
       e8af8180 c1815be0 000015ca 41908132 0002062c c1962df0 f7bf5fa4 00000001
Call Trace:
 [<c012354b>] sys_wait4+0x1bb/0x280
 [<c011a730>] default_wake_function+0x0/0x10
 [<c011a730>] default_wake_function+0x0/0x10
 [<c0123635>] sys_waitpid+0x25/0x29
 [<c01073c9>] sysenter_past_esp+0x52/0x71

php           S D2527DA4     0  6763   6761                     (NOTLB)
d2527d7c 00000082 d2527e50 d2527da4 00000000 d2527e50 c1fa6280 00000000
       00008c65 00000000 36d5b0c8 00000000 f70ab3b0 c10c0900 e5d5b060 f70ab3b0
       f70ab3d0 c180dbe0 00002530 8a40e181 0002062c e8af8ed0 00000000 f7dacc00
Call Trace:
 [<c0129693>] schedule_timeout+0xc3/0xd0
 [<c0118397>] recalc_task_prio+0x97/0x1c0
 [<c02c090a>] unix_stream_data_wait+0xfa/0x180
 [<c011d4a0>] autoremove_wake_function+0x0/0x40
 [<c011d4b1>] autoremove_wake_function+0x11/0x40
 [<c011d4a0>] autoremove_wake_function+0x0/0x40
 [<c02c1003>] unix_stream_recvmsg+0x673/0x710
 [<c0265030>] sock_aio_read+0xb0/0xd0
 [<c0160bcd>] do_sync_read+0x6d/0xb0
 [<c0160cf7>] vfs_read+0xe7/0x110
 [<c017489b>] do_fcntl+0x11b/0x1d0
 [<c0160f18>] sys_read+0x38/0x60
 [<c01073c9>] sysenter_past_esp+0x52/0x71
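
A dump of this size is easier to read once it is summarised. The following is only a rough sketch, not a tool used in this thread: it assumes the sysrq-T layout shown above (a task header line, then "Call Trace:" frames), and the names parse_sysrq_t, state_counts, and stuck_in_reclaim are made up for illustration. It treats each trace as an unordered set of function names and lists the D-state tasks whose trace mentions both blk_congestion_wait and __alloc_pages. SAMPLE is a shortened excerpt of the dump above.

```python
import re
from collections import Counter

# Task header: name, state letter, a hex value or the word "running",
# one numeric column, then the PID.
TASK_RE = re.compile(
    r"^(\S+)\s+([RSDTZ])\s+(?:[0-9A-Fa-f]{8}|running)\s+\d+\s+(\d+)\s"
)
# Trace frame, e.g. " [<c020e8ab>] blk_congestion_wait+0x7b/0x90"
FRAME_RE = re.compile(r"^\s*\[<[0-9a-f]{8}>\]\s+(\w+)\+0x")


def parse_sysrq_t(text):
    """Return one dict per task: name, state, pid, and trace function names."""
    tasks = []
    for line in text.splitlines():
        task = TASK_RE.match(line)
        if task:
            tasks.append({
                "name": task.group(1),
                "state": task.group(2),
                "pid": int(task.group(3)),
                "frames": [],
            })
            continue
        frame = FRAME_RE.match(line)
        if frame and tasks:
            tasks[-1]["frames"].append(frame.group(1))
    return tasks


def state_counts(tasks):
    """Count tasks per scheduler state letter (R, S, D, ...)."""
    return Counter(t["state"] for t in tasks)


def stuck_in_reclaim(tasks):
    """D-state tasks whose trace names both functions (a heuristic only)."""
    wanted = {"blk_congestion_wait", "__alloc_pages"}
    return [(t["name"], t["pid"]) for t in tasks
            if t["state"] == "D" and wanted <= set(t["frames"])]


SAMPLE = """\
ntpd          D 00000000     0  4611      1                3406 (NOTLB)
Call Trace:
 [<c0129642>] schedule_timeout+0x72/0xd0
 [<c011bfa8>] io_schedule_timeout+0x28/0x40
 [<c020e8ab>] blk_congestion_wait+0x7b/0x90
 [<c014205b>] __alloc_pages+0x29b/0x330
httpd         R running     0 12742   3100         12743 12741 (NOTLB)
bash          S C030D340     0  3248   3247                     (NOTLB)
Call Trace:
 [<c0129693>] schedule_timeout+0xc3/0xd0
 [<c01facaa>] read_chan+0x96a/0xb00
"""

if __name__ == "__main__":
    tasks = parse_sysrq_t(SAMPLE)
    print(dict(state_counts(tasks)))
    print(stuck_in_reclaim(tasks))
```

Run against the full dump, this would show at a glance how many tasks are blocked on page allocation versus sleeping normally.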

-- 
                   Best regards,
                   Sergey S. Kostyliov <rathamahata@php4.ru>
                   Public PGP key: http://sysadminday.org.ru/rathamahata.asc


* Re: 2.6.X kernel memory leak?
  2004-04-09  7:17                           ` 2.6.X kernel memory leak? Sergey S. Kostyliov
@ 2004-04-09  9:09                             ` Andrew Morton
  2004-04-09 12:15                               ` Sergey S. Kostyliov
  0 siblings, 1 reply; 24+ messages in thread
From: Andrew Morton @ 2004-04-09  9:09 UTC (permalink / raw)
  To: Sergey S. Kostyliov; +Cc: linux-kernel, anton

"Sergey S. Kostyliov" <rathamahata@php4.ru> wrote:
>
> And here is part of sysrq-T for the third machine, which has just locked up;
>  the kernel is 2.6.5-rc3-aa2.

It does look like a kernel memory leak, but it's not into slab.

You've disabled iptables.  Possibly there's a leak in a device driver? 
Which drivers are in regular use there?  What are you using for those
hardware RAID controllers?



* Re: 2.6.X kernel memory leak?
  2004-04-09  9:09                             ` Andrew Morton
@ 2004-04-09 12:15                               ` Sergey S. Kostyliov
  0 siblings, 0 replies; 24+ messages in thread
From: Sergey S. Kostyliov @ 2004-04-09 12:15 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, anton

Hello Andrew,

On Friday 09 April 2004 13:09, Andrew Morton wrote:
> "Sergey S. Kostyliov" <rathamahata@php4.ru> wrote:
> >
> > And here is part of sysrq-T for the third machine, which has just locked up;
> >  the kernel is 2.6.5-rc3-aa2.
> 
> It does look like a kernel memory leak, but it's not into slab.
> 
> You've disabled iptables.  Possibly there's a leak in a device driver? 
> Which drivers are in regular use there?  What are you using for those
> hardware RAID controllers?

I've seen this kind of lockup (according to sysrq-T) on different boxes:

1) ope
	RAID:		mylex 352
	drivers:	e100, dac960
	.config:	http://sysadminday.org.ru/2.6.1-io_lockup/ope/.config

2) terror
	RAID:		megaraid 320-2
	drivers:	e1000, megaraid2
	.config:	http://sysadminday.org.ru/2.6.X-lockup/terror/.config

3) mirror
	drivers:	e100, aic7xxx, md, netconsole
	.config:	http://sysadminday.org.ru/2.6.X-lockup/mirror/.config

I also saw the same symptoms on a fourth box, but I'm not sure about
this one because it wasn't attached to a serial console at that time.

For this box:
	RAID:		Compaq smart 2
	drivers:	tlan,epic100,cpqarray
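
If a single leaking driver were the cause, it should appear on every affected box. A quick check of the lists above, as a sketch only: the driver names are taken from this message, "fourth" is just a label for the unnamed Compaq box, and the helper names are made up.

```python
from collections import defaultdict

# Driver sets as listed in this message; "fourth" is a placeholder label.
BOXES = {
    "ope": {"e100", "dac960"},
    "terror": {"e1000", "megaraid2"},
    "mirror": {"e100", "aic7xxx", "md", "netconsole"},
    "fourth": {"tlan", "epic100", "cpqarray"},
}


def common_drivers(boxes):
    """Drivers present on every box."""
    return set.intersection(*boxes.values())


def shared_drivers(boxes):
    """Drivers present on more than one box, mapped to the boxes using them."""
    users = defaultdict(list)
    for box, drivers in boxes.items():
        for driver in drivers:
            users[driver].append(box)
    return {d: sorted(b) for d, b in users.items() if len(b) > 1}


if __name__ == "__main__":
    print(common_drivers(BOXES))
    print(shared_drivers(BOXES))
```

No driver is common to all of the boxes; only e100 appears on more than one (ope and mirror).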

-- 
                   Best regards,
                   Sergey S. Kostyliov <rathamahata@php4.ru>
                   Public PGP key: http://sysadminday.org.ru/rathamahata.asc

Thread overview: 24+ messages
2004-01-31 16:40 2.6.1 IO lockup on SMP systems Sergey S. Kostyliov
2004-02-01  0:17 ` Andrew Morton
2004-02-21 16:45   ` Sergey S. Kostyliov
2004-02-21 19:30     ` Andrew Morton
2004-02-22 17:39       ` Alexander Y. Fomichev
2004-02-23 17:27         ` Sergey S. Kostyliov
2004-02-23 21:30           ` Mike Fedyk
2004-02-24 11:56             ` Sergey S. Kostyliov
2004-02-23 22:26           ` Andrew Morton
2004-02-24  7:23             ` Marcelo Tosatti
2004-02-24  6:53               ` Andrew Morton
2004-02-24 11:54             ` Sergey S. Kostyliov
2004-02-26 12:19               ` Sergey S. Kostyliov
2004-02-26 12:53                 ` Andrew Morton
2004-02-26 13:11                   ` Andrew Morton
2004-02-26 14:37                     ` Dave Jones
2004-02-26 15:37                       ` Arjan van de Ven
2004-02-26 14:30                   ` Sergey S. Kostyliov
2004-02-26 20:03                     ` Andrew Morton
2004-02-28 14:56                       ` Sergey S. Kostyliov
2004-04-08  9:08                         ` 2.6.X kernel memory leak? (was: Re: 2.6.1 IO lockup on SMP systems) Sergey S. Kostyliov
2004-04-09  7:17                           ` 2.6.X kernel memory leak? Sergey S. Kostyliov
2004-04-09  9:09                             ` Andrew Morton
2004-04-09 12:15                               ` Sergey S. Kostyliov
