public inbox for linux-kernel@vger.kernel.org
From: Thomas Davis <tadavis@lbl.gov>
To: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: "Chen, Kenneth W" <kenneth.w.chen@intel.com>,
	"'Jens Axboe'" <axboe@suse.de>,
	Claudio Martins <ctpm@rnl.ist.utl.pt>,
	Andrew Morton <akpm@osdl.org>,
	lkml <linux-kernel@vger.kernel.org>,
	Neil Brown <neilb@cse.unsw.edu.au>
Subject: Re: Processes stuck on D state on Dual Opteron
Date: Tue, 12 Apr 2005 10:07:04 -0700
Message-ID: <425C0038.5030809@lbl.gov>
In-Reply-To: <425BB958.3080308@yahoo.com.au>

Nick Piggin wrote:
> 
> It is a bit subtle: get_request may only drop the lock and return NULL
> (after retaking the lock), if we fail on a memory allocation. If we
> just fail due to unavailable queue slots, then the lock is never
> dropped. And the mem allocation can't fail because it is a mempool
> alloc with GFP_NOIO.
> 

I'm jumping in here because we have seen this problem on an x86-64 system with 4 GB of RAM running SLES9 (2.6.5-7.141).

You can drive the node into this state:

Mem-info:
Node 1 DMA per-cpu: empty
Node 1 Normal per-cpu:
cpu 0 hot: low 32, high 96, batch 16
cpu 0 cold: low 0, high 32, batch 16
cpu 1 hot: low 32, high 96, batch 16
cpu 1 cold: low 0, high 32, batch 16
Node 1 HighMem per-cpu: empty
Node 0 DMA per-cpu:
cpu 0 hot: low 2, high 6, batch 1
cpu 0 cold: low 0, high 2, batch 1
cpu 1 hot: low 2, high 6, batch 1
cpu 1 cold: low 0, high 2, batch 1
Node 0 Normal per-cpu:
cpu 0 hot: low 32, high 96, batch 16
cpu 0 cold: low 0, high 32, batch 16
cpu 1 hot: low 32, high 96, batch 16
cpu 1 cold: low 0, high 32, batch 16
Node 0 HighMem per-cpu: empty

Free pages:       10360kB (0kB HighMem)
Active:485853 inactive:421820 dirty:0 writeback:0 unstable:0 free:2590 slab:10816 mapped:903444 pagetables:2097
Node 1 DMA free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB
lowmem_reserve[]: 0 1664 1664
Node 1 Normal free:2464kB min:2468kB low:4936kB high:7404kB active:918440kB inactive:710360kB present:1703936kB
lowmem_reserve[]: 0 0 0
Node 1 HighMem free:0kB min:128kB low:256kB high:384kB active:0kB inactive:0kB present:0kB
lowmem_reserve[]: 0 0 0
Node 0 DMA free:4928kB min:20kB low:40kB high:60kB active:0kB inactive:0kB present:16384kB
lowmem_reserve[]: 0 2031 2031
Node 0 Normal free:2968kB min:3016kB low:6032kB high:9048kB active:1024968kB inactive:976924kB present:2080764kB
lowmem_reserve[]: 0 0 0
Node 0 HighMem free:0kB min:128kB low:256kB high:384kB active:0kB inactive:0kB present:0kB
lowmem_reserve[]: 0 0 0
Node 1 DMA: empty
Node 1 Normal: 46*4kB 19*8kB 9*16kB 4*32kB 1*64kB 0*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2464kB
Node 1 HighMem: empty
Node 0 DMA: 4*4kB 4*8kB 1*16kB 2*32kB 3*64kB 4*128kB 2*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 4928kB
Node 0 Normal: 0*4kB 1*8kB 1*16kB 0*32kB 0*64kB 1*128kB 1*256kB 3*512kB 1*1024kB 0*2048kB 0*4096kB = 2968kB
Node 0 HighMem: empty
Swap cache: add 1009224, delete 106245, find 179674/181478, race 0+2
Free swap:       4739812kB
950271 pages of RAM
17513 reserved pages
2788 pages shared
902980 pages swap cached

with processes doing this:

SysRq : Show State

                                                       sibling
  task                 PC          pid father child younger older
init          D 000001000000e810     0     1      0     2               (NOTLB)
000001007ff81be8 0000000000000006 0000000000000000 0000000000000000
       0000000000000000 0000000000000000 0000000000000000 0000000000000000
       0000000000000000 0000010002c1d6e0
Call Trace:<ffffffff8017338b>{try_to_free_pages+283} <ffffffff80147d0d>{schedule_timeout+173}
       <ffffffff80147c50>{process_timeout+0} <ffffffff8013a292>{io_schedule_timeout+82}
       <ffffffff80280efd>{blk_congestion_wait+141} <ffffffff8013c530>{autoremove_wake_function+0}
       <ffffffff8013c530>{autoremove_wake_function+0} <ffffffff8016ab68>{__alloc_pages+776}
       <ffffffff8018573f>{read_swap_cache_async+63} <ffffffff801781b1>{swapin_readahead+97}
       <ffffffff8017834e>{do_swap_page+142} <ffffffff801796a1>{handle_mm_fault+337}
       <ffffffff80123ebb>{do_page_fault+411} <ffffffff801a3259>{sys_select+1097}
       <ffffffff801a332f>{sys_select+1311} <ffffffff801122a9>{error_exit+0}

mg.C.2        D 000001000000e810     0  1971   1955  1972               (NOTLB)
00000100e236bc68 0000000000000006 0000000000000000 0000000000000000
       0000000000000000 0000000000000000 0000000000000000 0000000000000000
       0000000100000000 00000100816ed360
Call Trace:<ffffffff8017338b>{try_to_free_pages+283} <ffffffff80147d0d>{schedule_timeout+173}
       <ffffffff80147c50>{process_timeout+0} <ffffffff8013a292>{io_schedule_timeout+82}
       <ffffffff80280efd>{blk_congestion_wait+141} <ffffffff8013c530>{autoremove_wake_function+0}
       <ffffffff8013c530>{autoremove_wake_function+0} <ffffffff8016ab68>{__alloc_pages+776}
       <ffffffff801778ad>{do_wp_page+285} <ffffffff801796c5>{handle_mm_fault+373}
       <ffffffff80123ebb>{do_page_fault+411} <ffffffff801122a9>{error_exit+0}
mg.C.2        S 000001007b0a06a0     0  1972   1971          1974       (NOTLB)
00000100bc1c1ca0 0000000000000006 0000000000000010 0000000000010246
       000000000004c7c0 00000100816ec280 0000007680000780 0000010081f23390
       0000000180000780 00000100816ed360
Call Trace:<ffffffff8016abb4>{__alloc_pages+852} <ffffffff80110ac8>{__down_interruptible+216}
       <ffffffff80139280>{default_wake_function+0} <ffffffff8013531c>{recalc_task_prio+940}
       <ffffffff80230d91>{__down_failed_interruptible+53}
       <ffffffffa01cc47e>{:mosal:.text.lock.mosal_sync+5}
       <ffffffffa0291daf>{:mod_vipkl:VIPKL_EQ_poll+607} <ffffffffa029bb01>{:mod_vipkl:VIPKL_EQ_poll_stat+529}
       <ffffffffa029e658>{:mod_vipkl:VIPKL_ioctl+5144} <ffffffffa0294e21>{:mod_vipkl:vipkl_wrap_kernel_ioctl+417}
       <ffffffff8018c00e>{filp_close+126} <ffffffff801a1fb4>{sys_ioctl+612}
       <ffffffff801118d4>{system_call+124}
mg.C.2        S 000001007b0a18c0     0  1974   1971                1972 (NOTLB)
00000100a3955ca0 0000000000000006 00000001e7d422e8 000001002c9ca550
       000000000005f138 00000100816ec280 0000007680000780 0000010081f23390
       0000000180000780 00000100816ed360
Call Trace:<ffffffff8016abb4>{__alloc_pages+852} <ffffffff80110ac8>{__down_interruptible+216}
       <ffffffff80139280>{default_wake_function+0} <ffffffff8013531c>{recalc_task_prio+940}
       <ffffffff80230d91>{__down_failed_interruptible+53}
       <ffffffffa01cc47e>{:mosal:.text.lock.mosal_sync+5}
       <ffffffffa0291daf>{:mod_vipkl:VIPKL_EQ_poll+607} <ffffffff8011db9d>{smp_send_reschedule+29}
       <ffffffffa029bb01>{:mod_vipkl:VIPKL_EQ_poll_stat+529}
       <ffffffffa029e658>{:mod_vipkl:VIPKL_ioctl+5144} <ffffffffa0294e21>{:mod_vipkl:vipkl_wrap_kernel_ioctl+417}
       <ffffffff8018c00e>{filp_close+126} <ffffffff801a1fb4>{sys_ioctl+612}
       <ffffffff801118d4>{system_call+124}

and it will never, ever recover from it.
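
As a cross-check of the Mem-info dump above, here is a minimal Python sketch that compares each zone's free memory against its min watermark and re-adds one buddy-list line. The zone lines are copied from the dump with the active/inactive/present fields trimmed and the empty zones (present:0kB) left out. The helper names `zones_below_min` and `buddy_total_kb` are made up for this illustration and are not part of any kernel tool.

```python
import re

# Zone summary lines copied from the Mem-info dump (values in kB),
# trimmed to the free/min/low/high fields; empty zones are omitted.
ZONE_LINES = """\
Node 1 Normal free:2464kB min:2468kB low:4936kB high:7404kB
Node 0 DMA free:4928kB min:20kB low:40kB high:60kB
Node 0 Normal free:2968kB min:3016kB low:6032kB high:9048kB
"""

ZONE_RE = re.compile(r"^(Node \d+ \w+) free:(\d+)kB min:(\d+)kB")


def zones_below_min(text):
    """Return {zone: (free_kb, min_kb)} for every zone with free < min."""
    below = {}
    for line in text.splitlines():
        m = ZONE_RE.match(line)
        if m is None:
            continue
        zone, free_kb, min_kb = m.group(1), int(m.group(2)), int(m.group(3))
        if free_kb < min_kb:
            below[zone] = (free_kb, min_kb)
    return below


def buddy_total_kb(line):
    """Sum a buddy-list line such as '46*4kB 19*8kB ... = 2464kB'."""
    # Only 'count*size' pairs match; the trailing '= NNNNkB' total does not.
    pairs = re.findall(r"(\d+)\*(\d+)kB", line)
    return sum(int(count) * int(size) for count, size in pairs)


if __name__ == "__main__":
    for zone, (free_kb, min_kb) in zones_below_min(ZONE_LINES).items():
        print(f"{zone}: free {free_kb} kB < min {min_kb} kB")
```

With the figures from the dump, both Normal zones sit just under their min watermark (Node 1 by 4 kB, Node 0 by 48 kB), while Node 0 DMA is well above its own.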

Note: this is a cluster of AMD x86_64 machines with 4 GB of RAM each, running InfiniBand (IB). We have limited the amount of memory that IB can pin and capped process size at 1.5 GB (on a 4 GB machine!) just to maintain stability.

We do not use md; it's a compute node with only a single local drive.

We have been told that the 2.6 memory allocator goes into an infinite loop and never recovers from it.
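
To make that claim concrete, here is a toy Python model of a reclaim-and-retry loop. It is not the kernel's code, only a sketch under stated assumptions: the function `alloc_page`, its `reclaim` callback and the `max_retries` bound are all hypothetical. The one thing taken from the traces above is that the D-state tasks are sitting in `blk_congestion_wait` called from `__alloc_pages`.

```python
def alloc_page(free_kb, min_kb, reclaim, max_retries):
    """Toy allocation loop: reclaim and retry until free_kb reaches min_kb.

    `reclaim` is a hypothetical callback returning the kB it freed.
    `max_retries` exists only so this demo terminates; the behaviour
    described in the message corresponds to having no such bound.
    Returns the number of retries needed, or None if the bound was hit.
    """
    retries = 0
    while free_kb < min_kb:
        if retries == max_retries:
            return None       # still below the watermark: no progress made
        free_kb += reclaim()  # stands in for a reclaim pass
        retries += 1          # stands in for waiting, then retrying
    return retries


# Node 0 Normal from the dump: free 2968 kB, min 3016 kB.
stuck = alloc_page(2968, 3016, reclaim=lambda: 0, max_retries=1000)
progress = alloc_page(2968, 3016, reclaim=lambda: 16, max_retries=1000)
```

In this model, a reclaim pass that frees nothing leaves the loop spinning until the artificial bound is hit (`stuck` is `None`), whereas any steady progress lets it exit after a few retries.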

thomas
