From: Thomas Davis <tadavis@lbl.gov>
To: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: "Chen, Kenneth W" <kenneth.w.chen@intel.com>,
	"'Jens Axboe'" <axboe@suse.de>,
	Claudio Martins <ctpm@rnl.ist.utl.pt>,
	Andrew Morton <akpm@osdl.org>,
	lkml <linux-kernel@vger.kernel.org>,
	Neil Brown <neilb@cse.unsw.edu.au>
Subject: Re: Processes stuck on D state on Dual Opteron
Date: Tue, 12 Apr 2005 10:07:04 -0700
Message-ID: <425C0038.5030809@lbl.gov>
In-Reply-To: <425BB958.3080308@yahoo.com.au>

Nick Piggin wrote:
> 
> It is a bit subtle: get_request may only drop the lock and return NULL
> (after retaking the lock), if we fail on a memory allocation. If we
> just fail due to unavailable queue slots, then the lock is never
> dropped. And the mem allocation can't fail because it is a mempool
> alloc with GFP_NOIO.
> 

I'm jumping in here because we have seen this problem on an x86-64 system with 4 GB of RAM, running SLES9 (2.6.5-7.141).

You can drive the node into this state:

Mem-info:
Node 1 DMA per-cpu: empty
Node 1 Normal per-cpu:
cpu 0 hot: low 32, high 96, batch 16
cpu 0 cold: low 0, high 32, batch 16
cpu 1 hot: low 32, high 96, batch 16
cpu 1 cold: low 0, high 32, batch 16
Node 1 HighMem per-cpu: empty
Node 0 DMA per-cpu:
cpu 0 hot: low 2, high 6, batch 1
cpu 0 cold: low 0, high 2, batch 1
cpu 1 hot: low 2, high 6, batch 1
cpu 1 cold: low 0, high 2, batch 1
Node 0 Normal per-cpu:
cpu 0 hot: low 32, high 96, batch 16
cpu 0 cold: low 0, high 32, batch 16
cpu 1 hot: low 32, high 96, batch 16
cpu 1 cold: low 0, high 32, batch 16
Node 0 HighMem per-cpu: empty

Free pages:       10360kB (0kB HighMem)
Active:485853 inactive:421820 dirty:0 writeback:0 unstable:0 free:2590 slab:10816 mapped:903444 pagetables:2097
Node 1 DMA free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB
lowmem_reserve[]: 0 1664 1664
Node 1 Normal free:2464kB min:2468kB low:4936kB high:7404kB active:918440kB inactive:710360kB present:1703936kB
lowmem_reserve[]: 0 0 0
Node 1 HighMem free:0kB min:128kB low:256kB high:384kB active:0kB inactive:0kB present:0kB
lowmem_reserve[]: 0 0 0
Node 0 DMA free:4928kB min:20kB low:40kB high:60kB active:0kB inactive:0kB present:16384kB
lowmem_reserve[]: 0 2031 2031
Node 0 Normal free:2968kB min:3016kB low:6032kB high:9048kB active:1024968kB inactive:976924kB present:2080764kB
lowmem_reserve[]: 0 0 0
Node 0 HighMem free:0kB min:128kB low:256kB high:384kB active:0kB inactive:0kB present:0kB
lowmem_reserve[]: 0 0 0
Node 1 DMA: empty
Node 1 Normal: 46*4kB 19*8kB 9*16kB 4*32kB 1*64kB 0*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2464kB
Node 1 HighMem: empty
Node 0 DMA: 4*4kB 4*8kB 1*16kB 2*32kB 3*64kB 4*128kB 2*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 4928kB
Node 0 Normal: 0*4kB 1*8kB 1*16kB 0*32kB 0*64kB 1*128kB 1*256kB 3*512kB 1*1024kB 0*2048kB 0*4096kB = 2968kB
Node 0 HighMem: empty
Swap cache: add 1009224, delete 106245, find 179674/181478, race 0+2
Free swap:       4739812kB
950271 pages of RAM
17513 reserved pages
2788 pages shared
902980 pages swap cached

with processes doing this:

SysRq : Show State

                                                       sibling
  task                 PC          pid father child younger older
init          D 000001000000e810     0     1      0     2               (NOTLB)
000001007ff81be8 0000000000000006 0000000000000000 0000000000000000
       0000000000000000 0000000000000000 0000000000000000 0000000000000000
       0000000000000000 0000010002c1d6e0
Call Trace:<ffffffff8017338b>{try_to_free_pages+283} <ffffffff80147d0d>{schedule_timeout+173}
       <ffffffff80147c50>{process_timeout+0} <ffffffff8013a292>{io_schedule_timeout+82}
       <ffffffff80280efd>{blk_congestion_wait+141} <ffffffff8013c530>{autoremove_wake_function+0}
       <ffffffff8013c530>{autoremove_wake_function+0} <ffffffff8016ab68>{__alloc_pages+776}
       <ffffffff8018573f>{read_swap_cache_async+63} <ffffffff801781b1>{swapin_readahead+97}
       <ffffffff8017834e>{do_swap_page+142} <ffffffff801796a1>{handle_mm_fault+337}
       <ffffffff80123ebb>{do_page_fault+411} <ffffffff801a3259>{sys_select+1097}
       <ffffffff801a332f>{sys_select+1311} <ffffffff801122a9>{error_exit+0}

mg.C.2        D 000001000000e810     0  1971   1955  1972               (NOTLB)
00000100e236bc68 0000000000000006 0000000000000000 0000000000000000
       0000000000000000 0000000000000000 0000000000000000 0000000000000000
       0000000100000000 00000100816ed360
Call Trace:<ffffffff8017338b>{try_to_free_pages+283} <ffffffff80147d0d>{schedule_timeout+173}
       <ffffffff80147c50>{process_timeout+0} <ffffffff8013a292>{io_schedule_timeout+82}
       <ffffffff80280efd>{blk_congestion_wait+141} <ffffffff8013c530>{autoremove_wake_function+0}
       <ffffffff8013c530>{autoremove_wake_function+0} <ffffffff8016ab68>{__alloc_pages+776}
       <ffffffff801778ad>{do_wp_page+285} <ffffffff801796c5>{handle_mm_fault+373}
       <ffffffff80123ebb>{do_page_fault+411} <ffffffff801122a9>{error_exit+0}
mg.C.2        S 000001007b0a06a0     0  1972   1971          1974       (NOTLB)
00000100bc1c1ca0 0000000000000006 0000000000000010 0000000000010246
       000000000004c7c0 00000100816ec280 0000007680000780 0000010081f23390
       0000000180000780 00000100816ed360
Call Trace:<ffffffff8016abb4>{__alloc_pages+852} <ffffffff80110ac8>{__down_interruptible+216}
       <ffffffff80139280>{default_wake_function+0} <ffffffff8013531c>{recalc_task_prio+940}
       <ffffffff80230d91>{__down_failed_interruptible+53}
       <ffffffffa01cc47e>{:mosal:.text.lock.mosal_sync+5}
       <ffffffffa0291daf>{:mod_vipkl:VIPKL_EQ_poll+607} <ffffffffa029bb01>{:mod_vipkl:VIPKL_EQ_poll_stat+529}
       <ffffffffa029e658>{:mod_vipkl:VIPKL_ioctl+5144} <ffffffffa0294e21>{:mod_vipkl:vipkl_wrap_kernel_ioctl+417}
       <ffffffff8018c00e>{filp_close+126} <ffffffff801a1fb4>{sys_ioctl+612}
       <ffffffff801118d4>{system_call+124}
mg.C.2        S 000001007b0a18c0     0  1974   1971                1972 (NOTLB)
00000100a3955ca0 0000000000000006 00000001e7d422e8 000001002c9ca550
       000000000005f138 00000100816ec280 0000007680000780 0000010081f23390
       0000000180000780 00000100816ed360
Call Trace:<ffffffff8016abb4>{__alloc_pages+852} <ffffffff80110ac8>{__down_interruptible+216}
       <ffffffff80139280>{default_wake_function+0} <ffffffff8013531c>{recalc_task_prio+940}
       <ffffffff80230d91>{__down_failed_interruptible+53}
       <ffffffffa01cc47e>{:mosal:.text.lock.mosal_sync+5}
       <ffffffffa0291daf>{:mod_vipkl:VIPKL_EQ_poll+607} <ffffffff8011db9d>{smp_send_reschedule+29}
       <ffffffffa029bb01>{:mod_vipkl:VIPKL_EQ_poll_stat+529}
       <ffffffffa029e658>{:mod_vipkl:VIPKL_ioctl+5144} <ffffffffa0294e21>{:mod_vipkl:vipkl_wrap_kernel_ioctl+417}
       <ffffffff8018c00e>{filp_close+126} <ffffffff801a1fb4>{sys_ioctl+612}
       <ffffffff801118d4>{system_call+124}

and it will never, ever recover from it.
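
For anyone reading the Mem-info dump above, one detail worth checking is which zones have fallen below their "min" watermark. The following is only a rough sketch in Python, not something that was run on the affected node: the zone lines are copied from the dump, the helper name zones_below_min is made up for illustration, and the HighMem and empty DMA zones are left out on the assumption that zones with present:0kB are not interesting here.

```python
import re

# Zone lines copied from the Mem-info dump above.
# Assumption: zones reporting present:0kB are omitted on purpose.
MEMINFO = """\
Node 1 Normal free:2464kB min:2468kB low:4936kB high:7404kB active:918440kB inactive:710360kB present:1703936kB
Node 0 DMA free:4928kB min:20kB low:40kB high:60kB active:0kB inactive:0kB present:16384kB
Node 0 Normal free:2968kB min:3016kB low:6032kB high:9048kB active:1024968kB inactive:976924kB present:2080764kB
"""

# Matches the leading "Node N Zone free:XkB min:YkB" part of a zone line.
ZONE_RE = re.compile(r"^Node (\d+) (\w+) free:(\d+)kB min:(\d+)kB")


def zones_below_min(text):
    """Return (node, zone, free_kb, min_kb) for each zone with free < min.

    Hypothetical helper for illustration only.
    """
    hits = []
    for line in text.splitlines():
        m = ZONE_RE.match(line)
        if m is None:
            continue  # skip lines that are not per-zone summaries
        node = int(m.group(1))
        zone = m.group(2)
        free_kb = int(m.group(3))
        min_kb = int(m.group(4))
        if free_kb < min_kb:
            hits.append((node, zone, free_kb, min_kb))
    return hits


if __name__ == "__main__":
    for node, zone, free_kb, min_kb in zones_below_min(MEMINFO):
        print(f"Node {node} {zone}: free {free_kb}kB < min {min_kb}kB")
```

On the numbers in the dump, both Normal zones sit just under their min watermark (2464kB against 2468kB on node 1, 2968kB against 3016kB on node 0), while node 0 DMA is well above its own. That fits the D-state traces, where the stuck tasks are under __alloc_pages and try_to_free_pages.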

Note: this is a cluster of AMD x86_64 nodes running IB, each with 4 GB of RAM.  We have limited the amount of memory that IB can pin down, and limited process size to 1.5 GB (on a 4 GB machine!), just to maintain stability.

We do not use md; it's a compute node with only a single local drive.

We have been told that the 2.6 memory allocator goes into an infinite loop and never recovers from it.

thomas
