Re: swapping to death by stressing mlock


From: Rusty Lynch <rusty@linux.co.intel.com>
To: Rusty Lynch <rusty@linux.co.intel.com>
Cc: linux-mm@kvack.org
Subject: Re: swapping to death by stressing mlock
Date: 18 Sep 2003 13:46:57 -0700
Message-ID: <1063918017.12547.9.camel@vmhack>
In-Reply-To: <200309182021.h8IKLnqX006918@penguin.co.intel.com>

I just loaded my 2.4.18 kernel and noticed that:
* I cannot allocate and mlock as large a chunk of memory, because mlock
fails, but I can start multiple allocate/mlock operations and
get to the same lockup
* BUT, the processes that are hogging memory are now in a runnable
state (instead of being in an uninterruptible sleep), so I can use
meta-sysrq-i to kill off the offending processes and totally recover
from the condition.
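
For reference, the allocate/touch/mlock test described in the quoted
message below can be sketched roughly as follows. This is only an
illustration, not the utility that produced the traces: the function
name alloc_touch_mlock() and its return convention are assumptions made
for this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (not the original test utility): allocate `bytes`,
 * touch every page, then mlock the buffer.
 * Returns 0 on success or an errno value on failure. On success the
 * buffer is handed back through *out; the caller should munlock() and
 * free() it when done.
 */
static int alloc_touch_mlock(size_t bytes, void **out)
{
    long page = sysconf(_SC_PAGESIZE);
    char *buf;
    size_t off;

    assert(out != NULL);

    if (bytes == 0 || page <= 0)
        return EINVAL;

    buf = malloc(bytes);
    if (buf == NULL)
        return ENOMEM;

    /* Write one byte per page so every page is actually faulted in. */
    for (off = 0; off < bytes; off += (size_t)page)
        buf[off] = 1;

    /* Pin the pages in RAM; save errno before free() can disturb it. */
    if (mlock(buf, bytes) != 0) {
        int err = errno;
        free(buf);
        return err;
    }

    *out = buf;
    return 0;
}
```

Running several processes that each call this with a size just under
what a single mlock will accept is the "multiple allocate/mlock
operations" case from the first bullet.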

So, maybe this really is buggy behavior?

    --rustyl

On Thu, 2003-09-18 at 13:21, Rusty Lynch wrote:
> While getting more familiar with the vm subsystem I discovered that it is
> fairly easy to lock up my system by mlocking enough memory. I believe what
> is happening is that I am reducing the amount of swappable physical ram
> to the point that try_to_free_pages() will go into an endless loop waiting
> for bdflush to free up some pages.
> 
> I'm guessing this is not a valid condition for a properly configured server,
> but since I'm not feeling very confident about my above explanation, I'm not
> so sure this isn't something to look into.
> 
> On my 2.6.0-test5 kernel I run a little utility that attempts to allocate 
> a large enough chunk of memory, touch all pages in the buffer, and then 
> mlock the buffer.  Just setting vm.overcommit_memory=2 and a real low
> vm.overcommit_ratio doesn't help a lot since all I have to do is squeeze out
> the available physical ram that can be swapped out.
> 
> This is what I see for my offending process if I meta-sysrq-t.
> 
> fat_bastard   D 00000001 4293732848   598    550                     (NOTLB)
> cc9d3c78 00000082 c1285bc0 00000001 00000003 c1286580 c1285bc0 cc9d3c98
>        00000000 00000246 c014f520 cc9d3c6c cf033004 cf6ff000 00000007 00000000
>        00000000 ffff8258 cc9d3c8c 00000000 cc9d3cc4 c0134dde cc9d3c8c ffff8258
> Call Trace:
>  [<c014f520>] background_writeout+0x0/0xe0
>  [<c0134dde>] schedule_timeout+0x6e/0xc0
>  [<c0134d60>] process_timeout+0x0/0x10
>  [<c012793b>] io_schedule_timeout+0x2b/0x40
>  [<c031d2bb>] blk_congestion_wait+0x8b/0xa0
>  [<c0128c30>] autoremove_wake_function+0x0/0x50
>  [<c0128c30>] autoremove_wake_function+0x0/0x50
>  [<c01581f2>] try_to_free_pages+0x102/0x1c0
>  [<c014e1a7>] __alloc_pages+0x1f7/0x3a0
>  [<c0166d31>] read_swap_cache_async+0xb1/0xbd
>  [<c015b8b2>] swapin_readahead+0x42/0x90
>  [<c015bb68>] do_swap_page+0x268/0x340
>  [<c011007b>] save_v86_state+0x4b/0x200
>  [<c015c521>] handle_mm_fault+0xf1/0x200
>  [<c015ab1e>] get_user_pages+0xee/0x3a0
>  [<c015f18d>] insert_vm_struct+0x6d/0x77
>  [<c015c74d>] make_pages_present+0x8d/0xa0
>  [<c015cd24>] mlock_fixup+0xe4/0x120
>  [<c0280e94>] capable+0x24/0x50
>  [<c015ce49>] do_mlock+0xe9/0x110
>  [<c015cf37>] sys_mlock+0xc7/0xe0
>  [<c010c873>] syscall_call+0x7/0xb
> 
> If I attempt to kill all processes with meta-sysrq-i, then I start seeing init
> stuck in the same spot:
> 
> init          D 00000001 21838320   606      1                 605 (NOTLB)
> cea9fc5c 00000082 c1285bc0 00000001 00000003 c1286580 c1285bc0 cea9fc7c
>        00000000 00000246 c014f520 cea9fc50 ce3d0004 cf6ff000 00000007 00000000
>        00000000 00076d98 cea9fc70 00000000 cea9fca8 c0134dde cea9fc70 00076d98
> Call Trace:
>  [<c014f520>] background_writeout+0x0/0xe0
>  [<c0134dde>] schedule_timeout+0x6e/0xc0
>  [<c0134d60>] process_timeout+0x0/0x10
>  [<c012793b>] io_schedule_timeout+0x2b/0x40
>  [<c031d2bb>] blk_congestion_wait+0x8b/0xa0
>  [<c0128c30>] autoremove_wake_function+0x0/0x50
>  [<c0128c30>] autoremove_wake_function+0x0/0x50
>  [<c01581f2>] try_to_free_pages+0x102/0x1c0
>  [<c014e1a7>] __alloc_pages+0x1f7/0x3a0
>  [<c0150982>] __do_page_cache_readahead+0x182/0x21e
>  [<c014b18f>] filemap_nopage+0x11f/0x330
>  [<c015bff1>] do_no_page+0xd1/0x3f0
>  [<c015c548>] handle_mm_fault+0x118/0x200
>  [<c0123886>] do_page_fault+0x176/0x4dc
>  [<c0138c91>] sigprocmask+0x71/0x150
>  [<c0138e11>] sys_rt_sigprocmask+0xa1/0x1e0
>  [<c0123710>] do_page_fault+0x0/0x4dc
>  [<c010d2dd>] error_code+0x2d/0x38
> 
> The current process (as seen via meta-sysrq-p) seems to always be the swapper:
> Pid: 0, comm:              swapper
> EIP: 0060:[<c010a070>] CPU: 0
> EIP is at default_idle+0x30/0x40
>  EFLAGS: 00000246    Not tainted
> EAX: 00000000 EBX: c0600000 ECX: 001d9b2e EDX: c0600000
> ESI: c0600000 EDI: c010a040 EBP: c0601fb4 DS: 007b ES: 007b
> CR0: 8005003b CR2: 0804d6a0 CR3: 0b9b8000 CR4: 00000680
> Call Trace:
>  [<c010a106>] cpu_idle+0x46/0x50
>  [<c0105000>] rest_init+0x0/0x80
>  [<c0602961>] start_kernel+0x181/0x1b0
>  [<c0602500>] unknown_bootoption+0x0/0x100
> 
> I also noticed that try_to_free_pages() is ignoring the return value of
> wakeup_bdflush(), so for kicks I made this change:
> 
> -        wakeup_bdflush(total_scanned);
> +        WARN_ON(wakeup_bdflush(total_scanned));
> 
> After my system is nicely locked up, I start seeing tons of warnings
> like:
> 
> Badness in try_to_free_pages at mm/vmscan.c:886
> Call Trace:
>  [<c01582b8>] try_to_free_pages+0x1c8/0x1e0
>  [<c014e1a7>] __alloc_pages+0x1f7/0x3a0
>  [<c014e372>] __get_free_pages+0x22/0x50
>  [<c0152385>] cache_grow+0x125/0x400
>  [<c013437c>] del_timer_sync+0x2c/0x80
>  [<c0124819>] kernel_map_pages+0x29/0x64
>  [<c015279a>] cache_alloc_refill+0x13a/0x4c0
>  [<c0153185>] kmem_cache_alloc+0x1b5/0x1e0
>  [<c017ca59>] getname+0x29/0xd0
>  [<c017e28b>] __user_walk+0x1b/0x60
>  [<c018319e>] select_bits_alloc+0x1e/0x30
>  [<c01785ce>] vfs_stat+0x1e/0x60
>  [<c01833fb>] sys_select+0x23b/0x520
>  [<c0178ccb>] sys_stat64+0x1b/0x40
>  [<c012f105>] sys_time+0x35/0x70
>  [<c010c873>] syscall_call+0x7/0xb
> 
> 
> So... is my explanation on target?  Is this a condition that would really
> only pop up in crazy stress testing?  If not then maybe sys_mlock should have
> an additional threshold?
> 
>     --rustyl
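
On the threshold idea at the end of the quoted message: since several
smaller mlock calls reach the same lockup on 2.4.18, a check on a single
request would probably not be enough, and any threshold would have to
count pages that are already locked. A rough userspace sketch of such a
policy check follows. It is not kernel code, and the function name, the
parameters and the percentage-based limit are all assumptions made for
illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical policy check (not kernel code): allow a lock request only
 * if the pages already locked plus the new request stay within `percent`
 * of the total number of RAM pages. Returns 1 if allowed, 0 otherwise.
 */
static int lock_request_allowed(size_t locked_pages, size_t request_pages,
                                size_t total_pages, unsigned percent)
{
    size_t limit;

    assert(percent <= 100);

    /*
     * Divide before multiplying to avoid overflow on large page counts.
     * This rounds the limit down, which errs on the strict side.
     */
    limit = (total_pages / 100) * percent;

    /* Compare without computing locked_pages + request_pages directly. */
    if (locked_pages > limit)
        return 0;
    return request_pages <= limit - locked_pages;
}
```

With 1000 total pages and a 50 percent limit, a request for 100 pages is
allowed when 400 pages are already locked, and a request for 101 pages
is refused.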



Thread overview: 3+ messages
2003-09-18 20:21 swapping to death by stressing mlock Rusty Lynch
2003-09-18 20:46 ` Rusty Lynch [this message]
2003-09-18 21:15 ` William Lee Irwin III