From: Minchan Kim <minchan@kernel.org>
To: Luigi Semenzato <semenzato@google.com>
Cc: David Rientjes <rientjes@google.com>,
linux-mm@kvack.org, Dan Magenheimer <dan.magenheimer@oracle.com>,
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
Sonny Rao <sonnyrao@google.com>
Subject: Re: zram OOM behavior
Date: Wed, 31 Oct 2012 09:57:38 +0900 [thread overview]
Message-ID: <20121031005738.GM15767@bbox> (raw)
In-Reply-To: <CAA25o9Tp5J6-9JzwEfcZJ4dHQCEKV9_GYO0ZQ05Ttc3QWP=5_Q@mail.gmail.com>
Hi Luigi,
On Tue, Oct 30, 2012 at 12:12:02PM -0700, Luigi Semenzato wrote:
> On Mon, Oct 29, 2012 at 10:41 PM, David Rientjes <rientjes@google.com> wrote:
> > On Mon, 29 Oct 2012, Luigi Semenzato wrote:
> >
> >> However, now there is something that worries me more. The trace of
> >> the thread with TIF_MEMDIE set shows that it has executed most of
> >> do_exit() and appears to be waiting to be reaped. From my reading of
> >> the code, this implies that task->exit_state should be non-zero, which
> >> means that select_bad_process should have skipped that thread, which
> >> means that we cannot be in the deadlock situation, and my experiments
> >> are not consistent.
> >>
> >
> > Yeah, this is what I was referring to earlier, select_bad_process() will
> > not consider the thread for which you posted a stack trace for oom kill,
> > so it's not deferring because of it. There are either other thread(s)
> > that have been oom killed and have not yet release their memory or the oom
> > killer is never being called.
>
> Thanks. I now have better information on what's happening.
>
> The "culprit" is not the OOM-killed process (the one with TIF_MEMDIE
> set). It's another process that's exiting for some other reason.
>
> select_bad_process() checks for thread->exit_state at the beginning,
> and skips processes that are exiting. But later it checks for
> p->flags & PF_EXITING, and can return -1 in that case (and it does for
> me).
>
> It turns out that do_exit() does a lot of things between setting the
> thread->flags PF_EXITING bit (in exit_signals()) and setting
> thread->exit_state to non-zero (in exit_notify()). Some of those
> things apparently need memory. I caught one process responsible for
> the PTR_ERR(-1) while it was doing this:
>
> [ 191.859358] VC manager R running 0 2388 1108 0x00000104
> [ 191.859377] err_ptr_count = 45623
> [ 191.859384] e0611b1c 00200086 f5608000 815ecd20 815ecd20 a0a9ebc3
> 0000002c f67cfd20
> [ 191.859407] f430a060 81191c34 e0611aec 81196d79 4168ef20 00000001
> e1302400 e130264c
> [ 191.859428] e1302400 e0611af4 813b71d5 e0611b00 810b42f1 e1302400
> e0611b0c 810b430e
> [ 191.859450] Call Trace:
> [ 191.859465] [<81191c34>] ? __delay+0xe/0x10
> [ 191.859478] [<81196d79>] ? do_raw_spin_lock+0xa2/0xf3
> [ 191.859491] [<813b71d5>] ? _raw_spin_unlock+0xd/0xf
> [ 191.859504] [<810b42f1>] ? put_super+0x26/0x29
> [ 191.859515] [<810b430e>] ? drop_super+0x1a/0x1d
> [ 191.859527] [<8104512d>] __cond_resched+0x1b/0x2b
> [ 191.859537] [<813b67a7>] _cond_resched+0x18/0x21
> [ 191.859549] [<81093940>] shrink_slab+0x224/0x22f
> [ 191.859562] [<81095a96>] try_to_free_pages+0x1b7/0x2e6
> [ 191.859574] [<8108df2a>] __alloc_pages_nodemask+0x40a/0x61f
> [ 191.859588] [<810a9dbe>] read_swap_cache_async+0x4a/0xcf
> [ 191.859600] [<810a9ea4>] swapin_readahead+0x61/0x8d
> [ 191.859612] [<8109fff4>] handle_pte_fault+0x310/0x5fb
> [ 191.859624] [<810a0420>] handle_mm_fault+0xae/0xbd
> [ 191.859637] [<8101d0f9>] do_page_fault+0x265/0x284
> [ 191.859648] [<8104aa17>] ? dequeue_entity+0x236/0x252
> [ 191.859660] [<8101ce94>] ? vmalloc_sync_all+0xa/0xa
> [ 191.859672] [<813b7887>] error_code+0x67/0x6c
> [ 191.859683] [<81191d21>] ? __get_user_4+0x11/0x17
> [ 191.859695] [<81059f28>] ? exit_robust_list+0x30/0x105
> [ 191.859707] [<813b71b0>] ? _raw_spin_unlock_irq+0xd/0x10
> [ 191.859718] [<810446d5>] ? finish_task_switch+0x53/0x89
> [ 191.859730] [<8102351d>] mm_release+0x1d/0xc3
> [ 191.859740] [<81026ce9>] exit_mm+0x1d/0xe9
> [ 191.859750] [<81032b87>] ? exit_signals+0x57/0x10a
> [ 191.859760] [<81028082>] do_exit+0x19b/0x640
> [ 191.859770] [<81058598>] ? futex_wait_queue_me+0xaa/0xbe
> [ 191.859781] [<81030bbf>] ? recalc_sigpending_tsk+0x51/0x5c
> [ 191.859793] [<81030beb>] ? recalc_sigpending+0x17/0x3e
> [ 191.859803] [<81028752>] do_group_exit+0x63/0x86
> [ 191.859813] [<81032b19>] get_signal_to_deliver+0x434/0x44b
> [ 191.859825] [<81001e01>] do_signal+0x37/0x4fe
> [ 191.859837] [<81048eed>] ? set_next_entity+0x36/0x9d
> [ 191.859850] [<81050d8e>] ? timekeeping_get_ns+0x11/0x55
> [ 191.859861] [<8105a754>] ? sys_futex+0xcb/0xdb
> [ 191.859871] [<810024a7>] do_notify_resume+0x26/0x65
> [ 191.859883] [<813b73a5>] work_notifysig+0xa/0x11
> [ 191.859893] Kernel panic - not syncing: too many ERR_PTR
>
> I don't know why mm_release() would page fault, but it looks like it does.
>
> So the OOM killer will not kill other processes because it thinks a
> process is exiting, which will free up memory. But the exiting
> process needs memory to continue exiting --> deadlock. Sounds
> plausible?
It sounds right in your kernel but principal problem is min_filelist_kbytes patch.
If normal exited process in exit path requires a page and there is no free page
any more, it ends up going to OOM path after try to reclaim memory several time.
Then,
In select_bad_process,
if (task->flags & PF_EXITING) {
if (task == current) <== true
return OOM_SCAN_SELECT;
In oom_kill_process,
if (p->flags & PF_EXITING)
set_tsk_thread_flag(p, TIF_MEMDIE);
At last, normal exited process would get a free page.
But in your kernel, it seems not because I guess did_some_progress in
__alloc_pages_direct_reclaim is never 0. The why it is never 0 is
do_try_to_free_pages's all_unreclaimable can't do his role by your
min_filelist_kbytes. It makes __alloc_pages_slowpath's looping forever.
Sounds plausible?
>
> OK, now someone is going to fix this, right? :-)
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
--
Kind regards,
Minchan Kim
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2012-10-31 0:51 UTC|newest]
Thread overview: 67+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-09-28 17:32 zram OOM behavior Luigi Semenzato
2012-10-03 13:30 ` Konrad Rzeszutek Wilk
[not found] ` <CAA25o9SwO209DD6CUx-LzhMt9XU6niGJ-fBPmgwfcrUvf0BPWA@mail.gmail.com>
2012-10-12 23:30 ` Luigi Semenzato
2012-10-15 14:44 ` Minchan Kim
2012-10-15 18:54 ` Luigi Semenzato
2012-10-16 6:18 ` Minchan Kim
2012-10-16 17:36 ` Luigi Semenzato
2012-10-19 17:49 ` Luigi Semenzato
2012-10-22 23:53 ` Minchan Kim
2012-10-23 0:40 ` Luigi Semenzato
2012-10-23 6:03 ` David Rientjes
2012-10-29 18:26 ` Luigi Semenzato
2012-10-29 19:00 ` David Rientjes
2012-10-29 22:36 ` Luigi Semenzato
2012-10-29 22:52 ` David Rientjes
2012-10-29 23:23 ` Luigi Semenzato
2012-10-29 23:34 ` Luigi Semenzato
2012-10-30 0:18 ` Minchan Kim
2012-10-30 0:45 ` Luigi Semenzato
2012-10-30 5:41 ` David Rientjes
2012-10-30 19:12 ` Luigi Semenzato
2012-10-30 20:30 ` Luigi Semenzato
2012-10-30 22:32 ` Luigi Semenzato
2012-10-31 18:42 ` David Rientjes
2012-10-30 22:37 ` Sonny Rao
2012-10-31 4:46 ` David Rientjes
2012-10-31 6:14 ` Luigi Semenzato
2012-10-31 6:28 ` Luigi Semenzato
2012-10-31 18:45 ` David Rientjes
2012-10-31 0:57 ` Minchan Kim [this message]
2012-10-31 1:06 ` Luigi Semenzato
2012-10-31 1:27 ` Minchan Kim
2012-10-31 3:49 ` Luigi Semenzato
2012-10-31 7:24 ` Minchan Kim
2012-10-31 16:07 ` Luigi Semenzato
2012-10-31 17:49 ` Mandeep Singh Baines
2012-10-31 18:54 ` David Rientjes
2012-10-31 21:40 ` Luigi Semenzato
2012-11-01 2:11 ` Minchan Kim
2012-11-01 4:38 ` David Rientjes
2012-11-01 5:18 ` Minchan Kim
2012-11-01 2:43 ` Minchan Kim
2012-11-01 4:48 ` David Rientjes
2012-11-01 5:26 ` Minchan Kim
2012-11-01 8:28 ` Mel Gorman
2012-11-01 15:57 ` Luigi Semenzato
2012-11-01 15:58 ` Luigi Semenzato
2012-11-01 21:48 ` David Rientjes
2012-11-01 17:50 ` Luigi Semenzato
2012-11-01 21:50 ` David Rientjes
2012-11-01 21:58 ` [patch] mm, oom: allow exiting threads to have access to memory reserves David Rientjes
2012-11-01 22:43 ` Andrew Morton
2012-11-01 23:05 ` David Rientjes
2012-11-01 23:06 ` Luigi Semenzato
2012-11-01 22:04 ` zram OOM behavior Luigi Semenzato
2012-11-01 22:25 ` David Rientjes
-- strict thread matches above, loose matches on Subject: below --
2012-11-02 6:39 Minchan Kim
2012-11-02 8:30 ` Mel Gorman
2012-11-02 22:36 ` Minchan Kim
2012-11-05 14:46 ` Mel Gorman
2012-11-06 0:25 ` Minchan Kim
2012-11-06 8:58 ` Mel Gorman
2012-11-06 10:17 ` Minchan Kim
2012-11-09 9:50 ` Mel Gorman
2012-11-12 13:32 ` Minchan Kim
2012-11-12 14:06 ` Mel Gorman
2012-11-13 13:31 ` Minchan Kim
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20121031005738.GM15767@bbox \
--to=minchan@kernel.org \
--cc=dan.magenheimer@oracle.com \
--cc=kosaki.motohiro@jp.fujitsu.com \
--cc=linux-mm@kvack.org \
--cc=rientjes@google.com \
--cc=semenzato@google.com \
--cc=sonnyrao@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).