From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ig0-f169.google.com (mail-ig0-f169.google.com [209.85.213.169]) by kanga.kvack.org (Postfix) with ESMTP id 968126B006C for ; Mon, 15 Dec 2014 18:02:10 -0500 (EST) Received: by mail-ig0-f169.google.com with SMTP id hl2so6803837igb.0 for ; Mon, 15 Dec 2014 15:02:10 -0800 (PST) Received: from mail.linuxfoundation.org (mail.linuxfoundation.org. [140.211.169.12]) by mx.google.com with ESMTPS id gb19si7782252icb.23.2014.12.15.15.02.08 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 15 Dec 2014 15:02:09 -0800 (PST) Received: from akpm3.mtv.corp.google.com (unknown [216.239.45.95]) by mail.linuxfoundation.org (Postfix) with ESMTPSA id 2CA37ACD for ; Mon, 15 Dec 2014 23:02:08 +0000 (UTC) Date: Mon, 15 Dec 2014 15:02:07 -0800 From: Andrew Morton Subject: Stalled MM patches for review Message-Id: <20141215150207.67c9a25583c04202d9f4508e@linux-foundation.org> Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: linux-mm@kvack.org I'm sitting on a bunch of patches which have question marks over them. I'll send them out now. Can people please dig in and see if we can get them finished off one way or the other? My notes (which may be out of date): mm-page_isolation-check-pfn-validity-before-access.patch: - Might be unneeded. mhocko has issues. mm-page_allocc-__alloc_pages_nodemask-dont-alter-arg-gfp_mask.patch: - Needs review and checking mm-page_alloc-embed-oom-killing-naturally-into-allocation-slowpath.patch: - mhocko wanted a changelog update mm-fix-invalid-use-of-pfn_valid_within-in-test_pages_in_a_zone.patch: - Yasuaki Ishimatsu has issues with it mm-introduce-do_shared_fault-and-drop-do_fault-fix-fix.patch: - Adds a comment whcih might not be true? fs-mpagec-forgotten-write_sync-in-case-of-data-integrity-write.patch: - Unsure whether or not this helps. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pd0-f173.google.com (mail-pd0-f173.google.com [209.85.192.173]) by kanga.kvack.org (Postfix) with ESMTP id 7E7566B008C for ; Mon, 15 Dec 2014 18:57:16 -0500 (EST) Received: by mail-pd0-f173.google.com with SMTP id ft15so12658919pdb.4 for ; Mon, 15 Dec 2014 15:57:16 -0800 (PST) Received: from fgwmail6.fujitsu.co.jp (fgwmail6.fujitsu.co.jp. [192.51.44.36]) by mx.google.com with ESMTPS id l11si16099597pdj.98.2014.12.15.15.57.14 for (version=TLSv1 cipher=RC4-SHA bits=128/128); Mon, 15 Dec 2014 15:57:15 -0800 (PST) Received: from kw-mxauth.gw.nic.fujitsu.com (unknown [10.0.237.134]) by fgwmail6.fujitsu.co.jp (Postfix) with ESMTP id 9F2603EE0AE for ; Tue, 16 Dec 2014 08:57:12 +0900 (JST) Received: from s1.gw.fujitsu.co.jp (s1.gw.fujitsu.co.jp [10.0.50.91]) by kw-mxauth.gw.nic.fujitsu.com (Postfix) with ESMTP id ADC22AC0453 for ; Tue, 16 Dec 2014 08:57:11 +0900 (JST) Received: from g01jpfmpwyt01.exch.g01.fujitsu.local (g01jpfmpwyt01.exch.g01.fujitsu.local [10.128.193.38]) by s1.gw.fujitsu.co.jp (Postfix) with ESMTP id 5A3E9E08001 for ; Tue, 16 Dec 2014 08:57:11 +0900 (JST) Message-ID: <548F7541.8040407@jp.fujitsu.com> Date: Tue, 16 Dec 2014 08:56:49 +0900 From: Yasuaki Ishimatsu MIME-Version: 1.0 Subject: Re: Stalled MM patches for review References: <20141215150207.67c9a25583c04202d9f4508e@linux-foundation.org> In-Reply-To: <20141215150207.67c9a25583c04202d9f4508e@linux-foundation.org> Content-Type: text/plain; charset="ISO-8859-1"; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: linux-mm@kvack.org Cc: Andrew Morton (2014/12/16 8:02), Andrew Morton wrote: > > I'm sitting on a bunch of patches which have question marks over them. > I'll send them out now. Can people please dig in and see if we can get > them finished off one way or the other? > > My notes (which may be out of date): > Here are the threads of each discussion. Please use them as a reference. > mm-page_isolation-check-pfn-validity-before-access.patch: > - Might be unneeded. mhocko has issues. https://lkml.org/lkml/2014/11/6/79 > > mm-page_allocc-__alloc_pages_nodemask-dont-alter-arg-gfp_mask.patch: > - Needs review and checking > mm-page_alloc-embed-oom-killing-naturally-into-allocation-slowpath.patch: > - mhocko wanted a changelog update https://lkml.org/lkml/2014/12/4/697 > > mm-fix-invalid-use-of-pfn_valid_within-in-test_pages_in_a_zone.patch: > - Yasuaki Ishimatsu has issues with it https://lkml.org/lkml/2014/12/9/482 > > mm-introduce-do_shared_fault-and-drop-do_fault-fix-fix.patch: > - Adds a comment whcih might not be true? > > fs-mpagec-forgotten-write_sync-in-case-of-data-integrity-write.patch: > - Unsure whether or not this helps. https://lkml.org/lkml/2014/2/15/245 Thanks, Yasuaki Ishimatsu > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: email@kvack.org > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wg0-f45.google.com (mail-wg0-f45.google.com [74.125.82.45]) by kanga.kvack.org (Postfix) with ESMTP id 3E59D6B0032 for ; Mon, 15 Dec 2014 22:07:03 -0500 (EST) Received: by mail-wg0-f45.google.com with SMTP id b13so16377413wgh.18 for ; Mon, 15 Dec 2014 19:07:02 -0800 (PST) Received: from gum.cmpxchg.org (gum.cmpxchg.org. [85.214.110.215]) by mx.google.com with ESMTPS id vb8si19384919wjc.134.2014.12.15.19.07.02 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 15 Dec 2014 19:07:02 -0800 (PST) Date: Mon, 15 Dec 2014 22:06:58 -0500 From: Johannes Weiner Subject: Re: Stalled MM patches for review Message-ID: <20141216030658.GA18569@phnom.home.cmpxchg.org> References: <20141215150207.67c9a25583c04202d9f4508e@linux-foundation.org> <548F7541.8040407@jp.fujitsu.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <548F7541.8040407@jp.fujitsu.com> Sender: owner-linux-mm@kvack.org List-ID: To: Yasuaki Ishimatsu Cc: linux-mm@kvack.org, Andrew Morton On Tue, Dec 16, 2014 at 08:56:49AM +0900, Yasuaki Ishimatsu wrote: > (2014/12/16 8:02), Andrew Morton wrote: > > > >I'm sitting on a bunch of patches which have question marks over them. > >I'll send them out now. Can people please dig in and see if we can get > >them finished off one way or the other? > > > >My notes (which may be out of date): > > >mm-page_alloc-embed-oom-killing-naturally-into-allocation-slowpath.patch: > > - mhocko wanted a changelog update > > https://lkml.org/lkml/2014/12/4/697 Thanks Yasuaki-san! Andrew, here is an updated version of this patch with a more detailed changelog. Must have fallen through the cracks: --- From: Johannes Weiner Subject: [patch] mm: page_alloc: embed OOM killing naturally into allocation slowpath Date: Fri, 5 Dec 2014 09:48:13 -0500 Message-Id: <1417790893-32010-1-git-send-email-hannes@cmpxchg.org> The OOM killing invocation does a lot of duplicative checks against the task's allocation context. Rework it to take advantage of the existing checks in the allocator slowpath. The OOM killer is invoked when the allocator is unable to reclaim any pages but the allocation has to keep looping. Instead of having a check for __GFP_NORETRY hidden in oom_gfp_allowed(), just move the OOM invocation to the true branch of should_alloc_retry(). The __GFP_FS check from oom_gfp_allowed() can then be moved into the OOM avoidance branch in __alloc_pages_may_oom(), along with the PF_DUMPCORE test. __alloc_pages_may_oom() can then signal to the caller whether the OOM killer was invoked, instead of requiring it to duplicate the order and high_zoneidx checks to guess this when deciding whether to continue. Signed-off-by: Johannes Weiner Acked-by: Michal Hocko --- include/linux/oom.h | 5 ---- mm/page_alloc.c | 78 +++++++++++++++++++++++------------------------------ 2 files changed, 33 insertions(+), 50 deletions(-) diff --git a/include/linux/oom.h b/include/linux/oom.h index e8d6e1058723..4971874f54db 100644 --- a/include/linux/oom.h +++ b/include/linux/oom.h @@ -85,11 +85,6 @@ static inline void oom_killer_enable(void) oom_killer_disabled = false; } -static inline bool oom_gfp_allowed(gfp_t gfp_mask) -{ - return (gfp_mask & __GFP_FS) && !(gfp_mask & __GFP_NORETRY); -} - extern struct task_struct *find_lock_task_mm(struct task_struct *p); /* sysctls */ diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 616a2c956b4b..88b64c09a8c0 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -2232,12 +2232,21 @@ static inline struct page * __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order, struct zonelist *zonelist, enum zone_type high_zoneidx, nodemask_t *nodemask, struct zone *preferred_zone, - int classzone_idx, int migratetype) + int classzone_idx, int migratetype, unsigned long *did_some_progress) { struct page *page; - /* Acquire the per-zone oom lock for each zone */ + *did_some_progress = 0; + + if (oom_killer_disabled) + return NULL; + + /* + * Acquire the per-zone oom lock for each zone. If that + * fails, somebody else is making progress for us. + */ if (!oom_zonelist_trylock(zonelist, gfp_mask)) { + *did_some_progress = 1; schedule_timeout_uninterruptible(1); return NULL; } @@ -2263,12 +2272,18 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order, goto out; if (!(gfp_mask & __GFP_NOFAIL)) { + /* Coredumps can quickly deplete all memory reserves */ + if (current->flags & PF_DUMPCORE) + goto out; /* The OOM killer will not help higher order allocs */ if (order > PAGE_ALLOC_COSTLY_ORDER) goto out; /* The OOM killer does not needlessly kill tasks for lowmem */ if (high_zoneidx < ZONE_NORMAL) goto out; + /* The OOM killer does not compensate for light reclaim */ + if (!(gfp_mask & __GFP_FS)) + goto out; /* * GFP_THISNODE contains __GFP_NORETRY and we never hit this. * Sanity check for bare calls of __GFP_THISNODE, not real OOM. @@ -2281,7 +2296,7 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order, } /* Exhausted what can be done so it's blamo time */ out_of_memory(zonelist, gfp_mask, order, nodemask, false); - + *did_some_progress = 1; out: oom_zonelist_unlock(zonelist, gfp_mask); return page; @@ -2571,7 +2586,6 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, (gfp_mask & GFP_THISNODE) == GFP_THISNODE) goto nopage; -restart: if (!(gfp_mask & __GFP_NO_KSWAPD)) wake_all_kswapds(order, zonelist, high_zoneidx, preferred_zone, nodemask); @@ -2701,51 +2715,25 @@ rebalance: if (page) goto got_pg; - /* - * If we failed to make any progress reclaiming, then we are - * running out of options and have to consider going OOM - */ - if (!did_some_progress) { - if (oom_gfp_allowed(gfp_mask)) { - if (oom_killer_disabled) - goto nopage; - /* Coredumps can quickly deplete all memory reserves */ - if ((current->flags & PF_DUMPCORE) && - !(gfp_mask & __GFP_NOFAIL)) - goto nopage; - page = __alloc_pages_may_oom(gfp_mask, order, - zonelist, high_zoneidx, - nodemask, preferred_zone, - classzone_idx, migratetype); - if (page) - goto got_pg; - - if (!(gfp_mask & __GFP_NOFAIL)) { - /* - * The oom killer is not called for high-order - * allocations that may fail, so if no progress - * is being made, there are no other options and - * retrying is unlikely to help. - */ - if (order > PAGE_ALLOC_COSTLY_ORDER) - goto nopage; - /* - * The oom killer is not called for lowmem - * allocations to prevent needlessly killing - * innocent tasks. - */ - if (high_zoneidx < ZONE_NORMAL) - goto nopage; - } - - goto restart; - } - } - /* Check if we should retry the allocation */ pages_reclaimed += did_some_progress; if (should_alloc_retry(gfp_mask, order, did_some_progress, pages_reclaimed)) { + /* + * If we fail to make progress by freeing individual + * pages, but the allocation wants us to keep going, + * start OOM killing tasks. + */ + if (!did_some_progress) { + page = __alloc_pages_may_oom(gfp_mask, order, zonelist, + high_zoneidx, nodemask, + preferred_zone, classzone_idx, + migratetype,&did_some_progress); + if (page) + goto got_pg; + if (!did_some_progress) + goto nopage; + } /* Wait for some write requests to complete then retry */ wait_iff_congested(preferred_zone, BLK_RW_ASYNC, HZ/50); goto rebalance; -- 2.1.3 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ig0-f169.google.com (mail-ig0-f169.google.com [209.85.213.169]) by kanga.kvack.org (Postfix) with ESMTP id 31E966B006C for ; Tue, 16 Dec 2014 20:07:20 -0500 (EST) Received: by mail-ig0-f169.google.com with SMTP id hl2so8693089igb.2 for ; Tue, 16 Dec 2014 17:07:20 -0800 (PST) Received: from mail-ie0-x230.google.com (mail-ie0-x230.google.com. [2607:f8b0:4001:c03::230]) by mx.google.com with ESMTPS id z5si2473854igl.33.2014.12.16.17.07.18 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Tue, 16 Dec 2014 17:07:19 -0800 (PST) Received: by mail-ie0-f176.google.com with SMTP id tr6so13554366ieb.7 for ; Tue, 16 Dec 2014 17:07:18 -0800 (PST) Date: Tue, 16 Dec 2014 17:07:16 -0800 (PST) From: David Rientjes Subject: Re: Stalled MM patches for review In-Reply-To: <20141216030658.GA18569@phnom.home.cmpxchg.org> Message-ID: References: <20141215150207.67c9a25583c04202d9f4508e@linux-foundation.org> <548F7541.8040407@jp.fujitsu.com> <20141216030658.GA18569@phnom.home.cmpxchg.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-linux-mm@kvack.org List-ID: To: Johannes Weiner Cc: Yasuaki Ishimatsu , linux-mm@kvack.org, Andrew Morton On Mon, 15 Dec 2014, Johannes Weiner wrote: > diff --git a/include/linux/oom.h b/include/linux/oom.h > index e8d6e1058723..4971874f54db 100644 > --- a/include/linux/oom.h > +++ b/include/linux/oom.h > @@ -85,11 +85,6 @@ static inline void oom_killer_enable(void) > oom_killer_disabled = false; > } > > -static inline bool oom_gfp_allowed(gfp_t gfp_mask) > -{ > - return (gfp_mask & __GFP_FS) && !(gfp_mask & __GFP_NORETRY); > -} > - Hmm, my patch which removed this already seems to have been yanked from -mm because this is seen as an alternative. Not sure why it couldn't have been rebased on top of it. > extern struct task_struct *find_lock_task_mm(struct task_struct *p); > > /* sysctls */ > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 616a2c956b4b..88b64c09a8c0 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -2232,12 +2232,21 @@ static inline struct page * > __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order, > struct zonelist *zonelist, enum zone_type high_zoneidx, > nodemask_t *nodemask, struct zone *preferred_zone, > - int classzone_idx, int migratetype) > + int classzone_idx, int migratetype, unsigned long *did_some_progress) > { > struct page *page; > > - /* Acquire the per-zone oom lock for each zone */ > + *did_some_progress = 0; I initially didn't care much for using did_some_progress as a boolean value for __alloc_pages_may_oom() to determine whether the oom killer (either already running or having been called) should lead to future memory freeing, but its use for reclaim is similar with the exception that we don't need to actually know how much memory was freed since this check is now already under should_alloc_retry(). Pretty creative. > + > + if (oom_killer_disabled) > + return NULL; > + > + /* > + * Acquire the per-zone oom lock for each zone. If that > + * fails, somebody else is making progress for us. > + */ > if (!oom_zonelist_trylock(zonelist, gfp_mask)) { > + *did_some_progress = 1; > schedule_timeout_uninterruptible(1); Aside, outside the scope of this particular patch: I think this should probably be schedule_timeout_killable(1) instead in the case where current has already been killed as a result of the pending oom kill. It's probably not a huge deal since we'll drop the oom zonelist locks almost immediately after that and we just have to wait to be scheduled again, but it is probably more correct to do schedule_timeout_killable(1). > return NULL; > } > @@ -2263,12 +2272,18 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order, > goto out; > > if (!(gfp_mask & __GFP_NOFAIL)) { > + /* Coredumps can quickly deplete all memory reserves */ > + if (current->flags & PF_DUMPCORE) > + goto out; > /* The OOM killer will not help higher order allocs */ > if (order > PAGE_ALLOC_COSTLY_ORDER) > goto out; > /* The OOM killer does not needlessly kill tasks for lowmem */ > if (high_zoneidx < ZONE_NORMAL) > goto out; > + /* The OOM killer does not compensate for light reclaim */ > + if (!(gfp_mask & __GFP_FS)) > + goto out; > /* > * GFP_THISNODE contains __GFP_NORETRY and we never hit this. > * Sanity check for bare calls of __GFP_THISNODE, not real OOM. > @@ -2281,7 +2296,7 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order, > } > /* Exhausted what can be done so it's blamo time */ > out_of_memory(zonelist, gfp_mask, order, nodemask, false); > - > + *did_some_progress = 1; > out: > oom_zonelist_unlock(zonelist, gfp_mask); > return page; > @@ -2571,7 +2586,6 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, > (gfp_mask & GFP_THISNODE) == GFP_THISNODE) > goto nopage; > > -restart: > if (!(gfp_mask & __GFP_NO_KSWAPD)) > wake_all_kswapds(order, zonelist, high_zoneidx, > preferred_zone, nodemask); > @@ -2701,51 +2715,25 @@ rebalance: > if (page) > goto got_pg; > > - /* > - * If we failed to make any progress reclaiming, then we are > - * running out of options and have to consider going OOM > - */ > - if (!did_some_progress) { > - if (oom_gfp_allowed(gfp_mask)) { > - if (oom_killer_disabled) > - goto nopage; > - /* Coredumps can quickly deplete all memory reserves */ > - if ((current->flags & PF_DUMPCORE) && > - !(gfp_mask & __GFP_NOFAIL)) > - goto nopage; > - page = __alloc_pages_may_oom(gfp_mask, order, > - zonelist, high_zoneidx, > - nodemask, preferred_zone, > - classzone_idx, migratetype); > - if (page) > - goto got_pg; > - > - if (!(gfp_mask & __GFP_NOFAIL)) { > - /* > - * The oom killer is not called for high-order > - * allocations that may fail, so if no progress > - * is being made, there are no other options and > - * retrying is unlikely to help. > - */ > - if (order > PAGE_ALLOC_COSTLY_ORDER) > - goto nopage; > - /* > - * The oom killer is not called for lowmem > - * allocations to prevent needlessly killing > - * innocent tasks. > - */ > - if (high_zoneidx < ZONE_NORMAL) > - goto nopage; > - } > - > - goto restart; > - } > - } > - > /* Check if we should retry the allocation */ > pages_reclaimed += did_some_progress; > if (should_alloc_retry(gfp_mask, order, did_some_progress, > pages_reclaimed)) { > + /* > + * If we fail to make progress by freeing individual > + * pages, but the allocation wants us to keep going, > + * start OOM killing tasks. > + */ > + if (!did_some_progress) { > + page = __alloc_pages_may_oom(gfp_mask, order, zonelist, > + high_zoneidx, nodemask, > + preferred_zone, classzone_idx, > + migratetype,&did_some_progress); Missing a space. > + if (page) > + goto got_pg; > + if (!did_some_progress) > + goto nopage; > + } > /* Wait for some write requests to complete then retry */ > wait_iff_congested(preferred_zone, BLK_RW_ASYNC, HZ/50); > goto rebalance; This is broken because it does not recall gfp_to_alloc_flags(). If current is the oom kill victim, then ALLOC_NO_WATERMARKS never gets set properly and the slowpath will end up looping forever. The "restart" label which was removed in this patch needs to be reintroduced, and it can probably be moved to directly before gfp_to_alloc_flags(). -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wg0-f53.google.com (mail-wg0-f53.google.com [74.125.82.53]) by kanga.kvack.org (Postfix) with ESMTP id B169A6B0032 for ; Tue, 16 Dec 2014 21:13:07 -0500 (EST) Received: by mail-wg0-f53.google.com with SMTP id l18so19009082wgh.12 for ; Tue, 16 Dec 2014 18:13:07 -0800 (PST) Received: from gum.cmpxchg.org (gum.cmpxchg.org. [85.214.110.215]) by mx.google.com with ESMTPS id cm1si5720752wib.51.2014.12.16.18.13.06 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 16 Dec 2014 18:13:06 -0800 (PST) Date: Tue, 16 Dec 2014 21:13:02 -0500 From: Johannes Weiner Subject: Re: Stalled MM patches for review Message-ID: <20141217021302.GA14148@phnom.home.cmpxchg.org> References: <20141215150207.67c9a25583c04202d9f4508e@linux-foundation.org> <548F7541.8040407@jp.fujitsu.com> <20141216030658.GA18569@phnom.home.cmpxchg.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Sender: owner-linux-mm@kvack.org List-ID: To: David Rientjes Cc: Yasuaki Ishimatsu , linux-mm@kvack.org, Andrew Morton On Tue, Dec 16, 2014 at 05:07:16PM -0800, David Rientjes wrote: > On Mon, 15 Dec 2014, Johannes Weiner wrote: > > /* Check if we should retry the allocation */ > > pages_reclaimed += did_some_progress; > > if (should_alloc_retry(gfp_mask, order, did_some_progress, > > pages_reclaimed)) { > > + /* > > + * If we fail to make progress by freeing individual > > + * pages, but the allocation wants us to keep going, > > + * start OOM killing tasks. > > + */ > > + if (!did_some_progress) { > > + page = __alloc_pages_may_oom(gfp_mask, order, zonelist, > > + high_zoneidx, nodemask, > > + preferred_zone, classzone_idx, > > + migratetype,&did_some_progress); > > Missing a space. That was because of the 80 character limit, it seemed preferrable over a linewrap. > > + if (page) > > + goto got_pg; > > + if (!did_some_progress) > > + goto nopage; > > + } > > /* Wait for some write requests to complete then retry */ > > wait_iff_congested(preferred_zone, BLK_RW_ASYNC, HZ/50); > > goto rebalance; > > This is broken because it does not recall gfp_to_alloc_flags(). If > current is the oom kill victim, then ALLOC_NO_WATERMARKS never gets set > properly and the slowpath will end up looping forever. The "restart" > label which was removed in this patch needs to be reintroduced, and it can > probably be moved to directly before gfp_to_alloc_flags(). Thanks for catching this. gfp_to_alloc_flags()'s name doesn't exactly imply such side effects... Here is a fixlet on top: --- From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wi0-f173.google.com (mail-wi0-f173.google.com [209.85.212.173]) by kanga.kvack.org (Postfix) with ESMTP id 530226B006E for ; Wed, 17 Dec 2014 10:52:06 -0500 (EST) Received: by mail-wi0-f173.google.com with SMTP id r20so16209708wiv.6 for ; Wed, 17 Dec 2014 07:52:05 -0800 (PST) Received: from mx2.suse.de (cantor2.suse.de. [195.135.220.15]) by mx.google.com with ESMTPS id hs6si7365031wjb.68.2014.12.17.07.52.05 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Wed, 17 Dec 2014 07:52:05 -0800 (PST) Date: Wed, 17 Dec 2014 16:51:56 +0100 From: Michal Hocko Subject: Re: Stalled MM patches for review Message-ID: <20141217155156.GA24889@dhcp22.suse.cz> References: <20141215150207.67c9a25583c04202d9f4508e@linux-foundation.org> <548F7541.8040407@jp.fujitsu.com> <20141216030658.GA18569@phnom.home.cmpxchg.org> <20141217021302.GA14148@phnom.home.cmpxchg.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20141217021302.GA14148@phnom.home.cmpxchg.org> Sender: owner-linux-mm@kvack.org List-ID: To: Johannes Weiner Cc: David Rientjes , Yasuaki Ishimatsu , linux-mm@kvack.org, Andrew Morton On Tue 16-12-14 21:13:02, Johannes Weiner wrote: [...] > From 45362d1920340716ef58bf1024d9674b5dfa809e Mon Sep 17 00:00:00 2001 > From: Johannes Weiner > Date: Tue, 16 Dec 2014 21:04:24 -0500 > Subject: [patch] mm: page_alloc: embed OOM killing naturally into allocation > slowpath fix > > When retrying the allocation after potentially invoking OOM, make sure > the alloc flags are recalculated, as they have to consider TIF_MEMDIE. > > Restore the original restart label, but rename it to 'retry' to match > the should_alloc_retry() it depends on. > > Signed-off-by: Johannes Weiner Missed that too. Well spotted! Acked-by: Michal Hocko > --- > mm/page_alloc.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 83ec725aec36..e8f5997c557c 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -2673,6 +2673,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, > (gfp_mask & GFP_THISNODE) == GFP_THISNODE) > goto nopage; > > +retry: > if (!(gfp_mask & __GFP_NO_KSWAPD)) > wake_all_kswapds(order, zonelist, high_zoneidx, > preferred_zone, nodemask); > @@ -2695,7 +2696,6 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, > classzone_idx = zonelist_zone_idx(preferred_zoneref); > } > > -rebalance: > /* This is the last chance, in general, before the goto nopage. */ > page = get_page_from_freelist(gfp_mask, nodemask, order, zonelist, > high_zoneidx, alloc_flags & ~ALLOC_NO_WATERMARKS, > @@ -2823,7 +2823,7 @@ rebalance: > } > /* Wait for some write requests to complete then retry */ > wait_iff_congested(preferred_zone, BLK_RW_ASYNC, HZ/50); > - goto rebalance; > + goto retry; > } else { > /* > * High-order allocations do not necessarily loop after > -- > 2.1.3 > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: email@kvack.org -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ig0-f178.google.com (mail-ig0-f178.google.com [209.85.213.178]) by kanga.kvack.org (Postfix) with ESMTP id 0EF396B0032 for ; Wed, 17 Dec 2014 17:28:41 -0500 (EST) Received: by mail-ig0-f178.google.com with SMTP id hl2so226152igb.11 for ; Wed, 17 Dec 2014 14:28:40 -0800 (PST) Received: from mail-ig0-x22f.google.com (mail-ig0-x22f.google.com. [2607:f8b0:4001:c05::22f]) by mx.google.com with ESMTPS id yz3si4515560icb.10.2014.12.17.14.28.39 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Wed, 17 Dec 2014 14:28:39 -0800 (PST) Received: by mail-ig0-f175.google.com with SMTP id h15so9585611igd.14 for ; Wed, 17 Dec 2014 14:28:39 -0800 (PST) Date: Wed, 17 Dec 2014 14:28:37 -0800 (PST) From: David Rientjes Subject: Re: Stalled MM patches for review In-Reply-To: <20141217021302.GA14148@phnom.home.cmpxchg.org> Message-ID: References: <20141215150207.67c9a25583c04202d9f4508e@linux-foundation.org> <548F7541.8040407@jp.fujitsu.com> <20141216030658.GA18569@phnom.home.cmpxchg.org> <20141217021302.GA14148@phnom.home.cmpxchg.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-linux-mm@kvack.org List-ID: To: Johannes Weiner Cc: Yasuaki Ishimatsu , linux-mm@kvack.org, Andrew Morton On Tue, 16 Dec 2014, Johannes Weiner wrote: > > This is broken because it does not recall gfp_to_alloc_flags(). If > > current is the oom kill victim, then ALLOC_NO_WATERMARKS never gets set > > properly and the slowpath will end up looping forever. The "restart" > > label which was removed in this patch needs to be reintroduced, and it can > > probably be moved to directly before gfp_to_alloc_flags(). > > Thanks for catching this. gfp_to_alloc_flags()'s name doesn't exactly > imply such side effects... Here is a fixlet on top: > It would have livelocked the machine on an oom kill. > --- > From 45362d1920340716ef58bf1024d9674b5dfa809e Mon Sep 17 00:00:00 2001 > From: Johannes Weiner > Date: Tue, 16 Dec 2014 21:04:24 -0500 > Subject: [patch] mm: page_alloc: embed OOM killing naturally into allocation > slowpath fix > > When retrying the allocation after potentially invoking OOM, make sure > the alloc flags are recalculated, as they have to consider TIF_MEMDIE. > > Restore the original restart label, but rename it to 'retry' to match > the should_alloc_retry() it depends on. > > Signed-off-by: Johannes Weiner > --- > mm/page_alloc.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 83ec725aec36..e8f5997c557c 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -2673,6 +2673,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, > (gfp_mask & GFP_THISNODE) == GFP_THISNODE) > goto nopage; > > +retry: > if (!(gfp_mask & __GFP_NO_KSWAPD)) > wake_all_kswapds(order, zonelist, high_zoneidx, > preferred_zone, nodemask); > @@ -2695,7 +2696,6 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, > classzone_idx = zonelist_zone_idx(preferred_zoneref); > } > > -rebalance: > /* This is the last chance, in general, before the goto nopage. */ > page = get_page_from_freelist(gfp_mask, nodemask, order, zonelist, > high_zoneidx, alloc_flags & ~ALLOC_NO_WATERMARKS, > @@ -2823,7 +2823,7 @@ rebalance: > } > /* Wait for some write requests to complete then retry */ > wait_iff_congested(preferred_zone, BLK_RW_ASYNC, HZ/50); > - goto rebalance; > + goto retry; > } else { > /* > * High-order allocations do not necessarily loop after Why remove 'rebalance'? In the situation where direct reclaim does free memory and we're waiting on writeback (no call to the oom killer is made), it doesn't seem necessary to recalculate classzone_idx. Additionally, we never called wait_iff_congested() before when the oom killer freed memory. This is a no-op if the preferred_zone isn't waiting on writeback, but seems pointless if we just freed memory by calling the oom killer. In other words, I'm not sure why you're fixlet isn't this: diff --git a/mm/page_alloc.c b/mm/page_alloc.c --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -2673,6 +2673,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, (gfp_mask & GFP_THISNODE) == GFP_THISNODE) goto nopage; +retry: if (!(gfp_mask & __GFP_NO_KSWAPD)) wake_all_kswapds(order, zonelist, high_zoneidx, preferred_zone, nodemask); @@ -2822,6 +2823,7 @@ rebalance: BUG_ON(gfp_mask & __GFP_NOFAIL); goto nopage; } + goto retry; } /* Wait for some write requests to complete then retry */ wait_iff_congested(preferred_zone, BLK_RW_ASYNC, HZ/50); -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wi0-f171.google.com (mail-wi0-f171.google.com [209.85.212.171]) by kanga.kvack.org (Postfix) with ESMTP id 7094D6B0032 for ; Wed, 17 Dec 2014 21:20:27 -0500 (EST) Received: by mail-wi0-f171.google.com with SMTP id bs8so222263wib.16 for ; Wed, 17 Dec 2014 18:20:26 -0800 (PST) Received: from gum.cmpxchg.org (gum.cmpxchg.org. [85.214.110.215]) by mx.google.com with ESMTPS id mn7si9625111wjc.31.2014.12.17.18.20.26 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 17 Dec 2014 18:20:26 -0800 (PST) Date: Wed, 17 Dec 2014 21:20:19 -0500 From: Johannes Weiner Subject: Re: Stalled MM patches for review Message-ID: <20141218022019.GA25071@phnom.home.cmpxchg.org> References: <20141215150207.67c9a25583c04202d9f4508e@linux-foundation.org> <548F7541.8040407@jp.fujitsu.com> <20141216030658.GA18569@phnom.home.cmpxchg.org> <20141217021302.GA14148@phnom.home.cmpxchg.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Sender: owner-linux-mm@kvack.org List-ID: To: David Rientjes Cc: Yasuaki Ishimatsu , linux-mm@kvack.org, Andrew Morton On Wed, Dec 17, 2014 at 02:28:37PM -0800, David Rientjes wrote: > On Tue, 16 Dec 2014, Johannes Weiner wrote: > > > > This is broken because it does not recall gfp_to_alloc_flags(). If > > > current is the oom kill victim, then ALLOC_NO_WATERMARKS never gets set > > > properly and the slowpath will end up looping forever. The "restart" > > > label which was removed in this patch needs to be reintroduced, and it can > > > probably be moved to directly before gfp_to_alloc_flags(). > > > > Thanks for catching this. gfp_to_alloc_flags()'s name doesn't exactly > > imply such side effects... Here is a fixlet on top: > > > > It would have livelocked the machine on an oom kill. Very unlikely. The allocator will loop trying to reclaim and there is usually other activity on the system that makes progress. There must be, because the allocator can always hold locks that the victim needs to exit. > > From 45362d1920340716ef58bf1024d9674b5dfa809e Mon Sep 17 00:00:00 2001 > > From: Johannes Weiner > > Date: Tue, 16 Dec 2014 21:04:24 -0500 > > Subject: [patch] mm: page_alloc: embed OOM killing naturally into allocation > > slowpath fix > > > > When retrying the allocation after potentially invoking OOM, make sure > > the alloc flags are recalculated, as they have to consider TIF_MEMDIE. > > > > Restore the original restart label, but rename it to 'retry' to match > > the should_alloc_retry() it depends on. > > > > Signed-off-by: Johannes Weiner > > --- > > mm/page_alloc.c | 4 ++-- > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > > index 83ec725aec36..e8f5997c557c 100644 > > --- a/mm/page_alloc.c > > +++ b/mm/page_alloc.c > > @@ -2673,6 +2673,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, > > (gfp_mask & GFP_THISNODE) == GFP_THISNODE) > > goto nopage; > > > > +retry: > > if (!(gfp_mask & __GFP_NO_KSWAPD)) > > wake_all_kswapds(order, zonelist, high_zoneidx, > > preferred_zone, nodemask); > > @@ -2695,7 +2696,6 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, > > classzone_idx = zonelist_zone_idx(preferred_zoneref); > > } > > > > -rebalance: > > /* This is the last chance, in general, before the goto nopage. */ > > page = get_page_from_freelist(gfp_mask, nodemask, order, zonelist, > > high_zoneidx, alloc_flags & ~ALLOC_NO_WATERMARKS, > > @@ -2823,7 +2823,7 @@ rebalance: > > } > > /* Wait for some write requests to complete then retry */ > > wait_iff_congested(preferred_zone, BLK_RW_ASYNC, HZ/50); > > - goto rebalance; > > + goto retry; > > } else { > > /* > > * High-order allocations do not necessarily loop after > > Why remove 'rebalance'? In the situation where direct reclaim does free > memory and we're waiting on writeback (no call to the oom killer is made), > it doesn't seem necessary to recalculate classzone_idx. > > Additionally, we never called wait_iff_congested() before when the oom > killer freed memory. This is a no-op if the preferred_zone isn't waiting > on writeback, but seems pointless if we just freed memory by calling the > oom killer. Why keep all these undocumented assumptions in the code? It's really simple: if we retry freeing memory (LRU reclaim or OOM kills), we wait for congestion, kick kswapd, re-evaluate the current task state, regardless of which reclaim method did what or anything at all. It's a slowpath, so there is no reason to not keep this simple and robust. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wi0-f178.google.com (mail-wi0-f178.google.com [209.85.212.178]) by kanga.kvack.org (Postfix) with ESMTP id 4BC156B0032 for ; Thu, 18 Dec 2014 11:55:07 -0500 (EST) Received: by mail-wi0-f178.google.com with SMTP id em10so2480163wid.17 for ; Thu, 18 Dec 2014 08:55:06 -0800 (PST) Received: from mx2.suse.de (cantor2.suse.de. [195.135.220.15]) by mx.google.com with ESMTPS id bz8si12921820wjb.73.2014.12.18.08.55.05 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Thu, 18 Dec 2014 08:55:06 -0800 (PST) Date: Thu, 18 Dec 2014 17:55:04 +0100 From: Michal Hocko Subject: Re: Stalled MM patches for review Message-ID: <20141218165504.GB957@dhcp22.suse.cz> References: <20141215150207.67c9a25583c04202d9f4508e@linux-foundation.org> <548F7541.8040407@jp.fujitsu.com> <20141216030658.GA18569@phnom.home.cmpxchg.org> <20141217021302.GA14148@phnom.home.cmpxchg.org> <20141218022019.GA25071@phnom.home.cmpxchg.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20141218022019.GA25071@phnom.home.cmpxchg.org> Sender: owner-linux-mm@kvack.org List-ID: To: Johannes Weiner Cc: David Rientjes , Yasuaki Ishimatsu , linux-mm@kvack.org, Andrew Morton On Wed 17-12-14 21:20:19, Johannes Weiner wrote: > On Wed, Dec 17, 2014 at 02:28:37PM -0800, David Rientjes wrote: [...] > > Why remove 'rebalance'? In the situation where direct reclaim does free > > memory and we're waiting on writeback (no call to the oom killer is made), > > it doesn't seem necessary to recalculate classzone_idx. > > > > Additionally, we never called wait_iff_congested() before when the oom > > killer freed memory. This is a no-op if the preferred_zone isn't waiting > > on writeback, but seems pointless if we just freed memory by calling the > > oom killer. > > Why keep all these undocumented assumptions in the code? It's really > simple: if we retry freeing memory (LRU reclaim or OOM kills), we wait > for congestion, kick kswapd, re-evaluate the current task state, > regardless of which reclaim method did what or anything at all. It's > a slowpath, so there is no reason to not keep this simple and robust. Agreed, the less subtle loops via labels we have the better. -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org