From: Vinayak Menon <vinmenon@codeaurora.org>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	hannes@cmpxchg.org, vdavydov@parallels.com, mhocko@suse.cz,
	mgorman@suse.de, minchan@kernel.org
Subject: Re: [PATCH v2] mm: vmscan: fix the page state calculation in too_many_isolated
Date: Fri, 16 Jan 2015 10:40:01 +0530
Message-ID: <54B89D29.5000702@codeaurora.org>
In-Reply-To: <20150115171728.ebc77a48.akpm@linux-foundation.org>

On 01/16/2015 06:47 AM, Andrew Morton wrote:
> On Wed, 14 Jan 2015 17:06:59 +0530 Vinayak Menon <vinmenon@codeaurora.org> wrote:
>
>> It is observed that sometimes multiple tasks get blocked for long
>> in the congestion_wait loop below, in shrink_inactive_list. This
>> is because of vm_stat values not being synced.
>>
>> (__schedule) from [<c0a03328>]
>> (schedule_timeout) from [<c0a04940>]
>> (io_schedule_timeout) from [<c01d585c>]
>> (congestion_wait) from [<c01cc9d8>]
>> (shrink_inactive_list) from [<c01cd034>]
>> (shrink_zone) from [<c01cdd08>]
>> (try_to_free_pages) from [<c01c442c>]
>> (__alloc_pages_nodemask) from [<c01f1884>]
>> (new_slab) from [<c09fcf60>]
>> (__slab_alloc) from [<c01f1a6c>]
>>
>> In one such instance, zone_page_state(zone, NR_ISOLATED_FILE)
>> had returned 14, zone_page_state(zone, NR_INACTIVE_FILE)
>> returned 92, and GFP_IOFS was set, and this resulted
>> in too_many_isolated returning true. But one of the CPU's
>> pageset vm_stat_diff had NR_ISOLATED_FILE as "-14". So the
>> actual isolated count was zero. As there weren't any more
>> updates to NR_ISOLATED_FILE and vmstat_update deferred work
>> had not been scheduled yet, 7 tasks were spinning in the
>> congestion wait loop for around 4 seconds, in the direct
>> reclaim path.
>>
>> This patch uses zone_page_state_snapshot instead, but restricts
>> its usage to avoid performance penalty.
>
> Seems reasonable.
>
>>
>> ...
>>
>> @@ -1516,15 +1531,18 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
>>   	unsigned long nr_immediate = 0;
>>   	isolate_mode_t isolate_mode = 0;
>>   	int file = is_file_lru(lru);
>> +	int safe = 0;
>>   	struct zone *zone = lruvec_zone(lruvec);
>>   	struct zone_reclaim_stat *reclaim_stat = &lruvec->reclaim_stat;
>>
>> -	while (unlikely(too_many_isolated(zone, file, sc))) {
>> +	while (unlikely(too_many_isolated(zone, file, sc, safe))) {
>>   		congestion_wait(BLK_RW_ASYNC, HZ/10);
>>
>>   		/* We are about to die and free our memory. Return now. */
>>   		if (fatal_signal_pending(current))
>>   			return SWAP_CLUSTER_MAX;
>> +
>> +		safe = 1;
>>   	}
>
> But here and under the circumstances you describe, we'll call
> congestion_wait() a single time.  That shouldn't have occurred.
>
> So how about we put the fallback logic into too_many_isolated() itself?
>
>

congestion_wait was allowed to run once as an optimization, on the
assumption that __too_many_isolated (unsafe and faster) is usually
correct when it returns true, so that the safe version is rarely
called. But I agree that we should not call congestion_wait
unnecessarily even in those rare cases. So this looks correct to me.


>
> From: Andrew Morton <akpm@linux-foundation.org>
> Subject: mm-vmscan-fix-the-page-state-calculation-in-too_many_isolated-fix
>
> Move the zone_page_state_snapshot() fallback logic into
> too_many_isolated(), so shrink_inactive_list() doesn't incorrectly call
> congestion_wait().
>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Michal Hocko <mhocko@suse.cz>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: Vinayak Menon <vinmenon@codeaurora.org>
> Cc: Vladimir Davydov <vdavydov@parallels.com>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> ---
>
>   mm/vmscan.c |   23 +++++++++++------------
>   1 file changed, 11 insertions(+), 12 deletions(-)
>
> diff -puN mm/vmscan.c~mm-vmscan-fix-the-page-state-calculation-in-too_many_isolated-fix mm/vmscan.c
> --- a/mm/vmscan.c~mm-vmscan-fix-the-page-state-calculation-in-too_many_isolated-fix
> +++ a/mm/vmscan.c
> @@ -1402,7 +1402,7 @@ int isolate_lru_page(struct page *page)
>   }
>
>   static int __too_many_isolated(struct zone *zone, int file,
> -	struct scan_control *sc, int safe)
> +			       struct scan_control *sc, int safe)
>   {
>   	unsigned long inactive, isolated;
>
> @@ -1435,7 +1435,7 @@ static int __too_many_isolated(struct zo
>    * unnecessary swapping, thrashing and OOM.
>    */
>   static int too_many_isolated(struct zone *zone, int file,
> -		struct scan_control *sc, int safe)
> +			     struct scan_control *sc)
>   {
>   	if (current_is_kswapd())
>   		return 0;
> @@ -1443,12 +1443,14 @@ static int too_many_isolated(struct zone
>   	if (!global_reclaim(sc))
>   		return 0;
>
> -	if (unlikely(__too_many_isolated(zone, file, sc, 0))) {
> -		if (safe)
> -			return __too_many_isolated(zone, file, sc, safe);
> -		else
> -			return 1;
> -	}
> +	/*
> +	 * __too_many_isolated(safe=0) is fast but inaccurate, because it
> +	 * doesn't account for the vm_stat_diff[] counters.  So if it looks
> +	 * like too_many_isolated() is about to return true, fall back to the
> +	 * slower, more accurate zone_page_state_snapshot().
> +	 */
> +	if (unlikely(__too_many_isolated(zone, file, sc, 0)))
> +		return __too_many_isolated(zone, file, sc, 1);
>
>   	return 0;
>   }
> @@ -1540,18 +1542,15 @@ shrink_inactive_list(unsigned long nr_to
>   	unsigned long nr_immediate = 0;
>   	isolate_mode_t isolate_mode = 0;
>   	int file = is_file_lru(lru);
> -	int safe = 0;
>   	struct zone *zone = lruvec_zone(lruvec);
>   	struct zone_reclaim_stat *reclaim_stat = &lruvec->reclaim_stat;
>
> -	while (unlikely(too_many_isolated(zone, file, sc, safe))) {
> +	while (unlikely(too_many_isolated(zone, file, sc))) {
>   		congestion_wait(BLK_RW_ASYNC, HZ/10);
>
>   		/* We are about to die and free our memory. Return now. */
>   		if (fatal_signal_pending(current))
>   			return SWAP_CLUSTER_MAX;
> -
> -		safe = 1;
>   	}
>
>   	lru_add_drain();
> _
>


-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
member of the Code Aurora Forum, hosted by The Linux Foundation


