Re: [RFC PATCH 1/2] mm, vmscan: account the number of isolated pages per zone

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Mel Gorman <mgorman@suse.de>
To: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: 'Michal Hocko' <mhocko@kernel.org>,
	linux-mm@kvack.org, 'Johannes Weiner' <hannes@cmpxchg.org>,
	'Tetsuo Handa' <penguin-kernel@I-love.SAKURA.ne.jp>,
	'LKML' <linux-kernel@vger.kernel.org>
Subject: Re: [RFC PATCH 1/2] mm, vmscan: account the number of isolated pages per zone
Date: Fri, 20 Jan 2017 09:25:58 +0000	[thread overview]
Message-ID: <20170120092558.phi37tvpsb4g5isv@suse.de> (raw)
In-Reply-To: <000501d272e8$5bfcf7d0$13f6e770$@alibaba-inc.com>

On Fri, Jan 20, 2017 at 02:42:24PM +0800, Hillf Danton wrote:
> > @@ -1603,16 +1603,16 @@ int isolate_lru_page(struct page *page)
> >   * the LRU list will go small and be scanned faster than necessary, leading to
> >   * unnecessary swapping, thrashing and OOM.
> >   */
> > -static int too_many_isolated(struct pglist_data *pgdat, int file,
> > +static bool safe_to_isolate(struct pglist_data *pgdat, int file,
> >  		struct scan_control *sc)
> 
> I prefer the current function name.
> 

The restructure is to work with the workqueue api.

> >  {
> >  	unsigned long inactive, isolated;
> > 
> >  	if (current_is_kswapd())
> > -		return 0;
> > +		return true;
> > 
> > -	if (!sane_reclaim(sc))
> > -		return 0;
> > +	if (sane_reclaim(sc))
> > +		return true;
> 
> We only need a one-line change.

It's bool so the conversion is made to bool while it's being changed
anyway.

> > 
> >  	if (file) {
> >  		inactive = node_page_state(pgdat, NR_INACTIVE_FILE);
> > @@ -1630,7 +1630,7 @@ static int too_many_isolated(struct pglist_data *pgdat, int file,
> >  	if ((sc->gfp_mask & (__GFP_IO | __GFP_FS)) == (__GFP_IO | __GFP_FS))
> >  		inactive >>= 3;
> > 
> > -	return isolated > inactive;
> > +	return isolated < inactive;
> >  }
> > 
> >  static noinline_for_stack void
> > @@ -1719,12 +1719,28 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
> >  	struct pglist_data *pgdat = lruvec_pgdat(lruvec);
> >  	struct zone_reclaim_stat *reclaim_stat = &lruvec->reclaim_stat;
> > 
> > -	while (unlikely(too_many_isolated(pgdat, file, sc))) {
> > -		congestion_wait(BLK_RW_ASYNC, HZ/10);
> > +	while (!safe_to_isolate(pgdat, file, sc)) {
> > +		long ret;
> > +
> > +		ret = wait_event_interruptible_timeout(pgdat->isolated_wait,
> > +			safe_to_isolate(pgdat, file, sc), HZ/10);
> > 
> >  		/* We are about to die and free our memory. Return now. */
> > -		if (fatal_signal_pending(current))
> > -			return SWAP_CLUSTER_MAX;
> > +		if (fatal_signal_pending(current)) {
> > +			nr_reclaimed = SWAP_CLUSTER_MAX;
> > +			goto out;
> > +		}
> > +
> > +		/*
> > +		 * If we reached the timeout, this is direct reclaim, and
> > +		 * pages cannot be isolated then return. If the situation
> 
> Please add something that we would rather shrink slab than go
> another round of nap.
> 

That's not necessarily true or even a good idea. It could result in
excessive slab shrinking that is no longer in proportion to LRU scanning
and increased contention within shrinkers.

> > +		 * persists for a long time then it'll eventually reach
> > +		 * the no_progress limit in should_reclaim_retry and consider
> > +		 * going OOM. In this case, do not wake the isolated_wait
> > +		 * queue as the wakee will still not be able to make progress.
> > +		 */
> > +		if (!ret && !current_is_kswapd() && !safe_to_isolate(pgdat, file, sc))
> > +			return 0;
> >  	}
> > 
> >  	lru_add_drain();
> > @@ -1839,6 +1855,10 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
> >  			stat.nr_activate, stat.nr_ref_keep,
> >  			stat.nr_unmap_fail,
> >  			sc->priority, file);
> > +
> > +out:
> > +	if (waitqueue_active(&pgdat->isolated_wait))
> > +		wake_up(&pgdat->isolated_wait);
> >  	return nr_reclaimed;
> >  }
> > 
> Is it also needed to check isolated_wait active before kswapd 
> takes nap?
> 

No because this is where pages were isolated and there is no putback
event that would justify waking the queue. There is a race between
waitqueue_active() and going to sleep that we rely on the timeout to
recover from.

-- 
Mel Gorman
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)

From: Mel Gorman <mgorman@suse.de>
To: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: "'Michal Hocko'" <mhocko@kernel.org>,
	linux-mm@kvack.org, "'Johannes Weiner'" <hannes@cmpxchg.org>,
	"'Tetsuo Handa'" <penguin-kernel@I-love.SAKURA.ne.jp>,
	"'LKML'" <linux-kernel@vger.kernel.org>
Subject: Re: [RFC PATCH 1/2] mm, vmscan: account the number of isolated pages per zone
Date: Fri, 20 Jan 2017 09:25:58 +0000	[thread overview]
Message-ID: <20170120092558.phi37tvpsb4g5isv@suse.de> (raw)
In-Reply-To: <000501d272e8$5bfcf7d0$13f6e770$@alibaba-inc.com>

On Fri, Jan 20, 2017 at 02:42:24PM +0800, Hillf Danton wrote:
> > @@ -1603,16 +1603,16 @@ int isolate_lru_page(struct page *page)
> >   * the LRU list will go small and be scanned faster than necessary, leading to
> >   * unnecessary swapping, thrashing and OOM.
> >   */
> > -static int too_many_isolated(struct pglist_data *pgdat, int file,
> > +static bool safe_to_isolate(struct pglist_data *pgdat, int file,
> >  		struct scan_control *sc)
> 
> I prefer the current function name.
> 

The restructure is to work with the workqueue api.

> >  {
> >  	unsigned long inactive, isolated;
> > 
> >  	if (current_is_kswapd())
> > -		return 0;
> > +		return true;
> > 
> > -	if (!sane_reclaim(sc))
> > -		return 0;
> > +	if (sane_reclaim(sc))
> > +		return true;
> 
> We only need a one-line change.

It's bool so the conversion is made to bool while it's being changed
anyway.

> > 
> >  	if (file) {
> >  		inactive = node_page_state(pgdat, NR_INACTIVE_FILE);
> > @@ -1630,7 +1630,7 @@ static int too_many_isolated(struct pglist_data *pgdat, int file,
> >  	if ((sc->gfp_mask & (__GFP_IO | __GFP_FS)) == (__GFP_IO | __GFP_FS))
> >  		inactive >>= 3;
> > 
> > -	return isolated > inactive;
> > +	return isolated < inactive;
> >  }
> > 
> >  static noinline_for_stack void
> > @@ -1719,12 +1719,28 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
> >  	struct pglist_data *pgdat = lruvec_pgdat(lruvec);
> >  	struct zone_reclaim_stat *reclaim_stat = &lruvec->reclaim_stat;
> > 
> > -	while (unlikely(too_many_isolated(pgdat, file, sc))) {
> > -		congestion_wait(BLK_RW_ASYNC, HZ/10);
> > +	while (!safe_to_isolate(pgdat, file, sc)) {
> > +		long ret;
> > +
> > +		ret = wait_event_interruptible_timeout(pgdat->isolated_wait,
> > +			safe_to_isolate(pgdat, file, sc), HZ/10);
> > 
> >  		/* We are about to die and free our memory. Return now. */
> > -		if (fatal_signal_pending(current))
> > -			return SWAP_CLUSTER_MAX;
> > +		if (fatal_signal_pending(current)) {
> > +			nr_reclaimed = SWAP_CLUSTER_MAX;
> > +			goto out;
> > +		}
> > +
> > +		/*
> > +		 * If we reached the timeout, this is direct reclaim, and
> > +		 * pages cannot be isolated then return. If the situation
> 
> Please add something that we would rather shrink slab than go
> another round of nap.
> 

That's not necessarily true or even a good idea. It could result in
excessive slab shrinking that is no longer in proportion to LRU scanning
and increased contention within shrinkers.

> > +		 * persists for a long time then it'll eventually reach
> > +		 * the no_progress limit in should_reclaim_retry and consider
> > +		 * going OOM. In this case, do not wake the isolated_wait
> > +		 * queue as the wakee will still not be able to make progress.
> > +		 */
> > +		if (!ret && !current_is_kswapd() && !safe_to_isolate(pgdat, file, sc))
> > +			return 0;
> >  	}
> > 
> >  	lru_add_drain();
> > @@ -1839,6 +1855,10 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
> >  			stat.nr_activate, stat.nr_ref_keep,
> >  			stat.nr_unmap_fail,
> >  			sc->priority, file);
> > +
> > +out:
> > +	if (waitqueue_active(&pgdat->isolated_wait))
> > +		wake_up(&pgdat->isolated_wait);
> >  	return nr_reclaimed;
> >  }
> > 
> Is it also needed to check isolated_wait active before kswapd 
> takes nap?
> 

No because this is where pages were isolated and there is no putback
event that would justify waking the queue. There is a race between
waitqueue_active() and going to sleep that we rely on the timeout to
recover from.

-- 
Mel Gorman
SUSE Labs

next prev parent reply	other threads:[~2017-01-20  9:26 UTC|newest]

Thread overview: 110+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-01-18 13:44 [RFC PATCH 0/2] fix unbounded too_many_isolated Michal Hocko
2017-01-18 13:44 ` Michal Hocko
2017-01-18 13:44 ` [RFC PATCH 1/2] mm, vmscan: account the number of isolated pages per zone Michal Hocko
2017-01-18 13:44   ` Michal Hocko
2017-01-18 14:46   ` Mel Gorman
2017-01-18 14:46     ` Mel Gorman
2017-01-18 15:15     ` Michal Hocko
2017-01-18 15:15       ` Michal Hocko
2017-01-18 15:54       ` Mel Gorman
2017-01-18 15:54         ` Mel Gorman
2017-01-18 16:17         ` Michal Hocko
2017-01-18 16:17           ` Michal Hocko
2017-01-18 17:00           ` Mel Gorman
2017-01-18 17:00             ` Mel Gorman
2017-01-18 17:29             ` Michal Hocko
2017-01-18 17:29               ` Michal Hocko
2017-01-19 10:07               ` Mel Gorman
2017-01-19 10:07                 ` Mel Gorman
2017-01-19 11:23                 ` Michal Hocko
2017-01-19 11:23                   ` Michal Hocko
2017-01-19 13:11                   ` Mel Gorman
2017-01-19 13:11                     ` Mel Gorman
2017-01-20 13:27                     ` Tetsuo Handa
2017-01-20 13:27                       ` Tetsuo Handa
2017-01-21  7:42                       ` Tetsuo Handa
2017-01-21  7:42                         ` Tetsuo Handa
2017-01-25 10:15                         ` Michal Hocko
2017-01-25 10:15                           ` Michal Hocko
2017-01-25 10:19                           ` Christoph Hellwig
2017-01-25 10:19                             ` Christoph Hellwig
2017-01-25 10:46                             ` Michal Hocko
2017-01-25 10:46                               ` Michal Hocko
2017-01-25 11:09                               ` Tetsuo Handa
2017-01-25 11:09                                 ` Tetsuo Handa
2017-01-25 13:00                                 ` Michal Hocko
2017-01-25 13:00                                   ` Michal Hocko
2017-01-27 14:49                                   ` Michal Hocko
2017-01-27 14:49                                     ` Michal Hocko
2017-01-28 15:27                                     ` Tetsuo Handa
2017-01-28 15:27                                       ` Tetsuo Handa
2017-01-30  8:55                                       ` Michal Hocko
2017-01-30  8:55                                         ` Michal Hocko
2017-02-02 10:14                                         ` Michal Hocko
2017-02-02 10:14                                           ` Michal Hocko
2017-02-03 10:57                                           ` Tetsuo Handa
2017-02-03 10:57                                             ` Tetsuo Handa
2017-02-03 14:41                                             ` Michal Hocko
2017-02-03 14:41                                               ` Michal Hocko
2017-02-03 14:50                                             ` Michal Hocko
2017-02-03 14:50                                               ` Michal Hocko
2017-02-03 17:24                                               ` Brian Foster
2017-02-03 17:24                                                 ` Brian Foster
2017-02-06  6:29                                                 ` Tetsuo Handa
2017-02-06  6:29                                                   ` Tetsuo Handa
2017-02-06 14:35                                                   ` Brian Foster
2017-02-06 14:35                                                     ` Brian Foster
2017-02-06 14:42                                                     ` Michal Hocko
2017-02-06 14:42                                                       ` Michal Hocko
2017-02-06 15:47                                                       ` Brian Foster
2017-02-06 15:47                                                         ` Brian Foster
2017-02-07 10:30                                                     ` Tetsuo Handa
2017-02-07 10:30                                                       ` Tetsuo Handa
2017-02-07 16:54                                                       ` Brian Foster
2017-02-07 16:54                                                         ` Brian Foster
2017-02-03 14:55                                             ` Michal Hocko
2017-02-03 14:55                                               ` Michal Hocko
2017-02-05 10:43                                               ` Tetsuo Handa
2017-02-05 10:43                                                 ` Tetsuo Handa
2017-02-06 10:34                                                 ` Michal Hocko
2017-02-06 10:34                                                   ` Michal Hocko
2017-02-06 10:39                                                 ` Michal Hocko
2017-02-06 10:39                                                   ` Michal Hocko
2017-02-07 21:12                                                   ` Michal Hocko
2017-02-07 21:12                                                     ` Michal Hocko
2017-02-08  9:24                                                     ` Peter Zijlstra
2017-02-08  9:24                                                       ` Peter Zijlstra
2017-02-21  9:40                                             ` Michal Hocko
2017-02-21  9:40                                               ` Michal Hocko
2017-02-21 14:35                                               ` Tetsuo Handa
2017-02-21 14:35                                                 ` Tetsuo Handa
2017-02-21 15:53                                                 ` Michal Hocko
2017-02-21 15:53                                                   ` Michal Hocko
2017-02-22  2:02                                                   ` Tetsuo Handa
2017-02-22  2:02                                                     ` Tetsuo Handa
2017-02-22  7:54                                                     ` Michal Hocko
2017-02-22  7:54                                                       ` Michal Hocko
2017-02-26  6:30                                                       ` Tetsuo Handa
2017-02-26  6:30                                                         ` Tetsuo Handa
2017-01-31 11:58                                   ` Michal Hocko
2017-01-31 11:58                                     ` Michal Hocko
2017-01-31 12:51                                     ` Christoph Hellwig
2017-01-31 12:51                                       ` Christoph Hellwig
2017-01-31 13:21                                       ` Michal Hocko
2017-01-31 13:21                                         ` Michal Hocko
2017-01-25 10:33                           ` [RFC PATCH 1/2] mm, vmscan: account the number of isolated pagesper zone Tetsuo Handa
2017-01-25 10:33                             ` Tetsuo Handa
2017-01-25 12:34                             ` Michal Hocko
2017-01-25 12:34                               ` Michal Hocko
2017-01-25 13:13                               ` [RFC PATCH 1/2] mm, vmscan: account the number of isolated pages per zone Tetsuo Handa
2017-01-25 13:13                                 ` Tetsuo Handa
2017-01-25  9:53                       ` Michal Hocko
2017-01-25  9:53                         ` Michal Hocko
2017-01-20  6:42                 ` Hillf Danton
2017-01-20  6:42                   ` Hillf Danton
2017-01-20  9:25                   ` Mel Gorman [this message]
2017-01-20  9:25                     ` Mel Gorman
2017-01-18 13:44 ` [RFC PATCH 2/2] mm, vmscan: do not loop on too_many_isolated for ever Michal Hocko
2017-01-18 13:44   ` Michal Hocko
2017-01-18 14:50   ` Mel Gorman
2017-01-18 14:50     ` Mel Gorman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170120092558.phi37tvpsb4g5isv@suse.de \
    --to=mgorman@suse.de \
    --cc=hannes@cmpxchg.org \
    --cc=hillf.zj@alibaba-inc.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=penguin-kernel@I-love.SAKURA.ne.jp \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.