From: Wu Fengguang <fengguang.wu@intel.com>
To: Mel Gorman <mel@csn.ul.ie>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	Rik van Riel <riel@redhat.com>,
	Christoph Lameter <cl@linux-foundation.org>,
	"Zhang, Yanmin" <yanmin.zhang@intel.com>,
	"linuxram@us.ibm.com" <linuxram@us.ibm.com>,
	linux-mm <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 1/3] Reintroduce zone_reclaim_interval for when zone_reclaim() scans and fails to avoid CPU spinning at 100% on NUMA
Date: Tue, 9 Jun 2009 09:58:22 +0800
Message-ID: <20090609015822.GA6740@localhost>
In-Reply-To: <1244466090-10711-2-git-send-email-mel@csn.ul.ie>

On Mon, Jun 08, 2009 at 09:01:28PM +0800, Mel Gorman wrote:
> On NUMA machines, the administrator can configure zone_reclaim_mode, which is a
> more targeted form of direct reclaim. On machines with large NUMA distances,
> zone_reclaim_mode defaults to 1 meaning that clean unmapped pages will be
> reclaimed if the zone watermarks are not being met. The problem is that
> zone_reclaim() can be in a situation where it scans excessively without
> making progress.
> 
> One such situation is where a large tmpfs mount is occupying a large
> percentage of memory overall. The pages do not get cleaned or reclaimed by
> zone_reclaim(), but the lists are uselessly scanned frequently, making the
> CPU spin at 100%. The scanning occurs because zone_reclaim() cannot tell
> in advance the scan is pointless because the counters do not distinguish
> between pagecache pages backed by disk and by RAM.  The observation in
> the field is that malloc() stalls for a long time (minutes in some cases)
> when this situation occurs.
> 
> Accounting for ram-backed file pages was considered but not implemented on
> the grounds it would be introducing new branches and expensive checks into
> the page cache add/remove paths and increase the number of statistics
> needed in the zone. As zone_reclaim() failing is currently considered a
> corner case, this seemed like overkill. Note, if there are a large number
> of reports about CPU spinning at 100% on NUMA that are fixed by disabling
> zone_reclaim, then this assumption is false and zone_reclaim() scanning
> and failing is not a corner case but a common occurrence.
> 
> This patch reintroduces zone_reclaim_interval which was removed by commit
> 34aa1330f9b3c5783d269851d467326525207422 [zoned vm counters: zone_reclaim:
> remove /proc/sys/vm/zone_reclaim_interval] because the zone counters were
> considered sufficient to determine in advance if the scan would succeed.
> As unsuccessful scans can still occur, zone_reclaim_interval is still
> required.

Can we avoid the user visible parameter zone_reclaim_interval?

That would mean introducing a heuristic instead. Since the whole point
is to avoid 100% CPU usage, we can record the time spent in this
failed zone reclaim (T) and forbid zone reclaim until (NOW + 100*T).
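
To make the idea concrete, here is a minimal userspace sketch of the
NOW + 100*T backoff. It is only an illustration, not kernel code: the
struct reclaim_backoff, the helpers reclaim_allowed() and
reclaim_failed(), the BACKOFF_FACTOR name and the nanosecond timestamps
are all assumptions made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-zone backoff state; not an existing kernel structure. */
struct reclaim_backoff {
	uint64_t retry_after_ns;	/* zone reclaim is skipped before this time */
};

/* The "100" in NOW + 100*T. */
#define BACKOFF_FACTOR 100

/* Return true if zone reclaim may be attempted at time now_ns. */
static bool reclaim_allowed(const struct reclaim_backoff *b, uint64_t now_ns)
{
	return now_ns >= b->retry_after_ns;
}

/*
 * Record a failed scan that ran from start_ns to end_ns.
 * T is the time the scan wasted; NOW is taken to be end_ns.
 */
static void reclaim_failed(struct reclaim_backoff *b,
			   uint64_t start_ns, uint64_t end_ns)
{
	uint64_t t;

	assert(end_ns >= start_ns);
	t = end_ns - start_ns;
	b->retry_after_ns = end_ns + BACKOFF_FACTOR * t;
}
```

A caller would check reclaim_allowed() before scanning and call
reclaim_failed() only when the scan made no progress. Assuming scans of
one zone do not overlap, each failed scan of length T is followed by at
least 100*T without scanning, so failed scans take at most about 1% of
wall time, and no user visible tunable is needed.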

Thanks,
Fengguang
