All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mel Gorman <mgorman@suse.de>
To: Christoph Lameter <cl@linux.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Minchan Kim <minchan.kim@gmail.com>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/2] mm: page allocator: Initialise ZLC for first zone eligible for zone_reclaim
Date: Mon, 18 Jul 2011 22:13:25 +0100	[thread overview]
Message-ID: <20110718211325.GC5349@suse.de> (raw)
In-Reply-To: <alpine.DEB.2.00.1107181208050.31576@router.home>

On Mon, Jul 18, 2011 at 12:20:11PM -0500, Christoph Lameter wrote:
> 
> 
> On Mon, 18 Jul 2011, Mel Gorman wrote:
> 
> > On Mon, Jul 18, 2011 at 09:56:31AM -0500, Christoph Lameter wrote:
> > > On Fri, 15 Jul 2011, Mel Gorman wrote:
> > >
> > > > Currently the zonelist cache is setup only after the first zone has
> > > > been considered and zone_reclaim() has been called. The objective was
> > > > to avoid a costly setup but zone_reclaim is itself quite expensive. If
> > > > it is failing regularly such as the first eligible zone having mostly
> > > > mapped pages, the cost in scanning and allocation stalls is far higher
> > > > than the ZLC initialisation step.
> > >
> > > Would it not be easier to set zlc_active and allowednodes based on the
> > > zone having an active ZLC at the start of get_pages()?
> > >
> >
> > What do you mean by a zones active ZLC? zonelists are on a per-node,
> > not a per-zone basis (see node_zonelist) so a zone doesn't have an
> > active ZLC as such. If the zlc_active is set at the beginning of
> 
> Look at get_page_from_freelist(): It sets
> zlc_active = 0 even through the zonelist under consideration may have a
> ZLC. zlc_active = 0 can also mean that the function has not bothered to
> look for the zlc information of the current zonelist.
> 

Yes. So? It's only necessary if the watermarks are not met.

> > get_page_from_freelist(), it implies that we are calling zlc_setup()
> > even when the watermarks are met which is unnecessary.
> 
> Ok then that decision to not call zlc_setup() for performance reasons is
> what created the problem that you are trying to solve. In case that the
> first zones watermarks are okay we can avoid calling zlc_setup().
> 

The original implementation did not check the ZLC in the first loop
at all. It wasn't just about avoiding the cost of setup. I suspect
this problem has been there a long time and it's taking this long
for bug reports to show up because NUMA machines are being used for
generic numa-unaware workloads.

> What we do now have is checking for zlc_active in the loop just so that
> the first time around we do not call zlc_setup().
> 

Yes, why incur the cost for the common case?

> We may be able to simplify the function by:
> 
> 1.  Checking for the special case that the first zone is ok and that we do
> not want to call zlc_setup before we get to the loop.
> 
> 2. Do the zlc_setup() before the loop.
> 
> 3. Remove the zlc_setup() code as you did from the loop as well as the
> checks for zlc_active. zlc_active becomes not necessary since a zlc
> is always available when we go through the loop.
> 

That initial test will involve duplication of things like the cpuset and
no watermarks check just to place the zlc_setup() in a different place.
I might be missing your point but it seems like the gain would be
marginal. Fancy posting a patch?

-- 
Mel Gorman
SUSE Labs

WARNING: multiple messages have this Message-ID (diff)
From: Mel Gorman <mgorman@suse.de>
To: Christoph Lameter <cl@linux.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Minchan Kim <minchan.kim@gmail.com>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/2] mm: page allocator: Initialise ZLC for first zone eligible for zone_reclaim
Date: Mon, 18 Jul 2011 22:13:25 +0100	[thread overview]
Message-ID: <20110718211325.GC5349@suse.de> (raw)
In-Reply-To: <alpine.DEB.2.00.1107181208050.31576@router.home>

On Mon, Jul 18, 2011 at 12:20:11PM -0500, Christoph Lameter wrote:
> 
> 
> On Mon, 18 Jul 2011, Mel Gorman wrote:
> 
> > On Mon, Jul 18, 2011 at 09:56:31AM -0500, Christoph Lameter wrote:
> > > On Fri, 15 Jul 2011, Mel Gorman wrote:
> > >
> > > > Currently the zonelist cache is setup only after the first zone has
> > > > been considered and zone_reclaim() has been called. The objective was
> > > > to avoid a costly setup but zone_reclaim is itself quite expensive. If
> > > > it is failing regularly such as the first eligible zone having mostly
> > > > mapped pages, the cost in scanning and allocation stalls is far higher
> > > > than the ZLC initialisation step.
> > >
> > > Would it not be easier to set zlc_active and allowednodes based on the
> > > zone having an active ZLC at the start of get_pages()?
> > >
> >
> > What do you mean by a zones active ZLC? zonelists are on a per-node,
> > not a per-zone basis (see node_zonelist) so a zone doesn't have an
> > active ZLC as such. If the zlc_active is set at the beginning of
> 
> Look at get_page_from_freelist(): It sets
> zlc_active = 0 even through the zonelist under consideration may have a
> ZLC. zlc_active = 0 can also mean that the function has not bothered to
> look for the zlc information of the current zonelist.
> 

Yes. So? It's only necessary if the watermarks are not met.

> > get_page_from_freelist(), it implies that we are calling zlc_setup()
> > even when the watermarks are met which is unnecessary.
> 
> Ok then that decision to not call zlc_setup() for performance reasons is
> what created the problem that you are trying to solve. In case that the
> first zones watermarks are okay we can avoid calling zlc_setup().
> 

The original implementation did not check the ZLC in the first loop
at all. It wasn't just about avoiding the cost of setup. I suspect
this problem has been there a long time and it's taking this long
for bug reports to show up because NUMA machines are being used for
generic numa-unaware workloads.

> What we do now have is checking for zlc_active in the loop just so that
> the first time around we do not call zlc_setup().
> 

Yes, why incur the cost for the common case?

> We may be able to simplify the function by:
> 
> 1.  Checking for the special case that the first zone is ok and that we do
> not want to call zlc_setup before we get to the loop.
> 
> 2. Do the zlc_setup() before the loop.
> 
> 3. Remove the zlc_setup() code as you did from the loop as well as the
> checks for zlc_active. zlc_active becomes not necessary since a zlc
> is always available when we go through the loop.
> 

That initial test will involve duplication of things like the cpuset and
no watermarks check just to place the zlc_setup() in a different place.
I might be missing your point but it seems like the gain would be
marginal. Fancy posting a patch?

-- 
Mel Gorman
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2011-07-18 21:13 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-07-15 15:08 [PATCH 0/2] Reduce frequency of stalls due to zone_reclaim() on NUMA v2r1 Mel Gorman
2011-07-15 15:08 ` Mel Gorman
2011-07-15 15:08 ` [PATCH 1/2] mm: page allocator: Initialise ZLC for first zone eligible for zone_reclaim Mel Gorman
2011-07-15 15:08   ` Mel Gorman
2011-07-18 14:56   ` Christoph Lameter
2011-07-18 14:56     ` Christoph Lameter
2011-07-18 16:05     ` Mel Gorman
2011-07-18 16:05       ` Mel Gorman
2011-07-18 17:20       ` Christoph Lameter
2011-07-18 17:20         ` Christoph Lameter
2011-07-18 21:13         ` Mel Gorman [this message]
2011-07-18 21:13           ` Mel Gorman
2011-07-18 21:54           ` Christoph Lameter
2011-07-18 21:54             ` Christoph Lameter
2011-07-19 14:01             ` Christoph Lameter
2011-07-19 14:01               ` Christoph Lameter
2011-07-20 18:08               ` Christoph Lameter
2011-07-20 18:08                 ` Christoph Lameter
2011-07-20 19:18                 ` Mel Gorman
2011-07-20 19:18                   ` Mel Gorman
2011-07-20 19:28                   ` Christoph Lameter
2011-07-20 19:28                     ` Christoph Lameter
2011-07-20 19:52                     ` Christoph Lameter
2011-07-20 19:52                       ` Christoph Lameter
2011-07-20 21:17                       ` Christoph Lameter
2011-07-20 21:17                         ` Christoph Lameter
2011-07-20 22:48                         ` Mel Gorman
2011-07-20 22:48                           ` Mel Gorman
2011-07-21 15:24                           ` Christoph Lameter
2011-07-21 15:24                             ` Christoph Lameter
2011-07-15 15:09 ` [PATCH 2/2] mm: page allocator: Reconsider zones for allocation after direct reclaim Mel Gorman
2011-07-15 15:09   ` Mel Gorman
2011-07-19 11:46   ` [PATCH] mm: page allocator: Reconsider zones for allocation after direct reclaim fix Mel Gorman
2011-07-19 11:46     ` Mel Gorman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20110718211325.GC5349@suse.de \
    --to=mgorman@suse.de \
    --cc=akpm@linux-foundation.org \
    --cc=cl@linux.com \
    --cc=kosaki.motohiro@jp.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=minchan.kim@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.