Re: [PATCH 18/31] mm: sched: numa: Implement constant, per task Working Set Sampling (WSS) rate

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Mel Gorman <mgorman@suse.de>
To: Andrew Theurer <habanero@linux.vnet.ibm.com>
Cc: a.p.zijlstra@chello.nl, riel@redhat.com, aarcange@redhat.com,
	lee.schermerhorn@hp.com, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 18/31] mm: sched: numa: Implement constant, per task Working Set Sampling (WSS) rate
Date: Thu, 15 Nov 2012 10:27:50 +0000	[thread overview]
Message-ID: <20121115102750.GT8218@suse.de> (raw)
In-Reply-To: <1352921993.7130.154.camel@oc2024037011.ibm.com>

On Wed, Nov 14, 2012 at 01:39:53PM -0600, Andrew Theurer wrote:
> > > <SNIP>
> > >
> > > I am wondering if it would be better to shrink the scan period back to a
> > > much smaller fixed value,
> > 
> > I'll do that anyway.
> > 
> > > and instead of picking 256MB ranges of memory
> > > to mark completely, go back to using all of the address space, but mark
> > > only every Nth page. 
> > 
> > It'll still be necessary to do the full walk and I wonder if we'd lose on
> > the larger number of PTE locks that will have to be taken to do a scan if
> > we are only updating every 128 pages for example. It could be very expensive.
> 
> Yes, good point.  My other inclination was not doing a mass marking of
> pages at all (except just one time at some point after task init) and
> conditionally setting or clearing the prot_numa in the fault path itself
> to control the fault rate. 

That's a bit of a catch-22. You need faults to control the scan rate
which determines the fault rate.

One thing that could be done is that the PTE scanning-and-updating is
rate limited if there is an excessive number of migrations due to NUMA
hinting faults within a given window. I've prototyped something along
these lines. The problem is that it'll disrupt the accuracy of the
statistics gathered by the hinting faults.

> The problem I see is I am not sure how we
> "back-off" the fault rate per page. 

I went for a straight cutoff. If a node has migrated too much recently,
no PTEs are marked for update if the PTE points to a page on that node. I
know it's a big heavy hammer but it'll indicate if it's worthwhile.

> You could choose to not leave the
> page marked, but then you never get a fault on that page again, so
> there's no good way to mark it again in the fault path for that page
> unless you have the periodic marker. 

In my case, the throttle window expires and it goes back to scanning at
the normal rate. I've changed the details of how the scanning rate
increases and decreases but how exactly is not that important right now.

> However, maybe a certain number of
> pages are considered clustered together, and a fault from any page is
> considered a fault for the cluster of pages.  When handling the fault,
> the number of pages which are marked in the cluster is varied to achieve
> a target, reasonable fault rate.  Might be able to treat page migrations
> in clusters as well...  I probably need to think about this a bit
> more....
> 

FWIW, I'm wary of putting too many smarts into how the scanning rates are
adapted. It'll be too specific to workloads and machine sizes.

-- 
Mel Gorman
SUSE Labs

next prev parent reply	other threads:[~2012-11-15 10:27 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-11-14 17:24 [PATCH 18/31] mm: sched: numa: Implement constant, per task Working Set Sampling (WSS) rate Andrew Theurer
2012-11-14 18:28 ` Mel Gorman
2012-11-14 19:39   ` Andrew Theurer
2012-11-15 10:27     ` Mel Gorman [this message]
  -- strict thread matches above, loose matches on Subject: below --
2012-11-13 11:12 [RFC PATCH 00/31] Foundation for automatic NUMA balancing V2 Mel Gorman
2012-11-13 11:12 ` [PATCH 18/31] mm: sched: numa: Implement constant, per task Working Set Sampling (WSS) rate Mel Gorman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20121115102750.GT8218@suse.de \
    --to=mgorman@suse.de \
    --cc=a.p.zijlstra@chello.nl \
    --cc=aarcange@redhat.com \
    --cc=habanero@linux.vnet.ibm.com \
    --cc=lee.schermerhorn@hp.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=riel@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).