Re: [PATCH] sched,numa: limit amount of virtual memory scanned in task_numa_work

From: Mel Gorman <mgorman@suse.de>
To: Rik van Riel <riel@redhat.com>
Cc: linux-kernel@vger.kernel.org, peterz@infradead.org,
	mingo@kernel.org, Andrea Arcangeli <aarcange@redhat.com>,
	Jan Stancek <jstancek@redhat.com>
Subject: Re: [PATCH] sched,numa: limit amount of virtual memory scanned in task_numa_work
Date: Fri, 11 Sep 2015 17:16:23 +0100
Message-ID: <20150911161623.GM25655@suse.de>
In-Reply-To: <55F2F9EB.4050106@redhat.com>

On Fri, Sep 11, 2015 at 11:57:31AM -0400, Rik van Riel wrote:
> On 09/11/2015 11:05 AM, Mel Gorman wrote:
> > On Fri, Sep 11, 2015 at 09:00:27AM -0400, Rik van Riel wrote:
> >> Currently task_numa_work scans up to numa_balancing_scan_size_mb worth
> >> of memory per invocation, but only counts memory areas that have at
> >> least one PTE that is still present and not marked for numa hint faulting.
> >>
> >> It will skip over arbitrarily large amounts of memory that are either
> >> unused, full of swap ptes, or full of PTEs that were already marked
> >> for NUMA hint faults but have not been faulted on yet.
> >>
> > 
> > This was deliberate and intended to cover a case whereby a process sparsely
> > using the address space would quickly skip over the sparse portions and
> > reach the active portions. Obviously you've found that this is not always
> > a great idea.
> 
> Skipping over non-present pages is fine, since the scan
> rate is keyed off the RSS.
> 
> However, skipping over pages that are already marked
> PROT_NONE / PTE_NUMA results in unmapping pages at a much
> accelerated rate (sometimes using >90% of the CPU of the
> task), because the pages that are already PROT_NONE / NUMA
> _are_ counted as part of the RSS.
> 

True.
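
A minimal user-space sketch of the effect described above, for readers
following the thread. It is a toy model, not the kernel code: page_state,
mark_range() and scan_model() are hypothetical names, and mark_range()
only stands in for change_prot_numa(). The "pages" budget is charged only
for chunks where something was updated, while "virtpages" (the name used
for the new check in this thread) is charged for every chunk walked.

```c
#include <assert.h>
#include <stddef.h>

/* Toy per-page state; hypothetical names, not kernel types. */
enum page_state { PG_ABSENT, PG_PRESENT, PG_HINTED };

/*
 * Mark present pages in [start, end) as hinted and return how many
 * changed. Stands in for change_prot_numa() in this model only.
 */
static long mark_range(enum page_state *as, size_t start, size_t end)
{
	long updated = 0;

	for (size_t i = start; i < end; i++) {
		if (as[i] == PG_PRESENT) {
			as[i] = PG_HINTED;
			updated++;
		}
	}
	return updated;
}

/*
 * Walk the toy address space in fixed-size chunks.
 * pages:     budget charged only for chunks with at least one update.
 * virtpages: budget charged for every chunk walked; a negative value
 *            models the old behaviour with no virtual-space limit.
 * Returns the number of pages of virtual space walked.
 */
static size_t scan_model(enum page_state *as, size_t n, size_t chunk,
			 long pages, long virtpages)
{
	int limit_virt = virtpages >= 0;
	size_t start = 0;

	assert(chunk > 0);
	while (start < n) {
		size_t end = start + chunk < n ? start + chunk : n;
		long len = (long)(end - start);

		if (mark_range(as, start, end))
			pages -= len;
		if (limit_virt)
			virtpages -= len;

		start = end;
		if (pages <= 0 || (limit_virt && virtpages <= 0))
			break;
	}
	return start;
}
```

In this model, a region that is entirely PG_HINTED never charges the
"pages" budget, so without the virtual limit the walk covers the whole
region in one invocation; with the limit it stops once "virtpages" is
used up.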

> >> @@ -2240,18 +2242,22 @@ void task_numa_work(struct callback_head *work)
> >>  			start = max(start, vma->vm_start);
> >>  			end = ALIGN(start + (pages << PAGE_SHIFT), HPAGE_SIZE);
> >>  			end = min(end, vma->vm_end);
> >> -			nr_pte_updates += change_prot_numa(vma, start, end);
> >> +			nr_pte_updates = change_prot_numa(vma, start, end);
> >>  
> > 
> > Are you *sure* about this particular change?
> > 
> > The intent is that sparse space be skipped until the first updated PTE
> > is found and then scan sysctl_numa_balancing_scan_size pages after that.
> > With this change, if we find a single PTE in the middle of a sparse space
> > > then we stop updating pages in the nr_pte_updates check below. You get
> > > protected from a lot of scanning by the virtpages check but it does not
> > > seem this fix is necessary.  It has an odd side-effect whereby we possibly
> > > scan more with this patch in some cases.
> 
> True, it is possible that this patch would lead to more scanning
> than before, if a process has present PTEs interleaved with areas
> that are either sparsely populated, or already marked PROT_NONE.
> 
> However, was your intention to not quickly skip over empty areas
> that come right after one single present PTE, but only over empty
> areas at the beginning of a scan area?
> 

The intent was to skip over inactive areas which potentially are marked
PROT_NONE but not being addressed.

Just because it was the intent does not mean it was the best idea
though. I can easily see how the accelerated scan rate would occur and
why it needs to be mitigated. I just wanted to be 100% sure I understand
what you were thinking and what problem you encountered.
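
The difference between accumulating ("+=") and overwriting ("=")
nr_pte_updates, as discussed above, can be sketched in isolation. This is
a simplified user-space model under stated assumptions, not the kernel
loop: chunks_walked() and its parameters are hypothetical, the per-chunk
update counts are supplied by the caller instead of coming from
change_prot_numa(), and the virtpages limit is deliberately left out so
that only the one-line change is visible.

```c
#include <assert.h>
#include <stddef.h>

/*
 * updates[i] is the number of PTEs a hypothetical change_prot_numa()
 * call would update in chunk i; each chunk spans chunk_pages pages.
 * accumulate != 0 models "nr_pte_updates += ..." (before the patch).
 * accumulate == 0 models "nr_pte_updates = ..."  (after the patch).
 * Returns the number of chunks walked before the pages budget runs
 * out, or nchunks if the budget is never exhausted.
 */
static size_t chunks_walked(const long *updates, size_t nchunks,
			    long chunk_pages, long pages, int accumulate)
{
	long nr_pte_updates = 0;
	size_t i;

	assert(chunk_pages > 0);
	for (i = 0; i < nchunks; i++) {
		if (accumulate)
			nr_pte_updates += updates[i];
		else
			nr_pte_updates = updates[i];

		/* Charge the budget only when the counter is non-zero. */
		if (nr_pte_updates)
			pages -= chunk_pages;
		if (pages <= 0) {
			i++;	/* count the chunk that used up the budget */
			break;
		}
	}
	return i;
}
```

With a single updated PTE in the middle of otherwise empty chunks, the
accumulating variant keeps charging every later chunk and stops early,
while the overwriting variant charges only the chunk that had the update
and keeps walking. That is the "possibly scan more" side effect raised in
the review, which the separate virtpages limit is meant to bound.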

Acked-by: Mel Gorman <mgorman@suse.de>

Thanks.

-- 
Mel Gorman
SUSE Labs

Thread overview: 5+ messages
2015-09-11 13:00 [PATCH] sched,numa: limit amount of virtual memory scanned in task_numa_work Rik van Riel
2015-09-11 15:05 ` Mel Gorman
2015-09-11 15:57   ` Rik van Riel
2015-09-11 16:16     ` Mel Gorman [this message]
2015-09-18  8:48 ` [tip:sched/core] sched/numa: Limit the amount of virtual memory scanned in task_numa_work() tip-bot for Rik van Riel