Re: [PATCH] sched,numa: limit amount of virtual memory scanned in task_numa_work

From: Mel Gorman <mgorman@suse.de>
To: Rik van Riel <riel@redhat.com>
Cc: linux-kernel@vger.kernel.org, peterz@infradead.org,
	mingo@kernel.org, Andrea Arcangeli <aarcange@redhat.com>,
	Jan Stancek <jstancek@redhat.com>
Subject: Re: [PATCH] sched,numa: limit amount of virtual memory scanned in task_numa_work
Date: Fri, 11 Sep 2015 17:16:23 +0100
Message-ID: <20150911161623.GM25655@suse.de>
In-Reply-To: <55F2F9EB.4050106@redhat.com>

On Fri, Sep 11, 2015 at 11:57:31AM -0400, Rik van Riel wrote:
> On 09/11/2015 11:05 AM, Mel Gorman wrote:
> > On Fri, Sep 11, 2015 at 09:00:27AM -0400, Rik van Riel wrote:
> >> Currently task_numa_work scans up to numa_balancing_scan_size_mb worth
> >> of memory per invocation, but only counts memory areas that have at
> >> least one PTE that is still present and not marked for numa hint faulting.
> >>
> >> It will skip over arbitrarily large amounts of memory that are either
> >> unused, full of swap ptes, or full of PTEs that were already marked
> >> for NUMA hint faults but have not been faulted on yet.
> >>
> > 
> > This was deliberate and intended to cover a case whereby a process sparsely
> > using the address space would quickly skip over the sparse portions and
> > reach the active portions. Obviously you've found that this is not always
> > a great idea.
> 
> Skipping over non-present pages is fine, since the scan
> rate is keyed off the RSS.
> 
> However, skipping over pages that are already marked
> PROT_NONE / PTE_NUMA results in unmapping pages at a much
> accelerated rate (sometimes using >90% of the CPU of the
> task), because the pages that are already PROT_NONE / NUMA
> _are_ counted as part of the RSS.
> 

True.
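
As a rough illustration of why that matters (a toy model only, not the
kernel code): assume the scanner walks fixed-size chunks and charges its
page budget only for chunks that contain at least one PTE it can still
update. The names toy_scan, pages_budget, virt_budget and chunk_pages are
made up for this sketch; virt_budget stands in for the limit on virtual
memory that the patch introduces.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Result of one simulated scan invocation (hypothetical type). */
struct toy_scan_result {
	size_t chunks_walked;	/* chunks visited, charged or not */
	size_t chunks_charged;	/* chunks that consumed page budget */
};

/*
 * updatable[i] != 0 means chunk i holds at least one present PTE that is
 * not yet marked for NUMA hinting. A zero entry models a chunk that is
 * unused or already fully marked PROT_NONE.
 */
static struct toy_scan_result toy_scan(const int *updatable, size_t n,
				       long pages_budget, long virt_budget,
				       long chunk_pages)
{
	struct toy_scan_result r = { 0, 0 };

	assert(chunk_pages > 0);

	for (size_t i = 0; i < n; i++) {
		r.chunks_walked++;

		/* Page budget is only charged when something was updated. */
		if (updatable[i]) {
			pages_budget -= chunk_pages;
			r.chunks_charged++;
		}

		/* Virtual budget is charged for every chunk walked. */
		virt_budget -= chunk_pages;

		if (pages_budget <= 0 || virt_budget <= 0)
			break;
	}

	return r;
}
```

With 63 chunks that yield no updates followed by one that does, a page
budget of one chunk, and virt_budget set to LONG_MAX, the loop walks all
64 chunks. Capping virt_budget at eight chunks (an arbitrary figure for
illustration) stops it after 8.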

> >> @@ -2240,18 +2242,22 @@ void task_numa_work(struct callback_head *work)
> >>  			start = max(start, vma->vm_start);
> >>  			end = ALIGN(start + (pages << PAGE_SHIFT), HPAGE_SIZE);
> >>  			end = min(end, vma->vm_end);
> >> -			nr_pte_updates += change_prot_numa(vma, start, end);
> >> +			nr_pte_updates = change_prot_numa(vma, start, end);
> >>  
> > 
> > Are you *sure* about this particular change?
> > 
> > The intent is that sparse space be skipped until the first updated PTE
> > is found and then scan sysctl_numa_balancing_scan_size pages after that.
> > With this change, if we find a single PTE in the middle of a sparse space
> > then we stop updating pages in the nr_pte_updates check below. You get
> > protected from a lot of scanning by the virtpages check but it does not
> > seem this fix is necessary.  It has an odd side-effect whereby we possibly
> > scan more with this patch in some cases.
> 
> True, it is possible that this patch would lead to more scanning
> than before, if a process has present PTEs interleaved with areas
> that are either sparsely populated, or already marked PROT_NONE.
> 
> However, was your intention to not quickly skip over empty areas
> that come right after one single present PTE, but only over empty
> areas at the beginning of a scan area?
> 

The intent was to skip over inactive areas which potentially are marked
PROT_NONE but not being addressed.

Just because it was the intent does not mean it was the best idea
though. I can easily see how the accelerated scan rate would occur and
why it needs to be mitigated. I just wanted to be 100% sure I understand
what you were thinking and what problem you encountered.
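
For anyone following along, the difference between the accumulating
counter ("+=") and the per-chunk counter ("=") discussed above can be
modelled roughly as below. This is a simplified sketch, not the actual
task_numa_work() loop: toy_chunks_walked and its parameters are
hypothetical, and the virtpages limit is deliberately left out so that
only the counter semantics differ.

```c
#include <assert.h>
#include <stddef.h>

/*
 * updates_per_chunk[i] is the number of PTEs a chunk would have updated.
 * cumulative != 0 models "nr_pte_updates += ...": once any chunk has
 * updated a PTE, every later chunk is charged against the budget.
 * cumulative == 0 models "nr_pte_updates = ...": only chunks that
 * update at least one PTE themselves are charged.
 */
static size_t toy_chunks_walked(const unsigned long *updates_per_chunk,
				size_t n, long pages, long chunk_pages,
				int cumulative)
{
	unsigned long nr_pte_updates = 0;
	size_t walked = 0;

	assert(chunk_pages > 0);

	for (size_t i = 0; i < n && pages > 0; i++) {
		if (cumulative)
			nr_pte_updates += updates_per_chunk[i];
		else
			nr_pte_updates = updates_per_chunk[i];

		/* Charge the scan budget only when the counter is non-zero. */
		if (nr_pte_updates)
			pages -= chunk_pages;

		walked++;
	}

	return walked;
}
```

For a layout of { 0, 0, 1, 0, 0, 0, 1, 0 } with a budget of two chunks,
the accumulating variant stops after 4 chunks, while the per-chunk
variant walks 7 before the budget runs out. That is the "scan more in
some cases" side-effect when present PTEs are interleaved with sparse or
already-marked areas.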

Acked-by: Mel Gorman <mgorman@suse.de>

Thanks.

-- 
Mel Gorman
SUSE Labs

Thread overview: 5+ messages
2015-09-11 13:00 [PATCH] sched,numa: limit amount of virtual memory scanned in task_numa_work Rik van Riel
2015-09-11 15:05 ` Mel Gorman
2015-09-11 15:57   ` Rik van Riel
2015-09-11 16:16     ` Mel Gorman [this message]
2015-09-18  8:48 ` [tip:sched/core] sched/numa: Limit the amount of virtual memory scanned in task_numa_work() tip-bot for Rik van Riel
