From: Rik van Riel <riel@redhat.com>
To: Mel Gorman <mgorman@suse.de>
Cc: linux-kernel@vger.kernel.org, peterz@infradead.org,
mingo@kernel.org, Andrea Arcangeli <aarcange@redhat.com>,
Jan Stancek <jstancek@redhat.com>
Subject: Re: [PATCH] sched,numa: limit amount of virtual memory scanned in task_numa_work
Date: Fri, 11 Sep 2015 11:57:31 -0400
Message-ID: <55F2F9EB.4050106@redhat.com>
In-Reply-To: <20150911150544.GL25655@suse.de>

On 09/11/2015 11:05 AM, Mel Gorman wrote:
> On Fri, Sep 11, 2015 at 09:00:27AM -0400, Rik van Riel wrote:
>> Currently task_numa_work scans up to numa_balancing_scan_size_mb worth
>> of memory per invocation, but only counts memory areas that have at
>> least one PTE that is still present and not marked for numa hint faulting.
>>
>> It will skip over arbitrarily large amounts of memory that are either
>> unused, full of swap ptes, or full of PTEs that were already marked
>> for NUMA hint faults but have not been faulted on yet.
>>
>
> This was deliberate and intended to cover a case whereby a process sparsely
> using the address space would quickly skip over the sparse portions and
> reach the active portions. Obviously you've found that this is not always
> a great idea.

Skipping over non-present pages is fine, since the scan
rate is keyed off the RSS.

However, skipping over pages that are already marked
PROT_NONE / PTE_NUMA results in unmapping pages at a much
accelerated rate (sometimes using >90% of the CPU of the
task), because the pages that are already PROT_NONE / NUMA
_are_ counted as part of the RSS.

>> @@ -2240,18 +2242,22 @@ void task_numa_work(struct callback_head *work)
>>  			start = max(start, vma->vm_start);
>>  			end = ALIGN(start + (pages << PAGE_SHIFT), HPAGE_SIZE);
>>  			end = min(end, vma->vm_end);
>> -			nr_pte_updates += change_prot_numa(vma, start, end);
>> +			nr_pte_updates = change_prot_numa(vma, start, end);
>>
>
> Are you *sure* about this particular change?
>
> The intent is that sparse space be skipped until the first updated PTE
> is found and then scan sysctl_numa_balancing_scan_size pages after that.
> With this change, if we find a single PTE in the middle of a sparse space
> then we stop updating pages in the nr_pte_updates check below. You get
> protected from a lot of scanning by the virtpages check but it does not
> seem this fix is necessary. It has an odd side-effect whereby we possibly
> scan more with this patch in some cases.

True, it is possible that this patch would lead to more scanning
than before, if a process has present PTEs interleaved with areas
that are either sparsely populated, or already marked PROT_NONE.

However, was your intention to not quickly skip over empty areas
that come right after one single present PTE, but only over empty
areas at the beginning of a scan area?

If so, I don't understand the logic behind that, and would like
to know more :)
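
To make the difference between the two accounting schemes concrete, here
is a toy userspace model, not kernel code. It only mimics the effect of
"nr_pte_updates +=" versus "nr_pte_updates =" from the hunk above. The
function chunks_charged() and the updates[] array (standing in for the
per-chunk return value of change_prot_numa()) are hypothetical names made
up for this illustration.

```c
#include <stddef.h>

/*
 * Count how many chunks get charged against the "pages" budget.
 *
 * updates[i] is an assumed stand-in for what change_prot_numa() would
 * return for chunk i. accumulate != 0 models "nr_pte_updates += ..."
 * (before the patch); accumulate == 0 models "nr_pte_updates = ..."
 * (with the patch).
 */
size_t chunks_charged(const unsigned long *updates, size_t nr_chunks,
		      int accumulate)
{
	unsigned long nr_pte_updates = 0;
	size_t charged = 0;

	for (size_t i = 0; i < nr_chunks; i++) {
		if (accumulate)
			nr_pte_updates += updates[i];
		else
			nr_pte_updates = updates[i];

		/* Mirrors "if (nr_pte_updates) pages -= ..." */
		if (nr_pte_updates)
			charged++;
	}
	return charged;
}
```

With updates = {0, 0, 1, 0, 0, 4}, the accumulating variant charges four
chunks (everything from the first updated PTE onwards, including the two
empty chunks that follow it), while the per-chunk variant charges only the
two chunks that actually had PTEs updated.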

>>  			/*
>> -			 * Scan sysctl_numa_balancing_scan_size but ensure that
>> -			 * at least one PTE is updated so that unused virtual
>> -			 * address space is quickly skipped.
>> +			 * Try to scan sysctl_numa_balancing_size worth of
>> +			 * hpages that have at least one present PTE that
>> +			 * is not already pte-numa. If the VMA contains
>> +			 * areas that are unused or already full of prot_numa
>> +			 * PTEs, scan up to virtpages, to skip through those
>> +			 * areas faster.
>>  			 */
>>  			if (nr_pte_updates)
>>  				pages -= (end - start) >> PAGE_SHIFT;
>> +			virtpages -= (end - start) >> PAGE_SHIFT;
>>
>
> It's a pity there will potentially be a lot of useless dead scanning on
> those processes but caching start addresses is both outside the scope of
> this patch and has its own problems.

The problem has been observed when processes already have a lot of
pages marked PROT_NONE by change_prot_numa(), so that change_prot_numa()
returns zero because no PTEs were changed.

In that case, the amount of useless dead scanning should be a whole
lot less with this patch than without.

I do not quite understand how this patch makes it worse, though.
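
A rough way to see the effect of the virtpages budget is the toy model
below. It is a userspace sketch under stated assumptions, not the kernel
loop: chunks_walked(), the updates[] array (standing in for the return
value of change_prot_numa() per chunk), the fixed chunk size, and the
limit_virtual switch are all hypothetical and exist only for
illustration.

```c
#include <stddef.h>

/*
 * Toy model of the scan loop: returns how many chunks are walked
 * before the loop gives up.
 *
 * updates[i]    assumed number of PTEs changed in chunk i
 * chunk_pages   assumed fixed chunk size, in pages
 * pages         budget charged only for chunks with updated PTEs
 * virtpages     budget charged for every chunk walked
 * limit_virtual 0 models the loop without the virtpages check,
 *               nonzero models the loop with it
 */
size_t chunks_walked(const unsigned long *updates, size_t nr_chunks,
		     long chunk_pages, long pages, long virtpages,
		     int limit_virtual)
{
	size_t walked = 0;

	for (size_t i = 0; i < nr_chunks; i++) {
		unsigned long nr_pte_updates = updates[i];

		walked++;
		if (nr_pte_updates)
			pages -= chunk_pages;
		if (limit_virtual)
			virtpages -= chunk_pages;

		if (pages <= 0 || virtpages <= 0)
			break;
	}
	return walked;
}
```

As an example with made-up numbers: take 4096 chunks of 512 pages each
where every chunk reports zero updates (everything is already PROT_NONE),
pages = 65536 and virtpages = 8 * pages (the multiplier is an assumption
for this sketch). Without the virtual limit the model walks all 4096
chunks; with it, the walk stops after 1024 chunks. When every chunk has
present PTEs to update, both variants stop after 128 chunks, so the
common case is unchanged.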
--
All rights reversed