From: Mike Rapoport <rppt@kernel.org>
To: Raghavendra K T <raghavendra.kt@amd.com>
Cc: linux-kernel@vger.kernel.org,
linux-mm@kvack.org, Ingo Molnar <mingo@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Juri Lelli <juri.lelli@redhat.com>,
Vincent Guittot <vincent.guittot@linaro.org>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Steven Rostedt <rostedt@goodmis.org>,
Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
Daniel Bristot de Oliveira <bristot@redhat.com>,
Valentin Schneider <vschneid@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>,
Matthew Wilcox <willy@infradead.org>,
Vlastimil Babka <vbabka@suse.cz>,
"Liam R . Howlett" <Liam.Howlett@oracle.com>,
Peter Xu <peterx@redhat.com>,
David Hildenbrand <david@redhat.com>, xu xin <cgel.zte@gmail.com>,
Yu Zhao <yuzhao@google.com>, Colin Cross <ccross@google.com>,
Arnd Bergmann <arnd@arndb.de>, Hugh Dickins <hughd@google.com>,
Bharata B Rao <bharata@amd.com>,
Disha Talreja <dishaa.talreja@amd.com>
Subject: Re: [RFC PATCH V1 1/1] sched/numa: Enhance vma scanning logic
Date: Thu, 19 Jan 2023 11:39:36 +0200
Message-ID: <Y8kP2KbJqWDIgGRZ@kernel.org>
In-Reply-To: <67bf778d592c39d02444825c416c2ed11d2ef4b2.1673610485.git.raghavendra.kt@amd.com>
Hi,
On Mon, Jan 16, 2023 at 07:05:34AM +0530, Raghavendra K T wrote:
> During the Numa scanning make sure only relevant vmas of the
> tasks are scanned.
Please add a more detailed description of the problems with the current
scanning that this patch aims to solve.
> Logic:
> 1) For the first two time allow unconditional scanning of vmas
> 2) Store recent 4 unique tasks (last 8bits of PIDs) accessed the vma.
> False negetives in case of collison should be fine here.
        ^ negatives
> 3) If more than 4 pids exist assume task indeed accessed vma to
> to avoid false negetives
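For reference, the three rules above can be sketched in plain userspace C
roughly as follows. This is only an illustration, not the kernel code: the
names PID_TAG_SHIFT, PID_TAG_MASK, MAX_PID_SLOTS, struct vma_pid_state and
should_scan_vma() are made up for the example, and the 8-bit tag width is
taken from the "last 8bits of PIDs" wording in the description.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: four 8-bit PID tags packed into one 32-bit word. */
#define PID_TAG_SHIFT 8u
#define PID_TAG_MASK  ((1u << PID_TAG_SHIFT) - 1u)
#define MAX_PID_SLOTS (32u / PID_TAG_SHIFT)

/* Hypothetical stand-in for the per-VMA field the patch adds. */
struct vma_pid_state {
	uint32_t accessing_pids;	/* packed tags, 0 means "empty slot" */
};

/*
 * Decide whether a VMA should be scanned on behalf of @pid.
 * @scan_seq models the number of completed scan passes.
 */
static bool should_scan_vma(const struct vma_pid_state *st,
			    unsigned int pid, unsigned int scan_seq)
{
	uint32_t tag = pid & PID_TAG_MASK;
	unsigned int i;

	/* Rule 1: the first two passes scan unconditionally. */
	if (scan_seq < 2)
		return true;

	for (i = 0; i < MAX_PID_SLOTS; i++) {
		uint32_t slot = (st->accessing_pids >> (i * PID_TAG_SHIFT)) &
				PID_TAG_MASK;

		/* Rule 2: this task (or one with the same tag) was seen. */
		if (slot == tag)
			return true;
		/* An empty slot means fewer than four tags are recorded. */
		if (slot == 0)
			return false;
	}

	/* Rule 3: all slots are in use, so assume the task accessed it. */
	return true;
}
```

Two PIDs that share the same low 8 bits are indistinguishable here, and a
PID whose low 8 bits are zero always matches an empty slot, so in both
cases the VMA is scanned rather than skipped.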
>
> Co-developed-by: Bharata B Rao <bharata@amd.com>
> (initial patch to store pid information)
>
> Suggested-by: Mel Gorman <mgorman@techsingularity.net>
> Signed-off-by: Bharata B Rao <bharata@amd.com>
> Signed-off-by: Raghavendra K T <raghavendra.kt@amd.com>
> ---
> include/linux/mm_types.h | 2 ++
> kernel/sched/fair.c | 32 ++++++++++++++++++++++++++++++++
> mm/memory.c | 21 +++++++++++++++++++++
> 3 files changed, 55 insertions(+)
>
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 500e536796ca..07feae37b8e6 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -506,6 +506,8 @@ struct vm_area_struct {
> struct mempolicy *vm_policy; /* NUMA policy for the VMA */
> #endif
> struct vm_userfaultfd_ctx vm_userfaultfd_ctx;
> + unsigned int accessing_pids;
> + int next_pid_slot;
> } __randomize_layout;
>
> struct kioctx_table;
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index e4a0b8bd941c..944d2e3b0b3c 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -2916,6 +2916,35 @@ static void reset_ptenuma_scan(struct task_struct *p)
> p->mm->numa_scan_offset = 0;
> }
>
> +static bool vma_is_accessed(struct vm_area_struct *vma)
> +{
> + int i;
> + bool more_pids_exist;
> + unsigned long pid, max_pids;
> + unsigned long current_pid = current->pid & LAST__PID_MASK;
> +
> + max_pids = sizeof(unsigned int) * BITS_PER_BYTE / LAST__PID_SHIFT;
> +
> + /* By default we assume >= max_pids exist */
> + more_pids_exist = true;
> +
> + if (READ_ONCE(current->mm->numa_scan_seq) < 2)
> + return true;
> +
> + for (i = 0; i < max_pids; i++) {
> + pid = (vma->accessing_pids >> i * LAST__PID_SHIFT) &
> + LAST__PID_MASK;
> + if (pid == current_pid)
> + return true;
> + if (pid == 0) {
> + more_pids_exist = false;
> + break;
> + }
> + }
> +
> + return more_pids_exist;
> +}
> +
> /*
> * The expensive part of numa migration is done from task_work context.
> * Triggered from task_tick_numa().
> @@ -3015,6 +3044,9 @@ static void task_numa_work(struct callback_head *work)
> if (!vma_is_accessible(vma))
> continue;
>
> + if (!vma_is_accessed(vma))
> + continue;
> +
> do {
> start = max(start, vma->vm_start);
> end = ALIGN(start + (pages << PAGE_SHIFT), HPAGE_SIZE);
> diff --git a/mm/memory.c b/mm/memory.c
> index 8c8420934d60..fafd78d87a51 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4717,7 +4717,28 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf)
> pte_t pte, old_pte;
> bool was_writable = pte_savedwrite(vmf->orig_pte);
> int flags = 0;
> + int pid_slot = vma->next_pid_slot;
>
> + int i;
> + unsigned long pid, max_pids;
> + unsigned long current_pid = current->pid & LAST__PID_MASK;
> +
> + max_pids = sizeof(unsigned int) * BITS_PER_BYTE / LAST__PID_SHIFT;
> +
> + /* Avoid duplicate PID updation */
> + for (i = 0; i < max_pids; i++) {
> + pid = (vma->accessing_pids >> i * LAST__PID_SHIFT) &
> + LAST__PID_MASK;
> + if (pid == current_pid)
> + goto skip_update;
> + }
> +
> + vma->next_pid_slot = (++pid_slot) % max_pids;
> + vma->accessing_pids &= ~(LAST__PID_MASK << (pid_slot * LAST__PID_SHIFT));
> + vma->accessing_pids |= ((current_pid) <<
> + (pid_slot * LAST__PID_SHIFT));
> +
> +skip_update:
> /*
> * The "pte" at this point cannot be used safely without
> * validation through pte_unmap_same(). It's of NUMA type but
> --
> 2.34.1
>
>
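To make the recording side in the do_numa_page() hunk easier to follow,
here is a small userspace sketch of a four-slot ring of 8-bit PID tags.
It is an illustration under assumptions, not the patch itself: struct
vma_pid_ring, record_accessing_pid() and the PID_TAG_* macros are
hypothetical names. The sketch writes to the current slot and wraps the
index before it is used for shifting, which is a choice made here and
differs from the quoted hunk.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: four 8-bit PID tags packed into one 32-bit word. */
#define PID_TAG_SHIFT 8u
#define PID_TAG_MASK  ((1u << PID_TAG_SHIFT) - 1u)
#define MAX_PID_SLOTS (32u / PID_TAG_SHIFT)

/* Hypothetical stand-in for the two fields added to the VMA. */
struct vma_pid_ring {
	uint32_t accessing_pids;	/* packed tags, 0 means "empty slot" */
	unsigned int next_slot;		/* slot to overwrite next, 0..3 */
};

/*
 * Record the low 8 bits of @pid in the ring.
 * Returns true if a slot was written, false if the tag was already there.
 */
static bool record_accessing_pid(struct vma_pid_ring *r, unsigned int pid)
{
	uint32_t tag = pid & PID_TAG_MASK;
	unsigned int i, shift;

	/* Skip the update when the tag is already recorded. */
	for (i = 0; i < MAX_PID_SLOTS; i++) {
		uint32_t slot = (r->accessing_pids >> (i * PID_TAG_SHIFT)) &
				PID_TAG_MASK;

		if (slot == tag)
			return false;
	}

	/* Overwrite the oldest slot; shift stays within 0..24 bits. */
	shift = r->next_slot * PID_TAG_SHIFT;
	r->accessing_pids &= ~(PID_TAG_MASK << shift);
	r->accessing_pids |= tag << shift;

	/* Advance and wrap only after the slot has been written. */
	r->next_slot = (r->next_slot + 1) % MAX_PID_SLOTS;
	return true;
}
```

Once four distinct tags have been stored, a fifth replaces the oldest one,
so the word always holds the most recent four. A tag of zero is treated as
already present whenever any slot is still empty.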
--
Sincerely yours,
Mike.