From: Byungchul Park <byungchul@sk.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: "Huang, Ying" <ying.huang@intel.com>,
mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com,
vincent.guittot@linaro.org, dietmar.eggemann@arm.com,
rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de,
bristot@redhat.com, vschneid@redhat.com,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
kernel_team@skhynix.com
Subject: Re: [PATCH v4] sched/numa, mm: do not try to migrate memory to memoryless nodes
Date: Tue, 20 Feb 2024 11:33:04 +0900
Message-ID: <20240220023304.GF65758@system.software.com>
In-Reply-To: <20240219174508.bc6256248a163c3ab9a58369@linux-foundation.org>
On Mon, Feb 19, 2024 at 05:45:08PM -0800, Andrew Morton wrote:
> On Mon, 19 Feb 2024 16:43:36 +0800 "Huang, Ying" <ying.huang@intel.com> wrote:
>
> > > + /*
> > > + * Cannot migrate to memoryless nodes.
> > > + */
> > > + if (!node_state(dst_nid, N_MEMORY))
> > > + return false;
> > > +
> > > /*
> > > * The pages in slow memory node should be migrated according
> > > * to hot/cold instead of private/shared.
> >
> > Good catch!
> >
> > IIUC, you will use patch as fix to the issue in
> >
> > https://lore.kernel.org/lkml/20240216111502.79759-1-byungchul@sk.com/
> >
> > If so, we need the Fixes: tag to make it land in -stable properly.
>
> Yes, this changelog is missing rather a lot of important information.
>
> I pulled together the below, please check.
To be clear, let me explain further. I posted the following two patches
while working on the oops issue, but the two serve different purposes.
1) https://lkml.kernel.org/r/20240219041920.1183-1-byungchul@sk.com
I started this patch as the fix for the oops. However, I then found
that the root cause is the use of -1 as an array index, so I moved the
root-cause fix to a separate thread, 2). Patch 1) is still worth
having as a *reasonable optimization*, but it is no longer the real
fix.
2) https://lkml.kernel.org/r/20240216111502.79759-1-byungchul@sk.com
This one fixes the root cause of the oops, which is the use of -1 as
an array index. I therefore moved the oops message, the Fixes: tag,
and the stable Cc to this patch. Long story short, 2) is the
*real fix* for the oops.
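To illustrate what "using -1 as an array index" means here, below is a
minimal user-space sketch, not the kernel code. It assumes the index is
obtained by scanning a node's zones from the top for one with managed
pages, which yields -1 on a node with no managed zones. All names
(zone_model, node_model, MODEL_NR_ZONES, highest_managed_zone,
wake_kswapd_checked) are hypothetical and exist only for illustration.

```c
#include <stdbool.h>

#define MODEL_NR_ZONES 4	/* hypothetical; not the kernel's value */

/* Simplified stand-ins, not the kernel's struct zone / pg_data_t. */
struct zone_model {
	unsigned long managed_pages;
};

struct node_model {
	struct zone_model zones[MODEL_NR_ZONES];
	int last_woken_zone;	/* index handed to the modeled wakeup */
	int wakeups;		/* number of modeled wakeups issued */
};

/* Return the highest zone index with managed pages, or -1 if none. */
static int highest_managed_zone(const struct node_model *node)
{
	int z;

	for (z = MODEL_NR_ZONES - 1; z >= 0; z--) {
		if (node->zones[z].managed_pages)
			break;
	}
	return z;
}

/*
 * Check the index before using it. Without the check, a node with no
 * managed zones would make the caller touch zones[-1], which is out
 * of bounds.
 */
static bool wake_kswapd_checked(struct node_model *node)
{
	int z = highest_managed_zone(node);

	if (z < 0)
		return false;

	node->last_woken_zone = z;
	node->wakeups++;
	return true;
}
```

In this model a zero-initialized node_model plays the role of the
memoryless node: the scan returns -1 and no wakeup is issued.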
Byungchul
> From: Byungchul Park <byungchul@sk.com>
> Subject: sched/numa, mm: do not try to migrate memory to memoryless nodes
> Date: Mon, 19 Feb 2024 13:10:47 +0900
>
> With numa balancing on, when a numa system is running where a numa node
> doesn't have its local memory so it has no managed zones, the following
> oops has been observed. It's because wakeup_kswapd() is called with a
> wrong zone index, -1. Fixed it by checking the index before calling
> wakeup_kswapd().
>
> > BUG: unable to handle page fault for address: 00000000000033f3
> > #PF: supervisor read access in kernel mode
> > #PF: error_code(0x0000) - not-present page
> > PGD 0 P4D 0
> > Oops: 0000 [#1] PREEMPT SMP NOPTI
> > CPU: 2 PID: 895 Comm: masim Not tainted 6.6.0-dirty #255
> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> > rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
> > RIP: 0010:wakeup_kswapd (./linux/mm/vmscan.c:7812)
> > Code: (omitted)
> > RSP: 0000:ffffc90004257d58 EFLAGS: 00010286
> > RAX: ffffffffffffffff RBX: ffff88883fff0480 RCX: 0000000000000003
> > RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88883fff0480
> > RBP: ffffffffffffffff R08: ff0003ffffffffff R09: ffffffffffffffff
> > R10: ffff888106c95540 R11: 0000000055555554 R12: 0000000000000003
> > R13: 0000000000000000 R14: 0000000000000000 R15: ffff88883fff0940
> > FS: 00007fc4b8124740(0000) GS:ffff888827c00000(0000) knlGS:0000000000000000
> > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 00000000000033f3 CR3: 000000026cc08004 CR4: 0000000000770ee0
> > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > PKRU: 55555554
> > Call Trace:
> > <TASK>
> > ? __die
> > ? page_fault_oops
> > ? __pte_offset_map_lock
> > ? exc_page_fault
> > ? asm_exc_page_fault
> > ? wakeup_kswapd
> > migrate_misplaced_page
> > __handle_mm_fault
> > handle_mm_fault
> > do_user_addr_fault
> > exc_page_fault
> > asm_exc_page_fault
> > RIP: 0033:0x55b897ba0808
> > Code: (omitted)
> > RSP: 002b:00007ffeefa821a0 EFLAGS: 00010287
> > RAX: 000055b89983acd0 RBX: 00007ffeefa823f8 RCX: 000055b89983acd0
> > RDX: 00007fc2f8122010 RSI: 0000000000020000 RDI: 000055b89983acd0
> > RBP: 00007ffeefa821a0 R08: 0000000000000037 R09: 0000000000000075
> > R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000
> > R13: 00007ffeefa82410 R14: 000055b897ba5dd8 R15: 00007fc4b8340000
> > </TASK>
>
> Fix this by avoiding any attempt to migrate memory to memoryless nodes.
>
> Link: https://lkml.kernel.org/r/20240219041920.1183-1-byungchul@sk.com
> Link: https://lkml.kernel.org/r/20240216111502.79759-1-byungchul@sk.com
> Fixes: c574bbe917036 ("NUMA balancing: optimize page placement for memory tiering system")
> Signed-off-by: Byungchul Park <byungchul@sk.com>
> Reviewed-by: Oscar Salvador <osalvador@suse.de>
> Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
> Reviewed-by: Phil Auld <pauld@redhat.com>
> Cc: Benjamin Segall <bsegall@google.com>
> Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Juri Lelli <juri.lelli@redhat.com>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Valentin Schneider <vschneid@redhat.com>
> Cc: Vincent Guittot <vincent.guittot@linaro.org>
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> ---
>
> kernel/sched/fair.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> --- a/kernel/sched/fair.c~sched-numa-mm-do-not-try-to-migrate-memory-to-memoryless-nodes
> +++ a/kernel/sched/fair.c
> @@ -1831,6 +1831,12 @@ bool should_numa_migrate_memory(struct t
> int last_cpupid, this_cpupid;
>
> /*
> + * Cannot migrate to memoryless nodes.
> + */
> + if (!node_state(dst_nid, N_MEMORY))
> + return false;
> +
> + /*
> * The pages in slow memory node should be migrated according
> * to hot/cold instead of private/shared.
> */
> _
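For reference, the effect of the guard in the hunk above can be modeled
in user space as follows. This is only a sketch under stated
assumptions: the set of nodes that have memory is represented by a plain
bitmask standing in for the N_MEMORY node state, and the remainder of
the migration policy is collapsed into a single boolean. The names
node_has_memory, should_migrate_model and MODEL_MAX_NODES are
hypothetical.

```c
#include <stdbool.h>

#define MODEL_MAX_NODES 8	/* hypothetical upper bound on node ids */

/*
 * Bit n of memory_mask set means node n has memory. This is a stand-in
 * for the N_MEMORY node state, not the kernel's nodemask API.
 */
static bool node_has_memory(unsigned int memory_mask, int nid)
{
	if (nid < 0 || nid >= MODEL_MAX_NODES)
		return false;	/* assumption: invalid ids have no memory */

	return ((memory_mask >> nid) & 1u) != 0;
}

/*
 * Model of the early return: never migrate to a memoryless node,
 * whatever the rest of the policy would have decided.
 */
static bool should_migrate_model(unsigned int memory_mask, int dst_nid,
				 bool rest_of_policy)
{
	if (!node_has_memory(memory_mask, dst_nid))
		return false;

	return rest_of_policy;
}
```

With memory_mask set to 0x1, node 0 has memory and node 1 is memoryless,
so a fault taken from a CPU on node 1 is never turned into a migration
toward node 1 in this model.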