All of lore.kernel.org
 help / color / mirror / Atom feed
From: K Prateek Nayak <kprateek.nayak@amd.com>
To: Leon Romanovsky <leon@kernel.org>
Cc: Steve Wahl <steve.wahl@hpe.com>, Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
	Valentin Schneider <vschneid@redhat.com>,
	linux-kernel@vger.kernel.org,
	Vishal Chourasia <vishalc@linux.ibm.com>,
	samir <samir@linux.ibm.com>,
	Naman Jain <namjain@linux.microsoft.com>,
	Saurabh Singh Sengar <ssengar@linux.microsoft.com>,
	srivatsa@csail.mit.edu, Michael Kelley <mhklinux@outlook.com>,
	Russ Anderson <rja@hpe.com>, Dimitri Sivanich <sivanich@hpe.com>
Subject: Re: [PATCH v4 1/2] sched/topology: improve topology_span_sane speed
Date: Thu, 12 Jun 2025 15:00:22 +0530	[thread overview]
Message-ID: <5a673979-e96c-4dc2-b84b-849c6c8084ae@amd.com> (raw)
In-Reply-To: <20250612074157.GO10669@unreal>

Hello Leon,

Thank you for more info!

On 6/12/2025 1:11 PM, Leon Romanovsky wrote:
>   [    0.032188] CPU topo: Max. logical packages:  10
>   [    0.032189] CPU topo: Max. logical dies:      10
>   [    0.032189] CPU topo: Max. dies per package:   1
>   [    0.032194] CPU topo: Max. threads per core:   1
>   [    0.032194] CPU topo: Num. cores per package:     1
>   [    0.032195] CPU topo: Num. threads per package:   1
>   [    0.032195] CPU topo: Allowing 10 present CPUs plus 0 hotplug CPUs

This indicates each CPU is a socket leading to 10 sockets ...

>   [    0.288498] smp: Bringing up secondary CPUs ...
>   [    0.289225] smpboot: x86: Booting SMP configuration:
>   [    0.289900] .... node  #0, CPUs:        #1
>   [    0.290511] .... node  #1, CPUs:    #2  #3
>   [    0.291559] .... node  #2, CPUs:    #4  #5
>   [    0.292557] .... node  #3, CPUs:    #6  #7
>   [    0.293593] .... node  #4, CPUs:    #8  #9
>   [    0.326310] smp: Brought up 5 nodes, 10 CPUs

... and this indicates two sockets are grouped as one NUMA node
leading to 5 nodes in total. I tried the following:

     qemu-system-x86_64 -enable-kvm \
     -cpu EPYC-Milan-v2 -m 20G -smp cpus=10,sockets=10 \
     -machine q35 \
     -object memory-backend-ram,size=4G,id=m0 \
     -object memory-backend-ram,size=4G,id=m1 \
     -object memory-backend-ram,size=4G,id=m2 \
     -object memory-backend-ram,size=4G,id=m3 \
     -object memory-backend-ram,size=4G,id=m4 \
     -numa node,cpus=0-1,memdev=m0,nodeid=0 \
     -numa node,cpus=2-3,memdev=m1,nodeid=1 \
     -numa node,cpus=4-5,memdev=m2,nodeid=2 \
     -numa node,cpus=6-7,memdev=m3,nodeid=3 \
     -numa node,cpus=8-9,memdev=m4,nodeid=4 \
     ...

but could not hit this issue with v6.16-rc1 kernel and QEMU emulator
version 10.0.50 (v10.0.0-1610-gd9ce74873a)

>   [    0.327532] smpboot: Total of 10 processors activated (51878.08 BogoMIPS)
>   [    0.329252] ------------[ cut here ]------------
>   [    0.329252] WARNING: CPU: 0 PID: 1 at kernel/sched/topology.c:2486 build_sched_domains+0xe67/0x13a0
>   [    0.330608] Modules linked in:
>   [    0.331050] CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.16.0-rc1_for_upstream_min_debug_2025_06_09_14_44 #1 NONE
>   [    0.332386] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
>   [    0.333767] RIP: 0010:build_sched_domains+0xe67/0x13a0
>   [    0.334298] Code: ff ff 8b 6c 24 08 48 8b 44 24 68 65 48 2b 05 60 24 d0 01 0f 85 03 05 00 00 48 83 c4 70 89 e8 5b 5d 41 5c 41 5d 41 5e 41 5f c3 <0f> 0b e9 65 fe ff ff 48 c7 c7 28 fb 08 82 4c 89 44 24 28 c6 05 e4
>   [    0.336635] RSP: 0000:ffff8881002efe30 EFLAGS: 00010202
>   [    0.337326] RAX: 00000000ffffff01 RBX: 0000000000000002 RCX: 00000000ffffff01
>   [    0.338234] RDX: 00000000fffffff6 RSI: 0000000000000300 RDI: ffff888100047168
>   [    0.338523] RBP: 0000000000000000 R08: ffff888100047168 R09: 0000000000000000
>   [    0.339425] R10: ffffffff830dee80 R11: 0000000000000000 R12: ffff888100047168
>   [    0.340323] R13: 0000000000000002 R14: ffff888100193480 R15: ffff888380030f40
>   [    0.341221] FS:  0000000000000000(0000) GS:ffff8881b9b76000(0000) knlGS:0000000000000000
>   [    0.342298] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>   [    0.343096] CR2: ffff88843ffff000 CR3: 000000000282c001 CR4: 0000000000370eb0
>   [    0.344042] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>   [    0.344927] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>   [    0.345811] Call Trace:
>   [    0.346191]  <TASK>
>   [    0.346429]  sched_init_smp+0x32/0xa0
>   [    0.346944]  ? stop_machine+0x2c/0x40
>   [    0.347460]  kernel_init_freeable+0xf5/0x260
>   [    0.348031]  ? rest_init+0xc0/0xc0
>   [    0.348513]  kernel_init+0x16/0x120
>   [    0.349008]  ret_from_fork+0x5e/0xd0
>   [    0.349510]  ? rest_init+0xc0/0xc0
>   [    0.349998]  ret_from_fork_asm+0x11/0x20
>   [    0.350464]  </TASK>
>   [    0.350812] ---[ end trace 0000000000000000 ]---

Ah! Since this happens so early topology isn't created yet for
the debug prints to hit! Is it possible to get a dmesg with
"ignore_loglevel" and "sched_verbose" on an older kernel that
did not throw this error on the same host?

> 
>>
>> Even the qemu cmdline for the guest can help! We can try reproducing
>> it at our end then. Thank you for all the help.
> 
> It is custom QEMU with limited access to hypervisor. This crash is
> inside VM.

Noted! Thank a ton for all the data provided.

-- 
Thanks and Regards,
Prateek


  reply	other threads:[~2025-06-12  9:30 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-03-04 16:08 [PATCH v4 0/2] Improving topology_span_sane Steve Wahl
2025-03-04 16:08 ` [PATCH v4 1/2] sched/topology: improve topology_span_sane speed Steve Wahl
2025-04-08 19:05   ` [tip: sched/core] " tip-bot2 for Steve Wahl
2025-06-10 11:07   ` [PATCH v4 1/2] " Leon Romanovsky
2025-06-10 11:33     ` K Prateek Nayak
2025-06-10 12:36       ` Leon Romanovsky
2025-06-10 13:09         ` Leon Romanovsky
2025-06-10 19:39           ` Steve Wahl
2025-06-11  6:06             ` Leon Romanovsky
2025-06-11  6:56               ` K Prateek Nayak
2025-06-12  7:41                 ` Leon Romanovsky
2025-06-12  9:30                   ` K Prateek Nayak [this message]
2025-06-12 10:41                     ` K Prateek Nayak
2025-06-15  6:42                       ` Leon Romanovsky
2025-06-16 14:18                         ` Steve Wahl
2025-06-17  3:04                           ` K Prateek Nayak
2025-06-17  7:55                             ` Leon Romanovsky
2025-06-17  7:34                           ` Leon Romanovsky
2025-06-17  9:22                             ` K Prateek Nayak
2025-06-23  6:06                               ` K Prateek Nayak
2025-03-04 16:08 ` [PATCH v4 2/2] sched/topology: Refinement to topology_span_sane speedup Steve Wahl
2025-04-08 19:05   ` [tip: sched/core] " tip-bot2 for Steve Wahl
2025-03-06  6:46 ` [PATCH v4 0/2] Improving topology_span_sane K Prateek Nayak
2025-03-06 14:33 ` Valentin Schneider
2025-03-07 10:06 ` Madadi Vineeth Reddy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5a673979-e96c-4dc2-b84b-849c6c8084ae@amd.com \
    --to=kprateek.nayak@amd.com \
    --cc=bsegall@google.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=juri.lelli@redhat.com \
    --cc=leon@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@suse.de \
    --cc=mhklinux@outlook.com \
    --cc=mingo@redhat.com \
    --cc=namjain@linux.microsoft.com \
    --cc=peterz@infradead.org \
    --cc=rja@hpe.com \
    --cc=rostedt@goodmis.org \
    --cc=samir@linux.ibm.com \
    --cc=sivanich@hpe.com \
    --cc=srivatsa@csail.mit.edu \
    --cc=ssengar@linux.microsoft.com \
    --cc=steve.wahl@hpe.com \
    --cc=vincent.guittot@linaro.org \
    --cc=vishalc@linux.ibm.com \
    --cc=vschneid@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.