Linux RCU subsystem development
From: Kim Phillips <kim.phillips@amd.com>
To: Usama Arif <usama.arif@bytedance.com>,
	dwmw2@infradead.org, tglx@linutronix.de, arjan@linux.intel.com
Cc: mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
	hpa@zytor.com, x86@kernel.org, pbonzini@redhat.com,
	paulmck@kernel.org, linux-kernel@vger.kernel.org,
	kvm@vger.kernel.org, rcu@vger.kernel.org, mimoja@mimoja.de,
	hewenliang4@huawei.com, thomas.lendacky@amd.com,
	seanjc@google.com, pmenzel@molgen.mpg.de,
	fam.zheng@bytedance.com, punit.agrawal@bytedance.com,
	simon.evans@bytedance.com, liangma@liangbit.com,
	David Woodhouse <dwmw@amazon.co.uk>,
	Mario Limonciello <Mario.Limonciello@amd.com>
Subject: Re: [PATCH v6 07/11] x86/smpboot: Disable parallel boot for AMD CPUs
Date: Fri, 3 Feb 2023 13:48:39 -0600
Message-ID: <b3d9fbbf-e760-5d1d-9182-44c144abd1bf@amd.com>
In-Reply-To: <20230202215625.3248306-8-usama.arif@bytedance.com>

+Mario

Hi,

On 2/2/23 3:56 PM, Usama Arif wrote:
> From: David Woodhouse <dwmw@amazon.co.uk>
> 
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> ---

I'd like to nack this, but can't (and not because it doesn't have
commit text):

If I:

  - take dwmw2's parallel-6.2-rc6 branch (commit 459d1c46dbd1)
  - remove the set_cpu_bug(c, X86_BUG_NO_PARALLEL_BRINGUP) line from amd.c

Then:

  - a Ryzen 3000 (Picasso A1/Zen+) notebook I have access to fails to boot.
  - Zen 2-, 3-, and 4-based servers boot fine.
  - a Zen 1-based server doesn't boot.

This is what's left on its serial port:

[    3.199633] smp: Bringing up secondary CPUs ...
[    3.200732] x86: Booting SMP configuration:
[    3.204242] .... node  #0, CPUs:          #1
[    3.204301] CPU 1 to 93/x86/cpu:kick in 63 21 -114014307645 0 . 0 0 0 0 . 0 114025055970
[    3.204478] ------------[ cut here ]------------
[    3.204481] WARNING: CPU: 0 PID: 1 at arch/x86/kernel/cpu/common.c:2122 cpu_init+0x2d/0x1f0
[    3.204490] Modules linked in:
[    3.204493] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.2.0-rc6+ #19
[    3.204496] Hardware name: AMD Corporation Speedway/Speedway, BIOS RSW1009C 07/27/2018
[    3.204498] RIP: 0010:cpu_init+0x2d/0x1f0
[    3.204502] Code: e5 41 56 41 55 41 54 53 65 48 8b 1c 25 80 2e 1f 00 65 44 8b 35 20 e4 39 55 48 8b 05 5d f7 51 02 44 89 f2 f0 48 0f ab 10 73 06 <0f> 0b eb 02 f3 90 48 8b 05 3e f7 51 02 48 0f a3 10 73 f1 45 85 f6
[    3.204504] RSP: 0000:ffffffffac803d70 EFLAGS: 00010083
[    3.204506] RAX: ffff8d293eef6e40 RBX: ffff8d1d40010000 RCX: 0000000000000008
[    3.204508] RDX: 0000000000000000 RSI: ffff8d1d1c40b048 RDI: ffffffffac566418
[    3.204509] RBP: ffffffffac803d90 R08: 00000000fffffe14 R09: ffff8d1d1c406078
[    3.204510] R10: ffffffffac803dc0 R11: 0000000000000000 R12: 0000000000000000
[    3.204511] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[    3.204512] FS:  0000000000000000(0000) GS:ffff8d1d1c400000(0000) knlGS:0000000000000000
[    3.204514] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    3.204515] CR2: 0000000000000000 CR3: 0000800daec12000 CR4: 00000000003100a0
[    3.204517] Call Trace:
[    3.204519] ---[ end trace 0000000000000000 ]---
[    3.204580] [Firmware Bug]: CPU0: APIC id mismatch. Firmware: 0 APIC: 2
[    3.288686]    #2
[    3.288735] CPU 2 to 93/x86/cpu:kick in 210 42 -114355248756 0 . 0 0 0 0 . 0 114356192013
[    3.288798] ------------[ cut here ]------------
[    3.288804] WARNING: CPU: 0 PID: 1 at arch/x86/kernel/cpu/common.c:2122 cpu_init+0x2d/0x1f0
[    3.288815] Modules linked in:
[    3.288819] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W          6.2.0-rc6+ #19
[    3.288823] Hardware name: AMD Corporation Speedway/Speedway, BIOS RSW1009C 07/27/2018
[    3.288826] RIP: 0010:cpu_init+0x2d/0x1f0
[    3.288831] Code: e5 41 56 41 55 41 54 53 65 48 8b 1c 25 80 2e 1f 00 65 44 8b 35 20 e4 39 55 48 8b 05 5d f7 51 02 44 89 f2 f0 48 0f ab 10 73 06 <0f> 0b eb 02 f3 90 48 8b 05 3e f7 51 02 48 0f a3 10 73 f1 45 85 f6
[    3.288835] RSP: 0000:ffffffffac803d70 EFLAGS: 00010083
[    3.288838] RAX: ffff8d293eef6e40 RBX: ffff8d1d40010000 RCX: 0000000000000008
[    3.288841] RDX: 0000000000000000 RSI: ffff8d1d1c40b048 RDI: ffffffffac566418
[    3.288844] RBP: ffffffffac803d90 R08: 00000000fffffe14 R09: ffff8d1d1c406078
[    3.288845] R10: ffffffffac803dc0 R11: 0000000000000000 R12: 0000000000000000
[    3.288848] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[    3.288850] FS:  0000000000000000(0000) GS:ffff8d1d1c400000(0000) knlGS:0000000000000000
[    3.288852] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    3.288855] CR2: 0000000000000000 CR3: 0000800daec12000 CR4: 00000000003100a0
[    3.288857] Call Trace:
[    3.288859] ---[ end trace 0000000000000000 ]---
[    3.288925] [Firmware Bug]: CPU0: APIC id mismatch. Firmware: 0 APIC: 8
6.36[   [    3. 68 33]3 [ #3[  [ #
                               [    3.368623[    3
  [    3.368623]    #3
[    3.368662] ------------[ cut here ]------------
[    3.368673] CPU 3 to 93/x86/cpu:kick in 504 315 -114684508974 0 . 0 0 0 0 . 0 114685353594
[    3.368705] BUG: scheduling while atomic: swapper/0/1/0x00000003
[    3.368708] 7 locks held by swapper/0/1:
[    3.368710]  #0: ffffffffacbff920 (console_lock){....}-{0:0}, at: vprintk_emit+0x13a/0x2e0
[    3.368721]  #1: ffffffffacbffd48 (console_srcu){....}-{0:0}, at: console_flush_all+0x2d/0x250
[    3.368728]  #2: ffffffffac87f540 (console_owner){....}-{0:0}, at: console_emit_next_record.constprop.22+0x189/0x350
[    3.368735]  #3: ffffffffadaae838 (&port_lock_key){....}-{2:2}, at: serial8250_console_write+0x88/0x3c0
[    3.368745]  #4: ffffffffac86aa50 (cpu_add_remove_lock){....}-{3:3}, at: cpu_up+0x6a/0xd0
[    3.368753]  #5: ffffffffac86a9a0 (cpu_hotplug_lock){....}-{0:0}, at: _cpu_up+0x3d/0x2f0
[    3.368760]  #6: ffffffffac8763b0 (smpboot_threads_lock){....}-{3:3}, at: smpboot_create_threads+0x21/0x80
[    3.368769] Modules linked in:
[    3.368770] Preemption disabled at:
[    3.368771] [<ffffffffaae717a4>] do_cpu_up+0x3e4/0x780
[    3.368777] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W          6.2.0-rc6+ #19
[    3.368781] Hardware name: AMD Corporation Speedway/Speedway, BIOS RSW1009C 07/27/2018
[    3.368782] Call Trace:
[    3.368783]  <TASK>
[    3.368789]  dump_stack_lvl+0x49/0x63
[    3.368795]  ? do_cpu_up+0x3e4/0x780
[    3.368799]  dump_stack+0x10/0x16
[    3.368802]  __schedule_bug+0xad/0xd0
[    3.368808]  __schedule+0x76/0x8a0
[    3.368812]  ? sched_clock+0x9/0x10
[    3.368817]  ? sched_clock_local+0x17/0x90
[    3.368826]  ? sort_range+0x30/0x30
[    3.368830]  schedule+0x88/0xd0
[    3.368833]  schedule_timeout+0x40/0x320
[    3.368840]  ? __this_cpu_preempt_check+0x13/0x20
[    3.368844]  ? lock_release+0x353/0x3c0
[    3.368852]  ? sort_range+0x30/0x30
[    3.368856]  wait_for_completion_killable+0xe0/0x1c0
[    3.368864]  __kthread_create_on_node+0xfe/0x1e0
[    3.368876]  ? wait_for_completion_killable+0x38/0x1c0
[    3.368884]  kthread_create_on_node+0x46/0x70
[    3.368894]  kthread_create_on_cpu+0x2c/0x90
[    3.368899]  __smpboot_create_thread+0x87/0x140
[    3.368905]  smpboot_create_threads+0x3f/0x80
[    3.368909]  ? idle_thread_get+0x40/0x40
[    3.368913]  cpuhp_invoke_callback+0x13c/0x5d0
[    3.368921]  __cpuhp_invoke_callback_range+0x69/0xf0
[    3.368929]  _cpu_up+0x12a/0x2f0
[    3.368937]  cpu_up+0x8f/0xd0
[    3.368942]  bringup_nonboot_cpus+0x7c/0x160
[    3.368950]  smp_init+0x2a/0x83
[    3.368957]  kernel_init_freeable+0x1a1/0x309
[    3.368961]  ? lock_release+0x353/0x3c0
[    3.368972]  ? rest_init+0x140/0x140
[    3.368977]  kernel_init+0x1a/0x130
[    3.368980]  ret_from_fork+0x22/0x30
[    3.368996]  </TASK>
[    3.369419]
[    3.369420] .... node  #1, CPUs:     #4
[    3.369466] ------------[ cut here ]------------
[    3.369469] CPU 4 to 93/x86/cpu:kick in 378 42 -114685407543 0 . 0 0 0 0 . 0 114687022569
[    3.369474] WARNING: CPU: 0 PID: 1 at arch/x86/kernel/cpu/common.c:2122 cpu_init+0x2d/0x1f0
[    3.369487] Modules linked in:
[    3.369491] ------------[ cut here ]------------
[    3.369494] DEBUG_LOCKS_WARN_ON(val > preempt_count())
[    3.369493] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W          6.2.0-rc6+ #19
[    3.369499] Hardware name: AMD Corporation Speedway/Speedway, BIOS RSW1009C 07/27/2018


...which points to the WARN_ON here:

static void wait_for_master_cpu(int cpu)
{
#ifdef CONFIG_SMP
         /*
          * wait for ACK from master CPU before continuing
          * with AP initialization
          */
         WARN_ON(cpumask_test_and_set_cpu(cpu, cpu_initialized_mask));
         while (!cpumask_test_cpu(cpu, cpu_callout_mask))
                 cpu_relax();
#endif
}
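
For readers less familiar with the helper: cpumask_test_and_set_cpu()
atomically sets the bit for the given CPU and returns whether it was
already set, so the WARN_ON fires when a CPU number is claimed a second
time. Below is a minimal userspace sketch of that semantics, not kernel
code. The names initialized_mask, test_and_set_cpu() and
count_duplicate_claims() are hypothetical stand-ins, and the sketch only
illustrates how a repeated index trips the check; the log alone does not
establish that this is the failure mode here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for cpu_initialized_mask: one bit per CPU. */
static atomic_ulong initialized_mask;

/*
 * Userspace analogue of cpumask_test_and_set_cpu(): atomically set the
 * bit for @cpu and report whether it was already set beforehand.
 */
static bool test_and_set_cpu(unsigned int cpu, atomic_ulong *mask)
{
	unsigned long bit;

	assert(cpu < 8 * sizeof(unsigned long));
	bit = 1UL << cpu;

	return (atomic_fetch_or(mask, bit) & bit) != 0;
}

/*
 * Clear the mask, let each entry of @cpus claim its bit in turn, and
 * return how many claims found the bit already set, i.e. how many
 * times the WARN_ON() in wait_for_master_cpu() would have fired.
 */
static int count_duplicate_claims(const unsigned int *cpus, int n)
{
	int warnings = 0;

	atomic_store(&initialized_mask, 0);
	for (int i = 0; i < n; i++)
		if (test_and_set_cpu(cpus[i], &initialized_mask))
			warnings++;

	return warnings;
}
```

With distinct indices such as {1, 2, 3}, no claim finds its bit set. If
every caller passes the same index, each claim after the first one does.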

Let me know if you'd like me to test any changes.

Thanks,

Kim


Thread overview: 59+ messages
2023-02-02 21:56 [PATCH v6 00/11] Parallel CPU bringup for x86_64 Usama Arif
2023-02-02 21:56 ` [PATCH v6 01/11] x86/apic/x2apic: Fix parallel handling of cluster_mask Usama Arif
2023-02-06 23:20   ` Thomas Gleixner
2023-02-07 10:57     ` David Woodhouse
2023-02-07 11:27       ` David Woodhouse
2023-02-07 14:24         ` Thomas Gleixner
2023-02-07 19:53           ` David Woodhouse
2023-02-07 20:58             ` Thomas Gleixner
2023-02-07 14:22       ` Thomas Gleixner
2023-02-02 21:56 ` [PATCH v6 02/11] cpu/hotplug: Move idle_thread_get() to <linux/smpboot.h> Usama Arif
2023-02-06 23:33   ` Thomas Gleixner
2023-02-07  1:24     ` Paul E. McKenney
2023-02-02 21:56 ` [PATCH v6 03/11] cpu/hotplug: Add dynamic parallel bringup states before CPUHP_BRINGUP_CPU Usama Arif
2023-02-06 23:43   ` Thomas Gleixner
2023-02-02 21:56 ` [PATCH v6 04/11] x86/smpboot: Reference count on smpboot_setup_warm_reset_vector() Usama Arif
2023-02-06 23:48   ` Thomas Gleixner
     [not found]     ` <57195f701f6d1d70ec440c9a28cbee4cfb81dc41.camel@amazon.co.uk>
2023-02-07 14:39       ` Thomas Gleixner
2023-02-07 16:50         ` Sean Christopherson
2023-02-07 19:48         ` [EXTERNAL][PATCH " David Woodhouse
2023-02-02 21:56 ` [PATCH v6 05/11] x86/smpboot: Split up native_cpu_up into separate phases and document them Usama Arif
2023-02-06 23:59   ` Thomas Gleixner
2023-02-02 21:56 ` [PATCH v6 06/11] x86/smpboot: Support parallel startup of secondary CPUs Usama Arif
2023-02-02 22:30   ` David Woodhouse
2023-02-02 22:50     ` [External] " Usama Arif
2023-02-03  8:14       ` David Woodhouse
2023-02-03 14:41         ` Arjan van de Ven
2023-02-03 18:17   ` Sean Christopherson
2023-02-07  0:07   ` Thomas Gleixner
2023-02-02 21:56 ` [PATCH v6 07/11] x86/smpboot: Disable parallel boot for AMD CPUs Usama Arif
2023-02-03 19:48   ` Kim Phillips [this message]
     [not found]     ` <d5ec64236ba75f0d3f3718fb69b2cb9169d8af0a.camel@amazon.co.uk>
2023-02-03 21:45       ` Kim Phillips
2023-02-03 22:25         ` [EXTERNAL][PATCH " David Woodhouse
2023-02-04  9:07     ` [PATCH " David Woodhouse
2023-02-04 10:09     ` David Woodhouse
2023-02-04 15:40     ` David Woodhouse
2023-02-04 18:18       ` Arjan van de Ven
2023-02-04 22:31         ` David Woodhouse
2023-02-05 22:13           ` [External] " Usama Arif
2023-02-06  8:05             ` David Woodhouse
2023-02-06 12:11               ` Usama Arif
2023-02-06 18:07                 ` Sean Christopherson
2023-02-06 17:58       ` Kim Phillips
2023-02-07 16:27         ` Kim Phillips
2023-02-07  0:23       ` Thomas Gleixner
2023-02-07 10:04         ` David Woodhouse
2023-02-07 14:44           ` Thomas Gleixner
2023-02-07  0:09   ` Thomas Gleixner
     [not found]     ` <cbd9e88e738dc0c479e87121ca82431731905c73.camel@amazon.co.uk>
2023-02-07 14:46       ` Thomas Gleixner
2023-02-02 21:56 ` [PATCH v6 08/11] x86/smpboot: Send INIT/SIPI/SIPI to secondary CPUs in parallel Usama Arif
2023-02-07  0:28   ` Thomas Gleixner
2023-02-02 21:56 ` [PATCH v6 09/11] x86/mtrr: Avoid repeated save of MTRRs on boot-time CPU bringup Usama Arif
2023-02-02 21:56 ` [PATCH v6 10/11] x86/smpboot: Serialize topology updates for secondary bringup Usama Arif
2023-02-02 21:56 ` [PATCH v6 11/11] x86/smpboot: reuse timer calibration Usama Arif
2023-02-07  0:31   ` Thomas Gleixner
2023-02-07 23:16   ` Arjan van de Ven
2023-02-07 23:55     ` Thomas Gleixner
2023-02-05 19:17 ` [PATCH v6 00/11] Parallel CPU bringup for x86_64 Russ Anderson
2023-02-06  8:28   ` David Woodhouse
2023-02-06 12:18     ` [External] " Usama Arif
