From: "Edgecombe, Rick P" <rick.p.edgecombe@intel.com>
To: "mingo@kernel.org" <mingo@kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Cc: "akpm@linux-foundation.org" <akpm@linux-foundation.org>,
"bp@alien8.de" <bp@alien8.de>,
"peterz@infradead.org" <peterz@infradead.org>,
"hpa@zytor.com" <hpa@zytor.com>,
"luto@amacapital.net" <luto@amacapital.net>,
"tglx@linutronix.de" <tglx@linutronix.de>,
"torvalds@linux-foundation.org" <torvalds@linux-foundation.org>,
"dave@sr71.net" <dave@sr71.net>,
"oleg@redhat.com" <oleg@redhat.com>,
"ubizjak@gmail.com" <ubizjak@gmail.com>
Subject: Re: [PATCH 2/3] x86/fpu: Remove the thread::fpu pointer
Date: Tue, 25 Jun 2024 05:26:37 +0000
Message-ID: <93bcabebe678b532cd8ee75fa2f48f32ceeb64b2.camel@intel.com>
In-Reply-To: <20240605083557.2051480-3-mingo@kernel.org>
On Wed, 2024-06-05 at 10:35 +0200, Ingo Molnar wrote:
> As suggested by Oleg, remove the thread::fpu pointer, as we can
> calculate it via x86_task_fpu() at compile-time.
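For readers following along, the idea in the quoted text can be illustrated with a small userspace sketch: if the FPU area always sits directly behind the task structure, its address is a fixed offset from the task pointer and no stored pointer is needed. All names below (demo_task, demo_fpu, demo_task_fpu, demo_task_alloc) are hypothetical stand-ins, and the layout is an assumption for illustration, not the kernel's actual definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the FPU context; 64-byte aligned. */
struct demo_fpu {
    _Alignas(64) unsigned int last_cpu;
    unsigned char state[64];
};

/* Hypothetical stand-in for the task structure; constant size. */
struct demo_task {
    _Alignas(64) int pid;
    char comm[16];
};

/*
 * The task size must be a multiple of the FPU alignment, otherwise the
 * area placed right behind the task would be misaligned.
 */
_Static_assert(sizeof(struct demo_task) % _Alignof(struct demo_fpu) == 0,
               "FPU area behind the task must stay aligned");

/* Compute the FPU pointer from the task pointer; nothing is stored. */
static struct demo_fpu *demo_task_fpu(struct demo_task *t)
{
    return (struct demo_fpu *)((char *)t + sizeof(*t));
}

/* Allocate a task with its FPU area directly behind it, zeroed. */
static struct demo_task *demo_task_alloc(void)
{
    size_t size = sizeof(struct demo_task) + sizeof(struct demo_fpu);
    struct demo_task *t = aligned_alloc(64, size);

    if (t)
        memset(t, 0, size);
    return t;
}
```

One consequence of this kind of layout is that a task allocated without the trailing area (a statically allocated one, for example) has nothing valid behind it, so the computed pointer is only meaningful for tasks created through the allocator.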
I'm seeing boot failures in a TDX VM that bisect to this commit in tip
(807333522953). The host is running a pile of out-of-tree KVM patches, but the
nature of the change makes me suspect the failure is not TDX-related. On the
first bad commit the failure looks like the splat below. Some of the later
commits failed with a more FPU-related stack trace. The failure also only shows
up when I have lock debugging on:
#
# Lock Debugging (spinlocks, mutexes, etc...)
#
CONFIG_LOCK_DEBUGGING_SUPPORT=y
CONFIG_PROVE_LOCKING=y
# CONFIG_PROVE_RAW_LOCK_NESTING is not set
# CONFIG_LOCK_STAT is not set
CONFIG_DEBUG_RT_MUTEXES=y
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_MUTEXES=y
CONFIG_DEBUG_WW_MUTEX_SLOWPATH=y
CONFIG_DEBUG_RWSEMS=y
CONFIG_DEBUG_LOCK_ALLOC=y
CONFIG_LOCKDEP=y
CONFIG_LOCKDEP_BITS=15
CONFIG_LOCKDEP_CHAINS_BITS=16
CONFIG_LOCKDEP_STACK_TRACE_BITS=19
CONFIG_LOCKDEP_STACK_TRACE_HASH_BITS=14
CONFIG_LOCKDEP_CIRCULAR_QUEUE_BITS=12
# CONFIG_DEBUG_LOCKDEP is not set
CONFIG_DEBUG_ATOMIC_SLEEP=y
# CONFIG_DEBUG_LOCKING_API_SELFTESTS is not set
# CONFIG_LOCK_TORTURE_TEST is not set
# CONFIG_WW_MUTEX_SELFTEST is not set
# CONFIG_SCF_TORTURE_TEST is not set
# CONFIG_CSD_LOCK_WAIT_DEBUG is not set
# end of Lock Debugging (spinlocks, mutexes, etc...)
If the cause is not obvious, I can investigate some more tomorrow on a more
normal VM configuration.
[ 8.830714] ------------[ cut here ]------------
[ 8.830714] DEBUG_LOCKS_WARN_ON(1)
[ 8.830714] WARNING: CPU: 0 PID: 0 at kernel/locking/lockdep.c:232 __lock_acquire+0xa5c/0x2120
[ 8.830714] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G S 6.10.0-rc3-00004-g807333522953 #117
[ 8.830714] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown 2/2/2022
[ 8.830714] RIP: 0010:__lock_acquire+0xa5c/0x2120
[ 8.830714] Code: b8 85 c0 0f 84 39 fd ff ff 8b 05 13 4d 92 01 85 c0 0f 85 2b fd ff ff 48 c7 c6 c5 93 3e 82 48 c7 c7 66 aa 39 82 e8 24 34 f8 ff <0f> 0b 31 c0 44 8b 5d b8 e9 60 f7 ff ff 88 55 b0 44 89 5d b8 e8 4b
[ 8.830714] RSP: 0000:ffffffff82603b28 EFLAGS: 00010086
[ 8.830714] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 8.830714] RDX: 0000000000000004 RSI: 00000000ffffffea RDI: 00000000ffffffff
[ 8.830714] RBP: ffffffff82603ba8 R08: ff1100046fffdfe8 R09: 0000000000000003
[ 8.830714] R10: ff11000437ffe000 R11: ff11000464000d78 R12: ffffffff826b57a8
[ 8.830714] R13: ffffffff826b4e00 R14: ffffffff826b57a8 R15: 51eb851eb8f20808
[ 8.830714] FS: 0000000000000000(0000) GS:ff11000427a00000(0000) knlGS:0000000000000000
[ 8.830714] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8.830714] CR2: ff1100047ffff000 CR3: 0000000002eda001 CR4: 0000000000771ef0
[ 8.830714] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 8.830714] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[ 8.830714] PKRU: 55555554
[ 8.830714] Call Trace:
[ 8.830714] <TASK>
[ 8.830714] ? show_regs+0x60/0x70
[ 8.830714] ? __warn+0x84/0x180
[ 8.830714] ? __lock_acquire+0xa5c/0x2120
[ 8.830714] ? report_bug+0x1f3/0x200
[ 8.830714] ? handle_bug+0x40/0x70
[ 8.830714] ? exc_invalid_op+0x19/0x70
[ 8.830714] ? asm_exc_invalid_op+0x1b/0x20
[ 8.830714] ? __lock_acquire+0xa5c/0x2120
[ 8.830714] ? __lock_acquire+0xa5c/0x2120
[ 8.830714] ? common_startup_64+0x13e/0x148
[ 8.830714] lock_acquire+0xc8/0x2e0
[ 8.830714] ? copy_process+0x107/0x2b80
[ 8.830714] ? __this_cpu_preempt_check+0x13/0x20
[ 8.830714] ? _raw_spin_lock_irq+0x4b/0x50
[ 8.830714] _raw_spin_lock_irq+0x37/0x50
[ 8.830714] ? copy_process+0x107/0x2b80
[ 8.830714] copy_process+0x107/0x2b80
[ 8.830714] ? find_held_lock+0x31/0x90
[ 8.830714] ? lock_release+0x130/0x290
[ 8.830714] ? _raw_spin_unlock_irqrestore+0x2c/0x60
[ 8.830714] ? find_held_lock+0x31/0x90
[ 8.830714] kernel_clone+0x97/0x3b0
[ 8.830714] ? __mutex_unlock_slowpath+0x3c/0x2a0
[ 8.830714] user_mode_thread+0x59/0x70
[ 8.830714] ? rest_init+0x190/0x190
[ 8.830714] rest_init+0x1e/0x190
[ 8.830714] start_kernel+0x672/0x790
[ 8.830714] x86_64_start_reservations+0x18/0x30
[ 8.830714] x86_64_start_kernel+0xd0/0xe0
[ 8.830714] common_startup_64+0x13e/0x148
[ 8.830714] </TASK>
[ 8.830714] irq event stamp: 27428
[ 8.830714] hardirqs last enabled at (27427): [<ffffffff81dc236c>] _raw_spin_unlock_irqrestore+0x2c/0x60
[ 8.830714] hardirqs last disabled at (27428): [<ffffffff81dc208b>] _raw_spin_lock_irq+0x4b/0x50
[ 8.830714] softirqs last enabled at (27360): [<ffffffff8116d93c>] cgroup_idr_alloc.constprop.0+0x5c/0x100
[ 8.830714] softirqs last disabled at (27358): [<ffffffff8116d916>] cgroup_idr_alloc.constprop.0+0x36/0x100
[ 8.830714] ---[ end trace 0000000000000000 ]---
[ 8.830714] BUG: kernel NULL pointer dereference, address: 00000000000000c4
[ 8.830714] #PF: supervisor read access in kernel mode
[ 8.830714] #PF: error_code(0x0000) - not-present page
[ 8.830714] PGD 0
[ 8.830714] Oops: Oops: 0000 [#1] PREEMPT SMP
[ 8.830714] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G S W 6.10.0-rc3-00004-g807333522953 #117
[ 8.830714] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown 2/2/2022
[ 8.830714] RIP: 0010:__lock_acquire+0x1c9/0x2120
[ 8.830714] Code: 45 28 41 89 44 24 24 4c 89 f8 25 ff 1f 00 00 48 0f a3 05 2a 03 dc 01 0f 83 aa 05 00 00 48 69 c0 c8 00 00 00 48 05 a0 bc eb 82 <0f> b6 90 c4 00 00 00 41 0f b7 44 24 20 66 25 ff 1f 0f b7 c0 48 0f
[ 8.830714] RSP: 0000:ffffffff82603b28 EFLAGS: 00010046
[ 8.830714] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 8.830714] RDX: 0000000000000004 RSI: 00000000ffffffea RDI: 00000000ffffffff
[ 8.830714] RBP: ffffffff82603ba8 R08: ff1100046fffdfe8 R09: 0000000000000003
[ 8.830714] R10: ff11000437ffe000 R11: 0000000000000001 R12: ffffffff826b57a8
[ 8.830714] R13: ffffffff826b4e00 R14: ffffffff826b57a8 R15: 51eb851eb8f20808
[ 8.830714] FS: 0000000000000000(0000) GS:ff11000427a00000(0000) knlGS:0000000000000000
[ 8.830714] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8.830714] CR2: 00000000000000c4 CR3: 0000000002eda001 CR4: 0000000000771ef0
[ 8.830714] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 8.830714] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[ 8.830714] PKRU: 55555554
[ 8.830714] Call Trace:
[ 8.830714] <TASK>
[ 8.830714] ? show_regs+0x60/0x70
[ 8.830714] ? __die+0x20/0x60
[ 8.830714] ? page_fault_oops+0x15a/0x480
[ 8.830714] ? __lock_acquire+0xa5c/0x2120
[ 8.830714] ? exc_page_fault+0x437/0x910
[ 8.830714] ? asm_exc_page_fault+0x27/0x30
[ 8.830714] ? __lock_acquire+0x1c9/0x2120
[ 8.830714] ? __lock_acquire+0xa5c/0x2120
[ 8.830714] ? common_startup_64+0x13e/0x148
[ 8.830714] lock_acquire+0xc8/0x2e0
[ 8.830714] ? copy_process+0x107/0x2b80
[ 8.830714] ? __this_cpu_preempt_check+0x13/0x20
[ 8.830714] ? _raw_spin_lock_irq+0x4b/0x50
[ 8.830714] _raw_spin_lock_irq+0x37/0x50
[ 8.830714] ? copy_process+0x107/0x2b80
[ 8.830714] copy_process+0x107/0x2b80
[ 8.830714] ? find_held_lock+0x31/0x90
[ 8.830714] ? lock_release+0x130/0x290
[ 8.830714] ? _raw_spin_unlock_irqrestore+0x2c/0x60
[ 8.830714] ? find_held_lock+0x31/0x90
[ 8.830714] kernel_clone+0x97/0x3b0
[ 8.830714] ? __mutex_unlock_slowpath+0x3c/0x2a0
[ 8.830714] user_mode_thread+0x59/0x70
[ 8.830714] ? rest_init+0x190/0x190
[ 8.830714] rest_init+0x1e/0x190
[ 8.830714] start_kernel+0x672/0x790
[ 8.830714] x86_64_start_reservations+0x18/0x30
[ 8.830714] x86_64_start_kernel+0xd0/0xe0
[ 8.830714] common_startup_64+0x13e/0x148
[ 8.830714] </TASK>
[ 8.830714] CR2: 00000000000000c4
[ 8.830714] ---[ end trace 0000000000000000 ]---
[ 8.830714] RIP: 0010:__lock_acquire+0x1c9/0x2120
[ 8.830714] Code: 45 28 41 89 44 24 24 4c 89 f8 25 ff 1f 00 00 48 0f a3 05 2a 03 dc 01 0f 83 aa 05 00 00 48 69 c0 c8 00 00 00 48 05 a0 bc eb 82 <0f> b6 90 c4 00 00 00 41 0f b7 44 24 20 66 25 ff 1f 0f b7 c0 48 0f
[ 8.830714] RSP: 0000:ffffffff82603b28 EFLAGS: 00010046
[ 8.830714] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 8.830714] RDX: 0000000000000004 RSI: 00000000ffffffea RDI: 00000000ffffffff
[ 8.830714] RBP: ffffffff82603ba8 R08: ff1100046fffdfe8 R09: 0000000000000003
[ 8.830714] R10: ff11000437ffe000 R11: 0000000000000001 R12: ffffffff826b57a8
[ 8.830714] R13: ffffffff826b4e00 R14: ffffffff826b57a8 R15: 51eb851eb8f20808
[ 8.830714] FS: 0000000000000000(0000) GS:ff11000427a00000(0000) knlGS:0000000000000000
[ 8.830714] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8.830714] CR2: 00000000000000c4 CR3: 0000000002eda001 CR4: 0000000000771ef0
[ 8.830714] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 8.830714] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[ 8.830714] PKRU: 55555554
[ 8.830714] Kernel panic - not syncing: Attempted to kill the idle task!
[ 8.830714] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---