public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: Paolo Bonzini <pbonzini@redhat.com>,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 3/4] KVM: Register cpuhp and syscore callbacks when enabling hardware
Date: Tue, 7 May 2024 09:31:55 -0700	[thread overview]
Message-ID: <ZjpXeyzU46I1eu0A@google.com> (raw)
In-Reply-To: <20240425233951.3344485-4-seanjc@google.com>

On Thu, Apr 25, 2024, Sean Christopherson wrote:
> Register KVM's cpuhp and syscore callback when enabling virtualization
> in hardware instead of registering the callbacks during initialization,
> and let the CPU up/down framework invoke the inner enable/disable
> functions.  Registering the callbacks during initialization makes things
> more complex than they need to be, as KVM needs to be very careful about
> handling races between enabling CPUs being onlined/offlined and hardware
> being enabled/disabled.
> 
> Intel TDX support will require KVM to enable virtualization during KVM
> initialization, i.e. will add another wrinkle to things, at which point
> sorting out the potential races with kvm_usage_count would become even
> more complex.
> +static int hardware_enable_all(void)
> +{
> +	int r;
> +
> +	guard(mutex)(&kvm_lock);
> +
> +	if (kvm_usage_count++)
> +		return 0;
> +
> +	r = cpuhp_setup_state(CPUHP_AP_KVM_ONLINE, "kvm/cpu:online",
> +			      kvm_online_cpu, kvm_offline_cpu);
> +	if (r)
> +		return r;

There's a lock ordering issue here.  KVM currently takes kvm_lock inside
cpu_hotplug_lock, but this code does the opposite.  I need to take a closer look
at the locking, as I'm not entirely certain that the existing ordering is correct
or ideal.  E.g. cpu_hotplug_lock is taken when updating static keys, static calls,
etc., which makes taking cpu_hotplug_lock outside kvm_lock dicey, as flows that
take kvm_lock then need to be very careful to never trigger seemingly innocuous
updates.

And this lockdep splat that I've now hit twice with the current implementation
suggests that cpu_hotplug_lock => kvm_lock is already unsafe/broken (I need to
re-decipher the splat; I _think_ mostly figured it out last week, but then forgot
over the weekend).


[48419.244819] ======================================================
[48419.250999] WARNING: possible circular locking dependency detected
[48419.257179] 6.9.0-smp--04b1c6b4841d-next #301 Tainted: G S         O      
[48419.264050] ------------------------------------------------------
[48419.270229] tee/27085 is trying to acquire lock:
[48419.274849] ffffb5fdd4e430a8 (&kvm->slots_lock){+.+.}-{3:3}, at: set_nx_huge_pages+0x179/0x1e0 [kvm]
[48419.284085] 
               but task is already holding lock:
[48419.289918] ffffffffc06ccba8 (kvm_lock){+.+.}-{3:3}, at: set_nx_huge_pages+0x14a/0x1e0 [kvm]
[48419.298386] 
               which lock already depends on the new lock.

[48419.306559] 
               the existing dependency chain (in reverse order) is:
[48419.314040] 
               -> #3 (kvm_lock){+.+.}-{3:3}:
[48419.319523]        __mutex_lock+0x6a/0xb40
[48419.323622]        mutex_lock_nested+0x1f/0x30
[48419.328071]        kvm_dev_ioctl+0x4fb/0xe50 [kvm]
[48419.332898]        __se_sys_ioctl+0x7b/0xd0
[48419.337082]        __x64_sys_ioctl+0x21/0x30
[48419.341357]        do_syscall_64+0x8b/0x170
[48419.345540]        entry_SYSCALL_64_after_hwframe+0x71/0x79
[48419.351115] 
               -> #2 (cpu_hotplug_lock){++++}-{0:0}:
[48419.357294]        cpus_read_lock+0x2e/0xb0
[48419.361480]        static_key_slow_inc+0x16/0x30
[48419.366098]        kvm_lapic_set_base+0x6a/0x1c0 [kvm]
[48419.371298]        kvm_set_apic_base+0x8f/0xe0 [kvm]
[48419.376298]        kvm_set_msr_common+0xa29/0x1080 [kvm]
[48419.381645]        vmx_set_msr+0xa54/0xbe0 [kvm_intel]
[48419.386795]        __kvm_set_msr+0xb6/0x1a0 [kvm]
[48419.391535]        kvm_arch_vcpu_ioctl+0xf66/0x1150 [kvm]
[48419.396970]        kvm_vcpu_ioctl+0x485/0x5b0 [kvm]
[48419.401881]        __se_sys_ioctl+0x7b/0xd0
[48419.406067]        __x64_sys_ioctl+0x21/0x30
[48419.410342]        do_syscall_64+0x8b/0x170
[48419.414528]        entry_SYSCALL_64_after_hwframe+0x71/0x79
[48419.420099] 
               -> #1 (&kvm->srcu){.?.+}-{0:0}:
[48419.425758]        __synchronize_srcu+0x44/0x1a0
[48419.430378]        synchronize_srcu_expedited+0x21/0x30
[48419.435603]        kvm_swap_active_memslots+0x113/0x1c0 [kvm]
[48419.441385]        kvm_set_memslot+0x360/0x620 [kvm]
[48419.446387]        __kvm_set_memory_region+0x27b/0x300 [kvm]
[48419.452078]        kvm_vm_ioctl_set_memory_region+0x43/0x60 [kvm]
[48419.458207]        kvm_vm_ioctl+0x295/0x650 [kvm]
[48419.462945]        __se_sys_ioctl+0x7b/0xd0
[48419.467133]        __x64_sys_ioctl+0x21/0x30
[48419.471406]        do_syscall_64+0x8b/0x170
[48419.475590]        entry_SYSCALL_64_after_hwframe+0x71/0x79
[48419.481164] 
               -> #0 (&kvm->slots_lock){+.+.}-{3:3}:
[48419.487343]        __lock_acquire+0x15ef/0x2e40
[48419.491876]        lock_acquire+0xe0/0x260
[48419.495977]        __mutex_lock+0x6a/0xb40
[48419.500076]        mutex_lock_nested+0x1f/0x30
[48419.504521]        set_nx_huge_pages+0x179/0x1e0 [kvm]
[48419.509694]        param_attr_store+0x93/0x100
[48419.514142]        module_attr_store+0x22/0x40
[48419.518587]        sysfs_kf_write+0x81/0xb0
[48419.522774]        kernfs_fop_write_iter+0x133/0x1d0
[48419.527738]        vfs_write+0x317/0x380
[48419.531663]        ksys_write+0x70/0xe0
[48419.535505]        __x64_sys_write+0x1f/0x30
[48419.539777]        do_syscall_64+0x8b/0x170
[48419.543961]        entry_SYSCALL_64_after_hwframe+0x71/0x79
[48419.549534] 
               other info that might help us debug this:

[48419.557534] Chain exists of:
                 &kvm->slots_lock --> cpu_hotplug_lock --> kvm_lock

[48419.567873]  Possible unsafe locking scenario:

[48419.573793]        CPU0                    CPU1
[48419.578325]        ----                    ----
[48419.582859]   lock(kvm_lock);
[48419.585831]                                lock(cpu_hotplug_lock);
[48419.592011]                                lock(kvm_lock);
[48419.597499]   lock(&kvm->slots_lock);
[48419.601173] 
                *** DEADLOCK ***

[48419.607090] 5 locks held by tee/27085:
[48419.610841]  #0: ffffa0dcc92eb410 (sb_writers#4){.+.+}-{0:0}, at: vfs_write+0xe4/0x380
[48419.618765]  #1: ffffa0dce221ac88 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0xd8/0x1d0
[48419.627553]  #2: ffffa11c4d6bef28 (kn->active#257){.+.+}-{0:0}, at: kernfs_fop_write_iter+0xe0/0x1d0
[48419.636684]  #3: ffffffffc06e3c50 (&mod->param_lock){+.+.}-{3:3}, at: param_attr_store+0x57/0x100
[48419.645575]  #4: ffffffffc06ccba8 (kvm_lock){+.+.}-{3:3}, at: set_nx_huge_pages+0x14a/0x1e0 [kvm]
[48419.654564] 
               stack backtrace:
[48419.658925] CPU: 93 PID: 27085 Comm: tee Tainted: G S         O       6.9.0-smp--04b1c6b4841d-next #301
[48419.668312] Hardware name: Google Interlaken/interlaken, BIOS 0.20231025.0-0 10/25/2023
[48419.676314] Call Trace:
[48419.678770]  <TASK>
[48419.680878]  dump_stack_lvl+0x83/0xc0
[48419.684552]  dump_stack+0x14/0x20
[48419.687880]  print_circular_bug+0x2f0/0x300
[48419.692068]  check_noncircular+0x103/0x120
[48419.696163]  ? __lock_acquire+0x5e3/0x2e40
[48419.700266]  __lock_acquire+0x15ef/0x2e40
[48419.704286]  ? __lock_acquire+0x5e3/0x2e40
[48419.708387]  ? __lock_acquire+0x5e3/0x2e40
[48419.712486]  ? __lock_acquire+0x5e3/0x2e40
[48419.716586]  lock_acquire+0xe0/0x260
[48419.720164]  ? set_nx_huge_pages+0x179/0x1e0 [kvm]
[48419.725060]  ? lock_acquire+0xe0/0x260
[48419.728822]  ? param_attr_store+0x57/0x100
[48419.732921]  ? set_nx_huge_pages+0x179/0x1e0 [kvm]
[48419.737768]  __mutex_lock+0x6a/0xb40
[48419.741359]  ? set_nx_huge_pages+0x179/0x1e0 [kvm]
[48419.746207]  ? set_nx_huge_pages+0x14a/0x1e0 [kvm]
[48419.751054]  ? param_attr_store+0x57/0x100
[48419.755158]  ? __mutex_lock+0x240/0xb40
[48419.759005]  ? param_attr_store+0x57/0x100
[48419.763107]  mutex_lock_nested+0x1f/0x30
[48419.767031]  set_nx_huge_pages+0x179/0x1e0 [kvm]
[48419.771705]  param_attr_store+0x93/0x100
[48419.775629]  module_attr_store+0x22/0x40
[48419.779556]  sysfs_kf_write+0x81/0xb0
[48419.783222]  kernfs_fop_write_iter+0x133/0x1d0
[48419.787668]  vfs_write+0x317/0x380
[48419.791084]  ksys_write+0x70/0xe0
[48419.794401]  __x64_sys_write+0x1f/0x30
[48419.798152]  do_syscall_64+0x8b/0x170
[48419.801819]  entry_SYSCALL_64_after_hwframe+0x71/0x79
[48419.806872] RIP: 0033:0x7f98eff1ee0d
[48419.810465] Code: e5 41 57 41 56 53 50 49 89 d6 49 89 f7 89 fb 48 8b 05 37 7c 07 00 83 38 00 75 28 b8 01 00 00 00 89 df 4c 89 fe 4c 89 f2 0f 05 <48> 89 c3 48 3d 01 f0 ff ff 73 3a 48 89 d8 48 83 c4 08 5b 41 5e 41
[48419.829213] RSP: 002b:00007ffd5dee15c0 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[48419.836792] RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007f98eff1ee0d
[48419.843931] RDX: 0000000000000002 RSI: 00007ffd5dee16d0 RDI: 0000000000000005
[48419.851065] RBP: 00007ffd5dee15e0 R08: 00007f98efe161d2 R09: 0000000000000001
[48419.858196] R10: 00000000000001b6 R11: 0000000000000246 R12: 0000000000000002
[48419.865328] R13: 00007ffd5dee16d0 R14: 0000000000000002 R15: 00007ffd5dee16d0
[48419.872472]  </TASK>



  parent reply	other threads:[~2024-05-07 16:31 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-04-25 23:39 [PATCH 0/4] KVM: Register cpuhp/syscore callbacks when enabling virt Sean Christopherson
2024-04-25 23:39 ` [PATCH 1/4] x86/reboot: Unconditionally define cpu_emergency_virt_cb typedef Sean Christopherson
2024-05-13 12:50   ` Huang, Kai
2024-05-13 16:01     ` Sean Christopherson
2024-05-13 22:44       ` Huang, Kai
2024-05-14 22:41         ` Huang, Kai
2024-05-21 20:02           ` Sean Christopherson
2024-05-21 21:43             ` Huang, Kai
2024-05-21 23:16               ` Sean Christopherson
2024-04-25 23:39 ` [PATCH 2/4] KVM: x86: Register emergency virt callback in common code, via kvm_x86_ops Sean Christopherson
2024-04-26  8:52   ` Chao Gao
2024-04-26 17:08     ` Sean Christopherson
2024-05-13 12:55       ` Huang, Kai
2024-05-13 16:17         ` Sean Christopherson
2024-04-25 23:39 ` [PATCH 3/4] KVM: Register cpuhp and syscore callbacks when enabling hardware Sean Christopherson
2024-04-26  8:32   ` Chao Gao
2024-04-26 17:07     ` Sean Christopherson
2024-05-07 16:31   ` Sean Christopherson [this message]
2024-05-09 12:10     ` Huang, Kai
2024-05-13 12:56   ` Huang, Kai
2024-04-25 23:39 ` [PATCH 4/4] KVM: Rename functions related to enabling virtualization hardware Sean Christopherson
2024-05-13 12:59   ` Huang, Kai
2024-05-13 16:20     ` Sean Christopherson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZjpXeyzU46I1eu0A@google.com \
    --to=seanjc@google.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox