* [PATCH 0/2] KVM: Protect vCPU's PID with a rwlock
@ 2024-08-02 20:01 Sean Christopherson
2024-08-02 20:01 ` [PATCH 1/2] KVM: Return '0' directly when there's no task to yield to Sean Christopherson
` (3 more replies)
0 siblings, 4 replies; 12+ messages in thread
From: Sean Christopherson @ 2024-08-02 20:01 UTC (permalink / raw)
To: Marc Zyngier, Oliver Upton, Paolo Bonzini
Cc: linux-arm-kernel, kvmarm, kvm, linux-kernel, Sean Christopherson,
Steve Rutherford
Protect vcpu->pid with a rwlock instead of RCU, so that running a vCPU
with a different task doesn't require a full RCU synchronization, which
can introduce a non-trivial amount of jitter, especially on large systems.
I've had this mini-series sitting around for ~2 years, pretty much as-is.
I could have sworn past me thought there was a flaw in using a rwlock, and
so I never posted it, but for the life of me I can't think of any issues.
Extra eyeballs would be much appreciated.
Sean Christopherson (2):
KVM: Return '0' directly when there's no task to yield to
KVM: Protect vCPU's "last run PID" with rwlock, not RCU
arch/arm64/include/asm/kvm_host.h | 2 +-
include/linux/kvm_host.h | 3 ++-
virt/kvm/kvm_main.c | 36 +++++++++++++++++--------------
3 files changed, 23 insertions(+), 18 deletions(-)
base-commit: 332d2c1d713e232e163386c35a3ba0c1b90df83f
--
2.46.0.rc2.264.g509ed76dc8-goog
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH 1/2] KVM: Return '0' directly when there's no task to yield to
2024-08-02 20:01 [PATCH 0/2] KVM: Protect vCPU's PID with a rwlock Sean Christopherson
@ 2024-08-02 20:01 ` Sean Christopherson
2024-08-02 20:01 ` [PATCH 2/2] KVM: Protect vCPU's "last run PID" with rwlock, not RCU Sean Christopherson
` (2 subsequent siblings)
3 siblings, 0 replies; 12+ messages in thread
From: Sean Christopherson @ 2024-08-02 20:01 UTC (permalink / raw)
To: Marc Zyngier, Oliver Upton, Paolo Bonzini
Cc: linux-arm-kernel, kvmarm, kvm, linux-kernel, Sean Christopherson,
Steve Rutherford
Do "return 0" instead of initializing and returning a local variable in
kvm_vcpu_yield_to(), e.g. so that it's more obvious what the function
returns if there is no task.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
virt/kvm/kvm_main.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index d0788d0a72cc..91048a7ad3be 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3932,7 +3932,7 @@ int kvm_vcpu_yield_to(struct kvm_vcpu *target)
{
struct pid *pid;
struct task_struct *task = NULL;
- int ret = 0;
+ int ret;
rcu_read_lock();
pid = rcu_dereference(target->pid);
@@ -3940,7 +3940,7 @@ int kvm_vcpu_yield_to(struct kvm_vcpu *target)
task = get_pid_task(pid, PIDTYPE_PID);
rcu_read_unlock();
if (!task)
- return ret;
+ return 0;
ret = yield_to(task, 1);
put_task_struct(task);
--
2.46.0.rc2.264.g509ed76dc8-goog
* [PATCH 2/2] KVM: Protect vCPU's "last run PID" with rwlock, not RCU
2024-08-02 20:01 [PATCH 0/2] KVM: Protect vCPU's PID with a rwlock Sean Christopherson
2024-08-02 20:01 ` [PATCH 1/2] KVM: Return '0' directly when there's no task to yield to Sean Christopherson
@ 2024-08-02 20:01 ` Sean Christopherson
2024-08-02 20:28 ` Steve Rutherford
` (2 more replies)
2024-08-06 22:59 ` [PATCH 0/2] KVM: Protect vCPU's PID with a rwlock Oliver Upton
2024-10-31 19:51 ` Sean Christopherson
3 siblings, 3 replies; 12+ messages in thread
From: Sean Christopherson @ 2024-08-02 20:01 UTC (permalink / raw)
To: Marc Zyngier, Oliver Upton, Paolo Bonzini
Cc: linux-arm-kernel, kvmarm, kvm, linux-kernel, Sean Christopherson,
Steve Rutherford
To avoid jitter on KVM_RUN due to synchronize_rcu(), use a rwlock instead
of RCU to protect vcpu->pid, a.k.a. the pid of the task last used to run a
vCPU. When userspace is doing M:N scheduling of tasks to vCPUs, e.g. to
run SEV migration helper vCPUs during post-copy, the synchronize_rcu()
needed to change the PID associated with the vCPU can stall for hundreds
of milliseconds, which is problematic for latency sensitive post-copy
operations.
In the directed yield path, do not acquire the lock if it's contended,
i.e. if the associated PID is changing, as that means the vCPU's task is
already running.
Reported-by: Steve Rutherford <srutherford@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/arm64/include/asm/kvm_host.h | 2 +-
include/linux/kvm_host.h | 3 ++-
virt/kvm/kvm_main.c | 32 +++++++++++++++++--------------
3 files changed, 21 insertions(+), 16 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index a33f5996ca9f..7199cb014806 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -1115,7 +1115,7 @@ int __kvm_arm_vcpu_set_events(struct kvm_vcpu *vcpu,
void kvm_arm_halt_guest(struct kvm *kvm);
void kvm_arm_resume_guest(struct kvm *kvm);
-#define vcpu_has_run_once(vcpu) !!rcu_access_pointer((vcpu)->pid)
+#define vcpu_has_run_once(vcpu) (!!READ_ONCE((vcpu)->pid))
#ifndef __KVM_NVHE_HYPERVISOR__
#define kvm_call_hyp_nvhe(f, ...) \
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 689e8be873a7..d6f4e8b2b44c 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -342,7 +342,8 @@ struct kvm_vcpu {
#ifndef __KVM_HAVE_ARCH_WQP
struct rcuwait wait;
#endif
- struct pid __rcu *pid;
+ struct pid *pid;
+ rwlock_t pid_lock;
int sigset_active;
sigset_t sigset;
unsigned int halt_poll_ns;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 91048a7ad3be..fabffd85fa34 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -486,6 +486,7 @@ static void kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
vcpu->kvm = kvm;
vcpu->vcpu_id = id;
vcpu->pid = NULL;
+ rwlock_init(&vcpu->pid_lock);
#ifndef __KVM_HAVE_ARCH_WQP
rcuwait_init(&vcpu->wait);
#endif
@@ -513,7 +514,7 @@ static void kvm_vcpu_destroy(struct kvm_vcpu *vcpu)
* the vcpu->pid pointer, and at destruction time all file descriptors
* are already gone.
*/
- put_pid(rcu_dereference_protected(vcpu->pid, 1));
+ put_pid(vcpu->pid);
free_page((unsigned long)vcpu->run);
kmem_cache_free(kvm_vcpu_cache, vcpu);
@@ -3930,15 +3931,17 @@ EXPORT_SYMBOL_GPL(kvm_vcpu_kick);
int kvm_vcpu_yield_to(struct kvm_vcpu *target)
{
- struct pid *pid;
struct task_struct *task = NULL;
int ret;
- rcu_read_lock();
- pid = rcu_dereference(target->pid);
- if (pid)
- task = get_pid_task(pid, PIDTYPE_PID);
- rcu_read_unlock();
+ if (!read_trylock(&target->pid_lock))
+ return 0;
+
+ if (target->pid)
+ task = get_pid_task(target->pid, PIDTYPE_PID);
+
+ read_unlock(&target->pid_lock);
+
if (!task)
return 0;
ret = yield_to(task, 1);
@@ -4178,9 +4181,9 @@ static int vcpu_get_pid(void *data, u64 *val)
{
struct kvm_vcpu *vcpu = data;
- rcu_read_lock();
- *val = pid_nr(rcu_dereference(vcpu->pid));
- rcu_read_unlock();
+ read_lock(&vcpu->pid_lock);
+ *val = pid_nr(vcpu->pid);
+ read_unlock(&vcpu->pid_lock);
return 0;
}
@@ -4466,7 +4469,7 @@ static long kvm_vcpu_ioctl(struct file *filp,
r = -EINVAL;
if (arg)
goto out;
- oldpid = rcu_access_pointer(vcpu->pid);
+ oldpid = vcpu->pid;
if (unlikely(oldpid != task_pid(current))) {
/* The thread running this VCPU changed. */
struct pid *newpid;
@@ -4476,9 +4479,10 @@ static long kvm_vcpu_ioctl(struct file *filp,
break;
newpid = get_task_pid(current, PIDTYPE_PID);
- rcu_assign_pointer(vcpu->pid, newpid);
- if (oldpid)
- synchronize_rcu();
+ write_lock(&vcpu->pid_lock);
+ vcpu->pid = newpid;
+ write_unlock(&vcpu->pid_lock);
+
put_pid(oldpid);
}
vcpu->wants_to_run = !READ_ONCE(vcpu->run->immediate_exit__unsafe);
--
2.46.0.rc2.264.g509ed76dc8-goog
* Re: [PATCH 2/2] KVM: Protect vCPU's "last run PID" with rwlock, not RCU
2024-08-02 20:01 ` [PATCH 2/2] KVM: Protect vCPU's "last run PID" with rwlock, not RCU Sean Christopherson
@ 2024-08-02 20:28 ` Steve Rutherford
2024-08-02 20:51 ` Sean Christopherson
2024-08-02 21:27 ` Steve Rutherford
2024-08-06 22:58 ` Oliver Upton
2 siblings, 1 reply; 12+ messages in thread
From: Steve Rutherford @ 2024-08-02 20:28 UTC (permalink / raw)
To: Sean Christopherson
Cc: Marc Zyngier, Oliver Upton, Paolo Bonzini, linux-arm-kernel,
kvmarm, kvm, linux-kernel
On Fri, Aug 2, 2024 at 1:01 PM Sean Christopherson <seanjc@google.com> wrote:
>
> To avoid jitter on KVM_RUN due to synchronize_rcu(), use a rwlock instead
> of RCU to protect vcpu->pid, a.k.a. the pid of the task last used to run a
> vCPU. When userspace is doing M:N scheduling of tasks to vCPUs, e.g. to
> run SEV migration helper vCPUs during post-copy, the synchronize_rcu()
> needed to change the PID associated with the vCPU can stall for hundreds
> of milliseconds, which is problematic for latency sensitive post-copy
> operations.
>
> In the directed yield path, do not acquire the lock if it's contended,
> i.e. if the associated PID is changing, as that means the vCPU's task is
> already running.
>
> Reported-by: Steve Rutherford <srutherford@google.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> arch/arm64/include/asm/kvm_host.h | 2 +-
> include/linux/kvm_host.h | 3 ++-
> virt/kvm/kvm_main.c | 32 +++++++++++++++++--------------
> 3 files changed, 21 insertions(+), 16 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index a33f5996ca9f..7199cb014806 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -1115,7 +1115,7 @@ int __kvm_arm_vcpu_set_events(struct kvm_vcpu *vcpu,
> void kvm_arm_halt_guest(struct kvm *kvm);
> void kvm_arm_resume_guest(struct kvm *kvm);
>
> -#define vcpu_has_run_once(vcpu) !!rcu_access_pointer((vcpu)->pid)
> +#define vcpu_has_run_once(vcpu) (!!READ_ONCE((vcpu)->pid))
>
> #ifndef __KVM_NVHE_HYPERVISOR__
> #define kvm_call_hyp_nvhe(f, ...) \
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 689e8be873a7..d6f4e8b2b44c 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -342,7 +342,8 @@ struct kvm_vcpu {
> #ifndef __KVM_HAVE_ARCH_WQP
> struct rcuwait wait;
> #endif
> - struct pid __rcu *pid;
> + struct pid *pid;
> + rwlock_t pid_lock;
> int sigset_active;
> sigset_t sigset;
> unsigned int halt_poll_ns;
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 91048a7ad3be..fabffd85fa34 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -486,6 +486,7 @@ static void kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
> vcpu->kvm = kvm;
> vcpu->vcpu_id = id;
> vcpu->pid = NULL;
> + rwlock_init(&vcpu->pid_lock);
> #ifndef __KVM_HAVE_ARCH_WQP
> rcuwait_init(&vcpu->wait);
> #endif
> @@ -513,7 +514,7 @@ static void kvm_vcpu_destroy(struct kvm_vcpu *vcpu)
> * the vcpu->pid pointer, and at destruction time all file descriptors
> * are already gone.
> */
> - put_pid(rcu_dereference_protected(vcpu->pid, 1));
> + put_pid(vcpu->pid);
>
> free_page((unsigned long)vcpu->run);
> kmem_cache_free(kvm_vcpu_cache, vcpu);
> @@ -3930,15 +3931,17 @@ EXPORT_SYMBOL_GPL(kvm_vcpu_kick);
>
> int kvm_vcpu_yield_to(struct kvm_vcpu *target)
> {
> - struct pid *pid;
> struct task_struct *task = NULL;
> int ret;
>
> - rcu_read_lock();
> - pid = rcu_dereference(target->pid);
> - if (pid)
> - task = get_pid_task(pid, PIDTYPE_PID);
> - rcu_read_unlock();
> + if (!read_trylock(&target->pid_lock))
> + return 0;
> +
> + if (target->pid)
> + task = get_pid_task(target->pid, PIDTYPE_PID);
> +
> + read_unlock(&target->pid_lock);
> +
> if (!task)
> return 0;
> ret = yield_to(task, 1);
> @@ -4178,9 +4181,9 @@ static int vcpu_get_pid(void *data, u64 *val)
> {
> struct kvm_vcpu *vcpu = data;
>
> - rcu_read_lock();
> - *val = pid_nr(rcu_dereference(vcpu->pid));
> - rcu_read_unlock();
> + read_lock(&vcpu->pid_lock);
> + *val = pid_nr(vcpu->pid);
> + read_unlock(&vcpu->pid_lock);
> return 0;
> }
>
> @@ -4466,7 +4469,7 @@ static long kvm_vcpu_ioctl(struct file *filp,
> r = -EINVAL;
> if (arg)
> goto out;
> - oldpid = rcu_access_pointer(vcpu->pid);
> + oldpid = vcpu->pid;
Overall this patch looks correct, but this spot took me a moment, and
I want to confirm. This skips the reader lock since writing only
happens just below, under the vcpu lock, and we've already taken that
lock?
> if (unlikely(oldpid != task_pid(current))) {
> /* The thread running this VCPU changed. */
> struct pid *newpid;
> @@ -4476,9 +4479,10 @@ static long kvm_vcpu_ioctl(struct file *filp,
> break;
>
> newpid = get_task_pid(current, PIDTYPE_PID);
> - rcu_assign_pointer(vcpu->pid, newpid);
> - if (oldpid)
> - synchronize_rcu();
> + write_lock(&vcpu->pid_lock);
> + vcpu->pid = newpid;
> + write_unlock(&vcpu->pid_lock);
> +
> put_pid(oldpid);
> }
> vcpu->wants_to_run = !READ_ONCE(vcpu->run->immediate_exit__unsafe);
> --
> 2.46.0.rc2.264.g509ed76dc8-goog
>
* Re: [PATCH 2/2] KVM: Protect vCPU's "last run PID" with rwlock, not RCU
2024-08-02 20:28 ` Steve Rutherford
@ 2024-08-02 20:51 ` Sean Christopherson
0 siblings, 0 replies; 12+ messages in thread
From: Sean Christopherson @ 2024-08-02 20:51 UTC (permalink / raw)
To: Steve Rutherford
Cc: Marc Zyngier, Oliver Upton, Paolo Bonzini, linux-arm-kernel,
kvmarm, kvm, linux-kernel
On Fri, Aug 02, 2024, Steve Rutherford wrote:
> On Fri, Aug 2, 2024 at 1:01 PM Sean Christopherson <seanjc@google.com> wrote:
> > @@ -4178,9 +4181,9 @@ static int vcpu_get_pid(void *data, u64 *val)
> > {
> > struct kvm_vcpu *vcpu = data;
> >
> > - rcu_read_lock();
> > - *val = pid_nr(rcu_dereference(vcpu->pid));
> > - rcu_read_unlock();
> > + read_lock(&vcpu->pid_lock);
> > + *val = pid_nr(vcpu->pid);
> > + read_unlock(&vcpu->pid_lock);
> > return 0;
> > }
> >
> > @@ -4466,7 +4469,7 @@ static long kvm_vcpu_ioctl(struct file *filp,
> > r = -EINVAL;
> > if (arg)
> > goto out;
> > - oldpid = rcu_access_pointer(vcpu->pid);
> > + oldpid = vcpu->pid;
>
> Overall this patch looks correct, but this spot took me a moment, and
> I want to confirm. This skips the reader lock since writing only
> happens just below, under the vcpu lock, and we've already taken that
> lock?
Yep, exactly.
* Re: [PATCH 2/2] KVM: Protect vCPU's "last run PID" with rwlock, not RCU
2024-08-02 20:01 ` [PATCH 2/2] KVM: Protect vCPU's "last run PID" with rwlock, not RCU Sean Christopherson
2024-08-02 20:28 ` Steve Rutherford
@ 2024-08-02 21:27 ` Steve Rutherford
2024-08-06 22:58 ` Oliver Upton
2 siblings, 0 replies; 12+ messages in thread
From: Steve Rutherford @ 2024-08-02 21:27 UTC (permalink / raw)
To: Sean Christopherson
Cc: Marc Zyngier, Oliver Upton, Paolo Bonzini, linux-arm-kernel,
kvmarm, kvm, linux-kernel
On Fri, Aug 2, 2024 at 1:01 PM Sean Christopherson <seanjc@google.com> wrote:
>
> To avoid jitter on KVM_RUN due to synchronize_rcu(), use a rwlock instead
> of RCU to protect vcpu->pid, a.k.a. the pid of the task last used to run a
> vCPU. When userspace is doing M:N scheduling of tasks to vCPUs, e.g. to
> run SEV migration helper vCPUs during post-copy, the synchronize_rcu()
> needed to change the PID associated with the vCPU can stall for hundreds
> of milliseconds, which is problematic for latency sensitive post-copy
> operations.
>
> In the directed yield path, do not acquire the lock if it's contended,
> i.e. if the associated PID is changing, as that means the vCPU's task is
> already running.
>
> Reported-by: Steve Rutherford <srutherford@google.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> arch/arm64/include/asm/kvm_host.h | 2 +-
> include/linux/kvm_host.h | 3 ++-
> virt/kvm/kvm_main.c | 32 +++++++++++++++++--------------
> 3 files changed, 21 insertions(+), 16 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index a33f5996ca9f..7199cb014806 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -1115,7 +1115,7 @@ int __kvm_arm_vcpu_set_events(struct kvm_vcpu *vcpu,
> void kvm_arm_halt_guest(struct kvm *kvm);
> void kvm_arm_resume_guest(struct kvm *kvm);
>
> -#define vcpu_has_run_once(vcpu) !!rcu_access_pointer((vcpu)->pid)
> +#define vcpu_has_run_once(vcpu) (!!READ_ONCE((vcpu)->pid))
>
> #ifndef __KVM_NVHE_HYPERVISOR__
> #define kvm_call_hyp_nvhe(f, ...) \
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 689e8be873a7..d6f4e8b2b44c 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -342,7 +342,8 @@ struct kvm_vcpu {
> #ifndef __KVM_HAVE_ARCH_WQP
> struct rcuwait wait;
> #endif
> - struct pid __rcu *pid;
> + struct pid *pid;
> + rwlock_t pid_lock;
> int sigset_active;
> sigset_t sigset;
> unsigned int halt_poll_ns;
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 91048a7ad3be..fabffd85fa34 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -486,6 +486,7 @@ static void kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
> vcpu->kvm = kvm;
> vcpu->vcpu_id = id;
> vcpu->pid = NULL;
> + rwlock_init(&vcpu->pid_lock);
> #ifndef __KVM_HAVE_ARCH_WQP
> rcuwait_init(&vcpu->wait);
> #endif
> @@ -513,7 +514,7 @@ static void kvm_vcpu_destroy(struct kvm_vcpu *vcpu)
> * the vcpu->pid pointer, and at destruction time all file descriptors
> * are already gone.
> */
> - put_pid(rcu_dereference_protected(vcpu->pid, 1));
> + put_pid(vcpu->pid);
>
> free_page((unsigned long)vcpu->run);
> kmem_cache_free(kvm_vcpu_cache, vcpu);
> @@ -3930,15 +3931,17 @@ EXPORT_SYMBOL_GPL(kvm_vcpu_kick);
>
> int kvm_vcpu_yield_to(struct kvm_vcpu *target)
> {
> - struct pid *pid;
> struct task_struct *task = NULL;
> int ret;
>
> - rcu_read_lock();
> - pid = rcu_dereference(target->pid);
> - if (pid)
> - task = get_pid_task(pid, PIDTYPE_PID);
> - rcu_read_unlock();
> + if (!read_trylock(&target->pid_lock))
> + return 0;
> +
> + if (target->pid)
> + task = get_pid_task(target->pid, PIDTYPE_PID);
> +
> + read_unlock(&target->pid_lock);
> +
> if (!task)
> return 0;
> ret = yield_to(task, 1);
> @@ -4178,9 +4181,9 @@ static int vcpu_get_pid(void *data, u64 *val)
> {
> struct kvm_vcpu *vcpu = data;
>
> - rcu_read_lock();
> - *val = pid_nr(rcu_dereference(vcpu->pid));
> - rcu_read_unlock();
> + read_lock(&vcpu->pid_lock);
> + *val = pid_nr(vcpu->pid);
> + read_unlock(&vcpu->pid_lock);
> return 0;
> }
>
> @@ -4466,7 +4469,7 @@ static long kvm_vcpu_ioctl(struct file *filp,
> r = -EINVAL;
> if (arg)
> goto out;
> - oldpid = rcu_access_pointer(vcpu->pid);
> + oldpid = vcpu->pid;
> if (unlikely(oldpid != task_pid(current))) {
> /* The thread running this VCPU changed. */
> struct pid *newpid;
> @@ -4476,9 +4479,10 @@ static long kvm_vcpu_ioctl(struct file *filp,
> break;
>
> newpid = get_task_pid(current, PIDTYPE_PID);
> - rcu_assign_pointer(vcpu->pid, newpid);
> - if (oldpid)
> - synchronize_rcu();
> + write_lock(&vcpu->pid_lock);
> + vcpu->pid = newpid;
> + write_unlock(&vcpu->pid_lock);
> +
> put_pid(oldpid);
> }
> vcpu->wants_to_run = !READ_ONCE(vcpu->run->immediate_exit__unsafe);
> --
> 2.46.0.rc2.264.g509ed76dc8-goog
>
Reviewed-by: Steve Rutherford <srutherford@google.com>
* Re: [PATCH 2/2] KVM: Protect vCPU's "last run PID" with rwlock, not RCU
2024-08-02 20:01 ` [PATCH 2/2] KVM: Protect vCPU's "last run PID" with rwlock, not RCU Sean Christopherson
2024-08-02 20:28 ` Steve Rutherford
2024-08-02 21:27 ` Steve Rutherford
@ 2024-08-06 22:58 ` Oliver Upton
2024-08-06 23:59 ` Sean Christopherson
2 siblings, 1 reply; 12+ messages in thread
From: Oliver Upton @ 2024-08-06 22:58 UTC (permalink / raw)
To: Sean Christopherson
Cc: Marc Zyngier, Paolo Bonzini, linux-arm-kernel, kvmarm, kvm,
linux-kernel, Steve Rutherford
On Fri, Aug 02, 2024 at 01:01:36PM -0700, Sean Christopherson wrote:
> To avoid jitter on KVM_RUN due to synchronize_rcu(), use a rwlock instead
> of RCU to protect vcpu->pid, a.k.a. the pid of the task last used to run a
> vCPU. When userspace is doing M:N scheduling of tasks to vCPUs, e.g. to
> run SEV migration helper vCPUs during post-copy, the synchronize_rcu()
> needed to change the PID associated with the vCPU can stall for hundreds
> of milliseconds, which is problematic for latency sensitive post-copy
> operations.
>
> In the directed yield path, do not acquire the lock if it's contended,
> i.e. if the associated PID is changing, as that means the vCPU's task is
> already running.
>
> Reported-by: Steve Rutherford <srutherford@google.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> arch/arm64/include/asm/kvm_host.h | 2 +-
> include/linux/kvm_host.h | 3 ++-
> virt/kvm/kvm_main.c | 32 +++++++++++++++++--------------
> 3 files changed, 21 insertions(+), 16 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index a33f5996ca9f..7199cb014806 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -1115,7 +1115,7 @@ int __kvm_arm_vcpu_set_events(struct kvm_vcpu *vcpu,
> void kvm_arm_halt_guest(struct kvm *kvm);
> void kvm_arm_resume_guest(struct kvm *kvm);
>
> -#define vcpu_has_run_once(vcpu) !!rcu_access_pointer((vcpu)->pid)
> +#define vcpu_has_run_once(vcpu) (!!READ_ONCE((vcpu)->pid))
>
> #ifndef __KVM_NVHE_HYPERVISOR__
> #define kvm_call_hyp_nvhe(f, ...) \
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 689e8be873a7..d6f4e8b2b44c 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -342,7 +342,8 @@ struct kvm_vcpu {
> #ifndef __KVM_HAVE_ARCH_WQP
> struct rcuwait wait;
> #endif
> - struct pid __rcu *pid;
> + struct pid *pid;
> + rwlock_t pid_lock;
> int sigset_active;
> sigset_t sigset;
> unsigned int halt_poll_ns;
Adding yet another lock is never exciting, but this looks fine. Can you
nest this lock inside of the vcpu->mutex acquisition in
kvm_vm_ioctl_create_vcpu() so lockdep gets the picture?
> @@ -4466,7 +4469,7 @@ static long kvm_vcpu_ioctl(struct file *filp,
> r = -EINVAL;
> if (arg)
> goto out;
> - oldpid = rcu_access_pointer(vcpu->pid);
> + oldpid = vcpu->pid;
It'd be good to add a comment here about how this is guarded by the
vcpu->mutex, as Steve points out.
--
Thanks,
Oliver
* Re: [PATCH 0/2] KVM: Protect vCPU's PID with a rwlock
2024-08-02 20:01 [PATCH 0/2] KVM: Protect vCPU's PID with a rwlock Sean Christopherson
2024-08-02 20:01 ` [PATCH 1/2] KVM: Return '0' directly when there's no task to yield to Sean Christopherson
2024-08-02 20:01 ` [PATCH 2/2] KVM: Protect vCPU's "last run PID" with rwlock, not RCU Sean Christopherson
@ 2024-08-06 22:59 ` Oliver Upton
2024-10-31 19:51 ` Sean Christopherson
3 siblings, 0 replies; 12+ messages in thread
From: Oliver Upton @ 2024-08-06 22:59 UTC (permalink / raw)
To: Sean Christopherson
Cc: Marc Zyngier, Paolo Bonzini, linux-arm-kernel, kvmarm, kvm,
linux-kernel, Steve Rutherford
On Fri, Aug 02, 2024 at 01:01:34PM -0700, Sean Christopherson wrote:
> Protect vcpu->pid with a rwlock instead of RCU, so that running a vCPU
> with a different task doesn't require a full RCU synchronization, which
> can introduce a non-trivial amount of jitter, especially on large systems.
>
> I've had this mini-series sitting around for ~2 years, pretty much as-is.
> I could have sworn past me thought there was a flaw in using a rwlock, and
> so I never posted it, but for the life of me I can't think of any issues.
>
> Extra eyeballs would be much appreciated.
Besides the nitpicks:
Acked-by: Oliver Upton <oliver.upton@linux.dev>
--
Thanks,
Oliver
* Re: [PATCH 2/2] KVM: Protect vCPU's "last run PID" with rwlock, not RCU
2024-08-06 22:58 ` Oliver Upton
@ 2024-08-06 23:59 ` Sean Christopherson
2024-08-09 18:05 ` Oliver Upton
0 siblings, 1 reply; 12+ messages in thread
From: Sean Christopherson @ 2024-08-06 23:59 UTC (permalink / raw)
To: Oliver Upton
Cc: Marc Zyngier, Paolo Bonzini, linux-arm-kernel, kvmarm, kvm,
linux-kernel, Steve Rutherford
On Tue, Aug 06, 2024, Oliver Upton wrote:
> On Fri, Aug 02, 2024 at 01:01:36PM -0700, Sean Christopherson wrote:
> > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > index a33f5996ca9f..7199cb014806 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -1115,7 +1115,7 @@ int __kvm_arm_vcpu_set_events(struct kvm_vcpu *vcpu,
> > void kvm_arm_halt_guest(struct kvm *kvm);
> > void kvm_arm_resume_guest(struct kvm *kvm);
> >
> > -#define vcpu_has_run_once(vcpu) !!rcu_access_pointer((vcpu)->pid)
> > +#define vcpu_has_run_once(vcpu) (!!READ_ONCE((vcpu)->pid))
> >
> > #ifndef __KVM_NVHE_HYPERVISOR__
> > #define kvm_call_hyp_nvhe(f, ...) \
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index 689e8be873a7..d6f4e8b2b44c 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -342,7 +342,8 @@ struct kvm_vcpu {
> > #ifndef __KVM_HAVE_ARCH_WQP
> > struct rcuwait wait;
> > #endif
> > - struct pid __rcu *pid;
> > + struct pid *pid;
> > + rwlock_t pid_lock;
> > int sigset_active;
> > sigset_t sigset;
> > unsigned int halt_poll_ns;
>
> Adding yet another lock is never exciting, but this looks fine.
Heh, my feelings too. Maybe that's why I didn't post this for two years.
> Can you nest this lock inside of the vcpu->mutex acquisition in
> kvm_vm_ioctl_create_vcpu() so lockdep gets the picture?
I don't think that's necessary. Commit 42a90008f890 ("KVM: Ensure lockdep knows
about kvm->lock vs. vcpu->mutex ordering rule") added the lock+unlock in
kvm_vm_ioctl_create_vcpu() purely because actually taking vcpu->mutex inside
kvm->lock is rare, i.e. lockdep would be unable to detect issues except for very
specific VM types hitting very specific flows.
But for this lock, every arch is guaranteed to take the lock on the first KVM_RUN,
as "oldpid" is '0' and guaranteed to mismatch task_pid(current). So I don't think
we need to go out of our way to alert lockdep.
> > @@ -4466,7 +4469,7 @@ static long kvm_vcpu_ioctl(struct file *filp,
> > r = -EINVAL;
> > if (arg)
> > goto out;
> > - oldpid = rcu_access_pointer(vcpu->pid);
> > + oldpid = vcpu->pid;
>
> It'd be good to add a comment here about how this is guarded by the
> vcpu->mutex, as Steve points out.
Roger that.
* Re: [PATCH 2/2] KVM: Protect vCPU's "last run PID" with rwlock, not RCU
2024-08-06 23:59 ` Sean Christopherson
@ 2024-08-09 18:05 ` Oliver Upton
2024-08-13 2:05 ` Sean Christopherson
0 siblings, 1 reply; 12+ messages in thread
From: Oliver Upton @ 2024-08-09 18:05 UTC (permalink / raw)
To: Sean Christopherson
Cc: Marc Zyngier, Paolo Bonzini, linux-arm-kernel, kvmarm, kvm,
linux-kernel, Steve Rutherford
On Tue, Aug 06, 2024 at 04:59:03PM -0700, Sean Christopherson wrote:
> > Can you nest this lock inside of the vcpu->mutex acquisition in
> > kvm_vm_ioctl_create_vcpu() so lockdep gets the picture?
>
> I don't think that's necessary. Commit 42a90008f890 ("KVM: Ensure lockdep knows
> about kvm->lock vs. vcpu->mutex ordering rule") added the lock+unlock in
> kvm_vm_ioctl_create_vcpu() purely because actually taking vcpu->mutex inside
> kvm->lock is rare, i.e. lockdep would be unable to detect issues except for very
> specific VM types hitting very specific flows.
I don't think the perceived rarity matters at all w/ this. Beyond the
lockdep benefits, it is a self-documenting way to describe lock ordering.
Dunno about you, but I haven't kept up with locking.rst at all :)
Having said that, an inversion would still be *very* obvious, as it
would be trying to grab a mutex while holding a spinlock...
--
Thanks,
Oliver
* Re: [PATCH 2/2] KVM: Protect vCPU's "last run PID" with rwlock, not RCU
2024-08-09 18:05 ` Oliver Upton
@ 2024-08-13 2:05 ` Sean Christopherson
0 siblings, 0 replies; 12+ messages in thread
From: Sean Christopherson @ 2024-08-13 2:05 UTC (permalink / raw)
To: Oliver Upton
Cc: Marc Zyngier, Paolo Bonzini, linux-arm-kernel, kvmarm, kvm,
linux-kernel, Steve Rutherford
On Fri, Aug 09, 2024, Oliver Upton wrote:
> On Tue, Aug 06, 2024 at 04:59:03PM -0700, Sean Christopherson wrote:
> > > Can you nest this lock inside of the vcpu->mutex acquisition in
> > > kvm_vm_ioctl_create_vcpu() so lockdep gets the picture?
> >
> > I don't think that's necessary. Commit 42a90008f890 ("KVM: Ensure lockdep knows
> > about kvm->lock vs. vcpu->mutex ordering rule") added the lock+unlock in
> > kvm_vm_ioctl_create_vcpu() purely because actually taking vcpu->mutex inside
> > kvm->lock is rare, i.e. lockdep would be unable to detect issues except for very
> > specific VM types hitting very specific flows.
>
> I don't think the perceived rarity matters at all w/ this.
Rarity definitely matters. If KVM was splattered with code that takes vcpu->mutex
inside kvm->lock, then the mess that led to above commit likely would never had
happened.
> Beyond the lockdep benefits, it is a self-documenting way to describe lock
> ordering.
Lock acquisition alone won't suffice; many of the more unique locks in KVM need
comments/documentation, e.g. to explain additional rules, assumptions that make
things work, etc. We could obviously add comments for everything, but I don't
see how that's clearly better than actual documentation. E.g. pid_lock is taken
for read across vCPUs. Acquiring vcpu->pid_lock inside vcpu->mutex doesn't
capture that at all.
It's also simply not realistic to enumerate every possible combination. Many of
the combinations will likely never happen in practice, especially for spinlocks
since their usage is quite targeted. Trying to document the "preferred" ordering
between the various spinlocks would be an exercise in futility as so many would
be 100% arbitrary due to lack of a use case.
KVM's mutexes are more interesting because they tend to be coarser, and thus are
more prone to nesting, so maybe we could have lockdep-enforced documentation for
those? But if we want to do that, I think we should have a dedicated helper (and
possible arch hooks), not an ad hoc pile of locks in vCPU creation.
And we should have that discussion over here[*], because I was planning on posting
a patch to revert the lockdep-only lock "documentation".
[*] https://lore.kernel.org/all/ZrFYsSPaDWUHOl0N@google.com
> Dunno about you, but I haven't kept up with locking.rst at all :)
Heh, x86 has done a decent job of documenting its lock usage. I would much rather
add an entry in locking.rst for this new lock than add a lock+unlock in vCPU
creation. Especially since the usage is rather uncommon (guaranteed single writer,
readers are best-effort and cross-vCPU).
> Having said that, an inversion would still be *very* obvious, as it
> would be trying to grab a mutex while holding a spinlock...
>
> --
> Thanks,
> Oliver
* Re: [PATCH 0/2] KVM: Protect vCPU's PID with a rwlock
2024-08-02 20:01 [PATCH 0/2] KVM: Protect vCPU's PID with a rwlock Sean Christopherson
` (2 preceding siblings ...)
2024-08-06 22:59 ` [PATCH 0/2] KVM: Protect vCPU's PID with a rwlock Oliver Upton
@ 2024-10-31 19:51 ` Sean Christopherson
3 siblings, 0 replies; 12+ messages in thread
From: Sean Christopherson @ 2024-10-31 19:51 UTC (permalink / raw)
To: Sean Christopherson, Marc Zyngier, Oliver Upton, Paolo Bonzini
Cc: linux-arm-kernel, kvmarm, kvm, linux-kernel, Steve Rutherford
On Fri, 02 Aug 2024 13:01:34 -0700, Sean Christopherson wrote:
> Protect vcpu->pid with a rwlock instead of RCU, so that running a vCPU
> with a different task doesn't require a full RCU synchronization, which
> can introduce a non-trivial amount of jitter, especially on large systems.
>
> I've had this mini-series sitting around for ~2 years, pretty much as-is.
> I could have sworn past me thought there was a flaw in using a rwlock, and
> so I never posted it, but for the life of me I can't think of any issues.
>
> [...]
Applied to kvm-x86 generic, with a comment explaining the vcpu->mutex is the
true protector of vcpu->pid.
Oliver, I didn't add the requested lockdep notification, I really want to make
that a separate discussion:
https://lore.kernel.org/all/20241009150455.1057573-7-seanjc@google.com
[1/2] KVM: Return '0' directly when there's no task to yield to
https://github.com/kvm-x86/linux/commit/6cf9ef23d942
[2/2] KVM: Protect vCPU's "last run PID" with rwlock, not RCU
https://github.com/kvm-x86/linux/commit/3e7f43188ee2
--
https://github.com/kvm-x86/linux/tree/next