From: Pan Xinhui <xinhui@linux.vnet.ibm.com>
To: "Radim Krčmář" <rkrcmar@redhat.com>,
"Pan Xinhui" <xinhui.pan@linux.vnet.ibm.com>
Cc: kernellwp@gmail.com, linux-s390@vger.kernel.org, jgross@suse.com,
kvm@vger.kernel.org, peterz@infradead.org,
xen-devel-request@lists.xenproject.org, will.deacon@arm.com,
linux-kernel@vger.kernel.org,
virtualization@lists.linux-foundation.org, mingo@redhat.com,
paulus@samba.org, mpe@ellerman.id.au, benh@kernel.crashing.org,
pbonzini@redhat.com, paulmck@linux.vnet.ibm.com,
linuxppc-dev@lists.ozlabs.org, boqun.feng@gmail.com
Subject: Re: [PATCH v4 5/5] x86, kvm: support vcpu preempted check
Date: Thu, 20 Oct 2016 02:45:33 +0800
Message-ID: <34aefe78-af86-19b3-9e2f-cb3ee6b5f735@linux.vnet.ibm.com>
In-Reply-To: <20161019172403.GA9240@potion>
On 2016/10/20 01:24, Radim Krčmář wrote:
> 2016-10-19 06:20-0400, Pan Xinhui:
>> This is to fix some lock holder preemption issues. Some other lock
>> implementations do a spin loop before acquiring the lock itself.
>> Currently the kernel has an interface, bool vcpu_is_preempted(int cpu). It
>> takes the cpu as a parameter and returns true if the cpu is preempted. The
>> kernel can then break out of the spin loops based on the return value of
>> vcpu_is_preempted.
>>
>> As the kernel already uses this interface, let's support it.
>>
>> We use one field of struct kvm_steal_time to indicate whether a vcpu
>> is running or not.
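
To illustrate how a spin loop can use the interface described above, here is a minimal user-space sketch. It is not the kernel's osq_lock() or mutex code; `preempted_flag`, `NR_CPUS_SKETCH` and `spin_on_owner_sketch()` are hypothetical names, and the array stands in for state that the hypervisor would publish.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS_SKETCH 8	/* arbitrary size chosen for this sketch */

/* Hypothetical stand-in for the per-vCPU state the hypervisor publishes. */
static bool preempted_flag[NR_CPUS_SKETCH];

static bool vcpu_is_preempted(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);
	return preempted_flag[cpu];
}

/*
 * Spin while the lock has an owner. *lock_owner holds the owner's cpu
 * number, or -1 when the lock is free. Returns true if the lock was seen
 * free, false if the caller should stop spinning (and, in a real lock,
 * go to sleep instead).
 */
static bool spin_on_owner_sketch(const volatile int *lock_owner, int max_spins)
{
	for (int i = 0; i < max_spins; i++) {
		int owner = *lock_owner;

		if (owner < 0)
			return true;	/* lock released */
		if (vcpu_is_preempted(owner))
			return false;	/* holder is not running: spinning is wasted */
	}
	return false;			/* spin budget exhausted */
}
```

The point of the early exit is that a preempted lock holder cannot release the lock, so every further iteration only burns the waiter's time slice.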
>>
>> unix benchmark result:
>> host: kernel 4.8.1, i5-4570, 4 cpus
>> guest: kernel 4.8.1, 8 vcpus
>>
>> test-case                             | after-patch    | before-patch
>> Execl Throughput                      | 18307.9 lps    | 11701.6 lps
>> File Copy 1024 bufsize 2000 maxblocks | 1352407.3 KBps | 790418.9 KBps
>> File Copy 256 bufsize 500 maxblocks   | 367555.6 KBps  | 222867.7 KBps
>> File Copy 4096 bufsize 8000 maxblocks | 3675649.7 KBps | 1780614.4 KBps
>> Pipe Throughput                       | 11872208.7 lps | 11855628.9 lps
>> Pipe-based Context Switching          | 1495126.5 lps  | 1490533.9 lps
>> Process Creation                      | 29881.2 lps    | 28572.8 lps
>> Shell Scripts (1 concurrent)          | 23224.3 lpm    | 22607.4 lpm
>> Shell Scripts (8 concurrent)          | 3531.4 lpm     | 3211.9 lpm
>> System Call Overhead                  | 10385653.0 lps | 10419979.0 lps
>>
>> Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
>> ---
>> diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
>> @@ -98,6 +98,10 @@ struct pv_time_ops {
>> unsigned long long (*steal_clock)(int cpu);
>> };
>>
>> +struct pv_vcpu_ops {
>> + bool (*vcpu_is_preempted)(int cpu);
>> +};
>> +
>
> (I would put it into pv_lock_ops to save the plumbing.)
>
Hi Radim,
Thanks for your reply.
Yes, adding a new struct makes the patch touch more lines than necessary.
I did that only because I was not sure which existing xxx_ops struct vcpu_is_preempted should be placed in.
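
A rough sketch of what placing the callback in pv_lock_ops could look like, written as user-space C so it compiles on its own. The struct here is a simplified stand-in (the real pv_lock_ops has other members that are omitted), and `native_vcpu_is_preempted()` is an assumed name for the bare-metal default:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for pv_lock_ops; other lock callbacks are omitted. */
struct pv_lock_ops {
	bool (*vcpu_is_preempted)(int cpu);
};

/* Assumed bare-metal default: without a hypervisor no cpu is preempted. */
static bool native_vcpu_is_preempted(int cpu)
{
	(void)cpu;
	return false;
}

/* Global instance, populated with the native callback by default. */
static struct pv_lock_ops pv_lock_ops = {
	.vcpu_is_preempted = native_vcpu_is_preempted,
};

/* Generic entry point that the spin loops would call. */
static bool vcpu_is_preempted(int cpu)
{
	assert(pv_lock_ops.vcpu_is_preempted != NULL);
	return pv_lock_ops.vcpu_is_preempted(cpu);
}
```

Reusing an existing ops struct means no new global, no new initialiser and no extra patching plumbing, which is the saving the review comment refers to.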
>> diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
>> @@ -45,7 +45,8 @@ struct kvm_steal_time {
>> __u64 steal;
>> __u32 version;
>> __u32 flags;
>> - __u32 pad[12];
>> + __u32 preempted;
>
> Why __u32 instead of __u8?
>
I thought the field had to be 32-bit aligned...
Yes, a u8 is enough to store the preempted status.
>> + __u32 pad[11];
>> };
>
> Please document the change in Documentation/virtual/kvm/msr.txt, section
> MSR_KVM_STEAL_TIME.
>
Okay, I completely forgot to do that. Thanks!
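
For the __u8 question above, one possible layout is sketched below. It uses <stdint.h> types as stand-ins for the kernel's __u64/__u32/__u8, and `u8_pad` is a hypothetical field name; the compile-time checks only confirm that the structure keeps its 64-byte size and that the new field takes the place of the first padding word.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layout before the patch: 8 + 4 + 4 + 12 * 4 = 64 bytes. */
struct steal_time_old {
	uint64_t steal;
	uint32_t version;
	uint32_t flags;
	uint32_t pad[12];
};

/* Possible layout with a one-byte flag (u8_pad is a hypothetical name). */
struct steal_time_new {
	uint64_t steal;
	uint32_t version;
	uint32_t flags;
	uint8_t  preempted;
	uint8_t  u8_pad[3];	/* explicit padding up to the next 32-bit boundary */
	uint32_t pad[11];
};

/* The guest/host ABI must not change size. */
static_assert(sizeof(struct steal_time_old) == 64, "old layout is 64 bytes");
static_assert(sizeof(struct steal_time_new) == 64, "new layout must stay 64 bytes");

/* The flag takes the place of the former pad[0]. */
static_assert(offsetof(struct steal_time_new, preempted) ==
	      offsetof(struct steal_time_old, pad), "preempted replaces pad[0]");
static_assert(offsetof(struct steal_time_new, pad) == 20, "pad[] follows the byte padding");
```

Spelling out the three padding bytes keeps them available for later use instead of leaving them as implicit compiler padding.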
>> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
>> @@ -415,6 +415,15 @@ void kvm_disable_steal_time(void)
>> +static bool kvm_vcpu_is_preempted(int cpu)
>> +{
>> + struct kvm_steal_time *src;
>> +
>> + src = &per_cpu(steal_time, cpu);
>> +
>> + return !!src->preempted;
>> +}
>> +
>> #ifdef CONFIG_SMP
>> static void __init kvm_smp_prepare_boot_cpu(void)
>> {
>> @@ -488,6 +497,8 @@ void __init kvm_guest_init(void)
>> kvm_guest_cpu_init();
>> #endif
>>
>> + pv_vcpu_ops.vcpu_is_preempted = kvm_vcpu_is_preempted;
>
> Would be nicer to assign conditionally in the KVM_FEATURE_STEAL_TIME
> block. The steal_time structure has to be zeroed, so this code would
> work, but the native function (return false) is better if we know that
> the kvm_vcpu_is_preempted() would always return false anyway.
>
Yes, agreed. Will do that.
I once thought we could patch the code at runtime, replacing the binary code
"call 0xXXXXXXXX #pv_vcpu_ops.vcpu_is_preempted"
with
"xor eax, eax"
However, it is not worth doing that; the performance improvement would probably be very small.
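
The conditional assignment suggested above could look roughly like the following user-space sketch. The `has_steal_time` parameter stands in for the kernel's KVM_FEATURE_STEAL_TIME check, and `steal_time_sketch`, `vcpu_is_preempted_fn` and `guest_init_sketch()` are hypothetical names used only for illustration:

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS_SKETCH 8	/* arbitrary size chosen for this sketch */

/* Stand-in for the per-cpu steal_time area; only the new field is modelled. */
struct steal_time_sketch {
	unsigned char preempted;
};

static struct steal_time_sketch steal_time[NR_CPUS_SKETCH];

/* Native default: always false, no memory access needed. */
static bool native_vcpu_is_preempted(int cpu)
{
	(void)cpu;
	return false;
}

/* KVM variant: reads the flag the host wrote for that vcpu. */
static bool kvm_vcpu_is_preempted(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);
	return steal_time[cpu].preempted != 0;
}

/* Hypothetical hook; starts out pointing at the native function. */
static bool (*vcpu_is_preempted_fn)(int cpu) = native_vcpu_is_preempted;

/* Install the KVM callback only when steal time is available. */
static void guest_init_sketch(bool has_steal_time)
{
	if (has_steal_time)
		vcpu_is_preempted_fn = kvm_vcpu_is_preempted;
}
```

Without the feature, the steal_time area is never written by the host, so the KVM callback could only ever return false; keeping the native function avoids that pointless per-cpu load.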
> Old KVMs won't have the feature, so we could also assign only when KVM
> reports it, but that requires extra definitions and the performance gain
> is fairly small, so I'm ok with this.
>
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> @@ -2057,6 +2057,8 @@ static void record_steal_time(struct kvm_vcpu *vcpu)
>> &vcpu->arch.st.steal, sizeof(struct kvm_steal_time))))
>> return;
>>
>> + vcpu->arch.st.steal.preempted = 0;
>> +
>> if (vcpu->arch.st.steal.version & 1)
>> vcpu->arch.st.steal.version += 1; /* first time write, random junk */
>>
>> @@ -2812,6 +2814,16 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>>
>> void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>> {
>> + if (vcpu->arch.st.msr_val & KVM_MSR_ENABLED)
>> + if (kvm_read_guest_cached(vcpu->kvm, &vcpu->arch.st.stime,
>> + &vcpu->arch.st.steal,
>> + sizeof(struct kvm_steal_time)) == 0) {
>> + vcpu->arch.st.steal.preempted = 1;
>> + kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.st.stime,
>> + &vcpu->arch.st.steal,
>> + sizeof(struct kvm_steal_time));
>> + }
>
> Please name this block of code. Something like
> kvm_steal_time_set_preempted(vcpu);
>
Yep, my code style is ugly.
Will do that.
Thanks,
xinhui
> Thanks.
>
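
The helper requested in the last review comment might be shaped like the sketch below. It is a user-space mock, not the KVM code: `vcpu_sketch`, `read_guest_sketch()` and `write_guest_sketch()` are hypothetical stand-ins for struct kvm_vcpu and the kvm_read_guest_cached()/kvm_write_guest_cached() calls in the patch, and the value of KVM_MSR_ENABLED is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define KVM_MSR_ENABLED 1	/* value assumed for this sketch */

struct steal_time_sketch {
	uint64_t steal;
	uint32_t version;
	uint32_t flags;
	uint32_t preempted;
	uint32_t pad[11];
};

struct vcpu_sketch {
	uint64_t msr_val;
	struct steal_time_sketch *guest_area;	/* stands in for the cached guest mapping */
	struct steal_time_sketch steal;		/* host-side copy */
};

/* Mock of the guest read: returns 0 on success, -1 if nothing is mapped. */
static int read_guest_sketch(struct vcpu_sketch *vcpu)
{
	if (vcpu->guest_area == NULL)
		return -1;
	memcpy(&vcpu->steal, vcpu->guest_area, sizeof(vcpu->steal));
	return 0;
}

/* Mock of the guest write: copies the host-side copy back. */
static void write_guest_sketch(struct vcpu_sketch *vcpu)
{
	assert(vcpu->guest_area != NULL);
	memcpy(vcpu->guest_area, &vcpu->steal, sizeof(vcpu->steal));
}

/* The named block: mark the vcpu as preempted when it is scheduled out. */
static void kvm_steal_time_set_preempted(struct vcpu_sketch *vcpu)
{
	if (!(vcpu->msr_val & KVM_MSR_ENABLED))
		return;
	if (read_guest_sketch(vcpu) != 0)
		return;

	vcpu->steal.preempted = 1;
	write_guest_sketch(vcpu);
}
```

With early returns the nested if statements disappear, and kvm_arch_vcpu_put() would shrink to a single call to the helper.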