From: Paolo Bonzini <pbonzini@redhat.com>
To: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>,
x86@kernel.org, Marcelo Tosatti <mtosatti@redhat.com>,
Radim Krcmar <rkrcmar@redhat.com>,
linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
Alexander Graf <agraf@suse.de>,
Andy Lutomirski <luto@amacapital.net>
Subject: Re: [PATCH 2/5] x86, vdso, pvclock: Simplify and speed up the vdso pvclock reader
Date: Fri, 11 Dec 2015 09:42:03 +0100 [thread overview]
Message-ID: <566A8C5B.7090209@redhat.com> (raw)
In-Reply-To: <20151211075249.GA12564@gmail.com>
On 11/12/2015 08:52, Ingo Molnar wrote:
>
> * Paolo Bonzini <pbonzini@redhat.com> wrote:
>
>>
>>
>> On 10/12/2015 00:12, Andy Lutomirski wrote:
>>> From: Andy Lutomirski <luto@amacapital.net>
>>>
>>> The pvclock vdso code was too abstracted to understand easily and
>>> excessively paranoid. Simplify it for a huge speedup.
>>>
>>> This opens the door for additional simplifications, as the vdso no
>>> longer accesses the pvti for any vcpu other than vcpu 0.
>>>
>>> Before, vclock_gettime using kvm-clock took about 45ns on my machine.
>>> With this change, it takes 29ns, which is almost as fast as the pure TSC
>>> implementation.
>>>
>>> Signed-off-by: Andy Lutomirski <luto@amacapital.net>
>>> ---
>>> arch/x86/entry/vdso/vclock_gettime.c | 81 ++++++++++++++++++++----------------
>>> 1 file changed, 46 insertions(+), 35 deletions(-)
>>>
>>> diff --git a/arch/x86/entry/vdso/vclock_gettime.c b/arch/x86/entry/vdso/vclock_gettime.c
>>> index ca94fa649251..c325ba1bdddf 100644
>>> --- a/arch/x86/entry/vdso/vclock_gettime.c
>>> +++ b/arch/x86/entry/vdso/vclock_gettime.c
>>> @@ -78,47 +78,58 @@ static notrace const struct pvclock_vsyscall_time_info *get_pvti(int cpu)
>>>
>>> static notrace cycle_t vread_pvclock(int *mode)
>>> {
>>> - const struct pvclock_vsyscall_time_info *pvti;
>>> + const struct pvclock_vcpu_time_info *pvti = &get_pvti(0)->pvti;
>>> cycle_t ret;
>>> - u64 last;
>>> - u32 version;
>>> - u8 flags;
>>> - unsigned cpu, cpu1;
>>> -
>>> + u64 tsc, pvti_tsc;
>>> + u64 last, delta, pvti_system_time;
>>> + u32 version, pvti_tsc_to_system_mul, pvti_tsc_shift;
>>>
>>> /*
>>> - * Note: hypervisor must guarantee that:
>>> - * 1. cpu ID number maps 1:1 to per-CPU pvclock time info.
>>> - * 2. that per-CPU pvclock time info is updated if the
>>> - * underlying CPU changes.
>>> - * 3. that version is increased whenever underlying CPU
>>> - * changes.
>>> + * Note: The kernel and hypervisor must guarantee that cpu ID
>>> + * number maps 1:1 to per-CPU pvclock time info.
>>> + *
>>> + * Because the hypervisor is entirely unaware of guest userspace
>>> + * preemption, it cannot guarantee that per-CPU pvclock time
>>> + * info is updated if the underlying CPU changes or that that
>>> + * version is increased whenever underlying CPU changes.
>>> *
>>> + * On KVM, we are guaranteed that pvti updates for any vCPU are
>>> + * atomic as seen by *all* vCPUs. This is an even stronger
>>> + * guarantee than we get with a normal seqlock.
>>> + *
>>> + * On Xen, we don't appear to have that guarantee, but Xen still
>>> + * supplies a valid seqlock using the version field.
>>> +
>>> + * We only do pvclock vdso timing at all if
>>> + * PVCLOCK_TSC_STABLE_BIT is set, and we interpret that bit to
>>> + * mean that all vCPUs have matching pvti and that the TSC is
>>> + * synced, so we can just look at vCPU 0's pvti.
>>> */
>>> - do {
>>> - cpu = __getcpu() & VGETCPU_CPU_MASK;
>>> - /* TODO: We can put vcpu id into higher bits of pvti.version.
>>> - * This will save a couple of cycles by getting rid of
>>> - * __getcpu() calls (Gleb).
>>> - */
>>> -
>>> - pvti = get_pvti(cpu);
>>> -
>>> - version = __pvclock_read_cycles(&pvti->pvti, &ret, &flags);
>>> -
>>> - /*
>>> - * Test we're still on the cpu as well as the version.
>>> - * We could have been migrated just after the first
>>> - * vgetcpu but before fetching the version, so we
>>> - * wouldn't notice a version change.
>>> - */
>>> - cpu1 = __getcpu() & VGETCPU_CPU_MASK;
>>> - } while (unlikely(cpu != cpu1 ||
>>> - (pvti->pvti.version & 1) ||
>>> - pvti->pvti.version != version));
>>> -
>>> - if (unlikely(!(flags & PVCLOCK_TSC_STABLE_BIT)))
>>> +
>>> + if (unlikely(!(pvti->flags & PVCLOCK_TSC_STABLE_BIT))) {
>>> *mode = VCLOCK_NONE;
>>> + return 0;
>>> + }
>>> +
>>> + do {
>>> + version = pvti->version;
>>> +
>>> + /* This is also a read barrier, so we'll read version first. */
>>> + tsc = rdtsc_ordered();
>>> +
>>> + pvti_tsc_to_system_mul = pvti->tsc_to_system_mul;
>>> + pvti_tsc_shift = pvti->tsc_shift;
>>> + pvti_system_time = pvti->system_time;
>>> + pvti_tsc = pvti->tsc_timestamp;
>>> +
>>> + /* Make sure that the version double-check is last. */
>>> + smp_rmb();
>>> + } while (unlikely((version & 1) || version != pvti->version));
>>> +
>>> + delta = tsc - pvti_tsc;
>>> + ret = pvti_system_time +
>>> + pvclock_scale_delta(delta, pvti_tsc_to_system_mul,
>>> + pvti_tsc_shift);
>>>
>>> /* refer to tsc.c read_tsc() comment for rationale */
>>> last = gtod->cycle_last;
>>>
>>
>> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
>
> Thanks. I've added your Reviewed-by to the 1/5 patch as well - to be able to put
> the whole series into the tip:x86/entry tree. Let me know if you'd like it to be
> done differently.
The 1/5 patch is entirely in KVM and is not necessary for the rest of
the series to work. I would like it to be separate, because Marcelo has
not yet chimed in to say why it was necessary.
Can you just apply patches 2-5?
Paolo
next prev parent reply other threads:[~2015-12-11 8:42 UTC|newest]
Thread overview: 42+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-12-09 23:12 [PATCH 0/5] x86: KVM vdso and clock improvements Andy Lutomirski
2015-12-11 3:20 ` Andy Lutomirski
2015-12-09 23:12 ` [PATCH 1/5] x86/kvm: On KVM re-enable (e.g. after suspend), update clocks Andy Lutomirski
2015-12-11 3:20 ` Andy Lutomirski
2015-12-14 8:16 ` [tip:x86/asm] " tip-bot for Andy Lutomirski
2015-12-14 10:18 ` Paolo Bonzini
2016-03-16 22:06 ` [PATCH 1/5] " Radim Krcmar
2016-03-16 22:15 ` Andy Lutomirski
2016-03-16 22:59 ` Radim Krcmar
2016-03-16 23:07 ` Andy Lutomirski
2016-03-17 15:10 ` Radim Krcmar
2016-03-17 18:22 ` Andy Lutomirski
2016-03-17 19:58 ` Radim Krcmar
2015-12-09 23:12 ` [PATCH 2/5] x86, vdso, pvclock: Simplify and speed up the vdso pvclock reader Andy Lutomirski
2015-12-11 3:20 ` Andy Lutomirski
2015-12-10 9:09 ` Paolo Bonzini
2015-12-11 7:52 ` Ingo Molnar
2015-12-11 8:42 ` Paolo Bonzini [this message]
2015-12-11 18:03 ` Andy Lutomirski
2015-12-14 8:16 ` [tip:x86/asm] " tip-bot for Andy Lutomirski
2015-12-09 23:12 ` [PATCH 3/5] x86/vdso: Get pvclock data from the vvar VMA instead of the fixmap Andy Lutomirski
2015-12-11 3:20 ` Andy Lutomirski
2015-12-10 9:09 ` Paolo Bonzini
2015-12-14 8:17 ` [tip:x86/asm] " tip-bot for Andy Lutomirski
2015-12-09 23:12 ` [PATCH 4/5] x86/vdso: Remove pvclock fixmap machinery Andy Lutomirski
2015-12-11 3:20 ` Andy Lutomirski
2015-12-10 9:09 ` Paolo Bonzini
2015-12-11 8:06 ` [PATCH] x86/platform/uv: Include clocksource.h for clocksource_touch_watchdog() Ingo Molnar
2015-12-11 8:06 ` Ingo Molnar
2015-12-11 17:33 ` Andy Lutomirski
2015-12-14 8:17 ` [tip:x86/asm] x86/vdso: Remove pvclock fixmap machinery tip-bot for Andy Lutomirski
2015-12-09 23:12 ` [PATCH 5/5] x86/vdso: Enable vdso pvclock access on all vdso variants Andy Lutomirski
2015-12-11 3:20 ` Andy Lutomirski
2015-12-10 9:10 ` Paolo Bonzini
2015-12-14 8:17 ` [tip:x86/asm] " tip-bot for Andy Lutomirski
2015-12-11 3:20 ` [PATCH 1/5] x86/kvm: On KVM re-enable (e.g. after suspend), update clocks Andy Lutomirski
2015-12-11 3:20 ` [PATCH 2/5] x86, vdso, pvclock: Simplify and speed up the vdso pvclock reader Andy Lutomirski
2015-12-11 3:20 ` [PATCH 3/5] x86/vdso: Get pvclock data from the vvar VMA instead of the fixmap Andy Lutomirski
2015-12-11 3:20 ` [PATCH 4/5] x86/vdso: Remove pvclock fixmap machinery Andy Lutomirski
2015-12-11 3:20 ` [PATCH 5/5] x86/vdso: Enable vdso pvclock access on all vdso variants Andy Lutomirski
2015-12-11 3:21 ` [PATCH 0/5] x86: KVM vdso and clock improvements Andy Lutomirski
2015-12-11 3:21 ` Andy Lutomirski
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=566A8C5B.7090209@redhat.com \
--to=pbonzini@redhat.com \
--cc=agraf@suse.de \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=luto@amacapital.net \
--cc=luto@kernel.org \
--cc=mingo@kernel.org \
--cc=mtosatti@redhat.com \
--cc=rkrcmar@redhat.com \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.