From: Vitaly Kuznetsov <vkuznets@redhat.com>
To: Andy Lutomirski <luto@amacapital.net>
Cc: X86 ML <x86@kernel.org>, Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>,
"K. Y. Srinivasan" <kys@microsoft.com>,
Haiyang Zhang <haiyangz@microsoft.com>,
Stephen Hemminger <sthemmin@microsoft.com>,
Dexuan Cui <decui@microsoft.com>,
"linux-kernel\@vger.kernel.org" <linux-kernel@vger.kernel.org>,
devel@linuxdriverproject.org,
Linux Virtualization <virtualization@lists.linux-foundation.org>
Subject: Re: [PATCH RFC 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method
Date: Wed, 08 Feb 2017 18:27:17 +0100 [thread overview]
Message-ID: <87y3xgr3t6.fsf@vitty.brq.redhat.com> (raw)
In-Reply-To: <CALCETrU48en_Zki88eXZEcU7QFf0u9sVAT_qgY0TyzoWkP2-6g@mail.gmail.com> (Andy Lutomirski's message of "Wed, 8 Feb 2017 09:12:48 -0800")
Andy Lutomirski <luto@amacapital.net> writes:
> On Wed, Feb 8, 2017 at 9:07 AM, Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
>> Hyper-V TSC page clocksource is suitable for vDSO, however, the protocol
>> defined by the hypervisor is different from VCLOCK_PVCLOCK. Implement the
>> required support re-using pvclock_page VVAR as VCLOCK_PVCLOCK is mutually
>> exclusive with VCLOCK_HVCLOCK at run time.
>>
>> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>> ---
>> arch/x86/entry/vdso/vclock_gettime.c | 48 ++++++++++++++++++++++++++++++++++++
>> arch/x86/entry/vdso/vma.c | 26 +++++++++++++------
>> arch/x86/hyperv/hv_init.c | 3 +++
>> arch/x86/include/asm/clocksource.h | 3 ++-
>> 4 files changed, 72 insertions(+), 8 deletions(-)
>>
>> diff --git a/arch/x86/entry/vdso/vclock_gettime.c b/arch/x86/entry/vdso/vclock_gettime.c
>> index 9d4d6e1..93e9dcd 100644
>> --- a/arch/x86/entry/vdso/vclock_gettime.c
>> +++ b/arch/x86/entry/vdso/vclock_gettime.c
>> @@ -17,6 +17,7 @@
>> #include <asm/unistd.h>
>> #include <asm/msr.h>
>> #include <asm/pvclock.h>
>> +#include <asm/mshyperv.h>
>> #include <linux/math64.h>
>> #include <linux/time.h>
>> #include <linux/kernel.h>
>> @@ -141,6 +142,49 @@ static notrace u64 vread_pvclock(int *mode)
>> return last;
>> }
>> #endif
>> +#ifdef CONFIG_HYPERV_CLOCK
>> +/* (a * b) >> 64 implementation */
>> +static inline u64 mul64x64_hi(u64 a, u64 b)
>> +{
>> + u64 a_lo, a_hi, b_lo, b_hi, p1, p2;
>> +
>> + a_lo = (u32)a;
>> + a_hi = a >> 32;
>> + b_lo = (u32)b;
>> + b_hi = b >> 32;
>> + p1 = a_lo * b_hi;
>> + p2 = a_hi * b_lo;
>> +
>> + return a_hi * b_hi + (p1 >> 32) + (p2 >> 32) +
>> + ((((a_lo * b_lo) >> 32) + (u32)p1 + (u32)p2) >> 32);
>> +
>> +}
>
> Unless GCC is waaay more clever than I think, this is hugely
> suboptimal on 64-bit. x86 can do this in a single instruction, and
> gcc can express it cleanly using __uint128_t. I wouldn't be terribly
> surprised if the 32-bit generated code was fine, too.
>
Yes, this is a 'quick-and-dirty' mulq implementation suitable for 32
bit. I'll try making it better.
>> +
>> +static notrace u64 vread_hvclock(int *mode)
>> +{
>> + const struct ms_hyperv_tsc_page *tsc_pg =
>> + (const struct ms_hyperv_tsc_page *)&pvclock_page;
>> + u64 sequence, scale, offset, current_tick, cur_tsc;
>> +
>> + while (1) {
>> + sequence = READ_ONCE(tsc_pg->tsc_sequence);
>> + if (!sequence)
>> + break;
>> +
>> + scale = READ_ONCE(tsc_pg->tsc_scale);
>> + offset = READ_ONCE(tsc_pg->tsc_offset);
>> + rdtscll(cur_tsc);
>> +
>> + current_tick = mul64x64_hi(cur_tsc, scale) + offset;
>> +
>> + if (READ_ONCE(tsc_pg->tsc_sequence) == sequence)
>> + return current_tick;
>> + }
>
> Can you explain better what's going on here? What protocol does the
> hypervisor use to update this bizarro seqlock? What exactly does
> sequence==0 mean?
I basically copied the code from arch/x86/hyperv/hv_init.c (in
linux-next). The protocol is:
tsc_sequence == 0 means we should not be using this clocksource. This
happens during migration, for example.
When tsc_sequence != 0 we calculate
current_tick = rdtsc() * tsc_pg->tsc_scale + tsc_pg->tsc_offset
and then we check tsc_sequence again. If it changed it means that the
hypervisor was updating the data at this moment and we need to discard
the result and try again. If the sequence number is the same we're good
to go.
>
>> +
>> + *mode = VCLOCK_NONE;
>> + return 0;
>> +}
>> +#endif
>>
>> notrace static u64 vread_tsc(void)
>> {
>> @@ -173,6 +217,10 @@ notrace static inline u64 vgetsns(int *mode)
>> else if (gtod->vclock_mode == VCLOCK_PVCLOCK)
>> cycles = vread_pvclock(mode);
>> #endif
>> +#ifdef CONFIG_HYPERV_CLOCK
>> + else if (gtod->vclock_mode == VCLOCK_HVCLOCK)
>> + cycles = vread_hvclock(mode);
>> +#endif
>> else
>> return 0;
>> v = (cycles - gtod->cycle_last) & gtod->mask;
>> diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
>> index 10820f6..4b9d90c 100644
>> --- a/arch/x86/entry/vdso/vma.c
>> +++ b/arch/x86/entry/vdso/vma.c
>> @@ -21,6 +21,7 @@
>> #include <asm/page.h>
>> #include <asm/desc.h>
>> #include <asm/cpufeature.h>
>> +#include <asm/mshyperv.h>
>>
>> #if defined(CONFIG_X86_64)
>> unsigned int __read_mostly vdso64_enabled = 1;
>> @@ -112,13 +113,24 @@ static int vvar_fault(const struct vm_special_mapping *sm,
>> ret = vm_insert_pfn(vma, vmf->address,
>> __pa_symbol(&__vvar_page) >> PAGE_SHIFT);
>> } else if (sym_offset == image->sym_pvclock_page) {
>> - struct pvclock_vsyscall_time_info *pvti =
>> - pvclock_pvti_cpu0_va();
>> - if (pvti && vclock_was_used(VCLOCK_PVCLOCK)) {
>> - ret = vm_insert_pfn(
>> - vma,
>> - vmf->address,
>> - __pa(pvti) >> PAGE_SHIFT);
>> + if (vclock_was_used(VCLOCK_PVCLOCK)) {
>> + struct pvclock_vsyscall_time_info *pvti =
>> + pvclock_pvti_cpu0_va();
>> + if (pvti) {
>> + ret = vm_insert_pfn(
>> + vma,
>> + vmf->address,
>> + __pa(pvti) >> PAGE_SHIFT);
>> + }
>> + } else if (vclock_was_used(VCLOCK_HVCLOCK)) {
>> + struct ms_hyperv_tsc_page *tsc_pg =
>> + hv_get_tsc_page();
>> + if (tsc_pg) {
>> + ret = vm_insert_pfn(
>> + vma,
>> + vmf->address,
>> + vmalloc_to_pfn(tsc_pg));
>> + }
>
> This is IMO just too weird and fragile. Why not allocate another page
> and avoid this strange aliasing?
My idea was that as we can't be in both VCLOCK_PVCLOCK (KVM, Xen) and
VCLOCK_HVCLOCK (Hyper-V) modes simultaneously. But let's make it
different to avoid confusion.
--
Vitaly
next prev parent reply other threads:[~2017-02-08 17:28 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-02-08 17:07 [PATCH RFC 0/2] x86/vdso: Add Hyper-V TSC page clocksource support Vitaly Kuznetsov
2017-02-08 17:07 ` [PATCH RFC 1/2] hyperv: implement hv_get_tsc_page() Vitaly Kuznetsov
2017-02-08 17:07 ` [PATCH RFC 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method Vitaly Kuznetsov
2017-02-08 17:12 ` Andy Lutomirski
2017-02-08 17:27 ` Vitaly Kuznetsov [this message]
2017-02-08 17:52 ` Thomas Gleixner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87y3xgr3t6.fsf@vitty.brq.redhat.com \
--to=vkuznets@redhat.com \
--cc=decui@microsoft.com \
--cc=devel@linuxdriverproject.org \
--cc=haiyangz@microsoft.com \
--cc=hpa@zytor.com \
--cc=kys@microsoft.com \
--cc=linux-kernel@vger.kernel.org \
--cc=luto@amacapital.net \
--cc=mingo@redhat.com \
--cc=sthemmin@microsoft.com \
--cc=tglx@linutronix.de \
--cc=virtualization@lists.linux-foundation.org \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox