Re: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method

public inbox for linux-kernel@vger.kernel.org

From: Stephen Hemminger <stephen@networkplumber.org>
To: Andy Lutomirski <luto@amacapital.net>
Cc: KY Srinivasan <kys@microsoft.com>,
	Stephen Hemminger <sthemmin@microsoft.com>,
	Haiyang Zhang <haiyangz@microsoft.com>,
	"x86@kernel.org" <x86@kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"virtualization@lists.linux-foundation.org" 
	<virtualization@lists.linux-foundation.org>,
	Ingo Molnar <mingo@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>,
	"devel@linuxdriverproject.org" <devel@linuxdriverproject.org>,
	Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method
Date: Thu, 9 Feb 2017 15:15:06 -0800
Message-ID: <20170209151506.2c0a555f@xeon-e3>
In-Reply-To: <CALCETrXptSZJMH4yBqn=xPeJHDPZNnq7io8v75=spT6q1RTpdA@mail.gmail.com>

On Thu, 9 Feb 2017 14:55:50 -0800
Andy Lutomirski <luto@amacapital.net> wrote:

> On Thu, Feb 9, 2017 at 12:45 PM, KY Srinivasan <kys@microsoft.com> wrote:
> >
> >  
> >> -----Original Message-----
> >> From: Thomas Gleixner [mailto:tglx@linutronix.de]
> >> Sent: Thursday, February 9, 2017 9:08 AM
> >> To: Vitaly Kuznetsov <vkuznets@redhat.com>
> >> Cc: x86@kernel.org; Andy Lutomirski <luto@amacapital.net>; Ingo Molnar
> >> <mingo@redhat.com>; H. Peter Anvin <hpa@zytor.com>; KY Srinivasan
> >> <kys@microsoft.com>; Haiyang Zhang <haiyangz@microsoft.com>; Stephen
> >> Hemminger <sthemmin@microsoft.com>; Dexuan Cui
> >> <decui@microsoft.com>; linux-kernel@vger.kernel.org;
> >> devel@linuxdriverproject.org; virtualization@lists.linux-foundation.org
> >> Subject: Re: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read
> >> method
> >>
> >> On Thu, 9 Feb 2017, Vitaly Kuznetsov wrote:  
> >> > +#ifdef CONFIG_HYPERV_TSCPAGE
> >> > +static notrace u64 vread_hvclock(int *mode)
> >> > +{
> >> > +   const struct ms_hyperv_tsc_page *tsc_pg =
> >> > +           (const struct ms_hyperv_tsc_page *)&hvclock_page;
> >> > +   u64 sequence, scale, offset, current_tick, cur_tsc;
> >> > +
> >> > +   while (1) {
> >> > +           sequence = READ_ONCE(tsc_pg->tsc_sequence);
> >> > +           if (!sequence)
> >> > +                   break;
> >> > +
> >> > +           scale = READ_ONCE(tsc_pg->tsc_scale);
> >> > +           offset = READ_ONCE(tsc_pg->tsc_offset);
> >> > +           rdtscll(cur_tsc);
> >> > +
> >> > +           current_tick = mul_u64_u64_shr(cur_tsc, scale, 64) + offset;
> >> > +
> >> > +           if (READ_ONCE(tsc_pg->tsc_sequence) == sequence)
> >> > +                   return current_tick;  
> >>
> >> That sequence stuff lacks still a sensible explanation. It's fundamentally
> >> different from the sequence counting we do in the kernel, so documentation
> >> for it is really required.  
> >
> > The host is updating multiple fields in this shared TSC page and the sequence number is
> > used to ensure that the guest sees a consistent set of published values. If I remember
> > correctly, Xen has a similar mechanism.  
> 
> So what's the actual protocol?  When the hypervisor updates the page,
> does it freeze all guest cpus?  If not, how does it maintain
> atomicity?

The protocol looks a lot like a Linux seqlock, but it lacks an extra
protection that seqlock has.

The host needs to update the sequence number twice in order to guarantee
ordering. Otherwise the host and the guest can race:

					Host
						Write offset
						Write scale
						Set tsc_sequence = N
          Guest
		Read sequence = N
		Read scale
						Write scale
						Write offset

		Read offset
		Check sequence == N
						Set tsc_sequence = N +1

It looks like the current host-side protocol is wrong.
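The interleaving in the first diagram can be replayed step by step in C to show why a single sequence write per update is not enough. This is a minimal single-threaded sketch: the struct, the function name and the field values are hypothetical, and plain variables stand in for the shared page, so it models only the order of the steps, not real concurrency.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the shared page: only the three fields that the
 * quoted vread_hvclock() reads. */
struct tsc_page_model {
	uint64_t sequence;
	uint64_t scale;
	uint64_t offset;
};

/*
 * Replay the first host/guest interleaving, where the host writes the
 * sequence only once per update.  Returns 1 if the guest's final sequence
 * check passes, and stores the scale/offset pair the guest read.
 */
static int replay_single_update(uint64_t *scale_out, uint64_t *offset_out)
{
	struct tsc_page_model pg = { 0 };
	const uint64_t n = 4;	/* arbitrary sequence value */

	/* Host: write offset, write scale, set tsc_sequence = N. */
	pg.offset = 10;
	pg.scale = 100;
	pg.sequence = n;

	/* Guest: read sequence = N, then read scale (old generation). */
	uint64_t seq = pg.sequence;
	assert(seq == n);
	*scale_out = pg.scale;

	/* Host: next update writes scale and offset, sequence untouched. */
	pg.scale = 200;
	pg.offset = 20;

	/* Guest: read offset (new generation), check sequence == N. */
	*offset_out = pg.offset;
	int accepted = (pg.sequence == seq);

	/* Host: set tsc_sequence = N + 1, too late for this reader. */
	pg.sequence = n + 1;

	return accepted;
}
```

Calling replay_single_update() returns 1 with scale 100 and offset 20: the guest's check passes, yet scale 100 was only ever published together with offset 10.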

The solution that Andi Kleen invented, and that I used in seqlock, is for the writer
to update the sequence at both the start and the end of a transaction. If the sequence
number is odd, the reader knows an update is in progress and that it may be looking
at inconsistent data.
					Host
						Write offset
						Write scale
						Set tsc_sequence = N (end of transaction)
          Guest
		Read sequence = N
		Spin until sequence is even (N is even)
		Read scale
						Set tsc_sequence += 1
						Write scale
						Write offset

		Read offset
		Check sequence == N? (fails: sequence is N + 1)
						Set tsc_sequence += 1 (end of transaction)
		Read sequence = N + 2
		Spin until sequence is even (i.e. N + 2)
		Read scale
		Read offset
		Check sequence == N + 2? (yes, OK)
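The even/odd scheme in the second diagram can be sketched in portable C11 roughly as follows. This is an illustration under stated assumptions, not the Hyper-V or kernel implementation: struct seq_page and both functions are hypothetical, a single writer is assumed, and the fence placement follows the common C11 seqlock pattern. Inside the kernel the same idea is provided by the seqcount_t API (for example read_seqcount_begin() and read_seqcount_retry()). Instead of spinning before the reads, this reader retries when the starting value was odd; the outcome is the same.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical shared page using the even/odd rule.  The payload fields
 * are relaxed atomics so that concurrent reads and writes are not a C11
 * data race; the fences provide the ordering.
 */
struct seq_page {
	_Atomic uint64_t sequence;	/* even: stable, odd: update in progress */
	_Atomic uint64_t scale;
	_Atomic uint64_t offset;
};

/* Writer (host side, single writer assumed): odd, payload, even. */
static void seq_page_publish(struct seq_page *pg, uint64_t scale,
			     uint64_t offset)
{
	uint64_t seq = atomic_load_explicit(&pg->sequence, memory_order_relaxed);

	assert((seq & 1) == 0);		/* never start inside another update */
	atomic_store_explicit(&pg->sequence, seq + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release); /* odd value before payload */

	atomic_store_explicit(&pg->scale, scale, memory_order_relaxed);
	atomic_store_explicit(&pg->offset, offset, memory_order_relaxed);

	/* End of transaction: payload is ordered before the even value. */
	atomic_store_explicit(&pg->sequence, seq + 2, memory_order_release);
}

/* Reader (guest side): retry if the start value was odd or it changed. */
static void seq_page_read(struct seq_page *pg, uint64_t *scale,
			  uint64_t *offset)
{
	uint64_t begin, end;

	do {
		begin = atomic_load_explicit(&pg->sequence, memory_order_acquire);
		*scale = atomic_load_explicit(&pg->scale, memory_order_relaxed);
		*offset = atomic_load_explicit(&pg->offset, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire); /* payload before recheck */
		end = atomic_load_explicit(&pg->sequence, memory_order_relaxed);
	} while ((begin & 1) || begin != end);
}
```

Each completed publish advances the sequence by 2, so after two updates starting from 0 it reads 4, and a reader that overlaps an update sees either an odd start value or a changed end value and goes around the loop again.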

It is also faster to read only the scale and offset inside this loop, and to defer
the TSC read and the multiply until after a consistent scale/offset pair has been acquired.
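The reordering suggested above can be sketched by splitting the reader into a snapshot loop and a one-time conversion step. Assumptions: the page follows the even/odd rule rather than the quoted patch's "sequence 0 means invalid" rule, the names hv_tsc_page_model, hv_snapshot() and hv_ticks() are hypothetical (only the tsc_* field names come from the quoted patch), and unsigned __int128, a GCC/Clang extension, stands in for the kernel's mul_u64_u64_shr().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical page; field names borrowed from the quoted patch, but the
 * even/odd rule is assumed instead of "sequence 0 means invalid". */
struct hv_tsc_page_model {
	_Atomic uint64_t tsc_sequence;
	_Atomic uint64_t tsc_scale;
	_Atomic uint64_t tsc_offset;
};

/* Loop only over the cheap loads: no TSC read and no multiply in here. */
static void hv_snapshot(struct hv_tsc_page_model *pg, uint64_t *scale,
			uint64_t *offset)
{
	uint64_t begin, end;

	assert(pg);
	do {
		begin = atomic_load_explicit(&pg->tsc_sequence, memory_order_acquire);
		*scale = atomic_load_explicit(&pg->tsc_scale, memory_order_relaxed);
		*offset = atomic_load_explicit(&pg->tsc_offset, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
		end = atomic_load_explicit(&pg->tsc_sequence, memory_order_relaxed);
	} while ((begin & 1) || begin != end);
}

/* Done once, after the snapshot: same math as
 * mul_u64_u64_shr(tsc, scale, 64) + offset in the quoted patch. */
static uint64_t hv_ticks(uint64_t tsc, uint64_t scale, uint64_t offset)
{
	/* 64x64 -> 128-bit product, keep the high 64 bits. */
	return (uint64_t)(((unsigned __int128)tsc * scale) >> 64) + offset;
}
```

A caller would take the snapshot, then read the TSC (for example with __rdtsc() from <x86intrin.h> on x86), then pass all three values to hv_ticks(). With scale = 2^63 (one half, since the product is shifted right by 64) and offset = 7, a TSC value of 1000 converts to 507.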


Thread overview: 22+ messages
2017-02-09 14:10 [PATCH 0/2] x86/vdso: Add Hyper-V TSC page clocksource support Vitaly Kuznetsov
2017-02-09 14:10 ` [PATCH 1/2] hyperv: implement hv_get_tsc_page() Vitaly Kuznetsov
2017-02-09 18:24   ` Stephen Hemminger
2017-02-09 20:14     ` Thomas Gleixner
2017-02-09 23:17       ` Stephen Hemminger
2017-02-09 14:10 ` [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method Vitaly Kuznetsov
2017-02-09 17:08   ` Thomas Gleixner
2017-02-09 18:27     ` Stephen Hemminger
2017-02-10 12:25       ` Vitaly Kuznetsov
2017-02-10 12:28         ` Thomas Gleixner
2017-02-10 16:31           ` Stephen Hemminger
2017-02-10 18:01             ` Thomas Gleixner
2017-02-13  7:49               ` Dexuan Cui
2017-02-13  9:27                 ` Thomas Gleixner
2017-02-13 19:06                 ` Andy Lutomirski
2017-02-13 19:28                   ` Thomas Gleixner
2017-02-09 20:45     ` KY Srinivasan
2017-02-09 22:55       ` Andy Lutomirski
2017-02-09 23:15         ` Stephen Hemminger [this message]
2017-02-10 12:15         ` Vitaly Kuznetsov
2017-02-10 11:06     ` Vitaly Kuznetsov
2017-02-10 11:15       ` Thomas Gleixner
