From: Avi Kivity <avi@redhat.com>
To: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Xen-devel <xen-devel@lists.xensource.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	kurt.hackel@oracle.com,
	Dan Magenheimer <dan.magenheimer@oracle.com>,
	Keir Fraser <keir.fraser@eu.citrix.com>,
	Glauber de Oliveira Costa <gcosta@redhat.com>,
	Zach Brown <zach.brown@oracle.com>,
	the arch/x86 maintainers <x86@kernel.org>,
	Chris Mason <chris.mason@oracle.com>
Subject: Re: [PATCH 3/5] x86/pvclock: add vsyscall implementation
Date: Tue, 06 Oct 2009 11:04:51 +0200
Message-ID: <4ACB0833.2050203@redhat.com>
In-Reply-To: <1254790211-15416-4-git-send-email-jeremy.fitzhardinge@citrix.com>

On 10/06/2009 02:50 AM, Jeremy Fitzhardinge wrote:
> This patch allows the pvclock mechanism to be used in usermode.  To
> do this, we map an extra page into usermode containing an array of
> pvclock_vcpu_time_info structures which give the information required
> to compute a global system clock from the tsc.  With this, we can
> implement pvclock_clocksource_vread().
>
> One complication is that usermode is subject to two levels of scheduling:
> kernel scheduling of tasks onto vcpus, and hypervisor scheduling of
> vcpus onto pcpus.  In either case the underlying pcpu may change, and with
> it, the correct set of parameters to compute tsc->system clock.  To
> address this we install a preempt notifier on sched_out to increment
> that vcpu's version number.  Usermode can then check the version number
> is unchanged while computing the time and retry if it has (the only
> difference from the kernel's version of the algorithm is that the vcpu
> may have changed, so we may need to switch pvclock_vcpu_time_info
> structures).
>
> To use this feature, hypervisor-specific code is required
> to call pvclock_init_vsyscall(), and if successful:
>   - cause the pvclock_vcpu_time_info structure at
>     pvclock_get_vsyscall_time_info(cpu) to be updated appropriately for
>     each vcpu.
>   - use pvclock_clocksource_vread as the implementation of clocksource
>     .vread.
>
> +
> +cycle_t __vsyscall_fn pvclock_clocksource_vread(void)
> +{
> +	const struct pvclock_vcpu_time_info *pvti_base;
> +	const struct pvclock_vcpu_time_info *pvti;
> +	cycle_t ret;
> +	u32 version;
> +
> +	pvti_base = (struct pvclock_vcpu_time_info *)fix_to_virt(FIX_PVCLOCK_TIME_INFO);
> +
> +	/*
> +	 * When looping to get a consistent (time-info, tsc) pair, we
> +	 * also need to deal with the possibility we can switch vcpus,
> +	 * so make sure we always re-fetch time-info for the current vcpu.
> +	 */
> +	do {
> +		unsigned cpu;
> +
> +		vgetcpu(&cpu, NULL, NULL);
> +		pvti = &pvti_base[cpu];
> +
> +		version = __pvclock_read_cycles(pvti, &ret);
> +	} while (unlikely(pvti->version != version));
> +
> +	return ret;
> +}
>    

Instead of using vgetcpu() and rdtsc() independently, you can use rdtscp 
to read both atomically.  This removes the need for the preempt notifier.
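
To make the suggestion concrete, here is a rough userspace sketch of what
"read both atomically" could look like.  It assumes the kernel has loaded
TSC_AUX with the cpu number in the low 12 bits and the node above that
(the encoding vgetcpu() relies on, as far as I know); the struct and
function names are made up for illustration and are not part of the patch.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>	/* __rdtscp() compiler intrinsic */
#endif

/* Assumed TSC_AUX layout: cpu in bits 0-11, node in the bits above. */
struct cpu_node {
	unsigned cpu;
	unsigned node;
};

/* Split a raw TSC_AUX value into cpu and node (hypothetical helper). */
static inline struct cpu_node decode_tsc_aux(uint32_t aux)
{
	struct cpu_node cn = { aux & 0xfff, aux >> 12 };
	return cn;
}

/* A tsc value together with the cpu it was read on. */
struct tsc_sample {
	uint64_t tsc;
	unsigned cpu;
};

#if defined(__x86_64__) || defined(__i386__)
/*
 * One rdtscp instruction returns the tsc and TSC_AUX together, so the
 * task cannot migrate between reading the counter and reading the cpu
 * number.  Only call this after checking that the cpu supports rdtscp.
 */
static inline struct tsc_sample read_tsc_and_cpu(void)
{
	unsigned aux;
	struct tsc_sample s;

	s.tsc = __rdtscp(&aux);
	s.cpu = decode_tsc_aux(aux).cpu;
	return s;
}
#endif
```

The sample's cpu field would then index the time-info array, replacing
the separate vgetcpu() call in the loop above.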

> +
> +/*
> + * Initialize the generic pvclock vsyscall state.  This will allocate
> + * a/some page(s) for the per-vcpu pvclock information, set up a
> + * fixmap mapping for the page(s)
> + */
> +int __init pvclock_init_vsyscall(void)
> +{
> +	int cpu;
> +
> +	/* Just one page for now */
> +	if (nr_cpu_ids * sizeof(struct vcpu_time_info) > PAGE_SIZE) {
> +		printk(KERN_WARNING "pvclock_vsyscall: too many CPUs to fit time_info into a single page\n");
> +		return -ENOSPC;
> +	}
> +
> +	pvclock_vsyscall_time_info =
> +		(struct pvclock_vcpu_time_info *)get_zeroed_page(GFP_KERNEL);
> +	if (pvclock_vsyscall_time_info == NULL)
> +		return -ENOMEM;
> +
>    

Need to align the vcpu_time_infos on a cacheline boundary.
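
As a rough illustration of the alignment point, something along these
lines would give each vcpu its own cacheline.  The 64-byte line size and
4096-byte page size are assumptions, the field layout is only my
understanding of the time-info structure, and the type names are invented
for the example.

```c
#include <stddef.h>
#include <stdint.h>

#define ASSUMED_CACHELINE	64	/* assumption: 64-byte cachelines */
#define ASSUMED_PAGE_SIZE	4096	/* assumption: 4 KiB pages */

/* Illustrative stand-in for the per-vcpu time info (32 bytes). */
struct time_info_sketch {
	uint32_t version;
	uint32_t pad0;
	uint64_t tsc_timestamp;
	uint64_t system_time;
	uint32_t tsc_to_system_mul;
	int8_t   tsc_shift;
	uint8_t  pad[3];
};

/* Wrapper that forces every array element onto its own cacheline. */
struct aligned_time_info {
	_Alignas(ASSUMED_CACHELINE) struct time_info_sketch info;
};

_Static_assert(sizeof(struct aligned_time_info) == ASSUMED_CACHELINE,
	       "each entry should occupy exactly one cacheline");

/* How many vcpus fit in the single shared page after padding. */
static inline size_t max_cpus_per_page(void)
{
	return ASSUMED_PAGE_SIZE / sizeof(struct aligned_time_info);
}
```

With these assumed sizes the padding halves the number of vcpus that fit
in one page (64 instead of 128), so the "too many CPUs" check would need
to use the padded size.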

> +	for (cpu = 0; cpu < nr_cpu_ids; cpu++)
> +		pvclock_vsyscall_time_info[cpu].version = ~0;
> +
> +	__set_fixmap(FIX_PVCLOCK_TIME_INFO, __pa(pvclock_vsyscall_time_info),
> +		     PAGE_KERNEL_VSYSCALL);
> +
> +	preempt_notifier_init(&pvclock_vsyscall_notifier,
> +			&pvclock_vsyscall_preempt_ops);
> +	preempt_notifier_register(&pvclock_vsyscall_notifier);
> +
>    

Preempt notifiers are per-thread, not global, and will upset the cycle
counters.  I'd drop them and use rdtscp instead (and give up if the 
processor doesn't support it).
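
For the "give up" part, a userspace-style sketch of the feature check
might look like this.  It relies on rdtscp being advertised in CPUID leaf
0x80000001, EDX bit 27; the helper names are hypothetical, and in the
kernel the natural equivalent would presumably be a check of
X86_FEATURE_RDTSCP rather than raw CPUID.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>	/* __get_cpuid() from GCC/clang */
#endif

#define CPUID_EXT_FEATURES	0x80000001u
#define EDX_RDTSCP_BIT		27

/* Test the rdtscp bit in a raw EDX value from the extended leaf. */
static inline int edx_has_rdtscp(uint32_t edx)
{
	return (edx >> EDX_RDTSCP_BIT) & 1;
}

/* Returns 1 if this cpu advertises rdtscp, 0 otherwise. */
static inline int cpu_has_rdtscp(void)
{
#if defined(__x86_64__) || defined(__i386__)
	unsigned eax, ebx, ecx, edx;

	/* __get_cpuid() returns 0 when the leaf is not available. */
	if (!__get_cpuid(CPUID_EXT_FEATURES, &eax, &ebx, &ecx, &edx))
		return 0;
	return edx_has_rdtscp(edx);
#else
	return 0;	/* not an x86 build: no rdtscp */
#endif
}
```

The init path would then return an error when the check fails, leaving
usermode to fall back to the regular syscall.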

-- 
error compiling committee.c: too many arguments to function

