All of lore.kernel.org
 help / color / mirror / Atom feed
From: Marc Zyngier <maz@kernel.org>
To: Marcelo Tosatti <mtosatti@redhat.com>
Cc: will@kernel.org, catalin.marinas@arm.com,
	linux-kernel@vger.kernel.org, rostedt@goodmis.org,
	mingo@redhat.com, Nicolas Saenz Julienne <nsaenzju@redhat.com>,
	nilal@redhat.com, kvmarm@lists.cs.columbia.edu,
	linux-arm-kernel@lists.infradead.org
Subject: Re: [RFC PATCH 2/2] KVM: arm64: export cntvoff in debugfs
Date: Fri, 19 Nov 2021 13:31:11 +0000	[thread overview]
Message-ID: <87czmw6zmo.wl-maz@kernel.org> (raw)
In-Reply-To: <20211119125946.GA57544@fuller.cnet>

On Fri, 19 Nov 2021 12:59:46 +0000,
Marcelo Tosatti <mtosatti@redhat.com> wrote:
> 
> On Fri, Nov 19, 2021 at 12:17:00PM +0000, Marc Zyngier wrote:
> > On Fri, 19 Nov 2021 10:21:18 +0000,
> > Nicolas Saenz Julienne <nsaenzju@redhat.com> wrote:
> > > 
> > > While using cntvct as the raw clock for tracing, it's possible to
> > > synchronize host/guest traces just by knowing the virtual offset applied
> > > to the guest's virtual counter.
> > > 
> > > This is also the case on x86 when TSC is available. The offset is
> > > exposed in debugfs as 'tsc-offset' on a per vcpu basis. So let's
> > > implement the same for arm64.
> > 
> > How does this work with NV, where the guest hypervisor is in control
> > of the virtual offset? How does userspace knows which vcpu to pick so
> > that it gets the right offset?
> 
> On x86, the offsets for different vcpus are the same due to the logic at
> kvm_synchronize_tsc function:
> 
> During guest vcpu creation, when the TSC-clock values are written
> in a short window of time (or the clock value is zero), the code uses
> the same TSC.
> 
> This logic is problematic (since "short window of time" is a heuristic 
> which can fail), and is being replaced by writing the same offset
> for each vCPU:
> 
> commit 828ca89628bfcb1b8f27535025f69dd00eb55207
> Author: Oliver Upton <oupton@google.com>
> Date:   Thu Sep 16 18:15:38 2021 +0000
> 
>     KVM: x86: Expose TSC offset controls to userspace
>     
>     To date, VMM-directed TSC synchronization and migration has been a bit
>     messy. KVM has some baked-in heuristics around TSC writes to infer if
>     the VMM is attempting to synchronize. This is problematic, as it depends
>     on host userspace writing to the guest's TSC within 1 second of the last
>     write.
>     
>     A much cleaner approach to configuring the guest's views of the TSC is to
>     simply migrate the TSC offset for every vCPU. Offsets are idempotent,
>     and thus not subject to change depending on when the VMM actually
>     reads/writes values from/to KVM. The VMM can then read the TSC once with
>     KVM_GET_CLOCK to capture a (realtime, host_tsc) pair at the instant when
>     the guest is paused.
> 
> So with that in place, the answer to 
> 
> How does userspace knows which vcpu to pick so
> that it gets the right offset?
> 
> is any vcpu, since the offsets are the same.

As I just said above, this assertion doesn't hold true once you have
nested virt, because the offset is per-cpu, and is adjusted to mean
different things on different hypervisors (some hypervisors expose
stolen time through it, for example).

What this patch is doing is to expose a Linux-specific behaviour, and
try to derive properties from it. It really doesn't work in general.

> 
> > I also wonder why we need this when userspace already has direct
> > access to that information without any extra kernel support (read the
> > CNTVCT view of the vcpu using the ONEREG API, subtract it from the
> > host view of the counter, job done).
> 
> If guest has access to the clock offset (between guest and host), then
> in the guest:
> 
> 	clockval = hostclockval - clockoffset
> 
> Adding "clockoffset" to that will retrieve the host clock.
> 
> Is that what you mean?

No. The *VMM* (qemu, kvmtool, crosvm, insertyourfavouriteonehere) has
already access to it. Why do we need an extra interface?

	M.

-- 
Without deviation from the norm, progress is not possible.
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

WARNING: multiple messages have this Message-ID (diff)
From: Marc Zyngier <maz@kernel.org>
To: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Nicolas Saenz Julienne <nsaenzju@redhat.com>,
	linux-arm-kernel@lists.infradead.org, rostedt@goodmis.org,
	james.morse@arm.com, alexandru.elisei@arm.com,
	suzuki.poulose@arm.com, catalin.marinas@arm.com, will@kernel.org,
	linux-kernel@vger.kernel.org, kvmarm@lists.cs.columbia.edu,
	mingo@redhat.com, nilal@redhat.com
Subject: Re: [RFC PATCH 2/2] KVM: arm64: export cntvoff in debugfs
Date: Fri, 19 Nov 2021 13:31:11 +0000	[thread overview]
Message-ID: <87czmw6zmo.wl-maz@kernel.org> (raw)
In-Reply-To: <20211119125946.GA57544@fuller.cnet>

On Fri, 19 Nov 2021 12:59:46 +0000,
Marcelo Tosatti <mtosatti@redhat.com> wrote:
> 
> On Fri, Nov 19, 2021 at 12:17:00PM +0000, Marc Zyngier wrote:
> > On Fri, 19 Nov 2021 10:21:18 +0000,
> > Nicolas Saenz Julienne <nsaenzju@redhat.com> wrote:
> > > 
> > > While using cntvct as the raw clock for tracing, it's possible to
> > > synchronize host/guest traces just by knowing the virtual offset applied
> > > to the guest's virtual counter.
> > > 
> > > This is also the case on x86 when TSC is available. The offset is
> > > exposed in debugfs as 'tsc-offset' on a per vcpu basis. So let's
> > > implement the same for arm64.
> > 
> > How does this work with NV, where the guest hypervisor is in control
> > of the virtual offset? How does userspace knows which vcpu to pick so
> > that it gets the right offset?
> 
> On x86, the offsets for different vcpus are the same due to the logic at
> kvm_synchronize_tsc function:
> 
> During guest vcpu creation, when the TSC-clock values are written
> in a short window of time (or the clock value is zero), the code uses
> the same TSC.
> 
> This logic is problematic (since "short window of time" is a heuristic 
> which can fail), and is being replaced by writing the same offset
> for each vCPU:
> 
> commit 828ca89628bfcb1b8f27535025f69dd00eb55207
> Author: Oliver Upton <oupton@google.com>
> Date:   Thu Sep 16 18:15:38 2021 +0000
> 
>     KVM: x86: Expose TSC offset controls to userspace
>     
>     To date, VMM-directed TSC synchronization and migration has been a bit
>     messy. KVM has some baked-in heuristics around TSC writes to infer if
>     the VMM is attempting to synchronize. This is problematic, as it depends
>     on host userspace writing to the guest's TSC within 1 second of the last
>     write.
>     
>     A much cleaner approach to configuring the guest's views of the TSC is to
>     simply migrate the TSC offset for every vCPU. Offsets are idempotent,
>     and thus not subject to change depending on when the VMM actually
>     reads/writes values from/to KVM. The VMM can then read the TSC once with
>     KVM_GET_CLOCK to capture a (realtime, host_tsc) pair at the instant when
>     the guest is paused.
> 
> So with that in place, the answer to 
> 
> How does userspace knows which vcpu to pick so
> that it gets the right offset?
> 
> is any vcpu, since the offsets are the same.

As I just said above, this assertion doesn't hold true once you have
nested virt, because the offset is per-cpu, and is adjusted to mean
different things on different hypervisors (some hypervisors expose
stolen time through it, for example).

What this patch is doing is to expose a Linux-specific behaviour, and
try to derive properties from it. It really doesn't work in general.

> 
> > I also wonder why we need this when userspace already has direct
> > access to that information without any extra kernel support (read the
> > CNTVCT view of the vcpu using the ONEREG API, subtract it from the
> > host view of the counter, job done).
> 
> If guest has access to the clock offset (between guest and host), then
> in the guest:
> 
> 	clockval = hostclockval - clockoffset
> 
> Adding "clockoffset" to that will retrieve the host clock.
> 
> Is that what you mean?

No. The *VMM* (qemu, kvmtool, crosvm, insertyourfavouriteonehere) has
already access to it. Why do we need an extra interface?

	M.

-- 
Without deviation from the norm, progress is not possible.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

WARNING: multiple messages have this Message-ID (diff)
From: Marc Zyngier <maz@kernel.org>
To: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Nicolas Saenz Julienne <nsaenzju@redhat.com>,
	linux-arm-kernel@lists.infradead.org, rostedt@goodmis.org,
	james.morse@arm.com, alexandru.elisei@arm.com,
	suzuki.poulose@arm.com, catalin.marinas@arm.com, will@kernel.org,
	linux-kernel@vger.kernel.org, kvmarm@lists.cs.columbia.edu,
	mingo@redhat.com, nilal@redhat.com
Subject: Re: [RFC PATCH 2/2] KVM: arm64: export cntvoff in debugfs
Date: Fri, 19 Nov 2021 13:31:11 +0000	[thread overview]
Message-ID: <87czmw6zmo.wl-maz@kernel.org> (raw)
In-Reply-To: <20211119125946.GA57544@fuller.cnet>

On Fri, 19 Nov 2021 12:59:46 +0000,
Marcelo Tosatti <mtosatti@redhat.com> wrote:
> 
> On Fri, Nov 19, 2021 at 12:17:00PM +0000, Marc Zyngier wrote:
> > On Fri, 19 Nov 2021 10:21:18 +0000,
> > Nicolas Saenz Julienne <nsaenzju@redhat.com> wrote:
> > > 
> > > While using cntvct as the raw clock for tracing, it's possible to
> > > synchronize host/guest traces just by knowing the virtual offset applied
> > > to the guest's virtual counter.
> > > 
> > > This is also the case on x86 when TSC is available. The offset is
> > > exposed in debugfs as 'tsc-offset' on a per vcpu basis. So let's
> > > implement the same for arm64.
> > 
> > How does this work with NV, where the guest hypervisor is in control
> > of the virtual offset? How does userspace knows which vcpu to pick so
> > that it gets the right offset?
> 
> On x86, the offsets for different vcpus are the same due to the logic at
> kvm_synchronize_tsc function:
> 
> During guest vcpu creation, when the TSC-clock values are written
> in a short window of time (or the clock value is zero), the code uses
> the same TSC.
> 
> This logic is problematic (since "short window of time" is a heuristic 
> which can fail), and is being replaced by writing the same offset
> for each vCPU:
> 
> commit 828ca89628bfcb1b8f27535025f69dd00eb55207
> Author: Oliver Upton <oupton@google.com>
> Date:   Thu Sep 16 18:15:38 2021 +0000
> 
>     KVM: x86: Expose TSC offset controls to userspace
>     
>     To date, VMM-directed TSC synchronization and migration has been a bit
>     messy. KVM has some baked-in heuristics around TSC writes to infer if
>     the VMM is attempting to synchronize. This is problematic, as it depends
>     on host userspace writing to the guest's TSC within 1 second of the last
>     write.
>     
>     A much cleaner approach to configuring the guest's views of the TSC is to
>     simply migrate the TSC offset for every vCPU. Offsets are idempotent,
>     and thus not subject to change depending on when the VMM actually
>     reads/writes values from/to KVM. The VMM can then read the TSC once with
>     KVM_GET_CLOCK to capture a (realtime, host_tsc) pair at the instant when
>     the guest is paused.
> 
> So with that in place, the answer to 
> 
> How does userspace knows which vcpu to pick so
> that it gets the right offset?
> 
> is any vcpu, since the offsets are the same.

As I just said above, this assertion doesn't hold true once you have
nested virt, because the offset is per-cpu, and is adjusted to mean
different things on different hypervisors (some hypervisors expose
stolen time through it, for example).

What this patch is doing is to expose a Linux-specific behaviour, and
try to derive properties from it. It really doesn't work in general.

> 
> > I also wonder why we need this when userspace already has direct
> > access to that information without any extra kernel support (read the
> > CNTVCT view of the vcpu using the ONEREG API, subtract it from the
> > host view of the counter, job done).
> 
> If guest has access to the clock offset (between guest and host), then
> in the guest:
> 
> 	clockval = hostclockval - clockoffset
> 
> Adding "clockoffset" to that will retrieve the host clock.
> 
> Is that what you mean?

No. The *VMM* (qemu, kvmtool, crosvm, insertyourfavouriteonehere) has
already access to it. Why do we need an extra interface?

	M.

-- 
Without deviation from the norm, progress is not possible.

  reply	other threads:[~2021-11-19 13:31 UTC|newest]

Thread overview: 45+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-11-19 10:21 [RFC PATCH 0/2] KVM: arm64: Host/Guest trace syncronization Nicolas Saenz Julienne
2021-11-19 10:21 ` Nicolas Saenz Julienne
2021-11-19 10:21 ` Nicolas Saenz Julienne
2021-11-19 10:21 ` [RFC PATCH 1/2] arm64/tracing: add cntvct based trace clock Nicolas Saenz Julienne
2021-11-19 10:21   ` Nicolas Saenz Julienne
2021-11-19 10:21   ` Nicolas Saenz Julienne
2021-11-19 11:26   ` Marcelo Tosatti
2021-11-19 11:26     ` Marcelo Tosatti
2021-11-19 11:26     ` Marcelo Tosatti
2021-11-19 12:00     ` Marc Zyngier
2021-11-19 12:00       ` Marc Zyngier
2021-11-19 12:00       ` Marc Zyngier
2021-11-19 13:26       ` Nicolas Saenz Julienne
2021-11-19 13:26         ` Nicolas Saenz Julienne
2021-11-19 13:26         ` Nicolas Saenz Julienne
2021-11-22 14:57   ` Steven Rostedt
2021-11-22 14:57     ` Steven Rostedt
2021-11-22 14:57     ` Steven Rostedt
2021-11-24  9:45     ` Nicolas Saenz Julienne
2021-11-24  9:45       ` Nicolas Saenz Julienne
2021-11-24  9:45       ` Nicolas Saenz Julienne
2021-11-19 10:21 ` [RFC PATCH 2/2] KVM: arm64: export cntvoff in debugfs Nicolas Saenz Julienne
2021-11-19 10:21   ` Nicolas Saenz Julienne
2021-11-19 10:21   ` Nicolas Saenz Julienne
2021-11-19 11:11   ` Marcelo Tosatti
2021-11-19 11:11     ` Marcelo Tosatti
2021-11-19 11:11     ` Marcelo Tosatti
2021-11-19 12:17   ` Marc Zyngier
2021-11-19 12:17     ` Marc Zyngier
2021-11-19 12:17     ` Marc Zyngier
2021-11-19 12:59     ` Marcelo Tosatti
2021-11-19 12:59       ` Marcelo Tosatti
2021-11-19 12:59       ` Marcelo Tosatti
2021-11-19 13:31       ` Marc Zyngier [this message]
2021-11-19 13:31         ` Marc Zyngier
2021-11-19 13:31         ` Marc Zyngier
2021-11-22 20:40     ` Nicolas Saenz Julienne
2021-11-22 20:40       ` Nicolas Saenz Julienne
2021-11-22 20:40       ` Nicolas Saenz Julienne
2021-11-23 11:09       ` Marc Zyngier
2021-11-23 11:09         ` Marc Zyngier
2021-11-23 11:09         ` Marc Zyngier
2021-11-29 12:47       ` Marcelo Tosatti
2021-11-29 12:47         ` Marcelo Tosatti
2021-11-29 12:47         ` Marcelo Tosatti

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87czmw6zmo.wl-maz@kernel.org \
    --to=maz@kernel.org \
    --cc=catalin.marinas@arm.com \
    --cc=kvmarm@lists.cs.columbia.edu \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=mtosatti@redhat.com \
    --cc=nilal@redhat.com \
    --cc=nsaenzju@redhat.com \
    --cc=rostedt@goodmis.org \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.