All of lore.kernel.org
 help / color / mirror / Atom feed
From: Marc Zyngier <maz@kernel.org>
To: Nicolas Saenz Julienne <nsaenzju@redhat.com>
Cc: will@kernel.org, catalin.marinas@arm.com, mtosatti@redhat.com,
	linux-kernel@vger.kernel.org, rostedt@goodmis.org,
	mingo@redhat.com, nilal@redhat.com, kvmarm@lists.cs.columbia.edu,
	linux-arm-kernel@lists.infradead.org
Subject: Re: [RFC PATCH 2/2] KVM: arm64: export cntvoff in debugfs
Date: Tue, 23 Nov 2021 11:09:18 +0000	[thread overview]
Message-ID: <87lf1fcen5.wl-maz@kernel.org> (raw)
In-Reply-To: <0e948a211bd8d63ba05594fb8c03bf3a77a227a0.camel@redhat.com>

On Mon, 22 Nov 2021 20:40:52 +0000,
Nicolas Saenz Julienne <nsaenzju@redhat.com> wrote:
> 
> Hi Marc, thanks for the review.
> 
> On Fri, 2021-11-19 at 12:17 +0000, Marc Zyngier wrote:
> > On Fri, 19 Nov 2021 10:21:18 +0000,
> > Nicolas Saenz Julienne <nsaenzju@redhat.com> wrote:
> > > 
> > > While using cntvct as the raw clock for tracing, it's possible to
> > > synchronize host/guest traces just by knowing the virtual offset applied
> > > to the guest's virtual counter.
> > > 
> > > This is also the case on x86 when TSC is available. The offset is
> > > exposed in debugfs as 'tsc-offset' on a per vcpu basis. So let's
> > > implement the same for arm64.
> > 
> > How does this work with NV, where the guest hypervisor is in control
> > of the virtual offset? 
> 
> TBH I handn't thought about NV. Looking at it from that angle, I now see my
> approach doesn't work on hosts that use CNTVCT (regardless of NV). Upon
> entering into a guest, we change CNTVOFF before the host is done with tracing,
> so traces like 'kvm_entry' will have weird timestamps. I was just lucky that
> the hosts I was testing with use CNTPCT.

There are multiple things at play here:

- if the system is a host, the kernel will use CNTPCT. Userspace will
  still use CNTVCT, and the offset is guaranteed to be 0 *when running
  userspace*.

- if the system isn't a host (which doesn't necessarily means a
  guest), CNTVCT is the only thing that is being used, and the offset
  is unknown (Linux requires it to be constant across vcpus though).

So I doubt you'd get a bad timestamp on the host. It is just that you
have named your trace clock incorrectly (and Steven's idea of an
indirected clock could help here).

> I believe the solution would be to be able to force a 0 offset between
> guest/host. With that in mind, is there a reason why kvm_timer_vcpu_init()
> imposes a non-zero one by default? I checked out the commits that introduced
> that code, but couldn't find a compelling reason. VMMs can always change it
> through KVM_REG_ARM_TIMER_CNT afterwards.

We want to minimise the chance for an observable rollover of the
virtual counter, so time starts at 0 *in the guest*. The VMM can
change the view of that time for the purpose of migration.

If you want a 0 offset, set the counter to the physical value in the
VMM (imprecise) or have a look at Oliver Upton's patches that were
allowing an offset to be specified directly. But migration, by
definition, breaks this.

> 
> > I also wonder why we need this when userspace already has direct access to
> > that information without any extra kernel support (read the CNTVCT view of
> > the vcpu using the ONEREG API, subtract it from the host view of the counter,
> > job done).
> 
> Well IIUC, you're at the mercy of how long it takes to return from the ONEREG
> ioctl. The results will be skewed. For some workloads, where low latency is
> key, we really need high precision traces in the order of single digit us or
> even 100s of ns. I'm not sure you'll be able to get there with that approach.

The PTP clock does exactly that from the guest PoV, with a lot more
overhead, and this results in single digit ns precision. Why isn't
that possible from userspace?

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

WARNING: multiple messages have this Message-ID (diff)
From: Marc Zyngier <maz@kernel.org>
To: Nicolas Saenz Julienne <nsaenzju@redhat.com>
Cc: linux-arm-kernel@lists.infradead.org, rostedt@goodmis.org,
	james.morse@arm.com, alexandru.elisei@arm.com,
	suzuki.poulose@arm.com, catalin.marinas@arm.com, will@kernel.org,
	linux-kernel@vger.kernel.org, kvmarm@lists.cs.columbia.edu,
	mingo@redhat.com, mtosatti@redhat.com, nilal@redhat.com
Subject: Re: [RFC PATCH 2/2] KVM: arm64: export cntvoff in debugfs
Date: Tue, 23 Nov 2021 11:09:18 +0000	[thread overview]
Message-ID: <87lf1fcen5.wl-maz@kernel.org> (raw)
In-Reply-To: <0e948a211bd8d63ba05594fb8c03bf3a77a227a0.camel@redhat.com>

On Mon, 22 Nov 2021 20:40:52 +0000,
Nicolas Saenz Julienne <nsaenzju@redhat.com> wrote:
> 
> Hi Marc, thanks for the review.
> 
> On Fri, 2021-11-19 at 12:17 +0000, Marc Zyngier wrote:
> > On Fri, 19 Nov 2021 10:21:18 +0000,
> > Nicolas Saenz Julienne <nsaenzju@redhat.com> wrote:
> > > 
> > > While using cntvct as the raw clock for tracing, it's possible to
> > > synchronize host/guest traces just by knowing the virtual offset applied
> > > to the guest's virtual counter.
> > > 
> > > This is also the case on x86 when TSC is available. The offset is
> > > exposed in debugfs as 'tsc-offset' on a per vcpu basis. So let's
> > > implement the same for arm64.
> > 
> > How does this work with NV, where the guest hypervisor is in control
> > of the virtual offset? 
> 
> TBH I handn't thought about NV. Looking at it from that angle, I now see my
> approach doesn't work on hosts that use CNTVCT (regardless of NV). Upon
> entering into a guest, we change CNTVOFF before the host is done with tracing,
> so traces like 'kvm_entry' will have weird timestamps. I was just lucky that
> the hosts I was testing with use CNTPCT.

There are multiple things at play here:

- if the system is a host, the kernel will use CNTPCT. Userspace will
  still use CNTVCT, and the offset is guaranteed to be 0 *when running
  userspace*.

- if the system isn't a host (which doesn't necessarily means a
  guest), CNTVCT is the only thing that is being used, and the offset
  is unknown (Linux requires it to be constant across vcpus though).

So I doubt you'd get a bad timestamp on the host. It is just that you
have named your trace clock incorrectly (and Steven's idea of an
indirected clock could help here).

> I believe the solution would be to be able to force a 0 offset between
> guest/host. With that in mind, is there a reason why kvm_timer_vcpu_init()
> imposes a non-zero one by default? I checked out the commits that introduced
> that code, but couldn't find a compelling reason. VMMs can always change it
> through KVM_REG_ARM_TIMER_CNT afterwards.

We want to minimise the chance for an observable rollover of the
virtual counter, so time starts at 0 *in the guest*. The VMM can
change the view of that time for the purpose of migration.

If you want a 0 offset, set the counter to the physical value in the
VMM (imprecise) or have a look at Oliver Upton's patches that were
allowing an offset to be specified directly. But migration, by
definition, breaks this.

> 
> > I also wonder why we need this when userspace already has direct access to
> > that information without any extra kernel support (read the CNTVCT view of
> > the vcpu using the ONEREG API, subtract it from the host view of the counter,
> > job done).
> 
> Well IIUC, you're at the mercy of how long it takes to return from the ONEREG
> ioctl. The results will be skewed. For some workloads, where low latency is
> key, we really need high precision traces in the order of single digit us or
> even 100s of ns. I'm not sure you'll be able to get there with that approach.

The PTP clock does exactly that from the guest PoV, with a lot more
overhead, and this results in single digit ns precision. Why isn't
that possible from userspace?

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

WARNING: multiple messages have this Message-ID (diff)
From: Marc Zyngier <maz@kernel.org>
To: Nicolas Saenz Julienne <nsaenzju@redhat.com>
Cc: linux-arm-kernel@lists.infradead.org, rostedt@goodmis.org,
	james.morse@arm.com, alexandru.elisei@arm.com,
	suzuki.poulose@arm.com, catalin.marinas@arm.com, will@kernel.org,
	linux-kernel@vger.kernel.org, kvmarm@lists.cs.columbia.edu,
	mingo@redhat.com, mtosatti@redhat.com, nilal@redhat.com
Subject: Re: [RFC PATCH 2/2] KVM: arm64: export cntvoff in debugfs
Date: Tue, 23 Nov 2021 11:09:18 +0000	[thread overview]
Message-ID: <87lf1fcen5.wl-maz@kernel.org> (raw)
In-Reply-To: <0e948a211bd8d63ba05594fb8c03bf3a77a227a0.camel@redhat.com>

On Mon, 22 Nov 2021 20:40:52 +0000,
Nicolas Saenz Julienne <nsaenzju@redhat.com> wrote:
> 
> Hi Marc, thanks for the review.
> 
> On Fri, 2021-11-19 at 12:17 +0000, Marc Zyngier wrote:
> > On Fri, 19 Nov 2021 10:21:18 +0000,
> > Nicolas Saenz Julienne <nsaenzju@redhat.com> wrote:
> > > 
> > > While using cntvct as the raw clock for tracing, it's possible to
> > > synchronize host/guest traces just by knowing the virtual offset applied
> > > to the guest's virtual counter.
> > > 
> > > This is also the case on x86 when TSC is available. The offset is
> > > exposed in debugfs as 'tsc-offset' on a per vcpu basis. So let's
> > > implement the same for arm64.
> > 
> > How does this work with NV, where the guest hypervisor is in control
> > of the virtual offset? 
> 
> TBH I handn't thought about NV. Looking at it from that angle, I now see my
> approach doesn't work on hosts that use CNTVCT (regardless of NV). Upon
> entering into a guest, we change CNTVOFF before the host is done with tracing,
> so traces like 'kvm_entry' will have weird timestamps. I was just lucky that
> the hosts I was testing with use CNTPCT.

There are multiple things at play here:

- if the system is a host, the kernel will use CNTPCT. Userspace will
  still use CNTVCT, and the offset is guaranteed to be 0 *when running
  userspace*.

- if the system isn't a host (which doesn't necessarily means a
  guest), CNTVCT is the only thing that is being used, and the offset
  is unknown (Linux requires it to be constant across vcpus though).

So I doubt you'd get a bad timestamp on the host. It is just that you
have named your trace clock incorrectly (and Steven's idea of an
indirected clock could help here).

> I believe the solution would be to be able to force a 0 offset between
> guest/host. With that in mind, is there a reason why kvm_timer_vcpu_init()
> imposes a non-zero one by default? I checked out the commits that introduced
> that code, but couldn't find a compelling reason. VMMs can always change it
> through KVM_REG_ARM_TIMER_CNT afterwards.

We want to minimise the chance for an observable rollover of the
virtual counter, so time starts at 0 *in the guest*. The VMM can
change the view of that time for the purpose of migration.

If you want a 0 offset, set the counter to the physical value in the
VMM (imprecise) or have a look at Oliver Upton's patches that were
allowing an offset to be specified directly. But migration, by
definition, breaks this.

> 
> > I also wonder why we need this when userspace already has direct access to
> > that information without any extra kernel support (read the CNTVCT view of
> > the vcpu using the ONEREG API, subtract it from the host view of the counter,
> > job done).
> 
> Well IIUC, you're at the mercy of how long it takes to return from the ONEREG
> ioctl. The results will be skewed. For some workloads, where low latency is
> key, we really need high precision traces in the order of single digit us or
> even 100s of ns. I'm not sure you'll be able to get there with that approach.

The PTP clock does exactly that from the guest PoV, with a lot more
overhead, and this results in single digit ns precision. Why isn't
that possible from userspace?

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

  reply	other threads:[~2021-11-23 11:09 UTC|newest]

Thread overview: 45+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-11-19 10:21 [RFC PATCH 0/2] KVM: arm64: Host/Guest trace syncronization Nicolas Saenz Julienne
2021-11-19 10:21 ` Nicolas Saenz Julienne
2021-11-19 10:21 ` Nicolas Saenz Julienne
2021-11-19 10:21 ` [RFC PATCH 1/2] arm64/tracing: add cntvct based trace clock Nicolas Saenz Julienne
2021-11-19 10:21   ` Nicolas Saenz Julienne
2021-11-19 10:21   ` Nicolas Saenz Julienne
2021-11-19 11:26   ` Marcelo Tosatti
2021-11-19 11:26     ` Marcelo Tosatti
2021-11-19 11:26     ` Marcelo Tosatti
2021-11-19 12:00     ` Marc Zyngier
2021-11-19 12:00       ` Marc Zyngier
2021-11-19 12:00       ` Marc Zyngier
2021-11-19 13:26       ` Nicolas Saenz Julienne
2021-11-19 13:26         ` Nicolas Saenz Julienne
2021-11-19 13:26         ` Nicolas Saenz Julienne
2021-11-22 14:57   ` Steven Rostedt
2021-11-22 14:57     ` Steven Rostedt
2021-11-22 14:57     ` Steven Rostedt
2021-11-24  9:45     ` Nicolas Saenz Julienne
2021-11-24  9:45       ` Nicolas Saenz Julienne
2021-11-24  9:45       ` Nicolas Saenz Julienne
2021-11-19 10:21 ` [RFC PATCH 2/2] KVM: arm64: export cntvoff in debugfs Nicolas Saenz Julienne
2021-11-19 10:21   ` Nicolas Saenz Julienne
2021-11-19 10:21   ` Nicolas Saenz Julienne
2021-11-19 11:11   ` Marcelo Tosatti
2021-11-19 11:11     ` Marcelo Tosatti
2021-11-19 11:11     ` Marcelo Tosatti
2021-11-19 12:17   ` Marc Zyngier
2021-11-19 12:17     ` Marc Zyngier
2021-11-19 12:17     ` Marc Zyngier
2021-11-19 12:59     ` Marcelo Tosatti
2021-11-19 12:59       ` Marcelo Tosatti
2021-11-19 12:59       ` Marcelo Tosatti
2021-11-19 13:31       ` Marc Zyngier
2021-11-19 13:31         ` Marc Zyngier
2021-11-19 13:31         ` Marc Zyngier
2021-11-22 20:40     ` Nicolas Saenz Julienne
2021-11-22 20:40       ` Nicolas Saenz Julienne
2021-11-22 20:40       ` Nicolas Saenz Julienne
2021-11-23 11:09       ` Marc Zyngier [this message]
2021-11-23 11:09         ` Marc Zyngier
2021-11-23 11:09         ` Marc Zyngier
2021-11-29 12:47       ` Marcelo Tosatti
2021-11-29 12:47         ` Marcelo Tosatti
2021-11-29 12:47         ` Marcelo Tosatti

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87lf1fcen5.wl-maz@kernel.org \
    --to=maz@kernel.org \
    --cc=catalin.marinas@arm.com \
    --cc=kvmarm@lists.cs.columbia.edu \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=mtosatti@redhat.com \
    --cc=nilal@redhat.com \
    --cc=nsaenzju@redhat.com \
    --cc=rostedt@goodmis.org \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.