Date: Fri, 2 Sep 2016 21:29:13 -0400
From: Luiz Capitulino
To: Marcelo Tosatti
Cc: Stefan Hajnoczi, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
    pbonzini@redhat.com, rkrcmar@redhat.com, rostedt@goodmis.org,
    mhiramat@kernel.org
Subject: Re: [PATCH 4/4] kvm: x86: export TSC offset to user-space
Message-ID: <20160902212913.3d38a337@redhat.com>
In-Reply-To: <20160902234936.GA12659@amt.cnet>
References: <1472663145-1835-1-git-send-email-lcapitulino@redhat.com>
    <1472663145-1835-5-git-send-email-lcapitulino@redhat.com>
    <20160902134301.GC21771@stefanha-x1.localdomain>
    <20160902234936.GA12659@amt.cnet>
Organization: Red Hat

On Fri, 2 Sep 2016 20:49:37 -0300
Marcelo Tosatti wrote:

> On Fri, Sep 02, 2016 at 09:43:01AM -0400, Stefan Hajnoczi wrote:
> > On Wed, Aug 31, 2016 at 01:05:45PM -0400, Luiz Capitulino wrote:
> > > We need to retrieve a VM's TSC offset in order to use
> > > the host's TSC to merge host and guest traces. This is
> > > explained in detail in this thread:
> > >
> > >   [Qemu-devel] [RFC] host and guest kernel trace merging
> > >   https://lists.nongnu.org/archive/html/qemu-devel/2016-03/msg00887.html
> > >
> > > Today, the only way to retrieve a VM's TSC offset is
> > > by using the kvm_write_tsc_offset tracepoint. This has
> > > a few problems. First, the tracepoint is only emitted
> > > when the VM boots, which requires a reboot to get it if
> > > the VM is already running. Second, tracepoints are not
> > > supposed to be ABIs in case they need to be consumed by
> > > user-space tools.
> > >
> > > This commit exports a VM's TSC offset to user-space via
> > > debugfs. A new file called "tsc-offset" is created in
> > > the VM's debugfs directory. For example:
> > >
> > >   /sys/kernel/debug/kvm/51696-10/tsc-offset
> > >
> > > This file contains one TSC offset per line, for each
> > > vCPU. For example:
> > >
> > >   vcpu0: 18446742405270834952
> > >   vcpu1: 18446742405270834952
> > >   vcpu2: 18446742405270834952
> > >   vcpu3: 18446742405270834952
> > >
> > > There are some important observations about this
> > > solution:
> > >
> > >  - While all vCPUs TSC offsets should be equal for the
> > >    cases we care about (ie. stable TSC and no write to
> > >    the TSC MSR), I chose to follow the spec and export
> > >    each vCPU's TSC offset (might also be helpful for
> > >    debugging)
> > >
> > >  - The TSC offset is only useful after the VM has booted
> > >
> > >  - We'll probably need to export the TSC multiplier too.
> > >    However, I've been using only the TSC offset for now.
> > >    So, let's get this merged first and do the TSC multiplier
> > >    as a second step
> >
> > Can TSC offset changes occur at runtime?
> >
> > One example is vcpu hotplug where the tracing tool would need to fetch
> > the new vcpu's TSC offset after tracing has already started.
> >
> > Another example is if QEMU or the guest change the TSC offset while
> > running. If the tracing tool doesn't notice this then trace events will have
> > incorrect timestamps.
> >
> > Stefan
>
> Yes they can, and the interface should mention that "the user is
> responsible for handling races of execution" (IMO).
>
> So the workflow is:
>
> 1) User boots VM and knows the state of the VM.
> 2) User runs trace-cmd on the host.
>
> Is there a need to automate gathering of traces? (that is to know the
> state of reboots and so forth). I don't see one. In that case, the above
> workflow is functional.
>
> Can you add such comments to the interface Luiz (that the value
> read is potentially stale).

Sure, no problem.
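
For illustration only, here is a minimal sketch of how a user-space tool
might consume the "tsc-offset" file format quoted above. It is not part
of the patch. The function names are hypothetical, the sample text is
copied from the example in the commit message rather than read from
debugfs, and the conversion assumes the usual no-scaling relation
guest_tsc = host_tsc + tsc_offset (mod 2^64), i.e. the TSC multiplier is
ignored. As discussed in the thread, the value read may already be stale
by the time it is used.

```python
import re

U64 = 1 << 64
# One "vcpuN: <unsigned 64-bit offset>" entry per line, as in the example.
_LINE = re.compile(r"^vcpu(\d+):\s*(\d+)$")


def parse_tsc_offsets(text):
    """Return {vcpu_index: offset} from the file contents (hypothetical helper)."""
    offsets = {}
    for line in text.splitlines():
        match = _LINE.match(line.strip())
        if match:
            offsets[int(match.group(1))] = int(match.group(2))
    return offsets


def as_signed(offset):
    """Interpret the unsigned 64-bit value as a two's-complement offset."""
    return offset - U64 if offset >= (1 << 63) else offset


def guest_to_host_tsc(guest_tsc, offset):
    """Assumes guest_tsc = host_tsc + offset (mod 2^64), with no TSC scaling."""
    return (guest_tsc - offset) % U64


# Sample contents taken from the commit message, not from a live system.
sample = """\
vcpu0: 18446742405270834952
vcpu1: 18446742405270834952
vcpu2: 18446742405270834952
vcpu3: 18446742405270834952
"""

offsets = parse_tsc_offsets(sample)
signed_offset = as_signed(offsets[0])
```

A tool merging traces would apply guest_to_host_tsc() to each guest event
timestamp, using the offset of the vCPU that recorded the event, and
would need to re-read the file after events such as vCPU hotplug.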