From: Marcelo Tosatti <mtosatti@redhat.com>
To: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com>
Cc: linux-kernel@vger.kernel.org, "H. Peter Anvin" <hpa@zytor.com>,
kvm@vger.kernel.org, Joerg Roedel <joerg.roedel@amd.com>,
David Sharp <dhsharp@google.com>,
Steven Rostedt <rostedt@goodmis.org>,
Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>,
Ingo Molnar <mingo@redhat.com>, Avi Kivity <avi@redhat.com>,
yrl.pp-manager.tt@hitachi.com,
Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>,
Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [RFC PATCH 0/2] kvm/vmx: Output TSC offset
Date: Fri, 16 Nov 2012 01:19:46 -0200 [thread overview]
Message-ID: <20121116031946.GA23939@amt.cnet> (raw)
In-Reply-To: <20121114013611.5338.15086.stgit@yunodevel>
On Wed, Nov 14, 2012 at 10:36:21AM +0900, Yoshihiro YUNOMAE wrote:
> Hi All,
>
> The following patch set can make disordered trace data of a guest and a host
> sorted in chronological order.
>
> In a virtualization environment, it is difficult to analyze performance
> problems, such as a delay of I/O request on a guest. This is because multiple
> guests operate on the host. One of approaches for solving such kind of problems
> is to sort trace data of guests and the host in chronological order.
>
> After we applied the patch set(https://lkml.org/lkml/2012/11/13/588), raw TSC
> can be chosen as a timestamp of ftrace. TSC is useful for merging trace data
> in chronological order by two reasons. One of the reasons is that guests can
> directly read raw TSC from the CPU using rdtsc operation. This means that raw
> TSC value is not software clock like sched_clock, so we don't need to consider
> about how the timestamp is calculated. The other is that TSC of recent x86 CPUs
> is constantly incremented. This means that we don't need to worry about pace of
> the timestamp. Therefore, choosing TSC as a timestamp for tracing is reasonable
> to integrate trace data of guests and a host.
>
> Here, we need to consider about just one matter for using TSC on guests. TSC
> value on a guest is always the host TSC plus the guest's "TSC offset". In other
> words, to merge trace data using TSC as timestamp in chronological order, we
> need to consider TSC offset of the guest.
>
> However, only the host kernel can read the TSC offset from VMCS and TSC offset
> is not output in anywhere now. In other words, tools in userland cannot get
> the TSC offset value, so we cannot merge trace data of guest and the host in
> chronological order. Therefore, the TSC offset should be exported for userland
> tools.
>
> In this patch set, TSC offset is exported by printk() on the host. I also
> attached a tool for merging trace data of a guest and a host in chronological
> order.
>
> <Example>
> We assume that wakeup-latency for a command is big on a guest. Normally
> we will use ftrace's wakeup-latency tracer or event tracer on the guest, but we
> may not be able to solve this problem. This is because guests often exit to
> the host for several reasons. In the next, we will use TSC as ftrace's timestamp
> and record the trace data on the guest and the host. Then, we get following
> data:
>
> /* guest data */
> comm-3826 [000] d...49836825726903: sched_wakeup: [detail]
> comm-3826 [000] d...49836832225344: sched_switch: [detail]
> /* host data */
> qemu-kvm-2687 [003] d...50550079203669: kvm_exit: [detail]
> qemu-kvm-2687 [003] d...50550079206816: kvm_entry: [detail]
> qemu-kvm-2687 [003] d...50550079240656: kvm_exit: [detail]
> qemu-kvm-2687 [003] d...50550079243467: kvm_entry: [detail]
> qemu-kvm-2687 [003] d...50550079256103: kvm_exit: [detail]
> qemu-kvm-2687 [003] d...50550079268391: kvm_entry: [detail]
> qemu-kvm-2687 [003] d...50550079280829: kvm_exit: [detail]
> qemu-kvm-2687 [003] d...50550079286028: kvm_entry: [detail]
>
> Since TSC offset is not considered, these data cannot be merged. If this trace
> data is shown like as follows, we will be able to understand the reason:
>
> qemu-kvm-2687 [003] d...50550079203669: kvm_exit: [detail]
> qemu-kvm-2687 [003] d...50550079206816: kvm_entry: [detail]
> comm-3826 [000] d.h.49836825726903: sched_wakeup: [detail] <=
> qemu-kvm-2687 [003] d...50550079240656: kvm_exit: [detail]
> qemu-kvm-2687 [003] d...50550079243467: kvm_entry: [detail]
> qemu-kvm-2687 [003] d...50550079256103: kvm_exit: [detail]
> qemu-kvm-2687 [003] d...50550079268391: kvm_entry: [detail]
> comm-3826 [000] d...49836832225344: sched_switch: [detail] <=
> qemu-kvm-2687 [003] d...50550079280829: kvm_exit: [detail]
> qemu-kvm-2687 [003] d...50550079286028: kvm_entry: [detail]
>
> In this case, we can understand wakeup-latency was big due to exit to host
> twice. Getting this data sorted in chronological order is our goal.
>
> To merge the data like previous pattern, we apply this patch set. Then, we can
> get TSC offset of the guest as follows:
>
> $ dmesg | grep kvm
> [ 57.717180] kvm: (2687) write TSC offset 18446743360465545001, now clock ##
> ^^^^ ^^^^^^^^^^^^^^^^^^^^ |
> PID TSC offset |
> HOST TSC value --+
>
> We use this TSC offset value to a merge script and obtain the following data:
>
> $ ./trace-merge.pl 18446743360465545001 host.data guest.data
> h qemu-kvm-2687 [003] d...50550079203669: kvm_exit: [detail]
> h qemu-kvm-2687 [003] d...50550079206816: kvm_entry: [detail]
> g comm-3826 [000] d.h.50550079226331: sched_wakeup: [detail] <=
> h qemu-kvm-2687 [003] d...50550079240656: kvm_exit: [detail]
> h qemu-kvm-2687 [003] d...50550079243467: kvm_entry: [detail]
> h qemu-kvm-2687 [003] d...50550079256103: kvm_exit: [detail]
> h qemu-kvm-2687 [003] d...50550079268391: kvm_entry: [detail]
> g comm-3826 [000] d...50550079279266: sched_switch: [detail] <=
> h qemu-kvm-2687 [003] d...50550079280829: kvm_exit: [detail]
> h qemu-kvm-2687 [003] d...50550079286028: kvm_entry: [detail]
> |
> \----guest/host
>
> In this summary, I suggest the patch which TSC offset for each guest can be
> output on the host.
The guest TSC can change (for example if TSC scaling is used). Moreover
TSC offset can change, and you'd have to monitor that. What
about a module option so that tsc_offset is written as zero (to be
used as debugging tool). Then the following restrictions apply:
- TSC must be synchronized across CPUs/VCPUS.
- TSC must be reliable.
Would that suffice? (a module option to kvm.ko, say zero_tsc_offset).
> I chose printk() to output TSC offset value, but I think this is not the best
> method. For example, defining as a tracepoint is one of the methods. In the
> case, multiple buffers are needed to keep these data.
>
> I need your comments, thanks!
> ---
>
> Yoshihiro YUNOMAE (2):
> kvm/vmx: Print TSC_OFFSET information when TSC offset value is written to VMCS
> tools: Add a tool for merging trace data of a guest and a host
>
>
> arch/x86/kvm/vmx.c | 5 +
> tools/scripts/trace-merge/trace-merge.pl | 109 ++++++++++++++++++++++++++++++
> 2 files changed, 114 insertions(+)
> create mode 100755 tools/scripts/trace-merge/trace-merge.pl
>
> --
> Yoshihiro YUNOMAE
> Software Platform Research Dept. Linux Technology Center
> Hitachi, Ltd., Yokohama Research Laboratory
> E-mail: yoshihiro.yunomae.ez@hitachi.com
next prev parent reply other threads:[~2012-11-16 3:29 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-11-14 1:36 [RFC PATCH 0/2] kvm/vmx: Output TSC offset Yoshihiro YUNOMAE
2012-11-14 1:36 ` [RFC PATCH 1/2] kvm/vmx: Print TSC_OFFSET information when TSC offset value is written to VMCS Yoshihiro YUNOMAE
2012-11-14 1:37 ` [RFC PATCH 2/2] tools: Add a tool for merging trace data of a guest and a host Yoshihiro YUNOMAE
2012-11-14 2:00 ` [RFC PATCH 0/2] kvm/vmx: Output TSC offset Steven Rostedt
2012-11-14 2:02 ` H. Peter Anvin
2012-11-14 2:03 ` David Sharp
2012-11-14 2:31 ` Steven Rostedt
2012-11-14 8:26 ` Yoshihiro YUNOMAE
2012-11-16 15:05 ` Steven Rostedt
2012-11-16 18:56 ` Marcelo Tosatti
2012-11-20 10:38 ` Yoshihiro YUNOMAE
2012-11-16 19:15 ` Marcelo Tosatti
2012-11-20 10:36 ` Yoshihiro YUNOMAE
2012-11-20 22:51 ` Marcelo Tosatti
2012-11-22 5:21 ` Yoshihiro YUNOMAE
2012-11-23 22:46 ` Marcelo Tosatti
2012-11-26 11:05 ` Yoshihiro YUNOMAE
2012-11-26 23:16 ` Marcelo Tosatti
2012-11-27 10:53 ` Yoshihiro YUNOMAE
2012-11-29 22:51 ` Marcelo Tosatti
2012-11-30 1:36 ` Yoshihiro YUNOMAE
2012-11-30 20:42 ` Marcelo Tosatti
2012-12-03 0:55 ` Yoshihiro YUNOMAE
2012-11-16 3:19 ` Marcelo Tosatti [this message]
2012-11-16 8:09 ` Yoshihiro YUNOMAE
2012-11-16 10:05 ` Marcelo Tosatti
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20121116031946.GA23939@amt.cnet \
--to=mtosatti@redhat.com \
--cc=avi@redhat.com \
--cc=dhsharp@google.com \
--cc=hidehiro.kawai.ez@hitachi.com \
--cc=hpa@zytor.com \
--cc=joerg.roedel@amd.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=masami.hiramatsu.pt@hitachi.com \
--cc=mingo@redhat.com \
--cc=rostedt@goodmis.org \
--cc=tglx@linutronix.de \
--cc=yoshihiro.yunomae.ez@hitachi.com \
--cc=yrl.pp-manager.tt@hitachi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox