From: Marcelo Tosatti <mtosatti@redhat.com>
To: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com>
Cc: linux-kernel@vger.kernel.org, "H. Peter Anvin" <hpa@zytor.com>,
kvm@vger.kernel.org, Joerg Roedel <joerg.roedel@amd.com>,
David Sharp <dhsharp@google.com>,
Steven Rostedt <rostedt@goodmis.org>,
Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>,
Ingo Molnar <mingo@redhat.com>, Avi Kivity <avi@redhat.com>,
yrl.pp-manager.tt@hitachi.com,
Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>,
Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [RFC PATCH 0/2] kvm/vmx: Output TSC offset
Date: Fri, 16 Nov 2012 01:19:46 -0200 [thread overview]
Message-ID: <20121116031946.GA23939@amt.cnet> (raw)
In-Reply-To: <20121114013611.5338.15086.stgit@yunodevel>
On Wed, Nov 14, 2012 at 10:36:21AM +0900, Yoshihiro YUNOMAE wrote:
> Hi All,
>
> The following patch set can make disordered trace data of a guest and a host
> sorted in chronological order.
>
> In a virtualization environment, it is difficult to analyze performance
> problems, such as a delay of I/O request on a guest. This is because multiple
> guests operate on the host. One of approaches for solving such kind of problems
> is to sort trace data of guests and the host in chronological order.
>
> After we applied the patch set(https://lkml.org/lkml/2012/11/13/588), raw TSC
> can be chosen as a timestamp of ftrace. TSC is useful for merging trace data
> in chronological order by two reasons. One of the reasons is that guests can
> directly read raw TSC from the CPU using rdtsc operation. This means that raw
> TSC value is not software clock like sched_clock, so we don't need to consider
> about how the timestamp is calculated. The other is that TSC of recent x86 CPUs
> is constantly incremented. This means that we don't need to worry about pace of
> the timestamp. Therefore, choosing TSC as a timestamp for tracing is reasonable
> to integrate trace data of guests and a host.
>
> Here, we need to consider about just one matter for using TSC on guests. TSC
> value on a guest is always the host TSC plus the guest's "TSC offset". In other
> words, to merge trace data using TSC as timestamp in chronological order, we
> need to consider TSC offset of the guest.
>
> However, only the host kernel can read the TSC offset from VMCS and TSC offset
> is not output in anywhere now. In other words, tools in userland cannot get
> the TSC offset value, so we cannot merge trace data of guest and the host in
> chronological order. Therefore, the TSC offset should be exported for userland
> tools.
>
> In this patch set, TSC offset is exported by printk() on the host. I also
> attached a tool for merging trace data of a guest and a host in chronological
> order.
>
> <Example>
> We assume that wakeup-latency for a command is big on a guest. Normally
> we will use ftrace's wakeup-latency tracer or event tracer on the guest, but we
> may not be able to solve this problem. This is because guests often exit to
> the host for several reasons. In the next, we will use TSC as ftrace's timestamp
> and record the trace data on the guest and the host. Then, we get following
> data:
>
> /* guest data */
> comm-3826 [000] d...49836825726903: sched_wakeup: [detail]
> comm-3826 [000] d...49836832225344: sched_switch: [detail]
> /* host data */
> qemu-kvm-2687 [003] d...50550079203669: kvm_exit: [detail]
> qemu-kvm-2687 [003] d...50550079206816: kvm_entry: [detail]
> qemu-kvm-2687 [003] d...50550079240656: kvm_exit: [detail]
> qemu-kvm-2687 [003] d...50550079243467: kvm_entry: [detail]
> qemu-kvm-2687 [003] d...50550079256103: kvm_exit: [detail]
> qemu-kvm-2687 [003] d...50550079268391: kvm_entry: [detail]
> qemu-kvm-2687 [003] d...50550079280829: kvm_exit: [detail]
> qemu-kvm-2687 [003] d...50550079286028: kvm_entry: [detail]
>
> Since TSC offset is not considered, these data cannot be merged. If this trace
> data is shown like as follows, we will be able to understand the reason:
>
> qemu-kvm-2687 [003] d...50550079203669: kvm_exit: [detail]
> qemu-kvm-2687 [003] d...50550079206816: kvm_entry: [detail]
> comm-3826 [000] d.h.49836825726903: sched_wakeup: [detail] <=
> qemu-kvm-2687 [003] d...50550079240656: kvm_exit: [detail]
> qemu-kvm-2687 [003] d...50550079243467: kvm_entry: [detail]
> qemu-kvm-2687 [003] d...50550079256103: kvm_exit: [detail]
> qemu-kvm-2687 [003] d...50550079268391: kvm_entry: [detail]
> comm-3826 [000] d...49836832225344: sched_switch: [detail] <=
> qemu-kvm-2687 [003] d...50550079280829: kvm_exit: [detail]
> qemu-kvm-2687 [003] d...50550079286028: kvm_entry: [detail]
>
> In this case, we can understand wakeup-latency was big due to exit to host
> twice. Getting this data sorted in chronological order is our goal.
>
> To merge the data like previous pattern, we apply this patch set. Then, we can
> get TSC offset of the guest as follows:
>
> $ dmesg | grep kvm
> [ 57.717180] kvm: (2687) write TSC offset 18446743360465545001, now clock ##
> ^^^^ ^^^^^^^^^^^^^^^^^^^^ |
> PID TSC offset |
> HOST TSC value --+
>
> We use this TSC offset value to a merge script and obtain the following data:
>
> $ ./trace-merge.pl 18446743360465545001 host.data guest.data
> h qemu-kvm-2687 [003] d...50550079203669: kvm_exit: [detail]
> h qemu-kvm-2687 [003] d...50550079206816: kvm_entry: [detail]
> g comm-3826 [000] d.h.50550079226331: sched_wakeup: [detail] <=
> h qemu-kvm-2687 [003] d...50550079240656: kvm_exit: [detail]
> h qemu-kvm-2687 [003] d...50550079243467: kvm_entry: [detail]
> h qemu-kvm-2687 [003] d...50550079256103: kvm_exit: [detail]
> h qemu-kvm-2687 [003] d...50550079268391: kvm_entry: [detail]
> g comm-3826 [000] d...50550079279266: sched_switch: [detail] <=
> h qemu-kvm-2687 [003] d...50550079280829: kvm_exit: [detail]
> h qemu-kvm-2687 [003] d...50550079286028: kvm_entry: [detail]
> |
> \----guest/host
>
> In this summary, I suggest the patch which TSC offset for each guest can be
> output on the host.
The guest TSC can change (for example if TSC scaling is used). Moreover
TSC offset can change, and you'd have to monitor that. What
about a module option so that tsc_offset is written as zero (to be
used as debugging tool). Then the following restrictions apply:
- TSC must be synchronized across CPUs/VCPUS.
- TSC must be reliable.
Would that suffice? (a module option to kvm.ko, say zero_tsc_offset).
> I chose printk() to output TSC offset value, but I think this is not the best
> method. For example, defining as a tracepoint is one of the methods. In the
> case, multiple buffers are needed to keep these data.
>
> I need your comments, thanks!
> ---
>
> Yoshihiro YUNOMAE (2):
> kvm/vmx: Print TSC_OFFSET information when TSC offset value is written to VMCS
> tools: Add a tool for merging trace data of a guest and a host
>
>
> arch/x86/kvm/vmx.c | 5 +
> tools/scripts/trace-merge/trace-merge.pl | 109 ++++++++++++++++++++++++++++++
> 2 files changed, 114 insertions(+)
> create mode 100755 tools/scripts/trace-merge/trace-merge.pl
>
> --
> Yoshihiro YUNOMAE
> Software Platform Research Dept. Linux Technology Center
> Hitachi, Ltd., Yokohama Research Laboratory
> E-mail: yoshihiro.yunomae.ez@hitachi.com
next prev parent reply other threads:[~2012-11-16 3:29 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-11-14 1:36 [RFC PATCH 0/2] kvm/vmx: Output TSC offset Yoshihiro YUNOMAE
2012-11-14 1:36 ` [RFC PATCH 1/2] kvm/vmx: Print TSC_OFFSET information when TSC offset value is written to VMCS Yoshihiro YUNOMAE
2012-11-14 1:37 ` [RFC PATCH 2/2] tools: Add a tool for merging trace data of a guest and a host Yoshihiro YUNOMAE
2012-11-14 2:00 ` [RFC PATCH 0/2] kvm/vmx: Output TSC offset Steven Rostedt
2012-11-14 2:02 ` H. Peter Anvin
2012-11-14 2:03 ` David Sharp
2012-11-14 2:31 ` Steven Rostedt
2012-11-14 8:26 ` Yoshihiro YUNOMAE
2012-11-16 15:05 ` Steven Rostedt
2012-11-16 18:56 ` Marcelo Tosatti
2012-11-20 10:38 ` Yoshihiro YUNOMAE
2012-11-16 19:15 ` Marcelo Tosatti
2012-11-20 10:36 ` Yoshihiro YUNOMAE
2012-11-20 22:51 ` Marcelo Tosatti
2012-11-22 5:21 ` Yoshihiro YUNOMAE
2012-11-23 22:46 ` Marcelo Tosatti
2012-11-26 11:05 ` Yoshihiro YUNOMAE
2012-11-26 23:16 ` Marcelo Tosatti
2012-11-27 10:53 ` Yoshihiro YUNOMAE
2012-11-29 22:51 ` Marcelo Tosatti
2012-11-30 1:36 ` Yoshihiro YUNOMAE
2012-11-30 20:42 ` Marcelo Tosatti
2012-12-03 0:55 ` Yoshihiro YUNOMAE
2012-11-16 3:19 ` Marcelo Tosatti [this message]
2012-11-16 8:09 ` Yoshihiro YUNOMAE
2012-11-16 10:05 ` Marcelo Tosatti
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20121116031946.GA23939@amt.cnet \
--to=mtosatti@redhat.com \
--cc=avi@redhat.com \
--cc=dhsharp@google.com \
--cc=hidehiro.kawai.ez@hitachi.com \
--cc=hpa@zytor.com \
--cc=joerg.roedel@amd.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=masami.hiramatsu.pt@hitachi.com \
--cc=mingo@redhat.com \
--cc=rostedt@goodmis.org \
--cc=tglx@linutronix.de \
--cc=yoshihiro.yunomae.ez@hitachi.com \
--cc=yrl.pp-manager.tt@hitachi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.