All of lore.kernel.org
 help / color / mirror / Atom feed
From: Gleb Natapov <gleb@redhat.com>
To: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>,
	linux-kernel@vger.kernel.org, David Sharp <dhsharp@google.com>,
	yrl.pp-manager.tt@hitachi.com,
	Steven Rostedt <rostedt@goodmis.org>,
	Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>,
	Ingo Molnar <mingo@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>,
	Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>,
	Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [PATCH V3 0/1] kvm: Output TSC offset
Date: Sun, 23 Jun 2013 10:58:48 +0300	[thread overview]
Message-ID: <20130623075848.GO5832@redhat.com> (raw)
In-Reply-To: <20130612074341.25553.56877.stgit@yunodevel>

On Wed, Jun 12, 2013 at 04:43:41PM +0900, Yoshihiro YUNOMAE wrote:
> Hi All,
> 
Applied, thanks.

> I'd like to propose a patch which adds a tracepoint at write_tsc_offset for
> tracing guests TSC offset changes. It is required for sorting the trace data of
> a guest and the host in chronological order.
> 
> In a virtualization environment, it is difficult to analyze performance
> problems, such as a delay of I/O request on a guest. This is because multiple
> guests operate on the host. One of approaches for solving such kind of problems
> is to sort trace data of guests and the host in chronological order.
> 
> Raw TSC can be chosen as a timestamp of ftrace. I think TSC is useful for
> merging trace data in chronological order by two reasons. One of the reasons is 
> that guests can directly read raw TSC from the CPU using rdtsc operation. This
> means that raw TSC value is not software clock like sched_clock, so we don't
> need to consider about how the timestamp is calculated. The other is that TSC
> of recent x86 CPUs is constantly incremented. This means that we don't need to
> worry about pace of the timestamp. Therefore, choosing TSC as a timestamp for
> tracing is reasonable to integrate trace data of guests and a host.
> 
> Here, we need to consider about just one matter for using TSC on guests. TSC
> value on a guest is always the host TSC plus the guest's "TSC offset". In other
> words, to merge trace data using TSC as timestamp in chronological order, we
> need to consider TSC offset of the guest.
> 
> However, only the host kernel can read the TSC offset from VMCS and TSC offset
> is not output in anywhere now. In other words, tools in userland cannot get
> the TSC offset value, so we cannot merge trace data of guest and the host in
> chronological order. Therefore, I think the TSC offset should be exported for
> userland tools.
> 
> In this patch, TSC offset is exported by the tracepoint kvm_write_tsc_offset
> on the host. TSC offset events will be very rare event because guests may
> seldom execute write_tsc. So if we enable this event for a normal buffer of
> ftrace, the events will be overwritten by other events. For a recent linux
> kernel, a multiple buffer function is available. So, I recommend to enable
> this event for a sub buffer of ftrace.
> 
> <Example>
> We assume that wakeup-latency for a command is big on a guest. Normally
> we will use ftrace's wakeup-latency tracer or event tracer on the guest, but we
> may not be able to solve this problem. This is because guests often exit to
> the host for several reasons. In the next, we will use TSC as ftrace's timestamp
> and record the trace data on the guest and the host. Then, we get following
> data:
> 
>  /* guest data */
>             comm-3826  [000] d...49836825726903: sched_wakeup: [detail]
>             comm-3826  [000] d...49836832225344: sched_switch: [detail]
>  /* host data */
>         qemu-kvm-2687  [003] d...50550079203669: kvm_exit: [detail]
>         qemu-kvm-2687  [003] d...50550079206816: kvm_entry: [detail]
>         qemu-kvm-2687  [003] d...50550079240656: kvm_exit: [detail]
>         qemu-kvm-2687  [003] d...50550079243467: kvm_entry: [detail]
>         qemu-kvm-2687  [003] d...50550079256103: kvm_exit: [detail]
>         qemu-kvm-2687  [003] d...50550079268391: kvm_entry: [detail]
>         qemu-kvm-2687  [003] d...50550079280829: kvm_exit: [detail]
>         qemu-kvm-2687  [003] d...50550079286028: kvm_entry: [detail]
> 
> Since TSC offset is not considered, these data cannot be merged. If this trace
> data is shown like as follows, we will be able to understand the reason:
> 
>         qemu-kvm-2687  [003] d...50550079203669: kvm_exit: [detail]
>         qemu-kvm-2687  [003] d...50550079206816: kvm_entry: [detail]
>             comm-3826  [000] d.h.49836825726903: sched_wakeup: [detail] <=
>         qemu-kvm-2687  [003] d...50550079240656: kvm_exit: [detail]
>         qemu-kvm-2687  [003] d...50550079243467: kvm_entry: [detail]
>         qemu-kvm-2687  [003] d...50550079256103: kvm_exit: [detail]
>         qemu-kvm-2687  [003] d...50550079268391: kvm_entry: [detail]
>             comm-3826  [000] d...49836832225344: sched_switch: [detail] <=
>         qemu-kvm-2687  [003] d...50550079280829: kvm_exit: [detail]
>         qemu-kvm-2687  [003] d...50550079286028: kvm_entry: [detail]
> 
> In this case, we can understand wakeup-latency was big due to exit to host
> twice. Getting this data sorted in chronological order is our goal.
> 
> To merge the data like previous pattern, we apply this patch. Then, we can
> get TSC offset of the guest as follows:
> 
> # cat /sys/kernel/debug/tracing/instances/tsc_offset/trace
> .. d...4300151845072: kvm_write_tsc_offset: vcpu=0 prev=0 next=18446739773557710924
>                                                         ^      ^^^^^^^^^^^^^^^^^^^^
> 
> We use this TSC offset value to a merge script and obtain the following data:
> 
> $ ./trace-merge.pl -g guest_data -h host_data -t 18446739773557710924
> 
> h        qemu-kvm-2687  [003] d...50550079203669: kvm_exit: [detail]
> h        qemu-kvm-2687  [003] d...50550079206816: kvm_entry: [detail]
> g            comm-3826  [000] d.h.50550079226331: sched_wakeup: [detail] <=
> h        qemu-kvm-2687  [003] d...50550079240656: kvm_exit: [detail]
> h        qemu-kvm-2687  [003] d...50550079243467: kvm_entry: [detail]
> h        qemu-kvm-2687  [003] d...50550079256103: kvm_exit: [detail]
> h        qemu-kvm-2687  [003] d...50550079268391: kvm_entry: [detail]
> g            comm-3826  [000] d...50550079279266: sched_switch: [detail] <=
> h        qemu-kvm-2687  [003] d...50550079280829: kvm_exit: [detail]
> h        qemu-kvm-2687  [003] d...50550079286028: kvm_entry: [detail]
> |
> \----guest/host
> 
> In this summary, I suggest the patch which TSC offset for each guest can be
> output on the host.
> 
> As one example, I will send a merge tool. This tool assumes the guest doesn't
> execute write_tsc.
> 
> Changes in V2:
> - Use tracepoint for outputting TSC offset value instead of printk() [1/1]
> 
> Changes in V3:
> - Add tracepoint in svm.c [1/1]
> - Record vcpu_id [1/1]
> - Change "previous=" to "prev=" in output result [1/1]
> 
> Thanks!
> 
> ---
> 
> Yoshihiro YUNOMAE (1):
>       kvm: Add a tracepoint write_tsc_offset
> 
> 
>  arch/x86/kvm/svm.c   |   10 +++++++++-
>  arch/x86/kvm/trace.h |   21 +++++++++++++++++++++
>  arch/x86/kvm/vmx.c   |    7 ++++++-
>  arch/x86/kvm/x86.c   |    1 +
>  4 files changed, 37 insertions(+), 2 deletions(-)
> 
> -- 
> Yoshihiro YUNOMAE
> Software Platform Research Dept. Linux Technology Center
> Hitachi, Ltd., Yokohama Research Laboratory
> E-mail: yoshihiro.yunomae.ez@hitachi.com

--
			Gleb.

  parent reply	other threads:[~2013-06-23  7:59 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-06-12  7:43 [PATCH V3 0/1] kvm: Output TSC offset Yoshihiro YUNOMAE
2013-06-12  7:43 ` [PATCH V3 1/1] kvm: Add a tracepoint write_tsc_offset Yoshihiro YUNOMAE
2013-06-12 22:44   ` Marcelo Tosatti
2013-06-24 15:02   ` Paolo Bonzini
2013-06-12  7:46 ` [EXAMPLE] tools: a tool for merging trace data of a guest and a host Yoshihiro YUNOMAE
2013-06-23  7:58 ` Gleb Natapov [this message]
2013-06-25 10:16   ` [PATCH] [BUGFIX] Fix build error caused by an undefinition of the kvm_write_tsc_offset tracepoint for x86_32 Yoshihiro YUNOMAE
2013-06-25 10:18     ` Yoshihiro YUNOMAE
2013-06-25 10:43       ` Gleb Natapov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130623075848.GO5832@redhat.com \
    --to=gleb@redhat.com \
    --cc=dhsharp@google.com \
    --cc=hidehiro.kawai.ez@hitachi.com \
    --cc=hpa@zytor.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=masami.hiramatsu.pt@hitachi.com \
    --cc=mingo@redhat.com \
    --cc=mtosatti@redhat.com \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    --cc=yoshihiro.yunomae.ez@hitachi.com \
    --cc=yrl.pp-manager.tt@hitachi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.