From: Marcelo Tosatti <mtosatti@redhat.com>
To: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com>
Cc: Steven Rostedt <rostedt@goodmis.org>,
David Sharp <dhsharp@google.com>,
"H. Peter Anvin" <hpa@zytor.com>,
linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
Joerg Roedel <joerg.roedel@amd.com>,
Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>,
Ingo Molnar <mingo@redhat.com>, Avi Kivity <avi@redhat.com>,
yrl.pp-manager.tt@hitachi.com,
Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>,
Thomas Gleixner <tglx@linutronix.de>
Subject: Re: Re: Re: Re: Re: [RFC PATCH 0/2] kvm/vmx: Output TSC offset
Date: Mon, 26 Nov 2012 21:16:53 -0200
Message-ID: <20121126231653.GA20391@amt.cnet>
In-Reply-To: <50B34CE6.9070207@hitachi.com>
On Mon, Nov 26, 2012 at 08:05:10PM +0900, Yoshihiro YUNOMAE wrote:
> >>>500h. event tsc_write tsc_offset=-3000
> >>>
> >>>Then a guest trace containing events with a TSC timestamp.
> >>>Which tsc_offset to use?
> >>>
> >>>(that is the problem, which unless i am mistaken can only be solved
> >>>easily if the guest can convert RDTSC -> TSC of host).
> >>
> >>There are three following cases of changing TSC offset:
> >> 1. Reset TSC at guest boot time
> >> 2. Adjust TSC offset due to some host's problems
> >> 3. Write TSC on guests
> >>The scenario which you mentioned is case 3, so we'll discuss this case.
> >>Here, we assume that a guest is allocated single CPU for the sake of
> >>ease.
> >>
> >>If a guest executes write_tsc, TSC values jumps to forward or backward.
> >>For the forward case, trace data are as follows:
> >>
> >>< host >                          < guest >
> >>cycles   events                   cycles   events
> >> 3000    tsc_offset=-2950
> >> 3001    kvm_enter
> >>                                     53    eventX
> >>                                   ....
> >>                                    100    (write_tsc=+900)
> >> 3060    kvm_exit
> >> 3075    tsc_offset=-2050
> >> 3080    kvm_enter
> >>                                   1050    event1
> >>                                   1055    event2
> >>                                    ...
> >>
> >>
> >>This case is simple. The guest TSC of the first kvm_enter is calculated
> >>as follows:
> >>
> >> (host TSC of kvm_enter) + (current tsc_offset) = 3001 - 2950 = 51
> >>
> >>Similarly, the guest TSC of the second kvm_enter is 130. So, the guest
> >>events between 51 and 130, that is, 53 eventX is inserted between the
> >>first pair of kvm_enter and kvm_exit. To insert events of the guests
> >>between 51 and 130, we convert the guest TSC to the host TSC using TSC
> >>offset 2950.
> >>
> >>For the backward case, trace data are as follows:
> >>
> >>< host >                          < guest >
> >>cycles   events                   cycles   events
> >> 3000    tsc_offset=-2950
> >> 3001    kvm_enter
> >>                                     53    eventX
> >>                                   ....
> >>                                    100    (write_tsc=-50)
> >> 3060    kvm_exit
> >> 3075    tsc_offset=-3000
> >> 3080    kvm_enter
> >>                                     90    event1
> >>                                     95    event2
> >>                                    ...
> >
> > 3400 100 (write_tsc=-50)
> >
> > 90 event3
> > 95 event4
> >
> >>As you say, in this case, the previous method is invalid. When we
> >>calculate the guest TSC value for the tsc_offset=-3000 event, the value
> >>is 75 on the guest. This seems like prior event of write_tsc=-50 event.
> >>So, we need to consider more.
> >>
> >>In this case, it is important that we can understand where the guest
> >>executes write_tsc or the host rewrites the TSC offset. write_tsc on
> >>the guest equals wrmsr 0x00000010, so this instruction induces vm_exit.
> >>This implies that the guest does not operate when the host changes TSC
> >>offset on the cpu. In other words, the guest cannot use new TSC before
> >>the host rewrites the new TSC offset. So, if timestamp on the guest is
> >>not monotonically increased, we can understand the guest executes
> >>write_tsc. Moreover, in the region where timestamp is decreasing, we
> >>can understand when the host rewrote the TSC offset in the guest trace
> >>data. Therefore, we can sort trace data in chronological order.
> >
> >This requires an entire trace of events. That is, to be able
> >to reconstruct timeline you require the entire trace from the moment
> >guest starts. So that you can correlate wrmsr-to-tsc on the guest with
> >vmexit-due-to-tsc-write on the host.
> >
> >Which means that running out of space for trace buffer equals losing
> >ability to order events.
> >
> >Is that desirable? It seems cumbersome to me.
>
> As you say, tracing events can overwrite important events like
> kvm_exit/entry or write_tsc_offset. So, Steven's multiple buffer is
> needed by this feature. Normal events which often hit record the buffer
> A, and important events which rarely hit record the buffer B. In our
> case, the important event is write_tsc_offset.
> >Also the need to correlate each write_tsc event in the guest trace
> >with a corresponding tsc_offset write in the host trace means that it
> >is _necessary_ for the guest and host to enable tracing simultaneously.
> >Correct?
> >
> >Also, there are WRMSR executions in the guest for which there is
> >no event in the trace buffer. From SeaBIOS, during boot.
> >In that case, there is no explicit event in the guest trace which you
> >can correlate with tsc_offset changes in the host side.
>
> I understand that you want to say, but we don't correlate between
> write_tsc event and write_tsc_offset event directly. This is because
> the write_tsc tracepoint (also WRMSR instruction) is not prepared in
> the current kernel. So, in the previous mail
> (https://lkml.org/lkml/2012/11/22/53), I suggested the method which we
> don't need to prepare the write_tsc tracepoint.
>
> In the method, we enable ftrace before the guest boots, and we need to
> keep all write_tsc_offset events in the buffer. If we forgot enabling
> ftrace or we don't use multiple buffers, we don't use this feature.

Yoshihiro,

Better have a single method to convert guest TSC to host TSC.

Ok, if you keep both TSC offset write events and guest TSC writes (*)
in separate buffers which are persistent, then you can convert
guest-tsc-events to host-tsc.

Can you please write a succinct but complete description of the method
so it can be verified?

(*) note guest TSC writes have no events because Linux does not write
to TSC offset, but a "system booted" event can be used to correlate
with the TSC write by BIOS.

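
As a starting point for such a description, here is a minimal sketch in
Python of the conversion, assuming every tsc_offset write survives in
a persistent buffer. It is not the tool from patch 2/2. The names
OFFSET_WRITES, active_offset, host_to_guest and guest_to_host are
hypothetical, and so is the "epoch" argument (the index of the offset
write a guest event follows), which the caller has to establish by some
other means such as a "system booted" marker. The numbers come from the
forward example quoted above.

```python
from bisect import bisect_right

# Hypothetical record of host-side TSC offset writes: (host_tsc, tsc_offset),
# sorted by host_tsc. Values are taken from the forward example in this thread.
OFFSET_WRITES = [(3000, -2950), (3075, -2050)]


def active_offset(offset_writes, host_tsc):
    """Return the offset in effect at host_tsc (latest write at or before it)."""
    idx = bisect_right([t for t, _ in offset_writes], host_tsc) - 1
    if idx < 0:
        raise ValueError("no TSC offset write recorded before this host TSC")
    return offset_writes[idx][1]


def host_to_guest(offset_writes, host_tsc):
    """guest TSC = host TSC + tsc_offset in effect at that host time."""
    return host_tsc + active_offset(offset_writes, host_tsc)


def guest_to_host(offset_writes, guest_tsc, epoch):
    """Convert a guest timestamp, given which offset write (epoch) it follows.

    The epoch index is an assumption of this sketch: the guest timestamp
    alone does not say which offset was active when it was taken.
    """
    write_tsc, offset = offset_writes[epoch]
    converted = guest_tsc - offset
    if converted < write_tsc:
        raise ValueError("event would precede the offset write of its epoch")
    return converted
```

With these values, host_to_guest(OFFSET_WRITES, 3001) gives 51 and
guest_to_host(OFFSET_WRITES, 1050, 1) gives 3100. The host-to-guest
direction needs only the offset log. The guest-to-host direction is
where the lost-event problem shows up, because the epoch cannot be
recovered once offset writes have been overwritten in the buffer.
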
Thanks
> So, I think as Peter says, the host should also export TSC offset
> information to /proc/pid/kvm/*.
>
> >If the guest had access to the host TSC value, these complications
> >would disappear.
>
> As a debugging mode, the TSC offset zero mode will be useful, I think.
> (not for the real operation mode)
>
> Thanks,
> --
> Yoshihiro YUNOMAE
> Software Platform Research Dept. Linux Technology Center
> Hitachi, Ltd., Yokohama Research Laboratory
> E-mail: yoshihiro.yunomae.ez@hitachi.com