[Qemu-devel] [RFC] host and guest kernel trace merging

qemu-devel.nongnu.org archive mirror

* [Qemu-devel] [RFC] host and guest kernel trace merging
@ 2016-03-03 19:35 Luiz Capitulino
  2016-03-04 11:19 ` Stefan Hajnoczi
  0 siblings, 1 reply; 14+ messages in thread
From: Luiz Capitulino @ 2016-03-03 19:35 UTC (permalink / raw)
  To: kvm
  Cc: yoshihiro.yunomae.ez, mtosatti, qemu-devel, rostedt,
	linux-trace-users, peterx, stefanha, pbonzini

Very recently, trace-cmd got a few new features that allow you
to merge the host and guest kernel traces using the host TSC.

Those features originated in the tracing we're doing to debug spikes
in real-time KVM. However, as real-time KVM uses a very specific
setup and as we have so far debugged a very simple application,
I'm wondering: is this feature useful for the general, non-realtime,
use-cases?

If the answer is yes, then I've got several ideas on how to
make host and guest trace merging extremely simple to use.

I'll first describe how we do tracing for real-time KVM. Then
I'll give some suggestions on how to use the same procedure
for unpinned use-cases. Lastly, I'll talk about how we could
make it easy to use.

Real-time KVM host and guest tracing
====================================

In real-time KVM, each guest vCPU is pinned to a different host
core. The real-time application running in the guest is also pinned.
When we get a spike, we know on which guest CPU it occurred, and
so we know on which host core that CPU was running. All we have to
do is get a trace of that guest CPU/host core pair.

1. Setup
--------

You'll need the following:

 1. A stable TSC
 2. The TSC offset of the guest you want to debug
    (see below)
 3. A way for your guest to transfer a file to your
    host (I use networking)
 4. Latest trace-cmd.git in both host and guest
    (HEAD c21aae2c or later)

To get the TSC offset of the guest, you can use the kvm_write_tsc_offset
tracepoint in the host. I use this script to do it:

 http://people.redhat.com/~lcapitul/real-time/start-debug-guest

Yes, it sucks. I have an idea on how to improve this (keep reading).

2. Tracing
----------

In summary, what you have to do is:

 1. run trace-cmd start -C x86-tsc in the host
 2. run trace-cmd record -C x86-tsc in the guest
 3. run trace-cmd stop in the host
 4. run trace-cmd extract in the host
 5. copy the guest's trace.dat file to a known
    location in the host

This guest script does all that:

 http://people.redhat.com/~lcapitul/real-time/trace-host-and-guest

I run it like this:

 # trace-host-and-guest cyclictest -q -n -b10 --notrace

3. Merging
----------

Merging is simple:

 $ trace-cmd report -i host-trace.dat --ts-offset $((GUEST-TSC-offset)) \
                    -i guest-trace.dat

For real-time KVM, we also want to see the difference in nanoseconds
of each line in the trace so we use additional options:

 $ trace-cmd report --ts-diff --ts2secs HOST-Hz -t \
                    -i host-trace.dat --ts-offset $((GUEST-TSC-offset)) \
                    -i guest-trace.dat

Here's a real example:

 $ trace-cmd report --ts-diff --ts2secs 26000000000 -t \
                    -i host-trace.dat --ts-offset $((18443676333795429734)) \
                    -i guest-trace.dat
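
The large --ts-offset value above is the guest TSC offset printed as an
unsigned 64-bit number; wrapping it in $(( )) presumably lets the shell
reinterpret it as a signed (negative) value. Here is a minimal Python
sketch of that arithmetic. It assumes the usual relation
guest_tsc = host_tsc + offset (modulo 2^64); the function names are made
up for illustration and are not part of trace-cmd:

```python
MASK64 = (1 << 64) - 1


def to_signed64(value: int) -> int:
    """Reinterpret an unsigned 64-bit value as a signed 64-bit integer."""
    value &= MASK64
    return value - (1 << 64) if value >> 63 else value


def host_to_guest_tsc(host_tsc: int, tsc_offset: int) -> int:
    """Assumed relation: guest TSC = host TSC + offset, wrapping at 2^64."""
    return (host_tsc + tsc_offset) & MASK64


# Offset taken from the real example above.
OFFSET = 18443676333795429734
print(to_signed64(OFFSET))  # -3067739914121882
```

In other words, in this example the guest TSC runs about 3.07e15 ticks
behind the host TSC.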

And here's a short extract of a merged trace where the host injects
a timer interrupt, and the guest handles it and reprograms the next
hrtimer. The value in "()" is how many nanoseconds elapsed between
the previous line and the current one:

 host-trace.dat:         qemu-kvm-3699  [004] [004]  6196.749398857: (+88)    function:             kvm_inject_pending_timer_irqs <-- kvm_arch_vcpu_ioctl_run
 host-trace.dat:         qemu-kvm-3699  [004] [004]  6196.749398990: (+133)   kvm_entry:            vcpu 0
guest-trace.dat:           <idle>-0     [000] [000]  6196.749399096: (+106)   function:             hrtimer_interrupt <-- local_apic_timer_interrupt
guest-trace.dat:           <idle>-0     [000] [000]  6196.749399123: (+27)    function:             hrtimer_wakeup <-- __run_hrtimer
guest-trace.dat:           <idle>-0     [000] [000]  6196.749399183: (+60)    function:             tick_program_event <-- hrtimer_interrupt
 host-trace.dat:         qemu-kvm-3699  [004] [004]  6196.749399219: (+36)    kvm_exit:             reason MSR_WRITE rip 0xffffffff8104bf58 info 0 0
 host-trace.dat:         qemu-kvm-3699  [004] [004]  6196.749399260: (+41)    function:             kvm_set_lapic_tscdeadline_msr <-- kvm_set_msr_common
 host-trace.dat:         qemu-kvm-3699  [004] [004]  6196.749399283: (+23)    function:             hrtimer_start <-- start_apic_timer
 host-trace.dat:         qemu-kvm-3699  [004] [004]  6196.749399336: (+53)    kvm_entry:            vcpu 0
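
As a sanity check, the "(+N)" values can be recomputed from the
timestamps in the extract. The sketch below is only an illustration:
the helper names are hypothetical, and ticks_to_ns() shows one plausible
reading of what a ticks-to-seconds conversion such as --ts2secs has to
do, not trace-cmd's actual code:

```python
NS_PER_SEC = 1_000_000_000


def to_ns(timestamp: str) -> int:
    """Convert a 'seconds.fraction' timestamp string to integer nanoseconds."""
    secs, frac = timestamp.split(".")
    return int(secs) * NS_PER_SEC + int(frac.ljust(9, "0")[:9])


def deltas_ns(timestamps):
    """Nanoseconds elapsed between each pair of consecutive timestamps."""
    ns = [to_ns(t) for t in timestamps]
    return [later - earlier for earlier, later in zip(ns, ns[1:])]


def ticks_to_ns(ticks: int, hz: int) -> int:
    """Convert TSC ticks to nanoseconds for a TSC running at hz ticks/second."""
    return ticks * NS_PER_SEC // hz


# Timestamps copied from the merged extract above, in order.
TIMESTAMPS = [
    "6196.749398857", "6196.749398990", "6196.749399096",
    "6196.749399123", "6196.749399183", "6196.749399219",
    "6196.749399260", "6196.749399283", "6196.749399336",
]
print(deltas_ns(TIMESTAMPS))
```

The first "(+88)" in the extract refers to a line that precedes it and
is not shown, so the sketch yields eight deltas for the nine lines.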

Unpinned use-cases
==================

If you can't pin the guest vCPU threads and the guest application like
we do in real-time KVM, you could try the following:

 * If your guest has a single CPU, or you want to trace a
   specific guest vCPU, then try passing -P vCPU-TID when
   running "trace-cmd record" or "trace-cmd start" in the host

 * If you want to trace multiple vCPUs, I think you could
   try to trace all cores where the vCPUs could run with -M.
   Then you could try to merge this with the guest trace and
   see if you get a single timeline of all cores and guest CPUs

trace-cmd-server
================

Everything I described could look like this:

  # trace-cmd server [ in the host ]
  # trace-cmd record [ in the guest ]
  # trace-cmd report [ in the host, to merge the traces ]

To achieve this, we need two things:

 1. Add an interface to obtain the guest TSC offset from the
    host user-space.

    Maybe we could have a new sysfs directory, with a file
    per vCPU thread and the offset as contents? Or maybe
    just add a new entry to /proc/, like: /proc/TID/tsc-offset?

 2. Build a trace-cmd-server

    Which does the following:

      1. Receive trace-cmd record commands from a guest,
         to be performed in the host

      2. Read the TSC offset for vCPUs being traced

      3. Read the CPU frequency for --ts2secs

      4. Receive the guest.dat from the guest and store it together
         with the host's

    I'd suggest the default communication between guest and
    host be via networking. Then we add vsock support when it's
    available.

Credits
=======

Real-time KVM tracing was originally developed by Marcelo Tosatti
using scripts. Steven Rostedt added the features we needed in trace-cmd
at my request.

All the ideas and scripts shown here are mine. I also take
responsibility for any mistakes or nonsensical stuff.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] [RFC] host and guest kernel trace merging
  2016-03-03 19:35 [Qemu-devel] [RFC] host and guest kernel trace merging Luiz Capitulino
@ 2016-03-04 11:19 ` Stefan Hajnoczi
  2016-03-04 13:23   ` Steven Rostedt
  2016-03-24  8:42   ` Peter Xu
  0 siblings, 2 replies; 14+ messages in thread
From: Stefan Hajnoczi @ 2016-03-04 11:19 UTC (permalink / raw)
  To: Luiz Capitulino
  Cc: kvm, yoshihiro.yunomae.ez, mtosatti, qemu-devel, rostedt,
	linux-trace-users, peterx, stefanha, pbonzini

On Thu, Mar 03, 2016 at 02:35:01PM -0500, Luiz Capitulino wrote:
> trace-cmd-server
> ================
> 
> Everything I described could look like this:
> 
>   # trace-cmd server [ in the host ]
>   # trace-cmd record [ in the guest ]
>   # trace-cmd report [ in the host, to merge the traces ]
> 
> To achieve this, we need two things:
> 
>  1. Add an interface to obtain the guest TSC offset from the
>     host user-space.
> 
>     Maybe we could have a new sysfs directory, with a file
>     per vCPU thread and the offset as contents? Or maybe
>     just add a new entry to /proc/, like: /proc/TID/tsc-offset?

Yes, the interface is missing.  In the past I have heard of people using
trace events on the host to:

1. Collect tsc offsets
2. Track which vCPU is scheduled to a host CPU

So instead of relying on an interface they enable the relevant trace
events on the host and then parse the trace to collect this information.
However, it's a bad solution, especially for tsc offsets, since you may
wish to trace an already-running VM where the tracepoint that records
the tsc offset may not fire after startup (?).

Therefore, I agree that an interface for the tsc offset is needed.
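
For reference, the trace-parsing workaround described above could look
roughly like the sketch below. It assumes the kvm_write_tsc_offset event
prints its fields as "vcpu=N prev=N next=N"; that format, the sample
lines and the function name are assumptions for illustration only:

```python
import re

# Assumed textual format of the kvm_write_tsc_offset trace event.
_TSC_OFFSET_RE = re.compile(
    r"kvm_write_tsc_offset:\s+vcpu=(\d+)\s+prev=(\d+)\s+next=(\d+)"
)


def last_tsc_offsets(report_lines):
    """Return {vcpu: offset}, keeping the most recent 'next' value per vCPU."""
    offsets = {}
    for line in report_lines:
        match = _TSC_OFFSET_RE.search(line)
        if match:
            offsets[int(match.group(1))] = int(match.group(3))
    return offsets


# Made-up sample lines; only the offset value is taken from the thread.
SAMPLE = [
    "qemu-kvm-3699 [004] 100.000001: kvm_write_tsc_offset: vcpu=0 prev=0 next=42",
    "qemu-kvm-3699 [004] 100.000002: kvm_entry: vcpu 0",
    "qemu-kvm-3699 [004] 100.000003: kvm_write_tsc_offset: "
    "vcpu=0 prev=42 next=18443676333795429734",
]
print(last_tsc_offsets(SAMPLE))
```

If the event never fires while tracing is active, the result is empty,
which is exactly the weakness pointed out above for already-running VMs.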

>  2. Build a trace-cmd-server
> 
>     Which does the following:
> 
>       1. Receive trace-cmd record commands from a guest,
>          to be performed in the host

Sometimes the opposite is desirable: the host controls tracing inside
the guest.  Any thoughts on this use case?

>       2. Read the TSC offset for vCPUs being traced
> 
>       3. Read the CPU frequency for --ts2secs
> 
>       4. Receive the guest.dat from the guest and store it together
>          with the host's
> 
>     I'd suggest the default communication between guest and
>     host be via networking. Then we add vsock support when it's
>     available.

* Re: [Qemu-devel] [RFC] host and guest kernel trace merging
  2016-03-04 11:19 ` Stefan Hajnoczi
@ 2016-03-04 13:23   ` Steven Rostedt
  2016-03-07 15:17     ` Stefan Hajnoczi
  2016-03-24  5:16     ` Peter Xu
  2016-03-24  8:42   ` Peter Xu
  1 sibling, 2 replies; 14+ messages in thread
From: Steven Rostedt @ 2016-03-04 13:23 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: kvm, yoshihiro.yunomae.ez, mtosatti, qemu-devel, peterx,
	Luiz Capitulino, linux-trace-users, stefanha, pbonzini

On Fri, 4 Mar 2016 11:19:33 +0000
Stefan Hajnoczi <stefanha@gmail.com> wrote:

> >  2. Build a trace-cmd-server
> > 
> >     Which does the following:
> > 
> >       1. Receive trace-cmd record commands from a guest,
> >          to be performed in the host  
> 
> Sometimes the opposite is desirable: the host controls tracing inside
> the guest.  Any thoughts on this use case?

My idea for a trace-cmd server is to have a --client operation for
running on the guest.

 trace-cmd server --client <connection>

The connection will be some socket, either network or something
directly attached to the host.

Then on the host, we can have

  trace-cmd server --connect <guest>

Where the server will create a connection to the guest.

And then, you could run on the host:

  trace-cmd record <host-events> --connect <guest> <guest-events>

And this will start recording host events, and then connect to the
local server that connects to the guest(s) and that will start tracing
on the guest as well.

Then events on the guest will be passed to the host server.

Something like this is my idea. We can work out the details on the best
way to get things working. We may be able to eliminate the host server
middle man. But I envision that we need a trace-cmd server running on
the guest to start off the commands.

The problem I have with the guest server, and something that we may be
able to fix later on, but should always keep it in the back of our
minds, is the security issue. For this to work, the guest server needs
to run as root. It will have an open socket (network or to host), that
will enable tracing on the guest. There needs to be some sort of
verification on that connection to prevent anyone from connecting to it.

In the protocol for the connection between guest and host, I'll
initially add a "security" feature that will allow the guest to tell
whoever is connecting to it what type of security feature it wants.
For now it will be TRACE_CMD_NO_SECURITY. But that will have to change
in the future.

-- Steve

> 
> >       2. Read the TSC offset for vCPUs being traced
> > 
> >       3. Read the CPU frequency for --ts2secs
> > 
> >       4. Receive the guest.dat from the guest and store it together
> >          with the host's
> > 
> >     I'd suggest the default communication between guest and
> >     host be via networking. Then we add vsock support when it's
> >     available.  

* Re: [Qemu-devel] [RFC] host and guest kernel trace merging
  2016-03-04 13:23   ` Steven Rostedt
@ 2016-03-07 15:17     ` Stefan Hajnoczi
  2016-03-07 15:49       ` Steven Rostedt
  2016-03-24  5:16     ` Peter Xu
  1 sibling, 1 reply; 14+ messages in thread
From: Stefan Hajnoczi @ 2016-03-07 15:17 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: kvm, Stefan Hajnoczi, yoshihiro.yunomae.ez, mtosatti, qemu-devel,
	peterx, Luiz Capitulino, linux-trace-users, pbonzini

On Fri, Mar 04, 2016 at 08:23:11AM -0500, Steven Rostedt wrote:
> The problem I have with the guest server, and something that we may be
> able to fix later on, but should always keep it in the back of our
> minds, is the security issue. For this to work, the guest server needs
> to run as root. It will have an open socket (network or to host), that
> will enable tracing on the guest. There needs to be some sort of
> verification on that connection to prevent anyone from connecting to it.
> 
> In the protocol for the connection between guest and host, I'll
> currently add a "security" feature, that will allow the guest to tell
> whomever is connecting to it, what type of security feature it wants.
> For now it will be TRACE_CMD_NO_SECURITY. But that will have to change
> in the future.

qemu-guest-agent runs inside the guest and replies to RPC commands from
the host.  It is used for backups, shutdown, network configuration, etc.
From time to time people have wanted the ability to execute an arbitrary
command inside the guest and return the output.  This functionality has
never been merged, probably for the security reason.

A tracing server that runs inside the guest is comparable to
qemu-guest-agent.  As long as the tracing server requires manual
commands to start it and does not run by default, then I think the
security issue can be kept at bay.  It's a powerful tool that requires
explicit guest administrator action to enable.

Stefan

* Re: [Qemu-devel] [RFC] host and guest kernel trace merging
  2016-03-07 15:17     ` Stefan Hajnoczi
@ 2016-03-07 15:49       ` Steven Rostedt
  2016-03-07 16:10         ` Eric Blake
  0 siblings, 1 reply; 14+ messages in thread
From: Steven Rostedt @ 2016-03-07 15:49 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: kvm, Stefan Hajnoczi, yoshihiro.yunomae.ez, mtosatti, qemu-devel,
	peterx, Luiz Capitulino, linux-trace-users, pbonzini

On Mon, 7 Mar 2016 15:17:05 +0000
Stefan Hajnoczi <stefanha@redhat.com> wrote:

> qemu-guest-agent runs inside the guest and replies to RPC commands from
> the host.  It is used for backups, shutdown, network configuration, etc.
> From time to time people have wanted the ability to execute an arbitrary
> command inside the guest and return the output.  This functionality has
> never been merged, probably for the security reason.

How's the connection set up? That is, how does it know the commands are
coming from the host? And how does it know that the commands from the
host are from a trusted source? If the host is compromised, is there
anything keeping an intruder from controlling the guest?

> 
> A tracing server that runs inside the guest is comparable to
> qemu-guest-agent.  As long as the tracing server requires manual
> commands to start it and does not run by default, then I think the
> security issue can be kept at bay.  It's a powerful tool that requires
> explicit guest administrator action to enable.

This isn't something I would expect to be started by default. My worry
is that once it is started, then you open up a connection that can be
attached by pretty much anyone. We could make a network socket that
only communicates with the host, but even that worries me. I'm worried
that the host may have actors that might compromise the system.

-- Steve

* Re: [Qemu-devel] [RFC] host and guest kernel trace merging
  2016-03-07 15:49       ` Steven Rostedt
@ 2016-03-07 16:10         ` Eric Blake
  2016-03-07 16:26           ` Steven Rostedt
  0 siblings, 1 reply; 14+ messages in thread
From: Eric Blake @ 2016-03-07 16:10 UTC (permalink / raw)
  To: Steven Rostedt, Stefan Hajnoczi
  Cc: kvm, Stefan Hajnoczi, yoshihiro.yunomae.ez, mtosatti, qemu-devel,
	peterx, Luiz Capitulino, linux-trace-users, pbonzini

On 03/07/2016 08:49 AM, Steven Rostedt wrote:
> On Mon, 7 Mar 2016 15:17:05 +0000
> Stefan Hajnoczi <stefanha@redhat.com> wrote:
> 
> 
>> qemu-guest-agent runs inside the guest and replies to RPC commands from
>> the host.  It is used for backups, shutdown, network configuration, etc.
>> From time to time people have wanted the ability to execute an arbitrary
>> command inside the guest and return the output.  This functionality has
>> never been merged, probably for the security reason.
> 
> How's the connection set up. That is, how does it know the commands are
> coming from the host? And how does it know that the commands from the
> host is from a trusted source? If the host is compromised, is there
> anything keeping an intruder from controlling the guest?

qemu-guest-agent uses a virtio channel, so only the host can be driving
that channel.  But how can a guest know that it trusts the host? It
can't.  A compromised host implicitly compromises all guests, and that's
always been the case.  At least qemu-guest-agent doesn't make the window
any larger.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


* Re: [Qemu-devel] [RFC] host and guest kernel trace merging
  2016-03-07 16:10         ` Eric Blake
@ 2016-03-07 16:26           ` Steven Rostedt
  2016-03-07 17:13             ` Paolo Bonzini
  0 siblings, 1 reply; 14+ messages in thread
From: Steven Rostedt @ 2016-03-07 16:26 UTC (permalink / raw)
  To: Eric Blake
  Cc: kvm, Stefan Hajnoczi, yoshihiro.yunomae.ez, mtosatti, qemu-devel,
	peterx, Luiz Capitulino, linux-trace-users, Stefan Hajnoczi,
	pbonzini

On Mon, 7 Mar 2016 09:10:10 -0700
Eric Blake <eblake@redhat.com> wrote:

> On 03/07/2016 08:49 AM, Steven Rostedt wrote:
> > On Mon, 7 Mar 2016 15:17:05 +0000
> > Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > 
> >   
> >> qemu-guest-agent runs inside the guest and replies to RPC commands from
> >> the host.  It is used for backups, shutdown, network configuration, etc.
> >> From time to time people have wanted the ability to execute an arbitrary
> >> command inside the guest and return the output.  This functionality has
> >> never been merged, probably for the security reason.  
> > 
> > How's the connection set up. That is, how does it know the commands are
> > coming from the host? And how does it know that the commands from the
> > host is from a trusted source? If the host is compromised, is there
> > anything keeping an intruder from controlling the guest?  
> 
> qemu-guest-agent uses a virtio channel, so only the host can be driving
> that channel.  But how can a guest know that it trusts the host? It
> can't.  A compromised host implicitly compromises all guests, and that's
> always been the case.  At least qemu-guest-agent doesn't make the window
> any larger.
> 

I should have been a bit clearer about what I meant by "host is
compromised". I should have asked: what about untrusted tasks on the
host? Is the channel protected so that only admin users can access it?

Of course, one of my concerns with the trace-cmd server is that it may
require a network connection, because doesn't a virtio channel need
to be initialized at boot? Where the host must have an active
listener when the guest starts? Or am I thinking of something else?

-- Steve

* Re: [Qemu-devel] [RFC] host and guest kernel trace merging
  2016-03-07 16:26           ` Steven Rostedt
@ 2016-03-07 17:13             ` Paolo Bonzini
  0 siblings, 0 replies; 14+ messages in thread
From: Paolo Bonzini @ 2016-03-07 17:13 UTC (permalink / raw)
  To: Steven Rostedt, Eric Blake
  Cc: kvm, Stefan Hajnoczi, yoshihiro.yunomae.ez, mtosatti, qemu-devel,
	peterx, Luiz Capitulino, linux-trace-users, Stefan Hajnoczi



On 07/03/2016 17:26, Steven Rostedt wrote:
> > > How's the connection set up. That is, how does it know the commands are
> > > coming from the host? And how does it know that the commands from the
> > > host is from a trusted source? If the host is compromised, is there
> > > anything keeping an intruder from controlling the guest?  
> > 
> > qemu-guest-agent uses a virtio channel, so only the host can be driving
> > that channel.  But how can a guest know that it trusts the host? It
> > can't.  A compromised host implicitly compromises all guests, and that's
> > always been the case.  At least qemu-guest-agent doesn't make the window
> > any larger.
>
> I should have been a bit more clear about what I meant by "host is
> compromised". I should have asked, what about untrusted tasks on the
> host. Is the channel protected where only admin users can access it?

The other side of the channel is typically a socket or a pty, so it's
protected by file permissions, SELinux, and the like.

Paolo

* Re: [Qemu-devel] [RFC] host and guest kernel trace merging
  2016-03-04 13:23   ` Steven Rostedt
  2016-03-07 15:17     ` Stefan Hajnoczi
@ 2016-03-24  5:16     ` Peter Xu
  2016-03-24 13:02       ` Luiz Capitulino
  1 sibling, 1 reply; 14+ messages in thread
From: Peter Xu @ 2016-03-24  5:16 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: kvm, Stefan Hajnoczi, yoshihiro.yunomae.ez, mtosatti, qemu-devel,
	Luiz Capitulino, linux-trace-users, stefanha, pbonzini

Hi, Steven,

On Fri, Mar 04, 2016 at 08:23:11AM -0500, Steven Rostedt wrote:
> My idea for a trace-cmd server, is to have a --client operation, for
> running on the guest.
> 
>  trace-cmd server --client <connection>
> 
> The connection will be some socket, either network or something
> directly attached to the host.
> 
> Then on the host, we can have
> 
>   trace-cmd server --connect <guest>
> 
> Where the server will create a connection to the guest.
> 
> And then, you could run on the host:
> 
>   trace-cmd record <host-events> --connect <guest> <guest-events>
> 
> And this will start recording host events, and then connect to the
> local server that connects to the guest(s) and that will start tracing
> on the guest as well.
> 
> Then events on the guest will be passed to the host server.
> 
> Something like this is my idea. We can work out the details on the best
> way to get things working. We may be able to eliminate the host server
> middle man. But I envision that we need a trace-cmd server running on
> the guest to start off the commands.

I'm not sure whether I fully understand the above, but it seems that we
can remove the host server middle man (as you have mentioned). Moreover,
I am not sure whether we can use this for multiple hosts as well,
with guests being a special case in which we can get a very accurate
tsc offset. Let me try to give a rough summary of what I have in mind.

So, firstly, we start a trace-cmd server on every host (or guest)
that we want to trace using:

  trace-cmd server <connection>

(In the case of one host + one guest tracing, we need to start the
server on both host and guest, though the <connection> might differ)

Then, on any host/guest that can reach all the target hosts/guests
via different <connection>s, do the tracing using:

  trace-cmd record --connect <connection> <events...> \
                   --connect <connection> <events...> \
                   ...

Finally, when "trace-cmd record" stops, we get one trace file with
all traces merged. Here, if we are tracing multiple hosts, we use
timestamps for merging (maybe we can have some other way to adjust
the offset for each host). For the guest special case, we will detect it
in some way (the crude one is to provide it as a parameter following
the --connect of the guest, or we may detect it via the <connection>
format, etc.) and take special care to fetch the offset and CPU
frequency information, so that we get extremely accurate merged
results.
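
The merging step described above can be sketched as follows, assuming
each source delivers its events already sorted by timestamp and that a
per-source offset brings all sources onto one common timeline. The data
layout and function names are hypothetical and only illustrate the idea:

```python
import heapq


def tagged(source, offset, events):
    """Yield (adjusted timestamp, source, event name) for one trace."""
    for timestamp, name in events:
        yield timestamp + offset, source, name


def merge_traces(traces):
    """Merge per-source event lists into one timeline.

    traces maps a source name to (offset, events), where events is a list
    of (timestamp, name) tuples sorted by timestamp.
    """
    streams = [tagged(src, off, evs) for src, (off, evs) in traces.items()]
    return list(heapq.merge(*streams))


# Made-up events: the guest clock is 50 ticks behind the host clock.
TRACES = {
    "host": (0, [(100, "kvm_entry"), (300, "kvm_exit")]),
    "guest": (50, [(150, "hrtimer_interrupt")]),
}
for event in merge_traces(TRACES):
    print(event)
```

heapq.merge() only needs each input to be sorted on its own, so the
traces can be streamed rather than loaded and re-sorted as a whole.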

None of the above considers security issues yet; I am assuming that what
Paolo mentioned (using existing file protections for sockets, pipes,
etc.) should work.

Thanks.

-- peterx

* Re: [Qemu-devel] [RFC] host and guest kernel trace merging
  2016-03-04 11:19 ` Stefan Hajnoczi
  2016-03-04 13:23   ` Steven Rostedt
@ 2016-03-24  8:42   ` Peter Xu
  2016-03-24 10:13     ` Stefan Hajnoczi
  1 sibling, 1 reply; 14+ messages in thread
From: Peter Xu @ 2016-03-24  8:42 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: kvm, yoshihiro.yunomae.ez, mtosatti, qemu-devel, rostedt,
	Luiz Capitulino, linux-trace-users, stefanha, pbonzini

On Fri, Mar 04, 2016 at 11:19:33AM +0000, Stefan Hajnoczi wrote:
> On Thu, Mar 03, 2016 at 02:35:01PM -0500, Luiz Capitulino wrote:
> > trace-cmd-server
> > ================
> > 
> > Everything I described could look like this:
> > 
> >   # trace-cmd server [ in the host ]
> >   # trace-cmd record [ in the guest ]
> >   # trace-cmd report [ in the host, to merge the traces ]
> > 
> > To achieve this, we need two things:
> > 
> >  1. Add an interface to obtain the guest TSC offset from the
> >     host user-space.
> > 
> >     Maybe we could have a new sysfs directory, with a file
> >     per vCPU thread and the offset as contents? Or maybe
> >     just add a new entry to /proc/, like: /proc/TID/tsc-offset?
> 
> Yes, the interface is missing.  In the past I have heard people using
> trace events on the host to:
> 
> 1. Collect tsc offsets
> 2. Track which vCPU is scheduled to a host CPU
> 
> So instead of relying on an interface they enable the relevant trace
> events on the host and then parse the trace to collect this information.
> However, it's a bad solution especially for tsc offsets since you may
> wish to trace an already-running VM where the tracepoint that records
> the tsc offset may not fire after startup (?).
> 
> Therefore, I agree that an interface for the tsc offset is needed.

It seems that KVM still has no generic interface to query VM
status, right? How about we create one? As a start, we can
make it fairly simple. Afterward, we can enrich it when
necessary. For example:

We create this directory to store all KVM guest information (or
general KVM dynamic information):

  /sys/hypervisor/kvm/

For each VM, we can have this to store VM-specific info:

  /sys/hypervisor/kvm/$VM_NAME

For each vCPU:

  /sys/hypervisor/kvm/$VM_NAME/cpus/cpuN/

and we can put tsc-offset here like:

  /sys/hypervisor/kvm/$VM_NAME/cpus/cpuN/tsc-offset

Would this be workable?
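
To make the proposal concrete, a user-space consumer of such a layout
might look like the sketch below. Everything here is hypothetical: the
/sys/hypervisor/kvm tree is only a proposal in this thread, and the
assumption that each tsc-offset file holds one decimal integer is made
up for illustration:

```python
from pathlib import Path


def read_tsc_offsets(vm_name, root="/sys/hypervisor/kvm"):
    """Return {vcpu index: tsc offset} for one VM under the proposed layout.

    Reads <root>/<vm_name>/cpus/cpuN/tsc-offset for every cpuN found.
    """
    cpus_dir = Path(root) / vm_name / "cpus"
    offsets = {}
    for entry in sorted(cpus_dir.glob("cpu*")):
        index = entry.name[len("cpu"):]
        offset_file = entry / "tsc-offset"
        if index.isdigit() and offset_file.is_file():
            # Assumed file format: a single decimal integer.
            offsets[int(index)] = int(offset_file.read_text().strip())
    return offsets
```

Taking the root directory as a parameter keeps the sketch testable
against a temporary directory, since no such sysfs tree exists today.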

Thanks.

-- peterx

* Re: [Qemu-devel] [RFC] host and guest kernel trace merging
  2016-03-24  8:42   ` Peter Xu
@ 2016-03-24 10:13     ` Stefan Hajnoczi
  2016-03-25  2:22       ` Peter Xu
  0 siblings, 1 reply; 14+ messages in thread
From: Stefan Hajnoczi @ 2016-03-24 10:13 UTC (permalink / raw)
  To: Peter Xu
  Cc: kvm, Stefan Hajnoczi, yoshihiro.yunomae.ez, mtosatti, qemu-devel,
	rostedt, Luiz Capitulino, linux-trace-users, pbonzini

On Thu, Mar 24, 2016 at 04:42:42PM +0800, Peter Xu wrote:
> On Fri, Mar 04, 2016 at 11:19:33AM +0000, Stefan Hajnoczi wrote:
> > On Thu, Mar 03, 2016 at 02:35:01PM -0500, Luiz Capitulino wrote:
> > > trace-cmd-server
> > > ================
> > > 
> > > Everything I described could look like this:
> > > 
> > >   # trace-cmd server [ in the host ]
> > >   # trace-cmd record [ in the guest ]
> > >   # trace-cmd report [ in the host, to merge the traces ]
> > > 
> > > To achieve this, we need two things:
> > > 
> > >  1. Add an interface to obtain the guest TSC offset from the
> > >     host user-space.
> > > 
> > >     Maybe we could have a new sysfs directory, with a file
> > >     per vCPU thread and the offset as contents? Or maybe
> > >     just add a new entry to /proc/, like: /proc/TID/tsc-offset?
> > 
> > Yes, the interface is missing.  In the past I have heard people using
> > trace events on the host to:
> > 
> > 1. Collect tsc offsets
> > 2. Track which vCPU is scheduled to a host CPU
> > 
> > So instead of relying on an interface they enable the relevant trace
> > events on the host and then parse the trace to collect this information.
> > However, it's a bad solution especially for tsc offsets since you may
> > wish to trace an already-running VM where the tracepoint that records
> > the tsc offset may not fire after startup (?).
> > 
> > Therefore, I agree that an interface for the tsc offset is needed.
> 
> It seems that KVM still has no such a generic interface to query VM
> status, right? How about we create one for it? As a start, we can
> make it fairly simple. Afterward, we can enrich it when
> necessary. For example:
> 
> we create this directory to store all KVM guest informations (or
> general KVM dynamic informations):
> 
>   /sys/hypervisor/kvm/
> 
> For each VM, we can have this to store VM specific infos:
> 
>   /sys/hypervisor/kvm/$VM_NAME
> 
> For each vCPU:
> 
>   /sys/hypervisor/kvm/$VM_NAME/cpus/cpuN/
> 
> and we can put tsc-offset here like:
> 
>   /sys/hypervisor/kvm/$VM_NAME/cpus/cpuN/tsc-offset
> 
> Would this be workable?

There are probably race conditions if the tsc offset is queried
independently from the trace collection.  For example, imagine the host
is suspended right when tracing begins.  I think the TSC could be adjusted
when the host wakes up again.

Ideally the TSC information would be part of the trace data so that
there are no race conditions when interpreting time stamps.
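
A rough sketch of what "TSC information as part of the trace data" could
mean for a consumer: offset changes appear as records in the same stream
as the events, and each guest timestamp is converted with the offset in
effect at that point. The record layout and names are hypothetical, and
the conversion assumes guest_tsc = host_tsc + offset (modulo 2^64):

```python
MASK64 = (1 << 64) - 1


def guest_to_host_timeline(records):
    """Convert guest TSC stamps to host TSC using in-stream offset records.

    records is an ordered iterable of either
      ("offset", tsc_offset)      a new offset takes effect from here on
      ("event", guest_tsc, name)  a guest event stamped with the guest TSC
    """
    offset = None
    converted = []
    for record in records:
        if record[0] == "offset":
            offset = record[1]
        elif offset is None:
            raise ValueError("event seen before any offset record")
        else:
            _, guest_tsc, name = record
            converted.append(((guest_tsc - offset) & MASK64, name))
    return converted


# Made-up stream: the offset changes from 100 to 200 between two events.
STREAM = [
    ("offset", 100),
    ("event", 1100, "hrtimer_interrupt"),
    ("offset", 200),
    ("event", 1300, "tick_program_event"),
]
print(guest_to_host_timeline(STREAM))
```

Because the offset travels in the same ordered stream as the events,
there is no window in which a stale offset is applied to a new event.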

Maybe TSC changes are so infrequent that this doesn't matter...

Perhaps folks who understand TSC better have an opinion on this.

Stefan

* Re: [Qemu-devel] [RFC] host and guest kernel trace merging
  2016-03-24  5:16     ` Peter Xu
@ 2016-03-24 13:02       ` Luiz Capitulino
  2016-03-25  1:53         ` Peter Xu
  0 siblings, 1 reply; 14+ messages in thread
From: Luiz Capitulino @ 2016-03-24 13:02 UTC (permalink / raw)
  To: Peter Xu
  Cc: kvm, Stefan Hajnoczi, yoshihiro.yunomae.ez, mtosatti, qemu-devel,
	Steven Rostedt, linux-trace-users, stefanha, pbonzini

On Thu, 24 Mar 2016 13:16:20 +0800
Peter Xu <peterx@redhat.com> wrote:

> Hi, Steven,
> 
> On Fri, Mar 04, 2016 at 08:23:11AM -0500, Steven Rostedt wrote:
> > My idea for a trace-cmd server, is to have a --client operation, for
> > running on the guest.
> > 
> >  trace-cmd server --client <connection>
> > 
> > The connection will be some socket, either network or something
> > directly attached to the host.
> > 
> > Then on the host, we can have
> > 
> >   trace-cmd server --connect <guest>
> > 
> > Where the server will create a connection to the guest.
> > 
> > And then, you could run on the host:
> > 
> >   trace-cmd record <host-events> --connect <guest> <guest-events>
> > 
> > And this will start recording host events, and then connect to the
> > local server that connects to the guest(s) and that will start tracing
> > on the guest as well.
> > 
> > Then events on the guest will be passed to the host server.
> > 
> > Something like this is my idea. We can work out the details on the best
> > way to get things working. We may be able to eliminate the host server
> > middle man. But I envision that we need a trace-cmd server running on
> > the guest to start off the commands.
> 
> Not sure whether I fully understand the above, but it seems that we
> can remove the host server middle man (as you have mentioned).
> Moreover, I am not sure whether we can use this for multiple hosts as well,

Honest question, what's the multiple hosts use-case?

I would start by thinking about the most simple use-case: a host
and a guest with a single vCPU. Then add vCPUs, and then add multiple
guests.


* Re: [Qemu-devel] [RFC] host and guest kernel trace merging
  2016-03-24 13:02       ` Luiz Capitulino
@ 2016-03-25  1:53         ` Peter Xu
  0 siblings, 0 replies; 14+ messages in thread
From: Peter Xu @ 2016-03-25  1:53 UTC (permalink / raw)
  To: Luiz Capitulino
  Cc: kvm, Stefan Hajnoczi, yoshihiro.yunomae.ez, mtosatti, qemu-devel,
	Steven Rostedt, linux-trace-users, stefanha, pbonzini

On Thu, Mar 24, 2016 at 09:02:08AM -0400, Luiz Capitulino wrote:
> On Thu, 24 Mar 2016 13:16:20 +0800
> Peter Xu <peterx@redhat.com> wrote:
> 
> > Hi, Steven,
> > 
> > On Fri, Mar 04, 2016 at 08:23:11AM -0500, Steven Rostedt wrote:
> > > My idea for a trace-cmd server, is to have a --client operation, for
> > > running on the guest.
> > > 
> > >  trace-cmd server --client <connection>
> > > 
> > > The connection will be some socket, either network or something
> > > directly attached to the host.
> > > 
> > > Then on the host, we can have
> > > 
> > >   trace-cmd server --connect <guest>
> > > 
> > > Where the server will create a connection to the guest.
> > > 
> > > And then, you could run on the host:
> > > 
> > >   trace-cmd record <host-events> --connect <guest> <guest-events>
> > > 
> > > And this will start recording host events, and then connect to the
> > > local server that connects to the guest(s) and that will start tracing
> > > on the guest as well.
> > > 
> > > Then events on the guest will be passed to the host server.
> > > 
> > > Something like this is my idea. We can work out the details on the best
> > > way to get things working. We may be able to eliminate the host server
> > > middle man. But I envision that we need a trace-cmd server running on
> > > the guest to start off the commands.
> > 
> > Not sure whether I fully understand the above, but it seems that we
> > can remove the host server middle man (as you have mentioned).
> > Moreover, I am not sure whether we can use this for multiple hosts as well,
> 
> Honest question, what's the multiple hosts use-case?
> 
> I would start by thinking about the most simple use-case: a host
> and a guest with a single vCPU. Then add vCPUs, and then add multiple
> guests.

For the multiple-host use case: maybe some direct communication between
two or more host kernels? For example, I remember that there are
kernel-level distributed file systems (like Lustre?) whose protocol
talks directly at the kernel level.

Anyway, this is only a very immature idea.  What I feel is that there
are lots of common things between "one guest and one host tracing" and
"two hosts tracing" when the guests are VMs, since a VM itself is some
kind of host as well (if we talk about docker guest instances, things
might differ, but here we mainly talk about QEMU/KVM ones).  And I do
not mean to "do it all at once". :) What I mean is more like "we'd
better keep the interface design in mind, in case we might need
multiple-host tracing in the future".

So, if we are to do it, maybe we can just start with the "host and
guest" case, and only keep the interface ready for the "multiple hosts"
case.  If we finally decide that we do not need the multiple-host case,
we can just skip it.

Anyway, I fully agree with you that we should start from the simple
ones. :)

Thanks.

-- peterx


* Re: [Qemu-devel] [RFC] host and guest kernel trace merging
  2016-03-24 10:13     ` Stefan Hajnoczi
@ 2016-03-25  2:22       ` Peter Xu
  0 siblings, 0 replies; 14+ messages in thread
From: Peter Xu @ 2016-03-25  2:22 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: kvm, Stefan Hajnoczi, yoshihiro.yunomae.ez, mtosatti, qemu-devel,
	rostedt, Luiz Capitulino, linux-trace-users, pbonzini

On Thu, Mar 24, 2016 at 10:13:17AM +0000, Stefan Hajnoczi wrote:
> There are probably race conditions if the tsc offset is queried
> independently from the trace collection.  For example, imagine the host
> is suspended right when tracing begins.  I think the TSC could be adjusted
> when the host wakes up again.

Right... So maybe we should never allow a tsc-offset change
(e.g. suspend) to happen during host-guest tracing?  It seems more like
a question of "whether we can do a merge" rather than "how to read a
correct offset"... If it changes, we cannot do the merge any more
(unless we record the offset for each guest entry)...

> 
> Ideally the TSC information would be part of the trace data so that
> there are no race conditions when interpreting time stamps.

Agree. We should keep the tsc-offset in the trace data.
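
To make the idea concrete, here is a rough sketch of a merge where each
guest entry carries the offset that was in effect when it was recorded.
It assumes the usual x86 relation guest_tsc = host_tsc + tsc_offset
(mod 2^64) with no TSC scaling; the record layouts and function names
are made up for illustration and are not trace-cmd's format.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple, Tuple

MASK64 = (1 << 64) - 1  # TSC values and offsets are 64-bit quantities


class HostEvent(NamedTuple):
    host_tsc: int
    name: str


class GuestEvent(NamedTuple):
    guest_tsc: int
    tsc_offset: int  # hypothetical per-entry field: offset at record time
    name: str


def guest_to_host_tsc(guest_tsc: int, tsc_offset: int) -> int:
    """Undo guest_tsc = host_tsc + tsc_offset (mod 2**64), no scaling."""
    return (guest_tsc - tsc_offset) & MASK64


def merge_traces(
    host: Iterable[HostEvent],
    guest: Iterable[GuestEvent],
) -> Iterator[Tuple[int, str, str]]:
    """Yield (host_tsc, origin, name) in host TSC order.

    Assumes the host events are already sorted by host_tsc.
    """
    host_stream = ((e.host_tsc, "host", e.name) for e in host)
    # Re-sort guest events after conversion: an offset change in the
    # middle of the trace can reorder them on the host time line.
    guest_stream = sorted(
        (guest_to_host_tsc(e.guest_tsc, e.tsc_offset), "guest", e.name)
        for e in guest
    )
    return heapq.merge(host_stream, guest_stream)
```

With a per-entry offset, a change in the middle of the trace no longer
prevents the merge, because each guest entry is converted with its own
value.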

Thanks.

-- peterx


Thread overview: 14+ messages
2016-03-03 19:35 [Qemu-devel] [RFC] host and guest kernel trace merging Luiz Capitulino
2016-03-04 11:19 ` Stefan Hajnoczi
2016-03-04 13:23   ` Steven Rostedt
2016-03-07 15:17     ` Stefan Hajnoczi
2016-03-07 15:49       ` Steven Rostedt
2016-03-07 16:10         ` Eric Blake
2016-03-07 16:26           ` Steven Rostedt
2016-03-07 17:13             ` Paolo Bonzini
2016-03-24  5:16     ` Peter Xu
2016-03-24 13:02       ` Luiz Capitulino
2016-03-25  1:53         ` Peter Xu
2016-03-24  8:42   ` Peter Xu
2016-03-24 10:13     ` Stefan Hajnoczi
2016-03-25  2:22       ` Peter Xu
