View for virtual machine monitoring


* View for virtual machine monitoring
@ 2013-07-09 18:32 Mohamad Gebai
From: Mohamad Gebai @ 2013-07-09 18:32 UTC
  To: linuxtools-dev@eclipse.org,
	lttng-dev@lists.lttng.org


Hello,
We are currently working on a new view in Eclipse's TMF plugin (Tracing and
Monitoring Framework) specific to virtual machine analysis. This view requires
kernel traces from the host and from each guest with a set of specific
tracepoints activated. The traces are then merged together and analysed in a
way that the real state of each system can be rebuilt, while taking into
account all the interactions between the different systems.

The main purpose of this view is to easily point out latency problems due to
resource sharing. For now, we only consider CPU time, but more resources (such
as memory allocation, disks...) will be added.

Two screenshots are attached. The first one shows the virtual machines and the
state of their respective virtual CPUs. The second screenshot gives in-depth
information about one of the virtual CPUs, showing only the threads that
interacted with this vCPU and their state during the time of the trace. We
think that this approach of showing information across the layers (OS, KVM,
guest OS, and eventually JVM...) can be helpful to investigate latency-related
problems specific to virtual machines.

Legend:
Green: user mode
Blue: kernel mode
Yellow: process blocked
Purple: vCPU preempted
Grey: vCPU idle

For the sake of our experiment, we pinned vCPU0 of VM1 and vCPU0 of VM2 to the
same physical CPU, and ran a CPU-intensive workload for one second on each of
them. We generated our traces using the low-overhead LTTng tracer. We can
clearly see that during that second, both of the virtual CPUs are fighting over
the same physical CPU.

We seek any thoughts or suggestions on the effectiveness of this view or on our
approach. Any real-life problems waiting to be investigated are also welcome.

Mohamad Gebai

[-- Attachment #2: screenshot_1.png --]
[-- Type: image/png, Size: 36963 bytes --]

[-- Attachment #3: screenshot_2.png --]
[-- Type: image/png, Size: 86040 bytes --]


* Re: View for virtual machine monitoring
@ 2013-07-09 19:12 Thibault, Daniel
From: Thibault, Daniel @ 2013-07-09 19:12 UTC
  To: lttng-dev@lists.lttng.org; +Cc: Mohamad Gebai

----------------------------------------------------------------------
Date: Tue, 09 Jul 2013 14:32:54 -0400
From: Mohamad Gebai <mohamad.gebai@polymtl.ca>
To: linuxtools-dev@eclipse.org, lttng-dev@lists.lttng.org

> For the sake of our experiment, we pinned vCPU0 of VM1 and vCPU0 of VM2 to the same physical CPU, and ran a CPU-intensive workload for one second on each of them.
> We generated our traces using the low-overhead LTTng tracer. We can clearly see that during that second, both of the virtual CPUs are fighting over the same physical CPU.

   You should have said "and ran a CPU-intensive workload for one second on both of them simultaneously".  Otherwise you may be misconstrued as meaning you ran the workloads consecutively.  (The third sentence dispels this incorrect reading, but it's better to get the meaning across unambiguously the first time.)

Daniel U. Thibault
Protection des systèmes et contremesures (PSC) | Systems Protection & Countermeasures (SPC)
Cyber sécurité pour les missions essentielles (CME) | Mission Critical Cyber Security (MCCS)
R & D pour la défense Canada - Valcartier (RDDC Valcartier) | Defence R&D Canada - Valcartier (DRDC Valcartier)
2459 route de la Bravoure
Québec QC  G3J 1X5
CANADA
Vox : (418) 844-4000 x4245
Fax : (418) 844-4538
NAC : 918V QSDJ <http://www.travelgis.com/map.asp?addr=918V%20QSDJ>
Gouvernement du Canada | Government of Canada
<http://www.valcartier.drdc-rddc.gc.ca/>


* Re: View for virtual machine monitoring
@ 2013-07-09 19:23 Aaron Spear
From: Aaron Spear @ 2013-07-09 19:23 UTC
  To: Linux Tools developer discussions; +Cc: lttng-dev@lists.lttng.org

Hello Mohamad!

Your work looks very interesting.  I have been building something quite similar myself, although other work priorities have forced me away from it for a couple of months now.

----- Original Message -----
> Hello,
> We are currently working on a new view in Eclipse's TMF plugin (Tracing and
> Monitoring Framework) specific to virtual machine analysis. This view requires
> kernel traces from the host and from each guest with a set of specific
> tracepoints activated. The traces are then merged together and analysed in a
> way that the real state of each system can be rebuilt, while taking into
> account all the interactions between the different systems.

I assume you are using LTTng for Linux; are you using it for KVM as well?  

I assume then that you are using CTF formatted traces?

Are you using TMF's CTF parser?

> The main purpose of this view is to easily point out latency problems due to
> resource sharing. For now, we only consider CPU time, but more resources (such
> as memory allocation, disks...) will be added.
> 
> Two screenshots are attached. The first one shows the virtual machines and the
> state of their respective virtual CPUs. The second screenshot gives in-depth
> information about one of the virtual CPUs, showing only the threads that
> interacted with this vCPU and their state during the time of the trace. We
> think that this approach of showing information across the layers (OS, KVM,
> guest OS, and eventually JVM...) can be helpful to investigate latency-related
> problems specific to virtual machines.

I agree!

> Legend:
> Green: user mode
> Blue: kernel mode
> Yellow: process blocked
> Purple: vCPU preempted
> Grey: vCPU idle
> 
> For the sake of our experiment, we pinned vCPU0 of VM1 and vCPU0 of VM2 to the
> same physical CPU, and ran a CPU-intensive workload for one second on each of
> them. We generated our traces using the low-overhead LTTng tracer. We can
> clearly see that during that second, both of the virtual CPUs are fighting over
> the same physical CPU.
> 
> We seek any thoughts or suggestions on the effectiveness of this view or on our
> approach. Any real life problems waiting for investigation are also welcome.

I am interested to know how you set up your view under the hood.  Did you build on the code base that was already there for the ControlFlowView (which is what I did), and then use the TMF state system infrastructure to model the state of the various elements you wish to display?

If you look through the history on this list you will see some links I posted to the prototype I was working on (on GitHub), as well as some screenshots.  I went with an approach of making the view a generic display of the hierarchical state of objects vs. time, with pluggable code that understands the event schema, iterates over the events and updates the view.  I like the idea of a data-driven view, since it then becomes fairly straightforward to plug in any sort of state vs. time display in context.  My work is incomplete: it still lacks a number of features that I intend to add, including the ability to have multiple synchronized instances of the view open at the same time, and/or a single view that aggregates the contents of many different traces.  It looks as though you are already doing that, though I can't help but wonder how you defined the hierarchy when different levels in the hierarchy come from different traces.

best regards,
Aaron Spear


* Re: View for virtual machine monitoring
@ 2013-07-09 19:37 Alexandre Montplaisir
From: Alexandre Montplaisir @ 2013-07-09 19:37 UTC
  To: Mohamad Gebai
  Cc: lttng-dev@lists.lttng.org,
	Linux Tools developer discussions

Hi Mohamad,

Quite impressive!

Small detail: I don't know if you use the exact same colors as the
Control Flow View, but the color you use for "vCPU preempted" seems
similar to the one used for the Interrupted (IRQ) state. You should make
sure they can be differentiated if shown side-by-side. In any case, it's
easy to tweak ;)

There is still some designing/reviewing to do on the required concepts,
like experiment types and state systems for experiments. But this is a
very good example of what it will be possible to do once those features
are integrated (hopefully in the coming months!)

Cheers,
Alexandre





* Re: View for virtual machine monitoring
@ 2013-07-09 20:28 Mohamad Gebai
From: Mohamad Gebai @ 2013-07-09 20:28 UTC
  To: Linux Tools developer discussions, Aaron Spear
  Cc: lttng-dev@lists.lttng.org

> Hello Mohamad!
>
> Your work looks very interesting.  I have been forced to be away from it for
> a couple months now due to other work priorities, but I have been building
> something quite similar myself.
>

Hello,
Thank you for your detailed answer!

> ----- Original Message -----
> > Hello,
> > We are currently working on a new view in Eclipse's TMF plugin (Tracing and
> > Monitoring Framework) specific to virtual machine analysis. This view requires
> > kernel traces from the host and from each guest with a set of specific
> > tracepoints activated. The traces are then merged together and analysed in a
> > way that the real state of each system can be rebuilt, while taking into
> > account all the interactions between the different systems.
>
> I assume you are using LTTng for Linux, are you using it for KVM as well?

Yes, I am using KVM's tracepoints that are already in the kernel, and generating
the traces with LTTng. Actually, before taking the screenshots I had disabled a
feature of this view that showed when the CPU was in VMX root mode handling a
VMEXIT. This information was built from KVM's tracepoints and showed the
overhead caused by virtualization.
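A minimal sketch of how such a "VMX root mode" state could be derived from KVM's kvm_exit and kvm_entry tracepoints: each kvm_exit opens an interval for a vCPU, and the next kvm_entry of the same vCPU closes it. The KvmEvent record, and the assumption that every event carries a vcpu_id, are hypothetical simplifications, not LTTng's actual event layout or the view's real code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KvmEvent:
    # Hypothetical, simplified host-side event record.
    name: str          # "kvm_exit" or "kvm_entry"
    timestamp_ns: int
    vcpu_id: int       # assumed to be available on both event types


def root_mode_intervals(events):
    """Pair each kvm_exit with the next kvm_entry of the same vCPU.

    Returns (vcpu_id, start_ns, end_ns) tuples, one per interval during which
    the host was handling a VM exit for that vCPU (VMX root mode).
    """
    pending = {}       # vcpu_id -> timestamp of the unmatched kvm_exit
    intervals = []
    for ev in events:  # events are expected in timestamp order
        if ev.name == "kvm_exit":
            pending[ev.vcpu_id] = ev.timestamp_ns
        elif ev.name == "kvm_entry":
            start = pending.pop(ev.vcpu_id, None)
            if start is not None:  # ignore an entry with no preceding exit
                intervals.append((ev.vcpu_id, start, ev.timestamp_ns))
    return intervals


def virtualization_overhead_ns(events):
    """Total time each vCPU spent in VMX root mode."""
    totals = {}
    for vcpu, start, end in root_mode_intervals(events):
        totals[vcpu] = totals.get(vcpu, 0) + (end - start)
    return totals


trace = [
    KvmEvent("kvm_entry", 100, 0),
    KvmEvent("kvm_exit", 250, 0),
    KvmEvent("kvm_entry", 265, 0),
    KvmEvent("kvm_exit", 400, 0),
    KvmEvent("kvm_entry", 430, 0),
]
print(virtualization_overhead_ns(trace))  # {0: 45}
```

The intervals map naturally onto a per-vCPU state in a state system (one more color in the view), while the summed durations give a quick per-vCPU measure of the virtualization overhead.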

>
> I assume then that you are using CTF formatted traces?

Yes.

> Are you using TMF's CTF parser?

Yes, I am using the CTF parser from TMF.

> > The main purpose of this view is to easily point out latency problems due to
> > resource sharing. For now, we only consider CPU time, but more resources (such
> > as memory allocation, disks...) will be added.
> >
> > Two screenshots are attached. The first one shows the virtual machines and the
> > state of their respective virtual CPUs. The second screenshot gives in-depth
> > information about one of the virtual CPUs, showing only the threads that
> > interacted with this vCPU and their state during the time of the trace. We
> > think that this approach of showing information across the layers (OS, KVM,
> > guest OS, and eventually JVM...) can be helpful to investigate latency-related
> > problems specific to virtual machines.
>
> I agree!
>
> I am interested to know how you setup your view under the hood.  Did you
> build from the code base that was already there with the ControlFlowView
> (which is what I did), and then using the TMF state system infrastructure to
> model state of the various elements you wish to display?

Actually, I used the ResourceView as the base for my view, since my view mixes
different data types (VMs, vCPUs, threads and eventually interrupts).

> If you look through the history on this list you will see some links that I
> posted to the prototype that I was working with on github as well as some
> screenshots.

Yes, I have seen your previous work in TMF.

> I went with an approach of trying to make the view a generic
> display of hierarchical state of objects vs time, and then pluggable code
> that understands the event schema, iterating the events and updating the
> view.  I like the idea of having a view that can be data driven and so it is
> then fairly straightforward to plug in any sort of state vs. time in context
> display.  My work is incomplete, it still lacks a number of features that I
> intend to add including the ability to have multiple instances of the view
> open at the same time, all synchronized and/or a single view that aggregates
> the contents of many different traces.  It looks as though you are already
> doing that, though I can't help but wonder how you defined the hierarchy when
> different levels in the hierarchy have different traces.

For now, we create an Experiment with a predefined type (the VM analysis type)
and add to it all of the traces we want to analyse. At this step, we also
specify which traces were recorded on the host and which were recorded on the
guests.
Thanks to Genevieve Bastien's experimental work, we were able to merge all of
the traces into a single trace owned by the Experiment. The events are then
handled one by one and, depending on whether they came from the host or from a
guest, the state system is modified accordingly.
I will very soon post a link to my experimental branch for this view if you
want to take a closer look at the underlying code.
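As a rough illustration of the approach described above (a sketch, not the actual TMF implementation), the code below merges several per-machine traces, each already sorted by timestamp, into one stream with Python's heapq.merge, then dispatches every event to a host or guest handler based on the host/guest roles declared when the experiment is set up. The Event type, the vcpu_id/cpu/next_tid fields and the state keys are hypothetical, and timestamps are assumed to already share a common time base (in practice host and guest clocks may first need to be synchronized).

```python
import heapq
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Set


@dataclass(frozen=True)
class Event:
    # Hypothetical, simplified event record (not the TMF/CTF API).
    timestamp_ns: int
    trace: str                      # name of the trace the event came from
    name: str                       # e.g. "kvm_entry", "sched_switch"
    fields: Dict[str, int] = field(default_factory=dict)


def merge_traces(traces: Dict[str, List[Event]]) -> Iterator[Event]:
    """Lazily merge per-trace event lists (each sorted by time) into one stream."""
    return heapq.merge(*traces.values(), key=lambda e: e.timestamp_ns)


class VmStateSketch:
    """Toy stand-in for a state system, keyed by attribute-path tuples."""

    def __init__(self, host_traces: Set[str]):
        self.host_traces = host_traces  # declared when the experiment is set up
        self.state = {}

    def handle(self, ev: Event) -> None:
        # Dispatch on where the event was recorded: host or guest.
        if ev.trace in self.host_traces:
            self._handle_host(ev)
        else:
            self._handle_guest(ev)

    def _handle_host(self, ev: Event) -> None:
        vcpu = ev.fields.get("vcpu_id")
        if ev.name == "kvm_entry":
            self.state[("vcpu", vcpu)] = "guest"     # vCPU now runs guest code
        elif ev.name == "kvm_exit":
            self.state[("vcpu", vcpu)] = "vmx_root"  # host handles a VM exit

    def _handle_guest(self, ev: Event) -> None:
        if ev.name == "sched_switch":
            key = (ev.trace, "cpu", ev.fields["cpu"], "current_tid")
            self.state[key] = ev.fields["next_tid"]


host = [Event(10, "host", "kvm_entry", {"vcpu_id": 0}),
        Event(40, "host", "kvm_exit", {"vcpu_id": 0})]
vm1 = [Event(20, "vm1", "sched_switch", {"cpu": 0, "next_tid": 1234})]

sm = VmStateSketch(host_traces={"host"})
for ev in merge_traces({"host": host, "vm1": vm1}):
    sm.handle(ev)
print(sm.state)
# {('vcpu', 0): 'vmx_root', ('vm1', 'cpu', 0, 'current_tid'): 1234}
```

Declaring the host/guest split once, at experiment setup, lets a single pass over the merged stream update one shared state. Mapping a guest's vCPUs to the corresponding host-side vCPU threads, which a real view has to do, is deliberately left out here.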

Thanks again for your response,
Mohamad.


* Re: View for virtual machine monitoring
@ 2013-07-09 20:31 Mohamad Gebai
From: Mohamad Gebai @ 2013-07-09 20:31 UTC
  To: Alexandre Montplaisir
  Cc: lttng-dev@lists.lttng.org,
	Linux Tools developer discussions

> Hi Mohamad,
>
> Quite impressive!

Hi Alex,
Thank you!

> Small detail: I don't know if you use the exact same colors as the
> Control Flow View, but the color you use for "vCPU preempted" seems
> similar to the one used for the Interrupted (IRQ) state. You should make
> sure they can be differentiated if shown side-by-side. In any case, it's
> easy to tweak ;)

Actually, I started from the Resource View. I agree that the colors are similar;
I hadn't noticed because I don't handle interrupt events for now. Thank you
for the comment!

> There is still some designing/reviewing to do on the required concepts,
> like experiment types and state systems for experiments. But this is a
> very good example of what it will be possible to do once those features
> are integrated (hopefully in the coming months!)

I agree!

Cheers,
Mohamad.


* Re: View for virtual machine monitoring
@ 2013-07-09 20:43 Mohamad Gebai
From: Mohamad Gebai @ 2013-07-09 20:43 UTC
  To: Mohamad Gebai
  Cc: lttng-dev@lists.lttng.org,
	Linux Tools developer discussions



> How it works for now is that we have to create an Experiment with a predefined
> type (VM analysis type), and add to it all of the traces we want to analyse. At
> this step, we also have to specify which traces were recorded on the host, and
> which ones were recorded on the guests.
> Thanks to Genevieve Bastien's experimental work, we were able to merge all of
> the traces together as one single trace owned by the Experiment. Then the
> events are handled one by one, and depending on if they came from a host, or
> from a guest, the state system is modified accordingly.

I attached a screenshot showing the state system built for this particular
Experiment.

Mohamad.

[-- Attachment #2: state_system.png --]
[-- Type: image/png, Size: 72144 bytes --]


* Re: View for virtual machine monitoring
@ 2013-07-11 18:08 Mohamad Gebai
From: Mohamad Gebai @ 2013-07-11 18:08 UTC
  To: Thibault, Daniel; +Cc: lttng-dev@lists.lttng.org

> > For the sake of our experiment, we pinned vCPU0 of VM1 and vCPU0 of VM2 to
> > the same physical CPU, and ran a CPU-intensive workload for one second on
> > each of them.
> > We generated our traces using the low-overhead LTTng tracer. We can clearly
> > see that during that second, both of the virtual CPUs are fighting over the
> > same physical CPU.
>
>    You should have said "and ran a CPU-intensive workload for one second on
> both of them simultaneously".  Otherwise you may be misconstrued as meaning
> you ran the workloads consecutively.  (The third sentence dispels this
> incorrect reading, but it's better to get the meaning across unambiguously
> the first time)

I agree! Thank you for the feedback,
Mohamad.
