* RE: Proposal for Xen support of performance monitoringanddebug hardware
@ 2005-04-23  2:27 Santos, Jose Renato G (Jose Renato Santos)
  2005-04-25 15:17 ` William Cohen
  0 siblings, 1 reply; 3+ messages in thread
From: Santos, Jose Renato G (Jose Renato Santos) @ 2005-04-23  2:27 UTC (permalink / raw)
  To: William Cohen, Ian Pratt
  Cc: Turner, Yoshio, Aravind Menon, xen-devel, G John Janakiraman



  William,

  Please, see my comments embedded in the text below.

> -----Original Message-----
> From: xen-devel-bounces@lists.xensource.com 
> [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of 
> William Cohen
> Sent: Friday, April 22, 2005 2:03 PM
> To: Ian Pratt
> Cc: xen-devel@lists.xensource.com
> Subject: Re: [Xen-devel] Proposal for Xen support of 
> performance monitoringanddebug hardware
> 
> 
> Ian Pratt wrote:
> >  
> > 
> >>I have been working on a proposal to add Xen support for
> >>performance monitoring and debugging hardware. The goal of 
> >>this would be to enable OProfile, perfmon, and perfctr to work 
> >>on Xen. The proposal is still pretty preliminary, but I would 
> >>like comments on the current version.
> > 
> > 
> > William, have you seen the patches from Jose Renato Santos for multi
> > VM oprofile support? We're planning on getting these checked in to the
> > xen repo, after a little reworking.
> > 
> > It's somewhat orthogonal to your msr protection scheme, but you should
> > be aware of it.
> 
> Rik van Riel pointed me at Santos's patch for oprofile support.
> There are some differences between the two approaches. The Xen oprofile
> support by HP pretty much just supports oprofile and was designed to get
> some information about what was going on in the Xen hypervisor. It
> doesn't provide access to the other performance monitoring (or
> debugging) hardware.
> 

  I agree. It would be useful to give domains low level
  access to the MSRs, for supporting a larger set of tools.

> > I can certainly see some merit in having fine grained access control
> > over MSRs, though for the case of perf counter registers I wonder
> > whether we'd be better off with some higher-level interface?
> 
> I was aiming for minimal low-level support, trying to follow the
> existing Xen approach of not coding too much knowledge about the system
> in Xen. Make the MSR registers visible and make sure that a guest OS
> cannot clobber other guest OSs. The guest OSs decide how to use the
> performance monitoring hw.  The hypervisor needs a list of which
> registers are in which class, but the hypervisor doesn't need to know
> the details of what the registers do.
> 
> There are significant variations in the precise events and constraints on
> the combinations of events allowed in many of the performance monitoring
> systems. OProfile has files for each architecture to map events to the
> counter setup.  There are a lot of variations in the events available on
> a processor; OProfile doesn't hide those differences. The University of
> Tennessee, Knoxville PAPI has abstractions to hide some of these
> differences with generic events, e.g. cache miss event.  ppc64 (aix) and
> ia64 (perfmon) have libraries to do the complicated constraints testing
> to determine whether events can be done at the same time. However, these
> mapping operations are handled in user-space, not in the kernel.
> 
> I am not sure that that should be pushed into the hypervisor. I suspect
> that someone will complain that the high-level interface doesn't handle
> some particular instrumentation mode of the performance monitoring
> hardware. Adding it to Xen will require rebuilding xen and the guest OS
> and rebooting the entire machine. The low-level interface makes the
> guest OS responsible, so only the guest OS would need to be recompiled
> and rebooted.
> 

  I agree with your point. Providing low level access to MSR
  seems the right approach, if you want to provide support
  for other tools besides OProfile. I also agree that 
  it would be too complex to provide a high level abstraction
  of performance events across different architectures in the 
  hypervisor.
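
The class-based scheme discussed above (the hypervisor knows which class each register belongs to, but not what the register does) can be modelled roughly as follows. This is only a sketch under assumptions: the class names, the `MsrPolicy` type, and the register indices are hypothetical placeholders and do not come from the proposal or from Xen.

```python
from enum import Enum


class MsrClass(Enum):
    # Hypothetical class names; the proposal's actual classes are not listed in this thread.
    PERF_CONTROL = "perf_control"
    PERF_COUNTER = "perf_counter"
    DEBUG = "debug"


# Hypothetical table mapping an MSR index to its class.
# The indices are placeholders, not real register numbers.
MSR_CLASSES = {
    0x100: MsrClass.PERF_CONTROL,
    0x101: MsrClass.PERF_COUNTER,
    0x200: MsrClass.DEBUG,
}


class MsrPolicy:
    """Tracks which register classes each domain may touch."""

    def __init__(self, classes):
        self._classes = dict(classes)
        self._grants = {}  # domain id -> set of MsrClass

    def grant(self, domid, msr_class):
        self._grants.setdefault(domid, set()).add(msr_class)

    def may_access(self, domid, msr):
        msr_class = self._classes.get(msr)
        if msr_class is None:
            return False  # registers not in the table are denied by default
        return msr_class in self._grants.get(domid, set())
```

The check needs only the table and the grants; nothing in it encodes what an individual register does, which is the property argued for above.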

> > What other msr's do you anticipate your scheme being used to provide
> > restricted access to for selected VMs?
> 
> The sampling used by OProfile would naturally be something high on the
> list of things to use. It would also be nice to be able to do the
> stopwatch counting provided by perfctr and perfmon.
> 
> The PPC64, IA64 and Pentium 4 have precise event sampling. I would
> like to be able to access those through the hypervisor.
> 
> -Will
> 

  I agree with Ian's comments in his reply to this same email.
  While Xenoprof is useful for providing system wide profiling, I can
  see it would be useful to have virtualization of MSRs and enable
  domains to have individual hardware performance monitoring
  capabilities.
  We were also thinking along these lines and planning to
  extend Xenoprof to have MSR virtualization.

  I did not understand how your global scope for MSR access would work.
  It seems you were planning to provide system wide profiling with this.
  (Please clarify if this is not the case.) I see the following
  problems with this approach, if I understood it correctly (from
  an OProfile point of view):
  1) It would not be possible to profile hypervisor code, since
     interrupts caused by hardware overflow would be handled by the
     domain. When the domain starts executing, the information about
     what Xen code was running at the time of MSR overflow is lost.
     In Xenoprof we handle the MSR interrupts inside the hypervisor
     and save the PC value at that time, enabling the profiling of
     hypervisor code. An additional complication is the use of normal
     IRQs instead of NMI. This would prevent performance profiling
     of some parts of the kernel (including interrupt handlers).
  2) It seems you plan to have interrupts that occur in other
     domains delivered to the owner of the MSR. A potential
     problem with this approach is that this could cause additional
     domain context switching (to schedule the owner domain to 
     handle the interrupt) and this could change your profiling
     results. In addition, it is not clear how the interrupt
     handler would get information about the PC sample at the
     time of MSR overflow. Even if it was possible to receive this
     information from the hypervisor, we would still need a way
     to map this PC value to the right process and associated
     binary file running on the other domain, which seems difficult.

  I think both system wide profiling and single domain (virtualized)
  profiling are important and it would be nice to have both.
  As Ian mentioned we cannot have both at the same time,
  at least for the same MSR. However, it would be possible to have
  some registers being virtualized and others being used 
  for system wide profiling, at the same time.
  It would be nice to have a unified framework that could provide
  both functionalities and a way to select between them.

  Renato 
  
  
  
  

* Re: Proposal for Xen support of performance monitoringanddebug hardware
  2005-04-23  2:27 Proposal for Xen support of performance monitoringanddebug hardware Santos, Jose Renato G (Jose Renato Santos)
@ 2005-04-25 15:17 ` William Cohen
  0 siblings, 0 replies; 3+ messages in thread
From: William Cohen @ 2005-04-25 15:17 UTC (permalink / raw)
  To: Santos, Jose Renato G (Jose Renato Santos)
  Cc: Ian Pratt, Aravind Menon, xen-devel, G John Janakiraman,
	Turner, Yoshio

Santos,

Thanks for the comments. I will take a closer look at the Xen oprofile 
support and see what I can incorporate into the proposal.

Santos, Jose Renato G (Jose Renato Santos) wrote:
> 
>   William,
> 
>   Please, see my comments embedded in the text below.

[...]


>   I agree with Ian's comments in his reply to this same email.
>   While Xenoprof is useful for providing system wide profiling, I can
>   see it would be useful to have virtualization of MSRs and enable
>   domains to have individual hardware performance monitoring
>   capabilities.
>   We were also thinking along these lines and planning to
>   extend Xenoprof to have MSR virtualization.
> 
>   I did not understand how your global scope for MSR access would work.
>   It seems you were planning to provide system wide profiling with this.
>   (Please clarify if this is not the case.) I see the following
>   problems with this approach, if I understood it correctly (from
>   an OProfile point of view):

The system-wide profiling was a relatively new addition to the document 
and it does need some more thought on how all the pieces work.

I was thinking that the xen_msr_allocate function would provide some 
information on how to route the performance monitoring hardware. Select 
scope as GLOBAL for domain 0 to reserve the performance monitoring 
hardware for domain 0. The xen_msr_irq_hander sets the irq for 
performance monitoring to route all perf irq to the domain that reserved 
the perf HW.
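
A rough model of that routing idea, for illustration only. The thread does not give the signature of xen_msr_allocate, so `PerfIrqRouter.allocate` is a hypothetical stand-in, and `VIRTUAL` is an assumed name for the non-global scope.

```python
GLOBAL = "global"    # scope name used in this thread
VIRTUAL = "virtual"  # assumed name for the per-domain scope


class PerfIrqRouter:
    """Decides which domain receives a performance-monitoring interrupt."""

    def __init__(self):
        self.global_owner = None  # domain holding a GLOBAL reservation, if any

    def allocate(self, domid, scope):
        # Hypothetical stand-in for xen_msr_allocate.
        if scope == GLOBAL:
            if self.global_owner is not None and self.global_owner != domid:
                return False  # another domain already reserved the hardware
            self.global_owner = domid
        return True

    def route_irq(self, running_domid):
        # With a GLOBAL reservation, every perf irq goes to the reserving domain;
        # otherwise it goes to whichever domain was running when it fired.
        if self.global_owner is not None:
            return self.global_owner
        return running_domid
```

For example, after `allocate(0, GLOBAL)` an interrupt raised while domain 3 is running is routed to domain 0.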


>   1) It would not be possible to profile hypervisor code, since
>      interrupts caused by hardware overflow would be handled by the
>      domain. When the domain starts executing, the information about
>      what Xen code was running at the time of MSR overflow is lost.
>      In Xenoprof we handle the MSR interrupts inside the hypervisor
>      and save the PC value at that time, enabling the profiling of
>      hypervisor code. An additional complication is the use of normal
>      IRQs instead of NMI. This would prevent performance profiling
>      of some parts of the kernel (including interrupt handlers).

Shouldn't it be possible for the hypervisor to send the needed 
information about address to the irq handler in the domain? From the 
address it should be possible to determine that it is a sample from the 
hypervisor. The overhead of moving things from hypervisor to domain 
might be undesirable.

I have some reservations about using NMI in this case. With OProfile it 
is quite possible to kill the machine by setting a sampling interval to 
be smaller than the overhead incurred by the interrupt servicing 
routine. Allowing NMIs would be a way for a domain to crash the entire 
machine. NMIs do allow better coverage of code.


>   2) It seems you plan to have interrupts that occur in other
>      domains delivered to the owner of the MSR. A potential
>      problem with this approach is that this could cause additional
>      domain context switching (to schedule the owner domain to 
>      handle the interrupt) and this could change your profiling
>      results. In addition, it is not clear how the interrupt
>      handler would get information about the PC sample at the
>      time of MSR overflow. Even if it was possible to receive this
>      information from the hypervisor, we would still need a way
>      to map this PC value to the right process and associated
>      binary file running on the other domain, which seems difficult.

PC values are pretty transient. Memory maps go away. Mapping the PC 
values to something reasonable is still an issue; there is a FIXME in 
the document for this. OProfile has some help in the kernel to convert 
the raw PC value to a dcookie and file offset. This help is not 
available outside the domain.
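
To illustrate why the conversion depends on state that only exists inside the domain, here is a simplified sketch of turning a raw PC into a (file, offset) pair from a process memory map. It is not OProfile's dcookie code; the `Mapping` record and `resolve_pc` are hypothetical names, and the mappings are assumed to be sorted and non-overlapping.

```python
import bisect
from typing import NamedTuple


class Mapping(NamedTuple):
    start: int        # first virtual address of the mapping
    end: int          # one past the last virtual address
    path: str         # backing file
    file_offset: int  # offset in the file that `start` corresponds to


def resolve_pc(mappings, pc):
    """Return (path, offset in file) for pc, or None if no mapping covers it."""
    starts = [m.start for m in mappings]
    i = bisect.bisect_right(starts, pc) - 1  # last mapping starting at or before pc
    if i < 0:
        return None
    m = mappings[i]
    if pc >= m.end:
        return None  # pc falls in a gap between mappings
    return (m.path, pc - m.start + m.file_offset)
```

Once the process exits or unmaps the region, the mapping list is gone and the same PC can no longer be resolved, so the lookup has to happen while the map is still valid and in a place that can see it.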

>   I think both system wide profiling and single domain (virtualized)
>   profiling are important and it would be nice to have both.
>   As Ian mentioned we cannot have both at the same time,
>   at least for the same MSR. However, it would be possible to have
>   some registers being virtualized and others being used 
>   for system wide profiling, at the same time.
>   It would be nice to have a unified framework that could provide
>   both functionalities and a way to select.
> 
>   Renato 

Slicing and dicing the performance monitoring hardware may be possible, 
but it is a complicated operation. There are lots of constraints about 
the combinations that are allowed and not allowed. Combinations like 
inter-domain and intra-domain sampling would be difficult because the 
interrupt would be the same. The allocation software would have to have 
a picture of all the domain allocations. There are lots of constraints 
on which registers can be used for what on pentium 4 and ppc64.

For the time being the proposal will address both global and virtual 
modes but not allow concurrent use of the global and virtual modes.
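
One way to picture that restriction is a small arbiter that admits either one global user or any number of virtual users, never both. This is a sketch under assumptions: `PerfModeArbiter` and its methods are hypothetical names, not part of the proposal.

```python
class PerfModeArbiter:
    """Allows global or virtual use of the perf hardware, but not both at once."""

    MODES = ("global", "virtual")

    def __init__(self):
        self.mode = None      # mode currently in use, or None when idle
        self.holders = set()  # domain ids currently holding the hardware

    def acquire(self, domid, mode):
        if mode not in self.MODES:
            raise ValueError(f"unknown mode: {mode}")
        if self.mode is not None and self.mode != mode:
            return False  # the other mode is active
        if mode == "global" and self.holders and domid not in self.holders:
            return False  # global mode has a single owner
        self.mode = mode
        self.holders.add(domid)
        return True

    def release(self, domid):
        self.holders.discard(domid)
        if not self.holders:
            self.mode = None  # hardware is idle again; either mode may be chosen
```

A domain asking for global mode while any domain still holds a virtual allocation is simply refused until all virtual holders have released.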

-Will


* RE: Proposal for Xen support of performance monitoringanddebug hardware
@ 2005-04-25 21:21 Santos, Jose Renato G (Jose Renato Santos)
  0 siblings, 0 replies; 3+ messages in thread
From: Santos, Jose Renato G (Jose Renato Santos) @ 2005-04-25 21:21 UTC (permalink / raw)
  To: William Cohen
  Cc: Ian Pratt, Aravind Menon, xen-devel, G John Janakiraman,
	Turner, Yoshio



> -----Original Message-----
> From: William Cohen [mailto:wcohen@redhat.com] 
> Sent: Monday, April 25, 2005 8:18 AM
> To: Santos, Jose Renato G (Jose Renato Santos)
> Cc: Ian Pratt; xen-devel@lists.xensource.com; Aravind Menon; 
> G John Janakiraman; Turner, Yoshio
> Subject: Re: [Xen-devel] Proposal for Xen support of 
> performance monitoringanddebug hardware
> 
> 
> Santos,
> 
> Thanks for the comments. I will take a closer look at the Xen oprofile
> support and see what I can incorporate into the proposal.
  
  Good!

[...]

> I was thinking that the xen_msr_allocate function would provide some
> information on how to route the performance monitoring hardware. Select
> scope as GLOBAL for domain 0 to reserve the performance monitoring
> hardware for domain 0. The xen_msr_irq_hander sets the irq for
> performance monitoring to route all perf irq to the domain that reserved
> the perf HW.
> 
   In Xenoprof we have a similar notion, in which one domain 
   receives the interrupts generated by counter overflows in
   other domains (which we call passive domains). In this case,
   profiling is at a coarser domain level (fine grain 
   profiling at application/function level is lost).  
   In general, I think enabling a domain to handle perf counter
   interrupts for other domains is a good thing, but we should
   NOT be limited to this case. It is still useful to have
   interrupts delivered to the running domain for system-wide
   profiling. I think your interface should enable that option
   for the GLOBAL case too.

   
 > 
> >   1) It would not be possible to profile hypervisor code, since
> >      interrupts caused by hardware overflow would be handled by the
> >      domain. When the domain starts executing, the information about
> >      what Xen code was running at the time of MSR overflow is lost.
> >      In Xenoprof we handle the MSR interrupts inside the hypervisor
> >      and save the PC value at that time, enabling the profiling of
> >      hypervisor code. An additional complication is the use of normal
> >      IRQs instead of NMI. This would prevent performance profiling
> >      of some parts of the kernel (including interrupt handlers).
> 
> Shouldn't it be possible for the hypervisor to send the needed
> information about address to the irq handler in the domain? From the
> address it should be possible to determine that it is a sample from the
> hypervisor. The overhead of moving things from hypervisor to domain
> might be undesirable.
> 

  Yes, it is possible for the hypervisor to send PC samples to the
  domain. But this requires saving the PC value at the time of the
  interrupt, i.e. in the physical interrupt handler in the hypervisor.
  This is exactly how Xenoprof works. The NMI handler in Xen stores
  the PC sample in a buffer and triggers a virtual IRQ in the
  domain. The domain interrupt handler reads the sample from the
  buffer. The overhead is insignificant since this is done through
  a shared memory page.
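
The flow just described (hypervisor-side handler records the PC, raises a virtual IRQ, domain-side handler drains the buffer) can be modelled roughly as a single-producer, single-consumer ring. This is an illustrative sketch, not Xenoprof's actual data structure; `SampleBuffer` and its method names are hypothetical, and a Python list stands in for the shared page.

```python
class SampleBuffer:
    """Model of a sample ring shared between the hypervisor and one domain."""

    def __init__(self, capacity):
        self.slots = [None] * capacity
        self.head = 0              # next slot to write (hypervisor side)
        self.tail = 0              # next slot to read (domain side)
        self.lost = 0              # samples dropped because the ring was full
        self.virq_pending = False  # stands in for the virtual IRQ

    def nmi_record(self, pc):
        # Hypervisor side: called from the overflow interrupt handler.
        if self.head - self.tail == len(self.slots):
            self.lost += 1         # never block in interrupt context; drop instead
            return False
        self.slots[self.head % len(self.slots)] = pc
        self.head += 1
        self.virq_pending = True   # notify the domain that samples are waiting
        return True

    def domain_drain(self):
        # Domain side: called from the virtual IRQ handler.
        self.virq_pending = False
        samples = []
        while self.tail != self.head:
            samples.append(self.slots[self.tail % len(self.slots)])
            self.tail += 1
        return samples
```

Because the PC is captured on the hypervisor side, samples taken while Xen code was running are preserved even though the domain only looks at them later.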
  
> I have some reservations about using NMI in this case. With OProfile it
> is quite possible to kill the machine by setting a sampling interval to
> be smaller than the overhead incurred by the interrupt servicing
> routine. Allowing NMIs would be a way for a domain to crash the entire
> machine. NMIs do allow better coverage of code.
> 
> 
   Programming performance counters with small values
   can thrash the system, but this is not restricted to NMI.
   Even with maskable interrupts, the machine will thrash in this
   case. We should prevent this by other means: e.g. by preventing
   the counters from being programmed with small values to begin with,
   or by reprogramming the counter to not generate interrupts
   when a high interrupt rate is detected (i.e. kind of disabling NMI).
   I still think NMI is a better choice since it has better coverage,
   and avoiding it does not solve the problem you mentioned.
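
Both safeguards mentioned above can be sketched in a few lines. This is only an illustration: the threshold values, `clamp_reset_count`, and `OverflowThrottle` are hypothetical and would need tuning against real interrupt-handling cost.

```python
MIN_RESET_COUNT = 10_000  # assumed floor on events between overflows


def clamp_reset_count(requested, minimum=MIN_RESET_COUNT):
    """First safeguard: refuse to program a counter with too small a period."""
    return max(requested, minimum)


class OverflowThrottle:
    """Second safeguard: stop re-arming the interrupt when the rate is too high."""

    def __init__(self, max_per_window, window_ns):
        self.max_per_window = max_per_window
        self.window_ns = window_ns
        self.window_start = None
        self.count = 0

    def on_overflow(self, now_ns):
        # Returns True if the counter should be re-armed to interrupt again.
        if self.window_start is None or now_ns - self.window_start >= self.window_ns:
            self.window_start = now_ns  # start a new measurement window
            self.count = 0
        self.count += 1
        return self.count <= self.max_per_window
```

Neither check depends on whether the overflow is delivered as an NMI or as a maskable interrupt, which is the point being made above.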

> >   2) It seems you plan to have interrupts that occur in other
> >      domains delivered to the owner of the MSR. A potential
> >      problem with this approach is that this could cause additional
> >      domain context switching (to schedule the owner domain to 
> >      handle the interrupt) and this could change your profiling
> >      results. In addition, it is not clear how the interrupt
> >      handler would get information about the PC sample at the
> >      time of MSR overflow. Even if it was possible to receive this
> >      information from the hypervisor, we would still need a way
> >      to map this PC value to the right process and associated
> >      binary file running on the other domain, which seems difficult.
> 
> PC values are pretty transient. Memory maps go away. Mapping the PC
> values to something reasonable is still an issue; there is a FIXME in
> the document for this. OProfile has some help in the kernel to convert
> the raw PC value to a dcookie and file offset. This help is not
> available outside the domain.
> 
  Exactly! That is why it is important to have a framework
  that does not prevent interrupts from being delivered to multiple
  domains in case of system wide profiling. It is better to have
  domains interpreting their own samples if we want fine grain
  profiling.
   
> >   I think both system wide profiling and single domain (virtualized)
> >   profiling are important and it would be nice to have both.
> >   As Ian mentioned we cannot have both at the same time,
> >   at least for the same MSR. However, it would be possible to have
> >   some registers being virtualized and others being used 
> >   for system wide profiling, at the same time.
> >   It would be nice to have a unified framework that could provide
> >   both functionalities and a way to select.
> > 
> >   Renato
> 
> Slicing and dicing the performance monitoring hardware may be possible,
> but it is a complicated operation. There are lots of constraints about
> the combinations that are allowed and not allowed. Combinations like
> inter-domain and intra-domain sampling would be difficult because the
> interrupt would be the same. The allocation software would have to have
> a picture of all the domain allocations. There are lots of constraints
> on which registers can be used for what on pentium 4 and ppc64.
> 
> For the time being the proposal will address both global and virtual 
> modes but not allow concurrent use of the global and virtual modes.
> 

  I agree with your point. It makes sense to have a simple initial
  implementation without concurrent use of global and virtual modes.
  But maybe we could have a generic interface that can accommodate
  this flexibility. This could avoid interface changes in the future.
  Not sure if this is worth it, though... Just a thought ...

  Renato

> -Will
> 
