* Xen, oprofile, perf, PEBS, event counters, PVHVM, PV
@ 2013-01-14 20:45 Konrad Rzeszutek Wilk
  2013-01-15  9:13 ` Jan Beulich
  2013-01-15 17:11 ` Marcus Granado
  0 siblings, 2 replies; 7+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-01-14 20:45 UTC (permalink / raw)
  To: marcus.granado, msw, JBeulich, boris.ostrovsky, jacob.shin,
	Sherry.Hurwitz, jun.nakajima, kurt.hackel, Marcos.Matsunaga,
	andrew.thomas, xen-devel


I've been looking at doing some perf analysis of the block and network
backends to get a better idea of whether the things I spotted in the ring
protocol are indeed a problem.

But I ran into a problem actually trying to use the performance tools.
I am wondering whether the folks on this email have run into similar
issues. This is with the oprofile support in the Linux kernel backported by
Michel Petullo and fixed up by me (though I might have introduced
bugs too):
http://git.kernel.org/?p=linux/kernel/git/konrad/xen.git;a=shortlog;h=refs/heads/devel/oprofile.v1
with Xen 4.3-unstable.


a). 32/64 compat is missing backtrace support. If you run with a 32-bit dom0
    and try to set the backtrace, the hypervisor sets it to some huge number.
    There may be other hypercalls that need compat tweaks as well.

b). The 32-bit dom0 oprofile toolstack truncates the EIP of 64-bit guests
    (or of the hypervisor). I am not really sure how to solve that except just
    not running 64-bit guests/hypervisor with a 32-bit dom0. Or making
    oprofile and its tools capable of handling 64-bit architectures.
    The reverse problem does not exist - I can profile 32-bit
    guests using a 64-bit dom0.

c). HVM guest tracing. I tried running oprofile with a PVHVM Linux
    guest (32 or 64-bit) and I get data, but /dev/oprofile/* worryingly
    tells me that sample_invalid_eip is about 10%. So 10% of the sample
    data is unknown.

c1). HVM guests did not profile at all on Intel (I have not tried AMD). Andrew Thomas
    suggested a patch which did indeed fix the problem (see the patch attached at the end).

d). opreport. I had the most difficult time understanding it. I finally
    figured out that it collapses the guests' vCPU information onto the domain
    0 vCPUs. So if you run something like this:

	Domain-0                             0     0    0   r--      27.1  0
	Domain-0                             0     1    1   -b-       6.3  1
	Domain-0                             0     2    2   -b-       5.3  2
	Domain-0                             0     3    3   ---      10.6  3
	backend                              1     0    1   r--      22.2  0-1
	backend                              1     1    0   -b-      23.7  0-1
	netperf                              2     0    2   -b-      11.0  2
	netserver                            3     0    3   r--      12.8  3


   The opreport output is going to show me netperf vcpu0, netserver vcpu0 and backend vcpu0
   all on domain 0's vcpu0. So if I run oprofile long enough and do:

	opreport -l vmlinux-netperf xen-syms -t


	CPU: Intel Sandy Bridge microarchitecture, speed 3092.89 MHz (estimated)
	Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 90000
	Samples on CPU 0
	Samples on CPU 1
	Samples on CPU 2
	Samples on CPU 3
	samples  %        samples  %        samples  %        samples  %        image name               app name                 symbol name
	13058     3.5286  0              0  0              0  0              0  xen-syms                 qemu-dm                  get_page_from_gfn_p2m
	12671     3.4240  0              0  0              0  0              0  xen-syms                 qemu-dm                  hap_p2m_ga_to_gfn_4_levels
	11382     3.0757  0              0  0              0  0              0  xen-syms                 qemu-dm                  hap_gva_to_gfn_4_levels
	9276      2.5066  0              0  0              0  0              0  xen-syms                 qemu-dm                  __hvm_copy
	9146      2.4715  0              0  0              0  0              0  xen-syms                 qemu-dm                  paging_gva_to_gfn
	8069      2.1804  0              0  0              0  0              0  xen-syms                 qemu-dm                  guest_walk_tables_4_levels
	7228      1.9532  0              0  0              0  0              0  xen-syms                 qemu-dm                  __get_gfn_type_access
	6387      1.7259  0              0  0              0  0              0  xen-syms                 qemu-dm                  gnttab_copy
	6216      1.6797  0              0  0              0  0              0  xen-syms                 qemu-dm                  ept_get_entry
	6091      1.6459  0              0  0              0  0              0  vmlinux-netperf          qemu-dm                  inet_sendmsg
	5954      1.6089  0              0  0              0  0              0  vmlinux-netperf          qemu-dm                  sock_sendmsg
	5865      1.5849  0              0  0              0  0              0  vmlinux-netperf          qemu-dm                  handle_irq_event_percpu
	5851      1.5811  0              0  0              0  0              0  vmlinux-netperf          qemu-dm                  handle_irq_event
	5771      1.5595  0              0  0              0  0              0  vmlinux-netperf          qemu-dm                  xennet_interrupt
	5697      1.5395  0              0  0              0  0              0  vmlinux-netperf          qemu-dm                  xennet_tx_buf_gc
	5567      1.5043  0              0  0              0  0              0  xen-syms                 qemu-dm                  do_grant_table_op
	5520      1.4916  0              0  0              0  0              0  vmlinux-netperf          qemu-dm                  sys_sendto
	5116      1.3825  0              0  0              0  0              0  vmlinux-netperf          qemu-dm                  system_call_fastpath
	4857      1.3125  0              0  0              0  0              0  xen-syms                 qemu-dm                  memcpy
	4429      1.1968  0              0  0              0  0              0  xen-syms                 qemu-dm                  hvm_grant_table_op
	4350      1.1755  0              0  0              0  0              0  vmlinux-netperf          qemu-dm                  gnttab_end_foreign_access_ref
	4326      1.1690  0              0  0              0  0              0  xen-syms                 qemu-dm                  page_get_owner_and_reference
	4256      1.1501  0              0  0              0  0              0  xen-syms                 qemu-dm                  hvm_do_hypercall
	4205      1.1363  0              0  0              0  0              0  vmlinux-netperf          qemu-dm                  gnttab_end_foreign_access_ref_v1
	4056      1.0960  0              0  0              0  0              0  vmlinux-netperf          qemu-dm                  __copy_user_nocache
	3817      1.0314  0              0  0              0  0              0  xen-syms                 qemu-dm                  map_domain_gfn

	which seems valid enough - vcpu0 on _all_ of the guests is where the action is happening.
        But that is not necessarily the right pCPU!

    But if I do it for the backend (which has two vcpus):


	CPU: Intel Sandy Bridge microarchitecture, speed 3092.89 MHz (estimated)
	Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 90000
	Samples on CPU 0
	Samples on CPU 1
	Samples on CPU 2
	Samples on CPU 3
	samples  %        samples  %        samples  %        samples  %        image name               app name                 symbol name
	13058     4.2539  0              0  0              0  0              0  xen-syms                 qemu-dm                  get_page_from_gfn_p2m
	12671     4.1278  0              0  0              0  0              0  xen-syms                 qemu-dm                  hap_p2m_ga_to_gfn_4_levels
	11382     3.7079  0              0  0              0  0              0  xen-syms                 qemu-dm                  hap_gva_to_gfn_4_levels
	9276      3.0219  0              0  0              0  0              0  xen-syms                 qemu-dm                  __hvm_copy
	9146      2.9795  0              0  0              0  0              0  xen-syms                 qemu-dm                  paging_gva_to_gfn
	8069      2.6286  0              0  0              0  0              0  xen-syms                 qemu-dm                  guest_walk_tables_4_levels
	7228      2.3547  0              0  0              0  0              0  xen-syms                 qemu-dm                  __get_gfn_type_access
	6387      2.0807  0              0  0              0  0              0  xen-syms                 qemu-dm                  gnttab_copy
	6216      2.0250  0              0  0              0  0              0  xen-syms                 qemu-dm                  ept_get_entry
	5567      1.8136  0              0  0              0  0              0  xen-syms                 qemu-dm                  do_grant_table_op
	4857      1.5823  0              0  0              0  0              0  xen-syms                 qemu-dm                  memcpy
	4429      1.4428  0              0  0              0  0              0  xen-syms                 qemu-dm                  hvm_grant_table_op
	4326      1.4093  0              0  0              0  0              0  xen-syms                 qemu-dm                  page_get_owner_and_reference
	4256      1.3865  0              0  0              0  0              0  xen-syms                 qemu-dm                  hvm_do_hypercall
	3817      1.2435  0              0  0              0  0              0  xen-syms                 qemu-dm                  map_domain_gfn
	3661      1.1926  0              0  0              0  0              0  vmlinux-backend          qemu-dm                  xen_netbk_kthread
	3319      1.0812  0              0  0              0  0              0  xen-syms                 qemu-dm                  ept_next_level
	3237      1.0545  0              0  0              0  0              0  vmlinux-backend          qemu-dm                  kthread
	3097      1.0089  0              0  0              0  0              0  xen-syms                 qemu-dm                  hvm_copy_to_guest_virt_nofault

    somehow vcpu1 is gone. Although if I split it up:

	opreport -l vmlinux-backend xen-syms -t 1 cpu:0
	opreport -l vmlinux-backend xen-syms -t 1 cpu:1

    I can get some values.

e). If I look at how the guests are utilized, combined with 'xl vcpu-list', it appears
    to me that pCPU0 is the most used one.

	Mem: 8208716k total, 6435216k used, 1773500k free    CPUs: 4 @ 3092MHz
	      NAME  STATE   CPU(sec) CPU(%)     MEM(k) MEM(%)  MAXMEM(k) MAXMEM(%) VCPUS NETS NETTX(k) NETRX(k) VBDS   VBD_OO   VBD_RD   VBD_WR  VBD_RSECT  VBD_WSECT SSID
	   backend -----r       3889  172.1    2093004   25.5    2098176      25.6     2    0        0        0    1        0        9        0         96          0    0
	  domain-0 -----r        125    3.5    2097140   25.5    2097152      25.5     4    0        0        0    0        0        0        0          0          0    0
	   netperf --b---       1064   47.8    1044444   12.7    1049600      12.8     1    0        0        0    1        0        8        0         56          0    0
	 netserver -----r       1328   59.8    1044444   12.7    1049600      12.8     1    0        0        0    1        0        8        0         56          0    0


	and xentrace/xenalyze tell me that the backend domain is busy:

	|-- Domain 3 --| 
	 Runstates:
	   blocked:       3  0.00s  21733 { 28204| 28204| 28204}
	  partial run:  342975  6.52s  45613 { 14187| 84608|113476}
	  full run:  362709 28.23s 186763 { 83904|369588|399732}
	  partial contention:     997  0.82s 1977180 {5596136|5804072|47488112}
	  concurrency_hazard:  353528  1.72s  11677 {  8977| 10056|322393}
	  full_contention:       6  0.00s   5136 {  5040|  5228|  5400}
	      lost:     864  1.37s 3814508 {351182583|1296246349|1296246349}

	(If I drill in deeper, vcpu0 is busy doing grant table operations -
	56% of the VMCALL exits are because of it - while vcpu1 is busy getting IPIed;
	/proc/cpuinfo in the guest corroborates that information.)


    So I would have expected the oprofile numbers to be more serious than the
    occasional 4% or so here and there.

    Oh, that is what the "-a" argument is for!:

    (here it is with a bit of trimming done)

	samples  cum. samples  %                     app name                 symbol name
	13058    13058          4.2518   4.2518      xen-syms                 qemu-dm                  get_page_from_gfn_p2m
	12671    25729          4.1258   8.3776      xen-syms                 qemu-dm                  hap_p2m_ga_to_gfn_4_levels
	11382    37111          3.7061  12.0837      xen-syms                 qemu-dm                  hap_gva_to_gfn_4_levels
	9276     46387          3.0204  15.1041      xen-syms                 qemu-dm                  __hvm_copy
	9146     55533          2.9780  18.0822      xen-syms                 qemu-dm                  paging_gva_to_gfn
	8069     63602          2.6274  20.7095      xen-syms                 qemu-dm                  guest_walk_tables_4_levels
	7228     70830          2.3535  23.0630      xen-syms                 qemu-dm                  __get_gfn_type_access
	6387     77217          2.0797  25.1427      xen-syms                 qemu-dm                  gnttab_copy
	6216     83433          2.0240  27.1667      xen-syms                 qemu-dm                  ept_get_entry
	5567     89000          1.8127  28.9794      xen-syms                 qemu-dm                  do_grant_table_op
	4857     93857          1.5815  30.5609      xen-syms                 qemu-dm                  memcpy
	4429     98286          1.4421  32.0030      xen-syms                 qemu-dm                  hvm_grant_table_op
	4326     102612         1.4086  33.4116      xen-syms                 qemu-dm                  page_get_owner_and_reference   
	4256     106868         1.3858  34.7974      xen-syms                 qemu-dm                  hvm_do_hypercall
	3817     110685         1.2429  36.0402      xen-syms                 qemu-dm                  map_domain_gfn
	3661     114346         1.1921  37.2323      vmlinux-backend          qemu-dm                  xen_netbk_kthread

    so 36% of the pCPU time is spent in the hypervisor (I think?).


    Q: Can xenalyze tell me that as well? That the pCPU1 is pegged at 100%?


  f). oprofile counters on Intel and the hypercalls. It looks like the only counters that oprofile
     is using are MSR_P6_PERFCTR0 and MSR_P6_EVNTSEL0. It has no notion of
    fixed counters or uncore counters. So for example on Sandy Bridge it will report
    8 counters (and try to clear MSR_P6_PERFCTR0+8), even though the actual layout
    is a mix of programmable, fixed-function and uncore counters (which CPUID leaf
    0xA enumerates - see the sketch below).
    It is also missing support for some Intel models (model 44, for example - Westmere).
    In other words, it looks like the Linux and hypervisor oprofile support has not been
    maintained for some time. I don't know how it fares on AMD as I only took a look at
    this last week.
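
    (As an aside, the counter layout is discoverable rather than something that
    has to be hard-coded. Below is a minimal, untested user-space sketch that
    reads the architectural perfmon CPUID leaf 0xA; the helper is purely
    illustrative and not taken from the oprofile or Xen code.)

	/* Minimal sketch: enumerate the Intel PMU layout via CPUID leaf 0xA. */
	#include <stdio.h>
	#include <cpuid.h>

	int main(void)
	{
	    unsigned int eax, ebx, ecx, edx;

	    if (!__get_cpuid(0xa, &eax, &ebx, &ecx, &edx)) {
	        printf("CPUID leaf 0xA not available\n");
	        return 1;
	    }

	    /* EAX[7:0] = arch perfmon version, EAX[15:8] = number of
	     * programmable counters, EAX[23:16] = their width;
	     * EDX[4:0] = number of fixed counters, EDX[12:5] = their width. */
	    printf("arch perfmon version : %u\n", eax & 0xff);
	    printf("programmable counters: %u (%u bits wide)\n",
	           (eax >> 8) & 0xff, (eax >> 16) & 0xff);
	    printf("fixed counters       : %u (%u bits wide)\n",
	           edx & 0x1f, (edx >> 5) & 0xff);
	    return 0;
	}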

 g). There is no code that takes advantage of PEBS or the uncore events to get
    a full-featured trace (did you know that PEBS can also collect register information!)


  g1). Linux perf looks to be using a garden-variety vector instead of doing it via
    the NMI. Could we do that too? Or would that be inappropriate? I am not sure whether
    that can actually be done for the event-type counters. Perhaps it is only possible
    for the fixed and uncore ones.

  h). There are some counters in the hypervisor for the oprofile statistics, like
   lost samples, etc. It does not look like they are exported/printed anywhere. Perhaps
   a 'register_keyhandler' should be written to dump those (and also which domains
   are being profiled).

  h1). I can't find any reference to DOM_IDLE, which probably should also be accounted
      for? Or DOM_IO (perhaps less so, since it does not seem to be in use).

 i). opreport often tells me
	warning: /domain1-apps could not be found.
	warning: /domain1-modules could not be found.
	warning: /domain1-xen-unknown could not be found.
	warning: /domain2-apps could not be found.
	warning: /domain2-modules could not be found.
	warning: /domain2-xen-unknown could not be found.
	warning: /domain3-apps could not be found.
	warning: /domain3-modules could not be found.
	warning: /domain3-xen-unknown could not be found.
	warning: /vmlinux-unknown could not be found.
	warning: /xen-unknown could not be found.

   which is odd, because I did specify:

	opcontrol --start --passive-domains=1,2,3 \
        	--passive-images=/shared/vmlinux-backend,/shared/vmlinux-netperf,/shared/vmlinux-netserver \
        	--xen=/shared/xen-syms --vmlinux=/shared/vmlinux-dom0 \
                	--buffer-size=256000 \
        	-c=5 --separate=cpu,kernel --event=CPU_CLK_UNHALTED:90000:0x0:1:0

   so is it just the toolstack being silly?


Anyhow, I don't know if it is just me not realizing how to use 'opreport' to get
per-guest or per-vCPU information, but I find it hard to use. I like how 'perf top'
can instantly give an excellent idea of the whole machine's status. Maybe that is
why folks in the Linux community really wanted something other than oprofile.

And it occurs to me that it could be possible to make some inroads on making
performance monitoring easier:

  1). fix the glaring omissions in oprofile for the new CPUs
  2). Add a register_keyhandler hook to get some debug info.
  3). piggyback on the oprofile hypercalls and insert some bridge in perf (lots
      of handwaving here). Or perhaps emulate the wrmsrs in the Linux kernel
      (so xen_safe_wrmsrs) and have the pvops kernel, based on the MSRs,
      make the hypercalls to set up the buffers, etc.

     3a). new hypercalls? intercept rdmsr/wrmsr and stuff the right data
      in the initial domain? Other thoughts? (A very rough sketch of this
      MSR-interception idea follows after this list.)

  4). Extend perf to have '--xen' so it can also look at the xen-hypervisor
      ELF file.
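
   (Here is a very rough, untested sketch of the 3a) idea. The hook name
   xen_oprofile_emulate_wrmsr is made up, and I am assuming the xenoprof
   public interface (struct xenoprof_counter, XENOPROF_counter) plus the
   HYPERVISOR_xenoprof_op wrapper from the backported branch - the real
   plumbing would need more thought.)

	/* Hypothetical hook in the pvops wrmsr path: instead of letting the
	 * kernel poke the event-select MSRs directly, translate the write
	 * into a xenoprof hypercall so the hypervisor programs the counter. */
	#include <linux/errno.h>
	#include <asm/msr-index.h>
	#include <xen/interface/xenoprof.h>

	static int xen_oprofile_emulate_wrmsr(unsigned int msr, u64 val)
	{
	    struct xenoprof_counter ctr = { 0 };

	    /* Only the architectural event-select MSRs are handled here. */
	    if (msr < MSR_P6_EVNTSEL0 || msr >= MSR_P6_EVNTSEL0 + 8)
	        return -ENODEV;     /* not ours - take the normal wrmsr path */

	    ctr.ind       = msr - MSR_P6_EVNTSEL0;
	    ctr.event     = val & 0xff;               /* event select */
	    ctr.unit_mask = (val >> 8) & 0xff;        /* unit mask    */
	    ctr.user      = !!(val & (1ULL << 16));   /* USR bit      */
	    ctr.kernel    = !!(val & (1ULL << 17));   /* OS bit       */
	    ctr.enabled   = !!(val & (1ULL << 22));   /* EN bit       */

	    return HYPERVISOR_xenoprof_op(XENOPROF_counter, &ctr);
	}

   (The patch from Andrew Thomas mentioned in c1) is attached below.)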


diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index aee1f9e..16db4fe 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -58,6 +58,9 @@
 
 enum handler_return { HNDL_done, HNDL_unhandled, HNDL_exception_raised };
 
+//extern void do_nmi(void);
+extern bool_t alternative_nmi;
+
 static void vmx_ctxt_switch_from(struct vcpu *v);
 static void vmx_ctxt_switch_to(struct vcpu *v);
 
@@ -2441,8 +2444,16 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
             if ( (intr_info & INTR_INFO_INTR_TYPE_MASK) !=
                  (X86_EVENTTYPE_NMI << 8) )
                 goto exit_and_crash;
+            HVM_DBG_LOG(DBG_LEVEL_VMMU,
+                        "NMI eax=%lx, ebx=%lx, ecx=%lx, edx=%lx, esi=%lx, edi=%lx",
+                        (unsigned long)regs->eax, (unsigned long)regs->ebx,
+                        (unsigned long)regs->ecx, (unsigned long)regs->edx,
+                        (unsigned long)regs->esi, (unsigned long)regs->edi);
             HVMTRACE_0D(NMI);
-            self_nmi(); /* Real NMI, vector 2: normal processing. */
+            if ( alternative_nmi )
+                do_nmi(regs);
+            else
+                self_nmi(); /* Real NMI, vector 2: normal processing. */
             break;
         case TRAP_machine_check:
             HVMTRACE_0D(MCE);
diff --git a/xen/arch/x86/nmi.c b/xen/arch/x86/nmi.c
index 4fd6711..3e61e3e 100644
--- a/xen/arch/x86/nmi.c
+++ b/xen/arch/x86/nmi.c
@@ -40,6 +40,8 @@ static unsigned int nmi_p4_cccr_val;
 static DEFINE_PER_CPU(struct timer, nmi_timer);
 static DEFINE_PER_CPU(unsigned int, nmi_timer_ticks);
 
+bool_t alternative_nmi = 0;
+boolean_param("alternative_nmi", alternative_nmi);
 /* opt_watchdog: If true, run a watchdog NMI on each processor. */
 bool_t __initdata opt_watchdog = 0;
 boolean_param("watchdog", opt_watchdog);


* Re: Xen, oprofile, perf, PEBS, event counters, PVHVM, PV
  2013-01-14 20:45 Xen, oprofile, perf, PEBS, event counters, PVHVM, PV Konrad Rzeszutek Wilk
@ 2013-01-15  9:13 ` Jan Beulich
  2013-01-15 17:11 ` Marcus Granado
  1 sibling, 0 replies; 7+ messages in thread
From: Jan Beulich @ 2013-01-15  9:13 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: msw, andrew.thomas, boris.ostrovsky, jacob.shin, kurt.hackel,
	marcus.granado, xen-devel, Marcos.Matsunaga, jun.nakajima,
	Sherry.Hurwitz

>>> On 14.01.13 at 21:45, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> a). 32/64 compat is missing backtrace support. If you run with a 32-bit dom0
>     and try to set the backtrace, the hypervisor sets is as -some-huge-number.
>     It might be there are some other hypercalls that need some compat 
> tweaks.

dump_guest_backtrace() actually looks to be prepared to deal
with 32-bit guests.

Jan


* Re: Xen, oprofile, perf, PEBS, event counters, PVHVM, PV
  2013-01-14 20:45 Xen, oprofile, perf, PEBS, event counters, PVHVM, PV Konrad Rzeszutek Wilk
  2013-01-15  9:13 ` Jan Beulich
@ 2013-01-15 17:11 ` Marcus Granado
  2013-01-16  4:47   ` Konrad Rzeszutek Wilk
  2013-01-18 15:43   ` Konrad Rzeszutek Wilk
  1 sibling, 2 replies; 7+ messages in thread
From: Marcus Granado @ 2013-01-15 17:11 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Marcus Granado, xen-devel@lists.xensource.com, msw@amazon.com,
	andrew.thomas@oracle.com, jun.nakajima@intel.com,
	boris.ostrovsky@amd.com, jacob.shin@amd.com,
	kurt.hackel@oracle.com, Marcos.Matsunaga@oracle.com,
	JBeulich@suse.com, Sherry.Hurwitz@amd.com



On 14/01/13 20:45, Konrad Rzeszutek Wilk wrote:
> a). 32/64 compat is missing backtrace support. If you run with a 32-bit dom0
>      and try to set the backtrace, the hypervisor sets is as -some-huge-number.
>      It might be there are some other hypercalls that need some compat tweaks.

It's not clear to me if it is the same issue, but there was some work to
make xenoprof's callgraph work with 32-bit domains on a 64-bit xen here:
http://lists.xen.org/archives/html/xen-devel/2012-01/msg01721.html
The patch should now be in xen but it requires a one-line change in the
32-bit dom0 kernel matching this one in xen:
http://lists.xen.org/archives/html/xen-devel/2012-01/txt4qZ7uGGPTc.txt

> b). 32-bit dom0 oprofile toolstack truncates the EIP of 64-bit guests
>      (or hypervisor). I am not really sure how to solve that except just
>      not run 64-bit guests/hypervisor with a 32-bit dom0. Or make
>      oprofile and its tools capable of doing 64-bit architecture.
>      The vice-versa condition does not exist - so I can profile 32-bit
>      guests using a 64-bit dom0.
Afaics, the 32-bit dom0 oprofile.ko module receives the 64-bit EIPs; the
XENOPROF_ESCAPE_CODE comparison is made as ULL in the kernel and seems
to work. The truncation could be happening in opcontrol, oprofiled or
opreport, but with the patches above I obtained the following result in
an idle 32-bit dom0, which seems to display the correct 64-bit memory
location information for hypervisor functions:


# opreport -lwc   (functions calling other functions):
CPU: Core 2, speed 2493.77 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a
unit mask of 0x00 (Unhalted core cycles) count 1000000
vma               samples  %        image name           app name  symbol name
-------------------------------------------------------------------------------
  00000000000b9610  1         2.0000  libc-2.5.so          python    getaddrinfo
  000000000000fad0  1         2.0000  libpthread-2.5.so    python    _fini
  000000000000e790  3         6.0000  ld-2.5.so            python    _dl_fini
  000000000002aec0  12       24.0000  libc-2.5.so          python    msort_with_tmp
  0000000000004480  25       50.0000  _xslib.so            python    init_xslib
0000000000000000  415746   22.5841  libpython2.4.so.1.0  python    /usr/lib/libpython2.4.so.1.0
  0000000000000000  415746  100.000  libpython2.4.so.1.0  python    /usr/lib/libpython2.4.so.1.0 [self]
...
-------------------------------------------------------------------------------
ffff82c480170470  36        0.1587  xen-syms             qemu-dm   send_IPI_mask_flat
  ffff82c480170470  36      100.000  xen-syms             qemu-dm   send_IPI_mask_flat [self]
-------------------------------------------------------------------------------
ffff82c480120d40  33        0.1455  xen-syms             qemu-dm   cpumask_raise_softirq
  ffff82c480120d40  33      100.000  xen-syms             qemu-dm   cpumask_raise_softirq [self]



This quote from
http://lists.xen.org/archives/html/xen-devel/2012-01/msg01721.html may be
useful:
"

A few comments from my tests with oprofile 0.9.6 in userspace:

- to obtain callgraphs of the xen code, you need to enable the
CONFIG_FRAME_POINTER flag during compilation of the xen binary,
e.g. using "make" with "frame_pointer=y".
- if the oprofiled daemon is running in a 32-bit guest, it needs to
receive the xen-range in 32 bits, e.g.
--xen-image=/boot/xen-syms-4.1.1 --xen-range=80100000,801fe5ee
"

>    h). There are some counters in the hypervisor for the oprofile statistics, like
>     lost samples, etc. I does not look like they are exported/printed anywhere. Perhaps
>     an 'register_keyhandler' should be written to dump those (and also which domains
>     are profiled).

I see some lost-sample information when I run 'opcontrol --start
--verbose=all' and 'opcontrol --deinit' and then look at oprofiled.log.
Are these the counters you are looking for?
# cat /var/lib/oprofile/samples/oprofiled.log
oprofiled started Tue Jan 15 16:02:00 2013
kernel pointer size: 4
Tue Jan 15 16:04:34 2013
-- OProfile Statistics --
Nr. sample dumps: 4
Nr. non-backtrace samples: 25508
Nr. kernel samples: 14344
Nr. lost samples (no kernel/user): 0
Nr. lost kernel samples: 0
Nr. incomplete code structs: 0
Nr. samples lost due to sample file open failure: 4569
Nr. samples lost due to no permanent mapping: 78
Nr. event lost due to buffer overflow: 0
Nr. samples lost due to no mapping: 20
Nr. backtraces skipped due to no file mapping: 0
Nr. samples lost due to no mm: 4727
---- Statistics for cpu : 3
Nr. samples lost cpu buffer overflow: 0
Nr. samples received: 11734
Nr. backtrace aborted: 0
Nr. samples lost invalid pc: 0
...


>   i). opreports often tells me
>          warning: /domain1-apps could not be found.
>          warning: /domain1-modules could not be found.
>          warning: /domain1-xen-unknown could not be found.
>          warning: /domain2-apps could not be found.
>          warning: /domain2-modules could not be found.
>          warning: /domain2-xen-unknown could not be found.
>          warning: /domain3-apps could not be found.
>          warning: /domain3-modules could not be found.
>          warning: /domain3-xen-unknown could not be found.
>          warning: /vmlinux-unknown could not be found.
>          warning: /xen-unknown could not be found.

These warnings remind me of what I was receiving for the dom0 kernel
modules; I fixed them by using -p for the modules in opreport:
# opreport -l -p/usr/lib/debug/lib/modules/`uname -r`
I guess opreport may need this parameter pointing to the guest
kernel symbols.

> And it occurs to me it could be possible be to make some inroads on making
> performance monitoring easier:
>
>    1). fix the glaring omissions in oprofile for the new CPUs
>    2). Add a register keyhandle to get some debug info.
>    3). piggyback on oprofile hypercalls and insert some bridge in perf (lots
>        of handwaving here). Or perhaps emulate in the Linux kernel the
>        wmsrs (so xen_safe_wrmsrs) and have the pvops kernel based on the MSRs
>        make the hypercalls to setup the buffers, etc.
>
>       3a). new hypercalls? intercept rdmsr/wrmsrs and stuff the right data
>        in the initial domain? Other thoughts?
>
>    4). Extend perf to have '--xen' so it can also look at the xen-hypervisor
>        ELF file.

5) live event reports from xenoprof/opreport, a la perf top.
6) ports of the oprofile kernel modules to other OSes (BSD, Windows,
Mirage), so that those OSes can be used as active participants.

cheers,
Marcus






* Re: Xen, oprofile, perf, PEBS, event counters, PVHVM, PV
  2013-01-15 17:11 ` Marcus Granado
@ 2013-01-16  4:47   ` Konrad Rzeszutek Wilk
  2013-01-16 16:18     ` Boris Ostrovsky
  2013-01-18 15:43   ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 7+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-01-16  4:47 UTC (permalink / raw)
  To: Marcus Granado
  Cc: Marcus Granado, xen-devel@lists.xensource.com, msw@amazon.com,
	andrew.thomas@oracle.com, jun.nakajima@intel.com,
	boris.ostrovsky@amd.com, jacob.shin@amd.com,
	kurt.hackel@oracle.com, Marcos.Matsunaga@oracle.com,
	JBeulich@suse.com, Sherry.Hurwitz@amd.com

On Tue, Jan 15, 2013 at 05:11:01PM +0000, Marcus Granado wrote:
> On 14/01/13 20:45, Konrad Rzeszutek Wilk wrote:
> >a). 32/64 compat is missing backtrace support. If you run with a 32-bit dom0
> >     and try to set the backtrace, the hypervisor sets is as -some-huge-number.
> >     It might be there are some other hypercalls that need some compat tweaks.
> 
> It's not clear to me if it is the same issue, but there was some
> work to make xenoprof's callgraph work with 32-bit domains on a
> 64-bit xen here:
> http://lists.xen.org/archives/html/xen-devel/2012-01/msg01721.html
> The patch should now be in xen but it requires a one-line change in
> the 32-bit dom0 kernel matching this one in xen:
> http://lists.xen.org/archives/html/xen-devel/2012-01/txt4qZ7uGGPTc.txt

Oh, those look like the patches that would help. Could you repost
them so I can review them?
> 
> >b). 32-bit dom0 oprofile toolstack truncates the EIP of 64-bit guests
> >     (or hypervisor). I am not really sure how to solve that except just
> >     not run 64-bit guests/hypervisor with a 32-bit dom0. Or make
> >     oprofile and its tools capable of doing 64-bit architecture.
> >     The vice-versa condition does not exist - so I can profile 32-bit
> >     guests using a 64-bit dom0.
> Afaics, the 32-bit dom0 oprofile.ko module receives the 64-bit eips;
> the XENOPROF_ESCAPE_CODE comparison is made as ULL in the kernel and
> seems to work. This could be happening maybe in either opcontrol,
> oprofiled or opreport, but with the patches above I obtained the
> following result in an idle 32-bit dom0, which seems to display the
> correct 64-bit memory location information for hypervisor functions:
> 
> 
> # opreport -lwc   (functions calling other functions):
> CPU: Core 2, speed 2493.77 MHz (estimated)
> Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with
> a unit mask of 0x00 (Unhalted core cycles) count 1000000
> vma               samples  %        image name           app name  symbol name
> -------------------------------------------------------------------------------
>   00000000000b9610  1         2.0000  libc-2.5.so          python    getaddrinfo
>   000000000000fad0  1         2.0000  libpthread-2.5.so    python    _fini
>   000000000000e790  3         6.0000  ld-2.5.so            python    _dl_fini
>   000000000002aec0  12       24.0000  libc-2.5.so          python    msort_with_tmp
>   0000000000004480  25       50.0000  _xslib.so            python    init_xslib
> 0000000000000000  415746   22.5841  libpython2.4.so.1.0  python    /usr/lib/libpython2.4.so.1.0
>   0000000000000000  415746  100.000  libpython2.4.so.1.0  python    /usr/lib/libpython2.4.so.1.0 [self]
> ...
> -------------------------------------------------------------------------------
> ffff82c480170470  36        0.1587  xen-syms             qemu-dm   send_IPI_mask_flat
>   ffff82c480170470  36      100.000  xen-syms             qemu-dm   send_IPI_mask_flat [self]
> -------------------------------------------------------------------------------
> ffff82c480120d40  33        0.1455  xen-syms             qemu-dm   cpumask_raise_softirq
>   ffff82c480120d40  33      100.000  xen-syms             qemu-dm   cpumask_raise_softirq [self]
> 
> 
> 
> This quote from
> http://lists.xen.org/archives/html/xen-devel/2012-01/msg01721.html may
> be useful:
> "
> 
> A few comments from my tests with oprofile 0.9.6 in userspace:
> 
> - to obtain callgraphs of the xen code, you need to enable the
> CONFIG_FRAME_POINTER flag during compilation of the xen binary,
> e.g. using "make" with "frame_pointer=y".
> - if the oprofiled daemon is running in a 32-bit guest, it needs to
> receive the xen-range in 32 bits, e.g.
> --xen-image=/boot/xen-syms-4.1.1 --xen-range=80100000,801fe5ee
> "
> 
> >   h). There are some counters in the hypervisor for the oprofile statistics, like
> >    lost samples, etc. I does not look like they are exported/printed anywhere. Perhaps
> >    an 'register_keyhandler' should be written to dump those (and also which domains
> >    are profiled).
> 
> I see some lost sample information when I run 'opcontrol --start
> --verbose=all', 'opcontrol --deinit' and look at oprofiled.log, are
> these the counters you are looking for?

No. I was thinking of these counters in common/xenoprof.c:

	static u64 total_samples;
	static u64 invalid_buffer_samples;
	static u64 corrupted_buffer_samples;
	static u64 lost_samples;
	static u64 active_samples;
	static u64 passive_samples;
	static u64 idle_samples;
	static u64 others_samples;
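
Something along these lines is what I had in mind - a very rough,
untested sketch against the 4.3-era keyhandler interface (the key
choice is arbitrary and the exact struct layout is from memory):

	/* Dump the xenoprof sample accounting counters from common/xenoprof.c
	 * on a debug key press. */
	static void xenoprof_dump_stats(unsigned char key)
	{
	    printk("xenoprof: total=%"PRIu64" lost=%"PRIu64
	           " invalid=%"PRIu64" corrupted=%"PRIu64"\n",
	           total_samples, lost_samples,
	           invalid_buffer_samples, corrupted_buffer_samples);
	    printk("xenoprof: active=%"PRIu64" passive=%"PRIu64
	           " idle=%"PRIu64" others=%"PRIu64"\n",
	           active_samples, passive_samples,
	           idle_samples, others_samples);
	}

	static struct keyhandler xenoprof_keyhandler = {
	    .diagnostic = 1,
	    .u.fn = xenoprof_dump_stats,
	    .desc = "dump xenoprof statistics"
	};

	/* e.g. called once from xenoprof initialisation: */
	register_keyhandler('O', &xenoprof_keyhandler);

It could also print which domains are currently being profiled (active/passive).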

> # cat /var/lib/oprofile/samples/oprofiled.log
> oprofiled started Tue Jan 15 16:02:00 2013
> kernel pointer size: 4
> Tue Jan 15 16:04:34 2013
> -- OProfile Statistics --
> Nr. sample dumps: 4
> Nr. non-backtrace samples: 25508
> Nr. kernel samples: 14344
> Nr. lost samples (no kernel/user): 0
> Nr. lost kernel samples: 0
> Nr. incomplete code structs: 0
> Nr. samples lost due to sample file open failure: 4569
> Nr. samples lost due to no permanent mapping: 78
> Nr. event lost due to buffer overflow: 0
> Nr. samples lost due to no mapping: 20
> Nr. backtraces skipped due to no file mapping: 0
> Nr. samples lost due to no mm: 4727
> ---- Statistics for cpu : 3
> Nr. samples lost cpu buffer overflow: 0
> Nr. samples received: 11734
> Nr. backtrace aborted: 0
> Nr. samples lost invalid pc: 0
> ...
> 
> 
> >  i). opreports often tells me
> >         warning: /domain1-apps could not be found.
> >         warning: /domain1-modules could not be found.
> >         warning: /domain1-xen-unknown could not be found.
> >         warning: /domain2-apps could not be found.
> >         warning: /domain2-modules could not be found.
> >         warning: /domain2-xen-unknown could not be found.
> >         warning: /domain3-apps could not be found.
> >         warning: /domain3-modules could not be found.
> >         warning: /domain3-xen-unknown could not be found.
> >         warning: /vmlinux-unknown could not be found.
> >         warning: /xen-unknown could not be found.
> 
> These warnings remind me of what I was receiving for the dom0 kernel
> modules, I fixed them by using -p for the modules in opreport:
> # opreport -l -p/usr/lib/debug/lib/modules/`uname -r`
> I guess opreport may be in need of this parameter pointing to the
> guest kernel symbols.

OK. I will try that out.

> 
> >And it occurs to me it could be possible be to make some inroads on making
> >performance monitoring easier:
> >
> >   1). fix the glaring omissions in oprofile for the new CPUs
> >   2). Add a register keyhandle to get some debug info.
> >   3). piggyback on oprofile hypercalls and insert some bridge in perf (lots
> >       of handwaving here). Or perhaps emulate in the Linux kernel the
> >       wmsrs (so xen_safe_wrmsrs) and have the pvops kernel based on the MSRs
> >       make the hypercalls to setup the buffers, etc.
> >
> >      3a). new hypercalls? intercept rdmsr/wrmsrs and stuff the right data
> >       in the initial domain? Other thoughts?
> >
> >   4). Extend perf to have '--xen' so it can also look at the xen-hypervisor
> >       ELF file.
> 
> 5) live event reports from xenoprof/opreport, ala perf top.
> 6) ports of oprofile kernel modules for other oses (bsd, windows,
> mirage), so that these oses can be used as active participants.

Great! Thank you for adding those in.
> 
> cheers,
> Marcus
> 
> 
> 


* Re: Xen, oprofile, perf, PEBS, event counters, PVHVM, PV
  2013-01-16  4:47   ` Konrad Rzeszutek Wilk
@ 2013-01-16 16:18     ` Boris Ostrovsky
  2013-01-16 17:40       ` Suravee Suthikulpanit
  0 siblings, 1 reply; 7+ messages in thread
From: Boris Ostrovsky @ 2013-01-16 16:18 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Marcus Granado, xen-devel@lists.xensource.com, msw@amazon.com,
	andrew.thomas@oracle.com, jun.nakajima@intel.com,
	kurt.hackel@oracle.com, jacob.shin@amd.com, Marcus Granado,
	Marcos.Matsunaga@oracle.com, JBeulich@suse.com,
	Sherry.Hurwitz@amd.com, Suravee Suthikulpanit

On 01/15/2013 11:47 PM, Konrad Rzeszutek Wilk wrote:
> On Tue, Jan 15, 2013 at 05:11:01PM +0000, Marcus Granado wrote:
>> On 14/01/13 20:45, Konrad Rzeszutek Wilk wrote:

>>> And it occurs to me it could be possible be to make some inroads on making
>>> performance monitoring easier:
>>>
>>>    1). fix the glaring omissions in oprofile for the new CPUs
>>>    2). Add a register keyhandle to get some debug info.
>>>    3). piggyback on oprofile hypercalls and insert some bridge in perf (lots
>>>        of handwaving here). Or perhaps emulate in the Linux kernel the
>>>        wmsrs (so xen_safe_wrmsrs) and have the pvops kernel based on the MSRs
>>>        make the hypercalls to setup the buffers, etc.
>>>
>>>       3a). new hypercalls? intercept rdmsr/wrmsrs and stuff the right data
>>>        in the initial domain? Other thoughts?
>>>
>>>    4). Extend perf to have '--xen' so it can also look at the xen-hypervisor
>>>        ELF file.
>>
>> 5) live event reports from xenoprof/opreport, ala perf top.
>> 6) ports of oprofile kernel modules for other oses (bsd, windows,
>> mirage), so that these oses can be used as active participants.

(+ Suravee)

What is the status of oprofile development in general (on Linux)? I
don't see a whole lot of activity there; it appears to be more in
maintenance mode.

I realize that doing (4) may end up being a lot of work, but given the much
more active perf development it seems to me that this should be the
preferred route.

-boris


* Re: Xen, oprofile, perf, PEBS, event counters, PVHVM, PV
  2013-01-16 16:18     ` Boris Ostrovsky
@ 2013-01-16 17:40       ` Suravee Suthikulpanit
  0 siblings, 0 replies; 7+ messages in thread
From: Suravee Suthikulpanit @ 2013-01-16 17:40 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: Marcus Granado, xen-devel@lists.xensource.com, msw@amazon.com,
	andrew.thomas@oracle.com, Konrad Rzeszutek Wilk,
	jun.nakajima@intel.com, kurt.hackel@oracle.com,
	jacob.shin@amd.com, Marcus Granado, Marcos.Matsunaga@oracle.com,
	JBeulich@suse.com, Sherry.Hurwitz@amd.com


On Wed, 2013-01-16 at 11:18 -0500, Boris Ostrovsky wrote:
> On 01/15/2013 11:47 PM, Konrad Rzeszutek Wilk wrote:
> > On Tue, Jan 15, 2013 at 05:11:01PM +0000, Marcus Granado wrote:
> >> On 14/01/13 20:45, Konrad Rzeszutek Wilk wrote:
> 
> >>> And it occurs to me it could be possible be to make some inroads on making
> >>> performance monitoring easier:
> >>>
> >>>    1). fix the glaring omissions in oprofile for the new CPUs
> >>>    2). Add a register keyhandle to get some debug info.
> >>>    3). piggyback on oprofile hypercalls and insert some bridge in perf (lots
> >>>        of handwaving here). Or perhaps emulate in the Linux kernel the
> >>>        wmsrs (so xen_safe_wrmsrs) and have the pvops kernel based on the MSRs
> >>>        make the hypercalls to setup the buffers, etc.
> >>>
> >>>       3a). new hypercalls? intercept rdmsr/wrmsrs and stuff the right data
> >>>        in the initial domain? Other thoughts?
> >>>
> >>>    4). Extend perf to have '--xen' so it can also look at the xen-hypervisor
> >>>        ELF file.
> >>
> >> 5) live event reports from xenoprof/opreport, ala perf top.
> >> 6) ports of oprofile kernel modules for other oses (bsd, windows,
> >> mirage), so that these oses can be used as active participants.
> 
> (+ Suravee)
> 
> What is the status of oprofile development in general (on Linux)? I 
> don't see a whole lot of activity there, it appears to be more in 
> maintenance mode.

The Linux OProfile driver is in maintenance mode.  However, the user-space tools
have been converted to use the perf syscall interface, which allows them to support
per-process (non-root) profiling (similar to perf's default mode).

One advantage of OProfile is its large database of performance counter events for various platforms.

Suravee

> I realize that doing (4) may end up being a lot of work but given much 
> more active perf development it seems to me that this should be the 
> preferred route.


> -boris


* Re: Xen, oprofile, perf, PEBS, event counters, PVHVM, PV
  2013-01-15 17:11 ` Marcus Granado
  2013-01-16  4:47   ` Konrad Rzeszutek Wilk
@ 2013-01-18 15:43   ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 7+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-01-18 15:43 UTC (permalink / raw)
  To: Marcus Granado
  Cc: Marcus Granado, xen-devel@lists.xensource.com, msw@amazon.com,
	andrew.thomas@oracle.com, boris.ostrovsky@amd.com,
	Sherry.Hurwitz@amd.com, jacob.shin@amd.com,
	Marcos.Matsunaga@oracle.com, JBeulich@suse.com,
	jun.nakajima@intel.com, kurt.hackel@oracle.com

On Tue, Jan 15, 2013 at 05:11:01PM +0000, Marcus Granado wrote:
> On 14/01/13 20:45, Konrad Rzeszutek Wilk wrote:
> >a). 32/64 compat is missing backtrace support. If you run with a 32-bit dom0
> >     and try to set the backtrace, the hypervisor sets is as -some-huge-number.
> >     It might be there are some other hypercalls that need some compat tweaks.
> 
> It's not clear to me if it is the same issue, but there was some
> work to make xenoprof's callgraph work with 32-bit domains on a
> 64-bit xen here:
> http://lists.xen.org/archives/html/xen-devel/2012-01/msg01721.html
> The patch should now be in xen but it requires a one-line change in
> the 32-bit dom0 kernel matching this one in xen:
> http://lists.xen.org/archives/html/xen-devel/2012-01/txt4qZ7uGGPTc.txt
> 
> >b). 32-bit dom0 oprofile toolstack truncates the EIP of 64-bit guests
> >     (or hypervisor). I am not really sure how to solve that except just
> >     not run 64-bit guests/hypervisor with a 32-bit dom0. Or make
> >     oprofile and its tools capable of doing 64-bit architecture.
> >     The vice-versa condition does not exist - so I can profile 32-bit
> >     guests using a 64-bit dom0.
> Afaics, the 32-bit dom0 oprofile.ko module receives the 64-bit eips;
> the XENOPROF_ESCAPE_CODE comparison is made as ULL in the kernel and
> seems to work. This could be happening maybe in either opcontrol,
> oprofiled or opreport, but with the patches above I obtained the
> following result in an idle 32-bit dom0, which seems to display the
> correct 64-bit memory location information for hypervisor functions:
> 
> 
> # opreport -lwc   (functions calling other functions):
> CPU: Core 2, speed 2493.77 MHz (estimated)
> Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with
> a unit mask of 0x00 (Unhalted core cycles) count 1000000
> vma               samples  %        image name           app name  symbol name
> -------------------------------------------------------------------------------
>   00000000000b9610  1         2.0000  libc-2.5.so          python    getaddrinfo
>   000000000000fad0  1         2.0000  libpthread-2.5.so    python    _fini
>   000000000000e790  3         6.0000  ld-2.5.so            python    _dl_fini
>   000000000002aec0  12       24.0000  libc-2.5.so          python    msort_with_tmp
>   0000000000004480  25       50.0000  _xslib.so            python    init_xslib
> 0000000000000000  415746   22.5841  libpython2.4.so.1.0  python    /usr/lib/libpython2.4.so.1.0
>   0000000000000000  415746  100.000  libpython2.4.so.1.0  python    /usr/lib/libpython2.4.so.1.0 [self]
> ...
> -------------------------------------------------------------------------------
> ffff82c480170470  36        0.1587  xen-syms             qemu-dm   send_IPI_mask_flat
>   ffff82c480170470  36      100.000  xen-syms             qemu-dm   send_IPI_mask_flat [self]
> -------------------------------------------------------------------------------
> ffff82c480120d40  33        0.1455  xen-syms             qemu-dm   cpumask_raise_softirq
>   ffff82c480120d40  33      100.000  xen-syms             qemu-dm   cpumask_raise_softirq [self]
> 
> 
> 
> This quote from
> http://lists.xen.org/archives/html/xen-devel/2012-01/msg01721.html may
> be useful:
> "
> 
> A few comments from my tests with oprofile 0.9.6 in userspace:
> 
> - to obtain callgraphs of the xen code, you need to enable the
> CONFIG_FRAME_POINTER flag during compilation of the xen binary,
> e.g. using "make" with "frame_pointer=y".
> - if the oprofiled daemon is running in a 32-bit guest, it needs to
> receive the xen-range in 32 bits, e.g.
> --xen-image=/boot/xen-syms-4.1.1 --xen-range=80100000,801fe5ee
> "
> 
> >   h). There are some counters in the hypervisor for the oprofile statistics, like
> >    lost samples, etc. I does not look like they are exported/printed anywhere. Perhaps
> >    an 'register_keyhandler' should be written to dump those (and also which domains
> >    are profiled).
> 
> I see some lost sample information when I run 'opcontrol --start
> --verbose=all', 'opcontrol --deinit' and look at oprofiled.log, are
> these the counters you are looking for?
> # cat /var/lib/oprofile/samples/oprofiled.log
> oprofiled started Tue Jan 15 16:02:00 2013
> kernel pointer size: 4
> Tue Jan 15 16:04:34 2013
> -- OProfile Statistics --
> Nr. sample dumps: 4
> Nr. non-backtrace samples: 25508
> Nr. kernel samples: 14344
> Nr. lost samples (no kernel/user): 0
> Nr. lost kernel samples: 0
> Nr. incomplete code structs: 0
> Nr. samples lost due to sample file open failure: 4569
> Nr. samples lost due to no permanent mapping: 78
> Nr. event lost due to buffer overflow: 0
> Nr. samples lost due to no mapping: 20
> Nr. backtraces skipped due to no file mapping: 0
> Nr. samples lost due to no mm: 4727
> ---- Statistics for cpu : 3
> Nr. samples lost cpu buffer overflow: 0
> Nr. samples received: 11734
> Nr. backtrace aborted: 0
> Nr. samples lost invalid pc: 0
> ...
> 
> 
> >  i). opreports often tells me
> >         warning: /domain1-apps could not be found.
> >         warning: /domain1-modules could not be found.
> >         warning: /domain1-xen-unknown could not be found.
> >         warning: /domain2-apps could not be found.
> >         warning: /domain2-modules could not be found.
> >         warning: /domain2-xen-unknown could not be found.
> >         warning: /domain3-apps could not be found.
> >         warning: /domain3-modules could not be found.
> >         warning: /domain3-xen-unknown could not be found.
> >         warning: /vmlinux-unknown could not be found.
> >         warning: /xen-unknown could not be found.
> 
> These warnings remind me of what I was receiving for the dom0 kernel
> modules, I fixed them by using -p for the modules in opreport:
> # opreport -l -p/usr/lib/debug/lib/modules/`uname -r`
> I guess opreport may be in need of this parameter pointing to the
> guest kernel symbols.
> 
> >And it occurs to me it could be possible be to make some inroads on making
> >performance monitoring easier:
> >
> >   1). fix the glaring omissions in oprofile for the new CPUs
> >   2). Add a register keyhandle to get some debug info.
> >   3). piggyback on oprofile hypercalls and insert some bridge in perf (lots
> >       of handwaving here). Or perhaps emulate in the Linux kernel the
> >       wmsrs (so xen_safe_wrmsrs) and have the pvops kernel based on the MSRs
> >       make the hypercalls to setup the buffers, etc.
> >
> >      3a). new hypercalls? intercept rdmsr/wrmsrs and stuff the right data
> >       in the initial domain? Other thoughts?
> >
> >   4). Extend perf to have '--xen' so it can also look at the xen-hypervisor
> >       ELF file.
> 
> 5) live event reports from xenoprof/opreport, ala perf top.

Is that in terms of bridge code, or just actually making 'perf' use the
proper hypercalls?

> 6) ports of oprofile kernel modules for other oses (bsd, windows,
> mirage), so that these oses can be used as active participants.

Based on the fact that oprofile is in maintenance mode, I don't think
item 6) makes much sense.

