* xentrace, xenalyze
From: Paul Sujkov @ 2016-02-24 13:21 UTC
To: xen-devel; Cc: george.dunlap

Hi,

I'm from the GlobalLogic team that uses Xen as a base for an automotive
platform. I've got a few questions regarding Xen tracing, and it seems
that the existing documentation is rather limited.

At the previous Xen Hackathon I was talking about a shared (mediated
pass-through) GPU concept for ARM; it's working well, but it still has
performance issues, and some of them seem to be correlated with other
Xen subsystems, though it's not obvious why (e.g. turning off the PV
real-time clock driver gives us a significant boost in overall graphics
performance). So I'm actually looking for two things here:

1. to understand how I can use xenalyze to find bottlenecks in the
   system at the moment
2. to add VGPU scheduler traces to the Xen trace subsystem, xenbaked,
   xentrace and xenalyze

Some insights into the second question can be found in the RTDS
scheduler patches (they add a few of their own trace events and use the
generic scheduler tracing); however, they build on the already
functioning scheduler tracing subsystem, which is VCPU-specific and not
suitable for VGPU, and there are no visible patches to xenalyze to
parse these traces out.

-- 
Regards,
Pavlo Suikov
* Re: xentrace, xenalyze
From: George Dunlap @ 2016-02-24 14:41 UTC
To: Paul Sujkov, xen-devel, Stefano Stabellini

On 24/02/16 13:21, Paul Sujkov wrote:
> Hi,
>
> I'm from the GlobalLogic team that uses Xen as a base for an automotive
> platform. I've got a few questions regarding Xen tracing, and it seems
> that the existing documentation is rather limited.
>
> At the previous Xen Hackathon I was talking about a shared (mediated
> pass-through) GPU concept for ARM; it's working well, but it still has
> performance issues, and some of them seem to be correlated with other
> Xen subsystems, though it's not obvious why (e.g. turning off the PV
> real-time clock driver gives us a significant boost in overall graphics
> performance). So I'm actually looking for two things here:
>
> 1. to understand how I can use xenalyze to find bottlenecks in the
>    system at the moment
> 2. to add VGPU scheduler traces to the Xen trace subsystem, xenbaked,
>    xentrace and xenalyze
>
> Some insights into the second question can be found in the RTDS
> scheduler patches (they add a few of their own trace events and use the
> generic scheduler tracing); however, they build on the already
> functioning scheduler tracing subsystem, which is VCPU-specific and not
> suitable for VGPU, and there are no visible patches to xenalyze to
> parse these traces out.

Thanks, Paul.

First of all: I'm not sure anyone has used xenbaked in ages; xentrace
itself is just a daemon that shovels data from the hypervisor into a
file -- it doesn't need to know anything about what's being traced (for
the most part). That just leaves adding traces to the hypervisor,
xentrace_format, and xenalyze.

I think actually the first thing you might need to do is to get the
xentrace infrastructure working on ARM. At least at some point there
was no way for dom0 to map the xentrace pages (but that may have
changed).

After that, the next thing would be to add the equivalent of VMEXIT and
VMENTRY traces in the hypervisor on ARM guest exit and entry, and then
add support for analyzing that data to xenalyze.

Once you've got that done, then you just add in extra tracing
information as you need to drill down and figure out what's going on. :-)
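To give you an idea of the shape of that, a minimal sketch (untested;
the TRC_ARM class and event IDs here are made up -- a real class would
have to be allocated in xen/include/public/trace.h -- only the TRACE_nD
macros from xen/include/xen/trace.h are real):

#include <xen/trace.h>

#define TRC_ARM          0x0100f000        /* hypothetical new class  */
#define TRC_ARM_VMEXIT   (TRC_ARM + 0x1)   /* hypothetical event IDs  */
#define TRC_ARM_VMENTRY  (TRC_ARM + 0x2)

/* Call on the guest-exit path in arch/arm/traps.c, before any real
 * work is done on behalf of the guest. */
static inline void trace_arm_vmexit(register_t esr, register_t pc)
{
    /* Trace payloads are 32-bit words, like the x86 VMEXIT records. */
    TRACE_2D(TRC_ARM_VMEXIT, (uint32_t)esr, (uint32_t)pc);
}

/* Call just before returning to the guest (e.g. somewhere around
 * leave_hypervisor_tail()). */
static inline void trace_arm_vmentry(void)
{
    TRACE_0D(TRC_ARM_VMENTRY);
}

Once records like these show up in a trace file, xenalyze's dump mode
can at least show you when they happened; the aggregation support comes
after that.

-George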
* Re: xentrace, xenalyze
From: Paul Sujkov @ 2016-02-24 15:24 UTC
To: George Dunlap; Cc: Stefano Stabellini, xen-devel

> I think actually the first thing you might need to do is to get the
> xentrace infrastructure working on ARM

Already done that. It requires some patches to the memory manager,
timer and policies. I guess I should upstream them, though.

> After that, the next thing would be to add the equivalent of VMEXIT
> and VMENTRY traces in the hypervisor on ARM guest exit and entry

It seems that this is already covered as well. At least, I get a pretty
decent (and correct, if I supply the timer frequency instead of the CPU
frequency to xenalyze -- this is where it differs from x86) trace info
summary.

> add in extra tracing information
> add support for analyzing that data to xenalyze

And, well, these are exactly the steps I can really use some help
with :) are there any examples of parsing some additional custom trace
with xenalyze?
* Re: xentrace, xenalyze
From: Dario Faggioli @ 2016-02-24 15:53 UTC
To: Paul Sujkov, George Dunlap; Cc: xen-devel, Stefano Stabellini

On Wed, 2016-02-24 at 17:24 +0200, Paul Sujkov wrote:
> > I think actually the first thing you might need to do is to get
> > the xentrace infrastructure working on ARM
>
> Already done that. It requires some patches to the memory manager,
> timer and policies.
>
Really? Cool!

> I guess I should upstream them, though.
>
That would indeed be ideal!

> > add in extra tracing information
> > add support for analyzing that data to xenalyze
>
> And, well, these are exactly the steps I can really use some help
> with :) are there any examples of parsing some additional custom
> trace with xenalyze?
>
Have you seen my other email? In any case, have a look here:
http://lists.xenproject.org/archives/html/xen-devel/2016-02/msg02233.html

Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
* Re: xentrace, xenalyze
From: George Dunlap @ 2016-02-24 15:58 UTC
To: Paul Sujkov; Cc: Stefano Stabellini, xen-devel

On 24/02/16 15:24, Paul Sujkov wrote:
>> I think actually the first thing you might need to do is to get the
>> xentrace infrastructure working on ARM
>
> Already done that. It requires some patches to the memory manager,
> timer and policies. I guess I should upstream them, though.
>
>> After that, the next thing would be to add the equivalent of VMEXIT
>> and VMENTRY traces in the hypervisor on ARM guest exit and entry
>
> It seems that this is already covered as well. At least, I get a
> pretty decent (and correct, if I supply the timer frequency instead of
> the CPU frequency to xenalyze -- this is where it differs from x86)
> trace info summary.

You mean, you have local patches you haven't upstreamed? Or they're
already upstream? (If the latter, I don't see the trace definitions in
xen/include/public/trace.h...)

If I could see those traces I could give you better advice about how to
integrate them into xenalyze (and possibly how to change them so they
fit better into what xenalyze does).

>> add in extra tracing information
>> add support for analyzing that data to xenalyze
>
> And, well, these are exactly the steps I can really use some help
> with :) are there any examples of parsing some additional custom trace
> with xenalyze?

So at the basic level, xenalyze has a "dump" mode, which just attempts
to print out the trace records it sees in the file, in a human-readable
format, in the order in which they originally happened (even across
physical cores / processors). To get *that* working, you just need to
add it into the "triage" in xenalyze.c:process_record().

But the real power of xenalyze is to aggregate information about how
many vmexits of a particular type happened, and how long we spent (in
cycles) doing each one. The basic data structure for this is struct
event_cycles_summary. You keep such a struct for every separate type of
event that takes a certain number of cycles you want to be able to
classify. As you go through the trace file, whenever that event
happens, you call update_summary() with a pointer to the event struct
and the number of cycles. Then when you're done processing the whole
file, you call PRINT_SUMMARY() with a pointer to the summary struct,
along with the printf information you want printed before the summary
information.

So the next step, after getting the ARM equivalent of TRC_HVM_VMEXIT
and TRC_HVM_VMENTRY set up, would be to get the equivalent of
hvm_vmexit_process() and hvm_vmentry_process() (and hvm_close_vmexit())
set up. You'd probably want to start by creating a new structure,
arm_data, and adding it to the vcpu_data struct (beside hvm_data and
pv_data) (also making a new VCPU_DATA_ARM enumeration value, of
course).

The basic processing cycle goes like this:

* vmexit: Store the information about the vmexit in v->hvm_data
* Other HVM traces: add more information about what happened in
  v->hvm_data
* vmentry: Calculate the length of this event (vmentry.tsc -
  vmexit.tsc), figure out all the different summaries which correspond
  to this event, and call update_summary() on each of them.

One subtlety to introduce here: it's not uncommon to enter into Xen due
to a vmexit, do something on behalf of a guest, and then get scheduled
out to run some other vcpu.
The simplistic "vmexit -> vmentry" calculation would account this time waiting for the cpu as time processing the event -- which is not what you want. So xenalyze has a concept of "closing" a vmexit which happens when the vmexit is logically finished. hvm_close_vmexit() is called either from hvm_process_vmext(), or from process_runstate_change when it detects a vcpu switching to the "runnable" state. OK, hopefully that gives you enough to start with. :-) -George ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: xentrace, xenalyze
From: Paul Sujkov @ 2016-02-24 17:58 UTC
To: George Dunlap; Cc: Stefano Stabellini, xen-devel

> You mean, you have local patches you haven't upstreamed? Or they're
> already upstream? (If the latter, I don't see the trace definitions in
> xen/include/public/trace.h...)

Yep, local patches. Some of them still look like dirty hacks, which is
why they've been used internally for a while but were never upstreamed.

> If I could see those traces I could give you better advice about how to
> integrate them into xenalyze (and possibly how to change them so they
> fit better into what xenalyze does).

These are not the new traces yet, but the patches to get the old ones
visible on ARM: skipping the p2m translation for the Xen domain (to get
the mapping done), a correct get_cycles() function in time.h (to get
correct timestamps), a call to init_trace_bufs() in start_xen() for
ARM, and so on. But once I come up with a VGPU trace format and
implement the traces in the Xen trace subsystem (and check that
xentrace copies everything as expected), I'll contact you again. Thanks
in advance!

> OK, hopefully that gives you enough to start with. :-)

Yes, that's quite a starting point, thanks a lot :)
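P.S. For reference, the get_cycles() part boils down to something like
this (a sketch only -- the stock xen/include/asm-arm/time.h simply
returns 0 there, which makes every trace timestamp useless; the exact
counter accessor may differ by tree, e.g. READ_CP64(CNTPCT) on arm32):

static inline cycles_t get_cycles(void)
{
    /* Read the ARM generic timer physical counter instead of
     * returning 0, so trace records get real timestamps. */
    return READ_SYSREG64(CNTPCT_EL0);
}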
* Re: xentrace, xenalyze
From: Dario Faggioli @ 2016-02-24 14:51 UTC
To: Paul Sujkov, xen-devel; Cc: george.dunlap

On Wed, 2016-02-24 at 15:21 +0200, Paul Sujkov wrote:
> Hi,
>
Hi,

> I'm from the GlobalLogic team that uses Xen as a base for an
> automotive platform. I've got a few questions regarding Xen tracing,
> and it seems that the existing documentation is rather limited.
>
It is...

> At the previous Xen Hackathon I was talking about a shared (mediated
> pass-through) GPU concept for ARM; it's working well, but it still has
> performance issues, and some of them seem to be correlated with other
> Xen subsystems, though it's not obvious why (e.g. turning off the PV
> real-time clock driver gives us a significant boost in overall
> graphics performance). So I'm actually looking for two things here:
>
Great to hear you're (still) working on this. I'm looking forward to
seeing the results! :-D

> 1. to understand how I can use xenalyze to find bottlenecks in the
>    system at the moment
>
This is hard to tell (at least with such a broad scope). xentrace
supports quite a few events, but enabling all of them may not be what
you want. For one, it would produce traces that are so big and complex
to analyze that it would be very hard to make sense of them.

I usually enable a subset of them (one or more "classes") and try to
figure out whether I see the problem in the resulting trace. If yes, I
try with a narrower subset. If not, I try with either a broader or a
different one.

> 2. to add VGPU scheduler traces to the Xen trace subsystem, xenbaked,
>    xentrace and xenalyze
>
I've never used xenbaked. I've only used xentrace_format and xenalyze.
I don't know if xenbaked still works/can still be useful.

> Some insights into the second question can be found in the RTDS
> scheduler patches (they add a few of their own trace events and use
> the generic scheduler tracing); however, they build on the already
> functioning scheduler tracing subsystem, which is VCPU-specific and
> not suitable for VGPU, and there are no visible patches to xenalyze to
> parse these traces out.
>
Have a look at this series:
http://lists.xenproject.org/archives/html/xen-devel/2016-02/msg02233.html

And I've got another one that I'll send out asap (and I can Cc you).
There are for sure some examples of adding trace points in Xen, and of
adding support for them in xentrace_format and xenalyze.
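For reference, the "classes" are just masks over the top bits of the
event IDs. From xen/include/public/trace.h (values quoted from memory
-- double-check against your tree):

#define TRC_GEN     0x0001f000    /* general trace            */
#define TRC_SCHED   0x0002f000    /* scheduler trace           */
#define TRC_DOM0OP  0x0004f000    /* Xen DOM0 operation trace  */
#define TRC_HVM     0x0008f000    /* HVM trace                 */
#define TRC_MEM     0x0010f000    /* memory trace              */
#define TRC_PV      0x0020f000    /* PV trace                  */
#define TRC_SHADOW  0x0040f000    /* shadow tracing            */
#define TRC_HW      0x0080f000    /* hardware-related traces   */
#define TRC_ALL     0x0ffff000

/* The mask you pass to "xentrace -e <mask>" is an OR of these, e.g.
 * "xentrace -e 0x0002f000 out.trace" for the scheduler class only. */

Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)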
* Re: xentrace, xenalyze
From: Paul Sujkov @ 2016-02-24 16:00 UTC
To: Dario Faggioli; Cc: George Dunlap, xen-devel

> Have a look at this series:
> http://lists.xenproject.org/archives/html/xen-devel/2016-02/msg02233.html

Thanks a lot! Looking through it at the moment, looks very promising.

> And I've got another one that I'll send out asap (and I can Cc you).

Thanks in advance :)

> I usually enable a subset of them (one or more "classes") and try to
> figure out whether I see the problem in the resulting trace. If yes, I
> try with a narrower subset. If not, I try with either a broader or a
> different one.

Well, I have doubts about how to interpret the very basic info xenalyze
gives me. E.g. how can I measure intra-VM latencies, both globally (how
much PCPU time the hypervisor itself spent during the whole test run)
and locally (the same for specific interrupts)? Why is domain 32767
(the default domain for cases when it's not clear which domain traces
belong to, according to the documentation) getting quite a lot of PCPU
time (does this mean the traces are incorrect, or is there some
significant problem in the setup)? What's concurrency_hazard, partial
contention, full_contention, etc. (these are from the xenalyze
summary)? How can I get the number of context switches (overall or
average)?

Adding some subtler questions: e.g. I have a domain summary looking
like this:

|-- Domain 2 --|
 Runstates:
   blocked:              273  0.35s  7908 {  2093|  9561| 47811}
   partial run:         2284  1.27s  3420 {  6183|  6197|  6382}
   full run:            1322  0.10s   479 {    95|  3772|  6164}
   partial contention:   907  1.73s 11713 { 30655| 34266| 34305}
   concurrency_hazard:  2474  0.18s   435 {    48|  5681|  6206}
   full_contention:      381  0.02s   383 {    56| 36601| 36601}
...
 -- v0 --
  Runstates:
    running:   1981  1.36s  4217 {  6193|  6215|  6242}
    runnable:   737  1.74s 14472 {   271| 36780| 38705}
      wake:     430  0.04s   632 {    67| 26049| 35549}
      preempt:  307  1.69s 33856 {   108| 36650| 39345}
    blocked:    430  0.56s  7974 {  1189| 21758| 60893}
  cpu affinity:  336  66914 {  3456| 52202|243760}
    [0]:         167  66156 {  3650| 57926|216477}
    [1]:         169  67663 {  3205| 44754|245733}
 -- v1 --
  Runstates:
    running:   2773  0.29s   649 {    54|  6382|  6382}
    runnable:   874  0.22s  1520 {  5995| 36669| 36710}
      wake:     845  0.09s   640 {   452| 25366| 26313}
      preempt:   29  0.13s 27152 { 34413| 36708| 36710}
    blocked:    845  3.14s 22856 {  2477| 61224| 61422}
  cpu affinity:  391  57508 {  2788| 58686|128810}
    [0]:         196  59685 {  2834| 58664|128810}
    [1]:         195  55319 {  2770| 60622|130371}

It looks like Domain 2 had 0.10s of full run and 1.27s of partial run,
but its VCPU v0 was running for 1.36s and VCPU v1 for 0.29s. How do
these numbers relate, what exactly is a partial run, and can I get some
insight from the concurrency_hazard or full_contention numbers?

I'm trying to build up some understanding using mostly the xenalyze
sources, because the documentation doesn't go into any detail
whatsoever, but it's going pretty slowly.
* Re: xentrace, xenalyze
From: George Dunlap @ 2016-02-24 16:19 UTC
To: Paul Sujkov, Dario Faggioli; Cc: xen-devel

On 24/02/16 16:00, Paul Sujkov wrote:
>> Have a look at this series:
>> http://lists.xenproject.org/archives/html/xen-devel/2016-02/msg02233.html
>
> Thanks a lot! Looking through it at the moment, looks very promising.
>
>> And I've got another one that I'll send out asap (and I can Cc you).
>
> Thanks in advance :)
>
>> I usually enable a subset of them (one or more "classes") and try to
>> figure out whether I see the problem in the resulting trace. If yes,
>> I try with a narrower subset. If not, I try with either a broader or
>> a different one.
>
> Well, I have doubts about how to interpret the very basic info
> xenalyze gives me. E.g. how can I measure intra-VM latencies, both
> globally (how much PCPU time the hypervisor itself spent during the
> whole test run) and locally (the same for specific interrupts)?

You need to add ARM-specific traces to xen and xenalyze to get this
information.

> Why is domain 32767 (the default domain for cases when it's not clear
> which domain traces belong to, according to the documentation) getting
> quite a lot of PCPU time (does this mean the traces are incorrect, or
> is there some significant problem in the setup)?

Domain 3276*8* is the "default domain". 32767 is the idle domain. This
domain "getting pcpu time" means that the cpu is idle. :-)

> What's concurrency_hazard, partial contention, full_contention, etc.
> (these are from the xenalyze summary)? How can I get the number of
> context switches (overall or average)?
>
> Adding some subtler questions: e.g. I have a domain summary looking
> like this:
>
> |-- Domain 2 --|
>  Runstates:
>    blocked:              273  0.35s  7908 {  2093|  9561| 47811}
>    partial run:         2284  1.27s  3420 {  6183|  6197|  6382}
>    full run:            1322  0.10s   479 {    95|  3772|  6164}
>    partial contention:   907  1.73s 11713 { 30655| 34266| 34305}
>    concurrency_hazard:  2474  0.18s   435 {    48|  5681|  6206}
>    full_contention:      381  0.02s   383 {    56| 36601| 36601}
> ...
>  -- v0 --
>   Runstates:
>     running:   1981  1.36s  4217 {  6193|  6215|  6242}
>     runnable:   737  1.74s 14472 {   271| 36780| 38705}
>       wake:     430  0.04s   632 {    67| 26049| 35549}
>       preempt:  307  1.69s 33856 {   108| 36650| 39345}
>     blocked:    430  0.56s  7974 {  1189| 21758| 60893}
>   cpu affinity:  336  66914 {  3456| 52202|243760}
>     [0]:         167  66156 {  3650| 57926|216477}
>     [1]:         169  67663 {  3205| 44754|245733}
>  -- v1 --
>   Runstates:
>     running:   2773  0.29s   649 {    54|  6382|  6382}
>     runnable:   874  0.22s  1520 {  5995| 36669| 36710}
>       wake:     845  0.09s   640 {   452| 25366| 26313}
>       preempt:   29  0.13s 27152 { 34413| 36708| 36710}
>     blocked:    845  3.14s 22856 {  2477| 61224| 61422}
>   cpu affinity:  391  57508 {  2788| 58686|128810}
>     [0]:         196  59685 {  2834| 58664|128810}
>     [1]:         195  55319 {  2770| 60622|130371}
>
> It looks like Domain 2 had 0.10s of full run and 1.27s of partial run,
> but its VCPU v0 was running for 1.36s and VCPU v1 for 0.29s. How do
> these numbers relate, what exactly is a partial run, and can I get
> some insight from the concurrency_hazard or full_contention numbers?

So the *real* thing is the per-vcpu runstates. These correspond to
runstates inside of Xen. In the above traces, vcpu 0 entered the
"running" state 1981 times, and vcpu 1 entered the "running" state 2773
times. This measures the time between when the vcpu started executing
and when it stopped (which is probably what you mean by "context
switch").
The stuff for the domain is a concept I invented called "domain
runstates". Probably the easiest thing to do is to look at the
description I made when I tried to submit the calculation of these into
Xen itself:

http://lists.xen.org/archives/html/xen-devel/2010-11/msg01325.html

(That patch series was rejected, but as I understand it the patches are
still carried by XenServer.)

> I'm trying to build up some understanding using mostly the xenalyze
> sources, because the documentation doesn't go into any detail
> whatsoever, but it's going pretty slowly.

Right -- there's a huge amount of functionality, and it was initially
just a tool that I used myself. That said, I did write an html file
with a bit of documentation when xenalyze was in a separate tree; that
seems not to have been checked in with xenalyze: ...hmm, permissions
seem to be borked; I'll see if I can get that sorted and then send you
a link to it.
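And since you asked what concurrency_hazard & co. actually mean, here
is a rough reconstruction (from memory, not the actual patch code --
the names are mine, so treat the exact boundaries as approximate) of
how the per-vcpu runstates roll up into the domain runstate buckets:

enum vcpu_runstate { RS_RUNNING, RS_RUNNABLE, RS_BLOCKED };

enum domain_runstate {
    DOMRS_BLOCKED,             /* all vcpus blocked                      */
    DOMRS_PARTIAL_RUN,         /* some running, the rest blocked         */
    DOMRS_FULL_RUN,            /* all vcpus running                      */
    DOMRS_PARTIAL_CONTENTION,  /* some waiting for a pcpu, rest blocked  */
    DOMRS_CONC_HAZARD,         /* some running, some waiting for a pcpu  */
    DOMRS_FULL_CONTENTION,     /* all vcpus waiting for a pcpu           */
};

static enum domain_runstate classify(const enum vcpu_runstate *v, int n)
{
    int run = 0, wait = 0, i;

    for ( i = 0; i < n; i++ )
    {
        run  += (v[i] == RS_RUNNING);
        wait += (v[i] == RS_RUNNABLE);
    }

    if ( run == n )    return DOMRS_FULL_RUN;
    if ( run && wait ) return DOMRS_CONC_HAZARD;
    if ( run )         return DOMRS_PARTIAL_RUN;
    if ( wait == n )   return DOMRS_FULL_CONTENTION;
    if ( wait )        return DOMRS_PARTIAL_CONTENTION;
    return DOMRS_BLOCKED;
}

So "partial run" time is time when some of the domain's vcpus were
running and the rest were blocked; concurrency_hazard is time when some
were running while others were waiting for a pcpu.

-George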
Thread overview: 9+ messages

2016-02-24 13:21 xentrace, xenalyze  Paul Sujkov
2016-02-24 14:41 ` George Dunlap
2016-02-24 15:24   ` Paul Sujkov
2016-02-24 15:53     ` Dario Faggioli
2016-02-24 15:58     ` George Dunlap
2016-02-24 17:58       ` Paul Sujkov
2016-02-24 14:51 ` Dario Faggioli
2016-02-24 16:00   ` Paul Sujkov
2016-02-24 16:19     ` George Dunlap