* OProfile vs. Perfmon
From: John Levon @ 2003-10-07 23:27 UTC (permalink / raw)
To: linux-ia64
2.6 still lacks proper OProfile support. Will Cohen's older patch was
rejected because it did not re-use the perfmon interface.
I've been looking briefly at perfmon just now, and wanted to gather your
thoughts on the best way to integrate the two.
First, what OProfile needs:
 o some way to set up the counter values
 o system-wide statistical profiling
 o an in-kernel notification on counter overflow, and reset of the
   counter value at that point
 o some way to tag each generated event with a small integer value and
   the IP saved at the time of the overflow interrupt
And that's it, I think.
It was suggested previously that the OProfile kernel glue should hook
into perfmon's custom buffer API. Having looked briefly at it, it seems
the perfmon core has a lot of buffer management code that we really do
not want to use, as it would massively complicate the arch-independent
oprofile core. Instead, the overflow handler would feed straight into
oprofile_add_sample(). In effect, all we need is to be able to receive
the overflow interrupts.
It certainly looks like this is easy enough.
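
To make the intended flow concrete, here is a minimal userspace sketch of
an overflow hook that logs a sample and re-arms the counter. Everything in
it is an assumption for illustration: struct counter, struct sample,
mock_add_sample() and overflow_handler() are hypothetical names, and
mock_add_sample() only stands in for oprofile_add_sample(), whose real
kernel signature is not reproduced here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sample record: the small integer event id plus the IP. */
struct sample {
	unsigned long ip;
	unsigned int event;
};

#define NR_SAMPLES 64

/* Stand-in for the arch-independent oprofile sample buffer. */
static struct sample samples[NR_SAMPLES];
static size_t nr_samples;

/* Stand-in for oprofile_add_sample(); the real signature differs. */
static void mock_add_sample(unsigned long ip, unsigned int event)
{
	if (nr_samples < NR_SAMPLES) {
		samples[nr_samples].ip = ip;
		samples[nr_samples].event = event;
		nr_samples++;
	}
}

/* Hypothetical per-counter state: current value, reload value, event id. */
struct counter {
	uint64_t value;
	uint64_t reset_value;
	unsigned int event;
};

/* What the overflow hook would do: log the sample, then re-arm the counter. */
static void overflow_handler(struct counter *ctr, unsigned long ip)
{
	mock_add_sample(ip, ctr->event);
	ctr->value = ctr->reset_value;
}
```

The point of the sketch is that no buffer management is needed on the
perfmon side: the hook only needs the interrupted IP and the identity of
the counter that overflowed.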
The next issue is setting up the counters. Basically, userspace should
use pfmlib to program the required events, and sys_perfmonctl to set up
a system-wide session. From oprofiled's point of view, it would write
the PMCs, create a session, and start the profiling.
I'm a bit baffled by the code concerning ctx_cpu in system-wide mode. Is
user-space supposed to figure out how many CPUs are up and create a
context for each of them ? In particular:
	the_cpu = ctx->ctx_cpu = smp_processor_id();
I cannot make head or tail of why the CPU that happens to be running the
task calling sys_perfmonctl() is relevant at all to system-wide
profiling. Can someone explain ?
Finally, is the above summary roughly accurate, and are there any
pitfalls I might need to watch out for ?
thanks,
john
--
Khendon's Law:
If the same point is made twice by the same person, the thread is over.
* Re: OProfile vs. Perfmon
From: John Levon @ 2003-10-08 19:46 UTC (permalink / raw)
To: linux-ia64
On Wed, Oct 08, 2003 at 10:40:30AM -0700, Stephane Eranian wrote:
> Again, different tools have different goals. The important point
> is that I believe we can provide that kind of flexibility while still using
> a single kernel interface.
Absolutely agreed, it is not sensible to have multiple interfaces to the
same thing, and flexibility in the type of performance counting is
important.
> If you look at the default format (perfmon_default_smpl.c), you will see how
> this can be implemented. As for buffer management, again, perfmon-2 gives you a
> choice. You can have perfmon-2 allocate and remap the buffer into the address
> space of the tool OR you can do this yourself, i.e., allocate your own buffer
> AND export it in ANY WAY you want.
Sounds great.
> The idea is that the user space tools would go through the perfmonctl() system
> call to set up the counters for a system-wide session. You do NOT need to use
> pfmlib if you already have your own library to handle the IA-64 events and
> their encodings. In fact, the upcoming version 3.0 of the libpfm library is
> TOTALLY disconnected from the kernel interface. In any case, keep in mind
> that the library DOES NOT call perfmonctl(), it is just here to help you
> figure out what values to program into the PMC registers given what you want to
> measure.
We do have our own stuff but it would certainly make sense for us to use
pfmlib for the allocation/setup.
> The tool simply needs to fork/pthread_create a worker task for each CPU
> it wants to measure on and it must pin the task. The system-wide perfmon
> context will be bound to the CPU it was created on. The creator task can
> later migrate to other CPUs, but the context can only be accessed from tasks
> running on its CPU. I realize that this is not the model you have chosen, but
> I believe you can fairly easily hide the difference in some small library.
OK, thanks for clarifying. It's certainly not much code to fork off N
tasks and pin them to each CPU before calling the stuff to create a
context.
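
As a rough sketch of that fork-and-pin step, assuming Linux with glibc:
sched_getaffinity(), sched_setaffinity() and the CPU_* macros are the
standard interfaces, while pin_to_cpu(), run_on_each_cpu() and
check_pinned() are hypothetical helper names. The perfmon context creation
itself is left as a comment, since its exact calls are not shown in this
thread.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Pin the calling task to a single CPU. Returns 0 on success, -1 on error. */
static int pin_to_cpu(int cpu)
{
	cpu_set_t set;

	CPU_ZERO(&set);
	CPU_SET(cpu, &set);
	return sched_setaffinity(0, sizeof(set), &set);
}

/*
 * Fork one worker per CPU this task may run on, pin it, and run fn(cpu)
 * in it. Returns the number of workers that exited with status 0, or -1
 * if the affinity mask could not be read.
 */
static int run_on_each_cpu(int (*fn)(int cpu))
{
	cpu_set_t allowed;
	int started = 0, ok = 0;

	if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0)
		return -1;

	for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
		pid_t pid;

		if (!CPU_ISSET(cpu, &allowed))
			continue;
		pid = fork();
		if (pid == 0) {
			if (pin_to_cpu(cpu) != 0)
				_exit(1);
			/* The per-CPU context would be created here, after pinning. */
			_exit(fn(cpu) == 0 ? 0 : 1);
		}
		if (pid > 0)
			started++;
	}

	while (started-- > 0) {
		int status;

		if (wait(&status) > 0 && WIFEXITED(status) &&
		    WEXITSTATUS(status) == 0)
			ok++;
	}
	return ok;
}

/* Example worker: succeeds only if the task is pinned to exactly this CPU. */
static int check_pinned(int cpu)
{
	cpu_set_t set;

	if (sched_getaffinity(0, sizeof(set), &set) != 0)
		return -1;
	return (CPU_ISSET(cpu, &set) && CPU_COUNT(&set) == 1) ? 0 : -1;
}
```

Calling run_on_each_cpu(check_pinned) forks one pinned worker per allowed
CPU and reports how many came up correctly. Walking the affinity mask
rather than counting from 0 to N-1 avoids assuming that CPU numbers are
contiguous.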
I'll get hacking :)
thanks,
john
--
Khendon's Law:
If the same point is made twice by the same person, the thread is over.