From: Ingo Molnar <mingo@elte.hu>
To: "K.Prasad" <prasad@linux.vnet.ibm.com>
Cc: Arjan van de Ven <arjan@infradead.org>,
"Frank Ch. Eigler" <fche@redhat.com>,
peterz@infradead.org, linux-kernel@vger.kernel.org,
Frederic Weisbecker <fweisbec@gmail.com>
Subject: Re: [RFC PATCH] perf_core: provide a kernel-internal interface to get to performance counters
Date: Thu, 1 Oct 2009 12:28:37 +0200 [thread overview]
Message-ID: <20091001102837.GA31794@elte.hu> (raw)
In-Reply-To: <20091001100109.GB3636@in.ibm.com>
* K.Prasad <prasad@linux.vnet.ibm.com> wrote:
> On Thu, Oct 01, 2009 at 10:53:30AM +0200, Ingo Molnar wrote:
> >
> > * K.Prasad <prasad@linux.vnet.ibm.com> wrote:
> >
> > > On Thu, Oct 01, 2009 at 09:25:18AM +0200, Ingo Molnar wrote:
> > > >
> > > > * Arjan van de Ven <arjan@infradead.org> wrote:
> > > >
> > > > > On Sun, 27 Sep 2009 00:02:46 +0530
> > > > > "K.Prasad" <prasad@linux.vnet.ibm.com> wrote:
> > > > >
> > > > > > On Sat, Sep 26, 2009 at 12:03:28PM -0400, Frank Ch. Eigler wrote:
> > > > >
> > > > > > > For what it's worth, this sort of thing also looks useful from
> > > > > > > systemtap's point of view.
> > > > > >
> > > > > > Wouldn't SystemTap be another user that desires support for
> > > > > > multiple/all CPU perf-counters (apart from hw-breakpoints as a
> > > > > > potential user)? As Arjan pointed out, perf's present design would
> > > > > > support only a per-CPU or per-task counter; not both.
> > > > >
> > > > > I'm sorry but I think I am missing your point. "all cpu counters"
> > > > > would be one small helper wrapper away, a helper I'm sure the
> > > > > SystemTap people are happy to submit as part of their patch series
> > > > > when they submit SystemTap to the kernel.
> > > >
> > > > Yes, and Frederic wrote that wrapper already for the hw-breakpoints
> > > > patches. It's a non-issue and does not affect the design - we can always
> > > > gang up an array of per cpu perf events, it's a straightforward use of
> > > > the existing design.
> > > >
> > >
> > > Such a design (iteratively invoking a per-CPU perf event for all
> > > desired CPUs) isn't without issues, some of which are noted here:
> > > (apart from http://lkml.org/lkml/2009/9/14/298).
> > >
> > > - It breaks the abstraction that a user of the exported interfaces would
> > > enjoy w.r.t. having all CPU (or a cpumask of CPU) breakpoints.
> >
> > CPU offlining/onlining support would be interesting to add.
> >
> > > - (Un)Availability of debug registers on every requested CPU is not
> > > known until request for that CPU fails. A failed request should be
> > > followed by a rollback of the partially successful requests.
> >
> > Yes.
> >
> > > - Any breakpoint exceptions generated due to partially successful
> > > requests (before a failed request is encountered) must be treated as
> > > 'stray' and be ignored (by the end-user? or the wrapper code?).
> >
> > Such inatomicity is inherent in using more than one CPU and a disjoint
> > set of hw-breakpoints. If the calling code cares then callbacks
> > triggering while the registration has not returned yet can be ignored.
> >
>
> It can be prevented through book-keeping for debug registers, and
> takes a 'greedy' approach that writes values onto the physical registers
> only if it is known that there are sufficient slots available on all
> desired CPUs (as done by the register_kernel_hw_breakpoint() code in
> -tip now).
>
> > > - Any CPUs that become online eventually have to be trapped and
> > > populated with the appropriate debug register value (not something
> > > that the end-user of breakpoints should be bothered with).
> > >
> > > - Modifying the characteristics of a kernel breakpoint (including the
> > > valid CPUs) will be equally painful.
> > >
> > > - Races between the requests (also leading to temporary failure of
> > > all CPU requests) presenting an unclear picture about free debug
> > > registers (making it difficult to predict the need for a retry).
> > >
> > > So we either have a perf event infrastructure that is cognisant of
> > > many/all CPU counters, or make perf as a user of hw-breakpoints layer
> > > which already handles such requests in a deft manner (through
> > > appropriate book-keeping).
> >
> > Given that these are all still in the add-on category not affecting the
> > design, while the problems solved by perf events are definitely in the
> > non-trivial category, i'd suggest you extend perf events with a 'system
> > wide' event abstraction, which:
> >
> > - Enumerates such registered events (via a list)
> >
> > - Adds a CPU hotplug handler (which clones those events over to a new
> > CPU and directs it back to the ring-buffer of the existing event(s)
> > [if any])
> >
> > - Plus a state field that allows the filtering out of stray/premature
> > events.
> >
>
> With some book-keeping (as stated before) in place, stray exceptions
> due to premature events would be prevented since only successful
> requests are written onto debug registers. There would be no need for
> a rollback from the end-user too.
>
> But I'm not sure if such book-keeping variables/data-structures will
> find uses in other hw/sw events in perf apart from breakpoints
> (depends on whether there's a need for support for multiple instances
> of a hw/sw perf counter for a given CPU). If yes, then, the existing
> synchronisation mechanism (through spin-locks over hw_breakpoint_lock)
> must be extended over other perf events (post integration).
yes - i think the counter array currently used by 'perf top' could be
changed to use that new event type. Also, 'perf record --all' could use
it. SysProf (system-wide profiler) would also be potential users of it.
So yes, there would be use for this well beyond hw-breakpoints.
Ingo
next prev parent reply other threads:[~2009-10-01 10:28 UTC|newest]
Thread overview: 30+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-09-25 10:25 [RFC PATCH] perf_core: provide a kernel-internal interface to get to performance counters Arjan van de Ven
2009-09-25 10:44 ` Frederic Weisbecker
2009-09-25 11:42 ` Peter Zijlstra
2009-09-26 16:03 ` Frank Ch. Eigler
2009-09-26 16:11 ` Arjan van de Ven
2009-09-26 16:20 ` Frank Ch. Eigler
2009-09-26 18:32 ` K.Prasad
2009-09-26 18:48 ` Arjan van de Ven
2009-10-01 7:25 ` Ingo Molnar
2009-10-01 8:16 ` K.Prasad
2009-10-01 8:53 ` Ingo Molnar
2009-10-01 10:01 ` K.Prasad
2009-10-01 10:28 ` Ingo Molnar [this message]
2009-10-04 22:28 ` Frederic Weisbecker
2009-10-05 9:55 ` Ingo Molnar
2009-10-05 10:13 ` Frédéric Weisbecker
2009-10-05 7:53 ` Peter Zijlstra
2009-10-05 8:55 ` Ingo Molnar
2009-10-05 9:24 ` Frédéric Weisbecker
2009-10-05 9:48 ` Ingo Molnar
2009-10-05 10:08 ` Frédéric Weisbecker
2009-11-21 13:36 ` [tip:perf/core] perf/core: Provide " tip-bot for Arjan van de Ven
2010-02-05 15:47 ` [RFC PATCH] perf_core: provide " Christoph Hellwig
2010-02-05 17:59 ` john smith
2010-02-06 6:24 ` Arjan van de Ven
2010-02-06 11:46 ` Frederic Weisbecker
2010-02-06 14:18 ` Peter Zijlstra
2010-02-06 16:08 ` Frederic Weisbecker
2010-02-07 17:01 ` Ingo Molnar
-- strict thread matches above, loose matches on Subject: below --
2010-01-25 5:10 john smith
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20091001102837.GA31794@elte.hu \
--to=mingo@elte.hu \
--cc=arjan@infradead.org \
--cc=fche@redhat.com \
--cc=fweisbec@gmail.com \
--cc=linux-kernel@vger.kernel.org \
--cc=peterz@infradead.org \
--cc=prasad@linux.vnet.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox