* [Patch 0/1] HW-BKPT: Allow per-cpu kernel-space Hardware Breakpoint requests
@ 2009-08-17 12:46 K.Prasad
2009-08-19 16:11 ` K.Prasad
0 siblings, 1 reply; 15+ messages in thread
From: K.Prasad @ 2009-08-17 12:46 UTC (permalink / raw)
To: LKML
Cc: Frederic Weisbecker, Ingo Molnar, Peter Zijlstra, Lai Jiangshan,
Steven Rostedt, Mathieu Desnoyers, Alan Stern
Hi All,
Please find a patch that enables kernel-space breakpoints to be
requested for a subset of the available CPUs in the system. This allows
per-CPU breakpoints and comes with the associated benefit of reduced
overhead during (un)registration.
This enhancement allows 'perf', which produces per-CPU information, to
exploit the hardware breakpoint registers.
Design changes
--------------
- Every breakpoint request 'consumes' the first available debug register
(starting from HBP_NUM) in each CPU represented by 'cpumask' field in
'struct hw_breakpoint'.
- 'hbp_kernel_pos' (that separates kernel-space breakpoints from the
free/user-space breakpoints) now points to the maximum of debug
registers consumed on any given CPU.
-- 'hbp_kernel_pos' is decremented (one-at-a-time) to allow a new-slot
for kernel-space requests iff all debug registers on the given CPU
(from HBP_NUM - 1 to 'hbp_kernel_pos') are already consumed.
-- 'hbp_kernel_pos' is incremented (one-at-a-time) to free a slot iff
a removal request results in the release of a bkpt request that
consumed maximum debug registers for kernel-space.
- Every removal request results in compaction of breakpoint registers
(on a per-cpu basis) to occupy the vacant debug register.
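The bookkeeping above can be sketched roughly as follows. This is only a
toy model, not the patch itself: HBP_NUM and hbp_kernel_pos are the names
used above, while NR_CPUS, kernel_used[], register_kernel_bp() and
unregister_kernel_bp() are hypothetical names chosen for illustration. It
assumes that kernel-space requests fill registers from HBP_NUM - 1
downwards, that compaction lets a per-CPU count stand in for the
individual slots, and that a request fails as a whole if any CPU in the
mask is full. User-space breakpoints are ignored.

```c
#include <assert.h>

#define HBP_NUM 4   /* debug registers per CPU (4 on x86) */
#define NR_CPUS 4   /* toy CPU count, an assumption for this sketch */

static int kernel_used[NR_CPUS];      /* kernel-space registers consumed per CPU */
static int hbp_kernel_pos = HBP_NUM;  /* lowest register index owned by the kernel */

/* Largest number of kernel-space registers consumed on any CPU. */
static int max_kernel_used(void)
{
    int cpu, max = 0;

    for (cpu = 0; cpu < NR_CPUS; cpu++)
        if (kernel_used[cpu] > max)
            max = kernel_used[cpu];
    return max;
}

/* Consume one register on every CPU in cpumask; -1 if any of them is full. */
int register_kernel_bp(unsigned int cpumask)
{
    int cpu;

    for (cpu = 0; cpu < NR_CPUS; cpu++)
        if ((cpumask & (1u << cpu)) && kernel_used[cpu] == HBP_NUM)
            return -1;

    for (cpu = 0; cpu < NR_CPUS; cpu++)
        if (cpumask & (1u << cpu))
            kernel_used[cpu]++;

    /* Moves down by at most one, and only if a CPU hit a new maximum. */
    hbp_kernel_pos = HBP_NUM - max_kernel_used();
    assert(hbp_kernel_pos >= 0 && hbp_kernel_pos <= HBP_NUM);
    return 0;
}

/* Release one register on every CPU in cpumask (compaction is implicit). */
void unregister_kernel_bp(unsigned int cpumask)
{
    int cpu;

    for (cpu = 0; cpu < NR_CPUS; cpu++)
        if ((cpumask & (1u << cpu)) && kernel_used[cpu] > 0)
            kernel_used[cpu]--;

    /* Moves up only if the released request held the maximum. */
    hbp_kernel_pos = HBP_NUM - max_kernel_used();
}
```

In this model a request for CPUs 1 and 2 does not move hbp_kernel_pos if
CPU 0 already holds a kernel breakpoint, since the per-CPU maximum is
unchanged.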
The patch is based on commit b6c720b811aed0eeda89f277f13c1bd1bdf721fd of
-tip tree and has been tested to work fine on an x86 machine for both
cases (i.e. system-wide kernel breakpoints and bkpts for a subset of CPUs).
Please let me know your comments on the same.
Thanks,
K.Prasad
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Patch 0/1] HW-BKPT: Allow per-cpu kernel-space Hardware Breakpoint requests
2009-08-17 12:46 [Patch 0/1] HW-BKPT: Allow per-cpu kernel-space Hardware Breakpoint requests K.Prasad
@ 2009-08-19 16:11 ` K.Prasad
2009-08-19 17:33 ` Frederic Weisbecker
0 siblings, 1 reply; 15+ messages in thread
From: K.Prasad @ 2009-08-19 16:11 UTC (permalink / raw)
To: LKML, Frederic Weisbecker
Cc: Ingo Molnar, Peter Zijlstra, Lai Jiangshan, Steven Rostedt,
Mathieu Desnoyers, Alan Stern
On Mon, Aug 17, 2009 at 06:16:41PM +0530, K.Prasad wrote:
> Hi All,
> Please find a patch that enables kernel-space breakpoints to be
> requested for a subset of the available CPUs in the system. This allows
> per-CPU breakpoints and comes with the associated benefit of reduced
> overhead during (un)registration.
>
> This enhancement allows exploitation of hardware breakpoint registers by
> 'perf' which produces a CPU-wise information.
>
> Design changes
> --------------
> - Every breakpoint request 'consumes' the first available debug register
> (starting from HBP_NUM) in each CPU represented by 'cpumask' field in
> 'struct hw_breakpoint'.
>
> - 'hbp_kernel_pos' (that separates kernel-space breakpoints from the
> free/user-space breakpoints) now points to the maximum of debug
> registers consumed on any given CPU.
> -- 'hbp_kernel_pos' is decremented (one-at-a-time) to allow a new-slot
> for kernel-space requests iff all debug registers on the given CPU
> (from HBP_NUM - 1 to 'hbp_kernel_pos' are already consumed.
> -- 'hbp_kernel_pos' is incremented (one-at-a-time) to free a slot iff
> a removal request results in the release of a bkpt request that
> consumed maximum debug registers for kernel-space.
>
> - Every removal request results in compaction of breakpoint registers
> (on a per-cpu basis) to occupy the vacant debug register.
>
> The patch is based on commit b6c720b811aed0eeda89f277f13c1bd1bdf721fd of
> -tip tree and has been tested to work fine on an x86 machine for both
> cases (i.e. system-wide kernel breakpoints and bkpts for a subset of CPUs).
>
> Please let me know your comments on the same.
>
> Thanks,
> K.Prasad
>
Hi Frederic,
Do you find these patches, that provide the ability to restrict
kernel-space breakpoints to any given subset of CPUs, to bring the
requisite features for exploitation of hw-bkpt by 'perf tools'?
Also of interest would be the reduced overhead associated with
(un)register_kernel_hw_breakpoint() operations (no IPI in case of
single-CPU breakpoint request).
Thanks,
K.Prasad
* Re: [Patch 0/1] HW-BKPT: Allow per-cpu kernel-space Hardware Breakpoint requests
2009-08-19 16:11 ` K.Prasad
@ 2009-08-19 17:33 ` Frederic Weisbecker
2009-08-20 17:27 ` K.Prasad
0 siblings, 1 reply; 15+ messages in thread
From: Frederic Weisbecker @ 2009-08-19 17:33 UTC (permalink / raw)
To: K.Prasad, Peter Zijlstra, Ingo Molnar
Cc: LKML, Lai Jiangshan, Steven Rostedt, Mathieu Desnoyers,
Alan Stern
On Wed, Aug 19, 2009 at 09:41:19PM +0530, K.Prasad wrote:
> On Mon, Aug 17, 2009 at 06:16:41PM +0530, K.Prasad wrote:
> > Hi All,
> > Please find a patch that enables kernel-space breakpoints to be
> > requested for a subset of the available CPUs in the system. This allows
> > per-CPU breakpoints and comes with the associated benefit of reduced
> > overhead during (un)registration.
> >
> > This enhancement allows exploitation of hardware breakpoint registers by
> > 'perf' which produces a CPU-wise information.
> >
> > Design changes
> > --------------
> > - Every breakpoint request 'consumes' the first available debug register
> > (starting from HBP_NUM) in each CPU represented by 'cpumask' field in
> > 'struct hw_breakpoint'.
> >
> > - 'hbp_kernel_pos' (that separates kernel-space breakpoints from the
> > free/user-space breakpoints) now points to the maximum of debug
> > registers consumed on any given CPU.
> > -- 'hbp_kernel_pos' is decremented (one-at-a-time) to allow a new-slot
> > for kernel-space requests iff all debug registers on the given CPU
> > (from HBP_NUM - 1 to 'hbp_kernel_pos' are already consumed.
> > -- 'hbp_kernel_pos' is incremented (one-at-a-time) to free a slot iff
> > a removal request results in the release of a bkpt request that
> > consumed maximum debug registers for kernel-space.
> >
> > - Every removal request results in compaction of breakpoint registers
> > (on a per-cpu basis) to occupy the vacant debug register.
> >
> > The patch is based on commit b6c720b811aed0eeda89f277f13c1bd1bdf721fd of
> > -tip tree and has been tested to work fine on an x86 machine for both
> > cases (i.e. system-wide kernel breakpoints and bkpts for a subset of CPUs).
> >
> > Please let me know your comments on the same.
> >
> > Thanks,
> > K.Prasad
> >
>
> Hi Frederic,
> Do you find these patches, that provide the ability to restrict
> kernel-space breakpoints to any given subset of CPUs, to bring the
> requisite features for exploitation of hw-bkpt by 'perf tools'?
>
> Also of interest would be the reduced overhead associated with
> (un)register_kernel_hw_breakpoint() operations (no IPI in case of
> single-CPU breakpoint request).
>
> Thanks,
> K.Prasad
>
Nice.
Yeah I just reviewed the patch and it looks good.
Now I guess we should meet two other requirements for a pmu
through this high-level API:
- only update the hardware registers when needed: while switching
to another thread of a same group, the hardware register switching
is wasteful.
BTW, I wonder if we need a flag while creating a user bp that tells whether
the bp is inherited through fork/clone calls.
- having a callback that quickly swaps two breakpoints in order to support
hardware register multiplexing. I guess the pmu object would just need
to call it when multiplexing is decided.
Providing those would let us build a pmu struct on top of this high level API,
hopefully.
All that would benefit both sides. It saves us from building a low-level PMU
that reinvents the wheel, i.e. the hardware breakpoints API already handles a
lot of things on both the arch and core sides (debug register setting tricks
with dr7 and co, cpu hotplug, kexec, etc.).
To the bp API it brings more power (register switching only if needed, per-cpu
support, clone inheritance support, etc.).
And in the end we have a pmu (which unifies the control of this profiling
unit through a well-established and known object for perfcounter) controlled by
a high-level API that could also benefit other debugging subsystems.
What do you think?
It would also be nice to have Peter's and Ingo's opinions about it, to be sure
we are not going in the wrong direction.
Thanks.
* Re: [Patch 0/1] HW-BKPT: Allow per-cpu kernel-space Hardware Breakpoint requests
2009-08-19 17:33 ` Frederic Weisbecker
@ 2009-08-20 17:27 ` K.Prasad
2009-08-21 14:28 ` Ingo Molnar
2009-08-25 20:33 ` K.Prasad
0 siblings, 2 replies; 15+ messages in thread
From: K.Prasad @ 2009-08-20 17:27 UTC (permalink / raw)
To: Frederic Weisbecker, Ingo Molnar, Peter Zijlstra
Cc: LKML, Lai Jiangshan, Steven Rostedt, Mathieu Desnoyers,
Alan Stern
On Wed, Aug 19, 2009 at 07:33:00PM +0200, Frederic Weisbecker wrote:
> On Wed, Aug 19, 2009 at 09:41:19PM +0530, K.Prasad wrote:
> > On Mon, Aug 17, 2009 at 06:16:41PM +0530, K.Prasad wrote:
> > > Hi All,
> > > Please find a patch that enables kernel-space breakpoints to be
> > > requested for a subset of the available CPUs in the system. This allows
> > > per-CPU breakpoints and comes with the associated benefit of reduced
> > > overhead during (un)registration.
> > >
> > > This enhancement allows exploitation of hardware breakpoint registers by
> > > 'perf' which produces a CPU-wise information.
> > >
[edited]
> >
> > Hi Frederic,
> > Do you find these patches, that provide the ability to restrict
> > kernel-space breakpoints to any given subset of CPUs, to bring the
> > requisite features for exploitation of hw-bkpt by 'perf tools'?
> >
> > Also of interest would be the reduced overhead associated with
> > (un)register_kernel_hw_breakpoint() operations (no IPI in case of
> > single-CPU breakpoint request).
> >
> > Thanks,
> > K.Prasad
> >
>
>
> Nice.
> Yeah I just reviewed the patch and it looks good.
>
> Now I guess we should meet two others requirements for a pmu
> through this high level Api:
>
> - only update the hardware registers when needed: while switching
> to another thread of a same group, the hardware register switching
> is wasteful.
> BTW, I wonder if we need a flag while creating a user bp that tells whether
> the bp is inherited through fork/clone calls.
>
So this means avoiding a re-write of addresses into debug registers when
they don't change. It is indeed desirable and would help if the same
breakpoint is used across, say, many/all threads of a process.
However, I believe that the time taken for this is minuscule compared
to the overhead involved in a context switch. Perhaps we could consider
this requirement at a later time?
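For what it's worth, the optimisation could look roughly like the sketch
below: keep a shadow copy of what is in the debug registers and skip the
write when the incoming thread wants the same address. The struct, the
hw_writes counter and switch_breakpoints() are hypothetical names for
this illustration; write_debug_reg() merely stands in for set_debugreg()
and does not touch real hardware.

```c
#include <assert.h>

#define HBP_NUM 4   /* debug registers per CPU (4 on x86) */

struct cpu_debug_state {
    unsigned long dr[HBP_NUM];  /* shadow copy of the hardware registers */
    unsigned int hw_writes;     /* number of simulated register writes */
};

/* Stand-in for set_debugreg(); only updates the shadow state. */
static void write_debug_reg(struct cpu_debug_state *cpu, int idx,
                            unsigned long addr)
{
    cpu->dr[idx] = addr;
    cpu->hw_writes++;
}

/* On context switch, rewrite only the registers whose address changes. */
void switch_breakpoints(struct cpu_debug_state *cpu,
                        const unsigned long next[HBP_NUM])
{
    int i;

    for (i = 0; i < HBP_NUM; i++)
        if (cpu->dr[i] != next[i])
            write_debug_reg(cpu, i, next[i]);
}
```

Switching between two threads of the same process that share all their
breakpoints would then cost HBP_NUM comparisons and no register writes.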
> - having a callback that quickly swap two breakpoints in order to support
> the hardware register multiplexing. I guess the pmu object would just need
> to call it when the multiplexing is decided.
>
>
Are you suggesting something like a modify_kernel_hw_breakpoint() that
can quickly change a breakpoint address/characteristics?
That's quite doable...it requires a quick validation through
arch_validate_hwbkpt_settings() and the requisite IPIs (depending on
what the new cpumask is).
I will send a patch to that effect soon.
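A rough sketch of what I have in mind, to be taken as an outline only:
struct bp_request, validate_bp() and modify_bp() are hypothetical names,
and validate_bp() is a simplified stand-in for
arch_validate_hwbkpt_settings() that checks x86-style length and
alignment. It also assumes that every CPU in the old or the new cpumask
must be updated, and that the calling CPU is updated locally without an
IPI.

```c
#include <assert.h>

struct bp_request {
    unsigned long addr;     /* breakpoint address */
    unsigned int len;       /* length in bytes: 1, 2, 4 or 8 */
    unsigned int cpumask;   /* bit n set => active on CPU n */
};

/* Simplified stand-in for arch_validate_hwbkpt_settings(). */
static int validate_bp(const struct bp_request *bp)
{
    if (bp->len != 1 && bp->len != 2 && bp->len != 4 && bp->len != 8)
        return -1;
    if (bp->addr & (bp->len - 1))   /* address must be len-aligned */
        return -1;
    if (bp->cpumask == 0)
        return -1;
    return 0;
}

/*
 * Replace *bp with *new_bp. On success returns 0 and stores in *ipi_mask
 * the CPUs that would need an IPI. On failure returns -1 and leaves *bp
 * and *ipi_mask untouched.
 */
int modify_bp(struct bp_request *bp, const struct bp_request *new_bp,
              unsigned int this_cpu, unsigned int *ipi_mask)
{
    unsigned int affected;

    if (validate_bp(new_bp) != 0)
        return -1;

    /* CPUs losing the old settings plus CPUs gaining the new ones. */
    affected = bp->cpumask | new_bp->cpumask;
    *ipi_mask = affected & ~(1u << this_cpu);
    *bp = *new_bp;
    return 0;
}
```

When both the old and the new request are restricted to the calling CPU,
the resulting IPI mask is empty.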
> Providing those would let us build a pmu struct on top of this high level API,
> hopefully.
>
> All that would be a benefit in both sides. It avoids us building a low level PMU
> that reinvent the wheel, ie: the hardware breakpoints API handles a lot of things
> both in arch and core sides (debug register setting tricks with dr7 and co,
> cpu hotplug, kexec, etc...).
> In the bp API it brings more power (register switching only if needed, per cpu
> support, clone inheritance support, etc...)
>
> And in the end we have a pmu (which unifies the control of this profiling
> unit through a well established and known object for perfcounter) controlled by
> a high level API that could also benefit to other debugging subsystems.
>
> What do you think?
> It would be also nice to have Peter's and Ingo opinion about it, to be sure
> we are not going in the wrong direction.
>
Indeed, it will be nice to know from Ingo and Peter that we are heading
in the right direction.
Thanks,
K.Prasad
* Re: [Patch 0/1] HW-BKPT: Allow per-cpu kernel-space Hardware Breakpoint requests
2009-08-20 17:27 ` K.Prasad
@ 2009-08-21 14:28 ` Ingo Molnar
2009-08-26 3:36 ` Frederic Weisbecker
2009-08-25 20:33 ` K.Prasad
1 sibling, 1 reply; 15+ messages in thread
From: Ingo Molnar @ 2009-08-21 14:28 UTC (permalink / raw)
To: K.Prasad
Cc: Frederic Weisbecker, Peter Zijlstra, LKML, Lai Jiangshan,
Steven Rostedt, Mathieu Desnoyers, Alan Stern
* K.Prasad <prasad@linux.vnet.ibm.com> wrote:
> > Providing those would let us build a pmu struct on top of this
> > high level API, hopefully.
Note that there's a PMU struct already in
arch/x86/kernel/cpu/perf_counter.c. Could debug-register ops be
tacked on to it?
> > All that would be a benefit in both sides. It avoids us building
> > a low level PMU that reinvent the wheel, ie: the hardware
> > breakpoints API handles a lot of things both in arch and core
> > sides (debug register setting tricks with dr7 and co, cpu
> > hotplug, kexec, etc...). In the bp API it brings more power
> > (register switching only if needed, per cpu support, clone
> > inheritance support, etc...)
> >
> > And in the end we have a pmu (which unifies the control of this
> > profiling unit through a well established and known object for
> > perfcounter) controlled by a high level API that could also
> > benefit to other debugging subsystems.
> >
> > What do you think? It would be also nice to have Peter's and
> > Ingo opinion about it, to be sure we are not going in the wrong
> > direction.
>
> Indeed, it will be nice to know from Ingo and Peter that we are
> heading right.
If you do this proper perfcounters integration then i'm certainly
happy.
Ingo
* Re: [Patch 0/1] HW-BKPT: Allow per-cpu kernel-space Hardware Breakpoint requests
2009-08-20 17:27 ` K.Prasad
2009-08-21 14:28 ` Ingo Molnar
@ 2009-08-25 20:33 ` K.Prasad
1 sibling, 0 replies; 15+ messages in thread
From: K.Prasad @ 2009-08-25 20:33 UTC (permalink / raw)
To: Frederic Weisbecker, Ingo Molnar, Peter Zijlstra
Cc: LKML, Lai Jiangshan, Steven Rostedt, Mathieu Desnoyers,
Alan Stern
On Thu, Aug 20, 2009 at 10:57:19PM +0530, K.Prasad wrote:
> On Wed, Aug 19, 2009 at 07:33:00PM +0200, Frederic Weisbecker wrote:
> > On Wed, Aug 19, 2009 at 09:41:19PM +0530, K.Prasad wrote:
> > > On Mon, Aug 17, 2009 at 06:16:41PM +0530, K.Prasad wrote:
> > > > Hi All,
> > > > Please find a patch that enables kernel-space breakpoints to be
> > > > requested for a subset of the available CPUs in the system. This allows
> > > > per-CPU breakpoints and comes with the associated benefit of reduced
> > > > overhead during (un)registration.
> > > >
> > > > This enhancement allows exploitation of hardware breakpoint registers by
> > > > 'perf' which produces a CPU-wise information.
> > > >
> [edited]
> > >
> > > Hi Frederic,
> > > Do you find these patches, that provide the ability to restrict
> > > kernel-space breakpoints to any given subset of CPUs, to bring the
> > > requisite features for exploitation of hw-bkpt by 'perf tools'?
> > >
> > > Also of interest would be the reduced overhead associated with
> > > (un)register_kernel_hw_breakpoint() operations (no IPI in case of
> > > single-CPU breakpoint request).
> > >
[edited]
> > - having a callback that quickly swap two breakpoints in order to support
> > the hardware register multiplexing. I guess the pmu object would just need
> > to call it when the multiplexing is decided.
> >
> >
>
> Are you suggesting something like a modify_kernel_hw_breakpoint() that
> can quickly change a breakpoint address/characteristics?
>
> That's quite doable...it requires a quick validation through
> arch_validate_hwbkpt_settings() and the requisite IPIs (depending on
> what the new cpumask is).
>
> I will send a patch to that effect soon.
>
Hi Frederic,
I just sent a patchset that adds the ability to specify per-cpu
kernel-space breakpoints + a (relatively) lightweight function to modify
the characteristics of a kernel-space breakpoint that can be used to
swap between two breakpoint requests.
Please pull them into -tip tree if you find them mature and ready.
With these new feature additions, I see the HW-Breakpoint infrastructure
code ready to meet the needs for exploitation by perf-tools and I presume
you would restart your effort on the same?
Thanks,
K.Prasad
* Re: [Patch 0/1] HW-BKPT: Allow per-cpu kernel-space Hardware Breakpoint requests
2009-08-21 14:28 ` Ingo Molnar
@ 2009-08-26 3:36 ` Frederic Weisbecker
2009-08-26 9:16 ` Ingo Molnar
0 siblings, 1 reply; 15+ messages in thread
From: Frederic Weisbecker @ 2009-08-26 3:36 UTC (permalink / raw)
To: Ingo Molnar
Cc: K.Prasad, Peter Zijlstra, LKML, Lai Jiangshan, Steven Rostedt,
Mathieu Desnoyers, Alan Stern
On Fri, Aug 21, 2009 at 04:28:11PM +0200, Ingo Molnar wrote:
>
> * K.Prasad <prasad@linux.vnet.ibm.com> wrote:
>
> > > Providing those would let us build a pmu struct on top of this
> > > high level API, hopefully.
>
> Note that there's a PMU struct already in
> arch/x86/kernel/cpu/perf_counter.c. Could debug-register ops be
> tacked on to it?
No, we don't need to build an arch level pmu since the BP api
already handles the arch abstraction (or well, it is planned to).
Instead, what we need is a core pmu that relies on the BP api.
Such pmu will be allocated dynamically while creating a hardware
breakpoint counter.
* Re: [Patch 0/1] HW-BKPT: Allow per-cpu kernel-space Hardware Breakpoint requests
2009-08-26 3:36 ` Frederic Weisbecker
@ 2009-08-26 9:16 ` Ingo Molnar
2009-08-26 11:49 ` Frederic Weisbecker
0 siblings, 1 reply; 15+ messages in thread
From: Ingo Molnar @ 2009-08-26 9:16 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: K.Prasad, Peter Zijlstra, LKML, Lai Jiangshan, Steven Rostedt,
Mathieu Desnoyers, Alan Stern
* Frederic Weisbecker <fweisbec@gmail.com> wrote:
> On Fri, Aug 21, 2009 at 04:28:11PM +0200, Ingo Molnar wrote:
> >
> > * K.Prasad <prasad@linux.vnet.ibm.com> wrote:
> >
> > > > Providing those would let us build a pmu struct on top of this
> > > > high level API, hopefully.
> >
> > Note that there's a PMU struct already in
> > arch/x86/kernel/cpu/perf_counter.c. Could debug-register ops be
> > tacked on to it?
>
> No, we don't need to build an arch level pmu since the BP api
> already handles the arch abstraction (or well, it is planned to).
>
> Instead, what we need is a core pmu that relies on the BP api.
> Such pmu will be allocated dynamically while creating a hardware
> breakpoint counter.
i'm not convinced at all we need all that layering of
perfcounters->pmu->BP. Why not add BP support to the PMU abstraction
and be done with it?
That way we get hardware breakpoints via 'pinned, exclusive, per cpu
hw-breakpoint counters' for example and kernel/hw-breakpoint.c can
go away altogether.
kernel/perf_counter.c already handles scheduling, conflict
resolution, enumeration, syscall exposure and more.
Hm?
Ingo
* Re: [Patch 0/1] HW-BKPT: Allow per-cpu kernel-space Hardware Breakpoint requests
2009-08-26 9:16 ` Ingo Molnar
@ 2009-08-26 11:49 ` Frederic Weisbecker
2009-08-26 18:02 ` K.Prasad
0 siblings, 1 reply; 15+ messages in thread
From: Frederic Weisbecker @ 2009-08-26 11:49 UTC (permalink / raw)
To: Ingo Molnar
Cc: K.Prasad, Peter Zijlstra, LKML, Lai Jiangshan, Steven Rostedt,
Mathieu Desnoyers, Alan Stern
On Wed, Aug 26, 2009 at 11:16:42AM +0200, Ingo Molnar wrote:
>
> * Frederic Weisbecker <fweisbec@gmail.com> wrote:
>
> > On Fri, Aug 21, 2009 at 04:28:11PM +0200, Ingo Molnar wrote:
> > >
> > > * K.Prasad <prasad@linux.vnet.ibm.com> wrote:
> > >
> > > > > Providing those would let us build a pmu struct on top of this
> > > > > high level API, hopefully.
> > >
> > > Note that there's a PMU struct already in
> > > arch/x86/kernel/cpu/perf_counter.c. Could debug-register ops be
> > > tacked on to it?
> >
> > No, we don't need to build an arch level pmu since the BP api
> > already handles the arch abstraction (or well, it is planned to).
> >
> > Instead, what we need is a core pmu that relies on the BP api.
> > Such pmu will be allocated dynamically while creating a hardware
> > breakpoint counter.
>
> i'm not convinced at all we need all that layering of
> perfcounters->pmu->BP. Why not add BP support to the PMU abstraction
> and be done with it?
>
> That way we get hardware breakpoints via 'pinned, exclusive, per cpu
> hw-breakpoint counters' for example and kernel/hw-breakpoint.c can
> go away altogether.
>
> kernel/perf_counter.c already handles scheduling, conflict
> resolution, enumeration, syscall exposure and more.
>
> Hm?
What you are suggesting is a complete refactoring of the breakpoint API
on top of pmus.
Well, that's possible and would factorize the scheduling, conflict handling
and so on. So that's theoretically a good point and I hope we'll come to such
centralization; it resembles my suggestion to Peter to share the
perfcounter layer that handles the scheduling of hardware registers.
But the pmu handling is currently not ready for that.
For now it's completely tied to perfcounter. The pmu handling must
become completely standalone wrt perfcounter, because hardware
breakpoints shouldn't depend on perfcounter.
They couldn't anyway, because they are not only wanted for perfcounter but
are currently used by ptrace (and perhaps various other users), and
I can't imagine we need to open a perfcounter to use ptrace facilities.
That said, I really agree with the concept, we could then drop the
scheduling bindings for hardware breakpoints and use a centralized
thing for that which would be the pmu.
But:
- the PMUs handling is not ready for that as I explained above
- we still need the hardware breakpoint layer that decodes a breakpoint
request (address, length of the memory target, number of registers
limitation). This part is still a mandatory feature to build dynamic
PMU based breakpoints.
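
To give an idea of the decoding meant in the second point, here is a
minimal sketch for x86 only. It relies on the DR7 layout from the Intel
manuals (global-enable bit 2n+1, R/W field at bit 16+4n, LEN field at
bit 18+4n for register n). encode_dr7() and enum bp_type are hypothetical
names for this illustration and not the existing bp API.

```c
#include <assert.h>
#include <stdint.h>

/* R/W field encodings in DR7 */
enum bp_type { BP_EXECUTE = 0x0, BP_WRITE = 0x1, BP_RW = 0x3 };

/*
 * Build the DR7 bits for one debug register slot (0..3).
 * Returns 0 if the request cannot be encoded; a valid encoding is
 * never 0 because the enable bit is always set.
 */
uint32_t encode_dr7(int slot, unsigned long addr, unsigned int len,
                    enum bp_type type)
{
    uint32_t len_bits;

    if (slot < 0 || slot > 3)       /* only four debug registers */
        return 0;

    switch (len) {                  /* LEN field encodings */
    case 1: len_bits = 0x0; break;
    case 2: len_bits = 0x1; break;
    case 4: len_bits = 0x3; break;
    case 8: len_bits = 0x2; break;
    default: return 0;
    }

    if (addr & (len - 1))           /* address must be len-aligned */
        return 0;
    if (type == BP_EXECUTE && len != 1)
        return 0;                   /* execute breakpoints need LEN = 00 */

    return (1u << (slot * 2 + 1))                   /* global enable Gn */
         | ((uint32_t)type << (16 + slot * 4))      /* R/Wn */
         | (len_bits << (18 + slot * 4));           /* LENn */
}
```

A caller would OR the result into its copy of DR7 after clearing the
previous bits for that slot.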
Frederic.
* Re: [Patch 0/1] HW-BKPT: Allow per-cpu kernel-space Hardware Breakpoint requests
2009-08-26 11:49 ` Frederic Weisbecker
@ 2009-08-26 18:02 ` K.Prasad
2009-08-29 13:41 ` Ingo Molnar
0 siblings, 1 reply; 15+ messages in thread
From: K.Prasad @ 2009-08-26 18:02 UTC (permalink / raw)
To: Ingo Molnar, Frederic Weisbecker
Cc: Peter Zijlstra, LKML, Lai Jiangshan, Steven Rostedt,
Mathieu Desnoyers, Alan Stern
On Wed, Aug 26, 2009 at 01:49:57PM +0200, Frederic Weisbecker wrote:
> On Wed, Aug 26, 2009 at 11:16:42AM +0200, Ingo Molnar wrote:
> >
> > * Frederic Weisbecker <fweisbec@gmail.com> wrote:
> >
> > > On Fri, Aug 21, 2009 at 04:28:11PM +0200, Ingo Molnar wrote:
> > > >
> > > > * K.Prasad <prasad@linux.vnet.ibm.com> wrote:
> > > >
> > > > > > Providing those would let us build a pmu struct on top of this
> > > > > > high level API, hopefully.
> > > >
> > > > Note that there's a PMU struct already in
> > > > arch/x86/kernel/cpu/perf_counter.c. Could debug-register ops be
> > > > tacked on to it?
> > >
> > > No, we don't need to build an arch level pmu since the BP api
> > > already handles the arch abstraction (or well, it is planned to).
> > >
> > > Instead, what we need is a core pmu that relies on the BP api.
> > > Such pmu will be allocated dynamically while creating a hardware
> > > breakpoint counter.
> >
> > i'm not convinced at all we need all that layering of
> > perfcounters->pmu->BP. Why not add BP support to the PMU abstraction
> > and be done with it?
> >
> > That way we get hardware breakpoints via 'pinned, exclusive, per cpu
> > hw-breakpoint counters' for example and kernel/hw-breakpoint.c can
> > go away altogether.
> >
> > kernel/perf_counter.c already handles scheduling, conflict
> > resolution, enumeration, syscall exposure and more.
> >
> > Hm?
>
>
> What you are suggesting is a complete refactoring of the breakpoint API
> on top of pmus.
>
> Well, that's possible and would factorize the scheduling, conflict and so
> on. So that's theoretically a good point and I hope we'll come to such
> centralization, that looks like my suggestion to Peter to share the
> perfcounter layer that handles the scheduling of hardware registers.
>
> But the pmu handling is currently not ready for that.
I am not sure if pmus can handle, (or want to handle) all the intricacies
involved with the hw-breakpoint layer and let the other in-kernel users of
hw-breakpoint such as ptrace and ftrace (at the moment) operate over it.
The hw-breakpoint infrastructure has now grown to address nearly all
requirements of perf-tools (barring the facility to schedule
over-committed breakpoint requests, and a pending enable/disable
feature) while its interoperability allows co-existence of other users.
Given that there are multiple users of hw-breakpoint and that it is a
contended resource (with diversity in breakpoint characteristics)
wouldn't it be best to leave its management in a layer well below all
its users (including perf/pmu)?
That, in my opinion, would help the hw-breakpoint infrastructure evolve
continuously to help the users exploit the debug registers better.
Thanks,
K.Prasad
* Re: [Patch 0/1] HW-BKPT: Allow per-cpu kernel-space Hardware Breakpoint requests
2009-08-26 18:02 ` K.Prasad
@ 2009-08-29 13:41 ` Ingo Molnar
2009-09-01 6:38 ` K.Prasad
0 siblings, 1 reply; 15+ messages in thread
From: Ingo Molnar @ 2009-08-29 13:41 UTC (permalink / raw)
To: K.Prasad
Cc: Frederic Weisbecker, Peter Zijlstra, LKML, Lai Jiangshan,
Steven Rostedt, Mathieu Desnoyers, Alan Stern
* K.Prasad <prasad@linux.vnet.ibm.com> wrote:
> I am not sure if pmus can handle, (or want to handle) all the
> intricacies involved with the hw-breakpoint layer [...]
Which are those intricacies? It's all rather straightforward
register scheduling and reservation stuff - which perfcounters
already solves in a very rich way.
Ingo
* Re: [Patch 0/1] HW-BKPT: Allow per-cpu kernel-space Hardware Breakpoint requests
2009-08-29 13:41 ` Ingo Molnar
@ 2009-09-01 6:38 ` K.Prasad
2009-09-01 23:51 ` Frederic Weisbecker
0 siblings, 1 reply; 15+ messages in thread
From: K.Prasad @ 2009-09-01 6:38 UTC (permalink / raw)
To: Ingo Molnar
Cc: Frederic Weisbecker, Peter Zijlstra, LKML, Lai Jiangshan,
Steven Rostedt, Mathieu Desnoyers, Alan Stern, Paul Mackerras,
David Gibson
On Sat, Aug 29, 2009 at 03:41:07PM +0200, Ingo Molnar wrote:
>
> * K.Prasad <prasad@linux.vnet.ibm.com> wrote:
>
> > I am not sure if pmus can handle, (or want to handle) all the
> > intricacies involved with the hw-breakpoint layer [...]
>
> Which are those intricacies? It's all rather straightforward
> register scheduling and reservation stuff - which perfcounters
> already solves in a very rich way.
>
> Ingo
While it is quite true that debug register scheduling and reservation
(using exclusive/pinned properties) are possible through perf's
implementation, breakpoint exception handling and a provision to invoke
user-defined callbacks require an extension to the existing perf
implementation (which allows only counting and sampling upon an event,
as I presently understand).
Breakpoint exception handling, involving tasks such as filtering stray
exceptions (arising out of breakpoint length limitations), user-defined
callback invocation and signal generation, is, as I see it, not in common
with perf-counter's functionality. And on architectures like PPC64, whose
'trigger-before-execute' exception behaviour makes it difficult to
provide a 'continuous-trigger' behaviour, sufficient interlocking is necessary
with the single-step exception (required for a
bkpt_exception-->disable_bp-->single_step-->enable_bp-->invoke_callback+signal
process).
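The sequence above amounts to a small two-state machine, sketched below
under simplifying assumptions. The names (struct bp_ctx, the two
handlers, the callbacks counter) are hypothetical and the flags only
model what the real exception handlers would program into the hardware.
The counter stands in for the callback invocation plus signal
generation.

```c
#include <assert.h>

enum bp_state { BP_ARMED, BP_STEPPING };

struct bp_ctx {
    enum bp_state state;
    int bp_enabled;          /* breakpoint register currently armed? */
    int single_step;         /* single-stepping currently requested? */
    unsigned int callbacks;  /* times the user callback would have run */
};

/* Breakpoint exception: raised before the accessing instruction runs. */
void on_breakpoint_exception(struct bp_ctx *c)
{
    if (c->state != BP_ARMED)
        return;              /* stray exception, ignore */
    c->bp_enabled = 0;       /* disable_bp, or the access re-triggers forever */
    c->single_step = 1;      /* step over the faulting instruction */
    c->state = BP_STEPPING;
}

/* Single-step exception: the accessing instruction has now completed. */
void on_single_step_exception(struct bp_ctx *c)
{
    if (c->state != BP_STEPPING)
        return;              /* single-step owned by somebody else */
    c->single_step = 0;
    c->bp_enabled = 1;       /* enable_bp again for the next access */
    c->state = BP_ARMED;
    c->callbacks++;          /* invoke_callback + signal would go here */
}
```

The interlocking problem is visible in the second handler: a single-step
exception that arrives while the state is BP_ARMED belongs to some other
user (e.g. ptrace) and must not be consumed here.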
And post integration, in-kernel users like ptrace, kgdb* and xmon*
which hitherto have interacted directly with the debug registers
(through set_debugreg()/set_dabr()) should route their requests through the
perf-layer. It is difficult to imagine ptrace's idempotent requests
(through ptrace_<get><set>_debugreg()) having to pass through perf-layer
(and becoming dependent on CONFIG_PERF_COUNTERS), not to mention the
tricks required to synchronise signal generation timing with exception
behaviour (especially on PPC64).
* - Not converted to use hw-breakpoint layer yet
With debugging and performance monitoring being two primary uses of
hw-breakpoints (apart from the many niche uses that one can think of),
it would be prudent to retain the breakpoints as a separate layer,
allowing exploitation by applications with either need, rather than to
integrate them tightly with perf-counters.
With plenty of users exploiting the breakpoint layer's debugging
capabilities - like SystemTap http://lwn.net/Articles/343581/
(extensible for user-space), ftrace, ptrace and potentially gdbstub
(http://tinyurl.com/gdbstub-prototype), it is but a sad state to keep
the hw-breakpoint layer waiting in-queue for want of performance
monitoring (through perf-counter exploitation/integration).
Thanks,
K.Prasad
* Re: [Patch 0/1] HW-BKPT: Allow per-cpu kernel-space Hardware Breakpoint requests
2009-09-01 6:38 ` K.Prasad
@ 2009-09-01 23:51 ` Frederic Weisbecker
2009-09-03 18:28 ` K.Prasad
0 siblings, 1 reply; 15+ messages in thread
From: Frederic Weisbecker @ 2009-09-01 23:51 UTC (permalink / raw)
To: K.Prasad
Cc: Ingo Molnar, Peter Zijlstra, LKML, Lai Jiangshan, Steven Rostedt,
Mathieu Desnoyers, Alan Stern, Paul Mackerras, David Gibson
On Tue, Sep 01, 2009 at 12:08:45PM +0530, K.Prasad wrote:
> On Sat, Aug 29, 2009 at 03:41:07PM +0200, Ingo Molnar wrote:
> >
> > * K.Prasad <prasad@linux.vnet.ibm.com> wrote:
> >
> > > I am not sure if pmus can handle, (or want to handle) all the
> > > intricacies involved with the hw-breakpoint layer [...]
> >
> > Which are those intricacies? It's all rather straightforward
> > register scheduling and reservation stuff - which perfcounters
> > already solves in a very rich way.
> >
> > Ingo
>
> While it is quite true that debug register scheduling and reservation
> (using exclusive/pinned properties) are possible through the perf's
> implementation, breakpoint exception handling and a provision to invoke
> user-defined callback require an extension to the existing perf
> implementation (which allows only counting and sampling upon an event,
> as I presently understand).
Well, not that much actually. The upper (core) layer of the hw bp
should remain and still handle specific breakpoint problems.
Also that doesn't imply a complete zapping of the low level, we
indeed still need to handle things like exception callbacks.
Actually the only part that may roughly shrink is the register
scheduling. We just won't need to handle tricky things like
per-thread virtual debug registers anymore.
> Breakpoint exception handling involving tasks such as filtering stray
> exceptions (arising out of breakpoint length limitations), user-defined
> callback invocation and signal generation are, as I see not in common
> with perf-counter's functionality. And on architectures like PPC64 whose
> exception behaviour is 'trigger-before-execute' making it difficult to
> bring a 'continuous-trigger' behaviour, sufficient interlocking is necessary
> with single-step exception (required for a
> bkpt_exception-->disable_bp-->single_step-->enable_bp-->invoke_callback+signal
> process).
No, really, it's not up to perf to handle such peculiar things. It's still
the role of the bp API (low and high level).
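For reference, the sequence quoted above (bkpt_exception, disable_bp, single_step, enable_bp, callback) can be modelled in plain user-space C. This is only a sketch under stated assumptions: struct bp_model and both handler names are hypothetical, nothing here touches a real debug register or the kernel exception path, and signal generation is reduced to a comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model of one breakpoint; not a kernel structure. */
struct bp_model {
    bool armed;        /* breakpoint currently enabled in the debug register */
    bool stepping;     /* single-stepping over the instruction that trapped */
    unsigned int hits; /* completed hits, i.e. callbacks delivered */
    void (*triggered)(struct bp_model *bp); /* user-defined callback, may be NULL */
};

/*
 * Breakpoint exception. With trigger-before-execute semantics the access has
 * not happened yet, so the breakpoint is disabled and single-stepping starts;
 * otherwise the same instruction would trap again forever.
 * Returns true if the exception belonged to this breakpoint.
 */
static bool model_bp_exception(struct bp_model *bp)
{
    if (!bp->armed)
        return false;   /* stray exception: not ours */
    bp->armed = false;
    bp->stepping = true;
    return true;
}

/*
 * Single-step exception. The instruction has now executed, so re-enable the
 * breakpoint and only then invoke the callback (signal delivery would go here).
 * Returns true if the single-step was started by model_bp_exception().
 */
static bool model_single_step_exception(struct bp_model *bp)
{
    if (!bp->stepping)
        return false;   /* single-step requested by someone else */
    assert(!bp->armed); /* interlock: never armed while stepping */
    bp->stepping = false;
    bp->armed = true;
    bp->hits++;
    if (bp->triggered != NULL)
        bp->triggered(bp);
    return true;
}
```

The two return values are what the interlocking amounts to in this model: each handler claims only the exceptions it caused, so a single-step started by another user (ptrace, for instance) is left alone.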
> And post integration, in-kernel users like ptrace, kgdb* and xmon*
> which hitherto have interacted directly with the debug registers
> (through set_debugreg()/set_dabr()) should route their requests through the
> perf-layer. It is difficult to imagine ptrace's idempotent requests
> (through ptrace_<get><set>_debugreg()) having to pass through perf-layer
> (and becoming dependant on CONFIG_PERF_COUNTERS), not to mention the
> tricks required to synchronise signal generation timing with exception
> behaviour (especially on PPC64).
> * - Not converted to use hw-breakpoint layer yet
Actually, I see the perf layer here as a middle man between
- the raw hardware handling (dr[0-467]): reading, writing, updating
- the core API (register_kernel_breakpoint(), register_user_breakpoint(), etc.)
And this middle man can handle so many things on its own that the two above
shrink considerably.
Also the ptrace thing is tricky in itself, and that can't be helped easily.
Because of the direct writes to debug registers done by POKE_USR,
whatever the breakpoint API looks like, with or without perf integration, we still
need subterfuges to support it.
> With debugging and performance monitoring being two primary uses of
> hw-breakpoints (apart from the many niche uses that one can think of),
> it would be prudent to retain the breakpoints as a separate layer
> allowing exploitation by applications with either needs than to tightly
> integrate with perf-counters.
A lonesome counter would be very limited in itself; we would only have
the perf support for breakpoints. Again, the API is still required.
The goal is to have:
1) A factorization of the register scheduling and of breakpoint
target allocation (task/cpu, etc.; it's all handled by perf)
2) Optimization of register scheduling
3) New features (period to trigger events, target inheritance, context exclusion,
etc.)
4) A shrink of the code
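To make the "period to trigger events" item in 3) concrete, here is a minimal user-space sketch. It assumes the item means what perf calls a sample period: the event is reported only on every Nth hit. struct periodic_bp and its two helpers are hypothetical names made up for illustration, not part of perf or of the hw-breakpoint API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a breakpoint with a trigger period. */
struct periodic_bp {
    uint64_t period;    /* report one event every 'period' hits, >= 1 */
    uint64_t remaining; /* hits left until the next reported event */
    uint64_t events;    /* events reported so far */
};

static void periodic_bp_init(struct periodic_bp *bp, uint64_t period)
{
    assert(period >= 1); /* a period of 0 would never trigger */
    bp->period = period;
    bp->remaining = period;
    bp->events = 0;
}

/*
 * Called on every breakpoint hit. Returns 1 when this hit completes a
 * period and should be reported to the user, 0 when it is only counted.
 */
static int periodic_bp_hit(struct periodic_bp *bp)
{
    if (--bp->remaining > 0)
        return 0;
    bp->remaining = bp->period; /* start the next period */
    bp->events++;
    return 1;
}
```

With a period of 1 every hit is reported, which matches the behaviour of a plain breakpoint callback.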
> With plenty of users exploiting the breakpoint layer's debugging
> capabilities - like SystemTap http://lwn.net/Articles/343581/
> (extensible for user-space), ftrace, ptrace and potentially gdbstub
> (http://tinyurl.com/gdbstub-prototype), it is but a sad state to keep
> the hw-breakpoint layer waiting in-queue for want of performance
> monitoring (through perf-counter exploitation/integration).
At first I found the idea of a perf-based design suspicious, because it appeared
to be real overkill.
But after more thought, it could really simplify, factorize,
and enhance this API.
I'm currently trying something out: a quick draft, just to see where
we can go with it and what such a beast could look like...
* Re: [Patch 0/1] HW-BKPT: Allow per-cpu kernel-space Hardware Breakpoint requests
2009-09-01 23:51 ` Frederic Weisbecker
@ 2009-09-03 18:28 ` K.Prasad
2009-09-03 19:22 ` Ingo Molnar
0 siblings, 1 reply; 15+ messages in thread
From: K.Prasad @ 2009-09-03 18:28 UTC (permalink / raw)
To: Frederic Weisbecker, Ingo Molnar
Cc: Peter Zijlstra, LKML, Lai Jiangshan, Steven Rostedt,
Mathieu Desnoyers, Alan Stern, Paul Mackerras, David Gibson
On Wed, Sep 02, 2009 at 01:51:33AM +0200, Frederic Weisbecker wrote:
> On Tue, Sep 01, 2009 at 12:08:45PM +0530, K.Prasad wrote:
> > On Sat, Aug 29, 2009 at 03:41:07PM +0200, Ingo Molnar wrote:
> > >
> > > * K.Prasad <prasad@linux.vnet.ibm.com> wrote:
> > >
> > > > I am not sure if pmus can handle, (or want to handle) all the
> > > > intricacies involved with the hw-breakpoint layer [...]
> > >
> > > Which are those intricacies? It's all rather straightforward
> > > register scheduling and reservation stuff - which perfcounters
> > > already solves in a very rich way.
> > >
> > > Ingo
> >
[edited]
> > And post integration, in-kernel users like ptrace, kgdb* and xmon*
> > which hitherto have interacted directly with the debug registers
> > (through set_debugreg()/set_dabr()) should route their requests through the
> > perf-layer. It is difficult to imagine ptrace's idempotent requests
> > (through ptrace_<get><set>_debugreg()) having to pass through perf-layer
> > (and becoming dependant on CONFIG_PERF_COUNTERS), not to mention the
> > tricks required to synchronise signal generation timing with exception
> > behaviour (especially on PPC64).
> > * - Not converted to use hw-breakpoint layer yet
>
>
> Actually, I see the perf layer here as a middle man between
>
> - the very hardware stuff (dr[0-467]) handling, reading, writing, updating
> - the core API (register_kernel_breakpoint(), register_user_breakpoint() etc..)
>
> And this middle man can handle so much things on its own that the two above
> gets utterly shrinked.
>
> Also the ptrace thing is tricky in itself, and that can't be helped easily.
> Because of the direct writing to debug registers done by POKE_USR,
> whatever the current breakpoint API with or without perf integration, we still
> need subterfuges to carry it.
>
The reverse dependency this would create on perf (CONFIG_PERF) for the
hw-breakpoint layer is an undesirable side-effect, and gives rise to
at least two immediate questions:
- Handling of requests for hw-breakpoint from users like ptrace when
CONFIG_PERF is not turned on
- Managing 'register scheduling and reservation' on architectures where
perf layer isn't ported. An inefficient way of handling this would be
to retain the existing register allocation code of hw-breakpoint for
such architectures - thereby artificially imposing arch-specific code
into generic stuff.
A solution here would be to detach parts of perf layer's code that
handle register scheduling and reservation (which I learn are in
kernel/perf_counter.c) into a separate entity (outside the ambit of
CONFIG_PERF) that can serve the needs of both hw-breakpoint and perf
thereby eliminating the two issues enumerated above.
The tight coupling between the functions that perform register
scheduling (in kernel/perf_counter.c) and perf's data structures is quite
apparent and does suggest a non-trivial amount of effort to detach them
into a layer of its own.
However this might be quite necessary in order to balance the
desire to re-use the 'register scheduling and reservation' code of the
perf layer against the issues described above.
This, along with the framework (described in the previous mail) to retain
the hw-breakpoint's APIs + code interacting with debug registers
(including exception handling) would be a good compromise.
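To illustrate what such a detached entity might offer, here is a rough user-space sketch of a reservation interface that depends on neither perf's nor hw-breakpoint's data structures. Everything in it is an assumption made for illustration: the names (slot_table, slot_reserve, slot_release), the fixed sizes, and the use of a plain bitmask in place of a real cpumask. It is not code from kernel/perf_counter.c.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_NR_CPUS  4
#define MODEL_NR_SLOTS 4 /* e.g. the four x86 address debug registers */

/* Hypothetical reservation state: one bitmap of busy slots per CPU. */
struct slot_table {
    uint8_t used[MODEL_NR_CPUS]; /* bit n set: slot n is busy on that CPU */
};

/* Lowest free slot in a per-CPU bitmap, or -1 if all are busy. */
static int find_free_slot(uint8_t used)
{
    for (int s = 0; s < MODEL_NR_SLOTS; s++)
        if (!(used & (1u << s)))
            return s;
    return -1;
}

/*
 * Reserve one slot on every CPU whose bit is set in 'cpumask'.
 * All-or-nothing: returns 0 and fills 'slots' (-1 for CPUs not in the
 * mask), or returns -1 and leaves the table untouched; the contents of
 * 'slots' are then unspecified.
 */
static int slot_reserve(struct slot_table *t, unsigned int cpumask,
                        int slots[MODEL_NR_CPUS])
{
    /* Pass 1: check that every requested CPU has room. */
    for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
        slots[cpu] = -1;
        if (!(cpumask & (1u << cpu)))
            continue;
        slots[cpu] = find_free_slot(t->used[cpu]);
        if (slots[cpu] < 0)
            return -1;
    }
    /* Pass 2: commit, now that the request is known to fit. */
    for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
        if (slots[cpu] >= 0)
            t->used[cpu] |= (uint8_t)(1u << slots[cpu]);
    return 0;
}

/* Give back the slots recorded by a successful slot_reserve(). */
static void slot_release(struct slot_table *t, int slots[MODEL_NR_CPUS])
{
    for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
        if (slots[cpu] < 0)
            continue;
        assert(t->used[cpu] & (1u << slots[cpu])); /* must be reserved */
        t->used[cpu] &= (uint8_t)~(1u << slots[cpu]);
        slots[cpu] = -1;
    }
}
```

Both hw-breakpoint and perf could sit on top of an interface of this shape; the hard part, as noted above, is untangling the existing scheduling code from perf's data structures, which this sketch does not attempt.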
Thanks,
K.Prasad
* Re: [Patch 0/1] HW-BKPT: Allow per-cpu kernel-space Hardware Breakpoint requests
2009-09-03 18:28 ` K.Prasad
@ 2009-09-03 19:22 ` Ingo Molnar
0 siblings, 0 replies; 15+ messages in thread
From: Ingo Molnar @ 2009-09-03 19:22 UTC (permalink / raw)
To: K.Prasad
Cc: Frederic Weisbecker, Peter Zijlstra, LKML, Lai Jiangshan,
Steven Rostedt, Mathieu Desnoyers, Alan Stern, Paul Mackerras,
David Gibson
* K.Prasad <prasad@linux.vnet.ibm.com> wrote:
> On Wed, Sep 02, 2009 at 01:51:33AM +0200, Frederic Weisbecker wrote:
> > On Tue, Sep 01, 2009 at 12:08:45PM +0530, K.Prasad wrote:
> > > On Sat, Aug 29, 2009 at 03:41:07PM +0200, Ingo Molnar wrote:
> > > >
> > > > * K.Prasad <prasad@linux.vnet.ibm.com> wrote:
> > > >
> > > > > I am not sure if pmus can handle, (or want to handle) all the
> > > > > intricacies involved with the hw-breakpoint layer [...]
> > > >
> > > > Which are those intricacies? It's all rather straightforward
> > > > register scheduling and reservation stuff - which perfcounters
> > > > already solves in a very rich way.
> > > >
> > > > Ingo
> > >
> [edited]
> > > And post integration, in-kernel users like ptrace, kgdb* and xmon*
> > > which hitherto have interacted directly with the debug registers
> > > (through set_debugreg()/set_dabr()) should route their requests through the
> > > perf-layer. It is difficult to imagine ptrace's idempotent requests
> > > (through ptrace_<get><set>_debugreg()) having to pass through perf-layer
> > > (and becoming dependant on CONFIG_PERF_COUNTERS), not to mention the
> > > tricks required to synchronise signal generation timing with exception
> > > behaviour (especially on PPC64).
> > > * - Not converted to use hw-breakpoint layer yet
> >
> >
> > Actually, I see the perf layer here as a middle man between
> >
> > - the very hardware stuff (dr[0-467]) handling, reading, writing, updating
> > - the core API (register_kernel_breakpoint(), register_user_breakpoint() etc..)
> >
> > And this middle man can handle so much things on its own that the two above
> > gets utterly shrinked.
> >
> > Also the ptrace thing is tricky in itself, and that can't be helped easily.
> > Because of the direct writing to debug registers done by POKE_USR,
> > whatever the current breakpoint API with or without perf integration, we still
> > need subterfuges to carry it.
> >
>
> The reverse-dependancy this would create over perf (CONFIG_PERF) for the
> hw-breakpoint layer is an undesirable side-effect, and gives rise to
> atleast two immediate questions:
>
> - Handling of requests for hw-breakpoint from users like ptrace when
> CONFIG_PERF is not turned on
This is basically just a build/layering logistics question and it is
solved easily - we could have a library mode for it.
> - Managing 'register scheduling and reservation' on architectures where
> perf layer isn't ported. An inefficient way of handling this would be
> to retain the existing register allocation code of hw-breakpoint for
> such architectures - thereby artificially imposing arch-specific code
> into generic stuff.
Minimally porting perf to enable a hw-breakpoints PMU extension is
very easy in practice. For example on s390 it took just 15 lines of
code:
12310e9: [S390] Enable tick based perf_counter on s390.
arch/s390/Kconfig | 1 +
arch/s390/include/asm/perf_counter.h | 8 ++++++++
tools/perf/perf.h | 6 ++++++
3 files changed, 15 insertions(+), 0 deletions(-)
On FRV it took 38 lines (60% of which are boilerplate copyright
notices), on PARISC 15 lines.
By far the most complexity is in factoring out the hw-breakpoint
code itself - and that has to be done regardless of the register
scheduling model.
> A solution here would be to detach parts of perf layer's code that
> handle register scheduling and reservation (which I learn are in
> kernel/perf_counter.c) into a separate entity (outside the ambit
> of CONFIG_PERF) that can serve the needs of both hw-breakpoint and
> perf thereby eliminating the two issues enumerated above.
>
> The tight coupling between the functions that perform register
> scheduling (in kernel/perf_counter.c) and perf's data structures
> is quite apparent and does suggest non-trivial amount of effort to
> detach them into a layer of its own.
>
> However this might be quite necessary in order to balance between
> a desire to re-use the 'register scheduling and reservation' code
> of perf-layer while not running into issues as above.
>
> This, along with the framework (described in the previous mail) to
> retain the hw-breakpoint's APIs + code interacting with debug
> registers (including exception handling) would be a good
> compromise.
I don't think the librarization is all that complex. It's very much
desired, as we'd reuse an existing piece of infrastructure to
implement another one - this is always good.
Ingo
end of thread, other threads:[~2009-09-03 19:23 UTC | newest]
Thread overview: 15+ messages
2009-08-17 12:46 [Patch 0/1] HW-BKPT: Allow per-cpu kernel-space Hardware Breakpoint requests K.Prasad
2009-08-19 16:11 ` K.Prasad
2009-08-19 17:33 ` Frederic Weisbecker
2009-08-20 17:27 ` K.Prasad
2009-08-21 14:28 ` Ingo Molnar
2009-08-26 3:36 ` Frederic Weisbecker
2009-08-26 9:16 ` Ingo Molnar
2009-08-26 11:49 ` Frederic Weisbecker
2009-08-26 18:02 ` K.Prasad
2009-08-29 13:41 ` Ingo Molnar
2009-09-01 6:38 ` K.Prasad
2009-09-01 23:51 ` Frederic Weisbecker
2009-09-03 18:28 ` K.Prasad
2009-09-03 19:22 ` Ingo Molnar
2009-08-25 20:33 ` K.Prasad