public inbox for linux-kernel@vger.kernel.org
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
To: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>, LKML <linux-kernel@vger.kernel.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Mike Galbraith <efault@gmx.de>, Paul Mackerras <paulus@samba.org>,
	Arnaldo Carvalho de Melo <acme@redhat.com>,
	Lai Jiangshan <laijs@cn.fujitsu.com>,
	Anton Blanchard <anton@samba.org>, Li Zefan <lizf@cn.fujitsu.com>,
	Zhaolei <zhaolei@cn.fujitsu.com>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>,
	"K . Prasad" <prasad@linux.vnet.ibm.com>,
	Alan Stern <stern@rowland.harvard.edu>
Subject: Re: [RFC][PATCH 5/5] perfcounter: Add support for kernel hardware breakpoints
Date: Sat, 25 Jul 2009 12:56:56 +0200
Message-ID: <1248519416.5780.12.camel@laptop>
In-Reply-To: <20090724174723.GA11985@nowhere>

On Fri, 2009-07-24 at 19:47 +0200, Frederic Weisbecker wrote:
> On Fri, Jul 24, 2009 at 04:26:09PM +0200, Peter Zijlstra wrote:
> > On Fri, 2009-07-24 at 16:02 +0200, Frédéric Weisbecker wrote:
> > > 2009/7/23 Peter Zijlstra <a.p.zijlstra@chello.nl>:
> > > > On Mon, 2009-07-20 at 13:08 -0400, Frederic Weisbecker wrote:
> > > >> This adds the support for kernel hardware breakpoints in perfcounter.
> > > >> It is added as a new type of software counter and can be defined by
> > > >> using the counter number 5 and by passing the address of the
> > > >> breakpoint to set through the config attribute.
> > > >
> > > > Is there a limit to these hardware breakpoints? If so, the software
> > > > counter model is not sufficient, since we assume we can always schedule
> > > > all software counters. However if you were to add more counters than you
> > > > have hardware breakpoints you're hosed.
> > > >
> > > >
> > > 
> > > Hmm, indeed. But this patch handles this case:
> > > 
> > > +static const struct pmu *bp_perf_counter_init(struct perf_counter *counter)
> > > +{
> > > +       if (hw_breakpoint_perf_init((unsigned long)counter->attr.config))
> > > +               return NULL;
> > > +
> > > 
> > > IIRC, hw_breakpoint_perf_init() calls register_kernel_breakpoint() which in turn
> > > returns -ENOSPC if we haven't any breakpoint room left.
> > > 
> > > It seems we can only set 4 breakpoints simultaneously in x86, or
> > > something close to that.
> > 
> > Ah, that's not the correct way of doing that. Suppose that you would
> > register 4 breakpoint counters to one task, that would leave you unable
> > to register a breakpoint counter on another task. Even though these
> > breakpoints would never be scheduled simultaneously.
> 
> 
> 
> Ah, but the breakpoint API deals with that.
> We have two types of breakpoints: the kernel bp and the user bp.
> The kernel breakpoints are global points that don't deal with task
> scheduling, virtual registers, etc...
> 
> But the user breakpoints (eg: used with ptrace) are dealt with virtual
> debug registers in a way similar to perfcounter: each time we switch the
> current task on a cpu, the hardware register states are stored in the
> thread, and we load the virtual values into the hardware for the next
> thread.

Ah, but that is sub-optimal: perf counters don't actually change the
hardware state if both tasks have the same counter configuration, which
yields a great performance benefit on scheduling-intensive workloads.
Poking at these MSRs, esp. writing to them, is very expensive.

So I would suggest not using that feature of the breakpoint API for the
perf counter integration.
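
To illustrate the point about skipping the writes, here is a minimal
user-space sketch, not the actual perf counter code. Everything in it is
an assumption made for illustration: struct bp_ctx, hw_write_reg() and
bp_switch() are hypothetical names, the "hardware" is a plain array with
a write counter, and the memcmp() stands in for whatever equivalence
test the real context switch path uses.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NR_BP_SLOTS 4	/* assumed: 4 breakpoint registers, as on x86 */

/* Hypothetical per-task breakpoint configuration. */
struct bp_ctx {
	unsigned long addr[NR_BP_SLOTS];
};

/* Stand-in for the hardware: plain memory plus a write counter. */
static unsigned long hw_regs[NR_BP_SLOTS];
static unsigned int hw_writes;

static void hw_write_reg(int slot, unsigned long val)
{
	hw_regs[slot] = val;
	hw_writes++;		/* each of these would be an expensive write */
}

/* Switch from prev to next, touching the hardware only on a real change. */
static void bp_switch(const struct bp_ctx *prev, const struct bp_ctx *next)
{
	int i;

	/* Identical configuration: leave the registers alone. */
	if (prev && !memcmp(prev->addr, next->addr, sizeof(next->addr)))
		return;

	for (i = 0; i < NR_BP_SLOTS; i++)
		hw_write_reg(i, next->addr[i]);
}
```

Switching between two tasks that watch the same addresses then costs no
register writes at all; only a switch to a differently configured task
pays for them.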

> However, this patchset only deals with kernel breakpoints for now (wide
> tracing).

Right, and that's all you would need for perf counter support, please
don't use whatever task state handling you have in place.

> > Also, regular perf counters would multiplex counters when over-committed
> > on a hardware resource, allowing you to create more such breakpoints
> > than you have actual hardware slots.

> At the task level I talked about above?

For either cpu or task level.

> > The way to do this is to create a breakpoint pmu which would simply fail
> > the pmu->enable() method if there are insufficient hardware resources
> > available.

> Now I wonder if the code that manages hardware debug breakpoint task switching
> and the code from perfcounter could be factored into one common piece.

Dunno, it's really not that hard to round-robin (RR) a list of
counters/breakpoints.
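
Roughly what I mean, as a user-space sketch and not kernel code. All
names here (struct bp_counter, bp_enable(), bp_disable(), bp_rotate())
are hypothetical, and the limit of 4 slots is assumed. The enable method
simply fails with -ENOSPC when the hardware is full, and a periodic
rotation moves the starting point so every counter gets its turn.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define NR_BP_SLOTS 4	/* assumed hardware limit */

struct bp_counter {
	unsigned long addr;
	int slot;		/* -1 when not on the hardware */
	unsigned int ticks_on;	/* number of ticks it was scheduled */
};

static struct bp_counter *slots[NR_BP_SLOTS];

/* Hypothetical pmu->enable(): fail when no hardware slot is free. */
static int bp_enable(struct bp_counter *c)
{
	int i;

	for (i = 0; i < NR_BP_SLOTS; i++) {
		if (!slots[i]) {
			slots[i] = c;
			c->slot = i;
			return 0;
		}
	}
	return -ENOSPC;
}

static void bp_disable(struct bp_counter *c)
{
	if (c->slot >= 0) {
		slots[c->slot] = NULL;
		c->slot = -1;
	}
}

/*
 * One tick: take everything off the hardware, schedule as many counters
 * as fit starting at *head, then advance *head by one.
 */
static void bp_rotate(struct bp_counter *list, int nr, int *head)
{
	int i;

	if (nr <= 0)
		return;

	for (i = 0; i < nr; i++)
		bp_disable(&list[i]);

	for (i = 0; i < nr; i++) {
		struct bp_counter *c = &list[(*head + i) % nr];

		if (bp_enable(c))
			break;	/* hardware full, the rest waits its turn */
		c->ticks_on++;
	}
	*head = (*head + 1) % nr;
}
```

With 6 counters on 4 slots, each counter ends up on the hardware for 4
out of every 6 ticks, so you can create more breakpoints than there are
slots and scale the results accordingly.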

> > Also, your init routine, the above hw_breakpoint_perf_init(), will have
> > to verify that when the counter is part of a group, this and all other
> > hw breakpoint counters in that group can, now, but also in the future,
> > be scheduled simultaneously.

> This is already dealt with by the hardware breakpoint API.
> We use only one breakpoint register for the user breakpoints, and the rest
> for kernel breakpoints. Also if no user breakpoint is registered, every
> register can be used for kernel breakpoints.

This means that you can only ever allow 3 breakpoints into any one group
and have to ensure that no other user can come in while they're not in
active use, i.e. while the group is scheduled out.

That is, you have to reserve the max number of breakpoints in a group for
exclusive use by perf counters.
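
A sketch of that arbitration, under stated assumptions: user-space C,
hypothetical names (perf_reserved, other_users, bp_group_reserve(),
bp_other_claim()), an assumed limit of 4 slots, and a reservation that
only grows and is never released. It is not how the breakpoint API is
actually written.

```c
#include <assert.h>
#include <errno.h>

#define NR_BP_SLOTS 4	/* assumed hardware limit */

static int perf_reserved;	/* slots held for the largest perf group */
static int other_users;		/* slots pinned by other in-kernel users */

/* Admit a group with nr_bp breakpoint counters, growing the reservation. */
static int bp_group_reserve(int nr_bp)
{
	int want = nr_bp > perf_reserved ? nr_bp : perf_reserved;

	if (nr_bp < 0 || want + other_users > NR_BP_SLOTS)
		return -ENOSPC;	/* could never be scheduled as a whole */

	perf_reserved = want;
	return 0;
}

/* Another user asks for one slot; it must not eat into the reservation. */
static int bp_other_claim(void)
{
	if (perf_reserved + other_users + 1 > NR_BP_SLOTS)
		return -ENOSPC;

	other_users++;
	return 0;
}
```

The check is done at init time against the reservation, not against
what happens to be on the hardware right now, so a group that was
admitted once can still be scheduled later even if it was scheduled out
when the other user showed up.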

Also, this 1 for userspace seems restrictive. I'd want to have all 4
available from GDB if I knew my hardware was capable and I needed that many.

> > This means that there should be some arbitration towards other in-kernel
> > hw breakpoint users, because if you allow all 4 hw breakpoints in a
> > group and then let another hw breakpoint users have one, you can never
> > schedule that group again.

> That's also why I think it's better to keep this virtual register management
> inside the breakpoint API, so that it can deal with perfcounter, ptrace,
> etc... at the same time.
> 
> What do you think?

I think not. I think the breakpoint API should not do task state, or at
least have an interface without this.

Having two multiplexing layers on top of one another is inefficient and
error prone.

