From: "K.Prasad" <prasad@linux.vnet.ibm.com>
To: Paul Mackerras <paulus@samba.org>
Cc: Michael Neuling <mikey@neuling.org>,
Benjamin Herrenschmidt <benh@au1.ibm.com>,
shaggy@linux.vnet.ibm.com,
Frederic Weisbecker <fweisbec@gmail.com>,
David Gibson <dwg@au1.ibm.com>,
"linuxppc-dev@ozlabs.org" <linuxppc-dev@ozlabs.org>,
Alan Stern <stern@rowland.harvard.edu>,
Roland McGrath <roland@redhat.com>
Subject: Re: [Patch 2/2] PPC64-HWBKPT: Implement hw-breakpoints for PPC64
Date: Thu, 20 May 2010 09:36:03 +0530 [thread overview]
Message-ID: <20100520040603.GA7448@in.ibm.com> (raw)
In-Reply-To: <20100517123241.GA5379@brick.ozlabs.ibm.com>
On Mon, May 17, 2010 at 10:32:41PM +1000, Paul Mackerras wrote:
> On Fri, May 14, 2010 at 12:25:31PM +0530, K.Prasad wrote:
>
> > Okay. I will re-use single_step_exception() after modifications; it
> > appearsto have no in-kernel users for it.
>
> It's called from exceptions-64s.S, head_32.S and head_8xx.S in
> arch/powerpc/kernel.
>
> > > Suppose the address at which the data breakpoint has been unmapped,
> > > and the process has a handler for the SIGSEGV signal. When we try to
> > > single-step the load or store, we will get a DSI (0x300) interrupt,
> > > call into do_page_fault, and end up sending the process a SIGSEGV.
> > > That will invoke the signal handler, which can then do anything it
> > > likes. It can do a blocking system call, it can longjmp() back into
> > > its main event, or it can return from the signal handler. Only in the
> > > last case will it retry the load or store, and then only if the signal
> > > handler hasn't modified the NIP value in the signal frame. That's
> > > what I mean by "doesn't return to the instruction".
> > >
> >
> > At the outset, this seemed to be a scary thing to happen; but turns out
> > to be harmful only to the extent of generating a false hw-breakpoint
> > exception in certain cases. A case-by-case basis analysis reveals thus:
> >
> > Consider an instruction stream i1, i2, i3, ... iN, where i1 has
> > finished execution and i2 is about to be executed but has generated a
> > DSI interrupt with the above-mentioned conditions i.e. DSISR indicates a
> > DABR match + Page-Table entry not found. Now according to do_hash_page
> > in exception-64s.S (as pasted below), do_page_fault() and do_dabr() are
> > invoked one after the other.
> >
> > _STATIC(do_hash_page)
> > std r3,_DAR(r1)
> > std r4,_DSISR(r1)
> >
> > andis. r0,r4,0xa410 /* weird error? */
> > bne- handle_page_fault /* if not, try to insert a HPTE */
> > andis. r0,r4,DSISR_DABRMATCH@h
> > bne- handle_dabr_fault
>
> Note that bne is not a procedure call; we'll actually get two DSIs in
> this scenario. But I don't think that matters. Also note that the
> branch to handle_page_fault here is not for the HPTE-not-found case;
> it's for the unusual errors. So we'll handle the HPTE insertion after
> handling the DABR match.
>
Okay.
> > Thus, when control returns to user-space to instruction 'i2', the
> > hw_breakpoint_handler() has completed execution, and a SIGSEGV is pending
> > to be delivered and single-stepping enabled MSR_SE is set. Upon return to
> > user-space, the handler for SIGSEGV is executed and it may perform one of
> > the following (as you stated previously):
> > (a) Make a blocking syscall, eventually yielding the CPU to a new thread
> > (b) Jump to a different instruction in user-space, say iN, and not complete
> > the execution of instruction i2 at all.
> > (c) Return to instruction i2 and complete the execution.
> >
> > In case of (a), the context-switches should not affect the ability to
> > single-step the instruction when the thread is eventually scheduled to
> > run. The thread, when scheduled onto the CPU will complete signal
> > handling, return to execute instruction i2, cause single-step exception,
> > restore breakpoints and run smoothly thereafter.
>
> Right. However, the thread is running the signal handler without the
> DABR being set, which is unfortunate.
>
In order to keep the breakpoint active during signal handling, a
PowerPC specific signal handling code, say do_signal_pending() in
arch/powerpc/kernel/signal.c, must be tapped to check for/restore
the breakpoint for the process (if it exists).
Then, single_step_dabr_instruction() must be suitably modified to not
clear the current->thread.last_hit_ubp pointer if breakpoint has been
taken in a nested condition i.e. a breakpoint exception in signal-handler
which was preceded by a previous breakpoint exception in normal user-space
stack. A flag to denote such a condition would be required in
'struct thread_struct'.
I'm afraid if this is more complexity than we want to handle in the
first patchset. I agree that this will create a 'blind-spot' of code
which cannot not be monitored using breakpoints and may limit debugging
efforts (specially for memory corruption); however suspecting that signal
handlers (especially those that don't return to original instruction)
would be rare, I feel that this could be a 'feature' that can be brought
later-in. What do you think?
> > In case of (b), the new instruction iN is single-stepped, the breakpoint
> > values are restored and the hw-breakpoint exception callback is invoked
> > after iN is executed. The user of this breakpoint i.e. the caller of
> > register_user_hw_breakpoint() who had placed a breakpoint on addressed
> > accessed by instruction i2 will be confused to find that an unrelated
> > instruction (which may not be a load/store) has caused the breakpoint.
>
> That's the case if the signal handler modifies the NIP value in the
> register set saved on the stack and returns. If the signal handler
> instead simply jumps to instruction iN (e.g. with longjmp or
> siglongjmp), we'll never get the single-step callback.
>
True. As described above, this will lead to unreliable restoration of
breakpoints. I'm not sure if this is a common occurrence in user-space,
but if rare, I think we should be able to tolerate this behaviour for now.
> > If so desired, we may adopt the 'trigger-before-execute' semantics for
> > user-space breakpoints wherein the hw-breakpoint callback (through
> > perf_bp_event()) is invoked in hw_breakpoint_handler() itself. This
> > would indicate to the user that the impending instruction causes a DABR
> > 'hit' but it may or may not be executed due to the role of
> > signal-handler or due to self-modifying code (as mentioned below).
> >
> > Kindly let me know what you think about it.
> >
> > (c) is the normal execution path we desire. The instruction i2 will be
> > safely single-stepped and breakpoints are restored.
> >
> > > The instruction could be changed underneath us if the program is
> > > multi-threaded and another thread writes another instruction to the
> > > instruction word where the load or store is. Or it could use mmap()
> > > to map some other page at the address of the load or store. Either
> > > way we could end up with a different instruction there.
> > >
> >
> > If the instruction that originally caused the DABR exception is changed,
> > the new instruction in its place would still single-step to restore
> > breakpoint values. However the user of breakpoint interface will be
> > confused to find that the callback is invoked for an irrelevant
> > instruction.
> >
> > It could be circumvented, to an extent, through the use of
> > trigger-before-execute semantics (as described before).
>
> I don't think we want to do trigger-before-execute. Ideally what we
> want to ensure at all times is that either DABR is set (enabled) or
> MSR.SE is set, but not both. To ensure that we'd have to modify the
> signal delivery code and possibly other places.
>
The case where both DABR and MSR_SE are set is when a page-fault
resulting in context-switch occurs before single-stepping, right? I have
responded to that below. The problem with signal-handling, as I
understand, is unreliable restoration of/lost breakpoints and I wish to
deal them in future patchsets. Let me know if my understanding is incorrect.
> > > If we do get a context switch, e.g. as a result of a page fault, and
> > > then switch back to the task, it looks to me like we will end up with
> > > MSR_SE and DABR both set. I don't suppose that will actually cause
> > > any real problem besides double-counting the hit.
> > >
> >
> > Page fault exception will be handled before hw_breakpoint_handler(),
> > hence MSR_SE would not have been set if a context-switch happened in
> > pange-fault handling itself. I don't see a case where both MSR_SE and
> > DABR will be set together.
>
> Imagine this scenario: we get the DABR match, set MSR_SE and return to
> the task. In the meantime another higher-priority task has become
> runnable and our need_resched flag is set, so we call schedule() on
> the way back out to usermode. The other task runs and then blocks and
> our task gets scheduled again. As part of the context switch,
> arch_install_hw_breakpoint() will get called and will set DABR. So
> we'll return to usermode with both DABR and MSE_SE set.
>
I didn't foresee such a possibility. I think this can be handled by
putting a check in arch_install_hw_breakpoint() as shown below:
int arch_install_hw_breakpoint(struct perf_event *bp)
{
struct arch_hw_breakpoint *info = counter_arch_bp(bp);
struct perf_event **slot = &__get_cpu_var(bp_per_reg);
*slot = bp;
if (!current->thread.last_hit_ubp)
set_dabr(info->address | info->type | DABR_TRANSLATION);
return 0;
}
This should prevent DABR and MSR_SE from being enabled at the same time.
> > Thanks for the comments. Let me know if the analysis above is incorrect
> > or if I've failed to recognise any important issue that you pointed out.
> > I will send out a patch with changes to emulate_single_step() in the
> > next version of the patchset, if I don't hear any further comments.
>
> We haven't discussed at all the case where the breakpoint is a per-cpu
> breakpoint or where it's a per-task breakpoint but the DABR match
> occurs within the kernel -- which can happen, even for a user address,
> in __get_user, __put_user, __copy_tofrom_user, etc. If the access
> there is to a bad address, we'll invoke the exception case in
> bad_page_fault(), which looks to be another place where we need to
> recognize that single-stepping won't succeed and reinstall the DABR
> setting. Do we count that as an event or not? - I'm not sure.
>
I'll respond to this part in the subsequent mail.
Thanks,
K.Prasad
next prev parent reply other threads:[~2010-05-20 4:06 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <20100414033555.746326035@pr>
2010-04-14 3:48 ` [Patch 1/2] PPC64-HWBKPT: Disable interrupts for data breakpoint exceptions K.Prasad
2010-05-03 5:04 ` Paul Mackerras
2010-04-14 3:48 ` [Patch 2/2] PPC64-HWBKPT: Implement hw-breakpoints for PPC64 K.Prasad
2010-05-03 6:23 ` Paul Mackerras
2010-05-04 20:33 ` K.Prasad
2010-05-12 3:34 ` K.Prasad
2010-05-12 6:32 ` Paul Mackerras
2010-05-14 6:55 ` K.Prasad
2010-05-17 12:32 ` Paul Mackerras
2010-05-20 4:06 ` K.Prasad [this message]
2010-05-20 13:10 ` Paul Mackerras
2010-05-24 15:49 ` K.Prasad
[not found] <20100330095040.565675261@pr>
2010-03-30 10:00 ` K.Prasad
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20100520040603.GA7448@in.ibm.com \
--to=prasad@linux.vnet.ibm.com \
--cc=benh@au1.ibm.com \
--cc=dwg@au1.ibm.com \
--cc=fweisbec@gmail.com \
--cc=linuxppc-dev@ozlabs.org \
--cc=mikey@neuling.org \
--cc=paulus@samba.org \
--cc=roland@redhat.com \
--cc=shaggy@linux.vnet.ibm.com \
--cc=stern@rowland.harvard.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.