From: "tiejun.chen" <tiejun.chen@windriver.com>
To: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Jim Keniston <jkenisto@linux.vnet.ibm.com>,
Anton Blanchard <anton@au1.ibm.com>,
linux-kernel <linux-kernel@vger.kernel.org>,
Steven Rostedt <rostedt@goodmis.org>,
Yong Zhang <yong.zhang0@gmail.com>,
paulus@samba.org, yrl.pp-manager.tt@hitachi.com,
Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>,
linuxppc-dev@lists.ozlabs.org
Subject: Re: [BUG?]3.0-rc4+ftrace+kprobe: set kprobe at instruction 'stwu' lead to system crash/freeze
Date: Wed, 30 Nov 2011 19:06:19 +0800 [thread overview]
Message-ID: <4ED60E2B.8030603@windriver.com> (raw)
In-Reply-To: <1322626752.21641.22.camel@pasglop>
Benjamin Herrenschmidt wrote:
> On Fri, 2011-07-01 at 18:03 +0800, tiejun.chen wrote:
>> Here emulate_step() is called to emulate 'stwu'. Actually this is equivalent to
>> 1> update pr_regs->gpr[1] = mem(old r1 + (-A))
>> 2> 'stw <old r1>, mem<(old r1 + (-A)) >
>>
>> You should notice the stack based on new r1 would be covered with mem<old r1
>> +(-A)>. So after this, the kernel exit from post_krpobe, something would be
>> broken. This should depend on sizeof(-A).
>>
>> For kprobe show_interrupts, you can see pregs->nip is re-written violently so
>> kernel issued.
>>
>> But sometimes we may only re-write some violate registers the kernel still
>> alive. And so this is just why the kernel works well for some kprobed point
>> after you change some kernel options/toolchains.
>>
>> If I'm correct its difficult to kprobe these stwu sp operation since the
>> sizeof(-A) is undermined for the kernel. So we have to implement in-depend
>> interrupt stack like PPC64.
>
> So I've spent a lot of time trying to find a better way to fix that bug
> and I think I might have finally found one :-)
I can understand what you mean in below since I remember you already clarified
this way previously.
>
> - When you try to emulate stwcx on the kernel stack (and only there),
I think it should be stwu/stdu.
> don't actually perform the store. Set a TIF flag instead to indicate
> special processing in the exception return path and store the address to
> update somewhere either in a free slot of the stack frame itself of
> somewhere in the thread struct (the former would be easier). You may as
> well do some sanity checking on the value while at it to catch errors
> early.
>
> - In the exception return code, we already test for various TIF flags
> (*** see comment below, it's even buggy today for preempt ***), so we
> add a test for that flag and go to do_work.
>
> - At the end of do_work, we check for this TIF flag. If it's not set or
> any other flag is set, move on as usual. However, if it's the only flag
> still set:
>
> - Copy the exception frame we're about to unwind to just -below- the
> new r1 value we want to write to. Then perform the write, and change
> r1 to point to that copy of the frame.
>
> - Branch to restore: which will unwind everything from that copy of
> the frame, and eventually set r1 to GPR(1) in the copy which contains
> the new value of r1.
We still can't restore this there. As you know this emulated store instruction
can touch any filed inside pt_regs. Sometimes nip would be involved for this
problematic location. And unfortunately, this is just that we meet currently. So
we have to go exc_exit_restart.
.globl exc_exit_restart
exc_exit_restart:
lwz r11,_NIP(r1)
lwz r12,_MSR(r1)
I mean we have to do that real restore here. So I'm really not sure if its a
good way to add such a codes including check TIF/copy-get new r1/restore
operation here since this is so deep for the exception return code.
exc_exit_start:
mtspr SPRN_SRR0,r11
mtspr SPRN_SRR1,r12
>
> This is the less intrusive approach and should work just fine, it's also
> more robust than anything I've been able to think of and the approach
> would work for 32 and 64-bit similarily.
>
> (***) Above comment about a bug: If you look at entry_64.S version of
> ret_from_except_lite you'll notice that in the !preempt case, after
> we've checked MSR_PR we test for any TIF flag in _TIF_USER_WORK_MASK to
> decide whether to go to do_work or not. However, in the preempt case, we
> do a convoluted trick to test SIGPENDING only if PR was set and always
> test NEED_RESCHED ... but we forget to test any other bit of
> _TIF_USER_WORK_MASK !!! So that means that with preempt, we completely
> fail to test for things like single step, syscall tracing, etc...
>
This is another problem we should address.
> I think this should be fixed at the same time, by simplifying the code
> by doing:
>
> - Test PR. If set, go to test_work_user, else continue (or the other
> way around and call it test_work_kernel)
>
> - In test_work_user, always test for _TIF_USER_WORK_MASK to decide to
> go to do_work, maybe call it do_user_work
>
> - In test_work_kernel, test for _TIF_KERNEL_WORK_MASK which is set to
> our new flag along with NEED_RESCHED if preempt is enabled and branch to
> do_kernel_work.
>
> do_user_work is basically the same as today's user_work
>
> do_kernel_work is basically the same as today preempt block with added
> code to handle the new flag as described above.
>
> Is anybody volunteering for fixing that ? I don't have the bandwidth
I always use one specific kprobe stack to fix this for BOOKE and work well in my
local tree :) Do you remember my v3 patch? I think its possible to extend this
for all PPC variants.
Anyway, I'd like to be this volunteer with our last solution.
Tiejun
> right now, but if nobody shows up I suppose I'll have to eventually deal
> with it myself :-)
>
> Cheers,
> Ben.
WARNING: multiple messages have this Message-ID (diff)
From: "tiejun.chen" <tiejun.chen@windriver.com>
To: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Jim Keniston <jkenisto@linux.vnet.ibm.com>,
Anton Blanchard <anton@au1.ibm.com>,
linux-kernel <linux-kernel@vger.kernel.org>,
Steven Rostedt <rostedt@goodmis.org>,
Yong Zhang <yong.zhang0@gmail.com>, <paulus@samba.org>,
<yrl.pp-manager.tt@hitachi.com>,
Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>,
<linuxppc-dev@lists.ozlabs.org>
Subject: Re: [BUG?]3.0-rc4+ftrace+kprobe: set kprobe at instruction 'stwu' lead to system crash/freeze
Date: Wed, 30 Nov 2011 19:06:19 +0800 [thread overview]
Message-ID: <4ED60E2B.8030603@windriver.com> (raw)
In-Reply-To: <1322626752.21641.22.camel@pasglop>
Benjamin Herrenschmidt wrote:
> On Fri, 2011-07-01 at 18:03 +0800, tiejun.chen wrote:
>> Here emulate_step() is called to emulate 'stwu'. Actually this is equivalent to
>> 1> update pr_regs->gpr[1] = mem(old r1 + (-A))
>> 2> 'stw <old r1>, mem<(old r1 + (-A)) >
>>
>> You should notice the stack based on new r1 would be covered with mem<old r1
>> +(-A)>. So after this, the kernel exit from post_krpobe, something would be
>> broken. This should depend on sizeof(-A).
>>
>> For kprobe show_interrupts, you can see pregs->nip is re-written violently so
>> kernel issued.
>>
>> But sometimes we may only re-write some violate registers the kernel still
>> alive. And so this is just why the kernel works well for some kprobed point
>> after you change some kernel options/toolchains.
>>
>> If I'm correct its difficult to kprobe these stwu sp operation since the
>> sizeof(-A) is undermined for the kernel. So we have to implement in-depend
>> interrupt stack like PPC64.
>
> So I've spent a lot of time trying to find a better way to fix that bug
> and I think I might have finally found one :-)
I can understand what you mean in below since I remember you already clarified
this way previously.
>
> - When you try to emulate stwcx on the kernel stack (and only there),
I think it should be stwu/stdu.
> don't actually perform the store. Set a TIF flag instead to indicate
> special processing in the exception return path and store the address to
> update somewhere either in a free slot of the stack frame itself of
> somewhere in the thread struct (the former would be easier). You may as
> well do some sanity checking on the value while at it to catch errors
> early.
>
> - In the exception return code, we already test for various TIF flags
> (*** see comment below, it's even buggy today for preempt ***), so we
> add a test for that flag and go to do_work.
>
> - At the end of do_work, we check for this TIF flag. If it's not set or
> any other flag is set, move on as usual. However, if it's the only flag
> still set:
>
> - Copy the exception frame we're about to unwind to just -below- the
> new r1 value we want to write to. Then perform the write, and change
> r1 to point to that copy of the frame.
>
> - Branch to restore: which will unwind everything from that copy of
> the frame, and eventually set r1 to GPR(1) in the copy which contains
> the new value of r1.
We still can't restore this there. As you know this emulated store instruction
can touch any filed inside pt_regs. Sometimes nip would be involved for this
problematic location. And unfortunately, this is just that we meet currently. So
we have to go exc_exit_restart.
.globl exc_exit_restart
exc_exit_restart:
lwz r11,_NIP(r1)
lwz r12,_MSR(r1)
I mean we have to do that real restore here. So I'm really not sure if its a
good way to add such a codes including check TIF/copy-get new r1/restore
operation here since this is so deep for the exception return code.
exc_exit_start:
mtspr SPRN_SRR0,r11
mtspr SPRN_SRR1,r12
>
> This is the less intrusive approach and should work just fine, it's also
> more robust than anything I've been able to think of and the approach
> would work for 32 and 64-bit similarily.
>
> (***) Above comment about a bug: If you look at entry_64.S version of
> ret_from_except_lite you'll notice that in the !preempt case, after
> we've checked MSR_PR we test for any TIF flag in _TIF_USER_WORK_MASK to
> decide whether to go to do_work or not. However, in the preempt case, we
> do a convoluted trick to test SIGPENDING only if PR was set and always
> test NEED_RESCHED ... but we forget to test any other bit of
> _TIF_USER_WORK_MASK !!! So that means that with preempt, we completely
> fail to test for things like single step, syscall tracing, etc...
>
This is another problem we should address.
> I think this should be fixed at the same time, by simplifying the code
> by doing:
>
> - Test PR. If set, go to test_work_user, else continue (or the other
> way around and call it test_work_kernel)
>
> - In test_work_user, always test for _TIF_USER_WORK_MASK to decide to
> go to do_work, maybe call it do_user_work
>
> - In test_work_kernel, test for _TIF_KERNEL_WORK_MASK which is set to
> our new flag along with NEED_RESCHED if preempt is enabled and branch to
> do_kernel_work.
>
> do_user_work is basically the same as today's user_work
>
> do_kernel_work is basically the same as today preempt block with added
> code to handle the new flag as described above.
>
> Is anybody volunteering for fixing that ? I don't have the bandwidth
I always use one specific kprobe stack to fix this for BOOKE and work well in my
local tree :) Do you remember my v3 patch? I think its possible to extend this
for all PPC variants.
Anyway, I'd like to be this volunteer with our last solution.
Tiejun
> right now, but if nobody shows up I suppose I'll have to eventually deal
> with it myself :-)
>
> Cheers,
> Ben.
next prev parent reply other threads:[~2011-11-30 11:05 UTC|newest]
Thread overview: 32+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-06-24 9:21 [BUG?]3.0-rc4+ftrace+kprobe: set kprobe at instruction 'stwu' lead to system crash/freeze Yong Zhang
2011-06-24 9:21 ` Yong Zhang
2011-06-24 10:29 ` Steven Rostedt
2011-06-24 10:29 ` Steven Rostedt
2011-06-26 14:47 ` Masami Hiramatsu
2011-06-26 14:47 ` Masami Hiramatsu
2011-06-27 10:01 ` Ananth N Mavinakayanahalli
2011-06-27 10:01 ` Ananth N Mavinakayanahalli
2011-06-28 10:41 ` Ananth N Mavinakayanahalli
2011-06-28 10:41 ` Ananth N Mavinakayanahalli
2011-06-28 13:15 ` Steven Rostedt
2011-06-28 13:15 ` Steven Rostedt
2011-06-29 6:41 ` Yong Zhang
2011-06-29 6:41 ` Yong Zhang
2011-06-29 6:23 ` Yong Zhang
2011-06-29 6:23 ` Yong Zhang
2011-06-29 6:46 ` Ananth N Mavinakayanahalli
2011-06-29 6:46 ` Ananth N Mavinakayanahalli
2011-06-30 7:08 ` Yong Zhang
2011-06-30 7:08 ` Yong Zhang
2011-07-01 10:03 ` tiejun.chen
2011-07-01 10:03 ` tiejun.chen
2011-07-04 2:23 ` Yong Zhang
2011-07-04 2:23 ` Yong Zhang
2011-11-30 4:19 ` Benjamin Herrenschmidt
2011-11-30 4:19 ` Benjamin Herrenschmidt
2011-11-30 11:06 ` tiejun.chen [this message]
2011-11-30 11:06 ` tiejun.chen
2011-11-30 21:00 ` Benjamin Herrenschmidt
2011-12-01 10:44 ` tiejun.chen
2011-12-01 10:44 ` tiejun.chen
2011-12-01 21:37 ` Benjamin Herrenschmidt
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4ED60E2B.8030603@windriver.com \
--to=tiejun.chen@windriver.com \
--cc=anton@au1.ibm.com \
--cc=benh@kernel.crashing.org \
--cc=jkenisto@linux.vnet.ibm.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linuxppc-dev@lists.ozlabs.org \
--cc=masami.hiramatsu.pt@hitachi.com \
--cc=paulus@samba.org \
--cc=rostedt@goodmis.org \
--cc=yong.zhang0@gmail.com \
--cc=yrl.pp-manager.tt@hitachi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.