* [PATCH] powerpc/tm: Set MSR[TS] just prior to recheckpoint
@ 2018-11-19 12:44 Breno Leitao
From: Breno Leitao @ 2018-11-19 12:44 UTC
To: linuxppc-dev; +Cc: mikey, mpe, gromero, v3.9+, Breno Leitao
On a signal handler return, the user can supply a context with the MSR[TS]
bits set, and these bits are copied into the task's regs->msr.

In restore_tm_sigcontexts(), after the current task's regs->msr[TS] bits are
set, several __get_user() calls are made, and only then is a recheckpoint
executed.

This is a problem because a page fault (in kernel space) can happen during
__get_user(). If it does, the process's MSR[TS] bits are already set, but
the recheckpoint has not been executed and the SPRs are still invalid.

The page fault can cause the current process to be de-scheduled with
MSR[TS] active and without tm_recheckpoint() having been called. More
importantly, the TEXASR[FS] bit is not set either.

When the process is scheduled back in, it tries to reclaim, which is
aborted because the CPU is not in the suspended state, and then it
recheckpoints. That recheckpoint restores thread->texasr into the TEXASR
SPR; since that value might be zero (FS clear), this hits a BUG_ON().

[ 2181.457997] kernel BUG at arch/powerpc/kernel/tm.S:446!
This patch simply delays setting MSR[TS]. If a page fault happens in the
__get_user() section, regs->msr[TS] is not yet set while the TM structures
are still invalid, which avoids TM operations for in-kernel exceptions and
a possible process reschedule.

With this patch, MSR[TS] is set only just before recheckpointing, after
TEXASR[FS] has been set to 1, thus avoiding an interrupt with the TM
registers in an invalid state.

It is not possible to move tm_recheckpoint() earlier, because the
checkpointed registers must first be fetched from userspace with
__get_user(). The only way to avoid this undesired behavior is therefore to
delay setting MSR[TS], as done in this patch.
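
To make the ordering problem concrete, here is a small userspace model. It is not kernel code: the struct, the sim_* names, the bit values, and the number of modelled copies are all assumptions made for illustration. It only encodes the rule described above, namely that a task switched out with MSR[TS] set but TEXASR[FS] clear ends up in the bad recheckpoint path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative bit values; the kernel's own definitions are authoritative. */
#define SIM_MSR_TS_S  (1ULL << 33)  /* stands in for "transaction suspended" */
#define SIM_TEXASR_FS (1ULL << 27)  /* stands in for TEXASR[FS] */
#define SIM_NCOPIES   8             /* arbitrary count of modelled __get_user() calls */

struct sim_task {
	uint64_t msr;       /* stands in for regs->msr */
	uint64_t tm_texasr; /* stands in for thread.tm_texasr */
	bool hit_bug;       /* set when a switch sees TS without FS */
};

/* Models being scheduled out and back in: with TS set the switch-in path
 * would recheckpoint, which is only safe once FS has been set. */
static void sim_context_switch(struct sim_task *t)
{
	if ((t->msr & SIM_MSR_TS_S) && !(t->tm_texasr & SIM_TEXASR_FS))
		t->hit_bug = true;
}

/* Models the run of __get_user() calls; fault_at < 0 means no page fault. */
static void sim_get_user_section(struct sim_task *t, int fault_at)
{
	for (int i = 0; i < SIM_NCOPIES; i++)
		if (i == fault_at)
			sim_context_switch(t); /* page fault: the task sleeps here */
}

/* Ordering before the patch: TS is set first, user memory is read after. */
bool sim_restore_old(struct sim_task *t, int fault_at)
{
	assert(t != NULL);
	t->msr |= SIM_MSR_TS_S;
	sim_get_user_section(t, fault_at);
	t->tm_texasr |= SIM_TEXASR_FS;
	/* tm_recheckpoint() would run here */
	return !t->hit_bug;
}

/* Ordering after the patch: user memory is read first, then FS, then TS. */
bool sim_restore_new(struct sim_task *t, int fault_at)
{
	assert(t != NULL);
	sim_get_user_section(t, fault_at);
	t->tm_texasr |= SIM_TEXASR_FS;
	t->msr |= SIM_MSR_TS_S;
	/* tm_recheckpoint() would run here */
	return !t->hit_bug;
}
```

In this model, sim_restore_old() reports a problem whenever fault_at falls inside the copy section, while sim_restore_new() never does. The model deliberately ignores preemption between the TS store and the recheckpoint itself.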
Fixes: 87b4e5393af7 ("powerpc/tm: Fix return of active 64bit signals")
Cc: stable@vger.kernel.org (v3.9+)
Signed-off-by: Breno Leitao <leitao@debian.org>
---
arch/powerpc/kernel/signal_64.c | 29 +++++++++++++++--------------
1 file changed, 15 insertions(+), 14 deletions(-)
diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c
index 83d51bf586c7..15b153bdd826 100644
--- a/arch/powerpc/kernel/signal_64.c
+++ b/arch/powerpc/kernel/signal_64.c
@@ -467,20 +467,6 @@ static long restore_tm_sigcontexts(struct task_struct *tsk,
 	if (MSR_TM_RESV(msr))
 		return -EINVAL;
 
-	/* pull in MSR TS bits from user context */
-	regs->msr = (regs->msr & ~MSR_TS_MASK) | (msr & MSR_TS_MASK);
-
-	/*
-	 * Ensure that TM is enabled in regs->msr before we leave the signal
-	 * handler. It could be the case that (a) user disabled the TM bit
-	 * through the manipulation of the MSR bits in uc_mcontext or (b) the
-	 * TM bit was disabled because a sufficient number of context switches
-	 * happened whilst in the signal handler and load_tm overflowed,
-	 * disabling the TM bit. In either case we can end up with an illegal
-	 * TM state leading to a TM Bad Thing when we return to userspace.
-	 */
-	regs->msr |= MSR_TM;
-
 	/* pull in MSR LE from user context */
 	regs->msr = (regs->msr & ~MSR_LE) | (msr & MSR_LE);
 
@@ -572,6 +558,21 @@ static long restore_tm_sigcontexts(struct task_struct *tsk,
 	tm_enable();
 	/* Make sure the transaction is marked as failed */
 	tsk->thread.tm_texasr |= TEXASR_FS;
+
+	/* pull in MSR TS bits from user context */
+	regs->msr = (regs->msr & ~MSR_TS_MASK) | (msr & MSR_TS_MASK);
+
+	/*
+	 * Ensure that TM is enabled in regs->msr before we leave the signal
+	 * handler. It could be the case that (a) user disabled the TM bit
+	 * through the manipulation of the MSR bits in uc_mcontext or (b) the
+	 * TM bit was disabled because a sufficient number of context switches
+	 * happened whilst in the signal handler and load_tm overflowed,
+	 * disabling the TM bit. In either case we can end up with an illegal
+	 * TM state leading to a TM Bad Thing when we return to userspace.
+	 */
+	regs->msr |= MSR_TM;
+
 	/* This loads the checkpointed FP/VEC state, if used */
 	tm_recheckpoint(&tsk->thread);
 
--
2.19.0
* Re: [PATCH] powerpc/tm: Set MSR[TS] just prior to recheckpoint
@ 2018-11-20 10:34 ` Michael Ellerman
From: Michael Ellerman @ 2018-11-20 10:34 UTC
To: Breno Leitao, linuxppc-dev; +Cc: mikey, gromero, v3.9+, Breno Leitao
Hi Breno,
Thanks for chasing this one down.
Breno Leitao <leitao@debian.org> writes:
> On a signal handler return, the user could set a context with MSR[TS] bits
> set, and these bits would be copied to task regs->msr.
>
> At restore_tm_sigcontexts(), after current task regs->msr[TS] bits are set,
> several __get_user() are called and then a recheckpoint is executed.
>
> This is a problem since a page fault (in kernel space) could happen when
> calling __get_user(). If it happens, the process MSR[TS] bits were
> already set, but recheckpoint was not executed, and SPRs are still invalid.
>
> The page fault can cause the current process to be de-scheduled, with
> MSR[TS] active and without tm_recheckpoint() being called. More
> importantly, without TEXASR[FS] bit set also.
>
> Since TEXASR might not have the FS bit set, and when the process is
> scheduled back, it will try to reclaim, which will be aborted because of
> the CPU is not in the suspended state, and, then, recheckpoint. This
> recheckpoint will restore thread->texasr into TEXASR SPR, which might be
> zero, hitting a BUG_ON().
>
> [ 2181.457997] kernel BUG at arch/powerpc/kernel/tm.S:446!
As Mikey said, would be good to have at least the stack trace & NIP
here, if not the full oops.
> This patch simply delays the MSR[TS] set, so, if there is any page fault in
> the __get_user() section, it does not have regs->msr[TS] set, since the TM
> structures are still invalid, thus avoiding doing TM operations for
> in-kernel exceptions and possible process reschedule.
>
> With this patch, the MSR[TS] will only be set just before recheckpointing
> and setting TEXASR[FS] = 1, thus avoiding an interrupt with TM registers in
> invalid state.
To make this safe when PREEMPT is enabled we need to preempt_disable() /
enable() around the setting of regs->msr and the recheckpoint.
That could also serve as nice documentation.
I guess the other question is whether it should be the job of
tm_recheckpoint() to set regs->msr, given that it already hard disables
interrupts.
eg. we'd set the TM flags in a local msr variable and pass that to
tm_recheckpoint(), it would then assign that to regs->msr in the IRQ
disabled section. Though there's many callers of tm_recheckpoint() that
don't need that behaviour, so it would probably need to be factored out.
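
As a rough illustration of the "local msr variable" idea, the sketch below builds the value that would later be stored into regs->msr, without touching any live state. It is a userspace model: the EX_* names and bit positions are assumptions for the example, and ex_build_restored_msr() is a hypothetical helper, not an existing kernel function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative bit positions; the kernel headers are authoritative. */
#define EX_MSR_LE      (1ULL << 0)
#define EX_MSR_TM      (1ULL << 32)
#define EX_MSR_TS_S    (1ULL << 33)
#define EX_MSR_TS_T    (1ULL << 34)
#define EX_MSR_TS_MASK (EX_MSR_TS_S | EX_MSR_TS_T)

/* Both TS bits set is treated as reserved, like the MSR_TM_RESV() check. */
static int ex_ts_reserved(uint64_t msr)
{
	return (msr & EX_MSR_TS_MASK) == EX_MSR_TS_MASK;
}

/*
 * Compute the MSR value to install later: take the TS bits from the user
 * context, keep every other live bit, and force TM on. Returns 0 on
 * success, or -1 if the user context carries the reserved TS encoding.
 */
int ex_build_restored_msr(uint64_t live_msr, uint64_t user_msr, uint64_t *out)
{
	uint64_t m;

	assert(out != NULL);
	if (ex_ts_reserved(user_msr))
		return -1;

	m = (live_msr & ~EX_MSR_TS_MASK) | (user_msr & EX_MSR_TS_MASK);
	m |= EX_MSR_TM;
	*out = m;
	return 0;
}
```

The caller would hand the computed value to whatever routine performs the store and the recheckpoint inside the IRQ-disabled section, so regs->msr never holds the TS bits while user memory is still being read.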
> It is not possible to move tm_recheckpoint to happen earlier, because it is
> required to get the checkpointed registers from userspace, with
> __get_user(), thus, the only way to avoid this undesired behavior is
> delaying the MSR[TS] set, as done in this patch.
I think the root cause here is that we're copying into the live regs of
current. That has obviously worked in the past, because the register
state wasn't used until we returned back to userspace. But that's no
longer true with TM. And even so it's quite subtle. I also suspect some
of our FP/VEC handling may not work correctly if we're scheduled part
way through restoring the regs.
What might work better is if we copy all the regs into temporary space
and then with interrupts disabled we copy them into the task. That way
we should never be scheduled with a half-populated set of regs.
That's obviously a much bigger patch though and something we'll have to
do later.
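
A minimal userspace sketch of that "stage, then commit" idea follows. Everything in it is an assumption for illustration: struct ex_regs, the register count, and ex_fetch() (a stand-in for __get_user() that can be told to fail) are hypothetical, and the single struct assignment stands in for the copy that would happen with interrupts disabled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EX_NREGS 4 /* arbitrary small register file for the example */

struct ex_regs {
	uint64_t gpr[EX_NREGS];
	uint64_t msr;
};

/* Stand-in for __get_user(): copies one value, or fails when asked to. */
static int ex_fetch(uint64_t *dst, const uint64_t *src, int faults)
{
	if (faults)
		return -1; /* models a failed user access */
	*dst = *src;
	return 0;
}

/*
 * Stage every value into a temporary first. The live registers are only
 * written by the final assignment, so anything that interrupts the staging
 * loop never observes a half-populated register set.
 * fault_at selects which fetch fails (EX_NREGS means the msr fetch);
 * a negative value means no fetch fails.
 */
int ex_restore_staged(struct ex_regs *live, const struct ex_regs *user,
		      int fault_at)
{
	struct ex_regs tmp;
	int i;

	assert(live != NULL && user != NULL);

	for (i = 0; i < EX_NREGS; i++)
		if (ex_fetch(&tmp.gpr[i], &user->gpr[i], i == fault_at))
			return -1;
	if (ex_fetch(&tmp.msr, &user->msr, fault_at == EX_NREGS))
		return -1;

	/* Commit step: in the kernel this is where interrupts would be off. */
	*live = tmp;
	return 0;
}
```

The model uses a failing fetch rather than a sleeping one, but the property it demonstrates is the same: until the commit, the live register set is exactly what it was on entry.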
> Fixes: 87b4e5393af7 ("powerpc/tm: Fix return of active 64bit signals")
> Cc: stable@vger.kernel.org (v3.9+)
> Signed-off-by: Breno Leitao <leitao@debian.org>
> ---
> arch/powerpc/kernel/signal_64.c | 29 +++++++++++++++--------------
> 1 file changed, 15 insertions(+), 14 deletions(-)
>
> diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c
> index 83d51bf586c7..15b153bdd826 100644
> --- a/arch/powerpc/kernel/signal_64.c
> +++ b/arch/powerpc/kernel/signal_64.c
> @@ -467,20 +467,6 @@ static long restore_tm_sigcontexts(struct task_struct *tsk,
> if (MSR_TM_RESV(msr))
> return -EINVAL;
>
> - /* pull in MSR TS bits from user context */
> - regs->msr = (regs->msr & ~MSR_TS_MASK) | (msr & MSR_TS_MASK);
> -
> - /*
> - * Ensure that TM is enabled in regs->msr before we leave the signal
> - * handler. It could be the case that (a) user disabled the TM bit
> - * through the manipulation of the MSR bits in uc_mcontext or (b) the
> - * TM bit was disabled because a sufficient number of context switches
> - * happened whilst in the signal handler and load_tm overflowed,
> - * disabling the TM bit. In either case we can end up with an illegal
> - * TM state leading to a TM Bad Thing when we return to userspace.
> - */
> - regs->msr |= MSR_TM;
> -
> /* pull in MSR LE from user context */
> regs->msr = (regs->msr & ~MSR_LE) | (msr & MSR_LE);
>
> @@ -572,6 +558,21 @@ static long restore_tm_sigcontexts(struct task_struct *tsk,
> tm_enable();
> /* Make sure the transaction is marked as failed */
> tsk->thread.tm_texasr |= TEXASR_FS;
> +
preempt_disable();
> + /* pull in MSR TS bits from user context */
> + regs->msr = (regs->msr & ~MSR_TS_MASK) | (msr & MSR_TS_MASK);
> +
> + /*
> + * Ensure that TM is enabled in regs->msr before we leave the signal
> + * handler. It could be the case that (a) user disabled the TM bit
> + * through the manipulation of the MSR bits in uc_mcontext or (b) the
> + * TM bit was disabled because a sufficient number of context switches
> + * happened whilst in the signal handler and load_tm overflowed,
> + * disabling the TM bit. In either case we can end up with an illegal
> + * TM state leading to a TM Bad Thing when we return to userspace.
> + */
> + regs->msr |= MSR_TM;
> +
> /* This loads the checkpointed FP/VEC state, if used */
> tm_recheckpoint(&tsk->thread);
>
preempt_enable();
Although looking at the code that follows, it probably won't cope with
being preempted either. So the preempt_enable() should probably go at
the end of the function.
cheers
* Re: [PATCH] powerpc/tm: Set MSR[TS] just prior to recheckpoint
@ 2018-11-21 18:32 ` Breno Leitao
From: Breno Leitao @ 2018-11-21 18:32 UTC
To: Michael Ellerman, linuxppc-dev; +Cc: mikey, gromero, v3.9+
Hi Michael,
On 11/20/18 8:34 AM, Michael Ellerman wrote:
> Hi Breno,
>
> Thanks for chasing this one down.
>
> Breno Leitao <leitao@debian.org> writes:
>
>> On a signal handler return, the user could set a context with MSR[TS] bits
>> set, and these bits would be copied to task regs->msr.
>>
>> At restore_tm_sigcontexts(), after current task regs->msr[TS] bits are set,
>> several __get_user() are called and then a recheckpoint is executed.
>>
>> This is a problem since a page fault (in kernel space) could happen when
>> calling __get_user(). If it happens, the process MSR[TS] bits were
>> already set, but recheckpoint was not executed, and SPRs are still invalid.
>>
>> The page fault can cause the current process to be de-scheduled, with
>> MSR[TS] active and without tm_recheckpoint() being called. More
>> importantly, without TEXASR[FS] bit set also.
>>
>> Since TEXASR might not have the FS bit set, and when the process is
>> scheduled back, it will try to reclaim, which will be aborted because of
>> the CPU is not in the suspended state, and, then, recheckpoint. This
>> recheckpoint will restore thread->texasr into TEXASR SPR, which might be
>> zero, hitting a BUG_ON().
>>
>> [ 2181.457997] kernel BUG at arch/powerpc/kernel/tm.S:446!
>
> As Mikey said, would be good to have at least the stack trace & NIP
> here, if not the full oops.
Ack!
>> This patch simply delays the MSR[TS] set, so, if there is any page fault in
>> the __get_user() section, it does not have regs->msr[TS] set, since the TM
>> structures are still invalid, thus avoiding doing TM operations for
>> in-kernel exceptions and possible process reschedule.
>>
>> With this patch, the MSR[TS] will only be set just before recheckpointing
>> and setting TEXASR[FS] = 1, thus avoiding an interrupt with TM registers in
>> invalid state.
>
> To make this safe when PREEMPT is enabled we need to preempt_disable() /
> enable() around the setting of regs->msr and the recheckpoint.
>
> That could also serve as nice documentation.
>
> I guess the other question is whether it should be the job of
> tm_recheckpoint() to set regs->msr, given that it already hard disables
> interrupts.
>
> eg. we'd set the TM flags in a local msr variable and pass that to
> tm_recheckpoint(), it would then assign that to regs->msr in the IRQ
> disabled section. Though there's many callers of tm_recheckpoint() that
> don't need that behaviour, so it would probably need to be factored out.
Right, that might be doable, but I am wondering if it isn't better to create
a new function that does it as below:
void tm_set_msr_and_recheckpoint(struct pt_regs *regs, u64 msr)
{
	preempt_disable();
	regs->msr = msr;
	/* tm_recheckpoint() takes a thread; assuming current is being restored */
	tm_recheckpoint(&current->thread);
	preempt_enable();
}
I understand that preempt_disable() does a better job of disabling
preemption than disabling IRQs does. Also, a new function does not change
the existing API just for this very specific signal case.
>> It is not possible to move tm_recheckpoint to happen earlier, because it is
>> required to get the checkpointed registers from userspace, with
>> __get_user(), thus, the only way to avoid this undesired behavior is
>> delaying the MSR[TS] set, as done in this patch.
>
> I think the root cause here is that we're copying into the live regs of
> current. That has obviously worked in the past, because the register
> state wasn't used until we returned back to userspace. But that's no
> longer true with TM. And even so it's quite subtle. I also suspect some
> of our FP/VEC handling may not work correctly if we're scheduled part
> way through restoring the regs.
I got your point, and I think we risk corrupting the regs if MSR[TS] is set
and we do page_fault->recheckpoint/reclaim in the middle of the
__get_user() chunks.
> What might work better is if we copy all the regs into temporary space
> and then with interrupts disabled we copy them into the task. That way
> we should never be scheduled with a half-populated set of regs.
Yes, that seems to be the best strategy, and I might be interested in working
on it.
> That's obviously a much bigger patch though and something we'll have to
> do later.
>
>> Fixes: 87b4e5393af7 ("powerpc/tm: Fix return of active 64bit signals")
>> Cc: stable@vger.kernel.org (v3.9+)
>> Signed-off-by: Breno Leitao <leitao@debian.org>
>> ---
>> arch/powerpc/kernel/signal_64.c | 29 +++++++++++++++--------------
>> 1 file changed, 15 insertions(+), 14 deletions(-)
>>
>> diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c
>> index 83d51bf586c7..15b153bdd826 100644
>> --- a/arch/powerpc/kernel/signal_64.c
>> +++ b/arch/powerpc/kernel/signal_64.c
>> @@ -467,20 +467,6 @@ static long restore_tm_sigcontexts(struct task_struct *tsk,
>> if (MSR_TM_RESV(msr))
>> return -EINVAL;
>>
>> - /* pull in MSR TS bits from user context */
>> - regs->msr = (regs->msr & ~MSR_TS_MASK) | (msr & MSR_TS_MASK);
>> -
>> - /*
>> - * Ensure that TM is enabled in regs->msr before we leave the signal
>> - * handler. It could be the case that (a) user disabled the TM bit
>> - * through the manipulation of the MSR bits in uc_mcontext or (b) the
>> - * TM bit was disabled because a sufficient number of context switches
>> - * happened whilst in the signal handler and load_tm overflowed,
>> - * disabling the TM bit. In either case we can end up with an illegal
>> - * TM state leading to a TM Bad Thing when we return to userspace.
>> - */
>> - regs->msr |= MSR_TM;
>> -
>> /* pull in MSR LE from user context */
>> regs->msr = (regs->msr & ~MSR_LE) | (msr & MSR_LE);
>>
>> @@ -572,6 +558,21 @@ static long restore_tm_sigcontexts(struct task_struct *tsk,
>> tm_enable();
>> /* Make sure the transaction is marked as failed */
>> tsk->thread.tm_texasr |= TEXASR_FS;
>> +
>
> preempt_disable();
>
>> + /* pull in MSR TS bits from user context */
>> + regs->msr = (regs->msr & ~MSR_TS_MASK) | (msr & MSR_TS_MASK);
>> +
>> + /*
>> + * Ensure that TM is enabled in regs->msr before we leave the signal
>> + * handler. It could be the case that (a) user disabled the TM bit
>> + * through the manipulation of the MSR bits in uc_mcontext or (b) the
>> + * TM bit was disabled because a sufficient number of context switches
>> + * happened whilst in the signal handler and load_tm overflowed,
>> + * disabling the TM bit. In either case we can end up with an illegal
>> + * TM state leading to a TM Bad Thing when we return to userspace.
>> + */
>> + regs->msr |= MSR_TM;
>> +
>> /* This loads the checkpointed FP/VEC state, if used */
>> tm_recheckpoint(&tsk->thread);
>>
> preempt_enable();
>
> Although looking at the code that follows, it probably won't cope with
> being preempted either. So the preempt_enable() should probably go at
> the end of the function.
Why? I confess I do not understand the preempt mechanism very well. Does a
preemption save/restore the FP and vector states?
What is the code/IRQ that is executed when there is a preemption? I am
wondering if a preemption happens because the Decrementer (0x900) IRQ fires
while a syscall is being executed, thus saving the whole context and then
restoring it. If my guess is correct, the IRQ might not save the FP/VEC
states, hence your concern about having load_fp_state() after a preemption.
Let me paste the "patched" code here to help with the discussion:
	regs->msr |= MSR_TM;

	/* This loads the checkpointed FP/VEC state, if used */
	tm_recheckpoint(&tsk->thread);

	msr_check_and_set(msr & (MSR_FP | MSR_VEC));
	if (msr & MSR_FP) {
		load_fp_state(&tsk->thread.fp_state);
		regs->msr |= (MSR_FP | tsk->thread.fpexc_mode);
	}
	if (msr & MSR_VEC) {
		load_vr_state(&tsk->thread.vr_state);
		regs->msr |= MSR_VEC;
	}
Thanks
Breno