* [PATCH] powerpc/tm: Set MSR[TS] just prior to recheckpoint
@ 2018-11-19 12:44 Breno Leitao
2018-11-20 10:34 ` Michael Ellerman
0 siblings, 1 reply; 3+ messages in thread
From: Breno Leitao @ 2018-11-19 12:44 UTC (permalink / raw)
To: linuxppc-dev; +Cc: mikey, mpe, gromero, v3.9+, Breno Leitao
On a signal handler return, the user could set a context with MSR[TS] bits
set, and these bits would be copied to task regs->msr.
At restore_tm_sigcontexts(), after the current task's regs->msr[TS] bits
are set, several __get_user() calls are made and then a recheckpoint is
executed.
This is a problem because a page fault (in kernel space) could happen when
calling __get_user(). If it does, the process's MSR[TS] bits are already
set, but the recheckpoint has not been executed, and the SPRs are still
invalid.
The page fault can cause the current process to be de-scheduled, with
MSR[TS] active and without tm_recheckpoint() being called. More
importantly, without the TEXASR[FS] bit set either.
Since TEXASR might not have the FS bit set, when the process is scheduled
back it will try to reclaim, which will be aborted because the CPU is not
in the suspended state, and then recheckpoint. This recheckpoint will
restore thread->texasr into the TEXASR SPR, which might be zero, hitting a
BUG_ON().
[ 2181.457997] kernel BUG at arch/powerpc/kernel/tm.S:446!
This patch simply delays setting MSR[TS], so that if a page fault happens
in the __get_user() section, regs->msr[TS] is not yet set, since the TM
structures are still invalid, thus avoiding TM operations for in-kernel
exceptions and a possible process reschedule.
With this patch, the MSR[TS] will only be set just before recheckpointing
and setting TEXASR[FS] = 1, thus avoiding an interrupt with TM registers
in an invalid state.
It is not possible to move tm_recheckpoint() earlier, because the
checkpointed registers must first be fetched from userspace with
__get_user(); thus, the only way to avoid this undesired behavior is to
delay setting MSR[TS], as done in this patch.
Fixes: 87b4e5393af7 ("powerpc/tm: Fix return of active 64bit signals")
Cc: stable@vger.kernel.org (v3.9+)
Signed-off-by: Breno Leitao <leitao@debian.org>
---
arch/powerpc/kernel/signal_64.c | 29 +++++++++++++++--------------
1 file changed, 15 insertions(+), 14 deletions(-)
diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c
index 83d51bf586c7..15b153bdd826 100644
--- a/arch/powerpc/kernel/signal_64.c
+++ b/arch/powerpc/kernel/signal_64.c
@@ -467,20 +467,6 @@ static long restore_tm_sigcontexts(struct task_struct *tsk,
 	if (MSR_TM_RESV(msr))
 		return -EINVAL;
 
-	/* pull in MSR TS bits from user context */
-	regs->msr = (regs->msr & ~MSR_TS_MASK) | (msr & MSR_TS_MASK);
-
-	/*
-	 * Ensure that TM is enabled in regs->msr before we leave the signal
-	 * handler. It could be the case that (a) user disabled the TM bit
-	 * through the manipulation of the MSR bits in uc_mcontext or (b) the
-	 * TM bit was disabled because a sufficient number of context switches
-	 * happened whilst in the signal handler and load_tm overflowed,
-	 * disabling the TM bit. In either case we can end up with an illegal
-	 * TM state leading to a TM Bad Thing when we return to userspace.
-	 */
-	regs->msr |= MSR_TM;
-
 	/* pull in MSR LE from user context */
 	regs->msr = (regs->msr & ~MSR_LE) | (msr & MSR_LE);
 
@@ -572,6 +558,21 @@ static long restore_tm_sigcontexts(struct task_struct *tsk,
 	tm_enable();
 	/* Make sure the transaction is marked as failed */
 	tsk->thread.tm_texasr |= TEXASR_FS;
+
+	/* pull in MSR TS bits from user context */
+	regs->msr = (regs->msr & ~MSR_TS_MASK) | (msr & MSR_TS_MASK);
+
+	/*
+	 * Ensure that TM is enabled in regs->msr before we leave the signal
+	 * handler. It could be the case that (a) user disabled the TM bit
+	 * through the manipulation of the MSR bits in uc_mcontext or (b) the
+	 * TM bit was disabled because a sufficient number of context switches
+	 * happened whilst in the signal handler and load_tm overflowed,
+	 * disabling the TM bit. In either case we can end up with an illegal
+	 * TM state leading to a TM Bad Thing when we return to userspace.
+	 */
+	regs->msr |= MSR_TM;
+
 	/* This loads the checkpointed FP/VEC state, if used */
 	tm_recheckpoint(&tsk->thread);
--
2.19.0
* Re: [PATCH] powerpc/tm: Set MSR[TS] just prior to recheckpoint
2018-11-19 12:44 [PATCH] powerpc/tm: Set MSR[TS] just prior to recheckpoint Breno Leitao
@ 2018-11-20 10:34 ` Michael Ellerman
2018-11-21 18:32 ` Breno Leitao
0 siblings, 1 reply; 3+ messages in thread
From: Michael Ellerman @ 2018-11-20 10:34 UTC (permalink / raw)
To: Breno Leitao, linuxppc-dev; +Cc: mikey, gromero, v3.9+, Breno Leitao

Hi Breno,

Thanks for chasing this one down.

Breno Leitao <leitao@debian.org> writes:
> On a signal handler return, the user could set a context with MSR[TS] bits
> set, and these bits would be copied to task regs->msr.
>
> At restore_tm_sigcontexts(), after the current task's regs->msr[TS] bits
> are set, several __get_user() calls are made and then a recheckpoint is
> executed.
>
> This is a problem because a page fault (in kernel space) could happen when
> calling __get_user(). If it does, the process's MSR[TS] bits are already
> set, but the recheckpoint has not been executed, and the SPRs are still
> invalid.
>
> The page fault can cause the current process to be de-scheduled, with
> MSR[TS] active and without tm_recheckpoint() being called. More
> importantly, without the TEXASR[FS] bit set either.
>
> Since TEXASR might not have the FS bit set, when the process is scheduled
> back it will try to reclaim, which will be aborted because the CPU is not
> in the suspended state, and then recheckpoint. This recheckpoint will
> restore thread->texasr into the TEXASR SPR, which might be zero, hitting a
> BUG_ON().
>
> [ 2181.457997] kernel BUG at arch/powerpc/kernel/tm.S:446!

As Mikey said, would be good to have at least the stack trace & NIP
here, if not the full oops.

> This patch simply delays setting MSR[TS], so that if a page fault happens
> in the __get_user() section, regs->msr[TS] is not yet set, since the TM
> structures are still invalid, thus avoiding TM operations for in-kernel
> exceptions and a possible process reschedule.
>
> With this patch, the MSR[TS] will only be set just before recheckpointing
> and setting TEXASR[FS] = 1, thus avoiding an interrupt with TM registers
> in an invalid state.

To make this safe when PREEMPT is enabled we need to preempt_disable() /
enable() around the setting of regs->msr and the recheckpoint.

That could also serve as nice documentation.

I guess the other question is whether it should be the job of
tm_recheckpoint() to set regs->msr, given that it already hard disables
interrupts.

eg. we'd set the TM flags in a local msr variable and pass that to
tm_recheckpoint(), it would then assign that to regs->msr in the IRQ
disabled section. Though there's many callers of tm_recheckpoint() that
don't need that behaviour, so it would probably need to be factored out.

> It is not possible to move tm_recheckpoint() earlier, because the
> checkpointed registers must first be fetched from userspace with
> __get_user(); thus, the only way to avoid this undesired behavior is to
> delay setting MSR[TS], as done in this patch.

I think the root cause here is that we're copying into the live regs of
current. That has obviously worked in the past, because the register
state wasn't used until we returned back to userspace. But that's no
longer true with TM. And even so it's quite subtle. I also suspect some
of our FP/VEC handling may not work correctly if we're scheduled part
way through restoring the regs.

What might work better is if we copy all the regs into temporary space
and then with interrupts disabled we copy them into the task. That way
we should never be scheduled with a half-populated set of regs.

That's obviously a much bigger patch though and something we'll have to
do later.

> Fixes: 87b4e5393af7 ("powerpc/tm: Fix return of active 64bit signals")
> Cc: stable@vger.kernel.org (v3.9+)
> Signed-off-by: Breno Leitao <leitao@debian.org>
> ---
>  arch/powerpc/kernel/signal_64.c | 29 +++++++++++++++--------------
>  1 file changed, 15 insertions(+), 14 deletions(-)
>
> diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c
> index 83d51bf586c7..15b153bdd826 100644
> --- a/arch/powerpc/kernel/signal_64.c
> +++ b/arch/powerpc/kernel/signal_64.c
> @@ -467,20 +467,6 @@ static long restore_tm_sigcontexts(struct task_struct *tsk,
>  	if (MSR_TM_RESV(msr))
>  		return -EINVAL;
>
> -	/* pull in MSR TS bits from user context */
> -	regs->msr = (regs->msr & ~MSR_TS_MASK) | (msr & MSR_TS_MASK);
> -
> -	/*
> -	 * Ensure that TM is enabled in regs->msr before we leave the signal
> -	 * handler. It could be the case that (a) user disabled the TM bit
> -	 * through the manipulation of the MSR bits in uc_mcontext or (b) the
> -	 * TM bit was disabled because a sufficient number of context switches
> -	 * happened whilst in the signal handler and load_tm overflowed,
> -	 * disabling the TM bit. In either case we can end up with an illegal
> -	 * TM state leading to a TM Bad Thing when we return to userspace.
> -	 */
> -	regs->msr |= MSR_TM;
> -
>  	/* pull in MSR LE from user context */
>  	regs->msr = (regs->msr & ~MSR_LE) | (msr & MSR_LE);
>
> @@ -572,6 +558,21 @@ static long restore_tm_sigcontexts(struct task_struct *tsk,
>  	tm_enable();
>  	/* Make sure the transaction is marked as failed */
>  	tsk->thread.tm_texasr |= TEXASR_FS;
> +

	preempt_disable();

> +	/* pull in MSR TS bits from user context */
> +	regs->msr = (regs->msr & ~MSR_TS_MASK) | (msr & MSR_TS_MASK);
> +
> +	/*
> +	 * Ensure that TM is enabled in regs->msr before we leave the signal
> +	 * handler. It could be the case that (a) user disabled the TM bit
> +	 * through the manipulation of the MSR bits in uc_mcontext or (b) the
> +	 * TM bit was disabled because a sufficient number of context switches
> +	 * happened whilst in the signal handler and load_tm overflowed,
> +	 * disabling the TM bit. In either case we can end up with an illegal
> +	 * TM state leading to a TM Bad Thing when we return to userspace.
> +	 */
> +	regs->msr |= MSR_TM;
> +
>  	/* This loads the checkpointed FP/VEC state, if used */
>  	tm_recheckpoint(&tsk->thread);
>

	preempt_enable();

Although looking at the code that follows, it probably won't cope with
being preempted either. So the preempt_enable() should probably go at
the end of the function.

cheers
* Re: [PATCH] powerpc/tm: Set MSR[TS] just prior to recheckpoint
2018-11-20 10:34 ` Michael Ellerman
@ 2018-11-21 18:32 ` Breno Leitao
0 siblings, 0 replies; 3+ messages in thread
From: Breno Leitao @ 2018-11-21 18:32 UTC (permalink / raw)
To: Michael Ellerman, linuxppc-dev; +Cc: mikey, gromero, v3.9+

Hi Michael,

On 11/20/18 8:34 AM, Michael Ellerman wrote:
> Hi Breno,
>
> Thanks for chasing this one down.
>
> Breno Leitao <leitao@debian.org> writes:
>
>> On a signal handler return, the user could set a context with MSR[TS] bits
>> set, and these bits would be copied to task regs->msr.
>>
>> At restore_tm_sigcontexts(), after the current task's regs->msr[TS] bits
>> are set, several __get_user() calls are made and then a recheckpoint is
>> executed.
>>
>> This is a problem because a page fault (in kernel space) could happen when
>> calling __get_user(). If it does, the process's MSR[TS] bits are already
>> set, but the recheckpoint has not been executed, and the SPRs are still
>> invalid.
>>
>> The page fault can cause the current process to be de-scheduled, with
>> MSR[TS] active and without tm_recheckpoint() being called. More
>> importantly, without the TEXASR[FS] bit set either.
>>
>> Since TEXASR might not have the FS bit set, when the process is scheduled
>> back it will try to reclaim, which will be aborted because the CPU is not
>> in the suspended state, and then recheckpoint. This recheckpoint will
>> restore thread->texasr into the TEXASR SPR, which might be zero, hitting a
>> BUG_ON().
>>
>> [ 2181.457997] kernel BUG at arch/powerpc/kernel/tm.S:446!
>
> As Mikey said, would be good to have at least the stack trace & NIP
> here, if not the full oops.

Ack!

>> This patch simply delays setting MSR[TS], so that if a page fault happens
>> in the __get_user() section, regs->msr[TS] is not yet set, since the TM
>> structures are still invalid, thus avoiding TM operations for in-kernel
>> exceptions and a possible process reschedule.
>>
>> With this patch, the MSR[TS] will only be set just before recheckpointing
>> and setting TEXASR[FS] = 1, thus avoiding an interrupt with TM registers
>> in an invalid state.
>
> To make this safe when PREEMPT is enabled we need to preempt_disable() /
> enable() around the setting of regs->msr and the recheckpoint.
>
> That could also serve as nice documentation.
>
> I guess the other question is whether it should be the job of
> tm_recheckpoint() to set regs->msr, given that it already hard disables
> interrupts.
>
> eg. we'd set the TM flags in a local msr variable and pass that to
> tm_recheckpoint(), it would then assign that to regs->msr in the IRQ
> disabled section. Though there's many callers of tm_recheckpoint() that
> don't need that behaviour, so it would probably need to be factored out.

Right, that might be doable, but I am wondering if it isn't better to
create a new function that does it, as below:

	void tm_set_msr_and_recheckpoint(struct pt_regs *regs, u64 msr)
	{
		preempt_disable();
		regs->msr = msr;
		tm_recheckpoint(&current->thread);
		preempt_enable();
	}

I understand that preempt_disable() does a better job of disabling
preemption than disabling IRQs does. Also, it does not change the API
just for this very specific signal case.

>> It is not possible to move tm_recheckpoint() earlier, because the
>> checkpointed registers must first be fetched from userspace with
>> __get_user(); thus, the only way to avoid this undesired behavior is to
>> delay setting MSR[TS], as done in this patch.
>
> I think the root cause here is that we're copying into the live regs of
> current. That has obviously worked in the past, because the register
> state wasn't used until we returned back to userspace. But that's no
> longer true with TM. And even so it's quite subtle. I also suspect some
> of our FP/VEC handling may not work correctly if we're scheduled part
> way through restoring the regs.

I got your point, and I think we risk corrupting the regs if MSR[TS] is
set and we do page_fault->reclaim/recheckpoint in the middle of the
__get_user() chunks.

> What might work better is if we copy all the regs into temporary space
> and then with interrupts disabled we copy them into the task. That way
> we should never be scheduled with a half-populated set of regs.

Yes, that seems to be the best strategy, and I might be interested in
working on it.

> That's obviously a much bigger patch though and something we'll have to
> do later.
>
>> Fixes: 87b4e5393af7 ("powerpc/tm: Fix return of active 64bit signals")
>> Cc: stable@vger.kernel.org (v3.9+)
>> Signed-off-by: Breno Leitao <leitao@debian.org>
>> ---
>>  arch/powerpc/kernel/signal_64.c | 29 +++++++++++++++--------------
>>  1 file changed, 15 insertions(+), 14 deletions(-)
>>
>> diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c
>> index 83d51bf586c7..15b153bdd826 100644
>> --- a/arch/powerpc/kernel/signal_64.c
>> +++ b/arch/powerpc/kernel/signal_64.c
>> @@ -467,20 +467,6 @@ static long restore_tm_sigcontexts(struct task_struct *tsk,
>>  	if (MSR_TM_RESV(msr))
>>  		return -EINVAL;
>>
>> -	/* pull in MSR TS bits from user context */
>> -	regs->msr = (regs->msr & ~MSR_TS_MASK) | (msr & MSR_TS_MASK);
>> -
>> -	/*
>> -	 * Ensure that TM is enabled in regs->msr before we leave the signal
>> -	 * handler. It could be the case that (a) user disabled the TM bit
>> -	 * through the manipulation of the MSR bits in uc_mcontext or (b) the
>> -	 * TM bit was disabled because a sufficient number of context switches
>> -	 * happened whilst in the signal handler and load_tm overflowed,
>> -	 * disabling the TM bit. In either case we can end up with an illegal
>> -	 * TM state leading to a TM Bad Thing when we return to userspace.
>> -	 */
>> -	regs->msr |= MSR_TM;
>> -
>>  	/* pull in MSR LE from user context */
>>  	regs->msr = (regs->msr & ~MSR_LE) | (msr & MSR_LE);
>>
>> @@ -572,6 +558,21 @@ static long restore_tm_sigcontexts(struct task_struct *tsk,
>>  	tm_enable();
>>  	/* Make sure the transaction is marked as failed */
>>  	tsk->thread.tm_texasr |= TEXASR_FS;
>> +
>
> 	preempt_disable();
>
>> +	/* pull in MSR TS bits from user context */
>> +	regs->msr = (regs->msr & ~MSR_TS_MASK) | (msr & MSR_TS_MASK);
>> +
>> +	/*
>> +	 * Ensure that TM is enabled in regs->msr before we leave the signal
>> +	 * handler. It could be the case that (a) user disabled the TM bit
>> +	 * through the manipulation of the MSR bits in uc_mcontext or (b) the
>> +	 * TM bit was disabled because a sufficient number of context switches
>> +	 * happened whilst in the signal handler and load_tm overflowed,
>> +	 * disabling the TM bit. In either case we can end up with an illegal
>> +	 * TM state leading to a TM Bad Thing when we return to userspace.
>> +	 */
>> +	regs->msr |= MSR_TM;
>> +
>>  	/* This loads the checkpointed FP/VEC state, if used */
>>  	tm_recheckpoint(&tsk->thread);
>>
> 	preempt_enable();
>
> Although looking at the code that follows, it probably won't cope with
> being preempted either. So the preempt_enable() should probably go at
> the end of the function.

Why? I confess I do not understand the preempt mechanism a lot. Does a
preemption save/restore the FP and vector states when the task is
preempted? What is the code/IRQ that is executed when there is a
preemption?

I am wondering if a preempt happens because the Decrementer (0x900) IRQ
arrives while a syscall is being executed, thus saving the whole context
and then restoring it. If my guess is correct, the IRQ might not save
the FP/VEC states, hence your concern about having load_fp_state() after
a preemption.

Let me paste the "patched" code here to help with the discussion:

	regs->msr |= MSR_TM;

	/* This loads the checkpointed FP/VEC state, if used */
	tm_recheckpoint(&tsk->thread);

	msr_check_and_set(msr & (MSR_FP | MSR_VEC));
	if (msr & MSR_FP) {
		load_fp_state(&tsk->thread.fp_state);
		regs->msr |= (MSR_FP | tsk->thread.fpexc_mode);
	}
	if (msr & MSR_VEC) {
		load_vr_state(&tsk->thread.vr_state);
		regs->msr |= MSR_VEC;
	}

Thanks,
Breno