From: ebiederm@xmission.com (Eric W. Biederman)
To: Michael Schmitz <schmitzmic@gmail.com>
Cc: geert@linux-m68k.org, linux-arch@vger.kernel.org,
linux-m68k@lists.linux-m68k.org, torvalds@linux-foundation.org,
schwab@linux-m68k.org
Subject: Re: [PATCH v4 0/3] m68k: Improved switch stack handling
Date: Fri, 23 Jul 2021 17:31:20 -0500 [thread overview]
Message-ID: <87tukkk6h3.fsf@disp2133> (raw)
In-Reply-To: <12992a3c-0740-f90e-aa4e-1ec1d8ea38f6@gmail.com> (Michael Schmitz's message of "Fri, 23 Jul 2021 16:23:12 +1200")
Michael Schmitz <schmitzmic@gmail.com> writes:
> Hi Eric,
>
> Am 23.07.2021 um 02:49 schrieb Eric W. Biederman:
>> Michael Schmitz <schmitzmic@gmail.com> writes:
>>
>>> Hi Eric,
>>>
>>> On 21/07/21 8:32 am, Eric W. Biederman wrote:
>>>>
>>>>> diff --git a/arch/m68k/fpsp040/skeleton.S b/arch/m68k/fpsp040/skeleton.S
>>>>> index a8f4161..6c92d38 100644
>>>>> --- a/arch/m68k/fpsp040/skeleton.S
>>>>> +++ b/arch/m68k/fpsp040/skeleton.S
>>>>> @@ -502,7 +502,17 @@ in_ea:
>>>>> .section .fixup,#alloc,#execinstr
>>>>> .even
>>>>> 1:
>>>>> +
>>>>> + SAVE_ALL_INT
>>>>> + SAVE_SWITCH_STACK
>>>> ^^^^^^^^^^
>>>>
>>>> I don't think this saves the registers in the well known fixed location
>>>> on the stack because some registers are saved at the exception entry
>>>> point.
>>>
>>> The FPU exception entry points are not using the exception entry code in
>>> head.S. These entry points are stored in the exception vector table directly. No
>>> saving of a syscall stack frame happens there. The FPU places its exception
>>> frame on the stack, and that is what the FPU exception handlers use.
>>>
>>> (If these have to call out to the generic exception handlers again, they will
>>> build a minimal stack frame, see code in skeleton.S.)
>>>
>>> Calling fpsp040_die() is no different from calling a syscall that may need to
>>> have access to the full stack frame. The 'fixed location' is just 'on the stack
>>> before calling fpsp040_die()', again this is no different from calling
>>> e.g. sys_fork() which does not take a pointer to the begin of the stack frame as
>>> an argument.
>>>
>>> I must admit I never looked at how do_exit() figures out where the stack frame
>>> containing the saved registers is stored, I just assumed it unwinds the stack up
>>> to the point where the caller syscall was made, and works from there. The same
>>> strategy ought to work here.
>>
>> For do_exit the part we need to be careful with is PTRACE_EVENT_EXIT,
>> which means it is ptrace that we need to look at.
>>
>> For m68k the code in put_reg and get_reg finds the registers by looking
>> at task->thread.esp0.
>
> Thanks, that's what I was missing here.
>>
>> I was expecting m68k to use the same technique as alpha which expects a
>> fixed offset from task_stack_page(task).
>>
>> So your code will work if you add code to update task->thread.esp0 which
>> is also known as THREAD_ESP0 in entry.S
>
> Shoving
>
> movel %sp,%curptr@(TASK_THREAD+THREAD_ESP0)
>
> in between the SAVE_ALL_INT and SAVE_SWITCH_STACK ought to do the
> trick there.
>
>>
>>>> Without being saved at the well known fixed location if some process
>>>> stops in PTRACE_EVENT_EXIT in do_exit we likely get some complete
>>>> gibberish.
>>>>
>>>> That is probably safe.
>>>>
>>>>> jbra fpsp040_die
>>>>> + addql #8,%sp
>>>>> + addql #8,%sp
>>>>> + addql #8,%sp
>>>>> + addql #8,%sp
>>>>> + addql #8,%sp
>>>>> + addql #4,%sp
>>>>> + rts
>>>> Especially as everything after jumping to fpsp040_die does not execute.
>>>
>>> Unless we change fpsp040_die() to call force_sig(SIGSEGV).
>>
>> Yes. I think we would probably need to have it also call get_signal and
>> all of that, because I don't think the very light call path for that
>> exception includes testing if signals are pending.
>
> As far as I can see, there is a test for pending signals:
>
> ENTRY(ret_from_exception)
> .Lret_from_exception:
> btst #5,%sp@(PT_OFF_SR) | check if returning to kernel
> bnes 1f | if so, skip resched, signals
> | only allow interrupts when we are really the last one on the
> | kernel stack, otherwise stack overflow can occur during
> | heavy interrupt load
> andw #ALLOWINT,%sr
>
> resume_userspace:
> movel %curptr@(TASK_STACK),%a1
> moveb %a1@(TINFO_FLAGS+3),%d0 | bits 0-7 of TINFO_FLAGS
> jne exit_work | any bit set? -> exit_work
> 1: RESTORE_ALL
>
> exit_work:
> | save top of frame
> movel %sp,%curptr@(TASK_THREAD+THREAD_ESP0)
> lslb #1,%d0 | shift out TIF_NEED_RESCHED
> jne do_signal_return | any remaining bit
> (signal/notify_resume)? -> do_signal_return
> pea resume_userspace
> jra schedule
>
> As long as TIF_NOTIFY_SIGNAL or TIF_SIGPENDING are set,
> do_signal_return will be called.
I was going to say I don't think so, as my tracing of
the code lead in a couple of different directions. Upon closer
inspection all those paths either lead to fpsp_done or more
directly to ret_from_exception.
For anyone else who might want to trace the code, or for myself later on
when I forget. As best as I can figure the hardware exception vector
table is setup in: arch/m68k/kernel/vector.c
For the vectors in question it appears to be this chunk of code:
if (CPU_IS_040 && !FPU_IS_EMU) {
/* set up FPSP entry points */
asmlinkage void dz_vec(void) asm ("dz");
asmlinkage void inex_vec(void) asm ("inex");
asmlinkage void ovfl_vec(void) asm ("ovfl");
asmlinkage void unfl_vec(void) asm ("unfl");
asmlinkage void snan_vec(void) asm ("snan");
asmlinkage void operr_vec(void) asm ("operr");
asmlinkage void bsun_vec(void) asm ("bsun");
asmlinkage void fline_vec(void) asm ("fline");
asmlinkage void unsupp_vec(void) asm ("unsupp");
vectors[VEC_FPDIVZ] = dz_vec;
vectors[VEC_FPIR] = inex_vec;
vectors[VEC_FPOVER] = ovfl_vec;
vectors[VEC_FPUNDER] = unfl_vec;
vectors[VEC_FPNAN] = snan_vec;
vectors[VEC_FPOE] = operr_vec;
vectors[VEC_FPBRUC] = bsun_vec;
vectors[VEC_LINE11] = fline_vec;
vectors[VEC_FPUNSUP] = unsupp_vec;
}
Which leads me to call traces that look like this:
hw
fline
fpsp_fline
mem_read
user_read
copyin
in_ea
<page-fault>
fpsp040_die
If that mem_read returns it can be followed by
not_mvcr
real_fline
ret_from_exception
Or it can be followed by
fix_con
uni_2
gen_except
do_clean
finish_up
fpsp_done
ret_from_exception
>> The way the code is structured it is actively incorrect to return from
>> fpsp040_die, as the code does not know what to do if it reads a byte
>> from userspace and there is nothing there.
>
> Correct - my hope is that upon return from the FPU exception (that
> continued after a dodgy read or write), we get the signal delivered
> and will die then.
Yes. That does look like a good strategy.
I am wondering if there are values we can return that will make the
path out of the exit routine more deterministic.
I have played with that a little bit today, but it doesn't look like I
am going to have time to put together any kind of real patch today.
Simply modifying fpsp040_die to call force_sigsegv(SIGSEGV) should be
enough to trigger a signal (no call stack work needed if we remove
do_exit). The tricky bit is what value do we want to fake when we
can not read anything from userspace. For a write fault we should just
be able to skip the write entirely.
In both cases we probably should break out of the loop prematurely. But
I don't know if that is necessary.
The lazy strategy would be to copy the ifpsp060 code and simply oops
the kernel if the read or write of userspace gets a page fault.
>> So instead of handling -EFAULT like most pieces of kernel code the code
>> just immediately calls do_exit, and does not even attempt to handle
>> the error.
>>
>> That is not my favorite strategy at all, but I suspect it isn't worth
>> it, or safe to update the skeleton.S to handle errors. Especially as we
>> have not even figured out how to test that code yet.
>
> That's bothering me more than a little, but I need to find out whether
> the emulator even handles FPU exceptions correctly ...
As a fallback plan we can following the lead of ifpsp060/os.S and simply
not catch the kernel triggered page fault, and let
arch/m68k/mm/fault.c:send_fault_sig() return a kernel oops. It is not
ideal as it allows userspace to trigger a kernel oops, but it does at
least keep the kernel in a consistent state.
diff --git a/arch/m68k/fpsp040/skeleton.S b/arch/m68k/fpsp040/skeleton.S
index a8f41615d94a..4c6c4b07ef38 100644
--- a/arch/m68k/fpsp040/skeleton.S
+++ b/arch/m68k/fpsp040/skeleton.S
@@ -479,7 +479,6 @@ copyout:
| movec %d1,%DFC | set dfc for user data space
moreout:
moveb (%a0)+,%d1 | fetch supervisor byte
-out_ea:
movesb %d1,(%a1)+ | write user byte
dbf %d0,moreout
rts
@@ -493,21 +492,9 @@ copyin:
| SFC is already set
| movec %d1,%SFC | set sfc for user space
morein:
-in_ea:
movesb (%a0)+,%d1 | fetch user byte
moveb %d1,(%a1)+ | write supervisor byte
dbf %d0,morein
rts
- .section .fixup,#alloc,#execinstr
- .even
-1:
- jbra fpsp040_die
-
- .section __ex_table,#alloc
- .align 4
-
- .long in_ea,1b
- .long out_ea,1b
-
|end
diff --git a/arch/m68k/kernel/traps.c b/arch/m68k/kernel/traps.c
index e2a6f3556211..3ec6ae1bdaf9 100644
--- a/arch/m68k/kernel/traps.c
+++ b/arch/m68k/kernel/traps.c
@@ -1144,15 +1144,6 @@ asmlinkage void set_esp0(unsigned long ssp)
current->thread.esp0 = ssp;
}
-/*
- * This function is called if an error occur while accessing
- * user-space from the fpsp040 code.
- */
-asmlinkage void fpsp040_die(void)
-{
- do_exit(SIGSEGV);
-}
-
#ifdef CONFIG_M68KFPU_EMU
asmlinkage void fpemu_signal(int signal, int code, void *addr)
{
Eric
next prev parent reply other threads:[~2021-07-23 22:31 UTC|newest]
Thread overview: 37+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-06-23 0:21 [PATCH v4 0/3] m68k: Improved switch stack handling Michael Schmitz
2021-06-23 0:21 ` [PATCH v4 1/3] m68k: save extra registers on more syscall entry points Michael Schmitz
2021-06-23 0:21 ` [PATCH v4 2/3] m68k: correctly handle IO worker stack frame set-up Michael Schmitz
2021-06-23 0:21 ` [PATCH v4 3/3] m68k: track syscalls being traced with shallow user context stack Michael Schmitz
2021-07-25 10:05 ` Geert Uytterhoeven
2021-07-25 20:48 ` Michael Schmitz
2021-07-25 21:00 ` Linus Torvalds
2021-07-26 14:27 ` Greg Ungerer
2021-07-15 13:29 ` [PATCH v4 0/3] m68k: Improved switch stack handling Eric W. Biederman
2021-07-15 23:10 ` Michael Schmitz
2021-07-17 5:38 ` Michael Schmitz
2021-07-17 18:52 ` Eric W. Biederman
2021-07-17 20:09 ` Michael Schmitz
2021-07-17 23:04 ` Michael Schmitz
2021-07-18 10:47 ` Andreas Schwab
2021-07-18 19:47 ` Michael Schmitz
2021-07-18 20:59 ` Brad Boyer
2021-07-19 3:15 ` Michael Schmitz
2021-07-20 20:32 ` Eric W. Biederman
2021-07-20 22:16 ` Michael Schmitz
2021-07-22 14:49 ` Eric W. Biederman
2021-07-23 4:23 ` Michael Schmitz
2021-07-23 22:31 ` Eric W. Biederman [this message]
2021-07-23 23:52 ` Michael Schmitz
2021-07-24 12:05 ` Andreas Schwab
2021-07-25 7:44 ` Michael Schmitz
2021-07-25 10:12 ` Brad Boyer
2021-07-26 2:00 ` Michael Schmitz
2021-07-26 19:36 ` [RFC][PATCH] signal/m68k: Use force_sigsegv(SIGSEGV) in fpsp040_die Eric W. Biederman
2021-07-26 20:13 ` Andreas Schwab
2021-07-26 20:29 ` Eric W. Biederman
2021-07-26 21:25 ` Andreas Schwab
2021-07-26 20:29 ` Michael Schmitz
2021-07-26 21:08 ` [PATCH] " Eric W. Biederman
2021-08-25 15:56 ` Eric W. Biederman
2021-08-26 12:15 ` Geert Uytterhoeven
2021-07-25 11:53 ` [PATCH v4 0/3] m68k: Improved switch stack handling Andreas Schwab
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87tukkk6h3.fsf@disp2133 \
--to=ebiederm@xmission.com \
--cc=geert@linux-m68k.org \
--cc=linux-arch@vger.kernel.org \
--cc=linux-m68k@lists.linux-m68k.org \
--cc=schmitzmic@gmail.com \
--cc=schwab@linux-m68k.org \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox