Re: [PATCH v4 0/3] m68k: Improved switch stack handling

Generic Linux architectural discussions
 help / color / mirror / Atom feed

From: ebiederm@xmission.com (Eric W. Biederman)
To: Michael Schmitz <schmitzmic@gmail.com>
Cc: geert@linux-m68k.org, linux-arch@vger.kernel.org,
	linux-m68k@lists.linux-m68k.org, torvalds@linux-foundation.org,
	schwab@linux-m68k.org
Subject: Re: [PATCH v4 0/3] m68k: Improved switch stack handling
Date: Fri, 23 Jul 2021 17:31:20 -0500	[thread overview]
Message-ID: <87tukkk6h3.fsf@disp2133> (raw)
In-Reply-To: <12992a3c-0740-f90e-aa4e-1ec1d8ea38f6@gmail.com> (Michael Schmitz's message of "Fri, 23 Jul 2021 16:23:12 +1200")

Michael Schmitz <schmitzmic@gmail.com> writes:

> Hi Eric,
>
> Am 23.07.2021 um 02:49 schrieb Eric W. Biederman:
>> Michael Schmitz <schmitzmic@gmail.com> writes:
>>
>>> Hi Eric,
>>>
>>> On 21/07/21 8:32 am, Eric W. Biederman wrote:
>>>>
>>>>> diff --git a/arch/m68k/fpsp040/skeleton.S b/arch/m68k/fpsp040/skeleton.S
>>>>> index a8f4161..6c92d38 100644
>>>>> --- a/arch/m68k/fpsp040/skeleton.S
>>>>> +++ b/arch/m68k/fpsp040/skeleton.S
>>>>> @@ -502,7 +502,17 @@ in_ea:
>>>>>   	.section .fixup,#alloc,#execinstr
>>>>>   	.even
>>>>>   1:
>>>>> +
>>>>> +	SAVE_ALL_INT
>>>>> +	SAVE_SWITCH_STACK
>>>>          ^^^^^^^^^^
>>>>
>>>> I don't think this saves the registers in the well known fixed location
>>>> on the stack because some registers are saved at the exception entry
>>>> point.
>>>
>>> The FPU exception entry points are not using the exception entry code in
>>> head.S. These entry points are stored in the exception vector table directly. No
>>> saving of a syscall stack frame happens there. The FPU places its exception
>>> frame on the stack, and that is what the FPU exception handlers use.
>>>
>>> (If these have to call out to the generic exception handlers again, they will
>>> build a minimal stack frame, see code in skeleton.S.)
>>>
>>> Calling fpsp040_die() is no different from calling a syscall that may need to
>>> have access to the full stack frame. The 'fixed location' is just 'on the stack
>>> before calling  fpsp040_die()', again this is no different from calling
>>> e.g. sys_fork() which does not take a pointer to the begin of the stack frame as
>>> an argument.
>>>
>>> I must admit I never looked at how do_exit() figures out where the stack frame
>>> containing the saved registers is stored, I just assumed it unwinds the stack up
>>> to the point where the caller syscall was made, and works from there. The same
>>> strategy ought to work here.
>>
>> For do_exit the part we need to be careful with is PTRACE_EVENT_EXIT,
>> which means it is ptrace that we need to look at.
>>
>> For m68k the code in put_reg and get_reg finds the registers by looking
>> at task->thread.esp0.
>
> Thanks, that's what I was missing here.
>>
>> I was expecting m68k to use the same technique as alpha which expects a
>> fixed offset from task_stack_page(task).
>>
>> So your code will work if you add code to update task->thread.esp0 which
>> is also known as THREAD_ESP0 in entry.S
>
> Shoving
>
> movel   %sp,%curptr@(TASK_THREAD+THREAD_ESP0)
>
> in between the SAVE_ALL_INT and SAVE_SWITCH_STACK ought to do the
> trick there.
>
>>
>>>> Without being saved at the well known fixed location if some process
>>>> stops in PTRACE_EVENT_EXIT in do_exit we likely get some complete
>>>> gibberish.
>>>>
>>>> That is probably safe.
>>>>
>>>>>   	jbra	fpsp040_die
>>>>> +	addql   #8,%sp
>>>>> +	addql   #8,%sp
>>>>> +	addql   #8,%sp
>>>>> +	addql   #8,%sp
>>>>> +	addql   #8,%sp
>>>>> +	addql   #4,%sp
>>>>> +	rts
>>>> Especially as everything after jumping to fpsp040_die does not execute.
>>>
>>> Unless we change fpsp040_die() to call force_sig(SIGSEGV).
>>
>> Yes.  I think we would probably need to have it also call get_signal and
>> all of that, because I don't think the very light call path for that
>> exception includes testing if signals are pending.
>
> As far as I can see, there is a test for pending signals:
>
> ENTRY(ret_from_exception)
> .Lret_from_exception:
>         btst    #5,%sp@(PT_OFF_SR)      | check if returning to kernel
>         bnes    1f                      | if so, skip resched, signals
>         | only allow interrupts when we are really the last one on the
>         | kernel stack, otherwise stack overflow can occur during
>         | heavy interrupt load
>         andw    #ALLOWINT,%sr
>
> resume_userspace:
>         movel   %curptr@(TASK_STACK),%a1
>         moveb   %a1@(TINFO_FLAGS+3),%d0	| bits 0-7 of TINFO_FLAGS
>         jne     exit_work		| any bit set? -> exit_work
> 1:      RESTORE_ALL
>
> exit_work:
>         | save top of frame
>         movel   %sp,%curptr@(TASK_THREAD+THREAD_ESP0)
>         lslb    #1,%d0			| shift out TIF_NEED_RESCHED
>         jne     do_signal_return	| any remaining bit
> (signal/notify_resume)? -> do_signal_return
>         pea     resume_userspace
>         jra     schedule
>
> As long as TIF_NOTIFY_SIGNAL or TIF_SIGPENDING are set,
> do_signal_return will be called.

I was going to say I don't think so, as my tracing of
the code lead in a couple of different directions.  Upon closer
inspection all those paths either lead to fpsp_done or more
directly to ret_from_exception.

For anyone else who might want to trace the code, or for myself later on
when I forget.  As best as I can figure the hardware exception vector
table is setup in: arch/m68k/kernel/vector.c

For the vectors in question it appears to be this chunk of code:

	if (CPU_IS_040 && !FPU_IS_EMU) {
		/* set up FPSP entry points */
		asmlinkage void dz_vec(void) asm ("dz");
		asmlinkage void inex_vec(void) asm ("inex");
		asmlinkage void ovfl_vec(void) asm ("ovfl");
		asmlinkage void unfl_vec(void) asm ("unfl");
		asmlinkage void snan_vec(void) asm ("snan");
		asmlinkage void operr_vec(void) asm ("operr");
		asmlinkage void bsun_vec(void) asm ("bsun");
		asmlinkage void fline_vec(void) asm ("fline");
		asmlinkage void unsupp_vec(void) asm ("unsupp");

		vectors[VEC_FPDIVZ] = dz_vec;
		vectors[VEC_FPIR] = inex_vec;
		vectors[VEC_FPOVER] = ovfl_vec;
		vectors[VEC_FPUNDER] = unfl_vec;
		vectors[VEC_FPNAN] = snan_vec;
		vectors[VEC_FPOE] = operr_vec;
		vectors[VEC_FPBRUC] = bsun_vec;
		vectors[VEC_LINE11] = fline_vec;
		vectors[VEC_FPUNSUP] = unsupp_vec;
	}


Which leads me to call traces that look like this:

hw
  fline
    fpsp_fline
       mem_read
          user_read
             copyin
               in_ea
                  <page-fault>
                     fpsp040_die

If that mem_read returns it can be followed by
       not_mvcr
          real_fline
            ret_from_exception

Or it can be followed by
       fix_con
          uni_2
             gen_except
                 do_clean
                    finish_up
                       fpsp_done
                          ret_from_exception
                          


>> The way the code is structured it is actively incorrect to return from
>> fpsp040_die, as the code does not know what to do if it reads a byte
>> from userspace and there is nothing there.
>
> Correct - my hope is that upon return from the FPU exception (that
> continued after a dodgy read or write), we get the signal delivered
> and will die then.

Yes.  That does look like a good strategy.

I am wondering if there are values we can return that will make the
path out of the exit routine more deterministic.

I have played with that a little bit today, but it doesn't look like I
am going to have time to put together any kind of real patch today.

Simply modifying fpsp040_die to call force_sigsegv(SIGSEGV) should be
enough to trigger a signal (no call stack work needed if we remove
do_exit).  The tricky bit is what value do we want to fake when we
can not read anything from userspace.   For a write fault we should just
be able to skip the write entirely.

In both cases we probably should break out of the loop prematurely.  But
I don't know if that is necessary.


The lazy strategy would be to copy the ifpsp060 code and simply oops
the kernel if the read or write of userspace gets a page fault.

>> So instead of handling -EFAULT like most pieces of kernel code the code
>> just immediately calls do_exit, and does not even attempt to handle
>> the error.
>>
>> That is not my favorite strategy at all, but I suspect it isn't worth
>> it, or safe to update the skeleton.S to handle errors.  Especially as we
>> have not even figured out how to test that code yet.
>
> That's bothering me more than a little, but I need to find out whether
> the emulator even handles FPU exceptions correctly ...


As a fallback plan we can following the lead of ifpsp060/os.S and simply
not catch the kernel triggered page fault, and let
arch/m68k/mm/fault.c:send_fault_sig() return a kernel oops.  It is not
ideal as it allows userspace to trigger a kernel oops, but it does at
least keep the kernel in a consistent state.

diff --git a/arch/m68k/fpsp040/skeleton.S b/arch/m68k/fpsp040/skeleton.S
index a8f41615d94a..4c6c4b07ef38 100644
--- a/arch/m68k/fpsp040/skeleton.S
+++ b/arch/m68k/fpsp040/skeleton.S
@@ -479,7 +479,6 @@ copyout:
 |	movec	%d1,%DFC		| set dfc for user data space
 moreout:
 	moveb	(%a0)+,%d1	| fetch supervisor byte
-out_ea:
 	movesb	%d1,(%a1)+	| write user byte
 	dbf	%d0,moreout
 	rts
@@ -493,21 +492,9 @@ copyin:
 |	SFC is already set
 |	movec	%d1,%SFC		| set sfc for user space
 morein:
-in_ea:
 	movesb	(%a0)+,%d1	| fetch user byte
 	moveb	%d1,(%a1)+	| write supervisor byte
 	dbf	%d0,morein
 	rts
 
-	.section .fixup,#alloc,#execinstr
-	.even
-1:
-	jbra	fpsp040_die
-
-	.section __ex_table,#alloc
-	.align	4
-
-	.long	in_ea,1b
-	.long	out_ea,1b
-
 	|end
diff --git a/arch/m68k/kernel/traps.c b/arch/m68k/kernel/traps.c
index e2a6f3556211..3ec6ae1bdaf9 100644
--- a/arch/m68k/kernel/traps.c
+++ b/arch/m68k/kernel/traps.c
@@ -1144,15 +1144,6 @@ asmlinkage void set_esp0(unsigned long ssp)
 	current->thread.esp0 = ssp;
 }
 
-/*
- * This function is called if an error occur while accessing
- * user-space from the fpsp040 code.
- */
-asmlinkage void fpsp040_die(void)
-{
-	do_exit(SIGSEGV);
-}
-
 #ifdef CONFIG_M68KFPU_EMU
 asmlinkage void fpemu_signal(int signal, int code, void *addr)
 {

Eric

next prev parent reply	other threads:[~2021-07-23 22:31 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-23  0:21 [PATCH v4 0/3] m68k: Improved switch stack handling Michael Schmitz
2021-06-23  0:21 ` [PATCH v4 1/3] m68k: save extra registers on more syscall entry points Michael Schmitz
2021-06-23  0:21 ` [PATCH v4 2/3] m68k: correctly handle IO worker stack frame set-up Michael Schmitz
2021-06-23  0:21 ` [PATCH v4 3/3] m68k: track syscalls being traced with shallow user context stack Michael Schmitz
2021-07-25 10:05   ` Geert Uytterhoeven
2021-07-25 20:48     ` Michael Schmitz
2021-07-25 21:00       ` Linus Torvalds
2021-07-26 14:27         ` Greg Ungerer
2021-07-15 13:29 ` [PATCH v4 0/3] m68k: Improved switch stack handling Eric W. Biederman
2021-07-15 23:10   ` Michael Schmitz
2021-07-17  5:38     ` Michael Schmitz
2021-07-17 18:52       ` Eric W. Biederman
2021-07-17 20:09         ` Michael Schmitz
2021-07-17 23:04           ` Michael Schmitz
2021-07-18 10:47             ` Andreas Schwab
2021-07-18 19:47               ` Michael Schmitz
2021-07-18 20:59                 ` Brad Boyer
2021-07-19  3:15                   ` Michael Schmitz
2021-07-20 20:32             ` Eric W. Biederman
2021-07-20 22:16               ` Michael Schmitz
2021-07-22 14:49                 ` Eric W. Biederman
2021-07-23  4:23                   ` Michael Schmitz
2021-07-23 22:31                     ` Eric W. Biederman [this message]
2021-07-23 23:52                       ` Michael Schmitz
2021-07-24 12:05                         ` Andreas Schwab
2021-07-25  7:44                           ` Michael Schmitz
2021-07-25 10:12                             ` Brad Boyer
2021-07-26  2:00                               ` Michael Schmitz
2021-07-26 19:36                                 ` [RFC][PATCH] signal/m68k: Use force_sigsegv(SIGSEGV) in fpsp040_die Eric W. Biederman
2021-07-26 20:13                                   ` Andreas Schwab
2021-07-26 20:29                                     ` Eric W. Biederman
2021-07-26 21:25                                       ` Andreas Schwab
2021-07-26 20:29                                   ` Michael Schmitz
2021-07-26 21:08                                     ` [PATCH] " Eric W. Biederman
2021-08-25 15:56                                       ` Eric W. Biederman
2021-08-26 12:15                                       ` Geert Uytterhoeven
2021-07-25 11:53                             ` [PATCH v4 0/3] m68k: Improved switch stack handling Andreas Schwab

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:a8f41615d94 dfblob:4c6c4b07ef3 dfblob:e2a6f355621
dfblob:3ec6ae1bdaf )
 OR (
bs:"Re: [PATCH v4 0/3] m68k: Improved switch stack handling" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87tukkk6h3.fsf@disp2133 \
    --to=ebiederm@xmission.com \
    --cc=geert@linux-m68k.org \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-m68k@lists.linux-m68k.org \
    --cc=schmitzmic@gmail.com \
    --cc=schwab@linux-m68k.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox