Re: [PATCH v4 0/3] m68k: Improved switch stack handling

All of lore.kernel.org
 help / color / mirror / Atom feed

From: ebiederm@xmission.com (Eric W. Biederman)
To: Michael Schmitz <schmitzmic@gmail.com>
Cc: geert@linux-m68k.org, linux-arch@vger.kernel.org,
	linux-m68k@lists.linux-m68k.org, torvalds@linux-foundation.org,
	schwab@linux-m68k.org
Subject: Re: [PATCH v4 0/3] m68k: Improved switch stack handling
Date: Fri, 23 Jul 2021 17:31:20 -0500	[thread overview]
Message-ID: <87tukkk6h3.fsf@disp2133> (raw)
In-Reply-To: <12992a3c-0740-f90e-aa4e-1ec1d8ea38f6@gmail.com> (Michael Schmitz's message of "Fri, 23 Jul 2021 16:23:12 +1200")

Michael Schmitz <schmitzmic@gmail.com> writes:

> Hi Eric,
>
> Am 23.07.2021 um 02:49 schrieb Eric W. Biederman:
>> Michael Schmitz <schmitzmic@gmail.com> writes:
>>
>>> Hi Eric,
>>>
>>> On 21/07/21 8:32 am, Eric W. Biederman wrote:
>>>>
>>>>> diff --git a/arch/m68k/fpsp040/skeleton.S b/arch/m68k/fpsp040/skeleton.S
>>>>> index a8f4161..6c92d38 100644
>>>>> --- a/arch/m68k/fpsp040/skeleton.S
>>>>> +++ b/arch/m68k/fpsp040/skeleton.S
>>>>> @@ -502,7 +502,17 @@ in_ea:
>>>>>   	.section .fixup,#alloc,#execinstr
>>>>>   	.even
>>>>>   1:
>>>>> +
>>>>> +	SAVE_ALL_INT
>>>>> +	SAVE_SWITCH_STACK
>>>>          ^^^^^^^^^^
>>>>
>>>> I don't think this saves the registers in the well known fixed location
>>>> on the stack because some registers are saved at the exception entry
>>>> point.
>>>
>>> The FPU exception entry points are not using the exception entry code in
>>> head.S. These entry points are stored in the exception vector table directly. No
>>> saving of a syscall stack frame happens there. The FPU places its exception
>>> frame on the stack, and that is what the FPU exception handlers use.
>>>
>>> (If these have to call out to the generic exception handlers again, they will
>>> build a minimal stack frame, see code in skeleton.S.)
>>>
>>> Calling fpsp040_die() is no different from calling a syscall that may need to
>>> have access to the full stack frame. The 'fixed location' is just 'on the stack
>>> before calling  fpsp040_die()', again this is no different from calling
>>> e.g. sys_fork() which does not take a pointer to the begin of the stack frame as
>>> an argument.
>>>
>>> I must admit I never looked at how do_exit() figures out where the stack frame
>>> containing the saved registers is stored, I just assumed it unwinds the stack up
>>> to the point where the caller syscall was made, and works from there. The same
>>> strategy ought to work here.
>>
>> For do_exit the part we need to be careful with is PTRACE_EVENT_EXIT,
>> which means it is ptrace that we need to look at.
>>
>> For m68k the code in put_reg and get_reg finds the registers by looking
>> at task->thread.esp0.
>
> Thanks, that's what I was missing here.
>>
>> I was expecting m68k to use the same technique as alpha which expects a
>> fixed offset from task_stack_page(task).
>>
>> So your code will work if you add code to update task->thread.esp0 which
>> is also known as THREAD_ESP0 in entry.S
>
> Shoving
>
> movel   %sp,%curptr@(TASK_THREAD+THREAD_ESP0)
>
> in between the SAVE_ALL_INT and SAVE_SWITCH_STACK ought to do the
> trick there.
>
>>
>>>> Without being saved at the well known fixed location if some process
>>>> stops in PTRACE_EVENT_EXIT in do_exit we likely get some complete
>>>> gibberish.
>>>>
>>>> That is probably safe.
>>>>
>>>>>   	jbra	fpsp040_die
>>>>> +	addql   #8,%sp
>>>>> +	addql   #8,%sp
>>>>> +	addql   #8,%sp
>>>>> +	addql   #8,%sp
>>>>> +	addql   #8,%sp
>>>>> +	addql   #4,%sp
>>>>> +	rts
>>>> Especially as everything after jumping to fpsp040_die does not execute.
>>>
>>> Unless we change fpsp040_die() to call force_sig(SIGSEGV).
>>
>> Yes.  I think we would probably need to have it also call get_signal and
>> all of that, because I don't think the very light call path for that
>> exception includes testing if signals are pending.
>
> As far as I can see, there is a test for pending signals:
>
> ENTRY(ret_from_exception)
> .Lret_from_exception:
>         btst    #5,%sp@(PT_OFF_SR)      | check if returning to kernel
>         bnes    1f                      | if so, skip resched, signals
>         | only allow interrupts when we are really the last one on the
>         | kernel stack, otherwise stack overflow can occur during
>         | heavy interrupt load
>         andw    #ALLOWINT,%sr
>
> resume_userspace:
>         movel   %curptr@(TASK_STACK),%a1
>         moveb   %a1@(TINFO_FLAGS+3),%d0	| bits 0-7 of TINFO_FLAGS
>         jne     exit_work		| any bit set? -> exit_work
> 1:      RESTORE_ALL
>
> exit_work:
>         | save top of frame
>         movel   %sp,%curptr@(TASK_THREAD+THREAD_ESP0)
>         lslb    #1,%d0			| shift out TIF_NEED_RESCHED
>         jne     do_signal_return	| any remaining bit
> (signal/notify_resume)? -> do_signal_return
>         pea     resume_userspace
>         jra     schedule
>
> As long as TIF_NOTIFY_SIGNAL or TIF_SIGPENDING are set,
> do_signal_return will be called.

I was going to say I don't think so, as my tracing of
the code lead in a couple of different directions.  Upon closer
inspection all those paths either lead to fpsp_done or more
directly to ret_from_exception.

For anyone else who might want to trace the code, or for myself later on
when I forget.  As best as I can figure the hardware exception vector
table is setup in: arch/m68k/kernel/vector.c

For the vectors in question it appears to be this chunk of code:

	if (CPU_IS_040 && !FPU_IS_EMU) {
		/* set up FPSP entry points */
		asmlinkage void dz_vec(void) asm ("dz");
		asmlinkage void inex_vec(void) asm ("inex");
		asmlinkage void ovfl_vec(void) asm ("ovfl");
		asmlinkage void unfl_vec(void) asm ("unfl");
		asmlinkage void snan_vec(void) asm ("snan");
		asmlinkage void operr_vec(void) asm ("operr");
		asmlinkage void bsun_vec(void) asm ("bsun");
		asmlinkage void fline_vec(void) asm ("fline");
		asmlinkage void unsupp_vec(void) asm ("unsupp");

		vectors[VEC_FPDIVZ] = dz_vec;
		vectors[VEC_FPIR] = inex_vec;
		vectors[VEC_FPOVER] = ovfl_vec;
		vectors[VEC_FPUNDER] = unfl_vec;
		vectors[VEC_FPNAN] = snan_vec;
		vectors[VEC_FPOE] = operr_vec;
		vectors[VEC_FPBRUC] = bsun_vec;
		vectors[VEC_LINE11] = fline_vec;
		vectors[VEC_FPUNSUP] = unsupp_vec;
	}


Which leads me to call traces that look like this:

hw
  fline
    fpsp_fline
       mem_read
          user_read
             copyin
               in_ea
                  <page-fault>
                     fpsp040_die

If that mem_read returns it can be followed by
       not_mvcr
          real_fline
            ret_from_exception

Or it can be followed by
       fix_con
          uni_2
             gen_except
                 do_clean
                    finish_up
                       fpsp_done
                          ret_from_exception
                          


>> The way the code is structured it is actively incorrect to return from
>> fpsp040_die, as the code does not know what to do if it reads a byte
>> from userspace and there is nothing there.
>
> Correct - my hope is that upon return from the FPU exception (that
> continued after a dodgy read or write), we get the signal delivered
> and will die then.

Yes.  That does look like a good strategy.

I am wondering if there are values we can return that will make the
path out of the exit routine more deterministic.

I have played with that a little bit today, but it doesn't look like I
am going to have time to put together any kind of real patch today.

Simply modifying fpsp040_die to call force_sigsegv(SIGSEGV) should be
enough to trigger a signal (no call stack work needed if we remove
do_exit).  The tricky bit is what value do we want to fake when we
can not read anything from userspace.   For a write fault we should just
be able to skip the write entirely.

In both cases we probably should break out of the loop prematurely.  But
I don't know if that is necessary.


The lazy strategy would be to copy the ifpsp060 code and simply oops
the kernel if the read or write of userspace gets a page fault.

>> So instead of handling -EFAULT like most pieces of kernel code the code
>> just immediately calls do_exit, and does not even attempt to handle
>> the error.
>>
>> That is not my favorite strategy at all, but I suspect it isn't worth
>> it, or safe to update the skeleton.S to handle errors.  Especially as we
>> have not even figured out how to test that code yet.
>
> That's bothering me more than a little, but I need to find out whether
> the emulator even handles FPU exceptions correctly ...


As a fallback plan we can following the lead of ifpsp060/os.S and simply
not catch the kernel triggered page fault, and let
arch/m68k/mm/fault.c:send_fault_sig() return a kernel oops.  It is not
ideal as it allows userspace to trigger a kernel oops, but it does at
least keep the kernel in a consistent state.

diff --git a/arch/m68k/fpsp040/skeleton.S b/arch/m68k/fpsp040/skeleton.S
index a8f41615d94a..4c6c4b07ef38 100644
--- a/arch/m68k/fpsp040/skeleton.S
+++ b/arch/m68k/fpsp040/skeleton.S
@@ -479,7 +479,6 @@ copyout:
 |	movec	%d1,%DFC		| set dfc for user data space
 moreout:
 	moveb	(%a0)+,%d1	| fetch supervisor byte
-out_ea:
 	movesb	%d1,(%a1)+	| write user byte
 	dbf	%d0,moreout
 	rts
@@ -493,21 +492,9 @@ copyin:
 |	SFC is already set
 |	movec	%d1,%SFC		| set sfc for user space
 morein:
-in_ea:
 	movesb	(%a0)+,%d1	| fetch user byte
 	moveb	%d1,(%a1)+	| write supervisor byte
 	dbf	%d0,morein
 	rts
 
-	.section .fixup,#alloc,#execinstr
-	.even
-1:
-	jbra	fpsp040_die
-
-	.section __ex_table,#alloc
-	.align	4
-
-	.long	in_ea,1b
-	.long	out_ea,1b
-
 	|end
diff --git a/arch/m68k/kernel/traps.c b/arch/m68k/kernel/traps.c
index e2a6f3556211..3ec6ae1bdaf9 100644
--- a/arch/m68k/kernel/traps.c
+++ b/arch/m68k/kernel/traps.c
@@ -1144,15 +1144,6 @@ asmlinkage void set_esp0(unsigned long ssp)
 	current->thread.esp0 = ssp;
 }
 
-/*
- * This function is called if an error occur while accessing
- * user-space from the fpsp040 code.
- */
-asmlinkage void fpsp040_die(void)
-{
-	do_exit(SIGSEGV);
-}
-
 #ifdef CONFIG_M68KFPU_EMU
 asmlinkage void fpemu_signal(int signal, int code, void *addr)
 {

Eric

next prev parent reply	other threads:[~2021-07-23 22:31 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-23  0:21 [PATCH v4 0/3] m68k: Improved switch stack handling Michael Schmitz
2021-06-23  0:21 ` [PATCH v4 1/3] m68k: save extra registers on more syscall entry points Michael Schmitz
2021-06-23  0:21 ` [PATCH v4 2/3] m68k: correctly handle IO worker stack frame set-up Michael Schmitz
2021-06-23  0:21 ` [PATCH v4 3/3] m68k: track syscalls being traced with shallow user context stack Michael Schmitz
2021-07-25 10:05   ` Geert Uytterhoeven
2021-07-25 20:48     ` Michael Schmitz
2021-07-25 21:00       ` Linus Torvalds
2021-07-26 14:27         ` Greg Ungerer
2021-07-15 13:29 ` [PATCH v4 0/3] m68k: Improved switch stack handling Eric W. Biederman
2021-07-15 23:10   ` Michael Schmitz
2021-07-17  5:38     ` Michael Schmitz
2021-07-17 18:52       ` Eric W. Biederman
2021-07-17 20:09         ` Michael Schmitz
2021-07-17 23:04           ` Michael Schmitz
2021-07-18 10:47             ` Andreas Schwab
2021-07-18 19:47               ` Michael Schmitz
2021-07-18 20:59                 ` Brad Boyer
2021-07-19  3:15                   ` Michael Schmitz
2021-07-20 20:32             ` Eric W. Biederman
2021-07-20 22:16               ` Michael Schmitz
2021-07-22 14:49                 ` Eric W. Biederman
2021-07-23  4:23                   ` Michael Schmitz
2021-07-23 22:31                     ` Eric W. Biederman [this message]
2021-07-23 23:52                       ` Michael Schmitz
2021-07-24 12:05                         ` Andreas Schwab
2021-07-25  7:44                           ` Michael Schmitz
2021-07-25 10:12                             ` Brad Boyer
2021-07-26  2:00                               ` Michael Schmitz
2021-07-26 19:36                                 ` [RFC][PATCH] signal/m68k: Use force_sigsegv(SIGSEGV) in fpsp040_die Eric W. Biederman
2021-07-26 20:13                                   ` Andreas Schwab
2021-07-26 20:29                                     ` Eric W. Biederman
2021-07-26 21:25                                       ` Andreas Schwab
2021-07-26 20:29                                   ` Michael Schmitz
2021-07-26 21:08                                     ` [PATCH] " Eric W. Biederman
2021-08-25 15:56                                       ` Eric W. Biederman
2021-08-26 12:15                                       ` Geert Uytterhoeven
2021-07-25 11:53                             ` [PATCH v4 0/3] m68k: Improved switch stack handling Andreas Schwab

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:a8f41615d94 dfblob:4c6c4b07ef3 dfblob:e2a6f355621
dfblob:3ec6ae1bdaf )
 OR (
bs:"Re: [PATCH v4 0/3] m68k: Improved switch stack handling" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87tukkk6h3.fsf@disp2133 \
    --to=ebiederm@xmission.com \
    --cc=geert@linux-m68k.org \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-m68k@lists.linux-m68k.org \
    --cc=schmitzmic@gmail.com \
    --cc=schwab@linux-m68k.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.