[PATCH RFC 0/2] 68020/030 signal handling following exceptions

public inbox for linux-m68k@lists.linux-m68k.org
 help / color / mirror / Atom feed

* [PATCH RFC 0/2] 68020/030 signal handling following exceptions
@ 2023-05-02  6:50 Finn Thain
  2023-05-02  6:50 ` [PATCH RFC 1/2] m68k: Don't deliver signals except at instruction boundary Finn Thain
  2023-05-02  6:50 ` [PATCH RFC 2/2] m68k: Make allowance for signal delivery following an address error Finn Thain
  0 siblings, 2 replies; 23+ messages in thread
From: Finn Thain @ 2023-05-02  6:50 UTC (permalink / raw)
  To: Geert Uytterhoeven; +Cc: linux-m68k, Michael Schmitz, Andreas Schwab

These two patches remedy stack corruption on 68020/030 when delivering
signals following a bus error or (theoretically) an address error.


Finn Thain (2):
  m68k: Don't deliver signals except at instruction boundary
  m68k: Make allowance for signal delivery following an address error

 arch/m68k/kernel/entry.S  |  6 +++++-
 arch/m68k/kernel/signal.c | 13 +++++++++----
 arch/m68k/kernel/traps.c  | 10 ++++++++--
 3 files changed, 22 insertions(+), 7 deletions(-)

-- 
2.37.5


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH RFC 1/2] m68k: Don't deliver signals except at instruction boundary
  2023-05-02  6:50 [PATCH RFC 0/2] 68020/030 signal handling following exceptions Finn Thain
@ 2023-05-02  6:50 ` Finn Thain
  2023-05-02 16:51   ` Andreas Schwab
  2023-05-02  6:50 ` [PATCH RFC 2/2] m68k: Make allowance for signal delivery following an address error Finn Thain
  1 sibling, 1 reply; 23+ messages in thread
From: Finn Thain @ 2023-05-02  6:50 UTC (permalink / raw)
  To: Geert Uytterhoeven; +Cc: linux-m68k, Michael Schmitz, Andreas Schwab

From: Andreas Schwab <schwab@linux-m68k.org>

Signal delivery should only happen at insn boundaries, but due to the
way the 030 handles return from bus error exceptions (the insn is resumed,
not restarted like on the 040/060) the kernel may do it in the middle of
the faulting insn.

For example, a page fault can happen during execution of
        moveml %a2-%a3/%a5,%sp@-
This then produces a format 0xB exception frame containing an unreliable
USP value. That value gets used to calculate the location for a signal
frame and the end result is a corrupted the user stack.

This stack corruption was observed in dash (actually in glibc) where it
showed up as an intermittent "stack smashing detected" failure following
SIGCHLD signal delivery.

The failure was hard to reproduce because delivery of the signal races
with the page fault and because the kernel places an unpredictable gap
of up to 7 bytes between the USP and the signal frame, which is often
sufficient to prevent stack corruption.

Reported-and-tested-by: Stan Johnson <userm57@yahoo.com>
Link: https://lore.kernel.org/all/CAMuHMdW3yD22_ApemzW_6me3adq6A458u1_F0v-1EYwK_62jPA@mail.gmail.com/
Cc: Michael Schmitz <schmitzmic@gmail.com>
Cc: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Finn Thain <fthain@linux-m68k.org>
---
This is the same patch that Andreas sent, excepting a minor change to the
commentary and a 'fallthrough' statement.
Andreas, as the author, this will need your signed-off-by.
---
 arch/m68k/kernel/entry.S |  6 +++++-
 arch/m68k/kernel/traps.c | 10 ++++++++--
 2 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/arch/m68k/kernel/entry.S b/arch/m68k/kernel/entry.S
index 4dd2fd7acba9..77b558dad14b 100644
--- a/arch/m68k/kernel/entry.S
+++ b/arch/m68k/kernel/entry.S
@@ -117,7 +117,11 @@ ENTRY(buserr)
 	movel	%sp,%sp@-		| stack frame pointer argument
 	jbsr	buserr_c
 	addql	#4,%sp
-	jra	ret_from_exception
+	| deliver no signals if the fault occurred with an insn in progress
+	| (on the 020/030)
+	tstl	%d0
+	jeq	ret_from_exception
+	RESTORE_ALL
 
 ENTRY(trap)
 	SAVE_ALL_INT
diff --git a/arch/m68k/kernel/traps.c b/arch/m68k/kernel/traps.c
index a700807c9b6d..f535054d1e2a 100644
--- a/arch/m68k/kernel/traps.c
+++ b/arch/m68k/kernel/traps.c
@@ -751,8 +751,10 @@ static inline void access_errorcf(unsigned int fs, struct frame *fp)
 }
 #endif /* CONFIG_COLDFIRE CONFIG_MMU */
 
-asmlinkage void buserr_c(struct frame *fp)
+asmlinkage int buserr_c(struct frame *fp)
 {
+	int not_insn_boundary = 0;
+
 	/* Only set esp0 if coming from user mode */
 	if (user_mode(&fp->ptregs))
 		current->thread.esp0 = (unsigned long) fp;
@@ -793,8 +795,10 @@ asmlinkage void buserr_c(struct frame *fp)
 	  break;
 #endif
 #if defined (CPU_M68020_OR_M68030)
-	case 0xa:
 	case 0xb:
+	  not_insn_boundary = 1;
+	  fallthrough;
+	case 0xa:
 	  bus_error030 (fp);
 	  break;
 #endif
@@ -803,6 +807,8 @@ asmlinkage void buserr_c(struct frame *fp)
 	  pr_debug("Unknown SIGSEGV - 4\n");
 	  force_sig(SIGSEGV);
 	}
+
+	return not_insn_boundary;
 }
 
 
-- 
2.37.5


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH RFC 2/2] m68k: Make allowance for signal delivery following an address error
  2023-05-02  6:50 [PATCH RFC 0/2] 68020/030 signal handling following exceptions Finn Thain
  2023-05-02  6:50 ` [PATCH RFC 1/2] m68k: Don't deliver signals except at instruction boundary Finn Thain
@ 2023-05-02  6:50 ` Finn Thain
  2023-05-02 21:06   ` Michael Schmitz
  1 sibling, 1 reply; 23+ messages in thread
From: Finn Thain @ 2023-05-02  6:50 UTC (permalink / raw)
  To: Geert Uytterhoeven; +Cc: linux-m68k, Michael Schmitz, Andreas Schwab

The 68030 Users Manual says that address errors occur immediately they
are detected during instruction prefetch. The instruction pipeline allows
prefetch to overlap with other instructions which means an address error
can be taken during execution of a different instruction.

Both a bus error and an address error may produce a format 0xB exception
frame. Regarding those frames, the UM says the PC contained therein has
the "address of instruction in execution when fault occurred -- may not be
the instruction which generated the faulted bus cycle".

In the bus error case, the USP in that frame is not reliable (like the
PC). We should not rely on it in the address error case. The address error
case always produces a SIGBUS. That signal may be caught and the
exception fixed up. If a debugger or emulator were to do so, this patch
would theoretically prevent user stack corruption.

Cc: Michael Schmitz <schmitzmic@gmail.com>
Cc: Andreas Schwab <schwab@linux-m68k.org>
Link: https://lore.kernel.org/linux-m68k/20230429080410.8993-1-schmitzmic@gmail.com/
Co-developed-by: Michael Schmitz <schmitzmic@gmail.com>
Signed-off-by: Finn Thain <fthain@linux-m68k.org>
---
Michael, as co-developer, this will need your signed-off-by.
---
 arch/m68k/kernel/signal.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/arch/m68k/kernel/signal.c b/arch/m68k/kernel/signal.c
index b9f6908a31bc..8aeafbb083f7 100644
--- a/arch/m68k/kernel/signal.c
+++ b/arch/m68k/kernel/signal.c
@@ -858,11 +858,16 @@ static inline int rt_setup_ucontext(struct ucontext __user *uc, struct pt_regs *
 }
 
 static inline void __user *
-get_sigframe(struct ksignal *ksig, size_t frame_size)
+get_sigframe(struct ksignal *ksig, struct pt_regs *tregs, size_t frame_size)
 {
 	unsigned long usp = sigsp(rdusp(), ksig);
+	unsigned long gap = 0;
 
-	return (void __user *)((usp - frame_size) & -8UL);
+	if (CPU_IS_020_OR_030 && tregs->format == 0xb)
+		/* USP is unreliable so use worst-case value */
+		gap = 256;
+
+	return (void __user *)((usp - gap - frame_size) & -8UL);
 }
 
 static int setup_frame(struct ksignal *ksig, sigset_t *set,
@@ -880,7 +885,7 @@ static int setup_frame(struct ksignal *ksig, sigset_t *set,
 		return -EFAULT;
 	}
 
-	frame = get_sigframe(ksig, sizeof(*frame) + fsize);
+	frame = get_sigframe(ksig, tregs, sizeof(*frame) + fsize);
 
 	if (fsize)
 		err |= copy_to_user (frame + 1, regs + 1, fsize);
@@ -952,7 +957,7 @@ static int setup_rt_frame(struct ksignal *ksig, sigset_t *set,
 		return -EFAULT;
 	}
 
-	frame = get_sigframe(ksig, sizeof(*frame));
+	frame = get_sigframe(ksig, tregs, sizeof(*frame));
 
 	if (fsize)
 		err |= copy_to_user (&frame->uc.uc_extra, regs + 1, fsize);
-- 
2.37.5


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH RFC 1/2] m68k: Don't deliver signals except at instruction boundary
  2023-05-02  6:50 ` [PATCH RFC 1/2] m68k: Don't deliver signals except at instruction boundary Finn Thain
@ 2023-05-02 16:51   ` Andreas Schwab
  2023-05-02 20:15     ` Michael Schmitz
  2023-05-03  3:22     ` Finn Thain
  0 siblings, 2 replies; 23+ messages in thread
From: Andreas Schwab @ 2023-05-02 16:51 UTC (permalink / raw)
  To: Finn Thain; +Cc: Geert Uytterhoeven, linux-m68k, Michael Schmitz

On Mai 02 2023, Finn Thain wrote:

> From: Andreas Schwab <schwab@linux-m68k.org>
>
> Signal delivery should only happen at insn boundaries, but due to the
> way the 030 handles return from bus error exceptions (the insn is resumed,
> not restarted like on the 040/060) the kernel may do it in the middle of
> the faulting insn.

I don't think this will work properly when the address cannot be
resolved.  The bus error exception will just be raised again
immediately.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH RFC 1/2] m68k: Don't deliver signals except at instruction boundary
  2023-05-02 16:51   ` Andreas Schwab
@ 2023-05-02 20:15     ` Michael Schmitz
  2023-05-03  3:22     ` Finn Thain
  1 sibling, 0 replies; 23+ messages in thread
From: Michael Schmitz @ 2023-05-02 20:15 UTC (permalink / raw)
  To: Andreas Schwab, Finn Thain; +Cc: Geert Uytterhoeven, linux-m68k

Hi Andreas, Finn,

On 3/05/23 04:51, Andreas Schwab wrote:
> On Mai 02 2023, Finn Thain wrote:
>
>> From: Andreas Schwab <schwab@linux-m68k.org>
>>
>> Signal delivery should only happen at insn boundaries, but due to the
>> way the 030 handles return from bus error exceptions (the insn is resumed,
>> not restarted like on the 040/060) the kernel may do it in the middle of
>> the faulting insn.
> I don't think this will work properly when the address cannot be
> resolved.  The bus error exception will just be raised again
> immediately.

Seems certain, even though we haven't observed that in testing yet 
(haven't designed the test case for that specifically, mind you).

The (slighty hairy) sole solution to this (that I can see) would be to 
only skip signal delivery on the condition that bus_error030 was able to 
correct the bus fault.

In any other case bus_error030 will raise SIGKILL or SIGSEGV which must 
be delivered right away.

Can we infer that from the state of the fault frame after bus_error030, 
or do we have to modify bus_error030 to return success/error codes?

Cheers,

     Michael

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH RFC 2/2] m68k: Make allowance for signal delivery following an address error
  2023-05-02  6:50 ` [PATCH RFC 2/2] m68k: Make allowance for signal delivery following an address error Finn Thain
@ 2023-05-02 21:06   ` Michael Schmitz
  2023-05-03  4:11     ` Finn Thain
  0 siblings, 1 reply; 23+ messages in thread
From: Michael Schmitz @ 2023-05-02 21:06 UTC (permalink / raw)
  To: Finn Thain, Geert Uytterhoeven; +Cc: linux-m68k, Andreas Schwab

Hi Finn,

On 2/05/23 18:50, Finn Thain wrote:
> The 68030 Users Manual says that address errors occur immediately they
> are detected during instruction prefetch. The instruction pipeline allows
> prefetch to overlap with other instructions which means an address error
> can be taken during execution of a different instruction.
>
> Both a bus error and an address error may produce a format 0xB exception
> frame. Regarding those frames, the UM says the PC contained therein has
> the "address of instruction in execution when fault occurred -- may not be
> the instruction which generated the faulted bus cycle".

The UM  also states that:

"An address error exception occurs when the processor attempts to 
prefetch an instruction
from an odd address. This exception is similar to a bus error exception, 
but is internally
initiated. A bus cycle is not executed, and the processor begins 
exception processing
immediately." (8.1.3)

> In the bus error case, the USP in that frame is not reliable (like the
> PC). We should not rely on it in the address error case. The address error

The instruction causing the fault can't have modified USP yet. So the 
case that you are concerned about here is where the instruction in 
execution is e.g. a moveml using USP as one of the data addresses 
(making USP unreliable) and the instruction faulting is one of the next 
ones being decoded. Address properly aligned for the first instruction, 
then no longer aligned for the next ones?

(You mentioned cpBcc or cpDBcc instructions in another mail - where did 
you find that? It's not mentioned in my copy of the 030 UM...)

I'm not trying to say this all isn't necessary for address errors. I'm 
just trying to understand why it is.

> case always produces a SIGBUS. That signal may be caught and the
> exception fixed up. If a debugger or emulator were to do so, this patch
> would theoretically prevent user stack corruption.
>
> Cc: Michael Schmitz <schmitzmic@gmail.com>
> Cc: Andreas Schwab <schwab@linux-m68k.org>
> Link: https://lore.kernel.org/linux-m68k/20230429080410.8993-1-schmitzmic@gmail.com/
> Co-developed-by: Michael Schmitz <schmitzmic@gmail.com>
> Signed-off-by: Finn Thain <fthain@linux-m68k.org>
> ---
> Michael, as co-developer, this will need your signed-off-by.

I believe this patch is currently our best option to prevent user stack 
corruption by signal frames in the bus error case (after Andreas pointed 
out a problem with your patch 1).

I've also been unable to reproduce the out-of-memory condition with a 
stack gap size of 256 in my recent tests, so I can no longer see a 
drawback compared to my (immensely more complex) similar patch.

So even though I'm still a bit thick on the address error case:

Signed-off-by: Michael Schmitz <schmitzmic@gmail.com>

> ---
>   arch/m68k/kernel/signal.c | 13 +++++++++----
>   1 file changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/arch/m68k/kernel/signal.c b/arch/m68k/kernel/signal.c
> index b9f6908a31bc..8aeafbb083f7 100644
> --- a/arch/m68k/kernel/signal.c
> +++ b/arch/m68k/kernel/signal.c
> @@ -858,11 +858,16 @@ static inline int rt_setup_ucontext(struct ucontext __user *uc, struct pt_regs *
>   }
>   
>   static inline void __user *
> -get_sigframe(struct ksignal *ksig, size_t frame_size)
> +get_sigframe(struct ksignal *ksig, struct pt_regs *tregs, size_t frame_size)
>   {
>   	unsigned long usp = sigsp(rdusp(), ksig);
> +	unsigned long gap = 0;
>   
> -	return (void __user *)((usp - frame_size) & -8UL);
> +	if (CPU_IS_020_OR_030 && tregs->format == 0xb)
> +		/* USP is unreliable so use worst-case value */
> +		gap = 256;
> +
> +	return (void __user *)((usp - gap - frame_size) & -8UL);
>   }
>   
>   static int setup_frame(struct ksignal *ksig, sigset_t *set,
> @@ -880,7 +885,7 @@ static int setup_frame(struct ksignal *ksig, sigset_t *set,
>   		return -EFAULT;
>   	}
>   
> -	frame = get_sigframe(ksig, sizeof(*frame) + fsize);
> +	frame = get_sigframe(ksig, tregs, sizeof(*frame) + fsize);
>   
>   	if (fsize)
>   		err |= copy_to_user (frame + 1, regs + 1, fsize);
> @@ -952,7 +957,7 @@ static int setup_rt_frame(struct ksignal *ksig, sigset_t *set,
>   		return -EFAULT;
>   	}
>   
> -	frame = get_sigframe(ksig, sizeof(*frame));
> +	frame = get_sigframe(ksig, tregs, sizeof(*frame));
>   
>   	if (fsize)
>   		err |= copy_to_user (&frame->uc.uc_extra, regs + 1, fsize);

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH RFC 1/2] m68k: Don't deliver signals except at instruction boundary
  2023-05-02 16:51   ` Andreas Schwab
  2023-05-02 20:15     ` Michael Schmitz
@ 2023-05-03  3:22     ` Finn Thain
  2023-05-03  3:24       ` Finn Thain
  1 sibling, 1 reply; 23+ messages in thread
From: Finn Thain @ 2023-05-03  3:22 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: Geert Uytterhoeven, linux-m68k, Michael Schmitz

On Tue, 2 May 2023, Andreas Schwab wrote:

> On Mai 02 2023, Finn Thain wrote:
> 
> > From: Andreas Schwab <schwab@linux-m68k.org>
> >
> > Signal delivery should only happen at insn boundaries, but due to the 
> > way the 030 handles return from bus error exceptions (the insn is 
> > resumed, not restarted like on the 040/060) the kernel may do it in 
> > the middle of the faulting insn.
> 
> I don't think this will work properly when the address cannot be 
> resolved.  The bus error exception will just be raised again 
> immediately.
> 

Sorry, I don't follow. Did you mean to say this patch doesn't work 
properly when the bus error cannot be resolved? Is that because the user 
space signal handler is supposed to fix things up so that the bus error 
can then be resolved??

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH RFC 1/2] m68k: Don't deliver signals except at instruction boundary
  2023-05-03  3:22     ` Finn Thain
@ 2023-05-03  3:24       ` Finn Thain
  0 siblings, 0 replies; 23+ messages in thread
From: Finn Thain @ 2023-05-03  3:24 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: Geert Uytterhoeven, linux-m68k, Michael Schmitz


On Wed, 3 May 2023, Finn Thain wrote:

> On Tue, 2 May 2023, Andreas Schwab wrote:
> 
> > On Mai 02 2023, Finn Thain wrote:
> > 
> > > From: Andreas Schwab <schwab@linux-m68k.org>
> > >
> > > Signal delivery should only happen at insn boundaries, but due to the 
> > > way the 030 handles return from bus error exceptions (the insn is 
> > > resumed, not restarted like on the 040/060) the kernel may do it in 
> > > the middle of the faulting insn.
> > 
> > I don't think this will work properly when the address cannot be 
> > resolved.  The bus error exception will just be raised again 
> > immediately.
> > 
> 
> Sorry, I don't follow. Did you mean to say this patch doesn't work 
> properly when the bus error cannot be resolved? Is that because the user 
> space signal handler is supposed to fix things up so that the bus error 
> can then be resolved??
> 

Nevermind. I just read Michael's explanation.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH RFC 2/2] m68k: Make allowance for signal delivery following an address error
  2023-05-02 21:06   ` Michael Schmitz
@ 2023-05-03  4:11     ` Finn Thain
  2023-05-03  5:50       ` Michael Schmitz
  0 siblings, 1 reply; 23+ messages in thread
From: Finn Thain @ 2023-05-03  4:11 UTC (permalink / raw)
  To: Michael Schmitz; +Cc: Geert Uytterhoeven, linux-m68k, Andreas Schwab

On Wed, 3 May 2023, Michael Schmitz wrote:

> So the case that you are concerned about here is where the instruction 
> in execution is e.g. a moveml using USP as one of the data addresses 
> (making USP unreliable) and the instruction faulting is one of the next 
> ones being decoded. [...]

Right. The instruction in execution may be an incomplete stack operation 
like MOVEM. If the signal handler fixes up the address error, that stack 
operation might be resumed after sys_sigreturn(), like the bus error case.
The kernel allows the exception frame to fixed up by the user program.

> (You mentioned cpBcc or cpDBcc instructions in another mail - where did 
> you find that? It's not mentioned in my copy of the 030 UM...)
> 

It's in the 3rd edition (1990), section 10.5.2.8.
https://www.nxp.com/files-static/32bit/doc/ref_manual/MC68030UM-P1.pdf
https://www.nxp.com/files-static/32bit/doc/ref_manual/MC68030UM-P2.pdf

> I'm not trying to say this all isn't necessary for address errors. I'm 
> just trying to understand why it is.
> 

I haven't yet tried to write code to demonstrate the theoretical address 
error issue but I can attempt that if need be. However, such code would be 
moot if this patch is going to be required anyway, just to fix the bus 
error case...

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH RFC 2/2] m68k: Make allowance for signal delivery following an address error
  2023-05-03  4:11     ` Finn Thain
@ 2023-05-03  5:50       ` Michael Schmitz
  2023-05-03  8:02         ` Finn Thain
  0 siblings, 1 reply; 23+ messages in thread
From: Michael Schmitz @ 2023-05-03  5:50 UTC (permalink / raw)
  To: Finn Thain; +Cc: Geert Uytterhoeven, linux-m68k, Andreas Schwab

Hi Finn,

Am 03.05.2023 um 16:11 schrieb Finn Thain:
> On Wed, 3 May 2023, Michael Schmitz wrote:
>
>> So the case that you are concerned about here is where the instruction
>> in execution is e.g. a moveml using USP as one of the data addresses
>> (making USP unreliable) and the instruction faulting is one of the next
>> ones being decoded. [...]
>
> Right. The instruction in execution may be an incomplete stack operation
> like MOVEM. If the signal handler fixes up the address error, that stack
> operation might be resumed after sys_sigreturn(), like the bus error case.
> The kernel allows the exception frame to fixed up by the user program.

Right - we don't have anything at present that would use signal handlers 
in this way (that we'd know of) but it's a legitimate use.

>> (You mentioned cpBcc or cpDBcc instructions in another mail - where did
>> you find that? It's not mentioned in my copy of the 030 UM...)
>>
>
> It's in the 3rd edition (1990), section 10.5.2.8.
> https://www.nxp.com/files-static/32bit/doc/ref_manual/MC68030UM-P1.pdf
> https://www.nxp.com/files-static/32bit/doc/ref_manual/MC68030UM-P2.pdf

Thanks - I have those but hadn't looked at that section yet.

Linux requires a FPU (real or emulated) and FPU conditional instructions 
might hit this bug (even though any code compiled by our binutils should 
avoid it). Better fix this now that we're poking around in this area of 
the kernel.

>> I'm not trying to say this all isn't necessary for address errors. I'm
>> just trying to understand why it is.
>>
>
> I haven't yet tried to write code to demonstrate the theoretical address
> error issue but I can attempt that if need be. However, such code would be
> moot if this patch is going to be required anyway, just to fix the bus
> error case...

No, seeing the coprocessor conditional branch case we want this patch 
even if we decide to handle data faults differently.

That is, unless Andreas can come up with a reason why calculated branch 
target adresses cannot be used with these coprocessor branch instructions?

Cheers,

	Michael


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH RFC 2/2] m68k: Make allowance for signal delivery following an address error
  2023-05-03  5:50       ` Michael Schmitz
@ 2023-05-03  8:02         ` Finn Thain
  2023-05-04 21:06           ` Michael Schmitz
  0 siblings, 1 reply; 23+ messages in thread
From: Finn Thain @ 2023-05-03  8:02 UTC (permalink / raw)
  To: Michael Schmitz; +Cc: Geert Uytterhoeven, linux-m68k, Andreas Schwab

On Wed, 3 May 2023, Michael Schmitz wrote:

> >
> > I haven't yet tried to write code to demonstrate the theoretical 
> > address error issue but I can attempt that if need be. However, such 
> > code would be moot if this patch is going to be required anyway, just 
> > to fix the bus error case...
> 
> No, seeing the coprocessor conditional branch case we want this patch 
> even if we decide to handle data faults differently.
> 
> That is, unless Andreas can come up with a reason why calculated branch 
> target adresses cannot be used with these coprocessor branch 
> instructions?

Moreover, can Andreas or Geert come up with a better way to fix the actual 
bus error bug (nevermind the theoretical address error bug) than this same 
patch (which just happens to work for both)?

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH RFC 2/2] m68k: Make allowance for signal delivery following an address error
  2023-05-03  8:02         ` Finn Thain
@ 2023-05-04 21:06           ` Michael Schmitz
  2023-05-05  3:14             ` Finn Thain
  0 siblings, 1 reply; 23+ messages in thread
From: Michael Schmitz @ 2023-05-04 21:06 UTC (permalink / raw)
  To: Finn Thain; +Cc: Geert Uytterhoeven, linux-m68k, Andreas Schwab

Hi Finn,

On 3/05/23 20:02, Finn Thain wrote:

> On Wed, 3 May 2023, Michael Schmitz wrote:
>
>>> I haven't yet tried to write code to demonstrate the theoretical
>>> address error issue but I can attempt that if need be. However, such
>>> code would be moot if this patch is going to be required anyway, just
>>> to fix the bus error case...
>> No, seeing the coprocessor conditional branch case we want this patch
>> even if we decide to handle data faults differently.
>>
>> That is, unless Andreas can come up with a reason why calculated branch
>> target adresses cannot be used with these coprocessor branch
>> instructions?
> Moreover, can Andreas or Geert come up with a better way to fix the actual
> bus error bug (nevermind the theoretical address error bug) than this same
> patch (which just happens to work for both)?

In terms of fixing the bus error bug, I think we'll need to fix this 
regression first (only first patch of yours applied):

> [16472.250000] Unable to handle kernel access at virtual address 409fa84e
> [16472.300000] Oops: 00000000
> [16472.340000] Modules linked in: ne 8390p
> [16472.400000] PC: [<00003b1a>] setup_frame+0x6e/0x1f8
> [16472.440000] SR: 2204  SP: dadced50  a2: 006020d0
> [16472.490000] d0: 00000000    d1: 00000000    d2: 00000000 d3: 00000000
> [16472.520000] d4: 00c85f6c    d5: 00003918    a0: c043dec8 a1: 006020d0
> [16472.550000] Process stress-ng (pid: 3239, task=362f381a)
> [16472.590000] Frame format=A ssw=0709 isc=0032 isb=0280 
> daddr=c043decc dobuf=0000000e
> [16472.640000] Stack from 00c85da8:
> [16472.640000]         00c85f6c 00000000 00000ca3 00000000 80088568 
> 8008c4d4 00c85fcc 0000381c
> [16472.640000]         effff6fc 00c85fa4 00c85dcc 00c85e84 000c8536 
> 00c85e90 00000001 00000001
> [16472.640000]         000c84fe 00c85e84 00c85e6e 00c85e84 000bae1e 
> 00c85e90 00000001 00000001
> [16472.640000]         00c85f00 005ba83c 00c85e6e 00c85e6e 00680484 
> 005ba7f8 00536b70 00982700
> [16472.640000]         0001ecb4 0080ac20 00c85f80 0001f2b0 00c85ea8 
> 0000000e 00536b70 000ce0bc
> [16472.640000]         0080ac20 00536b70 0001ecb4 0000000e 00981a34 
> 006023ce 00536b70 0001ecb4
> [16472.900000] Call Trace: [<0000381c>] test_ti_thread_flag+0x0/0x24
> [16472.940000]  [<000c8536>] free_pages_and_swap_cache+0x38/0x40
> [16472.970000]  [<000c84fe>] free_pages_and_swap_cache+0x0/0x40
> [16473.010000]  [<000bae1e>] tlb_flush_mmu+0x80/0x96
> [16473.060000]  [<0001ecb4>] __sigqueue_free+0x34/0x3a
> [16473.100000]  [<0001f2b0>] next_signal+0x0/0x54
> [16473.150000]  [<000ce0bc>] kmem_cache_free+0x4a/0x56
> [16473.190000]  [<0001ecb4>] __sigqueue_free+0x34/0x3a
> [16473.250000]  [<0001ecb4>] __sigqueue_free+0x34/0x3a
> [16473.300000]  [<0001f0e4>] recalc_sigpending+0x6/0x1e
> [16473.350000]  [<0001f388>] dequeue_signal+0x84/0x130
> [16473.390000]  [<00021402>] do_signal_stop+0x0/0x154
> [16473.430000]  [<002db200>] mt_destroy_walk+0x14e/0x160
> [16473.480000]  [<00021a06>] get_signal+0x3d8/0x4f6
> [16473.530000]  [<00021b06>] get_signal+0x4d8/0x4f6
> [16473.580000]  [<0000381c>] test_ti_thread_flag+0x0/0x24
> [16473.620000]  [<00004536>] do_notify_resume+0x3b2/0x480
> [16473.670000]  [<00002000>] _start+0x0/0x8
> [16473.720000]  [<00002bfa>] do_IRQ+0x26/0x32
> [16473.750000]  [<00002af4>] auto_irqhandler_fixup+0x4/0xc
> [16473.800000]  [<00002204>] do_one_initcall+0xa8/0x188
> [16473.850000]  [<002db0b2>] mt_destroy_walk+0x0/0x160
> [16473.900000]  [<00002aa0>] do_signal_return+0x10/0x1a
> [16473.940000]  [<00002a26>] syscall+0x8/0xc
> [16473.980000]  [<00002000>] _start+0x0/0x8
> [16474.010000]
> [16474.070000] Code: 6002 4280 4281 2401 0eab 6800 0004 8480 <302c> 
> 0032 0280 0000 0fff 2601 0eab 0800 0008 8483 761c d68b 0eab 3800 000c 8481
> [16474.210000] Disabling lock debugging due to kernel taint

objdump -d of setup_frame:

> 00003aac <setup_frame>:
>     3aac:       4fef fee4       lea %sp@(-284),%sp
>     3ab0:       48e7 3f1e       moveml %d2-%d7/%a3-%fp,%sp@-
>     3ab4:       282f 0148       movel %sp@(328),%d4
>     3ab8:       2c6f 0150       moveal %sp@(336),%fp
>     3abc:       284e            moveal %fp,%a4
>     3abe:       d9ee 0028       addal %fp@(40),%a4
>     3ac2:       e9ec 0004 0032  bfextu %a4@(50),0,4,%d0
>     3ac8:       2a70 0db0 002f  moveal @(00000000002fa214,%d0:l:4),%a5
>     3ace:       a214
>     3ad0:       2044            moveal %d4,%a0
>     3ad2:       2c28 0034       movel %a0@(52),%d6
>     3ad6:       4a8d            tstl %a5
>     3ad8:       6c06            bges 3ae0 <setup_frame+0x34>
>     3ada:       70f2            moveq #-14,%d0
>     3adc:       6000 01bc       braw 3c9a <dbl_thresh+0x99>
>     3ae0:       486d 0138       pea %a5@(312)
>     3ae4:       2f04            movel %d4,%sp@-
>     3ae6:       4eba fe7c       jsr %pc@(3964 <get_sigframe>)
>     3aea:       2648            moveal %a0,%a3
>     3aec:       508f            addql #8,%sp
>     3aee:       2a3c 0000 3918  movel #14616,%d5         <= 
> &raw_copy_to_user
>     3af4:       4a8d            tstl %a5
>     3af6:       6714            beqs 3b0c <setup_frame+0x60>
>     3af8:       2f0d            movel %a5,%sp@-
>     3afa:       486e 0034       pea %fp@(52)
>     3afe:       4868 0138       pea %a0@(312)
>     3b02:       2245            moveal %d5,%a1
>     3b04:       4e91            jsr %a1@         <= raw_copy_to_user()
>     3b06:       4fef 000c       lea %sp@(12),%sp
>     3b0a:       6002            bras 3b0e <setup_frame+0x62>
>     3b0c:       4280            clrl %d0
>     3b0e:       4281            clrl %d1
>     3b10:       2401            movel %d1,%d2
>     3b12:       0eab 6800 0004  movesl %d6,%a3@(4)
>     3b18:       8480            orl %d0,%d2
>     3b1a:       302c 0032       movew %a4@(50),%d0         <= fault PC
>     3b1e:       0280 0000 0fff  andil #4095,%d0
>     3b24:       2601            movel %d1,%d3
>     3b26:       0eab 0800 0008  movesl %d0,%a3@(8)
>     3b2c:       8483            orl %d3,%d2
>     3b2e:       761c            moveq #28,%d3
>     3b30:       d68b            addl %a3,%d3
>     3b32:       0eab 3800 000c  movesl %d3,%a3@(12)
>     3b38:       8481            orl %d1,%d2
>     3b3a:       4878 0004       pea 4 <CC3_CLRE_I>
>     3b3e:       206f 0150       moveal %sp@(336),%a0
>     3b42:       4868 0004       pea %a0@(4)
Which is this line, as far as I can make out:

> static int setup_frame(struct ksignal *ksig, sigset_t *set,
>                         struct pt_regs *regs)
> {
>         struct sigframe __user *frame;
>         struct pt_regs *tregs = rte_regs(regs);
>         int fsize = frame_extra_sizes(tregs->format);
>         struct sigcontext context;
>         int err = 0, sig = ksig->sig;
>
>         if (fsize < 0) {
>                 pr_debug("setup_frame: Unknown frame format %#x\n",
>                          tregs->format);
>                 return -EFAULT;
>         }
>
>         frame = get_sigframe(ksig, sizeof(*frame) + fsize);
>
>         if (fsize)
>                 err |= copy_to_user (frame + 1, regs + 1, fsize);
>
>         err |= __put_user(sig, &frame->sig);
>
>         err |= __put_user(tregs->vector, &frame->code);            <====
>         err |= __put_user(&frame->sc, &frame->psc);
>
Happened during this stressor:

> running --sigsegv 2 -t 300 --timestamp --no-rand-seed --times
> stress-ng: 09:18:25.65 info:  [3310] setting to a 300 second (5 mins, 
> 0.00 secs) run per stressor
> stress-ng: 09:18:25.80 info:  [3310] dispatching hogs: 2 sigsegv
> Timeout, server hobbes not responding.
I'll try with your second patch applied as well. I hadn't seen any 
regressions with a patch adding 256 bytes of gap indiscriminately 
though, so I'm sure that second patch itself is OK.

Cheers,

     Michael



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH RFC 2/2] m68k: Make allowance for signal delivery following an address error
  2023-05-04 21:06           ` Michael Schmitz
@ 2023-05-05  3:14             ` Finn Thain
  2023-05-05  5:05               ` Michael Schmitz
  0 siblings, 1 reply; 23+ messages in thread
From: Finn Thain @ 2023-05-05  3:14 UTC (permalink / raw)
  To: Michael Schmitz; +Cc: Geert Uytterhoeven, linux-m68k, Andreas Schwab

On Fri, 5 May 2023, Michael Schmitz wrote:

> 
> In terms of fixing the bus error bug, I think we'll need to fix this 
> regression first (only first patch of yours applied):
> 

That patch came from Andreas, maybe he can comment.

> I'll try with your second patch applied as well. 

I think patch 2/2 should be tested alone since 1/2 was nak'd.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH RFC 2/2] m68k: Make allowance for signal delivery following an address error
  2023-05-05  3:14             ` Finn Thain
@ 2023-05-05  5:05               ` Michael Schmitz
  2023-05-05 21:35                 ` Michael Schmitz
  0 siblings, 1 reply; 23+ messages in thread
From: Michael Schmitz @ 2023-05-05  5:05 UTC (permalink / raw)
  To: Finn Thain; +Cc: Geert Uytterhoeven, linux-m68k, Andreas Schwab

Hi Finn,

Am 05.05.2023 um 15:14 schrieb Finn Thain:
> On Fri, 5 May 2023, Michael Schmitz wrote:
>
>>
>> In terms of fixing the bus error bug, I think we'll need to fix this
>> regression first (only first patch of yours applied):
>>
>
> That patch came from Andreas, maybe he can comment.
>
>> I'll try with your second patch applied as well.
>
> I think patch 2/2 should be tested alone since 1/2 was nak'd.

I'm trying to find out whether patch 2 mitigates the effect of patch 1. 
But I'll test patch 2 alone as well. No worries.

Cheers,

	Michael



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH RFC 2/2] m68k: Make allowance for signal delivery following an address error
  2023-05-05  5:05               ` Michael Schmitz
@ 2023-05-05 21:35                 ` Michael Schmitz
  2023-05-06  0:36                   ` Finn Thain
  2023-05-06  1:11                   ` Finn Thain
  0 siblings, 2 replies; 23+ messages in thread
From: Michael Schmitz @ 2023-05-05 21:35 UTC (permalink / raw)
  To: Finn Thain; +Cc: Geert Uytterhoeven, linux-m68k, Andreas Schwab

Hi Finn,

Am 05.05.2023 um 17:05 schrieb Michael Schmitz:
> Hi Finn,
>
> Am 05.05.2023 um 15:14 schrieb Finn Thain:
>> On Fri, 5 May 2023, Michael Schmitz wrote:
>>
>>>
>>> In terms of fixing the bus error bug, I think we'll need to fix this
>>> regression first (only first patch of yours applied):
>>>
>>
>> That patch came from Andreas, maybe he can comment.
>>
>>> I'll try with your second patch applied as well.
>>
>> I think patch 2/2 should be tested alone since 1/2 was nak'd.
>
> I'm trying to find out whether patch 2 mitigates the effect of patch 1.

It didn't - hung without printing an Oops message, on the first stress 
iteration using the --stack 2 --stack-fill stressor.

Cheers,

	Michael

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH RFC 2/2] m68k: Make allowance for signal delivery following an address error
  2023-05-05 21:35                 ` Michael Schmitz
@ 2023-05-06  0:36                   ` Finn Thain
  2023-05-06  1:46                     ` Michael Schmitz
  2023-05-06  1:11                   ` Finn Thain
  1 sibling, 1 reply; 23+ messages in thread
From: Finn Thain @ 2023-05-06  0:36 UTC (permalink / raw)
  To: Michael Schmitz; +Cc: Geert Uytterhoeven, linux-m68k, Andreas Schwab

On Sat, 6 May 2023, Michael Schmitz wrote:

> Am 05.05.2023 um 17:05 schrieb Michael Schmitz:
> > Am 05.05.2023 um 15:14 schrieb Finn Thain:
> >> On Fri, 5 May 2023, Michael Schmitz wrote:
> >>
> >>> In terms of fixing the bus error bug, I think we'll need to fix this 
> >>> regression first (only first patch of yours applied):
> >>>
> >>
> >> That patch came from Andreas, maybe he can comment.
> >>
> >>> I'll try with your second patch applied as well.
> >>
> >> I think patch 2/2 should be tested alone since 1/2 was nak'd.
> >
> > I'm trying to find out whether patch 2 mitigates the effect of patch 
> > 1.
> 
> It didn't - hung without printing an Oops message, on the first stress 
> iteration using the --stack 2 --stack-fill stressor.
> 

In my mind, at the time I sent them, patch 1 was expected to mitigate the 
impact of patch 2. It would have been surprising if the reverse had 
somehow happened.

Anyway, I'm still in the dark as to why patch 1 is interesting. Was that 
because you wanted to delay signal delivery to try to put more pressure on 
stack memory by delivering multiple signals? You and I did come up with an 
alternative patch for bypassing signal delivery after certain exceptions. 
Maybe that would work better.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH RFC 2/2] m68k: Make allowance for signal delivery following an address error
  2023-05-05 21:35                 ` Michael Schmitz
  2023-05-06  0:36                   ` Finn Thain
@ 2023-05-06  1:11                   ` Finn Thain
  2023-05-06  1:54                     ` Michael Schmitz
  1 sibling, 1 reply; 23+ messages in thread
From: Finn Thain @ 2023-05-06  1:11 UTC (permalink / raw)
  To: Michael Schmitz; +Cc: Geert Uytterhoeven, linux-m68k, Andreas Schwab

On Sat, 6 May 2023, Michael Schmitz wrote:

> > I'm trying to find out whether patch 2 mitigates the effect of patch 
> > 1.
> 
> It didn't - hung without printing an Oops message, on the first stress 
> iteration using the --stack 2 --stack-fill stressor.
> 

When I run that under stock v6.3, one of the stress-ng-stack processes 
gets killed by the OOM killer, followed by a soft lockup.

So I'd say the failure you saw was not related to the patches you used.

# stress-ng -t 180 --stack 2 --stack-fill
stress-ng: info:  [43] setting to a 180 second (3 mins, 0.00 secs) run per stressor
stress-ng: info:  [43] dispatching hogs: 2 stack
[  162.400000] stress-ng-stack invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=1000
[  162.420000] CPU: 0 PID: 47 Comm: stress-ng-stack Not tainted 6.3.0-mac #3
[  162.420000] Stack from 01465d68:
[  162.420000]         01465d68 004d72d8 004d72d8 fffffffd 7fffffff 01465d88 0040e286 004d72d8
[  162.420000]         01465db8 0040b74c fffffffd 7fffffff ffffffff 00000840 00000000 00000000
[  162.420000]         0092f520 01465e80 004e5560 0092f520 01465df0 0008a110 01465e80 0092f520
[  162.420000]         fffffffd 7fffffff ffffffff 00000840 00000000 00000000 01465e80 004e56aa
[  162.420000]         004e5560 01465e80 01465e1c 0008a6e0 01465e80 004ad402 00140cca 00000002
[  162.420000]         00000000 000c21b2 0050b376 00000000 00000000 01465ea4 000c373a 01465e80
[  162.420000] Call Trace: [<0040e286>] dump_stack+0x10/0x16
[  162.420000]  [<0040b74c>] dump_header+0x52/0x1be
[  162.420000]  [<0008a110>] oom_kill_process+0x3bc/0x3f8
[  162.420000]  [<0008a6e0>] out_of_memory+0x2e4/0x44c
[  162.420000]  [<00140cca>] proc_pid_status+0x16e/0x1370
[  162.420000]  [<000c21b2>] get_page_from_freelist+0x0/0xb76
[  162.420000]  [<000c373a>] __alloc_pages+0x734/0xb30
[  162.420000]  [<000b28fe>] find_vma+0x0/0x22
[  162.420000]  [<00410cf2>] down_read+0x0/0xd6
[  162.420000]  [<001408ca>] do_task_stat+0x1038/0x1224
[  162.420000]  [<00010000>] sm_dnrm+0x6/0x1a
[  162.420000]  [<00020000>] __release_region+0x54/0xc8
[  162.420000]  [<00140cca>] proc_pid_status+0x16e/0x1370
[  162.420000]  [<00001e70>] kernel_pg_dir+0xe70/0x1000
[  162.420000]  [<00002865>] calibrate_delay+0xd3/0x26a
[  162.420000]  [<000c3f38>] __folio_alloc+0x20/0x26
[  162.420000]  [<00140cca>] proc_pid_status+0x16e/0x1370
[  162.420000]  [<000aceea>] handle_mm_fault+0x904/0xb28
[  162.420000]  [<00100cca>] zero_user_segments+0x10e/0x12c
[  162.420000]  [<000ef3c7>] take_dentry_name_snapshot+0x47/0x64
[  162.420000]  [<000070ac>] do_page_fault+0xde/0x292
[  162.420000]  [<00006204>] buserr_c+0x2d6/0x6c4
[  162.420000]  [<00002000>] _start+0x0/0x8
[  162.420000]  [<00040000>] pick_next_task_dl+0x7c/0x8c
[  162.420000]  [<00001000>] kernel_pg_dir+0x0/0x1000
[  162.420000]  [<00002aa4>] buserr+0x20/0x28
[  162.420000]  [<00002000>] _start+0x0/0x8
[  162.420000]  [<00040000>] pick_next_task_dl+0x7c/0x8c
[  162.420000]  [<0004c017>] irq_thread_dtor+0x7/0xd8
[  162.420000]  [<0020b289>] ioc_find_get_icq+0x107/0x246
[  162.420000] 
[  162.430000] Mem-Info:
[  162.440000] active_anon:5264 inactive_anon:1765 isolated_anon:0
[  162.440000]  active_file:0 inactive_file:0 isolated_file:0
[  162.440000]  unevictable:3 dirty:0 writeback:0
[  162.440000]  slab_reclaimable:71 slab_unreclaimable:331
[  162.440000]  mapped:564 shmem:560 pagetables:16
[  162.440000]  sec_pagetables:0 bounce:0
[  162.440000]  kernel_misc_reclaimable:0
[  162.440000]  free:176 free_pcp:0 free_cma:0
[  162.450000] Node 0 active_anon:21056kB inactive_anon:7060kB active_file:0kB inactive_file:0kB unevictable:12kB isolated(anon):0kB isolated(file):0kB mapped:2256kB dirty:0kB writeback:0kB shmem:2240kB writeback_tmp:0kB kernel_stack:280kB pagetables:64kB sec_pagetables:0kB all_unreclaimable? no
[  162.460000] DMA free:704kB boost:0kB min:704kB low:880kB high:1056kB reserved_highatomic:0KB active_anon:21056kB inactive_anon:7060kB active_file:0kB inactive_file:0kB unevictable:12kB writepending:0kB present:36864kB managed:31168kB mlocked:12kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[  162.480000] lowmem_reserve[]: 0 0 0
[  162.490000] DMA: 0*4kB 0*8kB 0*16kB 0*32kB 1*64kB (M) 1*128kB (M) 2*256kB (M) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 704kB
[  162.520000] 611 total pagecache pages
[  162.530000] 0 pages in swap cache
[  162.540000] Free swap  = 0kB
[  162.550000] Total swap = 0kB
[  162.560000] 9216 pages RAM
[  162.570000] 0 pages HighMem/MovableOnly
[  162.580000] 1424 pages reserved
[  162.590000] Tasks state (memory values in pages):
[  162.610000] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
[  162.620000] [     43]     0    43     5254      754    12032        0         -1000 stress-ng
[  162.630000] [     44]     0    44     5254      197     8448        0         -1000 stress-ng-stack
[  162.640000] [     45]     0    45     5254      197     8448        0         -1000 stress-ng-stack
[  162.650000] [     46]     0    46     8439     3336    21248        0          1000 stress-ng-stack
[  162.660000] [     47]     0    47     8375     3274    20992        0          1000 stress-ng-stack
[  162.670000] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),task=stress-ng-stack,pid=46,uid=0
[  162.690000] Out of memory: Killed process 46 (stress-ng-stack) total-vm:33756kB, anon-rss:13336kB, file-rss:4kB, shmem-rss:4kB, UID:0 pgtables:20kB oom_score_adj:1000
[  220.440000] watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [kswapd0:20]
[  220.440000] Modules linked in:
[  220.440000] Format 00  Vector: 0064  PC: 0009310c  Status: 2004    Not tainted
[  220.440000] ORIG_D0: ffffffff  D0: 00000000  A2: 0085a530  A1: 0000029b
[  220.440000] A0: 0050b3f2  D5: 00000000  D4: 00000001
[  220.440000] D3: 00000002  D2: 0000001f  D1: 0000029c
[  244.440000] watchdog: BUG: soft lockup - CPU#0 stuck for 45s! [kswapd0:20]
[  244.440000] Modules linked in:
[  244.440000] Format 00  Vector: 0064  PC: 00207178  Status: 2000    Tainted: G             L    
[  244.440000] ORIG_D0: ffffffff  D0: 0000000c  A2: 0085a530  A1: 008e3f02
[  244.440000] A0: 008e3e48  D5: 00000000  D4: 00000001
[  244.440000] D3: 00000002  D2: 00000000  D1: 00000000
[  268.440000] watchdog: BUG: soft lockup - CPU#0 stuck for 68s! [kswapd0:20]
[  268.440000] Modules linked in:
[  268.440000] Format 00  Vector: 0064  PC: 00095a76  Status: 2000    Tainted: G             L    
[  268.440000] ORIG_D0: ffffffff  D0: 00000000  A2: 0085a530  A1: 0085a530
[  268.440000] A0: fffffff6  D5: 00000000  D4: 00000001
[  268.440000] D3: 00000002  D2: 0000001f  D1: 00000020
[  292.440000] watchdog: BUG: soft lockup - CPU#0 stuck for 90s! [kswapd0:20]
[  292.440000] Modules linked in:
[  292.440000] Format 00  Vector: 0064  PC: 000930d0  Status: 2000    Tainted: G             L    
[  292.440000] ORIG_D0: ffffffff  D0: 00000299  A2: 0085a530  A1: 0000029b
[  292.440000] A0: 0050b3f2  D5: 00000000  D4: 00000001
[  292.440000] D3: 00000002  D2: 0000001f  D1: 0000001f
[  316.450000] watchdog: BUG: soft lockup - CPU#0 stuck for 112s! [kswapd0:20]
[  316.450000] Modules linked in:
[  316.450000] Format 00  Vector: 0064  PC: 0009142a  Status: 2004    Tainted: G             L    
[  316.450000] ORIG_D0: ffffffff  D0: 0000000c  A2: 0085a530  A1: 008e3f02
[  316.450000] A0: 0050b376  D5: 00000000  D4: 00000001
[  316.450000] D3: 00000002  D2: 00000000  D1: 00000000
[  340.450000] watchdog: BUG: soft lockup - CPU#0 stuck for 135s! [kswapd0:20]
[  340.450000] Modules linked in:
[  340.450000] Format 00  Vector: 0064  PC: 00092cd2  Status: 2000    Tainted: G             L    
[  340.450000] ORIG_D0: ffffffff  D0: 0000000f  A2: 0085a530  A1: 000000b0
[  340.450000] A0: 00092c9e  D5: 008e3f94  D4: 00000002
[  340.450000] D3: 00044260  D2: 00000000  D1: 0050b3ae
[  364.450000] watchdog: BUG: soft lockup - CPU#0 stuck for 157s! [kswapd0:20]
[  364.450000] Modules linked in:
[  364.450000] Format 00  Vector: 0064  PC: 000b06de  Status: 2004    Tainted: G             L    
[  364.450000] ORIG_D0: ffffffff  D0: 0000000c  A2: 0085a530  A1: 008e3f02
[  364.450000] A0: 0050b376  D5: 00000000  D4: 00000001
[  364.450000] D3: 00000002  D2: 00000000  D1: 00000000
[  388.450000] watchdog: BUG: soft lockup - CPU#0 stuck for 179s! [kswapd0:20]
[  388.450000] Modules linked in:
[  388.450000] Format 00  Vector: 0064  PC: 000359da  Status: 2000    Tainted: G             L    
[  388.450000] ORIG_D0: ffffffff  D0: 00000003  A2: 0085a530  A1: 008e3f94
[  388.450000] A0: 008e3f88  D5: 008e3f94  D4: 00000001
[  388.450000] D3: 00000000  D2: 00000000  D1: 00000003
[  412.450000] watchdog: BUG: soft lockup - CPU#0 stuck for 202s! [kswapd0:20]
[  412.450000] Modules linked in:
[  412.450000] Format 00  Vector: 0064  PC: 00095ca6  Status: 2000    Tainted: G             L    
[  412.450000] ORIG_D0: ffffffff  D0: 0050b644  A2: 0085a530  A1: 008e3f44
[  412.450000] A0: 0085a530  D5: 00000000  D4: 00000001
[  412.450000] D3: 00000001  D2: 0000001f  D1: 00000000
[  436.450000] watchdog: BUG: soft lockup - CPU#0 stuck for 224s! [kswapd0:20]
[  436.450000] Modules linked in:
[  436.450000] Format 00  Vector: 0064  PC: 000359e2  Status: 2004    Tainted: G             L    
[  436.450000] ORIG_D0: ffffffff  D0: 00000000  A2: 0085a530  A1: 008e3f44
[  436.450000] A0: 008afe00  D5: 00000000  D4: 00000001
[  436.450000] D3: 00000001  D2: 00000000  D1: 00000000
[  460.460000] watchdog: BUG: soft lockup - CPU#0 stuck for 246s! [kswapd0:20]
[  460.460000] Modules linked in:
[  460.460000] Format 00  Vector: 0064  PC: 00092b30  Status: 2004    Tainted: G             L    
[  460.460000] ORIG_D0: ffffffff  D0: 008e3f44  A2: 0085a530  A1: 008e3f44
[  460.460000] A0: 0050b15c  D5: 00000000  D4: 00000001
[  460.460000] D3: 00000001  D2: 00000000  D1: 00000000
[  484.460000] watchdog: BUG: soft lockup - CPU#0 stuck for 269s! [kswapd0:20]
[  484.460000] Modules linked in:
[  484.460000] Format 00  Vector: 0064  PC: 000930c0  Status: 2000    Tainted: G             L    
[  484.460000] ORIG_D0: ffffffff  D0: 00000000  A2: 0085a530  A1: 008e3f02
[  484.460000] A0: 0050b3f2  D5: 00000000  D4: 00000001
[  484.460000] D3: 00000001  D2: 0000001f  D1: 0000001f
[  508.460000] watchdog: BUG: soft lockup - CPU#0 stuck for 291s! [kswapd0:20]
[  508.460000] Modules linked in:
[  508.460000] Format 00  Vector: 0064  PC: 00097f8c  Status: 2014    Tainted: G             L    
[  508.460000] ORIG_D0: ffffffff  D0: 00002014  A2: 0085a530  A1: 008e3f4c
[  508.460000] A0: 008e3f02  D5: 008e3f94  D4: 00000001
[  508.460000] D3: 00000001  D2: 00000000  D1: 00000048
[  532.460000] watchdog: BUG: soft lockup - CPU#0 stuck for 313s! [kswapd0:20]
[  532.460000] Modules linked in:
[  532.460000] Format 00  Vector: 0064  PC: 00095a5c  Status: 2008    Tainted: G             L    
[  532.460000] ORIG_D0: ffffffff  D0: fffffff6  A2: 0085a530  A1: 0085a530
[  532.460000] A0: fffffff6  D5: 00000000  D4: 00000001
[  532.460000] D3: 00000001  D2: 0000001f  D1: 00000020

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH RFC 2/2] m68k: Make allowance for signal delivery following an address error
  2023-05-06  0:36                   ` Finn Thain
@ 2023-05-06  1:46                     ` Michael Schmitz
  2023-05-06  2:46                       ` Finn Thain
  0 siblings, 1 reply; 23+ messages in thread
From: Michael Schmitz @ 2023-05-06  1:46 UTC (permalink / raw)
  To: Finn Thain; +Cc: Geert Uytterhoeven, linux-m68k, Andreas Schwab

Hi Finn,

Am 06.05.2023 um 12:36 schrieb Finn Thain:
> On Sat, 6 May 2023, Michael Schmitz wrote:
>
>> Am 05.05.2023 um 17:05 schrieb Michael Schmitz:
>>> Am 05.05.2023 um 15:14 schrieb Finn Thain:
>>>> On Fri, 5 May 2023, Michael Schmitz wrote:
>>>>
>>>>> In terms of fixing the bus error bug, I think we'll need to fix this
>>>>> regression first (only first patch of yours applied):
>>>>>
>>>>
>>>> That patch came from Andreas, maybe he can comment.
>>>>
>>>>> I'll try with your second patch applied as well.
>>>>
>>>> I think patch 2/2 should be tested alone since 1/2 was nak'd.
>>>
>>> I'm trying to find out whether patch 2 mitigates the effect of patch
>>> 1.
>>
>> It didn't - hung without printing an Oops message, on the first stress
>> iteration using the --stack 2 --stack-fill stressor.
>>
>
> In my mind, at the time I sent them, patch 1 was expected to mitigate the
> impact of patch 2. It would have been surprising if the reverse had
> somehow happened.

I did wonder if delaying signal delivery would cause fewer signals to be 
delivered in total, and whether the remaining bus faults that do cause 
signal delivery (format a) still cause any trouble because of eventual 
unreliable USP.

Adding your patch 2 then might have mitigated these problems while not 
placing too much of a burden on the stack.

> Anyway, I'm still in the dark as to why patch 1 is interesting. Was that
> because you wanted to delay signal delivery to try to put more pressure on
> stack memory by delivering multiple signals? You and I did come up with an

No - it just eliminates temporary stack growth due to the format b bus 
faults as entry into signal delivery.

> alternative patch for bypassing signal delivery after certain exceptions.
> Maybe that would work better.

Meaning skipping signal delivery on any bus fault? I haven't tried that 
one yet. But I suspect the same problem exists with that as with your 
patch 1 - if the bus fault wasn't handled, and the signal to be 
delivered to the process is SIGKILL or SIGSEGV, completing the execption 
without signal delivery will just cause the same bus fault again, ad 
infinitum.

Cheers,

	Michael

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH RFC 2/2] m68k: Make allowance for signal delivery following an address error
  2023-05-06  1:11                   ` Finn Thain
@ 2023-05-06  1:54                     ` Michael Schmitz
  2023-05-06  2:11                       ` Finn Thain
  0 siblings, 1 reply; 23+ messages in thread
From: Michael Schmitz @ 2023-05-06  1:54 UTC (permalink / raw)
  To: Finn Thain; +Cc: Geert Uytterhoeven, linux-m68k, Andreas Schwab

Hi Finn

Am 06.05.2023 um 13:11 schrieb Finn Thain:
> On Sat, 6 May 2023, Michael Schmitz wrote:
>
>>> I'm trying to find out whether patch 2 mitigates the effect of patch
>>> 1.
>>
>> It didn't - hung without printing an Oops message, on the first stress
>> iteration using the --stack 2 --stack-fill stressor.
>>
>
> When I run that under stock v6.3, one of the stress-ng-stack processes
> gets killed by the OOM killer, followed by a soft lockup.

Yes, I'd seen the OOM killer kick in before at least once (no soft 
lockup). Probably due to a cron job putting additional pressure on the 
system. But that's not the same as a kernel access error, or the hard 
lockup I reported (with the heartbeat LED and the disk LED on the CF 
adapter on the IDE bus steady on, so no interrupts serviced anymore).

> So I'd say the failure you saw was not related to the patches you used.

Hard to say - I'd have to see it fail in the same way again. Didn't take 
long after start of the stress tests so I'll probably do that sometime.

Your patch 2 alone looks a lot more stable so far.

Cheers,

	Michael

>
> # stress-ng -t 180 --stack 2 --stack-fill
> stress-ng: info:  [43] setting to a 180 second (3 mins, 0.00 secs) run per stressor
> stress-ng: info:  [43] dispatching hogs: 2 stack
> [  162.400000] stress-ng-stack invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=1000
> [  162.420000] CPU: 0 PID: 47 Comm: stress-ng-stack Not tainted 6.3.0-mac #3
> [  162.420000] Stack from 01465d68:
> [  162.420000]         01465d68 004d72d8 004d72d8 fffffffd 7fffffff 01465d88 0040e286 004d72d8
> [  162.420000]         01465db8 0040b74c fffffffd 7fffffff ffffffff 00000840 00000000 00000000
> [  162.420000]         0092f520 01465e80 004e5560 0092f520 01465df0 0008a110 01465e80 0092f520
> [  162.420000]         fffffffd 7fffffff ffffffff 00000840 00000000 00000000 01465e80 004e56aa
> [  162.420000]         004e5560 01465e80 01465e1c 0008a6e0 01465e80 004ad402 00140cca 00000002
> [  162.420000]         00000000 000c21b2 0050b376 00000000 00000000 01465ea4 000c373a 01465e80
> [  162.420000] Call Trace: [<0040e286>] dump_stack+0x10/0x16
> [  162.420000]  [<0040b74c>] dump_header+0x52/0x1be
> [  162.420000]  [<0008a110>] oom_kill_process+0x3bc/0x3f8
> [  162.420000]  [<0008a6e0>] out_of_memory+0x2e4/0x44c
> [  162.420000]  [<00140cca>] proc_pid_status+0x16e/0x1370
> [  162.420000]  [<000c21b2>] get_page_from_freelist+0x0/0xb76
> [  162.420000]  [<000c373a>] __alloc_pages+0x734/0xb30
> [  162.420000]  [<000b28fe>] find_vma+0x0/0x22
> [  162.420000]  [<00410cf2>] down_read+0x0/0xd6
> [  162.420000]  [<001408ca>] do_task_stat+0x1038/0x1224
> [  162.420000]  [<00010000>] sm_dnrm+0x6/0x1a
> [  162.420000]  [<00020000>] __release_region+0x54/0xc8
> [  162.420000]  [<00140cca>] proc_pid_status+0x16e/0x1370
> [  162.420000]  [<00001e70>] kernel_pg_dir+0xe70/0x1000
> [  162.420000]  [<00002865>] calibrate_delay+0xd3/0x26a
> [  162.420000]  [<000c3f38>] __folio_alloc+0x20/0x26
> [  162.420000]  [<00140cca>] proc_pid_status+0x16e/0x1370
> [  162.420000]  [<000aceea>] handle_mm_fault+0x904/0xb28
> [  162.420000]  [<00100cca>] zero_user_segments+0x10e/0x12c
> [  162.420000]  [<000ef3c7>] take_dentry_name_snapshot+0x47/0x64
> [  162.420000]  [<000070ac>] do_page_fault+0xde/0x292
> [  162.420000]  [<00006204>] buserr_c+0x2d6/0x6c4
> [  162.420000]  [<00002000>] _start+0x0/0x8
> [  162.420000]  [<00040000>] pick_next_task_dl+0x7c/0x8c
> [  162.420000]  [<00001000>] kernel_pg_dir+0x0/0x1000
> [  162.420000]  [<00002aa4>] buserr+0x20/0x28
> [  162.420000]  [<00002000>] _start+0x0/0x8
> [  162.420000]  [<00040000>] pick_next_task_dl+0x7c/0x8c
> [  162.420000]  [<0004c017>] irq_thread_dtor+0x7/0xd8
> [  162.420000]  [<0020b289>] ioc_find_get_icq+0x107/0x246
> [  162.420000]
> [  162.430000] Mem-Info:
> [  162.440000] active_anon:5264 inactive_anon:1765 isolated_anon:0
> [  162.440000]  active_file:0 inactive_file:0 isolated_file:0
> [  162.440000]  unevictable:3 dirty:0 writeback:0
> [  162.440000]  slab_reclaimable:71 slab_unreclaimable:331
> [  162.440000]  mapped:564 shmem:560 pagetables:16
> [  162.440000]  sec_pagetables:0 bounce:0
> [  162.440000]  kernel_misc_reclaimable:0
> [  162.440000]  free:176 free_pcp:0 free_cma:0
> [  162.450000] Node 0 active_anon:21056kB inactive_anon:7060kB active_file:0kB inactive_file:0kB unevictable:12kB isolated(anon):0kB isolated(file):0kB mapped:2256kB dirty:0kB writeback:0kB shmem:2240kB writeback_tmp:0kB kernel_stack:280kB pagetables:64kB sec_pagetables:0kB all_unreclaimable? no
> [  162.460000] DMA free:704kB boost:0kB min:704kB low:880kB high:1056kB reserved_highatomic:0KB active_anon:21056kB inactive_anon:7060kB active_file:0kB inactive_file:0kB unevictable:12kB writepending:0kB present:36864kB managed:31168kB mlocked:12kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
> [  162.480000] lowmem_reserve[]: 0 0 0
> [  162.490000] DMA: 0*4kB 0*8kB 0*16kB 0*32kB 1*64kB (M) 1*128kB (M) 2*256kB (M) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 704kB
> [  162.520000] 611 total pagecache pages
> [  162.530000] 0 pages in swap cache
> [  162.540000] Free swap  = 0kB
> [  162.550000] Total swap = 0kB
> [  162.560000] 9216 pages RAM
> [  162.570000] 0 pages HighMem/MovableOnly
> [  162.580000] 1424 pages reserved
> [  162.590000] Tasks state (memory values in pages):
> [  162.610000] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
> [  162.620000] [     43]     0    43     5254      754    12032        0         -1000 stress-ng
> [  162.630000] [     44]     0    44     5254      197     8448        0         -1000 stress-ng-stack
> [  162.640000] [     45]     0    45     5254      197     8448        0         -1000 stress-ng-stack
> [  162.650000] [     46]     0    46     8439     3336    21248        0          1000 stress-ng-stack
> [  162.660000] [     47]     0    47     8375     3274    20992        0          1000 stress-ng-stack
> [  162.670000] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),task=stress-ng-stack,pid=46,uid=0
> [  162.690000] Out of memory: Killed process 46 (stress-ng-stack) total-vm:33756kB, anon-rss:13336kB, file-rss:4kB, shmem-rss:4kB, UID:0 pgtables:20kB oom_score_adj:1000
> [  220.440000] watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [kswapd0:20]
> [  220.440000] Modules linked in:
> [  220.440000] Format 00  Vector: 0064  PC: 0009310c  Status: 2004    Not tainted
> [  220.440000] ORIG_D0: ffffffff  D0: 00000000  A2: 0085a530  A1: 0000029b
> [  220.440000] A0: 0050b3f2  D5: 00000000  D4: 00000001
> [  220.440000] D3: 00000002  D2: 0000001f  D1: 0000029c
> [  244.440000] watchdog: BUG: soft lockup - CPU#0 stuck for 45s! [kswapd0:20]
> [  244.440000] Modules linked in:
> [  244.440000] Format 00  Vector: 0064  PC: 00207178  Status: 2000    Tainted: G             L
> [  244.440000] ORIG_D0: ffffffff  D0: 0000000c  A2: 0085a530  A1: 008e3f02
> [  244.440000] A0: 008e3e48  D5: 00000000  D4: 00000001
> [  244.440000] D3: 00000002  D2: 00000000  D1: 00000000
> [  268.440000] watchdog: BUG: soft lockup - CPU#0 stuck for 68s! [kswapd0:20]
> [  268.440000] Modules linked in:
> [  268.440000] Format 00  Vector: 0064  PC: 00095a76  Status: 2000    Tainted: G             L
> [  268.440000] ORIG_D0: ffffffff  D0: 00000000  A2: 0085a530  A1: 0085a530
> [  268.440000] A0: fffffff6  D5: 00000000  D4: 00000001
> [  268.440000] D3: 00000002  D2: 0000001f  D1: 00000020
> [  292.440000] watchdog: BUG: soft lockup - CPU#0 stuck for 90s! [kswapd0:20]
> [  292.440000] Modules linked in:
> [  292.440000] Format 00  Vector: 0064  PC: 000930d0  Status: 2000    Tainted: G             L
> [  292.440000] ORIG_D0: ffffffff  D0: 00000299  A2: 0085a530  A1: 0000029b
> [  292.440000] A0: 0050b3f2  D5: 00000000  D4: 00000001
> [  292.440000] D3: 00000002  D2: 0000001f  D1: 0000001f
> [  316.450000] watchdog: BUG: soft lockup - CPU#0 stuck for 112s! [kswapd0:20]
> [  316.450000] Modules linked in:
> [  316.450000] Format 00  Vector: 0064  PC: 0009142a  Status: 2004    Tainted: G             L
> [  316.450000] ORIG_D0: ffffffff  D0: 0000000c  A2: 0085a530  A1: 008e3f02
> [  316.450000] A0: 0050b376  D5: 00000000  D4: 00000001
> [  316.450000] D3: 00000002  D2: 00000000  D1: 00000000
> [  340.450000] watchdog: BUG: soft lockup - CPU#0 stuck for 135s! [kswapd0:20]
> [  340.450000] Modules linked in:
> [  340.450000] Format 00  Vector: 0064  PC: 00092cd2  Status: 2000    Tainted: G             L
> [  340.450000] ORIG_D0: ffffffff  D0: 0000000f  A2: 0085a530  A1: 000000b0
> [  340.450000] A0: 00092c9e  D5: 008e3f94  D4: 00000002
> [  340.450000] D3: 00044260  D2: 00000000  D1: 0050b3ae
> [  364.450000] watchdog: BUG: soft lockup - CPU#0 stuck for 157s! [kswapd0:20]
> [  364.450000] Modules linked in:
> [  364.450000] Format 00  Vector: 0064  PC: 000b06de  Status: 2004    Tainted: G             L
> [  364.450000] ORIG_D0: ffffffff  D0: 0000000c  A2: 0085a530  A1: 008e3f02
> [  364.450000] A0: 0050b376  D5: 00000000  D4: 00000001
> [  364.450000] D3: 00000002  D2: 00000000  D1: 00000000
> [  388.450000] watchdog: BUG: soft lockup - CPU#0 stuck for 179s! [kswapd0:20]
> [  388.450000] Modules linked in:
> [  388.450000] Format 00  Vector: 0064  PC: 000359da  Status: 2000    Tainted: G             L
> [  388.450000] ORIG_D0: ffffffff  D0: 00000003  A2: 0085a530  A1: 008e3f94
> [  388.450000] A0: 008e3f88  D5: 008e3f94  D4: 00000001
> [  388.450000] D3: 00000000  D2: 00000000  D1: 00000003
> [  412.450000] watchdog: BUG: soft lockup - CPU#0 stuck for 202s! [kswapd0:20]
> [  412.450000] Modules linked in:
> [  412.450000] Format 00  Vector: 0064  PC: 00095ca6  Status: 2000    Tainted: G             L
> [  412.450000] ORIG_D0: ffffffff  D0: 0050b644  A2: 0085a530  A1: 008e3f44
> [  412.450000] A0: 0085a530  D5: 00000000  D4: 00000001
> [  412.450000] D3: 00000001  D2: 0000001f  D1: 00000000
> [  436.450000] watchdog: BUG: soft lockup - CPU#0 stuck for 224s! [kswapd0:20]
> [  436.450000] Modules linked in:
> [  436.450000] Format 00  Vector: 0064  PC: 000359e2  Status: 2004    Tainted: G             L
> [  436.450000] ORIG_D0: ffffffff  D0: 00000000  A2: 0085a530  A1: 008e3f44
> [  436.450000] A0: 008afe00  D5: 00000000  D4: 00000001
> [  436.450000] D3: 00000001  D2: 00000000  D1: 00000000
> [  460.460000] watchdog: BUG: soft lockup - CPU#0 stuck for 246s! [kswapd0:20]
> [  460.460000] Modules linked in:
> [  460.460000] Format 00  Vector: 0064  PC: 00092b30  Status: 2004    Tainted: G             L
> [  460.460000] ORIG_D0: ffffffff  D0: 008e3f44  A2: 0085a530  A1: 008e3f44
> [  460.460000] A0: 0050b15c  D5: 00000000  D4: 00000001
> [  460.460000] D3: 00000001  D2: 00000000  D1: 00000000
> [  484.460000] watchdog: BUG: soft lockup - CPU#0 stuck for 269s! [kswapd0:20]
> [  484.460000] Modules linked in:
> [  484.460000] Format 00  Vector: 0064  PC: 000930c0  Status: 2000    Tainted: G             L
> [  484.460000] ORIG_D0: ffffffff  D0: 00000000  A2: 0085a530  A1: 008e3f02
> [  484.460000] A0: 0050b3f2  D5: 00000000  D4: 00000001
> [  484.460000] D3: 00000001  D2: 0000001f  D1: 0000001f
> [  508.460000] watchdog: BUG: soft lockup - CPU#0 stuck for 291s! [kswapd0:20]
> [  508.460000] Modules linked in:
> [  508.460000] Format 00  Vector: 0064  PC: 00097f8c  Status: 2014    Tainted: G             L
> [  508.460000] ORIG_D0: ffffffff  D0: 00002014  A2: 0085a530  A1: 008e3f4c
> [  508.460000] A0: 008e3f02  D5: 008e3f94  D4: 00000001
> [  508.460000] D3: 00000001  D2: 00000000  D1: 00000048
> [  532.460000] watchdog: BUG: soft lockup - CPU#0 stuck for 313s! [kswapd0:20]
> [  532.460000] Modules linked in:
> [  532.460000] Format 00  Vector: 0064  PC: 00095a5c  Status: 2008    Tainted: G             L
> [  532.460000] ORIG_D0: ffffffff  D0: fffffff6  A2: 0085a530  A1: 0085a530
> [  532.460000] A0: fffffff6  D5: 00000000  D4: 00000001
> [  532.460000] D3: 00000001  D2: 0000001f  D1: 00000020
>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH RFC 2/2] m68k: Make allowance for signal delivery following an address error
  2023-05-06  1:54                     ` Michael Schmitz
@ 2023-05-06  2:11                       ` Finn Thain
  2023-05-06  2:30                         ` Michael Schmitz
  0 siblings, 1 reply; 23+ messages in thread
From: Finn Thain @ 2023-05-06  2:11 UTC (permalink / raw)
  To: Michael Schmitz; +Cc: Geert Uytterhoeven, linux-m68k, Andreas Schwab

On Sat, 6 May 2023, Michael Schmitz wrote:

> 
> Yes, I'd seen the OOM killer kick in before at least once (no soft lockup).

I wonder whether your .config has the same settings as mine...

#
# Debug Oops, Lockups and Hangs
#
CONFIG_PANIC_ON_OOPS=y
CONFIG_PANIC_ON_OOPS_VALUE=1
CONFIG_PANIC_TIMEOUT=0
CONFIG_LOCKUP_DETECTOR=y
CONFIG_SOFTLOCKUP_DETECTOR=y
# CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC is not set
CONFIG_DETECT_HUNG_TASK=y
CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=60
# CONFIG_BOOTPARAM_HUNG_TASK_PANIC is not set
# CONFIG_WQ_WATCHDOG is not set
# CONFIG_TEST_LOCKUP is not set
# end of Debug Oops, Lockups and Hangs

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH RFC 2/2] m68k: Make allowance for signal delivery following an address error
  2023-05-06  2:11                       ` Finn Thain
@ 2023-05-06  2:30                         ` Michael Schmitz
  0 siblings, 0 replies; 23+ messages in thread
From: Michael Schmitz @ 2023-05-06  2:30 UTC (permalink / raw)
  To: Finn Thain; +Cc: Geert Uytterhoeven, linux-m68k, Andreas Schwab

Hi Finn,

Am 06.05.2023 um 14:11 schrieb Finn Thain:
> On Sat, 6 May 2023, Michael Schmitz wrote:
>
>>
>> Yes, I'd seen the OOM killer kick in before at least once (no soft lockup).
>
> I wonder whether your .config has the same settings as mine...
>
> #
> # Debug Oops, Lockups and Hangs
> #
> CONFIG_PANIC_ON_OOPS=y
> CONFIG_PANIC_ON_OOPS_VALUE=1
> CONFIG_PANIC_TIMEOUT=0
> CONFIG_LOCKUP_DETECTOR=y
> CONFIG_SOFTLOCKUP_DETECTOR=y
> # CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC is not set
> CONFIG_DETECT_HUNG_TASK=y
> CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=60
> # CONFIG_BOOTPARAM_HUNG_TASK_PANIC is not set
> # CONFIG_WQ_WATCHDOG is not set
> # CONFIG_TEST_LOCKUP is not set
> # end of Debug Oops, Lockups and Hangs

No, it doesn't:

# CONFIG_PANIC_ON_OOPS is not set
CONFIG_PANIC_ON_OOPS_VALUE=0
CONFIG_PANIC_TIMEOUT=0
# CONFIG_TEST_LOCKUP is not set

I can add your debug settings if it doesn't add too much bloat.

Cheers,

	Michael


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH RFC 2/2] m68k: Make allowance for signal delivery following an address error
  2023-05-06  1:46                     ` Michael Schmitz
@ 2023-05-06  2:46                       ` Finn Thain
  2023-05-06  8:24                         ` Michael Schmitz
  0 siblings, 1 reply; 23+ messages in thread
From: Finn Thain @ 2023-05-06  2:46 UTC (permalink / raw)
  To: Michael Schmitz; +Cc: Geert Uytterhoeven, linux-m68k, Andreas Schwab

On Sat, 6 May 2023, Michael Schmitz wrote:

> >
> > In my mind, at the time I sent them, patch 1 was expected to mitigate 
> > the impact of patch 2. It would have been surprising if the reverse 
> > had somehow happened.
> 
> I did wonder if delaying signal delivery would cause fewer signals to be 
> delivered in total, and whether the remaining bus faults that do cause 
> signal delivery (format a) still cause any trouble because of eventual 
> unreliable USP.
> 
> Adding your patch 2 then might have mitigated these problems while not 
> placing too much of a burden on the stack.
> 

If fewer signals get delivered in total, that's not a bug, right? It just 
amounts to is slightly more signal latency on what is already a slow CPU.

Format 0xA bus errors happen between instructions according to the User 
Manual. There's no possibility of hitting the stack corruption bug AFAICS:

	"For efficiency, the MC68030 uses two different bus error stack 
	frame formats. When the bus error exception is taken at an 
	instruction boundary, less information is required to recover from 
	the error, and the processor builds the short bus fault stack 
	frame as shown in Table 8-7."

> > Anyway, I'm still in the dark as to why patch 1 is interesting. Was 
> > that because you wanted to delay signal delivery to try to put more 
> > pressure on stack memory by delivering multiple signals? You and I did 
> > come up with an
> 
> No - it just eliminates temporary stack growth due to the format b bus 
> faults as entry into signal delivery.
> 
> > alternative patch for bypassing signal delivery after certain 
> > exceptions. Maybe that would work better.
> 
> Meaning skipping signal delivery on any bus fault? I haven't tried that 
> one yet.

Don't try that patch if you are hoping to see signal delivery after a 
format 0xA exception -- you'd only ever get signal delivery after traps, 
interrupts, FP exceptions and a few other obscure cases.

As with patch 1/2, I don't know why you'd want to use it.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH RFC 2/2] m68k: Make allowance for signal delivery following an address error
  2023-05-06  2:46                       ` Finn Thain
@ 2023-05-06  8:24                         ` Michael Schmitz
  0 siblings, 0 replies; 23+ messages in thread
From: Michael Schmitz @ 2023-05-06  8:24 UTC (permalink / raw)
  To: Finn Thain; +Cc: Geert Uytterhoeven, linux-m68k, Andreas Schwab

Hi Finn,

Am 06.05.2023 um 14:46 schrieb Finn Thain:
> On Sat, 6 May 2023, Michael Schmitz wrote:
>
>>>
>>> In my mind, at the time I sent them, patch 1 was expected to mitigate
>>> the impact of patch 2. It would have been surprising if the reverse
>>> had somehow happened.
>>
>> I did wonder if delaying signal delivery would cause fewer signals to be
>> delivered in total, and whether the remaining bus faults that do cause
>> signal delivery (format a) still cause any trouble because of eventual
>> unreliable USP.
>>
>> Adding your patch 2 then might have mitigated these problems while not
>> placing too much of a burden on the stack.
>>
>
> If fewer signals get delivered in total, that's not a bug, right? It just
> amounts to is slightly more signal latency on what is already a slow CPU.

I'm not saying it's a bug, and I had expected more latency. I haven't 
seen any significant difference in either number of signals, and signal 
latency is hard to gauge without instrumenting the signal code to add a 
timestamp when the signal is raised. No significant impact on execution 
time of the tests at any rate.

> Format 0xA bus errors happen between instructions according to the User
> Manual. There's no possibility of hitting the stack corruption bug AFAICS:
>
> 	"For efficiency, the MC68030 uses two different bus error stack
> 	frame formats. When the bus error exception is taken at an
> 	instruction boundary, less information is required to recover from
> 	the error, and the processor builds the short bus fault stack
> 	frame as shown in Table 8-7."

Right - I was thinking an exception raised by another instruction than 
the one currently executing could cause USP to become unreliable but all 
that does is make the fault address differ from USP (for an instruction 
executing that uses USP). Your current patches don't use the fault 
address at all.

>>> Anyway, I'm still in the dark as to why patch 1 is interesting. Was
>>> that because you wanted to delay signal delivery to try to put more
>>> pressure on stack memory by delivering multiple signals? You and I did
>>> come up with an
>>
>> No - it just eliminates temporary stack growth due to the format b bus
>> faults as entry into signal delivery.
>>
>>> alternative patch for bypassing signal delivery after certain
>>> exceptions. Maybe that would work better.
>>
>> Meaning skipping signal delivery on any bus fault? I haven't tried that
>> one yet.
>
> Don't try that patch if you are hoping to see signal delivery after a
> format 0xA exception -- you'd only ever get signal delivery after traps,
> interrupts, FP exceptions and a few other obscure cases.
>
> As with patch 1/2, I don't know why you'd want to use it.

As it turns out, I had already tested that one, and it caused signals to 
be delivered on interrupts mostly instead of bus error exceptions. Never 
ran stress tests on that one at the time.

Cheers,

	Michael

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2023-05-06  8:25 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-05-02  6:50 [PATCH RFC 0/2] 68020/030 signal handling following exceptions Finn Thain
2023-05-02  6:50 ` [PATCH RFC 1/2] m68k: Don't deliver signals except at instruction boundary Finn Thain
2023-05-02 16:51   ` Andreas Schwab
2023-05-02 20:15     ` Michael Schmitz
2023-05-03  3:22     ` Finn Thain
2023-05-03  3:24       ` Finn Thain
2023-05-02  6:50 ` [PATCH RFC 2/2] m68k: Make allowance for signal delivery following an address error Finn Thain
2023-05-02 21:06   ` Michael Schmitz
2023-05-03  4:11     ` Finn Thain
2023-05-03  5:50       ` Michael Schmitz
2023-05-03  8:02         ` Finn Thain
2023-05-04 21:06           ` Michael Schmitz
2023-05-05  3:14             ` Finn Thain
2023-05-05  5:05               ` Michael Schmitz
2023-05-05 21:35                 ` Michael Schmitz
2023-05-06  0:36                   ` Finn Thain
2023-05-06  1:46                     ` Michael Schmitz
2023-05-06  2:46                       ` Finn Thain
2023-05-06  8:24                         ` Michael Schmitz
2023-05-06  1:11                   ` Finn Thain
2023-05-06  1:54                     ` Michael Schmitz
2023-05-06  2:11                       ` Finn Thain
2023-05-06  2:30                         ` Michael Schmitz

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox