Hi Stan,

On 08.02.2023 at 11:58, Michael Schmitz wrote:
> Thanks Stan,
>
> On 8/02/23 08:37, Stan Johnson wrote:
>> Hi Michael,
>>
>> On 2/5/23 3:19 PM, Michael Schmitz wrote:
>>> ...
>>>
>>> Seeing Finn's report that Al Viro's VM_FAULT_RETRY fix may have solved
>>> his task corruption troubles on 040, I just noticed that I probably
>>> misunderstood how Al's patch works.
>>>
>>> Botching up a fault retry and carrying on may well leave the page tables
>>> in a state where some later access could go to the wrong page and
>>> manifest as user space corruption. Could you try Al's patch 4 (m68k: fix
>>> livelock in uaccess) to see if this helps?
>>> ...
>> ok, this appears to be the patch:
>>
>> Signed-off-by: Al Viro
>> ---
>>  arch/m68k/mm/fault.c | 5 ++++-
>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/m68k/mm/fault.c b/arch/m68k/mm/fault.c
>> index 4d2837eb3e2a..228128e45c67 100644
>> --- a/arch/m68k/mm/fault.c
>> +++ b/arch/m68k/mm/fault.c
>> @@ -138,8 +138,11 @@ int do_page_fault(struct pt_regs *regs, unsigned
>> long address,
>>  	fault = handle_mm_fault(vma, address, flags, regs);
>>  	pr_debug("handle_mm_fault returns %x\n", fault);
>>
>> -	if (fault_signal_pending(fault, regs))
>> +	if (fault_signal_pending(fault, regs)) {
>> +		if (!user_mode(regs))
>> +			goto no_context;
>>  		return 0;
>> +	}
>>
>>  	/* The fault is fully completed (including releasing mmap lock) */
>>  	if (fault & VM_FAULT_COMPLETED)
>
> That's correct.
>
> Your results show improvement but the problem does not entirely go away.
>
> Looking at differences between 030 and 040/060 fault handling, it
> appears only 030 handles faults corrected by exception tables (such as
> used in uaccess macros) specially, i.e. aborting bus error processing
> while 040 and 060 carry on in the fault handler.
>
> I wonder if that's the main difference between 030 and 040 behaviour?

Following the 040 code a bit further, I suspect that happens in the 040
writeback handler, so this may be a red herring.

> I'll try and log such accesses caught by exception tables on 030 to see
> if they are rare enough to allow adding a kernel log message...

Looks like this kind of event is rare enough to not trigger in a normal
boot on my 030. Please give the attached patch a try so we can confirm
(or rule out) that user space access faults from kernel mode are to
blame for your stack smashes.

Cheers,

Michael

> Cheers,
>
> Michael