Hi Stan,

On 08.02.2023 at 11:58, Michael Schmitz wrote:
> Thanks Stan,
>
> On 8/02/23 08:37, Stan Johnson wrote:
>> Hi Michael,
>>
>> On 2/5/23 3:19 PM, Michael Schmitz wrote:
>>> ...
>>>
>>> Seeing Finn's report that Al Viro's VM_FAULT_RETRY fix may have solved
>>> his task corruption troubles on 040, I just noticed that I probably
>>> misunderstood how Al's patch works.
>>>
>>> Botching up a fault retry and carrying on may well leave the page tables
>>> in a state where some later access could go to the wrong page and
>>> manifest as user space corruption. Could you try Al's patch 4 (m68k: fix
>>> livelock in uaccess) to see if this helps?
>>> ...
>> ok, this appears to be the patch:
>>
>> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
>> ---
>>   arch/m68k/mm/fault.c | 5 ++++-
>>   1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/m68k/mm/fault.c b/arch/m68k/mm/fault.c
>> index 4d2837eb3e2a..228128e45c67 100644
>> --- a/arch/m68k/mm/fault.c
>> +++ b/arch/m68k/mm/fault.c
>> @@ -138,8 +138,11 @@ int do_page_fault(struct pt_regs *regs, unsigned long address,
>>       fault = handle_mm_fault(vma, address, flags, regs);
>>       pr_debug("handle_mm_fault returns %x\n", fault);
>>
>> -    if (fault_signal_pending(fault, regs))
>> +    if (fault_signal_pending(fault, regs)) {
>> +        if (!user_mode(regs))
>> +            goto no_context;
>>           return 0;
>> +    }
>>
>>       /* The fault is fully completed (including releasing mmap lock) */
>>       if (fault & VM_FAULT_COMPLETED)
>
> That's correct.
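
To spell out what the hunk changes, here is a minimal user-space sketch of the decision made right after handle_mm_fault(). It is only a model: the enum, the function name and the two boolean parameters are made up for illustration and stand in for the results of fault_signal_pending(fault, regs) and user_mode(regs); none of them are kernel identifiers.

```c
#include <stdbool.h>

/* Hypothetical outcomes, for illustration only (not kernel identifiers). */
enum fault_outcome {
    OUTCOME_CONTINUE,   /* nothing pending: carry on with normal fault processing */
    OUTCOME_RETURN_0,   /* fault came from user mode: return 0 as before the patch */
    OUTCOME_NO_CONTEXT  /* fault came from kernel mode: take the no_context path */
};

/*
 * Models the patched branch. 'sig_pending' stands in for
 * fault_signal_pending(fault, regs), 'from_user' for user_mode(regs).
 */
enum fault_outcome after_handle_mm_fault(bool sig_pending, bool from_user)
{
    if (sig_pending) {
        if (!from_user)
            return OUTCOME_NO_CONTEXT;  /* new with the patch */
        return OUTCOME_RETURN_0;        /* unchanged behaviour */
    }
    return OUTCOME_CONTINUE;
}
```

Before the patch, the kernel-mode case also ended in the plain "return 0", which is the case the patch title (livelock in uaccess) refers to.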
>
> Your results show improvement but the problem does not entirely go away.
>
> Looking at differences between 030 and 040/060 fault handling, it
> appears only 030 handles faults corrected by exception tables (such as
> those used in uaccess macros) specially, i.e. by aborting bus error
> processing, while 040 and 060 carry on in the fault handler.
>
> I wonder if that's the main difference between 030 and 040 behaviour?
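
For readers not familiar with exception tables, the general idea can be sketched as a toy user-space model: each entry pairs the address of an instruction that is allowed to fault with the address of fixup code to resume at. Everything below (struct, table contents, function name) is hypothetical and only illustrates the lookup; the kernel keeps a sorted table and searches it via search_exception_tables().

```c
#include <stddef.h>

/* Toy entry: a faulting instruction address and where to resume instead. */
struct toy_ex_entry {
    unsigned long insn;   /* address of an instruction that may fault */
    unsigned long fixup;  /* address of the recovery code */
};

/* Made-up addresses, for illustration only. */
static const struct toy_ex_entry toy_table[] = {
    { 0x1000UL, 0x9000UL },
    { 0x1040UL, 0x9010UL },
};

/* Return the fixup address registered for 'pc', or 0 if there is none. */
unsigned long toy_search_fixup(unsigned long pc)
{
    size_t n = sizeof(toy_table) / sizeof(toy_table[0]);

    for (size_t i = 0; i < n; i++) {
        if (toy_table[i].insn == pc)
            return toy_table[i].fixup;
    }
    return 0;
}
```

A kernel-mode fault whose PC has an entry is recoverable (the uaccess macro returns an error); one without an entry is a genuine kernel bug.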

Following the 040 code a bit further, I suspect that happens in the 040 
writeback handler, so this may be a red herring.

> I'll try and log such accesses caught by exception tables on 030 to see
> if they are rare enough to allow adding a kernel log message...

Looks like this kind of event is rare enough to not trigger in a normal 
boot on my 030. Please give the attached patch a try so we can confirm 
(or rule out) that user space access faults from kernel mode are to 
blame for your stack smashes.

Cheers,

	Michael


> Cheers,
>
>     Michael
>
>