From: Finn Thain
Subject: Re: [PATCH 04/10] m68k: fix livelock in uaccess
Date: Sun, 5 Feb 2023 17:18:08 +1100 (AEDT)
Message-ID: <92a4aa45-0a7c-a389-798a-2f3e3cfa516f@linux-m68k.org>
To: Al Viro
Cc: linux-arch@vger.kernel.org, linux-alpha@vger.kernel.org,
    linux-ia64@vger.kernel.org, linux-hexagon@vger.kernel.org,
    linux-m68k@lists.linux-m68k.org, Michal Simek, Dinh Nguyen,
    openrisc@lists.librecores.org, linux-parisc@vger.kernel.org,
    linux-riscv@lists.infradead.org, sparclinux@vger.kernel.org,
    Linus Torvalds

Hello Al,

On Tue, 31 Jan 2023, Al Viro wrote:

> m68k equivalent of 26178ec11ef3 "x86: mm: consolidate VM_FAULT_RETRY
> handling"
>
> If e.g. get_user() triggers a page fault and a fatal signal is
> caught, we might end up with handle_mm_fault() returning VM_FAULT_RETRY
> and not doing anything to page tables. In such case we must *not*
> return to the faulting insn - that would repeat the entire thing without
> making any progress; what we need instead is to treat that as failed
> (user) memory access.
>
> Signed-off-by: Al Viro

That could be a bug I was chasing back in 2021 but never found. The mmap
stressors in stress-ng were triggering a crash on a Mac Quadra, though only
rarely. Sometimes it would run all day without a failure.

Last year, when I started using GCC 12 to build the kernel, I saw the same
workload fail again, but the failure mode had become a silent hang/livelock
instead of the oopses I got with GCC 6. When I press the NMI button after the
livelock, I always see do_page_fault() in the backtrace.

So I've been testing your patch. I've been running the same stress-ng
reproducer for about 12 hours now with no failures, which looks promising.

In case that stress-ng testing is of use:
Tested-by: Finn Thain

BTW, how did you identify that bug in do_page_fault()? If it's the same bug I
was chasing, it could be an old one. The stress-ng logs I collected last year
include a crash from a v4.14 build.

> ---
>  arch/m68k/mm/fault.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/arch/m68k/mm/fault.c b/arch/m68k/mm/fault.c
> index 4d2837eb3e2a..228128e45c67 100644
> --- a/arch/m68k/mm/fault.c
> +++ b/arch/m68k/mm/fault.c
> @@ -138,8 +138,11 @@ int do_page_fault(struct pt_regs *regs, unsigned long address,
>  	fault = handle_mm_fault(vma, address, flags, regs);
>  	pr_debug("handle_mm_fault returns %x\n", fault);
>  
> -	if (fault_signal_pending(fault, regs))
> +	if (fault_signal_pending(fault, regs)) {
> +		if (!user_mode(regs))
> +			goto no_context;
>  		return 0;
> +	}
>  
>  	/* The fault is fully completed (including releasing mmap lock) */
>  	if (fault & VM_FAULT_COMPLETED)
>
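To make the control flow in the patch description concrete, here is a toy user-space model in C. It is only a sketch under stated assumptions, not the real arch/m68k/mm/fault.c: toy_fault(), toy_get_user(), TOY_EFAULT, TOY_MAX_FAULTS and TOY_LIVELOCK are hypothetical names, and the model assumes (as the patch description says) that with a fatal signal pending, handle_mm_fault() returns VM_FAULT_RETRY without touching the page tables. Without the fix, a kernel-mode access simply re-runs the faulting instruction with nothing changed (capped here so the simulation ends); with the fix, the no_context path is expected to turn the access into a failed one via the exception-table fixup.

```c
#include <stdbool.h>

/* Hypothetical constants for this toy model; not kernel definitions. */
#define TOY_EFAULT     14   /* stand-in for EFAULT */
#define TOY_MAX_FAULTS 1000 /* cap so the simulated livelock terminates */
#define TOY_LIVELOCK   1    /* sentinel: the cap was hit without progress */

enum toy_action {
	TOY_PAGE_MAPPED, /* handle_mm_fault() made progress */
	TOY_RETRY_INSN,  /* "return 0": resume the interrupted context */
	TOY_NO_CONTEXT   /* "goto no_context": exception-table fixup */
};

/* Toy version of the tail of do_page_fault(). Assumption: with a fatal
 * signal pending, the fault is never resolved (VM_FAULT_RETRY, page
 * tables untouched), so fault_signal_pending() is true every time. */
enum toy_action toy_fault(bool user_mode, bool fatal_signal, bool patched)
{
	if (fatal_signal) {
		if (patched && !user_mode)
			return TOY_NO_CONTEXT;
		return TOY_RETRY_INSN;
	}
	return TOY_PAGE_MAPPED;
}

/* Toy get_user(): a kernel-mode access to user memory. Returns 0 on
 * success, -TOY_EFAULT when the fixup path runs, or TOY_LIVELOCK if the
 * same instruction faulted TOY_MAX_FAULTS times without progress. */
int toy_get_user(bool fatal_signal, bool patched)
{
	for (int n = 0; n < TOY_MAX_FAULTS; n++) {
		switch (toy_fault(false, fatal_signal, patched)) {
		case TOY_PAGE_MAPPED:
			return 0;
		case TOY_NO_CONTEXT:
			return -TOY_EFAULT;
		case TOY_RETRY_INSN:
			break; /* nothing changed, so the insn faults again */
		}
	}
	return TOY_LIVELOCK;
}
```

With this model, toy_get_user(false, false) returns 0, toy_get_user(true, false) returns TOY_LIVELOCK, and toy_get_user(true, true) returns -TOY_EFAULT. User-mode faults keep the plain "return 0" in both versions, since going back to user space lets the pending fatal signal be delivered; only kernel-mode accesses such as get_user() can spin, which is why the patch tests !user_mode(regs).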