Date: Fri, 20 Nov 2020 10:30:31 +0000
From: Mark Rutland
To: Will Deacon
Subject: Re: linux-next: stall warnings and deadlock on Arm64 (was: [PATCH] kfence: Avoid stalling...)
Message-ID: <20201120103031.GB2328@C02TD0UTHF1T.local>
In-Reply-To: <20201119225352.GA5251@willie-the-truck>
Cc: linux-arm-kernel@lists.infradead.org, Marco Elver, Anders Roxell,
    "Paul E. McKenney", Jann Horn, Peter Zijlstra, Lai Jiangshan,
    Linux Kernel Mailing List, Steven Rostedt, rcu@vger.kernel.org,
    Linux-MM, Alexander Potapenko, kasan-dev, Tejun Heo, Andrew Morton,
    Dmitry Vyukov

On Thu, Nov 19, 2020 at 10:53:53PM +0000, Will Deacon wrote:
> On Thu, Nov 19, 2020 at 01:35:12PM -0800, Paul E. McKenney wrote:
> > On Thu, Nov 19, 2020 at 08:38:19PM +0100, Marco Elver wrote:
> > > On Thu, Nov 19, 2020 at 10:48AM -0800, Paul E. McKenney wrote:
> > > > On Thu, Nov 19, 2020 at 06:02:59PM +0100, Marco Elver wrote:
> > > > [ . . . ]
> >
> > > > > I can try bisection again, or reverting some commits that might be
> > > > > suspicious? But we'd need some selection of suspicious commits.
> > > >
> > > > The report claims that one of the rcu_node ->lock fields is held
> > > > with interrupts enabled, which would indeed be bad. Except that all
> > > > of the stack traces that it shows have these locks held within the
> > > > scheduling-clock interrupt handler. Now with the "rcu: Don't invoke
> > > > try_invoke_on_locked_down_task() with irqs disabled" but without the
> > > > "sched/core: Allow try_invoke_on_locked_down_task() with irqs disabled"
> > > > commit, I understand why. With both, I don't see how this happens.
> > >
> > > I'm at a loss, but happy to keep bisecting and trying patches. I'm also
> > > considering:
> > >
> > > Is it the compiler? Probably not, I tried 2 versions of GCC.
> > >
> > > Can we trust lockdep to precisely know IRQ state? I know there's
> > > been some recent work around this, but hopefully we're not
> > > affected here?
> > >
> > > Is QEMU buggy?
> > >
> > > > At this point, I am reduced to adding lockdep_assert_irqs_disabled()
> > > > calls at various points in that code, as shown in the patch below.
> > > >
> > > > At this point, I would guess that your first priority would be the
> > > > initial bug rather than this following issue, but you never know, this
> > > > might well help diagnose the initial bug.
> > >
> > > I don't mind either way. I'm worried deadlocking the whole system might
> > > be worse.
> >
> > Here is another set of lockdep_assert_irqs_disabled() calls on the
> > off-chance that they actually find something.
>
> FWIW, arm64 is known broken wrt lockdep and irq tracing atm. Mark has been
> looking at that and I think he is close to having something workable.
>
> Mark -- is there anything Marco and Paul can try out?

I initially traced some issues back to commit:

  044d0d6de9f50192 ("lockdep: Only trace IRQ edges")

... and that change of semantic could cause us to miss edges in some
cases, but IIUC mostly where we haven't done the right thing in
exception entry/return.
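
As a rough illustration of how an edge-only tracing scheme might miss a
transition, here is a toy user-space model in C. It is a sketch under
stated assumptions, not kernel code: every name in it (hw_irqs_enabled,
sw_irqs_enabled, traced_irq_disable, untraced_exception_entry,
demo_missed_edge) is hypothetical, and it does not claim to reproduce
what the commit above or the arm64 entry code actually does. It only
shows the general failure shape: if an entry path masks interrupts
without telling the tracker, a later traced disable sees no edge and the
tracker's view stays stale.

```c
#include <stdbool.h>

/* Toy model only: none of these names are real kernel symbols. */
static bool hw_irqs_enabled = true; /* what the CPU is actually doing */
static bool sw_irqs_enabled = true; /* what the tracker believes */

/* Traced disable: update the tracker only on an enabled->disabled edge. */
void traced_irq_disable(void)
{
    bool was_enabled = hw_irqs_enabled;

    hw_irqs_enabled = false;
    if (was_enabled)
        sw_irqs_enabled = false;
}

/* Traced enable: update the tracker only on a disabled->enabled edge. */
void traced_irq_enable(void)
{
    bool was_enabled = hw_irqs_enabled;

    if (!was_enabled)
        sw_irqs_enabled = true;
    hw_irqs_enabled = true;
}

/* Hypothetical entry path: IRQs get masked, but the tracker is not told. */
void untraced_exception_entry(void)
{
    hw_irqs_enabled = false;
}

/*
 * Replays the problematic sequence from a clean state and returns true
 * if the tracker's view ended up disagreeing with the hardware state.
 */
bool demo_missed_edge(void)
{
    hw_irqs_enabled = true;
    sw_irqs_enabled = true;

    untraced_exception_entry(); /* hardware masked, tracker unaware */
    traced_irq_disable();       /* already masked, so no edge is seen */

    return hw_irqs_enabled != sw_irqs_enabled;
}
```

In this model demo_missed_edge() leaves the tracker believing interrupts
are enabled while they are in fact masked, which is the kind of mismatch
that could make a checker report a lock as held with interrupts enabled
even though every stack trace is in interrupt context.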

I don't think my patches address this case yet, but my WIP (currently
just fixing user<->kernel transitions) is at:

https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/log/?h=arm64/irq-fixes

I'm looking into the kernel<->kernel transitions now, and I know that we
mess up RCU management for a small window around arch_cpu_idle, but it's
not immediately clear to me if either of those cases could cause this
report.

Thanks,
Mark.