Date: Tue, 5 Mar 2024 21:04:39 +0000
From: "Russell King (Oracle)"
To: Stefan Wiehler
Cc: linux-arm-kernel@lists.infradead.org
Subject: Re: Lockdep-RCU splat in ARM CPU hotplug

On Tue, Mar 05, 2024 at 05:00:06PM +0100, Stefan Wiehler wrote:
> Hi,
>
> With CONFIG_PROVE_RCU_LIST=y and by executing
>
> $ echo 0 > /sys/devices/system/cpu/cpu1/online
>
> one can trigger the following Lockdep-RCU splat on ARM (reproducible on an Orange
> Pi PC in QEMU):
>
> =============================
> WARNING: suspicious RCU usage
> 6.8.0-rc7-00001-g0db1d0ed8958 #10 Not tainted
> -----------------------------
> kernel/locking/lockdep.c:3762 RCU-list traversed in non-reader section!!
>
> other info that might help us debug this:
>
>
> RCU used illegally from offline CPU!
> rcu_scheduler_active = 2, debug_locks = 1
> no locks held by swapper/1/0.
>
> stack backtrace:
> CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.8.0-rc7-00001-g0db1d0ed8958 #10
> Hardware name: Allwinner sun8i Family
> unwind_backtrace from show_stack+0x10/0x14
> show_stack from dump_stack_lvl+0x60/0x90
> dump_stack_lvl from lockdep_rcu_suspicious+0x150/0x1a0
> lockdep_rcu_suspicious from __lock_acquire+0x11fc/0x29f8
> __lock_acquire from lock_acquire+0x10c/0x348
> lock_acquire from _raw_spin_lock_irqsave+0x50/0x6c
> _raw_spin_lock_irqsave from check_and_switch_context+0x7c/0x4a8
> check_and_switch_context from arch_cpu_idle_dead+0x10/0x7c
> arch_cpu_idle_dead from do_idle+0xbc/0x138
> do_idle from cpu_startup_entry+0x28/0x2c
> cpu_startup_entry from secondary_start_kernel+0x11c/0x124
> secondary_start_kernel from 0x401018a0
>
> Originally the splat was found on an AXM5516 with v5.15, so the issue
> presumably exists for quite some time already on all ARM boards.
>
> Lockdep-RCU is triggered by this call of raw_spin_lock_irqsave() in
> check_and_switch_context() while the CPU is already marked offline:
> https://elixir.bootlin.com/linux/v6.8-rc7/source/arch/arm/mm/context.c#L257
>
> On ARM64, we have cpu_die_early() calling rcutree_report_cpu_dead() which
> presumably prevents such a splat from occurring:
> https://elixir.bootlin.com/linux/v6.8-rc7/source/arch/arm64/kernel/smp.c#L412
>
> Simply calling rcutree_report_cpu_dead() in arch_cpu_idle_dead() on ARM seems
> to have no effect though. As my understanding of the CPU hotplugging subsystem
> on ARM is a bit limited, I would appreciate some help here.
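
To make the sequence in the backtrace above easier to follow, here is a rough
userspace toy model of it. This is only a sketch and not kernel code: every
model_* name is made up for illustration, the "lockdep" check is reduced to a
single online flag test, and the model takes the lock on every call, which the
real check_and_switch_context() may not do.

```c
/* Userspace toy model of the reported sequence. Nothing here is kernel code;
 * all model_* names are hypothetical stand-ins chosen for illustration. */
#include <stdbool.h>
#include <stdio.h>

#define MODEL_NR_CPUS 2

static bool model_cpu_online[MODEL_NR_CPUS] = { true, true };
static int model_splats; /* warnings raised since the last offline request */

/* Stand-in for the lockdep hook on lock acquisition: warn if the acquiring
 * CPU has already been marked offline. */
static void model_lock_acquire(int cpu)
{
	if (!model_cpu_online[cpu]) {
		model_splats++;
		printf("model: suspicious RCU usage from offline CPU %d\n", cpu);
	}
}

/* Stand-in for check_and_switch_context(). The model always takes the lock;
 * that is a simplifying assumption. */
static void model_check_and_switch_context(int cpu)
{
	model_lock_acquire(cpu); /* raw_spin_lock_irqsave() in the report */
	/* ... ASID bookkeeping would happen here ... */
}

/* Stand-in for the arch_cpu_idle_dead() frame in the backtrace. */
static void model_arch_cpu_idle_dead(int cpu)
{
	model_check_and_switch_context(cpu);
}

/* Stand-in for "echo 0 > .../cpuN/online": mark the CPU offline first, then
 * run the dead path on it. Returns the number of warnings raised. */
static int model_offline_cpu(int cpu)
{
	model_splats = 0;
	model_cpu_online[cpu] = false;
	model_arch_cpu_idle_dead(cpu);
	return model_splats;
}
```

In this model, model_offline_cpu(1) raises exactly one warning, because the
lock is taken after the CPU has been marked offline. It says nothing about how
the ordering should be fixed in the real code.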

So I think this is down to what check_and_switch_context() is doing.

Tracing through the paths, idle_task_exit() is called from the
arch_cpu_idle_dead() path on both 32-bit ARM and x86. So this is legal
to do (if it weren't, x86 would have problems).

idle_task_exit() calls switch_mm(), which is an arch-defined function,
and this calls check_and_switch_context(). Anything that switch_mm()
calls has to be safe to call from the arch_cpu_idle_dead() path.

We can't get rid of the spinlock in check_and_switch_context() as that
is fundamental to how the ASID handling works - removing it would cause
all sorts of races.

I don't see how we can solve this at the moment, not helped by my
limited RCU knowledge.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!