Date: Fri, 30 Sep 2022 11:37:57 +0100
Message-ID: <868rm16tbu.wl-maz@kernel.org>
From: Marc Zyngier <maz@kernel.org>
To: Zhang Xincheng <zhangxincheng@uniontech.com>
Cc: tglx@linutronix.de, linux-kernel@vger.kernel.org,
    oleksandr@natalenko.name, Hans de Goede <hdegoede@redhat.com>,
    bigeasy@linutronix.de, mark.rutland@arm.com, michael@walle.cc
Subject: Re: [PATCH] interrupt: discover and disable very frequent interrupts
References: <20220930064042.14564-1-zhangxincheng@uniontech.com>
	<86bkqx6wrd.wl-maz@kernel.org>

On Fri, 30 Sep 2022 10:57:17 +0100,
Zhang Xincheng <zhangxincheng@uniontech.com> wrote:
>
> > Irrespective of the patch itself, I would really like to understand
> > why you consider that it is a better course of action to kill a device
> > (and potentially the whole machine) than to let the storm eventually
> > calm down? A frequent interrupt is not necessarily the sign of
> > something going wrong. It is the sign of a busy system. I prefer my
> > systems busy rather than dead.
>
> Because I found that some peripherals will send interrupts to the
> CPU very frequently in some cases, and the interrupts will be
> handled correctly, which will cause the CPU to do nothing but handle
> the interrupts. At the same time, the RCU system will report the
> following logs:
>
> [  838.131628] rcu: INFO: rcu_sched self-detected stall on CPU
> [  838.137189] rcu: 0-....: (194839 ticks this GP) idle=f02/1/0x4000000000000004 softirq=9993/9993 fqs=97428
> [  838.146912] rcu: (t=195015 jiffies g=6773 q=0)
> [  838.151516] Task dump for CPU 0:
> [  838.154730] systemd-sleep   R  running task    0  3445     1 0x0000000a
> [  838.161764] Call trace:
> [  838.164198]  dump_backtrace+0x0/0x190
> [  838.167846]  show_stack+0x14/0x20
> [  838.171148]  sched_show_task+0x134/0x160
> [  838.175057]  dump_cpu_task+0x40/0x4c
> [  838.178618]  rcu_dump_cpu_stacks+0xc4/0x108
> [  838.182788]  rcu_check_callbacks+0x6e4/0x898
> [  838.187044]  update_process_times+0x2c/0x88
> [  838.191214]  tick_sched_handle.isra.5+0x3c/0x50
> [  838.195730]  tick_sched_timer+0x48/0x98
> [  838.199552]  __hrtimer_run_queues+0xec/0x2f8
> [  838.203808]  hrtimer_interrupt+0x10c/0x298
> [  838.207891]  arch_timer_handler_phys+0x2c/0x38
> [  838.212321]  handle_percpu_devid_irq+0x88/0x228
> [  838.216837]  generic_handle_irq+0x2c/0x40
> [  838.220833]  __handle_domain_irq+0x60/0xb8
> [  838.224915]  gic_handle_irq+0x7c/0x178
> [  838.228650]  el1_irq+0xb0/0x140
> [  838.231778]  __do_softirq+0x84/0x2e8
> [  838.235340]  irq_exit+0x9c/0xb8
> [  838.238468]  __handle_domain_irq+0x64/0xb8
> [  838.242550]  gic_handle_irq+0x7c/0x178
> [  838.246285]  el1_irq+0xb0/0x140
> [  838.249413]  resume_irqs+0xfc/0x148
> [  838.252888]  resume_device_irqs+0x10/0x18
> [  838.256883]  dpm_resume_noirq+0x10/0x20
> [  838.260706]  suspend_devices_and_enter+0x170/0x788
> [  838.265483]  pm_suspend+0x41c/0x4cc
> [  838.268958]  state_store+0xbc/0x160
> [  838.272433]  kobj_attr_store+0x14/0x28
> [  838.276168]  sysfs_kf_write+0x40/0x50
> [  838.279817]  kernfs_fop_write+0xcc/0x1e0
> [  838.283726]  __vfs_write+0x18/0x140
> [  838.287201]  vfs_write+0xa4/0x1b0
> [  838.290503]  ksys_write+0x4c/0xb8
> [  838.293804]  __arm64_sys_write+0x18/0x20
> [  838.297713]  el0_svc_common+0x90/0x178
> [  838.301449]  el0_svc_handler+0x9c/0xa8
> [  838.305184]  el0_svc+0x8/0xc
>
> The log is from the process of waking up a sleeping machine;
> I left the machine in this state for a night and it successfully woke up,
> and then I saw from /proc/interrupts that a GPIO interrupt triggered
> more than 13 billion times.
>
>  29: 1368200001  0  0  0  0  0  0  0  phytium_gpio6 Edge  ACPI:Event

Again: what makes you think that it is better to kill the interrupt
than suffering an RCU stall? Yes, that's a lot of interrupts. But
killing it and risking the whole system isn't an acceptable outcome.

	M.

-- 
Without deviation from the norm, progress is not possible.
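[Editor's note: the runaway counter quoted above can be observed from user space by sampling /proc/interrupts twice and diffing per-IRQ totals, without any kernel change. The sketch below is illustrative only and is not the proposed patch; the function names, the one-second interval, and the 100,000/s threshold are arbitrary assumptions.]

```python
# Sketch: flag IRQs whose counters grow faster than a chosen threshold.
# THRESHOLD_PER_SEC is an illustrative value, not taken from the patch.
import re
import time

THRESHOLD_PER_SEC = 100_000

def parse_interrupts(text):
    """Map each IRQ label in /proc/interrupts-style text to its count
    summed across all CPU columns."""
    counts = {}
    for line in text.splitlines():
        # e.g. " 29: 1368200001  0 ...  phytium_gpio6 Edge  ACPI:Event"
        m = re.match(r'\s*(\S+):((?:\s+\d+)+)', line)
        if m:
            counts[m.group(1)] = sum(int(n) for n in m.group(2).split())
    return counts

def find_storms(interval=1.0, threshold=THRESHOLD_PER_SEC):
    """Return {irq: rate} for IRQs firing faster than `threshold`/sec."""
    with open('/proc/interrupts') as f:
        before = parse_interrupts(f.read())
    time.sleep(interval)
    with open('/proc/interrupts') as f:
        after = parse_interrupts(f.read())
    return {irq: (after[irq] - before.get(irq, 0)) / interval
            for irq in after
            if (after[irq] - before.get(irq, 0)) / interval > threshold}
```

Applied to the line Zhang quotes, `parse_interrupts` would report IRQ 29 with a total of 1368200001 events; sampling twice would then show whether the storm is still ongoing or has calmed down, which is exactly the distinction the thread is arguing about.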