Date: Fri, 30 Sep 2022 11:37:57 +0100
Message-ID: <868rm16tbu.wl-maz@kernel.org>
From: Marc Zyngier <maz@kernel.org>
To: Zhang Xincheng <zhangxincheng@uniontech.com>
Cc: tglx@linutronix.de, linux-kernel@vger.kernel.org,
    oleksandr@natalenko.name, Hans de Goede <hdegoede@redhat.com>,
    bigeasy@linutronix.de, mark.rutland@arm.com, michael@walle.cc
Subject: Re: [PATCH] interrupt: discover and disable very frequent interrupts
References: <20220930064042.14564-1-zhangxincheng@uniontech.com>
	<86bkqx6wrd.wl-maz@kernel.org>

On Fri, 30 Sep 2022 10:57:17 +0100,
Zhang Xincheng <zhangxincheng@uniontech.com> wrote:
>
> > Irrespective of the patch itself, I would really like to understand
> > why you consider that it is a better course of action to kill a device
> > (and potentially the whole machine) than to let the storm eventually
> > calm down? A frequent interrupt is not necessarily the sign of
> > something going wrong. It is the sign of a busy system. I prefer my
> > systems busy rather than dead.
>
> Because I found that some peripherals will send interrupts to the
> CPU very frequently in some cases, and the interrupts will be
> handled correctly, which will cause the CPU to do nothing but handle
> the interrupts. At the same time, the RCU system will report the
> following logs:
>
> [  838.131628] rcu: INFO: rcu_sched self-detected stall on CPU
> [  838.137189] rcu: 0-....: (194839 ticks this GP) idle=f02/1/0x4000000000000004 softirq=9993/9993 fqs=97428
> [  838.146912] rcu: (t=195015 jiffies g=6773 q=0)
> [  838.151516] Task dump for CPU 0:
> [  838.154730] systemd-sleep   R  running task    0  3445     1 0x0000000a
> [  838.161764] Call trace:
> [  838.164198]  dump_backtrace+0x0/0x190
> [  838.167846]  show_stack+0x14/0x20
> [  838.171148]  sched_show_task+0x134/0x160
> [  838.175057]  dump_cpu_task+0x40/0x4c
> [  838.178618]  rcu_dump_cpu_stacks+0xc4/0x108
> [  838.182788]  rcu_check_callbacks+0x6e4/0x898
> [  838.187044]  update_process_times+0x2c/0x88
> [  838.191214]  tick_sched_handle.isra.5+0x3c/0x50
> [  838.195730]  tick_sched_timer+0x48/0x98
> [  838.199552]  __hrtimer_run_queues+0xec/0x2f8
> [  838.203808]  hrtimer_interrupt+0x10c/0x298
> [  838.207891]  arch_timer_handler_phys+0x2c/0x38
> [  838.212321]  handle_percpu_devid_irq+0x88/0x228
> [  838.216837]  generic_handle_irq+0x2c/0x40
> [  838.220833]  __handle_domain_irq+0x60/0xb8
> [  838.224915]  gic_handle_irq+0x7c/0x178
> [  838.228650]  el1_irq+0xb0/0x140
> [  838.231778]  __do_softirq+0x84/0x2e8
> [  838.235340]  irq_exit+0x9c/0xb8
> [  838.238468]  __handle_domain_irq+0x64/0xb8
> [  838.242550]  gic_handle_irq+0x7c/0x178
> [  838.246285]  el1_irq+0xb0/0x140
> [  838.249413]  resume_irqs+0xfc/0x148
> [  838.252888]  resume_device_irqs+0x10/0x18
> [  838.256883]  dpm_resume_noirq+0x10/0x20
> [  838.260706]  suspend_devices_and_enter+0x170/0x788
> [  838.265483]  pm_suspend+0x41c/0x4cc
> [  838.268958]  state_store+0xbc/0x160
> [  838.272433]  kobj_attr_store+0x14/0x28
> [  838.276168]  sysfs_kf_write+0x40/0x50
> [  838.279817]  kernfs_fop_write+0xcc/0x1e0
> [  838.283726]  __vfs_write+0x18/0x140
> [  838.287201]  vfs_write+0xa4/0x1b0
> [  838.290503]  ksys_write+0x4c/0xb8
> [  838.293804]  __arm64_sys_write+0x18/0x20
> [  838.297713]  el0_svc_common+0x90/0x178
> [  838.301449]  el0_svc_handler+0x9c/0xa8
> [  838.305184]  el0_svc+0x8/0xc
>
> The log is from the process of waking up a sleeping machine;
> I left the machine in this state for a night and it successfully woke up,
> and then I saw from /proc/interrupts that a GPIO interrupt triggered
> more than 13 billion times.
>
>  29: 1368200001  0  0  0  0  0  0  0  phytium_gpio6 Edge  ACPI:Event

Again: what makes you think that it is better to kill the interrupt
than suffering an RCU stall? Yes, that's a lot of interrupts. But
killing it and risking the whole system isn't an acceptable outcome.

	M.

-- 
Without deviation from the norm, progress is not possible.
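[Editor's note: the runaway counter quoted above can be observed from user space by sampling /proc/interrupts twice and diffing per-IRQ totals, without any kernel change. The sketch below is illustrative only and is not the proposed patch; the function names, the one-second interval, and the 100,000/s threshold are arbitrary assumptions.]

```python
# Sketch: flag IRQs whose counters grow faster than a chosen threshold.
# THRESHOLD_PER_SEC is an illustrative value, not taken from the patch.
import re
import time

THRESHOLD_PER_SEC = 100_000

def parse_interrupts(text):
    """Map each IRQ label in /proc/interrupts-style text to its count
    summed across all CPU columns."""
    counts = {}
    for line in text.splitlines():
        # e.g. " 29: 1368200001  0 ...  phytium_gpio6 Edge  ACPI:Event"
        m = re.match(r'\s*(\S+):((?:\s+\d+)+)', line)
        if m:
            counts[m.group(1)] = sum(int(n) for n in m.group(2).split())
    return counts

def find_storms(interval=1.0, threshold=THRESHOLD_PER_SEC):
    """Return {irq: rate} for IRQs firing faster than `threshold`/sec."""
    with open('/proc/interrupts') as f:
        before = parse_interrupts(f.read())
    time.sleep(interval)
    with open('/proc/interrupts') as f:
        after = parse_interrupts(f.read())
    return {irq: (after[irq] - before.get(irq, 0)) / interval
            for irq in after
            if (after[irq] - before.get(irq, 0)) / interval > threshold}
```

Applied to the line Zhang quotes, `parse_interrupts` would report IRQ 29 with a total of 1368200001 events; sampling twice would then show whether the storm is still ongoing or has calmed down, which is exactly the distinction the thread is arguing about.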