Date: Fri, 30 Sep 2022 10:23:50 +0100
Message-ID: <86bkqx6wrd.wl-maz@kernel.org>
From: Marc Zyngier <maz@kernel.org>
To: Zhang Xincheng <zhangxincheng@uniontech.com>
Cc: tglx@linutronix.de, linux-kernel@vger.kernel.org, oleksandr@natalenko.name, hdegoede@redhat.com, bigeasy@linutronix.de, mark.rutland@arm.com, michael@walle.cc
Subject: Re: [PATCH] interrupt: discover and disable very frequent interrupts
In-Reply-To: <20220930064042.14564-1-zhangxincheng@uniontech.com>
References: <20220930064042.14564-1-zhangxincheng@uniontech.com>

On Fri, 30 Sep 2022 07:40:42 +0100,
Zhang Xincheng wrote:
>
> From: zhangxincheng
>
> In some cases, a peripheral's interrupt will be triggered frequently,
> which will keep the CPU processing the interrupt and eventually cause
> the RCU to report rcu_sched self-detected stall on the CPU.
>
> [ 838.131628] rcu: INFO: rcu_sched self-detected stall on CPU
> [ 838.137189] rcu: 0-....: (194839 ticks this GP) idle=f02/1/0x4000000000000004
> softirq=9993/9993 fqs=97428
> [ 838.146912] rcu: (t=195015 jiffies g=6773 q=0)
> [ 838.151516] Task dump for CPU 0:
> [ 838.154730] systemd-sleep R running task 0 3445 1 0x0000000a
>
> Signed-off-by: zhangxincheng
> Change-Id: I9c92146f2772eae383c16c8c10de028b91e07150
> Signed-off-by: zhangxincheng

Irrespective of the patch itself, I would really like to understand
why you consider that it is a better course of action to kill a device
(and potentially the whole machine) than to let the storm eventually
calm down? A frequent interrupt is not necessarily the sign of
something going wrong. It is the sign of a busy system. I prefer my
systems busy rather than dead.

Furthermore, I see no rationale here about the number of interrupts
that *you* consider as being "too many" over what period of time (it
seems to me that both parameters are firmly hardcoded).

Something like this should be limited to a debug feature. It would
also be a lot more useful if it was built as an interrupt *limiting*
feature, rather than killing the interrupt forever (which is IMHO a
ludicrous thing to do).

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.
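
To make the "limiting rather than killing" suggestion concrete, here is a
minimal user-space sketch of a windowed throttle. It is only an illustration
under stated assumptions: the structure and function names (irq_throttle,
irq_throttle_allow, simulate_storm) are hypothetical, it is not kernel code,
and it does not use or describe the genirq API. The point is that both the
window length and the per-window budget are parameters rather than hardcoded
values, and that a throttled source gets a fresh budget once the window
expires instead of being disabled forever.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-interrupt throttle state; not a kernel structure. */
struct irq_throttle {
	uint64_t window_ns;      /* length of the counting window (tunable) */
	uint32_t max_per_window; /* events allowed per window (tunable) */
	uint64_t window_start;   /* time at which the current window began */
	uint32_t count;          /* events accepted in the current window */
	uint64_t throttled;      /* total events deferred, for debug reporting */
};

static void irq_throttle_init(struct irq_throttle *t,
			      uint64_t window_ns, uint32_t max_per_window)
{
	/* Both parameters are caller-supplied; zero would be meaningless. */
	assert(window_ns > 0 && max_per_window > 0);
	t->window_ns = window_ns;
	t->max_per_window = max_per_window;
	t->window_start = 0;
	t->count = 0;
	t->throttled = 0;
}

/*
 * Returns true if the event may be handled now, false if it should be
 * deferred (in a real system: keep the source masked until the window
 * expires). The source is never disabled permanently.
 */
static bool irq_throttle_allow(struct irq_throttle *t, uint64_t now_ns)
{
	/* Start a new window, with a fresh budget, once the old one expires. */
	if (now_ns - t->window_start >= t->window_ns) {
		t->window_start = now_ns;
		t->count = 0;
	}

	if (t->count >= t->max_per_window) {
		t->throttled++;
		return false;
	}

	t->count++;
	return true;
}

/*
 * Simulate a storm: one event every step_ns in [start_ns, end_ns).
 * Returns the number of events that were allowed through.
 */
static unsigned int simulate_storm(struct irq_throttle *t, uint64_t start_ns,
				   uint64_t end_ns, uint64_t step_ns)
{
	unsigned int handled = 0;

	for (uint64_t now = start_ns; now < end_ns; now += step_ns)
		if (irq_throttle_allow(t, now))
			handled++;

	return handled;
}
```

With a 1 ms window and a budget of 3, a simulated storm of 25 events spaced
100 us apart (simulate_storm(&t, 0, 2500000, 100000)) lets 9 events through
and defers 16; the device keeps making progress in every window.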