From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 93D8BC77B73 for ; Fri, 5 May 2023 02:59:38 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender: Content-Transfer-Encoding:Content-Type:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:In-Reply-To:References:To:From:Subject: Cc:Message-Id:Date:Mime-Version:Reply-To:Content-ID:Content-Description: Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID: List-Owner; bh=Oo4rRY2p0773UcTKtW0JwT5oscWvHF0W6xD7D4m//H8=; b=JeA48i1kcweME4 Vi34q8bzvZVjmbZG9dm6YVUisirzvyVmk4GqG92xydfzKM3a7ktPFe8Io/milu57ZIlQHXpf2KOBy iOtr/EdsaJQylSVx2HEL0G0Ft7ZBxrWyhUVwa6UWqaqiZ0WZvORPlHezn/fqWBWMUuL/KqTQYyHMF lkdKuWCbgMKNSr10YbDAJEaaimRYVxNZDvsfzmXHf8d4CzHlCyPEkWZ2GA3+vmIWYuNw0kvjcxfj2 bx2XFC+L1kgWPIQtcVRa5yCy6V6dCQ2DPVtRVeKCRliqcQmaFb0futwc9S7NLED/xZd1usUWBT5yJ tjCE8IiuLxSQIJjafIsw==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.96 #2 (Red Hat Linux)) id 1pulea-009apV-08; Fri, 05 May 2023 02:58:40 +0000 Received: from mail-pf1-x435.google.com ([2607:f8b0:4864:20::435]) by bombadil.infradead.org with esmtps (Exim 4.96 #2 (Red Hat Linux)) id 1puleX-009aoj-0J for linux-arm-kernel@lists.infradead.org; Fri, 05 May 2023 02:58:39 +0000 Received: by mail-pf1-x435.google.com with SMTP id d2e1a72fcca58-63b67a26069so1510255b3a.0 for ; Thu, 04 May 2023 19:58:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1683255516; x=1685847516; h=in-reply-to:references:to:from:subject:cc:message-id:date :content-transfer-encoding:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=hsLRnxoIm7PGg3IcqPGZjQFk9R9fMjzy9q8PINnfXFY=; b=D+ZkMGlj3dllnlZNoGLBYfX51aE2AfVLTyNhcKpmqUnLvAJFbaLkAALNmG14suaBOo QLbP8YZx8CzaGJVHeeZYzXP1BUK7fXrqyYhzHI8O+oK0CaWG441FhtZD1bdsRFPOc5Sf NgnT0a8J+MgslAr3DsB1DKhXhB1H42WRehFISyLPfFPT3LXYWeBKy42mSw5pwV+9FNL/ extrR8E80Qcb9IeBl/BODwQYKnE3wX3Szyxj7XQ4EmKi2xIyBOKnc4bt+Doc2+dGBw9t yZ5tWt6I5j6ryZT/qURUBLIOanp7ngcWRZBa0Ps8YgveRDQa0iTZt13M4pShPjWftOLQ rFbQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1683255516; x=1685847516; h=in-reply-to:references:to:from:subject:cc:message-id:date :content-transfer-encoding:mime-version:x-gm-message-state:from:to :cc:subject:date:message-id:reply-to; bh=hsLRnxoIm7PGg3IcqPGZjQFk9R9fMjzy9q8PINnfXFY=; b=dAq+8NZEWoRtrHy9mbKZGHqghMPUvgTNLpliu4j5SILu9fakP+rl17qV3jgCSF9+ZI yf9ps4yHy7Ic7cocV9HPNL7bxSBVZw1hrx6RiksTjDTLjrmwxzuXymlowvqGqa3Vu8tj C1Nay4qbje9rFwi6jN/lThRlSdBZWzK3hwr6ycGMPzfO1XubLSrEA9U59GVUM75qmKCh IQ9uVtBbwcV1lqWgPT/RPDwTcT/fLL0O8ZXz+XW97aREZFmQHjkwJ5hXhaM48kPEW2ol vXtvD0/swt7kEyNqKCJ4ccdsRQauZ/J/N+z0HsD+4ZIS8yp2yJ0cogT8OQfqSG0ajndT GHUw== X-Gm-Message-State: AC+VfDxbQ1LPlBeVg0AcLXPkP3LvDMhbYtiLMu0FkQLnmEXraQ25h1Wg y72OG9/iQDLmMuTBcg5N9CA= X-Google-Smtp-Source: ACHHUZ68OZVP9DBpb3AsMVB/P11AAYngipplEgHSxlKuL+c359iDCmXs6WWhSER/b+lQxQyskV8vRw== X-Received: by 2002:a05:6a21:9814:b0:f2:ea8e:b130 with SMTP id ue20-20020a056a21981400b000f2ea8eb130mr3553254pzb.62.1683255516209; Thu, 04 May 2023 19:58:36 -0700 (PDT) Received: from localhost ([203.59.190.92]) by smtp.gmail.com with ESMTPSA id p17-20020aa78611000000b006242f4a8945sm422040pfn.182.2023.05.04.19.58.24 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Thu, 04 May 2023 19:58:35 -0700 (PDT) Mime-Version: 1.0 Date: Fri, 05 May 2023 12:58:22 +1000 Message-Id: Cc: "Sumit Garg" , "Mark Rutland" , "Matthias Kaehlcke" , "Stephane Eranian" , "Stephen Boyd" , , "Tzung-Bi Shih" , "Lecopzer Chen" , , "Masayoshi Mizuma" , "Guenter Roeck" , "Pingfan Liu" , "Andi Kleen" , "Ian Rogers" , , , , "Randy Dunlap" , "Chen-Yu Tsai" , , , , , "Will Deacon" , , , "Marc Zyngier" , "Catalin Marinas" , "Daniel Thompson" Subject: Re: [PATCH v4 07/17] watchdog/hardlockup: Move perf hardlockup checking/panic to common watchdog.c From: "Nicholas Piggin" To: "Douglas Anderson" , "Petr Mladek" , "Andrew Morton" X-Mailer: aerc 0.14.0 References: <20230504221349.1535669-1-dianders@chromium.org> <20230504151100.v4.7.Id4133d3183e798122dc3b6205e7852601f289071@changeid> In-Reply-To: <20230504151100.v4.7.Id4133d3183e798122dc3b6205e7852601f289071@changeid> X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20230504_195837_135073_C78BB647 X-CRM114-Status: GOOD ( 45.84 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org On Fri May 5, 2023 at 8:13 AM AEST, Douglas Anderson wrote: > The perf hardlockup detector works by looking at interrupt counts and > seeing if they change from run to run. The interrupt counts are > managed by the common watchdog code via its watchdog_timer_fn(). > > Currently the API between the perf detector and the common code is a > function: is_hardlockup(). When the hard lockup detector sees that > function return true then it handles printing out debug info and > inducing a panic if necessary. > > Let's change the API a little bit in preparation for the buddy > hardlockup detector. The buddy hardlockup detector wants to print I think the name change is a gratuitous. Especially since it's now static. watchdog_hardlockup_ is a pretty long prefix too, hardlockup_ should be enough? Seems okay otherwise though. Thanks, Nick > nearly the same debug info and have nearly the same panic > behavior. That means we want to move all that code to the common > file. For now, the code in the common file will only be there if the > perf hardlockup detector is enabled, but eventually it will be > selected by a common config. > > Right now, this _just_ moves the code from the perf detector file to > the common file and changes the names. It doesn't make the changes > that the buddy hardlockup detector will need and doesn't do any style > cleanups. A future patch will do cleanup to make it more obvious what > changed. > > With the above, we no longer have any callers of is_hardlockup() > outside of the "watchdog.c" file, so we can remove it from the header, > make it static, move it to the same "#ifdef" block as our new > watchdog_hardlockup_check(), and rename it to make it obvious it's > just for hardlockup detectors. While doing this, it can be noted that > even if no hardlockup detectors were configured the existing code used > to still have the code for counting/checking "hrtimer_interrupts" even > if the perf hardlockup detector wasn't configured. We didn't need to > do that, so move all the "hrtimer_interrupts" counting to only be > there if the perf hardlockup detector is configured as well. > > This change is expected to be a no-op. > > Signed-off-by: Douglas Anderson > --- > > Changes in v4: > - ("Move perf hardlockup checking/panic ...") new for v4. > > include/linux/nmi.h | 5 ++- > kernel/watchdog.c | 92 +++++++++++++++++++++++++++++++++--------- > kernel/watchdog_perf.c | 42 +------------------ > 3 files changed, 78 insertions(+), 61 deletions(-) > > diff --git a/include/linux/nmi.h b/include/linux/nmi.h > index 35d09d70f394..c6cb9bc5dc80 100644 > --- a/include/linux/nmi.h > +++ b/include/linux/nmi.h > @@ -15,7 +15,6 @@ > void lockup_detector_init(void); > void lockup_detector_soft_poweroff(void); > void lockup_detector_cleanup(void); > -bool is_hardlockup(void); > > extern int watchdog_user_enabled; > extern int nmi_watchdog_user_enabled; > @@ -88,6 +87,10 @@ extern unsigned int hardlockup_panic; > static inline void hardlockup_detector_disable(void) {} > #endif > > +#if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF) > +void watchdog_hardlockup_check(struct pt_regs *regs); > +#endif > + > #if defined(CONFIG_HAVE_NMI_WATCHDOG) || defined(CONFIG_HARDLOCKUP_DETECTOR) > # define NMI_WATCHDOG_SYSCTL_PERM 0644 > #else > diff --git a/kernel/watchdog.c b/kernel/watchdog.c > index c705a18b26bf..2d319cdf64b9 100644 > --- a/kernel/watchdog.c > +++ b/kernel/watchdog.c > @@ -85,6 +85,78 @@ __setup("nmi_watchdog=", hardlockup_panic_setup); > > #endif /* CONFIG_HARDLOCKUP_DETECTOR */ > > +#if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF) > + > +static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts); > +static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts_saved); > +static DEFINE_PER_CPU(bool, hard_watchdog_warn); > +static unsigned long hardlockup_allcpu_dumped; > + > +static bool watchdog_hardlockup_is_lockedup(void) > +{ > + unsigned long hrint = __this_cpu_read(hrtimer_interrupts); > + > + if (__this_cpu_read(hrtimer_interrupts_saved) == hrint) > + return true; > + > + __this_cpu_write(hrtimer_interrupts_saved, hrint); > + return false; > +} > + > +static void watchdog_hardlockup_interrupt_count(void) > +{ > + __this_cpu_inc(hrtimer_interrupts); > +} > + > +void watchdog_hardlockup_check(struct pt_regs *regs) > +{ > + /* check for a hardlockup > + * This is done by making sure our timer interrupt > + * is incrementing. The timer interrupt should have > + * fired multiple times before we overflow'd. If it hasn't > + * then this is a good indication the cpu is stuck > + */ > + if (watchdog_hardlockup_is_lockedup()) { > + int this_cpu = smp_processor_id(); > + > + /* only print hardlockups once */ > + if (__this_cpu_read(hard_watchdog_warn) == true) > + return; > + > + pr_emerg("Watchdog detected hard LOCKUP on cpu %d\n", > + this_cpu); > + print_modules(); > + print_irqtrace_events(current); > + if (regs) > + show_regs(regs); > + else > + dump_stack(); > + > + /* > + * Perform all-CPU dump only once to avoid multiple hardlockups > + * generating interleaving traces > + */ > + if (sysctl_hardlockup_all_cpu_backtrace && > + !test_and_set_bit(0, &hardlockup_allcpu_dumped)) > + trigger_allbutself_cpu_backtrace(); > + > + if (hardlockup_panic) > + nmi_panic(regs, "Hard LOCKUP"); > + > + __this_cpu_write(hard_watchdog_warn, true); > + return; > + } > + > + __this_cpu_write(hard_watchdog_warn, false); > + return; > +} > + > +#else /* CONFIG_HARDLOCKUP_DETECTOR_PERF */ > + > +static inline void watchdog_hardlockup_interrupt_count(void) { } > + > +#endif /* !CONFIG_HARDLOCKUP_DETECTOR_PERF */ > + > /* > * These functions can be overridden if an architecture implements its > * own hardlockup detector. > @@ -176,8 +248,6 @@ static DEFINE_PER_CPU(unsigned long, watchdog_touch_ts); > static DEFINE_PER_CPU(unsigned long, watchdog_report_ts); > static DEFINE_PER_CPU(struct hrtimer, watchdog_hrtimer); > static DEFINE_PER_CPU(bool, softlockup_touch_sync); > -static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts); > -static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts_saved); > static unsigned long soft_lockup_nmi_warn; > > static int __init nowatchdog_setup(char *str) > @@ -312,22 +382,6 @@ static int is_softlockup(unsigned long touch_ts, > } > > /* watchdog detector functions */ > -bool is_hardlockup(void) > -{ > - unsigned long hrint = __this_cpu_read(hrtimer_interrupts); > - > - if (__this_cpu_read(hrtimer_interrupts_saved) == hrint) > - return true; > - > - __this_cpu_write(hrtimer_interrupts_saved, hrint); > - return false; > -} > - > -static void watchdog_interrupt_count(void) > -{ > - __this_cpu_inc(hrtimer_interrupts); > -} > - > static DEFINE_PER_CPU(struct completion, softlockup_completion); > static DEFINE_PER_CPU(struct cpu_stop_work, softlockup_stop_work); > > @@ -359,7 +413,7 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer) > return HRTIMER_NORESTART; > > /* kick the hardlockup detector */ > - watchdog_interrupt_count(); > + watchdog_hardlockup_interrupt_count(); > > /* kick the softlockup detector */ > if (completion_done(this_cpu_ptr(&softlockup_completion))) { > diff --git a/kernel/watchdog_perf.c b/kernel/watchdog_perf.c > index c3d8ceb149da..5f3651b87ee7 100644 > --- a/kernel/watchdog_perf.c > +++ b/kernel/watchdog_perf.c > @@ -20,13 +20,11 @@ > #include > #include > > -static DEFINE_PER_CPU(bool, hard_watchdog_warn); > static DEFINE_PER_CPU(bool, watchdog_nmi_touch); > static DEFINE_PER_CPU(struct perf_event *, watchdog_ev); > static DEFINE_PER_CPU(struct perf_event *, dead_event); > static struct cpumask dead_events_mask; > > -static unsigned long hardlockup_allcpu_dumped; > static atomic_t watchdog_cpus = ATOMIC_INIT(0); > > notrace void arch_touch_nmi_watchdog(void) > @@ -122,45 +120,7 @@ static void watchdog_overflow_callback(struct perf_event *event, > if (!watchdog_check_timestamp()) > return; > > - /* check for a hardlockup > - * This is done by making sure our timer interrupt > - * is incrementing. The timer interrupt should have > - * fired multiple times before we overflow'd. If it hasn't > - * then this is a good indication the cpu is stuck > - */ > - if (is_hardlockup()) { > - int this_cpu = smp_processor_id(); > - > - /* only print hardlockups once */ > - if (__this_cpu_read(hard_watchdog_warn) == true) > - return; > - > - pr_emerg("Watchdog detected hard LOCKUP on cpu %d\n", > - this_cpu); > - print_modules(); > - print_irqtrace_events(current); > - if (regs) > - show_regs(regs); > - else > - dump_stack(); > - > - /* > - * Perform all-CPU dump only once to avoid multiple hardlockups > - * generating interleaving traces > - */ > - if (sysctl_hardlockup_all_cpu_backtrace && > - !test_and_set_bit(0, &hardlockup_allcpu_dumped)) > - trigger_allbutself_cpu_backtrace(); > - > - if (hardlockup_panic) > - nmi_panic(regs, "Hard LOCKUP"); > - > - __this_cpu_write(hard_watchdog_warn, true); > - return; > - } > - > - __this_cpu_write(hard_watchdog_warn, false); > - return; > + watchdog_hardlockup_check(regs); > } > > static int hardlockup_detector_event_create(void) > -- > 2.40.1.521.gf1e218fcd8-goog _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel