public inbox for linux-arm-kernel@lists.infradead.org
 help / color / mirror / Atom feed
From: Florian Fainelli <florian.fainelli@broadcom.com>
To: "Rafał Miłecki" <zajec5@gmail.com>,
	"Peter Zijlstra" <peterz@infradead.org>,
	"Ingo Molnar" <mingo@redhat.com>, "Will Deacon" <will@kernel.org>,
	"Waiman Long" <longman@redhat.com>,
	"Boqun Feng" <boqun.feng@gmail.com>,
	"Russell King" <linux@armlinux.org.uk>,
	"Daniel Lezcano" <daniel.lezcano@linaro.org>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	"Florian Fainelli" <f.fainelli@gmail.com>,
	linux-clk@vger.kernel.org,
	"linux-arm-kernel@lists.infradead.org"
	<linux-arm-kernel@lists.infradead.org>,
	"Network Development" <netdev@vger.kernel.org>,
	"Linux Kernel Mailing List" <linux-kernel@vger.kernel.org>
Cc: OpenWrt Development List <openwrt-devel@lists.openwrt.org>,
	bcm-kernel-feedback-list <bcm-kernel-feedback-list@broadcom.com>
Subject: Re: ARM board lockups/hangs triggered by locks and mutexes
Date: Tue, 1 Aug 2023 15:25:09 -0700	[thread overview]
Message-ID: <8fcbe485-ce6b-c500-56da-77cae0c872ff@broadcom.com> (raw)
In-Reply-To: <CACna6rxpzDWE5-gnmpgMgfzPmmHvEGTZk4GJvJ8jLSMazh2bVA@mail.gmail.com>


[-- Attachment #1.1: Type: text/plain, Size: 4808 bytes --]

Hi Rafal,

On 8/1/23 15:10, Rafał Miłecki wrote:
> Hi,
> 
> Years ago I added support for Broadcom's BCM53573 SoCs. We released
> firmwares based on Linux 4.4 (and later on 4.14) that worked almost
> fine. There was one little issue we couldn't debug or fix: random hangs
> and reboots. They were too rare to deal with (most devices worked fine
> for weeks or months).
> 
> Recently I updated my stable kernel 5.4 and I started experiencing
> stability issues on my own! After some uptime (usually from 0 to 20
> minutes of close to zero activity) serial console hangs. I can't type
> anything and I stop getting any messages. I've to wait about a minute
> for watchdog to kick in and reboot device.
> 
> #####
> 
> I took that great chance and decided to track the regression.
> 
> Linux 5.4 stable branch worked stable up to the release v5.4.197.
> Starting with v5.4.198 I started experiencing those stability issues. I
> bisected it down to the commit 4460066eb248 ("ipv6: fix locking issues
> with loops over idev->addr_list"):
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.4.y&id=4460066eb2480b9e203c73755e12e2efc820a27e
> 
> With above commit reverted I was able to use stable 5.4 branch up to the
> release v5.4.207. Starting with v5.4.208 it got unstable again. I
> bisected it down to:
> commit d0d583484d2e ("locking/refcount: Consolidate implementations of
> refcount_t")
> commit dab787c73f6e ("locking/refcount: Consolidate
> REFCOUNT_{MAX,SATURATED} definitions")
> commit 0d3182fbe689 ("locking/refcount: Move saturation warnings out of line")
> commit 809554147d60 ("locking/refcount: Improve performance of generic
> REFCOUNT_FULL code")
> commit 9c9269977f03 ("locking/refcount: Move the bulk of the
> REFCOUNT_FULL implementation into the <linux/refcount.h> header")
> commit 04bff7d7b808 ("locking/refcount: Remove unused
> refcount_*_checked() variants")
> commit 513b19a43bec ("locking/refcount: Ensure integer operands are
> treated as signed")
> commit 68b4ee68e8c8 ("locking/refcount: Define constants for
> saturation and max refcount values")
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.4.y&id=d0d583484d2ed9f5903edbbfa7e2a68f78b950b0
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.4.y&id=dab787c73f6e38d8e7ed3c1e683385e8f0fe28a2
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.4.y&id=0d3182fbe689e3808c03b6cde6be98237f9e0a4a
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.4.y&id=809554147d609163cfbaf815c443c575b538a7ef
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.4.y&id=9c9269977f03ab9c448c8b71581a951e0eb4fb7b
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.4.y&id=04bff7d7b8081c4bb2e8171be31d33df297eee5b
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.4.y&id=513b19a43becee5f7af6d283bb9d3d241a8a21a8
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.4.y&id=68b4ee68e8c8800cf8d6b61cc74b4031a0742a4c
> (I didn't actually check above commits individually).
> 
> Reverting above locking/refcount commits worked fine for few releases:
> up to the v5.4.219. Starting with v5.4.220 I got hangs again. I bisected
> that down to the commit 131287ff833d ("once: add DO_ONCE_SLOW() for
> sleepable contexts").
> 
> Reverting that extra commit from v5.4.238 allows me to run Linux for
> hours again (currently 3 devices x 6 hours and counting). So I need in
> total 10+1 reverts from 5.4 branch to get a stable kernel.
> 
> #####
> 
> I'm clueless at this point. Is that possible kernel has some locking bug
> I can hit only using this specific SoC? BCM53573s have a single ARM
> Cortex-A7 CPU running at 900 MHz. The only unusual thing about this hw I
> can think of is a slow arch timer running at 36,8 kHz.

 From the look of it, it seems like the CPU might have bugs with atomics?

Your log indicates that your Cortex-A7 is r0p5 which is described to be 
susceptible to ARM_ERRATA_814220, do you have it enabled by any chance, 
if not, can you enable it and see if makes any difference?

> 
> I tried compiling kernel with:
> CONFIG_SOFTLOCKUP_DETECTOR=y
> CONFIG_DETECT_HUNG_TASK=y
> CONFIG_WQ_WATCHDOG=y
> but it didn't change or report anything.
> 
> Unfortunately enabling *any* of following options:
> CONFIG_DEBUG_RT_MUTEXES=y
> CONFIG_DEBUG_SPINLOCK=y
> CONFIG_DEBUG_MUTEXES=y
> seems to make locksup/hangs go away. I tried for few hours.
> 
> Sadly I don't have access to JTAG or any low level debugging interface.
> 
> Does looking at commits I reported above give anyone a hint on what may
> be going on maybe?
> 

-- 
Florian


[-- Attachment #1.2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4221 bytes --]

[-- Attachment #2: Type: text/plain, Size: 176 bytes --]

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  parent reply	other threads:[~2023-08-01 22:25 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-08-01 22:10 ARM board lockups/hangs triggered by locks and mutexes Rafał Miłecki
2023-08-01 22:21 ` Russell King (Oracle)
2023-08-02  7:00   ` Rafał Miłecki
2023-08-02  7:38     ` Rafał Miłecki
2023-08-01 22:25 ` Florian Fainelli [this message]
2023-08-02  7:02   ` Rafał Miłecki
2023-08-04 10:24 ` Rafał Miłecki
2023-08-04 11:07 ` Rafał Miłecki
2023-08-07 11:10   ` Rafał Miłecki
2023-08-07 18:34     ` Florian Fainelli
2023-08-11 10:49       ` Rafał Miłecki
2023-08-14  9:04     ` Geert Uytterhoeven
2023-08-18 20:23       ` Rafał Miłecki
2023-08-18 20:24         ` Rafał Miłecki

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8fcbe485-ce6b-c500-56da-77cae0c872ff@broadcom.com \
    --to=florian.fainelli@broadcom.com \
    --cc=bcm-kernel-feedback-list@broadcom.com \
    --cc=boqun.feng@gmail.com \
    --cc=daniel.lezcano@linaro.org \
    --cc=f.fainelli@gmail.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-clk@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux@armlinux.org.uk \
    --cc=longman@redhat.com \
    --cc=mingo@redhat.com \
    --cc=netdev@vger.kernel.org \
    --cc=openwrt-devel@lists.openwrt.org \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    --cc=will@kernel.org \
    --cc=zajec5@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox