From: "Rafał Miłecki" <zajec5@gmail.com>
To: Florian Fainelli <florian.fainelli@broadcom.com>,
Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@redhat.com>, Will Deacon <will@kernel.org>,
Waiman Long <longman@redhat.com>,
Boqun Feng <boqun.feng@gmail.com>,
Russell King <linux@armlinux.org.uk>,
Daniel Lezcano <daniel.lezcano@linaro.org>,
Thomas Gleixner <tglx@linutronix.de>,
Florian Fainelli <f.fainelli@gmail.com>,
linux-clk@vger.kernel.org,
"linux-arm-kernel@lists.infradead.org"
<linux-arm-kernel@lists.infradead.org>,
Network Development <netdev@vger.kernel.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Cc: OpenWrt Development List <openwrt-devel@lists.openwrt.org>,
bcm-kernel-feedback-list <bcm-kernel-feedback-list@broadcom.com>
Subject: Re: ARM board lockups/hangs triggered by locks and mutexes
Date: Wed, 2 Aug 2023 09:02:11 +0200
Message-ID: <ded9eba2-3a50-0d75-f8be-80feac788d24@gmail.com>
In-Reply-To: <8fcbe485-ce6b-c500-56da-77cae0c872ff@broadcom.com>
On 2.08.2023 00:25, Florian Fainelli wrote:
> Hi Rafal,
>
> On 8/1/23 15:10, Rafał Miłecki wrote:
>> Hi,
>>
>> Years ago I added support for Broadcom's BCM53573 SoCs. We released
>> firmware images based on Linux 4.4 (and later 4.14) that worked almost
>> flawlessly. There was one small issue we couldn't debug or fix: random
>> hangs and reboots. They were too rare to track down (most devices worked
>> fine for weeks or months).
>>
>> Recently I updated my stable 5.4 kernel and started experiencing
>> stability issues myself! After some uptime (usually 0 to 20 minutes
>> with close to zero activity) the serial console hangs. I can't type
>> anything and I stop getting any messages. I have to wait about a minute
>> for the watchdog to kick in and reboot the device.
>>
>> #####
>>
>> I took that great opportunity and decided to track down the regression.
>>
>> The Linux 5.4 stable branch was stable up to release v5.4.197.
>> Starting with v5.4.198 I experienced those stability issues. I
>> bisected it down to commit 4460066eb248 ("ipv6: fix locking issues
>> with loops over idev->addr_list"):
>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.4.y&id=4460066eb2480b9e203c73755e12e2efc820a27e
>>
>> With the above commit reverted I was able to use the stable 5.4 branch
>> up to release v5.4.207. Starting with v5.4.208 it became unstable again.
>> I bisected it down to:
>> commit d0d583484d2e ("locking/refcount: Consolidate implementations of
>> refcount_t")
>> commit dab787c73f6e ("locking/refcount: Consolidate
>> REFCOUNT_{MAX,SATURATED} definitions")
>> commit 0d3182fbe689 ("locking/refcount: Move saturation warnings out of line")
>> commit 809554147d60 ("locking/refcount: Improve performance of generic
>> REFCOUNT_FULL code")
>> commit 9c9269977f03 ("locking/refcount: Move the bulk of the
>> REFCOUNT_FULL implementation into the <linux/refcount.h> header")
>> commit 04bff7d7b808 ("locking/refcount: Remove unused
>> refcount_*_checked() variants")
>> commit 513b19a43bec ("locking/refcount: Ensure integer operands are
>> treated as signed")
>> commit 68b4ee68e8c8 ("locking/refcount: Define constants for
>> saturation and max refcount values")
>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.4.y&id=d0d583484d2ed9f5903edbbfa7e2a68f78b950b0
>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.4.y&id=dab787c73f6e38d8e7ed3c1e683385e8f0fe28a2
>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.4.y&id=0d3182fbe689e3808c03b6cde6be98237f9e0a4a
>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.4.y&id=809554147d609163cfbaf815c443c575b538a7ef
>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.4.y&id=9c9269977f03ab9c448c8b71581a951e0eb4fb7b
>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.4.y&id=04bff7d7b8081c4bb2e8171be31d33df297eee5b
>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.4.y&id=513b19a43becee5f7af6d283bb9d3d241a8a21a8
>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.4.y&id=68b4ee68e8c8800cf8d6b61cc74b4031a0742a4c
>> (I didn't actually test the above commits individually.)
>>
>> Reverting the above locking/refcount commits worked fine for a few
>> releases: up to v5.4.219. Starting with v5.4.220 I got hangs again. I
>> bisected that down to commit 131287ff833d ("once: add DO_ONCE_SLOW() for
>> sleepable contexts").
>>
>> Reverting that extra commit in v5.4.238 allows me to run Linux for
>> hours again (currently 3 devices x 6 hours and counting). So in total I
>> need 10+1 reverts on the 5.4 branch to get a stable kernel.
>>
>> #####
>>
>> I'm out of ideas at this point. Is it possible that the kernel has some
>> locking bug I can hit only on this specific SoC? BCM53573 SoCs have a
>> single ARM Cortex-A7 CPU running at 900 MHz. The only unusual thing about
>> this hardware I can think of is the slow arch timer running at 36.8 kHz.
>
> From the look of it, it seems like the CPU might have bugs with atomics?
>
> Your log indicates that your Cortex-A7 is r0p5, which is documented as susceptible to ARM_ERRATA_814220. Do you have it enabled by any chance? If not, can you enable it and see if it makes any difference?
I had it disabled. Unfortunately CONFIG_ARM_ERRATA_814220=y doesn't help.