From: Tom Rini <trini@konsulko.com>
To: "Pali Rohár" <pali@kernel.org>
Cc: Stefan Roese <sr@denx.de>, u-boot@lists.denx.de
Subject: Re: Broken watchdog in u-boot master branch
Date: Mon, 10 Oct 2022 15:24:13 -0400
Message-ID: <20221010192413.GS2020586@bill-the-cat>
In-Reply-To: <20221010181425.GR2020586@bill-the-cat>
On Mon, Oct 10, 2022 at 02:14:25PM -0400, Tom Rini wrote:
> On Mon, Oct 10, 2022 at 08:01:23PM +0200, Pali Rohár wrote:
> > On Monday 10 October 2022 13:56:10 Tom Rini wrote:
> > > On Mon, Oct 10, 2022 at 07:44:05PM +0200, Pali Rohár wrote:
> > > > On Monday 10 October 2022 13:40:38 Tom Rini wrote:
> > > > > On Mon, Oct 10, 2022 at 07:22:56PM +0200, Pali Rohár wrote:
> > > > > > On Monday 10 October 2022 12:28:18 Tom Rini wrote:
> > > > > > > On Sun, Oct 09, 2022 at 09:12:25PM +0200, Pali Rohár wrote:
> > > > > > > > Hello! Watchdog code seems to be broken in u-boot master branch.
> > > > > > > > On Nokia N900 I'm getting following message in qemu:
> > > > > > > >
> > > > > > > > cyclic function rx51_watchdog took too long: 10000us vs 1000us max, disabling
> > > > > > > >
> > > > > > > > Seems that watchdog core code is not prepared for "slower" watchdogs
> > > > > > > > which communicate over slower i2c bus, like it is the case for N900.
> > > > > > > >
> > > > > > > > Disabling slower watchdog is a bad idea as it would result in reboot
> > > > > > > > loop instead of slower - but working code.
> > > > > > >
> > > > > > > So, looking at this in more detail, we have
> > > > > > > CONFIG_CYCLIC_MAX_CPU_TIME_US as a configuration option (which is where
> > > > > > > the too long comes from). And picking a random CI run:
> > > > > > > https://source.denx.de/u-boot/u-boot/-/jobs/511177
> > > > > > > I do see we hit this in CI once, but not every time, QEMU runs here. Is
> > > > > > > that the max time is configurable enough to satisfy your concerns here?
> > > > > >
> > > > > > It is needed to investigate, how to _properly_ fix this issue, not just
> > > > > > workarounded it. Probably other boards may be affected.
> > > > >
> > > > > So it's the cyclic watchdog code, which we merged as early as possible
> > > > > that's the reason here. And it was merged as early as we could to see if
> > > > > there's problems. Are there problems? We're seeing "system too slow,
> > > > > disabling" on QEMU, sometimes, and the value of too slow is
> > > > > configurable. I know you reported other problems with n900 HW, so we
> > > > > can't see if it's failing there
> > > >
> > > > I was tested it with older asm code (as described in that other email,
> > > > via git checkout commit -- file) on n900 HW and watchdog problem is
> > > > there too. Phone reboots in about 20 seconds. But as I do not have
> > > > serial console, I do not know if that "disabling" message is printed
> > > > there too (but I guess it is).
> > >
> > > I think I'm a bit baffled at this point, honestly. The watchdog timeout
> > > is 60 seconds. If you're confident in it being about 20 seconds,
> > > consistently, changing WATCHDOG_TIMEOUT_MSECS to say 10000 (so, 10
> > > seconds) should let you see if U-Boot has configured the watchdog and
> > > it's being tripped, or if it's still at the prior stage value.
> >
> > $ git grep CONFIG_WATCHDOG_TIMEOUT_MSECS configs/nokia_rx51_defconfig
> > configs/nokia_rx51_defconfig:CONFIG_WATCHDOG_TIMEOUT_MSECS=31000
> >
> > Also watchdog is started by NOLO (which loads and execute U-Boot) so
> > there can be some smaller timeout.
> >
> > So I have feeling that on the real HW is same issue. cyclic code
> > disabled watchdog kicking and then watchdog restarted phone.
> >
> > I do not remember exact time (if it is 20s or 25s; I have not measured
> > it precisely), but it sounds plausible.
>
> OK, so what happens if you increase CONFIG_CYCLIC_MAX_CPU_TIME_US to
> something very high (so we should still enable the watchdog and
> configure the timeout) along with CONFIG_WATCHDOG_TIMEOUT_MSECS being
> high too (so if we can't service it in time really it's so long as to be
> noticeable) ? Or CONFIG_WATCHDOG_TIMEOUT_MSECS to something much lower
> (so that if the device is resetting quicker we're crashing elsewhere) ?
OK, I tried this on my BeagleBoard xM with a small change:
diff --git a/drivers/watchdog/omap_wdt.c b/drivers/watchdog/omap_wdt.c
index ca2bc7cfb59e..f0e57b4f7286 100644
--- a/drivers/watchdog/omap_wdt.c
+++ b/drivers/watchdog/omap_wdt.c
@@ -39,7 +39,7 @@
 #include <common.h>
 #include <log.h>
 #include <watchdog.h>
-#include <asm/arch/hardware.h>
+#include <asm/ti-common/omap_wdt.h>
 #include <asm/io.h>
 #include <asm/processor.h>
 #include <asm/arch/cpu.h>
With that applied, I now see:
U-Boot SPL 2022.10-00459-g73e741b8ee46-dirty (Oct 10 2022 - 15:18:38 -0400)
Trying to boot from MMC1
U-Boot 2022.10-00459-g73e741b8ee46-dirty (Oct 10 2022 - 15:18:38 -0400)
OMAP3630/3730-GP ES1.1, CPU-OPP2, L3-200MHz, Max CPU Clock 800 MHz
Model: TI OMAP3 BeagleBoard
OMAP3 Beagle board + LPDDR/NAND
I2C: ready
DRAM: 256 MiB
Core: 45 devices, 19 uclasses, devicetree: separate
WDT: Started wdt@48314000 without servicing (60s timeout)
NAND: 0 MiB
MMC: OMAP SD/MMC: 0
Loading Environment from NAND... *** Warning - readenv() failed, using default environment
Beagle xM Rev A/B
No EEPROM on expansion board
OMAP die ID: 6e5e00211ff00000015739eb08031024
Net: No ethernet found.
Hit any key to stop autoboot: 0
So, this is as close as I can get to testing on n900 HW, and it's fine
here.
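For anyone following along, the check behind the "took too long"
message quoted at the top of the thread can be modelled roughly as
below. This is a simplified sketch, not the actual U-Boot cyclic code:
struct cyclic_sketch, cyclic_account() and MAX_CPU_TIME_US are made-up
names, with MAX_CPU_TIME_US standing in for
CONFIG_CYCLIC_MAX_CPU_TIME_US (1000 in the report).

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for CONFIG_CYCLIC_MAX_CPU_TIME_US. */
#define MAX_CPU_TIME_US 1000

/* Hypothetical, minimal model of one registered cyclic function. */
struct cyclic_sketch {
	const char *name;
	bool enabled;
};

/*
 * Account for one run of a cyclic function that took elapsed_us.
 * If the run exceeded the limit, print a message and disable the
 * entry. Returns true if the entry is still enabled afterwards.
 */
static bool cyclic_account(struct cyclic_sketch *c, uint64_t elapsed_us)
{
	if (!c->enabled)
		return false;

	if (elapsed_us > MAX_CPU_TIME_US) {
		printf("cyclic function %s took too long: %lluus vs %dus max, disabling\n",
		       c->name, (unsigned long long)elapsed_us, MAX_CPU_TIME_US);
		c->enabled = false;
	}

	return c->enabled;
}
```

In this model a single 10000us run against a 1000us limit disables the
entry for good, so a watchdog serviced that way would stop being kicked
and the hardware timeout would eventually fire. Raising the limit, as
suggested in the quoted text, would keep the entry enabled.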
--
Tom