linuxppc-dev.lists.ozlabs.org archive mirror
From: Stijn Tintel <stijn@linux-ipv6.be>
To: linuxppc-dev@lists.ozlabs.org,
	Nicholas Piggin <npiggin@gmail.com>,
	Davidlohr Bueso <dbueso@suse.de>,
	Michael Ellerman <mpe@ellerman.id.au>,
	Christophe Leroy <christophe.leroy@csgroup.eu>,
	Paul Mackerras <paulus@samba.org>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Michael Ellerman <mpe@ellerman.id.au>
Subject: [BISECTED] power8: watchdog: CPU 3 self-detected hard LOCKUP @ queued_spin_lock_slowpath+0x154/0x2d0
Date: Wed, 22 Dec 2021 03:20:45 +0200
Message-ID: <c9abdadc-bc38-dbba-7f96-1ce15db8ab79@linux-ipv6.be>

Hi,

After upgrading my Power8 server from 5.10 LTS to 5.15 LTS, I started
experiencing CPU hard lockups, usually rather quickly after boot:


watchdog: CPU 3 self-detected hard LOCKUP @ queued_spin_lock_slowpath+0x154/0x2d0
watchdog: CPU 3 TB:265651929071, last heartbeat TB:259344820187 (12318ms ago)
watchdog: CPU 4 self-detected hard LOCKUP @ queued_spin_lock_slowpath+0x22c/0x2d0
watchdog: CPU 4 TB:265651929059, last heartbeat TB:259344820045 (12318ms ago)
watchdog: CPU 5 self-detected hard LOCKUP @ queued_spin_lock_slowpath+0x230/0x2d0
watchdog: CPU 5 TB:265651929037, last heartbeat TB:259349940303 (12308ms ago)
watchdog: CPU 6 self-detected hard LOCKUP @ queued_spin_lock_slowpath+0x144/0x2d0
watchdog: CPU 6 TB:265651929056, last heartbeat TB:259349940294 (12308ms ago)
watchdog: CPU 12 self-detected hard LOCKUP @ queued_spin_lock_slowpath+0x280/0x2d0
watchdog: CPU 12 TB:242479050267, last heartbeat TB:236822174350 (11048ms ago)
watchdog: CPU 26 self-detected hard LOCKUP @ queued_spin_lock_slowpath+0x22c/0x2d0
watchdog: CPU 26 TB:265657049348, last heartbeat TB:259355060595 (12308ms ago)
watchdog: CPU 40 self-detected hard LOCKUP @ queued_spin_lock_slowpath+0x230/0x2d0
watchdog: CPU 40 TB:265657049289, last heartbeat TB:259360180427 (12298ms ago)
watchdog: CPU 47 self-detected hard LOCKUP @ queued_spin_lock_slowpath+0x21c/0x2d0
watchdog: CPU 47 TB:265657049213, last heartbeat TB:259365300321 (12288ms ago)
watchdog: CPU 60 self-detected hard LOCKUP @ queued_spin_lock_slowpath+0x230/0x2d0
watchdog: CPU 60 TB:265651929348, last heartbeat TB:259370420527 (12268ms ago)
watchdog: CPU 72 self-detected hard LOCKUP @ queued_spin_lock_slowpath+0x230/0x2d0
watchdog: CPU 72 TB:265718488733, last heartbeat TB:259375540545 (12388ms ago)
watchdog: CPU 13 detected hard LOCKUP on other CPUs 0-2,7,10,44
watchdog: CPU 13 TB:267541867921, last SMP heartbeat TB:259380660378 (15939ms ago)
watchdog: CPU 34 self-detected hard LOCKUP @ queued_spin_lock_slowpath+0x230/0x2d0
watchdog: CPU 34 TB:269913954376, last heartbeat TB:263456144470 (12612ms ago)
watchdog: CPU 41 self-detected hard LOCKUP @ queued_spin_lock_slowpath+0x230/0x2d0
watchdog: CPU 41 TB:267865972392, last heartbeat TB:261408162383 (12612ms ago)
watchdog: CPU 74 self-detected hard LOCKUP @ queued_spin_lock_slowpath+0x230/0x2d0
watchdog: CPU 74 TB:267766470637, last heartbeat TB:261423522630 (12388ms ago)
watchdog: CPU 8 self-detected hard LOCKUP @ queued_spin_lock_slowpath+0x230/0x2d0
watchdog: CPU 8 TB:274978264599, last heartbeat TB:269237436681 (11212ms ago)
watchdog: CPU 9 self-detected hard LOCKUP @ queued_spin_lock_slowpath+0x230/0x2d0
watchdog: CPU 9 TB:268029810836, last heartbeat TB:261397922093 (12952ms ago)
watchdog: CPU 11 self-detected hard LOCKUP @ queued_spin_lock_slowpath+0x230/0x2d0
watchdog: CPU 11 TB:279685725759, last heartbeat TB:273685814104 (11718ms ago)
watchdog: CPU 16 self-detected hard LOCKUP @ queued_spin_lock_slowpath+0x230/0x2d0
watchdog: CPU 16 TB:267865972449, last heartbeat TB:261397922458 (12632ms ago)
watchdog: CPU 18 self-detected hard LOCKUP @ queued_spin_lock_slowpath+0x230/0x2d0
watchdog: CPU 18 TB:269913954314, last heartbeat TB:263445904285 (12632ms ago)
watchdog: CPU 24 self-detected hard LOCKUP @ queued_spin_lock_slowpath+0x230/0x2d0
watchdog: CPU 24 TB:267865972338, last heartbeat TB:261403042311 (12622ms ago)
watchdog: CPU 31 self-detected hard LOCKUP @ queued_spin_lock_slowpath+0x22c/0x2d0
watchdog: CPU 31 TB:268029811095, last heartbeat TB:261403042673 (12942ms ago)
watchdog: CPU 32 self-detected hard LOCKUP @ queued_spin_lock_slowpath+0x230/0x2d0
watchdog: CPU 32 TB:267865972528, last heartbeat TB:261403042589 (12622ms ago)
watchdog: CPU 33 self-detected hard LOCKUP @ queued_spin_lock_slowpath+0x230/0x2d0
watchdog: CPU 33 TB:268029811013, last heartbeat TB:261408162474 (12932ms ago)
watchdog: CPU 35 self-detected hard LOCKUP @ queued_spin_lock_slowpath+0x230/0x2d0
watchdog: CPU 35 TB:280174344471, last heartbeat TB:273696054625 (12652ms ago)
watchdog: CPU 37 self-detected hard LOCKUP @ queued_spin_lock_slowpath+0x230/0x2d0
watchdog: CPU 37 TB:269913954356, last heartbeat TB:263456144501 (12612ms ago)
watchdog: CPU 38 self-detected hard LOCKUP @ queued_spin_lock_slowpath+0x228/0x2d0
watchdog: CPU 38 TB:290393774681, last heartbeat TB:283946212510 (12592ms ago)
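
As a sanity check on the watchdog output, the TB (timebase) values can be
converted to milliseconds by hand. The sketch below assumes a 512 MHz
timebase, which is not stated anywhere in the log; the helper name
tb_delta_ms is made up for illustration. With that assumption, the
deltas match the "ms ago" values the watchdog printed:

```python
# Assumption: the timebase ticks at 512 MHz. The log does not state this.
TB_HZ = 512_000_000


def tb_delta_ms(tb_now, tb_last, tb_hz=TB_HZ):
    """Return the whole milliseconds elapsed between two timebase readings."""
    return (tb_now - tb_last) * 1000 // tb_hz


# Values copied from the CPU 3, CPU 12 and CPU 13 lines of the log.
print(tb_delta_ms(265651929071, 259344820187))  # CPU 3, log says 12318ms
print(tb_delta_ms(242479050267, 236822174350))  # CPU 12, log says 11048ms
print(tb_delta_ms(267541867921, 259380660378))  # CPU 13, log says 15939ms
```

So each CPU had gone roughly 11 to 16 seconds without a heartbeat when
the watchdog fired.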

Bisecting led to the following commit:

deb9b13eb2571fbde164ae012c77985fd14f2f02 is the first bad commit
commit deb9b13eb2571fbde164ae012c77985fd14f2f02
Author: Davidlohr Bueso <dave@stgolabs.net>
Date:   Mon Mar 8 17:59:50 2021 -0800

   powerpc/qspinlock: Use generic smp_cond_load_relaxed
   

The problem persists in 2f47a9a4dfa3674fad19a49b40c5103a9a8e1589 and
goes away if I revert deb9b13eb2571fbde164ae012c77985fd14f2f02 on top of
that. As deb9b13eb2571fbde164ae012c77985fd14f2f02 seems to be a revert
of 49a7d46a06c30c7beabbf9d1a8ea1de0f9e4fdfe, I suspect this problem
might have existed before 49a7d46a06c30c7beabbf9d1a8ea1de0f9e4fdfe. I
therefore tried to build 49a7d46a06c30c7beabbf9d1a8ea1de0f9e4fdfe and
49a7d46a06c30c7beabbf9d1a8ea1de0f9e4fdfe^1 to verify whether the problem
exists there as well. Unfortunately, these commits don't build due to
the following compile error:

kernel/smp.c: In function 'smp_init':
./include/linux/compiler.h:392:38: error: call to '__compiletime_assert_150' declared with attribute error: BUILD_BUG_ON failed: offsetof(struct task_struct, wake_entry_type) - offsetof(struct task_struct, wake_entry) != offsetof(struct __call_single_data, flags) - offsetof(struct __call_single_data, llist)
 392 |  _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
     |                                      ^

Is this report enough to revert deb9b13eb2571fbde164ae012c77985fd14f2f02
for now?

Stijn

Thread overview: 3+ messages
2021-12-22  1:20 Stijn Tintel [this message]
2021-12-25 10:31 ` [BISECTED] power8: watchdog: CPU 3 self-detected hard LOCKUP @ queued_spin_lock_slowpath+0x154/0x2d0 Nicholas Piggin
2021-12-28  8:55   ` Stijn Tintel
