From: "Marc Hartmayer" <mhartmay@linux.ibm.com>
To: Lai Jiangshan <jiangshanlai@gmail.com>, linux-kernel@vger.kernel.org
Cc: Lai Jiangshan <jiangshan.ljs@antgroup.com>,
Valentin Schneider <vschneid@redhat.com>,
Tejun Heo <tj@kernel.org>, Lai Jiangshan <jiangshanlai@gmail.com>,
Heiko Carstens <hca@linux.ibm.com>,
Sven Schnelle <svens@linux.ibm.com>,
Mete Durlu <meted@linux.ibm.com>,
Christian Borntraeger <borntraeger@linux.ibm.com>
Subject: Re: [PATCH 1/4] workqueue: Reap workers via kthread_stop() and remove detach_completion
Date: Tue, 10 Sep 2024 18:29:17 +0200
Message-ID: <87tten8obm.fsf@linux.ibm.com>
In-Reply-To: <87wmjj971b.fsf@linux.ibm.com>
On Tue, Sep 10, 2024 at 11:45 AM +0200, "Marc Hartmayer" <mhartmay@linux.ibm.com> wrote:
> On Tue, Jul 23, 2024 at 06:19 PM +0200, "Marc Hartmayer" <mhartmay@linux.ibm.com> wrote:
>> On Fri, Jun 21, 2024 at 03:32 PM +0800, Lai Jiangshan <jiangshanlai@gmail.com> wrote:
>>> From: Lai Jiangshan <jiangshan.ljs@antgroup.com>
>>>
>>> The code to kick off the destruction of workers is now in a process
>>> context (idle_cull_fn()), so kthread_stop() can be used in the process
>>> context to replace the work of pool->detach_completion.
>>>
>>> The wakeup in wake_dying_workers() is unneeded after this change, but it
>>> is harmless, just keep it here until next patch renames wake_dying_workers()
>>> rather than renaming it again and again.
>>>
>>> Cc: Valentin Schneider <vschneid@redhat.com>
>>> Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
>>> ---
>>> kernel/workqueue.c | 35 +++++++++++++++++++----------------
>>> 1 file changed, 19 insertions(+), 16 deletions(-)
>>>
>>
[…snip…]
Hi Lai,
I’ve reproduced the issue on Linux commit bc83b4d1f086. Here is
the “beautified” stack trace (output of
`$LINUX/scripts/decode_stacktrace.sh vmlinux auto < dmesg.txt`).
...
[ 14.271265] Unable to handle kernel pointer dereference in virtual kernel address space
[ 14.271314] Failing address: 0000000000000000 TEID: 0000000000000483
[ 14.271317] Fault in home space mode while using kernel ASCE.
[ 14.271320] AS:000000001df84007 R3:0000000064888007 S:0000000064887800 P:000000000000003d
[ 14.271519] Oops: 0004 ilc:2 [#1] SMP
[ 14.271570] Modules linked in: essiv authenc dm_crypt encrypted_keys loop vmw_vsock_virtio_transport vmw_vsock_virtio_transport_common pkey vsock zcrypt s390_trng rng_core ghash_s390 prng chacha_s390 libchacha virtio_console aes_s390 des_s390 libdes sha3_512_s390 sha3_256_s390 sha512_s390 sha256_s390 sha1_s390 sha_common vfio_ccw mdev vfio_iommu_type1 vfio sch_fq_codel drm i2c_core drm_panel_orientation_quirks configfs autofs4
[ 14.271661] CPU: 0 UID: 0 PID: 324 Comm: kworker/u10:2 Not tainted 6.11.0-20240909.rc7.git8.bc83b4d1f086.300.fc40.s390x+git #1
[ 14.271667] Hardware name: IBM 8561 T01 701 (KVM/Linux)
[ 14.271677] Workqueue: . 0x0 ()
[ 14.271702] Krnl PSW : 0404c00180000000 000002d8c205ef28 worker_thread (./arch/s390/include/asm/atomic_ops.h:198 ./arch/s390/include/asm/spinlock.h:61 ./arch/s390/include/asm/spinlock.h:66 ./include/linux/spinlock.h:187 ./include/linux/spinlock_api_smp.h:120 kernel/workqueue.c:3346)
[ 14.271728] R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
[ 14.271730] Krnl GPRS: 00000000673f0000 0000000000000000 000002d800000001 00000000673f0000
[ 14.271732] 0000000000000000 00000000673f0000 0000000067ac5b40 0000000000000002
[ 14.271735] 0000000000000000 0000000000000028 0000000000000000 0000000067ac5b40
[ 14.271736] 00000000673f0000 00000000673f0000 000002d8c205ef18 00000258c22a7d88
[ 14.271752] Krnl Code: 000002d8c205ef1c: acfcf0c8 stnsm 200(%r15),252
objdump: '/tmp/tmp.yRzOQQynJL.o': No such file
objdump: '/tmp/tmp.yRzOQQynJL.o': No such file
All code
========
Code starting with the faulting instruction
===========================================
000002d8c205ef20: a7180000 lhi %r1,0
#000002d8c205ef24: 582083ac l %r2,940(%r8)
>000002d8c205ef28: ba12a000 cs %r1,%r2,0(%r10)
000002d8c205ef2c: a77400cf brc 7,000002d8c205f0ca
000002d8c205ef30: 5800b078 l %r0,120(%r11)
000002d8c205ef34: a7010002 tmll %r0,2
000002d8c205ef38: a77400d4 brc 7,000002d8c205f0e0
[ 14.271766] Call Trace:
[ 14.271769] worker_thread (./arch/s390/include/asm/atomic_ops.h:198 ./arch/s390/include/asm/spinlock.h:61 ./arch/s390/include/asm/spinlock.h:66 ./include/linux/spinlock.h:187 ./include/linux/spinlock_api_smp.h:120 kernel/workqueue.c:3346)
[ 14.271774] worker_thread (./arch/s390/include/asm/lowcore.h:226 ./arch/s390/include/asm/spinlock.h:61 ./arch/s390/include/asm/spinlock.h:66 ./include/linux/spinlock.h:187 ./include/linux/spinlock_api_smp.h:120 kernel/workqueue.c:3346)
[ 14.271777] kthread (kernel/kthread.c:389)
[ 14.271781] __ret_from_fork (arch/s390/kernel/process.c:62)
[ 14.271784] ret_from_fork (arch/s390/kernel/entry.S:309)
[ 14.271806] Last Breaking-Event-Address:
[ 14.271807] mutex_unlock (kernel/locking/mutex.c:549)
So it seems to me that `worker->pool` is NULL in the
`workqueue.c:worker_thread` function and this leads to the crash.
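For illustration only, here is a minimal userspace sketch (not kernel code) of why a NULL `worker->pool` would be consistent with the failing address 0000000000000000 and with `%r10` being 0 at the faulting `cs` instruction. The struct names `pool_sketch` and `worker_sketch` and the helper `lock_address()` are hypothetical stand-ins, and the sketch assumes that the lock is the first member of the pool structure, so that its offset is 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct pool_sketch {
	int lock;		/* assumption: the lock is the first member */
	int nr_workers;
};

struct worker_sketch {
	struct pool_sketch *pool;
};

/* The sketch only holds if the lock really sits at offset 0. */
_Static_assert(offsetof(struct pool_sketch, lock) == 0,
	       "lock is assumed to be the first member");

/*
 * Address that a lock operation on worker->pool->lock would touch.
 * It is computed arithmetically, so nothing is dereferenced here.
 */
static uintptr_t lock_address(const struct worker_sketch *worker)
{
	assert(worker != NULL);
	return (uintptr_t)worker->pool + offsetof(struct pool_sketch, lock);
}
```

With `worker->pool == NULL`, `lock_address()` yields 0, so a compare-and-swap on the lock word would target address 0, which would match the oops above if the assumption about the member layout holds.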
My next step is to try to bisect the bug.
--
Kind regards / Beste Grüße
Marc Hartmayer
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Wolfgang Wendt
Geschäftsführung: David Faller
Sitz der Gesellschaft: Böblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294