public inbox for linux-kernel@vger.kernel.org
From: "Marc Hartmayer" <mhartmay@linux.ibm.com>
To: Lai Jiangshan <jiangshanlai@gmail.com>, linux-kernel@vger.kernel.org
Cc: Lai Jiangshan <jiangshan.ljs@antgroup.com>,
	Valentin Schneider <vschneid@redhat.com>,
	Tejun Heo <tj@kernel.org>, Lai Jiangshan <jiangshanlai@gmail.com>,
	Heiko Carstens <hca@linux.ibm.com>,
	Sven Schnelle <svens@linux.ibm.com>,
	Mete Durlu <meted@linux.ibm.com>,
	Christian Borntraeger <borntraeger@linux.ibm.com>
Subject: Re: [PATCH 1/4] workqueue: Reap workers via kthread_stop() and remove detach_completion
Date: Tue, 10 Sep 2024 11:45:04 +0200
Message-ID: <87wmjj971b.fsf@linux.ibm.com>
In-Reply-To: <87le1sjd2e.fsf@linux.ibm.com>

On Tue, Jul 23, 2024 at 06:19 PM +0200, "Marc Hartmayer" <mhartmay@linux.ibm.com> wrote:
> On Fri, Jun 21, 2024 at 03:32 PM +0800, Lai Jiangshan <jiangshanlai@gmail.com> wrote:
>> From: Lai Jiangshan <jiangshan.ljs@antgroup.com>
>>
>> The code to kick off the destruction of workers is now in a process
>> context (idle_cull_fn()), so kthread_stop() can be used in the process
>> context to replace the work of pool->detach_completion.
>>
>> The wakeup in wake_dying_workers() is unneeded after this change, but it
>> is harmless, just keep it here until next patch renames wake_dying_workers()
>> rather than renaming it again and again.
>>
>> Cc: Valentin Schneider <vschneid@redhat.com>
>> Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
>> ---
>>  kernel/workqueue.c | 35 +++++++++++++++++++----------------
>>  1 file changed, 19 insertions(+), 16 deletions(-)
>>
>

Hi Lai,

I’m not sure whether this NULL-pointer crash is related to this patch
series, but it is triggered by the same test that triggered the other
problem I reported.

        [   23.133876] Unable to handle kernel pointer dereference in virtual kernel address space
        [   23.133950] Failing address: 0000000000000000 TEID: 0000000000000483
        [   23.133954] Fault in home space mode while using kernel ASCE.
        [   23.133957] AS:000000001b8f0007 R3:0000000056cf4007 S:0000000056cf3800 P:000000000000003d 
        [   23.134207] Oops: 0004 ilc:2 [#1] SMP 
        [   23.134273] Modules linked in: essiv authenc dm_crypt encrypted_keys loop pkey zcrypt s390_trng rng_core ghash_s390 prng chacha_s390 libchacha aes_s390 des_s390 virtio_console libdes vmw_vsock_virtio_transport vmw_vsock_virtio_transport_common sha3_512_s390 vsock sha3_256_s390 sha512_s390 sha256_s390 sha1_s390 sha_common vfio_ccw mdev vfio_iommu_type1 vfio sch_fq_codel drm i2c_core drm_panel_orientation_quirks configfs autofs4
        [   23.134386] CPU: 0 UID: 0 PID: 376 Comm: kworker/u10:2 Not tainted 6.11.0-20240902.rc6.git1.67784a74e258.300.fc40.s390x+git #1
        [   23.134394] Hardware name: IBM 8561 T01 703 (KVM/Linux)
        [   23.134406] Workqueue:  0x0 ()
        [   23.134440] Krnl PSW : 0404c00180000000 0000024e326caf28 (worker_thread+0x48/0x430)
        [   23.134471]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
        [   23.134474] Krnl GPRS: 0000000058778000 0000000000000000 0000024e00000001 0000000058778000
        [   23.134476]            0000000000000000 0000000058778000 0000000057b8e240 0000000000000002
        [   23.134480]            0000000000000000 0000000000000028 0000000000000000 0000000057b8e240
        [   23.134481]            0000000058778000 0000000058778000 0000024e326caf18 000001ce32953d88
        [   23.134499] Krnl Code: 0000024e326caf1c: acfcf0c8		stnsm	200(%r15),252
        [   23.134499]            0000024e326caf20: a7180000		lhi	%r1,0
        [   23.134499]           #0000024e326caf24: 582083ac		l	%r2,940(%r8)
        [   23.134499]           >0000024e326caf28: ba12a000		cs	%r1,%r2,0(%r10)
        [   23.134499]            0000024e326caf2c: a77400cf		brc	7,0000024e326cb0ca
        [   23.134499]            0000024e326caf30: 5800b078		l	%r0,120(%r11)
        [   23.134499]            0000024e326caf34: a7010002		tmll	%r0,2
        [   23.134499]            0000024e326caf38: a77400d4		brc	7,0000024e326cb0e0
        [   23.134516] Call Trace:
        [   23.134520]  [<0000024e326caf28>] worker_thread+0x48/0x430 
        [   23.134525] ([<0000024e326caf18>] worker_thread+0x38/0x430)
        [   23.134528]  [<0000024e326d3a3e>] kthread+0x11e/0x130 
        [   23.134533]  [<0000024e3264b0dc>] __ret_from_fork+0x3c/0x60 
        [   23.134536]  [<0000024e333fb37a>] ret_from_fork+0xa/0x38 
        [   23.134552] Last Breaking-Event-Address:
        [   23.134553]  [<0000024e333f4c04>] mutex_unlock+0x24/0x30
        [   23.134562] Kernel panic - not syncing: Fatal exception: panic_on_oops
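
As a first reading of the dump (a sketch, not a diagnosis): the script below
takes an excerpt of the log above with the timestamps stripped, reads the 16
general-purpose registers, and computes the effective address of the
instruction marked with '>'. It assumes that '>' marks the instruction at the
PSW address and that the GPRS block lists r0..r15 in order; the helper names
parse_gprs() and faulting_operand() are made up for this illustration.

```python
import re

# Excerpt of the oops above, timestamps removed (only the lines we need).
OOPS = """\
Krnl GPRS: 0000000058778000 0000000000000000 0000024e00000001 0000000058778000
           0000000000000000 0000000058778000 0000000057b8e240 0000000000000002
           0000000000000000 0000000000000028 0000000000000000 0000000057b8e240
           0000000058778000 0000000058778000 0000024e326caf18 000001ce32953d88
Krnl Code: 0000024e326caf1c: acfcf0c8 stnsm 200(%r15),252
          #0000024e326caf24: 582083ac l %r2,940(%r8)
          >0000024e326caf28: ba12a000 cs %r1,%r2,0(%r10)
"""


def parse_gprs(text):
    """Return the register values found between 'Krnl GPRS:' and 'Krnl Code:'."""
    start = text.index("Krnl GPRS:") + len("Krnl GPRS:")
    end = text.index("Krnl Code:")
    return [int(word, 16) for word in text[start:end].split()]


def faulting_operand(text):
    """Return (displacement, base register) of the instruction marked '>'."""
    for line in text.splitlines():
        if line.lstrip().startswith(">"):
            # Match a storage operand of the form D(%rB), e.g. 0(%r10).
            match = re.search(r"(\d+)\(%r(\d+)\)", line)
            if match is None:
                raise ValueError("marked instruction has no D(%rB) operand")
            return int(match.group(1)), int(match.group(2))
    raise ValueError("no instruction marked with '>'")


gprs = parse_gprs(OOPS)
disp, base = faulting_operand(OOPS)
address = gprs[base] + disp
print(f"base register r{base} = {gprs[base]:#x}, effective address = {address:#x}")
```

For this excerpt the base register of the marked cs instruction is r10, which
holds zero, so the computed address matches the reported failing address
0000000000000000. That only says which pointer was NULL at the time of the
fault, not why.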

This happened with Linux
6.11.0-20240902.rc6.git1.67784a74e258.300.fc40.s390x (using defconfig),
and also with an older commit,
6.11.0-20240719.rc0.git15.720261cfc732.300.fc40.s390x, on s390x (both
kernels contain your patches). I have not bisected or debugged the
problem yet, but you may already have an idea. I will try to reproduce
the problem and send you more debug information.

Thanks!

[…snip]

-- 
Kind regards / Beste Grüße
   Marc Hartmayer

IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Wolfgang Wendt
Geschäftsführung: David Faller
Sitz der Gesellschaft: Böblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294

Thread overview: 15+ messages
2024-06-21  7:32 [PATCH 0/4] workqueue: Destroy workers in idle_cull_fn() Lai Jiangshan
2024-06-21  7:32 ` [PATCH 1/4] workqueue: Reap workers via kthread_stop() and remove detach_completion Lai Jiangshan
2024-07-23 16:19   ` Marc Hartmayer
2024-07-25  0:11     ` Lai Jiangshan
2024-07-29  1:49       ` Lai Jiangshan
2024-09-10  9:45     ` Marc Hartmayer [this message]
2024-09-10 16:29       ` Marc Hartmayer
2024-09-11  3:23         ` Lai Jiangshan
2024-09-11  3:32           ` Lai Jiangshan
2024-09-11  8:27             ` Marc Hartmayer
2024-09-11  9:37               ` Marc Hartmayer
2024-06-21  7:32 ` [PATCH 2/4] workqueue: Don't bind the rescuer in the last working cpu Lai Jiangshan
2024-06-21  7:32 ` [PATCH 3/4] workqueue: Detach workers directly in idle_cull_fn() Lai Jiangshan
2024-06-21  7:32 ` [PATCH 4/4] workqueue: Remove useless pool->dying_workers Lai Jiangshan
2024-06-21 22:34 ` [PATCH 0/4] workqueue: Destroy workers in idle_cull_fn() Tejun Heo
