All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Marc Hartmayer" <mhartmay@linux.ibm.com>
To: Lai Jiangshan <jiangshanlai@gmail.com>, linux-kernel@vger.kernel.org
Cc: Lai Jiangshan <jiangshan.ljs@antgroup.com>,
	Valentin Schneider <vschneid@redhat.com>,
	Tejun Heo <tj@kernel.org>, Lai Jiangshan <jiangshanlai@gmail.com>,
	Heiko Carstens <hca@linux.ibm.com>,
	Sven Schnelle <svens@linux.ibm.com>,
	Mete Durlu <meted@linux.ibm.com>,
	Christian Borntraeger <borntraeger@linux.ibm.com>
Subject: Re: [PATCH 1/4] workqueue: Reap workers via kthread_stop() and remove detach_completion
Date: Tue, 10 Sep 2024 11:45:04 +0200	[thread overview]
Message-ID: <87wmjj971b.fsf@linux.ibm.com> (raw)
In-Reply-To: <87le1sjd2e.fsf@linux.ibm.com>

On Tue, Jul 23, 2024 at 06:19 PM +0200, "Marc Hartmayer" <mhartmay@linux.ibm.com> wrote:
> On Fri, Jun 21, 2024 at 03:32 PM +0800, Lai Jiangshan <jiangshanlai@gmail.com> wrote:
>> From: Lai Jiangshan <jiangshan.ljs@antgroup.com>
>>
>> The code to kick off the destruction of workers is now in a process
>> context (idle_cull_fn()), so kthread_stop() can be used in the process
>> context to replace the work of pool->detach_completion.
>>
>> The wakeup in wake_dying_workers() is unneeded after this change, but it
>> is harmless, jut keep it here until next patch renames wake_dying_workers()
>> rather than renaming it again and again.
>>
>> Cc: Valentin Schneider <vschneid@redhat.com>
>> Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
>> ---
>>  kernel/workqueue.c | 35 +++++++++++++++++++----------------
>>  1 file changed, 19 insertions(+), 16 deletions(-)
>>
>

Hi Lai,

I’m not sure if this NULL-pointer crash is related to this patch series
or not. But it is triggered by the same test that also triggered the
other problem that I reported.

        [   23.133876] Unable to handle kernel pointer dereference in virtual kernel address space
        [   23.133950] Failing address: 0000000000000000 TEID: 0000000000000483
        [   23.133954] Fault in home space mode while using kernel ASCE.
        [   23.133957] AS:000000001b8f0007 R3:0000000056cf4007 S:0000000056cf3800 P:000000000000003d 
        [   23.134207] Oops: 0004 ilc:2 [#1] SMP 
        [   23.134273] Modules linked in: essiv authenc dm_crypt encrypted_keys loop pkey zcrypt s390_trng rng_core ghash_s390 prng chacha_s390 libchacha aes_s390 des_s390 virtio_console libdes vmw_vsock_virtio_transport vmw_vsock_virtio_transport_common sha3_512_s390 vsock sha3_256_s390 sha512_s390 sha256_s390 sha1_s390 sha_common vfio_ccw mdev vfio_iommu_type1 vfio sch_fq_codel drm i2c_core drm_panel_orientation_quirks configfs autofs4
        [   23.134386] CPU: 0 UID: 0 PID: 376 Comm: kworker/u10:2 Not tainted 6.11.0-20240902.rc6.git1.67784a74e258.300.fc40.s390x+git #1
        [   23.134394] Hardware name: IBM 8561 T01 703 (KVM/Linux)
        [   23.134406] Workqueue:  0x0 ()
        [   23.134440] Krnl PSW : 0404c00180000000 0000024e326caf28 (worker_thread+0x48/0x430)
        [   23.134471]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
        [   23.134474] Krnl GPRS: 0000000058778000 0000000000000000 0000024e00000001 0000000058778000
        [   23.134476]            0000000000000000 0000000058778000 0000000057b8e240 0000000000000002
        [   23.134480]            0000000000000000 0000000000000028 0000000000000000 0000000057b8e240
        [   23.134481]            0000000058778000 0000000058778000 0000024e326caf18 000001ce32953d88
        [   23.134499] Krnl Code: 0000024e326caf1c: acfcf0c8		stnsm	200(%r15),252
        [   23.134499]            0000024e326caf20: a7180000		lhi	%r1,0
        [   23.134499]           #0000024e326caf24: 582083ac		l	%r2,940(%r8)
        [   23.134499]           >0000024e326caf28: ba12a000		cs	%r1,%r2,0(%r10)
        [   23.134499]            0000024e326caf2c: a77400cf		brc	7,0000024e326cb0ca
        [   23.134499]            0000024e326caf30: 5800b078		l	%r0,120(%r11)
        [   23.134499]            0000024e326caf34: a7010002		tmll	%r0,2
        [   23.134499]            0000024e326caf38: a77400d4		brc	7,0000024e326cb0e0
        [   23.134516] Call Trace:
        [   23.134520]  [<0000024e326caf28>] worker_thread+0x48/0x430 
        [   23.134525] ([<0000024e326caf18>] worker_thread+0x38/0x430)
        [   23.134528]  [<0000024e326d3a3e>] kthread+0x11e/0x130 
        [   23.134533]  [<0000024e3264b0dc>] __ret_from_fork+0x3c/0x60 
        [   23.134536]  [<0000024e333fb37a>] ret_from_fork+0xa/0x38 
        [   23.134552] Last Breaking-Event-Address:
        [   23.134553]  [<0000024e333f4c04>] mutex_unlock+0x24/0x30
        [   23.134562] Kernel panic - not syncing: Fatal exception: panic_on_oops

This happened with Linux
6.11.0-20240902.rc6.git1.67784a74e258.300.fc40.s390x (using defconfig),
but also with an older commit
6.11.0-20240719.rc0.git15.720261cfc732.300.fc40.s390x on s390x (both
kernels contain your patches). I have not bisected/debugged the
problem yet, but you may have an idea already. Will try to reproduce the
problem and give you more debug information.

Thanks!

[…snip]

-- 
Kind regards / Beste Grüße
   Marc Hartmayer

IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Wolfgang Wendt
Geschäftsführung: David Faller
Sitz der Gesellschaft: Böblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294

  parent reply	other threads:[~2024-09-10  9:45 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-06-21  7:32 [PATCH 0/4] workqueue: Destroy workers in idle_cull_fn() Lai Jiangshan
2024-06-21  7:32 ` [PATCH 1/4] workqueue: Reap workers via kthread_stop() and remove detach_completion Lai Jiangshan
2024-07-23 16:19   ` Marc Hartmayer
2024-07-25  0:11     ` Lai Jiangshan
2024-07-29  1:49       ` Lai Jiangshan
2024-09-10  9:45     ` Marc Hartmayer [this message]
2024-09-10 16:29       ` Marc Hartmayer
2024-09-11  3:23         ` Lai Jiangshan
2024-09-11  3:32           ` Lai Jiangshan
2024-09-11  8:27             ` Marc Hartmayer
2024-09-11  9:37               ` Marc Hartmayer
2024-06-21  7:32 ` [PATCH 2/4] workqueue: Don't bind the rescuer in the last working cpu Lai Jiangshan
2024-06-21  7:32 ` [PATCH 3/4] workqueue: Detach workers directly in idle_cull_fn() Lai Jiangshan
2024-06-21  7:32 ` [PATCH 4/4] workqueue: Remove useless pool->dying_workers Lai Jiangshan
2024-06-21 22:34 ` [PATCH 0/4] workqueue: Destroy workers in idle_cull_fn() Tejun Heo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87wmjj971b.fsf@linux.ibm.com \
    --to=mhartmay@linux.ibm.com \
    --cc=borntraeger@linux.ibm.com \
    --cc=hca@linux.ibm.com \
    --cc=jiangshan.ljs@antgroup.com \
    --cc=jiangshanlai@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=meted@linux.ibm.com \
    --cc=svens@linux.ibm.com \
    --cc=tj@kernel.org \
    --cc=vschneid@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.