From: stuart hayes <stuart.w.hayes@gmail.com>
To: Bert Karwatzki <spasswolf@web.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	linux-kernel@vger.kernel.org, linux-next@vger.kernel.org,
	Tejun Heo <tj@kernel.org>
Subject: Re: hung tasks on shutdown in linux-next-202409{20,23,24,25}
Date: Mon, 30 Sep 2024 16:11:55 -0500
Message-ID: <f4547877-8aa2-45a0-b05d-624eb4e2d296@gmail.com>
In-Reply-To: <20240929105329.4797-1-spasswolf@web.de>



On 9/29/2024 5:53 AM, Bert Karwatzki wrote:
> Summary: The introduction of async reboot in commit 8064952c6504
> ("driver core: shut down devices asynchronously") leads to frequent hangs on
> shutdown even after commit 4f2c346e6216 ("driver core: fix async device shutdown hang")
> is introduced.
> 
> I did some further experimenting (and lots of reboots ...) and found out that
> the bug is preemption related, for me it only occurs when using CONFIG_PREEMPT=y
> or CONFIG_PREEMPT_RT=y. When using CONFIG_PREEMPT_NONE=y or
> CONFIG_PREEMPT_VOLUNTARY=y everything works fine.
> 
> Test results (linux-next-20240925):
> PREEMPT_NONE		20 reboots, no fail
> PREEMPT_VOLUNTARY	20 reboots, no fail
> PREEMPT			3 reboots, 4th reboot failed
> PREEMPT_RT		2 reboots, 3rd reboot failed
> 
> The behaviour can be improved by increasing the number of min_active items
> in the async workqueue:
> 

Thank you for continuing to look at this! That is interesting data.

I see from an earlier message that drm_atomic_helper_dirtyfb is holding a lock when
the hang occurs:

> T115;4 locks held by kworker/7:2/343:
> T115; #0: ffff91ea00050d48 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x4a4/0x580
> T115; #1: ffffbaf182e07e58 ((work_completion)(&helper->damage_work)){+.+.}-{0:0}, at: process_one_work+0x1c7/0x580
> T115; #2: ffffbaf182e07d00 (crtc_ww_class_acquire){+.+.}-{0:0}, at: drm_atomic_helper_dirtyfb+0x47/0x280
> T115; #3: ffff91ea13b80528 (crtc_ww_class_mutex){+.+.}-{3:3}, at: modeset_lock+0xbf/0x1b0

Except for NVMe drives, the shutdown process with the async shutdown patches should be
the same as the shutdown process without them. That is, the devices should be shut
down one after the other, in the same order. The only difference is that the individual
device shutdowns are scheduled in a workqueue, where each waits for the previous device
shutdown to finish, instead of being shut down one at a time in a loop in the systemd
task.  So I'm wondering if the async shutdown could somehow be exposing some sort of race in
a display device driver's shutdown function.

A full CPU backtrace (which you could get by setting
/proc/sys/kernel/hung_task_all_cpu_backtrace to 1 before reproducing the error) would be
extremely helpful if you have the inclination... :)
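
In case it saves a lookup, here is a minimal sketch of turning that knob on.
It assumes you are root and that the kernel was built with hung task
detection (CONFIG_DETECT_HUNG_TASK, as far as I know), since the file does
not exist otherwise. The helper name and the HUNG_TASK_KNOB variable are
made up for illustration; only the /proc path comes from the paragraph above.

```shell
#!/bin/sh
# Path taken from the message above. The HUNG_TASK_KNOB override is a
# hypothetical convenience so the helper can be pointed at another file.
HUNG_TASK_KNOB="${HUNG_TASK_KNOB:-/proc/sys/kernel/hung_task_all_cpu_backtrace}"

# Hypothetical helper: write 1 to the knob, then print the value read back.
# Returns non-zero if the file is missing or not writable (e.g. not root).
enable_all_cpu_backtrace() {
    knob="${1:-$HUNG_TASK_KNOB}"
    if [ ! -w "$knob" ]; then
        echo "cannot write $knob (not root, or knob not present)" >&2
        return 1
    fi
    echo 1 > "$knob" && cat "$knob"
}
```

Calling enable_all_cpu_backtrace as root before the reboot test should be
enough. The setting does not persist across boots, so it has to be set again
on each attempt.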
