From: Tejun Heo <tj@kernel.org>
To: Nicholas Piggin <npiggin@gmail.com>
Cc: "Paul E . McKenney" <paulmck@kernel.org>,
Peter Zijlstra <peterz@infradead.org>,
Lai Jiangshan <jiangshanlai@gmail.com>,
Srikar Dronamraju <srikar@linux.vnet.ibm.com>,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/4] workqueue: Improve scalability of workqueue watchdog touch
Date: Tue, 25 Jun 2024 06:57:33 -1000
Message-ID: <Znr2_fjb1UY-Rakp@slm.duckdns.org>
In-Reply-To: <20240625114249.289014-3-npiggin@gmail.com>

On Tue, Jun 25, 2024 at 09:42:45PM +1000, Nicholas Piggin wrote:
> On a ~2000 CPU powerpc system, hard lockups have been observed in the
> workqueue code when stop_machine runs (in this case due to CPU hotplug).
> This is due to lots of CPUs spinning in multi_cpu_stop, calling
> touch_nmi_watchdog() which ends up calling wq_watchdog_touch().
> wq_watchdog_touch() writes to the global variable wq_watchdog_touched,
> and that can find itself in the same cacheline as other important
> workqueue data, which slows down operations to the point of lockups.
>
> In the case of the following abridged trace, worker_pool_idr was in
> the hot line, causing the lockups to always appear at idr_find.
>
> watchdog: CPU 1125 self-detected hard LOCKUP @ idr_find
> Call Trace:
> get_work_pool
> __queue_work
> call_timer_fn
> run_timer_softirq
> __do_softirq
> do_softirq_own_stack
> irq_exit
> timer_interrupt
> decrementer_common_virt
> * interrupt: 900 (timer) at multi_cpu_stop
> multi_cpu_stop
> cpu_stopper_thread
> smpboot_thread_fn
> kthread
>
> Fix this by having wq_watchdog_touch() only write to the line if the
> last time a touch was recorded exceeds 1/4 of the watchdog threshold.
>
> Reported-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Applied 1-2 to wq/for-6.11. I think 3 and 4 should probably be routed
through either tip or Andrew?
Thanks.
--
tejun
Thread overview: 13+ messages
2024-06-25 11:42 [PATCH 0/4] Fix scalability problem in workqueue watchdog touch caused by stop_machine Nicholas Piggin
2024-06-25 11:42 ` [PATCH 1/4] workqueue: wq_watchdog_touch is always called with valid CPU Nicholas Piggin
2024-06-25 11:42 ` [PATCH 2/4] workqueue: Improve scalability of workqueue watchdog touch Nicholas Piggin
2024-06-25 16:57 ` Tejun Heo [this message]
2024-06-26 0:52 ` Nicholas Piggin
2024-06-27 12:16 ` Hillf Danton
2024-06-27 12:42 ` Waiman Long
2024-06-25 11:42 ` [PATCH 3/4] stop_machine: Rearrange multi_cpu_stop state machine loop Nicholas Piggin
2024-06-25 11:42 ` [PATCH 4/4] stop_machine: Add a delay between multi_cpu_stop touching watchdogs Nicholas Piggin
2024-06-25 14:53 ` [PATCH 0/4] Fix scalability problem in workqueue watchdog touch caused by stop_machine Paul E. McKenney
2024-06-26 0:57 ` Nicholas Piggin
2024-09-25 5:25 ` Srikar Dronamraju
2024-06-26 12:58 ` Michal Koutný