* [PATCH RESEND v2 0/3] hung_task: extend blocking task stacktrace dump to semaphore
From: Lance Yang @ 2025-03-14 14:42 UTC
To: akpm
Cc: will, peterz, mingo, longman, mhiramat, anna.schumaker,
boqun.feng, joel.granados, kent.overstreet, leonylgao,
linux-kernel, rostedt, senozhatsky, tfiga, amaindex, Lance Yang
Hi all,
Inspired by mutex blocker tracking[1], this patch series extends the
feature to not only dump the blocker task holding a mutex but also to
support semaphores. Unlike mutexes, semaphores lack explicit ownership
tracking, making it challenging to identify the root cause of hangs. To
address this, we introduce a last_holder field to the semaphore structure,
which is updated when a task successfully calls down() and cleared during
up().
The assumption is that if a task is blocked on a semaphore, the holders
must not have released it. While this does not guarantee that the last
holder is one of the current blockers, it likely provides a practical hint
for diagnosing semaphore-related stalls.
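For illustration only (not part of the series): a minimal, single-threaded userspace sketch of the last_holder idea, assuming a toy semaphore with no spinlock and no wait list. The names toy_sem, toy_down_trylock(), toy_up() and toy_sem_demo() are hypothetical. As in the diff in patch 2/3, the hint is cleared in up() only when the releasing task is the recorded one, and the demo walks through a case where the hint is lost while another holder remains.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model only: no locking, no wait list, single-threaded. */
struct toy_sem {
	unsigned int count;
	uintptr_t last_holder;	/* 0 means "no hint available" */
};

/* Returns 0 on success, 1 if the caller would have to block. */
static int toy_down_trylock(struct toy_sem *sem, uintptr_t task)
{
	if (sem->count == 0)
		return 1;
	sem->count--;
	sem->last_holder = task;	/* record the most recent acquirer */
	return 0;
}

static void toy_up(struct toy_sem *sem, uintptr_t task)
{
	/* Clear the hint only if the releasing task is the recorded one. */
	if (sem->last_holder == task)
		sem->last_holder = 0;
	sem->count++;
}

/* Walk through a case where the hint is lost while a holder remains. */
static void toy_sem_demo(void)
{
	struct toy_sem sem = { .count = 2, .last_holder = 0 };

	assert(toy_down_trylock(&sem, 1) == 0);	/* task 1 acquires */
	assert(toy_down_trylock(&sem, 2) == 0);	/* task 2 acquires */
	assert(sem.last_holder == 2);
	assert(toy_down_trylock(&sem, 3) == 1);	/* task 3 would block */

	toy_up(&sem, 2);
	assert(sem.last_holder == 0);	/* task 1 still holds; hint is gone */
}
```

With a count greater than one, the recorded task is only the most recent acquirer, which is why the report wording says "likely last held by".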
With this change, the hung task detector can now show the blocker task's
info, as shown below:
[Thu Mar 13 15:18:38 2025] INFO: task cat:1803 blocked for more than 122 seconds.
[Thu Mar 13 15:18:38 2025] Tainted: G OE 6.14.0-rc3+ #14
[Thu Mar 13 15:18:38 2025] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Thu Mar 13 15:18:38 2025] task:cat state:D stack:0 pid:1803 tgid:1803 ppid:1057 task_flags:0x400000 flags:0x00000004
[Thu Mar 13 15:18:38 2025] Call trace:
[Thu Mar 13 15:18:38 2025] __switch_to+0x1ec/0x380 (T)
[Thu Mar 13 15:18:38 2025] __schedule+0xc30/0x44f8
[Thu Mar 13 15:18:38 2025] schedule+0xb8/0x3b0
[Thu Mar 13 15:18:38 2025] schedule_timeout+0x1d0/0x208
[Thu Mar 13 15:18:38 2025] __down_common+0x2d4/0x6f8
[Thu Mar 13 15:18:38 2025] __down+0x24/0x50
[Thu Mar 13 15:18:38 2025] down+0xd0/0x140
[Thu Mar 13 15:18:38 2025] read_dummy+0x3c/0xa0 [hung_task_sem]
[Thu Mar 13 15:18:38 2025] full_proxy_read+0xfc/0x1d0
[Thu Mar 13 15:18:38 2025] vfs_read+0x1a0/0x858
[Thu Mar 13 15:18:38 2025] ksys_read+0x100/0x220
[Thu Mar 13 15:18:38 2025] __arm64_sys_read+0x78/0xc8
[Thu Mar 13 15:18:38 2025] invoke_syscall+0xd8/0x278
[Thu Mar 13 15:18:38 2025] el0_svc_common.constprop.0+0xb8/0x298
[Thu Mar 13 15:18:38 2025] do_el0_svc+0x4c/0x88
[Thu Mar 13 15:18:38 2025] el0_svc+0x44/0x108
[Thu Mar 13 15:18:38 2025] el0t_64_sync_handler+0x134/0x160
[Thu Mar 13 15:18:38 2025] el0t_64_sync+0x1b8/0x1c0
[Thu Mar 13 15:18:38 2025] INFO: task cat:1803 blocked on a semaphore likely last held by task cat:1802
[Thu Mar 13 15:18:38 2025] task:cat state:S stack:0 pid:1802 tgid:1802 ppid:1057 task_flags:0x400000 flags:0x00000004
[Thu Mar 13 15:18:38 2025] Call trace:
[Thu Mar 13 15:18:38 2025] __switch_to+0x1ec/0x380 (T)
[Thu Mar 13 15:18:38 2025] __schedule+0xc30/0x44f8
[Thu Mar 13 15:18:38 2025] schedule+0xb8/0x3b0
[Thu Mar 13 15:18:38 2025] schedule_timeout+0xf4/0x208
[Thu Mar 13 15:18:38 2025] msleep_interruptible+0x70/0x130
[Thu Mar 13 15:18:38 2025] read_dummy+0x48/0xa0 [hung_task_sem]
[Thu Mar 13 15:18:38 2025] full_proxy_read+0xfc/0x1d0
[Thu Mar 13 15:18:38 2025] vfs_read+0x1a0/0x858
[Thu Mar 13 15:18:38 2025] ksys_read+0x100/0x220
[Thu Mar 13 15:18:38 2025] __arm64_sys_read+0x78/0xc8
[Thu Mar 13 15:18:38 2025] invoke_syscall+0xd8/0x278
[Thu Mar 13 15:18:38 2025] el0_svc_common.constprop.0+0xb8/0x298
[Thu Mar 13 15:18:38 2025] do_el0_svc+0x4c/0x88
[Thu Mar 13 15:18:38 2025] el0_svc+0x44/0x108
[Thu Mar 13 15:18:38 2025] el0t_64_sync_handler+0x134/0x160
[Thu Mar 13 15:18:38 2025] el0t_64_sync+0x1b8/0x1c0
[1] https://lore.kernel.org/all/174046694331.2194069.15472952050240807469.stgit@mhiramat.tok.corp.google.com
Thanks,
Lance
---
v1 -> v2:
* Use one field to store the blocker as only one is active at a time,
suggested by Masami
* Leverage the LSB of the blocker field to reduce memory footprint,
suggested by Masami
* Add a hung_task detector semaphore blocking test sample code
* https://lore.kernel.org/all/20250301055102.88746-1-ioworker0@gmail.com
Lance Yang (2):
hung_task: replace blocker_mutex with encoded blocker
hung_task: show the blocker task if the task is hung on semaphore
Zi Li (1):
samples: add hung_task detector semaphore blocking sample
include/linux/hung_task.h | 94 +++++++++++++++++++++++++
include/linux/sched.h | 2 +-
include/linux/semaphore.h | 15 +++-
kernel/hung_task.c | 52 +++++++++++---
kernel/locking/mutex.c | 8 ++-
kernel/locking/semaphore.c | 55 +++++++++++++--
samples/Kconfig | 11 +--
samples/hung_task/Makefile | 3 +-
samples/hung_task/hung_task_mutex.c | 20 ++++--
samples/hung_task/hung_task_semaphore.c | 74 +++++++++++++++++++
10 files changed, 301 insertions(+), 33 deletions(-)
create mode 100644 include/linux/hung_task.h
create mode 100644 samples/hung_task/hung_task_semaphore.c
--
2.45.2
* [PATCH RESEND v2 1/3] hung_task: replace blocker_mutex with encoded blocker
From: Lance Yang @ 2025-03-14 14:42 UTC
To: akpm
Cc: will, peterz, mingo, longman, mhiramat, anna.schumaker,
boqun.feng, joel.granados, kent.overstreet, leonylgao,
linux-kernel, rostedt, senozhatsky, tfiga, amaindex, Lance Yang,
Mingzhe Yang
This patch replaces 'struct mutex *blocker_mutex' with 'unsigned long
blocker', as only one blocker is active at a time.
The blocker field can store both the lock address and the lock type, with
the two least significant bits (LSBs) used to encode the type as Masami
suggested, making it easier to extend the feature to cover other types of
locks.
Also, once the lock type is determined, we can directly extract the address
and cast it to a lock pointer ;)
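For illustration only (not part of the patch): a minimal userspace sketch of the encoding, assuming lock addresses are at least 4-byte aligned so the two low bits are free. The helper names encode_blocker(), blocker_type() and blocker_lock() are hypothetical and uintptr_t stands in for the kernel's unsigned long; the real helpers are the hung_task_*() functions in the diff below.

```c
#include <assert.h>
#include <stdint.h>

#define BLOCKER_TYPE_MUTEX	((uintptr_t)0x00)
#define BLOCKER_TYPE_SEM	((uintptr_t)0x01)
#define BLOCKER_TYPE_MASK	((uintptr_t)0x03)

/* Pack a lock address and a 2-bit type tag into a single word. */
static uintptr_t encode_blocker(void *lock, uintptr_t type)
{
	uintptr_t ptr = (uintptr_t)lock;

	/* The low bits must be zero, otherwise the tag would corrupt the address. */
	assert((ptr & BLOCKER_TYPE_MASK) == 0);
	assert(type <= BLOCKER_TYPE_MASK);

	return ptr | type;
}

/* Extract the 2-bit type tag. */
static uintptr_t blocker_type(uintptr_t blocker)
{
	return blocker & BLOCKER_TYPE_MASK;
}

/* Mask off the tag to recover the original lock address. */
static void *blocker_lock(uintptr_t blocker)
{
	return (void *)(blocker & ~BLOCKER_TYPE_MASK);
}
```

A caller would check blocker_type() first and only then cast the result of blocker_lock() to the matching lock structure.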
Suggested-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Mingzhe Yang <mingzhe.yang@ly.com>
Signed-off-by: Lance Yang <ioworker0@gmail.com>
---
include/linux/hung_task.h | 94 +++++++++++++++++++++++++++++++++++++++
include/linux/sched.h | 2 +-
kernel/hung_task.c | 15 ++++---
kernel/locking/mutex.c | 8 +++-
4 files changed, 111 insertions(+), 8 deletions(-)
create mode 100644 include/linux/hung_task.h
diff --git a/include/linux/hung_task.h b/include/linux/hung_task.h
new file mode 100644
index 000000000000..64ced33b0d1f
--- /dev/null
+++ b/include/linux/hung_task.h
@@ -0,0 +1,94 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Detect Hung Task: detecting tasks stuck in D state
+ *
+ * Copyright (C) 2025 Tongcheng Travel (www.ly.com)
+ * Author: Lance Yang <mingzhe.yang@ly.com>
+ */
+#ifndef __LINUX_HUNG_TASK_H
+#define __LINUX_HUNG_TASK_H
+
+#include <linux/bug.h>
+#include <linux/sched.h>
+#include <linux/compiler.h>
+
+/*
+ * @blocker: Combines lock address and blocking type.
+ *
+ * Lock pointers are at least 4-byte aligned (32-bit) or 8-byte aligned
+ * (64-bit), which leaves the 2 least significant bits (LSBs) of the
+ * pointer always zero. So we can use these bits to encode the specific
+ * blocking type.
+ *
+ * Type encoding:
+ * 00 - Blocked on mutex (BLOCKER_TYPE_MUTEX)
+ * 01 - Blocked on semaphore (BLOCKER_TYPE_SEM)
+ * 10 - Blocked on rt-mutex (BLOCKER_TYPE_RTMUTEX)
+ * 11 - Blocked on rw-semaphore (BLOCKER_TYPE_RWSEM)
+ */
+#define BLOCKER_TYPE_MUTEX 0x00UL
+#define BLOCKER_TYPE_SEM 0x01UL
+#define BLOCKER_TYPE_RTMUTEX 0x02UL
+#define BLOCKER_TYPE_RWSEM 0x03UL
+
+#define BLOCKER_TYPE_MASK 0x03UL
+
+#ifdef CONFIG_DETECT_HUNG_TASK_BLOCKER
+static inline void hung_task_set_blocker(void *lock, unsigned long type)
+{
+ unsigned long lock_ptr = (unsigned long)lock;
+
+ WARN_ON_ONCE(!lock_ptr);
+ WARN_ON_ONCE(lock_ptr & BLOCKER_TYPE_MASK);
+ WARN_ON_ONCE(READ_ONCE(current->blocker));
+
+ /*
+ * If any BLOCKER_TYPE_MASK bit is set in the lock pointer (i.e. the
+ * pointer is misaligned), return without writing anything.
+ */
+ if (lock_ptr & BLOCKER_TYPE_MASK)
+ return;
+
+ WRITE_ONCE(current->blocker, lock_ptr | type);
+}
+
+static inline void hung_task_clear_blocker(void)
+{
+ WARN_ON_ONCE(!READ_ONCE(current->blocker));
+
+ WRITE_ONCE(current->blocker, 0UL);
+}
+
+static inline bool hung_task_blocker_is_type(unsigned long blocker,
+ unsigned long type)
+{
+ WARN_ON_ONCE(!blocker);
+
+ return (blocker & BLOCKER_TYPE_MASK) == type;
+}
+
+static inline void *hung_task_blocker_to_lock(unsigned long blocker)
+{
+ WARN_ON_ONCE(!blocker);
+
+ return (void *)(blocker & ~BLOCKER_TYPE_MASK);
+}
+#else
+static inline void hung_task_set_blocker(void *lock, unsigned long type)
+{
+}
+static inline void hung_task_clear_blocker(void)
+{
+}
+static inline bool hung_task_blocker_is_type(unsigned long blocker,
+ unsigned long type)
+{
+ return false;
+}
+static inline void *hung_task_blocker_to_lock(unsigned long blocker)
+{
+ return NULL;
+}
+#endif
+
+#endif /* __LINUX_HUNG_TASK_H */
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 1419d94c8e87..f27060dac499 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1218,7 +1218,7 @@ struct task_struct {
#endif
#ifdef CONFIG_DETECT_HUNG_TASK_BLOCKER
- struct mutex *blocker_mutex;
+ unsigned long blocker;
#endif
#ifdef CONFIG_DEBUG_ATOMIC_SLEEP
diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index dc898ec93463..46eb6717564d 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -25,6 +25,10 @@
#include <trace/events/sched.h>
+#ifdef CONFIG_DETECT_HUNG_TASK_BLOCKER
+#include <linux/hung_task.h>
+#endif
+
/*
* The number of tasks checked:
*/
@@ -98,16 +102,17 @@ static struct notifier_block panic_block = {
static void debug_show_blocker(struct task_struct *task)
{
struct task_struct *g, *t;
- unsigned long owner;
- struct mutex *lock;
+ unsigned long owner, blocker;
RCU_LOCKDEP_WARN(!rcu_read_lock_held(), "No rcu lock held");
- lock = READ_ONCE(task->blocker_mutex);
- if (!lock)
+ blocker = READ_ONCE(task->blocker);
+ if (!blocker || !hung_task_blocker_is_type(blocker, BLOCKER_TYPE_MUTEX))
return;
- owner = mutex_get_owner(lock);
+ owner = mutex_get_owner(
+ (struct mutex *)hung_task_blocker_to_lock(blocker));
+
if (unlikely(!owner)) {
pr_err("INFO: task %s:%d is blocked on a mutex, but the owner is not found.\n",
task->comm, task->pid);
diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 6a543c204a14..642d6398e0dd 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -42,6 +42,10 @@
# define MUTEX_WARN_ON(cond)
#endif
+#ifdef CONFIG_DETECT_HUNG_TASK_BLOCKER
+#include <linux/hung_task.h>
+#endif
+
void
__mutex_init(struct mutex *lock, const char *name, struct lock_class_key *key)
{
@@ -189,7 +193,7 @@ __mutex_add_waiter(struct mutex *lock, struct mutex_waiter *waiter,
struct list_head *list)
{
#ifdef CONFIG_DETECT_HUNG_TASK_BLOCKER
- WRITE_ONCE(current->blocker_mutex, lock);
+ hung_task_set_blocker(lock, BLOCKER_TYPE_MUTEX);
#endif
debug_mutex_add_waiter(lock, waiter, current);
@@ -207,7 +211,7 @@ __mutex_remove_waiter(struct mutex *lock, struct mutex_waiter *waiter)
debug_mutex_remove_waiter(lock, waiter, current);
#ifdef CONFIG_DETECT_HUNG_TASK_BLOCKER
- WRITE_ONCE(current->blocker_mutex, NULL);
+ hung_task_clear_blocker();
#endif
}
--
2.45.2
* [PATCH RESEND v2 2/3] hung_task: show the blocker task if the task is hung on semaphore
From: Lance Yang @ 2025-03-14 14:42 UTC
To: akpm
Cc: will, peterz, mingo, longman, mhiramat, anna.schumaker,
boqun.feng, joel.granados, kent.overstreet, leonylgao,
linux-kernel, rostedt, senozhatsky, tfiga, amaindex, Lance Yang,
Mingzhe Yang
Inspired by mutex blocker tracking[1], this patch makes a trade-off to
balance the overhead and utility of the hung task detector.
Unlike mutexes, semaphores lack explicit ownership tracking, making it
challenging to identify the root cause of hangs. To address this, we
introduce a last_holder field to the semaphore structure, which is
updated when a task successfully calls down() and cleared during up().
The assumption is that if a task is blocked on a semaphore, the holders
must not have released it. While this does not guarantee that the last
holder is one of the current blockers, it likely provides a practical hint
for diagnosing semaphore-related stalls.
With this change, the hung task detector can now show the blocker task's
info, as shown below:
[Thu Mar 13 15:18:38 2025] INFO: task cat:1803 blocked for more than 122 seconds.
[Thu Mar 13 15:18:38 2025] Tainted: G OE 6.14.0-rc3+ #14
[Thu Mar 13 15:18:38 2025] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Thu Mar 13 15:18:38 2025] task:cat state:D stack:0 pid:1803 tgid:1803 ppid:1057 task_flags:0x400000 flags:0x00000004
[Thu Mar 13 15:18:38 2025] Call trace:
[Thu Mar 13 15:18:38 2025] __switch_to+0x1ec/0x380 (T)
[Thu Mar 13 15:18:38 2025] __schedule+0xc30/0x44f8
[Thu Mar 13 15:18:38 2025] schedule+0xb8/0x3b0
[Thu Mar 13 15:18:38 2025] schedule_timeout+0x1d0/0x208
[Thu Mar 13 15:18:38 2025] __down_common+0x2d4/0x6f8
[Thu Mar 13 15:18:38 2025] __down+0x24/0x50
[Thu Mar 13 15:18:38 2025] down+0xd0/0x140
[Thu Mar 13 15:18:38 2025] read_dummy+0x3c/0xa0 [hung_task_sem]
[Thu Mar 13 15:18:38 2025] full_proxy_read+0xfc/0x1d0
[Thu Mar 13 15:18:38 2025] vfs_read+0x1a0/0x858
[Thu Mar 13 15:18:38 2025] ksys_read+0x100/0x220
[Thu Mar 13 15:18:38 2025] __arm64_sys_read+0x78/0xc8
[Thu Mar 13 15:18:38 2025] invoke_syscall+0xd8/0x278
[Thu Mar 13 15:18:38 2025] el0_svc_common.constprop.0+0xb8/0x298
[Thu Mar 13 15:18:38 2025] do_el0_svc+0x4c/0x88
[Thu Mar 13 15:18:38 2025] el0_svc+0x44/0x108
[Thu Mar 13 15:18:38 2025] el0t_64_sync_handler+0x134/0x160
[Thu Mar 13 15:18:38 2025] el0t_64_sync+0x1b8/0x1c0
[Thu Mar 13 15:18:38 2025] INFO: task cat:1803 blocked on a semaphore likely last held by task cat:1802
[Thu Mar 13 15:18:38 2025] task:cat state:S stack:0 pid:1802 tgid:1802 ppid:1057 task_flags:0x400000 flags:0x00000004
[Thu Mar 13 15:18:38 2025] Call trace:
[Thu Mar 13 15:18:38 2025] __switch_to+0x1ec/0x380 (T)
[Thu Mar 13 15:18:38 2025] __schedule+0xc30/0x44f8
[Thu Mar 13 15:18:38 2025] schedule+0xb8/0x3b0
[Thu Mar 13 15:18:38 2025] schedule_timeout+0xf4/0x208
[Thu Mar 13 15:18:38 2025] msleep_interruptible+0x70/0x130
[Thu Mar 13 15:18:38 2025] read_dummy+0x48/0xa0 [hung_task_sem]
[Thu Mar 13 15:18:38 2025] full_proxy_read+0xfc/0x1d0
[Thu Mar 13 15:18:38 2025] vfs_read+0x1a0/0x858
[Thu Mar 13 15:18:38 2025] ksys_read+0x100/0x220
[Thu Mar 13 15:18:38 2025] __arm64_sys_read+0x78/0xc8
[Thu Mar 13 15:18:38 2025] invoke_syscall+0xd8/0x278
[Thu Mar 13 15:18:38 2025] el0_svc_common.constprop.0+0xb8/0x298
[Thu Mar 13 15:18:38 2025] do_el0_svc+0x4c/0x88
[Thu Mar 13 15:18:38 2025] el0_svc+0x44/0x108
[Thu Mar 13 15:18:38 2025] el0t_64_sync_handler+0x134/0x160
[Thu Mar 13 15:18:38 2025] el0t_64_sync+0x1b8/0x1c0
[1] https://lore.kernel.org/all/174046694331.2194069.15472952050240807469.stgit@mhiramat.tok.corp.google.com
Suggested-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Mingzhe Yang <mingzhe.yang@ly.com>
Signed-off-by: Lance Yang <ioworker0@gmail.com>
---
include/linux/semaphore.h | 15 ++++++++++-
kernel/hung_task.c | 45 ++++++++++++++++++++++++-------
kernel/locking/semaphore.c | 55 +++++++++++++++++++++++++++++++++-----
3 files changed, 98 insertions(+), 17 deletions(-)
diff --git a/include/linux/semaphore.h b/include/linux/semaphore.h
index 04655faadc2d..89706157e622 100644
--- a/include/linux/semaphore.h
+++ b/include/linux/semaphore.h
@@ -16,13 +16,25 @@ struct semaphore {
raw_spinlock_t lock;
unsigned int count;
struct list_head wait_list;
+
+#ifdef CONFIG_DETECT_HUNG_TASK_BLOCKER
+ unsigned long last_holder;
+#endif
};
+#ifdef CONFIG_DETECT_HUNG_TASK_BLOCKER
+#define __LAST_HOLDER_SEMAPHORE_INITIALIZER \
+ , .last_holder = 0UL
+#else
+#define __LAST_HOLDER_SEMAPHORE_INITIALIZER
+#endif
+
#define __SEMAPHORE_INITIALIZER(name, n) \
{ \
.lock = __RAW_SPIN_LOCK_UNLOCKED((name).lock), \
.count = n, \
- .wait_list = LIST_HEAD_INIT((name).wait_list), \
+ .wait_list = LIST_HEAD_INIT((name).wait_list) \
+ __LAST_HOLDER_SEMAPHORE_INITIALIZER \
}
/*
@@ -47,5 +59,6 @@ extern int __must_check down_killable(struct semaphore *sem);
extern int __must_check down_trylock(struct semaphore *sem);
extern int __must_check down_timeout(struct semaphore *sem, long jiffies);
extern void up(struct semaphore *sem);
+extern unsigned long sem_last_holder(struct semaphore *sem);
#endif /* __LINUX_SEMAPHORE_H */
diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index 46eb6717564d..f8cb5a0e14f7 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -102,31 +102,56 @@ static struct notifier_block panic_block = {
static void debug_show_blocker(struct task_struct *task)
{
struct task_struct *g, *t;
- unsigned long owner, blocker;
+ unsigned long owner, blocker, blocker_lock_type;
RCU_LOCKDEP_WARN(!rcu_read_lock_held(), "No rcu lock held");
blocker = READ_ONCE(task->blocker);
- if (!blocker || !hung_task_blocker_is_type(blocker, BLOCKER_TYPE_MUTEX))
+ if (!blocker)
return;
- owner = mutex_get_owner(
- (struct mutex *)hung_task_blocker_to_lock(blocker));
+ if (hung_task_blocker_is_type(blocker, BLOCKER_TYPE_MUTEX)) {
+ owner = mutex_get_owner(
+ (struct mutex *)hung_task_blocker_to_lock(blocker));
+ blocker_lock_type = BLOCKER_TYPE_MUTEX;
+ } else if (hung_task_blocker_is_type(blocker, BLOCKER_TYPE_SEM)) {
+ owner = sem_last_holder(
+ (struct semaphore *)hung_task_blocker_to_lock(blocker));
+ blocker_lock_type = BLOCKER_TYPE_SEM;
+ } else
+ return;
if (unlikely(!owner)) {
- pr_err("INFO: task %s:%d is blocked on a mutex, but the owner is not found.\n",
- task->comm, task->pid);
+ switch (blocker_lock_type) {
+ case BLOCKER_TYPE_MUTEX:
+ pr_err("INFO: task %s:%d is blocked on a mutex, but the owner is not found.\n",
+ task->comm, task->pid);
+ break;
+ case BLOCKER_TYPE_SEM:
+ pr_err("INFO: task %s:%d is blocked on a semaphore, but the last holder is not found.\n",
+ task->comm, task->pid);
+ break;
+ }
return;
}
/* Ensure the owner information is correct. */
for_each_process_thread(g, t) {
- if ((unsigned long)t == owner) {
+ if ((unsigned long)t != owner)
+ continue;
+
+ switch (blocker_lock_type) {
+ case BLOCKER_TYPE_MUTEX:
pr_err("INFO: task %s:%d is blocked on a mutex likely owned by task %s:%d.\n",
- task->comm, task->pid, t->comm, t->pid);
- sched_show_task(t);
- return;
+ task->comm, task->pid, t->comm, t->pid);
+ break;
+ case BLOCKER_TYPE_SEM:
+ pr_err("INFO: task %s:%d blocked on a semaphore likely last held by task %s:%d\n",
+ task->comm, task->pid, t->comm, t->pid);
+ break;
}
+ sched_show_task(t);
+ return;
}
}
#else
diff --git a/kernel/locking/semaphore.c b/kernel/locking/semaphore.c
index 34bfae72f295..87dfb93a812d 100644
--- a/kernel/locking/semaphore.c
+++ b/kernel/locking/semaphore.c
@@ -34,11 +34,16 @@
#include <linux/ftrace.h>
#include <trace/events/lock.h>
+#ifdef CONFIG_DETECT_HUNG_TASK_BLOCKER
+#include <linux/hung_task.h>
+#endif
+
static noinline void __down(struct semaphore *sem);
static noinline int __down_interruptible(struct semaphore *sem);
static noinline int __down_killable(struct semaphore *sem);
static noinline int __down_timeout(struct semaphore *sem, long timeout);
static noinline void __up(struct semaphore *sem);
+static inline void __sem_acquire(struct semaphore *sem);
/**
* down - acquire the semaphore
@@ -58,7 +63,7 @@ void __sched down(struct semaphore *sem)
might_sleep();
raw_spin_lock_irqsave(&sem->lock, flags);
if (likely(sem->count > 0))
- sem->count--;
+ __sem_acquire(sem);
else
__down(sem);
raw_spin_unlock_irqrestore(&sem->lock, flags);
@@ -82,7 +87,7 @@ int __sched down_interruptible(struct semaphore *sem)
might_sleep();
raw_spin_lock_irqsave(&sem->lock, flags);
if (likely(sem->count > 0))
- sem->count--;
+ __sem_acquire(sem);
else
result = __down_interruptible(sem);
raw_spin_unlock_irqrestore(&sem->lock, flags);
@@ -109,7 +114,7 @@ int __sched down_killable(struct semaphore *sem)
might_sleep();
raw_spin_lock_irqsave(&sem->lock, flags);
if (likely(sem->count > 0))
- sem->count--;
+ __sem_acquire(sem);
else
result = __down_killable(sem);
raw_spin_unlock_irqrestore(&sem->lock, flags);
@@ -139,7 +144,7 @@ int __sched down_trylock(struct semaphore *sem)
raw_spin_lock_irqsave(&sem->lock, flags);
count = sem->count - 1;
if (likely(count >= 0))
- sem->count = count;
+ __sem_acquire(sem);
raw_spin_unlock_irqrestore(&sem->lock, flags);
return (count < 0);
@@ -164,7 +169,7 @@ int __sched down_timeout(struct semaphore *sem, long timeout)
might_sleep();
raw_spin_lock_irqsave(&sem->lock, flags);
if (likely(sem->count > 0))
- sem->count--;
+ __sem_acquire(sem);
else
result = __down_timeout(sem, timeout);
raw_spin_unlock_irqrestore(&sem->lock, flags);
@@ -185,6 +190,12 @@ void __sched up(struct semaphore *sem)
unsigned long flags;
raw_spin_lock_irqsave(&sem->lock, flags);
+
+#ifdef CONFIG_DETECT_HUNG_TASK_BLOCKER
+ if (READ_ONCE(sem->last_holder) == (unsigned long)current)
+ WRITE_ONCE(sem->last_holder, 0UL);
+#endif
+
if (likely(list_empty(&sem->wait_list)))
sem->count++;
else
@@ -224,8 +235,12 @@ static inline int __sched ___down_common(struct semaphore *sem, long state,
raw_spin_unlock_irq(&sem->lock);
timeout = schedule_timeout(timeout);
raw_spin_lock_irq(&sem->lock);
- if (waiter.up)
+ if (waiter.up) {
+#ifdef CONFIG_DETECT_HUNG_TASK_BLOCKER
+ WRITE_ONCE(sem->last_holder, (unsigned long)current);
+#endif
return 0;
+ }
}
timed_out:
@@ -242,10 +257,18 @@ static inline int __sched __down_common(struct semaphore *sem, long state,
{
int ret;
+#ifdef CONFIG_DETECT_HUNG_TASK_BLOCKER
+ hung_task_set_blocker(sem, BLOCKER_TYPE_SEM);
+#endif
+
trace_contention_begin(sem, 0);
ret = ___down_common(sem, state, timeout);
trace_contention_end(sem, ret);
+#ifdef CONFIG_DETECT_HUNG_TASK_BLOCKER
+ hung_task_clear_blocker();
+#endif
+
return ret;
}
@@ -277,3 +300,23 @@ static noinline void __sched __up(struct semaphore *sem)
waiter->up = true;
wake_up_process(waiter->task);
}
+
+#ifdef CONFIG_DETECT_HUNG_TASK_BLOCKER
+unsigned long sem_last_holder(struct semaphore *sem)
+{
+ return READ_ONCE(sem->last_holder);
+}
+#else
+unsigned long sem_last_holder(struct semaphore *sem)
+{
+ return 0UL;
+}
+#endif
+
+static inline void __sem_acquire(struct semaphore *sem)
+{
+ sem->count--;
+#ifdef CONFIG_DETECT_HUNG_TASK_BLOCKER
+ WRITE_ONCE(sem->last_holder, (unsigned long)current);
+#endif
+}
--
2.45.2
* [PATCH RESEND v2 3/3] samples: add hung_task detector semaphore blocking sample
From: Lance Yang @ 2025-03-14 14:43 UTC
To: akpm
Cc: will, peterz, mingo, longman, mhiramat, anna.schumaker,
boqun.feng, joel.granados, kent.overstreet, leonylgao,
linux-kernel, rostedt, senozhatsky, tfiga, amaindex, Lance Yang
From: Zi Li <amaindex@outlook.com>
Add sample code for testing the hung_task detector's semaphore blocker
reporting.
This module creates a dummy file in debugfs. Reading that file makes the
reading process sleep for a sufficiently long time (256 seconds)
while holding a semaphore. As a result, a second process reading the file
waits on the semaphore for a prolonged duration and is detected by the
hung_task detector.
Usage:
> cd /sys/kernel/debug/hung_task
> cat semaphore & cat semaphore
and wait for the hung_task message.
Signed-off-by: Lance Yang <ioworker0@gmail.com>
Signed-off-by: Zi Li <amaindex@outlook.com>
---
samples/Kconfig | 11 ++--
samples/hung_task/Makefile | 3 +-
samples/hung_task/hung_task_mutex.c | 20 ++++---
samples/hung_task/hung_task_semaphore.c | 74 +++++++++++++++++++++++++
4 files changed, 96 insertions(+), 12 deletions(-)
create mode 100644 samples/hung_task/hung_task_semaphore.c
diff --git a/samples/Kconfig b/samples/Kconfig
index 09011be2391a..3a073d6b848b 100644
--- a/samples/Kconfig
+++ b/samples/Kconfig
@@ -304,10 +304,13 @@ config SAMPLE_HUNG_TASK
tristate "Hung task detector test code"
depends on DETECT_HUNG_TASK && DEBUG_FS
help
- Build a module which provide a simple debugfs file. If user reads
- the file, it will sleep long time (256 seconds) with holding a
- mutex. Thus if there are 2 or more processes read this file, it
- will be detected by the hung_task watchdog.
+ Build multiple modules to test the hung task detector. Each module
+ provides a simple debugfs file corresponding to a specific
+ synchronization primitive (e.g., mutex, semaphore, etc.). When the
+ file is read, the module will sleep for a long time (256 seconds)
+ while holding the respective synchronizer. If multiple processes
+ attempt to read these files concurrently, the hung_task watchdog
+ can detect potential hangs or deadlocks.
source "samples/rust/Kconfig"
diff --git a/samples/hung_task/Makefile b/samples/hung_task/Makefile
index fe9dde799880..7483c2c0a0ef 100644
--- a/samples/hung_task/Makefile
+++ b/samples/hung_task/Makefile
@@ -1,2 +1,3 @@
# SPDX-License-Identifier: GPL-2.0-only
-obj-$(CONFIG_SAMPLE_HUNG_TASK) += hung_task_mutex.o
\ No newline at end of file
+obj-$(CONFIG_SAMPLE_HUNG_TASK) += hung_task_mutex.o
+obj-$(CONFIG_SAMPLE_HUNG_TASK) += hung_task_semaphore.o
\ No newline at end of file
diff --git a/samples/hung_task/hung_task_mutex.c b/samples/hung_task/hung_task_mutex.c
index 7a29f2246d22..e4d1d69618b8 100644
--- a/samples/hung_task/hung_task_mutex.c
+++ b/samples/hung_task/hung_task_mutex.c
@@ -22,7 +22,7 @@
static const char dummy_string[] = "This is a dummy string.";
static DEFINE_MUTEX(dummy_mutex);
-struct dentry *hung_task_dir;
+static struct dentry *hung_task_dir;
static ssize_t read_dummy(struct file *file, char __user *user_buf,
size_t count, loff_t *ppos)
@@ -43,19 +43,25 @@ static const struct file_operations hung_task_fops = {
static int __init hung_task_sample_init(void)
{
- hung_task_dir = debugfs_create_dir(HUNG_TASK_DIR, NULL);
- if (IS_ERR(hung_task_dir))
- return PTR_ERR(hung_task_dir);
+ hung_task_dir = debugfs_lookup(HUNG_TASK_DIR, NULL);
+ if (!hung_task_dir) {
+ hung_task_dir = debugfs_create_dir(HUNG_TASK_DIR, NULL);
+ if (IS_ERR(hung_task_dir))
+ return PTR_ERR(hung_task_dir);
+ }
- debugfs_create_file(HUNG_TASK_FILE, 0400, hung_task_dir,
- NULL, &hung_task_fops);
+ debugfs_create_file(HUNG_TASK_FILE, 0400, hung_task_dir, NULL,
+ &hung_task_fops);
return 0;
}
static void __exit hung_task_sample_exit(void)
{
- debugfs_remove_recursive(hung_task_dir);
+ debugfs_lookup_and_remove(HUNG_TASK_FILE, hung_task_dir);
+
+ if (simple_empty(hung_task_dir))
+ debugfs_remove(hung_task_dir);
}
module_init(hung_task_sample_init);
diff --git a/samples/hung_task/hung_task_semaphore.c b/samples/hung_task/hung_task_semaphore.c
new file mode 100644
index 000000000000..a5814971bfb8
--- /dev/null
+++ b/samples/hung_task/hung_task_semaphore.c
@@ -0,0 +1,74 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * hung_task_semaphore.c - Sample code which causes hung task by semaphore
+ *
+ * Usage: load this module and read `<debugfs>/hung_task/semaphore`
+ * by 2 or more processes.
+ *
+ * This is for testing kernel hung_task error message.
+ * Note that this will make your system freeze and maybe
+ * cause panic. So do not use this except for the test.
+ */
+
+#include <linux/debugfs.h>
+#include <linux/delay.h>
+#include <linux/fs.h>
+#include <linux/module.h>
+#include <linux/semaphore.h>
+
+#define HUNG_TASK_DIR "hung_task"
+#define HUNG_TASK_FILE "semaphore"
+#define SLEEP_SECOND 256
+
+static const char dummy_string[] = "This is a dummy string.";
+static DEFINE_SEMAPHORE(dummy_sem, 1);
+static struct dentry *hung_task_dir;
+
+static ssize_t read_dummy(struct file *file, char __user *user_buf,
+ size_t count, loff_t *ppos)
+{
+ /* The second task waiting on the semaphore sleeps uninterruptibly. */
+ down(&dummy_sem);
+
+ /* While the first task sleeps here, it is interruptible. */
+ msleep_interruptible(SLEEP_SECOND * 1000);
+
+ up(&dummy_sem);
+
+ return simple_read_from_buffer(user_buf, count, ppos, dummy_string,
+ sizeof(dummy_string));
+}
+
+static const struct file_operations hung_task_fops = {
+ .read = read_dummy,
+};
+
+static int __init hung_task_sample_init(void)
+{
+ hung_task_dir = debugfs_lookup(HUNG_TASK_DIR, NULL);
+ if (!hung_task_dir) {
+ hung_task_dir = debugfs_create_dir(HUNG_TASK_DIR, NULL);
+ if (IS_ERR(hung_task_dir))
+ return PTR_ERR(hung_task_dir);
+ }
+
+ debugfs_create_file(HUNG_TASK_FILE, 0400, hung_task_dir, NULL,
+ &hung_task_fops);
+
+ return 0;
+}
+
+static void __exit hung_task_sample_exit(void)
+{
+ debugfs_lookup_and_remove(HUNG_TASK_FILE, hung_task_dir);
+
+ if (simple_empty(hung_task_dir))
+ debugfs_remove(hung_task_dir);
+}
+
+module_init(hung_task_sample_init);
+module_exit(hung_task_sample_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Zi Li");
+MODULE_DESCRIPTION("Simple sleep under semaphore file for testing hung task");
--
2.45.2
* Re: [PATCH RESEND v2 0/3] hung_task: extend blocking task stacktrace dump to semaphore
From: Boqun Feng @ 2025-03-14 17:37 UTC
To: Lance Yang
Cc: akpm, will, peterz, mingo, longman, mhiramat, anna.schumaker,
joel.granados, kent.overstreet, leonylgao, linux-kernel, rostedt,
senozhatsky, tfiga, amaindex, John Stultz
Hi Lance,
On Fri, Mar 14, 2025 at 10:42:57PM +0800, Lance Yang wrote:
> Hi all,
>
> Inspired by mutex blocker tracking[1], this patch series extend the
> feature to not only dump the blocker task holding a mutex but also to
> support semaphores. Unlike mutexes, semaphores lack explicit ownership
> tracking, making it challenging to identify the root cause of hangs. To
> address this, we introduce a last_holder field to the semaphore structure,
> which is updated when a task successfully calls down() and cleared during
> up().
>
> The assumption is that if a task is blocked on a semaphore, the holders
> must not have released it. While this does not guarantee that the last
> holder is one of the current blockers, it likely provides a practical hint
> for diagnosing semaphore-related stalls.
>
Could you copy John Stultz for the future versions? Because John is
working on proxy execution, which will make a task always track which
mutex it's blocked by:
https://lore.kernel.org/lkml/20250312221147.1865364-3-jstultz@google.com/
I feel it's better to build the hung task detection with that in mind,
thanks!
Regards,
Boqun
[...]
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH RESEND v2 0/3] hung_task: extend blocking task stacktrace dump to semaphore
2025-03-14 17:37 ` [PATCH RESEND v2 0/3] hung_task: extend blocking task stacktrace dump to semaphore Boqun Feng
@ 2025-03-15 3:18 ` Lance Yang
0 siblings, 0 replies; 12+ messages in thread
From: Lance Yang @ 2025-03-15 3:18 UTC (permalink / raw)
To: Boqun Feng
Cc: akpm, will, peterz, mingo, longman, mhiramat, anna.schumaker,
joel.granados, kent.overstreet, leonylgao, linux-kernel, rostedt,
senozhatsky, tfiga, amaindex, John Stultz
On Sat, Mar 15, 2025 at 1:38 AM Boqun Feng <boqun.feng@gmail.com> wrote:
>
> Hi Lance,
>
> On Fri, Mar 14, 2025 at 10:42:57PM +0800, Lance Yang wrote:
> > Hi all,
> >
> > Inspired by mutex blocker tracking[1], this patch series extend the
> > feature to not only dump the blocker task holding a mutex but also to
> > support semaphores. Unlike mutexes, semaphores lack explicit ownership
> > tracking, making it challenging to identify the root cause of hangs. To
> > address this, we introduce a last_holder field to the semaphore structure,
> > which is updated when a task successfully calls down() and cleared during
> > up().
> >
> > The assumption is that if a task is blocked on a semaphore, the holders
> > must not have released it. While this does not guarantee that the last
> > holder is one of the current blockers, it likely provides a practical hint
> > for diagnosing semaphore-related stalls.
> >
>
> Could you copy John Stultz for the future versions? Because John is
> working on proxy execution, which will make a task always track which
> mutex it's blocked by:
>
> https://lore.kernel.org/lkml/20250312221147.1865364-3-jstultz@google.com/
>
> I feel it's better to build the hung task detection with that in mind,
> thanks!
Yeah. Thanks for letting me know. I will keep John in the loop ;)
Thanks,
Lance
>
> Regards,
> Boqun
>
> [...]
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH RESEND v2 3/3] samples: add hung_task detector semaphore blocking sample
2025-03-14 14:43 ` [PATCH RESEND v2 3/3] samples: add hung_task detector semaphore blocking sample Lance Yang
@ 2025-03-18 1:36 ` Masami Hiramatsu
2025-03-18 6:20 ` Amaindex
0 siblings, 1 reply; 12+ messages in thread
From: Masami Hiramatsu @ 2025-03-18 1:36 UTC (permalink / raw)
To: Lance Yang
Cc: akpm, will, peterz, mingo, longman, mhiramat, anna.schumaker,
boqun.feng, joel.granados, kent.overstreet, leonylgao,
linux-kernel, rostedt, senozhatsky, tfiga, amaindex
On Fri, 14 Mar 2025 22:43:00 +0800
Lance Yang <ioworker0@gmail.com> wrote:
> From: Zi Li <amaindex@outlook.com>
>
> Add a hung_task detector semaphore blocking test sample code.
>
> This module will create a dummy file on the debugfs. That file will cause
> the read process to sleep for a sufficiently long time (256 seconds)
> while holding a semaphore. As a result, the second process will wait on
> the semaphore for a prolonged duration and be detected by the hung_task
> detector.
>
> Usage is;
>
> > cd /sys/kernel/debug/hung_task
> > cat semaphore & cat semaphore
>
> and wait for hung_task message.
Thanks for updating the sample code. BTW, do we need two almost identical
samples? I think we can unify them into a single "hung_task_tests.c" (one
module that provides both the "mutex" and "semaphore" test files).
That would make it easier to extend to other lock types (and needs less code).
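One possible shape for such a unified sample is a small table with one entry per test file and a shared read path. The sketch below is a userspace illustration only: the toy_* names are hypothetical, the "locks" are plain counters standing in for a real mutex and semaphore, and the long sleep of the real sample is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in "primitives": plain counters instead of real kernel locks. */
static int toy_mutex_held;
static int toy_sem_held;

static void toy_mutex_acquire(void) { toy_mutex_held++; }
static void toy_mutex_release(void) { toy_mutex_held--; }
static void toy_sem_acquire(void)   { toy_sem_held++; }
static void toy_sem_release(void)   { toy_sem_held--; }

/* One table entry per test file the single module would expose. */
struct toy_test_file {
	const char *name;
	void (*acquire)(void);
	void (*release)(void);
};

static const struct toy_test_file toy_files[] = {
	{ "mutex",     toy_mutex_acquire, toy_mutex_release },
	{ "semaphore", toy_sem_acquire,   toy_sem_release },
};

/* Find the table entry for a file name, or NULL if there is none. */
static const struct toy_test_file *toy_lookup(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(toy_files) / sizeof(toy_files[0]); i++)
		if (strcmp(toy_files[i].name, name) == 0)
			return &toy_files[i];
	return NULL;
}

/* Shared read path: acquire, (sleep omitted), release. -1 for unknown files. */
static int toy_read(const char *name)
{
	const struct toy_test_file *f = toy_lookup(name);

	if (!f)
		return -1;
	f->acquire();
	/* the real sample would sleep here while holding the lock */
	f->release();
	return 0;
}
```

With this layout, covering another lock type would only need one more table entry and its acquire/release pair.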
Thank you,
>
> Signed-off-by: Lance Yang <ioworker0@gmail.com>
> Signed-off-by: Zi Li <amaindex@outlook.com>
> ---
> samples/Kconfig | 11 ++--
> samples/hung_task/Makefile | 3 +-
> samples/hung_task/hung_task_mutex.c | 20 ++++---
> samples/hung_task/hung_task_semaphore.c | 74 +++++++++++++++++++++++++
> 4 files changed, 96 insertions(+), 12 deletions(-)
> create mode 100644 samples/hung_task/hung_task_semaphore.c
>
> diff --git a/samples/Kconfig b/samples/Kconfig
> index 09011be2391a..3a073d6b848b 100644
> --- a/samples/Kconfig
> +++ b/samples/Kconfig
> @@ -304,10 +304,13 @@ config SAMPLE_HUNG_TASK
> tristate "Hung task detector test code"
> depends on DETECT_HUNG_TASK && DEBUG_FS
> help
> - Build a module which provide a simple debugfs file. If user reads
> - the file, it will sleep long time (256 seconds) with holding a
> - mutex. Thus if there are 2 or more processes read this file, it
> - will be detected by the hung_task watchdog.
> + Build multiple modules to test the hung task detector. Each module
> + provides a simple debugfs file corresponding to a specific
> + synchronization primitive (e.g., mutex, semaphore, etc.). When the
> + file is read, the module will sleep for a long time (256 seconds)
> + while holding the respective synchronizer. If multiple processes
> + attempt to read these files concurrently, the hung_task watchdog
> + can detect potential hangs or deadlocks.
>
> source "samples/rust/Kconfig"
>
> diff --git a/samples/hung_task/Makefile b/samples/hung_task/Makefile
> index fe9dde799880..7483c2c0a0ef 100644
> --- a/samples/hung_task/Makefile
> +++ b/samples/hung_task/Makefile
> @@ -1,2 +1,3 @@
> # SPDX-License-Identifier: GPL-2.0-only
> -obj-$(CONFIG_SAMPLE_HUNG_TASK) += hung_task_mutex.o
> \ No newline at end of file
> +obj-$(CONFIG_SAMPLE_HUNG_TASK) += hung_task_mutex.o
> +obj-$(CONFIG_SAMPLE_HUNG_TASK) += hung_task_semaphore.o
> \ No newline at end of file
> diff --git a/samples/hung_task/hung_task_mutex.c b/samples/hung_task/hung_task_mutex.c
> index 7a29f2246d22..e4d1d69618b8 100644
> --- a/samples/hung_task/hung_task_mutex.c
> +++ b/samples/hung_task/hung_task_mutex.c
> @@ -22,7 +22,7 @@
>
> static const char dummy_string[] = "This is a dummy string.";
> static DEFINE_MUTEX(dummy_mutex);
> -struct dentry *hung_task_dir;
> +static struct dentry *hung_task_dir;
>
> static ssize_t read_dummy(struct file *file, char __user *user_buf,
> size_t count, loff_t *ppos)
> @@ -43,19 +43,25 @@ static const struct file_operations hung_task_fops = {
>
> static int __init hung_task_sample_init(void)
> {
> - hung_task_dir = debugfs_create_dir(HUNG_TASK_DIR, NULL);
> - if (IS_ERR(hung_task_dir))
> - return PTR_ERR(hung_task_dir);
> + hung_task_dir = debugfs_lookup(HUNG_TASK_DIR, NULL);
> + if (!hung_task_dir) {
> + hung_task_dir = debugfs_create_dir(HUNG_TASK_DIR, NULL);
> + if (IS_ERR(hung_task_dir))
> + return PTR_ERR(hung_task_dir);
> + }
>
> - debugfs_create_file(HUNG_TASK_FILE, 0400, hung_task_dir,
> - NULL, &hung_task_fops);
> + debugfs_create_file(HUNG_TASK_FILE, 0400, hung_task_dir, NULL,
> + &hung_task_fops);
>
> return 0;
> }
>
> static void __exit hung_task_sample_exit(void)
> {
> - debugfs_remove_recursive(hung_task_dir);
> + debugfs_lookup_and_remove(HUNG_TASK_FILE, hung_task_dir);
> +
> + if (simple_empty(hung_task_dir))
> + debugfs_remove(hung_task_dir);
> }
>
> module_init(hung_task_sample_init);
> diff --git a/samples/hung_task/hung_task_semaphore.c b/samples/hung_task/hung_task_semaphore.c
> new file mode 100644
> index 000000000000..a5814971bfb8
> --- /dev/null
> +++ b/samples/hung_task/hung_task_semaphore.c
> @@ -0,0 +1,74 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/*
> + * hung_task_semaphore.c - Sample code which causes hung task by semaphore
> + *
> + * Usage: load this module and read `<debugfs>/hung_task/semaphore`
> + * by 2 or more processes.
> + *
> + * This is for testing kernel hung_task error message.
> + * Note that this will make your system freeze and maybe
> + * cause panic. So do not use this except for the test.
> + */
> +
> +#include <linux/debugfs.h>
> +#include <linux/delay.h>
> +#include <linux/fs.h>
> +#include <linux/module.h>
> +#include <linux/semaphore.h>
> +
> +#define HUNG_TASK_DIR "hung_task"
> +#define HUNG_TASK_FILE "semaphore"
> +#define SLEEP_SECOND 256
> +
> +static const char dummy_string[] = "This is a dummy string.";
> +static DEFINE_SEMAPHORE(dummy_sem, 1);
> +static struct dentry *hung_task_dir;
> +
> +static ssize_t read_dummy(struct file *file, char __user *user_buf,
> + size_t count, loff_t *ppos)
> +{
> + /* If the second task waits on the semaphore, it is uninterruptible sleep. */
> + down(&dummy_sem);
> +
> + /* When the first task sleep here, it is interruptible. */
> + msleep_interruptible(SLEEP_SECOND * 1000);
> +
> + up(&dummy_sem);
> +
> + return simple_read_from_buffer(user_buf, count, ppos, dummy_string,
> + sizeof(dummy_string));
> +}
> +
> +static const struct file_operations hung_task_fops = {
> + .read = read_dummy,
> +};
> +
> +static int __init hung_task_sample_init(void)
> +{
> + hung_task_dir = debugfs_lookup(HUNG_TASK_DIR, NULL);
> + if (!hung_task_dir) {
> + hung_task_dir = debugfs_create_dir(HUNG_TASK_DIR, NULL);
> + if (IS_ERR(hung_task_dir))
> + return PTR_ERR(hung_task_dir);
> + }
> +
> + debugfs_create_file(HUNG_TASK_FILE, 0400, hung_task_dir, NULL,
> + &hung_task_fops);
> +
> + return 0;
> +}
> +
> +static void __exit hung_task_sample_exit(void)
> +{
> + debugfs_lookup_and_remove(HUNG_TASK_FILE, hung_task_dir);
> +
> + if (simple_empty(hung_task_dir))
> + debugfs_remove(hung_task_dir);
> +}
> +
> +module_init(hung_task_sample_init);
> +module_exit(hung_task_sample_exit);
> +
> +MODULE_LICENSE("GPL");
> +MODULE_AUTHOR("Zi Li");
> +MODULE_DESCRIPTION("Simple sleep under semaphore file for testing hung task");
> --
> 2.45.2
>
--
Masami Hiramatsu (Google) <mhiramat@kernel.org>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH RESEND v2 3/3] samples: add hung_task detector semaphore blocking sample
2025-03-18 1:36 ` Masami Hiramatsu
@ 2025-03-18 6:20 ` Amaindex
0 siblings, 0 replies; 12+ messages in thread
From: Amaindex @ 2025-03-18 6:20 UTC (permalink / raw)
To: Masami Hiramatsu (Google), Lance Yang
Cc: akpm, will, peterz, mingo, longman, anna.schumaker, boqun.feng,
joel.granados, kent.overstreet, leonylgao, linux-kernel, rostedt,
senozhatsky, tfiga
Hi Masami,
On 3/18/25 9:36 AM, Masami Hiramatsu (Google) wrote:
> On Fri, 14 Mar 2025 22:43:00 +0800
> Lance Yang <ioworker0@gmail.com> wrote:
>
>> From: Zi Li <amaindex@outlook.com>
>>
>> Add a hung_task detector semaphore blocking test sample code.
>>
>> This module will create a dummy file on the debugfs. That file will cause
>> the read process to sleep for a sufficiently long time (256 seconds)
>> while holding a semaphore. As a result, the second process will wait on
>> the semaphore for a prolonged duration and be detected by the hung_task
>> detector.
>>
>> Usage is;
>>
>> > cd /sys/kernel/debug/hung_task
>> > cat semaphore & cat semaphore
>>
>> and wait for hung_task message.
> Thanks for updating the sample code. BTW, do we need to have almost same
> 2 samples? I think we can unify it (one module provides both "mutex" and
> "semaphore" test files.) to "hung_task_tests.c"
>
> That will help us to extend it for other types easier (and less code).
>
> Thank you,
Thanks for your suggestion!
Yeah, I agree that keeping two separate samples is redundant. So let's
combine them into a single 'hung_task_tests.c', which can handle both
mutex and semaphore cases. This will make the sample simpler and easier to
extend.
Cheers,
Zi
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH RESEND v2 1/3] hung_task: replace blocker_mutex with encoded blocker
2025-03-14 14:42 ` [PATCH RESEND v2 1/3] hung_task: replace blocker_mutex with encoded blocker Lance Yang
@ 2025-03-18 9:41 ` Masami Hiramatsu
2025-03-18 11:35 ` Lance Yang
0 siblings, 1 reply; 12+ messages in thread
From: Masami Hiramatsu @ 2025-03-18 9:41 UTC (permalink / raw)
To: Lance Yang
Cc: akpm, will, peterz, mingo, longman, mhiramat, anna.schumaker,
boqun.feng, joel.granados, kent.overstreet, leonylgao,
linux-kernel, rostedt, senozhatsky, tfiga, amaindex, Mingzhe Yang
On Fri, 14 Mar 2025 22:42:58 +0800
Lance Yang <ioworker0@gmail.com> wrote:
> This patch replaces 'struct mutex *blocker_mutex' with 'unsigned long
> blocker', as only one blocker is active at a time.
>
> The blocker field can store both the lock address and the lock type, with
> LSB used to encode the type as Masami suggested, making it easier to extend
> the feature to cover other types of locks.
>
> Also, once the lock type is determined, we can directly extract the address
> and cast it to a lock pointer ;)
Hi Lance,
Thanks for the update. I added some comments.
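For readers following along, the encoding described in the commit message can be tried out in userspace. This is only a sketch under stated assumptions: the TOY_* constants and toy_* helpers are hypothetical names that mirror the patch's helpers, and uintptr_t is used in place of the kernel's unsigned long.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_TYPE_MUTEX	((uintptr_t)0x0)
#define TOY_TYPE_SEM	((uintptr_t)0x1)
#define TOY_TYPE_MASK	((uintptr_t)0x3)

/* Pack a lock address and a type tag into one word; 0 means "rejected". */
static uintptr_t toy_encode_blocker(const void *lock, uintptr_t type)
{
	uintptr_t addr = (uintptr_t)lock;

	/* A NULL or insufficiently aligned address cannot carry a tag. */
	if (!addr || (addr & TOY_TYPE_MASK) || (type & ~TOY_TYPE_MASK))
		return 0;
	return addr | type;
}

/* Recover the type tag from the two low bits. */
static uintptr_t toy_blocker_type(uintptr_t blocker)
{
	return blocker & TOY_TYPE_MASK;
}

/* Recover the original lock address by masking the tag off. */
static void *toy_blocker_to_lock(uintptr_t blocker)
{
	return (void *)(blocker & ~TOY_TYPE_MASK);
}
```

The scheme depends on every tracked lock being at least 4-byte aligned, which is why a misaligned address is rejected rather than stored.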
>
> Suggested-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
> Signed-off-by: Mingzhe Yang <mingzhe.yang@ly.com>
> Signed-off-by: Lance Yang <ioworker0@gmail.com>
> ---
> include/linux/hung_task.h | 94 +++++++++++++++++++++++++++++++++++++++
> include/linux/sched.h | 2 +-
> kernel/hung_task.c | 15 ++++---
> kernel/locking/mutex.c | 8 +++-
> 4 files changed, 111 insertions(+), 8 deletions(-)
> create mode 100644 include/linux/hung_task.h
>
> diff --git a/include/linux/hung_task.h b/include/linux/hung_task.h
> new file mode 100644
> index 000000000000..64ced33b0d1f
> --- /dev/null
> +++ b/include/linux/hung_task.h
> @@ -0,0 +1,94 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Detect Hung Task: detecting tasks stuck in D state
> + *
> + * Copyright (C) 2025 Tongcheng Travel (www.ly.com)
> + * Author: Lance Yang <mingzhe.yang@ly.com>
> + */
> +#ifndef __LINUX_HUNG_TASK_H
> +#define __LINUX_HUNG_TASK_H
> +
> +#include <linux/bug.h>
> +#include <linux/sched.h>
> +#include <linux/compiler.h>
> +
> +/*
> + * @blocker: Combines lock address and blocking type.
> + *
> + * Since lock pointers are at least 4-byte aligned(32-bit) or 8-byte
> + * aligned(64-bit). This leaves the 2 least bits (LSBs) of the pointer
> + * always zero. So we can use these bits to encode the specific blocking
> + * type.
> + *
> + * Type encoding:
> + * 00 - Blocked on mutex (BLOCKER_TYPE_MUTEX)
> + * 01 - Blocked on semaphore (BLOCKER_TYPE_SEM)
> + * 10 - Blocked on rt-mutex (BLOCKER_TYPE_RTMUTEX)
> + * 11 - Blocked on rw-semaphore (BLOCKER_TYPE_RWSEM)
> + */
> +#define BLOCKER_TYPE_MUTEX 0x00UL
> +#define BLOCKER_TYPE_SEM 0x01UL
> +#define BLOCKER_TYPE_RTMUTEX 0x02UL
> +#define BLOCKER_TYPE_RWSEM 0x03UL
> +
> +#define BLOCKER_TYPE_MASK 0x03UL
> +
> +#ifdef CONFIG_DETECT_HUNG_TASK_BLOCKER
> +static inline void hung_task_set_blocker(void *lock, unsigned long type)
> +{
> + unsigned long lock_ptr = (unsigned long)lock;
> +
> + WARN_ON_ONCE(!lock_ptr);
> + WARN_ON_ONCE(lock_ptr & BLOCKER_TYPE_MASK);
> + WARN_ON_ONCE(READ_ONCE(current->blocker));
The last one needs a comment explaining why it is a problem. For example,
WARN_ONCE() takes a message:
WARN_ONCE(READ_ONCE(current->blocker),
"Task blocker is not cleared. Maybe forgot to clear it somewhere? Type: %lu\n",
current->blocker & BLOCKER_TYPE_MASK);
But I don't think this usually needs to be checked.
> +
> + /*
> + * If the lock pointer matches the BLOCKER_TYPE_MASK, return
> + * without writing anything.
> + */
> + if (lock_ptr & BLOCKER_TYPE_MASK)
> + return;
You can use WARN_ON_ONCE() directly as the 'if' condition.
if (WARN_ON_ONCE(lock_ptr & BLOCKER_TYPE_MASK))
return;
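This works because the macro evaluates to the truth value of its condition. A rough userspace imitation is sketched below; TOY_WARN_ON_ONCE, toy_warn_count, toy_blocker and toy_set_blocker are hypothetical names, not the kernel macro or the patch's helper, and the macro assumes GCC or Clang because it relies on GNU statement expressions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

static int toy_warn_count;	/* number of warnings actually printed */

/* Warn once per call site and evaluate to the condition's truth value. */
#define TOY_WARN_ON_ONCE(cond) ({                                 \
	static bool toy_warned_;                                  \
	bool toy_ret_ = !!(cond);                                 \
	if (toy_ret_ && !toy_warned_) {                           \
		toy_warned_ = true;                               \
		toy_warn_count++;                                 \
		fprintf(stderr, "WARNING at %s:%d: %s\n",         \
			__FILE__, __LINE__, #cond);               \
	}                                                         \
	toy_ret_;                                                 \
})

static uintptr_t toy_blocker;

static void toy_set_blocker(const void *lock, uintptr_t type)
{
	uintptr_t addr = (uintptr_t)lock;

	/* Warn (at most once) and bail out in a single statement. */
	if (TOY_WARN_ON_ONCE(addr & 0x3))
		return;

	toy_blocker = addr | type;
}
```

A misaligned pointer is rejected on every call, but the message is printed only the first time, so the separate warning and the separate 'if' collapse into one line.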
> +
> + WRITE_ONCE(current->blocker, lock_ptr | type);
> +}
> +
> +static inline void hung_task_clear_blocker(void)
> +{
> + WARN_ON_ONCE(!READ_ONCE(current->blocker));
> +
> + WRITE_ONCE(current->blocker, 0UL);
> +}
> +
> +static inline bool hung_task_blocker_is_type(unsigned long blocker,
> + unsigned long type)
> +{
> + WARN_ON_ONCE(!blocker);
> +
> + return (blocker & BLOCKER_TYPE_MASK) == type;
> +}
> +
> +static inline void *hung_task_blocker_to_lock(unsigned long blocker)
> +{
> + WARN_ON_ONCE(!blocker);
> +
> + return (void *)(blocker & ~BLOCKER_TYPE_MASK);
> +}
> +#else
> +static inline void hung_task_set_blocker(void *lock, unsigned long type)
> +{
> +}
> +static inline void hung_task_clear_blocker(void)
> +{
> +}
> +static inline bool hung_task_blocker_is_type(unsigned long blocker,
> + unsigned long type)
> +{
> + return false;
> +}
> +static inline void *hung_task_blocker_to_lock(unsigned long blocker)
> +{
> + return NULL;
> +}
> +#endif
> +
> +#endif /* __LINUX_HUNG_TASK_H */
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 1419d94c8e87..f27060dac499 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1218,7 +1218,7 @@ struct task_struct {
> #endif
>
> #ifdef CONFIG_DETECT_HUNG_TASK_BLOCKER
> - struct mutex *blocker_mutex;
> + unsigned long blocker;
> #endif
>
> #ifdef CONFIG_DEBUG_ATOMIC_SLEEP
> diff --git a/kernel/hung_task.c b/kernel/hung_task.c
> index dc898ec93463..46eb6717564d 100644
> --- a/kernel/hung_task.c
> +++ b/kernel/hung_task.c
> @@ -25,6 +25,10 @@
>
> #include <trace/events/sched.h>
>
> +#ifdef CONFIG_DETECT_HUNG_TASK_BLOCKER
> +#include <linux/hung_task.h>
> +#endif
We don't need this #ifdef, since it is already checked in the
header file.
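The reason is the usual stub pattern: the header defines the helpers either way, with empty bodies when the option is off. A single-file userspace sketch of that pattern follows; TOY_CONFIG_BLOCKER and the toy_* helpers are hypothetical stand-ins for the Kconfig option and the helpers in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a Kconfig option; set to 1 to compile the real helpers. */
#define TOY_CONFIG_BLOCKER 0

/* ---- roughly what the header provides ---- */
#if TOY_CONFIG_BLOCKER
static uintptr_t toy_blocker;

static inline void toy_set_blocker(void *lock)
{
	toy_blocker = (uintptr_t)lock;
}

static inline void *toy_get_blocker(void)
{
	return (void *)toy_blocker;
}
#else
/* Feature disabled: same signatures, empty bodies. */
static inline void toy_set_blocker(void *lock)
{
	(void)lock;
}

static inline void *toy_get_blocker(void)
{
	return NULL;
}
#endif

/* ---- roughly what a .c file does: no #ifdef around the include ---- */
static void *toy_block_on(void *lock)
{
	toy_set_blocker(lock);
	return toy_get_blocker();
}
```

With the option off, the caller still compiles and the stub calls are expected to be optimized away, so the include can stay unconditional.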
> +
> /*
> * The number of tasks checked:
> */
> @@ -98,16 +102,17 @@ static struct notifier_block panic_block = {
> static void debug_show_blocker(struct task_struct *task)
> {
> struct task_struct *g, *t;
> - unsigned long owner;
> - struct mutex *lock;
> + unsigned long owner, blocker;
>
> RCU_LOCKDEP_WARN(!rcu_read_lock_held(), "No rcu lock held");
>
> - lock = READ_ONCE(task->blocker_mutex);
> - if (!lock)
> + blocker = READ_ONCE(task->blocker);
> + if (!blocker || !hung_task_blocker_is_type(blocker, BLOCKER_TYPE_MUTEX))
> return;
>
> - owner = mutex_get_owner(lock);
> + owner = mutex_get_owner(
> + (struct mutex *)hung_task_blocker_to_lock(blocker));
> +
> if (unlikely(!owner)) {
> pr_err("INFO: task %s:%d is blocked on a mutex, but the owner is not found.\n",
> task->comm, task->pid);
> diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
> index 6a543c204a14..642d6398e0dd 100644
> --- a/kernel/locking/mutex.c
> +++ b/kernel/locking/mutex.c
> @@ -42,6 +42,10 @@
> # define MUTEX_WARN_ON(cond)
> #endif
>
> +#ifdef CONFIG_DETECT_HUNG_TASK_BLOCKER
> +#include <linux/hung_task.h>
> +#endif
Ditto.
> +
> void
> __mutex_init(struct mutex *lock, const char *name, struct lock_class_key *key)
> {
> @@ -189,7 +193,7 @@ __mutex_add_waiter(struct mutex *lock, struct mutex_waiter *waiter,
> struct list_head *list)
> {
> #ifdef CONFIG_DETECT_HUNG_TASK_BLOCKER
> - WRITE_ONCE(current->blocker_mutex, lock);
> + hung_task_set_blocker(lock, BLOCKER_TYPE_MUTEX);
> #endif
> debug_mutex_add_waiter(lock, waiter, current);
>
> @@ -207,7 +211,7 @@ __mutex_remove_waiter(struct mutex *lock, struct mutex_waiter *waiter)
>
> debug_mutex_remove_waiter(lock, waiter, current);
> #ifdef CONFIG_DETECT_HUNG_TASK_BLOCKER
> - WRITE_ONCE(current->blocker_mutex, NULL);
> + hung_task_clear_blocker();
> #endif
> }
>
> --
> 2.45.2
>
Thank you,
--
Masami Hiramatsu (Google) <mhiramat@kernel.org>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH RESEND v2 1/3] hung_task: replace blocker_mutex with encoded blocker
2025-03-18 9:41 ` Masami Hiramatsu
@ 2025-03-18 11:35 ` Lance Yang
0 siblings, 0 replies; 12+ messages in thread
From: Lance Yang @ 2025-03-18 11:35 UTC (permalink / raw)
To: Masami Hiramatsu
Cc: akpm, will, peterz, mingo, longman, anna.schumaker, boqun.feng,
joel.granados, kent.overstreet, leonylgao, linux-kernel, rostedt,
senozhatsky, tfiga, amaindex, Mingzhe Yang
On Tue, Mar 18, 2025 at 5:42 PM Masami Hiramatsu <mhiramat@kernel.org> wrote:
>
> On Fri, 14 Mar 2025 22:42:58 +0800
> Lance Yang <ioworker0@gmail.com> wrote:
>
> > This patch replaces 'struct mutex *blocker_mutex' with 'unsigned long
> > blocker', as only one blocker is active at a time.
> >
> > The blocker field can store both the lock address and the lock type, with
> > LSB used to encode the type as Masami suggested, making it easier to extend
> > the feature to cover other types of locks.
> >
> > Also, once the lock type is determined, we can directly extract the address
> > and cast it to a lock pointer ;)
>
> Hi Lance,
>
> Thanks for the update. I added some comments.
Thanks for taking time to review!
>
> >
> > Suggested-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
> > Signed-off-by: Mingzhe Yang <mingzhe.yang@ly.com>
> > Signed-off-by: Lance Yang <ioworker0@gmail.com>
> > ---
> > include/linux/hung_task.h | 94 +++++++++++++++++++++++++++++++++++++++
> > include/linux/sched.h | 2 +-
> > kernel/hung_task.c | 15 ++++---
> > kernel/locking/mutex.c | 8 +++-
> > 4 files changed, 111 insertions(+), 8 deletions(-)
> > create mode 100644 include/linux/hung_task.h
> >
> > diff --git a/include/linux/hung_task.h b/include/linux/hung_task.h
> > new file mode 100644
> > index 000000000000..64ced33b0d1f
> > --- /dev/null
> > +++ b/include/linux/hung_task.h
> > @@ -0,0 +1,94 @@
> > +/* SPDX-License-Identifier: GPL-2.0-only */
> > +/*
> > + * Detect Hung Task: detecting tasks stuck in D state
> > + *
> > + * Copyright (C) 2025 Tongcheng Travel (www.ly.com)
> > + * Author: Lance Yang <mingzhe.yang@ly.com>
> > + */
> > +#ifndef __LINUX_HUNG_TASK_H
> > +#define __LINUX_HUNG_TASK_H
> > +
> > +#include <linux/bug.h>
> > +#include <linux/sched.h>
> > +#include <linux/compiler.h>
> > +
> > +/*
> > + * @blocker: Combines lock address and blocking type.
> > + *
> > + * Since lock pointers are at least 4-byte aligned(32-bit) or 8-byte
> > + * aligned(64-bit). This leaves the 2 least bits (LSBs) of the pointer
> > + * always zero. So we can use these bits to encode the specific blocking
> > + * type.
> > + *
> > + * Type encoding:
> > + * 00 - Blocked on mutex (BLOCKER_TYPE_MUTEX)
> > + * 01 - Blocked on semaphore (BLOCKER_TYPE_SEM)
> > + * 10 - Blocked on rt-mutex (BLOCKER_TYPE_RTMUTEX)
> > + * 11 - Blocked on rw-semaphore (BLOCKER_TYPE_RWSEM)
> > + */
> > +#define BLOCKER_TYPE_MUTEX 0x00UL
> > +#define BLOCKER_TYPE_SEM 0x01UL
> > +#define BLOCKER_TYPE_RTMUTEX 0x02UL
> > +#define BLOCKER_TYPE_RWSEM 0x03UL
> > +
> > +#define BLOCKER_TYPE_MASK 0x03UL
> > +
> > +#ifdef CONFIG_DETECT_HUNG_TASK_BLOCKER
> > +static inline void hung_task_set_blocker(void *lock, unsigned long type)
> > +{
> > + unsigned long lock_ptr = (unsigned long)lock;
> > +
> > + WARN_ON_ONCE(!lock_ptr);
> > + WARN_ON_ONCE(lock_ptr & BLOCKER_TYPE_MASK);
> > + WARN_ON_ONCE(READ_ONCE(current->blocker));
>
> The last one needs a comment explaining why it is a problem. For example,
> WARN_ONCE() takes a message:
>
> WARN_ONCE(READ_ONCE(current->blocker),
> "Task blocker is not cleared. Maybe forgot to clear it somewhere? Type: %lu\n",
> current->blocker & BLOCKER_TYPE_MASK);
>
> But I don't think this usually needs to be checked.
Hmm... Looking back, adding it was probably overkill. So let's drop it in
the next version ;)
>
> > +
> > + /*
> > + * If the lock pointer matches the BLOCKER_TYPE_MASK, return
> > + * without writing anything.
> > + */
> > + if (lock_ptr & BLOCKER_TYPE_MASK)
> > + return;
>
> You can use WARN_ON_ONCE for 'if' condition.
>
> if (WARN_ON_ONCE(lock_ptr & BLOCKER_TYPE_MASK))
> return;
Ah, I didn't realize we could do it that way before. That's nice to know!
>
> > +
> > + WRITE_ONCE(current->blocker, lock_ptr | type);
> > +}
> > +
> > +static inline void hung_task_clear_blocker(void)
> > +{
> > + WARN_ON_ONCE(!READ_ONCE(current->blocker));
> > +
> > + WRITE_ONCE(current->blocker, 0UL);
> > +}
> > +
> > +static inline bool hung_task_blocker_is_type(unsigned long blocker,
> > + unsigned long type)
> > +{
> > + WARN_ON_ONCE(!blocker);
> > +
> > + return (blocker & BLOCKER_TYPE_MASK) == type;
> > +}
> > +
> > +static inline void *hung_task_blocker_to_lock(unsigned long blocker)
> > +{
> > + WARN_ON_ONCE(!blocker);
> > +
> > + return (void *)(blocker & ~BLOCKER_TYPE_MASK);
> > +}
> > +#else
> > +static inline void hung_task_set_blocker(void *lock, unsigned long type)
> > +{
> > +}
> > +static inline void hung_task_clear_blocker(void)
> > +{
> > +}
> > +static inline bool hung_task_blocker_is_type(unsigned long blocker,
> > + unsigned long type)
> > +{
> > + return false;
> > +}
> > +static inline void *hung_task_blocker_to_lock(unsigned long blocker)
> > +{
> > + return NULL;
> > +}
> > +#endif
> > +
> > +#endif /* __LINUX_HUNG_TASK_H */
> > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > index 1419d94c8e87..f27060dac499 100644
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -1218,7 +1218,7 @@ struct task_struct {
> > #endif
> >
> > #ifdef CONFIG_DETECT_HUNG_TASK_BLOCKER
> > - struct mutex *blocker_mutex;
> > + unsigned long blocker;
> > #endif
> >
> > #ifdef CONFIG_DEBUG_ATOMIC_SLEEP
> > diff --git a/kernel/hung_task.c b/kernel/hung_task.c
> > index dc898ec93463..46eb6717564d 100644
> > --- a/kernel/hung_task.c
> > +++ b/kernel/hung_task.c
> > @@ -25,6 +25,10 @@
> >
> > #include <trace/events/sched.h>
> >
> > +#ifdef CONFIG_DETECT_HUNG_TASK_BLOCKER
> > +#include <linux/hung_task.h>
> > +#endif
>
> We don't need this #ifdef, since it is already checked in the
> header file.
Good catch! I completely missed it ;(
>
> > +
> > /*
> > * The number of tasks checked:
> > */
> > @@ -98,16 +102,17 @@ static struct notifier_block panic_block = {
> > static void debug_show_blocker(struct task_struct *task)
> > {
> > struct task_struct *g, *t;
> > - unsigned long owner;
> > - struct mutex *lock;
> > + unsigned long owner, blocker;
> >
> > RCU_LOCKDEP_WARN(!rcu_read_lock_held(), "No rcu lock held");
> >
> > - lock = READ_ONCE(task->blocker_mutex);
> > - if (!lock)
> > + blocker = READ_ONCE(task->blocker);
> > + if (!blocker || !hung_task_blocker_is_type(blocker, BLOCKER_TYPE_MUTEX))
> > return;
> >
> > - owner = mutex_get_owner(lock);
> > + owner = mutex_get_owner(
> > + (struct mutex *)hung_task_blocker_to_lock(blocker));
> > +
> > if (unlikely(!owner)) {
> > pr_err("INFO: task %s:%d is blocked on a mutex, but the owner is not found.\n",
> > task->comm, task->pid);
> > diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
> > index 6a543c204a14..642d6398e0dd 100644
> > --- a/kernel/locking/mutex.c
> > +++ b/kernel/locking/mutex.c
> > @@ -42,6 +42,10 @@
> > # define MUTEX_WARN_ON(cond)
> > #endif
> >
> > +#ifdef CONFIG_DETECT_HUNG_TASK_BLOCKER
> > +#include <linux/hung_task.h>
> > +#endif
>
> Ditto.
Yes, you're right. Will remove that as you suggested.
Thanks,
Lance
>
> > +
> > void
> > __mutex_init(struct mutex *lock, const char *name, struct lock_class_key *key)
> > {
> > @@ -189,7 +193,7 @@ __mutex_add_waiter(struct mutex *lock, struct mutex_waiter *waiter,
> > struct list_head *list)
> > {
> > #ifdef CONFIG_DETECT_HUNG_TASK_BLOCKER
> > - WRITE_ONCE(current->blocker_mutex, lock);
> > + hung_task_set_blocker(lock, BLOCKER_TYPE_MUTEX);
> > #endif
> > debug_mutex_add_waiter(lock, waiter, current);
> >
> > @@ -207,7 +211,7 @@ __mutex_remove_waiter(struct mutex *lock, struct mutex_waiter *waiter)
> >
> > debug_mutex_remove_waiter(lock, waiter, current);
> > #ifdef CONFIG_DETECT_HUNG_TASK_BLOCKER
> > - WRITE_ONCE(current->blocker_mutex, NULL);
> > + hung_task_clear_blocker();
> > #endif
> > }
> >
> > --
> > 2.45.2
> >
>
> Thank you,
>
>
> --
> Masami Hiramatsu (Google) <mhiramat@kernel.org>
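
The review comment about the redundant #ifdef around the #include can be illustrated with a small userspace sketch. It assumes, as the comment states, that the header guards its own contents, so including it unconditionally is harmless. Every name here (CONFIG_DEMO_BLOCKER, demo_set_blocker, demo_clear_blocker, demo_wait_on_lock) is hypothetical and not taken from the patch. The empty stubs in the disabled branch are an extra assumption of this sketch; the patch as quoted keeps #ifdef around its call sites.

```c
#include <assert.h>

/* Hypothetical stand-in for a Kconfig symbol; leave it undefined to
 * build the "feature off" variant. */
/* #define CONFIG_DEMO_BLOCKER 1 */

/* ---- contents that would live in the header ---- */
static unsigned long demo_blocker;	/* stand-in for a per-task field */

#ifdef CONFIG_DEMO_BLOCKER
static inline void demo_set_blocker(void *lock)
{
	demo_blocker = (unsigned long)lock;
}

static inline void demo_clear_blocker(void)
{
	demo_blocker = 0;
}
#else
/* Feature off: the helpers compile away to nothing. */
static inline void demo_set_blocker(void *lock) { (void)lock; }
static inline void demo_clear_blocker(void) { }
#endif

/* ---- contents that would live in the .c file ---- */
static int demo_lock_object;

/* Returns what a detector would observe while the caller is "blocked". */
unsigned long demo_wait_on_lock(void)
{
	unsigned long seen;

	demo_set_blocker(&demo_lock_object);
	seen = demo_blocker;
	demo_clear_blocker();
	assert(demo_blocker == 0);	/* nothing stays recorded after the wait */
	return seen;
}
```

Because the configuration check sits inside the header part, the .c part compiles either way without wrapping the include in its own #ifdef.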
* Re: [PATCH RESEND v2 2/3] hung_task: show the blocker task if the task is hung on semaphore
2025-03-14 14:42 ` [PATCH RESEND v2 2/3] hung_task: show the blocker task if the task is hung on semaphore Lance Yang
@ 2025-03-19 11:55 ` Masami Hiramatsu
2025-03-19 12:11 ` Lance Yang
0 siblings, 1 reply; 12+ messages in thread
From: Masami Hiramatsu @ 2025-03-19 11:55 UTC (permalink / raw)
To: Lance Yang
Cc: akpm, will, peterz, mingo, longman, mhiramat, anna.schumaker,
boqun.feng, joel.granados, kent.overstreet, leonylgao,
linux-kernel, rostedt, senozhatsky, tfiga, amaindex, Mingzhe Yang
On Fri, 14 Mar 2025 22:42:59 +0800
Lance Yang <ioworker0@gmail.com> wrote:
> Inspired by mutex blocker tracking[1], this patch makes a trade-off to
> balance the overhead and utility of the hung task detector.
>
> Unlike mutexes, semaphores lack explicit ownership tracking, making it
> challenging to identify the root cause of hangs. To address this, we
> introduce a last_holder field to the semaphore structure, which is
> updated when a task successfully calls down() and cleared during up().
>
> The assumption is that if a task is blocked on a semaphore, the holders
> must not have released it. While this does not guarantee that the last
> holder is one of the current blockers, it likely provides a practical hint
> for diagnosing semaphore-related stalls.
>
> With this change, the hung task detector can now show blocker task's info
> like below:
>
> [Thu Mar 13 15:18:38 2025] INFO: task cat:1803 blocked for more than 122 seconds.
> [Thu Mar 13 15:18:38 2025] Tainted: G OE 6.14.0-rc3+ #14
> [Thu Mar 13 15:18:38 2025] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [Thu Mar 13 15:18:38 2025] task:cat state:D stack:0 pid:1803 tgid:1803 ppid:1057 task_flags:0x400000 flags:0x00000004
> [Thu Mar 13 15:18:38 2025] Call trace:
> [Thu Mar 13 15:18:38 2025] __switch_to+0x1ec/0x380 (T)
> [Thu Mar 13 15:18:38 2025] __schedule+0xc30/0x44f8
> [Thu Mar 13 15:18:38 2025] schedule+0xb8/0x3b0
> [Thu Mar 13 15:18:38 2025] schedule_timeout+0x1d0/0x208
> [Thu Mar 13 15:18:38 2025] __down_common+0x2d4/0x6f8
> [Thu Mar 13 15:18:38 2025] __down+0x24/0x50
> [Thu Mar 13 15:18:38 2025] down+0xd0/0x140
> [Thu Mar 13 15:18:38 2025] read_dummy+0x3c/0xa0 [hung_task_sem]
> [Thu Mar 13 15:18:38 2025] full_proxy_read+0xfc/0x1d0
> [Thu Mar 13 15:18:38 2025] vfs_read+0x1a0/0x858
> [Thu Mar 13 15:18:38 2025] ksys_read+0x100/0x220
> [Thu Mar 13 15:18:38 2025] __arm64_sys_read+0x78/0xc8
> [Thu Mar 13 15:18:38 2025] invoke_syscall+0xd8/0x278
> [Thu Mar 13 15:18:38 2025] el0_svc_common.constprop.0+0xb8/0x298
> [Thu Mar 13 15:18:38 2025] do_el0_svc+0x4c/0x88
> [Thu Mar 13 15:18:38 2025] el0_svc+0x44/0x108
> [Thu Mar 13 15:18:38 2025] el0t_64_sync_handler+0x134/0x160
> [Thu Mar 13 15:18:38 2025] el0t_64_sync+0x1b8/0x1c0
> [Thu Mar 13 15:18:38 2025] INFO: task cat:1803 blocked on a semaphore likely last held by task cat:1802
> [Thu Mar 13 15:18:38 2025] task:cat state:S stack:0 pid:1802 tgid:1802 ppid:1057 task_flags:0x400000 flags:0x00000004
> [Thu Mar 13 15:18:38 2025] Call trace:
> [Thu Mar 13 15:18:38 2025] __switch_to+0x1ec/0x380 (T)
> [Thu Mar 13 15:18:38 2025] __schedule+0xc30/0x44f8
> [Thu Mar 13 15:18:38 2025] schedule+0xb8/0x3b0
> [Thu Mar 13 15:18:38 2025] schedule_timeout+0xf4/0x208
> [Thu Mar 13 15:18:38 2025] msleep_interruptible+0x70/0x130
> [Thu Mar 13 15:18:38 2025] read_dummy+0x48/0xa0 [hung_task_sem]
> [Thu Mar 13 15:18:38 2025] full_proxy_read+0xfc/0x1d0
> [Thu Mar 13 15:18:38 2025] vfs_read+0x1a0/0x858
> [Thu Mar 13 15:18:38 2025] ksys_read+0x100/0x220
> [Thu Mar 13 15:18:38 2025] __arm64_sys_read+0x78/0xc8
> [Thu Mar 13 15:18:38 2025] invoke_syscall+0xd8/0x278
> [Thu Mar 13 15:18:38 2025] el0_svc_common.constprop.0+0xb8/0x298
> [Thu Mar 13 15:18:38 2025] do_el0_svc+0x4c/0x88
> [Thu Mar 13 15:18:38 2025] el0_svc+0x44/0x108
> [Thu Mar 13 15:18:38 2025] el0t_64_sync_handler+0x134/0x160
> [Thu Mar 13 15:18:38 2025] el0t_64_sync+0x1b8/0x1c0
>
> [1] https://lore.kernel.org/all/174046694331.2194069.15472952050240807469.stgit@mhiramat.tok.corp.google.com
>
> Suggested-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
> Signed-off-by: Mingzhe Yang <mingzhe.yang@ly.com>
> Signed-off-by: Lance Yang <ioworker0@gmail.com>
> ---
> include/linux/semaphore.h | 15 ++++++++++-
> kernel/hung_task.c | 45 ++++++++++++++++++++++++-------
> kernel/locking/semaphore.c | 55 +++++++++++++++++++++++++++++++++-----
> 3 files changed, 98 insertions(+), 17 deletions(-)
>
> diff --git a/include/linux/semaphore.h b/include/linux/semaphore.h
> index 04655faadc2d..89706157e622 100644
> --- a/include/linux/semaphore.h
> +++ b/include/linux/semaphore.h
> @@ -16,13 +16,25 @@ struct semaphore {
> raw_spinlock_t lock;
> unsigned int count;
> struct list_head wait_list;
> +
> +#ifdef CONFIG_DETECT_HUNG_TASK_BLOCKER
> + unsigned long last_holder;
> +#endif
> };
>
> +#ifdef CONFIG_DETECT_HUNG_TASK_BLOCKER
> +#define __LAST_HOLDER_SEMAPHORE_INITIALIZER \
> + , .last_holder = 0UL
> +#else
> +#define __LAST_HOLDER_SEMAPHORE_INITIALIZER
> +#endif
> +
> #define __SEMAPHORE_INITIALIZER(name, n) \
> { \
> .lock = __RAW_SPIN_LOCK_UNLOCKED((name).lock), \
> .count = n, \
> - .wait_list = LIST_HEAD_INIT((name).wait_list), \
> + .wait_list = LIST_HEAD_INIT((name).wait_list) \
> + __LAST_HOLDER_SEMAPHORE_INITIALIZER \
> }
>
> /*
> @@ -47,5 +59,6 @@ extern int __must_check down_killable(struct semaphore *sem);
> extern int __must_check down_trylock(struct semaphore *sem);
> extern int __must_check down_timeout(struct semaphore *sem, long jiffies);
> extern void up(struct semaphore *sem);
> +extern unsigned long sem_last_holder(struct semaphore *sem);
>
> #endif /* __LINUX_SEMAPHORE_H */
> diff --git a/kernel/hung_task.c b/kernel/hung_task.c
> index 46eb6717564d..f8cb5a0e14f7 100644
> --- a/kernel/hung_task.c
> +++ b/kernel/hung_task.c
> @@ -102,31 +102,56 @@ static struct notifier_block panic_block = {
> static void debug_show_blocker(struct task_struct *task)
> {
> struct task_struct *g, *t;
> - unsigned long owner, blocker;
> + unsigned long owner, blocker, blocker_lock_type;
>
> RCU_LOCKDEP_WARN(!rcu_read_lock_held(), "No rcu lock held");
>
> blocker = READ_ONCE(task->blocker);
> - if (!blocker || !hung_task_blocker_is_type(blocker, BLOCKER_TYPE_MUTEX))
> + if (!blocker)
> return;
>
> - owner = mutex_get_owner(
> - (struct mutex *)hung_task_blocker_to_lock(blocker));
> + if (hung_task_blocker_is_type(blocker, BLOCKER_TYPE_MUTEX)) {
> + owner = mutex_get_owner(
> + (struct mutex *)hung_task_blocker_to_lock(blocker));
> + blocker_lock_type = BLOCKER_TYPE_MUTEX;
> + } else if (hung_task_blocker_is_type(blocker, BLOCKER_TYPE_SEM)) {
> + owner = sem_last_holder(
> + (struct semaphore *)hung_task_blocker_to_lock(blocker));
> + blocker_lock_type = BLOCKER_TYPE_SEM;
> + } else
Can't we extract the type from blocker? I think we can just mask it.
(then, we can use switch-case above)
Everything else looks good to me.
Thanks,
> + return;
>
> if (unlikely(!owner)) {
> - pr_err("INFO: task %s:%d is blocked on a mutex, but the owner is not found.\n",
> - task->comm, task->pid);
> + switch (blocker_lock_type) {
> + case BLOCKER_TYPE_MUTEX:
> + pr_err("INFO: task %s:%d is blocked on a mutex, but the owner is not found.\n",
> + task->comm, task->pid);
> + break;
> + case BLOCKER_TYPE_SEM:
> + pr_err("INFO: task %s:%d is blocked on a semaphore, but the last holder is not found.\n",
> + task->comm, task->pid);
> + break;
> + }
> return;
> }
>
> /* Ensure the owner information is correct. */
> for_each_process_thread(g, t) {
> - if ((unsigned long)t == owner) {
> + if ((unsigned long)t != owner)
> + continue;
> +
> + switch (blocker_lock_type) {
> + case BLOCKER_TYPE_MUTEX:
> pr_err("INFO: task %s:%d is blocked on a mutex likely owned by task %s:%d.\n",
> - task->comm, task->pid, t->comm, t->pid);
> - sched_show_task(t);
> - return;
> + task->comm, task->pid, t->comm, t->pid);
> + break;
> + case BLOCKER_TYPE_SEM:
> + pr_err("INFO: task %s:%d blocked on a semaphore likely last held by task %s:%d\n",
> + task->comm, task->pid, t->comm, t->pid);
> + break;
> }
> + sched_show_task(t);
> + return;
> }
> }
> #else
> diff --git a/kernel/locking/semaphore.c b/kernel/locking/semaphore.c
> index 34bfae72f295..87dfb93a812d 100644
> --- a/kernel/locking/semaphore.c
> +++ b/kernel/locking/semaphore.c
> @@ -34,11 +34,16 @@
> #include <linux/ftrace.h>
> #include <trace/events/lock.h>
>
> +#ifdef CONFIG_DETECT_HUNG_TASK_BLOCKER
> +#include <linux/hung_task.h>
> +#endif
> +
> static noinline void __down(struct semaphore *sem);
> static noinline int __down_interruptible(struct semaphore *sem);
> static noinline int __down_killable(struct semaphore *sem);
> static noinline int __down_timeout(struct semaphore *sem, long timeout);
> static noinline void __up(struct semaphore *sem);
> +static inline void __sem_acquire(struct semaphore *sem);
>
> /**
> * down - acquire the semaphore
> @@ -58,7 +63,7 @@ void __sched down(struct semaphore *sem)
> might_sleep();
> raw_spin_lock_irqsave(&sem->lock, flags);
> if (likely(sem->count > 0))
> - sem->count--;
> + __sem_acquire(sem);
> else
> __down(sem);
> raw_spin_unlock_irqrestore(&sem->lock, flags);
> @@ -82,7 +87,7 @@ int __sched down_interruptible(struct semaphore *sem)
> might_sleep();
> raw_spin_lock_irqsave(&sem->lock, flags);
> if (likely(sem->count > 0))
> - sem->count--;
> + __sem_acquire(sem);
> else
> result = __down_interruptible(sem);
> raw_spin_unlock_irqrestore(&sem->lock, flags);
> @@ -109,7 +114,7 @@ int __sched down_killable(struct semaphore *sem)
> might_sleep();
> raw_spin_lock_irqsave(&sem->lock, flags);
> if (likely(sem->count > 0))
> - sem->count--;
> + __sem_acquire(sem);
> else
> result = __down_killable(sem);
> raw_spin_unlock_irqrestore(&sem->lock, flags);
> @@ -139,7 +144,7 @@ int __sched down_trylock(struct semaphore *sem)
> raw_spin_lock_irqsave(&sem->lock, flags);
> count = sem->count - 1;
> if (likely(count >= 0))
> - sem->count = count;
> + __sem_acquire(sem);
> raw_spin_unlock_irqrestore(&sem->lock, flags);
>
> return (count < 0);
> @@ -164,7 +169,7 @@ int __sched down_timeout(struct semaphore *sem, long timeout)
> might_sleep();
> raw_spin_lock_irqsave(&sem->lock, flags);
> if (likely(sem->count > 0))
> - sem->count--;
> + __sem_acquire(sem);
> else
> result = __down_timeout(sem, timeout);
> raw_spin_unlock_irqrestore(&sem->lock, flags);
> @@ -185,6 +190,12 @@ void __sched up(struct semaphore *sem)
> unsigned long flags;
>
> raw_spin_lock_irqsave(&sem->lock, flags);
> +
> +#ifdef CONFIG_DETECT_HUNG_TASK_BLOCKER
> + if (READ_ONCE(sem->last_holder) == (unsigned long)current)
> + WRITE_ONCE(sem->last_holder, 0UL);
> +#endif
> +
> if (likely(list_empty(&sem->wait_list)))
> sem->count++;
> else
> @@ -224,8 +235,12 @@ static inline int __sched ___down_common(struct semaphore *sem, long state,
> raw_spin_unlock_irq(&sem->lock);
> timeout = schedule_timeout(timeout);
> raw_spin_lock_irq(&sem->lock);
> - if (waiter.up)
> + if (waiter.up) {
> +#ifdef CONFIG_DETECT_HUNG_TASK_BLOCKER
> + WRITE_ONCE(sem->last_holder, (unsigned long)current);
> +#endif
> return 0;
> + }
> }
>
> timed_out:
> @@ -242,10 +257,18 @@ static inline int __sched __down_common(struct semaphore *sem, long state,
> {
> int ret;
>
> +#ifdef CONFIG_DETECT_HUNG_TASK_BLOCKER
> + hung_task_set_blocker(sem, BLOCKER_TYPE_SEM);
> +#endif
> +
> trace_contention_begin(sem, 0);
> ret = ___down_common(sem, state, timeout);
> trace_contention_end(sem, ret);
>
> +#ifdef CONFIG_DETECT_HUNG_TASK_BLOCKER
> + hung_task_clear_blocker();
> +#endif
> +
> return ret;
> }
>
> @@ -277,3 +300,23 @@ static noinline void __sched __up(struct semaphore *sem)
> waiter->up = true;
> wake_up_process(waiter->task);
> }
> +
> +#ifdef CONFIG_DETECT_HUNG_TASK_BLOCKER
> +unsigned long sem_last_holder(struct semaphore *sem)
> +{
> + return READ_ONCE(sem->last_holder);
> +}
> +#else
> +unsigned long sem_last_holder(struct semaphore *sem)
> +{
> + return 0UL;
> +}
> +#endif
> +
> +static inline void __sem_acquire(struct semaphore *sem)
> +{
> + sem->count--;
> +#ifdef CONFIG_DETECT_HUNG_TASK_BLOCKER
> + WRITE_ONCE(sem->last_holder, (unsigned long)current);
> +#endif
> +}
> --
> 2.45.2
>
--
Masami Hiramatsu (Google) <mhiramat@kernel.org>
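
The last_holder bookkeeping described in the quoted commit message can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions: tasks are plain integer ids, 0 means "no holder recorded", and neither the spinlock nor sleeping is modelled. The names demo_sem, demo_down_trylock and demo_up are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of a counting semaphore with a
 * "last holder" hint. No locking or blocking is modelled. */
struct demo_sem {
	unsigned int count;
	unsigned long last_holder;	/* 0 means no holder recorded */
};

/* Try to take the semaphore; on success remember who took it. */
bool demo_down_trylock(struct demo_sem *sem, unsigned long task)
{
	assert(task != 0);
	if (sem->count == 0)
		return false;
	sem->count--;
	sem->last_holder = task;
	return true;
}

/* Release; clear the hint only if the releasing task is the recorded one. */
void demo_up(struct demo_sem *sem, unsigned long task)
{
	if (sem->last_holder == task)
		sem->last_holder = 0;
	sem->count++;
}
```

With an initial count above one, the hint can be cleared while another task still holds the semaphore (the most recent acquirer releases first), which is consistent with the commit message calling it a practical hint rather than a guarantee.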
* Re: [PATCH RESEND v2 2/3] hung_task: show the blocker task if the task is hung on semaphore
2025-03-19 11:55 ` Masami Hiramatsu
@ 2025-03-19 12:11 ` Lance Yang
0 siblings, 0 replies; 12+ messages in thread
From: Lance Yang @ 2025-03-19 12:11 UTC (permalink / raw)
To: Masami Hiramatsu
Cc: akpm, will, peterz, mingo, longman, anna.schumaker, boqun.feng,
joel.granados, kent.overstreet, leonylgao, linux-kernel, rostedt,
senozhatsky, tfiga, amaindex, Mingzhe Yang
On Wed, Mar 19, 2025 at 7:55 PM Masami Hiramatsu <mhiramat@kernel.org> wrote:
>
> On Fri, 14 Mar 2025 22:42:59 +0800
> Lance Yang <ioworker0@gmail.com> wrote:
>
> > Inspired by mutex blocker tracking[1], this patch makes a trade-off to
> > balance the overhead and utility of the hung task detector.
> >
> > Unlike mutexes, semaphores lack explicit ownership tracking, making it
> > challenging to identify the root cause of hangs. To address this, we
> > introduce a last_holder field to the semaphore structure, which is
> > updated when a task successfully calls down() and cleared during up().
> >
> > The assumption is that if a task is blocked on a semaphore, the holders
> > must not have released it. While this does not guarantee that the last
> > holder is one of the current blockers, it likely provides a practical hint
> > for diagnosing semaphore-related stalls.
> >
> > With this change, the hung task detector can now show blocker task's info
> > like below:
> >
> > [Thu Mar 13 15:18:38 2025] INFO: task cat:1803 blocked for more than 122 seconds.
> > [Thu Mar 13 15:18:38 2025] Tainted: G OE 6.14.0-rc3+ #14
> > [Thu Mar 13 15:18:38 2025] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > [Thu Mar 13 15:18:38 2025] task:cat state:D stack:0 pid:1803 tgid:1803 ppid:1057 task_flags:0x400000 flags:0x00000004
> > [Thu Mar 13 15:18:38 2025] Call trace:
> > [Thu Mar 13 15:18:38 2025] __switch_to+0x1ec/0x380 (T)
> > [Thu Mar 13 15:18:38 2025] __schedule+0xc30/0x44f8
> > [Thu Mar 13 15:18:38 2025] schedule+0xb8/0x3b0
> > [Thu Mar 13 15:18:38 2025] schedule_timeout+0x1d0/0x208
> > [Thu Mar 13 15:18:38 2025] __down_common+0x2d4/0x6f8
> > [Thu Mar 13 15:18:38 2025] __down+0x24/0x50
> > [Thu Mar 13 15:18:38 2025] down+0xd0/0x140
> > [Thu Mar 13 15:18:38 2025] read_dummy+0x3c/0xa0 [hung_task_sem]
> > [Thu Mar 13 15:18:38 2025] full_proxy_read+0xfc/0x1d0
> > [Thu Mar 13 15:18:38 2025] vfs_read+0x1a0/0x858
> > [Thu Mar 13 15:18:38 2025] ksys_read+0x100/0x220
> > [Thu Mar 13 15:18:38 2025] __arm64_sys_read+0x78/0xc8
> > [Thu Mar 13 15:18:38 2025] invoke_syscall+0xd8/0x278
> > [Thu Mar 13 15:18:38 2025] el0_svc_common.constprop.0+0xb8/0x298
> > [Thu Mar 13 15:18:38 2025] do_el0_svc+0x4c/0x88
> > [Thu Mar 13 15:18:38 2025] el0_svc+0x44/0x108
> > [Thu Mar 13 15:18:38 2025] el0t_64_sync_handler+0x134/0x160
> > [Thu Mar 13 15:18:38 2025] el0t_64_sync+0x1b8/0x1c0
> > [Thu Mar 13 15:18:38 2025] INFO: task cat:1803 blocked on a semaphore likely last held by task cat:1802
> > [Thu Mar 13 15:18:38 2025] task:cat state:S stack:0 pid:1802 tgid:1802 ppid:1057 task_flags:0x400000 flags:0x00000004
> > [Thu Mar 13 15:18:38 2025] Call trace:
> > [Thu Mar 13 15:18:38 2025] __switch_to+0x1ec/0x380 (T)
> > [Thu Mar 13 15:18:38 2025] __schedule+0xc30/0x44f8
> > [Thu Mar 13 15:18:38 2025] schedule+0xb8/0x3b0
> > [Thu Mar 13 15:18:38 2025] schedule_timeout+0xf4/0x208
> > [Thu Mar 13 15:18:38 2025] msleep_interruptible+0x70/0x130
> > [Thu Mar 13 15:18:38 2025] read_dummy+0x48/0xa0 [hung_task_sem]
> > [Thu Mar 13 15:18:38 2025] full_proxy_read+0xfc/0x1d0
> > [Thu Mar 13 15:18:38 2025] vfs_read+0x1a0/0x858
> > [Thu Mar 13 15:18:38 2025] ksys_read+0x100/0x220
> > [Thu Mar 13 15:18:38 2025] __arm64_sys_read+0x78/0xc8
> > [Thu Mar 13 15:18:38 2025] invoke_syscall+0xd8/0x278
> > [Thu Mar 13 15:18:38 2025] el0_svc_common.constprop.0+0xb8/0x298
> > [Thu Mar 13 15:18:38 2025] do_el0_svc+0x4c/0x88
> > [Thu Mar 13 15:18:38 2025] el0_svc+0x44/0x108
> > [Thu Mar 13 15:18:38 2025] el0t_64_sync_handler+0x134/0x160
> > [Thu Mar 13 15:18:38 2025] el0t_64_sync+0x1b8/0x1c0
> >
> > [1] https://lore.kernel.org/all/174046694331.2194069.15472952050240807469.stgit@mhiramat.tok.corp.google.com
> >
> > Suggested-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
> > Signed-off-by: Mingzhe Yang <mingzhe.yang@ly.com>
> > Signed-off-by: Lance Yang <ioworker0@gmail.com>
> > ---
> > include/linux/semaphore.h | 15 ++++++++++-
> > kernel/hung_task.c | 45 ++++++++++++++++++++++++-------
> > kernel/locking/semaphore.c | 55 +++++++++++++++++++++++++++++++++-----
> > 3 files changed, 98 insertions(+), 17 deletions(-)
> >
> > diff --git a/include/linux/semaphore.h b/include/linux/semaphore.h
> > index 04655faadc2d..89706157e622 100644
> > --- a/include/linux/semaphore.h
> > +++ b/include/linux/semaphore.h
> > @@ -16,13 +16,25 @@ struct semaphore {
> > raw_spinlock_t lock;
> > unsigned int count;
> > struct list_head wait_list;
> > +
> > +#ifdef CONFIG_DETECT_HUNG_TASK_BLOCKER
> > + unsigned long last_holder;
> > +#endif
> > };
> >
> > +#ifdef CONFIG_DETECT_HUNG_TASK_BLOCKER
> > +#define __LAST_HOLDER_SEMAPHORE_INITIALIZER \
> > + , .last_holder = 0UL
> > +#else
> > +#define __LAST_HOLDER_SEMAPHORE_INITIALIZER
> > +#endif
> > +
> > #define __SEMAPHORE_INITIALIZER(name, n) \
> > { \
> > .lock = __RAW_SPIN_LOCK_UNLOCKED((name).lock), \
> > .count = n, \
> > - .wait_list = LIST_HEAD_INIT((name).wait_list), \
> > + .wait_list = LIST_HEAD_INIT((name).wait_list) \
> > + __LAST_HOLDER_SEMAPHORE_INITIALIZER \
> > }
> >
> > /*
> > @@ -47,5 +59,6 @@ extern int __must_check down_killable(struct semaphore *sem);
> > extern int __must_check down_trylock(struct semaphore *sem);
> > extern int __must_check down_timeout(struct semaphore *sem, long jiffies);
> > extern void up(struct semaphore *sem);
> > +extern unsigned long sem_last_holder(struct semaphore *sem);
> >
> > #endif /* __LINUX_SEMAPHORE_H */
> > diff --git a/kernel/hung_task.c b/kernel/hung_task.c
> > index 46eb6717564d..f8cb5a0e14f7 100644
> > --- a/kernel/hung_task.c
> > +++ b/kernel/hung_task.c
> > @@ -102,31 +102,56 @@ static struct notifier_block panic_block = {
> > static void debug_show_blocker(struct task_struct *task)
> > {
> > struct task_struct *g, *t;
> > - unsigned long owner, blocker;
> > + unsigned long owner, blocker, blocker_lock_type;
> >
> > RCU_LOCKDEP_WARN(!rcu_read_lock_held(), "No rcu lock held");
> >
> > blocker = READ_ONCE(task->blocker);
> > - if (!blocker || !hung_task_blocker_is_type(blocker, BLOCKER_TYPE_MUTEX))
> > + if (!blocker)
> > return;
> >
> > - owner = mutex_get_owner(
> > - (struct mutex *)hung_task_blocker_to_lock(blocker));
> > + if (hung_task_blocker_is_type(blocker, BLOCKER_TYPE_MUTEX)) {
> > + owner = mutex_get_owner(
> > + (struct mutex *)hung_task_blocker_to_lock(blocker));
> > + blocker_lock_type = BLOCKER_TYPE_MUTEX;
> > + } else if (hung_task_blocker_is_type(blocker, BLOCKER_TYPE_SEM)) {
> > + owner = sem_last_holder(
> > + (struct semaphore *)hung_task_blocker_to_lock(blocker));
> > + blocker_lock_type = BLOCKER_TYPE_SEM;
> > + } else
>
> Can't we extract the type from blocker? I think we can just mask it.
> (then, we can use switch-case above)
Yep, I was thinking the same. Will adjust as you suggested ;)
Thanks,
Lance
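
A minimal userspace sketch of the suggestion discussed here: mask the type out of the encoded blocker value, then switch on it. The bit layout is an assumption (lock objects at least 4-byte aligned, type tag in the two low bits), and the DEMO_* constants and demo_* helpers are hypothetical names, not the ones from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: the two low bits of an aligned lock pointer carry a
 * type tag. Values are illustrative only. */
#define DEMO_TYPE_MUTEX	((uintptr_t)0x0)
#define DEMO_TYPE_SEM	((uintptr_t)0x1)
#define DEMO_TYPE_MASK	((uintptr_t)0x3)

/* Pack a lock pointer and a type tag into one word. */
static inline uintptr_t demo_encode(const void *lock, uintptr_t type)
{
	uintptr_t p = (uintptr_t)lock;

	assert((p & DEMO_TYPE_MASK) == 0);	/* low bits must be free */
	assert(type <= DEMO_TYPE_MASK);
	return p | type;
}

/* Extract the type tag by masking. */
static inline uintptr_t demo_type(uintptr_t blocker)
{
	return blocker & DEMO_TYPE_MASK;
}

/* Recover the lock pointer by clearing the tag bits. */
static inline void *demo_lock(uintptr_t blocker)
{
	return (void *)(blocker & ~DEMO_TYPE_MASK);
}

/* One switch on the masked type replaces the if/else-if chain. */
const char *demo_describe(uintptr_t blocker)
{
	switch (demo_type(blocker)) {
	case DEMO_TYPE_MUTEX:
		return "mutex";
	case DEMO_TYPE_SEM:
		return "semaphore";
	default:
		return "unknown";
	}
}
```

Extracting the type once also removes the need to track a separate lock-type variable alongside the encoded value.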
>
> Everything else looks good to me.
>
> Thanks,
>
> > + return;
> >
> > if (unlikely(!owner)) {
> > - pr_err("INFO: task %s:%d is blocked on a mutex, but the owner is not found.\n",
> > - task->comm, task->pid);
> > + switch (blocker_lock_type) {
> > + case BLOCKER_TYPE_MUTEX:
> > + pr_err("INFO: task %s:%d is blocked on a mutex, but the owner is not found.\n",
> > + task->comm, task->pid);
> > + break;
> > + case BLOCKER_TYPE_SEM:
> > + pr_err("INFO: task %s:%d is blocked on a semaphore, but the last holder is not found.\n",
> > + task->comm, task->pid);
> > + break;
> > + }
> > return;
> > }
> >
> > /* Ensure the owner information is correct. */
> > for_each_process_thread(g, t) {
> > - if ((unsigned long)t == owner) {
> > + if ((unsigned long)t != owner)
> > + continue;
> > +
> > + switch (blocker_lock_type) {
> > + case BLOCKER_TYPE_MUTEX:
> > pr_err("INFO: task %s:%d is blocked on a mutex likely owned by task %s:%d.\n",
> > - task->comm, task->pid, t->comm, t->pid);
> > - sched_show_task(t);
> > - return;
> > + task->comm, task->pid, t->comm, t->pid);
> > + break;
> > + case BLOCKER_TYPE_SEM:
> > + pr_err("INFO: task %s:%d blocked on a semaphore likely last held by task %s:%d\n",
> > + task->comm, task->pid, t->comm, t->pid);
> > + break;
> > }
> > + sched_show_task(t);
> > + return;
> > }
> > }
> > #else
> > diff --git a/kernel/locking/semaphore.c b/kernel/locking/semaphore.c
> > index 34bfae72f295..87dfb93a812d 100644
> > --- a/kernel/locking/semaphore.c
> > +++ b/kernel/locking/semaphore.c
> > @@ -34,11 +34,16 @@
> > #include <linux/ftrace.h>
> > #include <trace/events/lock.h>
> >
> > +#ifdef CONFIG_DETECT_HUNG_TASK_BLOCKER
> > +#include <linux/hung_task.h>
> > +#endif
> > +
> > static noinline void __down(struct semaphore *sem);
> > static noinline int __down_interruptible(struct semaphore *sem);
> > static noinline int __down_killable(struct semaphore *sem);
> > static noinline int __down_timeout(struct semaphore *sem, long timeout);
> > static noinline void __up(struct semaphore *sem);
> > +static inline void __sem_acquire(struct semaphore *sem);
> >
> > /**
> > * down - acquire the semaphore
> > @@ -58,7 +63,7 @@ void __sched down(struct semaphore *sem)
> > might_sleep();
> > raw_spin_lock_irqsave(&sem->lock, flags);
> > if (likely(sem->count > 0))
> > - sem->count--;
> > + __sem_acquire(sem);
> > else
> > __down(sem);
> > raw_spin_unlock_irqrestore(&sem->lock, flags);
> > @@ -82,7 +87,7 @@ int __sched down_interruptible(struct semaphore *sem)
> > might_sleep();
> > raw_spin_lock_irqsave(&sem->lock, flags);
> > if (likely(sem->count > 0))
> > - sem->count--;
> > + __sem_acquire(sem);
> > else
> > result = __down_interruptible(sem);
> > raw_spin_unlock_irqrestore(&sem->lock, flags);
> > @@ -109,7 +114,7 @@ int __sched down_killable(struct semaphore *sem)
> > might_sleep();
> > raw_spin_lock_irqsave(&sem->lock, flags);
> > if (likely(sem->count > 0))
> > - sem->count--;
> > + __sem_acquire(sem);
> > else
> > result = __down_killable(sem);
> > raw_spin_unlock_irqrestore(&sem->lock, flags);
> > @@ -139,7 +144,7 @@ int __sched down_trylock(struct semaphore *sem)
> > raw_spin_lock_irqsave(&sem->lock, flags);
> > count = sem->count - 1;
> > if (likely(count >= 0))
> > - sem->count = count;
> > + __sem_acquire(sem);
> > raw_spin_unlock_irqrestore(&sem->lock, flags);
> >
> > return (count < 0);
> > @@ -164,7 +169,7 @@ int __sched down_timeout(struct semaphore *sem, long timeout)
> > might_sleep();
> > raw_spin_lock_irqsave(&sem->lock, flags);
> > if (likely(sem->count > 0))
> > - sem->count--;
> > + __sem_acquire(sem);
> > else
> > result = __down_timeout(sem, timeout);
> > raw_spin_unlock_irqrestore(&sem->lock, flags);
> > @@ -185,6 +190,12 @@ void __sched up(struct semaphore *sem)
> > unsigned long flags;
> >
> > raw_spin_lock_irqsave(&sem->lock, flags);
> > +
> > +#ifdef CONFIG_DETECT_HUNG_TASK_BLOCKER
> > + if (READ_ONCE(sem->last_holder) == (unsigned long)current)
> > + WRITE_ONCE(sem->last_holder, 0UL);
> > +#endif
> > +
> > if (likely(list_empty(&sem->wait_list)))
> > sem->count++;
> > else
> > @@ -224,8 +235,12 @@ static inline int __sched ___down_common(struct semaphore *sem, long state,
> > raw_spin_unlock_irq(&sem->lock);
> > timeout = schedule_timeout(timeout);
> > raw_spin_lock_irq(&sem->lock);
> > - if (waiter.up)
> > + if (waiter.up) {
> > +#ifdef CONFIG_DETECT_HUNG_TASK_BLOCKER
> > + WRITE_ONCE(sem->last_holder, (unsigned long)current);
> > +#endif
> > return 0;
> > + }
> > }
> >
> > timed_out:
> > @@ -242,10 +257,18 @@ static inline int __sched __down_common(struct semaphore *sem, long state,
> > {
> > int ret;
> >
> > +#ifdef CONFIG_DETECT_HUNG_TASK_BLOCKER
> > + hung_task_set_blocker(sem, BLOCKER_TYPE_SEM);
> > +#endif
> > +
> > trace_contention_begin(sem, 0);
> > ret = ___down_common(sem, state, timeout);
> > trace_contention_end(sem, ret);
> >
> > +#ifdef CONFIG_DETECT_HUNG_TASK_BLOCKER
> > + hung_task_clear_blocker();
> > +#endif
> > +
> > return ret;
> > }
> >
> > @@ -277,3 +300,23 @@ static noinline void __sched __up(struct semaphore *sem)
> > waiter->up = true;
> > wake_up_process(waiter->task);
> > }
> > +
> > +#ifdef CONFIG_DETECT_HUNG_TASK_BLOCKER
> > +unsigned long sem_last_holder(struct semaphore *sem)
> > +{
> > + return READ_ONCE(sem->last_holder);
> > +}
> > +#else
> > +unsigned long sem_last_holder(struct semaphore *sem)
> > +{
> > + return 0UL;
> > +}
> > +#endif
> > +
> > +static inline void __sem_acquire(struct semaphore *sem)
> > +{
> > + sem->count--;
> > +#ifdef CONFIG_DETECT_HUNG_TASK_BLOCKER
> > + WRITE_ONCE(sem->last_holder, (unsigned long)current);
> > +#endif
> > +}
> > --
> > 2.45.2
> >
>
>
> --
> Masami Hiramatsu (Google) <mhiramat@kernel.org>
Thread overview: 12+ messages
2025-03-14 14:42 [PATCH RESEND v2 0/3] hung_task: extend blocking task stacktrace dump to semaphore Lance Yang
2025-03-14 14:42 ` [PATCH RESEND v2 1/3] hung_task: replace blocker_mutex with encoded blocker Lance Yang
2025-03-18 9:41 ` Masami Hiramatsu
2025-03-18 11:35 ` Lance Yang
2025-03-14 14:42 ` [PATCH RESEND v2 2/3] hung_task: show the blocker task if the task is hung on semaphore Lance Yang
2025-03-19 11:55 ` Masami Hiramatsu
2025-03-19 12:11 ` Lance Yang
2025-03-14 14:43 ` [PATCH RESEND v2 3/3] samples: add hung_task detector semaphore blocking sample Lance Yang
2025-03-18 1:36 ` Masami Hiramatsu
2025-03-18 6:20 ` Amaindex
2025-03-14 17:37 ` [PATCH RESEND v2 0/3] hung_task: extend blocking task stacktrace dump to semaphore Boqun Feng
2025-03-15 3:18 ` Lance Yang