[PATCH v5 0/3] hung_task: extend blocking task stacktrace dump to semaphore
From: Lance Yang @ 2025-04-14 14:59 UTC
  To: akpm
  Cc: will, peterz, mingo, longman, mhiramat, anna.schumaker,
	boqun.feng, joel.granados, kent.overstreet, leonylgao,
	linux-kernel, rostedt, senozhatsky, tfiga, amaindex, jstultz,
	Lance Yang

Hi all,

Inspired by mutex blocker tracking[1], this patch series extends the
feature so that the hung task detector dumps the blocker task not only
for mutexes but also for semaphores. Unlike mutexes, semaphores lack
explicit ownership tracking, which makes it hard to identify the root
cause of a hang. To address this, we add a last_holder field to the
semaphore structure; it is updated when a task successfully calls down()
and cleared during up().
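
As a rough illustration of the last_holder idea, here is a minimal userspace
sketch. It is not the kernel code from this series: struct task,
struct model_sem and the model_* functions are hypothetical stand-ins, there
is no locking, and acquisition is modelled as a non-blocking trylock.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; not the kernel's task_struct/semaphore. */
struct task {
	int pid;
};

struct model_sem {
	unsigned int count;             /* free slots left */
	const struct task *last_holder; /* most recent successful acquirer, or NULL */
};

/* Non-blocking acquire: returns 1 on success, 0 if the caller would block. */
int model_down_trylock(struct model_sem *sem, const struct task *self)
{
	assert(sem != NULL && self != NULL);
	if (sem->count == 0)
		return 0;
	sem->count--;
	sem->last_holder = self;        /* record who took the slot */
	return 1;
}

/* Release: clear the hint, as the cover letter describes for up(). */
void model_up(struct model_sem *sem)
{
	assert(sem != NULL);
	sem->last_holder = NULL;
	sem->count++;
}

/* Best-effort hint for a waiter: the last holder, if the semaphore is exhausted. */
const struct task *model_likely_blocker(const struct model_sem *sem)
{
	return sem->count == 0 ? sem->last_holder : NULL;
}
```

With a count greater than one, only the most recent acquirer is remembered,
which is why the result can only be treated as a hint.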

The assumption is that if a task is blocked on a semaphore, the holders
must not have released it. While this does not guarantee that the last
holder is one of the current blockers, it likely provides a practical hint
for diagnosing semaphore-related stalls.

With this change, the hung task detector can now show the blocker task's
information, as in the example below:

[Tue Apr  8 12:19:07 2025] INFO: task cat:945 blocked for more than 120 seconds.
[Tue Apr  8 12:19:07 2025]       Tainted: G            E      6.14.0-rc6+ #1
[Tue Apr  8 12:19:07 2025] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Tue Apr  8 12:19:07 2025] task:cat             state:D stack:0     pid:945   tgid:945   ppid:828    task_flags:0x400000 flags:0x00000000
[Tue Apr  8 12:19:07 2025] Call Trace:
[Tue Apr  8 12:19:07 2025]  <TASK>
[Tue Apr  8 12:19:07 2025]  __schedule+0x491/0xbd0
[Tue Apr  8 12:19:07 2025]  schedule+0x27/0xf0
[Tue Apr  8 12:19:07 2025]  schedule_timeout+0xe3/0xf0
[Tue Apr  8 12:19:07 2025]  ? __folio_mod_stat+0x2a/0x80
[Tue Apr  8 12:19:07 2025]  ? set_ptes.constprop.0+0x27/0x90
[Tue Apr  8 12:19:07 2025]  __down_common+0x155/0x280
[Tue Apr  8 12:19:07 2025]  down+0x53/0x70
[Tue Apr  8 12:19:07 2025]  read_dummy_semaphore+0x23/0x60
[Tue Apr  8 12:19:07 2025]  full_proxy_read+0x5f/0xa0
[Tue Apr  8 12:19:07 2025]  vfs_read+0xbc/0x350
[Tue Apr  8 12:19:07 2025]  ? __count_memcg_events+0xa5/0x140
[Tue Apr  8 12:19:07 2025]  ? count_memcg_events.constprop.0+0x1a/0x30
[Tue Apr  8 12:19:07 2025]  ? handle_mm_fault+0x180/0x260
[Tue Apr  8 12:19:07 2025]  ksys_read+0x66/0xe0
[Tue Apr  8 12:19:07 2025]  do_syscall_64+0x51/0x120
[Tue Apr  8 12:19:07 2025]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[Tue Apr  8 12:19:07 2025] RIP: 0033:0x7f419478f46e
[Tue Apr  8 12:19:07 2025] RSP: 002b:00007fff1c4d2668 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[Tue Apr  8 12:19:07 2025] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007f419478f46e
[Tue Apr  8 12:19:07 2025] RDX: 0000000000020000 RSI: 00007f4194683000 RDI: 0000000000000003
[Tue Apr  8 12:19:07 2025] RBP: 00007f4194683000 R08: 00007f4194682010 R09: 0000000000000000
[Tue Apr  8 12:19:07 2025] R10: fffffffffffffbc5 R11: 0000000000000246 R12: 0000000000000000
[Tue Apr  8 12:19:07 2025] R13: 0000000000000003 R14: 0000000000020000 R15: 0000000000020000
[Tue Apr  8 12:19:07 2025]  </TASK>
[Tue Apr  8 12:19:07 2025] INFO: task cat:945 blocked on a semaphore likely last held by task cat:938
[Tue Apr  8 12:19:07 2025] task:cat             state:S stack:0     pid:938   tgid:938   ppid:584    task_flags:0x400000 flags:0x00000000
[Tue Apr  8 12:19:07 2025] Call Trace:
[Tue Apr  8 12:19:07 2025]  <TASK>
[Tue Apr  8 12:19:07 2025]  __schedule+0x491/0xbd0
[Tue Apr  8 12:19:07 2025]  ? _raw_spin_unlock_irqrestore+0xe/0x40
[Tue Apr  8 12:19:07 2025]  schedule+0x27/0xf0
[Tue Apr  8 12:19:07 2025]  schedule_timeout+0x77/0xf0
[Tue Apr  8 12:19:07 2025]  ? __pfx_process_timeout+0x10/0x10
[Tue Apr  8 12:19:07 2025]  msleep_interruptible+0x49/0x60
[Tue Apr  8 12:19:07 2025]  read_dummy_semaphore+0x2d/0x60
[Tue Apr  8 12:19:07 2025]  full_proxy_read+0x5f/0xa0
[Tue Apr  8 12:19:07 2025]  vfs_read+0xbc/0x350
[Tue Apr  8 12:19:07 2025]  ? __count_memcg_events+0xa5/0x140
[Tue Apr  8 12:19:07 2025]  ? count_memcg_events.constprop.0+0x1a/0x30
[Tue Apr  8 12:19:07 2025]  ? handle_mm_fault+0x180/0x260
[Tue Apr  8 12:19:07 2025]  ksys_read+0x66/0xe0
[Tue Apr  8 12:19:07 2025]  do_syscall_64+0x51/0x120
[Tue Apr  8 12:19:07 2025]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[Tue Apr  8 12:19:07 2025] RIP: 0033:0x7f7c584a646e
[Tue Apr  8 12:19:07 2025] RSP: 002b:00007ffdba8ce158 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[Tue Apr  8 12:19:07 2025] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007f7c584a646e
[Tue Apr  8 12:19:07 2025] RDX: 0000000000020000 RSI: 00007f7c5839a000 RDI: 0000000000000003
[Tue Apr  8 12:19:07 2025] RBP: 00007f7c5839a000 R08: 00007f7c58399010 R09: 0000000000000000
[Tue Apr  8 12:19:07 2025] R10: fffffffffffffbc5 R11: 0000000000000246 R12: 0000000000000000
[Tue Apr  8 12:19:07 2025] R13: 0000000000000003 R14: 0000000000020000 R15: 0000000000020000
[Tue Apr  8 12:19:07 2025]  </TASK>

[1] https://lore.kernel.org/all/174046694331.2194069.15472952050240807469.stgit@mhiramat.tok.corp.google.com

Thanks,
Lance

---
v4 -> v5:
 * #01 Add comments for the blocker field, suggested by Andrew
 * #02 Reduce unnecessary #ifdef directives, suggested by Andrew
 * https://lore.kernel.org/all/20250320064923.24000-1-ioworker0@gmail.com

v3 -> v4:
 * #01 #02 Pick RB from Masami - thanks!
 * #03 Pick AB from Masami - thanks!
 * Extract the type from the blocker and use a switch-case instead of
 if-else, suggested by Masami
 * https://lore.kernel.org/all/20250319081138.25133-1-ioworker0@gmail.com

v2 -> v3:
 * Remove the unnecessary WARN_ON_ONCE check for 'current->blocker',
 suggested by Masami
 * Drop the redundant #ifdef for including the hung task header file,
 suggested by Masami
 * Unify the samples into 'hung_task_tests.c', suggested by Masami
 * https://lore.kernel.org/all/20250314144300.32542-1-ioworker0@gmail.com

v1 -> v2:
 * Use one field to store the blocker as only one is active at a time,
 suggested by Masami
 * Leverage the LSB of the blocker field to reduce memory footprint,
 suggested by Masami
 * Add a hung_task detector semaphore blocking test sample code
 * https://lore.kernel.org/all/20250301055102.88746-1-ioworker0@gmail.com
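
For readers unfamiliar with the single-field encoding mentioned in the
changelog above, the following userspace sketch shows one way a lock address
and its type can share one word. All names here are hypothetical and may not
match the helpers in the series; the sketch also assumes lock objects are at
least 4-byte aligned, so that the two low bits of the address are free.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for illustration only. */
enum blocker_type {
	BLOCKER_TYPE_MUTEX = 0x0,
	BLOCKER_TYPE_SEM   = 0x1,
};

/* Assumption: the low two address bits are unused due to alignment. */
#define BLOCKER_TYPE_MASK ((uintptr_t)0x3)

/* Pack a lock address and its type into one word. */
uintptr_t blocker_encode(const void *lock, enum blocker_type type)
{
	uintptr_t addr = (uintptr_t)lock;

	/* An under-aligned lock would corrupt the type bits. */
	assert((addr & BLOCKER_TYPE_MASK) == 0);
	return addr | (uintptr_t)type;
}

/* Recover the type from the low bits. */
enum blocker_type blocker_type_of(uintptr_t blocker)
{
	return (enum blocker_type)(blocker & BLOCKER_TYPE_MASK);
}

/* Recover the lock address by masking the type bits off. */
void *blocker_lock_of(uintptr_t blocker)
{
	return (void *)(blocker & ~BLOCKER_TYPE_MASK);
}

/* Dispatch on the extracted type with a switch, as the v4 changelog suggests. */
const char *blocker_describe(uintptr_t blocker)
{
	if (!blocker)
		return "none";

	switch (blocker_type_of(blocker)) {
	case BLOCKER_TYPE_MUTEX:
		return "mutex";
	case BLOCKER_TYPE_SEM:
		return "semaphore";
	default:
		return "unknown";
	}
}
```

The alignment assertion matters in practice: if a lock structure is less
aligned than the mask assumes, the low address bits are no longer free to
carry the type.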

Lance Yang (2):
  hung_task: replace blocker_mutex with encoded blocker
  hung_task: show the blocker task if the task is hung on semaphore

Zi Li (1):
  samples: extend hung_task detector test with semaphore support

 include/linux/hung_task.h           | 99 +++++++++++++++++++++++++++++
 include/linux/sched.h               |  6 +-
 include/linux/semaphore.h           | 15 ++++-
 kernel/hung_task.c                  | 55 ++++++++++++----
 kernel/locking/mutex.c              |  5 +-
 kernel/locking/semaphore.c          | 57 +++++++++++++++--
 samples/Kconfig                     |  9 +--
 samples/hung_task/Makefile          |  2 +-
 samples/hung_task/hung_task_mutex.c | 66 -------------------
 samples/hung_task/hung_task_tests.c | 97 ++++++++++++++++++++++++++++
 10 files changed, 319 insertions(+), 92 deletions(-)
 create mode 100644 include/linux/hung_task.h
 delete mode 100644 samples/hung_task/hung_task_mutex.c
 create mode 100644 samples/hung_task/hung_task_tests.c

-- 
2.49.0



Thread overview: 28+ messages
2025-04-14 14:59 [PATCH v5 0/3] hung_task: extend blocking task stacktrace dump to semaphore Lance Yang
2025-04-14 14:59 ` [PATCH v5 1/3] hung_task: replace blocker_mutex with encoded blocker Lance Yang
2025-04-14 21:36   ` Andrew Morton
2025-04-15  3:44     ` Lance Yang
2025-04-14 14:59 ` [PATCH v5 2/3] hung_task: show the blocker task if the task is hung on semaphore Lance Yang
2025-08-22  7:38   ` Geert Uytterhoeven
2025-08-22 15:18     ` Lance Yang
2025-08-22 15:37       ` Geert Uytterhoeven
2025-08-22 16:42         ` Lance Yang
2025-08-23  0:27           ` Finn Thain
2025-08-23  4:47             ` Lance Yang
2025-08-23  5:00               ` [PATCH 1/1] hung_task: fix warnings caused by unaligned lock pointers Lance Yang
2025-08-26  4:49                 ` Masami Hiramatsu
2025-08-26  5:11                   ` Lance Yang
2025-08-23  7:40               ` [PATCH 1/1] hung_task: fix warnings by enforcing alignment on lock structures Lance Yang
2025-08-23 11:06                 ` John Paul Adrian Glaubitz
2025-08-23 21:53                 ` kernel test robot
2025-08-24  0:47                   ` Finn Thain
2025-08-24  3:03                     ` Lance Yang
2025-08-24  4:18                       ` Finn Thain
2025-08-24  5:02                         ` Lance Yang
2025-08-24  5:57                           ` Finn Thain
2025-08-24  6:18                             ` Lance Yang
2025-08-26  5:02                 ` Masami Hiramatsu
2025-08-26  5:16                   ` Lance Yang
2025-08-23  7:49               ` [PATCH v5 2/3] hung_task: show the blocker task if the task is hung on semaphore Lance Yang
2025-04-14 14:59 ` [PATCH v5 3/3] samples: extend hung_task detector test with semaphore support Lance Yang
2025-04-14 21:38 ` [PATCH v5 0/3] hung_task: extend blocking task stacktrace dump to semaphore Andrew Morton
