linux-kernel.vger.kernel.org archive mirror
From: Lance Yang <ioworker0@gmail.com>
To: akpm@linux-foundation.org
Cc: will@kernel.org, peterz@infradead.org, mingo@redhat.com,
	longman@redhat.com, mhiramat@kernel.org,
	anna.schumaker@oracle.com, boqun.feng@gmail.com,
	joel.granados@kernel.org, kent.overstreet@linux.dev,
	leonylgao@tencent.com, linux-kernel@vger.kernel.org,
	rostedt@goodmis.org, senozhatsky@chromium.org,
	tfiga@chromium.org, amaindex@outlook.com, jstultz@google.com,
	Lance Yang <ioworker0@gmail.com>
Subject: [PATCH v5 0/3] hung_task: extend blocking task stacktrace dump to semaphore
Date: Mon, 14 Apr 2025 22:59:42 +0800
Message-ID: <20250414145945.84916-1-ioworker0@gmail.com>

Hi all,

Inspired by mutex blocker tracking[1], this patch series extends the
feature so that the hung task detector dumps the blocker task not only
for mutexes but also for semaphores. Unlike mutexes, semaphores lack
explicit ownership tracking, which makes it hard to identify the root
cause of hangs. To address this, we introduce a last_holder field in the
semaphore structure, which is updated when a task successfully calls
down() and cleared during up().

The assumption is that if a task is blocked on a semaphore, the holders
must not have released it. While this does not guarantee that the last
holder is one of the current blockers, it likely provides a practical hint
for diagnosing semaphore-related stalls.
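
As a rough illustration of the last_holder idea, here is a minimal
userspace sketch. It is not the code from this series: the toy_* names,
the struct layout and the non-blocking trylock are all hypothetical, and
there is no locking or waiter handling. It only follows the description
above: record the task on a successful acquire, clear the field on
release, and let a reporter read the field as a hint.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a task; not the kernel's task_struct. */
struct toy_task {
	int pid;
};

/* Toy counting semaphore carrying the extra last_holder hint. */
struct toy_sem {
	unsigned int count;
	struct toy_task *last_holder;	/* hint only; may be NULL or imprecise */
};

static void toy_sem_init(struct toy_sem *sem, unsigned int count)
{
	sem->count = count;
	sem->last_holder = NULL;
}

/* Non-blocking acquire: returns 1 on success, 0 if the caller would block. */
static int toy_down_trylock(struct toy_sem *sem, struct toy_task *me)
{
	assert(me != NULL);
	if (sem->count == 0)
		return 0;
	sem->count--;
	sem->last_holder = me;	/* remember the most recent successful acquirer */
	return 1;
}

/* Release: clear the hint, as the description above says up() does. */
static void toy_up(struct toy_sem *sem)
{
	sem->last_holder = NULL;
	sem->count++;
}

/* What a hung-task style report would consult for a blocked waiter. */
static struct toy_task *toy_sem_blocker_hint(const struct toy_sem *sem)
{
	return sem->last_holder;
}
```

With an initial count of 1, a failed toy_down_trylock() by a second task
leaves the hint pointing at the first task, which is the case the report
below shows. With a count greater than 1 the hint covers only the most
recent acquirer, and a release by any holder clears it, which is why the
field is treated as a hint rather than as ownership.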

With this change, the hung task detector can now show the blocker task's
info, as in the example below:

[Tue Apr  8 12:19:07 2025] INFO: task cat:945 blocked for more than 120 seconds.
[Tue Apr  8 12:19:07 2025]       Tainted: G            E      6.14.0-rc6+ #1
[Tue Apr  8 12:19:07 2025] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Tue Apr  8 12:19:07 2025] task:cat             state:D stack:0     pid:945   tgid:945   ppid:828    task_flags:0x400000 flags:0x00000000
[Tue Apr  8 12:19:07 2025] Call Trace:
[Tue Apr  8 12:19:07 2025]  <TASK>
[Tue Apr  8 12:19:07 2025]  __schedule+0x491/0xbd0
[Tue Apr  8 12:19:07 2025]  schedule+0x27/0xf0
[Tue Apr  8 12:19:07 2025]  schedule_timeout+0xe3/0xf0
[Tue Apr  8 12:19:07 2025]  ? __folio_mod_stat+0x2a/0x80
[Tue Apr  8 12:19:07 2025]  ? set_ptes.constprop.0+0x27/0x90
[Tue Apr  8 12:19:07 2025]  __down_common+0x155/0x280
[Tue Apr  8 12:19:07 2025]  down+0x53/0x70
[Tue Apr  8 12:19:07 2025]  read_dummy_semaphore+0x23/0x60
[Tue Apr  8 12:19:07 2025]  full_proxy_read+0x5f/0xa0
[Tue Apr  8 12:19:07 2025]  vfs_read+0xbc/0x350
[Tue Apr  8 12:19:07 2025]  ? __count_memcg_events+0xa5/0x140
[Tue Apr  8 12:19:07 2025]  ? count_memcg_events.constprop.0+0x1a/0x30
[Tue Apr  8 12:19:07 2025]  ? handle_mm_fault+0x180/0x260
[Tue Apr  8 12:19:07 2025]  ksys_read+0x66/0xe0
[Tue Apr  8 12:19:07 2025]  do_syscall_64+0x51/0x120
[Tue Apr  8 12:19:07 2025]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[Tue Apr  8 12:19:07 2025] RIP: 0033:0x7f419478f46e
[Tue Apr  8 12:19:07 2025] RSP: 002b:00007fff1c4d2668 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[Tue Apr  8 12:19:07 2025] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007f419478f46e
[Tue Apr  8 12:19:07 2025] RDX: 0000000000020000 RSI: 00007f4194683000 RDI: 0000000000000003
[Tue Apr  8 12:19:07 2025] RBP: 00007f4194683000 R08: 00007f4194682010 R09: 0000000000000000
[Tue Apr  8 12:19:07 2025] R10: fffffffffffffbc5 R11: 0000000000000246 R12: 0000000000000000
[Tue Apr  8 12:19:07 2025] R13: 0000000000000003 R14: 0000000000020000 R15: 0000000000020000
[Tue Apr  8 12:19:07 2025]  </TASK>
[Tue Apr  8 12:19:07 2025] INFO: task cat:945 blocked on a semaphore likely last held by task cat:938
[Tue Apr  8 12:19:07 2025] task:cat             state:S stack:0     pid:938   tgid:938   ppid:584    task_flags:0x400000 flags:0x00000000
[Tue Apr  8 12:19:07 2025] Call Trace:
[Tue Apr  8 12:19:07 2025]  <TASK>
[Tue Apr  8 12:19:07 2025]  __schedule+0x491/0xbd0
[Tue Apr  8 12:19:07 2025]  ? _raw_spin_unlock_irqrestore+0xe/0x40
[Tue Apr  8 12:19:07 2025]  schedule+0x27/0xf0
[Tue Apr  8 12:19:07 2025]  schedule_timeout+0x77/0xf0
[Tue Apr  8 12:19:07 2025]  ? __pfx_process_timeout+0x10/0x10
[Tue Apr  8 12:19:07 2025]  msleep_interruptible+0x49/0x60
[Tue Apr  8 12:19:07 2025]  read_dummy_semaphore+0x2d/0x60
[Tue Apr  8 12:19:07 2025]  full_proxy_read+0x5f/0xa0
[Tue Apr  8 12:19:07 2025]  vfs_read+0xbc/0x350
[Tue Apr  8 12:19:07 2025]  ? __count_memcg_events+0xa5/0x140
[Tue Apr  8 12:19:07 2025]  ? count_memcg_events.constprop.0+0x1a/0x30
[Tue Apr  8 12:19:07 2025]  ? handle_mm_fault+0x180/0x260
[Tue Apr  8 12:19:07 2025]  ksys_read+0x66/0xe0
[Tue Apr  8 12:19:07 2025]  do_syscall_64+0x51/0x120
[Tue Apr  8 12:19:07 2025]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[Tue Apr  8 12:19:07 2025] RIP: 0033:0x7f7c584a646e
[Tue Apr  8 12:19:07 2025] RSP: 002b:00007ffdba8ce158 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[Tue Apr  8 12:19:07 2025] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007f7c584a646e
[Tue Apr  8 12:19:07 2025] RDX: 0000000000020000 RSI: 00007f7c5839a000 RDI: 0000000000000003
[Tue Apr  8 12:19:07 2025] RBP: 00007f7c5839a000 R08: 00007f7c58399010 R09: 0000000000000000
[Tue Apr  8 12:19:07 2025] R10: fffffffffffffbc5 R11: 0000000000000246 R12: 0000000000000000
[Tue Apr  8 12:19:07 2025] R13: 0000000000000003 R14: 0000000000020000 R15: 0000000000020000
[Tue Apr  8 12:19:07 2025]  </TASK>

[1] https://lore.kernel.org/all/174046694331.2194069.15472952050240807469.stgit@mhiramat.tok.corp.google.com

Thanks,
Lance

---
v4 -> v5:
 * #01 Add comments for the blocker field, suggested by Andrew
 * #02 Reduce unnecessary #ifdef directives, suggested by Andrew
 * https://lore.kernel.org/all/20250320064923.24000-1-ioworker0@gmail.com

v3 -> v4:
 * #01 #02 Pick RB from Masami - thanks!
 * #03 Pick AB from Masami - thanks!
 * Extract the type from the blocker and use a switch-case instead of
if-else, suggested by Masami
 * https://lore.kernel.org/all/20250319081138.25133-1-ioworker0@gmail.com

v2 -> v3:
 * Remove the unnecessary WARN_ON_ONCE check for 'current->blocker',
 suggested by Masami
 * Drop the redundant #ifdef for including the hung task header file,
 suggested by Masami
 * Unify the samples into 'hung_task_tests.c', suggested by Masami
 * https://lore.kernel.org/all/20250314144300.32542-1-ioworker0@gmail.com

v1 -> v2:
 * Use one field to store the blocker as only one is active at a time,
 suggested by Masami
 * Leverage the LSB of the blocker field to reduce memory footprint,
 suggested by Masami
 * Add a hung_task detector semaphore blocking test sample code
 * https://lore.kernel.org/all/20250301055102.88746-1-ioworker0@gmail.com
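
For readers unfamiliar with the "LSB of the blocker field" trick noted in
the changelog above, a minimal userspace sketch follows. It is not the
code from this series: the toy_* names, the tag values and the two-bit
mask are hypothetical. The sketch assumes lock objects are at least
4-byte aligned, so the low two bits of their address are zero and can
carry a type tag in the same word as the pointer. The switch in
toy_blocker_name() mirrors the switch-case dispatch mentioned in the
v3 -> v4 notes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type tags; names and values are illustrative only. */
enum toy_blocker_kind {
	TOY_BLOCKER_MUTEX = 0x0,
	TOY_BLOCKER_SEM   = 0x1,
};

/* Low two bits hold the tag; assumes >= 4-byte aligned lock objects. */
#define TOY_BLOCKER_TYPE_MASK ((uintptr_t)0x3)

/* Pack a lock address and its kind into one word. */
static uintptr_t toy_blocker_encode(const void *lock, enum toy_blocker_kind kind)
{
	uintptr_t addr = (uintptr_t)lock;

	/* An unaligned address would collide with the tag bits. */
	assert((addr & TOY_BLOCKER_TYPE_MASK) == 0);
	return addr | (uintptr_t)kind;
}

/* Extract the kind from the low bits. */
static enum toy_blocker_kind toy_blocker_get_kind(uintptr_t blocker)
{
	return (enum toy_blocker_kind)(blocker & TOY_BLOCKER_TYPE_MASK);
}

/* Recover the original lock address by masking off the tag. */
static void *toy_blocker_ptr(uintptr_t blocker)
{
	return (void *)(blocker & ~TOY_BLOCKER_TYPE_MASK);
}

/* Dispatch on the kind with a switch rather than an if-else chain. */
static const char *toy_blocker_name(uintptr_t blocker)
{
	switch (toy_blocker_get_kind(blocker)) {
	case TOY_BLOCKER_MUTEX:
		return "mutex";
	case TOY_BLOCKER_SEM:
		return "semaphore";
	}
	return "unknown";
}
```

A single word is enough because a task waits on at most one lock at a
time, so only one blocker is active. The cost is the alignment
requirement checked by the assert in toy_blocker_encode().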

Lance Yang (2):
  hung_task: replace blocker_mutex with encoded blocker
  hung_task: show the blocker task if the task is hung on semaphore

Zi Li (1):
  samples: extend hung_task detector test with semaphore support

 include/linux/hung_task.h           | 99 +++++++++++++++++++++++++++++
 include/linux/sched.h               |  6 +-
 include/linux/semaphore.h           | 15 ++++-
 kernel/hung_task.c                  | 55 ++++++++++++----
 kernel/locking/mutex.c              |  5 +-
 kernel/locking/semaphore.c          | 57 +++++++++++++++--
 samples/Kconfig                     |  9 +--
 samples/hung_task/Makefile          |  2 +-
 samples/hung_task/hung_task_mutex.c | 66 -------------------
 samples/hung_task/hung_task_tests.c | 97 ++++++++++++++++++++++++++++
 10 files changed, 319 insertions(+), 92 deletions(-)
 create mode 100644 include/linux/hung_task.h
 delete mode 100644 samples/hung_task/hung_task_mutex.c
 create mode 100644 samples/hung_task/hung_task_tests.c

-- 
2.49.0


Thread overview: 27+ messages
2025-04-14 14:59 Lance Yang [this message]
2025-04-14 14:59 ` [PATCH v5 1/3] hung_task: replace blocker_mutex with encoded blocker Lance Yang
2025-04-14 21:36   ` Andrew Morton
2025-04-15  3:44     ` Lance Yang
2025-04-14 14:59 ` [PATCH v5 2/3] hung_task: show the blocker task if the task is hung on semaphore Lance Yang
2025-08-22  7:38   ` Geert Uytterhoeven
2025-08-22 15:18     ` Lance Yang
2025-08-22 15:37       ` Geert Uytterhoeven
2025-08-22 16:42         ` Lance Yang
2025-08-23  0:27           ` Finn Thain
2025-08-23  4:47             ` Lance Yang
2025-08-23  5:00               ` [PATCH 1/1] hung_task: fix warnings caused by unaligned lock pointers Lance Yang
2025-08-26  4:49                 ` Masami Hiramatsu
2025-08-26  5:11                   ` Lance Yang
2025-08-23  7:40               ` [PATCH 1/1] hung_task: fix warnings by enforcing alignment on lock structures Lance Yang
2025-08-23 21:53                 ` kernel test robot
2025-08-24  0:47                   ` Finn Thain
2025-08-24  3:03                     ` Lance Yang
2025-08-24  4:18                       ` Finn Thain
2025-08-24  5:02                         ` Lance Yang
2025-08-24  5:57                           ` Finn Thain
2025-08-24  6:18                             ` Lance Yang
2025-08-26  5:02                 ` Masami Hiramatsu
2025-08-26  5:16                   ` Lance Yang
2025-08-23  7:49               ` [PATCH v5 2/3] hung_task: show the blocker task if the task is hung on semaphore Lance Yang
2025-04-14 14:59 ` [PATCH v5 3/3] samples: extend hung_task detector test with semaphore support Lance Yang
2025-04-14 21:38 ` [PATCH v5 0/3] hung_task: extend blocking task stacktrace dump to semaphore Andrew Morton
