public inbox for linux-kernel@vger.kernel.org
From: Waiman Long <llong@redhat.com>
To: Steven Rostedt <rostedt@goodmis.org>,
	"Masami Hiramatsu (Google)" <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>, Will Deacon <will@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Boqun Feng <boqun.feng@gmail.com>,
	Joel Granados <joel.granados@kernel.org>,
	Anna Schumaker <anna.schumaker@oracle.com>,
	Lance Yang <ioworker0@gmail.com>,
	Kent Overstreet <kent.overstreet@linux.dev>,
	Yongliang Gao <leonylgao@tencent.com>,
	Tomasz Figa <tfiga@chromium.org>,
	Sergey Senozhatsky <senozhatsky@chromium.org>,
	linux-kernel@vger.kernel.org,
	Linux Memory Management List <linux-mm@kvack.org>
Subject: Re: [PATCH 1/2] hung_task: Show the blocker task if the task is hung on mutex
Date: Wed, 19 Feb 2025 15:18:57 -0500
Message-ID: <0fa9dd8e-2d83-487e-bfb1-1f5d20cd9fe6@redhat.com>
In-Reply-To: <20250219112308.5d905680@gandalf.local.home>

On 2/19/25 11:23 AM, Steven Rostedt wrote:
> On Wed, 19 Feb 2025 22:00:49 +0900
> "Masami Hiramatsu (Google)" <mhiramat@kernel.org> wrote:
>
>> From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
>>
>> The "hung_task" shows a long-time uninterruptible slept task, but most
>> often, it's blocked on a mutex acquired by another task. Without
>> dumping such a task, investigating the root cause of the hung task
>> problem is very difficult.
>>
>> Fortunately CONFIG_DEBUG_MUTEXES=y allows us to identify the mutex
>> blocking the task. And the mutex has "owner" information, which can
>> be used to find the owner task and dump it with hung tasks.
>>
>> With this change, the hung task shows blocker task's info like below;
>>
> We've hit bugs like this in the field a few times, and it was very
> difficult to debug. Something like this would have made our lives much
> easier!

I agree that it will be a useful feature.

>> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
>> ---
>>   kernel/hung_task.c           |   38 ++++++++++++++++++++++++++++++++++++++
>>   kernel/locking/mutex-debug.c |    1 +
>>   kernel/locking/mutex.c       |    9 +++++++++
>>   kernel/locking/mutex.h       |    6 ++++++
>>   4 files changed, 54 insertions(+)
>>
>> diff --git a/kernel/hung_task.c b/kernel/hung_task.c
>> index 04efa7a6e69b..d1ce69504090 100644
>> --- a/kernel/hung_task.c
>> +++ b/kernel/hung_task.c
>> @@ -25,6 +25,8 @@
>>   
>>   #include <trace/events/sched.h>
>>   
>> +#include "locking/mutex.h"
>> +
>>   /*
>>    * The number of tasks checked:
>>    */
>> @@ -93,6 +95,41 @@ static struct notifier_block panic_block = {
>>   	.notifier_call = hung_task_panic,
>>   };
>>   
>> +
>> +#ifdef CONFIG_DEBUG_MUTEXES
>> +static void debug_show_blocker(struct task_struct *task)
>> +{
>> +	struct task_struct *g, *t;
>> +	unsigned long owner;
>> +	struct mutex *lock;
>> +
>> +	if (!task->blocked_on)
>> +		return;
>> +
>> +	lock = task->blocked_on->mutex;
> This is a catch 22. To look at the task's blocked_on, we need the
> lock->wait_lock held, otherwise this could be an issue. But to get that
> lock, we need to look at the task's blocked_on field! As this can race.
>
> Another thing is that the waiter is on the task's stack. Perhaps we need to
> move this into sched/core.c and be able to lock the task's rq. Because even
> something like:
>
> 	waiter = READ_ONCE(task->blocked_on);
>
> May be garbage if the task were to suddenly wake up and run.
>
> Now if we were able to lock the task's rq, which would prevent it from
> being woken up, then the blocked_on field would not be at risk of being
> corrupted.

Accessing the mutex_waiter structure is tricky because it is allocated
on the waiter's stack. One way to work around this issue is to add a new
blocked_on_mutex field to task_struct that points directly to the
relevant mutex. Yes, that increases the size of task_struct by 8 bytes
on 64-bit systems, but it is a pretty large structure anyway. If this
field is accessed with READ_ONCE()/WRITE_ONCE(), we don't need to take a
lock to read it, though taking the wait_lock may still be needed to
examine other information inside the mutex.
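
To make the idea concrete, here is a rough userspace sketch, not kernel
code. Only the blocked_on_mutex field name comes from the proposal above.
The struct names, the helper functions and the plain owner pointer are
assumptions made up for illustration (the real mutex owner field is an
atomic word with flag bits in the low bits). The READ_ONCE()/WRITE_ONCE()
macros are simplified stand-ins for the kernel ones and rely on the
GCC/Clang __typeof__ extension.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace stand-ins for the kernel macros: one volatile access. */
#define READ_ONCE(x)     (*(volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

struct task_sketch;

/* Hypothetical, simplified mutex: only the owner pointer is modelled. */
struct mutex_sketch {
	struct task_sketch *owner;
};

/* Hypothetical, simplified task_struct. */
struct task_sketch {
	int pid;
	/* Proposed field: the mutex this task is sleeping on, or NULL. */
	struct mutex_sketch *blocked_on_mutex;
};

/* Waiter side: publish the mutex before going to sleep on it. */
static void set_blocked_on(struct task_sketch *t, struct mutex_sketch *lock)
{
	assert(lock != NULL);
	WRITE_ONCE(t->blocked_on_mutex, lock);
}

/* Waiter side: clear the field once the mutex has been acquired. */
static void clear_blocked_on(struct task_sketch *t)
{
	WRITE_ONCE(t->blocked_on_mutex, NULL);
}

/*
 * Detector side: find the pid of the task blocking t without touching
 * anything on t's stack. Returns -1 if t is not blocked on a mutex or
 * the mutex currently has no owner.
 */
static int blocker_pid(struct task_sketch *t)
{
	struct mutex_sketch *lock = READ_ONCE(t->blocked_on_mutex);
	struct task_sketch *owner;

	if (!lock)
		return -1;

	owner = READ_ONCE(lock->owner);
	return owner ? owner->pid : -1;
}
```

The detector only reads a pointer stored in the task itself, so it never
dereferences the on-stack mutex_waiter. The sketch does not address the
lifetime of the mutex or the owner task; in the kernel that is where the
wait_lock (or some other protection) would likely still come in.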

Cheers,
Longman
