[PATCH 0/5] tracing: Fix several deadlock/race issues in timerlat and hwlat tracer

linux-trace-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 0/5] tracing: Fix several deadlock/race issues in timerlat and hwlat tracer
@ 2024-09-24  9:45 Wei Li
  2024-09-24  9:45 ` [PATCH 1/5] tracing/timerlat: Fix duplicated kthread creation due to CPU online/offline Wei Li
                   ` (5 more replies)
  0 siblings, 6 replies; 11+ messages in thread
From: Wei Li @ 2024-09-24  9:45 UTC (permalink / raw)
  To: Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
	Daniel Bristot de Oliveira
  Cc: linux-trace-kernel, xiexiuqi

These issues are found in concurrent CPU-hotplug and tracer-toggling
testing, the test cases are as follows:

Background: *test_hotplug.sh*
```
#!/bin/sh

while true
do
	echo 0 > /sys/devices/system/cpu/cpu1/online
	echo 1 > /sys/devices/system/cpu/cpu1/online
done
```

Test 1: *test_timerlat.sh*
```
#!/bin/sh

while true
do
	echo timerlat > /sys/kernel/debug/tracing/current_tracer
	echo nop > /sys/kernel/debug/tracing/current_tracer
done
```

Test 2: *test_hwlat.sh*
```
#!/bin/sh

echo per-cpu > /sys/kernel/debug/tracing/hwlat_detector/mode
while true
do
	echo hwlat > /sys/kernel/debug/tracing/current_tracer
	echo nop > /sys/kernel/debug/tracing/current_tracer
done
```

Wei Li (5):
  tracing/timerlat: Fix duplicated kthread creation due to CPU
    online/offline
  tracing/timerlat: Drop interface_lock in stop_kthread()
  tracing/timerlat: Fix a race during cpuhp processing
  tracing/hwlat: Fix a race during cpuhp processing
  tracing/hwlat: Fix deadlock in cpuhp processing

 kernel/trace/trace_hwlat.c   |  5 ++++-
 kernel/trace/trace_osnoise.c | 22 +++++++++++++---------
 2 files changed, 17 insertions(+), 10 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/5] tracing/timerlat: Fix duplicated kthread creation due to CPU online/offline
  2024-09-24  9:45 [PATCH 0/5] tracing: Fix several deadlock/race issues in timerlat and hwlat tracer Wei Li
@ 2024-09-24  9:45 ` Wei Li
  2024-09-24  9:45 ` [PATCH 2/5] tracing/timerlat: Drop interface_lock in stop_kthread() Wei Li
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 11+ messages in thread
From: Wei Li @ 2024-09-24  9:45 UTC (permalink / raw)
  To: Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
	Daniel Bristot de Oliveira
  Cc: linux-trace-kernel, xiexiuqi

osnoise_hotplug_workfn() is the asynchronous online callback for
"trace/osnoise:online". It may be congested when a CPU goes online and
offline repeatedly and is invoked for multiple times after a certain
online.

This will lead to kthread leak and timer corruption. Add a check
in start_kthread() to prevent this situation.

Fixes: c8895e271f79 ("trace/osnoise: Support hotplug operations")
Signed-off-by: Wei Li <liwei391@huawei.com>
---
 kernel/trace/trace_osnoise.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/kernel/trace/trace_osnoise.c b/kernel/trace/trace_osnoise.c
index 7e75c1214b36..934a14bc72e6 100644
--- a/kernel/trace/trace_osnoise.c
+++ b/kernel/trace/trace_osnoise.c
@@ -2007,6 +2007,10 @@ static int start_kthread(unsigned int cpu)
 	void *main = osnoise_main;
 	char comm[24];
 
+	/* Do not start a new thread if it is already running */
+	if (per_cpu(per_cpu_osnoise_var, cpu).kthread)
+		return 0;
+
 	if (timerlat_enabled()) {
 		snprintf(comm, 24, "timerlat/%d", cpu);
 		main = timerlat_main;
@@ -2061,11 +2065,10 @@ static int start_per_cpu_kthreads(void)
 		if (cpumask_test_and_clear_cpu(cpu, &kthread_cpumask)) {
 			struct task_struct *kthread;
 
-			kthread = per_cpu(per_cpu_osnoise_var, cpu).kthread;
+			kthread = xchg_relaxed(&(per_cpu(per_cpu_osnoise_var, cpu).kthread), NULL);
 			if (!WARN_ON(!kthread))
 				kthread_stop(kthread);
 		}
-		per_cpu(per_cpu_osnoise_var, cpu).kthread = NULL;
 	}
 
 	for_each_cpu(cpu, current_mask) {
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 2/5] tracing/timerlat: Drop interface_lock in stop_kthread()
  2024-09-24  9:45 [PATCH 0/5] tracing: Fix several deadlock/race issues in timerlat and hwlat tracer Wei Li
  2024-09-24  9:45 ` [PATCH 1/5] tracing/timerlat: Fix duplicated kthread creation due to CPU online/offline Wei Li
@ 2024-09-24  9:45 ` Wei Li
  2024-09-24  9:45 ` [PATCH 3/5] tracing/timerlat: Fix a race during cpuhp processing Wei Li
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 11+ messages in thread
From: Wei Li @ 2024-09-24  9:45 UTC (permalink / raw)
  To: Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
	Daniel Bristot de Oliveira
  Cc: linux-trace-kernel, xiexiuqi

stop_kthread() is the offline callback for "trace/osnoise:online", since
commit 5bfbcd1ee57b ("tracing/timerlat: Add interface_lock around clearing
of kthread in stop_kthread()"), the following ABBA deadlock scenario is
introduced:

T1                            | T2 [BP]               | T3 [AP]
osnoise_hotplug_workfn()      | work_for_cpu_fn()     | cpuhp_thread_fun()
                              |   _cpu_down()         |   osnoise_cpu_die()
  mutex_lock(&interface_lock) |                       |     stop_kthread()
                              |     cpus_write_lock() |       mutex_lock(&interface_lock)
  cpus_read_lock()            |     cpuhp_kick_ap()   |

As the interface_lock here in just for protecting the "kthread" field of
the osn_var, use xchg() instead to fix this issue. Also use
for_each_online_cpu() back in stop_per_cpu_kthreads() as it can take
cpu_read_lock() again.

Fixes: 5bfbcd1ee57b ("tracing/timerlat: Add interface_lock around clearing of kthread in stop_kthread()")
Signed-off-by: Wei Li <liwei391@huawei.com>
---
 kernel/trace/trace_osnoise.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/kernel/trace/trace_osnoise.c b/kernel/trace/trace_osnoise.c
index 934a14bc72e6..ddc9afb9b7d4 100644
--- a/kernel/trace/trace_osnoise.c
+++ b/kernel/trace/trace_osnoise.c
@@ -1953,12 +1953,8 @@ static void stop_kthread(unsigned int cpu)
 {
 	struct task_struct *kthread;
 
-	mutex_lock(&interface_lock);
-	kthread = per_cpu(per_cpu_osnoise_var, cpu).kthread;
+	kthread = xchg_relaxed(&(per_cpu(per_cpu_osnoise_var, cpu).kthread), NULL);
 	if (kthread) {
-		per_cpu(per_cpu_osnoise_var, cpu).kthread = NULL;
-		mutex_unlock(&interface_lock);
-
 		if (cpumask_test_and_clear_cpu(cpu, &kthread_cpumask) &&
 		    !WARN_ON(!test_bit(OSN_WORKLOAD, &osnoise_options))) {
 			kthread_stop(kthread);
@@ -1972,7 +1968,6 @@ static void stop_kthread(unsigned int cpu)
 			put_task_struct(kthread);
 		}
 	} else {
-		mutex_unlock(&interface_lock);
 		/* if no workload, just return */
 		if (!test_bit(OSN_WORKLOAD, &osnoise_options)) {
 			/*
@@ -1994,8 +1989,12 @@ static void stop_per_cpu_kthreads(void)
 {
 	int cpu;
 
-	for_each_possible_cpu(cpu)
+	cpus_read_lock();
+
+	for_each_online_cpu(cpu)
 		stop_kthread(cpu);
+
+	cpus_read_unlock();
 }
 
 /*
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 3/5] tracing/timerlat: Fix a race during cpuhp processing
  2024-09-24  9:45 [PATCH 0/5] tracing: Fix several deadlock/race issues in timerlat and hwlat tracer Wei Li
  2024-09-24  9:45 ` [PATCH 1/5] tracing/timerlat: Fix duplicated kthread creation due to CPU online/offline Wei Li
  2024-09-24  9:45 ` [PATCH 2/5] tracing/timerlat: Drop interface_lock in stop_kthread() Wei Li
@ 2024-09-24  9:45 ` Wei Li
  2024-09-24  9:45 ` [PATCH 4/5] tracing/hwlat: " Wei Li
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 11+ messages in thread
From: Wei Li @ 2024-09-24  9:45 UTC (permalink / raw)
  To: Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
	Daniel Bristot de Oliveira
  Cc: linux-trace-kernel, xiexiuqi

There is another found exception that the "timerlat/1" thread was
scheduled on CPU0, and lead to timer corruption finally:

```
ODEBUG: init active (active state 0) object: ffff888237c2e108 object type: hrtimer hint: timerlat_irq+0x0/0x220
WARNING: CPU: 0 PID: 426 at lib/debugobjects.c:518 debug_print_object+0x7d/0xb0
Modules linked in:
CPU: 0 UID: 0 PID: 426 Comm: timerlat/1 Not tainted 6.11.0-rc7+ #45
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
RIP: 0010:debug_print_object+0x7d/0xb0
...
Call Trace:
 <TASK>
 ? __warn+0x7c/0x110
 ? debug_print_object+0x7d/0xb0
 ? report_bug+0xf1/0x1d0
 ? prb_read_valid+0x17/0x20
 ? handle_bug+0x3f/0x70
 ? exc_invalid_op+0x13/0x60
 ? asm_exc_invalid_op+0x16/0x20
 ? debug_print_object+0x7d/0xb0
 ? debug_print_object+0x7d/0xb0
 ? __pfx_timerlat_irq+0x10/0x10
 __debug_object_init+0x110/0x150
 hrtimer_init+0x1d/0x60
 timerlat_main+0xab/0x2d0
 ? __pfx_timerlat_main+0x10/0x10
 kthread+0xb7/0xe0
 ? __pfx_kthread+0x10/0x10
 ret_from_fork+0x2d/0x40
 ? __pfx_kthread+0x10/0x10
 ret_from_fork_asm+0x1a/0x30
 </TASK>
```

After tracing the scheduling event, it was discovered that the migration
of the "timerlat/1" thread was performed during thread creation. Further
analysis confirmed that it is because the CPU online processing for
osnoise is implemented through workers, which is asynchronous with the
offline processing. When the worker was scheduled to create a thread, the
CPU may has already been removed from the cpu_online_mask during the offline
process, resulting in the inability to select the right CPU:

T1                       | T2
[CPUHP_ONLINE]           | cpu_device_down()
osnoise_hotplug_workfn() |
                         |     cpus_write_lock()
                         |     takedown_cpu(1)
                         |     cpus_write_unlock()
[CPUHP_OFFLINE]          |
    cpus_read_lock()     |
    start_kthread(1)     |
    cpus_read_unlock()   |

To fix this, skip online processing if the CPU is already offline.

Fixes: c8895e271f79 ("trace/osnoise: Support hotplug operations")
Signed-off-by: Wei Li <liwei391@huawei.com>
---
 kernel/trace/trace_osnoise.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/trace/trace_osnoise.c b/kernel/trace/trace_osnoise.c
index ddc9afb9b7d4..6ed4008e6d62 100644
--- a/kernel/trace/trace_osnoise.c
+++ b/kernel/trace/trace_osnoise.c
@@ -2097,6 +2097,8 @@ static void osnoise_hotplug_workfn(struct work_struct *dummy)
 	mutex_lock(&interface_lock);
 	cpus_read_lock();
 
+	if (!cpu_online(cpu))
+		goto out_unlock;
 	if (!cpumask_test_cpu(cpu, &osnoise_cpumask))
 		goto out_unlock;
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 4/5] tracing/hwlat: Fix a race during cpuhp processing
  2024-09-24  9:45 [PATCH 0/5] tracing: Fix several deadlock/race issues in timerlat and hwlat tracer Wei Li
                   ` (2 preceding siblings ...)
  2024-09-24  9:45 ` [PATCH 3/5] tracing/timerlat: Fix a race during cpuhp processing Wei Li
@ 2024-09-24  9:45 ` Wei Li
  2024-09-24  9:45 ` [PATCH 5/5] tracing/hwlat: Fix deadlock in " Wei Li
  2024-09-26 12:57 ` [PATCH 0/5] tracing: Fix several deadlock/race issues in timerlat and hwlat tracer Steven Rostedt
  5 siblings, 0 replies; 11+ messages in thread
From: Wei Li @ 2024-09-24  9:45 UTC (permalink / raw)
  To: Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
	Daniel Bristot de Oliveira
  Cc: linux-trace-kernel, xiexiuqi

The cpuhp online/offline processing race also exists in percpu-mode hwlat
tracer in theory, apply the fix too.

Fixes: ba998f7d9531 ("trace/hwlat: Support hotplug operations")
Signed-off-by: Wei Li <liwei391@huawei.com>
---
 kernel/trace/trace_hwlat.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/trace/trace_hwlat.c b/kernel/trace/trace_hwlat.c
index b791524a6536..3bd6071441ad 100644
--- a/kernel/trace/trace_hwlat.c
+++ b/kernel/trace/trace_hwlat.c
@@ -520,6 +520,8 @@ static void hwlat_hotplug_workfn(struct work_struct *dummy)
 	if (!hwlat_busy || hwlat_data.thread_mode != MODE_PER_CPU)
 		goto out_unlock;
 
+	if (!cpu_online(cpu))
+		goto out_unlock;
 	if (!cpumask_test_cpu(cpu, tr->tracing_cpumask))
 		goto out_unlock;
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 5/5] tracing/hwlat: Fix deadlock in cpuhp processing
  2024-09-24  9:45 [PATCH 0/5] tracing: Fix several deadlock/race issues in timerlat and hwlat tracer Wei Li
                   ` (3 preceding siblings ...)
  2024-09-24  9:45 ` [PATCH 4/5] tracing/hwlat: " Wei Li
@ 2024-09-24  9:45 ` Wei Li
  2024-10-03 20:19   ` Steven Rostedt
  2024-09-26 12:57 ` [PATCH 0/5] tracing: Fix several deadlock/race issues in timerlat and hwlat tracer Steven Rostedt
  5 siblings, 1 reply; 11+ messages in thread
From: Wei Li @ 2024-09-24  9:45 UTC (permalink / raw)
  To: Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
	Daniel Bristot de Oliveira
  Cc: linux-trace-kernel, xiexiuqi

Another "hung task" error was reported during the test, and i figured out
the deadlock scenario is as follows:

T1 [BP]               | T2 [AP]                     | T3 [hwlatd/1]                  | T4
work_for_cpu_fn()     | cpuhp_thread_fun()          | kthread_fn()                   | hwlat_hotplug_workfn()
  _cpu_down()         |   stop_cpu_kthread()        |                                |   mutex_lock(&hwlat_data.lock)
    cpus_write_lock() |     kthread_stop(hwlatd/1)  |   mutex_lock(&hwlat_data.lock) |
    __cpuhp_kick_ap() |       wait_for_completion() |                                |   cpus_read_lock()

It constitutes ABBA deadlock indirectly between "cpu_hotplug_lock" and
"hwlat_data.lock", make the mutex obtaining in kthread_fn() interruptible
to fix this.

Fixes: ba998f7d9531 ("trace/hwlat: Support hotplug operations")
Signed-off-by: Wei Li <liwei391@huawei.com>
---
 kernel/trace/trace_hwlat.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/trace/trace_hwlat.c b/kernel/trace/trace_hwlat.c
index 3bd6071441ad..4c228ccb8a38 100644
--- a/kernel/trace/trace_hwlat.c
+++ b/kernel/trace/trace_hwlat.c
@@ -370,7 +370,8 @@ static int kthread_fn(void *data)
 		get_sample();
 		local_irq_enable();
 
-		mutex_lock(&hwlat_data.lock);
+		if (mutex_lock_interruptible(&hwlat_data.lock))
+			break;
 		interval = hwlat_data.sample_window - hwlat_data.sample_width;
 		mutex_unlock(&hwlat_data.lock);
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/5] tracing: Fix several deadlock/race issues in timerlat and hwlat tracer
  2024-09-24  9:45 [PATCH 0/5] tracing: Fix several deadlock/race issues in timerlat and hwlat tracer Wei Li
                   ` (4 preceding siblings ...)
  2024-09-24  9:45 ` [PATCH 5/5] tracing/hwlat: Fix deadlock in " Wei Li
@ 2024-09-26 12:57 ` Steven Rostedt
  5 siblings, 0 replies; 11+ messages in thread
From: Steven Rostedt @ 2024-09-26 12:57 UTC (permalink / raw)
  To: Wei Li
  Cc: Masami Hiramatsu, Mathieu Desnoyers, linux-trace-kernel, xiexiuqi,
	Clark Williams


Daniel, is unfortunately no longer the maintainer of this code:

  https://lwn.net/Articles/979912/

I'll try to take a look at this next week.

Thanks,

-- Steve


On Tue, 24 Sep 2024 17:45:10 +0800
Wei Li <liwei391@huawei.com> wrote:

> These issues are found in concurrent CPU-hotplug and tracer-toggling
> testing, the test cases are as follows:
> 
> Background: *test_hotplug.sh*
> ```
> #!/bin/sh
> 
> while true
> do
> 	echo 0 > /sys/devices/system/cpu/cpu1/online
> 	echo 1 > /sys/devices/system/cpu/cpu1/online
> done
> ```
> 
> Test 1: *test_timerlat.sh*
> ```
> #!/bin/sh
> 
> while true
> do
> 	echo timerlat > /sys/kernel/debug/tracing/current_tracer
> 	echo nop > /sys/kernel/debug/tracing/current_tracer
> done
> ```
> 
> Test 2: *test_hwlat.sh*
> ```
> #!/bin/sh
> 
> echo per-cpu > /sys/kernel/debug/tracing/hwlat_detector/mode
> while true
> do
> 	echo hwlat > /sys/kernel/debug/tracing/current_tracer
> 	echo nop > /sys/kernel/debug/tracing/current_tracer
> done
> ```
> 
> Wei Li (5):
>   tracing/timerlat: Fix duplicated kthread creation due to CPU
>     online/offline
>   tracing/timerlat: Drop interface_lock in stop_kthread()
>   tracing/timerlat: Fix a race during cpuhp processing
>   tracing/hwlat: Fix a race during cpuhp processing
>   tracing/hwlat: Fix deadlock in cpuhp processing
> 
>  kernel/trace/trace_hwlat.c   |  5 ++++-
>  kernel/trace/trace_osnoise.c | 22 +++++++++++++---------
>  2 files changed, 17 insertions(+), 10 deletions(-)
> 


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 5/5] tracing/hwlat: Fix deadlock in cpuhp processing
  2024-09-24  9:45 ` [PATCH 5/5] tracing/hwlat: Fix deadlock in " Wei Li
@ 2024-10-03 20:19   ` Steven Rostedt
  2024-10-09  7:47     ` liwei (GF)
  0 siblings, 1 reply; 11+ messages in thread
From: Steven Rostedt @ 2024-10-03 20:19 UTC (permalink / raw)
  To: Wei Li
  Cc: Masami Hiramatsu, Mathieu Desnoyers, Daniel Bristot de Oliveira,
	linux-trace-kernel, xiexiuqi

On Tue, 24 Sep 2024 17:45:15 +0800
Wei Li <liwei391@huawei.com> wrote:

> Another "hung task" error was reported during the test, and i figured out
> the deadlock scenario is as follows:
> 
> T1 [BP]               | T2 [AP]                     | T3 [hwlatd/1]                  | T4
> work_for_cpu_fn()     | cpuhp_thread_fun()          | kthread_fn()                   | hwlat_hotplug_workfn()
>   _cpu_down()         |   stop_cpu_kthread()        |                                |   mutex_lock(&hwlat_data.lock)
>     cpus_write_lock() |     kthread_stop(hwlatd/1)  |   mutex_lock(&hwlat_data.lock) |
>     __cpuhp_kick_ap() |       wait_for_completion() |                                |   cpus_read_lock()
> 
> It constitutes ABBA deadlock indirectly between "cpu_hotplug_lock" and
> "hwlat_data.lock", make the mutex obtaining in kthread_fn() interruptible
> to fix this.
> 
> Fixes: ba998f7d9531 ("trace/hwlat: Support hotplug operations")
> Signed-off-by: Wei Li <liwei391@huawei.com>
> ---
>  kernel/trace/trace_hwlat.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/trace/trace_hwlat.c b/kernel/trace/trace_hwlat.c
> index 3bd6071441ad..4c228ccb8a38 100644
> --- a/kernel/trace/trace_hwlat.c
> +++ b/kernel/trace/trace_hwlat.c
> @@ -370,7 +370,8 @@ static int kthread_fn(void *data)
>  		get_sample();
>  		local_irq_enable();
>  
> -		mutex_lock(&hwlat_data.lock);
> +		if (mutex_lock_interruptible(&hwlat_data.lock))
> +			break;

So basically this requires as signal to break it out of the loop?

But if it receives a signal for any other reason, it breaks out of the loop
too. Which is not what we want. If anything, it should be:

		if (mutex_lock_interruptible(&hwlat_data.lock))
			continue;

But I still don't really like this solution, as it will still report a
	deadlock.

Is it possible to switch the cpu_read_lock() to be taken before the
hwlat_data.lock?

-- Steve


>  		interval = hwlat_data.sample_window - hwlat_data.sample_width;
>  		mutex_unlock(&hwlat_data.lock);
>  


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 5/5] tracing/hwlat: Fix deadlock in cpuhp processing
  2024-10-03 20:19   ` Steven Rostedt
@ 2024-10-09  7:47     ` liwei (GF)
  2024-11-12 23:50       ` Steven Rostedt
  0 siblings, 1 reply; 11+ messages in thread
From: liwei (GF) @ 2024-10-09  7:47 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Masami Hiramatsu, Mathieu Desnoyers, Daniel Bristot de Oliveira,
	linux-trace-kernel, xiexiuqi



On 2024/10/4 4:19, Steven Rostedt wrote:
> On Tue, 24 Sep 2024 17:45:15 +0800
> Wei Li <liwei391@huawei.com> wrote:
> 
>> Another "hung task" error was reported during the test, and i figured out
>> the deadlock scenario is as follows:
>>
>> T1 [BP]               | T2 [AP]                     | T3 [hwlatd/1]                  | T4
>> work_for_cpu_fn()     | cpuhp_thread_fun()          | kthread_fn()                   | hwlat_hotplug_workfn()
>>   _cpu_down()         |   stop_cpu_kthread()        |                                |   mutex_lock(&hwlat_data.lock)
>>     cpus_write_lock() |     kthread_stop(hwlatd/1)  |   mutex_lock(&hwlat_data.lock) |
>>     __cpuhp_kick_ap() |       wait_for_completion() |                                |   cpus_read_lock()
>>
>> It constitutes ABBA deadlock indirectly beAs it calls msleep_interruptible() and 'break' if signal pending below, i choosed 'break' here too.tween "cpu_hotplug_lock" and
>> "hwlat_data.lock", make the mutex obtaining in kthread_fn() interruptible
>> to fix this.
>>
>> Fixes: ba998f7d9531 ("trace/hwlat: Support hotplug operations")
>> Signed-off-by: Wei Li <liwei391@huawei.com>
>> ---
>>  kernel/trace/trace_hwlat.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/trace/trace_hwlat.c b/kernel/trace/trace_hwlat.c
>> index 3bd6071441ad..4c228ccb8a38 100644
>> --- a/kernel/trace/trace_hwlat.c
>> +++ b/kernel/trace/trace_hwlat.c
>> @@ -370,7 +370,8 @@ static int kthread_fn(void *data)
>>  		get_sample();
>>  		local_irq_enable();
>>  
>> -		mutex_lock(&hwlat_data.lock);
>> +		if (mutex_lock_interruptible(&hwlat_data.lock))
>> +			break;
> 
> So basically this requires as signal to break it out of the loop?
> 
> But if it receives a signal for any other reason, it breaks out of the loop
> too. Which is not what we want. If anything, it should be:
> 
> 		if (mutex_lock_interruptible(&hwlat_data.lock))
> 			continue;

As it calls msleep_interruptible() below and 'break' if signal pending, i
choosed 'break' here too.

> But I still don't really like this solution, as it will still report a
> 	deadlock.
> 
> Is it possible to switch the cpu_read_lock() to be taken before the
> hwlat_data.lock?

It's a little hard to change the sequence of these two locks, we'll hold
"cpu_hotplug_lock" for longer unnecessarily if we do that.

But maybe we can remove the "hwlat_data.lock" in kthread_fn(), let me try
another modification.


Thanks,
Wei

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 5/5] tracing/hwlat: Fix deadlock in cpuhp processing
  2024-10-09  7:47     ` liwei (GF)
@ 2024-11-12 23:50       ` Steven Rostedt
  2024-11-14  2:06         ` liwei (GF)
  0 siblings, 1 reply; 11+ messages in thread
From: Steven Rostedt @ 2024-11-12 23:50 UTC (permalink / raw)
  To: liwei (GF)
  Cc: Masami Hiramatsu, Mathieu Desnoyers, Daniel Bristot de Oliveira,
	linux-trace-kernel, xiexiuqi

On Wed, 9 Oct 2024 15:47:23 +0800
"liwei (GF)" <liwei391@huawei.com> wrote:

> On 2024/10/4 4:19, Steven Rostedt wrote:
> > On Tue, 24 Sep 2024 17:45:15 +0800
> > Wei Li <liwei391@huawei.com> wrote:
> >   
> >> Another "hung task" error was reported during the test, and i figured out
> >> the deadlock scenario is as follows:
> >>
> >> T1 [BP]               | T2 [AP]                     | T3 [hwlatd/1]                  | T4
> >> work_for_cpu_fn()     | cpuhp_thread_fun()          | kthread_fn()                   | hwlat_hotplug_workfn()
> >>   _cpu_down()         |   stop_cpu_kthread()        |                                |   mutex_lock(&hwlat_data.lock)
> >>     cpus_write_lock() |     kthread_stop(hwlatd/1)  |   mutex_lock(&hwlat_data.lock) |
> >>     __cpuhp_kick_ap() |       wait_for_completion() |                                |   cpus_read_lock()

So, if we can make T3 not take the mutex_lock then that should be a
solution, right?

> >>
> >> It constitutes ABBA deadlock indirectly beAs it calls msleep_interruptible() and 'break' if signal pending below, i choosed 'break' here too.tween "cpu_hotplug_lock" and
> >> "hwlat_data.lock", make the mutex obtaining in kthread_fn() interruptible
> >> to fix this.
> >>
> >> Fixes: ba998f7d9531 ("trace/hwlat: Support hotplug operations")
> >> Signed-off-by: Wei Li <liwei391@huawei.com>
> >> ---
> >>  kernel/trace/trace_hwlat.c | 3 ++-
> >>  1 file changed, 2 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/kernel/trace/trace_hwlat.c b/kernel/trace/trace_hwlat.c
> >> index 3bd6071441ad..4c228ccb8a38 100644
> >> --- a/kernel/trace/trace_hwlat.c
> >> +++ b/kernel/trace/trace_hwlat.c
> >> @@ -370,7 +370,8 @@ static int kthread_fn(void *data)
> >>  		get_sample();
> >>  		local_irq_enable();
> >>  
> >> -		mutex_lock(&hwlat_data.lock);
> >> +		if (mutex_lock_interruptible(&hwlat_data.lock))
> >> +			break;  
> > 
> > So basically this requires as signal to break it out of the loop?
> > 
> > But if it receives a signal for any other reason, it breaks out of the loop
> > too. Which is not what we want. If anything, it should be:
> > 
> > 		if (mutex_lock_interruptible(&hwlat_data.lock))
> > 			continue;  
> 
> As it calls msleep_interruptible() below and 'break' if signal pending, i
> choosed 'break' here too.
> 
> > But I still don't really like this solution, as it will still report a
> > 	deadlock.
> > 
> > Is it possible to switch the cpu_read_lock() to be taken before the
> > hwlat_data.lock?  
> 
> It's a little hard to change the sequence of these two locks, we'll hold
> "cpu_hotplug_lock" for longer unnecessarily if we do that.
> 
> But maybe we can remove the "hwlat_data.lock" in kthread_fn(), let me try
> another modification.

Have you found something yet? Looking at the code we have:

		mutex_lock(&hwlat_data.lock);
		interval = hwlat_data.sample_window - hwlat_data.sample_width;
		mutex_unlock(&hwlat_data.lock);

Where the lock is only there to synchronize the calculation of the
interval. We could add a counter for when sample_window and sample_width
are updated, and we could simply do:

 again:
		counter = atomic_read(&hwlat_data.counter);
		smp_rmb();
		if (!(counter & 1)) {
			new_interval = hwlat_data.sample_window - hwlat_data.sample_width;
			smp_rmb();
			if (counter == atomic_read(&hwlat_data.counter))
				interval = new_interval;
		}

Then we could do something like:

	atomic_inc(&hwlat_data.counter);
	smp_wmb();
	/* update sample_window or sample_width */
	smp_wmb();
	atomic_inc(&hwlat_data.counter);

And then the interval will only be updated if the values are not being
updated. Otherwise it just keeps the previous value.

-- Steve

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 5/5] tracing/hwlat: Fix deadlock in cpuhp processing
  2024-11-12 23:50       ` Steven Rostedt
@ 2024-11-14  2:06         ` liwei (GF)
  0 siblings, 0 replies; 11+ messages in thread
From: liwei (GF) @ 2024-11-14  2:06 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Masami Hiramatsu, Mathieu Desnoyers, Daniel Bristot de Oliveira,
	linux-trace-kernel, xiexiuqi

Hi Steven,

On 2024/11/13 7:50, Steven Rostedt wrote:
>>>> Another "hung task" error was reported during the test, and i figured out
>>>> the deadlock scenario is as follows:
>>>>
>>>> T1 [BP]               | T2 [AP]                     | T3 [hwlatd/1]                  | T4
>>>> work_for_cpu_fn()     | cpuhp_thread_fun()          | kthread_fn()                   | hwlat_hotplug_workfn()
>>>>   _cpu_down()         |   stop_cpu_kthread()        |                                |   mutex_lock(&hwlat_data.lock)
>>>>     cpus_write_lock() |     kthread_stop(hwlatd/1)  |   mutex_lock(&hwlat_data.lock) |
>>>>     __cpuhp_kick_ap() |       wait_for_completion() |                                |   cpus_read_lock()
> 
> So, if we can make T3 not take the mutex_lock then that should be a
> solution, right?
> 
>>>>
>>>> It constitutes ABBA deadlock indirectly beAs it calls msleep_interruptible() and 'break' if signal pending below, i choosed 'break' here too.tween "cpu_hotplug_lock" and
>>>> "hwlat_data.lock", make the mutex obtaining in kthread_fn() interruptible
>>>> to fix this.
>>>>
>>>> Fixes: ba998f7d9531 ("trace/hwlat: Support hotplug operations")
>>>> Signed-off-by: Wei Li <liwei391@huawei.com>
>>>> ---
>>>>  kernel/trace/trace_hwlat.c | 3 ++-
>>>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/kernel/trace/trace_hwlat.c b/kernel/trace/trace_hwlat.c
>>>> index 3bd6071441ad..4c228ccb8a38 100644
>>>> --- a/kernel/trace/trace_hwlat.c
>>>> +++ b/kernel/trace/trace_hwlat.c
>>>> @@ -370,7 +370,8 @@ static int kthread_fn(void *data)
>>>>  		get_sample();
>>>>  		local_irq_enable();
>>>>  
>>>> -		mutex_lock(&hwlat_data.lock);
>>>> +		if (mutex_lock_interruptible(&hwlat_data.lock))
>>>> +			break;  
>>>
>>> So basically this requires as signal to break it out of the loop?
>>>
>>> But if it receives a signal for any other reason, it breaks out of the loop
>>> too. Which is not what we want. If anything, it should be:
>>>
>>> 		if (mutex_lock_interruptible(&hwlat_data.lock))
>>> 			continue;  
>>
>> As it calls msleep_interruptible() below and 'break' if signal pending, i
>> choosed 'break' here too.
>>
>>> But I still don't really like this solution, as it will still report a
>>> 	deadlock.
>>>
>>> Is it possible to switch the cpu_read_lock() to be taken before the
>>> hwlat_data.lock?  
>>
>> It's a little hard to change the sequence of these two locks, we'll hold
>> "cpu_hotplug_lock" for longer unnecessarily if we do that.
>>
>> But maybe we can remove the "hwlat_data.lock" in kthread_fn(), let me try
>> another modification.
> 
> Have you found something yet? Looking at the code we have:
> 
> 		mutex_lock(&hwlat_data.lock);
> 		interval = hwlat_data.sample_window - hwlat_data.sample_width;
> 		mutex_unlock(&hwlat_data.lock);
> 
> Where the lock is only there to synchronize the calculation of the
> interval. We could add a counter for when sample_window and sample_width
> are updated, and we could simply do:
> 
>  again:
> 		counter = atomic_read(&hwlat_data.counter);
> 		smp_rmb();
> 		if (!(counter & 1)) {
> 			new_interval = hwlat_data.sample_window - hwlat_data.sample_width;
> 			smp_rmb();
> 			if (counter == atomic_read(&hwlat_data.counter))
> 				interval = new_interval;
> 		}
> 
> Then we could do something like:
> 
> 	atomic_inc(&hwlat_data.counter);
> 	smp_wmb();
> 	/* update sample_window or sample_width */
> 	smp_wmb();
> 	atomic_inc(&hwlat_data.counter);
> 
> And then the interval will only be updated if the values are not being
> updated. Otherwise it just keeps the previous value.
> 

Your seqlock-like solution seems to be able to solve this issue, but the difficulty is that
the current updates of sample_window and sample_width are implemented using the framework
of 'trace_min_max_fops'. We cannot add 'atomic_inc(&hwlat_data.counter)' into the update
processes for sample_window and sample_width directly. If we want to remove the mutex_lock
here, maybe we need to break the application of trace_min_max_write(). However, if we do
that, we can add a 'hwlat_data.sample_interval' and update it at the same time as updating
sample_window and sample_width. I didn't figure out an elegant solution yet.

Thanks,
Wei

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2024-11-14  2:07 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-09-24  9:45 [PATCH 0/5] tracing: Fix several deadlock/race issues in timerlat and hwlat tracer Wei Li
2024-09-24  9:45 ` [PATCH 1/5] tracing/timerlat: Fix duplicated kthread creation due to CPU online/offline Wei Li
2024-09-24  9:45 ` [PATCH 2/5] tracing/timerlat: Drop interface_lock in stop_kthread() Wei Li
2024-09-24  9:45 ` [PATCH 3/5] tracing/timerlat: Fix a race during cpuhp processing Wei Li
2024-09-24  9:45 ` [PATCH 4/5] tracing/hwlat: " Wei Li
2024-09-24  9:45 ` [PATCH 5/5] tracing/hwlat: Fix deadlock in " Wei Li
2024-10-03 20:19   ` Steven Rostedt
2024-10-09  7:47     ` liwei (GF)
2024-11-12 23:50       ` Steven Rostedt
2024-11-14  2:06         ` liwei (GF)
2024-09-26 12:57 ` [PATCH 0/5] tracing: Fix several deadlock/race issues in timerlat and hwlat tracer Steven Rostedt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).