* [PATCH] sched/deadline: stop dl_server before CPU goes offline
From: Shrikanth Hegde @ 2025-10-09 18:47 UTC
  To: peterz, juri.lelli
  Cc: sshegde, mingo, vincent.guittot, linux-kernel, linuxppc-dev,
	m.szyprowski, venkat88, jstultz

From: Peter Zijlstra (Intel) <peterz@infradead.org>

The IBM CI tool reported a kernel warning[1] when running a CPU removal
operation through drmgr[2], i.e. "drmgr -c cpu -r -q 1".

WARNING: CPU: 0 PID: 0 at kernel/sched/cpudeadline.c:219 cpudl_set+0x58/0x170
NIP [c0000000002b6ed8] cpudl_set+0x58/0x170
LR [c0000000002b7cb8] dl_server_timer+0x168/0x2a0
Call Trace:
[c000000002c2f8c0] init_stack+0x78c0/0x8000 (unreliable)
[c0000000002b7cb8] dl_server_timer+0x168/0x2a0
[c00000000034df84] __hrtimer_run_queues+0x1a4/0x390
[c00000000034f624] hrtimer_interrupt+0x124/0x300
[c00000000002a230] timer_interrupt+0x140/0x320

Git bisects to: commit 4ae8d9aa9f9d ("sched/deadline: Fix dl_server getting stuck")

This happens because:
- The dl_server hrtimer gets enqueued shortly before the CPU goes offline,
  when kthread_park() enqueues a fair task.
- The CPU goes offline and drmgr removes it from cpu_present_mask.
- The hrtimer fires and the warning is hit.

Fix it by stopping the dl_server before the CPU is marked dead.

[1]: https://lore.kernel.org/all/8218e149-7718-4432-9312-f97297c352b9@linux.ibm.com/
[2]: https://github.com/ibm-power-utilities/powerpc-utils/tree/next/src/drmgr

[sshegde: wrote the changelog and tested it]
Fixes: 4ae8d9aa9f9d ("sched/deadline: Fix dl_server getting stuck")
Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Closes: https://lore.kernel.org/all/8218e149-7718-4432-9312-f97297c352b9@linux.ibm.com
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com>
---
 kernel/sched/core.c     | 2 ++
 kernel/sched/deadline.c | 3 +++
 2 files changed, 5 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 198d2dd45f59..f1ebf67b48e2 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -8571,10 +8571,12 @@ int sched_cpu_dying(unsigned int cpu)
 	sched_tick_stop(cpu);
 
 	rq_lock_irqsave(rq, &rf);
+	update_rq_clock(rq);
 	if (rq->nr_running != 1 || rq_has_pinned_tasks(rq)) {
 		WARN(true, "Dying CPU not properly vacated!");
 		dump_rq_tasks(rq, KERN_WARNING);
 	}
+	dl_server_stop(&rq->fair_server);
 	rq_unlock_irqrestore(rq, &rf);
 
 	calc_load_migrate(rq);
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 615411a0a881..7b7671060bf9 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1582,6 +1582,9 @@ void dl_server_start(struct sched_dl_entity *dl_se)
 	if (!dl_server(dl_se) || dl_se->dl_server_active)
 		return;
 
+	if (WARN_ON_ONCE(!cpu_online(cpu_of(rq))))
+		return;
+
 	dl_se->dl_server_active = 1;
 	enqueue_dl_entity(dl_se, ENQUEUE_WAKEUP);
 	if (!dl_task(dl_se->rq->curr) || dl_entity_preempt(dl_se, &rq->curr->dl))
-- 
2.47.3
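The three-step sequence in the changelog, and the ordering the core.c hunk enforces, can be illustrated with a small userspace model. This is a hypothetical sketch, not kernel code: `struct model_rq`, `model_server_start()`, `model_server_stop()`, `model_timer_fires_on_absent_cpu()` and `run_offline()` are invented names that only loosely stand in for the runqueue, dl_server_start(), dl_server_stop() and the dl_server hrtimer. It assumes that stopping the server disarms its timer, and it deliberately ignores locking, the rq clock (the patch also calls update_rq_clock() before dl_server_stop()), and real hrtimer semantics.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_rq {
	bool cpu_online;	/* loosely, cpu_online(cpu) */
	bool cpu_present;	/* loosely, membership in cpu_present_mask */
	bool server_active;	/* loosely, dl_se->dl_server_active */
	bool timer_armed;	/* loosely, the dl_server hrtimer being queued */
};

/* Simplified dl_server_start(): mark the server active and arm its timer. */
static void model_server_start(struct model_rq *rq)
{
	if (rq->server_active)
		return;
	rq->server_active = true;
	rq->timer_armed = true;
}

/* Simplified dl_server_stop(): deactivate the server and disarm its timer. */
static void model_server_stop(struct model_rq *rq)
{
	rq->server_active = false;
	rq->timer_armed = false;
}

/* True if the timer would still fire for a CPU no longer in the present mask. */
static bool model_timer_fires_on_absent_cpu(const struct model_rq *rq)
{
	return rq->timer_armed && !rq->cpu_present;
}

/*
 * Replays the changelog's sequence. with_fix mirrors the dl_server_stop()
 * call the patch adds to sched_cpu_dying(). Returns true if the modelled
 * warning condition would be hit.
 */
static bool run_offline(bool with_fix)
{
	struct model_rq rq = { .cpu_online = true, .cpu_present = true };

	/* 1. kthread_park() enqueues a fair task, which starts the server. */
	model_server_start(&rq);
	assert(rq.server_active && rq.timer_armed);

	/* sched_cpu_dying(): the patched kernel stops the server here. */
	if (with_fix)
		model_server_stop(&rq);

	/* 2. The CPU goes offline and drmgr removes it from the present mask. */
	rq.cpu_online = false;
	rq.cpu_present = false;

	/* 3. The timer fires, but only if it is still armed. */
	return model_timer_fires_on_absent_cpu(&rq);
}
```

In this model, `run_offline(false)` returns true and `run_offline(true)` returns false: disarming the timer in the dying path, before the CPU leaves the present mask, is what closes the window.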

* Re: [PATCH] sched/deadline: stop dl_server before CPU goes offline
From: Marek Szyprowski @ 2025-10-09 20:28 UTC
  To: Shrikanth Hegde, peterz, juri.lelli
  Cc: mingo, vincent.guittot, linux-kernel, linuxppc-dev, venkat88,
	jstultz

On 09.10.2025 20:47, Shrikanth Hegde wrote:
> [...]
> Fixes: 4ae8d9aa9f9d ("sched/deadline: Fix dl_server getting stuck")
> Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
> Closes: https://lore.kernel.org/all/8218e149-7718-4432-9312-f97297c352b9@linux.ibm.com
Closes: https://lore.kernel.org/all/e56310b5-f7a9-4fad-b79a-dcbcdd3d3883@samsung.com/
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
> Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com>
> [...]

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


* Re: [PATCH] sched/deadline: stop dl_server before CPU goes offline
From: Venkat @ 2025-10-10  4:59 UTC
  To: Shrikanth Hegde, linuxppc-dev, LKML, Peter Zijlstra, mingo
  Cc: Peter Zijlstra, juri.lelli, mingo, vincent.guittot, m.szyprowski,
	jstultz



> On 10 Oct 2025, at 12:17 AM, Shrikanth Hegde <sshegde@linux.ibm.com> wrote:
> [...]
> Fixes: 4ae8d9aa9f9d ("sched/deadline: Fix dl_server getting stuck")
> Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
> Closes: https://lore.kernel.org/all/8218e149-7718-4432-9312-f97297c352b9@linux.ibm.com
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
> Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com>

This patch fixes the reported issue. Please add the tag below.

Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>

Regards,
Venkat.

* Re: [PATCH] sched/deadline: stop dl_server before CPU goes offline
From: Juri Lelli @ 2025-10-14  9:11 UTC
  To: Shrikanth Hegde
  Cc: peterz, mingo, vincent.guittot, linux-kernel, linuxppc-dev,
	m.szyprowski, venkat88, jstultz

Hello,

On 10/10/25 00:17, Shrikanth Hegde wrote:
> [...]

Looks good to me.

Acked-by: Juri Lelli <juri.lelli@redhat.com>

Thanks!
Juri

* [tip: sched/urgent] sched/deadline: Stop dl_server before CPU goes offline
From: tip-bot2 for Peter Zijlstra (Intel) @ 2025-10-14 11:47 UTC
  To: linux-tip-commits
  Cc: Peter Zijlstra (Intel), Venkat Rao Bagalkote, Shrikanth Hegde,
	Marek Szyprowski, x86, linux-kernel

The following commit has been merged into the sched/urgent branch of tip:

Commit-ID:     ee6e44dfe6e50b4a5df853d933a96bdff5309e6e
Gitweb:        https://git.kernel.org/tip/ee6e44dfe6e50b4a5df853d933a96bdff5309e6e
Author:        Peter Zijlstra (Intel) <peterz@infradead.org>
AuthorDate:    Fri, 10 Oct 2025 00:17:27 +05:30
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 14 Oct 2025 13:43:08 +02:00

sched/deadline: Stop dl_server before CPU goes offline

The IBM CI tool reported a kernel warning[1] when running a CPU removal
operation through drmgr[2], i.e. "drmgr -c cpu -r -q 1".

WARNING: CPU: 0 PID: 0 at kernel/sched/cpudeadline.c:219 cpudl_set+0x58/0x170
NIP [c0000000002b6ed8] cpudl_set+0x58/0x170
LR [c0000000002b7cb8] dl_server_timer+0x168/0x2a0
Call Trace:
[c000000002c2f8c0] init_stack+0x78c0/0x8000 (unreliable)
[c0000000002b7cb8] dl_server_timer+0x168/0x2a0
[c00000000034df84] __hrtimer_run_queues+0x1a4/0x390
[c00000000034f624] hrtimer_interrupt+0x124/0x300
[c00000000002a230] timer_interrupt+0x140/0x320

Git bisects to: commit 4ae8d9aa9f9d ("sched/deadline: Fix dl_server getting stuck")

This happens because:
- The dl_server hrtimer gets enqueued shortly before the CPU goes offline,
  when kthread_park() enqueues a fair task.
- The CPU goes offline and drmgr removes it from cpu_present_mask.
- The hrtimer fires and the warning is hit.

Fix it by stopping the dl_server before the CPU is marked dead.

[1]: https://lore.kernel.org/all/8218e149-7718-4432-9312-f97297c352b9@linux.ibm.com/
[2]: https://github.com/ibm-power-utilities/powerpc-utils/tree/next/src/drmgr

[sshegde: wrote the changelog and tested it]
Fixes: 4ae8d9aa9f9d ("sched/deadline: Fix dl_server getting stuck")
Closes: https://lore.kernel.org/all/8218e149-7718-4432-9312-f97297c352b9@linux.ibm.com
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com>
---
 kernel/sched/core.c     | 2 ++
 kernel/sched/deadline.c | 3 +++
 2 files changed, 5 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 198d2dd..f1ebf67 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -8571,10 +8571,12 @@ int sched_cpu_dying(unsigned int cpu)
 	sched_tick_stop(cpu);
 
 	rq_lock_irqsave(rq, &rf);
+	update_rq_clock(rq);
 	if (rq->nr_running != 1 || rq_has_pinned_tasks(rq)) {
 		WARN(true, "Dying CPU not properly vacated!");
 		dump_rq_tasks(rq, KERN_WARNING);
 	}
+	dl_server_stop(&rq->fair_server);
 	rq_unlock_irqrestore(rq, &rf);
 
 	calc_load_migrate(rq);
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 615411a..7b76710 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1582,6 +1582,9 @@ void dl_server_start(struct sched_dl_entity *dl_se)
 	if (!dl_server(dl_se) || dl_se->dl_server_active)
 		return;
 
+	if (WARN_ON_ONCE(!cpu_online(cpu_of(rq))))
+		return;
+
 	dl_se->dl_server_active = 1;
 	enqueue_dl_entity(dl_se, ENQUEUE_WAKEUP);
 	if (!dl_task(dl_se->rq->curr) || dl_entity_preempt(dl_se, &rq->curr->dl))
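The deadline.c hunk adds a defensive check: dl_server_start() now returns early, with a one-time warning, if the runqueue's CPU is not online. The following is a hypothetical userspace model of that control flow only; `struct guard_rq`, `warn_on_once_model()`, `guard_server_start()` and the two file-scope counters are invented names. It assumes, as with WARN_ON_ONCE(), that the "already warned" state belongs to the call site rather than to each runqueue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the runqueue state the guard looks at. */
struct guard_rq {
	bool cpu_online;	/* loosely, cpu_online(cpu_of(rq)) */
	bool server_active;	/* loosely, dl_se->dl_server_active */
};

/*
 * WARN_ON_ONCE() keeps its once-only state per call site; one file-scope
 * flag models the single call site in guard_server_start().
 */
static bool start_warned;
static int start_warn_count;

/* Model of WARN_ON_ONCE(cond): report only the first time cond is true,
 * but always return cond so the caller can bail out on every call. */
static bool warn_on_once_model(bool cond)
{
	if (cond && !start_warned) {
		start_warned = true;
		start_warn_count++;
		fprintf(stderr, "WARNING: dl_server start on offline CPU\n");
	}
	return cond;
}

/* Model of the patched dl_server_start(): refuse to activate offline. */
static void guard_server_start(struct guard_rq *rq)
{
	if (rq->server_active)
		return;

	if (warn_on_once_model(!rq->cpu_online))
		return;

	/* Only reached for an online CPU. */
	assert(rq->cpu_online);
	rq->server_active = true;
}
```

Because the warning helper returns the tested condition, the early return happens on every call made for an offline CPU, while the message is logged only once; this mirrors how WARN_ON_ONCE() is typically used as an `if` condition in kernel code.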

* Re: [PATCH] sched/deadline: stop dl_server before CPU goes offline
From: Youngmin Nam @ 2025-10-17  6:01 UTC
  To: Juri Lelli
  Cc: Shrikanth Hegde, peterz, mingo, vincent.guittot, linux-kernel,
	linuxppc-dev, m.szyprowski, venkat88, jstultz, d7271.choe,
	soohyuni.kim, bongkyu7.kim, youngmin.nam, jkkkkk.choi

On Tue, Oct 14, 2025 at 11:11:31AM +0200, Juri Lelli wrote:
> Hello,
> 
> On 10/10/25 00:17, Shrikanth Hegde wrote:
> > [...]
> 
> Looks good to me.
> 
> Acked-by: Juri Lelli <juri.lelli@redhat.com>
> 
> Thanks!
> Juri
> 

Hi All,

Could we expect this patch to address the following issue as well?

https://lore.kernel.org/all/aMKTHKfegBk4DgjA@jlelli-thinkpadt14gen4.remote.csb/

Thanks,
Youngmin.

* Re: [PATCH] sched/deadline: stop dl_server before CPU goes offline
From: Juri Lelli @ 2025-10-20  6:08 UTC
  To: Youngmin Nam
  Cc: Shrikanth Hegde, peterz, mingo, vincent.guittot, linux-kernel,
	linuxppc-dev, m.szyprowski, venkat88, jstultz, d7271.choe,
	soohyuni.kim, bongkyu7.kim, jkkkkk.choi

On 17/10/25 15:01, Youngmin Nam wrote:
> On Tue, Oct 14, 2025 at 11:11:31AM +0200, Juri Lelli wrote:
> > Hello,
> > 
> > On 10/10/25 00:17, Shrikanth Hegde wrote:
> > > [...]
> > 
> > Looks good to me.
> > 
> > Acked-by: Juri Lelli <juri.lelli@redhat.com>
> > 
> > Thanks!
> > Juri
> > 
> 
> Hi All,
> 
> Could we expect this patch to address the following issue as well?
> 
> https://lore.kernel.org/all/aMKTHKfegBk4DgjA@jlelli-thinkpadt14gen4.remote.csb/

I don't think I see a direct connection with it.

Thanks,
Juri