public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2] sched/deadline: fix task_struct reference leak
@ 2024-06-20 12:56 Wander Lairson Costa
  2024-06-20 14:10 ` Juri Lelli
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Wander Lairson Costa @ 2024-06-20 12:56 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider,
	open list:SCHEDULER
  Cc: rrendec, Wander Lairson Costa

During the execution of the following stress test with linux-rt:

stress-ng --cyclic 30 --timeout 30 --minimize --quiet

kmemleak frequently reported a memory leak concerning the task_struct:

unreferenced object 0xffff8881305b8000 (size 16136):
  comm "stress-ng", pid 614, jiffies 4294883961 (age 286.412s)
  object hex dump (first 32 bytes):
    02 40 00 00 00 00 00 00 00 00 00 00 00 00 00 00  .@..............
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  debug hex dump (first 16 bytes):
    53 09 00 00 00 00 00 00 00 00 00 00 00 00 00 00  S...............
  backtrace:
    [<00000000046b6790>] dup_task_struct+0x30/0x540
    [<00000000c5ca0f0b>] copy_process+0x3d9/0x50e0
    [<00000000ced59777>] kernel_clone+0xb0/0x770
    [<00000000a50befdc>] __do_sys_clone+0xb6/0xf0
    [<000000001dbf2008>] do_syscall_64+0x5d/0xf0
    [<00000000552900ff>] entry_SYSCALL_64_after_hwframe+0x6e/0x76

The issue occurs in start_dl_timer(), which increments the task_struct
reference count and sets a timer. The timer callback, dl_task_timer,
is supposed to decrement the reference count upon expiration. However,
if enqueue_task_dl() is called before the timer expires and cancels it,
the reference count is not decremented, leading to the leak.

This patch fixes the reference leak by ensuring the task_struct
reference count is properly decremented when the timer is canceled.

---

Changelog:

v2:
* Add the fixes tag
* Add a comment about canceling the timer while the callback was running

Signed-off-by: Wander Lairson Costa <wander@redhat.com>
Fixes: feff2e65efd8 ("sched/deadline: Unthrottle PI boosted threads while enqueuing")
---
 kernel/sched/deadline.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index c75d1307d86d..9bedd148f007 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1804,8 +1804,13 @@ static void enqueue_task_dl(struct rq *rq, struct task_struct *p, int flags)
 			 * The replenish timer needs to be canceled. No
 			 * problem if it fires concurrently: boosted threads
 			 * are ignored in dl_task_timer().
+			 *
+			 * If the timer callback was running (hrtimer_try_to_cancel == -1),
+			 * it will eventually call put_task_struct().
 			 */
-			hrtimer_try_to_cancel(&p->dl.dl_timer);
+			if (hrtimer_try_to_cancel(&p->dl.dl_timer) == 1 &&
+			    !dl_server(&p->dl))
+				put_task_struct(p);
 			p->dl.dl_throttled = 0;
 		}
 	} else if (!dl_prio(p->normal_prio)) {
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH v2] sched/deadline: fix task_struct reference leak
  2024-06-20 12:56 [PATCH v2] sched/deadline: fix task_struct reference leak Wander Lairson Costa
@ 2024-06-20 14:10 ` Juri Lelli
  2024-07-01  7:06 ` [tip: sched/urgent] " tip-bot2 for Wander Lairson Costa
  2024-07-01 11:07 ` [tip: sched/urgent] sched/deadline: Fix " tip-bot2 for Wander Lairson Costa
  2 siblings, 0 replies; 4+ messages in thread
From: Juri Lelli @ 2024-06-20 14:10 UTC (permalink / raw)
  To: Wander Lairson Costa
  Cc: Ingo Molnar, Peter Zijlstra, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider,
	open list:SCHEDULER, rrendec

On 20/06/24 09:56, Wander Lairson Costa wrote:
> During the execution of the following stress test with linux-rt:
> 
> stress-ng --cyclic 30 --timeout 30 --minimize --quiet
> 
> kmemleak frequently reported a memory leak concerning the task_struct:
> 
> unreferenced object 0xffff8881305b8000 (size 16136):
>   comm "stress-ng", pid 614, jiffies 4294883961 (age 286.412s)
>   object hex dump (first 32 bytes):
>     02 40 00 00 00 00 00 00 00 00 00 00 00 00 00 00  .@..............
>     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>   debug hex dump (first 16 bytes):
>     53 09 00 00 00 00 00 00 00 00 00 00 00 00 00 00  S...............
>   backtrace:
>     [<00000000046b6790>] dup_task_struct+0x30/0x540
>     [<00000000c5ca0f0b>] copy_process+0x3d9/0x50e0
>     [<00000000ced59777>] kernel_clone+0xb0/0x770
>     [<00000000a50befdc>] __do_sys_clone+0xb6/0xf0
>     [<000000001dbf2008>] do_syscall_64+0x5d/0xf0
>     [<00000000552900ff>] entry_SYSCALL_64_after_hwframe+0x6e/0x76
> 
> The issue occurs in start_dl_timer(), which increments the task_struct
> reference count and sets a timer. The timer callback, dl_task_timer,
> is supposed to decrement the reference count upon expiration. However,
> if enqueue_task_dl() is called before the timer expires and cancels it,
> the reference count is not decremented, leading to the leak.
> 
> This patch fixes the reference leak by ensuring the task_struct
> reference count is properly decremented when the timer is canceled.
> 
> ---
> 
> Changelog:
> 
> v2:
> * Add the fixes tag
> * Add a comment about canceling the timer while the callback was running

Uh, looks like these bits above should come after the SoB section so
that they are ignored when applying the patch? But, maybe, Peter, you
can fix that if you decide to take this version?

> 
> Signed-off-by: Wander Lairson Costa <wander@redhat.com>
> Fixes: feff2e65efd8 ("sched/deadline: Unthrottle PI boosted threads while enqueuing")

Apart from that

Acked-by: Juri Lelli <juri.lelli@redhat.com>

Thanks!
Juri

> ---
>  kernel/sched/deadline.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index c75d1307d86d..9bedd148f007 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -1804,8 +1804,13 @@ static void enqueue_task_dl(struct rq *rq, struct task_struct *p, int flags)
>  			 * The replenish timer needs to be canceled. No
>  			 * problem if it fires concurrently: boosted threads
>  			 * are ignored in dl_task_timer().
> +			 *
> +			 * If the timer callback was running (hrtimer_try_to_cancel == -1),
> +			 * it will eventually call put_task_struct().
>  			 */
> -			hrtimer_try_to_cancel(&p->dl.dl_timer);
> +			if (hrtimer_try_to_cancel(&p->dl.dl_timer) == 1 &&
> +			    !dl_server(&p->dl))
> +				put_task_struct(p);
>  			p->dl.dl_throttled = 0;
>  		}
>  	} else if (!dl_prio(p->normal_prio)) {
> -- 
> 2.45.2
> 


^ permalink raw reply	[flat|nested] 4+ messages in thread

* [tip: sched/urgent] sched/deadline: fix task_struct reference leak
  2024-06-20 12:56 [PATCH v2] sched/deadline: fix task_struct reference leak Wander Lairson Costa
  2024-06-20 14:10 ` Juri Lelli
@ 2024-07-01  7:06 ` tip-bot2 for Wander Lairson Costa
  2024-07-01 11:07 ` [tip: sched/urgent] sched/deadline: Fix " tip-bot2 for Wander Lairson Costa
  2 siblings, 0 replies; 4+ messages in thread
From: tip-bot2 for Wander Lairson Costa @ 2024-07-01  7:06 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Peter Zijlstra (Intel), Juri Lelli, x86, linux-kernel

The following commit has been merged into the sched/urgent branch of tip:

Commit-ID:     a7accf658efa4fa5b04a74f21863833fd737f469
Gitweb:        https://git.kernel.org/tip/a7accf658efa4fa5b04a74f21863833fd737f469
Author:        Wander Lairson Costa <wander@redhat.com>
AuthorDate:    Thu, 20 Jun 2024 09:56:17 -03:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 25 Jun 2024 10:43:41 +02:00

sched/deadline: fix task_struct reference leak

During the execution of the following stress test with linux-rt:

stress-ng --cyclic 30 --timeout 30 --minimize --quiet

kmemleak frequently reported a memory leak concerning the task_struct:

unreferenced object 0xffff8881305b8000 (size 16136):
  comm "stress-ng", pid 614, jiffies 4294883961 (age 286.412s)
  object hex dump (first 32 bytes):
    02 40 00 00 00 00 00 00 00 00 00 00 00 00 00 00  .@..............
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  debug hex dump (first 16 bytes):
    53 09 00 00 00 00 00 00 00 00 00 00 00 00 00 00  S...............
  backtrace:
    [<00000000046b6790>] dup_task_struct+0x30/0x540
    [<00000000c5ca0f0b>] copy_process+0x3d9/0x50e0
    [<00000000ced59777>] kernel_clone+0xb0/0x770
    [<00000000a50befdc>] __do_sys_clone+0xb6/0xf0
    [<000000001dbf2008>] do_syscall_64+0x5d/0xf0
    [<00000000552900ff>] entry_SYSCALL_64_after_hwframe+0x6e/0x76

The issue occurs in start_dl_timer(), which increments the task_struct
reference count and sets a timer. The timer callback, dl_task_timer,
is supposed to decrement the reference count upon expiration. However,
if enqueue_task_dl() is called before the timer expires and cancels it,
the reference count is not decremented, leading to the leak.

This patch fixes the reference leak by ensuring the task_struct
reference count is properly decremented when the timer is canceled.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Juri Lelli <juri.lelli@redhat.com>
Link: https://lore.kernel.org/r/20240620125618.11419-1-wander@redhat.com
---
 kernel/sched/deadline.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index c75d130..9bedd14 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1804,8 +1804,13 @@ static void enqueue_task_dl(struct rq *rq, struct task_struct *p, int flags)
 			 * The replenish timer needs to be canceled. No
 			 * problem if it fires concurrently: boosted threads
 			 * are ignored in dl_task_timer().
+			 *
+			 * If the timer callback was running (hrtimer_try_to_cancel == -1),
+			 * it will eventually call put_task_struct().
 			 */
-			hrtimer_try_to_cancel(&p->dl.dl_timer);
+			if (hrtimer_try_to_cancel(&p->dl.dl_timer) == 1 &&
+			    !dl_server(&p->dl))
+				put_task_struct(p);
 			p->dl.dl_throttled = 0;
 		}
 	} else if (!dl_prio(p->normal_prio)) {

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [tip: sched/urgent] sched/deadline: Fix task_struct reference leak
  2024-06-20 12:56 [PATCH v2] sched/deadline: fix task_struct reference leak Wander Lairson Costa
  2024-06-20 14:10 ` Juri Lelli
  2024-07-01  7:06 ` [tip: sched/urgent] " tip-bot2 for Wander Lairson Costa
@ 2024-07-01 11:07 ` tip-bot2 for Wander Lairson Costa
  2 siblings, 0 replies; 4+ messages in thread
From: tip-bot2 for Wander Lairson Costa @ 2024-07-01 11:07 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Wander Lairson Costa, Peter Zijlstra (Intel), Juri Lelli, x86,
	linux-kernel

The following commit has been merged into the sched/urgent branch of tip:

Commit-ID:     b58652db66c910c2245f5bee7deca41c12d707b9
Gitweb:        https://git.kernel.org/tip/b58652db66c910c2245f5bee7deca41c12d707b9
Author:        Wander Lairson Costa <wander@redhat.com>
AuthorDate:    Thu, 20 Jun 2024 09:56:17 -03:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Mon, 01 Jul 2024 13:01:44 +02:00

sched/deadline: Fix task_struct reference leak

During the execution of the following stress test with linux-rt:

stress-ng --cyclic 30 --timeout 30 --minimize --quiet

kmemleak frequently reported a memory leak concerning the task_struct:

unreferenced object 0xffff8881305b8000 (size 16136):
  comm "stress-ng", pid 614, jiffies 4294883961 (age 286.412s)
  object hex dump (first 32 bytes):
    02 40 00 00 00 00 00 00 00 00 00 00 00 00 00 00  .@..............
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  debug hex dump (first 16 bytes):
    53 09 00 00 00 00 00 00 00 00 00 00 00 00 00 00  S...............
  backtrace:
    [<00000000046b6790>] dup_task_struct+0x30/0x540
    [<00000000c5ca0f0b>] copy_process+0x3d9/0x50e0
    [<00000000ced59777>] kernel_clone+0xb0/0x770
    [<00000000a50befdc>] __do_sys_clone+0xb6/0xf0
    [<000000001dbf2008>] do_syscall_64+0x5d/0xf0
    [<00000000552900ff>] entry_SYSCALL_64_after_hwframe+0x6e/0x76

The issue occurs in start_dl_timer(), which increments the task_struct
reference count and sets a timer. The timer callback, dl_task_timer,
is supposed to decrement the reference count upon expiration. However,
if enqueue_task_dl() is called before the timer expires and cancels it,
the reference count is not decremented, leading to the leak.

This patch fixes the reference leak by ensuring the task_struct
reference count is properly decremented when the timer is canceled.

Fixes: feff2e65efd8 ("sched/deadline: Unthrottle PI boosted threads while enqueuing")
Signed-off-by: Wander Lairson Costa <wander@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Juri Lelli <juri.lelli@redhat.com>
Link: https://lore.kernel.org/r/20240620125618.11419-1-wander@redhat.com
---
 kernel/sched/deadline.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index c75d130..9bedd14 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1804,8 +1804,13 @@ static void enqueue_task_dl(struct rq *rq, struct task_struct *p, int flags)
 			 * The replenish timer needs to be canceled. No
 			 * problem if it fires concurrently: boosted threads
 			 * are ignored in dl_task_timer().
+			 *
+			 * If the timer callback was running (hrtimer_try_to_cancel == -1),
+			 * it will eventually call put_task_struct().
 			 */
-			hrtimer_try_to_cancel(&p->dl.dl_timer);
+			if (hrtimer_try_to_cancel(&p->dl.dl_timer) == 1 &&
+			    !dl_server(&p->dl))
+				put_task_struct(p);
 			p->dl.dl_throttled = 0;
 		}
 	} else if (!dl_prio(p->normal_prio)) {

^ permalink raw reply related	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2024-07-01 11:07 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-06-20 12:56 [PATCH v2] sched/deadline: fix task_struct reference leak Wander Lairson Costa
2024-06-20 14:10 ` Juri Lelli
2024-07-01  7:06 ` [tip: sched/urgent] " tip-bot2 for Wander Lairson Costa
2024-07-01 11:07 ` [tip: sched/urgent] sched/deadline: Fix " tip-bot2 for Wander Lairson Costa

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox