* [RFC PATCH v4 0/6] samples/damon: handle damon_{start,stop}() failures
@ 2026-06-10 13:55 SeongJae Park
  2026-06-10 13:55 ` [RFC PATCH v4 1/6] samples/damon/wsse: handle damon_start() failure SeongJae Park
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: SeongJae Park @ 2026-06-10 13:55 UTC (permalink / raw)
  Cc: SeongJae Park, # 6 . 14 . x, Andrew Morton, damon, linux-kernel,
	linux-mm

None of the DAMON sample modules correctly handles damon_start()
failures.  mtier additionally mishandles damon_stop() failures, and wsse
and prcl mishandle damon_call() failures.  As a result, memory leaks,
disruption of subsequent DAMON operations, and use-after-free can
happen.  Fix those.

Note that only the issues caused by damon_start() failures can be
reproduced reliably.  Reproducing them requires admin permission,
though.

Changes from RFC v3
- RFC v3: https://lore.kernel.org/20260610011420.3018-1-sj@kernel.org
- Add damon_call() failure handling fixes for wsse and prcl.
Changes from RFC v2
- RFC v2: https://lore.kernel.org/20260609142119.68120-1-sj@kernel.org
- Add damon_start() failure handling fixes for wsse and prcl.
Changes from RFC v1
- RFC v1: https://lore.kernel.org/20260609005443.2122-1-sj@kernel.org
- Add damon_stop() failure handling fix to the series.

SeongJae Park (6):
  samples/damon/wsse: handle damon_start() failure
  samples/damon/prcl: handle damon_start() failure
  samples/damon/mtier: handle damon_start() failure
  samples/damon/mtier: handle damon_stop() failure
  samples/damon/wsse: stop and free damon ctx when damon_call() fails
  samples/damon/prcl: stop and free damon ctx when damon_call() fails

 samples/damon/mtier.c | 14 ++++++++++++--
 samples/damon/prcl.c  | 11 +++++++++--
 samples/damon/wsse.c  | 11 +++++++++--
 3 files changed, 30 insertions(+), 6 deletions(-)


base-commit: 1fe919b2e7b6455d0b976d75dcbe44324361a83b
-- 
2.47.3



* [RFC PATCH v4 1/6] samples/damon/wsse: handle damon_start() failure
  2026-06-10 13:55 [RFC PATCH v4 0/6] samples/damon: handle damon_{start,stop}() failures SeongJae Park
@ 2026-06-10 13:55 ` SeongJae Park
  2026-06-10 13:55 ` [RFC PATCH v4 2/6] samples/damon/prcl: " SeongJae Park
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: SeongJae Park @ 2026-06-10 13:55 UTC (permalink / raw)
  Cc: SeongJae Park, # 6 . 14 . x, Andrew Morton, damon, linux-kernel,
	linux-mm

damon_sample_wsse_start() callers assume it cleans up resources when it
fails.  The function does so for context buildup failures, but not for
damon_start() failure.  As a result, when damon_start() fails, the
memory for the DAMON context is leaked.  Free the context on that
failure to fix the issue.

Note that the issue can reliably be reproduced because the module calls
damon_start() in the exclusive mode.  For example,

    $ sudo damo start
    $ echo $$ | sudo tee /sys/module/damon_sample_wsse/parameters/target_pid
    $ echo Y | sudo tee /sys/module/damon_sample_wsse/parameters/enabled
    $ sudo cat /proc/allocinfo | grep damon_new_ctx

Because the first command starts another DAMON instance, the
damon_start() call triggered by the third command fails, as the new
DAMON instance cannot run exclusively.  Without this fix, repeating the
third and the fourth commands shows the memory consumption only
increasing due to the leak.  It requires sudo permission, though.

The issue was discovered [1] by Sashiko.

[1] https://lore.kernel.org/20260609145814.70163-1-sj@kernel.org

Fixes: b757c6cfc696 ("samples/damon/wsse: start and stop DAMON as the user requests")
Cc: <stable@vger.kernel.org> # 6.14.x
Signed-off-by: SeongJae Park <sj@kernel.org>
---
 samples/damon/wsse.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/samples/damon/wsse.c b/samples/damon/wsse.c
index 799ad44439434..bbd9392ab5b36 100644
--- a/samples/damon/wsse.c
+++ b/samples/damon/wsse.c
@@ -87,8 +87,10 @@ static int damon_sample_wsse_start(void)
 	target->pid = target_pidp;
 
 	err = damon_start(&ctx, 1, true);
-	if (err)
+	if (err) {
+		damon_destroy_ctx(ctx);
 		return err;
+	}
 	repeat_call_control.data = ctx;
 	return damon_call(ctx, &repeat_call_control);
 }
-- 
2.47.3



* [RFC PATCH v4 2/6] samples/damon/prcl: handle damon_start() failure
  2026-06-10 13:55 [RFC PATCH v4 0/6] samples/damon: handle damon_{start,stop}() failures SeongJae Park
  2026-06-10 13:55 ` [RFC PATCH v4 1/6] samples/damon/wsse: handle damon_start() failure SeongJae Park
@ 2026-06-10 13:55 ` SeongJae Park
  2026-06-10 13:55 ` [RFC PATCH v4 3/6] samples/damon/mtier: " SeongJae Park
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: SeongJae Park @ 2026-06-10 13:55 UTC (permalink / raw)
  Cc: SeongJae Park, # 6 . 14 . x, Andrew Morton, damon, linux-kernel,
	linux-mm

damon_sample_prcl_start() callers assume it cleans up resources when it
fails.  The function does so for context buildup failures, but not for
damon_start() failure.  As a result, when damon_start() fails, the
memory for the DAMON context is leaked.  Free the context on that
failure to fix the issue.

Note that the issue can reliably be reproduced because the module calls
damon_start() in the exclusive mode.  For example,

    $ sudo damo start
    $ echo $$ | sudo tee /sys/module/damon_sample_prcl/parameters/target_pid
    $ echo Y | sudo tee /sys/module/damon_sample_prcl/parameters/enabled
    $ sudo cat /proc/allocinfo | grep damon_new_ctx

Because the first command starts another DAMON instance, the
damon_start() call triggered by the third command fails, as the new
DAMON instance cannot run exclusively.  Without this fix, repeating the
third and the fourth commands shows the memory consumption only
increasing due to the leak.  It requires sudo permission, though.

The issue was discovered [1] by Sashiko.

[1] https://lore.kernel.org/20260609145814.70163-1-sj@kernel.org

Fixes: 2aca254620a8 ("samples/damon: introduce a skeleton of a smaple DAMON module for proactive reclamation")
Cc: <stable@vger.kernel.org> # 6.14.x
Signed-off-by: SeongJae Park <sj@kernel.org>
---
 samples/damon/prcl.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/samples/damon/prcl.c b/samples/damon/prcl.c
index b7c50f2656ce7..0db2598946911 100644
--- a/samples/damon/prcl.c
+++ b/samples/damon/prcl.c
@@ -106,8 +106,10 @@ static int damon_sample_prcl_start(void)
 	damon_set_schemes(ctx, &scheme, 1);
 
 	err = damon_start(&ctx, 1, true);
-	if (err)
+	if (err) {
+		damon_destroy_ctx(ctx);
 		return err;
+	}
 
 	repeat_call_control.data = ctx;
 	return damon_call(ctx, &repeat_call_control);
-- 
2.47.3



* [RFC PATCH v4 3/6] samples/damon/mtier: handle damon_start() failure
  2026-06-10 13:55 [RFC PATCH v4 0/6] samples/damon: handle damon_{start,stop}() failures SeongJae Park
  2026-06-10 13:55 ` [RFC PATCH v4 1/6] samples/damon/wsse: handle damon_start() failure SeongJae Park
  2026-06-10 13:55 ` [RFC PATCH v4 2/6] samples/damon/prcl: " SeongJae Park
@ 2026-06-10 13:55 ` SeongJae Park
  2026-06-10 13:55 ` [RFC PATCH v4 4/6] samples/damon/mtier: handle damon_stop() failure SeongJae Park
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: SeongJae Park @ 2026-06-10 13:55 UTC (permalink / raw)
  Cc: SeongJae Park, # 6 . 16 . x, Andrew Morton, damon, linux-kernel,
	linux-mm

damon_sample_mtier_start() callers assume it cleans up resources when
it fails.  The function does so for context buildup failures, but not
for damon_start() failure.

As a result, when damon_start() fails, the memory for the DAMON contexts
can be leaked.  Also, if damon_start() fails for only the second
context, the first context keeps running indefinitely and, since it
runs in the exclusive mode, prevents other DAMON contexts from
starting.  Stop the possibly started DAMON context and free both
contexts on the failure to fix the issues.
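
The partial-start case can be illustrated with a small userspace model.
This is a sketch under stated assumptions, not kernel code: mock_ctx,
mock_start(), mock_stop(), mock_destroy() and start_two() are
hypothetical names, and the model assumes that a batched start stops at
the first failing context while leaving earlier ones running.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; not the kernel API. */
struct mock_ctx {
	bool allocated;
	bool running;
	bool fail_to_start;	/* test knob: starting this context fails */
};

/* Start contexts in order; bail out at the first failure. */
static int mock_start(struct mock_ctx **ctxs, int nr)
{
	for (int i = 0; i < nr; i++) {
		if (ctxs[i]->fail_to_start)
			return -ENOMEM;
		ctxs[i]->running = true;
	}
	return 0;
}

static void mock_stop(struct mock_ctx *ctx)
{
	ctx->running = false;
}

static void mock_destroy(struct mock_ctx *ctx)
{
	/* Freeing a still-running context would be a bug. */
	assert(!ctx->running);
	ctx->allocated = false;
}

/* Error path in the shape of the fix: stop what started, free both. */
static int start_two(struct mock_ctx **ctxs)
{
	int err = mock_start(ctxs, 2);

	if (!err)
		return 0;
	if (ctxs[0]->running)
		mock_stop(ctxs[0]);
	mock_destroy(ctxs[0]);
	mock_destroy(ctxs[1]);
	return err;
}
```

When only the second context fails to start, start_two() in this model
returns the error with neither context running nor allocated.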

Note that the issue can reliably be reproduced because the module calls
damon_start() in the exclusive mode.  For example,

    $ sudo damo start
    $ echo Y | sudo tee /sys/module/damon_sample_mtier/parameters/enabled
    $ sudo cat /proc/allocinfo | grep damon_new_ctx

Because the first command starts another DAMON instance, the
damon_start() call triggered by the second command fails, as the new
DAMON instance cannot run exclusively.  Without this fix, repeating the
second and the third commands shows the memory consumption only
increasing due to the leaks.  It requires sudo permission, though.

The issue was discovered [1] by Sashiko.

[1] https://lore.kernel.org/20260608112455.274231F00893@smtp.kernel.org

Fixes: 82a08bde3cf7 ("samples/damon: implement a DAMON module for memory tiering")
Cc: <stable@vger.kernel.org> # 6.16.x
Signed-off-by: SeongJae Park <sj@kernel.org>
---
 samples/damon/mtier.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/samples/damon/mtier.c b/samples/damon/mtier.c
index eb1143de8df17..66b591f2180fa 100644
--- a/samples/damon/mtier.c
+++ b/samples/damon/mtier.c
@@ -174,6 +174,7 @@ static struct damon_ctx *damon_sample_mtier_build_ctx(bool promote)
 static int damon_sample_mtier_start(void)
 {
 	struct damon_ctx *ctx;
+	int err;
 
 	ctx = damon_sample_mtier_build_ctx(true);
 	if (!ctx)
@@ -185,7 +186,15 @@ static int damon_sample_mtier_start(void)
 		return -ENOMEM;
 	}
 	ctxs[1] = ctx;
-	return damon_start(ctxs, 2, true);
+	err = damon_start(ctxs, 2, true);
+	if (!err)
+		return 0;
+
+	if (damon_is_running(ctxs[0]))
+		damon_stop(ctxs, 1);
+	damon_destroy_ctx(ctxs[0]);
+	damon_destroy_ctx(ctxs[1]);
+	return err;
 }
 
 static void damon_sample_mtier_stop(void)
-- 
2.47.3



* [RFC PATCH v4 4/6] samples/damon/mtier: handle damon_stop() failure
  2026-06-10 13:55 [RFC PATCH v4 0/6] samples/damon: handle damon_{start,stop}() failures SeongJae Park
                   ` (2 preceding siblings ...)
  2026-06-10 13:55 ` [RFC PATCH v4 3/6] samples/damon/mtier: " SeongJae Park
@ 2026-06-10 13:55 ` SeongJae Park
  2026-06-10 13:55 ` [RFC PATCH v4 5/6] samples/damon/wsse: stop and free damon ctx when damon_call() fails SeongJae Park
  2026-06-10 13:55 ` [RFC PATCH v4 6/6] samples/damon/prcl: " SeongJae Park
  5 siblings, 0 replies; 7+ messages in thread
From: SeongJae Park @ 2026-06-10 13:55 UTC (permalink / raw)
  Cc: SeongJae Park, # 6 . 16 . x, Andrew Morton, damon, linux-kernel,
	linux-mm

damon_sample_mtier_stop() assumes its damon_stop() call always
successfully stops the two DAMON contexts.  Hence it deallocates the two
DAMON contexts right after the damon_stop() call.  However, if a given
context is already stopped, damon_stop() fails and returns an error
while leaving the contexts that have not yet stopped running.  Such
unexpected early stops of a DAMON context can happen due to memory
allocation failures in kdamond_fn().  Because damon_sample_mtier_stop()
then deallocates all DAMON contexts, along with the damon_target and
damon_region objects linked to them, the still-running DAMON context
(kdamond) ends up using freed memory (use-after-free).  Fix the issue by
invoking damon_stop() separately for each context.
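
The difference between a batched stop and per-context stops can be
sketched in userspace as follows.  This is an illustrative model only:
mock_ctx, mock_stop(), stop_batched() and stop_each() are hypothetical
names, and the -EPERM value is an arbitrary choice for the model.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; not the kernel API. */
struct mock_ctx {
	bool running;
};

/* Stop contexts in order; bail out at the first already-stopped one. */
static int mock_stop(struct mock_ctx **ctxs, int nr)
{
	assert(nr > 0);
	for (int i = 0; i < nr; i++) {
		if (!ctxs[i]->running)
			return -EPERM;
		ctxs[i]->running = false;
	}
	return 0;
}

/* Old shape: one batched call for both contexts. */
static void stop_batched(struct mock_ctx **ctxs)
{
	mock_stop(ctxs, 2);
}

/* New shape: one call per context, so one failure cannot skip the other. */
static void stop_each(struct mock_ctx **ctxs)
{
	mock_stop(&ctxs[0], 1);
	mock_stop(&ctxs[1], 1);
}
```

With the first context already stopped, stop_batched() leaves the
second one running in this model, while stop_each() stops it.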

Note that DAMON_SYSFS also allows running multiple DAMON contexts.
However, it calls damon_stop() for each context one by one.  Hence this
issue exists only in mtier.

In the long term, it would be better to refactor damon_stop() to always
stop all contexts regardless of failures in the middle.  Keep this fix
in the current form, though, so that it is simple and easy to backport.
I will do the refactoring later.

The issue was discovered [1] by Sashiko.

[1] https://lore.kernel.org/20260609014219.3013-1-sj@kernel.org

Fixes: 82a08bde3cf7 ("samples/damon: implement a DAMON module for memory tiering")
Cc: <stable@vger.kernel.org> # 6.16.x
Signed-off-by: SeongJae Park <sj@kernel.org>
---
 samples/damon/mtier.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/samples/damon/mtier.c b/samples/damon/mtier.c
index 66b591f2180fa..faaaaa12e6206 100644
--- a/samples/damon/mtier.c
+++ b/samples/damon/mtier.c
@@ -199,7 +199,8 @@ static int damon_sample_mtier_start(void)
 
 static void damon_sample_mtier_stop(void)
 {
-	damon_stop(ctxs, 2);
+	damon_stop(ctxs, 1);
+	damon_stop(&ctxs[1], 1);
 	damon_destroy_ctx(ctxs[0]);
 	damon_destroy_ctx(ctxs[1]);
 }
-- 
2.47.3



* [RFC PATCH v4 5/6] samples/damon/wsse: stop and free damon ctx when damon_call() fails
  2026-06-10 13:55 [RFC PATCH v4 0/6] samples/damon: handle damon_{start,stop}() failures SeongJae Park
                   ` (3 preceding siblings ...)
  2026-06-10 13:55 ` [RFC PATCH v4 4/6] samples/damon/mtier: handle damon_stop() failure SeongJae Park
@ 2026-06-10 13:55 ` SeongJae Park
  2026-06-10 13:55 ` [RFC PATCH v4 6/6] samples/damon/prcl: " SeongJae Park
  5 siblings, 0 replies; 7+ messages in thread
From: SeongJae Park @ 2026-06-10 13:55 UTC (permalink / raw)
  Cc: SeongJae Park, # 6 . 17 . x, Andrew Morton, damon, linux-kernel,
	linux-mm

damon_sample_wsse_start() calls damon_call() right after damon_start()
succeeds.  The kdamond started by damon_start() could terminate by
itself before or in the middle of the damon_call() execution.  There
can be multiple reasons for such a stop, including termination of the
monitoring target process and memory allocation failures inside
kdamond_fn().  In that case, damon_call() fails and returns an error
without cleaning up the DAMON context object.  The
damon_sample_wsse_start() caller assumes the function cleans up the
object, though.  When the user requests to start DAMON again,
damon_sample_wsse_start() is called again, allocates a new DAMON context
object, and overwrites the pointer to the previous object.  As a result,
the previous context object is leaked.

Safely stop the kdamond and deallocate the context object when the
failure is returned.  Note that the kdamond should be stopped first,
because a damon_call() failure does not mean the kdamond has completely
terminated, but only that its termination has started.
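
The stop-before-free ordering can be sketched with a small userspace
state model.  It is not kernel code: mock_ctx, mock_call(), mock_stop(),
mock_destroy() and start_then_call() are hypothetical names, the
-ECANCELED value is an arbitrary choice, and the model assumes that the
stop returns only after the worker has fully exited.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for illustration; not the kernel API. */
enum mock_state {
	MOCK_RUNNING,
	MOCK_TERMINATING,	/* worker began exiting but may still use ctx */
	MOCK_STOPPED,
	MOCK_FREED,
};

struct mock_ctx {
	enum mock_state state;
};

/* Fails once the worker has begun terminating, even if not finished. */
static int mock_call(struct mock_ctx *ctx)
{
	return ctx->state == MOCK_RUNNING ? 0 : -ECANCELED;
}

/* Modeled as returning only after the worker has fully exited. */
static void mock_stop(struct mock_ctx *ctx)
{
	ctx->state = MOCK_STOPPED;
}

static void mock_destroy(struct mock_ctx *ctx)
{
	/* Freeing while the worker may still touch ctx is the bug to avoid. */
	assert(ctx->state == MOCK_STOPPED);
	ctx->state = MOCK_FREED;
}

/* Error path in the shape of the fix: stop first, then free. */
static int start_then_call(struct mock_ctx *ctx)
{
	int err = mock_call(ctx);

	if (err) {
		mock_stop(ctx);
		mock_destroy(ctx);
	}
	return err;
}
```

Swapping the two calls in the error path would trip the assertion in
mock_destroy() for a context that is still terminating.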

The user impact should not be significant, because the race is hard to
trigger and at most one DAMON context object can be leaked per race.

The issue was discovered [1] by Sashiko.

[1] https://lore.kernel.org/20260610034828.4632-1-sj@kernel.org

Fixes: cc9c1b8c205b ("samples/damon/wsse: use damon_call() repeat mode instead of damon_callback")
Cc: <stable@vger.kernel.org> # 6.17.x
Signed-off-by: SeongJae Park <sj@kernel.org>
---
 samples/damon/wsse.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/samples/damon/wsse.c b/samples/damon/wsse.c
index bbd9392ab5b36..ff5e8a890f448 100644
--- a/samples/damon/wsse.c
+++ b/samples/damon/wsse.c
@@ -92,7 +92,12 @@ static int damon_sample_wsse_start(void)
 		return err;
 	}
 	repeat_call_control.data = ctx;
-	return damon_call(ctx, &repeat_call_control);
+	err = damon_call(ctx, &repeat_call_control);
+	if (err) {
+		damon_stop(&ctx, 1);
+		damon_destroy_ctx(ctx);
+	}
+	return err;
 }
 
 static void damon_sample_wsse_stop(void)
-- 
2.47.3



* [RFC PATCH v4 6/6] samples/damon/prcl: stop and free damon ctx when damon_call() fails
  2026-06-10 13:55 [RFC PATCH v4 0/6] samples/damon: handle damon_{start,stop}() failures SeongJae Park
                   ` (4 preceding siblings ...)
  2026-06-10 13:55 ` [RFC PATCH v4 5/6] samples/damon/wsse: stop and free damon ctx when damon_call() fails SeongJae Park
@ 2026-06-10 13:55 ` SeongJae Park
  5 siblings, 0 replies; 7+ messages in thread
From: SeongJae Park @ 2026-06-10 13:55 UTC (permalink / raw)
  Cc: SeongJae Park, # 6 . 17 . x, Andrew Morton, damon, linux-kernel,
	linux-mm

damon_sample_prcl_start() calls damon_call() right after damon_start()
succeeds.  The kdamond started by damon_start() could terminate by
itself before or in the middle of the damon_call() execution.  There
can be multiple reasons for such a stop, including termination of the
monitoring target process and memory allocation failures inside
kdamond_fn().  In that case, damon_call() fails and returns an error
without cleaning up the DAMON context object.  The
damon_sample_prcl_start() caller assumes the function cleans up the
object, though.  When the user requests to start DAMON again,
damon_sample_prcl_start() is called again, allocates a new DAMON context
object, and overwrites the pointer to the previous object.  As a result,
the previous context object is leaked.

Safely stop the kdamond and deallocate the context object when the
failure is returned.  Note that the kdamond should be stopped first,
because a damon_call() failure does not mean the kdamond has completely
terminated, but only that its termination has started.

The user impact should not be significant, because the race is hard to
trigger and at most one DAMON context object can be leaked per race.

The issue was discovered [1] by Sashiko.

[1] https://lore.kernel.org/20260610035214.4850-1-sj@kernel.org

Fixes: a6c33f1054e3 ("samples/damon/prcl: use damon_call() repeat mode instead of damon_callback")
Cc: <stable@vger.kernel.org> # 6.17.x
Signed-off-by: SeongJae Park <sj@kernel.org>
---
 samples/damon/prcl.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/samples/damon/prcl.c b/samples/damon/prcl.c
index 0db2598946911..edeae145c4a8a 100644
--- a/samples/damon/prcl.c
+++ b/samples/damon/prcl.c
@@ -112,7 +112,12 @@ static int damon_sample_prcl_start(void)
 	}
 
 	repeat_call_control.data = ctx;
-	return damon_call(ctx, &repeat_call_control);
+	err = damon_call(ctx, &repeat_call_control);
+	if (err) {
+		damon_stop(&ctx, 1);
+		damon_destroy_ctx(ctx);
+	}
+	return err;
 }
 
 static void damon_sample_prcl_stop(void)
-- 
2.47.3


