* [RFC PATCH v4 1/6] samples/damon/wsse: handle damon_start() failure
2026-06-10 13:55 [RFC PATCH v4 0/6] samples/damon: handle damon_{start,stop}() failures SeongJae Park
@ 2026-06-10 13:55 ` SeongJae Park
2026-06-10 13:55 ` [RFC PATCH v4 2/6] samples/damon/prcl: " SeongJae Park
` (4 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: SeongJae Park @ 2026-06-10 13:55 UTC (permalink / raw)
Cc: SeongJae Park, # 6 . 14 . x, Andrew Morton, damon, linux-kernel,
linux-mm
Callers of damon_sample_wsse_start() assume it cleans up its resources
when it fails. The function does so for context buildup failures, but
not for damon_start() failure. As a result, when damon_start() fails,
the memory for the DAMON context is leaked. Free the context on that
failure path to fix the issue.
Note that the issue can reliably be reproduced because the module calls
damon_start() in the exclusive mode. For example,
$ sudo damo start
$ echo $$ | sudo tee /sys/module/damon_sample_wsse/parameters/target_pid
$ echo Y | sudo tee /sys/module/damon_sample_wsse/parameters/enabled
$ sudo cat /proc/allocinfo | grep damon_new_ctx
Because the first command starts another DAMON instance, the
damon_start() call triggered by the third command fails: the new DAMON
instance cannot run exclusively. Without this fix, repeating the third
and the fourth commands shows the memory consumption only increasing
due to the leaks. Reproducing it requires sudo permission, though.
The issue was discovered [1] by Sashiko.
[1] https://lore.kernel.org/20260609145814.70163-1-sj@kernel.org
Fixes: b757c6cfc696 ("samples/damon/wsse: start and stop DAMON as the user requests")
Cc: <stable@vger.kernel.org> # 6.14.x
Signed-off-by: SeongJae Park <sj@kernel.org>
---
samples/damon/wsse.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/samples/damon/wsse.c b/samples/damon/wsse.c
index 799ad44439434..bbd9392ab5b36 100644
--- a/samples/damon/wsse.c
+++ b/samples/damon/wsse.c
@@ -87,8 +87,10 @@ static int damon_sample_wsse_start(void)
target->pid = target_pidp;
err = damon_start(&ctx, 1, true);
- if (err)
+ if (err) {
+ damon_destroy_ctx(ctx);
return err;
+ }
repeat_call_control.data = ctx;
return damon_call(ctx, &repeat_call_control);
}
--
2.47.3
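The error path fixed by the patch above can be modeled in userspace. The following is only a minimal sketch, not kernel code: struct damon_ctx, damon_new_ctx(), damon_destroy_ctx() and damon_start() are hypothetical stand-ins that share names with the DAMON API, and the live_ctxs counter, the other_instance_running flag and the with_fix switch are illustration-only assumptions. The -EBUSY return for a refused exclusive start is likewise an assumption of the model.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins for the DAMON API; not kernel code. */
struct damon_ctx {
	bool running;
};

static int live_ctxs;	/* contexts allocated and not yet freed */
static bool other_instance_running = true;	/* models an earlier `damo start` */

static struct damon_ctx *damon_new_ctx(void)
{
	struct damon_ctx *ctx = calloc(1, sizeof(*ctx));

	if (ctx)
		live_ctxs++;
	return ctx;
}

static void damon_destroy_ctx(struct damon_ctx *ctx)
{
	assert(ctx && !ctx->running);
	free(ctx);
	live_ctxs--;
}

/* Assumed behavior: an exclusive start fails while another instance runs. */
static int damon_start(struct damon_ctx **ctxs, int nr_ctxs, bool exclusive)
{
	if (exclusive && other_instance_running)
		return -EBUSY;
	for (int i = 0; i < nr_ctxs; i++)
		ctxs[i]->running = true;
	return 0;
}

static struct damon_ctx *ctx;

/* Model of the start path; 'with_fix' selects the patched error handling. */
static int sample_start(bool with_fix)
{
	int err;

	ctx = damon_new_ctx();
	if (!ctx)
		return -ENOMEM;
	err = damon_start(&ctx, 1, true);
	if (err) {
		if (with_fix)
			damon_destroy_ctx(ctx);	/* the added cleanup */
		return err;
	}
	return 0;
}
```

In this model, each sample_start(false) call that fails leaves live_ctxs one higher, which mirrors the growing damon_new_ctx count in /proc/allocinfo, while sample_start(true) leaves it unchanged.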
* [RFC PATCH v4 2/6] samples/damon/prcl: handle damon_start() failure
From: SeongJae Park @ 2026-06-10 13:55 UTC (permalink / raw)
Cc: SeongJae Park, # 6 . 14 . x, Andrew Morton, damon, linux-kernel,
linux-mm
Callers of damon_sample_prcl_start() assume it cleans up its resources
when it fails. The function does so for context buildup failures, but
not for damon_start() failure. As a result, when damon_start() fails,
the memory for the DAMON context is leaked. Free the context on that
failure path to fix the issue.
Note that the issue can reliably be reproduced because the module calls
damon_start() in the exclusive mode. For example,
$ sudo damo start
$ echo $$ | sudo tee /sys/module/damon_sample_prcl/parameters/target_pid
$ echo Y | sudo tee /sys/module/damon_sample_prcl/parameters/enabled
$ sudo cat /proc/allocinfo | grep damon_new_ctx
Because the first command starts another DAMON instance, the
damon_start() call triggered by the third command fails: the new DAMON
instance cannot run exclusively. Without this fix, repeating the third
and the fourth commands shows the memory consumption only increasing
due to the leaks. Reproducing it requires sudo permission, though.
The issue was discovered [1] by Sashiko.
[1] https://lore.kernel.org/20260609145814.70163-1-sj@kernel.org
Fixes: 2aca254620a8 ("samples/damon: introduce a skeleton of a smaple DAMON module for proactive reclamation")
Cc: <stable@vger.kernel.org> # 6.14.x
Signed-off-by: SeongJae Park <sj@kernel.org>
---
samples/damon/prcl.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/samples/damon/prcl.c b/samples/damon/prcl.c
index b7c50f2656ce7..0db2598946911 100644
--- a/samples/damon/prcl.c
+++ b/samples/damon/prcl.c
@@ -106,8 +106,10 @@ static int damon_sample_prcl_start(void)
damon_set_schemes(ctx, &scheme, 1);
err = damon_start(&ctx, 1, true);
- if (err)
+ if (err) {
+ damon_destroy_ctx(ctx);
return err;
+ }
repeat_call_control.data = ctx;
return damon_call(ctx, &repeat_call_control);
--
2.47.3
* [RFC PATCH v4 3/6] samples/damon/mtier: handle damon_start() failure
From: SeongJae Park @ 2026-06-10 13:55 UTC (permalink / raw)
Cc: SeongJae Park, # 6 . 16 . x, Andrew Morton, damon, linux-kernel,
linux-mm
Callers of damon_sample_mtier_start() assume it cleans up its resources
when it fails. The function does so for context buildup failures, but
not for damon_start() failure.
As a result, when damon_start() fails, the memory for the DAMON contexts
can be leaked. Also, if damon_start() fails only for the second context,
the first context keeps running indefinitely and, because it runs in the
exclusive mode, prevents other DAMON contexts from starting. Stop the
possibly started DAMON context and free both contexts on the failure
path to fix the issues.
Note that the issue can reliably be reproduced because the module calls
damon_start() in the exclusive mode. For example,
$ sudo damo start
$ echo Y | sudo tee /sys/module/damon_sample_mtier/parameters/enabled
$ sudo cat /proc/allocinfo | grep damon_new_ctx
Because the first command starts another DAMON instance, the
damon_start() call triggered by the second command fails: the new DAMON
instance cannot run exclusively. Without this fix, repeating the second
and the third commands shows the memory consumption only increasing due
to the leaks. Reproducing it requires sudo permission, though.
The issue was discovered [1] by Sashiko.
[1] https://lore.kernel.org/20260608112455.274231F00893@smtp.kernel.org
Fixes: 82a08bde3cf7 ("samples/damon: implement a DAMON module for memory tiering")
Cc: <stable@vger.kernel.org> # 6.16.x
Signed-off-by: SeongJae Park <sj@kernel.org>
---
samples/damon/mtier.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/samples/damon/mtier.c b/samples/damon/mtier.c
index eb1143de8df17..66b591f2180fa 100644
--- a/samples/damon/mtier.c
+++ b/samples/damon/mtier.c
@@ -174,6 +174,7 @@ static struct damon_ctx *damon_sample_mtier_build_ctx(bool promote)
static int damon_sample_mtier_start(void)
{
struct damon_ctx *ctx;
+ int err;
ctx = damon_sample_mtier_build_ctx(true);
if (!ctx)
@@ -185,7 +186,15 @@ static int damon_sample_mtier_start(void)
return -ENOMEM;
}
ctxs[1] = ctx;
- return damon_start(ctxs, 2, true);
+ err = damon_start(ctxs, 2, true);
+ if (!err)
+ return 0;
+
+ if (damon_is_running(ctxs[0]))
+ damon_stop(ctxs, 1);
+ damon_destroy_ctx(ctxs[0]);
+ damon_destroy_ctx(ctxs[1]);
+ return err;
}
static void damon_sample_mtier_stop(void)
--
2.47.3
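The partial-start case handled by the patch above can be sketched in userspace as follows. This is a simplified model under stated assumptions, not the kernel implementation: all functions are hypothetical stand-ins named after the DAMON API, fail_at is an illustration-only fault-injection knob, and nr_freed_while_running is a made-up counter that flags a context being freed while it is still marked running.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins for the DAMON API; not kernel code. */
struct damon_ctx {
	bool running;
};

static int live_ctxs;			/* allocated and not yet freed */
static int nr_freed_while_running;	/* would be use-after-free risks */
static int fail_at = -1;		/* index whose start fails; -1: none */

static struct damon_ctx *damon_new_ctx(void)
{
	struct damon_ctx *ctx = calloc(1, sizeof(*ctx));

	if (ctx)
		live_ctxs++;
	return ctx;
}

static void damon_destroy_ctx(struct damon_ctx *ctx)
{
	assert(ctx);
	if (ctx->running)
		nr_freed_while_running++;
	free(ctx);
	live_ctxs--;
}

static bool damon_is_running(struct damon_ctx *ctx)
{
	return ctx->running;
}

/* Assumed behavior: contexts started before the failing one keep running. */
static int damon_start(struct damon_ctx **ctxs, int nr_ctxs, bool exclusive)
{
	(void)exclusive;
	for (int i = 0; i < nr_ctxs; i++) {
		if (i == fail_at)
			return -ENOMEM;
		ctxs[i]->running = true;
	}
	return 0;
}

static int damon_stop(struct damon_ctx **ctxs, int nr_ctxs)
{
	for (int i = 0; i < nr_ctxs; i++) {
		if (!ctxs[i]->running)
			return -EPERM;
		ctxs[i]->running = false;
	}
	return 0;
}

static struct damon_ctx *ctxs[2];

/* Mirrors the patched start path: stop what started, then free both. */
static int mtier_start(void)
{
	int err;

	ctxs[0] = damon_new_ctx();
	if (!ctxs[0])
		return -ENOMEM;
	ctxs[1] = damon_new_ctx();
	if (!ctxs[1]) {
		damon_destroy_ctx(ctxs[0]);
		return -ENOMEM;
	}
	err = damon_start(ctxs, 2, true);
	if (!err)
		return 0;

	if (damon_is_running(ctxs[0]))
		damon_stop(ctxs, 1);
	damon_destroy_ctx(ctxs[0]);
	damon_destroy_ctx(ctxs[1]);
	return err;
}
```

Setting fail_at to 1 models the case where only the second context fails to start; fail_at of 0 models a failure before anything runs. In both cases the model ends with no live and no running contexts.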
* [RFC PATCH v4 4/6] samples/damon/mtier: handle damon_stop() failure
From: SeongJae Park @ 2026-06-10 13:55 UTC (permalink / raw)
Cc: SeongJae Park, # 6 . 16 . x, Andrew Morton, damon, linux-kernel,
linux-mm
damon_sample_mtier_stop() assumes its damon_stop() call always
successfully stops the two DAMON contexts. Hence it deallocates the two
DAMON contexts after the damon_stop() call. However, if a given context
is already stopped, damon_stop() fails and returns an error while
leaving the DAMON contexts that have not yet stopped running. Such
unexpected early stops of a DAMON context can happen due to memory
allocation failures in kdamond_fn(). Because damon_sample_mtier_stop()
then deallocates all DAMON contexts, together with the damon_target and
damon_region objects linked to them, the still-running DAMON context
(kdamond) ends up using memory that was freed (use-after-free). Fix the
issue by invoking damon_stop() separately for each context.
Note that DAMON_SYSFS also allows running multiple DAMON contexts.
However, it calls damon_stop() for each context one by one. Hence this
issue exists only in mtier.
In the long term, it would be better to refactor damon_stop() so that it
always stops all contexts regardless of failures in the middle. Keep
this fix in its current form, though, to keep it simple and easy to
backport. I will do the refactoring later.
The issue was discovered [1] by Sashiko.
[1] https://lore.kernel.org/20260609014219.3013-1-sj@kernel.org
Fixes: 82a08bde3cf7 ("samples/damon: implement a DAMON module for memory tiering")
Cc: <stable@vger.kernel.org> # 6.16.x
Signed-off-by: SeongJae Park <sj@kernel.org>
---
samples/damon/mtier.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/samples/damon/mtier.c b/samples/damon/mtier.c
index 66b591f2180fa..faaaaa12e6206 100644
--- a/samples/damon/mtier.c
+++ b/samples/damon/mtier.c
@@ -199,7 +199,8 @@ static int damon_sample_mtier_start(void)
static void damon_sample_mtier_stop(void)
{
- damon_stop(ctxs, 2);
+ damon_stop(ctxs, 1);
+ damon_stop(&ctxs[1], 1);
damon_destroy_ctx(ctxs[0]);
damon_destroy_ctx(ctxs[1]);
}
--
2.47.3
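The difference between one batched damon_stop() call and per-context calls, as described in the patch above, can be illustrated with a small userspace model. This is a sketch under the assumption that damon_stop() returns at the first context that is already stopped. The damon_stop() below is a hypothetical stand-in, and count_running() and still_running_after_stop() are illustration-only helpers.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a DAMON context; not kernel code. */
struct damon_ctx {
	bool running;
};

/* Assumed behavior: bail out at the first context that is not running. */
static int damon_stop(struct damon_ctx **ctxs, int nr_ctxs)
{
	assert(ctxs);
	for (int i = 0; i < nr_ctxs; i++) {
		if (!ctxs[i]->running)
			return -EPERM;	/* later contexts keep running */
		ctxs[i]->running = false;
	}
	return 0;
}

static int count_running(struct damon_ctx **ctxs, int nr_ctxs)
{
	int nr = 0;

	for (int i = 0; i < nr_ctxs; i++)
		nr += ctxs[i]->running;
	return nr;
}

/*
 * Scenario from the commit message: the first context stopped early by
 * itself, the second one is still running.  Returns how many contexts
 * are still running after the stop attempt.
 */
static int still_running_after_stop(bool per_ctx)
{
	struct damon_ctx first = { .running = false };
	struct damon_ctx second = { .running = true };
	struct damon_ctx *ctxs[2] = { &first, &second };

	if (per_ctx) {
		damon_stop(ctxs, 1);		/* fails, harmlessly */
		damon_stop(&ctxs[1], 1);	/* still stops the second one */
	} else {
		damon_stop(ctxs, 2);		/* fails before reaching it */
	}
	return count_running(ctxs, 2);
}
```

In this model the batched form leaves one context running, which is the context that would later touch freed memory, while the per-context form leaves none.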
* [RFC PATCH v4 5/6] samples/damon/wsse: stop and free damon ctx when damon_call() fails
From: SeongJae Park @ 2026-06-10 13:55 UTC (permalink / raw)
Cc: SeongJae Park, # 6 . 17 . x, Andrew Morton, damon, linux-kernel,
linux-mm
damon_sample_wsse_start() calls damon_call() right after damon_start()
succeeds. The kdamond that was started by damon_start() could terminate
by itself before or in the middle of the damon_call() execution. There
are multiple possible reasons for such a stop, including termination of
the monitoring target process and memory allocation failures inside
kdamond_fn(). In that case, damon_call() fails and returns an error
without cleaning up the DAMON context object. The caller of
damon_sample_wsse_start() assumes the function cleans up the object,
though. When the user requests to start DAMON again,
damon_sample_wsse_start() is called again, allocates a new DAMON context
object, and overwrites the pointer to the previous object. As a result,
the previous context object is leaked.
Safely stop the kdamond and deallocate the context object when the
failure is returned. Note that the kdamond should be stopped first,
because a damon_call() failure does not mean that the kdamond has
completely terminated, only that its termination has started.
The user impact should not be significant, because the race is hard to
trigger and at most one DAMON context object is leaked per race.
The issue was discovered [1] by Sashiko.
[1] https://lore.kernel.org/20260610034828.4632-1-sj@kernel.org
Fixes: cc9c1b8c205b ("samples/damon/wsse: use damon_call() repeat mode instead of damon_callback")
Cc: <stable@vger.kernel.org> # 6.17.x
Signed-off-by: SeongJae Park <sj@kernel.org>
---
samples/damon/wsse.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/samples/damon/wsse.c b/samples/damon/wsse.c
index bbd9392ab5b36..ff5e8a890f448 100644
--- a/samples/damon/wsse.c
+++ b/samples/damon/wsse.c
@@ -92,7 +92,12 @@ static int damon_sample_wsse_start(void)
return err;
}
repeat_call_control.data = ctx;
- return damon_call(ctx, &repeat_call_control);
+ err = damon_call(ctx, &repeat_call_control);
+ if (err) {
+ damon_stop(&ctx, 1);
+ damon_destroy_ctx(ctx);
+ }
+ return err;
}
static void damon_sample_wsse_stop(void)
--
2.47.3
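The stop-then-free ordering required by the patch above can be sketched in userspace. This is a simplified, synchronous model under stated assumptions, not the kernel implementation: the functions are hypothetical stand-ins named after the DAMON API, the kdamond_alive and terminating fields are invented for the model, the -ECANCELED return value of the stand-in damon_call() is an assumption, and nr_freed_while_alive is a made-up counter that flags a context freed while its worker could still use it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins for the DAMON API; not kernel code. */
struct damon_ctx {
	bool kdamond_alive;	/* worker may still dereference the context */
	bool terminating;	/* worker has begun to exit on its own */
};

static int live_ctxs;			/* allocated and not yet freed */
static int nr_freed_while_alive;	/* would be use-after-free risks */

static struct damon_ctx *damon_new_ctx(void)
{
	struct damon_ctx *ctx = calloc(1, sizeof(*ctx));

	if (ctx)
		live_ctxs++;
	return ctx;
}

static void damon_destroy_ctx(struct damon_ctx *ctx)
{
	assert(ctx);
	if (ctx->kdamond_alive)
		nr_freed_while_alive++;
	free(ctx);
	live_ctxs--;
}

static int damon_start(struct damon_ctx **ctxs, int nr_ctxs, bool exclusive)
{
	(void)exclusive;
	for (int i = 0; i < nr_ctxs; i++)
		ctxs[i]->kdamond_alive = true;
	return 0;
}

/* Synchronous stand-in: returns only once the worker is gone. */
static int damon_stop(struct damon_ctx **ctxs, int nr_ctxs)
{
	for (int i = 0; i < nr_ctxs; i++) {
		if (!ctxs[i]->kdamond_alive)
			return -EPERM;
		ctxs[i]->kdamond_alive = false;
	}
	return 0;
}

/* Assumed behavior: fails once the worker has started terminating. */
static int damon_call(struct damon_ctx *ctx)
{
	return ctx->terminating ? -ECANCELED : 0;
}

static struct damon_ctx *ctx;

/* Mirrors the patched start path, with the race injected by a flag. */
static int sample_start(bool worker_self_terminates)
{
	int err;

	ctx = damon_new_ctx();
	if (!ctx)
		return -ENOMEM;
	err = damon_start(&ctx, 1, true);
	if (err) {
		damon_destroy_ctx(ctx);
		return err;
	}
	ctx->terminating = worker_self_terminates;	/* inject the race */
	err = damon_call(ctx);
	if (err) {
		damon_stop(&ctx, 1);	/* wait for the worker first */
		damon_destroy_ctx(ctx);	/* only then free the context */
	}
	return err;
}
```

In this model, swapping the two calls in the error branch would bump nr_freed_while_alive, which stands for the worker touching freed memory. Dropping both would leave live_ctxs one higher per failed start, which is the leak described in the commit message.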
* [RFC PATCH v4 6/6] samples/damon/prcl: stop and free damon ctx when damon_call() fails
From: SeongJae Park @ 2026-06-10 13:55 UTC (permalink / raw)
Cc: SeongJae Park, # 6 . 17 . x, Andrew Morton, damon, linux-kernel,
linux-mm
damon_sample_prcl_start() calls damon_call() right after damon_start()
succeeds. The kdamond that was started by damon_start() could terminate
by itself before or in the middle of the damon_call() execution. There
are multiple possible reasons for such a stop, including termination of
the monitoring target process and memory allocation failures inside
kdamond_fn(). In that case, damon_call() fails and returns an error
without cleaning up the DAMON context object. The caller of
damon_sample_prcl_start() assumes the function cleans up the object,
though. When the user requests to start DAMON again,
damon_sample_prcl_start() is called again, allocates a new DAMON context
object, and overwrites the pointer to the previous object. As a result,
the previous context object is leaked.
Safely stop the kdamond and deallocate the context object when the
failure is returned. Note that the kdamond should be stopped first,
because a damon_call() failure does not mean that the kdamond has
completely terminated, only that its termination has started.
The user impact should not be significant, because the race is hard to
trigger and at most one DAMON context object is leaked per race.
The issue was discovered [1] by Sashiko.
[1] https://lore.kernel.org/20260610035214.4850-1-sj@kernel.org
Fixes: a6c33f1054e3 ("samples/damon/prcl: use damon_call() repeat mode instead of damon_callback")
Cc: <stable@vger.kernel.org> # 6.17.x
Signed-off-by: SeongJae Park <sj@kernel.org>
---
samples/damon/prcl.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/samples/damon/prcl.c b/samples/damon/prcl.c
index 0db2598946911..edeae145c4a8a 100644
--- a/samples/damon/prcl.c
+++ b/samples/damon/prcl.c
@@ -112,7 +112,12 @@ static int damon_sample_prcl_start(void)
}
repeat_call_control.data = ctx;
- return damon_call(ctx, &repeat_call_control);
+ err = damon_call(ctx, &repeat_call_control);
+ if (err) {
+ damon_stop(&ctx, 1);
+ damon_destroy_ctx(ctx);
+ }
+ return err;
}
static void damon_sample_prcl_stop(void)
--
2.47.3