public inbox for linux-mm@kvack.org
* [RFC PATCH v4 00/10] mm/damon: let DAMON be paused and resumed
@ 2026-03-22 15:57 SeongJae Park
  2026-03-22 15:57 ` [RFC PATCH v4 01/10] mm/damon/core: introduce damon_ctx->paused SeongJae Park
                   ` (10 more replies)
  0 siblings, 11 replies; 18+ messages in thread
From: SeongJae Park @ 2026-03-22 15:57 UTC (permalink / raw)
  Cc: SeongJae Park, Liam R. Howlett, Andrew Morton, Brendan Higgins,
	David Gow, David Hildenbrand, Jonathan Corbet, Lorenzo Stoakes,
	Michal Hocko, Mike Rapoport, Shuah Khan, Shuah Khan,
	Suren Baghdasaryan, Vlastimil Babka, damon, kunit-dev, linux-doc,
	linux-kernel, linux-kselftest, linux-mm

DAMON uses a few mechanisms that improve its output over time.
Self-training mechanisms such as adaptive regions adjustment, goal-based
DAMOS quota auto-tuning, and monitoring intervals auto-tuning are such
examples.  DAMON also adds access frequency stability information (age)
to the monitoring results, which likewise becomes richer over time.

Sometimes users have to stop DAMON.  In this case, the DAMON internal
state that improved over the course of the last execution simply goes
away.  A restarted DAMON has to train itself and improve its output from
scratch.  This makes DAMON less useful in such cases.  Three such use
cases are introduced below.

Investigation of DAMON.  It is best to do the investigation online,
especially in a production environment.  DAMON therefore provides
features for such online investigations, including DAMOS stats,
monitoring result snapshot exposure, and multiple tracepoints.  When
those are insufficient, and there are additional clues that a running
DAMON could interfere with, users have to temporarily stop DAMON to
collect the additional clues.  That is not very useful, since many of
DAMON's internal clues are gone once DAMON is stopped.  The loss of the
monitoring results that improved over time is also problematic,
especially in production environments.

Monitoring of workloads that have different user-known phases.  For
example, in Android, applications are known to have very different
access patterns and behaviors depending on whether they are running in
the foreground or in the background.  It can therefore be useful to
monitor apps separately for the two cases.  Having two DAMON threads per
application that are paused and resumed on the app's
foreground/background switches can be useful for the purpose.  But such
pause/resume of the execution is not supported.

Tests of DAMON.  A few DAMON selftests use drgn to dump the internal
DAMON status.  The tests check whether the dumped status is the same as
what the test code expects.  Because DAMON keeps running and modifying
its internal status, there are chances of data races that can cause
false test results.  Stopping DAMON can avoid the race.  But, since the
internal state of DAMON is dropped, the test coverage will be limited.

Let DAMON execution be paused and resumed without loss of the internal
state, to overcome the limitations.  For this, introduce a new DAMON
context parameter, namely 'pause'.  API callers can update it while the
context is running, using the online parameters update functions
(damon_commit_ctx() and damon_call()).  Once it is set, the kdamond_fn()
main loop does only limited work, excluding the monitoring and DAMOS
work, and sleeps for one sampling interval per iteration.  The limited
work includes handling of online parameter updates, so users can unset
the 'pause' parameter again.  Once it is unset, the kdamond_fn() main
loop does all the work again (resumed).  In the paused state, it also
checks and handles the stop conditions, so that a paused DAMON can also
be stopped if needed.  Expose the feature to user space via the DAMON
sysfs interface.  Also, update existing drgn-based tests to test and use
the feature.
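
As a rough illustration of the user-space flow, the following Python
sketch stages the 'pause' value and then asks the running kdamond to
apply it.  It assumes the sysfs path documented in the ABI patch of this
series and the existing 'commit' keyword of the kdamond 'state' file.
The set_pause() helper and its 'root' parameter are hypothetical names
used only for this example, not part of the series.

```python
import os

DAMON_SYSFS = '/sys/kernel/mm/damon/admin'


def set_pause(pause, kdamond_idx=0, ctx_idx=0, root=DAMON_SYSFS):
    """Stage the 'pause' value, then ask the running kdamond to commit it.

    'root' is a hypothetical parameter so the helper can be pointed at a
    scratch directory instead of the real sysfs tree.
    """
    kdamond_dir = os.path.join(root, 'kdamonds', str(kdamond_idx))
    pause_file = os.path.join(
            kdamond_dir, 'contexts', str(ctx_idx), 'pause')
    # Writing the pause file only updates the value staged in sysfs.
    with open(pause_file, 'w') as f:
        f.write('Y' if pause else 'N')
    # Writing 'commit' to the state file applies the staged inputs online.
    with open(os.path.join(kdamond_dir, 'state'), 'w') as f:
        f.write('commit')
```

Calling set_pause(True) and later set_pause(False) would pause and
resume the first context of the first kdamond, assuming the kdamond is
running and the caller has the required permission.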

Tests
=====

I confirmed the feature functionality using real time tracing ('perf
trace' or 'trace-cmd stream') of the damon:damon_aggregated DAMON
tracepoint.  By pausing and resuming the DAMON execution, I was able to
see the trace stop and continue as expected.  Note that the pause
feature support is added to the DAMON user-space tool (damo) after
v3.1.9.  Users can use the '--pause_ctx' command line option of damo for
that, and I actually used it for my test.  The extended drgn-based
selftests also test a part of the functionality.

Patches Sequence
================

Patch 1 introduces the new core API for the pause feature.  Patch 2
extends the DAMON sysfs interface for the new parameter.  Patches 3-5
update the design, usage and ABI documents for the new sysfs file,
respectively.  The following five patches are for tests.  Patch 6
implements a new kunit test for the pause parameter online commitment.
Patches 7 and 8 extend DAMON selftest helpers to support the new
feature.  Patch 9 extends a selftest to test the commitment of the
feature.  Finally, patch 10 updates an existing selftest to be safe from
the race condition, using the pause/resume feature.

Changelog
=========

Changes from v1 (or, RFC v3)
(https://lore.kernel.org/20260321181343.93971-1-sj@kernel.org)
- Add RFC tag again.
- Handle maybe_corrupted inside pause-loop.
- Reduce unnecessary commits in sysfs.py selftest.
Changes from RFC v2
(https://lore.kernel.org/20260319052157.99433-1-sj@kernel.org)
- Move damon_ctx->pause to public fields section.
- Wordsmith design doc change.
- Fix unintended resume of contexts in multiple contexts use case.
- Rebase to latest mm-new.
Changes from RFC v1
(https://lore.kernel.org/20260315210012.94846-1-sj@kernel.org)
- Continuously cancel new damos_walk() requests when paused.
- Initialize damon_sysfs_context->pause.
- Make sysfs.py dump-purpose pausing to work for all contexts.

SeongJae Park (10):
  mm/damon/core: introduce damon_ctx->paused
  mm/damon/sysfs: add pause file under context dir
  Docs/mm/damon/design: update for context pause/resume feature
  Docs/admin-guide/mm/damon/usage: update for pause file
  Docs/ABI/damon: update for pause sysfs file
  mm/damon/tests/core-kunit: test pause commitment
  selftests/damon/_damon_sysfs: support pause file staging
  selftests/damon/drgn_dump_damon_status: dump pause
  selftests/damon/sysfs.py: check pause on assert_ctx_committed()
  selftets/damon/sysfs.py: pause DAMON before dumping status

 .../ABI/testing/sysfs-kernel-mm-damon         |  7 ++++
 Documentation/admin-guide/mm/damon/usage.rst  | 12 ++++--
 Documentation/mm/damon/design.rst             |  7 ++++
 include/linux/damon.h                         |  2 +
 mm/damon/core.c                               |  9 +++++
 mm/damon/sysfs.c                              | 31 +++++++++++++++
 mm/damon/tests/core-kunit.h                   |  4 ++
 tools/testing/selftests/damon/_damon_sysfs.py | 10 ++++-
 .../selftests/damon/drgn_dump_damon_status.py |  1 +
 tools/testing/selftests/damon/sysfs.py        | 39 +++++++++++++++++++
 10 files changed, 117 insertions(+), 5 deletions(-)


base-commit: 73b971e012fbe1b2e8cd4992602898d5c9633ca4
-- 
2.47.3



* [RFC PATCH v4 01/10] mm/damon/core: introduce damon_ctx->paused
  2026-03-22 15:57 [RFC PATCH v4 00/10] mm/damon: let DAMON be paused and resumed SeongJae Park
@ 2026-03-22 15:57 ` SeongJae Park
  2026-03-22 17:06   ` (sashiko review) " SeongJae Park
  2026-03-22 15:57 ` [RFC PATCH v4 02/10] mm/damon/sysfs: add pause file under context dir SeongJae Park
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 18+ messages in thread
From: SeongJae Park @ 2026-03-22 15:57 UTC (permalink / raw)
  Cc: SeongJae Park, Andrew Morton, damon, linux-kernel, linux-mm

DAMON supports only start and stop of the execution.  When it is
stopped, the internal data that it self-trained goes away.  It would be
useful if the execution could be paused and resumed with the previous
self-trained data.

Introduce a per-context API parameter, 'pause', for the purpose.  The
parameter can be set and unset while DAMON is running or paused, using
the online parameters commit helper functions (damon_commit_ctx() and
damon_call()).  Once 'pause' is set, the kdamond_fn() main loop does
only limited work, sleeping for one sampling interval per iteration.
The limited work includes the handling of online parameter updates, so
that users can unset 'pause' and resume the execution when they want.
It also keeps checking and handling the DAMON stop conditions, so that
DAMON can be stopped while paused if needed.
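
The intended behavior can be sketched with a toy Python model.  This is
only an illustration of the semantics described above, not the kernel
code: kdamond_model() and its 'ticks' argument are hypothetical, each
tick stands for one sampling interval, and the dict applied on a tick
stands in for an online parameter update handled by kdamond_call().

```python
def kdamond_model(ticks):
    """Toy model of the main loop: one list entry per sampling interval.

    Each entry is a dict of context updates (possibly empty) that is
    applied at the start of the interval.  Returns what the loop did on
    each interval.
    """
    ctx = {'pause': False, 'stop': False}
    log = []
    for update in ticks:
        # Online parameter updates are handled even while paused.
        ctx.update(update)
        # Stop conditions are checked even while paused.
        if ctx['stop']:
            log.append('stop')
            break
        if ctx['pause']:
            # Paused: skip monitoring and DAMOS work, only sleep.
            log.append('sleep')
            continue
        log.append('monitor+damos')
    return log
```

In this model, a tick that sets 'pause' switches the loop to sleeping
only, a later tick that clears it resumes the full work, and a tick that
sets 'stop' ends the loop regardless of the pause state.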

Signed-off-by: SeongJae Park <sj@kernel.org>
---
 include/linux/damon.h | 2 ++
 mm/damon/core.c       | 9 +++++++++
 2 files changed, 11 insertions(+)

diff --git a/include/linux/damon.h b/include/linux/damon.h
index d9a3babbafc16..ea1649a09395d 100644
--- a/include/linux/damon.h
+++ b/include/linux/damon.h
@@ -787,6 +787,7 @@ struct damon_attrs {
  * @ops:	Set of monitoring operations for given use cases.
  * @addr_unit:	Scale factor for core to ops address conversion.
  * @min_region_sz:	Minimum region size.
+ * @pause:	Pause kdamond main loop.
  * @adaptive_targets:	Head of monitoring targets (&damon_target) list.
  * @schemes:		Head of schemes (&damos) list.
  */
@@ -838,6 +839,7 @@ struct damon_ctx {
 	struct damon_operations ops;
 	unsigned long addr_unit;
 	unsigned long min_region_sz;
+	bool pause;
 
 	struct list_head adaptive_targets;
 	struct list_head schemes;
diff --git a/mm/damon/core.c b/mm/damon/core.c
index db6c67e52d2b8..0ab2cfa848e69 100644
--- a/mm/damon/core.c
+++ b/mm/damon/core.c
@@ -1349,6 +1349,7 @@ int damon_commit_ctx(struct damon_ctx *dst, struct damon_ctx *src)
 		if (err)
 			return err;
 	}
+	dst->pause = src->pause;
 	dst->ops = src->ops;
 	dst->addr_unit = src->addr_unit;
 	dst->min_region_sz = src->min_region_sz;
@@ -3003,6 +3004,14 @@ static int kdamond_fn(void *data)
 		kdamond_call(ctx, false);
 		if (ctx->maybe_corrupted)
 			break;
+		while (ctx->pause) {
+			damos_walk_cancel(ctx);
+			kdamond_usleep(ctx->attrs.sample_interval);
+			/* allow caller unset pause via damon_call() */
+			kdamond_call(ctx, false);
+			if (kdamond_need_stop(ctx) || ctx->maybe_corrupted)
+				goto done;
+		}
 		if (!list_empty(&ctx->schemes))
 			kdamond_apply_schemes(ctx);
 		else
-- 
2.47.3



* [RFC PATCH v4 02/10] mm/damon/sysfs: add pause file under context dir
  2026-03-22 15:57 [RFC PATCH v4 00/10] mm/damon: let DAMON be paused and resumed SeongJae Park
  2026-03-22 15:57 ` [RFC PATCH v4 01/10] mm/damon/core: introduce damon_ctx->paused SeongJae Park
@ 2026-03-22 15:57 ` SeongJae Park
  2026-03-22 15:57 ` [RFC PATCH v4 03/10] Docs/mm/damon/design: update for context pause/resume feature SeongJae Park
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 18+ messages in thread
From: SeongJae Park @ 2026-03-22 15:57 UTC (permalink / raw)
  Cc: SeongJae Park, Andrew Morton, damon, linux-kernel, linux-mm

Add a 'pause' DAMON sysfs file under the context directory.  It exposes
the damon_ctx->pause API parameter to users so that they can use the
pause/resume feature.
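
For readers unfamiliar with the boolean keywords the file accepts, the
following Python sketch approximates the parsing and formatting.  It
assumes kstrtobool() decides by the first one or two characters of the
input, as I understand its current behavior.  parse_bool_keyword() and
format_pause() are hypothetical names for illustration only.

```python
import errno


def parse_bool_keyword(text):
    """Approximate kstrtobool(): look at the first one or two characters."""
    if text:
        first = text[0]
        if first in 'yYtT1':
            return True
        if first in 'nNfF0':
            return False
        if first in 'oO' and len(text) > 1:
            # 'on' / 'off' style keywords are told apart by the 2nd char.
            if text[1] in 'nN':
                return True
            if text[1] in 'fF':
                return False
    # The kernel helper returns -EINVAL for anything else.
    raise ValueError(errno.EINVAL)


def format_pause(pause):
    """Mirror what reading the file returns: 'Y' or 'N' plus a newline."""
    return '%s\n' % ('Y' if pause else 'N')
```

Under this approximation, writing Python's str(True) or str(False)
('True' or 'False') is also accepted, because only the leading character
matters.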

Signed-off-by: SeongJae Park <sj@kernel.org>
---
 mm/damon/sysfs.c | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/mm/damon/sysfs.c b/mm/damon/sysfs.c
index 6a44a2f3d8fc9..51893abd09472 100644
--- a/mm/damon/sysfs.c
+++ b/mm/damon/sysfs.c
@@ -866,6 +866,7 @@ struct damon_sysfs_context {
 	struct damon_sysfs_attrs *attrs;
 	struct damon_sysfs_targets *targets;
 	struct damon_sysfs_schemes *schemes;
+	bool pause;
 };
 
 static struct damon_sysfs_context *damon_sysfs_context_alloc(
@@ -878,6 +879,7 @@ static struct damon_sysfs_context *damon_sysfs_context_alloc(
 	context->kobj = (struct kobject){};
 	context->ops_id = ops_id;
 	context->addr_unit = 1;
+	context->pause = false;
 	return context;
 }
 
@@ -1053,6 +1055,30 @@ static ssize_t addr_unit_store(struct kobject *kobj,
 	return count;
 }
 
+static ssize_t pause_show(struct kobject *kobj, struct kobj_attribute *attr,
+		char *buf)
+{
+	struct damon_sysfs_context *context = container_of(kobj,
+			struct damon_sysfs_context, kobj);
+
+	return sysfs_emit(buf, "%c\n", context->pause ? 'Y' : 'N');
+}
+
+static ssize_t pause_store(struct kobject *kobj, struct kobj_attribute *attr,
+		const char *buf, size_t count)
+{
+	struct damon_sysfs_context *context = container_of(kobj,
+			struct damon_sysfs_context, kobj);
+	bool pause;
+	int err = kstrtobool(buf, &pause);
+
+	if (err)
+		return err;
+	context->pause = pause;
+	return count;
+}
+
+
 static void damon_sysfs_context_release(struct kobject *kobj)
 {
 	kfree(container_of(kobj, struct damon_sysfs_context, kobj));
@@ -1067,10 +1093,14 @@ static struct kobj_attribute damon_sysfs_context_operations_attr =
 static struct kobj_attribute damon_sysfs_context_addr_unit_attr =
 		__ATTR_RW_MODE(addr_unit, 0600);
 
+static struct kobj_attribute damon_sysfs_context_pause_attr =
+		__ATTR_RW_MODE(pause, 0600);
+
 static struct attribute *damon_sysfs_context_attrs[] = {
 	&damon_sysfs_context_avail_operations_attr.attr,
 	&damon_sysfs_context_operations_attr.attr,
 	&damon_sysfs_context_addr_unit_attr.attr,
+	&damon_sysfs_context_pause_attr.attr,
 	NULL,
 };
 ATTRIBUTE_GROUPS(damon_sysfs_context);
@@ -1470,6 +1500,7 @@ static int damon_sysfs_apply_inputs(struct damon_ctx *ctx,
 	if (sys_ctx->ops_id == DAMON_OPS_PADDR)
 		ctx->min_region_sz = max(
 				DAMON_MIN_REGION_SZ / sys_ctx->addr_unit, 1);
+	ctx->pause = sys_ctx->pause;
 	err = damon_sysfs_set_attrs(ctx, sys_ctx->attrs);
 	if (err)
 		return err;
-- 
2.47.3



* [RFC PATCH v4 03/10] Docs/mm/damon/design: update for context pause/resume feature
  2026-03-22 15:57 [RFC PATCH v4 00/10] mm/damon: let DAMON be paused and resumed SeongJae Park
  2026-03-22 15:57 ` [RFC PATCH v4 01/10] mm/damon/core: introduce damon_ctx->paused SeongJae Park
  2026-03-22 15:57 ` [RFC PATCH v4 02/10] mm/damon/sysfs: add pause file under context dir SeongJae Park
@ 2026-03-22 15:57 ` SeongJae Park
  2026-03-22 15:57 ` [RFC PATCH v4 04/10] Docs/admin-guide/mm/damon/usage: update for pause file SeongJae Park
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 18+ messages in thread
From: SeongJae Park @ 2026-03-22 15:57 UTC (permalink / raw)
  Cc: SeongJae Park, Liam R. Howlett, Andrew Morton, David Hildenbrand,
	Jonathan Corbet, Lorenzo Stoakes, Michal Hocko, Mike Rapoport,
	Shuah Khan, Suren Baghdasaryan, Vlastimil Babka, damon, linux-doc,
	linux-kernel, linux-mm

Update DAMON design document for the context execution pause/resume
feature.

Signed-off-by: SeongJae Park <sj@kernel.org>
---
 Documentation/mm/damon/design.rst | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/Documentation/mm/damon/design.rst b/Documentation/mm/damon/design.rst
index afc7d52bda2f7..510ec6375178d 100644
--- a/Documentation/mm/damon/design.rst
+++ b/Documentation/mm/damon/design.rst
@@ -19,6 +19,13 @@ types of monitoring.
 To know how user-space can do the configurations and start/stop DAMON, refer to
 :ref:`DAMON sysfs interface <sysfs_interface>` documentation.
 
+Users can also request each context execution to be paused and resumed.  When
+it is paused, the kdamond does nothing other than applying online parameter
+update.
+
+To know how user-space can pause/resume each context, refer to :ref:`DAMON
+sysfs context <sysfs_context>` usage documentation.
+
 
 Overall Architecture
 ====================
-- 
2.47.3



* [RFC PATCH v4 04/10] Docs/admin-guide/mm/damon/usage: update for pause file
  2026-03-22 15:57 [RFC PATCH v4 00/10] mm/damon: let DAMON be paused and resumed SeongJae Park
                   ` (2 preceding siblings ...)
  2026-03-22 15:57 ` [RFC PATCH v4 03/10] Docs/mm/damon/design: update for context pause/resume feature SeongJae Park
@ 2026-03-22 15:57 ` SeongJae Park
  2026-03-22 15:57 ` [RFC PATCH v4 05/10] Docs/ABI/damon: update for pause sysfs file SeongJae Park
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 18+ messages in thread
From: SeongJae Park @ 2026-03-22 15:57 UTC (permalink / raw)
  Cc: SeongJae Park, Liam R. Howlett, Andrew Morton, David Hildenbrand,
	Jonathan Corbet, Lorenzo Stoakes, Michal Hocko, Mike Rapoport,
	Shuah Khan, Suren Baghdasaryan, Vlastimil Babka, damon, linux-doc,
	linux-kernel, linux-mm

Update DAMON usage document for the DAMON context execution pause/resume
feature.

Signed-off-by: SeongJae Park <sj@kernel.org>
---
 Documentation/admin-guide/mm/damon/usage.rst | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/Documentation/admin-guide/mm/damon/usage.rst b/Documentation/admin-guide/mm/damon/usage.rst
index 534e1199cf091..bfdb717441f05 100644
--- a/Documentation/admin-guide/mm/damon/usage.rst
+++ b/Documentation/admin-guide/mm/damon/usage.rst
@@ -66,7 +66,8 @@ comma (",").
     │ :ref:`kdamonds <sysfs_kdamonds>`/nr_kdamonds
     │ │ :ref:`0 <sysfs_kdamond>`/state,pid,refresh_ms
     │ │ │ :ref:`contexts <sysfs_contexts>`/nr_contexts
-    │ │ │ │ :ref:`0 <sysfs_context>`/avail_operations,operations,addr_unit
+    │ │ │ │ :ref:`0 <sysfs_context>`/avail_operations,operations,addr_unit,
+    │ │ │ │   pause
     │ │ │ │ │ :ref:`monitoring_attrs <sysfs_monitoring_attrs>`/
     │ │ │ │ │ │ intervals/sample_us,aggr_us,update_us
     │ │ │ │ │ │ │ intervals_goal/access_bp,aggrs,min_sample_us,max_sample_us
@@ -194,9 +195,9 @@ details).  At the moment, only one context per kdamond is supported, so only
 contexts/<N>/
 -------------
 
-In each context directory, three files (``avail_operations``, ``operations``
-and ``addr_unit``) and three directories (``monitoring_attrs``, ``targets``,
-and ``schemes``) exist.
+In each context directory, four files (``avail_operations``, ``operations``,
+``addr_unit`` and ``pause``) and three directories (``monitoring_attrs``,
+``targets``, and ``schemes``) exist.
 
 DAMON supports multiple types of :ref:`monitoring operations
 <damon_design_configurable_operations_set>`, including those for virtual address
@@ -214,6 +215,9 @@ reading from the ``operations`` file.
 ``addr_unit`` file is for setting and getting the :ref:`address unit
 <damon_design_addr_unit>` parameter of the operations set.
 
+``pause`` file is for setting and getting the :ref:`pause request
+<damon_design_execution_model_and_data_structures>` parameter of the context.
+
 .. _sysfs_monitoring_attrs:
 
 contexts/<N>/monitoring_attrs/
-- 
2.47.3



* [RFC PATCH v4 05/10] Docs/ABI/damon: update for pause sysfs file
  2026-03-22 15:57 [RFC PATCH v4 00/10] mm/damon: let DAMON be paused and resumed SeongJae Park
                   ` (3 preceding siblings ...)
  2026-03-22 15:57 ` [RFC PATCH v4 04/10] Docs/admin-guide/mm/damon/usage: update for pause file SeongJae Park
@ 2026-03-22 15:57 ` SeongJae Park
  2026-03-22 15:57 ` [RFC PATCH v4 06/10] mm/damon/tests/core-kunit: test pause commitment SeongJae Park
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 18+ messages in thread
From: SeongJae Park @ 2026-03-22 15:57 UTC (permalink / raw)
  Cc: SeongJae Park, damon, linux-kernel, linux-mm

Update DAMON ABI document for the DAMON context execution pause/resume
feature.

Signed-off-by: SeongJae Park <sj@kernel.org>
---
 Documentation/ABI/testing/sysfs-kernel-mm-damon | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-kernel-mm-damon b/Documentation/ABI/testing/sysfs-kernel-mm-damon
index 2424237ebb105..7059f540940f0 100644
--- a/Documentation/ABI/testing/sysfs-kernel-mm-damon
+++ b/Documentation/ABI/testing/sysfs-kernel-mm-damon
@@ -84,6 +84,13 @@ Description:	Writing an integer to this file sets the 'address unit'
 		parameter of the given operations set of the context.  Reading
 		the file returns the last-written 'address unit' value.
 
+What:		/sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/pause
+Date:		Mar 2026
+Contact:	SeongJae Park <sj@kernel.org>
+Description:	Writing a boolean keyword to this file sets the 'pause' request
+		parameter for the context.  Reading the file returns the
+		last-written 'pause' value.
+
 What:		/sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/monitoring_attrs/intervals/sample_us
 Date:		Mar 2022
 Contact:	SeongJae Park <sj@kernel.org>
-- 
2.47.3



* [RFC PATCH v4 06/10] mm/damon/tests/core-kunit: test pause commitment
  2026-03-22 15:57 [RFC PATCH v4 00/10] mm/damon: let DAMON be paused and resumed SeongJae Park
                   ` (4 preceding siblings ...)
  2026-03-22 15:57 ` [RFC PATCH v4 05/10] Docs/ABI/damon: update for pause sysfs file SeongJae Park
@ 2026-03-22 15:57 ` SeongJae Park
  2026-03-22 15:57 ` [RFC PATCH v4 07/10] selftests/damon/_damon_sysfs: support pause file staging SeongJae Park
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 18+ messages in thread
From: SeongJae Park @ 2026-03-22 15:57 UTC (permalink / raw)
  Cc: SeongJae Park, Andrew Morton, Brendan Higgins, David Gow, damon,
	kunit-dev, linux-kernel, linux-kselftest, linux-mm

Add a kunit test for commitment of damon_ctx->pause parameter that can
be done using damon_commit_ctx().

Signed-off-by: SeongJae Park <sj@kernel.org>
---
 mm/damon/tests/core-kunit.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/mm/damon/tests/core-kunit.h b/mm/damon/tests/core-kunit.h
index 9e5904c2beeb2..0030f682b23b7 100644
--- a/mm/damon/tests/core-kunit.h
+++ b/mm/damon/tests/core-kunit.h
@@ -1077,6 +1077,10 @@ static void damon_test_commit_ctx(struct kunit *test)
 	KUNIT_EXPECT_EQ(test, damon_commit_ctx(dst, src), 0);
 	src->min_region_sz = 4095;
 	KUNIT_EXPECT_EQ(test, damon_commit_ctx(dst, src), -EINVAL);
+	src->min_region_sz = 4096;
+	src->pause = true;
+	KUNIT_EXPECT_EQ(test, damon_commit_ctx(dst, src), 0);
+	KUNIT_EXPECT_TRUE(test, dst->pause);
 	damon_destroy_ctx(src);
 	damon_destroy_ctx(dst);
 }
-- 
2.47.3



* [RFC PATCH v4 07/10] selftests/damon/_damon_sysfs: support pause file staging
  2026-03-22 15:57 [RFC PATCH v4 00/10] mm/damon: let DAMON be paused and resumed SeongJae Park
                   ` (5 preceding siblings ...)
  2026-03-22 15:57 ` [RFC PATCH v4 06/10] mm/damon/tests/core-kunit: test pause commitment SeongJae Park
@ 2026-03-22 15:57 ` SeongJae Park
  2026-03-22 15:57 ` [RFC PATCH v4 08/10] selftests/damon/drgn_dump_damon_status: dump pause SeongJae Park
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 18+ messages in thread
From: SeongJae Park @ 2026-03-22 15:57 UTC (permalink / raw)
  Cc: SeongJae Park, Shuah Khan, damon, linux-kernel, linux-kselftest,
	linux-mm

The DAMON test-purpose sysfs interface control Python module,
_damon_sysfs, does not support the newly added pause file.  Add support
for the file, for future tests and uses of the feature.

Signed-off-by: SeongJae Park <sj@kernel.org>
---
 tools/testing/selftests/damon/_damon_sysfs.py | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/damon/_damon_sysfs.py b/tools/testing/selftests/damon/_damon_sysfs.py
index 2b4df655d9fd0..120b96ecbd741 100644
--- a/tools/testing/selftests/damon/_damon_sysfs.py
+++ b/tools/testing/selftests/damon/_damon_sysfs.py
@@ -604,10 +604,11 @@ class DamonCtx:
     targets = None
     schemes = None
     kdamond = None
+    pause = None
     idx = None
 
     def __init__(self, ops='paddr', monitoring_attrs=DamonAttrs(), targets=[],
-            schemes=[]):
+            schemes=[], pause=False):
         self.ops = ops
         self.monitoring_attrs = monitoring_attrs
         self.monitoring_attrs.context = self
@@ -622,6 +623,8 @@ class DamonCtx:
             scheme.idx = idx
             scheme.context = self
 
+        self.pause = pause
+
     def sysfs_dir(self):
         return os.path.join(self.kdamond.sysfs_dir(), 'contexts',
                 '%d' % self.idx)
@@ -662,6 +665,11 @@ class DamonCtx:
             err = scheme.stage()
             if err is not None:
                 return err
+
+        err = write_file(os.path.join(self.sysfs_dir(), 'pause'), self.pause)
+        if err is not None:
+            return err
+
         return None
 
 class Kdamond:
-- 
2.47.3



* [RFC PATCH v4 08/10] selftests/damon/drgn_dump_damon_status: dump pause
  2026-03-22 15:57 [RFC PATCH v4 00/10] mm/damon: let DAMON be paused and resumed SeongJae Park
                   ` (6 preceding siblings ...)
  2026-03-22 15:57 ` [RFC PATCH v4 07/10] selftests/damon/_damon_sysfs: support pause file staging SeongJae Park
@ 2026-03-22 15:57 ` SeongJae Park
  2026-03-22 15:57 ` [RFC PATCH v4 09/10] selftests/damon/sysfs.py: check pause on assert_ctx_committed() SeongJae Park
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 18+ messages in thread
From: SeongJae Park @ 2026-03-22 15:57 UTC (permalink / raw)
  Cc: SeongJae Park, Shuah Khan, damon, linux-kernel, linux-kselftest,
	linux-mm

drgn_dump_damon_status does not dump the damon_ctx->pause parameter
value, so it cannot be tested.  Dump it for future tests.

Signed-off-by: SeongJae Park <sj@kernel.org>
---
 tools/testing/selftests/damon/drgn_dump_damon_status.py | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/testing/selftests/damon/drgn_dump_damon_status.py b/tools/testing/selftests/damon/drgn_dump_damon_status.py
index af99b07a4f565..5b90eb8e7ef88 100755
--- a/tools/testing/selftests/damon/drgn_dump_damon_status.py
+++ b/tools/testing/selftests/damon/drgn_dump_damon_status.py
@@ -200,6 +200,7 @@ def damon_ctx_to_dict(ctx):
         ['attrs', attrs_to_dict],
         ['adaptive_targets', targets_to_list],
         ['schemes', schemes_to_list],
+        ['pause', bool],
         ])
 
 def main():
-- 
2.47.3



* [RFC PATCH v4 09/10] selftests/damon/sysfs.py: check pause on assert_ctx_committed()
  2026-03-22 15:57 [RFC PATCH v4 00/10] mm/damon: let DAMON be paused and resumed SeongJae Park
                   ` (7 preceding siblings ...)
  2026-03-22 15:57 ` [RFC PATCH v4 08/10] selftests/damon/drgn_dump_damon_status: dump pause SeongJae Park
@ 2026-03-22 15:57 ` SeongJae Park
  2026-03-22 15:57 ` [RFC PATCH v4 10/10] selftets/damon/sysfs.py: pause DAMON before dumping status SeongJae Park
  2026-03-22 17:05 ` (sashiko status) [RFC PATCH v4 00/10] mm/damon: let DAMON be paused and resumed SeongJae Park
  10 siblings, 0 replies; 18+ messages in thread
From: SeongJae Park @ 2026-03-22 15:57 UTC (permalink / raw)
  Cc: SeongJae Park, Shuah Khan, damon, linux-kernel, linux-kselftest,
	linux-mm

Extend sysfs.py tests to confirm damon_ctx->pause can be set using the
pause sysfs file.

Signed-off-by: SeongJae Park <sj@kernel.org>
---
 tools/testing/selftests/damon/sysfs.py | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/testing/selftests/damon/sysfs.py b/tools/testing/selftests/damon/sysfs.py
index 3aa5c91548a53..e6d34ba05893f 100755
--- a/tools/testing/selftests/damon/sysfs.py
+++ b/tools/testing/selftests/damon/sysfs.py
@@ -190,6 +190,7 @@ def assert_ctx_committed(ctx, dump):
     assert_monitoring_attrs_committed(ctx.monitoring_attrs, dump['attrs'])
     assert_monitoring_targets_committed(ctx.targets, dump['adaptive_targets'])
     assert_schemes_committed(ctx.schemes, dump['schemes'])
+    assert_true(dump['pause'] == ctx.pause, 'pause', dump)
 
 def assert_ctxs_committed(kdamonds):
     status, err = dump_damon_status_dict(kdamonds.kdamonds[0].pid)
-- 
2.47.3



* [RFC PATCH v4 10/10] selftets/damon/sysfs.py: pause DAMON before dumping status
  2026-03-22 15:57 [RFC PATCH v4 00/10] mm/damon: let DAMON be paused and resumed SeongJae Park
                   ` (8 preceding siblings ...)
  2026-03-22 15:57 ` [RFC PATCH v4 09/10] selftests/damon/sysfs.py: check pause on assert_ctx_committed() SeongJae Park
@ 2026-03-22 15:57 ` SeongJae Park
  2026-03-22 17:15   ` (sashiko review) " SeongJae Park
  2026-03-22 17:05 ` (sashiko status) [RFC PATCH v4 00/10] mm/damon: let DAMON be paused and resumed SeongJae Park
  10 siblings, 1 reply; 18+ messages in thread
From: SeongJae Park @ 2026-03-22 15:57 UTC (permalink / raw)
  Cc: SeongJae Park, Shuah Khan, damon, linux-kernel, linux-kselftest,
	linux-mm

The sysfs.py test commits DAMON parameters, dumps the internal DAMON
state, and checks whether the parameters are committed as expected using
the dumped state.  While the dumping is ongoing, DAMON is alive.  It can
make internal changes including addition and removal of regions.  This
can race with the dump and result in false test results.  Pause DAMON
execution during the state dumping to avoid such races.
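
The pause, dump, and resume pattern could also be expressed as a context
manager.  The sketch below is not the code of this patch.  It only
assumes objects shaped like the _damon_sysfs ones used by the selftest:
a 'contexts' list, a 'pause' attribute per context, and a commit()
method that returns None on success or an error string.
paused_for_dump() is a hypothetical helper name.

```python
import contextlib


@contextlib.contextmanager
def paused_for_dump(kdamonds):
    """Pause every running context, then resume exactly those contexts."""
    paused = []  # (kdamond, contexts paused by this helper) pairs
    for kd in kdamonds:
        ctxs = [ctx for ctx in kd.contexts if not ctx.pause]
        if not ctxs:
            # Nothing to pause here; avoid an unnecessary commit.
            continue
        for ctx in ctxs:
            ctx.pause = True
        err = kd.commit()
        if err is not None:
            raise RuntimeError('pause fail (%s)' % err)
        paused.append((kd, ctxs))
    try:
        yield
    finally:
        # Resume only what was paused here, one commit per kdamond.
        for kd, ctxs in paused:
            for ctx in ctxs:
                ctx.pause = False
            err = kd.commit()
            if err is not None:
                raise RuntimeError('resume fail (%s)' % err)
```

A caller would take the dump inside a "with paused_for_dump(...)" block.
Contexts that the caller had already paused are left untouched, and a
dump taken inside the block would report 'pause' as set for every
context.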

Signed-off-by: SeongJae Park <sj@kernel.org>
---
 tools/testing/selftests/damon/sysfs.py | 38 ++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/tools/testing/selftests/damon/sysfs.py b/tools/testing/selftests/damon/sysfs.py
index e6d34ba05893f..5f00e97f019f4 100755
--- a/tools/testing/selftests/damon/sysfs.py
+++ b/tools/testing/selftests/damon/sysfs.py
@@ -193,18 +193,55 @@ def assert_ctx_committed(ctx, dump):
     assert_true(dump['pause'] == ctx.pause, 'pause', dump)
 
 def assert_ctxs_committed(kdamonds):
+    ctxs_paused_for_dump = []
+    kdamonds_paused_for_dump = []
+    # pause for safe state dumping
+    for kd in kdamonds.kdamonds:
+        for ctx in kd.contexts:
+            if ctx.pause is False:
+                ctx.pause = True
+                ctxs_paused_for_dump.append(ctx)
+                if kd not in kdamonds_paused_for_dump:
+                    kdamonds_paused_for_dump.append(kd)
+        if kd in kdamonds_paused_for_dump:
+            err = kd.commit()
+            if err is not None:
+                print('pause fail (%s)' % err)
+                kdamonds.stop()
+                exit(1)
+
     status, err = dump_damon_status_dict(kdamonds.kdamonds[0].pid)
     if err is not None:
         print(err)
         kdamonds.stop()
         exit(1)
 
+    # resume contexts paused for safe state dumping
+    for ctx in ctxs_paused_for_dump:
+        ctx.pause = False
+    for kd in kdamonds_paused_for_dump:
+        err = kd.commit()
+        if err is not None:
+            print('resume fail (%s)' % err)
+            kdamonds.stop()
+            exit(1)
+
+    # restore for comparison
+    for ctx in ctxs_paused_for_dump:
+        ctx.pause = True
+
     ctxs = kdamonds.kdamonds[0].contexts
     dump = status['contexts']
     assert_true(len(ctxs) == len(dump), 'ctxs length', dump)
     for idx, ctx in enumerate(ctxs):
         assert_ctx_committed(ctx, dump[idx])
 
+    # restore for the caller
+    for kd in kdamonds.kdamonds:
+        for ctx in kd.contexts:
+            if ctx in ctxs_paused_for_dump:
+                ctx.pause = False
+
 def main():
     kdamonds = _damon_sysfs.Kdamonds(
             [_damon_sysfs.Kdamond(
@@ -302,6 +339,7 @@ def main():
         print('kdamond start failed: %s' % err)
         exit(1)
     kdamonds.kdamonds[0].contexts[0].targets[1].obsolete = True
+    kdamonds.kdamonds[0].contexts[0].pause = True
     kdamonds.kdamonds[0].commit()
     del kdamonds.kdamonds[0].contexts[0].targets[1]
     assert_ctxs_committed(kdamonds)
-- 
2.47.3



* Re: (sashiko status) [RFC PATCH v4 00/10] mm/damon: let DAMON be paused and resumed
  2026-03-22 15:57 [RFC PATCH v4 00/10] mm/damon: let DAMON be paused and resumed SeongJae Park
                   ` (9 preceding siblings ...)
  2026-03-22 15:57 ` [RFC PATCH v4 10/10] selftets/damon/sysfs.py: pause DAMON before dumping status SeongJae Park
@ 2026-03-22 17:05 ` SeongJae Park
  2026-03-22 17:11   ` SeongJae Park
  10 siblings, 1 reply; 18+ messages in thread
From: SeongJae Park @ 2026-03-22 17:05 UTC (permalink / raw)
  To: SeongJae Park
  Cc: Liam R. Howlett, Andrew Morton, Brendan Higgins, David Gow,
	David Hildenbrand, Jonathan Corbet, Lorenzo Stoakes, Michal Hocko,
	Mike Rapoport, Shuah Khan, Shuah Khan, Suren Baghdasaryan,
	Vlastimil Babka, damon, kunit-dev, linux-doc, linux-kernel,
	linux-kselftest, linux-mm

Forwarding sashiko.dev review status for this thread.

# review url: https://sashiko.dev/#/patchset/20260322155728.81434-1-sj@kernel.org

- [RFC PATCH v4 01/10] mm/damon/core: introduce damon_ctx->paused
  - status: Reviewed
  - review: ISSUES MAY FOUND
- [RFC PATCH v4 02/10] mm/damon/sysfs: add pause file under context dir
  - status: Reviewed
  - review: No issues found.
- [RFC PATCH v4 03/10] Docs/mm/damon/design: update for context pause/resume feature
  - status: Reviewed
  - review: No issues found.
- [RFC PATCH v4 04/10] Docs/admin-guide/mm/damon/usage: update for pause file
  - status: Reviewed
  - review: No issues found.
- [RFC PATCH v4 05/10] Docs/ABI/damon: update for pause sysfs file
  - status: Reviewed
  - review: No issues found.
- [RFC PATCH v4 06/10] mm/damon/tests/core-kunit: test pause commitment
  - status: Reviewed
  - review: No issues found.
- [RFC PATCH v4 07/10] selftests/damon/_damon_sysfs: support pause file staging
  - status: Pending
  - review: ISSUES MAY FOUND
- [RFC PATCH v4 08/10] selftests/damon/drgn_dump_damon_status: dump pause
  - status: Reviewed
  - review: No issues found.
- [RFC PATCH v4 09/10] selftests/damon/sysfs.py: check pause on assert_ctx_committed()
  - status: Reviewed
  - review: No issues found.
- [RFC PATCH v4 10/10] selftets/damon/sysfs.py: pause DAMON before dumping status
  - status: Reviewed
  - review: ISSUES MAY FOUND

# hkml [1] generated a draft of this mail.  It can be regenerated
# using below command:
#
#     hkml patch sashiko_dev --thread_status --for_forwarding \
#             20260322155728.81434-1-sj@kernel.org
#
# [1] https://github.com/sjp38/hackermail

Sent using hkml (https://github.com/sjp38/hackermail)



* Re: (sashiko review) [RFC PATCH v4 01/10] mm/damon/core: introduce damon_ctx->paused
  2026-03-22 15:57 ` [RFC PATCH v4 01/10] mm/damon/core: introduce damon_ctx->paused SeongJae Park
@ 2026-03-22 17:06   ` SeongJae Park
  2026-03-22 17:40     ` SeongJae Park
  0 siblings, 1 reply; 18+ messages in thread
From: SeongJae Park @ 2026-03-22 17:06 UTC (permalink / raw)
  To: SeongJae Park; +Cc: Andrew Morton, damon, linux-kernel, linux-mm

Forwarding Sashiko review for doing discussions via mails.

# review url: https://sashiko.dev/#/patchset/20260322155728.81434-2-sj@kernel.org
# start of sashiko.dev inline review
commit 2a3be2c8b6c327ef72163cd54e6121ef72b9c932
Author: SeongJae Park <sj@kernel.org>

mm/damon/core: introduce damon_ctx->paused

DAMON supports only start and stop of the execution. When it is
stopped, its internal data that it self-trained goes away. It will be
useful if the execution can be paused and resumed with the previous
self-trained data.

> diff --git a/mm/damon/core.c b/mm/damon/core.c
> index db6c67e52d2b..0ab2cfa848e6 100644
> --- a/mm/damon/core.c
> +++ b/mm/damon/core.c

[ ... ]

> @@ -3003,6 +3004,14 @@ static int kdamond_fn(void *data)
>  		kdamond_call(ctx, false);
>  		if (ctx->maybe_corrupted)
>  			break;
> +		while (ctx->pause) {
> +			damos_walk_cancel(ctx);

Is it safe to repeatedly call damos_walk_cancel() here?

In damos_walk(), the walk_control_lock is dropped after setting
ctx->walk_control to check damon_is_running(), which opens a race window.

Concurrently, kdamond_fn() calls damos_walk_cancel(), which reads
ctx->walk_control, drops the lock, and completes the old request.

If a new caller invokes damos_walk() and successfully registers a new
walk_control, could damos_walk_cancel() resume, reacquire the lock,
and unconditionally set ctx->walk_control = NULL, dropping the new
request?

This might leave the new caller permanently deadlocked in
wait_for_completion().

> +			kdamond_usleep(ctx->attrs.sample_interval);

Could this result in a busy loop if sample_interval is 0?

The DAMON API allows attrs.sample_interval to be 0 (for example, when
tuning is disabled via goal.aggrs == 0).

If sample_interval is 0, this invokes usleep_range_idle(0, 1), which
sleeps for at most 1 microsecond using an hrtimer.

Since the normally expensive monitoring operations are bypassed while
paused, would this cause the thread to loop almost instantly and create
an interrupt storm?

> +			/* allow caller unset pause via damon_call() */
> +			kdamond_call(ctx, false);
> +			if (kdamond_need_stop(ctx) || ctx->maybe_corrupted)
> +				goto done;
> +		}
>  		if (!list_empty(&ctx->schemes))
>  			kdamond_apply_schemes(ctx);
>  		else


# end of sashiko.dev inline review
# review url: https://sashiko.dev/#/patchset/20260322155728.81434-2-sj@kernel.org
#
# hkml [1] generated a draft of this mail.  It can be regenerated
# using below command:
#
#     hkml patch sashiko_dev --for_forwarding \
#             20260322155728.81434-2-sj@kernel.org
#
# [1] https://github.com/sjp38/hackermail

Sent using hkml (https://github.com/sjp38/hackermail)



* Re: (sashiko status) [RFC PATCH v4 00/10] mm/damon: let DAMON be paused and resumed
  2026-03-22 17:05 ` (sashiko status) [RFC PATCH v4 00/10] mm/damon: let DAMON be paused and resumed SeongJae Park
@ 2026-03-22 17:11   ` SeongJae Park
  0 siblings, 0 replies; 18+ messages in thread
From: SeongJae Park @ 2026-03-22 17:11 UTC (permalink / raw)
  To: SeongJae Park
  Cc: Liam R. Howlett, Andrew Morton, Brendan Higgins, David Gow,
	David Hildenbrand, Jonathan Corbet, Lorenzo Stoakes, Michal Hocko,
	Mike Rapoport, Shuah Khan, Shuah Khan, Suren Baghdasaryan,
	Vlastimil Babka, damon, kunit-dev, linux-doc, linux-kernel,
	linux-kselftest, linux-mm

On Sun, 22 Mar 2026 10:05:05 -0700 SeongJae Park <sj@kernel.org> wrote:

> Forwarding sashiko.dev review status for this thread.
> 
> # review url: https://sashiko.dev/#/patchset/20260322155728.81434-1-sj@kernel.org
> 
> - [RFC PATCH v4 01/10] mm/damon/core: introduce damon_ctx->paused
>   - status: Reviewed
>   - review: ISSUES MAY FOUND
> - [RFC PATCH v4 02/10] mm/damon/sysfs: add pause file under context dir
>   - status: Reviewed
>   - review: No issues found.
> - [RFC PATCH v4 03/10] Docs/mm/damon/design: update for context pause/resume feature
>   - status: Reviewed
>   - review: No issues found.
> - [RFC PATCH v4 04/10] Docs/admin-guide/mm/damon/usage: update for pause file
>   - status: Reviewed
>   - review: No issues found.
> - [RFC PATCH v4 05/10] Docs/ABI/damon: update for pause sysfs file
>   - status: Reviewed
>   - review: No issues found.
> - [RFC PATCH v4 06/10] mm/damon/tests/core-kunit: test pause commitment
>   - status: Reviewed
>   - review: No issues found.
> - [RFC PATCH v4 07/10] selftests/damon/_damon_sysfs: support pause file staging
>   - status: Pending
>   - review: ISSUES MAY FOUND

The status is pending.  'ISSUES MAY FOUND' is an hkml bug.  I'm fixing it.


Thanks,
SJ

[...]



* Re: (sashiko review) [RFC PATCH v4 10/10] selftets/damon/sysfs.py: pause DAMON before dumping status
  2026-03-22 15:57 ` [RFC PATCH v4 10/10] selftets/damon/sysfs.py: pause DAMON before dumping status SeongJae Park
@ 2026-03-22 17:15   ` SeongJae Park
  2026-03-22 17:47     ` SeongJae Park
  0 siblings, 1 reply; 18+ messages in thread
From: SeongJae Park @ 2026-03-22 17:15 UTC (permalink / raw)
  To: SeongJae Park; +Cc: Shuah Khan, damon, linux-kernel, linux-kselftest, linux-mm

Forwarding Sashiko review for doing discussions via mails.

# review url: https://sashiko.dev/#/patchset/20260322155728.81434-11-sj@kernel.org
# start of sashiko.dev inline review
commit 67adf8e7b8416252faf583e176f0e8424fd4fb95
Author: SeongJae Park <sj@kernel.org>

selftets/damon/sysfs.py: pause DAMON before dumping status

The sysfs.py test commits DAMON parameters, dumps the internal DAMON
state, and verifies the parameters. This commit pauses DAMON execution
during the state dumping to avoid races that can result in false test
results.

> selftets/damon/sysfs.py: pause DAMON before dumping status

This isn't a bug, but there's a typo (selftets) in the subject line.

> diff --git a/tools/testing/selftests/damon/sysfs.py b/tools/testing/selftests/damon/sysfs.py
> --- a/tools/testing/selftests/damon/sysfs.py
> +++ b/tools/testing/selftests/damon/sysfs.py
[ ... ]
> @@ -193,18 +193,55 @@ def assert_ctx_committed(ctx, dump):
>  def assert_ctxs_committed(kdamonds):
> +    ctxs_paused_for_dump = []
> +    kdamonds_paused_for_dump = []
> +    # pause for safe state dumping
> +    for kd in kdamonds.kdamonds:
[ ... ]
> +    # resume contexts paused for safe state dumping
> +    for ctx in ctxs_paused_for_dump:
> +        ctx.pause = False
> +    if kd in kdamonds_paused_for_dump:
> +        err = kd.commit()

Is this if statement intended to be a for loop?

Since kd is just retaining the value from the last iteration of the earlier
loop, using an if statement here means that if there are multiple kdamonds,
only the last one might have its state committed. 

Could this leave other kdamonds un-resumed in the kernel?

> +        if err is not None:
> +            print('resume fail (%s)' % err)
> +            kdamonds.stop()
> +            exit(1)
[ ... ]


# end of sashiko.dev inline review
# review url: https://sashiko.dev/#/patchset/20260322155728.81434-11-sj@kernel.org
#
# hkml [1] generated a draft of this mail.  It can be regenerated
# using below command:
#
#     hkml patch sashiko_dev --for_forwarding \
#             20260322155728.81434-11-sj@kernel.org
#
# [1] https://github.com/sjp38/hackermail

Sent using hkml (https://github.com/sjp38/hackermail)



* Re: (sashiko review) [RFC PATCH v4 01/10] mm/damon/core: introduce damon_ctx->paused
  2026-03-22 17:06   ` (sashiko review) " SeongJae Park
@ 2026-03-22 17:40     ` SeongJae Park
  2026-03-22 20:51       ` SeongJae Park
  0 siblings, 1 reply; 18+ messages in thread
From: SeongJae Park @ 2026-03-22 17:40 UTC (permalink / raw)
  To: SeongJae Park; +Cc: Andrew Morton, damon, linux-kernel, linux-mm

On Sun, 22 Mar 2026 10:06:59 -0700 SeongJae Park <sj@kernel.org> wrote:

> Forwarding Sashiko review for doing discussions via mails.
> 
> # review url: https://sashiko.dev/#/patchset/20260322155728.81434-2-sj@kernel.org
> # start of sashiko.dev inline review
> commit 2a3be2c8b6c327ef72163cd54e6121ef72b9c932
> Author: SeongJae Park <sj@kernel.org>
> 
> mm/damon/core: introduce damon_ctx->paused
> 
> DAMON supports only start and stop of the execution. When it is
> stopped, its internal data that it self-trained goes away. It will be
> useful if the execution can be paused and resumed with the previous
> self-trained data.
> 
> > diff --git a/mm/damon/core.c b/mm/damon/core.c
> > index db6c67e52d2b..0ab2cfa848e6 100644
> > --- a/mm/damon/core.c
> > +++ b/mm/damon/core.c
> 
> [ ... ]
> 
> > @@ -3003,6 +3004,14 @@ static int kdamond_fn(void *data)
> >  		kdamond_call(ctx, false);
> >  		if (ctx->maybe_corrupted)
> >  			break;
> > +		while (ctx->pause) {
> > +			damos_walk_cancel(ctx);
> 
> Is it safe to repeatedly call damos_walk_cancel() here?
> 
> In damos_walk(), the walk_control_lock is dropped after setting
> ctx->walk_control to check damon_is_running(), which opens a race window.
> 
> Concurrently, kdamond_fn() calls damos_walk_cancel(), which reads
> ctx->walk_control, drops the lock, and completes the old request.
> 
> If a new caller invokes damos_walk() and successfully registers a new
> walk_control, could damos_walk_cancel() resume, reacquire the lock,
> and unconditionally set ctx->walk_control = NULL, dropping the new
> request?
> 
> This might leave the new caller permanently deadlocked in
> wait_for_completion().

Nice catch.  Orthogonal to this patch, though.

The deadlock could happen in the sequence below.

CPU0                          │CPU1
──────────────────────────────┼────────────────────────
damos_walk()                  │
 │register request            │
 │wait completion             │damos_walk_cancel()
 │                            │ │complete the request
 ▼wakeup,return               │ │
damos_walk()                  │ │
 │register new request        │ │
 │                            │ │remove the new request
 │wait completion             │ ▼return
 ▼  nobody completes it.      │

Nonetheless, kdamond_fn() is already calling damos_walk_cancel() in several
places, including this loop.  Hence this issue exists regardless of this patch.
I will work on fixing this as a separate hotfix.  The fix below may work.

'''
--- a/mm/damon/core.c
+++ b/mm/damon/core.c
@@ -2321,7 +2321,9 @@ static void damos_walk_cancel(struct damon_ctx *ctx)
        control->canceled = true;
        complete(&control->completion);
        mutex_lock(&ctx->walk_control_lock);
-       ctx->walk_control = NULL;
+       /* A new damos_walk() caller may have added a new request meanwhile */
+       if (ctx->walk_control == control)
+               ctx->walk_control = NULL;
        mutex_unlock(&ctx->walk_control_lock);
 }
'''
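
To illustrate why the identity check matters, below is a minimal user-space
Python model of the cancel path.  It is only a sketch, not kernel code: the
Request and Ctx classes and the split of the cancel path into two halves are
hypothetical stand-ins, and the interleaving is replayed by hand instead of
with real threads.  It also assumes, as in the sequence above, that a new
request can get registered between the two halves.

```python
class Request:
    """Hypothetical stand-in for a damos_walk() request."""

    def __init__(self, name):
        self.name = name
        self.completed = False


class Ctx:
    """Hypothetical stand-in for the walk_control slot of a context."""

    def __init__(self):
        self.walk_control = None


def cancel_begin(ctx):
    # First half of the cancel path: read the request and complete it.
    control = ctx.walk_control
    control.completed = True
    return control


def cancel_end(ctx, control, guarded):
    # Second half: clear the slot after the lock was dropped and retaken.
    # The guarded variant clears it only if it still holds the old request.
    if not guarded or ctx.walk_control is control:
        ctx.walk_control = None


def replay(guarded):
    """Return True if the newly registered request survives the cancel."""
    ctx = Ctx()
    ctx.walk_control = Request('old')
    control = cancel_begin(ctx)
    # Assumed interleaving: a new caller registers between the two halves.
    new = Request('new')
    ctx.walk_control = new
    cancel_end(ctx, control, guarded)
    return ctx.walk_control is new


print('unguarded keeps new request:', replay(guarded=False))
print('guarded keeps new request:', replay(guarded=True))
```

In this model, only the guarded variant leaves the new request registered, so
that a later completion or cancel can still find it.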


> 
> > +			kdamond_usleep(ctx->attrs.sample_interval);
> 
> Could this result in a busy loop if sample_interval is 0?
> 
> The DAMON API allows attrs.sample_interval to be 0 (for example, when
> tuning is disabled via goal.aggrs == 0).
> 
> If sample_interval is 0, this invokes usleep_range_idle(0, 1), which
> sleeps for at most 1 microsecond using an hrtimer.
> 
> Since the normally expensive monitoring operations are bypassed while
> paused, would this cause the thread to loop almost instantly and create
> an interrupt storm?

This is intended behavior for users who are really impatient, so this should
be fine.


Thanks,
SJ

[...]



* Re: (sashiko review) [RFC PATCH v4 10/10] selftets/damon/sysfs.py: pause DAMON before dumping status
  2026-03-22 17:15   ` (sashiko review) " SeongJae Park
@ 2026-03-22 17:47     ` SeongJae Park
  0 siblings, 0 replies; 18+ messages in thread
From: SeongJae Park @ 2026-03-22 17:47 UTC (permalink / raw)
  To: SeongJae Park; +Cc: Shuah Khan, damon, linux-kernel, linux-kselftest, linux-mm

On Sun, 22 Mar 2026 10:15:33 -0700 SeongJae Park <sj@kernel.org> wrote:

> Forwarding Sashiko review for doing discussions via mails.
> 
> # review url: https://sashiko.dev/#/patchset/20260322155728.81434-11-sj@kernel.org
> # start of sashiko.dev inline review
> commit 67adf8e7b8416252faf583e176f0e8424fd4fb95
> Author: SeongJae Park <sj@kernel.org>
> 
> selftets/damon/sysfs.py: pause DAMON before dumping status
> 
> The sysfs.py test commits DAMON parameters, dumps the internal DAMON
> state, and verifies the parameters. This commit pauses DAMON execution
> during the state dumping to avoid races that can result in false test
> results.
> 
> > selftets/damon/sysfs.py: pause DAMON before dumping status
> 
> This isn't a bug, but there's a typo (selftets) in the subject line.

Oops, Sashiko mentioned this in the previous one, but I forgot to fix it.
I will fix this in the next spin.

> 
> > diff --git a/tools/testing/selftests/damon/sysfs.py b/tools/testing/selftests/damon/sysfs.py
> > --- a/tools/testing/selftests/damon/sysfs.py
> > +++ b/tools/testing/selftests/damon/sysfs.py
> [ ... ]
> > @@ -193,18 +193,55 @@ def assert_ctx_committed(ctx, dump):
> >  def assert_ctxs_committed(kdamonds):
> > +    ctxs_paused_for_dump = []
> > +    kdamonds_paused_for_dump = []
> > +    # pause for safe state dumping
> > +    for kd in kdamonds.kdamonds:
> [ ... ]
> > +    # resume contexts paused for safe state dumping
> > +    for ctx in ctxs_paused_for_dump:
> > +        ctx.pause = False
> > +    if kd in kdamonds_paused_for_dump:
> > +        err = kd.commit()
> 
> Is this if statement intended to be a for loop?
> 
> Since kd is just retaining the value from the last iteration of the earlier
> loop, using an if statement here means that if there are multiple kdamonds,
> only the last one might have its state committed. 
> 
> Could this leave other kdamonds un-resumed in the kernel?

Ah, correct...  There is no use case with multiple kdamonds, but let's make it
complete.  I will fix this in the next spin, like below.

'''
--- a/tools/testing/selftests/damon/sysfs.py
+++ b/tools/testing/selftests/damon/sysfs.py
@@ -226,7 +226,7 @@ def assert_ctxs_committed(kdamonds):
     # resume contexts paused for safe state dumping
     for ctx in ctxs_paused_for_dump:
         ctx.pause = False
-    if kd in kdamonds_paused_for_dump:
+    for kd in kdamonds_paused_for_dump:
         err = kd.commit()
         if err is not None:
             print('resume fail (%s)' % err)
'''
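
The difference is easy to reproduce outside of the selftest.  The sketch below
uses a hypothetical FakeKdamond class (not the one in _damon_sysfs) that only
counts commit() calls, and compares the 'if' and the 'for' variants with two
kdamonds.

```python
class FakeKdamond:
    """Hypothetical stand-in that only counts commit() calls."""

    def __init__(self):
        self.nr_commits = 0

    def commit(self):
        self.nr_commits += 1
        return None


def resume_with_if(kdamonds):
    paused = []
    for kd in kdamonds:
        paused.append(kd)
    # 'kd' still refers to the last element visited by the loop above,
    # so only that one kdamond gets committed.
    if kd in paused:
        kd.commit()


def resume_with_for(kdamonds):
    paused = list(kdamonds)
    # Every paused kdamond gets committed.
    for kd in paused:
        kd.commit()


kds_if = [FakeKdamond(), FakeKdamond()]
resume_with_if(kds_if)
kds_for = [FakeKdamond(), FakeKdamond()]
resume_with_for(kds_for)

print([kd.nr_commits for kd in kds_if])     # [0, 1]
print([kd.nr_commits for kd in kds_for])    # [1, 1]
```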


Thanks,
SJ

[...]



* Re: (sashiko review) [RFC PATCH v4 01/10] mm/damon/core: introduce damon_ctx->paused
  2026-03-22 17:40     ` SeongJae Park
@ 2026-03-22 20:51       ` SeongJae Park
  0 siblings, 0 replies; 18+ messages in thread
From: SeongJae Park @ 2026-03-22 20:51 UTC (permalink / raw)
  To: SeongJae Park; +Cc: Andrew Morton, damon, linux-kernel, linux-mm

On Sun, 22 Mar 2026 10:40:16 -0700 SeongJae Park <sj@kernel.org> wrote:

> On Sun, 22 Mar 2026 10:06:59 -0700 SeongJae Park <sj@kernel.org> wrote:
> 
> > Forwarding Sashiko review for doing discussions via mails.
> > 
> > # review url: https://sashiko.dev/#/patchset/20260322155728.81434-2-sj@kernel.org
> > # start of sashiko.dev inline review
> > commit 2a3be2c8b6c327ef72163cd54e6121ef72b9c932
> > Author: SeongJae Park <sj@kernel.org>
> > 
> > mm/damon/core: introduce damon_ctx->paused
> > 
> > DAMON supports only start and stop of the execution. When it is
> > stopped, its internal data that it self-trained goes away. It will be
> > useful if the execution can be paused and resumed with the previous
> > self-trained data.
> > 
> > > diff --git a/mm/damon/core.c b/mm/damon/core.c
> > > index db6c67e52d2b..0ab2cfa848e6 100644
> > > --- a/mm/damon/core.c
> > > +++ b/mm/damon/core.c
> > 
> > [ ... ]
> > 
> > > @@ -3003,6 +3004,14 @@ static int kdamond_fn(void *data)
> > >  		kdamond_call(ctx, false);
> > >  		if (ctx->maybe_corrupted)
> > >  			break;
> > > +		while (ctx->pause) {
> > > +			damos_walk_cancel(ctx);
> > 
> > Is it safe to repeatedly call damos_walk_cancel() here?
> > 
> > In damos_walk(), the walk_control_lock is dropped after setting
> > ctx->walk_control to check damon_is_running(), which opens a race window.
> > 
> > Concurrently, kdamond_fn() calls damos_walk_cancel(), which reads
> > ctx->walk_control, drops the lock, and completes the old request.
> > 
> > If a new caller invokes damos_walk() and successfully registers a new
> > walk_control, could damos_walk_cancel() resume, reacquire the lock,
> > and unconditionally set ctx->walk_control = NULL, dropping the new
> > request?
> > 
> > This might leave the new caller permanently deadlocked in
> > wait_for_completion().
> 
> Nice catch.  Orthogonal to this patch, though.
> 
> The deadlock could happen in the sequence below.
> 
> CPU0                          │CPU1
> ──────────────────────────────┼────────────────────────
> damos_walk()                  │
>  │register request            │
>  │wait completion             │damos_walk_cancel()
>  │                            │ │complete the request
>  ▼wakeup,return               │ │
> damos_walk()                  │ │
>  │register new request        │ │
>  │                            │ │remove the new request
>  │wait completion             │ ▼return
>  ▼  nobody completes it.      │
> 
> Nonetheless, kdamond_fn() is already calling damos_walk_cancel() in several
> places, including this loop.  Hence this issue exists regardless of this
> patch.  I will work on fixing this as a separate hotfix.  The fix below may
> work.

TL;DR: there is no deadlock in the existing code.  I will work on cleaner code
or documentation, though.

The scenario that I illustrated above cannot happen, because the second
damos_walk() cannot register its new request before the old request is unset.

The request is unset in three places: damos_walk_complete(),
damos_walk_cancel(), and damos_walk().  damos_walk_complete() and
damos_walk_cancel() are called from the same kdamond thread, so no race between
them exists.

damos_walk() unsets the request only if !damon_is_running().  damos_walk()
seeing !damon_is_running() means the kdamond is stopped.  That in turn means
there can be no concurrent damos_walk_cancel() or damos_walk_complete() that
works for the same context and started before the damon_is_running() call.

Hence, unless the same context is restarted, there is no chance of a race.
Only DAMON_SYSFS calls damos_walk(), and it doesn't restart the same context.
DAMON_RECLAIM and DAMON_LRU_SORT do restart the same context, but they don't
use damos_walk().  So, there is no deadlock in the existing code (or, no such
deadlock has been found so far).

Let's assume there could be a damos_walk() call in parallel with a restart of a
DAMON context, though.  In that case, the deadlock below is possible.  This
seems to be what Sashiko was trying to say.

0.   A DAMON context is stopped.
1-1. CPU0: calls damos_walk() for the stopped context.
1-2. CPU0: damos_walk(): registers a new damos_walk() request to the stopped
                         context.
1-3. CPU0: damos_walk(): sees !damon_is_running().
2.   CPU1: restarts the DAMON context.
3-1. CPU2: executes kdamond_fn() -> damos_walk_cancel().
3-2. CPU2: damos_walk_cancel(): completes the walk request that was registered
                                in step 1-2.
4-1. CPU0: damos_walk(): unsets the request.
4-2. CPU0: calls damos_walk() again.
4-3. CPU0: damos_walk() 2: registers a new damos_walk() request.
4-4. CPU0: damos_walk() 2: waits for the completion.
5-1. CPU2: damos_walk_cancel(): unsets the walk request that was registered in
                                step 4-3.

Nobody can complete the request that was registered in step 4-3.  CPU0 waits
forever.
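
Steps 0 to 5-1 above can also be replayed one by one in a small user-space
Python model.  This is only a sketch under the assumptions of this mail: the
Request and Ctx classes are hypothetical stand-ins for the kernel data
structures, and the three CPUs are simulated by executing the steps in the
listed order in a single thread.

```python
class Request:
    """Hypothetical stand-in for a damos_walk() request."""

    def __init__(self):
        self.completed = False


class Ctx:
    """Hypothetical stand-in for the relevant DAMON context state."""

    def __init__(self):
        self.running = False        # step 0: the context is stopped
        self.walk_control = None


ctx = Ctx()

# 1-2. CPU0: damos_walk() registers the first request.
first = Request()
ctx.walk_control = first
# 1-3. CPU0: damos_walk() sees !damon_is_running().
saw_stopped = not ctx.running

# 2. CPU1: the context is restarted.
ctx.running = True

# 3-1, 3-2. CPU2: damos_walk_cancel() reads and completes the first request.
control = ctx.walk_control
control.completed = True

# 4-1. CPU0: damos_walk() unsets the request because it saw a stopped context.
if saw_stopped:
    ctx.walk_control = None

# 4-3. CPU0: the second damos_walk() registers a new request.
second = Request()
ctx.walk_control = second

# 5-1. CPU2: damos_walk_cancel() unconditionally unsets the request.
ctx.walk_control = None

# The second request is neither completed nor registered any more, so the
# caller waiting on it in step 4-4 would never be woken up.
lost = not second.completed and ctx.walk_control is None
print('second request lost:', lost)
```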

In a more graphical way, this can be illustrated as below:

CPU0                           │CPU1             │CPU2                                    
───────────────────────────────┼─────────────────┼────────────────────────────────────────
damos_walk()                   │                 │                                        
   │register request           │                 │
   │see !damon_is_running(ctx) │                 │
   │                           │                 │                                        
   │                           │damon_start(ctx) │                                        
   │                           │                 │damos_walk_cancel()                     
   │                           │                 │    complete first damos_walk() request 
   │                           │                 │                                        
   │unset request              │                 │                                        
   ▼return                     │                 │                                        
                               │                 │                                        
damos_walk()                   │                 │                                        
   │register request           │                 │                                        
   │wait completion            │                 │     unset second request               
   ▼                           │                 │                                        

As I mentioned above, this cannot happen in the existing code, since there is
no code that both restarts a terminated DAMON context and calls damos_walk().
In the future, there might be such use cases or mistakenly made call sequences,
though.

I will work on improving this.  But, as I mentioned before, it is not a blocker
for this patch.


Thanks,
SJ

[...]



Thread overview: 18+ messages
2026-03-22 15:57 [RFC PATCH v4 00/10] mm/damon: let DAMON be paused and resumed SeongJae Park
2026-03-22 15:57 ` [RFC PATCH v4 01/10] mm/damon/core: introduce damon_ctx->paused SeongJae Park
2026-03-22 17:06   ` (sashiko review) " SeongJae Park
2026-03-22 17:40     ` SeongJae Park
2026-03-22 20:51       ` SeongJae Park
2026-03-22 15:57 ` [RFC PATCH v4 02/10] mm/damon/sysfs: add pause file under context dir SeongJae Park
2026-03-22 15:57 ` [RFC PATCH v4 03/10] Docs/mm/damon/design: update for context pause/resume feature SeongJae Park
2026-03-22 15:57 ` [RFC PATCH v4 04/10] Docs/admin-guide/mm/damon/usage: update for pause file SeongJae Park
2026-03-22 15:57 ` [RFC PATCH v4 05/10] Docs/ABI/damon: update for pause sysfs file SeongJae Park
2026-03-22 15:57 ` [RFC PATCH v4 06/10] mm/damon/tests/core-kunit: test pause commitment SeongJae Park
2026-03-22 15:57 ` [RFC PATCH v4 07/10] selftests/damon/_damon_sysfs: support pause file staging SeongJae Park
2026-03-22 15:57 ` [RFC PATCH v4 08/10] selftests/damon/drgn_dump_damon_status: dump pause SeongJae Park
2026-03-22 15:57 ` [RFC PATCH v4 09/10] selftests/damon/sysfs.py: check pause on assert_ctx_committed() SeongJae Park
2026-03-22 15:57 ` [RFC PATCH v4 10/10] selftets/damon/sysfs.py: pause DAMON before dumping status SeongJae Park
2026-03-22 17:15   ` (sashiko review) " SeongJae Park
2026-03-22 17:47     ` SeongJae Park
2026-03-22 17:05 ` (sashiko status) [RFC PATCH v4 00/10] mm/damon: let DAMON be paused and resumed SeongJae Park
2026-03-22 17:11   ` SeongJae Park
