* [RFC PATCH 1/3] mm/damon: introduce DAMON_STAT module
2025-05-19 16:44 [RFC PATCH 0/3] mm/damon: introduce DAMON_STAT for access monitoring that just works SeongJae Park
@ 2025-05-19 16:44 ` SeongJae Park
2025-05-19 16:44 ` [RFC PATCH 2/3] mm/damon/stat: calculate and expose estimated memory bandwidth SeongJae Park
2025-05-19 16:44 ` [RFC PATCH 3/3] mm/damon/stat: calculate and expose idle time percentiles SeongJae Park
2 siblings, 0 replies; 4+ messages in thread
From: SeongJae Park @ 2025-05-19 16:44 UTC
Cc: SeongJae Park, Andrew Morton, damon, kernel-team, linux-kernel,
linux-mm
To use DAMON for monitoring the access pattern of a system, users
should manually start DAMON via the DAMON sysfs ABI, with a number of
parameters specifying the monitoring target address space, address
ranges, and monitoring intervals. After that, users should also wait
until the desired amount of data is captured into DAMON's monitoring
results. This is bothersome and takes too long to be practical for
access monitoring on large-fleet production environments.
For access-aware system operation use cases like proactive cold memory
reclamation, a similar problem existed, and we solved it by introducing
dedicated static kernel modules such as DAMON_RECLAIM.
Implement such a static kernel module for access monitoring, namely
DAMON_STAT. It monitors the entire physical address space with
auto-tuned monitoring intervals. The auto-tuning is set to capture 4
percent of observable access events in each snapshot, while keeping the
sampling interval within a range of 5 milliseconds to 10 seconds. On
production environments, we confirmed this setup provides high quality
monitoring results with minimal overhead. The module therefore
receives only one user input: whether to enable or disable it. It can
be set at build time via the build configuration, or at boot time via
the kernel command line, and can also be overridden at runtime.
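For example, assuming the module is built in with this patch applied,
the following hypothetical usages would enable it at boot and at
runtime, respectively, since the module uses the 'damon_stat.'
parameter prefix and the usual module parameters sysfs interface:

    # kernel boot command line
    damon_stat.enabled=Y

    # runtime
    echo Y > /sys/module/damon_stat/parameters/enabled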
Note that this commit implements only the DAMON control part of the
module. Following commits will implement convenient and optimized ways
of serving the monitoring results to users. Until then, users can get
the monitoring results via the damon:damon_aggregated tracepoint,
though that is of course not the recommended way.
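For instance, assuming tracefs is mounted at /sys/kernel/tracing, a
sketch of reading the results via the tracepoint could look like:

    echo 1 > /sys/kernel/tracing/events/damon/damon_aggregated/enable
    cat /sys/kernel/tracing/trace_pipe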
Signed-off-by: SeongJae Park <sj@kernel.org>
---
mm/damon/Kconfig | 16 ++++++
mm/damon/Makefile | 1 +
mm/damon/stat.c | 130 ++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 147 insertions(+)
create mode 100644 mm/damon/stat.c
diff --git a/mm/damon/Kconfig b/mm/damon/Kconfig
index 551745df011b..9f482e3adc67 100644
--- a/mm/damon/Kconfig
+++ b/mm/damon/Kconfig
@@ -95,4 +95,20 @@ config DAMON_LRU_SORT
protect frequently accessed (hot) pages while rarely accessed (cold)
pages reclaimed first under memory pressure.
+config DAMON_STAT
+ bool "Build data access monitoring stat (DAMON_STAT)"
+ depends on DAMON_PADDR
+ help
+ This builds the DAMON-based access monitoring statistics subsystem.
+ It runs DAMON and exposes access monitoring results as simple stat
+ metrics.
+
+config DAMON_STAT_ENABLED_DEFAULT
+ bool "Enable DAMON_STAT by default"
+ depends on DAMON_PADDR
+ default DAMON_STAT
+ help
+ Whether to enable DAMON_STAT by default. Users can disable it at
+ boot or runtime using its 'enabled' parameter.
+
endmenu
diff --git a/mm/damon/Makefile b/mm/damon/Makefile
index 8b49012ba8c3..d8d6bf5f8bff 100644
--- a/mm/damon/Makefile
+++ b/mm/damon/Makefile
@@ -6,3 +6,4 @@ obj-$(CONFIG_DAMON_PADDR) += ops-common.o paddr.o
obj-$(CONFIG_DAMON_SYSFS) += sysfs-common.o sysfs-schemes.o sysfs.o
obj-$(CONFIG_DAMON_RECLAIM) += modules-common.o reclaim.o
obj-$(CONFIG_DAMON_LRU_SORT) += modules-common.o lru_sort.o
+obj-$(CONFIG_DAMON_STAT) += modules-common.o stat.o
diff --git a/mm/damon/stat.c b/mm/damon/stat.c
new file mode 100644
index 000000000000..852848ce844e
--- /dev/null
+++ b/mm/damon/stat.c
@@ -0,0 +1,130 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Shows data access monitoring results in simple metrics.
+ */
+
+#define pr_fmt(fmt) "damon-stat: " fmt
+
+#include <linux/damon.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/sort.h>
+
+#ifdef MODULE_PARAM_PREFIX
+#undef MODULE_PARAM_PREFIX
+#endif
+#define MODULE_PARAM_PREFIX "damon_stat."
+
+static int damon_stat_enabled_store(
+ const char *val, const struct kernel_param *kp);
+
+static const struct kernel_param_ops enabled_param_ops = {
+ .set = damon_stat_enabled_store,
+ .get = param_get_bool,
+};
+
+static bool enabled __read_mostly = IS_ENABLED(CONFIG_DAMON_STAT_ENABLED_DEFAULT);
+module_param_cb(enabled, &enabled_param_ops, &enabled, 0600);
+MODULE_PARM_DESC(enabled, "Enable or disable DAMON_STAT");
+
+static struct damon_ctx *damon_stat_context;
+
+static struct damon_ctx *damon_stat_build_ctx(void)
+{
+ struct damon_ctx *ctx;
+ struct damon_attrs attrs;
+ struct damon_target *target;
+ unsigned long start = 0, end = 0;
+
+ ctx = damon_new_ctx();
+ if (!ctx)
+ return NULL;
+ attrs = (struct damon_attrs) {
+ .sample_interval = 5 * USEC_PER_MSEC,
+ .aggr_interval = 100 * USEC_PER_MSEC,
+ .ops_update_interval = 60 * USEC_PER_MSEC * MSEC_PER_SEC,
+ .min_nr_regions = 10,
+ .max_nr_regions = 1000,
+ };
+ /*
+ * Auto-tune sampling and aggregation intervals aiming at a 4% DAMON-
+ * observed access ratio, keeping the sampling interval in [5ms, 10s].
+ */
+ attrs.intervals_goal = (struct damon_intervals_goal) {
+ .access_bp = 400, .aggrs = 3,
+ .min_sample_us = 5000, .max_sample_us = 10000000,
+ };
+ if (damon_set_attrs(ctx, &attrs))
+ goto free_out;
+
+ if (damon_select_ops(ctx, DAMON_OPS_PADDR))
+ goto free_out;
+
+ target = damon_new_target();
+ if (!target)
+ goto free_out;
+ damon_add_target(ctx, target);
+ if (damon_set_region_biggest_system_ram_default(target, &start, &end))
+ goto free_out;
+ return ctx;
+free_out:
+ damon_destroy_ctx(ctx);
+ return NULL;
+}
+
+static int damon_stat_start(void)
+{
+ damon_stat_context = damon_stat_build_ctx();
+ if (!damon_stat_context)
+ return -ENOMEM;
+ return damon_start(&damon_stat_context, 1, true);
+}
+
+static void damon_stat_stop(void)
+{
+ damon_stop(&damon_stat_context, 1);
+ damon_destroy_ctx(damon_stat_context);
+}
+
+static bool damon_stat_init_called;
+
+static int damon_stat_enabled_store(
+ const char *val, const struct kernel_param *kp)
+{
+ bool is_enabled = enabled;
+ int err;
+
+ err = kstrtobool(val, &enabled);
+ if (err)
+ return err;
+
+ if (is_enabled == enabled)
+ return 0;
+
+ if (!damon_stat_init_called)
+ /*
+ * Probably called from command line parsing (parse_args()).
+ * Cannot call damon_new_ctx() yet; let damon_stat_init() handle it.
+ */
+ return 0;
+
+ if (enabled)
+ return damon_stat_start();
+ damon_stat_stop();
+ return 0;
+}
+
+static int __init damon_stat_init(void)
+{
+ int err = 0;
+
+ damon_stat_init_called = true;
+
+ /* probably set via command line */
+ if (enabled)
+ err = damon_stat_start();
+ return err;
+}
+
+module_init(damon_stat_init);
--
2.39.5
* [RFC PATCH 2/3] mm/damon/stat: calculate and expose estimated memory bandwidth
2025-05-19 16:44 [RFC PATCH 0/3] mm/damon: introduce DAMON_STAT for access monitoring that just works SeongJae Park
2025-05-19 16:44 ` [RFC PATCH 1/3] mm/damon: introduce DAMON_STAT module SeongJae Park
@ 2025-05-19 16:44 ` SeongJae Park
2025-05-19 16:44 ` [RFC PATCH 3/3] mm/damon/stat: calculate and expose idle time percentiles SeongJae Park
2 siblings, 0 replies; 4+ messages in thread
From: SeongJae Park @ 2025-05-19 16:44 UTC
Cc: SeongJae Park, Andrew Morton, damon, kernel-team, linux-kernel,
linux-mm
The raw form of DAMON's monitoring results captures many details of
the access information. However, not every bit of the information is
always required for understanding practical access patterns.
Especially on real-world production systems of large scale in time and
size, the raw form is difficult to aggregate and compare.
Convert the raw monitoring results into a single-number metric, namely
the estimated memory bandwidth, and expose it to users as a read-only
DAMON_STAT parameter. The metric represents the access intensiveness
(hotness) of the system. It can easily be aggregated and compared for
a high level understanding of the access pattern on large systems.
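Specifically, the estimate is the sum over all monitoring regions of
the region's size multiplied by its nr_accesses, scaled from the
aggregation interval to one second. As a hypothetical worked example,
a single 1 GiB region that was found accessed during two sampling
intervals (nr_accesses == 2) of a 100 ms aggregation interval yields:

    1 GiB * 2 * (1 s / 100 ms) = 20 GiB/s

The value can be read via the read-only module parameter, e.g.:

    cat /sys/module/damon_stat/parameters/estimated_memory_bandwidth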
Signed-off-by: SeongJae Park <sj@kernel.org>
---
mm/damon/stat.c | 35 +++++++++++++++++++++++++++++++++++
1 file changed, 35 insertions(+)
diff --git a/mm/damon/stat.c b/mm/damon/stat.c
index 852848ce844e..f9ae44db265b 100644
--- a/mm/damon/stat.c
+++ b/mm/damon/stat.c
@@ -28,8 +28,42 @@ static bool enabled __read_mostly = IS_ENABLED(CONFIG_DAMON_STAT_ENABLED_DEFAULT);
module_param_cb(enabled, &enabled_param_ops, &enabled, 0600);
MODULE_PARM_DESC(enabled, "Enable or disable DAMON_STAT");
+static unsigned long estimated_memory_bandwidth __read_mostly;
+module_param(estimated_memory_bandwidth, ulong, 0400);
+MODULE_PARM_DESC(estimated_memory_bandwidth,
+ "Estimated memory bandwidth usage in bytes per second");
+
static struct damon_ctx *damon_stat_context;
+static void damon_stat_set_estimated_memory_bandwidth(struct damon_ctx *c)
+{
+ struct damon_target *t;
+ struct damon_region *r;
+ unsigned long access_bytes = 0;
+
+ damon_for_each_target(t, c) {
+ damon_for_each_region(r, t)
+ access_bytes += (r->ar.end - r->ar.start) *
+ r->nr_accesses;
+ }
+ estimated_memory_bandwidth = access_bytes * USEC_PER_MSEC *
+ MSEC_PER_SEC / c->attrs.aggr_interval;
+}
+
+static int damon_stat_after_aggregation(struct damon_ctx *c)
+{
+ static unsigned long last_refresh_jiffies;
+
+ /* avoid unnecessarily frequent stat update */
+ if (time_before_eq(jiffies, last_refresh_jiffies +
+ msecs_to_jiffies(5 * MSEC_PER_SEC)))
+ return 0;
+ last_refresh_jiffies = jiffies;
+
+ damon_stat_set_estimated_memory_bandwidth(c);
+ return 0;
+}
+
static struct damon_ctx *damon_stat_build_ctx(void)
{
struct damon_ctx *ctx;
@@ -67,6 +101,7 @@ static struct damon_ctx *damon_stat_build_ctx(void)
damon_add_target(ctx, target);
if (damon_set_region_biggest_system_ram_default(target, &start, &end))
goto free_out;
+ ctx->callback.after_aggregation = damon_stat_after_aggregation;
return ctx;
free_out:
damon_destroy_ctx(ctx);
--
2.39.5
* [RFC PATCH 3/3] mm/damon/stat: calculate and expose idle time percentiles
2025-05-19 16:44 [RFC PATCH 0/3] mm/damon: introduce DAMON_STAT for access monitoring that just works SeongJae Park
2025-05-19 16:44 ` [RFC PATCH 1/3] mm/damon: introduce DAMON_STAT module SeongJae Park
2025-05-19 16:44 ` [RFC PATCH 2/3] mm/damon/stat: calculate and expose estimated memory bandwidth SeongJae Park
@ 2025-05-19 16:44 ` SeongJae Park
2 siblings, 0 replies; 4+ messages in thread
From: SeongJae Park @ 2025-05-19 16:44 UTC
Cc: SeongJae Park, Andrew Morton, damon, kernel-team, linux-kernel,
linux-mm
Knowing how much memory is how cold can be useful for understanding
the coldness and utilization efficiency of memory. The raw form of
DAMON's monitoring results has the information. Convert the raw
results into per-byte idle time distributions and expose them to users
as a percentiles metric, namely a read-only DAMON_STAT parameter.
In detail, the metrics are calculated as follows. First, DAMON's
per-region access frequency and age information is converted into
per-byte idle times. If the access frequency of a region is higher
than zero, every byte of the region has zero idle time. If the access
frequency of a region is zero, every byte of the region has an idle
time equal to the age of the region. Then the logic sorts the per-byte
idle times and provides the values at the 0/100, 1/100, ..., 99/100 and
100/100 locations of the sorted array.
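As a hypothetical worked example, suppose monitoring found only two
regions: a 400 MiB region with nonzero access frequency, and a 600 MiB
region with zero access frequency and age 5, under a 100 ms aggregation
interval. Sorted by idle time, the hot region's 400 MiB covers the
lowest 40 percent of bytes, so percentiles 0 to 40 read 0 ms. The
remaining percentiles, 41 to 100, read the cold region's idle time,
which the implementation computes as (age + 1) aggregation intervals:
(5 + 1) * 100 ms = 600 ms.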
The metric can easily be aggregated and compared on large scale
production systems. For example, if the average 75th-percentile idle
time of machines, collected at a similar time, is two minutes, it means
that on average 25 percent of the systems' memory has not been accessed
at all for two minutes or longer. If a workload considers two minutes
as its unit work time, we can conclude that its working set size is
only 75 percent of the memory. If the system utilizes proactive
reclamation that supports coldness-based thresholds, like
DAMON_RECLAIM, the idle time percentiles can be used to find safer or
more aggressive coldness thresholds for the aimed amount of memory
saving.
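The resulting 101 values can be read as a comma-separated array via
the read-only module parameter, e.g.:

    cat /sys/module/damon_stat/parameters/memory_idle_ms_percentiles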
Signed-off-by: SeongJae Park <sj@kernel.org>
---
mm/damon/stat.c | 72 +++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 72 insertions(+)
diff --git a/mm/damon/stat.c b/mm/damon/stat.c
index f9ae44db265b..7ef13ea22221 100644
--- a/mm/damon/stat.c
+++ b/mm/damon/stat.c
@@ -33,6 +33,11 @@ module_param(estimated_memory_bandwidth, ulong, 0400);
MODULE_PARM_DESC(estimated_memory_bandwidth,
"Estimated memory bandwidth usage in bytes per second");
+static unsigned long memory_idle_ms_percentiles[101] __read_mostly = {0,};
+module_param_array(memory_idle_ms_percentiles, ulong, NULL, 0400);
+MODULE_PARM_DESC(memory_idle_ms_percentiles,
+ "Memory idle time percentiles in milliseconds");
+
static struct damon_ctx *damon_stat_context;
static void damon_stat_set_estimated_memory_bandwidth(struct damon_ctx *c)
@@ -50,6 +55,72 @@ static void damon_stat_set_estimated_memory_bandwidth(struct damon_ctx *c)
MSEC_PER_SEC / c->attrs.aggr_interval;
}
+static unsigned int damon_stat_idletime(const struct damon_region *r)
+{
+ if (r->nr_accesses)
+ return 0;
+ return r->age + 1;
+}
+
+static int damon_stat_cmp_regions(const void *a, const void *b)
+{
+ const struct damon_region *ra = *(const struct damon_region **)a;
+ const struct damon_region *rb = *(const struct damon_region **)b;
+
+ return damon_stat_idletime(ra) - damon_stat_idletime(rb);
+}
+
+static int damon_stat_sort_regions(struct damon_ctx *c,
+ struct damon_region ***sorted_ptr, int *nr_regions_ptr,
+ unsigned long *total_sz_ptr)
+{
+ struct damon_target *t;
+ struct damon_region *r;
+ struct damon_region **region_pointers;
+ unsigned int nr_regions = 0;
+ unsigned long total_sz = 0;
+
+ damon_for_each_target(t, c) {
+ /* there is only one target */
+ region_pointers = kmalloc_array(damon_nr_regions(t),
+ sizeof(*region_pointers), GFP_KERNEL);
+ if (!region_pointers)
+ return -ENOMEM;
+ damon_for_each_region(r, t) {
+ region_pointers[nr_regions++] = r;
+ total_sz += r->ar.end - r->ar.start;
+ }
+ }
+ sort(region_pointers, nr_regions, sizeof(*region_pointers),
+ damon_stat_cmp_regions, NULL);
+ *sorted_ptr = region_pointers;
+ *nr_regions_ptr = nr_regions;
+ *total_sz_ptr = total_sz;
+ return 0;
+}
+
+static void damon_stat_set_idletime_percentiles(struct damon_ctx *c)
+{
+ struct damon_region **sorted_regions, *region;
+ int nr_regions;
+ unsigned long total_sz, accounted_bytes = 0;
+ int err, i, next_percentile = 0;
+
+ err = damon_stat_sort_regions(c, &sorted_regions, &nr_regions,
+ &total_sz);
+ if (err)
+ return;
+ for (i = 0; i < nr_regions; i++) {
+ region = sorted_regions[i];
+ accounted_bytes += region->ar.end - region->ar.start;
+ while (next_percentile <= accounted_bytes * 100 / total_sz)
+ memory_idle_ms_percentiles[next_percentile++] =
+ damon_stat_idletime(region) *
+ c->attrs.aggr_interval / USEC_PER_MSEC;
+ }
+ kfree(sorted_regions);
+}
+
static int damon_stat_after_aggregation(struct damon_ctx *c)
{
static unsigned long last_refresh_jiffies;
@@ -61,6 +132,7 @@ static int damon_stat_after_aggregation(struct damon_ctx *c)
last_refresh_jiffies = jiffies;
damon_stat_set_estimated_memory_bandwidth(c);
+ damon_stat_set_idletime_percentiles(c);
return 0;
}
--
2.39.5