* [RFC PATCH v2 0/8] mm/damon: auto-tune aggregation interval
@ 2025-02-28 22:03 SeongJae Park
2025-02-28 22:03 ` [RFC PATCH v2 6/8] Docs/mm/damon/design: document for intervals auto-tuning SeongJae Park
` (2 more replies)
0 siblings, 3 replies; 4+ messages in thread
From: SeongJae Park @ 2025-02-28 22:03 UTC (permalink / raw)
Cc: SeongJae Park, Andrew Morton, Jonathan Corbet, damon, kernel-team,
linux-doc, linux-kernel, linux-mm
DAMON requires time-consuming and repetitive aggregation interval
tuning. Introduce a feature for automating it using a feedback loop
that aims for a target amount of observed access events, like
auto-exposing cameras.
Background: Access Frequency Monitoring and Aggregation Interval
================================================================
DAMON checks if each memory element (damon_region) is accessed or not
for every user-specified time interval called 'sampling interval'. It
aggregates the check results on a per-element counter called
'nr_accesses'. DAMON users can read the counters to get the access
temperature of a given element. The counters are reset for every
other user-specified time interval called 'aggregation interval'.
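
As a rough illustration of the counting described above, here is a
minimal Python sketch. It is not DAMON code; the function name and the
boolean-list input are assumptions made only for this example.

```python
def aggregate_nr_accesses(samples, samples_per_aggregation):
    """Return one nr_accesses value per completed aggregation interval.

    samples: one boolean per sampling interval; True means the access
    check found the element accessed (hypothetical input format).
    """
    snapshots = []
    nr_accesses = 0
    for i, accessed in enumerate(samples, start=1):
        if accessed:
            nr_accesses += 1
        if i % samples_per_aggregation == 0:
            snapshots.append(nr_accesses)
            nr_accesses = 0  # the counter is reset every aggregation interval
    return snapshots


# 5 ms sampling with 100 ms aggregation gives 20 checks per snapshot.
checks = [True] * 5 + [False] * 15 + [True] * 20
print(aggregate_nr_accesses(checks, 20))  # [5, 20]
```

The first snapshot reads as a fairly cold element (5 of 20 checks
positive) and the second as a fully hot one (20 of 20).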
This can be illustrated as DAMON continuously capturing a snapshot of
access events that happened and were captured within the last
aggregation interval. This implies the aggregation interval plays a key
role for the quality of the snapshots, like the camera exposure time.
If it is too short, the amount of access events that happened and were
captured for each snapshot is small, so each snapshot will show not many
interesting things but just a cold and dark world with hopefully one
pale blue dot or two. If it is too long, too many events are aggregated
in a single shot, so each snapshot will look like a world of flames, or
Muspellheim. It will be difficult to find practical insights in both
cases.
Problem: Time Consuming and Repetitive Tuning
=============================================
The appropriate length of the aggregation interval depends on how
frequently the system and workloads are making access events that DAMON
can observe. Hence, users have to tune the interval with an excessive
amount of testing on the target system and workloads. If the system or
workloads change, the tuning should be done again. If the
characteristics of the workloads are dynamic, it becomes more
challenging. It is therefore time-consuming and repetitive.
The tuning challenge mainly stems from asking the wrong question. DAMON
is not asking users what quality of monitoring results they want, but
how DAMON should operate for their hidden goal. To give the right
answer, users need to fully understand DAMON's mechanisms and the
characteristics of their workloads. Users shouldn't be asked to
understand the underlying mechanism. Understanding the characteristics
of the workloads should be the role of DAMON, not of users.
Aim-oriented Feedback-driven Auto-Tuning
=========================================
Fortunately, the appropriate length of the aggregation interval can be
inferred using a feedback loop. If the current snapshots are showing
not much interesting information, in other words, if they show only rare
access events, increasing the aggregation interval helps, and vice
versa. We tested this theory on a few real-world workloads, and
documented one of the experiences with an official DAMON monitoring
intervals tuning guideline. Since it is a simple theory that requires
repeated tries, it can be a good job for machines.
Based on the guideline's theory, we design an automation of aggregation
interval tuning, in a way similar to that of the camera auto-exposure
feature. It defines the amount of interesting information as the ratio
of access events that DAMON actually observed to the theoretical
maximum amount of them within each snapshot. Events are accounted in
byte and sampling attempts granularity. For example, let's say there is
a region of 'X' bytes size. DAMON tried access check sampling for the
region 'Y' times in total for a given aggregation. Among the 'Y'
attempts, 'Z' times it showed positive results. Then, the theoretical
maximum number of access events for the region is 'X * Y'. And the
number of access events that DAMON has observed for the region is
'X * Z'. The amount of the interesting information is
'(X * Z) / (X * Y)'. Note that each snapshot would have multiple
regions.
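
The accounting above can be sketched in Python as below. This is only
an illustration, not the kernel implementation. The 'Region' type and
its field names are hypothetical, and summing the observed and maximum
events over all regions of a snapshot is an assumption about how
multiple regions are combined.

```python
from dataclasses import dataclass


@dataclass
class Region:
    size_bytes: int   # 'X' in the text
    nr_checks: int    # 'Y': access check attempts in the aggregation
    nr_positive: int  # 'Z': attempts that showed positive results


def access_events_ratio(regions):
    """Ratio of observed access events to their theoretical maximum."""
    observed = sum(r.size_bytes * r.nr_positive for r in regions)
    maximum = sum(r.size_bytes * r.nr_checks for r in regions)
    return observed / maximum if maximum else 0.0


# One 4 KiB region that was positive in half of the checks, and one
# 12 KiB region that was never found accessed.
snapshot = [Region(4096, 20, 10), Region(12288, 20, 0)]
print(access_events_ratio(snapshot))  # 0.125
```

Because events are weighted by region size, the large idle region pulls
the ratio down to 12.5% even though the small region is fairly hot.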
Users can set an arbitrary value of the ratio as their target. Once the
target is set, the automation periodically measures the current value of
the ratio and increases or decreases the aggregation interval if the
ratio value is lower or higher than the target, respectively. The amount
of the change is proportional to the distance between the current and
the target values.
To avoid the auto-tuning going too far, let users set the minimum and
the maximum aggregation interval times. Changing only the aggregation
interval while the sampling interval is kept makes the maximum level of
access frequency in each snapshot, or discernment of regions,
inconsistent. Also, an unnecessarily short sampling interval causes
meaningless monitoring overhead. The automation therefore adjusts the
sampling interval together with the aggregation interval, while keeping
the ratio between the two intervals. Users can set the ratio, or the
discernment.
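
One tuning step of such a feedback loop could look like the Python
sketch below. The exact adjustment formula is not given in this cover
letter, so the proportional rule used here (scale the interval by
1 + (target - current) / target) is an assumption for illustration
only, as are the function and parameter names. Ratios are given in
basis points (1/10,000) and intervals in microseconds.

```python
def tune_intervals(sample_us, samples_per_aggr, current_bp, target_bp,
                   min_sample_us, max_sample_us):
    """Return (sampling, aggregation) intervals after one tuning step."""
    # Positive when too few events were observed (lengthen the intervals),
    # negative when too many were observed (shorten them).
    distance = (target_bp - current_bp) / target_bp
    new_sample_us = round(sample_us * (1 + distance))
    # Keep the result within the user-set sampling interval range.
    new_sample_us = max(min_sample_us, min(max_sample_us, new_sample_us))
    # The aggregation interval follows, so the two intervals keep their ratio.
    return new_sample_us, new_sample_us * samples_per_aggr


# Observed 2% against a 4% target: both intervals grow by 50%.
print(tune_intervals(5_000, 20, 200, 400, 5_000, 20_000_000))
# (7500, 150000)

# Observed 8% against a 4% target: clamped at the minimum.
print(tune_intervals(5_000, 20, 800, 400, 5_000, 20_000_000))
# (5000, 100000)
```

Clamping only the sampling interval is enough here because the
aggregation interval is always derived from it.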
Discussion
==========
The modified question (aimed amount of access events, or lights, in each
snapshot) is easy to answer for both the users and the kernel. If users
are interested in finding more cold regions, the value should be lower,
and vice versa. If users have no idea, the kernel can suggest a fair
default value based on some theories and experiments. For example,
based on the Pareto principle (80/20 rule), we could expect a 20% target
ratio will capture 80% of real access events. Since 80% might be too
high, applying the rule once again, 4% (20% * 20%) may capture about 64%
(80% * 80%) of real access events.
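
The arithmetic of applying the 80/20 rule repeatedly can be checked with
a few lines of Python. The helper name is hypothetical, and exact
fractions are used only to avoid floating point rounding noise.

```python
from fractions import Fraction


def apply_pareto(times):
    """Return (source share, outcome share) after applying 80/20 'times' times."""
    source = Fraction(20, 100) ** times
    outcome = Fraction(80, 100) ** times
    return source, outcome


for n in (1, 2):
    source, outcome = apply_pareto(n)
    print(f"{n}x: {float(source):.0%} of sources -> {float(outcome):.0%} of outcomes")
# 1x: 20% of sources -> 80% of outcomes
# 2x: 4% of sources -> 64% of outcomes
```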
The sampling to aggregation intervals ratio and the min/max aggregation
intervals are also arguably easy to answer. What users want is
discernment of regions for efficient system operation, for example, X
amount of colder regions or Y amount of warmer regions, not exactly how
many times each cache line is accessed at nanosecond granularity. The
appropriate min/max aggregation intervals can be set relatively naively,
and may be better set for the aimed monitoring overhead. Since the
sampling interval directly decides the overhead, setting the range based
on the sampling interval can be easy. From my experience, I'd argue
that an intervals ratio of 0.05 and a sampling interval range of 5
milliseconds to 20 seconds (100 milliseconds to 400 seconds aggregation
interval) can be a good default suggestion.
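
For reference, the aggregation interval range in the suggestion above
follows directly from the sampling interval range and the intervals
ratio. A small Python check, with a hypothetical helper name, treating
the 0.05 ratio as 20 samples per aggregation:

```python
def aggregation_range_us(min_sample_us, max_sample_us, samples_per_aggr):
    """Aggregation interval range implied by a sampling interval range."""
    return min_sample_us * samples_per_aggr, max_sample_us * samples_per_aggr


# 5 ms to 20 s sampling, 20 samples per aggregation (ratio 0.05).
print(aggregation_range_us(5_000, 20_000_000, 20))
# (100000, 400000000), that is, 100 ms to 400 s
```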
Evaluation
==========
We confirmed the tuning works as expected with a few simple workloads
including kernel builds and an in-memory caching representative
benchmark[1]. We will conduct more evaluations with more workloads and
share the results with more details by the time that we drop the RFC
tag.
Changelog
=========
Changes from RFC v1
(https://lore.kernel.org/20250213014438.145611-1-sj@kernel.org)
- Replace the target metric from positive samples ratio to
DAMON-observed access samples ratio
- Fix wrong max events accounting bug
- Fix double-increase of next_aggregation_sis
SeongJae Park (8):
mm/damon: add data structure for monitoring intervals auto-tuning
mm/damon/core: implement intervals auto-tuning
mm/damon/sysfs: implement intervals tuning goal directory
mm/damon/sysfs: commit intervals tuning goal
mm/damon/sysfs: implement a command to update auto-tuned monitoring
intervals
Docs/mm/damon/design: document for intervals auto-tuning
Docs/ABI/damon: document intervals auto-tuning ABI
Docs/admin-guide/mm/damon/usage: add intervals_goal directory on the
hierarchy
.../ABI/testing/sysfs-kernel-mm-damon | 30 +++
Documentation/admin-guide/mm/damon/usage.rst | 25 ++
Documentation/mm/damon/design.rst | 50 ++++
include/linux/damon.h | 43 ++++
mm/damon/core.c | 98 ++++++++
mm/damon/sysfs.c | 216 ++++++++++++++++++
6 files changed, 462 insertions(+)
base-commit: 9e7d9145ab8ce407acc540fc29133c471bc29046
--
2.39.5
* [RFC PATCH v2 6/8] Docs/mm/damon/design: document for intervals auto-tuning
2025-02-28 22:03 [RFC PATCH v2 0/8] mm/damon: auto-tune aggregation interval SeongJae Park
@ 2025-02-28 22:03 ` SeongJae Park
2025-02-28 22:03 ` [RFC PATCH v2 8/8] Docs/admin-guide/mm/damon/usage: add intervals_goal directory on the hierarchy SeongJae Park
2025-02-28 22:09 ` [RFC PATCH v2 0/8] mm/damon: auto-tune aggregation interval SeongJae Park
2 siblings, 0 replies; 4+ messages in thread
From: SeongJae Park @ 2025-02-28 22:03 UTC (permalink / raw)
Cc: SeongJae Park, Andrew Morton, Jonathan Corbet, damon, kernel-team,
linux-doc, linux-kernel, linux-mm
Document the design of DAMON sampling and aggregation intervals
auto-tuning.
Signed-off-by: SeongJae Park <sj@kernel.org>
---
Documentation/mm/damon/design.rst | 46 +++++++++++++++++++++++++++++++
1 file changed, 46 insertions(+)
diff --git a/Documentation/mm/damon/design.rst b/Documentation/mm/damon/design.rst
index ffea744e4889..0cc9f6441354 100644
--- a/Documentation/mm/damon/design.rst
+++ b/Documentation/mm/damon/design.rst
@@ -313,6 +313,10 @@ sufficient for the given purpose, it shouldn't be unnecessarily further
lowered. It is recommended to be set proportional to ``aggregation interval``.
By default, the ratio is set as ``1/20``, and it is still recommended.
+Based on the manual tuning guide, DAMON provides a more intuitive, knob-based
+intervals auto-tuning mechanism. Please refer to :ref:`the design document of
+the feature <damon_design_monitoring_intervals_autotuning>` for details.
+
Refer to below documents for an example tuning based on the above guide.
.. toctree::
@@ -321,6 +325,48 @@ Refer to below documents for an example tuning based on the above guide.
monitoring_intervals_tuning_example
+.. _damon_design_monitoring_intervals_autotuning:
+
+Monitoring Intervals Auto-tuning
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+DAMON provides automatic tuning of the ``sampling interval`` and ``aggregation
+interval`` based on :ref:`the tuning guide idea
+<damon_design_monitoring_params_tuning_guide>`. The tuning mechanism allows
+users to set the aimed amount of access events to observe via DAMON within a
+given time interval. The target can be specified by the user as a ratio of
+DAMON-observed access events to the theoretical maximum amount of the events
+(``access_bp``) that is measured within a given number of aggregations
+(``aggrs``).
+
+The DAMON-observed access events are calculated in byte granularity based on
+DAMON :ref:`region assumption <damon_design_region_based_sample>`. For
+example, if a region of size ``X`` bytes of ``Y`` ``nr_accesses`` is found, it
+means ``X * Y`` access events are observed by DAMON. The theoretical maximum
+access events for the region is calculated in the same way, but replacing
+``Y`` with the theoretical maximum ``nr_accesses``, which can be calculated as
+``aggregation interval / sampling interval``.
+
+The mechanism calculates the ratio of access events for ``aggrs`` aggregations,
+and increases or decreases the ``sampling interval`` and ``aggregation
+interval`` in the same ratio, if the observed access ratio is lower or higher
+than the target, respectively. The ratio of the intervals change is decided in
+proportion to the distance between the current ratio and the target ratio.
+
+The user can further set the minimum and maximum ``sampling interval`` that can
+be set by the tuning mechanism using two parameters (``min_sample_us`` and
+``max_sample_us``). Because the tuning mechanism always changes ``sampling
+interval`` and ``aggregation interval`` in the same ratio, the minimum and
+maximum ``aggregation interval`` after each of the tuning changes are
+automatically set together.
+
+The tuning is turned off by default and needs to be set explicitly by the user.
+As a rule of thumb from the Pareto principle, a 4% access samples ratio target
+is recommended. Note that the Pareto principle (80/20 rule) is applied twice.
+That is, assume a 4% (20% of 20%) DAMON-observed access events ratio (source)
+to capture 64% (80% multiplied by 80%) of real access events (outcomes).
+
+
.. _damon_design_damos:
Operation Schemes
--
2.39.5
* [RFC PATCH v2 8/8] Docs/admin-guide/mm/damon/usage: add intervals_goal directory on the hierarchy
2025-02-28 22:03 [RFC PATCH v2 0/8] mm/damon: auto-tune aggregation interval SeongJae Park
2025-02-28 22:03 ` [RFC PATCH v2 6/8] Docs/mm/damon/design: document for intervals auto-tuning SeongJae Park
@ 2025-02-28 22:03 ` SeongJae Park
2025-02-28 22:09 ` [RFC PATCH v2 0/8] mm/damon: auto-tune aggregation interval SeongJae Park
2 siblings, 0 replies; 4+ messages in thread
From: SeongJae Park @ 2025-02-28 22:03 UTC (permalink / raw)
Cc: SeongJae Park, Andrew Morton, Jonathan Corbet, damon, kernel-team,
linux-doc, linux-kernel, linux-mm
Document DAMON sysfs interface usage for DAMON sampling and aggregation
intervals auto-tuning.
Signed-off-by: SeongJae Park <sj@kernel.org>
---
Documentation/admin-guide/mm/damon/usage.rst | 25 ++++++++++++++++++++
Documentation/mm/damon/design.rst | 4 ++++
2 files changed, 29 insertions(+)
diff --git a/Documentation/admin-guide/mm/damon/usage.rst b/Documentation/admin-guide/mm/damon/usage.rst
index 4b25c25d4f4f..8f01ad8792e7 100644
--- a/Documentation/admin-guide/mm/damon/usage.rst
+++ b/Documentation/admin-guide/mm/damon/usage.rst
@@ -64,6 +64,7 @@ comma (",").
│ │ │ │ :ref:`0 <sysfs_context>`/avail_operations,operations
│ │ │ │ │ :ref:`monitoring_attrs <sysfs_monitoring_attrs>`/
│ │ │ │ │ │ intervals/sample_us,aggr_us,update_us
+ │ │ │ │ │ │ │ intervals_goal/access_bp,aggrs,min_sample_us,max_sample_us
│ │ │ │ │ │ nr_regions/min,max
│ │ │ │ │ :ref:`targets <sysfs_targets>`/nr_targets
│ │ │ │ │ │ :ref:`0 <sysfs_target>`/pid_target
@@ -132,6 +133,11 @@ Users can write below commands for the kdamond to the ``state`` file.
- ``off``: Stop running.
- ``commit``: Read the user inputs in the sysfs files except ``state`` file
again.
+- ``update_tuned_intervals``: Update the contents of ``sample_us`` and
+ ``aggr_us`` files of the kdamond with the auto-tuning applied ``sampling
+ interval`` and ``aggregation interval`` for the files. Please refer to
+ :ref:`intervals_goal section <damon_usage_sysfs_monitoring_intervals_goal>`
+ for more details.
- ``commit_schemes_quota_goals``: Read the DAMON-based operation schemes'
:ref:`quota goals <sysfs_schemes_quota_goals>`.
- ``update_schemes_stats``: Update the contents of stats files for each
@@ -213,6 +219,25 @@ writing to and rading from the files.
For more details about the intervals and monitoring regions range, please refer
to the Design document (:doc:`/mm/damon/design`).
+.. _damon_usage_sysfs_monitoring_intervals_goal:
+
+contexts/<N>/monitoring_attrs/intervals/intervals_goal/
+-------------------------------------------------------
+
+Under the ``intervals`` directory, one directory for automated tuning of
+``sample_us`` and ``aggr_us``, named ``intervals_goal``, also exists.
+Under the directory, four files for the auto-tuning control, namely
+``access_bp``, ``aggrs``, ``min_sample_us`` and ``max_sample_us``, exist.
+Please refer to the :ref:`design document of the feature
+<damon_design_monitoring_intervals_autotuning>` for the internals of the tuning
+mechanism. Reading and writing the four files under the ``intervals_goal``
+directory shows and updates the tuning parameters that are described in the
+:ref:`design doc <damon_design_monitoring_intervals_autotuning>` with the same
+names. The tuning starts with the user-set ``sample_us`` and ``aggr_us``. The
+tuning-applied current values of the two intervals can be read from the
+``sample_us`` and ``aggr_us`` files after writing ``update_tuned_intervals`` to
+the ``state`` file.
+
.. _sysfs_targets:
contexts/<N>/targets/
diff --git a/Documentation/mm/damon/design.rst b/Documentation/mm/damon/design.rst
index 0cc9f6441354..0cf678d98b1b 100644
--- a/Documentation/mm/damon/design.rst
+++ b/Documentation/mm/damon/design.rst
@@ -366,6 +366,10 @@ is recommended. Note that the Pareto principle (80/20 rule) is applied twice.
That is, assume a 4% (20% of 20%) DAMON-observed access events ratio (source)
to capture 64% (80% multiplied by 80%) of real access events (outcomes).
+To know how user-space can use this feature via :ref:`DAMON sysfs interface
+<sysfs_interface>`, refer to the :ref:`intervals_goal
+<damon_usage_sysfs_monitoring_intervals_goal>` part of the documentation.
+
.. _damon_design_damos:
--
2.39.5
* Re: [RFC PATCH v2 0/8] mm/damon: auto-tune aggregation interval
2025-02-28 22:03 [RFC PATCH v2 0/8] mm/damon: auto-tune aggregation interval SeongJae Park
2025-02-28 22:03 ` [RFC PATCH v2 6/8] Docs/mm/damon/design: document for intervals auto-tuning SeongJae Park
2025-02-28 22:03 ` [RFC PATCH v2 8/8] Docs/admin-guide/mm/damon/usage: add intervals_goal directory on the hierarchy SeongJae Park
@ 2025-02-28 22:09 ` SeongJae Park
2 siblings, 0 replies; 4+ messages in thread
From: SeongJae Park @ 2025-02-28 22:09 UTC (permalink / raw)
To: SeongJae Park
Cc: Andrew Morton, Jonathan Corbet, damon, kernel-team, linux-doc,
linux-kernel, linux-mm
On Fri, 28 Feb 2025 14:03:20 -0800 SeongJae Park <sj@kernel.org> wrote:
> DAMON requires time-consuming and repetitive aggregation interval
> tuning. Introduce a feature for automating it using a feedback loop
> that aims an amount of observed access events, like auto-exposing
> cameras.
[...]
> Evaluation
> ==========
>
> We confirmed the tuning works as expected with a few simple workloads
> including kernel builds and an in-memory caching representative
> benchmark[1].
Forgot to add the link to the benchmark, sorry. It is
https://github.com/facebookresearch/DCPerf/blob/main/packages/tao_bench/README.md
Thanks,
SJ
[...]