* Two simple ideas for DAMON accuracy improvement
@ 2024-10-26 21:53 SeongJae Park
2025-01-18 1:47 ` SeongJae Park
2025-02-13 22:23 ` SeongJae Park
From: SeongJae Park @ 2024-10-26 21:53 UTC
To: damon; +Cc: SeongJae Park, kernel-team
Hello DAMON community,
There have been a number of much-appreciated questions, concerns, and
improvement ideas around the accuracy of DAMON's monitoring output. I have
always admitted that DAMON has much room for improvement, but I was a bit wary
of changes for some reasons. I now think that caused an unnecessarily long
delay. Sorry about that. I now want to invest some time in the topic, so I am
starting by sharing the two simple ideas below.
User-defined Regions Split Factor
---------------------------------
DAMON's "Adaptive Regions Adjustment (ARA)" mechanism splits each region into
randomly sized sub-regions, checks their access temperature, and merges back
adjacent regions that have similar temperature. The split factor is hard-coded
as two. Increasing the number makes DAMON regions converge to the right shape
more quickly. However, it also makes the number of DAMON regions in usual
situations higher, and therefore induces more overhead. The number of regions
will still be kept under the user-defined upper limit (max_nr_regions), though.
The optimum value of the split factor would depend on the use case. We will
therefore add another knob to let users set the factor at runtime. The default
value will be two, so this will not introduce any regression or behavioral
change for existing users.
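A minimal sketch of what a user-defined split factor could mean, written in
Python for illustration only. This is not the kernel implementation: the
function name split_region(), the min_size alignment, and the way cut points
are chosen are all assumptions made for this example.

```python
import random


def split_region(start, end, factor=2, min_size=4096, rng=random):
    """Split [start, end) into up to `factor` randomly sized sub-regions.

    Hypothetical helper. Cut points are aligned to `min_size`, and the
    region is left untouched if it is too small to be split.
    """
    units = (end - start) // min_size
    pieces = min(factor, units)
    if pieces < 2:
        return [(start, end)]
    # Pick pieces-1 distinct interior cut points, in units of min_size.
    cuts = sorted(rng.sample(range(1, units), pieces - 1))
    bounds = [start] + [start + c * min_size for c in cuts] + [end]
    return list(zip(bounds[:-1], bounds[1:]))


# The default factor of two mirrors the current behavior; a larger factor
# yields more, smaller sub-regions per split.
print(split_region(0, 64 * 4096))
print(split_region(0, 64 * 4096, factor=5))
```

A larger factor produces more candidate boundaries per split, which is why
convergence can be faster, at the cost of more regions to monitor until the
next merge.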
Periodic Fine-grain Split of Aged Regions
-----------------------------------------
If a region is continuously changing its boundary and access temperature, it
means either the region is converging or the access pattern of the workload is
not stabilized. In either case, this is a healthy signal.
If a region consistently shows the same access pattern for a long time, it may
be because the access pattern is stabilized and the region has correctly
converged. However, it might also be because the access pattern has changed
but the convergence is slow.
To avoid overly slow convergence of aged regions, we will let users
periodically increase the split factor for regions that have kept their current
access pattern for a long time (high 'age'). Users will be able to set the
'age' threshold, the split factor for the aged regions, and the time interval
between the periodic fine-grain splits of the regions. For example, users can
ask DAMON to "split regions that have kept their current access pattern for ten
minutes or longer into five sub-regions every minute".
The feature will be disabled unless users explicitly set those parameters, so
it does not introduce any regression or behavioral change for existing users.
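The three parameters described above can be sketched as a small policy object.
This is an illustration under assumptions, not a proposed interface: the names
AgedSplitPolicy and split_factor_for() are hypothetical, and ages are expressed
in seconds here for readability, while DAMON itself may count 'age' in other
units.

```python
from dataclasses import dataclass


@dataclass
class AgedSplitPolicy:
    """Hypothetical container for the three user-set parameters."""
    min_age_s: float    # 'age' threshold for treating a region as aged
    split_factor: int   # number of sub-regions for aged regions
    interval_s: float   # time between periodic fine-grain splits


def split_factor_for(region_age_s, now_s, last_fine_split_s, policy,
                     default_factor=2):
    """Return the split factor to use for one region at time `now_s`."""
    if policy is None:
        # Feature not configured: behave exactly as before.
        return default_factor
    due = now_s - last_fine_split_s >= policy.interval_s
    if due and region_age_s >= policy.min_age_s:
        return policy.split_factor
    return default_factor


# "Split regions that have kept their access pattern for ten minutes or
# longer into five sub-regions every minute."
policy = AgedSplitPolicy(min_age_s=600, split_factor=5, interval_s=60)
print(split_factor_for(700, now_s=120, last_fine_split_s=60, policy=policy))
```

Passing policy=None models the case where users did not set the parameters, so
the default factor is always used.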
Discussions
-----------
Someone might worry that these add too many knobs. As I shared in the
long-term plan at the last LPC[1], we will keep supporting the new knobs in the
long term, and may introduce an auto-tuning feature in the future. By making
these user-tunable first, we can collect experiment results and use them for
future improvements. In any case, by design these changes will not introduce
any regression or behavioral change for existing users, so I believe they are
safe to add.
One of the factors that slowed my work on this topic was the absence of a
formal DAMON accuracy evaluation method. Using damon-tests, we were able to do
the evaluation by drawing heatmaps of test workloads and comparing those from
different versions of DAMON. Comparing the results of several DAMOS schemes on
test workloads was another way to do that. But those are not formal. We still
don't have a formal way to evaluate accuracy. However, the two features will
introduce no regression for existing users, so I believe this is the path
forward for now.
I believe implementing the features would not be difficult. So unless someone
voluntarily steps up, I will start implementing the features, targeting the
v6.14 merge window.
I'm looking forward to any comments.
[1] https://lpc.events/event/18/contributions/1768/
Thanks,
SJ
* Re: Two simple ideas for DAMON accuracy improvement
From: SeongJae Park @ 2025-01-18 1:47 UTC
To: SeongJae Park; +Cc: damon, kernel-team
On Sat, 26 Oct 2024 14:53:11 -0700 SeongJae Park <sj@kernel.org> wrote:
[...]
> I believe implementing the features would not be difficult. So unless someone
> voluntarily steps up, I will start implementing the features, targeting the
> v6.14 merge window.
TL;DR: The features will not be available in v6.14. Please speak up if you
want the features to be prioritized.
I still think the above features can be useful. But I was unable to make any
progress on them due to other prioritized work. Linux v6.14 may be released
this upcoming Sunday, so I don't think we can make it. Sorry if you were
waiting for the features.
It's also not clear if we will make it for the next major release. I'm now
trying to prioritize the monitoring intervals auto-tuning[1]. Please speak up
if you want me to prioritize the development of the features, or step up if you
want to implement some of them on your own. I think the first feature
(user-defined regions split factor) could be a good task for a DAMON beginner.
[1] https://lore.kernel.org/20241202175459.2005526-1-sj@kernel.org
Thanks,
SJ
[...]
* Re: Two simple ideas for DAMON accuracy improvement
From: SeongJae Park @ 2025-02-13 22:23 UTC
To: SeongJae Park; +Cc: damon, kernel-team
On Sat, 26 Oct 2024 14:53:11 -0700 SeongJae Park <sj@kernel.org> wrote:
[...]
>
> Periodic Fine-grain Split of Aged Regions
> -----------------------------------------
>
> If a region is continuously changing its boundary and access temperature, it
> means either the region is converging or the access pattern of the workload is
> not stabilized. In either case, this is a healthy signal.
>
> If a region consistently shows the same access pattern for a long time, it may
> be because the access pattern is stabilized and the region has correctly
> converged. However, it might also be because the access pattern has changed
> but the convergence is slow.
>
> To avoid overly slow convergence of aged regions, we will let users
> periodically increase the split factor for regions that have kept their current
> access pattern for a long time (high 'age'). Users will be able to set the
> 'age' threshold, the split factor for the aged regions, and the time interval
> between the periodic fine-grain splits of the regions. For example, users can
> ask DAMON to "split regions that have kept their current access pattern for ten
> minutes or longer into five sub-regions every minute".
This means that users need to answer three questions: 1) how frequently, 2)
for regions of what age, and 3) into how many sub-regions the splitting should
be done. These seem too difficult to answer. To make it simpler, while still
preserving the effect of the original idea, I'd like to adjust the idea as
follows:
"Periodically split regions without limiting the number of resulting
sub-regions per region, while keeping the aimed total number of regions after
the split."
For example, if there are three regions of different sizes, slice any region
any number of times as long as it results in a total of six regions. Where to
slice will be random, but should be uniformly distributed at a large scale, to
avoid too much bias. Keeping the distance between the slicing lines the same
and randomizing only the first line's position could be the simplest
implementation.
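The "equally spaced slicing lines with a random first position" idea can be
sketched as below. This is only an illustration under assumptions: the function
name uniform_split() is hypothetical, regions are plain (start, end) tuples,
and the slicing lines are laid over the regions as if they were concatenated.
A line that lands exactly on an existing region boundary makes no new region,
so the result has at most the aimed number of regions.

```python
import random


def uniform_split(regions, target_nr, rng=random):
    """Slice sorted, non-overlapping (start, end) regions toward target_nr.

    Hypothetical helper. Slicing lines are equally spaced over the total
    monitored size; only the position of the first line is randomized.
    """
    nr_cuts = target_nr - len(regions)
    total = sum(end - start for start, end in regions)
    if nr_cuts <= 0 or total < nr_cuts:
        return list(regions)
    step = total // nr_cuts        # distance between slicing lines
    first = rng.randrange(step)    # random position of the first line
    cuts = [first + i * step for i in range(nr_cuts)]

    result = []
    base = 0   # offset of the current region within the concatenated space
    idx = 0    # next slicing line to place
    for start, end in regions:
        size = end - start
        lo = start
        while idx < len(cuts) and cuts[idx] < base + size:
            addr = start + (cuts[idx] - base)
            if addr > lo:   # a line on the region's start makes no new region
                result.append((lo, addr))
                lo = addr
            idx += 1
        result.append((lo, end))
        base += size
    return result


# Three regions of different sizes, aiming for six regions in total.
print(uniform_split([(0, 100), (100, 400), (400, 1000)], target_nr=6))
```

Because the lines are equally spaced over the whole monitored size, larger
regions receive proportionally more slices, and a region smaller than the
spacing may receive none.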
The updated scheme asks users only how frequently the new split method needs to
be used, reducing the number of questions from three to one. Obviously one
question is easier to answer than three.
Huge regions will be split more finely than now, so what we wanted to achieve
with the original version of this idea is still achieved. Unlike the original
version, it will do the fine splitting even for young huge regions. But the
unnecessary splits will be reverted by the following regions merge. Small
regions may not be split with the new approach, which can slow down the
convergence of small regions. But the next split operation will do the usual
per-region split, and users can set the frequency of the new split method.
This will make micro-targeted region splitting difficult. But such
micro-targeting is challenging for users anyway. If they really know the
answers, they can reshape regions as they want by committing target regions
online.
Please let me know if you have any concerns or questions about this updated
idea.
Thanks,
SJ
[...]