From: SeongJae Park
To: David Rientjes
Cc: SeongJae Park, Johannes Weiner, Dave Hansen, Andrew Morton,
 Jonathan.Cameron@huawei.com, amit@kernel.org, benh@kernel.crashing.org,
 corbet@lwn.net, david@redhat.com, dwmw@amazon.com, elver@google.com,
 foersleo@amazon.de, gthelen@google.com, markubo@amazon.de,
 shakeelb@google.com, baolin.wang@linux.alibaba.com,
 guoqing.jiang@linux.dev, xhao@linux.alibaba.com, hanyihao@vivo.com,
 changbin.du@gmail.com, kuba@kernel.org, rongwei.wang@linux.alibaba.com,
 rikard.falkeborn@gmail.com, geert@linux-m68k.org, kilobyte@angband.pl,
 linux-damon@amazon.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC PLAN] Some humble ideas for DAMON future works
Date: Fri, 28 Jan 2022 13:41:46 +0000
Message-Id: <20220128134146.5379-1-sj@kernel.org>
In-Reply-To: <7afca3b5-626a-8356-aa73-b378f5aa7a3c@google.com>

Hello David,


Thank you so much for the great comments!
On Sun, 23 Jan 2022 14:48:35 -0800 (PST) David Rientjes wrote:

> On Wed, 19 Jan 2022, SeongJae Park wrote:
> 
> > User-space Policy or In-kernel Policy?  Both.
> > =============================================
> > 
> > When discussing kernel-involved system efficiency optimizations, I see
> > two kinds of people who have slightly different opinions.  The first
> > party prefers to implement only simple but efficient mechanisms in the
> > kernel and export them to user space, so that users can build smart
> > user space policies.  Meanwhile, the second party prefers that the
> > kernel just works.  I agree with both parties.
> > 
> 
> Thanks for starting this discussion, SeongJae, and kicking it off with
> all of your roadmap thoughts.  It's very helpful.
> 
> I would love for this to turn into an active discussion amongst those
> people who are currently looking into using DAMON for their set of
> interests and also those who are investigating how its current set of
> support can be adapted for their use cases.

Glad to hear that, and same to me!

> 
> For discussion on where the kernel and userspace boundary lies for
> policy decisions, I think it depends heavily on (1) the specific
> subcomponent of the mm subsystem being discussed (I don't think this
> boundary will be the same for all areas, and it can/will evolve over
> time), and (2) the difference between the base out-of-the-box behavior
> that Linux provides for everybody and the elaborate support that some
> users need for efficiency or performance.  This is going to be very
> different for things like hugepage optimizations and memory compaction,
> for example.

Agreed.

> 
> > I think the first opinion makes sense, as there is some valuable
> > information that only user space can know.
> > I think only such approaches could achieve the ultimate efficiency in
> > such cases.  I also agree with the second party, though, because there
> > could be some people who don't have the special information that only
> > their applications know, or the resources to do the additional work.
> > Simple in-kernel policies will still be beneficial for some users,
> > even though they are sub-optimal compared to a highly tuned user space
> > policy, as long as they provide some extent of efficiency gain and no
> > regressions for most cases.
> > 
> > I'd like to help both.  For that reason, I made DAMON an in-kernel
> > mechanism for both user and kernel-space policies.  It provides a
> > highly tunable, general user space interface to help the first party.
> > It also provides in-kernel policies built on top of DAMON using its
> > kernel-space API for specific common use cases, with conservative
> > default parameters that are assumed to incur no regression but some
> > extent of benefit in most cases, namely DAMON-based proactive
> > reclamation.  I will continue pursuing both ways.
> > 
> 
> Are you referring only to root userspace here or are you including
> non-root userspace?
> 
> Imagine a process that is willing to accept the cpu overhead of doing
> thp collapse for portions of its memory in process context rather than
> waiting for khugepaged, and that we had a mechanism (discussed later)
> for doing that in the kernel.  The non-root user in this case would
> need the ability to monitor regions of its own heap, for example, and
> disregard others.  The malloc implementation wants to answer the
> question of "what regions of my heap are accessed very frequently?" so
> that we can do hugepage optimizations.
> 
> Do you see that the user will have the ability to fork off a DAMON
> context?
> kdamond could be attached to a cpu cgroup to charge the cpu overhead of
> doing this monitoring, and the time spent applying any actions to that
> memory, to that workload on a multi-tenant machine.
> 
> I think it would be useful to discuss the role of non-root userspace
> for future DAMON support.

Interesting points.  I haven't considered the case in detail so far, but
I think your idea definitely makes sense.  I was also thinking we might
end up having DAMON syscalls in the future.  Those might make more sense
as the interface for the non-root userspace.

> 
> > Imaginable DAMON-based Policies
> > ===============================
> > 
> > I'd like to start by listing some imaginable data access-aware
> > operation policies that I hope to eventually see made.  The list will
> > hopefully shed light on how DAMON should evolve to efficiently support
> > the policies.
> > 
> > DAMON-based Proactive LRU-pages (de)Activation
> > ----------------------------------------------
> > 
> > The reclamation mechanism, which selects reclaim targets using the
> > active/inactive LRU lists, sometimes doesn't work well.  According to
> > my previous work, providing access pattern-based hints can
> > significantly improve performance under memory pressure[1,2].
> > 
> > Proactive reclamation is known to be useful for many memory-intensive
> > systems, and now we have a DAMON-based implementation of it[3].
> > However, proactive reclamation wouldn't be so welcome on systems
> > having a high cost of I/O.  Also, even when the system runs proactive
> > reclamation, memory pressure can still occasionally be triggered.
> > 
> > My idea for helping this situation is manipulating the order of pages
> > in the LRU lists using DAMON-provided monitoring results.
> > That is, make DAMON proactively find hot/cold memory regions and move
> > pages of the hot regions to the head of the active list, while moving
> > pages of the cold regions to the tail of the inactive list.  This will
> > help eventual reclamation under memory pressure evict cold pages
> > first, and thus incur fewer additional page faults.
> > 
> 
> Let's add Johannes Weiner into this discussion as well, since we had
> previously discussed persistent background ordering of the lru lists
> based on hotness and coldness of memory before.  This discussion had
> happened before DAMON was merged upstream, so now that DAMON has landed
> it is likely an area that he's interested in.
> 
> One gotcha with the above might be the handling of MADV_FREE memory
> that we want to lazily free under memory pressure.  Userspace has
> indicated that we can free this memory whenever necessary, so the
> kernel implementation moves this memory to the inactive lru regardless
> of any hotness or coldness of the memory.  In other words, this memory
> *can* have very high access frequencies in the short-term, and then
> it's madvised with MADV_FREE by userspace to free if we encounter
> memory pressure.  It seems like this needs to override the
> DAMON-provided monitoring results, since userspace just knows better in
> certain scenarios.

Interesting case.  Similarly, some of our customers told us that they
don't want DAMON_RECLAIM to affect some special processes that manage
their memory in their own highly optimized way.  To deal with this kind
of situation, I'm thinking about adding a sort of deny-list to
DAMON-based Operation Schemes.  With that, users would be able to
specify to which memory regions (e.g., memory regions of a specific
process, cgroup, pfn range, ...) each scheme shouldn't apply its action.
So, in this case, the program would be able to avoid DAMON's
interference by adding the region to which it will apply 'MADV_FREE' to
the deny-list of the relevant DAMON-based operation scheme before
calling 'madvise()'.

> 
> > [1] https://www.usenix.org/conference/hotstorage19/presentation/park
> > [2] https://linuxplumbersconf.org/event/4/contributions/548/
> > [3] https://docs.kernel.org/admin-guide/mm/damon/reclaim.html
> > 
> > DAMON-based THP Coalesce/Split
> > ------------------------------
> > 
> > THP is known to significantly improve performance, but also to
> > increase memory footprint[1].  We can minimize the memory overhead
> > while preserving the performance benefit by asking DAMON to provide
> > MADV_HUGEPAGE-like hints for hot memory regions of >= 2MiB size, and
> > MADV_NOHUGEPAGE-like hints for cold memory regions.  Our experimental
> > user space policy implementation[2] of this idea removes 76.15% of THP
> > memory waste while preserving 51.25% of the THP speedup in total.
> > 
> 
> This is a very interesting area to explore, and turns out to be very
> timely as well.  We'll soon be proposing the MADV_COLLAPSE support that
> we discussed here[1] and was well received.
> 
> One thought here is that with DAMON we can create a scheme to apply a
> DAMOS_COLLAPSE action on very hot memory in the monitoring region that
> would simply call into the new MADV_COLLAPSE code to allow us to do a
> synchronous collapse in process context.  With the current DAMON
> support, this seems very straight-forward once we have MADV_COLLAPSE.
> 
> [1] https://lore.kernel.org/all/d098c392-273a-36a4-1a29-59731cdf5d3d@google.com/

I also agree that synchronous and speedy THP collapse/split will be
helpful for application-specific usage, and what you described above is
almost the same as what I'm planning to do.  Looking forward to the
MADV_COLLAPSE patch!
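As an aside, the selection part of such a DAMOS_COLLAPSE-style scheme
can be sketched in user space like below.  The region fields mirror
what DAMON reports (start, end, nr_accesses, age), but the helper names
and threshold values are made up for illustration; a real scheme would
express this via DAMOS' target access pattern parameters, not code:

```python
# Illustrative only: pick the regions a DAMOS_COLLAPSE-style scheme
# would act on, i.e., regions big enough to hold a THP whose access
# pattern is hot and has stayed hot for a while.  The thresholds below
# are arbitrary examples, not DAMON defaults.
from dataclasses import dataclass

HUGEPAGE_SIZE = 2 * 1024 * 1024  # 2MiB

@dataclass
class Region:
    start: int
    end: int
    nr_accesses: int  # accesses observed in the last aggregation interval
    age: int          # aggregation intervals the pattern has persisted

    @property
    def size(self):
        return self.end - self.start

def collapse_candidates(regions, min_accesses=15, min_age=5):
    """Hot, stable regions that can hold at least one hugepage."""
    return [r for r in regions
            if r.size >= HUGEPAGE_SIZE
            and r.nr_accesses >= min_accesses
            and r.age >= min_age]

regions = [
    Region(0x7f0000000000, 0x7f0000400000, nr_accesses=18, age=9),   # hot, 4MiB
    Region(0x7f0000400000, 0x7f0000500000, nr_accesses=19, age=9),   # hot, 1MiB
    Region(0x7f0000500000, 0x7f0000d00000, nr_accesses=0,  age=30),  # cold, 8MiB
]
picked = collapse_candidates(regions)
print([hex(r.start) for r in picked])  # only the hot region of >= 2MiB
```

Note that the 1MiB region is skipped even though it is hot, matching
the '>= 2MiB size' condition above; the collapse itself would then be
done by the action calling into the MADV_COLLAPSE code.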
BTW, I'm also interested in system-wide long-term usage of the policy;
the slow speed of khugepaged seems not a big problem for that case.  I
also recently started thinking it might be simpler and make more sense
to do the proactive work for only collapse or only split, rather than
both together, depending on the system's THP setup.  For example, if
'/sys/kernel/mm/transparent_hugepage/enabled' is set to 'always', we
could simply find cold memory regions and mark them with NOHUGEPAGE
hints under memory pressure, to reduce the memory bloat caused by THP
internal fragmentation.  If the file is set to 'madvise', we could find
hot memory regions and mark them with HUGEPAGE hints while the free
memory ratio of the system is high, to reduce TLB misses.

> 
> > [1] https://www.usenix.org/conference/osdi16/technical-sessions/presentation/kwon
> > [2] https://damonitor.github.io/doc/html/v34/vm/damon/eval.html
> > 
> > DAMON-based Tiered Memory (Pro|De)motion
> > ----------------------------------------
> > 
> > In tiered memory systems utilizing DRAM and PMEM[1], we can promote
> > hot pages to DRAM and demote cold pages to PMEM using DAMON.  A patch
> > allowing access-aware demotion user space policy development has
> > already been submitted[2] by Baolin.
> > 
> 
> Thanks for this, it's very useful.  Is it possible to point to any data
> on how responsive the promotion side can be to recent memory accesses?
> It seems like we'll need to promote that memory quite quickly to not
> suffer long-lived performance degradations if we're treating DRAM and
> PMEM as schedulable memory.
> 
> DAMON provides us with a framework so that we have complete control
> over the efficiency of scanning PMEM for possible promotion candidates.
> But I'd be very interested in seeing any data from Baolin (or anybody
> else) on just how responsive the promotion side can be.

Totally agreed.
I think the idea makes some sense, but it's useless without data.  This
has been on my TODO list since the beginning of DAMON, but I was too
lazy to do it; I couldn't even start surveying the test environment
setup yet.  Also, this is not yet prioritized among the other tasks in
my TODO list.  If anyone could provide any PoC results or easy-to-start
DRAM/PMEM test machine setup instructions, that would be great.  Long
story short, I don't have any data.  Any data or help will be welcome.

> 
> > [1] https://www.intel.com/content/www/us/en/products/details/memory-storage/optane-memory.html
> > [2] https://lore.kernel.org/linux-mm/cover.1640171137.git.baolin.wang@linux.alibaba.com/
> > 
> > DAMON-based Proactive Compaction
> > --------------------------------
> > 
> > Compaction uses its migration scanner to find migration source pages.
> > Hot pages are more likely to be unmovable than cold pages, so it would
> > be better to try migrating cold pages first.  DAMON could be used
> > here.  That is, proactively monitor accesses via DAMON and start
> > compaction so that the migration scanner scans cold memory ranges
> > first.  I should admit I'm not familiar with the compaction code and
> > have no PoC data for this, just the groundless idea.
> > 
> 
> Is compaction enlightenment for DAMON a high priority at this point, or
> would AutoNUMA be a more interesting candidate?
> 
> Today, AutoNUMA works with a sliding window, setting page tables to
> have PROT_NONE permissions so that we induce a page fault and can
> determine which cpu is accessing potentially remote memory
> (task_numa_work()).  If that's happening, we can migrate the memory to
> the home NUMA node so that we can avoid those remote memory accesses
> and the increased latency that they induce.
I might be missing something from your point, but what I hope to
achieve with this is faster and more successful defragmentation, not a
reduction of remote-NUMA accesses.  Applying DAMON to enlighten
AutoNUMA is also an interesting idea, though.

> 
> Idea: if we enlightened task_numa_work() to prioritize hot memory using
> DAMON, it *seems* like this would be most effective rather than relying
> on a sliding window.  We want to migrate memory that is frequently
> being accessed to reduce the remote memory access latency; we only get
> a minimal improvement (mostly only node balancing) for memory that is
> rarely accessed.
> 
> I'm somewhat surprised this isn't one of the highest priorities,
> actually, for being enlightened with DAMON support, so it feels like
> I'm missing something obvious.
> 
> Let's also add Dave Hansen into the thread for the above two sections
> (memory tiering and AutoNUMA) because I know he's thought about both.

AutoNUMA would need to know not only how frequently the memory regions
are accessed, but also which CPUs are accessing them, and how
frequently.  Currently, DAMON doesn't provide the CPU-related
information.  I think DAMON could be extended for the case by using the
page fault mechanism as its basic access check primitive instead of
Accessed bits.  It would be doable, as DAMON is designed to support
such extensions, but it would need some additional effort.

> 
> > How We Can Implement These
> > --------------------------
> > 
> > Implementing most of the above mentioned policies wouldn't be too
> > difficult, because we have DAMON-based Operation Schemes (DAMOS).
> > That is, we will need to implement some more DAMOS actions for each
> > policy.  Some existing kernel functions can be reused.  Such actions
> > would include LRU (de)activation, THP coalesce/split hints, memory
> > (pro|de)motion, and cold-pages-first scanning compaction.
> > Then, supporting those actions via the user space interface will
> > allow implementing user space policies.  If we find reasonably good
> > default DAMOS parameters and some kernel-side control mechanism, we
> > can further turn those into kernel policies in the form of, say,
> > builtin modules.
> > 
> > How DAMON Should Be Evolved For Supporting Those
> > ================================================
> > 
> > Let's discuss what kind of changes in DAMON will be needed to
> > efficiently support the above mentioned policies.
> > 
> > Simultaneously Monitoring Different Types of Address Spaces
> > -----------------------------------------------------------
> > 
> > It would be better to run all the above mentioned policies
> > simultaneously on a single system.  As some policies, such as
> > LRU-pages (de)activation, would better run on the physical address
> > space, while others, such as THP coalesce/split, would need to run on
> > virtual address spaces, DAMON should support concurrently monitoring
> > different address spaces.  We can always do this by creating one DAMON
> > context for each address space and running them all.  However, as the
> > address spaces would conflict, the contexts would interfere with each
> > other.  The current idea for avoiding this is allowing multiple DAMON
> > contexts to run on a single thread, forcing them to have the same
> > monitoring contexts.
> > 
> > Online Parameters Updates
> > -------------------------
> > 
> > Someone would also want to dynamically turn each policy on/off and/or
> > tune it.  This is impossible with current DAMON, because it prohibits
> > updating any parameter while it is running.  We disallow online
> > parameter updates mainly because we want to avoid doing additional
> > synchronization between the running kdamond and the parameters
> > updater.
> > The idea for supporting the use case while avoiding the additional
> > synchronization is allowing users to pause DAMON and update the
> > parameters while it is paused.
> > 
> > A Better DAMON interface
> > ------------------------
> > 
> > DAMON currently exposes its major functionality to user space via
> > debugfs.  After all, DAMON is not only for debugging.  Also, this
> > makes the interface depend on debugfs unnecessarily, and it is
> > considered unreliable.  Further, the interface is quite inflexible for
> > future extension.  I admit it was not a good choice.
> > 
> > It would be better to implement another, reliable and easily
> > extensible interface, and deprecate the debugfs one.  The idea is
> > exposing the interface via sysfs, using hierarchical Kobjects under
> > mm_kobject.  For example, the usage would be something like below:
> > 
> > # cd /sys/kernel/mm/damon
> > # echo 1 > nr_kdamonds
> > # echo 1 > kdamond_1/contexts/nr_contexts
> > # echo va > kdamond_1/contexts/context_1/target_type
> > # echo 1 > kdamond_1/contexts/context_1/targets/nr_targets
> > # echo $(pidof <workload>) > \
> >        kdamond_1/contexts/context_1/targets/target_1/pid
> > # echo Y > monitor_on
> > 
> > The underlying files hierarchy could be something like below.
> > 
> > /sys/kernel/mm/damon/
> > │ monitor_on
> > │ kdamonds
> > │ │ nr_kdamonds
> > │ │ kdamond_1/
> > │ │ │ kdamond_pid
> > │ │ │ contexts
> > │ │ │ │ nr_contexts
> > │ │ │ │ context_1/
> > │ │ │ │ │ target_type (va | pa)
> > │ │ │ │ │ attrs/
> > │ │ │ │ │ │ intervals/sampling,aggr,update
> > │ │ │ │ │ │ nr_regions/min,max
> > │ │ │ │ │ targets/
> > │ │ │ │ │ │ nr_targets
> > │ │ │ │ │ │ target_1/
> > │ │ │ │ │ │ │ pid
> > │ │ │ │ │ │ │ init_regions/
> > │ │ │ │ │ │ │ │ region1/
> > │ │ │ │ │ │ │ │ │ start,end
> > │ │ │ │ │ │ │ │ ...
> > │ │ │ │ │ │ ...
> > │ │ │ │ │ schemes/
> > │ │ │ │ │ │ nr_schemes
> > │ │ │ │ │ │ scheme_1/
> > │ │ │ │ │ │ │ action
> > │ │ │ │ │ │ │ target_access_pattern/
> > │ │ │ │ │ │ │ │ sz/min,max
> > │ │ │ │ │ │ │ │ nr_accesses/min,max
> > │ │ │ │ │ │ │ │ age/min,max
> > │ │ │ │ │ │ │ quotas/
> > │ │ │ │ │ │ │ │ ms,bytes,reset_interval
> > │ │ │ │ │ │ │ │ prioritization_weights/
> > │ │ │ │ │ │ │ │ │ sz,nr_accesses,age
> > │ │ │ │ │ │ │ watermarks/
> > │ │ │ │ │ │ │ │ metric,check_interval,high,mid,low
> > │ │ │ │ │ │ │ stats/
> > │ │ │ │ │ │ │ │ quota_exceeds
> > │ │ │ │ │ │ │ │ tried/nr,sz
> > │ │ │ │ │ │ │ │ applied/nr,sz
> > │ │ │ │ │ │ ...
> > │ │ │ │ ...
> > │ │ ...
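(A side note on the 'quotas' and 'prioritization_weights' files above:
the intent is that, when the quota cannot cover every region matching a
scheme, regions get scored by a weighted mix of their size, access
frequency, and age, and the action is applied to higher-scored regions
first.  The normalization ranges and the linear mix in the sketch below
are purely illustrative, not the in-kernel computation.)

```python
# Illustrative only: a DAMOS-style priority score mixing region size,
# access frequency, and age under user-given weights.  The value ranges
# used for normalization and the linear combination are examples, not
# what the kernel actually computes.
def priority(region, weights, max_sz, max_nr_accesses, max_age):
    """Return a score in [0, 100]; under a tight quota, regions with
    higher scores would receive the scheme's action first."""
    w_sz, w_nr, w_age = weights
    score = (w_sz * region['sz'] / max_sz
             + w_nr * region['nr_accesses'] / max_nr_accesses
             + w_age * region['age'] / max_age)
    return score / (w_sz + w_nr + w_age) * 100

weights = (0, 60, 40)  # ignore size; favor hotness over stability
hot_young = {'sz': 2 << 20, 'nr_accesses': 20, 'age': 2}
warm_old  = {'sz': 2 << 20, 'nr_accesses': 10, 'age': 10}
for r in (hot_young, warm_old):
    print(round(priority(r, weights,
                         max_sz=1 << 30, max_nr_accesses=20, max_age=10), 1))
```

With these example weights, the long-lived warm region slightly
outscores the hot but short-lived one (70.0 vs 68.0); that kind of
trade-off is what the weight knobs are meant to let users express.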
> > 
> > More DAMON Future Works
> > =======================
> > 
> > In addition to the above mentioned things, there are many works to
> > do.  It would be better to extend DAMON with support for more use
> > cases and address spaces, including page granularity, idleness-only,
> > read/write-only, page cache-only, and cgroup monitoring support.
> > 
> 
> Cgroup support is very interesting, so that we do not need to
> constantly maintain a list of target_ids when a job forks new
> processes.  We've discussed the potential for passing a cgroup inode as
> the target, rather than a pid, for virtual address monitoring that
> would operate over the set of processes attached to that cgroup
> hierarchy.  Is this what you imagine for cgroup support, or something
> more elaborate (or something different entirely :)?

You're correct.  That's my current idea.

> 
> > Also, it would be valuable to improve the accuracy of monitoring,
> > using some adaptive monitoring attributes tuning or some new fancy
> > idea[1].
> > 
> > DAMOS could also be improved by utilizing its own autotuning feature,
> > for example, by monitoring PSI and other metrics related to the given
> > action.
> > 
> > [1] https://linuxplumbersconf.org/event/11/contributions/984/
> > 
> 
> I'd like to add another topic here: DAMON-based monitoring for
> virtualized workloads.  Today, it seems like you'd need to run DAMON in
> the guest to be able to describe its working set.  Monitoring the
> hypervisor process is inadequate because it will reveal the first
> access to the guest-owned memory, but not the accesses done by the
> guest itself.  So it seems like the *current* support for virtual
> address monitoring is insufficient unless the guest is enlightened to
> do DAMON monitoring itself.
> 
> What about unenlightened guests?  An idea is a third DAMON monitoring
> mode that monitors accesses in the EPT.
> Have you thought about this before, or about other ways to monitor
> memory accesses for an *unenlightened* guest?  Would love to have a
> discussion on this.

We're checking guest-internal accesses via 'mmu_notifier_clear_young()',
and I confirmed[1] that it allows DAMON on the host to collect data
accesses in KVM/QEMU guests.  I'm not heavily testing/using the case,
though.

[1] https://lore.kernel.org/linux-mm/CALvZod61Dx4emiV5H73mQcFN6WvmD4A2Z=sRfmN2qpBh3R-_kQ@mail.gmail.com/

Thank you again for the great comments, David.  If you have more
questions/comments, or you find anything I'm missing, please let me
know.


Thanks,
SJ