From: SeongJae Park
To: David Rientjes
Cc: SeongJae Park, Johannes Weiner, Dave Hansen, Andrew Morton,
 Jonathan.Cameron@huawei.com, amit@kernel.org, benh@kernel.crashing.org,
 corbet@lwn.net, david@redhat.com, dwmw@amazon.com, elver@google.com,
 foersleo@amazon.de, gthelen@google.com, markubo@amazon.de,
 shakeelb@google.com, baolin.wang@linux.alibaba.com,
 guoqing.jiang@linux.dev, xhao@linux.alibaba.com, hanyihao@vivo.com,
 changbin.du@gmail.com, kuba@kernel.org, rongwei.wang@linux.alibaba.com,
 rikard.falkeborn@gmail.com, geert@linux-m68k.org, kilobyte@angband.pl,
 linux-damon@amazon.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC PLAN] Some humble ideas for DAMON future works
Date: Fri, 28 Jan 2022 13:41:46 +0000
Message-Id: <20220128134146.5379-1-sj@kernel.org>
In-Reply-To: <7afca3b5-626a-8356-aa73-b378f5aa7a3c@google.com>

Hello David,


Thank you so much for the great comments!
On Sun, 23 Jan 2022 14:48:35 -0800 (PST) David Rientjes wrote:

> On Wed, 19 Jan 2022, SeongJae Park wrote:
> 
> > User-space Policy or In-kernel Policy?  Both.
> > =============================================
> > 
> > When discussing kernel-involved system efficiency optimizations, I see
> > two kinds of people who have slightly different opinions.  The first
> > party prefers to implement only simple but efficient mechanisms in the
> > kernel and export them to user space, so that users can build smart
> > user space policies.  Meanwhile, the second party prefers that the
> > kernel just works.  I agree with both parties.
> > 
> 
> Thanks for starting this discussion, SeongJae, and kicking it off with
> all of your roadmap thoughts.  It's very helpful.
> 
> I would love for this to turn into an active discussion amongst those
> people who are currently looking into using DAMON for their set of
> interests and also those who are investigating how its current set of
> support can be adapted for their use cases.

Glad to hear that, and same to me!

> 
> For discussion on where the kernel and userspace boundary lies for
> policy decisions, I think it depends heavily on (1) the specific
> subcomponent of the mm subsystem being discussed (I don't think this
> boundary will be the same for all areas, and it can/will evolve over
> time), and (2) the difference between the base out-of-the-box behavior
> that Linux provides for everybody and the elaborate support that some
> users need for efficiency or performance.  This is going to be very
> different for things like hugepage optimizations and memory compaction,
> for example.

Agreed.

> 
> > I think the first opinion makes sense, as there is some valuable
> > information that only user space can know.
> > I think only such approaches could achieve the ultimate efficiency in
> > such cases.  I also agree with the second party, though, because there
> > could be some people who don't have the special information that only
> > their applications know, or the resources to do the additional work.
> > Simple in-kernel policies will still be beneficial for some users,
> > even though they are sub-optimal compared to a highly tuned user space
> > policy, as long as they provide some extent of efficiency gain and no
> > regressions for most cases.
> > 
> > I'd like to help both.  For that reason, I made DAMON an in-kernel
> > mechanism for both user and kernel-space policies.  It provides a
> > highly tunable, general user space interface to help the first party.
> > It also provides in-kernel policies built on top of DAMON using its
> > kernel-space API for specific common use cases, with conservative
> > default parameters that are assumed to incur no regression but some
> > extent of benefit in most cases, namely DAMON-based proactive
> > reclamation.  I will continue pursuing both ways.
> > 
> 
> Are you referring only to root userspace here or are you including
> non-root userspace?
> 
> Imagine a process that is willing to accept the cpu overhead of doing
> thp collapse for portions of its memory in process context rather than
> waiting for khugepaged, and that we had a mechanism (discussed later)
> for doing that in the kernel.  The non-root user in this case would
> need the ability to monitor regions of its own heap, for example, and
> disregard others.  The malloc implementation wants to answer the
> question of "what regions of my heap are accessed very frequently?" so
> that we can do hugepage optimizations.
> 
> Do you see that the user will have the ability to fork off a DAMON
> context?
> kdamond could be attached to a cpu cgroup to charge the cpu overhead of
> doing this monitoring, and the time spent applying any actions to that
> memory, to that workload on a multi-tenant machine.
> 
> I think it would be useful to discuss the role of non-root userspace
> for future DAMON support.

Interesting points.  I haven't considered the case in detail so far, but
I think your idea definitely makes sense.  I was also thinking we might
end up having DAMON syscalls in the future.  Those might make more sense
as the interface for the non-root userspace.

> 
> > Imaginable DAMON-based Policies
> > ===============================
> > 
> > I'd like to start by listing some imaginable data access-aware
> > operation policies that I hope to eventually see made.  The list will
> > hopefully shed light on how DAMON should evolve to efficiently support
> > the policies.
> > 
> > DAMON-based Proactive LRU-pages (de)Activation
> > ----------------------------------------------
> > 
> > The reclamation mechanism, which selects reclaim targets using the
> > active/inactive LRU lists, sometimes doesn't work well.  According to
> > my previous work, providing access pattern-based hints can
> > significantly improve performance under memory pressure[1,2].
> > 
> > Proactive reclamation is known to be useful for many memory-intensive
> > systems, and now we have a DAMON-based implementation of it[3].
> > However, proactive reclamation wouldn't be so welcome on systems
> > having a high cost of I/O.  Also, even when the system runs proactive
> > reclamation, memory pressure can still occasionally be triggered.
> > 
> > My idea for helping this situation is manipulating the order of pages
> > in the LRU lists using DAMON-provided monitoring results.
> > That is, make DAMON proactively find hot/cold memory regions and move
> > pages of the hot regions to the head of the active list, while moving
> > pages of the cold regions to the tail of the inactive list.  This will
> > help eventual reclamation under memory pressure evict cold pages
> > first, and thus incur fewer additional page faults.
> > 
> 
> Let's add Johannes Weiner into this discussion as well, since we had
> previously discussed persistent background ordering of the lru lists
> based on hotness and coldness of memory before.  This discussion had
> happened before DAMON was merged upstream, so now that DAMON has landed
> it is likely an area that he's interested in.
> 
> One gotcha with the above might be the handling of MADV_FREE memory
> that we want to lazily free under memory pressure.  Userspace has
> indicated that we can free this memory whenever necessary, so the
> kernel implementation moves this memory to the inactive lru regardless
> of any hotness or coldness of the memory.  In other words, this memory
> *can* have very high access frequencies in the short-term, and then
> it's madvised with MADV_FREE by userspace to free if we encounter
> memory pressure.  It seems like this needs to override the
> DAMON-provided monitoring results, since userspace just knows better in
> certain scenarios.

Interesting case.  Similarly, some of our customers told us that they
don't want DAMON_RECLAIM to affect some special processes that manage
their memory in their own highly optimized way.  To deal with this kind
of situation, I'm thinking about adding a sort of deny-list to
DAMON-based Operation Schemes.  With that, users would be able to
specify to which memory regions (e.g., memory regions of a specific
process, cgroup, pfn range, ...) each scheme shouldn't apply its action.
So, in this case, the program would be able to avoid DAMON's
interference by adding the region to which it will apply 'MADV_FREE' to
the deny-list of the relevant DAMON-based operation scheme before
calling 'madvise()'.

> 
> > [1] https://www.usenix.org/conference/hotstorage19/presentation/park
> > [2] https://linuxplumbersconf.org/event/4/contributions/548/
> > [3] https://docs.kernel.org/admin-guide/mm/damon/reclaim.html
> > 
> > DAMON-based THP Coalesce/Split
> > ------------------------------
> > 
> > THP is known to significantly improve performance, but also to
> > increase memory footprint[1].  We can minimize the memory overhead
> > while preserving the performance benefit by asking DAMON to provide
> > MADV_HUGEPAGE-like hints for hot memory regions of >= 2MiB size, and
> > MADV_NOHUGEPAGE-like hints for cold memory regions.  Our experimental
> > user space policy implementation[2] of this idea removes 76.15% of THP
> > memory waste while preserving 51.25% of the THP speedup in total.
> > 
> 
> This is a very interesting area to explore, and turns out to be very
> timely as well.  We'll soon be proposing the MADV_COLLAPSE support that
> we discussed here[1] and was well received.
> 
> One thought here is that with DAMON we can create a scheme to apply a
> DAMOS_COLLAPSE action on very hot memory in the monitoring region that
> would simply call into the new MADV_COLLAPSE code to allow us to do a
> synchronous collapse in process context.  With the current DAMON
> support, this seems very straight-forward once we have MADV_COLLAPSE.
> 
> [1] https://lore.kernel.org/all/d098c392-273a-36a4-1a29-59731cdf5d3d@google.com/

I also agree that synchronous and speedy THP collapse/split will be
helpful for application-specific usage, and what you described above is
almost the same as what I'm planning to do.  Looking forward to the
MADV_COLLAPSE patch!
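As an aside, the selection part of such a DAMOS_COLLAPSE-style scheme
can be sketched in user space like below.  The region fields mirror
what DAMON reports (start, end, nr_accesses, age), but the helper names
and threshold values are made up for illustration; a real scheme would
express this via DAMOS' target access pattern parameters, not code:

```python
# Illustrative only: pick the regions a DAMOS_COLLAPSE-style scheme
# would act on, i.e., regions big enough to hold a THP whose access
# pattern is hot and has stayed hot for a while.  The thresholds below
# are arbitrary examples, not DAMON defaults.
from dataclasses import dataclass

HUGEPAGE_SIZE = 2 * 1024 * 1024  # 2MiB

@dataclass
class Region:
    start: int
    end: int
    nr_accesses: int  # accesses observed in the last aggregation interval
    age: int          # aggregation intervals the pattern has persisted

    @property
    def size(self):
        return self.end - self.start

def collapse_candidates(regions, min_accesses=15, min_age=5):
    """Hot, stable regions that can hold at least one hugepage."""
    return [r for r in regions
            if r.size >= HUGEPAGE_SIZE
            and r.nr_accesses >= min_accesses
            and r.age >= min_age]

regions = [
    Region(0x7f0000000000, 0x7f0000400000, nr_accesses=18, age=9),   # hot, 4MiB
    Region(0x7f0000400000, 0x7f0000500000, nr_accesses=19, age=9),   # hot, 1MiB
    Region(0x7f0000500000, 0x7f0000d00000, nr_accesses=0,  age=30),  # cold, 8MiB
]
picked = collapse_candidates(regions)
print([hex(r.start) for r in picked])  # only the hot region of >= 2MiB
```

Note that the 1MiB region is skipped even though it is hot, matching
the '>= 2MiB size' condition above; the collapse itself would then be
done by the action calling into the MADV_COLLAPSE code.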
BTW, I'm also interested in system-wide long-term usage of the policy;
the slow speed of khugepaged seems not a big problem for that case.  I
also recently started thinking it might be simpler and make more sense
to do the proactive work for only collapse or only split, rather than
both together, depending on the system's THP setup.  For example, if
'/sys/kernel/mm/transparent_hugepage/enabled' is set to 'always', we
could simply find cold memory regions and mark them with NOHUGEPAGE
hints under memory pressure, to reduce the memory bloat caused by THP
internal fragmentation.  If the file is set to 'madvise', we could find
hot memory regions and mark them with HUGEPAGE hints while the free
memory ratio of the system is high, to reduce TLB misses.

> 
> > [1] https://www.usenix.org/conference/osdi16/technical-sessions/presentation/kwon
> > [2] https://damonitor.github.io/doc/html/v34/vm/damon/eval.html
> > 
> > DAMON-based Tiered Memory (Pro|De)motion
> > ----------------------------------------
> > 
> > In tiered memory systems utilizing DRAM and PMEM[1], we can promote
> > hot pages to DRAM and demote cold pages to PMEM using DAMON.  A patch
> > allowing access-aware demotion user space policy development has
> > already been submitted[2] by Baolin.
> > 
> 
> Thanks for this, it's very useful.  Is it possible to point to any data
> on how responsive the promotion side can be to recent memory accesses?
> It seems like we'll need to promote that memory quite quickly to not
> suffer long-lived performance degradations if we're treating DRAM and
> PMEM as schedulable memory.
> 
> DAMON provides us with a framework so that we have complete control
> over the efficiency of scanning PMEM for possible promotion candidates.
> But I'd be very interested in seeing any data from Baolin (or anybody
> else) on just how responsive the promotion side can be.

Totally agreed.
I think the idea makes some sense, but it's useless without data.  This
has been on my TODO list since the beginning of DAMON, but I was too
lazy to do it; I couldn't even start surveying the test environment
setup yet.  Also, this is not yet prioritized among the other tasks in
my TODO list.  If anyone could provide any PoC results or easy-to-start
DRAM/PMEM test machine setup instructions, that would be great.  Long
story short, I don't have any data.  Any data or help will be welcome.

> 
> > [1] https://www.intel.com/content/www/us/en/products/details/memory-storage/optane-memory.html
> > [2] https://lore.kernel.org/linux-mm/cover.1640171137.git.baolin.wang@linux.alibaba.com/
> > 
> > DAMON-based Proactive Compaction
> > --------------------------------
> > 
> > Compaction uses its migration scanner to find migration source pages.
> > Hot pages are more likely to be unmovable than cold pages, so it would
> > be better to try migrating cold pages first.  DAMON could be used
> > here.  That is, proactively monitor accesses via DAMON and start
> > compaction so that the migration scanner scans cold memory ranges
> > first.  I should admit I'm not familiar with the compaction code and
> > have no PoC data for this, just the groundless idea.
> > 
> 
> Is compaction enlightenment for DAMON a high priority at this point, or
> would AutoNUMA be a more interesting candidate?
> 
> Today, AutoNUMA works with a sliding window, setting page tables to
> have PROT_NONE permissions so that we induce a page fault and can
> determine which cpu is accessing potentially remote memory
> (task_numa_work()).  If that's happening, we can migrate the memory to
> the home NUMA node so that we can avoid those remote memory accesses
> and the increased latency that they induce.
I might be missing something from your point, but what I hope to
achieve with this is faster and more successful defragmentation, not a
reduction of remote-NUMA accesses.  Applying DAMON to enlighten
AutoNUMA is also an interesting idea, though.

> 
> Idea: if we enlightened task_numa_work() to prioritize hot memory using
> DAMON, it *seems* like this would be most effective rather than relying
> on a sliding window.  We want to migrate memory that is frequently
> being accessed to reduce the remote memory access latency; we only get
> a minimal improvement (mostly only node balancing) for memory that is
> rarely accessed.
> 
> I'm somewhat surprised this isn't one of the highest priorities,
> actually, for being enlightened with DAMON support, so it feels like
> I'm missing something obvious.
> 
> Let's also add Dave Hansen into the thread for the above two sections
> (memory tiering and AutoNUMA) because I know he's thought about both.

AutoNUMA would need to know not only how frequently the memory regions
are accessed, but also which CPUs are accessing them, and how
frequently.  Currently, DAMON doesn't provide the CPU-related
information.  I think DAMON could be extended for the case by using the
page fault mechanism as its basic access check primitive instead of
Accessed bits.  It would be doable, as DAMON is designed to support
such extensions, but it would need some additional effort.

> 
> > How We Can Implement These
> > --------------------------
> > 
> > Implementing most of the above mentioned policies wouldn't be too
> > difficult, because we have DAMON-based Operation Schemes (DAMOS).
> > That is, we will need to implement some more DAMOS actions for each
> > policy.  Some existing kernel functions can be reused.  Such actions
> > would include LRU (de)activation, THP coalesce/split hints, memory
> > (pro|de)motion, and cold-pages-first scanning compaction.
> > Then, supporting those actions via the user space interface will
> > allow implementing user space policies.  If we find reasonably good
> > default DAMOS parameters and some kernel-side control mechanism, we
> > can further turn those into kernel policies in the form of, say,
> > builtin modules.
> > 
> > How DAMON Should Be Evolved For Supporting Those
> > ================================================
> > 
> > Let's discuss what kind of changes in DAMON will be needed to
> > efficiently support the above mentioned policies.
> > 
> > Simultaneously Monitoring Different Types of Address Spaces
> > -----------------------------------------------------------
> > 
> > It would be better to run all the above mentioned policies
> > simultaneously on a single system.  As some policies, such as
> > LRU-pages (de)activation, would better run on the physical address
> > space, while others, such as THP coalesce/split, would need to run on
> > virtual address spaces, DAMON should support concurrently monitoring
> > different address spaces.  We can always do this by creating one DAMON
> > context for each address space and running them all.  However, as the
> > address spaces would conflict, the contexts would interfere with each
> > other.  The current idea for avoiding this is allowing multiple DAMON
> > contexts to run on a single thread, forcing them to have the same
> > monitoring contexts.
> > 
> > Online Parameters Updates
> > -------------------------
> > 
> > Someone would also want to dynamically turn each policy on/off and/or
> > tune it.  This is impossible with current DAMON, because it prohibits
> > updating any parameter while it is running.  We disallow online
> > parameter updates mainly because we want to avoid doing additional
> > synchronization between the running kdamond and the parameters
> > updater.
> > The idea for supporting the use case while avoiding the additional
> > synchronization is allowing users to pause DAMON and update the
> > parameters while it is paused.
> > 
> > A Better DAMON interface
> > ------------------------
> > 
> > DAMON currently exposes its major functionality to user space via
> > debugfs.  After all, DAMON is not only for debugging.  Also, this
> > makes the interface depend on debugfs unnecessarily, and it is
> > considered unreliable.  Further, the interface is quite inflexible for
> > future extension.  I admit it was not a good choice.
> > 
> > It would be better to implement another, reliable and easily
> > extensible interface, and deprecate the debugfs one.  The idea is
> > exposing the interface via sysfs, using hierarchical Kobjects under
> > mm_kobject.  For example, the usage would be something like below:
> > 
> > # cd /sys/kernel/mm/damon
> > # echo 1 > nr_kdamonds
> > # echo 1 > kdamond_1/contexts/nr_contexts
> > # echo va > kdamond_1/contexts/context_1/target_type
> > # echo 1 > kdamond_1/contexts/context_1/targets/nr_targets
> > # echo $(pidof <workload>) > \
> >        kdamond_1/contexts/context_1/targets/target_1/pid
> > # echo Y > monitor_on
> > 
> > The underlying files hierarchy could be something like below.
> > 
> > /sys/kernel/mm/damon/
> > │ monitor_on
> > │ kdamonds
> > │ │ nr_kdamonds
> > │ │ kdamond_1/
> > │ │ │ kdamond_pid
> > │ │ │ contexts
> > │ │ │ │ nr_contexts
> > │ │ │ │ context_1/
> > │ │ │ │ │ target_type (va | pa)
> > │ │ │ │ │ attrs/
> > │ │ │ │ │ │ intervals/sampling,aggr,update
> > │ │ │ │ │ │ nr_regions/min,max
> > │ │ │ │ │ targets/
> > │ │ │ │ │ │ nr_targets
> > │ │ │ │ │ │ target_1/
> > │ │ │ │ │ │ │ pid
> > │ │ │ │ │ │ │ init_regions/
> > │ │ │ │ │ │ │ │ region1/
> > │ │ │ │ │ │ │ │ │ start,end
> > │ │ │ │ │ │ │ │ ...
> > │ │ │ │ │ │ ...
> > │ │ │ │ │ schemes/
> > │ │ │ │ │ │ nr_schemes
> > │ │ │ │ │ │ scheme_1/
> > │ │ │ │ │ │ │ action
> > │ │ │ │ │ │ │ target_access_pattern/
> > │ │ │ │ │ │ │ │ sz/min,max
> > │ │ │ │ │ │ │ │ nr_accesses/min,max
> > │ │ │ │ │ │ │ │ age/min,max
> > │ │ │ │ │ │ │ quotas/
> > │ │ │ │ │ │ │ │ ms,bytes,reset_interval
> > │ │ │ │ │ │ │ │ prioritization_weights/
> > │ │ │ │ │ │ │ │ │ sz,nr_accesses,age
> > │ │ │ │ │ │ │ watermarks/
> > │ │ │ │ │ │ │ │ metric,check_interval,high,mid,low
> > │ │ │ │ │ │ │ stats/
> > │ │ │ │ │ │ │ │ quota_exceeds
> > │ │ │ │ │ │ │ │ tried/nr,sz
> > │ │ │ │ │ │ │ │ applied/nr,sz
> > │ │ │ │ │ │ ...
> > │ │ │ │ ...
> > │ │ ...
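(A side note on the 'quotas' and 'prioritization_weights' files above:
the intent is that, when the quota cannot cover every region matching a
scheme, regions get scored by a weighted mix of their size, access
frequency, and age, and the action is applied to higher-scored regions
first.  The normalization ranges and the linear mix in the sketch below
are purely illustrative, not the in-kernel computation.)

```python
# Illustrative only: a DAMOS-style priority score mixing region size,
# access frequency, and age under user-given weights.  The value ranges
# used for normalization and the linear combination are examples, not
# what the kernel actually computes.
def priority(region, weights, max_sz, max_nr_accesses, max_age):
    """Return a score in [0, 100]; under a tight quota, regions with
    higher scores would receive the scheme's action first."""
    w_sz, w_nr, w_age = weights
    score = (w_sz * region['sz'] / max_sz
             + w_nr * region['nr_accesses'] / max_nr_accesses
             + w_age * region['age'] / max_age)
    return score / (w_sz + w_nr + w_age) * 100

weights = (0, 60, 40)  # ignore size; favor hotness over stability
hot_young = {'sz': 2 << 20, 'nr_accesses': 20, 'age': 2}
warm_old  = {'sz': 2 << 20, 'nr_accesses': 10, 'age': 10}
for r in (hot_young, warm_old):
    print(round(priority(r, weights,
                         max_sz=1 << 30, max_nr_accesses=20, max_age=10), 1))
```

With these example weights, the long-lived warm region slightly
outscores the hot but short-lived one (70.0 vs 68.0); that kind of
trade-off is what the weight knobs are meant to let users express.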
> > 
> > More DAMON Future Works
> > =======================
> > 
> > In addition to the above mentioned things, there are many works to
> > do.  It would be better to extend DAMON with support for more use
> > cases and address spaces, including page granularity, idleness-only,
> > read/write-only, page cache-only, and cgroup monitoring support.
> > 
> 
> Cgroup support is very interesting, so that we do not need to
> constantly maintain a list of target_ids when a job forks new
> processes.  We've discussed the potential for passing a cgroup inode as
> the target, rather than a pid, for virtual address monitoring that
> would operate over the set of processes attached to that cgroup
> hierarchy.  Is this what you imagine for cgroup support, or something
> more elaborate (or something different entirely :)?

You're correct.  That's my current idea.

> 
> > Also, it would be valuable to improve the accuracy of monitoring,
> > using some adaptive monitoring attributes tuning or some new fancy
> > idea[1].
> > 
> > DAMOS could also be improved by utilizing its own autotuning feature,
> > for example, by monitoring PSI and other metrics related to the given
> > action.
> > 
> > [1] https://linuxplumbersconf.org/event/11/contributions/984/
> > 
> 
> I'd like to add another topic here: DAMON-based monitoring for
> virtualized workloads.  Today, it seems like you'd need to run DAMON in
> the guest to be able to describe its working set.  Monitoring the
> hypervisor process is inadequate because it will reveal the first
> access to the guest-owned memory, but not the accesses done by the
> guest itself.  So it seems like the *current* support for virtual
> address monitoring is insufficient unless the guest is enlightened to
> do DAMON monitoring itself.
> 
> What about unenlightened guests?  An idea is a third DAMON monitoring
> mode that monitors accesses in the EPT.
> Have you thought about this before, or about other ways to monitor
> memory accesses for an *unenlightened* guest?  Would love to have a
> discussion on this.

We're checking guest-internal accesses via 'mmu_notifier_clear_young()',
and I confirmed[1] that it allows DAMON on the host to collect data
accesses in KVM/QEMU guests.  I'm not heavily testing/using the case,
though.

[1] https://lore.kernel.org/linux-mm/CALvZod61Dx4emiV5H73mQcFN6WvmD4A2Z=sRfmN2qpBh3R-_kQ@mail.gmail.com/

Thank you again for the great comments, David.  If you have more
questions/comments, or you find anything I'm missing, please let me
know.


Thanks,
SJ