* What kind of memory is DAMON RECLAIM able to free?
From: Grzegorz Uriasz @ 2023-04-28 14:15 UTC
To: damon; +Cc: dutkahugo

Hi!

I'm running some experiments using DAMON RECLAIM on the 6.2 kernel. I've
set up a VM with free page reporting enabled, with 16 vcores and 16GB of
RAM, and with very aggressive memory reclamation settings. My kernel boot
line includes:
- transparent_hugepage=never
- page_reporting.page_reporting_order=0
- damon_reclaim.enabled=Y
- damon_reclaim.min_age=10000000
- damon_reclaim.wmarks_low=0
- damon_reclaim.wmarks_mid=999
- damon_reclaim.wmarks_high=1000
- damon_reclaim.quota_sz=1073741824
- damon_reclaim.quota_reset_interval_ms=1000

The memory usage of the VM starts at 800 MB. After running some
workloads and ballooning the VM to 16 GB, DAMON RECLAIM was able to
quickly bring the memory usage back down to 3GB, after which it just
stopped doing anything. What concerns me is that 20% (3.2GB for that VM)
is the default low watermark in the DAMON RECLAIM module. I've verified
that the watermarks were properly set in sysfs to my custom values, but
it doesn't seem to affect anything, as free -mh shows 400MB for apps but
2.6GB for caches/buffers. Even after idling for a very long time, the VM
isn't able to free the buffers. When dropping the caches manually using
/proc/sys/vm/drop_caches, the memory usage returns to the starting
value. The cache/buffers don't increase at all after dropping them,
indicating that this memory was indeed idle.

My questions:
1. Are there types of freeable memory which DAMON is not allowed to touch?
2. What prevents DAMON from getting back the memory?
3. /sys/kernel/debug/damon/* seems separate from DAMON RECLAIM.
/sys/module/damon_reclaim/parameters/kdamond_pid shows DAMON RECLAIM is
running, but the DAMON debugfs neither shows it nor exposes any registered
reclamation schemes.

Best Regards,
Grzegorz Uriasz
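The message above mentions verifying the module parameters in sysfs and reading the watermarks as a share of RAM. A minimal sketch of that check follows. It assumes the parameters live under /sys/module/damon_reclaim/parameters (the path quoted in the message) and that watermarks are per-thousand free-memory rates; the helper names `read_params` and `watermark_bytes` are made up for illustration and are not part of DAMON.

```python
from pathlib import Path

# Assumed location of the module parameters (the path quoted in the message).
PARAMS_DIR = Path("/sys/module/damon_reclaim/parameters")


def read_params(params_dir=PARAMS_DIR):
    """Return {parameter name: stripped string value} for every readable file."""
    values = {}
    for entry in sorted(Path(params_dir).iterdir()):
        if not entry.is_file():
            continue
        try:
            values[entry.name] = entry.read_text().strip()
        except OSError:
            # A parameter may be unreadable without sufficient privileges; skip it.
            continue
    return values


def watermark_bytes(permille, total_bytes):
    """Convert a per-thousand free-memory watermark into a byte count."""
    return total_bytes * int(permille) // 1000


if __name__ == "__main__":
    if PARAMS_DIR.is_dir():
        for name, value in read_params().items():
            print(f"{name} = {value}")
    else:
        print("damon_reclaim parameters not found on this system")
```

With the default low watermark of 200 per thousand, `watermark_bytes(200, total)` gives the 20% figure the message refers to.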
* Re: What kind of memory is DAMON RECLAIM able to free?
From: SeongJae Park @ 2023-05-02 1:27 UTC
To: Grzegorz Uriasz; +Cc: damon, dutkahugo

Hi Grzegorz,

On Fri, 28 Apr 2023 16:15:12 +0200 Grzegorz Uriasz <gorbak25@gmail.com> wrote:

> Hi!
>
> I'm running some experiments using DAMON RECLAIM on the 6.2 kernel. I've
> set up a VM with free page reporting enabled with 16 vcores and 16GB of
> ram with very aggressive memory reclamation settings, my kernel boot
> line includes:
> - transparent_hugepage=never
> - page_reporting.page_reporting_order=0
> - damon_reclaim.enabled=Y
> - damon_reclaim.min_age=10000000
> - damon_reclaim.wmarks_low=0
> - damon_reclaim.wmarks_mid=999
> - damon_reclaim.wmarks_high=1000
> - damon_reclaim.quota_sz=1073741824
> - damon_reclaim.quota_reset_interval_ms=1000
>
> The memory usage of the VM starts at 800 MB, after running some
> workloads and ballooning the VM to 16 GB DAMON RECLAIM was able to
> quickly bring the memory usage back down to 3GB, after which it just
> stopped doing anything. What concerns me is that 20%(3.2GB for that VM)
> is the default low watermark in the DAMON RECLAIM module. I've verified
> that the watermarks were properly set in sysfs to my custom values, but
> it doesn't seem to affect anything as free -mh shows 400MB for apps but
> 2.6GB for caches/buffers. The VM besides idling for a very long time
> isn't able to free the buffers. When dropping the caches manually using
> /proc/sys/vm/drop_caches the memory usage returns back to the starting
> one. The cache/buffers don't increase at all after dropping them
> indicating that this memory was indeed idling.

Thank you for sharing your great experience and questions!

>
> My questions:
> 1. Are there types of freeable memory which DAMON is not allowed to touch?

Basically there is no such limitation. We implemented page type or cgroup
based DAMOS filtering in v6.3, but as you're using v6.2, it shouldn't be
related to your use case.

One possible limitation for this case might be the monitoring region. You can
specify the region to monitor and reclaim using the
`monitor_region_{start,end}` parameters. By default, it's set to the biggest
System RAM region. If your system has non-contiguous System RAM regions and
the biggest one does not cover the 3GiB region, the 3GiB region will not be
monitored and therefore not reclaimed.

Can you check if it is excluding the 3GiB region? You may be able to get it
from files like `/proc/iomem`. You could also refer to the DAMON user-space
tool to see how it uses the file[1].

Also, you could get DAMON_RECLAIM internal statistics[2]. Checking those could
also provide some hints, or help rule out unrelated suspects.

> 2. What prevents DAMON from getting back the memory?

Other than quotas, watermarks and access pattern, there should be nothing
preventing DAMON_RECLAIM from reclaiming memory on the v6.2 kernel. DAMOS
filters could also have some effect, but as mentioned above, they are
available from v6.3.

> 3. /sys/kernel/debug/damon/* seems separate from DAMON RECLAIM,
> /sys/module/damon_reclaim/parameters/kdamond_pid shows DAMON RECLAIM is
> running but the DAMON debugfs doesn't show it nor exposes any registered
> reclamation schemes.

You're correct. DAMON provides two main user interfaces, via debugfs
(/sys/kernel/debug/damon/) and sysfs (/sys/kernel/mm/damon/). Those are for
fine-grained control of all DAMON capabilities. Btw, the debugfs interface is
deprecated now, so please use the sysfs interface.

DAMON modules like DAMON_RECLAIM and DAMON_LRU_SORT are for simpler control of
DAMON for special-purpose system-wide use only, like proactive reclaim and LRU
list manipulation. Hence those provide a simpler module parameters interface.

[1] https://github.com/awslabs/damo/blob/next/_damo_paddr_layout.py
[2] https://docs.kernel.org/admin-guide/mm/damon/reclaim.html#nr-reclaim-tried-regions

Thanks,
SJ

>
> Best Regards,
> Grzegorz Uriasz
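The /proc/iomem check suggested in the reply above can be sketched as follows. This is only an illustration, not how damo does it: the sample text and the helper names `system_ram_regions` and `biggest_region` are hypothetical, and the addresses are invented to resemble a guest with some RAM below 4GiB and the rest above it. On a real system the text would come from /proc/iomem, which shows real addresses only to root.

```python
import re

# A top-level /proc/iomem line looks like "00100000-bffdffff : System RAM".
# Nested resources are indented, so anchoring at column 0 skips them.
IOMEM_LINE = re.compile(r"^([0-9a-fA-F]+)-([0-9a-fA-F]+) : (.+)$")

# Hypothetical sample, not taken from the thread's screenshots.
SAMPLE_IOMEM = """\
00001000-0009fbff : System RAM
00100000-bffdffff : System RAM
  01000000-01e00000 : Kernel code
100000000-43fffffff : System RAM
"""


def system_ram_regions(iomem_text):
    """Return [(start, end_inclusive)] for top-level "System RAM" entries."""
    regions = []
    for line in iomem_text.splitlines():
        match = IOMEM_LINE.match(line)
        if match and match.group(3).strip() == "System RAM":
            regions.append((int(match.group(1), 16), int(match.group(2), 16)))
    return regions


def biggest_region(regions):
    """Return the (start, end_inclusive) pair spanning the most bytes."""
    return max(regions, key=lambda r: r[1] - r[0] + 1)


if __name__ == "__main__":
    regions = system_ram_regions(SAMPLE_IOMEM)
    biggest = biggest_region(regions)
    print(f"biggest System RAM: {biggest[0]:#x}-{biggest[1]:#x}")
    for start, end in regions:
        if (start, end) != biggest:
            size_mib = (end - start + 1) >> 20
            print(f"outside the default region: {start:#x}-{end:#x} ({size_mib} MiB)")
```

Any System RAM range reported as outside the biggest one would, per the reply above, be neither monitored nor reclaimed with the default `monitor_region_{start,end}`.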
* Re: What kind of memory is DAMON RECLAIM able to free?
From: Grzegorz Uriasz @ 2023-05-04 13:47 UTC
To: SeongJae Park; +Cc: damon, dutkahugo, Grzegorz Uriasz

Hi SeongJae,

/* I apologize for the duplicate email but damon@lists.linux.dev
rejected my previous message due to embedded pictures, moved the
screenshots to imgur ;) */

Thank you for your help and for providing us a link to the DAMON user
space tools, they are very helpful.
I've checked the memory regions and indeed there was a 3GB RAM region
besides the largest System RAM (https://imgur.com/a/q8gdV8b).

After setting the start of the monitoring region to 0, DAMON RECLAIM
suddenly became more responsive, and RAM reclamation became more
immediate and useful. Unfortunately there is still something which holds
DAMON RECLAIM back.

I've changed the memory region to start from 0 and ran the same workload
as before. DAMON was able to reclaim 1GB more RAM compared to my
previous tests; unfortunately this still leaves 2GB of unused RAM in
the caches :( Like before, after dropping the caches the cache usage
never grows, indicating the RAM was idle. I've also checked that after
raising the number of monitoring regions in DAMON_RECLAIM from 10 to 100
the reclamation became more effective: DAMON_RECLAIM was able to reclaim
0.4GB more than in the last test, but this still left 1.6GB of
reclaimable RAM overall (https://imgur.com/a/FHww4XA).

My questions:
1) What is holding DAMON RECLAIM back?
2) Is it possible to explicitly specify multiple monitoring regions in
DAMON RECLAIM, or do I need to configure DAMOS manually from userspace
for that?
3) How to find the number of monitoring regions where DAMOS is most
effective?

Best Regards,
Grzegorz Uriasz
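The two changes described in the message above (starting the monitoring region at 0 and raising the minimum number of regions) could be applied at runtime roughly as below. This is a sketch under assumptions: it presumes the parameters are writable files under /sys/module/damon_reclaim/parameters and that writing Y to `commit_inputs` makes a running DAMON_RECLAIM re-read them, which should be checked against the DAMON_RECLAIM documentation for the kernel in use. The helper name `set_params` is hypothetical.

```python
from pathlib import Path

# Assumed location of the DAMON_RECLAIM module parameters.
PARAMS_DIR = Path("/sys/module/damon_reclaim/parameters")


def set_params(updates, params_dir=PARAMS_DIR, commit=True):
    """Write each value to its parameter file, then optionally commit."""
    params_dir = Path(params_dir)
    for name, value in updates.items():
        (params_dir / name).write_text(f"{value}\n")
    if commit:
        # Assumption: Y in commit_inputs asks a running DAMON_RECLAIM
        # to re-read its input parameters.
        (params_dir / "commit_inputs").write_text("Y\n")


if __name__ == "__main__":
    if PARAMS_DIR.is_dir():
        # Needs root; values mirror the ones mentioned in the message.
        set_params({"monitor_region_start": 0, "min_nr_regions": 100})
    else:
        print("damon_reclaim parameters not found on this system")
```

Passing a different `params_dir` makes it possible to dry-run the writes against an ordinary directory first.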
* Re: What kind of memory is DAMON RECLAIM able to free?
From: SeongJae Park @ 2023-05-04 17:17 UTC
To: Grzegorz Uriasz; +Cc: SeongJae Park, damon, dutkahugo

Hi Grzegorz,

On Thu, 4 May 2023 15:47:25 +0200 Grzegorz Uriasz <gorbak25@gmail.com> wrote:

> Hi SeongJae,
>
> /* I apologize for the duplicate email but damon@lists.linux.dev
> rejected my previous message due to embedded pictures, moved the
> screenshots to imgur ;) */

No problem, thank you for patiently posting again :) It seems the screenshots
show only text output. Maybe you could simply copy-paste it into the mail
body next time if you want to avoid uploading it to imgur separately.

>
> Thank you for your help and providing us a link to the DAMON user space
> tools, they are very helpful.
> I've checked the memory regions and indeed there was a 3GB RAM region
> besides the largest system ram(https://imgur.com/a/q8gdV8b).
>
> After setting the start of the monitoring region to 0 DAMON RECLAIM
> suddenly became more responsive and ram reclamation became more
> immediate and useful. Unfortunately there is still something which holds
> DAMON RECLAIM back.
>
> I've changed the memory region to start from 0 and ran the same workload
> as before, Damon was able to reclaim 1GB more ram compared to my
> previous tests, unfortunately this still leaves 2GB of unused RAM in
> the caches :( Like before after dropping the caches the cache usage
> never grows indicating the ram was idling. I've also checked that after
> raising the amount of monitoring regions in DAMON_RECLAIM from 10 to 100
> the reclamation became more effective, DAMON_RECLAIM was able to reclaim
> 0.4GB more than in the last test, but this still left 1.6GB of
> reclaimable ram overall(https://imgur.com/a/FHww4XA).

Thank you for sharing your great experiment results!

>
> My questions:
> 1) What is holding DAMON RECLAIM back?

Currently, DAMON utilizes its own monitoring accuracy-overhead tradeoff
mechanisms, namely Region Based Sampling[1] and Adaptive Regions
Adjustment[2]. I guess DAMON is not showing the remaining 1.6GB as cold enough
to be reclaimed, due to the traded accuracy. You can increase the accuracy at
the cost of increased monitoring overhead by increasing
{min,max}_nr_regions. I think this explains why you saw DAMON_RECLAIM
reclaiming 0.4GB more memory after you increased min_nr_regions from 10 to
100.

This also explains why dropping caches reclaimed 1.6GB more memory. Cache
dropping and page fault-driven memory population work at page granularity, so
they can be more accurate than DAMON in general.

In detail, not every page would be idle on the system. Based on the second
screenshot, we can assume that at least the 97MB of buff/cache memory is still
being accessed. My hypothesis is that the pages of the 97MB memory are quite
evenly spread over the 2GB of memory that DAMON_RECLAIM is not reclaiming. In
that case, because DAMON works with region-based sampling[1], it will
occasionally pick one of the 97MB pages as the page to sample for access, and
conclude that the 2GB region is accessed.

For more detail, let's assume the pages are really evenly distributed, and the
pages are accessed at least once per DAMON sampling interval, which is 5ms by
default. Then, in about 1/20 (97MB / 2GB) of the samplings, one of those pages
is picked as the sample page. DAMON aggregates the sampling results over its
aggregation interval, which is 100ms, so it does 20 repeated samplings. So for
every aggregation interval, DAMON sees the region as having at least one
sample saying it was accessed. It therefore concludes that the 2GB region is
accessed at least once per 20 samples, and doesn't reclaim the entire 2GB
region.

You may further increase {min,max}_nr_regions to increase DAMON accuracy and
hence reclaim more memory. Note that it will also increase the monitoring
overhead. If you even increase the numbers to 'your system memory / page
size', DAMON will do the monitoring at page granularity[3], and so may provide
the best accuracy, the same as that of cache dropping.

One possible way to reclaim more memory while keeping the overhead lower
would be to find the memory regions that DAMON still thinks are hot, set the
monitoring region to only those regions, and continue this
divide-and-conquer.

> 2) Is it possible to explicitly specify multiple monitoring regions in
> DAMON RECLAIM or do I need to configure DAMOS manually from userspace
> for that?

At the moment, DAMON_RECLAIM does not provide a way for such fine control. So
you should use DAMOS manually. With the DAMON user space tool, damo, you can
use the '--regions' option.

> 3) How to find the number of monitoring regions where DAMOS is most
> effective?

You may simply try different numbers and watch the progress. Visualizing the
monitored access pattern could also be helpful. For the DAMOS-specific case,
checking DAMOS stats and tried_regions could also be useful.

Note that these limitations exist only in the current implementation. There
are TODO items for improving DAMON accuracy and automating tuning. Hopefully
future versions of DAMON will provide better accuracy and easier tuning.

Nevertheless, this is where we are at the moment. Sending questions, sharing
experience/usage, and participating in discussions like this help the DAMON
community learn about unexpected requirements, get new ideas, and prioritize
specific items. Thank you for inspiring me with this. Please feel free to ask
anything if you need.

[1] https://docs.kernel.org/mm/damon/design.html#region-based-sampling
[2] https://docs.kernel.org/mm/damon/design.html#adaptive-regions-adjustment
[3] https://docs.kernel.org/mm/damon/faq.html#can-i-simply-monitor-page-granularity

Thanks,
SJ

>
> Best Regards,
> Grzegorz Uriasz
>
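The sampling argument in the reply above can be put into numbers with a small back-of-the-envelope model. It is a simplification, not DAMON's actual code: it assumes one uniformly chosen sample page per region per sampling interval, independent samples, hot pages that are always found accessed, and the 10 second `min_age` from the boot line earlier in the thread. The function names are hypothetical.

```python
def sampling_estimate(hot_mb, region_mb, sample_us=5_000,
                      aggr_us=100_000, min_age_us=10_000_000):
    """Estimate how often a mostly idle region still looks accessed."""
    # Chance that a single sample lands on a hot page.
    p_hit = hot_mb / region_mb
    # Samples taken per aggregation interval (100ms / 5ms = 20).
    samples = aggr_us // sample_us
    # Expected number of "accessed" samples per aggregation interval.
    expected_hits = p_hit * samples
    # Chance that at least one sample in an interval hits a hot page.
    p_seen = 1 - (1 - p_hit) ** samples
    # Aggregation intervals that must all show no access to reach min_age.
    windows = min_age_us // aggr_us
    p_looks_idle = (1 - p_seen) ** windows
    return {
        "samples": samples,
        "expected_hits": expected_hits,
        "p_seen": p_seen,
        "windows": windows,
        "p_looks_idle": p_looks_idle,
    }


def page_granularity_regions(memory_bytes, page_bytes=4096):
    """Number of regions needed for one region per page."""
    return memory_bytes // page_bytes


if __name__ == "__main__":
    est = sampling_estimate(hot_mb=97, region_mb=2048)
    for key, value in est.items():
        print(f"{key}: {value}")
    print("regions for 16GiB at 4KiB pages:",
          page_granularity_regions(16 * 2**30))
```

Under these assumptions the expected count is about one "accessed" sample per aggregation interval, and the chance that the whole 2GB region shows no access for the full `min_age` is negligible, which is consistent with the explanation above. The second helper shows why page granularity is costly: a 16GiB guest with 4KiB pages needs about 4.2 million regions.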