From: "Aneesh Kumar K.V" <aneesh.kumar-tEXmvtCZX7AybS5Ee8rs3A@public.gmane.org>
To: Yuanchu Xie <yuanchu-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>,
Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>,
Michal Hocko <mhocko-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>,
Roman Gushchin
<roman.gushchin-fxUVXftIFDnyG1zEObXtfA@public.gmane.org>,
Yu Zhao <yuzhao-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
Cc: Andrew Morton
<akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>,
Shakeel Butt <shakeelb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>,
Muchun Song <songmuchun-EC8Uxl6Npydl57MIdRCFDg@public.gmane.org>,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org,
cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Yuanchu Xie <yuanchu-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
Subject: Re: [RFC PATCH 0/2] mm: multi-gen LRU: working set extensions
Date: Tue, 10 Jan 2023 11:55:18 +0530
Message-ID: <87k01ulxdd.fsf@linux.ibm.com>
In-Reply-To: <20221214225123.2770216-1-yuanchu-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
Yuanchu Xie <yuanchu-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> writes:
> Introduce a way of monitoring the working set of a workload, per page
> type and per NUMA node, with granularity in minutes. It has page-level
> granularity and minimal memory overhead by building on the
> Multi-generational LRU framework, which already has most of the
> infrastructure and is just missing a useful interface.
>
> MGLRU organizes pages in generations, where an older generation contains
> colder pages, and aging promotes the recently used pages into the young
> generation and creates a new one. The working set size is how much
> memory an application needs to keep working, the amount of "hot" memory
> that's frequently used. The only missing pieces between MGLRU
> generations and working set estimation are a consistent aging cadence
> and an interface; we introduce the two additions.
So with the kold kthread, do we still need aging in reclaim? Should we switch
reclaim to wake up the kold kthread to do the aging instead of calling
try_to_inc_max_seq()? That would also let us try different aging mechanisms
that run better in a kthread.
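
To make the question concrete, here is a minimal userspace sketch of the
pattern being asked about: the reclaim path requests aging from a dedicated
worker and waits for a new generation, rather than aging inline. It is a toy
model in Python, not kernel code; the AgingWorker class, request_aging() and
the max_seq counter are hypothetical names used only for illustration.

```python
import threading


class AgingWorker:
    """Toy stand-in for a kold-like aging thread (illustrative only)."""

    def __init__(self):
        self.max_seq = 0                   # number of the youngest generation
        self._wake = threading.Event()     # set by "reclaim" to request aging
        self._stop = threading.Event()
        self._aged = threading.Condition() # signalled after each aging pass
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            self._wake.wait()
            self._wake.clear()
            if self._stop.is_set():
                return
            with self._aged:
                self.max_seq += 1          # "aging": open a new generation
                self._aged.notify_all()

    def request_aging(self, timeout=1.0):
        """Called from the reclaim path instead of aging inline.

        Returns True once a new generation exists, False on timeout.
        """
        with self._aged:
            target = self.max_seq + 1
            self._wake.set()
            return self._aged.wait_for(lambda: self.max_seq >= target, timeout)

    def stop(self):
        self._stop.set()
        self._wake.set()
        self._thread.join()


if __name__ == "__main__":
    worker = AgingWorker()
    print(worker.request_aging(), worker.max_seq)
    worker.stop()
```

Concurrent requests may be satisfied by a single aging pass in this model,
which is roughly the coalescing one would want if several reclaimers hit the
same lruvec at once.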
>
> Periodic aging
> ======
> MGLRU Aging is currently driven by reclaim, so the amount of time
> between generations is non-deterministic. With memcgs being aged
> regularly, MGLRU generations become time-based working set information.
>
> - memory.periodic_aging: a new root-level only file in cgroupfs
> Writing to memory.periodic_aging sets the aging interval and opts into
> periodic aging.
> - kold: a new kthread that ages memcgs based on the set aging interval.
>
> Page idle age stats
> ======
> - memory.page_idle_age: we group pages into idle age ranges, and present
> the number of pages per node per pagetype in each range. This
> aggregates the time information from MGLRU generations hierarchically.
>
> Use case: proactive reclaimer
> ======
> The proactive reclaimer sets the aging interval, and periodically reads
> the page idle age stats, forming a working set estimation, which it then
> calculates an amount to write to memory.reclaim.
>
> With the page idle age stats, a proactive reclaimer could calculate a
> precise amount of memory to reclaim without continuously probing and
> inducing reclaim.
>
> A proactive reclaimer that uses a similar interface is used in the
> Google data centers.
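
As a rough illustration of the calculation described above, the sketch below
turns an idle-age histogram into a byte count that could be written to
memory.reclaim. The cover letter does not specify the memory.page_idle_age
file format, so the input here is an assumed, already-parsed list of
(lower_bound_seconds, pages) tuples; the function names and the 4 KiB page
size are likewise assumptions.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB base pages


def reclaim_target_bytes(idle_hist, min_idle_seconds, page_size=PAGE_SIZE):
    """Bytes held by pages whose idle-age range starts at or above the threshold.

    idle_hist is a hypothetical parsed form of the stats:
    a list of (lower_bound_seconds, pages) tuples.
    """
    cold_pages = sum(pages for lower, pages in idle_hist
                     if lower >= min_idle_seconds)
    return cold_pages * page_size


def working_set_bytes(idle_hist, min_idle_seconds, page_size=PAGE_SIZE):
    """Bytes held by pages idle for less than the threshold (the "hot" set)."""
    hot_pages = sum(pages for lower, pages in idle_hist
                    if lower < min_idle_seconds)
    return hot_pages * page_size


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    hist = [(0, 1000), (60, 500), (300, 200), (900, 100)]
    print(reclaim_target_bytes(hist, 300))
    print(working_set_bytes(hist, 300))
```

A reclaimer built this way would read the stats once per aging interval and
write the computed amount to memory.reclaim, instead of probing with repeated
small reclaim requests.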
>
> Use case: workload introspection
> ======
> A workload may use the working set estimates to adjust application
> behavior as needed, e.g. preemptively killing some of its workers to
> avoid its working set thrashing, or dropping caches to fit within a
> limit.
> It can also be valuable to application developers, who can benefit from
> an out-of-the-box overview of the application's usage behaviors.
>
> TODO List
> ======
> - selftests
> - a userspace demonstrator combining periodic aging, page idle age
> stats, memory.reclaim, and/or PSI
>
> Open questions
> ======
> - MGLRU aging mechanism has a flag called force_scan. With
> force_scan=false, invoking MGLRU aging when an lruvec has a maximum
> number of generations does not actually perform aging.
> However, with force_scan=true, MGLRU moves the pages in the oldest
> generation to the second oldest generation. The force_scan=true flag
> also disables some optimizations in MGLRU's page table walks.
> The current patch sets force_scan=true, so that periodic aging would
> work without a proactive reclaimer evicting the oldest generation.
>
> - The page idle age format uses a fixed set of time ranges in seconds.
> I have considered having it be based on the aging interval, or just
> compiling the raw timestamps.
> With the age ranges based on the aging interval, a memcg that's
> undergoing memcg reclaim might have its generations in the 10
> seconds range, and a much longer aging interval would obscure this
> fact.
> The raw timestamps from MGLRU could lead to a very large file when
> aggregated hierarchically.
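
The force_scan behaviour described in the first open question can be modelled
with a small toy, shown below. This is only a sketch of the semantics as
stated in the cover letter, not the kernel implementation: generations are
plain lists of page ids, promotion of recently used pages is left out, and the
age() helper and the generation limit of 4 are assumptions of the model.

```python
from collections import deque

MAX_NR_GENS = 4  # assumption for this toy model


def age(generations, force_scan):
    """Run one aging pass on a deque of generations, oldest first.

    Returns True if a new youngest generation was created.
    """
    if len(generations) >= MAX_NR_GENS:
        if not force_scan:
            # At the generation limit and not forced: aging does nothing.
            return False
        # Forced: fold the oldest generation into the second oldest
        # to make room for a new one.
        oldest = generations.popleft()
        generations[0] = oldest + generations[0]
    generations.append([])  # new, empty youngest generation
    return True


if __name__ == "__main__":
    gens = deque([["a"], ["b"], ["c"], ["d"]])
    print(age(gens, force_scan=False), list(gens))
    print(age(gens, force_scan=True), list(gens))
```

In this model, periodic aging with force_scan=false stalls once the limit is
reached unless something evicts the oldest generation, which is the situation
the patch avoids by setting force_scan=true.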
>
> Yuanchu Xie (2):
> mm: multi-gen LRU: periodic aging
> mm: multi-gen LRU: cgroup working set stats
>
> include/linux/kold.h | 44 ++++++++++
> include/linux/mmzone.h | 4 +-
> mm/Makefile | 3 +
> mm/kold.c | 150 ++++++++++++++++++++++++++++++++
> mm/memcontrol.c | 188 +++++++++++++++++++++++++++++++++++++++++
> mm/vmscan.c | 35 +++++++-
> 6 files changed, 422 insertions(+), 2 deletions(-)
> create mode 100644 include/linux/kold.h
> create mode 100644 mm/kold.c
>
> --
> 2.39.0.314.g84b9a713c41-goog