* [PATCH v2 0/2] balloon: introduce 6 memory statistics
@ 2024-06-06  3:35 zhenwei pi
  2024-06-06  3:35 ` [PATCH v2 1/2] " zhenwei pi
  2024-06-06  3:35 ` [PATCH v2 2/2] balloon: link virtual memory management URL zhenwei pi
  0 siblings, 2 replies; 7+ messages in thread
From: zhenwei pi @ 2024-06-06  3:35 UTC
To: virtio-comment; +Cc: david, zhenwei pi

v1 -> v2:
- Suggested by MST, include the motivation in the commit message.
- Make the counters clear.
- Add a link for Linux VM management.

v1:
There is a previous discussion about the new memory statistics:
https://lore.kernel.org/linux-kernel/20240423034109.1552866-1-pizhenwei@bytedance.com/T/
This may be helpful to describe the usage of these counters.

zhenwei pi (2):
  balloon: introduce 6 memory statistics
  balloon: link virtual memory management URL

 device-types/balloon/description.tex | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

-- 
2.43.0
* [PATCH v2 1/2] balloon: introduce 6 memory statistics
From: zhenwei pi @ 2024-06-06  3:35 UTC
To: virtio-comment; +Cc: david, zhenwei pi

Note that virtio balloon statistics are OS independent, the names are
not completely the same as on Linux or Windows. For example, 'reclaim' is
more general than 'steal' (on Linux) and 'repurpose' (on Windows) for
a modern operating system.

Expose more memory statistics of the virtual memory subsystem from the
guest; this helps to analyze memory performance and pressure from the
host side.

Moreover, once the memory pressure gets critical, a task hits ALLOC STALL
and reclaims pages directly, in batches. So the ALLOC STALL count is not
the same as the direct reclaim count.

Now we have a metric to analyze the memory performance:
- y: counter increases
- n: counter does not change
- h: the rate of counter change is high
- l: the rate of counter change is low

OOM: VIRTIO_BALLOON_S_OOM_KILL
STALL: VIRTIO_BALLOON_S_ALLOC_STALL
ASCAN: VIRTIO_BALLOON_S_ASYNC_SCAN
DSCAN: VIRTIO_BALLOON_S_DIRECT_SCAN
ARCLM: VIRTIO_BALLOON_S_ASYNC_RECLAIM
DRCLM: VIRTIO_BALLOON_S_DIRECT_RECLAIM

- OOM[y], STALL[*], ASCAN[*], DSCAN[*], ARCLM[*], DRCLM[*]:
  the guest runs under really critical memory pressure.

- OOM[n], STALL[h], ASCAN[*], DSCAN[l], ARCLM[*], DRCLM[l]:
  the memory allocation stalls due to cgroup, not the global memory
  pressure (for Linux).

- OOM[n], STALL[h], ASCAN[*], DSCAN[h], ARCLM[*], DRCLM[h]:
  the memory allocation stalls due to global memory pressure. The
  performance gets hurt a lot. A high DRCLM/DSCAN ratio shows
  quite effective memory reclaiming.

- OOM[n], STALL[h], ASCAN[*], DSCAN[h], ARCLM[*], DRCLM[l]:
  the memory allocation stalls due to global memory pressure.
  The DRCLM/DSCAN ratio gets low and the guest OS is thrashing
  heavily; a serious case leads to poor performance and difficult
  troubleshooting. E.g., sshd may block on memory allocation when
  accepting new connections, so a user can't log in to a VM over ssh.

- OOM[n], STALL[n], ASCAN[h], DSCAN[n], ARCLM[l], DRCLM[n]:
  the low ARCLM/ASCAN ratio shows that the guest tries to
  reclaim more memory, but it can't. Once more memory is required in
  the future, it will struggle to reclaim memory.

Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
---
 device-types/balloon/description.tex | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/device-types/balloon/description.tex b/device-types/balloon/description.tex
index a1d9603..7ef4e3f 100644
--- a/device-types/balloon/description.tex
+++ b/device-types/balloon/description.tex
@@ -305,6 +305,12 @@ \subsubsection{Memory Statistics}\label{sec:Device Types / Memory Balloon Device
 #define VIRTIO_BALLOON_S_CACHES 7
 #define VIRTIO_BALLOON_S_HTLB_PGALLOC 8
 #define VIRTIO_BALLOON_S_HTLB_PGFAIL 9
+#define VIRTIO_BALLOON_S_OOM_KILL 10
+#define VIRTIO_BALLOON_S_ALLOC_STALL 11
+#define VIRTIO_BALLOON_S_ASYNC_SCAN 12
+#define VIRTIO_BALLOON_S_DIRECT_SCAN 13
+#define VIRTIO_BALLOON_S_ASYNC_RECLAIM 14
+#define VIRTIO_BALLOON_S_DIRECT_RECLAIM 15
         le16 tag;
         le64 val;
 } __attribute__((packed));
@@ -399,6 +405,24 @@ \subsubsection{Memory Statistics Tags}\label{sec:Device Types / Memory Balloon D
 
 \item[VIRTIO_BALLOON_S_HTLB_PGFAIL (9)] The number of failed hugetlb page
   allocations in the guest.
+
+\item[VIRTIO_BALLOON_S_OOM_KILL (10)] The count of OOM killer invocations
+  increases when the kernel invokes the OOM killer. The OOM killer selects a task
+  to terminate in order to free up memory.
+
+\item[VIRTIO_BALLOON_S_ALLOC_STALL (11)] The count of stalls on memory allocation.
+
+\item[VIRTIO_BALLOON_S_ASYNC_SCAN (12)] The amount of memory scanned asynchronously
+  by the kernel background task (in bytes).
+
+\item[VIRTIO_BALLOON_S_DIRECT_SCAN (13)] The amount of memory scanned directly by
+  the running task (in bytes).
+
+\item[VIRTIO_BALLOON_S_ASYNC_RECLAIM (14)] The amount of memory reclaimed
+  asynchronously by the kernel background task (in bytes).
+
+\item[VIRTIO_BALLOON_S_DIRECT_RECLAIM (15)] The amount of memory reclaimed
+  directly by the running task (in bytes).
 \end{description}
 
 \subsubsection{Free Page Hinting}\label{sec:Device Types / Memory Balloon Device / Device Operation / Free Page Hinting}
-- 
2.43.0
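The counter patterns listed in the commit message above could be evaluated on the host side roughly as follows. This is only a sketch under stated assumptions, not part of the patch: the struct, enum and function names are hypothetical, the thread does not define what counts as a "high" rate, so the thresholds are caller-supplied placeholders, `[*]` is read as "don't care", and `[l]` is read as "below the high threshold" (which includes no change at all). The inputs are the increases of each counter over one sampling interval.

```c
#include <stdint.h>

/* Increase of each proposed counter over one sampling interval.
 * Field names are illustrative; only the tag names come from the patch. */
struct vm_deltas {
    uint64_t oom_kill;       /* VIRTIO_BALLOON_S_OOM_KILL */
    uint64_t alloc_stall;    /* VIRTIO_BALLOON_S_ALLOC_STALL */
    uint64_t async_scan;     /* VIRTIO_BALLOON_S_ASYNC_SCAN, bytes */
    uint64_t direct_scan;    /* VIRTIO_BALLOON_S_DIRECT_SCAN, bytes */
    uint64_t async_reclaim;  /* VIRTIO_BALLOON_S_ASYNC_RECLAIM, bytes */
    uint64_t direct_reclaim; /* VIRTIO_BALLOON_S_DIRECT_RECLAIM, bytes */
};

/* Hypothetical cut-offs for "high" (assumed non-zero); the thread
 * does not specify any values. */
struct vm_thresholds {
    uint64_t stall_high;
    uint64_t scan_high;
    uint64_t reclaim_high;
};

enum vm_pressure {
    VM_UNCLASSIFIED,                  /* matches none of the listed patterns */
    VM_OOM,                           /* OOM[y] */
    VM_CGROUP_STALL,                  /* STALL[h], DSCAN[l], DRCLM[l] */
    VM_GLOBAL_RECLAIM_EFFECTIVE,      /* STALL[h], DSCAN[h], DRCLM[h] */
    VM_GLOBAL_THRASHING,              /* STALL[h], DSCAN[h], DRCLM[l] */
    VM_BACKGROUND_RECLAIM_INEFFECTIVE /* STALL[n], ASCAN[h], ARCLM[l] */
};

enum vm_pressure vm_classify(const struct vm_deltas *d,
                             const struct vm_thresholds *t)
{
    int dscan_high = d->direct_scan >= t->scan_high;
    int drclm_high = d->direct_reclaim >= t->reclaim_high;

    /* Any OOM kill dominates every other counter. */
    if (d->oom_kill > 0)
        return VM_OOM;

    if (d->alloc_stall >= t->stall_high) {
        if (!dscan_high && !drclm_high)
            return VM_CGROUP_STALL;
        if (dscan_high && drclm_high)
            return VM_GLOBAL_RECLAIM_EFFECTIVE;
        if (dscan_high)
            return VM_GLOBAL_THRASHING;
        return VM_UNCLASSIFIED; /* DSCAN[l], DRCLM[h] is not listed */
    }

    /* No stalls and no direct activity, but background scanning is
     * busy and reclaims little. */
    if (d->alloc_stall == 0 && d->direct_scan == 0 &&
        d->direct_reclaim == 0 &&
        d->async_scan >= t->scan_high &&
        d->async_reclaim < t->reclaim_high)
        return VM_BACKGROUND_RECLAIM_INEFFECTIVE;

    return VM_UNCLASSIFIED;
}
```

A caller would subtract two consecutive readings of each tag from the stats virtqueue to fill `struct vm_deltas`, then pass thresholds tuned for its own sampling interval. Combinations that the commit message does not list are deliberately reported as `VM_UNCLASSIFIED` rather than guessed.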
* Re: [PATCH v2 1/2] balloon: introduce 6 memory statistics
From: Michael S. Tsirkin @ 2024-06-06  6:09 UTC
To: zhenwei pi; +Cc: virtio-comment, david

On Thu, Jun 06, 2024 at 11:35:43AM +0800, zhenwei pi wrote:
> Note that virtio balloon statistics are OS independent, the names are
> not completely the same as on Linux or Windows. For example, 'reclaim' is
> more general than 'steal' (on Linux) and 'repurpose' (on Windows) for
> a modern operating system.
>
> Expose more memory statistics of the virtual memory subsystem from the
> guest; this helps to analyze memory performance and pressure from the
> host side.
>
> Moreover, once the memory pressure gets critical, a task hits ALLOC STALL
> and reclaims pages directly, in batches. So the ALLOC STALL count is not
> the same as the direct reclaim count.
>
> Now we have a metric to analyze the memory performance:
> - y: counter increases
> - n: counter does not change
> - h: the rate of counter change is high
> - l: the rate of counter change is low
>
> OOM: VIRTIO_BALLOON_S_OOM_KILL
> STALL: VIRTIO_BALLOON_S_ALLOC_STALL
> ASCAN: VIRTIO_BALLOON_S_ASYNC_SCAN
> DSCAN: VIRTIO_BALLOON_S_DIRECT_SCAN
> ARCLM: VIRTIO_BALLOON_S_ASYNC_RECLAIM
> DRCLM: VIRTIO_BALLOON_S_DIRECT_RECLAIM
>
> - OOM[y], STALL[*], ASCAN[*], DSCAN[*], ARCLM[*], DRCLM[*]:
>   the guest runs under really critical memory pressure.
>
> - OOM[n], STALL[h], ASCAN[*], DSCAN[l], ARCLM[*], DRCLM[l]:
>   the memory allocation stalls due to cgroup, not the global memory
>   pressure (for Linux).
>
> - OOM[n], STALL[h], ASCAN[*], DSCAN[h], ARCLM[*], DRCLM[h]:
>   the memory allocation stalls due to global memory pressure. The
>   performance gets hurt a lot. A high DRCLM/DSCAN ratio shows
>   quite effective memory reclaiming.
>
> - OOM[n], STALL[h], ASCAN[*], DSCAN[h], ARCLM[*], DRCLM[l]:
>   the memory allocation stalls due to global memory pressure.
>   The DRCLM/DSCAN ratio gets low and the guest OS is thrashing
>   heavily; a serious case leads to poor performance and difficult
>   troubleshooting. E.g., sshd may block on memory allocation when
>   accepting new connections, so a user can't log in to a VM over ssh.
>
> - OOM[n], STALL[n], ASCAN[h], DSCAN[n], ARCLM[l], DRCLM[n]:
>   the low ARCLM/ASCAN ratio shows that the guest tries to
>   reclaim more memory, but it can't. Once more memory is required in
>   the future, it will struggle to reclaim memory.
>
> Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
> ---
>  device-types/balloon/description.tex | 24 ++++++++++++++++++++++++
>  1 file changed, 24 insertions(+)
>
> diff --git a/device-types/balloon/description.tex b/device-types/balloon/description.tex
> index a1d9603..7ef4e3f 100644
> --- a/device-types/balloon/description.tex
> +++ b/device-types/balloon/description.tex
> @@ -305,6 +305,12 @@ \subsubsection{Memory Statistics}\label{sec:Device Types / Memory Balloon Device
>  #define VIRTIO_BALLOON_S_CACHES 7
>  #define VIRTIO_BALLOON_S_HTLB_PGALLOC 8
>  #define VIRTIO_BALLOON_S_HTLB_PGFAIL 9
> +#define VIRTIO_BALLOON_S_OOM_KILL 10
> +#define VIRTIO_BALLOON_S_ALLOC_STALL 11
> +#define VIRTIO_BALLOON_S_ASYNC_SCAN 12
> +#define VIRTIO_BALLOON_S_DIRECT_SCAN 13
> +#define VIRTIO_BALLOON_S_ASYNC_RECLAIM 14
> +#define VIRTIO_BALLOON_S_DIRECT_RECLAIM 15
>          le16 tag;
>          le64 val;
>  } __attribute__((packed));
> @@ -399,6 +405,24 @@ \subsubsection{Memory Statistics Tags}\label{sec:Device Types / Memory Balloon D
>
>  \item[VIRTIO_BALLOON_S_HTLB_PGFAIL (9)] The number of failed hugetlb page
>    allocations in the guest.
> +
> +\item[VIRTIO_BALLOON_S_OOM_KILL (10)] The count of OOM killer invocations
> +  increases when the kernel invokes the OOM killer. The OOM killer selects a task
> +  to terminate in order to free up memory.
> +
> +\item[VIRTIO_BALLOON_S_ALLOC_STALL (11)] The count of stalls on memory allocation.
> +
> +\item[VIRTIO_BALLOON_S_ASYNC_SCAN (12)] The amount of memory scanned asynchronously
> +  by the kernel background task (in bytes).
> +
> +\item[VIRTIO_BALLOON_S_DIRECT_SCAN (13)] The amount of memory scanned directly by
> +  the running task (in bytes).
> +
> +\item[VIRTIO_BALLOON_S_ASYNC_RECLAIM (14)] The amount of memory reclaimed
> +  asynchronously by the kernel background task (in bytes).
> +
> +\item[VIRTIO_BALLOON_S_DIRECT_RECLAIM (15)] The amount of memory reclaimed
> +  directly by the running task (in bytes).
>  \end{description}

OOM kill description is unnecessarily verbose, the rest not very clear,
and not consistent with existing stats.

>  \subsubsection{Free Page Hinting}\label{sec:Device Types / Memory Balloon Device / Device Operation / Free Page Hinting}
> --
> 2.43.0
>
* Re: Re: [PATCH v2 1/2] balloon: introduce 6 memory statistics
From: zhenwei pi @ 2024-06-06  6:37 UTC
To: Michael S. Tsirkin; +Cc: virtio-comment, david

On 6/6/24 14:09, Michael S. Tsirkin wrote:
> On Thu, Jun 06, 2024 at 11:35:43AM +0800, zhenwei pi wrote:
>> Note that virtio balloon statistics are OS independent, the names are

[SNIP]

>> \item[VIRTIO_BALLOON_S_HTLB_PGFAIL (9)] The number of failed hugetlb page
>> allocations in the guest.
>> +
>> +\item[VIRTIO_BALLOON_S_OOM_KILL (10)] The count of OOM killer invocations
>> + increases when the kernel invokes the OOM killer. The OOM killer selects a task
>> + to terminate in order to free up memory.
>> +
>> +\item[VIRTIO_BALLOON_S_ALLOC_STALL (11)] The count of stalls on memory allocation.
>> +
>> +\item[VIRTIO_BALLOON_S_ASYNC_SCAN (12)] The amount of memory scanned asynchronously
>> + by the kernel background task (in bytes).
>> +
>> +\item[VIRTIO_BALLOON_S_DIRECT_SCAN (13)] The amount of memory scanned directly by
>> + the running task (in bytes).
>> +
>> +\item[VIRTIO_BALLOON_S_ASYNC_RECLAIM (14)] The amount of memory reclaimed
>> + asynchronously by the kernel background task (in bytes).
>> +
>> +\item[VIRTIO_BALLOON_S_DIRECT_RECLAIM (15)] The amount of memory reclaimed
>> + directly by the running task (in bytes).
>> \end{description}
>
> OOM kill description is unnecessarily verbose, the rest not very clear,
> and not consistent with existing stats.
>

Hi Michael,

You asked 'What exactly are "invocations"? The number of times a task
was killed to free memory?' in the previous version, would you please
give me an example, and any example of the rest?

>
>> \subsubsection{Free Page Hinting}\label{sec:Device Types / Memory Balloon Device / Device Operation / Free Page Hinting}
>> --
>> 2.43.0
>>
>

-- 
zhenwei pi
* Re: Re: [PATCH v2 1/2] balloon: introduce 6 memory statistics
From: Michael S. Tsirkin @ 2024-06-06  7:17 UTC
To: zhenwei pi; +Cc: virtio-comment, david

On Wed, Jun 05, 2024 at 11:37:39PM -0700, zhenwei pi wrote:
> On 6/6/24 14:09, Michael S. Tsirkin wrote:
> > On Thu, Jun 06, 2024 at 11:35:43AM +0800, zhenwei pi wrote:
> >> Note that virtio balloon statistics are OS independent, the names are
>
> [SNIP]
>
> >> \item[VIRTIO_BALLOON_S_HTLB_PGFAIL (9)] The number of failed hugetlb page
> >> allocations in the guest.
> >> +
> >> +\item[VIRTIO_BALLOON_S_OOM_KILL (10)] The count of OOM killer invocations
> >> + increases when the kernel invokes the OOM killer. The OOM killer selects a task
> >> + to terminate in order to free up memory.
> >> +
> >> +\item[VIRTIO_BALLOON_S_ALLOC_STALL (11)] The count of stalls on memory allocation.
> >> +
> >> +\item[VIRTIO_BALLOON_S_ASYNC_SCAN (12)] The amount of memory scanned asynchronously
> >> + by the kernel background task (in bytes).
> >> +
> >> +\item[VIRTIO_BALLOON_S_DIRECT_SCAN (13)] The amount of memory scanned directly by
> >> + the running task (in bytes).
> >> +
> >> +\item[VIRTIO_BALLOON_S_ASYNC_RECLAIM (14)] The amount of memory reclaimed
> >> + asynchronously by the kernel background task (in bytes).
> >> +
> >> +\item[VIRTIO_BALLOON_S_DIRECT_RECLAIM (15)] The amount of memory reclaimed
> >> + directly by the running task (in bytes).
> >> \end{description}
> >
> > OOM kill description is unnecessarily verbose, the rest not very clear,
> > and not consistent with existing stats.
> >
>
> Hi Michael,
>
> You asked 'What exactly are "invocations"? The number of times a task
> was killed to free memory?' in the previous version, would you please
> give me an example,

Well, consider what you wrote:

	\item[VIRTIO_BALLOON_S_OOM_KILL (10)] The count of OOM killer invocations
	increases when the kernel invokes the OOM killer.

This does not explain anything. You just repeat the same thing twice.
I asked what invocations are, and your patch does not answer.
If the answer is "the number of times a task was killed to free memory"
then say so, that would be an explanation.

> and any example of the rest?

Example of not clear:

What is "the running task" you refer to?
What are "stalls"? What is stalled?
The text seems to be ungrammatical, using "the" the first time
a concept is introduced.

Example of inconsistency:

For example, "the kernel" is not a thing we ever defined. We do say
"the guest" in the balloon description (it is fundamentally a PV device
and so special in that way).

But please try to think of a reader coming at this for the first time,
reading the whole chapter, not just your patch, and attempting to
understand what each of the things is. Do not just fix the examples I
pointed out. Getting a tech writer to read your text and give feedback
is often a good idea.

> >
> >> \subsubsection{Free Page Hinting}\label{sec:Device Types / Memory Balloon Device / Device Operation / Free Page Hinting}
> >> --
> >> 2.43.0
> >>
> >
>
> --
> zhenwei pi
>
* [PATCH v2 2/2] balloon: link virtual memory management URL
From: zhenwei pi @ 2024-06-06  3:35 UTC
To: virtio-comment; +Cc: david, zhenwei pi

More and more memory statistics have been added to the stats VQ. Add a
link for Linux; this helps in understanding these counters.

Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
---
 device-types/balloon/description.tex | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/device-types/balloon/description.tex b/device-types/balloon/description.tex
index 7ef4e3f..cfe5b8c 100644
--- a/device-types/balloon/description.tex
+++ b/device-types/balloon/description.tex
@@ -425,6 +425,9 @@ \subsubsection{Memory Statistics Tags}\label{sec:Device Types / Memory Balloon D
   directly by the running task (in bytes).
 \end{description}
 
+Please see \url{https://docs.kernel.org/admin-guide/mm/concepts.html}
+for above guidance of virtual memory management on Linux.
+
 \subsubsection{Free Page Hinting}\label{sec:Device Types / Memory Balloon Device / Device Operation / Free Page Hinting}
 
 Free page hinting is designed to be used during migration to determine what
-- 
2.43.0
* Re: [PATCH v2 2/2] balloon: link virtual memory management URL
From: Michael S. Tsirkin @ 2024-06-06  6:13 UTC
To: zhenwei pi; +Cc: virtio-comment, david

On Thu, Jun 06, 2024 at 11:35:44AM +0800, zhenwei pi wrote:
> More and more memory statistics have been added to the stats VQ. Add a
> link for Linux; this helps in understanding these counters.
>
> Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
> ---
>  device-types/balloon/description.tex | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/device-types/balloon/description.tex b/device-types/balloon/description.tex
> index 7ef4e3f..cfe5b8c 100644
> --- a/device-types/balloon/description.tex
> +++ b/device-types/balloon/description.tex
> @@ -425,6 +425,9 @@ \subsubsection{Memory Statistics Tags}\label{sec:Device Types / Memory Balloon D
>    directly by the running task (in bytes).
>  \end{description}
>
> +Please see \url{https://docs.kernel.org/admin-guide/mm/concepts.html}
> +for above guidance of virtual memory management on Linux.
> +

Unclear. If you want to refer to this, maybe add the link to the
normative references and actually call it out where you use a concept
from that page.

>  \subsubsection{Free Page Hinting}\label{sec:Device Types / Memory Balloon Device / Device Operation / Free Page Hinting}
>
>  Free page hinting is designed to be used during migration to determine what
> --
> 2.43.0
>
Thread overview: 7+ messages
2024-06-06  3:35 [PATCH v2 0/2] balloon: introduce 6 memory statistics zhenwei pi
2024-06-06  3:35 ` [PATCH v2 1/2] " zhenwei pi
2024-06-06  6:09   ` Michael S. Tsirkin
2024-06-06  6:37     ` zhenwei pi
2024-06-06  7:17       ` Michael S. Tsirkin
2024-06-06  3:35 ` [PATCH v2 2/2] balloon: link virtual memory management URL zhenwei pi
2024-06-06  6:13   ` Michael S. Tsirkin