From: "Michael S. Tsirkin" <mst@redhat.com>
To: zhenwei pi <pizhenwei@bytedance.com>
Cc: virtio-comment@lists.linux.dev, david@redhat.com
Subject: Re: [PATCH v2 1/2] balloon: introduce 6 memory statistics
Date: Thu, 6 Jun 2024 02:09:53 -0400
Message-ID: <20240606020800-mutt-send-email-mst@kernel.org>
In-Reply-To: <20240606033544.400000-2-pizhenwei@bytedance.com>

On Thu, Jun 06, 2024 at 11:35:43AM +0800, zhenwei pi wrote:
> Note that virtio balloon statistics are OS independent, so the names are
> not exactly the same as those used by Linux or Windows. For example,
> 'reclaim' is more general than 'steal' (on Linux) and 'repurpose' (on
> Windows) for a modern operating system.
>
> Expose more memory statistics of the virtual memory subsystem from the
> guest; they help analyze memory performance and pressure from the
> host side.
>
> Moreover, once the memory pressure gets critical, a task hits ALLOC STALL
> and reclaims pages in batches directly. So the ALLOC STALL count is not
> the same as the direct reclaim count.
>
> Now we have a metric to analyze the memory performance:
> - y: counter increases
> - n: counter does not change
> - h: the rate of counter change is high
> - l: the rate of counter change is low
>
> OOM: VIRTIO_BALLOON_S_OOM_KILL
> STALL: VIRTIO_BALLOON_S_ALLOC_STALL
> ASCAN: VIRTIO_BALLOON_S_ASYNC_SCAN
> DSCAN: VIRTIO_BALLOON_S_DIRECT_SCAN
> ARCLM: VIRTIO_BALLOON_S_ASYNC_RECLAIM
> DRCLM: VIRTIO_BALLOON_S_DIRECT_RECLAIM
>
> - OOM[y], STALL[*], ASCAN[*], DSCAN[*], ARCLM[*], DRCLM[*]:
> the guest runs under really critical memory pressure
>
> - OOM[n], STALL[h], ASCAN[*], DSCAN[l], ARCLM[*], DRCLM[l]:
> the memory allocation stalls due to cgroup, not the global memory
> pressure (for Linux).
>
> - OOM[n], STALL[h], ASCAN[*], DSCAN[h], ARCLM[*], DRCLM[h]:
> the memory allocation stalls due to global memory pressure. The
> performance gets hurt a lot. A high ratio between DRCLM/DSCAN shows
> quite effective memory reclaiming.
>
> - OOM[n], STALL[h], ASCAN[*], DSCAN[h], ARCLM[*], DRCLM[l]:
> the memory allocation stalls due to global memory pressure.
> The ratio DRCLM/DSCAN is low, so the guest OS is thrashing
> heavily. In serious cases this leads to poor performance and difficult
> troubleshooting. For example, sshd may block on memory allocation when
> accepting new connections, so a user can't log in to the VM over ssh.
>
> - OOM[n], STALL[n], ASCAN[h], DSCAN[n], ARCLM[l], DRCLM[n]:
> the low ratio between ARCLM/ASCAN shows that the guest tries to
> reclaim more memory, but it can't. Once more memory is required in
> the future, it will struggle to reclaim memory.
>
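
The interpretation rules quoted above could be sketched in C roughly as
follows. This is only an illustration under stated assumptions: the
struct vm_delta container, the vm_state names, and the STALL_HIGH,
BYTES_HIGH and LOW_RATIO_DIV thresholds are hypothetical and not part of
the patch, which does not define what "high" or "low" means. The inputs
are counter deltas over one sampling interval.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container: counter deltas over one sampling interval. */
struct vm_delta {
    uint64_t oom_kill;        /* OOM   */
    uint64_t alloc_stall;     /* STALL */
    uint64_t async_scan;      /* ASCAN, bytes */
    uint64_t direct_scan;     /* DSCAN, bytes */
    uint64_t async_reclaim;   /* ARCLM, bytes */
    uint64_t direct_reclaim;  /* DRCLM, bytes */
};

enum vm_state {
    VM_UNCLASSIFIED,
    VM_OOM_CRITICAL,        /* OOM[y] */
    VM_CGROUP_STALL,        /* STALL[h], DSCAN[l], DRCLM[l] */
    VM_GLOBAL_RECLAIM_OK,   /* STALL[h], DSCAN[h], DRCLM[h] */
    VM_GLOBAL_THRASHING,    /* STALL[h], DSCAN[h], DRCLM[l] */
    VM_ASYNC_RECLAIM_WEAK,  /* STALL[n], ASCAN[h], ARCLM[l] */
};

/* Assumed thresholds; the patch leaves "high" and "low" undefined. */
#define STALL_HIGH    100u
#define BYTES_HIGH    (64u * 1024u * 1024u)
#define LOW_RATIO_DIV 10u   /* reclaimed < scanned / 10 counts as low */

static enum vm_state classify(const struct vm_delta *d)
{
    assert(d != NULL);

    /* Any OOM kill dominates every other counter. */
    if (d->oom_kill > 0)
        return VM_OOM_CRITICAL;

    if (d->alloc_stall >= STALL_HIGH) {
        int dscan_high = d->direct_scan >= BYTES_HIGH;
        int drclm_high = d->direct_reclaim >= BYTES_HIGH;

        if (!dscan_high && !drclm_high)
            return VM_CGROUP_STALL;
        if (dscan_high && drclm_high)
            return VM_GLOBAL_RECLAIM_OK;
        if (dscan_high)
            return VM_GLOBAL_THRASHING;
        return VM_UNCLASSIFIED;
    }

    /* No stalls and no direct reclaim, but background reclaim is
     * scanning a lot and freeing little. */
    if (d->alloc_stall == 0 && d->direct_scan == 0 &&
        d->direct_reclaim == 0 &&
        d->async_scan >= BYTES_HIGH &&
        d->async_reclaim < d->async_scan / LOW_RATIO_DIV)
        return VM_ASYNC_RECLAIM_WEAK;

    return VM_UNCLASSIFIED;
}
```

A host-side monitor would fill struct vm_delta from two consecutive
stats reads and call classify() on the difference. Combinations not
listed in the commit message fall through to VM_UNCLASSIFIED.
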
> Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
> ---
> device-types/balloon/description.tex | 24 ++++++++++++++++++++++++
> 1 file changed, 24 insertions(+)
>
> diff --git a/device-types/balloon/description.tex b/device-types/balloon/description.tex
> index a1d9603..7ef4e3f 100644
> --- a/device-types/balloon/description.tex
> +++ b/device-types/balloon/description.tex
> @@ -305,6 +305,12 @@ \subsubsection{Memory Statistics}\label{sec:Device Types / Memory Balloon Device
> #define VIRTIO_BALLOON_S_CACHES 7
> #define VIRTIO_BALLOON_S_HTLB_PGALLOC 8
> #define VIRTIO_BALLOON_S_HTLB_PGFAIL 9
> +#define VIRTIO_BALLOON_S_OOM_KILL 10
> +#define VIRTIO_BALLOON_S_ALLOC_STALL 11
> +#define VIRTIO_BALLOON_S_ASYNC_SCAN 12
> +#define VIRTIO_BALLOON_S_DIRECT_SCAN 13
> +#define VIRTIO_BALLOON_S_ASYNC_RECLAIM 14
> +#define VIRTIO_BALLOON_S_DIRECT_RECLAIM 15
> le16 tag;
> le64 val;
> } __attribute__((packed));
> @@ -399,6 +405,24 @@ \subsubsection{Memory Statistics Tags}\label{sec:Device Types / Memory Balloon D
>
> \item[VIRTIO_BALLOON_S_HTLB_PGFAIL (9)] The number of failed hugetlb page
> allocations in the guest.
> +
> +\item[VIRTIO_BALLOON_S_OOM_KILL (10)] The count of OOM killer invocations
> + increases when the kernel invokes the OOM killer. The OOM killer selects a task
> + to terminate in order to free up memory.
> +
> +\item[VIRTIO_BALLOON_S_ALLOC_STALL (11)] The count of stalls on memory allocation.
> +
> +\item[VIRTIO_BALLOON_S_ASYNC_SCAN (12)] The amount of memory scanned asynchronously
> + by the kernel background task (in bytes).
> +
> +\item[VIRTIO_BALLOON_S_DIRECT_SCAN (13)] The amount of memory scanned directly by
> + the running task (in bytes).
> +
> +\item[VIRTIO_BALLOON_S_ASYNC_RECLAIM (14)] The amount of memory reclaimed
> + asynchronously by the kernel background task (in bytes).
> +
> +\item[VIRTIO_BALLOON_S_DIRECT_RECLAIM (15)] The amount of memory reclaimed
> + directly by the running task (in bytes).
> \end{description}

The OOM kill description is unnecessarily verbose; the rest are not very
clear and are not consistent with the existing stats.

> \subsubsection{Free Page Hinting}\label{sec:Device Types / Memory Balloon Device / Device Operation / Free Page Hinting}
> --
> 2.43.0
>
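
For reference, a minimal sketch of how a consumer might decode stat
entries that use the tags proposed in the hunk above, assuming the
packed { le16 tag; le64 val; } layout shown there (10 bytes per entry).
The tag numbers 10 to 15 are the ones proposed in this v2 and may
change; struct stat_entry, decode_stats() and the helper names are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Tag values as proposed in this v2 patch; not final. */
#define VIRTIO_BALLOON_S_OOM_KILL       10
#define VIRTIO_BALLOON_S_ALLOC_STALL    11
#define VIRTIO_BALLOON_S_ASYNC_SCAN     12
#define VIRTIO_BALLOON_S_DIRECT_SCAN    13
#define VIRTIO_BALLOON_S_ASYNC_RECLAIM  14
#define VIRTIO_BALLOON_S_DIRECT_RECLAIM 15

/* Packed wire entry: le16 tag followed by le64 val. */
#define STAT_ENTRY_SIZE 10

/* Hypothetical host-endian view of one decoded entry. */
struct stat_entry {
    uint16_t tag;
    uint64_t val;
};

static uint16_t get_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | ((uint16_t)p[1] << 8));
}

static uint64_t get_le64(const uint8_t *p)
{
    uint64_t v = 0;
    int i;

    /* Most significant byte is last on the wire. */
    for (i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Decode up to max whole entries from buf; a trailing partial entry
 * is ignored. Returns the number of entries written to out. */
static size_t decode_stats(const uint8_t *buf, size_t len,
                           struct stat_entry *out, size_t max)
{
    size_t n = 0;

    assert(buf != NULL && out != NULL);

    while (len >= STAT_ENTRY_SIZE && n < max) {
        out[n].tag = get_le16(buf);
        out[n].val = get_le64(buf + 2);
        buf += STAT_ENTRY_SIZE;
        len -= STAT_ENTRY_SIZE;
        n++;
    }
    return n;
}
```

Decoding byte by byte avoids relying on host endianness or on unaligned
access to the packed struct.
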