Re: Memcg stat for available memory

From: Shakeel Butt <shakeelb@google.com>
To: David Rientjes <rientjes@google.com>,
	Yang Shi <yang.shi@linux.alibaba.com>,
	 Roman Gushchin <guro@fb.com>, Greg Thelen <gthelen@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>,
	Michal Hocko <mhocko@kernel.org>,
	 Vladimir Davydov <vdavydov.dev@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	 Cgroups <cgroups@vger.kernel.org>, Linux MM <linux-mm@kvack.org>
Subject: Re: Memcg stat for available memory
Date: Thu, 2 Jul 2020 08:22:10 -0700
Message-ID: <CALvZod5Zv33oNLxS_8TyGV_QT4CsBjiEuocxpt2+U-XDMaFDPw@mail.gmail.com>
In-Reply-To: <alpine.DEB.2.22.394.2006281445210.855265@chino.kir.corp.google.com>

(Adding more people who might be interested in this)


On Sun, Jun 28, 2020 at 3:15 PM David Rientjes <rientjes@google.com> wrote:
>
> Hi everybody,
>
> I'd like to discuss the feasibility of a stat similar to
> si_mem_available() but at memcg scope which would specify how much memory
> can be charged without I/O.
>
> The si_mem_available() stat is based on heuristics so this does not
> provide an exact quantity that is actually available at any given time,
> but can otherwise provide userspace with some guidance on the amount of
> reclaimable memory.  See the description in
> Documentation/filesystems/proc.rst and its implementation.
>
>  [ Naturally, userspace would need to understand both the amount of memory
>    that is available for allocation and for charging, separately, on an
>    overcommitted system.  I assume this is trivial.  (Why don't we provide
>    MemAvailable in per-node meminfo?) ]
>
> For such a stat at memcg scope, we can ignore totalreserves and
> watermarks.  We already have ~precise (modulo MEMCG_CHARGE_BATCH) data for
> both file pages and slab_reclaimable.
>
> We can infer lazily free memory by doing
>
>         file - (active_file + inactive_file)
>
> (This is necessary because lazy free memory is anon but on the inactive
>  file lru and we can't infer lazy freeable memory through pglazyfree -
>  pglazyfreed, they are event counters.)
>
> We can also infer the number of underlying compound pages that are on
> deferred split queues but have yet to be split with active_anon - anon (or
> is this a bug? :)
>
> So it *seems* like userspace can make a si_mem_available()-like
> calculation ("avail") by doing
>
>         free = memory.high - memory.current
>         lazyfree = file - (active_file + inactive_file)
>         deferred = active_anon - anon
>
>         avail = free + lazyfree + deferred +
>                 (active_file + inactive_file + slab_reclaimable) / 2
>
> For userspace interested in knowing how much memory it can charge without
> incurring I/O (and assuming it has knowledge of available memory on an
> overcommitted system), it seems like:
>
>  (a) it can derive the above avail amount that is at least similar to
>      MemAvailable,
>
>  (b) it can assume that all reclaim is considered equal so anything more
>      than memory.high - memory.current is disruptive enough that it's a
>      better heuristic than the above, or
>
>  (c) the kernel provide an "avail" stat in memory.stat based on the above
>      and can evolve as the kernel implementation changes (how lazy free
>      memory impacts anon vs file lru stats, how deferred split memory is
>      handled, any future extensions for "easily reclaimable memory") that
>      userspace can count on to the same degree it can count on
>      MemAvailable.
>
> Any thoughts?
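
The userspace calculation proposed in the quoted message can be sketched as follows. This is only an illustration, not an existing kernel or library interface: the function names parse_memory_stat and memcg_avail are hypothetical, the stat keys are assumed to be the cgroup v2 memory.stat field names, and the formula (including its signs) is transcribed exactly as quoted. The caller is assumed to have already read memory.high and memory.current as integers; a memory.high value of "max" would need separate handling.

```python
def parse_memory_stat(text):
    """Parse "key value" lines (as found in cgroup v2 memory.stat) into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def memcg_avail(stat, memory_high, memory_current):
    """Estimate chargeable memory in bytes, following the quoted formula as written.

    Hypothetical helper: the kernel exposes no such stat; this only mirrors
    the heuristic proposed in the message above.
    """
    file_lru = stat["active_file"] + stat["inactive_file"]
    free = memory_high - memory_current
    lazyfree = stat["file"] - file_lru
    deferred = stat["active_anon"] - stat["anon"]
    # As in si_mem_available(), count only half of the reclaimable pools.
    return free + lazyfree + deferred + (file_lru + stat["slab_reclaimable"]) // 2
```

Note that the sketch depends on how lazy free pages and deferred split pages show up in the LRU counters, so it would have to change whenever the kernel's accounting does.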


I think we need to answer two questions:

1) What's the use case?
2) Why is it not good enough for user space to calculate its own
   MemAvailable?

The use case I have in mind is a latency-sensitive distributed
caching service that would rather shrink its cache than incur the
stalls caused by hitting the limit. Such applications can monitor
their MemAvailable and adjust their caching footprint.
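
A minimal sketch of that adjustment, assuming the service already has some per-memcg estimate of available memory. The function name next_cache_target, its parameters, and the shrink-by-shortfall policy are all hypothetical and chosen only for illustration; a real service would pick its own policy.

```python
def next_cache_target(cache_bytes, avail_bytes, headroom_bytes, min_cache_bytes=0):
    """Return a new cache size target from an estimate of available memory.

    Hypothetical policy: if the estimate falls below the desired headroom,
    shrink the cache by the shortfall (never below min_cache_bytes);
    otherwise leave the cache size unchanged.
    """
    shortfall = headroom_bytes - avail_bytes
    if shortfall <= 0:
        return cache_bytes
    return max(min_cache_bytes, cache_bytes - shortfall)
```

Calling this periodically with a fresh estimate lets the cache give memory back before the memcg reaches its limit, rather than stalling in reclaim.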

For the second, I think the point is to hide the kernel's internal
implementation details from user space. The deferred split queue is
an internal detail and we don't want it exposed to the user.
Similarly, how lazyfree is implemented (i.e. anon pages on the file
LRU) should not be exposed to users.

Shakeel

Thread overview: 7+ messages
2020-06-28 22:15 Memcg stat for available memory David Rientjes
2020-07-02 15:22 ` Shakeel Butt [this message]
2020-07-03  8:15   ` Michal Hocko
2020-07-07 19:58     ` David Rientjes
2020-07-10 19:47       ` David Rientjes
2020-07-10 21:04         ` Yang Shi
2020-07-12 22:02           ` David Rientjes
