Re: [RFC 0/1] add support for reclaiming priorities per mem cgroup


From: Minchan Kim <minchan@kernel.org>
To: Tim Murray <timmurray@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>,
	Michal Hocko <mhocko@kernel.org>,
	Vladimir Davydov <vdavydov.dev@gmail.com>,
	LKML <linux-kernel@vger.kernel.org>,
	cgroups@vger.kernel.org, Linux-MM <linux-mm@kvack.org>,
	Suren Baghdasaryan <surenb@google.com>,
	Patrik Torstensson <totte@google.com>,
	Android Kernel Team <kernel-team@android.com>,
	vinmenon@codeaurora.org
Subject: Re: [RFC 0/1] add support for reclaiming priorities per mem cgroup
Date: Wed, 22 Mar 2017 13:41:17 +0900
Message-ID: <20170322044117.GD30149@bbox>
In-Reply-To: <CAEe=SxnYXGg+s15imF4D93DVzvhVT+yo5fvAvDtKrQKdXz2kyA@mail.gmail.com>

Hi Tim,

On Tue, Mar 21, 2017 at 10:18:26AM -0700, Tim Murray wrote:
> On Sun, Mar 19, 2017 at 10:59 PM, Minchan Kim <minchan@kernel.org> wrote:
> > However, I'm not sure your approach is good. It seems your approach just
> > reclaims pages from groups (DEF_PRIORITY - memcg->priority) >= sc->priority.
> > IOW, it is based on *temporal* memory pressure fluctuation sc->priority.
> >
> > Rather than that, I guess the pages to be reclaimed should be distributed
> > by memcg->priority. Namely, if global memory pressure happens and the VM
> > wants to reclaim 100 pages, the VM should reclaim 90 pages from
> > memcg-A(priority-10) and 10 pages from memcg-B(priority-90).
> 
> This is what I debated most while writing this patch. If I'm
> understanding your concern correctly, I think I'm doing more than
> skipping high-priority cgroups:

Yes, that is my concern. It could put too much pressure on lower-priority
groups. You already reduced the scanning window for high-priority groups, so
I guess that would be enough to make it work.

My rationale is that a high-priority group can also have cold pages (for
instance, used-once pages, madvise_free pages and so on), so the VM should
age every group to reclaim cold pages, but we can reduce the scanning window
for high-priority groups to keep more of their working set, as you did. That
way, we already put more pressure on lower-priority groups than on
high-priority groups.

> 
> - If the scan isn't high priority yet, then skip high-priority cgroups.

This is the part I think is too much ;-)
I think there is no need to skip; just reduce the scanning window according
to the group's priority.

> - When the scan is high priority, scan fewer pages from
> higher-priority cgroups (using the priority to modify the shift in
> get_scan_count).

That sounds like a good idea, but it needs more tuning.
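
To make the two behaviours above concrete, here is a minimal sketch in plain userspace C. It is not the code from the RFC patch. memcg_is_scanned() restates the skip rule as summarized earlier in this mail, and scan_window() is a hypothetical "never skip, only shrink" variant in which the group's priority is simply added to the shift. Both function names and the additive shift are assumptions for illustration; only DEF_PRIORITY (12) and the size >> sc->priority shape are taken from the kernel.

```c
#include <stdbool.h>

/* Same value as the kernel's DEF_PRIORITY: sc->priority starts here
 * and counts down toward 0 as reclaim gets more aggressive. */
#define DEF_PRIORITY 12

/* Skip rule as summarized in the thread: a group is reclaimed only once
 * (DEF_PRIORITY - memcg_priority) >= sc_priority. */
bool memcg_is_scanned(int memcg_priority, int sc_priority)
{
	return (DEF_PRIORITY - memcg_priority) >= sc_priority;
}

/* Hypothetical variant: every group is always scanned, but a higher
 * memcg_priority enlarges the shift and so shrinks the scan window.
 * Assumes both priorities are non-negative. */
unsigned long scan_window(unsigned long lru_size, int sc_priority,
			  int memcg_priority)
{
	int shift = sc_priority + memcg_priority;

	/* Shifting by the full width or more is undefined in C. */
	if (shift >= (int)(sizeof(unsigned long) * 8))
		return 0;
	return lru_size >> shift;
}
```

With this sketch, a group with memcg_priority 10 is not touched at all by the skip rule until sc_priority drops to 2, whereas scan_window() still ages it on every pass, just with a much smaller window than a group with memcg_priority 0.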

How about this?

get_scan_count for memcg-A:
        ..
        size = lruvec_lru_size(lruvec, lru, sc->reclaim_idx) *
                        (memcg-A / sum(memcg all priorities))

get_scan_count for memcg-B:
        ..
        size = lruvec_lru_size(lruvec, lru, sc->reclaim_idx) *
                        (memcg-B / sum(memcg all priorities))

With that, couldn't it support the memcg hierarchy as well? I don't know. ;(
I hope the memcg folks can give it more thought.
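
The proportional idea above can be sketched as a small helper in plain C. This is only an illustration under stated assumptions: scaled_scan_size() is a hypothetical name, and it takes a generic per-group "reclaim weight" rather than the memcg priority itself, because how a priority value should map to a weight (directly or inverted) is not settled in this mail.

```c
#include <stdint.h>

/* Scale an LRU size by weight / weight_sum.
 * 'weight' is a hypothetical per-group reclaim weight and 'weight_sum'
 * is the sum of the weights of all groups being considered.
 * The multiply is done in 64 bits; this assumes lru_size * weight
 * fits in 64 bits. */
unsigned long scaled_scan_size(unsigned long lru_size, unsigned int weight,
			       unsigned int weight_sum)
{
	if (weight_sum == 0)
		return 0;
	return (unsigned long)(((uint64_t)lru_size * weight) / weight_sum);
}
```

For example, with a target of 100 pages and reclaim weights of 90 for memcg-A and 10 for memcg-B, the helper yields 90 and 10, which reproduces the 90/10 split from my earlier mail.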

> 
> This is tightly coupled with the question of what to do with
> vmpressure. The right thing to do on an Android device under memory
> pressure is probably something like this:
> 
> 1. Reclaim aggressively from low-priority background processes. The
> goal here is to reduce the pages used by background processes to the
> size of their heaps (or smaller with ZRAM) but zero file pages.
> They're already likely to be killed by userspace and we're keeping
> them around opportunistically, so a performance hit if they run and
> have to do IO to restore some of that working set is okay.
> 2. Reclaim a small amount from persistent processes. These often have
> a performance-critical subset of pages that we absolutely don't want
> paged out, but some reclaim of these processes is fine. They're large,
> some of them only run sporadically and don't impact performance, it's
> okay to touch these sometimes.

That's why I wanted to age the LRU of every memcg, but more slowly for
high-priority groups by reducing the scanning window, which gives a
high-priority group's pages more chance to be activated. So it's already a
priority boost.

> 3. If there still aren't enough free pages, notify userspace to kill
> any processes it can. If I put my "Android performance engineer
> working on userspace" hat on, what I'd want to know from userspace is
> that kswapd/direct reclaim probably has to scan foreground processes
> in order to reclaim enough free pages to satisfy watermarks. That's a
> signal I could directly act on from userspace.

Hmm, could you tell us how many memcg groups you are thinking of now?

background, foreground? Just two?

The reason I ask is that if you want to create foreground/background memcgs
and move apps back and forth between them when their status changes,
we need to remember that LRU pages are not moved from the memcg they
originated in, so it wouldn't work as expected.


> 4. If that still isn't enough, reclaim from foreground processes,
> since those processes are performance-critical.
> 
> As a result, I like not being fair about which cgroups are scanned
> initially. Some cgroups are strictly more important than others. (With

Yes, *initially* is the arguable point. I hope that only reducing the
scanning window would work. However, that's just my two cents. If that turns
out to have a problem, then yes, we need something more.


> that said, I'm not tied to enforcing unfairness in scanning. Android
> would probably use different priority levels for each app level for
> fair scanning vs unfair scanning, but my guess is that the actual
> reclaiming behavior would look similar in both schemes.)
> 
> Mem cgroup priority suggests a useful signal for vmpressure. If
> scanning is starting to touch cgroups at a higher priority than
> persistent processes, then the userspace lowmemorykiller could kill
> one or more background processes (which would be in low-priority
> cgroups that have already been scanned aggressively). The current lmk
> hand-tuned watermarks would be gone, and tuning the /proc/sys/vm knobs
> would be all that's required to make an Android device do the right
> thing in terms of memory.

Yes, that's a better way, I think.

Thanks.


