From: Balbir Singh <balbir@linux.vnet.ibm.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: James Bottomley <James.Bottomley@hansenpartnership.com>,
	Chris Mason <chris.mason@oracle.com>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	linux-mm <linux-mm@kvack.org>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Paul Menage <menage@google.com>, Li Zefan <lizf@cn.fujitsu.com>,
	containers@lists.linux-foundation.org
Subject: Re: memcg: fix fatal livelock in kswapd
Date: Sun, 8 May 2011 03:29:00 +0530
Message-ID: <BANLkTikKhjmPJKHiJa2hRBdUF2=oe8HZzg@mail.gmail.com>
In-Reply-To: <20110502224838.GB10278@cmpxchg.org>


On Tue, May 3, 2011 at 4:18 AM, Johannes Weiner <hannes@cmpxchg.org> wrote:

> Hi,
>
> On Mon, May 02, 2011 at 03:07:29PM -0500, James Bottomley wrote:
> > The fatal livelock in kswapd, reported in this thread:
> >
> > http://marc.info/?t=130392066000001
> >
> > is mitigable if we prevent the cgroups code from being so aggressive in
> > its zone shrinking (by changing its default reclaim priority from 0
> > [scan everything] to DEF_PRIORITY [scan only a fraction]).  This will
> > have an obvious knock-on effect on cgroup accounting, but it's better
> > than hanging systems.
>
> Actually, it's not that obvious.  At least not to me.  I added Balbir,
> who introduced said comment and code in the first place, to Cc.  Here is
> the comment, quoted in full:
>
>
I missed this email in my inbox; I just saw it and am responding now.


>        /*
>         * NOTE: Although we can get the priority field, using it
>         * here is not a good idea, since it limits the pages we can scan.
>         * if we don't reclaim here, the shrink_zone from balance_pgdat
>         * will pick up pages from other mem cgroup's as well. We hack
>         * the priority and make it zero.
>         */
>
> The idea is that if one memcg is above its softlimit, we prefer
> reducing pages from this memcg over reclaiming random other pages,
> including those of other memcgs.
>
>
My comment and code were based on observations from my tests.  With
DEF_PRIORITY, get_scan_count() shifts the scan target right by the priority
(scan >>= priority), which limits how many pages we can scan.  Since we know
exactly how far we are over the soft limit, it makes sense to go after those
pages so that normal balancing can be restored.
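
To make the scaling concrete, here is a small stand-alone C sketch (not the
kernel code itself; the LRU size below is a made-up number) of the arithmetic
in get_scan_count(): the per-pass scan target is the LRU size shifted right
by the reclaim priority, so DEF_PRIORITY (12) touches only 1/4096th of the
list per pass while priority 0 targets all of it:

  #include <stdio.h>

  #define DEF_PRIORITY 12   /* default reclaim priority in kernels of that era */

  /* Paraphrase of the scaling done in get_scan_count(): a higher priority
   * value means a gentler pass, since the scan target shrinks by 2^priority. */
  static unsigned long scan_target(unsigned long lru_pages, int priority)
  {
          return lru_pages >> priority;
  }

  int main(void)
  {
          unsigned long lru_pages = 1UL << 20;  /* hypothetical LRU size */

          printf("priority %2d -> scan %lu pages per pass\n",
                 DEF_PRIORITY, scan_target(lru_pages, DEF_PRIORITY));
          printf("priority  0 -> scan %lu pages per pass\n",
                 scan_target(lru_pages, 0));
          return 0;
  }

At DEF_PRIORITY the pass only nibbles at the list, which is why the comment
argues the priority shift is too limiting for targeted soft limit reclaim.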


> But the code flow looks like this:
>
>        balance_pgdat
>          mem_cgroup_soft_limit_reclaim
>            mem_cgroup_shrink_node_zone
>              shrink_zone(0, zone, &sc)
>          shrink_zone(prio, zone, &sc)
>
> so the success of the inner memcg shrink_zone does at least not
> explicitly result in the outer, global shrink_zone steering clear of
> other memcgs' pages.


Yes, but it allows soft limit reclaim to know what to target first in order
to succeed.
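
To spell out what "target first" means in the flow quoted above, here is a
paraphrased, self-contained C sketch (the function names follow the kernel,
but the bodies are stand-ins that only report what each step would do, not
the real implementations): the soft limit pass shrinks the worst offender at
priority 0 before the normal, priority-scaled global pass runs over the same
zone:

  #include <stdio.h>

  struct zone { const char *name; };
  struct scan_control { const char *target; };

  static void shrink_zone(int priority, struct zone *zone, struct scan_control *sc)
  {
          printf("shrink_zone(prio=%d) on zone %s, scanning %s\n",
                 priority, zone->name, sc->target);
  }

  /* Soft limit reclaim: shrink only the memcg most over its soft limit,
   * at priority 0, i.e. with an effectively unbounded scan target. */
  static void mem_cgroup_shrink_node_zone(struct zone *zone, struct scan_control *sc)
  {
          sc->target = "the worst soft-limit offender";
          shrink_zone(0, zone, sc);
  }

  static void mem_cgroup_soft_limit_reclaim(struct zone *zone, struct scan_control *sc)
  {
          mem_cgroup_shrink_node_zone(zone, sc);
  }

  int main(void)
  {
          struct zone zone = { "Normal" };
          struct scan_control sc = { "" };
          int priority = 12;      /* DEF_PRIORITY */

          /* One balance_pgdat() pass: memcg-targeted reclaim first, then
           * the global, priority-scaled pass over all LRU pages. */
          mem_cgroup_soft_limit_reclaim(&zone, &sc);
          sc.target = "pages from all memcgs";
          shrink_zone(priority, &zone, &sc);
          return 0;
  }

The disputed piece is exactly that inner priority-0 call: it makes the
targeted pass unbounded, while the outer pass stays throttled by the
priority.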


>  It just tries to move the pressure of balancing
> the zones to the memcg with the biggest soft limit excess.  That can
> only really work if the memcg is a large enough contributor to the
> zone's total number of lru pages, though, and looks very likely to hit
> the exceeding memcg too hard in other cases.
>
> I am very much for removing this hack.  There is still more scan
> pressure applied to memcgs in excess of their soft limit even if the
> extra scan is happening at a sane priority level.  And the fact that
> global reclaim operates completely unaware of memcgs is a different
> story.
>
> However, this code came into place with v2.6.31-8387-g4e41695.  Why is
> it only now showing up?
>
> You also wrote in that thread that this happens on a standard F15
> installation.  On the F15 I am running here, systemd does not
> configure memcgs, however.  Did you manually configure memcgs and set
> soft limits?  Because I wonder how it ended up in soft limit reclaim
> in the first place.
>
>
I am running F15 as well, but have never hit the problem so far.  I am
surprised to see the stack trace posted in the thread; it seems like you
never explicitly enabled anything that would wake up the memcg beast :)

Balbir


