From: Johannes Weiner <hannes@cmpxchg.org>
To: Dave Hansen <dave@sr71.net>
Cc: Michal Hocko <mhocko@suse.com>, Hugh Dickins <hughd@google.com>,
Tejun Heo <tj@kernel.org>,
Vladimir Davydov <vdavydov@parallels.com>,
Linus Torvalds <torvalds@linux-foundation.org>,
Andrew Morton <akpm@linux-foundation.org>,
LKML <linux-kernel@vger.kernel.org>,
Linux-MM <linux-mm@kvack.org>
Subject: Re: regression caused by cgroups optimization in 3.17-rc2
Date: Tue, 2 Sep 2014 18:18:14 -0400 [thread overview]
Message-ID: <20140902221814.GA18069@cmpxchg.org> (raw)
In-Reply-To: <54061505.8020500@sr71.net>
Hi Dave,
On Tue, Sep 02, 2014 at 12:05:41PM -0700, Dave Hansen wrote:
> I'm seeing a pretty large regression in 3.17-rc2 vs 3.16 coming from the
> memory cgroups code. This is on a kernel with cgroups enabled at
> compile time, but not _used_ for anything. See the green lines in the
> graph:
>
> https://www.sr71.net/~dave/intel/regression-from-05b843012.png
>
> The workload is a little parallel microbenchmark doing page faults:
Ouch.
> > https://github.com/antonblanchard/will-it-scale/blob/master/tests/page_fault2.c
>
> The hardware is an 8-socket Westmere box with 160 hardware threads. For
> some reason, this does not affect the version of the microbenchmark
> which is doing completely anonymous page faults.
>
> I bisected it down to this commit:
>
> > commit 05b8430123359886ef6a4146fba384e30d771b3f
> > Author: Johannes Weiner <hannes@cmpxchg.org>
> > Date: Wed Aug 6 16:05:59 2014 -0700
> >
> > mm: memcontrol: use root_mem_cgroup res_counter
> >
> > Due to an old optimization to keep expensive res_counter changes at a
> > minimum, the root_mem_cgroup res_counter is never charged; there is no
> > limit at that level anyway, and any statistics can be generated on
> > demand by summing up the counters of all other cgroups.
> >
> > However, with per-cpu charge caches, res_counter operations do not even
> > show up in profiles anymore, so this optimization is no longer
> > necessary.
> >
> > Remove it to simplify the code.
Accounting new pages is buffered through per-cpu caches, but taking
them off the counters on free is not, so I'm guessing that above a
certain allocation rate the cost of locking and changing the counters
takes over. Is there a chance you could profile this to see if locks
and res_counter-related operations show up?
I can't reproduce this complete breakdown on my smaller test gear, but
I do see an improvement with the following patch:
---
>From 29a51326c24b7bb45d17e9c864c34506f10868f6 Mon Sep 17 00:00:00 2001
From: Johannes Weiner <hannes@cmpxchg.org>
Date: Tue, 2 Sep 2014 18:11:39 -0400
Subject: [patch] mm: memcontrol: use per-cpu caches for uncharging
---
mm/memcontrol.c | 28 ++++++++++++++++++++++------
1 file changed, 22 insertions(+), 6 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index ec4dcf1b9562..cb79ecff399d 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2365,6 +2365,7 @@ static void drain_stock(struct memcg_stock_pcp *stock)
res_counter_uncharge(&old->res, bytes);
if (do_swap_account)
res_counter_uncharge(&old->memsw, bytes);
+ memcg_oom_recover(old);
stock->nr_pages = 0;
}
stock->cached = NULL;
@@ -2405,6 +2406,13 @@ static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
stock->cached = memcg;
}
stock->nr_pages += nr_pages;
+ if (stock->nr_pages > CHARGE_BATCH * 4) {
+ res_counter_uncharge(&memcg->res, CHARGE_BATCH);
+ if (do_swap_account)
+ res_counter_uncharge(&memcg->memsw, CHARGE_BATCH);
+ memcg_oom_recover(memcg);
+ stock->nr_pages -= CHARGE_BATCH;
+ }
put_cpu_var(memcg_stock);
}
@@ -6509,12 +6517,20 @@ static void uncharge_batch(struct mem_cgroup *memcg, unsigned long pgpgout,
{
unsigned long flags;
- if (nr_mem)
- res_counter_uncharge(&memcg->res, nr_mem * PAGE_SIZE);
- if (nr_memsw)
- res_counter_uncharge(&memcg->memsw, nr_memsw * PAGE_SIZE);
-
- memcg_oom_recover(memcg);
+ /*
+ * The percpu caches count both memory and memsw charges in a
+ * single conuter, but there might be less memsw charges when
+ * some of the pages have been swapped out.
+ */
+ if (nr_mem == nr_memsw)
+ refill_stock(memcg, nr_mem);
+ else {
+ if (nr_mem)
+ res_counter_uncharge(&memcg->res, nr_mem * PAGE_SIZE);
+ if (nr_memsw)
+ res_counter_uncharge(&memcg->memsw, nr_memsw * PAGE_SIZE);
+ memcg_oom_recover(memcg);
+ }
local_irq_save(flags);
__this_cpu_sub(memcg->stat->count[MEM_CGROUP_STAT_RSS], nr_anon);
--
2.0.4
next prev parent reply other threads:[~2014-09-02 22:18 UTC|newest]
Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-09-02 19:05 regression caused by cgroups optimization in 3.17-rc2 Dave Hansen
2014-09-02 20:18 ` Dave Hansen
2014-09-02 20:57 ` Dave Hansen
2014-09-04 14:27 ` Michal Hocko
2014-09-04 20:27 ` Dave Hansen
2014-09-04 22:53 ` Dave Hansen
2014-09-05 9:28 ` Michal Hocko
2014-09-05 9:25 ` Michal Hocko
2014-09-05 14:47 ` Johannes Weiner
2014-09-05 15:39 ` Michal Hocko
2014-09-10 16:29 ` Michal Hocko
2014-09-10 16:57 ` Dave Hansen
2014-09-10 17:05 ` Michal Hocko
2014-09-05 12:35 ` Johannes Weiner
2014-09-08 15:47 ` Dave Hansen
2014-09-09 14:50 ` Johannes Weiner
2014-09-09 18:23 ` Dave Hansen
2014-09-02 22:18 ` Johannes Weiner [this message]
2014-09-02 22:36 ` Dave Hansen
2014-09-03 0:10 ` Johannes Weiner
2014-09-03 0:20 ` Linus Torvalds
2014-09-03 1:33 ` Johannes Weiner
2014-09-03 3:15 ` Dave Hansen
2014-09-03 0:30 ` Dave Hansen
2014-09-04 15:08 ` Johannes Weiner
2014-09-04 20:50 ` Dave Hansen
2014-09-05 8:04 ` Michal Hocko
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20140902221814.GA18069@cmpxchg.org \
--to=hannes@cmpxchg.org \
--cc=akpm@linux-foundation.org \
--cc=dave@sr71.net \
--cc=hughd@google.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mhocko@suse.com \
--cc=tj@kernel.org \
--cc=torvalds@linux-foundation.org \
--cc=vdavydov@parallels.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox