From: Vladimir Davydov <vdavydov@parallels.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Michal Hocko <mhocko@suse.cz>,
cgroups@vger.kernel.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org
Subject: Re: [patch 1/3] mm: memcontrol: lockless page counters
Date: Fri, 17 Oct 2014 09:47:18 +0200
Message-ID: <20141017074718.GB5641@esperanza>
In-Reply-To: <1413251163-8517-2-git-send-email-hannes@cmpxchg.org>
On Mon, Oct 13, 2014 at 09:46:01PM -0400, Johannes Weiner wrote:
> Memory is internally accounted in bytes, using spinlock-protected
> 64-bit counters, even though the smallest accounting delta is a page.
> The counter interface is also convoluted and does too many things.
>
> Introduce a new lockless word-sized page counter API, then change all
> memory accounting over to it. The translation from and to bytes then
> only happens when interfacing with userspace.
>
> The removed locking overhead is noticeable when scaling beyond the
> per-cpu charge caches - on a 4-socket machine with 144 threads, the
> following test shows the performance differences of 288 memcgs
> concurrently running a page fault benchmark:
>
> vanilla:
>
>     18631648.500498  task-clock (msec)       # 140.643 CPUs utilized    ( +-  0.33% )
>           1,380,638  context-switches        #   0.074 K/sec            ( +-  0.75% )
>              24,390  cpu-migrations          #   0.001 K/sec            ( +-  8.44% )
>       1,843,305,768  page-faults             #   0.099 M/sec            ( +-  0.00% )
>  50,134,994,088,218  cycles                  #   2.691 GHz              ( +-  0.33% )
>     <not supported>  stalled-cycles-frontend
>     <not supported>  stalled-cycles-backend
>   8,049,712,224,651  instructions            #    0.16 insns per cycle  ( +-  0.04% )
>   1,586,970,584,979  branches                #  85.176 M/sec            ( +-  0.05% )
>       1,724,989,949  branch-misses           #   0.11% of all branches  ( +-  0.48% )
>
>       132.474343877  seconds time elapsed                               ( +-  0.21% )
>
> lockless:
>
>     12195979.037525  task-clock (msec)       # 133.480 CPUs utilized    ( +-  0.18% )
>             832,850  context-switches        #   0.068 K/sec            ( +-  0.54% )
>              15,624  cpu-migrations          #   0.001 K/sec            ( +- 10.17% )
>       1,843,304,774  page-faults             #   0.151 M/sec            ( +-  0.00% )
>  32,811,216,801,141  cycles                  #   2.690 GHz              ( +-  0.18% )
>     <not supported>  stalled-cycles-frontend
>     <not supported>  stalled-cycles-backend
>   9,999,265,091,727  instructions            #    0.30 insns per cycle  ( +-  0.10% )
>   2,076,759,325,203  branches                # 170.282 M/sec            ( +-  0.12% )
>       1,656,917,214  branch-misses           #   0.08% of all branches  ( +-  0.55% )
>
>        91.369330729  seconds time elapsed                               ( +-  0.45% )
>
> On top of improved scalability, this also gets rid of the icky long
> long types in the very heart of memcg, which is great for 32 bit and
> also makes the code a lot more readable.
>
> Notable differences between the old and new API:
>
> - res_counter_charge() and res_counter_charge_nofail() become
>   page_counter_try_charge() and page_counter_charge() resp. to match
>   the more common kernel naming scheme of try_do()/do()
>
> - res_counter_uncharge_until() is only ever used to cancel a local
>   counter and never to uncharge bigger segments of a hierarchy, so
>   it's replaced by the simpler page_counter_cancel()
>
> - res_counter_set_limit() is replaced by page_counter_limit(), which
>   expects its callers to serialize against themselves
>
> - res_counter_memparse_write_strategy() is replaced by
>   page_counter_limit(), which rounds down to the nearest page size -
>   rather than up. This is more reasonable for explicitly requested
>   hard upper limits.
>
> - to keep charging light-weight, page_counter_try_charge() charges
>   speculatively, only to roll back if the result exceeds the limit.
>   Because of this, a failing bigger charge can temporarily lock out
>   smaller charges that would otherwise succeed. The error is bounded
>   to the difference between the smallest and the biggest possible
>   charge size, so for memcg, this means that a failing THP charge can
>   send base page charges into reclaim up to 2MB (4MB) before the limit
>   would have been reached. This should be acceptable.
>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Definitely better than it was.
Acked-by: Vladimir Davydov <vdavydov@parallels.com>
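
For readers following along, the speculative charge-and-rollback scheme
and the round-down limit conversion described in the changelog can be
modelled roughly as below. This is only a userspace sketch, not the code
from the patch: it uses C11 atomics in place of the kernel's atomic
types, every sketch_* name and struct counter_sketch are made up for
illustration, and SKETCH_PAGE_SIZE assumes a 4096-byte base page.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096ULL	/* assumed base page size */

/* Hypothetical userspace model, not the kernel's struct page_counter. */
struct counter_sketch {
	atomic_long count;		/* pages currently charged */
	long limit;			/* limit in pages; setters serialize */
	struct counter_sketch *parent;	/* NULL at the hierarchy root */
};

/* Undo a charge on this one counter only, without walking the hierarchy. */
static void sketch_cancel(struct counter_sketch *c, long nr_pages)
{
	atomic_fetch_sub(&c->count, nr_pages);
}

/*
 * Charge nr_pages against c and all of its ancestors. Each level is
 * charged speculatively with a single atomic add. If a level ends up
 * above its limit, the charge is rolled back on that level and on every
 * level below it, and *fail reports which counter hit its limit.
 */
static bool sketch_try_charge(struct counter_sketch *c, long nr_pages,
			      struct counter_sketch **fail)
{
	struct counter_sketch *pos, *undo;

	for (pos = c; pos; pos = pos->parent) {
		long total = atomic_fetch_add(&pos->count, nr_pages) + nr_pages;

		if (total > pos->limit) {
			/*
			 * Between the add above and this cancel, other
			 * chargers can see the inflated count and fail
			 * even though they would otherwise have fit.
			 */
			sketch_cancel(pos, nr_pages);
			*fail = pos;
			for (undo = c; undo != pos; undo = undo->parent)
				sketch_cancel(undo, nr_pages);
			return false;
		}
	}
	return true;
}

/* Convert a byte limit coming from userspace to pages, rounding down. */
static long sketch_bytes_to_pages(unsigned long long bytes)
{
	return (long)(bytes / SKETCH_PAGE_SIZE);
}
```

The window between the atomic add and the cancel is where a failing
large charge can transiently push smaller charges over the limit, which
is the bounded error the last bullet of the changelog refers to.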