From: Vladimir Davydov <vdavydov@parallels.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Michal Hocko <mhocko@suse.cz>,
	cgroups@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: Re: [patch 1/3] mm: memcontrol: lockless page counters
Date: Fri, 17 Oct 2014 09:47:18 +0200
Message-ID: <20141017074718.GB5641@esperanza>
In-Reply-To: <1413251163-8517-2-git-send-email-hannes@cmpxchg.org>

On Mon, Oct 13, 2014 at 09:46:01PM -0400, Johannes Weiner wrote:
> Memory is internally accounted in bytes, using spinlock-protected
> 64-bit counters, even though the smallest accounting delta is a page.
> The counter interface is also convoluted and does too many things.
> 
> Introduce a new lockless word-sized page counter API, then change all
> memory accounting over to it.  The translation from and to bytes then
> only happens when interfacing with userspace.
> 
> The removed locking overhead is noticeable when scaling beyond the
> per-cpu charge caches - on a 4-socket machine with 144 threads, the
> following test shows the performance differences of 288 memcgs
> concurrently running a page fault benchmark:
> 
> vanilla:
> 
>    18631648.500498      task-clock (msec)         #  140.643 CPUs utilized            ( +-  0.33% )
>          1,380,638      context-switches          #    0.074 K/sec                    ( +-  0.75% )
>             24,390      cpu-migrations            #    0.001 K/sec                    ( +-  8.44% )
>      1,843,305,768      page-faults               #    0.099 M/sec                    ( +-  0.00% )
> 50,134,994,088,218      cycles                    #    2.691 GHz                      ( +-  0.33% )
>    <not supported>      stalled-cycles-frontend
>    <not supported>      stalled-cycles-backend
>  8,049,712,224,651      instructions              #    0.16  insns per cycle          ( +-  0.04% )
>  1,586,970,584,979      branches                  #   85.176 M/sec                    ( +-  0.05% )
>      1,724,989,949      branch-misses             #    0.11% of all branches          ( +-  0.48% )
> 
>      132.474343877 seconds time elapsed                                          ( +-  0.21% )
> 
> lockless:
> 
>    12195979.037525      task-clock (msec)         #  133.480 CPUs utilized            ( +-  0.18% )
>            832,850      context-switches          #    0.068 K/sec                    ( +-  0.54% )
>             15,624      cpu-migrations            #    0.001 K/sec                    ( +- 10.17% )
>      1,843,304,774      page-faults               #    0.151 M/sec                    ( +-  0.00% )
> 32,811,216,801,141      cycles                    #    2.690 GHz                      ( +-  0.18% )
>    <not supported>      stalled-cycles-frontend
>    <not supported>      stalled-cycles-backend
>  9,999,265,091,727      instructions              #    0.30  insns per cycle          ( +-  0.10% )
>  2,076,759,325,203      branches                  #  170.282 M/sec                    ( +-  0.12% )
>      1,656,917,214      branch-misses             #    0.08% of all branches          ( +-  0.55% )
> 
>       91.369330729 seconds time elapsed                                          ( +-  0.45% )
> 
> On top of improved scalability, this also gets rid of the icky long
> long types in the very heart of memcg, which is great for 32 bit and
> also makes the code a lot more readable.
> 
> Notable differences between the old and new API:
> 
> - res_counter_charge() and res_counter_charge_nofail() become
>   page_counter_try_charge() and page_counter_charge() resp. to match
>   the more common kernel naming scheme of try_do()/do()
> 
> - res_counter_uncharge_until() is only ever used to cancel a local
>   counter and never to uncharge bigger segments of a hierarchy, so
>   it's replaced by the simpler page_counter_cancel()
> 
> - res_counter_set_limit() is replaced by page_counter_limit(), which
>   expects its callers to serialize against themselves
> 
> - res_counter_memparse_write_strategy() is replaced by
>   page_counter_limit(), which rounds down to the nearest page size -
>   rather than up.  This is more reasonable for explicitly requested
>   hard upper limits.
> 
> - to keep charging light-weight, page_counter_try_charge() charges
>   speculatively, only to roll back if the result exceeds the limit.
>   Because of this, a failing bigger charge can temporarily lock out
>   smaller charges that would otherwise succeed.  The error is bounded
>   to the difference between the smallest and the biggest possible
>   charge size, so for memcg, this means that a failing THP charge can
>   send base page charges into reclaim up to 2MB (4MB) before the limit
>   would have been reached.  This should be acceptable.
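
For readers following along, the speculative charge described in the
last item can be sketched in userspace with C11 atomics. This is only
an illustration under assumptions, not the code from the patch: the
struct counter type and the counter_try_charge()/counter_cancel()
helpers are hypothetical stand-ins for the page_counter API named
above, and the limit is read without any serialization.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a hierarchical page counter. */
struct counter {
	atomic_long count;      /* pages currently charged */
	long limit;             /* hard limit, in pages */
	struct counter *parent; /* NULL at the root of the hierarchy */
};

/* Unconditionally drop nr_pages from a single counter. */
static void counter_cancel(struct counter *c, long nr_pages)
{
	atomic_fetch_sub(&c->count, nr_pages);
}

/*
 * Charge nr_pages to c and every ancestor, without taking a lock.
 * Each level is charged first and checked afterwards; if a level ends
 * up over its limit, that level and all levels already charged below
 * it are rolled back. On failure, *fail (if non-NULL) is set to the
 * counter that hit its limit.
 */
static bool counter_try_charge(struct counter *c, long nr_pages,
			       struct counter **fail)
{
	struct counter *pos, *undo;

	for (pos = c; pos; pos = pos->parent) {
		/* Speculative: the add is visible before the check. */
		long now = atomic_fetch_add(&pos->count, nr_pages) + nr_pages;

		if (now > pos->limit) {
			/* Undo the level that overflowed ... */
			atomic_fetch_sub(&pos->count, nr_pages);
			if (fail)
				*fail = pos;
			/* ... and the levels charged before it. */
			for (undo = c; undo != pos; undo = undo->parent)
				counter_cancel(undo, nr_pages);
			return false;
		}
	}
	return true;
}
```

Between the add and the rollback, other CPUs see the inflated count.
That window is where a failing large charge can make a concurrent
small charge fail even though it would have fit.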
> 
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

Definitely better than it was.

Acked-by: Vladimir Davydov <vdavydov@parallels.com>
