Re: [patch 1/3] mm: memcontrol: lockless page counters

From: Vladimir Davydov <vdavydov@parallels.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Michal Hocko <mhocko@suse.cz>,
	cgroups@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: Re: [patch 1/3] mm: memcontrol: lockless page counters
Date: Fri, 17 Oct 2014 09:47:18 +0200
Message-ID: <20141017074718.GB5641@esperanza>
In-Reply-To: <1413251163-8517-2-git-send-email-hannes@cmpxchg.org>

On Mon, Oct 13, 2014 at 09:46:01PM -0400, Johannes Weiner wrote:
> Memory is internally accounted in bytes, using spinlock-protected
> 64-bit counters, even though the smallest accounting delta is a page.
> The counter interface is also convoluted and does too many things.
> 
> Introduce a new lockless word-sized page counter API, then change all
> memory accounting over to it.  The translation from and to bytes then
> only happens when interfacing with userspace.
> 
> The removed locking overhead is noticeable when scaling beyond the
> per-cpu charge caches - on a 4-socket machine with 144 threads, the
> following test shows the performance differences of 288 memcgs
> concurrently running a page fault benchmark:
> 
> vanilla:
> 
>    18631648.500498      task-clock (msec)         #  140.643 CPUs utilized            ( +-  0.33% )
>          1,380,638      context-switches          #    0.074 K/sec                    ( +-  0.75% )
>             24,390      cpu-migrations            #    0.001 K/sec                    ( +-  8.44% )
>      1,843,305,768      page-faults               #    0.099 M/sec                    ( +-  0.00% )
> 50,134,994,088,218      cycles                    #    2.691 GHz                      ( +-  0.33% )
>    <not supported>      stalled-cycles-frontend
>    <not supported>      stalled-cycles-backend
>  8,049,712,224,651      instructions              #    0.16  insns per cycle          ( +-  0.04% )
>  1,586,970,584,979      branches                  #   85.176 M/sec                    ( +-  0.05% )
>      1,724,989,949      branch-misses             #    0.11% of all branches          ( +-  0.48% )
> 
>      132.474343877 seconds time elapsed                                          ( +-  0.21% )
> 
> lockless:
> 
>    12195979.037525      task-clock (msec)         #  133.480 CPUs utilized            ( +-  0.18% )
>            832,850      context-switches          #    0.068 K/sec                    ( +-  0.54% )
>             15,624      cpu-migrations            #    0.001 K/sec                    ( +- 10.17% )
>      1,843,304,774      page-faults               #    0.151 M/sec                    ( +-  0.00% )
> 32,811,216,801,141      cycles                    #    2.690 GHz                      ( +-  0.18% )
>    <not supported>      stalled-cycles-frontend
>    <not supported>      stalled-cycles-backend
>  9,999,265,091,727      instructions              #    0.30  insns per cycle          ( +-  0.10% )
>  2,076,759,325,203      branches                  #  170.282 M/sec                    ( +-  0.12% )
>      1,656,917,214      branch-misses             #    0.08% of all branches          ( +-  0.55% )
> 
>       91.369330729 seconds time elapsed                                          ( +-  0.45% )
> 
> On top of improved scalability, this also gets rid of the icky long
> long types in the very heart of memcg, which is great for 32 bit and
> also makes the code a lot more readable.
> 
> Notable differences between the old and new API:
> 
> - res_counter_charge() and res_counter_charge_nofail() become
>   page_counter_try_charge() and page_counter_charge() resp. to match
>   the more common kernel naming scheme of try_do()/do()
> 
> - res_counter_uncharge_until() is only ever used to cancel a local
>   counter and never to uncharge bigger segments of a hierarchy, so
>   it's replaced by the simpler page_counter_cancel()
> 
> - res_counter_set_limit() is replaced by page_counter_limit(), which
>   expects its callers to serialize against themselves
> 
> - res_counter_memparse_write_strategy() is replaced by
>   page_counter_limit(), which rounds down to the nearest page size -
>   rather than up.  This is more reasonable for explicitly requested
>   hard upper limits.
> 
> - to keep charging light-weight, page_counter_try_charge() charges
>   speculatively, only to roll back if the result exceeds the limit.
>   Because of this, a failing bigger charge can temporarily lock out
>   smaller charges that would otherwise succeed.  The error is bounded
>   to the difference between the smallest and the biggest possible
>   charge size, so for memcg, this means that a failing THP charge can
>   send base page charges into reclaim up to 2MB (4MB) before the limit
>   would have been reached.  This should be acceptable.
> 
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

Definitely better than it was.

Acked-by: Vladimir Davydov <vdavydov@parallels.com>
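
For readers following the thread, the speculative charging described in
the last bullet of the quoted changelog can be sketched in userspace C
roughly as below.  This is only an illustration under stated assumptions,
not the code from the patch: struct counter, counter_init(),
counter_cancel() and counter_try_charge() are hypothetical names modelled
on the description, and C11 atomics stand in for the kernel's atomic types.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of a hierarchical page counter. */
struct counter {
	atomic_long count;	/* pages currently charged */
	long limit;		/* hard limit, in pages */
	struct counter *parent;	/* NULL at the root of the hierarchy */
};

static void counter_init(struct counter *c, long limit, struct counter *parent)
{
	atomic_init(&c->count, 0);
	c->limit = limit;
	c->parent = parent;
}

/* Drop nr_pages from this counter only, without touching ancestors. */
static void counter_cancel(struct counter *c, long nr_pages)
{
	atomic_fetch_sub(&c->count, nr_pages);
}

/*
 * Charge nr_pages to @counter and every ancestor without taking a lock.
 * Each level is charged first and checked afterwards; if a level ends up
 * over its limit, that level and all levels charged before it are rolled
 * back.  On failure, *fail points at the counter that hit its limit.
 */
static bool counter_try_charge(struct counter *counter, long nr_pages,
			       struct counter **fail)
{
	struct counter *c;

	for (c = counter; c; c = c->parent) {
		/* Speculative add: the new value is visible to others now. */
		long now = atomic_fetch_add(&c->count, nr_pages) + nr_pages;

		if (now > c->limit) {
			counter_cancel(c, nr_pages);
			*fail = c;
			goto rollback;
		}
	}
	return true;

rollback:
	/* Undo the levels below the failing one. */
	for (struct counter *u = counter; u != c; u = u->parent)
		counter_cancel(u, nr_pages);
	return false;
}
```

With a child limited to 100 pages under a root limited to 10, charging
8 pages succeeds, while a further 4-page charge fails at the root and
leaves both counters at 8.  Between the speculative add and the rollback
the counters briefly include the failing charge, which is the window in
which a smaller concurrent charge can be refused, as the changelog notes.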

