Re: regression caused by cgroups optimization in 3.17-rc2

From: Dave Hansen <dave.hansen@intel.com>
To: Johannes Weiner <hannes@cmpxchg.org>, Dave Hansen <dave@sr71.net>
Cc: Michal Hocko <mhocko@suse.cz>, Hugh Dickins <hughd@google.com>,
	Tejun Heo <tj@kernel.org>, Linux-MM <linux-mm@kvack.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Vladimir Davydov <vdavydov@parallels.com>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: regression caused by cgroups optimization in 3.17-rc2
Date: Mon, 08 Sep 2014 08:47:37 -0700
Message-ID: <540DCF99.2070900@intel.com>
In-Reply-To: <20140905123517.GA21208@cmpxchg.org>

On 09/05/2014 05:35 AM, Johannes Weiner wrote:
> On Thu, Sep 04, 2014 at 01:27:26PM -0700, Dave Hansen wrote:
>> On 09/04/2014 07:27 AM, Michal Hocko wrote:
>>> Ouch. free_pages_and_swap_cache completely kills the uncharge batching
>>> because it reduces it to PAGEVEC_SIZE batches.
>>>
>>> I think we really do not need PAGEVEC_SIZE batching anymore. We are
>>> already batching on tlb_gather layer. That one is limited so I think
>>> the below should be safe but I have to think about this some more. There
>>> is a risk of prolonged lru_lock wait times but the number of pages is
>>> limited to 10k and the heavy work is done outside of the lock. If this
>>> is really a problem then we can tear the LRU part and the actual
>>> freeing/uncharging into separate functions in this path.
>>>
>>> Could you test with this half baked patch, please? I didn't get to test
>>> it myself unfortunately.
>>
>> 3.16 settled out at about 11.5M faults/sec before the regression.  This
>> patch gets it back up to about 10.5M, which is good.  The top spinlock
>> contention in the kernel is still from the resource counter code via
>> mem_cgroup_commit_charge(), though.
> 
> Thanks for testing, that looks a lot better.
> 
> But commit doesn't touch resource counters - did you mean try_charge()
> or uncharge() by any chance?
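
To put rough numbers on the batching point quoted above, here is a minimal
sketch. It assumes PAGEVEC_SIZE is 14 (the value is not stated in this
thread) and takes the 10k-page tlb_gather limit from the quoted message.
The name uncharge_batches is hypothetical; it only counts how many separate
uncharge batches a given chunk size produces, and is not kernel code.

```python
PAGEVEC_SIZE = 14          # assumption: not stated anywhere in this thread
TLB_GATHER_LIMIT = 10_000  # "limited to 10k" pages, per the quoted message


def uncharge_batches(nr_pages, batch_size):
    """Return how many separate uncharge batches are needed for nr_pages."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    # Integer ceiling division: a partial final batch still counts as one.
    return (nr_pages + batch_size - 1) // batch_size


# Freeing split into PAGEVEC_SIZE chunks vs. one tlb_gather-sized batch.
print(uncharge_batches(TLB_GATHER_LIMIT, PAGEVEC_SIZE))
print(uncharge_batches(TLB_GATHER_LIMIT, TLB_GATHER_LIMIT))
```

With these assumed values the first call yields 715 batches and the second
yields 1, which is the gap the quoted message describes.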

I don't have the perf output that I was looking at when I said this, but
here's the path that I think I was referring to.  The inlining makes
this non-obvious, but memcg_check_events() calls
mem_cgroup_update_tree(), which is contending on mctz->lock.

So, you were right: it's not the resource counter code, it's a lock in
'struct mem_cgroup_tree_per_zone'.  The contention isn't _that_ high
(2% of CPU) in this case, but it is 2% that we didn't see before.

>      1.87%     1.87%  [kernel]               [k] _raw_spin_lock_irqsave       
>                                |
>                                --- _raw_spin_lock_irqsave
>                                   |          
>                                   |--107.09%-- memcg_check_events
>                                   |          |          
>                                   |          |--79.98%-- mem_cgroup_commit_charge
>                                   |          |          |          
>                                   |          |          |--99.81%-- do_cow_fault
>                                   |          |          |          handle_mm_fault
>                                   |          |          |          __do_page_fault
>                                   |          |          |          do_page_fault
>                                   |          |          |          page_fault
>                                   |          |          |          testcase
>                                   |          |           --0.19%-- [...]
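
The shape of that call path can be illustrated with a toy model: several
workers (standing in for CPUs taking page faults) each take one shared lock
every so many events. This is a sketch under stated assumptions, not the
kernel's logic. The function run_model, the per-worker check_interval, and
the idea that the check fires at a fixed interval are all hypothetical
choices for this example; the only thing taken from the message is that
many fault paths funnel into a single lock.

```python
import threading


def run_model(nr_cpus, events_per_cpu, check_interval):
    """Count acquisitions of one shared lock across all simulated CPUs."""
    shared_lock = threading.Lock()  # stands in for the single contended lock
    acquisitions = 0

    def cpu_worker():
        nonlocal acquisitions
        for event in range(1, events_per_cpu + 1):
            # Hypothetical rule: only every check_interval-th event
            # goes on to take the shared lock.
            if event % check_interval == 0:
                with shared_lock:
                    acquisitions += 1  # updated only while holding the lock

    threads = [threading.Thread(target=cpu_worker) for _ in range(nr_cpus)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return acquisitions


print(run_model(nr_cpus=4, events_per_cpu=1000, check_interval=100))
```

In this model the number of acquisitions grows with the number of workers
(4 workers at 1000 events each and an interval of 100 give 40), so a lock
that is cheap for one worker becomes a shared cost as more of them fault in
parallel.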

