Re: [PATCH 4/4] memcg: get rid of percpu_charge_mutex lock

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Johannes Weiner <jweiner@redhat.com>
To: Michal Hocko <mhocko@suse.cz>
Cc: linux-mm@kvack.org, Balbir Singh <bsingharora@gmail.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 4/4] memcg: get rid of percpu_charge_mutex lock
Date: Mon, 8 Aug 2011 20:47:38 +0200	[thread overview]
Message-ID: <20110808184738.GA7749@redhat.com> (raw)
In-Reply-To: <a9244082ba28c4c2e4a6997311d5493bdaa117e9.1311338634.git.mhocko@suse.cz>

On Fri, Jul 22, 2011 at 01:20:25PM +0200, Michal Hocko wrote:
> percpu_charge_mutex protects from multiple simultaneous per-cpu charge
> caches draining because we might end up having too many work items.
> At least this was the case until 26fe6168 (memcg: fix percpu cached
> charge draining frequency) when we introduced a more targeted draining
> for async mode.
> Now that also sync draining is targeted we can safely remove mutex
> because we will not send more work than the current number of CPUs.
> FLUSHING_CACHED_CHARGE protects from sending the same work multiple
> times and stock->nr_pages == 0 protects from pointless sending a work
> if there is obviously nothing to be done. This is of course racy but we
> can live with it as the race window is really small (we would have to
> see FLUSHING_CACHED_CHARGE cleared while nr_pages would be still
> non-zero).
> The only remaining place where we can race is synchronous mode when we
> rely on FLUSHING_CACHED_CHARGE test which might have been set by other
> drainer on the same group but we should wait in that case as well.
> 
> Signed-off-by: Michal Hocko <mhocko@suse.cz>
> ---
>  mm/memcontrol.c |   12 ++----------
>  1 files changed, 2 insertions(+), 10 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 3685107..f8463a0 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2071,7 +2071,6 @@ struct memcg_stock_pcp {
>  #define FLUSHING_CACHED_CHARGE	(0)
>  };
>  static DEFINE_PER_CPU(struct memcg_stock_pcp, memcg_stock);
> -static DEFINE_MUTEX(percpu_charge_mutex);
>  
>  /*
>   * Try to consume stocked charge on this cpu. If success, one page is consumed
> @@ -2178,7 +2177,8 @@ static void drain_all_stock(struct mem_cgroup *root_mem, bool sync)
>  
>  	for_each_online_cpu(cpu) {
>  		struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu);
> -		if (test_bit(FLUSHING_CACHED_CHARGE, &stock->flags))
> +		if (mem_cgroup_same_or_subtree(root_mem, stock->cached) &&
> +				test_bit(FLUSHING_CACHED_CHARGE, &stock->flags))
>  			flush_work(&stock->work);
>  	}
>  out:

This hunk triggers a crash for me, as the draining is already done and
stock->cached reset to NULL when dereferenced here.  Oops is attached.

We have this loop in drain_all_stock():

	for_each_online_cpu(cpu) {
		struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu);
		struct mem_cgroup *mem;

		mem = stock->cached;
		if (!mem || !stock->nr_pages)
			continue;
		if (!mem_cgroup_same_or_subtree(root_mem, mem))
			continue;
		if (!test_and_set_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) {
			if (cpu == curcpu)
				drain_local_stock(&stock->work);
			else
				schedule_work_on(cpu, &stock->work);
		}
	}

The only thing that stabilizes stock->cached is the knowledge that
there are still pages accounted to the memcg.

Without the mutex serializing this code, can't there be a concurrent
execution that leads to stock->cached being drained, becoming empty
and freed by someone else between the stock->nr_pages check and the
ancestor check, resulting in use after free?

What makes stock->cached safe to dereference?

[ 2313.442944] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
[ 2313.443935] IP: [<ffffffff81083b70>] css_is_ancestor+0x20/0x70
[ 2313.443935] PGD 4ae7a067 PUD 4adc4067 PMD 0
[ 2313.443935] Oops: 0000 [#1] PREEMPT SMP
[ 2313.443935] CPU 0
[ 2313.443935] Pid: 19677, comm: rmdir Tainted: G        W   3.0.0-mm1-00188-gf38d32b #35 ECS MCP61M-M3/MCP61M-M3
[ 2313.443935] RIP: 0010:[<ffffffff81083b70>]  [<ffffffff81083b70>] css_is_ancestor+0x20/0x70
[ 2313.443935] RSP: 0018:ffff880077b09c88  EFLAGS: 00010202
[ 2313.443935] RAX: ffff8800781bb310 RBX: 0000000000000000 RCX: 000000000000003e
[ 2313.443935] RDX: 0000000000000000 RSI: ffff8800779f7c00 RDI: 0000000000000000
[ 2313.443935] RBP: ffff880077b09c98 R08: ffffffff818a4e88 R09: 0000000000000000
[ 2313.443935] R10: 0000000000000000 R11: dead000000100100 R12: ffff8800779f7c00
[ 2313.443935] R13: ffff8800779f7c00 R14: 0000000000000000 R15: ffff88007bc0eb80
[ 2313.443935] FS:  00007f5d689ec720(0000) GS:ffff88007bc00000(0000) knlGS:0000000000000000
[ 2313.443935] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 2313.443935] CR2: 0000000000000018 CR3: 000000004ad57000 CR4: 00000000000006f0
[ 2313.443935] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2313.443935] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 2313.443935] Process rmdir (pid: 19677, threadinfo ffff880077b08000, task ffff8800781bb310)
[ 2313.443935] Stack:
[ 2313.443935]  ffffffff818a4e88 000000000000eb80 ffff880077b09ca8 ffffffff810feba3
[ 2313.443935]  ffff880077b09d08 ffffffff810feccf ffff880077b09cf8 0000000000000001
[ 2313.443935]  ffff88007bd0eb80 0000000000000001 ffff880077af2000 0000000000000000
[ 2313.443935] Call Trace:
[ 2313.443935]  [<ffffffff810feba3>] mem_cgroup_same_or_subtree+0x33/0x40
[ 2313.443935]  [<ffffffff810feccf>] drain_all_stock+0x11f/0x170
[ 2313.443935]  [<ffffffff81103211>] mem_cgroup_force_empty+0x231/0x6d0
[ 2313.443935]  [<ffffffff81111872>] ? path_put+0x22/0x30
[ 2313.443935]  [<ffffffff8111c925>] ? __d_lookup+0xb5/0x170
[ 2313.443935]  [<ffffffff811036c4>] mem_cgroup_pre_destroy+0x14/0x20
[ 2313.443935]  [<ffffffff81080559>] cgroup_rmdir+0xb9/0x500
[ 2313.443935]  [<ffffffff81063990>] ? abort_exclusive_wait+0xb0/0xb0
[ 2313.443935]  [<ffffffff81114d26>] vfs_rmdir+0x86/0xe0
[ 2313.443935]  [<ffffffff811233d3>] ? mnt_want_write+0x43/0x80
[ 2313.443935]  [<ffffffff81114e7b>] do_rmdir+0xfb/0x110
[ 2313.443935]  [<ffffffff81114ea6>] sys_rmdir+0x16/0x20
[ 2313.443935]  [<ffffffff8154d76b>] system_call_fastpath+0x16/0x1b
[ 2313.443935] Code: b7 42 0a 5d c3 66 0f 1f 44 00 00 55 48 89 e5 48 83 ec 10 48 89 5d f0 4c 89 65 f8 66 66 66 66 90 48 89 fb 49 89 f4 e8 10 85 00 00
[ 2313.443935]  8b 43 18 49 8b 54 24 18 48 85 d2 74 05 48 85 c0 75 15 31 db

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

next prev parent reply	other threads:[~2011-08-08 18:47 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-07-22 12:43 [PATCH 0/4 v2] memcg: cleanup per-cpu charge caches Michal Hocko
2011-07-21  7:38 ` [PATCH 1/4] memcg: do not try to drain per-cpu caches without pages Michal Hocko
2011-07-25  1:16   ` KAMEZAWA Hiroyuki
2011-08-17 19:49     ` [patch] memcg: pin execution to current cpu while draining stock Johannes Weiner
2011-08-18  0:24       ` KAMEZAWA Hiroyuki
2011-08-18  5:56       ` Michal Hocko
2011-07-22 10:24 ` [PATCH 2/4] memcg: unify sync and async per-cpu charge cache draining Michal Hocko
2011-07-22 10:27 ` [PATCH 3/4] memcg: add mem_cgroup_same_or_subtree helper Michal Hocko
2011-07-22 11:20 ` [PATCH 4/4] memcg: get rid of percpu_charge_mutex lock Michal Hocko
2011-07-25  1:18   ` KAMEZAWA Hiroyuki
2011-08-08 18:47   ` Johannes Weiner [this message]
2011-08-08 21:47     ` Michal Hocko
2011-08-08 23:19       ` Johannes Weiner
2011-08-09  7:26         ` Michal Hocko
2011-08-09  9:31           ` [PATCH RFC] memcg: fix drain_all_stock crash Michal Hocko
2011-08-09  9:32             ` KAMEZAWA Hiroyuki
2011-08-09  9:45               ` Michal Hocko
2011-08-09  9:53                 ` KAMEZAWA Hiroyuki
2011-08-09 10:09                   ` Michal Hocko
2011-08-09 10:07                     ` KAMEZAWA Hiroyuki
2011-08-09 11:46                       ` Michal Hocko
2011-08-09 23:54                         ` KAMEZAWA Hiroyuki
2011-08-10  7:29                           ` Michal Hocko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20110808184738.GA7749@redhat.com \
    --to=jweiner@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=bsingharora@gmail.com \
    --cc=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).