From: Johannes Weiner <jweiner@redhat.com>
To: Michal Hocko <mhocko@suse.cz>
Cc: linux-mm@kvack.org, Balbir Singh <bsingharora@gmail.com>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
Andrew Morton <akpm@linux-foundation.org>,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH 4/4] memcg: get rid of percpu_charge_mutex lock
Date: Mon, 8 Aug 2011 20:47:38 +0200 [thread overview]
Message-ID: <20110808184738.GA7749@redhat.com> (raw)
In-Reply-To: <a9244082ba28c4c2e4a6997311d5493bdaa117e9.1311338634.git.mhocko@suse.cz>
On Fri, Jul 22, 2011 at 01:20:25PM +0200, Michal Hocko wrote:
> percpu_charge_mutex protects from multiple simultaneous per-cpu charge
> caches draining because we might end up having too many work items.
> At least this was the case until 26fe6168 (memcg: fix percpu cached
> charge draining frequency) when we introduced a more targeted draining
> for async mode.
> Now that also sync draining is targeted we can safely remove mutex
> because we will not send more work than the current number of CPUs.
> FLUSHING_CACHED_CHARGE protects from sending the same work multiple
> times and stock->nr_pages == 0 protects from pointless sending a work
> if there is obviously nothing to be done. This is of course racy but we
> can live with it as the race window is really small (we would have to
> see FLUSHING_CACHED_CHARGE cleared while nr_pages would be still
> non-zero).
> The only remaining place where we can race is synchronous mode when we
> rely on FLUSHING_CACHED_CHARGE test which might have been set by other
> drainer on the same group but we should wait in that case as well.
>
> Signed-off-by: Michal Hocko <mhocko@suse.cz>
> ---
> mm/memcontrol.c | 12 ++----------
> 1 files changed, 2 insertions(+), 10 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 3685107..f8463a0 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2071,7 +2071,6 @@ struct memcg_stock_pcp {
> #define FLUSHING_CACHED_CHARGE (0)
> };
> static DEFINE_PER_CPU(struct memcg_stock_pcp, memcg_stock);
> -static DEFINE_MUTEX(percpu_charge_mutex);
>
> /*
> * Try to consume stocked charge on this cpu. If success, one page is consumed
> @@ -2178,7 +2177,8 @@ static void drain_all_stock(struct mem_cgroup *root_mem, bool sync)
>
> for_each_online_cpu(cpu) {
> struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu);
> - if (test_bit(FLUSHING_CACHED_CHARGE, &stock->flags))
> + if (mem_cgroup_same_or_subtree(root_mem, stock->cached) &&
> + test_bit(FLUSHING_CACHED_CHARGE, &stock->flags))
> flush_work(&stock->work);
> }
> out:
This hunk triggers a crash for me, as the draining is already done and
stock->cached reset to NULL when dereferenced here. Oops is attached.
We have this loop in drain_all_stock():
for_each_online_cpu(cpu) {
struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu);
struct mem_cgroup *mem;
mem = stock->cached;
if (!mem || !stock->nr_pages)
continue;
if (!mem_cgroup_same_or_subtree(root_mem, mem))
continue;
if (!test_and_set_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) {
if (cpu == curcpu)
drain_local_stock(&stock->work);
else
schedule_work_on(cpu, &stock->work);
}
}
The only thing that stabilizes stock->cached is the knowledge that
there are still pages accounted to the memcg.
Without the mutex serializing this code, can't there be a concurrent
execution that leads to stock->cached being drained, becoming empty
and freed by someone else between the stock->nr_pages check and the
ancestor check, resulting in use after free?
What makes stock->cached safe to dereference?
[ 2313.442944] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
[ 2313.443935] IP: [<ffffffff81083b70>] css_is_ancestor+0x20/0x70
[ 2313.443935] PGD 4ae7a067 PUD 4adc4067 PMD 0
[ 2313.443935] Oops: 0000 [#1] PREEMPT SMP
[ 2313.443935] CPU 0
[ 2313.443935] Pid: 19677, comm: rmdir Tainted: G W 3.0.0-mm1-00188-gf38d32b #35 ECS MCP61M-M3/MCP61M-M3
[ 2313.443935] RIP: 0010:[<ffffffff81083b70>] [<ffffffff81083b70>] css_is_ancestor+0x20/0x70
[ 2313.443935] RSP: 0018:ffff880077b09c88 EFLAGS: 00010202
[ 2313.443935] RAX: ffff8800781bb310 RBX: 0000000000000000 RCX: 000000000000003e
[ 2313.443935] RDX: 0000000000000000 RSI: ffff8800779f7c00 RDI: 0000000000000000
[ 2313.443935] RBP: ffff880077b09c98 R08: ffffffff818a4e88 R09: 0000000000000000
[ 2313.443935] R10: 0000000000000000 R11: dead000000100100 R12: ffff8800779f7c00
[ 2313.443935] R13: ffff8800779f7c00 R14: 0000000000000000 R15: ffff88007bc0eb80
[ 2313.443935] FS: 00007f5d689ec720(0000) GS:ffff88007bc00000(0000) knlGS:0000000000000000
[ 2313.443935] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 2313.443935] CR2: 0000000000000018 CR3: 000000004ad57000 CR4: 00000000000006f0
[ 2313.443935] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2313.443935] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 2313.443935] Process rmdir (pid: 19677, threadinfo ffff880077b08000, task ffff8800781bb310)
[ 2313.443935] Stack:
[ 2313.443935] ffffffff818a4e88 000000000000eb80 ffff880077b09ca8 ffffffff810feba3
[ 2313.443935] ffff880077b09d08 ffffffff810feccf ffff880077b09cf8 0000000000000001
[ 2313.443935] ffff88007bd0eb80 0000000000000001 ffff880077af2000 0000000000000000
[ 2313.443935] Call Trace:
[ 2313.443935] [<ffffffff810feba3>] mem_cgroup_same_or_subtree+0x33/0x40
[ 2313.443935] [<ffffffff810feccf>] drain_all_stock+0x11f/0x170
[ 2313.443935] [<ffffffff81103211>] mem_cgroup_force_empty+0x231/0x6d0
[ 2313.443935] [<ffffffff81111872>] ? path_put+0x22/0x30
[ 2313.443935] [<ffffffff8111c925>] ? __d_lookup+0xb5/0x170
[ 2313.443935] [<ffffffff811036c4>] mem_cgroup_pre_destroy+0x14/0x20
[ 2313.443935] [<ffffffff81080559>] cgroup_rmdir+0xb9/0x500
[ 2313.443935] [<ffffffff81063990>] ? abort_exclusive_wait+0xb0/0xb0
[ 2313.443935] [<ffffffff81114d26>] vfs_rmdir+0x86/0xe0
[ 2313.443935] [<ffffffff811233d3>] ? mnt_want_write+0x43/0x80
[ 2313.443935] [<ffffffff81114e7b>] do_rmdir+0xfb/0x110
[ 2313.443935] [<ffffffff81114ea6>] sys_rmdir+0x16/0x20
[ 2313.443935] [<ffffffff8154d76b>] system_call_fastpath+0x16/0x1b
[ 2313.443935] Code: b7 42 0a 5d c3 66 0f 1f 44 00 00 55 48 89 e5 48 83 ec 10 48 89 5d f0 4c 89 65 f8 66 66 66 66 90 48 89 fb 49 89 f4 e8 10 85 00 00
[ 2313.443935] 8b 43 18 49 8b 54 24 18 48 85 d2 74 05 48 85 c0 75 15 31 db
WARNING: multiple messages have this Message-ID (diff)
From: Johannes Weiner <jweiner@redhat.com>
To: Michal Hocko <mhocko@suse.cz>
Cc: linux-mm@kvack.org, Balbir Singh <bsingharora@gmail.com>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
Andrew Morton <akpm@linux-foundation.org>,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH 4/4] memcg: get rid of percpu_charge_mutex lock
Date: Mon, 8 Aug 2011 20:47:38 +0200 [thread overview]
Message-ID: <20110808184738.GA7749@redhat.com> (raw)
In-Reply-To: <a9244082ba28c4c2e4a6997311d5493bdaa117e9.1311338634.git.mhocko@suse.cz>
On Fri, Jul 22, 2011 at 01:20:25PM +0200, Michal Hocko wrote:
> percpu_charge_mutex protects from multiple simultaneous per-cpu charge
> caches draining because we might end up having too many work items.
> At least this was the case until 26fe6168 (memcg: fix percpu cached
> charge draining frequency) when we introduced a more targeted draining
> for async mode.
> Now that also sync draining is targeted we can safely remove mutex
> because we will not send more work than the current number of CPUs.
> FLUSHING_CACHED_CHARGE protects from sending the same work multiple
> times and stock->nr_pages == 0 protects from pointless sending a work
> if there is obviously nothing to be done. This is of course racy but we
> can live with it as the race window is really small (we would have to
> see FLUSHING_CACHED_CHARGE cleared while nr_pages would be still
> non-zero).
> The only remaining place where we can race is synchronous mode when we
> rely on FLUSHING_CACHED_CHARGE test which might have been set by other
> drainer on the same group but we should wait in that case as well.
>
> Signed-off-by: Michal Hocko <mhocko@suse.cz>
> ---
> mm/memcontrol.c | 12 ++----------
> 1 files changed, 2 insertions(+), 10 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 3685107..f8463a0 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2071,7 +2071,6 @@ struct memcg_stock_pcp {
> #define FLUSHING_CACHED_CHARGE (0)
> };
> static DEFINE_PER_CPU(struct memcg_stock_pcp, memcg_stock);
> -static DEFINE_MUTEX(percpu_charge_mutex);
>
> /*
> * Try to consume stocked charge on this cpu. If success, one page is consumed
> @@ -2178,7 +2177,8 @@ static void drain_all_stock(struct mem_cgroup *root_mem, bool sync)
>
> for_each_online_cpu(cpu) {
> struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu);
> - if (test_bit(FLUSHING_CACHED_CHARGE, &stock->flags))
> + if (mem_cgroup_same_or_subtree(root_mem, stock->cached) &&
> + test_bit(FLUSHING_CACHED_CHARGE, &stock->flags))
> flush_work(&stock->work);
> }
> out:
This hunk triggers a crash for me, as the draining is already done and
stock->cached reset to NULL when dereferenced here. Oops is attached.
We have this loop in drain_all_stock():
for_each_online_cpu(cpu) {
struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu);
struct mem_cgroup *mem;
mem = stock->cached;
if (!mem || !stock->nr_pages)
continue;
if (!mem_cgroup_same_or_subtree(root_mem, mem))
continue;
if (!test_and_set_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) {
if (cpu == curcpu)
drain_local_stock(&stock->work);
else
schedule_work_on(cpu, &stock->work);
}
}
The only thing that stabilizes stock->cached is the knowledge that
there are still pages accounted to the memcg.
Without the mutex serializing this code, can't there be a concurrent
execution that leads to stock->cached being drained, becoming empty
and freed by someone else between the stock->nr_pages check and the
ancestor check, resulting in use after free?
What makes stock->cached safe to dereference?
[ 2313.442944] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
[ 2313.443935] IP: [<ffffffff81083b70>] css_is_ancestor+0x20/0x70
[ 2313.443935] PGD 4ae7a067 PUD 4adc4067 PMD 0
[ 2313.443935] Oops: 0000 [#1] PREEMPT SMP
[ 2313.443935] CPU 0
[ 2313.443935] Pid: 19677, comm: rmdir Tainted: G W 3.0.0-mm1-00188-gf38d32b #35 ECS MCP61M-M3/MCP61M-M3
[ 2313.443935] RIP: 0010:[<ffffffff81083b70>] [<ffffffff81083b70>] css_is_ancestor+0x20/0x70
[ 2313.443935] RSP: 0018:ffff880077b09c88 EFLAGS: 00010202
[ 2313.443935] RAX: ffff8800781bb310 RBX: 0000000000000000 RCX: 000000000000003e
[ 2313.443935] RDX: 0000000000000000 RSI: ffff8800779f7c00 RDI: 0000000000000000
[ 2313.443935] RBP: ffff880077b09c98 R08: ffffffff818a4e88 R09: 0000000000000000
[ 2313.443935] R10: 0000000000000000 R11: dead000000100100 R12: ffff8800779f7c00
[ 2313.443935] R13: ffff8800779f7c00 R14: 0000000000000000 R15: ffff88007bc0eb80
[ 2313.443935] FS: 00007f5d689ec720(0000) GS:ffff88007bc00000(0000) knlGS:0000000000000000
[ 2313.443935] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 2313.443935] CR2: 0000000000000018 CR3: 000000004ad57000 CR4: 00000000000006f0
[ 2313.443935] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2313.443935] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 2313.443935] Process rmdir (pid: 19677, threadinfo ffff880077b08000, task ffff8800781bb310)
[ 2313.443935] Stack:
[ 2313.443935] ffffffff818a4e88 000000000000eb80 ffff880077b09ca8 ffffffff810feba3
[ 2313.443935] ffff880077b09d08 ffffffff810feccf ffff880077b09cf8 0000000000000001
[ 2313.443935] ffff88007bd0eb80 0000000000000001 ffff880077af2000 0000000000000000
[ 2313.443935] Call Trace:
[ 2313.443935] [<ffffffff810feba3>] mem_cgroup_same_or_subtree+0x33/0x40
[ 2313.443935] [<ffffffff810feccf>] drain_all_stock+0x11f/0x170
[ 2313.443935] [<ffffffff81103211>] mem_cgroup_force_empty+0x231/0x6d0
[ 2313.443935] [<ffffffff81111872>] ? path_put+0x22/0x30
[ 2313.443935] [<ffffffff8111c925>] ? __d_lookup+0xb5/0x170
[ 2313.443935] [<ffffffff811036c4>] mem_cgroup_pre_destroy+0x14/0x20
[ 2313.443935] [<ffffffff81080559>] cgroup_rmdir+0xb9/0x500
[ 2313.443935] [<ffffffff81063990>] ? abort_exclusive_wait+0xb0/0xb0
[ 2313.443935] [<ffffffff81114d26>] vfs_rmdir+0x86/0xe0
[ 2313.443935] [<ffffffff811233d3>] ? mnt_want_write+0x43/0x80
[ 2313.443935] [<ffffffff81114e7b>] do_rmdir+0xfb/0x110
[ 2313.443935] [<ffffffff81114ea6>] sys_rmdir+0x16/0x20
[ 2313.443935] [<ffffffff8154d76b>] system_call_fastpath+0x16/0x1b
[ 2313.443935] Code: b7 42 0a 5d c3 66 0f 1f 44 00 00 55 48 89 e5 48 83 ec 10 48 89 5d f0 4c 89 65 f8 66 66 66 66 90 48 89 fb 49 89 f4 e8 10 85 00 00
[ 2313.443935] 8b 43 18 49 8b 54 24 18 48 85 d2 74 05 48 85 c0 75 15 31 db
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2011-08-08 18:47 UTC|newest]
Thread overview: 41+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-07-22 12:43 [PATCH 0/4 v2] memcg: cleanup per-cpu charge caches Michal Hocko
2011-07-21 7:38 ` [PATCH 1/4] memcg: do not try to drain per-cpu caches without pages Michal Hocko
2011-07-25 1:16 ` KAMEZAWA Hiroyuki
2011-07-25 1:16 ` KAMEZAWA Hiroyuki
2011-08-17 19:49 ` [patch] memcg: pin execution to current cpu while draining stock Johannes Weiner
2011-08-17 19:49 ` Johannes Weiner
2011-08-18 0:24 ` KAMEZAWA Hiroyuki
2011-08-18 0:24 ` KAMEZAWA Hiroyuki
2011-08-18 5:56 ` Michal Hocko
2011-08-18 5:56 ` Michal Hocko
2011-07-22 10:24 ` [PATCH 2/4] memcg: unify sync and async per-cpu charge cache draining Michal Hocko
2011-07-22 10:27 ` [PATCH 3/4] memcg: add mem_cgroup_same_or_subtree helper Michal Hocko
2011-07-22 11:20 ` [PATCH 4/4] memcg: get rid of percpu_charge_mutex lock Michal Hocko
2011-07-25 1:18 ` KAMEZAWA Hiroyuki
2011-07-25 1:18 ` KAMEZAWA Hiroyuki
2011-08-08 18:47 ` Johannes Weiner [this message]
2011-08-08 18:47 ` Johannes Weiner
2011-08-08 21:47 ` Michal Hocko
2011-08-08 21:47 ` Michal Hocko
2011-08-08 23:19 ` Johannes Weiner
2011-08-08 23:19 ` Johannes Weiner
2011-08-09 7:26 ` Michal Hocko
2011-08-09 7:26 ` Michal Hocko
2011-08-09 9:31 ` [PATCH RFC] memcg: fix drain_all_stock crash Michal Hocko
2011-08-09 9:31 ` Michal Hocko
2011-08-09 9:32 ` KAMEZAWA Hiroyuki
2011-08-09 9:32 ` KAMEZAWA Hiroyuki
2011-08-09 9:45 ` Michal Hocko
2011-08-09 9:45 ` Michal Hocko
2011-08-09 9:53 ` KAMEZAWA Hiroyuki
2011-08-09 9:53 ` KAMEZAWA Hiroyuki
2011-08-09 10:09 ` Michal Hocko
2011-08-09 10:09 ` Michal Hocko
2011-08-09 10:07 ` KAMEZAWA Hiroyuki
2011-08-09 10:07 ` KAMEZAWA Hiroyuki
2011-08-09 11:46 ` Michal Hocko
2011-08-09 11:46 ` Michal Hocko
2011-08-09 23:54 ` KAMEZAWA Hiroyuki
2011-08-09 23:54 ` KAMEZAWA Hiroyuki
2011-08-10 7:29 ` Michal Hocko
2011-08-10 7:29 ` Michal Hocko
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20110808184738.GA7749@redhat.com \
--to=jweiner@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=bsingharora@gmail.com \
--cc=kamezawa.hiroyu@jp.fujitsu.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mhocko@suse.cz \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.