From: Andrew Morton <akpm@linux-foundation.org>
To: Hugh Dickins <hughd@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
Johannes Weiner <hannes@cmpxchg.org>,
Konstantin Khlebnikov <khlebnikov@openvz.org>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH 3.3] memcg: fix deadlock by inverting lrucare nesting
Date: Wed, 29 Feb 2012 14:04:58 -0800
Message-ID: <20120229140458.c53352db.akpm@linux-foundation.org>
In-Reply-To: <alpine.LSU.2.00.1202282121160.4875@eggly.anvils>

On Tue, 28 Feb 2012 21:25:02 -0800 (PST)
Hugh Dickins <hughd@google.com> wrote:
> We have forgotten the rules of lock nesting: the irq-safe ones must be
> taken inside the non-irq-safe ones, otherwise we are open to deadlock:
>
>        CPU0                    CPU1
>        ----                    ----
>   lock(&(&pc->lock)->rlock);
>                                local_irq_disable();
>                                lock(&(&zone->lru_lock)->rlock);
>                                lock(&(&pc->lock)->rlock);
>   <Interrupt>
>     lock(&(&zone->lru_lock)->rlock);
>
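The rule in this trace can be reduced to a check on a single lock dependency. What follows is a minimal sketch in C, not lockdep's implementation. The names struct lock_usage, the two instances and dependency_is_irq_inversion() are hypothetical. The usage flags given to pc->lock and zone->lru_lock are assumptions read off the CPU0/CPU1 scenario above: pc->lock is held with irqs enabled, and zone->lru_lock is taken from an interrupt.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical per-lock summary, loosely following lockdep's terms:
 * irq_safe   - the lock has been acquired in hard-irq context;
 * irq_unsafe - the lock has been acquired with hard irqs enabled.
 */
struct lock_usage {
	const char *name;
	bool irq_safe;
	bool irq_unsafe;
};

/* Assumed: pc->lock is taken with irqs enabled, never from an interrupt. */
static const struct lock_usage pc_lock_usage = {
	.name = "pc->lock", .irq_safe = false, .irq_unsafe = true,
};

/* Assumed: zone->lru_lock is taken from interrupts, always with irqs off. */
static const struct lock_usage lru_lock_usage = {
	.name = "zone->lru_lock", .irq_safe = true, .irq_unsafe = false,
};

/*
 * Taking 'inner' while holding 'outer' records an outer -> inner
 * dependency. An irq-safe -> irq-unsafe dependency is the CPU0/CPU1
 * pattern: another CPU may hold the irq-unsafe lock with irqs enabled,
 * and an interrupt there may then spin on the irq-safe lock we hold.
 */
static bool dependency_is_irq_inversion(const struct lock_usage *outer,
					const struct lock_usage *inner)
{
	return outer->irq_safe && inner->irq_unsafe;
}
```

With these flags, taking pc->lock inside zone->lru_lock is reported as an inversion. That is the nesting of __mem_cgroup_commit_charge_lrucare(). The inverted nesting proposed below is not reported.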
> To check a different locking issue, I happened to add a spin_lock to
> memcg's bit_spin_lock in lock_page_cgroup(), and lockdep very quickly
> complained about __mem_cgroup_commit_charge_lrucare() (on CPU1 above).
>
> So delete __mem_cgroup_commit_charge_lrucare(), passing a bool lrucare
> to __mem_cgroup_commit_charge() instead, taking zone->lru_lock under
> lock_page_cgroup() in the lrucare case.
>
> The original was using spin_lock_irqsave, but we'd be in more trouble
> if it were ever called at interrupt time: unconditional _irq is enough.
> And ClearPageLRU before del from lru, SetPageLRU before add to lru: no
> strong reason, but that is the ordering used consistently elsewhere.
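The two locking variants differ only in how they restore the interrupt-enable state. The toy model below tracks that state alone, with hypothetical helper names and no actual locking. It shows why an unconditional _irq is enough, assuming, as the quoted text implies, that the lrucare path only runs in process context with interrupts enabled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the local CPU's hard-irq enable state. */
static bool irqs_enabled = true;

/* Models spin_lock_irq()/spin_unlock_irq(): disable, then re-enable unconditionally. */
static void model_lock_irq(void)
{
	irqs_enabled = false;
}

static void model_unlock_irq(void)
{
	irqs_enabled = true;
}

/* Models spin_lock_irqsave()/spin_unlock_irqrestore(): save and restore the prior state. */
static void model_lock_irqsave(bool *flags)
{
	*flags = irqs_enabled;
	irqs_enabled = false;
}

static void model_unlock_irqrestore(bool flags)
{
	irqs_enabled = flags;
}
```

If the caller already had interrupts disabled, the _irq pair would wrongly leave them enabled on return, and the irqsave pair would not. If the path is never entered with interrupts off, saving the flags buys nothing.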
This patch makes rather a mess of "memcg: remove PCG_CACHE page_cgroup
flag".

--- mm/memcontrol.c~memcg-remove-pcg_cache-page_cgroup-flag
+++ mm/memcontrol.c
@@ -2410,6 +2414,8 @@
 				       struct page_cgroup *pc,
 				       enum charge_type ctype)
 {
+	bool anon;
+
 	lock_page_cgroup(pc);
 	if (unlikely(PageCgroupUsed(pc))) {
 		unlock_page_cgroup(pc);
@@ -2429,21 +2435,14 @@
 	 * See mem_cgroup_add_lru_list(), etc.
 	 */
 	smp_wmb();
-	switch (ctype) {
-	case MEM_CGROUP_CHARGE_TYPE_CACHE:
-	case MEM_CGROUP_CHARGE_TYPE_SHMEM:
-		SetPageCgroupCache(pc);
-		SetPageCgroupUsed(pc);
-		break;
-	case MEM_CGROUP_CHARGE_TYPE_MAPPED:
-		ClearPageCgroupCache(pc);
-		SetPageCgroupUsed(pc);
-		break;
-	default:
-		break;
-	}
-	mem_cgroup_charge_statistics(memcg, PageCgroupCache(pc), nr_pages);
+	SetPageCgroupUsed(pc);
+	if (ctype == MEM_CGROUP_CHARGE_TYPE_MAPPED)
+		anon = true;
+	else
+		anon = false;
+
+	mem_cgroup_charge_statistics(memcg, anon, nr_pages);
 	unlock_page_cgroup(pc);
 	WARN_ON_ONCE(PageLRU(page));
 	/*
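The if/else that sets anon in the hunk above could also be written as a single comparison. This stand-alone sketch uses a hypothetical enum that copies only the three charge types named in the diff. The diff's default: case shows that the real enum charge_type has more members.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical copy of the three charge types named in the diff. */
enum model_charge_type {
	MODEL_CHARGE_TYPE_CACHE,
	MODEL_CHARGE_TYPE_MAPPED,
	MODEL_CHARGE_TYPE_SHMEM,
};

/* Same result as the if/else in the hunk: only MAPPED charges count as anon. */
static bool charge_is_anon(enum model_charge_type ctype)
{
	return ctype == MODEL_CHARGE_TYPE_MAPPED;
}
```

In mm/memcontrol.c this would read as anon = (ctype == MEM_CGROUP_CHARGE_TYPE_MAPPED); which changes style only, not the result of the if/else.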

I did it this way:

static void __mem_cgroup_commit_charge(struct mem_cgroup *memcg,
				       struct page *page,
				       unsigned int nr_pages,
				       struct page_cgroup *pc,
				       enum charge_type ctype,
				       bool lrucare)
{
	struct zone *uninitialized_var(zone);
	bool was_on_lru = false;
	bool anon;

	lock_page_cgroup(pc);
	if (unlikely(PageCgroupUsed(pc))) {
		unlock_page_cgroup(pc);
		__mem_cgroup_cancel_charge(memcg, nr_pages);
		return;
	}
	/*
	 * we don't need page_cgroup_lock about tail pages, because they are not
	 * accessed by any other context at this point.
	 */

	/*
	 * In some cases, SwapCache and FUSE(splice_buf->radixtree), the page
	 * may already be on some other mem_cgroup's LRU. Take care of it.
	 */
	if (lrucare) {
		zone = page_zone(page);
		spin_lock_irq(&zone->lru_lock);
		if (PageLRU(page)) {
			ClearPageLRU(page);
			del_page_from_lru_list(zone, page, page_lru(page));
			was_on_lru = true;
		}
	}

	pc->mem_cgroup = memcg;
	/*
	 * We access a page_cgroup asynchronously without lock_page_cgroup().
	 * Especially when a page_cgroup is taken from a page, pc->mem_cgroup
	 * is accessed after testing USED bit. To make pc->mem_cgroup visible
	 * before USED bit, we need memory barrier here.
	 * See mem_cgroup_add_lru_list(), etc.
	 */
	smp_wmb();
	SetPageCgroupUsed(pc);

	if (lrucare) {
		if (was_on_lru) {
			VM_BUG_ON(PageLRU(page));
			SetPageLRU(page);
			add_page_to_lru_list(zone, page, page_lru(page));
		}
		spin_unlock_irq(&zone->lru_lock);
	}

	if (ctype == MEM_CGROUP_CHARGE_TYPE_MAPPED)
		anon = true;
	else
		anon = false;

	mem_cgroup_charge_statistics(memcg, anon, nr_pages);
	unlock_page_cgroup(pc);

	/*
	 * "charge_statistics" updated event counter. Then, check it.
	 * Insert ancestor (and ancestor's ancestors) into the softlimit RB-tree
	 * if they exceed the softlimit.
	 */
	memcg_check_events(memcg, page);
}
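A userspace model of the lock nesting in the function above can help when checking the order by eye. It is a sketch under stated assumptions:

- pthread mutexes stand in for the lock_page_cgroup() bit spinlock and for zone->lru_lock.
- Interrupts and lockless readers are not modeled.
- struct model_page and model_commit_charge() are hypothetical.
- Unlike the kernel function, it returns whether the charge was committed, so the effect can be checked.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified page state. */
struct model_page {
	bool lru_flag;		/* stands in for PageLRU() */
	bool on_lru_list;	/* stands in for membership of an LRU list */
	bool cgroup_used;	/* stands in for PageCgroupUsed() */
	int memcg_id;		/* stands in for pc->mem_cgroup */
};

static pthread_mutex_t pc_lock = PTHREAD_MUTEX_INITIALIZER;	/* lock_page_cgroup() */
static pthread_mutex_t lru_lock = PTHREAD_MUTEX_INITIALIZER;	/* zone->lru_lock */

/*
 * Same shape as __mem_cgroup_commit_charge() with lrucare: pc_lock is
 * the outer lock and lru_lock (irq-safe in the kernel) nests inside it.
 */
static bool model_commit_charge(struct model_page *page, int memcg_id,
				bool lrucare)
{
	bool was_on_lru = false;

	pthread_mutex_lock(&pc_lock);
	if (page->cgroup_used) {
		/* Already charged: the kernel cancels the new charge here. */
		pthread_mutex_unlock(&pc_lock);
		return false;
	}
	if (lrucare) {
		pthread_mutex_lock(&lru_lock);
		if (page->lru_flag) {
			page->lru_flag = false;		/* ClearPageLRU() first, */
			page->on_lru_list = false;	/* then delete from the list */
			was_on_lru = true;
		}
	}
	page->memcg_id = memcg_id;
	/* The kernel issues smp_wmb() here for lockless readers; this model has none. */
	page->cgroup_used = true;
	if (lrucare) {
		if (was_on_lru) {
			page->lru_flag = true;		/* SetPageLRU() first, */
			page->on_lru_list = true;	/* then add to the new list */
		}
		pthread_mutex_unlock(&lru_lock);
	}
	pthread_mutex_unlock(&pc_lock);
	return true;
}
```

Charging an LRU page with lrucare set leaves it back on the list under the new owner. A second charge attempt finds the used bit set and changes nothing.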