All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andrea Arcangeli <aarcange@redhat.com>
To: Hugh Dickins <hughd@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"hannes@cmpxchg.org" <hannes@cmpxchg.org>
Subject: Re: [mmotm] BUG: Bad page state in process khugepaged ?
Date: Fri, 11 Feb 2011 11:49:06 +0100	[thread overview]
Message-ID: <20110211104906.GE3347@random.random> (raw)
In-Reply-To: <alpine.LSU.2.00.1102102243160.2331@sister.anvils>

On Thu, Feb 10, 2011 at 11:02:50PM -0800, Hugh Dickins wrote:
> There is a separate little issue here, Andrea.
> 
> Although we went to some trouble for bad_page() to take the page out
> of circulation yet let the system continue, your VM_BUG_ON(!PageBuddy)
> inside __ClearPageBuddy(page), from two callsites in bad_page(), is
> turning it into a fatal error when CONFIG_DEBUG_VM.

I see what you mean. Of course it is only a problem after bad_page
already triggered.... but then it trigger an BUG_ON instead of only a
bad_page.

> You could that only MM developers switch CONFIG_DEBUG_VM=y, and they
> would like bad_page() to be fatal; maybe, but if so we should do that
> as an intentional patch, rather than as an unexpected side-effect ;)

Fedora kernels are built with CONFIG_DEBUG_VM, all my kernels runs
with CONFIG_DEBUG_VM too, so we want it to be as "production" as
possible, and we don't want DEBUG_VM to decrease any reliability (only
to increase it of course).

> I noticed this a few days ago, but hadn't quite decided whether just to
> remove the VM_BUG_ON, or move it to __ClearPageBuddy's third callsite,
> or... doesn't matter much.
>
> I do also wonder if PageBuddy would better be _mapcount -something else:
> if we've got a miscounted page (itself unlikely of course), there's a
> chance that its _mapcount will be further decremented after it has been
> freed: whereupon it will go from -1 to -2, PageBuddy at present.  The
> special avoidance of PageBuddy being that it can pull a whole block of
> pages into misuse if its mistaken.

Agreed. What about the below?

=====
Subject: mm: PageBuddy cleanups

From: Andrea Arcangeli <aarcange@redhat.com>

bad_page could VM_BUG_ON(!PageBuddy(page)) inside __ClearPageBuddy(). I prefer
to keep the VM_BUG_ON for safety and to add a if to solve it.

Change the _mapcount value indicating PageBuddy from -2 to -1024 for more
robusteness against page_mapcount() undeflows.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reported-by: Hugh Dickins <hughd@google.com>
---

diff --git a/include/linux/mm.h b/include/linux/mm.h
index f6385fc..fa16ba0 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -402,16 +402,22 @@ static inline void init_page_count(struct page *page)
 /*
  * PageBuddy() indicate that the page is free and in the buddy system
  * (see mm/page_alloc.c).
+ *
+ * PAGE_BUDDY_MAPCOUNT_VALUE must be <= -2 but better not too close to
+ * -2 so that an underflow of the page_mapcount() won't be mistaken
+ * for a genuine PAGE_BUDDY_MAPCOUNT_VALUE.
  */
+#define PAGE_BUDDY_MAPCOUNT_VALUE (-1024*1024)
+
 static inline int PageBuddy(struct page *page)
 {
-	return atomic_read(&page->_mapcount) == -2;
+	return atomic_read(&page->_mapcount) == PAGE_BUDDY_MAPCOUNT_VALUE;
 }
 
 static inline void __SetPageBuddy(struct page *page)
 {
 	VM_BUG_ON(atomic_read(&page->_mapcount) != -1);
-	atomic_set(&page->_mapcount, -2);
+	atomic_set(&page->_mapcount, PAGE_BUDDY_MAPCOUNT_VALUE);
 }
 
 static inline void __ClearPageBuddy(struct page *page)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a873e61..8aac134 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -286,7 +286,9 @@ static void bad_page(struct page *page)
 
 	/* Don't complain about poisoned pages */
 	if (PageHWPoison(page)) {
-		__ClearPageBuddy(page);
+		/* __ClearPageBuddy VM_BUG_ON(!PageBuddy(page)) */
+		if (PageBuddy(page))
+			__ClearPageBuddy(page);
 		return;
 	}
 
@@ -317,7 +319,8 @@ static void bad_page(struct page *page)
 	dump_stack();
 out:
 	/* Leave bad fields for debug, except PageBuddy could make trouble */
-	__ClearPageBuddy(page);
+	if (PageBuddy(page)) /* __ClearPageBuddy VM_BUG_ON(!PageBuddy(page)) */
+		__ClearPageBuddy(page);
 	add_taint(TAINT_BAD_PAGE);
 }
 

WARNING: multiple messages have this Message-ID (diff)
From: Andrea Arcangeli <aarcange@redhat.com>
To: Hugh Dickins <hughd@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"hannes@cmpxchg.org" <hannes@cmpxchg.org>
Subject: Re: [mmotm] BUG: Bad page state in process khugepaged ?
Date: Fri, 11 Feb 2011 11:49:06 +0100	[thread overview]
Message-ID: <20110211104906.GE3347@random.random> (raw)
In-Reply-To: <alpine.LSU.2.00.1102102243160.2331@sister.anvils>

On Thu, Feb 10, 2011 at 11:02:50PM -0800, Hugh Dickins wrote:
> There is a separate little issue here, Andrea.
> 
> Although we went to some trouble for bad_page() to take the page out
> of circulation yet let the system continue, your VM_BUG_ON(!PageBuddy)
> inside __ClearPageBuddy(page), from two callsites in bad_page(), is
> turning it into a fatal error when CONFIG_DEBUG_VM.

I see what you mean. Of course it is only a problem after bad_page
already triggered.... but then it trigger an BUG_ON instead of only a
bad_page.

> You could that only MM developers switch CONFIG_DEBUG_VM=y, and they
> would like bad_page() to be fatal; maybe, but if so we should do that
> as an intentional patch, rather than as an unexpected side-effect ;)

Fedora kernels are built with CONFIG_DEBUG_VM, all my kernels runs
with CONFIG_DEBUG_VM too, so we want it to be as "production" as
possible, and we don't want DEBUG_VM to decrease any reliability (only
to increase it of course).

> I noticed this a few days ago, but hadn't quite decided whether just to
> remove the VM_BUG_ON, or move it to __ClearPageBuddy's third callsite,
> or... doesn't matter much.
>
> I do also wonder if PageBuddy would better be _mapcount -something else:
> if we've got a miscounted page (itself unlikely of course), there's a
> chance that its _mapcount will be further decremented after it has been
> freed: whereupon it will go from -1 to -2, PageBuddy at present.  The
> special avoidance of PageBuddy being that it can pull a whole block of
> pages into misuse if its mistaken.

Agreed. What about the below?

=====
Subject: mm: PageBuddy cleanups

From: Andrea Arcangeli <aarcange@redhat.com>

bad_page could VM_BUG_ON(!PageBuddy(page)) inside __ClearPageBuddy(). I prefer
to keep the VM_BUG_ON for safety and to add a if to solve it.

Change the _mapcount value indicating PageBuddy from -2 to -1024 for more
robusteness against page_mapcount() undeflows.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reported-by: Hugh Dickins <hughd@google.com>
---

diff --git a/include/linux/mm.h b/include/linux/mm.h
index f6385fc..fa16ba0 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -402,16 +402,22 @@ static inline void init_page_count(struct page *page)
 /*
  * PageBuddy() indicate that the page is free and in the buddy system
  * (see mm/page_alloc.c).
+ *
+ * PAGE_BUDDY_MAPCOUNT_VALUE must be <= -2 but better not too close to
+ * -2 so that an underflow of the page_mapcount() won't be mistaken
+ * for a genuine PAGE_BUDDY_MAPCOUNT_VALUE.
  */
+#define PAGE_BUDDY_MAPCOUNT_VALUE (-1024*1024)
+
 static inline int PageBuddy(struct page *page)
 {
-	return atomic_read(&page->_mapcount) == -2;
+	return atomic_read(&page->_mapcount) == PAGE_BUDDY_MAPCOUNT_VALUE;
 }
 
 static inline void __SetPageBuddy(struct page *page)
 {
 	VM_BUG_ON(atomic_read(&page->_mapcount) != -1);
-	atomic_set(&page->_mapcount, -2);
+	atomic_set(&page->_mapcount, PAGE_BUDDY_MAPCOUNT_VALUE);
 }
 
 static inline void __ClearPageBuddy(struct page *page)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a873e61..8aac134 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -286,7 +286,9 @@ static void bad_page(struct page *page)
 
 	/* Don't complain about poisoned pages */
 	if (PageHWPoison(page)) {
-		__ClearPageBuddy(page);
+		/* __ClearPageBuddy VM_BUG_ON(!PageBuddy(page)) */
+		if (PageBuddy(page))
+			__ClearPageBuddy(page);
 		return;
 	}
 
@@ -317,7 +319,8 @@ static void bad_page(struct page *page)
 	dump_stack();
 out:
 	/* Leave bad fields for debug, except PageBuddy could make trouble */
-	__ClearPageBuddy(page);
+	if (PageBuddy(page)) /* __ClearPageBuddy VM_BUG_ON(!PageBuddy(page)) */
+		__ClearPageBuddy(page);
 	add_taint(TAINT_BAD_PAGE);
 }
 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2011-02-11 10:49 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-02-09  6:10 [mmotm] BUG: Bad page state in process khugepaged ? KAMEZAWA Hiroyuki
2011-02-09  6:40 ` KAMEZAWA Hiroyuki
2011-02-09  6:40   ` KAMEZAWA Hiroyuki
2011-02-09  6:50 ` Daisuke Nishimura
2011-02-09  6:50   ` Daisuke Nishimura
2011-02-09  6:52   ` KAMEZAWA Hiroyuki
2011-02-09  6:52     ` KAMEZAWA Hiroyuki
2011-02-09 20:07     ` Andrea Arcangeli
2011-02-09 20:07       ` Andrea Arcangeli
2011-02-11  7:02       ` Hugh Dickins
2011-02-11  7:02         ` Hugh Dickins
2011-02-11 10:49         ` Andrea Arcangeli [this message]
2011-02-11 10:49           ` Andrea Arcangeli
2011-02-11 19:58           ` Hugh Dickins
2011-02-11 19:58             ` Hugh Dickins
2011-02-11 20:24             ` Andrea Arcangeli
2011-02-11 20:24               ` Andrea Arcangeli
2011-02-14 22:24           ` Johannes Weiner
2011-02-14 22:24             ` Johannes Weiner
2011-02-09  7:23 ` [PATCH][BUGFIX] memcg: fix leak of accounting at failure path of hugepage collapsing KAMEZAWA Hiroyuki
2011-02-09  7:23   ` KAMEZAWA Hiroyuki
2011-02-09  7:51   ` Daisuke Nishimura
2011-02-09  7:51     ` Daisuke Nishimura
2011-02-09  9:51   ` Johannes Weiner
2011-02-09  9:51     ` Johannes Weiner
2011-02-10  2:49   ` Minchan Kim
2011-02-10  2:49     ` Minchan Kim

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20110211104906.GE3347@random.random \
    --to=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=hannes@cmpxchg.org \
    --cc=hughd@google.com \
    --cc=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=nishimura@mxp.nes.nec.co.jp \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.