LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* patch powerpc-fix-incorrect-setting-of-__have_arch_pte_special.patch added to 2.6.31-stable tree
From: gregkh @ 2009-10-01 22:46 UTC (permalink / raw)
  To: Bernhard.Weirich, benh, bernhard.weirich, gregkh, linuxppc-dev,
	stable
  Cc: stable, stable-commits
In-Reply-To: <1253776613.7103.433.camel@pasglop>


This is a note to let you know that we have just queued up the patch titled

    Subject: powerpc: Fix incorrect setting of __HAVE_ARCH_PTE_SPECIAL

to the 2.6.31-stable tree.  Its filename is

    powerpc-fix-incorrect-setting-of-__have_arch_pte_special.patch

A git repo of this tree can be found at 
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary


>From benh@kernel.crashing.org  Thu Oct  1 15:36:15 2009
From: Weirich, Bernhard <Bernhard.Weirich@riedel.net>
Date: Thu, 24 Sep 2009 17:16:53 +1000
Subject: powerpc: Fix incorrect setting of __HAVE_ARCH_PTE_SPECIAL
To: stable <stable@kernel.org>
Cc: linuxppc-dev list <linuxppc-dev@ozlabs.org>, bernhard.weirich@riedel.net, RFeany@mrv.com
Message-ID: <1253776613.7103.433.camel@pasglop>


From: Weirich, Bernhard <Bernhard.Weirich@riedel.net>

[I'm going to fix upstream differently, by having all CPU types
actually support _PAGE_SPECIAL, but I prefer the simple and obvious
fix for -stable. -- Ben]

The test that decides whether to define __HAVE_ARCH_PTE_SPECIAL on
powerpc is bogus and will end up always defining it, even when
_PAGE_SPECIAL is not supported (in which case it's 0) such as on
8xx or 40x processors.

Signed-off-by: Bernhard Weirich <bernhard.weirich@riedel.net>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


---
 arch/powerpc/include/asm/pte-common.h |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/powerpc/include/asm/pte-common.h
+++ b/arch/powerpc/include/asm/pte-common.h
@@ -176,7 +176,7 @@ extern unsigned long bad_call_to_PMD_PAG
 #define HAVE_PAGE_AGP
 
 /* Advertise support for _PAGE_SPECIAL */
-#ifdef _PAGE_SPECIAL
+#if _PAGE_SPECIAL != 0
 #define __HAVE_ARCH_PTE_SPECIAL
 #endif
 


Patches currently in stable-queue which might be from Bernhard.Weirich@riedel.net are

queue-2.6.31/powerpc-fix-incorrect-setting-of-__have_arch_pte_special.patch

^ permalink raw reply

* patch powerpc-fix-incorrect-setting-of-__have_arch_pte_special.patch added to 2.6.30-stable tree
From: gregkh @ 2009-10-01 22:51 UTC (permalink / raw)
  To: Bernhard.Weirich, benh, bernhard.weirich, gregkh, linuxppc-dev,
	stable
  Cc: stable, stable-commits
In-Reply-To: <1253776613.7103.433.camel@pasglop>


This is a note to let you know that we have just queued up the patch titled

    Subject: powerpc: Fix incorrect setting of __HAVE_ARCH_PTE_SPECIAL

to the 2.6.30-stable tree.  Its filename is

    powerpc-fix-incorrect-setting-of-__have_arch_pte_special.patch

A git repo of this tree can be found at 
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary


>From benh@kernel.crashing.org  Thu Oct  1 15:36:15 2009
From: Weirich, Bernhard <Bernhard.Weirich@riedel.net>
Date: Thu, 24 Sep 2009 17:16:53 +1000
Subject: powerpc: Fix incorrect setting of __HAVE_ARCH_PTE_SPECIAL
To: stable <stable@kernel.org>
Cc: linuxppc-dev list <linuxppc-dev@ozlabs.org>, bernhard.weirich@riedel.net, RFeany@mrv.com
Message-ID: <1253776613.7103.433.camel@pasglop>


From: Weirich, Bernhard <Bernhard.Weirich@riedel.net>

[I'm going to fix upstream differently, by having all CPU types
actually support _PAGE_SPECIAL, but I prefer the simple and obvious
fix for -stable. -- Ben]

The test that decides whether to define __HAVE_ARCH_PTE_SPECIAL on
powerpc is bogus and will end up always defining it, even when
_PAGE_SPECIAL is not supported (in which case it's 0) such as on
8xx or 40x processors.

Signed-off-by: Bernhard Weirich <bernhard.weirich@riedel.net>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


---
 arch/powerpc/include/asm/pte-common.h |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/powerpc/include/asm/pte-common.h
+++ b/arch/powerpc/include/asm/pte-common.h
@@ -176,7 +176,7 @@ extern unsigned long bad_call_to_PMD_PAG
 #define HAVE_PAGE_AGP
 
 /* Advertise support for _PAGE_SPECIAL */
-#ifdef _PAGE_SPECIAL
+#if _PAGE_SPECIAL != 0
 #define __HAVE_ARCH_PTE_SPECIAL
 #endif
 


Patches currently in stable-queue which might be from Bernhard.Weirich@riedel.net are

queue-2.6.30/powerpc-fix-incorrect-setting-of-__have_arch_pte_special.patch

^ permalink raw reply

* patch powerpc-8xx-fix-regression-introduced-by-cache-coherency-rewrite.patch added to 2.6.30-stable tree
From: gregkh @ 2009-10-01 22:51 UTC (permalink / raw)
  To: RFeany, benh, gregkh, linuxppc-dev, rfeany, stable; +Cc: stable, stable-commits
In-Reply-To: <1253776614.7103.434.camel@pasglop>


This is a note to let you know that we have just queued up the patch titled

    Subject: powerpc/8xx: Fix regression introduced by cache coherency rewrite

to the 2.6.30-stable tree.  Its filename is

    powerpc-8xx-fix-regression-introduced-by-cache-coherency-rewrite.patch

A git repo of this tree can be found at 
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary


>From benh@kernel.crashing.org  Thu Oct  1 15:35:28 2009
From: Rex Feany <RFeany@mrv.com>
Date: Thu, 24 Sep 2009 17:16:54 +1000
Subject: powerpc/8xx: Fix regression introduced by cache coherency rewrite
To: stable <stable@kernel.org>
Cc: linuxppc-dev list <linuxppc-dev@ozlabs.org>, RFeany@mrv.com
Message-ID: <1253776614.7103.434.camel@pasglop>

From: Rex Feany <RFeany@mrv.com>

commit e0908085fc2391c85b85fb814ae1df377c8e0dcb upstream.

After upgrading to the latest kernel on my mpc875 userspace started
running incredibly slow (hours to get to a shell, even!).
I tracked it down to commit 8d30c14cab30d405a05f2aaceda1e9ad57800f36,
that patch removed a work-around for the 8xx. Adding it
back makes my problem go away.

Signed-off-by: Rex Feany <rfeany@mrv.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 arch/powerpc/mm/pgtable.c |   19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

--- a/arch/powerpc/mm/pgtable.c
+++ b/arch/powerpc/mm/pgtable.c
@@ -30,6 +30,8 @@
 #include <asm/tlbflush.h>
 #include <asm/tlb.h>
 
+#include "mmu_decl.h"
+
 static DEFINE_PER_CPU(struct pte_freelist_batch *, pte_freelist_cur);
 static unsigned long pte_freelist_forced_free;
 
@@ -119,7 +121,7 @@ void pte_free_finish(void)
 /*
  * Handle i/d cache flushing, called from set_pte_at() or ptep_set_access_flags()
  */
-static pte_t do_dcache_icache_coherency(pte_t pte)
+static pte_t do_dcache_icache_coherency(pte_t pte, unsigned long addr)
 {
 	unsigned long pfn = pte_pfn(pte);
 	struct page *page;
@@ -128,6 +130,17 @@ static pte_t do_dcache_icache_coherency(
 		return pte;
 	page = pfn_to_page(pfn);
 
+#ifdef CONFIG_8xx
+       /* On 8xx, cache control instructions (particularly
+        * "dcbst" from flush_dcache_icache) fault as write
+        * operation if there is an unpopulated TLB entry
+        * for the address in question. To workaround that,
+        * we invalidate the TLB here, thus avoiding dcbst
+        * misbehaviour.
+        */
+       _tlbil_va(addr, 0 /* 8xx doesn't care about PID */);
+#endif
+
 	if (!PageReserved(page) && !test_bit(PG_arch_1, &page->flags)) {
 		pr_debug("do_dcache_icache_coherency... flushing\n");
 		flush_dcache_icache_page(page);
@@ -198,7 +211,7 @@ void set_pte_at(struct mm_struct *mm, un
 	 */
 	pte = __pte(pte_val(pte) & ~_PAGE_HPTEFLAGS);
 	if (pte_need_exec_flush(pte, 1))
-		pte = do_dcache_icache_coherency(pte);
+		pte = do_dcache_icache_coherency(pte, addr);
 
 	/* Perform the setting of the PTE */
 	__set_pte_at(mm, addr, ptep, pte, 0);
@@ -216,7 +229,7 @@ int ptep_set_access_flags(struct vm_area
 {
 	int changed;
 	if (!dirty && pte_need_exec_flush(entry, 0))
-		entry = do_dcache_icache_coherency(entry);
+		entry = do_dcache_icache_coherency(entry, address);
 	changed = !pte_same(*(ptep), entry);
 	if (changed) {
 		if (!(vma->vm_flags & VM_HUGETLB))


Patches currently in stable-queue which might be from RFeany@mrv.com are

queue-2.6.30/powerpc-8xx-fix-regression-introduced-by-cache-coherency-rewrite.patch
queue-2.6.30/powerpc-fix-incorrect-setting-of-__have_arch_pte_special.patch

^ permalink raw reply

* patch powerpc-8xx-fix-regression-introduced-by-cache-coherency-rewrite.patch added to 2.6.31-stable tree
From: gregkh @ 2009-10-01 22:46 UTC (permalink / raw)
  To: RFeany, benh, gregkh, linuxppc-dev, rfeany, stable; +Cc: stable, stable-commits
In-Reply-To: <1253776614.7103.434.camel@pasglop>


This is a note to let you know that we have just queued up the patch titled

    Subject: powerpc/8xx: Fix regression introduced by cache coherency rewrite

to the 2.6.31-stable tree.  Its filename is

    powerpc-8xx-fix-regression-introduced-by-cache-coherency-rewrite.patch

A git repo of this tree can be found at 
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary


>From benh@kernel.crashing.org  Thu Oct  1 15:35:28 2009
From: Rex Feany <RFeany@mrv.com>
Date: Thu, 24 Sep 2009 17:16:54 +1000
Subject: powerpc/8xx: Fix regression introduced by cache coherency rewrite
To: stable <stable@kernel.org>
Cc: linuxppc-dev list <linuxppc-dev@ozlabs.org>, RFeany@mrv.com
Message-ID: <1253776614.7103.434.camel@pasglop>

From: Rex Feany <RFeany@mrv.com>

commit e0908085fc2391c85b85fb814ae1df377c8e0dcb upstream.

After upgrading to the latest kernel on my mpc875 userspace started
running incredibly slow (hours to get to a shell, even!).
I tracked it down to commit 8d30c14cab30d405a05f2aaceda1e9ad57800f36,
that patch removed a work-around for the 8xx. Adding it
back makes my problem go away.

Signed-off-by: Rex Feany <rfeany@mrv.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 arch/powerpc/mm/pgtable.c |   19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

--- a/arch/powerpc/mm/pgtable.c
+++ b/arch/powerpc/mm/pgtable.c
@@ -30,6 +30,8 @@
 #include <asm/tlbflush.h>
 #include <asm/tlb.h>
 
+#include "mmu_decl.h"
+
 static DEFINE_PER_CPU(struct pte_freelist_batch *, pte_freelist_cur);
 static unsigned long pte_freelist_forced_free;
 
@@ -119,7 +121,7 @@ void pte_free_finish(void)
 /*
  * Handle i/d cache flushing, called from set_pte_at() or ptep_set_access_flags()
  */
-static pte_t do_dcache_icache_coherency(pte_t pte)
+static pte_t do_dcache_icache_coherency(pte_t pte, unsigned long addr)
 {
 	unsigned long pfn = pte_pfn(pte);
 	struct page *page;
@@ -128,6 +130,17 @@ static pte_t do_dcache_icache_coherency(
 		return pte;
 	page = pfn_to_page(pfn);
 
+#ifdef CONFIG_8xx
+       /* On 8xx, cache control instructions (particularly
+        * "dcbst" from flush_dcache_icache) fault as write
+        * operation if there is an unpopulated TLB entry
+        * for the address in question. To workaround that,
+        * we invalidate the TLB here, thus avoiding dcbst
+        * misbehaviour.
+        */
+       _tlbil_va(addr, 0 /* 8xx doesn't care about PID */);
+#endif
+
 	if (!PageReserved(page) && !test_bit(PG_arch_1, &page->flags)) {
 		pr_devel("do_dcache_icache_coherency... flushing\n");
 		flush_dcache_icache_page(page);
@@ -198,7 +211,7 @@ void set_pte_at(struct mm_struct *mm, un
 	 */
 	pte = __pte(pte_val(pte) & ~_PAGE_HPTEFLAGS);
 	if (pte_need_exec_flush(pte, 1))
-		pte = do_dcache_icache_coherency(pte);
+		pte = do_dcache_icache_coherency(pte, addr);
 
 	/* Perform the setting of the PTE */
 	__set_pte_at(mm, addr, ptep, pte, 0);
@@ -216,7 +229,7 @@ int ptep_set_access_flags(struct vm_area
 {
 	int changed;
 	if (!dirty && pte_need_exec_flush(entry, 0))
-		entry = do_dcache_icache_coherency(entry);
+		entry = do_dcache_icache_coherency(entry, address);
 	changed = !pte_same(*(ptep), entry);
 	if (changed) {
 		if (!(vma->vm_flags & VM_HUGETLB))


Patches currently in stable-queue which might be from RFeany@mrv.com are

queue-2.6.31/powerpc-8xx-fix-regression-introduced-by-cache-coherency-rewrite.patch
queue-2.6.31/powerpc-fix-incorrect-setting-of-__have_arch_pte_special.patch

^ permalink raw reply

* [PATCH] pasemi_mac: ethtool get settings fix
From: Valentine Barshak @ 2009-10-01 22:27 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: netdev

Not all pasemi mac interfaces can have a phy attached.
For example, XAUI has no phy and phydev is NULL for it.
In this case ethtool get settings causes kernel crash.
Fix it by returning -EOPNOTSUPP if there's no PHY attached.

Signed-off-by: Valentine Barshak <vbarshak@ru.mvista.com>
---
 drivers/net/pasemi_mac_ethtool.c |    2 ++
 1 file changed, 2 insertions(+)

--- linux-2.6.21.orig/drivers/net/pasemi_mac_ethtool.c	2008-11-06 18:10:38.000000000 +0300
+++ linux-2.6.21/drivers/net/pasemi_mac_ethtool.c	2008-11-19 19:24:28.000000000 +0300
@@ -71,6 +71,8 @@ pasemi_mac_ethtool_get_settings(struct n
 	struct pasemi_mac *mac = netdev_priv(netdev);
 	struct phy_device *phydev = mac->phydev;
 
+	if (!phydev)
+		return -EOPNOTSUPP;
 	return phy_ethtool_gset(phydev, cmd);
 }
 

^ permalink raw reply

* [patch 26/30] powerpc/8xx: Fix regression introduced by cache coherency rewrite
From: Greg KH @ 2009-10-01 23:31 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: linuxppc-dev list, akpm, torvalds, stable-review, alan, Rex Feany
In-Reply-To: <20091001233504.GA17709@kroah.com>

2.6.30-stable review patch.  If anyone has any objections, please let us know.

------------------
From: Rex Feany <RFeany@mrv.com>

commit e0908085fc2391c85b85fb814ae1df377c8e0dcb upstream.

After upgrading to the latest kernel on my mpc875 userspace started
running incredibly slow (hours to get to a shell, even!).
I tracked it down to commit 8d30c14cab30d405a05f2aaceda1e9ad57800f36,
that patch removed a work-around for the 8xx. Adding it
back makes my problem go away.

Signed-off-by: Rex Feany <rfeany@mrv.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 arch/powerpc/mm/pgtable.c |   19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

--- a/arch/powerpc/mm/pgtable.c
+++ b/arch/powerpc/mm/pgtable.c
@@ -30,6 +30,8 @@
 #include <asm/tlbflush.h>
 #include <asm/tlb.h>
 
+#include "mmu_decl.h"
+
 static DEFINE_PER_CPU(struct pte_freelist_batch *, pte_freelist_cur);
 static unsigned long pte_freelist_forced_free;
 
@@ -119,7 +121,7 @@ void pte_free_finish(void)
 /*
  * Handle i/d cache flushing, called from set_pte_at() or ptep_set_access_flags()
  */
-static pte_t do_dcache_icache_coherency(pte_t pte)
+static pte_t do_dcache_icache_coherency(pte_t pte, unsigned long addr)
 {
 	unsigned long pfn = pte_pfn(pte);
 	struct page *page;
@@ -128,6 +130,17 @@ static pte_t do_dcache_icache_coherency(
 		return pte;
 	page = pfn_to_page(pfn);
 
+#ifdef CONFIG_8xx
+       /* On 8xx, cache control instructions (particularly
+        * "dcbst" from flush_dcache_icache) fault as write
+        * operation if there is an unpopulated TLB entry
+        * for the address in question. To workaround that,
+        * we invalidate the TLB here, thus avoiding dcbst
+        * misbehaviour.
+        */
+       _tlbil_va(addr, 0 /* 8xx doesn't care about PID */);
+#endif
+
 	if (!PageReserved(page) && !test_bit(PG_arch_1, &page->flags)) {
 		pr_debug("do_dcache_icache_coherency... flushing\n");
 		flush_dcache_icache_page(page);
@@ -198,7 +211,7 @@ void set_pte_at(struct mm_struct *mm, un
 	 */
 	pte = __pte(pte_val(pte) & ~_PAGE_HPTEFLAGS);
 	if (pte_need_exec_flush(pte, 1))
-		pte = do_dcache_icache_coherency(pte);
+		pte = do_dcache_icache_coherency(pte, addr);
 
 	/* Perform the setting of the PTE */
 	__set_pte_at(mm, addr, ptep, pte, 0);
@@ -216,7 +229,7 @@ int ptep_set_access_flags(struct vm_area
 {
 	int changed;
 	if (!dirty && pte_need_exec_flush(entry, 0))
-		entry = do_dcache_icache_coherency(entry);
+		entry = do_dcache_icache_coherency(entry, address);
 	changed = !pte_same(*(ptep), entry);
 	if (changed) {
 		if (!(vma->vm_flags & VM_HUGETLB))

^ permalink raw reply

* [patch 27/30] powerpc: Fix incorrect setting of __HAVE_ARCH_PTE_SPECIAL
From: Greg KH @ 2009-10-01 23:31 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: linuxppc-dev list, bernhard.weirich, akpm, torvalds,
	stable-review, alan, RFeany
In-Reply-To: <20091001233504.GA17709@kroah.com>


2.6.30-stable review patch.  If anyone has any objections, please let us know.

------------------
From: Weirich, Bernhard <Bernhard.Weirich@riedel.net>

[I'm going to fix upstream differently, by having all CPU types
actually support _PAGE_SPECIAL, but I prefer the simple and obvious
fix for -stable. -- Ben]

The test that decides whether to define __HAVE_ARCH_PTE_SPECIAL on
powerpc is bogus and will end up always defining it, even when
_PAGE_SPECIAL is not supported (in which case it's 0) such as on
8xx or 40x processors.

Signed-off-by: Bernhard Weirich <bernhard.weirich@riedel.net>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


---
 arch/powerpc/include/asm/pte-common.h |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/powerpc/include/asm/pte-common.h
+++ b/arch/powerpc/include/asm/pte-common.h
@@ -176,7 +176,7 @@ extern unsigned long bad_call_to_PMD_PAG
 #define HAVE_PAGE_AGP
 
 /* Advertise support for _PAGE_SPECIAL */
-#ifdef _PAGE_SPECIAL
+#if _PAGE_SPECIAL != 0
 #define __HAVE_ARCH_PTE_SPECIAL
 #endif
 

^ permalink raw reply

* [127/136] powerpc/8xx: Fix regression introduced by cache coherency rewrite
From: Greg KH @ 2009-10-02  1:17 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: linuxppc-dev list, akpm, torvalds, stable-review, alan, Rex Feany
In-Reply-To: <20091002012911.GA18542@kroah.com>

2.6.31-stable review patch.  If anyone has any objections, please let us know.

------------------
From: Rex Feany <RFeany@mrv.com>

commit e0908085fc2391c85b85fb814ae1df377c8e0dcb upstream.

After upgrading to the latest kernel on my mpc875 userspace started
running incredibly slow (hours to get to a shell, even!).
I tracked it down to commit 8d30c14cab30d405a05f2aaceda1e9ad57800f36,
that patch removed a work-around for the 8xx. Adding it
back makes my problem go away.

Signed-off-by: Rex Feany <rfeany@mrv.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 arch/powerpc/mm/pgtable.c |   19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

--- a/arch/powerpc/mm/pgtable.c
+++ b/arch/powerpc/mm/pgtable.c
@@ -30,6 +30,8 @@
 #include <asm/tlbflush.h>
 #include <asm/tlb.h>
 
+#include "mmu_decl.h"
+
 static DEFINE_PER_CPU(struct pte_freelist_batch *, pte_freelist_cur);
 static unsigned long pte_freelist_forced_free;
 
@@ -119,7 +121,7 @@ void pte_free_finish(void)
 /*
  * Handle i/d cache flushing, called from set_pte_at() or ptep_set_access_flags()
  */
-static pte_t do_dcache_icache_coherency(pte_t pte)
+static pte_t do_dcache_icache_coherency(pte_t pte, unsigned long addr)
 {
 	unsigned long pfn = pte_pfn(pte);
 	struct page *page;
@@ -128,6 +130,17 @@ static pte_t do_dcache_icache_coherency(
 		return pte;
 	page = pfn_to_page(pfn);
 
+#ifdef CONFIG_8xx
+       /* On 8xx, cache control instructions (particularly
+        * "dcbst" from flush_dcache_icache) fault as write
+        * operation if there is an unpopulated TLB entry
+        * for the address in question. To workaround that,
+        * we invalidate the TLB here, thus avoiding dcbst
+        * misbehaviour.
+        */
+       _tlbil_va(addr, 0 /* 8xx doesn't care about PID */);
+#endif
+
 	if (!PageReserved(page) && !test_bit(PG_arch_1, &page->flags)) {
 		pr_devel("do_dcache_icache_coherency... flushing\n");
 		flush_dcache_icache_page(page);
@@ -198,7 +211,7 @@ void set_pte_at(struct mm_struct *mm, un
 	 */
 	pte = __pte(pte_val(pte) & ~_PAGE_HPTEFLAGS);
 	if (pte_need_exec_flush(pte, 1))
-		pte = do_dcache_icache_coherency(pte);
+		pte = do_dcache_icache_coherency(pte, addr);
 
 	/* Perform the setting of the PTE */
 	__set_pte_at(mm, addr, ptep, pte, 0);
@@ -216,7 +229,7 @@ int ptep_set_access_flags(struct vm_area
 {
 	int changed;
 	if (!dirty && pte_need_exec_flush(entry, 0))
-		entry = do_dcache_icache_coherency(entry);
+		entry = do_dcache_icache_coherency(entry, address);
 	changed = !pte_same(*(ptep), entry);
 	if (changed) {
 		if (!(vma->vm_flags & VM_HUGETLB))

^ permalink raw reply

* [128/136] powerpc: Fix incorrect setting of __HAVE_ARCH_PTE_SPECIAL
From: Greg KH @ 2009-10-02  1:17 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: linuxppc-dev list, bernhard.weirich, akpm, torvalds,
	stable-review, alan, RFeany
In-Reply-To: <20091002012911.GA18542@kroah.com>


2.6.31-stable review patch.  If anyone has any objections, please let us know.

------------------
From: Weirich, Bernhard <Bernhard.Weirich@riedel.net>

[I'm going to fix upstream differently, by having all CPU types
actually support _PAGE_SPECIAL, but I prefer the simple and obvious
fix for -stable. -- Ben]

The test that decides whether to define __HAVE_ARCH_PTE_SPECIAL on
powerpc is bogus and will end up always defining it, even when
_PAGE_SPECIAL is not supported (in which case it's 0) such as on
8xx or 40x processors.

Signed-off-by: Bernhard Weirich <bernhard.weirich@riedel.net>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


---
 arch/powerpc/include/asm/pte-common.h |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/powerpc/include/asm/pte-common.h
+++ b/arch/powerpc/include/asm/pte-common.h
@@ -176,7 +176,7 @@ extern unsigned long bad_call_to_PMD_PAG
 #define HAVE_PAGE_AGP
 
 /* Advertise support for _PAGE_SPECIAL */
-#ifdef _PAGE_SPECIAL
+#if _PAGE_SPECIAL != 0
 #define __HAVE_ARCH_PTE_SPECIAL
 #endif
 

^ permalink raw reply

* Re: [PATCH 1/2] mm: add notifier in pageblock isolation for balloon drivers
From: Mel Gorman @ 2009-10-02 10:44 UTC (permalink / raw)
  To: Ingo Molnar, Badari Pulavarty, Brian King, Benjamin Herrenschmidt,
	Paul Mackerras, Martin Schwidefsky, linux-kernel, linux-mm,
	linuxppc-dev
In-Reply-To: <20091001195311.GA16667@austin.ibm.com>

On Thu, Oct 01, 2009 at 02:53:11PM -0500, Robert Jennings wrote:
> Memory balloon drivers can allocate a large amount of memory which
> is not movable but could be freed to accommodate memory hotplug remove.
> 
> Prior to calling the memory hotplug notifier chain the memory in the
> pageblock is isolated.  If the migrate type is not MIGRATE_MOVABLE the
> isolation will not proceed, causing the memory removal for that page
> range to fail.
> 
> Rather than immediately failing pageblock isolation if the the
> migrateteype is not MIGRATE_MOVABLE, this patch checks if all of the
> pages in the pageblock are owned by a registered balloon driver using a
> notifier chain.  If all of the non-movable pages are owned by a balloon,
> they can be freed later through the memory notifier chain and the range
> can still be isolated in set_migratetype_isolate().
> 
> Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
> 
> ---
>  drivers/base/memory.c  |   19 +++++++++++++++++++
>  include/linux/memory.h |   22 ++++++++++++++++++++++
>  mm/page_alloc.c        |   49 +++++++++++++++++++++++++++++++++++++++++--------
>  3 files changed, 82 insertions(+), 8 deletions(-)
> 
> Index: b/drivers/base/memory.c
> ===================================================================
> --- a/drivers/base/memory.c
> +++ b/drivers/base/memory.c
> @@ -63,6 +63,20 @@ void unregister_memory_notifier(struct n
>  }
>  EXPORT_SYMBOL(unregister_memory_notifier);
>  
> +static BLOCKING_NOTIFIER_HEAD(memory_isolate_chain);
> +
> +int register_memory_isolate_notifier(struct notifier_block *nb)
> +{
> +	return blocking_notifier_chain_register(&memory_isolate_chain, nb);
> +}
> +EXPORT_SYMBOL(register_memory_isolate_notifier);
> +
> +void unregister_memory_isolate_notifier(struct notifier_block *nb)
> +{
> +	blocking_notifier_chain_unregister(&memory_isolate_chain, nb);
> +}
> +EXPORT_SYMBOL(unregister_memory_isolate_notifier);
> +
>  /*
>   * register_memory - Setup a sysfs device for a memory block
>   */
> @@ -157,6 +171,11 @@ int memory_notify(unsigned long val, voi
>  	return blocking_notifier_call_chain(&memory_chain, val, v);
>  }
>  
> +int memory_isolate_notify(unsigned long val, void *v)
> +{
> +	return blocking_notifier_call_chain(&memory_isolate_chain, val, v);
> +}
> +
>  /*
>   * MEMORY_HOTPLUG depends on SPARSEMEM in mm/Kconfig, so it is
>   * OK to have direct references to sparsemem variables in here.
> Index: b/include/linux/memory.h
> ===================================================================
> --- a/include/linux/memory.h
> +++ b/include/linux/memory.h
> @@ -50,6 +50,14 @@ struct memory_notify {
>  	int status_change_nid;
>  };
>  
> +#define MEM_ISOLATE_COUNT	(1<<0)
> +

This needs a comment explaining that that this is an action to count the
number of pages within a range that have been isolated within a range of
pages and not a default value for "nr_pages" in the next structure.

> +struct memory_isolate_notify {
> +	unsigned long start_addr;
> +	unsigned int nr_pages;
> +	unsigned int pages_found;
> +};

Is there any particular reason you used virtual address of the mapped
page instead of PFN? I am guessing at this point that the balloon driver
is based on addresses but the code that populates the structure more
commonly deals with PFNs. Outside of debugging code, page_address is
rarely used in mm/page_alloc.c .

It's picky but it feels more natural to me to have the structure have
start_pfn and nr_pages or start_addr and end_addr but not a mix of both.

> +
>  struct notifier_block;
>  struct mem_section;
>  
> @@ -76,14 +84,28 @@ static inline int memory_notify(unsigned
>  {
>  	return 0;
>  }
> +static inline int register_memory_isolate_notifier(struct notifier_block *nb)
> +{
> +	return 0;
> +}
> +static inline void unregister_memory_isolate_notifier(struct notifier_block *nb)
> +{
> +}
> +static inline int memory_isolate_notify(unsigned long val, void *v)
> +{
> +	return 0;
> +}
>  #else
>  extern int register_memory_notifier(struct notifier_block *nb);
>  extern void unregister_memory_notifier(struct notifier_block *nb);
> +extern int register_memory_isolate_notifier(struct notifier_block *nb);
> +extern void unregister_memory_isolate_notifier(struct notifier_block *nb);
>  extern int register_new_memory(int, struct mem_section *);
>  extern int unregister_memory_section(struct mem_section *);
>  extern int memory_dev_init(void);
>  extern int remove_memory_block(unsigned long, struct mem_section *, int);
>  extern int memory_notify(unsigned long val, void *v);
> +extern int memory_isolate_notify(unsigned long val, void *v);
>  extern struct memory_block *find_memory_block(struct mem_section *);
>  #define CONFIG_MEM_BLOCK_SIZE	(PAGES_PER_SECTION<<PAGE_SHIFT)
>  enum mem_add_context { BOOT, HOTPLUG };
> Index: b/mm/page_alloc.c
> ===================================================================
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -48,6 +48,7 @@
>  #include <linux/page_cgroup.h>
>  #include <linux/debugobjects.h>
>  #include <linux/kmemleak.h>
> +#include <linux/memory.h>
>  #include <trace/events/kmem.h>
>  
>  #include <asm/tlbflush.h>
> @@ -4985,23 +4986,55 @@ void set_pageblock_flags_group(struct pa
>  int set_migratetype_isolate(struct page *page)
>  {
>  	struct zone *zone;
> -	unsigned long flags;
> +	unsigned long flags, pfn, iter;
> +	long immobile = 0;

So, the count in the structure is unsigned long, but long here. Why the
difference in types?

> +	struct memory_isolate_notify arg;
> +	int notifier_ret;
>  	int ret = -EBUSY;
>  	int zone_idx;
>  
>  	zone = page_zone(page);
>  	zone_idx = zone_idx(zone);
> +
> +	pfn = page_to_pfn(page);
> +	arg.start_addr = (unsigned long)page_address(page);
> +	arg.nr_pages = pageblock_nr_pages;
> +	arg.pages_found = 0;
> +
>  	spin_lock_irqsave(&zone->lock, flags);
>  	/*
>  	 * In future, more migrate types will be able to be isolation target.
>  	 */
> -	if (get_pageblock_migratetype(page) != MIGRATE_MOVABLE &&
> -	    zone_idx != ZONE_MOVABLE)
> -		goto out;
> -	set_pageblock_migratetype(page, MIGRATE_ISOLATE);
> -	move_freepages_block(zone, page, MIGRATE_ISOLATE);
> -	ret = 0;
> -out:
> +	do {
> +		if (get_pageblock_migratetype(page) == MIGRATE_MOVABLE &&
> +		    zone_idx == ZONE_MOVABLE) {

So, this condition requires the zone be MOVABLE and the migrate type
be movable. That prevents MIGRATE_RESERVE regions in ZONE_MOVABLE being
off-lined even though they can likely be off-lined. It also prevents
MIGRATE_MOVABLE sections in other zones being off-lined.

Did you mean || here instead of && ?

Might want to expand the comment explaining this condition instead of
leaving it in the old location which is confusing.

> +			ret = 0;
> +			break;
> +		}

Why do you wrap all this in a do {} while(0)  instead of preserving the
out: label and using goto?

> +
> +		/*
> +		 * If all of the pages in a zone are used by a balloon,
> +		 * the range can be still be isolated.  The balloon will
> +		 * free these pages from the memory notifier chain.
> +		 */
> +		notifier_ret = memory_isolate_notify(MEM_ISOLATE_COUNT, &arg);
> +		notifier_ret = notifier_to_errno(ret);
> +		if (notifier_ret || !arg.pages_found)
> +			break;
> +
> +		for (iter = pfn; iter < (pfn + pageblock_nr_pages); iter++)
> +			if (page_count(pfn_to_page(iter)))
> +				immobile++;
> +
> +		if (arg.pages_found == immobile)

and here you compare a signed with an unsigned type. Probably harmless
but why do it?

> +			ret = 0;
> +	} while (0);
> +

So the out label would go here and you'd get rid of the do {} while(0)
loop.

> +	if (!ret) {
> +		set_pageblock_migratetype(page, MIGRATE_ISOLATE);
> +		move_freepages_block(zone, page, MIGRATE_ISOLATE);
> +	}
> +
>  	spin_unlock_irqrestore(&zone->lock, flags);
>  	if (!ret)
>  		drain_all_pages();
> 

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply

* Re: [PATCH] powerpc/8xx: fix regression introduced by cache coherency rewrite
From: Joakim Tjernlund @ 2009-10-02 13:06 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev@ozlabs.org, Rex Feany
In-Reply-To: <1254350159.5699.21.camel@pasglop>

Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote on 01/10/2009 00:35:59:
>
>
> > > Had a look at linus tree and there is something I don't understand.
> > > Your fix, e0908085fc2391c85b85fb814ae1df377c8e0dcb, fixes a problem
> > > that was introduced by 8d30c14cab30d405a05f2aaceda1e9ad57800f36 but
> > > 8d30c14cab30d405a05f2aaceda1e9ad57800f36 was included in .31 and .31
> > > works and top of tree does not, how can that be?
> > >
> > > To me it seems more likely that some mm change introduced between .31 and
> > > top of tree is the culprit.
> > > My patch addresses the problem described in the comment:
> > > /* On 8xx, cache control instructions (particularly
> > >  * "dcbst" from flush_dcache_icache) fault as write
> > >  * operation if there is an unpopulated TLB entry
> > >  * for the address in question. To workaround that,
> > >  * we invalidate the TLB here, thus avoiding dcbst
> > >  * misbehaviour.
> > >  */
> > > Now you are using this old fix to paper over some other bug or so I think.
> >
> > There is something fishy with the TLB status, looking into the mpc862 manual I
> > don't see how it can work reliably. Need to dwell some more.
> > Anyhow, I have incorporated some of my findings into a new patch,
> > perhaps this will be the golden one?
>
> Well, I still wonder about what whole unpopulated entry business.
>
> >From what I can see, the TLB miss code will check _PAGE_PRESENT, and
> when not set, it will -still- insert something into the TLB (unlike
> all other CPU types that go straight to data access faults from there).
>
> So we end up populating with those unpopulated entries that will then
> cause us to take a DSI (or ISI) instead of a TLB miss the next time
> around ... and so we need to remove them once we do that, no ? IE. Once
> we have actually put a valid PTE in.
>
> At least that's my understanding and why I believe we should always have
> tlbil_va() in set_pte() and ptep_set_access_flags(), which boils down
> in putting it into the 2 "filter" functions in the new code.
>
> Well.. actually, the ptep_set_access_flags() will already do a
> flush_tlb_page_nohash() when the PTE is changed. So I suppose all we
> really need here would be in set_pte_filter(), to do the same if we are
> writing a PTE that has _PAGE_PRESENT, at least on 8xx.
>
> But just unconditionally doing a tlbil_va() in both the filter functions
> shouldn't harm and also fix the problem, though Rex seems to indicate
> that is not the case.
>
> Now, we -might- have something else broken on 8xx, hard to tell. You may
> want to try to bisect, adding back the removed tlbil_va() as you go
> backward, between .31 and upstream...

Found something odd, perhaps Rex can test?
I tested this on my old 2.4 and it worked well.

 Jocke

>From dd0f213d51437bb48ff42c1c0e690f243266c133 Mon Sep 17 00:00:00 2001
From: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
Date: Fri, 2 Oct 2009 14:59:21 +0200
Subject: [PATCH] powerpc, 8xx: DTLB Error must check for more errors.

DataTLBError do only check for store op's. I think this
is too narrow. Protection and and no translation errors
needs to be checked too.
---
 arch/powerpc/kernel/head_8xx.S |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 52ff8c5..ef8a5ce 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -472,7 +472,7 @@ DataTLBError:
 	/* First, make sure this was a store operation.
 	*/
 	mfspr	r10, SPRN_DSISR
-	andis.	r11, r10, 0x0200	/* If set, indicates store op */
+	andis.	r11, r10, 0x4a00 /* If set, store, protection or no trans. */
 	beq	2f

 	/* The EA of a data TLB miss is automatically stored in the MD_EPN
--
1.6.4.4

^ permalink raw reply related

* Is volatile always verboten for FSL QE structures?
From: Michael Barkowski @ 2009-10-02 14:14 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Timur Tabi

Just wondering - is there a case where using volatile for UCC parameter RAM for example will not work, or is the use of I/O accessors everywhere an attempt to be portable to other architectures?

I'm asking because I really want to know ;)

-- 
Michael Barkowski
905-482-4577

^ permalink raw reply

* Re: Is volatile always verboten for FSL QE structures?
From: Timur Tabi @ 2009-10-02 14:46 UTC (permalink / raw)
  To: Michael Barkowski; +Cc: linuxppc-dev
In-Reply-To: <4AC60AD8.8030509@ruggedcom.com>

Michael Barkowski wrote:
> Just wondering - is there a case where using volatile for UCC parameter RAM for example will not work, or is the use of I/O accessors everywhere an attempt to be portable to other architectures?

'volatile' just doesn't really do what you think it should do.  The PowerPC architecture is too complicated w.r.t. ordering of reads and writes.  In other words, you can't trust it.

No one should be using 'volatile' to access I/O registers.

-- 
Timur Tabi
Linux kernel developer at Freescale

^ permalink raw reply

* Re: linux-next: tree build failure
From: Hollis Blanchard @ 2009-10-02 15:48 UTC (permalink / raw)
  To: Jan Beulich; +Cc: sfr, linux-kernel, kvm-ppc, linux-next, akpm, linuxppc-dev
In-Reply-To: <4AC318450200007800017355@vpn.id2.novell.com>

On Wed, 2009-09-30 at 07:35 +0100, Jan Beulich wrote:
> >>> Hollis Blanchard <hollisb@us.ibm.com> 30.09.09 01:39 >>>
> >On Tue, 2009-09-29 at 10:28 +0100, Jan Beulich wrote:
> >> >>> Hollis Blanchard  09/29/09 2:00 AM >>>
> >> >First, I think there is a real bug here, and the code should read like
> >> >this (to match the comment):
> >> >    /* type has to be known at build time for optimization */
> >> >-    BUILD_BUG_ON(__builtin_constant_p(type));
> >> >+    BUILD_BUG_ON(!__builtin_constant_p(type));
> >> >
> >> >However, I get the same build error *both* ways, i.e.
> >> >__builtin_constant_p(type) evaluates to both 0 and 1? Either that, or
> >> >the new BUILD_BUG_ON() macro isn't working...
> >> 
> >> No, at this point of the compilation process it's neither zero nor one,
> >> it's simply considered non-constant by the compiler at that stage
> >> (this builtin is used for optimization, not during parsing, and the
> >> error gets generated when the body of the function gets parsed,
> >> not when code gets generated from it).
> >
> >I think I see what you're saying. Do you have a fix to suggest?
> 
> The one Rusty suggested the other day may help here. I don't like it
> as a drop-in replacement for BUILD_BUG_ON() though (due to it
> deferring the error generated to the linking stage), I'd rather view
> this as an improvement to MAYBE_BUILD_BUG_ON() (which should
> then be used here).

Can you be more specific?

I have no idea what Rusty suggested where. I can't even guess what
MAYBE_BUILD_BUG_ON() is supposed to do (sounds like a terrible name).

All I know is that this used to build...

-- 
Hollis Blanchard
IBM Linux Technology Center

^ permalink raw reply

* Re: Is volatile always verboten for FSL QE structures?
From: Kumar Gala @ 2009-10-02 16:41 UTC (permalink / raw)
  To: Timur Tabi; +Cc: linuxppc-dev, Michael Barkowski
In-Reply-To: <4AC61247.1030507@freescale.com>


On Oct 2, 2009, at 9:46 AM, Timur Tabi wrote:

> Michael Barkowski wrote:
>> Just wondering - is there a case where using volatile for UCC  
>> parameter RAM for example will not work, or is the use of I/O  
>> accessors everywhere an attempt to be portable to other  
>> architectures?
>
> 'volatile' just doesn't really do what you think it should do.  The  
> PowerPC architecture is too complicated w.r.t. ordering of reads and  
> writes.  In other words, you can't trust it.
>
> No one should be using 'volatile' to access I/O registers.

See Documentation/volatile-considered-harmful.txt

- k

^ permalink raw reply

* Re: Is volatile always verboten for FSL QE structures?
From: Michael Barkowski @ 2009-10-02 16:57 UTC (permalink / raw)
  To: Kumar Gala; +Cc: linuxppc-dev, Timur Tabi
In-Reply-To: <C3FE6281-04FA-48B3-96EA-D6E68AF65D6B@kernel.crashing.org>

Kumar Gala wrote:
> 
> On Oct 2, 2009, at 9:46 AM, Timur Tabi wrote:
> 
>> Michael Barkowski wrote:
>>> Just wondering - is there a case where using volatile for UCC
>>> parameter RAM for example will not work, or is the use of I/O
>>> accessors everywhere an attempt to be portable to other architectures?
>>
>> 'volatile' just doesn't really do what you think it should do.  The
>> PowerPC architecture is too complicated w.r.t. ordering of reads and
>> writes.  In other words, you can't trust it.
>>
>> No one should be using 'volatile' to access I/O registers.
> 
> See Documentation/volatile-considered-harmful.txt
> 

I'm happy to adopt your interpretation of it, and I appreciate the explanation.

from Documentation/volatile-considered-harmful.txt:
>   - The above-mentioned accessor functions might use volatile on
>     architectures where direct I/O memory access does work.  Essentially,
>     each accessor call becomes a little critical section on its own and
>     ensures that the access happens as expected by the programmer.

Part of it was that I wondered if this was one of those architectures.  I guess not.

-- 
Michael Barkowski
905-482-4577

^ permalink raw reply

* Re: linux booting fails on ppc440x5 with SRAM
From: Johnny Hung @ 2009-10-02 17:17 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, Vineeth _
In-Reply-To: <1253742018.7103.333.camel@pasglop>

It's seems a RAM initialize problem. Try to use ICE or your bootloader
to test initialized RAM wirh write/read operation.
Ex, use mtest in uboot to check memory. For ICE, it should be an
detailed memory test function like hardware diagnostic.

2009/9/24 Benjamin Herrenschmidt <benh@kernel.crashing.org>:
> On Wed, 2009-09-23 at 20:19 +0530, Vineeth _ wrote:
>> I am trying to port linux on IBM powerpc-440x5. I have this board
>> which has this processor, a 16MB SRAM sits on location 0x0, uart and a
>> flash.I have a simple bootloader which does the following.
>> =A0 =A0 1. Initialize the processor (as part of it, we Generates the tlb=
s
>> for UART,16MB flash,16MB SRAM)
>> =A0 =A0 2. Initialize the UART
>> =A0 =A0 3. Copy the simple-boot linux_image (binary file) from flash to
>> 0x400000 location of SRAM.
>> =A0 =A0 4. Kernel entry to 0x400000
>
> Not sure yet, looks like things are being overwritten during boot.
> Interestingly enough, the DEAR value looks like an MSR value..
>
> Hard to tell what's up. You'll have to dig a bit more youself to
> figure out how come the kernel's getting that bad pointer.
>
> All I can tell you is that things work fine on 440x5 cores, though
> I haven't tested a configuration with so little memory in a while.
>
> Ben.
>
>> zImage starting: loaded at 0x00400000 (sp: 0x004deeb0)
>> Allocating 0x1dad84 bytes for kernel ...
>> gunzipping (0x00000000 <- 0x0040c000:0x004dd3f1)...done 0x1c31cc bytes
>>
>> Linux/PowerPC load: console=3DttyS0 root=3D/dev/ram
>> Finalizing device tree... flat tree at 0x4eb300
>> id mach(): done
>> inside skybeam_register_console function
>> MMU:enterMMU:hw initMMU:mapinMMU:setioMMU:exitinside
>> _setup_arch-begininginside _setup_arch-1inside
>> _setup_arch-2setup_arch: bootmemarch: exit<7>Top of RAM: 0x1000000,
>> Total RAM: 0x1000000
>> Zone PFN ranges:
>> =A0DMA =A0 =A0 =A00x00000000 -> 0x00001000
>> =A0Normal =A0 0x00001000 -> 0x00001000
>> Movable zone start PFN for each node
>> early_node_map[1] active PFN ranges
>> =A0 =A00: 0x00000000 -> 0x00001000
>> MMU: Allocated 1088 bytes of context maps for 255 contexts
>> Built 1 zonelists in Zone order, mobility grouping off. =A0Total pages: =
4064
>> Kernel command line: console=3DttyS0 root=3D/dev/ram
>> Unable to handle kernel paging request for data at address 0x00021000
>> Faulting instruction address: 0xc010a7c4
>> Oops: Kernel access of bad area, sig: 11 [#1]
>> PREEMPT PowerPC 44x Platform
>> Modules linked in:
>> NIP: c010a7c4 LR: c010dc50 CTR: 00000000
>> REGS: c01bfeb0 TRAP: 0300 =A0 Not tainted =A0(2.6.30)
>> MSR: 00021000 <ME,CE> =A0CR: 24000044 =A0XER: 00000000
>> DEAR: 00021000, ESR: 00000000
>> TASK =3D c01a94b8[0] 'swapper' THREAD: c01be000
>> GPR00: 00001180 c01bff60 c01a94b8 00021000 00000025 00000008 c0104968 00=
000000
>> GPR08: 2f72616d c0110000 c0155938 c01a0000 22000024 00000000 fffff104 00=
000000
>> GPR16: 00000000 00000000 00000000 00000000 fffffff8 000008b8 c010d758 c0=
104968
>> GPR24: 00001198 00001190 c018a001 c01c5498 000008c0 00001188 00021000 c0=
1c42f0
>> NIP [c010a7c4] strchr+0x0/0x3c
>> LR [c010dc50] match_token+0x138/0x228
>> Call Trace:
>> [c01bff60] [c016b99c] 0xc016b99c (unreliable)
>> [c01bffa0] [c0104a00] sort_extable+0x28/0x38
>> [c01bffb0] [c01938ec] sort_main_extable+0x20/0x30
>> [c01bffc0] [c018c734] start_kernel+0x140/0x288
>> [c01bfff0] [c0000200] skpinv+0x190/0x1cc
>> Instruction dump:
>> 7ca903a6 88040000 38a5ffff 38840001 2f800000 98090000 39290001 419e0010
>> 4200ffe4 98a90000 4e800020 4e800020 <88030000> 5484063e 7f802000 4d9e002=
0
>> ---[ end trace 31fd0ba7d8756001 ]---
>> Kernel panic - not syncing: Attempted to kill the idle task!
>> Call Trace:
>> [c01bfd90] [c0005d5c] show_stack+0x4c/0x16c (unreliable)
>> [c01bfdd0] [c002f17c] panic+0xa0/0x168
>> [c01bfe20] [c0032eb8] do_exit+0x61c/0x638
>> [c01bfe60] [c000b60c] kernel_bad_stack+0x0/0x4c
>> [c01bfe90] [c000f310] bad_page_fault+0x90/0xd8
>> [c01bfea0] [c000e184] handle_page_fault+0x7c/0x80
>> [c01bff60] [c016b99c] 0xc016b99c
>> [c01bffa0] [c0104a00] sort_extable+0x28/0x38
>> [c01bffb0] [c01938ec] sort_main_extable+0x20/0x30
>> [c01bffc0] [c018c734] start_kernel+0x140/0x288
>> [c01bfff0] [c0000200] skpinv+0x190/0x1cc
>> Rebooting in 180 seconds..
>> _______________________________________________
>> Linuxppc-dev mailing list
>> Linuxppc-dev@lists.ozlabs.org
>> https://lists.ozlabs.org/listinfo/linuxppc-dev
>
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev
>

^ permalink raw reply

* Re: [PATCH] powerpc/8xx: fix regression introduced by cache coherency rewrite
From: Joakim Tjernlund @ 2009-10-02 18:10 UTC (permalink / raw)
  Cc: linuxppc-dev@ozlabs.org
In-Reply-To: <OF2FCFEA43.75290607-ONC1257643.0047C81F-C1257643.0048036B@transmode.se>

>
> Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote on 01/10/2009 00:35:59:
> >
> >
> > > > Had a look at linus tree and there is something I don't understand.
> > > > Your fix, e0908085fc2391c85b85fb814ae1df377c8e0dcb, fixes a problem
> > > > that was introduced by 8d30c14cab30d405a05f2aaceda1e9ad57800f36 but
> > > > 8d30c14cab30d405a05f2aaceda1e9ad57800f36 was included in .31 and .31
> > > > works and top of tree does not, how can that be?
> > > >
> > > > To me it seems more likely that some mm change introduced between .31 and
> > > > top of tree is the culprit.
> > > > My patch addresses the problem described in the comment:
> > > > /* On 8xx, cache control instructions (particularly
> > > >  * "dcbst" from flush_dcache_icache) fault as write
> > > >  * operation if there is an unpopulated TLB entry
> > > >  * for the address in question. To workaround that,
> > > >  * we invalidate the TLB here, thus avoiding dcbst
> > > >  * misbehaviour.
> > > >  */
> > > > Now you are using this old fix to paper over some other bug or so I think.
> > >
> > > There is something fishy with the TLB status, looking into the mpc862 manual I
> > > don't see how it can work reliably. Need to dwell some more.
> > > Anyhow, I have incorporated some of my findings into a new patch,
> > > perhaps this will be the golden one?
> >
> > Well, I still wonder about what whole unpopulated entry business.
> >
> > >From what I can see, the TLB miss code will check _PAGE_PRESENT, and
> > when not set, it will -still- insert something into the TLB (unlike
> > all other CPU types that go straight to data access faults from there).
> >
> > So we end up populating with those unpopulated entries that will then
> > cause us to take a DSI (or ISI) instead of a TLB miss the next time
> > around ... and so we need to remove them once we do that, no ? IE. Once
> > we have actually put a valid PTE in.
> >
> > At least that's my understanding and why I believe we should always have
> > tlbil_va() in set_pte() and ptep_set_access_flags(), which boils down
> > in putting it into the 2 "filter" functions in the new code.
> >
> > Well.. actually, the ptep_set_access_flags() will already do a
> > flush_tlb_page_nohash() when the PTE is changed. So I suppose all we
> > really need here would be in set_pte_filter(), to do the same if we are
> > writing a PTE that has _PAGE_PRESENT, at least on 8xx.
> >
> > But just unconditionally doing a tlbil_va() in both the filter functions
> > shouldn't harm and also fix the problem, though Rex seems to indicate
> > that is not the case.
> >
> > Now, we -might- have something else broken on 8xx, hard to tell. You may
> > want to try to bisect, adding back the removed tlbil_va() as you go
> > backward, between .31 and upstream...
>
> Found something odd, perhaps Rex can test?
> I tested this on my old 2.4 and it worked well.

That was not quite correct, sorry.
Try this instead:

>From c5f1a70561730b8a443f7081fbd9c5b023147043 Mon Sep 17 00:00:00 2001
From: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
Date: Fri, 2 Oct 2009 14:59:21 +0200
Subject: [PATCH] powerpc, 8xx: DTLB Error must check for more errors.

DataTLBError currently does:
 if ((err & 0x02000000) == 0)
    DSI();
This won't handle a store with no valid translation.
Change this to
 if ((err & 0x48000000) != 0)
    DSI();
that is, branch to DSI if either !permission or
!translation.
---
 arch/powerpc/kernel/head_8xx.S |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 52ff8c5..118bb05 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -472,8 +472,8 @@ DataTLBError:
 	/* First, make sure this was a store operation.
 	*/
 	mfspr	r10, SPRN_DSISR
-	andis.	r11, r10, 0x0200	/* If set, indicates store op */
-	beq	2f
+	andis.	r11, r10, 0x4800 /* no translation, no permission. */
+	bne	2f	/* branch if either is set */

 	/* The EA of a data TLB miss is automatically stored in the MD_EPN
 	 * register.  The EA of a data TLB error is automatically stored in
--
1.6.4.4

^ permalink raw reply related

* Re: Is volatile always verboten for FSL QE structures?
From: Guillaume Knispel @ 2009-10-02 18:08 UTC (permalink / raw)
  To: Michael Barkowski; +Cc: Guillaume Knispel, linuxppc-dev, Timur Tabi
In-Reply-To: <4AC63112.7080404@ruggedcom.com>

Michael Barkowski wrote:

> Kumar Gala wrote:
> > 
> > On Oct 2, 2009, at 9:46 AM, Timur Tabi wrote:
> > 
> >> Michael Barkowski wrote:
> >>> Just wondering - is there a case where using volatile for UCC
> >>> parameter RAM for example will not work, or is the use of I/O
> >>> accessors everywhere an attempt to be portable to other architectures?
> >>
> >> 'volatile' just doesn't really do what you think it should do.  The
> >> PowerPC architecture is too complicated w.r.t. ordering of reads and
> >> writes.  In other words, you can't trust it.
> >>
> >> No one should be using 'volatile' to access I/O registers.
> > 
> > See Documentation/volatile-considered-harmful.txt
> > 
> 
> I'm happy to adopt your interpretation of it, and I appreciate the explanation.
> 
> from Documentation/volatile-considered-harmful.txt:
> >   - The above-mentioned accessor functions might use volatile on
> >     architectures where direct I/O memory access does work.  Essentially,
> >     each accessor call becomes a little critical section on its own and
> >     ensures that the access happens as expected by the programmer.
> 
> Part of it was that I wondered if this was one of those architectures.  I guess not.

I guess this could only work on architectures having a totally ordered
memory model.  Definitely not the case for Power.

^ permalink raw reply

* Re: [PATCH 1/2] mm: add notifier in pageblock isolation for balloon drivers
From: Robert Jennings @ 2009-10-02 18:31 UTC (permalink / raw)
  To: Nathan Fontenot
  Cc: linux-mm, Mel Gorman, linux-kernel, linuxppc-dev,
	Martin Schwidefsky, Badari Pulavarty, Brian King, Paul Mackerras,
	Ingo Molnar
In-Reply-To: <4AC520B5.9080600@austin.ibm.com>

* Nathan Fontenot (nfont@austin.ibm.com) wrote:
> Robert Jennings wrote:
>> Memory balloon drivers can allocate a large amount of memory which
>> is not movable but could be freed to accommodate memory hotplug remove.
>>
>> Prior to calling the memory hotplug notifier chain the memory in the
>> pageblock is isolated.  If the migrate type is not MIGRATE_MOVABLE the
>> isolation will not proceed, causing the memory removal for that page
>> range to fail.
>>
>> Rather than immediately failing pageblock isolation if the the
>> migrateteype is not MIGRATE_MOVABLE, this patch checks if all of the
>> pages in the pageblock are owned by a registered balloon driver using a
>> notifier chain.  If all of the non-movable pages are owned by a balloon,
>> they can be freed later through the memory notifier chain and the range
>> can still be isolated in set_migratetype_isolate().
>>
>> Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
>>
>> ---
>>  drivers/base/memory.c  |   19 +++++++++++++++++++
>>  include/linux/memory.h |   22 ++++++++++++++++++++++
>>  mm/page_alloc.c        |   49 +++++++++++++++++++++++++++++++++++++++++--------
>>  3 files changed, 82 insertions(+), 8 deletions(-)
>>
<snip>
>> Index: b/mm/page_alloc.c
>> ===================================================================
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -48,6 +48,7 @@
>>  #include <linux/page_cgroup.h>
>>  #include <linux/debugobjects.h>
>>  #include <linux/kmemleak.h>
>> +#include <linux/memory.h>
>>  #include <trace/events/kmem.h>
>>   #include <asm/tlbflush.h>
>> @@ -4985,23 +4986,55 @@ void set_pageblock_flags_group(struct pa
>>  int set_migratetype_isolate(struct page *page)
>>  {
>>  	struct zone *zone;
>> -	unsigned long flags;
>> +	unsigned long flags, pfn, iter;
>> +	long immobile = 0;
>> +	struct memory_isolate_notify arg;
>> +	int notifier_ret;
>>  	int ret = -EBUSY;
>>  	int zone_idx;
>>   	zone = page_zone(page);
>>  	zone_idx = zone_idx(zone);
>> +
>> +	pfn = page_to_pfn(page);
>> +	arg.start_addr = (unsigned long)page_address(page);
>> +	arg.nr_pages = pageblock_nr_pages;
>> +	arg.pages_found = 0;
>> +
>>  	spin_lock_irqsave(&zone->lock, flags);
>>  	/*
>>  	 * In future, more migrate types will be able to be isolation target.
>>  	 */
>> -	if (get_pageblock_migratetype(page) != MIGRATE_MOVABLE &&
>> -	    zone_idx != ZONE_MOVABLE)
>> -		goto out;
>> -	set_pageblock_migratetype(page, MIGRATE_ISOLATE);
>> -	move_freepages_block(zone, page, MIGRATE_ISOLATE);
>> -	ret = 0;
>> -out:
>> +	do {
>> +		if (get_pageblock_migratetype(page) == MIGRATE_MOVABLE &&
>> +		    zone_idx == ZONE_MOVABLE) {
>> +			ret = 0;
>> +			break;
>> +		}
>> +
>> +		/*
>> +		 * If all of the pages in a zone are used by a balloon,
>> +		 * the range can be still be isolated.  The balloon will
>> +		 * free these pages from the memory notifier chain.
>> +		 */
>> +		notifier_ret = memory_isolate_notify(MEM_ISOLATE_COUNT, &arg);
>> +		notifier_ret = notifier_to_errno(ret);
>
> Should this be
>
> 		notifier_ret = notifier_to_errno(notifier_ret);
>
> -Nathan

I'll correct this.  Thanks

^ permalink raw reply

* Re: [PATCH 1/2] mm: add notifier in pageblock isolation for balloon drivers
From: Robert Jennings @ 2009-10-02 18:37 UTC (permalink / raw)
  To: Mel Gorman
  Cc: linux-mm, linux-kernel, linuxppc-dev, Martin Schwidefsky,
	Badari Pulavarty, Brian King, Paul Mackerras, Ingo Molnar
In-Reply-To: <20091002104425.GN21906@csn.ul.ie>

* Mel Gorman (mel@csn.ul.ie) wrote:
> On Thu, Oct 01, 2009 at 02:53:11PM -0500, Robert Jennings wrote:
> > Memory balloon drivers can allocate a large amount of memory which
> > is not movable but could be freed to accommodate memory hotplug remove.
> > 
> > Prior to calling the memory hotplug notifier chain the memory in the
> > pageblock is isolated.  If the migrate type is not MIGRATE_MOVABLE the
> > isolation will not proceed, causing the memory removal for that page
> > range to fail.
> > 
> > Rather than immediately failing pageblock isolation if the the
> > migrateteype is not MIGRATE_MOVABLE, this patch checks if all of the
> > pages in the pageblock are owned by a registered balloon driver using a
> > notifier chain.  If all of the non-movable pages are owned by a balloon,
> > they can be freed later through the memory notifier chain and the range
> > can still be isolated in set_migratetype_isolate().
> > 
> > Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
> > 
> > ---
> >  drivers/base/memory.c  |   19 +++++++++++++++++++
> >  include/linux/memory.h |   22 ++++++++++++++++++++++
> >  mm/page_alloc.c        |   49 +++++++++++++++++++++++++++++++++++++++++--------
> >  3 files changed, 82 insertions(+), 8 deletions(-)
> > 
> > Index: b/drivers/base/memory.c
> > ===================================================================
> > --- a/drivers/base/memory.c
> > +++ b/drivers/base/memory.c
> > @@ -63,6 +63,20 @@ void unregister_memory_notifier(struct n
> >  }
> >  EXPORT_SYMBOL(unregister_memory_notifier);
> >  
> > +static BLOCKING_NOTIFIER_HEAD(memory_isolate_chain);
> > +
> > +int register_memory_isolate_notifier(struct notifier_block *nb)
> > +{
> > +	return blocking_notifier_chain_register(&memory_isolate_chain, nb);
> > +}
> > +EXPORT_SYMBOL(register_memory_isolate_notifier);
> > +
> > +void unregister_memory_isolate_notifier(struct notifier_block *nb)
> > +{
> > +	blocking_notifier_chain_unregister(&memory_isolate_chain, nb);
> > +}
> > +EXPORT_SYMBOL(unregister_memory_isolate_notifier);
> > +
> >  /*
> >   * register_memory - Setup a sysfs device for a memory block
> >   */
> > @@ -157,6 +171,11 @@ int memory_notify(unsigned long val, voi
> >  	return blocking_notifier_call_chain(&memory_chain, val, v);
> >  }
> >  
> > +int memory_isolate_notify(unsigned long val, void *v)
> > +{
> > +	return blocking_notifier_call_chain(&memory_isolate_chain, val, v);
> > +}
> > +
> >  /*
> >   * MEMORY_HOTPLUG depends on SPARSEMEM in mm/Kconfig, so it is
> >   * OK to have direct references to sparsemem variables in here.
> > Index: b/include/linux/memory.h
> > ===================================================================
> > --- a/include/linux/memory.h
> > +++ b/include/linux/memory.h
> > @@ -50,6 +50,14 @@ struct memory_notify {
> >  	int status_change_nid;
> >  };
> >  
> > +#define MEM_ISOLATE_COUNT	(1<<0)
> > +
> 
> This needs a comment explaining that that this is an action to count the
> number of pages within a range that have been isolated within a range of
> pages and not a default value for "nr_pages" in the next structure.

I'll provide a clear explanation for this.

> > +struct memory_isolate_notify {
> > +	unsigned long start_addr;
> > +	unsigned int nr_pages;
> > +	unsigned int pages_found;
> > +};
> 
> Is there any particular reason you used virtual address of the mapped
> page instead of PFN? I am guessing at this point that the balloon driver
> is based on addresses but the code that populates the structure more
> commonly deals with PFNs. Outside of debugging code, page_address is
> rarely used in mm/page_alloc.c .
> 
> It's picky but it feels more natural to me to have the structure have
> start_pfn and nr_pages or start_addr and end_addr but not a mix of both.

Changing this to use start_pfn and nr_pages, this will also match
struct memory_notify.  Thanks for the review of this patch.

> > +
> >  struct notifier_block;
> >  struct mem_section;
> >  
> > @@ -76,14 +84,28 @@ static inline int memory_notify(unsigned
> >  {
> >  	return 0;
> >  }
> > +static inline int register_memory_isolate_notifier(struct notifier_block *nb)
> > +{
> > +	return 0;
> > +}
> > +static inline void unregister_memory_isolate_notifier(struct notifier_block *nb)
> > +{
> > +}
> > +static inline int memory_isolate_notify(unsigned long val, void *v)
> > +{
> > +	return 0;
> > +}
> >  #else
> >  extern int register_memory_notifier(struct notifier_block *nb);
> >  extern void unregister_memory_notifier(struct notifier_block *nb);
> > +extern int register_memory_isolate_notifier(struct notifier_block *nb);
> > +extern void unregister_memory_isolate_notifier(struct notifier_block *nb);
> >  extern int register_new_memory(int, struct mem_section *);
> >  extern int unregister_memory_section(struct mem_section *);
> >  extern int memory_dev_init(void);
> >  extern int remove_memory_block(unsigned long, struct mem_section *, int);
> >  extern int memory_notify(unsigned long val, void *v);
> > +extern int memory_isolate_notify(unsigned long val, void *v);
> >  extern struct memory_block *find_memory_block(struct mem_section *);
> >  #define CONFIG_MEM_BLOCK_SIZE	(PAGES_PER_SECTION<<PAGE_SHIFT)
> >  enum mem_add_context { BOOT, HOTPLUG };
> > Index: b/mm/page_alloc.c
> > ===================================================================
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -48,6 +48,7 @@
> >  #include <linux/page_cgroup.h>
> >  #include <linux/debugobjects.h>
> >  #include <linux/kmemleak.h>
> > +#include <linux/memory.h>
> >  #include <trace/events/kmem.h>
> >  
> >  #include <asm/tlbflush.h>
> > @@ -4985,23 +4986,55 @@ void set_pageblock_flags_group(struct pa
> >  int set_migratetype_isolate(struct page *page)
> >  {
> >  	struct zone *zone;
> > -	unsigned long flags;
> > +	unsigned long flags, pfn, iter;
> > +	long immobile = 0;
> 
> So, the count in the structure is unsigned long, but long here. Why the
> difference in types?

No good reason, both will be unsigned long when I repost the patch.

> > +	struct memory_isolate_notify arg;
> > +	int notifier_ret;
> >  	int ret = -EBUSY;
> >  	int zone_idx;
> >  
> >  	zone = page_zone(page);
> >  	zone_idx = zone_idx(zone);
> > +
> > +	pfn = page_to_pfn(page);
> > +	arg.start_addr = (unsigned long)page_address(page);
> > +	arg.nr_pages = pageblock_nr_pages;
> > +	arg.pages_found = 0;
> > +
> >  	spin_lock_irqsave(&zone->lock, flags);
> >  	/*
> >  	 * In future, more migrate types will be able to be isolation target.
> >  	 */
> > -	if (get_pageblock_migratetype(page) != MIGRATE_MOVABLE &&
> > -	    zone_idx != ZONE_MOVABLE)
> > -		goto out;
> > -	set_pageblock_migratetype(page, MIGRATE_ISOLATE);
> > -	move_freepages_block(zone, page, MIGRATE_ISOLATE);
> > -	ret = 0;
> > -out:
> > +	do {
> > +		if (get_pageblock_migratetype(page) == MIGRATE_MOVABLE &&
> > +		    zone_idx == ZONE_MOVABLE) {
> 
> So, this condition requires the zone be MOVABLE and the migrate type
> be movable. That prevents MIGRATE_RESERVE regions in ZONE_MOVABLE being
> off-lined even though they can likely be off-lined. It also prevents
> MIGRATE_MOVABLE sections in other zones being off-lined.
> 
> Did you mean || here instead of && ?
> 
> Might want to expand the comment explaining this condition instead of
> leaving it in the old location which is confusing.

I will fix the logic and clean up the comments here.

> > +			ret = 0;
> > +			break;
> > +		}
> 
> Why do you wrap all this in a do {} while(0)  instead of preserving the
> out: label and using goto?

I've put this back to how it was.

> > +
> > +		/*
> > +		 * If all of the pages in a zone are used by a balloon,
> > +		 * the range can be still be isolated.  The balloon will
> > +		 * free these pages from the memory notifier chain.
> > +		 */
> > +		notifier_ret = memory_isolate_notify(MEM_ISOLATE_COUNT, &arg);
> > +		notifier_ret = notifier_to_errno(ret);
> > +		if (notifier_ret || !arg.pages_found)
> > +			break;
> > +
> > +		for (iter = pfn; iter < (pfn + pageblock_nr_pages); iter++)
> > +			if (page_count(pfn_to_page(iter)))
> > +				immobile++;
> > +
> > +		if (arg.pages_found == immobile)
> 
> and here you compare a signed with an unsigned type. Probably harmless
> but why do it?

This is corrected by making both variables unsigned longs.

> > +			ret = 0;
> > +	} while (0);
> > +
> 
> So the out label would go here and you'd get rid of the do {} while(0)
> loop.

Fixed.

> > +	if (!ret) {
> > +		set_pageblock_migratetype(page, MIGRATE_ISOLATE);
> > +		move_freepages_block(zone, page, MIGRATE_ISOLATE);
> > +	}
> > +
> >  	spin_unlock_irqrestore(&zone->lock, flags);
> >  	if (!ret)
> >  		drain_all_pages();

^ permalink raw reply

* [PATCH 1/2][v2] mm: add notifier in pageblock isolation for balloon drivers
From: Robert Jennings @ 2009-10-02 18:44 UTC (permalink / raw)
  To: Mel Gorman
  Cc: linux-mm, linux-kernel, linuxppc-dev, Martin Schwidefsky,
	Badari Pulavarty, Brian King, Paul Mackerras, Ingo Molnar

Memory balloon drivers can allocate a large amount of memory which
is not movable but could be freed to accomodate memory hotplug remove.

Prior to calling the memory hotplug notifier chain the memory in the
pageblock is isolated.  If the migrate type is not MIGRATE_MOVABLE the
isolation will not proceed, causing the memory removal for that page
range to fail.

Rather than failing pageblock isolation if the the migrateteype is not
MIGRATE_MOVABLE, this patch checks if all of the pages in the pageblock
are owned by a registered balloon driver (or other entity) using a
notifier chain.  If all of the non-movable pages are owned by a balloon,
they can be freed later through the memory notifier chain and the range
can still be isolated in set_migratetype_isolate().

Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>

---
 drivers/base/memory.c  |   19 +++++++++++++++++++
 include/linux/memory.h |   26 ++++++++++++++++++++++++++
 mm/page_alloc.c        |   45 ++++++++++++++++++++++++++++++++++++++-------
 3 files changed, 83 insertions(+), 7 deletions(-)

Index: b/drivers/base/memory.c
===================================================================
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -63,6 +63,20 @@ void unregister_memory_notifier(struct n
 }
 EXPORT_SYMBOL(unregister_memory_notifier);
 
+static BLOCKING_NOTIFIER_HEAD(memory_isolate_chain);
+
+int register_memory_isolate_notifier(struct notifier_block *nb)
+{
+	return blocking_notifier_chain_register(&memory_isolate_chain, nb);
+}
+EXPORT_SYMBOL(register_memory_isolate_notifier);
+
+void unregister_memory_isolate_notifier(struct notifier_block *nb)
+{
+	blocking_notifier_chain_unregister(&memory_isolate_chain, nb);
+}
+EXPORT_SYMBOL(unregister_memory_isolate_notifier);
+
 /*
  * register_memory - Setup a sysfs device for a memory block
  */
@@ -157,6 +171,11 @@ int memory_notify(unsigned long val, voi
 	return blocking_notifier_call_chain(&memory_chain, val, v);
 }
 
+int memory_isolate_notify(unsigned long val, void *v)
+{
+	return blocking_notifier_call_chain(&memory_isolate_chain, val, v);
+}
+
 /*
  * MEMORY_HOTPLUG depends on SPARSEMEM in mm/Kconfig, so it is
  * OK to have direct references to sparsemem variables in here.
Index: b/include/linux/memory.h
===================================================================
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -50,6 +50,18 @@ struct memory_notify {
 	int status_change_nid;
 };
 
+/*
+ * During pageblock isolation, count the number of pages in the
+ * range [start_pfn, start_pfn + nr_pages)
+ */
+#define MEM_ISOLATE_COUNT	(1<<0)
+
+struct memory_isolate_notify {
+	unsigned long start_pfn;
+	unsigned int nr_pages;
+	unsigned int pages_found;
+};
+
 struct notifier_block;
 struct mem_section;
 
@@ -76,14 +88,28 @@ static inline int memory_notify(unsigned
 {
 	return 0;
 }
+static inline int register_memory_isolate_notifier(struct notifier_block *nb)
+{
+	return 0;
+}
+static inline void unregister_memory_isolate_notifier(struct notifier_block *nb)
+{
+}
+static inline int memory_isolate_notify(unsigned long val, void *v)
+{
+	return 0;
+}
 #else
 extern int register_memory_notifier(struct notifier_block *nb);
 extern void unregister_memory_notifier(struct notifier_block *nb);
+extern int register_memory_isolate_notifier(struct notifier_block *nb);
+extern void unregister_memory_isolate_notifier(struct notifier_block *nb);
 extern int register_new_memory(int, struct mem_section *);
 extern int unregister_memory_section(struct mem_section *);
 extern int memory_dev_init(void);
 extern int remove_memory_block(unsigned long, struct mem_section *, int);
 extern int memory_notify(unsigned long val, void *v);
+extern int memory_isolate_notify(unsigned long val, void *v);
 extern struct memory_block *find_memory_block(struct mem_section *);
 #define CONFIG_MEM_BLOCK_SIZE	(PAGES_PER_SECTION<<PAGE_SHIFT)
 enum mem_add_context { BOOT, HOTPLUG };
Index: b/mm/page_alloc.c
===================================================================
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -48,6 +48,7 @@
 #include <linux/page_cgroup.h>
 #include <linux/debugobjects.h>
 #include <linux/kmemleak.h>
+#include <linux/memory.h>
 #include <trace/events/kmem.h>
 
 #include <asm/tlbflush.h>
@@ -4985,23 +4986,53 @@ void set_pageblock_flags_group(struct pa
 int set_migratetype_isolate(struct page *page)
 {
 	struct zone *zone;
-	unsigned long flags;
+	unsigned long flags, pfn, iter;
+	unsigned long immobile = 0;
+	struct memory_isolate_notify arg;
+	int notifier_ret;
 	int ret = -EBUSY;
 	int zone_idx;
 
 	zone = page_zone(page);
 	zone_idx = zone_idx(zone);
+
 	spin_lock_irqsave(&zone->lock, flags);
+	if (get_pageblock_migratetype(page) == MIGRATE_MOVABLE ||
+	    zone_idx == ZONE_MOVABLE) {
+		ret = 0;
+		goto out;
+	}
+
+	pfn = page_to_pfn(page);
+	arg.start_pfn = pfn;
+	arg.nr_pages = pageblock_nr_pages;
+	arg.pages_found = 0;
+
 	/*
-	 * In future, more migrate types will be able to be isolation target.
+	 * The pageblock can be isolated even if the migrate type is
+	 * not *_MOVABLE.  The memory isolation notifier chain counts
+	 * the number of pages in this pageblock that can be freed later
+	 * through the memory notifier chain.  If all of the pages are
+	 * accounted for, isolation can continue.
 	 */
-	if (get_pageblock_migratetype(page) != MIGRATE_MOVABLE &&
-	    zone_idx != ZONE_MOVABLE)
+	notifier_ret = memory_isolate_notify(MEM_ISOLATE_COUNT, &arg);
+	notifier_ret = notifier_to_errno(notifier_ret);
+       	if (notifier_ret || !arg.pages_found)
 		goto out;
-	set_pageblock_migratetype(page, MIGRATE_ISOLATE);
-	move_freepages_block(zone, page, MIGRATE_ISOLATE);
-	ret = 0;
+
+	for (iter = pfn; iter < (pfn + pageblock_nr_pages); iter++)
+		if (page_count(pfn_to_page(iter)))
+			immobile++;
+
+	if (arg.pages_found == immobile)
+		ret = 0;
+
 out:
+	if (!ret) {
+		set_pageblock_migratetype(page, MIGRATE_ISOLATE);
+		move_freepages_block(zone, page, MIGRATE_ISOLATE);
+	}
+
 	spin_unlock_irqrestore(&zone->lock, flags);
 	if (!ret)
 		drain_all_pages();

^ permalink raw reply

* [PATCH 2/2][v2] powerpc: Make the CMM memory hotplug aware
From: Robert Jennings @ 2009-10-02 18:52 UTC (permalink / raw)
  To: Mel Gorman, Ingo Molnar, Badari Pulavarty, Brian King,
	Benjamin Herrenschmidt, Paul Mackerras, Martin Schwidefsky,
	linux-kernel, linux-mm, linuxppc-dev
In-Reply-To: <20091002184458.GC4908@austin.ibm.com>


The Collaborative Memory Manager (CMM) module allocates individual pages
over time that are not migratable.  On a long running system this can
severely impact the ability to find enough pages to support a hotplug
memory remove operation.

This patch adds a memory isolation notifier and a memory hotplug notifier.
The memory isolation notifier will return the number of pages found
in the range specified.  This is used to determine if all of the used
pages in a pageblock are owned by the balloon (or other entities in
the notifier chain).  The hotplug notifier will free pages in the range
which is to be removed.  The priority of this hotplug notifier is low
so that it will be called near last, this helps avoids removing loaned
pages in operations that fail due to other handlers.

CMM activity will be halted when hotplug remove operations are active
and resume activity after a delay period to allow the hypervisor time
to adjust.

Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>

---
Minor update to cmm_count_pages() to account for changes in
struct memory_isolate_notify.
---
 arch/powerpc/platforms/pseries/cmm.c |  207 ++++++++++++++++++++++++++++++++++-
 1 file changed, 201 insertions(+), 6 deletions(-)

Index: b/arch/powerpc/platforms/pseries/cmm.c
===================================================================
--- a/arch/powerpc/platforms/pseries/cmm.c
+++ b/arch/powerpc/platforms/pseries/cmm.c
@@ -38,19 +38,28 @@
 #include <asm/mmu.h>
 #include <asm/pgalloc.h>
 #include <asm/uaccess.h>
+#include <linux/memory.h>
 
 #include "plpar_wrappers.h"
 
 #define CMM_DRIVER_VERSION	"1.0.0"
 #define CMM_DEFAULT_DELAY	1
+#define CMM_HOTPLUG_DELAY	5
 #define CMM_DEBUG			0
 #define CMM_DISABLE		0
 #define CMM_OOM_KB		1024
 #define CMM_MIN_MEM_MB		256
 #define KB2PAGES(_p)		((_p)>>(PAGE_SHIFT-10))
 #define PAGES2KB(_p)		((_p)<<(PAGE_SHIFT-10))
+/*
+ * The priority level tries to ensure that this notifier is called as
+ * late as possible to reduce thrashing in the shared memory pool.
+ */
+#define CMM_MEM_HOTPLUG_PRI	1
+#define CMM_MEM_ISOLATE_PRI	15
 
 static unsigned int delay = CMM_DEFAULT_DELAY;
+static unsigned int hotplug_delay = CMM_HOTPLUG_DELAY;
 static unsigned int oom_kb = CMM_OOM_KB;
 static unsigned int cmm_debug = CMM_DEBUG;
 static unsigned int cmm_disabled = CMM_DISABLE;
@@ -65,6 +74,10 @@ MODULE_VERSION(CMM_DRIVER_VERSION);
 module_param_named(delay, delay, uint, S_IRUGO | S_IWUSR);
 MODULE_PARM_DESC(delay, "Delay (in seconds) between polls to query hypervisor paging requests. "
 		 "[Default=" __stringify(CMM_DEFAULT_DELAY) "]");
+module_param_named(hotplug_delay, hotplug_delay, uint, S_IRUGO | S_IWUSR);
+MODULE_PARM_DESC(delay, "Delay (in seconds) after memory hotplug remove "
+		 "before activity resumes. "
+		 "[Default=" __stringify(CMM_HOTPLUG_DELAY) "]");
 module_param_named(oom_kb, oom_kb, uint, S_IRUGO | S_IWUSR);
 MODULE_PARM_DESC(oom_kb, "Amount of memory in kb to free on OOM. "
 		 "[Default=" __stringify(CMM_OOM_KB) "]");
@@ -88,6 +101,8 @@ struct cmm_page_array {
 static unsigned long loaned_pages;
 static unsigned long loaned_pages_target;
 static unsigned long oom_freed_pages;
+static atomic_t hotplug_active = ATOMIC_INIT(0);
+static atomic_t hotplug_occurred = ATOMIC_INIT(0);
 
 static struct cmm_page_array *cmm_page_list;
 static DEFINE_SPINLOCK(cmm_lock);
@@ -110,6 +125,9 @@ static long cmm_alloc_pages(long nr)
 	cmm_dbg("Begin request for %ld pages\n", nr);
 
 	while (nr) {
+		if (atomic_read(&hotplug_active))
+			break;
+
 		addr = __get_free_page(GFP_NOIO | __GFP_NOWARN |
 				       __GFP_NORETRY | __GFP_NOMEMALLOC);
 		if (!addr)
@@ -119,8 +137,10 @@ static long cmm_alloc_pages(long nr)
 		if (!pa || pa->index >= CMM_NR_PAGES) {
 			/* Need a new page for the page list. */
 			spin_unlock(&cmm_lock);
-			npa = (struct cmm_page_array *)__get_free_page(GFP_NOIO | __GFP_NOWARN |
-								       __GFP_NORETRY | __GFP_NOMEMALLOC);
+			npa = (struct cmm_page_array *)__get_free_page(
+					GFP_NOIO | __GFP_NOWARN |
+					__GFP_NORETRY | __GFP_NOMEMALLOC |
+					__GFP_MOVABLE);
 			if (!npa) {
 				pr_info("%s: Can not allocate new page list\n", __func__);
 				free_page(addr);
@@ -273,9 +293,23 @@ static int cmm_thread(void *dummy)
 	while (1) {
 		timeleft = msleep_interruptible(delay * 1000);
 
-		if (kthread_should_stop() || timeleft) {
-			loaned_pages_target = loaned_pages;
+		if (kthread_should_stop() || timeleft)
 			break;
+
+		if (atomic_read(&hotplug_active)) {
+			cmm_dbg("Hotplug operation in progress, activity "
+					"suspended\n");
+			continue;
+		}
+
+		if (atomic_dec_if_positive(&hotplug_occurred) >= 0) {
+			cmm_dbg("Hotplug operation has occurred, loaning "
+					"activity suspended for %d seconds.\n",
+					hotplug_delay);
+			timeleft = msleep_interruptible(hotplug_delay * 1000);
+			if (kthread_should_stop() || timeleft)
+				break;
+			continue;
 		}
 
 		cmm_get_mpp();
@@ -405,6 +439,159 @@ static struct notifier_block cmm_reboot_
 };
 
 /**
+ * cmm_count_pages - Count the number of pages loaned in a particular range.
+ *
+ * @arg: memory_isolate_notify structure with address range and count
+ *
+ * Return value:
+ *      0 on success
+ **/
+static unsigned long cmm_count_pages(void *arg)
+{
+	struct memory_isolate_notify *marg = arg;
+	struct cmm_page_array *pa;
+	unsigned long start = (unsigned long)pfn_to_kaddr(marg->start_pfn);
+	unsigned long end = start + (marg->nr_pages << PAGE_SHIFT);
+	unsigned long idx;
+
+	spin_lock(&cmm_lock);
+	pa = cmm_page_list;
+	while (pa) {
+		for (idx = 0; idx < pa->index; idx++)
+			if (pa->page[idx] >= start && pa->page[idx] < end)
+				marg->pages_found++;
+		pa = pa->next;
+	}
+	spin_unlock(&cmm_lock);
+	return 0;
+}
+
+/**
+ * cmm_memory_isolate_cb - Handle memory isolation notifier calls
+ * @self:	notifier block struct
+ * @action:	action to take
+ * @arg:	struct memory_isolate_notify data for handler
+ *
+ * Return value:
+ *	NOTIFY_OK or notifier error based on subfunction return value
+ **/
+static int cmm_memory_isolate_cb(struct notifier_block *self,
+				 unsigned long action, void *arg)
+{
+	int ret = 0;
+
+	if (action == MEM_ISOLATE_COUNT)
+		ret = cmm_count_pages(arg);
+
+	if (ret)
+		ret = notifier_from_errno(ret);
+	else
+		ret = NOTIFY_OK;
+
+	return ret;
+}
+
+static struct notifier_block cmm_mem_isolate_nb = {
+	.notifier_call = cmm_memory_isolate_cb,
+	.priority = CMM_MEM_ISOLATE_PRI
+};
+
+/**
+ * cmm_mem_going_offline - Unloan pages where memory is to be removed
+ * @arg: memory_notify structure with page range to be offlined
+ *
+ * Return value:
+ *	0 on success
+ **/
+static int cmm_mem_going_offline(void *arg)
+{
+	struct memory_notify *marg = arg;
+	unsigned long start_page = (unsigned long)pfn_to_kaddr(marg->start_pfn);
+	unsigned long end_page = start_page + (marg->nr_pages << PAGE_SHIFT);
+	struct cmm_page_array *pa_curr, *pa_last;
+	unsigned long idx;
+	unsigned long freed = 0;
+
+	cmm_dbg("Memory going offline, searching 0x%lx (%ld pages).\n",
+			start_page, marg->nr_pages);
+	spin_lock(&cmm_lock);
+
+	pa_last = pa_curr = cmm_page_list;
+	while (pa_curr) {
+		for (idx = (pa_curr->index - 1); (idx + 1) > 0; idx--) {
+			if ((pa_curr->page[idx] < start_page) ||
+			    (pa_curr->page[idx] >= end_page))
+				continue;
+
+			plpar_page_set_active(__pa(pa_curr->page[idx]));
+			free_page(pa_curr->page[idx]);
+			freed++;
+			loaned_pages--;
+			totalram_pages++;
+			pa_curr->page[idx] = pa_last->page[--pa_last->index];
+			if (pa_last->index == 0) {
+				if (pa_curr == pa_last)
+					pa_curr = pa_last->next;
+				pa_last = pa_last->next;
+				free_page((unsigned long)cmm_page_list);
+				cmm_page_list = pa_last;
+				continue;
+			}
+		}
+		pa_curr = pa_curr->next;
+	}
+	atomic_set(&hotplug_occurred, 1);
+	spin_unlock(&cmm_lock);
+	cmm_dbg("Released %ld pages in the search range.\n", freed);
+
+	return 0;
+}
+
+/**
+ * cmm_memory_cb - Handle memory hotplug notifier calls
+ * @self:	notifier block struct
+ * @action:	action to take
+ * @arg:	struct memory_notify data for handler
+ *
+ * Return value:
+ *	NOTIFY_OK or notifier error based on subfunction return value
+ *
+ **/
+static int cmm_memory_cb(struct notifier_block *self,
+			unsigned long action, void *arg)
+{
+	int ret = 0;
+
+	switch (action) {
+	case MEM_GOING_OFFLINE:
+		atomic_set(&hotplug_active, 1);
+		ret = cmm_mem_going_offline(arg);
+		break;
+	case MEM_OFFLINE:
+	case MEM_CANCEL_OFFLINE:
+		atomic_set(&hotplug_active, 0);
+		cmm_dbg("Memory offline operation complete.\n");
+		break;
+	case MEM_GOING_ONLINE:
+	case MEM_ONLINE:
+	case MEM_CANCEL_ONLINE:
+		break;
+	}
+
+	if (ret)
+		ret = notifier_from_errno(ret);
+	else
+		ret = NOTIFY_OK;
+
+	return ret;
+}
+
+static struct notifier_block cmm_mem_nb = {
+	.notifier_call = cmm_memory_cb,
+	.priority = CMM_MEM_HOTPLUG_PRI
+};
+
+/**
  * cmm_init - Module initialization
  *
  * Return value:
@@ -426,18 +613,24 @@ static int cmm_init(void)
 	if ((rc = cmm_sysfs_register(&cmm_sysdev)))
 		goto out_reboot_notifier;
 
+	if (register_memory_notifier(&cmm_mem_nb) ||
+	    register_memory_isolate_notifier(&cmm_mem_isolate_nb))
+		goto out_unregister_notifier;
+
 	if (cmm_disabled)
 		return rc;
 
 	cmm_thread_ptr = kthread_run(cmm_thread, NULL, "cmmthread");
 	if (IS_ERR(cmm_thread_ptr)) {
 		rc = PTR_ERR(cmm_thread_ptr);
-		goto out_unregister_sysfs;
+		goto out_unregister_notifier;
 	}
 
 	return rc;
 
-out_unregister_sysfs:
+out_unregister_notifier:
+	unregister_memory_notifier(&cmm_mem_nb);
+	unregister_memory_isolate_notifier(&cmm_mem_isolate_nb);
 	cmm_unregister_sysfs(&cmm_sysdev);
 out_reboot_notifier:
 	unregister_reboot_notifier(&cmm_reboot_nb);
@@ -458,6 +651,8 @@ static void cmm_exit(void)
 		kthread_stop(cmm_thread_ptr);
 	unregister_oom_notifier(&cmm_oom_nb);
 	unregister_reboot_notifier(&cmm_reboot_nb);
+	unregister_memory_notifier(&cmm_mem_nb);
+	unregister_memory_isolate_notifier(&cmm_mem_isolate_nb);
 	cmm_free_pages(loaned_pages);
 	cmm_unregister_sysfs(&cmm_sysdev);
 }

^ permalink raw reply

* Re: [patch 1/3] powerpc: Move 64bit VDSO to improve context switch performance
From: Andreas Schwab @ 2009-10-02 19:14 UTC (permalink / raw)
  To: Anton Blanchard; +Cc: linuxppc-dev
In-Reply-To: <20090714065425.301516312@samba.org>

Anton Blanchard <anton@samba.org> writes:

> On 64bit applications the VDSO is the only thing in segment 0. Since the VDSO
> is position independent we can remove the hint and let get_unmapped_area pick
> an area.

This breaks gdb.  The section table in the VDSO image when mapped into
the process no longer contains meaningful values, and gdb rejects it.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

^ permalink raw reply

* Re: [PATCH] powerpc/8xx: fix regression introduced by cache coherency rewrite
From: Scott Wood @ 2009-10-02 21:49 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev@ozlabs.org, Rex Feany
In-Reply-To: <1254350159.5699.21.camel@pasglop>

On Thu, Oct 01, 2009 at 08:35:59AM +1000, Benjamin Herrenschmidt wrote:
> >From what I can see, the TLB miss code will check _PAGE_PRESENT, and
> when not set, it will -still- insert something into the TLB (unlike
> all other CPU types that go straight to data access faults from there).
> 
> So we end up populating with those unpopulated entries that will then
> cause us to take a DSI (or ISI) instead of a TLB miss the next time
> around ... and so we need to remove them once we do that, no ? IE. Once
> we have actually put a valid PTE in.
> 
> At least that's my understanding and why I believe we should always have
> tlbil_va() in set_pte() and ptep_set_access_flags(), which boils down
> in putting it into the 2 "filter" functions in the new code.
> 
> Well.. actually, the ptep_set_access_flags() will already do a
> flush_tlb_page_nohash() when the PTE is changed. So I suppose all we
> really need here would be in set_pte_filter(), to do the same if we are
> writing a PTE that has _PAGE_PRESENT, at least on 8xx.
> 
> But just unconditionally doing a tlbil_va() in both the filter functions
> shouldn't harm and also fix the problem, though Rex seems to indicate
> that is not the case.

Adding a tlbil_va to do_page_fault makes the problem go away for me (on
top of your "merge" branch) -- none of the other changes in this thread
do (assuming I didn't miss any).  FWIW, when it gets stuck on a fault,
DSISR is 0xc0000000, and handle_mm_fault returns zero.

-Scott

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox