LinuxPPC-Dev Archive on lore.kernel.org

* [PATCH] slub: Don't throw away partial remote slabs if there is no local memory
From: Anton Blanchard @ 2014-01-07  2:21 UTC (permalink / raw)
  To: benh, paulus, cl, penberg, mpm, nacc; +Cc: linux-mm, linuxppc-dev

We noticed a huge amount of slab memory consumed on a large ppc64 box:

Slab:            2094336 kB

Almost 2GB. This box is not balanced and some nodes do not have local
memory, causing slub to be very inefficient in its slab usage.

Each time we call kmem_cache_alloc_node slub checks the per cpu slab,
sees it isn't node local, deactivates it and tries to allocate a new
slab. On empty nodes we will allocate a new remote slab and use the
first slot, but as explained above when we get called a second time
we will just deactivate that slab and retry.

As such we end up only using 1 entry in each slab:

slab                    mem  objects
                       used   active
------------------------------------
kmalloc-16384       1404 MB    4.90%
task_struct          668 MB    2.90%
kmalloc-128          193 MB    3.61%
kmalloc-192          152 MB    5.23%
kmalloc-8192          72 MB   23.40%
kmalloc-16            64 MB    7.43%
kmalloc-512           33 MB   22.41%
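
The behaviour described above can be sketched as a small user-space model. This is only an illustration under simplifying assumptions, not kernel code: the name model_slabs_used() and its parameters are hypothetical, there is a single CPU, and partial lists are ignored, so a deactivated slab is never picked up again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy model (hypothetical, not kernel code): count how many slabs one CPU
 * consumes for `nallocs` allocations that all request the same node.
 */
static size_t model_slabs_used(size_t nallocs, size_t objs_per_slab,
			       bool node_has_memory, bool patched)
{
	size_t slabs = 0;
	size_t free_objs = 0;
	bool have_slab = false;
	bool slab_is_local = false;

	assert(objs_per_slab > 0);

	for (size_t i = 0; i < nallocs; i++) {
		/* Node mismatch: the per-cpu slab is not on the requested node. */
		if (have_slab && !slab_is_local) {
			/* Patched: deactivate only if a local slab is possible. */
			if (!patched || node_has_memory)
				have_slab = false;
		}

		if (!have_slab || free_objs == 0) {
			/* New slab; a memoryless node can only get a remote one. */
			slabs++;
			free_objs = objs_per_slab;
			have_slab = true;
			slab_is_local = node_has_memory;
		}

		free_objs--;	/* hand out one object */
	}

	return slabs;
}
```

In this model, 1000 allocations with 32 objects per slab on a memoryless node consume 1000 slabs without the check and 32 slabs with it.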

The patch below checks that a node is not empty before deactivating a
slab and trying to allocate it again. With this patch applied we now
use about 352MB:

Slab:             360192 kB

And our efficiency is much better:

slab                    mem  objects
                       used   active
------------------------------------
kmalloc-16384         92 MB   74.27%
task_struct           23 MB   83.46%
idr_layer_cache       18 MB  100.00%
pgtable-2^12          17 MB  100.00%
kmalloc-65536         15 MB  100.00%
inode_cache           14 MB  100.00%
kmalloc-256           14 MB   97.81%
kmalloc-8192          14 MB   85.71%

Signed-off-by: Anton Blanchard <anton@samba.org>
---

Thoughts? It seems like we could hit a similar situation if a machine
is balanced but we run out of memory on a single node.

Index: b/mm/slub.c
===================================================================
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2278,10 +2278,17 @@ redo:

 	if (unlikely(!node_match(page, node))) {
 		stat(s, ALLOC_NODE_MISMATCH);
-		deactivate_slab(s, page, c->freelist);
-		c->page = NULL;
-		c->freelist = NULL;
-		goto new_slab;
+
+		/*
+		 * If the node contains no memory there is no point in trying
+		 * to allocate a new node local slab
+		 */
+		if (node_spanned_pages(node)) {
+			deactivate_slab(s, page, c->freelist);
+			c->page = NULL;
+			c->freelist = NULL;
+			goto new_slab;
+		}
 	}

 	/*

* Re: [PATCH] slub: Don't throw away partial remote slabs if there is no local memory
From: Wanpeng Li @ 2014-01-07  4:19 UTC (permalink / raw)
  To: Anton Blanchard; +Cc: cl, nacc, penberg, linux-mm, paulus, mpm, linuxppc-dev
In-Reply-To: <20140107132100.5b5ad198@kryten>

On Tue, Jan 07, 2014 at 01:21:00PM +1100, Anton Blanchard wrote:
>
>We noticed a huge amount of slab memory consumed on a large ppc64 box:
>
>Slab:            2094336 kB
>
>Almost 2GB. This box is not balanced and some nodes do not have local
>memory, causing slub to be very inefficient in its slab usage.
>
>Each time we call kmem_cache_alloc_node slub checks the per cpu slab,
>sees it isn't node local, deactivates it and tries to allocate a new
>slab. On empty nodes we will allocate a new remote slab and use the
>first slot, but as explained above when we get called a second time
>we will just deactivate that slab and retry.
>
>As such we end up only using 1 entry in each slab:
>
>slab                    mem  objects
>                       used   active
>------------------------------------
>kmalloc-16384       1404 MB    4.90%
>task_struct          668 MB    2.90%
>kmalloc-128          193 MB    3.61%
>kmalloc-192          152 MB    5.23%
>kmalloc-8192          72 MB   23.40%
>kmalloc-16            64 MB    7.43%
>kmalloc-512           33 MB   22.41%
>
>The patch below checks that a node is not empty before deactivating a
>slab and trying to allocate it again. With this patch applied we now
>use about 352MB:
>
>Slab:             360192 kB
>
>And our efficiency is much better:
>
>slab                    mem  objects
>                       used   active
>------------------------------------
>kmalloc-16384         92 MB   74.27%
>task_struct           23 MB   83.46%
>idr_layer_cache       18 MB  100.00%
>pgtable-2^12          17 MB  100.00%
>kmalloc-65536         15 MB  100.00%
>inode_cache           14 MB  100.00%
>kmalloc-256           14 MB   97.81%
>kmalloc-8192          14 MB   85.71%
>
>Signed-off-by: Anton Blanchard <anton@samba.org>

Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>

>---
>
>Thoughts? It seems like we could hit a similar situation if a machine
>is balanced but we run out of memory on a single node.
>
>Index: b/mm/slub.c
>===================================================================
>--- a/mm/slub.c
>+++ b/mm/slub.c
>@@ -2278,10 +2278,17 @@ redo:
>
> 	if (unlikely(!node_match(page, node))) {
> 		stat(s, ALLOC_NODE_MISMATCH);
>-		deactivate_slab(s, page, c->freelist);
>-		c->page = NULL;
>-		c->freelist = NULL;
>-		goto new_slab;
>+
>+		/*
>+		 * If the node contains no memory there is no point in trying
>+		 * to allocate a new node local slab
>+		 */
>+		if (node_spanned_pages(node)) {

s/node_spanned_pages/node_present_pages 

>+			deactivate_slab(s, page, c->freelist);
>+			c->page = NULL;
>+			c->freelist = NULL;
>+			goto new_slab;
>+		}
> 	}
>
> 	/*
>

* [PATCH] powerpc/mpic: supply a .disable callback
From: Dongsheng Wang @ 2014-01-07  5:38 UTC (permalink / raw)
  To: scottwood, benh; +Cc: linuxppc-dev, Wang Dongsheng

From: Wang Dongsheng <dongsheng.wang@freescale.com>

Currently MPIC provides .mask, but not .disable.  This means that
effectively disable_irq() soft-disables the interrupt, and you get
a .mask call if an interrupt actually occurs.

I'm not sure if this was intended as a performance benefit (it seems common
to omit .disable on powerpc interrupt controllers, but nowhere else), but it
interacts badly with threaded/workqueue interrupts (including KVM
reflection).  In such cases, where the real interrupt handler does a
disable_irq_nosync(), schedules deferred handling, and returns, we get two
interrupts for every real interrupt.  The second interrupt does nothing
but see that IRQ_DISABLED is set, and decide that it would be a good
idea to actually call .mask.
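
The double interrupt described above can be sketched with a small user-space model. It is a simplification under stated assumptions, not the genirq implementation: all model_* names are hypothetical, the source is level-triggered, and the device keeps its line asserted until the deferred work runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_irq;

struct model_chip {
	void (*irq_mask)(struct model_irq *irq);
	void (*irq_disable)(struct model_irq *irq);	/* may be NULL */
};

struct model_irq {
	const struct model_chip *chip;
	bool disabled;			/* software "disabled" state */
	bool hw_masked;			/* mask bit in the controller */
	bool line_asserted;		/* level-triggered device line */
	unsigned int hw_entries;	/* times the CPU took the interrupt */
	unsigned int handler_calls;	/* times the real handler ran */
};

static void model_mask(struct model_irq *irq)
{
	irq->hw_masked = true;
}

/* Without .irq_disable the disable is lazy: only the software flag is set. */
static void model_disable_irq_nosync(struct model_irq *irq)
{
	irq->disabled = true;
	if (irq->chip->irq_disable)
		irq->chip->irq_disable(irq);
}

static void model_deliver(struct model_irq *irq)
{
	assert(irq->chip && irq->chip->irq_mask);

	while (irq->line_asserted && !irq->hw_masked) {
		irq->hw_entries++;
		if (irq->disabled) {
			/* Second entry: notice the flag and finally mask. */
			irq->chip->irq_mask(irq);
			continue;
		}
		/* Real handler: disable the irq and defer the work. */
		irq->handler_calls++;
		model_disable_irq_nosync(irq);
	}
}

static unsigned int model_hw_entries_per_event(bool chip_has_disable)
{
	struct model_chip chip = {
		.irq_mask = model_mask,
		.irq_disable = chip_has_disable ? model_mask : NULL,
	};
	struct model_irq irq = { .chip = &chip, .line_asserted = true };

	model_deliver(&irq);
	return irq.hw_entries;
}
```

In this model the CPU takes two interrupts per real event when the chip has no .irq_disable, and one when .irq_disable points at the mask function.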

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Wang Dongsheng <dongsheng.wang@freescale.com>

diff --git a/arch/powerpc/sysdev/mpic.c b/arch/powerpc/sysdev/mpic.c
index 0e166ed..dd7564b 100644
--- a/arch/powerpc/sysdev/mpic.c
+++ b/arch/powerpc/sysdev/mpic.c
@@ -975,6 +975,7 @@ void mpic_set_destination(unsigned int virq, unsigned int cpuid)
 }

 static struct irq_chip mpic_irq_chip = {
+	.irq_disable	= mpic_mask_irq,
 	.irq_mask	= mpic_mask_irq,
 	.irq_unmask	= mpic_unmask_irq,
 	.irq_eoi	= mpic_end_irq,
@@ -984,6 +985,7 @@ static struct irq_chip mpic_irq_chip = {

 #ifdef CONFIG_SMP
 static struct irq_chip mpic_ipi_chip = {
+	.irq_disable	= mpic_mask_ipi,
 	.irq_mask	= mpic_mask_ipi,
 	.irq_unmask	= mpic_unmask_ipi,
 	.irq_eoi	= mpic_end_ipi,
@@ -991,6 +993,7 @@ static struct irq_chip mpic_ipi_chip = {
 #endif /* CONFIG_SMP */

 static struct irq_chip mpic_tm_chip = {
+	.irq_disable	= mpic_mask_tm,
 	.irq_mask	= mpic_mask_tm,
 	.irq_unmask	= mpic_unmask_tm,
 	.irq_eoi	= mpic_end_irq,
@@ -1001,6 +1004,7 @@ static struct irq_chip mpic_tm_chip = {
 static struct irq_chip mpic_irq_ht_chip = {
 	.irq_startup	= mpic_startup_ht_irq,
 	.irq_shutdown	= mpic_shutdown_ht_irq,
+	.irq_disable	= mpic_mask_irq,
 	.irq_mask	= mpic_mask_irq,
 	.irq_unmask	= mpic_unmask_ht_irq,
 	.irq_eoi	= mpic_end_ht_irq,
-- 
1.8.5

* Re: [PATCH] powerpc/mpic: supply a .disable callback
From: Benjamin Herrenschmidt @ 2014-01-07  5:49 UTC (permalink / raw)
  To: Dongsheng Wang; +Cc: scottwood, linuxppc-dev
In-Reply-To: <1389073086-6763-1-git-send-email-dongsheng.wang@freescale.com>

On Tue, 2014-01-07 at 13:38 +0800, Dongsheng Wang wrote:
> From: Wang Dongsheng <dongsheng.wang@freescale.com>
> 
> Currently MPIC provides .mask, but not .disable.  This means that
> effectively disable_irq() soft-disables the interrupt, and you get
> a .mask call if an interrupt actually occurs.
> 
> I'm not sure if this was intended as a performance benefit (it seems common
> to omit .disable on powerpc interrupt controllers, but nowhere else), but it
> interacts badly with threaded/workqueue interrupts (including KVM
> reflection).  In such cases, where the real interrupt handler does a
> disable_irq_nosync(), schedules deferred handling, and returns, we get two
> interrupts for every real interrupt.  The second interrupt does nothing
> but see that IRQ_DISABLED is set, and decide that it would be a good
> idea to actually call .mask.

We probably don't want to do that for edge, only level interrupts.

Cheers,
Ben.

> 
> Signed-off-by: Scott Wood <scottwood@freescale.com>
> Signed-off-by: Wang Dongsheng <dongsheng.wang@freescale.com>
> 
> diff --git a/arch/powerpc/sysdev/mpic.c b/arch/powerpc/sysdev/mpic.c
> index 0e166ed..dd7564b 100644
> --- a/arch/powerpc/sysdev/mpic.c
> +++ b/arch/powerpc/sysdev/mpic.c
> @@ -975,6 +975,7 @@ void mpic_set_destination(unsigned int virq, unsigned int cpuid)
>  }
>  
>  static struct irq_chip mpic_irq_chip = {
> +	.irq_disable	= mpic_mask_irq,
>  	.irq_mask	= mpic_mask_irq,
>  	.irq_unmask	= mpic_unmask_irq,
>  	.irq_eoi	= mpic_end_irq,
> @@ -984,6 +985,7 @@ static struct irq_chip mpic_irq_chip = {
>  
>  #ifdef CONFIG_SMP
>  static struct irq_chip mpic_ipi_chip = {
> +	.irq_disable	= mpic_mask_ipi,
>  	.irq_mask	= mpic_mask_ipi,
>  	.irq_unmask	= mpic_unmask_ipi,
>  	.irq_eoi	= mpic_end_ipi,
> @@ -991,6 +993,7 @@ static struct irq_chip mpic_ipi_chip = {
>  #endif /* CONFIG_SMP */
>  
>  static struct irq_chip mpic_tm_chip = {
> +	.irq_disable	= mpic_mask_tm,
>  	.irq_mask	= mpic_mask_tm,
>  	.irq_unmask	= mpic_unmask_tm,
>  	.irq_eoi	= mpic_end_irq,
> @@ -1001,6 +1004,7 @@ static struct irq_chip mpic_tm_chip = {
>  static struct irq_chip mpic_irq_ht_chip = {
>  	.irq_startup	= mpic_startup_ht_irq,
>  	.irq_shutdown	= mpic_shutdown_ht_irq,
> +	.irq_disable	= mpic_mask_irq,
>  	.irq_mask	= mpic_mask_irq,
>  	.irq_unmask	= mpic_unmask_ht_irq,
>  	.irq_eoi	= mpic_end_ht_irq,

* [PATCH] ASoC: fsl_ssi: Fixed wrong printf format identifier
From: Alexander Shiyan @ 2014-01-07  6:04 UTC (permalink / raw)
  To: alsa-devel
  Cc: Alexander Shiyan, Liam Girdwood, Takashi Iwai, Timur Tabi,
	Jaroslav Kysela, Mark Brown, linuxppc-dev

sound/soc/fsl/fsl_ssi.c: In function 'fsl_ssi_probe':
sound/soc/fsl/fsl_ssi.c:1180:6: warning: format '%d' expects argument
of type 'int', but argument 3 has type 'long int' [-Wformat=]
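
For illustration, the warning comes from PTR_ERR() returning a long, which needs %ld rather than %d. Below is a user-space sketch of the same situation; model_err_ptr(), model_ptr_err() and model_format_warning() are hypothetical stand-ins, not the kernel helpers themselves.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for ERR_PTR(): encode an errno in a pointer. */
static void *model_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* Hypothetical stand-in for PTR_ERR(): note the long return type. */
static long model_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* The argument is a long, so the matching conversion is %ld. */
static int model_format_warning(char *buf, size_t len, const void *clk)
{
	assert(buf != NULL);
	return snprintf(buf, len, "could not get baud clock: %ld\n",
			model_ptr_err(clk));
}
```

Formatting the long with %d would reproduce the -Wformat= warning quoted above.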

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Alexander Shiyan <shc_work@mail.ru>
---
 sound/soc/fsl/fsl_ssi.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sound/soc/fsl/fsl_ssi.c b/sound/soc/fsl/fsl_ssi.c
index 3d74477a..c9d567c 100644
--- a/sound/soc/fsl/fsl_ssi.c
+++ b/sound/soc/fsl/fsl_ssi.c
@@ -1192,7 +1192,7 @@ static int fsl_ssi_probe(struct platform_device *pdev)
 		 */
 		ssi_private->baudclk = devm_clk_get(&pdev->dev, "baud");
 		if (IS_ERR(ssi_private->baudclk))
-			dev_warn(&pdev->dev, "could not get baud clock: %d\n",
+			dev_warn(&pdev->dev, "could not get baud clock: %ld\n",
 				 PTR_ERR(ssi_private->baudclk));
 		else
 			clk_prepare_enable(ssi_private->baudclk);
-- 
1.8.3.2

* [PATCH 1/2] powerpc/dts: fix lbc lack of error interrupt
From: Dongsheng Wang @ 2014-01-07  6:26 UTC (permalink / raw)
  To: scottwood, galak; +Cc: devicetree, linuxppc-dev, Wang Dongsheng

From: Wang Dongsheng <dongsheng.wang@freescale.com>

On P1020, P1021, P1022, and P1023, an LBC error triggers the error
interrupt, which is internal IRQ0, so the system has to handle the
LBC IRQ0 interrupt.

The LBC general interrupt is internal IRQ3.

Signed-off-by: Wang Dongsheng <dongsheng.wang@freescale.com>

diff --git a/arch/powerpc/boot/dts/fsl/p1020si-post.dtsi b/arch/powerpc/boot/dts/fsl/p1020si-post.dtsi
index 68cc5e7..13f209f 100644
--- a/arch/powerpc/boot/dts/fsl/p1020si-post.dtsi
+++ b/arch/powerpc/boot/dts/fsl/p1020si-post.dtsi
@@ -36,7 +36,8 @@
 	#address-cells = <2>;
 	#size-cells = <1>;
 	compatible = "fsl,p1020-elbc", "fsl,elbc", "simple-bus";
-	interrupts = <19 2 0 0>;
+	interrupts = <19 2 0 0
+		      16 2 0 0>;
 };
 
 /* controller at 0x9000 */
diff --git a/arch/powerpc/boot/dts/fsl/p1021si-post.dtsi b/arch/powerpc/boot/dts/fsl/p1021si-post.dtsi
index adb82fd..cffc93e 100644
--- a/arch/powerpc/boot/dts/fsl/p1021si-post.dtsi
+++ b/arch/powerpc/boot/dts/fsl/p1021si-post.dtsi
@@ -36,7 +36,8 @@
 	#address-cells = <2>;
 	#size-cells = <1>;
 	compatible = "fsl,p1021-elbc", "fsl,elbc", "simple-bus";
-	interrupts = <19 2 0 0>;
+	interrupts = <19 2 0 0
+		      16 2 0 0>;
 };
 
 /* controller at 0x9000 */
diff --git a/arch/powerpc/boot/dts/fsl/p1022si-post.dtsi b/arch/powerpc/boot/dts/fsl/p1022si-post.dtsi
index e179803..979670d 100644
--- a/arch/powerpc/boot/dts/fsl/p1022si-post.dtsi
+++ b/arch/powerpc/boot/dts/fsl/p1022si-post.dtsi
@@ -40,7 +40,8 @@
 	 * pin muxing when the DIU is enabled.
 	 */
 	compatible = "fsl,p1022-elbc", "fsl,elbc";
-	interrupts = <19 2 0 0>;
+	interrupts = <19 2 0 0
+		      16 2 0 0>;
 };
 
 /* controller at 0x9000 */
diff --git a/arch/powerpc/boot/dts/fsl/p1023si-post.dtsi b/arch/powerpc/boot/dts/fsl/p1023si-post.dtsi
index f1105bf..f5f5043 100644
--- a/arch/powerpc/boot/dts/fsl/p1023si-post.dtsi
+++ b/arch/powerpc/boot/dts/fsl/p1023si-post.dtsi
@@ -36,7 +36,8 @@
 	#address-cells = <2>;
 	#size-cells = <1>;
 	compatible = "fsl,p1023-elbc", "fsl,elbc", "simple-bus";
-	interrupts = <19 2 0 0>;
+	interrupts = <19 2 0 0
+		      16 2 0 0>;
 };
 
 /* controller at 0xa000 */
-- 
1.8.5

* [PATCH 2/2] powerpc/85xx: handle the eLBC error interrupt if it exist in dts
From: Dongsheng Wang @ 2014-01-07  6:27 UTC (permalink / raw)
  To: scottwood; +Cc: linuxppc-dev, Wang Dongsheng, Shaohui Xie

From: Wang Dongsheng <dongsheng.wang@freescale.com>

On P3041, P1020, P1021, P1022, and P1023, eLBC event interrupts are routed
to Int9 (P3041) and Int3 (P102x), while eLBC error interrupts are routed to
Int0, so we need to call request_irq() for each.

Signed-off-by: Shaohui Xie <Shaohui.Xie@freescale.com>
Signed-off-by: Wang Dongsheng <dongsheng.wang@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>

diff --git a/arch/powerpc/include/asm/fsl_lbc.h b/arch/powerpc/include/asm/fsl_lbc.h
index 420b453..067fb0d 100644
--- a/arch/powerpc/include/asm/fsl_lbc.h
+++ b/arch/powerpc/include/asm/fsl_lbc.h
@@ -285,7 +285,7 @@ struct fsl_lbc_ctrl {
 	/* device info */
 	struct device			*dev;
 	struct fsl_lbc_regs __iomem	*regs;
-	int				irq;
+	int				irq[2];
 	wait_queue_head_t		irq_wait;
 	spinlock_t			lock;
 	void				*nand;
diff --git a/arch/powerpc/sysdev/fsl_lbc.c b/arch/powerpc/sysdev/fsl_lbc.c
index 6bc5a54..d631022 100644
--- a/arch/powerpc/sysdev/fsl_lbc.c
+++ b/arch/powerpc/sysdev/fsl_lbc.c
@@ -214,10 +214,14 @@ static irqreturn_t fsl_lbc_ctrl_irq(int irqno, void *data)
 	struct fsl_lbc_ctrl *ctrl = data;
 	struct fsl_lbc_regs __iomem *lbc = ctrl->regs;
 	u32 status;
+	unsigned long flags;
 
+	spin_lock_irqsave(&fsl_lbc_lock, flags);
 	status = in_be32(&lbc->ltesr);
-	if (!status)
+	if (!status) {
+		spin_unlock_irqrestore(&fsl_lbc_lock, flags);
 		return IRQ_NONE;
+	}
 
 	out_be32(&lbc->ltesr, LTESR_CLEAR);
 	out_be32(&lbc->lteatr, 0);
@@ -260,6 +264,7 @@ static irqreturn_t fsl_lbc_ctrl_irq(int irqno, void *data)
 	if (status & ~LTESR_MASK)
 		dev_err(ctrl->dev, "Unknown error: "
 			"LTESR 0x%08X\n", status);
+	spin_unlock_irqrestore(&fsl_lbc_lock, flags);
 	return IRQ_HANDLED;
 }
 
@@ -298,8 +303,8 @@ static int fsl_lbc_ctrl_probe(struct platform_device *dev)
 		goto err;
 	}
 
-	fsl_lbc_ctrl_dev->irq = irq_of_parse_and_map(dev->dev.of_node, 0);
-	if (fsl_lbc_ctrl_dev->irq == NO_IRQ) {
+	fsl_lbc_ctrl_dev->irq[0] = irq_of_parse_and_map(dev->dev.of_node, 0);
+	if (!fsl_lbc_ctrl_dev->irq[0]) {
 		dev_err(&dev->dev, "failed to get irq resource\n");
 		ret = -ENODEV;
 		goto err;
@@ -311,20 +316,34 @@ static int fsl_lbc_ctrl_probe(struct platform_device *dev)
 	if (ret < 0)
 		goto err;
 
-	ret = request_irq(fsl_lbc_ctrl_dev->irq, fsl_lbc_ctrl_irq, 0,
+	ret = request_irq(fsl_lbc_ctrl_dev->irq[0], fsl_lbc_ctrl_irq, 0,
 				"fsl-lbc", fsl_lbc_ctrl_dev);
 	if (ret != 0) {
 		dev_err(&dev->dev, "failed to install irq (%d)\n",
-			fsl_lbc_ctrl_dev->irq);
-		ret = fsl_lbc_ctrl_dev->irq;
+			fsl_lbc_ctrl_dev->irq[0]);
+		ret = fsl_lbc_ctrl_dev->irq[0];
 		goto err;
 	}
 
+	fsl_lbc_ctrl_dev->irq[1] = irq_of_parse_and_map(dev->dev.of_node, 1);
+	if (fsl_lbc_ctrl_dev->irq[1]) {
+		ret = request_irq(fsl_lbc_ctrl_dev->irq[1], fsl_lbc_ctrl_irq,
+				IRQF_SHARED, "fsl-lbc-err", fsl_lbc_ctrl_dev);
+		if (ret) {
+			dev_err(&dev->dev, "failed to install irq (%d)\n",
+					fsl_lbc_ctrl_dev->irq[1]);
+			ret = fsl_lbc_ctrl_dev->irq[1];
+			goto err1;
+		}
+	}
+
 	/* Enable interrupts for any detected events */
 	out_be32(&fsl_lbc_ctrl_dev->regs->lteir, LTEIR_ENABLE);
 
 	return 0;
 
+err1:
+	free_irq(fsl_lbc_ctrl_dev->irq[0], fsl_lbc_ctrl_dev);
 err:
 	iounmap(fsl_lbc_ctrl_dev->regs);
 	kfree(fsl_lbc_ctrl_dev);
-- 
1.8.5

* Re: [question] Can the execution of the atomic operation instruction pair lwarx/stwcx be interrupted by local HW interruptions?
From: Scott Wood @ 2014-01-07  6:35 UTC (permalink / raw)
  To: wyang; +Cc: Linuxppc-dev, Gavin Hu
In-Reply-To: <52CB51A4.7080303@gmail.com>

On Tue, 2014-01-07 at 09:00 +0800, wyang wrote:
> Yeah, Can you provide more detail info about why they can handle that 
> case? The following is my understand:
> 
> Let us assume that there is a atomic global variable(var_a) and its 
> initial value is 0.
> 
> The kernel attempts to execute atomic_add(1, var_a), after lwarx a async 
> interrupt happens, and the ISR also accesses "var_a" variable and 
> executes atomic_add.
> 
> static __inline__ void atomic_add(int a, atomic_t *v)
> {
>      int t;
> 
>      __asm__ __volatile__(
> "1:    lwarx    %0,0,%3        # atomic_add\n\
> ----------------------------------  <----------- interrupt 
> happens------->        ISR also operates this global variable "var_a" 
> such as also executing atomic_add(1, var_a). so the
>                var_a would is 1.
>      add    %0,%2,%0\n"
>      PPC405_ERR77(0,%3)
> "    stwcx.    %0,0,%3 \n\ <----- After interrupt code returns, the 
> reservation is cleared. so CR0 is not equal to 0, and then jump the 1 
> label. the var_a will be 2.
>      bne-    1b"
>      : "=&r" (t), "+m" (v->counter)
>      : "r" (a), "r" (&v->counter)
>      : "cc");
> }
> 
> So the value of var_a is 2 rather than 1. Thats why i said that 
> atomic_add does not handle such case. If I miss something, please 
> correct me.:-)

2 is the correct result, since atomic_add(1, var_a) was called twice
(once in the ISR, once in the interrupted context).
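
As a rough illustration of why the retry loop yields 2, here is a user-space sketch using C11 atomics. It is an assumption-laden model: a compare-and-swap loop stands in for the lwarx/stwcx. pair (the two are not identical), the "interrupt" is a plain callback, and all model_* names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

typedef void (*model_isr_t)(atomic_int *v);

/*
 * Add `a` to *v. If `isr` is non-NULL it runs once between the initial
 * load and the conditional store, like an interrupt arriving after lwarx.
 */
static void model_atomic_add(int a, atomic_int *v, model_isr_t isr)
{
	int old;

	assert(v != NULL);

	old = atomic_load(v);		/* plays the role of lwarx */
	if (isr)
		isr(v);			/* "interrupt" modifies *v */

	/*
	 * Plays the role of stwcx. plus "bne- 1b": the store fails if *v
	 * changed, `old` is refreshed, and the add is retried.
	 */
	while (!atomic_compare_exchange_weak(v, &old, old + a))
		;
}

static void model_isr(atomic_int *v)
{
	model_atomic_add(1, v, NULL);
}

/* One add in the interrupted context, one in the ISR. */
static int model_interrupted_add(void)
{
	atomic_int var_a;

	atomic_init(&var_a, 0);
	model_atomic_add(1, &var_a, model_isr);
	return atomic_load(&var_a);
}
```

Neither increment is lost: the failed conditional store only forces the interrupted add to be redone on top of the ISR's result.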

-Scott

* Re: [PATCH] powerpc/mpic: supply a .disable callback
From: Scott Wood @ 2014-01-07  6:38 UTC (permalink / raw)
  To: Dongsheng Wang; +Cc: linuxppc-dev
In-Reply-To: <1389073086-6763-1-git-send-email-dongsheng.wang@freescale.com>

On Tue, 2014-01-07 at 13:38 +0800, Dongsheng Wang wrote:
> From: Wang Dongsheng <dongsheng.wang@freescale.com>

Why did you change the author field?

-Scott

* Re: [PATCH] slub: Don't throw away partial remote slabs if there is no local memory
From: Andi Kleen @ 2014-01-07  6:49 UTC (permalink / raw)
  To: Anton Blanchard; +Cc: cl, nacc, penberg, linux-mm, paulus, mpm, linuxppc-dev
In-Reply-To: <20140107132100.5b5ad198@kryten>

Anton Blanchard <anton@samba.org> writes:
>
> Thoughts? It seems like we could hit a similar situation if a machine
> is balanced but we run out of memory on a single node.

Yes I agree, but your patch doesn't seem to attempt to handle this?

-Andi
>
> Index: b/mm/slub.c
> ===================================================================
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -2278,10 +2278,17 @@ redo:
>  
>  	if (unlikely(!node_match(page, node))) {
>  		stat(s, ALLOC_NODE_MISMATCH);
> -		deactivate_slab(s, page, c->freelist);
> -		c->page = NULL;
> -		c->freelist = NULL;
> -		goto new_slab;
> +
> +		/*
> +		 * If the node contains no memory there is no point in trying
> +		 * to allocate a new node local slab
> +		 */
> +		if (node_spanned_pages(node)) {
> +			deactivate_slab(s, page, c->freelist);
> +			c->page = NULL;
> +			c->freelist = NULL;
> +			goto new_slab;
> +		}
>  	}
>  
>  	/*
-- 
ak@linux.intel.com -- Speaking for myself only

* Re: [PATCH 2/2] powerpc/85xx: handle the eLBC error interrupt if it exist in dts
From: Scott Wood @ 2014-01-07  6:58 UTC (permalink / raw)
  To: Dongsheng Wang; +Cc: linuxppc-dev, Shaohui Xie
In-Reply-To: <1389076061-20159-1-git-send-email-dongsheng.wang@freescale.com>

On Tue, 2014-01-07 at 14:27 +0800, Dongsheng Wang wrote:
> From: Wang Dongsheng <dongsheng.wang@freescale.com>

AFAICT this patch was originally written by Shaohui Xie.

> On P3041, P1020, P1021, P1022, and P1023, eLBC event interrupts are routed
> to Int9 (P3041) and Int3 (P102x), while eLBC error interrupts are routed to
> Int0, so we need to call request_irq() for each.

For p3041 I thought that was only on early silicon revs that we don't
support anymore.

As for p102x, have you tested that this is actually what happens?  How
would we distinguish eLBC errors from other error sources, given that
there's no EISR0?  Do we just hope that no other error interrupts
happen?

-Scott

* Re: [03/12,v3] pci: fsl: add PCI indirect access support
From: Scott Wood @ 2014-01-07  7:13 UTC (permalink / raw)
  To: Lian Minghuan-b31939
  Cc: Bjorn Helgaas, Minghuan Lian, linuxppc-dev, Zang Roy-R61911,
	linux-pci
In-Reply-To: <52CA40D8.1090805@freescale.com>

On Mon, 2014-01-06 at 13:36 +0800, Lian Minghuan-b31939 wrote:
> Hi Scott,
> 
> please see my comments inline.
> 
> On 01/04/2014 06:33 AM, Scott Wood wrote:
> > A lot of this seems duplicated from arch/powerpc/sysdev/indirect_pci.c.
> >
> > How generally applicable is that file to non-PPC implementations?  At a
> > minimum I see a similar file in arch/microblaze.  It should probably
> > eventually be moved to common code, rather than duplicated again.  A
> > prerequisite for that would be making common the dependencies it has on
> > the rest of what is currently arch PCI infrastructure; until then, it's
> > probably better to just have the common fsl-pci code know how to
> > interface with the appropriate PPC/ARM code rather than trying to copy
> > the infrastructure as well.
> [Minghuan] Yes, this is a duplicate, except that it uses struct fsl_pci. But
> it is hard to move to common code,
> because each set of indirect read/write functions uses a different PCI
> controller structure, which is a very basic structure, and ARM has no such
> structure.
> If we cannot establish a unified PCI controller structure, we can only
> abstract out a simple structure that includes the indirect-access-related
> fields,
> and we need a callback function to get the pointer, like this:
> ((powerpc/microblaze/mips/ pci_controller
> *)(pci_bus->sysdata))->indirect_struct.
> Should we provide the common code for the indirect access API or wait for
> the common PCI controller structure?

Either work with the PCI maintainer to come up with a common structure,
or leave the code where it is and call into it.

-Scott

* Re: [question] Can the execution of the atomic operation instruction pair lwarx/stwcx be interrupted by local HW interruptions?
From: wyang @ 2014-01-07  7:22 UTC (permalink / raw)
  To: Scott Wood; +Cc: Linuxppc-dev, Gavin Hu
In-Reply-To: <1389076538.11795.120.camel@snotra.buserror.net>


On 01/07/2014 02:35 PM, Scott Wood wrote:
> On Tue, 2014-01-07 at 09:00 +0800, wyang wrote:
>> Yeah, Can you provide more detail info about why they can handle that
>> case? The following is my understand:
>>
>> Let us assume that there is a atomic global variable(var_a) and its
>> initial value is 0.
>>
>> The kernel attempts to execute atomic_add(1, var_a), after lwarx a async
>> interrupt happens, and the ISR also accesses "var_a" variable and
>> executes atomic_add.
>>
>> static __inline__ void atomic_add(int a, atomic_t *v)
>> {
>>       int t;
>>
>>       __asm__ __volatile__(
>> "1:    lwarx    %0,0,%3        # atomic_add\n\
>> ----------------------------------  <----------- interrupt
>> happens------->        ISR also operates this global variable "var_a"
>> such as also executing atomic_add(1, var_a). so the
>>                 var_a would is 1.
>>       add    %0,%2,%0\n"
>>       PPC405_ERR77(0,%3)
>> "    stwcx.    %0,0,%3 \n\ <----- After interrupt code returns, the
>> reservation is cleared. so CR0 is not equal to 0, and then jump the 1
>> label. the var_a will be 2.
>>       bne-    1b"
>>       : "=&r" (t), "+m" (v->counter)
>>       : "r" (a), "r" (&v->counter)
>>       : "cc");
>> }
>>
>> So the value of var_a is 2 rather than 1. Thats why i said that
>> atomic_add does not handle such case. If I miss something, please
>> correct me.:-)
> 2 is the correct result, since atomic_add(1, var_a) was called twice
> (once in the ISR, once in the interrupted context).
Scott, thanks for your confirmation. I guess that Gavin thought that 1 
is a correct result. So thats why I said that if he wanna get 1,
he should have responsibility to disable local interrupts. I mean that 
atomic_add is not able to guarantee that 1 is a correct result.:-)

Wei
>
> -Scott
>
>
>

* Re: [PATCH] slub: Don't throw away partial remote slabs if there is no local memory
From: Joonsoo Kim @ 2014-01-07  7:41 UTC (permalink / raw)
  To: Anton Blanchard; +Cc: cl, nacc, penberg, linux-mm, paulus, mpm, linuxppc-dev
In-Reply-To: <20140107132100.5b5ad198@kryten>

On Tue, Jan 07, 2014 at 01:21:00PM +1100, Anton Blanchard wrote:
> 
> We noticed a huge amount of slab memory consumed on a large ppc64 box:
> 
> Slab:            2094336 kB
> 
> Almost 2GB. This box is not balanced and some nodes do not have local
> memory, causing slub to be very inefficient in its slab usage.
> 
> Each time we call kmem_cache_alloc_node slub checks the per cpu slab,
> sees it isn't node local, deactivates it and tries to allocate a new
> slab. On empty nodes we will allocate a new remote slab and use the
> first slot, but as explained above when we get called a second time
> we will just deactivate that slab and retry.
> 
> As such we end up only using 1 entry in each slab:
> 
> slab                    mem  objects
>                        used   active
> ------------------------------------
> kmalloc-16384       1404 MB    4.90%
> task_struct          668 MB    2.90%
> kmalloc-128          193 MB    3.61%
> kmalloc-192          152 MB    5.23%
> kmalloc-8192          72 MB   23.40%
> kmalloc-16            64 MB    7.43%
> kmalloc-512           33 MB   22.41%
> 
> The patch below checks that a node is not empty before deactivating a
> slab and trying to allocate it again. With this patch applied we now
> use about 352MB:
> 
> Slab:             360192 kB
> 
> And our efficiency is much better:
> 
> slab                    mem  objects
>                        used   active
> ------------------------------------
> kmalloc-16384         92 MB   74.27%
> task_struct           23 MB   83.46%
> idr_layer_cache       18 MB  100.00%
> pgtable-2^12          17 MB  100.00%
> kmalloc-65536         15 MB  100.00%
> inode_cache           14 MB  100.00%
> kmalloc-256           14 MB   97.81%
> kmalloc-8192          14 MB   85.71%
> 
> Signed-off-by: Anton Blanchard <anton@samba.org>
> ---
> 
> Thoughts? It seems like we could hit a similar situation if a machine
> is balanced but we run out of memory on a single node.
> 
> Index: b/mm/slub.c
> ===================================================================
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -2278,10 +2278,17 @@ redo:
>  
>  	if (unlikely(!node_match(page, node))) {
>  		stat(s, ALLOC_NODE_MISMATCH);
> -		deactivate_slab(s, page, c->freelist);
> -		c->page = NULL;
> -		c->freelist = NULL;
> -		goto new_slab;
> +
> +		/*
> +		 * If the node contains no memory there is no point in trying
> +		 * to allocate a new node local slab
> +		 */
> +		if (node_spanned_pages(node)) {
> +			deactivate_slab(s, page, c->freelist);
> +			c->page = NULL;
> +			c->freelist = NULL;
> +			goto new_slab;
> +		}
>  	}
>  
>  	/*

Hello,

I think that we need more effort to solve the unbalanced node problem.

With this patch, even if the node of the current cpu slab is not favorable to
the unbalanced node, allocation would proceed and we would get unintended memory.

And there is one more problem. Even if we have some partial slabs on a
compatible node, we would allocate a new slab, because get_partial() cannot handle
this unbalanced node case.

To fix this correctly, how about the following patch?

Thanks.

------------->8--------------------
diff --git a/mm/slub.c b/mm/slub.c
index c3eb3d3..a1f6dfa 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1672,7 +1672,19 @@ static void *get_partial(struct kmem_cache *s, gfp_t flags, int node,
 {
        void *object;
        int searchnode = (node == NUMA_NO_NODE) ? numa_node_id() : node;
+       struct zonelist *zonelist;
+       struct zoneref *z;
+       struct zone *zone;
+       enum zone_type high_zoneidx = gfp_zone(flags);
 
+       if (!node_present_pages(searchnode)) {
+               zonelist = node_zonelist(searchnode, flags);
+               for_each_zone_zonelist(zone, z, zonelist, high_zoneidx) {
+                       searchnode = zone_to_nid(zone);
+                       if (node_present_pages(searchnode))
+                               break;
+               }
+       }
        object = get_partial_node(s, get_node(s, searchnode), c, flags);
        if (object || node != NUMA_NO_NODE)
                return object;
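
The node selection in the hunk above can be sketched in user space as follows. This is a hypothetical model, not kernel code: model_pick_searchnode() is an invented name, present_pages[] stands in for node_present_pages(), and fallback_order[] stands in for the nodes visited by walking the zonelist.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the node to search for partial slabs: the requested node if it
 * has memory, otherwise the first node in the fallback order that does.
 * As in the hunk above, if no node has memory the last node examined is
 * returned.
 */
static int model_pick_searchnode(int node, const unsigned long *present_pages,
				 const int *fallback_order, size_t nr_fallback)
{
	int searchnode = node;

	assert(node >= 0);

	if (present_pages[searchnode])
		return searchnode;

	for (size_t i = 0; i < nr_fallback; i++) {
		searchnode = fallback_order[i];
		if (present_pages[searchnode])
			break;
	}

	return searchnode;
}
```

With nodes 0 and 1 memoryless and nodes 2 and 3 populated, a request for node 0 is redirected to node 2, while a request for node 3 is left alone.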

* Re: [question] Can the execution of the atomic operation instruction pair lwarx/stwcx be interrupted by local HW interruptions?
From: Scott Wood @ 2014-01-07  8:01 UTC (permalink / raw)
  To: wyang; +Cc: Linuxppc-dev, Gavin Hu
In-Reply-To: <52CBAB24.9070804@gmail.com>

On Tue, 2014-01-07 at 15:22 +0800, wyang wrote:
> On 01/07/2014 02:35 PM, Scott Wood wrote:
> > On Tue, 2014-01-07 at 09:00 +0800, wyang wrote:
> >> Yeah, Can you provide more detail info about why they can handle that
> >> case? The following is my understand:
> >>
> >> Let us assume that there is a atomic global variable(var_a) and its
> >> initial value is 0.
> >>
> >> The kernel attempts to execute atomic_add(1, var_a), after lwarx a async
> >> interrupt happens, and the ISR also accesses "var_a" variable and
> >> executes atomic_add.
> >>
> >> static __inline__ void atomic_add(int a, atomic_t *v)
> >> {
> >>       int t;
> >>
> >>       __asm__ __volatile__(
> >> "1:    lwarx    %0,0,%3        # atomic_add\n\
> >> ----------------------------------  <----------- interrupt
> >> happens------->        ISR also operates on this global variable "var_a",
> >> e.g. by also executing atomic_add(1, var_a), so
> >>                 var_a would be 1.
> >>       add    %0,%2,%0\n"
> >>       PPC405_ERR77(0,%3)
> >> "    stwcx.    %0,0,%3 \n\ <----- After the interrupt code returns, the
> >> reservation is cleared, so CR0 is not equal to 0 and we jump to the 1
> >> label. var_a will then be 2.
> >>       bne-    1b"
> >>       : "=&r" (t), "+m" (v->counter)
> >>       : "r" (a), "r" (&v->counter)
> >>       : "cc");
> >> }
> >>
> >> So the value of var_a is 2 rather than 1. That's why I said that
> >> atomic_add does not handle such a case. If I am missing something,
> >> please correct me. :-)
> > 2 is the correct result, since atomic_add(1, var_a) was called twice
> > (once in the ISR, once in the interrupted context).
> Scott, thanks for your confirmation. I guess that Gavin thought that 1
> is the correct result. That's why I said that if he wants to get 1,
> he is responsible for disabling local interrupts.

If you disable interrupts, that will just delay the interrupt until
afterward, at which point the interrupt will increment var_a to 2.

> I mean that 
> atomic_add is not able to guarantee that 1 is a correct result.:-)

Well, no.  It's atomic_add(), not set_var_to_one(). :-)

-Scott

^ permalink raw reply
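
A minimal user-space sketch of the retry behaviour discussed in this
thread, assuming C11 <stdatomic.h> as a stand-in for lwarx/stwcx. (a
failed compare-exchange plays the role of a lost reservation). The
sketch_* names are hypothetical and are not kernel API, and the
"interrupt" is only simulated by calling the add from inside the loop.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical helper: add a to *v with a load / compare-exchange retry
 * loop, loosely mirroring lwarx / stwcx. / bne- 1b. */
static void sketch_atomic_add(int a, atomic_int *v)
{
	int old = atomic_load(v);	/* plays the role of lwarx */

	/* On failure, old is refreshed with the current value and we retry. */
	while (!atomic_compare_exchange_weak(v, &old, old + a))
		;
}

/* Same loop, but a simulated ISR runs once between the load and the store. */
static int sketch_interrupted_add(void)
{
	atomic_int var_a = 0;
	bool isr_pending = true;
	int old = atomic_load(&var_a);

	for (;;) {
		if (isr_pending) {	/* "interrupt" arrives after the load */
			isr_pending = false;
			sketch_atomic_add(1, &var_a);
		}
		/* The first attempt fails because var_a changed underneath us. */
		if (atomic_compare_exchange_weak(&var_a, &old, old + 1))
			break;
	}
	return atomic_load(&var_a);
}
```

Under these assumptions sketch_interrupted_add() ends with var_a == 2:
both additions are applied, which matches the point made above that
neither increment is lost.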

* [PATCH 1/2] pci: Fix root port bus->self is NULL
From: Dongsheng Wang @ 2014-01-07  8:04 UTC (permalink / raw)
  To: bhelgaas, rjw
  Cc: roy.zang, galak, Wang Dongsheng, linux-pci, scottwood,
	linuxppc-dev

From: Wang Dongsheng <dongsheng.wang@freescale.com>

The root port's bus->self is always NULL, so store the root port PCI
device in the root port's bus->self.

Signed-off-by: Wang Dongsheng <dongsheng.wang@freescale.com>

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 38e403d..7f2d1ab 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1472,6 +1472,9 @@ int pci_scan_slot(struct pci_bus *bus, int devfn)
 	if (!dev->is_added)
 		nr++;
 
+	if (pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT)
+		bus->self = dev;
+
 	for (fn = next_fn(bus, dev, 0); fn > 0; fn = next_fn(bus, dev, fn)) {
 		dev = pci_scan_single_device(bus, devfn + fn);
 		if (dev) {
-- 
1.8.5

^ permalink raw reply related

* [PATCH 2/2] fsl/pci: The new pci suspend/resume implementation
From: Dongsheng Wang @ 2014-01-07  8:04 UTC (permalink / raw)
  To: bhelgaas, rjw
  Cc: roy.zang, galak, Wang Dongsheng, linux-pci, scottwood,
	linuxppc-dev
In-Reply-To: <1389081848-26506-1-git-send-email-dongsheng.wang@freescale.com>

From: Wang Dongsheng <dongsheng.wang@freescale.com>

Add a new suspend/resume implementation that sends the PME turn-off
message on suspend and the PME exit message on resume.

Add a PME handler to respond to PME and message interrupts.

Change platform_driver->suspend/resume to syscore->suspend/resume.
The PCI driver calls back into the EP device to save EP state in
pci_pm_suspend_noirq, so we need to keep the link up until
pci_pm_suspend_noirq finishes.

Signed-off-by: Wang Dongsheng <dongsheng.wang@freescale.com>

diff --git a/arch/powerpc/platforms/85xx/c293pcie.c b/arch/powerpc/platforms/85xx/c293pcie.c
index 213d5b8..84476b6 100644
--- a/arch/powerpc/platforms/85xx/c293pcie.c
+++ b/arch/powerpc/platforms/85xx/c293pcie.c
@@ -68,6 +68,7 @@ define_machine(c293_pcie) {
 	.init_IRQ		= c293_pcie_pic_init,
 #ifdef CONFIG_PCI
 	.pcibios_fixup_bus	= fsl_pcibios_fixup_bus,
+	.pcibios_fixup_phb      = fsl_pcibios_fixup_phb,
 #endif
 	.get_irq		= mpic_get_irq,
 	.restart		= fsl_rstcr_restart,
diff --git a/arch/powerpc/platforms/85xx/corenet_generic.c b/arch/powerpc/platforms/85xx/corenet_generic.c
index fbd871e..aa8b9a3 100644
--- a/arch/powerpc/platforms/85xx/corenet_generic.c
+++ b/arch/powerpc/platforms/85xx/corenet_generic.c
@@ -163,6 +163,7 @@ define_machine(corenet_generic) {
 	.init_IRQ		= corenet_gen_pic_init,
 #ifdef CONFIG_PCI
 	.pcibios_fixup_bus	= fsl_pcibios_fixup_bus,
+	.pcibios_fixup_phb      = fsl_pcibios_fixup_phb,
 #endif
 	.get_irq		= mpic_get_coreint_irq,
 	.restart		= fsl_rstcr_restart,
diff --git a/arch/powerpc/platforms/85xx/ge_imp3a.c b/arch/powerpc/platforms/85xx/ge_imp3a.c
index e6285ae..11790e0 100644
--- a/arch/powerpc/platforms/85xx/ge_imp3a.c
+++ b/arch/powerpc/platforms/85xx/ge_imp3a.c
@@ -215,6 +215,7 @@ define_machine(ge_imp3a) {
 	.show_cpuinfo		= ge_imp3a_show_cpuinfo,
 #ifdef CONFIG_PCI
 	.pcibios_fixup_bus	= fsl_pcibios_fixup_bus,
+	.pcibios_fixup_phb      = fsl_pcibios_fixup_phb,
 #endif
 	.get_irq		= mpic_get_irq,
 	.restart		= fsl_rstcr_restart,
diff --git a/arch/powerpc/platforms/85xx/mpc8536_ds.c b/arch/powerpc/platforms/85xx/mpc8536_ds.c
index 15ce4b5..a378ba3 100644
--- a/arch/powerpc/platforms/85xx/mpc8536_ds.c
+++ b/arch/powerpc/platforms/85xx/mpc8536_ds.c
@@ -76,6 +76,7 @@ define_machine(mpc8536_ds) {
 	.init_IRQ		= mpc8536_ds_pic_init,
 #ifdef CONFIG_PCI
 	.pcibios_fixup_bus	= fsl_pcibios_fixup_bus,
+	.pcibios_fixup_phb      = fsl_pcibios_fixup_phb,
 #endif
 	.get_irq		= mpic_get_irq,
 	.restart		= fsl_rstcr_restart,
diff --git a/arch/powerpc/platforms/85xx/mpc85xx_cds.c b/arch/powerpc/platforms/85xx/mpc85xx_cds.c
index 7a31a0e..b0753e2 100644
--- a/arch/powerpc/platforms/85xx/mpc85xx_cds.c
+++ b/arch/powerpc/platforms/85xx/mpc85xx_cds.c
@@ -385,6 +385,7 @@ define_machine(mpc85xx_cds) {
 #ifdef CONFIG_PCI
 	.restart	= mpc85xx_cds_restart,
 	.pcibios_fixup_bus	= mpc85xx_cds_fixup_bus,
+	.pcibios_fixup_phb      = fsl_pcibios_fixup_phb,
 #else
 	.restart	= fsl_rstcr_restart,
 #endif
diff --git a/arch/powerpc/platforms/85xx/mpc85xx_ds.c b/arch/powerpc/platforms/85xx/mpc85xx_ds.c
index 9ebb91e..ffdf021 100644
--- a/arch/powerpc/platforms/85xx/mpc85xx_ds.c
+++ b/arch/powerpc/platforms/85xx/mpc85xx_ds.c
@@ -209,6 +209,7 @@ define_machine(mpc8544_ds) {
 	.init_IRQ		= mpc85xx_ds_pic_init,
 #ifdef CONFIG_PCI
 	.pcibios_fixup_bus	= fsl_pcibios_fixup_bus,
+	.pcibios_fixup_phb      = fsl_pcibios_fixup_phb,
 #endif
 	.get_irq		= mpic_get_irq,
 	.restart		= fsl_rstcr_restart,
@@ -223,6 +224,7 @@ define_machine(mpc8572_ds) {
 	.init_IRQ		= mpc85xx_ds_pic_init,
 #ifdef CONFIG_PCI
 	.pcibios_fixup_bus	= fsl_pcibios_fixup_bus,
+	.pcibios_fixup_phb      = fsl_pcibios_fixup_phb,
 #endif
 	.get_irq		= mpic_get_irq,
 	.restart		= fsl_rstcr_restart,
@@ -237,6 +239,7 @@ define_machine(p2020_ds) {
 	.init_IRQ		= mpc85xx_ds_pic_init,
 #ifdef CONFIG_PCI
 	.pcibios_fixup_bus	= fsl_pcibios_fixup_bus,
+	.pcibios_fixup_phb      = fsl_pcibios_fixup_phb,
 #endif
 	.get_irq		= mpic_get_irq,
 	.restart		= fsl_rstcr_restart,
diff --git a/arch/powerpc/platforms/85xx/mpc85xx_mds.c b/arch/powerpc/platforms/85xx/mpc85xx_mds.c
index a7b3621..6cd3b8a 100644
--- a/arch/powerpc/platforms/85xx/mpc85xx_mds.c
+++ b/arch/powerpc/platforms/85xx/mpc85xx_mds.c
@@ -416,6 +416,7 @@ define_machine(mpc8568_mds) {
 	.progress	= udbg_progress,
 #ifdef CONFIG_PCI
 	.pcibios_fixup_bus	= fsl_pcibios_fixup_bus,
+	.pcibios_fixup_phb      = fsl_pcibios_fixup_phb,
 #endif
 };
 
@@ -437,6 +438,7 @@ define_machine(mpc8569_mds) {
 	.progress	= udbg_progress,
 #ifdef CONFIG_PCI
 	.pcibios_fixup_bus	= fsl_pcibios_fixup_bus,
+	.pcibios_fixup_phb      = fsl_pcibios_fixup_phb,
 #endif
 };
 
@@ -459,6 +461,7 @@ define_machine(p1021_mds) {
 	.progress	= udbg_progress,
 #ifdef CONFIG_PCI
 	.pcibios_fixup_bus	= fsl_pcibios_fixup_bus,
+	.pcibios_fixup_phb      = fsl_pcibios_fixup_phb,
 #endif
 };
 
diff --git a/arch/powerpc/platforms/85xx/mpc85xx_rdb.c b/arch/powerpc/platforms/85xx/mpc85xx_rdb.c
index 53b6fb0..3e2bc3d 100644
--- a/arch/powerpc/platforms/85xx/mpc85xx_rdb.c
+++ b/arch/powerpc/platforms/85xx/mpc85xx_rdb.c
@@ -254,6 +254,7 @@ define_machine(p2020_rdb) {
 	.init_IRQ		= mpc85xx_rdb_pic_init,
 #ifdef CONFIG_PCI
 	.pcibios_fixup_bus	= fsl_pcibios_fixup_bus,
+	.pcibios_fixup_phb      = fsl_pcibios_fixup_phb,
 #endif
 	.get_irq		= mpic_get_irq,
 	.restart		= fsl_rstcr_restart,
@@ -268,6 +269,7 @@ define_machine(p1020_rdb) {
 	.init_IRQ		= mpc85xx_rdb_pic_init,
 #ifdef CONFIG_PCI
 	.pcibios_fixup_bus	= fsl_pcibios_fixup_bus,
+	.pcibios_fixup_phb      = fsl_pcibios_fixup_phb,
 #endif
 	.get_irq		= mpic_get_irq,
 	.restart		= fsl_rstcr_restart,
@@ -282,6 +284,7 @@ define_machine(p1021_rdb_pc) {
 	.init_IRQ		= mpc85xx_rdb_pic_init,
 #ifdef CONFIG_PCI
 	.pcibios_fixup_bus	= fsl_pcibios_fixup_bus,
+	.pcibios_fixup_phb      = fsl_pcibios_fixup_phb,
 #endif
 	.get_irq		= mpic_get_irq,
 	.restart		= fsl_rstcr_restart,
@@ -296,6 +299,7 @@ define_machine(p2020_rdb_pc) {
 	.init_IRQ		= mpc85xx_rdb_pic_init,
 #ifdef CONFIG_PCI
 	.pcibios_fixup_bus	= fsl_pcibios_fixup_bus,
+	.pcibios_fixup_phb      = fsl_pcibios_fixup_phb,
 #endif
 	.get_irq		= mpic_get_irq,
 	.restart		= fsl_rstcr_restart,
@@ -310,6 +314,7 @@ define_machine(p1025_rdb) {
 	.init_IRQ		= mpc85xx_rdb_pic_init,
 #ifdef CONFIG_PCI
 	.pcibios_fixup_bus	= fsl_pcibios_fixup_bus,
+	.pcibios_fixup_phb      = fsl_pcibios_fixup_phb,
 #endif
 	.get_irq		= mpic_get_irq,
 	.restart		= fsl_rstcr_restart,
@@ -324,6 +329,7 @@ define_machine(p1020_mbg_pc) {
 	.init_IRQ		= mpc85xx_rdb_pic_init,
 #ifdef CONFIG_PCI
 	.pcibios_fixup_bus	= fsl_pcibios_fixup_bus,
+	.pcibios_fixup_phb      = fsl_pcibios_fixup_phb,
 #endif
 	.get_irq		= mpic_get_irq,
 	.restart		= fsl_rstcr_restart,
@@ -338,6 +344,7 @@ define_machine(p1020_utm_pc) {
 	.init_IRQ		= mpc85xx_rdb_pic_init,
 #ifdef CONFIG_PCI
 	.pcibios_fixup_bus	= fsl_pcibios_fixup_bus,
+	.pcibios_fixup_phb      = fsl_pcibios_fixup_phb,
 #endif
 	.get_irq		= mpic_get_irq,
 	.restart		= fsl_rstcr_restart,
@@ -352,6 +359,7 @@ define_machine(p1020_rdb_pc) {
 	.init_IRQ		= mpc85xx_rdb_pic_init,
 #ifdef CONFIG_PCI
 	.pcibios_fixup_bus	= fsl_pcibios_fixup_bus,
+	.pcibios_fixup_phb      = fsl_pcibios_fixup_phb,
 #endif
 	.get_irq		= mpic_get_irq,
 	.restart		= fsl_rstcr_restart,
@@ -366,6 +374,7 @@ define_machine(p1020_rdb_pd) {
 	.init_IRQ		= mpc85xx_rdb_pic_init,
 #ifdef CONFIG_PCI
 	.pcibios_fixup_bus	= fsl_pcibios_fixup_bus,
+	.pcibios_fixup_phb      = fsl_pcibios_fixup_phb,
 #endif
 	.get_irq		= mpic_get_irq,
 	.restart		= fsl_rstcr_restart,
@@ -380,6 +389,7 @@ define_machine(p1024_rdb) {
 	.init_IRQ		= mpc85xx_rdb_pic_init,
 #ifdef CONFIG_PCI
 	.pcibios_fixup_bus	= fsl_pcibios_fixup_bus,
+	.pcibios_fixup_phb      = fsl_pcibios_fixup_phb,
 #endif
 	.get_irq		= mpic_get_irq,
 	.restart		= fsl_rstcr_restart,
diff --git a/arch/powerpc/platforms/85xx/p1010rdb.c b/arch/powerpc/platforms/85xx/p1010rdb.c
index d6a3dd3..ad1a3d4 100644
--- a/arch/powerpc/platforms/85xx/p1010rdb.c
+++ b/arch/powerpc/platforms/85xx/p1010rdb.c
@@ -78,6 +78,7 @@ define_machine(p1010_rdb) {
 	.init_IRQ		= p1010_rdb_pic_init,
 #ifdef CONFIG_PCI
 	.pcibios_fixup_bus	= fsl_pcibios_fixup_bus,
+	.pcibios_fixup_phb      = fsl_pcibios_fixup_phb,
 #endif
 	.get_irq		= mpic_get_irq,
 	.restart		= fsl_rstcr_restart,
diff --git a/arch/powerpc/platforms/85xx/p1022_ds.c b/arch/powerpc/platforms/85xx/p1022_ds.c
index e611e79..6ac986d 100644
--- a/arch/powerpc/platforms/85xx/p1022_ds.c
+++ b/arch/powerpc/platforms/85xx/p1022_ds.c
@@ -567,6 +567,7 @@ define_machine(p1022_ds) {
 	.init_IRQ		= p1022_ds_pic_init,
 #ifdef CONFIG_PCI
 	.pcibios_fixup_bus	= fsl_pcibios_fixup_bus,
+	.pcibios_fixup_phb	= fsl_pcibios_fixup_phb,
 #endif
 	.get_irq		= mpic_get_irq,
 	.restart		= fsl_rstcr_restart,
diff --git a/arch/powerpc/platforms/85xx/p1022_rdk.c b/arch/powerpc/platforms/85xx/p1022_rdk.c
index 8c92971..7a180f0 100644
--- a/arch/powerpc/platforms/85xx/p1022_rdk.c
+++ b/arch/powerpc/platforms/85xx/p1022_rdk.c
@@ -147,6 +147,7 @@ define_machine(p1022_rdk) {
 	.init_IRQ		= p1022_rdk_pic_init,
 #ifdef CONFIG_PCI
 	.pcibios_fixup_bus	= fsl_pcibios_fixup_bus,
+	.pcibios_fixup_phb      = fsl_pcibios_fixup_phb,
 #endif
 	.get_irq		= mpic_get_irq,
 	.restart		= fsl_rstcr_restart,
diff --git a/arch/powerpc/platforms/85xx/p1023_rds.c b/arch/powerpc/platforms/85xx/p1023_rds.c
index 2ae9d49..0e61400 100644
--- a/arch/powerpc/platforms/85xx/p1023_rds.c
+++ b/arch/powerpc/platforms/85xx/p1023_rds.c
@@ -126,6 +126,7 @@ define_machine(p1023_rds) {
 	.progress		= udbg_progress,
 #ifdef CONFIG_PCI
 	.pcibios_fixup_bus	= fsl_pcibios_fixup_bus,
+	.pcibios_fixup_phb      = fsl_pcibios_fixup_phb,
 #endif
 };
 
@@ -140,5 +141,6 @@ define_machine(p1023_rdb) {
 	.progress		= udbg_progress,
 #ifdef CONFIG_PCI
 	.pcibios_fixup_bus	= fsl_pcibios_fixup_bus,
+	.pcibios_fixup_phb      = fsl_pcibios_fixup_phb,
 #endif
 };
diff --git a/arch/powerpc/platforms/85xx/qemu_e500.c b/arch/powerpc/platforms/85xx/qemu_e500.c
index 5cefc5a..7f26732 100644
--- a/arch/powerpc/platforms/85xx/qemu_e500.c
+++ b/arch/powerpc/platforms/85xx/qemu_e500.c
@@ -66,6 +66,7 @@ define_machine(qemu_e500) {
 	.init_IRQ		= qemu_e500_pic_init,
 #ifdef CONFIG_PCI
 	.pcibios_fixup_bus	= fsl_pcibios_fixup_bus,
+	.pcibios_fixup_phb      = fsl_pcibios_fixup_phb,
 #endif
 	.get_irq		= mpic_get_coreint_irq,
 	.restart		= fsl_rstcr_restart,
diff --git a/arch/powerpc/platforms/85xx/sbc8548.c b/arch/powerpc/platforms/85xx/sbc8548.c
index f621218..b072146 100644
--- a/arch/powerpc/platforms/85xx/sbc8548.c
+++ b/arch/powerpc/platforms/85xx/sbc8548.c
@@ -135,6 +135,7 @@ define_machine(sbc8548) {
 	.restart	= fsl_rstcr_restart,
 #ifdef CONFIG_PCI
 	.pcibios_fixup_bus	= fsl_pcibios_fixup_bus,
+	.pcibios_fixup_phb      = fsl_pcibios_fixup_phb,
 #endif
 	.calibrate_decr = generic_calibrate_decr,
 	.progress	= udbg_progress,
diff --git a/arch/powerpc/platforms/85xx/xes_mpc85xx.c b/arch/powerpc/platforms/85xx/xes_mpc85xx.c
index dcbf7e4..1a9c108 100644
--- a/arch/powerpc/platforms/85xx/xes_mpc85xx.c
+++ b/arch/powerpc/platforms/85xx/xes_mpc85xx.c
@@ -170,6 +170,7 @@ define_machine(xes_mpc8572) {
 	.init_IRQ		= xes_mpc85xx_pic_init,
 #ifdef CONFIG_PCI
 	.pcibios_fixup_bus	= fsl_pcibios_fixup_bus,
+	.pcibios_fixup_phb      = fsl_pcibios_fixup_phb,
 #endif
 	.get_irq		= mpic_get_irq,
 	.restart		= fsl_rstcr_restart,
@@ -184,6 +185,7 @@ define_machine(xes_mpc8548) {
 	.init_IRQ		= xes_mpc85xx_pic_init,
 #ifdef CONFIG_PCI
 	.pcibios_fixup_bus	= fsl_pcibios_fixup_bus,
+	.pcibios_fixup_phb      = fsl_pcibios_fixup_phb,
 #endif
 	.get_irq		= mpic_get_irq,
 	.restart		= fsl_rstcr_restart,
@@ -198,6 +200,7 @@ define_machine(xes_mpc8540) {
 	.init_IRQ		= xes_mpc85xx_pic_init,
 #ifdef CONFIG_PCI
 	.pcibios_fixup_bus	= fsl_pcibios_fixup_bus,
+	.pcibios_fixup_phb      = fsl_pcibios_fixup_phb,
 #endif
 	.get_irq		= mpic_get_irq,
 	.restart		= fsl_rstcr_restart,
diff --git a/arch/powerpc/sysdev/fsl_pci.c b/arch/powerpc/sysdev/fsl_pci.c
index 4dfd61d..98cb3d4 100644
--- a/arch/powerpc/sysdev/fsl_pci.c
+++ b/arch/powerpc/sysdev/fsl_pci.c
@@ -22,10 +22,13 @@
 #include <linux/delay.h>
 #include <linux/string.h>
 #include <linux/init.h>
+#include <linux/interrupt.h>
 #include <linux/bootmem.h>
 #include <linux/memblock.h>
 #include <linux/log2.h>
 #include <linux/slab.h>
+#include <linux/suspend.h>
+#include <linux/syscore_ops.h>
 #include <linux/uaccess.h>
 
 #include <asm/io.h>
@@ -1085,55 +1088,167 @@ void fsl_pci_assign_primary(void)
 	}
 }
 
-static int fsl_pci_probe(struct platform_device *pdev)
+#ifdef CONFIG_PM
+static irqreturn_t fsl_pci_pme_handle(int irq, void *dev_id)
 {
-	int ret;
-	struct device_node *node;
+	struct pci_controller *hose = dev_id;
+	struct ccsr_pci __iomem *pci = hose->private_data;
+	u32 dr;
 
-	node = pdev->dev.of_node;
-	ret = fsl_add_bridge(pdev, fsl_pci_primary == node);
+	dr = in_be32(&pci->pex_pme_mes_dr);
+	if (dr)
+		out_be32(&pci->pex_pme_mes_dr, dr);
+	else
+		return IRQ_NONE;
 
-	mpc85xx_pci_err_probe(pdev);
+	return IRQ_HANDLED;
+}
+
+static int fsl_pci_pme_probe(struct pci_controller *hose)
+{
+	struct ccsr_pci __iomem *pci;
+	struct pci_dev *dev = hose->bus->self;
+	u16 pms;
+	int pme_irq;
+	int res;
+
+	/* PME Disable */
+	pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pms);
+	pms &= ~PCI_PM_CTRL_PME_ENABLE;
+	pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, pms);
+
+	pme_irq = irq_of_parse_and_map(hose->dn, 0);
+	if (!pme_irq) {
+		pr_warn("Failed to map PME interrupt.\n");
+
+		return -ENXIO;
+	}
+
+	res = devm_request_irq(hose->parent, pme_irq,
+			fsl_pci_pme_handle,
+			IRQF_DISABLED | IRQF_SHARED,
+			"[PCI] PME", hose);
+	if (res < 0) {
+		pr_warn("Unable to request irq %d for PME\n", pme_irq);
+		irq_dispose_mapping(pme_irq);
+
+		return -ENODEV;
+	}
+
+	pci = hose->private_data;
+
+	/* Enable PTOD, ENL23D & EXL23D */
+	out_be32(&pci->pex_pme_mes_disr, 0);
+	setbits32(&pci->pex_pme_mes_disr,
+		  PME_DISR_EN_PTOD | PME_DISR_EN_ENL23D | PME_DISR_EN_EXL23D);
+
+	out_be32(&pci->pex_pme_mes_ier, 0);
+	setbits32(&pci->pex_pme_mes_ier,
+		  PME_DISR_EN_PTOD | PME_DISR_EN_ENL23D | PME_DISR_EN_EXL23D);
+
+	/* PME Enable */
+	pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pms);
+	pms |= PCI_PM_CTRL_PME_ENABLE;
+	pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, pms);
 
 	return 0;
 }
 
-#ifdef CONFIG_PM
-static int fsl_pci_resume(struct device *dev)
+static void send_pme_turnoff_message(struct pci_controller *hose)
 {
-	struct pci_controller *hose;
-	struct resource pci_rsrc;
+	struct ccsr_pci __iomem *pci = hose->private_data;
+	u32 dr;
+	int i;
 
-	hose = pci_find_hose_for_OF_device(dev->of_node);
-	if (!hose)
-		return -ENODEV;
+	/* Send PME_Turn_Off Message Request */
+	setbits32(&pci->pex_pmcr, PEX_PMCR_PTOMR);
 
-	if (of_address_to_resource(dev->of_node, 0, &pci_rsrc)) {
-		dev_err(dev, "Get pci register base failed.");
-		return -ENODEV;
+	for (i = 0; i < 150; i++) {
+		dr = in_be32(&pci->pex_pme_mes_dr);
+		if (dr) {
+			out_be32(&pci->pex_pme_mes_dr, dr);
+			break;
+		} else {
+			udelay(1000);
+		}
 	}
+}
 
-	setup_pci_atmu(hose);
+static void fsl_pci_syscore_do_suspend(struct pci_controller *hose)
+{
+	send_pme_turnoff_message(hose);
+}
+
+static int fsl_pci_syscore_suspend(void)
+{
+	struct pci_controller *hose, *tmp;
+
+	list_for_each_entry_safe(hose, tmp, &hose_list, list_node)
+		fsl_pci_syscore_do_suspend(hose);
 
 	return 0;
 }
 
-static const struct dev_pm_ops pci_pm_ops = {
-	.resume = fsl_pci_resume,
-};
+static void fsl_pci_syscore_do_resume(struct pci_controller *hose)
+{
+	struct ccsr_pci __iomem *pci = hose->private_data;
+	u32 dr;
+	int i;
 
-#define PCI_PM_OPS (&pci_pm_ops)
+	/* Send Exit L2 State Message */
+	setbits32(&pci->pex_pmcr, PEX_PMCR_EXL2S);
 
-#else
+	/* wait exit done */
+	for (i = 0; i < 150; i++) {
+		dr = in_be32(&pci->pex_pme_mes_dr);
+		if (dr) {
+			out_be32(&pci->pex_pme_mes_dr, dr);
+			break;
+		} else {
+			udelay(1000);
+		}
+	}
+
+	setup_pci_atmu(hose);
+}
 
-#define PCI_PM_OPS NULL
+static void fsl_pci_syscore_resume(void)
+{
+	struct pci_controller *hose, *tmp;
+
+	list_for_each_entry_safe(hose, tmp, &hose_list, list_node)
+		fsl_pci_syscore_do_resume(hose);
+}
 
+static struct syscore_ops pci_syscore_pm_ops = {
+	.suspend = fsl_pci_syscore_suspend,
+	.resume = fsl_pci_syscore_resume,
+};
 #endif
 
+void fsl_pcibios_fixup_phb(struct pci_controller *phb)
+{
+#ifdef CONFIG_PM
+	fsl_pci_pme_probe(phb);
+#endif
+}
+
+static int fsl_pci_probe(struct platform_device *pdev)
+{
+	int ret;
+	struct device_node *node;
+
+	node = pdev->dev.of_node;
+	ret = fsl_add_bridge(pdev, fsl_pci_primary == node);
+
+	mpc85xx_pci_err_probe(pdev);
+
+	return 0;
+}
+
 static struct platform_driver fsl_pci_driver = {
 	.driver = {
 		.name = "fsl-pci",
-		.pm = PCI_PM_OPS,
 		.of_match_table = pci_ids,
 	},
 	.probe = fsl_pci_probe,
@@ -1141,6 +1256,9 @@ static struct platform_driver fsl_pci_driver = {
 
 static int __init fsl_pci_init(void)
 {
+#ifdef CONFIG_PM
+	register_syscore_ops(&pci_syscore_pm_ops);
+#endif
 	return platform_driver_register(&fsl_pci_driver);
 }
 arch_initcall(fsl_pci_init);
diff --git a/arch/powerpc/sysdev/fsl_pci.h b/arch/powerpc/sysdev/fsl_pci.h
index 8d455df..c1cec77 100644
--- a/arch/powerpc/sysdev/fsl_pci.h
+++ b/arch/powerpc/sysdev/fsl_pci.h
@@ -32,6 +32,13 @@ struct platform_device;
 #define PIWAR_WRITE_SNOOP	0x00005000
 #define PIWAR_SZ_MASK          0x0000003f
 
+#define PEX_PMCR_PTOMR		0x1
+#define PEX_PMCR_EXL2S		0x2
+
+#define PME_DISR_EN_PTOD	0x00008000
+#define PME_DISR_EN_ENL23D	0x00002000
+#define PME_DISR_EN_EXL23D	0x00001000
+
 /* PCI/PCI Express outbound window reg */
 struct pci_outbound_window_regs {
 	__be32	potar;	/* 0x.0 - Outbound translation address register */
@@ -111,6 +118,7 @@ struct ccsr_pci {
 
 extern int fsl_add_bridge(struct platform_device *pdev, int is_primary);
 extern void fsl_pcibios_fixup_bus(struct pci_bus *bus);
+extern void fsl_pcibios_fixup_phb(struct pci_controller *phb);
 extern int mpc83xx_add_bridge(struct device_node *dev);
 u64 fsl_pci_immrbar_base(struct pci_controller *hose);
 
-- 
1.8.5

^ permalink raw reply related
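
The patch above polls pex_pme_mes_dr in two nearly identical loops (up
to 150 iterations with udelay(1000)) and does not report a timeout. A
minimal sketch of how that bounded poll could be factored into one
helper that tells the caller whether the event arrived. Everything here
is hypothetical: struct sketch_reg simulates the register in user
space, the write-back is assumed to be write-1-to-clear, and a counter
stands in for udelay().

```c
#include <stdbool.h>
#include <stdint.h>

/* Simulated event register: reads as 0 until ready_after reads have
 * happened, then reports the pending bits. */
struct sketch_reg {
	uint32_t pending;	/* bits reported once the event has arrived */
	int ready_after;	/* number of reads that still return 0 */
	int delays;		/* how many times the poller had to wait */
};

static uint32_t sketch_read(struct sketch_reg *r)
{
	if (r->ready_after > 0) {
		r->ready_after--;
		return 0;
	}
	return r->pending;
}

/* Assumption: writing the bits back clears them (write-1-to-clear). */
static void sketch_ack(struct sketch_reg *r, uint32_t bits)
{
	r->pending &= ~bits;
}

/* Poll at most max_tries times; true if an event was seen and acked. */
static bool sketch_wait_event(struct sketch_reg *r, int max_tries)
{
	for (int i = 0; i < max_tries; i++) {
		uint32_t dr = sketch_read(r);

		if (dr) {
			sketch_ack(r, dr);
			return true;
		}
		r->delays++;	/* stands in for udelay(1000) */
	}
	return false;
}
```

With a boolean result, a caller in the suspend or resume path could at
least log a warning when the message is never acknowledged.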

* Re: [PATCH 1/2] pci: Fix root port bus->self is NULL
From: Yijing Wang @ 2014-01-07  8:27 UTC (permalink / raw)
  To: Dongsheng Wang, bhelgaas, rjw
  Cc: scottwood, roy.zang, linux-pci, linuxppc-dev, galak
In-Reply-To: <1389081848-26506-1-git-send-email-dongsheng.wang@freescale.com>

On 2014/1/7 16:04, Dongsheng Wang wrote:
> From: Wang Dongsheng <dongsheng.wang@freescale.com>
> 
> the root port bus->self always NULL, so put root port pci device
> into root port bus->self.
> 
> Signed-off-by: Wang Dongsheng <dongsheng.wang@freescale.com>
> 
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 38e403d..7f2d1ab 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -1472,6 +1472,9 @@ int pci_scan_slot(struct pci_bus *bus, int devfn)
>  	if (!dev->is_added)
>  		nr++;
>  
> +	if (pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT)
> +		bus->self = dev;

In this case, bus is the PCI root bus I think, so why set bus->self = root port?
"bus->self" should point to the PCI device that bridges to this bus.

> +
>  	for (fn = next_fn(bus, dev, 0); fn > 0; fn = next_fn(bus, dev, fn)) {
>  		dev = pci_scan_single_device(bus, devfn + fn);
>  		if (dev) {
> 


-- 
Thanks!
Yijing

^ permalink raw reply

* Re: [02/12,v3] pci: fsl: add structure fsl_pci
From: Scott Wood @ 2014-01-07  8:33 UTC (permalink / raw)
  To: Lian Minghuan-b31939
  Cc: Bjorn Helgaas, Minghuan Lian, linuxppc-dev, Zang Roy-R61911,
	linux-pci
In-Reply-To: <52CA48C0.2020307@freescale.com>

On Mon, 2014-01-06 at 14:10 +0800, Lian Minghuan-b31939 wrote:
> On 01/04/2014 06:19 AM, Scott Wood wrote:
> > I don't like the extent to which this duplicates (not moves) PPC's struct
> > pci_controller.  Also this leaves some fields like "indirect_type"
> > unexplained (PPC_INDIRECT_TYPE_xxx is only in the PPC header).
> >
> > Does the arch-independent part of the driver really need all this?  Given
> > how closely this tracks the PPC code, how would this work on ARM?
> [Minghuan] I added the duplicate fields because PPC's struct
> pci_controller needs them.

I think a better approach would be to create a cleanly architected
arch-independent driver.  Share what you reasonably can with the current
fsl_pci.c, but not to the extent of propagating PPCisms that don't match
up with what we ultimately want to see in generic code, or copying
things that ought to be controller-independent infrastructure into
controller-specific code.

See these threads:
http://www.spinics.net/lists/linux-pci/msg25769.html
https://lkml.org/lkml/2013/5/4/103

> The following is for ARM, I will submit them after verification:
[snip]
> +static int fsl_pcie_register(struct fsl_pcie *pcie)
> +{
> +    pcie->controller = fsl_hw_pcie.nr_controllers;
> +    fsl_hw_pcie.nr_controllers = 1;
> +    fsl_hw_pcie.private_data = (void **)&pcie;

I believe this should be:
	fsl_hw_pcie.private_data = pcie;

> +    pci_common_init(&fsl_hw_pcie);
> +    pci_assign_unassigned_resources();
> +#ifdef CONFIG_PCI_DOMAINS
> +    fsl_hw_pcie.domain++;
> +#endif

What serializes that non-atomic increment?

-Scott

^ permalink raw reply
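
On the question of what serializes the domain increment: a minimal
user-space sketch, assuming C11 <stdatomic.h>, of handing out domain
numbers with a single atomic read-modify-write so that two concurrent
callers cannot receive the same value. The names are hypothetical;
inside the kernel one would more likely reach for atomic_inc_return()
or a lock around the registration path.

```c
#include <stdatomic.h>

/* Hypothetical counter; static storage starts at 0. */
static atomic_int sketch_next_domain;

/* Returns a unique domain number; the fetch and the add are one atomic step. */
static int sketch_alloc_domain(void)
{
	return atomic_fetch_add(&sketch_next_domain, 1);
}
```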

* Re: [PATCH] slub: Don't throw away partial remote slabs if there is no local memory
From: Wanpeng Li @ 2014-01-07  8:48 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: cl, nacc, penberg, linux-mm, paulus, Anton Blanchard, mpm,
	linuxppc-dev
In-Reply-To: <20140107074136.GA4011@lge.com>

Hi Joonsoo,
On Tue, Jan 07, 2014 at 04:41:36PM +0900, Joonsoo Kim wrote:
>On Tue, Jan 07, 2014 at 01:21:00PM +1100, Anton Blanchard wrote:
>> 
[...]
>Hello,
>
>I think that we need more effort to solve the unbalanced node problem.
>
>With this patch, even if the node of the current cpu slab is not favorable to
>the unbalanced node, allocation would proceed and we would get unintended memory.
>

We have a machine:

[    0.000000] Node 0 Memory:
[    0.000000] Node 4 Memory: 0x0-0x10000000 0x20000000-0x60000000 0x80000000-0xc0000000
[    0.000000] Node 6 Memory: 0x10000000-0x20000000 0x60000000-0x80000000
[    0.000000] Node 10 Memory: 0xc0000000-0x180000000

[    0.041486] Node 0 CPUs: 0-19
[    0.041490] Node 4 CPUs:
[    0.041492] Node 6 CPUs:
[    0.041495] Node 10 CPUs:

The pages of the current cpu slab should be allocated from the fallback zones/nodes
of the memoryless node in the buddy system, so how can the "not favorable" case happen?

>And there is one more problem. Even if we have some partial slabs on
>a compatible node, we would allocate a new slab, because get_partial() cannot
>handle this unbalanced node case.
>
>To fix this correctly, how about the following patch?
>

So I think we should fold your two patches into one.

Regards,
Wanpeng Li 

>Thanks.
>
>------------->8--------------------
>diff --git a/mm/slub.c b/mm/slub.c
>index c3eb3d3..a1f6dfa 100644
>--- a/mm/slub.c
>+++ b/mm/slub.c
>@@ -1672,7 +1672,19 @@ static void *get_partial(struct kmem_cache *s, gfp_t flags, int node,
> {
>        void *object;
>        int searchnode = (node == NUMA_NO_NODE) ? numa_node_id() : node;
>+       struct zonelist *zonelist;
>+       struct zoneref *z;
>+       struct zone *zone;
>+       enum zone_type high_zoneidx = gfp_zone(flags);
>
>+       if (!node_present_pages(searchnode)) {
>+               zonelist = node_zonelist(searchnode, flags);
>+               for_each_zone_zonelist(zone, z, zonelist, high_zoneidx) {
>+                       searchnode = zone_to_nid(zone);
>+                       if (node_present_pages(searchnode))
>+                               break;
>+               }
>+       }
>        object = get_partial_node(s, get_node(s, searchnode), c, flags);
>        if (object || node != NUMA_NO_NODE)
>                return object;
>
>--
>To unsubscribe, send a message with 'unsubscribe linux-mm' in
>the body to majordomo@kvack.org.  For more info on Linux MM,
>see: http://www.linux-mm.org/ .
>Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply
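
The fallback search in the get_partial() hunk quoted above can be
modelled in a few lines of plain C. This is only a sketch under stated
assumptions: present_pages[] stands in for node_present_pages(), the
fallback[] array stands in for walking the node's zonelist, and all
names are hypothetical. Unlike the quoted hunk, the sketch keeps the
original node when no fallback node has memory.

```c
#include <stddef.h>

/* Pick the node whose partial list should be searched: the requested node
 * if it has memory, otherwise the first node in its fallback order that does. */
static int sketch_pick_searchnode(int node,
				  const unsigned long *present_pages,
				  const int *fallback, size_t nr_fallback)
{
	if (present_pages[node])
		return node;

	for (size_t i = 0; i < nr_fallback; i++)
		if (present_pages[fallback[i]])
			return fallback[i];

	return node;	/* nothing has memory; keep the original choice */
}
```

On a layout like the one in this message (node 0 has CPUs but no
memory; nodes 4, 6 and 10 have memory), and assuming a fallback order
of 4, 6, 10, a request for node 0 would search node 4's partial list.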

* Re: [PATCH 1/2] pci: Fix root port bus->self is NULL
From: Jiang Liu @ 2014-01-07  8:54 UTC (permalink / raw)
  To: Yijing Wang, Dongsheng Wang, bhelgaas, rjw
  Cc: scottwood, roy.zang, linux-pci, linuxppc-dev, galak
In-Reply-To: <52CBBA86.7020401@huawei.com>



On 2014/1/7 16:27, Yijing Wang wrote:
> On 2014/1/7 16:04, Dongsheng Wang wrote:
>> From: Wang Dongsheng <dongsheng.wang@freescale.com>
>>
>> the root port bus->self always NULL, so put root port pci device
>> into root port bus->self.
>>
>> Signed-off-by: Wang Dongsheng <dongsheng.wang@freescale.com>
>>
>> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
>> index 38e403d..7f2d1ab 100644
>> --- a/drivers/pci/probe.c
>> +++ b/drivers/pci/probe.c
>> @@ -1472,6 +1472,9 @@ int pci_scan_slot(struct pci_bus *bus, int devfn)
>>  	if (!dev->is_added)
>>  		nr++;
>>  
>> +	if (pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT)
>> +		bus->self = dev;
> 
> In this case, bus is the PCI root bus I think, so why set bus->self = root port?
> "bus->self" should point to the PCI device that bridges to this bus.
Yes, this patch seems wrong. If dev is a root port, bus should be the root
bus, so we shouldn't set root_bus->self = pci_dev_of_root_port.

Actually the PCI core already sets up pci_bus->self correctly for the
secondary bus of a PCIe root port.

Thanks!
Gerry

> 
>> +
>>  	for (fn = next_fn(bus, dev, 0); fn > 0; fn = next_fn(bus, dev, fn)) {
>>  		dev = pci_scan_single_device(bus, devfn + fn);
>>  		if (dev) {
>>
> 
> 

^ permalink raw reply
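
The relationship described in the replies above can be pictured with a
toy model. The field names mirror the kernel's struct pci_bus (parent,
self) and struct pci_dev (bus, subordinate), but the sketch_* types and
the helper are hypothetical and heavily simplified: a root bus has
self == NULL, while the bus behind a root port has self pointing at
that root port.

```c
#include <stddef.h>

struct sketch_dev;

struct sketch_bus {
	struct sketch_bus *parent;	/* NULL for a root bus */
	struct sketch_dev *self;	/* bridge leading to this bus; NULL for a root bus */
};

struct sketch_dev {
	struct sketch_bus *bus;		/* bus the device sits on */
	struct sketch_bus *subordinate;	/* bus behind the device, if it is a bridge */
};

/* Attach child as the secondary bus of bridge, which sits on parent. */
static void sketch_link_bridge(struct sketch_dev *bridge,
			       struct sketch_bus *parent,
			       struct sketch_bus *child)
{
	bridge->bus = parent;
	bridge->subordinate = child;
	child->parent = parent;
	child->self = bridge;
}
```

In this model, code that needs the root port starts from the secondary
bus (child->self) rather than expecting the root bus itself to carry a
self pointer.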

* Re: [PATCH] slub: Don't throw away partial remote slabs if there is no local memory
From: Joonsoo Kim @ 2014-01-07  9:10 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: cl, nacc, penberg, linux-mm, paulus, Anton Blanchard, mpm,
	linuxppc-dev
In-Reply-To: <52cbbf7b.2792420a.571c.ffffd476SMTPIN_ADDED_BROKEN@mx.google.com>

On Tue, Jan 07, 2014 at 04:48:40PM +0800, Wanpeng Li wrote:
> Hi Joonsoo,
> On Tue, Jan 07, 2014 at 04:41:36PM +0900, Joonsoo Kim wrote:
> >On Tue, Jan 07, 2014 at 01:21:00PM +1100, Anton Blanchard wrote:
> >> 
> [...]
> >Hello,
> >
> >I think that we need more efforts to solve unbalanced node problem.
> >
> >With this patch, even if node of current cpu slab is not favorable to
> >unbalanced node, allocation would proceed and we would get the unintended memory.
> >
> 
> We have a machine:
> 
> [    0.000000] Node 0 Memory:
> [    0.000000] Node 4 Memory: 0x0-0x10000000 0x20000000-0x60000000 0x80000000-0xc0000000
> [    0.000000] Node 6 Memory: 0x10000000-0x20000000 0x60000000-0x80000000
> [    0.000000] Node 10 Memory: 0xc0000000-0x180000000
> 
> [    0.041486] Node 0 CPUs: 0-19
> [    0.041490] Node 4 CPUs:
> [    0.041492] Node 6 CPUs:
> [    0.041495] Node 10 CPUs:
> 
> The pages of current cpu slab should be allocated from fallback zones/nodes 
> of the memoryless node in buddy system, how can not favorable happen? 

Hi, Wanpeng.

IIRC, if we call kmem_cache_alloc_node() with a certain node #, we try to
allocate the page from the fallback zones/nodes of that node #. So that fallback
list isn't related to the fallback list of the memoryless node #. Am I wrong?

Thanks.

> 
> >And there is one more problem. Even if we have some partial slabs on
> >compatible node, we would allocate new slab, because get_partial() cannot handle
> >this unbalance node case.
> >
> >To fix this correctly, how about following patch?
> >
> 
> So I think we should fold both of your two patches to one.
> 
> Regards,
> Wanpeng Li 
> 
> >Thanks.
> >
> >------------->8--------------------
> >diff --git a/mm/slub.c b/mm/slub.c
> >index c3eb3d3..a1f6dfa 100644
> >--- a/mm/slub.c
> >+++ b/mm/slub.c
> >@@ -1672,7 +1672,19 @@ static void *get_partial(struct kmem_cache *s, gfp_t flags, int node,
> > {
> >        void *object;
> >        int searchnode = (node == NUMA_NO_NODE) ? numa_node_id() : node;
> >+       struct zonelist *zonelist;
> >+       struct zoneref *z;
> >+       struct zone *zone;
> >+       enum zone_type high_zoneidx = gfp_zone(flags);
> >
> >+       if (!node_present_pages(searchnode)) {
> >+               zonelist = node_zonelist(searchnode, flags);
> >+               for_each_zone_zonelist(zone, z, zonelist, high_zoneidx) {
> >+                       searchnode = zone_to_nid(zone);
> >+                       if (node_present_pages(searchnode))
> >+                               break;
> >+               }
> >+       }
> >        object = get_partial_node(s, get_node(s, searchnode), c, flags);
> >        if (object || node != NUMA_NO_NODE)
> >                return object;
> >

^ permalink raw reply

* Re: [PATCH] slub: Don't throw away partial remote slabs if there is no local memory
From: Wanpeng Li @ 2014-01-07  9:21 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: cl, nacc, penberg, linux-mm, paulus, Anton Blanchard, mpm,
	linuxppc-dev
In-Reply-To: <20140107091016.GA21965@lge.com>

On Tue, Jan 07, 2014 at 06:10:16PM +0900, Joonsoo Kim wrote:
>On Tue, Jan 07, 2014 at 04:48:40PM +0800, Wanpeng Li wrote:
>> Hi Joonsoo,
>> On Tue, Jan 07, 2014 at 04:41:36PM +0900, Joonsoo Kim wrote:
>> >On Tue, Jan 07, 2014 at 01:21:00PM +1100, Anton Blanchard wrote:
>> >> 
>> [...]
>> >Hello,
>> >
>> >I think that we need more efforts to solve unbalanced node problem.
>> >
>> >With this patch, even if node of current cpu slab is not favorable to
>> >unbalanced node, allocation would proceed and we would get the unintended memory.
>> >
>> 
>> We have a machine:
>> 
>> [    0.000000] Node 0 Memory:
>> [    0.000000] Node 4 Memory: 0x0-0x10000000 0x20000000-0x60000000 0x80000000-0xc0000000
>> [    0.000000] Node 6 Memory: 0x10000000-0x20000000 0x60000000-0x80000000
>> [    0.000000] Node 10 Memory: 0xc0000000-0x180000000
>> 
>> [    0.041486] Node 0 CPUs: 0-19
>> [    0.041490] Node 4 CPUs:
>> [    0.041492] Node 6 CPUs:
>> [    0.041495] Node 10 CPUs:
>> 
>> The pages of current cpu slab should be allocated from fallback zones/nodes 
>> of the memoryless node in buddy system, how can not favorable happen? 
>
>Hi, Wanpeng.
>
>IIRC, if we call kmem_cache_alloc_node() with certain node #, we try to
>allocate the page in fallback zones/node of that node #. So fallback list isn't
>related to fallback one of memoryless node #. Am I wrong?
>

Anton added a node_spanned_pages(node) check, so the current cpu slab case
mentioned above applies to the memoryless node. Am I missing something?

Regards,
Wanpeng Li 

>Thanks.
>
>> 
>> >And there is one more problem. Even if we have some partial slabs on
>> >compatible node, we would allocate new slab, because get_partial() cannot handle
>> >this unbalance node case.
>> >
>> >To fix this correctly, how about following patch?
>> >
>> 
>> So I think we should fold both of your two patches to one.
>> 
>> Regards,
>> Wanpeng Li 
>> 
>> >Thanks.
>> >
>> >------------->8--------------------
>> >diff --git a/mm/slub.c b/mm/slub.c
>> >index c3eb3d3..a1f6dfa 100644
>> >--- a/mm/slub.c
>> >+++ b/mm/slub.c
>> >@@ -1672,7 +1672,19 @@ static void *get_partial(struct kmem_cache *s, gfp_t flags, int node,
>> > {
>> >        void *object;
>> >        int searchnode = (node == NUMA_NO_NODE) ? numa_node_id() : node;
>> >+       struct zonelist *zonelist;
>> >+       struct zoneref *z;
>> >+       struct zone *zone;
>> >+       enum zone_type high_zoneidx = gfp_zone(flags);
>> >
>> >+       if (!node_present_pages(searchnode)) {
>> >+               zonelist = node_zonelist(searchnode, flags);
>> >+               for_each_zone_zonelist(zone, z, zonelist, high_zoneidx) {
>> >+                       searchnode = zone_to_nid(zone);
>> >+                       if (node_present_pages(searchnode))
>> >+                               break;
>> >+               }
>> >+       }
>> >        object = get_partial_node(s, get_node(s, searchnode), c, flags);
>> >        if (object || node != NUMA_NO_NODE)
>> >                return object;
>> >

^ permalink raw reply

* Re: [PATCH] slub: Don't throw away partial remote slabs if there is no local memory
From: Joonsoo Kim @ 2014-01-07  9:31 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: cl, nacc, penberg, linux-mm, paulus, Anton Blanchard, mpm,
	linuxppc-dev
In-Reply-To: <52cbc738.c727440a.5ead.27a3SMTPIN_ADDED_BROKEN@mx.google.com>

On Tue, Jan 07, 2014 at 05:21:45PM +0800, Wanpeng Li wrote:
> On Tue, Jan 07, 2014 at 06:10:16PM +0900, Joonsoo Kim wrote:
> >On Tue, Jan 07, 2014 at 04:48:40PM +0800, Wanpeng Li wrote:
> >> Hi Joonsoo,
> >> On Tue, Jan 07, 2014 at 04:41:36PM +0900, Joonsoo Kim wrote:
> >> >On Tue, Jan 07, 2014 at 01:21:00PM +1100, Anton Blanchard wrote:
> >> >> 
> >> [...]
> >> >Hello,
> >> >
> >> >I think that we need more efforts to solve unbalanced node problem.
> >> >
> >> >With this patch, even if node of current cpu slab is not favorable to
> >> >unbalanced node, allocation would proceed and we would get the unintended memory.
> >> >
> >> 
> >> We have a machine:
> >> 
> >> [    0.000000] Node 0 Memory:
> >> [    0.000000] Node 4 Memory: 0x0-0x10000000 0x20000000-0x60000000 0x80000000-0xc0000000
> >> [    0.000000] Node 6 Memory: 0x10000000-0x20000000 0x60000000-0x80000000
> >> [    0.000000] Node 10 Memory: 0xc0000000-0x180000000
> >> 
> >> [    0.041486] Node 0 CPUs: 0-19
> >> [    0.041490] Node 4 CPUs:
> >> [    0.041492] Node 6 CPUs:
> >> [    0.041495] Node 10 CPUs:
> >> 
> >> The pages of current cpu slab should be allocated from fallback zones/nodes 
> >> of the memoryless node in buddy system, how can not favorable happen? 
> >
> >Hi, Wanpeng.
> >
> >IIRC, if we call kmem_cache_alloc_node() with certain node #, we try to
> >allocate the page in fallback zones/node of that node #. So fallback list isn't
> >related to fallback one of memoryless node #. Am I wrong?
> >
> 
> Anton add node_spanned_pages(node) check, so current cpu slab mentioned
> above is against memoryless node. If I miss something?

I was thinking of the following scenario.

memoryless node # : 1
1's fallback node # : 0

On node 1's cpu,

1. kmem_cache_alloc_node (node 2)
2. allocate the page on node 2 for the slab; the cpu slab is now that one.
3. kmem_cache_alloc_node (local node, that is, node 1)
4. It checks node_spanned_pages() and finds that node 1 is a memoryless node,
   so it returns node 2's memory.

Is this scenario impossible?
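
To make the four steps concrete, here is a small user-space model. It is a simplified sketch, not the real __slab_alloc(): it assumes the new check skips deactivation of the cpu slab whenever the requested node is memoryless, and toy_cpu_slab, toy_alloc_node(), has_memory[] and fallback_of[] are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NO_SLAB (-1)

/* Toy per-cpu state: node of the page backing the current cpu slab. */
struct toy_cpu_slab {
	int page_node;
};

/*
 * Toy allocation decision. has_memory[] stands in for
 * node_spanned_pages() and fallback_of[] gives the node that pages come
 * from when the requested node is memoryless (both are assumptions).
 * Returns the node whose memory is handed out.
 */
static int toy_alloc_node(struct toy_cpu_slab *c, const bool *has_memory,
			  const int *fallback_of, int node,
			  bool check_memoryless)
{
	bool mismatch = c->page_node != node;

	/* Modeled check: do not deactivate for a memoryless node. */
	if (check_memoryless && !has_memory[node])
		mismatch = false;

	if (c->page_node == TOY_NO_SLAB || mismatch) {
		/* Deactivate and take a new slab from the node or its fallback. */
		c->page_node = has_memory[node] ? node : fallback_of[node];
	}
	return c->page_node;
}
```

In this model, with node 1 memoryless and falling back to node 0, a request for node 2 followed by a request for node 1 hands out node 2's memory when the check is enabled, and node 0's memory when it is not.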

Thanks.

^ permalink raw reply

* RE: [RFC] linux/pci: move pci_platform_pm_ops to linux/pci.h
From: Dongsheng.Wang @ 2014-01-07  9:35 UTC (permalink / raw)
  To: Rafael J. Wysocki, Bjorn Helgaas
  Cc: Linux PM list, linux-pci@vger.kernel.org, Scott Wood,
	linuxppc-dev
In-Reply-To: <2276888.ovWFBgDtau@vostro.rjw.lan>

> -----Original Message-----
> From: Rafael J. Wysocki [mailto:rjw@rjwysocki.net]
> Sent: Monday, January 06, 2014 8:13 PM
> To: Bjorn Helgaas
> Cc: Wang Dongsheng-B40534; Zang Roy-R61911; Wood Scott-B07421; Kumar Gala; Linux
> PM list; linux-pci@vger.kernel.org; linuxppc-dev
> Subject: Re: [RFC] linux/pci: move pci_platform_pm_ops to linux/pci.h
> 
> On Friday, December 20, 2013 09:42:59 AM Bjorn Helgaas wrote:
> > On Fri, Dec 20, 2013 at 3:03 AM, Dongsheng Wang
> > <dongsheng.wang@freescale.com> wrote:
> > > From: Wang Dongsheng <dongsheng.wang@freescale.com>
> > >
> > > make Freescale platform use pci_platform_pm_ops struct.
> >
> > This changelog doesn't say anything about what the patch does.
> >
> > I infer that you want to use pci_platform_pm_ops from some Freescale
> > code.  This patch should be posted along with the patches that add
> > that Freescale code, so we can see how you intend to use it.
> >
> > The existing use is in drivers/pci/pci-acpi.c, so it's possible that
> > your new use should be added in the same way, in drivers/pci, so we
> > don't have to make pci_platform_pm_ops part of the public PCI
> > interface in include/linux/pci.h.
> >
> > That said, if Raphael thinks this makes sense, it's OK with me.
> 
> Well, I'd like to know why exactly the change is needed in the first place.
> 
Thanks for the review. I think the idea is not suitable for the Freescale
platform implementation right now.

I will drop this RFC patch.

-Dongsheng

> Thanks!
> 
> --
> I speak only for myself.
> Rafael J. Wysocki, Intel Open Source Technology Center.
> 

^ permalink raw reply
