LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* Re: Linux seamless booting
From: Fortini Matteo @ 2009-10-12 13:54 UTC (permalink / raw)
  To: Roberto Guerra; +Cc: u-boot@lists.denx.de, linuxppc-dev@lists.ozlabs.org
In-Reply-To: <7c4144600910120623k2c9aea89ia0f255164f03ff89@mail.gmail.com>

Yes, that's what we're currently using, but the problem is a little 
broader: I should answer to CAN messages in at most 100-200ms from 
powerup, and that can be done in u-boot.
However, handing CAN transmission control over to Linux is quite 
complicated nowadays, since it would involve passing structures in 
memory and hacking through device init.
It'd be nice to have a framework with which u-boot could hand-over 
devices to Linux in a clean and defined way.
We were thinking about starting some kind of development in that 
direction, but with very little resources, so I was looking for someone 
who had looked at the same problem.

Thanks,
Matteo

Roberto Guerra ha scritto:
> Did you try "setenv bootdelay 0" in uboot?
>
> On Mon, Oct 12, 2009 at 7:07 AM, Fortini Matteo <matteo.fortini@mta.it> wrote:
>   
>> Hi all,
>> we've been working on a PPC512x board booting with u-boot + linux 2.6.24.6,
>> and one major issue for our application is boot time.
>>
>> Right now, we went down to less than 6s boot time (i.e. time from power off
>> to launch of the user app), even with some drivers installed, i.e.:
>>
>>   * CAN bus
>>   * GPU on PCI bus
>>   * USB host
>>   * UBI
>>
>> with r/o squashfs root and r/w ubifs configuration filesystem mounted from a
>> NOR flash memory.
>>
>> This time could be enough for the high-level application (we're showing a
>> splash screen in about 2-3s so it's ok if the main app takes a little longer
>> to come up), but the whole system is connected to others via CAN and we
>> can't afford the "black out" period between u-boot and linux.
>>
>> Some other operating systems (e.g. QNX) have this "seamless booting"
>> function, in which drivers are started in stages, so that a (minimal)
>> functionality can be given in milliseconds, and more functionalities come up
>> as time goes by.
>>
>> Is there a way to support something similar with the u-boot -> linux
>> solution?
>>
>> Thank you,
>> Matteo
>> _______________________________________________
>> Linuxppc-dev mailing list
>> Linuxppc-dev@lists.ozlabs.org
>> https://lists.ozlabs.org/listinfo/linuxppc-dev
>>
>>     
> .
>
>   

^ permalink raw reply

* Re: linux booting fails on ppc440x5 with SRAM
From: Vineeth _ @ 2009-10-12 13:34 UTC (permalink / raw)
  To: Wolfgang Denk; +Cc: Johnny Hung, linuxppc-dev
In-Reply-To: <20091009113338.2B7E3E84EBC@gemini.denx.de>

Hi Wolfgang,

The link says about the initialization of the SDRAM; Does it
applicable in our case, where we have SRAM on our board. Does the
initialization means just clearing the memory in case of SRAM ? We
tried clearing the memory before the operation which doesnt work too.
We are creating a TLB of 16MB for the SRAM which is not cachable	.

-Vineeth

On Fri, Oct 9, 2009 at 5:03 PM, Wolfgang Denk <wd@denx.de> wrote:
> Dear Vineeth _,
>
> In message <a9b543570910090320t1444f8f1qf4c8ab7dbbef679f@mail.gmail.com> =
you wrote:
>> We ported the uboot Memory test and tested the 15MB ram and it was
>> successful.interestingly we have only 16MB SRAM in our board.We used 1
>
> Such a memory test means nothing. The only thing you can learn from it
> is that basic read and write accesses are working. You don;t get any
> information about the behaviour for burst mode accesses from such a
> test. See the FAQ at
> http://www.denx.de/wiki/view/DULG/LinuxCrashesRandomly =A0and at
> http://www.denx.de/wiki/DULG/UBootCrashAfterRelocation
>
> Best regards,
>
> Wolfgang Denk
>
> --
> DENX Software Engineering GmbH, =A0 =A0 MD: Wolfgang Denk & Detlev Zundel
> HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
> Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
> I mean, I . . . think to understand you, I just don't know =A0what =A0you
> are saying ... =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0- Terry Pra=
tchett, _Soul Music_
>

^ permalink raw reply

* Re: Linux seamless booting
From: Roberto Guerra @ 2009-10-12 13:23 UTC (permalink / raw)
  To: Fortini Matteo; +Cc: u-boot, linuxppc-dev
In-Reply-To: <4AD30DF0.1000901@mta.it>

Did you try "setenv bootdelay 0" in uboot?

On Mon, Oct 12, 2009 at 7:07 AM, Fortini Matteo <matteo.fortini@mta.it> wro=
te:
> Hi all,
> we've been working on a PPC512x board booting with u-boot + linux 2.6.24.=
6,
> and one major issue for our application is boot time.
>
> Right now, we went down to less than 6s boot time (i.e. time from power o=
ff
> to launch of the user app), even with some drivers installed, i.e.:
>
> =A0 * CAN bus
> =A0 * GPU on PCI bus
> =A0 * USB host
> =A0 * UBI
>
> with r/o squashfs root and r/w ubifs configuration filesystem mounted fro=
m a
> NOR flash memory.
>
> This time could be enough for the high-level application (we're showing a
> splash screen in about 2-3s so it's ok if the main app takes a little lon=
ger
> to come up), but the whole system is connected to others via CAN and we
> can't afford the "black out" period between u-boot and linux.
>
> Some other operating systems (e.g. QNX) have this "seamless booting"
> function, in which drivers are started in stages, so that a (minimal)
> functionality can be given in milliseconds, and more functionalities come=
 up
> as time goes by.
>
> Is there a way to support something similar with the u-boot -> linux
> solution?
>
> Thank you,
> Matteo
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev
>

^ permalink raw reply

* [PATCH] spufs: Fix test in spufs_switch_log_read()
From: Roel Kluin @ 2009-10-12 13:12 UTC (permalink / raw)
  To: Jeremy Kerr, linuxppc-dev, cbe-oss-dev, Andrew Morton

size_t len cannot be less than 0.

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
---
Or can this test be removed?

diff --git a/arch/powerpc/platforms/cell/spufs/file.c b/arch/powerpc/platforms/cell/spufs/file.c
index 884e8bc..d4f304f 100644
--- a/arch/powerpc/platforms/cell/spufs/file.c
+++ b/arch/powerpc/platforms/cell/spufs/file.c
@@ -2494,7 +2494,7 @@ static ssize_t spufs_switch_log_read(struct file *file, char __user *buf,
 	struct spu_context *ctx = SPUFS_I(inode)->i_ctx;
 	int error = 0, cnt = 0;
 
-	if (!buf || len < 0)
+	if (!buf || (ssize_t)len < 0)
 		return -EINVAL;
 
 	error = spu_acquire(ctx);

^ permalink raw reply related

* Re: [PATCH] powerpc/40x: Cleanups for HCU4 board
From: Josh Boyer @ 2009-10-12 12:34 UTC (permalink / raw)
  To: Niklaus Giger; +Cc: linuxppc-dev, niklaus.giger
In-Reply-To: <1255117658-3889-1-git-send-email-niklaus.giger@member.fsf.org>

On Fri, Oct 09, 2009 at 09:47:38PM +0200, Niklaus Giger wrote:
>- hcu4.dts: Added definitions for 2 CAN (Intel 82527)
>- hcu4.c: Some code for CPLD (special HW clock) and 2 CAN (Intel 82527)

Same comments here as in the HCU5 patch.  Drop the device_type for cpld and
can, and use compatible properties please.

josh

^ permalink raw reply

* Re: [PATCH 1/1] powerpc/40x: Add new PPC440EPx based board HCU5 of Netstal Maschinen
From: Josh Boyer @ 2009-10-12 12:32 UTC (permalink / raw)
  To: Niklaus Giger; +Cc: linuxppc-dev, niklaus.giger
In-Reply-To: <1255075865-4025-2-git-send-email-niklaus.giger@member.fsf.org>

On Fri, Oct 09, 2009 at 10:11:05AM +0200, Niklaus Giger wrote:
>Adds support for a HCU5 PPC405EPx based board from Netstal Maschinen AG.

Should be 440EPx, no?

Also, the subject should probably be "powerpc/44x".  These are minor things,
not really a big deal.  Overall, nice patch.  See below for a few more
comments.



>+			UART0: serial@ef600300 {
>+		   		device_type = "serial";
>+		   		compatible = "ns16550";
>+		   		reg = <0xef600300 0x00000008>;
>+		   		virtual-reg = <0xef600300>;
>+		   		clock-frequency = <0>; /* Filled in by U-Boot */
>+		   		current-speed = <0>; /* Filled in by U-Boot */

I've been considering dropping the current-speed property entirely in most
4xx boards.  Given that you have a bootargs specified in chosen that explicitly
tells the kernel to use 115200, do you think you could drop this property and
see how things work?

>+		   		interrupt-parent = <&UIC0>;
>+		   		interrupts = <0x0 0x4>;
>+	   		};
>+
>+			IIC0: i2c@ef600700 {
>+				#address-cells = <1>;
>+				#size-cells = <0>;
>+				compatible = "ibm,iic-440epx", "ibm,iic";
>+				reg = <0xef600700 0x00000014>;
>+				interrupt-parent = <&UIC0>;
>+				interrupts = <0x2 0x4>;
>+			};
>+
>+			IIC1: i2c@ef600800 {
>+				#address-cells = <1>;
>+				#size-cells = <0>;
>+				compatible = "ibm,iic-440epx", "ibm,iic";
>+				reg = <0xef600800 0x00000014>;
>+				interrupt-parent = <&UIC0>;
>+				interrupts = <0x7 0x4>;
>+			};
>+
>+
>+			ZMII0: emac-zmii@ef600d00 {
>+				compatible = "ibm,zmii-440epx", "ibm,zmii";
>+				reg = <0xef600d00 0x0000000c>;
>+			};
>+
>+			SMII0: emac-smii@ef601000 {
>+					compatible = "ibm,smii-440epx", "ibm,smmii";
>+					reg = <0xef601000 0x00000008>;
>+					has-mdio;
>+				};
>+
>+			EMAC0: ethernet@ef600e00 {
>+				device_type = "network";
>+				compatible = "ibm,emac-440epx", "ibm,emac4";
>+				interrupt-parent = <&EMAC0>;
>+				interrupts = <0x0 0x1>;
>+				#interrupt-cells = <1>;
>+				#address-cells = <0>;
>+				#size-cells = <0>;
>+				interrupt-map = </*Status*/ 0x0 &UIC0 0x18 0x4
>+				/*Wake*/  0x1 &UIC1 0x1d 0x4>;
>+				reg = <0xef600e00 0x00000074>;
>+				local-mac-address = [000000000000];
>+				mal-device = <&MAL0>;
>+				mal-tx-channel = <0>;
>+				mal-rx-channel = <0>;
>+				cell-index = <0>;
>+				max-frame-size = <9000>;
>+				rx-fifo-size = <4096>;
>+				tx-fifo-size = <2048>;
>+				phy-mode = "smii";
>+				phy-map = <1>;
>+				zmii-device = <&ZMII0>;
>+				zmii-channel = <0>;
>+				has-inverted-stacr-oc;
>+				has-new-stacr-staopc;
>+			};
>+
>+			CPLD: cpld@0xcd000000 {
>+				reg = <0xcd000000 0x00000100>;
>+				interval = "1000";
>+				device_type = "cpld";

Should drop the device_type.

>+				interrupt-parent = <&UIC1>;
>+				interrupts = <28 8>; /* 8 is IRQ_TYPE_LEVEL_LOW */
>+			};
>+
>+			/* The HCU5 has two Intel CAN SCC82527 controllers */
>+			CAN1: can0@0xc8000000 {
>+				device_type = "can1";

Same comment here and for the one below as well.

>+				compatible = "intel,82527";
>+				reg = <0xc800000 0x00000100>;
>+				interrupt-parent = <&UIC2>;
>+				interrupts = <3 8>; /* 8 is IRQ_TYPE_LEVEL_LOW */
>+			};
>+
>+			CAN2: can1@0xc8000100 {
>+				device_type = "can2";
>+				compatible = "intel,82527";
>+				reg = <0xc8000100 0x00000100>;
>+				interrupt-parent = <&UIC2>;
>+				interrupts = <4 8>; /* 8 is IRQ_TYPE_LEVEL_LOW */
>+			};
>+
>+		};
>+
>+	};
>+
>+	chosen {
>+		linux,stdout-path = "/plb/opb/serial@ef600300";
>+		bootargs = "console=ttyS0,115200";
>+	};
>+};

>+# Platform support
>+#
>+# CONFIG_PPC_CELL is not set
>+# CONFIG_PPC_CELL_NATIVE is not set
>+# CONFIG_PQ2ADS is not set
>+CONFIG_BAMBOO=y
>+# CONFIG_EBONY is not set
>+CONFIG_HCU5=y

You have both bamboo and hcu5 enabled...

>+# CONFIG_SAM440EP is not set
>+# CONFIG_SEQUOIA is not set
>+# CONFIG_TAISHAN is not set
>+# CONFIG_KATMAI is not set
>+# CONFIG_RAINIER is not set
>+# CONFIG_WARP is not set
>+# CONFIG_ARCHES is not set
>+# CONFIG_CANYONLANDS is not set
>+# CONFIG_GLACIER is not set
>+# CONFIG_REDWOOD is not set
>+# CONFIG_EIGER is not set
>+# CONFIG_YOSEMITE is not set
>+# CONFIG_XILINX_VIRTEX440_GENERIC_BOARD is not set
>+CONFIG_PPC44x_SIMPLE=y
>+# CONFIG_PPC4xx_GPIO is not set
>+CONFIG_440EP=y
>+CONFIG_440EPX=y
>+CONFIG_IBM440EP_ERR42=y

And both 440EP and 440EPx?  Just curious as to why.

>+#
>+# Bus options
>+#
>+CONFIG_ZONE_DMA=y
>+CONFIG_PPC_INDIRECT_PCI=y
>+CONFIG_4xx_SOC=y
>+CONFIG_PPC_PCI_CHOICE=y
>+CONFIG_PCI=y
>+CONFIG_PCI_DOMAINS=y
>+CONFIG_PCI_SYSCALL=y

You have PCI enabled, but no PCI device nodes in your DTS.

>diff --git a/arch/powerpc/platforms/44x/Kconfig b/arch/powerpc/platforms/44x/Kconfig
>index 7486bff..cbf3601 100644
>--- a/arch/powerpc/platforms/44x/Kconfig
>+++ b/arch/powerpc/platforms/44x/Kconfig
>@@ -18,6 +18,15 @@ config EBONY
> 	help
> 	  This option enables support for the IBM PPC440GP evaluation board.
>
>+config HCU5
>+	bool "Hcu5"
>+	depends on 44x
>+	default n
>+	select PPC44x_SIMPLE

I don't think you need to select this here.  PPC44x_SIMPLE is for boards that
don't have their own .c file.

>+	select 440EPX
>+	help
>+	  This option enables support for the Netstal Maschinen HCU5 board.
>+
> config SAM440EP
>         bool "Sam440ep"
> 	depends on 44x
>diff --git a/arch/powerpc/platforms/44x/Makefile b/arch/powerpc/platforms/44x/Makefile
>index ee6185a..5263595 100644
>--- a/arch/powerpc/platforms/44x/Makefile
>+++ b/arch/powerpc/platforms/44x/Makefile
>@@ -1,6 +1,7 @@
> obj-$(CONFIG_44x)	:= misc_44x.o idle.o
> obj-$(CONFIG_PPC44x_SIMPLE) += ppc44x_simple.o
> obj-$(CONFIG_EBONY)	+= ebony.o
>+obj-$(CONFIG_HCU5)	+= hcu5.o
> obj-$(CONFIG_SAM440EP) 	+= sam440ep.o
> obj-$(CONFIG_WARP)	+= warp.o
> obj-$(CONFIG_XILINX_VIRTEX_5_FXT) += virtex.o
>diff --git a/arch/powerpc/platforms/44x/hcu5.c b/arch/powerpc/platforms/44x/hcu5.c
>new file mode 100644
>index 0000000..1248a84
>--- /dev/null
>+++ b/arch/powerpc/platforms/44x/hcu5.c
>@@ -0,0 +1,153 @@
>+/*
>+ * arch/ppc/platforms/4xx/hcu5.c
>+ *
>+ * HCU5 board specific routines
>+ *
>+ * Copyright (c) 2009 Niklaus Giger <niklaus.giger@member.fsf.org>
>+ *
>+ * Copyright 2006-2007 DENX Software Engineering, Stefan Roese <sr@denx.de>
>+ *
>+ * Based on bamboo.c from Wade Farnsworth <wfarnsworth@mvista.com>
>+ *	Copyright 2004 MontaVista Software Inc.
>+ *	Copyright 2006 AMCC
>+ *
>+ * This program is free software; you can redistribute  it and/or modify it
>+ * under  the terms of  the GNU General  Public License as published by the
>+ * Free Software Foundation;  either version 2 of the  License, or (at your
>+ * option) any later version.
>+ *
>+ */
>+
>+#include <asm/machdep.h>
>+#include <asm/pci-bridge.h>
>+#include <asm/ppc4xx.h>
>+#include <asm/prom.h>
>+#include <asm/time.h>
>+#include <asm/udbg.h>
>+#include <asm/uic.h>
>+
>+#include <linux/init.h>
>+#include <linux/of_platform.h>
>+#include <linux/of_device.h>
>+#include <asm/irq.h>
>+#include <linux/types.h>
>+#include <linux/interrupt.h>
>+#include <asm/device.h>
>+
>+static int __init hcu5_probe(void);
>+#define debug printk

We have a pr_debug already.  Could you use that?  As it stands now this does
nothing more than s/printk/debug, which seems unecessary unless you meant for
it to be switchable.

>+/* HCU5 tick-status und tick-ontrol-register definitions */
>+#define CFG_CS_2  0xCC000000
>+#define CFG_CPLD  CFG_CS_2
>+#define HCU_TICK_CONTROL_REGISTER_ADDRESS	(CFG_CPLD + 0x1000000)
>+
>+#define TICK_INTERRUPT_ON_HCU5 60	/* UIC1, 28, GPIO40, IRQ0 */
>+#define CAN1_INTERRUPT_ON_HCU5 67	/* UIC2,  3  GPIO42  IRQ2 */
>+#define CAN2_INTERRUPT_ON_HCU5 68	/* UIC3,  4  GPIO43  IRQ3 */
>+
>+#define TICK_INT_ENABLE       0x01
>+#define TICK_INT_RESET        0x02
>+#define TICK_INT_PENDING      0x04
>+#define TICK_START            0x08
>+
>+#define TICK_INTERVAL_250_US  0x00
>+#define TICK_INTERVAL_500_US  0x10
>+#define TICK_INTERVAL_1000_US 0x20
>+#define TICK_INTERVAL_2000_US 0x30
>+
>+static u16 __iomem *tickAddr;
>+
>+static irqreturn_t cpld_interrupt(int irq, void *dev_id)
>+{
>+	/*
>+	out_be16(HCU_TICK_CONTROL_REGISTER_ADDRESS,
>+	TICK_INT_ENABLE | TICK_INT_RESET | TICK_START |
>+	TICK_INTERVAL_1000_US);
>+	*/
>+	return 0;
>+}
>+
>+static irqreturn_t can1_interrupt(int irq, void *dev_id)
>+{
>+	return 0;
>+}
>+
>+static irqreturn_t can2_interrupt(int irq, void *dev_id)
>+{
>+	return 0;
>+}
>+
>+static int map_one_irq(const char *typeName, void *proc)
>+{
>+	struct device_node *np;
>+	int irq = 0;
>+	int rc;
>+
>+	np = of_find_node_by_type(NULL, typeName);

Use of_find_compatible_node and specify proper compatible properties instead.

>+	debug(KERN_INFO "%s: %s np %p\n", __func__, typeName, np);
>+	if (!np) {
>+		printk(KERN_INFO "\n\nNo %s found in device tree\n", typeName);
>+		return -1;
>+	}
>+
>+	irq = irq_of_parse_and_map(np, 0);
>+	debug(KERN_INFO "%s: %s with irq %d\n",  __func__, typeName, irq);
>+	if (irq == NO_IRQ) {
>+		printk(KERN_ERR "%s: irq_of_parse_and_map failed\n", typeName);
>+		of_node_put(np);
>+		return -ENODEV;
>+	}
>+	rc = request_irq(irq, proc, IRQF_TRIGGER_LOW,
>+			  typeName, NULL);
>+	debug(KERN_INFO"%s: %s %d rc %d\n",
>+		__func__, typeName, irq, rc);
>+	return 0;
>+}
>+
>+static __initdata struct of_device_id hcu5_of_bus[] = {
>+	{ .compatible = "ibm,plb4", },
>+	{ .compatible = "ibm,opb", },
>+	{ .compatible = "ibm,ebc", },
>+	{ .compatible = "simple-bus", },
>+	{},
>+};
>+
>+static int __init hcu5_device_probe(void)
>+{
>+	of_platform_bus_probe(NULL, hcu5_of_bus, NULL);
>+	tickAddr = ioremap(HCU_TICK_CONTROL_REGISTER_ADDRESS, 0x100);
>+	printk(KERN_INFO "%s: tickAddr is at %p from 0x%x\n", __func__,
>+		tickAddr, HCU_TICK_CONTROL_REGISTER_ADDRESS);
>+	map_one_irq("cpld", cpld_interrupt);
>+	map_one_irq("can1", can1_interrupt);
>+	map_one_irq("can2", can2_interrupt);
>+
>+	return 0;
>+}
>+
>+machine_device_initcall(hcu5, hcu5_device_probe);
>+
>+/*
>+* Called very early, MMU is off, device-tree isn't unflattened
>+*/
>+static int __init hcu5_probe(void)
>+{
>+	unsigned long root = of_get_flat_dt_root();
>+
>+	if (!of_flat_dt_is_compatible(root, "netstal,hcu5"))
>+		return 0;
>+	ppc_pci_flags = PPC_PCI_REASSIGN_ALL_RSRC;

Do you need this line at all?  You don't have PCI nodes...

>+	return 1;
>+}
>+

josh

^ permalink raw reply

* Linux seamless booting
From: Fortini Matteo @ 2009-10-12 11:07 UTC (permalink / raw)
  To: u-boot; +Cc: linuxppc-dev

Hi all,
we've been working on a PPC512x board booting with u-boot + linux 
2.6.24.6, and one major issue for our application is boot time.

Right now, we went down to less than 6s boot time (i.e. time from power 
off to launch of the user app), even with some drivers installed, i.e.:

    * CAN bus
    * GPU on PCI bus
    * USB host
    * UBI

with r/o squashfs root and r/w ubifs configuration filesystem mounted 
from a NOR flash memory.

This time could be enough for the high-level application (we're showing 
a splash screen in about 2-3s so it's ok if the main app takes a little 
longer to come up), but the whole system is connected to others via CAN 
and we can't afford the "black out" period between u-boot and linux.

Some other operating systems (e.g. QNX) have this "seamless booting" 
function, in which drivers are started in stages, so that a (minimal) 
functionality can be given in milliseconds, and more functionalities 
come up as time goes by.

Is there a way to support something similar with the u-boot -> linux 
solution?

Thank you,
Matteo

^ permalink raw reply

* Re: [v8 PATCH 1/8]: cpuidle: cleanup drivers/cpuidle/cpuidle.c
From: Balbir Singh @ 2009-10-12 11:36 UTC (permalink / raw)
  To: Arun R Bharadwaj
  Cc: linux-arch, Peter Zijlstra, linux-kernel, linux-acpi, Ingo Molnar,
	linuxppc-dev, Arjan van de Ven
In-Reply-To: <20091008094942.GB20595@linux.vnet.ibm.com>

* Arun R B <arun@linux.vnet.ibm.com> [2009-10-08 15:19:42]:

> * Arun R Bharadwaj <arun@linux.vnet.ibm.com> [2009-10-08 15:18:28]:
> 
> This patch cleans up drivers/cpuidle/cpuidle.c
> Earlier cpuidle assumed pm_idle as the default idle loop. Break that
> assumption and make it more generic. cpuidle_idle_call() which is the
> main idle loop of cpuidle is to be called by architectures which have
> registered to cpuidle.
> 
> Remove routines cpuidle_install/uninstall_idle_handler() which are not
> needed anymore.
> 
>

[snip]

  /**
> - * cpuidle_install_idle_handler - installs the cpuidle idle loop handler
> - */
> -void cpuidle_install_idle_handler(void)
> -{
> -	if (enabled_devices && (pm_idle != cpuidle_idle_call)) {
> -		/* Make sure all changes finished before we switch to new idle */
> -		smp_wmb();
> -		pm_idle = cpuidle_idle_call;
> -	}
> -}
> -
> -/**
> - * cpuidle_uninstall_idle_handler - uninstalls the cpuidle idle loop handler
> - */
> -void cpuidle_uninstall_idle_handler(void)
> -{
> -	if (enabled_devices && pm_idle_old && (pm_idle != pm_idle_old)) {
> -		pm_idle = pm_idle_old;
> -		cpuidle_kick_cpus();
> -	}
> -}
> -

I see the routines above being called in from
cpuidle_pause/resume_and_lock/unlock below and they are entries from
ACPI on ACPI_PROCESSOR_NOTIFY_POWER and from the hotplug path, could
you test them to make sure they are not broken. We also seem to be
missing a cpuidle_kick_cpus() in cpuidle_pause_and_lock()

[snip]

-- 
	Balbir

^ permalink raw reply

* Re: [PATCH] pasemi_mac: ethtool set settings support
From: David Miller @ 2009-10-12 11:25 UTC (permalink / raw)
  To: olof; +Cc: linuxppc-dev, jgarzik, netdev
In-Reply-To: <20091006161123.GC29195@lixom.net>

From: Olof Johansson <olof@lixom.net>
Date: Tue, 6 Oct 2009 11:11:23 -0500

> On Mon, Oct 05, 2009 at 05:31:24PM +0400, Valentine Barshak wrote:
>> Add ethtool set settings to pasemi_mac_ethtool.
>> 
>> Signed-off-by: Valentine Barshak <vbarshak@ru.mvista.com>
> 
> Acked-by: Olof Johansson <olof@lixom.net>

Applied to net-next-2.6, thanks.

^ permalink raw reply

* Re: [v8 PATCH 0/8]: cpuidle: Cleanup cpuidle/ Introduce cpuidle to POWER
From: Balbir Singh @ 2009-10-12 10:01 UTC (permalink / raw)
  To: Arun R Bharadwaj
  Cc: linux-arch, Peter Zijlstra, linux-kernel, linux-acpi, Ingo Molnar,
	linuxppc-dev, Arjan van de Ven
In-Reply-To: <20091008094828.GA20595@linux.vnet.ibm.com>

* Arun R B <arun@linux.vnet.ibm.com> [2009-10-08 15:18:28]:

> Hi
> 
> Please consider this for inclusion into the testing tree.
> 
> This patchset introduces cpuidle infrastructure to POWER, prototyping
> for pSeries, and also does a major refactoring of current x86 idle
> power management and a cleanup of cpuidle infrastructure.
> 
> Earlier discussions on the same can be found at:
> 
> v7 --> http://lkml.org/lkml/2009/10/6/278
> v6 --> http://lkml.org/lkml/2009/9/22/180
> v5 --> http://lkml.org/lkml/2009/9/22/26
> v4 --> http://lkml.org/lkml/2009/9/1/133
> v3 --> http://lkml.org/lkml/2009/8/27/124
> v2 --> http://lkml.org/lkml/2009/8/26/233
> v1 --> http://lkml.org/lkml/2009/8/19/150
> 
>

I looked at the changes and they are beginning to look very good. Some
minor comments about comments in the individual patches. 

-- 
	Balbir

^ permalink raw reply

* Re: [PATCH] powerpc: Fix hypervisor TLB batching
From: Benjamin Herrenschmidt @ 2009-10-12  8:44 UTC (permalink / raw)
  To: Anton Blanchard; +Cc: linuxppc-dev
In-Reply-To: <20091012074734.GB4808@kryten>

On Mon, 2009-10-12 at 18:47 +1100, Anton Blanchard wrote:
> Profiling of a page fault scalability microbenchmark shows flush_hash_range
> is not calling the batch hpte invalidate hcall (H_BULK_REMOVE).
>
> It turns out we have a duplicate firmware feature for hcall-bulk and the
> current setup code stops after finding the first match. This meant we never
> batch and always do individual invalidates.
> 
> The patch below removes the duplicate and shifts FW_FEATURE_CMO to close
> the gap. With the patch applied the single threaded page fault rate improves
> from 217169 to 238755 per second on a POWER5 test box, a 10% improvement.
> 
> Signed-off-by: Anton Blanchard <anton@samba.org>
> ---

Good catch !

Ben.

> Index: linux.trees.git/arch/powerpc/include/asm/firmware.h
> ===================================================================
> --- linux.trees.git.orig/arch/powerpc/include/asm/firmware.h	2009-10-12 18:11:56.000000000 +1100
> +++ linux.trees.git/arch/powerpc/include/asm/firmware.h	2009-10-12 18:21:20.000000000 +1100
> @@ -37,7 +37,7 @@
>  #define FW_FEATURE_VIO		ASM_CONST(0x0000000000004000)
>  #define FW_FEATURE_RDMA		ASM_CONST(0x0000000000008000)
>  #define FW_FEATURE_LLAN		ASM_CONST(0x0000000000010000)
> -#define FW_FEATURE_BULK		ASM_CONST(0x0000000000020000)
> +#define FW_FEATURE_BULK_REMOVE	ASM_CONST(0x0000000000020000)
>  #define FW_FEATURE_XDABR	ASM_CONST(0x0000000000040000)
>  #define FW_FEATURE_MULTITCE	ASM_CONST(0x0000000000080000)
>  #define FW_FEATURE_SPLPAR	ASM_CONST(0x0000000000100000)
> @@ -45,8 +45,7 @@
>  #define FW_FEATURE_LPAR		ASM_CONST(0x0000000000400000)
>  #define FW_FEATURE_PS3_LV1	ASM_CONST(0x0000000000800000)
>  #define FW_FEATURE_BEAT		ASM_CONST(0x0000000001000000)
> -#define FW_FEATURE_BULK_REMOVE	ASM_CONST(0x0000000002000000)
> -#define FW_FEATURE_CMO		ASM_CONST(0x0000000004000000)
> +#define FW_FEATURE_CMO		ASM_CONST(0x0000000002000000)
>  
>  #ifndef __ASSEMBLY__
>  
> @@ -58,8 +57,9 @@ enum {
>  		FW_FEATURE_PERF | FW_FEATURE_DUMP | FW_FEATURE_INTERRUPT |
>  		FW_FEATURE_MIGRATE | FW_FEATURE_PERFMON | FW_FEATURE_CRQ |
>  		FW_FEATURE_VIO | FW_FEATURE_RDMA | FW_FEATURE_LLAN |
> -		FW_FEATURE_BULK | FW_FEATURE_XDABR | FW_FEATURE_MULTITCE |
> -		FW_FEATURE_SPLPAR | FW_FEATURE_LPAR | FW_FEATURE_CMO,
> +		FW_FEATURE_BULK_REMOVE | FW_FEATURE_XDABR |
> +		FW_FEATURE_MULTITCE | FW_FEATURE_SPLPAR | FW_FEATURE_LPAR |
> +		FW_FEATURE_CMO,
>  	FW_FEATURE_PSERIES_ALWAYS = 0,
>  	FW_FEATURE_ISERIES_POSSIBLE = FW_FEATURE_ISERIES | FW_FEATURE_LPAR,
>  	FW_FEATURE_ISERIES_ALWAYS = FW_FEATURE_ISERIES | FW_FEATURE_LPAR,
> Index: linux.trees.git/arch/powerpc/platforms/pseries/firmware.c
> ===================================================================
> --- linux.trees.git.orig/arch/powerpc/platforms/pseries/firmware.c	2009-10-12 18:08:37.000000000 +1100
> +++ linux.trees.git/arch/powerpc/platforms/pseries/firmware.c	2009-10-12 18:12:54.000000000 +1100
> @@ -51,11 +51,10 @@ firmware_features_table[FIRMWARE_MAX_FEA
>  	{FW_FEATURE_VIO,		"hcall-vio"},
>  	{FW_FEATURE_RDMA,		"hcall-rdma"},
>  	{FW_FEATURE_LLAN,		"hcall-lLAN"},
> -	{FW_FEATURE_BULK,		"hcall-bulk"},
> +	{FW_FEATURE_BULK_REMOVE,	"hcall-bulk"},
>  	{FW_FEATURE_XDABR,		"hcall-xdabr"},
>  	{FW_FEATURE_MULTITCE,		"hcall-multi-tce"},
>  	{FW_FEATURE_SPLPAR,		"hcall-splpar"},
> -	{FW_FEATURE_BULK_REMOVE,	"hcall-bulk"},
>  };
>  
>  /* Build up the firmware features bitmask using the contents of

^ permalink raw reply

* [PATCH] powerpc: Fix hypervisor TLB batching
From: Anton Blanchard @ 2009-10-12  7:47 UTC (permalink / raw)
  To: benh; +Cc: linuxppc-dev


Profiling of a page fault scalability microbenchmark shows flush_hash_range
is not calling the batch hpte invalidate hcall (H_BULK_REMOVE).

It turns out we have a duplicate firmware feature for hcall-bulk and the
current setup code stops after finding the first match. This meant we never
batch and always do individual invalidates.

The patch below removes the duplicate and shifts FW_FEATURE_CMO to close
the gap. With the patch applied the single threaded page fault rate improves
from 217169 to 238755 per second on a POWER5 test box, a 10% improvement.

Signed-off-by: Anton Blanchard <anton@samba.org>
---

Index: linux.trees.git/arch/powerpc/include/asm/firmware.h
===================================================================
--- linux.trees.git.orig/arch/powerpc/include/asm/firmware.h	2009-10-12 18:11:56.000000000 +1100
+++ linux.trees.git/arch/powerpc/include/asm/firmware.h	2009-10-12 18:21:20.000000000 +1100
@@ -37,7 +37,7 @@
 #define FW_FEATURE_VIO		ASM_CONST(0x0000000000004000)
 #define FW_FEATURE_RDMA		ASM_CONST(0x0000000000008000)
 #define FW_FEATURE_LLAN		ASM_CONST(0x0000000000010000)
-#define FW_FEATURE_BULK		ASM_CONST(0x0000000000020000)
+#define FW_FEATURE_BULK_REMOVE	ASM_CONST(0x0000000000020000)
 #define FW_FEATURE_XDABR	ASM_CONST(0x0000000000040000)
 #define FW_FEATURE_MULTITCE	ASM_CONST(0x0000000000080000)
 #define FW_FEATURE_SPLPAR	ASM_CONST(0x0000000000100000)
@@ -45,8 +45,7 @@
 #define FW_FEATURE_LPAR		ASM_CONST(0x0000000000400000)
 #define FW_FEATURE_PS3_LV1	ASM_CONST(0x0000000000800000)
 #define FW_FEATURE_BEAT		ASM_CONST(0x0000000001000000)
-#define FW_FEATURE_BULK_REMOVE	ASM_CONST(0x0000000002000000)
-#define FW_FEATURE_CMO		ASM_CONST(0x0000000004000000)
+#define FW_FEATURE_CMO		ASM_CONST(0x0000000002000000)
 
 #ifndef __ASSEMBLY__
 
@@ -58,8 +57,9 @@ enum {
 		FW_FEATURE_PERF | FW_FEATURE_DUMP | FW_FEATURE_INTERRUPT |
 		FW_FEATURE_MIGRATE | FW_FEATURE_PERFMON | FW_FEATURE_CRQ |
 		FW_FEATURE_VIO | FW_FEATURE_RDMA | FW_FEATURE_LLAN |
-		FW_FEATURE_BULK | FW_FEATURE_XDABR | FW_FEATURE_MULTITCE |
-		FW_FEATURE_SPLPAR | FW_FEATURE_LPAR | FW_FEATURE_CMO,
+		FW_FEATURE_BULK_REMOVE | FW_FEATURE_XDABR |
+		FW_FEATURE_MULTITCE | FW_FEATURE_SPLPAR | FW_FEATURE_LPAR |
+		FW_FEATURE_CMO,
 	FW_FEATURE_PSERIES_ALWAYS = 0,
 	FW_FEATURE_ISERIES_POSSIBLE = FW_FEATURE_ISERIES | FW_FEATURE_LPAR,
 	FW_FEATURE_ISERIES_ALWAYS = FW_FEATURE_ISERIES | FW_FEATURE_LPAR,
Index: linux.trees.git/arch/powerpc/platforms/pseries/firmware.c
===================================================================
--- linux.trees.git.orig/arch/powerpc/platforms/pseries/firmware.c	2009-10-12 18:08:37.000000000 +1100
+++ linux.trees.git/arch/powerpc/platforms/pseries/firmware.c	2009-10-12 18:12:54.000000000 +1100
@@ -51,11 +51,10 @@ firmware_features_table[FIRMWARE_MAX_FEA
 	{FW_FEATURE_VIO,		"hcall-vio"},
 	{FW_FEATURE_RDMA,		"hcall-rdma"},
 	{FW_FEATURE_LLAN,		"hcall-lLAN"},
-	{FW_FEATURE_BULK,		"hcall-bulk"},
+	{FW_FEATURE_BULK_REMOVE,	"hcall-bulk"},
 	{FW_FEATURE_XDABR,		"hcall-xdabr"},
 	{FW_FEATURE_MULTITCE,		"hcall-multi-tce"},
 	{FW_FEATURE_SPLPAR,		"hcall-splpar"},
-	{FW_FEATURE_BULK_REMOVE,	"hcall-bulk"},
 };
 
 /* Build up the firmware features bitmask using the contents of

^ permalink raw reply

* Re: [PATCH 6/8] 8xx: Add missing Guarded setting in DTLB Error.
From: Joakim Tjernlund @ 2009-10-12  6:59 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: Scott Wood, linuxppc-dev@ozlabs.org, Rex Feany
In-Reply-To: <1255326374.2192.112.camel@pasglop>

Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote on 12/10/2009 07:46:14:
>
> On Mon, 2009-10-12 at 07:36 +0200, Joakim Tjernlund wrote:
> > Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote on 12/10/2009 00:44:56:
> > > On Mon, 2009-10-12 at 00:19 +0200, Joakim Tjernlund wrote:
> > > >
> > > > I hear you, I can remove DTLB error with an add on patch later if that is OK?
> > > > I cannot remove the DARFix though, when I move that to do_page_fault(), I get
> > > > duplicate TLB hits on the same insn. It is like when transfer_to_handler()
> > > > executes rfi, the cpu restarts the the faulting insn instead of jumping
> > > > to the page fault handler, not always but often.
> > >
> > > I'm not sure what you mean here ...
> >
> > Just that I need to keep the DAR fix for dcbX insn in the DTLB handler. If I try
> > to move it to do_page_fault() I get a lot more DTLB errors for dcbX insn.
>
> I'm not sure why (ie, I didn't get your explanation about rfi and
> restarting the faulting insn etc...) but ok, I don't mind having
> the DAR fixup remain in the asm. It's the whole logic that looks
> at the PTE and does things with it that I feel has no room in there :-)

OK, I will send a removal patch too.

>
> BTW. Maybe the do_page_fault() thing comes from the fact that we
> also go there via ITLB Error which doesn't set the DAR and
> that's normal ?

I had that idea to, but no. These have different traps numbers and
address is set to SRR0 for ITLB and to DAR for DTLB. I test for DTLB
trap number before doing anything.

^ permalink raw reply

* [PATCH] 8xx: Remove DIRTY pte handling in DTLB Error.
From: Joakim Tjernlund @ 2009-10-12  7:03 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, linuxppc-dev@ozlabs.org, Rex Feany,
	Scott Wood

There is no need to do set the DIRTY bit directly in DTLB Error.
Trap to do_page_fault() and let the generic MM code do the work.
---

Ben, here it is :)

 arch/powerpc/kernel/head_8xx.S |   95 ----------------------------------------
 1 files changed, 0 insertions(+), 95 deletions(-)

diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index db5207e..cb94f46 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -500,92 +500,6 @@ DataTLBError:
 	cmpwi	cr0, r10, 0x00f0
 	beq-	FixDAR	/* must be a buggy dcbX, icbi insn. */
 DARFix:	/* Return from dcbx instruction bug workaround, r10 holds value of DAR */
-	mfspr	r11, SPRN_DSISR
-	/* As the DAR fixup may clear store we may have all 3 states zero.
-	 * Make sure only 0x0200(store) falls down into DIRTY handling
-	 */
-	andis.	r11, r11, 0x4a00	/* !translation, protection or store */
-	srwi	r11, r11, 16
-	cmpwi	cr0, r11, 0x0200	/* just store ? */
-	bne	2f
-	/* Only Change bit left now, do it here as it is faster
-	 * than trapping to the C fault handler.
-	*/
-
-	/* The EA of a data TLB miss is automatically stored in the MD_EPN
-	 * register.  The EA of a data TLB error is automatically stored in
-	 * the DAR, but not the MD_EPN register.  We must copy the 20 most
-	 * significant bits of the EA from the DAR to MD_EPN before we
-	 * start walking the page tables.  We also need to copy the CASID
-	 * value from the M_CASID register.
-	 * Addendum:  The EA of a data TLB error is _supposed_ to be stored
-	 * in DAR, but it seems that this doesn't happen in some cases, such
-	 * as when the error is due to a dcbi instruction to a page with a
-	 * TLB that doesn't have the changed bit set.  In such cases, there
-	 * does not appear to be any way  to recover the EA of the error
-	 * since it is neither in DAR nor MD_EPN.  As a workaround, the
-	 * _PAGE_HWWRITE bit is set for all kernel data pages when the PTEs
-	 * are initialized in mapin_ram().  This will avoid the problem,
-	 * assuming we only use the dcbi instruction on kernel addresses.
-	 */
-
-	/* DAR is in r10 already */
-	rlwinm	r11, r10, 0, 0, 19
-	ori	r11, r11, MD_EVALID
-	mfspr	r10, SPRN_M_CASID
-	rlwimi	r11, r10, 0, 28, 31
-	DO_8xx_CPU6(0x3780, r3)
-	mtspr	SPRN_MD_EPN, r11
-
-	mfspr	r10, SPRN_M_TWB	/* Get level 1 table entry address */
-
-	/* If we are faulting a kernel address, we have to use the
-	 * kernel page tables.
-	 */
-	andi.	r11, r10, 0x0800
-	beq	3f
-	lis	r11, swapper_pg_dir@h
-	ori	r11, r11, swapper_pg_dir@l
-	rlwimi	r10, r11, 0, 2, 19
-3:
-	lwz	r11, 0(r10)	/* Get the level 1 entry */
-	rlwinm.	r10, r11,0,0,19	/* Extract page descriptor page address */
-	beq	2f		/* If zero, bail */
-
-	/* We have a pte table, so fetch the pte from the table.
-	 */
-	ori	r11, r11, 1		/* Set valid bit in physical L2 page */
-	DO_8xx_CPU6(0x3b80, r3)
-	mtspr	SPRN_MD_TWC, r11	/* Load pte table base address */
-	mfspr	r10, SPRN_MD_TWC	/* ....and get the pte address */
-	lwz	r10, 0(r10)		/* Get the pte */
-	/* Insert the Guarded flag into the TWC from the Linux PTE.
-	 * It is bit 27 of both the Linux PTE and the TWC
-	 */
-	rlwimi	r11, r10, 0, 27, 27
-	/* Insert the WriteThru flag into the TWC from the Linux PTE.
-	 * It is bit 25 in the Linux PTE and bit 30 in the TWC
-	 */
-	rlwimi	r11, r10, 32-5, 30, 30
-	DO_8xx_CPU6(0x3b80, r3)
-	mtspr	SPRN_MD_TWC, r11
-	mfspr	r11, SPRN_MD_TWC	/* get the pte address again */
-
-	ori	r10, r10, _PAGE_DIRTY | _PAGE_ACCESSED | _PAGE_HWWRITE
-	stw	r10, 0(r11)		/* and update pte in table */
-	xori	r10, r10, _PAGE_RW	/* RW bit is inverted */
-
-	/* The Linux PTE won't go exactly into the MMU TLB.
-	 * Software indicator bits 22 and 28 must be clear.
-	 * Software indicator bits 24, 25, 26, and 27 must be
-	 * set.  All other Linux PTE bits control the behavior
-	 * of the MMU.
-	 */
-	li	r11, 0x00f0
-	mtspr	SPRN_DAR,r11	/* Tag DAR */
-	rlwimi	r10, r11, 0, 24, 28	/* Set 24-27, clear 28 */
-	DO_8xx_CPU6(0x3d80, r3)
-	mtspr	SPRN_MD_RPN, r10	/* Update TLB entry */
 
 	mfspr	r10, SPRN_M_TW	/* Restore registers */
 	lwz	r11, 0(r0)
@@ -594,15 +508,6 @@ DARFix:	/* Return from dcbx instruction bug workaround, r10 holds value of DAR *
 #ifdef CONFIG_8xx_CPU6
 	lwz	r3, 8(r0)
 #endif
-	rfi
-2:
-	mfspr	r10, SPRN_M_TW	/* Restore registers */
-	lwz	r11, 0(r0)
-	mtcr	r11
-	lwz	r11, 4(r0)
-#ifdef CONFIG_8xx_CPU6
-	lwz	r3, 8(r0)
-#endif
 	b	DataAccess
 
 	EXCEPTION(0x1500, Trap_15, unknown_exception, EXC_XFER_EE)
-- 
1.6.4.4

^ permalink raw reply related

* Re: [PATCH 6/8] 8xx: Add missing Guarded setting in DTLB Error.
From: Benjamin Herrenschmidt @ 2009-10-12  5:46 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: Scott Wood, linuxppc-dev@ozlabs.org, Rex Feany
In-Reply-To: <OFE59DA813.A2491F26-ONC125764D.001E597A-C125764D.001EC8E7@transmode.se>

On Mon, 2009-10-12 at 07:36 +0200, Joakim Tjernlund wrote:
> Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote on 12/10/2009 00:44:56:
> > On Mon, 2009-10-12 at 00:19 +0200, Joakim Tjernlund wrote:
> > >
> > > I hear you, I can remove DTLB error with an add on patch later if that is OK?
> > > I cannot remove the DARFix though, when I move that to do_page_fault(), I get
> > > duplicate TLB hits on the same insn. It is like when transfer_to_handler()
> > > executes rfi, the cpu restarts the the faulting insn instead of jumping
> > > to the page fault handler, not always but often.
> >
> > I'm not sure what you mean here ...
> 
> Just that I need to keep the DAR fix for dcbX insn in the DTLB handler. If I try
> to move it to do_page_fault() I get a lot more DTLB errors for dcbX insn.

I'm not sure why (ie, I didn't get your explanation about rfi and
restarting the faulting insn etc...) but ok, I don't mind having
the DAR fixup remain in the asm. It's the whole logic that looks
at the PTE and does things with it that I feel has no room in there :-)

BTW. Maybe the do_page_fault() thing comes from the fact that we
also go there via ITLB Error which doesn't set the DAR and
that's normal ?

Cheers,
Ben.

^ permalink raw reply

* Re: [PATCH 6/8] 8xx: Add missing Guarded setting in DTLB Error.
From: Joakim Tjernlund @ 2009-10-12  5:36 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: Scott Wood, linuxppc-dev@ozlabs.org, Rex Feany
In-Reply-To: <1255301096.2192.44.camel@pasglop>

Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote on 12/10/2009 00:44:56:
> On Mon, 2009-10-12 at 00:19 +0200, Joakim Tjernlund wrote:
> >
> > I hear you, I can remove DTLB error with an add on patch later if that is OK?
> > I cannot remove the DARFix though, when I move that to do_page_fault(), I get
> > duplicate TLB hits on the same insn. It is like when transfer_to_handler()
> > executes rfi, the cpu restarts the the faulting insn instead of jumping
> > to the page fault handler, not always but often.
>
> I'm not sure what you mean here ...

Just that I need to keep the DAR fix for dcbX insn in the DTLB handler. If I try
to move it to do_page_fault() I get a lot more DTLB errors for dcbX insn.

 Jocke

^ permalink raw reply

* Re: [PATCH 2/2][v3] powerpc: Make the CMM memory hotplug aware
From: Benjamin Herrenschmidt @ 2009-10-12  5:06 UTC (permalink / raw)
  To: Robert Jennings
  Cc: linux-mm, Mel Gorman, Gerald Schaefer, linux-kernel, linuxppc-dev,
	Martin Schwidefsky, Badari Pulavarty, Brian King, Paul Mackerras,
	Andrew Morton, Ingo Molnar, KAMEZAWA Hiroyuki
In-Reply-To: <20091009204126.GD19114@austin.ibm.com>

On Fri, 2009-10-09 at 15:41 -0500, Robert Jennings wrote:
> The Collaborative Memory Manager (CMM) module allocates individual pages
> over time that are not migratable.  On a long running system this can
> severely impact the ability to find enough pages to support a hotplug
> memory remove operation.
> 
> This patch adds a memory isolation notifier and a memory hotplug notifier.
> The memory isolation notifier will return the number of pages found
> in the range specified.  This is used to determine if all of the used
> pages in a pageblock are owned by the balloon (or other entities in
> the notifier chain).  The hotplug notifier will free pages in the range
> which is to be removed.  The priority of this hotplug notifier is low
> so that it will be called near last, this helps avoids removing loaned
> pages in operations that fail due to other handlers.
> 
> CMM activity will be halted when hotplug remove operations are active
> and resume activity after a delay period to allow the hypervisor time
> to adjust.
> 
> Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>

Do you need me to merge that via the powerpc tree after the relevant
generic parts go in ? This is 2.6.33 material ?

> +module_param_named(hotplug_delay, hotplug_delay, uint, S_IRUGO | S_IWUSR);
> +MODULE_PARM_DESC(delay, "Delay (in seconds) after memory hotplug remove "
> +		 "before activity resumes. "
> +		 "[Default=" __stringify(CMM_HOTPLUG_DELAY) "]");

What is the above ? That sounds scary :-)

>  module_param_named(oom_kb, oom_kb, uint, S_IRUGO | S_IWUSR);
>  MODULE_PARM_DESC(oom_kb, "Amount of memory in kb to free on OOM. "
>  		 "[Default=" __stringify(CMM_OOM_KB) "]");
> @@ -88,6 +101,8 @@ struct cmm_page_array {
>  static unsigned long loaned_pages;
>  static unsigned long loaned_pages_target;
>  static unsigned long oom_freed_pages;
> +static atomic_t hotplug_active = ATOMIC_INIT(0);
> +static atomic_t hotplug_occurred = ATOMIC_INIT(0);

That sounds like a hand made lock with atomics... rarely a good idea,
tends to miss appropriate barriers etc...
 
>  static struct cmm_page_array *cmm_page_list;
>  static DEFINE_SPINLOCK(cmm_lock);
> @@ -110,6 +125,9 @@ static long cmm_alloc_pages(long nr)
>  	cmm_dbg("Begin request for %ld pages\n", nr);
>  
>  	while (nr) {
> +		if (atomic_read(&hotplug_active))
> +			break;
> +

Ok so I'm not familiar with that whole memory hotplug stuff, so the code
might be right, but wouldn't the above be racy anyways in case hotplug
just becomes active after this statement ?

Shouldn't you use a mutex_trylock instead ? That has clearer semantics
and will provide the appropriate memory barriers.

>  		addr = __get_free_page(GFP_NOIO | __GFP_NOWARN |
>  				       __GFP_NORETRY | __GFP_NOMEMALLOC);
>  		if (!addr)
> @@ -119,8 +137,10 @@ static long cmm_alloc_pages(long nr)
>  		if (!pa || pa->index >= CMM_NR_PAGES) {
>  			/* Need a new page for the page list. */
>  			spin_unlock(&cmm_lock);
> -			npa = (struct cmm_page_array *)__get_free_page(GFP_NOIO | __GFP_NOWARN |
> -								       __GFP_NORETRY | __GFP_NOMEMALLOC);
> +			npa = (struct cmm_page_array *)__get_free_page(
> +					GFP_NOIO | __GFP_NOWARN |
> +					__GFP_NORETRY | __GFP_NOMEMALLOC |
> +					__GFP_MOVABLE);
>  			if (!npa) {
>  				pr_info("%s: Can not allocate new page list\n", __func__);
>  				free_page(addr);
> @@ -273,9 +293,23 @@ static int cmm_thread(void *dummy)
>  	while (1) {
>  		timeleft = msleep_interruptible(delay * 1000);
> 
> -		if (kthread_should_stop() || timeleft) {
> -			loaned_pages_target = loaned_pages;
> +		if (kthread_should_stop() || timeleft)
>  			break;
> +
> +		if (atomic_read(&hotplug_active)) {
> +			cmm_dbg("Hotplug operation in progress, activity "
> +					"suspended\n");
> +			continue;
> +		}
> +
> +		if (atomic_dec_if_positive(&hotplug_occurred) >= 0) {
> +			cmm_dbg("Hotplug operation has occurred, loaning "
> +					"activity suspended for %d seconds.\n",
> +					hotplug_delay);
> +			timeleft = msleep_interruptible(hotplug_delay * 1000);
> +			if (kthread_should_stop() || timeleft)
> +				break;
> +			continue;
>  		}

I have less problems with hotplug_occured but if you use a
mutex_trylock, overall, you can turn the above into a normal int instead
of an atomic.

 ../..

> +static int cmm_memory_cb(struct notifier_block *self,
> +			unsigned long action, void *arg)
> +{
> +	int ret = 0;
> +
> +	switch (action) {
> +	case MEM_GOING_OFFLINE:
> +		atomic_set(&hotplug_active, 1);

So that would become a mutex_lock(). Added advantage is that
it would wait for a current CMM operation to complete.
 
Cheers,
Ben.

^ permalink raw reply

* Re: [PATCH 7/8] 8xx: Restore _PAGE_WRITETHRU
From: Benjamin Herrenschmidt @ 2009-10-11 22:45 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: Scott Wood, linuxppc-dev@ozlabs.org, Rex Feany
In-Reply-To: <OF0664CEC2.C4547EAA-ONC125764C.007ABA27-C125764C.007AD248@transmode.se>

On Mon, 2009-10-12 at 00:21 +0200, Joakim Tjernlund wrote:
> > I think I've replaced _PAGE_EXEC with _PAGE_SPECIAL already
> > upstream since _PAGE_EXEC is unused on 8xx.
> 
> What is page SPECIAL anyway? 

It's used on newer kernels to indicates PTEs that map something that
isn't backed by a struct page (ie, not memory)

Cheers,
Ben.

^ permalink raw reply

* Re: [PATCH 6/8] 8xx: Add missing Guarded setting in DTLB Error.
From: Benjamin Herrenschmidt @ 2009-10-11 22:44 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: Scott Wood, linuxppc-dev@ozlabs.org, Rex Feany
In-Reply-To: <OF67239BF2.43007548-ONC125764C.007A17D3-C125764C.007A9BE3@transmode.se>

On Mon, 2009-10-12 at 00:19 +0200, Joakim Tjernlund wrote:
> 
> I hear you, I can remove DTLB error with an add on patch later if that is OK?
> I cannot remove the DARFix though, when I move that to do_page_fault(), I get
> duplicate TLB hits on the same insn. It is like when transfer_to_handler()
> executes rfi, the cpu restarts the the faulting insn instead of jumping
> to the page fault handler, not always but often.

I'm not sure what you mean here ...

Ben.

^ permalink raw reply

* Re: [PATCH 7/8] 8xx: Restore _PAGE_WRITETHRU
From: Joakim Tjernlund @ 2009-10-11 22:21 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: Scott Wood, linuxppc-dev@ozlabs.org, Rex Feany
In-Reply-To: <1255296387.2192.43.camel@pasglop>

Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote on 11/10/2009 23:26:27:
>
> On Sun, 2009-10-11 at 18:35 +0200, Joakim Tjernlund wrote:
> > 8xx has not had WRITETHRU due to lack of bits in the pte.
> > After the recent rewrite of the 8xx TLB code, there are
> > two bits left. Use one of them to WRITETHRU.
> >
> > Perhaps use the last SW bit to PAGE_SPECIAL or PAGE_FILE?
>
> _PAGE_FILE can already overwrite other bits as it's only set
> when !present, and should pretty much always be 0x2

OK.

>
> I think I've replaced _PAGE_EXEC with _PAGE_SPECIAL already
> upstream since _PAGE_EXEC is unused on 8xx.

What is page SPECIAL anyway?

^ permalink raw reply

* Re: [PATCH 6/8] 8xx: Add missing Guarded setting in DTLB Error.
From: Joakim Tjernlund @ 2009-10-11 22:19 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: Scott Wood, linuxppc-dev@ozlabs.org, Rex Feany
In-Reply-To: <1255296330.2192.42.camel@pasglop>

Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote on 11/10/2009 23:25:30:
>
> On Sun, 2009-10-11 at 18:35 +0200, Joakim Tjernlund wrote:
> > only DTLB Miss did set this bit, DTLB Error needs too otherwise
> > the setting is lost when the page becomes dirty.
>
> Easier fix: Stop doing thing in DTLB Error
>
> Ben.

I hear you, I can remove DTLB error with an add on patch later if that is OK?
I cannot remove the DARFix though, when I move that to do_page_fault(), I get
duplicate TLB hits on the same insn. It is like when transfer_to_handler()
executes rfi, the cpu restarts the the faulting insn instead of jumping
to the page fault handler, not always but often.

 Jocke

^ permalink raw reply

* Re: [PATCH 7/8] 8xx: Restore _PAGE_WRITETHRU
From: Benjamin Herrenschmidt @ 2009-10-11 21:26 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: Scott Wood, linuxppc-dev@ozlabs.org, Rex Feany
In-Reply-To: <1255278912-8042-8-git-send-email-Joakim.Tjernlund@transmode.se>

On Sun, 2009-10-11 at 18:35 +0200, Joakim Tjernlund wrote:
> 8xx has not had WRITETHRU due to lack of bits in the pte.
> After the recent rewrite of the 8xx TLB code, there are
> two bits left. Use one of them to WRITETHRU.
> 
> Perhaps use the last SW bit to PAGE_SPECIAL or PAGE_FILE?

_PAGE_FILE can already overwrite other bits as it's only set
when !present, and should pretty much always be 0x2

I think I've replaced _PAGE_EXEC with _PAGE_SPECIAL already
upstream since _PAGE_EXEC is unused on 8xx.

Cheers,
Ben.

> ---
>  arch/powerpc/include/asm/pte-8xx.h |    5 +++--
>  arch/powerpc/kernel/head_8xx.S     |    8 ++++++++
>  2 files changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/pte-8xx.h b/arch/powerpc/include/asm/pte-8xx.h
> index f23cd15..9349d83 100644
> --- a/arch/powerpc/include/asm/pte-8xx.h
> +++ b/arch/powerpc/include/asm/pte-8xx.h
> @@ -34,12 +34,13 @@
>  #define _PAGE_SHARED	0x0004	/* No ASID (context) compare */
>  #define _PAGE_DIRTY	0x0100	/* C: page changed */
>  
> -/* These 3 software bits must be masked out when the entry is loaded
> - * into the TLB, 2 SW bits left.
> +/* These 4 software bits must be masked out when the entry is loaded
> + * into the TLB, 1 SW bit left(0x0080).
>   */
>  #define _PAGE_EXEC	0x0008	/* software: i-cache coherency required */
>  #define _PAGE_GUARDED	0x0010	/* software: guarded access */
>  #define _PAGE_ACCESSED	0x0020	/* software: page referenced */
> +#define _PAGE_WRITETHRU	0x0040	/* software: caching is write through */
>  
>  /* Setting any bits in the nibble with the follow two controls will
>   * require a TLB exception handler change.  It is assumed unused bits
> diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
> index 371b606..db5207e 100644
> --- a/arch/powerpc/kernel/head_8xx.S
> +++ b/arch/powerpc/kernel/head_8xx.S
> @@ -422,6 +422,10 @@ DataStoreTLBMiss:
>  	 * above.
>  	 */
>  	rlwimi	r11, r10, 0, 27, 27
> +	/* Insert the WriteThru flag into the TWC from the Linux PTE.
> +	 * It is bit 25 in the Linux PTE and bit 30 in the TWC
> +	 */
> +	rlwimi	r11, r10, 32-5, 30, 30
>  	DO_8xx_CPU6(0x3b80, r3)
>  	mtspr	SPRN_MD_TWC, r11
>  
> @@ -559,6 +563,10 @@ DARFix:	/* Return from dcbx instruction bug workaround, r10 holds value of DAR *
>  	 * It is bit 27 of both the Linux PTE and the TWC
>  	 */
>  	rlwimi	r11, r10, 0, 27, 27
> +	/* Insert the WriteThru flag into the TWC from the Linux PTE.
> +	 * It is bit 25 in the Linux PTE and bit 30 in the TWC
> +	 */
> +	rlwimi	r11, r10, 32-5, 30, 30
>  	DO_8xx_CPU6(0x3b80, r3)
>  	mtspr	SPRN_MD_TWC, r11
>  	mfspr	r11, SPRN_MD_TWC	/* get the pte address again */

^ permalink raw reply

* Re: [PATCH 6/8] 8xx: Add missing Guarded setting in DTLB Error.
From: Benjamin Herrenschmidt @ 2009-10-11 21:25 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: Scott Wood, linuxppc-dev@ozlabs.org, Rex Feany
In-Reply-To: <1255278912-8042-7-git-send-email-Joakim.Tjernlund@transmode.se>

On Sun, 2009-10-11 at 18:35 +0200, Joakim Tjernlund wrote:
> only DTLB Miss did set this bit, DTLB Error needs too otherwise
> the setting is lost when the page becomes dirty.

Easier fix: Stop doing thing in DTLB Error

Ben.

> --
>  arch/powerpc/kernel/head_8xx.S |   13 ++++++++++---
>  1 files changed, 10 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
> index 027856e..371b606 100644
> --- a/arch/powerpc/kernel/head_8xx.S
> +++ b/arch/powerpc/kernel/head_8xx.S
> @@ -552,9 +552,16 @@ DARFix:	/* Return from dcbx instruction bug workaround, r10 holds value of DAR *
>  	 */
>  	ori	r11, r11, 1		/* Set valid bit in physical L2 page */
>  	DO_8xx_CPU6(0x3b80, r3)
> -	mtspr	SPRN_MD_TWC, r11		/* Load pte table base address */
> -	mfspr	r11, SPRN_MD_TWC		/* ....and get the pte address */
> -	lwz	r10, 0(r11)		/* Get the pte */
> +	mtspr	SPRN_MD_TWC, r11	/* Load pte table base address */
> +	mfspr	r10, SPRN_MD_TWC	/* ....and get the pte address */
> +	lwz	r10, 0(r10)		/* Get the pte */
> +	/* Insert the Guarded flag into the TWC from the Linux PTE.
> +	 * It is bit 27 of both the Linux PTE and the TWC
> +	 */
> +	rlwimi	r11, r10, 0, 27, 27
> +	DO_8xx_CPU6(0x3b80, r3)
> +	mtspr	SPRN_MD_TWC, r11
> +	mfspr	r11, SPRN_MD_TWC	/* get the pte address again */
>  
>  	ori	r10, r10, _PAGE_DIRTY | _PAGE_ACCESSED | _PAGE_HWWRITE
>  	stw	r10, 0(r11)		/* and update pte in table */

^ permalink raw reply

* Re: [PATCH 2/8] 8xx: Update TLB asm so it behaves as linux mm expects.
From: Benjamin Herrenschmidt @ 2009-10-11 21:25 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: Scott Wood, linuxppc-dev@ozlabs.org, Rex Feany
In-Reply-To: <1255278912-8042-3-git-send-email-Joakim.Tjernlund@transmode.se>

On Sun, 2009-10-11 at 18:35 +0200, Joakim Tjernlund wrote:
> @@ -522,26 +539,12 @@ DataTLBError:

I still think you should remove everything from DataTLBError, it's
just plain useless :-)

Ben.

>         mfspr   r11, SPRN_MD_TWC                /* ....and get the pte
> address */
>         lwz     r10, 0(r11)             /* Get the pte */
>  
> -       andi.   r11, r10, _PAGE_RW      /* Is it writeable? */
> -       beq     2f                      /* Bail out if not */
> -
> -       /* Update 'changed', among others.
> -       */
> -#ifdef CONFIG_SWAP
> -       ori     r10, r10, _PAGE_DIRTY|_PAGE_HWWRITE
> -       /* do not set the _PAGE_ACCESSED bit of a non-present page */
> -       andi.   r11, r10, _PAGE_PRESENT
> -       beq     4f
> -       ori     r10, r10, _PAGE_ACCESSED
> -4:
> -#else
> -       ori     r10, r10, _PAGE_DIRTY|_PAGE_ACCESSED|_PAGE_HWWRITE
> -#endif
> -       mfspr   r11, SPRN_MD_TWC                /* Get pte address
> again */
> +       ori     r10, r10, _PAGE_DIRTY | _PAGE_ACCESSED | _PAGE_HWWRITE
>         stw     r10, 0(r11)             /* and update pte in table */
> +       xori    r10, r10, _PAGE_RW      /* RW bit is inverted */
>  
>         /* The Linux PTE won't go exactly into the MMU TLB.
> -        * Software indicator bits 21, 22 and 28 must be clear.
> +        * Software indicator bits 22 and 28 must be clear.
>          * Software indicator bits 24, 25, 26, and 27 must be
>          * set.  All other Linux PTE bits control the behavior
>          * of the MMU.
> -- 
> 1.6.4.4
> 
> 

^ permalink raw reply

* [PATCH 5/8] 8xx: dcbst sets store bit in DTLB error, workaround.
From: Joakim Tjernlund @ 2009-10-11 16:35 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, linuxppc-dev@ozlabs.org, Rex Feany,
	Scott Wood
In-Reply-To: <1255278912-8042-5-git-send-email-Joakim.Tjernlund@transmode.se>

dcbst should not set the store bit(bit 6, DSISR) when
trapping into a DTLB Error. Clear this bit while doing
the dcbX missing DAR workaround.
---
 arch/powerpc/kernel/head_8xx.S |   34 +++++++++++++++++++++++++++++++---
 1 files changed, 31 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 9839e79..027856e 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -496,10 +496,14 @@ DataTLBError:
 	cmpwi	cr0, r10, 0x00f0
 	beq-	FixDAR	/* must be a buggy dcbX, icbi insn. */
 DARFix:	/* Return from dcbx instruction bug workaround, r10 holds value of DAR */
-
 	mfspr	r11, SPRN_DSISR
-	andis.	r11, r11, 0x4800	/* !translation or protection */
-	bne	2f	/* branch if either is set */
+	/* As the DAR fixup may clear store we may have all 3 states zero.
+	 * Make sure only 0x0200(store) falls down into DIRTY handling
+	 */
+	andis.	r11, r11, 0x4a00	/* !translation, protection or store */
+	srwi	r11, r11, 16
+	cmpwi	cr0, r11, 0x0200	/* just store ? */
+	bne	2f
 	/* Only Change bit left now, do it here as it is faster
 	 * than trapping to the C fault handler.
 	*/
@@ -632,6 +636,30 @@ FixDAR:	/* Entry point for dcbx workaround. */
 	tophys  (r11, r10)
 	beq-	139b		/* Branch if user space address */
 140:	lwz	r11,0(r11)
+/* Check if it really is a dcbx instruction. */
+/* dcbt and dcbtst does not generate DTLB Misses/Errors,
+ * no need to include them here */
+	srwi	r10, r11, 26	/* check if major OP code is 31 */
+	cmpwi	cr0, r10, 31
+	bne-	141f
+	rlwinm	r10, r11, 0, 21, 30
+	cmpwi	cr0, r10, 2028	/* Is dcbz? */
+	beq+	142f
+	cmpwi	cr0, r10, 940	/* Is dcbi? */
+	beq+	142f
+	cmpwi	cr0, r10, 108	/* Is dcbst? */
+	beq+	144f		/* Fix up store bit! */
+	cmpwi	cr0, r10, 172	/* Is dcbf? */
+	beq+	142f
+	cmpwi	cr0, r10, 1964	/* Is icbi? */
+	beq+	142f
+141:	mfspr	r10, SPRN_DAR	/* r10 must hold DAR at exit */
+	b	DARfix		/* Nope, go back to normal TLB processing */
+
+144:	mfspr	r10, SPRN_DSISR
+	rlwinm	r10, r10,0,7,5	/* Clear store bit for buggy dcbst insn */
+	mtspr	SPRN_DSISR, r10
+142:	/* continue, it was a dcbx, dcbi instruction. */
 #ifdef CONFIG_8xx_CPU6
 	lwz	r3, 8(r0)	/* restore r3 from memory */
 #endif
-- 
1.6.4.4

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox