LinuxPPC-Dev Archive on lore.kernel.org

LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH v2 1/1] powerpc/85xx: Board support for ppa8548
From: Stef van Os @ 2013-02-03 19:39 UTC (permalink / raw)
  To: Kumar Gala, linuxppc-dev
  Cc: Scott Wood, Paul Mackerras, Timur Tabi, Stef van Os

Initial board support for the Prodrive PPA8548 AMC module. Board
is an MPC8548 AMC platform used in RapidIO systems. This module is
also used to test/work on mainline linux RapidIO software.

PPA8548 overview:
- 1.3 GHz Freescale PowerQUICC III MPC8548 processor
- 1 GB DDR2 @ 266 MHz
- 8 MB NOR flash
- Serial RapidIO 1.2
- 1 x 10/100/1000 BASE-T front ethernet
- 1 x 1000 BASE-BX ethernet on AMC connector

Signed-off-by: Stef van Os <stef.van.os@prodrive.nl>
---
 arch/powerpc/boot/dts/ppa8548.dts           |  163 +++++++++++++++++++++++++++
 arch/powerpc/configs/85xx/ppa8548_defconfig |   65 +++++++++++
 arch/powerpc/platforms/85xx/Kconfig         |    7 +
 arch/powerpc/platforms/85xx/Makefile        |    1 +
 arch/powerpc/platforms/85xx/ppa8548.c       |   98 ++++++++++++++++
 5 files changed, 334 insertions(+), 0 deletions(-)
 create mode 100644 arch/powerpc/boot/dts/ppa8548.dts
 create mode 100644 arch/powerpc/configs/85xx/ppa8548_defconfig
 create mode 100644 arch/powerpc/platforms/85xx/ppa8548.c

diff --git a/arch/powerpc/boot/dts/ppa8548.dts b/arch/powerpc/boot/dts/ppa8548.dts
new file mode 100644
index 0000000..b5f5220
--- /dev/null
+++ b/arch/powerpc/boot/dts/ppa8548.dts
@@ -0,0 +1,163 @@
+/*
+ * PPA8548 Device Tree Source (36-bit address map)
+ * Copyright 2013 Prodrive B.V.
+ *
+ * Based on:
+ * MPC8548 CDS Device Tree Source (36-bit address map)
+ * Copyright 2012 Freescale Semiconductor Inc.
+ *
+ * This program is free software; you can redistribute  it and/or modify it
+ * under  the terms of  the GNU General  Public License as published by the
+ * Free Software Foundation;  either version 2 of the  License, or (at your
+ * option) any later version.
+ */
+
+/include/ "fsl/mpc8548si-pre.dtsi"
+
+/ {
+	model = "ppa8548";
+	compatible = "ppa8548";
+	#address-cells = <2>;
+	#size-cells = <2>;
+	interrupt-parent = <&mpic>;
+
+	memory {
+		device_type = "memory";
+		reg = <0 0 0x0 0x40000000>;
+	};
+
+	lbc: localbus@fe0005000 {
+		reg = <0xf 0xe0005000 0 0x1000>;
+		ranges = <0x0 0x0 0xf 0xff800000 0x00800000>;
+	};
+
+	soc: soc8548@fe0000000 {
+		ranges = <0 0xf 0xe0000000 0x100000>;
+	};
+
+	pci0: pci@fe0008000 {
+		status = "disabled";
+	};
+
+	pci1: pci@fe0009000 {
+		status = "disabled";
+	};
+
+	pci2: pcie@fe000a000 {
+		status = "disabled";
+	};
+
+	rio: rapidio@fe00c0000 {
+		reg = <0xf 0xe00c0000 0x0 0x11000>;
+		port1 {
+			ranges = <0x0 0x0 0x0 0x80000000 0x0 0x40000000>;
+		};
+	};
+};
+
+&lbc {
+	nor@0 {
+		#address-cells = <1>;
+		#size-cells = <1>;
+		compatible = "cfi-flash";
+		reg = <0x0 0x0 0x00800000>;
+		bank-width = <2>;
+		device-width = <2>;
+
+		partition@0 {
+			reg = <0x0 0x7A0000>;
+			label = "user";
+		};
+
+		partition@7A0000 {
+			reg = <0x7A0000 0x20000>;
+			label = "env";
+			read-only;
+		};
+
+		partition@7C0000 {
+			reg = <0x7C0000 0x40000>;
+			label = "u-boot";
+			read-only;
+		};
+	};
+};
+
+&soc {
+	i2c@3000 {
+		rtc@6f {
+			compatible = "intersil,isl1208";
+			reg = <0x6f>;
+		};
+	};
+
+	i2c@3100 {
+	};
+
+	/*
+	 * Only ethernet controller @25000 and @26000 are used.
+	 * Use alias enet2 and enet3 for the remainig controllers,
+	 * to stay compatible with mpc8548si-pre.dtsi.
+	 */
+	enet2: ethernet@24000 {
+		status = "disabled";
+	};
+
+	mdio@24520 {
+		phy0: ethernet-phy@0 {
+			interrupts = <7 1 0 0>;
+			reg = <0x0>;
+			device_type = "ethernet-phy";
+		};
+		phy1: ethernet-phy@1 {
+			interrupts = <8 1 0 0>;
+			reg = <0x1>;
+			device_type = "ethernet-phy";
+		};
+		tbi0: tbi-phy@11 {
+			reg = <0x11>;
+			device_type = "tbi-phy";
+		};
+	};
+
+	enet0: ethernet@25000 {
+		tbi-handle = <&tbi1>;
+		phy-handle = <&phy0>;
+	};
+
+	mdio@25520 {
+		tbi1: tbi-phy@11 {
+			reg = <0x11>;
+			device_type = "tbi-phy";
+		};
+	};
+
+	enet1: ethernet@26000 {
+		tbi-handle = <&tbi2>;
+		phy-handle = <&phy1>;
+	};
+
+	mdio@26520 {
+		tbi2: tbi-phy@11 {
+			reg = <0x11>;
+			device_type = "tbi-phy";
+		};
+	};
+
+	enet3: ethernet@27000 {
+		status = "disabled";
+	};
+
+	mdio@27520 {
+		tbi3: tbi-phy@11 {
+			reg = <0x11>;
+			device_type = "tbi-phy";
+		};
+	};
+
+	crypto@30000 {
+		status = "disabled";
+	};
+};
+
+/include/ "fsl/mpc8548si-post.dtsi"
diff --git a/arch/powerpc/configs/85xx/ppa8548_defconfig b/arch/powerpc/configs/85xx/ppa8548_defconfig
new file mode 100644
index 0000000..a11337d
--- /dev/null
+++ b/arch/powerpc/configs/85xx/ppa8548_defconfig
@@ -0,0 +1,65 @@
+CONFIG_PPC_85xx=y
+CONFIG_PPA8548=y
+CONFIG_DTC=y
+CONFIG_DEFAULT_UIMAGE=y
+CONFIG_IKCONFIG=y
+CONFIG_IKCONFIG_PROC=y
+# CONFIG_PCI is not set
+# CONFIG_USB_SUPPORT is not set
+CONFIG_ADVANCED_OPTIONS=y
+CONFIG_LOWMEM_SIZE_BOOL=y
+CONFIG_LOWMEM_SIZE=0x40000000
+CONFIG_LOWMEM_CAM_NUM_BOOL=y
+CONFIG_LOWMEM_CAM_NUM=4
+CONFIG_PAGE_OFFSET_BOOL=y
+CONFIG_PAGE_OFFSET=0xb0000000
+CONFIG_KERNEL_START_BOOL=y
+CONFIG_KERNEL_START=0xb0000000
+# CONFIG_PHYSICAL_START_BOOL is not set
+CONFIG_PHYSICAL_START=0x00000000
+CONFIG_PHYSICAL_ALIGN=0x04000000
+CONFIG_TASK_SIZE_BOOL=y
+CONFIG_TASK_SIZE=0xb0000000
+
+CONFIG_FSL_LBC=y
+CONFIG_FSL_DMA=y
+CONFIG_FSL_RIO=y
+
+CONFIG_RAPIDIO=y
+CONFIG_RAPIDIO_DMA_ENGINE=y
+CONFIG_RAPIDIO_TSI57X=y
+CONFIG_RAPIDIO_TSI568=y
+CONFIG_RAPIDIO_CPS_XX=y
+CONFIG_RAPIDIO_CPS_GEN2=y
+CONFIG_SERIAL_8250=y
+CONFIG_SERIAL_8250_CONSOLE=y
+CONFIG_PROC_DEVICETREE=y
+
+CONFIG_MTD=y
+CONFIG_MTD_BLKDEVS=y
+CONFIG_MTD_BLOCK=y
+CONFIG_MTD_CFI=y
+CONFIG_MTD_CFI_AMDSTD=y
+CONFIG_MTD_CFI_INTELEXT=y
+CONFIG_MTD_CHAR=y
+CONFIG_MTD_CMDLINE_PARTS=y
+CONFIG_MTD_CONCAT=y
+CONFIG_MTD_PARTITIONS=y
+CONFIG_MTD_PHYSMAP_OF=y
+
+CONFIG_I2C=y
+CONFIG_I2C_MPC=y
+CONFIG_I2C_CHARDEV
+CONFIG_RTC_CLASS=y
+CONFIG_RTC_HCTOSYS=y
+CONFIG_RTC_DRV_ISL1208=y
+
+CONFIG_NET=y
+CONFIG_INET=y
+CONFIG_IP_PNP=y
+CONFIG_NETDEVICES=y
+CONFIG_MII=y
+CONFIG_GIANFAR=y
+CONFIG_MARVELL_PHY=y
+CONFIG_NFS_FS=y
+CONFIG_ROOT_NFS=y
diff --git a/arch/powerpc/platforms/85xx/Kconfig b/arch/powerpc/platforms/85xx/Kconfig
index 02d02a0..803bfed 100644
--- a/arch/powerpc/platforms/85xx/Kconfig
+++ b/arch/powerpc/platforms/85xx/Kconfig
@@ -191,6 +191,13 @@ config SBC8548
 	help
 	  This option enables support for the Wind River SBC8548 board
 
+config PPA8548
+	bool "Prodrive PPA8548"
+	help
+	  This option enables support for the Prodrive PPA8548 board.
+	select DEFAULT_UIMAGE
+	select HAS_RAPIDIO
+
 config GE_IMP3A
 	bool "GE Intelligent Platforms IMP3A"
 	select DEFAULT_UIMAGE
diff --git a/arch/powerpc/platforms/85xx/Makefile b/arch/powerpc/platforms/85xx/Makefile
index 76f679c..d5f4195 100644
--- a/arch/powerpc/platforms/85xx/Makefile
+++ b/arch/powerpc/platforms/85xx/Makefile
@@ -25,6 +25,7 @@ obj-$(CONFIG_P5040_DS)    += p5040_ds.o corenet_ds.o
 obj-$(CONFIG_STX_GP3)	  += stx_gp3.o
 obj-$(CONFIG_TQM85xx)	  += tqm85xx.o
 obj-$(CONFIG_SBC8548)     += sbc8548.o
+obj-$(CONFIG_PPA8548)     += ppa8548.o
 obj-$(CONFIG_SOCRATES)    += socrates.o socrates_fpga_pic.o
 obj-$(CONFIG_KSI8560)	  += ksi8560.o
 obj-$(CONFIG_XES_MPC85xx) += xes_mpc85xx.o
diff --git a/arch/powerpc/platforms/85xx/ppa8548.c b/arch/powerpc/platforms/85xx/ppa8548.c
new file mode 100644
index 0000000..af7bf1a
--- /dev/null
+++ b/arch/powerpc/platforms/85xx/ppa8548.c
@@ -0,0 +1,98 @@
+/*
+ * ppa8548 setup and early boot code.
+ *
+ * Copyright 2009 Prodrive B.V..
+ *
+ * By Stef van Os (see MAINTAINERS for contact information)
+ *
+ * Based on the SBC8548 support - Copyright 2007 Wind River Systems Inc.
+ * Based on the MPC8548CDS support - Copyright 2005 Freescale Inc.
+ *
+ * This program is free software; you can redistribute  it and/or modify it
+ * under  the terms of  the GNU General  Public License as published by the
+ * Free Software Foundation;  either version 2 of the  License, or (at your
+ * option) any later version.
+ */
+
+#include <linux/stddef.h>
+#include <linux/kernel.h>
+#include <linux/init.h>
+#include <linux/reboot.h>
+#include <linux/seq_file.h>
+#include <linux/of_platform.h>
+
+#include <asm/machdep.h>
+#include <asm/udbg.h>
+#include <asm/mpic.h>
+
+#include <sysdev/fsl_soc.h>
+
+static void __init ppa8548_pic_init(void)
+{
+	struct mpic *mpic = mpic_alloc(NULL, 0, MPIC_BIG_ENDIAN,
+			0, 256, " OpenPIC  ");
+	BUG_ON(mpic == NULL);
+	mpic_init(mpic);
+}
+
+/*
+ * Setup the architecture
+ */
+static void __init ppa8548_setup_arch(void)
+{
+	if (ppc_md.progress)
+		ppc_md.progress("ppa8548_setup_arch()", 0);
+}
+
+static void ppa8548_show_cpuinfo(struct seq_file *m)
+{
+	uint svid, phid1;
+
+	svid = mfspr(SPRN_SVR);
+
+	seq_printf(m, "Vendor\t\t: Prodrive B.V.\n");
+	seq_printf(m, "SVR\t\t: 0x%x\n", svid);
+
+	/* Display cpu Pll setting */
+	phid1 = mfspr(SPRN_HID1);
+	seq_printf(m, "PLL setting\t: 0x%x\n", ((phid1 >> 24) & 0x3f));
+}
+
+static struct of_device_id __initdata of_bus_ids[] = {
+	{ .name = "soc", },
+	{ .type = "soc", },
+	{ .compatible = "simple-bus", },
+	{ .compatible = "gianfar", },
+	{ .compatible = "fsl,srio", },
+	{},
+};
+
+static int __init declare_of_platform_devices(void)
+{
+	of_platform_bus_probe(NULL, of_bus_ids, NULL);
+
+	return 0;
+}
+machine_device_initcall(ppa8548, declare_of_platform_devices);
+
+/*
+ * Called very early, device-tree isn't unflattened
+ */
+static int __init ppa8548_probe(void)
+{
+	unsigned long root = of_get_flat_dt_root();
+
+	return of_flat_dt_is_compatible(root, "ppa8548");
+}
+
+define_machine(ppa8548) {
+	.name		= "ppa8548",
+	.probe		= ppa8548_probe,
+	.setup_arch	= ppa8548_setup_arch,
+	.init_IRQ	= ppa8548_pic_init,
+	.show_cpuinfo	= ppa8548_show_cpuinfo,
+	.get_irq	= mpic_get_irq,
+	.restart	= fsl_rstcr_restart,
+	.calibrate_decr = generic_calibrate_decr,
+	.progress	= udbg_progress,
+};
-- 
1.7.2.5

^ permalink raw reply related

* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
From: Rafael J. Wysocki @ 2013-02-03 20:44 UTC (permalink / raw)
  To: Greg KH
  Cc: linux-s390, Toshi Kani, jiang.liu, wency, linux-acpi, yinghai,
	linux-kernel, linux-mm, isimatu.yasuaki, srivatsa.bhat, guohanjun,
	bhelgaas, akpm, linuxppc-dev, lenb
In-Reply-To: <1810611.i6Sc4oLaux@vostro.rjw.lan>

On Saturday, February 02, 2013 09:15:37 PM Rafael J. Wysocki wrote:
> On Saturday, February 02, 2013 03:58:01 PM Greg KH wrote:
> > On Fri, Feb 01, 2013 at 11:12:59PM +0100, Rafael J. Wysocki wrote:
> > > On Friday, February 01, 2013 08:23:12 AM Greg KH wrote:
> > > > On Thu, Jan 31, 2013 at 09:54:51PM +0100, Rafael J. Wysocki wrote:
> > > > > > > But, again, I'm going to ask why you aren't using the existing cpu /
> > > > > > > memory / bridge / node devices that we have in the kernel.  Please use
> > > > > > > them, or give me a _really_ good reason why they will not work.
> > > > > > 
> > > > > > We cannot use the existing system devices or ACPI devices here.  During
> > > > > > hot-plug, ACPI handler sets this shp_device info, so that cpu and memory
> > > > > > handlers (drivers/cpu.c and mm/memory_hotplug.c) can obtain their target
> > > > > > device information in a platform-neutral way.  During hot-add, we first
> > > > > > creates an ACPI device node (i.e. device under /sys/bus/acpi/devices),
> > > > > > but platform-neutral modules cannot use them as they are ACPI-specific.
> > > > > 
> > > > > But suppose we're smart and have ACPI scan handlers that will create
> > > > > "physical" device nodes for those devices during the ACPI namespace scan.
> > > > > Then, the platform-neutral nodes will be able to bind to those "physical"
> > > > > nodes.  Moreover, it should be possible to get a hierarchy of device objects
> > > > > this way that will reflect all of the dependencies we need to take into
> > > > > account during hot-add and hot-remove operations.  That may not be what we
> > > > > have today, but I don't see any *fundamental* obstacles preventing us from
> > > > > using this approach.
> > > > 
> > > > I would _much_ rather see that be the solution here as I think it is the
> > > > proper one.
> > > > 
> > > > > This is already done for PCI host bridges and platform devices and I don't
> > > > > see why we can't do that for the other types of devices too.
> > > > 
> > > > I agree.
> > > > 
> > > > > The only missing piece I see is a way to handle the "eject" problem, i.e.
> > > > > when we try do eject a device at the top of a subtree and need to tear down
> > > > > the entire subtree below it, but if that's going to lead to a system crash,
> > > > > for example, we want to cancel the eject.  It seems to me that we'll need some
> > > > > help from the driver core here.
> > > > 
> > > > I say do what we always have done here, if the user asked us to tear
> > > > something down, let it happen as they are the ones that know best :)
> > > > 
> > > > Seriously, I guess this gets back to the "fail disconnect" idea that the
> > > > ACPI developers keep harping on.  I thought we already resolved this
> > > > properly by having them implement it in their bus code, no reason the
> > > > same thing couldn't happen here, right?
> > > 
> > > Not really. :-)  We haven't ever resolved that particular issue I'm afraid.
> > 
> > Ah, I didn't realize that.
> > 
> > > > I don't think the core needs to do anything special, but if so, I'll be glad
> > > > to review it.
> > > 
> > > OK, so this is the use case.  We have "eject" defined for something like
> > > a container with a number of CPU cores, PCI host bridge, and a memory
> > > controller under it.  And a few pretty much arbitrary I/O devices as a bonus.
> > > 
> > > Now, there's a button on the system case labeled as "Eject" and if that button
> > > is pressed, we're supposed to _try_ to eject all of those things at once.  We
> > > are allowed to fail that request, though, if that's problematic for some
> > > reason, but we're supposed to let the BIOS know about that.
> > > 
> > > Do you seriously think that if that button is pressed, we should just proceed
> > > with removing all that stuff no matter what?  That'd be kind of like Russian
> > > roulette for whoever pressed that button, because s/he could only press it and
> > > wait for the system to either crash or not.  Or maybe to crash a bit later
> > > because of some delayed stuff that would hit one of those devices that had just
> > > gone.  Surely not a situation any admin of a high-availability system would
> > > like to be in. :-)
> > > 
> > > Quite frankly, I have no idea how that can be addressed in a single bus type,
> > > let alone ACPI (which is not even a proper bus type, just something pretending
> > > to be one).
> > 
> > You don't have it as a single bus type, you have a controller somewhere,
> > off of the bus being destroyed, that handles sending remove events to
> > the device and tearing everything down.  PCI does this from the very
> > beginning.
> 
> Yes, but those are just remove events and we can only see how destructive they
> were after the removal.  The point is to be able to figure out whether or not
> we *want* to do the removal in the first place.
> 
> Say you have a computing node which signals a hardware problem in a processor
> package (the container with CPU cores, memory, PCI host bridge etc.).  You
> may want to eject that package, but you don't want to kill the system this
> way.  So if the eject is doable, it is very much desirable to do it, but if it
> is not doable, you'd rather shut the box down and do the replacement afterward.
> That may be costly, however (maybe weeks of computations), so it should be
> avoided if possible, but not at the expense of crashing the box if the eject
> doesn't work out.

It seems to me that we could handle that with the help of a new flag, say
"no_eject", in struct device, a global mutex, and a function that will walk
the given subtree of the device hierarchy and check if "no_eject" is set for
any devices in there.  Plus a global "no_eject" switch, perhaps.

To be more precise, suppose we have a "no_eject" flag that is set for a device
when the kernel is about to start using it in such a way that ejecting it would
lead to serious trouble (i.e. it is a memory module holding the kernel's page
tables or something like that).  Next, suppose that we have a function called,
say, "device_may_eject()" that will walk the subtree of the device hierarchy
starting at the given node and return false if any of the devices in there have
"no_eject" set.  Further, require that device_may_eject() has to be called
under a global mutex called something like "device_eject_lock".  Then, we can
arrange things as follows:

1. When a device is about to be used for such purposes that it shouldn't be
   ejected, the relevant code should:
  (a) Acquire device_eject_lock.
  (b) Check if the device is still there and go to (e) if not.
  (c) Do whatever needs to be done to the device.
  (d) Set the device's no_eject flag.
  (e) Release device_eject_lock.

2. When an eject operation is about to be carried out on a subtree of the device
   hierarchy, the eject code (be it ACPI or something else) should:
  (a) Acquire device_eject_lock.
  (b) Call device_may_eject() on the starting device and go to (d) if it
      returns false.
  (c) Carry out the eject (that includes calling .remove() from all of the
      involved drivers in partiular).
  (d) Release device_eject_lock.

3. When it is OK to eject the device again, the relevant code should just clear
   its "no_eject" flag under device_eject_lock.

If we want to synchronize that with such things like boot or system suspend,
a global "no_eject" switch can be used for that (it needs to be manipulated
under device_eject_lock) and one more step (check the global "no_eject") is
needed between 2(a) and 2(b).

Does it look like a reasonable approach?

Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply

* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
From: Toshi Kani @ 2013-02-04  0:28 UTC (permalink / raw)
  To: Greg KH
  Cc: linux-s390@vger.kernel.org, jiang.liu@huawei.com,
	wency@cn.fujitsu.com, linux-mm@kvack.org, yinghai@kernel.org,
	linux-kernel@vger.kernel.org, Rafael J. Wysocki,
	linux-acpi@vger.kernel.org, isimatu.yasuaki@jp.fujitsu.com,
	srivatsa.bhat@linux.vnet.ibm.com, guohanjun@huawei.com,
	bhelgaas@google.com, akpm@linux-foundation.org,
	linuxppc-dev@lists.ozlabs.org, lenb@kernel.org
In-Reply-To: <20130202150154.GC1434@kroah.com>

On Sat, 2013-02-02 at 16:01 +0100, Greg KH wrote:
> On Fri, Feb 01, 2013 at 01:40:10PM -0700, Toshi Kani wrote:
> > On Fri, 2013-02-01 at 07:30 +0000, Greg KH wrote:
> > > On Thu, Jan 31, 2013 at 06:32:18PM -0700, Toshi Kani wrote:
> > >  > This is already done for PCI host bridges and platform devices and I don't
> > > > > see why we can't do that for the other types of devices too.
> > > > > 
> > > > > The only missing piece I see is a way to handle the "eject" problem, i.e.
> > > > > when we try do eject a device at the top of a subtree and need to tear down
> > > > > the entire subtree below it, but if that's going to lead to a system crash,
> > > > > for example, we want to cancel the eject.  It seems to me that we'll need some
> > > > > help from the driver core here.
> > > > 
> > > > There are three different approaches suggested for system device
> > > > hot-plug:
> > > >  A. Proceed within system device bus scan.
> > > >  B. Proceed within ACPI bus scan.
> > > >  C. Proceed with a sequence (as a mini-boot).
> > > > 
> > > > Option A uses system devices as tokens, option B uses acpi devices as
> > > > tokens, and option C uses resource tables as tokens, for their handlers.
> > > > 
> > > > Here is summary of key questions & answers so far.  I hope this
> > > > clarifies why I am suggesting option 3.
> > > > 
> > > > 1. What are the system devices?
> > > > System devices provide system-wide core computing resources, which are
> > > > essential to compose a computer system.  System devices are not
> > > > connected to any particular standard buses.
> > > 
> > > Not a problem, lots of devices are not connected to any "particular
> > > standard busses".  All this means is that system devices are connected
> > > to the "system" bus, nothing more.
> > 
> > Can you give me a few examples of other devices that support hotplug and
> > are not connected to any particular buses?  I will investigate them to
> > see how they are managed to support hotplug.
> 
> Any device that is attached to any bus in the driver model can be
> hotunplugged from userspace by telling it to be "unbound" from the
> driver controlling it.  Try it for any platform device in your system to
> see how it happens.

The unbind operation, as I understand from you, is to detach a driver
from a device.  Yes, unbinding can be done for any devices.  It is
however different from hot-plug operation, which unplugs a device.

Today, the unbind operation to an ACPI cpu/memory devices causes
hot-unplug (offline) operation to them, which is one of the major issues
for us since unbind cannot fail.  This patchset addresses this issue by
making the unbind operation of ACPI cpu/memory devices to do the
unbinding only.  ACPI drivers no longer control cpu and memory as they
are supposed to be controlled by their drivers, cpu and memory modules.
The current hotplug code requires putting all device control stuff into
ACPI, which this patchset is trying to fix it.

> > > > 2. Why are the system devices special?
> > > > The system devices are initialized during early boot-time, by multiple
> > > > subsystems, from the boot-up sequence, in pre-defined order.  They
> > > > provide low-level services to enable other subsystems to come up.
> > > 
> > > Sorry, no, that doesn't mean they are special, nothing here is unique
> > > for the point of view of the driver model from any other device or bus.
> > 
> > I think system devices are unique in a sense that they are initialized
> > before drivers run.
> 
> No, most all devices are "initialized" before a driver runs on it, USB
> is one such example, PCI another, and I'm pretty sure that there are
> others.

USB devices can be initialized after the USB bus driver is initialized.
Similarly, PCI devices can be initialized after the PCI bus driver is
initialized.  However, CPU and memory are initialized without any
dependency to their bus driver since there is no such thing.

In addition, CPU and memory have two drivers -- their actual
drivers/subsystems and their ACPI drivers.  Their actual
drivers/subsystems initialize cpu and memory.  The ACPI drivers
initialize driver-specific data of their ACPI nodes.  During hot-plug
operation, however, the current code requires these ACPI drivers to do
their hot-plug operations.  This patchset keeps them separated during
hot-plug and let their actual drivers/subsystems to do the job.

> > > If you need to initialize the driver
> > > core earlier, then do so.  Or, even better, just wait until enough of
> > > the system has come up and then go initialize all of the devices you
> > > have found so far as part of your boot process.
> > 
> > They are pseudo drivers that provide sysfs entry points of cpu and
> > memory.  They do not actually initialize cpu and memory.  I do not think
> > initializing cpu and memory fits into the driver model either, since
> > drivers should run after cpu and memory are initialized.
> 
> We already represent CPUs in the sysfs tree, don't represent them in two
> different places with two different structures.  Use the existing ones
> please.

This patchset does not make any changes to sysfs.  It does however make
the system drivers to control the system devices, and ACPI to do the
ACPI stuff only.

The boot sequence calls the following steps to initialize memory and
cpu, as well as their acpi nodes.  These steps are independent, i.e. #1
and #2 run without ACPI.

 1. mm -> initialize memory -> create sysfs system/memory
 2. smp/cpu -> initialize cpu -> create sysfs system/cpu
 3. acpi core -> acpi scan -> create sysfs acpi nodes

While there are 3 separate steps at boot, the current hotplug code tries
to do everything in #3.  Therefore, this patchset provides a sequencer
(which is similar to the boot sequence) to run these steps during a
hot-plug operation as well.  Hence, we have consistent steps and role
mode between boot and hot-plug operations.

> > > None of the above things you have stated seem to have anything to do
> > > with your proposed patch, so I don't understand why you have mentioned
> > > them...
> > 
> > You suggested option A before, which uses system bus scan to initialize
> > all system devices at boot time as well as hot-plug.  I tried to say
> > that this option would not be doable.
> 
> I haven't yet been convinced otherwise, sorry.  Please prove me wrong :)

This patchset enables the system drivers (i.e. cpu and memory drivers)
to do the hot-plug operations, and ACPI core to do the ACPI stuff.

We should not go with the system driver only approach since we do need
to use ACPI stuff.  Also, we should not go with the ACPI only approach
since we do need to use the system (and other) drivers.  This patchset
provides a sequencer to manage these steps across multiple subsystems.

Thanks,
-Toshi

^ permalink raw reply

* Re: 3.7-rc7: BUG: MAX_STACK_TRACE_ENTRIES too low!
From: Christian Kujau @ 2013-02-04  0:38 UTC (permalink / raw)
  To: Li Zhong; +Cc: linuxppc-dev, LKML
In-Reply-To: <alpine.DEB.2.01.1301271451390.7378@trent.utfs.org>

On Sun, 27 Jan 2013 at 14:56, Christian Kujau wrote:
> Hm, is there no chance to get this into 3.8? I've been running with this 
> patch applied since 3.7-rc7 and it got rid of this 
> "MAX_STACK_TRACE_ENTRIES too low" message. I've just upgraded to 3.8-rc5 
> and it's still not in mainline :-\

Hah! I just noticed that it got merged the next day - Thanks!

Christian.
-- 
BOFH excuse #80:

That's a great computer you have there; have you considered how it would work as a BSD machine?

^ permalink raw reply

* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
From: Greg KH @ 2013-02-04  1:23 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-s390, Toshi Kani, jiang.liu, wency, linux-acpi, yinghai,
	linux-kernel, linux-mm, isimatu.yasuaki, srivatsa.bhat, guohanjun,
	bhelgaas, akpm, linuxppc-dev, lenb
In-Reply-To: <1810611.i6Sc4oLaux@vostro.rjw.lan>

On Sat, Feb 02, 2013 at 09:15:37PM +0100, Rafael J. Wysocki wrote:
> On Saturday, February 02, 2013 03:58:01 PM Greg KH wrote:
> > On Fri, Feb 01, 2013 at 11:12:59PM +0100, Rafael J. Wysocki wrote:
> > > On Friday, February 01, 2013 08:23:12 AM Greg KH wrote:
> > > > On Thu, Jan 31, 2013 at 09:54:51PM +0100, Rafael J. Wysocki wrote:
> > > > > > > But, again, I'm going to ask why you aren't using the existing cpu /
> > > > > > > memory / bridge / node devices that we have in the kernel.  Please use
> > > > > > > them, or give me a _really_ good reason why they will not work.
> > > > > > 
> > > > > > We cannot use the existing system devices or ACPI devices here.  During
> > > > > > hot-plug, ACPI handler sets this shp_device info, so that cpu and memory
> > > > > > handlers (drivers/cpu.c and mm/memory_hotplug.c) can obtain their target
> > > > > > device information in a platform-neutral way.  During hot-add, we first
> > > > > > creates an ACPI device node (i.e. device under /sys/bus/acpi/devices),
> > > > > > but platform-neutral modules cannot use them as they are ACPI-specific.
> > > > > 
> > > > > But suppose we're smart and have ACPI scan handlers that will create
> > > > > "physical" device nodes for those devices during the ACPI namespace scan.
> > > > > Then, the platform-neutral nodes will be able to bind to those "physical"
> > > > > nodes.  Moreover, it should be possible to get a hierarchy of device objects
> > > > > this way that will reflect all of the dependencies we need to take into
> > > > > account during hot-add and hot-remove operations.  That may not be what we
> > > > > have today, but I don't see any *fundamental* obstacles preventing us from
> > > > > using this approach.
> > > > 
> > > > I would _much_ rather see that be the solution here as I think it is the
> > > > proper one.
> > > > 
> > > > > This is already done for PCI host bridges and platform devices and I don't
> > > > > see why we can't do that for the other types of devices too.
> > > > 
> > > > I agree.
> > > > 
> > > > > The only missing piece I see is a way to handle the "eject" problem, i.e.
> > > > > when we try do eject a device at the top of a subtree and need to tear down
> > > > > the entire subtree below it, but if that's going to lead to a system crash,
> > > > > for example, we want to cancel the eject.  It seems to me that we'll need some
> > > > > help from the driver core here.
> > > > 
> > > > I say do what we always have done here, if the user asked us to tear
> > > > something down, let it happen as they are the ones that know best :)
> > > > 
> > > > Seriously, I guess this gets back to the "fail disconnect" idea that the
> > > > ACPI developers keep harping on.  I thought we already resolved this
> > > > properly by having them implement it in their bus code, no reason the
> > > > same thing couldn't happen here, right?
> > > 
> > > Not really. :-)  We haven't ever resolved that particular issue I'm afraid.
> > 
> > Ah, I didn't realize that.
> > 
> > > > I don't think the core needs to do anything special, but if so, I'll be glad
> > > > to review it.
> > > 
> > > OK, so this is the use case.  We have "eject" defined for something like
> > > a container with a number of CPU cores, PCI host bridge, and a memory
> > > controller under it.  And a few pretty much arbitrary I/O devices as a bonus.
> > > 
> > > Now, there's a button on the system case labeled as "Eject" and if that button
> > > is pressed, we're supposed to _try_ to eject all of those things at once.  We
> > > are allowed to fail that request, though, if that's problematic for some
> > > reason, but we're supposed to let the BIOS know about that.
> > > 
> > > Do you seriously think that if that button is pressed, we should just proceed
> > > with removing all that stuff no matter what?  That'd be kind of like Russian
> > > roulette for whoever pressed that button, because s/he could only press it and
> > > wait for the system to either crash or not.  Or maybe to crash a bit later
> > > because of some delayed stuff that would hit one of those devices that had just
> > > gone.  Surely not a situation any admin of a high-availability system would
> > > like to be in. :-)
> > > 
> > > Quite frankly, I have no idea how that can be addressed in a single bus type,
> > > let alone ACPI (which is not even a proper bus type, just something pretending
> > > to be one).
> > 
> > You don't have it as a single bus type, you have a controller somewhere,
> > off of the bus being destroyed, that handles sending remove events to
> > the device and tearing everything down.  PCI does this from the very
> > beginning.
> 
> Yes, but those are just remove events and we can only see how destructive they
> were after the removal.  The point is to be able to figure out whether or not
> we *want* to do the removal in the first place.

Yes, but, you will always race if you try to test to see if you can shut
down a device and then trying to do it.  So walking the bus ahead of
time isn't a good idea.

And, we really don't have a viable way to recover if disconnect() fails,
do we.  What do we do in that situation, restore the other devices we
disconnected successfully?  How do we remember/know what they were?

PCI hotplug almost had this same problem until the designers finally
realized that they just had to accept the fact that removing a PCI
device could either happen by:
	- a user yanking out the device, at which time the OS better
	  clean up properly no matter what happens
	- the user asked nicely to remove a device, and the OS can take
	  as long as it wants to complete that action, including
	  stalling for noticable amounts of time before eventually,
	  always letting the action succeed.

I think the second thing is what you have to do here.  If a user tells
the OS it wants to remove these devices, you better do it.  If you
can't, because memory is being used by someone else, either move them
off, or just hope that nothing bad happens, before the user gets
frustrated and yanks out the CPU/memory module themselves physically :)

> Say you have a computing node which signals a hardware problem in a processor
> package (the container with CPU cores, memory, PCI host bridge etc.).  You
> may want to eject that package, but you don't want to kill the system this
> way.  So if the eject is doable, it is very much desirable to do it, but if it
> is not doable, you'd rather shut the box down and do the replacement afterward.
> That may be costly, however (maybe weeks of computations), so it should be
> avoided if possible, but not at the expense of crashing the box if the eject
> doesn't work out.

These same "situations" came up for PCI hotplug, and I still say the
same resolution there holds true, as described above.  The user wants to
remove something, so let them do it.  They always know best, and get mad
at us if we think otherwise :)

What does the ACPI spec say about this type of thing?  Surely the same
people that did the PCI Hotplug spec were consulted when doing this part
of the spec, right?  Yeah, I know, I can dream...

> > I know it's more complicated with these types of devices, and I think we
> > are getting closer to the correct solution, I just don't want to ever
> > see duplicate devices in the driver model for the same physical device.
> 
> Do you mean two things based on struct device for the same hardware component?
> That's been happening already pretty much forever for every PCI device known
> to the ACPI layer, for PNP and many others.  However, those ACPI things are (or
> rather should be, but we're going to clean that up) only for convenience (to be
> able to see the namespace structure and related things in sysfs).  So the stuff
> under /sys/devices/LNXSYSTM\:00/ is not "real".

Yes, I've never treated that as a "real" device because they (usually)
didn't ever bind to the "real" driver that controlled the device and how
it talked to the rest of the os (like a USB device for example.)  I
always thought just of it as a "shadow" of the firmware image, nothing
that should be directly operated on if at all possible.

But, as you are pointing out, maybe this needs to be changed.  Having
users have to look in one part of the tree for one interface to a
device, and another totally different part for a different interface to
the same physical device is crazy, don't you agree?

As to how to solve it, I really have no idea, I don't know ACPI that
well at all, and honestly, don't want to, I want to keep what little
hair I have left...

> In my view it shouldn't even
> be under /sys/devices/ (/sys/firmware/acpi/ seems to be a better place for it),

I agree.

> but that may be difficult to change without breaking user space (maybe we can
> just symlink it from /sys/devices/ or something).  And the ACPI bus type
> shouldn't even exist in my opinion.
> 
> There's much confusion in there and much work to clean that up, I agree, but
> that's kind of separate from the hotplug thing.

I agree as well.

Best of luck.

greg k-h

^ permalink raw reply

* Re: [PATCH?] Move ACPI device nodes under /sys/firmware/acpi (was: Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework)
From: Greg KH @ 2013-02-04  1:24 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-s390, Toshi Kani, jiang.liu, wency, linux-acpi, yinghai,
	linux-kernel, linux-mm, isimatu.yasuaki, srivatsa.bhat, guohanjun,
	bhelgaas, akpm, linuxppc-dev, lenb
In-Reply-To: <2806030.VWUMy6F7lm@vostro.rjw.lan>

On Sat, Feb 02, 2013 at 11:18:20PM +0100, Rafael J. Wysocki wrote:
> On Saturday, February 02, 2013 09:15:37 PM Rafael J. Wysocki wrote:
> > On Saturday, February 02, 2013 03:58:01 PM Greg KH wrote:
> [...]
> > 
> > > I know it's more complicated with these types of devices, and I think we
> > > are getting closer to the correct solution, I just don't want to ever
> > > see duplicate devices in the driver model for the same physical device.
> > 
> > Do you mean two things based on struct device for the same hardware component?
> > That's been happening already pretty much forever for every PCI device known
> > to the ACPI layer, for PNP and many others.  However, those ACPI things are (or
> > rather should be, but we're going to clean that up) only for convenience (to be
> > able to see the namespace structure and related things in sysfs).  So the stuff
> > under /sys/devices/LNXSYSTM\:00/ is not "real".  In my view it shouldn't even
> > be under /sys/devices/ (/sys/firmware/acpi/ seems to be a better place for it),
> > but that may be difficult to change without breaking user space (maybe we can
> > just symlink it from /sys/devices/ or something).  And the ACPI bus type
> > shouldn't even exist in my opinion.
> 
> Well, well.
> 
> In fact, the appended patch moves the whole ACPI device nodes tree under
> /sys/firmware/acpi/ and I'm not seeing any negative consequences of that on my
> test box (events work and so on).  User space is quite new on it, though, and
> the patch is hackish.

Try booting a RHEL 5 image on this type of kernel, or some old Fedora
releases, they were sensitive to changes in sysfs.

greg k-h

^ permalink raw reply

* Re: [PATCH 0/3] Enable multiple MSI feature in pSeries
From: Michael Ellerman @ 2013-02-04  3:23 UTC (permalink / raw)
  To: Mike Qiu; +Cc: tglx, linuxppc-dev, linux-kernel
In-Reply-To: <1358235536-32741-1-git-send-email-qiudayu@linux.vnet.ibm.com>

On Tue, 2013-01-15 at 15:38 +0800, Mike Qiu wrote:
> Currently, multiple MSI feature hasn't been enabled in pSeries,
> These patches try to enbale this feature.

Hi Mike,

> These patches have been tested by using ipr driver, and the driver patch
> has been made by Wen Xiong <wenxiong@linux.vnet.ibm.com>:

So who wrote these patches? Normally we would expect the original author
to post the patches if at all possible.

> [PATCH 0/7] Add support for new IBM SAS controllers

I would like to see the full series, including the driver enablement.

> Test platform: One partition of pSeries with one cpu core(4 SMTs) and 
>                RAID bus controller: IBM PCI-E IPR SAS Adapter (ASIC) in POWER7
> OS version: SUSE Linux Enterprise Server 11 SP2  (ppc64) with 3.8-rc3 kernel 
> 
> IRQ 21 and 22 are assigned to the ipr device which support 2 mutiple MSI.
> 
> The test results is shown by 'cat /proc/interrups':
>           CPU0       CPU1       CPU2       CPU3       
> 21:          6          5          5          5      XICS Level     host1-0
> 22:        817        814        816        813      XICS Level     host1-1

This shows that you are correctly configuring two MSIs.

But the key advantage of using multiple interrupts is to distribute load
across CPUs and improve performance. So I would like to see some
performance numbers that show that there is a real benefit for all the
extra complexity in the code.

cheers

^ permalink raw reply

* Re: [PATCH] powerpc/mm: Fix hash computation function
From: Michael Ellerman @ 2013-02-04  3:29 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: Mike Qiu, paulus, linuxppc-dev
In-Reply-To: <1359524442-5861-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>

On Wed, 2013-01-30 at 11:10 +0530, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> 
> The ASM version of hash computation function was truncating the upper bit.
> Make the ASM version similar to hpt_hash function. Remove masking vsid bits.
> Without this patch, we observed hang during bootup due to not satisfying page
> fault request correctly.

Which commit(s) introduced the bug?

cheers

^ permalink raw reply

* Re: [PATCH 2/2] pseries/iommu: remove DDW on kexec
From: Michael Ellerman @ 2013-02-04  3:38 UTC (permalink / raw)
  To: Nishanth Aravamudan; +Cc: miltonm, paulus, anton, nfont, linuxppc-dev
In-Reply-To: <20130129203346.GA7664@linux.vnet.ibm.com>

On Tue, 2013-01-29 at 12:33 -0800, Nishanth Aravamudan wrote:
> Hi Michael,
> 
> On 29.01.2013 [21:58:28 +1100], Michael Ellerman wrote:
> > On Mon, 2013-01-28 at 18:03 -0800, Nishanth Aravamudan wrote:
> > > pseries/iommu: remove DDW on kexec
> > >  ...
> > >     
> > > I believe the simplest, easiest-to-maintain fix is to just change our
> > > initcall to, rather than detecting and updating the new kernel's DDW
> > > knowledge, just remove all DDW configurations. When the drivers
> > > re-initialize, we will set everything back up as it was before.
> > 
> > I don't know this code at all, but this sounds like it will also work
> > for kdump, right? ie. when the original kernel has crashed the 2nd
> > kernel will tear the DDW down and set it back up.
> 
> Yes, my actual test-case (and what was reported as broken) was kdump.
> From my relatively vague (but now growing) understanding of that
> process, kdump does use kexec under the covers to switch to the crash
> kernel, and so does get the same benefit from this change.

OK good. Yeah kdump is basically equivalent to kexec with one major
difference, which is that before a kexec the first kernel is shutdown
more or less normally - as if it was about to reboot.

kdump on the other hand is invoked in the panic path, so nothing is
shutdown in the first kernel (almost, see ehea). So to accommodate kdump
you want any logic to be in the startup path of the 2nd kernel.

> Another datapoint, though, is that it might make sense to recommend (and
> I'm working on figuring this out for the distros, etc) to use
> disable_ddw anyways for the kdump kernel command-line, as DDW isn't
> 'free' and it's unclear if performance is a huge concern for the crash
> kernel (sort of varies with where your storage is, and how much you need
> to dump, which for kdump generally doesn't seem like that much?).

The kdump kernel /should/ just be creating a crash dump and then
rebooting. That may involve writing all of RAM out to disk/nw, which
could be a lot of I/O. So performance may be a concern, but correctness
is much much much more important.

Some folks have talked about having the kdump kernel just come up and
pretend it is back in production, but that is madness for various
reasons and I don't think anyone ever did it. 

cheers

^ permalink raw reply

* Re: [PATCH 0/3] Enable multiple MSI feature in pSeries
From: Mike Qiu @ 2013-02-04  3:49 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: tglx, linuxppc-dev, linux-kernel
In-Reply-To: <1359948180.25414.11.camel@concordia>

[-- Attachment #1: Type: text/plain, Size: 2427 bytes --]

> On Tue, 2013-01-15 at 15:38 +0800, Mike Qiu wrote:
>> Currently, multiple MSI feature hasn't been enabled in pSeries,
>> These patches try to enbale this feature.
> Hi Mike,
>
>> These patches have been tested by using ipr driver, and the driver patch
>> has been made by Wen Xiong <wenxiong@linux.vnet.ibm.com>:
> So who wrote these patches? Normally we would expect the original author
> to post the patches if at all possible.
Hi Michael

These Multiple MSI patches were wrote by myself, you know this feature 
has not enabled
and it need device driver to test whether it works suitable. So I test 
my patches use
Wen Xiong's ipr patches, which has been send out to the maillinglist.

I'm the**original author :)
>
>> [PATCH 0/7] Add support for new IBM SAS controllers
> I would like to see the full series, including the driver enablement.
Yep, but the driver patches were wrote by Wen Xiong and has been send out.
I just use her patches to test my patches. all device support Multiple MSI
can use my feature not only IBM SAS controllers, I also test my patches use
the broadcom wireless card tg3, and also works OK.
>
>> Test platform: One partition of pSeries with one cpu core(4 SMTs) and
>>                 RAID bus controller: IBM PCI-E IPR SAS Adapter (ASIC) in POWER7
>> OS version: SUSE Linux Enterprise Server 11 SP2  (ppc64) with 3.8-rc3 kernel
>>
>> IRQ 21 and 22 are assigned to the ipr device which support 2 mutiple MSI.
>>
>> The test results is shown by 'cat /proc/interrups':
>>            CPU0       CPU1       CPU2       CPU3
>> 21:          6          5          5          5      XICS Level     host1-0
>> 22:        817        814        816        813      XICS Level     host1-1
> This shows that you are correctly configuring two MSIs.
>
> But the key advantage of using multiple interrupts is to distribute load
> across CPUs and improve performance. So I would like to see some
> performance numbers that show that there is a real benefit for all the
> extra complexity in the code.
Yes, the system just has suport two MSIs. Anyway, I will try to do some 
proformance
test, to show the real benefit.
But actually it needs the driver to do so. As the data show above, it 
seems there is
some problems in use the interrupt, the irq 21 use few, most use 22, I 
will discuss
with the driver author to see why and if she fixed, I will give out the 
proformance
result.

Thanks

Mike

>
> cheers
>


[-- Attachment #2: Type: text/html, Size: 3772 bytes --]

^ permalink raw reply

* [git pull] Please pull powerpc.git merge branch
From: Benjamin Herrenschmidt @ 2013-02-04  4:22 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linuxppc-dev, Linux Kernel list

Hi Linus !

Just so that you don't get too bored on your Island here's a patch for
3.8 fixing a nasty bug that affects the new 64T support that was merged
in 3.7. Please apply whenever you have a chance (and an internet
connection !)

Cheers,
Ben.

The following changes since commit 689dfa894c57842a05bf6dc9f97e6bb71ec5f386:

  powerpc: Max next_tb to prevent from replaying timer interrupt (2013-01-29 10:18:16 +1100)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc.git merge

for you to fetch changes up to eda8eebdd153c48a4e2a3a3ac3cd9e2e31f5c6b3:

  powerpc/mm: Fix hash computation function (2013-02-04 15:15:08 +1100)

----------------------------------------------------------------
Aneesh Kumar K.V (1):
      powerpc/mm: Fix hash computation function

 arch/powerpc/mm/hash_low_64.S |   62 +++++++++++++++++++++++------------------
 1 file changed, 35 insertions(+), 27 deletions(-)

^ permalink raw reply

* Re: [PATCH] powerpc/mm: Fix hash computation function
From: Benjamin Herrenschmidt @ 2013-02-04  4:24 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: Mike Qiu, linuxppc-dev, paulus, Aneesh Kumar K.V
In-Reply-To: <1359948565.25414.12.camel@concordia>

On Mon, 2013-02-04 at 14:29 +1100, Michael Ellerman wrote:
> On Wed, 2013-01-30 at 11:10 +0530, Aneesh Kumar K.V wrote:
> > From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> > 
> > The ASM version of hash computation function was truncating the upper bit.
> > Make the ASM version similar to hpt_hash function. Remove masking vsid bits.
> > Without this patch, we observed hang during bootup due to not satisfying page
> > fault request correctly.
> 
> Which commit(s) introduced the bug?

The bug has been there for ever (well, as far back as git goes without
digging the old monster repo) so may as well date from my original port
of the C hashing code to asm.

However it wasn't per-se a bug until Aneesh 64T support was added in 3.7
since we didn't use the top vsid bits that are masked.

I've merged the patch. My only worry is whether there might have been a
reason for the masking in the first place, ie, do we ever carry VSIDs
with bad bits at the top... though the hash mask should take care of
that in any case (well I hope...)

Cheers,
Ben.

^ permalink raw reply

* 3.8: possible circular locking dependency detected
From: Christian Kujau @ 2013-02-04  5:09 UTC (permalink / raw)
  To: LKML; +Cc: linuxppc-dev, b43-dev

Hi,

similar to what I reported earlier [0] for 3.8.0-rc1, this happens during 
"ifup wlan0" (which in effect starts wpa_supplicant to bring up a Broadcom 
b43 wifi network interface). The interface is working though and continues 
to work over several ifup/ifdown iterations.

The backtrace looks awfully similar to the earlier[0] report, but this 
time it had b43* stuff in it so I thought I should report it. Full dmesg 
and .config here: http://nerdbynature.de/bits/3.8.0-rc6/

Christian.

[  807.693791] 
[  807.695519] ======================================================
[  807.697198] [ INFO: possible circular locking dependency detected ]
[  807.698890] 3.8.0-rc6-00008-g8b31849 #1 Not tainted
[  807.700573] -------------------------------------------------------
[  807.702255] wpa_supplicant/4129 is trying to acquire lock:
[  807.703925]  ((&wl->firmware_load)){+.+.+.}, at: [<c0049d1c>] flush_work+0x0/0x2b0
[  807.705696] 
[  807.705696] but task is already holding lock:
[  807.709023]  (rtnl_mutex){+.+.+.}, at: [<c04362d4>] rtnl_lock+0x1c/0x2c
[  807.710743] 
[  807.710743] which lock already depends on the new lock.
[  807.710743] 
[  807.715541] 
[  807.715541] the existing dependency chain (in reverse order) is:
[  807.718691] 
[  807.718691] -> #1 (rtnl_mutex){+.+.+.}:
[  807.721903]        [<c04f69ac>] mutex_lock_nested+0x6c/0x2bc
[  807.723533]        [<c04362d4>] rtnl_lock+0x1c/0x2c
[  807.725138]        [<f20be0d8>] wiphy_register+0x510/0x53c [cfg80211]
[  807.726798]        [<f64815ec>] ieee80211_register_hw+0x3f8/0x82c [mac80211]
[  807.728431]        [<f64c6890>] b43_request_firmware+0x8c/0x198 [b43]
[  807.730025]        [<c004b1f4>] process_one_work+0x1a4/0x498
[  807.731549]        [<c004b8b4>] worker_thread+0x17c/0x428
[  807.733025]        [<c0051280>] kthread+0xa8/0xac
[  807.734439]        [<c0010990>] ret_from_kernel_thread+0x64/0x6c
[  807.735810] 
[  807.735810] -> #0 ((&wl->firmware_load)){+.+.+.}:
[  807.738389]        [<c0071b38>] lock_acquire+0x50/0x6c
[  807.739694]        [<c0049d58>] flush_work+0x3c/0x2b0
[  807.740980]        [<c004c30c>] __cancel_work_timer+0x94/0xec
[  807.742271]        [<f64c748c>] b43_wireless_core_stop+0x5c/0x234 [b43]
[  807.743574]        [<f64c76b0>] b43_op_stop+0x4c/0x88 [b43]
[  807.744888]        [<f64a4c08>] ieee80211_stop_device+0x4c/0x8c [mac80211]
[  807.746240]        [<f64913a8>] ieee80211_do_stop+0x2c0/0x5dc [mac80211]
[  807.747582]        [<f64916dc>] ieee80211_stop+0x18/0x2c [mac80211]
[  807.748925]        [<c0423d94>] __dev_close_many+0xb0/0x100
[  807.750257]        [<c0423e10>] __dev_close+0x2c/0x4c
[  807.751560]        [<c0427514>] __dev_change_flags+0x124/0x178
[  807.752868]        [<c042761c>] dev_change_flags+0x1c/0x64
[  807.754177]        [<c047db9c>] devinet_ioctl+0x69c/0x74c
[  807.755459]        [<c047ecf8>] inet_ioctl+0xcc/0xf8
[  807.756709]        [<c040eab0>] sock_ioctl+0x70/0x2e8
[  807.757948]        [<c00d9f70>] do_vfs_ioctl+0xa4/0x7c0
[  807.759182]        [<c00da6d0>] sys_ioctl+0x44/0x70
[  807.760407]        [<c001084c>] ret_from_syscall+0x0/0x38
[  807.761622] 
[  807.761622] other info that might help us debug this:
[  807.761622] 
[  807.765107]  Possible unsafe locking scenario:
[  807.765107] 
[  807.767412]        CPU0                    CPU1
[  807.768561]        ----                    ----
[  807.769690]   lock(rtnl_mutex);
[  807.770822]                                lock((&wl->firmware_load));
[  807.771970]                                lock(rtnl_mutex);
[  807.773115]   lock((&wl->firmware_load));
[  807.774244] 
[  807.774244]  *** DEADLOCK ***
[  807.774244] 
[  807.777405] 1 lock held by wpa_supplicant/4129:
[  807.778475]  #0:  (rtnl_mutex){+.+.+.}, at: [<c04362d4>] rtnl_lock+0x1c/0x2c
[  807.779628] 
[  807.779628] stack backtrace:
[  807.781720] Call Trace:
[  807.782765] [eea2db20] [c0009160] show_stack+0x48/0x15c (unreliable)
[  807.784087] [eea2db60] [c04fae24] print_circular_bug+0x2b0/0x2c8
[  807.785169] [eea2db90] [c0071300] __lock_acquire+0x14f4/0x18c8
[  807.786254] [eea2dc30] [c0071b38] lock_acquire+0x50/0x6c
[  807.787335] [eea2dc50] [c0049d58] flush_work+0x3c/0x2b0
[  807.788418] [eea2dcc0] [c004c30c] __cancel_work_timer+0x94/0xec
[  807.789516] [eea2dcf0] [f64c748c] b43_wireless_core_stop+0x5c/0x234 [b43]
[  807.790629] [eea2dd20] [f64c76b0] b43_op_stop+0x4c/0x88 [b43]
[  807.791754] [eea2dd40] [f64a4c08] ieee80211_stop_device+0x4c/0x8c [mac80211]
[  807.792888] [eea2dd50] [f64913a8] ieee80211_do_stop+0x2c0/0x5dc [mac80211]
[  807.794016] [eea2dd80] [f64916dc] ieee80211_stop+0x18/0x2c [mac80211]
[  807.795124] [eea2dd90] [c0423d94] __dev_close_many+0xb0/0x100
[  807.796237] [eea2ddb0] [c0423e10] __dev_close+0x2c/0x4c
[  807.797345] [eea2ddd0] [c0427514] __dev_change_flags+0x124/0x178
[  807.798451] [eea2ddf0] [c042761c] dev_change_flags+0x1c/0x64
[  807.799564] [eea2de10] [c047db9c] devinet_ioctl+0x69c/0x74c
[  807.800680] [eea2de70] [c047ecf8] inet_ioctl+0xcc/0xf8
[  807.801792] [eea2de80] [c040eab0] sock_ioctl+0x70/0x2e8
[  807.802905] [eea2dea0] [c00d9f70] do_vfs_ioctl+0xa4/0x7c0
[  807.804025] [eea2df10] [c00da6d0] sys_ioctl+0x44/0x70
[  807.805145] [eea2df40] [c001084c] ret_from_syscall+0x0/0x38
[  807.806270] --- Exception: c01 at 0xfbf2758
[  807.806270]     LR = 0xfbf26bc

[  850.445766] b43-phy1: Broadcom 4306 WLAN found (core revision 5)
[  850.469062] b43-phy1: Found PHY: Analog 2, Type 2 (G), Revision 2
[  850.493199] Broadcom 43xx driver loaded [ Features: P ]
[  850.497448] ieee80211 phy1: Selected rate control algorithm 'minstrel_ht'
[  860.820792] b43-phy1: Loading firmware version 410.2160 (2007-05-26 15:32:10)


[0] https://lkml.org/lkml/2013/1/3/543
-- 
BOFH excuse #174:

Backbone adjustment

^ permalink raw reply

* Re: [PATCH powerpc] Use VPA subfunction macros instead of numbers for vpa calls
From: Michael Ellerman @ 2013-02-04  5:30 UTC (permalink / raw)
  To: Li Zhong; +Cc: Paul Mackerras, PowerPC email list
In-Reply-To: <1359101541.2666.59.camel@ThinkPad-T5421.cn.ibm.com>

On Fri, 2013-01-25 at 16:12 +0800, Li Zhong wrote:
> Use macros in vpa calls. 
> 
> Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>

Looks good.

cheers

^ permalink raw reply

* Re: [PATCH 0/3] Enable multiple MSI feature in pSeries
From: Michael Ellerman @ 2013-02-04  5:56 UTC (permalink / raw)
  To: Mike Qiu; +Cc: tglx, linuxppc-dev, linux-kernel
In-Reply-To: <510F2FDB.6020303@linux.vnet.ibm.com>

On Mon, 2013-02-04 at 11:49 +0800, Mike Qiu wrote:
> > On Tue, 2013-01-15 at 15:38 +0800, Mike Qiu wrote:
> > > Currently, multiple MSI feature hasn't been enabled in pSeries,
> > > These patches try to enbale this feature.
> > Hi Mike,
> > 
> > > These patches have been tested by using ipr driver, and the driver patch
> > > has been made by Wen Xiong <wenxiong@linux.vnet.ibm.com>:
> > So who wrote these patches? Normally we would expect the original author
> > to post the patches if at all possible.
> Hi Michael
> 
> These Multiple MSI patches were wrote by myself, you know this feature
> has not enabled
> and it need device driver to test whether it works suitable. So I test
> my patches use 
> Wen Xiong's ipr patches, which has been send out to the maillinglist.
> 
> I'm the original author :)

Ah OK, sorry, that was more or less clear from your mail but I just
misunderstood.

> > > [PATCH 0/7] Add support for new IBM SAS controllers
> > I would like to see the full series, including the driver enablement.
> Yep, but the driver patches were wrote by Wen Xiong and has been send
> out.

OK, you mean this series?

http://thread.gmane.org/gmane.linux.scsi/79639


> I just use her patches to test my patches. all device support Multiple
> MSI can use my feature not only IBM SAS controllers, I also test my
> patches use the broadcom wireless card tg3, and also works OK.

You mean drivers/net/ethernet/broadcom/tg3.c ? I don't see where it
calls pci_enable_msi_block() ?

All devices /can/ use it, but the driver needs to be updated. Currently
we have two drivers that do so (in Linus' tree), plus the updated IPR.

> > > Test platform: One partition of pSeries with one cpu core(4 SMTs) and 
> > >                RAID bus controller: IBM PCI-E IPR SAS Adapter (ASIC) in POWER7
> > > OS version: SUSE Linux Enterprise Server 11 SP2  (ppc64) with 3.8-rc3 kernel 
> > > 
> > > IRQ 21 and 22 are assigned to the ipr device which support 2 mutiple MSI.
> > > 
> > > The test results is shown by 'cat /proc/interrups':
> > >           CPU0       CPU1       CPU2       CPU3       
> > > 21:          6          5          5          5      XICS Level     host1-0
> > > 22:        817        814        816        813      XICS Level     host1-1
> > This shows that you are correctly configuring two MSIs.
> > 
> > But the key advantage of using multiple interrupts is to distribute load
> > across CPUs and improve performance. So I would like to see some
> > performance numbers that show that there is a real benefit for all the
> > extra complexity in the code.

> Yes, the system just has suport two MSIs. Anyway, I will try to do
> some proformance test, to show the real benefit.
> But actually it needs the driver to do so. As the data show above, it
> seems there is some problems in use the interrupt, the irq 21 use few,
> most use 22, I will discuss with the driver author to see why and if
> she fixed, I will give out the proformance result.

Yeah that would be good.

I really dislike that we have a separate API for multi-MSI vs MSI-X, and
pci_enable_msi_block() also pushes the contiguous power-of-2 allocation
into the irq domain layer, which is unpleasant. So if we really must do
multi-MSI I would like to do it differently.

cheers

^ permalink raw reply

* Re: [PATCH 0/3] Enable multiple MSI feature in pSeries
From: Mike Qiu @ 2013-02-04  6:43 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: tglx, linuxppc-dev, linux-kernel
In-Reply-To: <1359957375.25414.31.camel@concordia>

2013/2/4 13:56, Michael Ellerman:
> On Mon, 2013-02-04 at 11:49 +0800, Mike Qiu wrote:
>>> On Tue, 2013-01-15 at 15:38 +0800, Mike Qiu wrote:
>>>> Currently, multiple MSI feature hasn't been enabled in pSeries,
>>>> These patches try to enbale this feature.
>>> Hi Mike,
>>>
>>>> These patches have been tested by using ipr driver, and the driver patch
>>>> has been made by Wen Xiong <wenxiong@linux.vnet.ibm.com>:
>>> So who wrote these patches? Normally we would expect the original author
>>> to post the patches if at all possible.
>> Hi Michael
>>
>> These Multiple MSI patches were wrote by myself, you know this feature
>> has not enabled
>> and it need device driver to test whether it works suitable. So I test
>> my patches use
>> Wen Xiong's ipr patches, which has been send out to the maillinglist.
>>
>> I'm the original author :)
> Ah OK, sorry, that was more or less clear from your mail but I just
> misunderstood.
>
>>>> [PATCH 0/7] Add support for new IBM SAS controllers
>>> I would like to see the full series, including the driver enablement.
>> Yep, but the driver patches were wrote by Wen Xiong and has been send
>> out.
> OK, you mean this series?
>
> http://thread.gmane.org/gmane.linux.scsi/79639
Yes, exactly.
>
>
>> I just use her patches to test my patches. all device support Multiple
>> MSI can use my feature not only IBM SAS controllers, I also test my
>> patches use the broadcom wireless card tg3, and also works OK.
> You mean drivers/net/ethernet/broadcom/tg3.c ? I don't see where it
> calls pci_enable_msi_block() ?
Yes, I just modify the driver to support mutiple MSI.
>
> All devices /can/ use it, but the driver needs to be updated. Currently
> we have two drivers that do so (in Linus' tree), plus the updated IPR.
Not all devices, just the device which support the multiple MSI by hardware,
can use it
>
>>>> Test platform: One partition of pSeries with one cpu core(4 SMTs) and
>>>>                 RAID bus controller: IBM PCI-E IPR SAS Adapter (ASIC) in POWER7
>>>> OS version: SUSE Linux Enterprise Server 11 SP2  (ppc64) with 3.8-rc3 kernel
>>>>
>>>> IRQ 21 and 22 are assigned to the ipr device which support 2 mutiple MSI.
>>>>
>>>> The test results is shown by 'cat /proc/interrups':
>>>>            CPU0       CPU1       CPU2       CPU3
>>>> 21:          6          5          5          5      XICS Level     host1-0
>>>> 22:        817        814        816        813      XICS Level     host1-1
>>> This shows that you are correctly configuring two MSIs.
>>>
>>> But the key advantage of using multiple interrupts is to distribute load
>>> across CPUs and improve performance. So I would like to see some
>>> performance numbers that show that there is a real benefit for all the
>>> extra complexity in the code.
>> Yes, the system just has suport two MSIs. Anyway, I will try to do
>> some proformance test, to show the real benefit.
>> But actually it needs the driver to do so. As the data show above, it
>> seems there is some problems in use the interrupt, the irq 21 use few,
>> most use 22, I will discuss with the driver author to see why and if
>> she fixed, I will give out the proformance result.
> Yeah that would be good.
>
> I really dislike that we have a separate API for multi-MSI vs MSI-X, and
> pci_enable_msi_block() also pushes the contiguous power-of-2 allocation
> into the irq domain layer, which is unpleasant. So if we really must do
> multi-MSI I would like to do it differently.
Yes, but the multi-MSI must need the hardware support, it is one extend 
for MSI,
The device may sopport MSI and multiple MSI, but not support MSI-X.
for these devices, we'd better use multiple MSI to makes it more efficiency,
compare with MSI.

multi-MSI just can use no more than 32 interrupts

Thanks
>
> cheers
>
>

^ permalink raw reply

* [PATCH RESEND 1/1] arch Kconfig: remove references to IRQ_PER_CPU
From: James Hogan @ 2013-02-04 10:09 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-arch, linux-mips, James Hogan, linux-ia64, linux-parisc,
	linux-sh, linux-hexagon, Helge Deller, James E.J. Bottomley,
	Fenghua Yu, Paul Mundt, Mike Frysinger, Ralf Baechle,
	uclinux-dist-devel, Thomas Gleixner, linuxppc-dev, Paul Mackerras

The IRQ_PER_CPU Kconfig symbol was removed in the following commit:

Commit 6a58fb3bad099076f36f0f30f44507bc3275cdb6 ("genirq: Remove
CONFIG_IRQ_PER_CPU") merged in v2.6.39-rc1.

But IRQ_PER_CPU wasn't removed from any of the architecture Kconfig
files where it was defined or selected. It's completely unused so remove
the remaining references.

Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Acked-by: Tony Luck <tony.luck@intel.com>
Acked-by: Richard Kuo <rkuo@codeaurora.org>
---

Does anybody want to pick this patch up?

 arch/blackfin/Kconfig |    1 -
 arch/hexagon/Kconfig  |    1 -
 arch/ia64/Kconfig     |    1 -
 arch/mips/Kconfig     |    1 -
 arch/parisc/Kconfig   |    1 -
 arch/powerpc/Kconfig  |    1 -
 arch/sh/Kconfig       |    3 ---
 7 files changed, 0 insertions(+), 9 deletions(-)

diff --git a/arch/blackfin/Kconfig b/arch/blackfin/Kconfig
index b6f3ad5..c709715 100644
--- a/arch/blackfin/Kconfig
+++ b/arch/blackfin/Kconfig
@@ -38,7 +38,6 @@ config BLACKFIN
 	select HAVE_GENERIC_HARDIRQS
 	select GENERIC_ATOMIC64
 	select GENERIC_IRQ_PROBE
-	select IRQ_PER_CPU if SMP
 	select USE_GENERIC_SMP_HELPERS if SMP
 	select HAVE_NMI_WATCHDOG if NMI_WATCHDOG
 	select GENERIC_SMP_IDLE_THREAD
diff --git a/arch/hexagon/Kconfig b/arch/hexagon/Kconfig
index 0744f7d..800dd9c 100644
--- a/arch/hexagon/Kconfig
+++ b/arch/hexagon/Kconfig
@@ -12,7 +12,6 @@ config HEXAGON
 	# select ARCH_WANT_OPTIONAL_GPIOLIB
 	# select ARCH_REQUIRE_GPIOLIB
 	# select HAVE_CLK
-	# select IRQ_PER_CPU
 	# select GENERIC_PENDING_IRQ if SMP
 	select HAVE_IRQ_WORK
 	select GENERIC_ATOMIC64
diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig
index 3279646..00c2e88 100644
--- a/arch/ia64/Kconfig
+++ b/arch/ia64/Kconfig
@@ -29,7 +29,6 @@ config IA64
 	select ARCH_DISCARD_MEMBLOCK
 	select GENERIC_IRQ_PROBE
 	select GENERIC_PENDING_IRQ if SMP
-	select IRQ_PER_CPU
 	select GENERIC_IRQ_SHOW
 	select ARCH_WANT_OPTIONAL_GPIOLIB
 	select ARCH_HAVE_NMI_SAFE_CMPXCHG
diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index 2ac626a..451c2e7 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -2161,7 +2161,6 @@ source "mm/Kconfig"
 config SMP
 	bool "Multi-Processing support"
 	depends on SYS_SUPPORTS_SMP
-	select IRQ_PER_CPU
 	select USE_GENERIC_SMP_HELPERS
 	help
 	  This enables support for systems with more than one CPU. If you have
diff --git a/arch/parisc/Kconfig b/arch/parisc/Kconfig
index b77feff..8525be4 100644
--- a/arch/parisc/Kconfig
+++ b/arch/parisc/Kconfig
@@ -16,7 +16,6 @@ config PARISC
 	select BROKEN_RODATA
 	select GENERIC_IRQ_PROBE
 	select GENERIC_PCI_IOMAP
-	select IRQ_PER_CPU
 	select ARCH_HAVE_NMI_SAFE_CMPXCHG
 	select GENERIC_SMP_IDLE_THREAD
 	select GENERIC_STRNCPY_FROM_USER
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 17903f1..f3d7090a 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -125,7 +125,6 @@ config PPC
 	select HAVE_GENERIC_HARDIRQS
 	select ARCH_WANT_IPC_PARSE_VERSION
 	select SPARSE_IRQ
-	select IRQ_PER_CPU
 	select IRQ_DOMAIN
 	select GENERIC_IRQ_SHOW
 	select GENERIC_IRQ_SHOW_LEVEL
diff --git a/arch/sh/Kconfig b/arch/sh/Kconfig
index babc2b8..6f799ec 100644
--- a/arch/sh/Kconfig
+++ b/arch/sh/Kconfig
@@ -91,9 +91,6 @@ config GENERIC_CSUM
 config GENERIC_HWEIGHT
 	def_bool y
 
-config IRQ_PER_CPU
-	def_bool y
-
 config GENERIC_GPIO
 	def_bool n
 
-- 
1.7.7.6

^ permalink raw reply related

* [PATCH v2] powerpc/512x: add function for chip select parameter configuration
From: Anatolij Gustschin @ 2013-02-04 10:16 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Anatolij Gustschin
In-Reply-To: <1359725305-23916-1-git-send-email-agust@denx.de>

Add ability to configure chip select (CS) parameters for devices
that need different CS parameters setup after their configuration.
I.e. an FPGA device on LP bus can require different CS parameters
for its bus interface after loading firmware into it. A driver
can easily reconfigure the LPC CS parameters using this function.

Signed-off-by: Anatolij Gustschin <agust@denx.de>
---
v2:
 - define CS in commit log
 - use unsigned int type for cs argument to simplify
   valid CS range check
 - use EXPORT_SYMBOL

 arch/powerpc/include/asm/mpc5121.h           |   16 +++++++++++++
 arch/powerpc/platforms/512x/mpc512x_shared.c |   30 ++++++++++++++++++++++++++
 2 files changed, 46 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/include/asm/mpc5121.h b/arch/powerpc/include/asm/mpc5121.h
index 8c0ab2c..8ae133e 100644
--- a/arch/powerpc/include/asm/mpc5121.h
+++ b/arch/powerpc/include/asm/mpc5121.h
@@ -53,4 +53,20 @@ struct mpc512x_ccm {
 	u32	m4ccr;	/* MSCAN4 CCR */
 	u8	res[0x98]; /* Reserved */
 };
+
+/*
+ * LPC Module
+ */
+struct mpc512x_lpc {
+	u32	cs_cfg[8];	/* CS config */
+	u32	cs_ctrl;	/* CS Control Register */
+	u32	cs_status;	/* CS Status Register */
+	u32	burst_ctrl;	/* CS Burst Control Register */
+	u32	deadcycle_ctrl;	/* CS Deadcycle Control Register */
+	u32	holdcycle_ctrl;	/* CS Holdcycle Control Register */
+	u32	alt;		/* Address Latch Timing Register */
+};
+
+int mpc512x_cs_config(unsigned int cs, u32 val);
+
 #endif /* __ASM_POWERPC_MPC5121_H__ */
diff --git a/arch/powerpc/platforms/512x/mpc512x_shared.c b/arch/powerpc/platforms/512x/mpc512x_shared.c
index 39d5630..8e2bde9 100644
--- a/arch/powerpc/platforms/512x/mpc512x_shared.c
+++ b/arch/powerpc/platforms/512x/mpc512x_shared.c
@@ -456,3 +456,33 @@ void __init mpc512x_init(void)
 	mpc512x_restart_init();
 	mpc512x_psc_fifo_init();
 }
+
+/**
+ * mpc512x_cs_config - Setup chip select configuration
+ * @cs: chip select number
+ * @val: chip select configuration value
+ *
+ * Perform chip select configuration for devices on LocalPlus Bus.
+ * Intended to dynamically reconfigure the chip select parameters
+ * for configurable devices on the bus.
+ */
+int mpc512x_cs_config(unsigned int cs, u32 val)
+{
+	static struct mpc512x_lpc __iomem *lpc;
+	struct device_node *np;
+
+	if (cs > 7)
+		return -EINVAL;
+
+	if (!lpc) {
+		np = of_find_compatible_node(NULL, NULL, "fsl,mpc5121-lpc");
+		lpc = of_iomap(np, 0);
+		of_node_put(np);
+		if (!lpc)
+			return -ENOMEM;
+	}
+
+	out_be32(&lpc->cs_cfg[cs], val);
+	return 0;
+}
+EXPORT_SYMBOL(mpc512x_cs_config);
-- 
1.7.5.4

^ permalink raw reply related

* Re: [PATCH RESEND 1/1] arch Kconfig: remove references to IRQ_PER_CPU
From: Paul Mundt @ 2013-02-04 11:30 UTC (permalink / raw)
  To: James Hogan
  Cc: linux-arch, linux-mips, linux-ia64, linux-parisc, linux-sh,
	linux-hexagon, Helge Deller, linux-kernel, Fenghua Yu,
	James E.J. Bottomley, Mike Frysinger, Ralf Baechle,
	uclinux-dist-devel, Thomas Gleixner, linuxppc-dev, Paul Mackerras
In-Reply-To: <1359972583-17134-1-git-send-email-james.hogan@imgtec.com>

On Mon, Feb 04, 2013 at 10:09:43AM +0000, James Hogan wrote:
> The IRQ_PER_CPU Kconfig symbol was removed in the following commit:
> 
> Commit 6a58fb3bad099076f36f0f30f44507bc3275cdb6 ("genirq: Remove
> CONFIG_IRQ_PER_CPU") merged in v2.6.39-rc1.
> 
> But IRQ_PER_CPU wasn't removed from any of the architecture Kconfig
> files where it was defined or selected. It's completely unused so remove
> the remaining references.
> 
> Signed-off-by: James Hogan <james.hogan@imgtec.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Mike Frysinger <vapier@gentoo.org>
> Cc: Fenghua Yu <fenghua.yu@intel.com>
> Cc: Ralf Baechle <ralf@linux-mips.org>
> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
> Cc: Helge Deller <deller@gmx.de>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Paul Mundt <lethal@linux-sh.org>
> Acked-by: Tony Luck <tony.luck@intel.com>
> Acked-by: Richard Kuo <rkuo@codeaurora.org>

Acked-by: Paul Mundt <lethal@linux-sh.org>

^ permalink raw reply

* Re: [PATCH?] Move ACPI device nodes under /sys/firmware/acpi (was: Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework)
From: Rafael J. Wysocki @ 2013-02-04 12:34 UTC (permalink / raw)
  To: Greg KH
  Cc: linux-s390, Toshi Kani, jiang.liu, wency, linux-acpi, yinghai,
	linux-kernel, linux-mm, isimatu.yasuaki, srivatsa.bhat, guohanjun,
	bhelgaas, akpm, linuxppc-dev, lenb
In-Reply-To: <20130204012447.GB6433@kroah.com>

On Sunday, February 03, 2013 07:24:47 PM Greg KH wrote:
> On Sat, Feb 02, 2013 at 11:18:20PM +0100, Rafael J. Wysocki wrote:
> > On Saturday, February 02, 2013 09:15:37 PM Rafael J. Wysocki wrote:
> > > On Saturday, February 02, 2013 03:58:01 PM Greg KH wrote:
> > [...]
> > > 
> > > > I know it's more complicated with these types of devices, and I think we
> > > > are getting closer to the correct solution, I just don't want to ever
> > > > see duplicate devices in the driver model for the same physical device.
> > > 
> > > Do you mean two things based on struct device for the same hardware component?
> > > That's been happening already pretty much forever for every PCI device known
> > > to the ACPI layer, for PNP and many others.  However, those ACPI things are (or
> > > rather should be, but we're going to clean that up) only for convenience (to be
> > > able to see the namespace structure and related things in sysfs).  So the stuff
> > > under /sys/devices/LNXSYSTM\:00/ is not "real".  In my view it shouldn't even
> > > be under /sys/devices/ (/sys/firmware/acpi/ seems to be a better place for it),
> > > but that may be difficult to change without breaking user space (maybe we can
> > > just symlink it from /sys/devices/ or something).  And the ACPI bus type
> > > shouldn't even exist in my opinion.
> > 
> > Well, well.
> > 
> > In fact, the appended patch moves the whole ACPI device nodes tree under
> > /sys/firmware/acpi/ and I'm not seeing any negative consequences of that on my
> > test box (events work and so on).  User space is quite new on it, though, and
> > the patch is hackish.
> 
> Try booting a RHEL 5 image on this type of kernel, or some old Fedora
> releases, they were sensitive to changes in sysfs.

Well, I've found a machine where it causes problems to happen.

I'll try to add a symlink from /sys/devices to that and see what happens then.

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply

* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
From: Greg KH @ 2013-02-04 12:46 UTC (permalink / raw)
  To: Toshi Kani
  Cc: linux-s390@vger.kernel.org, jiang.liu@huawei.com,
	wency@cn.fujitsu.com, linux-mm@kvack.org, yinghai@kernel.org,
	linux-kernel@vger.kernel.org, Rafael J. Wysocki,
	linux-acpi@vger.kernel.org, isimatu.yasuaki@jp.fujitsu.com,
	srivatsa.bhat@linux.vnet.ibm.com, guohanjun@huawei.com,
	bhelgaas@google.com, akpm@linux-foundation.org,
	linuxppc-dev@lists.ozlabs.org, lenb@kernel.org
In-Reply-To: <1359937689.23410.82.camel@misato.fc.hp.com>

On Sun, Feb 03, 2013 at 05:28:09PM -0700, Toshi Kani wrote:
> On Sat, 2013-02-02 at 16:01 +0100, Greg KH wrote:
> > On Fri, Feb 01, 2013 at 01:40:10PM -0700, Toshi Kani wrote:
> > > On Fri, 2013-02-01 at 07:30 +0000, Greg KH wrote:
> > > > On Thu, Jan 31, 2013 at 06:32:18PM -0700, Toshi Kani wrote:
> > > >  > This is already done for PCI host bridges and platform devices and I don't
> > > > > > see why we can't do that for the other types of devices too.
> > > > > > 
> > > > > > The only missing piece I see is a way to handle the "eject" problem, i.e.
> > > > > > when we try do eject a device at the top of a subtree and need to tear down
> > > > > > the entire subtree below it, but if that's going to lead to a system crash,
> > > > > > for example, we want to cancel the eject.  It seems to me that we'll need some
> > > > > > help from the driver core here.
> > > > > 
> > > > > There are three different approaches suggested for system device
> > > > > hot-plug:
> > > > >  A. Proceed within system device bus scan.
> > > > >  B. Proceed within ACPI bus scan.
> > > > >  C. Proceed with a sequence (as a mini-boot).
> > > > > 
> > > > > Option A uses system devices as tokens, option B uses acpi devices as
> > > > > tokens, and option C uses resource tables as tokens, for their handlers.
> > > > > 
> > > > > Here is summary of key questions & answers so far.  I hope this
> > > > > clarifies why I am suggesting option 3.
> > > > > 
> > > > > 1. What are the system devices?
> > > > > System devices provide system-wide core computing resources, which are
> > > > > essential to compose a computer system.  System devices are not
> > > > > connected to any particular standard buses.
> > > > 
> > > > Not a problem, lots of devices are not connected to any "particular
> > > > standard busses".  All this means is that system devices are connected
> > > > to the "system" bus, nothing more.
> > > 
> > > Can you give me a few examples of other devices that support hotplug and
> > > are not connected to any particular buses?  I will investigate them to
> > > see how they are managed to support hotplug.
> > 
> > Any device that is attached to any bus in the driver model can be
> > hotunplugged from userspace by telling it to be "unbound" from the
> > driver controlling it.  Try it for any platform device in your system to
> > see how it happens.
> 
> The unbind operation, as I understand from you, is to detach a driver
> from a device.  Yes, unbinding can be done for any devices.  It is
> however different from hot-plug operation, which unplugs a device.

Physically, yes, but to the driver involved, and the driver core, there
is no difference.  That was one of the primary goals of the driver core
creation so many years ago.

> Today, the unbind operation to an ACPI cpu/memory devices causes
> hot-unplug (offline) operation to them, which is one of the major issues
> for us since unbind cannot fail.  This patchset addresses this issue by
> making the unbind operation of ACPI cpu/memory devices to do the
> unbinding only.  ACPI drivers no longer control cpu and memory as they
> are supposed to be controlled by their drivers, cpu and memory modules.

I think that's the problem right there, solve that, please.

> > > > > 2. Why are the system devices special?
> > > > > The system devices are initialized during early boot-time, by multiple
> > > > > subsystems, from the boot-up sequence, in pre-defined order.  They
> > > > > provide low-level services to enable other subsystems to come up.
> > > > 
> > > > Sorry, no, that doesn't mean they are special, nothing here is unique
> > > > for the point of view of the driver model from any other device or bus.
> > > 
> > > I think system devices are unique in a sense that they are initialized
> > > before drivers run.
> > 
> > No, most all devices are "initialized" before a driver runs on it, USB
> > is one such example, PCI another, and I'm pretty sure that there are
> > others.
> 
> USB devices can be initialized after the USB bus driver is initialized.
> Similarly, PCI devices can be initialized after the PCI bus driver is
> initialized.  However, CPU and memory are initialized without any
> dependency to their bus driver since there is no such thing.

You can create such a thing if you want :)

> In addition, CPU and memory have two drivers -- their actual
> drivers/subsystems and their ACPI drivers.

Again, I feel that is the root of the problem.  Rafael seems to be
working on solving this, which I think is essencial to your work as
well.

thanks,

greg k-h

^ permalink raw reply

* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
From: Greg KH @ 2013-02-04 12:48 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-s390, Toshi Kani, jiang.liu, wency, linux-acpi, yinghai,
	linux-kernel, linux-mm, isimatu.yasuaki, srivatsa.bhat, guohanjun,
	bhelgaas, akpm, linuxppc-dev, lenb
In-Reply-To: <5876609.Ic1nhHW6N2@vostro.rjw.lan>

On Sun, Feb 03, 2013 at 09:44:39PM +0100, Rafael J. Wysocki wrote:
> > Yes, but those are just remove events and we can only see how destructive they
> > were after the removal.  The point is to be able to figure out whether or not
> > we *want* to do the removal in the first place.
> > 
> > Say you have a computing node which signals a hardware problem in a processor
> > package (the container with CPU cores, memory, PCI host bridge etc.).  You
> > may want to eject that package, but you don't want to kill the system this
> > way.  So if the eject is doable, it is very much desirable to do it, but if it
> > is not doable, you'd rather shut the box down and do the replacement afterward.
> > That may be costly, however (maybe weeks of computations), so it should be
> > avoided if possible, but not at the expense of crashing the box if the eject
> > doesn't work out.
> 
> It seems to me that we could handle that with the help of a new flag, say
> "no_eject", in struct device, a global mutex, and a function that will walk
> the given subtree of the device hierarchy and check if "no_eject" is set for
> any devices in there.  Plus a global "no_eject" switch, perhaps.

I think this will always be racy, or at worst, slow things down on
normal device operations as you will always be having to grab this flag
whenever you want to do something new.

See my comments earlier about pci hotplug and the design decisions there
about "no eject" capabilities for why.

thanks,

greg k-h

^ permalink raw reply

* Re: 3.8: possible circular locking dependency detected
From: Michael Ellerman @ 2013-02-04 13:05 UTC (permalink / raw)
  To: Christian Kujau
  Cc: linux-wireless, Johannes Berg, linuxppc-dev, LKML, b43-dev
In-Reply-To: <alpine.DEB.2.01.1302031843450.13050@trent.utfs.org>

On Sun, 2013-02-03 at 21:09 -0800, Christian Kujau wrote:
> Hi,
> 
> similar to what I reported earlier [0] for 3.8.0-rc1, this happens during 
> "ifup wlan0" (which in effect starts wpa_supplicant to bring up a Broadcom 
> b43 wifi network interface). The interface is working though and continues 
> to work over several ifup/ifdown iterations.
> 
> The backtrace looks awfully similar to the earlier[0] report, but this 
> time it had b43* stuff in it so I thought I should report it. Full dmesg 
> and .config here: http://nerdbynature.de/bits/3.8.0-rc6/

>[0] https://lkml.org/lkml/2013/1/3/543

Actually the backtrace looks very different, that was fb vs
console_lock.

This one should probably go to linux-wireless@vger.kernel.org (CC-ed).

IIUI it's actually the work handling that is the problem. In
b43_wireless_core_stop() it calls cancel_work_sync() with the RTNL held,
but the work function (b43_request_firmware()) also takes the RTNL via
ieee80211_register_hw() and wiphy_register(). eg:

> [  807.767412]        CPU0                    CPU1
> [  807.768561]        ----                    ----
> [  807.769690]   lock(rtnl_mutex);
> [  807.770822]                                process_one_work(..., &wl->firmware_load);
> [  807.771970]                                lock(rtnl_mutex);
> [  807.773115]   cancel_work_sync(&wl->firmware_load);

cheers

> [  807.693791] 
> [  807.695519] ======================================================
> [  807.697198] [ INFO: possible circular locking dependency detected ]
> [  807.698890] 3.8.0-rc6-00008-g8b31849 #1 Not tainted
> [  807.700573] -------------------------------------------------------
> [  807.702255] wpa_supplicant/4129 is trying to acquire lock:
> [  807.703925]  ((&wl->firmware_load)){+.+.+.}, at: [<c0049d1c>] flush_work+0x0/0x2b0
> [  807.705696] 
> [  807.705696] but task is already holding lock:
> [  807.709023]  (rtnl_mutex){+.+.+.}, at: [<c04362d4>] rtnl_lock+0x1c/0x2c
> [  807.710743] 
> [  807.710743] which lock already depends on the new lock.
> [  807.710743] 
> [  807.715541] 
> [  807.715541] the existing dependency chain (in reverse order) is:
> [  807.718691] 
> [  807.718691] -> #1 (rtnl_mutex){+.+.+.}:
> [  807.721903]        [<c04f69ac>] mutex_lock_nested+0x6c/0x2bc
> [  807.723533]        [<c04362d4>] rtnl_lock+0x1c/0x2c
> [  807.725138]        [<f20be0d8>] wiphy_register+0x510/0x53c [cfg80211]
> [  807.726798]        [<f64815ec>] ieee80211_register_hw+0x3f8/0x82c [mac80211]
> [  807.728431]        [<f64c6890>] b43_request_firmware+0x8c/0x198 [b43]
> [  807.730025]        [<c004b1f4>] process_one_work+0x1a4/0x498
> [  807.731549]        [<c004b8b4>] worker_thread+0x17c/0x428
> [  807.733025]        [<c0051280>] kthread+0xa8/0xac
> [  807.734439]        [<c0010990>] ret_from_kernel_thread+0x64/0x6c
> [  807.735810] 
> [  807.735810] -> #0 ((&wl->firmware_load)){+.+.+.}:
> [  807.738389]        [<c0071b38>] lock_acquire+0x50/0x6c
> [  807.739694]        [<c0049d58>] flush_work+0x3c/0x2b0
> [  807.740980]        [<c004c30c>] __cancel_work_timer+0x94/0xec
> [  807.742271]        [<f64c748c>] b43_wireless_core_stop+0x5c/0x234 [b43]
> [  807.743574]        [<f64c76b0>] b43_op_stop+0x4c/0x88 [b43]
> [  807.744888]        [<f64a4c08>] ieee80211_stop_device+0x4c/0x8c [mac80211]
> [  807.746240]        [<f64913a8>] ieee80211_do_stop+0x2c0/0x5dc [mac80211]
> [  807.747582]        [<f64916dc>] ieee80211_stop+0x18/0x2c [mac80211]
> [  807.748925]        [<c0423d94>] __dev_close_many+0xb0/0x100
> [  807.750257]        [<c0423e10>] __dev_close+0x2c/0x4c
> [  807.751560]        [<c0427514>] __dev_change_flags+0x124/0x178
> [  807.752868]        [<c042761c>] dev_change_flags+0x1c/0x64
> [  807.754177]        [<c047db9c>] devinet_ioctl+0x69c/0x74c
> [  807.755459]        [<c047ecf8>] inet_ioctl+0xcc/0xf8
> [  807.756709]        [<c040eab0>] sock_ioctl+0x70/0x2e8
> [  807.757948]        [<c00d9f70>] do_vfs_ioctl+0xa4/0x7c0
> [  807.759182]        [<c00da6d0>] sys_ioctl+0x44/0x70
> [  807.760407]        [<c001084c>] ret_from_syscall+0x0/0x38
> [  807.761622] 
> [  807.761622] other info that might help us debug this:
> [  807.761622] 
> [  807.765107]  Possible unsafe locking scenario:
> [  807.765107] 
> [  807.767412]        CPU0                    CPU1
> [  807.768561]        ----                    ----
> [  807.769690]   lock(rtnl_mutex);
> [  807.770822]                                lock((&wl->firmware_load));
> [  807.771970]                                lock(rtnl_mutex);
> [  807.773115]   lock((&wl->firmware_load));
> [  807.774244] 
> [  807.774244]  *** DEADLOCK ***
> [  807.774244] 
> [  807.777405] 1 lock held by wpa_supplicant/4129:
> [  807.778475]  #0:  (rtnl_mutex){+.+.+.}, at: [<c04362d4>] rtnl_lock+0x1c/0x2c
> [  807.779628] 
> [  807.779628] stack backtrace:
> [  807.781720] Call Trace:
> [  807.782765] [eea2db20] [c0009160] show_stack+0x48/0x15c (unreliable)
> [  807.784087] [eea2db60] [c04fae24] print_circular_bug+0x2b0/0x2c8
> [  807.785169] [eea2db90] [c0071300] __lock_acquire+0x14f4/0x18c8
> [  807.786254] [eea2dc30] [c0071b38] lock_acquire+0x50/0x6c
> [  807.787335] [eea2dc50] [c0049d58] flush_work+0x3c/0x2b0
> [  807.788418] [eea2dcc0] [c004c30c] __cancel_work_timer+0x94/0xec
> [  807.789516] [eea2dcf0] [f64c748c] b43_wireless_core_stop+0x5c/0x234 [b43]
> [  807.790629] [eea2dd20] [f64c76b0] b43_op_stop+0x4c/0x88 [b43]
> [  807.791754] [eea2dd40] [f64a4c08] ieee80211_stop_device+0x4c/0x8c [mac80211]
> [  807.792888] [eea2dd50] [f64913a8] ieee80211_do_stop+0x2c0/0x5dc [mac80211]
> [  807.794016] [eea2dd80] [f64916dc] ieee80211_stop+0x18/0x2c [mac80211]
> [  807.795124] [eea2dd90] [c0423d94] __dev_close_many+0xb0/0x100
> [  807.796237] [eea2ddb0] [c0423e10] __dev_close+0x2c/0x4c
> [  807.797345] [eea2ddd0] [c0427514] __dev_change_flags+0x124/0x178
> [  807.798451] [eea2ddf0] [c042761c] dev_change_flags+0x1c/0x64
> [  807.799564] [eea2de10] [c047db9c] devinet_ioctl+0x69c/0x74c
> [  807.800680] [eea2de70] [c047ecf8] inet_ioctl+0xcc/0xf8
> [  807.801792] [eea2de80] [c040eab0] sock_ioctl+0x70/0x2e8
> [  807.802905] [eea2dea0] [c00d9f70] do_vfs_ioctl+0xa4/0x7c0
> [  807.804025] [eea2df10] [c00da6d0] sys_ioctl+0x44/0x70
> [  807.805145] [eea2df40] [c001084c] ret_from_syscall+0x0/0x38
> [  807.806270] --- Exception: c01 at 0xfbf2758
> [  807.806270]     LR = 0xfbf26bc
> 
> [  850.445766] b43-phy1: Broadcom 4306 WLAN found (core revision 5)
> [  850.469062] b43-phy1: Found PHY: Analog 2, Type 2 (G), Revision 2
> [  850.493199] Broadcom 43xx driver loaded [ Features: P ]
> [  850.497448] ieee80211 phy1: Selected rate control algorithm 'minstrel_ht'
> [  860.820792] b43-phy1: Loading firmware version 410.2160 (2007-05-26 15:32:10)
> 
> 

^ permalink raw reply

* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
From: Rafael J. Wysocki @ 2013-02-04 13:41 UTC (permalink / raw)
  To: Greg KH
  Cc: linux-s390, Toshi Kani, jiang.liu, wency, linux-acpi, yinghai,
	linux-kernel, linux-mm, isimatu.yasuaki, srivatsa.bhat, guohanjun,
	bhelgaas, akpm, linuxppc-dev, lenb
In-Reply-To: <20130204012349.GA6433@kroah.com>

On Sunday, February 03, 2013 07:23:49 PM Greg KH wrote:
> On Sat, Feb 02, 2013 at 09:15:37PM +0100, Rafael J. Wysocki wrote:
> > On Saturday, February 02, 2013 03:58:01 PM Greg KH wrote:
> > > On Fri, Feb 01, 2013 at 11:12:59PM +0100, Rafael J. Wysocki wrote:
> > > > On Friday, February 01, 2013 08:23:12 AM Greg KH wrote:
> > > > > On Thu, Jan 31, 2013 at 09:54:51PM +0100, Rafael J. Wysocki wrote:
> > > > > > > > But, again, I'm going to ask why you aren't using the existing cpu /
> > > > > > > > memory / bridge / node devices that we have in the kernel.  Please use
> > > > > > > > them, or give me a _really_ good reason why they will not work.
> > > > > > > 
> > > > > > > We cannot use the existing system devices or ACPI devices here.  During
> > > > > > > hot-plug, ACPI handler sets this shp_device info, so that cpu and memory
> > > > > > > handlers (drivers/cpu.c and mm/memory_hotplug.c) can obtain their target
> > > > > > > device information in a platform-neutral way.  During hot-add, we first
> > > > > > > creates an ACPI device node (i.e. device under /sys/bus/acpi/devices),
> > > > > > > but platform-neutral modules cannot use them as they are ACPI-specific.
> > > > > > 
> > > > > > But suppose we're smart and have ACPI scan handlers that will create
> > > > > > "physical" device nodes for those devices during the ACPI namespace scan.
> > > > > > Then, the platform-neutral nodes will be able to bind to those "physical"
> > > > > > nodes.  Moreover, it should be possible to get a hierarchy of device objects
> > > > > > this way that will reflect all of the dependencies we need to take into
> > > > > > account during hot-add and hot-remove operations.  That may not be what we
> > > > > > have today, but I don't see any *fundamental* obstacles preventing us from
> > > > > > using this approach.
> > > > > 
> > > > > I would _much_ rather see that be the solution here as I think it is the
> > > > > proper one.
> > > > > 
> > > > > > This is already done for PCI host bridges and platform devices and I don't
> > > > > > see why we can't do that for the other types of devices too.
> > > > > 
> > > > > I agree.
> > > > > 
> > > > > > The only missing piece I see is a way to handle the "eject" problem, i.e.
> > > > > > when we try do eject a device at the top of a subtree and need to tear down
> > > > > > the entire subtree below it, but if that's going to lead to a system crash,
> > > > > > for example, we want to cancel the eject.  It seems to me that we'll need some
> > > > > > help from the driver core here.
> > > > > 
> > > > > I say do what we always have done here, if the user asked us to tear
> > > > > something down, let it happen as they are the ones that know best :)
> > > > > 
> > > > > Seriously, I guess this gets back to the "fail disconnect" idea that the
> > > > > ACPI developers keep harping on.  I thought we already resolved this
> > > > > properly by having them implement it in their bus code, no reason the
> > > > > same thing couldn't happen here, right?
> > > > 
> > > > Not really. :-)  We haven't ever resolved that particular issue I'm afraid.
> > > 
> > > Ah, I didn't realize that.
> > > 
> > > > > I don't think the core needs to do anything special, but if so, I'll be glad
> > > > > to review it.
> > > > 
> > > > OK, so this is the use case.  We have "eject" defined for something like
> > > > a container with a number of CPU cores, PCI host bridge, and a memory
> > > > controller under it.  And a few pretty much arbitrary I/O devices as a bonus.
> > > > 
> > > > Now, there's a button on the system case labeled as "Eject" and if that button
> > > > is pressed, we're supposed to _try_ to eject all of those things at once.  We
> > > > are allowed to fail that request, though, if that's problematic for some
> > > > reason, but we're supposed to let the BIOS know about that.
> > > > 
> > > > Do you seriously think that if that button is pressed, we should just proceed
> > > > with removing all that stuff no matter what?  That'd be kind of like Russian
> > > > roulette for whoever pressed that button, because s/he could only press it and
> > > > wait for the system to either crash or not.  Or maybe to crash a bit later
> > > > because of some delayed stuff that would hit one of those devices that had just
> > > > gone.  Surely not a situation any admin of a high-availability system would
> > > > like to be in. :-)
> > > > 
> > > > Quite frankly, I have no idea how that can be addressed in a single bus type,
> > > > let alone ACPI (which is not even a proper bus type, just something pretending
> > > > to be one).
> > > 
> > > You don't have it as a single bus type, you have a controller somewhere,
> > > off of the bus being destroyed, that handles sending remove events to
> > > the device and tearing everything down.  PCI does this from the very
> > > beginning.
> > 
> > Yes, but those are just remove events and we can only see how destructive they
> > were after the removal.  The point is to be able to figure out whether or not
> > we *want* to do the removal in the first place.
> 
> Yes, but, you will always race if you try to test to see if you can shut
> down a device and then trying to do it.  So walking the bus ahead of
> time isn't a good idea.
>
> And, we really don't have a viable way to recover if disconnect() fails,
> do we.  What do we do in that situation, restore the other devices we
> disconnected successfully?  How do we remember/know what they were?
> 
> PCI hotplug almost had this same problem until the designers finally
> realized that they just had to accept the fact that removing a PCI
> device could either happen by:
> 	- a user yanking out the device, at which time the OS better
> 	  clean up properly no matter what happens
> 	- the user asked nicely to remove a device, and the OS can take
> 	  as long as it wants to complete that action, including
> 	  stalling for noticable amounts of time before eventually,
> 	  always letting the action succeed.
> 
> I think the second thing is what you have to do here.  If a user tells
> the OS it wants to remove these devices, you better do it.  If you
> can't, because memory is being used by someone else, either move them
> off, or just hope that nothing bad happens, before the user gets
> frustrated and yanks out the CPU/memory module themselves physically :)

Well, that we can't help, but sometimes users really *want* the OS to tell them
if it is safe to unplug something at this particualr time (think about the
Windows' "safe remove" feature for USB sticks, for example; that came out of
users' demand AFAIR).

So in my opinion it would be good to give them an option to do "safe eject" or
"forcible eject", whichever they prefer.

> > Say you have a computing node which signals a hardware problem in a processor
> > package (the container with CPU cores, memory, PCI host bridge etc.).  You
> > may want to eject that package, but you don't want to kill the system this
> > way.  So if the eject is doable, it is very much desirable to do it, but if it
> > is not doable, you'd rather shut the box down and do the replacement afterward.
> > That may be costly, however (maybe weeks of computations), so it should be
> > avoided if possible, but not at the expense of crashing the box if the eject
> > doesn't work out.
> 
> These same "situations" came up for PCI hotplug, and I still say the
> same resolution there holds true, as described above.  The user wants to
> remove something, so let them do it.  They always know best, and get mad
> at us if we think otherwise :)

Well, not necessarily.  Users sometimes really don't know what they are doing
and want us to give them a hint.  My opinion is that if we can give them a
hint, there's no reason not to.

> What does the ACPI spec say about this type of thing?  Surely the same
> people that did the PCI Hotplug spec were consulted when doing this part
> of the spec, right?  Yeah, I know, I can dream...

It's not very specific (as usual), but it gives hints. :-)

For example, there is the _OST method (Section 6.3.5 of ACPI 5) that we are
supposed to use to notify the platform of ejection failures and there are
status codes like "0x81: Device in use by application" or "0x82: Device busy"
that can be used in there.  So definitely the authors took ejection failures
for software-related reasons into consideration.

> > > I know it's more complicated with these types of devices, and I think we
> > > are getting closer to the correct solution, I just don't want to ever
> > > see duplicate devices in the driver model for the same physical device.
> > 
> > Do you mean two things based on struct device for the same hardware component?
> > That's been happening already pretty much forever for every PCI device known
> > to the ACPI layer, for PNP and many others.  However, those ACPI things are (or
> > rather should be, but we're going to clean that up) only for convenience (to be
> > able to see the namespace structure and related things in sysfs).  So the stuff
> > under /sys/devices/LNXSYSTM\:00/ is not "real".
> 
> Yes, I've never treated that as a "real" device because they (usually)
> didn't ever bind to the "real" driver that controlled the device and how
> it talked to the rest of the os (like a USB device for example.)  I
> always thought just of it as a "shadow" of the firmware image, nothing
> that should be directly operated on if at all possible.

Precisely.  That's why I'd like to move that stuff away from /sys/devices/
and I don't see a reason why these objects should be based on struct device.
They need kobjects to show up in sysfs, but apart from this they don't really
need anything from struct device as far as I can say.

> But, as you are pointing out, maybe this needs to be changed.  Having
> users have to look in one part of the tree for one interface to a
> device, and another totally different part for a different interface to
> the same physical device is crazy, don't you agree?

Well, it is confusing.  I don't have a problem with exposing the ACPI namespace
in the form of a directory structure in sysfs and I see some benefits from
doing that, but I'd like it to be clear what's represented by that directory
structure and I don't want people to confuse ACPI device objects with devices
(they are abstract interfaces to devices rather than anything else).

> As to how to solve it, I really have no idea, I don't know ACPI that
> well at all, and honestly, don't want to, I want to keep what little
> hair I have left...

I totally understand you. :-)

> > In my view it shouldn't even
> > be under /sys/devices/ (/sys/firmware/acpi/ seems to be a better place for it),
> 
> I agree.
> 
> > but that may be difficult to change without breaking user space (maybe we can
> > just symlink it from /sys/devices/ or something).  And the ACPI bus type
> > shouldn't even exist in my opinion.
> > 
> > There's much confusion in there and much work to clean that up, I agree, but
> > that's kind of separate from the hotplug thing.
> 
> I agree as well.
> 
> Best of luck.

Thanks. :-)

Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply

* Re: [PATCH v5 00/45] CPU hotplug: stop_machine()-free CPU hotplug
From: Srivatsa S. Bhat @ 2013-02-04 13:47 UTC (permalink / raw)
  To: tglx, peterz, tj, oleg, paulmck, rusty, mingo
  Cc: linux-arch, linux, nikunj, linux-pm, fweisbec, linux-doc,
	linux-kernel, rostedt, xiaoguangrong, rjw, sbw, wangyun,
	Srivatsa S. Bhat, netdev, namhyung, akpm, walken, linuxppc-dev,
	linux-arm-kernel
In-Reply-To: <20130122073210.13822.50434.stgit@srivatsabhat.in.ibm.com>

[-- Attachment #1: Type: text/plain, Size: 3770 bytes --]

On 01/22/2013 01:03 PM, Srivatsa S. Bhat wrote:
> Hi,
> 
> This patchset removes CPU hotplug's dependence on stop_machine() from the CPU
> offline path and provides an alternative (set of APIs) to preempt_disable() to
> prevent CPUs from going offline, which can be invoked from atomic context.
> The motivation behind the removal of stop_machine() is to avoid its ill-effects
> and thus improve the design of CPU hotplug. (More description regarding this
> is available in the patches).
> 
> All the users of preempt_disable()/local_irq_disable() who used to use it to
> prevent CPU offline, have been converted to the new primitives introduced in the
> patchset. Also, the CPU_DYING notifiers have been audited to check whether
> they can cope up with the removal of stop_machine() or whether they need to
> use new locks for synchronization (all CPU_DYING notifiers looked OK, without
> the need for any new locks).
> 
> Applies on v3.8-rc4. It currently has some locking issues with cpu idle (on
> which even lockdep didn't provide any insight unfortunately). So for now, it
> works with CONFIG_CPU_IDLE=n.
> 

I ran this patchset on a POWER 7 machine with 32 cores (128 logical CPUs)
[POWER doesn't have the cpu idle issue]. And the results (latency or the time
taken for a single CPU offline) are shown below.

Experiment:
----------

Run a heavy workload (genload from LTP) that generates significant system time;
With '# online CPUs' online, measure the time it takes to complete the stop-m/c
phase in mainline and the equivalent phase in the patched kernel for 1 CPU
offline operation. (It is important to note here that the measurement shows the
average time it takes to perform a *single* CPU offline operation).

Expected results:
----------------

Since stop-machine doesn't scale with no. of online CPUs, we expect the
mainline kernel to take longer and longer for taking 1 CPU offline, with
increasing no. of online CPUs. The patched kernel is expected to take a
constant amount of time, irrespective of the number of online CPUs, because it
has a scalable design.

Experimental results:
---------------------

                 Avg. latency of 1 CPU offline (ms) [stop-cpu/stop-m/c latency]

# online CPUs    Mainline (with stop-m/c)       This patchset (no stop-m/c)

      8                 17.04                          7.73

     16                 18.05                          6.44

     32                 17.31                          7.39

     64                 32.40                          9.28

    128                 98.23                          7.35

Analysis and conclusion:
------------------------

The patched kernel performs pretty well and meets our expectations. It beats
mainline easily. As shown in the table above and the graph attached with this
mail, it has the following advantages:

1. Avg. latency is less than mainline (roughly half that of even the least
   in mainline).

2. The avg. latency is a constant, irrespective of number of online CPUs in
   the system, which proves that the design/synchronization scheme is scalable.

3. Throughout the duration shown above, mainline disables interrupts on all
   CPUs. But the patched kernel not only has a smaller duration of hotplug,
   but also keeps interrupts enabled on other CPUs, which makes CPU offline
   less disruptive on latency-sensitive workloads running on the system.

So, this gives us an idea of how this patchset actually performs. Of course
there are bugs and issues that still need fixing (even mainline crashes with
hotplug sometimes), but I did the above experiment to verify whether the
design is working as expected and whether it really shows significant
improvements over mainline. And thankfully, it does :-)

Regards,
Srivatsa S. Bhat

[-- Attachment #2: CPU hotplug latency.png --]
[-- Type: image/png, Size: 172574 bytes --]

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox