LinuxPPC-Dev Archive on lore.kernel.org

LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH powerpc] cpuidle: fix per cpu accessing of cpuidle_devices
From: Li Zhong @ 2014-01-26  5:09 UTC (permalink / raw)
  To: PowerPC email list, linux-pm
  Cc: deepthi, rafael.j.wysocki, b.zolnierkie, daniel.lezcano,
	kyungmin.park, Paul Mackerras

The patch tries to fix following bad address accessing, seems caused by
a typo in commit 7f74dc0f. cpuidle_devices is defined statically, so
per_cpu should be used to access it. 

[  204.774193] Unable to handle kernel paging request for data at address 0x00a7d000
[  204.774215] Faulting instruction address: 0xc0000000006ee92c
[  204.774225] Oops: Kernel access of bad area, sig: 11 [#1]
[  204.774230] PREEMPT SMP NR_CPUS=16 DEBUG_PAGEALLOC NUMA pSeries
[  204.774244] Modules linked in: binfmt_misc
[  204.774254] CPU: 8 PID: 3590 Comm: bash Not tainted 3.13.0-08526-gb2e448e-dirty #2
[  204.774262] task: c0000001faf40000 ti: c0000001fa904000 task.ti: c0000001fa904000
[  204.774268] NIP: c0000000006ee92c LR: c0000000006ee920 CTR: c000000000064e94
[  204.774275] REGS: c0000001fa907240 TRAP: 0300   Not tainted  (3.13.0-08526-gb2e448e-dirty)
[  204.774281] MSR: 8000000000009032 <SF,EE,ME,IR,DR,RI>  CR: 22242424  XER: 20000000
[  204.774303] CFAR: 0000000000009088 DAR: 0000000000a7d000 DSISR: 40000000 SOFTE: 1 
GPR00: c0000000006ee920 c0000001fa9074c0 c000000000d764a8 c000000000c5d4c0 
GPR04: 0000000000000010 0000000000000010 0000000000000000 0000000000000000 
GPR08: ffffffffffffffff c0000000016564a8 ffffffffffffffff 0000000000000000 
GPR12: 0000000000000002 c00000000f33e800 000000001012b3dc 0000000000000000 
GPR16: 0000000000000000 0000000010129c58 0000000010129bf8 000000001012b948 
GPR20: 0000000000000000 000000001012b3e4 000000001013ea90 0000000000000000 
GPR24: c000000000c5d858 c0000000015f38e8 c000000000cb8b26 c000000000c664a8 
GPR28: 0000000000000000 ffffffffffffffe5 0000000000a7d000 0000000000a7d000 
[  204.774407] NIP [c0000000006ee92c] .cpuidle_disable_device+0x30/0xc4
[  204.774413] LR [c0000000006ee920] .cpuidle_disable_device+0x24/0xc4
[  204.774419] Call Trace:
[  204.774424] [c0000001fa9074c0] [c0000000006ee920] .cpuidle_disable_device+0x24/0xc4 (unreliable)
[  204.774435] [c0000001fa907540] [c0000000000652a4] .pseries_cpuidle_add_cpu_notifier+0xb8/0xe0
[  204.774445] [c0000001fa9075c0] [c000000000855548] .notifier_call_chain+0x150/0x1c0
[  204.774455] [c0000001fa907670] [c0000000000b3564] .__raw_notifier_call_chain+0x40/0x50
[  204.774463] [c0000001fa907710] [c000000000079c40] .__cpu_notify+0x50/0x9c
[  204.774472] [c0000001fa9077a0] [c00000000018a054] ._cpu_down+0x1ec/0x388
[  204.774479] [c0000001fa9078b0] [c00000000018a234] .cpu_down+0x44/0x64
[  204.774488] [c0000001fa907940] [c0000000005627bc] .cpu_subsys_offline+0x24/0x3c
[  204.774497] [c0000001fa9079c0] [c00000000055b2a4] .device_offline+0xc8/0x120
[  204.774504] [c0000001fa907a50] [c00000000055d000] .online_store+0x74/0xb0
[  204.774512] [c0000001fa907b00] [c000000000559924] .dev_attr_store+0x60/0x78
[  204.774520] [c0000001fa907ba0] [c00000000028cc78] .sysfs_kf_write+0x7c/0x9c
[  204.774528] [c0000001fa907c30] [c000000000291ae8] .kernfs_fop_write+0x108/0x174
[  204.774537] [c0000001fa907cd0] [c0000000001f8308] .vfs_write+0x110/0x21c
[  204.774545] [c0000001fa907d70] [c0000000001f854c] .SyS_write+0x70/0xbc
[  204.774553] [c0000001fa907e30] [c000000000009d88] syscall_exit+0x0/0x7c
[  204.774560] Instruction dump:
[  204.774565] 7c0802a6 f8010010 fbe1fff8 f821ff81 7c7f1b78 60000000 60000000 7fe3fb78 
[  204.774582] 48000c89 60000000 2fbf0000 419e0084 <e81f0000> 780b17e1 41820078 2fa30000 
[  204.774601] ---[ end trace edf1df93cf81e28d ]---

Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/pseries/processor_idle.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/processor_idle.c b/arch/powerpc/platforms/pseries/processor_idle.c
index 94134a5..197cadc 100644
--- a/arch/powerpc/platforms/pseries/processor_idle.c
+++ b/arch/powerpc/platforms/pseries/processor_idle.c
@@ -190,7 +190,7 @@ static int pseries_cpuidle_add_cpu_notifier(struct notifier_block *n,
 {
 	int hotcpu = (unsigned long)hcpu;
 	struct cpuidle_device *dev =
-			per_cpu_ptr(cpuidle_devices, hotcpu);
+			per_cpu(cpuidle_devices, hotcpu);
 
 	if (dev && cpuidle_get_driver()) {
 		switch (action) {

^ permalink raw reply related

* [PATCH] powerpc/config: Remove unnecssary CONFIG_FSL_IFC
From: Prabhakar Kushwaha @ 2014-01-25 12:23 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: scottwood, Prabhakar Kushwaha

CONFIG_FSL_IFC gets enabled by Kconfig dependancies.
So remove unnecssary define from the defconfigs

Signed-off-by: Prabhakar Kushwaha <prabhakar@freescale.com>
---
 arch/powerpc/configs/corenet64_smp_defconfig |    1 -
 arch/powerpc/configs/mpc85xx_defconfig       |    1 -
 arch/powerpc/configs/mpc85xx_smp_defconfig   |    1 -
 3 files changed, 3 deletions(-)

diff --git a/arch/powerpc/configs/corenet64_smp_defconfig b/arch/powerpc/configs/corenet64_smp_defconfig
index 25b03f8..188e117 100644
--- a/arch/powerpc/configs/corenet64_smp_defconfig
+++ b/arch/powerpc/configs/corenet64_smp_defconfig
@@ -26,7 +26,6 @@ CONFIG_CORENET_GENERIC=y
 CONFIG_BINFMT_MISC=m
 CONFIG_MATH_EMULATION=y
 CONFIG_MATH_EMULATION_HW_UNIMPLEMENTED=y
-CONFIG_FSL_IFC=y
 CONFIG_PCIEPORTBUS=y
 CONFIG_PCI_MSI=y
 CONFIG_RAPIDIO=y
diff --git a/arch/powerpc/configs/mpc85xx_defconfig b/arch/powerpc/configs/mpc85xx_defconfig
index cba638c..3c44a53 100644
--- a/arch/powerpc/configs/mpc85xx_defconfig
+++ b/arch/powerpc/configs/mpc85xx_defconfig
@@ -49,7 +49,6 @@ CONFIG_HIGHMEM=y
 CONFIG_BINFMT_MISC=m
 CONFIG_MATH_EMULATION=y
 CONFIG_FORCE_MAX_ZONEORDER=12
-CONFIG_FSL_IFC=y
 CONFIG_PCI=y
 CONFIG_PCI_MSI=y
 CONFIG_RAPIDIO=y
diff --git a/arch/powerpc/configs/mpc85xx_smp_defconfig b/arch/powerpc/configs/mpc85xx_smp_defconfig
index e315b8a..dd55ed6 100644
--- a/arch/powerpc/configs/mpc85xx_smp_defconfig
+++ b/arch/powerpc/configs/mpc85xx_smp_defconfig
@@ -52,7 +52,6 @@ CONFIG_HIGHMEM=y
 CONFIG_BINFMT_MISC=m
 CONFIG_MATH_EMULATION=y
 CONFIG_FORCE_MAX_ZONEORDER=12
-CONFIG_FSL_IFC=y
 CONFIG_PCI=y
 CONFIG_PCI_MSI=y
 CONFIG_RAPIDIO=y
-- 
1.7.9.5

^ permalink raw reply related

* [PATCH 2/2][v6] powerpc/config: Enable memory driver
From: Prabhakar Kushwaha @ 2014-01-25 12:06 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: scottwood, Prabhakar Kushwaha

As Freescale IFC controller has been moved to driver to driver/memory.

So enable memory driver in powerpc config

Signed-off-by: Prabhakar Kushwaha <prabhakar@freescale.com>
---
 Based upon git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc.git
 Branch next

 Changes for v2: Sending as it is
 Changes for v3: Sending as it is
 Changes for v4: Rebased to 
	git://git.kernel.org/pub/scm/linux/kernel/git/scottwood/linux.git
 changes for v5:
 	- Rebased to branch next of 
	git://git.kernel.org/pub/scm/linux/kernel/git/scottwood/linux.git
 Changes for v6: Sending as it is

 arch/powerpc/configs/corenet32_smp_defconfig |    1 +
 arch/powerpc/configs/corenet64_smp_defconfig |    1 +
 arch/powerpc/configs/mpc85xx_defconfig       |    1 +
 arch/powerpc/configs/mpc85xx_smp_defconfig   |    1 +
 4 files changed, 4 insertions(+)

diff --git a/arch/powerpc/configs/corenet32_smp_defconfig b/arch/powerpc/configs/corenet32_smp_defconfig
index bbd794d..087d437 100644
--- a/arch/powerpc/configs/corenet32_smp_defconfig
+++ b/arch/powerpc/configs/corenet32_smp_defconfig
@@ -142,6 +142,7 @@ CONFIG_RTC_DRV_DS3232=y
 CONFIG_RTC_DRV_CMOS=y
 CONFIG_UIO=y
 CONFIG_STAGING=y
+CONFIG_MEMORY=y
 CONFIG_VIRT_DRIVERS=y
 CONFIG_FSL_HV_MANAGER=y
 CONFIG_EXT2_FS=y
diff --git a/arch/powerpc/configs/corenet64_smp_defconfig b/arch/powerpc/configs/corenet64_smp_defconfig
index 63508dd..25b03f8 100644
--- a/arch/powerpc/configs/corenet64_smp_defconfig
+++ b/arch/powerpc/configs/corenet64_smp_defconfig
@@ -129,6 +129,7 @@ CONFIG_EDAC=y
 CONFIG_EDAC_MM_EDAC=y
 CONFIG_DMADEVICES=y
 CONFIG_FSL_DMA=y
+CONFIG_MEMORY=y
 CONFIG_EXT2_FS=y
 CONFIG_EXT3_FS=y
 CONFIG_ISO9660_FS=m
diff --git a/arch/powerpc/configs/mpc85xx_defconfig b/arch/powerpc/configs/mpc85xx_defconfig
index 83d3550..cba638c 100644
--- a/arch/powerpc/configs/mpc85xx_defconfig
+++ b/arch/powerpc/configs/mpc85xx_defconfig
@@ -216,6 +216,7 @@ CONFIG_RTC_DRV_CMOS=y
 CONFIG_RTC_DRV_DS1307=y
 CONFIG_DMADEVICES=y
 CONFIG_FSL_DMA=y
+CONFIG_MEMORY=y
 # CONFIG_NET_DMA is not set
 CONFIG_EXT2_FS=y
 CONFIG_EXT3_FS=y
diff --git a/arch/powerpc/configs/mpc85xx_smp_defconfig b/arch/powerpc/configs/mpc85xx_smp_defconfig
index 4b68629..e315b8a 100644
--- a/arch/powerpc/configs/mpc85xx_smp_defconfig
+++ b/arch/powerpc/configs/mpc85xx_smp_defconfig
@@ -217,6 +217,7 @@ CONFIG_RTC_DRV_CMOS=y
 CONFIG_RTC_DRV_DS1307=y
 CONFIG_DMADEVICES=y
 CONFIG_FSL_DMA=y
+CONFIG_MEMORY=y
 # CONFIG_NET_DMA is not set
 CONFIG_EXT2_FS=y
 CONFIG_EXT3_FS=y
-- 
1.7.9.5

^ permalink raw reply related

* [PATCH 1/2][v6] driver/memory:Move Freescale IFC driver to a common driver
From: Prabhakar Kushwaha @ 2014-01-25 12:06 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: scottwood, Prabhakar Kushwaha

 Freescale IFC controller has been used for mpc8xxx. It will be used
 for ARM-based SoC as well. This patch moves the driver to driver/memory
 and fix the header file includes.

 Also remove module_platform_driver() and  instead call
 platform_driver_register() from subsys_initcall() to make sure this module
 has been loaded before MTD partition parsing starts.

Signed-off-by: Prabhakar Kushwaha <prabhakar@freescale.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
---
Based upon git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc.git
Branch next

Changes for v2:
	- Move fsl_ifc in driver/memory

Changes for v3:
	- move device tree bindings to memory

Changes for v4: Rebased to 
	git://git.kernel.org/pub/scm/linux/kernel/git/scottwood/linux.git

Changes for v5: 
	- Moved powerpc/Kconfig option to driver/memory

Changes for v6:
	- Update Kconfig details

 .../{powerpc => memory-controllers}/fsl/ifc.txt    |    0
 arch/powerpc/Kconfig                               |    4 ----
 arch/powerpc/sysdev/Makefile                       |    1 -
 drivers/memory/Kconfig                             |    9 +++++++++
 drivers/memory/Makefile                            |    1 +
 {arch/powerpc/sysdev => drivers/memory}/fsl_ifc.c  |    8 ++++++--
 drivers/mtd/nand/fsl_ifc_nand.c                    |    2 +-
 .../include/asm => include/linux}/fsl_ifc.h        |    0
 8 files changed, 17 insertions(+), 8 deletions(-)
 rename Documentation/devicetree/bindings/{powerpc => memory-controllers}/fsl/ifc.txt (100%)
 rename {arch/powerpc/sysdev => drivers/memory}/fsl_ifc.c (98%)
 rename {arch/powerpc/include/asm => include/linux}/fsl_ifc.h (100%)

diff --git a/Documentation/devicetree/bindings/powerpc/fsl/ifc.txt b/Documentation/devicetree/bindings/memory-controllers/fsl/ifc.txt
similarity index 100%
rename from Documentation/devicetree/bindings/powerpc/fsl/ifc.txt
rename to Documentation/devicetree/bindings/memory-controllers/fsl/ifc.txt
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index fa39517..91dc43c 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -727,10 +727,6 @@ config FSL_LBC
 	  controller.  Also contains some common code used by
 	  drivers for specific local bus peripherals.
 
-config FSL_IFC
-	bool
-        depends on FSL_SOC
-
 config FSL_GTM
 	bool
 	depends on PPC_83xx || QUICC_ENGINE || CPM2
diff --git a/arch/powerpc/sysdev/Makefile b/arch/powerpc/sysdev/Makefile
index f67ac90..afbcc37 100644
--- a/arch/powerpc/sysdev/Makefile
+++ b/arch/powerpc/sysdev/Makefile
@@ -21,7 +21,6 @@ obj-$(CONFIG_FSL_SOC)		+= fsl_soc.o fsl_mpic_err.o
 obj-$(CONFIG_FSL_PCI)		+= fsl_pci.o $(fsl-msi-obj-y)
 obj-$(CONFIG_FSL_PMC)		+= fsl_pmc.o
 obj-$(CONFIG_FSL_LBC)		+= fsl_lbc.o
-obj-$(CONFIG_FSL_IFC)		+= fsl_ifc.o
 obj-$(CONFIG_FSL_GTM)		+= fsl_gtm.o
 obj-$(CONFIG_FSL_85XX_CACHE_SRAM)	+= fsl_85xx_l2ctlr.o fsl_85xx_cache_sram.o
 obj-$(CONFIG_SIMPLE_GPIO)	+= simple_gpio.o
diff --git a/drivers/memory/Kconfig b/drivers/memory/Kconfig
index 29a11db..555d26f 100644
--- a/drivers/memory/Kconfig
+++ b/drivers/memory/Kconfig
@@ -50,4 +50,13 @@ config TEGRA30_MC
 	  analysis, especially for IOMMU/SMMU(System Memory Management
 	  Unit) module.
 
+config FSL_IFC
+	bool "Freescale Integrated Flash Controller"
+	default y
+        depends on FSL_SOC
+	help
+	  This driver is for the Integrated Flash Controller Controller(IFC) 
+	  module available in Freescale SoCs. This controller allows to handle flash
+	  devices such as NOR, NAND, FPGA and ASIC etc
+
 endif
diff --git a/drivers/memory/Makefile b/drivers/memory/Makefile
index 969d923..f2bf25c 100644
--- a/drivers/memory/Makefile
+++ b/drivers/memory/Makefile
@@ -6,6 +6,7 @@ ifeq ($(CONFIG_DDR),y)
 obj-$(CONFIG_OF)		+= of_memory.o
 endif
 obj-$(CONFIG_TI_EMIF)		+= emif.o
+obj-$(CONFIG_FSL_IFC)		+= fsl_ifc.o
 obj-$(CONFIG_MVEBU_DEVBUS)	+= mvebu-devbus.o
 obj-$(CONFIG_TEGRA20_MC)	+= tegra20-mc.o
 obj-$(CONFIG_TEGRA30_MC)	+= tegra30-mc.o
diff --git a/arch/powerpc/sysdev/fsl_ifc.c b/drivers/memory/fsl_ifc.c
similarity index 98%
rename from arch/powerpc/sysdev/fsl_ifc.c
rename to drivers/memory/fsl_ifc.c
index fbc885b..3d5d792 100644
--- a/arch/powerpc/sysdev/fsl_ifc.c
+++ b/drivers/memory/fsl_ifc.c
@@ -29,8 +29,8 @@
 #include <linux/of.h>
 #include <linux/of_device.h>
 #include <linux/platform_device.h>
+#include <linux/fsl_ifc.h>
 #include <asm/prom.h>
-#include <asm/fsl_ifc.h>
 
 struct fsl_ifc_ctrl *fsl_ifc_ctrl_dev;
 EXPORT_SYMBOL(fsl_ifc_ctrl_dev);
@@ -298,7 +298,11 @@ static struct platform_driver fsl_ifc_ctrl_driver = {
 	.remove      = fsl_ifc_ctrl_remove,
 };
 
-module_platform_driver(fsl_ifc_ctrl_driver);
+static int __init fsl_ifc_init(void)
+{
+	return platform_driver_register(&fsl_ifc_ctrl_driver);
+}
+subsys_initcall(fsl_ifc_init);
 
 MODULE_LICENSE("GPL");
 MODULE_AUTHOR("Freescale Semiconductor");
diff --git a/drivers/mtd/nand/fsl_ifc_nand.c b/drivers/mtd/nand/fsl_ifc_nand.c
index 4335577..865b323 100644
--- a/drivers/mtd/nand/fsl_ifc_nand.c
+++ b/drivers/mtd/nand/fsl_ifc_nand.c
@@ -30,7 +30,7 @@
 #include <linux/mtd/nand.h>
 #include <linux/mtd/partitions.h>
 #include <linux/mtd/nand_ecc.h>
-#include <asm/fsl_ifc.h>
+#include <linux/fsl_ifc.h>
 
 #define FSL_IFC_V1_1_0	0x01010000
 #define ERR_BYTE		0xFF /* Value returned for read
diff --git a/arch/powerpc/include/asm/fsl_ifc.h b/include/linux/fsl_ifc.h
similarity index 100%
rename from arch/powerpc/include/asm/fsl_ifc.h
rename to include/linux/fsl_ifc.h
-- 
1.7.9.5

^ permalink raw reply related

* [PATCH 2/2][v9] powerpc/fsl-booke: Add initial T104x_QDS board support
From: Prabhakar Kushwaha @ 2014-01-25 11:41 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: scottwood, Poonam Aggrwal, Prabhakar Kushwaha, Priyanka Jain

 Add support for T104x board in board file t104x_qds.c, It is common for
 both T1040 and T1042 as they share same QDS board.

 T1040QDS board Overview
 -----------------------
 - SERDES Connections, 8 lanes supporting:
      =E2=80=94 PCI Express: supporting Gen 1 and Gen 2;
      =E2=80=94 SGMII
      =E2=80=94 QSGMII
      =E2=80=94 SATA 2.0
      =E2=80=94 Aurora debug with dedicated connectors (T1040 only)
 - DDR Controller
     - Supports rates of up to 1600 MHz data-rate
     - Supports one DDR3LP UDIMM/RDIMMs, of single-, dual- or quad-rank t=
ypes.
 -IFC/Local Bus
     - NAND flash: 8-bit, async, up to 2GB.
     - NOR: 8-bit or 16-bit, non-multiplexed, up to 512MB
     - GASIC: Simple (minimal) target within Qixis FPGA
     - PromJET rapid memory download support
 - Ethernet
     - Two on-board RGMII 10/100/1G ethernet ports.
     - PHY #0 remains powered up during deep-sleep (T1040 only)
 - QIXIS System Logic FPGA
 - Clocks
     - System and DDR clock (SYSCLK, =E2=80=9CDDRCLK=E2=80=9D)
     - SERDES clocks
 - Power Supplies
 - Video
     - DIU supports video at up to 1280x1024x32bpp
 - USB
     - Supports two USB 2.0 ports with integrated PHYs
     =E2=80=94 Two type A ports with 5V@1.5A per port.
     =E2=80=94 Second port can be converted to OTG mini-AB
 - SDHC
     - SDHC port connects directly to an adapter card slot, featuring:
     - Supporting SD slots for: SD, SDHC (1x, 4x, 8x) and/or MMC
     =E2=80=94 Supporting eMMC memory devices
 - SPI
    -  On-board support of 3 different devices and sizes
 - Other IO
    - Two Serial ports
    - ProfiBus port
    - Four I2C ports

Add T104xQDS support in Kconfig and Makefile. Also create device tree.
Following features are currently not implmented.
  - SerDes: Aurora
  - IFC: GASIC, Promjet
  - QIXIS
  - Ethernet
  - DIU
  - power supplies management
  - ProfiBus

Signed-off-by: Priyanka Jain <Priyanka.Jain@freescale.com>
Signed-off-by: Poonam Aggrwal <poonam.aggrwal@freescale.com>
Signed-off-by: Prabhakar Kushwaha <prabhakar@freescale.com>
---
 Based upon git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc.gi=
t
 Branch next

 Changes for v2: Incorporated Scott's comments
	- Created t104xqds.dtsi, both t1040qds & t1042qds include it
	- Updated get_irq=20
 Changes for v3: Sending as it is
 Changes for v4: Updated description
 Changes for v5: Incorporated Scott's comments
 	- Ported on top of Kevin's patch
 Changes for v6: Updated depedencies
 Changes for v7: Incororated Scott's commetns
    - Create patch set
 Changes for v8: Sending as it is
 Changes for v9:
    - removed partitions
    - updated FPGA node
    - Fix PCIe range fields

 arch/powerpc/boot/dts/t1040qds.dts            |   46 +++++++
 arch/powerpc/boot/dts/t1042qds.dts            |   46 +++++++
 arch/powerpc/boot/dts/t104xqds.dtsi           |  165 +++++++++++++++++++=
++++++
 arch/powerpc/platforms/85xx/Kconfig           |    2 +-
 arch/powerpc/platforms/85xx/corenet_generic.c |    4 +
 5 files changed, 262 insertions(+), 1 deletion(-)
 create mode 100644 arch/powerpc/boot/dts/t1040qds.dts
 create mode 100644 arch/powerpc/boot/dts/t1042qds.dts
 create mode 100644 arch/powerpc/boot/dts/t104xqds.dtsi

diff --git a/arch/powerpc/boot/dts/t1040qds.dts b/arch/powerpc/boot/dts/t=
1040qds.dts
new file mode 100644
index 0000000..973c29c
--- /dev/null
+++ b/arch/powerpc/boot/dts/t1040qds.dts
@@ -0,0 +1,46 @@
+/*
+ * T1040QDS Device Tree Source
+ *
+ * Copyright 2013 Freescale Semiconductor Inc.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions ar=
e met:
+ *     * Redistributions of source code must retain the above copyright
+ *	 notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyrig=
ht
+ *	 notice, this list of conditions and the following disclaimer in the
+ *	 documentation and/or other materials provided with the distribution.
+ *     * Neither the name of Freescale Semiconductor nor the
+ *	 names of its contributors may be used to endorse or promote products
+ *	 derived from this software without specific prior written permission=
.
+ *
+ *
+ * ALTERNATIVELY, this software may be distributed under the terms of th=
e
+ * GNU General Public License ("GPL") as published by the Free Software
+ * Foundation, either version 2 of that License or (at your option) any
+ * later version.
+ *
+ * THIS SOFTWARE IS PROVIDED BY Freescale Semiconductor "AS IS" AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMP=
LIED
+ * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE AR=
E
+ * DISCLAIMED. IN NO EVENT SHALL Freescale Semiconductor BE LIABLE FOR A=
NY
+ * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DA=
MAGES
+ * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SE=
RVICES;
+ * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUS=
ED AND
+ * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR=
 TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE=
 OF THIS
+ * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+/include/ "fsl/t104xsi-pre.dtsi"
+/include/ "t104xqds.dtsi"
+
+/ {
+	model =3D "fsl,T1040QDS";
+	compatible =3D "fsl,T1040QDS";
+	#address-cells =3D <2>;
+	#size-cells =3D <2>;
+	interrupt-parent =3D <&mpic>;
+};
+
+/include/ "fsl/t1040si-post.dtsi"
diff --git a/arch/powerpc/boot/dts/t1042qds.dts b/arch/powerpc/boot/dts/t=
1042qds.dts
new file mode 100644
index 0000000..45bd037
--- /dev/null
+++ b/arch/powerpc/boot/dts/t1042qds.dts
@@ -0,0 +1,46 @@
+/*
+ * T1042QDS Device Tree Source
+ *
+ * Copyright 2013 Freescale Semiconductor Inc.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions ar=
e met:
+ *     * Redistributions of source code must retain the above copyright
+ *	 notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyrig=
ht
+ *	 notice, this list of conditions and the following disclaimer in the
+ *	 documentation and/or other materials provided with the distribution.
+ *     * Neither the name of Freescale Semiconductor nor the
+ *	 names of its contributors may be used to endorse or promote products
+ *	 derived from this software without specific prior written permission=
.
+ *
+ *
+ * ALTERNATIVELY, this software may be distributed under the terms of th=
e
+ * GNU General Public License ("GPL") as published by the Free Software
+ * Foundation, either version 2 of that License or (at your option) any
+ * later version.
+ *
+ * THIS SOFTWARE IS PROVIDED BY Freescale Semiconductor "AS IS" AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMP=
LIED
+ * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE AR=
E
+ * DISCLAIMED. IN NO EVENT SHALL Freescale Semiconductor BE LIABLE FOR A=
NY
+ * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DA=
MAGES
+ * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SE=
RVICES;
+ * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUS=
ED AND
+ * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR=
 TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE=
 OF THIS
+ * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+/include/ "fsl/t104xsi-pre.dtsi"
+/include/ "t104xqds.dtsi"
+
+/ {
+	model =3D "fsl,T1042QDS";
+	compatible =3D "fsl,T1042QDS";
+	#address-cells =3D <2>;
+	#size-cells =3D <2>;
+	interrupt-parent =3D <&mpic>;
+};
+
+/include/ "fsl/t1042si-post.dtsi"
diff --git a/arch/powerpc/boot/dts/t104xqds.dtsi b/arch/powerpc/boot/dts/=
t104xqds.dtsi
new file mode 100644
index 0000000..61d62d3
--- /dev/null
+++ b/arch/powerpc/boot/dts/t104xqds.dtsi
@@ -0,0 +1,165 @@
+/*
+ * T104xQDS Device Tree Source
+ *
+ * Copyright 2013 Freescale Semiconductor Inc.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions ar=
e met:
+ *     * Redistributions of source code must retain the above copyright
+ *	 notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyrig=
ht
+ *	 notice, this list of conditions and the following disclaimer in the
+ *	 documentation and/or other materials provided with the distribution.
+ *     * Neither the name of Freescale Semiconductor nor the
+ *	 names of its contributors may be used to endorse or promote products
+ *	 derived from this software without specific prior written permission=
.
+ *
+ *
+ * ALTERNATIVELY, this software may be distributed under the terms of th=
e
+ * GNU General Public License ("GPL") as published by the Free Software
+ * Foundation, either version 2 of that License or (at your option) any
+ * later version.
+ *
+ * THIS SOFTWARE IS PROVIDED BY Freescale Semiconductor "AS IS" AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMP=
LIED
+ * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE AR=
E
+ * DISCLAIMED. IN NO EVENT SHALL Freescale Semiconductor BE LIABLE FOR A=
NY
+ * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DA=
MAGES
+ * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SE=
RVICES;
+ * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUS=
ED AND
+ * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR=
 TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE=
 OF THIS
+ * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+/ {
+	model =3D "fsl,T1040QDS";
+	#address-cells =3D <2>;
+	#size-cells =3D <2>;
+	interrupt-parent =3D <&mpic>;
+
+	ifc: localbus@ffe124000 {
+		reg =3D <0xf 0xfe124000 0 0x2000>;
+		ranges =3D <0 0 0xf 0xe8000000 0x08000000
+			  2 0 0xf 0xff800000 0x00010000
+			  3 0 0xf 0xffdf0000 0x00008000>;
+
+		nor@0,0 {
+			#address-cells =3D <1>;
+			#size-cells =3D <1>;
+			compatible =3D "cfi-flash";
+			reg =3D <0x0 0x0 0x8000000>;
+
+			bank-width =3D <2>;
+			device-width =3D <1>;
+		};
+
+		nand@2,0 {
+			#address-cells =3D <1>;
+			#size-cells =3D <1>;
+			compatible =3D "fsl,ifc-nand";
+			reg =3D <0x2 0x0 0x10000>;
+		};
+
+		board-control@3,0 {
+			#address-cells =3D <1>;
+			#size-cells =3D <1>;
+			compatible =3D "fsl,fpga-qixis";
+			reg =3D <3 0 0x300>;
+		};
+	};
+
+	memory {
+		device_type =3D "memory";
+	};
+
+	dcsr: dcsr@f00000000 {
+		ranges =3D <0x00000000 0xf 0x00000000 0x01072000>;
+	};
+
+	soc: soc@ffe000000 {
+		ranges =3D <0x00000000 0xf 0xfe000000 0x1000000>;
+		reg =3D <0xf 0xfe000000 0 0x00001000>;
+		spi@110000 {
+			flash@0 {
+				#address-cells =3D <1>;
+				#size-cells =3D <1>;
+				compatible =3D "micron,n25q512a";
+				reg =3D <0>;
+				spi-max-frequency =3D <10000000>; /* input clock */
+			};
+		};
+
+		i2c@118000 {
+			pca9547@77 {
+				compatible =3D "philips,pca9547";
+				reg =3D <0x77>;
+			};
+			rtc@68 {
+				compatible =3D "dallas,ds3232";
+				reg =3D <0x68>;
+				interrupts =3D <0x1 0x1 0 0>;
+			};
+		};
+	};
+
+	pci0: pcie@ffe240000 {
+		reg =3D <0xf 0xfe240000 0 0x10000>;
+		ranges =3D <0x02000000 0 0xe0000000 0xc 0x00000000 0x0 0x10000000
+			  0x01000000 0 0x00000000 0xf 0xf8000000 0x0 0x00010000>;
+		pcie@0 {
+			ranges =3D <0x02000000 0 0xe0000000
+				  0x02000000 0 0xe0000000
+				  0 0x10000000
+
+				  0x01000000 0 0x00000000
+				  0x01000000 0 0x00000000
+				  0 0x00010000>;
+		};
+	};
+
+	pci1: pcie@ffe250000 {
+		reg =3D <0xf 0xfe250000 0 0x10000>;
+		ranges =3D <0x02000000 0x0 0xe0000000 0xc 0x10000000 0x0 0x10000000
+			  0x01000000 0x0 0x00000000 0xf 0xf8010000 0x0 0x00010000>;
+		pcie@0 {
+			ranges =3D <0x02000000 0 0xe0000000
+				  0x02000000 0 0xe0000000
+				  0 0x10000000
+
+				  0x01000000 0 0x00000000
+				  0x01000000 0 0x00000000
+				  0 0x00010000>;
+		};
+	};
+
+	pci2: pcie@ffe260000 {
+		reg =3D <0xf 0xfe260000 0 0x10000>;
+		ranges =3D <0x02000000 0 0xe0000000 0xc 0x20000000 0 0x10000000
+			  0x01000000 0 0x00000000 0xf 0xf8020000 0 0x00010000>;
+		pcie@0 {
+			ranges =3D <0x02000000 0 0xe0000000
+				  0x02000000 0 0xe0000000
+				  0 0x10000000
+
+				  0x01000000 0 0x00000000
+				  0x01000000 0 0x00000000
+				  0 0x00010000>;
+		};
+	};
+
+	pci3: pcie@ffe270000 {
+		reg =3D <0xf 0xfe270000 0 0x10000>;
+		ranges =3D <0x02000000 0 0xe0000000 0xc 0x30000000 0 0x10000000
+			  0x01000000 0 0x00000000 0xf 0xf8030000 0 0x00010000>;
+		pcie@0 {
+			ranges =3D <0x02000000 0 0xe0000000
+				  0x02000000 0 0xe0000000
+				  0 0x10000000
+
+				  0x01000000 0 0x00000000
+				  0x01000000 0 0x00000000
+				  0 0x00010000>;
+		};
+	};
+};
diff --git a/arch/powerpc/platforms/85xx/Kconfig b/arch/powerpc/platforms=
/85xx/Kconfig
index c17aae8..b9fc713 100644
--- a/arch/powerpc/platforms/85xx/Kconfig
+++ b/arch/powerpc/platforms/85xx/Kconfig
@@ -267,7 +267,7 @@ config CORENET_GENERIC
 	  For 64bit kernel, the following boards are supported:
 	    T4240 QDS and B4 QDS
 	  The following boards are supported for both 32bit and 64bit kernel:
-	    P5020 DS and P5040 DS
+	    P5020 DS, P5040 DS and T104xQDS
=20
 endif # FSL_SOC_BOOKE
=20
diff --git a/arch/powerpc/platforms/85xx/corenet_generic.c b/arch/powerpc=
/platforms/85xx/corenet_generic.c
index fbd871e..f4a7621 100644
--- a/arch/powerpc/platforms/85xx/corenet_generic.c
+++ b/arch/powerpc/platforms/85xx/corenet_generic.c
@@ -106,6 +106,8 @@ static const char * const boards[] __initconst =3D {
 	"fsl,B4860QDS",
 	"fsl,B4420QDS",
 	"fsl,B4220QDS",
+	"fsl,T1040QDS",
+	"fsl,T1042QDS",
 	NULL
 };
=20
@@ -119,6 +121,8 @@ static const char * const hv_boards[] __initconst =3D=
 {
 	"fsl,B4860QDS-hv",
 	"fsl,B4420QDS-hv",
 	"fsl,B4220QDS-hv",
+	"fsl,T1040QDS-hv",
+	"fsl,T1042QDS-hv",
 	NULL
 };
=20
--=20
1.7.9.5

^ permalink raw reply related

* [PATCH 1/2][v9] powerpc/mpc85xx:Add initial device tree support of T104x
From: Prabhakar Kushwaha @ 2014-01-25 11:40 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Poonam Aggrwal, Priyanka Jain, scottwood, Varun Sethi,
	Prabhakar Kushwaha

The QorIQ T1040/T1042 processor support four integrated 64-bit e5500 PA
processor cores with high-performance data path acceleration architecture
and network peripheral interfaces required for networking & telecommunications.

T1042 personality is a reduced personality of T1040 without Integrated 8-port
Gigabit Ethernet switch.

The T1040/T1042 SoC includes the following function and features:

 - Four e5500 cores, each with a private 256 KB L2 cache
 - 256 KB shared L3 CoreNet platform cache (CPC)
 - Interconnect CoreNet platform
 - 32-/64-bit DDR3L/DDR4 SDRAM memory controller with ECC and interleaving
   support
 - Data Path Acceleration Architecture (DPAA) incorporating acceleration
 for the following functions:
    -  Packet parsing, classification, and distribution
    -  Queue management for scheduling, packet sequencing, and congestion
    	management
    -  Cryptography Acceleration (SEC 5.0)
    - RegEx Pattern Matching Acceleration (PME 2.2)
    - IEEE Std 1588 support
    - Hardware buffer management for buffer allocation and deallocation
 - Ethernet interfaces
    - Integrated 8-port Gigabit Ethernet switch (T1040 only)
    - Four 1 Gbps Ethernet controllers
 - Two RGMII interfaces or one RGMII and one MII interfaces
 - High speed peripheral interfaces
   - Four PCI Express 2.0 controllers running at up to 5 GHz
   - Two SATA controllers supporting 1.5 and 3.0 Gb/s operation
   - Upto two QSGMII interface
   - Upto six SGMII interface supporting 1000 Mbps
   - One SGMII interface supporting upto 2500 Mbps
 - Additional peripheral interfaces
   - Two USB 2.0 controllers with integrated PHY
   - SD/eSDHC/eMMC
   -  eSPI controller
   - Four I2C controllers
   - Four UARTs
   - Four GPIO controllers
   - Integrated flash controller (IFC)
   - Change this to  LCD/ HDMI interface (DIU) with 12 bit dual data rate
   - TDM interface
 - Multicore programmable interrupt controller (PIC)
 - Two 8-channel DMA engines
 - Single source clocking implementation
 - Deep Sleep power implementaion (wakeup from GPIO/Timer/Ethernet/USB)

Signed-off-by: Poonam Aggrwal <poonam.aggrwal@freescale.com>
Signed-off-by: Priyanka Jain <Priyanka.Jain@freescale.com>
Signed-off-by: Varun Sethi <Varun.Sethi@freescale.com>
Signed-off-by: Prabhakar Kushwaha <prabhakar@freescale.com>
---
 Based upon git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc.git
 Branch next

Changes for v2: Incorporated Scott's comments
	- Update t1040si-post.dtsi
    - update clock device tree node as per
      http://patchwork.ozlabs.org/patch/274134/
    - removed DMA node, It will be added later as per
      http://patchwork.ozlabs.org/patch/271238/
    - Updated display compatible field
 
 Changes for v3: Incorporated Scott's comments
   - Updated soc compatible field
   - updated clock compatible field

 Changes for v4: Sending as it is 
 Changes for v5: Sending as it is 
 Changes for v6: Updated branch of creation 
 Changes for v7: Incororated Scott's commetns
    - Create patch set
    - remove whitespace
    - Removed l2switch. It will be added later
 Changes for v8: Incorporated Scott's comments
    - Added comment line in T1042si-post.dtsi
    - removed extra lines
 Changes for v9:
    - updated clock node as per new bindings
    - removed extra line
    - added dma node

 arch/powerpc/boot/dts/fsl/t1040si-post.dtsi |  433 +++++++++++++++++++++++++++
 arch/powerpc/boot/dts/fsl/t1042si-post.dtsi |   37 +++
 arch/powerpc/boot/dts/fsl/t104xsi-pre.dtsi  |  104 +++++++
 3 files changed, 574 insertions(+)
 create mode 100644 arch/powerpc/boot/dts/fsl/t1040si-post.dtsi
 create mode 100644 arch/powerpc/boot/dts/fsl/t1042si-post.dtsi
 create mode 100644 arch/powerpc/boot/dts/fsl/t104xsi-pre.dtsi

diff --git a/arch/powerpc/boot/dts/fsl/t1040si-post.dtsi b/arch/powerpc/boot/dts/fsl/t1040si-post.dtsi
new file mode 100644
index 0000000..fa1f32a
--- /dev/null
+++ b/arch/powerpc/boot/dts/fsl/t1040si-post.dtsi
@@ -0,0 +1,433 @@
+/*
+ * T1040 Silicon/SoC Device Tree Source (post include)
+ *
+ * Copyright 2013 Freescale Semiconductor Inc.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *     * Redistributions of source code must retain the above copyright
+ *	 notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *	 notice, this list of conditions and the following disclaimer in the
+ *	 documentation and/or other materials provided with the distribution.
+ *     * Neither the name of Freescale Semiconductor nor the
+ *	 names of its contributors may be used to endorse or promote products
+ *	 derived from this software without specific prior written permission.
+ *
+ *
+ * ALTERNATIVELY, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") as published by the Free Software
+ * Foundation, either version 2 of that License or (at your option) any
+ * later version.
+ *
+ * THIS SOFTWARE IS PROVIDED BY Freescale Semiconductor ``AS IS'' AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+ * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ * DISCLAIMED. IN NO EVENT SHALL Freescale Semiconductor BE LIABLE FOR ANY
+ * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+ * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+ * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+&ifc {
+	#address-cells = <2>;
+	#size-cells = <1>;
+	compatible = "fsl,ifc", "simple-bus";
+	interrupts = <25 2 0 0>;
+};
+
+&pci0 {
+	compatible = "fsl,t1040-pcie", "fsl,qoriq-pcie-v2.4", "fsl,qoriq-pcie";
+	device_type = "pci";
+	#size-cells = <2>;
+	#address-cells = <3>;
+	bus-range = <0x0 0xff>;
+	interrupts = <20 2 0 0>;
+	fsl,iommu-parent = <&pamu0>;
+	pcie@0 {
+		reg = <0 0 0 0 0>;
+		#interrupt-cells = <1>;
+		#size-cells = <2>;
+		#address-cells = <3>;
+		device_type = "pci";
+		interrupts = <20 2 0 0>;
+		interrupt-map-mask = <0xf800 0 0 7>;
+		interrupt-map = <
+			/* IDSEL 0x0 */
+			0000 0 0 1 &mpic 40 1 0 0
+			0000 0 0 2 &mpic 1 1 0 0
+			0000 0 0 3 &mpic 2 1 0 0
+			0000 0 0 4 &mpic 3 1 0 0
+			>;
+	};
+};
+
+&pci1 {
+	compatible = "fsl,t1040-pcie", "fsl,qoriq-pcie-v2.4", "fsl,qoriq-pcie";
+	device_type = "pci";
+	#size-cells = <2>;
+	#address-cells = <3>;
+	bus-range = <0 0xff>;
+	interrupts = <21 2 0 0>;
+	fsl,iommu-parent = <&pamu0>;
+	pcie@0 {
+		reg = <0 0 0 0 0>;
+		#interrupt-cells = <1>;
+		#size-cells = <2>;
+		#address-cells = <3>;
+		device_type = "pci";
+		interrupts = <21 2 0 0>;
+		interrupt-map-mask = <0xf800 0 0 7>;
+		interrupt-map = <
+			/* IDSEL 0x0 */
+			0000 0 0 1 &mpic 41 1 0 0
+			0000 0 0 2 &mpic 5 1 0 0
+			0000 0 0 3 &mpic 6 1 0 0
+			0000 0 0 4 &mpic 7 1 0 0
+			>;
+	};
+};
+
+&pci2 {
+	compatible = "fsl,t1040-pcie", "fsl,qoriq-pcie-v2.4", "fsl,qoriq-pcie";
+	device_type = "pci";
+	#size-cells = <2>;
+	#address-cells = <3>;
+	bus-range = <0x0 0xff>;
+	interrupts = <22 2 0 0>;
+	fsl,iommu-parent = <&pamu0>;
+	pcie@0 {
+		reg = <0 0 0 0 0>;
+		#interrupt-cells = <1>;
+		#size-cells = <2>;
+		#address-cells = <3>;
+		device_type = "pci";
+		interrupts = <22 2 0 0>;
+		interrupt-map-mask = <0xf800 0 0 7>;
+		interrupt-map = <
+			/* IDSEL 0x0 */
+			0000 0 0 1 &mpic 42 1 0 0
+			0000 0 0 2 &mpic 9 1 0 0
+			0000 0 0 3 &mpic 10 1 0 0
+			0000 0 0 4 &mpic 11 1 0 0
+			>;
+	};
+};
+
+&pci3 {
+	compatible = "fsl,t1040-pcie", "fsl,qoriq-pcie-v2.4", "fsl,qoriq-pcie";
+	device_type = "pci";
+	#size-cells = <2>;
+	#address-cells = <3>;
+	bus-range = <0x0 0xff>;
+	interrupts = <23 2 0 0>;
+	fsl,iommu-parent = <&pamu0>;
+	pcie@0 {
+		reg = <0 0 0 0 0>;
+		#interrupt-cells = <1>;
+		#size-cells = <2>;
+		#address-cells = <3>;
+		device_type = "pci";
+		interrupts = <23 2 0 0>;
+		interrupt-map-mask = <0xf800 0 0 7>;
+		interrupt-map = <
+			/* IDSEL 0x0 */
+			0000 0 0 1 &mpic 43 1 0 0
+			0000 0 0 2 &mpic 0 1 0 0
+			0000 0 0 3 &mpic 4 1 0 0
+			0000 0 0 4 &mpic 8 1 0 0
+			>;
+	};
+};
+
+&dcsr {
+	#address-cells = <1>;
+	#size-cells = <1>;
+	compatible = "fsl,dcsr", "simple-bus";
+
+	dcsr-epu@0 {
+		compatible = "fsl,t1040-dcsr-epu", "fsl,dcsr-epu";
+		interrupts = <52 2 0 0
+			      84 2 0 0
+			      85 2 0 0>;
+		reg = <0x0 0x1000>;
+	};
+	dcsr-npc {
+		compatible = "fsl,t1040-dcsr-cnpc", "fsl,dcsr-cnpc";
+		reg = <0x1000 0x1000 0x1002000 0x10000>;
+	};
+	dcsr-nxc@2000 {
+		compatible = "fsl,dcsr-nxc";
+		reg = <0x2000 0x1000>;
+	};
+	dcsr-corenet {
+		compatible = "fsl,dcsr-corenet";
+		reg = <0x8000 0x1000 0x1A000 0x1000>;
+	};
+	dcsr-dpaa@9000 {
+		compatible = "fsl,t1040-dcsr-dpaa", "fsl,dcsr-dpaa";
+		reg = <0x9000 0x1000>;
+	};
+	dcsr-ocn@11000 {
+		compatible = "fsl,t1040-dcsr-ocn", "fsl,dcsr-ocn";
+		reg = <0x11000 0x1000>;
+	};
+	dcsr-ddr@12000 {
+		compatible = "fsl,dcsr-ddr";
+		dev-handle = <&ddr1>;
+		reg = <0x12000 0x1000>;
+	};
+	dcsr-nal@18000 {
+		compatible = "fsl,t1040-dcsr-nal", "fsl,dcsr-nal";
+		reg = <0x18000 0x1000>;
+	};
+	dcsr-rcpm@22000 {
+		compatible = "fsl,t1040-dcsr-rcpm", "fsl,dcsr-rcpm";
+		reg = <0x22000 0x1000>;
+	};
+	dcsr-snpc@30000 {
+		compatible = "fsl,t1040-dcsr-snpc", "fsl,dcsr-snpc";
+		reg = <0x30000 0x1000 0x1022000 0x10000>;
+	};
+	dcsr-snpc@31000 {
+		compatible = "fsl,t1040-dcsr-snpc", "fsl,dcsr-snpc";
+		reg = <0x31000 0x1000 0x1042000 0x10000>;
+	};
+	dcsr-cpu-sb-proxy@100000 {
+		compatible = "fsl,dcsr-e5500-sb-proxy", "fsl,dcsr-cpu-sb-proxy";
+		cpu-handle = <&cpu0>;
+		reg = <0x100000 0x1000 0x101000 0x1000>;
+	};
+	dcsr-cpu-sb-proxy@108000 {
+		compatible = "fsl,dcsr-e5500-sb-proxy", "fsl,dcsr-cpu-sb-proxy";
+		cpu-handle = <&cpu1>;
+		reg = <0x108000 0x1000 0x109000 0x1000>;
+	};
+	dcsr-cpu-sb-proxy@110000 {
+		compatible = "fsl,dcsr-e5500-sb-proxy", "fsl,dcsr-cpu-sb-proxy";
+		cpu-handle = <&cpu2>;
+		reg = <0x110000 0x1000 0x111000 0x1000>;
+	};
+	dcsr-cpu-sb-proxy@118000 {
+		compatible = "fsl,dcsr-e5500-sb-proxy", "fsl,dcsr-cpu-sb-proxy";
+		cpu-handle = <&cpu3>;
+		reg = <0x118000 0x1000 0x119000 0x1000>;
+	};
+};
+
+&soc {
+	#address-cells = <1>;
+	#size-cells = <1>;
+	device_type = "soc";
+	compatible = "simple-bus";
+
+	soc-sram-error {
+		compatible = "fsl,soc-sram-error";
+		interrupts = <16 2 1 29>;
+	};
+
+	corenet-law@0 {
+		compatible = "fsl,corenet-law";
+		reg = <0x0 0x1000>;
+		fsl,num-laws = <16>;
+	};
+
+	ddr1: memory-controller@8000 {
+		compatible = "fsl,qoriq-memory-controller-v5.0",
+				"fsl,qoriq-memory-controller";
+		reg = <0x8000 0x1000>;
+		interrupts = <16 2 1 23>;
+	};
+
+	cpc: l3-cache-controller@10000 {
+		compatible = "fsl,t1040-l3-cache-controller", "cache";
+		reg = <0x10000 0x1000>;
+		interrupts = <16 2 1 27>;
+	};
+
+	corenet-cf@18000 {
+		compatible = "fsl,corenet-cf";
+		reg = <0x18000 0x1000>;
+		interrupts = <16 2 1 31>;
+		fsl,ccf-num-csdids = <32>;
+		fsl,ccf-num-snoopids = <32>;
+	};
+
+	iommu@20000 {
+		compatible = "fsl,pamu-v1.0", "fsl,pamu";
+		reg = <0x20000 0x1000>;
+		ranges = <0 0x20000 0x1000>;
+		#address-cells = <1>;
+		#size-cells = <1>;
+		interrupts = <
+			24 2 0 0
+			16 2 1 30>;
+		pamu0: pamu@0 {
+			reg = <0 0x1000>;
+			fsl,primary-cache-geometry = <128 1>;
+			fsl,secondary-cache-geometry = <16 2>;
+		};
+	};
+
+/include/ "qoriq-mpic.dtsi"
+
+	guts: global-utilities@e0000 {
+		compatible = "fsl,t1040-device-config", "fsl,qoriq-device-config-2.0";
+		reg = <0xe0000 0xe00>;
+		fsl,has-rstcr;
+		fsl,liodn-bits = <12>;
+	};
+
+	clockgen: global-utilities@e1000 {
+		compatible = "fsl,t1040-clockgen", "fsl,qoriq-clockgen-2.0",
+				   "fixed-clock";
+		ranges = <0x0 0xe1000 0x1000>;
+		clock-frequency = <100000000>;
+		reg = <0xe1000 0x1000>;
+		clock-output-names = "sysclk";
+		#address-cells = <1>;
+		#size-cells = <1>;
+
+		sysclk: sysclk {
+			#clock-cells = <0>;
+			compatible = "fsl,qoriq-sysclk-2.0";
+			clock-output-names = "sysclk";
+		};
+
+
+		pll0: pll0@800 {
+			#clock-cells = <1>;
+			reg = <0x800 4>;
+			compatible = "fsl,qoriq-core-pll-2.0";
+			clocks = <&clockgen>;
+			clock-output-names = "pll0", "pll0-div2", "pll0-div4";
+		};
+
+		pll1: pll1@820 {
+			#clock-cells = <1>;
+			reg = <0x820 4>;
+			compatible = "fsl,qoriq-core-pll-2.0";
+			clocks = <&clockgen>;
+			clock-output-names = "pll1", "pll1-div2", "pll1-div4";
+		};
+
+		mux0: mux0@0 {
+			#clock-cells = <0>;
+			reg = <0x0 4>;
+			compatible = "fsl,qoriq-core-mux-2.0";
+			clocks = <&pll0 0>, <&pll0 1>, <&pll0 2>,
+				 <&pll1 0>, <&pll1 1>, <&pll1 2>;
+			clock-names = "pll0", "pll0-div2", "pll1-div4",
+				"pll1", "pll1-div2", "pll1-div4";
+			clock-output-names = "cmux0";
+		};
+
+		mux1: mux1@20 {
+			#clock-cells = <0>;
+			reg = <0x20 4>;
+			compatible = "fsl,qoriq-core-mux-2.0";
+			clocks = <&pll0 0>, <&pll0 1>, <&pll0 2>,
+				 <&pll1 0>, <&pll1 1>, <&pll1 2>;
+			clock-names = "pll0", "pll0-div2", "pll1-div4",
+				"pll1", "pll1-div2", "pll1-div4";
+			clock-output-names = "cmux1";
+		};
+
+		mux2: mux2@40 {
+			#clock-cells = <0>;
+			reg = <0x40 4>;
+			compatible = "fsl,qoriq-core-mux-2.0";
+			clocks = <&pll0 0>, <&pll0 1>, <&pll0 2>,
+				 <&pll1 0>, <&pll1 1>, <&pll1 2>;
+			clock-names = "pll0", "pll0-div2", "pll1-div4",
+				"pll1", "pll1-div2", "pll1-div4";
+			clock-output-names = "cmux2";
+		};
+
+		mux3: mux3@60 {
+			#clock-cells = <0>;
+			reg = <0x60 4>;
+			compatible = "fsl,qoriq-core-mux-2.0";
+			clocks = <&pll0 0>, <&pll0 1>, <&pll0 2>,
+				 <&pll1 0>, <&pll1 1>, <&pll1 2>;
+			clock-names = "pll0_0", "pll0_1", "pll0_2",
+				"pll1_0", "pll1_1", "pll1_2";
+			clock-output-names = "cmux3";
+		};
+	};
+
+	rcpm: global-utilities@e2000 {
+		compatible = "fsl,t1040-rcpm", "fsl,qoriq-rcpm-2.0";
+		reg = <0xe2000 0x1000>;
+	};
+
+	sfp: sfp@e8000 {
+		compatible = "fsl,t1040-sfp";
+		reg	   = <0xe8000 0x1000>;
+	};
+
+	serdes: serdes@ea000 {
+		compatible = "fsl,t1040-serdes";
+		reg	   = <0xea000 0x4000>;
+	};
+
+/include/ "elo3-dma-0.dtsi"
+/include/ "elo3-dma-1.dtsi"
+/include/ "qoriq-espi-0.dtsi"
+	spi@110000 {
+		fsl,espi-num-chipselects = <4>;
+	};
+
+/include/ "qoriq-esdhc-0.dtsi"
+	sdhc@114000 {
+		compatible = "fsl,t1040-esdhc", "fsl,esdhc";
+		fsl,iommu-parent = <&pamu0>;
+		fsl,liodn-reg = <&guts 0x530>; /* eSDHCLIODNR */
+		sdhci,auto-cmd12;
+	};
+/include/ "qoriq-i2c-0.dtsi"
+/include/ "qoriq-i2c-1.dtsi"
+/include/ "qoriq-duart-0.dtsi"
+/include/ "qoriq-duart-1.dtsi"
+/include/ "qoriq-gpio-0.dtsi"
+/include/ "qoriq-gpio-1.dtsi"
+/include/ "qoriq-gpio-2.dtsi"
+/include/ "qoriq-gpio-3.dtsi"
+/include/ "qoriq-usb2-mph-0.dtsi"
+		usb0: usb@210000 {
+			compatible = "fsl-usb2-mph-v2.4", "fsl-usb2-mph";
+			fsl,iommu-parent = <&pamu0>;
+			fsl,liodn-reg = <&guts 0x520>; /* USB1LIODNR */
+			phy_type = "utmi";
+			port0;
+		};
+/include/ "qoriq-usb2-dr-0.dtsi"
+		usb1: usb@211000 {
+			compatible = "fsl-usb2-dr-v2.4", "fsl-usb2-dr";
+			fsl,iommu-parent = <&pamu0>;
+			fsl,liodn-reg = <&guts 0x524>; /* USB2LIODNR */
+			dr_mode = "host";
+			phy_type = "utmi";
+		};
+
+	display@180000 {
+		compatible = "fsl,t1040-diu", "fsl,diu";
+		reg = <0x180000 1000>;
+		interrupts = <74 2 0 0>;
+	};
+
+/include/ "qoriq-sata2-0.dtsi"
+sata@220000 {
+			fsl,iommu-parent = <&pamu0>;
+			fsl,liodn-reg = <&guts 0x550>; /* SATA1LIODNR */
+};
+/include/ "qoriq-sata2-1.dtsi"
+sata@221000 {
+			fsl,iommu-parent = <&pamu0>;
+			fsl,liodn-reg = <&guts 0x554>; /* SATA2LIODNR */
+};
+/include/ "qoriq-sec5.0-0.dtsi"
+};
diff --git a/arch/powerpc/boot/dts/fsl/t1042si-post.dtsi b/arch/powerpc/boot/dts/fsl/t1042si-post.dtsi
new file mode 100644
index 0000000..319b74f
--- /dev/null
+++ b/arch/powerpc/boot/dts/fsl/t1042si-post.dtsi
@@ -0,0 +1,37 @@
+/*
+ * T1042 Silicon/SoC Device Tree Source (post include)
+ *
+ * Copyright 2013 Freescale Semiconductor Inc.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in the
+ *       documentation and/or other materials provided with the distribution.
+ *     * Neither the name of Freescale Semiconductor nor the
+ *       names of its contributors may be used to endorse or promote products
+ *       derived from this software without specific prior written permission.
+ *
+ *
+ * ALTERNATIVELY, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") as published by the Free Software
+ * Foundation, either version 2 of that License or (at your option) any
+ * later version.
+ *
+ * THIS SOFTWARE IS PROVIDED BY Freescale Semiconductor ``AS IS'' AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+ * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ * DISCLAIMED. IN NO EVENT SHALL Freescale Semiconductor BE LIABLE FOR ANY
+ * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+ * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+ * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+/include/ "t1040si-post.dtsi"
+
+/* Place holder for ethernet related device tree nodes */
diff --git a/arch/powerpc/boot/dts/fsl/t104xsi-pre.dtsi b/arch/powerpc/boot/dts/fsl/t104xsi-pre.dtsi
new file mode 100644
index 0000000..bbb7025
--- /dev/null
+++ b/arch/powerpc/boot/dts/fsl/t104xsi-pre.dtsi
@@ -0,0 +1,104 @@
+/*
+ * T1040/T1042 Silicon/SoC Device Tree Source (pre include)
+ *
+ * Copyright 2013 Freescale Semiconductor Inc.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *     * Redistributions of source code must retain the above copyright
+ *	 notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *	 notice, this list of conditions and the following disclaimer in the
+ *	 documentation and/or other materials provided with the distribution.
+ *     * Neither the name of Freescale Semiconductor nor the
+ *	 names of its contributors may be used to endorse or promote products
+ *	 derived from this software without specific prior written permission.
+ *
+ *
+ * ALTERNATIVELY, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") as published by the Free Software
+ * Foundation, either version 2 of that License or (at your option) any
+ * later version.
+ *
+ * THIS SOFTWARE IS PROVIDED BY Freescale Semiconductor ``AS IS'' AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+ * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ * DISCLAIMED. IN NO EVENT SHALL Freescale Semiconductor BE LIABLE FOR ANY
+ * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+ * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+ * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+/dts-v1/;
+
+/include/ "e5500_power_isa.dtsi"
+
+/ {
+	#address-cells = <2>;
+	#size-cells = <2>;
+	interrupt-parent = <&mpic>;
+
+	aliases {
+		ccsr = &soc;
+		dcsr = &dcsr;
+
+		serial0 = &serial0;
+		serial1 = &serial1;
+		serial2 = &serial2;
+		serial3 = &serial3;
+		pci0 = &pci0;
+		pci1 = &pci1;
+		pci2 = &pci2;
+		pci3 = &pci3;
+		usb0 = &usb0;
+		usb1 = &usb1;
+		sdhc = &sdhc;
+
+		crypto = &crypto;
+	};
+
+	cpus {
+		#address-cells = <1>;
+		#size-cells = <0>;
+
+		cpu0: PowerPC,e5500@0 {
+			device_type = "cpu";
+			reg = <0>;
+			clocks = <&mux0>;
+			next-level-cache = <&L2_1>;
+			L2_1: l2-cache {
+				next-level-cache = <&cpc>;
+			};
+		};
+		cpu1: PowerPC,e5500@1 {
+			device_type = "cpu";
+			reg = <1>;
+			clocks = <&mux1>;
+			next-level-cache = <&L2_2>;
+			L2_2: l2-cache {
+				next-level-cache = <&cpc>;
+			};
+		};
+		cpu2: PowerPC,e5500@2 {
+			device_type = "cpu";
+			reg = <2>;
+			clocks = <&mux2>;
+			next-level-cache = <&L2_3>;
+			L2_3: l2-cache {
+				next-level-cache = <&cpc>;
+			};
+		};
+		cpu3: PowerPC,e5500@3 {
+			device_type = "cpu";
+			reg = <3>;
+			clocks = <&mux3>;
+			next-level-cache = <&L2_4>;
+			L2_4: l2-cache {
+				next-level-cache = <&cpc>;
+			};
+		};
+	};
+};
-- 
1.7.9.5

^ permalink raw reply related

* Re: [PATCH] slub: Don't throw away partial remote slabs if there is no local memory
From: Nishanth Aravamudan @ 2014-01-25  1:10 UTC (permalink / raw)
  To: David Rientjes
  Cc: Han Pingtian, penberg, linux-mm, paulus, Anton Blanchard, mpm,
	Christoph Lameter, linuxppc-dev, Joonsoo Kim, Wanpeng Li
In-Reply-To: <alpine.DEB.2.02.1401241618500.20466@chino.kir.corp.google.com>

On 24.01.2014 [16:25:58 -0800], David Rientjes wrote:
> On Fri, 24 Jan 2014, Nishanth Aravamudan wrote:
> 
> > Thank you for clarifying and providing  a test patch. I ran with this on
> > the system showing the original problem, configured to have 15GB of
> > memory.
> > 
> > With your patch after boot:
> > 
> > MemTotal:       15604736 kB
> > MemFree:         8768192 kB
> > Slab:            3882560 kB
> > SReclaimable:     105408 kB
> > SUnreclaim:      3777152 kB
> > 
> > With Anton's patch after boot:
> > 
> > MemTotal:       15604736 kB
> > MemFree:        11195008 kB
> > Slab:            1427968 kB
> > SReclaimable:     109184 kB
> > SUnreclaim:      1318784 kB
> > 
> > 
> > I know that's fairly unscientific, but the numbers are reproducible. 
> > 
> 
> I don't think the goal of the discussion is to reduce the amount of slab 
> allocated, but rather get the most local slab memory possible by use of 
> kmalloc_node().  When a memoryless node is being passed to kmalloc_node(), 
> which is probably cpu_to_node() for a cpu bound to a node without memory, 
> my patch is allocating it on the most local node; Anton's patch is 
> allocating it on whatever happened to be the cpu slab.

Well, the issue we're trying to resolve, based upon our analysis, is
that we're seeing incredibly inefficient slab usage with memoryless
nodes. To the point where we are OOM'ing a 8GB system without doing
anything in particularly stressful.

As to cpu_to_node() being passed to kmalloc_node(), I think an
appropriate fix is to change that to cpu_to_mem()?

> > > diff --git a/mm/slub.c b/mm/slub.c
> > > --- a/mm/slub.c
> > > +++ b/mm/slub.c
> > > @@ -2278,10 +2278,14 @@ redo:
> > > 
> > >  	if (unlikely(!node_match(page, node))) {
> > >  		stat(s, ALLOC_NODE_MISMATCH);
> > > -		deactivate_slab(s, page, c->freelist);
> > > -		c->page = NULL;
> > > -		c->freelist = NULL;
> > > -		goto new_slab;
> > > +		if (unlikely(!node_present_pages(node)))
> > > +			node = numa_mem_id();
> > > +		if (!node_match(page, node)) {
> > > +			deactivate_slab(s, page, c->freelist);
> > > +			c->page = NULL;
> > > +			c->freelist = NULL;
> > > +			goto new_slab;
> > > +		}
> > 
> > Semantically, and please correct me if I'm wrong, this patch is saying
> > if we have a memoryless node, we expect the page's locality to be that
> > of numa_mem_id(), and we still deactivate the slab if that isn't true.
> > Just wanting to make sure I understand the intent.
> > 
> 
> Yeah, the default policy should be to fallback to local memory if the node 
> passed is memoryless.

Thanks!

> > What I find odd is that there are only 2 nodes on this system, node 0
> > (empty) and node 1. So won't numa_mem_id() always be 1? And every page
> > should be coming from node 1 (thus node_match() should always be true?)
> > 
> 
> The nice thing about slub is its debugging ability, what is 
> /sys/kernel/slab/cache/objects showing in comparison between the two 
> patches?

Do you mean kmem_cache or kmem_cache_node?

-Nish

^ permalink raw reply

* Re: mpic build failure for 7447_hpc defconfig (bisected)
From: Paul Gortmaker @ 2014-01-25  0:47 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev
In-Reply-To: <20140108224529.GA4708@windriver.com>

On Wed, Jan 8, 2014 at 5:45 PM, Paul Gortmaker
<paul.gortmaker@windriver.com> wrote:
> Commit 446f6d06fab0b49c61887ecbe8286d6aaa796637 ("powerpc/mpic: Properly
> set default triggers") breaks the mpc7447_hpc_defconfig as follows:
>
>   CC      arch/powerpc/sysdev/mpic.o
> arch/powerpc/sysdev/mpic.c: In function 'mpic_set_irq_type':
> arch/powerpc/sysdev/mpic.c:886:9: error: case label does not reduce to an integer constant
> arch/powerpc/sysdev/mpic.c:890:9: error: case label does not reduce to an integer constant
> arch/powerpc/sysdev/mpic.c:894:9: error: case label does not reduce to an integer constant
> arch/powerpc/sysdev/mpic.c:898:9: error: case label does not reduce to an integer constant
>
> Looking at the cpp output (gcc 4.7.3 from the kernel.org toolchains), I see:
>
>    case mpic->hw_set[MPIC_IDX_VECPRI_SENSE_EDGE] |
>         mpic->hw_set[MPIC_IDX_VECPRI_POLARITY_POSITIVE]:
>
> The pointer into an array appears because CONFIG_MPIC_WEIRD=y is set for
> this thing.
>
> -------------------
>   #ifdef CONFIG_MPIC_WEIRD
>   static u32 mpic_infos[][MPIC_IDX_END] = {
>         [0] = { /* Original OpenPIC compatible MPIC */
>
>   [...]
>
>   #define MPIC_INFO(name) mpic->hw_set[MPIC_IDX_##name]
>
>   #else /* CONFIG_MPIC_WEIRD */
>
>   #define MPIC_INFO(name) MPIC_##name
>
>   #endif /* CONFIG_MPIC_WEIRD */
> -------------------
>
>
> Given it has been broken since 3.4-rc5, is it safe to say MPIC_WEIRD is dead
> and unused?  Or should the case be converted to if/else or similar?  Or were
> other versions of gcc actually able to "see" constant numbers?

Ping.  I'll default to conversion to if/else since I don't see any other
solution that will solve thiis easily.  However I am tempted to delete
the abandonware platforms, if given a choice.  The 3.4 was quite some
time ago, and if nobody cared since then, well ....

P.
--

>
> Paul.
> --
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev

^ permalink raw reply

* Re: [PATCH] slub: Don't throw away partial remote slabs if there is no local memory
From: David Rientjes @ 2014-01-25  0:25 UTC (permalink / raw)
  To: Nishanth Aravamudan
  Cc: Han Pingtian, penberg, linux-mm, paulus, Anton Blanchard, mpm,
	Christoph Lameter, linuxppc-dev, Joonsoo Kim, Wanpeng Li
In-Reply-To: <20140125001643.GA25344@linux.vnet.ibm.com>

On Fri, 24 Jan 2014, Nishanth Aravamudan wrote:

> Thank you for clarifying and providing  a test patch. I ran with this on
> the system showing the original problem, configured to have 15GB of
> memory.
> 
> With your patch after boot:
> 
> MemTotal:       15604736 kB
> MemFree:         8768192 kB
> Slab:            3882560 kB
> SReclaimable:     105408 kB
> SUnreclaim:      3777152 kB
> 
> With Anton's patch after boot:
> 
> MemTotal:       15604736 kB
> MemFree:        11195008 kB
> Slab:            1427968 kB
> SReclaimable:     109184 kB
> SUnreclaim:      1318784 kB
> 
> 
> I know that's fairly unscientific, but the numbers are reproducible. 
> 

I don't think the goal of the discussion is to reduce the amount of slab 
allocated, but rather get the most local slab memory possible by use of 
kmalloc_node().  When a memoryless node is being passed to kmalloc_node(), 
which is probably cpu_to_node() for a cpu bound to a node without memory, 
my patch is allocating it on the most local node; Anton's patch is 
allocating it on whatever happened to be the cpu slab.

> > diff --git a/mm/slub.c b/mm/slub.c
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -2278,10 +2278,14 @@ redo:
> > 
> >  	if (unlikely(!node_match(page, node))) {
> >  		stat(s, ALLOC_NODE_MISMATCH);
> > -		deactivate_slab(s, page, c->freelist);
> > -		c->page = NULL;
> > -		c->freelist = NULL;
> > -		goto new_slab;
> > +		if (unlikely(!node_present_pages(node)))
> > +			node = numa_mem_id();
> > +		if (!node_match(page, node)) {
> > +			deactivate_slab(s, page, c->freelist);
> > +			c->page = NULL;
> > +			c->freelist = NULL;
> > +			goto new_slab;
> > +		}
> 
> Semantically, and please correct me if I'm wrong, this patch is saying
> if we have a memoryless node, we expect the page's locality to be that
> of numa_mem_id(), and we still deactivate the slab if that isn't true.
> Just wanting to make sure I understand the intent.
> 

Yeah, the default policy should be to fallback to local memory if the node 
passed is memoryless.

> What I find odd is that there are only 2 nodes on this system, node 0
> (empty) and node 1. So won't numa_mem_id() always be 1? And every page
> should be coming from node 1 (thus node_match() should always be true?)
> 

The nice thing about slub is its debugging ability, what is 
/sys/kernel/slab/cache/objects showing in comparison between the two 
patches?

^ permalink raw reply

* Re: [PATCH] slub: Don't throw away partial remote slabs if there is no local memory
From: Nishanth Aravamudan @ 2014-01-25  0:16 UTC (permalink / raw)
  To: David Rientjes
  Cc: Han Pingtian, penberg, linux-mm, paulus, Anton Blanchard, mpm,
	Christoph Lameter, linuxppc-dev, Joonsoo Kim, Wanpeng Li
In-Reply-To: <alpine.DEB.2.02.1401241543100.18620@chino.kir.corp.google.com>

On 24.01.2014 [15:49:33 -0800], David Rientjes wrote:
> On Fri, 24 Jan 2014, Nishanth Aravamudan wrote:
> 
> > > I think the problem is a memoryless node being used for kmalloc_node() so 
> > > we need to decide where to enforce node_present_pages().  __slab_alloc() 
> > > seems like the best candidate when !node_match().
> > 
> > Actually, this is effectively what Anton's patch does, except with
> > Wanpeng's adjustment to use node_present_pages(). Does that seem
> > sufficient to you?
> > 
> 
> I don't see that as being the effect of Anton's patch.  We need to use 
> numa_mem_id() as Christoph mentioned when a memoryless node is passed for 
> the best NUMA locality.  Something like this:

Thank you for clarifying and providing  a test patch. I ran with this on
the system showing the original problem, configured to have 15GB of
memory.

With your patch after boot:

MemTotal:       15604736 kB
MemFree:         8768192 kB
Slab:            3882560 kB
SReclaimable:     105408 kB
SUnreclaim:      3777152 kB

With Anton's patch after boot:

MemTotal:       15604736 kB
MemFree:        11195008 kB
Slab:            1427968 kB
SReclaimable:     109184 kB
SUnreclaim:      1318784 kB

I know that's fairly unscientific, but the numbers are reproducible. 

For what it's worth, a sample of the unmodified numbers:

MemTotal:       15317632 kB
MemFree:         5023424 kB
Slab:            7176064 kB
SReclaimable:     106816 kB
SUnreclaim:      7069248 kB

So it's an improvement, but something is still causing us to (it seems)
be pretty inefficient with the slabs.

> diff --git a/mm/slub.c b/mm/slub.c
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -2278,10 +2278,14 @@ redo:
> 
>  	if (unlikely(!node_match(page, node))) {
>  		stat(s, ALLOC_NODE_MISMATCH);
> -		deactivate_slab(s, page, c->freelist);
> -		c->page = NULL;
> -		c->freelist = NULL;
> -		goto new_slab;
> +		if (unlikely(!node_present_pages(node)))
> +			node = numa_mem_id();
> +		if (!node_match(page, node)) {
> +			deactivate_slab(s, page, c->freelist);
> +			c->page = NULL;
> +			c->freelist = NULL;
> +			goto new_slab;
> +		}

Semantically, and please correct me if I'm wrong, this patch is saying
if we have a memoryless node, we expect the page's locality to be that
of numa_mem_id(), and we still deactivate the slab if that isn't true.
Just wanting to make sure I understand the intent.

What I find odd is that there are only 2 nodes on this system, node 0
(empty) and node 1. So won't numa_mem_id() always be 1? And every page
should be coming from node 1 (thus node_match() should always be true?)

Thanks,
Nish

^ permalink raw reply

* Re: [PATCH] slub: Don't throw away partial remote slabs if there is no local memory
From: David Rientjes @ 2014-01-24 23:49 UTC (permalink / raw)
  To: Nishanth Aravamudan
  Cc: Han Pingtian, penberg, linux-mm, paulus, Anton Blanchard, mpm,
	Christoph Lameter, linuxppc-dev, Joonsoo Kim, Wanpeng Li
In-Reply-To: <20140124232902.GB30361@linux.vnet.ibm.com>

On Fri, 24 Jan 2014, Nishanth Aravamudan wrote:

> > I think the problem is a memoryless node being used for kmalloc_node() so 
> > we need to decide where to enforce node_present_pages().  __slab_alloc() 
> > seems like the best candidate when !node_match().
> 
> Actually, this is effectively what Anton's patch does, except with
> Wanpeng's adjustment to use node_present_pages(). Does that seem
> sufficient to you?
> 

I don't see that as being the effect of Anton's patch.  We need to use 
numa_mem_id() as Christoph mentioned when a memoryless node is passed for 
the best NUMA locality.  Something like this:

diff --git a/mm/slub.c b/mm/slub.c
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2278,10 +2278,14 @@ redo:
 
 	if (unlikely(!node_match(page, node))) {
 		stat(s, ALLOC_NODE_MISMATCH);
-		deactivate_slab(s, page, c->freelist);
-		c->page = NULL;
-		c->freelist = NULL;
-		goto new_slab;
+		if (unlikely(!node_present_pages(node)))
+			node = numa_mem_id();
+		if (!node_match(page, node)) {
+			deactivate_slab(s, page, c->freelist);
+			c->page = NULL;
+			c->freelist = NULL;
+			goto new_slab;
+		}
 	}
 
 	/*

> It does only cover the memoryless node case (not the exhausted node
> case), but I think that shouldn't block the fix (and it does fix the
> issue we've run across in our testing).
> 

kmalloc_node(nid) and kmem_cache_alloc_node(nid) should fallback to nodes 
other than nid when memory can't be allocated, these functions only 
indicate a preference.

^ permalink raw reply

* Re: [PATCH] slub: Don't throw away partial remote slabs if there is no local memory
From: Nishanth Aravamudan @ 2014-01-24 23:29 UTC (permalink / raw)
  To: David Rientjes
  Cc: Han Pingtian, penberg, linux-mm, paulus, Anton Blanchard, mpm,
	Christoph Lameter, linuxppc-dev, Joonsoo Kim, Wanpeng Li
In-Reply-To: <alpine.DEB.2.02.1401241301120.10968@chino.kir.corp.google.com>

On 24.01.2014 [13:03:13 -0800], David Rientjes wrote:
> On Fri, 24 Jan 2014, Christoph Lameter wrote:
> 
> > On Fri, 24 Jan 2014, Wanpeng Li wrote:
> > 
> > > >
> > > >diff --git a/mm/slub.c b/mm/slub.c
> > > >index 545a170..a1c6040 100644
> > > >--- a/mm/slub.c
> > > >+++ b/mm/slub.c
> > > >@@ -1700,6 +1700,9 @@ static void *get_partial(struct kmem_cache *s, gfp_t flags, int node,
> > > > 	void *object;
> > > >	int searchnode = (node == NUMA_NO_NODE) ? numa_node_id() : node;
> > 
> > This needs to be numa_mem_id() and numa_mem_id would need to be
> > consistently used.
> > 
> > > >
> > > >+	if (!node_present_pages(searchnode))
> > > >+		searchnode = numa_mem_id();
> > 
> > Probably wont need that?
> > 
> 
> I think the problem is a memoryless node being used for kmalloc_node() so 
> we need to decide where to enforce node_present_pages().  __slab_alloc() 
> seems like the best candidate when !node_match().

Actually, this is effectively what Anton's patch does, except with
Wanpeng's adjustment to use node_present_pages(). Does that seem
sufficient to you?

It does only cover the memoryless node case (not the exhausted node
case), but I think that shouldn't block the fix (and it does fix the
issue we've run across in our testing).

-Nish

^ permalink raw reply

* Re: [PATCH] slub: Don't throw away partial remote slabs if there is no local memory
From: Nishanth Aravamudan @ 2014-01-24 22:19 UTC (permalink / raw)
  To: David Rientjes
  Cc: penberg, linux-mm, Han Pingtian, paulus, Anton Blanchard, mpm,
	Christoph Lameter, linuxppc-dev, Joonsoo Kim, Wanpeng Li
In-Reply-To: <alpine.DEB.2.02.1401241301120.10968@chino.kir.corp.google.com>

On 24.01.2014 [13:03:13 -0800], David Rientjes wrote:
> On Fri, 24 Jan 2014, Christoph Lameter wrote:
> 
> > On Fri, 24 Jan 2014, Wanpeng Li wrote:
> > 
> > > >
> > > >diff --git a/mm/slub.c b/mm/slub.c
> > > >index 545a170..a1c6040 100644
> > > >--- a/mm/slub.c
> > > >+++ b/mm/slub.c
> > > >@@ -1700,6 +1700,9 @@ static void *get_partial(struct kmem_cache *s, gfp_t flags, int node,
> > > > 	void *object;
> > > >	int searchnode = (node == NUMA_NO_NODE) ? numa_node_id() : node;
> > 
> > This needs to be numa_mem_id() and numa_mem_id would need to be
> > consistently used.
> > 
> > > >
> > > >+	if (!node_present_pages(searchnode))
> > > >+		searchnode = numa_mem_id();
> > 
> > Probably wont need that?
> > 
> 
> I think the problem is a memoryless node being used for kmalloc_node() so 
> we need to decide where to enforce node_present_pages().  __slab_alloc() 
> seems like the best candidate when !node_match().
> 

Yep, I'm looking through callers and such right now and came to a
similar conclusion. I should have a patch soon.

Thanks,
Nish

^ permalink raw reply

* Re: [PATCH] slub: Don't throw away partial remote slabs if there is no local memory
From: David Rientjes @ 2014-01-24 21:03 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: nacc, penberg, linux-mm, Han Pingtian, paulus, Anton Blanchard,
	mpm, Joonsoo Kim, linuxppc-dev, Wanpeng Li
In-Reply-To: <alpine.DEB.2.10.1401240946530.12886@nuc>

On Fri, 24 Jan 2014, Christoph Lameter wrote:

> On Fri, 24 Jan 2014, Wanpeng Li wrote:
> 
> > >
> > >diff --git a/mm/slub.c b/mm/slub.c
> > >index 545a170..a1c6040 100644
> > >--- a/mm/slub.c
> > >+++ b/mm/slub.c
> > >@@ -1700,6 +1700,9 @@ static void *get_partial(struct kmem_cache *s, gfp_t flags, int node,
> > > 	void *object;
> > >	int searchnode = (node == NUMA_NO_NODE) ? numa_node_id() : node;
> 
> This needs to be numa_mem_id() and numa_mem_id would need to be
> consistently used.
> 
> > >
> > >+	if (!node_present_pages(searchnode))
> > >+		searchnode = numa_mem_id();
> 
> Probably wont need that?
> 

I think the problem is a memoryless node being used for kmalloc_node() so 
we need to decide where to enforce node_present_pages().  __slab_alloc() 
seems like the best candidate when !node_match().

^ permalink raw reply

* Re: [PATCH] slub: Don't throw away partial remote slabs if there is no local memory
From: Christoph Lameter @ 2014-01-24 15:50 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: nacc, David Rientjes, penberg, linux-mm, Han Pingtian, paulus,
	Anton Blanchard, mpm, Joonsoo Kim, linuxppc-dev
In-Reply-To: <52e1da8f.86f7440a.120f.25f3SMTPIN_ADDED_BROKEN@mx.google.com>

On Fri, 24 Jan 2014, Wanpeng Li wrote:

> >
> >diff --git a/mm/slub.c b/mm/slub.c
> >index 545a170..a1c6040 100644
> >--- a/mm/slub.c
> >+++ b/mm/slub.c
> >@@ -1700,6 +1700,9 @@ static void *get_partial(struct kmem_cache *s, gfp_t flags, int node,
> > 	void *object;
> >	int searchnode = (node == NUMA_NO_NODE) ? numa_node_id() : node;

This needs to be numa_mem_id() and numa_mem_id would need to be
consistently used.

> >
> >+	if (!node_present_pages(searchnode))
> >+		searchnode = numa_mem_id();

Probably wont need that?

> >+
> >	object = get_partial_node(s, get_node(s, searchnode), c, flags);
> >	if (object || node != NUMA_NO_NODE)
> >		return object;
> >
>
> The bug still can't be fixed w/ this patch.

Some more detail would be good. If memory is requested from a particular
node then it would be best to use one that has memory. Callers also may
have used numa_node_id() and that also would need to be fixed.

^ permalink raw reply

* [PATCH] rtc: ds3232 make it possible to share an irq
From: Bharat Bhushan @ 2014-01-24 11:06 UTC (permalink / raw)
  To: rtc-linux, a.zummo, scottwood, linuxppc-dev; +Cc: Bharat Bhushan

It's possible to have RTC irq shared with other device (e.g.
t4240qds board shares ds3232irq with phy one).  Handle this in
driver.

Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
---
 drivers/rtc/rtc-ds3232.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/rtc/rtc-ds3232.c b/drivers/rtc/rtc-ds3232.c
index b83bb5a..598837b 100644
--- a/drivers/rtc/rtc-ds3232.c
+++ b/drivers/rtc/rtc-ds3232.c
@@ -419,8 +419,8 @@ static int ds3232_probe(struct i2c_client *client,
 	}
 
 	if (client->irq >= 0) {
-		ret = devm_request_irq(&client->dev, client->irq, ds3232_irq, 0,
-				 "ds3232", client);
+		ret = devm_request_irq(&client->dev, client->irq, ds3232_irq,
+				       IRQF_SHARED, "ds3232", client);
 		if (ret) {
 			dev_err(&client->dev, "unable to request IRQ\n");
 			return ret;
-- 
1.7.0.4

^ permalink raw reply related

* Re: [PATCH V2] cpuidle/governors: Fix logic in selection of idle states
From: Preeti U Murthy @ 2014-01-24 10:21 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: linux-pm, rjw, linux-kernel, srivatsa.bhat, paulmck, linuxppc-dev,
	tuukka.tikkanen
In-Reply-To: <52E22D8A.5000305@linaro.org>

On 01/24/2014 02:38 PM, Daniel Lezcano wrote:
> On 01/23/2014 12:15 PM, Preeti U Murthy wrote:
>> Hi Daniel,
>>
>> Thank you for the review.
>>
>> On 01/22/2014 01:59 PM, Daniel Lezcano wrote:
>>> On 01/17/2014 05:33 AM, Preeti U Murthy wrote:
>>>>
>>>> diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
>>>> index a55e68f..831b664 100644
>>>> --- a/drivers/cpuidle/cpuidle.c
>>>> +++ b/drivers/cpuidle/cpuidle.c
>>>> @@ -131,8 +131,9 @@ int cpuidle_idle_call(void)
>>>>
>>>>        /* ask the governor for the next state */
>>>>        next_state = cpuidle_curr_governor->select(drv, dev);
>>>> +
>>>> +    dev->last_residency = 0;
>>>>        if (need_resched()) {
>>>> -        dev->last_residency = 0;
>>>
>>> Why do you need to do this change ? ^^^^^
>>
>> So as to keep the last_residency consistent with the case that this patch
>> addresses: where no idle state could be selected due to strict latency
>> requirements or disabled states and hence the cpu exits without entering
>> idle. Else it would contain the stale value from the previous idle state
>> entry.
>>
>> But coming to think of it dev->last_residency is not used when the last
>> entered idle state index is -1.
>>
>> So I have reverted this change as well in the revised patch below along
>> with mentioning the reason in the last paragraph of the changelog.
>>
>>>
>>>>            /* give the governor an opportunity to reflect on the
>>>> outcome */
>>>>            if (cpuidle_curr_governor->reflect)
>>>>                cpuidle_curr_governor->reflect(dev, next_state);
>>>> @@ -140,6 +141,18 @@ int cpuidle_idle_call(void)
>>>>            return 0;
>>>>        }
>>>>
>>>> +    /* Unlike in the need_resched() case, we return here because the
>>>> +     * governor did not find a suitable idle state. However idle is
>>>> still
>>>> +     * in progress as we are not asked to reschedule. Hence we return
>>>> +     * without enabling interrupts.
>>>
>>> That will lead to a WARN.
>>>
>>>> +     * NOTE: The return code should still be success, since the
>>>> verdict of this
>>>> +     * call is "do not enter any idle state" and not a failed call
>>>> due to
>>>> +     * errors.
>>>> +     */
>>>> +    if (next_state < 0)
>>>> +        return 0;
>>>> +
>>>
>>> Returning from here breaks the symmetry of the trace.
>>
>> I have addressed the above concerns in the patch found below.
>> Does the rest of the patch look sound?
>>
>> Regards
>> Preeti U Murthy
>>
>> ----------------------------------------------------------------------
>>
>> cpuidle/governors: Fix logic in selection of idle states
>>
>> From: Preeti U Murthy <preeti@linux.vnet.ibm.com>
>>
>> The cpuidle governors today are not handling scenarios where no idle
>> state
>> can be chosen. Such scenarios coud arise if the user has disabled all the
>> idle states at runtime or the latency requirement from the cpus is
>> very strict.
>>
>> The menu governor returns 0th index of the idle state table when no other
>> idle state is suitable. This is even when the idle state corresponding
>> to this
>> index is disabled or the latency requirement is strict and the
>> exit_latency
>> of the lowest idle state is also not acceptable. Hence this patch
>> fixes this logic in the menu governor by defaulting to an idle state
>> index
>> of -1 unless any other state is suitable.
>>
>> The ladder governor needs a few more fixes in addition to that
>> required in the
>> menu governor. When the ladder governor decides to demote the idle
>> state of a
>> CPU, it does not check if the lower idle states are enabled. Add this
>> logic
>> in addition to the logic where it chooses an index of -1 if it can
>> neither
>> promote or demote the idle state of a cpu nor can it choose the
>> current idle
>> state.
>>
>> The cpuidle_idle_call() will return back if the governor decides upon not
>> entering any idle state. However it cannot return an error code
>> because all
>> archs have the logic today that if the call to cpuidle_idle_call()
>> fails, it
>> means that the cpuidle driver failed to *function*; for instance due to
>> errors during registration. As a result they end up deciding upon a
>> default idle state on their own, which could very well be a deep idle
>> state.
>> This is incorrect in cases where no idle state is suitable.
>>
>> Besides for the scenario that this patch is addressing, the call actually
>> succeeds. Its just that no idle state is thought to be suitable by the
>> governors.
>> Under such a circumstance return success code without entering any idle
>> state.
>>
>> The consequence of this patch additionally  on the menu governor is
>> that as
>> long as a valid idle state cannot be chosen, the cpuidle statistics
>> that this
>> governor uses to predict the next idle state remain untouched from the
>> last
>> valid idle state. This is because an idle state is not even being
>> predicted
>> in this path, hence there is no point correcting the prediction either.
>>
>> Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
>>
>> Changes from V1:https://lkml.org/lkml/2014/1/14/26
>>
>> 1. Change the return code to success from -EINVAL due to the reason
>> mentioned
>> in the changelog.
>> 2. Add logic that the patch is addressing in the ladder governor as well.
>> 3. Added relevant comments and removed redundant logic as suggested in
>> the
>> above thread.
>> ---
>>   drivers/cpuidle/cpuidle.c          |   15 +++++
>>   drivers/cpuidle/governors/ladder.c |  101
>> ++++++++++++++++++++++++++----------
>>   drivers/cpuidle/governors/menu.c   |    7 +-
>>   3 files changed, 90 insertions(+), 33 deletions(-)
>>
>> diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
>> index a55e68f..19d17e8 100644
>> --- a/drivers/cpuidle/cpuidle.c
>> +++ b/drivers/cpuidle/cpuidle.c
>> @@ -131,8 +131,9 @@ int cpuidle_idle_call(void)
>>
>>       /* ask the governor for the next state */
>>       next_state = cpuidle_curr_governor->select(drv, dev);
>> +
>> +    dev->last_residency = 0;
>>       if (need_resched()) {
> 
> What about if (need_resched() || next_state < 0) ?

Hmm.. I feel we need to distinguish between the need_resched() scenario
and the scenario when no idle state was suitable through the trace
points at-least.

This could help while debugging when we could find situations where
there are no tasks to run, yet the cpu is not entering any idle state.
The traces could help clearly point that no idle state was thought
suitable by the governor. Of course there are many other means to find
this out, but this seems rather straightforward. Hence having the
condition next_state < 0 between trace_cpu_idle*() would be apt IMHO.

Regards
Preeti U Murthy

> 
>> -        dev->last_residency = 0;
>>           /* give the governor an opportunity to reflect on the
>> outcome */
>>           if (cpuidle_curr_governor->reflect)
>>               cpuidle_curr_governor->reflect(dev, next_state);
>> @@ -141,6 +142,16 @@ int cpuidle_idle_call(void)
>>       }
>>
>>       trace_cpu_idle_rcuidle(next_state, dev->cpu);
>> +    /*
>> +     * NOTE: The return code should still be success, since the
>> verdict of
>> +     * this call is "do not enter any idle state". It is not a failed
>> call
>> +     * due to errors.
>> +     */
>> +    if (next_state < 0) {
>> +        entered_state = next_state;
>> +        local_irq_enable();
>> +        goto out;
>> +    }
>>
>>       broadcast = !!(drv->states[next_state].flags &
>> CPUIDLE_FLAG_TIMER_STOP);
>>
>> @@ -156,7 +167,7 @@ int cpuidle_idle_call(void)
>>       if (broadcast)
>>           clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &dev->cpu);
>>
>> -    trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, dev->cpu);
>> +out:    trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, dev->cpu);
>>
>>       /* give the governor an opportunity to reflect on the outcome */
>>       if (cpuidle_curr_governor->reflect)

^ permalink raw reply

* Re: [PATCH V2] cpuidle/governors: Fix logic in selection of idle states
From: Daniel Lezcano @ 2014-01-24  9:08 UTC (permalink / raw)
  To: Preeti U Murthy
  Cc: paulmck, linux-pm, rjw, linux-kernel, srivatsa.bhat, linuxppc-dev,
	tuukka.tikkanen
In-Reply-To: <52E0F9ED.2010506@linux.vnet.ibm.com>

On 01/23/2014 12:15 PM, Preeti U Murthy wrote:
> Hi Daniel,
>
> Thank you for the review.
>
> On 01/22/2014 01:59 PM, Daniel Lezcano wrote:
>> On 01/17/2014 05:33 AM, Preeti U Murthy wrote:
>>>
>>> diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
>>> index a55e68f..831b664 100644
>>> --- a/drivers/cpuidle/cpuidle.c
>>> +++ b/drivers/cpuidle/cpuidle.c
>>> @@ -131,8 +131,9 @@ int cpuidle_idle_call(void)
>>>
>>>        /* ask the governor for the next state */
>>>        next_state = cpuidle_curr_governor->select(drv, dev);
>>> +
>>> +    dev->last_residency = 0;
>>>        if (need_resched()) {
>>> -        dev->last_residency = 0;
>>
>> Why do you need to do this change ? ^^^^^
>
> So as to keep the last_residency consistent with the case that this patch
> addresses: where no idle state could be selected due to strict latency
> requirements or disabled states and hence the cpu exits without entering
> idle. Else it would contain the stale value from the previous idle state
> entry.
>
> But coming to think of it dev->last_residency is not used when the last
> entered idle state index is -1.
>
> So I have reverted this change as well in the revised patch below along
> with mentioning the reason in the last paragraph of the changelog.
>
>>
>>>            /* give the governor an opportunity to reflect on the
>>> outcome */
>>>            if (cpuidle_curr_governor->reflect)
>>>                cpuidle_curr_governor->reflect(dev, next_state);
>>> @@ -140,6 +141,18 @@ int cpuidle_idle_call(void)
>>>            return 0;
>>>        }
>>>
>>> +    /* Unlike in the need_resched() case, we return here because the
>>> +     * governor did not find a suitable idle state. However idle is
>>> still
>>> +     * in progress as we are not asked to reschedule. Hence we return
>>> +     * without enabling interrupts.
>>
>> That will lead to a WARN.
>>
>>> +     * NOTE: The return code should still be success, since the
>>> verdict of this
>>> +     * call is "do not enter any idle state" and not a failed call
>>> due to
>>> +     * errors.
>>> +     */
>>> +    if (next_state < 0)
>>> +        return 0;
>>> +
>>
>> Returning from here breaks the symmetry of the trace.
>
> I have addressed the above concerns in the patch found below.
> Does the rest of the patch look sound?
>
> Regards
> Preeti U Murthy
>
> ----------------------------------------------------------------------
>
> cpuidle/governors: Fix logic in selection of idle states
>
> From: Preeti U Murthy <preeti@linux.vnet.ibm.com>
>
> The cpuidle governors today are not handling scenarios where no idle state
> can be chosen. Such scenarios coud arise if the user has disabled all the
> idle states at runtime or the latency requirement from the cpus is very strict.
>
> The menu governor returns 0th index of the idle state table when no other
> idle state is suitable. This is even when the idle state corresponding to this
> index is disabled or the latency requirement is strict and the exit_latency
> of the lowest idle state is also not acceptable. Hence this patch
> fixes this logic in the menu governor by defaulting to an idle state index
> of -1 unless any other state is suitable.
>
> The ladder governor needs a few more fixes in addition to that required in the
> menu governor. When the ladder governor decides to demote the idle state of a
> CPU, it does not check if the lower idle states are enabled. Add this logic
> in addition to the logic where it chooses an index of -1 if it can neither
> promote or demote the idle state of a cpu nor can it choose the current idle
> state.
>
> The cpuidle_idle_call() will return back if the governor decides upon not
> entering any idle state. However it cannot return an error code because all
> archs have the logic today that if the call to cpuidle_idle_call() fails, it
> means that the cpuidle driver failed to *function*; for instance due to
> errors during registration. As a result they end up deciding upon a
> default idle state on their own, which could very well be a deep idle state.
> This is incorrect in cases where no idle state is suitable.
>
> Besides for the scenario that this patch is addressing, the call actually
> succeeds. Its just that no idle state is thought to be suitable by the governors.
> Under such a circumstance return success code without entering any idle
> state.
>
> The consequence of this patch additionally  on the menu governor is that as
> long as a valid idle state cannot be chosen, the cpuidle statistics that this
> governor uses to predict the next idle state remain untouched from the last
> valid idle state. This is because an idle state is not even being predicted
> in this path, hence there is no point correcting the prediction either.
>
> Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
>
> Changes from V1:https://lkml.org/lkml/2014/1/14/26
>
> 1. Change the return code to success from -EINVAL due to the reason mentioned
> in the changelog.
> 2. Add logic that the patch is addressing in the ladder governor as well.
> 3. Added relevant comments and removed redundant logic as suggested in the
> above thread.
> ---
>   drivers/cpuidle/cpuidle.c          |   15 +++++
>   drivers/cpuidle/governors/ladder.c |  101 ++++++++++++++++++++++++++----------
>   drivers/cpuidle/governors/menu.c   |    7 +-
>   3 files changed, 90 insertions(+), 33 deletions(-)
>
> diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
> index a55e68f..19d17e8 100644
> --- a/drivers/cpuidle/cpuidle.c
> +++ b/drivers/cpuidle/cpuidle.c
> @@ -131,8 +131,9 @@ int cpuidle_idle_call(void)
>
>   	/* ask the governor for the next state */
>   	next_state = cpuidle_curr_governor->select(drv, dev);
> +
> +	dev->last_residency = 0;
>   	if (need_resched()) {

What about if (need_resched() || next_state < 0) ?

> -		dev->last_residency = 0;
>   		/* give the governor an opportunity to reflect on the outcome */
>   		if (cpuidle_curr_governor->reflect)
>   			cpuidle_curr_governor->reflect(dev, next_state);
> @@ -141,6 +142,16 @@ int cpuidle_idle_call(void)
>   	}
>
>   	trace_cpu_idle_rcuidle(next_state, dev->cpu);
> +	/*
> +	 * NOTE: The return code should still be success, since the verdict of
> +	 * this call is "do not enter any idle state". It is not a failed call
> +	 * due to errors.
> +	 */
> +	if (next_state < 0) {
> +		entered_state = next_state;
> +		local_irq_enable();
> +		goto out;
> +	}
>
>   	broadcast = !!(drv->states[next_state].flags & CPUIDLE_FLAG_TIMER_STOP);
>
> @@ -156,7 +167,7 @@ int cpuidle_idle_call(void)
>   	if (broadcast)
>   		clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &dev->cpu);
>
> -	trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, dev->cpu);
> +out:	trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, dev->cpu);
>
>   	/* give the governor an opportunity to reflect on the outcome */
>   	if (cpuidle_curr_governor->reflect)
> diff --git a/drivers/cpuidle/governors/ladder.c b/drivers/cpuidle/governors/ladder.c
> index 9f08e8c..7e93aaa 100644
> --- a/drivers/cpuidle/governors/ladder.c
> +++ b/drivers/cpuidle/governors/ladder.c
> @@ -58,6 +58,36 @@ static inline void ladder_do_selection(struct ladder_device *ldev,
>   	ldev->last_state_idx = new_idx;
>   }
>
> +static int can_promote(struct ladder_device *ldev, int last_idx,
> +				int last_residency)
> +{
> +	struct ladder_device_state *last_state;
> +
> +	last_state = &ldev->states[last_idx];
> +	if (last_residency > last_state->threshold.promotion_time) {
> +		last_state->stats.promotion_count++;
> +		last_state->stats.demotion_count = 0;
> +		if (last_state->stats.promotion_count >= last_state->threshold.promotion_count)
> +			return 1;
> +	}
> +	return 0;
> +}
> +
> +static int can_demote(struct ladder_device *ldev, int last_idx,
> +			int last_residency)
> +{
> +	struct ladder_device_state *last_state;
> +
> +	last_state = &ldev->states[last_idx];
> +	if (last_residency < last_state->threshold.demotion_time) {
> +		last_state->stats.demotion_count++;
> +		last_state->stats.promotion_count = 0;
> +		if (last_state->stats.demotion_count >= last_state->threshold.demotion_count)
> +			return 1;
> +	}
> +	return 0;
> +}
> +
>   /**
>    * ladder_select_state - selects the next state to enter
>    * @drv: cpuidle driver
> @@ -73,29 +103,33 @@ static int ladder_select_state(struct cpuidle_driver *drv,
>
>   	/* Special case when user has set very strict latency requirement */
>   	if (unlikely(latency_req == 0)) {
> -		ladder_do_selection(ldev, last_idx, 0);
> -		return 0;
> +		if (last_idx >= 0)
> +			ladder_do_selection(ldev, last_idx, -1);
> +		goto out;
>   	}
>
> -	last_state = &ldev->states[last_idx];
> -
> -	if (drv->states[last_idx].flags & CPUIDLE_FLAG_TIME_VALID) {
> -		last_residency = cpuidle_get_last_residency(dev) - \
> -					 drv->states[last_idx].exit_latency;
> +	if (last_idx >= 0) {
> +		if (drv->states[last_idx].flags & CPUIDLE_FLAG_TIME_VALID) {
> +			last_residency = cpuidle_get_last_residency(dev) - \
> +						 drv->states[last_idx].exit_latency;
> +		} else {
> +			last_state = &ldev->states[last_idx];
> +			last_residency = last_state->threshold.promotion_time + 1;
> +		}
>   	}
> -	else
> -		last_residency = last_state->threshold.promotion_time + 1;
>
>   	/* consider promotion */
>   	if (last_idx < drv->state_count - 1 &&
>   	    !drv->states[last_idx + 1].disabled &&
>   	    !dev->states_usage[last_idx + 1].disable &&
> -	    last_residency > last_state->threshold.promotion_time &&
>   	    drv->states[last_idx + 1].exit_latency <= latency_req) {
> -		last_state->stats.promotion_count++;
> -		last_state->stats.demotion_count = 0;
> -		if (last_state->stats.promotion_count >= last_state->threshold.promotion_count) {
> -			ladder_do_selection(ldev, last_idx, last_idx + 1);
> +		if (last_idx >= 0) {
> +			if (can_promote(ldev, last_idx, last_residency)) {
> +				ladder_do_selection(ldev, last_idx, last_idx + 1);
> +				return last_idx + 1;
> +			}
> +		} else {
> +			ldev->last_state_idx = last_idx + 1;
>   			return last_idx + 1;
>   		}
>   	}
> @@ -107,26 +141,36 @@ static int ladder_select_state(struct cpuidle_driver *drv,
>   	    drv->states[last_idx].exit_latency > latency_req)) {
>   		int i;
>
> -		for (i = last_idx - 1; i > CPUIDLE_DRIVER_STATE_START; i--) {
> -			if (drv->states[i].exit_latency <= latency_req)
> +		for (i = last_idx - 1; i >= CPUIDLE_DRIVER_STATE_START; i--) {
> +			if (drv->states[i].exit_latency <= latency_req &&
> +				!(drv->states[i].disabled || dev->states_usage[i].disable))
>   				break;
>   		}
> -		ladder_do_selection(ldev, last_idx, i);
> -		return i;
> +		if (i >= 0) {
> +			ladder_do_selection(ldev, last_idx, i);
> +			return i;
> +		}
> +		goto out;
>   	}
>
> -	if (last_idx > CPUIDLE_DRIVER_STATE_START &&
> -	    last_residency < last_state->threshold.demotion_time) {
> -		last_state->stats.demotion_count++;
> -		last_state->stats.promotion_count = 0;
> -		if (last_state->stats.demotion_count >= last_state->threshold.demotion_count) {
> -			ladder_do_selection(ldev, last_idx, last_idx - 1);
> -			return last_idx - 1;
> +	if (last_idx > CPUIDLE_DRIVER_STATE_START) {
> +		int i = last_idx - 1;
> +
> +		if (can_demote(ldev, last_idx, last_residency) &&
> +			!(drv->states[i].disabled || dev->states_usage[i].disable)) {
> +			ladder_do_selection(ldev, last_idx, i);
> +			return i;
>   		}
> +		/* We come here when the last_idx is still a suitable idle state,
> +		 * just that  promotion or demotion is not ideal.
> +		 */
> +		ldev->last_state_idx = last_idx;
> +		return last_idx;
>   	}
>
> -	/* otherwise remain at the current state */
> -	return last_idx;
> +	/* we come here if no idle state is suitable */
> +out:	ldev->last_state_idx = -1;
> +	return ldev->last_state_idx;
>   }
>
>   /**
> @@ -166,7 +210,8 @@ static int ladder_enable_device(struct cpuidle_driver *drv,
>   /**
>    * ladder_reflect - update the correct last_state_idx
>    * @dev: the CPU
> - * @index: the index of actual state entered
> + * @index: the index of actual state entered or -1 if no idle state is
> + * suitable.
>    */
>   static void ladder_reflect(struct cpuidle_device *dev, int index)
>   {
> diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
> index cf7f2f0..e9f17ce 100644
> --- a/drivers/cpuidle/governors/menu.c
> +++ b/drivers/cpuidle/governors/menu.c
> @@ -297,12 +297,12 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
>   		data->needs_update = 0;
>   	}
>
> -	data->last_state_idx = 0;
> +	data->last_state_idx = -1;
>   	data->exit_us = 0;
>
>   	/* Special case when user has set very strict latency requirement */
>   	if (unlikely(latency_req == 0))
> -		return 0;
> +		return data->last_state_idx;
>
>   	/* determine the expected residency time, round up */
>   	t = ktime_to_timespec(tick_nohz_get_sleep_length());
> @@ -368,7 +368,8 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
>   /**
>    * menu_reflect - records that data structures need update
>    * @dev: the CPU
> - * @index: the index of actual entered state
> + * @index: the index of actual entered state or -1 if no idle state is
> + * suitable.
>    *
>    * NOTE: it's important to be fast here because this operation will add to
>    *       the overall exit latency.
>
>


-- 
  <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

^ permalink raw reply

* [PATCH V2 2/2] tick/cpuidle: Initialize hrtimer mode of broadcast
From: Preeti U Murthy @ 2014-01-24  6:58 UTC (permalink / raw)
  To: daniel.lezcano, peterz, fweisbec, galak, paul.gortmaker, paulus,
	mingo, mikey, shangw, rafael.j.wysocki, agraf, benh, paulmck,
	arnd, linux-pm, rostedt, michael, john.stultz, anton, tglx,
	chenhui.zhao, deepthi, r58472, geoff, linux-kernel, srivatsa.bhat,
	schwidefsky, svaidy, linuxppc-dev
In-Reply-To: <20140124065501.17564.39363.stgit@preeti.in.ibm.com>

From: Thomas Gleixner <tglx@linutronix.de>

On some architectures, in certain CPU deep idle states the local timers stop.
An external clock device is used to wakeup these CPUs. The kernel support for the
wakeup of these CPUs is provided by the tick broadcast framework by using the
external clock device as the wakeup source.

However not all implementations of architectures provide such an external
clock device. This patch includes support in the broadcast framework to handle
the wakeup of the CPUs in deep idle states on such systems by queuing a hrtimer
on one of the CPUs, which is meant to handle the wakeup of CPUs in deep idle states.

This patchset introduces a pseudo clock device which can be registered by the
archs as tick_broadcast_device in the absence of a real external clock
device. Once registered, the broadcast framework will work as is for these
architectures as long as the archs take care of the BROADCAST_ENTER
notification failing for one of the CPUs. This CPU is made the stand by CPU to
handle wakeup of the CPUs in deep idle and it *must not enter deep idle states*.

The CPU with the earliest wakeup is chosen to be this CPU. Hence this way the
stand by CPU dynamically moves around and so does the hrtimer which is queued
to trigger at the next earliest wakeup time. This is consistent with the case where
an external clock device is present. The smp affinity of this clock device is
set to the CPU with the earliest wakeup. This patchset handles the hotplug of
the stand by CPU as well by moving the hrtimer on to the CPU handling the CPU_DEAD
notification.

Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
[Added Changelog and code to handle reprogramming of hrtimer]
---

 include/linux/clockchips.h           |    9 +++
 kernel/time/Makefile                 |    2 -
 kernel/time/tick-broadcast-hrtimer.c |  102 ++++++++++++++++++++++++++++++++++
 kernel/time/tick-broadcast.c         |   45 +++++++++++++++
 4 files changed, 156 insertions(+), 2 deletions(-)
 create mode 100644 kernel/time/tick-broadcast-hrtimer.c

diff --git a/include/linux/clockchips.h b/include/linux/clockchips.h
index ac81b56..2293025 100644
--- a/include/linux/clockchips.h
+++ b/include/linux/clockchips.h
@@ -62,6 +62,11 @@ enum clock_event_mode {
 #define CLOCK_EVT_FEAT_DYNIRQ		0x000020
 #define CLOCK_EVT_FEAT_PERCPU		0x000040
 
+/*
+ * Clockevent device is based on a hrtimer for broadcast
+ */
+#define CLOCK_EVT_FEAT_HRTIMER		0x000080
+
 /**
  * struct clock_event_device - clock event device descriptor
  * @event_handler:	Assigned by the framework to be called by the low
@@ -83,6 +88,7 @@ enum clock_event_mode {
  * @name:		ptr to clock event name
  * @rating:		variable to rate clock event devices
  * @irq:		IRQ number (only for non CPU local devices)
+ * @bound_on:		Bound on CPU
  * @cpumask:		cpumask to indicate for which CPUs this device works
  * @list:		list head for the management code
  * @owner:		module reference
@@ -113,6 +119,7 @@ struct clock_event_device {
 	const char		*name;
 	int			rating;
 	int			irq;
+	int			bound_on;
 	const struct cpumask	*cpumask;
 	struct list_head	list;
 	struct module		*owner;
@@ -180,9 +187,11 @@ extern int tick_receive_broadcast(void);
 #endif
 
 #if defined(CONFIG_GENERIC_CLOCKEVENTS_BROADCAST) && defined(CONFIG_TICK_ONESHOT)
+extern void tick_setup_hrtimer_broadcast(void);
 extern int tick_check_broadcast_expired(void);
 #else
 static inline int tick_check_broadcast_expired(void) { return 0; }
+static void tick_setup_hrtimer_broadcast(void) {};
 #endif
 
 #ifdef CONFIG_GENERIC_CLOCKEVENTS
diff --git a/kernel/time/Makefile b/kernel/time/Makefile
index 9250130..06151ef 100644
--- a/kernel/time/Makefile
+++ b/kernel/time/Makefile
@@ -3,7 +3,7 @@ obj-y += timeconv.o posix-clock.o alarmtimer.o
 
 obj-$(CONFIG_GENERIC_CLOCKEVENTS_BUILD)		+= clockevents.o
 obj-$(CONFIG_GENERIC_CLOCKEVENTS)		+= tick-common.o
-obj-$(CONFIG_GENERIC_CLOCKEVENTS_BROADCAST)	+= tick-broadcast.o
+obj-$(CONFIG_GENERIC_CLOCKEVENTS_BROADCAST)	+= tick-broadcast.o tick-broadcast-hrtimer.o
 obj-$(CONFIG_GENERIC_SCHED_CLOCK)		+= sched_clock.o
 obj-$(CONFIG_TICK_ONESHOT)			+= tick-oneshot.o
 obj-$(CONFIG_TICK_ONESHOT)			+= tick-sched.o
diff --git a/kernel/time/tick-broadcast-hrtimer.c b/kernel/time/tick-broadcast-hrtimer.c
new file mode 100644
index 0000000..23f4925
--- /dev/null
+++ b/kernel/time/tick-broadcast-hrtimer.c
@@ -0,0 +1,102 @@
+/*
+ * linux/kernel/time/tick-broadcast-hrtimer.c
+ * This file emulates a local clock event device
+ * via a pseudo clock device.
+ */
+#include <linux/cpu.h>
+#include <linux/err.h>
+#include <linux/hrtimer.h>
+#include <linux/interrupt.h>
+#include <linux/percpu.h>
+#include <linux/profile.h>
+#include <linux/clockchips.h>
+#include <linux/sched.h>
+#include <linux/smp.h>
+#include <linux/module.h>
+
+#include "tick-internal.h"
+
+static struct hrtimer bctimer;
+
+static void bc_set_mode(enum clock_event_mode mode,
+			struct clock_event_device *bc)
+{
+	switch (mode) {
+	case CLOCK_EVT_MODE_SHUTDOWN:
+		/*
+		 * Note, we cannot cancel the timer here as we might
+		 * run into the following live lock scenario:
+		 *
+		 * cpu 0		cpu1
+		 * lock(broadcast_lock);
+		 *			hrtimer_interrupt()
+		 *			bc_handler()
+		 *			   tick_handle_oneshot_broadcast();
+		 *			    lock(broadcast_lock);
+		 * hrtimer_cancel()
+		 *  wait_for_callback()
+		 */
+		hrtimer_try_to_cancel(&bctimer);
+		break;
+	default:
+		break;
+	}
+}
+
+/*
+ * This is called from the guts of the broadcast code when the cpu
+ * which is about to enter idle has the earliest broadcast timer event.
+ */
+static int bc_set_next(ktime_t expires, struct clock_event_device *bc)
+{
+	ktime_t now, interval;
+	/*
+	 * We try to cancel the timer first. If the callback is on
+	 * flight on some other cpu then we let it handle it. If we
+	 * were able to cancel the timer nothing can rearm it as we
+	 * own broadcast_lock.
+	 *
+	 * However if we are called from the hrtimer interrupt handler
+	 * itself, reprogram it.
+	 */
+	if (hrtimer_try_to_cancel(&bctimer) >= 0) {
+		hrtimer_start(&bctimer, expires, HRTIMER_MODE_ABS_PINNED);
+		/* Bind the "device" to the cpu */
+		bc->bound_on = smp_processor_id();
+	} else if (bc->bound_on == smp_processor_id()) {
+		now = ktime_get();
+		interval = ktime_sub(expires, now);
+		hrtimer_forward_now(&bctimer, interval);
+	}
+	return 0;
+}
+
+static struct clock_event_device ce_broadcast_hrtimer = {
+	.set_mode		= bc_set_mode,
+	.set_next_ktime		= bc_set_next,
+	.features		= CLOCK_EVT_FEAT_ONESHOT |
+				  CLOCK_EVT_FEAT_KTIME |
+				  CLOCK_EVT_FEAT_HRTIMER,
+	.rating			= 0,
+	.bound_on		= -1,
+	.min_delta_ns		= 1,
+	.max_delta_ns		= KTIME_MAX,
+	.min_delta_ticks	= 1,
+	.max_delta_ticks	= KTIME_MAX,
+	.mult			= 1,
+	.shift			= 0,
+	.cpumask		= cpu_all_mask,
+};
+
+static enum hrtimer_restart bc_handler(struct hrtimer *t)
+{
+	ce_broadcast_hrtimer.event_handler(&ce_broadcast_hrtimer);
+	return HRTIMER_RESTART;
+}
+
+void tick_setup_hrtimer_broadcast(void)
+{
+	hrtimer_init(&bctimer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
+	bctimer.function = bc_handler;
+	clockevents_register_device(&ce_broadcast_hrtimer);
+}
diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c
index be00692..7ed7e69 100644
--- a/kernel/time/tick-broadcast.c
+++ b/kernel/time/tick-broadcast.c
@@ -630,6 +630,42 @@ again:
 	raw_spin_unlock(&tick_broadcast_lock);
 }
 
+static int broadcast_needs_cpu(struct clock_event_device *bc, int cpu)
+{
+	if (!(bc->features & CLOCK_EVT_FEAT_HRTIMER))
+		return 0;
+	if (bc->next_event.tv64 == KTIME_MAX)
+		return 0;
+	return bc->bound_on == cpu ? -EBUSY : 0;
+}
+
+static void broadcast_shutdown_local(struct clock_event_device *bc,
+				     struct clock_event_device *dev)
+{
+	/*
+	 * For hrtimer based broadcasting we cannot shutdown the cpu
+	 * local device if our own event is the first one to expire or
+	 * if we own the broadcast timer.
+	 */
+	if (bc->features & CLOCK_EVT_FEAT_HRTIMER) {
+		if (broadcast_needs_cpu(bc, smp_processor_id()))
+			return;
+		if (dev->next_event.tv64 < bc->next_event.tv64)
+			return;
+	}
+	clockevents_set_mode(dev, CLOCK_EVT_MODE_SHUTDOWN);
+}
+
+static void broadcast_move_bc(int deadcpu)
+{
+	struct clock_event_device *bc = tick_broadcast_device.evtdev;
+
+	if (!bc || !broadcast_needs_cpu(bc, deadcpu))
+		return;
+	/* This moves the broadcast assignment to this cpu */
+	clockevents_program_event(bc, bc->next_event, 1);
+}
+
 /*
  * Powerstate information: The system enters/leaves a state, where
  * affected devices might stop
@@ -667,7 +703,7 @@ int tick_broadcast_oneshot_control(unsigned long reason)
 	if (reason == CLOCK_EVT_NOTIFY_BROADCAST_ENTER) {
 		if (!cpumask_test_and_set_cpu(cpu, tick_broadcast_oneshot_mask)) {
 			WARN_ON_ONCE(cpumask_test_cpu(cpu, tick_broadcast_pending_mask));
-			clockevents_set_mode(dev, CLOCK_EVT_MODE_SHUTDOWN);
+			broadcast_shutdown_local(bc, dev);
 			/*
 			 * We only reprogram the broadcast timer if we
 			 * did not mark ourself in the force mask and
@@ -680,6 +716,11 @@ int tick_broadcast_oneshot_control(unsigned long reason)
 			    dev->next_event.tv64 < bc->next_event.tv64)
 				tick_broadcast_set_event(bc, cpu, dev->next_event, 1);
 		}
+		/*
+		 * If the current CPU owns the hrtimer broadcast
+		 * mechanism, it cannot go deep idle.
+		 */
+		ret = broadcast_needs_cpu(bc, cpu);
 	} else {
 		if (cpumask_test_and_clear_cpu(cpu, tick_broadcast_oneshot_mask)) {
 			clockevents_set_mode(dev, CLOCK_EVT_MODE_ONESHOT);
@@ -853,6 +894,8 @@ void tick_shutdown_broadcast_oneshot(unsigned int *cpup)
 	cpumask_clear_cpu(cpu, tick_broadcast_pending_mask);
 	cpumask_clear_cpu(cpu, tick_broadcast_force_mask);
 
+	broadcast_move_bc(cpu);
+
 	raw_spin_unlock_irqrestore(&tick_broadcast_lock, flags);
 }
 

^ permalink raw reply related

* [PATCH V2 1/2] time: Change the return type of clockevents_notify() to integer
From: Preeti U Murthy @ 2014-01-24  6:57 UTC (permalink / raw)
  To: daniel.lezcano, peterz, fweisbec, galak, paul.gortmaker, paulus,
	mingo, mikey, shangw, rafael.j.wysocki, agraf, benh, paulmck,
	arnd, linux-pm, rostedt, michael, john.stultz, anton, tglx,
	chenhui.zhao, deepthi, r58472, geoff, linux-kernel, srivatsa.bhat,
	schwidefsky, svaidy, linuxppc-dev
In-Reply-To: <20140124065501.17564.39363.stgit@preeti.in.ibm.com>

The broadcast framework can potentially be made use of by archs which do not have an
external clock device as well. Then, it is required that one of the CPUs need
to handle the broadcasting of wakeup IPIs to the CPUs in deep idle. As a
result its local timers should remain functional all the time. For such
a CPU, the BROADCAST_ENTER notification has to fail indicating that its clock
device cannot be shutdown. To make way for this support, change the return
type of tick_broadcast_oneshot_control() and hence clockevents_notify() to
indicate such scenarios.

Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
---

 include/linux/clockchips.h   |    6 +++---
 kernel/time/clockevents.c    |    8 +++++---
 kernel/time/tick-broadcast.c |    6 ++++--
 kernel/time/tick-internal.h  |    6 +++---
 4 files changed, 15 insertions(+), 11 deletions(-)

diff --git a/include/linux/clockchips.h b/include/linux/clockchips.h
index 493aa02..ac81b56 100644
--- a/include/linux/clockchips.h
+++ b/include/linux/clockchips.h
@@ -186,9 +186,9 @@ static inline int tick_check_broadcast_expired(void) { return 0; }
 #endif
 
 #ifdef CONFIG_GENERIC_CLOCKEVENTS
-extern void clockevents_notify(unsigned long reason, void *arg);
+extern int clockevents_notify(unsigned long reason, void *arg);
 #else
-static inline void clockevents_notify(unsigned long reason, void *arg) {}
+static inline int clockevents_notify(unsigned long reason, void *arg) {}
 #endif
 
 #else /* CONFIG_GENERIC_CLOCKEVENTS_BUILD */
@@ -196,7 +196,7 @@ static inline void clockevents_notify(unsigned long reason, void *arg) {}
 static inline void clockevents_suspend(void) {}
 static inline void clockevents_resume(void) {}
 
-static inline void clockevents_notify(unsigned long reason, void *arg) {}
+static inline int clockevents_notify(unsigned long reason, void *arg) {}
 static inline int tick_check_broadcast_expired(void) { return 0; }
 
 #endif
diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
index 086ad60..79b8685 100644
--- a/kernel/time/clockevents.c
+++ b/kernel/time/clockevents.c
@@ -524,12 +524,13 @@ void clockevents_resume(void)
 #ifdef CONFIG_GENERIC_CLOCKEVENTS
 /**
  * clockevents_notify - notification about relevant events
+ * Returns 0 on success, any other value on error
  */
-void clockevents_notify(unsigned long reason, void *arg)
+int clockevents_notify(unsigned long reason, void *arg)
 {
 	struct clock_event_device *dev, *tmp;
 	unsigned long flags;
-	int cpu;
+	int cpu, ret = 0;
 
 	raw_spin_lock_irqsave(&clockevents_lock, flags);
 
@@ -542,7 +543,7 @@ void clockevents_notify(unsigned long reason, void *arg)
 
 	case CLOCK_EVT_NOTIFY_BROADCAST_ENTER:
 	case CLOCK_EVT_NOTIFY_BROADCAST_EXIT:
-		tick_broadcast_oneshot_control(reason);
+		ret = tick_broadcast_oneshot_control(reason);
 		break;
 
 	case CLOCK_EVT_NOTIFY_CPU_DYING:
@@ -585,6 +586,7 @@ void clockevents_notify(unsigned long reason, void *arg)
 		break;
 	}
 	raw_spin_unlock_irqrestore(&clockevents_lock, flags);
+	return ret;
 }
 EXPORT_SYMBOL_GPL(clockevents_notify);
 
diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c
index 9532690..be00692 100644
--- a/kernel/time/tick-broadcast.c
+++ b/kernel/time/tick-broadcast.c
@@ -633,14 +633,15 @@ again:
 /*
  * Powerstate information: The system enters/leaves a state, where
  * affected devices might stop
+ * Returns 0 on success, -EBUSY if the cpu is used to broadcast wakeups.
  */
-void tick_broadcast_oneshot_control(unsigned long reason)
+int tick_broadcast_oneshot_control(unsigned long reason)
 {
 	struct clock_event_device *bc, *dev;
 	struct tick_device *td;
 	unsigned long flags;
 	ktime_t now;
-	int cpu;
+	int cpu, ret = 0;
 
 	/*
 	 * Periodic mode does not care about the enter/exit of power
@@ -746,6 +747,7 @@ void tick_broadcast_oneshot_control(unsigned long reason)
 	}
 out:
 	raw_spin_unlock_irqrestore(&tick_broadcast_lock, flags);
+	return ret;
 }
 
 /*
diff --git a/kernel/time/tick-internal.h b/kernel/time/tick-internal.h
index 18e71f7..164465c 100644
--- a/kernel/time/tick-internal.h
+++ b/kernel/time/tick-internal.h
@@ -46,7 +46,7 @@ extern int tick_switch_to_oneshot(void (*handler)(struct clock_event_device *));
 extern void tick_resume_oneshot(void);
 # ifdef CONFIG_GENERIC_CLOCKEVENTS_BROADCAST
 extern void tick_broadcast_setup_oneshot(struct clock_event_device *bc);
-extern void tick_broadcast_oneshot_control(unsigned long reason);
+extern int tick_broadcast_oneshot_control(unsigned long reason);
 extern void tick_broadcast_switch_to_oneshot(void);
 extern void tick_shutdown_broadcast_oneshot(unsigned int *cpup);
 extern int tick_resume_broadcast_oneshot(struct clock_event_device *bc);
@@ -58,7 +58,7 @@ static inline void tick_broadcast_setup_oneshot(struct clock_event_device *bc)
 {
 	BUG();
 }
-static inline void tick_broadcast_oneshot_control(unsigned long reason) { }
+static inline int tick_broadcast_oneshot_control(unsigned long reason) { }
 static inline void tick_broadcast_switch_to_oneshot(void) { }
 static inline void tick_shutdown_broadcast_oneshot(unsigned int *cpup) { }
 static inline int tick_broadcast_oneshot_active(void) { return 0; }
@@ -87,7 +87,7 @@ static inline void tick_broadcast_setup_oneshot(struct clock_event_device *bc)
 {
 	BUG();
 }
-static inline void tick_broadcast_oneshot_control(unsigned long reason) { }
+static inline int tick_broadcast_oneshot_control(unsigned long reason) { }
 static inline void tick_shutdown_broadcast_oneshot(unsigned int *cpup) { }
 static inline int tick_resume_broadcast_oneshot(struct clock_event_device *bc)
 {

^ permalink raw reply related

* [PATCH V2 0/2] time/cpuidle: Support in tick broadcast framework in absence of external clock device
From: Preeti U Murthy @ 2014-01-24  6:57 UTC (permalink / raw)
  To: daniel.lezcano, peterz, fweisbec, galak, paul.gortmaker, paulus,
	mingo, mikey, shangw, rafael.j.wysocki, agraf, benh, paulmck,
	arnd, linux-pm, rostedt, michael, john.stultz, anton, tglx,
	chenhui.zhao, deepthi, r58472, geoff, linux-kernel, srivatsa.bhat,
	schwidefsky, svaidy, linuxppc-dev

This earlier version of this patchset can be found here:
https://lkml.org/lkml/2013/12/12/687. This version has been based on the
discussion in http://www.kernelhub.org/?p=2&msg=399516.

This patchset provides the hooks that the architectures without an external
clock device and deep idle states where the local timers stop can make use of.

Presently we are in need of this support on certain implementations of
PowerPC. This patchset has been used on PowerPC for testing with

---

Preeti U Murthy (1):
      time: Change the return type of clockevents_notify() to integer

Thomas Gleixner (1):
      tick/cpuidle: Initialize hrtimer mode of broadcast


 include/linux/clockchips.h           |   15 ++++-
 kernel/time/Makefile                 |    2 -
 kernel/time/clockevents.c            |    8 ++-
 kernel/time/tick-broadcast-hrtimer.c |  102 ++++++++++++++++++++++++++++++++++
 kernel/time/tick-broadcast.c         |   51 ++++++++++++++++-
 kernel/time/tick-internal.h          |    6 +-
 6 files changed, 171 insertions(+), 13 deletions(-)
 create mode 100644 kernel/time/tick-broadcast-hrtimer.c

-- 

^ permalink raw reply

* Re: [PATCH V5 6/8] time/cpuidle: Support in tick broadcast framework in the absence of external clock device
From: Preeti U Murthy @ 2014-01-24  6:53 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: daniel.lezcano, peterz, fweisbec, paul.gortmaker, paulus, mingo,
	mikey, shangw, rafael.j.wysocki, agraf, paulmck, arnd, linux-pm,
	rostedt, michael, john.stultz, anton, chenhui.zhao, deepthi,
	r58472, geoff, linux-kernel, srivatsa.bhat, schwidefsky,
	linuxppc-dev
In-Reply-To: <alpine.DEB.2.02.1401221042130.4260@ionos.tec.linutronix.de>

Hi Thomas,

The below patch works pretty much as is. I tried this out with deep idle
states on our system. Looking through the code and analysing corner
cases also did not bring out any issues to me. I will send out a patch
V2 of this.

Regards
Preeti U Murthy

On 01/22/2014 06:57 PM, Thomas Gleixner wrote:
> On Wed, 15 Jan 2014, Preeti U Murthy wrote:
>> diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
>> index 086ad60..d61404e 100644
>> --- a/kernel/time/clockevents.c
>> +++ b/kernel/time/clockevents.c
>> @@ -524,12 +524,13 @@ void clockevents_resume(void)
>>  #ifdef CONFIG_GENERIC_CLOCKEVENTS
>>  /**
>>   * clockevents_notify - notification about relevant events
>> + * Returns non zero on error.
>>   */
>> -void clockevents_notify(unsigned long reason, void *arg)
>> +int clockevents_notify(unsigned long reason, void *arg)
>>  {
> 
> The interface change of clockevents_notify wants to be a separate
> patch.
> 
>> diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c
>> index 9532690..1c23912 100644
>> --- a/kernel/time/tick-broadcast.c
>> +++ b/kernel/time/tick-broadcast.c
>> @@ -20,6 +20,7 @@
>>  #include <linux/sched.h>
>>  #include <linux/smp.h>
>>  #include <linux/module.h>
>> +#include <linux/slab.h>
>>  
>>  #include "tick-internal.h"
>>  
>> @@ -35,6 +36,15 @@ static cpumask_var_t tmpmask;
>>  static DEFINE_RAW_SPINLOCK(tick_broadcast_lock);
>>  static int tick_broadcast_force;
>>  
>> +/*
>> + * Helper variables for handling broadcast in the absence of a
>> + * tick_broadcast_device.
>> + * */
>> +static struct hrtimer *bc_hrtimer;
>> +static int bc_cpu = -1;
>> +static ktime_t bc_next_wakeup;
> 
> Why do you need another variable to store the expiry time? The
> broadcast code already knows it and the hrtimer expiry value gives you
> the same information for free.
> 
>> +static int hrtimer_initialized = 0;
> 
> What's the point of this hrtimer_initialized dance? Why not simply
> making the hrtimer static and avoid that all together. Also adding the
> initialization into tick_broadcast_oneshot_available() is
> braindamaged.  Why not adding this to tick_broadcast_init() which is
> the proper place to do?
> 
> Aside of that you are making this hrtimer mode unconditional, which
> might break existing systems which are not aware of the hrtimer
> implications.
> 
> What you really want is a pseudo clock event device which has the
> proper functions for handling the timer and you can register it from
> your architecture code. The broadcast core code needs a few tweaks to
> avoid the shutdown of the cpu local clock event device, but aside of
> that the whole thing just falls into place. So architectures can use
> this if they want and are sure that their low level idle code knows
> about the deep idle preventing return value of
> clockevents_notify(). Once that works you can register the hrtimer
> based broadcast device and a real hardware broadcast device with a
> higher rating. It just works.
> 
> Find an incomplete and nonfunctional concept patch below. It should be
> simple to make it work for real.
> 
> Thanks,
> 
> 	tglx
> ----
> Index: linux-2.6/include/linux/clockchips.h
> ===================================================================
> --- linux-2.6.orig/include/linux/clockchips.h
> +++ linux-2.6/include/linux/clockchips.h
> @@ -62,6 +62,11 @@ enum clock_event_mode {
>  #define CLOCK_EVT_FEAT_DYNIRQ		0x000020
>  #define CLOCK_EVT_FEAT_PERCPU		0x000040
> 
> +/*
> + * Clockevent device is based on a hrtimer for broadcast
> + */
> +#define CLOCK_EVT_FEAT_HRTIMER		0x000080
> +
>  /**
>   * struct clock_event_device - clock event device descriptor
>   * @event_handler:	Assigned by the framework to be called by the low
> @@ -83,6 +88,7 @@ enum clock_event_mode {
>   * @name:		ptr to clock event name
>   * @rating:		variable to rate clock event devices
>   * @irq:		IRQ number (only for non CPU local devices)
> + * @bound_on:		Bound on CPU
>   * @cpumask:		cpumask to indicate for which CPUs this device works
>   * @list:		list head for the management code
>   * @owner:		module reference
> @@ -113,6 +119,7 @@ struct clock_event_device {
>  	const char		*name;
>  	int			rating;
>  	int			irq;
> +	int			bound_on;
>  	const struct cpumask	*cpumask;
>  	struct list_head	list;
>  	struct module		*owner;
> Index: linux-2.6/kernel/time/tick-broadcast-hrtimer.c
> ===================================================================
> --- /dev/null
> +++ linux-2.6/kernel/time/tick-broadcast-hrtimer.c
> @@ -0,0 +1,77 @@
> +
> +static struct hrtimer bctimer;
> +
> +static void bc_set_mode(enum clock_event_mode mode,
> +			struct clock_event_device *bc)
> +{
> +	switch (mode) {
> +	case CLOCK_EVT_MODE_SHUTDOWN:
> +		/*
> +		 * Note, we cannot cancel the timer here as we might
> +		 * run into the following live lock scenario:
> +		 *
> +		 * cpu 0		cpu1
> +		 * lock(broadcast_lock);
> +		 *			hrtimer_interrupt()
> +		 *			bc_handler()
> +		 *			   tick_handle_oneshot_broadcast();
> +		 *			    lock(broadcast_lock);
> +		 * hrtimer_cancel()
> +		 *  wait_for_callback()
> +		 */
> +		hrtimer_try_to_cancel(&bctimer);
> +		break;
> +	default:
> +		break;
> +	}
> +}
> +
> +/*
> + * This is called from the guts of the broadcast code when the cpu
> + * which is about to enter idle has the earliest broadcast timer event.
> + */
> +static int bc_set_next(ktime_t expires, struct clock_event_device *bc)
> +{
> +	/*
> +	 * We try to cancel the timer first. If the callback is on
> +	 * flight on some other cpu then we let it handle it. If we
> +	 * were able to cancel the timer nothing can rearm it as we
> +	 * own broadcast_lock.
> +	 */
> +	if (hrtimer_try_to_cancel(&bctimer) >= 0) {
> +		hrtimer_start(&bctimer, expires, HRTIMER_MODE_ABS_PINNED);
> +		/* Bind the "device" to the cpu */
> +		bc->bound_on = smp_processor_id();
> +	}
> +	return 0;
> +}
> +
> +static struct clock_event_device ce_broadcast_hrtimer = {
> +	.set_mode		= bc_set_mode,
> +	.set_next_ktime		= bc_set_next,
> +	.features		= CLOCK_EVT_FEAT_ONESHOT |
> +				  CLOCK_EVT_FEAT_KTIME |
> +				  CLOCK_EVT_FEAT_HRTIMER,
> +	.rating			= 0,
> +	.bound_on		= -1,
> +	.min_delta_ns		= 1,
> +	.max_delta_ns		= KTIME_MAX,
> +	.min_delta_ticks	= 1,
> +	.max_delta_ticks	= KTIME_MAX,
> +	.mult			= 1,
> +	.shift			= 0,
> +	.cpumask		= cpu_all_mask,
> +};
> +
> +static enum hrtimer_restart bc_handler(struct hrtimer *t)
> +{
> +	ce_broadcast_hrtimer.event_handler(&ce_broadcast_hrtimer);
> +	return HRTIMER_NORESTART;
> +}
> +
> +void tick_setup_hrtimer_broadcast(void)
> +{
> +	hrtimer_init(&bctimer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
> +	bctimer.function = bc_handler;
> +	clockevents_register(&ce_broadcast_hrtimer);
> +}
> Index: linux-2.6/kernel/time/tick-broadcast.c
> ===================================================================
> --- linux-2.6.orig/kernel/time/tick-broadcast.c
> +++ linux-2.6/kernel/time/tick-broadcast.c
> @@ -630,6 +630,42 @@ again:
>  	raw_spin_unlock(&tick_broadcast_lock);
>  }
> 
> +static int broadcast_needs_cpu(struct clock_event_device *bc, int cpu)
> +{
> +	if (!(bc->features & CLOCK_EVT_FEAT_HRTIMER))
> +		return 0;
> +	if (bc->next_event.tv64 == KTIME_MAX)
> +		return 0;
> +	return bc->bound_on == cpu ? -EBUSY : 0;
> +}
> +
> +static void broadcast_shutdown_local(struct clock_event_device *bc,
> +				     struct clock_event_device *dev)
> +{
> +	/*
> +	 * For hrtimer based broadcasting we cannot shutdown the cpu
> +	 * local device if our own event is the first one to expire or
> +	 * if we own the broadcast timer.
> +	 */
> +	if (bc->features & CLOCK_EVT_FEAT_HRTIMER) {
> +		if (broadcast_needs_cpu(bc))
> +			return;
> +		if (dev->next_event.tv64 < bc->next_event.tv64)
> +			return;
> +	}
> +	clockevents_set_mode(dev, CLOCK_EVT_MODE_SHUTDOWN);
> +}
> +
> +static void broadcast_move_bc(int deadcpu)
> +{
> +	struct clock_event_device *bc = tick_broadcast_device.evtdev;
> +
> +	if (!bc || !broadcast_needs_cpu(bc, deadcpu))
> +		return;
> +	/* This moves the broadcast assignment to this cpu */
> +	clockevents_program_event(bc, bc->next_event, 1);
> +}
> +
>  /*
>   * Powerstate information: The system enters/leaves a state, where
>   * affected devices might stop
> @@ -639,8 +675,8 @@ void tick_broadcast_oneshot_control(unsi
>  	struct clock_event_device *bc, *dev;
>  	struct tick_device *td;
>  	unsigned long flags;
> +	int cpu, ret = 0;
>  	ktime_t now;
> -	int cpu;
> 
>  	/*
>  	 * Periodic mode does not care about the enter/exit of power
> @@ -666,7 +702,7 @@ void tick_broadcast_oneshot_control(unsi
>  	if (reason == CLOCK_EVT_NOTIFY_BROADCAST_ENTER) {
>  		if (!cpumask_test_and_set_cpu(cpu, tick_broadcast_oneshot_mask)) {
>  			WARN_ON_ONCE(cpumask_test_cpu(cpu, tick_broadcast_pending_mask));
> -			clockevents_set_mode(dev, CLOCK_EVT_MODE_SHUTDOWN);
> +			broadcast_shutdown_local(bc, dev);
>  			/*
>  			 * We only reprogram the broadcast timer if we
>  			 * did not mark ourself in the force mask and
> @@ -679,6 +715,11 @@ void tick_broadcast_oneshot_control(unsi
>  			    dev->next_event.tv64 < bc->next_event.tv64)
>  				tick_broadcast_set_event(bc, cpu, dev->next_event, 1);
>  		}
> +		/*
> +		 * If the current CPU owns the hrtimer broadcast
> +		 * mechanism, it cannot go deep idle.
> +		 */
> +		ret = broadcast_needs_cpu(bc, cpu);
>  	} else {
>  		if (cpumask_test_and_clear_cpu(cpu, tick_broadcast_oneshot_mask)) {
>  			clockevents_set_mode(dev, CLOCK_EVT_MODE_ONESHOT);
> @@ -851,6 +892,8 @@ void tick_shutdown_broadcast_oneshot(uns
>  	cpumask_clear_cpu(cpu, tick_broadcast_pending_mask);
>  	cpumask_clear_cpu(cpu, tick_broadcast_force_mask);
> 
> +	broadcast_move_bc(cpu);
> +
>  	raw_spin_unlock_irqrestore(&tick_broadcast_lock, flags);
>  }
> 
> 

^ permalink raw reply

* [PATCH] powerpc/perf: Add Power8 cache & TLB events
From: Michael Ellerman @ 2014-01-24  4:50 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: sukadev, Anton Blanchard, mlpesant

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
---
 arch/powerpc/perf/power8-pmu.c | 144 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 144 insertions(+)

diff --git a/arch/powerpc/perf/power8-pmu.c b/arch/powerpc/perf/power8-pmu.c
index a3f7abd..96cee20 100644
--- a/arch/powerpc/perf/power8-pmu.c
+++ b/arch/powerpc/perf/power8-pmu.c
@@ -25,6 +25,37 @@
 #define PM_BRU_FIN			0x10068
 #define PM_BR_MPRED_CMPL		0x400f6
 
+/* All L1 D cache load references counted at finish, gated by reject */
+#define PM_LD_REF_L1			0x100ee
+/* Load Missed L1 */
+#define PM_LD_MISS_L1			0x3e054
+/* Store Missed L1 */
+#define PM_ST_MISS_L1			0x300f0
+/* L1 cache data prefetches */
+#define PM_L1_PREF			0x0d8b8
+/* Instruction fetches from L1 */
+#define PM_INST_FROM_L1			0x04080
+/* Demand iCache Miss */
+#define PM_L1_ICACHE_MISS		0x200fd
+/* Instruction Demand sectors wriittent into IL1 */
+#define PM_L1_DEMAND_WRITE		0x0408c
+/* Instruction prefetch written into IL1 */
+#define PM_IC_PREF_WRITE		0x0408e
+/* The data cache was reloaded from local core's L3 due to a demand load */
+#define PM_DATA_FROM_L3			0x4c042
+/* Demand LD - L3 Miss (not L2 hit and not L3 hit) */
+#define PM_DATA_FROM_L3MISS		0x300fe
+/* All successful D-side store dispatches for this thread */
+#define PM_L2_ST			0x17080
+/* All successful D-side store dispatches for this thread that were L2 Miss */
+#define PM_L2_ST_MISS			0x17082
+/* Total HW L3 prefetches(Load+store) */
+#define PM_L3_PREF_ALL			0x4e052
+/* Data PTEG reload */
+#define PM_DTLB_MISS			0x300fc
+/* ITLB Reloaded */
+#define PM_ITLB_MISS			0x400fc
+
 
 /*
  * Raw event encoding for POWER8:
@@ -557,6 +588,8 @@ static int power8_generic_events[] = {
 	[PERF_COUNT_HW_INSTRUCTIONS] =			PM_INST_CMPL,
 	[PERF_COUNT_HW_BRANCH_INSTRUCTIONS] =		PM_BRU_FIN,
 	[PERF_COUNT_HW_BRANCH_MISSES] =			PM_BR_MPRED_CMPL,
+	[PERF_COUNT_HW_CACHE_REFERENCES] =		PM_LD_REF_L1,
+	[PERF_COUNT_HW_CACHE_MISSES] =			PM_LD_MISS_L1,
 };
 
 static u64 power8_bhrb_filter_map(u64 branch_sample_type)
@@ -596,6 +629,116 @@ static void power8_config_bhrb(u64 pmu_bhrb_filter)
 	mtspr(SPRN_MMCRA, (mfspr(SPRN_MMCRA) | pmu_bhrb_filter));
 }
 
+#define C(x)	PERF_COUNT_HW_CACHE_##x
+
+/*
+ * Table of generalized cache-related events.
+ * 0 means not supported, -1 means nonsensical, other values
+ * are event codes.
+ */
+static int power8_cache_events[C(MAX)][C(OP_MAX)][C(RESULT_MAX)] = {
+	[ C(L1D) ] = {
+		[ C(OP_READ) ] = {
+			[ C(RESULT_ACCESS) ] = PM_LD_REF_L1,
+			[ C(RESULT_MISS)   ] = PM_LD_MISS_L1,
+		},
+		[ C(OP_WRITE) ] = {
+			[ C(RESULT_ACCESS) ] = 0,
+			[ C(RESULT_MISS)   ] = PM_ST_MISS_L1,
+		},
+		[ C(OP_PREFETCH) ] = {
+			[ C(RESULT_ACCESS) ] = PM_L1_PREF,
+			[ C(RESULT_MISS)   ] = 0,
+		},
+	},
+	[ C(L1I) ] = {
+		[ C(OP_READ) ] = {
+			[ C(RESULT_ACCESS) ] = PM_INST_FROM_L1,
+			[ C(RESULT_MISS)   ] = PM_L1_ICACHE_MISS,
+		},
+		[ C(OP_WRITE) ] = {
+			[ C(RESULT_ACCESS) ] = PM_L1_DEMAND_WRITE,
+			[ C(RESULT_MISS)   ] = -1,
+		},
+		[ C(OP_PREFETCH) ] = {
+			[ C(RESULT_ACCESS) ] = PM_IC_PREF_WRITE,
+			[ C(RESULT_MISS)   ] = 0,
+		},
+	},
+	[ C(LL) ] = {
+		[ C(OP_READ) ] = {
+			[ C(RESULT_ACCESS) ] = PM_DATA_FROM_L3,
+			[ C(RESULT_MISS)   ] = PM_DATA_FROM_L3MISS,
+		},
+		[ C(OP_WRITE) ] = {
+			[ C(RESULT_ACCESS) ] = PM_L2_ST,
+			[ C(RESULT_MISS)   ] = PM_L2_ST_MISS,
+		},
+		[ C(OP_PREFETCH) ] = {
+			[ C(RESULT_ACCESS) ] = PM_L3_PREF_ALL,
+			[ C(RESULT_MISS)   ] = 0,
+		},
+	},
+	[ C(DTLB) ] = {
+		[ C(OP_READ) ] = {
+			[ C(RESULT_ACCESS) ] = 0,
+			[ C(RESULT_MISS)   ] = PM_DTLB_MISS,
+		},
+		[ C(OP_WRITE) ] = {
+			[ C(RESULT_ACCESS) ] = -1,
+			[ C(RESULT_MISS)   ] = -1,
+		},
+		[ C(OP_PREFETCH) ] = {
+			[ C(RESULT_ACCESS) ] = -1,
+			[ C(RESULT_MISS)   ] = -1,
+		},
+	},
+	[ C(ITLB) ] = {
+		[ C(OP_READ) ] = {
+			[ C(RESULT_ACCESS) ] = 0,
+			[ C(RESULT_MISS)   ] = PM_ITLB_MISS,
+		},
+		[ C(OP_WRITE) ] = {
+			[ C(RESULT_ACCESS) ] = -1,
+			[ C(RESULT_MISS)   ] = -1,
+		},
+		[ C(OP_PREFETCH) ] = {
+			[ C(RESULT_ACCESS) ] = -1,
+			[ C(RESULT_MISS)   ] = -1,
+		},
+	},
+	[ C(BPU) ] = {
+		[ C(OP_READ) ] = {
+			[ C(RESULT_ACCESS) ] = PM_BRU_FIN,
+			[ C(RESULT_MISS)   ] = PM_BR_MPRED_CMPL,
+		},
+		[ C(OP_WRITE) ] = {
+			[ C(RESULT_ACCESS) ] = -1,
+			[ C(RESULT_MISS)   ] = -1,
+		},
+		[ C(OP_PREFETCH) ] = {
+			[ C(RESULT_ACCESS) ] = -1,
+			[ C(RESULT_MISS)   ] = -1,
+		},
+	},
+	[ C(NODE) ] = {
+		[ C(OP_READ) ] = {
+			[ C(RESULT_ACCESS) ] = -1,
+			[ C(RESULT_MISS)   ] = -1,
+		},
+		[ C(OP_WRITE) ] = {
+			[ C(RESULT_ACCESS) ] = -1,
+			[ C(RESULT_MISS)   ] = -1,
+		},
+		[ C(OP_PREFETCH) ] = {
+			[ C(RESULT_ACCESS) ] = -1,
+			[ C(RESULT_MISS)   ] = -1,
+		},
+	},
+};
+
+#undef C
+
 static struct power_pmu power8_pmu = {
 	.name			= "POWER8",
 	.n_counter		= 6,
@@ -611,6 +754,7 @@ static struct power_pmu power8_pmu = {
 	.flags			= PPMU_HAS_SSLOT | PPMU_HAS_SIER | PPMU_BHRB | PPMU_EBB,
 	.n_generic		= ARRAY_SIZE(power8_generic_events),
 	.generic_events		= power8_generic_events,
+	.cache_events		= &power8_cache_events,
 	.attr_groups		= power8_pmu_attr_groups,
 	.bhrb_nr		= 32,
 };
-- 
1.8.3.2

^ permalink raw reply related

* Re: [PATCH] slub: Don't throw away partial remote slabs if there is no local memory
From: Wanpeng Li @ 2014-01-24  3:14 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: nacc, David Rientjes, penberg, linux-mm, Han Pingtian, paulus,
	Anton Blanchard, mpm, Joonsoo Kim, linuxppc-dev
In-Reply-To: <52e1d960.2715420a.3569.1013SMTPIN_ADDED_BROKEN@mx.google.com>

On Fri, Jan 24, 2014 at 11:09:07AM +0800, Wanpeng Li wrote:
>Hi Christoph,
>On Mon, Jan 20, 2014 at 04:13:30PM -0600, Christoph Lameter wrote:
>>On Mon, 20 Jan 2014, Wanpeng Li wrote:
>>
>>> >+       enum zone_type high_zoneidx = gfp_zone(flags);
>>> >
>>> >+       if (!node_present_pages(searchnode)) {
>>> >+               zonelist = node_zonelist(searchnode, flags);
>>> >+               for_each_zone_zonelist(zone, z, zonelist, high_zoneidx) {
>>> >+                       searchnode = zone_to_nid(zone);
>>> >+                       if (node_present_pages(searchnode))
>>> >+                               break;
>>> >+               }
>>> >+       }
>>> >        object = get_partial_node(s, get_node(s, searchnode), c, flags);
>>> >        if (object || node != NUMA_NO_NODE)
>>> >                return object;
>>> >
>>>
>>> The patch fix the bug. However, the kernel crashed very quickly after running
>>> stress tests for a short while:
>>
>>This is not a good way of fixing it. How about not asking for memory from
>>nodes that are memoryless? Use numa_mem_id() which gives you the next node
>>that has memory instead of numa_node_id() (gives you the current node
>>regardless if it has memory or not).
>
>diff --git a/mm/slub.c b/mm/slub.c
>index 545a170..a1c6040 100644
>--- a/mm/slub.c
>+++ b/mm/slub.c
>@@ -1700,6 +1700,9 @@ static void *get_partial(struct kmem_cache *s, gfp_t flags, int node,
> 	void *object;
>	int searchnode = (node == NUMA_NO_NODE) ? numa_node_id() : node;
>
>+	if (!node_present_pages(searchnode))
>+		searchnode = numa_mem_id();
>+
>	object = get_partial_node(s, get_node(s, searchnode), c, flags);
>	if (object || node != NUMA_NO_NODE)
>		return object;
>

The bug still can't be fixed w/ this patch. 

Regards,
Wanpeng Li 

>--
>To unsubscribe, send a message with 'unsubscribe linux-mm' in
>the body to majordomo@kvack.org.  For more info on Linux MM,
>see: http://www.linux-mm.org/ .
>Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [PATCH] slub: Don't throw away partial remote slabs if there is no local memory
From: Wanpeng Li @ 2014-01-24  3:09 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: nacc, David Rientjes, penberg, linux-mm, Han Pingtian, paulus,
	Anton Blanchard, mpm, Joonsoo Kim, linuxppc-dev
In-Reply-To: <alpine.DEB.2.10.1401201612340.28048@nuc>

Hi Christoph,
On Mon, Jan 20, 2014 at 04:13:30PM -0600, Christoph Lameter wrote:
>On Mon, 20 Jan 2014, Wanpeng Li wrote:
>
>> >+       enum zone_type high_zoneidx = gfp_zone(flags);
>> >
>> >+       if (!node_present_pages(searchnode)) {
>> >+               zonelist = node_zonelist(searchnode, flags);
>> >+               for_each_zone_zonelist(zone, z, zonelist, high_zoneidx) {
>> >+                       searchnode = zone_to_nid(zone);
>> >+                       if (node_present_pages(searchnode))
>> >+                               break;
>> >+               }
>> >+       }
>> >        object = get_partial_node(s, get_node(s, searchnode), c, flags);
>> >        if (object || node != NUMA_NO_NODE)
>> >                return object;
>> >
>>
>> The patch fix the bug. However, the kernel crashed very quickly after running
>> stress tests for a short while:
>
>This is not a good way of fixing it. How about not asking for memory from
>nodes that are memoryless? Use numa_mem_id() which gives you the next node
>that has memory instead of numa_node_id() (gives you the current node
>regardless if it has memory or not).

diff --git a/mm/slub.c b/mm/slub.c
index 545a170..a1c6040 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1700,6 +1700,9 @@ static void *get_partial(struct kmem_cache *s, gfp_t flags, int node,
 	void *object;
	int searchnode = (node == NUMA_NO_NODE) ? numa_node_id() : node;

+	if (!node_present_pages(searchnode))
+		searchnode = numa_mem_id();
+
	object = get_partial_node(s, get_node(s, searchnode), c, flags);
	if (object || node != NUMA_NO_NODE)
		return object;

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox