LinuxPPC-Dev Archive on lore.kernel.org
* Re: [PATCH 1/2] mm/cma: remove unsupported gfp_mask parameter from cma_alloc()
From: Vlastimil Babka @ 2018-07-16  7:45 UTC
  To: Marek Szyprowski, linux-mm, linux-kernel, linux-arm-kernel,
	linuxppc-dev, iommu
  Cc: Andrew Morton, Michal Nazarewicz, Joonsoo Kim, Christoph Hellwig,
	Michal Hocko, Russell King, Catalin Marinas, Will Deacon,
	Paul Mackerras, Benjamin Herrenschmidt, Chris Zankel,
	Martin Schwidefsky, Joerg Roedel, Sumit Semwal, Robin Murphy,
	Laura Abbott, linaro-mm-sig
In-Reply-To: <20180709122019eucas1p2340da484acfcc932537e6014f4fd2c29~-sqTPJKij2939229392eucas1p2j@eucas1p2.samsung.com>

On 07/09/2018 02:19 PM, Marek Szyprowski wrote:
> cma_alloc() function doesn't really support gfp flags other than
> __GFP_NOWARN, so convert gfp_mask parameter to boolean no_warn parameter.
> 
> This will help to avoid giving false feeling that this function supports
> standard gfp flags and callers can pass __GFP_ZERO to get zeroed buffer,
> what has already been an issue: see commit dd65a941f6ba ("arm64:
> dma-mapping: clear buffers allocated with FORCE_CONTIGUOUS flag").
> 
> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>

Acked-by: Vlastimil Babka <vbabka@suse.cz>


* [RFC PATCH v6 4/4] powerpc/fadump: Do not allow hot-remove memory from fadump reserved area.
From: Mahesh J Salgaonkar @ 2018-07-16  6:03 UTC
  To: linuxppc-dev, Linux Kernel
  Cc: Srikar Dronamraju, Aneesh Kumar K.V, Anshuman Khandual,
	Andrew Morton, Joonsoo Kim, Michal Hocko, Hari Bathini,
	Ananth Narayan, kernelfans
In-Reply-To: <153172096333.29252.4376707071382727345.stgit@jupiter.in.ibm.com>

From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

For fadump to work, there must be no holes in the reserved memory ranges
to which the kernel has asked firmware to move the contents of the old
kernel's memory in the event of a crash. Now that the fadump reserved
memory is marked as ZONE_MOVABLE, it is no longer protected from
hot-remove operations. As a result, the fadump service can fail to
re-register after a hot-remove if the removed memory belongs to the
fadump reserved region. To avoid this, make sure that memory in the
fadump reserved area is not hot-removable while fadump is registered.

However, a user who still wants to remove that memory can do so by
manually stopping the fadump service before the hot-remove operation.
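
As an illustration only (not part of this patch), the hot-remove guard
boils down to a range-overlap test between a memory block and the
reserved area. The standalone userspace sketch below assumes a
hypothetical helper range_overlaps() with hypothetical parameter names.
It treats both ranges as half-open, so its upper-bound comparison is
strict, which differs from the `<=` comparison used in the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: return true if the block [addr, addr + size)
 * shares at least one byte with the area [area_start, area_start +
 * area_size). Both ranges are half-open, so a block that merely touches
 * the area at either end is not reported as overlapping.
 */
static bool range_overlaps(uint64_t addr, uint64_t size,
			   uint64_t area_start, uint64_t area_size)
{
	uint64_t area_end = area_start + area_size;

	return (addr + size) > area_start && addr < area_end;
}
```

A caller in the spirit of lmb_is_removable() would refuse removal of a
block whenever such a test returns true for the reserved area.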

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/fadump.h               |    2 +-
 arch/powerpc/kernel/fadump.c                    |   10 ++++++++--
 arch/powerpc/platforms/pseries/hotplug-memory.c |    7 +++++--
 3 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/fadump.h b/arch/powerpc/include/asm/fadump.h
index 5c0de4508aab..cd28e9b59057 100644
--- a/arch/powerpc/include/asm/fadump.h
+++ b/arch/powerpc/include/asm/fadump.h
@@ -208,7 +208,7 @@ struct fad_crash_memory_ranges {
 	unsigned long long	size;
 };
 
-extern int is_fadump_boot_memory_area(u64 addr, ulong size);
+extern int is_fadump_memory_area(u64 addr, ulong size);
 extern int early_init_dt_scan_fw_dump(unsigned long node,
 		const char *uname, int depth, void *data);
 extern int fadump_reserve_mem(void);
diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index d1375f3f48c3..18a35f12ffb5 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -116,13 +116,19 @@ int __init early_init_dt_scan_fw_dump(unsigned long node,
 
 /*
  * If fadump is registered, check if the memory provided
- * falls within boot memory area.
+ * falls within boot memory area or reserved memory area.
  */
-int is_fadump_boot_memory_area(u64 addr, ulong size)
+int is_fadump_memory_area(u64 addr, ulong size)
 {
+	u64 d_start = fw_dump.reserve_dump_area_start;
+	u64 d_end = d_start + fw_dump.reserve_dump_area_size;
+
 	if (!fw_dump.dump_registered)
 		return 0;
 
+	if (((addr + size) > d_start) && (addr <= d_end))
+		return 1;
+
 	return (addr + size) > RMA_START && addr <= fw_dump.boot_memory_size;
 }
 
diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
index c1578f54c626..e4c658cda3a7 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -389,8 +389,11 @@ static bool lmb_is_removable(struct drmem_lmb *lmb)
 	phys_addr = lmb->base_addr;
 
 #ifdef CONFIG_FA_DUMP
-	/* Don't hot-remove memory that falls in fadump boot memory area */
-	if (is_fadump_boot_memory_area(phys_addr, block_sz))
+	/*
+	 * Don't hot-remove memory that falls in fadump boot memory area
+	 * and memory that is reserved for capturing old kernel memory.
+	 */
+	if (is_fadump_memory_area(phys_addr, block_sz))
 		return false;
 #endif
 


* [RFC PATCH v6 3/4] powerpc/fadump: throw proper error message on fadump registration failure.
From: Mahesh J Salgaonkar @ 2018-07-16  6:03 UTC
  To: linuxppc-dev, Linux Kernel
  Cc: Srikar Dronamraju, Aneesh Kumar K.V, Anshuman Khandual,
	Andrew Morton, Joonsoo Kim, Michal Hocko, Hari Bathini,
	Ananth Narayan, kernelfans
In-Reply-To: <153172096333.29252.4376707071382727345.stgit@jupiter.in.ibm.com>

From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

fadump fails to register when there are holes in the reserved memory area.
This can happen if the user has hot-removed memory that falls within the
fadump reserved memory area. Print a meaningful error message in that
case.
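
To make the notion of a "hole" concrete, here is a standalone userspace
sketch (not the kernel code in this patch) that walks a sorted list of
memory regions and reports whether a target range is fully covered.
struct mem_region and range_is_contiguous() are hypothetical names, and
the sketch assumes the regions are sorted by base address and do not
overlap, as memblock regions are.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a memblock region: [base, base + size). */
struct mem_region {
	uint64_t base;
	uint64_t size;
};

/*
 * Return true if [start, end) is covered by the regions without a gap.
 * The regions must be sorted by base and must not overlap.
 */
static bool range_is_contiguous(const struct mem_region *regs, size_t n,
				uint64_t start, uint64_t end)
{
	uint64_t cursor = start;	/* first byte not yet known covered */
	size_t i;

	for (i = 0; i < n && cursor < end; i++) {
		uint64_t r_start = regs[i].base;
		uint64_t r_end = regs[i].base + regs[i].size;

		if (r_end <= cursor)
			continue;	/* region lies entirely below cursor */
		if (r_start > cursor)
			return false;	/* hole between cursor and region */
		cursor = r_end;
	}
	return cursor >= end;
}
```

A hot-removed block in the middle of the reserved area shows up here as
a region whose base is above the cursor, which is the case the new error
message is meant to report.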

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/fadump.c |   33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index ce333c1d4cb8..d1375f3f48c3 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -170,6 +170,36 @@ static int is_boot_memory_area_contiguous(void)
 	return ret;
 }
 
+/*
+ * Returns 1, if there are no holes in reserved memory area,
+ * 0 otherwise.
+ */
+static int is_reserved_memory_area_contiguous(void)
+{
+	struct memblock_region *reg;
+	unsigned long start, end;
+	unsigned long d_start = fw_dump.reserve_dump_area_start;
+	unsigned long d_end = d_start + fw_dump.reserve_dump_area_size;
+	int ret = 0;
+
+	for_each_memblock(memory, reg) {
+		start = max(d_start, (unsigned long)reg->base);
+		end = min(d_end, (unsigned long)(reg->base + reg->size));
+		if (d_start < end) {
+			/* Memory hole from d_start to start */
+			if (start > d_start)
+				break;
+
+			if (end == d_end) {
+				ret = 1;
+				break;
+			}
+			d_start = end + 1;
+		}
+	}
+	return ret;
+}
+
 /* Print firmware assisted dump configurations for debugging purpose. */
 static void fadump_show_config(void)
 {
@@ -531,6 +561,9 @@ static int register_fw_dump(struct fadump_mem_struct *fdm)
 		if (!is_boot_memory_area_contiguous())
 			pr_err("Can't have holes in boot memory area while "
 			       "registering fadump\n");
+		else if (!is_reserved_memory_area_contiguous())
+			pr_err("Can't have holes in reserved memory area while"
+			       " registering fadump\n");
 
 		printk(KERN_ERR "Failed to register firmware-assisted kernel"
 			" dump. Parameter Error(%d).\n", rc);


* [RFC PATCH v6 2/4] powerpc/fadump: Reservationless firmware assisted dump
From: Mahesh J Salgaonkar @ 2018-07-16  6:03 UTC
  To: linuxppc-dev, Linux Kernel
  Cc: Ananth N Mavinakayanahalli, Hari Bathini, Srikar Dronamraju,
	Aneesh Kumar K.V, Anshuman Khandual, Andrew Morton, Joonsoo Kim,
	Michal Hocko, Hari Bathini, Ananth Narayan, kernelfans
In-Reply-To: <153172096333.29252.4376707071382727345.stgit@jupiter.in.ibm.com>

From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

One of the primary issues with Firmware Assisted Dump (fadump) on Power
is that it needs a large amount of memory to be reserved. On large
systems with terabytes of memory, this reservation can be quite
significant.

In some cases, fadump fails if the memory reserved is insufficient, or
if the reserved memory was DLPAR hot-removed.

In the normal case, post reboot, the preserved memory is filtered to
extract only relevant areas of interest using the makedumpfile tool.
While the tool provides flexibility to determine what needs to be part
of the dump and what memory to filter out, all supported distributions
default this to "Capture only kernel data and nothing else".

We take advantage of this default and the Linux kernel's ZONE_MOVABLE
feature to fundamentally change the memory reservation model for fadump.

Instead of setting aside a significant chunk of memory that nobody can
use, this patch marks a significant chunk of the reserved memory as
ZONE_MOVABLE. The kernel is prevented from using it (due to
MIGRATE_MOVABLE), but applications are free to use it. With this, fadump
can still capture all of the kernel memory and most of the user space
memory, except the user pages that were present in ZONE_MOVABLE. If
someone wants to capture all of user space memory and is OK with the
reserved memory not being available to the production system, the
'fadump=nonmovable' kernel parameter can be used to fall back to the old
behaviour.
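
The three values of the 'fadump=' parameter described above can be
sketched as a small standalone parser. This is a userspace illustration
that mirrors the prefix matching in early_fadump_param(); the struct
fadump_opts type and the parse_fadump_param() name are hypothetical and
do not exist in the kernel.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical result type for the sketch. */
struct fadump_opts {
	bool enabled;		/* fadump requested */
	bool nonmovable;	/* keep the reservation out of ZONE_MOVABLE */
};

/*
 * Parse the value of "fadump=". Matching is by prefix, as with
 * strncmp() in the kernel code, so unknown values leave both flags
 * cleared.
 */
static struct fadump_opts parse_fadump_param(const char *p)
{
	struct fadump_opts opts = { false, false };

	if (!p)
		return opts;

	if (strncmp(p, "on", 2) == 0) {
		opts.enabled = true;
	} else if (strncmp(p, "off", 3) == 0) {
		opts.enabled = false;
	} else if (strncmp(p, "nonmovable", 10) == 0) {
		opts.enabled = true;
		opts.nonmovable = true;
	}
	return opts;
}
```

Note that 'nonmovable' implies 'on': the old fully reserved behaviour is
selected without needing a second parameter.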

Essentially, on a P9 LPAR with 2 cores, 8GB RAM and current upstream:
[root@zzxx-yy10 ~]# free -m
              total        used        free      shared  buff/cache   available
Mem:           7557         193        6822          12         541        6725
Swap:          4095           0        4095

With this patch:
[root@zzxx-yy10 ~]# free -m
              total        used        free      shared  buff/cache   available
Mem:           8133         194        7464          12         475        7338
Swap:          4095           0        4095

Changes made here are completely transparent to how fadump has
traditionally worked.

Signed-off-by: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
---
 Documentation/powerpc/firmware-assisted-dump.txt |   18 +++++
 arch/powerpc/include/asm/fadump.h                |    5 +
 arch/powerpc/kernel/fadump.c                     |   80 ++++++++++++++++++++--
 3 files changed, 95 insertions(+), 8 deletions(-)

diff --git a/Documentation/powerpc/firmware-assisted-dump.txt b/Documentation/powerpc/firmware-assisted-dump.txt
index bdd344aa18d9..f8a6343a1dcf 100644
--- a/Documentation/powerpc/firmware-assisted-dump.txt
+++ b/Documentation/powerpc/firmware-assisted-dump.txt
@@ -113,7 +113,16 @@ header, is usually reserved at an offset greater than boot memory
 size (see Fig. 1). This area is *not* released: this region will
 be kept permanently reserved, so that it can act as a receptacle
 for a copy of the boot memory content in addition to CPU state
-and HPTE region, in the case a crash does occur.
+and HPTE region, in the case a crash does occur. Since this reserved
+memory area is used only after a system crash, there is no point in
+blocking this significant chunk of memory from the production kernel.
+Hence, the implementation marks the memory reserved for fadump as
+ZONE_MOVABLE. With ZONE_MOVABLE, this memory is available for
+applications to use, while the kernel is prevented from using it. With
+this, fadump can still capture all of the kernel memory and most of
+the user space memory, except the user pages that were present in
+the ZONE_MOVABLE region.
+
 
   o Memory Reservation during first kernel
 
@@ -162,6 +171,9 @@ How to enable firmware-assisted dump (fadump):
 
 1. Set config option CONFIG_FA_DUMP=y and build kernel.
 2. Boot into linux kernel with 'fadump=on' kernel cmdline option.
+   By default, the reserved memory will be marked as zone movable.
+   Alternatively, the user can boot with 'fadump=nonmovable' to prevent
+   fadump from marking reserved memory as zone movable.
 3. Optionally, user can also set 'crashkernel=' kernel cmdline
    to specify size of the memory to reserve for boot memory dump
    preservation.
@@ -172,6 +184,10 @@ NOTE: 1. 'fadump_reserve_mem=' parameter has been deprecated. Instead
       2. If firmware-assisted dump fails to reserve memory then it
          will fallback to existing kdump mechanism if 'crashkernel='
          option is set at kernel cmdline.
+      3. If the user wants to capture all of user space memory and is OK
+         with reserved memory not being available to the production
+         system, then the 'fadump=nonmovable' kernel parameter can be
+         used to fall back to the old behaviour.
 
 Sysfs/debugfs files:
 ------------
diff --git a/arch/powerpc/include/asm/fadump.h b/arch/powerpc/include/asm/fadump.h
index 5a23010af600..5c0de4508aab 100644
--- a/arch/powerpc/include/asm/fadump.h
+++ b/arch/powerpc/include/asm/fadump.h
@@ -48,6 +48,10 @@
 
 #define memblock_num_regions(memblock_type)	(memblock.memblock_type.cnt)
 
+/* Alignment per core mm requirement. */
+#define FADUMP_PAGEBLOCK_ALIGNMENT	(PAGE_SIZE <<			\
+			max_t(unsigned long, MAX_ORDER - 1, pageblock_order))
+
 /* Firmware provided dump sections */
 #define FADUMP_CPU_STATE_DATA	0x0001
 #define FADUMP_HPTE_REGION	0x0002
@@ -141,6 +145,7 @@ struct fw_dump {
 	unsigned long	fadump_supported:1;
 	unsigned long	dump_active:1;
 	unsigned long	dump_registered:1;
+	unsigned long	nonmovable:1;		/* !ZONE_MOVABLE */
 };
 
 /*
diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index 07e8396d472b..ce333c1d4cb8 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -34,6 +34,7 @@
 #include <linux/crash_dump.h>
 #include <linux/kobject.h>
 #include <linux/sysfs.h>
+#include <linux/mmzone.h>
 
 #include <asm/debugfs.h>
 #include <asm/page.h>
@@ -375,8 +376,11 @@ int __init fadump_reserve_mem(void)
 	 */
 	if (fdm_active)
 		fw_dump.boot_memory_size = be64_to_cpu(fdm_active->rmr_region.source_len);
-	else
+	else {
 		fw_dump.boot_memory_size = fadump_calculate_reserve_size();
+		fw_dump.boot_memory_size = ALIGN(fw_dump.boot_memory_size,
+						FADUMP_PAGEBLOCK_ALIGNMENT);
+	}
 
 	/*
 	 * Calculate the memory boundary.
@@ -423,8 +427,7 @@ int __init fadump_reserve_mem(void)
 		fw_dump.fadumphdr_addr =
 				be64_to_cpu(fdm_active->rmr_region.destination_address) +
 				be64_to_cpu(fdm_active->rmr_region.source_len);
-		pr_debug("fadumphdr_addr = %p\n",
-				(void *) fw_dump.fadumphdr_addr);
+		pr_debug("fadumphdr_addr = %pa\n", &fw_dump.fadumphdr_addr);
 	} else {
 		size = get_fadump_area_size();
 
@@ -474,6 +477,10 @@ static int __init early_fadump_param(char *p)
 		fw_dump.fadump_enabled = 1;
 	else if (strncmp(p, "off", 3) == 0)
 		fw_dump.fadump_enabled = 0;
+	else if (strncmp(p, "nonmovable", 10) == 0) {
+		fw_dump.fadump_enabled = 1;
+		fw_dump.nonmovable = 1;
+	}
 
 	return 0;
 }
@@ -1146,7 +1153,7 @@ static int fadump_unregister_dump(struct fadump_mem_struct *fdm)
 	return 0;
 }
 
-static int fadump_invalidate_dump(struct fadump_mem_struct *fdm)
+static int fadump_invalidate_dump(const struct fadump_mem_struct *fdm)
 {
 	int rc = 0;
 	unsigned int wait_time;
@@ -1177,9 +1184,8 @@ void fadump_cleanup(void)
 {
 	/* Invalidate the registration only if dump is active. */
 	if (fw_dump.dump_active) {
-		init_fadump_mem_struct(&fdm,
-			be64_to_cpu(fdm_active->cpu_state_data.destination_address));
-		fadump_invalidate_dump(&fdm);
+		/* pass the same memory dump structure provided by platform */
+		fadump_invalidate_dump(fdm_active);
 	} else if (fw_dump.dump_registered) {
 		/* Un-register Firmware-assisted dump if it was registered. */
 		fadump_unregister_dump(&fdm);
@@ -1525,3 +1531,63 @@ int __init setup_fadump(void)
 	return 1;
 }
 subsys_initcall(setup_fadump);
+
+/*
+ * Mark the fadump reserved area as ZONE_MOVABLE.
+ * The total size of fadump reserved memory covers boot memory size
+ * + cpu data size + hpte size and metadata. Initialize only the area
+ * equivalent to boot memory size as zone movable. The remaining portion
+ * of fadump reserved memory will not be given to the movable zone, and
+ * pages for those will stay reserved. Boot memory size is aligned per core
+ * mm requirement to satisfy the zone_movable_init_reserved_mem() call.
+ * If that call fails for some reason, we still have the memory reservation
+ * with us and we can still continue doing fadump.
+ */
+static int __init fadump_init_reserved_mem(void)
+{
+	unsigned long long base, size;
+	int rc;
+
+	if (!fw_dump.fadump_enabled)
+		return 0;
+
+	/* Ignore if booted with fadump=nonmovable */
+	if (fw_dump.nonmovable)
+		return 0;
+
+	if (fw_dump.dump_active)
+		return 0;
+
+	/*
+	 * Mark only the size equivalent to boot memory size as movable
+	 * zone.
+	 */
+	base = fw_dump.reserve_dump_area_start;
+	size = fw_dump.boot_memory_size;
+
+	if (!size)
+		return 0;
+
+	rc = zone_movable_init_reserved_mem(base, size);
+	if (rc) {
+		pr_err("Failed to init zone movable area for firmware-assisted dump,%d\n", rc);
+		/*
+		 * Though the zone movable init has failed, we still have memory
+		 * reservation with us. The reserved memory will be
+		 * blocked from production system usage.  Hence return 1,
+		 * so that we can continue with fadump.
+		 */
+		return 1;
+	}
+
+	/*
+	 * So we now have successfully initialized reserved area as
+	 * ZONE_MOVABLE for fadump.
+	 */
+	pr_info("Initialized 0x%llx bytes as zone movable area at %ldMB from "
+		"0x%lx bytes of memory reserved for firmware-assisted dump\n",
+		size, (unsigned long)base >> 20,
+		fw_dump.reserve_dump_area_size);
+	return 1;
+}
+core_initcall(fadump_init_reserved_mem);


* [RFC PATCH v6 1/4] mm/page_alloc: Introduce an interface to mark reserved memory as ZONE_MOVABLE
From: Mahesh J Salgaonkar @ 2018-07-16  6:03 UTC
  To: linuxppc-dev, Linux Kernel
  Cc: Srikar Dronamraju, Aneesh Kumar K.V, Anshuman Khandual,
	Andrew Morton, Joonsoo Kim, Michal Hocko, Hari Bathini,
	Ananth Narayan, kernelfans
In-Reply-To: <153172096333.29252.4376707071382727345.stgit@jupiter.in.ibm.com>

From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

Add an interface to allow custom reserved memory to be marked as
ZONE_MOVABLE. This will help subsystems convert their reserved
memory regions into ZONE_MOVABLE so that the memory can still be available
to user applications.

The approach is based on Joonsoo Kim's commit bad8c6c0
(https://github.com/torvalds/linux/commit/bad8c6c0), which
uses ZONE_MOVABLE to manage the CMA area. The majority of the code has been
taken from that commit. However, that commit has been reverted due to
some issues reported on i386. I believe the patch is being reworked and
will be re-posted soon.

Like CMA, another user of ZONE_MOVABLE can be fadump on powerpc, which
reserves a significant chunk of memory that is used only after the system
has crashed. Until then the reserved memory is unused. By marking that
memory as ZONE_MOVABLE, it can at least be utilized by user applications.

This patch proposes an RFC implementation of an interface to mark a
specified reserved area as ZONE_MOVABLE. Comments are welcome.
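
The new interface rejects regions whose base or size is not aligned to
PAGE_SIZE << max(MAX_ORDER - 1, pageblock_order). The standalone
userspace sketch below works through that arithmetic. The helper names
align_up(), movable_alignment() and region_is_aligned() are
hypothetical, and the page size and order values a caller passes in are
example inputs, not values taken from any particular configuration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Round x up to the next multiple of a; a must be a power of two. */
static uint64_t align_up(uint64_t x, uint64_t a)
{
	return (x + a - 1) & ~(a - 1);
}

/* Alignment in bytes: page_size << max(max_order - 1, pageblock_order). */
static uint64_t movable_alignment(uint64_t page_size, unsigned int max_order,
				  unsigned int pageblock_order)
{
	unsigned int order = max_order - 1;

	if (pageblock_order > order)
		order = pageblock_order;
	return page_size << order;
}

/* A region qualifies only if it is non-empty and both ends are aligned. */
static bool region_is_aligned(uint64_t base, uint64_t size, uint64_t alignment)
{
	return size != 0 &&
	       align_up(base, alignment) == base &&
	       align_up(size, alignment) == size;
}
```

For example, with 4 KiB pages, a max order of 11 and a pageblock order
of 9, the required alignment is 4096 << 10, that is 4 MiB, so a 1 MiB
region would be rejected even if its base were 4 MiB aligned.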

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
---
 include/linux/mmzone.h |    2 +
 mm/page_alloc.c        |  146 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 148 insertions(+)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 32699b2dc52a..2519dd690572 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1288,6 +1288,8 @@ struct mminit_pfnnid_cache {
 #endif
 
 void memory_present(int nid, unsigned long start, unsigned long end);
+extern int __init zone_movable_init_reserved_mem(phys_addr_t base,
+							phys_addr_t size);
 
 /*
  * If it is possible to have holes within a MAX_ORDER_NR_PAGES, then we
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 1521100f1e63..0817ed8843cb 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7687,6 +7687,152 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
 	return true;
 }
 
+static __init void mark_zone_movable(struct page *page)
+{
+	unsigned i = pageblock_nr_pages;
+	struct page *p = page;
+	struct zone *zone;
+	unsigned long pfn = page_to_pfn(page);
+	int nid = page_to_nid(page);
+
+	zone = page_zone(page);
+	zone->present_pages -= pageblock_nr_pages;
+
+	do {
+		__ClearPageReserved(p);
+		set_page_count(p, 0);
+
+		/* Steal pages from other zones */
+		set_page_links(p, ZONE_MOVABLE, nid, pfn);
+	} while (++p, ++pfn, --i);
+
+	zone = page_zone(page);
+	zone->present_pages += pageblock_nr_pages;
+
+	set_pageblock_migratetype(page, MIGRATE_MOVABLE);
+
+	if (pageblock_order >= MAX_ORDER) {
+		i = pageblock_nr_pages;
+		p = page;
+		do {
+			set_page_refcounted(p);
+			__free_pages(p, MAX_ORDER - 1);
+			p += MAX_ORDER_NR_PAGES;
+		} while (i -= MAX_ORDER_NR_PAGES);
+	} else {
+		set_page_refcounted(page);
+		__free_pages(page, pageblock_order);
+	}
+
+	adjust_managed_page_count(page, pageblock_nr_pages);
+}
+
+static int __init zone_movable_activate_area(unsigned long start_pfn,
+					unsigned long end_pfn)
+{
+	unsigned long base_pfn = start_pfn, pfn = start_pfn;
+	struct zone *zone;
+	unsigned i = (end_pfn - start_pfn) >> pageblock_order;
+
+	zone = page_zone(pfn_to_page(base_pfn));
+	while (pfn < end_pfn) {
+		if (!pfn_valid(pfn))
+			goto err;
+
+		if (page_zone(pfn_to_page(pfn)) != zone)
+			goto err;
+		pfn++;
+	}
+
+	do {
+		mark_zone_movable(pfn_to_page(base_pfn));
+		base_pfn += pageblock_nr_pages;
+	} while (--i);
+
+	return 0;
+err:
+	pr_err("Zone movable could not be activated\n");
+	return -EINVAL;
+}
+
+/**
+ * zone_movable_init_reserved_mem() - create custom zone movable area from
+ *				      reserved memory
+ * @base: Base address of the reserved area
+ * @size: Size of the reserved area (in bytes)
+ *
+ * This function creates custom zone movable area from already reserved memory.
+ */
+int __init zone_movable_init_reserved_mem(phys_addr_t base, phys_addr_t size)
+{
+	struct zone *zone;
+	pg_data_t *pgdat;
+	unsigned long start_pfn = PHYS_PFN(base);
+	unsigned long end_pfn = PHYS_PFN(base + size);
+	phys_addr_t alignment;
+	int ret;
+
+	if (!size || !memblock_is_region_reserved(base, size))
+		return -EINVAL;
+
+	/* ensure minimal alignment required by mm core */
+	alignment = PAGE_SIZE <<
+			max_t(unsigned long, MAX_ORDER - 1, pageblock_order);
+
+	if (ALIGN(base, alignment) != base || ALIGN(size, alignment) != size)
+		return -EINVAL;
+
+	for_each_online_pgdat(pgdat) {
+		zone = &pgdat->node_zones[ZONE_MOVABLE];
+
+		/*
+		 * Continue if zone is already populated.
+		 * Should we at least bump up the zone->spanned_pages
+		 * for existing populated zone ?
+		 */
+		if (populated_zone(zone))
+			continue;
+
+		/*
+		 * Is it possible to allow memory region across nodes to
+		 * be marked as ZONE_MOVABLE ?
+		 */
+		if (pfn_to_nid(start_pfn) != pgdat->node_id)
+			continue;
+
+		/* Not sure if this is a right place to init empty zone. */
+		if (zone_is_empty(zone)) {
+			init_currently_empty_zone(zone, start_pfn,
+							end_pfn - start_pfn);
+			zone->spanned_pages = end_pfn - start_pfn;
+		}
+	}
+
+	ret = zone_movable_activate_area(start_pfn, end_pfn);
+
+	if (ret)
+		return ret;
+
+	/*
+	 * Reserved pages for ZONE_MOVABLE are now activated and
+	 * this would change ZONE_MOVABLE's managed page counter and
+	 * the other zones' present counter. We need to re-calculate
+	 * various zone information that depends on this initialization.
+	 */
+	build_all_zonelists(NULL);
+	for_each_populated_zone(zone) {
+		if (zone_idx(zone) == ZONE_MOVABLE) {
+			zone_pcp_reset(zone);
+			setup_zone_pageset(zone);
+		} else
+			zone_pcp_update(zone);
+
+		set_zone_contiguous(zone);
+	}
+
+	return 0;
+}
+
 #if (defined(CONFIG_MEMORY_ISOLATION) && defined(CONFIG_COMPACTION)) || defined(CONFIG_CMA)
 
 static unsigned long pfn_max_align_down(unsigned long pfn)


* [RFC PATCH v6 0/4] powerpc/fadump: Improvements and fixes for firmware-assisted dump.
From: Mahesh J Salgaonkar @ 2018-07-16  6:02 UTC
  To: linuxppc-dev, Linux Kernel
  Cc: Hari Bathini, Ananth N Mavinakayanahalli, Srikar Dronamraju,
	Aneesh Kumar K.V, Anshuman Khandual, Andrew Morton, Joonsoo Kim,
	Michal Hocko, Hari Bathini, Ananth Narayan, kernelfans

One of the primary issues with Firmware Assisted Dump (fadump) on Power
is that it needs a large amount of memory to be reserved. This reserved
memory is used for saving the contents of the old crashed kernel's memory
before the fadump capture kernel uses the old kernel's memory area to boot.
However, this reserved memory area stays unused until a system crash and
isn't available for the production kernel to use.

Instead of setting aside a significant chunk of memory that nobody can use,
take advantage of ZONE_MOVABLE to mark a significant chunk of the reserved
memory as ZONE_MOVABLE, so that the kernel is prevented from using it but
applications are free to use it.

Patch 1 introduces an interface to mark reserved memory as ZONE_MOVABLE.
Patch 2 uses the above interface to mark reserved memory movable so that
it can be used by applications, making fadump reservationless.
Patches 3 and 4 fix minor issues.

Changes in V6:
- Introduce an interface to mark reserved memory as ZONE_MOVABLE. Hence
  sending this series as RFC again.
- Mark reserved area as ZONE_MOVABLE instead of CMA.
- Add fadump=nonmovable parameter for users who don't want to use ZONE_MOVABLE.

Changes in V5:
- Drop the patch that does metadata movement.
- Move the kexec fix patch to top (patch 1)
- Fold CMA documentation patch into patch 2
- Fix the compilation issues when CONFIG_CMA is not set reported by Hari.
- Use the approach of using boot memory size for CMA as suggested by Hari
  except the movement of sections. Thanks to Hari.

Changes in V4:
- patch 1: Make fadump compatible irrespective of kernel versions.
- patch 4: moved out of the series and posted separately at
  http://patchwork.ozlabs.org/patch/896716/
- Documentation update about CMA reservation.

Changes in V3:
- patch 1 & 2: move metadata region and documentation update.
- patch 7: Un-register the fadump on kexec path


---

Mahesh Salgaonkar (4):
      mm/page_alloc: Introduce an interface to mark reserved memory as ZONE_MOVABLE
      powerpc/fadump: Reservationless firmware assisted dump
      powerpc/fadump: throw proper error message on fadump registration failure.
      powerpc/fadump: Do not allow hot-remove memory from fadump reserved area.


 Documentation/powerpc/firmware-assisted-dump.txt |   18 +++
 arch/powerpc/include/asm/fadump.h                |    7 +
 arch/powerpc/kernel/fadump.c                     |  123 +++++++++++++++++--
 arch/powerpc/platforms/pseries/hotplug-memory.c  |    7 +
 include/linux/mmzone.h                           |    2 
 mm/page_alloc.c                                  |  146 ++++++++++++++++++++++
 6 files changed, 290 insertions(+), 13 deletions(-)

--
Signature


* Re: [PATCH v3 2/2] powerpc: Enable CPU_FTR_ASYM_SMT for interleaved big-cores
From: Gautham R Shenoy @ 2018-07-16  5:25 UTC
  To: kbuild test robot
  Cc: Gautham R. Shenoy, kbuild-all, Michael Ellerman,
	Benjamin Herrenschmidt, Michael Neuling, Vaidyanathan Srinivasan,
	Akshay Adiga, Shilpasri G Bhat, Oliver O'Halloran,
	Nicholas Piggin, Murilo Opsfelder Araujo, linuxppc-dev,
	linux-kernel
In-Reply-To: <201807111509.TCWIHpaa%fengguang.wu@intel.com>


On Wed, Jul 11, 2018 at 04:32:30PM +0800, kbuild test robot wrote:
> Hi Gautham,
> 
> Thank you for the patch! Yet something to improve:
> 
> [auto build test ERROR on powerpc/next]
> [also build test ERROR on v4.18-rc4 next-20180710]
> [if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
> 
> url:    https://github.com/0day-ci/linux/commits/Gautham-R-Shenoy/powerpc-Detect-the-presence-of-big-cores-via-ibm-thread-groups/20180706-174756
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
> config: powerpc-g5_defconfig (attached as .config)
> compiler: powerpc64-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
> reproduce:
>         wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
>         chmod +x ~/bin/make.cross
>         # save the attached .config to linux build tree
>         GCC_VERSION=7.2.0 make.cross ARCH=powerpc 
> 
> All errors (new ones prefixed by >>):
> 
>    In file included from include/linux/static_key.h:1:0,
>                     from include/linux/context_tracking_state.h:6,
>                     from include/linux/vtime.h:5,
>                     from include/linux/hardirq.h:8,
>                     from include/linux/interrupt.h:11,
>                     from include/linux/serial_core.h:25,
>                     from include/linux/serial_8250.h:14,
>                     from arch/powerpc/kernel/setup-common.c:33:
>    arch/powerpc/kernel/setup-common.c: In function 'smp_setup_cpu_maps':
> >> arch/powerpc/kernel/setup-common.c:777:25: error: 'cpu_feature_keys' undeclared (first use in this function); did you mean 'setup_feature_keys'?
>       static_branch_enable(&cpu_feature_keys[key]);

Ok, so this needs to be enabled only on CONFIG_PPC_PSERIES and
CONFIG_PPC_POWERNV. Will fix this.


--
Thanks and Regards
gautham.


* Re: [PATCH kernel v6 2/2] KVM: PPC: Check if IOMMU page is contained in the pinned physical page
From: David Gibson @ 2018-07-16  4:17 UTC
  To: Alexey Kardashevskiy
  Cc: linuxppc-dev, kvm-ppc, Aneesh Kumar K.V, Alex Williamson,
	Michael Ellerman, Nicholas Piggin, Paul Mackerras, Balbir Singh
In-Reply-To: <20180711110044.15939-3-aik@ozlabs.ru>


On Wed, Jul 11, 2018 at 09:00:44PM +1000, Alexey Kardashevskiy wrote:
> A VM which has:
>  - a DMA capable device passed through to it (eg. network card);
>  - running a malicious kernel that ignores H_PUT_TCE failure;
>  - capability of using IOMMU pages bigger than physical pages
> can create an IOMMU mapping that exposes (for example) 16MB of
> the host physical memory to the device when only 64K was allocated to the VM.
> 
> The remaining 16MB - 64K will be some other content of host memory, possibly
> including pages of the VM, but also pages of host kernel memory, host
> programs or other VMs.
> 
> The attacking VM does not control the location of the page it can map,
> and is only allowed to map as many pages as it has pages of RAM.
> 
> We already have a check in drivers/vfio/vfio_iommu_spapr_tce.c that
> an IOMMU page is contained in the physical page so the PCI hardware won't
> get access to unassigned host memory; however this check is missing in
> the KVM fastpath (H_PUT_TCE accelerated code). We were lucky so far and
> did not hit this yet as the very first time when the mapping happens
> we do not have tbl::it_userspace allocated yet and fall back to
> the userspace which in turn calls VFIO IOMMU driver, this fails and
> the guest does not retry.
> 
> This stores the smallest preregistered page size in the preregistered
> region descriptor and changes the mm_iommu_xxx API to check this against
> the IOMMU page size.
> 
> This calculates maximum page size as a minimum of the natural region
> alignment and compound page size. For the page shift this uses the shift
> returned by find_linux_pte() which indicates how the page is mapped to
> the current userspace - if the page is huge and this is not a zero, then
> it is a leaf pte and the page is mapped within the range.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
> Changes:
> v6:
> * replaced hugetlbfs with pageshift from find_linux_pte()
> 
> v5:
> * only consider compound pages from hugetlbfs
> 
> v4:
> * reimplemented max pageshift calculation
> 
> v3:
> * fixed upper limit for the page size
> * added checks that we don't register parts of a huge page
> 
> v2:
> * explicitly check for compound pages before calling compound_order()
> 
> ---
> The bug is: run QEMU _without_ hugepages (no -mem-path) and tell it to
> advertise 16MB pages to the guest; a typical pseries guest will use 16MB
> for IOMMU pages without checking the mmu pagesize and this will fail
> at https://git.qemu.org/?p=qemu.git;a=blob;f=hw/vfio/common.c;h=fb396cf00ac40eb35967a04c9cc798ca896eed57;hb=refs/heads/master#l256
> 
> With the change, mapping will fail in KVM and the guest will print:
> 
> mlx5_core 0000:00:00.0: ibm,create-pe-dma-window(2027) 0 8000000 20000000 18 1f returned 0 (liobn = 0x80000001 starting addr = 8000000 0)
> mlx5_core 0000:00:00.0: created tce table LIOBN 0x80000001 for /pci@800000020000000/ethernet@0
> mlx5_core 0000:00:00.0: failed to map direct window for /pci@800000020000000/ethernet@0: -1
> ---
>  arch/powerpc/include/asm/mmu_context.h |  4 ++--
>  arch/powerpc/kvm/book3s_64_vio.c       |  2 +-
>  arch/powerpc/kvm/book3s_64_vio_hv.c    |  6 ++++--
>  arch/powerpc/mm/mmu_context_iommu.c    | 39 ++++++++++++++++++++++++++++++++--
>  drivers/vfio/vfio_iommu_spapr_tce.c    |  2 +-
>  5 files changed, 45 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
> index 896efa5..79d570c 100644
> --- a/arch/powerpc/include/asm/mmu_context.h
> +++ b/arch/powerpc/include/asm/mmu_context.h
> @@ -35,9 +35,9 @@ extern struct mm_iommu_table_group_mem_t *mm_iommu_lookup_rm(
>  extern struct mm_iommu_table_group_mem_t *mm_iommu_find(struct mm_struct *mm,
>  		unsigned long ua, unsigned long entries);
>  extern long mm_iommu_ua_to_hpa(struct mm_iommu_table_group_mem_t *mem,
> -		unsigned long ua, unsigned long *hpa);
> +		unsigned long ua, unsigned int pageshift, unsigned long *hpa);
>  extern long mm_iommu_ua_to_hpa_rm(struct mm_iommu_table_group_mem_t *mem,
> -		unsigned long ua, unsigned long *hpa);
> +		unsigned long ua, unsigned int pageshift, unsigned long *hpa);
>  extern long mm_iommu_mapped_inc(struct mm_iommu_table_group_mem_t *mem);
>  extern void mm_iommu_mapped_dec(struct mm_iommu_table_group_mem_t *mem);
>  #endif
> diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
> index d066e37..8c456fa 100644
> --- a/arch/powerpc/kvm/book3s_64_vio.c
> +++ b/arch/powerpc/kvm/book3s_64_vio.c
> @@ -449,7 +449,7 @@ long kvmppc_tce_iommu_do_map(struct kvm *kvm, struct iommu_table *tbl,
>  		/* This only handles v2 IOMMU type, v1 is handled via ioctl() */
>  		return H_TOO_HARD;
>  
> -	if (WARN_ON_ONCE(mm_iommu_ua_to_hpa(mem, ua, &hpa)))
> +	if (WARN_ON_ONCE(mm_iommu_ua_to_hpa(mem, ua, tbl->it_page_shift, &hpa)))
>  		return H_HARDWARE;
>  
>  	if (mm_iommu_mapped_inc(mem))
> diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
> index 925fc31..5b298f5 100644
> --- a/arch/powerpc/kvm/book3s_64_vio_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
> @@ -279,7 +279,8 @@ static long kvmppc_rm_tce_iommu_do_map(struct kvm *kvm, struct iommu_table *tbl,
>  	if (!mem)
>  		return H_TOO_HARD;
>  
> -	if (WARN_ON_ONCE_RM(mm_iommu_ua_to_hpa_rm(mem, ua, &hpa)))
> +	if (WARN_ON_ONCE_RM(mm_iommu_ua_to_hpa_rm(mem, ua, tbl->it_page_shift,
> +			&hpa)))
>  		return H_HARDWARE;
>  
>  	pua = (void *) vmalloc_to_phys(pua);
> @@ -469,7 +470,8 @@ long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu,
>  
>  		mem = mm_iommu_lookup_rm(vcpu->kvm->mm, ua, IOMMU_PAGE_SIZE_4K);
>  		if (mem)
> -			prereg = mm_iommu_ua_to_hpa_rm(mem, ua, &tces) == 0;
> +			prereg = mm_iommu_ua_to_hpa_rm(mem, ua,
> +					IOMMU_PAGE_SHIFT_4K, &tces) == 0;
>  	}
>  
>  	if (!prereg) {
> diff --git a/arch/powerpc/mm/mmu_context_iommu.c b/arch/powerpc/mm/mmu_context_iommu.c
> index abb4364..0bb53a7 100644
> --- a/arch/powerpc/mm/mmu_context_iommu.c
> +++ b/arch/powerpc/mm/mmu_context_iommu.c
> @@ -19,6 +19,7 @@
>  #include <linux/hugetlb.h>
>  #include <linux/swap.h>
>  #include <asm/mmu_context.h>
> +#include <asm/pte-walk.h>
>  
>  static DEFINE_MUTEX(mem_list_mutex);
>  
> @@ -27,6 +28,7 @@ struct mm_iommu_table_group_mem_t {
>  	struct rcu_head rcu;
>  	unsigned long used;
>  	atomic64_t mapped;
> +	unsigned int pageshift;
>  	u64 ua;			/* userspace address */
>  	u64 entries;		/* number of entries in hpas[] */
>  	u64 *hpas;		/* vmalloc'ed */
> @@ -125,6 +127,8 @@ long mm_iommu_get(struct mm_struct *mm, unsigned long ua, unsigned long entries,
>  {
>  	struct mm_iommu_table_group_mem_t *mem;
>  	long i, j, ret = 0, locked_entries = 0;
> +	unsigned int pageshift;
> +	unsigned long flags;
>  	struct page *page = NULL;
>  
>  	mutex_lock(&mem_list_mutex);
> @@ -159,6 +163,12 @@ long mm_iommu_get(struct mm_struct *mm, unsigned long ua, unsigned long entries,
>  		goto unlock_exit;
>  	}
>  
> +	/*
> +	 * For a starting point for a maximum page size calculation
> +	 * we use @ua and @entries natural alignment to allow IOMMU pages
> +	 * smaller than huge pages but still bigger than PAGE_SIZE.
> +	 */
> +	mem->pageshift = __ffs(ua | (entries << PAGE_SHIFT));
>  	mem->hpas = vzalloc(array_size(entries, sizeof(mem->hpas[0])));
>  	if (!mem->hpas) {
>  		kfree(mem);
> @@ -199,6 +209,25 @@ long mm_iommu_get(struct mm_struct *mm, unsigned long ua, unsigned long entries,
>  			}
>  		}
>  populate:
> +		pageshift = PAGE_SHIFT;
> +		if (PageCompound(page)) {
> +			pte_t *pte;
> +			struct page *head = compound_head(page);
> +			unsigned int compshift = compound_order(head);
> +
> +			local_irq_save(flags); /* disables as well */
> +			pte = find_linux_pte(mm->pgd, ua, NULL, &pageshift);
> +			local_irq_restore(flags);
> +			if (!pte) {
> +				ret = -EFAULT;
> +				goto unlock_exit;
> +			}
> +			/* Double check it is still the same pinned page */
> +			if (pte_page(*pte) == head && pageshift == compshift)
> +				pageshift = max_t(unsigned int, pageshift,
> +						PAGE_SHIFT);
> +		}
> +		mem->pageshift = min(mem->pageshift, pageshift);
>  		mem->hpas[i] = page_to_pfn(page) << PAGE_SHIFT;
>  	}
>  
> @@ -349,7 +378,7 @@ struct mm_iommu_table_group_mem_t *mm_iommu_find(struct mm_struct *mm,
>  EXPORT_SYMBOL_GPL(mm_iommu_find);
>  
>  long mm_iommu_ua_to_hpa(struct mm_iommu_table_group_mem_t *mem,
> -		unsigned long ua, unsigned long *hpa)
> +		unsigned long ua, unsigned int pageshift, unsigned long *hpa)
>  {
>  	const long entry = (ua - mem->ua) >> PAGE_SHIFT;
>  	u64 *va = &mem->hpas[entry];
> @@ -357,6 +386,9 @@ long mm_iommu_ua_to_hpa(struct mm_iommu_table_group_mem_t *mem,
>  	if (entry >= mem->entries)
>  		return -EFAULT;
>  
> +	if (pageshift > mem->pageshift)
> +		return -EFAULT;
> +
>  	*hpa = *va | (ua & ~PAGE_MASK);
>  
>  	return 0;
> @@ -364,7 +396,7 @@ long mm_iommu_ua_to_hpa(struct mm_iommu_table_group_mem_t *mem,
>  EXPORT_SYMBOL_GPL(mm_iommu_ua_to_hpa);
>  
>  long mm_iommu_ua_to_hpa_rm(struct mm_iommu_table_group_mem_t *mem,
> -		unsigned long ua, unsigned long *hpa)
> +		unsigned long ua, unsigned int pageshift, unsigned long *hpa)
>  {
>  	const long entry = (ua - mem->ua) >> PAGE_SHIFT;
>  	void *va = &mem->hpas[entry];
> @@ -373,6 +405,9 @@ long mm_iommu_ua_to_hpa_rm(struct mm_iommu_table_group_mem_t *mem,
>  	if (entry >= mem->entries)
>  		return -EFAULT;
>  
> +	if (pageshift > mem->pageshift)
> +		return -EFAULT;
> +
>  	pa = (void *) vmalloc_to_phys(va);
>  	if (!pa)
>  		return -EFAULT;
> diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
> index 2da5f05..7cd63b0 100644
> --- a/drivers/vfio/vfio_iommu_spapr_tce.c
> +++ b/drivers/vfio/vfio_iommu_spapr_tce.c
> @@ -467,7 +467,7 @@ static int tce_iommu_prereg_ua_to_hpa(struct tce_container *container,
>  	if (!mem)
>  		return -EINVAL;
>  
> -	ret = mm_iommu_ua_to_hpa(mem, tce, phpa);
> +	ret = mm_iommu_ua_to_hpa(mem, tce, shift, phpa);
>  	if (ret)
>  		return -EINVAL;
>  
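
The page-size bookkeeping described in the commit message can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the kernel code: SKETCH_PAGE_SHIFT (64K base pages), lowest_set_bit() (a stand-in for the kernel's __ffs()), and the three function names are hypothetical. It shows the three steps the patch adds: start from the natural alignment of @ua and the region length, clamp to the shift with which each pinned page is mapped, and reject any IOMMU page shift above the result.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 16u	/* assumption: 64K base pages */

/* Index of the lowest set bit; the caller must pass a non-zero value. */
unsigned int lowest_set_bit(uint64_t x)
{
	unsigned int n = 0;

	while (!(x & 1)) {
		x >>= 1;
		n++;
	}
	return n;
}

/*
 * Starting point: the natural alignment of the region's start address
 * and of its length in bytes. @entries must be non-zero.
 */
unsigned int natural_alignment_shift(uint64_t ua, uint64_t entries)
{
	return lowest_set_bit(ua | (entries << SKETCH_PAGE_SHIFT));
}

/* Per-page step: keep the smallest shift seen so far. */
unsigned int clamp_pageshift(unsigned int region_shift,
			     unsigned int page_shift)
{
	return page_shift < region_shift ? page_shift : region_shift;
}

/* Lookup-time check: 0 if the IOMMU page fits, -1 otherwise. */
int iommu_pageshift_allowed(unsigned int region_shift,
			    unsigned int iommu_shift)
{
	return iommu_shift > region_shift ? -1 : 0;
}
```

With these helpers, a 16MB-aligned, 16MB-long region backed only by 64K pages ends up with a region shift of 16, so a request to map it with 16MB IOMMU pages (shift 24) is refused, which is the case the patch is meant to catch.
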

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH kernel v3 3/6] KVM: PPC: Make iommu_table::it_userspace big endian
From: Paul Mackerras @ 2018-07-15 23:37 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: linuxppc-dev, David Gibson, kvm-ppc, kvm, Alex Williamson,
	Benjamin Herrenschmidt, Michael Ellerman, Russell Currey
In-Reply-To: <20180704061349.20742-4-aik@ozlabs.ru>

On Wed, Jul 04, 2018 at 04:13:46PM +1000, Alexey Kardashevskiy wrote:
> We are going to reuse multilevel TCE code for the userspace copy of
> the TCE table and since it is big endian, let's make the copy big endian
> too.
> 
> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>

Acked-by: Paul Mackerras <paulus@ozlabs.org>

^ permalink raw reply

* Re: [PATCH 2/2] powerpc: Add ppc64le and ppc64_book3e allmodconfig targets
From: Randy Dunlap @ 2018-07-15 19:40 UTC (permalink / raw)
  To: Michael Ellerman, linuxppc-dev; +Cc: npiggin, malat, Felipe Balbi
In-Reply-To: <20180709142426.26999-2-mpe@ellerman.id.au>

On 07/09/18 07:24, Michael Ellerman wrote:
> Similar to what we just did for 32-bit, add phony targets for generating
> a little endian and Book3E allmodconfig. These aren't covered by the
> regular allmodconfig, which is big endian and Book3S due to the way
> the Kconfig symbols are structured.

[adding Felipe Balbi]


Is book3e allmodconfig not seen/used very much?

Besides the patches that I have already sent, I am seeing a build problem
with ppc64_book3e_allmodconfig, where we have:

CONFIG_USB_PHY=y
CONFIG_FSL_USB2_OTG=y
but
CONFIG_USB_OTG_FSM=m

In drivers/usb/phy/Kconfig, FSL_USB2_OTG depends on USB_OTG_FSM (among
other things), but!  FSL_USB2_OTG is a bool symbol, depending on a
tristate symbol.  This often causes problems.  In this case it causes errors
with a builtin driver trying to use symbols that are built in a loadable module:

drivers/usb/phy/phy-fsl-usb.o: In function `.fsl_otg_ioctl':
phy-fsl-usb.c:(.text.fsl_otg_ioctl+0xb4): undefined reference to `.otg_statemachine'
drivers/usb/phy/phy-fsl-usb.o: In function `.fsl_otg_start_srp':
phy-fsl-usb.c:(.text.fsl_otg_start_srp+0x4c): undefined reference to `.otg_statemachine'
drivers/usb/phy/phy-fsl-usb.o: In function `.fsl_otg_set_host':
phy-fsl-usb.c:(.text.fsl_otg_set_host+0xd0): undefined reference to `.otg_statemachine'
drivers/usb/phy/phy-fsl-usb.o: In function `.fsl_otg_start_hnp':
phy-fsl-usb.c:(.text.fsl_otg_start_hnp+0x68): undefined reference to `.otg_statemachine'
drivers/usb/phy/phy-fsl-usb.o: In function `.show_fsl_usb2_otg_state':
phy-fsl-usb.c:(.text.show_fsl_usb2_otg_state+0x154): undefined reference to `.usb_otg_state_string'
drivers/usb/phy/phy-fsl-usb.o: In function `.a_wait_enum':
(.text.a_wait_enum+0x4c): undefined reference to `.otg_statemachine'
drivers/usb/phy/phy-fsl-usb.o: In function `.fsl_otg_set_peripheral':
phy-fsl-usb.c:(.text.fsl_otg_set_peripheral+0x84): undefined reference to `.usb_gadget_vbus_disconnect'
phy-fsl-usb.c:(.text.fsl_otg_set_peripheral+0x9c): undefined reference to `.otg_statemachine'



> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
> ---
>  arch/powerpc/Makefile | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
> index 2556c2182789..48e887f03a6c 100644
> --- a/arch/powerpc/Makefile
> +++ b/arch/powerpc/Makefile
> @@ -359,6 +359,16 @@ ppc32_allmodconfig:
>  	$(Q)$(MAKE) KCONFIG_ALLCONFIG=$(srctree)/arch/powerpc/configs/book3s_32.config \
>  		-f $(srctree)/Makefile allmodconfig
>  
> +PHONY += ppc64le_allmodconfig
> +ppc64le_allmodconfig:
> +	$(Q)$(MAKE) KCONFIG_ALLCONFIG=$(srctree)/arch/powerpc/configs/le.config \
> +		-f $(srctree)/Makefile allmodconfig
> +
> +PHONY += ppc64_book3e_allmodconfig
> +ppc64_book3e_allmodconfig:
> +	$(Q)$(MAKE) KCONFIG_ALLCONFIG=$(srctree)/arch/powerpc/configs/85xx-64bit.config \
> +		-f $(srctree)/Makefile allmodconfig
> +
>  define archhelp
>    @echo '* zImage          - Build default images selected by kernel config'
>    @echo '  zImage.*        - Compressed kernel image (arch/$(ARCH)/boot/zImage.*)'
> 

thanks,
-- 
~Randy

^ permalink raw reply

* [PATCH] usb/phy: fix PPC64 build errors in phy-fsl-usb.c
From: Randy Dunlap @ 2018-07-15 17:37 UTC (permalink / raw)
  To: USB list, PowerPC; +Cc: Felipe Balbi, Michael Ellerman, Greg Kroah-Hartman

From: Randy Dunlap <rdunlap@infradead.org>

Fix build errors when built for PPC64:
These variables are only used on PPC32 so they don't need to be
initialized for PPC64.

../drivers/usb/phy/phy-fsl-usb.c: In function 'usb_otg_start':
../drivers/usb/phy/phy-fsl-usb.c:865:3: error: '_fsl_readl' undeclared (first use in this function); did you mean 'fsl_readl'?
   _fsl_readl = _fsl_readl_be;
../drivers/usb/phy/phy-fsl-usb.c:865:16: error: '_fsl_readl_be' undeclared (first use in this function); did you mean 'fsl_readl'?
   _fsl_readl = _fsl_readl_be;
../drivers/usb/phy/phy-fsl-usb.c:866:3: error: '_fsl_writel' undeclared (first use in this function); did you mean 'fsl_writel'?
   _fsl_writel = _fsl_writel_be;
../drivers/usb/phy/phy-fsl-usb.c:866:17: error: '_fsl_writel_be' undeclared (first use in this function); did you mean 'fsl_writel'?
   _fsl_writel = _fsl_writel_be;
../drivers/usb/phy/phy-fsl-usb.c:868:16: error: '_fsl_readl_le' undeclared (first use in this function); did you mean 'fsl_readl'?
   _fsl_readl = _fsl_readl_le;
../drivers/usb/phy/phy-fsl-usb.c:869:17: error: '_fsl_writel_le' undeclared (first use in this function); did you mean 'fsl_writel'?
   _fsl_writel = _fsl_writel_le;

and the sysfs "show" function return type should be ssize_t, not int:

../drivers/usb/phy/phy-fsl-usb.c:1042:49: error: initialization of 'ssize_t (*)(struct device *, struct device_attribute *, char *)' {aka 'long int (*)(struct device *, struct device_attribute *, char *)'} from incompatible pointer type 'int (*)(struct device *, struct device_attribute *, char *)' [-Werror=incompatible-pointer-types]
 static DEVICE_ATTR(fsl_usb2_otg_state, S_IRUGO, show_fsl_usb2_otg_state, NULL);

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Felipe Balbi <balbi@kernel.org>
Cc: linux-usb@vger.kernel.org
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
---
Found when using Michael's patch for ppc64_book3e_allmodconfig.

 drivers/usb/phy/phy-fsl-usb.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

--- lnx-418-rc4.orig/drivers/usb/phy/phy-fsl-usb.c
+++ lnx-418-rc4/drivers/usb/phy/phy-fsl-usb.c
@@ -861,6 +861,7 @@ int usb_otg_start(struct platform_device
 	if (pdata->init && pdata->init(pdev) != 0)
 		return -EINVAL;
 
+#ifdef CONFIG_PPC32
 	if (pdata->big_endian_mmio) {
 		_fsl_readl = _fsl_readl_be;
 		_fsl_writel = _fsl_writel_be;
@@ -868,6 +869,7 @@ int usb_otg_start(struct platform_device
 		_fsl_readl = _fsl_readl_le;
 		_fsl_writel = _fsl_writel_le;
 	}
+#endif
 
 	/* request irq */
 	p_otg->irq = platform_get_irq(pdev, 0);
@@ -958,7 +960,7 @@ int usb_otg_start(struct platform_device
 /*
  * state file in sysfs
  */
-static int show_fsl_usb2_otg_state(struct device *dev,
+static ssize_t show_fsl_usb2_otg_state(struct device *dev,
 				   struct device_attribute *attr, char *buf)
 {
 	struct otg_fsm *fsm = &fsl_otg_dev->fsm;
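
The return-type part of the fix can be illustrated outside the kernel. The sketch below is a userspace approximation, not the driver code: show_fn, show_state() and read_attr() are hypothetical names, and the callback takes only the buffer instead of the full (dev, attr, buf) argument list. The compiler output quoted above shows ssize_t to be 'long int' on this PPC64 build, so a callback declared to return int does not match the slot; on a target where ssize_t is plain int the mismatch would presumably go unnoticed.

```c
#include <stdio.h>
#include <sys/types.h>	/* ssize_t */

/* Hypothetical stand-in for the "show" slot of struct device_attribute. */
typedef ssize_t (*show_fn)(char *buf);

/* Correct: the return type matches the slot exactly. */
ssize_t show_state(char *buf)
{
	return sprintf(buf, "%s\n", "b_idle");
}

/* Calls through the pointer, roughly as the sysfs core would. */
ssize_t read_attr(show_fn show, char *buf)
{
	return show(buf);
}
```

Declaring show_state() with an int return type and passing it to read_attr() would draw the same kind of incompatible-pointer-type diagnostic on an LP64 target.
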

^ permalink raw reply

* [PATCH] powerpc/platforms/85xx: fix t1042rdb_diu.c build errors & warning
From: Randy Dunlap @ 2018-07-15 17:34 UTC (permalink / raw)
  To: PowerPC
  Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Scott Wood, Kumar Gala

From: Randy Dunlap <rdunlap@infradead.org>

Fix build errors and warnings in t1042rdb_diu.c by adding header files
and MODULE_LICENSE().

../arch/powerpc/platforms/85xx/t1042rdb_diu.c:152:1: warning: data definition has no type or storage class
 early_initcall(t1042rdb_diu_init);
../arch/powerpc/platforms/85xx/t1042rdb_diu.c:152:1: error: type defaults to 'int' in declaration of 'early_initcall' [-Werror=implicit-int]
../arch/powerpc/platforms/85xx/t1042rdb_diu.c:152:1: warning: parameter names (without types) in function declaration

and
WARNING: modpost: missing MODULE_LICENSE() in arch/powerpc/platforms/85xx/t1042rdb_diu.o

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Scott Wood <oss@buserror.net>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: linuxppc-dev@lists.ozlabs.org
---
Found when using Michael's patch for ppc64_book3e_allmodconfig.

 arch/powerpc/platforms/85xx/t1042rdb_diu.c |    4 ++++
 1 file changed, 4 insertions(+)

--- lnx-418-rc4.orig/arch/powerpc/platforms/85xx/t1042rdb_diu.c
+++ lnx-418-rc4/arch/powerpc/platforms/85xx/t1042rdb_diu.c
@@ -9,8 +9,10 @@
  * option) any later version.
  */
 
+#include <linux/init.h>
 #include <linux/io.h>
 #include <linux/kernel.h>
+#include <linux/module.h>
 #include <linux/of.h>
 #include <linux/of_address.h>
 
@@ -150,3 +152,5 @@ static int __init t1042rdb_diu_init(void
 }
 
 early_initcall(t1042rdb_diu_init);
+
+MODULE_LICENSE("GPL");

^ permalink raw reply

* [PATCH v5 2/2] hwmon: ibmpowernv: Add attributes to enable/disable sensor groups
From: Shilpasri G Bhat @ 2018-07-15  6:54 UTC (permalink / raw)
  To: mpe, linux; +Cc: linuxppc-dev, linux-hwmon, linux-kernel, ego, Shilpasri G Bhat
In-Reply-To: <1531637685-23250-1-git-send-email-shilpa.bhat@linux.vnet.ibm.com>

On-Chip-Controller (OCC) is an embedded microprocessor in the POWER9 chip
which measures various system and chip level sensors. These sensors
comprise environmental sensors (like power, temperature, current
and voltage) and performance sensors (like utilization, frequency).
All these sensors are copied to main memory at a regular interval of
100ms. OCC provides a way to select a group of sensors that is copied
to the main memory to increase the update frequency of selected sensor
groups. When a sensor-group is disabled, OCC will not copy it to main
memory and those sensors read 0 values.

This patch provides support for enabling/disabling the sensor groups
like power, temperature, current and voltage. This patch adds a new
per-sensor sysfs attribute to disable and enable them.

Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
---
Changes from v4:
- As per Mpe's suggestion store device_node instead of phandles and
  clean it after init
- s/sg_data/sgrp_data

 Documentation/hwmon/ibmpowernv |  43 ++++++-
 drivers/hwmon/ibmpowernv.c     | 256 +++++++++++++++++++++++++++++++++++------
 2 files changed, 265 insertions(+), 34 deletions(-)

diff --git a/Documentation/hwmon/ibmpowernv b/Documentation/hwmon/ibmpowernv
index 8826ba2..5646825 100644
--- a/Documentation/hwmon/ibmpowernv
+++ b/Documentation/hwmon/ibmpowernv
@@ -33,9 +33,48 @@ fanX_input		Measured RPM value.
 fanX_min		Threshold RPM for alert generation.
 fanX_fault		0: No fail condition
 			1: Failing fan
+
 tempX_input		Measured ambient temperature.
 tempX_max		Threshold ambient temperature for alert generation.
-inX_input		Measured power supply voltage
+tempX_highest		Historical maximum temperature
+tempX_lowest		Historical minimum temperature
+tempX_enable		Enable/disable all temperature sensors belonging to the
+			sub-group. In POWER9, this attribute corresponds to
+			each OCC. Using this attribute each OCC can be asked to
+			disable/enable all of its temperature sensors.
+			1: Enable
+			0: Disable
+
+inX_input		Measured power supply voltage (millivolt)
 inX_fault		0: No fail condition.
 			1: Failing power supply.
-power1_input		System power consumption (microWatt)
+inX_highest		Historical maximum voltage
+inX_lowest		Historical minimum voltage
+inX_enable		Enable/disable all voltage sensors belonging to the
+			sub-group. In POWER9, this attribute corresponds to
+			each OCC. Using this attribute each OCC can be asked to
+			disable/enable all of its voltage sensors.
+			1: Enable
+			0: Disable
+
+powerX_input		Power consumption (microWatt)
+powerX_input_highest	Historical maximum power
+powerX_input_lowest	Historical minimum power
+powerX_enable		Enable/disable all power sensors belonging to the
+			sub-group. In POWER9, this attribute corresponds to
+			each OCC. Using this attribute each OCC can be asked to
+			disable/enable all of its power sensors.
+			1: Enable
+			0: Disable
+
+currX_input		Measured current (milliampere)
+currX_highest		Historical maximum current
+currX_lowest		Historical minimum current
+currX_enable		Enable/disable all current sensors belonging to the
+			sub-group. In POWER9, this attribute corresponds to
+			each OCC. Using this attribute each OCC can be asked to
+			disable/enable all of its current sensors.
+			1: Enable
+			0: Disable
+
+energyX_input		Cumulative energy (microJoule)
diff --git a/drivers/hwmon/ibmpowernv.c b/drivers/hwmon/ibmpowernv.c
index f829dad..a509b9b 100644
--- a/drivers/hwmon/ibmpowernv.c
+++ b/drivers/hwmon/ibmpowernv.c
@@ -90,11 +90,23 @@ struct sensor_data {
 	char label[MAX_LABEL_LEN];
 	char name[MAX_ATTR_LEN];
 	struct device_attribute dev_attr;
+	struct sensor_group_data *sgrp_data;
+};
+
+struct sensor_group_data {
+	struct mutex mutex;
+	struct device_node **of_nodes;
+	u32 gid;
+	u32 nr_nodes;
+	enum sensors type;
+	bool enable;
 };
 
 struct platform_data {
 	const struct attribute_group *attr_groups[MAX_SENSOR_TYPE + 1];
+	struct sensor_group_data *sgrp_data;
 	u32 sensors_count; /* Total count of sensors from each group */
+	u32 nr_sensor_groups; /* Total number of sensor groups */
 };
 
 static ssize_t show_sensor(struct device *dev, struct device_attribute *devattr,
@@ -105,6 +117,9 @@ static ssize_t show_sensor(struct device *dev, struct device_attribute *devattr,
 	ssize_t ret;
 	u64 x;
 
+	if (sdata->sgrp_data && !sdata->sgrp_data->enable)
+		return -ENODATA;
+
 	ret =  opal_get_sensor_data_u64(sdata->id, &x);
 
 	if (ret)
@@ -120,6 +135,46 @@ static ssize_t show_sensor(struct device *dev, struct device_attribute *devattr,
 	return sprintf(buf, "%llu\n", x);
 }
 
+static ssize_t show_enable(struct device *dev,
+			   struct device_attribute *devattr, char *buf)
+{
+	struct sensor_data *sdata = container_of(devattr, struct sensor_data,
+						 dev_attr);
+
+	return sprintf(buf, "%u\n", sdata->sgrp_data->enable);
+}
+
+static ssize_t store_enable(struct device *dev,
+			    struct device_attribute *devattr,
+			    const char *buf, size_t count)
+{
+	struct sensor_data *sdata = container_of(devattr, struct sensor_data,
+						 dev_attr);
+	struct sensor_group_data *sgrp_data = sdata->sgrp_data;
+	bool data;
+	int ret;
+
+	ret = kstrtobool(buf, &data);
+	if (ret)
+		return ret;
+
+	ret = mutex_lock_interruptible(&sgrp_data->mutex);
+	if (ret)
+		return ret;
+
+	if (data != sgrp_data->enable) {
+		ret =  sensor_group_enable(sgrp_data->gid, data);
+		if (!ret)
+			sgrp_data->enable = data;
+	}
+
+	if (!ret)
+		ret = count;
+
+	mutex_unlock(&sgrp_data->mutex);
+	return ret;
+}
+
 static ssize_t show_label(struct device *dev, struct device_attribute *devattr,
 			  char *buf)
 {
@@ -292,12 +347,129 @@ static u32 get_sensor_hwmon_index(struct sensor_data *sdata,
 	return ++sensor_groups[sdata->type].hwmon_index;
 }
 
+static int init_sensor_group_data(struct platform_device *pdev,
+				  struct platform_data *pdata)
+{
+	struct sensor_group_data *sgrp_data;
+	struct device_node *groups, *sgrp;
+	enum sensors type;
+	int count = 0, ret = 0;
+
+	groups = of_find_node_by_path("/ibm,opal/sensor-groups");
+	if (!groups)
+		return ret;
+
+	for_each_child_of_node(groups, sgrp) {
+		type = get_sensor_type(sgrp);
+		if (type != MAX_SENSOR_TYPE)
+			pdata->nr_sensor_groups++;
+	}
+
+	if (!pdata->nr_sensor_groups)
+		goto out;
+
+	sgrp_data = devm_kcalloc(&pdev->dev, pdata->nr_sensor_groups,
+				 sizeof(*sgrp_data), GFP_KERNEL);
+	if (!sgrp_data) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	for_each_child_of_node(groups, sgrp) {
+		const __be32 *phandles;
+		int len, gid, i, k = 0;
+
+		type = get_sensor_type(sgrp);
+		if (type == MAX_SENSOR_TYPE)
+			continue;
+
+		if (of_property_read_u32(sgrp, "sensor-group-id", &gid))
+			continue;
+
+		phandles = of_get_property(sgrp, "sensors", &len);
+		if (!phandles)
+			continue;
+
+		len /= sizeof(u32);
+		if (!len)
+			continue;
+
+		sgrp_data[count].of_nodes = devm_kcalloc(&pdev->dev,
+						sizeof(struct device_node *),
+						len, GFP_KERNEL);
+		if (!sgrp_data[count].of_nodes) {
+			ret = -ENOMEM;
+			of_node_put(sgrp);
+			goto out;
+		}
+
+		for (i = 0; i < len; i++) {
+			struct device_node *node;
+
+			node = of_parse_phandle(sgrp, "sensors", i);
+			if (!node)
+				continue;
+			sgrp_data[count].of_nodes[k++] = node;
+		}
+
+		sensor_groups[type].attr_count++;
+		sgrp_data[count].gid = gid;
+		sgrp_data[count].type = type;
+		sgrp_data[count].nr_nodes = len;
+		mutex_init(&sgrp_data[count].mutex);
+		sgrp_data[count++].enable = false;
+	}
+	pdata->sgrp_data = sgrp_data;
+out:
+	of_node_put(groups);
+	return ret;
+}
+
+static struct sensor_group_data *get_sensor_group(struct platform_data *pdata,
+						  struct device_node *node,
+						  enum sensors type)
+{
+	struct sensor_group_data *sgrp_data = pdata->sgrp_data;
+	int i, j;
+
+	for (i = 0; i < pdata->nr_sensor_groups; i++) {
+		if (type != sgrp_data[i].type)
+			continue;
+
+		for (j = 0; j < sgrp_data[i].nr_nodes; j++)
+			if (sgrp_data[i].of_nodes[j] == node)
+				return &sgrp_data[i];
+	}
+
+	return NULL;
+}
+
+static void clean_sensor_group_of_node(struct platform_device *pdev)
+{
+	struct platform_data *pdata = platform_get_drvdata(pdev);
+	struct sensor_group_data *sgrp_data = pdata->sgrp_data;
+	int i, j;
+
+	for (i = 0; i < pdata->nr_sensor_groups; i++) {
+		for (j = 0; j < sgrp_data[i].nr_nodes; j++)
+			of_node_put(sgrp_data[i].of_nodes[j]);
+
+		devm_kfree(&pdev->dev, sgrp_data[i].of_nodes);
+		sgrp_data[i].of_nodes = NULL;
+	}
+}
+
 static int populate_attr_groups(struct platform_device *pdev)
 {
 	struct platform_data *pdata = platform_get_drvdata(pdev);
 	const struct attribute_group **pgroups = pdata->attr_groups;
 	struct device_node *opal, *np;
 	enum sensors type;
+	int ret;
+
+	ret = init_sensor_group_data(pdev, pdata);
+	if (ret)
+		return ret;
 
 	opal = of_find_node_by_path("/ibm,opal/sensors");
 	for_each_child_of_node(opal, np) {
@@ -344,7 +516,10 @@ static int populate_attr_groups(struct platform_device *pdev)
 static void create_hwmon_attr(struct sensor_data *sdata, const char *attr_name,
 			      ssize_t (*show)(struct device *dev,
 					      struct device_attribute *attr,
-					      char *buf))
+					      char *buf),
+			    ssize_t (*store)(struct device *dev,
+					     struct device_attribute *attr,
+					     const char *buf, size_t count))
 {
 	snprintf(sdata->name, MAX_ATTR_LEN, "%s%d_%s",
 		 sensor_groups[sdata->type].name, sdata->hwmon_index,
@@ -352,23 +527,33 @@ static void create_hwmon_attr(struct sensor_data *sdata, const char *attr_name,
 
 	sysfs_attr_init(&sdata->dev_attr.attr);
 	sdata->dev_attr.attr.name = sdata->name;
-	sdata->dev_attr.attr.mode = S_IRUGO;
 	sdata->dev_attr.show = show;
+	if (store) {
+		sdata->dev_attr.store = store;
+		sdata->dev_attr.attr.mode = 0664;
+	} else {
+		sdata->dev_attr.attr.mode = 0444;
+	}
 }
 
 static void populate_sensor(struct sensor_data *sdata, int od, int hd, int sid,
 			    const char *attr_name, enum sensors type,
 			    const struct attribute_group *pgroup,
+			    struct sensor_group_data *sgrp_data,
 			    ssize_t (*show)(struct device *dev,
 					    struct device_attribute *attr,
-					    char *buf))
+					    char *buf),
+			    ssize_t (*store)(struct device *dev,
+					     struct device_attribute *attr,
+					     const char *buf, size_t count))
 {
 	sdata->id = sid;
 	sdata->type = type;
 	sdata->opal_index = od;
 	sdata->hwmon_index = hd;
-	create_hwmon_attr(sdata, attr_name, show);
+	create_hwmon_attr(sdata, attr_name, show, store);
 	pgroup->attrs[sensor_groups[type].attr_count++] = &sdata->dev_attr.attr;
+	sdata->sgrp_data = sgrp_data;
 }
 
 static char *get_max_attr(enum sensors type)
@@ -403,24 +588,23 @@ static int create_device_attrs(struct platform_device *pdev)
 	const struct attribute_group **pgroups = pdata->attr_groups;
 	struct device_node *opal, *np;
 	struct sensor_data *sdata;
-	u32 sensor_id;
-	enum sensors type;
 	u32 count = 0;
-	int err = 0;
+	u32 group_attr_id[MAX_SENSOR_TYPE] = {0};
 
-	opal = of_find_node_by_path("/ibm,opal/sensors");
 	sdata = devm_kcalloc(&pdev->dev,
 			     pdata->sensors_count, sizeof(*sdata),
 			     GFP_KERNEL);
-	if (!sdata) {
-		err = -ENOMEM;
-		goto exit_put_node;
-	}
+	if (!sdata)
+		return -ENOMEM;
 
+	opal = of_find_node_by_path("/ibm,opal/sensors");
 	for_each_child_of_node(opal, np) {
+		struct sensor_group_data *sgrp_data;
 		const char *attr_name;
-		u32 opal_index;
+		u32 opal_index, hw_id;
+		u32 sensor_id;
 		const char *label;
+		enum sensors type;
 
 		if (np->name == NULL)
 			continue;
@@ -456,14 +640,12 @@ static int create_device_attrs(struct platform_device *pdev)
 			opal_index = INVALID_INDEX;
 		}
 
-		sdata[count].opal_index = opal_index;
-		sdata[count].hwmon_index =
-			get_sensor_hwmon_index(&sdata[count], sdata, count);
-
-		create_hwmon_attr(&sdata[count], attr_name, show_sensor);
-
-		pgroups[type]->attrs[sensor_groups[type].attr_count++] =
-				&sdata[count++].dev_attr.attr;
+		hw_id = get_sensor_hwmon_index(&sdata[count], sdata, count);
+		sgrp_data = get_sensor_group(pdata, np, type);
+		populate_sensor(&sdata[count], opal_index, hw_id, sensor_id,
+				attr_name, type, pgroups[type], sgrp_data,
+				show_sensor, NULL);
+		count++;
 
 		if (!of_property_read_string(np, "label", &label)) {
 			/*
@@ -474,35 +656,44 @@ static int create_device_attrs(struct platform_device *pdev)
 			 */
 
 			make_sensor_label(np, &sdata[count], label);
-			populate_sensor(&sdata[count], opal_index,
-					sdata[count - 1].hwmon_index,
+			populate_sensor(&sdata[count], opal_index, hw_id,
 					sensor_id, "label", type, pgroups[type],
-					show_label);
+					NULL, show_label, NULL);
 			count++;
 		}
 
 		if (!of_property_read_u32(np, "sensor-data-max", &sensor_id)) {
 			attr_name = get_max_attr(type);
-			populate_sensor(&sdata[count], opal_index,
-					sdata[count - 1].hwmon_index,
+			populate_sensor(&sdata[count], opal_index, hw_id,
 					sensor_id, attr_name, type,
-					pgroups[type], show_sensor);
+					pgroups[type], sgrp_data, show_sensor,
+					NULL);
 			count++;
 		}
 
 		if (!of_property_read_u32(np, "sensor-data-min", &sensor_id)) {
 			attr_name = get_min_attr(type);
-			populate_sensor(&sdata[count], opal_index,
-					sdata[count - 1].hwmon_index,
+			populate_sensor(&sdata[count], opal_index, hw_id,
 					sensor_id, attr_name, type,
-					pgroups[type], show_sensor);
+					pgroups[type], sgrp_data, show_sensor,
+					NULL);
+			count++;
+		}
+
+		if (sgrp_data && !sgrp_data->enable) {
+			sgrp_data->enable = true;
+			hw_id = ++group_attr_id[type];
+			populate_sensor(&sdata[count], opal_index, hw_id,
+					sgrp_data->gid, "enable", type,
+					pgroups[type], sgrp_data, show_enable,
+					store_enable);
 			count++;
 		}
 	}
 
-exit_put_node:
 	of_node_put(opal);
-	return err;
+	clean_sensor_group_of_node(pdev);
+	return 0;
 }
 
 static int ibmpowernv_probe(struct platform_device *pdev)
@@ -517,6 +708,7 @@ static int ibmpowernv_probe(struct platform_device *pdev)
 
 	platform_set_drvdata(pdev, pdata);
 	pdata->sensors_count = 0;
+	pdata->nr_sensor_groups = 0;
 	err = populate_attr_groups(pdev);
 	if (err)
 		return err;
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH v5 1/2] powernv:opal-sensor-groups: Add support to enable sensor groups
From: Shilpasri G Bhat @ 2018-07-15  6:54 UTC (permalink / raw)
  To: mpe, linux; +Cc: linuxppc-dev, linux-hwmon, linux-kernel, ego, Shilpasri G Bhat
In-Reply-To: <1531637685-23250-1-git-send-email-shilpa.bhat@linux.vnet.ibm.com>

Adds support to enable/disable a sensor group at runtime. This
can be used to select the sensor groups that need to be copied to
main memory by OCC. Sensor groups like power, temperature, current,
voltage, frequency, utilization can be enabled/disabled at runtime.

Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/opal-api.h                |  1 +
 arch/powerpc/include/asm/opal.h                    |  2 ++
 .../powerpc/platforms/powernv/opal-sensor-groups.c | 28 ++++++++++++++++++++++
 arch/powerpc/platforms/powernv/opal-wrappers.S     |  1 +
 4 files changed, 32 insertions(+)

diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h
index 3bab299..56a94a1 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -206,6 +206,7 @@
 #define OPAL_NPU_SPA_CLEAR_CACHE		160
 #define OPAL_NPU_TL_SET				161
 #define OPAL_SENSOR_READ_U64			162
+#define OPAL_SENSOR_GROUP_ENABLE		163
 #define OPAL_PCI_GET_PBCQ_TUNNEL_BAR		164
 #define OPAL_PCI_SET_PBCQ_TUNNEL_BAR		165
 #define OPAL_LAST				165
diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index e1b2910..fc0550e 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -292,6 +292,7 @@ int64_t opal_imc_counters_init(uint32_t type, uint64_t address,
 int opal_get_power_shift_ratio(u32 handle, int token, u32 *psr);
 int opal_set_power_shift_ratio(u32 handle, int token, u32 psr);
 int opal_sensor_group_clear(u32 group_hndl, int token);
+int opal_sensor_group_enable(u32 group_hndl, int token, bool enable);
 
 s64 opal_signal_system_reset(s32 cpu);
 s64 opal_quiesce(u64 shutdown_type, s32 cpu);
@@ -326,6 +327,7 @@ extern int opal_async_wait_response_interruptible(uint64_t token,
 		struct opal_msg *msg);
 extern int opal_get_sensor_data(u32 sensor_hndl, u32 *sensor_data);
 extern int opal_get_sensor_data_u64(u32 sensor_hndl, u64 *sensor_data);
+extern int sensor_group_enable(u32 grp_hndl, bool enable);
 
 struct rtc_time;
 extern time64_t opal_get_boot_time(void);
diff --git a/arch/powerpc/platforms/powernv/opal-sensor-groups.c b/arch/powerpc/platforms/powernv/opal-sensor-groups.c
index 541c9ea..f7d04b6 100644
--- a/arch/powerpc/platforms/powernv/opal-sensor-groups.c
+++ b/arch/powerpc/platforms/powernv/opal-sensor-groups.c
@@ -32,6 +32,34 @@ struct sg_attr {
 	struct sg_attr *sgattrs;
 } *sgs;
 
+int sensor_group_enable(u32 handle, bool enable)
+{
+	struct opal_msg msg;
+	int token, ret;
+
+	token = opal_async_get_token_interruptible();
+	if (token < 0)
+		return token;
+
+	ret = opal_sensor_group_enable(handle, token, enable);
+	if (ret == OPAL_ASYNC_COMPLETION) {
+		ret = opal_async_wait_response(token, &msg);
+		if (ret) {
+			pr_devel("Failed to wait for the async response\n");
+			ret = -EIO;
+			goto out;
+		}
+		ret = opal_error_code(opal_get_async_rc(msg));
+	} else {
+		ret = opal_error_code(ret);
+	}
+
+out:
+	opal_async_release_token(token);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(sensor_group_enable);
+
 static ssize_t sg_store(struct kobject *kobj, struct kobj_attribute *attr,
 			const char *buf, size_t count)
 {
diff --git a/arch/powerpc/platforms/powernv/opal-wrappers.S b/arch/powerpc/platforms/powernv/opal-wrappers.S
index a8d9b40..8268a1e 100644
--- a/arch/powerpc/platforms/powernv/opal-wrappers.S
+++ b/arch/powerpc/platforms/powernv/opal-wrappers.S
@@ -327,3 +327,4 @@ OPAL_CALL(opal_npu_tl_set,			OPAL_NPU_TL_SET);
 OPAL_CALL(opal_pci_get_pbcq_tunnel_bar,		OPAL_PCI_GET_PBCQ_TUNNEL_BAR);
 OPAL_CALL(opal_pci_set_pbcq_tunnel_bar,		OPAL_PCI_SET_PBCQ_TUNNEL_BAR);
 OPAL_CALL(opal_sensor_read_u64,			OPAL_SENSOR_READ_U64);
+OPAL_CALL(opal_sensor_group_enable,		OPAL_SENSOR_GROUP_ENABLE);
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH v5 0/2] hwmon/powernv: Add attributes to enable/disable sensors
From: Shilpasri G Bhat @ 2018-07-15  6:54 UTC (permalink / raw)
  To: mpe, linux; +Cc: linuxppc-dev, linux-hwmon, linux-kernel, ego, Shilpasri G Bhat

This patch series adds a new attribute to enable or disable a sensor at
runtime.
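
As a rough illustration of how such an attribute might be driven from
userspace, here is a minimal sketch. The helper name set_sensor_enable
and the sysfs path in the usage comment are hypothetical, and the sketch
assumes the attribute accepts 0 and 1, as the hwmon sysfs ABI describes
for *_enable files:

```shell
# Write 0 (disable) or 1 (enable) to an *_enable attribute file.
# $1: path to the attribute file (hypothetical, see usage below)
# $2: 0 or 1
set_sensor_enable() {
    attr=$1
    val=$2
    case "$val" in
        0|1) ;;
        *) echo "value must be 0 or 1" >&2; return 1 ;;
    esac
    printf '%s\n' "$val" > "$attr"
}

# Usage (path is an assumption; the hwmon index and sensor name
# depend on the system):
#   set_sensor_enable /sys/class/hwmon/hwmon0/temp1_enable 0
```

Writing to the real attribute normally requires root.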

Changes from v4:
- Dropped ABI documentation patch for hwmon as it is already picked by
  Guenter Roeck.
- Changes to sensor-groups device-tree phandle array parsing as per
  Michael Ellerman's suggestion.

v4 : https://lkml.org/lkml/2018/7/6/379
v3 : https://lkml.org/lkml/2018/7/5/476
v2 : https://lkml.org/lkml/2018/7/4/263
v1 : https://lkml.org/lkml/2018/3/22/214

Shilpasri G Bhat (2):
  powernv:opal-sensor-groups: Add support to enable sensor groups
  hwmon: ibmpowernv: Add attributes to enable/disable sensor groups

 Documentation/hwmon/ibmpowernv                     |  43 +++-
 arch/powerpc/include/asm/opal-api.h                |   1 +
 arch/powerpc/include/asm/opal.h                    |   2 +
 .../powerpc/platforms/powernv/opal-sensor-groups.c |  28 +++
 arch/powerpc/platforms/powernv/opal-wrappers.S     |   1 +
 drivers/hwmon/ibmpowernv.c                         | 256 ++++++++++++++++++---
 6 files changed, 297 insertions(+), 34 deletions(-)

-- 
1.8.3.1

^ permalink raw reply

* Improvements for the PS3
From: Fredrik Noring @ 2018-07-14 16:49 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Geoff Levand

Hi,

I just checked out the latest "ps3-queue" branch

https://git.kernel.org/pub/scm/linux/kernel/git/geoff/ps3-linux.git/log/?h=ps3-queue

to upgrade my OtherOS boot kernel. I essentially started out with

$ make ARCH=powerpc ps3_defconfig
$ make ARCH=powerpc dtbImage.ps3

to obtain an image to install. The first problem was its size, which is
around 20 MB by default and therefore uninstallable. It seems the main
sections causing size problems in the ELF are (sizes in leftmost column):

	0x400000 OBJECT LOCAL  DEFAULT 37 stack_trace
	0x29feb0 OBJECT LOCAL  DEFAULT 37 lock_classes
	0x200000 OBJECT GLOBAL DEFAULT 37 lock_chains
	0x200000 OBJECT LOCAL  DEFAULT 37 list_entries
	 0xa0000 OBJECT LOCAL  DEFAULT 37 chain_hlocks
	 0x40000 OBJECT LOCAL  DEFAULT 37 chainhash_table
	 0x20000 OBJECT LOCAL  DEFAULT 37 __log_buf
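
As a quick sanity check, the seven sizes listed above can be added up.
This is only a sketch, and the helper name sum_sizes is made up for
illustration:

```shell
# Add up a list of sizes; hexadecimal constants such as 0x400000 are
# accepted because shell arithmetic understands the 0x prefix.
sum_sizes() {
    total=0
    for s in "$@"; do
        total=$((total + $s))
    done
    echo "$total"
}

# The seven symbol sizes from the listing above.
sum_sizes 0x400000 0x29feb0 0x200000 0x200000 0xa0000 0x40000 0x20000
# prints 12189360
```

That is about 12.2 MB (roughly 11.6 MiB) for these seven objects alone.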

I modified arch/powerpc/boot/wrapper to make the size error more apparent:

--- a/arch/powerpc/boot/wrapper
+++ b/arch/powerpc/boot/wrapper
@@ -550,5 +550,10 @@ ps3)
     bld="otheros.bld"
     [ $size -le 16777216 ] || bld="otheros-too-big.bld"
     gzip -n --force -9 --stdout "$ofile.bin" > "$odir/$bld"
+    if [ $size -gt 16777216 ]
+    then
+	    echo "ERROR: Image size $size bytes is too large for 16 MiB limit" >&2
+	    exit 1
+    fi
     ;;
 esac

I then proceeded with disabling STACKTRACE_SUPPORT and LOCKDEP_SUPPORT,
which reduced the size to about 8 MB, well below the 16 MiB limit. Perhaps
disabling these entirely is a bit heavy-handed? Is there a smarter way?
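
For reference, the 16 MiB comparison used in the wrapper change above
can be tried standalone. This is a sketch under assumptions: the helper
name check_otheros_size is hypothetical, and how the size in bytes is
obtained is left to the caller:

```shell
# 16 MiB, the limit checked in arch/powerpc/boot/wrapper above.
PS3_LIMIT=16777216

# Print "ok" if the given size in bytes fits within the limit,
# "too-big" otherwise.
check_otheros_size() {
    if [ "$1" -le "$PS3_LIMIT" ]; then
        echo ok
    else
        echo too-big
    fi
}

# Usage, with sizes picked only for illustration:
#   check_otheros_size 8000000     (about 8 MB)
#   check_otheros_size 20000000    (about 20 MB)
```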

Trying to start the kernel results in a completely black screen. Nothing
happens. To have a chance of seeing anything I had configured:

CONFIG_FB_PS3=y
CONFIG_FB_PS3_DEFAULT_SIZE_M=9
CONFIG_CMDLINE="video=ps3fb:mode:10"

I decided to proceed by using lv1_panic to bisect the PowerPC boot process.
It either froze, in which case the call to lv1_panic was not reached, or it
rebooted. Interestingly, it turned out that ps3fb_probe was actually called,
so I added a sleep with

--- a/drivers/video/fbdev/ps3fb.c
+++ b/drivers/video/fbdev/ps3fb.c
@@ -1178,6 +1179,8 @@ static int ps3fb_probe(struct ps3_system_bus_device *dev)
 
 	ps3fb.task = task;
 
+	msleep(10000);
+
 	return 0;
 
 err_unregister_framebuffer:

et voilà, the screen came alive and the kernel panic was revealed! It seems
the kernel panics so fast that the PS3 frame buffer is unprepared. This is,
of course, very unfortunate because trying to debug the boot process without
a screen or any other means of obtaining console text is quite difficult.

[ In this case the panic was caused by a missing CONFIG_DEVTMPFS=y, where
"Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000100",
is helpful to actually see in text. ]

I suppose the problem is that it relies on interrupts for ps3fb_sync_image
to regularly copy the image, hence without them the screen isn't updated to
show kernel panics, etc. Perhaps one way to fix that is to implement the
struct fb_tile_ops API, so that the console is synchronously updated? Would
that be acceptable?

Fredrik

^ permalink raw reply

* Re: [PATCH kernel] powerpc/ioda/npu2: Call hot reset skiboot hook when disabling NPU
From: Alexey Kardashevskiy @ 2018-07-14 11:34 UTC (permalink / raw)
  To: Alistair Popple
  Cc: linuxppc-dev, Benjamin Herrenschmidt, Russell Currey,
	Balbir Singh, Stewart Smith
In-Reply-To: <1903233.NnVPaYN7RK@new-mexico>

On Thu, 12 Jul 2018 11:38:34 +1000
Alistair Popple <alistair@popple.id.au> wrote:

> Hi Alexey,
> 
> On Wednesday, 11 July 2018 7:45:10 PM AEST Alexey Kardashevskiy wrote:
> > On Thu,  7 Jun 2018 17:06:07 +1000
> > Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
> >   
> > > This brings NPU2 into a safe mode where it does not throw an HMI if GPU
> > > coherent memory is gone.  
> 
> It might be helpful if you could describe the problem and what you are
> trying to solve in a bit more depth. Assuming the memory was online how are you
> offlining it?

Fair enough. I am offlining it by simply killing a guest, which triggers
a GPU PCI reset. Before this patch, the PCI reset would trigger an HMI
because PTEs were still present in both the QEMU and guest page tables;
that would cause prefetching and thus kill the host.


> If the memory has been online merely fencing/hot-resetting the
> NVLink is likely not sufficient as you also need to flush caches prior to taking
> the links down.

I'd expect the guest driver to take care of this. If this is not enough
and I need to pass some other MMIO (in addition to the ATS/tlb
invalidation thingy which I'll add anyway), then what is it?


> 
> - Alistair
> 
> > > Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>  
> > 
> > 
> > Anyone, ping?
> > 
> >   
> > > ---
> > > 
> > > The main aim for this is nvlink2 pass through, helps a lot.
> > > 
> > > 
> > > ---
> > >  arch/powerpc/platforms/powernv/pci-ioda.c | 11 +++++++++++
> > >  1 file changed, 11 insertions(+)
> > > 
> > > diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> > > index 66c2804..29f798c 100644
> > > --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> > > +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> > > @@ -3797,6 +3797,16 @@ static void pnv_pci_release_device(struct pci_dev *pdev)
> > >  		pnv_ioda_release_pe(pe);
> > >  }
> > >  
> > > +void pnv_npu_disable_device(struct pci_dev *pdev)
> > > +{
> > > +	struct eeh_dev *edev = pci_dev_to_eeh_dev(pdev);
> > > +	struct eeh_pe *eehpe = edev ? edev->pe : NULL;
> > > +
> > > +	if (eehpe && eeh_ops && eeh_ops->reset) {
> > > +		eeh_ops->reset(eehpe, EEH_RESET_HOT);
> > > +	}
> > > +}
> > > +
> > >  static void pnv_pci_ioda_shutdown(struct pci_controller *hose)
> > >  {
> > >  	struct pnv_phb *phb = hose->private_data;
> > > @@ -3841,6 +3851,7 @@ static const struct pci_controller_ops pnv_npu_ioda_controller_ops = {
> > >  	.reset_secondary_bus	= pnv_pci_reset_secondary_bus,
> > >  	.dma_set_mask		= pnv_npu_dma_set_mask,
> > >  	.shutdown		= pnv_pci_ioda_shutdown,
> > > +	.disable_device		= pnv_npu_disable_device,
> > >  };
> > >  
> > >  static const struct pci_controller_ops pnv_npu_ocapi_ioda_controller_ops = {  
> > 
> > 
> > 
> > --
> > Alexey
> >   
> 
> 



--
Alexey

^ permalink raw reply

* Re: [PATCH 1/2] powerpc: Add ppc32_allmodconfig defconfig target
From: Randy Dunlap @ 2018-07-14  4:35 UTC (permalink / raw)
  To: Michael Ellerman, linuxppc-dev; +Cc: npiggin, malat
In-Reply-To: <20180709142426.26999-1-mpe@ellerman.id.au>

On 07/09/2018 07:24 AM, Michael Ellerman wrote:
> Because the allmodconfig logic just sets every symbol to M or Y, it
> has the effect of always generating a 64-bit config, because
> CONFIG_PPC64 becomes Y.
> 
> So to make it easier for folks to test 32-bit code, provide a phony
> defconfig target that generates a 32-bit allmodconfig.
> 
> The 32-bit port has several mutually exclusive CPU types, we choose
> the Book3S variants as that's what the help text in Kconfig says is
> most common.
> 
> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

Hi Michael,

ppc32_allmodconfig sets CONFIG_ISA=y (and other related symbols) and
CONFIG_PPC_CHRP=y.  But my builds are failing because they are missing
the functions isa_bus_to_virt() and isa_virt_to_bus().

Any ideas?

Thanks.

> ---
>  arch/powerpc/Makefile                 | 5 +++++
>  arch/powerpc/configs/book3s_32.config | 2 ++
>  2 files changed, 7 insertions(+)
>  create mode 100644 arch/powerpc/configs/book3s_32.config
> 
> diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
> index 2ea575cb3401..2556c2182789 100644
> --- a/arch/powerpc/Makefile
> +++ b/arch/powerpc/Makefile
> @@ -354,6 +354,11 @@ mpc86xx_smp_defconfig:
>  	$(call merge_into_defconfig,mpc86xx_basic_defconfig,\
>  		86xx-smp 86xx-hw fsl-emb-nonhw)
>  
> +PHONY += ppc32_allmodconfig
> +ppc32_allmodconfig:
> +	$(Q)$(MAKE) KCONFIG_ALLCONFIG=$(srctree)/arch/powerpc/configs/book3s_32.config \
> +		-f $(srctree)/Makefile allmodconfig
> +
>  define archhelp
>    @echo '* zImage          - Build default images selected by kernel config'
>    @echo '  zImage.*        - Compressed kernel image (arch/$(ARCH)/boot/zImage.*)'
> diff --git a/arch/powerpc/configs/book3s_32.config b/arch/powerpc/configs/book3s_32.config
> new file mode 100644
> index 000000000000..8721eb7b1294
> --- /dev/null
> +++ b/arch/powerpc/configs/book3s_32.config
> @@ -0,0 +1,2 @@
> +CONFIG_PPC64=n
> +CONFIG_PPC_BOOK3S_32=y
> 


-- 
~Randy

^ permalink raw reply

* [PATCH] chrp/nvram.c: add MODULE_LICENSE()
From: Randy Dunlap @ 2018-07-14  4:27 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman

From: Randy Dunlap <rdunlap@infradead.org>

Add MODULE_LICENSE() to the chrp nvram.c driver to fix the build
warning message:

WARNING: modpost: missing MODULE_LICENSE() in arch/powerpc/platforms/chrp/nvram.o

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
---
Feel free to adjust the license string if needed.

 arch/powerpc/platforms/chrp/nvram.c |    3 +++
 1 file changed, 3 insertions(+)

--- lnx-418-rc4.orig/arch/powerpc/platforms/chrp/nvram.c
+++ lnx-418-rc4/arch/powerpc/platforms/chrp/nvram.c
@@ -11,6 +11,7 @@
  */
 
 #include <linux/kernel.h>
+#include <linux/module.h>
 #include <linux/init.h>
 #include <linux/spinlock.h>
 #include <linux/uaccess.h>
@@ -89,3 +90,5 @@ void __init chrp_nvram_init(void)
 
 	return;
 }
+
+MODULE_LICENSE("GPL v2");

^ permalink raw reply

* [PATCH] net/ethernet/freescale/fman: fix cross-build error
From: Randy Dunlap @ 2018-07-14  4:25 UTC (permalink / raw)
  To: netdev@vger.kernel.org, David Miller; +Cc: Madalin Bucur, PowerPC

From: Randy Dunlap <rdunlap@infradead.org>

  CC [M]  drivers/net/ethernet/freescale/fman/fman.o
In file included from ../drivers/net/ethernet/freescale/fman/fman.c:35:
../include/linux/fsl/guts.h: In function 'guts_set_dmacr':
../include/linux/fsl/guts.h:165:2: error: implicit declaration of function 'clrsetbits_be32' [-Werror=implicit-function-declaration]
  clrsetbits_be32(&guts->dmacr, 3 << shift, device << shift);
  ^~~~~~~~~~~~~~~

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Madalin Bucur <madalin.bucur@nxp.com>
Cc: netdev@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
---
Found while doing ppc32_allmodconfig builds.

 include/linux/fsl/guts.h |    1 +
 1 file changed, 1 insertion(+)

--- lnx-418-rc4.orig/include/linux/fsl/guts.h
+++ lnx-418-rc4/include/linux/fsl/guts.h
@@ -16,6 +16,7 @@
 #define __FSL_GUTS_H__
 
 #include <linux/types.h>
+#include <linux/io.h>
 
 /**
  * Global Utility Registers.

^ permalink raw reply

* Re: [PATCH 1/2] powerpc: Add ppc32_allmodconfig defconfig target
From: Randy Dunlap @ 2018-07-14  1:59 UTC (permalink / raw)
  To: Michael Ellerman, linuxppc-dev; +Cc: npiggin, malat
In-Reply-To: <20180709142426.26999-1-mpe@ellerman.id.au>

On 07/09/2018 07:24 AM, Michael Ellerman wrote:
> Because the allmodconfig logic just sets every symbol to M or Y, it
> has the effect of always generating a 64-bit config, because
> CONFIG_PPC64 becomes Y.
> 
> So to make it easier for folks to test 32-bit code, provide a phony
> defconfig target that generates a 32-bit allmodconfig.
> 
> The 32-bit port has several mutually exclusive CPU types, we choose
> the Book3S variants as that's what the help text in Kconfig says is
> most common.
> 
> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

Hi Michael,

Sorry for the delay.  I was traveling (out in the boonies).

I'm trying to use 'make ppc32_allmodconfig'.  Cross-building on x86_64
with crosstools from kernel.org.  (gcc 8.1.0)
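
A minimal sketch of what the new target boils down to when done by
hand, using the two-line fragment from the quoted patch. The helper
name write_book3s_32_fragment, the output file name, and the
CROSS_COMPILE prefix in the usage comment are assumptions:

```shell
# Write the two-line config fragment from the quoted patch to the
# file given as $1.
write_book3s_32_fragment() {
    printf 'CONFIG_PPC64=n\nCONFIG_PPC_BOOK3S_32=y\n' > "$1"
}

# Usage from the top of a kernel tree (the cross-compiler prefix is
# hypothetical and depends on the installed toolchain):
#   write_book3s_32_fragment allmod.config
#   make ARCH=powerpc CROSS_COMPILE=powerpc-linux- \
#        KCONFIG_ALLCONFIG=allmod.config allmodconfig
```

KCONFIG_ALLCONFIG makes the all*config targets take preset symbol values
from the named file, which is what keeps CONFIG_PPC64 off here.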

I'm getting build errors.  Looks like it's missing a header file or 3.
I looked into that but it's a long and twisty maze of passages.
Any ideas?


  CC      arch/powerpc/mm/dump_linuxpagetables.o
In file included from ../arch/powerpc/include/asm/book3s/pgtable.h:8,
                 from ../arch/powerpc/include/asm/pgtable.h:18,
                 from ../include/linux/hugetlb.h:12,
                 from ../arch/powerpc/mm/dump_linuxpagetables.c:19:
../arch/powerpc/mm/dump_linuxpagetables.c: In function 'populate_markers':
../arch/powerpc/include/asm/book3s/32/pgtable.h:53:19: error: 'PKMAP_BASE' undeclared (first use in this function); did you mean 'AT_BASE'?
 #define KVIRT_TOP PKMAP_BASE
                   ^~~~~~~~~~
../arch/powerpc/include/asm/book3s/32/pgtable.h:64:23: note: in expansion of macro 'KVIRT_TOP'
 #define IOREMAP_TOP ((KVIRT_TOP - CONFIG_CONSISTENT_SIZE) & PAGE_MASK)
                       ^~~~~~~~~
../arch/powerpc/mm/dump_linuxpagetables.c:456:39: note: in expansion of macro 'IOREMAP_TOP'
  address_markers[i++].start_address = IOREMAP_TOP;
                                       ^~~~~~~~~~~
../arch/powerpc/include/asm/book3s/32/pgtable.h:53:19: note: each undeclared identifier is reported only once for each function it appears in
 #define KVIRT_TOP PKMAP_BASE
                   ^~~~~~~~~~
../arch/powerpc/include/asm/book3s/32/pgtable.h:64:23: note: in expansion of macro 'KVIRT_TOP'
 #define IOREMAP_TOP ((KVIRT_TOP - CONFIG_CONSISTENT_SIZE) & PAGE_MASK)
                       ^~~~~~~~~
../arch/powerpc/mm/dump_linuxpagetables.c:456:39: note: in expansion of macro 'IOREMAP_TOP'
  address_markers[i++].start_address = IOREMAP_TOP;
                                       ^~~~~~~~~~~
../arch/powerpc/mm/dump_linuxpagetables.c:464:39: error: implicit declaration of function 'PKMAP_ADDR'; did you mean 'PCI_IO_ADDR'? [-Werror=implicit-function-declaration]
  address_markers[i++].start_address = PKMAP_ADDR(LAST_PKMAP);
                                       ^~~~~~~~~~
                                       PCI_IO_ADDR
../arch/powerpc/mm/dump_linuxpagetables.c:464:50: error: 'LAST_PKMAP' undeclared (first use in this function); did you mean 'LIST_HEAD'?
  address_markers[i++].start_address = PKMAP_ADDR(LAST_PKMAP);
                                                  ^~~~~~~~~~
                                                  LIST_HEAD



Thanks.

> ---
>  arch/powerpc/Makefile                 | 5 +++++
>  arch/powerpc/configs/book3s_32.config | 2 ++
>  2 files changed, 7 insertions(+)
>  create mode 100644 arch/powerpc/configs/book3s_32.config
> 
> diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
> index 2ea575cb3401..2556c2182789 100644
> --- a/arch/powerpc/Makefile
> +++ b/arch/powerpc/Makefile
> @@ -354,6 +354,11 @@ mpc86xx_smp_defconfig:
>  	$(call merge_into_defconfig,mpc86xx_basic_defconfig,\
>  		86xx-smp 86xx-hw fsl-emb-nonhw)
>  
> +PHONY += ppc32_allmodconfig
> +ppc32_allmodconfig:
> +	$(Q)$(MAKE) KCONFIG_ALLCONFIG=$(srctree)/arch/powerpc/configs/book3s_32.config \
> +		-f $(srctree)/Makefile allmodconfig
> +
>  define archhelp
>    @echo '* zImage          - Build default images selected by kernel config'
>    @echo '  zImage.*        - Compressed kernel image (arch/$(ARCH)/boot/zImage.*)'
> diff --git a/arch/powerpc/configs/book3s_32.config b/arch/powerpc/configs/book3s_32.config
> new file mode 100644
> index 000000000000..8721eb7b1294
> --- /dev/null
> +++ b/arch/powerpc/configs/book3s_32.config
> @@ -0,0 +1,2 @@
> +CONFIG_PPC64=n
> +CONFIG_PPC_BOOK3S_32=y
> 


-- 
~Randy

^ permalink raw reply

* Re: [next-20180711][Oops] linux-next kernel boot is broken on powerpc
From: Stephen Rothwell @ 2018-07-14  0:55 UTC (permalink / raw)
  To: Abdul Haleem
  Cc: Pavel Tatashin, sachinp, Michal Hocko, sim, venkatb3, LKML,
	manvanth, Linux Memory Management List, linux-next, aneesh.kumar,
	linuxppc-dev
In-Reply-To: <1531473191.6480.26.camel@abdul.in.ibm.com>

Hi Abdul,

On Fri, 13 Jul 2018 14:43:11 +0530 Abdul Haleem <abdhalee@linux.vnet.ibm.com> wrote:
>
> On Thu, 2018-07-12 at 13:44 -0400, Pavel Tatashin wrote:
> > > Related commit could be one of below ? I see lots of patches related to mm and could not bisect
> > >
> > > 5479976fda7d3ab23ba0a4eb4d60b296eb88b866 mm: page_alloc: restore memblock_next_valid_pfn() on arm/arm64
> > > 41619b27b5696e7e5ef76d9c692dd7342c1ad7eb mm-drop-vm_bug_on-from-__get_free_pages-fix
> > > 531bbe6bd2721f4b66cdb0f5cf5ac14612fa1419 mm: drop VM_BUG_ON from __get_free_pages
> > > 479350dd1a35f8bfb2534697e5ca68ee8a6e8dea mm, page_alloc: actually ignore mempolicies for high priority allocations
> > > 088018f6fe571444caaeb16e84c9f24f22dfc8b0 mm: skip invalid pages block at a time in zero_resv_unresv()  
> > 
> > Looks like:
> > 0ba29a108979 mm/sparse: Remove CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER
> > 
> > This patch is going to be reverted from linux-next. Abdul, please
> > verify that issue is gone once  you revert this patch.  
> 
> kernel booted fine when the above patch is reverted.

And it has been removed from linux-next as of next-20180713.  (Friday
the 13th is not all bad :-))
-- 
Cheers,
Stephen Rothwell

^ permalink raw reply

* Re: CONFIG_ANDROID_BINDER_IPC=y (Re: powerpc: 32BIT vs. 64BIT (PPC32 vs. PPC64))
From: Randy Dunlap @ 2018-07-13 23:24 UTC (permalink / raw)
  To: Mathieu Malaterre; +Cc: Michael Ellerman, linuxppc-dev
In-Reply-To: <CA+7wUsz=DUKb1WJgL7dCZNtH8aHh9KeJ--AHWoCmgjSSTRbggA@mail.gmail.com>

On 07/13/2018 04:41 AM, Mathieu Malaterre wrote:
> Randy,
> 
> On Mon, Jul 9, 2018 at 2:00 PM Mathieu Malaterre <malat@debian.org> wrote:
>>
>> On Sun, Jul 8, 2018 at 1:53 PM Michael Ellerman <mpe@ellerman.id.au> wrote:
>>>
>>> Randy Dunlap <rdunlap@infradead.org> writes:
>>>> Hi,
>>>>
>>>> Is there a good way (or a shortcut) to do something like:
>>>
>>> The best I know of is:
>>>
>>>> $ make ARCH=powerpc O=PPC32 [other_options] allmodconfig
>>>>   to get a PPC32/32BIT allmodconfig
>>>
>>> $ echo CONFIG_PPC64=n > allmod.config
>>> $ KCONFIG_ALLCONFIG=1 make allmodconfig
>>> $ grep PPC32 .config
>>> CONFIG_PPC32=y
>>>
>>> Which is still a bit clunky.
>>>
>>>
>>> I looked at this a while back and the problem we have is that the 32-bit
>>> kernel is not a single thing. There are multiple 32-bit platforms which
>>> are mutually exclusive.
>>>
>>> eg, from menuconfig:
>>>
>>>  - 512x/52xx/6xx/7xx/74xx/82xx/83xx/86xx
>>>  - Freescale 85xx
>>>  - Freescale 8xx
>>>  - AMCC 40x
>>>  - AMCC 44x, 46x or 47x
>>>  - Freescale e200
>>
>> Most Linux distros seem to have dropped support for ppc32. So I'd suggest
>> picking the Debian powerpc default config (but I agree that I am a little
>> biased here).
> 
> I tried an allmodconfig as suggested by Michael (above). But I get a build error:
> 
>   MODPOST vmlinux.o
> drivers/android/binder.o: In function `binder_thread_write':
> binder.c:(.text+0xc750): undefined reference to `__get_user_bad'
> binder.c:(.text+0xc76c): undefined reference to `__get_user_bad'
> binder.c:(.text+0xc790): undefined reference to `__get_user_bad'
> binder.c:(.text+0xc7d4): undefined reference to `__get_user_bad'
> binder.c:(.text+0xc7f4): undefined reference to `__get_user_bad'
> 
> 
> So for now I need to do: CONFIG_ANDROID_BINDER_IPC=n
> 
> How did you get past this build failure?

Hi,

I am not seeing an error on that driver build.

I am using gcc 8.1.0 from kernel.org:
https://mirrors.edge.kernel.org/pub/tools/crosstool/

and building on x86_64.


-- 
~Randy

^ permalink raw reply

* [PATCH v07 9/9] hotplug/pmt: Update topology after PMT
From: Michael Bringmann @ 2018-07-13 20:18 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Michael Bringmann, Nathan Fontenot, John Allen, Tyrel Datwyler,
	Thomas Falcon
In-Reply-To: <458ff569-f611-f506-afa1-138146551dde@linux.vnet.ibm.com>

hotplug/pmt: Call rebuild_sched_domains after applying changes
to update CPU associativity i.e. 'readd' CPUs.  This is to
ensure that the deferred calls to arch_update_cpu_topology are
now reflected in the system data structures.

Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/pseries/dlpar.c |    4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/dlpar.c b/arch/powerpc/platforms/pseries/dlpar.c
index 7264b8e..ea3c08a 100644
--- a/arch/powerpc/platforms/pseries/dlpar.c
+++ b/arch/powerpc/platforms/pseries/dlpar.c
@@ -16,6 +16,7 @@
 #include <linux/notifier.h>
 #include <linux/spinlock.h>
 #include <linux/cpu.h>
+#include <linux/cpuset.h>
 #include <linux/slab.h>
 #include <linux/of.h>
 
@@ -451,6 +452,9 @@ static int dlpar_pmt(struct pseries_hp_errorlog *work)
 		ssleep(10);
 	}
 
+	ssleep(5);
+	rebuild_sched_domains();
+
 	return 0;
 }
 

^ permalink raw reply related

* [PATCH v07 8/9] hotplug/rtas: No rtas_event_scan during PMT update
From: Michael Bringmann @ 2018-07-13 20:18 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Michael Bringmann, Nathan Fontenot, John Allen, Tyrel Datwyler,
	Thomas Falcon
In-Reply-To: <458ff569-f611-f506-afa1-138146551dde@linux.vnet.ibm.com>

hotplug/rtas: Disable rtas_event_scan during device-tree property
updates after migration to reduce conflicts with changes propagated
to other parts of the kernel configuration, such as CPUs or memory.

Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/pseries/hotplug-cpu.c |    4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index df1791b..09633de 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -685,14 +685,18 @@ static int dlpar_cpu_readd_by_index(u32 drc_index)
 
 	pr_info("Attempting to re-add CPU, drc index %x\n", drc_index);
 
+	rtas_event_scan_disable();
 	arch_update_cpu_topology_suspend();
 	rc = dlpar_cpu_remove_by_index(drc_index, false);
 	arch_update_cpu_topology_resume();
+	rtas_event_scan_enable();
 
 	if (!rc) {
+		rtas_event_scan_disable();
 		arch_update_cpu_topology_suspend();
 		rc = dlpar_cpu_add(drc_index, false);
 		arch_update_cpu_topology_resume();
+		rtas_event_scan_enable();
 	}
 
 	if (rc)

^ permalink raw reply related

