Linux-ARM-Kernel Archive on lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 09/17] arm64, trans_pgd: add trans_pgd_create_empty
From: Pavel Tatashin @ 2019-08-21 18:31 UTC (permalink / raw)
  To: pasha.tatashin, jmorris, sashal, ebiederm, kexec, linux-kernel,
	corbet, catalin.marinas, will, linux-arm-kernel, marc.zyngier,
	james.morse, vladimir.murzin, matthias.bgg, bhsharma, linux-mm,
	mark.rutland
In-Reply-To: <20190821183204.23576-1-pasha.tatashin@soleen.com>

This functions returns a zeroed trans_pgd using the allocator that is
specified in the info argument.

trans_pgds should be created by using this function.

Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
---
 arch/arm64/include/asm/trans_pgd.h |  3 +++
 arch/arm64/kernel/hibernate.c      |  6 +++---
 arch/arm64/mm/trans_pgd.c          | 12 ++++++++++++
 3 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/trans_pgd.h b/arch/arm64/include/asm/trans_pgd.h
index e3d022b1b526..26e5a63676b5 100644
--- a/arch/arm64/include/asm/trans_pgd.h
+++ b/arch/arm64/include/asm/trans_pgd.h
@@ -40,6 +40,9 @@ struct trans_pgd_info {
 	unsigned long trans_flags;
 };
 
+/* Create and empty trans_pgd page table */
+int trans_pgd_create_empty(struct trans_pgd_info *info, pgd_t **trans_pgd);
+
 int trans_pgd_create_copy(pgd_t **dst_pgdp, unsigned long start,
 			  unsigned long end);
 
diff --git a/arch/arm64/kernel/hibernate.c b/arch/arm64/kernel/hibernate.c
index 17426dc8cb54..8c2641a9bb09 100644
--- a/arch/arm64/kernel/hibernate.c
+++ b/arch/arm64/kernel/hibernate.c
@@ -216,9 +216,9 @@ static int create_safe_exec_page(void *src_start, size_t length,
 	memcpy(page, src_start, length);
 	__flush_icache_range((unsigned long)page, (unsigned long)page + length);
 
-	trans_pgd = (void *)get_safe_page(GFP_ATOMIC);
-	if (!trans_pgd)
-		return -ENOMEM;
+	rc = trans_pgd_create_empty(&trans_info, &trans_pgd);
+	if (rc)
+		return rc;
 
 	rc = trans_pgd_map_page(&trans_info, trans_pgd, page, dst_addr,
 				PAGE_KERNEL_EXEC);
diff --git a/arch/arm64/mm/trans_pgd.c b/arch/arm64/mm/trans_pgd.c
index dbabccd78cc4..ece797aa1841 100644
--- a/arch/arm64/mm/trans_pgd.c
+++ b/arch/arm64/mm/trans_pgd.c
@@ -164,6 +164,18 @@ static int copy_page_tables(pgd_t *dst_pgdp, unsigned long start,
 	return 0;
 }
 
+int trans_pgd_create_empty(struct trans_pgd_info *info, pgd_t **trans_pgd)
+{
+	pgd_t *dst_pgdp = trans_alloc(info);
+
+	if (!dst_pgdp)
+		return -ENOMEM;
+
+	*trans_pgd = dst_pgdp;
+
+	return 0;
+}
+
 int trans_pgd_create_copy(pgd_t **dst_pgdp, unsigned long start,
 			  unsigned long end)
 {
-- 
2.23.0


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related

* [PATCH v3 08/17] arm64, trans_pgd: make trans_pgd_map_page generic
From: Pavel Tatashin @ 2019-08-21 18:31 UTC (permalink / raw)
  To: pasha.tatashin, jmorris, sashal, ebiederm, kexec, linux-kernel,
	corbet, catalin.marinas, will, linux-arm-kernel, marc.zyngier,
	james.morse, vladimir.murzin, matthias.bgg, bhsharma, linux-mm,
	mark.rutland
In-Reply-To: <20190821183204.23576-1-pasha.tatashin@soleen.com>

Currently, trans_pgd_map_page has assumptions that are relevant to
hibernate. But, to make it generic we must allow it to use any allocator
and also, can't assume that entries do not exist in the page table
already. Also, we can't use init_mm here.

Also, add "flags" for trans_pgd_info, they are going to be used
in copy functions once they are generalized.

Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
---
 arch/arm64/include/asm/trans_pgd.h | 39 +++++++++++++-
 arch/arm64/kernel/hibernate.c      | 13 ++++-
 arch/arm64/mm/trans_pgd.c          | 82 +++++++++++++++++++++---------
 3 files changed, 107 insertions(+), 27 deletions(-)

diff --git a/arch/arm64/include/asm/trans_pgd.h b/arch/arm64/include/asm/trans_pgd.h
index c7b5402b7d87..e3d022b1b526 100644
--- a/arch/arm64/include/asm/trans_pgd.h
+++ b/arch/arm64/include/asm/trans_pgd.h
@@ -11,10 +11,45 @@
 #include <linux/bits.h>
 #include <asm/pgtable-types.h>
 
+/*
+ * trans_alloc_page
+ *	- Allocator that should return exactly one uninitilaized page, if this
+ *	 allocator fails, trans_pgd returns -ENOMEM error.
+ *
+ * trans_alloc_arg
+ *	- Passed to trans_alloc_page as an argument
+ *
+ * trans_flags
+ *	- bitmap with flags that control how page table is filled.
+ *	  TRANS_MKWRITE: during page table copy make PTE, PME, and PUD page
+ *			 writeable by removing RDONLY flag from PTE.
+ *	  TRANS_MKVALID: during page table copy, if PTE present, but not valid,
+ *			 make it valid.
+ *	  TRANS_CHECKPFN: During page table copy, for every PTE entry check that
+ *			  PFN that this PTE points to is valid. Otherwise return
+ *			  -ENXIO
+ */
+
+#define	TRANS_MKWRITE	BIT(0)
+#define	TRANS_MKVALID	BIT(1)
+#define	TRANS_CHECKPFN	BIT(2)
+
+struct trans_pgd_info {
+	void * (*trans_alloc_page)(void *arg);
+	void *trans_alloc_arg;
+	unsigned long trans_flags;
+};
+
 int trans_pgd_create_copy(pgd_t **dst_pgdp, unsigned long start,
 			  unsigned long end);
 
-int trans_pgd_map_page(pgd_t *trans_pgd, void *page, unsigned long dst_addr,
-		       pgprot_t pgprot);
+/*
+ * Add map entry to trans_pgd for a base-size page at PTE level.
+ * page:	page to be mapped.
+ * dst_addr:	new VA address for the pages
+ * pgprot:	protection for the page.
+ */
+int trans_pgd_map_page(struct trans_pgd_info *info, pgd_t *trans_pgd,
+		       void *page, unsigned long dst_addr, pgprot_t pgprot);
 
 #endif /* _ASM_TRANS_TABLE_H */
diff --git a/arch/arm64/kernel/hibernate.c b/arch/arm64/kernel/hibernate.c
index 6ee81bbaa37f..17426dc8cb54 100644
--- a/arch/arm64/kernel/hibernate.c
+++ b/arch/arm64/kernel/hibernate.c
@@ -179,6 +179,12 @@ int arch_hibernation_header_restore(void *addr)
 }
 EXPORT_SYMBOL(arch_hibernation_header_restore);
 
+static void *
+hibernate_page_alloc(void *arg)
+{
+	return (void *)get_safe_page((gfp_t)(unsigned long)arg);
+}
+
 /*
  * Copies length bytes, starting at src_start into an new page,
  * perform cache maintentance, then maps it at the specified address low
@@ -195,6 +201,11 @@ static int create_safe_exec_page(void *src_start, size_t length,
 				 unsigned long dst_addr,
 				 phys_addr_t *phys_dst_addr)
 {
+	struct trans_pgd_info trans_info = {
+		.trans_alloc_page	= hibernate_page_alloc,
+		.trans_alloc_arg	= (void *)GFP_ATOMIC,
+		.trans_flags		= 0,
+	};
 	void *page = (void *)get_safe_page(GFP_ATOMIC);
 	pgd_t *trans_pgd;
 	int rc;
@@ -209,7 +220,7 @@ static int create_safe_exec_page(void *src_start, size_t length,
 	if (!trans_pgd)
 		return -ENOMEM;
 
-	rc = trans_pgd_map_page(trans_pgd, page, dst_addr,
+	rc = trans_pgd_map_page(&trans_info, trans_pgd, page, dst_addr,
 				PAGE_KERNEL_EXEC);
 	if (rc)
 		return rc;
diff --git a/arch/arm64/mm/trans_pgd.c b/arch/arm64/mm/trans_pgd.c
index 00b62d8640c2..dbabccd78cc4 100644
--- a/arch/arm64/mm/trans_pgd.c
+++ b/arch/arm64/mm/trans_pgd.c
@@ -17,6 +17,16 @@
 #include <asm/pgtable.h>
 #include <linux/suspend.h>
 
+static void *trans_alloc(struct trans_pgd_info *info)
+{
+	void *page = info->trans_alloc_page(info->trans_alloc_arg);
+
+	if (page)
+		clear_page(page);
+
+	return page;
+}
+
 static void _copy_pte(pte_t *dst_ptep, pte_t *src_ptep, unsigned long addr)
 {
 	pte_t pte = READ_ONCE(*src_ptep);
@@ -172,40 +182,64 @@ int trans_pgd_create_copy(pgd_t **dst_pgdp, unsigned long start,
 	return rc;
 }
 
-int trans_pgd_map_page(pgd_t *trans_pgd, void *page, unsigned long dst_addr,
-		       pgprot_t pgprot)
+int trans_pgd_map_page(struct trans_pgd_info *info, pgd_t *trans_pgd,
+		       void *page, unsigned long dst_addr, pgprot_t pgprot)
 {
-	pgd_t *pgdp;
-	pud_t *pudp;
-	pmd_t *pmdp;
-	pte_t *ptep;
-
-	pgdp = pgd_offset_raw(trans_pgd, dst_addr);
-	if (pgd_none(READ_ONCE(*pgdp))) {
-		pudp = (void *)get_safe_page(GFP_ATOMIC);
-		if (!pudp)
+	int pgd_idx = pgd_index(dst_addr);
+	int pud_idx = pud_index(dst_addr);
+	int pmd_idx = pmd_index(dst_addr);
+	int pte_idx = pte_index(dst_addr);
+	pgd_t *pgdp = trans_pgd;
+	pgd_t pgd = READ_ONCE(pgdp[pgd_idx]);
+	pud_t *pudp, pud;
+	pmd_t *pmdp, pmd;
+	pte_t *ptep, pte;
+
+	if (pgd_none(pgd)) {
+		pud_t *t = trans_alloc(info);
+
+		if (!t)
 			return -ENOMEM;
-		pgd_populate(&init_mm, pgdp, pudp);
+
+		__pgd_populate(&pgdp[pgd_idx], __pa(t), PUD_TYPE_TABLE);
+		pgd = READ_ONCE(pgdp[pgd_idx]);
 	}
 
-	pudp = pud_offset(pgdp, dst_addr);
-	if (pud_none(READ_ONCE(*pudp))) {
-		pmdp = (void *)get_safe_page(GFP_ATOMIC);
-		if (!pmdp)
+	pudp = __va(pgd_page_paddr(pgd));
+	pud = READ_ONCE(pudp[pud_idx]);
+	if (pud_sect(pud)) {
+		return -ENXIO;
+	} else if (pud_none(pud) || pud_sect(pud)) {
+		pmd_t *t = trans_alloc(info);
+
+		if (!t)
 			return -ENOMEM;
-		pud_populate(&init_mm, pudp, pmdp);
+
+		__pud_populate(&pudp[pud_idx], __pa(t), PMD_TYPE_TABLE);
+		pud = READ_ONCE(pudp[pud_idx]);
 	}
 
-	pmdp = pmd_offset(pudp, dst_addr);
-	if (pmd_none(READ_ONCE(*pmdp))) {
-		ptep = (void *)get_safe_page(GFP_ATOMIC);
-		if (!ptep)
+	pmdp = __va(pud_page_paddr(pud));
+	pmd = READ_ONCE(pmdp[pmd_idx]);
+	if (pmd_sect(pmd)) {
+		return -ENXIO;
+	} else if (pmd_none(pmd) || pmd_sect(pmd)) {
+		pte_t *t = trans_alloc(info);
+
+		if (!t)
 			return -ENOMEM;
-		pmd_populate_kernel(&init_mm, pmdp, ptep);
+
+		__pmd_populate(&pmdp[pmd_idx], __pa(t), PTE_TYPE_PAGE);
+		pmd = READ_ONCE(pmdp[pmd_idx]);
 	}
 
-	ptep = pte_offset_kernel(pmdp, dst_addr);
-	set_pte(ptep, pfn_pte(virt_to_pfn(page), PAGE_KERNEL_EXEC));
+	ptep = __va(pmd_page_paddr(pmd));
+	pte = READ_ONCE(ptep[pte_idx]);
+
+	if (!pte_none(pte))
+		return -ENXIO;
+
+	set_pte(&ptep[pte_idx], pfn_pte(virt_to_pfn(page), pgprot));
 
 	return 0;
 }
-- 
2.23.0


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related

* [PATCH v3 07/17] arm64, hibernate: move page handling function to new trans_pgd.c
From: Pavel Tatashin @ 2019-08-21 18:31 UTC (permalink / raw)
  To: pasha.tatashin, jmorris, sashal, ebiederm, kexec, linux-kernel,
	corbet, catalin.marinas, will, linux-arm-kernel, marc.zyngier,
	james.morse, vladimir.murzin, matthias.bgg, bhsharma, linux-mm,
	mark.rutland
In-Reply-To: <20190821183204.23576-1-pasha.tatashin@soleen.com>

Now, that we abstracted the required functions move them to a new home.
Later, we will generalize these function in order to be useful outside
of hibernation.

Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
---
 arch/arm64/Kconfig                 |   4 +
 arch/arm64/include/asm/trans_pgd.h |  20 +++
 arch/arm64/kernel/hibernate.c      | 199 +--------------------------
 arch/arm64/mm/Makefile             |   1 +
 arch/arm64/mm/trans_pgd.c          | 211 +++++++++++++++++++++++++++++
 5 files changed, 237 insertions(+), 198 deletions(-)
 create mode 100644 arch/arm64/include/asm/trans_pgd.h
 create mode 100644 arch/arm64/mm/trans_pgd.c

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 3adcec05b1f6..91a7416ffe4e 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -999,6 +999,10 @@ config CRASH_DUMP
 
 	  For more details see Documentation/admin-guide/kdump/kdump.rst
 
+config TRANS_TABLE
+	def_bool y
+	depends on HIBERNATION || KEXEC_CORE
+
 config XEN_DOM0
 	def_bool y
 	depends on XEN
diff --git a/arch/arm64/include/asm/trans_pgd.h b/arch/arm64/include/asm/trans_pgd.h
new file mode 100644
index 000000000000..c7b5402b7d87
--- /dev/null
+++ b/arch/arm64/include/asm/trans_pgd.h
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/*
+ * Copyright (c) 2019, Microsoft Corporation.
+ * Pavel Tatashin <patatash@linux.microsoft.com>
+ */
+
+#ifndef _ASM_TRANS_TABLE_H
+#define _ASM_TRANS_TABLE_H
+
+#include <linux/bits.h>
+#include <asm/pgtable-types.h>
+
+int trans_pgd_create_copy(pgd_t **dst_pgdp, unsigned long start,
+			  unsigned long end);
+
+int trans_pgd_map_page(pgd_t *trans_pgd, void *page, unsigned long dst_addr,
+		       pgprot_t pgprot);
+
+#endif /* _ASM_TRANS_TABLE_H */
diff --git a/arch/arm64/kernel/hibernate.c b/arch/arm64/kernel/hibernate.c
index 2e29d620b56c..6ee81bbaa37f 100644
--- a/arch/arm64/kernel/hibernate.c
+++ b/arch/arm64/kernel/hibernate.c
@@ -16,7 +16,6 @@
 #define pr_fmt(x) "hibernate: " x
 #include <linux/cpu.h>
 #include <linux/kvm_host.h>
-#include <linux/mm.h>
 #include <linux/pm.h>
 #include <linux/sched.h>
 #include <linux/suspend.h>
@@ -31,14 +30,12 @@
 #include <asm/kexec.h>
 #include <asm/memory.h>
 #include <asm/mmu_context.h>
-#include <asm/pgalloc.h>
-#include <asm/pgtable.h>
-#include <asm/pgtable-hwdef.h>
 #include <asm/sections.h>
 #include <asm/smp.h>
 #include <asm/smp_plat.h>
 #include <asm/suspend.h>
 #include <asm/sysreg.h>
+#include <asm/trans_pgd.h>
 #include <asm/virt.h>
 
 /*
@@ -182,45 +179,6 @@ int arch_hibernation_header_restore(void *addr)
 }
 EXPORT_SYMBOL(arch_hibernation_header_restore);
 
-int trans_pgd_map_page(pgd_t *trans_pgd, void *page,
-		       unsigned long dst_addr,
-		       pgprot_t pgprot)
-{
-	pgd_t *pgdp;
-	pud_t *pudp;
-	pmd_t *pmdp;
-	pte_t *ptep;
-
-	pgdp = pgd_offset_raw(trans_pgd, dst_addr);
-	if (pgd_none(READ_ONCE(*pgdp))) {
-		pudp = (void *)get_safe_page(GFP_ATOMIC);
-		if (!pudp)
-			return -ENOMEM;
-		pgd_populate(&init_mm, pgdp, pudp);
-	}
-
-	pudp = pud_offset(pgdp, dst_addr);
-	if (pud_none(READ_ONCE(*pudp))) {
-		pmdp = (void *)get_safe_page(GFP_ATOMIC);
-		if (!pmdp)
-			return -ENOMEM;
-		pud_populate(&init_mm, pudp, pmdp);
-	}
-
-	pmdp = pmd_offset(pudp, dst_addr);
-	if (pmd_none(READ_ONCE(*pmdp))) {
-		ptep = (void *)get_safe_page(GFP_ATOMIC);
-		if (!ptep)
-			return -ENOMEM;
-		pmd_populate_kernel(&init_mm, pmdp, ptep);
-	}
-
-	ptep = pte_offset_kernel(pmdp, dst_addr);
-	set_pte(ptep, pfn_pte(virt_to_pfn(page), PAGE_KERNEL_EXEC));
-
-	return 0;
-}
-
 /*
  * Copies length bytes, starting at src_start into an new page,
  * perform cache maintentance, then maps it at the specified address low
@@ -339,161 +297,6 @@ int swsusp_arch_suspend(void)
 	return ret;
 }
 
-static void _copy_pte(pte_t *dst_ptep, pte_t *src_ptep, unsigned long addr)
-{
-	pte_t pte = READ_ONCE(*src_ptep);
-
-	if (pte_valid(pte)) {
-		/*
-		 * Resume will overwrite areas that may be marked
-		 * read only (code, rodata). Clear the RDONLY bit from
-		 * the temporary mappings we use during restore.
-		 */
-		set_pte(dst_ptep, pte_mkwrite(pte));
-	} else if (debug_pagealloc_enabled() && !pte_none(pte)) {
-		/*
-		 * debug_pagealloc will removed the PTE_VALID bit if
-		 * the page isn't in use by the resume kernel. It may have
-		 * been in use by the original kernel, in which case we need
-		 * to put it back in our copy to do the restore.
-		 *
-		 * Before marking this entry valid, check the pfn should
-		 * be mapped.
-		 */
-		BUG_ON(!pfn_valid(pte_pfn(pte)));
-
-		set_pte(dst_ptep, pte_mkpresent(pte_mkwrite(pte)));
-	}
-}
-
-static int copy_pte(pmd_t *dst_pmdp, pmd_t *src_pmdp, unsigned long start,
-		    unsigned long end)
-{
-	pte_t *src_ptep;
-	pte_t *dst_ptep;
-	unsigned long addr = start;
-
-	dst_ptep = (pte_t *)get_safe_page(GFP_ATOMIC);
-	if (!dst_ptep)
-		return -ENOMEM;
-	pmd_populate_kernel(&init_mm, dst_pmdp, dst_ptep);
-	dst_ptep = pte_offset_kernel(dst_pmdp, start);
-
-	src_ptep = pte_offset_kernel(src_pmdp, start);
-	do {
-		_copy_pte(dst_ptep, src_ptep, addr);
-	} while (dst_ptep++, src_ptep++, addr += PAGE_SIZE, addr != end);
-
-	return 0;
-}
-
-static int copy_pmd(pud_t *dst_pudp, pud_t *src_pudp, unsigned long start,
-		    unsigned long end)
-{
-	pmd_t *src_pmdp;
-	pmd_t *dst_pmdp;
-	unsigned long next;
-	unsigned long addr = start;
-
-	if (pud_none(READ_ONCE(*dst_pudp))) {
-		dst_pmdp = (pmd_t *)get_safe_page(GFP_ATOMIC);
-		if (!dst_pmdp)
-			return -ENOMEM;
-		pud_populate(&init_mm, dst_pudp, dst_pmdp);
-	}
-	dst_pmdp = pmd_offset(dst_pudp, start);
-
-	src_pmdp = pmd_offset(src_pudp, start);
-	do {
-		pmd_t pmd = READ_ONCE(*src_pmdp);
-
-		next = pmd_addr_end(addr, end);
-		if (pmd_none(pmd))
-			continue;
-		if (pmd_table(pmd)) {
-			if (copy_pte(dst_pmdp, src_pmdp, addr, next))
-				return -ENOMEM;
-		} else {
-			set_pmd(dst_pmdp,
-				__pmd(pmd_val(pmd) & ~PMD_SECT_RDONLY));
-		}
-	} while (dst_pmdp++, src_pmdp++, addr = next, addr != end);
-
-	return 0;
-}
-
-static int copy_pud(pgd_t *dst_pgdp, pgd_t *src_pgdp, unsigned long start,
-		    unsigned long end)
-{
-	pud_t *dst_pudp;
-	pud_t *src_pudp;
-	unsigned long next;
-	unsigned long addr = start;
-
-	if (pgd_none(READ_ONCE(*dst_pgdp))) {
-		dst_pudp = (pud_t *)get_safe_page(GFP_ATOMIC);
-		if (!dst_pudp)
-			return -ENOMEM;
-		pgd_populate(&init_mm, dst_pgdp, dst_pudp);
-	}
-	dst_pudp = pud_offset(dst_pgdp, start);
-
-	src_pudp = pud_offset(src_pgdp, start);
-	do {
-		pud_t pud = READ_ONCE(*src_pudp);
-
-		next = pud_addr_end(addr, end);
-		if (pud_none(pud))
-			continue;
-		if (pud_table(pud)) {
-			if (copy_pmd(dst_pudp, src_pudp, addr, next))
-				return -ENOMEM;
-		} else {
-			set_pud(dst_pudp,
-				__pud(pud_val(pud) & ~PMD_SECT_RDONLY));
-		}
-	} while (dst_pudp++, src_pudp++, addr = next, addr != end);
-
-	return 0;
-}
-
-static int copy_page_tables(pgd_t *dst_pgdp, unsigned long start,
-			    unsigned long end)
-{
-	unsigned long next;
-	unsigned long addr = start;
-	pgd_t *src_pgdp = pgd_offset_k(start);
-
-	dst_pgdp = pgd_offset_raw(dst_pgdp, start);
-	do {
-		next = pgd_addr_end(addr, end);
-		if (pgd_none(READ_ONCE(*src_pgdp)))
-			continue;
-		if (copy_pud(dst_pgdp, src_pgdp, addr, next))
-			return -ENOMEM;
-	} while (dst_pgdp++, src_pgdp++, addr = next, addr != end);
-
-	return 0;
-}
-
-int trans_pgd_create_copy(pgd_t **dst_pgdp, unsigned long start,
-			  unsigned long end)
-{
-	int rc;
-	pgd_t *trans_pgd = (pgd_t *)get_safe_page(GFP_ATOMIC);
-
-	if (!trans_pgd) {
-		pr_err("Failed to allocate memory for temporary page tables.\n");
-		return -ENOMEM;
-	}
-
-	rc = copy_page_tables(trans_pgd, start, end);
-	if (!rc)
-		*dst_pgdp = trans_pgd;
-
-	return rc;
-}
-
 /*
  * Setup then Resume from the hibernate image using swsusp_arch_suspend_exit().
  *
diff --git a/arch/arm64/mm/Makefile b/arch/arm64/mm/Makefile
index 849c1df3d214..f3002f1d0e61 100644
--- a/arch/arm64/mm/Makefile
+++ b/arch/arm64/mm/Makefile
@@ -6,6 +6,7 @@ obj-y				:= dma-mapping.o extable.o fault.o init.o \
 obj-$(CONFIG_HUGETLB_PAGE)	+= hugetlbpage.o
 obj-$(CONFIG_ARM64_PTDUMP_CORE)	+= dump.o
 obj-$(CONFIG_ARM64_PTDUMP_DEBUGFS)	+= ptdump_debugfs.o
+obj-$(CONFIG_TRANS_TABLE)	+= trans_pgd.o
 obj-$(CONFIG_NUMA)		+= numa.o
 obj-$(CONFIG_DEBUG_VIRTUAL)	+= physaddr.o
 KASAN_SANITIZE_physaddr.o	+= n
diff --git a/arch/arm64/mm/trans_pgd.c b/arch/arm64/mm/trans_pgd.c
new file mode 100644
index 000000000000..00b62d8640c2
--- /dev/null
+++ b/arch/arm64/mm/trans_pgd.c
@@ -0,0 +1,211 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * Copyright (c) 2019, Microsoft Corporation.
+ * Pavel Tatashin <patatash@linux.microsoft.com>
+ */
+
+/*
+ * Transitional tables are used during system transferring from one world to
+ * another: such as during hibernate restore, and kexec reboots. During these
+ * phases one cannot rely on page table not being overwritten.
+ *
+ */
+
+#include <asm/trans_pgd.h>
+#include <asm/pgalloc.h>
+#include <asm/pgtable.h>
+#include <linux/suspend.h>
+
+static void _copy_pte(pte_t *dst_ptep, pte_t *src_ptep, unsigned long addr)
+{
+	pte_t pte = READ_ONCE(*src_ptep);
+
+	if (pte_valid(pte)) {
+		/*
+		 * Resume will overwrite areas that may be marked
+		 * read only (code, rodata). Clear the RDONLY bit from
+		 * the temporary mappings we use during restore.
+		 */
+		set_pte(dst_ptep, pte_mkwrite(pte));
+	} else if (debug_pagealloc_enabled() && !pte_none(pte)) {
+		/*
+		 * debug_pagealloc will removed the PTE_VALID bit if
+		 * the page isn't in use by the resume kernel. It may have
+		 * been in use by the original kernel, in which case we need
+		 * to put it back in our copy to do the restore.
+		 *
+		 * Before marking this entry valid, check the pfn should
+		 * be mapped.
+		 */
+		BUG_ON(!pfn_valid(pte_pfn(pte)));
+
+		set_pte(dst_ptep, pte_mkpresent(pte_mkwrite(pte)));
+	}
+}
+
+static int copy_pte(pmd_t *dst_pmdp, pmd_t *src_pmdp, unsigned long start,
+		    unsigned long end)
+{
+	pte_t *src_ptep;
+	pte_t *dst_ptep;
+	unsigned long addr = start;
+
+	dst_ptep = (pte_t *)get_safe_page(GFP_ATOMIC);
+	if (!dst_ptep)
+		return -ENOMEM;
+	pmd_populate_kernel(&init_mm, dst_pmdp, dst_ptep);
+	dst_ptep = pte_offset_kernel(dst_pmdp, start);
+
+	src_ptep = pte_offset_kernel(src_pmdp, start);
+	do {
+		_copy_pte(dst_ptep, src_ptep, addr);
+	} while (dst_ptep++, src_ptep++, addr += PAGE_SIZE, addr != end);
+
+	return 0;
+}
+
+static int copy_pmd(pud_t *dst_pudp, pud_t *src_pudp, unsigned long start,
+		    unsigned long end)
+{
+	pmd_t *src_pmdp;
+	pmd_t *dst_pmdp;
+	unsigned long next;
+	unsigned long addr = start;
+
+	if (pud_none(READ_ONCE(*dst_pudp))) {
+		dst_pmdp = (pmd_t *)get_safe_page(GFP_ATOMIC);
+		if (!dst_pmdp)
+			return -ENOMEM;
+		pud_populate(&init_mm, dst_pudp, dst_pmdp);
+	}
+	dst_pmdp = pmd_offset(dst_pudp, start);
+
+	src_pmdp = pmd_offset(src_pudp, start);
+	do {
+		pmd_t pmd = READ_ONCE(*src_pmdp);
+
+		next = pmd_addr_end(addr, end);
+		if (pmd_none(pmd))
+			continue;
+		if (pmd_table(pmd)) {
+			if (copy_pte(dst_pmdp, src_pmdp, addr, next))
+				return -ENOMEM;
+		} else {
+			set_pmd(dst_pmdp,
+				__pmd(pmd_val(pmd) & ~PMD_SECT_RDONLY));
+		}
+	} while (dst_pmdp++, src_pmdp++, addr = next, addr != end);
+
+	return 0;
+}
+
+static int copy_pud(pgd_t *dst_pgdp, pgd_t *src_pgdp, unsigned long start,
+		    unsigned long end)
+{
+	pud_t *dst_pudp;
+	pud_t *src_pudp;
+	unsigned long next;
+	unsigned long addr = start;
+
+	if (pgd_none(READ_ONCE(*dst_pgdp))) {
+		dst_pudp = (pud_t *)get_safe_page(GFP_ATOMIC);
+		if (!dst_pudp)
+			return -ENOMEM;
+		pgd_populate(&init_mm, dst_pgdp, dst_pudp);
+	}
+	dst_pudp = pud_offset(dst_pgdp, start);
+
+	src_pudp = pud_offset(src_pgdp, start);
+	do {
+		pud_t pud = READ_ONCE(*src_pudp);
+
+		next = pud_addr_end(addr, end);
+		if (pud_none(pud))
+			continue;
+		if (pud_table(pud)) {
+			if (copy_pmd(dst_pudp, src_pudp, addr, next))
+				return -ENOMEM;
+		} else {
+			set_pud(dst_pudp,
+				__pud(pud_val(pud) & ~PMD_SECT_RDONLY));
+		}
+	} while (dst_pudp++, src_pudp++, addr = next, addr != end);
+
+	return 0;
+}
+
+static int copy_page_tables(pgd_t *dst_pgdp, unsigned long start,
+			    unsigned long end)
+{
+	unsigned long next;
+	unsigned long addr = start;
+	pgd_t *src_pgdp = pgd_offset_k(start);
+
+	dst_pgdp = pgd_offset_raw(dst_pgdp, start);
+	do {
+		next = pgd_addr_end(addr, end);
+		if (pgd_none(READ_ONCE(*src_pgdp)))
+			continue;
+		if (copy_pud(dst_pgdp, src_pgdp, addr, next))
+			return -ENOMEM;
+	} while (dst_pgdp++, src_pgdp++, addr = next, addr != end);
+
+	return 0;
+}
+
+int trans_pgd_create_copy(pgd_t **dst_pgdp, unsigned long start,
+			  unsigned long end)
+{
+	int rc;
+	pgd_t *trans_pgd = (pgd_t *)get_safe_page(GFP_ATOMIC);
+
+	if (!trans_pgd) {
+		pr_err("Failed to allocate memory for temporary page tables.\n");
+		return -ENOMEM;
+	}
+
+	rc = copy_page_tables(trans_pgd, start, end);
+	if (!rc)
+		*dst_pgdp = trans_pgd;
+
+	return rc;
+}
+
+int trans_pgd_map_page(pgd_t *trans_pgd, void *page, unsigned long dst_addr,
+		       pgprot_t pgprot)
+{
+	pgd_t *pgdp;
+	pud_t *pudp;
+	pmd_t *pmdp;
+	pte_t *ptep;
+
+	pgdp = pgd_offset_raw(trans_pgd, dst_addr);
+	if (pgd_none(READ_ONCE(*pgdp))) {
+		pudp = (void *)get_safe_page(GFP_ATOMIC);
+		if (!pudp)
+			return -ENOMEM;
+		pgd_populate(&init_mm, pgdp, pudp);
+	}
+
+	pudp = pud_offset(pgdp, dst_addr);
+	if (pud_none(READ_ONCE(*pudp))) {
+		pmdp = (void *)get_safe_page(GFP_ATOMIC);
+		if (!pmdp)
+			return -ENOMEM;
+		pud_populate(&init_mm, pudp, pmdp);
+	}
+
+	pmdp = pmd_offset(pudp, dst_addr);
+	if (pmd_none(READ_ONCE(*pmdp))) {
+		ptep = (void *)get_safe_page(GFP_ATOMIC);
+		if (!ptep)
+			return -ENOMEM;
+		pmd_populate_kernel(&init_mm, pmdp, ptep);
+	}
+
+	ptep = pte_offset_kernel(pmdp, dst_addr);
+	set_pte(ptep, pfn_pte(virt_to_pfn(page), PAGE_KERNEL_EXEC));
+
+	return 0;
+}
-- 
2.23.0


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related

* [PATCH v3 06/17] arm64, hibernate: add trans_pgd public functions
From: Pavel Tatashin @ 2019-08-21 18:31 UTC (permalink / raw)
  To: pasha.tatashin, jmorris, sashal, ebiederm, kexec, linux-kernel,
	corbet, catalin.marinas, will, linux-arm-kernel, marc.zyngier,
	james.morse, vladimir.murzin, matthias.bgg, bhsharma, linux-mm,
	mark.rutland
In-Reply-To: <20190821183204.23576-1-pasha.tatashin@soleen.com>

trans_pgd_create_copy() and trans_pgd_map_page() are going to be
the basis for public interface of new subsystem that handles page
tables for cases which are between kernels: kexec, and hibernate.

Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
---
 arch/arm64/kernel/hibernate.c | 94 ++++++++++++++++++++++-------------
 1 file changed, 60 insertions(+), 34 deletions(-)

diff --git a/arch/arm64/kernel/hibernate.c b/arch/arm64/kernel/hibernate.c
index 750ecc7f2cbe..2e29d620b56c 100644
--- a/arch/arm64/kernel/hibernate.c
+++ b/arch/arm64/kernel/hibernate.c
@@ -182,39 +182,15 @@ int arch_hibernation_header_restore(void *addr)
 }
 EXPORT_SYMBOL(arch_hibernation_header_restore);
 
-/*
- * Copies length bytes, starting at src_start into an new page,
- * perform cache maintentance, then maps it at the specified address low
- * address as executable.
- *
- * This is used by hibernate to copy the code it needs to execute when
- * overwriting the kernel text. This function generates a new set of page
- * tables, which it loads into ttbr0.
- *
- * Length is provided as we probably only want 4K of data, even on a 64K
- * page system.
- */
-static int create_safe_exec_page(void *src_start, size_t length,
-				 unsigned long dst_addr,
-				 phys_addr_t *phys_dst_addr)
+int trans_pgd_map_page(pgd_t *trans_pgd, void *page,
+		       unsigned long dst_addr,
+		       pgprot_t pgprot)
 {
-	void *page = (void *)get_safe_page(GFP_ATOMIC);
-	pgd_t *trans_pgd;
 	pgd_t *pgdp;
 	pud_t *pudp;
 	pmd_t *pmdp;
 	pte_t *ptep;
 
-	if (!page)
-		return -ENOMEM;
-
-	memcpy(page, src_start, length);
-	__flush_icache_range((unsigned long)page, (unsigned long)page + length);
-
-	trans_pgd = (void *)get_safe_page(GFP_ATOMIC);
-	if (!trans_pgd)
-		return -ENOMEM;
-
 	pgdp = pgd_offset_raw(trans_pgd, dst_addr);
 	if (pgd_none(READ_ONCE(*pgdp))) {
 		pudp = (void *)get_safe_page(GFP_ATOMIC);
@@ -242,6 +218,44 @@ static int create_safe_exec_page(void *src_start, size_t length,
 	ptep = pte_offset_kernel(pmdp, dst_addr);
 	set_pte(ptep, pfn_pte(virt_to_pfn(page), PAGE_KERNEL_EXEC));
 
+	return 0;
+}
+
+/*
+ * Copies length bytes, starting at src_start into an new page,
+ * perform cache maintentance, then maps it at the specified address low
+ * address as executable.
+ *
+ * This is used by hibernate to copy the code it needs to execute when
+ * overwriting the kernel text. This function generates a new set of page
+ * tables, which it loads into ttbr0.
+ *
+ * Length is provided as we probably only want 4K of data, even on a 64K
+ * page system.
+ */
+static int create_safe_exec_page(void *src_start, size_t length,
+				 unsigned long dst_addr,
+				 phys_addr_t *phys_dst_addr)
+{
+	void *page = (void *)get_safe_page(GFP_ATOMIC);
+	pgd_t *trans_pgd;
+	int rc;
+
+	if (!page)
+		return -ENOMEM;
+
+	memcpy(page, src_start, length);
+	__flush_icache_range((unsigned long)page, (unsigned long)page + length);
+
+	trans_pgd = (void *)get_safe_page(GFP_ATOMIC);
+	if (!trans_pgd)
+		return -ENOMEM;
+
+	rc = trans_pgd_map_page(trans_pgd, page, dst_addr,
+				PAGE_KERNEL_EXEC);
+	if (rc)
+		return rc;
+
 	/*
 	 * Load our new page tables. A strict BBM approach requires that we
 	 * ensure that TLBs are free of any entries that may overlap with the
@@ -462,6 +476,24 @@ static int copy_page_tables(pgd_t *dst_pgdp, unsigned long start,
 	return 0;
 }
 
+int trans_pgd_create_copy(pgd_t **dst_pgdp, unsigned long start,
+			  unsigned long end)
+{
+	int rc;
+	pgd_t *trans_pgd = (pgd_t *)get_safe_page(GFP_ATOMIC);
+
+	if (!trans_pgd) {
+		pr_err("Failed to allocate memory for temporary page tables.\n");
+		return -ENOMEM;
+	}
+
+	rc = copy_page_tables(trans_pgd, start, end);
+	if (!rc)
+		*dst_pgdp = trans_pgd;
+
+	return rc;
+}
+
 /*
  * Setup then Resume from the hibernate image using swsusp_arch_suspend_exit().
  *
@@ -483,13 +515,7 @@ int swsusp_arch_resume(void)
 	 * Create a second copy of just the linear map, and use this when
 	 * restoring.
 	 */
-	tmp_pg_dir = (pgd_t *)get_safe_page(GFP_ATOMIC);
-	if (!tmp_pg_dir) {
-		pr_err("Failed to allocate memory for temporary page tables.\n");
-		rc = -ENOMEM;
-		goto out;
-	}
-	rc = copy_page_tables(tmp_pg_dir, PAGE_OFFSET, 0);
+	rc = trans_pgd_create_copy(&tmp_pg_dir, PAGE_OFFSET, 0);
 	if (rc)
 		goto out;
 
-- 
2.23.0


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related

* [PATCH v3 05/17] arm64, hibernate: check pgd table allocation
From: Pavel Tatashin @ 2019-08-21 18:31 UTC (permalink / raw)
  To: pasha.tatashin, jmorris, sashal, ebiederm, kexec, linux-kernel,
	corbet, catalin.marinas, will, linux-arm-kernel, marc.zyngier,
	james.morse, vladimir.murzin, matthias.bgg, bhsharma, linux-mm,
	mark.rutland
In-Reply-To: <20190821183204.23576-1-pasha.tatashin@soleen.com>

There is a bug in create_safe_exec_page(), when page table is allocated
it is not checked that table is allocated successfully:

But it is dereferenced in: pgd_none(READ_ONCE(*pgdp)).

Another issue, is that phys_to_ttbr() uses an offset in page table instead
of pgd directly.

So, allocate page table, check that allocation was successful, and use it
directly to set ttbr0_el1.

Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
---
 arch/arm64/kernel/hibernate.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kernel/hibernate.c b/arch/arm64/kernel/hibernate.c
index ee34a06d8a35..750ecc7f2cbe 100644
--- a/arch/arm64/kernel/hibernate.c
+++ b/arch/arm64/kernel/hibernate.c
@@ -199,6 +199,7 @@ static int create_safe_exec_page(void *src_start, size_t length,
 				 phys_addr_t *phys_dst_addr)
 {
 	void *page = (void *)get_safe_page(GFP_ATOMIC);
+	pgd_t *trans_pgd;
 	pgd_t *pgdp;
 	pud_t *pudp;
 	pmd_t *pmdp;
@@ -210,7 +211,11 @@ static int create_safe_exec_page(void *src_start, size_t length,
 	memcpy(page, src_start, length);
 	__flush_icache_range((unsigned long)page, (unsigned long)page + length);
 
-	pgdp = pgd_offset_raw((void *)get_safe_page(GFP_ATOMIC), dst_addr);
+	trans_pgd = (void *)get_safe_page(GFP_ATOMIC);
+	if (!trans_pgd)
+		return -ENOMEM;
+
+	pgdp = pgd_offset_raw(trans_pgd, dst_addr);
 	if (pgd_none(READ_ONCE(*pgdp))) {
 		pudp = (void *)get_safe_page(GFP_ATOMIC);
 		if (!pudp)
@@ -251,7 +256,7 @@ static int create_safe_exec_page(void *src_start, size_t length,
 	 */
 	cpu_set_reserved_ttbr0();
 	local_flush_tlb_all();
-	write_sysreg(phys_to_ttbr(virt_to_phys(pgdp)), ttbr0_el1);
+	write_sysreg(phys_to_ttbr(virt_to_phys(trans_pgd)), ttbr0_el1);
 	isb();
 
 	*phys_dst_addr = virt_to_phys(page);
-- 
2.23.0


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related

* [PATCH v3 04/17] arm64, hibernate: rename dst to page in create_safe_exec_page
From: Pavel Tatashin @ 2019-08-21 18:31 UTC (permalink / raw)
  To: pasha.tatashin, jmorris, sashal, ebiederm, kexec, linux-kernel,
	corbet, catalin.marinas, will, linux-arm-kernel, marc.zyngier,
	james.morse, vladimir.murzin, matthias.bgg, bhsharma, linux-mm,
	mark.rutland
In-Reply-To: <20190821183204.23576-1-pasha.tatashin@soleen.com>

create_safe_exec_page() allocates a safe page and maps it at a
specific location, also this function returns the physical address
of newly allocated page.

The destination VA, and PA are specified in arguments: dst_addr,
phys_dst_addr

However, within the function it uses "dst" which has unsigned long
type, but is actually a pointers in the current virtual space. This
is confusing to read.

Rename dst to more appropriate page (page that is created), and also
change its time to "void *"

Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
---
 arch/arm64/kernel/hibernate.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/kernel/hibernate.c b/arch/arm64/kernel/hibernate.c
index c8211108ec11..ee34a06d8a35 100644
--- a/arch/arm64/kernel/hibernate.c
+++ b/arch/arm64/kernel/hibernate.c
@@ -198,17 +198,17 @@ static int create_safe_exec_page(void *src_start, size_t length,
 				 unsigned long dst_addr,
 				 phys_addr_t *phys_dst_addr)
 {
+	void *page = (void *)get_safe_page(GFP_ATOMIC);
 	pgd_t *pgdp;
 	pud_t *pudp;
 	pmd_t *pmdp;
 	pte_t *ptep;
-	unsigned long dst = get_safe_page(GFP_ATOMIC);
 
-	if (!dst)
+	if (!page)
 		return -ENOMEM;
 
-	memcpy((void *)dst, src_start, length);
-	__flush_icache_range(dst, dst + length);
+	memcpy(page, src_start, length);
+	__flush_icache_range((unsigned long)page, (unsigned long)page + length);
 
 	pgdp = pgd_offset_raw((void *)get_safe_page(GFP_ATOMIC), dst_addr);
 	if (pgd_none(READ_ONCE(*pgdp))) {
@@ -235,7 +235,7 @@ static int create_safe_exec_page(void *src_start, size_t length,
 	}
 
 	ptep = pte_offset_kernel(pmdp, dst_addr);
-	set_pte(ptep, pfn_pte(virt_to_pfn(dst), PAGE_KERNEL_EXEC));
+	set_pte(ptep, pfn_pte(virt_to_pfn(page), PAGE_KERNEL_EXEC));
 
 	/*
 	 * Load our new page tables. A strict BBM approach requires that we
@@ -254,7 +254,7 @@ static int create_safe_exec_page(void *src_start, size_t length,
 	write_sysreg(phys_to_ttbr(virt_to_phys(pgdp)), ttbr0_el1);
 	isb();
 
-	*phys_dst_addr = virt_to_phys((void *)dst);
+	*phys_dst_addr = virt_to_phys(page);
 
 	return 0;
 }
-- 
2.23.0


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related

* [PATCH v3 03/17] arm64, hibernate: remove gotos in create_safe_exec_page
From: Pavel Tatashin @ 2019-08-21 18:31 UTC (permalink / raw)
  To: pasha.tatashin, jmorris, sashal, ebiederm, kexec, linux-kernel,
	corbet, catalin.marinas, will, linux-arm-kernel, marc.zyngier,
	james.morse, vladimir.murzin, matthias.bgg, bhsharma, linux-mm,
	mark.rutland
In-Reply-To: <20190821183204.23576-1-pasha.tatashin@soleen.com>

Usually, gotos are used to handle cleanup after exception, but
in case of create_safe_exec_page there are no clean-ups. So,
simply return the errors directly.

Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
---
 arch/arm64/kernel/hibernate.c | 28 +++++++++-------------------
 1 file changed, 9 insertions(+), 19 deletions(-)

diff --git a/arch/arm64/kernel/hibernate.c b/arch/arm64/kernel/hibernate.c
index 4bb4d17a6a7c..c8211108ec11 100644
--- a/arch/arm64/kernel/hibernate.c
+++ b/arch/arm64/kernel/hibernate.c
@@ -198,17 +198,14 @@ static int create_safe_exec_page(void *src_start, size_t length,
 				 unsigned long dst_addr,
 				 phys_addr_t *phys_dst_addr)
 {
-	int rc = 0;
 	pgd_t *pgdp;
 	pud_t *pudp;
 	pmd_t *pmdp;
 	pte_t *ptep;
 	unsigned long dst = get_safe_page(GFP_ATOMIC);
 
-	if (!dst) {
-		rc = -ENOMEM;
-		goto out;
-	}
+	if (!dst)
+		return -ENOMEM;
 
 	memcpy((void *)dst, src_start, length);
 	__flush_icache_range(dst, dst + length);
@@ -216,30 +213,24 @@ static int create_safe_exec_page(void *src_start, size_t length,
 	pgdp = pgd_offset_raw((void *)get_safe_page(GFP_ATOMIC), dst_addr);
 	if (pgd_none(READ_ONCE(*pgdp))) {
 		pudp = (void *)get_safe_page(GFP_ATOMIC);
-		if (!pudp) {
-			rc = -ENOMEM;
-			goto out;
-		}
+		if (!pudp)
+			return -ENOMEM;
 		pgd_populate(&init_mm, pgdp, pudp);
 	}
 
 	pudp = pud_offset(pgdp, dst_addr);
 	if (pud_none(READ_ONCE(*pudp))) {
 		pmdp = (void *)get_safe_page(GFP_ATOMIC);
-		if (!pmdp) {
-			rc = -ENOMEM;
-			goto out;
-		}
+		if (!pmdp)
+			return -ENOMEM;
 		pud_populate(&init_mm, pudp, pmdp);
 	}
 
 	pmdp = pmd_offset(pudp, dst_addr);
 	if (pmd_none(READ_ONCE(*pmdp))) {
 		ptep = (void *)get_safe_page(GFP_ATOMIC);
-		if (!ptep) {
-			rc = -ENOMEM;
-			goto out;
-		}
+		if (!ptep)
+			return -ENOMEM;
 		pmd_populate_kernel(&init_mm, pmdp, ptep);
 	}
 
@@ -265,8 +256,7 @@ static int create_safe_exec_page(void *src_start, size_t length,
 
 	*phys_dst_addr = virt_to_phys((void *)dst);
 
-out:
-	return rc;
+	return 0;
 }
 
 #define dcache_clean_range(start, end)	__flush_dcache_area(start, (end - start))
-- 
2.23.0


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related

* [PATCH v3 00/17] arm64: MMU enabled kexec relocation
From: Pavel Tatashin @ 2019-08-21 18:31 UTC (permalink / raw)
  To: pasha.tatashin, jmorris, sashal, ebiederm, kexec, linux-kernel,
	corbet, catalin.marinas, will, linux-arm-kernel, marc.zyngier,
	james.morse, vladimir.murzin, matthias.bgg, bhsharma, linux-mm,
	mark.rutland

Changelog:
v3:
	- Split changes to create_safe_exec_page() into several patches for
	  easier review as request by Mark Rutland. This is why this series
	  has 3 more patches.
	- Renamed trans_table to tans_pgd as agreed with Mark. The header
	  comment in trans_pgd.c explains that trans stands for
	  transitional page tables. Meaning they are used in transition
	  between two kernels.
v2:
	- Fixed hibernate bug reported by James Morse
	- Addressed comments from James Morse:
	  * More incremental changes to trans_table
	  * Removed TRANS_FORCEMAP
	  * Added kexec reboot data for image with 380M in size.

Enable MMU during kexec relocation in order to improve reboot performance.

If kexec functionality is used for a fast system update, with a minimal
downtime, the relocation of kernel + initramfs takes a significant portion
of reboot.

The reason for slow relocation is because it is done without MMU, and thus
not benefiting from D-Cache.

Performance data
----------------
For this experiment, the size of kernel plus initramfs is small, only 25M.
If initramfs was larger, than the improvements would be greater, as time
spent in relocation is proportional to the size of relocation.

Previously:
kernel shutdown	0.022131328s
relocation	0.440510736s
kernel startup	0.294706768s

Relocation was taking: 58.2% of reboot time

Now:
kernel shutdown	0.032066576s
relocation	0.022158152s
kernel startup	0.296055880s

Now: Relocation takes 6.3% of reboot time

Total reboot is x2.16 times faster.

With bigger userland (fitImage 380M), the reboot time is improved by 3.57s,
and is reduced from 3.9s down to 0.33s

Previous approaches and discussions
-----------------------------------
https://lore.kernel.org/lkml/20190817024629.26611-1-pasha.tatashin@soleen.com
version 2 of this series

https://lore.kernel.org/lkml/20190801152439.11363-1-pasha.tatashin@soleen.com
version 1 of this series

https://lore.kernel.org/lkml/20190709182014.16052-1-pasha.tatashin@soleen.com
reserve space for kexec to avoid relocation, involves changes to generic code
to optimize a problem that exists on arm64 only:

https://lore.kernel.org/lkml/20190716165641.6990-1-pasha.tatashin@soleen.com
The first attempt to enable MMU, some bugs that prevented performance
improvement. The page tables unnecessary configured idmap for the whole
physical space.

https://lore.kernel.org/lkml/20190731153857.4045-1-pasha.tatashin@soleen.com
No linear copy, bug with EL2 reboots.

Pavel Tatashin (17):
  kexec: quiet down kexec reboot
  arm64, hibernate: use get_safe_page directly
  arm64, hibernate: remove gotos in create_safe_exec_page
  arm64, hibernate: rename dst to page in create_safe_exec_page
  arm64, hibernate: check pgd table allocation
  arm64, hibernate: add trans_pgd public functions
  arm64, hibernate: move page handling function to new trans_pgd.c
  arm64, trans_pgd: make trans_pgd_map_page generic
  arm64, trans_pgd: add trans_pgd_create_empty
  arm64, trans_pgd: adjust trans_pgd_create_copy interface
  arm64, trans_pgd: add PUD_SECT_RDONLY
  arm64, trans_pgd: complete generalization of trans_pgds
  kexec: add machine_kexec_post_load()
  arm64, kexec: move relocation function setup and clean up
  arm64, kexec: add expandable argument to relocation function
  arm64, kexec: configure trans_pgd page table for kexec
  arm64, kexec: enable MMU during kexec relocation

 arch/arm64/Kconfig                     |   4 +
 arch/arm64/include/asm/kexec.h         |  51 ++++-
 arch/arm64/include/asm/pgtable-hwdef.h |   1 +
 arch/arm64/include/asm/trans_pgd.h     |  63 ++++++
 arch/arm64/kernel/asm-offsets.c        |  14 ++
 arch/arm64/kernel/cpu-reset.S          |   4 +-
 arch/arm64/kernel/cpu-reset.h          |   8 +-
 arch/arm64/kernel/hibernate.c          | 261 ++++++------------------
 arch/arm64/kernel/machine_kexec.c      | 199 ++++++++++++++----
 arch/arm64/kernel/relocate_kernel.S    | 196 +++++++++---------
 arch/arm64/mm/Makefile                 |   1 +
 arch/arm64/mm/trans_pgd.c              | 270 +++++++++++++++++++++++++
 kernel/kexec.c                         |   4 +
 kernel/kexec_core.c                    |   8 +-
 kernel/kexec_file.c                    |   4 +
 kernel/kexec_internal.h                |   2 +
 16 files changed, 750 insertions(+), 340 deletions(-)
 create mode 100644 arch/arm64/include/asm/trans_pgd.h
 create mode 100644 arch/arm64/mm/trans_pgd.c

-- 
2.23.0


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply

* [PATCH v3 02/17] arm64, hibernate: use get_safe_page directly
From: Pavel Tatashin @ 2019-08-21 18:31 UTC (permalink / raw)
  To: pasha.tatashin, jmorris, sashal, ebiederm, kexec, linux-kernel,
	corbet, catalin.marinas, will, linux-arm-kernel, marc.zyngier,
	james.morse, vladimir.murzin, matthias.bgg, bhsharma, linux-mm,
	mark.rutland
In-Reply-To: <20190821183204.23576-1-pasha.tatashin@soleen.com>

create_safe_exec_page is a local function that uses the
get_safe_page() to allocate page table and pages and one pages
that is getting mapped.

Remove the allocator related arguments, and use get_safe_page
directly, as it is done in other local functions in this
file.

Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
---
 arch/arm64/kernel/hibernate.c | 17 +++++++----------
 1 file changed, 7 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/kernel/hibernate.c b/arch/arm64/kernel/hibernate.c
index 9341fcc6e809..4bb4d17a6a7c 100644
--- a/arch/arm64/kernel/hibernate.c
+++ b/arch/arm64/kernel/hibernate.c
@@ -196,16 +196,14 @@ EXPORT_SYMBOL(arch_hibernation_header_restore);
  */
 static int create_safe_exec_page(void *src_start, size_t length,
 				 unsigned long dst_addr,
-				 phys_addr_t *phys_dst_addr,
-				 void *(*allocator)(gfp_t mask),
-				 gfp_t mask)
+				 phys_addr_t *phys_dst_addr)
 {
 	int rc = 0;
 	pgd_t *pgdp;
 	pud_t *pudp;
 	pmd_t *pmdp;
 	pte_t *ptep;
-	unsigned long dst = (unsigned long)allocator(mask);
+	unsigned long dst = get_safe_page(GFP_ATOMIC);
 
 	if (!dst) {
 		rc = -ENOMEM;
@@ -215,9 +213,9 @@ static int create_safe_exec_page(void *src_start, size_t length,
 	memcpy((void *)dst, src_start, length);
 	__flush_icache_range(dst, dst + length);
 
-	pgdp = pgd_offset_raw(allocator(mask), dst_addr);
+	pgdp = pgd_offset_raw((void *)get_safe_page(GFP_ATOMIC), dst_addr);
 	if (pgd_none(READ_ONCE(*pgdp))) {
-		pudp = allocator(mask);
+		pudp = (void *)get_safe_page(GFP_ATOMIC);
 		if (!pudp) {
 			rc = -ENOMEM;
 			goto out;
@@ -227,7 +225,7 @@ static int create_safe_exec_page(void *src_start, size_t length,
 
 	pudp = pud_offset(pgdp, dst_addr);
 	if (pud_none(READ_ONCE(*pudp))) {
-		pmdp = allocator(mask);
+		pmdp = (void *)get_safe_page(GFP_ATOMIC);
 		if (!pmdp) {
 			rc = -ENOMEM;
 			goto out;
@@ -237,7 +235,7 @@ static int create_safe_exec_page(void *src_start, size_t length,
 
 	pmdp = pmd_offset(pudp, dst_addr);
 	if (pmd_none(READ_ONCE(*pmdp))) {
-		ptep = allocator(mask);
+		ptep = (void *)get_safe_page(GFP_ATOMIC);
 		if (!ptep) {
 			rc = -ENOMEM;
 			goto out;
@@ -523,8 +521,7 @@ int swsusp_arch_resume(void)
 	 */
 	rc = create_safe_exec_page(__hibernate_exit_text_start, exit_size,
 				   (unsigned long)hibernate_exit,
-				   &phys_hibernate_exit,
-				   (void *)get_safe_page, GFP_ATOMIC);
+				   &phys_hibernate_exit);
 	if (rc) {
 		pr_err("Failed to create safe executable page for hibernate_exit code.\n");
 		goto out;
-- 
2.23.0


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related

* [PATCH v3 01/17] kexec: quiet down kexec reboot
From: Pavel Tatashin @ 2019-08-21 18:31 UTC (permalink / raw)
  To: pasha.tatashin, jmorris, sashal, ebiederm, kexec, linux-kernel,
	corbet, catalin.marinas, will, linux-arm-kernel, marc.zyngier,
	james.morse, vladimir.murzin, matthias.bgg, bhsharma, linux-mm,
	mark.rutland
In-Reply-To: <20190821183204.23576-1-pasha.tatashin@soleen.com>

Here is a regular kexec command sequence and output:
=====
$ kexec --reuse-cmdline -i --load Image
$ kexec -e
[  161.342002] kexec_core: Starting new kernel

Welcome to Buildroot
buildroot login:
=====

Even when "quiet" kernel parameter is specified, "kexec_core: Starting
new kernel" is printed.

This message has  KERN_EMERG level, but there is no emergency, it is a
normal kexec operation, so quiet it down to appropriate KERN_NOTICE.

Machines that have slow console baud rate benefit from less output.

Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: Simon Horman <horms@verge.net.au>
---
 kernel/kexec_core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index d5870723b8ad..2c5b72863b7b 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -1169,7 +1169,7 @@ int kernel_kexec(void)
 		 * CPU hotplug again; so re-enable it here.
 		 */
 		cpu_hotplug_enable();
-		pr_emerg("Starting new kernel\n");
+		pr_notice("Starting new kernel\n");
 		machine_shutdown();
 	}
 
-- 
2.23.0


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related

* Re: [PATCH v3 14/19] dt-bindings: pci: add PHY properties to Armada 7K/8K controller bindings
From: Rob Herring @ 2019-08-21 18:28 UTC (permalink / raw)
  To: Miquel Raynal
  Cc: Andrew Lunn, Jason Cooper, devicetree, Antoine Tenart,
	Grzegorz Jaszczyk, Gregory Clement, Russell King,
	Kishon Vijay Abraham I, Nadav Haklai, Thomas Petazzoni,
	Maxime Chevallier, linux-arm-kernel, Sebastian Hesselbarth
In-Reply-To: <20190731122126.3049-15-miquel.raynal@bootlin.com>

On Wed, Jul 31, 2019 at 02:21:21PM +0200, Miquel Raynal wrote:
> Armada CP110 PCIe controller can have from one to four PHYs for
> configuring SERDES lanes (PCIe x1, PCIe x2 or PCIe x4). Describe the
> phys and phy-names properties in the bindings.
> 
> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
> ---
>  Documentation/devicetree/bindings/pci/pci-armada8k.txt | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/Documentation/devicetree/bindings/pci/pci-armada8k.txt b/Documentation/devicetree/bindings/pci/pci-armada8k.txt
> index 9e3fc15e1af8..7cf12162aa4e 100644
> --- a/Documentation/devicetree/bindings/pci/pci-armada8k.txt
> +++ b/Documentation/devicetree/bindings/pci/pci-armada8k.txt
> @@ -17,6 +17,12 @@ Required properties:
>     name must be "core" for the first clock and "reg" for the second
>     one
>  
> +Optional properties:
> +- phys: phandle(s) to PHY node(s) following the generic PHY bindings.
> +	Either 1, 2 or 4 PHYs might be needed depending on the number of
> +	PCIe lanes.
> +- phy-names: names of the PHYs.

You need to enumerate what the names are. Based on your example in v2, I 
don't think the names are really valuable unless you can skip lanes.

Rob

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply

* Re: [PATCH v2] dt-bindings: arm-boards: Update pointer to ARM CPU bindings
From: Rob Herring @ 2019-08-21 18:24 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Mark Rutland, devicetree, Geert Uytterhoeven, Linus Walleij,
	linux-kernel, linux-arm-kernel
In-Reply-To: <20190731114201.7884-1-geert+renesas@glider.be>

On Wed, 31 Jul 2019 13:42:01 +0200, Geert Uytterhoeven wrote:
> The ARM CPU DT bindings were converted from plain text to YAML, but not
> all referrers were updated.
> 
> Fixes: 672951cbd1b70a9e ("dt-bindings: arm: Convert cpu binding to json-schema")
> Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
> Acked-by: Linus Walleij <linus.walleij@linaro.org>
> ---
> v2:
>   - Add Acked-by.
> ---
>  Documentation/devicetree/bindings/arm/arm-boards | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 

Applied, thanks.

Rob

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply

* Re: [PATCH v2 2/3] dt-bindings: display/bridge: Add binding for NWL mipi dsi host controller
From: Laurent Pinchart @ 2019-08-21 18:15 UTC (permalink / raw)
  To: Guido Günther
  Cc: Mark Rutland, devicetree, Jernej Skrabec, Pengutronix Kernel Team,
	Sam Ravnborg, Neil Armstrong, David Airlie, Fabio Estevam,
	Sascha Hauer, Jonas Karlman, linux-kernel, dri-devel,
	Andrzej Hajda, Rob Herring, NXP Linux Team, Daniel Vetter,
	Robert Chiras, Lee Jones, Shawn Guo, linux-arm-kernel
In-Reply-To: <9c906bb6592424acdb1a67447a482e010a113b49.1565367567.git.agx@sigxcpu.org>

Hi Guido,

Thank you for the patch.

On Fri, Aug 09, 2019 at 06:24:22PM +0200, Guido Günther wrote:
> The Northwest Logic MIPI DSI IP core can be found in NXPs i.MX8 SoCs.
> 
> Signed-off-by: Guido Günther <agx@sigxcpu.org>
> ---
>  .../bindings/display/bridge/nwl-dsi.yaml      | 155 ++++++++++++++++++
>  1 file changed, 155 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/display/bridge/nwl-dsi.yaml
> 
> diff --git a/Documentation/devicetree/bindings/display/bridge/nwl-dsi.yaml b/Documentation/devicetree/bindings/display/bridge/nwl-dsi.yaml
> new file mode 100644
> index 000000000000..5ed8bc4a4d18
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/display/bridge/nwl-dsi.yaml
> @@ -0,0 +1,155 @@
> +# SPDX-License-Identifier: GPL-2.0
> +%YAML 1.2
> +---
> +$id: http://devicetree.org/schemas/display/bridge/imx-nwl-dsi.yaml#
> +$schema: http://devicetree.org/meta-schemas/core.yaml#
> +
> +title: Northwest Logic MIPI-DSI on imx SoCs
> +
> +maintainers:
> +  - Guido Gúnther <agx@sigxcpu.org>
> +  - Robert Chiras <robert.chiras@nxp.com>
> +
> +description: |
> +  NWL MIPI-DSI host controller found on i.MX8 platforms. This is a dsi bridge for
> +  the SOCs NWL MIPI-DSI host controller.
> +
> +properties:
> +  compatible:
> +    oneOf:
> +      - items:
> +        - const: fsl,imx8mq-nwl-dsi
> +
> +  reg:
> +    maxItems: 1
> +
> +  interrupts:
> +    maxItems: 1
> +
> +  clocks:
> +    items:
> +      - description: DSI core clock
> +      - description: RX_ESC clock (used in escape mode)
> +      - description: TX_ESC clock (used in escape mode)
> +      - description: PHY_REF clock
> +
> +  clock-names:
> +    items:
> +      - const: core
> +      - const: rx_esc
> +      - const: tx_esc
> +      - const: phy_ref
> +
> +  phys:
> +    maxItems: 1
> +    description:
> +      A phandle to the phy module representing the DPHY
> +
> +  phy-names:
> +    items:
> +      - const: dphy
> +
> +  power-domains:
> +    maxItems: 1
> +    description:
> +      A phandle to the power domain
> +
> +  resets:
> +    maxItems: 4
> +    description:
> +      A phandle to the reset controller
> +
> +  reset-names:
> +    items:
> +      - const: byte
> +      - const: dpi
> +      - const: esc
> +      - const: pclk
> +
> +  mux-sel:
> +    maxItems: 1
> +    description:
> +      A phandle to the MUX register set

Did you mean the MUX syscon ? A phandle to a register set sounds a bit
strange.

> +
> +  port:
> +    type: object
> +    description:
> +      A input put or output port node.

s/input put/input/

> +
> +  ports:
> +    type: object
> +    description:
> +      A node containing DSI input & output port nodes with endpoint
> +      definitions as documented in
> +      Documentation/devicetree/bindings/graph.txt.
> +
> +patternProperties:
> +  "^panel@[0-9]+$": true
> +
> +allOf:
> +  - if:
> +      properties:
> +        compatible:
> +          contains:
> +            enum:
> +              - fsl,imx8mq-nwl-dsi
> +    then:
> +      required:
> +        - resets
> +        - reset-names
> +        - mux-sel
> +
> +required:
> +  - compatible
> +  - reg
> +  - interrupts
> +  - clocks
> +  - clock-names
> +  - phys
> +  - phy-names
> +
> +examples:
> + - |
> +
> +   mipi_dsi: mipi_dsi@30a00000 {
> +              #address-cells = <1>;
> +              #size-cells = <0>;
> +              compatible = "fsl,imx8mq-nwl-dsi";
> +              reg = <0x30A00000 0x300>;
> +              clocks = <&clk 163>, <&clk 244>, <&clk 245>, <&clk 164>;
> +              clock-names = "core", "rx_esc", "tx_esc", "phy_ref";
> +              interrupts = <0 34 4>;
> +              power-domains = <&pgc_mipi>;
> +              resets = <&src 0>, <&src 1>, <&src 2>, <&src 3>;
> +              reset-names = "byte", "dpi", "esc", "pclk";
> +              mux-sel = <&iomuxc_gpr>;
> +              phys = <&dphy>;
> +              phy-names = "dphy";
> +
> +              panel@0 {
> +                      compatible = "...";
> +                      port@0 {
> +                           panel_in: endpoint {
> +                                     remote-endpoint = <&mipi_dsi_out>;
> +                           };
> +                      };
> +              };
> +
> +              ports {
> +                    #address-cells = <1>;
> +                    #size-cells = <0>;
> +
> +                    port@0 {
> +                           reg = <0>;
> +                           mipi_dsi_in: endpoint {
> +                                        remote-endpoint = <&lcdif_mipi_dsi>;
> +                           };
> +                    };
> +                    port@1 {
> +                           reg = <1>;
> +                           mipi_dsi_out: endpoint {
> +                                         remote-endpoint = <&panel_in>;
> +                           };
> +                    };
> +              };
> +      };

-- 
Regards,

Laurent Pinchart

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply

* Re: [PATCH 2/8] soc: ti: add initial PRM driver with reset control support
From: Tero Kristo @ 2019-08-21 18:15 UTC (permalink / raw)
  To: Suman Anna, Philipp Zabel, Keerthy, ssantosh, linux-arm-kernel,
	linux-omap, robh+dt
  Cc: tony, devicetree
In-Reply-To: <5e82199f-2f75-ee05-ba65-1595d0526572@ti.com>

On 21.8.2019 18.45, Suman Anna wrote:
> On 8/21/19 10:10 AM, Philipp Zabel wrote:
>> On Tue, 2019-08-20 at 11:47 -0500, Suman Anna wrote:
>>> On 8/20/19 2:37 AM, Tero Kristo wrote:
>>>> On 20.8.2019 2.01, Suman Anna wrote:
>>>>> Hi Tero,
>>>>>
>>>>> On 8/19/19 4:32 AM, Tero Kristo wrote:
>> [...]
>>>>>>>> +{
>>>>>>>> +    struct omap_reset_data *reset;
>>>>>>>> +
>>>>>>>> +    /*
>>>>>>>> +     * Check if we have resets. If either rstctl or rstst is
>>>>>>>> +     * non-zero, we have reset registers in place. Additionally
>>>>>>>> +     * the flag OMAP_PRM_NO_RSTST implies that we have resets.
>>>>>>>> +     */
>>>>>>>> +    if (!prm->data->rstctl && !prm->data->rstst &&
>>>>>>>> +        !(prm->data->flags & OMAP_PRM_NO_RSTST))
>>>>>>>> +        return 0;
>>>>>>>> +
>>>>>>>> +    reset = devm_kzalloc(&pdev->dev, sizeof(*reset), GFP_KERNEL);
>>>>>>>> +    if (!reset)
>>>>>>>> +        return -ENOMEM;
>>>>>>>> +
>>>>>>>> +    reset->rcdev.owner = THIS_MODULE;
>>>>>>>> +    reset->rcdev.ops = &omap_reset_ops;
>>>>>>>> +    reset->rcdev.of_node = pdev->dev.of_node;
>>>>>>>> +    reset->rcdev.nr_resets = OMAP_MAX_RESETS;
>>>>>
>>>>> Suggest adding a number of resets to prm->data, and using it so that we
>>>>> don't even entertain any resets beyond the actual number of resets.
>>>>
>>>> Hmm why bother? Accessing a stale reset bit will just cause access to a
>>>> reserved bit in the reset register, doing basically nothing. Also, this
>>>> would not work for am3/am4 wkup, as there is a single reset bit at an
>>>> arbitrary position.
>>>
>>> The generic convention seems to be defining a reset id value defined
>>> from include/dt-bindings/reset/ that can be used to match between the
>>> dt-nodes and the reset-controller driver.
>>>
>>> Philipp,
>>> Any comments?
>>
>> Are there only reset bits and reserved bits in the range accessible by
>> [0..OMAP_MAX_RESETS] or are ther bits with another function as well?
> 
> Thanks Philipp, these are just reset bits and reserved bits.
> 
>> If the latter is the case, I would prefer enumerating the resets in a
>> dt-bindings header, with the driver containing an enum -> reg/bit
>> position lookup table.
>>
>> In general, assuming the device tree contains no errors, this should not
>> matter much, but I think it is nice if the reset driver, even with a
>> misconfigured device tree, can't write into arbitrary bit fields.
> 
> Tero,
> Can you add a check for this if possible?

Well, I can enforce the usage of reset bit mapping, which I have already 
implemented for some SoCs like am33xx. If the specific ID is not found, 
I can bail out. So, basically in this example requesting reset at index 
3 would succeed, but it would fail for any other ID; this would be 
direct HW bit mapping.

-Tero
--
Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki. Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply

* Re: [PATCH v2 2/4] nvmem: meson-efuse: bindings: Add secure-monitor phandle
From: Rob Herring @ 2019-08-21 18:14 UTC (permalink / raw)
  To: Carlo Caione
  Cc: devicetree, narmstrong, khilman, srinivas.kandagatla,
	linux-amlogic, tglx, linux-arm-kernel, jbrunet
In-Reply-To: <20190731082339.20163-3-ccaione@baylibre.com>

On Wed, Jul 31, 2019 at 09:23:37AM +0100, Carlo Caione wrote:
> Add a new property to link the nvmem driver to the secure-monitor. The
> nvmem driver needs to access the secure-monitor to be able to access the
> fuses.
> 
> Signed-off-by: Carlo Caione <ccaione@baylibre.com>
> ---
>  Documentation/devicetree/bindings/nvmem/amlogic-efuse.txt | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/Documentation/devicetree/bindings/nvmem/amlogic-efuse.txt b/Documentation/devicetree/bindings/nvmem/amlogic-efuse.txt
> index 2e0723ab3384..f7b3ed74db54 100644
> --- a/Documentation/devicetree/bindings/nvmem/amlogic-efuse.txt
> +++ b/Documentation/devicetree/bindings/nvmem/amlogic-efuse.txt
> @@ -4,6 +4,7 @@ Required properties:
>  - compatible: should be "amlogic,meson-gxbb-efuse"
>  - clocks: phandle to the efuse peripheral clock provided by the
>  	  clock controller.
> +- secure-monitor: phandle to the secure-monitor node
>  
>  = Data cells =
>  Are child nodes of eFuse, bindings of which as described in
> @@ -16,6 +17,7 @@ Example:
>  		clocks = <&clkc CLKID_EFUSE>;
>  		#address-cells = <1>;
>  		#size-cells = <1>;
> +		secure-monitor = <&sm>;
>  
>  		sn: sn@14 {
>  			reg = <0x14 0x10>;
> @@ -30,6 +32,10 @@ Example:
>  		};
>  	};
>  
> +	sm: secure-monitor {
> +		compatible = "amlogic,meson-gxbb-sm";
> +	};

I guess I acked this a while back, but I'm not sure I would today. It 
seems incomplete and a node with only a compatible string and no 
resources doesn't need to be in DT. But that's already done...

There's no need for 'secure-monitor' anyways. Just do 
'of_find_compatible_node(NULL, NULL, "amlogic,meson-gxbb-sm")' or search 
for the driver directly. It's not like there's more than one secure 
monitor...

Rob

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply

* Re: [PATCH 3/3] watchdog/aspeed: add support for dual boot
From: Guenter Roeck @ 2019-08-21 18:10 UTC (permalink / raw)
  To: Alexander Amelkin
  Cc: linux-watchdog, linux-aspeed, Andrew Jeffery, linux-kernel,
	Joel Stanley, Ivan Mikhaylov, Wim Van Sebroeck, linux-arm-kernel
In-Reply-To: <9e7fe5cc-ba1b-b8b6-69c5-c3c6cf508a36@yadro.com>

On Wed, Aug 21, 2019 at 08:42:24PM +0300, Alexander Amelkin wrote:
> 21.08.2019 19:32, Guenter Roeck wrote:
> > On Wed, Aug 21, 2019 at 06:57:43PM +0300, Ivan Mikhaylov wrote:
> >> Set WDT_CLEAR_TIMEOUT_AND_BOOT_CODE_SELECTION into WDT_CLEAR_TIMEOUT_STATUS
> >> to clear out boot code source and re-enable access to the primary SPI flash
> >> chip while booted via wdt2 from the alternate chip.
> >>
> >> AST2400 datasheet says:
> >> "In the 2nd flash booting mode, all the address mapping to CS0# would be
> >> re-directed to CS1#. And CS0# is not accessable under this mode. To access
> >> CS0#, firmware should clear the 2nd boot mode register in the WDT2 status
> >> register WDT30.bit[1]."
> > Is there reason to not do this automatically when loading the module
> > in alt-boot mode ? What means does userspace have to determine if CS0
> > or CS1 is active at any given time ? If there is reason to ever have CS1
> > active instead of CS0, what means would userspace have to enable it ?
> 
> Yes, there is. The driver is loaded long before the filesystems are mounted. The filesystems, in the event of alternate/recovery boot, need to be mounted from the same chip that the kernel was booted. For one reason because the main chip at CS0 is most probably corrupt. If you clear that bit when driver is loaded, your software will not know that and will try to mount the wrong filesystems. The whole idea of ASPEED's switching chipselects is to have identical firmware in both chips, without the need to process the alternate boot state in any way except for indicating a successful boot and restoring access to CS0 when needed.
> 
> The userspace can read bootstatus sysfs node to determine if an alternate boot has occured.
> 
> With ASPEED, CS1 is activated automatically by wdt2 when system fails to boot from the primary flash chip (at CS0) and disable the watchdog to indicate a successful boot. When that happens, both CS0 and CS1 controls  get routed in hardware to CS1 line, making the primary flash chip inaccessible. Depending on the architecture of the user-space software, it may choose to re-enable access to the primary chip via CS0 at different times. There must be a way to do so.
> 
So by activating cs0, userspace would essentially pull its own root file system
from underneath itself ?

> > If userspace can not really determine if CS1 or CS0 is active, all it could
> > ever do was to enable CS0 to be in a deterministic state. If so, it doesn't
> > make sense to ever have CS1 active, and re-enabling CS0 could be automatic.
> >
> > Similar, if CS1 can ever be enabled, there is no means for userspace to ensure
> > that some other application did not re-enable CS0 while it believes that CS1
> > is enabled. If there is no means for userspace to enable CS1, it can never be
> > sure what is enabled (because some other entity may have enabled CS0 while
> > userspace just thought that CS1 is still enabled). Again, the only means
> > to guarantee a well defined state would be to explicitly enable CS0 and provive
> > no means to enable CS1. Again, this could be done during boot, not requiring
> > an explicit request from userspace.
> 
> Please understand that activation of CS1 in place of CS0 is NOT a software choice!
> 
> 
> >> +	if (unlikely(!wdt))
> >> +		return -ENODEV;
> >> +
> > How would this ever happen, and how / where is drvdata set to NULL ?
> 
> This is purely for robustness. Seeing a pointer obtained via a function accessed without first checking it for validity makes me nervous.
> 
This is not how kernel code is commonly written.
Sure, we could add similar checks to each sysfs access code in the kernel,
blowing up its size significantly. I do not see a point of this.

> This code most probably adds nothing at the assembly level.
> 
That seems quite unlikely. Please demonstrate.

> >
> >> +	writel(WDT_CLEAR_TIMEOUT_AND_BOOT_CODE_SELECTION,
> >> +			wdt->base + WDT_CLEAR_TIMEOUT_STATUS);
> >> +	wdt->wdd.bootstatus |= WDIOF_EXTERN1;
> > The variable reflects the _boot status_. It should not change after booting.
> Is there any documentation that dictates that? All I could find is
> 
> "bootstatus: status of the device after booting". That doesn't look to me like it absolutely can not change to reflect the updated status (that is, to reflect that the originally set up alternate CS routing has been reset to normal).
> 
You choose to interpret "after booting" in a kind of novel way,
which I find a bit disturbing. I am not really sure how else to
describe "boot status" in a way that does not permit such
reinterpratation of the term.

On top of that, how specifically would "WDIOF_EXTERN1" reflect
what you claim it does ? Not only you are hijacking bootstatus9
(which is supposed to describe the reason for a reboot), you
are also hijacking WDIOF_EXTERN1. That seems highly arbitrary
to me, and is not really how an API/ABI should be used.

Guenter

> If you absolutely disallow that, I think we could make 'access_cs0' readable instead, so it could report the current state of the boot code selection bit. Reverted, I suppose. That way 'access_cs0' would report 1 after 1 has been written to it (it wouldn't be possible to write a zero).
> 
> > @@ -223,6 +248,9 @@ static int aspeed_wdt_probe(struct platform_device *pdev)
> >  
> >  	wdt->ctrl = WDT_CTRL_1MHZ_CLK;
> >  
> > +	if (of_property_read_bool(np, "aspeed,alt-boot"))
> > +		wdt->wdd.groups = bswitch_groups;
> > +
> > Why does this have to be separate to the existing evaluation of
> > aspeed,alt-boot, and why does the existing code not work ?
> >
> > Also, is it guaranteed that this does not interfer with existing
> > support for alt-boot ?
> 
> I think Ivan will comment on this.
> 
> With best regards,
> Alexander Amelkin,
> BIOS/BMC Team Lead, YADRO
> https://yadro.com
> 
> 




_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply

* Re: [PATCH] usb: dwc3: meson-g12a: fix suspend resume regulator unbalanced disables
From: Kevin Hilman @ 2019-08-21 18:02 UTC (permalink / raw)
  To: Neil Armstrong, balbi
  Cc: linux-amlogic, linux-usb, linux-kernel, linux-arm-kernel,
	Neil Armstrong
In-Reply-To: <20190821133518.9671-1-narmstrong@baylibre.com>

Neil Armstrong <narmstrong@baylibre.com> writes:

> When going in suspend, in Device mode, then resuming back leads
> to the following:
>
> unbalanced disables for USB_PWR_EN
> WARNING: CPU: 0 PID: 163 at ../drivers/regulator/core.c:2590 _regulator_disable+0x104/0x180
> Hardware name: Amlogic Meson G12A U200 Development Board (DT)
> [...]
> pc : _regulator_disable+0x104/0x180
> lr : _regulator_disable+0x104/0x180
> [...]
> Call trace:
>  _regulator_disable+0x104/0x180
>  regulator_disable+0x40/0x78
>  dwc3_meson_g12a_otg_mode_set+0x84/0xb0
>  dwc3_meson_g12a_irq_thread+0x58/0xb8
>  irq_thread_fn+0x28/0x80
>  irq_thread+0x118/0x1b8
>  kthread+0xf4/0x120
>  ret_from_fork+0x10/0x18
>
> This disables the regulator if enabled on suspend, and the reverse on
> resume.
>
> Fixes: c99993376f72 ("usb: dwc3: Add Amlogic G12A DWC3 glue")
> Reported-by: Kevin Hilman <khilman@baylibre.com>
> Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>

Tested-by: Kevin Hilman <khilman@baylibre.com>

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply

* Re: [PATCH v2 07/14] dt-bindings: dma: ti: Add document for K3 UDMA
From: Rob Herring @ 2019-08-21 17:59 UTC (permalink / raw)
  To: Peter Ujfalusi
  Cc: nm, devicetree, grygorii.strashko, lokeshvutla, j-keerthy,
	linux-kernel, t-kristo, tony, vkoul, ssantosh, dmaengine,
	dan.j.williams, linux-arm-kernel
In-Reply-To: <20190730093450.12664-8-peter.ujfalusi@ti.com>

On Tue, Jul 30, 2019 at 12:34:43PM +0300, Peter Ujfalusi wrote:
> New binding document for
> Texas Instruments K3 NAVSS Unified DMA – Peripheral Root Complex (UDMA-P).
> 
> UDMA-P is introduced as part of the K3 architecture and can be found on
> AM654 and j721e.
> 
> Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
> ---
>  .../devicetree/bindings/dma/ti/k3-udma.txt    | 170 ++++++++++++++++++
>  include/dt-bindings/dma/k3-udma.h             |  10 ++
>  2 files changed, 180 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/dma/ti/k3-udma.txt
>  create mode 100644 include/dt-bindings/dma/k3-udma.h
> 
> diff --git a/Documentation/devicetree/bindings/dma/ti/k3-udma.txt b/Documentation/devicetree/bindings/dma/ti/k3-udma.txt
> new file mode 100644
> index 000000000000..7f30fe583ade
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/dma/ti/k3-udma.txt
> @@ -0,0 +1,170 @@
> +* Texas Instruments K3 NAVSS Unified DMA – Peripheral Root Complex (UDMA-P)
> +
> +The UDMA-P is intended to perform similar (but significantly upgraded) functions
> +as the packet-oriented DMA used on previous SoC devices. The UDMA-P module
> +supports the transmission and reception of various packet types. The UDMA-P is
> +architected to facilitate the segmentation and reassembly of SoC DMA data
> +structure compliant packets to/from smaller data blocks that are natively
> +compatible with the specific requirements of each connected peripheral. Multiple
> +Tx and Rx channels are provided within the DMA which allow multiple segmentation
> +or reassembly operations to be ongoing. The DMA controller maintains state
> +information for each of the channels which allows packet segmentation and
> +reassembly operations to be time division multiplexed between channels in order
> +to share the underlying DMA hardware. An external DMA scheduler is used to
> +control the ordering and rate at which this multiplexing occurs for Transmit
> +operations. The ordering and rate of Receive operations is indirectly controlled
> +by the order in which blocks are pushed into the DMA on the Rx PSI-L interface.
> +
> +The UDMA-P also supports acting as both a UTC and UDMA-C for its internal
> +channels. Channels in the UDMA-P can be configured to be either Packet-Based or
> +Third-Party channels on a channel by channel basis.
> +
> +Required properties:
> +--------------------
> +- compatible:		Should be
> +			"ti,am654-navss-main-udmap" for am654 main NAVSS UDMAP
> +			"ti,am654-navss-mcu-udmap" for am654 mcu NAVSS UDMAP
> +			"ti,j721e-navss-main-udmap" for j721e main NAVSS UDMAP
> +			"ti,j721e-navss-mcu-udmap" for j721e mcu NAVSS UDMAP
> +- #dma-cells:		Should be set to <3>.
> +			- The first parameter is a phandle to the remote PSI-L
> +			  endpoint

This is the phandle of the client? That's weird. More below.

> +			- The second parameter is the thread offset within the
> +			  remote thread ID range
> +			- The third parameter is the channel direction.
> +- reg:			Memory map of UDMAP
> +- reg-names:		"gcfg", "rchanrt", "tchanrt"
> +- msi-parent:		phandle for "ti,sci-inta" interrupt controller
> +- ti,ringacc:		phandle for the ring accelerator node
> +- ti,psil-base:		PSI-L thread ID base of the UDMAP channels
> +- ti,sci:		phandle on TI-SCI compatible System controller node
> +- ti,sci-dev-id:	TI-SCI device id
> +- ti,sci-rm-range-tchan: UDMA tchan resource list in pairs of type and subtype
> +- ti,sci-rm-range-rchan: UDMA rchan resource list in pairs of type and subtype
> +- ti,sci-rm-range-rflow: UDMA rflow resource list in pairs of type and subtype
> +
> +For PSI-L thread management the parent NAVSS node must have:
> +- ti,sci:		phandle on TI-SCI compatible System controller node
> +- ti,sci-dev-id:	TI-SCI device id of the NAVSS instance
> +
> +Remote PSI-L endpoint
> +
> +Required properties:
> +--------------------
> +- ti,psil-base:		PSI-L thread ID base of the endpoint
> +
> +Within the PSI-L endpoint node thread configuration subnodes must present with:
> +psil-configX naming convention, where X is the thread ID offset.
> +
> +Configuration node Optional properties:
> +--------------------
> +- pdma,statictr-type:	In case the remote endpoint (PDMAs) requires StaticTR

Property names are in the form [<vendor>,]prop-name. pdma is not a 
vendor.

> +			configuration:
> +			- PSIL_STATIC_TR_XY (1): XY type of StaticTR
> +			For endpoints without StaticTR the property is not
> +			needed or to be set PSIL_STATIC_TR_NONE (0).
> +- pdma,enable-acc32:	Force 32 bit access on peripheral port. Only valid for
> +			XY type StaticTR, not supported on am654.
> +			Must be enabled for threads servicing McASP with AFIFO
> +			bypass mode.
> +- pdma,enable-burst:	Enable burst access on peripheral port. Only valid for
> +			XY type StaticTR, not supported on am654.
> +- ti,channel-tpl:	Channel Throughput level:
> +			0 / or not present - normal channel
> +			1 - High Throughput channel
> +			2 - Ultra High Throughput channel (j721e only)
> +- ti,needs-epib:	If the endpoint require EPIB to be present in the
> +			descriptor.
> +- ti,psd-size:		Size of the Protocol Specific Data section of the
> +			descriptor.
> +
> +Example:
> +
> +main_navss: main_navss {
> +	compatible = "simple-bus";
> +	#address-cells = <2>;
> +	#size-cells = <2>;
> +	dma-coherent;
> +	dma-ranges;
> +	ranges;
> +
> +	ti,sci = <&dmsc>;
> +	ti,sci-dev-id = <118>;
> +
> +	main_udmap: dma-controller@31150000 {
> +		compatible = "ti,am654-navss-main-udmap";
> +		reg =	<0x0 0x31150000 0x0 0x100>,
> +			<0x0 0x34000000 0x0 0x100000>,
> +			<0x0 0x35000000 0x0 0x100000>;
> +		reg-names = "gcfg", "rchanrt", "tchanrt";
> +		#dma-cells = <3>;
> +
> +		ti,ringacc = <&ringacc>;
> +		ti,psil-base = <0x1000>;
> +
> +		interrupt-parent = <&main_udmass_inta>;
> +
> +		ti,sci = <&dmsc>;
> +		ti,sci-dev-id = <188>;
> +
> +		ti,sci-rm-range-tchan = <0x6 0x1>, /* TX_HCHAN */
> +					<0x6 0x2>; /* TX_CHAN */
> +		ti,sci-rm-range-rchan = <0x6 0x4>, /* RX_HCHAN */
> +					<0x6 0x5>; /* RX_CHAN */
> +		ti,sci-rm-range-rflow = <0x6 0x6>; /* GP RFLOW */
> +	};
> +};
> +
> +psilss@340c000 {
> +	/* PSILSS1 AASRC */
> +	compatible = "ti,j721e-psilss";
> +	reg = <0x0 0x0340c000 0x0 0x1000>;
> +	reg-names = "config";
> +
> +	pdma_main_mcasp_g0: pdma_main_mcasp_g0 {
> +		/* PDMA6 (PDMA_MCASP_G0) */
> +		ti,psil-base = <0x4400>;
> +
> +		/* psil-config0 */
> +		psil-config0 {
> +			pdma,statictr-type = <PSIL_STATIC_TR_XY>;
> +			pdma,enable-acc32;
> +			pdma,enable-burst;
> +		};
> +	};
> +};
> +
> +mcasp0: mcasp@02B00000 {

I don't really follow what psilss and mcasp are...

> +...
> +	/* tx: PDMA_MAIN_MCASP_G0-0, rx: PDMA_MAIN_MCASP_G0-0 */
> +	dmas = <&main_udmap &pdma_main_mcasp_g0 0 UDMA_DIR_TX>,
> +	       <&main_udmap &pdma_main_mcasp_g0 0 UDMA_DIR_RX>;
> +	dma-names = "tx", "rx";
> +...
> +};
> +
> +crypto: crypto@4E00000 {
> +	compatible = "ti,sa2ul-crypto";
> +...
> +
> +	/* tx: crypto_pnp-1, rx: crypto_pnp-1 */
> +	dmas = <&main_udmap &crypto 0 UDMA_DIR_TX>,
> +	       <&main_udmap &crypto 0 UDMA_DIR_RX>,
> +	       <&main_udmap &crypto 1 UDMA_DIR_RX>;

'thread offset' is 1?

> +	dma-names = "tx", "rx1", "rx2";
> +...
> +	psil-config0 {

Are these nodes 1-1 with the 'dmas' entries? I think these flags should 
all be DMA cells. They are all configuration of DMA channels, right?

Though I'm not sure about how that would work for the previous example.

> +		ti,needs-epib;
> +		ti,psd-size = <64>;
> +	};
> +
> +	psil-config1 {
> +		ti,needs-epib;
> +		ti,psd-size = <64>;
> +	};
> +
> +	psil-config2 {
> +		ti,needs-epib;
> +		ti,psd-size = <64>;
> +	};
> +};
> diff --git a/include/dt-bindings/dma/k3-udma.h b/include/dt-bindings/dma/k3-udma.h
> new file mode 100644
> index 000000000000..f5c8f5d50491
> --- /dev/null
> +++ b/include/dt-bindings/dma/k3-udma.h
> @@ -0,0 +1,10 @@
> +#ifndef __DT_TI_UDMA_H
> +#define __DT_TI_UDMA_H
> +
> +#define UDMA_DIR_TX		0
> +#define UDMA_DIR_RX		1
> +
> +#define PSIL_STATIC_TR_NONE	0
> +#define PSIL_STATIC_TR_XY	1
> +
> +#endif /* __DT_TI_UDMA_H */
> -- 
> Peter
> 
> Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki.
> Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki
> 

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply

* Re: [PATCH] ARM: s3c64xx: squash samsung_usb_phy.h into setup-usb-phy.c
From: Krzysztof Kozlowski @ 2019-08-21 17:56 UTC (permalink / raw)
  To: Masahiro Yamada
  Cc: linux-samsung-soc, Greg Kroah-Hartman, linux-usb, linux-kernel,
	Russell King, Kukjin Kim, linux-arm-kernel
In-Reply-To: <20190819155602.20843-1-yamada.masahiro@socionext.com>

On Tue, Aug 20, 2019 at 12:56:02AM +0900, Masahiro Yamada wrote:
> This is only used by arch/arm/mach-s3c64xx/setup-usb-phy.c
> 
> $ git grep samsung_usb_phy_type
> include/linux/usb/samsung_usb_phy.h:enum samsung_usb_phy_type {
> $ git grep USB_PHY_TYPE_DEVICE
> arch/arm/mach-s3c64xx/setup-usb-phy.c:  if (type == USB_PHY_TYPE_DEVICE)
> arch/arm/mach-s3c64xx/setup-usb-phy.c:  if (type == USB_PHY_TYPE_DEVICE)
> include/linux/usb/samsung_usb_phy.h:    USB_PHY_TYPE_DEVICE,
> $ git grep USB_PHY_TYPE_HOST
> include/linux/usb/samsung_usb_phy.h:    USB_PHY_TYPE_HOST,
> 
> Actually, 'enum samsung_usb_phy_type' is unused; the 'type' parameter
> has 'int' type. Anyway, there is no need to declare this enum in the
> globally visible header. Squash the header.
> 
> Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
> ---
> 
>  arch/arm/mach-s3c64xx/setup-usb-phy.c        |  5 +++++
>  arch/arm/plat-samsung/include/plat/usb-phy.h |  2 --
>  include/linux/usb/samsung_usb_phy.h          | 17 -----------------
>  3 files changed, 5 insertions(+), 19 deletions(-)

Thanks, applied.

Best regards,
Krzysztof


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply

* Re: [PATCH v6 3/4] dt-bindings: arm: fsl: Add Kontron i.MX6UL N6310 compatibles
From: Krzysztof Kozlowski @ 2019-08-21 17:54 UTC (permalink / raw)
  To: Rob Herring
  Cc: Mark Rutland, devicetree, Shawn Guo, Sascha Hauer,
	linux-kernel@vger.kernel.org, Schrempf Frieder, NXP Linux Team,
	Pengutronix Kernel Team, Fabio Estevam,
	moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE
In-Reply-To: <CAL_JsqKBWB2FiVjYo9O7DPw1JYJvan7uRgbR0VBG=FfHDVYdZQ@mail.gmail.com>

On Tue, Aug 20, 2019 at 03:27:39PM -0500, Rob Herring wrote:
> > I see. If I understand the schema correctly, this should look like:
> 
> Looks correct, but a couple of comments.
> 
> > diff --git a/Documentation/devicetree/bindings/arm/fsl.yaml b/Documentation/devicetree/bindings/arm/fsl.yaml
> > index 7294ac36f4c0..eb263d1ccf13 100644
> > --- a/Documentation/devicetree/bindings/arm/fsl.yaml
> > +++ b/Documentation/devicetree/bindings/arm/fsl.yaml
> > @@ -161,6 +161,22 @@ properties:
> >          items:
> >            - enum:
> >                - fsl,imx6ul-14x14-evk      # i.MX6 UltraLite 14x14 EVK Board
> > +              - kontron,imx6ul-n6310-som  # Kontron N6310 SOM
> 
> Is the SOM ever used alone? If not, then no point in listing this here.

SoM alone: no, because it requires some type of base board. However it
will be used by some customer designs with some amount of
changes/addons.

Looking at other aproaches, usually SoMs have their own compatible.  In
such case - I should document it somewhere.

> 
> > +          - const: fsl,imx6ul
> > +
> > +      - description: Kontron N6310 S Board
> > +        items:
> > +          - enum:
> > +              - kontron,imx6ul-n6310-s
> 
> This could be a 'const' instead. It depends if you think there will
> ever be more than one entry.

Indeed, I'll make this and entry below for S 43 board const.


Best regards,
Krzysztof


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply

* [RESEND PATCH] arm64: dts: rockchip: add rk3328 VPU node
From: Jonas Karlman @ 2019-08-21 17:54 UTC (permalink / raw)
  To: Heiko Stuebner
  Cc: devicetree@vger.kernel.org, Jonas Karlman,
	linux-kernel@vger.kernel.org, linux-rockchip@lists.infradead.org,
	Rob Herring, linux-arm-kernel@lists.infradead.org

This patch add a VPU device node for rk3328.

Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
---
It would be great if this can be considered for v5.4,
related dt-bindings commit has been merged in media tree for v5.4.

Decoding using hantro driver has been tested on a Pine64 Rock64 RK3328 device.
---
 arch/arm64/boot/dts/rockchip/rk3328.dtsi | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/boot/dts/rockchip/rk3328.dtsi b/arch/arm64/boot/dts/rockchip/rk3328.dtsi
index e9fefd8a7e02..4a175fff2861 100644
--- a/arch/arm64/boot/dts/rockchip/rk3328.dtsi
+++ b/arch/arm64/boot/dts/rockchip/rk3328.dtsi
@@ -278,6 +278,7 @@
 			};
 			pd_vpu@RK3328_PD_VPU {
 				reg = <RK3328_PD_VPU>;
+				clocks = <&cru ACLK_VPU>, <&cru HCLK_VPU>;
 			};
 		};
 
@@ -596,6 +597,17 @@
 		status = "disabled";
 	};
 
+	vpu: video-codec@ff350000 {
+		compatible = "rockchip,rk3328-vpu";
+		reg = <0x0 0xff350000 0x0 0x800>;
+		interrupts = <GIC_SPI 9 IRQ_TYPE_LEVEL_HIGH>;
+		interrupt-names = "vdpu";
+		clocks = <&cru ACLK_VPU>, <&cru HCLK_VPU>;
+		clock-names = "aclk", "hclk";
+		iommus = <&vpu_mmu>;
+		power-domains = <&power RK3328_PD_VPU>;
+	};
+
 	vpu_mmu: iommu@ff350800 {
 		compatible = "rockchip,iommu";
 		reg = <0x0 0xff350800 0x0 0x40>;
@@ -604,7 +616,7 @@
 		clocks = <&cru ACLK_VPU>, <&cru HCLK_VPU>;
 		clock-names = "aclk", "iface";
 		#iommu-cells = <0>;
-		status = "disabled";
+		power-domains = <&power RK3328_PD_VPU>;
 	};
 
 	rkvdec_mmu: iommu@ff360480 {
-- 
2.17.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related

* Re: [PATCH v4] kasan: add memory corruption identification for software tag-based mode
From: Andrey Ryabinin @ 2019-08-21 17:52 UTC (permalink / raw)
  To: Walter Wu
  Cc: wsd_upstream, Vasily Gorbik, Arnd Bergmann, linux-mm,
	Andrey Konovalov, linux-kernel, kasan-dev, Martin Schwidefsky,
	Miles Chen, Alexander Potapenko, linux-arm-kernel,
	Matthias Brugger, linux-mediatek, Andrew Morton, Thomas Gleixner,
	Dmitry Vyukov
In-Reply-To: <1566279478.9993.21.camel@mtksdccf07>



On 8/20/19 8:37 AM, Walter Wu wrote:
> On Tue, 2019-08-06 at 13:43 +0800, Walter Wu wrote:
>> This patch adds memory corruption identification at bug report for
>> software tag-based mode, the report show whether it is "use-after-free"
>> or "out-of-bound" error instead of "invalid-access" error. This will make
>> it easier for programmers to see the memory corruption problem.
>>
>> We extend the slab to store five old free pointer tag and free backtrace,
>> we can check if the tagged address is in the slab record and make a
>> good guess if the object is more like "use-after-free" or "out-of-bound".
>> therefore every slab memory corruption can be identified whether it's
>> "use-after-free" or "out-of-bound".
>>
>> ====== Changes
>> Change since v1:
>> - add feature option CONFIG_KASAN_SW_TAGS_IDENTIFY.
>> - change QUARANTINE_FRACTION to reduce quarantine size.
>> - change the qlist order in order to find the newest object in quarantine
>> - reduce the number of calling kmalloc() from 2 to 1 time.
>> - remove global variable to use argument to pass it.
>> - correct the amount of qobject cache->size into the byes of qlist_head.
>> - only use kasan_cache_shrink() to shink memory.
>>
>> Change since v2:
>> - remove the shinking memory function kasan_cache_shrink()
>> - modify the description of the CONFIG_KASAN_SW_TAGS_IDENTIFY
>> - optimize the quarantine_find_object() and qobject_free()
>> - fix the duplicating function name 3 times in the header.
>> - modify the function name set_track() to kasan_set_track()
>>
>> Change since v3:
>> - change tag-based quarantine to extend slab to identify memory corruption
> 
> Hi,Andrey,
> 
> Would you review the patch,please?


I didn't notice anything fundamentally wrong, but I find there are some
questionable implementation choices that makes code look weirder than necessary
and harder to understand. So I ended up with cleaning it up, see the diff bellow.
I'll send v5 with that diff folded.




diff --git a/lib/Kconfig.kasan b/lib/Kconfig.kasan
index 26cb3bcc9258..6c9682ce0254 100644
--- a/lib/Kconfig.kasan
+++ b/lib/Kconfig.kasan
@@ -140,7 +140,7 @@ config KASAN_SW_TAGS_IDENTIFY
 	help
 	  This option enables best-effort identification of bug type
 	  (use-after-free or out-of-bounds) at the cost of increased
-	  memory consumption for slab extending.
+	  memory consumption.
 
 config TEST_KASAN
 	tristate "Module for testing KASAN for bug detection"
diff --git a/mm/kasan/common.c b/mm/kasan/common.c
index 2cdcb16b9c2d..6814d6d6a023 100644
--- a/mm/kasan/common.c
+++ b/mm/kasan/common.c
@@ -71,7 +71,7 @@ static inline depot_stack_handle_t save_stack(gfp_t flags)
 	return stack_depot_save(entries, nr_entries, flags);
 }
 
-void kasan_set_track(struct kasan_track *track, gfp_t flags)
+static inline void set_track(struct kasan_track *track, gfp_t flags)
 {
 	track->pid = current->pid;
 	track->stack = save_stack(flags);
@@ -304,8 +304,6 @@ size_t kasan_metadata_size(struct kmem_cache *cache)
 struct kasan_alloc_meta *get_alloc_info(struct kmem_cache *cache,
 					const void *object)
 {
-	if (!IS_ENABLED(CONFIG_KASAN_SW_TAGS_IDENTIFY))
-		BUILD_BUG_ON(sizeof(struct kasan_alloc_meta) > 32);
 	return (void *)object + cache->kasan_info.alloc_meta_offset;
 }
 
@@ -316,6 +314,24 @@ struct kasan_free_meta *get_free_info(struct kmem_cache *cache,
 	return (void *)object + cache->kasan_info.free_meta_offset;
 }
 
+
+static void kasan_set_free_info(struct kmem_cache *cache,
+		void *object, u8 tag)
+{
+	struct kasan_alloc_meta *alloc_meta;
+	u8 idx = 0;
+
+	alloc_meta = get_alloc_info(cache, object);
+
+#ifdef CONFIG_KASAN_SW_TAGS_IDENTIFY
+	idx = alloc_meta->free_track_idx;
+	alloc_meta->free_pointer_tag[idx] = tag;
+	alloc_meta->free_track_idx = (idx + 1) % KASAN_NR_FREE_STACKS;
+#endif
+
+	set_track(&alloc_meta->free_track[idx], GFP_NOWAIT);
+}
+
 void kasan_poison_slab(struct page *page)
 {
 	unsigned long i;
@@ -452,11 +468,8 @@ static bool __kasan_slab_free(struct kmem_cache *cache, void *object,
 			unlikely(!(cache->flags & SLAB_KASAN)))
 		return false;
 
-	if (IS_ENABLED(CONFIG_KASAN_SW_TAGS_IDENTIFY))
-		kasan_set_free_info(cache, object, tag);
-	else
-		kasan_set_track(&get_alloc_info(cache, object)->free_track,
-						GFP_NOWAIT);
+	kasan_set_free_info(cache, object, tag);
+
 	quarantine_put(get_free_info(cache, object), cache);
 
 	return IS_ENABLED(CONFIG_KASAN_GENERIC);
@@ -494,8 +507,7 @@ static void *__kasan_kmalloc(struct kmem_cache *cache, const void *object,
 		KASAN_KMALLOC_REDZONE);
 
 	if (cache->flags & SLAB_KASAN)
-		kasan_set_track(&get_alloc_info(cache, object)->alloc_track,
-						flags);
+		set_track(&get_alloc_info(cache, object)->alloc_track, flags);
 
 	return set_tag(object, tag);
 }
diff --git a/mm/kasan/kasan.h b/mm/kasan/kasan.h
index 531a5823e8c6..35cff6bbb716 100644
--- a/mm/kasan/kasan.h
+++ b/mm/kasan/kasan.h
@@ -96,21 +96,17 @@ struct kasan_track {
 };
 
 #ifdef CONFIG_KASAN_SW_TAGS_IDENTIFY
-#define KASAN_EXTRA_FREE_INFO_COUNT 4
-#define KASAN_TOTAL_FREE_INFO_COUNT  (KASAN_EXTRA_FREE_INFO_COUNT + 1)
-struct extra_free_info {
-	/* Round-robin FIFO array. */
-	struct kasan_track free_track[KASAN_EXTRA_FREE_INFO_COUNT];
-	u8 free_pointer_tag[KASAN_TOTAL_FREE_INFO_COUNT];
-	u8 free_track_tail;
-};
+#define KASAN_NR_FREE_STACKS 5
+#else
+#define KASAN_NR_FREE_STACKS 1
 #endif
 
 struct kasan_alloc_meta {
 	struct kasan_track alloc_track;
-	struct kasan_track free_track;
+	struct kasan_track free_track[KASAN_NR_FREE_STACKS];
 #ifdef CONFIG_KASAN_SW_TAGS_IDENTIFY
-	struct extra_free_info free_info;
+	u8 free_pointer_tag[KASAN_NR_FREE_STACKS];
+	u8 free_track_idx;
 #endif
 };
 
@@ -160,28 +156,7 @@ void kasan_report(unsigned long addr, size_t size,
 		bool is_write, unsigned long ip);
 void kasan_report_invalid_free(void *object, unsigned long ip);
 
-struct page *addr_to_page(const void *addr);
-
-void kasan_set_track(struct kasan_track *track, gfp_t flags);
-
-#ifdef CONFIG_KASAN_SW_TAGS_IDENTIFY
-void kasan_set_free_info(struct kmem_cache *cache, void *object, u8 tag);
-struct kasan_track *kasan_get_free_track(struct kmem_cache *cache,
-		void *object, u8 tag);
-char *kasan_get_corruption_type(void *addr);
-#else
-static inline void kasan_set_free_info(struct kmem_cache *cache,
-		void *object, u8 tag) { }
-static inline struct kasan_track *kasan_get_free_track(
-		struct kmem_cache *cache, void *object, u8 tag)
-{
-	return NULL;
-}
-static inline char *kasan_get_corruption_type(void *addr)
-{
-	return NULL;
-}
-#endif
+struct page *kasan_addr_to_page(const void *addr);
 
 #if defined(CONFIG_KASAN_GENERIC) && \
 	(defined(CONFIG_SLAB) || defined(CONFIG_SLUB))
diff --git a/mm/kasan/report.c b/mm/kasan/report.c
index 9ea7a4265b42..621782100eaa 100644
--- a/mm/kasan/report.c
+++ b/mm/kasan/report.c
@@ -111,6 +111,14 @@ static void print_track(struct kasan_track *track, const char *prefix)
 	}
 }
 
+struct page *kasan_addr_to_page(const void *addr)
+{
+	if ((addr >= (void *)PAGE_OFFSET) &&
+			(addr < high_memory))
+		return virt_to_head_page(addr);
+	return NULL;
+}
+
 static void describe_object_addr(struct kmem_cache *cache, void *object,
 				const void *addr)
 {
@@ -143,28 +151,42 @@ static void describe_object_addr(struct kmem_cache *cache, void *object,
 		(void *)(object_addr + cache->object_size));
 }
 
+static struct kasan_track *kasan_get_free_track(struct kmem_cache *cache,
+		void *object, u8 tag)
+{
+	struct kasan_alloc_meta *alloc_meta;
+	int i = 0;
+
+	alloc_meta = get_alloc_info(cache, object);
+
+#ifdef CONFIG_KASAN_SW_TAGS_IDENTIFY
+	for (i = 0; i < KASAN_NR_FREE_STACKS; i++) {
+		if (alloc_meta->free_pointer_tag[i] == tag)
+			break;
+	}
+	if (i == KASAN_NR_FREE_STACKS)
+		i = alloc_meta->free_track_idx;
+#endif
+
+	return &alloc_meta->free_track[i];
+}
+
 static void describe_object(struct kmem_cache *cache, void *object,
-				const void *tagged_addr)
+				const void *addr, u8 tag)
 {
-	void *untagged_addr = reset_tag(tagged_addr);
 	struct kasan_alloc_meta *alloc_info = get_alloc_info(cache, object);
 
 	if (cache->flags & SLAB_KASAN) {
+		struct kasan_track *free_track;
+
 		print_track(&alloc_info->alloc_track, "Allocated");
 		pr_err("\n");
-		if (IS_ENABLED(CONFIG_KASAN_SW_TAGS_IDENTIFY)) {
-			struct kasan_track *free_track;
-			u8 tag = get_tag(tagged_addr);
-
-			free_track = kasan_get_free_track(cache, object, tag);
-			print_track(free_track, "Freed");
-		} else {
-			print_track(&alloc_info->free_track, "Freed");
-			pr_err("\n");
-		}
+		free_track = kasan_get_free_track(cache, object, tag);
+		print_track(free_track, "Freed");
+		pr_err("\n");
 	}
 
-	describe_object_addr(cache, object, untagged_addr);
+	describe_object_addr(cache, object, addr);
 }
 
 static inline bool kernel_or_module_addr(const void *addr)
@@ -345,25 +367,23 @@ static void print_address_stack_frame(const void *addr)
 	print_decoded_frame_descr(frame_descr);
 }
 
-static void print_address_description(void *tagged_addr)
+static void print_address_description(void *addr, u8 tag)
 {
-	void *untagged_addr = reset_tag(tagged_addr);
-	struct page *page = addr_to_page(untagged_addr);
+	struct page *page = kasan_addr_to_page(addr);
 
 	dump_stack();
 	pr_err("\n");
 
 	if (page && PageSlab(page)) {
 		struct kmem_cache *cache = page->slab_cache;
-		void *object = nearest_obj(cache, page,	untagged_addr);
+		void *object = nearest_obj(cache, page,	addr);
 
-		describe_object(cache, object, tagged_addr);
+		describe_object(cache, object, addr, tag);
 	}
 
-	if (kernel_or_module_addr(untagged_addr) &&
-			!init_task_stack_addr(untagged_addr)) {
+	if (kernel_or_module_addr(addr) && !init_task_stack_addr(addr)) {
 		pr_err("The buggy address belongs to the variable:\n");
-		pr_err(" %pS\n", tagged_addr);
+		pr_err(" %pS\n", addr);
 	}
 
 	if (page) {
@@ -371,7 +391,7 @@ static void print_address_description(void *tagged_addr)
 		dump_page(page, "kasan: bad access detected");
 	}
 
-	print_address_stack_frame(untagged_addr);
+	print_address_stack_frame(addr);
 }
 
 static bool row_is_guilty(const void *row, const void *guilty)
@@ -435,25 +455,18 @@ static bool report_enabled(void)
 	return !test_and_set_bit(KASAN_BIT_REPORTED, &kasan_flags);
 }
 
-struct page *addr_to_page(const void *addr)
-{
-	if ((addr >= (void *)PAGE_OFFSET) &&
-			(addr < high_memory))
-		return virt_to_head_page(addr);
-	return NULL;
-}
-
 void kasan_report_invalid_free(void *object, unsigned long ip)
 {
 	unsigned long flags;
+	u8 tag = get_tag(object);
 
+	object = reset_tag(object);
 	start_report(&flags);
 	pr_err("BUG: KASAN: double-free or invalid-free in %pS\n", (void *)ip);
-	print_tags(get_tag(object), reset_tag(object));
+	print_tags(tag, object);
 	pr_err("\n");
-	print_address_description(object);
+	print_address_description(object, tag);
 	pr_err("\n");
-	object = reset_tag(object);
 	print_shadow_for_address(object);
 	end_report(&flags);
 }
@@ -490,7 +503,7 @@ void __kasan_report(unsigned long addr, size_t size, bool is_write, unsigned lon
 	pr_err("\n");
 
 	if (addr_has_shadow(untagged_addr)) {
-		print_address_description(tagged_addr);
+		print_address_description(untagged_addr, get_tag(tagged_addr));
 		pr_err("\n");
 		print_shadow_for_address(info.first_bad_addr);
 	} else {
diff --git a/mm/kasan/tags.c b/mm/kasan/tags.c
index 05a11f1cfff7..0e987c9ca052 100644
--- a/mm/kasan/tags.c
+++ b/mm/kasan/tags.c
@@ -161,89 +161,3 @@ void __hwasan_tag_memory(unsigned long addr, u8 tag, unsigned long size)
 	kasan_poison_shadow((void *)addr, size, tag);
 }
 EXPORT_SYMBOL(__hwasan_tag_memory);
-
-#ifdef CONFIG_KASAN_SW_TAGS_IDENTIFY
-void kasan_set_free_info(struct kmem_cache *cache,
-		void *object, u8 tag)
-{
-	struct kasan_alloc_meta *alloc_meta;
-	struct extra_free_info *free_info;
-	u8 idx;
-
-	alloc_meta = get_alloc_info(cache, object);
-	free_info = &alloc_meta->free_info;
-
-	if (free_info->free_track_tail == 0)
-		free_info->free_track_tail = KASAN_EXTRA_FREE_INFO_COUNT;
-	else
-		free_info->free_track_tail -= 1;
-
-	idx = free_info->free_track_tail;
-	free_info->free_pointer_tag[idx] = tag;
-
-	if (idx == KASAN_EXTRA_FREE_INFO_COUNT)
-		kasan_set_track(&alloc_meta->free_track, GFP_NOWAIT);
-	else
-		kasan_set_track(&free_info->free_track[idx], GFP_NOWAIT);
-}
-
-struct kasan_track *kasan_get_free_track(struct kmem_cache *cache,
-		void *object, u8 tag)
-{
-	struct kasan_alloc_meta *alloc_meta;
-	struct extra_free_info *free_info;
-	int idx, i;
-
-	alloc_meta = get_alloc_info(cache, object);
-	free_info = &alloc_meta->free_info;
-
-	for (i = 0; i < KASAN_TOTAL_FREE_INFO_COUNT; i++) {
-		idx = free_info->free_track_tail + i;
-		if (idx >= KASAN_TOTAL_FREE_INFO_COUNT)
-			idx -= KASAN_TOTAL_FREE_INFO_COUNT;
-
-		if (free_info->free_pointer_tag[idx] == tag) {
-			if (idx == KASAN_EXTRA_FREE_INFO_COUNT)
-				return &alloc_meta->free_track;
-			else
-				return &free_info->free_track[idx];
-		}
-	}
-	if (free_info->free_track_tail == KASAN_EXTRA_FREE_INFO_COUNT)
-		return &alloc_meta->free_track;
-	else
-		return &free_info->free_track[free_info->free_track_tail];
-}
-
-char *kasan_get_corruption_type(void *addr)
-{
-	struct kasan_alloc_meta *alloc_meta;
-	struct extra_free_info *free_info;
-	struct page *page;
-	struct kmem_cache *cache;
-	void *object;
-	u8 tag;
-	int idx, i;
-
-	tag = get_tag(addr);
-	addr = reset_tag(addr);
-	page = addr_to_page(addr);
-	if (page && PageSlab(page)) {
-		cache = page->slab_cache;
-		object = nearest_obj(cache, page, addr);
-		alloc_meta = get_alloc_info(cache, object);
-		free_info = &alloc_meta->free_info;
-
-		for (i = 0; i < KASAN_TOTAL_FREE_INFO_COUNT; i++) {
-			idx = free_info->free_track_tail + i;
-			if (idx >= KASAN_TOTAL_FREE_INFO_COUNT)
-				idx -= KASAN_TOTAL_FREE_INFO_COUNT;
-
-			if (free_info->free_pointer_tag[idx] == tag)
-				return "use-after-free";
-		}
-		return "out-of-bounds";
-	}
-	return "invalid-access";
-}
-#endif
diff --git a/mm/kasan/tags_report.c b/mm/kasan/tags_report.c
index 6d8cdb91c4b6..969ae08f59d7 100644
--- a/mm/kasan/tags_report.c
+++ b/mm/kasan/tags_report.c
@@ -36,10 +36,31 @@
 
 const char *get_bug_type(struct kasan_access_info *info)
 {
-	if (IS_ENABLED(CONFIG_KASAN_SW_TAGS_IDENTIFY))
-		return(kasan_get_corruption_type((void *)info->access_addr));
-	else
-		return "invalid-access";
+#ifdef CONFIG_KASAN_SW_TAGS_IDENTIFY
+	struct kasan_alloc_meta *alloc_meta;
+	struct kmem_cache *cache;
+	struct page *page;
+	const void *addr;
+	void *object;
+	u8 tag;
+	int i;
+
+	tag = get_tag(info->access_addr);
+	addr = reset_tag(info->access_addr);
+	page = kasan_addr_to_page(addr);
+	if (page && PageSlab(page)) {
+		cache = page->slab_cache;
+		object = nearest_obj(cache, page, (void *)addr);
+		alloc_meta = get_alloc_info(cache, object);
+
+		for (i = 0; i < KASAN_NR_FREE_STACKS; i++)
+			if (alloc_meta->free_pointer_tag[i] == tag)
+				return "use-after-free";
+		return "out-of-bounds";
+	}
+
+#endif
+	return "invalid-access";
 }
 
 void *find_first_bad_addr(void *addr, size_t size)

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related

* [PATCH v2] ARM: UNWINDER_FRAME_POINTER implementation for Clang
From: Nathan Huckleberry @ 2019-08-21 17:46 UTC (permalink / raw)
  To: linux
  Cc: Tri Vo, ndesaulniers, linux-kernel, Nathan Huckleberry,
	clang-built-linux, miles.chen, linux-arm-kernel
In-Reply-To: <CAJkfWY4cHz+i8kYg2i1Krs-32nh7-WQU+psT=DRGYnTje6yj4Q@mail.gmail.com>

The stackframe setup when compiled with clang is different.
Since the stack unwinder expects the gcc stackframe setup it
fails to print backtraces. This patch adds support for the
clang stackframe setup.

Link: https://github.com/ClangBuiltLinux/linux/issues/35
Cc: clang-built-linux@googlegroups.com
Suggested-by: Tri Vo <trong@google.com>
Signed-off-by: Nathan Huckleberry <nhuck@google.com>
---
Changes from v1->v2
* Fix indentation in various files
* Swap spaces for tabs
* Rename Ldsi to Lopcode
* Remove unused Ldsi entry

 arch/arm/Kconfig.debug         |   2 +-
 arch/arm/Makefile              |   5 +-
 arch/arm/lib/Makefile          |   8 +-
 arch/arm/lib/backtrace-clang.S | 229 +++++++++++++++++++++++++++++++++
 4 files changed, 241 insertions(+), 3 deletions(-)
 create mode 100644 arch/arm/lib/backtrace-clang.S

diff --git a/arch/arm/Kconfig.debug b/arch/arm/Kconfig.debug
index 85710e078afb..b9c674ec19e0 100644
--- a/arch/arm/Kconfig.debug
+++ b/arch/arm/Kconfig.debug
@@ -56,7 +56,7 @@ choice
 
 config UNWINDER_FRAME_POINTER
 	bool "Frame pointer unwinder"
-	depends on !THUMB2_KERNEL && !CC_IS_CLANG
+	depends on !THUMB2_KERNEL
 	select ARCH_WANT_FRAME_POINTERS
 	select FRAME_POINTER
 	help
diff --git a/arch/arm/Makefile b/arch/arm/Makefile
index c3624ca6c0bc..6f251c201db0 100644
--- a/arch/arm/Makefile
+++ b/arch/arm/Makefile
@@ -36,7 +36,10 @@ KBUILD_CFLAGS	+= $(call cc-option,-mno-unaligned-access)
 endif
 
 ifeq ($(CONFIG_FRAME_POINTER),y)
-KBUILD_CFLAGS	+=-fno-omit-frame-pointer -mapcs -mno-sched-prolog
+KBUILD_CFLAGS	+=-fno-omit-frame-pointer
+ifeq ($(CONFIG_CC_IS_GCC),y)
+KBUILD_CFLAGS += -mapcs -mno-sched-prolog
+endif
 endif
 
 ifeq ($(CONFIG_CPU_BIG_ENDIAN),y)
diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile
index b25c54585048..6d2ba454f25b 100644
--- a/arch/arm/lib/Makefile
+++ b/arch/arm/lib/Makefile
@@ -5,7 +5,7 @@
 # Copyright (C) 1995-2000 Russell King
 #
 
-lib-y		:= backtrace.o changebit.o csumipv6.o csumpartial.o   \
+lib-y		:= changebit.o csumipv6.o csumpartial.o               \
 		   csumpartialcopy.o csumpartialcopyuser.o clearbit.o \
 		   delay.o delay-loop.o findbit.o memchr.o memcpy.o   \
 		   memmove.o memset.o setbit.o                        \
@@ -19,6 +19,12 @@ lib-y		:= backtrace.o changebit.o csumipv6.o csumpartial.o   \
 mmu-y		:= clear_user.o copy_page.o getuser.o putuser.o       \
 		   copy_from_user.o copy_to_user.o
 
+ifdef CONFIG_CC_IS_CLANG
+  lib-y	+= backtrace-clang.o
+else
+  lib-y	+= backtrace.o
+endif
+
 # using lib_ here won't override already available weak symbols
 obj-$(CONFIG_UACCESS_WITH_MEMCPY) += uaccess_with_memcpy.o
 
diff --git a/arch/arm/lib/backtrace-clang.S b/arch/arm/lib/backtrace-clang.S
new file mode 100644
index 000000000000..6f2a8a57d0fb
--- /dev/null
+++ b/arch/arm/lib/backtrace-clang.S
@@ -0,0 +1,229 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ *  linux/arch/arm/lib/backtrace-clang.S
+ *
+ *  Copyright (C) 2019 Nathan Huckleberry
+ *
+ */
+#include <linux/kern_levels.h>
+#include <linux/linkage.h>
+#include <asm/assembler.h>
+		.text
+
+/* fp is 0 or stack frame */
+
+#define frame	r4
+#define sv_fp	r5
+#define sv_pc	r6
+#define mask	r7
+#define sv_lr	r8
+
+ENTRY(c_backtrace)
+
+#if !defined(CONFIG_FRAME_POINTER) || !defined(CONFIG_PRINTK)
+		ret	lr
+ENDPROC(c_backtrace)
+#else
+
+
+/*
+ * Clang does not store pc or sp in function prologues
+ * so we don't know exactly where the function
+ * starts.
+ *
+ * We can treat the current frame's lr as the saved pc and the
+ * preceding frame's lr as the current frame's lr,
+ * but we can't trace the most recent call.
+ * Inserting a false stack frame allows us to reference the
+ * function called last in the stacktrace.
+ *
+ * If the call instruction was a bl we can look at the callers
+ * branch instruction to calculate the saved pc.
+ * We can recover the pc in most cases, but in cases such as
+ * calling function pointers we cannot. In this
+ * case, default to using the lr. This will be
+ * some address in the function, but will not
+ * be the function start.
+ *
+ * Unfortunately due to the stack frame layout we can't dump
+ *              r0 - r3, but these are less frequently saved.
+ *
+ * Stack frame layout:
+ * 		<larger addresses>
+ * 		saved lr
+ * 	frame=> saved fp
+ * 		optionally saved caller registers (r4 - r10)
+ * 		optionally saved arguments (r0 - r3)
+ * 		<top of stack frame>
+ * 		<smaller addresses>
+ *
+ * Functions start with the following code sequence:
+ * corrected pc =>  stmfd sp!, {..., fp, lr}
+ *		add fp, sp, #x
+ *		stmfd sp!, {r0 - r3} (optional)
+ *
+ *
+ *
+ *
+ *
+ *
+ * The diagram below shows an example stack setup
+ * for dump_stack.
+ *
+ * The frame for c_backtrace has pointers to the
+ * code of dump_stack. This is why the frame of
+ * c_backtrace is used to for the pc calculation
+ * of dump_stack. This is why we must move back
+ * a frame to print dump_stack.
+ *
+ * The stored locals for dump_stack are in dump_stack's
+ * frame. This means that to fully print dump_stack's frame
+ * we need both the frame for dump_stack (for locals) and the
+ * frame that was called by dump_stack (for pc).
+ *
+ * To print locals we must know where the function start is. If
+ * we read the function prologue opcodes we can determine
+ * which variables are stored in the stack frame.
+ *
+ * To find the function start of dump_stack we can look at the
+ * stored LR of show_stack. It points at the instruction
+ * directly after the bl dump_stack. We can then read the
+ * offset from the bl opcode to determine where the branch takes us.
+ * The address calculated must be the start of dump_stack.
+ *
+ * c_backtrace frame           dump_stack:
+ * {[LR]    }  ============|   ...
+ * {[FP]    }  =======|    |   bl c_backtrace
+ *                    |    |=> ...
+ * {[R4-R10]}         |
+ * {[R0-R3] }         |        show_stack:
+ * dump_stack frame   |        ...
+ * {[LR]    } =============|   bl dump_stack
+ * {[FP]    } <=======|    |=> ...
+ * {[R4-R10]}
+ * {[R0-R3] }
+ */
+
+stmfd   sp!, {r4 - r9, fp, lr}	@ Save an extra register
+				@ to ensure 8 byte alignment
+movs	frame, r0		@ if frame pointer is zero
+beq	no_frame		@ we have no stack frames
+
+tst	r1, #0x10		@ 26 or 32-bit mode?
+moveq	mask, #0xfc000003
+movne	mask, #0		@ mask for 32-bit
+
+/*
+ * Switches the current frame to be the frame for dump_stack.
+ */
+		add	frame, sp, #24		@ switch to false frame
+for_each_frame:	tst	frame, mask		@ Check for address exceptions
+		bne	no_frame
+
+/*
+ * sv_fp is the stack frame with the locals for the current considered
+ * function.
+ *
+ * sv_pc is the saved lr frame the frame above. This is a pointer to a
+ * code address within the current considered function, but
+ * it is not the function start. This value gets updated to be
+ * the function start later if it is possible.
+ */
+1001:		ldr	sv_pc, [frame, #4]	@ get saved 'pc'
+1002:		ldr	sv_fp, [frame, #0]	@ get saved fp
+
+		teq	sv_fp, mask		@ make sure next frame exists
+		beq	no_frame
+
+/*
+ * sv_lr is the lr from the function that called the current function. This
+ * is a pointer to a code address in the current function's caller.
+ * sv_lr-4 is the instruction used to call the current function.
+ *
+ * This sv_lr can be used to calculate the function start if the function
+ * was called using a bl instruction. If the function start
+ * can be recovered sv_pc is overwritten with the function start.
+ *
+ * If the current function was called using a function pointer we cannot
+ * recover the function start and instead continue with sv_pc as
+ * an arbitrary value within the current function. If this is the case
+ * we cannot print registers for the current function, but the stacktrace
+ * is still printed properly.
+ */
+1003:		ldr	sv_lr, [sv_fp, #4]	@ get saved lr from next frame
+
+		ldr	r0, [sv_lr, #-4]	@ get call instruction
+		ldr	r3, .Lopcode+4
+		and	r2, r3, r0		@ is this a bl call
+		teq	r2, r3
+		bne	finished_setup		@ give up if it's not
+		and	r0, #0xffffff		@ get call offset 24-bit int
+		lsl	r0, r0, #8		@ sign extend offset
+		asr	r0, r0, #8
+		ldr	sv_pc, [sv_fp, #4]	@ get lr address
+		add	sv_pc, sv_pc, #-4	@ get call instruction address
+		add	sv_pc, sv_pc, #8	@ take care of prefetch
+		add	sv_pc, sv_pc, r0, lsl #2@ find function start
+
+finished_setup:
+
+		bic	sv_pc, sv_pc, mask	@ mask PC/LR for the mode
+
+/*
+ * Print the function (sv_pc) and where it was called
+ * from (sv_lr).
+ */
+1004:		mov	r0, sv_pc
+
+		mov	r1, sv_lr
+		mov	r2, frame
+		bic	r1, r1, mask		@ mask PC/LR for the mode
+		bl	dump_backtrace_entry
+
+/*
+ * Test if the function start is a stmfd instruction
+ * to determine which registers were stored in the function
+ * prologue.
+ *
+ * If we could not recover the sv_pc because we were called through
+ * a function pointer the comparison will fail and no registers
+ * will print.
+ */
+1005:		ldr	r1, [sv_pc, #0]		@ if stmfd sp!, {..., fp, lr}
+		ldr	r3, .Lopcode		@ instruction exists,
+		teq	r3, r1, lsr #11
+		ldr	r0, [frame]		@ locals are stored in
+						@ the preceding frame
+		subeq	r0, r0, #4
+		bleq	dump_backtrace_stm	@ dump saved registers
+
+/*
+ * If we are out of frames or if the next frame is invalid.
+ */
+		teq	sv_fp, #0		@ zero saved fp means
+		beq	no_frame		@ no further frames
+
+		cmp	sv_fp, frame		@ next frame must be
+		mov	frame, sv_fp		@ above the current frame
+		bhi	for_each_frame
+
+1006:		adr	r0, .Lbad
+		mov	r1, frame
+		bl	printk
+no_frame:	ldmfd	sp!, {r4 - r9, fp, pc}
+ENDPROC(c_backtrace)
+		.pushsection __ex_table,"a"
+		.align	3
+		.long	1001b, 1006b
+		.long	1002b, 1006b
+		.long	1003b, 1006b
+		.long	1004b, 1006b
+		.long   1005b, 1006b
+		.popsection
+
+.Lbad:		.asciz	"Backtrace aborted due to bad frame pointer <%p>\n"
+		.align
+.Lopcode:	.word	0xe92d4800 >> 11	@ stmfd sp!, {... fp, lr}
+		.word	0x0b000000		@ bl if these bits are set
+
+#endif
-- 
2.23.0.rc1.153.gdeed80330f-goog


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related

* Re: [PATCH] ARM: UNWINDER_FRAME_POINTER implementation for Clang
From: Nathan Huckleberry @ 2019-08-21 17:43 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: Tri Vo, Russell King, LKML, clang-built-linux,
	Miles Chen (陳民樺), Linux ARM
In-Reply-To: <CAKwvOdm+sGyKfAMNbL10ME=DrG5=4d5kvzdMxjNC22JLLr1h=g@mail.gmail.com>

On Tue, Aug 20, 2019 at 2:39 PM Nick Desaulniers
<ndesaulniers@google.com> wrote:
>
> On Tue, Aug 20, 2019 at 12:44 PM Nathan Huckleberry <nhuck@google.com> wrote:
> >
> > The stackframe setup when compiled with clang is different.
> > Since the stack unwinder expects the gcc stackframe setup it
> > fails to print backtraces. This patch adds support for the
> > clang stackframe setup.
> >
> > Link: https://github.com/ClangBuiltLinux/linux/issues/35
> > Cc: clang-built-linux@googlegroups.com
> > Suggested-by: Tri Vo <trong@google.com>
> > Signed-off-by: Nathan Huckleberry <nhuck@google.com>
> > ---
> > Changes from RFC
> > * Push extra register to satisfy 8 byte alignment requirement
> > * Add clarifying comments
> > * Separate code into its own file
>
> Thanks for the patch! The added comments and moving the implementation
> to its own file make it easier to review.
>
> >
> >  arch/arm/Kconfig.debug         |   4 +-
> >  arch/arm/Makefile              |   5 +-
> >  arch/arm/lib/Makefile          |   8 +-
> >  arch/arm/lib/backtrace-clang.S | 224 +++++++++++++++++++++++++++++++++
> >  4 files changed, 237 insertions(+), 4 deletions(-)
> >  create mode 100644 arch/arm/lib/backtrace-clang.S
> >
> > diff --git a/arch/arm/Kconfig.debug b/arch/arm/Kconfig.debug
> > index 85710e078afb..92fca7463e21 100644
> > --- a/arch/arm/Kconfig.debug
> > +++ b/arch/arm/Kconfig.debug
> > @@ -56,7 +56,7 @@ choice
> >
> >  config UNWINDER_FRAME_POINTER
> >         bool "Frame pointer unwinder"
> > -       depends on !THUMB2_KERNEL && !CC_IS_CLANG
> > +       depends on !THUMB2_KERNEL
> >         select ARCH_WANT_FRAME_POINTERS
> >         select FRAME_POINTER
> >         help
> > @@ -1872,7 +1872,7 @@ config DEBUG_UNCOMPRESS
> >           When this option is set, the selected DEBUG_LL output method
> >           will be re-used for normal decompressor output on multiplatform
> >           kernels.
> > -
> > +
>
> Probably can drop the added newline?
>
> >
> >  config UNCOMPRESS_INCLUDE
> >         string
> > diff --git a/arch/arm/Makefile b/arch/arm/Makefile
> > index c3624ca6c0bc..729e223b83fe 100644
> > --- a/arch/arm/Makefile
> > +++ b/arch/arm/Makefile
> > @@ -36,7 +36,10 @@ KBUILD_CFLAGS        += $(call cc-option,-mno-unaligned-access)
> >  endif
> >
> >  ifeq ($(CONFIG_FRAME_POINTER),y)
> > -KBUILD_CFLAGS  +=-fno-omit-frame-pointer -mapcs -mno-sched-prolog
> > +KBUILD_CFLAGS  +=-fno-omit-frame-pointer
> > +  ifeq ($(CONFIG_CC_IS_GCC),y)
> > +  KBUILD_CFLAGS += -mapcs -mno-sched-prolog
> > +  endif
>
> While I can appreciate the indentation, it's unusual to indent
> additional depths of kernel Makefiles.  At least the rest of this file
> does not do so.  Of course, the other Makefile you touch below does
> two spaces.  At least try to keep the file internally consistent, even
> if the kernel itself is inconsistent.
>
> >  endif
> >
> >  ifeq ($(CONFIG_CPU_BIG_ENDIAN),y)
> > diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile
> > index b25c54585048..e10a769c72ec 100644
> > --- a/arch/arm/lib/Makefile
> > +++ b/arch/arm/lib/Makefile
> > @@ -5,7 +5,7 @@
> >  # Copyright (C) 1995-2000 Russell King
> >  #
> >
> > -lib-y          := backtrace.o changebit.o csumipv6.o csumpartial.o   \
> > +lib-y          := changebit.o csumipv6.o csumpartial.o               \
> >                    csumpartialcopy.o csumpartialcopyuser.o clearbit.o \
> >                    delay.o delay-loop.o findbit.o memchr.o memcpy.o   \
> >                    memmove.o memset.o setbit.o                        \
> > @@ -19,6 +19,12 @@ lib-y                := backtrace.o changebit.o csumipv6.o csumpartial.o   \
> >  mmu-y          := clear_user.o copy_page.o getuser.o putuser.o       \
> >                    copy_from_user.o copy_to_user.o
> >
> > +ifdef CONFIG_CC_IS_CLANG
> > +  lib-y += backtrace-clang.o
> > +else
> > +  lib-y += backtrace.o
> > +endif
>
> The indentation should match the above (from this file).  Looks like 1
> tab after lib-y.  See L34(CONFIG_CPU_32v3) for what I would have
> expected.
>
> > +
> >  # using lib_ here won't override already available weak symbols
> >  obj-$(CONFIG_UACCESS_WITH_MEMCPY) += uaccess_with_memcpy.o
> >
> > diff --git a/arch/arm/lib/backtrace-clang.S b/arch/arm/lib/backtrace-clang.S
> > new file mode 100644
> > index 000000000000..2b02014dbdd1
> > --- /dev/null
> > +++ b/arch/arm/lib/backtrace-clang.S
> > @@ -0,0 +1,224 @@
> > +/* SPDX-License-Identifier: GPL-2.0-only */
> > +/*
> > + *  linux/arch/arm/lib/backtrace-clang.S
> > + *
> > + *  Copyright (C) 2019 Nathan Huckleberry
> > + *
> > + */
> > +#include <linux/kern_levels.h>
> > +#include <linux/linkage.h>
> > +#include <asm/assembler.h>
> > +               .text
> > +
> > +/* fp is 0 or stack frame */
>
> ah, I see that the reference implementation uses an assembly comment
> here. Ok (sorry for the noise).
>
> > +
> > +#define frame  r4
> > +#define sv_fp  r5
> > +#define sv_pc  r6
> > +#define mask   r7
> > +#define sv_lr   r8
> > +
> > +ENTRY(c_backtrace)
> > +
> > +#if !defined(CONFIG_FRAME_POINTER) || !defined(CONFIG_PRINTK)
> > +               ret     lr
> > +ENDPROC(c_backtrace)
> > +#else
> > +
> > +
> > +/*
> > + * Clang does not store pc or sp in function prologues
> > + *             so we don't know exactly where the function
> > + *             starts.
> > + * We can treat the current frame's lr as the saved pc and the
> > + *             preceding frame's lr as the lr, but we can't
>
> preceding frame's lr as the current frame's lr, ...
>
> > + *             trace the most recent call.
> > + * Inserting a false stack frame allows us to reference the
> > + *             function called last in the stacktrace.
> > + * If the call instruction was a bl we can look at the callers
> > + *             branch instruction to calculate the saved pc.
> > + * We can recover the pc in most cases, but in cases such as
> > + *             calling function pointers we cannot. In this
> > + *             case, default to using the lr. This will be
> > + *             some address in the function, but will not
> > + *             be the function start.
> > + * Unfortunately due to the stack frame layout we can't dump
> > + *              r0 - r3, but these are less frequently saved.
>
> The use of tabs vs spaces in these comments is inconsistent.  Not that
> I can see whitespace, but:
> https://github.com/nickdesaulniers/dotfiles/blob/37359525f5a403b4ed2d3f9d1bbbee2da8ec8115/.vimrc#L35-L41
> Also, I don't think you need to tab indent every line after the first.
> Where did that format come from?
>
> > + *
> > + * Stack frame layout:
> > + *             <larger addresses>
> > + *             saved lr
> > + *    frame => saved fp
> > + *             optionally saved caller registers (r4 - r10)
> > + *             optionally saved arguments (r0 - r3)
> > + *             <top of stack frame>
> > + *             <smaller addresses>
> > + *
> > + * Functions start with the following code sequence:
> > + * corrected pc =>  stmfd sp!, {..., fp, lr}
> > + *                 add fp, sp, #x
> > + *                 stmfd sp!, {r0 - r3} (optional)
> > + *
> > + *
> > + *
> > + *
> > + *
> > + *
> > + * The diagram below shows an example stack setup
> > + *     for dump_stack.
> > + *
> > + * The frame for c_backtrace has pointers to the
> > + *     code of dump_stack. This is why the frame of
> > + *     c_backtrace is used to for the pc calculation
> > + *     of dump_stack. This is why we must move back
> > + *     a frame to print dump_stack.
> > + *
> > + * The stored locals for dump_stack are in dump_stack's
> > + *     frame. This means that to fully print dump_stack's frame
> > + *     we need the both the frame for dump_stack (for locals) and the
>
> we need both the ...
> (There's an extra `the` in the sentence).
>
> > + *     frame that was called by dump_stack (for pc).
> > + *
> > + * To print locals we must know where the function start is. If
> > + *     we read the function prologue opcodes we can determine
> > + *     which variables are stored in the stack frame.
> > + *
> > + * To find the function start of dump_stack we can look at the
> > + *     stored LR of show_stack. It points at the instruction
> > + *     directly after the bl dump_stack. We can then read the
> > + *     offset from the bl opcode to determine where the branch takes us.
> > + *     The address calculated must be the start of dump_stack.
> > + *
> > + * c_backtrace frame           dump_stack:
> > + * {[LR]    }  ============|   ...
> > + * {[FP]    }  =======|    |   bl c_backtrace
> > + *                    |    |=> ...
> > + * {[R4-R10]}         |
> > + * {[R0-R3] }         |        show_stack:
> > + * dump_stack frame   |        ...
> > + * {[LR]    } =============|   bl dump_stack
> > + * {[FP]    } <=======|    |=> ...
> > + * {[R4-R10]}
> > + * {[R0-R3] }
> > + */
> > +
> > +stmfd   sp!, {r4 - r9, fp, lr} @ Save an extra register
> > +                               @ to ensure 8 byte alignment
> > +movs   frame, r0               @ if frame pointer is zero
> > +beq    no_frame                @ we have no stack frames
> > +
> > +tst    r1, #0x10               @ 26 or 32-bit mode?
> > +moveq  mask, #0xfc000003
>
> Should we be using different masks for ARM vs THUMB as per the
> reference implementation?
The change that introduces the arm/thumb code looked like a script
that was run over all arm in the kernel. Neither this code nor the
reference solution is compatible with arm, so there's no need for the
change.
>
> > +movne  mask, #0                @ mask for 32-bit
> > +
> > +/*
> > + * Switches the current frame to be the frame for dump_stack.
> > + */
> > +               add     frame, sp, #24          @ switch to false frame
> > +for_each_frame:        tst     frame, mask             @ Check for address exceptions
> > +               bne     no_frame
> > +
> > +/*
> > + * sv_fp is the stack frame with the locals for the current considered
> > + *     function.
> > + * sv_pc is the saved lr frame the frame above. This is a pointer to a
> > + *     code address within the current considered function, but
> > + *     it is not the function start. This value gets updated to be
> > + *     the function start later if it is possible.
> > + */
> > +1001:          ldr     sv_pc, [frame, #4]      @ get saved 'pc'
> > +1002:          ldr     sv_fp, [frame, #0]      @ get saved fp
>
> The reference implementation applies the mask to sv_pc and sv_fp.  I
> assume we want to, too?
The mask is already applied to both. See for_each_frame:
>
> > +
> > +               teq     sv_fp, #0               @ make sure next frame exists
> > +               beq     no_frame
> > +
> > +/*
> > + * sv_lr is the lr from the function that called the current function. This
> > + *     is a pointer to a code address in the current function's caller.
> > + *     sv_lr-4 is the instruction used to call the current function.
> > + * This sv_lr can be used to calculate the function start if the function
> > + *     was called using a bl instruction. If the function start
> > + *     can be recovered sv_pc is overwritten with the function start.
> > + * If the current function was called using a function pointer we cannot
> > + *     recover the function start and instead continue with sv_pc as
> > + *     an arbitrary value within the current function. If this is the case
> > + *     we cannot print registers for the current function, but the stacktrace
> > + *     is still printed properly.
> > + */
> > +1003:          ldr     sv_lr, [sv_fp, #4]      @ get saved lr from next frame
> > +
> > +               ldr     r0, [sv_lr, #-4]        @ get call instruction
> > +               ldr     r3, .Ldsi+8
>
> I wonder what `dsi` stands for, it could use a better name.  Maybe put
> that mask in a more descriptively named section and use that instead
> of `.Ldsi+8`?
>
> > +               and     r2, r3, r0              @ is this a bl call
> > +               teq     r2, r3
> > +               bne     finished_setup          @ give up if it's not
> > +               and     r0, #0xffffff           @ get call offset 24-bit int
> > +               lsl     r0, r0, #8              @ sign extend offset
> > +               asr     r0, r0, #8
>
> It's too bad this should work for older ARM versions, v6 added
> dedicated instructions for this:
> http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0489c/Cihfifdd.html
>
> > +               ldr     sv_pc, [sv_fp, #4]      @ get lr address
> > +               add     sv_pc, sv_pc, #-4       @ get call instruction address
> > +               add     sv_pc, sv_pc, #8        @ take care of prefetch
> > +               add     sv_pc, sv_pc, r0, lsl #2 @ find function start
> > +
> > +finished_setup:
> > +
> > +               bic     sv_pc, sv_pc, mask      @ mask PC/LR for the mode
> > +
> > +/*
> > + * Print the function (sv_pc) and where it was called
> > + *     from (sv_lr).
> > + */
> > +1004:          mov     r0, sv_pc
> > +
> > +               mov     r1, sv_lr
> > +               mov     r2, frame
> > +               bic     r1, r1, mask            @ mask PC/LR for the mode
> > +               bl      dump_backtrace_entry
> > +
> > +/*
> > + * Test if the function start is a stmfd instruction
> > + *     to determine which registers were stored in the function
> > + *     prologue.
> > + * If we could not recover the sv_pc because we were called through
> > + *     a function pointer the comparison will fail and no registers
> > + *     will print.
> > + */
> > +1005:          ldr     r1, [sv_pc, #0]         @ if stmfd sp!, {..., fp, lr}
> > +               ldr     r3, .Ldsi               @ instruction exists,
> > +               teq     r3, r1, lsr #11
> > +               ldr     r0, [frame]             @ locals are stored in
> > +                                               @ the preceding frame
> > +               subeq   r0, r0, #4
> > +               bleq    dump_backtrace_stm      @ dump saved registers
>
> Do we need to do anything to test .Ldsi+4? Otherwise looks like we
> define it but don't use it?
>
> > +
> > +/*
> > + * If we are out of frames or if the next frame
> > + *     is invalid.
> > + */
> > +               teq     sv_fp, #0               @ zero saved fp means
> > +               beq     no_frame                @ no further frames
> > +
> > +               cmp     sv_fp, frame            @ next frame must be
> > +               mov     frame, sv_fp            @ above the current frame
> > +               bhi     for_each_frame
> > +
> > +1006:          adr     r0, .Lbad
> > +               mov     r1, frame
> > +               bl      printk
> > +no_frame:      ldmfd   sp!, {r4 - r9, fp, pc}
> > +ENDPROC(c_backtrace)
> > +               .pushsection __ex_table,"a"
> > +               .align  3
> > +               .long   1001b, 1006b
> > +               .long   1002b, 1006b
> > +               .long   1003b, 1006b
> > +               .long   1004b, 1006b
> > +               .long   1005b, 1006b
> > +               .popsection
> > +
> > +.Lbad:         .asciz  "Backtrace aborted due to bad frame pointer <%p>\n"
> > +               .align
> > +.Ldsi:         .word   0xe92d4800 >> 11        @ stmfd sp!, {... fp, lr}
> > +               .word   0xe92d0000 >> 11        @ stmfd sp!, {}
> > +               .word   0x0b000000              @ bl if these bits are set
> > +
> > +#endif
> > --
> > 2.23.0.rc1.153.gdeed80330f-goog
> >
>
>
> --
> Thanks,
> ~Nick Desaulniers
Thanks for the review, will send a v2 with your suggestions.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply

* Re: [PATCH 3/3] watchdog/aspeed: add support for dual boot
From: Alexander Amelkin @ 2019-08-21 17:42 UTC (permalink / raw)
  To: Guenter Roeck, Ivan Mikhaylov
  Cc: linux-watchdog, linux-aspeed, Andrew Jeffery, linux-kernel,
	Joel Stanley, Wim Van Sebroeck, linux-arm-kernel
In-Reply-To: <20190821163220.GA11547@roeck-us.net>


[-- Attachment #1.1.1: Type: text/plain, Size: 4787 bytes --]

21.08.2019 19:32, Guenter Roeck wrote:
> On Wed, Aug 21, 2019 at 06:57:43PM +0300, Ivan Mikhaylov wrote:
>> Set WDT_CLEAR_TIMEOUT_AND_BOOT_CODE_SELECTION into WDT_CLEAR_TIMEOUT_STATUS
>> to clear out boot code source and re-enable access to the primary SPI flash
>> chip while booted via wdt2 from the alternate chip.
>>
>> AST2400 datasheet says:
>> "In the 2nd flash booting mode, all the address mapping to CS0# would be
>> re-directed to CS1#. And CS0# is not accessable under this mode. To access
>> CS0#, firmware should clear the 2nd boot mode register in the WDT2 status
>> register WDT30.bit[1]."
> Is there reason to not do this automatically when loading the module
> in alt-boot mode ? What means does userspace have to determine if CS0
> or CS1 is active at any given time ? If there is reason to ever have CS1
> active instead of CS0, what means would userspace have to enable it ?

Yes, there is. The driver is loaded long before the filesystems are mounted. The filesystems, in the event of alternate/recovery boot, need to be mounted from the same chip that the kernel was booted. For one reason because the main chip at CS0 is most probably corrupt. If you clear that bit when driver is loaded, your software will not know that and will try to mount the wrong filesystems. The whole idea of ASPEED's switching chipselects is to have identical firmware in both chips, without the need to process the alternate boot state in any way except for indicating a successful boot and restoring access to CS0 when needed.

The userspace can read bootstatus sysfs node to determine if an alternate boot has occured.

With ASPEED, CS1 is activated automatically by wdt2 when system fails to boot from the primary flash chip (at CS0) and disable the watchdog to indicate a successful boot. When that happens, both CS0 and CS1 controls  get routed in hardware to CS1 line, making the primary flash chip inaccessible. Depending on the architecture of the user-space software, it may choose to re-enable access to the primary chip via CS0 at different times. There must be a way to do so.

> If userspace can not really determine if CS1 or CS0 is active, all it could
> ever do was to enable CS0 to be in a deterministic state. If so, it doesn't
> make sense to ever have CS1 active, and re-enabling CS0 could be automatic.
>
> Similar, if CS1 can ever be enabled, there is no means for userspace to ensure
> that some other application did not re-enable CS0 while it believes that CS1
> is enabled. If there is no means for userspace to enable CS1, it can never be
> sure what is enabled (because some other entity may have enabled CS0 while
> userspace just thought that CS1 is still enabled). Again, the only means
> to guarantee a well defined state would be to explicitly enable CS0 and provive
> no means to enable CS1. Again, this could be done during boot, not requiring
> an explicit request from userspace.

Please understand that activation of CS1 in place of CS0 is NOT a software choice!


>> +	if (unlikely(!wdt))
>> +		return -ENODEV;
>> +
> How would this ever happen, and how / where is drvdata set to NULL ?

This is purely for robustness. Seeing a pointer obtained via a function accessed without first checking it for validity makes me nervous.

This code most probably adds nothing at the assembly level.

>
>> +	writel(WDT_CLEAR_TIMEOUT_AND_BOOT_CODE_SELECTION,
>> +			wdt->base + WDT_CLEAR_TIMEOUT_STATUS);
>> +	wdt->wdd.bootstatus |= WDIOF_EXTERN1;
> The variable reflects the _boot status_. It should not change after booting.
Is there any documentation that dictates that? All I could find is

"bootstatus: status of the device after booting". That doesn't look to me like it absolutely can not change to reflect the updated status (that is, to reflect that the originally set up alternate CS routing has been reset to normal).

If you absolutely disallow that, I think we could make 'access_cs0' readable instead, so it could report the current state of the boot code selection bit. Reverted, I suppose. That way 'access_cs0' would report 1 after 1 has been written to it (it wouldn't be possible to write a zero).

> @@ -223,6 +248,9 @@ static int aspeed_wdt_probe(struct platform_device *pdev)
>  
>  	wdt->ctrl = WDT_CTRL_1MHZ_CLK;
>  
> +	if (of_property_read_bool(np, "aspeed,alt-boot"))
> +		wdt->wdd.groups = bswitch_groups;
> +
> Why does this have to be separate to the existing evaluation of
> aspeed,alt-boot, and why does the existing code not work ?
>
> Also, is it guaranteed that this does not interfer with existing
> support for alt-boot ?

I think Ivan will comment on this.

With best regards,
Alexander Amelkin,
BIOS/BMC Team Lead, YADRO
https://yadro.com



[-- Attachment #1.2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

[-- Attachment #2: Type: text/plain, Size: 176 bytes --]

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox