LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v8 2/7] x86/dma: use IS_ENABLED() to simplify the code
From: Zhen Lei @ 2019-05-30  3:48 UTC (permalink / raw)
  To: Jean-Philippe Brucker, John Garry, Robin Murphy, Will Deacon,
	Joerg Roedel, Jonathan Corbet, linux-doc, Sebastian Ott,
	Gerald Schaefer, Martin Schwidefsky, Heiko Carstens,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Tony Luck, Fenghua Yu, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H . Peter Anvin, David Woodhouse, iommu,
	linux-kernel, linux-s390, linuxppc-dev, x86, linux-ia64
  Cc: Hanjun Guo, Zhen Lei
In-Reply-To: <20190530034831.4184-1-thunder.leizhen@huawei.com>

This patch removes the ifdefs around CONFIG_IOMMU_DEFAULT_PASSTHROUGH to
improve readablity.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
---
 arch/x86/kernel/pci-dma.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index dcd272dbd0a9330..9f2b19c35a060df 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -43,11 +43,8 @@
  * It is also possible to disable by default in kernel config, and enable with
  * iommu=nopt at boot time.
  */
-#ifdef CONFIG_IOMMU_DEFAULT_PASSTHROUGH
-int iommu_pass_through __read_mostly = 1;
-#else
-int iommu_pass_through __read_mostly;
-#endif
+int iommu_pass_through __read_mostly =
+			IS_ENABLED(CONFIG_IOMMU_DEFAULT_PASSTHROUGH);
 
 extern struct iommu_table_entry __iommu_table[], __iommu_table_end[];
 
-- 
1.8.3



^ permalink raw reply related

* [PATCH v8 1/7] iommu: enhance IOMMU default DMA mode build options
From: Zhen Lei @ 2019-05-30  3:48 UTC (permalink / raw)
  To: Jean-Philippe Brucker, John Garry, Robin Murphy, Will Deacon,
	Joerg Roedel, Jonathan Corbet, linux-doc, Sebastian Ott,
	Gerald Schaefer, Martin Schwidefsky, Heiko Carstens,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Tony Luck, Fenghua Yu, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H . Peter Anvin, David Woodhouse, iommu,
	linux-kernel, linux-s390, linuxppc-dev, x86, linux-ia64
  Cc: Hanjun Guo, Zhen Lei
In-Reply-To: <20190530034831.4184-1-thunder.leizhen@huawei.com>

First, add build option IOMMU_DEFAULT_{LAZY|STRICT}, so that we have the
opportunity to set {lazy|strict} mode as default at build time. Then put
the three config options in an choice, make people can only choose one of
the three at a time.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
---
 drivers/iommu/Kconfig | 42 +++++++++++++++++++++++++++++++++++-------
 drivers/iommu/iommu.c |  3 ++-
 2 files changed, 37 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 83664db5221df02..d6a1a45f80ffbf5 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -75,17 +75,45 @@ config IOMMU_DEBUGFS
 	  debug/iommu directory, and then populate a subdirectory with
 	  entries as required.
 
-config IOMMU_DEFAULT_PASSTHROUGH
-	bool "IOMMU passthrough by default"
+choice
+	prompt "IOMMU default DMA mode"
 	depends on IOMMU_API
-        help
-	  Enable passthrough by default, removing the need to pass in
-	  iommu.passthrough=on or iommu=pt through command line. If this
-	  is enabled, you can still disable with iommu.passthrough=off
-	  or iommu=nopt depending on the architecture.
+	default IOMMU_DEFAULT_STRICT
+	help
+	  This option allows IOMMU DMA mode to be chose at build time, to
+	  override the default DMA mode of each ARCHs, removing the need to
+	  pass in kernel parameters through command line. You can still use
+	  ARCHs specific boot options to override this option again.
+
+config IOMMU_DEFAULT_PASSTHROUGH
+	bool "passthrough"
+	help
+	  In this mode, the DMA access through IOMMU without any addresses
+	  translation. That means, the wrong or illegal DMA access can not
+	  be caught, no error information will be reported.
 
 	  If unsure, say N here.
 
+config IOMMU_DEFAULT_LAZY
+	bool "lazy"
+	help
+	  Support lazy mode, where for every IOMMU DMA unmap operation, the
+	  flush operation of IOTLB and the free operation of IOVA are deferred.
+	  They are only guaranteed to be done before the related IOVA will be
+	  reused.
+
+config IOMMU_DEFAULT_STRICT
+	bool "strict"
+	help
+	  For every IOMMU DMA unmap operation, the flush operation of IOTLB and
+	  the free operation of IOVA are guaranteed to be done in the unmap
+	  function.
+
+	  This mode is safer than the two above, but it maybe slower in some
+	  high performace scenarios.
+
+endchoice
+
 config OF_IOMMU
        def_bool y
        depends on OF && IOMMU_API
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 67ee6623f9b2a4d..56bce221285b15f 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -43,7 +43,8 @@
 #else
 static unsigned int iommu_def_domain_type = IOMMU_DOMAIN_DMA;
 #endif
-static bool iommu_dma_strict __read_mostly = true;
+static bool iommu_dma_strict __read_mostly =
+			IS_ENABLED(CONFIG_IOMMU_DEFAULT_STRICT);
 
 struct iommu_group {
 	struct kobject kobj;
-- 
1.8.3



^ permalink raw reply related

* [PATCH v8 4/7] powernv/iommu: add support for IOMMU default DMA mode build options
From: Zhen Lei @ 2019-05-30  3:48 UTC (permalink / raw)
  To: Jean-Philippe Brucker, John Garry, Robin Murphy, Will Deacon,
	Joerg Roedel, Jonathan Corbet, linux-doc, Sebastian Ott,
	Gerald Schaefer, Martin Schwidefsky, Heiko Carstens,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Tony Luck, Fenghua Yu, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H . Peter Anvin, David Woodhouse, iommu,
	linux-kernel, linux-s390, linuxppc-dev, x86, linux-ia64
  Cc: Hanjun Guo, Zhen Lei
In-Reply-To: <20190530034831.4184-1-thunder.leizhen@huawei.com>

The default DMA mode is PASSTHROUGH on powernv, this patch make it can be
set to STRICT at build time. It can be overridden by boot option.

There is no functional change.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 3 ++-
 drivers/iommu/Kconfig                     | 2 ++
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 126602b4e39972d..40208b9019be890 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -85,7 +85,8 @@ void pe_level_printk(const struct pnv_ioda_pe *pe, const char *level,
 	va_end(args);
 }
 
-static bool pnv_iommu_bypass_disabled __read_mostly;
+static bool pnv_iommu_bypass_disabled __read_mostly =
+			!IS_ENABLED(CONFIG_IOMMU_DEFAULT_PASSTHROUGH);
 static bool pci_reset_phbs __read_mostly;
 
 static int __init iommu_setup(char *str)
diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 9b48c2fc20e14d3..b5af859956c4fda 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -78,6 +78,7 @@ config IOMMU_DEBUGFS
 choice
 	prompt "IOMMU default DMA mode"
 	depends on IOMMU_API
+	default IOMMU_DEFAULT_PASSTHROUGH if (PPC_POWERNV && PCI)
 	default IOMMU_DEFAULT_LAZY if S390_IOMMU
 	default IOMMU_DEFAULT_STRICT
 	help
@@ -98,6 +99,7 @@ config IOMMU_DEFAULT_PASSTHROUGH
 
 config IOMMU_DEFAULT_LAZY
 	bool "lazy"
+	depends on !PPC_POWERNV
 	help
 	  Support lazy mode, where for every IOMMU DMA unmap operation, the
 	  flush operation of IOTLB and the free operation of IOVA are deferred.
-- 
1.8.3



^ permalink raw reply related

* [PATCH v8 5/7] iommu/vt-d: add support for IOMMU default DMA mode build options
From: Zhen Lei @ 2019-05-30  3:48 UTC (permalink / raw)
  To: Jean-Philippe Brucker, John Garry, Robin Murphy, Will Deacon,
	Joerg Roedel, Jonathan Corbet, linux-doc, Sebastian Ott,
	Gerald Schaefer, Martin Schwidefsky, Heiko Carstens,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Tony Luck, Fenghua Yu, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H . Peter Anvin, David Woodhouse, iommu,
	linux-kernel, linux-s390, linuxppc-dev, x86, linux-ia64
  Cc: Hanjun Guo, Zhen Lei
In-Reply-To: <20190530034831.4184-1-thunder.leizhen@huawei.com>

The default DMA mode of INTEL IOMMU is LAZY, this patch make it can be
set to STRICT at build time. It can be overridden by boot option.

There is no functional change.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
---
 drivers/iommu/Kconfig       | 2 +-
 drivers/iommu/intel-iommu.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index b5af859956c4fda..af580274b7c5270 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -79,7 +79,7 @@ choice
 	prompt "IOMMU default DMA mode"
 	depends on IOMMU_API
 	default IOMMU_DEFAULT_PASSTHROUGH if (PPC_POWERNV && PCI)
-	default IOMMU_DEFAULT_LAZY if S390_IOMMU
+	default IOMMU_DEFAULT_LAZY if (INTEL_IOMMU || S390_IOMMU)
 	default IOMMU_DEFAULT_STRICT
 	help
 	  This option allows IOMMU DMA mode to be chose at build time, to
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index a209199f3af6460..50d74ea0acdbdca 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -362,7 +362,7 @@ static int domain_detach_iommu(struct dmar_domain *domain,
 
 static int dmar_map_gfx = 1;
 static int dmar_forcedac;
-static int intel_iommu_strict;
+static int intel_iommu_strict = IS_ENABLED(CONFIG_IOMMU_DEFAULT_STRICT);
 static int intel_iommu_superpage = 1;
 static int intel_iommu_sm;
 static int iommu_identity_mapping;
-- 
1.8.3



^ permalink raw reply related

* [PATCH v8 6/7] iommu/amd: add support for IOMMU default DMA mode build options
From: Zhen Lei @ 2019-05-30  3:48 UTC (permalink / raw)
  To: Jean-Philippe Brucker, John Garry, Robin Murphy, Will Deacon,
	Joerg Roedel, Jonathan Corbet, linux-doc, Sebastian Ott,
	Gerald Schaefer, Martin Schwidefsky, Heiko Carstens,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Tony Luck, Fenghua Yu, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H . Peter Anvin, David Woodhouse, iommu,
	linux-kernel, linux-s390, linuxppc-dev, x86, linux-ia64
  Cc: Hanjun Guo, Zhen Lei
In-Reply-To: <20190530034831.4184-1-thunder.leizhen@huawei.com>

The default DMA mode of AMD IOMMU is LAZY, this patch make it can be set
to STRICT at build time. It can be overridden by boot option.

There is no functional change.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
---
 drivers/iommu/Kconfig          | 2 +-
 drivers/iommu/amd_iommu_init.c | 3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index af580274b7c5270..f6c030433d38048 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -79,7 +79,7 @@ choice
 	prompt "IOMMU default DMA mode"
 	depends on IOMMU_API
 	default IOMMU_DEFAULT_PASSTHROUGH if (PPC_POWERNV && PCI)
-	default IOMMU_DEFAULT_LAZY if (INTEL_IOMMU || S390_IOMMU)
+	default IOMMU_DEFAULT_LAZY if (AMD_IOMMU || INTEL_IOMMU || S390_IOMMU)
 	default IOMMU_DEFAULT_STRICT
 	help
 	  This option allows IOMMU DMA mode to be chose at build time, to
diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index f977df90d2a4912..6b0bfa43f6faa32 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -166,7 +166,8 @@ struct ivmd_header {
 					   to handle */
 LIST_HEAD(amd_iommu_unity_map);		/* a list of required unity mappings
 					   we find in ACPI */
-bool amd_iommu_unmap_flush;		/* if true, flush on every unmap */
+bool amd_iommu_unmap_flush = IS_ENABLED(CONFIG_IOMMU_DEFAULT_STRICT);
+					/* if true, flush on every unmap */
 
 LIST_HEAD(amd_iommu_list);		/* list of all AMD IOMMUs in the
 					   system */
-- 
1.8.3



^ permalink raw reply related

* [PATCH v8 7/7] ia64: hide build option IOMMU_DEFAULT_PASSTHROUGH
From: Zhen Lei @ 2019-05-30  3:48 UTC (permalink / raw)
  To: Jean-Philippe Brucker, John Garry, Robin Murphy, Will Deacon,
	Joerg Roedel, Jonathan Corbet, linux-doc, Sebastian Ott,
	Gerald Schaefer, Martin Schwidefsky, Heiko Carstens,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Tony Luck, Fenghua Yu, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H . Peter Anvin, David Woodhouse, iommu,
	linux-kernel, linux-s390, linuxppc-dev, x86, linux-ia64
  Cc: Hanjun Guo, Zhen Lei
In-Reply-To: <20190530034831.4184-1-thunder.leizhen@huawei.com>

The DMA mode PASSTHROUGH is not used on ia64.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
---
 drivers/iommu/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index f6c030433d38048..f7400e35628dce4 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -89,7 +89,7 @@ choice
 
 config IOMMU_DEFAULT_PASSTHROUGH
 	bool "passthrough"
-	depends on !S390_IOMMU
+	depends on (!S390_IOMMU && !IA64)
 	help
 	  In this mode, the DMA access through IOMMU without any addresses
 	  translation. That means, the wrong or illegal DMA access can not
-- 
1.8.3



^ permalink raw reply related

* [RFC] mm: Generalize notify_page_fault()
From: Anshuman Khandual @ 2019-05-30  5:55 UTC (permalink / raw)
  To: linux-kernel, linux-mm
  Cc: Mark Rutland, Michal Hocko, linux-ia64, linux-sh, Catalin Marinas,
	Will Deacon, Paul Mackerras, sparclinux, Stephen Rothwell,
	linux-s390, Yoshinori Sato, Russell King, Matthew Wilcox,
	Fenghua Yu, Anshuman Khandual, Andrey Konovalov, linux-arm-kernel,
	Tony Luck, Heiko Carstens, Martin Schwidefsky, Andrew Morton,
	linuxppc-dev, David S. Miller

Similar notify_page_fault() definitions are being used by architectures
duplicating much of the same code. This attempts to unify them into a
single implementation, generalize it and then move it to a common place.
kprobes_built_in() can detect CONFIG_KPROBES, hence notify_page_fault()
must not be wrapped again within CONFIG_KPROBES. Trap number argument can
now contain upto an 'unsigned int' accommodating all possible platforms.

Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>

Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-ia64@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-s390@vger.kernel.org
Cc: linux-sh@vger.kernel.org
Cc: sparclinux@vger.kernel.org
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: "David S. Miller" <davem@davemloft.net>
---
Boot tested on arm64 and build tested on some others.

 arch/arm/mm/fault.c      | 22 ----------------------
 arch/arm64/mm/fault.c    | 22 ----------------------
 arch/ia64/mm/fault.c     | 22 ----------------------
 arch/powerpc/mm/fault.c  | 23 ++---------------------
 arch/s390/mm/fault.c     | 16 +---------------
 arch/sh/mm/fault.c       | 14 --------------
 arch/sparc/mm/fault_64.c | 16 +---------------
 include/linux/mm.h       |  1 +
 mm/memory.c              | 14 ++++++++++++++
 9 files changed, 19 insertions(+), 131 deletions(-)

diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c
index 58f69fa..1bc3b18 100644
--- a/arch/arm/mm/fault.c
+++ b/arch/arm/mm/fault.c
@@ -30,28 +30,6 @@
 
 #ifdef CONFIG_MMU
 
-#ifdef CONFIG_KPROBES
-static inline int notify_page_fault(struct pt_regs *regs, unsigned int fsr)
-{
-	int ret = 0;
-
-	if (!user_mode(regs)) {
-		/* kprobe_running() needs smp_processor_id() */
-		preempt_disable();
-		if (kprobe_running() && kprobe_fault_handler(regs, fsr))
-			ret = 1;
-		preempt_enable();
-	}
-
-	return ret;
-}
-#else
-static inline int notify_page_fault(struct pt_regs *regs, unsigned int fsr)
-{
-	return 0;
-}
-#endif
-
 /*
  * This is useful to dump out the page tables associated with
  * 'addr' in mm 'mm'.
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index a30818e..152f1f1 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -70,28 +70,6 @@ static inline const struct fault_info *esr_to_debug_fault_info(unsigned int esr)
 	return debug_fault_info + DBG_ESR_EVT(esr);
 }
 
-#ifdef CONFIG_KPROBES
-static inline int notify_page_fault(struct pt_regs *regs, unsigned int esr)
-{
-	int ret = 0;
-
-	/* kprobe_running() needs smp_processor_id() */
-	if (!user_mode(regs)) {
-		preempt_disable();
-		if (kprobe_running() && kprobe_fault_handler(regs, esr))
-			ret = 1;
-		preempt_enable();
-	}
-
-	return ret;
-}
-#else
-static inline int notify_page_fault(struct pt_regs *regs, unsigned int esr)
-{
-	return 0;
-}
-#endif
-
 static void data_abort_decode(unsigned int esr)
 {
 	pr_alert("Data abort info:\n");
diff --git a/arch/ia64/mm/fault.c b/arch/ia64/mm/fault.c
index 5baeb02..64283d2 100644
--- a/arch/ia64/mm/fault.c
+++ b/arch/ia64/mm/fault.c
@@ -21,28 +21,6 @@
 
 extern int die(char *, struct pt_regs *, long);
 
-#ifdef CONFIG_KPROBES
-static inline int notify_page_fault(struct pt_regs *regs, int trap)
-{
-	int ret = 0;
-
-	if (!user_mode(regs)) {
-		/* kprobe_running() needs smp_processor_id() */
-		preempt_disable();
-		if (kprobe_running() && kprobe_fault_handler(regs, trap))
-			ret = 1;
-		preempt_enable();
-	}
-
-	return ret;
-}
-#else
-static inline int notify_page_fault(struct pt_regs *regs, int trap)
-{
-	return 0;
-}
-#endif
-
 /*
  * Return TRUE if ADDRESS points at a page in the kernel's mapped segment
  * (inside region 5, on ia64) and that page is present.
diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index b5d3578..5a0d71f 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -46,26 +46,6 @@
 #include <asm/debug.h>
 #include <asm/kup.h>
 
-static inline bool notify_page_fault(struct pt_regs *regs)
-{
-	bool ret = false;
-
-#ifdef CONFIG_KPROBES
-	/* kprobe_running() needs smp_processor_id() */
-	if (!user_mode(regs)) {
-		preempt_disable();
-		if (kprobe_running() && kprobe_fault_handler(regs, 11))
-			ret = true;
-		preempt_enable();
-	}
-#endif /* CONFIG_KPROBES */
-
-	if (unlikely(debugger_fault_handler(regs)))
-		ret = true;
-
-	return ret;
-}
-
 /*
  * Check whether the instruction inst is a store using
  * an update addressing form which will update r1.
@@ -466,8 +446,9 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
 	int is_write = page_fault_is_write(error_code);
 	vm_fault_t fault, major = 0;
 	bool must_retry = false;
+	int kprobe_fault = notify_page_fault(regs, 11);
 
-	if (notify_page_fault(regs))
+	if (unlikely(debugger_fault_handler(regs) || kprobe_fault))
 		return 0;
 
 	if (unlikely(page_fault_is_bad(error_code))) {
diff --git a/arch/s390/mm/fault.c b/arch/s390/mm/fault.c
index c220399..d317263 100644
--- a/arch/s390/mm/fault.c
+++ b/arch/s390/mm/fault.c
@@ -67,20 +67,6 @@ static int __init fault_init(void)
 }
 early_initcall(fault_init);
 
-static inline int notify_page_fault(struct pt_regs *regs)
-{
-	int ret = 0;
-
-	/* kprobe_running() needs smp_processor_id() */
-	if (kprobes_built_in() && !user_mode(regs)) {
-		preempt_disable();
-		if (kprobe_running() && kprobe_fault_handler(regs, 14))
-			ret = 1;
-		preempt_enable();
-	}
-	return ret;
-}
-
 /*
  * Find out which address space caused the exception.
  * Access register mode is impossible, ignore space == 3.
@@ -409,7 +395,7 @@ static inline vm_fault_t do_exception(struct pt_regs *regs, int access)
 	 */
 	clear_pt_regs_flag(regs, PIF_PER_TRAP);
 
-	if (notify_page_fault(regs))
+	if (notify_page_fault(regs, 14))
 		return 0;
 
 	mm = tsk->mm;
diff --git a/arch/sh/mm/fault.c b/arch/sh/mm/fault.c
index 6defd2c6..94bdfcb 100644
--- a/arch/sh/mm/fault.c
+++ b/arch/sh/mm/fault.c
@@ -24,20 +24,6 @@
 #include <asm/tlbflush.h>
 #include <asm/traps.h>
 
-static inline int notify_page_fault(struct pt_regs *regs, int trap)
-{
-	int ret = 0;
-
-	if (kprobes_built_in() && !user_mode(regs)) {
-		preempt_disable();
-		if (kprobe_running() && kprobe_fault_handler(regs, trap))
-			ret = 1;
-		preempt_enable();
-	}
-
-	return ret;
-}
-
 static void
 force_sig_info_fault(int si_signo, int si_code, unsigned long address,
 		     struct task_struct *tsk)
diff --git a/arch/sparc/mm/fault_64.c b/arch/sparc/mm/fault_64.c
index 8f8a604..e5557a1 100644
--- a/arch/sparc/mm/fault_64.c
+++ b/arch/sparc/mm/fault_64.c
@@ -38,20 +38,6 @@
 
 int show_unhandled_signals = 1;
 
-static inline __kprobes int notify_page_fault(struct pt_regs *regs)
-{
-	int ret = 0;
-
-	/* kprobe_running() needs smp_processor_id() */
-	if (kprobes_built_in() && !user_mode(regs)) {
-		preempt_disable();
-		if (kprobe_running() && kprobe_fault_handler(regs, 0))
-			ret = 1;
-		preempt_enable();
-	}
-	return ret;
-}
-
 static void __kprobes unhandled_fault(unsigned long address,
 				      struct task_struct *tsk,
 				      struct pt_regs *regs)
@@ -285,7 +271,7 @@ asmlinkage void __kprobes do_sparc64_fault(struct pt_regs *regs)
 
 	fault_code = get_thread_fault_code();
 
-	if (notify_page_fault(regs))
+	if (notify_page_fault(regs, 0))
 		goto exit_exception;
 
 	si_code = SEGV_MAPERR;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 0e8834a..c5a8dcf 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1778,6 +1778,7 @@ static inline int pte_devmap(pte_t pte)
 }
 #endif
 
+int notify_page_fault(struct pt_regs *regs, unsigned int trap);
 int vma_wants_writenotify(struct vm_area_struct *vma, pgprot_t vm_page_prot);
 
 extern pte_t *__get_locked_pte(struct mm_struct *mm, unsigned long addr,
diff --git a/mm/memory.c b/mm/memory.c
index ddf20bd..82022d7 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -52,6 +52,7 @@
 #include <linux/pagemap.h>
 #include <linux/memremap.h>
 #include <linux/ksm.h>
+#include <linux/kprobes.h>
 #include <linux/rmap.h>
 #include <linux/export.h>
 #include <linux/delayacct.h>
@@ -141,6 +142,19 @@ static int __init init_zero_pfn(void)
 core_initcall(init_zero_pfn);
 
 
+int __kprobes notify_page_fault(struct pt_regs *regs, unsigned int trap)
+{
+	int ret = 0;
+
+	if (kprobes_built_in() && !user_mode(regs)) {
+		preempt_disable();
+		if (kprobe_running() && kprobe_fault_handler(regs, trap))
+			ret = 1;
+		preempt_enable();
+	}
+	return ret;
+}
+
 #if defined(SPLIT_RSS_COUNTING)
 
 void sync_mm_rss(struct mm_struct *mm)
-- 
2.7.4


^ permalink raw reply related

* Re: [EXTERNAL] Re: [PATCH v3 1/3] PCI: Introduce pcibios_ignore_alignment_request
From: Sam Bobroff @ 2019-05-30  6:55 UTC (permalink / raw)
  To: Oliver
  Cc: Shawn Anastasio, linux-pci, Linux Kernel Mailing List, rppt,
	Paul Mackerras, Bjorn Helgaas, xyjxie, linuxppc-dev
In-Reply-To: <CAOSf1CEFfbmwfvmdqT1xdt8SFb=tYdYXLfXeyZ8=iRnhg4a3Pg@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 4180 bytes --]

On Tue, May 28, 2019 at 03:36:34PM +1000, Oliver wrote:
> On Tue, May 28, 2019 at 2:03 PM Shawn Anastasio <shawn@anastas.io> wrote:
> >
> > Introduce a new pcibios function pcibios_ignore_alignment_request
> > which allows the PCI core to defer to platform-specific code to
> > determine whether or not to ignore alignment requests for PCI resources.
> >
> > The existing behavior is to simply ignore alignment requests when
> > PCI_PROBE_ONLY is set. This is behavior is maintained by the
> > default implementation of pcibios_ignore_alignment_request.
> >
> > Signed-off-by: Shawn Anastasio <shawn@anastas.io>
> > ---
> >  drivers/pci/pci.c   | 9 +++++++--
> >  include/linux/pci.h | 1 +
> >  2 files changed, 8 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index 8abc843b1615..8207a09085d1 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -5882,6 +5882,11 @@ resource_size_t __weak pcibios_default_alignment(void)
> >         return 0;
> >  }
> >
> > +int __weak pcibios_ignore_alignment_request(void)
> > +{
> > +       return pci_has_flag(PCI_PROBE_ONLY);
> > +}
> > +
> >  #define RESOURCE_ALIGNMENT_PARAM_SIZE COMMAND_LINE_SIZE
> >  static char resource_alignment_param[RESOURCE_ALIGNMENT_PARAM_SIZE] = {0};
> >  static DEFINE_SPINLOCK(resource_alignment_lock);
> > @@ -5906,9 +5911,9 @@ static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev,
> >         p = resource_alignment_param;
> >         if (!*p && !align)
> >                 goto out;
> > -       if (pci_has_flag(PCI_PROBE_ONLY)) {
> > +       if (pcibios_ignore_alignment_request()) {
> >                 align = 0;
> > -               pr_info_once("PCI: Ignoring requested alignments (PCI_PROBE_ONLY)\n");
> > +               pr_info_once("PCI: Ignoring requested alignments\n");
> >                 goto out;
> >         }
> 
> I think the logic here is questionable to begin with. If the user has
> explicitly requested re-aligning a resource via the command line then
> we should probably do it even if PCI_PROBE_ONLY is set. When it breaks
> they get to keep the pieces.
> 
> That said, the real issue here is that PCI_PROBE_ONLY probably
> shouldn't be set under qemu/kvm. Under the other hypervisor (PowerVM)
> hotplugged devices are configured by firmware before it's passed to
> the guest and we need to keep the FW assignments otherwise things
> break. QEMU however doesn't do any BAR assignments and relies on that
> being handled by the guest. At boot time this is done by SLOF, but
> Linux only keeps SLOF around until it's extracted the device-tree.
> Once that's done SLOF gets blown away and the kernel needs to do it's
> own BAR assignments. I'm guessing there's a hack in there to make it
> work today, but it's a little surprising that it works at all...
> 
> IIRC Sam Bobroff was looking at hotplug under pseries recently so he
> might have something to add. He's sick at the moment, but I'll ask him
> to take a look at this once he's back among the living

There seems to be some code already in the kernel that will disable
PCI_PROBE_ONLY based on a device tree property, so I did a quick test
today and it seems to work. Only a trivial tweak is needed in QEMU to
do it (have spapr_dt_chosen() add a node called "linux,pci-probe-only"
with a value of 0), and that would allow us to set it only for QEMU (and
not PowerVM) if that's what we want to do. Is that useful?

(I haven't done any real testing yet but the guest booted up OK.)

> > diff --git a/include/linux/pci.h b/include/linux/pci.h
> > index 4a5a84d7bdd4..47471dcdbaf9 100644
> > --- a/include/linux/pci.h
> > +++ b/include/linux/pci.h
> > @@ -1990,6 +1990,7 @@ static inline void pcibios_penalize_isa_irq(int irq, int active) {}
> >  int pcibios_alloc_irq(struct pci_dev *dev);
> >  void pcibios_free_irq(struct pci_dev *dev);
> >  resource_size_t pcibios_default_alignment(void);
> > +int pcibios_ignore_alignment_request(void);
> >
> >  #ifdef CONFIG_HIBERNATE_CALLBACKS
> >  extern struct dev_pm_ops pcibios_pm_ops;
> > --
> > 2.20.1
> >
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* [PATCH kernel v3 0/3] powerpc/ioda2: Yet another attempt to allow DMA masks between 32 and 59
From: Alexey Kardashevskiy @ 2019-05-30  7:03 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Alexey Kardashevskiy, Alistair Popple, Oliver O'Halloran,
	Sam Bobroff, David Gibson

This is an attempt to allow DMA masks between 32..59 which are not large
enough to use either a PHB3 bypass mode or a sketchy bypass. Depending
on the max order, up to 40 is usually available.


This is based on v5.2-rc2.

Please comment. Thanks.



Alexey Kardashevskiy (3):
  powerpc/iommu: Allow bypass-only for DMA
  powerpc/powernv/ioda2: Allocate TCE table levels on demand for default
    DMA window
  powerpc/powernv/ioda2: Create bigger default window with 64k IOMMU
    pages

 arch/powerpc/include/asm/iommu.h              |  8 ++-
 arch/powerpc/platforms/powernv/pci.h          |  2 +-
 arch/powerpc/kernel/dma-iommu.c               | 11 ++--
 arch/powerpc/kernel/iommu.c                   | 58 +++++++++++++------
 arch/powerpc/platforms/powernv/pci-ioda-tce.c | 20 +++----
 arch/powerpc/platforms/powernv/pci-ioda.c     | 40 +++++++++++--
 6 files changed, 95 insertions(+), 44 deletions(-)

-- 
2.17.1


^ permalink raw reply

* [PATCH kernel v3 1/3] powerpc/iommu: Allow bypass-only for DMA
From: Alexey Kardashevskiy @ 2019-05-30  7:03 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Alexey Kardashevskiy, Alistair Popple, Oliver O'Halloran,
	Sam Bobroff, David Gibson
In-Reply-To: <20190530070355.121802-1-aik@ozlabs.ru>

POWER8 and newer support a bypass mode which maps all host memory to
PCI buses so an IOMMU table is not always required. However if we fail to
create such a table, the DMA setup fails and the kernel does not boot.

This skips the 32bit DMA setup check if the bypass is can be selected.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---

This minor thing helped me debugging next 2 patches so it can help
somebody else too.
---
 arch/powerpc/kernel/dma-iommu.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kernel/dma-iommu.c b/arch/powerpc/kernel/dma-iommu.c
index 09231ef06d01..809c1dc01edf 100644
--- a/arch/powerpc/kernel/dma-iommu.c
+++ b/arch/powerpc/kernel/dma-iommu.c
@@ -118,18 +118,17 @@ int dma_iommu_dma_supported(struct device *dev, u64 mask)
 {
 	struct iommu_table *tbl = get_iommu_table_base(dev);
 
-	if (!tbl) {
-		dev_info(dev, "Warning: IOMMU dma not supported: mask 0x%08llx"
-			", table unavailable\n", mask);
-		return 0;
-	}
-
 	if (dev_is_pci(dev) && dma_iommu_bypass_supported(dev, mask)) {
 		dev->archdata.iommu_bypass = true;
 		dev_dbg(dev, "iommu: 64-bit OK, using fixed ops\n");
 		return 1;
 	}
 
+	if (!tbl) {
+		dev_err(dev, "Warning: IOMMU dma not supported: mask 0x%08llx, table unavailable\n", mask);
+		return 0;
+	}
+
 	if (tbl->it_offset > (mask >> tbl->it_page_shift)) {
 		dev_info(dev, "Warning: IOMMU offset too big for device mask\n");
 		dev_info(dev, "mask: 0x%08llx, table offset: 0x%08lx\n",
-- 
2.17.1


^ permalink raw reply related

* [PATCH kernel v3 2/3] powerpc/powernv/ioda2: Allocate TCE table levels on demand for default DMA window
From: Alexey Kardashevskiy @ 2019-05-30  7:03 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Alexey Kardashevskiy, Alistair Popple, Oliver O'Halloran,
	Sam Bobroff, David Gibson
In-Reply-To: <20190530070355.121802-1-aik@ozlabs.ru>

We allocate only the first level of multilevel TCE tables for KVM
already (alloc_userspace_copy==true), and the rest is allocated on demand.
This is not enabled though for baremetal.

This removes the KVM limitation (implicit, via the alloc_userspace_copy
parameter) and always allocates just the first level. The on-demand
allocation of missing levels is already implemented.

As from now on DMA map might happen with disabled interrupts, this
allocates TCEs with GFP_ATOMIC.

To save time when creating a new clean table, this skips non-allocated
indirect TCE entries in pnv_tce_free just like we already do in
the VFIO IOMMU TCE driver.

This changes the default level number from 1 to 2 to reduce the amount
of memory required for the default 32bit DMA window at the boot time.
The default window size is up to 2GB which requires 4MB of TCEs which is
unlikely to be used entirely or at all as most devices these days are
64bit capable so by switching to 2 levels by default we save 4032KB of
RAM per a device.

While at this, add __GFP_NOWARN to alloc_pages_node() as the userspace
can trigger this path via VFIO, see the failure and try creating a table
again with different parameters which might succeed.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
Changes:
v2:
* added __GFP_NOWARN to alloc_pages_node
---
 arch/powerpc/platforms/powernv/pci.h          |  2 +-
 arch/powerpc/platforms/powernv/pci-ioda-tce.c | 20 +++++++++----------
 2 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index 1a51e7bfc541..a55dabc8d057 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -225,7 +225,7 @@ extern struct iommu_table_group *pnv_npu_compound_attach(
 		struct pnv_ioda_pe *pe);
 
 /* pci-ioda-tce.c */
-#define POWERNV_IOMMU_DEFAULT_LEVELS	1
+#define POWERNV_IOMMU_DEFAULT_LEVELS	2
 #define POWERNV_IOMMU_MAX_LEVELS	5
 
 extern int pnv_tce_build(struct iommu_table *tbl, long index, long npages,
diff --git a/arch/powerpc/platforms/powernv/pci-ioda-tce.c b/arch/powerpc/platforms/powernv/pci-ioda-tce.c
index e28f03e1eb5e..c75ec37bf0cd 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda-tce.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda-tce.c
@@ -36,7 +36,8 @@ static __be64 *pnv_alloc_tce_level(int nid, unsigned int shift)
 	struct page *tce_mem = NULL;
 	__be64 *addr;
 
-	tce_mem = alloc_pages_node(nid, GFP_KERNEL, shift - PAGE_SHIFT);
+	tce_mem = alloc_pages_node(nid, GFP_ATOMIC | __GFP_NOWARN,
+			shift - PAGE_SHIFT);
 	if (!tce_mem) {
 		pr_err("Failed to allocate a TCE memory, level shift=%d\n",
 				shift);
@@ -161,6 +162,9 @@ void pnv_tce_free(struct iommu_table *tbl, long index, long npages)
 
 		if (ptce)
 			*ptce = cpu_to_be64(0);
+		else
+			/* Skip the rest of the level */
+			i |= tbl->it_level_size - 1;
 	}
 }
 
@@ -260,7 +264,6 @@ long pnv_pci_ioda2_table_alloc_pages(int nid, __u64 bus_offset,
 	unsigned int table_shift = max_t(unsigned int, entries_shift + 3,
 			PAGE_SHIFT);
 	const unsigned long tce_table_size = 1UL << table_shift;
-	unsigned int tmplevels = levels;
 
 	if (!levels || (levels > POWERNV_IOMMU_MAX_LEVELS))
 		return -EINVAL;
@@ -268,9 +271,6 @@ long pnv_pci_ioda2_table_alloc_pages(int nid, __u64 bus_offset,
 	if (!is_power_of_2(window_size))
 		return -EINVAL;
 
-	if (alloc_userspace_copy && (window_size > (1ULL << 32)))
-		tmplevels = 1;
-
 	/* Adjust direct table size from window_size and levels */
 	entries_shift = (entries_shift + levels - 1) / levels;
 	level_shift = entries_shift + 3;
@@ -281,7 +281,7 @@ long pnv_pci_ioda2_table_alloc_pages(int nid, __u64 bus_offset,
 
 	/* Allocate TCE table */
 	addr = pnv_pci_ioda2_table_do_alloc_pages(nid, level_shift,
-			tmplevels, tce_table_size, &offset, &total_allocated);
+			1, tce_table_size, &offset, &total_allocated);
 
 	/* addr==NULL means that the first level allocation failed */
 	if (!addr)
@@ -292,18 +292,18 @@ long pnv_pci_ioda2_table_alloc_pages(int nid, __u64 bus_offset,
 	 * we did not allocate as much as we wanted,
 	 * release partially allocated table.
 	 */
-	if (tmplevels == levels && offset < tce_table_size)
+	if (levels == 1 && offset < tce_table_size)
 		goto free_tces_exit;
 
 	/* Allocate userspace view of the TCE table */
 	if (alloc_userspace_copy) {
 		offset = 0;
 		uas = pnv_pci_ioda2_table_do_alloc_pages(nid, level_shift,
-				tmplevels, tce_table_size, &offset,
+				1, tce_table_size, &offset,
 				&total_allocated_uas);
 		if (!uas)
 			goto free_tces_exit;
-		if (tmplevels == levels && (offset < tce_table_size ||
+		if (levels == 1 && (offset < tce_table_size ||
 				total_allocated_uas != total_allocated))
 			goto free_uas_exit;
 	}
@@ -318,7 +318,7 @@ long pnv_pci_ioda2_table_alloc_pages(int nid, __u64 bus_offset,
 
 	pr_debug("Created TCE table: ws=%08llx ts=%lx @%08llx base=%lx uas=%p levels=%d/%d\n",
 			window_size, tce_table_size, bus_offset, tbl->it_base,
-			tbl->it_userspace, tmplevels, levels);
+			tbl->it_userspace, 1, levels);
 
 	return 0;
 
-- 
2.17.1


^ permalink raw reply related

* [PATCH kernel v3 3/3] powerpc/powernv/ioda2: Create bigger default window with 64k IOMMU pages
From: Alexey Kardashevskiy @ 2019-05-30  7:03 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Alexey Kardashevskiy, Alistair Popple, Oliver O'Halloran,
	Sam Bobroff, David Gibson
In-Reply-To: <20190530070355.121802-1-aik@ozlabs.ru>

At the moment we create a small window only for 32bit devices, the window
maps 0..2GB of the PCI space only. For other devices we either use
a sketchy bypass or hardware bypass but the former can only work if
the amount of RAM is no bigger than the device's DMA mask and the latter
requires devices to support at least 59bit DMA.

This extends the default DMA window to the maximum size possible to allow
a wider DMA mask than just 32bit. The default window size is now limited
by the the iommu_table::it_map allocation bitmap which is a contiguous
array, 1 bit per an IOMMU page.

This increases the default IOMMU page size from hard coded 4K to
the system page size to allow wider DMA masks.

This increases the level number to not exceed the max order allocation
limit per TCE level. By the same time, this keeps minimal levels number
as 2 in order to save memory.

As the extended window now overlaps the 32bit MMIO region, this adds
an area reservation to iommu_init_table().

After this change the default window size is 0x80000000000==1<<43 so
devices limited to DMA mask smaller than the amount of system RAM can
still use more than just 2GB of memory for DMA.

With the on-demand allocation of indirect TCE table levels enabled and
2 levels, the first TCE level size is just
1<<ceil((log2(0x7ffffffffff+1)-16)/2)=16384 TCEs or 2 system pages.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
Changes:
v3:
* fixed tce levels calculation

v2:
* adjusted level number to the max order
---
 arch/powerpc/include/asm/iommu.h          |  8 +++-
 arch/powerpc/kernel/iommu.c               | 58 +++++++++++++++--------
 arch/powerpc/platforms/powernv/pci-ioda.c | 40 +++++++++++++---
 3 files changed, 79 insertions(+), 27 deletions(-)

diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index 0ac52392ed99..5ea782e04803 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -124,6 +124,8 @@ struct iommu_table {
 	struct iommu_table_ops *it_ops;
 	struct kref    it_kref;
 	int it_nid;
+	unsigned long it_reserved_start; /* Start of not-DMA-able (MMIO) area */
+	unsigned long it_reserved_end;
 };
 
 #define IOMMU_TABLE_USERSPACE_ENTRY_RO(tbl, entry) \
@@ -162,8 +164,10 @@ extern int iommu_tce_table_put(struct iommu_table *tbl);
 /* Initializes an iommu_table based in values set in the passed-in
  * structure
  */
-extern struct iommu_table *iommu_init_table(struct iommu_table * tbl,
-					    int nid);
+extern struct iommu_table *iommu_init_table_res(struct iommu_table *tbl,
+		int nid, unsigned long res_start, unsigned long res_end);
+#define iommu_init_table(tbl, nid) iommu_init_table_res((tbl), (nid), 0, 0)
+
 #define IOMMU_TABLE_GROUP_MAX_TABLES	2
 
 struct iommu_table_group;
diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index 33bbd59cff79..209306ce7f4b 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -646,11 +646,43 @@ static void iommu_table_clear(struct iommu_table *tbl)
 #endif
 }
 
+static void iommu_table_reserve_pages(struct iommu_table *tbl)
+{
+	int i;
+
+	/*
+	 * Reserve page 0 so it will not be used for any mappings.
+	 * This avoids buggy drivers that consider page 0 to be invalid
+	 * to crash the machine or even lose data.
+	 */
+	if (tbl->it_offset == 0)
+		set_bit(0, tbl->it_map);
+
+	for (i = tbl->it_reserved_start; i < tbl->it_reserved_end; ++i)
+		set_bit(i, tbl->it_map);
+}
+
+static void iommu_table_release_pages(struct iommu_table *tbl)
+{
+	int i;
+
+	/*
+	 * In case we have reserved the first bit, we should not emit
+	 * the warning below.
+	 */
+	if (tbl->it_offset == 0)
+		clear_bit(0, tbl->it_map);
+
+	for (i = tbl->it_reserved_start; i < tbl->it_reserved_end; ++i)
+		clear_bit(i, tbl->it_map);
+}
+
 /*
  * Build a iommu_table structure.  This contains a bit map which
  * is used to manage allocation of the tce space.
  */
-struct iommu_table *iommu_init_table(struct iommu_table *tbl, int nid)
+struct iommu_table *iommu_init_table_res(struct iommu_table *tbl, int nid,
+		unsigned long res_start, unsigned long res_end)
 {
 	unsigned long sz;
 	static int welcomed = 0;
@@ -669,13 +701,9 @@ struct iommu_table *iommu_init_table(struct iommu_table *tbl, int nid)
 	tbl->it_map = page_address(page);
 	memset(tbl->it_map, 0, sz);
 
-	/*
-	 * Reserve page 0 so it will not be used for any mappings.
-	 * This avoids buggy drivers that consider page 0 to be invalid
-	 * to crash the machine or even lose data.
-	 */
-	if (tbl->it_offset == 0)
-		set_bit(0, tbl->it_map);
+	tbl->it_reserved_start = res_start;
+	tbl->it_reserved_end = res_end;
+	iommu_table_reserve_pages(tbl);
 
 	/* We only split the IOMMU table if we have 1GB or more of space */
 	if ((tbl->it_size << tbl->it_page_shift) >= (1UL * 1024 * 1024 * 1024))
@@ -727,12 +755,7 @@ static void iommu_table_free(struct kref *kref)
 		return;
 	}
 
-	/*
-	 * In case we have reserved the first bit, we should not emit
-	 * the warning below.
-	 */
-	if (tbl->it_offset == 0)
-		clear_bit(0, tbl->it_map);
+	iommu_table_release_pages(tbl);
 
 	/* verify that table contains no entries */
 	if (!bitmap_empty(tbl->it_map, tbl->it_size))
@@ -1037,8 +1060,7 @@ int iommu_take_ownership(struct iommu_table *tbl)
 	for (i = 0; i < tbl->nr_pools; i++)
 		spin_lock(&tbl->pools[i].lock);
 
-	if (tbl->it_offset == 0)
-		clear_bit(0, tbl->it_map);
+	iommu_table_reserve_pages(tbl);
 
 	if (!bitmap_empty(tbl->it_map, tbl->it_size)) {
 		pr_err("iommu_tce: it_map is not empty");
@@ -1068,9 +1090,7 @@ void iommu_release_ownership(struct iommu_table *tbl)
 
 	memset(tbl->it_map, 0, sz);
 
-	/* Restore bit#0 set by iommu_init_table() */
-	if (tbl->it_offset == 0)
-		set_bit(0, tbl->it_map);
+	iommu_table_release_pages(tbl);
 
 	for (i = 0; i < tbl->nr_pools; i++)
 		spin_unlock(&tbl->pools[i].lock);
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 126602b4e399..ce2efdb3900d 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -2422,6 +2422,7 @@ static long pnv_pci_ioda2_setup_default_config(struct pnv_ioda_pe *pe)
 {
 	struct iommu_table *tbl = NULL;
 	long rc;
+	unsigned long res_start, res_end;
 
 	/*
 	 * crashkernel= specifies the kdump kernel's maximum memory at
@@ -2435,19 +2436,46 @@ static long pnv_pci_ioda2_setup_default_config(struct pnv_ioda_pe *pe)
 	 * DMA window can be larger than available memory, which will
 	 * cause errors later.
 	 */
-	const u64 window_size = min((u64)pe->table_group.tce32_size, max_memory);
+	const u64 maxblock = 1UL << (PAGE_SHIFT + MAX_ORDER - 1);
 
-	rc = pnv_pci_ioda2_create_table(&pe->table_group, 0,
-			IOMMU_PAGE_SHIFT_4K,
-			window_size,
-			POWERNV_IOMMU_DEFAULT_LEVELS, false, &tbl);
+	/*
+	 * We create the default window as big as we can. The constraint is
+	 * the max order of allocation possible. The TCE tableis likely to
+	 * end up being multilevel and with on-demand allocation in place,
+	 * the initial use is not going to be huge as the default window aims
+	 * to support cripplied devices (i.e. not fully 64bit DMAble) only.
+	 */
+	/* iommu_table::it_map uses 1 bit per IOMMU page, hence 8 */
+	const u64 window_size = min((maxblock * 8) << PAGE_SHIFT, max_memory);
+	/* Each TCE level cannot exceed maxblock so go multilevel if needed */
+	unsigned long tces_order = ilog2(window_size >> PAGE_SHIFT);
+	unsigned long tcelevel_order = ilog2(maxblock >> 3);
+	unsigned int levels = tces_order / tcelevel_order;
+
+	if (tces_order % tcelevel_order)
+		levels += 1;
+	/*
+	 * We try to stick to default levels (which is >1 at the moment) in
+	 * order to save memory by relying on on-demain TCE level allocation.
+	 */
+	levels = max_t(unsigned int, levels, POWERNV_IOMMU_DEFAULT_LEVELS);
+
+	rc = pnv_pci_ioda2_create_table(&pe->table_group, 0, PAGE_SHIFT,
+			window_size, levels, false, &tbl);
 	if (rc) {
 		pe_err(pe, "Failed to create 32-bit TCE table, err %ld",
 				rc);
 		return rc;
 	}
 
-	iommu_init_table(tbl, pe->phb->hose->node);
+	/* We use top part of 32bit space for MMIO so exclude it from DMA */
+	res_start = 0;
+	res_end = 0;
+	if (window_size > pe->phb->ioda.m32_pci_base) {
+		res_start = pe->phb->ioda.m32_pci_base >> tbl->it_page_shift;
+		res_end = min(window_size, SZ_4G) >> tbl->it_page_shift;
+	}
+	iommu_init_table_res(tbl, pe->phb->hose->node, res_start, res_end);
 
 	rc = pnv_pci_ioda2_set_window(&pe->table_group, 0, tbl);
 	if (rc) {
-- 
2.17.1


^ permalink raw reply related

* Re: [PATCH kernel 0/2] pseries: Enable SWIOTLB
From: Alexey Kardashevskiy @ 2019-05-30  7:04 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Alistair Popple, Thiago Jung Bauermann, David Gibson
In-Reply-To: <20190507062559.20295-1-aik@ozlabs.ru>

Ping, anyone?

On 07/05/2019 16:25, Alexey Kardashevskiy wrote:
> This is an attempt to allow PCI pass through to a secure guest when
> hardware can only access insecure memory. This allows SWIOTLB use
> for passed through devices.
> 
> Later on secure VMs will unsecure SWIOTLB bounce buffers for DMA
> and the rest of the guest RAM will be unavailable to the hardware
> by default.
> 
> 
> This is based on sha1
> e93c9c99a629 Linus Torvalds "Linux 5.1".
> 
> Please comment. Thanks.
> 
> 
> 
> Alexey Kardashevskiy (2):
>   powerpc/pseries/dma: Allow swiotlb
>   powerpc/pseries/dma: Enable swiotlb
> 
>  arch/powerpc/kernel/dma-iommu.c        | 36 ++++++++++++++++++++++++++
>  arch/powerpc/platforms/pseries/setup.c |  5 ++++
>  arch/powerpc/platforms/pseries/Kconfig |  1 +
>  3 files changed, 42 insertions(+)
> 

-- 
Alexey

^ permalink raw reply

* Re: [PATCH kernel] prom_init: Fetch flatten device tree from the system firmware
From: Alexey Kardashevskiy @ 2019-05-30  7:09 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev, Suraj Jitindar Singh, David Gibson
In-Reply-To: <20190501034221.18437-1-aik@ozlabs.ru>

so, it is sort-of nack from David and sort-of ack from Segher, what
happens now?



On 01/05/2019 13:42, Alexey Kardashevskiy wrote:
> At the moment, on 256CPU + 256 PCI devices guest, it takes the guest
> about 8.5sec to fetch the entire device tree via the client interface
> as the DT is traversed twice - for strings blob and for struct blob.
> Also, "getprop" is quite slow too as SLOF stores properties in a linked
> list.
> 
> However, since [1] SLOF builds flattened device tree (FDT) for another
> purpose. [2] adds a new "fdt-fetch" client interface for the OS to fetch
> the FDT.
> 
> This tries the new method; if not supported, this falls back to
> the old method.
> 
> There is a change in the FDT layout - the old method produced
> (reserved map, strings, structs), the new one receives only strings and
> structs from the firmware and adds the final reserved map to the end,
> so it is (fw reserved map, strings, structs, reserved map).
> This still produces the same unflattened device tree.
> 
> This merges the reserved map from the firmware into the kernel's reserved
> map. At the moment SLOF generates an empty reserved map so this does not
> change the existing behaviour in regard of reservations.
> 
> This supports only v17 onward as only that version provides dt_struct_size
> which works as "fdt-fetch" only produces v17 blobs.
> 
> If "fdt-fetch" is not available, the old method of fetching the DT is used.
> 
> [1] https://git.qemu.org/?p=SLOF.git;a=commitdiff;h=e6fc84652c9c00
> [2] https://git.qemu.org/?p=SLOF.git;a=commit;h=ecda95906930b80
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
>  arch/powerpc/kernel/prom_init.c | 43 +++++++++++++++++++++++++++++++++
>  1 file changed, 43 insertions(+)
> 
> diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
> index f33ff4163a51..72e7a602b68e 100644
> --- a/arch/powerpc/kernel/prom_init.c
> +++ b/arch/powerpc/kernel/prom_init.c
> @@ -2457,6 +2457,48 @@ static void __init flatten_device_tree(void)
>  		prom_panic("Can't allocate initial device-tree chunk\n");
>  	mem_end = mem_start + room;
>  
> +	hdr = (void *) mem_start;
> +	if (!call_prom_ret("fdt-fetch", 2, 1, NULL, mem_start,
> +				room - sizeof(mem_reserve_map)) &&
> +			hdr->version >= 17) {
> +		u32 size;
> +		struct mem_map_entry *fwrmap;
> +
> +		/* Fixup the boot cpuid */
> +		hdr->boot_cpuid_phys = cpu_to_be32(prom.cpu);
> +
> +		/*
> +		 * Store the struct and strings addresses, mostly
> +		 * for consistency, only dt_header_start actually matters later.
> +		 */
> +		dt_header_start = mem_start;
> +		dt_string_start = mem_start + be32_to_cpu(hdr->off_dt_strings);
> +		dt_string_end = dt_string_start +
> +			be32_to_cpu(hdr->dt_strings_size);
> +		dt_struct_start = mem_start + be32_to_cpu(hdr->off_dt_struct);
> +		dt_struct_end = dt_struct_start +
> +			be32_to_cpu(hdr->dt_struct_size);
> +
> +		/*
> +		 * Calculate the reserved map location (which we put
> +		 * at the blob end) and update total size.
> +		 */
> +		fwrmap = (void *)(mem_start + be32_to_cpu(hdr->off_mem_rsvmap));
> +		hdr->off_mem_rsvmap = hdr->totalsize;
> +		size = be32_to_cpu(hdr->totalsize);
> +		hdr->totalsize = cpu_to_be32(size + sizeof(mem_reserve_map));
> +
> +		/* Merge reserved map from firmware to ours */
> +		for ( ; fwrmap->size; ++fwrmap)
> +			reserve_mem(be64_to_cpu(fwrmap->base),
> +					be64_to_cpu(fwrmap->size));
> +
> +		rsvmap = (u64 *)(mem_start + size);
> +
> +		prom_debug("Fetched DTB: %d bytes to @%lx\n", size, mem_start);
> +		goto finalize_exit;
> +	}
> +
>  	/* Get root of tree */
>  	root = call_prom("peer", 1, 1, (phandle)0);
>  	if (root == (phandle)0)
> @@ -2504,6 +2546,7 @@ static void __init flatten_device_tree(void)
>  	/* Version 16 is not backward compatible */
>  	hdr->last_comp_version = cpu_to_be32(0x10);
>  
> +finalize_exit:
>  	/* Copy the reserve map in */
>  	memcpy(rsvmap, mem_reserve_map, sizeof(mem_reserve_map));
>  
> 

-- 
Alexey

^ permalink raw reply

* Re: [PATCH 22/22] docs: fix broken documentation links
From: Wolfram Sang @ 2019-05-30  7:21 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Andrew Lunn, Andy Lutomirski, Catalin Marinas, Linus Walleij,
	Will Deacon, Pavel Tatashin, Paul Mackerras, Alessia Mantegazza,
	Jakub Wilk, Bartosz Golaszewski, Paul E. McKenney, Kevin Hilman,
	James Morris, linux-acpi, Ingo Molnar, xen-devel, Jason Wang,
	Alexander Popov, Qian Cai, Al Viro, Thomas Preston,
	Thomas Gleixner, Kairui Song, Ding Xiang, Greg Kroah-Hartman,
	Rafael J. Wysocki, linux-kernel, Paul Burton, Jiri Kosina,
	Casey Schaufler, Andrew Morton, Lu Baolu, Mark Rutland, Feng Tang,
	Linux Doc Mailing List, Dave Hansen, Mimi Zohar, Kamalesh Babulal,
	linux-mm, Masahiro Yamada, Yannik Sembritzki, Harry Wei,
	linux-i2c, Shuah Khan, Stephen Rothwell, Stefano Stabellini,
	Alexandre Ghiti, YueHaibing, Robert Moore, AKASHI Takahiro,
	Len Brown, Joerg Roedel, linux-arm-msm, linuxppc-dev,
	Mauro Carvalho Chehab, linux-gpio, Claudiu Manoil,
	Florian Fainelli, Jacek Anaszewski, Bjorn Helgaas, linux-amlogic,
	Boris Ostrovsky, Mika Westerberg, linux-arm-kernel, Tony Luck,
	Sean Christopherson, James Morse, Samuel Mendoza-Jonas, linux-pci,
	Bhupesh Sharma, Jonathan Cameron, platform-driver-x86,
	Quentin Perret, linux-kselftest, Alex Shi, Lorenzo Pieralisi,
	Baoquan He, Jonathan Corbet, Raphael Gault, Joel Stanley,
	Federico Vaga, Darren Hart, linux-edac, Erik Schmauss,
	Serge E. Hallyn, Palmer Dabbelt, Kees Cook,
	Bartlomiej Zolnierkiewicz, Jonathan Neuschäfer,
	SeongJae Park, Mark Brown, Borislav Petkov, Sunil Muthuswamy,
	virtualization, devel, Ard Biesheuvel, Liam Girdwood,
	Sakari Ailus, Olof Johansson, Logan Gunthorpe, David S. Miller,
	Kirill A. Shutemov, Sven Van Asbroeck, Michal Hocko, kvm,
	Michael S. Tsirkin, Peter Zijlstra, Thorsten Leemhuis,
	David Howells, David Brown, H. Peter Anvin, devel, Manfred Spraul,
	x86, Russell King, Mike Rapoport, Andy Gross, Dave Young,
	devicetree, Arnaldo Carvalho de Melo, Jerome Glisse, Rob Herring,
	Josh Poimboeuf, Dmitry Vyukov, Luis Chamberlain, Juergen Gross,
	Denis Efremov, netdev, Nicolas Ferre, Changbin Du,
	linux-security-module, Robin Murphy, Andy Shevchenko
In-Reply-To: <f9fecacbe4ce0b2b3aed38d71ae3753f2daf3ce3.1559171394.git.mchehab+samsung@kernel.org>

[-- Attachment #1: Type: text/plain, Size: 366 bytes --]

On Wed, May 29, 2019 at 08:23:53PM -0300, Mauro Carvalho Chehab wrote:
> Mostly due to x86 and acpi conversion, several documentation
> links are still pointing to the old file. Fix them.
> 
> Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>

Didn't I ack this already?

For the I2C part:

Reviewed-by: Wolfram Sang <wsa@the-dreams.de>


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH 09/22] docs: mark orphan documents as such
From: Paolo Bonzini @ 2019-05-30  8:30 UTC (permalink / raw)
  To: Mauro Carvalho Chehab, Linux Doc Mailing List
  Cc: kvm, Radim Krčmář, Maxime Ripard, dri-devel,
	platform-driver-x86, Paul Mackerras, linux-stm32,
	Alexandre Torgue, Jonathan Corbet, David Airlie, Andrew Donnellan,
	linux-pm, Maarten Lankhorst, Matan Ziv-Av, Mauro Carvalho Chehab,
	Daniel Vetter, Sean Paul, linux-arm-kernel, linux-kernel,
	Maxime Coquelin, Frederic Barrat, linuxppc-dev, Georgi Djakov
In-Reply-To: <e0bf4e767dd5de9189e5993fbec2f4b1bafd2064.1559171394.git.mchehab+samsung@kernel.org>

On 30/05/19 01:23, Mauro Carvalho Chehab wrote:
> Sphinx doesn't like orphan documents:
> 
>     Documentation/accelerators/ocxl.rst: WARNING: document isn't included in any toctree
>     Documentation/arm/stm32/overview.rst: WARNING: document isn't included in any toctree
>     Documentation/arm/stm32/stm32f429-overview.rst: WARNING: document isn't included in any toctree
>     Documentation/arm/stm32/stm32f746-overview.rst: WARNING: document isn't included in any toctree
>     Documentation/arm/stm32/stm32f769-overview.rst: WARNING: document isn't included in any toctree
>     Documentation/arm/stm32/stm32h743-overview.rst: WARNING: document isn't included in any toctree
>     Documentation/arm/stm32/stm32mp157-overview.rst: WARNING: document isn't included in any toctree
>     Documentation/gpu/msm-crash-dump.rst: WARNING: document isn't included in any toctree
>     Documentation/interconnect/interconnect.rst: WARNING: document isn't included in any toctree
>     Documentation/laptops/lg-laptop.rst: WARNING: document isn't included in any toctree
>     Documentation/powerpc/isa-versions.rst: WARNING: document isn't included in any toctree
>     Documentation/virtual/kvm/amd-memory-encryption.rst: WARNING: document isn't included in any toctree
>     Documentation/virtual/kvm/vcpu-requests.rst: WARNING: document isn't included in any toctree
> 
> So, while they aren't on any toctree, add :orphan: to them, in order
> to silent this warning.
> 
> Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>

Please leave out KVM, I'll fix that instead.  Thanks for the report!

Paolo

> ---
>  Documentation/accelerators/ocxl.rst                 | 2 ++
>  Documentation/arm/stm32/overview.rst                | 2 ++
>  Documentation/arm/stm32/stm32f429-overview.rst      | 2 ++
>  Documentation/arm/stm32/stm32f746-overview.rst      | 2 ++
>  Documentation/arm/stm32/stm32f769-overview.rst      | 2 ++
>  Documentation/arm/stm32/stm32h743-overview.rst      | 2 ++
>  Documentation/arm/stm32/stm32mp157-overview.rst     | 2 ++
>  Documentation/gpu/msm-crash-dump.rst                | 2 ++
>  Documentation/interconnect/interconnect.rst         | 2 ++
>  Documentation/laptops/lg-laptop.rst                 | 2 ++
>  Documentation/powerpc/isa-versions.rst              | 2 ++
>  Documentation/virtual/kvm/amd-memory-encryption.rst | 2 ++
>  Documentation/virtual/kvm/vcpu-requests.rst         | 2 ++
>  13 files changed, 26 insertions(+)
> 
> diff --git a/Documentation/accelerators/ocxl.rst b/Documentation/accelerators/ocxl.rst
> index 14cefc020e2d..b1cea19a90f5 100644
> --- a/Documentation/accelerators/ocxl.rst
> +++ b/Documentation/accelerators/ocxl.rst
> @@ -1,3 +1,5 @@
> +:orphan:
> +
>  ========================================================
>  OpenCAPI (Open Coherent Accelerator Processor Interface)
>  ========================================================
> diff --git a/Documentation/arm/stm32/overview.rst b/Documentation/arm/stm32/overview.rst
> index 85cfc8410798..f7e734153860 100644
> --- a/Documentation/arm/stm32/overview.rst
> +++ b/Documentation/arm/stm32/overview.rst
> @@ -1,3 +1,5 @@
> +:orphan:
> +
>  ========================
>  STM32 ARM Linux Overview
>  ========================
> diff --git a/Documentation/arm/stm32/stm32f429-overview.rst b/Documentation/arm/stm32/stm32f429-overview.rst
> index 18feda97f483..65bbb1c3b423 100644
> --- a/Documentation/arm/stm32/stm32f429-overview.rst
> +++ b/Documentation/arm/stm32/stm32f429-overview.rst
> @@ -1,3 +1,5 @@
> +:orphan:
> +
>  STM32F429 Overview
>  ==================
>  
> diff --git a/Documentation/arm/stm32/stm32f746-overview.rst b/Documentation/arm/stm32/stm32f746-overview.rst
> index b5f4b6ce7656..42d593085015 100644
> --- a/Documentation/arm/stm32/stm32f746-overview.rst
> +++ b/Documentation/arm/stm32/stm32f746-overview.rst
> @@ -1,3 +1,5 @@
> +:orphan:
> +
>  STM32F746 Overview
>  ==================
>  
> diff --git a/Documentation/arm/stm32/stm32f769-overview.rst b/Documentation/arm/stm32/stm32f769-overview.rst
> index 228656ced2fe..f6adac862b17 100644
> --- a/Documentation/arm/stm32/stm32f769-overview.rst
> +++ b/Documentation/arm/stm32/stm32f769-overview.rst
> @@ -1,3 +1,5 @@
> +:orphan:
> +
>  STM32F769 Overview
>  ==================
>  
> diff --git a/Documentation/arm/stm32/stm32h743-overview.rst b/Documentation/arm/stm32/stm32h743-overview.rst
> index 3458dc00095d..c525835e7473 100644
> --- a/Documentation/arm/stm32/stm32h743-overview.rst
> +++ b/Documentation/arm/stm32/stm32h743-overview.rst
> @@ -1,3 +1,5 @@
> +:orphan:
> +
>  STM32H743 Overview
>  ==================
>  
> diff --git a/Documentation/arm/stm32/stm32mp157-overview.rst b/Documentation/arm/stm32/stm32mp157-overview.rst
> index 62e176d47ca7..2c52cd020601 100644
> --- a/Documentation/arm/stm32/stm32mp157-overview.rst
> +++ b/Documentation/arm/stm32/stm32mp157-overview.rst
> @@ -1,3 +1,5 @@
> +:orphan:
> +
>  STM32MP157 Overview
>  ===================
>  
> diff --git a/Documentation/gpu/msm-crash-dump.rst b/Documentation/gpu/msm-crash-dump.rst
> index 757cd257e0d8..240ef200f76c 100644
> --- a/Documentation/gpu/msm-crash-dump.rst
> +++ b/Documentation/gpu/msm-crash-dump.rst
> @@ -1,3 +1,5 @@
> +:orphan:
> +
>  =====================
>  MSM Crash Dump Format
>  =====================
> diff --git a/Documentation/interconnect/interconnect.rst b/Documentation/interconnect/interconnect.rst
> index c3e004893796..56e331dab70e 100644
> --- a/Documentation/interconnect/interconnect.rst
> +++ b/Documentation/interconnect/interconnect.rst
> @@ -1,5 +1,7 @@
>  .. SPDX-License-Identifier: GPL-2.0
>  
> +:orphan:
> +
>  =====================================
>  GENERIC SYSTEM INTERCONNECT SUBSYSTEM
>  =====================================
> diff --git a/Documentation/laptops/lg-laptop.rst b/Documentation/laptops/lg-laptop.rst
> index aa503ee9b3bc..f2c2ffe31101 100644
> --- a/Documentation/laptops/lg-laptop.rst
> +++ b/Documentation/laptops/lg-laptop.rst
> @@ -1,5 +1,7 @@
>  .. SPDX-License-Identifier: GPL-2.0+
>  
> +:orphan:
> +
>  LG Gram laptop extra features
>  =============================
>  
> diff --git a/Documentation/powerpc/isa-versions.rst b/Documentation/powerpc/isa-versions.rst
> index 812e20cc898c..66c24140ebf1 100644
> --- a/Documentation/powerpc/isa-versions.rst
> +++ b/Documentation/powerpc/isa-versions.rst
> @@ -1,3 +1,5 @@
> +:orphan:
> +
>  CPU to ISA Version Mapping
>  ==========================
>  
> diff --git a/Documentation/virtual/kvm/amd-memory-encryption.rst b/Documentation/virtual/kvm/amd-memory-encryption.rst
> index 659bbc093b52..33d697ab8a58 100644
> --- a/Documentation/virtual/kvm/amd-memory-encryption.rst
> +++ b/Documentation/virtual/kvm/amd-memory-encryption.rst
> @@ -1,3 +1,5 @@
> +:orphan:
> +
>  ======================================
>  Secure Encrypted Virtualization (SEV)
>  ======================================
> diff --git a/Documentation/virtual/kvm/vcpu-requests.rst b/Documentation/virtual/kvm/vcpu-requests.rst
> index 5feb3706a7ae..c1807a1b92e6 100644
> --- a/Documentation/virtual/kvm/vcpu-requests.rst
> +++ b/Documentation/virtual/kvm/vcpu-requests.rst
> @@ -1,3 +1,5 @@
> +:orphan:
> +
>  =================
>  KVM VCPU Requests
>  =================
> 


^ permalink raw reply

* [PATCH] Documentation/stackprotector: powerpc supports stack protector
From: Bhupesh Sharma @ 2019-05-30 10:29 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: inux-kernel, arnd, bhsharma, paulus, bhupesh.linux

powerpc architecture (both 64-bit and 32-bit) supports stack protector
mechanism since some time now [see commit 06ec27aea9fc ("powerpc/64:
add stack protector support")].

Update stackprotector arch support documentation to reflect the same.

Signed-off-by: Bhupesh Sharma <bhsharma@redhat.com>
---
 Documentation/features/debug/stackprotector/arch-support.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/features/debug/stackprotector/arch-support.txt b/Documentation/features/debug/stackprotector/arch-support.txt
index 9999ea521f3e..32bbdfc64c32 100644
--- a/Documentation/features/debug/stackprotector/arch-support.txt
+++ b/Documentation/features/debug/stackprotector/arch-support.txt
@@ -22,7 +22,7 @@
     |       nios2: | TODO |
     |    openrisc: | TODO |
     |      parisc: | TODO |
-    |     powerpc: | TODO |
+    |     powerpc: |  ok  |
     |       riscv: | TODO |
     |        s390: | TODO |
     |          sh: |  ok  |
-- 
2.7.4


^ permalink raw reply related

* Re: [RFC] mm: Generalize notify_page_fault()
From: Matthew Wilcox @ 2019-05-30 11:06 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: Mark Rutland, Michal Hocko, linux-ia64, linux-sh, Catalin Marinas,
	Will Deacon, linux-mm, Paul Mackerras, sparclinux, linux-s390,
	Yoshinori Sato, Russell King, Fenghua Yu, Stephen Rothwell,
	Andrey Konovalov, linux-arm-kernel, Tony Luck, Heiko Carstens,
	linux-kernel, Martin Schwidefsky, Andrew Morton, linuxppc-dev,
	David S. Miller
In-Reply-To: <1559195713-6956-1-git-send-email-anshuman.khandual@arm.com>

On Thu, May 30, 2019 at 11:25:13AM +0530, Anshuman Khandual wrote:
> Similar notify_page_fault() definitions are being used by architectures
> duplicating much of the same code. This attempts to unify them into a
> single implementation, generalize it and then move it to a common place.
> kprobes_built_in() can detect CONFIG_KPROBES, hence notify_page_fault()
> must not be wrapped again within CONFIG_KPROBES. Trap number argument can

This is a funny quirk of the English language.  "must not" means "is not
allowed to be", not "does not have to be".

> @@ -141,6 +142,19 @@ static int __init init_zero_pfn(void)
>  core_initcall(init_zero_pfn);
>  
>  
> +int __kprobes notify_page_fault(struct pt_regs *regs, unsigned int trap)
> +{
> +	int ret = 0;
> +
> +	if (kprobes_built_in() && !user_mode(regs)) {
> +		preempt_disable();
> +		if (kprobe_running() && kprobe_fault_handler(regs, trap))
> +			ret = 1;
> +		preempt_enable();
> +	}
> +	return ret;
> +}
> +
>  #if defined(SPLIT_RSS_COUNTING)

Comparing this to the canonical implementation (ie x86), it looks similar.

static nokprobe_inline int kprobes_fault(struct pt_regs *regs)
{
        if (!kprobes_built_in())
                return 0;
        if (user_mode(regs))
                return 0;
        /*
         * To be potentially processing a kprobe fault and to be allowed to call
         * kprobe_running(), we have to be non-preemptible.
         */
        if (preemptible())
                return 0;
        if (!kprobe_running())
                return 0;
        return kprobe_fault_handler(regs, X86_TRAP_PF);
}

The two handle preemption differently.  Why is x86 wrong and this one
correct?

^ permalink raw reply

* Re: [RFC] mm: Generalize notify_page_fault()
From: Anshuman Khandual @ 2019-05-30 12:01 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Mark Rutland, Michal Hocko, linux-ia64, linux-sh, Catalin Marinas,
	Will Deacon, linux-mm, Paul Mackerras, sparclinux, linux-s390,
	Yoshinori Sato, Russell King, Fenghua Yu, Stephen Rothwell,
	Andrey Konovalov, linux-arm-kernel, Tony Luck, Heiko Carstens,
	linux-kernel, Martin Schwidefsky, Andrew Morton, linuxppc-dev,
	David S. Miller
In-Reply-To: <20190530110639.GC23461@bombadil.infradead.org>



On 05/30/2019 04:36 PM, Matthew Wilcox wrote:
> On Thu, May 30, 2019 at 11:25:13AM +0530, Anshuman Khandual wrote:
>> Similar notify_page_fault() definitions are being used by architectures
>> duplicating much of the same code. This attempts to unify them into a
>> single implementation, generalize it and then move it to a common place.
>> kprobes_built_in() can detect CONFIG_KPROBES, hence notify_page_fault()
>> must not be wrapped again within CONFIG_KPROBES. Trap number argument can
> 
> This is a funny quirk of the English language.  "must not" means "is not
> allowed to be", not "does not have to be".

You are right. Noted for future. Thanks !

> 
>> @@ -141,6 +142,19 @@ static int __init init_zero_pfn(void)
>>  core_initcall(init_zero_pfn);
>>  
>>  
>> +int __kprobes notify_page_fault(struct pt_regs *regs, unsigned int trap)
>> +{
>> +	int ret = 0;
>> +
>> +	if (kprobes_built_in() && !user_mode(regs)) {
>> +		preempt_disable();
>> +		if (kprobe_running() && kprobe_fault_handler(regs, trap))
>> +			ret = 1;
>> +		preempt_enable();
>> +	}
>> +	return ret;
>> +}
>> +
>>  #if defined(SPLIT_RSS_COUNTING)
> 
> Comparing this to the canonical implementation (ie x86), it looks similar.
> 
> static nokprobe_inline int kprobes_fault(struct pt_regs *regs)
> {
>         if (!kprobes_built_in())
>                 return 0;
>         if (user_mode(regs))
>                 return 0;
>         /*
>          * To be potentially processing a kprobe fault and to be allowed to call
>          * kprobe_running(), we have to be non-preemptible.
>          */
>         if (preemptible())
>                 return 0;
>         if (!kprobe_running())
>                 return 0;
>         return kprobe_fault_handler(regs, X86_TRAP_PF);
> }
> 
> The two handle preemption differently.  Why is x86 wrong and this one
> correct?

Here it expects context to be already non-preemptible where as the proposed
generic function makes it non-preemptible with a preempt_[disable|enable]()
pair for the required code section, irrespective of it's present state. Is
not this better ?

^ permalink raw reply

* Re: [PATCH v8 1/7] iommu: enhance IOMMU default DMA mode build options
From: John Garry @ 2019-05-30 12:20 UTC (permalink / raw)
  To: Zhen Lei, Jean-Philippe Brucker, Robin Murphy, Will Deacon,
	Joerg Roedel, Jonathan Corbet, linux-doc, Sebastian Ott,
	Gerald Schaefer, Martin Schwidefsky, Heiko Carstens,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Tony Luck, Fenghua Yu, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H . Peter Anvin, David Woodhouse, iommu,
	linux-kernel, linux-s390, linuxppc-dev, x86, linux-ia64
  Cc: Linuxarm, Hanjun Guo
In-Reply-To: <20190530034831.4184-2-thunder.leizhen@huawei.com>

On 30/05/2019 04:48, Zhen Lei wrote:
> First, add build option IOMMU_DEFAULT_{LAZY|STRICT}, so that we have the
> opportunity to set {lazy|strict} mode as default at build time. Then put
> the three config options in an choice, make people can only choose one of
> the three at a time.
>

Since this was not picked up, but modulo (somtimes same) comments below:

Reviewed-by: John Garry <john.garry@huawei.com>

> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
> ---
>  drivers/iommu/Kconfig | 42 +++++++++++++++++++++++++++++++++++-------
>  drivers/iommu/iommu.c |  3 ++-
>  2 files changed, 37 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index 83664db5221df02..d6a1a45f80ffbf5 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -75,17 +75,45 @@ config IOMMU_DEBUGFS
>  	  debug/iommu directory, and then populate a subdirectory with
>  	  entries as required.
>
> -config IOMMU_DEFAULT_PASSTHROUGH
> -	bool "IOMMU passthrough by default"
> +choice
> +	prompt "IOMMU default DMA mode"
>  	depends on IOMMU_API
> -        help
> -	  Enable passthrough by default, removing the need to pass in
> -	  iommu.passthrough=on or iommu=pt through command line. If this
> -	  is enabled, you can still disable with iommu.passthrough=off
> -	  or iommu=nopt depending on the architecture.
> +	default IOMMU_DEFAULT_STRICT
> +	help
> +	  This option allows IOMMU DMA mode to be chose at build time, to

As before:
/s/chose/chosen/, /s/allows IOMMU/allows an IOMMU/

> +	  override the default DMA mode of each ARCHs, removing the need to

Again, as before:
ARCHs should be singular

> +	  pass in kernel parameters through command line. You can still use
> +	  ARCHs specific boot options to override this option again.
> +
> +config IOMMU_DEFAULT_PASSTHROUGH
> +	bool "passthrough"
> +	help
> +	  In this mode, the DMA access through IOMMU without any addresses
> +	  translation. That means, the wrong or illegal DMA access can not
> +	  be caught, no error information will be reported.
>
>  	  If unsure, say N here.
>
> +config IOMMU_DEFAULT_LAZY
> +	bool "lazy"
> +	help
> +	  Support lazy mode, where for every IOMMU DMA unmap operation, the
> +	  flush operation of IOTLB and the free operation of IOVA are deferred.
> +	  They are only guaranteed to be done before the related IOVA will be
> +	  reused.

why no advisory on how to set if unsure?

> +
> +config IOMMU_DEFAULT_STRICT
> +	bool "strict"
> +	help
> +	  For every IOMMU DMA unmap operation, the flush operation of IOTLB and
> +	  the free operation of IOVA are guaranteed to be done in the unmap
> +	  function.
> +
> +	  This mode is safer than the two above, but it maybe slower in some
> +	  high performace scenarios.

and here?

> +
> +endchoice
> +
>  config OF_IOMMU
>         def_bool y
>         depends on OF && IOMMU_API
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 67ee6623f9b2a4d..56bce221285b15f 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -43,7 +43,8 @@
>  #else
>  static unsigned int iommu_def_domain_type = IOMMU_DOMAIN_DMA;
>  #endif
> -static bool iommu_dma_strict __read_mostly = true;
> +static bool iommu_dma_strict __read_mostly =
> +			IS_ENABLED(CONFIG_IOMMU_DEFAULT_STRICT);
>
>  struct iommu_group {
>  	struct kobject kobj;
>



^ permalink raw reply

* Re: [PATCH] Documentation/stackprotector: powerpc supports stack protector
From: Michael Ellerman @ 2019-05-30 12:55 UTC (permalink / raw)
  To: Bhupesh Sharma, linuxppc-dev
  Cc: arnd, corbet, bhsharma, linux-doc, linux-kernel, paulus,
	bhupesh.linux
In-Reply-To: <1559212177-7072-1-git-send-email-bhsharma@redhat.com>

Bhupesh Sharma <bhsharma@redhat.com> writes:
> powerpc architecture (both 64-bit and 32-bit) supports stack protector
> mechanism since some time now [see commit 06ec27aea9fc ("powerpc/64:
> add stack protector support")].
>
> Update stackprotector arch support documentation to reflect the same.
>
> Signed-off-by: Bhupesh Sharma <bhsharma@redhat.com>
> ---
>  Documentation/features/debug/stackprotector/arch-support.txt | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/Documentation/features/debug/stackprotector/arch-support.txt b/Documentation/features/debug/stackprotector/arch-support.txt
> index 9999ea521f3e..32bbdfc64c32 100644
> --- a/Documentation/features/debug/stackprotector/arch-support.txt
> +++ b/Documentation/features/debug/stackprotector/arch-support.txt
> @@ -22,7 +22,7 @@
>      |       nios2: | TODO |
>      |    openrisc: | TODO |
>      |      parisc: | TODO |
> -    |     powerpc: | TODO |
> +    |     powerpc: |  ok  |
>      |       riscv: | TODO |
>      |        s390: | TODO |
>      |          sh: |  ok  |
> -- 
> 2.7.4

Thanks.

This should probably go via the documentation tree?

Acked-by: Michael Ellerman <mpe@ellerman.id.au>


cheers

^ permalink raw reply

* Re: [PATCH] Documentation/stackprotector: powerpc supports stack protector
From: Bhupesh Sharma @ 2019-05-30 13:07 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Arnd Bergmann, corbet, linux-doc, Linux Kernel Mailing List,
	Paul Mackerras, Bhupesh SHARMA, linuxppc-dev
In-Reply-To: <87v9xsnlu9.fsf@concordia.ellerman.id.au>

On Thu, May 30, 2019 at 6:25 PM Michael Ellerman <mpe@ellerman.id.au> wrote:
>
> Bhupesh Sharma <bhsharma@redhat.com> writes:
> > powerpc architecture (both 64-bit and 32-bit) supports stack protector
> > mechanism since some time now [see commit 06ec27aea9fc ("powerpc/64:
> > add stack protector support")].
> >
> > Update stackprotector arch support documentation to reflect the same.
> >
> > Signed-off-by: Bhupesh Sharma <bhsharma@redhat.com>
> > ---
> >  Documentation/features/debug/stackprotector/arch-support.txt | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/Documentation/features/debug/stackprotector/arch-support.txt b/Documentation/features/debug/stackprotector/arch-support.txt
> > index 9999ea521f3e..32bbdfc64c32 100644
> > --- a/Documentation/features/debug/stackprotector/arch-support.txt
> > +++ b/Documentation/features/debug/stackprotector/arch-support.txt
> > @@ -22,7 +22,7 @@
> >      |       nios2: | TODO |
> >      |    openrisc: | TODO |
> >      |      parisc: | TODO |
> > -    |     powerpc: | TODO |
> > +    |     powerpc: |  ok  |
> >      |       riscv: | TODO |
> >      |        s390: | TODO |
> >      |          sh: |  ok  |
> > --
> > 2.7.4
>
> Thanks.
>
> This should probably go via the documentation tree?
>
> Acked-by: Michael Ellerman <mpe@ellerman.id.au>

Thanks for the review Michael.
I am ok with this going through the documentation tree as well.

Regards,
Bhupesh

^ permalink raw reply

* Re: [RFC] mm: Generalize notify_page_fault()
From: Matthew Wilcox @ 2019-05-30 13:39 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: Mark Rutland, Michal Hocko, linux-ia64, linux-sh, Catalin Marinas,
	Will Deacon, linux-mm, Paul Mackerras, sparclinux, linux-s390,
	Yoshinori Sato, Russell King, Fenghua Yu, Stephen Rothwell,
	Andrey Konovalov, linux-arm-kernel, Tony Luck, Heiko Carstens,
	linux-kernel, Martin Schwidefsky, Andrew Morton, linuxppc-dev,
	David S. Miller
In-Reply-To: <4f9a610d-e856-60f6-4467-09e9c3836771@arm.com>

On Thu, May 30, 2019 at 05:31:15PM +0530, Anshuman Khandual wrote:
> On 05/30/2019 04:36 PM, Matthew Wilcox wrote:
> > The two handle preemption differently.  Why is x86 wrong and this one
> > correct?
> 
> Here it expects context to be already non-preemptible where as the proposed
> generic function makes it non-preemptible with a preempt_[disable|enable]()
> pair for the required code section, irrespective of it's present state. Is
> not this better ?

git log -p arch/x86/mm/fault.c

search for 'kprobes'.

tell me what you think.

^ permalink raw reply

* Re: [PATCH] crypto: vmx - convert to SPDX license identifiers
From: Herbert Xu @ 2019-05-30 13:40 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Nayna Jain, Paulo Flabiano Smorigo, linux-crypto,
	Breno Leitão, linuxppc-dev, Daniel Axtens
In-Reply-To: <20190520164232.159053-1-ebiggers@kernel.org>

On Mon, May 20, 2019 at 09:42:32AM -0700, Eric Biggers wrote:
> From: Eric Biggers <ebiggers@google.com>
> 
> Remove the boilerplate license text and replace it with the equivalent
> SPDX license identifier.
> 
> Signed-off-by: Eric Biggers <ebiggers@google.com>
> ---
>  drivers/crypto/vmx/aes.c     | 14 +-------------
>  drivers/crypto/vmx/aes_cbc.c | 14 +-------------
>  drivers/crypto/vmx/aes_ctr.c | 14 +-------------
>  drivers/crypto/vmx/aes_xts.c | 14 +-------------
>  drivers/crypto/vmx/vmx.c     | 14 +-------------
>  5 files changed, 5 insertions(+), 65 deletions(-)

Patch applied.  Thanks.
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH] crypto: vmx - convert to skcipher API
From: Herbert Xu @ 2019-05-30 13:40 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Nayna Jain, Paulo Flabiano Smorigo, linux-crypto,
	Breno Leitão, linuxppc-dev, Daniel Axtens
In-Reply-To: <20190520164448.160430-1-ebiggers@kernel.org>

On Mon, May 20, 2019 at 09:44:48AM -0700, Eric Biggers wrote:
> From: Eric Biggers <ebiggers@google.com>
> 
> Convert the VMX implementations of AES-CBC, AES-CTR, and AES-XTS from
> the deprecated "blkcipher" API to the "skcipher" API.
> 
> As part of this, I moved the skcipher_request for the fallback algorithm
> off the stack and into the request context of the parent algorithm.
> 
> I tested this in a PowerPC VM with CONFIG_CRYPTO_MANAGER_EXTRA_TESTS=y.
> 
> Signed-off-by: Eric Biggers <ebiggers@google.com>
> ---
>  drivers/crypto/vmx/aes_cbc.c   | 183 ++++++++++++---------------------
>  drivers/crypto/vmx/aes_ctr.c   | 165 +++++++++++++----------------
>  drivers/crypto/vmx/aes_xts.c   | 175 ++++++++++++++-----------------
>  drivers/crypto/vmx/aesp8-ppc.h |   2 -
>  drivers/crypto/vmx/vmx.c       |  72 +++++++------
>  5 files changed, 252 insertions(+), 345 deletions(-)

Patch applied.  Thanks.
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox