LinuxPPC-Dev Archive on lore.kernel.org

LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed

* Re: [BISECTED REGRESSION] b43legacy broken on G4 PowerBook
From: Larry Finger @ 2019-06-12  1:52 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Christoph Hellwig
  Cc: linuxppc-dev, Christian Zigotzky, linux-wireless, linux-kernel,
	Aaro Koskinen
In-Reply-To: <0b257651bb7ac4a6f0a8dce5470120b7701720b9.camel@kernel.crashing.org>

On 6/11/19 5:46 PM, Benjamin Herrenschmidt wrote:
> On Tue, 2019-06-11 at 17:20 -0500, Larry Finger wrote:
>> b43-pci-bridge 0001:11:00.0: dma_direct_supported: failed (mask =
>> 0x3fffffff,
>> min_mask = 0x5ffff000/0x5ffff000, dma bits = 0x1f
> 
> Ugh ? A mask with holes in it ? That's very wrong... That min_mask is
> bogus.

I agree, but that is not likely serious as most systems will have enough memory 
that the max_pfn term will be much larger than the initial min_mask, and 
min_mask will be unchanged by the min function. In addition, min_mask is not 
used beyond this routine, and then only to decide if direct dma is supported. 
The following patch generates masks with no holes, but I cannot see that it is 
needed.

diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 2c2772e9702a..e3edd4f29e80 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -384,7 +384,8 @@ int dma_direct_supported(struct device *dev, u64 mask)
         else
                 min_mask = DMA_BIT_MASK(32);

-       min_mask = min_t(u64, min_mask, (max_pfn - 1) << PAGE_SHIFT);
+       min_mask = min_t(u64, min_mask, ((max_pfn - 1) << PAGE_SHIFT) |
+                                        DMA_BIT_MASK(PAGE_SHIFT));

         /*
          * This check needs to be against the actual bit mask value, so


Larry

^ permalink raw reply related

* Re: [BISECTED REGRESSION] b43legacy broken on G4 PowerBook
From: Larry Finger @ 2019-06-12  1:57 UTC (permalink / raw)
  To: Aaro Koskinen
  Cc: linux-wireless, linux-kernel, Christian Zigotzky, linuxppc-dev,
	Christoph Hellwig
In-Reply-To: <20190611224623.GC26504@darkstar.musicnaut.iki.fi>

On 6/11/19 5:46 PM, Aaro Koskinen wrote:
> Hi,
> 
> On Tue, Jun 11, 2019 at 05:20:12PM -0500, Larry Finger wrote:
>> It is obvious that the case of a mask smaller than min_mask should be
>> handled by the IOMMU. In my system, CONFIG_IOMMU_SUPPORT is selected. All
>> other CONFIG variables containing IOMMU are not selected. When
>> dma_direct_supported() fails, should the system not try for an IOMMU
>> solution? Is the driver asking for the wrong type of memory? It is doing a
>> dma_and_set_mask_coherent() call.
> 
> I don't think we have IOMMU on G4. On G5 it should work (I remember fixing
> b43 issue on G5, see 4c374af5fdee, unfortunately all my G5 Macs with b43
> are dead and waiting for re-capping).

You are right. My configuration has CONFIG_IOMMU_SUPPORT=y, but there is no 
mention of an IOMMU in the log.

Larry


^ permalink raw reply

* Re: [BISECTED REGRESSION] b43legacy broken on G4 PowerBook
From: Benjamin Herrenschmidt @ 2019-06-12  3:32 UTC (permalink / raw)
  To: Larry Finger, Christoph Hellwig
  Cc: linuxppc-dev, Christian Zigotzky, linux-wireless, linux-kernel,
	Aaro Koskinen
In-Reply-To: <7dcf54a9-a7aa-3a4c-8e2d-556be633d6e0@lwfinger.net>

On Tue, 2019-06-11 at 20:52 -0500, Larry Finger wrote:
> On 6/11/19 5:46 PM, Benjamin Herrenschmidt wrote:
> > On Tue, 2019-06-11 at 17:20 -0500, Larry Finger wrote:
> > > b43-pci-bridge 0001:11:00.0: dma_direct_supported: failed (mask =
> > > 0x3fffffff,
> > > min_mask = 0x5ffff000/0x5ffff000, dma bits = 0x1f
> > 
> > Ugh ? A mask with holes in it ? That's very wrong... That min_mask is
> > bogus.
> 
> I agree, but that is not likely serious as most systems will have enough memory 
> that the max_pfn term will be much larger than the initial min_mask, and 
> min_mask will be unchanged by the min function. 

Well no... it's too much memory that is the problem. If min_mask is
bogus though it will cause problem later too, so one should look into
it.

> In addition, min_mask is not 
> used beyond this routine, and then only to decide if direct dma is supported. 
> The following patch generates masks with no holes, but I cannot see that it is 
> needed.

The right fix is to round up max_pfn to a power of 2, something like

min_mask = min_t(u64, min_mask, (roundup_pow_of_two(max_pfn - 1)) <<
		PAGE_SHIFT) 

> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> index 2c2772e9702a..e3edd4f29e80 100644
> --- a/kernel/dma/direct.c
> +++ b/kernel/dma/direct.c
> @@ -384,7 +384,8 @@ int dma_direct_supported(struct device *dev, u64 mask)
>          else
>                  min_mask = DMA_BIT_MASK(32);
> 
> -       min_mask = min_t(u64, min_mask, (max_pfn - 1) << PAGE_SHIFT);
> +       min_mask = min_t(u64, min_mask, ((max_pfn - 1) << PAGE_SHIFT) |
> +                                        DMA_BIT_MASK(PAGE_SHIFT));
> 
>          /*
>           * This check needs to be against the actual bit mask value, so
> 
> 
> Larry


^ permalink raw reply

* [PATCH 1/3] powerpc/cacheinfo: add cacheinfo_teardown, cacheinfo_rebuild
From: Nathan Lynch @ 2019-06-12  4:45 UTC (permalink / raw)
  To: linuxppc-dev
In-Reply-To: <20190612044506.16392-1-nathanl@linux.ibm.com>

Allow external callers to force the cacheinfo code to release all its
references to cache nodes, e.g. before processing device tree updates
post-migration, and to rebuild the hierarchy afterward.

CPU online/offline must be blocked by callers; enforce this.

Fixes: 410bccf97881 ("powerpc/pseries: Partition migration in the kernel")
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
---
 arch/powerpc/kernel/cacheinfo.c | 21 +++++++++++++++++++++
 arch/powerpc/kernel/cacheinfo.h |  4 ++++
 2 files changed, 25 insertions(+)

diff --git a/arch/powerpc/kernel/cacheinfo.c b/arch/powerpc/kernel/cacheinfo.c
index 862e2890bd3d..42c559efe060 100644
--- a/arch/powerpc/kernel/cacheinfo.c
+++ b/arch/powerpc/kernel/cacheinfo.c
@@ -896,4 +896,25 @@ void cacheinfo_cpu_offline(unsigned int cpu_id)
 	if (cache)
 		cache_cpu_clear(cache, cpu_id);
 }
+
+void cacheinfo_teardown(void)
+{
+	unsigned int cpu;
+
+	lockdep_assert_cpus_held();
+
+	for_each_online_cpu(cpu)
+		cacheinfo_cpu_offline(cpu);
+}
+
+void cacheinfo_rebuild(void)
+{
+	unsigned int cpu;
+
+	lockdep_assert_cpus_held();
+
+	for_each_online_cpu(cpu)
+		cacheinfo_cpu_online(cpu);
+}
+
 #endif /* (CONFIG_PPC_PSERIES && CONFIG_SUSPEND) || CONFIG_HOTPLUG_CPU */
diff --git a/arch/powerpc/kernel/cacheinfo.h b/arch/powerpc/kernel/cacheinfo.h
index 955f5e999f1b..52bd3fc6642d 100644
--- a/arch/powerpc/kernel/cacheinfo.h
+++ b/arch/powerpc/kernel/cacheinfo.h
@@ -6,4 +6,8 @@
 extern void cacheinfo_cpu_online(unsigned int cpu_id);
 extern void cacheinfo_cpu_offline(unsigned int cpu_id);
 
+/* Allow migration/suspend to tear down and rebuild the hierarchy. */
+extern void cacheinfo_teardown(void);
+extern void cacheinfo_rebuild(void);
+
 #endif /* _PPC_CACHEINFO_H */
-- 
2.20.1


^ permalink raw reply related

* [PATCH 3/3] powerpc/pseries/mobility: rebuild cacheinfo hierarchy post-migration
From: Nathan Lynch @ 2019-06-12  4:45 UTC (permalink / raw)
  To: linuxppc-dev
In-Reply-To: <20190612044506.16392-1-nathanl@linux.ibm.com>

It's common for the platform to replace the cache device nodes after a
migration. Since the cacheinfo code is never informed about this, it
never drops its references to the source system's cache nodes, causing
it to wind up in an inconsistent state resulting in warnings and oopses
as soon as CPU online/offline occurs after the migration, e.g.

cache for /cpus/l3-cache@3113(Unified) refers to cache for /cpus/l2-cache@200d(Unified)
WARNING: CPU: 15 PID: 86 at arch/powerpc/kernel/cacheinfo.c:176 release_cache+0x1bc/0x1d0
[...]
NIP [c00000000002d9bc] release_cache+0x1bc/0x1d0
LR [c00000000002d9b8] release_cache+0x1b8/0x1d0
Call Trace:
[c0000001fc99fa70] [c00000000002d9b8] release_cache+0x1b8/0x1d0 (unreliable)
[c0000001fc99fb10] [c00000000002ebf4] cacheinfo_cpu_offline+0x1c4/0x2c0
[c0000001fc99fbe0] [c00000000002ae58] unregister_cpu_online+0x1b8/0x260
[c0000001fc99fc40] [c000000000165a64] cpuhp_invoke_callback+0x114/0xf40
[c0000001fc99fcd0] [c000000000167450] cpuhp_thread_fun+0x270/0x310
[c0000001fc99fd40] [c0000000001a8bb8] smpboot_thread_fn+0x2c8/0x390
[c0000001fc99fdb0] [c0000000001a1cd8] kthread+0x1b8/0x1c0
[c0000001fc99fe20] [c00000000000c2d4] ret_from_kernel_thread+0x5c/0x68

Using device tree notifiers won't work since we want to rebuild the
hierarchy only after all the removals and additions have occurred and
the device tree is in a consistent state. Call cacheinfo_teardown()
before processing device tree updates, and rebuild the hierarchy
afterward.

Fixes: 410bccf97881 ("powerpc/pseries: Partition migration in the kernel")
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
---
 arch/powerpc/platforms/pseries/mobility.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
index edc1ec408589..b8c8096907d4 100644
--- a/arch/powerpc/platforms/pseries/mobility.c
+++ b/arch/powerpc/platforms/pseries/mobility.c
@@ -23,6 +23,7 @@
 #include <asm/machdep.h>
 #include <asm/rtas.h>
 #include "pseries.h"
+#include "../../kernel/cacheinfo.h"
 
 static struct kobject *mobility_kobj;
 
@@ -345,11 +346,20 @@ void post_mobility_fixup(void)
 	 */
 	cpus_read_lock();
 
+	/*
+	 * It's common for the destination firmware to replace cache
+	 * nodes.  Release all of the cacheinfo hierarchy's references
+	 * before updating the device tree.
+	 */
+	cacheinfo_teardown();
+
 	rc = pseries_devicetree_update(MIGRATION_SCOPE);
 	if (rc)
 		printk(KERN_ERR "Post-mobility device tree update "
 			"failed: %d\n", rc);
 
+	cacheinfo_rebuild();
+
 	cpus_read_unlock();
 
 	/* Possibly switch to a new RFI flush type */
-- 
2.20.1


^ permalink raw reply related

* [PATCH 2/3] powerpc/pseries/mobility: prevent cpu hotplug during DT update
From: Nathan Lynch @ 2019-06-12  4:45 UTC (permalink / raw)
  To: linuxppc-dev
In-Reply-To: <20190612044506.16392-1-nathanl@linux.ibm.com>

CPU online/offline code paths are sensitive to parts of the device
tree (various cpu node properties, cache nodes) that can be changed as
a result of a migration.

Prevent CPU hotplug while the device tree potentially is inconsistent.

Fixes: 410bccf97881 ("powerpc/pseries: Partition migration in the kernel")
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
---
 arch/powerpc/platforms/pseries/mobility.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
index 88925f8ca8a0..edc1ec408589 100644
--- a/arch/powerpc/platforms/pseries/mobility.c
+++ b/arch/powerpc/platforms/pseries/mobility.c
@@ -9,6 +9,7 @@
  * 2 as published by the Free Software Foundation.
  */
 
+#include <linux/cpu.h>
 #include <linux/kernel.h>
 #include <linux/kobject.h>
 #include <linux/smp.h>
@@ -338,11 +339,19 @@ void post_mobility_fixup(void)
 	if (rc)
 		printk(KERN_ERR "Post-mobility activate-fw failed: %d\n", rc);
 
+	/*
+	 * We don't want CPUs to go online/offline while the device
+	 * tree is being updated.
+	 */
+	cpus_read_lock();
+
 	rc = pseries_devicetree_update(MIGRATION_SCOPE);
 	if (rc)
 		printk(KERN_ERR "Post-mobility device tree update "
 			"failed: %d\n", rc);
 
+	cpus_read_unlock();
+
 	/* Possibly switch to a new RFI flush type */
 	pseries_setup_rfi_flush();
 
-- 
2.20.1


^ permalink raw reply related

* [PATCH 0/3] live partition migration vs cacheinfo
From: Nathan Lynch @ 2019-06-12  4:45 UTC (permalink / raw)
  To: linuxppc-dev

Partition migration often results in the platform telling the OS to
replace all the cache nodes in the device tree. The cacheinfo code has
no knowledge of this, and continues to maintain references to the
deleted/detached nodes, causing subsequent CPU online/offline
operations to get warnings and oopses. This series addresses this
longstanding issue by providing an interface to the cacheinfo layer
that the migration code uses to rebuild the cacheinfo data structures
at a safe time after migration, with appropriate serialization vs CPU
hotplug.

Nathan Lynch (3):
  powerpc/cacheinfo: add cacheinfo_teardown, cacheinfo_rebuild
  powerpc/pseries/mobility: prevent cpu hotplug during DT update
  powerpc/pseries/mobility: rebuild cacheinfo hierarchy post-migration

 arch/powerpc/kernel/cacheinfo.c           | 21 +++++++++++++++++++++
 arch/powerpc/kernel/cacheinfo.h           |  4 ++++
 arch/powerpc/platforms/pseries/mobility.c | 19 +++++++++++++++++++
 3 files changed, 44 insertions(+)

-- 
2.20.1

^ permalink raw reply

* Re: [PATCH v3] powerpc: fix kexec failure on book3s/32
From: Michael Ellerman @ 2019-06-12  4:59 UTC (permalink / raw)
  To: Christophe Leroy, Benjamin Herrenschmidt, Paul Mackerras,
	Aaro Koskinen
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <56efc3b317622d5f607d1f7a35894b194c385492.1559549824.git.christophe.leroy@c-s.fr>

On Mon, 2019-06-03 at 08:20:28 UTC, Christophe Leroy wrote:
> In the old days, _PAGE_EXEC didn't exist on 6xx aka book3s/32.
> Therefore, allthough __mapin_ram_chunk() was already mapping kernel
> text with PAGE_KERNEL_TEXT and the rest with PAGE_KERNEL, the entire
> memory was executable. Part of the memory (first 512kbytes) was
> mapped with BATs instead of page table, but it was also entirely
> mapped as executable.
> 
> In commit 385e89d5b20f ("powerpc/mm: add exec protection on
> powerpc 603"), we started adding exec protection to some 6xx, namely
> the 603, for pages mapped via pagetables.
> 
> Then, in commit 63b2bc619565 ("powerpc/mm/32s: Use BATs for
> STRICT_KERNEL_RWX"), the exec protection was extended to BAT mapped
> memory, so that really only the kernel text could be executed.
> 
> The problem here is that kexec is based on copying some code into
> upper part of memory then executing it from there in order to install
> a fresh new kernel at its definitive location.
> 
> However, the code is position independant and first part of it is
> just there to deactivate the MMU and jump to the second part. So it
> is possible to run this first part inplace instead of running the
> copy. Once the MMU is off, there is no protection anymore and the
> second part of the code will just run as before.
> 
> Reported-by: Aaro Koskinen <aaro.koskinen@iki.fi>
> Fixes: 63b2bc619565 ("powerpc/mm/32s: Use BATs for STRICT_KERNEL_RWX")
> Cc: stable@vger.kernel.org
> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
> Tested-by: Aaro Koskinen <aaro.koskinen@iki.fi>

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/6c284228eb356a1ec62a704b4d232971

cheers

^ permalink raw reply

* Re: [PATCH] powerpc/32s: fix booting with CONFIG_PPC_EARLY_DEBUG_BOOTX
From: Michael Ellerman @ 2019-06-12  4:59 UTC (permalink / raw)
  To: Christophe Leroy, Benjamin Herrenschmidt, Paul Mackerras,
	Mathieu Malaterre
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <90d30adb0943a11ab127808c03229ba657478df4.1559566521.git.christophe.leroy@c-s.fr>

On Mon, 2019-06-03 at 13:00:51 UTC, Christophe Leroy wrote:
> When booting through OF, setup_disp_bat() does nothing because
> disp_BAT are not set. By change, it used to work because BOOTX
> buffer is mapped 1:1 at address 0x81000000 by the bootloader, and
> btext_setup_display() sets virt addr same as phys addr.
> 
> But since commit 215b823707ce ("powerpc/32s: set up an early static
> hash table for KASAN."), a temporary page table overrides the
> bootloader mapping.
> 
> This 0x81000000 is also problematic with the newly implemented
> Kernel Userspace Access Protection (KUAP) because it is within user
> address space.
> 
> This patch fixes those issues by properly setting disp_BAT through
> a call to btext_prepare_BAT(), allowing setup_disp_bat() to
> properly setup BAT3 for early bootx screen buffer access.
> 
> Reported-by: Mathieu Malaterre <malat@debian.org>
> Fixes: 215b823707ce ("powerpc/32s: set up an early static hash table for KASAN.")
> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
> Tested-by: Mathieu Malaterre <malat@debian.org>

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/c21f5a9ed85ca3e914ca11f421677ae9

cheers

^ permalink raw reply

* Re: [PATCH 1/2] powerpc/64s: Fix THP PMD collapse serialisation
From: Michael Ellerman @ 2019-06-12  4:59 UTC (permalink / raw)
  To: Nicholas Piggin, linuxppc-dev; +Cc: Aneesh Kumar K . V, Nicholas Piggin
In-Reply-To: <20190607035636.5446-1-npiggin@gmail.com>

On Fri, 2019-06-07 at 03:56:35 UTC, Nicholas Piggin wrote:
> Commit 1b2443a547f9 ("powerpc/book3s64: Avoid multiple endian conversion
> in pte helpers") changed the actual bitwise tests in pte_access_permitted
> by using pte_write() and pte_present() helpers rather than raw bitwise
> testing _PAGE_WRITE and _PAGE_PRESENT bits.
> 
> The pte_present change now returns true for ptes which are !_PAGE_PRESENT
> and _PAGE_INVALID, which is the combination used by pmdp_invalidate to
> synchronize access from lock-free lookups. pte_access_permitted is used by
> pmd_access_permitted, so allowing GUP lock free access to proceed with
> such PTEs breaks this synchronisation.
> 
> This bug has been observed on HPT host, with random crashes and corruption
> in guests, usually together with bad PMD messages in the host.
> 
> Fix this by adding an explicit check in pmd_access_permitted, and
> documenting the condition explicitly.
> 
> The pte_write() change should be okay, and would prevent GUP from falling
> back to the slow path when encountering savedwrite ptes, which matches
> what x86 (that does not implement savedwrite) does.
> 
> Fixes: 1b2443a547f9 ("powerpc/book3s64: Avoid multiple endian conversion in pte helpers")
> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> Cc: Christophe Leroy <christophe.leroy@c-s.fr>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/33258a1db165cf43a9e6382587ad06e9

cheers

^ permalink raw reply

* Re: [PATCH 2/2] powerpc/64s: __find_linux_pte synchronization vs pmdp_invalidate
From: Michael Ellerman @ 2019-06-12  4:59 UTC (permalink / raw)
  To: Nicholas Piggin, linuxppc-dev; +Cc: Aneesh Kumar K . V, Nicholas Piggin
In-Reply-To: <20190607035636.5446-2-npiggin@gmail.com>

On Fri, 2019-06-07 at 03:56:36 UTC, Nicholas Piggin wrote:
> The change to pmdp_invalidate to mark the pmd with _PAGE_INVALID broke
> the synchronisation against lock free lookups, __find_linux_pte's
> pmd_none check no longer returns true for such cases.
> 
> Fix this by adding a check for this condition as well.
> 
> Fixes: da7ad366b497 ("powerpc/mm/book3s: Update pmd_present to look at _PAGE_PRESENT bit")
> Cc: Christophe Leroy <christophe.leroy@c-s.fr>
> Suggested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/a00196a272161338d4b1d66ec69e3d57

cheers

^ permalink raw reply

* Re: [PATCH kernel v3 0/3] powerpc/ioda2: Yet another attempt to allow DMA masks between 32 and 59
From: Shawn Anastasio @ 2019-06-12  5:05 UTC (permalink / raw)
  To: Alexey Kardashevskiy, linuxppc-dev
  Cc: Sam Bobroff, Alistair Popple, Oliver O'Halloran, David Gibson
In-Reply-To: <deb34b5f-9472-2156-e58d-8dbcb0a38979@anastas.io>

On 6/5/19 11:11 PM, Shawn Anastasio wrote:
> On 5/30/19 2:03 AM, Alexey Kardashevskiy wrote:
>> This is an attempt to allow DMA masks between 32..59 which are not large
>> enough to use either a PHB3 bypass mode or a sketchy bypass. Depending
>> on the max order, up to 40 is usually available.
>>
>>
>> This is based on v5.2-rc2.
>>
>> Please comment. Thanks.
> 
> I have tested this patch set with an AMD GPU that's limited to <64bit
> DMA (I believe it's 40 or 42 bit). It successfully allows the card to
> operate without falling back to 32-bit DMA mode as it does without
> the patches.
> 
> Relevant kernel log message:
> ```
> [    0.311211] pci 0033:01     : [PE# 00] Enabling 64-bit DMA bypass
> ```
> 
> Tested-by: Shawn Anastasio <shawn@anastas.io>

After a few days of further testing, I've started to run into stability
issues with the patch applied and used with an AMD GPU. Specifically,
the system sometimes spontaneously crashes. Not just EEH errors either,
the whole system shuts down in what looks like a checkstop.

Perhaps some subtle corruption is occurring?

^ permalink raw reply

* Re: [PATCH v8 2/7] x86/dma: use IS_ENABLED() to simplify the code
From: Borislav Petkov @ 2019-06-12  5:16 UTC (permalink / raw)
  To: Zhen Lei
  Cc: linux-ia64, Sebastian Ott, linux-doc, Hanjun Guo, Heiko Carstens,
	Paul Mackerras, H . Peter Anvin, linux-s390, Jonathan Corbet,
	Jean-Philippe Brucker, Joerg Roedel, x86, Ingo Molnar, Fenghua Yu,
	Will Deacon, John Garry, linuxppc-dev, Thomas Gleixner,
	Gerald Schaefer, Tony Luck, David Woodhouse, linux-kernel, iommu,
	Martin Schwidefsky, Robin Murphy
In-Reply-To: <20190530034831.4184-3-thunder.leizhen@huawei.com>

On Thu, May 30, 2019 at 11:48:26AM +0800, Zhen Lei wrote:
> This patch removes the ifdefs around CONFIG_IOMMU_DEFAULT_PASSTHROUGH to
> improve readablity.

Avoid having "This patch" or "This commit" in the commit message. It is
tautologically useless.

Also, do

$ git grep 'This patch' Documentation/process

for more details.

> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
> ---
>  arch/x86/kernel/pci-dma.c | 7 ++-----
>  1 file changed, 2 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
> index dcd272dbd0a9330..9f2b19c35a060df 100644
> --- a/arch/x86/kernel/pci-dma.c
> +++ b/arch/x86/kernel/pci-dma.c
> @@ -43,11 +43,8 @@
>   * It is also possible to disable by default in kernel config, and enable with
>   * iommu=nopt at boot time.
>   */
> -#ifdef CONFIG_IOMMU_DEFAULT_PASSTHROUGH
> -int iommu_pass_through __read_mostly = 1;
> -#else
> -int iommu_pass_through __read_mostly;
> -#endif
> +int iommu_pass_through __read_mostly =
> +			IS_ENABLED(CONFIG_IOMMU_DEFAULT_PASSTHROUGH);

Let that line stick out.

Thx.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply

* Re: [PATCH v2 8/8] habanalabs: enable 64-bit DMA mask in POWER9
From: Oliver O'Halloran @ 2019-06-12  5:45 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Oded Gabbay, Russell Currey, Greg KH,
	Linux-Kernel@Vger. Kernel. Org, Christoph Hellwig, linuxppc-dev
In-Reply-To: <ca81ca5d56a3a12db5a92f5cf9745763a86572e8.camel@kernel.crashing.org>

On Wed, Jun 12, 2019 at 8:54 AM Benjamin Herrenschmidt
<benh@kernel.crashing.org> wrote:
>
> On Tue, 2019-06-11 at 20:22 +0300, Oded Gabbay wrote:
> >
> > > So, to summarize:
> > > If I call pci_set_dma_mask with 48, then it fails on POWER9. However,
> > > in runtime, I don't know if its POWER9 or not, so upon failure I will
> > > call it again with 32, which makes our device pretty much unusable.
> > > If I call pci_set_dma_mask with 64, and do the dedicated configuration
> > > in Goya's PCIe controller, then it won't work on x86-64, because bit
> > > 59 will be set and the host won't like it (I checked it). In addition,
> > > I might get addresses above 50 bits, which my device can't generate.
> > >
> > > I hope this makes things more clear. Now, please explain to me how I
> > > can call pci_set_dma_mask without any regard to whether I run on
> > > x86-64 or POWER9, considering what I wrote above ?
> > >
> > > Thanks,
> > > Oded
> >
> > Adding ppc mailing list.
>
> You can't. Your device is broken. Devices that don't support DMAing to
> the full 64-bit deserve to be added to the trash pile.
>
> As a result, getting it to work will require hacks. Some GPUs have
> similar issues and require similar hacks, it's unfortunate.
>
> Added a couple of guys on CC who might be able to help get those hacks
> right.

> It's still very fishy .. the idea is to detect the case where setting a
> 64-bit mask will give your system memory mapped at a fixed high address
> (1 << 59 in our case) and program that in your chip in the "Fixed high
> bits" register that you seem to have (also make sure it doesn't affect
> MSIs or it will break them).

Judging from the patch (https://lkml.org/lkml/2019/6/11/59) this is
what they're doing.

Also, are you sure about the MSI thing? The IODA3 spec says the only
important bits for a 64bit MSI are bits 61:60 (to hit the window) and
the lower bits that determine what IVE to use. Everything in between
is ignored so ORing in bit 59 shouldn't break anything.

> This will only work as long as all of the system memory can be
> addressed at an offset from that fixed address that itself fits your
> device addressing capabilities (50 bits in this case). It may or may
> not be the case but there's no way to check since the DMA mask logic
> won't really apply.
>
> You might want to consider fixing your HW in the next iteration... This
> is going to bite you when x86 increases the max physical memory for
> example, or on other architectures.

Yes, do this. The easiest way to avoid this sort of wierd hack is to
just design the PCIe interface to the spec in the first place.

^ permalink raw reply

* [PATCH] crypto: talitos - fix max key size for sha384 and sha512
From: Christophe Leroy @ 2019-06-12  5:49 UTC (permalink / raw)
  To: Herbert Xu, David S. Miller, horia.geanta
  Cc: linuxppc-dev, linux-crypto, linux-kernel

Below commit came with a typo in the CONFIG_ symbol, leading
to a permanently reduced max key size regarless of the driver
capabilities.

Reported-by: Horia Geantă <horia.geanta@nxp.com>
Fixes: b8fbdc2bc4e7 ("crypto: talitos - reduce max key size for SEC1")
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
 drivers/crypto/talitos.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/crypto/talitos.c b/drivers/crypto/talitos.c
index 03b7a5d28fb0..b4c8a013f302 100644
--- a/drivers/crypto/talitos.c
+++ b/drivers/crypto/talitos.c
@@ -832,7 +832,7 @@ static void talitos_unregister_rng(struct device *dev)
  * HMAC_SNOOP_NO_AFEA (HSNA) instead of type IPSEC_ESP
  */
 #define TALITOS_CRA_PRIORITY_AEAD_HSNA	(TALITOS_CRA_PRIORITY - 1)
-#ifdef CONFIG_CRYPTO_DEV_TALITOS_SEC2
+#ifdef CONFIG_CRYPTO_DEV_TALITOS2
 #define TALITOS_MAX_KEY_SIZE		(AES_MAX_KEY_SIZE + SHA512_BLOCK_SIZE)
 #else
 #define TALITOS_MAX_KEY_SIZE		(AES_MAX_KEY_SIZE + SHA256_BLOCK_SIZE)
-- 
2.13.3


^ permalink raw reply related

* Re: [PATCH v2 0/4] Additional fixes on Talitos driver
From: Christophe Leroy @ 2019-06-12  5:52 UTC (permalink / raw)
  To: Horia Geanta, Herbert Xu, David S. Miller
  Cc: linuxppc-dev@lists.ozlabs.org, linux-crypto@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <VI1PR0402MB3485AD965F36709F27EFB72698ED0@VI1PR0402MB3485.eurprd04.prod.outlook.com>



Le 11/06/2019 à 18:30, Horia Geanta a écrit :
> On 6/11/2019 6:40 PM, Christophe Leroy wrote:
>>
>>
>> Le 11/06/2019 à 17:37, Horia Geanta a écrit :
>>> On 6/11/2019 5:39 PM, Christophe Leroy wrote:
>>>> This series is the last set of fixes for the Talitos driver.
>>>>
>>>> We now get a fully clean boot on both SEC1 (SEC1.2 on mpc885) and
>>>> SEC2 (SEC2.2 on mpc8321E) with CONFIG_CRYPTO_MANAGER_EXTRA_TESTS:
>>>>
>>> I am getting below failures on a sec 3.3.2 (p1020rdb) for hmac(sha384) and
>>> hmac(sha512):
>>
>> Is that new with this series or did you already have it before ?
>>
> Looks like this happens with or without this series.

Found the issue, that's in 
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=b8fbdc2bc4e71b62646031d5df5f08aafe15d5ad

CONFIG_CRYPTO_DEV_TALITOS_SEC2 should be CONFIG_CRYPTO_DEV_TALITOS2 instead.

Just sent a patch to fix it.

Thanks
Christophe

> 
> I haven't checked the state of this driver for quite some time.
> Since I've noticed increased activity, I thought it would be worth
> actually testing the changes.
> 
> Are changes in patch 2/4 ("crypto: talitos - fix hash on SEC1.")
> strictly for sec 1.x or they affect all revisions?
> 
>> What do you mean by "fuzz testing" enabled ? Is that
>> CONFIG_CRYPTO_MANAGER_EXTRA_TESTS or something else ?
>>
> Yes, it's this config symbol.
> 
> Horia
> 

^ permalink raw reply

* Re: [PATCH v2] powerpc/perf: Use cpumask_last() to determine the designated cpu for nest/core units.
From: Anju T Sudhakar @ 2019-06-12  5:58 UTC (permalink / raw)
  To: Leonardo Bras, mpe; +Cc: ego, maddy, linuxppc-dev
In-Reply-To: <181424243e879218b732034f6014ac4af5c68285.camel@linux.ibm.com>

Hi Leonardo,

On 6/11/19 12:17 AM, Leonardo Bras wrote:
> On Mon, 2019-06-10 at 12:02 +0530, Anju T Sudhakar wrote:
>> Nest and core imc(In-memory Collection counters) assigns a particular
>> cpu as the designated target for counter data collection.
>> During system boot, the first online cpu in a chip gets assigned as
>> the designated cpu for that chip(for nest-imc) and the first online cpu
>> in a core gets assigned as the designated cpu for that core(for core-imc).
>>
>> If the designated cpu goes offline, the next online cpu from the same
>> chip(for nest-imc)/core(for core-imc) is assigned as the next target,
>> and the event context is migrated to the target cpu.
>> Currently, cpumask_any_but() function is used to find the target cpu.
>> Though this function is expected to return a `random` cpu, this always
>> returns the next online cpu.
>>
>> If all cpus in a chip/core is offlined in a sequential manner, starting
>> from the first cpu, the event migration has to happen for all the cpus
>> which goes offline. Since the migration process involves a grace period,
>> the total time taken to offline all the cpus will be significantly high.
> Seems like a very interesting work.
> Out of curiosity, have you used 'chcpu -d' to create your benchmark?

Here I did not use chcpu to disable the cpu.

I used a script which will offline cpus 88-175 by echoing  `0` to

/sys/devices/system/cpu/cpu*/online.


Regards,

Anju



^ permalink raw reply

* Re: [PATCH kernel v3 0/3] powerpc/ioda2: Yet another attempt to allow DMA masks between 32 and 59
From: Oliver O'Halloran @ 2019-06-12  6:16 UTC (permalink / raw)
  To: Shawn Anastasio
  Cc: Alexey Kardashevskiy, Alistair Popple, David Gibson, linuxppc-dev,
	Sam Bobroff
In-Reply-To: <1e3de274-aec4-6e69-5e37-be15ea888deb@anastas.io>

On Wed, Jun 12, 2019 at 3:06 PM Shawn Anastasio <shawn@anastas.io> wrote:
>
> On 6/5/19 11:11 PM, Shawn Anastasio wrote:
> > On 5/30/19 2:03 AM, Alexey Kardashevskiy wrote:
> >> This is an attempt to allow DMA masks between 32..59 which are not large
> >> enough to use either a PHB3 bypass mode or a sketchy bypass. Depending
> >> on the max order, up to 40 is usually available.
> >>
> >>
> >> This is based on v5.2-rc2.
> >>
> >> Please comment. Thanks.
> >
> > I have tested this patch set with an AMD GPU that's limited to <64bit
> > DMA (I believe it's 40 or 42 bit). It successfully allows the card to
> > operate without falling back to 32-bit DMA mode as it does without
> > the patches.
> >
> > Relevant kernel log message:
> > ```
> > [    0.311211] pci 0033:01     : [PE# 00] Enabling 64-bit DMA bypass
> > ```
> >
> > Tested-by: Shawn Anastasio <shawn@anastas.io>
>
> After a few days of further testing, I've started to run into stability
> issues with the patch applied and used with an AMD GPU. Specifically,
> the system sometimes spontaneously crashes. Not just EEH errors either,
> the whole system shuts down in what looks like a checkstop.

Any specific workload? Checkstops are harder to debug without a system
in the failed state so we'd need to replicate that locally to get a
decent idea what's up.

> Perhaps some subtle corruption is occurring?

^ permalink raw reply

* Re: [PATCH v3 1/3] powerpc/powernv: Add OPAL API interface to get secureboot state
From: Daniel Axtens @ 2019-06-12  6:17 UTC (permalink / raw)
  To: Nayna Jain, linuxppc-dev, linux-efi, linux-integrity,
	linux-kernel
  Cc: Ard Biesheuvel, Nayna Jain, Claudio Carvalho, Mimi Zohar,
	Matthew Garret, Paul Mackerras, Jeremy Kerr
In-Reply-To: <1560198837-18857-2-git-send-email-nayna@linux.ibm.com>

Nayna Jain <nayna@linux.ibm.com> writes:

> From: Claudio Carvalho <cclaudio@linux.ibm.com>
>
> The X.509 certificates trusted by the platform and other information
> required to secure boot the OS kernel are wrapped in secure variables,
> which are controlled by OPAL.
>
> This patch adds support to read OPAL secure variables through
> OPAL_SECVAR_GET call. It returns the metadata and data for a given secure
> variable based on the unique key.
>
> Since OPAL can support different types of backend which can vary in the
> variable interpretation, a new OPAL API call named OPAL_SECVAR_BACKEND, is
> added to retrieve the supported backend version. This helps the consumer
> to know how to interpret the variable.
>

(Firstly, apologies that I haven't got around to asking about this yet!)

Are pluggable/versioned backend a good idea?

There are a few things that worry me about the idea:

 - It adds complexity in crypto (or crypto-adjacent) code, and that
   increases the likelihood that we'll accidentally add a bug with bad
   consequences.

 - Under what circumstances would would we change the kernel-visible
   behaviour of skiboot? Are we expecting to change the behaviour,
   content or names of the variables in future? Otherwise the only
   relevant change I can think of is a change to hardware platforms, and
   I'm not sure how a change in hardware would lead to change in
   behaviour in the kernel. Wouldn't Skiboot hide h/w differences?

 - If we are worried about a long-term-future change to how secure-boot
   works, would it be better to just add more get/set calls to opal at
   the point at which we actually implement the new system?
   
 - UEFI added EFI_VARIABLE_AUTHENTICATION_3 in a way that - as far
   as I know - didn't break backwards compatibility. Is there a reason
   we cannot add features that way instead? (It also dropped v1 of the
   authentication header.)
   
 - What is the correct fallback behaviour if a kernel receives a result
   that it does not expect? If a kernel expecting BackendV1 is instead
   informed that it is running on BackendV2, then the cannot access the
   secure variable at all, so it cannot load keys that are potentially
   required to successfully boot (e.g. to validate the module for
   network card or graphics!)

Kind regards,
Daniel

> This support can be enabled using CONFIG_OPAL_SECVAR
>
> Signed-off-by: Claudio Carvalho <cclaudio@linux.ibm.com>
> Signed-off-by: Nayna Jain <nayna@linux.ibm.com>
> ---
> This patch depends on a new OPAL call that is being added to skiboot.
> The patch set that implements the new call has been posted to
> https://patchwork.ozlabs.org/project/skiboot/list/?series=112868
>
>  arch/powerpc/include/asm/opal-api.h          |  4 +-
>  arch/powerpc/include/asm/opal-secvar.h       | 23 ++++++
>  arch/powerpc/include/asm/opal.h              |  6 ++
>  arch/powerpc/platforms/powernv/Kconfig       |  6 ++
>  arch/powerpc/platforms/powernv/Makefile      |  1 +
>  arch/powerpc/platforms/powernv/opal-call.c   |  2 +
>  arch/powerpc/platforms/powernv/opal-secvar.c | 85 ++++++++++++++++++++
>  7 files changed, 126 insertions(+), 1 deletion(-)
>  create mode 100644 arch/powerpc/include/asm/opal-secvar.h
>  create mode 100644 arch/powerpc/platforms/powernv/opal-secvar.c
>
> diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h
> index e1577cfa7186..a505e669b4b6 100644
> --- a/arch/powerpc/include/asm/opal-api.h
> +++ b/arch/powerpc/include/asm/opal-api.h
> @@ -212,7 +212,9 @@
>  #define OPAL_HANDLE_HMI2			166
>  #define	OPAL_NX_COPROC_INIT			167
>  #define OPAL_XIVE_GET_VP_STATE			170
> -#define OPAL_LAST				170
> +#define OPAL_SECVAR_GET                         173
> +#define OPAL_SECVAR_BACKEND                     177
> +#define OPAL_LAST				177
>  
>  #define QUIESCE_HOLD			1 /* Spin all calls at entry */
>  #define QUIESCE_REJECT			2 /* Fail all calls with OPAL_BUSY */
> diff --git a/arch/powerpc/include/asm/opal-secvar.h b/arch/powerpc/include/asm/opal-secvar.h
> new file mode 100644
> index 000000000000..b677171a0368
> --- /dev/null
> +++ b/arch/powerpc/include/asm/opal-secvar.h
> @@ -0,0 +1,23 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * PowerNV definitions for secure variables OPAL API.
> + *
> + * Copyright (C) 2019 IBM Corporation
> + * Author: Claudio Carvalho <cclaudio@linux.ibm.com>
> + *
> + */
> +#ifndef OPAL_SECVAR_H
> +#define OPAL_SECVAR_H
> +
> +enum {
> +	BACKEND_NONE = 0,
> +	BACKEND_TC_COMPAT_V1,
> +};
> +
> +extern int opal_get_variable(u8 *key, unsigned long ksize,
> +			     u8 *metadata, unsigned long *mdsize,
> +			     u8 *data, unsigned long *dsize);
> +
> +extern int opal_variable_version(unsigned long *backend);
> +
> +#endif
> diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
> index 4cc37e708bc7..57d2c2356eda 100644
> --- a/arch/powerpc/include/asm/opal.h
> +++ b/arch/powerpc/include/asm/opal.h
> @@ -394,6 +394,12 @@ void opal_powercap_init(void);
>  void opal_psr_init(void);
>  void opal_sensor_groups_init(void);
>  
> +extern int opal_secvar_get(uint64_t k_key, uint64_t k_key_len,
> +			   uint64_t k_metadata, uint64_t k_metadata_size,
> +			   uint64_t k_data, uint64_t k_data_size);
> +
> +extern int opal_secvar_backend(uint64_t k_backend);
> +
>  #endif /* __ASSEMBLY__ */
>  
>  #endif /* _ASM_POWERPC_OPAL_H */
> diff --git a/arch/powerpc/platforms/powernv/Kconfig b/arch/powerpc/platforms/powernv/Kconfig
> index 850eee860cf2..65b060539b5c 100644
> --- a/arch/powerpc/platforms/powernv/Kconfig
> +++ b/arch/powerpc/platforms/powernv/Kconfig
> @@ -47,3 +47,9 @@ config PPC_VAS
>  	  VAS adapters are found in POWER9 based systems.
>  
>  	  If unsure, say N.
> +
> +config OPAL_SECVAR
> +	bool "OPAL Secure Variables"
> +	depends on PPC_POWERNV
> +	help
> +	  This enables the kernel to access OPAL secure variables.
> diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile
> index da2e99efbd04..6651c742e530 100644
> --- a/arch/powerpc/platforms/powernv/Makefile
> +++ b/arch/powerpc/platforms/powernv/Makefile
> @@ -16,3 +16,4 @@ obj-$(CONFIG_PERF_EVENTS) += opal-imc.o
>  obj-$(CONFIG_PPC_MEMTRACE)	+= memtrace.o
>  obj-$(CONFIG_PPC_VAS)	+= vas.o vas-window.o vas-debug.o
>  obj-$(CONFIG_OCXL_BASE)	+= ocxl.o
> +obj-$(CONFIG_OPAL_SECVAR) += opal-secvar.o
> diff --git a/arch/powerpc/platforms/powernv/opal-call.c b/arch/powerpc/platforms/powernv/opal-call.c
> index 36c8fa3647a2..0445980f294f 100644
> --- a/arch/powerpc/platforms/powernv/opal-call.c
> +++ b/arch/powerpc/platforms/powernv/opal-call.c
> @@ -288,3 +288,5 @@ OPAL_CALL(opal_pci_set_pbcq_tunnel_bar,		OPAL_PCI_SET_PBCQ_TUNNEL_BAR);
>  OPAL_CALL(opal_sensor_read_u64,			OPAL_SENSOR_READ_U64);
>  OPAL_CALL(opal_sensor_group_enable,		OPAL_SENSOR_GROUP_ENABLE);
>  OPAL_CALL(opal_nx_coproc_init,			OPAL_NX_COPROC_INIT);
> +OPAL_CALL(opal_secvar_get,                      OPAL_SECVAR_GET);
> +OPAL_CALL(opal_secvar_backend,                  OPAL_SECVAR_BACKEND);
> diff --git a/arch/powerpc/platforms/powernv/opal-secvar.c b/arch/powerpc/platforms/powernv/opal-secvar.c
> new file mode 100644
> index 000000000000..dba441dd5af1
> --- /dev/null
> +++ b/arch/powerpc/platforms/powernv/opal-secvar.c
> @@ -0,0 +1,85 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * PowerNV code for secure variables
> + *
> + * Copyright (C) 2019 IBM Corporation
> + * Author: Claudio Carvalho <cclaudio@linux.ibm.com>
> + *
> + */
> +
> +/*
> + * The opal wrappers in this file treat the @name, @vendor, and @data
> + * parameters as little endian blobs.
> + * @name is a ucs2 string
> + * @vendor is the vendor GUID. It is converted to LE in the kernel
> + * @data variable data, which layout may be different for each variable
> + */
> +
> +#define pr_fmt(fmt) "secvar: "fmt
> +
> +#include <linux/types.h>
> +#include <asm/opal.h>
> +#include <asm/opal-secvar.h>
> +
> +static bool is_opal_secvar_supported(void)
> +{
> +	static bool opal_secvar_supported;
> +	static bool initialized;
> +
> +	if (initialized)
> +		return opal_secvar_supported;
> +
> +	if (!opal_check_token(OPAL_SECVAR_GET)
> +	    || !opal_check_token(OPAL_SECVAR_BACKEND)) {
> +		pr_err("OPAL doesn't support secure variables\n");
> +		opal_secvar_supported = false;
> +	} else {
> +		opal_secvar_supported = true;
> +	}
> +
> +	initialized = true;
> +
> +	return opal_secvar_supported;
> +}
> +
> +int opal_get_variable(u8 *key, unsigned long ksize, u8 *metadata,
> +		      unsigned long *mdsize, u8 *data, unsigned long *dsize)
> +{
> +	int rc;
> +
> +	if (!is_opal_secvar_supported())
> +		return OPAL_UNSUPPORTED;
> +
> +	if (mdsize)
> +		*mdsize = cpu_to_be64(*mdsize);
> +	if (dsize)
> +		*dsize = cpu_to_be64(*dsize);
> +
> +	rc = opal_secvar_get(__pa(key), ksize, __pa(metadata), __pa(mdsize),
> +			     __pa(data), __pa(dsize));
> +
> +	if (mdsize)
> +		*mdsize = be64_to_cpu(*mdsize);
> +	if (dsize)
> +		*dsize = be64_to_cpu(*dsize);
> +
> +	return rc;
> +}
> +
> +int opal_variable_version(unsigned long *backend)
> +{
> +	int rc;
> +
> +	if (!is_opal_secvar_supported())
> +		return OPAL_UNSUPPORTED;
> +
> +	if (backend)
> +		*backend = cpu_to_be64(*backend);
> +
> +	rc = opal_secvar_backend(__pa(backend));
> +
> +	if (backend)
> +		*backend = be64_to_cpu(*backend);
> +
> +	return rc;
> +}
> -- 
> 2.20.1

^ permalink raw reply

* Re: [PATCH v2 8/8] habanalabs: enable 64-bit DMA mask in POWER9
From: Oded Gabbay @ 2019-06-12  6:25 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Russell Currey, Oliver OHalloran, Greg KH,
	Linux-Kernel@Vger. Kernel. Org, Christoph Hellwig, linuxppc-dev
In-Reply-To: <ca81ca5d56a3a12db5a92f5cf9745763a86572e8.camel@kernel.crashing.org>

On Wed, Jun 12, 2019 at 1:53 AM Benjamin Herrenschmidt
<benh@kernel.crashing.org> wrote:
>
> On Tue, 2019-06-11 at 20:22 +0300, Oded Gabbay wrote:
> >
> > > So, to summarize:
> > > If I call pci_set_dma_mask with 48, then it fails on POWER9. However,
> > > in runtime, I don't know if its POWER9 or not, so upon failure I will
> > > call it again with 32, which makes our device pretty much unusable.
> > > If I call pci_set_dma_mask with 64, and do the dedicated configuration
> > > in Goya's PCIe controller, then it won't work on x86-64, because bit
> > > 59 will be set and the host won't like it (I checked it). In addition,
> > > I might get addresses above 50 bits, which my device can't generate.
> > >
> > > I hope this makes things more clear. Now, please explain to me how I
> > > can call pci_set_dma_mask without any regard to whether I run on
> > > x86-64 or POWER9, considering what I wrote above ?
> > >
> > > Thanks,
> > > Oded
> >
> > Adding ppc mailing list.
>
> You can't. Your device is broken. Devices that don't support DMAing to
> the full 64-bit deserve to be added to the trash pile.
>
Hmm... right know they are added to customers data-centers but what do I know ;)

> As a result, getting it to work will require hacks. Some GPUs have
> similar issues and require similar hacks, it's unfortunate.
>
> Added a couple of guys on CC who might be able to help get those hacks
> right.
Thanks :)
>
> It's still very fishy .. the idea is to detect the case where setting a
> 64-bit mask will give your system memory mapped at a fixed high address
> (1 << 59 in our case) and program that in your chip in the "Fixed high
> bits" register that you seem to have (also make sure it doesn't affect
> MSIs or it will break them).
MSI-X are working. The set of bit 59 doesn't apply to MSI-X
transactions (AFAICS from the PCIe controller spec we have).
>
> This will only work as long as all of the system memory can be
> addressed at an offset from that fixed address that itself fits your
> device addressing capabilities (50 bits in this case). It may or may
> not be the case but there's no way to check since the DMA mask logic
> won't really apply.
Understood. In the specific system we are integrated to, that is the
case - we have less then 48 bits. But, as you pointed out, it is not a
generic solution but with my H/W I can't give a generic fit-all
solution for POWER9. I'll settle for the best that I can do.

>
> You might want to consider fixing your HW in the next iteration... This
> is going to bite you when x86 increases the max physical memory for
> example, or on other architectures.
Understood and taken care of.

>
> Cheers,
> Ben.
>
>
>
>

^ permalink raw reply

* Re: [PATCH v8 2/7] x86/dma: use IS_ENABLED() to simplify the code
From: Leizhen (ThunderTown) @ 2019-06-12  6:26 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-ia64, Sebastian Ott, linux-doc, Hanjun Guo, Heiko Carstens,
	Paul Mackerras, H . Peter Anvin, linux-s390, Jonathan Corbet,
	Jean-Philippe Brucker, Joerg Roedel, x86, Ingo Molnar, Fenghua Yu,
	Will Deacon, John Garry, linuxppc-dev, Thomas Gleixner,
	Gerald Schaefer, Tony Luck, David Woodhouse, linux-kernel, iommu,
	Martin Schwidefsky, Robin Murphy
In-Reply-To: <20190612051624.GF32652@zn.tnic>



On 2019/6/12 13:16, Borislav Petkov wrote:
> On Thu, May 30, 2019 at 11:48:26AM +0800, Zhen Lei wrote:
>> This patch removes the ifdefs around CONFIG_IOMMU_DEFAULT_PASSTHROUGH to
>> improve readablity.
> 
> Avoid having "This patch" or "This commit" in the commit message. It is
> tautologically useless.

OK, thanks.

> 
> Also, do
> 
> $ git grep 'This patch' Documentation/process
> 
> for more details.
> 
>> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
>> ---
>>  arch/x86/kernel/pci-dma.c | 7 ++-----
>>  1 file changed, 2 insertions(+), 5 deletions(-)
>>
>> diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
>> index dcd272dbd0a9330..9f2b19c35a060df 100644
>> --- a/arch/x86/kernel/pci-dma.c
>> +++ b/arch/x86/kernel/pci-dma.c
>> @@ -43,11 +43,8 @@
>>   * It is also possible to disable by default in kernel config, and enable with
>>   * iommu=nopt at boot time.
>>   */
>> -#ifdef CONFIG_IOMMU_DEFAULT_PASSTHROUGH
>> -int iommu_pass_through __read_mostly = 1;
>> -#else
>> -int iommu_pass_through __read_mostly;
>> -#endif
>> +int iommu_pass_through __read_mostly =
>> +			IS_ENABLED(CONFIG_IOMMU_DEFAULT_PASSTHROUGH);
> 
> Let that line stick out.

OK, I will merge them on the same line.

> 
> Thx.
> 


^ permalink raw reply

* Re: sys_exit: NR -1
From: Naveen N. Rao @ 2019-06-12  6:32 UTC (permalink / raw)
  To: linux-perf-users@vger.kernel.org, Paul Clarke; +Cc: linuxppc-dev
In-Reply-To: <2f004b41-4f6f-3e6f-227a-cb199b8429d2@us.ibm.com>

Paul Clarke wrote:
> What are the circumstances in which raw_syscalls:sys_exit reports "-1" for the syscall ID?
> 
>     perf  5375 [007] 59632.478528:   raw_syscalls:sys_enter: NR 1 (3, 9fb888, 8, 2d83740, 1, 7ffff)
>     perf  5375 [007] 59632.478532:    raw_syscalls:sys_exit: NR 1 = 8
>     perf  5375 [007] 59632.478538:   raw_syscalls:sys_enter: NR 15 (11, 7ffffca734b0, 7ffffca73380, 2d83740, 1, 7ffff)
>     perf  5375 [007] 59632.478539:    raw_syscalls:sys_exit: NR -1 = 8
>     perf  5375 [007] 59632.478543:   raw_syscalls:sys_enter: NR 16 (4, 2401, 0, 2d83740, 1, 0)
>     perf  5375 [007] 59632.478551:    raw_syscalls:sys_exit: NR 16 = 0

Which architecture?
For powerpc, see:

static inline int syscall_get_nr(struct task_struct *task, struct pt_regs *regs)
{
	/*
	 * Note that we are returning an int here. That means 0xffffffff, ie.
	 * 32-bit negative 1, will be interpreted as -1 on a 64-bit kernel.
	 * This is important for seccomp so that compat tasks can set r0 = -1
	 * to reject the syscall.
	 */
	return TRAP(regs) == 0xc00 ? regs->gpr[0] : -1;
}


- Naveen



^ permalink raw reply

* Re: [PATCH v2 8/8] habanalabs: enable 64-bit DMA mask in POWER9
From: Oliver O'Halloran @ 2019-06-12  6:35 UTC (permalink / raw)
  To: Oded Gabbay
  Cc: Christoph Hellwig, Greg KH, Linux-Kernel@Vger. Kernel. Org,
	linuxppc-dev
In-Reply-To: <CAFCwf134nTD4FM_9Q+THQ7ZAZzGxhs15O6EheaRJMqM5wxi+aA@mail.gmail.com>

On Wed, Jun 12, 2019 at 3:25 AM Oded Gabbay <oded.gabbay@gmail.com> wrote:
>
> On Tue, Jun 11, 2019 at 8:03 PM Oded Gabbay <oded.gabbay@gmail.com> wrote:
> >
> > On Tue, Jun 11, 2019 at 6:26 PM Greg KH <gregkh@linuxfoundation.org> wrote:
> > > *snip*
> >
> > Now, when I tried to integrate Goya into a POWER9 machine, I got a
> > reject from the call to pci_set_dma_mask(pdev, 48). The standard code,
> > as I wrote above, is to call the same function with 32-bits. That
> > works BUT it is not practical, as our applications require much more
> > memory mapped then 32-bits.

Setting a 48 bit DMA mask doesn't work today because we only allocate
IOMMU tables to cover the 0..2GB range of PCI bus addresses. Alexey
has some patches to expand that range so we can support devices that
can't hit the 64 bit bypass window. You need:

This fix: http://patchwork.ozlabs.org/patch/1113506/
This series: http://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=110810

Give that a try and see if the IOMMU overhead is tolerable.

> >In addition, once you add more cards which
> > are all mapped to the same range, it is simply not usable at all.

Each IOMMU group should have a separate bus address space and seperate
cards shouldn't be in the same IOMMU group. If they are then there's
something up.

Oliver

^ permalink raw reply

* Re: [PATCH v2 8/8] habanalabs: enable 64-bit DMA mask in POWER9
From: Christoph Hellwig @ 2019-06-12  6:53 UTC (permalink / raw)
  To: Oliver O'Halloran
  Cc: Oded Gabbay, Greg KH, Christoph Hellwig,
	Linux-Kernel@Vger. Kernel. Org, linuxppc-dev
In-Reply-To: <CAOSf1CE82uVVni638jkJJpQ7XLXX+HdD7xuB7Wv-f8mn=SBMeg@mail.gmail.com>

On Wed, Jun 12, 2019 at 04:35:22PM +1000, Oliver O'Halloran wrote:
> Setting a 48 bit DMA mask doesn't work today because we only allocate
> IOMMU tables to cover the 0..2GB range of PCI bus addresses.

I don't think that is true upstream, and if it is we need to fix bug
in the powerpc code.  powerpc should be falling back treating a 48-bit
dma mask like a 32-bit one at least, that is use dynamic iommu mappings
instead of using the direct mapping.  And from my reding of 
arch/powerpc/kernel/dma-iommu.c that is exactly what it does.

^ permalink raw reply

* Re: [BISECTED REGRESSION] b43legacy broken on G4 PowerBook
From: Christoph Hellwig @ 2019-06-12  6:55 UTC (permalink / raw)
  To: Larry Finger
  Cc: Aaro Koskinen, linux-wireless, linux-kernel, Christian Zigotzky,
	linuxppc-dev, Christoph Hellwig
In-Reply-To: <5aaa600b-5b59-1f68-454f-20403c318f1a@lwfinger.net>

On Tue, Jun 11, 2019 at 05:20:12PM -0500, Larry Finger wrote:
> Your first patch did not work as the configuration does not have 
> CONFIG_ZONE_DMA. As a result, the initial value of min_mask always starts 
> at 32 bits and is taken down to 31 with the maximum pfn minimization. When 
> I forced the initial value of min_mask to 30 bits, the device worked.

Ooops, yes.  But I think we could just enable ZONE_DMA on 32-bit
powerpc.  Crude enablement hack below:

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 8c1c636308c8..1dd71a98b70c 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -372,7 +372,7 @@ config PPC_ADV_DEBUG_DAC_RANGE
 
 config ZONE_DMA
 	bool
-	default y if PPC_BOOK3E_64
+	default y
 
 config PGTABLE_LEVELS
 	int

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox