LinuxPPC-Dev Archive on lore.kernel.org
* [PATCH] powerpc/mm: remove warning about ‘type’ being set
From: Mathieu Malaterre @ 2018-06-22 19:27 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Mathieu Malaterre, Benjamin Herrenschmidt, Paul Mackerras,
	Thomas Gleixner, Greg Kroah-Hartman, Philippe Ombredanne,
	Kate Stewart, linuxppc-dev, linux-kernel

‘type’ is only used when CONFIG_DEBUG_HIGHMEM is set, so mark the variable
as possibly unused. This removes a warning treated as an error with W=1:

  arch/powerpc/mm/highmem.c:59:6: error: variable ‘type’ set but not used [-Werror=unused-but-set-variable]

Signed-off-by: Mathieu Malaterre <malat@debian.org>
---
 arch/powerpc/mm/highmem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/highmem.c b/arch/powerpc/mm/highmem.c
index 668e87d03f9e..82a0e37557a5 100644
--- a/arch/powerpc/mm/highmem.c
+++ b/arch/powerpc/mm/highmem.c
@@ -56,7 +56,7 @@ EXPORT_SYMBOL(kmap_atomic_prot);
 void __kunmap_atomic(void *kvaddr)
 {
 	unsigned long vaddr = (unsigned long) kvaddr & PAGE_MASK;
-	int type;
+	int type __maybe_unused;
 
 	if (vaddr < __fix_to_virt(FIX_KMAP_END)) {
 		pagefault_enable();
-- 
2.11.0


* Re: [PATCH] selftests/powerpc: Fix strncpy usage
From: Al Dunsmuir @ 2018-06-22 21:01 UTC (permalink / raw)
  To: Paul Clarke, Breno Leitao, Segher Boessenkool
  Cc: linuxppc-dev, Anshuman Khandual
In-Reply-To: <12953a47-a6a7-5f70-a46c-c95f8c424d11@us.ibm.com>

On Friday, June 22, 2018, 11:15:29 AM, Paul Clarke wrote:
> On 06/22/2018 09:43 AM, Breno Leitao wrote:
>> If you don't mind, I would solve this problem slightly differently, as it
>> seems to be more readable.
>> 
>> -       strncpy(prog, argv[0], strlen(argv[0]));
>> +       if (strlen(argv[0]) >= LEN_MAX){
>> +               fprintf(stderr, "Very big executable name: %s\n", argv[0]);

> "Very big" is an observation.  "Too big" indicates a problem
> better.  Or, more explicitly "Executable name is too long".

Or even better, display the limit that is being exceeded, in case that
value changes over time.  Something like:

-       strncpy(prog, argv[0], strlen(argv[0]));
+       if (strlen(argv[0]) >= LEN_MAX){
+                fprintf(stderr, "Executable name exceeds limit (%d): %s\n",
+                        LEN_MAX,
+                        argv[0]);


* Re: [PATCH] selftests/powerpc: Fix strncpy usage
From: Segher Boessenkool @ 2018-06-23  1:00 UTC (permalink / raw)
  To: Christophe LEROY; +Cc: Breno Leitao, linuxppc-dev, Anshuman Khandual
In-Reply-To: <9cd93c1d-c067-3a18-bfd8-fef1ef443739@c-s.fr>

On Fri, Jun 22, 2018 at 04:51:21PM +0200, Christophe LEROY wrote:
> On 22/06/2018 at 16:43, Breno Leitao wrote:
> >+               fprintf(stderr, "Very big executable name: %s\n", argv[0]);
> >+               return 1;
> >+       }
> >+
> >+       strncpy(prog, argv[0], sizeof(prog) - 1);
> 
> You have checked before that argv[0] is not too long, so you should not 
> need to use strncpy(), strcpy() would do it.

If you don't care about the bytes of prog after the first zero byte, sure.
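
As a rough illustration of the difference (not code from the thread; the
helper names below are made up for this sketch): strncpy() zero-fills the
rest of the destination once the source ends, while strcpy() writes only
the string and its terminator, so whatever was in the buffer afterwards
stays there.

```c
#include <string.h>

/* Hypothetical helper: count non-zero bytes after the string terminator. */
static size_t nonzero_tail(const char *buf, size_t size)
{
	size_t n = 0;
	size_t i;

	for (i = strlen(buf) + 1; i < size; i++)
		if (buf[i] != '\0')
			n++;
	return n;
}

/* Pre-fill buf with junk, copy with strcpy(): the junk after the NUL stays. */
static size_t tail_after_strcpy(char *buf, size_t size, const char *src)
{
	memset(buf, 'x', size);
	strcpy(buf, src);		/* caller guarantees strlen(src) < size */
	return nonzero_tail(buf, size);
}

/* Same, but strncpy() pads the remainder of buf with zero bytes. */
static size_t tail_after_strncpy(char *buf, size_t size, const char *src)
{
	memset(buf, 'x', size);
	strncpy(buf, src, size);	/* caller guarantees strlen(src) < size */
	return nonzero_tail(buf, size);
}
```

This only matters if something later looks at the whole buffer (comparing
it with memcmp(), writing it out, and so on) rather than treating it as a
C string.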


Segher


* Re: [PATCH] selftests/powerpc: Fix strncpy usage
From: Segher Boessenkool @ 2018-06-23  1:10 UTC (permalink / raw)
  To: Breno Leitao; +Cc: linuxppc-dev, Anshuman Khandual
In-Reply-To: <1dc025d5-366c-ae13-259e-dae543e6ec52@debian.org>

Hi!

On Fri, Jun 22, 2018 at 11:43:44AM -0300, Breno Leitao wrote:
> On 06/21/2018 08:18 PM, Segher Boessenkool wrote:
> > On Wed, Jun 20, 2018 at 07:51:11PM -0300, Breno Leitao wrote:
> >> -	strncpy(prog, argv[0], strlen(argv[0]));
> >> +	strncpy(prog, argv[0], sizeof(prog) - 1);
> > 
> > 	strncpy(prog, argv[0], sizeof prog);
> > 	if (prog[sizeof prog - 1])
> > 		scream_bloody_murder();
> > 
> > Silently using the wrong data is a worse habit than not checking for
> > overflows ;-)
> 
> Completely agree! Thanks for bringing this up.
> 
> If you don't mind, I would solve this problem slightly differently, as it
> seems to be more readable.
> 
> -       strncpy(prog, argv[0], strlen(argv[0]));
> +       if (strlen(argv[0]) >= LEN_MAX){
> +               fprintf(stderr, "Very big executable name: %s\n", argv[0]);
> +               return 1;
> +       }
> +
> +       strncpy(prog, argv[0], sizeof(prog) - 1);

The strlen reads all of argv[0], which can be very big in theory.  It won't
matter in this test file -- program arguments cannot be super long, for one
thing -- but it's not a good idea in general (that is one of the problems
of strlcpy, btw).

Best of course is to avoid string length restrictions completely, if you can.
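
A minimal sketch of the check-the-last-byte approach quoted above, written
as a standalone helper. The name copy_checked() and its return convention
are assumptions for illustration, not something from the selftest. It reads
at most `size` bytes of the source, so it never walks an overlong argument
the way strlen() does.

```c
#include <string.h>

/*
 * Copy src into dst (of the given size) and detect truncation.
 * Returns 0 on success, -1 if src did not fit.
 */
static int copy_checked(char *dst, size_t size, const char *src)
{
	if (size == 0)
		return -1;

	/* strncpy() zero-pads when src is shorter than size ... */
	strncpy(dst, src, size);

	/* ... so a non-zero last byte means src filled the whole buffer. */
	if (dst[size - 1] != '\0') {
		dst[size - 1] = '\0';	/* keep dst a valid string anyway */
		return -1;
	}
	return 0;
}
```

The caller is expected to report the error rather than carry on with the
truncated name.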


Segher


* Re: [PATCH v2 0/3] Resolve -Wattribute-alias warnings from SYSCALL_DEFINEx()
From: Masahiro Yamada @ 2018-06-23  8:40 UTC (permalink / raw)
  To: Paul Burton
  Cc: Linux Kbuild mailing list, Mauro Carvalho Chehab, Linux-MIPS,
	Arnd Bergmann, Ingo Molnar, Matthew Wilcox, Thomas Gleixner,
	Douglas Anderson, Josh Poimboeuf, Andrew Morton,
	Matthias Kaehlcke, He Zhe, Benjamin Herrenschmidt, Michal Marek,
	Khem Raj, Christophe Leroy, Al Viro, Stafford Horne,
	Gideon Israel Dsouza, Kees Cook, Michael Ellerman, Heiko Carstens,
	Linux Kernel Mailing List, Paul Mackerras, linuxppc-dev
In-Reply-To: <20180619201458.4559-1-paul.burton@mips.com>

2018-06-20 5:14 GMT+09:00 Paul Burton <paul.burton@mips.com>:
> This series introduces infrastructure allowing compiler diagnostics to
> be disabled or their severity modified for specific pieces of code, with
> suitable abstractions to prevent that code from becoming tied to a
> specific compiler.
>
> This infrastructure is then used to disable the -Wattribute-alias
> warning around syscall definitions, which rely on type mismatches to
> sanitize arguments.
>
> Finally PowerPC-specific #pragma's are removed now that the generic code
> is handling this.
>
> The series takes Arnd's RFC patches & addresses the review comments they
> received. The most notable effect of this series is to avoid warnings &
> build failures caused by -Wattribute-alias when compiling the kernel
> with GCC 8.
>
> Applies cleanly atop v4.18-rc1.


Series, applied to linux-kbuild/fixes.
(since we need to fix warnings from GCC 8.1)


Thanks!



-- 
Best Regards
Masahiro Yamada


* Re: powerpc/64s/radix: Fix radix_kvm_prefetch_workaround paca access of not possible CPU
From: Michael Ellerman @ 2018-06-23 12:56 UTC (permalink / raw)
  To: Nicholas Piggin, linuxppc-dev
  Cc: stable, Nicholas Piggin, Pridhiviraj Paidipeddi
In-Reply-To: <20180612093808.30679-1-npiggin@gmail.com>

On Tue, 2018-06-12 at 09:38:08 UTC, Nicholas Piggin wrote:
> If possible CPUs are limited (e.g., by kexec), then the kvm prefetch
> workaround function can access the paca pointer for a !possible CPU.
> 
> Fixes: d2e60075a3d44 ("powerpc/64: Use array of paca pointers and allocate pacas individually")
> Cc: stable@kernel.org
> Reported-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com>
> Tested-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/758380b8155f69b4e2f77f27562f8a

cheers


* Re: powerpc/mm/hash/4k: Free hugetlb page table caches correctly.
From: Michael Ellerman @ 2018-06-23 12:56 UTC (permalink / raw)
  To: Aneesh Kumar K.V, npiggin, benh, paulus; +Cc: Aneesh Kumar K.V, linuxppc-dev
In-Reply-To: <20180614103152.7344-1-aneesh.kumar@linux.ibm.com>

On Thu, 2018-06-14 at 10:31:52 UTC, "Aneesh Kumar K.V" wrote:
> With 4k page size for hugetlb we allocate hugepage directories from their own
> slab cache. With patch 0c4d26802 ("powerpc/book3s64/mm: Simplify the rcu callback for page table free")
> we failed to free these allocated hugepd tables.
> 
> Update pgtable_free to handle hugetlb hugepd directory table.
> 
> Fixes:  0c4d26802 ("powerpc/book3s64/mm: Simplify the rcu callback for page table free")
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/fadd03c615922d8521a2e76d4ba233

cheers


* Re: powerpc/64s: Fix build failures with CONFIG_NMI_IPI=n
From: Michael Ellerman @ 2018-06-23 12:56 UTC (permalink / raw)
  To: Michael Ellerman, linuxppc-dev
In-Reply-To: <20180619230454.8881-1-mpe@ellerman.id.au>

On Tue, 2018-06-19 at 23:04:54 UTC, Michael Ellerman wrote:
> I broke the build when CONFIG_NMI_IPI=n with my recent commit to add
> arch_trigger_cpumask_backtrace(), eg:
> 
>   stacktrace.c:(.text+0x1b0): undefined reference to `.smp_send_safe_nmi_ipi'
> 
> We should rework the CONFIG symbols here in future to avoid these
> double barrelled ifdefs but for now they fix the build.
> 
> Fixes: 5cc05910f26e ("powerpc/64s: Wire up arch_trigger_cpumask_backtrace()")
> Reported-by: Christophe LEROY <christophe.leroy@c-s.fr>
> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

Applied to powerpc fixes.

https://git.kernel.org/powerpc/c/e08ecba17b72aeb01859601bc242a5

cheers


* [GIT PULL] Please pull powerpc/linux.git powerpc-4.18-2 tag
From: Michael Ellerman @ 2018-06-23 12:59 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: aneesh.kumar, linux-kernel, linuxppc-dev, mjeanson, npiggin

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Hi Linus,

Please pull powerpc fixes for 4.18:

The following changes since commit ce397d215ccd07b8ae3f71db689aedb85d56ab40:

  Linux 4.18-rc1 (2018-06-17 08:04:49 +0900)

are available in the git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git tags/powerpc-4.18-2

for you to fetch changes up to fadd03c615922d8521a2e76d4ba2335891cb2790:

  powerpc/mm/hash/4k: Free hugetlb page table caches correctly. (2018-06-20 09:13:25 +1000)

- ------------------------------------------------------------------
powerpc fixes for 4.18 #2

 - A fix for hugetlb with 4K pages, broken by our recent changes for split PMD PTL.

 - Set the correct assembler machine type on e500mc, needed since binutils 2.26
   introduced two forms for the "wait" instruction.

 - A fix for potential missed TLB flushes with MADV_[FREE|DONTNEED] etc. and THP
   on Power9 Radix.

 - Three fixes to try and make our panic handling more robust by hard disabling
   interrupts, and not marking stopped CPUs as offline because they haven't been
   properly offlined.

 - Three other minor fixes.

Thanks to:
  Aneesh Kumar K.V, Michael Jeanson, Nicholas Piggin.

- ------------------------------------------------------------------
Aneesh Kumar K.V (1):
      powerpc/mm/hash/4k: Free hugetlb page table caches correctly.

Michael Ellerman (2):
      powerpc/64s: Fix DT CPU features Power9 DD2.1 logic
      powerpc/64s: Fix build failures with CONFIG_NMI_IPI=n

Michael Jeanson (1):
      powerpc/e500mc: Set assembler machine type to e500mc

Nicholas Piggin (5):
      powerpc/64s/radix: Fix MADV_[FREE|DONTNEED] TLB flush miss problem with THP
      powerpc/64: hard disable irqs in panic_smp_self_stop
      powerpc: smp_send_stop do not offline stopped CPUs
      powerpc/64: hard disable irqs on the panic()ing CPU
      powerpc/64s/radix: Fix radix_kvm_prefetch_workaround paca access of not possible CPU


 arch/powerpc/Makefile                            |  1 +
 arch/powerpc/include/asm/book3s/32/pgalloc.h     |  1 +
 arch/powerpc/include/asm/book3s/64/pgtable-4k.h  | 21 +++++
 arch/powerpc/include/asm/book3s/64/pgtable-64k.h |  9 +++
 arch/powerpc/include/asm/book3s/64/pgtable.h     |  5 ++
 arch/powerpc/include/asm/nmi.h                   |  2 +-
 arch/powerpc/include/asm/nohash/32/pgalloc.h     |  1 +
 arch/powerpc/include/asm/nohash/64/pgalloc.h     |  1 +
 arch/powerpc/kernel/dt_cpu_ftrs.c                |  3 +-
 arch/powerpc/kernel/setup-common.c               | 12 ++-
 arch/powerpc/kernel/setup_64.c                   |  8 ++
 arch/powerpc/kernel/smp.c                        |  6 --
 arch/powerpc/kernel/stacktrace.c                 |  4 +-
 arch/powerpc/mm/hugetlbpage.c                    |  3 +-
 arch/powerpc/mm/pgtable-book3s64.c               | 12 +++
 arch/powerpc/mm/tlb-radix.c                      | 98 +++++++++++++++++++-----
 16 files changed, 153 insertions(+), 34 deletions(-)
-----BEGIN PGP SIGNATURE-----

iQIcBAEBCAAGBQJbLkOQAAoJEFHr6jzI4aWA9j0QALRcw9WPmiuhCdy0Jfjn3NnL
B5Y5CBceaemGT/KCp/UrGCpA8WAOJq1PHayAFnFtuVP5ccVdQUFDVUnuSsnuIPP4
+i9OU08NAuItp68fVVO/BahbOudltDRCJg6+yAacBhEQ8PCqVUI7Hj9r1WQRWf6X
dFLHQqv5AJEnbm0u7yGTk+rnj9wkLej2nTs2HlT7bWjJKCG5KP8EV8PyeS7v0bXP
cGvWJ5wyL417vgguPX5YBLnY0qybElsmGzANaDi8tjsdduMNkLrvmoea+udebMFI
pn3f1tnQeFDYZuRn0ZIKeHJlcyp4CJkVAPSIdoUNW7sLcOB5RlMr8d6Ie1ry0seH
6TX+Ij5rK8pPg1K9ZfJzLEEb1ddc+me9sh7Lq5J/fJR8UMwjrws/9uNvTrjhy68m
tgN8I9dYo2vC1BZ7doh4OdbLOJcHvbBAS5y2fnXcvYdnSTvVsFUJTctGBtE8Om1d
8jYBZQ92/dF2O5pwd59A6eZ5alLDQJ5UVt5ecaLuuIuL82rtW4x1OiQTXJ/A4GB8
P3wvJdiPubhLlNuRz3TZPr3Pckb52OKgL/MAG6T4WFRtDL3GweWfe1MYRipQJatO
V7T6QL1/KYSP7XTLqIdE2dYnkpi9jvZV3RSmgvfZrugTzlm3tI5G5cSt55q0D28c
ArL790E2Wz+0dVZYfXPP
=UQLt
-----END PGP SIGNATURE-----


* Re: [PATCH v2 0/6] powerpc/pkeys: fixes to pkeys
From: Michael Ellerman @ 2018-06-23 15:02 UTC (permalink / raw)
  To: Ram Pai
  Cc: Florian Weimer, linuxppc-dev, dave.hansen, aneesh.kumar,
	bsingharora, hbabu, mhocko, bauerman, Ulrich.Weigand, luto,
	msuchanek
In-Reply-To: <20180621181045.GL5294@ram.oc3035372033.ibm.com>

Ram Pai <linuxram@us.ibm.com> writes:
> On Thu, Jun 21, 2018 at 08:28:47PM +1000, Michael Ellerman wrote:
>> Florian Weimer <fweimer@redhat.com> writes:
>> > On 06/19/2018 02:40 PM, Michael Ellerman wrote:
>> >>> I tested the whole series with the new selftests, with the printamr.c
>> >>> program I posted earlier, and the glibc test for pkey_alloc &c.  The
>> >>> latter required some test fixes, but now passes as well.  As far as I
>> >>> can tell, everything looks good now.
>> >>>
>> >>> Tested-By: Florian Weimer<fweimer@redhat.com>
>> >> Thanks. I'll add that to each patch I guess, if you're happy with that?
>> >
>> > Sure, but I only tested the whole series as a whole.
>> 
>> Yeah OK. We don't have a good way to express that, other than using a
>> merge which I'd prefer to avoid.
>> 
>> So I've tagged them all with your Tested-by. If any of them turn out to
>> have bugs you can blame me :)
>
> I just tested the patches incrementally using the pkey selftests.
>
> So I feel confident these patches are not bugs. I will take the blame
> if the blame lands on Mpe  :)

Did you run core-pkey and ptrace-pkey?

The pkey selftests that are in tools/testing/selftests/powerpc/ptrace ?

Because those are failing for me:

  test: core_pkey
  tags: git_version:c899d94
  [FAIL] Test FAILED on line 245
  [Core Read (Running)]          AMR: 3fcfffffffffffff IAMR: 1105555555555555 UAMOR: 33cfffffffffffff
  failure: core_pkey
  
  test: ptrace_pkey
  tags: git_version:c899d94
  [FAIL] Test FAILED on line 214
  [Ptrace Read (Running)]        AMR: 3fcfffffffffffff IAMR: 1105555555555555 UAMOR: 33cfffffffffffff
  [User Write (Running)]         AMR: 3fffffffffffffff pkey1: 3 pkey2: 4 pkey3: 5
  failure: ptrace_pkey


Some of which is presumably test case bugs, but there's at least one
kernel bug with the UAMOR handling.

So this series will have to wait until next week :/

cheers


* Re: [PATCH] powerpc/xmon: avoid warnings about variables that might be clobbered by ‘longjmp’
From: christophe leroy @ 2018-06-23 16:59 UTC (permalink / raw)
  To: Mathieu Malaterre, Michael Ellerman
  Cc: Yisheng Xie, Vaibhav Jain, Nicholas Piggin, linux-kernel,
	Paul Mackerras, Breno Leitao, linuxppc-dev
In-Reply-To: <20180622192718.24242-1-malat@debian.org>



On 22/06/2018 at 21:27, Mathieu Malaterre wrote:
> Move initialization of variables after data definitions. This silences
> warnings treated as errors with W=1:
> 
>    arch/powerpc/xmon/xmon.c:3389:14: error: variable ‘name’ might be clobbered by ‘longjmp’ or ‘vfork’ [-Werror=clobbered]
>    arch/powerpc/xmon/xmon.c:3100:22: error: variable ‘tsk’ might be clobbered by ‘longjmp’ or ‘vfork’ [-Werror=clobbered]

Is that an invalid warning ?

If so, please explain in the commit log.

Otherwise, I'd expect one to fix the warning, not just cheat on GCC.

Christophe


> 
> Signed-off-by: Mathieu Malaterre <malat@debian.org>
> ---
>   arch/powerpc/xmon/xmon.c | 6 ++++--
>   1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
> index 47166ad2a669..982848c784ff 100644
> --- a/arch/powerpc/xmon/xmon.c
> +++ b/arch/powerpc/xmon/xmon.c
> @@ -3097,10 +3097,11 @@ static void show_pte(unsigned long addr)
>   static void show_tasks(void)
>   {
>   	unsigned long tskv;
> -	struct task_struct *tsk = NULL;
> +	struct task_struct *tsk;
>   
>   	printf("     task_struct     ->thread.ksp    PID   PPID S  P CMD\n");
>   
> +	tsk = NULL;
>   	if (scanhex(&tskv))
>   		tsk = (struct task_struct *)tskv;
>   
> @@ -3386,10 +3387,11 @@ static void xmon_print_symbol(unsigned long address, const char *mid,
>   			      const char *after)
>   {
>   	char *modname;
> -	const char *name = NULL;
> +	const char *name;
>   	unsigned long offset, size;
>   
>   	printf(REG, address);
> +	name = NULL;
>   	if (setjmp(bus_error_jmp) == 0) {
>   		catch_memory_errors = 1;
>   		sync();
> 


* Re: [PATCH] powerpc/mm: remove warning about ‘type’ being set
From: christophe leroy @ 2018-06-23 17:12 UTC (permalink / raw)
  To: Mathieu Malaterre, Michael Ellerman
  Cc: Kate Stewart, Greg Kroah-Hartman, linux-kernel, Paul Mackerras,
	Philippe Ombredanne, Thomas Gleixner, linuxppc-dev,
	Peter Zijlstra, akpm@linux-foundation.org
In-Reply-To: <20180622192749.24954-1-malat@debian.org>



On 22/06/2018 at 21:27, Mathieu Malaterre wrote:
> ‘type’ is only used when CONFIG_DEBUG_HIGHMEM is set, so mark the variable
> as possibly unused. This removes a warning treated as an error with W=1:
> 
>    arch/powerpc/mm/highmem.c:59:6: error: variable ‘type’ set but not used [-Werror=unused-but-set-variable]

Is type needed at all when CONFIG_DEBUG_HIGHMEM is not set?

The call type = kmap_atomic_idx(); seems useless when
CONFIG_DEBUG_HIGHMEM isn't set. Couldn't we just move the type definition
and assignment inside the CONFIG_DEBUG_HIGHMEM {} block below?

Alternatively, maybe you could replace the #ifdef CONFIG_DEBUG_HIGHMEM
with an if (IS_ENABLED(CONFIG_DEBUG_HIGHMEM))?
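
To illustrate the second idea outside the kernel tree, here is a small
userspace sketch. DEBUG_HIGHMEM, current_idx() and unmap_sketch() are
stand-ins invented for this example; the kernel's IS_ENABLED() macro is
more elaborate, and this is not the actual __kunmap_atomic() code. The
point is that the variable lives inside an ordinary if whose condition is
a compile-time constant, so the compiler still sees it being used and the
dead branch is dropped when the option is off.

```c
/* Stand-in for a Kconfig option; build with -DDEBUG_HIGHMEM=1 to enable. */
#ifndef DEBUG_HIGHMEM
#define DEBUG_HIGHMEM 0
#endif

/* Counts how often the debug path ran, purely for demonstration. */
static int checks_run;

/* Hypothetical stand-in for kmap_atomic_idx(). */
static int current_idx(void)
{
	return 3;
}

/* Returns 0 if the (optional) debug check passes, -1 otherwise. */
static int unmap_sketch(int expected_idx)
{
	if (DEBUG_HIGHMEM) {
		/* Declared and used only where it is needed. */
		int type = current_idx();

		checks_run++;
		if (type != expected_idx)
			return -1;
	}
	return 0;
}
```

Unlike an #ifdef, both configurations are parsed and type-checked on every
build, which is the usual argument for this style.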

Christophe

> 
> Signed-off-by: Mathieu Malaterre <malat@debian.org>
> ---
>   arch/powerpc/mm/highmem.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/mm/highmem.c b/arch/powerpc/mm/highmem.c
> index 668e87d03f9e..82a0e37557a5 100644
> --- a/arch/powerpc/mm/highmem.c
> +++ b/arch/powerpc/mm/highmem.c
> @@ -56,7 +56,7 @@ EXPORT_SYMBOL(kmap_atomic_prot);
>   void __kunmap_atomic(void *kvaddr)
>   {
>   	unsigned long vaddr = (unsigned long) kvaddr & PAGE_MASK;
> -	int type;
> +	int type __maybe_unused;
>   
>   	if (vaddr < __fix_to_virt(FIX_KMAP_END)) {
>   		pagefault_enable();
> 


* Re: [PATCH] powerpc/xmon: avoid warnings about variables that might be clobbered by ‘longjmp’
From: Segher Boessenkool @ 2018-06-23 19:47 UTC (permalink / raw)
  To: christophe leroy
  Cc: Mathieu Malaterre, Michael Ellerman, Yisheng Xie, Vaibhav Jain,
	Nicholas Piggin, linux-kernel, Paul Mackerras, Breno Leitao,
	linuxppc-dev
In-Reply-To: <b41b244e-78b8-c85f-9122-427ad04cccf8@c-s.fr>

On Sat, Jun 23, 2018 at 06:59:27PM +0200, christophe leroy wrote:
> 
> 
> On 22/06/2018 at 21:27, Mathieu Malaterre wrote:
> >Move initialization of variables after data definitions. This silences
> >warnings treated as errors with W=1:
> >
> >   arch/powerpc/xmon/xmon.c:3389:14: error: variable ‘name’ might be 
> >   clobbered by ‘longjmp’ or ‘vfork’ [-Werror=clobbered]
> >   arch/powerpc/xmon/xmon.c:3100:22: error: variable ‘tsk’ might be 
> >   clobbered by ‘longjmp’ or ‘vfork’ [-Werror=clobbered]
> 
> Is that an invalid warning ?

No, both are correct warnings.  GCC cannot tell which of the functions
it only has a declaration for might call longjmp.

> Otherwise, I'd expect one to fix the warning, not just cheat on GCC.

Yes, the patch seems to change the code in such a way that some versions
of GCC will no longer warn.  Which does not make the code any more correct.

Either restructure the code, or make the var non-automatic, or make it
volatile.
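
For reference, a small userspace sketch of the volatile option (the
function names are hypothetical; this is not the xmon code). The C standard
leaves a non-volatile automatic variable indeterminate after longjmp() if
it was modified between the setjmp() and the longjmp(); qualifying the
variable itself as volatile makes its value well defined.

```c
#include <setjmp.h>
#include <stddef.h>

static jmp_buf env;

/* Hypothetical stand-in for an access that may fault and longjmp() out. */
static void may_fault(int fault)
{
	if (fault)
		longjmp(env, 1);
}

static const char *lookup(int fault)
{
	/*
	 * The pointer itself is volatile (not what it points to), so the
	 * value stored before longjmp() is the one read back afterwards.
	 */
	const char *volatile name = NULL;

	if (setjmp(env) == 0) {
		name = "found";
		may_fault(fault);
		name = "complete";
	}
	return name;
}
```

Note the placement of the qualifier: `const char *volatile name` protects
the variable, whereas `volatile const char *name` would only qualify the
pointed-to characters.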


Segher


* [PATCH 0/7] Add initial version of "cognitive DMA"
From: Timothy Pearson @ 2018-06-23 23:52 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman

POWER9 (PHB4) requires all peripherals using DMA to be either restricted
to 32-bit windows or capable of accessing the entire 64 bits of memory
space.  Some devices, such as most GPUs, can only address up to a certain
number of bits (approximately 40, in many cases), while at the same time
it is highly desirable to use a larger DMA space than the fallback 32 bits.

This series adds something called "cognitive DMA", which is a form of dynamic
TCE allocation.  This allows the peripheral to DMA to host addresses mapped in
1G (PHB4) or 256M (PHB3) chunks, and is transparent to the peripheral and its
driver stack.

This series has been tested on a Talos II server with a Radeon WX4100 and
a wide range of OpenGL applications.  While there is still work to do, notably
involving what happens if a peripheral attempts to DMA close to a TCE
window boundary, this series greatly improves functionality for AMD GPUs
on POWER9 devices over the existing 32-bit DMA support.

Russell Currey (4):
  powerpc/powernv/pci: Track largest available TCE order per PHB
  powerpc/powernv: DMA operations for discontiguous allocation
  powerpc/powernv/pci: Track DMA and TCE tables in debugfs
  powerpc/powernv/pci: Safety fixes for pseudobypass TCE allocation

Timothy Pearson (3):
  powerpc/powernv/pci: Export pnv_pci_ioda2_tce_invalidate_pe
  powerpc/powernv/pci: Invalidate TCE cache after DMA map setup
  powerpc/powernv/pci: Don't use the lower 4G TCEs in pseudo-DMA mode

 arch/powerpc/include/asm/dma-mapping.h    |   1 +
 arch/powerpc/platforms/powernv/Makefile   |   2 +-
 arch/powerpc/platforms/powernv/pci-dma.c  | 320 ++++++++++++++++++++++
 arch/powerpc/platforms/powernv/pci-ioda.c | 169 ++++++++----
 arch/powerpc/platforms/powernv/pci.h      |  11 +
 5 files changed, 452 insertions(+), 51 deletions(-)
 create mode 100644 arch/powerpc/platforms/powernv/pci-dma.c

-- 
2.17.1


* [PATCH 2/7] powerpc/powernv: DMA operations for discontiguous allocation
From: Timothy Pearson @ 2018-06-23 23:53 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman

Cognitive DMA is a new set of DMA operations that solve some issues for
devices that want to address more than 32 bits but can't address the 59
bits required to enable direct DMA.

The previous implementation for POWER8/PHB3 worked around this by
configuring a bypass from the default 32-bit address space into 64-bit
address space.  This approach does not work for POWER9/PHB4 because
regions of memory are discontiguous and many devices will be unable to
address memory beyond the first node.

Instead, implement a new set of DMA operations that allocate TCEs as DMA
mappings are requested so that all memory is addressable even when a
one-to-one mapping between real addresses and DMA addresses isn't
possible.  These TCEs are the maximum size available on the platform,
which is 256M on PHB3 and 1G on PHB4.

Devices can now map any region of memory up to the maximum amount they can
address according to the DMA mask set, in chunks of the largest available
TCE size.
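
The address arithmetic this implies can be sketched in plain C as follows.
This is a simplified illustration under assumptions, not the patch's code:
the function names are invented here, and `order` stands for the log2 of
the TCE size (30 for 1G, 28 for 256M).

```c
#include <stdint.h>

/* Offset of a physical address within its TCE-sized chunk. */
static uint64_t chunk_offset(uint64_t phys, unsigned int order)
{
	return phys & ((UINT64_C(1) << order) - 1);
}

/* Physical base of the chunk, i.e. what the TCE entry would point at. */
static uint64_t chunk_base(uint64_t phys, unsigned int order)
{
	return phys - chunk_offset(phys, order);
}

/*
 * DMA address handed to the device: the index of the TCE that maps the
 * chunk selects the window, the offset is carried over unchanged.
 */
static uint64_t dma_address(uint64_t tce_index, uint64_t phys,
			    unsigned int order)
{
	return (tce_index << order) | chunk_offset(phys, order);
}
```

For example, with 1G TCEs a buffer at physical address 0x123456789 mapped
by TCE index 2 would be seen by the device at 0xa3456789, which fits in
32 bits even though the physical address does not.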

This implementation replaces the need for the existing PHB3 solution and
should be compatible with future PHB versions.

Signed-off-by: Russell Currey <ruscur@russell.cc>
---
 arch/powerpc/include/asm/dma-mapping.h    |   1 +
 arch/powerpc/platforms/powernv/Makefile   |   2 +-
 arch/powerpc/platforms/powernv/pci-dma.c  | 319 ++++++++++++++++++++++
 arch/powerpc/platforms/powernv/pci-ioda.c | 102 +++----
 arch/powerpc/platforms/powernv/pci.h      |   7 +
 5 files changed, 381 insertions(+), 50 deletions(-)
 create mode 100644 arch/powerpc/platforms/powernv/pci-dma.c

diff --git a/arch/powerpc/include/asm/dma-mapping.h b/arch/powerpc/include/asm/dma-mapping.h
index 8fa394520af6..354f435160f3 100644
--- a/arch/powerpc/include/asm/dma-mapping.h
+++ b/arch/powerpc/include/asm/dma-mapping.h
@@ -74,6 +74,7 @@ static inline unsigned long device_to_mask(struct device *dev)
 extern struct dma_map_ops dma_iommu_ops;
 #endif
 extern const struct dma_map_ops dma_nommu_ops;
+extern const struct dma_map_ops dma_pseudo_bypass_ops;
 
 static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
 {
diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile
index 703a350a7f4e..2467bdab3c13 100644
--- a/arch/powerpc/platforms/powernv/Makefile
+++ b/arch/powerpc/platforms/powernv/Makefile
@@ -6,7 +6,7 @@ obj-y			+= opal-msglog.o opal-hmi.o opal-power.o opal-irqchip.o
 obj-y			+= opal-kmsg.o opal-powercap.o opal-psr.o opal-sensor-groups.o
 
 obj-$(CONFIG_SMP)	+= smp.o subcore.o subcore-asm.o
-obj-$(CONFIG_PCI)	+= pci.o pci-ioda.o npu-dma.o
+obj-$(CONFIG_PCI)	+= pci.o pci-ioda.o npu-dma.o pci-dma.o
 obj-$(CONFIG_CXL_BASE)	+= pci-cxl.o
 obj-$(CONFIG_EEH)	+= eeh-powernv.o
 obj-$(CONFIG_PPC_SCOM)	+= opal-xscom.o
diff --git a/arch/powerpc/platforms/powernv/pci-dma.c b/arch/powerpc/platforms/powernv/pci-dma.c
new file mode 100644
index 000000000000..1d5409be343e
--- /dev/null
+++ b/arch/powerpc/platforms/powernv/pci-dma.c
@@ -0,0 +1,319 @@
+/*
+ * DMA operations supporting pseudo-bypass for PHB3+
+ *
+ * Author: Russell Currey <ruscur@russell.cc>
+ *
+ * Copyright 2018 IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; either version 2 of the License, or (at your
+ * option) any later version.
+ */
+
+#include <linux/export.h>
+#include <linux/memblock.h>
+#include <linux/device.h>
+#include <linux/dma-mapping.h>
+#include <linux/hash.h>
+
+#include <asm/pci-bridge.h>
+#include <asm/ppc-pci.h>
+#include <asm/pnv-pci.h>
+#include <asm/tce.h>
+
+#include "pci.h"
+
+/* select and allocate a TCE using the bitmap */
+static int dma_pseudo_bypass_select_tce(struct pnv_ioda_pe *pe, phys_addr_t addr)
+{
+	int tce;
+	__be64 old, new;
+
+	spin_lock(&pe->tce_alloc_lock);
+	tce = bitmap_find_next_zero_area(pe->tce_bitmap,
+					 pe->tce_count,
+					 0,
+					 1,
+					 0);
+	bitmap_set(pe->tce_bitmap, tce, 1);
+	old = pe->tces[tce];
+	new = cpu_to_be64(addr | TCE_PCI_READ | TCE_PCI_WRITE);
+	pe->tces[tce] = new;
+	pe_info(pe, "allocating TCE %i 0x%016llx (old 0x%016llx)\n",
+		tce, new, old);
+	spin_unlock(&pe->tce_alloc_lock);
+
+	return tce;
+}
+
+/*
+ * The tracking table for assigning TCEs has two entries per TCE.
+ * - @entry1 contains the physical address and the smallest bit indicates
+ *     if it's currently valid.
+ * - @entry2 contains the DMA address returned in the upper 34 bits, and a
+ *     refcount in the lower 30 bits.
+ */
+static dma_addr_t dma_pseudo_bypass_get_address(struct device *dev,
+					    phys_addr_t addr)
+{
+	struct pci_dev *pdev = container_of(dev, struct pci_dev, dev);
+	struct pci_controller *hose = pci_bus_to_host(pdev->bus);
+	struct pnv_phb *phb = hose->private_data;
+	struct pnv_ioda_pe *pe;
+	u64 i, entry1, entry2, dma_prefix, tce, ret;
+	u64 offset = addr & ((1 << phb->ioda.max_tce_order) - 1);
+
+	pe = &phb->ioda.pe_array[pci_get_pdn(pdev)->pe_number];
+
+	/* look through the tracking table for a free entry */
+	for (i = 0; i < pe->tce_count; i++) {
+		entry1 = pe->tce_tracker[i * 2];
+		entry2 = pe->tce_tracker[i * 2 + 1];
+		dma_prefix = entry2 >> 34;
+
+		/* if the address is the same and the entry is valid */
+		if (entry1 == ((addr - offset) | 1)) {
+			/* all we need to do here is increment the refcount */
+			ret = cmpxchg(&pe->tce_tracker[i * 2 + 1],
+				      entry2, entry2 + 1);
+			if (ret != entry2) {
+				/* conflict, start looking again just in case */
+				i--;
+				continue;
+			}
+			return (dma_prefix << phb->ioda.max_tce_order) | offset;
+		/* if the entry is invalid then we want to replace it */
+		} else if (!(entry1 & 1)) {
+			/* set the real address, note that it isn't valid yet */
+			ret = cmpxchg(&pe->tce_tracker[i * 2],
+				      entry1, (addr - offset));
+			if (ret != entry1) {
+				/* conflict, start looking again */
+				i--;
+				continue;
+			}
+
+			/* now we can allocate a TCE */
+			tce = dma_pseudo_bypass_select_tce(pe, addr - offset);
+
+			/* set new value, including TCE index and new refcount */
+			ret = cmpxchg(&pe->tce_tracker[i * 2 + 1],
+				      entry2, tce << 34 | 1);
+			if (ret != entry2) {
+				/*
+				 * XXX In this case we need to throw out
+				 * everything, including the TCE we just
+				 * allocated.  For now, just leave it.
+				 */
+				i--;
+				continue;
+			}
+
+			/* now set the valid bit */
+			ret = cmpxchg(&pe->tce_tracker[i * 2],
+				      (addr - offset), (addr - offset) | 1);
+			if (ret != (addr - offset)) {
+				/*
+				 * XXX Same situation as above.  We'd probably
+				 * want to null out entry2 as well.
+				 */
+				i--;
+				continue;
+			}
+			return (tce << phb->ioda.max_tce_order) | offset;
+		/* it's a valid entry but not ours, keep looking */
+		} else {
+			continue;
+		}
+	}
+	/* If we get here, the table must be full, so error out. */
+	return -1ULL;
+}
+
+/*
+ * For the moment, unmapping just decrements the refcount and doesn't actually
+ * remove the TCE.  This is because it's very likely that a previously allocated
+ * TCE will be used again, and this saves having to invalidate it.
+ *
+ * TODO implement some kind of garbage collection that clears unused TCE entries
+ * once the table reaches a certain size.
+ */
+static void dma_pseudo_bypass_unmap_address(struct device *dev, dma_addr_t dma_addr)
+{
+	struct pci_dev *pdev = container_of(dev, struct pci_dev, dev);
+	struct pci_controller *hose = pci_bus_to_host(pdev->bus);
+	struct pnv_phb *phb = hose->private_data;
+	struct pnv_ioda_pe *pe;
+	u64 i, entry1, entry2, dma_prefix, refcount;
+
+	pe = &phb->ioda.pe_array[pci_get_pdn(pdev)->pe_number];
+
+	for (i = 0; i < pe->tce_count; i++) {
+		entry1 = pe->tce_tracker[i * 2];
+		entry2 = pe->tce_tracker[i * 2 + 1];
+		dma_prefix = entry2 >> 34;
+		refcount = entry2 & ((1 << 30) - 1);
+
+		/* look through entry2 until we find our address */
+		if (dma_prefix == (dma_addr >> phb->ioda.max_tce_order)) {
+			refcount--;
+			cmpxchg(&pe->tce_tracker[i * 2 + 1], entry2, (dma_prefix << 34) | refcount);
+			if (!refcount) {
+				/*
+				 * Here is where we would remove the valid bit
+				 * from entry1, clear the entry in the TCE table
+				 * and invalidate the TCE - but we want to leave
+				 * them until the table fills up (for now).
+				 */
+			}
+			break;
+		}
+	}
+}
+
+static int dma_pseudo_bypass_dma_supported(struct device *dev, u64 mask)
+{
+	/*
+	 * Normally dma_supported() checks if the mask is capable of addressing
+	 * all of memory.  Since we map physical memory in chunks that the
+	 * device can address, the device will be able to address whatever it
+	 * wants - just not all at once.
+	 */
+	return 1;
+}
+
+static void *dma_pseudo_bypass_alloc_coherent(struct device *dev,
+					  size_t size,
+					  dma_addr_t *dma_handle,
+					  gfp_t flag,
+					  unsigned long attrs)
+{
+	void *ret;
+	struct page *page;
+	int node = dev_to_node(dev);
+
+	/* ignore region specifiers */
+	flag &= ~(__GFP_HIGHMEM);
+
+	page = alloc_pages_node(node, flag, get_order(size));
+	if (page == NULL)
+		return NULL;
+	ret = page_address(page);
+	memset(ret, 0, size);
+	*dma_handle = dma_pseudo_bypass_get_address(dev, __pa(ret));
+
+	return ret;
+}
+
+static void dma_pseudo_bypass_free_coherent(struct device *dev,
+					 size_t size,
+					 void *vaddr,
+					 dma_addr_t dma_handle,
+					 unsigned long attrs)
+{
+	free_pages((unsigned long)vaddr, get_order(size));
+}
+
+static int dma_pseudo_bypass_mmap_coherent(struct device *dev,
+				       struct vm_area_struct *vma,
+				       void *cpu_addr,
+				       dma_addr_t handle,
+				       size_t size,
+				       unsigned long attrs)
+{
+	unsigned long pfn = page_to_pfn(virt_to_page(cpu_addr));
+
+	return remap_pfn_range(vma, vma->vm_start,
+			       pfn + vma->vm_pgoff,
+			       vma->vm_end - vma->vm_start,
+			       vma->vm_page_prot);
+}
+
+static inline dma_addr_t dma_pseudo_bypass_map_page(struct device *dev,
+						struct page *page,
+						unsigned long offset,
+						size_t size,
+						enum dma_data_direction dir,
+						unsigned long attrs)
+{
+	BUG_ON(dir == DMA_NONE);
+
+	/* XXX I don't know if this is necessary (or even desired) */
+	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
+		__dma_sync_page(page, offset, size, dir);
+
+	return dma_pseudo_bypass_get_address(dev, page_to_phys(page) + offset);
+}
+
+static inline void dma_pseudo_bypass_unmap_page(struct device *dev,
+					 dma_addr_t dma_address,
+					 size_t size,
+					 enum dma_data_direction direction,
+					 unsigned long attrs)
+{
+	dma_pseudo_bypass_unmap_address(dev, dma_address);
+}
+
+
+static int dma_pseudo_bypass_map_sg(struct device *dev, struct scatterlist *sgl,
+			     int nents, enum dma_data_direction direction,
+			     unsigned long attrs)
+{
+	struct scatterlist *sg;
+	int i;
+
+
+	for_each_sg(sgl, sg, nents, i) {
+		sg->dma_address = dma_pseudo_bypass_get_address(dev, sg_phys(sg));
+		sg->dma_length = sg->length;
+
+		if (attrs & DMA_ATTR_SKIP_CPU_SYNC)
+			continue;
+
+		__dma_sync_page(sg_page(sg), sg->offset, sg->length, direction);
+	}
+
+	return nents;
+}
+
+static void dma_pseudo_bypass_unmap_sg(struct device *dev, struct scatterlist *sgl,
+				int nents, enum dma_data_direction direction,
+				unsigned long attrs)
+{
+	struct scatterlist *sg;
+	int i;
+
+	for_each_sg(sgl, sg, nents, i) {
+		dma_pseudo_bypass_unmap_address(dev, sg->dma_address);
+	}
+}
+
+static u64 dma_pseudo_bypass_get_required_mask(struct device *dev)
+{
+	/*
+	 * there's no limitation on our end, the driver should just call
+	 * set_mask() with as many bits as the device can address.
+	 */
+	return -1ULL;
+}
+
+static int dma_pseudo_bypass_mapping_error(struct device *dev, dma_addr_t dma_addr)
+{
+	return dma_addr == -1ULL;
+}
+
+
+const struct dma_map_ops dma_pseudo_bypass_ops = {
+	.alloc				= dma_pseudo_bypass_alloc_coherent,
+	.free				= dma_pseudo_bypass_free_coherent,
+	.mmap				= dma_pseudo_bypass_mmap_coherent,
+	.map_sg				= dma_pseudo_bypass_map_sg,
+	.unmap_sg			= dma_pseudo_bypass_unmap_sg,
+	.dma_supported			= dma_pseudo_bypass_dma_supported,
+	.map_page			= dma_pseudo_bypass_map_page,
+	.unmap_page			= dma_pseudo_bypass_unmap_page,
+	.get_required_mask		= dma_pseudo_bypass_get_required_mask,
+	.mapping_error			= dma_pseudo_bypass_mapping_error,
+};
+EXPORT_SYMBOL(dma_pseudo_bypass_ops);
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index bcb3bfce072a..7ecc186493ca 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -25,6 +25,7 @@
 #include <linux/iommu.h>
 #include <linux/rculist.h>
 #include <linux/sizes.h>
+#include <linux/vmalloc.h>
 
 #include <asm/sections.h>
 #include <asm/io.h>
@@ -1088,6 +1089,9 @@ static struct pnv_ioda_pe *pnv_ioda_setup_dev_PE(struct pci_dev *dev)
 	pe->pbus = NULL;
 	pe->mve_number = -1;
 	pe->rid = dev->bus->number << 8 | pdn->devfn;
+	pe->tces = NULL;
+	pe->tce_tracker = NULL;
+	pe->tce_bitmap = NULL;
 
 	pe_info(pe, "Associated device to PE\n");
 
@@ -1569,6 +1573,9 @@ static void pnv_ioda_setup_vf_PE(struct pci_dev *pdev, u16 num_vfs)
 		pe->mve_number = -1;
 		pe->rid = (pci_iov_virtfn_bus(pdev, vf_index) << 8) |
 			   pci_iov_virtfn_devfn(pdev, vf_index);
+		pe->tces = NULL;
+		pe->tce_tracker = NULL;
+		pe->tce_bitmap = NULL;
 
 		pe_info(pe, "VF %04d:%02d:%02d.%d associated with PE#%x\n",
 			hose->global_number, pdev->bus->number,
@@ -1774,43 +1781,40 @@ static bool pnv_pci_ioda_pe_single_vendor(struct pnv_ioda_pe *pe)
 	return true;
 }
 
-/*
- * Reconfigure TVE#0 to be usable as 64-bit DMA space.
- *
- * The first 4GB of virtual memory for a PE is reserved for 32-bit accesses.
- * Devices can only access more than that if bit 59 of the PCI address is set
- * by hardware, which indicates TVE#1 should be used instead of TVE#0.
- * Many PCI devices are not capable of addressing that many bits, and as a
- * result are limited to the 4GB of virtual memory made available to 32-bit
- * devices in TVE#0.
- *
- * In order to work around this, reconfigure TVE#0 to be suitable for 64-bit
- * devices by configuring the virtual memory past the first 4GB inaccessible
- * by 64-bit DMAs.  This should only be used by devices that want more than
- * 4GB, and only on PEs that have no 32-bit devices.
- *
- * Currently this will only work on PHB3 (POWER8).
- */
-static int pnv_pci_ioda_dma_64bit_bypass(struct pnv_ioda_pe *pe)
+static int pnv_pci_pseudo_bypass_setup(struct pnv_ioda_pe *pe)
 {
-	u64 window_size, table_size, tce_count, addr;
+	u64 tce_count, table_size, window_size;
+	struct pnv_phb *p = pe->phb;
 	struct page *table_pages;
-	u64 tce_order = 28; /* 256MB TCEs */
 	__be64 *tces;
-	s64 rc;
+	int rc = -ENOMEM;
+	int bitmap_size, tracker_entries;
+
+	/*
+	 * XXX These are factors for scaling the size of the TCE table, and
+	 * the table that tracks these allocations.  These should eventually
+	 * be kernel command line options with defaults above 1, for situations
+	 * where your memory expands after the machine has booted.
+	 */
+	int tce_size_factor = 1;
+	int tracking_table_factor = 1;
 
 	/*
-	 * Window size needs to be a power of two, but needs to account for
-	 * shifting memory by the 4GB offset required to skip 32bit space.
+	 * The window size covers all of memory (and optionally more), with
+	 * enough tracker entries to cover them all being allocated.  So we
+	 * create enough TCEs to cover all of memory at once.
 	 */
-	window_size = roundup_pow_of_two(memory_hotplug_max() + (1ULL << 32));
-	tce_count = window_size >> tce_order;
+	window_size = roundup_pow_of_two(tce_size_factor * memory_hotplug_max());
+	tracker_entries = (tracking_table_factor * memory_hotplug_max()) >>
+		p->ioda.max_tce_order;
+	tce_count = window_size >> p->ioda.max_tce_order;
+	bitmap_size = BITS_TO_LONGS(tce_count) * sizeof(unsigned long);
 	table_size = tce_count << 3;
 
 	if (table_size < PAGE_SIZE)
 		table_size = PAGE_SIZE;
 
-	table_pages = alloc_pages_node(pe->phb->hose->node, GFP_KERNEL,
+	table_pages = alloc_pages_node(p->hose->node, GFP_KERNEL,
 				       get_order(table_size));
 	if (!table_pages)
 		goto err;
@@ -1821,26 +1825,33 @@ static int pnv_pci_ioda_dma_64bit_bypass(struct pnv_ioda_pe *pe)
 
 	memset(tces, 0, table_size);
 
-	for (addr = 0; addr < memory_hotplug_max(); addr += (1 << tce_order)) {
-		tces[(addr + (1ULL << 32)) >> tce_order] =
-			cpu_to_be64(addr | TCE_PCI_READ | TCE_PCI_WRITE);
-	}
+	pe->tces = tces;
+	pe->tce_count = tce_count;
+	pe->tce_bitmap = kzalloc(bitmap_size, GFP_KERNEL);
+	/* The tracking table has two u64s per TCE */
+	pe->tce_tracker = vzalloc(sizeof(u64) * 2 * tracker_entries);
+	spin_lock_init(&pe->tce_alloc_lock);
+
+	/* mark the first 4GB as reserved so this can still be used for 32bit */
+	bitmap_set(pe->tce_bitmap, 0, 1ULL << (32 - p->ioda.max_tce_order));
+
+	pe_info(pe, "pseudo-bypass sizes: tracker %d bitmap %d TCEs %lld\n",
+		tracker_entries, bitmap_size, tce_count);
 
 	rc = opal_pci_map_pe_dma_window(pe->phb->opal_id,
 					pe->pe_number,
-					/* reconfigure window 0 */
 					(pe->pe_number << 1) + 0,
 					1,
 					__pa(tces),
 					table_size,
-					1 << tce_order);
+					1 << p->ioda.max_tce_order);
 	if (rc == OPAL_SUCCESS) {
-		pe_info(pe, "Using 64-bit DMA iommu bypass (through TVE#0)\n");
+		pe_info(pe, "TCE tables configured for pseudo-bypass\n");
 		return 0;
 	}
 err:
-	pe_err(pe, "Error configuring 64-bit DMA bypass\n");
-	return -EIO;
+	pe_err(pe, "error configuring pseudo-bypass\n");
+	return rc;
 }
 
 static int pnv_pci_ioda_dma_set_mask(struct pci_dev *pdev, u64 dma_mask)
@@ -1851,7 +1862,6 @@ static int pnv_pci_ioda_dma_set_mask(struct pci_dev *pdev, u64 dma_mask)
 	struct pnv_ioda_pe *pe;
 	uint64_t top;
 	bool bypass = false;
-	s64 rc;
 
 	if (WARN_ON(!pdn || pdn->pe_number == IODA_INVALID_PE))
 		return -ENODEV;
@@ -1868,21 +1878,15 @@ static int pnv_pci_ioda_dma_set_mask(struct pci_dev *pdev, u64 dma_mask)
 	} else {
 		/*
 		 * If the device can't set the TCE bypass bit but still wants
-		 * to access 4GB or more, on PHB3 we can reconfigure TVE#0 to
-		 * bypass the 32-bit region and be usable for 64-bit DMAs.
-		 * The device needs to be able to address all of this space.
+		 * to access 4GB or more, we need to use a different set of DMA
+		 * operations with an indirect mapping.
 		 */
 		if (dma_mask >> 32 &&
-		    dma_mask > (memory_hotplug_max() + (1ULL << 32)) &&
-		    pnv_pci_ioda_pe_single_vendor(pe) &&
-		    phb->model == PNV_PHB_MODEL_PHB3) {
-			/* Configure the bypass mode */
-			rc = pnv_pci_ioda_dma_64bit_bypass(pe);
-			if (rc)
-				return rc;
-			/* 4GB offset bypasses 32-bit space */
-			set_dma_offset(&pdev->dev, (1ULL << 32));
-			set_dma_ops(&pdev->dev, &dma_nommu_ops);
+		    phb->model != PNV_PHB_MODEL_P7IOC &&
+		    pnv_pci_ioda_pe_single_vendor(pe)) {
+			if (!pe->tces)
+				pnv_pci_pseudo_bypass_setup(pe);
+			set_dma_ops(&pdev->dev, &dma_pseudo_bypass_ops);
 		} else if (dma_mask >> 32 && dma_mask != DMA_BIT_MASK(64)) {
 			/*
 			 * Fail the request if a DMA mask between 32 and 64 bits
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index c9952def5e93..83492aba90f1 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -70,6 +70,13 @@ struct pnv_ioda_pe {
 	bool			tce_bypass_enabled;
 	uint64_t		tce_bypass_base;
 
+	/* TCE tables for DMA pseudo-bypass */
+	__be64			*tces;
+	u64			tce_count;
+	unsigned long		*tce_bitmap;
+	u64			*tce_tracker; // 2 u64s per TCE
+	spinlock_t		tce_alloc_lock;
+
 	/* MSIs. MVE index is identical for for 32 and 64 bit MSI
 	 * and -1 if not supported. (It's actually identical to the
 	 * PE number)
-- 
2.17.1
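
The unmap path in the patch above reads the second u64 of each tracker entry as a DMA prefix in the bits from 34 upward and a reference count in the low 30 bits. A minimal user-space sketch of that packing follows. The helper names are hypothetical, and the guard against decrementing a zero refcount is an addition of this sketch, not something the patch does.

```c
#include <assert.h>
#include <stdint.h>

/* Layout taken from the shifts and masks in the unmap function above. */
#define TRACKER_PREFIX_SHIFT	34
#define TRACKER_REFCOUNT_MASK	((UINT64_C(1) << 30) - 1)

/* Hypothetical helper: build the second tracker word from its two fields. */
static uint64_t tracker_pack(uint64_t dma_prefix, uint64_t refcount)
{
	assert(refcount <= TRACKER_REFCOUNT_MASK);
	return (dma_prefix << TRACKER_PREFIX_SHIFT) | refcount;
}

/* Hypothetical helper: DMA prefix stored in the upper bits. */
static uint64_t tracker_prefix(uint64_t entry2)
{
	return entry2 >> TRACKER_PREFIX_SHIFT;
}

/* Hypothetical helper: reference count stored in the low 30 bits. */
static uint64_t tracker_refcount(uint64_t entry2)
{
	return entry2 & TRACKER_REFCOUNT_MASK;
}

/*
 * Drop one reference and return the updated word.  Unlike the patch, this
 * sketch leaves a zero refcount at zero instead of letting it wrap.
 */
static uint64_t tracker_put(uint64_t entry2)
{
	uint64_t refcount = tracker_refcount(entry2);

	if (refcount > 0)
		refcount--;
	return tracker_pack(tracker_prefix(entry2), refcount);
}
```

Packing prefix 5 with refcount 2 and calling tracker_put() twice leaves the prefix intact with a refcount of zero, which is the state the patch keeps around instead of clearing the TCE.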

^ permalink raw reply related

* [PATCH 1/7] powerpc/powernv/pci: Track largest available TCE order per PHB
From: Timothy Pearson @ 2018-06-23 23:52 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman

Knowing the largest possible TCE size of a PHB is useful, so get it out
of the device tree.  This relies on the property being added in OPAL.

It is assumed that any PHB4 or later machine would be running firmware
that implemented this property, and otherwise assumed to be PHB3, which
has a maximum TCE order of 28 bits or 256MB TCEs.

This is used later in the series.
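
As a rough illustration of the arithmetic, a TCE order is the log2 of the TCE page size in bytes, so order 28 gives 256MB and order 30 gives 1GB. The sketch below is plain user-space C, not the kernel code. The helper names are hypothetical, and the list of orders passed in stands in for the contents of "ibm,supported-tce-sizes".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* PHB3 fallback named in the commit message: 256MB TCEs. */
#define DEFAULT_TCE_ORDER	28u

/* Size in bytes of one TCE page of the given order. */
static uint64_t tce_page_size(unsigned int order)
{
	assert(order < 64);
	return UINT64_C(1) << order;
}

/*
 * Return the largest order in the list, or the PHB3 default when the
 * list is absent or empty (mirroring the missing-property case).
 */
static unsigned int max_tce_order(const uint32_t *orders, size_t count)
{
	unsigned int max = 0;
	size_t i;

	if (orders == NULL || count == 0)
		return DEFAULT_TCE_ORDER;

	for (i = 0; i < count; i++)
		if (orders[i] > max)
			max = orders[i];

	return max;
}
```

With a made-up list of orders {12, 16, 21, 30}, max_tce_order() returns 30 and tce_page_size(30) is 1GB; with no list it returns 28.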

Signed-off-by: Russell Currey <ruscur@russell.cc>
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 16 ++++++++++++++++
 arch/powerpc/platforms/powernv/pci.h      |  3 +++
 2 files changed, 19 insertions(+)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 5bd0eb6681bc..bcb3bfce072a 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -3873,11 +3873,13 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 	struct resource r;
 	const __be64 *prop64;
 	const __be32 *prop32;
+	struct property *prop;
 	int len;
 	unsigned int segno;
 	u64 phb_id;
 	void *aux;
 	long rc;
+	u32 val;
 
 	if (!of_device_is_available(np))
 		return;
@@ -4016,6 +4018,20 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 	}
 	phb->ioda.pe_array = aux + pemap_off;
 
+	phb->ioda.max_tce_order = 0;
+	// Get TCE order from the DT.  If it's not present, assume P8
+	if (!of_get_property(np, "ibm,supported-tce-sizes", NULL)) {
+		phb->ioda.max_tce_order = 28; // assume P8 256mb TCEs
+	} else {
+		of_property_for_each_u32(np, "ibm,supported-tce-sizes", prop,
+					 prop32, val) {
+			if (val > phb->ioda.max_tce_order)
+				phb->ioda.max_tce_order = val;
+		}
+		pr_debug("PHB%llx Found max TCE order of %d bits\n",
+			 phb->opal_id, phb->ioda.max_tce_order);
+	}
+
 	/*
 	 * Choose PE number for root bus, which shouldn't have
 	 * M64 resources consumed by its child devices. To pick
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index eada4b6068cb..c9952def5e93 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -173,6 +173,9 @@ struct pnv_phb {
 		struct list_head	pe_list;
 		struct mutex            pe_list_mutex;
 
+		/* Largest supported TCE order bits */
+		uint8_t			max_tce_order;
+
 		/* Reverse map of PEs, indexed by {bus, devfn} */
 		unsigned int		pe_rmap[0x10000];
 	} ioda;
-- 
2.17.1

^ permalink raw reply related

* [PATCH 6/7] powerpc/powernv/pci: Invalidate TCE cache after DMA map setup
From: Timothy Pearson @ 2018-06-23 23:54 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman

Per the IODA2, TCEs must be invalidated after their settings
have been changed.  Invalidate the cache after the address
is changed during TCE allocation when using pseudo DMA.

Signed-off-by: Timothy Pearson <tpearson@raptorengineering.com>
---
 arch/powerpc/platforms/powernv/pci-dma.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-dma.c b/arch/powerpc/platforms/powernv/pci-dma.c
index 237940a2a052..060dbc168401 100644
--- a/arch/powerpc/platforms/powernv/pci-dma.c
+++ b/arch/powerpc/platforms/powernv/pci-dma.c
@@ -42,8 +42,7 @@ static int dma_pseudo_bypass_select_tce(struct pnv_ioda_pe *pe, phys_addr_t addr
 	new = cpu_to_be64(addr | TCE_PCI_READ | TCE_PCI_WRITE);
 	pe->tces[tce] = new;
 	mb();
-	pe_info(pe, "allocating TCE %i 0x%016llx (old 0x%016llx)\n",
-		tce, new, old);
+	pnv_pci_ioda2_tce_invalidate_pe(pe);
 	spin_unlock_irqrestore(&pe->tce_alloc_lock, flags);
 
 	return tce;
-- 
2.17.1

^ permalink raw reply related

* [PATCH 5/7] powerpc/powernv/pci: Export pnv_pci_ioda2_tce_invalidate_pe
From: Timothy Pearson @ 2018-06-23 23:54 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman

Pseudo DMA support requires a method to invalidate the TCE cache.
Export pnv_pci_ioda2_tce_invalidate_pe for use by the pseudo DMA
mapper.

Signed-off-by: Timothy Pearson <tpearson@raptorengineering.com>
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 2 +-
 arch/powerpc/platforms/powernv/pci.h      | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 55f0f7b885bc..a6097dd323f8 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -2102,7 +2102,7 @@ static void pnv_pci_phb3_tce_invalidate(struct pnv_ioda_pe *pe, bool rm,
 	}
 }
 
-static inline void pnv_pci_ioda2_tce_invalidate_pe(struct pnv_ioda_pe *pe)
+void pnv_pci_ioda2_tce_invalidate_pe(struct pnv_ioda_pe *pe)
 {
 	struct pnv_phb *phb = pe->phb;
 
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index 83492aba90f1..8d3849e76be3 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -264,6 +264,7 @@ extern void pe_level_printk(const struct pnv_ioda_pe *pe, const char *level,
 /* Nvlink functions */
 extern void pnv_npu_try_dma_set_bypass(struct pci_dev *gpdev, bool bypass);
 extern void pnv_pci_ioda2_tce_invalidate_entire(struct pnv_phb *phb, bool rm);
+extern void pnv_pci_ioda2_tce_invalidate_pe(struct pnv_ioda_pe *pe);
 extern struct pnv_ioda_pe *pnv_pci_npu_setup_iommu(struct pnv_ioda_pe *npe);
 extern long pnv_npu_set_window(struct pnv_ioda_pe *npe, int num,
 		struct iommu_table *tbl);
-- 
2.17.1

^ permalink raw reply related

* [PATCH 7/7] powerpc/powernv/pci: Don't use the lower 4G TCEs in pseudo-DMA mode
From: Timothy Pearson @ 2018-06-23 23:54 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman

Four TCEs are reserved for legacy 32-bit DMA mappings in pseudo DMA
mode.  Mark these with an invalid address to avoid their use by
the TCE cache mapper.
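
For reference, the number of TCEs covering the low 4GB depends on the TCE order: the setup code reserves 1 << (32 - order) bitmap bits, which is 4 entries for 1GB TCEs (order 30) and would be 16 for 256MB TCEs (order 28). The sketch below derives the count from the order instead of fixing it at four. That is an assumption about the intent, not what this patch does, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Number of TCEs needed to cover the low 4GB at a given TCE order. */
static uint64_t reserved_tce_count(unsigned int order)
{
	assert(order <= 32);
	return UINT64_C(1) << (32 - order);
}

/*
 * Set the address word of each reserved tracker entry to all ones so that
 * no lookup matches it.  The tracker holds two u64s per TCE, so the
 * address word of entry i lives at index i * 2.
 */
static void reserve_low_tces(uint64_t *tracker, unsigned int order)
{
	uint64_t i, n = reserved_tce_count(order);

	for (i = 0; i < n; i++)
		tracker[i * 2] = UINT64_MAX;
}
```

Calling reserve_low_tces(tracker, 30) marks entries 0 to 3, matching the hard-coded loop in the patch on a PHB4 with 1GB TCEs.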

Signed-off-by: Timothy Pearson <tpearson@raptorengineering.com>
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index a6097dd323f8..e8a1333f6b3e 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1783,7 +1783,7 @@ static bool pnv_pci_ioda_pe_single_vendor(struct pnv_ioda_pe *pe)
 
 static int pnv_pci_pseudo_bypass_setup(struct pnv_ioda_pe *pe)
 {
-	u64 tce_count, table_size, window_size;
+	u64 i, tce_count, table_size, window_size;
 	struct pnv_phb *p = pe->phb;
 	struct page *table_pages;
 	__be64 *tces;
@@ -1835,6 +1835,12 @@ static int pnv_pci_pseudo_bypass_setup(struct pnv_ioda_pe *pe)
 	/* mark the first 4GB as reserved so this can still be used for 32bit */
 	bitmap_set(pe->tce_bitmap, 0, 1ULL << (32 - p->ioda.max_tce_order));
 
+	/* make sure reserved first 4GB TCEs are not used by the mapper
+	 * set each address to -1, which will never match an incoming request
+	 */
+	for (i = 0; i < 4; i++)
+		pe->tce_tracker[i * 2] = -1;
+
 	pe_info(pe, "pseudo-bypass sizes: tracker %d bitmap %d TCEs %lld\n",
 		tracker_entries, bitmap_size, tce_count);
 
-- 
2.17.1

^ permalink raw reply related

* [PATCH 3/7] powerpc/powernv/pci: Track DMA and TCE tables in debugfs
From: Timothy Pearson @ 2018-06-23 23:53 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman


Add a new debugfs entry to trigger dumping out the tracking table and
TCEs for a given PE, for example PE 0x4 of PHB 2:

echo 0x4 > /sys/kernel/debug/powerpc/PCI0002/sketchy

This will result in the table being dumped out in dmesg.

Signed-off-by: Russell Currey <ruscur@russell.cc>
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 43 +++++++++++++++++++++++
 1 file changed, 43 insertions(+)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 7ecc186493ca..55f0f7b885bc 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -3342,6 +3342,47 @@ static int pnv_pci_diag_data_set(void *data, u64 val)
 DEFINE_SIMPLE_ATTRIBUTE(pnv_pci_diag_data_fops, NULL,
 			pnv_pci_diag_data_set, "%llu\n");
 
+static int pnv_pci_sketchy_set(void *data, u64 val)
+{
+	struct pci_controller *hose;
+	struct pnv_ioda_pe *pe;
+	struct pnv_phb *phb;
+	u64 entry1, entry2;
+	int i;
+
+	hose = (struct pci_controller *)data;
+	if (!hose || !hose->private_data)
+		return -ENODEV;
+
+	phb = hose->private_data;
+	pe = &phb->ioda.pe_array[val];
+
+	if (!pe)
+		return -EINVAL;
+
+	if (!pe->tces || !pe->tce_tracker)
+		return -EIO;
+
+	for (i = 0; i < pe->tce_count; i++) {
+		if (i > 16 && pe->tces[i] == 0)
+			break;
+		pr_info("%3d: %016llx\n", i, be64_to_cpu(pe->tces[i]));
+	}
+
+	for (i = 0; i < pe->tce_count; i++) {
+		entry1 = pe->tce_tracker[i * 2];
+		entry2 = pe->tce_tracker[i * 2 + 1];
+		if (!entry1)
+			break;
+		pr_info("%3d: %016llx %016llx\n", i, entry1, entry2);
+	}
+	return 0;
+}
+
+DEFINE_SIMPLE_ATTRIBUTE(pnv_pci_sketchy_fops, NULL,
+			pnv_pci_sketchy_set, "%llu\n");
+
+
 #endif /* CONFIG_DEBUG_FS */
 
 static void pnv_pci_ioda_create_dbgfs(void)
@@ -3367,6 +3408,8 @@ static void pnv_pci_ioda_create_dbgfs(void)
 
 		debugfs_create_file("dump_diag_regs", 0200, phb->dbgfs, hose,
 				    &pnv_pci_diag_data_fops);
+		debugfs_create_file("sketchy", 0200, phb->dbgfs, hose,
+				    &pnv_pci_sketchy_fops);
 	}
 #endif /* CONFIG_DEBUG_FS */
 }
-- 
2.17.1

^ permalink raw reply related

* [PATCH 4/7] powerpc/powernv/pci: Safety fixes for pseudobypass TCE allocation
From: Timothy Pearson @ 2018-06-23 23:53 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman

Signed-off-by: Russell Currey <ruscur@russell.cc>
---
 arch/powerpc/platforms/powernv/pci-dma.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-dma.c b/arch/powerpc/platforms/powernv/pci-dma.c
index 1d5409be343e..237940a2a052 100644
--- a/arch/powerpc/platforms/powernv/pci-dma.c
+++ b/arch/powerpc/platforms/powernv/pci-dma.c
@@ -29,8 +29,9 @@ static int dma_pseudo_bypass_select_tce(struct pnv_ioda_pe *pe, phys_addr_t addr
 {
 	int tce;
 	__be64 old, new;
+	unsigned long flags;
 
-	spin_lock(&pe->tce_alloc_lock);
+	spin_lock_irqsave(&pe->tce_alloc_lock, flags);
 	tce = bitmap_find_next_zero_area(pe->tce_bitmap,
 					 pe->tce_count,
 					 0,
@@ -40,9 +41,10 @@ static int dma_pseudo_bypass_select_tce(struct pnv_ioda_pe *pe, phys_addr_t addr
 	old = pe->tces[tce];
 	new = cpu_to_be64(addr | TCE_PCI_READ | TCE_PCI_WRITE);
 	pe->tces[tce] = new;
+	mb();
 	pe_info(pe, "allocating TCE %i 0x%016llx (old 0x%016llx)\n",
 		tce, new, old);
-	spin_unlock(&pe->tce_alloc_lock);
+	spin_unlock_irqrestore(&pe->tce_alloc_lock, flags);
 
 	return tce;
 }
-- 
2.17.1

^ permalink raw reply related

* Re: [PATCH v3 00/12] macintosh: Resolve various PMU driver problems
From: Finn Thain @ 2018-06-24 11:35 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Michael Schmitz, linuxppc-dev, linux-m68k, linux-kernel
In-Reply-To: <cover.1528885172.git.fthain@telegraphics.com.au>

On Wed, 13 Jun 2018, I wrote:

> Finn Thain (12):
>   macintosh/via-pmu: Fix section mismatch warning
>   macintosh/via-pmu: Add missing mmio accessors
>   macintosh/via-pmu: Don't clear shift register interrupt flag twice
>   macintosh/via-pmu: Enhance state machine with new 'uninitialized'
>     state
>   macintosh/via-pmu: Replace via pointer with via1 and via2 pointers
>   macintosh/via-pmu: Add support for m68k PowerBooks
>   macintosh/via-pmu: Make CONFIG_PPC_PMAC Kconfig deps explicit
>   macintosh/via-pmu68k: Don't load driver on unsupported hardware
>   macintosh/via-pmu: Replace via-pmu68k driver with via-pmu driver
>   macintosh: Use common code to access RTC
>   macintosh/via-pmu: Clean up interrupt statistics
>   macintosh/via-pmu: Disambiguate interrupt statistics
> 

Patch 10/12 ("macintosh: Use common code to access RTC") conflicts with 
Arnd's RTC work, but the rest of this series can still be reviewed and 
merged as-is.

I will rewrite patch 10/12 after Arnd's fixes and this series have all 
made their way through both powerpc and m68k trees, and submit it 
separately.

Thanks.

-- 

^ permalink raw reply

* Re: [PATCH 0/7] Add initial version of "cognitive DMA"
From: Russell Currey @ 2018-06-25  1:09 UTC (permalink / raw)
  To: Timothy Pearson, linuxppc-dev; +Cc: Paul Mackerras
In-Reply-To: <1564865529.2569245.1529797922226.JavaMail.zimbra@raptorengineeringinc.com>

On Sat, 2018-06-23 at 18:52 -0500, Timothy Pearson wrote:

There's still more to do and this shouldn't be merged yet - would
encourage anyone with suitable hardware to test though.

> POWER9 (PHB4) requires all peripherals using DMA to be either
> restricted
> to 32-bit windows or capable of accessing the entire 64 bits of
> memory
> space.  Some devices, such as most GPUs, can only address up to a
> certain
> number of bits (approximately 40, in many cases), while at the same
> time
> it is highly desirable to use a larger DMA space than the fallback
> 32 bits.
> 
> This series adds something called "cognitive DMA", which is a form of
> dynamic
> TCE allocation.  This allows the peripheral to DMA to host addresses
> mapped in
> 1G (PHB4) or 256M (PHB3) chunks, and is transparent to the peripheral
> and its
> driver stack.
> 
> This series has been tested on a Talos II server with a Radeon WX4100
> and
> a wide range of OpenGL applications.  While there is still work,
> notably
> involving what happens if a peripheral attempts to DMA close to a TCE
> window boundary, this series greatly improves functionality for AMD
> GPUs
> on POWER9 devices over the existing 32-bit DMA support.
> 
> Russell Currey (4):
>   powerpc/powernv/pci: Track largest available TCE order per PHB
>   powerpc/powernv: DMA operations for discontiguous allocation
>   powerpc/powernv/pci: Track DMA and TCE tables in debugfs
>   powerpc/powernv/pci: Safety fixes for pseudobypass TCE allocation
> 
> Timothy Pearson (3):
>   powerpc/powernv/pci: Export pnv_pci_ioda2_tce_invalidate_pe
>   powerpc/powernv/pci: Invalidate TCE cache after DMA map setup
>   powerpc/powernv/pci: Don't use the lower 4G TCEs in pseudo-DMA mode
> 
>  arch/powerpc/include/asm/dma-mapping.h    |   1 +
>  arch/powerpc/platforms/powernv/Makefile   |   2 +-
>  arch/powerpc/platforms/powernv/pci-dma.c  | 320
> ++++++++++++++++++++++
>  arch/powerpc/platforms/powernv/pci-ioda.c | 169 ++++++++----
>  arch/powerpc/platforms/powernv/pci.h      |  11 +
>  5 files changed, 452 insertions(+), 51 deletions(-)
>  create mode 100644 arch/powerpc/platforms/powernv/pci-dma.c
> 
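
To make the chunked mapping concrete, here is a small user-space sketch of how a bus address could relate to a TCE slot. It assumes the bus address is the TCE index shifted left by the TCE order, with the low bits carrying the offset inside the chunk, which is what the comparison against dma_addr >> max_tce_order in the unmap path suggests. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Mask selecting the offset inside one TCE page of the given order. */
static uint64_t tce_offset_mask(unsigned int order)
{
	assert(order < 64);
	return (UINT64_C(1) << order) - 1;
}

/* Bus address for a host physical address mapped through TCE slot tce_index. */
static uint64_t dma_addr_compose(uint64_t tce_index, uint64_t phys,
				 unsigned int order)
{
	return (tce_index << order) | (phys & tce_offset_mask(order));
}

/* TCE slot that a bus address falls into. */
static uint64_t dma_addr_tce_index(uint64_t dma_addr, unsigned int order)
{
	return dma_addr >> order;
}

/* Start of the host memory chunk that the TCE would have to point at. */
static uint64_t phys_chunk_base(uint64_t phys, unsigned int order)
{
	return phys & ~tce_offset_mask(order);
}
```

With 1GB TCEs (order 30), a host address of 0x123456789a mapped through slot 4 gives the bus address 0x13456789a, which fits well below a 40-bit limit even though the host address itself sits above 64GB.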

^ permalink raw reply

* Re: [PATCH 0/7] Add initial version of "cognitive DMA"
From: Timothy Pearson @ 2018-06-25  1:11 UTC (permalink / raw)
  To: Russell Currey; +Cc: linuxppc-dev, Paul Mackerras
In-Reply-To: <4b2ca540d0784bddd9e901fdd50eb73033290823.camel@russell.cc>

When should we be targeting merge?  At this point this is a substantial
improvement over currently shipping kernels for our systems, and we
don't really want to have to ship a patched / custom OS kernel if we can
avoid it.

On 06/24/2018 08:09 PM, Russell Currey wrote:
> On Sat, 2018-06-23 at 18:52 -0500, Timothy Pearson wrote:
> 
> There's still more to do and this shouldn't be merged yet - would
> encourage anyone with suitable hardware to test though.
> 
>> POWER9 (PHB4) requires all peripherals using DMA to be either
>> restricted
>> to 32-bit windows or capable of accessing the entire 64 bits of
>> memory
>> space.  Some devices, such as most GPUs, can only address up to a
>> certain
>> number of bits (approximately 40, in many cases), while at the same
>> time
>> it is highly desirable to use a larger DMA space than the fallback
>> 32 bits.
>>
>> This series adds something called "cognitive DMA", which is a form of
>> dynamic
>> TCE allocation.  This allows the peripheral to DMA to host addresses
>> mapped in
>> 1G (PHB4) or 256M (PHB3) chunks, and is transparent to the peripheral
>> and its
>> driver stack.
>>
>> This series has been tested on a Talos II server with a Radeon WX4100
>> and
>> a wide range of OpenGL applications.  While there is still work,
>> notably
>> involving what happens if a peripheral attempts to DMA close to a TCE
>> window boundary, this series greatly improves functionality for AMD
>> GPUs
>> on POWER9 devices over the existing 32-bit DMA support.
>>
>> Russell Currey (4):
>>   powerpc/powernv/pci: Track largest available TCE order per PHB
>>   powerpc/powernv: DMA operations for discontiguous allocation
>>   powerpc/powernv/pci: Track DMA and TCE tables in debugfs
>>   powerpc/powernv/pci: Safety fixes for pseudobypass TCE allocation
>>
>> Timothy Pearson (3):
>>   powerpc/powernv/pci: Export pnv_pci_ioda2_tce_invalidate_pe
>>   powerpc/powernv/pci: Invalidate TCE cache after DMA map setup
>>   powerpc/powernv/pci: Don't use the lower 4G TCEs in pseudo-DMA mode
>>
>>  arch/powerpc/include/asm/dma-mapping.h    |   1 +
>>  arch/powerpc/platforms/powernv/Makefile   |   2 +-
>>  arch/powerpc/platforms/powernv/pci-dma.c  | 320
>> ++++++++++++++++++++++
>>  arch/powerpc/platforms/powernv/pci-ioda.c | 169 ++++++++----
>>  arch/powerpc/platforms/powernv/pci.h      |  11 +
>>  5 files changed, 452 insertions(+), 51 deletions(-)
>>  create mode 100644 arch/powerpc/platforms/powernv/pci-dma.c
>>


-- 
Timothy Pearson
Raptor Engineering
+1 (415) 727-8645 (direct line)
+1 (512) 690-0200 (switchboard)
https://www.raptorengineering.com

^ permalink raw reply

* Re: [PATCH 1/2] powerpc: Document issues with the DAWR on POWER9
From: Michael Neuling @ 2018-06-25  1:30 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: mpe, linuxppc-dev
In-Reply-To: <20180622172308.GQ16221@gate.crashing.org>

On Fri, 2018-06-22 at 12:23 -0500, Segher Boessenkool wrote:
> On Fri, Jun 22, 2018 at 04:14:51PM +1000, Michael Neuling wrote:
> > +will accept the command. Unfortunatley since there is no hardware
>
> "unfortunately".
>
> > +speed since it can use the hardware emualation. Unfortnatley if this
>
> It is not your favourite word to type ;-)

Or emulation apparently :-/

Thanks.
Mikey

^ permalink raw reply

