* [PATCH 0/7] Add initial version of "cognitive DMA"
@ 2018-06-23 23:52 Timothy Pearson
2018-06-25 1:09 ` Russell Currey
0 siblings, 1 reply; 3+ messages in thread
From: Timothy Pearson @ 2018-06-23 23:52 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
POWER9 (PHB4) requires all peripherals using DMA to be either restricted
to 32-bit windows or capable of accessing the entire 64 bits of memory
space. Some devices, such as most GPUs, can only address up to a certain
number of bits (approximately 40, in many cases), while at the same time
it is highly desirable to use a larger DMA space than the fallback 32 bits.
This series adds something called "cognitive DMA", which is a form of dynamic
TCE allocation. This allows the peripheral to DMA to host addresses mapped in
1G (PHB4) or 256M (PHB3) chunks, and is transparent to the peripheral and its
driver stack.
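
As a rough illustration of the idea (this is not the code in this series; the
names toy_tce_table and toy_dma_map, the 40-bit device limit, and the
simplified TCE format are assumptions made for the example), dynamic
allocation of 1G TCEs can be modelled as a small table lookup. Each entry maps
one 1G chunk of host memory, the entries covering the lower 4G are left
unused, and the device address is the entry index combined with the offset
inside the chunk:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All names and constants below are hypothetical, chosen for illustration. */
#define TOY_TCE_SHIFT  30                          /* 1G chunks, as on PHB4 */
#define TOY_TCE_SIZE   (1ULL << TOY_TCE_SHIFT)
#define TOY_TCE_MASK   (TOY_TCE_SIZE - 1)
#define TOY_DMA_BITS   40                          /* assumed device limit */
#define TOY_NUM_TCES   (1ULL << (TOY_DMA_BITS - TOY_TCE_SHIFT))
#define TOY_FIRST_TCE  4                           /* skip TCEs below 4G */
#define TOY_TCE_VALID  1ULL
#define TOY_DMA_ERROR  UINT64_MAX

struct toy_tce_table {
	/* Host chunk base with TOY_TCE_VALID set, or 0 when the entry is free. */
	uint64_t tce[TOY_NUM_TCES];
};

/*
 * Return a device-visible address for host_addr, reusing an entry that
 * already maps the same 1G host chunk or claiming the first free entry.
 * Returns TOY_DMA_ERROR when the table is full.
 */
static uint64_t toy_dma_map(struct toy_tce_table *tbl, uint64_t host_addr)
{
	uint64_t want = (host_addr & ~TOY_TCE_MASK) | TOY_TCE_VALID;
	uint64_t offset = host_addr & TOY_TCE_MASK;
	size_t i, free_idx = 0;        /* 0 is never usable, so it marks "none" */

	assert(tbl != NULL);

	for (i = TOY_FIRST_TCE; i < TOY_NUM_TCES; i++) {
		if (tbl->tce[i] == want)
			return ((uint64_t)i << TOY_TCE_SHIFT) | offset;
		if (!free_idx && !(tbl->tce[i] & TOY_TCE_VALID))
			free_idx = i;
	}

	if (!free_idx)
		return TOY_DMA_ERROR;

	tbl->tce[free_idx] = want;
	return ((uint64_t)free_idx << TOY_TCE_SHIFT) | offset;
}
```

With a zero-initialized table, two host addresses in the same 1G chunk share
one entry, while an address in a different chunk claims the next free entry.
A real implementation would also need locking and a TCE cache invalidation
after the table is updated, which this sketch leaves out.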
This series has been tested on a Talos II server with a Radeon WX4100 and
a wide range of OpenGL applications. While there is still work to do, notably
involving what happens if a peripheral attempts to DMA close to a TCE
window boundary, this series greatly improves functionality for AMD GPUs
on POWER9 devices over the existing 32-bit DMA support.
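
One possible reading of the boundary concern is that adjacent TCEs need not
map adjacent host memory, so a transfer that runs past the end of one TCE page
could land in an unrelated chunk. The check below is only a sketch under that
assumption; toy_crosses_tce_boundary and the 1G page size are illustrative
and do not come from the series:

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_TCE_SHIFT 30	/* 1G TCE pages, as on PHB4 (assumed here) */

/*
 * Return true when [dma_addr, dma_addr + len) touches more than one TCE
 * page. Assumes dma_addr + len does not wrap around 64 bits.
 */
static bool toy_crosses_tce_boundary(uint64_t dma_addr, uint64_t len)
{
	if (len == 0)
		return false;

	return (dma_addr >> TOY_TCE_SHIFT) !=
	       ((dma_addr + len - 1) >> TOY_TCE_SHIFT);
}
```

A mapping path could use a check like this to refuse or split a request that
spans two TCE pages.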
Russell Currey (4):
powerpc/powernv/pci: Track largest available TCE order per PHB
powerpc/powernv: DMA operations for discontiguous allocation
powerpc/powernv/pci: Track DMA and TCE tables in debugfs
powerpc/powernv/pci: Safety fixes for pseudobypass TCE allocation
Timothy Pearson (3):
powerpc/powernv/pci: Export pnv_pci_ioda2_tce_invalidate_pe
powerpc/powernv/pci: Invalidate TCE cache after DMA map setup
powerpc/powernv/pci: Don't use the lower 4G TCEs in pseudo-DMA mode
arch/powerpc/include/asm/dma-mapping.h | 1 +
arch/powerpc/platforms/powernv/Makefile | 2 +-
arch/powerpc/platforms/powernv/pci-dma.c | 320 ++++++++++++++++++++++
arch/powerpc/platforms/powernv/pci-ioda.c | 169 ++++++++----
arch/powerpc/platforms/powernv/pci.h | 11 +
5 files changed, 452 insertions(+), 51 deletions(-)
create mode 100644 arch/powerpc/platforms/powernv/pci-dma.c
--
2.17.1
* Re: [PATCH 0/7] Add initial version of "cognitive DMA"
2018-06-23 23:52 [PATCH 0/7] Add initial version of "cognitive DMA" Timothy Pearson
@ 2018-06-25 1:09 ` Russell Currey
2018-06-25 1:11 ` Timothy Pearson
0 siblings, 1 reply; 3+ messages in thread
From: Russell Currey @ 2018-06-25 1:09 UTC (permalink / raw)
To: Timothy Pearson, linuxppc-dev; +Cc: Paul Mackerras
On Sat, 2018-06-23 at 18:52 -0500, Timothy Pearson wrote:
There's still more to do and this shouldn't be merged yet - would
encourage anyone with suitable hardware to test though.
> POWER9 (PHB4) requires all peripherals using DMA to be either
> restricted
> to 32-bit windows or capable of accessing the entire 64 bits of
> memory
> space. Some devices, such as most GPUs, can only address up to a
> certain
> number of bits (approximately 40, in many cases), while at the same
> time
> it is highly desirable to use a larger DMA space than the fallback
> 32 bits.
>
> This series adds something called "cognitive DMA", which is a form of
> dynamic
> TCE allocation. This allows the peripheral to DMA to host addresses
> mapped in
> 1G (PHB4) or 256M (PHB3) chunks, and is transparent to the peripheral
> and its
> driver stack.
>
> This series has been tested on a Talos II server with a Radeon WX4100
> and
> a wide range of OpenGL applications. While there is still work,
> notably
> involving what happens if a peripheral attempts to DMA close to a TCE
> window boundary, this series greatly improves functionality for AMD
> GPUs
> on POWER9 devices over the existing 32-bit DMA support.
>
> Russell Currey (4):
> powerpc/powernv/pci: Track largest available TCE order per PHB
> powerpc/powernv: DMA operations for discontiguous allocation
> powerpc/powernv/pci: Track DMA and TCE tables in debugfs
> powerpc/powernv/pci: Safety fixes for pseudobypass TCE allocation
>
> Timothy Pearson (3):
> powerpc/powernv/pci: Export pnv_pci_ioda2_tce_invalidate_pe
> powerpc/powernv/pci: Invalidate TCE cache after DMA map setup
> powerpc/powernv/pci: Don't use the lower 4G TCEs in pseudo-DMA mode
>
> arch/powerpc/include/asm/dma-mapping.h | 1 +
> arch/powerpc/platforms/powernv/Makefile | 2 +-
> arch/powerpc/platforms/powernv/pci-dma.c | 320 ++++++++++++++++++++++
> arch/powerpc/platforms/powernv/pci-ioda.c | 169 ++++++++----
> arch/powerpc/platforms/powernv/pci.h | 11 +
> 5 files changed, 452 insertions(+), 51 deletions(-)
> create mode 100644 arch/powerpc/platforms/powernv/pci-dma.c
>
* Re: [PATCH 0/7] Add initial version of "cognitive DMA"
2018-06-25 1:09 ` Russell Currey
@ 2018-06-25 1:11 ` Timothy Pearson
0 siblings, 0 replies; 3+ messages in thread
From: Timothy Pearson @ 2018-06-25 1:11 UTC (permalink / raw)
To: Russell Currey; +Cc: linuxppc-dev, Paul Mackerras
When should we be targeting merge? At this point this is a substantial
improvement over currently shipping kernels for our systems, and we
don't really want to have to ship a patched / custom OS kernel if we can
avoid it.
On 06/24/2018 08:09 PM, Russell Currey wrote:
> On Sat, 2018-06-23 at 18:52 -0500, Timothy Pearson wrote:
>
> There's still more to do and this shouldn't be merged yet - would
> encourage anyone with suitable hardware to test though.
>
>> POWER9 (PHB4) requires all peripherals using DMA to be either
>> restricted
>> to 32-bit windows or capable of accessing the entire 64 bits of
>> memory
>> space. Some devices, such as most GPUs, can only address up to a
>> certain
>> number of bits (approximately 40, in many cases), while at the same
>> time
>> it is highly desirable to use a larger DMA space than the fallback
>> 32 bits.
>>
>> This series adds something called "cognitive DMA", which is a form of
>> dynamic
>> TCE allocation. This allows the peripheral to DMA to host addresses
>> mapped in
>> 1G (PHB4) or 256M (PHB3) chunks, and is transparent to the peripheral
>> and its
>> driver stack.
>>
>> This series has been tested on a Talos II server with a Radeon WX4100
>> and
>> a wide range of OpenGL applications. While there is still work,
>> notably
>> involving what happens if a peripheral attempts to DMA close to a TCE
>> window boundary, this series greatly improves functionality for AMD
>> GPUs
>> on POWER9 devices over the existing 32-bit DMA support.
>>
>> Russell Currey (4):
>> powerpc/powernv/pci: Track largest available TCE order per PHB
>> powerpc/powernv: DMA operations for discontiguous allocation
>> powerpc/powernv/pci: Track DMA and TCE tables in debugfs
>> powerpc/powernv/pci: Safety fixes for pseudobypass TCE allocation
>>
>> Timothy Pearson (3):
>> powerpc/powernv/pci: Export pnv_pci_ioda2_tce_invalidate_pe
>> powerpc/powernv/pci: Invalidate TCE cache after DMA map setup
>> powerpc/powernv/pci: Don't use the lower 4G TCEs in pseudo-DMA mode
>>
>> arch/powerpc/include/asm/dma-mapping.h | 1 +
>> arch/powerpc/platforms/powernv/Makefile | 2 +-
>> arch/powerpc/platforms/powernv/pci-dma.c | 320 ++++++++++++++++++++++
>> arch/powerpc/platforms/powernv/pci-ioda.c | 169 ++++++++----
>> arch/powerpc/platforms/powernv/pci.h | 11 +
>> 5 files changed, 452 insertions(+), 51 deletions(-)
>> create mode 100644 arch/powerpc/platforms/powernv/pci-dma.c
>>
--
Timothy Pearson
Raptor Engineering
+1 (415) 727-8645 (direct line)
+1 (512) 690-0200 (switchboard)
https://www.raptorengineering.com