LinuxPPC-Dev Archive on lore.kernel.org

* Re: [PATCH] powerpc/mm: Don't report PUDs as memory leaks when using kmemleak
From: Michael Ellerman @ 2018-07-30  6:43 UTC
  To: Paul Menzel, linuxppc-dev; +Cc: aneesh.kumar
In-Reply-To: <e653db46-a829-4673-4378-4c0afef03cde@molgen.mpg.de>

Paul Menzel <pmenzel@molgen.mpg.de> writes:
> Am 19.07.2018 um 16:33 schrieb Michael Ellerman:
...
>>
>> The fix is fairly simple. We need to tell kmemleak to ignore PUD
>> allocations and never report them as leaks. We can also tell it not to
>> scan the PGD, because it will never find pointers in there. However it
>> will still notice if we allocate a PGD and then leak it.
>>
>> Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
>> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
>> ---
>>   arch/powerpc/include/asm/book3s/64/pgalloc.h | 23 +++++++++++++++++++++--
>>   1 file changed, 21 insertions(+), 2 deletions(-)
>
> […]
>
> Tested-by: Paul Menzel <pmenzel@molgen.mpg.de> on IBM S822LC

Thanks.

cheers
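Why kmemleak cannot see the PUD references can be illustrated with a small userspace model. kmemleak treats an object as leaked when a scan of memory finds no value pointing anywhere inside it. This sketch assumes, in line with the patch description ("it will never find pointers in there"), that a PGD entry holds an encoded address rather than a raw kernel pointer; the encoding used here ("virtual minus an offset, plus flag bits") is a hypothetical stand-in. FAKE_PAGE_OFFSET, FAKE_ENTRY_FLAGS, encode_entry() and scan_finds_ref() are illustrative names, not kernel APIs. In the kernel itself, the corresponding kmemleak calls are kmemleak_ignore() (do not scan, never report) and kmemleak_no_scan() (do not scan, but still report).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for __pa(): pretend the linear mapping starts at
 * this offset, so an entry's "physical" address is virtual minus offset. */
#define FAKE_PAGE_OFFSET ((uintptr_t)0x1000000)
/* Hypothetical flag bits OR'd into an entry, as page-table entries do. */
#define FAKE_ENTRY_FLAGS ((uintptr_t)0x3)

/* Encode a PUD pointer the way this model's PGD entry stores it. */
static uintptr_t encode_entry(const void *pud)
{
	return ((uintptr_t)pud - FAKE_PAGE_OFFSET) | FAKE_ENTRY_FLAGS;
}

/* kmemleak-style scan: does any word in table[0..n) point anywhere
 * inside the object [obj, obj + size)? */
static bool scan_finds_ref(const uintptr_t *table, size_t n,
			   const void *obj, size_t size)
{
	uintptr_t start = (uintptr_t)obj;
	uintptr_t end = start + size;

	for (size_t i = 0; i < n; i++)
		if (table[i] >= start && table[i] < end)
			return true;
	return false;
}
```

A table holding the raw pointer to a 64-byte "PUD" makes scan_finds_ref() return true, while a table holding only encode_entry() of the same pointer makes it return false: the scanner concludes nothing references the PUD, which is the false leak report the patch suppresses.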

* Re: [PATCH v4 09/11] macintosh/via-pmu: Replace via-pmu68k driver with via-pmu driver
From: Michael Ellerman @ 2018-07-30  6:47 UTC
  To: Geert Uytterhoeven
  Cc: Finn Thain, Benjamin Herrenschmidt, Michael Schmitz, linux-m68k,
	linuxppc-dev, linux-kernel
In-Reply-To: <03e778fa5a025ac72f6d11c519939c5f8dbc6b8c.1530519301.git.fthain@telegraphics.com.au>

Finn Thain <fthain@telegraphics.com.au> writes:

> Now that the PowerMac via-pmu driver supports m68k PowerBooks,
> switch over to that driver and remove the via-pmu68k driver.
>
> Cc: Geert Uytterhoeven <geert@linux-m68k.org>
> Tested-by: Stan Johnson <userm57@yahoo.com>
> Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
> ---
>  arch/m68k/configs/mac_defconfig   |   2 +-
>  arch/m68k/configs/multi_defconfig |   2 +-
>  arch/m68k/mac/config.c            |   2 +-
>  arch/m68k/mac/misc.c              |  48 +--
>  drivers/macintosh/Kconfig         |  13 +-
>  drivers/macintosh/Makefile        |   1 -
>  drivers/macintosh/adb.c           |   2 +-
>  drivers/macintosh/via-pmu68k.c    | 846 --------------------------------------
>  include/uapi/linux/pmu.h          |   2 +-
>  9 files changed, 14 insertions(+), 904 deletions(-)
>  delete mode 100644 drivers/macintosh/via-pmu68k.c

Geert, are you OK with this and the other one that touches arch/m68k?

cheers

* RE: [PATCH] Adds __init annotation at mmu_init_secondary func
From: Michael Ellerman @ 2018-07-30  6:57 UTC
  To: Alexey Spirkov, Benjamin Herrenschmidt, Paul Mackerras,
	linuxppc-dev@lists.ozlabs.org
  Cc: trivial@kernel.org, andrew@ncrmnt.org
In-Reply-To: <AM5PR03MB28496A9D6FD7B175312E48F6C92A0@AM5PR03MB2849.eurprd03.prod.outlook.com>

Alexey Spirkov <AlexeiS@astrosoft.ru> writes:

> Without any additional options:
>
> WARNING: modpost: Found 1 section mismatch(es).
>
> If detailed debugging is switched on, then:
>
> WARNING: vmlinux.o(.text+0x142ac): Section mismatch in reference from the function mmu_init_secondary() to the function .init.text:ppc44x_pin_tlb()
> The function mmu_init_secondary() references
> the function __init ppc44x_pin_tlb().
> This is often because mmu_init_secondary lacks a __init 
> annotation or the annotation of ppc44x_pin_tlb is wrong.

Ah right, thanks.

I checked ppc47x_pin_tlb() but didn't spot the call to ppc44x_pin_tlb().

cheers

* Re: [PATCH v4 08/11] macintosh/via-pmu68k: Don't load driver on unsupported hardware
From: Geert Uytterhoeven @ 2018-07-30  7:23 UTC
  To: Finn Thain
  Cc: Benjamin Herrenschmidt, Michael Schmitz, linuxppc-dev, linux-m68k,
	Linux Kernel Mailing List
In-Reply-To: <3923d42d8e3a5e3ae3382d3354744fb6648d03a8.1530519301.git.fthain@telegraphics.com.au>

On Mon, Jul 2, 2018 at 10:21 AM Finn Thain <fthain@telegraphics.com.au> wrote:
> Don't load the via-pmu68k driver on early PowerBooks. The M50753 PMU
> device found in those models was never supported by this driver.
> Attempting to load the driver usually causes a boot hang.
>
> Cc: Geert Uytterhoeven <geert@linux-m68k.org>
> Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
> Reviewed-by: Michael Schmitz <schmitzmic@gmail.com>

Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

* Re: [PATCH v4 09/11] macintosh/via-pmu: Replace via-pmu68k driver with via-pmu driver
From: Geert Uytterhoeven @ 2018-07-30  7:26 UTC
  To: Finn Thain
  Cc: Benjamin Herrenschmidt, Michael Schmitz, linuxppc-dev, linux-m68k,
	Linux Kernel Mailing List
In-Reply-To: <03e778fa5a025ac72f6d11c519939c5f8dbc6b8c.1530519301.git.fthain@telegraphics.com.au>

On Mon, Jul 2, 2018 at 10:21 AM Finn Thain <fthain@telegraphics.com.au> wrote:
> Now that the PowerMac via-pmu driver supports m68k PowerBooks,
> switch over to that driver and remove the via-pmu68k driver.
>
> Cc: Geert Uytterhoeven <geert@linux-m68k.org>
> Tested-by: Stan Johnson <userm57@yahoo.com>
> Signed-off-by: Finn Thain <fthain@telegraphics.com.au>

Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>

Gr{oetje,eeting}s,

                        Geert

* Re: [PATCH v4 09/11] macintosh/via-pmu: Replace via-pmu68k driver with via-pmu driver
From: Geert Uytterhoeven @ 2018-07-30  7:27 UTC
  To: Finn Thain
  Cc: Benjamin Herrenschmidt, Michael Schmitz, linuxppc-dev, linux-m68k,
	Linux Kernel Mailing List
In-Reply-To: <CAMuHMdU_7V0n6HCNR87pmxYSMJxC=WpbtChpdzT2DQi3VWBZ7Q@mail.gmail.com>

On Mon, Jul 30, 2018 at 9:26 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> On Mon, Jul 2, 2018 at 10:21 AM Finn Thain <fthain@telegraphics.com.au> wrote:
> > Now that the PowerMac via-pmu driver supports m68k PowerBooks,
> > switch over to that driver and remove the via-pmu68k driver.
> >
> > Cc: Geert Uytterhoeven <geert@linux-m68k.org>
> > Tested-by: Stan Johnson <userm57@yahoo.com>
> > Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
>
> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>

Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>

Gr{oetje,eeting}s,

                        Geert

* Re: [PATCH v4 09/11] macintosh/via-pmu: Replace via-pmu68k driver with via-pmu driver
From: Geert Uytterhoeven @ 2018-07-30  7:27 UTC
  To: Michael Ellerman
  Cc: Finn Thain, Benjamin Herrenschmidt, Michael Schmitz, linux-m68k,
	linuxppc-dev, Linux Kernel Mailing List
In-Reply-To: <87tvohi5ji.fsf@concordia.ellerman.id.au>

Hi Michael,

On Mon, Jul 30, 2018 at 8:47 AM Michael Ellerman <mpe@ellerman.id.au> wrote:
> Finn Thain <fthain@telegraphics.com.au> writes:
> > Now that the PowerMac via-pmu driver supports m68k PowerBooks,
> > switch over to that driver and remove the via-pmu68k driver.
> >
> > Cc: Geert Uytterhoeven <geert@linux-m68k.org>
> > Tested-by: Stan Johnson <userm57@yahoo.com>
> > Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
> > ---
> >  arch/m68k/configs/mac_defconfig   |   2 +-
> >  arch/m68k/configs/multi_defconfig |   2 +-
> >  arch/m68k/mac/config.c            |   2 +-
> >  arch/m68k/mac/misc.c              |  48 +--
> >  drivers/macintosh/Kconfig         |  13 +-
> >  drivers/macintosh/Makefile        |   1 -
> >  drivers/macintosh/adb.c           |   2 +-
> >  drivers/macintosh/via-pmu68k.c    | 846 --------------------------------------
> >  include/uapi/linux/pmu.h          |   2 +-
> >  9 files changed, 14 insertions(+), 904 deletions(-)
> >  delete mode 100644 drivers/macintosh/via-pmu68k.c
>
> Geert, are you OK with this and the other one that touches arch/m68k?

Sure, feel free to take them through the PPC tree.
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>

Thanks!

Gr{oetje,eeting}s,

                        Geert

* [PATCH] PCI: call dma_debug_add_bus for pci_bus_type in common code
From: Christoph Hellwig @ 2018-07-30  7:38 UTC
  To: linux-pci; +Cc: iommu, linuxppc-dev, x86, linux-sh, linux-kernel

There is nothing arch-specific about PCI or dma-debug, so move this
call to common code just after registering the bus type.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 arch/powerpc/kernel/dma.c | 3 ---
 arch/sh/drivers/pci/pci.c | 2 --
 arch/x86/kernel/pci-dma.c | 3 ---
 drivers/pci/pci-driver.c  | 2 +-
 4 files changed, 1 insertion(+), 9 deletions(-)

diff --git a/arch/powerpc/kernel/dma.c b/arch/powerpc/kernel/dma.c
index 155170d70324..dbfc7056d7df 100644
--- a/arch/powerpc/kernel/dma.c
+++ b/arch/powerpc/kernel/dma.c
@@ -357,9 +357,6 @@ EXPORT_SYMBOL_GPL(dma_get_required_mask);
 
 static int __init dma_init(void)
 {
-#ifdef CONFIG_PCI
-	dma_debug_add_bus(&pci_bus_type);
-#endif
 #ifdef CONFIG_IBMVIO
 	dma_debug_add_bus(&vio_bus_type);
 #endif
diff --git a/arch/sh/drivers/pci/pci.c b/arch/sh/drivers/pci/pci.c
index e5b7437ab4af..8256626bc53c 100644
--- a/arch/sh/drivers/pci/pci.c
+++ b/arch/sh/drivers/pci/pci.c
@@ -160,8 +160,6 @@ static int __init pcibios_init(void)
 	for (hose = hose_head; hose; hose = hose->next)
 		pcibios_scanbus(hose);
 
-	dma_debug_add_bus(&pci_bus_type);
-
 	pci_initialized = 1;
 
 	return 0;
diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index ab5d9dd668d2..43f58632f123 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -155,9 +155,6 @@ static int __init pci_iommu_init(void)
 {
 	struct iommu_table_entry *p;
 
-#ifdef CONFIG_PCI
-	dma_debug_add_bus(&pci_bus_type);
-#endif
 	x86_init.iommu.iommu_init();
 
 	for (p = __iommu_table; p < __iommu_table_end; p++) {
diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index 6792292b5fc7..bef17c3fca67 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -1668,7 +1668,7 @@ static int __init pci_driver_init(void)
 	if (ret)
 		return ret;
 #endif
-
+	dma_debug_add_bus(&pci_bus_type);
 	return 0;
 }
 postcore_initcall(pci_driver_init);
-- 
2.18.0

* [PATCH] powerpc: do not redefine NEED_DMA_MAP_STATE
From: Christoph Hellwig @ 2018-07-30  7:37 UTC
  To: benh, paulus, mpe; +Cc: linuxppc-dev, iommu

kernel/dma/Kconfig already defines NEED_DMA_MAP_STATE; just select it
from PPC64 and NOT_COHERENT_CACHE instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 arch/powerpc/Kconfig                   | 3 ---
 arch/powerpc/platforms/Kconfig.cputype | 2 ++
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 9f2b75fe2c2d..f9cae7edd735 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -884,9 +884,6 @@ config ZONE_DMA
 	bool
 	default y
 
-config NEED_DMA_MAP_STATE
-	def_bool (PPC64 || NOT_COHERENT_CACHE)
-
 config GENERIC_ISA_DMA
 	bool
 	depends on ISA_DMA_API
diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
index e6a1de521319..a2578bf8d560 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -3,6 +3,7 @@ config PPC64
 	bool "64-bit kernel"
 	default n
 	select ZLIB_DEFLATE
+	select NEED_DMA_MAP_STATE
 	help
 	  This option selects whether a 32-bit or a 64-bit kernel
 	  will be built.
@@ -386,6 +387,7 @@ config NOT_COHERENT_CACHE
 	depends on 4xx || PPC_8xx || E200 || PPC_MPC512x || GAMECUBE_COMMON
 	default n if PPC_47x
 	default y
+	select NEED_DMA_MAP_STATE
 
 config CHECK_CACHE_COHERENCY
 	bool
-- 
2.18.0

* Re: powerpc: 32BIT vs. 64BIT (PPC32 vs. PPC64)
From: Masahiro Yamada @ 2018-07-30  8:42 UTC
  To: Randy Dunlap
  Cc: Nicholas Piggin, Benjamin Herrenschmidt, linux-kbuild,
	Paul Mackerras, linuxppc-dev, Stephen Rothwell
In-Reply-To: <c0a17b80-6384-7437-045c-3688d9e5e7f9@infradead.org>

2018-07-07 23:59 GMT+09:00 Randy Dunlap <rdunlap@infradead.org>:
> On 07/07/2018 05:13 AM, Nicholas Piggin wrote:
>> On Fri, 6 Jul 2018 21:58:29 -0700
>> Randy Dunlap <rdunlap@infradead.org> wrote:
>>
>>> On 07/06/2018 06:45 PM, Benjamin Herrenschmidt wrote:
>>>> On Thu, 2018-07-05 at 14:30 -0700, Randy Dunlap wrote:
>>>>> Hi,
>>>>>
>>>>> Is there a good way (or a shortcut) to do something like:
>>>>>
>>>>> $ make ARCH=powerpc O=PPC32 [other_options] allmodconfig
>>>>>   to get a PPC32/32BIT allmodconfig
>>>>>
>>>>> and also be able to do:
>>>>>
>>>>> $make ARCH=powerpc O=PPC64 [other_options] allmodconfig
>>>>>   to get a PPC64/64BIT allmodconfig?
>>>>
>>>> Hrm... O= is for the separate build dir, so there must be something
>>>> else.
>>>>
>>>> You mean having ARCH= aliases like ppc/ppc32 and ppc64 ?
>>>
>>> Yes.
>>>
>>>> That would be a matter of overriding some .config defaults I suppose, I
>>>> don't know how this is done on other archs.
>>>>
>>>> I see the aliasing trick in the Makefile but that's about it.
>>>>
>>>>> Note that arch/x86, arch/sh, and arch/sparc have ways to do
>>>>> some flavor(s) of this (from Documentation/kbuild/kbuild.txt;
>>>>> sh and sparc based on a recent "fix" patch from me):
>>>>
>>>> I fail to see what you are actually talking about here ... sorry. Do
>>>> you have concrete examples on x86 or sparc ? From what I can tell the
>>>> "i386" or "sparc32/sparc64" aliases just change SRCARCH in Makefile and
>>>> 32 vs 64-bit is just a Kconfig option...
>>>
>>> Yes, your summary is mostly correct.
>>>
>>> I'm just looking for a way to do cross-compile builds that are close to
>>> ppc32 allmodconfig and ppc64 allmodconfig.
>>
>> Would there be a problem with adding ARCH=ppc32 / ppc64 matching? This
>> seems to work...
>>
>> Thanks,
>> Nick
>
> Yes, this mostly works and is similar to a patch (my patch) on my test machine.
> And they both work for allmodconfig, which is my primary build target.
>
> And they both have one little quirk that is confusing when the build target
> is defconfig:
>
> When ARCH=ppc32, the terminal output (stdout) is: (using O=PPC32)
>
> make[1]: Entering directory '/home/rdunlap/lnx/lnx-418-rc3/PPC32'
>   GEN     ./Makefile
> *** Default configuration is based on 'ppc64_defconfig'   <<<<< NOTE <<<<<
> #
> # configuration written to .config
> #
> make[1]: Leaving directory '/home/rdunlap/lnx/lnx-418-rc3/PPC32'
>


Maybe we can set KBUILD_DEFCONFIG to one of the ppc32 defconfigs
if ARCH is ppc32?


ifeq ($(ARCH),ppc32)
   KBUILD_DEFCONFIG := (some reasonable 32bit machine _defconfig)
else
   KBUILD_DEFCONFIG := ppc64_defconfig
endif

ifeq ($(CROSS_COMPILE),)
    KBUILD_DEFCONFIG := $(shell uname -m)_defconfig
endif


> I expect that can be fixed also.  :)
>
> And the written .config file is indeed for 32BIT, not 64BIT.
>
> Thanks, Nick.
>
>> ---
>>  Makefile                               | 8 ++++++++
>>  arch/powerpc/Kconfig                   | 9 +++++++++
>>  arch/powerpc/platforms/Kconfig.cputype | 8 --------
>>  3 files changed, 17 insertions(+), 8 deletions(-)
>>
>> diff --git a/Makefile b/Makefile
>> index c5ce55cbc543..f97204aed17a 100644
>> --- a/Makefile
>> +++ b/Makefile
>> @@ -345,6 +345,14 @@ ifeq ($(ARCH),sh64)
>>         SRCARCH := sh
>>  endif
>>
>> +# Additional ARCH settings for powerpc
>> +ifeq ($(ARCH),ppc32)
>> +       SRCARCH := powerpc
>> +endif
>> +ifeq ($(ARCH),ppc64)
>> +       SRCARCH := powerpc
>> +endif
>> +
>>  KCONFIG_CONFIG       ?= .config
>>  export KCONFIG_CONFIG
>>
>> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
>> index 9f2b75fe2c2d..3405b1b122be 100644
>> --- a/arch/powerpc/Kconfig
>> +++ b/arch/powerpc/Kconfig
>> @@ -1,4 +1,13 @@
>>  # SPDX-License-Identifier: GPL-2.0
>> +
>> +config PPC64
>> +     bool "64-bit kernel" if "$(ARCH)" = "powerpc"
>> +     default "$(ARCH)" != "ppc32"
>> +     select ZLIB_DEFLATE
>> +     help
>> +       This option selects whether a 32-bit or a 64-bit kernel
>> +       will be built.
>> +
>>  source "arch/powerpc/platforms/Kconfig.cputype"
>>
>>  config PPC32
>> diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
>> index e6a1de521319..f6e5d6ef9782 100644
>> --- a/arch/powerpc/platforms/Kconfig.cputype
>> +++ b/arch/powerpc/platforms/Kconfig.cputype
>> @@ -1,12 +1,4 @@
>>  # SPDX-License-Identifier: GPL-2.0
>> -config PPC64
>> -     bool "64-bit kernel"
>> -     default n
>> -     select ZLIB_DEFLATE
>> -     help
>> -       This option selects whether a 32-bit or a 64-bit kernel
>> -       will be built.
>> -
>>  menu "Processor support"
>>  choice
>>       prompt "Processor Type"
>>
>
>
> --
> ~Randy



-- 
Best Regards
Masahiro Yamada

* Re: [RFC PATCH kernel 0/5] powerpc/P9/vfio: Pass through NVIDIA Tesla V100
From: Alexey Kardashevskiy @ 2018-07-30  8:58 UTC
  To: Alex Williamson
  Cc: Benjamin Herrenschmidt, linuxppc-dev, David Gibson, kvm-ppc,
	Ram Pai, kvm, Alistair Popple
In-Reply-To: <20180711192621.174d849a@aik.ozlabs.ibm.com>



On 11/07/2018 19:26, Alexey Kardashevskiy wrote:
> On Tue, 10 Jul 2018 16:37:15 -0600
> Alex Williamson <alex.williamson@redhat.com> wrote:
> 
>> On Tue, 10 Jul 2018 14:10:20 +1000
>> Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
>>
>>> On Thu, 7 Jun 2018 23:03:23 -0600
>>> Alex Williamson <alex.williamson@redhat.com> wrote:
>>>   
>>>> On Fri, 8 Jun 2018 14:14:23 +1000
>>>> Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
>>>>     
>>>>> On 8/6/18 1:44 pm, Alex Williamson wrote:      
>>>>>> On Fri, 8 Jun 2018 13:08:54 +1000
>>>>>> Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
>>>>>>         
>>>>>>> On 8/6/18 8:15 am, Alex Williamson wrote:        
>>>>>>>> On Fri, 08 Jun 2018 07:54:02 +1000
>>>>>>>> Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
>>>>>>>>           
>>>>>>>>> On Thu, 2018-06-07 at 11:04 -0600, Alex Williamson wrote:          
>>>>>>>>>>
>>>>>>>>>> Can we back up and discuss whether the IOMMU grouping of NVLink
>>>>>>>>>> connected devices makes sense?  AIUI we have a PCI view of these
>>>>>>>>>> devices and from that perspective they're isolated.  That's the view of
>>>>>>>>>> the device used to generate the grouping.  However, not visible to us,
>>>>>>>>>> these devices are interconnected via NVLink.  What isolation properties
>>>>>>>>>> does NVLink provide given that its entire purpose for existing seems to
>>>>>>>>>> be to provide a high performance link for p2p between devices?            
>>>>>>>>>
>>>>>>>>> Not entire. On POWER chips, we also have an nvlink between the device
>>>>>>>>> and the CPU which is running significantly faster than PCIe.
>>>>>>>>>
>>>>>>>>> But yes, there are cross-links and those should probably be accounted
>>>>>>>>> for in the grouping.          
>>>>>>>>
>>>>>>>> Then after we fix the grouping, can we just let the host driver manage
>>>>>>>> this coherent memory range and expose vGPUs to guests?  The use case of
>>>>>>>> assigning all 6 GPUs to one VM seems pretty limited.  (Might need to
>>>>>>>> convince NVIDIA to support more than a single vGPU per VM though)          
>>>>>>>
>>>>>>> These are physical GPUs, not virtual SR-IOV-like things they are
>>>>>>> implementing as well elsewhere.
>>>>>>
>>>>>> vGPUs as implemented on M- and P-series Teslas aren't SR-IOV like
>>>>>> either.  That's why we have mdev devices now to implement software
>>>>>> defined devices.  I don't have first hand experience with V-series, but
>>>>>> I would absolutely expect a PCIe-based Tesla V100 to support vGPU.        
>>>>>
>>>>> So assuming V100 can do vGPU, you are suggesting ditching this patchset and
>>>>> using mediated vGPUs instead, correct?      
>>>>
>>>> If it turns out that our PCIe-only-based IOMMU grouping doesn't
>>>> account for lack of isolation on the NVLink side and we correct that,
>>>> limiting assignment to sets of 3 interconnected GPUs, is that still a
>>>> useful feature?  OTOH, it's entirely an NVIDIA proprietary decision
>>>> whether they choose to support vGPU on these GPUs or whether they can
>>>> be convinced to support multiple vGPUs per VM.
>>>>     
>>>>>>> My current understanding is that every P9 chip in that box has some NVLink2
>>>>>>> logic on it so each P9 is directly connected to 3 GPUs via PCIe and
>>>>>>> 2xNVLink2, and GPUs in that big group are interconnected by NVLink2 links
>>>>>>> as well.
>>>>>>>
>>>>>>> From small bits of information I have it seems that a GPU can perfectly
>>>>>>> work alone and if the NVIDIA driver does not see these interconnects
>>>>>>> (because we do not pass the rest of the big 3xGPU group to this guest), it
>>>>>>> continues with a single GPU. There is an "nvidia-smi -r" big reset hammer
>>>>>>> which simply refuses to work until all 3 GPUs are passed so there is some
>>>>>>> distinction between passing 1 or 3 GPUs, and I am trying (as we speak) to
>>>>>>> get a confirmation from NVIDIA that it is ok to pass just a single GPU.
>>>>>>>
>>>>>>> So we will either have 6 groups (one per GPU) or 2 groups (one per
>>>>>>> interconnected group).        
>>>>>>
>>>>>> I'm not gaining much confidence that we can rely on isolation between
>>>>>> NVLink connected GPUs, it sounds like you're simply expecting that
>>>>>> proprietary code from NVIDIA on a proprietary interconnect from NVIDIA
>>>>>> is going to play nice and nobody will figure out how to do bad things
>>>>>> because... obfuscation?  Thanks,        
>>>>>
>>>>> Well, we already believe that a proprietary firmware of a sriov-capable
>>>>> adapter like Mellanox ConnectX is not doing bad things, how is this
>>>>> different in principle?      
>>>>
>>>> It seems like the scope and hierarchy are different.  Here we're
>>>> talking about exposing big discrete devices, which are peers of one
>>>> another (and have history of being reverse engineered), to userspace
>>>> drivers.  Once handed to userspace, each of those devices needs to be
>>>> considered untrusted.  In the case of SR-IOV, we typically have a
>>>> trusted host driver for the PF managing untrusted VFs.  We do rely on
>>>> some sanity in the hardware/firmware in isolating the VFs from each
>>>> other and from the PF, but we also often have source code for Linux
>>>> drivers for these devices and sometimes even datasheets.  Here we have
>>>> neither of those and perhaps we won't know the extent of the lack of
>>>> isolation between these devices until nouveau (best case) or some
>>>> exploit (worst case) exposes it.  IOMMU grouping always assumes a lack
>>>> of isolation between devices unless the hardware provides some
>>>> indication that isolation exists, for example ACS on PCIe.  If NVIDIA
>>>> wants to expose isolation on NVLink, perhaps they need to document
>>>> enough of it that the host kernel can manipulate and test for isolation,
>>>> perhaps even enabling virtualization of the NVLink interconnect
>>>> interface such that the host can prevent GPUs from interfering with
>>>> each other.  Thanks,    
>>>
>>>
>>> So far I got this from NVIDIA:
>>>
>>> 1. An NVLink2 state can be controlled via MMIO registers, there is a
>>> "NVLINK ISOLATION ON MULTI-TENANT SYSTEMS" spec (my copy is
>>> "confidential" though) from NVIDIA with the MMIO addresses to block if
>>> we want to disable certain links. In order for NVLink to work, it needs to
>>> be enabled on both sides, so by filtering certain MMIO ranges we can
>>> isolate a GPU.  
>>
>> Where are these MMIO registers, on the bridge or on the endpoint device?
> 
> The endpoint GPU device.
> 
>> I'm wondering when you say block MMIO if these are ranges on the device
>> that we disallow mmap to and all the overlapping PAGE_SIZE issues that
>> come with that or if this should essentially be device specific
>> enable_acs and acs_enabled quirks, and maybe also potentially used by
>> Logan's disable acs series to allow GPUs to be linked and have grouping
>> to match.
> 
> An update: I confused P100 and V100. P100 would need filtering, but
> ours is V100 and it has a couple of registers which we can use to
> disable particular links and once disabled, the link cannot be
> re-enabled till the next secondary bus reset.
> 
> 
>>> 2. We can and should also prohibit the GPU firmware update, this is
>>> done via MMIO as well. The protocol is not open but at least register
>>> ranges might be in order to filter these accesses, and there is no
>>> plan to change this.  
>>
>> I assume this MMIO is on the endpoint and has all the PAGE_SIZE joys
>> along with it.
> 
> Yes; however, NVIDIA says there is nothing performance-critical in
> this 64K page.
> 
>> Also, there are certainly use cases of updating
>> firmware for an assigned device, we don't want to impose a policy, but
>> we should figure out the right place for that policy to be specified by
>> the admin.
> 
> Maybe, but NVIDIA is talking about some "out-of-band" command to the GPU
> to enable firmware update, so firmware update is not really supported.
> 
> 
>>> 3. DMA traffic over the NVLink2 link can be of 2 types: UT=1 for
>>> PCI-style DMA via our usual TCE tables (one per NVLink2 link),
>>> and UT=0 for direct host memory access. UT stands for "use
>>> translation" and this is a part of the NVLink2 protocol. Only UT=1 is
>>> possible over the PCIe link.
>>> This UT=0 traffic uses host physical addresses returned by a nest MMU (a
>>> piece of NVIDIA logic on a POWER9 chip), this takes LPID (guest id),
>>> mmu context id (guest userspace mm id), a virtual address and translates
>>> to the host physical and that result is used for UT=0 DMA, this is
>>> called "ATS" although it is not PCIe ATS afaict.
>>> NVIDIA says that the hardware is designed in a way that it can only do
>>> DMA UT=0 to addresses which ATS translated to, and there is no way to
>>> override this behavior and this is what guarantees the isolation.  
>>
>> I'm kinda lost here, maybe we can compare it to PCIe ATS where an
>> endpoint requests a translation of an IOVA to physical address, the
>> IOMMU returns a lookup based on PCIe requester ID, and there's an
>> invalidation protocol to keep things coherent.
> 
> Yes there is. The current approach is to have an MMU notifier in
> the kernel which tells an NPU (IBM piece of logic between GPU/NVLink2
> and NVIDIA nest MMU) to invalidate translations and that in turn pokes
> the GPU until it confirms that it has invalidated its TLBs and there is no
> ongoing DMA.
> 
>> In the case above, who provides a guest id and mmu context id? 
> 
> We (powerpc/powernv platform) configure NPU to bind specific bus:dev:fn to
> an LPID (== guest id) and MMU context id comes from the guest. The nest
> MMU knows where the partition table is, and this table contains all the
> pointers needed for the translation.
> 
> 
>> Additional software
>> somewhere?  Is the virtual address an IOVA or a process virtual
>> address? 
> 
> A guest kernel or a guest userspace virtual address.
> 
>> Do we assume some sort of invalidation protocol as well?
> 
> I am a little confused, is this question about the same invalidation
> protocol as above or different?
> 
> 
>>> So isolation can be achieved if I do not miss something.
>>>
>>> How do we want this to be documented to proceed? I assume if I post
>>> patches filtering MMIOs, this won't do it, right? If just 1..3 are
>>> documented, will we take this t&c or we need a GPU API spec (which is
>>> not going to happen anyway)?  
>>
>> "t&c"? I think we need what we're actually interacting with to be well
>> documented, but that could be _thorough_ comments in the code, enough
>> to understand the theory of operation, as far as I'm concerned.  A pdf
>> lost on a corporate webserver isn't necessarily an improvement over
>> that, but there needs to be sufficient detail to understand what we're
>> touching such that we can maintain, adapt, and improve the code over
>> time.  Only item #3 above appears POWER specific, so I'd hope that #1
>> is done in the PCI subsystem, #2 might be a QEMU option (maybe kernel
>> vfio-pci, but I'm not sure that's necessary), and I don't know where #3
>> goes.  Thanks,
> 
> Ok, understood. Thanks!

After some local discussions, it was pointed out that force-disabling
NVLinks won't bring us much: for an NVLink to work, both sides need to
enable it, so malicious guests cannot penetrate good ones (or the host)
unless a good guest enables the link, which won't happen with a
well-behaving guest. And if two guests become malicious, they can still
only harm each other, which they could also do by other means such as
the network. This is different from PCIe: once a PCIe link is enabled
(which is unavoidable), a well-behaving device cannot firewall itself
from its peers, as it is up to the upstream bridge(s) to decide the
routing; with NVLink2, a GPU still has a means to protect itself, just
like a guest can run "firewalld" for the network.

Although it would be a nice feature to have an extra barrier between
GPUs, is the inability to block the links in the hypervisor still a
blocker for V100 pass-through?


-- 
Alexey
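The isolation argument in the quoted NVLink2 discussion can be made concrete with a toy model that assumes only what the thread states: the host binds each device (bus:dev:fn) to an LPID in the NPU, a translation is looked up by (LPID, MMU context id, virtual address), and a GPU can issue UT=0 DMA only to host physical addresses it obtained through translation. Every name below (struct toy_host, toy_translate(), toy_dma_ut0() and so on) is hypothetical; real partition-table formats, the NPU programming interface and the MMU-notifier invalidation path are not modeled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_DEVS 4
#define MAX_MAPS 8

/* One translation known to the (hypothetical) partition table:
 * guest LPID + MMU context id + virtual address -> host physical. */
struct toy_map {
	uint32_t lpid, ctx;
	uint64_t va, hpa;
};

/* Host-owned state: the NPU binding of device index -> LPID and the
 * translations of all guests. Guests and GPUs cannot modify it. */
struct toy_host {
	uint32_t dev_lpid[MAX_DEVS];
	struct toy_map maps[MAX_MAPS];
	size_t nmaps;
};

/* Per-GPU record of host physical addresses returned by translation. */
struct toy_gpu {
	int dev;
	uint64_t translated[MAX_MAPS];
	size_t ntranslated;
};

/* ATS-style lookup: the GPU supplies the context id and virtual address,
 * but the LPID comes from the host-configured binding for its device. */
static bool toy_translate(const struct toy_host *h, struct toy_gpu *g,
			  uint32_t ctx, uint64_t va, uint64_t *hpa)
{
	uint32_t lpid = h->dev_lpid[g->dev];

	for (size_t i = 0; i < h->nmaps; i++) {
		const struct toy_map *m = &h->maps[i];

		if (m->lpid == lpid && m->ctx == ctx && m->va == va) {
			/* Remember the result; if the record is full, the DMA
			 * check below simply refuses it (conservative). */
			if (g->ntranslated < MAX_MAPS)
				g->translated[g->ntranslated++] = m->hpa;
			*hpa = m->hpa;
			return true;
		}
	}
	return false;
}

/* UT=0 DMA is allowed only to an address this GPU obtained through
 * translation; invalidation is not modeled. */
static bool toy_dma_ut0(const struct toy_gpu *g, uint64_t hpa)
{
	for (size_t i = 0; i < g->ntranslated; i++)
		if (g->translated[i] == hpa)
			return true;
	return false;
}
```

With two guests using the same context id and virtual address, a GPU bound to LPID 1 only ever gets guest 1's host address back, and a UT=0 DMA to guest 2's host address is refused because it never came out of a translation. In this model the protection rests entirely on the LPID being chosen by the host rather than by the GPU or the guest.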

* Re: Infinite looping observed in __offline_pages
From: Michal Hocko @ 2018-07-30  9:16 UTC
  To: John Allen
  Cc: linux-kernel, linuxppc-dev, kamezawa.hiroyu, n-horiguchi, mgorman,
	nfont
In-Reply-To: <20180727173259.htdxpn4i2fxprpaj@p50.austin.ibm.com>

On Fri 27-07-18 12:32:59, John Allen wrote:
> On Wed, Jul 25, 2018 at 10:03:36PM +0200, Michal Hocko wrote:
> > On Wed 25-07-18 13:11:15, John Allen wrote:
> > [...]
> > > Does a failure in do_migrate_range indicate that the range is unmigratable
> > > and the loop in __offline_pages should terminate and goto failed_removal? Or
> > > should we allow a certain number of retrys before we
> > > give up on migrating the range?
> > 
> > Unfortunately not. Migration code doesn't tell the difference between
> > ephemeral and permanent failures. We are relying on
> > start_isolate_page_range to tell us this. So the question is, what kind
> > of page is not migratable and for what reason.
> > 
> > Are you able to add some debugging to give us more information? The
> > current debugging code in the hotplug/migration sucks...
> 
> After reproducing the problem a couple times, it seems that it can occur for
> different types of pages. Running page-types on the offending page over two
> separate instances produced the following:
> 
> # tools/vm/page-types -a 307968-308224
>             flags	page-count       MB  symbolic-flags			long-symbolic-flags
> 0x0000000000000400	         1        0  __________B________________________________	buddy
> 	     total	         1        0

Huh! How come a buddy page has a non-zero reference count?
> 
> And the following on a separate run:
> 
> # tools/vm/page-types -a 313088-313344
>             flags	page-count       MB  symbolic-flags			long-symbolic-flags
> 0x000000000000006c	         1        0  __RU_lA____________________________________	referenced,uptodate,lru,active
>             total	         1        0

Hmm, what is the expected page count in this case? Seeing 1 doesn't look
particularly wrong.
-- 
Michal Hocko
SUSE Labs
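
The flags column printed by tools/vm/page-types is a bitmask whose low bits follow the KPF_* numbering in include/uapi/linux/kernel-page-flags.h (bit 2 referenced, 3 uptodate, 5 lru, 6 active, 10 buddy, and so on). As a minimal userspace sketch, assuming only those eleven low bits matter, the hypothetical helper decode_kpageflags() below turns a flags value back into the long-symbolic-flags text, so 0x400 decodes to "buddy" and 0x6c to "referenced,uptodate,lru,active".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Names for bits 0..10 of /proc/kpageflags (KPF_LOCKED .. KPF_BUDDY). */
static const char *const kpf_names[] = {
	"locked", "error", "referenced", "uptodate", "dirty",
	"lru", "active", "slab", "writeback", "reclaim", "buddy",
};

/*
 * Hypothetical helper: write a comma-separated list of the named flags
 * set in `flags` into `buf` (always NUL-terminated, truncated if short).
 */
void decode_kpageflags(uint64_t flags, char *buf, size_t size)
{
	size_t n = sizeof(kpf_names) / sizeof(kpf_names[0]);

	assert(buf != NULL && size > 0);
	buf[0] = '\0';
	for (size_t bit = 0; bit < n; bit++) {
		if (!(flags & ((uint64_t)1 << bit)))
			continue;
		if (buf[0] != '\0')
			strncat(buf, ",", size - strlen(buf) - 1);
		strncat(buf, kpf_names[bit], size - strlen(buf) - 1);
	}
}
```

Higher bits, which page-types also decodes, are deliberately left out of this sketch.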

^ permalink raw reply

* Re: [RFC 1/4] virtio: Define virtio_direct_dma_ops structure
From: Christoph Hellwig @ 2018-07-30  9:24 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: virtualization, linux-kernel, linuxppc-dev, aik, robh, joe,
	elfring, david, jasowang, benh, mpe, mst, hch, linuxram, haren,
	paulus, srikar
In-Reply-To: <20180720035941.6844-2-khandual@linux.vnet.ibm.com>

> +/*
> + * Virtio direct mapping DMA API operations structure
> + *
> + * This defines DMA API structure for all virtio devices which would not
> + * either bring in their own DMA OPS from architecture or they would not
> + * like to use architecture specific IOMMU based DMA OPS because QEMU
> + * expects GPA instead of an IOVA in absence of VIRTIO_F_IOMMU_PLATFORM.
> + */
> +dma_addr_t virtio_direct_map_page(struct device *dev, struct page *page,
> +			    unsigned long offset, size_t size,
> +			    enum dma_data_direction dir,
> +			    unsigned long attrs)

All these functions should probably be marked static.

> +void virtio_direct_unmap_page(struct device *hwdev, dma_addr_t dev_addr,
> +			size_t size, enum dma_data_direction dir,
> +			unsigned long attrs)
> +{
> +}

No need to implement no-op callbacks in struct dma_map_ops.

> +
> +int virtio_direct_mapping_error(struct device *hwdev, dma_addr_t dma_addr)
> +{
> +	return 0;
> +}

Including this one.

> +void *virtio_direct_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle,
> +		gfp_t gfp, unsigned long attrs)
> +{
> +	void *queue = alloc_pages_exact(PAGE_ALIGN(size), gfp);
> +
> +	if (queue) {
> +		phys_addr_t phys_addr = virt_to_phys(queue);
> +		*dma_handle = (dma_addr_t)phys_addr;
> +
> +		if (WARN_ON_ONCE(*dma_handle != phys_addr)) {
> +			free_pages_exact(queue, PAGE_ALIGN(size));
> +			return NULL;
> +		}
> +	}
> +	return queue;

queue is a very odd name in a generic memory allocator.

> +void virtio_direct_free(struct device *dev, size_t size, void *vaddr,
> +		dma_addr_t dma_addr, unsigned long attrs)
> +{
> +	free_pages_exact(vaddr, PAGE_ALIGN(size));
> +}
> +
> +const struct dma_map_ops virtio_direct_dma_ops = {
> +	.alloc			= virtio_direct_alloc,
> +	.free			= virtio_direct_free,
> +	.map_page		= virtio_direct_map_page,
> +	.unmap_page		= virtio_direct_unmap_page,
> +	.mapping_error		= virtio_direct_mapping_error,
> +};

This is missing a dma_map_sg implementation.  In general this is
mandatory for dma_ops.  So either you implement it or explain in
a comment why you think you can skip it.

> +EXPORT_SYMBOL(virtio_direct_dma_ops);

EXPORT_SYMBOL_GPL like all virtio symbols, please.

^ permalink raw reply

* Re: [RFC 2/4] virtio: Override device's DMA OPS with virtio_direct_dma_ops selectively
From: Christoph Hellwig @ 2018-07-30  9:25 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: virtualization, linux-kernel, linuxppc-dev, aik, robh, joe,
	elfring, david, jasowang, benh, mpe, mst, hch, linuxram, haren,
	paulus, srikar
In-Reply-To: <20180720035941.6844-3-khandual@linux.vnet.ibm.com>

> +const struct dma_map_ops virtio_direct_dma_ops;

This belongs into a header if it is non-static.  If you only
use it in this file anyway please mark it static and avoid a forward
declaration.

> +
>  int virtio_finalize_features(struct virtio_device *dev)
>  {
>  	int ret = dev->config->finalize_features(dev);
> @@ -174,6 +176,9 @@ int virtio_finalize_features(struct virtio_device *dev)
>  	if (ret)
>  		return ret;
>  
> +	if (virtio_has_iommu_quirk(dev))
> +		set_dma_ops(dev->dev.parent, &virtio_direct_dma_ops);

This needs a big fat comment explaining what is going on here.

Also not new, but I find the existence of virtio_has_iommu_quirk and its
name horribly confusing.  It might be better to open code it here once
only a single caller is left.

^ permalink raw reply

* Re: [RFC 2/4] virtio: Override device's DMA OPS with virtio_direct_dma_ops selectively
From: Christoph Hellwig @ 2018-07-30  9:30 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Anshuman Khandual, virtualization, linux-kernel, linuxppc-dev,
	aik, robh, joe, elfring, david, jasowang, benh, mpe, hch,
	linuxram, haren, paulus, srikar
In-Reply-To: <20180729001344-mutt-send-email-mst@kernel.org>

> > +
> > +	if (xen_domain())
> > +		goto skip_override;
> > +
> > +	if (virtio_has_iommu_quirk(dev))
> > +		set_dma_ops(dev->dev.parent, &virtio_direct_dma_ops);
> > +
> > + skip_override:
> > +
> 
> I prefer normal if scoping as opposed to goto spaghetti pls.
> Better yet move vring_use_dma_api here and use it.
> Less of a chance something will break.

I agree about avoiding pointless gotos here, but we can do things
perfectly well without either gotos or a confusing helper here
if we structure it right. E.g.:

	// suitably detailed comment here
	if (!xen_domain() &&
	    !virtio_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM))
		set_dma_ops(dev->dev.parent, &virtio_direct_dma_ops);

and while we're at it - modifying dma ops for the parent looks very
dangerous.  I don't think we can do that, as it could break iommu
setup interactions.  IFF we set specific dma map ops, they have to be
on the virtio device itself, over which we have full control.

^ permalink raw reply

* Re: [RFC 0/4] Virtio uses DMA API for all devices
From: Christoph Hellwig @ 2018-07-30  9:34 UTC (permalink / raw)
  To: Will Deacon
  Cc: Anshuman Khandual, virtualization, linux-kernel, linuxppc-dev,
	aik, robh, joe, elfring, david, jasowang, benh, mpe, mst, hch,
	linuxram, haren, paulus, srikar, robin.murphy,
	jean-philippe.brucker, marc.zyngier
In-Reply-To: <20180727095804.GA25592@arm.com>

On Fri, Jul 27, 2018 at 10:58:05AM +0100, Will Deacon wrote:
> 
> I just wanted to say that this patch series provides a means for us to
> force the coherent DMA ops for legacy virtio devices on arm64, which in turn
> means that we can enable the SMMU with legacy devices in our fastmodel
> emulation platform (which is slowly being upgraded to virtio 1.0) without
> hanging during boot. Patch below.

Yikes, this is a nightmare.  That is exactly where I do not want things
to end up.  We really need to distinguish between legacy virtual crappy
virtio (and that includes v1) that totally ignores the bus it pretends
to be on, and sane virtio (to be defined) that sits on a real (or
properly emulated including iommu and details for dma mapping) bus.

Having a jumble of arch-specific, undocumented magic as in
the powerpc patch replied to, or this arm patch, is a complete no-go.

Nacked-by: Christoph Hellwig <hch@lst.de>

for both.

^ permalink raw reply

* Re: [PATCH] powerpc/mm: Don't report PUDs as memory leaks when using kmemleak
From: Paul Menzel @ 2018-07-30  9:54 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev, aneesh.kumar
In-Reply-To: <87zhy9i5q0.fsf@concordia.ellerman.id.au>

Dear Michael,


On 07/30/18 08:43, Michael Ellerman wrote:
> Paul Menzel <pmenzel@molgen.mpg.de> writes:
>> Am 19.07.2018 um 16:33 schrieb Michael Ellerman:
> ...
>>>
>>> The fix is fairly simple. We need to tell kmemleak to ignore PUD
>>> allocations and never report them as leaks. We can also tell it not to
>>> scan the PGD, because it will never find pointers in there. However it
>>> will still notice if we allocate a PGD and then leak it.
>>>
>>> Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
>>> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
>>> ---
>>>   arch/powerpc/include/asm/book3s/64/pgalloc.h | 23 +++++++++++++++++++++--
>>>   1 file changed, 21 insertions(+), 2 deletions(-)
>>
>> […]
>>
>> Tested-by: Paul Menzel <pmenzel@molgen.mpg.de> on IBM S822LC
> 
> Thanks.

No problem. I forgot to add that it'd be great if you tagged this
for the stable series too.

Cc: stable@vger.kernel.org


Kind regards,

Paul


^ permalink raw reply

* [PATCH 1/3] arm64: dts: fsl: add clocks property for fman ptp timer node
From: Yangbo Lu @ 2018-07-30 10:01 UTC (permalink / raw)
  To: netdev, madalin.bucur, Richard Cochran, Rob Herring, Shawn Guo,
	David S . Miller
  Cc: devicetree, linuxppc-dev, linux-arm-kernel, linux-kernel,
	Yangbo Lu

Add the clocks property to the fman ptp timer node.

Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
---
 arch/arm64/boot/dts/freescale/qoriq-fman3-0.dtsi |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/arm64/boot/dts/freescale/qoriq-fman3-0.dtsi b/arch/arm64/boot/dts/freescale/qoriq-fman3-0.dtsi
index a56a408..4664c33 100644
--- a/arch/arm64/boot/dts/freescale/qoriq-fman3-0.dtsi
+++ b/arch/arm64/boot/dts/freescale/qoriq-fman3-0.dtsi
@@ -80,4 +80,5 @@ ptp_timer0: ptp-timer@1afe000 {
 	compatible = "fsl,fman-ptp-timer";
 	reg = <0x0 0x1afe000 0x0 0x1000>;
 	interrupts = <GIC_SPI 44 IRQ_TYPE_LEVEL_HIGH>;
+	clocks = <&clockgen 3 0>;
 };
-- 
1.7.1

^ permalink raw reply related

* [PATCH 2/3] powerpc/mpc85xx: add clocks property for fman ptp timer node
From: Yangbo Lu @ 2018-07-30 10:01 UTC (permalink / raw)
  To: netdev, madalin.bucur, Richard Cochran, Rob Herring, Shawn Guo,
	David S . Miller
  Cc: devicetree, linuxppc-dev, linux-arm-kernel, linux-kernel,
	Yangbo Lu
In-Reply-To: <20180730100154.27906-1-yangbo.lu@nxp.com>

Add the clocks property to the fman ptp timer nodes.

Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
---
 arch/powerpc/boot/dts/fsl/qoriq-fman-0.dtsi   |    1 +
 arch/powerpc/boot/dts/fsl/qoriq-fman-1.dtsi   |    1 +
 arch/powerpc/boot/dts/fsl/qoriq-fman3-0.dtsi  |    1 +
 arch/powerpc/boot/dts/fsl/qoriq-fman3-1.dtsi  |    1 +
 arch/powerpc/boot/dts/fsl/qoriq-fman3l-0.dtsi |    1 +
 5 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/boot/dts/fsl/qoriq-fman-0.dtsi b/arch/powerpc/boot/dts/fsl/qoriq-fman-0.dtsi
index 6b124f7..9b6cf91 100644
--- a/arch/powerpc/boot/dts/fsl/qoriq-fman-0.dtsi
+++ b/arch/powerpc/boot/dts/fsl/qoriq-fman-0.dtsi
@@ -100,4 +100,5 @@ ptp_timer0: ptp-timer@4fe000 {
 	compatible = "fsl,fman-ptp-timer";
 	reg = <0x4fe000 0x1000>;
 	interrupts = <96 2 0 0>;
+	clocks = <&clockgen 3 0>;
 };
diff --git a/arch/powerpc/boot/dts/fsl/qoriq-fman-1.dtsi b/arch/powerpc/boot/dts/fsl/qoriq-fman-1.dtsi
index b80aaf5..e95c11f 100644
--- a/arch/powerpc/boot/dts/fsl/qoriq-fman-1.dtsi
+++ b/arch/powerpc/boot/dts/fsl/qoriq-fman-1.dtsi
@@ -100,4 +100,5 @@ ptp_timer1: ptp-timer@5fe000 {
 	compatible = "fsl,fman-ptp-timer";
 	reg = <0x5fe000 0x1000>;
 	interrupts = <97 2 0 0>;
+	clocks = <&clockgen 3 1>;
 };
diff --git a/arch/powerpc/boot/dts/fsl/qoriq-fman3-0.dtsi b/arch/powerpc/boot/dts/fsl/qoriq-fman3-0.dtsi
index d3720fd..d62b36c 100644
--- a/arch/powerpc/boot/dts/fsl/qoriq-fman3-0.dtsi
+++ b/arch/powerpc/boot/dts/fsl/qoriq-fman3-0.dtsi
@@ -105,4 +105,5 @@ ptp_timer0: ptp-timer@4fe000 {
 	compatible = "fsl,fman-ptp-timer";
 	reg = <0x4fe000 0x1000>;
 	interrupts = <96 2 0 0>;
+	clocks = <&clockgen 3 0>;
 };
diff --git a/arch/powerpc/boot/dts/fsl/qoriq-fman3-1.dtsi b/arch/powerpc/boot/dts/fsl/qoriq-fman3-1.dtsi
index ae34c20..3102324 100644
--- a/arch/powerpc/boot/dts/fsl/qoriq-fman3-1.dtsi
+++ b/arch/powerpc/boot/dts/fsl/qoriq-fman3-1.dtsi
@@ -105,4 +105,5 @@ ptp_timer1: ptp-timer@5fe000 {
 	compatible = "fsl,fman-ptp-timer";
 	reg = <0x5fe000 0x1000>;
 	interrupts = <97 2 0 0>;
+	clocks = <&clockgen 3 1>;
 };
diff --git a/arch/powerpc/boot/dts/fsl/qoriq-fman3l-0.dtsi b/arch/powerpc/boot/dts/fsl/qoriq-fman3l-0.dtsi
index 02f2755..c90702b 100644
--- a/arch/powerpc/boot/dts/fsl/qoriq-fman3l-0.dtsi
+++ b/arch/powerpc/boot/dts/fsl/qoriq-fman3l-0.dtsi
@@ -93,4 +93,5 @@ ptp_timer0: ptp-timer@4fe000 {
 	compatible = "fsl,fman-ptp-timer";
 	reg = <0x4fe000 0x1000>;
 	interrupts = <96 2 0 0>;
+	clocks = <&clockgen 3 0>;
 };
-- 
1.7.1

^ permalink raw reply related

* [PATCH 3/3] ptp_qoriq: convert to use module parameters for initialization
From: Yangbo Lu @ 2018-07-30 10:01 UTC (permalink / raw)
  To: netdev, madalin.bucur, Richard Cochran, Rob Herring, Shawn Guo,
	David S . Miller
  Cc: devicetree, linuxppc-dev, linux-arm-kernel, linux-kernel,
	Yangbo Lu
In-Reply-To: <20180730100154.27906-1-yangbo.lu@nxp.com>

The ptp_qoriq driver initializes the 1588 timer with the
configuration provided by properties of the device tree
node, for example:

  fsl,tclk-period = <5>;
  fsl,tmr-prsc    = <2>;
  fsl,tmr-add     = <0xaaaaaaab>;
  fsl,tmr-fiper1  = <999999995>;
  fsl,tmr-fiper2  = <99990>;
  fsl,max-adj     = <499999999>;

These are really runtime configurations, which do not belong
in the dts. This patch converts the driver to use module
parameters for 1588 timer initialization, and adds support
for calculating the initial register values.

If the parameters are not provided, the driver calculates the
register values from a set of default parameters. With this
patch, those dts properties are no longer needed for new
platforms to support the 1588 timer, and many QorIQ DPAA
platforms (some P-series and T-series PowerPC platforms, and
some LS-series ARM64 platforms) could use this driver for
their fman ptp timer with the default module parameters.

However, this patch does not remove the dts method, because
many older platforms still use it. Their dts files need to be
cleaned up, the module parameters verified on them, and the
platforms converted to the new method gradually, to avoid
breaking any functionality.

Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
---
 drivers/ptp/ptp_qoriq.c       |  117 +++++++++++++++++++++++++++++++++++++++-
 include/linux/fsl/ptp_qoriq.h |    1 +
 2 files changed, 115 insertions(+), 3 deletions(-)

diff --git a/drivers/ptp/ptp_qoriq.c b/drivers/ptp/ptp_qoriq.c
index a14c317..22baf83 100644
--- a/drivers/ptp/ptp_qoriq.c
+++ b/drivers/ptp/ptp_qoriq.c
@@ -29,9 +29,30 @@
 #include <linux/of_platform.h>
 #include <linux/timex.h>
 #include <linux/slab.h>
+#include <linux/clk.h>
 
 #include <linux/fsl/ptp_qoriq.h>
 
+static unsigned int cksel = DEFAULT_CKSEL;
+module_param(cksel, uint, 0644);
+MODULE_PARM_DESC(cksel, "Select reference clock");
+
+static unsigned int clk_src;
+module_param(clk_src, uint, 0644);
+MODULE_PARM_DESC(clk_src, "Reference clock frequency (if clocks property not provided in dts)");
+
+static unsigned int tmr_prsc = 2;
+module_param(tmr_prsc, uint, 0644);
+MODULE_PARM_DESC(tmr_prsc, "Output clock division/prescale factor");
+
+static unsigned int tmr_fiper1 = 1000000000;
+module_param(tmr_fiper1, uint, 0644);
+MODULE_PARM_DESC(tmr_fiper1, "Desired fixed interval pulse period (ns)");
+
+static unsigned int tmr_fiper2 = 100000;
+module_param(tmr_fiper2, uint, 0644);
+MODULE_PARM_DESC(tmr_fiper2, "Desired fixed interval pulse period (ns)");
+
 /*
  * Register access functions
  */
@@ -317,6 +338,91 @@ static int ptp_qoriq_enable(struct ptp_clock_info *ptp,
 	.enable		= ptp_qoriq_enable,
 };
 
+/**
+ * qoriq_ptp_nominal_freq - calculate nominal frequency by reference clock
+ *			    frequency
+ *
+ * @clk_src: reference clock frequency
+ *
+ * The nominal frequency is the desired clock frequency.
+ * It should be less than the reference clock frequency.
+ * It should be a factor of 1000MHz.
+ *
+ * Return the nominal frequency
+ */
+static u32 qoriq_ptp_nominal_freq(u32 clk_src)
+{
+	u32 remainder = 0;
+
+	clk_src /= 1000000;
+	remainder = clk_src % 100;
+	if (remainder) {
+		clk_src -= remainder;
+		clk_src += 100;
+	}
+
+	do {
+		clk_src -= 100;
+
+	} while (1000 % clk_src);
+
+	return clk_src * 1000000;
+}
+
+static int qoriq_ptp_config(struct qoriq_ptp *qoriq_ptp,
+			    struct device_node *node)
+{
+	struct clk *clk;
+	u64 freq_comp;
+	u64 max_adj;
+	u32 nominal_freq;
+
+	qoriq_ptp->cksel = cksel;
+
+	if (clk_src) {
+		qoriq_ptp->clk_src = clk_src;
+	} else {
+		clk = of_clk_get(node, 0);
+		if (!IS_ERR(clk)) {
+			qoriq_ptp->clk_src = clk_get_rate(clk);
+			clk_put(clk);
+		}
+	}
+
+	if (qoriq_ptp->clk_src <= 100000000UL) {
+		pr_err("error reference clock value, or lower than 100MHz\n");
+		return -EINVAL;
+	}
+
+	nominal_freq = qoriq_ptp_nominal_freq(qoriq_ptp->clk_src);
+	if (!nominal_freq)
+		return -EINVAL;
+
+	qoriq_ptp->tclk_period = 1000000000UL / nominal_freq;
+	qoriq_ptp->tmr_prsc = tmr_prsc;
+
+	/* Calculate initial frequency compensation value for TMR_ADD register.
+	 * freq_comp = ceil(2^32 / freq_ratio)
+	 * freq_ratio = reference_clock_freq / nominal_freq
+	 */
+	freq_comp = ((u64)1 << 32) * nominal_freq;
+	if (do_div(freq_comp, qoriq_ptp->clk_src))
+		freq_comp++;
+
+	qoriq_ptp->tmr_add = freq_comp;
+	qoriq_ptp->tmr_fiper1 = tmr_fiper1 - qoriq_ptp->tclk_period;
+	qoriq_ptp->tmr_fiper2 = tmr_fiper2 - qoriq_ptp->tclk_period;
+
+	/* max_adj = 1000000000 * (freq_ratio - 1.0) - 1
+	 * freq_ratio = reference_clock_freq / nominal_freq
+	 */
+	max_adj = 1000000000ULL * (qoriq_ptp->clk_src - nominal_freq);
+	max_adj = max_adj / nominal_freq - 1;
+	qoriq_ptp->caps.max_adj = max_adj;
+
+	return 0;
+}
+
 static int qoriq_ptp_probe(struct platform_device *dev)
 {
 	struct device_node *node = dev->dev.of_node;
@@ -332,7 +438,7 @@ static int qoriq_ptp_probe(struct platform_device *dev)
 	if (!qoriq_ptp)
 		goto no_memory;
 
-	err = -ENODEV;
+	err = -EINVAL;
 
 	qoriq_ptp->caps = ptp_qoriq_caps;
 
@@ -351,10 +457,14 @@ static int qoriq_ptp_probe(struct platform_device *dev)
 				 "fsl,tmr-fiper2", &qoriq_ptp->tmr_fiper2) ||
 	    of_property_read_u32(node,
 				 "fsl,max-adj", &qoriq_ptp->caps.max_adj)) {
-		pr_err("device tree node missing required elements\n");
-		goto no_node;
+		pr_warn("device tree node missing required elements, try module param\n");
+
+		if (qoriq_ptp_config(qoriq_ptp, node))
+			goto no_param;
 	}
 
+	err = -ENODEV;
+
 	qoriq_ptp->irq = platform_get_irq(dev, 0);
 
 	if (qoriq_ptp->irq < 0) {
@@ -436,6 +546,7 @@ static int qoriq_ptp_probe(struct platform_device *dev)
 	release_resource(qoriq_ptp->rsrc);
 no_resource:
 	free_irq(qoriq_ptp->irq, qoriq_ptp);
+no_param:
 no_node:
 	kfree(qoriq_ptp);
 no_memory:
diff --git a/include/linux/fsl/ptp_qoriq.h b/include/linux/fsl/ptp_qoriq.h
index dc3dac4..586d430 100644
--- a/include/linux/fsl/ptp_qoriq.h
+++ b/include/linux/fsl/ptp_qoriq.h
@@ -147,6 +147,7 @@ struct qoriq_ptp {
 	u32 cksel;
 	u32 tmr_fiper1;
 	u32 tmr_fiper2;
+	u32 clk_src;
 };
 
 static inline u32 qoriq_read(unsigned __iomem *addr)
-- 
1.7.1
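
To check the arithmetic in qoriq_ptp_nominal_freq() and qoriq_ptp_config(), here is a minimal userspace re-implementation in plain C. It is a sketch, not the driver: struct ptp_timer_cfg, nominal_freq_hz() and ptp_timer_compute() are hypothetical names, the kernel's do_div() is replaced by ordinary 64-bit division, and the ceiling is taken as (a + b - 1) / b. One deliberate difference: the loop stops when the candidate reaches zero instead of evaluating 1000 % 0, which the posted loop would do for a reference clock just above 100 MHz (e.g. 100.5 MHz truncates to 100 before the first subtraction); the sketch reports that case as an error.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of the fields qoriq_ptp_config() fills in. */
struct ptp_timer_cfg {
	uint32_t tclk_period;	/* ns per tick of the nominal clock */
	uint32_t tmr_add;	/* initial TMR_ADD frequency compensation */
	uint32_t tmr_fiper1;	/* requested pulse periods minus one tick */
	uint32_t tmr_fiper2;
	uint32_t max_adj;	/* caps.max_adj, in ppb */
};

/*
 * Round the reference clock (in whole MHz) up to a multiple of 100 MHz,
 * then step down by 100 MHz until the candidate divides 1000 MHz evenly.
 * Returns 0 instead of dividing by zero when no candidate is left.
 */
uint32_t nominal_freq_hz(uint32_t clk_src_hz)
{
	uint32_t mhz = clk_src_hz / 1000000u;
	uint32_t rem = mhz % 100u;

	assert(clk_src_hz > 100000000u);	/* callers reject <= 100 MHz */
	if (rem)
		mhz += 100u - rem;

	do {
		mhz -= 100u;
	} while (mhz != 0 && 1000u % mhz != 0);

	return mhz * 1000000u;
}

/* Returns 0 on success, -1 if the reference clock cannot be used. */
int ptp_timer_compute(uint32_t clk_src_hz, uint32_t fiper1_ns,
		      uint32_t fiper2_ns, struct ptp_timer_cfg *cfg)
{
	uint32_t nominal;
	uint64_t num;

	if (clk_src_hz <= 100000000u)	/* same lower bound as the patch */
		return -1;

	nominal = nominal_freq_hz(clk_src_hz);
	if (nominal == 0)
		return -1;

	cfg->tclk_period = 1000000000u / nominal;

	/* TMR_ADD = ceil(2^32 * nominal / clk_src) */
	num = ((uint64_t)1 << 32) * nominal;
	cfg->tmr_add = (uint32_t)((num + clk_src_hz - 1) / clk_src_hz);

	cfg->tmr_fiper1 = fiper1_ns - cfg->tclk_period;
	cfg->tmr_fiper2 = fiper2_ns - cfg->tclk_period;

	/* max_adj = 1e9 * (clk_src / nominal - 1) - 1, in integer form */
	cfg->max_adj = (uint32_t)(1000000000ull *
				  (clk_src_hz - nominal) / nominal - 1);
	return 0;
}
```

Fed a 300 MHz reference clock and the default tmr_fiper1/tmr_fiper2 values, it gives tclk_period 5, TMR_ADD 0xaaaaaaab, fiper1 999999995 and max_adj 499999999, the same numbers as the fsl,* properties quoted in the commit message; fiper2 comes out as 99995 rather than 99990. A 700 MHz reference steps down through 600 MHz to a 500 MHz nominal frequency.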

^ permalink raw reply related

* Re: [RFC 0/4] Virtio uses DMA API for all devices
From: Michael S. Tsirkin @ 2018-07-30 10:28 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Will Deacon, Anshuman Khandual, virtualization, linux-kernel,
	linuxppc-dev, aik, robh, joe, elfring, david, jasowang, benh, mpe,
	linuxram, haren, paulus, srikar, robin.murphy,
	jean-philippe.brucker, marc.zyngier
In-Reply-To: <20180730093414.GD26245@infradead.org>

On Mon, Jul 30, 2018 at 02:34:14AM -0700, Christoph Hellwig wrote:
> We really need to distinguish between legacy virtual crappy
> virtio (and that includes v1) that totally ignores the bus it pretends
> to be on, and sane virtio (to be defined) that sits on a real (or
> properly emulated including iommu and details for dma mapping) bus.

Let me reply to the "crappy" part first:
So virtio devices can run on another CPU or on a PCI bus. Configuration
can happen over multiple transports.  There is a discovery protocol to
figure out where it is. It has some warts but any real system has warts.

So IMHO virtio running on another CPU isn't "legacy virtual crappy
virtio". virtio devices that actually sit on a PCI bus aren't "sane"
simply because the DMA is more convoluted on some architectures.

The performance impact of the optimizations that become possible when
you know your "device" is in fact just another CPU has been measured;
it is real, so we aren't interested in adding all that overhead back
just so we can use the DMA API. The "correct then fast" mantra doesn't
apply to something that is as widely deployed as virtio.

And I can accept an argument that maybe the DMA API isn't designed to
support such virtual DMA. Whether it should I don't know.

With this out of my system:
I agree these approaches are hacky. I think it is generally better to
have virtio feature negotiation tell you whether device runs on a CPU or
not rather than rely on platform specific ways for this. To this end
there was a recent proposal to rename VIRTIO_F_IO_BARRIER to
VIRTIO_F_REAL_DEVICE.  It got stuck since "real" sounds vague to people,
e.g.  what if it's a VF - is that real or not? But I can see something
like e.g. VIRTIO_F_PLATFORM_DMA gaining support.

We would then rename virtio_has_iommu_quirk to virtio_has_dma_quirk
and test VIRTIO_F_PLATFORM_DMA in addition to the IOMMU thing.

-- 
MST

^ permalink raw reply

* Re: [PATCH] of/fdt: Remove PPC32 longtrail hack in memory scan
From: Michael Ellerman @ 2018-07-30 10:47 UTC (permalink / raw)
  To: Rob Herring; +Cc: devicetree, Frank Rowand, Paul Mackerras, linuxppc-dev
In-Reply-To: <CAL_JsqK4GEPJSPg+sStvFyO=sQ0Fp49FKKTz0S9OGc1COOnp-g@mail.gmail.com>

Rob Herring <robh+dt@kernel.org> writes:
> On Thu, Jul 26, 2018 at 11:36 PM Michael Ellerman <mpe@ellerman.id.au> wrote:
>> When the OF code was originally made common by Grant in commit
>> 51975db0b733 ("of/flattree: merge early_init_dt_scan_memory() common
>> code") (Feb 2010), the common code inherited a hack to handle
>> PPC "longtrail" machines, which had a "memory@0" node with no
>> device_type.
>>
>> That check was then made to only apply to PPC32 in b44aa25d20e2 ("of:
>> Handle memory@0 node on PPC32 only") (May 2014).
>>
>> But according to Paul Mackerras the "longtrail" machines are long
>> dead, if they were ever seen in the wild at all. If someone does still
>> have one, we can handle this firmware wart in powerpc platform code.
>>
>> So remove the hack once and for all.
>
> Yay. I guess Power Macs and other quirks will never die...

Not soon.

In base.c I see:
 - the hack in arch_find_n_match_cpu_physical_id()
   - we should just move that into arch code, it's a __weak arch hook
     after all.
 - a PPC hack in of_alias_scan(), I guess we need to retain that
   behaviour, but it's pretty minor anyway.

In address.c there's the powermac empty ranges hack. Seems like we could
fix that just by creating empty `ranges` properties in fixup_device_tree().
I don't think we support booting powermacs other than via prom_init(). (Ben?)

> I'll queue this up.

cheers

^ permalink raw reply

* Re: [RFC 0/4] Virtio uses DMA API for all devices
From: Christoph Hellwig @ 2018-07-30 11:18 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Christoph Hellwig, Will Deacon, Anshuman Khandual, virtualization,
	linux-kernel, linuxppc-dev, aik, robh, joe, elfring, david,
	jasowang, benh, mpe, linuxram, haren, paulus, srikar,
	robin.murphy, jean-philippe.brucker, marc.zyngier
In-Reply-To: <20180730125100-mutt-send-email-mst@kernel.org>

On Mon, Jul 30, 2018 at 01:28:03PM +0300, Michael S. Tsirkin wrote:
> Let me reply to the "crappy" part first:
> So virtio devices can run on another CPU or on a PCI bus. Configuration
> can happen over multiple transports.  There is a discovery protocol to
> figure out where it is. It has some warts but any real system has warts.
> 
> So IMHO virtio running on another CPU isn't "legacy virtual crappy
> virtio". virtio devices that actually sit on a PCI bus aren't "sane"
> simply because the DMA is more convoluted on some architectures.

All of what you said would be true if virtio didn't claim to be
a PCI device.  Once it claims to be a PCI device and we also see
real hardware written to the interface, I stand by everything I said
above.

> With this out of my system:
> I agree these approaches are hacky. I think it is generally better to
> have virtio feature negotiation tell you whether device runs on a CPU or
> not rather than rely on platform specific ways for this. To this end
> there was a recent proposal to rename VIRTIO_F_IO_BARRIER to
> VIRTIO_F_REAL_DEVICE.  It got stuck since "real" sounds vague to people,
> e.g.  what if it's a VF - is that real or not? But I can see something
> like e.g. VIRTIO_F_PLATFORM_DMA gaining support.
> 
> We would then rename virtio_has_iommu_quirk to virtio_has_dma_quirk
> and test VIRTIO_F_PLATFORM_DMA in addition to the IOMMU thing.

I don't really care about the exact naming, and indeed a device that
sets the flag doesn't have to be a 'real' device - it just has to act
like one.  I explained all the issues that this implies (at least relating
to DMA) in one of the previous threads.

The important bit is that we can specify, exactly and in the spec, the
behavior both of devices that set the "I'm real!" flag and of those that
don't.  And that very much excludes arch-specific (or Xen-specific)
overrides.

^ permalink raw reply

* Re: [PATCH] PCI: call dma_debug_add_bus for pci_bus_type in common code
From: Thomas Gleixner @ 2018-07-30 11:55 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-pci, iommu, linuxppc-dev, x86, linux-sh, linux-kernel
In-Reply-To: <20180730073842.16092-1-hch@lst.de>

On Mon, 30 Jul 2018, Christoph Hellwig wrote:

> There is nothing arch specific about PCI or dma-debug, so move this
> call to common code just after registering the bus type.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Acked-by: Thomas Gleixner <tglx@linutronix.de>

^ permalink raw reply

* [PATCH 1/2] of: Add of_machine_compatible_match()
From: Michael Ellerman @ 2018-07-30 13:15 UTC (permalink / raw)
  To: devicetree, robh+dt, frowand.list; +Cc: linuxppc-dev

We have of_machine_is_compatible() to check if a machine is compatible
with a single compatible string. However some code is able to support
multiple compatible boards, and so wants to check for one of many
compatible strings.

So add of_machine_compatible_match() which takes a NULL terminated
array of compatible strings to check against the root node's
compatible property.

Compared to an open coded match this is slightly more self
documenting, and also avoids the caller needing to juggle the root
node either directly or via of_find_node_by_path().

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
---
 drivers/of/base.c  | 21 +++++++++++++++++++++
 include/linux/of.h |  6 ++++++
 2 files changed, 27 insertions(+)

diff --git a/drivers/of/base.c b/drivers/of/base.c
index 848f549164cd..603716ba8513 100644
--- a/drivers/of/base.c
+++ b/drivers/of/base.c
@@ -505,6 +505,27 @@ int of_device_compatible_match(struct device_node *device,
 	return score;
 }
 
+/**
+ * of_machine_compatible_match - Test root of device tree against a compatible array
+ * @compats: NULL terminated array of compatible strings to look for in root node's compatible property.
+ *
+ * Returns true if the root node has any of the given compatible values in its
+ * compatible property.
+ */
+bool of_machine_compatible_match(const char *const *compats)
+{
+	struct device_node *root;
+	int rc = 0;
+
+	root = of_node_get(of_root);
+	if (root) {
+		rc = of_device_compatible_match(root, compats);
+		of_node_put(root);
+	}
+
+	return rc != 0;
+}
+
 /**
  * of_machine_is_compatible - Test root of device tree for a given compatible value
  * @compat: compatible string to look for in root node's compatible property.
diff --git a/include/linux/of.h b/include/linux/of.h
index 4d25e4f952d9..05e3e23a3a57 100644
--- a/include/linux/of.h
+++ b/include/linux/of.h
@@ -389,6 +389,7 @@ extern int of_alias_get_id(struct device_node *np, const char *stem);
 extern int of_alias_get_highest_id(const char *stem);
 
 extern int of_machine_is_compatible(const char *compat);
+extern bool of_machine_compatible_match(const char *const *compats);
 
 extern int of_add_property(struct device_node *np, struct property *prop);
 extern int of_remove_property(struct device_node *np, struct property *prop);
@@ -877,6 +878,11 @@ static inline int of_machine_is_compatible(const char *compat)
 	return 0;
 }
 
+static inline bool of_machine_compatible_match(const char *const *compats)
+{
+	return false;
+}
+
 static inline bool of_console_check(const struct device_node *dn, const char *name, int index)
 {
 	return false;
-- 
2.14.1
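
The matching that of_machine_compatible_match() delegates to of_device_compatible_match() can be pictured with a small userspace model. This is a sketch under stated assumptions, not the OF code: the compatible property is taken to be a well-formed run of NUL-terminated strings, compat_prop_matches_any() is a hypothetical name, and plain strcmp() stands in for the kernel's own comparison and scoring logic. It returns true as soon as any entry of the NULL-terminated array appears in the property.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return true if any string in the NULL-terminated
 * array `compats` equals one of the NUL-terminated strings packed back
 * to back in `prop` (a compatible property value of `len` bytes).
 * Assumes the last string in `prop` is NUL-terminated within `len`.
 */
bool compat_prop_matches_any(const char *prop, size_t len,
			     const char *const *compats)
{
	assert(prop != NULL && compats != NULL);

	for (const char *const *c = compats; *c != NULL; c++) {
		const char *p = prop;

		while (p < prop + len) {
			if (strcmp(p, *c) == 0)
				return true;
			p += strlen(p) + 1;	/* skip to the next string */
		}
	}
	return false;
}
```

Because the array is NULL-terminated, a caller can pass a static list such as { "vendor,board-a", "vendor,board-b", NULL } (hypothetical names) without a separate length, which is the calling convention the patch describes.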

^ permalink raw reply related
