* Re: [PATCH] powerpc/pseries/hotplug-cpu: increase wait time for vCPU death
From: Michael Roth @ 2020-08-05 4:37 UTC
To: Greg Kurz, Michael Ellerman
Cc: Nathan Lynch, linuxppc-dev, Cedric Le Goater,
Thiago Jung Bauermann
In-Reply-To: <87zh79yen7.fsf@mpe.ellerman.id.au>
Quoting Michael Ellerman (2020-08-04 22:07:08)
> Greg Kurz <groug@kaod.org> writes:
> > On Tue, 04 Aug 2020 23:35:10 +1000
> > Michael Ellerman <mpe@ellerman.id.au> wrote:
> >> There is a bit of history to this code, but not in a good way :)
> >>
> >> Michael Roth <mdroth@linux.vnet.ibm.com> writes:
> >> > For a power9 KVM guest with XIVE enabled, running a test loop
> >> > where we hotplug 384 vcpus and then unplug them, the following traces
> >> > can be seen (generally within a few loops) either from the unplugged
> >> > vcpu:
> >> >
> >> > [ 1767.353447] cpu 65 (hwid 65) Ready to die...
> >> > [ 1767.952096] Querying DEAD? cpu 66 (66) shows 2
> >> > [ 1767.952311] list_del corruption. next->prev should be c00a000002470208, but was c00a000002470048
> >> ...
> >> >
> >> > At that point the worker thread assumes the unplugged CPU is in some
> >> > unknown/dead state and proceeds with the cleanup, causing the race with
> >> > the XIVE cleanup code executed by the unplugged CPU.
> >> >
> >> > Fix this by inserting an msleep() after each RTAS call to avoid
> >>
> >> We previously had an msleep(), but it was removed:
> >>
> >> b906cfa397fd ("powerpc/pseries: Fix cpu hotplug")
> >
> > Ah, I hadn't seen that one...
> >
> >> > pseries_cpu_die() returning prematurely, and double the number of
> >> > attempts so we wait at least a total of 5 seconds. While this isn't an
> >> > ideal solution, it is similar to how we dealt with a similar issue for
> >> > cede_offline mode in the past (940ce422a3).
> >>
> >> Thiago tried to fix this previously but there was a bit of discussion
> >> that didn't quite resolve:
> >>
> >> https://lore.kernel.org/linuxppc-dev/20190423223914.3882-1-bauerman@linux.ibm.com/
> >
> > Yeah it appears that the motivation at the time was to make the "Querying DEAD?"
> > messages disappear and to avoid potentially concurrent calls to rtas-stop-self
> > which is prohibited by PAPR... not fixing actual crashes.
>
> I'm pretty sure at one point we were triggering crashes *in* RTAS via
> this path, I think that got resolved.
>
> >> Spinning forever seems like a bad idea, but as has been demonstrated at
> >> least twice now, continuing when we don't know the state of the other
> >> CPU can lead to straight up crashes.
> >>
> >> So I think I'm persuaded that it's preferable to have the kernel stuck
> >> spinning rather than oopsing.
> >>
> >
> > +1
> >
> >> I'm 50/50 on whether we should have a cond_resched() in the loop. My
> >> first instinct is no, if we're stuck here for 20s a stack trace would be
> >> good. But then we will probably hit that on some big and/or heavily
> >> loaded machine.
> >>
> >> So possibly we should call cond_resched() but have some custom logic in
> >> the loop to print a warning if we are stuck for more than some
> >> sufficiently long amount of time.
> >
> > How long should that be ?
>
> Yeah good question.
>
> I guess step one would be seeing how long it can take on the 384 vcpu
> machine. And we can probably test on some other big machines.
>
> Hopefully Nathan can give us some idea of how long he's seen it take on
> large systems? I know he was concerned about the 20s timeout of the
> softlockup detector.
>
> Maybe a minute or two?
Hmm, so I took a stab at this where I called cond_resched() after
every 5 seconds of polling and printed a warning at the same time (FWIW
that doesn't seem to trigger any warnings on a loaded 96-core mihawk
system using KVM running the 384-vCPU unplug loop).

But it sounds like that's not quite what you had in mind. How frequently
do you think we should call cond_resched()? Maybe after 25 iterations
of polling smp_query_cpu_stopped() to keep original behavior somewhat
similar?
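The polling strategy under discussion (keep polling indefinitely, reschedule every N failed polls, warn once if stuck for too long) can be sketched as a userspace model. This is a hypothetical sketch, not the pseries code: toy_query_cpu_stopped() and toy_cond_resched() stand in for smp_query_cpu_stopped() and cond_resched(), and both the yield interval and the warning threshold are parameters because the thread has not settled on values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical stand-ins; none of these exist in the kernel. */
static long polls_until_stopped; /* the "CPU" reports stopped on this query */
static long resched_calls;       /* how often we yielded */
static long warnings;            /* how often we warned */

/* Models smp_query_cpu_stopped(): true once the CPU has stopped. */
static bool toy_query_cpu_stopped(void)
{
	return --polls_until_stopped <= 0;
}

/* Models cond_resched(): the kernel would let other tasks run here. */
static void toy_cond_resched(void)
{
	resched_calls++;
}

/*
 * Poll until @cpu reports stopped, never giving up. After every
 * @yield_every failed polls, reschedule; at those points, if more than
 * @warn_after_s seconds have elapsed, print a single warning.
 * Returns the total number of queries made.
 */
static long wait_for_cpu_stop(int cpu, long yield_every, double warn_after_s)
{
	time_t start = time(NULL);
	long failed = 0;
	bool warned = false;

	assert(yield_every > 0);
	while (!toy_query_cpu_stopped()) {
		if (++failed % yield_every != 0)
			continue;
		toy_cond_resched();
		if (!warned && difftime(time(NULL), start) > warn_after_s) {
			fprintf(stderr, "cpu %d still not stopped after %.0f s\n",
				cpu, warn_after_s);
			warnings++;
			warned = true;
		}
	}
	return failed + 1; /* failed polls plus the successful one */
}
```

For example, with polls_until_stopped = 100 and yield_every = 25, wait_for_cpu_stop() makes 100 queries and reschedules three times (after failed polls 25, 50 and 75). Checking the clock only at reschedule points keeps the time lookup off the hot path.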
I'll let the current patch run on the mihawk system overnight in the
meantime so we at least have that data point, but would be good to
know what things look like on a large pHyp machine.
Thanks!
>
> >> > Fixes: eac1e731b59ee ("powerpc/xive: guest exploitation of the XIVE interrupt controller")
> >> > Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1856588
> >>
> >> This is not public.
> >
> > I'll have a look at changing that.
>
> Thanks.
>
> cheers
* Re: [PATCH v2 13/17] x86/setup: simplify initrd relocation and reservation
From: Baoquan He @ 2020-08-05 4:20 UTC
To: Mike Rapoport
Cc: Emil Renner Berthing, linux-sh, Peter Zijlstra, Dave Hansen,
linux-mips, Max Filippov, Paul Mackerras, sparclinux, linux-riscv,
Will Deacon, Stafford Horne, Marek Szyprowski, linux-arch,
linux-s390, linux-c6x-dev, Yoshinori Sato, x86, Russell King,
Mike Rapoport, clang-built-linux, Ingo Molnar, linux-arm-kernel,
Catalin Marinas, uclinux-h8-devel, linux-xtensa, openrisc,
Borislav Petkov, Andy Lutomirski, Paul Walmsley, Thomas Gleixner,
Hari Bathini, Michal Simek, linux-mm, linuxppc-dev, linux-kernel,
iommu, Palmer Dabbelt, Andrew Morton, Christoph Hellwig
In-Reply-To: <20200802163601.8189-14-rppt@kernel.org>
On 08/02/20 at 07:35pm, Mike Rapoport wrote:
> From: Mike Rapoport <rppt@linux.ibm.com>
>
> Currently, the initrd image is reserved very early during setup and then it
> might be relocated and re-reserved after the initial physical memory
> mapping is created. The "late" reservation of memblock verifies that mapped
> memory size exceeds the size of initrd, the checks whether the relocation
~ then?
> is required and, if so, relocates initrd to new memory allocated from
> memblock and frees the old location.
>
> The check for memory size is redundant, as the memblock allocation will
> fail anyway if there is not enough memory. Besides, there is no point in
> allocating memory from memblock using memblock_find_in_range() +
> memblock_reserve() when memblock_phys_alloc_range() provides the required
> functionality.
>
> Remove the redundant check and simplify memblock allocation.
>
> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
> ---
> arch/x86/kernel/setup.c | 16 +++-------------
> 1 file changed, 3 insertions(+), 13 deletions(-)
>
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index a3767e74c758..d8de4053c5e8 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -262,16 +262,12 @@ static void __init relocate_initrd(void)
> u64 area_size = PAGE_ALIGN(ramdisk_size);
>
> /* We need to move the initrd down into directly mapped mem */
> - relocated_ramdisk = memblock_find_in_range(0, PFN_PHYS(max_pfn_mapped),
> - area_size, PAGE_SIZE);
> -
> + relocated_ramdisk = memblock_phys_alloc_range(area_size, PAGE_SIZE, 0,
> + PFN_PHYS(max_pfn_mapped));
> if (!relocated_ramdisk)
> panic("Cannot find place for new RAMDISK of size %lld\n",
> ramdisk_size);
>
> - /* Note: this includes all the mem currently occupied by
> - the initrd, we rely on that fact to keep the data intact. */
> - memblock_reserve(relocated_ramdisk, area_size);
> initrd_start = relocated_ramdisk + PAGE_OFFSET;
> initrd_end = initrd_start + ramdisk_size;
> printk(KERN_INFO "Allocated new RAMDISK: [mem %#010llx-%#010llx]\n",
> @@ -298,13 +294,13 @@ static void __init early_reserve_initrd(void)
>
> memblock_reserve(ramdisk_image, ramdisk_end - ramdisk_image);
> }
> +
> static void __init reserve_initrd(void)
> {
> /* Assume only end is not page aligned */
> u64 ramdisk_image = get_ramdisk_image();
> u64 ramdisk_size = get_ramdisk_size();
> u64 ramdisk_end = PAGE_ALIGN(ramdisk_image + ramdisk_size);
> - u64 mapped_size;
>
> if (!boot_params.hdr.type_of_loader ||
> !ramdisk_image || !ramdisk_size)
> @@ -312,12 +308,6 @@ static void __init reserve_initrd(void)
>
> initrd_start = 0;
>
> - mapped_size = memblock_mem_size(max_pfn_mapped);
> - if (ramdisk_size >= (mapped_size>>1))
> - panic("initrd too large to handle, "
> - "disabling initrd (%lld needed, %lld available)\n",
> - ramdisk_size, mapped_size>>1);
Reviewed-by: Baoquan He <bhe@redhat.com>
> -
> printk(KERN_INFO "RAMDISK: [mem %#010llx-%#010llx]\n", ramdisk_image,
> ramdisk_end - 1);
>
> --
> 2.26.2
>
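The commit message's argument (memblock_phys_alloc_range() does what memblock_find_in_range() plus memblock_reserve() did, and already fails when nothing fits) can be illustrated with a toy first-fit allocator. All toy_* names are hypothetical; real memblock allocates top-down by default and tracks regions differently, so this models only the find-then-reserve contract.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_MAX_RESERVED 16

/* Hypothetical model of memblock.reserved; not the real data structure. */
struct toy_range {
	uint64_t base, size;
};

static struct toy_range toy_reserved[TOY_MAX_RESERVED];
static int toy_nr_reserved;

static bool toy_overlaps_reserved(uint64_t base, uint64_t size)
{
	for (int i = 0; i < toy_nr_reserved; i++)
		if (base < toy_reserved[i].base + toy_reserved[i].size &&
		    toy_reserved[i].base < base + size)
			return true;
	return false;
}

/*
 * Like memblock_find_in_range(): find a free aligned block but do NOT
 * reserve it. First fit, bottom-up. Address 0 is never returned, so 0
 * signals failure.
 */
static uint64_t toy_find_in_range(uint64_t start, uint64_t end,
				  uint64_t size, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	uint64_t base = (start + align - 1) & ~(align - 1);

	if (base == 0)
		base = align;
	for (; base + size <= end; base += align)
		if (!toy_overlaps_reserved(base, size))
			return base;
	return 0;
}

/* Like memblock_reserve(). */
static void toy_reserve(uint64_t base, uint64_t size)
{
	assert(toy_nr_reserved < TOY_MAX_RESERVED);
	toy_reserved[toy_nr_reserved++] = (struct toy_range){ base, size };
}

/* Like memblock_phys_alloc_range(): find and reserve in one call. */
static uint64_t toy_phys_alloc_range(uint64_t size, uint64_t align,
				     uint64_t start, uint64_t end)
{
	uint64_t base = toy_find_in_range(start, end, size, align);

	if (base)
		toy_reserve(base, size);
	return base;
}
```

Because a failed allocation already returns 0 in this model, a separate size check like the removed mapped_size test adds nothing: the caller's `if (!relocated_ramdisk) panic(...)` covers that case.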
* Re: [PATCH v2 2/2] ASoC: fsl_sai: Refine enable and disable sequence for synchronous mode
From: Nicolin Chen @ 2020-08-05 4:11 UTC
To: Shengjiu Wang
Cc: alsa-devel, timur, Xiubo.Lee, linuxppc-dev, tiwai, lgirdwood,
perex, broonie, festevam, linux-kernel
In-Reply-To: <1596594233-13489-3-git-send-email-shengjiu.wang@nxp.com>
On Wed, Aug 05, 2020 at 10:23:53AM +0800, Shengjiu Wang wrote:
> Tx synchronous with Rx:
> TCSR.TE does not need to be enabled when only Rx is going to be enabled.
> Check whether RCSR.RE needs to be disabled before disabling TCSR.TE.
>
> Rx synchronous with Tx:
> RCSR.RE does not need to be enabled when only Tx is going to be enabled.
> Check whether TCSR.TE needs to be disabled before disabling RCSR.RE.
Please add more context to the commit log, such as what we have
discussed: what the problem with the current driver is, and why we
_have_to_ apply this change even though it's slightly against what the
RM recommends.

(If a change is straightforward, it's okay to keep the text short.
Yet I believe that this change deserves more than these lines.)

One thing you should mention (also the main reason why I'm
convinced to add this change): trigger() is still in the shape of
the early version, where we only supported one operation mode,
Tx synchronous with Rx. So it needs an update for the other modes.
> Signed-off-by: Shengjiu Wang <shengjiu.wang@nxp.com>
The git-diff part looks good, please add this in next ver.:
Reviewed-by: Nicolin Chen <nicoleotsuka@gmail.com>
Btw, the new fsl_sai_dir_is_synced() can probably be applied to
other places in a follow-up patch.
* Re: [PATCH] powerpc/pseries/hotplug-cpu: increase wait time for vCPU death
From: Thiago Jung Bauermann @ 2020-08-05 4:01 UTC
To: Michael Ellerman
Cc: Nathan Lynch, Cedric Le Goater, linuxppc-dev, Greg Kurz,
Michael Roth
In-Reply-To: <87zh79yen7.fsf@mpe.ellerman.id.au>
Michael Ellerman <mpe@ellerman.id.au> writes:
> Greg Kurz <groug@kaod.org> writes:
>> On Tue, 04 Aug 2020 23:35:10 +1000
>> Michael Ellerman <mpe@ellerman.id.au> wrote:
>>> There is a bit of history to this code, but not in a good way :)
>>>
>>> Michael Roth <mdroth@linux.vnet.ibm.com> writes:
>>> > For a power9 KVM guest with XIVE enabled, running a test loop
>>> > where we hotplug 384 vcpus and then unplug them, the following traces
>>> > can be seen (generally within a few loops) either from the unplugged
>>> > vcpu:
>>> >
>>> > [ 1767.353447] cpu 65 (hwid 65) Ready to die...
>>> > [ 1767.952096] Querying DEAD? cpu 66 (66) shows 2
>>> > [ 1767.952311] list_del corruption. next->prev should be c00a000002470208, but was c00a000002470048
>>> ...
>>> >
>>> > At that point the worker thread assumes the unplugged CPU is in some
>>> > unknown/dead state and proceeds with the cleanup, causing the race with
>>> > the XIVE cleanup code executed by the unplugged CPU.
>>> >
>>> > Fix this by inserting an msleep() after each RTAS call to avoid
>>>
>>> We previously had an msleep(), but it was removed:
>>>
>>> b906cfa397fd ("powerpc/pseries: Fix cpu hotplug")
>>
>> Ah, I hadn't seen that one...
>>
>>> > pseries_cpu_die() returning prematurely, and double the number of
>>> > attempts so we wait at least a total of 5 seconds. While this isn't an
>>> > ideal solution, it is similar to how we dealt with a similar issue for
>>> > cede_offline mode in the past (940ce422a3).
>>>
>>> Thiago tried to fix this previously but there was a bit of discussion
>>> that didn't quite resolve:
>>>
>>> https://lore.kernel.org/linuxppc-dev/20190423223914.3882-1-bauerman@linux.ibm.com/
>>
>> Yeah it appears that the motivation at the time was to make the "Querying DEAD?"
>> messages disappear and to avoid potentially concurrent calls to rtas-stop-self
>> which is prohibited by PAPR... not fixing actual crashes.
>
> I'm pretty sure at one point we were triggering crashes *in* RTAS via
> this path, I think that got resolved.
Yes, pHyp's RTAS now tolerates concurrent calls to stop-self. The
original bug that was reported when I worked on this ended in an RTAS
crash because of this timeout. The crash was fixed then.
--
Thiago Jung Bauermann
IBM Linux Technology Center
* Re: [PATCH v2 11/17] arch, mm: replace for_each_memblock() with for_each_mem_pfn_range()
From: Baoquan He @ 2020-08-05 3:57 UTC
To: Mike Rapoport
Cc: Emil Renner Berthing, linux-sh, Peter Zijlstra, Dave Hansen,
linux-mips, Max Filippov, Paul Mackerras, sparclinux, linux-riscv,
Will Deacon, Stafford Horne, Marek Szyprowski, linux-arch,
linux-s390, linux-c6x-dev, Yoshinori Sato, x86, Russell King,
Mike Rapoport, clang-built-linux, Ingo Molnar, linux-arm-kernel,
Catalin Marinas, uclinux-h8-devel, linux-xtensa, openrisc,
Borislav Petkov, Andy Lutomirski, Paul Walmsley, Thomas Gleixner,
Hari Bathini, Michal Simek, linux-mm, linuxppc-dev, linux-kernel,
iommu, Palmer Dabbelt, Andrew Morton, Christoph Hellwig
In-Reply-To: <20200802163601.8189-12-rppt@kernel.org>
On 08/02/20 at 07:35pm, Mike Rapoport wrote:
> From: Mike Rapoport <rppt@linux.ibm.com>
>
> There are several occurrences of the following pattern:
>
> for_each_memblock(memory, reg) {
> start_pfn = memblock_region_memory_base_pfn(reg);
> end_pfn = memblock_region_memory_end_pfn(reg);
>
> /* do something with start_pfn and end_pfn */
> }
>
> Rather than iterate over all memblock.memory regions and each time query
> for their start and end PFNs, use for_each_mem_pfn_range() iterator to get
> simpler and clearer code.
>
> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
> ---
> arch/arm/mm/init.c | 11 ++++-------
> arch/arm64/mm/init.c | 11 ++++-------
> arch/powerpc/kernel/fadump.c | 11 ++++++-----
> arch/powerpc/mm/mem.c | 15 ++++++++-------
> arch/powerpc/mm/numa.c | 7 ++-----
> arch/s390/mm/page-states.c | 6 ++----
> arch/sh/mm/init.c | 9 +++------
> mm/memblock.c | 6 ++----
> mm/sparse.c | 10 ++++------
> 9 files changed, 35 insertions(+), 51 deletions(-)
>
Reviewed-by: Baoquan He <bhe@redhat.com>
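The iterator refactoring can be modeled in userspace by comparing the old per-region pattern with a macro that yields each PFN range directly. for_each_toy_pfn_range() and the other toy_* names are hypothetical and much simpler than the real for_each_mem_pfn_range(), which also takes a node id and reports each range's node; the PFN_UP/PFN_DOWN rounding mirrors memblock_region_memory_base_pfn() and memblock_region_memory_end_pfn().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12
#define TOY_PAGE_SIZE ((uint64_t)1 << TOY_PAGE_SHIFT)
#define TOY_PFN_UP(x) ((unsigned long)(((x) + TOY_PAGE_SIZE - 1) >> TOY_PAGE_SHIFT))
#define TOY_PFN_DOWN(x) ((unsigned long)((x) >> TOY_PAGE_SHIFT))

/* Hypothetical stand-in for memblock.memory. */
struct toy_region {
	uint64_t base, size;
};

static const struct toy_region toy_memory[] = {
	{ 0x00000000, 0x0009f000 },
	{ 0x00100000, 0x3ff00000 },
};
#define TOY_NR_REGIONS (sizeof(toy_memory) / sizeof(toy_memory[0]))

/* Same rounding as memblock_region_memory_{base,end}_pfn(). */
static unsigned long toy_base_pfn(const struct toy_region *r)
{
	return TOY_PFN_UP(r->base);
}

static unsigned long toy_end_pfn(const struct toy_region *r)
{
	return TOY_PFN_DOWN(r->base + r->size);
}

/* Iterator handing out each region's PFN range directly. */
#define for_each_toy_pfn_range(i, start, end)                 \
	for ((i) = 0;                                          \
	     (i) < TOY_NR_REGIONS &&                           \
	     ((start) = toy_base_pfn(&toy_memory[(i)]),        \
	      (end) = toy_end_pfn(&toy_memory[(i)]), 1);       \
	     (i)++)

/* Old pattern: walk the regions and query each one's PFNs. */
static unsigned long count_pages_old(void)
{
	unsigned long total = 0;

	for (size_t i = 0; i < TOY_NR_REGIONS; i++) {
		const struct toy_region *reg = &toy_memory[i];
		unsigned long start_pfn = toy_base_pfn(reg);
		unsigned long end_pfn = toy_end_pfn(reg);

		total += end_pfn - start_pfn;
	}
	return total;
}

/* New pattern: the iterator supplies start_pfn and end_pfn. */
static unsigned long count_pages_new(void)
{
	unsigned long total = 0, start_pfn, end_pfn;
	size_t i;

	for_each_toy_pfn_range(i, start_pfn, end_pfn) {
		assert(end_pfn >= start_pfn); /* toy data has no sub-page regions */
		total += end_pfn - start_pfn;
	}
	return total;
}
```

Both functions count 262047 pages for the two regions shown; the loop body in the new version only deals with the PFN range it cares about.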
* Re: [PATCH v2 02/17] dma-contiguous: simplify cma_early_percent_memory()
From: Baoquan He @ 2020-08-05 3:50 UTC
To: Mike Rapoport
Cc: Emil Renner Berthing, linux-sh, Peter Zijlstra, Dave Hansen,
linux-mips, Max Filippov, Paul Mackerras, sparclinux, linux-riscv,
Will Deacon, Stafford Horne, Marek Szyprowski, linux-arch,
linux-s390, linux-c6x-dev, Yoshinori Sato, x86, Russell King,
Mike Rapoport, clang-built-linux, Ingo Molnar, linux-arm-kernel,
Catalin Marinas, uclinux-h8-devel, linux-xtensa, openrisc,
Borislav Petkov, Andy Lutomirski, Paul Walmsley, Thomas Gleixner,
Hari Bathini, Michal Simek, linux-mm, linuxppc-dev, linux-kernel,
iommu, Palmer Dabbelt, Andrew Morton, Christoph Hellwig
In-Reply-To: <20200802163601.8189-3-rppt@kernel.org>
On 08/02/20 at 07:35pm, Mike Rapoport wrote:
> From: Mike Rapoport <rppt@linux.ibm.com>
>
> The memory size calculation in cma_early_percent_memory() traverses
> memblock.memory rather than simply calling memblock_phys_mem_size(). The
> comment in that function suggests that at some point there should have been
> a call to memblock_analyze() before memblock_phys_mem_size() could be used.
> As of now, there is no memblock_analyze() at all and
> memblock_phys_mem_size() can be used as soon as cold-plug memory is
> registered with memblock.
>
> Replace loop over memblock.memory with a call to memblock_phys_mem_size().
>
> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> ---
> kernel/dma/contiguous.c | 11 +----------
> 1 file changed, 1 insertion(+), 10 deletions(-)
>
> diff --git a/kernel/dma/contiguous.c b/kernel/dma/contiguous.c
> index 15bc5026c485..1992afd8ca7b 100644
> --- a/kernel/dma/contiguous.c
> +++ b/kernel/dma/contiguous.c
> @@ -73,16 +73,7 @@ early_param("cma", early_cma);
>
> static phys_addr_t __init __maybe_unused cma_early_percent_memory(void)
> {
> - struct memblock_region *reg;
> - unsigned long total_pages = 0;
> -
> - /*
> - * We cannot use memblock_phys_mem_size() here, because
> - * memblock_analyze() has not been called yet.
> - */
> - for_each_memblock(memory, reg)
> - total_pages += memblock_region_memory_end_pfn(reg) -
> - memblock_region_memory_base_pfn(reg);
> + unsigned long total_pages = PHYS_PFN(memblock_phys_mem_size());
Reviewed-by: Baoquan He <bhe@redhat.com>
>
> return (total_pages * CONFIG_CMA_SIZE_PERCENTAGE / 100) << PAGE_SHIFT;
> }
> --
> 2.26.2
>
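The replacement computes the same page count as the removed loop as long as every memblock.memory region is page aligned; with unaligned regions, rounding each region separately can give a smaller total than converting the byte sum once. The sketch below checks that on toy data. All names are hypothetical, and TOY_CMA_SIZE_PERCENTAGE plays the role of CONFIG_CMA_SIZE_PERCENTAGE.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12
#define TOY_PAGE_SIZE ((uint64_t)1 << TOY_PAGE_SHIFT)
#define TOY_PHYS_PFN(x) ((unsigned long)((x) >> TOY_PAGE_SHIFT))
#define TOY_CMA_SIZE_PERCENTAGE 10 /* plays the role of CONFIG_CMA_SIZE_PERCENTAGE */

struct toy_region {
	uint64_t base, size;
};

static const struct toy_region toy_memory[] = {
	{ 0x00001000, 0x0009e000 },
	{ 0x00100000, 0x7ff00000 },
};
#define TOY_NR_REGIONS (sizeof(toy_memory) / sizeof(toy_memory[0]))

/* Old approach: sum each region's whole pages, PFN_UP(base)..PFN_DOWN(end). */
static unsigned long total_pages_loop(void)
{
	unsigned long total = 0;

	for (size_t i = 0; i < TOY_NR_REGIONS; i++) {
		uint64_t base = toy_memory[i].base;
		uint64_t end = base + toy_memory[i].size;

		total += (unsigned long)(end >> TOY_PAGE_SHIFT) -
			 (unsigned long)((base + TOY_PAGE_SIZE - 1) >> TOY_PAGE_SHIFT);
	}
	return total;
}

/* Stand-in for memblock_phys_mem_size(): total bytes of memory. */
static uint64_t toy_phys_mem_size(void)
{
	uint64_t bytes = 0;

	for (size_t i = 0; i < TOY_NR_REGIONS; i++) {
		/* This model only covers page-aligned regions. */
		assert(((toy_memory[i].base | toy_memory[i].size) &
			(TOY_PAGE_SIZE - 1)) == 0);
		bytes += toy_memory[i].size;
	}
	return bytes;
}

/* New approach, as in the patch: PHYS_PFN(memblock_phys_mem_size()). */
static unsigned long total_pages_direct(void)
{
	return TOY_PHYS_PFN(toy_phys_mem_size());
}

/* Same formula as cma_early_percent_memory(), computed in 64 bits. */
static uint64_t toy_cma_bytes(unsigned long total_pages)
{
	return (uint64_t)(total_pages * TOY_CMA_SIZE_PERCENTAGE / 100) << TOY_PAGE_SHIFT;
}
```

With the two regions shown and a 10% setting, both functions report 524190 pages, and toy_cma_bytes() yields 214708224 bytes (52419 pages).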
* Re: [PATCH v2 1/2] ASoC: fsl_sai: Clean code for synchronous mode
From: Nicolin Chen @ 2020-08-05 3:43 UTC
To: Shengjiu Wang
Cc: alsa-devel, timur, Xiubo.Lee, linuxppc-dev, tiwai, lgirdwood,
perex, broonie, festevam, linux-kernel
In-Reply-To: <1596594233-13489-2-git-send-email-shengjiu.wang@nxp.com>
On Wed, Aug 05, 2020 at 10:23:52AM +0800, Shengjiu Wang wrote:
> Tx synchronous with Rx: RMR is the word mask register. It is used to
> mask any word in the frame and is unrelated to clock generation, so it
> does not need to be changed when Tx is going to be enabled.
>
> Rx synchronous with Tx: TMR is the word mask register. It is used to
> mask any word in the frame and is unrelated to clock generation, so it
> does not need to be changed when Rx is going to be enabled.
>
> Signed-off-by: Shengjiu Wang <shengjiu.wang@nxp.com>
Can you rename the PATCH subject to something more specific?
For example, "Drop TMR/RMR settings for synchronous mode".
Please add this once it's addressed:
Reviewed-by: Nicolin Chen <nicoleotsuka@gmail.com>
* Re: [PATCH v5 4/4] powerpc/pseries/iommu: Allow bigger 64bit window by removing default DMA window
From: Alexey Kardashevskiy @ 2020-08-05 3:17 UTC
To: Leonardo Bras, Michael Ellerman, Benjamin Herrenschmidt,
Paul Mackerras, Thiago Jung Bauermann, Ram Pai, Brian King,
Murilo Fossa Vicentini, David Dai
Cc: linuxppc-dev, linux-kernel
In-Reply-To: <20200805030455.123024-5-leobras.c@gmail.com>
On 05/08/2020 13:04, Leonardo Bras wrote:
> On LoPAR "DMA Window Manipulation Calls", it's recommended to remove the
> default DMA window for the device, before attempting to configure a DDW,
> in order to make the maximum resources available for the next DDW to be
> created.
>
> This is a requirement for using DDW on devices in which the hypervisor
> allows only one DMA window.
>
> If setting up a new DDW fails anywhere after the removal of this
> default DMA window, the default DMA window must be restored.
> This requires an implementation of the ibm,reset-pe-dma-windows RTAS
> call:
>
> Platforms supporting the DDW option starting with LoPAR level 2.7 implement
> ibm,ddw-extensions. The first extension available (index 2) carries the
> token for ibm,reset-pe-dma-windows rtas call, which is used to restore
> the default DMA window for a device, if it has been deleted.
>
> It does so by resetting the TCE table allocation for the PE to its
> boot-time value, available in the "ibm,dma-window" device tree property.
>
> Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
> Tested-by: David Dai <zdai@linux.vnet.ibm.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
> arch/powerpc/platforms/pseries/iommu.c | 73 +++++++++++++++++++++++---
> 1 file changed, 66 insertions(+), 7 deletions(-)
>
> diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
> index 4e33147825cc..e4198700ed1a 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -1066,6 +1066,38 @@ static phys_addr_t ddw_memory_hotplug_max(void)
> return max_addr;
> }
>
> +/*
> + * Platforms supporting the DDW option starting with LoPAR level 2.7 implement
> + * ibm,ddw-extensions, which carries the rtas token for
> + * ibm,reset-pe-dma-windows.
> + * That rtas-call can be used to restore the default DMA window for the device.
> + */
> +static void reset_dma_window(struct pci_dev *dev, struct device_node *par_dn)
> +{
> + int ret;
> + u32 cfg_addr, reset_dma_win;
> + u64 buid;
> + struct device_node *dn;
> + struct pci_dn *pdn;
> +
> + ret = ddw_read_ext(par_dn, DDW_EXT_RESET_DMA_WIN, &reset_dma_win);
> + if (ret)
> + return;
> +
> + dn = pci_device_to_OF_node(dev);
> + pdn = PCI_DN(dn);
> + buid = pdn->phb->buid;
> + cfg_addr = (pdn->busno << 16) | (pdn->devfn << 8);
> +
> + ret = rtas_call(reset_dma_win, 3, 1, NULL, cfg_addr, BUID_HI(buid),
> + BUID_LO(buid));
> + if (ret)
> + dev_info(&dev->dev,
> + "ibm,reset-pe-dma-windows(%x) %x %x %x returned %d ",
> + reset_dma_win, cfg_addr, BUID_HI(buid), BUID_LO(buid),
> + ret);
> +}
> +
> /*
> * If the PE supports dynamic dma windows, and there is space for a table
> * that can map all pages in a linear offset, then setup such a table,
> @@ -1090,6 +1122,7 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
> struct property *win64;
> struct dynamic_dma_window_prop *ddwprop;
> struct failed_ddw_pdn *fpdn;
> + bool default_win_removed = false;
>
> mutex_lock(&direct_window_init_mutex);
>
> @@ -1133,14 +1166,38 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
> if (ret != 0)
> goto out_failed;
>
> + /*
> + * If there is no window available, remove the default DMA window,
> + * if it's present. This will make all the resources available to the
> + * new DDW window.
> + * If anything fails after this, we need to restore it, so also check
> + * for extensions presence.
> + */
> if (query.windows_available == 0) {
> - /*
> - * no additional windows are available for this device.
> - * We might be able to reallocate the existing window,
> - * trading in for a larger page size.
> - */
> - dev_dbg(&dev->dev, "no free dynamic windows");
> - goto out_failed;
> + struct property *default_win;
> + int reset_win_ext;
> +
> + default_win = of_find_property(pdn, "ibm,dma-window", NULL);
> + if (!default_win)
> + goto out_failed;
> +
> + reset_win_ext = ddw_read_ext(pdn, DDW_EXT_RESET_DMA_WIN, NULL);
> + if (reset_win_ext)
> + goto out_failed;
> +
> + remove_dma_window(pdn, ddw_avail, default_win);
> + default_win_removed = true;
> +
> + /* Query again, to check if the window is available */
> + ret = query_ddw(dev, ddw_avail, &query, pdn);
> + if (ret != 0)
> + goto out_failed;
> +
> + if (query.windows_available == 0) {
> + /* no windows are available for this device. */
> + dev_dbg(&dev->dev, "no free dynamic windows");
> + goto out_failed;
> + }
> }
> if (query.page_size & 4) {
> page_shift = 24; /* 16MB */
> @@ -1231,6 +1288,8 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
> kfree(win64);
>
> out_failed:
> + if (default_win_removed)
> + reset_dma_window(dev, pdn);
>
> fpdn = kzalloc(sizeof(*fpdn), GFP_KERNEL);
> if (!fpdn)
>
--
Alexey
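The control flow added to enable_ddw() (remove the default window only when it exists and the reset extension is available, query again, and restore the default window on any later failure) can be modeled with a small state machine. Everything here is a hypothetical userspace sketch: struct toy_pe and its flags stand in for device-tree and RTAS state, and the hypervisor is assumed to allow exactly one window when hv_single_window is set.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-PE state; none of these names exist in the kernel. */
struct toy_pe {
	bool default_win_present; /* "ibm,dma-window" still in place */
	bool has_reset_ext;       /* ibm,ddw-extensions has the reset token */
	bool hv_single_window;    /* hypervisor allows only one DMA window */
	bool create_fails;        /* force the DDW creation step to fail */
	bool ddw_created;
};

/* Models ibm,query-pe-dma-windows: how many more windows can be created. */
static int toy_windows_available(const struct toy_pe *pe)
{
	if (pe->hv_single_window && pe->default_win_present)
		return 0;
	return 1;
}

/* Models the new enable_ddw() flow; returns true if a DDW was created. */
static bool toy_enable_ddw(struct toy_pe *pe)
{
	bool default_win_removed = false;

	assert(!pe->ddw_created); /* only first-time setup is modeled */

	if (toy_windows_available(pe) == 0) {
		/* Remove the default window only if it can be restored later. */
		if (!pe->default_win_present || !pe->has_reset_ext)
			goto out_failed;
		pe->default_win_present = false;    /* remove_dma_window() */
		default_win_removed = true;
		if (toy_windows_available(pe) == 0) /* query_ddw() again */
			goto out_failed;
	}
	if (pe->create_fails)
		goto out_failed;
	pe->ddw_created = true;
	return true;

out_failed:
	if (default_win_removed)
		pe->default_win_present = true;     /* reset_dma_window() */
	return false;
}
```

The key property is that the default window is never left removed on a failure path: either a DDW replaces it, or the reset restores it. Checking for the reset extension before removing anything is what makes that guarantee possible.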
* Re: [PATCH v5 3/4] powerpc/pseries/iommu: Move window-removing part of remove_ddw into remove_dma_window
From: Alexey Kardashevskiy @ 2020-08-05 3:17 UTC
To: Leonardo Bras, Michael Ellerman, Benjamin Herrenschmidt,
Paul Mackerras, Thiago Jung Bauermann, Ram Pai, Brian King,
Murilo Fossa Vicentini, David Dai
Cc: linuxppc-dev, linux-kernel
In-Reply-To: <20200805030455.123024-4-leobras.c@gmail.com>
On 05/08/2020 13:04, Leonardo Bras wrote:
> Move the window-removing part of remove_ddw into a new function
> (remove_dma_window), so it can be used to remove other DMA windows.
>
> It's useful for removing DMA windows that don't create DIRECT64_PROPNAME
> property, like the default DMA window from the device, which uses
> "ibm,dma-window".
>
> Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
> Tested-by: David Dai <zdai@linux.vnet.ibm.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
> arch/powerpc/platforms/pseries/iommu.c | 45 +++++++++++++++-----------
> 1 file changed, 27 insertions(+), 18 deletions(-)
>
> diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
> index 1a933c4e8bba..4e33147825cc 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -781,25 +781,14 @@ static int __init disable_ddw_setup(char *str)
>
> early_param("disable_ddw", disable_ddw_setup);
>
> -static void remove_ddw(struct device_node *np, bool remove_prop)
> +static void remove_dma_window(struct device_node *np, u32 *ddw_avail,
> + struct property *win)
> {
> struct dynamic_dma_window_prop *dwp;
> - struct property *win64;
> - u32 ddw_avail[DDW_APPLICABLE_SIZE];
> u64 liobn;
> - int ret = 0;
> -
> - ret = of_property_read_u32_array(np, "ibm,ddw-applicable",
> - &ddw_avail[0], DDW_APPLICABLE_SIZE);
> -
> - win64 = of_find_property(np, DIRECT64_PROPNAME, NULL);
> - if (!win64)
> - return;
> -
> - if (ret || win64->length < sizeof(*dwp))
> - goto delprop;
> + int ret;
>
> - dwp = win64->value;
> + dwp = win->value;
> liobn = (u64)be32_to_cpu(dwp->liobn);
>
> /* clear the whole window, note the arg is in kernel pages */
> @@ -821,10 +810,30 @@ static void remove_ddw(struct device_node *np, bool remove_prop)
> pr_debug("%pOF: successfully removed direct window: rtas returned "
> "%d to ibm,remove-pe-dma-window(%x) %llx\n",
> np, ret, ddw_avail[DDW_REMOVE_PE_DMA_WIN], liobn);
> +}
> +
> +static void remove_ddw(struct device_node *np, bool remove_prop)
> +{
> + struct property *win;
> + u32 ddw_avail[DDW_APPLICABLE_SIZE];
> + int ret = 0;
> +
> + ret = of_property_read_u32_array(np, "ibm,ddw-applicable",
> + &ddw_avail[0], DDW_APPLICABLE_SIZE);
> + if (ret)
> + return;
> +
> + win = of_find_property(np, DIRECT64_PROPNAME, NULL);
> + if (!win)
> + return;
> +
> + if (win->length >= sizeof(struct dynamic_dma_window_prop))
> + remove_dma_window(np, ddw_avail, win);
> +
> + if (!remove_prop)
> + return;
>
> -delprop:
> - if (remove_prop)
> - ret = of_remove_property(np, win64);
> + ret = of_remove_property(np, win);
> if (ret)
> pr_warn("%pOF: failed to remove direct window property: %d\n",
> np, ret);
>
--
Alexey
* Re: [PATCH v5 2/4] powerpc/pseries/iommu: Update call to ibm,query-pe-dma-windows
From: Alexey Kardashevskiy @ 2020-08-05 3:16 UTC
To: Leonardo Bras, Michael Ellerman, Benjamin Herrenschmidt,
Paul Mackerras, Thiago Jung Bauermann, Ram Pai, Brian King,
Murilo Fossa Vicentini, David Dai
Cc: linuxppc-dev, linux-kernel
In-Reply-To: <20200805030455.123024-3-leobras.c@gmail.com>
On 05/08/2020 13:04, Leonardo Bras wrote:
> From LoPAR level 2.8, "ibm,ddw-extensions" index 3 can make the number of
> outputs from "ibm,query-pe-dma-windows" go from 5 to 6.
>
> This change of output size is meant to expand the address size of
> largest_available_block PE TCE from 32-bit to 64-bit, which ends up
> shifting page_size and migration_capable.
>
> This ends up requiring the update of
> ddw_query_response->largest_available_block from u32 to u64, and manually
> assigning the values from the buffer into this struct, according to
> output size.
>
> Also, a routine was created to help read the DDW extensions as
> suggested by LoPAR: first read the size of the extension array from
> index 0, check that the property exists, and then return its value.
>
> Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
> Tested-by: David Dai <zdai@linux.vnet.ibm.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
> arch/powerpc/platforms/pseries/iommu.c | 91 +++++++++++++++++++++++---
> 1 file changed, 81 insertions(+), 10 deletions(-)
>
> diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
> index ac0d6376bdad..1a933c4e8bba 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -47,6 +47,12 @@ enum {
> DDW_APPLICABLE_SIZE
> };
>
> +enum {
> + DDW_EXT_SIZE = 0,
> + DDW_EXT_RESET_DMA_WIN = 1,
> + DDW_EXT_QUERY_OUT_SIZE = 2
> +};
> +
> static struct iommu_table_group *iommu_pseries_alloc_group(int node)
> {
> struct iommu_table_group *table_group;
> @@ -342,7 +348,7 @@ struct direct_window {
> /* Dynamic DMA Window support */
> struct ddw_query_response {
> u32 windows_available;
> - u32 largest_available_block;
> + u64 largest_available_block;
> u32 page_size;
> u32 migration_capable;
> };
> @@ -877,14 +883,62 @@ static int find_existing_ddw_windows(void)
> }
> machine_arch_initcall(pseries, find_existing_ddw_windows);
>
> +/**
> + * ddw_read_ext - Get the value of an DDW extension
> + * @np: device node from which the extension value is to be read.
> + * @extnum: index number of the extension.
> + * @value: pointer to return value, modified when extension is available.
> + *
> + * Checks if "ibm,ddw-extensions" exists for this node, and get the value
> + * on index 'extnum'.
> + * It can be used only to check if a property exists, passing value == NULL.
> + *
> + * Returns:
> + * 0 if extension successfully read
> + * -EINVAL if the "ibm,ddw-extensions" does not exist,
> + * -ENODATA if "ibm,ddw-extensions" does not have a value, and
> + * -EOVERFLOW if "ibm,ddw-extensions" does not contain this extension.
> + */
> +static inline int ddw_read_ext(const struct device_node *np, int extnum,
> + u32 *value)
> +{
> + static const char propname[] = "ibm,ddw-extensions";
> + u32 count;
> + int ret;
> +
> + ret = of_property_read_u32_index(np, propname, DDW_EXT_SIZE, &count);
> + if (ret)
> + return ret;
> +
> + if (count < extnum)
> + return -EOVERFLOW;
> +
> + if (!value)
> + value = &count;
> +
> + return of_property_read_u32_index(np, propname, extnum, value);
> +}
> +
> static int query_ddw(struct pci_dev *dev, const u32 *ddw_avail,
> - struct ddw_query_response *query)
> + struct ddw_query_response *query,
> + struct device_node *parent)
> {
> struct device_node *dn;
> struct pci_dn *pdn;
> - u32 cfg_addr;
> + u32 cfg_addr, ext_query, query_out[5];
> u64 buid;
> - int ret;
> + int ret, out_sz;
> +
> + /*
> + * From LoPAR level 2.8, "ibm,ddw-extensions" index 3 can rule how many
> + * output parameters ibm,query-pe-dma-windows will have, ranging from
> + * 5 to 6.
> + */
> + ret = ddw_read_ext(parent, DDW_EXT_QUERY_OUT_SIZE, &ext_query);
> + if (!ret && ext_query == 1)
> + out_sz = 6;
> + else
> + out_sz = 5;
>
> /*
> * Get the config address and phb buid of the PE window.
> @@ -897,11 +951,28 @@ static int query_ddw(struct pci_dev *dev, const u32 *ddw_avail,
> buid = pdn->phb->buid;
> cfg_addr = ((pdn->busno << 16) | (pdn->devfn << 8));
>
> - ret = rtas_call(ddw_avail[DDW_QUERY_PE_DMA_WIN], 3, 5, (u32 *)query,
> + ret = rtas_call(ddw_avail[DDW_QUERY_PE_DMA_WIN], 3, out_sz, query_out,
> cfg_addr, BUID_HI(buid), BUID_LO(buid));
> - dev_info(&dev->dev, "ibm,query-pe-dma-windows(%x) %x %x %x"
> - " returned %d\n", ddw_avail[DDW_QUERY_PE_DMA_WIN], cfg_addr,
> - BUID_HI(buid), BUID_LO(buid), ret);
> + dev_info(&dev->dev, "ibm,query-pe-dma-windows(%x) %x %x %x returned %d\n",
> + ddw_avail[DDW_QUERY_PE_DMA_WIN], cfg_addr, BUID_HI(buid),
> + BUID_LO(buid), ret);
> +
> + switch (out_sz) {
> + case 5:
> + query->windows_available = query_out[0];
> + query->largest_available_block = query_out[1];
> + query->page_size = query_out[2];
> + query->migration_capable = query_out[3];
> + break;
> + case 6:
> + query->windows_available = query_out[0];
> + query->largest_available_block = ((u64)query_out[1] << 32) |
> + query_out[2];
> + query->page_size = query_out[3];
> + query->migration_capable = query_out[4];
> + break;
> + }
> +
> return ret;
> }
>
> @@ -1049,7 +1120,7 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
> * of page sizes: supported and supported for migrate-dma.
> */
> dn = pci_device_to_OF_node(dev);
> - ret = query_ddw(dev, ddw_avail, &query);
> + ret = query_ddw(dev, ddw_avail, &query, pdn);
> if (ret != 0)
> goto out_failed;
>
> @@ -1077,7 +1148,7 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
> /* check largest block * page size > max memory hotplug addr */
> max_addr = ddw_memory_hotplug_max();
> if (query.largest_available_block < (max_addr >> page_shift)) {
> - dev_dbg(&dev->dev, "can't map partition max 0x%llx with %u "
> + dev_dbg(&dev->dev, "can't map partition max 0x%llx with %llu "
> "%llu-sized pages\n", max_addr, query.largest_available_block,
> 1ULL << page_shift);
> goto out_failed;
>
--
Alexey
* Re: [PATCH v5 1/4] powerpc/pseries/iommu: Create defines for operations in ibm,ddw-applicable
From: Alexey Kardashevskiy @ 2020-08-05 3:16 UTC
To: Leonardo Bras, Michael Ellerman, Benjamin Herrenschmidt,
Paul Mackerras, Thiago Jung Bauermann, Ram Pai, Brian King,
Murilo Fossa Vicentini, David Dai
Cc: linuxppc-dev, linux-kernel
In-Reply-To: <20200805030455.123024-2-leobras.c@gmail.com>
On 05/08/2020 13:04, Leonardo Bras wrote:
> Create defines to help handle ibm,ddw-applicable values, avoiding
> confusion about the index of given operations.
>
> Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
> Tested-by: David Dai <zdai@linux.vnet.ibm.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
> arch/powerpc/platforms/pseries/iommu.c | 43 ++++++++++++++++----------
> 1 file changed, 26 insertions(+), 17 deletions(-)
>
> diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
> index 6d47b4a3ce39..ac0d6376bdad 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -39,6 +39,14 @@
>
> #include "pseries.h"
>
> +enum {
> + DDW_QUERY_PE_DMA_WIN = 0,
> + DDW_CREATE_PE_DMA_WIN = 1,
> + DDW_REMOVE_PE_DMA_WIN = 2,
> +
> + DDW_APPLICABLE_SIZE
> +};
> +
> static struct iommu_table_group *iommu_pseries_alloc_group(int node)
> {
> struct iommu_table_group *table_group;
> @@ -771,12 +779,12 @@ static void remove_ddw(struct device_node *np, bool remove_prop)
> {
> struct dynamic_dma_window_prop *dwp;
> struct property *win64;
> - u32 ddw_avail[3];
> + u32 ddw_avail[DDW_APPLICABLE_SIZE];
> u64 liobn;
> int ret = 0;
>
> ret = of_property_read_u32_array(np, "ibm,ddw-applicable",
> - &ddw_avail[0], 3);
> + &ddw_avail[0], DDW_APPLICABLE_SIZE);
>
> win64 = of_find_property(np, DIRECT64_PROPNAME, NULL);
> if (!win64)
> @@ -798,15 +806,15 @@ static void remove_ddw(struct device_node *np, bool remove_prop)
> pr_debug("%pOF successfully cleared tces in window.\n",
> np);
>
> - ret = rtas_call(ddw_avail[2], 1, 1, NULL, liobn);
> + ret = rtas_call(ddw_avail[DDW_REMOVE_PE_DMA_WIN], 1, 1, NULL, liobn);
> if (ret)
> pr_warn("%pOF: failed to remove direct window: rtas returned "
> "%d to ibm,remove-pe-dma-window(%x) %llx\n",
> - np, ret, ddw_avail[2], liobn);
> + np, ret, ddw_avail[DDW_REMOVE_PE_DMA_WIN], liobn);
> else
> pr_debug("%pOF: successfully removed direct window: rtas returned "
> "%d to ibm,remove-pe-dma-window(%x) %llx\n",
> - np, ret, ddw_avail[2], liobn);
> + np, ret, ddw_avail[DDW_REMOVE_PE_DMA_WIN], liobn);
>
> delprop:
> if (remove_prop)
> @@ -889,11 +897,11 @@ static int query_ddw(struct pci_dev *dev, const u32 *ddw_avail,
> buid = pdn->phb->buid;
> cfg_addr = ((pdn->busno << 16) | (pdn->devfn << 8));
>
> - ret = rtas_call(ddw_avail[0], 3, 5, (u32 *)query,
> - cfg_addr, BUID_HI(buid), BUID_LO(buid));
> + ret = rtas_call(ddw_avail[DDW_QUERY_PE_DMA_WIN], 3, 5, (u32 *)query,
> + cfg_addr, BUID_HI(buid), BUID_LO(buid));
> dev_info(&dev->dev, "ibm,query-pe-dma-windows(%x) %x %x %x"
> - " returned %d\n", ddw_avail[0], cfg_addr, BUID_HI(buid),
> - BUID_LO(buid), ret);
> + " returned %d\n", ddw_avail[DDW_QUERY_PE_DMA_WIN], cfg_addr,
> + BUID_HI(buid), BUID_LO(buid), ret);
> return ret;
> }
>
> @@ -920,15 +928,16 @@ static int create_ddw(struct pci_dev *dev, const u32 *ddw_avail,
>
> do {
> /* extra outputs are LIOBN and dma-addr (hi, lo) */
> - ret = rtas_call(ddw_avail[1], 5, 4, (u32 *)create,
> - cfg_addr, BUID_HI(buid), BUID_LO(buid),
> - page_shift, window_shift);
> + ret = rtas_call(ddw_avail[DDW_CREATE_PE_DMA_WIN], 5, 4,
> + (u32 *)create, cfg_addr, BUID_HI(buid),
> + BUID_LO(buid), page_shift, window_shift);
> } while (rtas_busy_delay(ret));
> dev_info(&dev->dev,
> "ibm,create-pe-dma-window(%x) %x %x %x %x %x returned %d "
> - "(liobn = 0x%x starting addr = %x %x)\n", ddw_avail[1],
> - cfg_addr, BUID_HI(buid), BUID_LO(buid), page_shift,
> - window_shift, ret, create->liobn, create->addr_hi, create->addr_lo);
> + "(liobn = 0x%x starting addr = %x %x)\n",
> + ddw_avail[DDW_CREATE_PE_DMA_WIN], cfg_addr, BUID_HI(buid),
> + BUID_LO(buid), page_shift, window_shift, ret, create->liobn,
> + create->addr_hi, create->addr_lo);
>
> return ret;
> }
> @@ -996,7 +1005,7 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
> int page_shift;
> u64 dma_addr, max_addr;
> struct device_node *dn;
> - u32 ddw_avail[3];
> + u32 ddw_avail[DDW_APPLICABLE_SIZE];
> struct direct_window *window;
> struct property *win64;
> struct dynamic_dma_window_prop *ddwprop;
> @@ -1029,7 +1038,7 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
> * the property is actually in the parent, not the PE
> */
> ret = of_property_read_u32_array(pdn, "ibm,ddw-applicable",
> - &ddw_avail[0], 3);
> + &ddw_avail[0], DDW_APPLICABLE_SIZE);
> if (ret)
> goto out_failed;
>
>
--
Alexey
* Re: [PATCH] powerpc/pseries/hotplug-cpu: increase wait time for vCPU death
From: Michael Ellerman @ 2020-08-05 3:07 UTC
To: Greg Kurz
Cc: Nathan Lynch, linuxppc-dev, Michael Roth, Thiago Jung Bauermann,
Cedric Le Goater
In-Reply-To: <20200804161609.6cb2cb71@bahia.lan>
Greg Kurz <groug@kaod.org> writes:
> On Tue, 04 Aug 2020 23:35:10 +1000
> Michael Ellerman <mpe@ellerman.id.au> wrote:
>> There is a bit of history to this code, but not in a good way :)
>>
>> Michael Roth <mdroth@linux.vnet.ibm.com> writes:
>> > For a power9 KVM guest with XIVE enabled, running a test loop
>> > where we hotplug 384 vcpus and then unplug them, the following traces
>> > can be seen (generally within a few loops) either from the unplugged
>> > vcpu:
>> >
>> > [ 1767.353447] cpu 65 (hwid 65) Ready to die...
>> > [ 1767.952096] Querying DEAD? cpu 66 (66) shows 2
>> > [ 1767.952311] list_del corruption. next->prev should be c00a000002470208, but was c00a000002470048
>> ...
>> >
>> > At that point the worker thread assumes the unplugged CPU is in some
>> > unknown/dead state and proceeds with the cleanup, causing the race with
>> > the XIVE cleanup code executed by the unplugged CPU.
>> >
>> > Fix this by inserting an msleep() after each RTAS call to avoid
>>
>> We previously had an msleep(), but it was removed:
>>
>> b906cfa397fd ("powerpc/pseries: Fix cpu hotplug")
>
> Ah, I hadn't seen that one...
>
>> > pseries_cpu_die() returning prematurely, and double the number of
>> > attempts so we wait at least a total of 5 seconds. While this isn't an
>> > ideal solution, it is similar to how we dealt with a similar issue for
>> > cede_offline mode in the past (940ce422a3).
>>
>> Thiago tried to fix this previously but there was a bit of discussion
>> that didn't quite resolve:
>>
>> https://lore.kernel.org/linuxppc-dev/20190423223914.3882-1-bauerman@linux.ibm.com/
>
> Yeah it appears that the motivation at the time was to make the "Querying DEAD?"
> messages disappear and to avoid potentially concurrent calls to rtas-stop-self
> which is prohibited by PAPR... not fixing actual crashes.
I'm pretty sure at one point we were triggering crashes *in* RTAS via
this path, I think that got resolved.
>> Spinning forever seems like a bad idea, but as has been demonstrated at
>> least twice now, continuing when we don't know the state of the other
>> CPU can lead to straight up crashes.
>>
>> So I think I'm persuaded that it's preferable to have the kernel stuck
>> spinning rather than oopsing.
>>
>
> +1
>
>> I'm 50/50 on whether we should have a cond_resched() in the loop. My
>> first instinct is no, if we're stuck here for 20s a stack trace would be
>> good. But then we will probably hit that on some big and/or heavily
>> loaded machine.
>>
>> So possibly we should call cond_resched() but have some custom logic in
>> the loop to print a warning if we are stuck for more than some
>> sufficiently long amount of time.
>
> How long should that be ?
Yeah good question.
I guess step one would be seeing how long it can take on the 384 vcpu
machine. And we can probably test on some other big machines.
Hopefully Nathan can give us some idea of how long he's seen it take on
large systems? I know he was concerned about the 20s timeout of the
softlockup detector.
Maybe a minute or two?
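The "keep spinning, but yield and warn once the wait gets suspiciously long" idea above can be sketched in plain userspace C. This is not the pseries_cpu_die() code: cpu_stopped_fn, wait_for_cpu_stop() and struct fake_cpu are hypothetical names, sched_yield() stands in for cond_resched(), and the warning threshold is a parameter because the thread has not settled on a value.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sched.h>
#include <stdbool.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical stand-in for one "has the CPU stopped yet?" RTAS query. */
typedef bool (*cpu_stopped_fn)(void *ctx);

/* Seconds elapsed since @start on the monotonic clock. */
static double seconds_since(const struct timespec *start)
{
	struct timespec now;

	clock_gettime(CLOCK_MONOTONIC, &now);
	return (double)(now.tv_sec - start->tv_sec) +
	       (double)(now.tv_nsec - start->tv_nsec) / 1e9;
}

/*
 * Poll until cpu_stopped() reports true and never give up. Yield between
 * polls (userspace analogue of cond_resched()) and print one warning once
 * the wait exceeds @warn_after_s. Returns the number of failed polls.
 */
static unsigned long wait_for_cpu_stop(cpu_stopped_fn cpu_stopped, void *ctx,
				       double warn_after_s, bool *warned)
{
	struct timespec start;
	unsigned long polls = 0;

	assert(cpu_stopped && warned);
	*warned = false;
	clock_gettime(CLOCK_MONOTONIC, &start);
	while (!cpu_stopped(ctx)) {
		polls++;
		sched_yield();
		if (!*warned && seconds_since(&start) > warn_after_s) {
			fprintf(stderr, "CPU still not stopped after %.1f s\n",
				warn_after_s);
			*warned = true;
		}
	}
	return polls;
}

/* Fake CPU that reports "stopped" after a fixed number of polls. */
struct fake_cpu {
	int polls_left;
};

static bool fake_cpu_stopped(void *ctx)
{
	struct fake_cpu *cpu = ctx;

	if (cpu->polls_left == 0)
		return true;
	cpu->polls_left--;
	return false;
}
```

Because the loop never gives up, the caller never proceeds while the other CPU's state is unknown; the one-shot warning only makes a long wait visible instead of relying on the soft-lockup detector.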
>> > Fixes: eac1e731b59ee ("powerpc/xive: guest exploitation of the XIVE interrupt controller")
>> > Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1856588
>>
>> This is not public.
>
> I'll have a look at changing that.
Thanks.
cheers
* [PATCH v5 4/4] powerpc/pseries/iommu: Allow bigger 64bit window by removing default DMA window
From: Leonardo Bras @ 2020-08-05 3:04 UTC
To: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
Leonardo Bras, Alexey Kardashevskiy, Thiago Jung Bauermann,
Ram Pai, Brian King, Murilo Fossa Vicentini, David Dai
Cc: linuxppc-dev, linux-kernel
In-Reply-To: <20200805030455.123024-1-leobras.c@gmail.com>
In LoPAR "DMA Window Manipulation Calls", it is recommended to remove the
default DMA window for the device before attempting to configure a DDW,
in order to make the maximum resources available for the next DDW to be
created.
This is a requirement for using DDW on devices for which the hypervisor
allows only one DMA window.
If setting up a new DDW fails anywhere after the removal of this
default DMA window, the default DMA window must be restored.
For this, an implementation of the ibm,reset-pe-dma-windows rtas call is
needed:
Platforms supporting the DDW option starting with LoPAR level 2.7 implement
ibm,ddw-extensions. The first extension available (index 2) carries the
token for the ibm,reset-pe-dma-windows rtas call, which is used to restore
the default DMA window for a device if it has been deleted.
It does so by resetting the TCE table allocation for the PE to its
boot time value, available in the "ibm,dma-window" device tree property.
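The remove / re-query / restore ordering described above can be modelled with a toy PE whose hypervisor allows a single DMA window. This is a hypothetical sketch, not the iommu.c code: struct pe_model and its fields are invented for illustration, and each RTAS call is reduced to a flag update. It shows why a default_win_removed flag is tracked: the default window is only removed when it exists and the reset extension is present, and it is restored on every failure path after the removal.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a PE; not the kernel's data structures. */
struct pe_model {
	int max_windows;    /* DMA windows the hypervisor allows at once */
	bool default_win;   /* default window ("ibm,dma-window") present */
	bool ddw;           /* dynamic DMA window created */
	bool has_reset_ext; /* ibm,ddw-extensions carries the reset token */
	bool create_fails;  /* force the create step to fail */
};

static int windows_available(const struct pe_model *pe)
{
	return pe->max_windows - (int)pe->default_win - (int)pe->ddw;
}

/* Stand-in for ibm,reset-pe-dma-windows: back to the boot-time window. */
static void reset_dma_window(struct pe_model *pe)
{
	pe->ddw = false;
	pe->default_win = true;
}

/* Returns true if a DDW was created. */
static bool enable_ddw(struct pe_model *pe)
{
	bool default_win_removed = false;

	assert(pe);
	if (windows_available(pe) == 0) {
		/* Only remove the default window if it can be restored. */
		if (!pe->default_win || !pe->has_reset_ext)
			goto out_failed;
		pe->default_win = false;        /* remove_dma_window() */
		default_win_removed = true;
		if (windows_available(pe) == 0) /* query again */
			goto out_failed;
	}
	if (pe->create_fails)
		goto out_failed;
	pe->ddw = true;
	return true;

out_failed:
	if (default_win_removed)
		reset_dma_window(pe);
	return false;
}
```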
Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
Tested-by: David Dai <zdai@linux.vnet.ibm.com>
---
arch/powerpc/platforms/pseries/iommu.c | 73 +++++++++++++++++++++++---
1 file changed, 66 insertions(+), 7 deletions(-)
diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index 4e33147825cc..e4198700ed1a 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -1066,6 +1066,38 @@ static phys_addr_t ddw_memory_hotplug_max(void)
return max_addr;
}
+/*
+ * Platforms supporting the DDW option starting with LoPAR level 2.7 implement
+ * ibm,ddw-extensions, which carries the rtas token for
+ * ibm,reset-pe-dma-windows.
+ * That rtas-call can be used to restore the default DMA window for the device.
+ */
+static void reset_dma_window(struct pci_dev *dev, struct device_node *par_dn)
+{
+ int ret;
+ u32 cfg_addr, reset_dma_win;
+ u64 buid;
+ struct device_node *dn;
+ struct pci_dn *pdn;
+
+ ret = ddw_read_ext(par_dn, DDW_EXT_RESET_DMA_WIN, &reset_dma_win);
+ if (ret)
+ return;
+
+ dn = pci_device_to_OF_node(dev);
+ pdn = PCI_DN(dn);
+ buid = pdn->phb->buid;
+ cfg_addr = (pdn->busno << 16) | (pdn->devfn << 8);
+
+ ret = rtas_call(reset_dma_win, 3, 1, NULL, cfg_addr, BUID_HI(buid),
+ BUID_LO(buid));
+ if (ret)
+ dev_info(&dev->dev,
+ "ibm,reset-pe-dma-windows(%x) %x %x %x returned %d\n",
+ reset_dma_win, cfg_addr, BUID_HI(buid), BUID_LO(buid),
+ ret);
+}
+
/*
* If the PE supports dynamic dma windows, and there is space for a table
* that can map all pages in a linear offset, then setup such a table,
@@ -1090,6 +1122,7 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
struct property *win64;
struct dynamic_dma_window_prop *ddwprop;
struct failed_ddw_pdn *fpdn;
+ bool default_win_removed = false;
mutex_lock(&direct_window_init_mutex);
@@ -1133,14 +1166,38 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
if (ret != 0)
goto out_failed;
+ /*
+ * If there is no window available, remove the default DMA window,
+ * if it's present. This will make all the resources available to the
+ * new DDW window.
+ * If anything fails after this, we need to restore it, so also check
+ * for extensions presence.
+ */
if (query.windows_available == 0) {
- /*
- * no additional windows are available for this device.
- * We might be able to reallocate the existing window,
- * trading in for a larger page size.
- */
- dev_dbg(&dev->dev, "no free dynamic windows");
- goto out_failed;
+ struct property *default_win;
+ int reset_win_ext;
+
+ default_win = of_find_property(pdn, "ibm,dma-window", NULL);
+ if (!default_win)
+ goto out_failed;
+
+ reset_win_ext = ddw_read_ext(pdn, DDW_EXT_RESET_DMA_WIN, NULL);
+ if (reset_win_ext)
+ goto out_failed;
+
+ remove_dma_window(pdn, ddw_avail, default_win);
+ default_win_removed = true;
+
+ /* Query again, to check if the window is available */
+ ret = query_ddw(dev, ddw_avail, &query, pdn);
+ if (ret != 0)
+ goto out_failed;
+
+ if (query.windows_available == 0) {
+ /* no windows are available for this device. */
+ dev_dbg(&dev->dev, "no free dynamic windows");
+ goto out_failed;
+ }
}
if (query.page_size & 4) {
page_shift = 24; /* 16MB */
@@ -1231,6 +1288,8 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
kfree(win64);
out_failed:
+ if (default_win_removed)
+ reset_dma_window(dev, pdn);
fpdn = kzalloc(sizeof(*fpdn), GFP_KERNEL);
if (!fpdn)
--
2.25.4
* [PATCH v5 3/4] powerpc/pseries/iommu: Move window-removing part of remove_ddw into remove_dma_window
From: Leonardo Bras @ 2020-08-05 3:04 UTC
To: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
Leonardo Bras, Alexey Kardashevskiy, Thiago Jung Bauermann,
Ram Pai, Brian King, Murilo Fossa Vicentini, David Dai
Cc: linuxppc-dev, linux-kernel
In-Reply-To: <20200805030455.123024-1-leobras.c@gmail.com>
Move the window-removing part of remove_ddw into a new function
(remove_dma_window), so it can be used to remove other DMA windows.
This is useful for removing DMA windows that don't have a DIRECT64_PROPNAME
property, such as the device's default DMA window, which uses
"ibm,dma-window".
Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
Tested-by: David Dai <zdai@linux.vnet.ibm.com>
---
arch/powerpc/platforms/pseries/iommu.c | 45 +++++++++++++++-----------
1 file changed, 27 insertions(+), 18 deletions(-)
diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index 1a933c4e8bba..4e33147825cc 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -781,25 +781,14 @@ static int __init disable_ddw_setup(char *str)
early_param("disable_ddw", disable_ddw_setup);
-static void remove_ddw(struct device_node *np, bool remove_prop)
+static void remove_dma_window(struct device_node *np, u32 *ddw_avail,
+ struct property *win)
{
struct dynamic_dma_window_prop *dwp;
- struct property *win64;
- u32 ddw_avail[DDW_APPLICABLE_SIZE];
u64 liobn;
- int ret = 0;
-
- ret = of_property_read_u32_array(np, "ibm,ddw-applicable",
- &ddw_avail[0], DDW_APPLICABLE_SIZE);
-
- win64 = of_find_property(np, DIRECT64_PROPNAME, NULL);
- if (!win64)
- return;
-
- if (ret || win64->length < sizeof(*dwp))
- goto delprop;
+ int ret;
- dwp = win64->value;
+ dwp = win->value;
liobn = (u64)be32_to_cpu(dwp->liobn);
/* clear the whole window, note the arg is in kernel pages */
@@ -821,10 +810,30 @@ static void remove_ddw(struct device_node *np, bool remove_prop)
pr_debug("%pOF: successfully removed direct window: rtas returned "
"%d to ibm,remove-pe-dma-window(%x) %llx\n",
np, ret, ddw_avail[DDW_REMOVE_PE_DMA_WIN], liobn);
+}
+
+static void remove_ddw(struct device_node *np, bool remove_prop)
+{
+ struct property *win;
+ u32 ddw_avail[DDW_APPLICABLE_SIZE];
+ int ret = 0;
+
+ ret = of_property_read_u32_array(np, "ibm,ddw-applicable",
+ &ddw_avail[0], DDW_APPLICABLE_SIZE);
+ if (ret)
+ return;
+
+ win = of_find_property(np, DIRECT64_PROPNAME, NULL);
+ if (!win)
+ return;
+
+ if (win->length >= sizeof(struct dynamic_dma_window_prop))
+ remove_dma_window(np, ddw_avail, win);
+
+ if (!remove_prop)
+ return;
-delprop:
- if (remove_prop)
- ret = of_remove_property(np, win64);
+ ret = of_remove_property(np, win);
if (ret)
pr_warn("%pOF: failed to remove direct window property: %d\n",
np, ret);
--
2.25.4
* [PATCH v5 2/4] powerpc/pseries/iommu: Update call to ibm,query-pe-dma-windows
From: Leonardo Bras @ 2020-08-05 3:04 UTC
To: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
Leonardo Bras, Alexey Kardashevskiy, Thiago Jung Bauermann,
Ram Pai, Brian King, Murilo Fossa Vicentini, David Dai
Cc: linuxppc-dev, linux-kernel
In-Reply-To: <20200805030455.123024-1-leobras.c@gmail.com>
From LoPAR level 2.8, "ibm,ddw-extensions" index 3 can make the number of
outputs from "ibm,query-pe-dma-windows" go from 5 to 6.
This change of output size widens largest_available_block (the largest
available PE TCE block) from 32 to 64 bits, which shifts the positions of
page_size and migration_capable.
As a result, ddw_query_response->largest_available_block must be widened
from u32 to u64, and the values must be copied manually from the output
buffer into this struct, according to the output size.
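To see why the 6-output layout cannot simply be overlaid on the old struct, the sketch below decodes the same buffer both ways. It assumes, as with rtas_call(), that the RTAS status word is returned separately, so the buffer holds out_sz - 1 values; decode_query() is a hypothetical userspace helper mirroring the switch in query_ddw().

```c
#include <assert.h>
#include <stdint.h>

/* Same fields as the patched struct ddw_query_response. */
struct ddw_query_response {
	uint32_t windows_available;
	uint64_t largest_available_block;
	uint32_t page_size;
	uint32_t migration_capable;
};

/* Decode out_sz - 1 output words; returns -1 for an unknown size. */
static int decode_query(const uint32_t *out, int out_sz,
			struct ddw_query_response *q)
{
	assert(out && q);
	switch (out_sz) {
	case 5:
		q->windows_available = out[0];
		q->largest_available_block = out[1];
		q->page_size = out[2];
		q->migration_capable = out[3];
		return 0;
	case 6:
		/* largest_available_block spans two words: high, then low. */
		q->windows_available = out[0];
		q->largest_available_block = ((uint64_t)out[1] << 32) | out[2];
		q->page_size = out[3];
		q->migration_capable = out[4];
		return 0;
	default:
		return -1;
	}
}
```

Reading a 6-output buffer with the 5-output layout reports the low word of largest_available_block as page_size, which is exactly the misinterpretation the size check avoids.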
Also, a helper routine was created to read the DDW extensions as
suggested by LoPAR: first read the size of the extension array from
index 0, check that the requested extension exists, and then return its value.
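The lookup order for "ibm,ddw-extensions" (extension count in cell 0, then the extension itself) can be modelled over a plain array. read_ddw_ext() is a hypothetical userspace stand-in for ddw_read_ext(); its error codes follow the kernel-doc in the patch below, with a NULL pointer modelling a missing property.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model of "ibm,ddw-extensions": cell 0 is the number of
 * extensions, cells 1..count are the extensions. @prop is NULL when the
 * property is absent; @ncells is its length in 32-bit cells.
 */
static int read_ddw_ext(const uint32_t *prop, size_t ncells, uint32_t extnum,
			uint32_t *value)
{
	uint32_t count;

	if (!prop)
		return -EINVAL;    /* property does not exist */
	if (ncells == 0)
		return -ENODATA;   /* property has no value */
	count = prop[0];
	if (count < extnum || extnum >= ncells)
		return -EOVERFLOW; /* extension not implemented */
	if (value)                 /* NULL: only check for presence */
		*value = prop[extnum];
	return 0;
}
```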
Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
Tested-by: David Dai <zdai@linux.vnet.ibm.com>
---
arch/powerpc/platforms/pseries/iommu.c | 91 +++++++++++++++++++++++---
1 file changed, 81 insertions(+), 10 deletions(-)
diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index ac0d6376bdad..1a933c4e8bba 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -47,6 +47,12 @@ enum {
DDW_APPLICABLE_SIZE
};
+enum {
+ DDW_EXT_SIZE = 0,
+ DDW_EXT_RESET_DMA_WIN = 1,
+ DDW_EXT_QUERY_OUT_SIZE = 2
+};
+
static struct iommu_table_group *iommu_pseries_alloc_group(int node)
{
struct iommu_table_group *table_group;
@@ -342,7 +348,7 @@ struct direct_window {
/* Dynamic DMA Window support */
struct ddw_query_response {
u32 windows_available;
- u32 largest_available_block;
+ u64 largest_available_block;
u32 page_size;
u32 migration_capable;
};
@@ -877,14 +883,62 @@ static int find_existing_ddw_windows(void)
}
machine_arch_initcall(pseries, find_existing_ddw_windows);
+/**
+ * ddw_read_ext - Get the value of a DDW extension
+ * @np: device node from which the extension value is to be read.
+ * @extnum: index number of the extension.
+ * @value: pointer to return value, modified when extension is available.
+ *
+ * Checks if "ibm,ddw-extensions" exists for this node, and gets the value
+ * at index 'extnum'.
+ * It can also be used just to check whether the extension exists, by
+ * passing value == NULL.
+ *
+ * Returns:
+ * 0 if extension successfully read
+ * -EINVAL if the "ibm,ddw-extensions" does not exist,
+ * -ENODATA if "ibm,ddw-extensions" does not have a value, and
+ * -EOVERFLOW if "ibm,ddw-extensions" does not contain this extension.
+ */
+static inline int ddw_read_ext(const struct device_node *np, int extnum,
+ u32 *value)
+{
+ static const char propname[] = "ibm,ddw-extensions";
+ u32 count;
+ int ret;
+
+ ret = of_property_read_u32_index(np, propname, DDW_EXT_SIZE, &count);
+ if (ret)
+ return ret;
+
+ if (count < extnum)
+ return -EOVERFLOW;
+
+ if (!value)
+ value = &count;
+
+ return of_property_read_u32_index(np, propname, extnum, value);
+}
+
static int query_ddw(struct pci_dev *dev, const u32 *ddw_avail,
- struct ddw_query_response *query)
+ struct ddw_query_response *query,
+ struct device_node *parent)
{
struct device_node *dn;
struct pci_dn *pdn;
- u32 cfg_addr;
+ u32 cfg_addr, ext_query, query_out[5];
u64 buid;
- int ret;
+ int ret, out_sz;
+
+ /*
+ * From LoPAR level 2.8, "ibm,ddw-extensions" index 3 can rule how many
+ * output parameters ibm,query-pe-dma-windows will have, ranging from
+ * 5 to 6.
+ */
+ ret = ddw_read_ext(parent, DDW_EXT_QUERY_OUT_SIZE, &ext_query);
+ if (!ret && ext_query == 1)
+ out_sz = 6;
+ else
+ out_sz = 5;
/*
* Get the config address and phb buid of the PE window.
@@ -897,11 +951,28 @@ static int query_ddw(struct pci_dev *dev, const u32 *ddw_avail,
buid = pdn->phb->buid;
cfg_addr = ((pdn->busno << 16) | (pdn->devfn << 8));
- ret = rtas_call(ddw_avail[DDW_QUERY_PE_DMA_WIN], 3, 5, (u32 *)query,
+ ret = rtas_call(ddw_avail[DDW_QUERY_PE_DMA_WIN], 3, out_sz, query_out,
cfg_addr, BUID_HI(buid), BUID_LO(buid));
- dev_info(&dev->dev, "ibm,query-pe-dma-windows(%x) %x %x %x"
- " returned %d\n", ddw_avail[DDW_QUERY_PE_DMA_WIN], cfg_addr,
- BUID_HI(buid), BUID_LO(buid), ret);
+ dev_info(&dev->dev, "ibm,query-pe-dma-windows(%x) %x %x %x returned %d\n",
+ ddw_avail[DDW_QUERY_PE_DMA_WIN], cfg_addr, BUID_HI(buid),
+ BUID_LO(buid), ret);
+
+ switch (out_sz) {
+ case 5:
+ query->windows_available = query_out[0];
+ query->largest_available_block = query_out[1];
+ query->page_size = query_out[2];
+ query->migration_capable = query_out[3];
+ break;
+ case 6:
+ query->windows_available = query_out[0];
+ query->largest_available_block = ((u64)query_out[1] << 32) |
+ query_out[2];
+ query->page_size = query_out[3];
+ query->migration_capable = query_out[4];
+ break;
+ }
+
return ret;
}
@@ -1049,7 +1120,7 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
* of page sizes: supported and supported for migrate-dma.
*/
dn = pci_device_to_OF_node(dev);
- ret = query_ddw(dev, ddw_avail, &query);
+ ret = query_ddw(dev, ddw_avail, &query, pdn);
if (ret != 0)
goto out_failed;
@@ -1077,7 +1148,7 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
/* check largest block * page size > max memory hotplug addr */
max_addr = ddw_memory_hotplug_max();
if (query.largest_available_block < (max_addr >> page_shift)) {
- dev_dbg(&dev->dev, "can't map partition max 0x%llx with %u "
+ dev_dbg(&dev->dev, "can't map partition max 0x%llx with %llu "
"%llu-sized pages\n", max_addr, query.largest_available_block,
1ULL << page_shift);
goto out_failed;
--
2.25.4
* [PATCH v5 1/4] powerpc/pseries/iommu: Create defines for operations in ibm,ddw-applicable
From: Leonardo Bras @ 2020-08-05 3:04 UTC
To: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
Leonardo Bras, Alexey Kardashevskiy, Thiago Jung Bauermann,
Ram Pai, Brian King, Murilo Fossa Vicentini, David Dai
Cc: linuxppc-dev, linux-kernel
In-Reply-To: <20200805030455.123024-1-leobras.c@gmail.com>
Create defines to help handle ibm,ddw-applicable values, avoiding
confusion about the index of given operations.
Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
Tested-by: David Dai <zdai@linux.vnet.ibm.com>
---
arch/powerpc/platforms/pseries/iommu.c | 43 ++++++++++++++++----------
1 file changed, 26 insertions(+), 17 deletions(-)
diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index 6d47b4a3ce39..ac0d6376bdad 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -39,6 +39,14 @@
#include "pseries.h"
+enum {
+ DDW_QUERY_PE_DMA_WIN = 0,
+ DDW_CREATE_PE_DMA_WIN = 1,
+ DDW_REMOVE_PE_DMA_WIN = 2,
+
+ DDW_APPLICABLE_SIZE
+};
+
static struct iommu_table_group *iommu_pseries_alloc_group(int node)
{
struct iommu_table_group *table_group;
@@ -771,12 +779,12 @@ static void remove_ddw(struct device_node *np, bool remove_prop)
{
struct dynamic_dma_window_prop *dwp;
struct property *win64;
- u32 ddw_avail[3];
+ u32 ddw_avail[DDW_APPLICABLE_SIZE];
u64 liobn;
int ret = 0;
ret = of_property_read_u32_array(np, "ibm,ddw-applicable",
- &ddw_avail[0], 3);
+ &ddw_avail[0], DDW_APPLICABLE_SIZE);
win64 = of_find_property(np, DIRECT64_PROPNAME, NULL);
if (!win64)
@@ -798,15 +806,15 @@ static void remove_ddw(struct device_node *np, bool remove_prop)
pr_debug("%pOF successfully cleared tces in window.\n",
np);
- ret = rtas_call(ddw_avail[2], 1, 1, NULL, liobn);
+ ret = rtas_call(ddw_avail[DDW_REMOVE_PE_DMA_WIN], 1, 1, NULL, liobn);
if (ret)
pr_warn("%pOF: failed to remove direct window: rtas returned "
"%d to ibm,remove-pe-dma-window(%x) %llx\n",
- np, ret, ddw_avail[2], liobn);
+ np, ret, ddw_avail[DDW_REMOVE_PE_DMA_WIN], liobn);
else
pr_debug("%pOF: successfully removed direct window: rtas returned "
"%d to ibm,remove-pe-dma-window(%x) %llx\n",
- np, ret, ddw_avail[2], liobn);
+ np, ret, ddw_avail[DDW_REMOVE_PE_DMA_WIN], liobn);
delprop:
if (remove_prop)
@@ -889,11 +897,11 @@ static int query_ddw(struct pci_dev *dev, const u32 *ddw_avail,
buid = pdn->phb->buid;
cfg_addr = ((pdn->busno << 16) | (pdn->devfn << 8));
- ret = rtas_call(ddw_avail[0], 3, 5, (u32 *)query,
- cfg_addr, BUID_HI(buid), BUID_LO(buid));
+ ret = rtas_call(ddw_avail[DDW_QUERY_PE_DMA_WIN], 3, 5, (u32 *)query,
+ cfg_addr, BUID_HI(buid), BUID_LO(buid));
dev_info(&dev->dev, "ibm,query-pe-dma-windows(%x) %x %x %x"
- " returned %d\n", ddw_avail[0], cfg_addr, BUID_HI(buid),
- BUID_LO(buid), ret);
+ " returned %d\n", ddw_avail[DDW_QUERY_PE_DMA_WIN], cfg_addr,
+ BUID_HI(buid), BUID_LO(buid), ret);
return ret;
}
@@ -920,15 +928,16 @@ static int create_ddw(struct pci_dev *dev, const u32 *ddw_avail,
do {
/* extra outputs are LIOBN and dma-addr (hi, lo) */
- ret = rtas_call(ddw_avail[1], 5, 4, (u32 *)create,
- cfg_addr, BUID_HI(buid), BUID_LO(buid),
- page_shift, window_shift);
+ ret = rtas_call(ddw_avail[DDW_CREATE_PE_DMA_WIN], 5, 4,
+ (u32 *)create, cfg_addr, BUID_HI(buid),
+ BUID_LO(buid), page_shift, window_shift);
} while (rtas_busy_delay(ret));
dev_info(&dev->dev,
"ibm,create-pe-dma-window(%x) %x %x %x %x %x returned %d "
- "(liobn = 0x%x starting addr = %x %x)\n", ddw_avail[1],
- cfg_addr, BUID_HI(buid), BUID_LO(buid), page_shift,
- window_shift, ret, create->liobn, create->addr_hi, create->addr_lo);
+ "(liobn = 0x%x starting addr = %x %x)\n",
+ ddw_avail[DDW_CREATE_PE_DMA_WIN], cfg_addr, BUID_HI(buid),
+ BUID_LO(buid), page_shift, window_shift, ret, create->liobn,
+ create->addr_hi, create->addr_lo);
return ret;
}
@@ -996,7 +1005,7 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
int page_shift;
u64 dma_addr, max_addr;
struct device_node *dn;
- u32 ddw_avail[3];
+ u32 ddw_avail[DDW_APPLICABLE_SIZE];
struct direct_window *window;
struct property *win64;
struct dynamic_dma_window_prop *ddwprop;
@@ -1029,7 +1038,7 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
* the property is actually in the parent, not the PE
*/
ret = of_property_read_u32_array(pdn, "ibm,ddw-applicable",
- &ddw_avail[0], 3);
+ &ddw_avail[0], DDW_APPLICABLE_SIZE);
if (ret)
goto out_failed;
--
2.25.4
* [PATCH v5 0/4] Allow bigger 64bit window by removing default DMA window
From: Leonardo Bras @ 2020-08-05 3:04 UTC
To: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
Leonardo Bras, Alexey Kardashevskiy, Thiago Jung Bauermann,
Ram Pai, Brian King, Murilo Fossa Vicentini, David Dai
Cc: linuxppc-dev, linux-kernel
There are some devices for which a hypervisor may only allow one DMA window
to exist at a time, and in those cases a DDW is never created for them,
since the default DMA window keeps using this resource.
LoPAR recommends this procedure:
1. Remove the default DMA window,
2. Query for which configs the DDW can be created,
3. Create a DDW.
Patch #1:
Create defines for outputs of ibm,ddw-applicable, so it's easier to
identify them.
Patch #2:
- Starting with LoPAR level 2.8, there is an extension that can make
ibm,query-pe-dma-windows have 6 outputs instead of 5. This changes the
order of the outputs, which can cause trouble.
- query_ddw() was updated to check how many outputs
ibm,query-pe-dma-windows is supposed to have, update the rtas_call() and
handle the outputs correctly in both cases.
- This patch looks somewhat unrelated to the series, but it can avoid future
problems on DDW creation.
Patch #3 moves the window-removing code from remove_ddw() to
remove_dma_window(), creating a way to delete any DMA window, so it can be
used to delete the default DMA window.
Patch #4 makes use of the remove_dma_window() from patch #3 to remove the
default DMA window before query_ddw(). It also implements a new rtas call
to recover the default DMA window, in case anything fails after it was
removed, and a DDW couldn't be created.
---
Changes since v4:
- Removed patches 5+ in order to deal with a feature at a time
- Remove unnecessary parentheses in patch #4
- Changed patch #4 title from
"Remove default DMA window before creating DDW" to
"Allow bigger 64bit window by removing default DMA window"
- Included David Dai tested-by
- v4 link: http://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=190051&state=%2A&archive=both
Changes since v3:
- Introduces new patch #5, to prepare for an important change in #6
- struct iommu_table was not being updated, so include a way to do this
in patch #6.
- Improved patch #4 based on a suggestion from Alexey, to make the code
easier to understand
- v3 link: http://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=187348&state=%2A&archive=both
Changes since v2:
- Change the way ibm,ddw-extensions is accessed, using a proper function
instead of doing this inline every time it's used.
- Remove previous patch #6, as it doesn't look like it would be useful.
- Add new patch, for changing names from direct* to dma*, as indirect
mapping can be used from now on.
- Fix some typos, correct some define usage.
- v2 link: http://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=185433&state=%2A&archive=both
Changes since v1:
- Add defines for ibm,ddw-applicable and ibm,ddw-extensions outputs
- Merge aux function query_ddw_out_sz() into query_ddw()
- Merge reset_dma_window() patch (prev. #2) into remove default DMA
window patch (#4).
- Keep device_node *np name instead of using pdn in remove_*()
- Rename 'device_node *pdn' into 'parent' in new functions
- Rename dfl_win to default_win
- Only remove the default DMA window if there is no window available
in first query.
- Check if default DMA window can be restored before removing it.
- Fix 'uninitialized use' (found by travis mpe:ci-test)
- New patches #5 and #6
- v1 link: http://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=184420&state=%2A&archive=both
Special thanks to Alexey Kardashevskiy, Brian King and
Oliver O'Halloran for the feedback provided!
Leonardo Bras (4):
powerpc/pseries/iommu: Create defines for operations in
ibm,ddw-applicable
powerpc/pseries/iommu: Update call to ibm,query-pe-dma-windows
powerpc/pseries/iommu: Move window-removing part of remove_ddw into
remove_dma_window
powerpc/pseries/iommu: Allow bigger 64bit window by removing default
DMA window
arch/powerpc/platforms/pseries/iommu.c | 242 ++++++++++++++++++++-----
1 file changed, 195 insertions(+), 47 deletions(-)
--
2.25.4
* [PATCH v2 2/2] ASoC: fsl_sai: Refine enable and disable sequence for synchronous mode
From: Shengjiu Wang @ 2020-08-05 2:23 UTC
To: timur, nicoleotsuka, Xiubo.Lee, festevam, lgirdwood, broonie,
perex, tiwai, alsa-devel, linuxppc-dev, linux-kernel
In-Reply-To: <1596594233-13489-1-git-send-email-shengjiu.wang@nxp.com>
Tx synchronous with Rx:
TCSR.TE does not need to be enabled when only Rx is going to be enabled.
Check whether RCSR.RE needs to be disabled before disabling TCSR.TE.
Rx synchronous with Tx:
RCSR.RE does not need to be enabled when only Tx is going to be enabled.
Check whether TCSR.TE needs to be disabled before disabling RCSR.RE.
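The enable/disable rules in this commit message can be checked against a small state model. This is a hypothetical sketch rather than the driver code: struct sai_model, sai_start() and sai_stop() are invented names and the TX/RX values are local to the sketch, although provides_clocks() uses the same test as fsl_sai_dir_is_synced() in the patch below. A direction marked synchronous runs from the opposite direction's clocks, and the real driver's sequence may differ in detail.

```c
#include <assert.h>
#include <stdbool.h>

enum { TX = 0, RX = 1 }; /* local to this sketch */

/* Hypothetical model of SAI enable state; not the driver's structures. */
struct sai_model {
	bool synchronous[2]; /* [d]: direction d uses the other's clocks */
	bool active[2];      /* stream d is running */
	bool enabled[2];     /* TCSR.TE (TX) / RCSR.RE (RX) */
};

static int opposite(int dir)
{
	return dir == TX ? RX : TX;
}

/* Same test as fsl_sai_dir_is_synced(): @dir clocks the other side. */
static bool provides_clocks(const struct sai_model *s, int dir)
{
	return !s->synchronous[dir] && s->synchronous[opposite(dir)];
}

static void sai_start(struct sai_model *s, int dir)
{
	assert(dir == TX || dir == RX);
	s->active[dir] = true;
	s->enabled[dir] = true;
	/* A synchronous direction also needs its clock provider enabled. */
	if (s->synchronous[dir])
		s->enabled[opposite(dir)] = true;
}

static void sai_stop(struct sai_model *s, int dir)
{
	int adir = opposite(dir);

	s->active[dir] = false;
	/* Disable an idle clock provider before the stream it clocked. */
	if (provides_clocks(s, adir) && !s->active[adir])
		s->enabled[adir] = false;
	/* Keep @dir enabled only while it still clocks an active stream. */
	if (!provides_clocks(s, dir) || !s->active[adir])
		s->enabled[dir] = false;
}
```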
Signed-off-by: Shengjiu Wang <shengjiu.wang@nxp.com>
---
sound/soc/fsl/fsl_sai.c | 126 +++++++++++++++++++++++++++-------------
1 file changed, 85 insertions(+), 41 deletions(-)
diff --git a/sound/soc/fsl/fsl_sai.c b/sound/soc/fsl/fsl_sai.c
index 84714fe7144c..f30c4e7b5221 100644
--- a/sound/soc/fsl/fsl_sai.c
+++ b/sound/soc/fsl/fsl_sai.c
@@ -517,6 +517,56 @@ static int fsl_sai_hw_free(struct snd_pcm_substream *substream,
return 0;
}
+/**
+ * fsl_sai_dir_is_synced - Check if stream is synced by the opposite stream
+ *
+ * SAI supports a synchronous mode in which both streams use the bit/frame
+ * clocks of either the Transmitter or the Receiver. This function checks
+ * whether the opposite stream is synced to the clocks of @dir, i.e. whether
+ * @dir provides the clocks.
+ *
+ * @sai: SAI context
+ * @dir: stream direction
+ */
+static inline bool fsl_sai_dir_is_synced(struct fsl_sai *sai, int dir)
+{
+ int adir = (dir == TX) ? RX : TX;
+
+ /* current dir in async mode while opposite dir in sync mode */
+ return !sai->synchronous[dir] && sai->synchronous[adir];
+}
+
+static void fsl_sai_config_disable(struct fsl_sai *sai, int dir)
+{
+ unsigned int ofs = sai->soc_data->reg_offset;
+ bool tx = dir == TX;
+ u32 xcsr, count = 100;
+
+ regmap_update_bits(sai->regmap, FSL_SAI_xCSR(tx, ofs),
+ FSL_SAI_CSR_TERE, 0);
+
+ /* TERE will remain set till the end of current frame */
+ do {
+ udelay(10);
+ regmap_read(sai->regmap, FSL_SAI_xCSR(tx, ofs), &xcsr);
+ } while (--count && xcsr & FSL_SAI_CSR_TERE);
+
+ regmap_update_bits(sai->regmap, FSL_SAI_xCSR(tx, ofs),
+ FSL_SAI_CSR_FR, FSL_SAI_CSR_FR);
+
+ /*
+ * In SAI master mode, after the SAI has been opened and closed
+ * several times, the frame clock may stop and cannot recover.
+ * Add a software reset to work around this issue. This is a
+ * hardware bug and will be fixed in the next SAI version.
+ */
+ if (!sai->is_slave_mode) {
+ /* Software Reset */
+ regmap_write(sai->regmap, FSL_SAI_xCSR(tx, ofs), FSL_SAI_CSR_SR);
+ /* Clear SR bit to finish the reset */
+ regmap_write(sai->regmap, FSL_SAI_xCSR(tx, ofs), 0);
+ }
+}
static int fsl_sai_trigger(struct snd_pcm_substream *substream, int cmd,
struct snd_soc_dai *cpu_dai)
@@ -525,7 +575,9 @@ static int fsl_sai_trigger(struct snd_pcm_substream *substream, int cmd,
unsigned int ofs = sai->soc_data->reg_offset;
bool tx = substream->stream == SNDRV_PCM_STREAM_PLAYBACK;
- u32 xcsr, count = 100;
+ int adir = tx ? RX : TX;
+ int dir = tx ? TX : RX;
+ u32 xcsr;
/*
* Asynchronous mode: Clear SYNC for both Tx and Rx.
@@ -548,10 +600,22 @@ static int fsl_sai_trigger(struct snd_pcm_substream *substream, int cmd,
regmap_update_bits(sai->regmap, FSL_SAI_xCSR(tx, ofs),
FSL_SAI_CSR_FRDE, FSL_SAI_CSR_FRDE);
- regmap_update_bits(sai->regmap, FSL_SAI_RCSR(ofs),
- FSL_SAI_CSR_TERE, FSL_SAI_CSR_TERE);
- regmap_update_bits(sai->regmap, FSL_SAI_TCSR(ofs),
+ regmap_update_bits(sai->regmap, FSL_SAI_xCSR(tx, ofs),
FSL_SAI_CSR_TERE, FSL_SAI_CSR_TERE);
+ /*
+ * Enable the opposite direction for synchronous mode
+ * 1. Tx sync with Rx: only set RE for Rx; set TE & RE for Tx
+ * 2. Rx sync with Tx: only set TE for Tx; set RE & TE for Rx
+ *
+ * The RM recommends enabling RE after TE for case 1 and TE after
+ * RE for case 2, but that order cannot always be guaranteed here:
+ * "arecord 1.wav; aplay 2.wav" in case 1 enables TE after RE,
+ * which goes against the RM recommendation but should be safe,
+ * judging by years of testing.
+ */
+ if (fsl_sai_dir_is_synced(sai, adir))
+ regmap_update_bits(sai->regmap, FSL_SAI_xCSR((!tx), ofs),
+ FSL_SAI_CSR_TERE, FSL_SAI_CSR_TERE);
regmap_update_bits(sai->regmap, FSL_SAI_xCSR(tx, ofs),
FSL_SAI_CSR_xIE_MASK, FSL_SAI_FLAGS);
@@ -566,43 +630,23 @@ static int fsl_sai_trigger(struct snd_pcm_substream *substream, int cmd,
/* Check if the opposite FRDE is also disabled */
regmap_read(sai->regmap, FSL_SAI_xCSR(!tx, ofs), &xcsr);
- if (!(xcsr & FSL_SAI_CSR_FRDE)) {
- /* Disable both directions and reset their FIFOs */
- regmap_update_bits(sai->regmap, FSL_SAI_TCSR(ofs),
- FSL_SAI_CSR_TERE, 0);
- regmap_update_bits(sai->regmap, FSL_SAI_RCSR(ofs),
- FSL_SAI_CSR_TERE, 0);
-
- /* TERE will remain set till the end of current frame */
- do {
- udelay(10);
- regmap_read(sai->regmap,
- FSL_SAI_xCSR(tx, ofs), &xcsr);
- } while (--count && xcsr & FSL_SAI_CSR_TERE);
-
- regmap_update_bits(sai->regmap, FSL_SAI_TCSR(ofs),
- FSL_SAI_CSR_FR, FSL_SAI_CSR_FR);
- regmap_update_bits(sai->regmap, FSL_SAI_RCSR(ofs),
- FSL_SAI_CSR_FR, FSL_SAI_CSR_FR);
-
- /*
- * For sai master mode, after several open/close sai,
- * there will be no frame clock, and can't recover
- * anymore. Add software reset to fix this issue.
- * This is a hardware bug, and will be fix in the
- * next sai version.
- */
- if (!sai->is_slave_mode) {
- /* Software Reset for both Tx and Rx */
- regmap_write(sai->regmap, FSL_SAI_TCSR(ofs),
- FSL_SAI_CSR_SR);
- regmap_write(sai->regmap, FSL_SAI_RCSR(ofs),
- FSL_SAI_CSR_SR);
- /* Clear SR bit to finish the reset */
- regmap_write(sai->regmap, FSL_SAI_TCSR(ofs), 0);
- regmap_write(sai->regmap, FSL_SAI_RCSR(ofs), 0);
- }
- }
+
+ /*
+ * If opposite stream provides clocks for synchronous mode and
+ * it is inactive, disable it before disabling the current one
+ */
+ if (fsl_sai_dir_is_synced(sai, adir) && !(xcsr & FSL_SAI_CSR_FRDE))
+ fsl_sai_config_disable(sai, adir);
+
+ /*
+ * Disable current stream if either of:
+ * 1. current stream doesn't provide clocks for synchronous mode
+ * 2. current stream provides clocks for synchronous mode but no
+ * other stream is active.
+ */
+ if (!fsl_sai_dir_is_synced(sai, dir) || !(xcsr & FSL_SAI_CSR_FRDE))
+ fsl_sai_config_disable(sai, dir);
+
break;
default:
return -EINVAL;
--
2.27.0
* [PATCH v2 1/2] ASoC: fsl_sai: Clean code for synchronous mode
From: Shengjiu Wang @ 2020-08-05 2:23 UTC
To: timur, nicoleotsuka, Xiubo.Lee, festevam, lgirdwood, broonie,
perex, tiwai, alsa-devel, linuxppc-dev, linux-kernel
In-Reply-To: <1596594233-13489-1-git-send-email-shengjiu.wang@nxp.com>
Tx synchronous with Rx: RMR is the word mask register. It is used to
mask any word in the frame and is not related to clock generation, so
it does not need to be changed when Tx is going to be enabled.
Rx synchronous with Tx: TMR is the word mask register. It is used to
mask any word in the frame and is not related to clock generation, so
it does not need to be changed when Rx is going to be enabled.
Signed-off-by: Shengjiu Wang <shengjiu.wang@nxp.com>
---
sound/soc/fsl/fsl_sai.c | 7 +------
1 file changed, 1 insertion(+), 6 deletions(-)
diff --git a/sound/soc/fsl/fsl_sai.c b/sound/soc/fsl/fsl_sai.c
index cdff739924e2..84714fe7144c 100644
--- a/sound/soc/fsl/fsl_sai.c
+++ b/sound/soc/fsl/fsl_sai.c
@@ -470,8 +470,7 @@ static int fsl_sai_hw_params(struct snd_pcm_substream *substream,
/*
* For SAI master mode, when Tx(Rx) sync with Rx(Tx) clock, Rx(Tx) will
* generate bclk and frame clock for Tx(Rx), we should set RCR4(TCR4),
- * RCR5(TCR5) and RMR(TMR) for playback(capture), or there will be sync
- * error.
+ * RCR5(TCR5) for playback(capture), or there will be sync error.
*/
if (!sai->is_slave_mode) {
@@ -482,8 +481,6 @@ static int fsl_sai_hw_params(struct snd_pcm_substream *substream,
regmap_update_bits(sai->regmap, FSL_SAI_TCR5(ofs),
FSL_SAI_CR5_WNW_MASK | FSL_SAI_CR5_W0W_MASK |
FSL_SAI_CR5_FBT_MASK, val_cr5);
- regmap_write(sai->regmap, FSL_SAI_TMR,
- ~0UL - ((1 << channels) - 1));
} else if (!sai->synchronous[RX] && sai->synchronous[TX] && tx) {
regmap_update_bits(sai->regmap, FSL_SAI_RCR4(ofs),
FSL_SAI_CR4_SYWD_MASK | FSL_SAI_CR4_FRSZ_MASK,
@@ -491,8 +488,6 @@ static int fsl_sai_hw_params(struct snd_pcm_substream *substream,
regmap_update_bits(sai->regmap, FSL_SAI_RCR5(ofs),
FSL_SAI_CR5_WNW_MASK | FSL_SAI_CR5_W0W_MASK |
FSL_SAI_CR5_FBT_MASK, val_cr5);
- regmap_write(sai->regmap, FSL_SAI_RMR,
- ~0UL - ((1 << channels) - 1));
}
}
--
2.27.0
* [PATCH v2 0/2] refine and clean code for synchronous mode
From: Shengjiu Wang @ 2020-08-05 2:23 UTC
To: timur, nicoleotsuka, Xiubo.Lee, festevam, lgirdwood, broonie,
perex, tiwai, alsa-devel, linuxppc-dev, linux-kernel
Refine and clean up the code for synchronous mode.
Shengjiu Wang (2):
ASoC: fsl_sai: Clean code for synchronous mode
ASoC: fsl_sai: Refine enable and disable sequence for synchronous mode
Changes in v2:
- Split the commit
- Refine the sequence in trigger stop
sound/soc/fsl/fsl_sai.c | 133 ++++++++++++++++++++++++++--------------
1 file changed, 86 insertions(+), 47 deletions(-)
--
2.27.0
* Re: [PATCH] powerpc/powernv/sriov: Fix use of uninitialised variable
From: Michael Ellerman @ 2020-08-05 0:42 UTC
To: linuxppc-dev, Oliver O'Halloran; +Cc: Nathan Chancellor
In-Reply-To: <20200803075408.132601-1-oohall@gmail.com>
On Mon, 3 Aug 2020 17:54:08 +1000, Oliver O'Halloran wrote:
> Initialising the value before using it is generally regarded as a good
> idea so do that.
Applied to powerpc/next.
[1/1] powerpc/powernv/sriov: Fix use of uninitialised variable
https://git.kernel.org/powerpc/c/2075ec9896c5aef01e837198381d04cfa6452317
cheers
* Re: [PATCH v2] selftests/powerpc: Skip vmx/vsx/tar/etc tests on older CPUs
From: Michael Ellerman @ 2020-08-05 0:42 UTC
To: linuxppc-dev, Michael Ellerman
In-Reply-To: <20200803020719.96114-1-mpe@ellerman.id.au>
On Mon, 3 Aug 2020 12:07:19 +1000, Michael Ellerman wrote:
> Some of our tests use VSX or newer VMX instructions, so need to be
> skipped on older CPUs to avoid SIGILL'ing.
>
> Similarly TAR was added in v2.07, and the PMU event used in the stcx
> fail test only works on Power8 or later.
Applied to powerpc/next.
[1/1] selftests/powerpc: Skip vmx/vsx/tar/etc tests on older CPUs
https://git.kernel.org/powerpc/c/872d11bca9c29ed19595c993b9f552ffe9b63dcb
cheers
* Re: [PATCH] powerpc/40x: Fix assembler warning about r0
From: Michael Ellerman @ 2020-08-05 0:42 UTC
To: linuxppc-dev, Michael Ellerman
In-Reply-To: <20200722022422.825197-1-mpe@ellerman.id.au>
On Wed, 22 Jul 2020 12:24:22 +1000, Michael Ellerman wrote:
> The assembler says:
> arch/powerpc/kernel/head_40x.S:623: Warning: invalid register expression
>
> It's objecting to the use of r0 as the RA argument. That's because
> when RA = 0 the literal value 0 is used, rather than the content of
> r0, making the use of r0 in the source potentially confusing.
>
> [...]
Applied to powerpc/next.
[1/1] powerpc/40x: Fix assembler warning about r0
https://git.kernel.org/powerpc/c/8d8a629d00a5283874b81b594f31f8d436dc57d8
cheers
* Re: [PATCH v4 1/7] powerpc/pseries/iommu: Create defines for operations in ibm,ddw-applicable
From: David Dai @ 2020-08-04 21:34 UTC
To: Leonardo Bras, Michael Ellerman, Benjamin Herrenschmidt,
Paul Mackerras, Alexey Kardashevskiy, Joel Stanley,
Christophe Leroy, Thiago Jung Bauermann, Ram Pai, Brian King
Cc: linuxppc-dev, linux-kernel
In-Reply-To: <20200716071658.467820-2-leobras.c@gmail.com>
On Thu, 2020-07-16 at 04:16 -0300, Leonardo Bras wrote:
> Create defines to help handling ibm,ddw-applicable values, avoiding
> confusion about the index of given operations.
>
> Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
> ---
> arch/powerpc/platforms/pseries/iommu.c | 43 ++++++++++++++++----------
> 1 file changed, 26 insertions(+), 17 deletions(-)
>
> diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
> index 6d47b4a3ce39..ac0d6376bdad 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -39,6 +39,14 @@
>
> #include "pseries.h"
>
> +enum {
> + DDW_QUERY_PE_DMA_WIN = 0,
> + DDW_CREATE_PE_DMA_WIN = 1,
> + DDW_REMOVE_PE_DMA_WIN = 2,
> +
> + DDW_APPLICABLE_SIZE
> +};
> +
> static struct iommu_table_group *iommu_pseries_alloc_group(int node)
> {
> struct iommu_table_group *table_group;
> @@ -771,12 +779,12 @@ static void remove_ddw(struct device_node *np, bool remove_prop)
> {
> struct dynamic_dma_window_prop *dwp;
> struct property *win64;
> - u32 ddw_avail[3];
> + u32 ddw_avail[DDW_APPLICABLE_SIZE];
> u64 liobn;
> int ret = 0;
>
> ret = of_property_read_u32_array(np, "ibm,ddw-applicable",
> - &ddw_avail[0], 3);
> + &ddw_avail[0], DDW_APPLICABLE_SIZE);
>
> win64 = of_find_property(np, DIRECT64_PROPNAME, NULL);
> if (!win64)
> @@ -798,15 +806,15 @@ static void remove_ddw(struct device_node *np, bool remove_prop)
> pr_debug("%pOF successfully cleared tces in window.\n",
> np);
>
> - ret = rtas_call(ddw_avail[2], 1, 1, NULL, liobn);
> + ret = rtas_call(ddw_avail[DDW_REMOVE_PE_DMA_WIN], 1, 1, NULL, liobn);
> if (ret)
> pr_warn("%pOF: failed to remove direct window: rtas returned "
> "%d to ibm,remove-pe-dma-window(%x) %llx\n",
> - np, ret, ddw_avail[2], liobn);
> + np, ret, ddw_avail[DDW_REMOVE_PE_DMA_WIN], liobn);
> else
> pr_debug("%pOF: successfully removed direct window: rtas returned "
> "%d to ibm,remove-pe-dma-window(%x) %llx\n",
> - np, ret, ddw_avail[2], liobn);
> + np, ret, ddw_avail[DDW_REMOVE_PE_DMA_WIN], liobn);
>
> delprop:
> if (remove_prop)
> @@ -889,11 +897,11 @@ static int query_ddw(struct pci_dev *dev, const u32 *ddw_avail,
> buid = pdn->phb->buid;
> cfg_addr = ((pdn->busno << 16) | (pdn->devfn << 8));
>
> - ret = rtas_call(ddw_avail[0], 3, 5, (u32 *)query,
> - cfg_addr, BUID_HI(buid), BUID_LO(buid));
> + ret = rtas_call(ddw_avail[DDW_QUERY_PE_DMA_WIN], 3, 5, (u32 *)query,
> + cfg_addr, BUID_HI(buid), BUID_LO(buid));
> dev_info(&dev->dev, "ibm,query-pe-dma-windows(%x) %x %x %x"
> - " returned %d\n", ddw_avail[0], cfg_addr, BUID_HI(buid),
> - BUID_LO(buid), ret);
> + " returned %d\n", ddw_avail[DDW_QUERY_PE_DMA_WIN], cfg_addr,
> + BUID_HI(buid), BUID_LO(buid), ret);
> return ret;
> }
>
> @@ -920,15 +928,16 @@ static int create_ddw(struct pci_dev *dev, const u32 *ddw_avail,
>
> do {
> /* extra outputs are LIOBN and dma-addr (hi, lo) */
> - ret = rtas_call(ddw_avail[1], 5, 4, (u32 *)create,
> - cfg_addr, BUID_HI(buid), BUID_LO(buid),
> - page_shift, window_shift);
> + ret = rtas_call(ddw_avail[DDW_CREATE_PE_DMA_WIN], 5, 4,
> + (u32 *)create, cfg_addr, BUID_HI(buid),
> + BUID_LO(buid), page_shift, window_shift);
> } while (rtas_busy_delay(ret));
> dev_info(&dev->dev,
> "ibm,create-pe-dma-window(%x) %x %x %x %x %x returned %d "
> - "(liobn = 0x%x starting addr = %x %x)\n", ddw_avail[1],
> - cfg_addr, BUID_HI(buid), BUID_LO(buid), page_shift,
> - window_shift, ret, create->liobn, create->addr_hi, create->addr_lo);
> + "(liobn = 0x%x starting addr = %x %x)\n",
> + ddw_avail[DDW_CREATE_PE_DMA_WIN], cfg_addr, BUID_HI(buid),
> + BUID_LO(buid), page_shift, window_shift, ret, create->liobn,
> + create->addr_hi, create->addr_lo);
>
> return ret;
> }
> @@ -996,7 +1005,7 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
> int page_shift;
> u64 dma_addr, max_addr;
> struct device_node *dn;
> - u32 ddw_avail[3];
> + u32 ddw_avail[DDW_APPLICABLE_SIZE];
> struct direct_window *window;
> struct property *win64;
> struct dynamic_dma_window_prop *ddwprop;
> @@ -1029,7 +1038,7 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
> * the property is actually in the parent, not the PE
> */
> ret = of_property_read_u32_array(pdn, "ibm,ddw-applicable",
> - &ddw_avail[0], 3);
> + &ddw_avail[0], DDW_APPLICABLE_SIZE);
> if (ret)
> goto out_failed;
>
Tested-by: David Dai <zdai@linux.vnet.ibm.com>
* Re: [PATCH v4 4/7] powerpc/pseries/iommu: Remove default DMA window before creating DDW
From: David Dai @ 2020-08-04 21:33 UTC
To: Leonardo Bras, Michael Ellerman, Benjamin Herrenschmidt,
Paul Mackerras, Alexey Kardashevskiy, Joel Stanley,
Christophe Leroy, Thiago Jung Bauermann, Ram Pai, Brian King
Cc: linuxppc-dev, linux-kernel
In-Reply-To: <20200716071658.467820-5-leobras.c@gmail.com>
On Thu, 2020-07-16 at 04:16 -0300, Leonardo Bras wrote:
> On LoPAR "DMA Window Manipulation Calls", it's recommended to remove
> the default DMA window for the device, before attempting to configure
> a DDW, in order to make the maximum resources available for the next
> DDW to be created.
>
> This is a requirement for using DDW on devices in which the hypervisor
> allows only one DMA window.
>
> If setting up a new DDW fails anywhere after the removal of this
> default DMA window, the default DMA window must be restored. This
> requires an implementation of the ibm,reset-pe-dma-windows RTAS call:
>
> Platforms supporting the DDW option starting with LoPAR level 2.7
> implement ibm,ddw-extensions. The first extension available (index 2)
> carries the token for the ibm,reset-pe-dma-windows RTAS call, which is
> used to restore the default DMA window for a device if it has been
> deleted.
>
> It does so by resetting the TCE table allocation for the PE to its
> boot-time value, available in the "ibm,dma-window" device tree property.
>
> Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
> ---
> arch/powerpc/platforms/pseries/iommu.c | 73 +++++++++++++++++++++++---
> 1 file changed, 66 insertions(+), 7 deletions(-)
>
> diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
> index 4e33147825cc..fc8d0555e2e9 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -1066,6 +1066,38 @@ static phys_addr_t ddw_memory_hotplug_max(void)
> return max_addr;
> }
>
> +/*
> + * Platforms supporting the DDW option starting with LoPAR level 2.7 implement
> + * ibm,ddw-extensions, which carries the rtas token for
> + * ibm,reset-pe-dma-windows.
> + * That rtas-call can be used to restore the default DMA window for the device.
> + */
> +static void reset_dma_window(struct pci_dev *dev, struct device_node *par_dn)
> +{
> + int ret;
> + u32 cfg_addr, reset_dma_win;
> + u64 buid;
> + struct device_node *dn;
> + struct pci_dn *pdn;
> +
> + ret = ddw_read_ext(par_dn, DDW_EXT_RESET_DMA_WIN, &reset_dma_win);
> + if (ret)
> + return;
> +
> + dn = pci_device_to_OF_node(dev);
> + pdn = PCI_DN(dn);
> + buid = pdn->phb->buid;
> + cfg_addr = ((pdn->busno << 16) | (pdn->devfn << 8));
> +
> + ret = rtas_call(reset_dma_win, 3, 1, NULL, cfg_addr, BUID_HI(buid),
> + BUID_LO(buid));
> + if (ret)
> + dev_info(&dev->dev,
> + "ibm,reset-pe-dma-windows(%x) %x %x %x returned %d ",
> + reset_dma_win, cfg_addr, BUID_HI(buid), BUID_LO(buid),
> + ret);
> +}
> +
> /*
> * If the PE supports dynamic dma windows, and there is space for a table
> * that can map all pages in a linear offset, then setup such a table,
> @@ -1090,6 +1122,7 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
> struct property *win64;
> struct dynamic_dma_window_prop *ddwprop;
> struct failed_ddw_pdn *fpdn;
> + bool default_win_removed = false;
>
> mutex_lock(&direct_window_init_mutex);
>
> @@ -1133,14 +1166,38 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
> if (ret != 0)
> goto out_failed;
>
> + /*
> + * If there is no window available, remove the default DMA window,
> + * if it's present. This will make all the resources available to the
> + * new DDW window.
> + * If anything fails after this, we need to restore it, so also check
> + * for extensions presence.
> + */
> if (query.windows_available == 0) {
> - /*
> - * no additional windows are available for this device.
> - * We might be able to reallocate the existing window,
> - * trading in for a larger page size.
> - */
> - dev_dbg(&dev->dev, "no free dynamic windows");
> - goto out_failed;
> + struct property *default_win;
> + int reset_win_ext;
> +
> + default_win = of_find_property(pdn, "ibm,dma-window", NULL);
> + if (!default_win)
> + goto out_failed;
> +
> + reset_win_ext = ddw_read_ext(pdn, DDW_EXT_RESET_DMA_WIN, NULL);
> + if (reset_win_ext)
> + goto out_failed;
> +
> + remove_dma_window(pdn, ddw_avail, default_win);
> + default_win_removed = true;
> +
> + /* Query again, to check if the window is available */
> + ret = query_ddw(dev, ddw_avail, &query, pdn);
> + if (ret != 0)
> + goto out_failed;
> +
> + if (query.windows_available == 0) {
> + /* no windows are available for this device. */
> + dev_dbg(&dev->dev, "no free dynamic windows");
> + goto out_failed;
> + }
> }
> if (query.page_size & 4) {
> page_shift = 24; /* 16MB */
> @@ -1231,6 +1288,8 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
> kfree(win64);
>
> out_failed:
> + if (default_win_removed)
> + reset_dma_window(dev, pdn);
>
> fpdn = kzalloc(sizeof(*fpdn), GFP_KERNEL);
> if (!fpdn)
Tested-by: David Dai <zdai@linux.vnet.ibm.com>