LinuxPPC-Dev Archive on lore.kernel.org

LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed

* Re: [PATCH BACKPORTv2 4.19, 5.0, 5.1] crypto: vmx - ghash: do nosimd fallback manually
From: Greg KH @ 2019-06-03  7:54 UTC (permalink / raw)
  To: Daniel Axtens; +Cc: linuxppc-dev, Herbert Xu, stable
In-Reply-To: <20190603020848.9598-1-dja@axtens.net>

On Mon, Jun 03, 2019 at 12:08:48PM +1000, Daniel Axtens wrote:
> commit 357d065a44cdd77ed5ff35155a989f2a763e96ef upstream.
> [backported: the VMX driver did not use crypto_simd_usable() until
>  after 5.1]
> 
> VMX ghash was using a fallback that did not support interleaving simd
> and nosimd operations, leading to failures in the extended test suite.
> 
> If I understood correctly, Eric's suggestion was to use the same
> data format that the generic code uses, allowing us to call into it
> with the same contexts. I wasn't able to get that to work - I think
> there's a very different key structure and data layout being used.
> 
> So instead steal the arm64 approach and perform the fallback
> operations directly if required.
> 
> Fixes: cc333cd68dfa ("crypto: vmx - Adding GHASH routines for VMX module")
> Cc: stable@vger.kernel.org # v4.1+
> Reported-by: Eric Biggers <ebiggers@google.com>
> Signed-off-by: Daniel Axtens <dja@axtens.net>
> Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> Tested-by: Michael Ellerman <mpe@ellerman.id.au>
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
> Signed-off-by: Daniel Axtens <dja@axtens.net>
> ---
> 
> v2: do stable backport form correctly.

Thanks for all of these, now queued up.

greg k-h

^ permalink raw reply

* [PATCH v3] powerpc: fix kexec failure on book3s/32
From: Christophe Leroy @ 2019-06-03  8:20 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Aaro Koskinen
  Cc: linuxppc-dev, linux-kernel

In the old days, _PAGE_EXEC didn't exist on 6xx aka book3s/32.
Therefore, allthough __mapin_ram_chunk() was already mapping kernel
text with PAGE_KERNEL_TEXT and the rest with PAGE_KERNEL, the entire
memory was executable. Part of the memory (first 512kbytes) was
mapped with BATs instead of page table, but it was also entirely
mapped as executable.

In commit 385e89d5b20f ("powerpc/mm: add exec protection on
powerpc 603"), we started adding exec protection to some 6xx, namely
the 603, for pages mapped via pagetables.

Then, in commit 63b2bc619565 ("powerpc/mm/32s: Use BATs for
STRICT_KERNEL_RWX"), the exec protection was extended to BAT mapped
memory, so that really only the kernel text could be executed.

The problem here is that kexec is based on copying some code into
upper part of memory then executing it from there in order to install
a fresh new kernel at its definitive location.

However, the code is position independant and first part of it is
just there to deactivate the MMU and jump to the second part. So it
is possible to run this first part inplace instead of running the
copy. Once the MMU is off, there is no protection anymore and the
second part of the code will just run as before.

Reported-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Fixes: 63b2bc619565 ("powerpc/mm/32s: Use BATs for STRICT_KERNEL_RWX")
Cc: stable@vger.kernel.org
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
 Aaro, can you test this patch ? Thanks.

 arch/powerpc/include/asm/kexec.h       | 3 +++
 arch/powerpc/kernel/machine_kexec_32.c | 4 +++-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h
index 4a585cba1787..c68476818753 100644
--- a/arch/powerpc/include/asm/kexec.h
+++ b/arch/powerpc/include/asm/kexec.h
@@ -94,6 +94,9 @@ static inline bool kdump_in_progress(void)
 	return crashing_cpu >= 0;
 }

+void relocate_new_kernel(unsigned long indirection_page, unsigned long reboot_code_buffer,
+			 unsigned long start_address) __noreturn;
+
 #ifdef CONFIG_KEXEC_FILE
 extern const struct kexec_file_ops kexec_elf64_ops;

diff --git a/arch/powerpc/kernel/machine_kexec_32.c b/arch/powerpc/kernel/machine_kexec_32.c
index affe5dcce7f4..2b160d68db49 100644
--- a/arch/powerpc/kernel/machine_kexec_32.c
+++ b/arch/powerpc/kernel/machine_kexec_32.c
@@ -30,7 +30,6 @@ typedef void (*relocate_new_kernel_t)(
  */
 void default_machine_kexec(struct kimage *image)
 {
-	extern const unsigned char relocate_new_kernel[];
 	extern const unsigned int relocate_new_kernel_size;
 	unsigned long page_list;
 	unsigned long reboot_code_buffer, reboot_code_buffer_phys;
@@ -58,6 +57,9 @@ void default_machine_kexec(struct kimage *image)
 				reboot_code_buffer + KEXEC_CONTROL_PAGE_SIZE);
 	printk(KERN_INFO "Bye!\n");

+	if (!IS_ENABLED(CONFIG_FSL_BOOKE) && !IS_ENABLED(CONFIG_44x))
+		relocate_new_kernel(page_list, reboot_code_buffer_phys, image->start);
+
 	/* now call it */
 	rnk = (relocate_new_kernel_t) reboot_code_buffer;
 	(*rnk)(page_list, reboot_code_buffer_phys, image->start);
-- 
2.13.3

^ permalink raw reply related

* Re: [PATCH v3 1/3] PCI: Introduce pcibios_ignore_alignment_request
From: Alexey Kardashevskiy @ 2019-06-03  8:35 UTC (permalink / raw)
  To: Shawn Anastasio
  Cc: Sam Bobroff, linux-pci, Linux Kernel Mailing List, rppt, Oliver,
	Bjorn Helgaas, Paul Mackerras, xyjxie, linuxppc-dev
In-Reply-To: <2f4185ac-d19f-6668-7b3e-a300ce3b9e00@ozlabs.ru>



On 03/06/2019 15:02, Alexey Kardashevskiy wrote:
> 
> 
> On 03/06/2019 12:23, Shawn Anastasio wrote:
>>
>>
>> On 5/30/19 10:56 PM, Alexey Kardashevskiy wrote:
>>>
>>>
>>> On 31/05/2019 08:49, Shawn Anastasio wrote:
>>>> On 5/29/19 10:39 PM, Alexey Kardashevskiy wrote:
>>>>>
>>>>>
>>>>> On 28/05/2019 17:39, Shawn Anastasio wrote:
>>>>>>
>>>>>>
>>>>>> On 5/28/19 1:27 AM, Alexey Kardashevskiy wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 28/05/2019 15:36, Oliver wrote:
>>>>>>>> On Tue, May 28, 2019 at 2:03 PM Shawn Anastasio <shawn@anastas.io>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> Introduce a new pcibios function pcibios_ignore_alignment_request
>>>>>>>>> which allows the PCI core to defer to platform-specific code to
>>>>>>>>> determine whether or not to ignore alignment requests for PCI
>>>>>>>>> resources.
>>>>>>>>>
>>>>>>>>> The existing behavior is to simply ignore alignment requests when
>>>>>>>>> PCI_PROBE_ONLY is set. This is behavior is maintained by the
>>>>>>>>> default implementation of pcibios_ignore_alignment_request.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Shawn Anastasio <shawn@anastas.io>
>>>>>>>>> ---
>>>>>>>>>     drivers/pci/pci.c   | 9 +++++++--
>>>>>>>>>     include/linux/pci.h | 1 +
>>>>>>>>>     2 files changed, 8 insertions(+), 2 deletions(-)
>>>>>>>>>
>>>>>>>>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>>>>>>>>> index 8abc843b1615..8207a09085d1 100644
>>>>>>>>> --- a/drivers/pci/pci.c
>>>>>>>>> +++ b/drivers/pci/pci.c
>>>>>>>>> @@ -5882,6 +5882,11 @@ resource_size_t __weak
>>>>>>>>> pcibios_default_alignment(void)
>>>>>>>>>            return 0;
>>>>>>>>>     }
>>>>>>>>>
>>>>>>>>> +int __weak pcibios_ignore_alignment_request(void)
>>>>>>>>> +{
>>>>>>>>> +       return pci_has_flag(PCI_PROBE_ONLY);
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>>     #define RESOURCE_ALIGNMENT_PARAM_SIZE COMMAND_LINE_SIZE
>>>>>>>>>     static char
>>>>>>>>> resource_alignment_param[RESOURCE_ALIGNMENT_PARAM_SIZE] = {0};
>>>>>>>>>     static DEFINE_SPINLOCK(resource_alignment_lock);
>>>>>>>>> @@ -5906,9 +5911,9 @@ static resource_size_t
>>>>>>>>> pci_specified_resource_alignment(struct pci_dev *dev,
>>>>>>>>>            p = resource_alignment_param;
>>>>>>>>>            if (!*p && !align)
>>>>>>>>>                    goto out;
>>>>>>>>> -       if (pci_has_flag(PCI_PROBE_ONLY)) {
>>>>>>>>> +       if (pcibios_ignore_alignment_request()) {
>>>>>>>>>                    align = 0;
>>>>>>>>> -               pr_info_once("PCI: Ignoring requested alignments
>>>>>>>>> (PCI_PROBE_ONLY)\n");
>>>>>>>>> +               pr_info_once("PCI: Ignoring requested
>>>>>>>>> alignments\n");
>>>>>>>>>                    goto out;
>>>>>>>>>            }
>>>>>>>>
>>>>>>>> I think the logic here is questionable to begin with. If the user
>>>>>>>> has
>>>>>>>> explicitly requested re-aligning a resource via the command line
>>>>>>>> then
>>>>>>>> we should probably do it even if PCI_PROBE_ONLY is set. When it
>>>>>>>> breaks
>>>>>>>> they get to keep the pieces.
>>>>>>>>
>>>>>>>> That said, the real issue here is that PCI_PROBE_ONLY probably
>>>>>>>> shouldn't be set under qemu/kvm. Under the other hypervisor
>>>>>>>> (PowerVM)
>>>>>>>> hotplugged devices are configured by firmware before it's passed to
>>>>>>>> the guest and we need to keep the FW assignments otherwise things
>>>>>>>> break. QEMU however doesn't do any BAR assignments and relies on
>>>>>>>> that
>>>>>>>> being handled by the guest. At boot time this is done by SLOF, but
>>>>>>>> Linux only keeps SLOF around until it's extracted the device-tree.
>>>>>>>> Once that's done SLOF gets blown away and the kernel needs to do
>>>>>>>> it's
>>>>>>>> own BAR assignments. I'm guessing there's a hack in there to make it
>>>>>>>> work today, but it's a little surprising that it works at all...
>>>>>>>
>>>>>>>
>>>>>>> The hack is to run a modified qemu-aware "/usr/sbin/rtas_errd" in the
>>>>>>> guest which receives an event from qemu (RAS_EPOW from
>>>>>>> /proc/interrupts), fetches device tree chunks (and as I understand
>>>>>>> it -
>>>>>>> they come with BARs from phyp but without from qemu) and writes
>>>>>>> "1" to
>>>>>>> "/sys/bus/pci/rescan" which calls pci_assign_resource() eventually:
>>>>>>
>>>>>> Interesting. Does this mean that the PHYP hotplug path doesn't
>>>>>> call pci_assign_resource?
>>>>>
>>>>>
>>>>> I'd expect dlpar_add_slot() to be called under phyp and eventually
>>>>> pci_device_add() which (I think) may or may not trigger later
>>>>> reassignment.
>>>>>
>>>>>
>>>>>> If so it means the patch may not
>>>>>> break that platform after all, though it still may not be
>>>>>> the correct way of doing things.
>>>>>
>>>>>
>>>>> We should probably stop enforcing the PCI_PROBE_ONLY flag - it seems
>>>>> that (unless resource_alignment= is used) the pseries guest should just
>>>>> walk through all allocated resources and leave them unchanged.
>>>>
>>>> If we add a pcibios_default_alignment() implementation like was
>>>> suggested earlier, then it will behave as if the user has
>>>> specified resource_alignment= by default and SLOF's assignments
>>>> won't be honored (I think).
>>>
>>>
>>> I removed pci_add_flags(PCI_PROBE_ONLY) from pSeries_setup_arch and
>>> tried booting with and without pci=resource_alignment= and I can see no
>>> difference - BARs are still aligned to 64K as programmed in SLOF; if I
>>> hack SLOF to align to 4K or 32K - BARs get packed and the guest leaves
>>> them unchanged.
>>>
>>>
>>>> I guess it boils down to one question - is it important that we
>>>> observe SLOF's initial BAR assignments?
>>>
>>> It isn't if it's SLOF but it is if it's phyp. It used to not
>>> allow/support BAR reassignment and even if it does not, I'd rather avoid
>>> touching them.
>>
>> A quick update. I tried removing pci_add_flags(PCI_PROBE_ONLY) which
>> worked, but if I add an implementation of pcibios_default_alignment
>> which simply returns PAGE_SIZE, my VM fails to boot and many errors
>> from the virtio disk driver are printed to the console.
>>
>> After some investigation, it seems that with pcibios_default_alignment
>> present, Linux will reallocate all resources provided by SLOF on
>> boot. I'm still not sure why exactly this causes the virtio driver
>> to fail, but it does indicate that there is a reason to keep
>> SLOF's initial assignments.
>>
>> Anybody have an idea what's causing this?
> 
> With your changes the guest feels the urge to reassign bars (no idea why
> but ok), when it does so, it puts both BARs (one is prefetchable) into
> the 32bit non-prefetchable window of the PHB (SLOF puts the prefetchable
> bar to a 64bit prefetchable window, I have no idea why the guest does it
> different either but this must still work) and then qemu does not
> emulate something properly - unassigned_mem_accepts() is triggered on
> the bar access - no idea why - I am debugging it right now.


Sooo the problem is that resouce::flags has 2 bits to describe 64bit
BARs - PCI_BASE_ADDRESS_MEM_TYPE_64 and IORESOURCE_MEM_64 - and we don't
set IORESOURCE_MEM_64 for 64bit BARs when parsing the fdt.

So the BAR reallocator moves the BAR to 32bit window (which is not
desirable but permitted, I still have to chase it) and then
pci_std_update_resource() writes BAR back but since now it is 32bit BAR,
it does not write to the upper 32bits so that half remains 0x2100, QEMU
does not move BAR to the right window and the MMIO stops working.

Try this in the guest kernel, it seems to keep bars where they were
after slof.


diff --git a/arch/powerpc/kernel/pci_of_scan.c
b/arch/powerpc/kernel/pci_of_scan.c
index 24191ea2d9a7..64ad92016b63 100644
--- a/arch/powerpc/kernel/pci_of_scan.c
+++ b/arch/powerpc/kernel/pci_of_scan.c
@@ -45,6 +45,8 @@ unsigned int pci_parse_of_flags(u32 addr0, int bridge)
        if (addr0 & 0x02000000) {
                flags = IORESOURCE_MEM | PCI_BASE_ADDRESS_SPACE_MEMORY;
                flags |= (addr0 >> 22) & PCI_BASE_ADDRESS_MEM_TYPE_64;
+               if (flags & PCI_BASE_ADDRESS_MEM_TYPE_64)
+                       flags |= IORESOURCE_MEM_64;
                flags |= (addr0 >> 28) & PCI_BASE_ADDRESS_MEM_TYPE_1M;
                if (addr0 & 0x40000000)





-- 
Alexey

^ permalink raw reply related

* Re: [PATCH] scsi: ibmvscsi: Don't use rc uninitialized in ibmvscsi_do_work
From: Christophe Leroy @ 2019-06-03  8:45 UTC (permalink / raw)
  To: Nathan Chancellor, Tyrel Datwyler, James E.J. Bottomley,
	Martin K. Petersen
  Cc: clang-built-linux, linuxppc-dev, linux-kernel, linux-scsi
In-Reply-To: <20190531185306.41290-1-natechancellor@gmail.com>



Le 31/05/2019 à 20:53, Nathan Chancellor a écrit :
> clang warns:
> 
> drivers/scsi/ibmvscsi/ibmvscsi.c:2126:7: warning: variable 'rc' is used
> uninitialized whenever switch case is taken [-Wsometimes-uninitialized]
>          case IBMVSCSI_HOST_ACTION_NONE:
>               ^~~~~~~~~~~~~~~~~~~~~~~~~
> drivers/scsi/ibmvscsi/ibmvscsi.c:2151:6: note: uninitialized use occurs
> here
>          if (rc) {
>              ^~
> 
> Initialize rc to zero so that the atomic_set and dev_err statement don't
> trigger for the cases that just break.
> 
> Fixes: 035a3c4046b5 ("scsi: ibmvscsi: redo driver work thread to use enum action states")
> Link: https://github.com/ClangBuiltLinux/linux/issues/502
> Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
> ---
>   drivers/scsi/ibmvscsi/ibmvscsi.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/scsi/ibmvscsi/ibmvscsi.c b/drivers/scsi/ibmvscsi/ibmvscsi.c
> index 727c31dc11a0..6714d8043e62 100644
> --- a/drivers/scsi/ibmvscsi/ibmvscsi.c
> +++ b/drivers/scsi/ibmvscsi/ibmvscsi.c
> @@ -2118,7 +2118,7 @@ static unsigned long ibmvscsi_get_desired_dma(struct vio_dev *vdev)
>   static void ibmvscsi_do_work(struct ibmvscsi_host_data *hostdata)
>   {
>   	unsigned long flags;
> -	int rc;
> +	int rc = 0;

I don't think the above is the best solution, as it hides the warning 
instead of really fixing it.

Your problem is that some legs of the switch are missing setting the 
value of rc, it would therefore be better to fix the legs instead of 
setting a default value which may not be correct for every case, 
allthough it may be at the time being.

Christophe


>   	char *action = "reset";
>   
>   	spin_lock_irqsave(hostdata->host->host_lock, flags);
> 

^ permalink raw reply

* Re: [PATCH v3] powerpc/pseries: Fix cpu_hotplug_lock acquisition in resize_hpt()
From: Gautham R Shenoy @ 2019-06-03  8:55 UTC (permalink / raw)
  To: Gautham R. Shenoy
  Cc: linux-kernel, Nicholas Piggin, Paul Mackerras, Aneesh Kumar K.V,
	linuxppc-dev
In-Reply-To: <1557906352-29048-1-git-send-email-ego@linux.vnet.ibm.com>

Hi,

On Wed, May 15, 2019 at 01:15:52PM +0530, Gautham R. Shenoy wrote:
> From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>
> 
> The calls to arch_add_memory()/arch_remove_memory() are always made
> with the read-side cpu_hotplug_lock acquired via
> memory_hotplug_begin().  On pSeries,
> arch_add_memory()/arch_remove_memory() eventually call resize_hpt()
> which in turn calls stop_machine() which acquires the read-side
> cpu_hotplug_lock again, thereby resulting in the recursive acquisition
> of this lock.

A clarification regarding why we hadn't observed this problem earlier.

In the absence of CONFIG_PROVE_LOCKING, we hadn't observed a system
lockup during a memory hotplug operation because cpus_read_lock() is a
per-cpu rwsem read, which, in the fast-path (in the absence of the
writer, which in our case is a CPU-hotplug operation) simply
increments the read_count on the semaphore. Thus a recursive read in
the fast-path doesn't cause any problems.

However, we can hit this problem in practice if there is a concurrent
CPU-Hotplug operation in progress which is waiting to acquire the
write-side of the lock. This will cause the second recursive read to
block until the writer finishes. While the writer is blocked since the
first read holds the lock. Thus both the reader as well as the writers
fail to make any progress thereby blocking both CPU-Hotplug as well as
Memory Hotplug operations.

Memory-Hotplug				CPU-Hotplug
CPU 0					CPU 1
------                                  ------

1. down_read(cpu_hotplug_lock.rw_sem)  
   [memory_hotplug_begin]
					2. down_write(cpu_hotplug_lock.rw_sem)
					[cpu_up/cpu_down]
3. down_read(cpu_hotplug_lock.rw_sem)
   [stop_machine()]


> 
> Lockdep complains as follows in these code-paths.
> 
>  swapper/0/1 is trying to acquire lock:
>  (____ptrval____) (cpu_hotplug_lock.rw_sem){++++}, at: stop_machine+0x2c/0x60
> 
> but task is already holding lock:
> (____ptrval____) (cpu_hotplug_lock.rw_sem){++++}, at: mem_hotplug_begin+0x20/0x50
> 
>  other info that might help us debug this:
>   Possible unsafe locking scenario:
> 
>         CPU0
>         ----
>    lock(cpu_hotplug_lock.rw_sem);
>    lock(cpu_hotplug_lock.rw_sem);
> 
>   *** DEADLOCK ***
> 
>   May be due to missing lock nesting notation
> 
>  3 locks held by swapper/0/1:
>   #0: (____ptrval____) (&dev->mutex){....}, at: __driver_attach+0x12c/0x1b0
>   #1: (____ptrval____) (cpu_hotplug_lock.rw_sem){++++}, at: mem_hotplug_begin+0x20/0x50
>   #2: (____ptrval____) (mem_hotplug_lock.rw_sem){++++}, at: percpu_down_write+0x54/0x1a0
> 
> stack backtrace:
>  CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.0.0-rc5-58373-gbc99402235f3-dirty #166
>  Call Trace:
>  [c0000000feb03150] [c000000000e32bd4] dump_stack+0xe8/0x164 (unreliable)
>  [c0000000feb031a0] [c00000000020d6c0] __lock_acquire+0x1110/0x1c70
>  [c0000000feb03320] [c00000000020f080] lock_acquire+0x240/0x290
>  [c0000000feb033e0] [c00000000017f554] cpus_read_lock+0x64/0xf0
>  [c0000000feb03420] [c00000000029ebac] stop_machine+0x2c/0x60
>  [c0000000feb03460] [c0000000000d7f7c] pseries_lpar_resize_hpt+0x19c/0x2c0
>  [c0000000feb03500] [c0000000000788d0] resize_hpt_for_hotplug+0x70/0xd0
>  [c0000000feb03570] [c000000000e5d278] arch_add_memory+0x58/0xfc
>  [c0000000feb03610] [c0000000003553a8] devm_memremap_pages+0x5e8/0x8f0
>  [c0000000feb036c0] [c0000000009c2394] pmem_attach_disk+0x764/0x830
>  [c0000000feb037d0] [c0000000009a7c38] nvdimm_bus_probe+0x118/0x240
>  [c0000000feb03860] [c000000000968500] really_probe+0x230/0x4b0
>  [c0000000feb038f0] [c000000000968aec] driver_probe_device+0x16c/0x1e0
>  [c0000000feb03970] [c000000000968ca8] __driver_attach+0x148/0x1b0
>  [c0000000feb039f0] [c0000000009650b0] bus_for_each_dev+0x90/0x130
>  [c0000000feb03a50] [c000000000967dd4] driver_attach+0x34/0x50
>  [c0000000feb03a70] [c000000000967068] bus_add_driver+0x1a8/0x360
>  [c0000000feb03b00] [c00000000096a498] driver_register+0x108/0x170
>  [c0000000feb03b70] [c0000000009a7400] __nd_driver_register+0xd0/0xf0
>  [c0000000feb03bd0] [c00000000128aa90] nd_pmem_driver_init+0x34/0x48
>  [c0000000feb03bf0] [c000000000010a10] do_one_initcall+0x1e0/0x45c
>  [c0000000feb03cd0] [c00000000122462c] kernel_init_freeable+0x540/0x64c
>  [c0000000feb03db0] [c00000000001110c] kernel_init+0x2c/0x160
>  [c0000000feb03e20] [c00000000000bed4] ret_from_kernel_thread+0x5c/0x68
> 
> Fix this issue by
>   1) Requiring all the calls to pseries_lpar_resize_hpt() be made
>      with cpu_hotplug_lock held.
> 
>   2) In pseries_lpar_resize_hpt() invoke stop_machine_cpuslocked()
>      as a consequence of 1)
> 
>   3) To satisfy 1), in hpt_order_set(), call mmu_hash_ops.resize_hpt()
>      with cpu_hotplug_lock held.
> 
> Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
> ---
> v2 -> v3 : Updated the comment for pseries_lpar_resize_hpt()
>            Updated the commit-log with the full backtrace.
> v1 -> v2 : Rebased against powerpc/next instead of linux/master
> 
>  arch/powerpc/mm/book3s64/hash_utils.c | 9 ++++++++-
>  arch/powerpc/platforms/pseries/lpar.c | 8 ++++++--
>  2 files changed, 14 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
> index 919a861..d07fcafd 100644
> --- a/arch/powerpc/mm/book3s64/hash_utils.c
> +++ b/arch/powerpc/mm/book3s64/hash_utils.c
> @@ -38,6 +38,7 @@
>  #include <linux/libfdt.h>
>  #include <linux/pkeys.h>
>  #include <linux/hugetlb.h>
> +#include <linux/cpu.h>
> 
>  #include <asm/debugfs.h>
>  #include <asm/processor.h>
> @@ -1928,10 +1929,16 @@ static int hpt_order_get(void *data, u64 *val)
> 
>  static int hpt_order_set(void *data, u64 val)
>  {
> +	int ret;
> +
>  	if (!mmu_hash_ops.resize_hpt)
>  		return -ENODEV;
> 
> -	return mmu_hash_ops.resize_hpt(val);
> +	cpus_read_lock();
> +	ret = mmu_hash_ops.resize_hpt(val);
> +	cpus_read_unlock();
> +
> +	return ret;
>  }
> 
>  DEFINE_DEBUGFS_ATTRIBUTE(fops_hpt_order, hpt_order_get, hpt_order_set, "%llu\n");
> diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
> index 1034ef1..557d592 100644
> --- a/arch/powerpc/platforms/pseries/lpar.c
> +++ b/arch/powerpc/platforms/pseries/lpar.c
> @@ -859,7 +859,10 @@ static int pseries_lpar_resize_hpt_commit(void *data)
>  	return 0;
>  }
> 
> -/* Must be called in user context */
> +/*
> + * Must be called in process context. The caller must hold the
> + * cpus_lock.
> + */
>  static int pseries_lpar_resize_hpt(unsigned long shift)
>  {
>  	struct hpt_resize_state state = {
> @@ -913,7 +916,8 @@ static int pseries_lpar_resize_hpt(unsigned long shift)
> 
>  	t1 = ktime_get();
> 
> -	rc = stop_machine(pseries_lpar_resize_hpt_commit, &state, NULL);
> +	rc = stop_machine_cpuslocked(pseries_lpar_resize_hpt_commit,
> +				     &state, NULL);
> 
>  	t2 = ktime_get();
> 
> -- 
> 1.9.4
> 


^ permalink raw reply

* Re: [PATCH v3 1/3] PCI: Introduce pcibios_ignore_alignment_request
From: Shawn Anastasio @ 2019-06-03  9:12 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Sam Bobroff, linux-pci, Linux Kernel Mailing List, rppt, Oliver,
	Bjorn Helgaas, Paul Mackerras, xyjxie, linuxppc-dev
In-Reply-To: <bab59b8e-f1fc-f92a-36bb-4ff471e6da24@ozlabs.ru>



On 6/3/19 3:35 AM, Alexey Kardashevskiy wrote:
> 
> 
> On 03/06/2019 15:02, Alexey Kardashevskiy wrote:
>>
>>
>> On 03/06/2019 12:23, Shawn Anastasio wrote:
>>>
>>>
>>> On 5/30/19 10:56 PM, Alexey Kardashevskiy wrote:
>>>>
>>>>
>>>> On 31/05/2019 08:49, Shawn Anastasio wrote:
>>>>> On 5/29/19 10:39 PM, Alexey Kardashevskiy wrote:
>>>>>>
>>>>>>
>>>>>> On 28/05/2019 17:39, Shawn Anastasio wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 5/28/19 1:27 AM, Alexey Kardashevskiy wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 28/05/2019 15:36, Oliver wrote:
>>>>>>>>> On Tue, May 28, 2019 at 2:03 PM Shawn Anastasio <shawn@anastas.io>
>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> Introduce a new pcibios function pcibios_ignore_alignment_request
>>>>>>>>>> which allows the PCI core to defer to platform-specific code to
>>>>>>>>>> determine whether or not to ignore alignment requests for PCI
>>>>>>>>>> resources.
>>>>>>>>>>
>>>>>>>>>> The existing behavior is to simply ignore alignment requests when
>>>>>>>>>> PCI_PROBE_ONLY is set. This is behavior is maintained by the
>>>>>>>>>> default implementation of pcibios_ignore_alignment_request.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Shawn Anastasio <shawn@anastas.io>
>>>>>>>>>> ---
>>>>>>>>>>      drivers/pci/pci.c   | 9 +++++++--
>>>>>>>>>>      include/linux/pci.h | 1 +
>>>>>>>>>>      2 files changed, 8 insertions(+), 2 deletions(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>>>>>>>>>> index 8abc843b1615..8207a09085d1 100644
>>>>>>>>>> --- a/drivers/pci/pci.c
>>>>>>>>>> +++ b/drivers/pci/pci.c
>>>>>>>>>> @@ -5882,6 +5882,11 @@ resource_size_t __weak
>>>>>>>>>> pcibios_default_alignment(void)
>>>>>>>>>>             return 0;
>>>>>>>>>>      }
>>>>>>>>>>
>>>>>>>>>> +int __weak pcibios_ignore_alignment_request(void)
>>>>>>>>>> +{
>>>>>>>>>> +       return pci_has_flag(PCI_PROBE_ONLY);
>>>>>>>>>> +}
>>>>>>>>>> +
>>>>>>>>>>      #define RESOURCE_ALIGNMENT_PARAM_SIZE COMMAND_LINE_SIZE
>>>>>>>>>>      static char
>>>>>>>>>> resource_alignment_param[RESOURCE_ALIGNMENT_PARAM_SIZE] = {0};
>>>>>>>>>>      static DEFINE_SPINLOCK(resource_alignment_lock);
>>>>>>>>>> @@ -5906,9 +5911,9 @@ static resource_size_t
>>>>>>>>>> pci_specified_resource_alignment(struct pci_dev *dev,
>>>>>>>>>>             p = resource_alignment_param;
>>>>>>>>>>             if (!*p && !align)
>>>>>>>>>>                     goto out;
>>>>>>>>>> -       if (pci_has_flag(PCI_PROBE_ONLY)) {
>>>>>>>>>> +       if (pcibios_ignore_alignment_request()) {
>>>>>>>>>>                     align = 0;
>>>>>>>>>> -               pr_info_once("PCI: Ignoring requested alignments
>>>>>>>>>> (PCI_PROBE_ONLY)\n");
>>>>>>>>>> +               pr_info_once("PCI: Ignoring requested
>>>>>>>>>> alignments\n");
>>>>>>>>>>                     goto out;
>>>>>>>>>>             }
>>>>>>>>>
>>>>>>>>> I think the logic here is questionable to begin with. If the user
>>>>>>>>> has
>>>>>>>>> explicitly requested re-aligning a resource via the command line
>>>>>>>>> then
>>>>>>>>> we should probably do it even if PCI_PROBE_ONLY is set. When it
>>>>>>>>> breaks
>>>>>>>>> they get to keep the pieces.
>>>>>>>>>
>>>>>>>>> That said, the real issue here is that PCI_PROBE_ONLY probably
>>>>>>>>> shouldn't be set under qemu/kvm. Under the other hypervisor
>>>>>>>>> (PowerVM)
>>>>>>>>> hotplugged devices are configured by firmware before it's passed to
>>>>>>>>> the guest and we need to keep the FW assignments otherwise things
>>>>>>>>> break. QEMU however doesn't do any BAR assignments and relies on
>>>>>>>>> that
>>>>>>>>> being handled by the guest. At boot time this is done by SLOF, but
>>>>>>>>> Linux only keeps SLOF around until it's extracted the device-tree.
>>>>>>>>> Once that's done SLOF gets blown away and the kernel needs to do
>>>>>>>>> it's
>>>>>>>>> own BAR assignments. I'm guessing there's a hack in there to make it
>>>>>>>>> work today, but it's a little surprising that it works at all...
>>>>>>>>
>>>>>>>>
>>>>>>>> The hack is to run a modified qemu-aware "/usr/sbin/rtas_errd" in the
>>>>>>>> guest which receives an event from qemu (RAS_EPOW from
>>>>>>>> /proc/interrupts), fetches device tree chunks (and as I understand
>>>>>>>> it -
>>>>>>>> they come with BARs from phyp but without from qemu) and writes
>>>>>>>> "1" to
>>>>>>>> "/sys/bus/pci/rescan" which calls pci_assign_resource() eventually:
>>>>>>>
>>>>>>> Interesting. Does this mean that the PHYP hotplug path doesn't
>>>>>>> call pci_assign_resource?
>>>>>>
>>>>>>
>>>>>> I'd expect dlpar_add_slot() to be called under phyp and eventually
>>>>>> pci_device_add() which (I think) may or may not trigger later
>>>>>> reassignment.
>>>>>>
>>>>>>
>>>>>>> If so it means the patch may not
>>>>>>> break that platform after all, though it still may not be
>>>>>>> the correct way of doing things.
>>>>>>
>>>>>>
>>>>>> We should probably stop enforcing the PCI_PROBE_ONLY flag - it seems
>>>>>> that (unless resource_alignment= is used) the pseries guest should just
>>>>>> walk through all allocated resources and leave them unchanged.
>>>>>
>>>>> If we add a pcibios_default_alignment() implementation like was
>>>>> suggested earlier, then it will behave as if the user has
>>>>> specified resource_alignment= by default and SLOF's assignments
>>>>> won't be honored (I think).
>>>>
>>>>
>>>> I removed pci_add_flags(PCI_PROBE_ONLY) from pSeries_setup_arch and
>>>> tried booting with and without pci=resource_alignment= and I can see no
>>>> difference - BARs are still aligned to 64K as programmed in SLOF; if I
>>>> hack SLOF to align to 4K or 32K - BARs get packed and the guest leaves
>>>> them unchanged.
>>>>
>>>>
>>>>> I guess it boils down to one question - is it important that we
>>>>> observe SLOF's initial BAR assignments?
>>>>
>>>> It isn't if it's SLOF but it is if it's phyp. It used to not
>>>> allow/support BAR reassignment and even if it does not, I'd rather avoid
>>>> touching them.
>>>
>>> A quick update. I tried removing pci_add_flags(PCI_PROBE_ONLY) which
>>> worked, but if I add an implementation of pcibios_default_alignment
>>> which simply returns PAGE_SIZE, my VM fails to boot and many errors
>>> from the virtio disk driver are printed to the console.
>>>
>>> After some investigation, it seems that with pcibios_default_alignment
>>> present, Linux will reallocate all resources provided by SLOF on
>>> boot. I'm still not sure why exactly this causes the virtio driver
>>> to fail, but it does indicate that there is a reason to keep
>>> SLOF's initial assignments.
>>>
>>> Anybody have an idea what's causing this?
>>
>> With your changes the guest feels the urge to reassign bars (no idea why
>> but ok), when it does so, it puts both BARs (one is prefetchable) into
>> the 32bit non-prefetchable window of the PHB (SLOF puts the prefetchable
>> bar to a 64bit prefetchable window, I have no idea why the guest does it
>> different either but this must still work) and then qemu does not
>> emulate something properly - unassigned_mem_accepts() is triggered on
>> the bar access - no idea why - I am debugging it right now.
> 
> 
> Sooo the problem is that resouce::flags has 2 bits to describe 64bit
> BARs - PCI_BASE_ADDRESS_MEM_TYPE_64 and IORESOURCE_MEM_64 - and we don't
> set IORESOURCE_MEM_64 for 64bit BARs when parsing the fdt.
> 
> So the BAR reallocator moves the BAR to 32bit window (which is not
> desirable but permitted, I still have to chase it) and then
> pci_std_update_resource() writes BAR back but since now it is 32bit BAR,
> it does not write to the upper 32bits so that half remains 0x2100, QEMU
> does not move BAR to the right window and the MMIO stops working.
> 
> Try this in the guest kernel, it seems to keep bars where they were
> after slof.

Nice debugging work! With your patch the VM does boot. I'm not sure
if SLOF's original allocations are being kept or if Linux is redoing
them (how do you check?), but MMIO works and the system boots anyways.

I've also tested hotplug and the BAR allocations are page-aligned too,
as expected.

^ permalink raw reply

* Re: [PATCH v3 14/16] powerpc/32: implement fast entry for syscalls on BOOKE
From: Christophe Leroy @ 2019-06-03  9:16 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev, linux-kernel, Nicholas Piggin
In-Reply-To: <20190528190341.Horde.nTXOule-IO2ReXFiNIqNbg8@messagerie.si.c-s.fr>



On 05/28/2019 05:03 PM, Christophe Leroy wrote:
> Michael Ellerman <mpe@ellerman.id.au> a écrit :
> 
>> Christophe Leroy <christophe.leroy@c-s.fr> writes:
>>> Le 23/05/2019 à 09:00, Christophe Leroy a écrit :
>>>
>>> [...]
>>>
>>>>> arch/powerpc/kernel/head_fsl_booke.o: In function `SystemCall':
>>>>> arch/powerpc/kernel/head_fsl_booke.S:416: undefined reference to
>>>>> `kvmppc_handler_BOOKE_INTERRUPT_SYSCALL_SPRN_SRR1'
>>>>> Makefile:1052: recipe for target 'vmlinux' failed
>>>>>
>>>>>> +.macro SYSCALL_ENTRY trapno intno
>>>>>> +    mfspr    r10, SPRN_SPRG_THREAD
>>>>>> +#ifdef CONFIG_KVM_BOOKE_HV
>>>>>> +BEGIN_FTR_SECTION
>>>>>> +    mtspr    SPRN_SPRG_WSCRATCH0, r10
>>>>>> +    stw    r11, THREAD_NORMSAVE(0)(r10)
>>>>>> +    stw    r13, THREAD_NORMSAVE(2)(r10)
>>>>>> +    mfcr    r13            /* save CR in r13 for now       */
>>>>>> +    mfspr    r11, SPRN_SRR1
>>>>>> +    mtocrf    0x80, r11    /* check MSR[GS] without clobbering 
>>>>>> reg */
>>>>>> +    bf    3, 1975f
>>>>>> +    b    kvmppc_handler_BOOKE_INTERRUPT_\intno\()_SPRN_SRR1
>>>>>
>>>>> It seems to me that the "_SPRN_SRR1" on the end of this line
>>>>> isn't meant to be there...  However, it still fails to link with that
>>>>> removed.
>>>
>>> It looks like I missed the macro expansion.
>>>
>>> The called function should be kvmppc_handler_8_0x01B
>>>
>>> Seems like kisskb doesn't build any config like this.
>>
>> I thought we did, ie:
>>
>> http://kisskb.ellerman.id.au/kisskb/buildresult/13817941/
> 
> That's a ppc64 config it seems. The problem was on booke32.
> 
> Christophe
> 
>>
>> But clearly something is missing to trigger the bug.

I was able to trigger the bug with mpc85xx_defconfig + 
CONFIG_VIRTUALIZATION + CONFIG_PPC_E500MC

The bug pops up when CONFIG_KVM_BOOKE_HV is set.

Christophe

>>
>> cheers
> 
> 

^ permalink raw reply

* Re: [PATCH 22/22] docs: fix broken documentation links
From: Christophe Leroy @ 2019-06-03  7:34 UTC (permalink / raw)
  To: Mauro Carvalho Chehab, Linux Doc Mailing List
  Cc: Andrew Lunn, Thomas Preston, Wolfram Sang, Catalin Marinas,
	Linus Walleij, Will Deacon, Russell King, Paul Mackerras,
	Alessia Mantegazza, Jakub Wilk, AKASHI Takahiro, Kevin Hilman,
	James Morris, linux-acpi, Andy Gross, xen-devel, Jason Wang,
	Alexander Popov, Qian Cai, Al Viro, Andy Lutomirski,
	Thomas Gleixner, Kairui Song, Quentin Perret, Greg Kroah-Hartman,
	Rafael J. Wysocki, linux-kernel, Paul Burton, Jiri Kosina,
	Casey Schaufler, Andrew Morton, Lu Baolu, Mark Rutland, Feng Tang,
	Dave Hansen, Mimi Zohar, Kamalesh Babulal, Masahiro Yamada,
	Yannik Sembritzki, Harry Wei, linux-i2c, Shuah Khan,
	Stephen Rothwell, Paul E. McKenney, Alexandre Ghiti, YueHaibing,
	Robert Moore, Bartosz Golaszewski, Len Brown, David Brown,
	Joerg Roedel, linux-arm-msm, Mauro Carvalho Chehab, linux-gpio,
	Claudiu Manoil, Florian Fainelli, Jacek Anaszewski, Bjorn Helgaas,
	linux-amlogic, Boris Ostrovsky, Mika Westerberg, linux-arm-kernel,
	Tony Luck, Sean Christopherson, Rob Herring, James Morse,
	Robin Murphy, Samuel Mendoza-Jonas, linux-pci, Bhupesh Sharma,
	Josh Poimboeuf, platform-driver-x86, Ding Xiang, linux-kselftest,
	Alex Shi, Lorenzo Pieralisi, Baoquan He, Jonathan Corbet,
	Raphael Gault, Joel Stanley, Federico Vaga, Darren Hart,
	Erik Schmauss, Serge E. Hallyn, Palmer Dabbelt, Kees Cook,
	Bartlomiej Zolnierkiewicz, Jonathan Neuschäfer,
	SeongJae Park, Mark Brown, Borislav Petkov, Sunil Muthuswamy,
	virtualization, devel, Ard Biesheuvel, Liam Girdwood,
	Sakari Ailus, Olof Johansson, Logan Gunthorpe, David S. Miller,
	Kirill A. Shutemov, Sven Van Asbroeck, Michal Hocko, kvm,
	Michael S. Tsirkin, Peter Zijlstra, Thorsten Leemhuis,
	David Howells, linux-mm, H. Peter Anvin, devel, Manfred Spraul,
	Luis Chamberlain, x86, Pavel Tatashin, Mike Rapoport, Ingo Molnar,
	Dave Young, devicetree, Arnaldo Carvalho de Melo, Jerome Glisse,
	Stefano Stabellini, Jonathan Cameron, Dmitry Vyukov, linux-edac,
	Juergen Gross, Denis Efremov, netdev, Nicolas Ferre, Changbin Du,
	linux-security-module, linuxppc-dev, Andy Shevchenko
In-Reply-To: <f9fecacbe4ce0b2b3aed38d71ae3753f2daf3ce3.1559171394.git.mchehab+samsung@kernel.org>



Le 30/05/2019 à 01:23, Mauro Carvalho Chehab a écrit :
> Mostly due to x86 and acpi conversion, several documentation
> links are still pointing to the old file. Fix them.
> 
> Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
> ---
>   Documentation/acpi/dsd/leds.txt                  |  2 +-
>   Documentation/admin-guide/kernel-parameters.rst  |  6 +++---
>   Documentation/admin-guide/kernel-parameters.txt  | 16 ++++++++--------
>   Documentation/admin-guide/ras.rst                |  2 +-
>   .../devicetree/bindings/net/fsl-enetc.txt        |  7 +++----
>   .../bindings/pci/amlogic,meson-pcie.txt          |  2 +-
>   .../bindings/regulator/qcom,rpmh-regulator.txt   |  2 +-
>   Documentation/devicetree/booting-without-of.txt  |  2 +-
>   Documentation/driver-api/gpio/board.rst          |  2 +-
>   Documentation/driver-api/gpio/consumer.rst       |  2 +-
>   .../firmware-guide/acpi/enumeration.rst          |  2 +-
>   .../firmware-guide/acpi/method-tracing.rst       |  2 +-
>   Documentation/i2c/instantiating-devices          |  2 +-
>   Documentation/sysctl/kernel.txt                  |  4 ++--
>   .../translations/it_IT/process/howto.rst         |  2 +-
>   .../it_IT/process/stable-kernel-rules.rst        |  4 ++--
>   .../translations/zh_CN/process/4.Coding.rst      |  2 +-
>   Documentation/x86/x86_64/5level-paging.rst       |  2 +-
>   Documentation/x86/x86_64/boot-options.rst        |  4 ++--
>   .../x86/x86_64/fake-numa-for-cpusets.rst         |  2 +-
>   MAINTAINERS                                      |  6 +++---
>   arch/arm/Kconfig                                 |  2 +-
>   arch/arm64/kernel/kexec_image.c                  |  2 +-
>   arch/powerpc/Kconfig                             |  2 +-
>   arch/x86/Kconfig                                 | 16 ++++++++--------
>   arch/x86/Kconfig.debug                           |  2 +-
>   arch/x86/boot/header.S                           |  2 +-
>   arch/x86/entry/entry_64.S                        |  2 +-
>   arch/x86/include/asm/bootparam_utils.h           |  2 +-
>   arch/x86/include/asm/page_64_types.h             |  2 +-
>   arch/x86/include/asm/pgtable_64_types.h          |  2 +-
>   arch/x86/kernel/cpu/microcode/amd.c              |  2 +-
>   arch/x86/kernel/kexec-bzimage64.c                |  2 +-
>   arch/x86/kernel/pci-dma.c                        |  2 +-
>   arch/x86/mm/tlb.c                                |  2 +-
>   arch/x86/platform/pvh/enlighten.c                |  2 +-
>   drivers/acpi/Kconfig                             | 10 +++++-----
>   drivers/net/ethernet/faraday/ftgmac100.c         |  2 +-
>   .../fieldbus/Documentation/fieldbus_dev.txt      |  4 ++--
>   drivers/vhost/vhost.c                            |  2 +-
>   include/acpi/acpi_drivers.h                      |  2 +-
>   include/linux/fs_context.h                       |  2 +-
>   include/linux/lsm_hooks.h                        |  2 +-
>   mm/Kconfig                                       |  2 +-
>   security/Kconfig                                 |  2 +-
>   tools/include/linux/err.h                        |  2 +-
>   tools/objtool/Documentation/stack-validation.txt |  4 ++--
>   tools/testing/selftests/x86/protection_keys.c    |  2 +-
>   48 files changed, 77 insertions(+), 78 deletions(-)

[...]

> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index 8c1c636308c8..e868d2bd48b8 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -898,7 +898,7 @@ config PPC_MEM_KEYS
>   	  page-based protections, but without requiring modification of the
>   	  page tables when an application changes protection domains.
>   
> -	  For details, see Documentation/vm/protection-keys.rst
> +	  For details, see Documentation/x86/protection-keys.rst

It looks strange to reference an x86 file, for powerpc arch.

Christophe


^ permalink raw reply

* Re: [PATCH v8 3/7] s390/pci: add support for IOMMU default DMA mode build options
From: Sebastian Ott @ 2019-06-03 11:51 UTC (permalink / raw)
  To: Zhen Lei
  Cc: linux-ia64, linux-doc, Hanjun Guo, Heiko Carstens, Paul Mackerras,
	H . Peter Anvin, linux-s390, Jonathan Corbet,
	Jean-Philippe Brucker, Joerg Roedel, x86, Ingo Molnar, Fenghua Yu,
	Will Deacon, John Garry, linuxppc-dev, Borislav Petkov,
	Thomas Gleixner, Gerald Schaefer, Tony Luck, David Woodhouse,
	linux-kernel, iommu, Martin Schwidefsky, Robin Murphy
In-Reply-To: <20190530034831.4184-4-thunder.leizhen@huawei.com>


On Thu, 30 May 2019, Zhen Lei wrote:
> The default DMA mode is LAZY on s390, this patch make it can be set to
> STRICT at build time. It can be overridden by boot option.
> 
> There is no functional change.
> 
> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>

Acked-by: Sebastian Ott <sebott@linux.ibm.com>


^ permalink raw reply

* Re: [PATCH] powerpc/powernv/npu: Fix reference leak
From: Michael Ellerman @ 2019-06-03 12:32 UTC (permalink / raw)
  To: Greg Kurz, linux-kernel
  Cc: Alexey Kardashevskiy, Alistair Popple, linuxppc-dev
In-Reply-To: <155568805354.600470.13376593185688810607.stgit@bahia.lan>

On Fri, 2019-04-19 at 15:34:13 UTC, Greg Kurz wrote:
> Since 902bdc57451c, get_pci_dev() calls pci_get_domain_bus_and_slot(). This
> has the effect of incrementing the reference count of the PCI device, as
> explained in drivers/pci/search.c:
> 
>  * Given a PCI domain, bus, and slot/function number, the desired PCI
>  * device is located in the list of PCI devices. If the device is
>  * found, its reference count is increased and this function returns a
>  * pointer to its data structure.  The caller must decrement the
>  * reference count by calling pci_dev_put().  If no device is found,
>  * %NULL is returned.
> 
> Nothing was done to call pci_dev_put() and the reference count of GPU and
> NPU PCI devices rockets up.
> 
> A natural way to fix this would be to teach the callers about the change,
> so that they call pci_dev_put() when done with the pointer. This turns
> out to be quite intrusive, as it affects many paths in npu-dma.c,
> pci-ioda.c and vfio_pci_nvlink2.c. Also, the issue appeared in 4.16 and
> some affected code got moved around since then: it would be problematic
> to backport the fix to stable releases.
> 
> All that code never cared for reference counting anyway. Call pci_dev_put()
> from get_pci_dev() to revert to the previous behavior.
> 
> Fixes: 902bdc57451c ("powerpc/powernv/idoa: Remove unnecessary pcidev from pci_dn")
> Cc: stable@vger.kernel.org # v4.16
> Signed-off-by: Greg Kurz <groug@kaod.org>
> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/02c5f5394918b9b47ff4357b1b183357

cheers

^ permalink raw reply

* Re: [PATCH -next] powerpc/book3s64: Make some symbols static
From: Michael Ellerman @ 2019-06-03 12:32 UTC (permalink / raw)
  To: YueHaibing, benh, paulus, christophe.leroy, aneesh.kumar
  Cc: linuxppc-dev, YueHaibing, linux-kernel
In-Reply-To: <20190504102427.12332-1-yuehaibing@huawei.com>

On Sat, 2019-05-04 at 10:24:27 UTC, YueHaibing wrote:
> Fix sparse warnings:
> 
> arch/powerpc/mm/book3s64/radix_pgtable.c:326:13: warning: symbol 'radix_init_pgtable' was not declared. Should it be static?
> arch/powerpc/mm/book3s64/hash_native.c:48:1: warning: symbol 'native_tlbie_lock' was not declared. Should it be static?
> arch/powerpc/mm/book3s64/hash_utils.c:988:24: warning: symbol 'init_hash_mm_context' was not declared. Should it be static?
> 
> Signed-off-by: YueHaibing <yuehaibing@huawei.com>

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/d667edc01bedcd23988ef69f851e7cc2

cheers

^ permalink raw reply

* Re: [PATCH -next] misc: ocxl: Make ocxl_remove static
From: Michael Ellerman @ 2019-06-03 12:32 UTC (permalink / raw)
  To: YueHaibing, fbarrat, ajd, arnd, gregkh
  Cc: linuxppc-dev, YueHaibing, linux-kernel
In-Reply-To: <20190504102720.42220-1-yuehaibing@huawei.com>

On Sat, 2019-05-04 at 10:27:20 UTC, YueHaibing wrote:
> Fix sparse warning:
> 
> drivers/misc/ocxl/pci.c:44:6: warning:
>  symbol 'ocxl_remove' was not declared. Should it be static?
> 
> Reported-by: Hulk Robot <hulkci@huawei.com>
> Signed-off-by: YueHaibing <yuehaibing@huawei.com>
> Acked-by: Andrew Donnellan <ajd@linux.ibm.com>

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/00b0cdbbc87fb4b531a0d75f4004ed3d

cheers

^ permalink raw reply

* Re: [PATCH 1/2] powerpc/lib: fix redundant inclusion of quad.o
From: Michael Ellerman @ 2019-06-03 12:32 UTC (permalink / raw)
  To: Christophe Leroy, Benjamin Herrenschmidt, Paul Mackerras
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <7496da89e027e563cb8e62dc89548525cf53b57e.1557741292.git.christophe.leroy@c-s.fr>

On Mon, 2019-05-13 at 10:00:14 UTC, Christophe Leroy wrote:
> quad.o is only for PPC64, and already included in obj64-y,
> so it doesn't have to be in obj-y
> 
> Fixes: 31bfdb036f12 ("powerpc: Use instruction emulation infrastructure to handle alignment faults")
> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>

Series applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/f8e0d0fddf87f26aed576983f752329d

cheers

^ permalink raw reply

* Re: [PATCH] powerpc/pseries: Fix xive=off command line
From: Michael Ellerman @ 2019-06-03 12:32 UTC (permalink / raw)
  To: Greg Kurz, Paul Mackerras, Benjamin Herrenschmidt
  Cc: pavrampu, linuxppc-dev, linux-kernel, stable,
	Cédric Le Goater
In-Reply-To: <155791470178.432724.8008395673479905061.stgit@bahia.lan>

On Wed, 2019-05-15 at 10:05:01 UTC, Greg Kurz wrote:
> On POWER9, if the hypervisor supports XIVE exploitation mode, the guest OS
> will unconditionally requests for the XIVE interrupt mode even if XIVE was
> deactivated with the kernel command line xive=off. Later on, when the spapr
> XIVE init code handles xive=off, it disables XIVE and tries to fall back on
> the legacy mode XICS.
> 
> This discrepency causes a kernel panic because the hypervisor is configured
> to provide the XIVE interrupt mode to the guest :
> 
> [    0.008837] kernel BUG at arch/powerpc/sysdev/xics/xics-common.c:135!
> [    0.008877] Oops: Exception in kernel mode, sig: 5 [#1]
> [    0.008908] LE SMP NR_CPUS=1024 NUMA pSeries
> [    0.008939] Modules linked in:
> [    0.008964] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W         5.0.13-200.fc29.ppc64le #1
> [    0.009018] NIP:  c000000001029ab8 LR: c000000001029aac CTR: c0000000018e0000
> [    0.009065] REGS: c0000007f96d7900 TRAP: 0700   Tainted: G        W          (5.0.13-200.fc29.ppc64le)
> [    0.009119] MSR:  8000000002029033 <SF,VEC,EE,ME,IR,DR,RI,LE>  CR: 28000222  XER: 20040000
> [    0.009168] CFAR: c0000000001b1e28 IRQMASK: 0
> [    0.009168] GPR00: c000000001029aac c0000007f96d7b90 c0000000015e8600 0000000000000000
> [    0.009168] GPR04: 0000000000000001 0000000000000000 0000000000000061 646f6d61696e0d0a
> [    0.009168] GPR08: 00000007fd8f0000 0000000000000001 c0000000014c44c0 c0000007f96d76cf
> [    0.009168] GPR12: 0000000000000000 c0000000018e0000 0000000000000001 0000000000000000
> [    0.009168] GPR16: 0000000000000000 0000000000000001 c0000007f96d7c08 c0000000016903d0
> [    0.009168] GPR20: c0000007fffe04e8 ffffffffffffffea c000000001620164 c00000000161fe58
> [    0.009168] GPR24: c000000000ea6c88 c0000000011151a8 00000000006000c0 c0000007f96d7c34
> [    0.009168] GPR28: 0000000000000000 c0000000014b286c c000000001115180 c00000000161dc70
> [    0.009558] NIP [c000000001029ab8] xics_smp_probe+0x38/0x98
> [    0.009590] LR [c000000001029aac] xics_smp_probe+0x2c/0x98
> [    0.009622] Call Trace:
> [    0.009639] [c0000007f96d7b90] [c000000001029aac] xics_smp_probe+0x2c/0x98 (unreliable)
> [    0.009687] [c0000007f96d7bb0] [c000000001033404] pSeries_smp_probe+0x40/0xa0
> [    0.009734] [c0000007f96d7bd0] [c0000000010212a4] smp_prepare_cpus+0x62c/0x6ec
> [    0.009782] [c0000007f96d7cf0] [c0000000010141b8] kernel_init_freeable+0x148/0x448
> [    0.009829] [c0000007f96d7db0] [c000000000010ba4] kernel_init+0x2c/0x148
> [    0.009870] [c0000007f96d7e20] [c00000000000bdd4] ret_from_kernel_thread+0x5c/0x68
> [    0.009916] Instruction dump:
> [    0.009940] 7c0802a6 60000000 7c0802a6 38800002 f8010010 f821ffe1 3c62001c e863b9a0
> [    0.009988] 4b1882d1 60000000 7c690034 5529d97e <0b090000> 3d22001c e929b998 3ce2ff8f
> 
> Look for xive=off during prom_init and don't ask for XIVE in this case. One
> exception though: if the host only supports XIVE, we still want to boot so
> we ignore xive=off.
> 
> Similarly, have the spapr XIVE init code to looking at the interrupt mode
> negociated during CAS, and ignore xive=off if the hypervisor only supports
> XIVE.
> 
> Fixes: eac1e731b59e ("powerpc/xive: guest exploitation of the XIVE interrupt controller")
> Cc: stable@vger.kernel.org # v4.20
> Reported-by: Pavithra R. Prakash <pavrampu@in.ibm.com>
> Signed-off-by: Greg Kurz <groug@kaod.org>
> Reviewed-by: Cédric Le Goater <clg@kaod.org>

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/a3bf9fbdad600b1e4335dd90979f8d60

cheers

^ permalink raw reply

* Re: [PATCH] powerpc: Remove variable ‘path’ since not used
From: Michael Ellerman @ 2019-06-03 12:32 UTC (permalink / raw)
  To: Mathieu Malaterre
  Cc: Mathieu Malaterre, linuxppc-dev, Paul Mackerras, linux-kernel
In-Reply-To: <20190523102520.20585-1-malat@debian.org>

On Thu, 2019-05-23 at 10:25:20 UTC, Mathieu Malaterre wrote:
> In commit eab00a208eb6 ("powerpc: Move `path` variable inside
> DEBUG_PROM") DEBUG_PROM sentinels were added to silence a warning
> (treated as error with W=1):
> 
>   arch/powerpc/kernel/prom_init.c:1388:8: error: variable ‘path’ set but not used [-Werror=unused-but-set-variable]
> 
> Rework the original patch and simplify the code, by removing the
> variable ‘path’ completely. Fix line over 90 characters.
> 
> Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
> Signed-off-by: Mathieu Malaterre <malat@debian.org>

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/c806a6fde1c29e7419afcf94d761827a

cheers

^ permalink raw reply

* Re: [PATCH] powerpc/powernv: Show checkstop reason for NPU2 HMIs
From: Michael Ellerman @ 2019-06-03 12:32 UTC (permalink / raw)
  To: Frederic Barrat, linuxppc-dev, ajd, arbab, aik; +Cc: clombard, groug
In-Reply-To: <20190523122804.4801-1-fbarrat@linux.ibm.com>

On Thu, 2019-05-23 at 12:28:04 UTC, Frederic Barrat wrote:
> If the kernel is notified of an HMI caused by the NPU2, it's currently
> not being recognized and it logs the default message:
> 
>     Unknown Malfunction Alert of type 3
> 
> The NPU on Power 9 has 3 Fault Isolation Registers, so that's a lot of
> possible causes, but we should at least log that it's an NPU problem
> and report which FIR and which bit were raised if opal gave us the
> information.
> 
> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
> Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/89d87bcba2874d824affb7842bb3960c

cheers

^ permalink raw reply

* Re: [PATCH] powerpc/powernv: Update firmware archaeology around OPAL_HANDLE_HMI
From: Michael Ellerman @ 2019-06-03 12:32 UTC (permalink / raw)
  To: Stewart Smith, linuxppc-dev; +Cc: Stewart Smith
In-Reply-To: <20190524050956.14114-1-stewart@linux.ibm.com>

On Fri, 2019-05-24 at 05:09:56 UTC, Stewart Smith wrote:
> The first machines to ship with OPAL firmware all got firmware updates
> that have the new call, but just in case someone is foolish enough to
> believe the first 4 months of firmware is the best, we keep this code
> around.
> 
> Comment is updated to not refer to late 2014 as recent or the future.
> 
> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/1549c42deff5f3ffff326ae295ae5816

cheers

^ permalink raw reply

* Re: [PATCH] dlpar: Fix a missing-check bug in dlpar_parse_cc_property()
From: Michael Ellerman @ 2019-06-03 12:32 UTC (permalink / raw)
  To: Gen Zhang, benh, paulus; +Cc: nfont, linuxppc-dev, linux-kernel
In-Reply-To: <20190526024240.GA14546@zhanggen-UX430UQ>

On Sun, 2019-05-26 at 02:42:40 UTC, Gen Zhang wrote:
> In dlpar_parse_cc_property(), 'prop->name' is allocated by kstrdup().
> kstrdup() may return NULL, so it should be checked and handle error.
> And prop should be freed if 'prop->name' is NULL.
> 
> Signed-off-by: Gen Zhang <blackgod016574@gmail.com>
> Acked-by: Nathan Lynch <nathanl@linux.ibm.com>

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/efa9ace68e487ddd29c2b4d6dd232421

cheers

^ permalink raw reply

* [PATCH] powerpc/32s: fix booting with CONFIG_PPC_EARLY_DEBUG_BOOTX
From: Christophe Leroy @ 2019-06-03 13:00 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Mathieu Malaterre
  Cc: linuxppc-dev, linux-kernel

When booting through OF, setup_disp_bat() does nothing because
disp_BAT are not set. By change, it used to work because BOOTX
buffer is mapped 1:1 at address 0x81000000 by the bootloader, and
btext_setup_display() sets virt addr same as phys addr.

But since commit 215b823707ce ("powerpc/32s: set up an early static
hash table for KASAN."), a temporary page table overrides the
bootloader mapping.

This 0x81000000 is also problematic with the newly implemented
Kernel Userspace Access Protection (KUAP) because it is within user
address space.

This patch fixes those issues by properly setting disp_BAT through
a call to btext_prepare_BAT(), allowing setup_disp_bat() to
properly setup BAT3 for early bootx screen buffer access.

Reported-by: Mathieu Malaterre <malat@debian.org>
Fixes: 215b823707ce ("powerpc/32s: set up an early static hash table for KASAN.")
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
 arch/powerpc/include/asm/btext.h       | 4 ++++
 arch/powerpc/kernel/prom_init.c        | 1 +
 arch/powerpc/kernel/prom_init_check.sh | 2 +-
 3 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/btext.h b/arch/powerpc/include/asm/btext.h
index 3ffad030393c..461b0f193864 100644
--- a/arch/powerpc/include/asm/btext.h
+++ b/arch/powerpc/include/asm/btext.h
@@ -13,7 +13,11 @@ extern void btext_update_display(unsigned long phys, int width, int height,
 				 int depth, int pitch);
 extern void btext_setup_display(int width, int height, int depth, int pitch,
 				unsigned long address);
+#ifdef CONFIG_PPC32
 extern void btext_prepare_BAT(void);
+#else
+static inline void btext_prepare_BAT(void) { }
+#endif
 extern void btext_map(void);
 extern void btext_unmap(void);
 
diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
index 3555cad7bdde..ed446b7ea164 100644
--- a/arch/powerpc/kernel/prom_init.c
+++ b/arch/powerpc/kernel/prom_init.c
@@ -2336,6 +2336,7 @@ static void __init prom_check_displays(void)
 			prom_printf("W=%d H=%d LB=%d addr=0x%x\n",
 				    width, height, pitch, addr);
 			btext_setup_display(width, height, 8, pitch, addr);
+			btext_prepare_BAT();
 		}
 #endif /* CONFIG_PPC_EARLY_DEBUG_BOOTX */
 	}
diff --git a/arch/powerpc/kernel/prom_init_check.sh b/arch/powerpc/kernel/prom_init_check.sh
index 518d416971c1..160bef0d553d 100644
--- a/arch/powerpc/kernel/prom_init_check.sh
+++ b/arch/powerpc/kernel/prom_init_check.sh
@@ -24,7 +24,7 @@ fi
 WHITELIST="add_reloc_offset __bss_start __bss_stop copy_and_flush
 _end enter_prom $MEM_FUNCS reloc_offset __secondary_hold
 __secondary_hold_acknowledge __secondary_hold_spinloop __start
-logo_linux_clut224
+logo_linux_clut224 btext_prepare_BAT
 reloc_got2 kernstart_addr memstart_addr linux_banner _stext
 __prom_init_toc_start __prom_init_toc_end btext_setup_display TOC."
 
-- 
2.13.3


^ permalink raw reply related

* Re: [PATCH 01/16] uaccess: add untagged_addr definition for other arches
From: Khalid Aziz @ 2019-06-03 15:16 UTC (permalink / raw)
  To: Christoph Hellwig, Linus Torvalds, Paul Burton, James Hogan,
	Yoshinori Sato, Rich Felker, David S. Miller, Andrey Konovalov
  Cc: Catalin Marinas, linux-sh, x86, linux-mips, Nicholas Piggin,
	linux-kernel, linux-mm, Paul Mackerras, sparclinux, linuxppc-dev
In-Reply-To: <20190601074959.14036-2-hch@lst.de>

On 6/1/19 1:49 AM, Christoph Hellwig wrote:
> From: Andrey Konovalov <andreyknvl@google.com>
> 
> To allow arm64 syscalls to accept tagged pointers from userspace, we must
> untag them when they are passed to the kernel. Since untagging is done in
> generic parts of the kernel, the untagged_addr macro needs to be defined
> for all architectures.
> 
> Define it as a noop for architectures other than arm64.

Could you reword above sentence? We are already starting off with
untagged_addr() not being no-op for arm64 and sparc64. It will expand
further potentially. So something more along the lines of "Define it as
noop for architectures that do not support memory tagging". The first
paragraph in the log can also be rewritten to be not specific to arm64.

--
Khalid

> 
> Acked-by: Catalin Marinas <catalin.marinas@arm.com>
> Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  include/linux/mm.h | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 0e8834ac32b7..949d43e9c0b6 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -99,6 +99,10 @@ extern int mmap_rnd_compat_bits __read_mostly;
>  #include <asm/pgtable.h>
>  #include <asm/processor.h>
>  
> +#ifndef untagged_addr
> +#define untagged_addr(addr) (addr)
> +#endif
> +
>  #ifndef __pa_symbol
>  #define __pa_symbol(x)  __pa(RELOC_HIDE((unsigned long)(x), 0))
>  #endif
> 


^ permalink raw reply

* Re: [PATCH 03/16] mm: simplify gup_fast_permitted
From: Linus Torvalds @ 2019-06-03 16:08 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Rich Felker, Yoshinori Sato, Linux-sh list, James Hogan,
	the arch/x86 maintainers, Khalid Aziz, Nicholas Piggin,
	linux-mips, Linux-MM, Paul Burton, Paul Mackerras,
	Andrey Konovalov, sparclinux, linuxppc-dev, David S. Miller,
	Linux List Kernel Mailing
In-Reply-To: <20190603074121.GA22920@lst.de>

On Mon, Jun 3, 2019 at 12:41 AM Christoph Hellwig <hch@lst.de> wrote:
>
> I only removed a duplicate of it.

I don't see any remaining cases.

> The full (old) code in get_user_pages_fast() looks like this:
>
>         if (nr_pages <= 0)
>                 return 0;
>
>         if (unlikely(!access_ok((void __user *)start, len)))
>                 return -EFAULT;
>
>         if (gup_fast_permitted(start, nr_pages)) {

Yes, and that code was correct.

The new code has no test at all for "nr_pages == 0", afaik.

                 Linus

^ permalink raw reply

* Re: [PATCH v3 0/6] Prerequisites for NXP LS104xA SMMU enablement
From: Andreas Färber @ 2019-06-03 16:42 UTC (permalink / raw)
  To: Laurentiu Tudor
  Cc: Madalin-cristian Bucur, netdev@vger.kernel.org, Roy Pledge,
	linux-kernel@vger.kernel.org, Leo Li,
	iommu@lists.linux-foundation.org, Camelia Alexandra Groza,
	Mian Yousaf Kaukab, linuxppc-dev@lists.ozlabs.org,
	davem@davemloft.net, linux-arm-kernel@lists.infradead.org
In-Reply-To: <VI1PR04MB5134E4DA6EA052BEBB3C26EFEC190@VI1PR04MB5134.eurprd04.prod.outlook.com>

Am 31.05.19 um 19:32 schrieb Laurentiu Tudor:
>> -----Original Message-----
>> From: Andreas Färber <afaerber@suse.de>
>> Sent: Friday, May 31, 2019 8:04 PM
>>
>> Hello Laurentiu,
>>
>> Am 31.05.19 um 18:46 schrieb Laurentiu Tudor:
>>>> -----Original Message-----
>>>> From: Andreas Färber <afaerber@suse.de>
>>>> Sent: Friday, May 31, 2019 7:15 PM
>>>>
>>>> Hi Laurentiu,
>>>>
>>>> Am 30.05.19 um 16:19 schrieb laurentiu.tudor@nxp.com:
>>>>> This patch series contains several fixes in preparation for SMMU
>>>>> support on NXP LS1043A and LS1046A chips. Once these get picked up,
>>>>> I'll submit the actual SMMU enablement patches consisting in the
>>>>> required device tree changes.
>>>>
>>>> Have you thought through what will happen if this patch ordering is not
>>>> preserved? In particular, a user installing a future U-Boot update with
>>>> the DTB bits but booting a stable kernel without this patch series -
>>>> wouldn't that regress dpaa then for our customers?
>>>>
>>>
>>> These are fixes for issues that popped out after enabling SMMU.
>>> I do not expect them to break anything.
>>
>> That was not my question! You're missing my point: All your patches are
>> lacking a Fixes header in their commit message, for backporting them, to
>> avoid _your DT patches_ breaking the driver on stable branches!
> 
> It does appear that I'm missing your point. For sure, the DT updates solely will
> break the kernel without these fixes but I'm not sure I understand how this
> could happen.

In short, customers rarely run master branch. Kindly have your
colleagues explain stable branches to you in details.

With Fixes header I was referring to the syntax explained here:
https://www.kernel.org/doc/html/latest/process/submitting-patches.html#describe-your-changes
https://www.kernel.org/doc/html/latest/process/submitting-patches.html#using-reported-by-tested-by-reviewed-by-suggested-by-and-fixes

> My plan was to share the kernel dts patches sometime after this series
> makes it through.

That's fine. What I'm warning you is that seemingly your DT patches,
once in one of your LSDK U-Boot releases, will cause a regression for
distros like our SLES 15 SP1 unless these prereq kernel patches get
applied on the respective stable branches. Which will not happen
automatically unless you as patch author take the appropriate action
before they get merged.

Thanks,
Andreas

-- 
SUSE Linux GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
HRB 21284 (AG Nürnberg)

^ permalink raw reply

* Re: [PATCH 03/16] mm: simplify gup_fast_permitted
From: Linus Torvalds @ 2019-06-03 17:02 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Rich Felker, Yoshinori Sato, Linux-sh list, James Hogan,
	the arch/x86 maintainers, Khalid Aziz, Nicholas Piggin,
	linux-mips, Linux-MM, Paul Burton, Paul Mackerras,
	Andrey Konovalov, sparclinux, linuxppc-dev, David S. Miller,
	Linux List Kernel Mailing
In-Reply-To: <CAHk-=wg5mww3StP8HqPN4d5eij3KmEayM743v-nDKAMgRe2J6g@mail.gmail.com>

On Mon, Jun 3, 2019 at 9:08 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> The new code has no test at all for "nr_pages == 0", afaik.

Note that it really is important to check for that, because right now we do

        if (gup_fast_permitted(start, nr_pages)) {
                local_irq_save(flags);
                gup_pgd_range(start, end, write ? FOLL_WRITE : 0, pages, &nr);
                local_irq_restore(flags);
        }

and that gup_pgd_range() function *depends* on the range being
non-zero, and does

        pgdp = pgd_offset(current->mm, addr);
        do {
                pgd_t pgd = READ_ONCE(*pgdp);
...
        } while (pgdp++, addr = next, addr != end);

Note how a zero range would turn into an infinite range here.

And the only check for 0 was that

        if (nr_pages <= 0)
                return 0;

in get_user_pages_fast() that you removed.

(Admittedly, it would be much better to have that check in
__get_user_pages_fast() itself, because we do have callers that call
the double-underscore version)

Now, I sincerely hope that we don't have anybody that passes in a zero
nr_pages (or a negative one), but we do actually have a comment saying
it's ok.

Note that the check for "if (end < start)" not only does not check for
0, it also doesn't really check for negative. It checks for
_overflow_. Admittedly most negative values would be expected to
overflow, but it's still a very different issue.

Maybe you added the check for negative somewhere else (in another
patch), but I don't see it.

                Linus

^ permalink raw reply

* [PATCH] PCI: rpaphp: Avoid a sometimes-uninitialized warning
From: Nathan Chancellor @ 2019-06-03 17:43 UTC (permalink / raw)
  To: Tyrel Datwyler, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman
  Cc: linux-pci, linux-kernel, clang-built-linux, Bjorn Helgaas,
	Nathan Chancellor, linuxppc-dev

When building with -Wsometimes-uninitialized, clang warns:

drivers/pci/hotplug/rpaphp_core.c:243:14: warning: variable 'fndit' is
used uninitialized whenever 'for' loop exits because its condition is
false [-Wsometimes-uninitialized]
        for (j = 0; j < entries; j++) {
                    ^~~~~~~~~~~
drivers/pci/hotplug/rpaphp_core.c:256:6: note: uninitialized use occurs
here
        if (fndit)
            ^~~~~
drivers/pci/hotplug/rpaphp_core.c:243:14: note: remove the condition if
it is always true
        for (j = 0; j < entries; j++) {
                    ^~~~~~~~~~~
drivers/pci/hotplug/rpaphp_core.c:233:14: note: initialize the variable
'fndit' to silence this warning
        int j, fndit;
                    ^
                     = 0

Looking at the loop in a vacuum as clang would, fndit could be
uninitialized if entries was ever zero or the if statement was
always true within the loop. Regardless of whether or not this
warning is a problem in practice, "found" variables should always
be initialized to false so that there is no possibility of
undefined behavior.

Link: https://github.com/ClangBuiltLinux/linux/issues/504
Fixes: 2fcf3ae508c2 ("hotplug/drc-info: Add code to search ibm,drc-info property")
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
---
 drivers/pci/hotplug/rpaphp_core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/pci/hotplug/rpaphp_core.c b/drivers/pci/hotplug/rpaphp_core.c
index bcd5d357ca23..07b56bf2886f 100644
--- a/drivers/pci/hotplug/rpaphp_core.c
+++ b/drivers/pci/hotplug/rpaphp_core.c
@@ -230,7 +230,7 @@ static int rpaphp_check_drc_props_v2(struct device_node *dn, char *drc_name,
 	struct of_drc_info drc;
 	const __be32 *value;
 	char cell_drc_name[MAX_DRC_NAME_LEN];
-	int j, fndit;
+	int j, fndit = 0;
 
 	info = of_find_property(dn->parent, "ibm,drc-info", NULL);
 	if (info == NULL)
-- 
2.22.0.rc2


^ permalink raw reply related

* Re: [PATCH v3] mm: add account_locked_vm utility function
From: Alex Williamson @ 2019-06-03 19:15 UTC (permalink / raw)
  To: Daniel Jordan
  Cc: Mark Rutland, Davidlohr Bueso, kvm, Alan Tull,
	Alexey Kardashevskiy, linux-fpga, linuxppc-dev, kvm-ppc,
	linux-kernel, linux-mm, Jason Gunthorpe, Moritz Fischer,
	Steve Sistare, akpm, Ira Weiny, Christoph Lameter, Wu Hao
In-Reply-To: <20190529205019.20927-1-daniel.m.jordan@oracle.com>

On Wed, 29 May 2019 16:50:19 -0400
Daniel Jordan <daniel.m.jordan@oracle.com> wrote:

> locked_vm accounting is done roughly the same way in five places, so
> unify them in a helper.
> 
> Include the helper's caller in the debug print to distinguish between
> callsites.
> 
> Error codes stay the same, so user-visible behavior does too.  The one
> exception is that the -EPERM case in tce_account_locked_vm is removed
> because Alexey has never seen it triggered.
> 
> Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
> Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> Cc: Alan Tull <atull@kernel.org>
> Cc: Alex Williamson <alex.williamson@redhat.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Christoph Lameter <cl@linux.com>
> Cc: Christophe Leroy <christophe.leroy@c-s.fr>
> Cc: Davidlohr Bueso <dave@stgolabs.net>
> Cc: Ira Weiny <ira.weiny@intel.com>
> Cc: Jason Gunthorpe <jgg@mellanox.com>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Moritz Fischer <mdf@kernel.org>
> Cc: Paul Mackerras <paulus@ozlabs.org>
> Cc: Steve Sistare <steven.sistare@oracle.com>
> Cc: Wu Hao <hao.wu@intel.com>
> Cc: linux-mm@kvack.org
> Cc: kvm@vger.kernel.org
> Cc: kvm-ppc@vger.kernel.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: linux-fpga@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> ---
> v3:
>  - uninline account_locked_vm (Andrew)
>  - fix doc comment (Ira)
>  - retain down_write_killable in vfio type1 (Alex)
>  - leave Alexey's T-b since the code is the same aside from uninlining
>  - sanity tested with vfio type1, sanity-built on ppc
> 
>  arch/powerpc/kvm/book3s_64_vio.c     | 44 ++--------------
>  arch/powerpc/mm/book3s64/iommu_api.c | 41 ++-------------
>  drivers/fpga/dfl-afu-dma-region.c    | 53 ++------------------
>  drivers/vfio/vfio_iommu_spapr_tce.c  | 54 ++------------------
>  drivers/vfio/vfio_iommu_type1.c      | 17 +------
>  include/linux/mm.h                   |  4 ++
>  mm/util.c                            | 75 ++++++++++++++++++++++++++++
>  7 files changed, 98 insertions(+), 190 deletions(-)

I tend to prefer adding a negative rather than converting to absolute
and passing a bool for inc/dec, but it all seems equivalent, so for
vfio parts

Acked-by: Alex Williamson <alex.williamson@redhat.com>

> 
> diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
> index 66270e07449a..768b645c7edf 100644
> --- a/arch/powerpc/kvm/book3s_64_vio.c
> +++ b/arch/powerpc/kvm/book3s_64_vio.c
> @@ -30,6 +30,7 @@
>  #include <linux/anon_inodes.h>
>  #include <linux/iommu.h>
>  #include <linux/file.h>
> +#include <linux/mm.h>
>  
>  #include <asm/kvm_ppc.h>
>  #include <asm/kvm_book3s.h>
> @@ -56,43 +57,6 @@ static unsigned long kvmppc_stt_pages(unsigned long tce_pages)
>  	return tce_pages + ALIGN(stt_bytes, PAGE_SIZE) / PAGE_SIZE;
>  }
>  
> -static long kvmppc_account_memlimit(unsigned long stt_pages, bool inc)
> -{
> -	long ret = 0;
> -
> -	if (!current || !current->mm)
> -		return ret; /* process exited */
> -
> -	down_write(&current->mm->mmap_sem);
> -
> -	if (inc) {
> -		unsigned long locked, lock_limit;
> -
> -		locked = current->mm->locked_vm + stt_pages;
> -		lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
> -		if (locked > lock_limit && !capable(CAP_IPC_LOCK))
> -			ret = -ENOMEM;
> -		else
> -			current->mm->locked_vm += stt_pages;
> -	} else {
> -		if (WARN_ON_ONCE(stt_pages > current->mm->locked_vm))
> -			stt_pages = current->mm->locked_vm;
> -
> -		current->mm->locked_vm -= stt_pages;
> -	}
> -
> -	pr_debug("[%d] RLIMIT_MEMLOCK KVM %c%ld %ld/%ld%s\n", current->pid,
> -			inc ? '+' : '-',
> -			stt_pages << PAGE_SHIFT,
> -			current->mm->locked_vm << PAGE_SHIFT,
> -			rlimit(RLIMIT_MEMLOCK),
> -			ret ? " - exceeded" : "");
> -
> -	up_write(&current->mm->mmap_sem);
> -
> -	return ret;
> -}
> -
>  static void kvm_spapr_tce_iommu_table_free(struct rcu_head *head)
>  {
>  	struct kvmppc_spapr_tce_iommu_table *stit = container_of(head,
> @@ -302,7 +266,7 @@ static int kvm_spapr_tce_release(struct inode *inode, struct file *filp)
>  
>  	kvm_put_kvm(stt->kvm);
>  
> -	kvmppc_account_memlimit(
> +	account_locked_vm(current->mm,
>  		kvmppc_stt_pages(kvmppc_tce_pages(stt->size)), false);
>  	call_rcu(&stt->rcu, release_spapr_tce_table);
>  
> @@ -327,7 +291,7 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
>  		return -EINVAL;
>  
>  	npages = kvmppc_tce_pages(size);
> -	ret = kvmppc_account_memlimit(kvmppc_stt_pages(npages), true);
> +	ret = account_locked_vm(current->mm, kvmppc_stt_pages(npages), true);
>  	if (ret)
>  		return ret;
>  
> @@ -373,7 +337,7 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
>  
>  	kfree(stt);
>   fail_acct:
> -	kvmppc_account_memlimit(kvmppc_stt_pages(npages), false);
> +	account_locked_vm(current->mm, kvmppc_stt_pages(npages), false);
>  	return ret;
>  }
>  
> diff --git a/arch/powerpc/mm/book3s64/iommu_api.c b/arch/powerpc/mm/book3s64/iommu_api.c
> index 5c521f3924a5..18d22eec0ebd 100644
> --- a/arch/powerpc/mm/book3s64/iommu_api.c
> +++ b/arch/powerpc/mm/book3s64/iommu_api.c
> @@ -19,6 +19,7 @@
>  #include <linux/hugetlb.h>
>  #include <linux/swap.h>
>  #include <linux/sizes.h>
> +#include <linux/mm.h>
>  #include <asm/mmu_context.h>
>  #include <asm/pte-walk.h>
>  #include <linux/mm_inline.h>
> @@ -51,40 +52,6 @@ struct mm_iommu_table_group_mem_t {
>  	u64 dev_hpa;		/* Device memory base address */
>  };
>  
> -static long mm_iommu_adjust_locked_vm(struct mm_struct *mm,
> -		unsigned long npages, bool incr)
> -{
> -	long ret = 0, locked, lock_limit;
> -
> -	if (!npages)
> -		return 0;
> -
> -	down_write(&mm->mmap_sem);
> -
> -	if (incr) {
> -		locked = mm->locked_vm + npages;
> -		lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
> -		if (locked > lock_limit && !capable(CAP_IPC_LOCK))
> -			ret = -ENOMEM;
> -		else
> -			mm->locked_vm += npages;
> -	} else {
> -		if (WARN_ON_ONCE(npages > mm->locked_vm))
> -			npages = mm->locked_vm;
> -		mm->locked_vm -= npages;
> -	}
> -
> -	pr_debug("[%d] RLIMIT_MEMLOCK HASH64 %c%ld %ld/%ld\n",
> -			current ? current->pid : 0,
> -			incr ? '+' : '-',
> -			npages << PAGE_SHIFT,
> -			mm->locked_vm << PAGE_SHIFT,
> -			rlimit(RLIMIT_MEMLOCK));
> -	up_write(&mm->mmap_sem);
> -
> -	return ret;
> -}
> -
>  bool mm_iommu_preregistered(struct mm_struct *mm)
>  {
>  	return !list_empty(&mm->context.iommu_group_mem_list);
> @@ -101,7 +68,7 @@ static long mm_iommu_do_alloc(struct mm_struct *mm, unsigned long ua,
>  	unsigned long entry, chunk;
>  
>  	if (dev_hpa == MM_IOMMU_TABLE_INVALID_HPA) {
> -		ret = mm_iommu_adjust_locked_vm(mm, entries, true);
> +		ret = account_locked_vm(mm, entries, true);
>  		if (ret)
>  			return ret;
>  
> @@ -216,7 +183,7 @@ static long mm_iommu_do_alloc(struct mm_struct *mm, unsigned long ua,
>  	kfree(mem);
>  
>  unlock_exit:
> -	mm_iommu_adjust_locked_vm(mm, locked_entries, false);
> +	account_locked_vm(mm, locked_entries, false);
>  
>  	return ret;
>  }
> @@ -316,7 +283,7 @@ long mm_iommu_put(struct mm_struct *mm, struct mm_iommu_table_group_mem_t *mem)
>  unlock_exit:
>  	mutex_unlock(&mem_list_mutex);
>  
> -	mm_iommu_adjust_locked_vm(mm, unlock_entries, false);
> +	account_locked_vm(mm, unlock_entries, false);
>  
>  	return ret;
>  }
> diff --git a/drivers/fpga/dfl-afu-dma-region.c b/drivers/fpga/dfl-afu-dma-region.c
> index c438722bf4e1..0a532c602d8f 100644
> --- a/drivers/fpga/dfl-afu-dma-region.c
> +++ b/drivers/fpga/dfl-afu-dma-region.c
> @@ -12,6 +12,7 @@
>  #include <linux/dma-mapping.h>
>  #include <linux/sched/signal.h>
>  #include <linux/uaccess.h>
> +#include <linux/mm.h>
>  
>  #include "dfl-afu.h"
>  
> @@ -31,52 +32,6 @@ void afu_dma_region_init(struct dfl_feature_platform_data *pdata)
>  	afu->dma_regions = RB_ROOT;
>  }
>  
> -/**
> - * afu_dma_adjust_locked_vm - adjust locked memory
> - * @dev: port device
> - * @npages: number of pages
> - * @incr: increase or decrease locked memory
> - *
> - * Increase or decrease the locked memory size with npages input.
> - *
> - * Return 0 on success.
> - * Return -ENOMEM if locked memory size is over the limit and no CAP_IPC_LOCK.
> - */
> -static int afu_dma_adjust_locked_vm(struct device *dev, long npages, bool incr)
> -{
> -	unsigned long locked, lock_limit;
> -	int ret = 0;
> -
> -	/* the task is exiting. */
> -	if (!current->mm)
> -		return 0;
> -
> -	down_write(&current->mm->mmap_sem);
> -
> -	if (incr) {
> -		locked = current->mm->locked_vm + npages;
> -		lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
> -
> -		if (locked > lock_limit && !capable(CAP_IPC_LOCK))
> -			ret = -ENOMEM;
> -		else
> -			current->mm->locked_vm += npages;
> -	} else {
> -		if (WARN_ON_ONCE(npages > current->mm->locked_vm))
> -			npages = current->mm->locked_vm;
> -		current->mm->locked_vm -= npages;
> -	}
> -
> -	dev_dbg(dev, "[%d] RLIMIT_MEMLOCK %c%ld %ld/%ld%s\n", current->pid,
> -		incr ? '+' : '-', npages << PAGE_SHIFT,
> -		current->mm->locked_vm << PAGE_SHIFT, rlimit(RLIMIT_MEMLOCK),
> -		ret ? "- exceeded" : "");
> -
> -	up_write(&current->mm->mmap_sem);
> -
> -	return ret;
> -}
> -
>  /**
>   * afu_dma_pin_pages - pin pages of given dma memory region
>   * @pdata: feature device platform data
> @@ -92,7 +47,7 @@ static int afu_dma_pin_pages(struct dfl_feature_platform_data *pdata,
>  	struct device *dev = &pdata->dev->dev;
>  	int ret, pinned;
>  
> -	ret = afu_dma_adjust_locked_vm(dev, npages, true);
> +	ret = account_locked_vm(current->mm, npages, true);
>  	if (ret)
>  		return ret;
>  
> @@ -121,7 +76,7 @@ static int afu_dma_pin_pages(struct dfl_feature_platform_data *pdata,
>  free_pages:
>  	kfree(region->pages);
>  unlock_vm:
> -	afu_dma_adjust_locked_vm(dev, npages, false);
> +	account_locked_vm(current->mm, npages, false);
>  	return ret;
>  }
>  
> @@ -141,7 +96,7 @@ static void afu_dma_unpin_pages(struct dfl_feature_platform_data *pdata,
>  
>  	put_all_pages(region->pages, npages);
>  	kfree(region->pages);
> -	afu_dma_adjust_locked_vm(dev, npages, false);
> +	account_locked_vm(current->mm, npages, false);
>  
>  	dev_dbg(dev, "%ld pages unpinned\n", npages);
>  }
> diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
> index 40ddc0c5f677..d06e8e291924 100644
> --- a/drivers/vfio/vfio_iommu_spapr_tce.c
> +++ b/drivers/vfio/vfio_iommu_spapr_tce.c
> @@ -22,6 +22,7 @@
>  #include <linux/vmalloc.h>
>  #include <linux/sched/mm.h>
>  #include <linux/sched/signal.h>
> +#include <linux/mm.h>
>  
>  #include <asm/iommu.h>
>  #include <asm/tce.h>
> @@ -34,51 +35,6 @@
>  static void tce_iommu_detach_group(void *iommu_data,
>  		struct iommu_group *iommu_group);
>  
> -static long try_increment_locked_vm(struct mm_struct *mm, long npages)
> -{
> -	long ret = 0, locked, lock_limit;
> -
> -	if (WARN_ON_ONCE(!mm))
> -		return -EPERM;
> -
> -	if (!npages)
> -		return 0;
> -
> -	down_write(&mm->mmap_sem);
> -	locked = mm->locked_vm + npages;
> -	lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
> -	if (locked > lock_limit && !capable(CAP_IPC_LOCK))
> -		ret = -ENOMEM;
> -	else
> -		mm->locked_vm += npages;
> -
> -	pr_debug("[%d] RLIMIT_MEMLOCK +%ld %ld/%ld%s\n", current->pid,
> -			npages << PAGE_SHIFT,
> -			mm->locked_vm << PAGE_SHIFT,
> -			rlimit(RLIMIT_MEMLOCK),
> -			ret ? " - exceeded" : "");
> -
> -	up_write(&mm->mmap_sem);
> -
> -	return ret;
> -}
> -
> -static void decrement_locked_vm(struct mm_struct *mm, long npages)
> -{
> -	if (!mm || !npages)
> -		return;
> -
> -	down_write(&mm->mmap_sem);
> -	if (WARN_ON_ONCE(npages > mm->locked_vm))
> -		npages = mm->locked_vm;
> -	mm->locked_vm -= npages;
> -	pr_debug("[%d] RLIMIT_MEMLOCK -%ld %ld/%ld\n", current->pid,
> -			npages << PAGE_SHIFT,
> -			mm->locked_vm << PAGE_SHIFT,
> -			rlimit(RLIMIT_MEMLOCK));
> -	up_write(&mm->mmap_sem);
> -}
> -
>  /*
>   * VFIO IOMMU fd for SPAPR_TCE IOMMU implementation
>   *
> @@ -336,7 +292,7 @@ static int tce_iommu_enable(struct tce_container *container)
>  		return ret;
>  
>  	locked = table_group->tce32_size >> PAGE_SHIFT;
> -	ret = try_increment_locked_vm(container->mm, locked);
> +	ret = account_locked_vm(container->mm, locked, true);
>  	if (ret)
>  		return ret;
>  
> @@ -355,7 +311,7 @@ static void tce_iommu_disable(struct tce_container *container)
>  	container->enabled = false;
>  
>  	BUG_ON(!container->mm);
> -	decrement_locked_vm(container->mm, container->locked_pages);
> +	account_locked_vm(container->mm, container->locked_pages, false);
>  }
>  
>  static void *tce_iommu_open(unsigned long arg)
> @@ -659,7 +615,7 @@ static long tce_iommu_create_table(struct tce_container *container,
>  	if (!table_size)
>  		return -EINVAL;
>  
> -	ret = try_increment_locked_vm(container->mm, table_size >> PAGE_SHIFT);
> +	ret = account_locked_vm(container->mm, table_size >> PAGE_SHIFT, true);
>  	if (ret)
>  		return ret;
>  
> @@ -678,7 +634,7 @@ static void tce_iommu_free_table(struct tce_container *container,
>  	unsigned long pages = tbl->it_allocated_size >> PAGE_SHIFT;
>  
>  	iommu_tce_table_put(tbl);
> -	decrement_locked_vm(container->mm, pages);
> +	account_locked_vm(container->mm, pages, false);
>  }
>  
>  static long tce_iommu_create_window(struct tce_container *container,
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 3ddc375e7063..bf449ace1676 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -275,21 +275,8 @@ static int vfio_lock_acct(struct vfio_dma *dma, long npage, bool async)
>  
>  	ret = down_write_killable(&mm->mmap_sem);
>  	if (!ret) {
> -		if (npage > 0) {
> -			if (!dma->lock_cap) {
> -				unsigned long limit;
> -
> -				limit = task_rlimit(dma->task,
> -						RLIMIT_MEMLOCK) >> PAGE_SHIFT;
> -
> -				if (mm->locked_vm + npage > limit)
> -					ret = -ENOMEM;
> -			}
> -		}
> -
> -		if (!ret)
> -			mm->locked_vm += npage;
> -
> +		ret = __account_locked_vm(mm, abs(npage), npage > 0, dma->task,
> +					  dma->lock_cap);
>  		up_write(&mm->mmap_sem);
>  	}
>  
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 0e8834ac32b7..95510f6fad45 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1564,6 +1564,10 @@ long get_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
>  int get_user_pages_fast(unsigned long start, int nr_pages,
>  			unsigned int gup_flags, struct page **pages);
>  
> +int account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc);
> +int __account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc,
> +			struct task_struct *task, bool bypass_rlim);
> +
>  /* Container for pinned pfns / pages */
>  struct frame_vector {
>  	unsigned int nr_allocated;	/* Number of frames we have space for */
> diff --git a/mm/util.c b/mm/util.c
> index 91682a2090ee..cbbcc035b12b 100644
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -7,6 +7,7 @@
>  #include <linux/err.h>
>  #include <linux/sched.h>
>  #include <linux/sched/mm.h>
> +#include <linux/sched/signal.h>
>  #include <linux/sched/task_stack.h>
>  #include <linux/security.h>
>  #include <linux/swap.h>
> @@ -347,6 +348,80 @@ int __weak get_user_pages_fast(unsigned long start,
>  }
>  EXPORT_SYMBOL_GPL(get_user_pages_fast);
>  
> +/**
> + * __account_locked_vm - account locked pages to an mm's locked_vm
> + * @mm:          mm to account against
> + * @pages:       number of pages to account
> + * @inc:         %true if @pages should be considered positive, %false if not
> + * @task:        task used to check RLIMIT_MEMLOCK
> + * @bypass_rlim: %true if checking RLIMIT_MEMLOCK should be skipped
> + *
> + * Assumes @task and @mm are valid (i.e. at least one reference on each), and
> + * that mmap_sem is held as writer.
> + *
> + * Return:
> + * * 0       on success
> + * * -ENOMEM if RLIMIT_MEMLOCK would be exceeded.
> + */
> +int __account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc,
> +			struct task_struct *task, bool bypass_rlim)
> +{
> +	unsigned long locked_vm, limit;
> +	int ret = 0;
> +
> +	lockdep_assert_held_exclusive(&mm->mmap_sem);
> +
> +	locked_vm = mm->locked_vm;
> +	if (inc) {
> +		if (!bypass_rlim) {
> +			limit = task_rlimit(task, RLIMIT_MEMLOCK) >> PAGE_SHIFT;
> +			if (locked_vm + pages > limit)
> +				ret = -ENOMEM;
> +		}
> +		if (!ret)
> +			mm->locked_vm = locked_vm + pages;
> +	} else {
> +		WARN_ON_ONCE(pages > locked_vm);
> +		mm->locked_vm = locked_vm - pages;
> +	}
> +
> +	pr_debug("%s: [%d] caller %ps %c%lu %lu/%lu%s\n", __func__, task->pid,
> +		 (void *)_RET_IP_, (inc) ? '+' : '-', pages << PAGE_SHIFT,
> +		 locked_vm << PAGE_SHIFT, task_rlimit(task, RLIMIT_MEMLOCK),
> +		 ret ? " - exceeded" : "");
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(__account_locked_vm);
> +
> +/**
> + * account_locked_vm - account locked pages to an mm's locked_vm
> + * @mm:          mm to account against, may be NULL
> + * @pages:       number of pages to account
> + * @inc:         %true if @pages should be considered positive, %false if not
> + *
> + * Assumes a non-NULL @mm is valid (i.e. at least one reference on it).
> + *
> + * Return:
> + * * 0       on success, or if mm is NULL
> + * * -ENOMEM if RLIMIT_MEMLOCK would be exceeded.
> + */
> +int account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc)
> +{
> +	int ret;
> +
> +	if (pages == 0 || !mm)
> +		return 0;
> +
> +	down_write(&mm->mmap_sem);
> +	ret = __account_locked_vm(mm, pages, inc, current,
> +				  capable(CAP_IPC_LOCK));
> +	up_write(&mm->mmap_sem);
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(account_locked_vm);
> +
>  unsigned long vm_mmap_pgoff(struct file *file, unsigned long addr,
>  	unsigned long len, unsigned long prot,
>  	unsigned long flag, unsigned long pgoff)
> 
> base-commit: cd6c84d8f0cdc911df435bb075ba22ce3c605b07


^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox