LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH] powerpc: Use the common INIT_DATA_SECTION macro in vmlinux.lds.S
From: Michael Ellerman @ 2020-11-25 11:57 UTC (permalink / raw)
  To: Youling Tang, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <1604487550-20040-1-git-send-email-tangyouling@loongson.cn>

On Wed, 4 Nov 2020 18:59:10 +0800, Youling Tang wrote:
> Use the common INIT_DATA_SECTION rule for the linker script in an effort
> to regularize the linker script.

Applied to powerpc/next.

[1/1] powerpc: Use the common INIT_DATA_SECTION macro in vmlinux.lds.S
      https://git.kernel.org/powerpc/c/fdcfeaba38e5b183045f5b079af94f97658eabe6

cheers

^ permalink raw reply

* Re: [PATCH] Revert "powerpc/pseries/hotplug-cpu: Remove double free in error path"
From: Michael Ellerman @ 2020-11-25 11:58 UTC (permalink / raw)
  To: Zhang Xiaoxu, paulus, groug, tyreld, mpe, benh, linuxppc-dev
In-Reply-To: <20201111020752.1686139-1-zhangxiaoxu5@huawei.com>

On Tue, 10 Nov 2020 21:07:52 -0500, Zhang Xiaoxu wrote:
> This reverts commit a0ff72f9f5a780341e7ff5e9ba50a0dad5fa1980.
> 
> Since the commit b015f6bc9547 ("powerpc/pseries: Add cpu DLPAR
> support for drc-info property"), the 'cpu_drcs' wouldn't be double
> freed when the 'cpus' node not found.
> 
> So we needn't apply this patch, otherwise, the memory will be leak.

Applied to powerpc/next.

[1/1] Revert "powerpc/pseries/hotplug-cpu: Remove double free in error path"
      https://git.kernel.org/powerpc/c/a40fdaf1420d6e6bda0dd2df1e6806013e58dbe1

cheers

^ permalink raw reply

* Re: [PATCH] powerpc/powernv/sriov: fix unsigned int win compared to less than zero
From: Michael Ellerman @ 2020-11-25 11:58 UTC (permalink / raw)
  To: ajd, paulus, fbarrat, mpe, xiakaixu1987@gmail.com, benh
  Cc: linuxppc-dev, Kaixu Xia, linux-kernel
In-Reply-To: <1605007170-22171-1-git-send-email-kaixuxia@tencent.com>

On Tue, 10 Nov 2020 19:19:30 +0800, xiakaixu1987@gmail.com wrote:
> Fix coccicheck warning:
> 
> ./arch/powerpc/platforms/powernv/pci-sriov.c:443:7-10: WARNING: Unsigned expression compared with zero: win < 0
> ./arch/powerpc/platforms/powernv/pci-sriov.c:462:7-10: WARNING: Unsigned expression compared with zero: win < 0

Applied to powerpc/next.

[1/1] powerpc/powernv/sriov: fix unsigned int win compared to less than zero
      https://git.kernel.org/powerpc/c/027717a45ca251a7ba67a63db359994836962cd2

cheers

^ permalink raw reply

* Re: [PATCH] powerpc/mm: Fix comparing pointer to 0 warning
From: Michael Ellerman @ 2020-11-25 11:57 UTC (permalink / raw)
  To: paulus, xiakaixu1987@gmail.com, mpe, benh
  Cc: linuxppc-dev, Kaixu Xia, linux-kernel
In-Reply-To: <1604976961-20441-1-git-send-email-kaixuxia@tencent.com>

On Tue, 10 Nov 2020 10:56:01 +0800, xiakaixu1987@gmail.com wrote:
> Fixes coccicheck warning:
> 
> ./arch/powerpc/mm/pgtable_32.c:87:11-12: WARNING comparing pointer to 0
> 
> Avoid pointer type value compared to 0.

Applied to powerpc/next.

[1/1] powerpc/mm: Fix comparing pointer to 0 warning
      https://git.kernel.org/powerpc/c/b84bf098fcc49ed6bf4b0a8bed52e9df0e8f1de7

cheers

^ permalink raw reply

* Re: [PATCHv2] selftests/powerpc/eeh: disable kselftest timeout setting for eeh-basic
From: Michael Ellerman @ 2020-11-25 11:57 UTC (permalink / raw)
  To: Po-Hsu Lin, mpe, linux-kselftest, linux-kernel, linuxppc-dev
  Cc: mathieu.desnoyers, shuah, joe.lawrence, mbenes
In-Reply-To: <20201023024539.9512-1-po-hsu.lin@canonical.com>

On Fri, 23 Oct 2020 10:45:39 +0800, Po-Hsu Lin wrote:
> The eeh-basic test got its own 60 seconds timeout (defined in commit
> 414f50434aa2 "selftests/eeh: Bump EEH wait time to 60s") per breakable
> device.
> 
> And we have discovered that the number of breakable devices varies
> on different hardware. The device recovery time ranges from 0 to 35
> seconds. In our test pool it will take about 30 seconds to run on a
> Power8 system that with 5 breakable devices, 60 seconds to run on a
> Power9 system that with 4 breakable devices.
> 
> [...]

Applied to powerpc/next.

[1/1] selftests/powerpc/eeh: disable kselftest timeout setting for eeh-basic
      https://git.kernel.org/powerpc/c/f5eca0b279117f25020112a2f65ec9c3ea25f3ac

cheers

^ permalink raw reply

* Re: [PATCH] powerpc/ps3: Drop unused DBG macro
From: Michael Ellerman @ 2020-11-25 11:57 UTC (permalink / raw)
  To: Michael Ellerman, linuxppc-dev
In-Reply-To: <20201023031305.3284819-1-mpe@ellerman.id.au>

On Fri, 23 Oct 2020 14:13:05 +1100, Michael Ellerman wrote:
> This DBG macro is unused, and has been unused since the file was
> originally merged into mainline. Just drop it.

Applied to powerpc/next.

[1/1] powerpc/ps3: Drop unused DBG macro
      https://git.kernel.org/powerpc/c/cb5d4c465f31bc44b8bbd4934678c2b140a2ad29

cheers

^ permalink raw reply

* Re: [PATCH] powerpc/85xx: Fix declaration made after definition
From: Michael Ellerman @ 2020-11-25 11:57 UTC (permalink / raw)
  To: Michael Ellerman, linuxppc-dev
In-Reply-To: <20201023020838.3274226-1-mpe@ellerman.id.au>

On Fri, 23 Oct 2020 13:08:38 +1100, Michael Ellerman wrote:
> Currently the clang build of corenet64_smp_defconfig fails with:
> 
>   arch/powerpc/platforms/85xx/corenet_generic.c:210:1: error:
>   attribute declaration must precede definition
>   machine_arch_initcall(corenet_generic, corenet_gen_publish_devices);
> 
> Fix it by moving the initcall definition prior to the machine
> definition, and directly below the function it calls, which is the
> usual style anyway.

Applied to powerpc/next.

[1/1] powerpc/85xx: Fix declaration made after definition
      https://git.kernel.org/powerpc/c/ef78f2dd2398ce8ed9eeaab9c9f8af2e15f5d870

cheers

^ permalink raw reply

* Re: [PATCH] powerpc: sysdev: add missing iounmap() on error in mpic_msgr_probe()
From: Michael Ellerman @ 2020-11-25 11:57 UTC (permalink / raw)
  To: Qinglang Miao, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <20201028091551.136400-1-miaoqinglang@huawei.com>

On Wed, 28 Oct 2020 17:15:51 +0800, Qinglang Miao wrote:
> I noticed that iounmap() of msgr_block_addr before return from
> mpic_msgr_probe() in the error handling case is missing. So use
> devm_ioremap() instead of just ioremap() when remapping the message
> register block, so the mapping will be automatically released on
> probe failure.

Applied to powerpc/next.

[1/1] powerpc: sysdev: add missing iounmap() on error in mpic_msgr_probe()
      https://git.kernel.org/powerpc/c/ffa1797040c5da391859a9556be7b735acbe1242

cheers

^ permalink raw reply

* Re: [PATCH] powerpc/64s/perf: perf interrupt does not have to get_user_pages to access user memory
From: Michael Ellerman @ 2020-11-25 11:57 UTC (permalink / raw)
  To: Nicholas Piggin, linuxppc-dev
In-Reply-To: <20201111120151.3150658-1-npiggin@gmail.com>

On Wed, 11 Nov 2020 22:01:51 +1000, Nicholas Piggin wrote:
> read_user_stack_slow that walks user address translation by hand is
> only required on hash, because a hash fault can not be serviced from
> "NMI" context (to avoid re-entering the hash code) so the user stack
> can be mapped into Linux page tables but not accessible by the CPU.
> 
> Radix MMU mode does not have this restriction. A page fault failure
> would indicate the page is not accessible via get_user_pages either,
> so avoid this on radix.

Applied to powerpc/next.

[1/1] powerpc/64s/perf: perf interrupt does not have to get_user_pages to access user memory
      https://git.kernel.org/powerpc/c/987c426320cce72d1b28f55c8603b239e4f7187c

cheers

^ permalink raw reply

* Re: [PATCH v2 0/8] powernv/memtrace: don't abuse memory hot(un)plug infrastructure for memory allocations
From: Michael Ellerman @ 2020-11-25 11:57 UTC (permalink / raw)
  To: linux-kernel, David Hildenbrand
  Cc: Michal Hocko, Wei Yang, Nicholas Piggin, Michal Hocko, linux-mm,
	Paul Mackerras, Aneesh Kumar K.V, Rashmica Gupta, linuxppc-dev,
	Andrew Morton, Mike Rapoport, Oscar Salvador
In-Reply-To: <20201111145322.15793-1-david@redhat.com>

On Wed, 11 Nov 2020 15:53:14 +0100, David Hildenbrand wrote:
> Based on latest linux/master
> 
> powernv/memtrace is the only in-kernel user that rips out random memory
> it never added (doesn't own) in order to allocate memory without a
> linear mapping. Let's stop abusing memory hot(un)plug infrastructure for
> that - use alloc_contig_pages() for allocating memory and remove the
> linear mapping manually.
> 
> [...]

Applied to powerpc/next.

[1/8] powerpc/powernv/memtrace: Don't leak kernel memory to user space
      https://git.kernel.org/powerpc/c/c74cf7a3d59a21b290fe0468f5b470d0b8ee37df
[2/8] powerpc/powernv/memtrace: Fix crashing the kernel when enabling concurrently
      https://git.kernel.org/powerpc/c/d6718941a2767fb383e105d257d2105fe4f15f0e
[3/8] powerpc/mm: factor out creating/removing linear mapping
      https://git.kernel.org/powerpc/c/4abb1e5b63ac3281275315fc6b0cde0b9c2e2e42
[4/8] powerpc/mm: protect linear mapping modifications by a mutex
      https://git.kernel.org/powerpc/c/e5b2af044f31bf18defa557a8cd11c23caefa34c
[5/8] powerpc/mm: print warning in arch_remove_linear_mapping()
      https://git.kernel.org/powerpc/c/1f73ad3e8d755dbec52fcec98618a7ce4de12af2
[6/8] powerpc/book3s64/hash: Drop WARN_ON in hash__remove_section_mapping()
      https://git.kernel.org/powerpc/c/d8bd9a121c2f2bc8b36da930dc91b69fd2a705e2
[7/8] powerpc/mm: remove linear mapping if __add_pages() fails in arch_add_memory()
      https://git.kernel.org/powerpc/c/ca2c36cae9d48b180ea51259e35ab3d95d327df2
[8/8] powernv/memtrace: don't abuse memory hot(un)plug infrastructure for memory allocations
      https://git.kernel.org/powerpc/c/0bd4b96d99108b7ea9bac0573957483be7781d70

cheers

^ permalink raw reply

* Re: [PATCH v4 1/2] powerpc/64: Set up a kernel stack for secondaries before cpu_restore()
From: Michael Ellerman @ 2020-11-25 11:57 UTC (permalink / raw)
  To: Jordan Niethe, linuxppc-dev; +Cc: oohall, npiggin
In-Reply-To: <20201014072837.24539-1-jniethe5@gmail.com>

On Wed, 14 Oct 2020 18:28:36 +1100, Jordan Niethe wrote:
> Currently in generic_secondary_smp_init(), cur_cpu_spec->cpu_restore()
> is called before a stack has been set up in r1. This was previously fine
> as the cpu_restore() functions were implemented in assembly and did not
> use a stack. However commit 5a61ef74f269 ("powerpc/64s: Support new
> device tree binding for discovering CPU features") used
> __restore_cpu_cpufeatures() as the cpu_restore() function for a
> device-tree features based cputable entry. This is a C function and
> hence uses a stack in r1.
> 
> [...]

Applied to powerpc/next.

[1/2] powerpc/64: Set up a kernel stack for secondaries before cpu_restore()
      https://git.kernel.org/powerpc/c/3c0b976bf20d236c57adcefa80f86a0a1d737727
[2/2] powerpc/64s: Convert some cpu_setup() and cpu_restore() functions to C
      https://git.kernel.org/powerpc/c/344fbab991a568dc33ad90711b489d870e18d26d

cheers

^ permalink raw reply

* Re: [PATCH v1 0/4] powernv/memtrace: don't abuse memory hot(un)plug infrastructure for memory allocations
From: Michael Ellerman @ 2020-11-25 11:57 UTC (permalink / raw)
  To: linux-kernel, David Hildenbrand
  Cc: Michal Hocko, Wei Yang, Michal Hocko, linux-mm, Paul Mackerras,
	Rashmica Gupta, linuxppc-dev, Andrew Morton, Mike Rapoport,
	Oscar Salvador
In-Reply-To: <20201029162718.29910-1-david@redhat.com>

On Thu, 29 Oct 2020 17:27:14 +0100, David Hildenbrand wrote:
> powernv/memtrace is the only in-kernel user that rips out random memory
> it never added (doesn't own) in order to allocate memory without a
> linear mapping. Let's stop abusing memory hot(un)plug infrastructure for
> that - use alloc_contig_pages() for allocating memory and remove the
> linear mapping manually.
> 
> The original idea was discussed in:
>  https://lkml.kernel.org/r/48340e96-7e6b-736f-9e23-d3111b915b6e@redhat.com
> 
> [...]

Applied to powerpc/next.

[1/4] powerpc/mm: factor out creating/removing linear mapping
      https://git.kernel.org/powerpc/c/4abb1e5b63ac3281275315fc6b0cde0b9c2e2e42
[2/4] powerpc/mm: print warning in arch_remove_linear_mapping()
      https://git.kernel.org/powerpc/c/1f73ad3e8d755dbec52fcec98618a7ce4de12af2
[3/4] powerpc/mm: remove linear mapping if __add_pages() fails in arch_add_memory()
      https://git.kernel.org/powerpc/c/ca2c36cae9d48b180ea51259e35ab3d95d327df2
[4/4] powernv/memtrace: don't abuse memory hot(un)plug infrastructure for memory allocations
      https://git.kernel.org/powerpc/c/0bd4b96d99108b7ea9bac0573957483be7781d70

cheers

^ permalink raw reply

* Re: [PATCH v2 1/3] powerpc/64s: Replace RFI by RFI_TO_KERNEL and remove RFI
From: Michael Ellerman @ 2020-11-25 11:57 UTC (permalink / raw)
  To: Christophe Leroy, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <7719261b0a0d2787772339484c33eb809723bca7.1604854583.git.christophe.leroy@csgroup.eu>

On Sun, 8 Nov 2020 16:57:35 +0000 (UTC), Christophe Leroy wrote:
> In head_64.S, we have two places using RFI to return to
> kernel. Use RFI_TO_KERNEL instead.
> 
> They are the two only places using RFI on book3s/64, so
> the RFI macro can go away.

Applied to powerpc/next.

[1/3] powerpc/64s: Replace RFI by RFI_TO_KERNEL and remove RFI
      https://git.kernel.org/powerpc/c/879add7720172ffd2986c44587510fabb7af52f5
[2/3] powerpc: Replace RFI by rfi on book3s/32 and booke
      https://git.kernel.org/powerpc/c/120c0518ec321f33cdc4670059fb76e96ceb56eb
[3/3] powerpc: Remove RFI macro
      https://git.kernel.org/powerpc/c/62182e6c0faf75117f8d1719c118bb5fc8574012

cheers

^ permalink raw reply

* Re: [PATCH v13 0/8] powerpc: switch VDSO to C implementation
From: Michael Ellerman @ 2020-11-25 11:57 UTC (permalink / raw)
  To: Michael Ellerman, Paul Mackerras, anton, Benjamin Herrenschmidt,
	Christophe Leroy, nathanl
  Cc: linux-arch, arnd, linux-kernel, luto, tglx, vincenzo.frascino,
	linuxppc-dev
In-Reply-To: <cover.1604426550.git.christophe.leroy@csgroup.eu>

On Tue, 3 Nov 2020 18:07:11 +0000 (UTC), Christophe Leroy wrote:
> This is a series to switch powerpc VDSO to generic C implementation.
> 
> Changes in v13:
> - Reorganised headers to avoid the need for a fake 32 bits config for building VDSO32 on PPC64
> - Rebased after the removal of powerpc 601
> - Using DOTSYM() macro to call functions directly without using OPD
> - Explicitely dropped .opd and .got1 sections which are now unused
> 
> [...]

Patch 1 applied to powerpc/next.

[1/8] powerpc/feature: Fix CPU_FTRS_ALWAYS by removing CPU_FTRS_GENERIC_32
      https://git.kernel.org/powerpc/c/78665179e569c7e1fe102fb6c21d0f5b6951f084

cheers

^ permalink raw reply

* Re: [PATCH] powerpc/bitops: Fix possible undefined behaviour with fls() and fls64()
From: Michael Ellerman @ 2020-11-25 11:57 UTC (permalink / raw)
  To: Michael Ellerman, Paul Mackerras, jakub, Benjamin Herrenschmidt,
	Christophe Leroy, segher
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <348c2d3f19ffcff8abe50d52513f989c4581d000.1603375524.git.christophe.leroy@csgroup.eu>

On Thu, 22 Oct 2020 14:05:46 +0000 (UTC), Christophe Leroy wrote:
> fls() and fls64() are using __builtin_ctz() and _builtin_ctzll().
> On powerpc, those builtins trivially use ctlzw and ctlzd power
> instructions.
> 
> Allthough those instructions provide the expected result with
> input argument 0, __builtin_ctz() and __builtin_ctzll() are
> documented as undefined for value 0.
> 
> [...]

Applied to powerpc/next.

[1/1] powerpc/bitops: Fix possible undefined behaviour with fls() and fls64()
      https://git.kernel.org/powerpc/c/1891ef21d92c4801ea082ee8ed478e304ddc6749

cheers

^ permalink raw reply

* Re: [PATCH] powerpc: avoid broken GCC __attribute__((optimize))
From: Michael Ellerman @ 2020-11-25 11:57 UTC (permalink / raw)
  To: linux-kernel, Ard Biesheuvel
  Cc: Kees Cook, Daniel Borkmann, Peter Zijlstra, Randy Dunlap,
	Nick Desaulniers, Alexei Starovoitov, Arvind Sankar,
	Paul Mackerras, Josh Poimboeuf, Geert Uytterhoeven,
	Thomas Gleixner, linuxppc-dev
In-Reply-To: <20201028080433.26799-1-ardb@kernel.org>

On Wed, 28 Oct 2020 09:04:33 +0100, Ard Biesheuvel wrote:
> Commit 7053f80d9696 ("powerpc/64: Prevent stack protection in early boot")
> introduced a couple of uses of __attribute__((optimize)) with function
> scope, to disable the stack protector in some early boot code.
> 
> Unfortunately, and this is documented in the GCC man pages [0], overriding
> function attributes for optimization is broken, and is only supported for
> debug scenarios, not for production: the problem appears to be that
> setting GCC -f flags using this method will cause it to forget about some
> or all other optimization settings that have been applied.
> 
> [...]

Applied to powerpc/next.

[1/1] powerpc: Avoid broken GCC __attribute__((optimize))
      https://git.kernel.org/powerpc/c/a7223f5bfcaeade4a86d35263493bcda6c940891

cheers

^ permalink raw reply

* Re: [PATCH v2] powerpc/mm: Update tlbiel loop on POWER10
From: Michael Ellerman @ 2020-11-25 11:57 UTC (permalink / raw)
  To: Aneesh Kumar K.V, linuxppc-dev, mpe; +Cc: npiggin
In-Reply-To: <20201007053305.232879-1-aneesh.kumar@linux.ibm.com>

On Wed, 7 Oct 2020 11:03:05 +0530, Aneesh Kumar K.V wrote:
> With POWER10, single tlbiel instruction invalidates all the congruence
> class of the TLB and hence we need to issue only one tlbiel with SET=0.

Applied to powerpc/next.

[1/1] powerpc/mm: Update tlbiel loop on POWER10
      https://git.kernel.org/powerpc/c/e80639405c40127727812a0e1f8a65ba9979f146

cheers

^ permalink raw reply

* Re: [PATCH] powerpc/mm: move setting pte specific flags to pfn_pmd
From: Michael Ellerman @ 2020-11-25 11:57 UTC (permalink / raw)
  To: Aneesh Kumar K.V, linuxppc-dev, mpe; +Cc: linux-mm
In-Reply-To: <20201022091115.39568-1-aneesh.kumar@linux.ibm.com>

On Thu, 22 Oct 2020 14:41:15 +0530, Aneesh Kumar K.V wrote:
> powerpc used to set the pte specific flags in set_pte_at().  This is
> different from other architectures. To be consistent with other
> architecture powerpc updated pfn_pte to set _PAGE_PTE with
> commit 379c926d6334 ("powerpc/mm: move setting pte specific flags to pfn_pte")
> 
> The commit didn't do the same w.r.t pfn_pmd because we expect pmd_mkhuge
> to do that. But as per Linus that is a bad rule [1].
> Hence update pfn_pmd to set _PAGE_PTE.
> 
> [...]

Applied to powerpc/next.

[1/1] powerpc/mm: Move setting PTE specific flags to pfn_pmd()
      https://git.kernel.org/powerpc/c/53f45ecc9cd04b4b963f3040f2a54c3baf03b229

cheers

^ permalink raw reply

* [PATCH v2 2/2] powerpc/pseries: pass MSI affinity to irq_create_mapping()
From: Laurent Vivier @ 2020-11-25 11:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Laurent Vivier, Michael S . Tsirkin, linux-pci, Greg Kurz,
	linux-block, Paul Mackerras, Marc Zyngier, Thomas Gleixner,
	linuxppc-dev, Christoph Hellwig
In-Reply-To: <20201125111657.1141295-1-lvivier@redhat.com>

With virtio multiqueue, normally each queue IRQ is mapped to a CPU.

But since commit 0d9f0a52c8b9f ("virtio_scsi: use virtio IRQ affinity")
this is broken on pseries.

The affinity is correctly computed in msi_desc but this is not applied
to the system IRQs.

It appears the affinity is correctly passed to rtas_setup_msi_irqs() but
lost at this point and never passed to irq_domain_alloc_descs()
(see commit 06ee6d571f0e ("genirq: Add affinity hint to irq allocation"))
because irq_create_mapping() doesn't take an affinity parameter.

As the previous patch has added the affinity parameter to
irq_create_mapping() we can forward the affinity from rtas_setup_msi_irqs()
to irq_domain_alloc_descs().

With this change, the virtqueues are correctly dispatched between the CPUs
on pseries.

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
---
 arch/powerpc/platforms/pseries/msi.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/msi.c b/arch/powerpc/platforms/pseries/msi.c
index 133f6adcb39c..b3ac2455faad 100644
--- a/arch/powerpc/platforms/pseries/msi.c
+++ b/arch/powerpc/platforms/pseries/msi.c
@@ -458,7 +458,8 @@ static int rtas_setup_msi_irqs(struct pci_dev *pdev, int nvec_in, int type)
 			return hwirq;
 		}
 
-		virq = irq_create_mapping(NULL, hwirq);
+		virq = irq_create_mapping_affinity(NULL, hwirq,
+						   entry->affinity);
 
 		if (!virq) {
 			pr_debug("rtas_msi: Failed mapping hwirq %d\n", hwirq);
-- 
2.28.0


^ permalink raw reply related

* [PATCH v2 1/2] genirq: add an irq_create_mapping_affinity() function
From: Laurent Vivier @ 2020-11-25 11:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Laurent Vivier, Michael S . Tsirkin, linux-pci, Greg Kurz,
	linux-block, Paul Mackerras, Marc Zyngier, Thomas Gleixner,
	linuxppc-dev, Christoph Hellwig
In-Reply-To: <20201125111657.1141295-1-lvivier@redhat.com>

This function adds an affinity parameter to irq_create_mapping().
This parameter is needed to pass it to irq_domain_alloc_descs().

irq_create_mapping() is a wrapper around irq_create_mapping_affinity()
to pass NULL for the affinity parameter.

No functional change.

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
---
 include/linux/irqdomain.h | 12 ++++++++++--
 kernel/irq/irqdomain.c    | 13 ++++++++-----
 2 files changed, 18 insertions(+), 7 deletions(-)

diff --git a/include/linux/irqdomain.h b/include/linux/irqdomain.h
index 71535e87109f..ea5a337e0f8b 100644
--- a/include/linux/irqdomain.h
+++ b/include/linux/irqdomain.h
@@ -384,11 +384,19 @@ extern void irq_domain_associate_many(struct irq_domain *domain,
 extern void irq_domain_disassociate(struct irq_domain *domain,
 				    unsigned int irq);
 
-extern unsigned int irq_create_mapping(struct irq_domain *host,
-				       irq_hw_number_t hwirq);
+extern unsigned int irq_create_mapping_affinity(struct irq_domain *host,
+				      irq_hw_number_t hwirq,
+				      const struct irq_affinity_desc *affinity);
 extern unsigned int irq_create_fwspec_mapping(struct irq_fwspec *fwspec);
 extern void irq_dispose_mapping(unsigned int virq);
 
+static inline unsigned int irq_create_mapping(struct irq_domain *host,
+					      irq_hw_number_t hwirq)
+{
+	return irq_create_mapping_affinity(host, hwirq, NULL);
+}
+
+
 /**
  * irq_linear_revmap() - Find a linux irq from a hw irq number.
  * @domain: domain owning this hardware interrupt
diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
index cf8b374b892d..e4ca69608f3b 100644
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -624,17 +624,19 @@ unsigned int irq_create_direct_mapping(struct irq_domain *domain)
 EXPORT_SYMBOL_GPL(irq_create_direct_mapping);
 
 /**
- * irq_create_mapping() - Map a hardware interrupt into linux irq space
+ * irq_create_mapping_affinity() - Map a hardware interrupt into linux irq space
  * @domain: domain owning this hardware interrupt or NULL for default domain
  * @hwirq: hardware irq number in that domain space
+ * @affinity: irq affinity
  *
  * Only one mapping per hardware interrupt is permitted. Returns a linux
  * irq number.
  * If the sense/trigger is to be specified, set_irq_type() should be called
  * on the number returned from that call.
  */
-unsigned int irq_create_mapping(struct irq_domain *domain,
-				irq_hw_number_t hwirq)
+unsigned int irq_create_mapping_affinity(struct irq_domain *domain,
+				       irq_hw_number_t hwirq,
+				       const struct irq_affinity_desc *affinity)
 {
 	struct device_node *of_node;
 	int virq;
@@ -660,7 +662,8 @@ unsigned int irq_create_mapping(struct irq_domain *domain,
 	}
 
 	/* Allocate a virtual interrupt number */
-	virq = irq_domain_alloc_descs(-1, 1, hwirq, of_node_to_nid(of_node), NULL);
+	virq = irq_domain_alloc_descs(-1, 1, hwirq, of_node_to_nid(of_node),
+				      affinity);
 	if (virq <= 0) {
 		pr_debug("-> virq allocation failed\n");
 		return 0;
@@ -676,7 +679,7 @@ unsigned int irq_create_mapping(struct irq_domain *domain,
 
 	return virq;
 }
-EXPORT_SYMBOL_GPL(irq_create_mapping);
+EXPORT_SYMBOL_GPL(irq_create_mapping_affinity);
 
 /**
  * irq_create_strict_mappings() - Map a range of hw irqs to fixed linux irqs
-- 
2.28.0


^ permalink raw reply related

* [PATCH v2 0/2] powerpc/pseries: fix MSI/X IRQ affinity on pseries
From: Laurent Vivier @ 2020-11-25 11:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Laurent Vivier, Michael S . Tsirkin, linux-pci, Greg Kurz,
	linux-block, Paul Mackerras, Marc Zyngier, Thomas Gleixner,
	linuxppc-dev, Christoph Hellwig

With virtio, in multiqueue case, each queue IRQ is normally
bound to a different CPU using the affinity mask.

This works fine on x86_64 but totally ignored on pseries.

This is not obvious at first look because irqbalance is doing
some balancing to improve that.

It appears that the "managed" flag set in the MSI entry
is never copied to the system IRQ entry.

This series passes the affinity mask from rtas_setup_msi_irqs()
to irq_domain_alloc_descs() by adding an affinity parameter to
irq_create_mapping().

The first patch adds the parameter (no functional change), the
second patch passes the actual affinity mask to irq_create_mapping()
in rtas_setup_msi_irqs().

For instance, with 32 CPUs VM and 32 queues virtio-scsi interface:

... -smp 32 -device virtio-scsi-pci,id=virtio_scsi_pci0,num_queues=32

for IRQ in $(grep virtio2-request /proc/interrupts |cut -d: -f1); do
    for file in /proc/irq/$IRQ/ ; do
        echo -n "IRQ: $(basename $file) CPU: " ; cat $file/smp_affinity_list
    done
done

Without the patch (and without irqbalanced)

IRQ: 268 CPU: 0-31
IRQ: 269 CPU: 0-31
IRQ: 270 CPU: 0-31
IRQ: 271 CPU: 0-31
IRQ: 272 CPU: 0-31
IRQ: 273 CPU: 0-31
IRQ: 274 CPU: 0-31
IRQ: 275 CPU: 0-31
IRQ: 276 CPU: 0-31
IRQ: 277 CPU: 0-31
IRQ: 278 CPU: 0-31
IRQ: 279 CPU: 0-31
IRQ: 280 CPU: 0-31
IRQ: 281 CPU: 0-31
IRQ: 282 CPU: 0-31
IRQ: 283 CPU: 0-31
IRQ: 284 CPU: 0-31
IRQ: 285 CPU: 0-31
IRQ: 286 CPU: 0-31
IRQ: 287 CPU: 0-31
IRQ: 288 CPU: 0-31
IRQ: 289 CPU: 0-31
IRQ: 290 CPU: 0-31
IRQ: 291 CPU: 0-31
IRQ: 292 CPU: 0-31
IRQ: 293 CPU: 0-31
IRQ: 294 CPU: 0-31
IRQ: 295 CPU: 0-31
IRQ: 296 CPU: 0-31
IRQ: 297 CPU: 0-31
IRQ: 298 CPU: 0-31
IRQ: 299 CPU: 0-31

With the patch:

IRQ: 265 CPU: 0
IRQ: 266 CPU: 1
IRQ: 267 CPU: 2
IRQ: 268 CPU: 3
IRQ: 269 CPU: 4
IRQ: 270 CPU: 5
IRQ: 271 CPU: 6
IRQ: 272 CPU: 7
IRQ: 273 CPU: 8
IRQ: 274 CPU: 9
IRQ: 275 CPU: 10
IRQ: 276 CPU: 11
IRQ: 277 CPU: 12
IRQ: 278 CPU: 13
IRQ: 279 CPU: 14
IRQ: 280 CPU: 15
IRQ: 281 CPU: 16
IRQ: 282 CPU: 17
IRQ: 283 CPU: 18
IRQ: 284 CPU: 19
IRQ: 285 CPU: 20
IRQ: 286 CPU: 21
IRQ: 287 CPU: 22
IRQ: 288 CPU: 23
IRQ: 289 CPU: 24
IRQ: 290 CPU: 25
IRQ: 291 CPU: 26
IRQ: 292 CPU: 27
IRQ: 293 CPU: 28
IRQ: 294 CPU: 29
IRQ: 295 CPU: 30
IRQ: 299 CPU: 31

This matches what we have on an x86_64 system.

v2: add a wrapper around original irq_create_mapping() with the
    affinity parameter. Update comments

Laurent Vivier (2):
  genirq: add an irq_create_mapping_affinity() function
  powerpc/pseries: pass MSI affinity to irq_create_mapping()

 arch/powerpc/platforms/pseries/msi.c |  3 ++-
 include/linux/irqdomain.h            | 12 ++++++++++--
 kernel/irq/irqdomain.c               | 13 ++++++++-----
 3 files changed, 20 insertions(+), 8 deletions(-)

-- 
2.28.0



^ permalink raw reply

* Re: [PATCH 1/2] powerpc: sstep: Fix load and update instructions
From: Ravi Bangoria @ 2020-11-25 10:09 UTC (permalink / raw)
  To: Sandipan Das
  Cc: Ravi Bangoria, jniethe5, paulus, naveen.n.rao, linuxppc-dev, dja
In-Reply-To: <20201119054139.244083-1-sandipan@linux.ibm.com>


> diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
> index 855457ed09b5..25a5436be6c6 100644
> --- a/arch/powerpc/lib/sstep.c
> +++ b/arch/powerpc/lib/sstep.c
> @@ -2157,11 +2157,15 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
>   
>   		case 23:	/* lwzx */
>   		case 55:	/* lwzux */
> +			if (u && (ra == 0 || ra == rd))
> +				return -1;

I guess you also need to split case 23 and 55?

- Ravi

^ permalink raw reply

* Re: C vdso
From: Christophe Leroy @ 2020-11-25  9:21 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev
In-Reply-To: <87tuteyyxi.fsf@mpe.ellerman.id.au>


Quoting Michael Ellerman <mpe@ellerman.id.au>:

> Christophe Leroy <christophe.leroy@csgroup.eu> writes:
>> Le 03/11/2020 à 19:13, Christophe Leroy a écrit :
>>> Le 23/10/2020 à 15:24, Michael Ellerman a écrit :
>>>> Christophe Leroy <christophe.leroy@csgroup.eu> writes:
>>>>> Le 24/09/2020 à 15:17, Christophe Leroy a écrit :
>>>>>> Le 17/09/2020 à 14:33, Michael Ellerman a écrit :
>>>>>>> Christophe Leroy <christophe.leroy@csgroup.eu> writes:
>>>>>>>>
>>>>>>>> What is the status with the generic C vdso merge ?
>>>>>>>> In some mail, you mentionned having difficulties getting it working on
>>>>>>>> ppc64, any progress ? What's the problem ? Can I help ?
>>>>>>>
>>>>>>> Yeah sorry I was hoping to get time to work on it but haven't been able
>>>>>>> to.
>>>>>>>
>>>>>>> It's causing crashes on ppc64 ie. big endian.
>>>> ...
>>>>>>
>>>>>> Can you tell what defconfig you are using ? I have been able to  
>>>>>> setup a full glibc PPC64 cross
>>>>>> compilation chain and been able to test it under QEMU with  
>>>>>> success, using Nathan's vdsotest tool.
>>>>>
>>>>> What config are you using ?
>>>>
>>>> ppc64_defconfig + guest.config
>>>>
>>>> Or pseries_defconfig.
>>>>
>>>> I'm using Ubuntu GCC 9.3.0 mostly, but it happens with other  
>>>> toolchains too.
>>>>
>>>> At a minimum we're seeing relocations in the output, which is a problem:
>>>>
>>>>    $ readelf -r build\~/arch/powerpc/kernel/vdso64/vdso64.so
>>>>    Relocation section '.rela.dyn' at offset 0x12a8 contains 8 entries:
>>>>      Offset          Info           Type           Sym. Value     
>>>> Sym. Name + Addend
>>>>    000000001368  000000000016 R_PPC64_RELATIVE                     7c0
>>>>    000000001370  000000000016 R_PPC64_RELATIVE                     9300
>>>>    000000001380  000000000016 R_PPC64_RELATIVE                     970
>>>>    000000001388  000000000016 R_PPC64_RELATIVE                     9300
>>>>    000000001398  000000000016 R_PPC64_RELATIVE                     a90
>>>>    0000000013a0  000000000016 R_PPC64_RELATIVE                     9300
>>>>    0000000013b0  000000000016 R_PPC64_RELATIVE                     b20
>>>>    0000000013b8  000000000016 R_PPC64_RELATIVE                     9300
>>>
>>> Looks like it's due to the OPD and relation between the function()  
>>> and .function()
>>>
>>> By using DOTSYM() in the 'bl' call, that's directly the dot  
>>> function which is called and the OPD is
>>> not used anymore, it can get dropped.
>>>
>>> Now I get .rela.dyn full of 0, don't know if we should drop it explicitely.
>>
>> What is the status now with latest version of CVDSO ? I saw you had  
>> it in next-test for some time,
>> it is not there anymore today.
>
> Still having some trouble with the compat VDSO.
>
> eg:
>
> $ ./vdsotest clock-gettime-monotonic verify
> timestamp obtained from kernel predates timestamp
> previously obtained from libc/vDSO:
> 	[1346, 821441653] (vDSO)
> 	[570, 769440040] (kernel)
>
>
> And similar for all clocks except the coarse ones.
>

Ok, I managed to get the same with QEMU. Looking at the binary, I only  
see an mftb instead of the mftbu/mftb/mftbu triplet.

Fix below. Can you carry it, or do you prefer a full patch from me ?  
The easiest would be either to squash it into [v13,4/8]  
("powerpc/time: Move timebase functions into new asm/timebase.h"), or  
to add it between patch 4 and 5 ?

diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index f877a576b338..c3473eb031a3 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -1419,7 +1419,7 @@ static inline void msr_check_and_clear(unsigned  
long bits)
  		__msr_check_and_clear(bits);
  }

-#if defined(CONFIG_PPC_CELL) || defined(CONFIG_E500)
+#if defined(__powerpc64__) && (defined(CONFIG_PPC_CELL) ||  
defined(CONFIG_E500))
  #define mftb()		({unsigned long rval;				\
  			asm volatile(					\
  				"90:	mfspr %0, %2;\n"		\
diff --git a/arch/powerpc/include/asm/timebase.h  
b/arch/powerpc/include/asm/timebase.h
index a8eae3adaa91..7b372976f5a5 100644
--- a/arch/powerpc/include/asm/timebase.h
+++ b/arch/powerpc/include/asm/timebase.h
@@ -21,7 +21,7 @@ static inline u64 get_tb(void)
  {
  	unsigned int tbhi, tblo, tbhi2;

-	if (IS_ENABLED(CONFIG_PPC64))
+	if (IS_BUILTIN(__powerpc64__))
  		return mftb();

  	do {


^ permalink raw reply related

* Re: [PATCH v4 10/18] dt-bindings: usb: Convert DWC USB3 bindings to DT schema
From: Serge Semin @ 2020-11-25  8:32 UTC (permalink / raw)
  To: Rob Herring
  Cc: Neil Armstrong, Bjorn Andersson, Pavel Parkhomenko, Kevin Hilman,
	Krzysztof Kozlowski, Andy Gross, Chunfeng Yun, linux-snps-arc,
	devicetree, Mathias Nyman, Martin Blumenstingl, Lad Prabhakar,
	Alexey Malahov, linux-arm-kernel, Roger Quadros, Felipe Balbi,
	Greg Kroah-Hartman, Yoshihiro Shimoda, linux-usb, linux-mips,
	Serge Semin, linux-kernel, Manu Gautam, linuxppc-dev
In-Reply-To: <20201121124228.GA2039998@robh.at.kernel.org>

On Sat, Nov 21, 2020 at 06:42:28AM -0600, Rob Herring wrote:
> On Thu, Nov 12, 2020 at 01:29:46PM +0300, Serge Semin wrote:
> > On Wed, Nov 11, 2020 at 02:14:23PM -0600, Rob Herring wrote:
> > > On Wed, Nov 11, 2020 at 12:08:45PM +0300, Serge Semin wrote:
> > > > DWC USB3 DT node is supposed to be compliant with the Generic xHCI
> > > > Controller schema, but with additional vendor-specific properties, the
> > > > controller-specific reference clocks and PHYs. So let's convert the
> > > > currently available legacy text-based DWC USB3 bindings to the DT schema
> > > > and make sure the DWC USB3 nodes are also validated against the
> > > > usb-xhci.yaml schema.
> > > > 
> > > > Note we have to discard the nodename restriction of being prefixed with
> > > > "dwc3@" string, since in accordance with the usb-hcd.yaml schema USB nodes
> > > > are supposed to be named as "^usb(@.*)".
> > > > 
> > > > Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
> > > > 
> > > > ---
> > > > 
> > > > Changelog v2:
> > > > - Discard '|' from the descriptions, since we don't need to preserve
> > > >   the text formatting in any of them.
> > > > - Drop quotes from around the string constants.
> > > > - Fix the "clock-names" prop description to be referring the enumerated
> > > >   clock-names instead of the ones from the Databook.
> > > > 
> > > > Changelog v3:
> > > > - Apply usb-xhci.yaml# schema only if the controller is supposed to work
> > > >   as either host or otg.
> > > > 
> > > > Changelog v4:
> > > > - Apply usb-drd.yaml schema first. If the controller is configured
> > > >   to work in a gadget mode only, then apply the usb.yaml schema too,
> > > >   otherwise apply the usb-xhci.yaml schema.
> > > > - Discard the Rob'es Reviewed-by tag. Please review the patch one more
> > > >   time.
> > > > ---
> > > >  .../devicetree/bindings/usb/dwc3.txt          | 125 --------
> > > >  .../devicetree/bindings/usb/snps,dwc3.yaml    | 303 ++++++++++++++++++
> > > >  2 files changed, 303 insertions(+), 125 deletions(-)
> > > >  delete mode 100644 Documentation/devicetree/bindings/usb/dwc3.txt
> > > >  create mode 100644 Documentation/devicetree/bindings/usb/snps,dwc3.yaml
> 
> 
> > > > diff --git a/Documentation/devicetree/bindings/usb/snps,dwc3.yaml b/Documentation/devicetree/bindings/usb/snps,dwc3.yaml
> > > > new file mode 100644
> > > > index 000000000000..079617891da6
> > > > --- /dev/null
> > > > +++ b/Documentation/devicetree/bindings/usb/snps,dwc3.yaml
> > > > @@ -0,0 +1,303 @@
> > > > +# SPDX-License-Identifier: GPL-2.0
> > > > +%YAML 1.2
> > > > +---
> > > > +$id: http://devicetree.org/schemas/usb/snps,dwc3.yaml#
> > > > +$schema: http://devicetree.org/meta-schemas/core.yaml#
> > > > +
> > > > +title: Synopsys DesignWare USB3 Controller
> > > > +
> > > > +maintainers:
> > > > +  - Felipe Balbi <balbi@kernel.org>
> > > > +
> > > > +description:
> > > > +  This is usually a subnode to DWC3 glue to which it is connected, but can also
> > > > +  be presented as a standalone DT node with an optional vendor-specific
> > > > +  compatible string.
> > > > +
> > 
> > > > +allOf:
> > > > +  - $ref: usb-drd.yaml#
> > > > +  - if:
> > > > +      properties:
> > > > +        dr_mode:
> > > > +          const: peripheral
> 

> Another thing, this evaluates to true if dr_mode is not present. You 
> need to add 'required'?

Right. Will something like this do that?

+ allOf:
+  - $ref: usb-drd.yaml#
+  - if:
+      properties:
+        dr_mode:
+          const: peripheral
+ 
+      required:
+        - dr_mode
+    then:
+      $ref: usb.yaml#
+    else
+      $ref: usb-xhci.yaml#

> If dr_mode is otg, then don't you need to apply 
> both usb.yaml and usb-xhci.yaml?

No I don't. Since there is no peripheral-specific DT schema, then the
only schema any USB-gadget node needs to pass is usb.yaml, which
is already included into the usb-xhci.yaml schema. So for pure OTG devices
with xHCI host and gadget capabilities it's enough to evaluate: allOf:
[$ref: usb-drd.yaml#, $ref: usb-xhci.yaml#].  Please see the
sketch/ASCII-figure below and the following text for details.

-Sergey

> 
> > > > +    then:
> > > > +      $ref: usb.yaml#
> > > 
> > > This part could be done in usb-drd.yaml?
> > 
> > Originally I was thinking about that, but then in order to minimize
> > the properties validation I've decided to split the properties in
> > accordance with the USB controllers functionality:
> > 
> >             +----- USB Gadget/Peripheral Controller. There is no
> >             |      specific schema for the gadgets since there is no
> >             |      common gadget properties (at least I failed to find
> >             |      ones). So the pure gadget controllers need to be
> >             |      validated just against usb.yaml schema.
> >             |
> > usb.yaml <--+-- usb-hcd.yaml - Generic USB Host Controller. The schema
> >                 ^              turns out to include the OHCI/UHCI/EHCI
> >                 |              properties, which AFAICS are also
> >                 |              applicable for the other host controllers.
> >                 |              So any USB host controller node needs to
> >                 |              be validated against this schema.
> >                 |
> >                 +- usb-xhci.yaml - Generic xHCI Host controller.
> > 
> > usb-drd.yaml -- USB Dual-Role/OTG Controllers. It describes the
> >                 DRD/OTG-specific properties and nothing else. So normally
> >                 it should be applied together with one of the
> >                 schemas described above.
> > 
> > So the use-cases of the suggested schemas is following:
> > 
> > 1) USB Controller is pure gadget? Then:
> >    + allOf:
> >    +  - $ref: usb.yaml#
> > 2) USB Controller is pure USB host (including OHCI/UHCI/EHCI)?
> >    + allOf:
> >    +   - $ref: usb-hcd.yaml#
> >    Note this prevents us from fixing all the currently available USB DT
> >    schemas, which already apply the usb-hcd.yaml schema.
> > 3) USB Controller is pure xHCI host controller? Then:
> >    + allOf:
> >    +   - $ref: usb-xhci.yaml#
> > 4) USB Controller is Dual-Role/OTG controller with USB 2.0 host? Then:
> >    + allOf:
> >    +   - $ref: usb-drd.yaml#
> >    +   - $ref: usb-hcd.yaml#
> > 5) USB Controller is Dual-Role/OTG controller with xHCI host? Then:
> >    + allOf:
> >    +   - $ref: usb-drd.yaml#
> >    +   - $ref: usb-xhci.yaml#
> > 6) USB Controller is Dual-Role/OTG controller which can only be a
> >    gadget? Then:
> >    + allOf:
> >    +   - $ref: usb-drd.yaml#
> >    +   - $ref: usb.yaml#
> > 
> > * Don't know really if controllers like in 6)-th really exist. Most
> > * likely they are still internally capable of dual-roling, but due to
> > * some conditions can be used as gadgets only.
> > 
> > It looks a bit complicated, but at least by having such design we'd minimize
> > the number of properties validation.
> > 

[...]

^ permalink raw reply

* Re: [PATCH 1/3] perf/core: Flush PMU internal buffers for per-CPU events
From: Michael Ellerman @ 2020-11-25  8:12 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Ian Rogers, Andi Kleen, Peter Zijlstra, linuxppc-dev,
	linux-kernel, Stephane Eranian, Paul Mackerras,
	Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar, Gabriel Marin,
	Liang, Kan
In-Reply-To: <CAM9d7cg8kYMyPHQK_rhEiYQaSddqqt93=pLVNKJm8Y6F=if9ow@mail.gmail.com>

Namhyung Kim <namhyung@kernel.org> writes:
> Hello,
>
> On Mon, Nov 23, 2020 at 8:00 PM Michael Ellerman <mpe@ellerman.id.au> wrote:
>>
>> Namhyung Kim <namhyung@kernel.org> writes:
>> > Hi Peter and Kan,
>> >
>> > (Adding PPC folks)
>> >
>> > On Tue, Nov 17, 2020 at 2:01 PM Namhyung Kim <namhyung@kernel.org> wrote:
>> >>
>> >> Hello,
>> >>
>> >> On Thu, Nov 12, 2020 at 4:54 AM Liang, Kan <kan.liang@linux.intel.com> wrote:
>> >> >
>> >> >
>> >> >
>> >> > On 11/11/2020 11:25 AM, Peter Zijlstra wrote:
>> >> > > On Mon, Nov 09, 2020 at 09:49:31AM -0500, Liang, Kan wrote:
>> >> > >
>> >> > >> - When the large PEBS was introduced (9c964efa4330), the sched_task() should
>> >> > >> be invoked to flush the PEBS buffer in each context switch. However, The
>> >> > >> perf_sched_events in account_event() is not updated accordingly. The
>> >> > >> perf_event_task_sched_* never be invoked for a pure per-CPU context. Only
>> >> > >> per-task event works.
>> >> > >>     At that time, the perf_pmu_sched_task() is outside of
>> >> > >> perf_event_context_sched_in/out. It means that perf has to double
>> >> > >> perf_pmu_disable() for per-task event.
>> >> > >
>> >> > >> - The patch 1 tries to fix broken per-CPU events. The CPU context cannot be
>> >> > >> retrieved from the task->perf_event_ctxp. So it has to be tracked in the
>> >> > >> sched_cb_list. Yes, the code is very similar to the original codes, but it
>> >> > >> is actually the new code for per-CPU events. The optimization for per-task
>> >> > >> events is still kept.
>> >> > >>    For the case, which has both a CPU context and a task context, yes, the
>> >> > >> __perf_pmu_sched_task() in this patch is not invoked. Because the
>> >> > >> sched_task() only need to be invoked once in a context switch. The
>> >> > >> sched_task() will be eventually invoked in the task context.
>> >> > >
>> >> > > The thing is; your first two patches rely on PERF_ATTACH_SCHED_CB and
>> >> > > only set that for large pebs. Are you sure the other users (Intel LBR
>> >> > > and PowerPC BHRB) don't need it?
>> >> >
>> >> > I didn't set it for LBR, because the perf_sched_events is always enabled
>> >> > for LBR. But, yes, we should explicitly set the PERF_ATTACH_SCHED_CB
>> >> > for LBR.
>> >> >
>> >> >         if (has_branch_stack(event))
>> >> >                 inc = true;
>> >> >
>> >> > >
>> >> > > If they indeed do not require the pmu::sched_task() callback for CPU
>> >> > > events, then I still think the whole perf_sched_cb_{inc,dec}() interface
>> >> >
>> >> > No, LBR requires the pmu::sched_task() callback for CPU events.
>> >> >
>> >> > Now, The LBR registers have to be reset in sched in even for CPU events.
>> >> >
>> >> > To fix the shorter LBR callstack issue for CPU events, we also need to
>> >> > save/restore LBRs in pmu::sched_task().
>> >> > https://lore.kernel.org/lkml/1578495789-95006-4-git-send-email-kan.liang@linux.intel.com/
>> >> >
>> >> > > is confusing at best.
>> >> > >
>> >> > > Can't we do something like this instead?
>> >> > >
>> >> > I think the below patch may have two issues.
>> >> > - PERF_ATTACH_SCHED_CB is required for LBR (maybe PowerPC BHRB as well) now.
>> >> > - We may disable the large PEBS later if not all PEBS events support
>> >> > large PEBS. The PMU need a way to notify the generic code to decrease
>> >> > the nr_sched_task.
>> >>
>> >> Any updates on this?  I've reviewed and tested Kan's patches
>> >> and they all look good.
>> >>
>> >> Maybe we can talk to PPC folks to confirm the BHRB case?
>> >
>> > Can we move this forward?  I saw patch 3/3 also adds PERF_ATTACH_SCHED_CB
>> > for PowerPC too.  But it'd be nice if ppc folks can confirm the change.
>>
>> Sorry I've read the whole thread, but I'm still not entirely sure I
>> understand the question.
>
> Thanks for your time and sorry about not being clear enough.
>
> We found per-cpu events are not calling pmu::sched_task()
> on context switches.  So PERF_ATTACH_SCHED_CB was
> added to indicate the core logic that it needs to invoke the
> callback.

OK. TBH I've never thought of using branch stack with a per-cpu event,
but I guess you can do it.

I think the same logic applies as LBR, we need to read the BHRB entries
in the context of the task that they were recorded for.

> The patch 3/3 added the flag to PPC (for BHRB) with other
> changes (I think it should be split like in the patch 2/3) and
> want to get ACKs from the PPC folks.

If you post a new version with Maddy's comments addressed then he or I
can ack it.

cheers

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox