* [PATCH devicetree 3/4] powerpc: dts: t1040rdb: put SGMII PHY under &mdio0 label
From: Vladimir Oltean @ 2020-07-22 17:24 UTC (permalink / raw)
To: robh+dt, shawnguo, mpe, devicetree
Cc: madalin.bucur, linux-kernel, radu-andrei.bulie, fido_max, paulus,
netdev, linuxppc-dev
In-Reply-To: <20200722172422.2590489-1-olteanv@gmail.com>
We're going to add 8 more PHYs in a future patch. It is easier to follow
the hardware description if we don't need to fish for the path of the
MDIO controllers inside the SoC and just use the labels.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
---
arch/powerpc/boot/dts/fsl/t1040rdb.dts | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/arch/powerpc/boot/dts/fsl/t1040rdb.dts b/arch/powerpc/boot/dts/fsl/t1040rdb.dts
index 65ff34c49025..40d7126dbe90 100644
--- a/arch/powerpc/boot/dts/fsl/t1040rdb.dts
+++ b/arch/powerpc/boot/dts/fsl/t1040rdb.dts
@@ -59,12 +59,6 @@ ethernet@e4000 {
phy-handle = <&phy_sgmii_2>;
phy-connection-type = "sgmii";
};
-
- mdio@fc000 {
- phy_sgmii_2: ethernet-phy@3 {
- reg = <0x03>;
- };
- };
};
};
@@ -76,3 +70,9 @@ cpld@3,0 {
};
#include "t1040si-post.dtsi"
+
+&mdio0 {
+ phy_sgmii_2: ethernet-phy@3 {
+ reg = <0x3>;
+ };
+};
--
2.25.1
^ permalink raw reply related
* [PATCH devicetree 0/4] Add Seville Ethernet switch to T1040RDB
From: Vladimir Oltean @ 2020-07-22 17:24 UTC (permalink / raw)
To: robh+dt, shawnguo, mpe, devicetree
Cc: madalin.bucur, linux-kernel, radu-andrei.bulie, fido_max, paulus,
netdev, linuxppc-dev
Seville is a DSA switch that is embedded inside the T1040 SoC, and
supported by the mscc_seville DSA driver. The driver has been accepted
this release cycle and is currently available in net-next (and
therefore, in linux-next).
This series adds this switch to the SoC's dtsi files and to the T1040RDB
board file.
Vladimir Oltean (4):
powerpc: dts: t1040: add bindings for Seville Ethernet switch
powerpc: dts: t1040: label the 2 MDIO controllers
powerpc: dts: t1040rdb: put SGMII PHY under &mdio0 label
powerpc: dts: t1040rdb: add ports for Seville Ethernet switch
arch/powerpc/boot/dts/fsl/t1040rdb.dts | 123 +++++++++++++++++++-
arch/powerpc/boot/dts/fsl/t1040si-post.dtsi | 79 ++++++++++++-
2 files changed, 194 insertions(+), 8 deletions(-)
--
2.25.1
^ permalink raw reply
* [PATCH devicetree 1/4] powerpc: dts: t1040: add bindings for Seville Ethernet switch
From: Vladimir Oltean @ 2020-07-22 17:24 UTC (permalink / raw)
To: robh+dt, shawnguo, mpe, devicetree
Cc: madalin.bucur, linux-kernel, radu-andrei.bulie, fido_max, paulus,
netdev, linuxppc-dev
In-Reply-To: <20200722172422.2590489-1-olteanv@gmail.com>
Add the description of the embedded L2 switch inside the SoC dtsi file
for NXP T1040.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
---
arch/powerpc/boot/dts/fsl/t1040si-post.dtsi | 75 +++++++++++++++++++++
1 file changed, 75 insertions(+)
diff --git a/arch/powerpc/boot/dts/fsl/t1040si-post.dtsi b/arch/powerpc/boot/dts/fsl/t1040si-post.dtsi
index 315d0557eefc..4af856dcc6a3 100644
--- a/arch/powerpc/boot/dts/fsl/t1040si-post.dtsi
+++ b/arch/powerpc/boot/dts/fsl/t1040si-post.dtsi
@@ -628,6 +628,81 @@ mdio@fd000 {
status = "disabled";
};
};
+
+ seville_switch: ethernet-switch@800000 {
+ #address-cells = <1>;
+ #size-cells = <0>;
+ compatible = "mscc,vsc9953-switch";
+ little-endian;
+ reg = <0x800000 0x290000>;
+
+ ports {
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ seville_port0: port@0 {
+ reg = <0>;
+ status = "disabled";
+ };
+
+ seville_port1: port@1 {
+ reg = <1>;
+ status = "disabled";
+ };
+
+ seville_port2: port@2 {
+ reg = <2>;
+ status = "disabled";
+ };
+
+ seville_port3: port@3 {
+ reg = <3>;
+ status = "disabled";
+ };
+
+ seville_port4: port@4 {
+ reg = <4>;
+ status = "disabled";
+ };
+
+ seville_port5: port@5 {
+ reg = <5>;
+ status = "disabled";
+ };
+
+ seville_port6: port@6 {
+ reg = <6>;
+ status = "disabled";
+ };
+
+ seville_port7: port@7 {
+ reg = <7>;
+ status = "disabled";
+ };
+
+ seville_port8: port@8 {
+ reg = <8>;
+ phy-mode = "internal";
+ status = "disabled";
+
+ fixed-link {
+ speed = <2500>;
+ full-duplex;
+ };
+ };
+
+ seville_port9: port@9 {
+ reg = <9>;
+ phy-mode = "internal";
+ status = "disabled";
+
+ fixed-link {
+ speed = <2500>;
+ full-duplex;
+ };
+ };
+ };
+ };
};
&qe {
--
2.25.1
^ permalink raw reply related
* Re: [PATCH v5 1/4] riscv: Move kernel mapping to vmalloc zone
From: Atish Patra @ 2020-07-22 21:05 UTC (permalink / raw)
To: Arnd Bergmann
Cc: Albert Ou, Alexandre Ghiti, Linux-MM, Anup Patel,
linux-kernel@vger.kernel.org, Atish Patra, Palmer Dabbelt,
Zong Li, Paul Walmsley, Paul Mackerras, linux-riscv, linuxppc-dev
In-Reply-To: <CAK8P3a2VHXDLK6iba=NxSQ-t=9P7LSwzwx3XrK=N=M+qoX_oeQ@mail.gmail.com>
On Wed, Jul 22, 2020 at 1:23 PM Arnd Bergmann <arnd@arndb.de> wrote:
>
> On Wed, Jul 22, 2020 at 9:52 PM Palmer Dabbelt <palmer@dabbelt.com> wrote:
> > On Wed, 22 Jul 2020 02:43:50 PDT (-0700), Arnd Bergmann wrote:
> > > On Tue, Jul 21, 2020 at 9:06 PM Palmer Dabbelt <palmer@dabbelt.com> wrote:
> > > The eventual goal is to have a split of 3840MB for either user or linear map
> > > plus and 256MB for vmalloc, including the kernel. Switching between linear
> > > and user has a noticeable runtime overhead, but it relaxes both the limits
> > > for user memory and lowmem, and it provides a somewhat stronger
> > > address space isolation.
> >
> > Ya, I think we decided not to do that, at least for now. I guess the right
> > answer there will depend on what 32-bit systems look like, and since we don't
> > have any I'm inclined to just stick to the fast option.
>
> Makes sense. Actually on 32-bit Arm we see fewer large-memory
> configurations in new machines than we had in the past before 64-bit
> machines were widely available at low cost, so I expect not to see a
> lot new hardware with more than 1GB of DDR3 (two 256Mbit x16 chips)
> for cost reasons, and rv32 is likely going to be similar, so you may never
> really see a need for highmem or the above hack to increase the
> size of the linear mapping.
>
> I just noticed that rv32 allows 2GB of lowmem rather than just the usual
> 768MB or 1GB, at the expense of addressable user memory. This seems
> like an unusual choice, but I also don't see any reason to change this
> or make it more flexible unless actual users appear.
>
I am a bit confused here. As per my understanding, RV32 supports 1GB
of lowmem only
as the page offset is set to 0xC0000000. The config option
MAXPHYSMEM_2GB is misleading
as RV32 actually allows 1GB of physical memory only. Any memory blocks beyond
DRAM + 1GB are removed in setup_bootmem. IMHO, The current config
should clarify that.
Moreover, we should add 2G split under a separate configuration if we
want to support that.
> Arnd
>
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv
--
Regards,
Atish
^ permalink raw reply
* Re: [PATCH v5 1/4] riscv: Move kernel mapping to vmalloc zone
From: Arnd Bergmann @ 2020-07-22 20:22 UTC (permalink / raw)
To: Palmer Dabbelt
Cc: Albert Ou, Alexandre Ghiti, Atish Patra, Anup Patel,
linux-kernel@vger.kernel.org, Paul Walmsley, Linux-MM,
Paul Mackerras, Zong Li, linux-riscv, linuxppc-dev
In-Reply-To: <mhng-820ebe55-b4a3-4ab3-b848-6d3551b43091@palmerdabbelt-glaptop1>
On Wed, Jul 22, 2020 at 9:52 PM Palmer Dabbelt <palmer@dabbelt.com> wrote:
> On Wed, 22 Jul 2020 02:43:50 PDT (-0700), Arnd Bergmann wrote:
> > On Tue, Jul 21, 2020 at 9:06 PM Palmer Dabbelt <palmer@dabbelt.com> wrote:
> > The eventual goal is to have a split of 3840MB for either user or linear map
> > plus and 256MB for vmalloc, including the kernel. Switching between linear
> > and user has a noticeable runtime overhead, but it relaxes both the limits
> > for user memory and lowmem, and it provides a somewhat stronger
> > address space isolation.
>
> Ya, I think we decided not to do that, at least for now. I guess the right
> answer there will depend on what 32-bit systems look like, and since we don't
> have any I'm inclined to just stick to the fast option.
Makes sense. Actually on 32-bit Arm we see fewer large-memory
configurations in new machines than we had in the past before 64-bit
machines were widely available at low cost, so I expect not to see a
lot new hardware with more than 1GB of DDR3 (two 256Mbit x16 chips)
for cost reasons, and rv32 is likely going to be similar, so you may never
really see a need for highmem or the above hack to increase the
size of the linear mapping.
I just noticed that rv32 allows 2GB of lowmem rather than just the usual
768MB or 1GB, at the expense of addressable user memory. This seems
like an unusual choice, but I also don't see any reason to change this
or make it more flexible unless actual users appear.
Arnd
^ permalink raw reply
* Re: [PATCH v5 1/4] riscv: Move kernel mapping to vmalloc zone
From: Palmer Dabbelt @ 2020-07-22 19:52 UTC (permalink / raw)
To: Arnd Bergmann
Cc: aou, alex, Atish Patra, Anup Patel, linux-kernel, Paul Walmsley,
linux-mm, paulus, zong.li, linux-riscv, linuxppc-dev
In-Reply-To: <CAK8P3a34sT2bQbkZUjaxaShzCkn+s35pXxS0UNhqGFu+t2hZYw@mail.gmail.com>
On Wed, 22 Jul 2020 02:43:50 PDT (-0700), Arnd Bergmann wrote:
> On Tue, Jul 21, 2020 at 9:06 PM Palmer Dabbelt <palmer@dabbelt.com> wrote:
>>
>> On Tue, 21 Jul 2020 11:36:10 PDT (-0700), alex@ghiti.fr wrote:
>> > Let's try to make progress here: I add linux-mm in CC to get feedback on
>> > this patch as it blocks sv48 support too.
>>
>> Sorry for being slow here. I haven't replied because I hadn't really fleshed
>> out the design yet, but just so everyone's on the same page my problems with
>> this are:
>>
>> * We waste vmalloc space on 32-bit systems, where there isn't a lot of it.
>
> There is actually an ongoing work to make 32-bit Arm kernels move
> vmlinux into the vmalloc space, as part of the move to avoid highmem.
>
> Overall, a 32-bit system would waste about 0.1% of its virtual address space
> by having the kernel be located in both the linear map and the vmalloc area.
> It's not zero, but not that bad either. With the typical split of 3072 MB user,
> 768MB linear and 256MB vmalloc, it's also around 1.5% of the available
> vmalloc area (assuming a 4MB vmlinux in a typical 32-bit kernel), but the
> boundaries can be changed arbitrarily if needed.
OK, I guess maybe it's not so bad. Our 32-bit defconfig is 10MiB, but I
wouldn't really put much weight behind that number as it's just a 64-bit
defconfig built for 32-bit. We don't have any 32-bit hardware anyway, so if
this becomes an issue later I guess we can just deal with it then.
> The eventual goal is to have a split of 3840MB for either user or linear map
> plus and 256MB for vmalloc, including the kernel. Switching between linear
> and user has a noticeable runtime overhead, but it relaxes both the limits
> for user memory and lowmem, and it provides a somewhat stronger
> address space isolation.
Ya, I think we decided not to do that, at least for now. I guess the right
answer there will depend on what 32-bit systems look like, and since we don't
have any I'm inclined to just stick to the fast option.
> Another potential idea would be to completely randomize the physical
> addresses underneath the kernel by using a random permutation of the
> pages in the kernel image. This adds even more overhead (virt_to_phys
> may need to call vmalloc_to_page or similar) and may cause problems
> with DMA into kernel .data across page boundaries,
>
>> * Sort out how to maintain a linear map as the canonical hole moves around
>> between the VA widths without adding a bunch of overhead to the virt2phys and
>> friends. This is probably going to be the trickiest part, but I think if we
>> just change the page table code to essentially lie about VAs when an sv39
>> system runs an sv48+sv39 kernel we could make it work -- there'd be some
>> logical complexity involved, but it would remain fast.
>
> I assume you can't use the trick that x86 has where all kernel addresses
> are at the top of the 64-bit address space and user addresses are at the
> bottom, regardless of the size of the page tables?
They have the load in their mapping functions, as far as I can tell that's
required to do this sort of thing. We do as well to handle some of the
implicit boot stuff for now, but I was assuming that we'd want to get rid of
that for performance reasons. That said, maybe it just doesn't matter?
^ permalink raw reply
* Re: [PATCH v4 07/12] ppc64/kexec_file: add support to relocate purgatory
From: Hari Bathini @ 2020-07-22 17:33 UTC (permalink / raw)
To: Michael Ellerman, Andrew Morton
Cc: kernel test robot, Pingfan Liu, Kexec-ml, Nayna Jain,
Petr Tesarik, Mahesh J Salgaonkar, Mimi Zohar, lkml, linuxppc-dev,
Sourabh Jain, Vivek Goyal, Dave Young, Thiago Jung Bauermann,
Eric Biederman
In-Reply-To: <871rl4rxao.fsf@mpe.ellerman.id.au>
On 22/07/20 9:55 am, Michael Ellerman wrote:
> Hari Bathini <hbathini@linux.ibm.com> writes:
>> Right now purgatory implementation is only minimal. But if purgatory
>> code is to be enhanced to copy memory to the backup region and verify
>> sha256 digest, relocations may have to be applied to the purgatory.
>> So, add support to relocate purgatory in kexec_file_load system call
>> by setting up TOC pointer and applying RELA relocations as needed.
>>
>> Reported-by: kernel test robot <lkp@intel.com>
>> [lkp: In v1, 'struct mem_sym' was declared in parameter list]
>> Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
>> ---
>>
>> * Michael, can you share your opinion on the below:
>> - https://lore.kernel.org/patchwork/patch/1272027/
>> - My intention in cover note.
>
> It seems like a lot of complexity for little benefit.
>
> AFAICS your final purgatory_64.c is only 36 lines, and all it does is a
> single (open coded) memcpy().
>
> It seems like we could write that in not many more lines of assembler
> and avoid all this code.
Hi Michael,
I am not sure if you would agree with me on this, but I am looking at the
purgatory code as work in progress. As mentioned in the cover note, I intend
to add log messaging, sha256 verification into purgatory. And also change it
to position independent executable after moving common purgatory code (sha256
verification) to arch-independent code.
When I initially took this up, I wanted to add all the above changes too, but
cut down on it, in the interest of time, first to get kdump (kexec -s -p)
working in v5.9 merge window.
But as the logic in patches 07/12 & 08/12 has been tested in kexec-tools code
a lot of times and there are unlikely to be any changes to them except for
__kexec_do_relocs() function (afaics), when -PIE would be used, I submitted them.
With patch 09/12, I tried for a change that uses relocations while is minimal
for now.
Would you prefer it to be absolutely minimal by dropping patches 7 & 8 for
now and writing the backup data copy code in assembler?
Thanks
Hari
^ permalink raw reply
* Re: [PATCH v3 0/2] Selftest for cpuidle latency measurement
From: Pratik Sampat @ 2020-07-22 15:33 UTC (permalink / raw)
To: Daniel Lezcano, rjw, mpe, benh, paulus, srivatsa, shuah, npiggin,
ego, svaidy, pratik.r.sampat, linux-pm, linuxppc-dev,
linux-kernel, linux-kselftest
In-Reply-To: <17e884b8-09d8-98a8-3890-bf506d2cdfca@linaro.org>
Hello Daniel,
On 21/07/20 8:27 pm, Daniel Lezcano wrote:
> On 21/07/2020 14:42, Pratik Rajesh Sampat wrote:
>> v2: https://lkml.org/lkml/2020/7/17/369
>> Changelog v2-->v3
>> Based on comments from Gautham R. Shenoy adding the following in the
>> selftest,
>> 1. Grepping modules to determine if already loaded
>> 2. Wrapper to enable/disable states
>> 3. Preventing any operation/test on offlined CPUs
>> ---
>>
>> The patch series introduces a mechanism to measure wakeup latency for
>> IPI and timer based interrupts
>> The motivation behind this series is to find significant deviations
>> behind advertised latency and resisdency values
> Why do you want to measure for the timer and the IPI ? Whatever the
> source of the wakeup, the exit latency remains the same, no ?
>
> Is all this kernel-ish code really needed ?
>
> What about using a highres periodic timer and make it expires every eg.
> 50ms x 2400, so it is 120 secondes and measure the deviation. Repeat the
> operation for each idle states.
>
> And in order to make it as much accurate as possible, set the program
> affinity on a CPU and isolate this one by preventing other processes to
> be scheduled on and migrate the interrupts on the other CPUs.
>
> That will be all userspace code, no?
>
>
The kernel module may not needed now that you mention it.
IPI latencies could be measured using pipes and threads using
pthread_attr_setaffinity_np to control the experiment, as you
suggested. This should internally fire a smp_call_function_single.
The original idea was to essentially measure it as closely as possible
in the kernel without involving the kernel-->userspace overhead.
However, the user-space approach may not be too much of a problem as
we are collecting a baseline and the delta of the latency is what we
would be concerned about anyways!
With respect to measuring both timers and IPI latencies: In principle
yes, the exit latency should remain the same but if there is a
deviation in reality we may want to measure it.
I'll implement this experiment in the userspace and get back with the
numbers to confirm.
Thanks for your comments!
Best,
Pratik
>
>
^ permalink raw reply
* Re: [PATCH] spi: ppc4xx: Convert to use GPIO descriptors
From: Mark Brown @ 2020-07-22 13:45 UTC (permalink / raw)
To: linux-spi, Linus Walleij; +Cc: linuxppc-dev
In-Reply-To: <20200714072226.26071-1-linus.walleij@linaro.org>
On Tue, 14 Jul 2020 09:22:26 +0200, Linus Walleij wrote:
> This converts the PPC4xx SPI driver to use GPIO descriptors.
>
> The driver is already just picking some GPIOs from the device
> tree so the conversion is pretty straight forward. However
> this driver is looking form a pure "gpios" property rather
> than the standard binding "cs-gpios" so we need to add a new
> exception to the gpiolib OF parser to allow this for this
> driver's compatibles.
Applied to
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi.git for-next
Thanks!
[1/1] spi: ppc4xx: Convert to use GPIO descriptors
commit: 4726773292bfdb2917a0b4d369ddccd5e2f30991
All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.
You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.
If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.
Please add any relevant lists and maintainers to the CCs when replying
to this mail.
Thanks,
Mark
^ permalink raw reply
* RE: [RFC PATCH] powerpc/pseries/svm: capture instruction faulting on MMIO access, in sprg0 register
From: Michael Ellerman @ 2020-07-22 12:45 UTC (permalink / raw)
To: Ram Pai
Cc: ldufour, aik, kvm-ppc, bharata, sathnaga, sukadev, linuxppc-dev,
bauerman, david
In-Reply-To: <20200722074929.GI7339@oc0525413822.ibm.com>
Ram Pai <linuxram@us.ibm.com> writes:
> On Wed, Jul 22, 2020 at 12:06:06PM +1000, Michael Ellerman wrote:
>> Ram Pai <linuxram@us.ibm.com> writes:
>> > An instruction accessing a mmio address, generates a HDSI fault. This fault is
>> > appropriately handled by the Hypervisor. However in the case of secureVMs, the
>> > fault is delivered to the ultravisor.
>> >
>> > Unfortunately the Ultravisor has no correct-way to fetch the faulting
>> > instruction. The PEF architecture does not allow Ultravisor to enable MMU
>> > translation. Walking the two level page table to read the instruction can race
>> > with other vcpus modifying the SVM's process scoped page table.
>>
>> You're trying to read the guest's kernel text IIUC, that mapping should
>> be stable. Possibly permissions on it could change over time, but the
>> virtual -> real mapping should not.
>
> Actually the code does not capture the address of the instruction in the
> sprg0 register. It captures the instruction itself. So should the mapping
> matter?
Sorry that was talking about reading the instruction by doing the page
walk, not with this patch applied.
cheers
^ permalink raw reply
* Re: [v3 07/15] powerpc/perf: Add power10_feat to dt_cpu_ftrs
From: Athira Rajeev @ 2020-07-22 12:28 UTC (permalink / raw)
To: Jordan Niethe
Cc: Gautham R Shenoy, Michael Neuling, maddy, kvm, kvm-ppc, svaidyan,
acme, jolsa, linuxppc-dev
In-Reply-To: <CACzsE9p-mNGptKi_+RSOOhvmW5gfLcKEbtoS6ah_5ZmVCThfpQ@mail.gmail.com>
[-- Attachment #1: Type: text/plain, Size: 8112 bytes --]
> On 22-Jul-2020, at 4:19 PM, Jordan Niethe <jniethe5@gmail.com> wrote:
>
> On Wed, Jul 22, 2020 at 5:55 PM Athira Rajeev
> <atrajeev@linux.vnet.ibm.com <mailto:atrajeev@linux.vnet.ibm.com>> wrote:
>>
>>
>>
>> On 22-Jul-2020, at 10:11 AM, Jordan Niethe <jniethe5@gmail.com> wrote:
>>
>> On Sat, Jul 18, 2020 at 1:13 AM Athira Rajeev
>> <atrajeev@linux.vnet.ibm.com> wrote:
>>
>>
>> From: Madhavan Srinivasan <maddy@linux.ibm.com>
>>
>> Add power10 feature function to dt_cpu_ftrs.c along
>> with a power10 specific init() to initialize pmu sprs,
>> sets the oprofile_cpu_type and cpu_features. This will
>> enable performance monitoring unit(PMU) for Power10
>> in CPU features with "performance-monitor-power10".
>>
>> For PowerISA v3.1, BHRB disable is controlled via Monitor Mode
>> Control Register A (MMCRA) bit, namely "BHRB Recording Disable
>> (BHRBRD)". This patch initializes MMCRA BHRBRD to disable BHRB
>> feature at boot for power10.
>>
>> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
>> ---
>> arch/powerpc/include/asm/reg.h | 3 +++
>> arch/powerpc/kernel/cpu_setup_power.S | 8 ++++++++
>> arch/powerpc/kernel/dt_cpu_ftrs.c | 26 ++++++++++++++++++++++++++
>> 3 files changed, 37 insertions(+)
>>
>> diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
>> index 21a1b2d..900ada1 100644
>> --- a/arch/powerpc/include/asm/reg.h
>> +++ b/arch/powerpc/include/asm/reg.h
>> @@ -1068,6 +1068,9 @@
>> #define MMCR0_PMC2_LOADMISSTIME 0x5
>> #endif
>>
>> +/* BHRB disable bit for PowerISA v3.10 */
>> +#define MMCRA_BHRB_DISABLE 0x0000002000000000
>>
>> Shouldn't this go under SPRN_MMCRA with the other MMCRA_*.
>>
>>
>>
>> Hi Jordan
>>
>> Ok, the definition of MMCRA is under #ifdef for 64 bit . if I move definition of MMCRA_BHRB_DISABLE along with other SPR's, I also
>> need to define this for 32-bit to satisfy core-book3s to compile as below:
>>
>> diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
>> index 900ada10762c..7e271657b412 100644
>> --- a/arch/powerpc/include/asm/reg.h
>> +++ b/arch/powerpc/include/asm/reg.h
>> @@ -888,6 +888,8 @@
>> #define MMCRA_SLOT 0x07000000UL /* SLOT bits (37-39) */
>> #define MMCRA_SLOT_SHIFT 24
>> #define MMCRA_SAMPLE_ENABLE 0x00000001UL /* enable sampling */
>> +/* BHRB disable bit for PowerISA v3.10 */
>> +#define MMCRA_BHRB_DISABLE 0x0000002000000000
>> #define POWER6_MMCRA_SDSYNC 0x0000080000000000ULL /* SDAR/SIAR synced */
>> #define POWER6_MMCRA_SIHV 0x0000040000000000ULL
>> #define POWER6_MMCRA_SIPR 0x0000020000000000ULL
>> @@ -1068,9 +1070,6 @@
>> #define MMCR0_PMC2_LOADMISSTIME 0x5
>> #endif
>>
>>
>>
>> -/* BHRB disable bit for PowerISA v3.10 */
>> -#define MMCRA_BHRB_DISABLE 0x0000002000000000
>> -
>> /*
>> * SPRG usage:
>> *
>> diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
>> index 36baae666387..88068f20827c 100644
>> --- a/arch/powerpc/perf/core-book3s.c
>> +++ b/arch/powerpc/perf/core-book3s.c
>> @@ -94,6 +94,7 @@ static unsigned int freeze_events_kernel = MMCR0_FCS;
>> #define SPRN_SIER2 0
>> #define SPRN_SIER3 0
>> #define MMCRA_SAMPLE_ENABLE 0
>> +#define MMCRA_BHRB_DISABLE 0
>>
>>
>>
>> static inline unsigned long perf_ip_adjust(struct pt_regs *regs)
>> {
>>
>>
>>
>> +
>> /*
>> * SPRG usage:
>> *
>> diff --git a/arch/powerpc/kernel/cpu_setup_power.S b/arch/powerpc/kernel/cpu_setup_power.S
>> index efdcfa7..b8e0d1e 100644
>> --- a/arch/powerpc/kernel/cpu_setup_power.S
>> +++ b/arch/powerpc/kernel/cpu_setup_power.S
>> @@ -94,6 +94,7 @@ _GLOBAL(__restore_cpu_power8)
>> _GLOBAL(__setup_cpu_power10)
>> mflr r11
>> bl __init_FSCR_power10
>> + bl __init_PMU_ISA31
>>
>> So we set MMCRA here but then aren't we still going to call __init_PMU
>> which will overwrite that?
>> Would this setting MMCRA also need to be handled in __restore_cpu_power10?
>>
>>
>> Thanks for this nice catch ! When I rebased code initial phase, we didn’t had power10 part filled in.
>> It was a miss from my side in adding PMu init functions and thanks for pointing this out.
>> Below patch will call __init_PMU functions in setup and restore. Please check if this looks good
>>
>> --
>> diff --git a/arch/powerpc/kernel/cpu_setup_power.S b/arch/powerpc/kernel/cpu_setup_power.S
>> index efdcfa714106..e672a6c5fd7c 100644
>> --- a/arch/powerpc/kernel/cpu_setup_power.S
>> +++ b/arch/powerpc/kernel/cpu_setup_power.S
>> @@ -94,6 +94,9 @@ _GLOBAL(__restore_cpu_power8)
>> _GLOBAL(__setup_cpu_power10)
>> mflr r11
>> bl __init_FSCR_power10
>> + bl __init_PMU
>> + bl __init_PMU_ISA31
>> + bl __init_PMU_HV
>> b 1f
>>
>> _GLOBAL(__setup_cpu_power9)
>
> Won't you also need to change where the label 1 is:
> --- a/arch/powerpc/kernel/cpu_setup_power.S
> +++ b/arch/powerpc/kernel/cpu_setup_power.S
> @@ -100,8 +100,8 @@ _GLOBAL(__setup_cpu_power10)
> _GLOBAL(__setup_cpu_power9)
> mflr r11
> bl __init_FSCR
> -1: bl __init_PMU
> - bl __init_hvmode_206
> + bl __init_PMU
> +1: bl __init_hvmode_206
> mtlr r11
> beqlr
> li r0,0
HI Jordan
I will address these comments and include changes for cpu_setup_power.S in a separate patch as suggested by Michael Ellerman
Thanks
Athira
>
>> @@ -124,6 +127,9 @@ _GLOBAL(__setup_cpu_power9)
>> _GLOBAL(__restore_cpu_power10)
>> mflr r11
>> bl __init_FSCR_power10
>> + bl __init_PMU
>> + bl __init_PMU_ISA31
>> + bl __init_PMU_HV
>> b 1f
>>
>> _GLOBAL(__restore_cpu_power9)
>> @@ -233,3 +239,10 @@ __init_PMU_ISA207:
>> li r5,0
>> mtspr SPRN_MMCRS,r5
>> blr
>> +
>> +__init_PMU_ISA31:
>> + li r5,0
>> + mtspr SPRN_MMCR3,r5
>> + LOAD_REG_IMMEDIATE(r5, MMCRA_BHRB_DISABLE)
>> + mtspr SPRN_MMCRA,r5
>> + blr
>>
>> —
>>
>> b 1f
>>
>> _GLOBAL(__setup_cpu_power9)
>> @@ -233,3 +234,10 @@ __init_PMU_ISA207:
>> li r5,0
>> mtspr SPRN_MMCRS,r5
>> blr
>> +
>> +__init_PMU_ISA31:
>> + li r5,0
>> + mtspr SPRN_MMCR3,r5
>> + LOAD_REG_IMMEDIATE(r5, MMCRA_BHRB_DISABLE)
>> + mtspr SPRN_MMCRA,r5
>> + blr
>> diff --git a/arch/powerpc/kernel/dt_cpu_ftrs.c b/arch/powerpc/kernel/dt_cpu_ftrs.c
>> index 3a40951..f482286 100644
>> --- a/arch/powerpc/kernel/dt_cpu_ftrs.c
>> +++ b/arch/powerpc/kernel/dt_cpu_ftrs.c
>> @@ -450,6 +450,31 @@ static int __init feat_enable_pmu_power9(struct dt_cpu_feature *f)
>> return 1;
>> }
>>
>> +static void init_pmu_power10(void)
>> +{
>> + init_pmu_power9();
>> +
>> + mtspr(SPRN_MMCR3, 0);
>> + mtspr(SPRN_MMCRA, MMCRA_BHRB_DISABLE);
>> +}
>> +
>> +static int __init feat_enable_pmu_power10(struct dt_cpu_feature *f)
>> +{
>> + hfscr_pmu_enable();
>> +
>> + init_pmu_power10();
>> + init_pmu_registers = init_pmu_power10;
>> +
>> + cur_cpu_spec->cpu_features |= CPU_FTR_MMCRA;
>> + cur_cpu_spec->cpu_user_features |= PPC_FEATURE_PSERIES_PERFMON_COMPAT;
>> +
>> + cur_cpu_spec->num_pmcs = 6;
>> + cur_cpu_spec->pmc_type = PPC_PMC_IBM;
>> + cur_cpu_spec->oprofile_cpu_type = "ppc64/power10";
>> +
>> + return 1;
>> +}
>> +
>> static int __init feat_enable_tm(struct dt_cpu_feature *f)
>> {
>> #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
>> @@ -639,6 +664,7 @@ struct dt_cpu_feature_match {
>> {"pc-relative-addressing", feat_enable, 0},
>> {"machine-check-power9", feat_enable_mce_power9, 0},
>> {"performance-monitor-power9", feat_enable_pmu_power9, 0},
>> + {"performance-monitor-power10", feat_enable_pmu_power10, 0},
>> {"event-based-branch-v3", feat_enable, 0},
>> {"random-number-generator", feat_enable, 0},
>> {"system-call-vectored", feat_disable, 0},
>> --
>> 1.8.3.1
[-- Attachment #2: Type: text/html, Size: 26517 bytes --]
^ permalink raw reply
* Re: [PATCH v3] powerpc/pseries: Avoid using addr_to_pfn in realmode
From: Ganesh @ 2020-07-22 10:37 UTC (permalink / raw)
To: Nicholas Piggin, linuxppc-dev, mpe; +Cc: aneesh.kumar, mahesh
In-Reply-To: <1595325962.zikec6nkw7.astroid@bobo.none>
[-- Attachment #1: Type: text/plain, Size: 9104 bytes --]
On 7/21/20 3:38 PM, Nicholas Piggin wrote:
> Excerpts from Ganesh Goudar's message of July 20, 2020 6:03 pm:
>> When an UE or memory error exception is encountered the MCE handler
>> tries to find the pfn using addr_to_pfn() which takes effective
>> address as an argument, later pfn is used to poison the page where
>> memory error occurred, recent rework in this area made addr_to_pfn
>> to run in realmode, which can be fatal as it may try to access
>> memory outside RMO region.
>>
>> To fix this have separate functions for realmode and virtual mode
>> handling and let addr_to_pfn to run in virtual mode.
> You didn't really explain what you moved around. You added some
> helper functions, but what does it actually do differently now? Can you
> explain that in the changelog?
Sure, ill rephrase the changelog, here I have moved all that we can and we must
do in virtual mode to new helper function which runs in virtual mode, like filling
mce error info, using addr_to_pfn and calling save_mce_event().
>
> Thanks,
> Nick
>
>> Without this fix following kernel crash is seen on hitting UE.
>>
>> [ 485.128036] Oops: Kernel access of bad area, sig: 11 [#1]
>> [ 485.128040] LE SMP NR_CPUS=2048 NUMA pSeries
>> [ 485.128047] Modules linked in:
>> [ 485.128067] CPU: 15 PID: 6536 Comm: insmod Kdump: loaded Tainted: G OE 5.7.0 #22
>> [ 485.128074] NIP: c00000000009b24c LR: c0000000000398d8 CTR: c000000000cd57c0
>> [ 485.128078] REGS: c000000003f1f970 TRAP: 0300 Tainted: G OE (5.7.0)
>> [ 485.128082] MSR: 8000000000001003 <SF,ME,RI,LE> CR: 28008284 XER: 00000001
>> [ 485.128088] CFAR: c00000000009b190 DAR: c0000001fab00000 DSISR: 40000000 IRQMASK: 1
>> [ 485.128088] GPR00: 0000000000000001 c000000003f1fbf0 c000000001634300 0000b0fa01000000
>> [ 485.128088] GPR04: d000000002220000 0000000000000000 00000000fab00000 0000000000000022
>> [ 485.128088] GPR08: c0000001fab00000 0000000000000000 c0000001fab00000 c000000003f1fc14
>> [ 485.128088] GPR12: 0000000000000008 c000000003ff5880 d000000002100008 0000000000000000
>> [ 485.128088] GPR16: 000000000000ff20 000000000000fff1 000000000000fff2 d0000000021a1100
>> [ 485.128088] GPR20: d000000002200000 c00000015c893c50 c000000000d49b28 c00000015c893c50
>> [ 485.128088] GPR24: d0000000021a0d08 c0000000014e5da8 d0000000021a0818 000000000000000a
>> [ 485.128088] GPR28: 0000000000000008 000000000000000a c0000000017e2970 000000000000000a
>> [ 485.128125] NIP [c00000000009b24c] __find_linux_pte+0x11c/0x310
>> [ 485.128130] LR [c0000000000398d8] addr_to_pfn+0x138/0x170
>> [ 485.128133] Call Trace:
>> [ 485.128135] Instruction dump:
>> [ 485.128138] 3929ffff 7d4a3378 7c883c36 7d2907b4 794a1564 7d294038 794af082 3900ffff
>> [ 485.128144] 79291f24 790af00e 78e70020 7d095214 <7c69502a> 2fa30000 419e011c 70690040
>> [ 485.128152] ---[ end trace d34b27e29ae0e340 ]---
>>
>> Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
>> ---
>> V2: Leave bare metal code and save_mce_event as is.
>>
>> V3: Have separate functions for realmode and virtual mode handling.
>> ---
>> arch/powerpc/platforms/pseries/ras.c | 119 ++++++++++++++++-----------
>> 1 file changed, 70 insertions(+), 49 deletions(-)
>>
>> diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
>> index f3736fcd98fc..32fe3fad86b8 100644
>> --- a/arch/powerpc/platforms/pseries/ras.c
>> +++ b/arch/powerpc/platforms/pseries/ras.c
>> @@ -522,18 +522,55 @@ int pSeries_system_reset_exception(struct pt_regs *regs)
>> return 0; /* need to perform reset */
>> }
>>
>> +static int mce_handle_err_realmode(int disposition, u8 error_type)
>> +{
>> +#ifdef CONFIG_PPC_BOOK3S_64
>> + if (disposition == RTAS_DISP_NOT_RECOVERED) {
>> + switch (error_type) {
>> + case MC_ERROR_TYPE_SLB:
>> + case MC_ERROR_TYPE_ERAT:
>> + /*
>> + * Store the old slb content in paca before flushing.
>> + * Print this when we go to virtual mode.
>> + * There are chances that we may hit MCE again if there
>> + * is a parity error on the SLB entry we trying to read
>> + * for saving. Hence limit the slb saving to single
>> + * level of recursion.
>> + */
>> + if (local_paca->in_mce == 1)
>> + slb_save_contents(local_paca->mce_faulty_slbs);
>> + flush_and_reload_slb();
>> + disposition = RTAS_DISP_FULLY_RECOVERED;
>> + break;
>> + default:
>> + break;
>> + }
>> + } else if (disposition == RTAS_DISP_LIMITED_RECOVERY) {
>> + /* Platform corrected itself but could be degraded */
>> + pr_err("MCE: limited recovery, system may be degraded\n");
>> + disposition = RTAS_DISP_FULLY_RECOVERED;
>> + }
>> +#endif
>> + return disposition;
>> +}
>>
>> -static int mce_handle_error(struct pt_regs *regs, struct rtas_error_log *errp)
>> +static int mce_handle_err_virtmode(struct pt_regs *regs,
>> + struct rtas_error_log *errp,
>> + struct pseries_mc_errorlog *mce_log,
>> + int disposition)
>> {
>> struct mce_error_info mce_err = { 0 };
>> - unsigned long eaddr = 0, paddr = 0;
>> - struct pseries_errorlog *pseries_log;
>> - struct pseries_mc_errorlog *mce_log;
>> - int disposition = rtas_error_disposition(errp);
>> int initiator = rtas_error_initiator(errp);
>> int severity = rtas_error_severity(errp);
>> + unsigned long eaddr = 0, paddr = 0;
>> u8 error_type, err_sub_type;
>>
>> + if (!mce_log)
>> + goto out;
>> +
>> + error_type = mce_log->error_type;
>> + err_sub_type = rtas_mc_error_sub_type(mce_log);
>> +
>> if (initiator == RTAS_INITIATOR_UNKNOWN)
>> mce_err.initiator = MCE_INITIATOR_UNKNOWN;
>> else if (initiator == RTAS_INITIATOR_CPU)
>> @@ -572,18 +609,7 @@ static int mce_handle_error(struct pt_regs *regs, struct rtas_error_log *errp)
>> mce_err.error_type = MCE_ERROR_TYPE_UNKNOWN;
>> mce_err.error_class = MCE_ECLASS_UNKNOWN;
>>
>> - if (!rtas_error_extended(errp))
>> - goto out;
>> -
>> - pseries_log = get_pseries_errorlog(errp, PSERIES_ELOG_SECT_ID_MCE);
>> - if (pseries_log == NULL)
>> - goto out;
>> -
>> - mce_log = (struct pseries_mc_errorlog *)pseries_log->data;
>> - error_type = mce_log->error_type;
>> - err_sub_type = rtas_mc_error_sub_type(mce_log);
>> -
>> - switch (mce_log->error_type) {
>> + switch (error_type) {
>> case MC_ERROR_TYPE_UE:
>> mce_err.error_type = MCE_ERROR_TYPE_UE;
>> mce_common_process_ue(regs, &mce_err);
>> @@ -683,37 +709,32 @@ static int mce_handle_error(struct pt_regs *regs, struct rtas_error_log *errp)
>> mce_err.error_type = MCE_ERROR_TYPE_UNKNOWN;
>> break;
>> }
>> +out:
>> + save_mce_event(regs, disposition == RTAS_DISP_FULLY_RECOVERED,
>> + &mce_err, regs->nip, eaddr, paddr);
>> + return disposition;
>> +}
>>
>> -#ifdef CONFIG_PPC_BOOK3S_64
>> - if (disposition == RTAS_DISP_NOT_RECOVERED) {
>> - switch (error_type) {
>> - case MC_ERROR_TYPE_SLB:
>> - case MC_ERROR_TYPE_ERAT:
>> - /*
>> - * Store the old slb content in paca before flushing.
>> - * Print this when we go to virtual mode.
>> - * There are chances that we may hit MCE again if there
>> - * is a parity error on the SLB entry we trying to read
>> - * for saving. Hence limit the slb saving to single
>> - * level of recursion.
>> - */
>> - if (local_paca->in_mce == 1)
>> - slb_save_contents(local_paca->mce_faulty_slbs);
>> - flush_and_reload_slb();
>> - disposition = RTAS_DISP_FULLY_RECOVERED;
>> - break;
>> - default:
>> - break;
>> - }
>> - } else if (disposition == RTAS_DISP_LIMITED_RECOVERY) {
>> - /* Platform corrected itself but could be degraded */
>> - printk(KERN_ERR "MCE: limited recovery, system may "
>> - "be degraded\n");
>> - disposition = RTAS_DISP_FULLY_RECOVERED;
>> - }
>> -#endif
>> +static int mce_handle_error(struct pt_regs *regs, struct rtas_error_log *errp)
>> +{
>> + struct pseries_errorlog *pseries_log;
>> + struct pseries_mc_errorlog *mce_log = NULL;
>> + int disposition = rtas_error_disposition(errp);
>> + u8 error_type, err_sub_type;
>> +
>> + if (!rtas_error_extended(errp))
>> + goto out;
>> +
>> + pseries_log = get_pseries_errorlog(errp, PSERIES_ELOG_SECT_ID_MCE);
>> + if (!pseries_log)
>> + goto out;
>> +
>> + mce_log = (struct pseries_mc_errorlog *)pseries_log->data;
>> + error_type = mce_log->error_type;
>> + err_sub_type = rtas_mc_error_sub_type(mce_log);
>> +
>> + disposition = mce_handle_err_realmode(disposition, error_type);
>>
>> -out:
>> /*
>> * Enable translation as we will be accessing per-cpu variables
>> * in save_mce_event() which may fall outside RMO region, also
>> @@ -724,10 +745,10 @@ static int mce_handle_error(struct pt_regs *regs, struct rtas_error_log *errp)
>> * Note: All the realmode handling like flushing SLB entries for
>> * SLB multihit is done by now.
>> */
>> +out:
>> mtmsr(mfmsr() | MSR_IR | MSR_DR);
>> - save_mce_event(regs, disposition == RTAS_DISP_FULLY_RECOVERED,
>> - &mce_err, regs->nip, eaddr, paddr);
>> -
>> + disposition = mce_handle_err_virtmode(regs, errp, mce_log,
>> + disposition);
>> return disposition;
>> }
>>
>> --
>> 2.17.2
>>
>>
[-- Attachment #2: Type: text/html, Size: 9381 bytes --]
^ permalink raw reply
* Re: [v3 04/15] powerpc/perf: Add support for ISA3.1 PMU SPRs
From: Athira Rajeev @ 2020-07-22 8:07 UTC (permalink / raw)
To: Jordan Niethe
Cc: Gautham R Shenoy, Michael Neuling, maddy, kvm, kvm-ppc, svaidyan,
acme, jolsa, linuxppc-dev
In-Reply-To: <CACzsE9r9fy22hScRm7yz5OeZH9jXA+97hEfAOo-Nk_EPwW-_Dw@mail.gmail.com>
[-- Attachment #1: Type: text/plain, Size: 9478 bytes --]
> On 22-Jul-2020, at 9:48 AM, Jordan Niethe <jniethe5@gmail.com> wrote:
>
> On Sat, Jul 18, 2020 at 1:02 AM Athira Rajeev
> <atrajeev@linux.vnet.ibm.com <mailto:atrajeev@linux.vnet.ibm.com>> wrote:
>>
>> From: Madhavan Srinivasan <maddy@linux.ibm.com>
>>
>> PowerISA v3.1 includes new performance monitoring unit(PMU)
>> special purpose registers (SPRs). They are
>>
>> Monitor Mode Control Register 3 (MMCR3)
>> Sampled Instruction Event Register 2 (SIER2)
>> Sampled Instruction Event Register 3 (SIER3)
>>
>> MMCR3 is added for further sampling related configuration
>> control. SIER2/SIER3 are added to provide additional
>> information about the sampled instruction.
>>
>> Patch adds new PPMU flag called "PPMU_ARCH_310S" to support
>> handling of these new SPRs, updates the struct thread_struct
>> to include these new SPRs, include MMCR3 in struct mmcr_regs.
>> This is needed to support programming of MMCR3 SPR during
>> event_[enable/disable]. Patch also adds the sysfs support
>> for the MMCR3 SPR along with SPRN_ macros for these new pmu sprs.
>>
>> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
>> ---
>> arch/powerpc/include/asm/perf_event_server.h | 2 ++
>> arch/powerpc/include/asm/processor.h | 4 ++++
>> arch/powerpc/include/asm/reg.h | 6 ++++++
>> arch/powerpc/kernel/sysfs.c | 8 ++++++++
>> arch/powerpc/perf/core-book3s.c | 29 ++++++++++++++++++++++++++++
>> 5 files changed, 49 insertions(+)
>>
>> diff --git a/arch/powerpc/include/asm/perf_event_server.h b/arch/powerpc/include/asm/perf_event_server.h
>> index 14b8dc1..832450a 100644
>> --- a/arch/powerpc/include/asm/perf_event_server.h
>> +++ b/arch/powerpc/include/asm/perf_event_server.h
>> @@ -22,6 +22,7 @@ struct mmcr_regs {
>> unsigned long mmcr1;
>> unsigned long mmcr2;
>> unsigned long mmcra;
>> + unsigned long mmcr3;
>> };
>> /*
>> * This struct provides the constants and functions needed to
>> @@ -75,6 +76,7 @@ struct power_pmu {
>> #define PPMU_HAS_SIER 0x00000040 /* Has SIER */
>> #define PPMU_ARCH_207S 0x00000080 /* PMC is architecture v2.07S */
>> #define PPMU_NO_SIAR 0x00000100 /* Do not use SIAR */
>> +#define PPMU_ARCH_310S 0x00000200 /* Has MMCR3, SIER2 and SIER3 */
> We elsewhere have CPU_FTR_ARCH_31, so should this be PPMU_ARCH_31S to
> be consistent.
>>
Ok,
This change will need to be done in all places which are currently using PPMU_ARCH_310S
>> /*
>> * Values for flags to get_alternatives()
>> diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
>> index 52a6783..a466e94 100644
>> --- a/arch/powerpc/include/asm/processor.h
>> +++ b/arch/powerpc/include/asm/processor.h
>> @@ -272,6 +272,10 @@ struct thread_struct {
>> unsigned mmcr0;
>>
>> unsigned used_ebb;
>> + unsigned long mmcr3;
>> + unsigned long sier2;
>> + unsigned long sier3;
>> +
>> #endif
>> };
>>
>> diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
>> index 88e6c78..21a1b2d 100644
>> --- a/arch/powerpc/include/asm/reg.h
>> +++ b/arch/powerpc/include/asm/reg.h
>> @@ -876,7 +876,9 @@
>> #define MMCR0_FCHV 0x00000001UL /* freeze conditions in hypervisor mode */
>> #define SPRN_MMCR1 798
>> #define SPRN_MMCR2 785
>> +#define SPRN_MMCR3 754
>> #define SPRN_UMMCR2 769
>> +#define SPRN_UMMCR3 738
>> #define SPRN_MMCRA 0x312
>> #define MMCRA_SDSYNC 0x80000000UL /* SDAR synced with SIAR */
>> #define MMCRA_SDAR_DCACHE_MISS 0x40000000UL
>> @@ -918,6 +920,10 @@
>> #define SIER_SIHV 0x1000000 /* Sampled MSR_HV */
>> #define SIER_SIAR_VALID 0x0400000 /* SIAR contents valid */
>> #define SIER_SDAR_VALID 0x0200000 /* SDAR contents valid */
>> +#define SPRN_SIER2 752
>> +#define SPRN_SIER3 753
>> +#define SPRN_USIER2 736
>> +#define SPRN_USIER3 737
>> #define SPRN_SIAR 796
>> #define SPRN_SDAR 797
>> #define SPRN_TACR 888
>> diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
>> index 571b325..46b4ebc 100644
>> --- a/arch/powerpc/kernel/sysfs.c
>> +++ b/arch/powerpc/kernel/sysfs.c
>> @@ -622,8 +622,10 @@ void ppc_enable_pmcs(void)
>> SYSFS_PMCSETUP(pmc8, SPRN_PMC8);
>>
>> SYSFS_PMCSETUP(mmcra, SPRN_MMCRA);
>> +SYSFS_PMCSETUP(mmcr3, SPRN_MMCR3);
>>
>> static DEVICE_ATTR(mmcra, 0600, show_mmcra, store_mmcra);
>> +static DEVICE_ATTR(mmcr3, 0600, show_mmcr3, store_mmcr3);
>> #endif /* HAS_PPC_PMC56 */
>>
>>
>> @@ -886,6 +888,9 @@ static int register_cpu_online(unsigned int cpu)
>> #ifdef CONFIG_PMU_SYSFS
>> if (cpu_has_feature(CPU_FTR_MMCRA))
>> device_create_file(s, &dev_attr_mmcra);
>> +
>> + if (cpu_has_feature(CPU_FTR_ARCH_31))
>> + device_create_file(s, &dev_attr_mmcr3);
>> #endif /* CONFIG_PMU_SYSFS */
>>
>> if (cpu_has_feature(CPU_FTR_PURR)) {
>> @@ -980,6 +985,9 @@ static int unregister_cpu_online(unsigned int cpu)
>> #ifdef CONFIG_PMU_SYSFS
>> if (cpu_has_feature(CPU_FTR_MMCRA))
>> device_remove_file(s, &dev_attr_mmcra);
>> +
>> + if (cpu_has_feature(CPU_FTR_ARCH_31))
>> + device_remove_file(s, &dev_attr_mmcr3);
>> #endif /* CONFIG_PMU_SYSFS */
>>
>> if (cpu_has_feature(CPU_FTR_PURR)) {
>> diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
>> index f4d07b5..ca32fc0 100644
>> --- a/arch/powerpc/perf/core-book3s.c
>> +++ b/arch/powerpc/perf/core-book3s.c
>> @@ -72,6 +72,11 @@ struct cpu_hw_events {
>> /*
>> * 32-bit doesn't have MMCRA but does have an MMCR2,
>> * and a few other names are different.
>> + * Also 32-bit doesn't have MMCR3, SIER2 and SIER3.
>> + * Define them as zero knowing that any code path accessing
>> + * these registers (via mtspr/mfspr) are done under ppmu flag
>> + * check for PPMU_ARCH_310S and we will not enter that code path
>> + * for 32-bit.
>> */
>> #ifdef CONFIG_PPC32
>>
>> @@ -85,6 +90,9 @@ struct cpu_hw_events {
>> #define MMCR0_PMCC_U6 0
>>
>> #define SPRN_MMCRA SPRN_MMCR2
>> +#define SPRN_MMCR3 0
>> +#define SPRN_SIER2 0
>> +#define SPRN_SIER3 0
>> #define MMCRA_SAMPLE_ENABLE 0
>>
>> static inline unsigned long perf_ip_adjust(struct pt_regs *regs)
>> @@ -581,6 +589,11 @@ static void ebb_switch_out(unsigned long mmcr0)
>> current->thread.sdar = mfspr(SPRN_SDAR);
>> current->thread.mmcr0 = mmcr0 & MMCR0_USER_MASK;
>> current->thread.mmcr2 = mfspr(SPRN_MMCR2) & MMCR2_USER_MASK;
>> + if (ppmu->flags & PPMU_ARCH_310S) {
>> + current->thread.mmcr3 = mfspr(SPRN_MMCR3);
> Like MMCR0_USER_MASK and MMCR2_USER_MASK do we need a MMCR3_USER_MASK
> here, or is there no need?
Jordan
We don’t need user mask for MMCR3 and other new SPRs ( SIER2/3) . Incase of MMCR3, we dont have any Freeze control bits and incase of SIER2/3, it is similar to SIER (where HW handles the masking of the bits), hence we didn't add any user_mask for these SPRs
Thanks
Athira
>> + current->thread.sier2 = mfspr(SPRN_SIER2);
>> + current->thread.sier3 = mfspr(SPRN_SIER3);
>> + }
>> }
>>
>> static unsigned long ebb_switch_in(bool ebb, struct cpu_hw_events *cpuhw)
>> @@ -620,6 +633,12 @@ static unsigned long ebb_switch_in(bool ebb, struct cpu_hw_events *cpuhw)
>> * instead manage the MMCR2 entirely by itself.
>> */
>> mtspr(SPRN_MMCR2, cpuhw->mmcr.mmcr2 | current->thread.mmcr2);
>> +
>> + if (ppmu->flags & PPMU_ARCH_310S) {
>> + mtspr(SPRN_MMCR3, current->thread.mmcr3);
>> + mtspr(SPRN_SIER2, current->thread.sier2);
>> + mtspr(SPRN_SIER3, current->thread.sier3);
>> + }
>> out:
>> return mmcr0;
>> }
>> @@ -840,6 +859,11 @@ void perf_event_print_debug(void)
>> pr_info("EBBRR: %016lx BESCR: %016lx\n",
>> mfspr(SPRN_EBBRR), mfspr(SPRN_BESCR));
>> }
>> +
>> + if (ppmu->flags & PPMU_ARCH_310S) {
>> + pr_info("MMCR3: %016lx SIER2: %016lx SIER3: %016lx\n",
>> + mfspr(SPRN_MMCR3), mfspr(SPRN_SIER2), mfspr(SPRN_SIER3));
>> + }
>> #endif
>> pr_info("SIAR: %016lx SDAR: %016lx SIER: %016lx\n",
>> mfspr(SPRN_SIAR), sdar, sier);
>> @@ -1305,6 +1329,8 @@ static void power_pmu_enable(struct pmu *pmu)
>> if (!cpuhw->n_added) {
>> mtspr(SPRN_MMCRA, cpuhw->mmcr.mmcra & ~MMCRA_SAMPLE_ENABLE);
>> mtspr(SPRN_MMCR1, cpuhw->mmcr.mmcr1);
>> + if (ppmu->flags & PPMU_ARCH_310S)
>> + mtspr(SPRN_MMCR3, cpuhw->mmcr.mmcr3);
>> goto out_enable;
>> }
>>
>> @@ -1348,6 +1374,9 @@ static void power_pmu_enable(struct pmu *pmu)
>> if (ppmu->flags & PPMU_ARCH_207S)
>> mtspr(SPRN_MMCR2, cpuhw->mmcr.mmcr2);
>>
>> + if (ppmu->flags & PPMU_ARCH_310S)
>> + mtspr(SPRN_MMCR3, cpuhw->mmcr.mmcr3);
>> +
>> /*
>> * Read off any pre-existing events that need to move
>> * to another PMC.
>> --
>> 1.8.3.1
[-- Attachment #2: Type: text/html, Size: 23566 bytes --]
^ permalink raw reply
* Re: [v3 07/15] powerpc/perf: Add power10_feat to dt_cpu_ftrs
From: Athira Rajeev @ 2020-07-22 7:55 UTC (permalink / raw)
To: Jordan Niethe
Cc: Gautham R Shenoy, Michael Neuling, maddy, kvm, kvm-ppc, svaidyan,
acme, jolsa, linuxppc-dev
In-Reply-To: <CACzsE9oBw1ZrJLqOAg1QqPrQgSoVbEdPh_ax7mU_kcWNyfyAcg@mail.gmail.com>
[-- Attachment #1: Type: text/plain, Size: 7014 bytes --]
> On 22-Jul-2020, at 10:11 AM, Jordan Niethe <jniethe5@gmail.com> wrote:
>
> On Sat, Jul 18, 2020 at 1:13 AM Athira Rajeev
> <atrajeev@linux.vnet.ibm.com <mailto:atrajeev@linux.vnet.ibm.com>> wrote:
>>
>> From: Madhavan Srinivasan <maddy@linux.ibm.com>
>>
>> Add power10 feature function to dt_cpu_ftrs.c along
>> with a power10 specific init() to initialize pmu sprs,
>> sets the oprofile_cpu_type and cpu_features. This will
>> enable performance monitoring unit(PMU) for Power10
>> in CPU features with "performance-monitor-power10".
>>
>> For PowerISA v3.1, BHRB disable is controlled via Monitor Mode
>> Control Register A (MMCRA) bit, namely "BHRB Recording Disable
>> (BHRBRD)". This patch initializes MMCRA BHRBRD to disable BHRB
>> feature at boot for power10.
>>
>> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
>> ---
>> arch/powerpc/include/asm/reg.h | 3 +++
>> arch/powerpc/kernel/cpu_setup_power.S | 8 ++++++++
>> arch/powerpc/kernel/dt_cpu_ftrs.c | 26 ++++++++++++++++++++++++++
>> 3 files changed, 37 insertions(+)
>>
>> diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
>> index 21a1b2d..900ada1 100644
>> --- a/arch/powerpc/include/asm/reg.h
>> +++ b/arch/powerpc/include/asm/reg.h
>> @@ -1068,6 +1068,9 @@
>> #define MMCR0_PMC2_LOADMISSTIME 0x5
>> #endif
>>
>> +/* BHRB disable bit for PowerISA v3.10 */
>> +#define MMCRA_BHRB_DISABLE 0x0000002000000000
> Shouldn't this go under SPRN_MMCRA with the other MMCRA_*.
Hi Jordan
Ok, the definition of MMCRA is under #ifdef for 64 bit . if I move definition of MMCRA_BHRB_DISABLE along with other SPR's, I also
need to define this for 32-bit to satisfy core-book3s to compile as below:
diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index 900ada10762c..7e271657b412 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -888,6 +888,8 @@
#define MMCRA_SLOT 0x07000000UL /* SLOT bits (37-39) */
#define MMCRA_SLOT_SHIFT 24
#define MMCRA_SAMPLE_ENABLE 0x00000001UL /* enable sampling */
+/* BHRB disable bit for PowerISA v3.10 */
+#define MMCRA_BHRB_DISABLE 0x0000002000000000
#define POWER6_MMCRA_SDSYNC 0x0000080000000000ULL /* SDAR/SIAR synced */
#define POWER6_MMCRA_SIHV 0x0000040000000000ULL
#define POWER6_MMCRA_SIPR 0x0000020000000000ULL
@@ -1068,9 +1070,6 @@
#define MMCR0_PMC2_LOADMISSTIME 0x5
#endif
-/* BHRB disable bit for PowerISA v3.10 */
-#define MMCRA_BHRB_DISABLE 0x0000002000000000
-
/*
* SPRG usage:
*
diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index 36baae666387..88068f20827c 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -94,6 +94,7 @@ static unsigned int freeze_events_kernel = MMCR0_FCS;
#define SPRN_SIER2 0
#define SPRN_SIER3 0
#define MMCRA_SAMPLE_ENABLE 0
+#define MMCRA_BHRB_DISABLE 0
static inline unsigned long perf_ip_adjust(struct pt_regs *regs)
{
>> +
>> /*
>> * SPRG usage:
>> *
>> diff --git a/arch/powerpc/kernel/cpu_setup_power.S b/arch/powerpc/kernel/cpu_setup_power.S
>> index efdcfa7..b8e0d1e 100644
>> --- a/arch/powerpc/kernel/cpu_setup_power.S
>> +++ b/arch/powerpc/kernel/cpu_setup_power.S
>> @@ -94,6 +94,7 @@ _GLOBAL(__restore_cpu_power8)
>> _GLOBAL(__setup_cpu_power10)
>> mflr r11
>> bl __init_FSCR_power10
>> + bl __init_PMU_ISA31
> So we set MMCRA here but then aren't we still going to call __init_PMU
> which will overwrite that?
> Would this setting MMCRA also need to be handled in __restore_cpu_power10?
Thanks for this nice catch ! When I rebased code initial phase, we didn’t had power10 part filled in.
It was a miss from my side in adding PMu init functions and thanks for pointing this out.
Below patch will call __init_PMU functions in setup and restore. Please check if this looks good
--
diff --git a/arch/powerpc/kernel/cpu_setup_power.S b/arch/powerpc/kernel/cpu_setup_power.S
index efdcfa714106..e672a6c5fd7c 100644
--- a/arch/powerpc/kernel/cpu_setup_power.S
+++ b/arch/powerpc/kernel/cpu_setup_power.S
@@ -94,6 +94,9 @@ _GLOBAL(__restore_cpu_power8)
_GLOBAL(__setup_cpu_power10)
mflr r11
bl __init_FSCR_power10
+ bl __init_PMU
+ bl __init_PMU_ISA31
+ bl __init_PMU_HV
b 1f
_GLOBAL(__setup_cpu_power9)
@@ -124,6 +127,9 @@ _GLOBAL(__setup_cpu_power9)
_GLOBAL(__restore_cpu_power10)
mflr r11
bl __init_FSCR_power10
+ bl __init_PMU
+ bl __init_PMU_ISA31
+ bl __init_PMU_HV
b 1f
_GLOBAL(__restore_cpu_power9)
@@ -233,3 +239,10 @@ __init_PMU_ISA207:
li r5,0
mtspr SPRN_MMCRS,r5
blr
+
+__init_PMU_ISA31:
+ li r5,0
+ mtspr SPRN_MMCR3,r5
+ LOAD_REG_IMMEDIATE(r5, MMCRA_BHRB_DISABLE)
+ mtspr SPRN_MMCRA,r5
+ blr
—
>> b 1f
>>
>> _GLOBAL(__setup_cpu_power9)
>> @@ -233,3 +234,10 @@ __init_PMU_ISA207:
>> li r5,0
>> mtspr SPRN_MMCRS,r5
>> blr
>> +
>> +__init_PMU_ISA31:
>> + li r5,0
>> + mtspr SPRN_MMCR3,r5
>> + LOAD_REG_IMMEDIATE(r5, MMCRA_BHRB_DISABLE)
>> + mtspr SPRN_MMCRA,r5
>> + blr
>> diff --git a/arch/powerpc/kernel/dt_cpu_ftrs.c b/arch/powerpc/kernel/dt_cpu_ftrs.c
>> index 3a40951..f482286 100644
>> --- a/arch/powerpc/kernel/dt_cpu_ftrs.c
>> +++ b/arch/powerpc/kernel/dt_cpu_ftrs.c
>> @@ -450,6 +450,31 @@ static int __init feat_enable_pmu_power9(struct dt_cpu_feature *f)
>> return 1;
>> }
>>
>> +static void init_pmu_power10(void)
>> +{
>> + init_pmu_power9();
>> +
>> + mtspr(SPRN_MMCR3, 0);
>> + mtspr(SPRN_MMCRA, MMCRA_BHRB_DISABLE);
>> +}
>> +
>> +static int __init feat_enable_pmu_power10(struct dt_cpu_feature *f)
>> +{
>> + hfscr_pmu_enable();
>> +
>> + init_pmu_power10();
>> + init_pmu_registers = init_pmu_power10;
>> +
>> + cur_cpu_spec->cpu_features |= CPU_FTR_MMCRA;
>> + cur_cpu_spec->cpu_user_features |= PPC_FEATURE_PSERIES_PERFMON_COMPAT;
>> +
>> + cur_cpu_spec->num_pmcs = 6;
>> + cur_cpu_spec->pmc_type = PPC_PMC_IBM;
>> + cur_cpu_spec->oprofile_cpu_type = "ppc64/power10";
>> +
>> + return 1;
>> +}
>> +
>> static int __init feat_enable_tm(struct dt_cpu_feature *f)
>> {
>> #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
>> @@ -639,6 +664,7 @@ struct dt_cpu_feature_match {
>> {"pc-relative-addressing", feat_enable, 0},
>> {"machine-check-power9", feat_enable_mce_power9, 0},
>> {"performance-monitor-power9", feat_enable_pmu_power9, 0},
>> + {"performance-monitor-power10", feat_enable_pmu_power10, 0},
>> {"event-based-branch-v3", feat_enable, 0},
>> {"random-number-generator", feat_enable, 0},
>> {"system-call-vectored", feat_disable, 0},
>> --
>> 1.8.3.1
[-- Attachment #2: Type: text/html, Size: 23836 bytes --]
^ permalink raw reply related
* Re: [v3 04/15] powerpc/perf: Add support for ISA3.1 PMU SPRs
From: Michael Ellerman @ 2020-07-22 12:03 UTC (permalink / raw)
To: Jordan Niethe, Athira Rajeev
Cc: Gautham R Shenoy, mikey, maddy, kvm, kvm-ppc, svaidyan, acme,
jolsa, linuxppc-dev
In-Reply-To: <CACzsE9r9fy22hScRm7yz5OeZH9jXA+97hEfAOo-Nk_EPwW-_Dw@mail.gmail.com>
Jordan Niethe <jniethe5@gmail.com> writes:
> On Sat, Jul 18, 2020 at 1:02 AM Athira Rajeev <atrajeev@linux.vnet.ibm.com> wrote:
>> From: Madhavan Srinivasan <maddy@linux.ibm.com>
>>
>> PowerISA v3.1 includes new performance monitoring unit(PMU)
>> special purpose registers (SPRs). They are
...
>>
>> diff --git a/arch/powerpc/include/asm/perf_event_server.h b/arch/powerpc/include/asm/perf_event_server.h
>> index 14b8dc1..832450a 100644
>> --- a/arch/powerpc/include/asm/perf_event_server.h
>> +++ b/arch/powerpc/include/asm/perf_event_server.h
>> @@ -75,6 +76,7 @@ struct power_pmu {
>> #define PPMU_HAS_SIER 0x00000040 /* Has SIER */
>> #define PPMU_ARCH_207S 0x00000080 /* PMC is architecture v2.07S */
>> #define PPMU_NO_SIAR 0x00000100 /* Do not use SIAR */
>> +#define PPMU_ARCH_310S 0x00000200 /* Has MMCR3, SIER2 and SIER3 */
> We elsewhere have CPU_FTR_ARCH_31, so should this be PPMU_ARCH_31S to
> be consistent.
The "S" is no longer needed as there's no Book S vs Book E distinction
in ISA v3.1.
So I changed it to PPMU_ARCH_31.
>> diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
>> index f4d07b5..ca32fc0 100644
>> --- a/arch/powerpc/perf/core-book3s.c
>> +++ b/arch/powerpc/perf/core-book3s.c
>> @@ -581,6 +589,11 @@ static void ebb_switch_out(unsigned long mmcr0)
>> current->thread.sdar = mfspr(SPRN_SDAR);
>> current->thread.mmcr0 = mmcr0 & MMCR0_USER_MASK;
>> current->thread.mmcr2 = mfspr(SPRN_MMCR2) & MMCR2_USER_MASK;
>> + if (ppmu->flags & PPMU_ARCH_310S) {
>> + current->thread.mmcr3 = mfspr(SPRN_MMCR3);
> Like MMCR0_USER_MASK and MMCR2_USER_MASK do we need a MMCR3_USER_MASK
> here, or is there no need?
mmcr0 and mmcr2 are visible via ptrace, so masking them here means we
don't expose any bits to userspace via ptrace that aren't also visible
by reading the register.
So at least while mmcr3 is not exposed via ptrace it's safe to not mask
it, if there are even any sensitive bits in it.
cheers
^ permalink raw reply
* Re: [v3 04/15] powerpc/perf: Add support for ISA3.1 PMU SPRs
From: Jordan Niethe @ 2020-07-22 10:52 UTC (permalink / raw)
To: Athira Rajeev
Cc: Gautham R Shenoy, Michael Neuling, maddy, kvm, kvm-ppc, svaidyan,
acme, jolsa, linuxppc-dev
In-Reply-To: <F54F7D50-8255-428A-B79E-EE1E2D70074B@linux.vnet.ibm.com>
On Wed, Jul 22, 2020 at 6:07 PM Athira Rajeev
<atrajeev@linux.vnet.ibm.com> wrote:
>
>
>
> On 22-Jul-2020, at 9:48 AM, Jordan Niethe <jniethe5@gmail.com> wrote:
>
> On Sat, Jul 18, 2020 at 1:02 AM Athira Rajeev
> <atrajeev@linux.vnet.ibm.com> wrote:
>
>
> From: Madhavan Srinivasan <maddy@linux.ibm.com>
>
> PowerISA v3.1 includes new performance monitoring unit(PMU)
> special purpose registers (SPRs). They are
>
> Monitor Mode Control Register 3 (MMCR3)
> Sampled Instruction Event Register 2 (SIER2)
> Sampled Instruction Event Register 3 (SIER3)
>
> MMCR3 is added for further sampling related configuration
> control. SIER2/SIER3 are added to provide additional
> information about the sampled instruction.
>
> Patch adds new PPMU flag called "PPMU_ARCH_310S" to support
> handling of these new SPRs, updates the struct thread_struct
> to include these new SPRs, include MMCR3 in struct mmcr_regs.
> This is needed to support programming of MMCR3 SPR during
> event_[enable/disable]. Patch also adds the sysfs support
> for the MMCR3 SPR along with SPRN_ macros for these new pmu sprs.
>
> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
> ---
> arch/powerpc/include/asm/perf_event_server.h | 2 ++
> arch/powerpc/include/asm/processor.h | 4 ++++
> arch/powerpc/include/asm/reg.h | 6 ++++++
> arch/powerpc/kernel/sysfs.c | 8 ++++++++
> arch/powerpc/perf/core-book3s.c | 29 ++++++++++++++++++++++++++++
> 5 files changed, 49 insertions(+)
>
> diff --git a/arch/powerpc/include/asm/perf_event_server.h b/arch/powerpc/include/asm/perf_event_server.h
> index 14b8dc1..832450a 100644
> --- a/arch/powerpc/include/asm/perf_event_server.h
> +++ b/arch/powerpc/include/asm/perf_event_server.h
> @@ -22,6 +22,7 @@ struct mmcr_regs {
> unsigned long mmcr1;
> unsigned long mmcr2;
> unsigned long mmcra;
> + unsigned long mmcr3;
> };
> /*
> * This struct provides the constants and functions needed to
> @@ -75,6 +76,7 @@ struct power_pmu {
> #define PPMU_HAS_SIER 0x00000040 /* Has SIER */
> #define PPMU_ARCH_207S 0x00000080 /* PMC is architecture v2.07S */
> #define PPMU_NO_SIAR 0x00000100 /* Do not use SIAR */
> +#define PPMU_ARCH_310S 0x00000200 /* Has MMCR3, SIER2 and SIER3 */
>
> We elsewhere have CPU_FTR_ARCH_31, so should this be PPMU_ARCH_31S to
> be consistent.
>
>
>
> Ok,
> This change will need to be done in all places which are currently using PPMU_ARCH_310S
>
> /*
> * Values for flags to get_alternatives()
> diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
> index 52a6783..a466e94 100644
> --- a/arch/powerpc/include/asm/processor.h
> +++ b/arch/powerpc/include/asm/processor.h
> @@ -272,6 +272,10 @@ struct thread_struct {
> unsigned mmcr0;
>
> unsigned used_ebb;
> + unsigned long mmcr3;
> + unsigned long sier2;
> + unsigned long sier3;
> +
> #endif
> };
>
> diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
> index 88e6c78..21a1b2d 100644
> --- a/arch/powerpc/include/asm/reg.h
> +++ b/arch/powerpc/include/asm/reg.h
> @@ -876,7 +876,9 @@
> #define MMCR0_FCHV 0x00000001UL /* freeze conditions in hypervisor mode */
> #define SPRN_MMCR1 798
> #define SPRN_MMCR2 785
> +#define SPRN_MMCR3 754
> #define SPRN_UMMCR2 769
> +#define SPRN_UMMCR3 738
> #define SPRN_MMCRA 0x312
> #define MMCRA_SDSYNC 0x80000000UL /* SDAR synced with SIAR */
> #define MMCRA_SDAR_DCACHE_MISS 0x40000000UL
> @@ -918,6 +920,10 @@
> #define SIER_SIHV 0x1000000 /* Sampled MSR_HV */
> #define SIER_SIAR_VALID 0x0400000 /* SIAR contents valid */
> #define SIER_SDAR_VALID 0x0200000 /* SDAR contents valid */
> +#define SPRN_SIER2 752
> +#define SPRN_SIER3 753
> +#define SPRN_USIER2 736
> +#define SPRN_USIER3 737
> #define SPRN_SIAR 796
> #define SPRN_SDAR 797
> #define SPRN_TACR 888
> diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
> index 571b325..46b4ebc 100644
> --- a/arch/powerpc/kernel/sysfs.c
> +++ b/arch/powerpc/kernel/sysfs.c
> @@ -622,8 +622,10 @@ void ppc_enable_pmcs(void)
> SYSFS_PMCSETUP(pmc8, SPRN_PMC8);
>
> SYSFS_PMCSETUP(mmcra, SPRN_MMCRA);
> +SYSFS_PMCSETUP(mmcr3, SPRN_MMCR3);
>
> static DEVICE_ATTR(mmcra, 0600, show_mmcra, store_mmcra);
> +static DEVICE_ATTR(mmcr3, 0600, show_mmcr3, store_mmcr3);
> #endif /* HAS_PPC_PMC56 */
>
>
> @@ -886,6 +888,9 @@ static int register_cpu_online(unsigned int cpu)
> #ifdef CONFIG_PMU_SYSFS
> if (cpu_has_feature(CPU_FTR_MMCRA))
> device_create_file(s, &dev_attr_mmcra);
> +
> + if (cpu_has_feature(CPU_FTR_ARCH_31))
> + device_create_file(s, &dev_attr_mmcr3);
> #endif /* CONFIG_PMU_SYSFS */
>
> if (cpu_has_feature(CPU_FTR_PURR)) {
> @@ -980,6 +985,9 @@ static int unregister_cpu_online(unsigned int cpu)
> #ifdef CONFIG_PMU_SYSFS
> if (cpu_has_feature(CPU_FTR_MMCRA))
> device_remove_file(s, &dev_attr_mmcra);
> +
> + if (cpu_has_feature(CPU_FTR_ARCH_31))
> + device_remove_file(s, &dev_attr_mmcr3);
> #endif /* CONFIG_PMU_SYSFS */
>
> if (cpu_has_feature(CPU_FTR_PURR)) {
> diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
> index f4d07b5..ca32fc0 100644
> --- a/arch/powerpc/perf/core-book3s.c
> +++ b/arch/powerpc/perf/core-book3s.c
> @@ -72,6 +72,11 @@ struct cpu_hw_events {
> /*
> * 32-bit doesn't have MMCRA but does have an MMCR2,
> * and a few other names are different.
> + * Also 32-bit doesn't have MMCR3, SIER2 and SIER3.
> + * Define them as zero knowing that any code path accessing
> + * these registers (via mtspr/mfspr) are done under ppmu flag
> + * check for PPMU_ARCH_310S and we will not enter that code path
> + * for 32-bit.
> */
> #ifdef CONFIG_PPC32
>
> @@ -85,6 +90,9 @@ struct cpu_hw_events {
> #define MMCR0_PMCC_U6 0
>
> #define SPRN_MMCRA SPRN_MMCR2
> +#define SPRN_MMCR3 0
> +#define SPRN_SIER2 0
> +#define SPRN_SIER3 0
> #define MMCRA_SAMPLE_ENABLE 0
>
> static inline unsigned long perf_ip_adjust(struct pt_regs *regs)
> @@ -581,6 +589,11 @@ static void ebb_switch_out(unsigned long mmcr0)
> current->thread.sdar = mfspr(SPRN_SDAR);
> current->thread.mmcr0 = mmcr0 & MMCR0_USER_MASK;
> current->thread.mmcr2 = mfspr(SPRN_MMCR2) & MMCR2_USER_MASK;
> + if (ppmu->flags & PPMU_ARCH_310S) {
> + current->thread.mmcr3 = mfspr(SPRN_MMCR3);
>
> Like MMCR0_USER_MASK and MMCR2_USER_MASK do we need a MMCR3_USER_MASK
> here, or is there no need?
>
>
> Jordan
>
> We don’t need user mask for MMCR3 and other new SPRs ( SIER2/3) . Incase of MMCR3, we dont have any Freeze control bits and incase of SIER2/3, it is similar to SIER (where HW handles the masking of the bits), hence we didn't add any user_mask for these SPRs
Okay thank you for explaining that.
>
> Thanks
> Athira
>
> + current->thread.sier2 = mfspr(SPRN_SIER2);
> + current->thread.sier3 = mfspr(SPRN_SIER3);
> + }
> }
>
> static unsigned long ebb_switch_in(bool ebb, struct cpu_hw_events *cpuhw)
> @@ -620,6 +633,12 @@ static unsigned long ebb_switch_in(bool ebb, struct cpu_hw_events *cpuhw)
> * instead manage the MMCR2 entirely by itself.
> */
> mtspr(SPRN_MMCR2, cpuhw->mmcr.mmcr2 | current->thread.mmcr2);
> +
> + if (ppmu->flags & PPMU_ARCH_310S) {
> + mtspr(SPRN_MMCR3, current->thread.mmcr3);
> + mtspr(SPRN_SIER2, current->thread.sier2);
> + mtspr(SPRN_SIER3, current->thread.sier3);
> + }
> out:
> return mmcr0;
> }
> @@ -840,6 +859,11 @@ void perf_event_print_debug(void)
> pr_info("EBBRR: %016lx BESCR: %016lx\n",
> mfspr(SPRN_EBBRR), mfspr(SPRN_BESCR));
> }
> +
> + if (ppmu->flags & PPMU_ARCH_310S) {
> + pr_info("MMCR3: %016lx SIER2: %016lx SIER3: %016lx\n",
> + mfspr(SPRN_MMCR3), mfspr(SPRN_SIER2), mfspr(SPRN_SIER3));
> + }
> #endif
> pr_info("SIAR: %016lx SDAR: %016lx SIER: %016lx\n",
> mfspr(SPRN_SIAR), sdar, sier);
> @@ -1305,6 +1329,8 @@ static void power_pmu_enable(struct pmu *pmu)
> if (!cpuhw->n_added) {
> mtspr(SPRN_MMCRA, cpuhw->mmcr.mmcra & ~MMCRA_SAMPLE_ENABLE);
> mtspr(SPRN_MMCR1, cpuhw->mmcr.mmcr1);
> + if (ppmu->flags & PPMU_ARCH_310S)
> + mtspr(SPRN_MMCR3, cpuhw->mmcr.mmcr3);
> goto out_enable;
> }
>
> @@ -1348,6 +1374,9 @@ static void power_pmu_enable(struct pmu *pmu)
> if (ppmu->flags & PPMU_ARCH_207S)
> mtspr(SPRN_MMCR2, cpuhw->mmcr.mmcr2);
>
> + if (ppmu->flags & PPMU_ARCH_310S)
> + mtspr(SPRN_MMCR3, cpuhw->mmcr.mmcr3);
> +
> /*
> * Read off any pre-existing events that need to move
> * to another PMC.
> --
> 1.8.3.1
>
>
^ permalink raw reply
* Re: [v3 07/15] powerpc/perf: Add power10_feat to dt_cpu_ftrs
From: Jordan Niethe @ 2020-07-22 10:49 UTC (permalink / raw)
To: Athira Rajeev
Cc: Gautham R Shenoy, Michael Neuling, maddy, kvm, kvm-ppc, svaidyan,
acme, jolsa, linuxppc-dev
In-Reply-To: <9A4E06A2-5686-4C85-B2F7-0904F195B58A@linux.vnet.ibm.com>
On Wed, Jul 22, 2020 at 5:55 PM Athira Rajeev
<atrajeev@linux.vnet.ibm.com> wrote:
>
>
>
> On 22-Jul-2020, at 10:11 AM, Jordan Niethe <jniethe5@gmail.com> wrote:
>
> On Sat, Jul 18, 2020 at 1:13 AM Athira Rajeev
> <atrajeev@linux.vnet.ibm.com> wrote:
>
>
> From: Madhavan Srinivasan <maddy@linux.ibm.com>
>
> Add power10 feature function to dt_cpu_ftrs.c along
> with a power10 specific init() to initialize pmu sprs,
> sets the oprofile_cpu_type and cpu_features. This will
> enable performance monitoring unit(PMU) for Power10
> in CPU features with "performance-monitor-power10".
>
> For PowerISA v3.1, BHRB disable is controlled via Monitor Mode
> Control Register A (MMCRA) bit, namely "BHRB Recording Disable
> (BHRBRD)". This patch initializes MMCRA BHRBRD to disable BHRB
> feature at boot for power10.
>
> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
> ---
> arch/powerpc/include/asm/reg.h | 3 +++
> arch/powerpc/kernel/cpu_setup_power.S | 8 ++++++++
> arch/powerpc/kernel/dt_cpu_ftrs.c | 26 ++++++++++++++++++++++++++
> 3 files changed, 37 insertions(+)
>
> diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
> index 21a1b2d..900ada1 100644
> --- a/arch/powerpc/include/asm/reg.h
> +++ b/arch/powerpc/include/asm/reg.h
> @@ -1068,6 +1068,9 @@
> #define MMCR0_PMC2_LOADMISSTIME 0x5
> #endif
>
> +/* BHRB disable bit for PowerISA v3.10 */
> +#define MMCRA_BHRB_DISABLE 0x0000002000000000
>
> Shouldn't this go under SPRN_MMCRA with the other MMCRA_*.
>
>
>
> Hi Jordan
>
> Ok, the definition of MMCRA is under #ifdef for 64 bit . if I move definition of MMCRA_BHRB_DISABLE along with other SPR's, I also
> need to define this for 32-bit to satisfy core-book3s to compile as below:
>
> diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
> index 900ada10762c..7e271657b412 100644
> --- a/arch/powerpc/include/asm/reg.h
> +++ b/arch/powerpc/include/asm/reg.h
> @@ -888,6 +888,8 @@
> #define MMCRA_SLOT 0x07000000UL /* SLOT bits (37-39) */
> #define MMCRA_SLOT_SHIFT 24
> #define MMCRA_SAMPLE_ENABLE 0x00000001UL /* enable sampling */
> +/* BHRB disable bit for PowerISA v3.10 */
> +#define MMCRA_BHRB_DISABLE 0x0000002000000000
> #define POWER6_MMCRA_SDSYNC 0x0000080000000000ULL /* SDAR/SIAR synced */
> #define POWER6_MMCRA_SIHV 0x0000040000000000ULL
> #define POWER6_MMCRA_SIPR 0x0000020000000000ULL
> @@ -1068,9 +1070,6 @@
> #define MMCR0_PMC2_LOADMISSTIME 0x5
> #endif
>
>
>
> -/* BHRB disable bit for PowerISA v3.10 */
> -#define MMCRA_BHRB_DISABLE 0x0000002000000000
> -
> /*
> * SPRG usage:
> *
> diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
> index 36baae666387..88068f20827c 100644
> --- a/arch/powerpc/perf/core-book3s.c
> +++ b/arch/powerpc/perf/core-book3s.c
> @@ -94,6 +94,7 @@ static unsigned int freeze_events_kernel = MMCR0_FCS;
> #define SPRN_SIER2 0
> #define SPRN_SIER3 0
> #define MMCRA_SAMPLE_ENABLE 0
> +#define MMCRA_BHRB_DISABLE 0
>
>
>
> static inline unsigned long perf_ip_adjust(struct pt_regs *regs)
> {
>
>
>
> +
> /*
> * SPRG usage:
> *
> diff --git a/arch/powerpc/kernel/cpu_setup_power.S b/arch/powerpc/kernel/cpu_setup_power.S
> index efdcfa7..b8e0d1e 100644
> --- a/arch/powerpc/kernel/cpu_setup_power.S
> +++ b/arch/powerpc/kernel/cpu_setup_power.S
> @@ -94,6 +94,7 @@ _GLOBAL(__restore_cpu_power8)
> _GLOBAL(__setup_cpu_power10)
> mflr r11
> bl __init_FSCR_power10
> + bl __init_PMU_ISA31
>
> So we set MMCRA here but then aren't we still going to call __init_PMU
> which will overwrite that?
> Would this setting MMCRA also need to be handled in __restore_cpu_power10?
>
>
> Thanks for this nice catch ! When I rebased code initial phase, we didn’t had power10 part filled in.
> It was a miss from my side in adding PMu init functions and thanks for pointing this out.
> Below patch will call __init_PMU functions in setup and restore. Please check if this looks good
>
> --
> diff --git a/arch/powerpc/kernel/cpu_setup_power.S b/arch/powerpc/kernel/cpu_setup_power.S
> index efdcfa714106..e672a6c5fd7c 100644
> --- a/arch/powerpc/kernel/cpu_setup_power.S
> +++ b/arch/powerpc/kernel/cpu_setup_power.S
> @@ -94,6 +94,9 @@ _GLOBAL(__restore_cpu_power8)
> _GLOBAL(__setup_cpu_power10)
> mflr r11
> bl __init_FSCR_power10
> + bl __init_PMU
> + bl __init_PMU_ISA31
> + bl __init_PMU_HV
> b 1f
>
> _GLOBAL(__setup_cpu_power9)
Won't you also need to change where the label 1 is:
--- a/arch/powerpc/kernel/cpu_setup_power.S
+++ b/arch/powerpc/kernel/cpu_setup_power.S
@@ -100,8 +100,8 @@ _GLOBAL(__setup_cpu_power10)
_GLOBAL(__setup_cpu_power9)
mflr r11
bl __init_FSCR
-1: bl __init_PMU
- bl __init_hvmode_206
+ bl __init_PMU
+1: bl __init_hvmode_206
mtlr r11
beqlr
li r0,0
> @@ -124,6 +127,9 @@ _GLOBAL(__setup_cpu_power9)
> _GLOBAL(__restore_cpu_power10)
> mflr r11
> bl __init_FSCR_power10
> + bl __init_PMU
> + bl __init_PMU_ISA31
> + bl __init_PMU_HV
> b 1f
>
> _GLOBAL(__restore_cpu_power9)
> @@ -233,3 +239,10 @@ __init_PMU_ISA207:
> li r5,0
> mtspr SPRN_MMCRS,r5
> blr
> +
> +__init_PMU_ISA31:
> + li r5,0
> + mtspr SPRN_MMCR3,r5
> + LOAD_REG_IMMEDIATE(r5, MMCRA_BHRB_DISABLE)
> + mtspr SPRN_MMCRA,r5
> + blr
>
> —
>
> b 1f
>
> _GLOBAL(__setup_cpu_power9)
> @@ -233,3 +234,10 @@ __init_PMU_ISA207:
> li r5,0
> mtspr SPRN_MMCRS,r5
> blr
> +
> +__init_PMU_ISA31:
> + li r5,0
> + mtspr SPRN_MMCR3,r5
> + LOAD_REG_IMMEDIATE(r5, MMCRA_BHRB_DISABLE)
> + mtspr SPRN_MMCRA,r5
> + blr
> diff --git a/arch/powerpc/kernel/dt_cpu_ftrs.c b/arch/powerpc/kernel/dt_cpu_ftrs.c
> index 3a40951..f482286 100644
> --- a/arch/powerpc/kernel/dt_cpu_ftrs.c
> +++ b/arch/powerpc/kernel/dt_cpu_ftrs.c
> @@ -450,6 +450,31 @@ static int __init feat_enable_pmu_power9(struct dt_cpu_feature *f)
> return 1;
> }
>
> +static void init_pmu_power10(void)
> +{
> + init_pmu_power9();
> +
> + mtspr(SPRN_MMCR3, 0);
> + mtspr(SPRN_MMCRA, MMCRA_BHRB_DISABLE);
> +}
> +
> +static int __init feat_enable_pmu_power10(struct dt_cpu_feature *f)
> +{
> + hfscr_pmu_enable();
> +
> + init_pmu_power10();
> + init_pmu_registers = init_pmu_power10;
> +
> + cur_cpu_spec->cpu_features |= CPU_FTR_MMCRA;
> + cur_cpu_spec->cpu_user_features |= PPC_FEATURE_PSERIES_PERFMON_COMPAT;
> +
> + cur_cpu_spec->num_pmcs = 6;
> + cur_cpu_spec->pmc_type = PPC_PMC_IBM;
> + cur_cpu_spec->oprofile_cpu_type = "ppc64/power10";
> +
> + return 1;
> +}
> +
> static int __init feat_enable_tm(struct dt_cpu_feature *f)
> {
> #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
> @@ -639,6 +664,7 @@ struct dt_cpu_feature_match {
> {"pc-relative-addressing", feat_enable, 0},
> {"machine-check-power9", feat_enable_mce_power9, 0},
> {"performance-monitor-power9", feat_enable_pmu_power9, 0},
> + {"performance-monitor-power10", feat_enable_pmu_power10, 0},
> {"event-based-branch-v3", feat_enable, 0},
> {"random-number-generator", feat_enable, 0},
> {"system-call-vectored", feat_disable, 0},
> --
> 1.8.3.1
>
>
^ permalink raw reply
* Re: [PATCH] powerpc/64s: Fix irq tracing corruption in interrupt/syscall return caused by perf interrupts
From: Alexey Kardashevskiy @ 2020-07-22 10:50 UTC (permalink / raw)
To: Nicholas Piggin, linuxppc-dev
In-Reply-To: <20200722073437.930521-1-npiggin@gmail.com>
On 22/07/2020 17:34, Nicholas Piggin wrote:
> Alexey reports lockdep_assert_irqs_enabled() warnings when stress testing perf, e.g.,
>
> WARNING: CPU: 0 PID: 1556 at kernel/softirq.c:169 __local_bh_enable_ip+0x258/0x270
> CPU: 0 PID: 1556 Comm: syz-executor
> NIP: c0000000001ec888 LR: c0000000001ec884 CTR: c000000000ef0610
> REGS: c000000022d4f8a0 TRAP: 0700 Not tainted (5.8.0-rc3-x)
> MSR: 8000000000029033 <SF,EE,ME,IR,DR,RI,LE> CR: 28008844 XER: 20040000
> CFAR: c0000000001dc1d0 IRQMASK: 0
>
> The interesting thing is MSR[EE] and IRQMASK shows interrupts are enabled,
> suggesting the current->hardirqs_enabled irq tracing state is going out of sync
> with the actual interrupt enable state.
>
> The cause is a window in interrupt/syscall return where irq tracing state is being
> adjusted for an irqs-enabled return while MSR[EE] is still enabled. A perf
> interrupt hits and ends up calling trace_hardirqs_off() when restoring
> interrupt flags to a disable state.
>
> Fix this by disabling perf interrupts as well while adjusting irq tracing state.
>
> Add a debug check that catches the condition sooner.
>
> Fixes: 68b34588e202 ("powerpc/64/sycall: Implement syscall entry/exit logic in C")
> Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>
> I can reproduce similar symptoms and this patch fixes my test case,
> still trying to confirm Alexey's test case or whether there's another
> similar bug causing it.
This does not fix my testcase. I applied this on top of 4fa640dc5230
("Merge tag 'vfio-v5.8-rc7' of git://github.com/awilliam/linux-vfio into
master") without any of my testing code, just to be clear. Sorry...
>
> arch/powerpc/kernel/syscall_64.c | 11 +++++++++--
> 1 file changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/kernel/syscall_64.c b/arch/powerpc/kernel/syscall_64.c
> index 79edba3ab312..6c6f88eff915 100644
> --- a/arch/powerpc/kernel/syscall_64.c
> +++ b/arch/powerpc/kernel/syscall_64.c
> @@ -107,8 +107,13 @@ notrace long system_call_exception(long r3, long r4, long r5,
> */
> static notrace inline bool prep_irq_for_enabled_exit(void)
> {
> - /* This must be done with RI=1 because tracing may touch vmaps */
> - trace_hardirqs_on();
> + if (IS_ENABLED(CONFIG_TRACE_IRQFLAGS)) {
> + /* Prevent perf interrupts hitting and messing up the trace_hardirqs state */
> + irq_soft_mask_set(IRQS_ALL_DISABLED);
> +
> + /* This must be done with RI=1 because tracing may touch vmaps */
> + trace_hardirqs_on();
> + }
>
> /* This pattern matches prep_irq_for_idle */
> __hard_EE_RI_disable();
> @@ -123,6 +128,8 @@ static notrace inline bool prep_irq_for_enabled_exit(void)
> local_paca->irq_happened = 0;
> irq_soft_mask_set(IRQS_ENABLED);
>
> + lockdep_assert_irqs_enabled();
> +
> return true;
> }
>
>
--
Alexey
^ permalink raw reply
* Re: [PATCH 09/20] Documentation: i2c: eliminate duplicated word
From: Wolfram Sang @ 2020-07-22 10:49 UTC (permalink / raw)
To: Randy Dunlap
Cc: kvm, linux-doc, David Airlie, kgdb-bugreport, linux-fpga,
Liviu Dudau, dri-devel, Douglas Anderson, Paul Cercueil, keyrings,
Paul Mackerras, linux-i2c, Pavel Machek, Srinivas Pandruvada,
Mihail Atanassov, linux-leds, linux-s390, Daniel Thompson,
linux-scsi, Jonathan Corbet, Masahiro Yamada, Matthew Wilcox,
Halil Pasic, Jarkko Sakkinen, James Wang, linux-input,
Mali DP Maintainers, Derek Kiernan, linux-mips, Dragan Cvetic,
Wu Hao, Tony Krowiak, linux-kbuild, James E.J. Bottomley,
Jiri Kosina, Hannes Reinecke, linux-block, Thomas Bogendoerfer,
Jacek Anaszewski, linux-mm, Dan Williams, Andrew Morton,
Mimi Zohar, Jens Axboe, Michal Marek, Martin K. Petersen,
Pierre Morel, linux-kernel, Daniel Vetter, Jason Wessel,
Paolo Bonzini, linux-integrity, linuxppc-dev, Mike Rapoport,
Dan Murphy
In-Reply-To: <20200707180414.10467-10-rdunlap@infradead.org>
[-- Attachment #1: Type: text/plain, Size: 214 bytes --]
On Tue, Jul 07, 2020 at 11:04:03AM -0700, Randy Dunlap wrote:
> Drop doubled word "new".
>
> Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
For the record:
Acked-by: Wolfram Sang <wsa@kernel.org>
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply
* Re: [v3 07/15] powerpc/perf: Add power10_feat to dt_cpu_ftrs
From: Michael Ellerman @ 2020-07-22 10:39 UTC (permalink / raw)
To: Athira Rajeev, Jordan Niethe
Cc: Gautham R Shenoy, Michael Neuling, maddy, kvm, kvm-ppc, svaidyan,
acme, jolsa, linuxppc-dev
In-Reply-To: <9A4E06A2-5686-4C85-B2F7-0904F195B58A@linux.vnet.ibm.com>
Athira Rajeev <atrajeev@linux.vnet.ibm.com> writes:
>> On 22-Jul-2020, at 10:11 AM, Jordan Niethe <jniethe5@gmail.com> wrote:
>>
>> On Sat, Jul 18, 2020 at 1:13 AM Athira Rajeev
>> <atrajeev@linux.vnet.ibm.com <mailto:atrajeev@linux.vnet.ibm.com>> wrote:
>>>
>>> From: Madhavan Srinivasan <maddy@linux.ibm.com>
>>>
>>> Add power10 feature function to dt_cpu_ftrs.c along
>>> with a power10 specific init() to initialize pmu sprs,
>>> sets the oprofile_cpu_type and cpu_features. This will
>>> enable performance monitoring unit(PMU) for Power10
>>> in CPU features with "performance-monitor-power10".
>>>
>>> For PowerISA v3.1, BHRB disable is controlled via Monitor Mode
>>> Control Register A (MMCRA) bit, namely "BHRB Recording Disable
>>> (BHRBRD)". This patch initializes MMCRA BHRBRD to disable BHRB
>>> feature at boot for power10.
>>>
>>> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
>>> ---
>>> arch/powerpc/include/asm/reg.h | 3 +++
>>> arch/powerpc/kernel/cpu_setup_power.S | 8 ++++++++
>>> arch/powerpc/kernel/dt_cpu_ftrs.c | 26 ++++++++++++++++++++++++++
>>> 3 files changed, 37 insertions(+)
>>>
>>> diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
>>> index 21a1b2d..900ada1 100644
>>> --- a/arch/powerpc/include/asm/reg.h
>>> +++ b/arch/powerpc/include/asm/reg.h
>>> @@ -1068,6 +1068,9 @@
>>> #define MMCR0_PMC2_LOADMISSTIME 0x5
>>> #endif
>>>
>>> +/* BHRB disable bit for PowerISA v3.10 */
>>> +#define MMCRA_BHRB_DISABLE 0x0000002000000000
>> Shouldn't this go under SPRN_MMCRA with the other MMCRA_*.
>
>
> Hi Jordan
>
> Ok, the definition of MMCRA is under #ifdef for 64 bit . if I move definition of MMCRA_BHRB_DISABLE along with other SPR's, I also
> need to define this for 32-bit to satisfy core-book3s to compile as below:
>
> diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
> index 900ada10762c..7e271657b412 100644
> --- a/arch/powerpc/include/asm/reg.h
> +++ b/arch/powerpc/include/asm/reg.h
> @@ -888,6 +888,8 @@
> #define MMCRA_SLOT 0x07000000UL /* SLOT bits (37-39) */
> #define MMCRA_SLOT_SHIFT 24
> #define MMCRA_SAMPLE_ENABLE 0x00000001UL /* enable sampling */
> +/* BHRB disable bit for PowerISA v3.10 */
> +#define MMCRA_BHRB_DISABLE 0x0000002000000000
I changed it to:
#define MMCRA_BHRB_DISABLE 0x2000000000UL // BHRB disable bit for ISA v3.1
>>> diff --git a/arch/powerpc/kernel/cpu_setup_power.S b/arch/powerpc/kernel/cpu_setup_power.S
>>> index efdcfa7..b8e0d1e 100644
>>> --- a/arch/powerpc/kernel/cpu_setup_power.S
>>> +++ b/arch/powerpc/kernel/cpu_setup_power.S
>>> @@ -94,6 +94,7 @@ _GLOBAL(__restore_cpu_power8)
>>> _GLOBAL(__setup_cpu_power10)
>>> mflr r11
>>> bl __init_FSCR_power10
>>> + bl __init_PMU_ISA31
>> So we set MMCRA here but then aren't we still going to call __init_PMU
>> which will overwrite that?
>> Would this setting MMCRA also need to be handled in __restore_cpu_power10?
>
> Thanks for this nice catch ! When I rebased code initial phase, we didn’t had power10 part filled in.
> It was a miss from my side in adding PMu init functions and thanks for pointing this out.
> Below patch will call __init_PMU functions in setup and restore. Please check if this looks good
Actually those changes should be in a separate patch.
This one is wiring up DT CPU features, the cpu setup routines are not
used by DT CPU features.
So please send a new patch I can insert into the series that adds the
cpu_setup_power.S changes.
cheers
> --
> diff --git a/arch/powerpc/kernel/cpu_setup_power.S b/arch/powerpc/kernel/cpu_setup_power.S
> index efdcfa714106..e672a6c5fd7c 100644
> --- a/arch/powerpc/kernel/cpu_setup_power.S
> +++ b/arch/powerpc/kernel/cpu_setup_power.S
> @@ -94,6 +94,9 @@ _GLOBAL(__restore_cpu_power8)
> _GLOBAL(__setup_cpu_power10)
> mflr r11
> bl __init_FSCR_power10
> + bl __init_PMU
> + bl __init_PMU_ISA31
> + bl __init_PMU_HV
> b 1f
>
> _GLOBAL(__setup_cpu_power9)
> @@ -124,6 +127,9 @@ _GLOBAL(__setup_cpu_power9)
> _GLOBAL(__restore_cpu_power10)
> mflr r11
> bl __init_FSCR_power10
> + bl __init_PMU
> + bl __init_PMU_ISA31
> + bl __init_PMU_HV
> b 1f
>
> _GLOBAL(__restore_cpu_power9)
> @@ -233,3 +239,10 @@ __init_PMU_ISA207:
> li r5,0
> mtspr SPRN_MMCRS,r5
> blr
> +
> +__init_PMU_ISA31:
> + li r5,0
> + mtspr SPRN_MMCR3,r5
> + LOAD_REG_IMMEDIATE(r5, MMCRA_BHRB_DISABLE)
> + mtspr SPRN_MMCRA,r5
> + blr
>
^ permalink raw reply
* Re: [PATCH 15/15] powerpc/powernv/sriov: Make single PE mode a per-BAR setting
From: Alexey Kardashevskiy @ 2020-07-22 10:06 UTC (permalink / raw)
To: Oliver O'Halloran; +Cc: linuxppc-dev@lists.ozlabs.org
In-Reply-To: <CAOSf1CF1_Ga1KDhqLGxTgg+ugj6AfrzXbouZq1MiMa0faHZeeg@mail.gmail.com>
On 22/07/2020 15:39, Oliver O'Halloran wrote:
> On Wed, Jul 15, 2020 at 6:00 PM Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
>>
>>>>> *
>>>>> - * Generally, one M64 BAR maps one IOV BAR. To avoid conflict
>>>>> - * with other devices, IOV BAR size is expanded to be
>>>>> - * (total_pe * VF_BAR_size). When VF_BAR_size is half of M64
>>>>> - * segment size , the expanded size would equal to half of the
>>>>> - * whole M64 space size, which will exhaust the M64 Space and
>>>>> - * limit the system flexibility. This is a design decision to
>>>>> - * set the boundary to quarter of the M64 segment size.
>>>>> + * The 1/4 limit is arbitrary and can be tweaked.
>>>>> */
>>>>> - if (total_vf_bar_sz > gate) {
>>>>> - mul = roundup_pow_of_two(total_vfs);
>>>>> - dev_info(&pdev->dev,
>>>>> - "VF BAR Total IOV size %llx > %llx, roundup to %d VFs\n",
>>>>> - total_vf_bar_sz, gate, mul);
>>>>> - iov->m64_single_mode = true;
>>>>> - break;
>>>>> - }
>>>>> - }
>>>>> + if (vf_bar_sz > (phb->ioda.m64_segsize >> 2)) {
>>>>> + /*
>>>>> + * On PHB3, the minimum size alignment of M64 BAR in
>>>>> + * single mode is 32MB. If this VF BAR is smaller than
>>>>> + * 32MB, but still too large for a segmented window
>>>>> + * then we can't map it and need to disable SR-IOV for
>>>>> + * this device.
>>>>
>>>>
>>>> Why not use single PE mode for such BAR? Better than nothing.
>>>
>>> Suppose you could, but I figured VFs were mainly interesting since you
>>> could give each VF to a separate guest. If there's multiple VFs under
>>> the same single PE BAR then they'd have to be assigned to the same
>>
>> True. But with one PE per VF we can still have 15 (or 14?) isolated VFs
>> which is not hundreds but better than 0.
>
> We can only use single PE BARs if the per-VF size is >= 32MB due to
> the alignment requirements on P8. If the per-VF size is smaller then
> we're stuck with multiple VFs inside the same BAR which is bad due to
> the PAPR requirements mentioned below. Sure we could look at doing
> something else, but considering this matches the current behaviour
> it's a bit hard to care...
>
>>> guest in order to retain the freeze/unfreeze behaviour that PAPR
>>> requires. I guess that's how it used to work, but it seems better just
>>> to disable them rather than having VFs which sort of work.
>>
>> Well, realistically the segment size should be 8MB to make this matter
>> (or the whole window 2GB) which does not seem to happen so it does not
>> matter.
>
> I'm not sure what you mean.
I mean how can we possibly hit this case, what m64_segsize would the
platform have to trigger this. The whole check seems useless but whatever.
--
Alexey
^ permalink raw reply
* Re: [v4 5/5] KVM: PPC: Book3S HV: migrate hot plugged memory
From: Bharata B Rao @ 2020-07-22 10:01 UTC (permalink / raw)
To: Ram Pai
Cc: ldufour, cclaudio, kvm-ppc, sathnaga, aneesh.kumar, sukadev,
linuxppc-dev, bauerman, david
In-Reply-To: <1594972827-13928-6-git-send-email-linuxram@us.ibm.com>
On Fri, Jul 17, 2020 at 01:00:27AM -0700, Ram Pai wrote:
> From: Laurent Dufour <ldufour@linux.ibm.com>
>
> When a memory slot is hot plugged to a SVM, PFNs associated with the
> GFNs in that slot must be migrated to secure-PFNs, aka device-PFNs.
>
> Call kvmppc_uv_migrate_mem_slot() to accomplish this.
> Disable page-merge for all pages in the memory slot.
>
> Signed-off-by: Ram Pai <linuxram@us.ibm.com>
> [rearranged the code, and modified the commit log]
> Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
> ---
> arch/powerpc/include/asm/kvm_book3s_uvmem.h | 10 ++++++++++
> arch/powerpc/kvm/book3s_hv.c | 10 ++--------
> arch/powerpc/kvm/book3s_hv_uvmem.c | 22 ++++++++++++++++++++++
> 3 files changed, 34 insertions(+), 8 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/kvm_book3s_uvmem.h b/arch/powerpc/include/asm/kvm_book3s_uvmem.h
> index f229ab5..6f7da00 100644
> --- a/arch/powerpc/include/asm/kvm_book3s_uvmem.h
> +++ b/arch/powerpc/include/asm/kvm_book3s_uvmem.h
> @@ -25,6 +25,9 @@ void kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *free,
> struct kvm *kvm, bool skip_page_out);
> int kvmppc_uv_migrate_mem_slot(struct kvm *kvm,
> const struct kvm_memory_slot *memslot);
> +void kvmppc_memslot_create(struct kvm *kvm, const struct kvm_memory_slot *new);
> +void kvmppc_memslot_delete(struct kvm *kvm, const struct kvm_memory_slot *old);
The names look a bit generic, but these functions are specific
to secure guests. May be rename them to kvmppc_uvmem_memslot_[create/delele]?
> +
> #else
> static inline int kvmppc_uvmem_init(void)
> {
> @@ -84,5 +87,12 @@ static inline int kvmppc_send_page_to_uv(struct kvm *kvm, unsigned long gfn)
> static inline void
> kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *free,
> struct kvm *kvm, bool skip_page_out) { }
> +
> +static inline void kvmppc_memslot_create(struct kvm *kvm,
> + const struct kvm_memory_slot *new) { }
> +
> +static inline void kvmppc_memslot_delete(struct kvm *kvm,
> + const struct kvm_memory_slot *old) { }
> +
> #endif /* CONFIG_PPC_UV */
> #endif /* __ASM_KVM_BOOK3S_UVMEM_H__ */
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index d331b46..bf3be3b 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -4515,16 +4515,10 @@ static void kvmppc_core_commit_memory_region_hv(struct kvm *kvm,
>
> switch (change) {
> case KVM_MR_CREATE:
> - if (kvmppc_uvmem_slot_init(kvm, new))
> - return;
> - uv_register_mem_slot(kvm->arch.lpid,
> - new->base_gfn << PAGE_SHIFT,
> - new->npages * PAGE_SIZE,
> - 0, new->id);
> + kvmppc_memslot_create(kvm, new);
> break;
> case KVM_MR_DELETE:
> - uv_unregister_mem_slot(kvm->arch.lpid, old->id);
> - kvmppc_uvmem_slot_free(kvm, old);
> + kvmppc_memslot_delete(kvm, old);
> break;
> default:
> /* TODO: Handle KVM_MR_MOVE */
> diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c
> index a206984..a2b4d25 100644
> --- a/arch/powerpc/kvm/book3s_hv_uvmem.c
> +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
> @@ -1089,6 +1089,28 @@ int kvmppc_send_page_to_uv(struct kvm *kvm, unsigned long gfn)
> return (ret == U_SUCCESS) ? RESUME_GUEST : -EFAULT;
> }
>
> +void kvmppc_memslot_create(struct kvm *kvm, const struct kvm_memory_slot *new)
> +{
> + if (kvmppc_uvmem_slot_init(kvm, new))
> + return;
> +
> + if (kvmppc_memslot_page_merge(kvm, new, false))
> + return;
> +
> + if (uv_register_mem_slot(kvm->arch.lpid, new->base_gfn << PAGE_SHIFT,
> + new->npages * PAGE_SIZE, 0, new->id))
> + return;
> +
> + kvmppc_uv_migrate_mem_slot(kvm, new);
Quite a few things can return failure here including
kvmppc_uv_migrate_mem_slot() and we are ignoring all of those.
I am wondering if this should be called from prepare_memory_region callback
instead of commit_memory_region. In the prepare phase, we have a way
to back out in case of error. Can you check if moving this call to
prepare callback is feasible?
In the other case in 1/5, the code issues ksm unmerge request on error,
but not here.
Also check if the code for 1st three calls can be shared with similar
code in 1/5.
Regards,
Bharata.
^ permalink raw reply
* Re: [PATCH 05/15] powerpc/powernv/sriov: Move SR-IOV into a seperate file
From: Alexey Kardashevskiy @ 2020-07-22 9:53 UTC (permalink / raw)
To: Oliver O'Halloran; +Cc: linuxppc-dev
In-Reply-To: <CAOSf1CF0jv_cq5xgVz+7fzf155MjHT72p+VN1EY6HjjW1Nza-w@mail.gmail.com>
On 22/07/2020 15:01, Oliver O'Halloran wrote:
> On Tue, Jul 14, 2020 at 7:16 PM Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
>>
>> On 10/07/2020 15:23, Oliver O'Halloran wrote:
>>> + align = pci_iov_resource_size(pdev, resno);
>>> +
>>> + /*
>>> + * iov can be null if we have an SR-IOV device with IOV BAR that can't
>>> + * be placed in the m64 space (i.e. The BAR is 32bit or non-prefetch).
>>> + * In that case we don't allow VFs to be enabled so just return the
>>> + * default alignment.
>>> + */
>>> + if (!iov)
>>> + return align;
>>
>>
>> This is the new chunk. What would happen before? Non-prefetch BAR would
>> still go to m64 space?
>
> I don't think there's any real change. Currently if the setup in
> pnv_pci_ioda_fixup_iov_resources() fails then pdn->vfs_expanded will
> be zero. The !iov check here fills the same role, but it's more
> explicit. vfs_expanded has some other behaviour too so we can't get
> rid of it entirely (yet).
The check is fine, you have to have one as @iov can be NULL (unlike
pci_dn). The comment is what bothered me. It would make more sense
somewhere in pnv_pci_ioda_fixup_iov_resources() near
"dev_warn(&pdev->dev, "Don't support SR-IOV with"" as now it suggests
there is one reason for the failed iov configuration only while there
are two reasons.
--
Alexey
^ permalink raw reply
* Re: [PATCH v5 1/4] riscv: Move kernel mapping to vmalloc zone
From: Arnd Bergmann @ 2020-07-22 9:43 UTC (permalink / raw)
To: Palmer Dabbelt
Cc: Albert Ou, Alexandre Ghiti, Atish Patra, Anup Patel,
linux-kernel@vger.kernel.org, Paul Walmsley, Linux-MM,
Paul Mackerras, Zong Li, linux-riscv, linuxppc-dev
In-Reply-To: <mhng-08bff01a-ca15-4bbc-8454-2ca3e823fef8@palmerdabbelt-glaptop1>
On Tue, Jul 21, 2020 at 9:06 PM Palmer Dabbelt <palmer@dabbelt.com> wrote:
>
> On Tue, 21 Jul 2020 11:36:10 PDT (-0700), alex@ghiti.fr wrote:
> > Let's try to make progress here: I add linux-mm in CC to get feedback on
> > this patch as it blocks sv48 support too.
>
> Sorry for being slow here. I haven't replied because I hadn't really fleshed
> out the design yet, but just so everyone's on the same page my problems with
> this are:
>
> * We waste vmalloc space on 32-bit systems, where there isn't a lot of it.
There is actually an ongoing work to make 32-bit Arm kernels move
vmlinux into the vmalloc space, as part of the move to avoid highmem.
Overall, a 32-bit system would waste about 0.1% of its virtual address space
by having the kernel be located in both the linear map and the vmalloc area.
It's not zero, but not that bad either. With the typical split of 3072 MB user,
768MB linear and 256MB vmalloc, it's also around 1.5% of the available
vmalloc area (assuming a 4MB vmlinux in a typical 32-bit kernel), but the
boundaries can be changed arbitrarily if needed.
The eventual goal is to have a split of 3840MB for either user or linear map
plus and 256MB for vmalloc, including the kernel. Switching between linear
and user has a noticeable runtime overhead, but it relaxes both the limits
for user memory and lowmem, and it provides a somewhat stronger
address space isolation.
Another potential idea would be to completely randomize the physical
addresses underneath the kernel by using a random permutation of the
pages in the kernel image. This adds even more overhead (virt_to_phys
may need to call vmalloc_to_page or similar) and may cause problems
with DMA into kernel .data across page boundaries,
> * Sort out how to maintain a linear map as the canonical hole moves around
> between the VA widths without adding a bunch of overhead to the virt2phys and
> friends. This is probably going to be the trickiest part, but I think if we
> just change the page table code to essentially lie about VAs when an sv39
> system runs an sv48+sv39 kernel we could make it work -- there'd be some
> logical complexity involved, but it would remain fast.
I assume you can't use the trick that x86 has where all kernel addresses
are at the top of the 64-bit address space and user addresses are at the
bottom, regardless of the size of the page tables?
Arnd
^ permalink raw reply
* Re: [PATCH v2 06/10] powerpc/smp: Generalize 2nd sched domain
From: peterz @ 2020-07-22 8:54 UTC (permalink / raw)
To: Srikar Dronamraju
Cc: Ingo Molnar, Nathan Lynch, Oliver OHalloran, Michael Neuling,
Michael Ellerman, Gautham R Shenoy, Jordan Niethe,
Anton Blanchard, LKML, Valentin Schneider, linuxppc-dev,
Nick Piggin
In-Reply-To: <20200722084854.GL10769@hirez.programming.kicks-ass.net>
On Wed, Jul 22, 2020 at 10:48:54AM +0200, Peter Zijlstra wrote:
> But reading your explanation, it looks like the Linux topology setup
> could actually unravel the fused-faux-SMT8 into two independent SMT4
> parts, negating that firmware option.
Ah, it looks like that's exactly what you're doing. Nice!
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox