* Re: [PATCH v2] powerpc/mm: Update tlbiel loop on POWER10
From: Michael Ellerman @ 2020-11-25 11:57 UTC (permalink / raw)
To: Aneesh Kumar K.V, linuxppc-dev, mpe; +Cc: npiggin
In-Reply-To: <20201007053305.232879-1-aneesh.kumar@linux.ibm.com>
On Wed, 7 Oct 2020 11:03:05 +0530, Aneesh Kumar K.V wrote:
> With POWER10, single tlbiel instruction invalidates all the congruence
> class of the TLB and hence we need to issue only one tlbiel with SET=0.
Applied to powerpc/next.
[1/1] powerpc/mm: Update tlbiel loop on POWER10
https://git.kernel.org/powerpc/c/e80639405c40127727812a0e1f8a65ba9979f146
cheers
^ permalink raw reply
* Re: [PATCH] powerpc/mm: move setting pte specific flags to pfn_pmd
From: Michael Ellerman @ 2020-11-25 11:57 UTC (permalink / raw)
To: Aneesh Kumar K.V, linuxppc-dev, mpe; +Cc: linux-mm
In-Reply-To: <20201022091115.39568-1-aneesh.kumar@linux.ibm.com>
On Thu, 22 Oct 2020 14:41:15 +0530, Aneesh Kumar K.V wrote:
> powerpc used to set the pte specific flags in set_pte_at(). This is
> different from other architectures. To be consistent with other
> architecture powerpc updated pfn_pte to set _PAGE_PTE with
> commit 379c926d6334 ("powerpc/mm: move setting pte specific flags to pfn_pte")
>
> The commit didn't do the same w.r.t pfn_pmd because we expect pmd_mkhuge
> to do that. But as per Linus that is a bad rule [1].
> Hence update pfn_pmd to set _PAGE_PTE.
>
> [...]
Applied to powerpc/next.
[1/1] powerpc/mm: Move setting PTE specific flags to pfn_pmd()
https://git.kernel.org/powerpc/c/53f45ecc9cd04b4b963f3040f2a54c3baf03b229
cheers
^ permalink raw reply
* [PATCH v2 2/2] powerpc/pseries: pass MSI affinity to irq_create_mapping()
From: Laurent Vivier @ 2020-11-25 11:16 UTC (permalink / raw)
To: linux-kernel
Cc: Laurent Vivier, Michael S . Tsirkin, linux-pci, Greg Kurz,
linux-block, Paul Mackerras, Marc Zyngier, Thomas Gleixner,
linuxppc-dev, Christoph Hellwig
In-Reply-To: <20201125111657.1141295-1-lvivier@redhat.com>
With virtio multiqueue, normally each queue IRQ is mapped to a CPU.
But since commit 0d9f0a52c8b9f ("virtio_scsi: use virtio IRQ affinity")
this is broken on pseries.
The affinity is correctly computed in msi_desc but this is not applied
to the system IRQs.
It appears the affinity is correctly passed to rtas_setup_msi_irqs() but
lost at this point and never passed to irq_domain_alloc_descs()
(see commit 06ee6d571f0e ("genirq: Add affinity hint to irq allocation"))
because irq_create_mapping() doesn't take an affinity parameter.
As the previous patch has added the affinity parameter to
irq_create_mapping() we can forward the affinity from rtas_setup_msi_irqs()
to irq_domain_alloc_descs().
With this change, the virtqueues are correctly dispatched between the CPUs
on pseries.
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
---
arch/powerpc/platforms/pseries/msi.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/platforms/pseries/msi.c b/arch/powerpc/platforms/pseries/msi.c
index 133f6adcb39c..b3ac2455faad 100644
--- a/arch/powerpc/platforms/pseries/msi.c
+++ b/arch/powerpc/platforms/pseries/msi.c
@@ -458,7 +458,8 @@ static int rtas_setup_msi_irqs(struct pci_dev *pdev, int nvec_in, int type)
return hwirq;
}
- virq = irq_create_mapping(NULL, hwirq);
+ virq = irq_create_mapping_affinity(NULL, hwirq,
+ entry->affinity);
if (!virq) {
pr_debug("rtas_msi: Failed mapping hwirq %d\n", hwirq);
--
2.28.0
^ permalink raw reply related
* [PATCH v2 1/2] genirq: add an irq_create_mapping_affinity() function
From: Laurent Vivier @ 2020-11-25 11:16 UTC (permalink / raw)
To: linux-kernel
Cc: Laurent Vivier, Michael S . Tsirkin, linux-pci, Greg Kurz,
linux-block, Paul Mackerras, Marc Zyngier, Thomas Gleixner,
linuxppc-dev, Christoph Hellwig
In-Reply-To: <20201125111657.1141295-1-lvivier@redhat.com>
This function adds an affinity parameter to irq_create_mapping().
The parameter is needed so that the affinity can be passed on to
irq_domain_alloc_descs().
irq_create_mapping() becomes a wrapper around irq_create_mapping_affinity()
that passes NULL for the affinity parameter.
No functional change.
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
---
include/linux/irqdomain.h | 12 ++++++++++--
kernel/irq/irqdomain.c | 13 ++++++++-----
2 files changed, 18 insertions(+), 7 deletions(-)
diff --git a/include/linux/irqdomain.h b/include/linux/irqdomain.h
index 71535e87109f..ea5a337e0f8b 100644
--- a/include/linux/irqdomain.h
+++ b/include/linux/irqdomain.h
@@ -384,11 +384,19 @@ extern void irq_domain_associate_many(struct irq_domain *domain,
extern void irq_domain_disassociate(struct irq_domain *domain,
unsigned int irq);
-extern unsigned int irq_create_mapping(struct irq_domain *host,
- irq_hw_number_t hwirq);
+extern unsigned int irq_create_mapping_affinity(struct irq_domain *host,
+ irq_hw_number_t hwirq,
+ const struct irq_affinity_desc *affinity);
extern unsigned int irq_create_fwspec_mapping(struct irq_fwspec *fwspec);
extern void irq_dispose_mapping(unsigned int virq);
+static inline unsigned int irq_create_mapping(struct irq_domain *host,
+ irq_hw_number_t hwirq)
+{
+ return irq_create_mapping_affinity(host, hwirq, NULL);
+}
+
+
/**
* irq_linear_revmap() - Find a linux irq from a hw irq number.
* @domain: domain owning this hardware interrupt
diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
index cf8b374b892d..e4ca69608f3b 100644
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -624,17 +624,19 @@ unsigned int irq_create_direct_mapping(struct irq_domain *domain)
EXPORT_SYMBOL_GPL(irq_create_direct_mapping);
/**
- * irq_create_mapping() - Map a hardware interrupt into linux irq space
+ * irq_create_mapping_affinity() - Map a hardware interrupt into linux irq space
* @domain: domain owning this hardware interrupt or NULL for default domain
* @hwirq: hardware irq number in that domain space
+ * @affinity: irq affinity
*
* Only one mapping per hardware interrupt is permitted. Returns a linux
* irq number.
* If the sense/trigger is to be specified, set_irq_type() should be called
* on the number returned from that call.
*/
-unsigned int irq_create_mapping(struct irq_domain *domain,
- irq_hw_number_t hwirq)
+unsigned int irq_create_mapping_affinity(struct irq_domain *domain,
+ irq_hw_number_t hwirq,
+ const struct irq_affinity_desc *affinity)
{
struct device_node *of_node;
int virq;
@@ -660,7 +662,8 @@ unsigned int irq_create_mapping(struct irq_domain *domain,
}
/* Allocate a virtual interrupt number */
- virq = irq_domain_alloc_descs(-1, 1, hwirq, of_node_to_nid(of_node), NULL);
+ virq = irq_domain_alloc_descs(-1, 1, hwirq, of_node_to_nid(of_node),
+ affinity);
if (virq <= 0) {
pr_debug("-> virq allocation failed\n");
return 0;
@@ -676,7 +679,7 @@ unsigned int irq_create_mapping(struct irq_domain *domain,
return virq;
}
-EXPORT_SYMBOL_GPL(irq_create_mapping);
+EXPORT_SYMBOL_GPL(irq_create_mapping_affinity);
/**
* irq_create_strict_mappings() - Map a range of hw irqs to fixed linux irqs
--
2.28.0
^ permalink raw reply related
* [PATCH v2 0/2] powerpc/pseries: fix MSI/X IRQ affinity on pseries
From: Laurent Vivier @ 2020-11-25 11:16 UTC (permalink / raw)
To: linux-kernel
Cc: Laurent Vivier, Michael S . Tsirkin, linux-pci, Greg Kurz,
linux-block, Paul Mackerras, Marc Zyngier, Thomas Gleixner,
linuxppc-dev, Christoph Hellwig
With virtio, in multiqueue case, each queue IRQ is normally
bound to a different CPU using the affinity mask.
This works fine on x86_64 but is totally ignored on pseries.
This is not obvious at first look because irqbalance is doing
some balancing to improve that.
It appears that the "managed" flag set in the MSI entry
is never copied to the system IRQ entry.
This series passes the affinity mask from rtas_setup_msi_irqs()
to irq_domain_alloc_descs() by adding an affinity parameter to
irq_create_mapping().
The first patch adds the parameter (no functional change), the
second patch passes the actual affinity mask to irq_create_mapping()
in rtas_setup_msi_irqs().
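To illustrate the shape of that change outside the kernel, here is a minimal user-space sketch of the wrapper pattern. All names in it (sketch_affinity, sketch_alloc_desc, sketch_create_mapping and sketch_create_mapping_affinity) are hypothetical stand-ins, not the kernel API. Only the structure mirrors the series: the existing entry point becomes an inline wrapper that passes NULL, so existing callers need no change.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's affinity descriptor. */
struct sketch_affinity {
	unsigned long mask;
};

/* Records what the allocator last received, for demonstration only. */
static const struct sketch_affinity *sketch_last_affinity;
static unsigned int sketch_next_virq = 16;

/* Stand-in for the descriptor allocator that consumes the affinity. */
static unsigned int sketch_alloc_desc(unsigned long hwirq,
				      const struct sketch_affinity *affinity)
{
	(void)hwirq;
	sketch_last_affinity = affinity;
	return sketch_next_virq++;
}

/* New entry point: forwards the caller's affinity to the allocator. */
unsigned int sketch_create_mapping_affinity(unsigned long hwirq,
					    const struct sketch_affinity *affinity)
{
	return sketch_alloc_desc(hwirq, affinity);
}

/* Old entry point kept as a wrapper: no affinity, same behaviour as before. */
static inline unsigned int sketch_create_mapping(unsigned long hwirq)
{
	return sketch_create_mapping_affinity(hwirq, NULL);
}
```

In this sketch only the caller that actually has an affinity (the MSI setup path in the series) switches to the new function; every other caller keeps using the two-argument wrapper.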
For instance, with a 32-CPU VM and a 32-queue virtio-scsi interface:
... -smp 32 -device virtio-scsi-pci,id=virtio_scsi_pci0,num_queues=32
for IRQ in $(grep virtio2-request /proc/interrupts |cut -d: -f1); do
for file in /proc/irq/$IRQ/ ; do
echo -n "IRQ: $(basename $file) CPU: " ; cat $file/smp_affinity_list
done
done
Without the patch (and without irqbalance):
IRQ: 268 CPU: 0-31
IRQ: 269 CPU: 0-31
IRQ: 270 CPU: 0-31
IRQ: 271 CPU: 0-31
IRQ: 272 CPU: 0-31
IRQ: 273 CPU: 0-31
IRQ: 274 CPU: 0-31
IRQ: 275 CPU: 0-31
IRQ: 276 CPU: 0-31
IRQ: 277 CPU: 0-31
IRQ: 278 CPU: 0-31
IRQ: 279 CPU: 0-31
IRQ: 280 CPU: 0-31
IRQ: 281 CPU: 0-31
IRQ: 282 CPU: 0-31
IRQ: 283 CPU: 0-31
IRQ: 284 CPU: 0-31
IRQ: 285 CPU: 0-31
IRQ: 286 CPU: 0-31
IRQ: 287 CPU: 0-31
IRQ: 288 CPU: 0-31
IRQ: 289 CPU: 0-31
IRQ: 290 CPU: 0-31
IRQ: 291 CPU: 0-31
IRQ: 292 CPU: 0-31
IRQ: 293 CPU: 0-31
IRQ: 294 CPU: 0-31
IRQ: 295 CPU: 0-31
IRQ: 296 CPU: 0-31
IRQ: 297 CPU: 0-31
IRQ: 298 CPU: 0-31
IRQ: 299 CPU: 0-31
With the patch:
IRQ: 265 CPU: 0
IRQ: 266 CPU: 1
IRQ: 267 CPU: 2
IRQ: 268 CPU: 3
IRQ: 269 CPU: 4
IRQ: 270 CPU: 5
IRQ: 271 CPU: 6
IRQ: 272 CPU: 7
IRQ: 273 CPU: 8
IRQ: 274 CPU: 9
IRQ: 275 CPU: 10
IRQ: 276 CPU: 11
IRQ: 277 CPU: 12
IRQ: 278 CPU: 13
IRQ: 279 CPU: 14
IRQ: 280 CPU: 15
IRQ: 281 CPU: 16
IRQ: 282 CPU: 17
IRQ: 283 CPU: 18
IRQ: 284 CPU: 19
IRQ: 285 CPU: 20
IRQ: 286 CPU: 21
IRQ: 287 CPU: 22
IRQ: 288 CPU: 23
IRQ: 289 CPU: 24
IRQ: 290 CPU: 25
IRQ: 291 CPU: 26
IRQ: 292 CPU: 27
IRQ: 293 CPU: 28
IRQ: 294 CPU: 29
IRQ: 295 CPU: 30
IRQ: 299 CPU: 31
This matches what we have on an x86_64 system.
v2: add a wrapper around original irq_create_mapping() with the
affinity parameter. Update comments
Laurent Vivier (2):
genirq: add an irq_create_mapping_affinity() function
powerpc/pseries: pass MSI affinity to irq_create_mapping()
arch/powerpc/platforms/pseries/msi.c | 3 ++-
include/linux/irqdomain.h | 12 ++++++++++--
kernel/irq/irqdomain.c | 13 ++++++++-----
3 files changed, 20 insertions(+), 8 deletions(-)
--
2.28.0
^ permalink raw reply
* Re: [PATCH 1/2] powerpc: sstep: Fix load and update instructions
From: Ravi Bangoria @ 2020-11-25 10:09 UTC (permalink / raw)
To: Sandipan Das
Cc: Ravi Bangoria, jniethe5, paulus, naveen.n.rao, linuxppc-dev, dja
In-Reply-To: <20201119054139.244083-1-sandipan@linux.ibm.com>
> diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
> index 855457ed09b5..25a5436be6c6 100644
> --- a/arch/powerpc/lib/sstep.c
> +++ b/arch/powerpc/lib/sstep.c
> @@ -2157,11 +2157,15 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
>
> case 23: /* lwzx */
> case 55: /* lwzux */
> + if (u && (ra == 0 || ra == rd))
> + return -1;
I guess you also need to split case 23 and 55?
- Ravi
^ permalink raw reply
* Re: C vdso
From: Christophe Leroy @ 2020-11-25 9:21 UTC (permalink / raw)
To: Michael Ellerman; +Cc: linuxppc-dev
In-Reply-To: <87tuteyyxi.fsf@mpe.ellerman.id.au>
Quoting Michael Ellerman <mpe@ellerman.id.au>:
> Christophe Leroy <christophe.leroy@csgroup.eu> writes:
>> Le 03/11/2020 à 19:13, Christophe Leroy a écrit :
>>> Le 23/10/2020 à 15:24, Michael Ellerman a écrit :
>>>> Christophe Leroy <christophe.leroy@csgroup.eu> writes:
>>>>> Le 24/09/2020 à 15:17, Christophe Leroy a écrit :
>>>>>> Le 17/09/2020 à 14:33, Michael Ellerman a écrit :
>>>>>>> Christophe Leroy <christophe.leroy@csgroup.eu> writes:
>>>>>>>>
>>>>>>>> What is the status with the generic C vdso merge ?
>>>>>>>> In some mail, you mentioned having difficulties getting it working on
>>>>>>>> ppc64, any progress ? What's the problem ? Can I help ?
>>>>>>>
>>>>>>> Yeah sorry I was hoping to get time to work on it but haven't been able
>>>>>>> to.
>>>>>>>
>>>>>>> It's causing crashes on ppc64 ie. big endian.
>>>> ...
>>>>>>
>>>>>> Can you tell what defconfig you are using ? I have been able to
>>>>>> setup a full glibc PPC64 cross
>>>>>> compilation chain and been able to test it under QEMU with
>>>>>> success, using Nathan's vdsotest tool.
>>>>>
>>>>> What config are you using ?
>>>>
>>>> ppc64_defconfig + guest.config
>>>>
>>>> Or pseries_defconfig.
>>>>
>>>> I'm using Ubuntu GCC 9.3.0 mostly, but it happens with other
>>>> toolchains too.
>>>>
>>>> At a minimum we're seeing relocations in the output, which is a problem:
>>>>
>>>> $ readelf -r build\~/arch/powerpc/kernel/vdso64/vdso64.so
>>>> Relocation section '.rela.dyn' at offset 0x12a8 contains 8 entries:
>>>> Offset Info Type Sym. Value Sym. Name + Addend
>>>> 000000001368 000000000016 R_PPC64_RELATIVE 7c0
>>>> 000000001370 000000000016 R_PPC64_RELATIVE 9300
>>>> 000000001380 000000000016 R_PPC64_RELATIVE 970
>>>> 000000001388 000000000016 R_PPC64_RELATIVE 9300
>>>> 000000001398 000000000016 R_PPC64_RELATIVE a90
>>>> 0000000013a0 000000000016 R_PPC64_RELATIVE 9300
>>>> 0000000013b0 000000000016 R_PPC64_RELATIVE b20
>>>> 0000000013b8 000000000016 R_PPC64_RELATIVE 9300
>>>
>>> Looks like it's due to the OPD and relation between the function()
>>> and .function()
>>>
>>> By using DOTSYM() in the 'bl' call, that's directly the dot
>>> function which is called and the OPD is
>>> not used anymore, it can get dropped.
>>>
>>> Now I get .rela.dyn full of 0, don't know if we should drop it explicitly.
>>
>> What is the status now with latest version of CVDSO ? I saw you had
>> it in next-test for some time,
>> it is not there anymore today.
>
> Still having some trouble with the compat VDSO.
>
> eg:
>
> $ ./vdsotest clock-gettime-monotonic verify
> timestamp obtained from kernel predates timestamp
> previously obtained from libc/vDSO:
> [1346, 821441653] (vDSO)
> [570, 769440040] (kernel)
>
>
> And similar for all clocks except the coarse ones.
>
Ok, I managed to get the same with QEMU. Looking at the binary, I only
see an mftb instead of the mftbu/mftb/mftbu triplet.
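For reference, the triplet matters because a 32-bit build cannot read the 64-bit timebase in one instruction: the upper half can roll over between the two reads. Below is a minimal user-space sketch of the retry loop, using simulated register reads. The sim_* helpers and sketch_get_tb32() are hypothetical names for illustration only, and this is not the fix itself.

```c
#include <stdint.h>

/* Hypothetical simulated timebase; real code reads it with mftbu/mftb. */
static uint64_t sim_tb;
static uint64_t sim_step = 1;	/* ticks that elapse on every register read */

/* Simulated read of the upper 32 bits of the timebase. */
static uint32_t sim_mftbu(void)
{
	sim_tb += sim_step;
	return (uint32_t)(sim_tb >> 32);
}

/* Simulated read of the lower 32 bits of the timebase. */
static uint32_t sim_mftb(void)
{
	sim_tb += sim_step;
	return (uint32_t)sim_tb;
}

/* Read upper, lower, upper again; retry if the upper half changed. */
static uint64_t sketch_get_tb32(void)
{
	uint32_t hi, lo, hi2;

	do {
		hi = sim_mftbu();
		lo = sim_mftb();
		hi2 = sim_mftbu();
	} while (hi != hi2);

	return ((uint64_t)hi << 32) | lo;
}
```

With a single lower-half read, as seen in the compat binary, the upper 32 bits are never taken into account, which would be consistent with the vDSO and kernel timestamps disagreeing.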
Fix below. Can you carry it, or do you prefer a full patch from me ?
The easiest would be either to squash it into [v13,4/8]
("powerpc/time: Move timebase functions into new asm/timebase.h"), or
to add it between patch 4 and 5 ?
diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index f877a576b338..c3473eb031a3 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -1419,7 +1419,7 @@ static inline void msr_check_and_clear(unsigned long bits)
__msr_check_and_clear(bits);
}
-#if defined(CONFIG_PPC_CELL) || defined(CONFIG_E500)
+#if defined(__powerpc64__) && (defined(CONFIG_PPC_CELL) || defined(CONFIG_E500))
#define mftb() ({unsigned long rval; \
asm volatile( \
"90: mfspr %0, %2;\n" \
diff --git a/arch/powerpc/include/asm/timebase.h b/arch/powerpc/include/asm/timebase.h
index a8eae3adaa91..7b372976f5a5 100644
--- a/arch/powerpc/include/asm/timebase.h
+++ b/arch/powerpc/include/asm/timebase.h
@@ -21,7 +21,7 @@ static inline u64 get_tb(void)
{
unsigned int tbhi, tblo, tbhi2;
- if (IS_ENABLED(CONFIG_PPC64))
+ if (__is_defined(__powerpc64__))
return mftb();
do {
^ permalink raw reply related
* Re: [PATCH v4 10/18] dt-bindings: usb: Convert DWC USB3 bindings to DT schema
From: Serge Semin @ 2020-11-25 8:32 UTC (permalink / raw)
To: Rob Herring
Cc: Neil Armstrong, Bjorn Andersson, Pavel Parkhomenko, Kevin Hilman,
Krzysztof Kozlowski, Andy Gross, Chunfeng Yun, linux-snps-arc,
devicetree, Mathias Nyman, Martin Blumenstingl, Lad Prabhakar,
Alexey Malahov, linux-arm-kernel, Roger Quadros, Felipe Balbi,
Greg Kroah-Hartman, Yoshihiro Shimoda, linux-usb, linux-mips,
Serge Semin, linux-kernel, Manu Gautam, linuxppc-dev
In-Reply-To: <20201121124228.GA2039998@robh.at.kernel.org>
On Sat, Nov 21, 2020 at 06:42:28AM -0600, Rob Herring wrote:
> On Thu, Nov 12, 2020 at 01:29:46PM +0300, Serge Semin wrote:
> > On Wed, Nov 11, 2020 at 02:14:23PM -0600, Rob Herring wrote:
> > > On Wed, Nov 11, 2020 at 12:08:45PM +0300, Serge Semin wrote:
> > > > DWC USB3 DT node is supposed to be compliant with the Generic xHCI
> > > > Controller schema, but with additional vendor-specific properties, the
> > > > controller-specific reference clocks and PHYs. So let's convert the
> > > > currently available legacy text-based DWC USB3 bindings to the DT schema
> > > > and make sure the DWC USB3 nodes are also validated against the
> > > > usb-xhci.yaml schema.
> > > >
> > > > Note we have to discard the nodename restriction of being prefixed with
> > > > "dwc3@" string, since in accordance with the usb-hcd.yaml schema USB nodes
> > > > are supposed to be named as "^usb(@.*)".
> > > >
> > > > Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
> > > >
> > > > ---
> > > >
> > > > Changelog v2:
> > > > - Discard '|' from the descriptions, since we don't need to preserve
> > > > the text formatting in any of them.
> > > > - Drop quotes from around the string constants.
> > > > - Fix the "clock-names" prop description to be referring the enumerated
> > > > clock-names instead of the ones from the Databook.
> > > >
> > > > Changelog v3:
> > > > - Apply usb-xhci.yaml# schema only if the controller is supposed to work
> > > > as either host or otg.
> > > >
> > > > Changelog v4:
> > > > - Apply usb-drd.yaml schema first. If the controller is configured
> > > > to work in a gadget mode only, then apply the usb.yaml schema too,
> > > > otherwise apply the usb-xhci.yaml schema.
> > > > - Discard Rob's Reviewed-by tag. Please review the patch one more
> > > > time.
> > > > ---
> > > > .../devicetree/bindings/usb/dwc3.txt | 125 --------
> > > > .../devicetree/bindings/usb/snps,dwc3.yaml | 303 ++++++++++++++++++
> > > > 2 files changed, 303 insertions(+), 125 deletions(-)
> > > > delete mode 100644 Documentation/devicetree/bindings/usb/dwc3.txt
> > > > create mode 100644 Documentation/devicetree/bindings/usb/snps,dwc3.yaml
>
>
> > > > diff --git a/Documentation/devicetree/bindings/usb/snps,dwc3.yaml b/Documentation/devicetree/bindings/usb/snps,dwc3.yaml
> > > > new file mode 100644
> > > > index 000000000000..079617891da6
> > > > --- /dev/null
> > > > +++ b/Documentation/devicetree/bindings/usb/snps,dwc3.yaml
> > > > @@ -0,0 +1,303 @@
> > > > +# SPDX-License-Identifier: GPL-2.0
> > > > +%YAML 1.2
> > > > +---
> > > > +$id: http://devicetree.org/schemas/usb/snps,dwc3.yaml#
> > > > +$schema: http://devicetree.org/meta-schemas/core.yaml#
> > > > +
> > > > +title: Synopsys DesignWare USB3 Controller
> > > > +
> > > > +maintainers:
> > > > + - Felipe Balbi <balbi@kernel.org>
> > > > +
> > > > +description:
> > > > + This is usually a subnode to DWC3 glue to which it is connected, but can also
> > > > + be presented as a standalone DT node with an optional vendor-specific
> > > > + compatible string.
> > > > +
> >
> > > > +allOf:
> > > > + - $ref: usb-drd.yaml#
> > > > + - if:
> > > > + properties:
> > > > + dr_mode:
> > > > + const: peripheral
>
> Another thing, this evaluates to true if dr_mode is not present. You
> need to add 'required'?
Right. Will something like this do that?
+ allOf:
+ - $ref: usb-drd.yaml#
+ - if:
+ properties:
+ dr_mode:
+ const: peripheral
+
+ required:
+ - dr_mode
+ then:
+ $ref: usb.yaml#
+ else:
+ $ref: usb-xhci.yaml#
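To spell out why 'required' changes the outcome, here is a small C model of the two forms of the "if" condition. It is only a sketch of the schema semantics, assuming an absent property is modelled as a NULL string. The function names are hypothetical and nothing here is part of the dt-schema tooling.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model: dr_mode is NULL when the property is absent. */

/* "if" with only 'properties': an absent dr_mode matches vacuously. */
static bool if_matches_without_required(const char *dr_mode)
{
	return dr_mode == NULL || strcmp(dr_mode, "peripheral") == 0;
}

/* "if" with 'properties' and 'required': dr_mode must exist and match. */
static bool if_matches_with_required(const char *dr_mode)
{
	return dr_mode != NULL && strcmp(dr_mode, "peripheral") == 0;
}

/* Schema picked by then/else for a given outcome of the "if". */
static const char *selected_schema(bool if_matched)
{
	return if_matched ? "usb.yaml" : "usb-xhci.yaml";
}
```

In this model a node without dr_mode ends up under usb.yaml in the first form and under usb-xhci.yaml in the second, which is the difference the 'required' entry is meant to make.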
> If dr_mode is otg, then don't you need to apply
> both usb.yaml and usb-xhci.yaml?
No I don't. Since there is no peripheral-specific DT schema, then the
only schema any USB-gadget node needs to pass is usb.yaml, which
is already included into the usb-xhci.yaml schema. So for pure OTG devices
with xHCI host and gadget capabilities it's enough to evaluate: allOf:
[$ref: usb-drd.yaml#, $ref: usb-xhci.yaml#]. Please see the
sketch/ASCII-figure below and the following text for details.
-Sergey
>
> > > > + then:
> > > > + $ref: usb.yaml#
> > >
> > > This part could be done in usb-drd.yaml?
> >
> > Originally I was thinking about that, but then in order to minimize
> > the properties validation I've decided to split the properties in
> > accordance with the USB controllers functionality:
> >
> > +----- USB Gadget/Peripheral Controller. There is no
> > | specific schema for the gadgets since there is no
> > | common gadget properties (at least I failed to find
> > | ones). So the pure gadget controllers need to be
> > | validated just against usb.yaml schema.
> > |
> > usb.yaml <--+-- usb-hcd.yaml - Generic USB Host Controller. The schema
> > ^ turns out to include the OHCI/UHCI/EHCI
> > | properties, which AFAICS are also
> > | applicable for the other host controllers.
> > | So any USB host controller node needs to
> > | be validated against this schema.
> > |
> > +- usb-xhci.yaml - Generic xHCI Host controller.
> >
> > usb-drd.yaml -- USB Dual-Role/OTG Controllers. It describes the
> > DRD/OTG-specific properties and nothing else. So normally
> > it should be applied together with one of the
> > schemas described above.
> >
> > So the use-cases of the suggested schemas are as follows:
> >
> > 1) USB Controller is pure gadget? Then:
> > + allOf:
> > + - $ref: usb.yaml#
> > 2) USB Controller is pure USB host (including OHCI/UHCI/EHCI)?
> > + allOf:
> > + - $ref: usb-hcd.yaml#
> > Note this prevents us from fixing all the currently available USB DT
> > schemas, which already apply the usb-hcd.yaml schema.
> > 3) USB Controller is pure xHCI host controller? Then:
> > + allOf:
> > + - $ref: usb-xhci.yaml#
> > 4) USB Controller is Dual-Role/OTG controller with USB 2.0 host? Then:
> > + allOf:
> > + - $ref: usb-drd.yaml#
> > + - $ref: usb-hcd.yaml#
> > 5) USB Controller is Dual-Role/OTG controller with xHCI host? Then:
> > + allOf:
> > + - $ref: usb-drd.yaml#
> > + - $ref: usb-xhci.yaml#
> > 6) USB Controller is Dual-Role/OTG controller which can only be a
> > gadget? Then:
> > + allOf:
> > + - $ref: usb-drd.yaml#
> > + - $ref: usb.yaml#
> >
> > * Don't know really if controllers like in 6)-th really exist. Most
> > * likely they are still internally capable of dual-roling, but due to
> > * some conditions can be used as gadgets only.
> >
> > It looks a bit complicated, but at least by having such design we'd minimize
> > the number of properties validation.
> >
[...]
^ permalink raw reply
* Re: [PATCH 1/3] perf/core: Flush PMU internal buffers for per-CPU events
From: Michael Ellerman @ 2020-11-25 8:12 UTC (permalink / raw)
To: Namhyung Kim
Cc: Ian Rogers, Andi Kleen, Peter Zijlstra, linuxppc-dev,
linux-kernel, Stephane Eranian, Paul Mackerras,
Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar, Gabriel Marin,
Liang, Kan
In-Reply-To: <CAM9d7cg8kYMyPHQK_rhEiYQaSddqqt93=pLVNKJm8Y6F=if9ow@mail.gmail.com>
Namhyung Kim <namhyung@kernel.org> writes:
> Hello,
>
> On Mon, Nov 23, 2020 at 8:00 PM Michael Ellerman <mpe@ellerman.id.au> wrote:
>>
>> Namhyung Kim <namhyung@kernel.org> writes:
>> > Hi Peter and Kan,
>> >
>> > (Adding PPC folks)
>> >
>> > On Tue, Nov 17, 2020 at 2:01 PM Namhyung Kim <namhyung@kernel.org> wrote:
>> >>
>> >> Hello,
>> >>
>> >> On Thu, Nov 12, 2020 at 4:54 AM Liang, Kan <kan.liang@linux.intel.com> wrote:
>> >> >
>> >> >
>> >> >
>> >> > On 11/11/2020 11:25 AM, Peter Zijlstra wrote:
>> >> > > On Mon, Nov 09, 2020 at 09:49:31AM -0500, Liang, Kan wrote:
>> >> > >
>> >> > >> - When the large PEBS was introduced (9c964efa4330), the sched_task() should
>> >> > >> be invoked to flush the PEBS buffer in each context switch. However, The
>> >> > >> perf_sched_events in account_event() is not updated accordingly. The
>> >> > >> perf_event_task_sched_* never be invoked for a pure per-CPU context. Only
>> >> > >> per-task event works.
>> >> > >> At that time, the perf_pmu_sched_task() is outside of
>> >> > >> perf_event_context_sched_in/out. It means that perf has to double
>> >> > >> perf_pmu_disable() for per-task event.
>> >> > >
>> >> > >> - The patch 1 tries to fix broken per-CPU events. The CPU context cannot be
>> >> > >> retrieved from the task->perf_event_ctxp. So it has to be tracked in the
>> >> > >> sched_cb_list. Yes, the code is very similar to the original codes, but it
>> >> > >> is actually the new code for per-CPU events. The optimization for per-task
>> >> > >> events is still kept.
>> >> > >> For the case, which has both a CPU context and a task context, yes, the
>> >> > >> __perf_pmu_sched_task() in this patch is not invoked. Because the
>> >> > >> sched_task() only need to be invoked once in a context switch. The
>> >> > >> sched_task() will be eventually invoked in the task context.
>> >> > >
>> >> > > The thing is; your first two patches rely on PERF_ATTACH_SCHED_CB and
>> >> > > only set that for large pebs. Are you sure the other users (Intel LBR
>> >> > > and PowerPC BHRB) don't need it?
>> >> >
>> >> > I didn't set it for LBR, because the perf_sched_events is always enabled
>> >> > for LBR. But, yes, we should explicitly set the PERF_ATTACH_SCHED_CB
>> >> > for LBR.
>> >> >
>> >> > if (has_branch_stack(event))
>> >> > inc = true;
>> >> >
>> >> > >
>> >> > > If they indeed do not require the pmu::sched_task() callback for CPU
>> >> > > events, then I still think the whole perf_sched_cb_{inc,dec}() interface
>> >> >
>> >> > No, LBR requires the pmu::sched_task() callback for CPU events.
>> >> >
>> >> > Now, The LBR registers have to be reset in sched in even for CPU events.
>> >> >
>> >> > To fix the shorter LBR callstack issue for CPU events, we also need to
>> >> > save/restore LBRs in pmu::sched_task().
>> >> > https://lore.kernel.org/lkml/1578495789-95006-4-git-send-email-kan.liang@linux.intel.com/
>> >> >
>> >> > > is confusing at best.
>> >> > >
>> >> > > Can't we do something like this instead?
>> >> > >
>> >> > I think the below patch may have two issues.
>> >> > - PERF_ATTACH_SCHED_CB is required for LBR (maybe PowerPC BHRB as well) now.
>> >> > - We may disable the large PEBS later if not all PEBS events support
>> >> > large PEBS. The PMU need a way to notify the generic code to decrease
>> >> > the nr_sched_task.
>> >>
>> >> Any updates on this? I've reviewed and tested Kan's patches
>> >> and they all look good.
>> >>
>> >> Maybe we can talk to PPC folks to confirm the BHRB case?
>> >
>> > Can we move this forward? I saw patch 3/3 also adds PERF_ATTACH_SCHED_CB
>> > for PowerPC too. But it'd be nice if ppc folks can confirm the change.
>>
>> Sorry I've read the whole thread, but I'm still not entirely sure I
>> understand the question.
>
> Thanks for your time and sorry about not being clear enough.
>
> We found per-cpu events are not calling pmu::sched_task()
> on context switches. So PERF_ATTACH_SCHED_CB was
> added to indicate the core logic that it needs to invoke the
> callback.
OK. TBH I've never thought of using branch stack with a per-cpu event,
but I guess you can do it.
I think the same logic applies as LBR, we need to read the BHRB entries
in the context of the task that they were recorded for.
> The patch 3/3 added the flag to PPC (for BHRB) with other
> changes (I think it should be split like in the patch 2/3) and
> want to get ACKs from the PPC folks.
If you post a new version with Maddy's comments addressed then he or I
can ack it.
cheers
^ permalink raw reply
* Re: [PATCH 1/2] genirq: add an affinity parameter to irq_create_mapping()
From: Laurent Vivier @ 2020-11-25 7:30 UTC (permalink / raw)
To: Thomas Gleixner, linux-kernel
Cc: Michael S . Tsirkin, linux-pci, linux-block, Paul Mackerras,
Marc Zyngier, linuxppc-dev, Christoph Hellwig
In-Reply-To: <87h7pel7ng.fsf@nanos.tec.linutronix.de>
On 24/11/2020 23:19, Thomas Gleixner wrote:
> On Tue, Nov 24 2020 at 21:03, Laurent Vivier wrote:
>> This parameter is needed to pass it to irq_domain_alloc_descs().
>>
>> This seems to have been missed by
>> 06ee6d571f0e ("genirq: Add affinity hint to irq allocation")
>
> No, this has not been missed at all. There was and is no reason to do
> this.
>
>> This is needed to implement proper support for multiqueue with
>> pseries.
>
> And because pseries needs this _all_ callers need to be changed?
>
>> 123 files changed, 171 insertions(+), 146 deletions(-)
>
> Lots of churn for nothing. 99% of the callers will never need that.
>
> What's wrong with simply adding an interface which takes that parameter,
> make the existing one an inline wrapper and leave the rest alone?
Nothing. I'm going to do it that way.
Thank you for your comment.
Laurent
^ permalink raw reply
* [PATCH V2] powerpc/perf: Exclude kernel samples while counting events in user space.
From: Athira Rajeev @ 2020-11-25 7:26 UTC (permalink / raw)
To: mpe; +Cc: maddy, linuxppc-dev
The perf event attribute supports the exclude_kernel flag
to avoid sampling/profiling in supervisor state (kernel).
Based on this event attr flag, the Monitor Mode Control Register
bit is set to freeze on supervisor state. But sometimes (due
to a hardware limitation), the Sampled Instruction Address
Register (SIAR) locks on to a kernel address even when
freeze on supervisor is set. This patch adds a check to
drop those samples.
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
---
Changes in v2:
- Initial patch was sent along with series:
https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=209195
Moving this patch as separate since this change is applicable
for all PMU platforms.
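As an illustration of the check only, here is a user-space sketch of the decision the patch adds. The names sketch_is_kernel_addr() and sketch_keep_sample() are hypothetical, and the address boundary is an assumption made for the example (it is not taken from this patch). The real code uses is_kernel_addr() on the SIAR value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption for this sketch: kernel addresses start at this boundary. */
#define SKETCH_KERNEL_START 0xc000000000000000ULL

/* Hypothetical stand-in for the kernel's is_kernel_addr(). */
static bool sketch_is_kernel_addr(uint64_t addr)
{
	return addr >= SKETCH_KERNEL_START;
}

/*
 * Returns whether the sample should still be recorded: a sample that was
 * going to be recorded is dropped when the event excludes the kernel but
 * the sampled address lies in kernel space.
 */
static bool sketch_keep_sample(bool record, bool exclude_kernel, uint64_t siar)
{
	if (exclude_kernel && record && sketch_is_kernel_addr(siar))
		return false;

	return record;
}
```

Samples for events without exclude_kernel, and samples with user addresses, are left untouched in this sketch.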
arch/powerpc/perf/core-book3s.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index 08643cb..40aa117 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -2122,6 +2122,17 @@ static void record_and_restart(struct perf_event *event, unsigned long val,
perf_event_update_userpage(event);
/*
+ * Due to hardware limitation, sometimes SIAR could
+ * lock on to kernel address even with freeze on
+ * supervisor state (kernel) is set in MMCR2.
+ * Check attr.exclude_kernel and address
+ * to drop the sample in these cases.
+ */
+ if (event->attr.exclude_kernel && record)
+ if (is_kernel_addr(mfspr(SPRN_SIAR)))
+ record = 0;
+
+ /*
* Finally record data if requested.
*/
if (record) {
--
1.8.3.1
^ permalink raw reply related
* Re: [PATCH v4] dt-bindings: misc: convert fsl,qoriq-mc from txt to YAML
From: Ioana Ciornei @ 2020-11-25 7:17 UTC (permalink / raw)
To: Laurentiu Tudor
Cc: devicetree@vger.kernel.org, corbet@lwn.net,
netdev@vger.kernel.org, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, Leo Li, robh+dt@kernel.org,
Ionut-robert Aron, kuba@kernel.org, linuxppc-dev@lists.ozlabs.org,
davem@davemloft.net, linux-arm-kernel@lists.infradead.org
In-Reply-To: <20201123090035.15734-1-laurentiu.tudor@nxp.com>
On Mon, Nov 23, 2020 at 11:00:35AM +0200, Laurentiu Tudor wrote:
> From: Ionut-robert Aron <ionut-robert.aron@nxp.com>
>
> Convert fsl,qoriq-mc to YAML in order to automate the verification
> process of dts files. In addition, update MAINTAINERS accordingly
> and, while at it, add some missing files.
>
> Signed-off-by: Ionut-robert Aron <ionut-robert.aron@nxp.com>
> [laurentiu.tudor@nxp.com: update MAINTAINERS, updates & fixes in schema]
> Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com>
Acked-by: Ioana Ciornei <ioana.ciornei@nxp.com>
> ---
> Changes in v4:
> - use $ref to point to fsl,qoriq-mc-dpmac binding
>
> Changes in v3:
> - dropped duplicated "fsl,qoriq-mc-dpmac" schema and replaced with
> reference to it
> - fixed a dt_binding_check warning
>
> Changes in v2:
> - fixed errors reported by yamllint
> - dropped multiple unnecessary quotes
> - used schema instead of text in description
> - added constraints on dpmac reg property
>
> .../devicetree/bindings/misc/fsl,qoriq-mc.txt | 196 ------------------
> .../bindings/misc/fsl,qoriq-mc.yaml | 186 +++++++++++++++++
> .../ethernet/freescale/dpaa2/overview.rst | 5 +-
> MAINTAINERS | 4 +-
> 4 files changed, 193 insertions(+), 198 deletions(-)
> delete mode 100644 Documentation/devicetree/bindings/misc/fsl,qoriq-mc.txt
> create mode 100644 Documentation/devicetree/bindings/misc/fsl,qoriq-mc.yaml
>
> diff --git a/Documentation/devicetree/bindings/misc/fsl,qoriq-mc.txt b/Documentation/devicetree/bindings/misc/fsl,qoriq-mc.txt
> deleted file mode 100644
> index 7b486d4985dc..000000000000
> --- a/Documentation/devicetree/bindings/misc/fsl,qoriq-mc.txt
> +++ /dev/null
> @@ -1,196 +0,0 @@
> -* Freescale Management Complex
> -
> -The Freescale Management Complex (fsl-mc) is a hardware resource
> -manager that manages specialized hardware objects used in
> -network-oriented packet processing applications. After the fsl-mc
> -block is enabled, pools of hardware resources are available, such as
> -queues, buffer pools, I/O interfaces. These resources are building
> -blocks that can be used to create functional hardware objects/devices
> -such as network interfaces, crypto accelerator instances, L2 switches,
> -etc.
> -
> -For an overview of the DPAA2 architecture and fsl-mc bus see:
> -Documentation/networking/device_drivers/ethernet/freescale/dpaa2/overview.rst
> -
> -As described in the above overview, all DPAA2 objects in a DPRC share the
> -same hardware "isolation context" and a 10-bit value called an ICID
> -(isolation context id) is expressed by the hardware to identify
> -the requester.
> -
> -The generic 'iommus' property is insufficient to describe the relationship
> -between ICIDs and IOMMUs, so an iommu-map property is used to define
> -the set of possible ICIDs under a root DPRC and how they map to
> -an IOMMU.
> -
> -For generic IOMMU bindings, see
> -Documentation/devicetree/bindings/iommu/iommu.txt.
> -
> -For arm-smmu binding, see:
> -Documentation/devicetree/bindings/iommu/arm,smmu.yaml.
> -
> -The MSI writes are accompanied by sideband data which is derived from the ICID.
> -The msi-map property is used to associate the devices with both the ITS
> -controller and the sideband data which accompanies the writes.
> -
> -For generic MSI bindings, see
> -Documentation/devicetree/bindings/interrupt-controller/msi.txt.
> -
> -For GICv3 and GIC ITS bindings, see:
> -Documentation/devicetree/bindings/interrupt-controller/arm,gic-v3.yaml.
> -
> -Required properties:
> -
> - - compatible
> - Value type: <string>
> - Definition: Must be "fsl,qoriq-mc". A Freescale Management Complex
> - compatible with this binding must have Block Revision
> - Registers BRR1 and BRR2 at offset 0x0BF8 and 0x0BFC in
> - the MC control register region.
> -
> - - reg
> - Value type: <prop-encoded-array>
> - Definition: A standard property. Specifies one or two regions
> - defining the MC's registers:
> -
> - -the first region is the command portal for the
> - this machine and must always be present
> -
> - -the second region is the MC control registers. This
> - region may not be present in some scenarios, such
> - as in the device tree presented to a virtual machine.
> -
> - - ranges
> - Value type: <prop-encoded-array>
> - Definition: A standard property. Defines the mapping between the child
> - MC address space and the parent system address space.
> -
> - The MC address space is defined by 3 components:
> - <region type> <offset hi> <offset lo>
> -
> - Valid values for region type are
> - 0x0 - MC portals
> - 0x1 - QBMAN portals
> -
> - - #address-cells
> - Value type: <u32>
> - Definition: Must be 3. (see definition in 'ranges' property)
> -
> - - #size-cells
> - Value type: <u32>
> - Definition: Must be 1.
> -
> -Sub-nodes:
> -
> - The fsl-mc node may optionally have dpmac sub-nodes that describe
> - the relationship between the Ethernet MACs which belong to the MC
> - and the Ethernet PHYs on the system board.
> -
> - The dpmac nodes must be under a node named "dpmacs" which contains
> - the following properties:
> -
> - - #address-cells
> - Value type: <u32>
> - Definition: Must be present if dpmac sub-nodes are defined and must
> - have a value of 1.
> -
> - - #size-cells
> - Value type: <u32>
> - Definition: Must be present if dpmac sub-nodes are defined and must
> - have a value of 0.
> -
> - These nodes must have the following properties:
> -
> - - compatible
> - Value type: <string>
> - Definition: Must be "fsl,qoriq-mc-dpmac".
> -
> - - reg
> - Value type: <prop-encoded-array>
> - Definition: Specifies the id of the dpmac.
> -
> - - phy-handle
> - Value type: <phandle>
> - Definition: Specifies the phandle to the PHY device node associated
> - with the this dpmac.
> -Optional properties:
> -
> -- iommu-map: Maps an ICID to an IOMMU and associated iommu-specifier
> - data.
> -
> - The property is an arbitrary number of tuples of
> - (icid-base,iommu,iommu-base,length).
> -
> - Any ICID i in the interval [icid-base, icid-base + length) is
> - associated with the listed IOMMU, with the iommu-specifier
> - (i - icid-base + iommu-base).
> -
> -- msi-map: Maps an ICID to a GIC ITS and associated msi-specifier
> - data.
> -
> - The property is an arbitrary number of tuples of
> - (icid-base,gic-its,msi-base,length).
> -
> - Any ICID in the interval [icid-base, icid-base + length) is
> - associated with the listed GIC ITS, with the msi-specifier
> - (i - icid-base + msi-base).
> -
> -Deprecated properties:
> -
> - - msi-parent
> - Value type: <phandle>
> - Definition: Describes the MSI controller node handling message
> - interrupts for the MC. When there is no translation
> - between the ICID and deviceID this property can be used
> - to describe the MSI controller used by the devices on the
> - mc-bus.
> - The use of this property for mc-bus is deprecated. Please
> - use msi-map.
> -
> -Example:
> -
> - smmu: iommu@5000000 {
> - compatible = "arm,mmu-500";
> - #iommu-cells = <1>;
> - stream-match-mask = <0x7C00>;
> - ...
> - };
> -
> - gic: interrupt-controller@6000000 {
> - compatible = "arm,gic-v3";
> - ...
> - }
> - its: gic-its@6020000 {
> - compatible = "arm,gic-v3-its";
> - msi-controller;
> - ...
> - };
> -
> - fsl_mc: fsl-mc@80c000000 {
> - compatible = "fsl,qoriq-mc";
> - reg = <0x00000008 0x0c000000 0 0x40>, /* MC portal base */
> - <0x00000000 0x08340000 0 0x40000>; /* MC control reg */
> - /* define map for ICIDs 23-64 */
> - iommu-map = <23 &smmu 23 41>;
> - /* define msi map for ICIDs 23-64 */
> - msi-map = <23 &its 23 41>;
> - #address-cells = <3>;
> - #size-cells = <1>;
> -
> - /*
> - * Region type 0x0 - MC portals
> - * Region type 0x1 - QBMAN portals
> - */
> - ranges = <0x0 0x0 0x0 0x8 0x0c000000 0x4000000
> - 0x1 0x0 0x0 0x8 0x18000000 0x8000000>;
> -
> - dpmacs {
> - #address-cells = <1>;
> - #size-cells = <0>;
> -
> - dpmac@1 {
> - compatible = "fsl,qoriq-mc-dpmac";
> - reg = <1>;
> - phy-handle = <&mdio0_phy0>;
> - }
> - }
> - };
> diff --git a/Documentation/devicetree/bindings/misc/fsl,qoriq-mc.yaml b/Documentation/devicetree/bindings/misc/fsl,qoriq-mc.yaml
> new file mode 100644
> index 000000000000..f45e21872e4f
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/misc/fsl,qoriq-mc.yaml
> @@ -0,0 +1,186 @@
> +# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)
> +# Copyright 2020 NXP
> +%YAML 1.2
> +---
> +$id: http://devicetree.org/schemas/misc/fsl,qoriq-mc.yaml#
> +$schema: http://devicetree.org/meta-schemas/core.yaml#
> +
> +maintainers:
> + - Laurentiu Tudor <laurentiu.tudor@nxp.com>
> +
> +title: Freescale Management Complex
> +
> +description: |
> + The Freescale Management Complex (fsl-mc) is a hardware resource
> + manager that manages specialized hardware objects used in
> + network-oriented packet processing applications. After the fsl-mc
> + block is enabled, pools of hardware resources are available, such as
> + queues, buffer pools, I/O interfaces. These resources are building
> + blocks that can be used to create functional hardware objects/devices
> + such as network interfaces, crypto accelerator instances, L2 switches,
> + etc.
> +
> + For an overview of the DPAA2 architecture and fsl-mc bus see:
> + Documentation/networking/device_drivers/freescale/dpaa2/overview.rst
> +
> + As described in the above overview, all DPAA2 objects in a DPRC share the
> + same hardware "isolation context" and a 10-bit value called an ICID
> + (isolation context id) is expressed by the hardware to identify
> + the requester.
> +
> + The generic 'iommus' property is insufficient to describe the relationship
> + between ICIDs and IOMMUs, so an iommu-map property is used to define
> + the set of possible ICIDs under a root DPRC and how they map to
> + an IOMMU.
> +
> + For generic IOMMU bindings, see:
> + Documentation/devicetree/bindings/iommu/iommu.txt.
> +
> + For arm-smmu binding, see:
> + Documentation/devicetree/bindings/iommu/arm,smmu.yaml.
> +
> + MC firmware binary images can be found here:
> + https://github.com/NXP/qoriq-mc-binary
> +
> +properties:
> + compatible:
> + const: fsl,qoriq-mc
> + description:
> + A Freescale Management Complex compatible with this binding must have
> + Block Revision Registers BRR1 and BRR2 at offset 0x0BF8 and 0x0BFC in
> + the MC control register region.
> +
> + reg:
> + minItems: 1
> + items:
> + - description: the command portal for this machine
> + - description:
> + MC control registers. This region may not be present in some
> + scenarios, such as in the device tree presented to a virtual
> + machine.
> +
> + ranges:
> + description: |
> + A standard property. Defines the mapping between the child MC address
> + space and the parent system address space.
> +
> + The MC address space is defined by 3 components:
> + <region type> <offset hi> <offset lo>
> +
> + Valid values for region type are:
> + 0x0 - MC portals
> + 0x1 - QBMAN portals
> +
> + '#address-cells':
> + const: 3
> +
> + '#size-cells':
> + const: 1
> +
> + dpmacs:
> + type: object
> + description:
> + The fsl-mc node may optionally have dpmac sub-nodes that describe the
> + relationship between the Ethernet MACs which belong to the MC and the
> + Ethernet PHYs on the system board.
> +
> + properties:
> + '#address-cells':
> + const: 1
> +
> + '#size-cells':
> + const: 0
> +
> + patternProperties:
> + "^(dpmac@[0-9a-f]+)|(ethernet@[0-9a-f]+)$":
> + type: object
> +
> + $ref: /schemas/net/fsl,qoriq-mc-dpmac.yaml#
> +
> + iommu-map:
> + description: |
> + Maps an ICID to an IOMMU and associated iommu-specifier data.
> +
> + The property is an arbitrary number of tuples of
> + (icid-base, iommu, iommu-base, length).
> +
> + Any ICID i in the interval [icid-base, icid-base + length) is
> + associated with the listed IOMMU, with the iommu-specifier
> + (i - icid-base + iommu-base).
> +
> + msi-map:
> + description: |
> + Maps an ICID to a GIC ITS and associated msi-specifier data.
> +
> + The property is an arbitrary number of tuples of
> + (icid-base, gic-its, msi-base, length).
> +
> + Any ICID in the interval [icid-base, icid-base + length) is
> + associated with the listed GIC ITS, with the msi-specifier
> + (i - icid-base + msi-base).
> +
> + msi-parent:
> + deprecated: true
> + description:
> + Points to the MSI controller node handling message interrupts for the MC.
> +
> +required:
> + - compatible
> + - reg
> + - iommu-map
> + - msi-map
> + - ranges
> + - '#address-cells'
> + - '#size-cells'
> +
> +additionalProperties: false
> +
> +examples:
> + - |
> + soc {
> + #address-cells = <2>;
> + #size-cells = <2>;
> +
> + smmu: iommu@5000000 {
> + compatible = "arm,mmu-500";
> + #global-interrupts = <1>;
> + #iommu-cells = <1>;
> + reg = <0 0x5000000 0 0x800000>;
> + stream-match-mask = <0x7c00>;
> + interrupts = <0 13 4>,
> + <0 146 4>, <0 147 4>,
> + <0 148 4>, <0 149 4>,
> + <0 150 4>, <0 151 4>,
> + <0 152 4>, <0 153 4>;
> + };
> +
> + fsl_mc: fsl-mc@80c000000 {
> + compatible = "fsl,qoriq-mc";
> + reg = <0x00000008 0x0c000000 0 0x40>, /* MC portal base */
> + <0x00000000 0x08340000 0 0x40000>; /* MC control reg */
> + /* define map for ICIDs 23-64 */
> + iommu-map = <23 &smmu 23 41>;
> + /* define msi map for ICIDs 23-64 */
> + msi-map = <23 &its 23 41>;
> + #address-cells = <3>;
> + #size-cells = <1>;
> +
> + /*
> + * Region type 0x0 - MC portals
> + * Region type 0x1 - QBMAN portals
> + */
> + ranges = <0x0 0x0 0x0 0x8 0x0c000000 0x4000000
> + 0x1 0x0 0x0 0x8 0x18000000 0x8000000>;
> +
> + dpmacs {
> + #address-cells = <1>;
> + #size-cells = <0>;
> +
> + ethernet@1 {
> + compatible = "fsl,qoriq-mc-dpmac";
> + reg = <1>;
> + phy-handle = <&mdio0_phy0>;
> + };
> + };
> + };
> + };
> diff --git a/Documentation/networking/device_drivers/ethernet/freescale/dpaa2/overview.rst b/Documentation/networking/device_drivers/ethernet/freescale/dpaa2/overview.rst
> index d638b5a8aadd..b3261c5871cc 100644
> --- a/Documentation/networking/device_drivers/ethernet/freescale/dpaa2/overview.rst
> +++ b/Documentation/networking/device_drivers/ethernet/freescale/dpaa2/overview.rst
> @@ -28,6 +28,9 @@ interfaces, an L2 switch, or accelerator instances.
> The MC provides memory-mapped I/O command interfaces (MC portals)
> which DPAA2 software drivers use to operate on DPAA2 objects.
>
> +MC firmware binary images can be found here:
> +https://github.com/NXP/qoriq-mc-binary
> +
> The diagram below shows an overview of the DPAA2 resource management
> architecture::
>
> @@ -338,7 +341,7 @@ Key functions include:
> a bind of the root DPRC to the DPRC driver
>
> The binding for the MC-bus device-tree node can be consulted at
> -*Documentation/devicetree/bindings/misc/fsl,qoriq-mc.txt*.
> +*Documentation/devicetree/bindings/misc/fsl,qoriq-mc.yaml*.
> The sysfs bind/unbind interfaces for the MC-bus can be consulted at
> *Documentation/ABI/testing/sysfs-bus-fsl-mc*.
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index b516bb34a8d5..e0ce6e2b663c 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -14409,9 +14409,11 @@ M: Stuart Yoder <stuyoder@gmail.com>
> M: Laurentiu Tudor <laurentiu.tudor@nxp.com>
> L: linux-kernel@vger.kernel.org
> S: Maintained
> -F: Documentation/devicetree/bindings/misc/fsl,qoriq-mc.txt
> +F: Documentation/devicetree/bindings/misc/fsl,dpaa2-console.yaml
> +F: Documentation/devicetree/bindings/misc/fsl,qoriq-mc.yaml
> F: Documentation/networking/device_drivers/ethernet/freescale/dpaa2/overview.rst
> F: drivers/bus/fsl-mc/
> +F: include/linux/fsl/mc.h
>
> QT1010 MEDIA DRIVER
> M: Antti Palosaari <crope@iki.fi>
> --
> 2.17.1
>
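The iommu-map and msi-map descriptions in the quoted binding both state the same rule: an ICID i inside the half-open interval [icid-base, icid-base + length) translates to (i - icid-base + out-base). A minimal C sketch of that lookup follows. The struct and function names are hypothetical and not taken from any kernel API, and the phandle cell of each tuple is left out for brevity.

```c
#include <stddef.h>
#include <stdint.h>

/* One (icid-base, <phandle>, out-base, length) tuple; phandle omitted.
 * Hypothetical type, for illustration only. */
struct ex_icid_map_entry {
	uint32_t icid_base;
	uint32_t out_base;	/* iommu-base or msi-base */
	uint32_t length;
};

/*
 * Translate an ICID through the table. Returns 0 and stores the
 * specifier in *out on a match, -1 if no tuple covers the ICID.
 */
int ex_icid_translate(const struct ex_icid_map_entry *map, size_t n,
		      uint32_t icid, uint32_t *out)
{
	for (size_t k = 0; k < n; k++) {
		const struct ex_icid_map_entry *e = &map[k];

		/* Half-open interval [icid_base, icid_base + length) */
		if (icid >= e->icid_base && icid - e->icid_base < e->length) {
			*out = icid - e->icid_base + e->out_base;
			return 0;
		}
	}
	return -1;
}
```

Applied to the tuple <23 &smmu 23 41> from the example node, this rule matches ICIDs 23 through 63; ICID 64 falls outside the half-open interval.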
^ permalink raw reply
* [PATCH v1 8/8] powerpc/32: Use SPRN_SPRG_SCRATCH2 in exception prologs
From: Christophe Leroy @ 2020-11-25 7:10 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
Cc: linuxppc-dev, linux-kernel
In-Reply-To: <da51f7ec632825a4ce43290a904aad61648408c0.1606285013.git.christophe.leroy@csgroup.eu>
Use SPRN_SPRG_SCRATCH2 as a third scratch register in
exception prologs in order to simplify them and avoid
data going back and forth from/to CR.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
arch/powerpc/kernel/head_32.h | 22 +++++++---------------
1 file changed, 7 insertions(+), 15 deletions(-)
diff --git a/arch/powerpc/kernel/head_32.h b/arch/powerpc/kernel/head_32.h
index 5e3393122d29..a1ee1e12241e 100644
--- a/arch/powerpc/kernel/head_32.h
+++ b/arch/powerpc/kernel/head_32.h
@@ -40,7 +40,7 @@
.macro EXCEPTION_PROLOG_1 for_rtas=0
#ifdef CONFIG_VMAP_STACK
- mr r11, r1
+ mtspr SPRN_SPRG_SCRATCH2,r1
subi r1, r1, INT_FRAME_SIZE /* use r1 if kernel */
beq 1f
mfspr r1,SPRN_SPRG_THREAD
@@ -61,15 +61,10 @@
.macro EXCEPTION_PROLOG_2 handle_dar_dsisr=0
#ifdef CONFIG_VMAP_STACK
- mtcr r10
- li r10, MSR_KERNEL & ~(MSR_IR | MSR_RI) /* can take DTLB miss */
- mtmsr r10
+ li r11, MSR_KERNEL & ~(MSR_IR | MSR_RI) /* can take DTLB miss */
+ mtmsr r11
isync
-#else
- stw r10,_CCR(r11) /* save registers */
-#endif
- mfspr r10, SPRN_SPRG_SCRATCH0
-#ifdef CONFIG_VMAP_STACK
+ mfspr r11, SPRN_SPRG_SCRATCH2
stw r11,GPR1(r1)
stw r11,0(r1)
mr r11, r1
@@ -78,14 +73,12 @@
stw r1,0(r11)
tovirt(r1, r11) /* set new kernel sp */
#endif
+ stw r10,_CCR(r11) /* save registers */
stw r12,GPR12(r11)
stw r9,GPR9(r11)
- stw r10,GPR10(r11)
-#ifdef CONFIG_VMAP_STACK
- mfcr r10
- stw r10, _CCR(r11)
-#endif
+ mfspr r10,SPRN_SPRG_SCRATCH0
mfspr r12,SPRN_SPRG_SCRATCH1
+ stw r10,GPR10(r11)
stw r12,GPR11(r11)
mflr r10
stw r10,_LINK(r11)
@@ -99,7 +92,6 @@
stw r10, _DSISR(r11)
.endif
lwz r9, SRR1(r12)
- andi. r10, r9, MSR_PR
lwz r12, SRR0(r12)
#else
mfspr r12,SPRN_SRR0
--
2.25.0
^ permalink raw reply related
* [PATCH v1 7/8] powerpc/32s: Use SPRN_SPRG_SCRATCH2 in DSI prolog
From: Christophe Leroy @ 2020-11-25 7:10 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
Cc: linuxppc-dev, linux-kernel
In-Reply-To: <da51f7ec632825a4ce43290a904aad61648408c0.1606285013.git.christophe.leroy@csgroup.eu>
Use SPRN_SPRG_SCRATCH2 as an alternative scratch register in
the early part of DSI prolog in order to avoid clobbering
SPRN_SPRG_SCRATCH0/1 used by other prologs.
The 603 doesn't like a jump from DataLoadTLBMiss to the 10 nops
that are now at the beginning of the DSI exception as a result of
the feature section. To work around this, add a jump as the alternative.
This also avoids fetching 10 nops for nothing.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
arch/powerpc/include/asm/reg.h | 1 +
arch/powerpc/kernel/head_book3s_32.S | 24 ++++++++----------------
2 files changed, 9 insertions(+), 16 deletions(-)
diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index a37ce826f6f6..acd334ee3936 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -1203,6 +1203,7 @@
#ifdef CONFIG_PPC_BOOK3S_32
#define SPRN_SPRG_SCRATCH0 SPRN_SPRG0
#define SPRN_SPRG_SCRATCH1 SPRN_SPRG1
+#define SPRN_SPRG_SCRATCH2 SPRN_SPRG2
#define SPRN_SPRG_603_LRU SPRN_SPRG4
#endif
diff --git a/arch/powerpc/kernel/head_book3s_32.S b/arch/powerpc/kernel/head_book3s_32.S
index 51eef7b82f9c..22d670263222 100644
--- a/arch/powerpc/kernel/head_book3s_32.S
+++ b/arch/powerpc/kernel/head_book3s_32.S
@@ -288,9 +288,9 @@ MachineCheck:
DO_KVM 0x300
DataAccess:
#ifdef CONFIG_VMAP_STACK
- mtspr SPRN_SPRG_SCRATCH0,r10
- mfspr r10, SPRN_SPRG_THREAD
BEGIN_MMU_FTR_SECTION
+ mtspr SPRN_SPRG_SCRATCH2,r10
+ mfspr r10, SPRN_SPRG_THREAD
stw r11, THR11(r10)
mfspr r10, SPRN_DSISR
mfcr r11
@@ -304,19 +304,11 @@ BEGIN_MMU_FTR_SECTION
.Lhash_page_dsi_cont:
mtcr r11
lwz r11, THR11(r10)
-END_MMU_FTR_SECTION_IFSET(MMU_FTR_HPTE_TABLE)
- mtspr SPRN_SPRG_SCRATCH1,r11
- mfspr r11, SPRN_DAR
- stw r11, DAR(r10)
- mfspr r11, SPRN_DSISR
- stw r11, DSISR(r10)
- mfspr r11, SPRN_SRR0
- stw r11, SRR0(r10)
- mfspr r11, SPRN_SRR1 /* check whether user or kernel */
- stw r11, SRR1(r10)
- mfcr r10
- andi. r11, r11, MSR_PR
-
+ mfspr r10, SPRN_SPRG_SCRATCH2
+MMU_FTR_SECTION_ELSE
+ b 1f
+ALT_MMU_FTR_SECTION_END_IFSET(MMU_FTR_HPTE_TABLE)
+1: EXCEPTION_PROLOG_0 handle_dar_dsisr=1
EXCEPTION_PROLOG_1
b handle_page_fault_tramp_1
#else /* CONFIG_VMAP_STACK */
@@ -760,7 +752,7 @@ fast_hash_page_return:
/* DSI */
mtcr r11
lwz r11, THR11(r10)
- mfspr r10, SPRN_SPRG_SCRATCH0
+ mfspr r10, SPRN_SPRG_SCRATCH2
RFI
1: /* ISI */
--
2.25.0
^ permalink raw reply related
* [PATCH v1 6/8] powerpc/32: Simplify EXCEPTION_PROLOG_1 macro
From: Christophe Leroy @ 2020-11-25 7:10 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
Cc: linuxppc-dev, linux-kernel
In-Reply-To: <da51f7ec632825a4ce43290a904aad61648408c0.1606285013.git.christophe.leroy@csgroup.eu>
Make the code more readable with a clear CONFIG_VMAP_STACK
section and a clear non-CONFIG_VMAP_STACK section.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
arch/powerpc/kernel/head_32.h | 10 ++++------
1 file changed, 4 insertions(+), 6 deletions(-)
diff --git a/arch/powerpc/kernel/head_32.h b/arch/powerpc/kernel/head_32.h
index 7c767765071d..5e3393122d29 100644
--- a/arch/powerpc/kernel/head_32.h
+++ b/arch/powerpc/kernel/head_32.h
@@ -46,18 +46,16 @@
mfspr r1,SPRN_SPRG_THREAD
lwz r1,TASK_STACK-THREAD(r1)
addi r1, r1, THREAD_SIZE - INT_FRAME_SIZE
+1:
+ mtcrf 0x7f, r1
+ bt 32 - THREAD_ALIGN_SHIFT, stack_overflow
#else
subi r11, r1, INT_FRAME_SIZE /* use r1 if kernel */
beq 1f
mfspr r11,SPRN_SPRG_THREAD
lwz r11,TASK_STACK-THREAD(r11)
addi r11, r11, THREAD_SIZE - INT_FRAME_SIZE
-#endif
-1:
- tophys_novmstack r11, r11
-#ifdef CONFIG_VMAP_STACK
- mtcrf 0x7f, r1
- bt 32 - THREAD_ALIGN_SHIFT, stack_overflow
+1: tophys(r11, r11)
#endif
.endm
--
2.25.0
^ permalink raw reply related
* [PATCH v1 5/8] powerpc/603: Use SPRN_SDR1 to store the pgdir phys address
From: Christophe Leroy @ 2020-11-25 7:10 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
Cc: linuxppc-dev, linux-kernel
In-Reply-To: <da51f7ec632825a4ce43290a904aad61648408c0.1606285013.git.christophe.leroy@csgroup.eu>
On the 603, SDR1 is not used.
In order to free SPRN_SPRG2, use SPRN_SDR1 to store the pgdir
phys addr.
But only some bits of SDR1 can be used (0xffff01ff).
As the pgdir is 4k aligned, rotate it by 4 bits to the left.
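The rotation can be checked with a small C model of the two rlwinm instructions used in the diff below ("rlwinm r4, r4, 4, 0xffff01ff" when storing, "rlwinm r2, r2, 28, 0xfffff000" when loading). This is only a sketch: the function names are hypothetical, and it assumes a 32-bit, 4k-aligned physical address.

```c
#include <stdint.h>

/* Rotate a 32-bit value left by n bits. */
static uint32_t ex_rotl32(uint32_t x, unsigned int n)
{
	n &= 31;
	return n ? (x << n) | (x >> (32 - n)) : x;
}

/* Models "rlwinm rX, rX, 4, 0xffff01ff": value written to SDR1. */
uint32_t ex_sdr1_encode(uint32_t pgdir_phys)
{
	return ex_rotl32(pgdir_phys, 4) & 0xffff01ffu;
}

/* Models "rlwinm rX, rX, 28, 0xfffff000": pgdir read back from SDR1. */
uint32_t ex_sdr1_decode(uint32_t sdr1)
{
	return ex_rotl32(sdr1, 28) & 0xfffff000u;
}
```

Because the low 12 bits of a 4k-aligned address are zero, rotating left by 4 moves the top nibble into bits that the 0xffff01ff mask keeps and leaves zeros in the bits it clears, so the round trip loses nothing. For example, 0x12345000 is stored as 0x23450001.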
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
arch/powerpc/include/asm/reg.h | 1 -
arch/powerpc/kernel/head_book3s_32.S | 31 +++++++++++++++++++++-------
2 files changed, 24 insertions(+), 8 deletions(-)
diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index f877a576b338..a37ce826f6f6 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -1203,7 +1203,6 @@
#ifdef CONFIG_PPC_BOOK3S_32
#define SPRN_SPRG_SCRATCH0 SPRN_SPRG0
#define SPRN_SPRG_SCRATCH1 SPRN_SPRG1
-#define SPRN_SPRG_PGDIR SPRN_SPRG2
#define SPRN_SPRG_603_LRU SPRN_SPRG4
#endif
diff --git a/arch/powerpc/kernel/head_book3s_32.S b/arch/powerpc/kernel/head_book3s_32.S
index 236a95d163be..51eef7b82f9c 100644
--- a/arch/powerpc/kernel/head_book3s_32.S
+++ b/arch/powerpc/kernel/head_book3s_32.S
@@ -457,8 +457,9 @@ InstructionTLBMiss:
lis r1, TASK_SIZE@h /* check if kernel address */
cmplw 0,r1,r3
#endif
- mfspr r2, SPRN_SPRG_PGDIR
+ mfspr r2, SPRN_SDR1
li r1,_PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_EXEC
+ rlwinm r2, r2, 28, 0xfffff000
#ifdef CONFIG_MODULES
bgt- 112f
lis r2, (swapper_pg_dir - PAGE_OFFSET)@ha /* if kernel address, use */
@@ -519,8 +520,9 @@ DataLoadTLBMiss:
mfspr r3,SPRN_DMISS
lis r1, TASK_SIZE@h /* check if kernel address */
cmplw 0,r1,r3
- mfspr r2, SPRN_SPRG_PGDIR
+ mfspr r2, SPRN_SDR1
li r1, _PAGE_PRESENT | _PAGE_ACCESSED
+ rlwinm r2, r2, 28, 0xfffff000
bgt- 112f
lis r2, (swapper_pg_dir - PAGE_OFFSET)@ha /* if kernel address, use */
addi r2, r2, (swapper_pg_dir - PAGE_OFFSET)@l /* kernel page table */
@@ -595,8 +597,9 @@ DataStoreTLBMiss:
mfspr r3,SPRN_DMISS
lis r1, TASK_SIZE@h /* check if kernel address */
cmplw 0,r1,r3
- mfspr r2, SPRN_SPRG_PGDIR
+ mfspr r2, SPRN_SDR1
li r1, _PAGE_RW | _PAGE_DIRTY | _PAGE_PRESENT | _PAGE_ACCESSED
+ rlwinm r2, r2, 28, 0xfffff000
bgt- 112f
lis r2, (swapper_pg_dir - PAGE_OFFSET)@ha /* if kernel address, use */
addi r2, r2, (swapper_pg_dir - PAGE_OFFSET)@l /* kernel page table */
@@ -889,9 +892,12 @@ __secondary_start:
tophys(r4,r2)
addi r4,r4,THREAD /* phys address of our thread_struct */
mtspr SPRN_SPRG_THREAD,r4
+BEGIN_MMU_FTR_SECTION
lis r4, (swapper_pg_dir - PAGE_OFFSET)@h
ori r4, r4, (swapper_pg_dir - PAGE_OFFSET)@l
- mtspr SPRN_SPRG_PGDIR, r4
+ rlwinm r4, r4, 4, 0xffff01ff
+ mtspr SPRN_SDR1, r4
+END_MMU_FTR_SECTION_IFCLR(MMU_FTR_HPTE_TABLE)
/* enable MMU and jump to start_secondary */
li r4,MSR_KERNEL
@@ -931,11 +937,13 @@ load_up_mmu:
tlbia /* Clear all TLB entries */
sync /* wait for tlbia/tlbie to finish */
TLBSYNC /* ... on all CPUs */
+BEGIN_MMU_FTR_SECTION
/* Load the SDR1 register (hash table base & size) */
lis r6,_SDR1@ha
tophys(r6,r6)
lwz r6,_SDR1@l(r6)
mtspr SPRN_SDR1,r6
+END_MMU_FTR_SECTION_IFSET(MMU_FTR_HPTE_TABLE)
/* Load the BAT registers with the values set up by MMU_init. */
lis r3,BATS@ha
@@ -991,9 +999,12 @@ start_here:
tophys(r4,r2)
addi r4,r4,THREAD /* init task's THREAD */
mtspr SPRN_SPRG_THREAD,r4
+BEGIN_MMU_FTR_SECTION
lis r4, (swapper_pg_dir - PAGE_OFFSET)@h
ori r4, r4, (swapper_pg_dir - PAGE_OFFSET)@l
- mtspr SPRN_SPRG_PGDIR, r4
+ rlwinm r4, r4, 4, 0xffff01ff
+ mtspr SPRN_SDR1, r4
+END_MMU_FTR_SECTION_IFCLR(MMU_FTR_HPTE_TABLE)
/* stack */
lis r1,init_thread_union@ha
@@ -1073,16 +1084,22 @@ _ENTRY(switch_mmu_context)
li r0,NUM_USER_SEGMENTS
mtctr r0
- lwz r4, MM_PGD(r4)
#ifdef CONFIG_BDI_SWITCH
/* Context switch the PTE pointer for the Abatron BDI2000.
* The PGDIR is passed as second argument.
*/
+ lwz r4, MM_PGD(r4)
lis r5, abatron_pteptrs@ha
stw r4, abatron_pteptrs@l + 0x4(r5)
+#endif
+BEGIN_MMU_FTR_SECTION
+#ifndef CONFIG_BDI_SWITCH
+ lwz r4, MM_PGD(r4)
#endif
tophys(r4, r4)
- mtspr SPRN_SPRG_PGDIR, r4
+ rlwinm r4, r4, 4, 0xffff01ff
+ mtspr SPRN_SDR1, r4
+END_MMU_FTR_SECTION_IFCLR(MMU_FTR_HPTE_TABLE)
li r4,0
isync
3:
--
2.25.0
^ permalink raw reply related
* [PATCH v1 2/8] powerpc/32s: Don't hash_preload() kernel text
From: Christophe Leroy @ 2020-11-25 7:10 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
Cc: linuxppc-dev, linux-kernel
In-Reply-To: <da51f7ec632825a4ce43290a904aad61648408c0.1606285013.git.christophe.leroy@csgroup.eu>
We now always map kernel text with BATs. There is no longer any need
to preload the hash table with kernel text addresses, nor to ensure
that those entries are never evicted.
This is more or less a revert of commit ee4f2ea48674 ("[POWERPC] Fix
32-bit mm operations when not using BATs")
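With the kernel-text check gone, eviction candidate selection in the diff below reduces to a plain round-robin over the eight slots of a PTEG. A rough C model of the remaining instructions (addi, andi., stw) is given here. The function name is hypothetical, and HPTE_SIZE is assumed to be 8 bytes, the size of a classic 32-bit hash PTE.

```c
#include <stdint.h>

#define EX_HPTE_SIZE 8u	/* assumed size of one hash PTE, in bytes */

/*
 * Advance the shared eviction cursor and return the byte offset of the
 * slot to evict within the PTEG. The mask works because the PTEG holds
 * eight entries of a power-of-two size.
 */
uint32_t ex_next_evict_slot(uint32_t *next_slot)
{
	uint32_t off = (*next_slot + EX_HPTE_SIZE) & (7u * EX_HPTE_SIZE);

	*next_slot = off;
	return off;
}
```

Starting from 0, successive calls yield offsets 8, 16, and so on up to 56, then wrap to 0. There is no longer a retry loop, so the code cannot spin when a bucket is full of kernel text entries.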
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
arch/powerpc/mm/book3s32/hash_low.S | 18 +-----------------
arch/powerpc/mm/book3s32/mmu.c | 2 +-
arch/powerpc/mm/mmu_decl.h | 2 --
arch/powerpc/mm/pgtable_32.c | 4 ----
4 files changed, 2 insertions(+), 24 deletions(-)
diff --git a/arch/powerpc/mm/book3s32/hash_low.S b/arch/powerpc/mm/book3s32/hash_low.S
index b2c912e517b9..48415c857d80 100644
--- a/arch/powerpc/mm/book3s32/hash_low.S
+++ b/arch/powerpc/mm/book3s32/hash_low.S
@@ -411,30 +411,14 @@ END_FTR_SECTION_IFCLR(CPU_FTR_NEED_COHERENT)
* and we know there is a definite (although small) speed
* advantage to putting the PTE in the primary PTEG, we always
* put the PTE in the primary PTEG.
- *
- * In addition, we skip any slot that is mapping kernel text in
- * order to avoid a deadlock when not using BAT mappings if
- * trying to hash in the kernel hash code itself after it has
- * already taken the hash table lock. This works in conjunction
- * with pre-faulting of the kernel text.
- *
- * If the hash table bucket is full of kernel text entries, we'll
- * lockup here but that shouldn't happen
*/
-1: lis r4, (next_slot - PAGE_OFFSET)@ha /* get next evict slot */
+ lis r4, (next_slot - PAGE_OFFSET)@ha /* get next evict slot */
lwz r6, (next_slot - PAGE_OFFSET)@l(r4)
addi r6,r6,HPTE_SIZE /* search for candidate */
andi. r6,r6,7*HPTE_SIZE
stw r6,next_slot@l(r4)
add r4,r3,r6
- LDPTE r0,HPTE_SIZE/2(r4) /* get PTE second word */
- clrrwi r0,r0,12
- lis r6,etext@h
- ori r6,r6,etext@l /* get etext */
- tophys(r6,r6)
- cmpl cr0,r0,r6 /* compare and try again */
- blt 1b
#ifndef CONFIG_SMP
/* Store PTE in PTEG */
diff --git a/arch/powerpc/mm/book3s32/mmu.c b/arch/powerpc/mm/book3s32/mmu.c
index 5c60dcade90a..23f60e97196e 100644
--- a/arch/powerpc/mm/book3s32/mmu.c
+++ b/arch/powerpc/mm/book3s32/mmu.c
@@ -302,7 +302,7 @@ void __init setbat(int index, unsigned long virt, phys_addr_t phys,
/*
* Preload a translation in the hash table
*/
-void hash_preload(struct mm_struct *mm, unsigned long ea)
+static void hash_preload(struct mm_struct *mm, unsigned long ea)
{
pmd_t *pmd;
diff --git a/arch/powerpc/mm/mmu_decl.h b/arch/powerpc/mm/mmu_decl.h
index 1b6d39e9baed..0ad6d476d01d 100644
--- a/arch/powerpc/mm/mmu_decl.h
+++ b/arch/powerpc/mm/mmu_decl.h
@@ -91,8 +91,6 @@ void print_system_hash_info(void);
#ifdef CONFIG_PPC32
-void hash_preload(struct mm_struct *mm, unsigned long ea);
-
extern void mapin_ram(void);
extern void setbat(int index, unsigned long virt, phys_addr_t phys,
unsigned int size, pgprot_t prot);
diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
index 079159e97bca..6e0083e7f008 100644
--- a/arch/powerpc/mm/pgtable_32.c
+++ b/arch/powerpc/mm/pgtable_32.c
@@ -112,10 +112,6 @@ static void __init __mapin_ram_chunk(unsigned long offset, unsigned long top)
ktext = ((char *)v >= _stext && (char *)v < etext) ||
((char *)v >= _sinittext && (char *)v < _einittext);
map_kernel_page(v, p, ktext ? PAGE_KERNEL_TEXT : PAGE_KERNEL);
-#ifdef CONFIG_PPC_BOOK3S_32
- if (ktext)
- hash_preload(&init_mm, v);
-#endif
v += PAGE_SIZE;
p += PAGE_SIZE;
}
--
2.25.0
^ permalink raw reply related
* [PATCH v1 3/8] powerpc/32s: Fix an FTR_SECTION_ELSE
From: Christophe Leroy @ 2020-11-25 7:10 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
Cc: linuxppc-dev, linux-kernel
In-Reply-To: <da51f7ec632825a4ce43290a904aad61648408c0.1606285013.git.christophe.leroy@csgroup.eu>
An FTR_SECTION_ELSE sits in the middle of a
BEGIN_MMU_FTR_SECTION/ALT_MMU_FTR_SECTION_END_IFSET block.
Change it to MMU_FTR_SECTION_ELSE.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
arch/powerpc/kernel/head_book3s_32.S | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/powerpc/kernel/head_book3s_32.S b/arch/powerpc/kernel/head_book3s_32.S
index 27767f3e7ec1..236a95d163be 100644
--- a/arch/powerpc/kernel/head_book3s_32.S
+++ b/arch/powerpc/kernel/head_book3s_32.S
@@ -332,7 +332,7 @@ BEGIN_MMU_FTR_SECTION
rlwinm r3, r5, 32 - 15, 21, 21 /* DSISR_STORE -> _PAGE_RW */
bl hash_page
b handle_page_fault_tramp_1
-FTR_SECTION_ELSE
+MMU_FTR_SECTION_ELSE
b handle_page_fault_tramp_2
ALT_MMU_FTR_SECTION_END_IFSET(MMU_FTR_HPTE_TABLE)
#endif /* CONFIG_VMAP_STACK */
--
2.25.0
^ permalink raw reply related
* [PATCH v1 4/8] powerpc/32s: Don't use SPRN_SPRG_PGDIR in hash_page
From: Christophe Leroy @ 2020-11-25 7:10 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
Cc: linuxppc-dev, linux-kernel
In-Reply-To: <da51f7ec632825a4ce43290a904aad61648408c0.1606285013.git.christophe.leroy@csgroup.eu>
SPRN_SPRG_PGDIR is there mainly to speed up SW TLB miss handlers
for powerpc 603.
We need to free SPRN_SPRG2 to reduce the mess with CONFIG_VMAP_STACK.
In hash_page(), reading PGDIR from thread_struct will be in the noise
performance-wise.
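For readers less used to the assembly, the lookup after this change can be modelled in C roughly as follows: pick the page-table root by address range (virtual pointers in both cases), convert it to a physical address once, then insert the top 10 bits of the faulting address as a word index, as "rlwimi r5,r4,12,20,29" does. The names and the two constants are illustrative assumptions; the real PAGE_OFFSET and TASK_SIZE depend on the kernel configuration.

```c
#include <stdint.h>

#define EX_PAGE_OFFSET	0xc0000000u	/* assumed, config dependent */
#define EX_TASK_SIZE	0xc0000000u	/* assumed, config dependent */

/* Simplified tophys(): linear-mapped virtual to physical address. */
static uint32_t ex_tophys(uint32_t virt)
{
	return virt - EX_PAGE_OFFSET;
}

/*
 * Physical address of the pmd entry for effective address ea.
 * thread_pgdir_virt stands for the PGDIR field of thread_struct,
 * swapper_pg_dir_virt for the kernel page table root.
 */
uint32_t ex_pmd_entry_phys(uint32_t ea, uint32_t thread_pgdir_virt,
			   uint32_t swapper_pg_dir_virt)
{
	uint32_t pgdir = (ea < EX_TASK_SIZE) ? thread_pgdir_virt
					     : swapper_pg_dir_virt;
	uint32_t phys = ex_tophys(pgdir);

	/* Replace bits 2..11 with the top 10 bits of ea (4-byte entries). */
	return (phys & ~0xffcu) | ((ea >> 22) << 2);
}
```

The single tophys() after the branch is what lets both paths load a virtual pointer, at the cost of one extra load from thread_struct in the user case.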
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
arch/powerpc/mm/book3s32/hash_low.S | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/arch/powerpc/mm/book3s32/hash_low.S b/arch/powerpc/mm/book3s32/hash_low.S
index 48415c857d80..aca353d1c5f4 100644
--- a/arch/powerpc/mm/book3s32/hash_low.S
+++ b/arch/powerpc/mm/book3s32/hash_low.S
@@ -65,13 +65,14 @@ _GLOBAL(hash_page)
/* Get PTE (linux-style) and check access */
lis r0, TASK_SIZE@h /* check if kernel address */
cmplw 0,r4,r0
+ mfspr r8,SPRN_SPRG_THREAD /* current task's THREAD (phys) */
ori r3,r3,_PAGE_USER|_PAGE_PRESENT /* test low addresses as user */
- mfspr r5, SPRN_SPRG_PGDIR /* phys page-table root */
+ lwz r5,PGDIR(r8) /* virt page-table root */
blt+ 112f /* assume user more likely */
- lis r5, (swapper_pg_dir - PAGE_OFFSET)@ha /* if kernel address, use */
- addi r5 ,r5 ,(swapper_pg_dir - PAGE_OFFSET)@l /* kernel page table */
+ lis r5,swapper_pg_dir@ha /* if kernel address, use */
+ addi r5,r5,swapper_pg_dir@l /* kernel page table */
rlwimi r3,r9,32-12,29,29 /* MSR_PR -> _PAGE_USER */
-112:
+112: tophys(r5, r5)
#ifndef CONFIG_PTE_64BIT
rlwimi r5,r4,12,20,29 /* insert top 10 bits of address */
lwz r8,0(r5) /* get pmd entry */
--
2.25.0
^ permalink raw reply related
* [PATCH v1 1/8] powerpc/32s: Always map kernel text and rodata with BATs
From: Christophe Leroy @ 2020-11-25 7:10 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
Cc: linuxppc-dev, linux-kernel
Since commit 2b279c0348af ("powerpc/32s: Allow mapping with BATs with
DEBUG_PAGEALLOC"), there is no real situation where mapping without
BATs is required.
In order to simplify memory handling, always map kernel text
and rodata with BATs even when "nobats" kernel parameter is set.
Also fix the 603 TLB miss exceptions, which no longer require the
kernel page table when DEBUG_PAGEALLOC is enabled.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
arch/powerpc/kernel/head_book3s_32.S | 4 ++--
arch/powerpc/mm/book3s32/mmu.c | 8 +++-----
2 files changed, 5 insertions(+), 7 deletions(-)
diff --git a/arch/powerpc/kernel/head_book3s_32.S b/arch/powerpc/kernel/head_book3s_32.S
index a0dda2a1f2df..27767f3e7ec1 100644
--- a/arch/powerpc/kernel/head_book3s_32.S
+++ b/arch/powerpc/kernel/head_book3s_32.S
@@ -453,13 +453,13 @@ InstructionTLBMiss:
*/
/* Get PTE (linux-style) and check access */
mfspr r3,SPRN_IMISS
-#if defined(CONFIG_MODULES) || defined(CONFIG_DEBUG_PAGEALLOC)
+#ifdef CONFIG_MODULES
lis r1, TASK_SIZE@h /* check if kernel address */
cmplw 0,r1,r3
#endif
mfspr r2, SPRN_SPRG_PGDIR
li r1,_PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_EXEC
-#if defined(CONFIG_MODULES) || defined(CONFIG_DEBUG_PAGEALLOC)
+#ifdef CONFIG_MODULES
bgt- 112f
lis r2, (swapper_pg_dir - PAGE_OFFSET)@ha /* if kernel address, use */
addi r2, r2, (swapper_pg_dir - PAGE_OFFSET)@l /* kernel page table */
diff --git a/arch/powerpc/mm/book3s32/mmu.c b/arch/powerpc/mm/book3s32/mmu.c
index a59e7ec98180..5c60dcade90a 100644
--- a/arch/powerpc/mm/book3s32/mmu.c
+++ b/arch/powerpc/mm/book3s32/mmu.c
@@ -157,11 +157,9 @@ unsigned long __init mmu_mapin_ram(unsigned long base, unsigned long top)
unsigned long done;
unsigned long border = (unsigned long)__init_begin - PAGE_OFFSET;
- if (__map_without_bats) {
- pr_debug("RAM mapped without BATs\n");
- return base;
- }
- if (debug_pagealloc_enabled()) {
+
+ if (debug_pagealloc_enabled() || __map_without_bats) {
+ pr_debug_once("Read-Write memory mapped without BATs\n");
if (base >= border)
return base;
if (top >= border)
--
2.25.0
^ permalink raw reply related
* Re: [PATCH 0/2] powerpc: Remove support for ppc405/440 Xilinx platforms
From: Christophe Leroy @ 2020-11-25 6:36 UTC (permalink / raw)
To: Michael Ellerman, Arnd Bergmann
Cc: Kate Stewart, Mark Rutland, Desnes A. Nunes do Rosario,
Geert Uytterhoeven, open list:DOCUMENTATION,
ALSA Development Mailing List, dri-devel, Jaroslav Kysela,
Richard Fontana, Paul Mackerras, Miquel Raynal,
Mauro Carvalho Chehab, Fabio Estevam, Sasha Levin,
Stephen Rothwell, Jonathan Corbet, Masahiro Yamada, YueHaibing,
Michal Simek, Krzysztof Kozlowski, Allison Randal, Leonardo Bras,
DTML, Andrew Donnellan, Bartlomiej Zolnierkiewicz, Marc Zyngier,
Alistair Popple, Nicholas Piggin, Alexios Zavras, Mark Brown, git,
Linux Fbdev development list, Jonathan Cameron, Thomas Gleixner,
Andy Shevchenko, Linux ARM, Christophe Leroy, Enrico Weigelt,
Michal Simek, Wei Hu, Christian Lamparter, Greg Kroah-Hartman,
Nick Desaulniers, Takashi Iwai, linux-kernel@vger.kernel.org,
Armijn Hemel, Rob Herring, linuxppc-dev, David S. Miller,
Thiago Jung Bauermann
In-Reply-To: <33b873a8-ded2-4866-fb70-c336fb325923@csgroup.eu>
On 21/05/2020 at 12:38, Christophe Leroy wrote:
>
>
> On 21/05/2020 at 09:02, Michael Ellerman wrote:
>> Arnd Bergmann <arnd@arndb.de> writes:
>>> On Wed, Apr 8, 2020 at 2:04 PM Michael Ellerman <mpe@ellerman.id.au> wrote:
>>>> Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:
>>>>> On Fri, 2020-04-03 at 15:59 +1100, Michael Ellerman wrote:
>>>>>> Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:
>>>>> IBM still put 40x cores inside POWER chips no ?
>>>>
>>>> Oh yeah that's true. I guess most folks don't know that, or that they
>>>> run RHEL on them.
>>>
>>> Is there a reason for not having those dts files in mainline then?
>>> If nothing else, it would document what machines are still being
>>> used with future kernels.
>>
>> Sorry that part was a joke :D Those chips don't run Linux.
>>
>
> Nice to know :)
>
> What's the plan then, do we still want to keep 40x in the kernel ?
>
> If yes, is it ok to drop the oldies anyway as done in my series
> https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=172630 ?
>
> (Note that this series will conflict with my series on hugepages on 8xx due to the
> PTE_ATOMIC_UPDATES stuff. I can rebase the 40x modernisation series on top of the 8xx hugepages
> series if it is worth it)
>
Do we still want to keep 40x in the kernel? We don't even have a running 40x QEMU machine as far as
I know.
I'm asking because I'd like to drop the non-CONFIG_VMAP_STACK code to simplify things (code
that works with vmalloc'ed stacks also works with stacks in linear memory), but I can't do it
because 40x doesn't have VMAP_STACK, and were I to implement it for 40x, I would have no means to test it.
So it would ease things if we could drop 40x completely, unless someone out there has a 40x platform to
test stuff on.
Thanks
Christophe
^ permalink raw reply
* Re: [PATCH net 1/2] ibmvnic: Ensure that SCRQ entry reads are correctly ordered
From: Michael Ellerman @ 2020-11-25 5:43 UTC (permalink / raw)
To: Thomas Falcon, netdev
Cc: cforno12, ljp, ricklind, dnbanerg, tlfalcon, drt, brking, sukadev,
linuxppc-dev
In-Reply-To: <1606238776-30259-2-git-send-email-tlfalcon@linux.ibm.com>
Thomas Falcon <tlfalcon@linux.ibm.com> writes:
> Ensure that received Subordinate Command-Response Queue (SCRQ)
> entries are properly read in order by the driver. These queues
> are used in the ibmvnic device to process RX buffer and TX completion
> descriptors. dma_rmb barriers have been added after checking for a
> pending descriptor to ensure the correct descriptor entry is checked
> and after reading the SCRQ descriptor to ensure the entire
> descriptor is read before processing.
>
> Fixes: 032c5e828 ("Driver for IBM System i/p VNIC protocol")
> Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>
> ---
> drivers/net/ethernet/ibm/ibmvnic.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
> index 2aa40b2..489ed5e 100644
> --- a/drivers/net/ethernet/ibm/ibmvnic.c
> +++ b/drivers/net/ethernet/ibm/ibmvnic.c
> @@ -2403,6 +2403,8 @@ static int ibmvnic_poll(struct napi_struct *napi, int budget)
>
> if (!pending_scrq(adapter, adapter->rx_scrq[scrq_num]))
> break;
> + /* ensure that we do not prematurely exit the polling loop */
> + dma_rmb();
I'd be happier if these comments were more specific about which read(s)
they are ordering vs which other read(s).
I'm sure it's obvious to you, but it may not be to a future author,
and/or after the code has been refactored over time.
> next = ibmvnic_next_scrq(adapter, adapter->rx_scrq[scrq_num]);
> rx_buff =
> (struct ibmvnic_rx_buff *)be64_to_cpu(next->
> @@ -3098,6 +3100,9 @@ static int ibmvnic_complete_tx(struct ibmvnic_adapter *adapter,
> unsigned int pool = scrq->pool_index;
> int num_entries = 0;
>
> + /* ensure that the correct descriptor entry is read */
> + dma_rmb();
> +
> next = ibmvnic_next_scrq(adapter, scrq);
> for (i = 0; i < next->tx_comp.num_comps; i++) {
> if (next->tx_comp.rcs[i]) {
> @@ -3498,6 +3503,9 @@ static union sub_crq *ibmvnic_next_scrq(struct ibmvnic_adapter *adapter,
> }
> spin_unlock_irqrestore(&scrq->lock, flags);
>
> + /* ensure that the entire SCRQ descriptor is read */
> + dma_rmb();
> +
> return entry;
> }
cheers
^ permalink raw reply
* [PATCH v6 22/22] powerpc/book3s64/pkeys: Optimize FTR_KUAP and FTR_KUEP disabled case
From: Aneesh Kumar K.V @ 2020-11-25 5:16 UTC (permalink / raw)
To: linuxppc-dev, mpe; +Cc: Aneesh Kumar K.V
In-Reply-To: <20201125051634.509286-1-aneesh.kumar@linux.ibm.com>
If FTR_KUAP is disabled, the kernel will continue to run with the same AMR
value with which it was entered. Hence there is a high chance that
we can return without restoring the AMR value. This also helps the case
when applications are not using the pkey feature: different
applications will then have the same AMR values, and hence we can avoid
restoring AMR in that case too.
Also avoid isync() if not really needed.
Do the same for IAMR.
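The decision logic described above can be sketched as a small user-space model. This is only an illustration under stated assumptions: `struct cpu_model`, `struct saved_regs` and `restore_user_amr()` are hypothetical stand-ins for the SPRs, `mmu_has_feature()` and the saved `pt_regs` values, and the counters exist only to show how many `mtspr`/`isync` operations would be issued.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the CPU state; not kernel code. */
struct cpu_model {
	uint64_t amr, iamr;      /* current AMR/IAMR register values */
	bool has_kuap, has_kuep; /* stand-ins for mmu_has_feature() */
	int mtspr_count;         /* number of register writes issued */
	int isync_count;         /* number of isync barriers issued */
};

/* Hypothetical stand-in for the AMR/IAMR values saved on kernel entry. */
struct saved_regs {
	uint64_t amr, iamr;
};

static void restore_user_amr(struct cpu_model *cpu,
			     const struct saved_regs *regs)
{
	assert(cpu && regs);

	/*
	 * With KUAP/KUEP enabled the kernel changed the register on
	 * entry, so always restore. Otherwise restore only when the
	 * live value differs from the saved one.
	 */
	bool restore_amr = cpu->has_kuap || cpu->amr != regs->amr;
	bool restore_iamr = cpu->has_kuep || cpu->iamr != regs->iamr;

	if (!restore_amr && !restore_iamr)
		return; /* nothing to write, so no isync either */

	cpu->isync_count++;
	if (restore_amr) {
		cpu->amr = regs->amr;
		cpu->mtspr_count++;
	}
	if (restore_iamr) {
		cpu->iamr = regs->iamr;
		cpu->mtspr_count++;
	}
}
```

In this model, a return to userspace with both features disabled and unchanged register values issues no `mtspr` and no `isync`, which is the case the benchmark below exercises with smap/smep disabled.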
null-syscall benchmark results:
With smap/smep disabled:
Without patch:
957.95 ns 2778.17 cycles
With patch:
858.38 ns 2489.30 cycles
With smap/smep enabled:
Without patch:
1017.26 ns 2950.36 cycles
With patch:
1021.51 ns 2962.44 cycles
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
arch/powerpc/include/asm/book3s/64/kup.h | 61 +++++++++++++++++++++---
arch/powerpc/kernel/entry_64.S | 2 +-
arch/powerpc/kernel/syscall_64.c | 12 +++--
3 files changed, 65 insertions(+), 10 deletions(-)
diff --git a/arch/powerpc/include/asm/book3s/64/kup.h b/arch/powerpc/include/asm/book3s/64/kup.h
index 7026d1b5d0c6..e063e439b0a8 100644
--- a/arch/powerpc/include/asm/book3s/64/kup.h
+++ b/arch/powerpc/include/asm/book3s/64/kup.h
@@ -12,28 +12,54 @@
#ifdef __ASSEMBLY__
-.macro kuap_restore_user_amr gpr1
+.macro kuap_restore_user_amr gpr1, gpr2
#if defined(CONFIG_PPC_PKEY)
BEGIN_MMU_FTR_SECTION_NESTED(67)
+ b 100f // skip_restore_amr
+ END_MMU_FTR_SECTION_NESTED_IFCLR(MMU_FTR_PKEY, 67)
/*
* AMR and IAMR are going to be different when
* returning to userspace.
*/
ld \gpr1, STACK_REGS_AMR(r1)
+
+ /*
+ * If kuap feature is not enabled, do the mtspr
+ * only if AMR value is different.
+ */
+ BEGIN_MMU_FTR_SECTION_NESTED(68)
+ mfspr \gpr2, SPRN_AMR
+ cmpd \gpr1, \gpr2
+ beq 99f
+ END_MMU_FTR_SECTION_NESTED_IFCLR(MMU_FTR_KUAP, 68)
+
isync
mtspr SPRN_AMR, \gpr1
+99:
/*
* Restore IAMR only when returning to userspace
*/
ld \gpr1, STACK_REGS_IAMR(r1)
+
+ /*
+ * If kuep feature is not enabled, do the mtspr
+ * only if IAMR value is different.
+ */
+ BEGIN_MMU_FTR_SECTION_NESTED(69)
+ mfspr \gpr2, SPRN_IAMR
+ cmpd \gpr1, \gpr2
+ beq 100f
+ END_MMU_FTR_SECTION_NESTED_IFCLR(MMU_FTR_KUEP, 69)
+
+ isync
mtspr SPRN_IAMR, \gpr1
+100: //skip_restore_amr
/* No isync required, see kuap_restore_user_amr() */
- END_MMU_FTR_SECTION_NESTED_IFSET(MMU_FTR_PKEY, 67)
#endif
.endm
-.macro kuap_restore_kernel_amr gpr1, gpr2
+.macro kuap_restore_kernel_amr gpr1, gpr2
#if defined(CONFIG_PPC_PKEY)
BEGIN_MMU_FTR_SECTION_NESTED(67)
@@ -197,18 +223,41 @@ static inline u64 current_thread_iamr(void)
static inline void kuap_restore_user_amr(struct pt_regs *regs)
{
+ bool restore_amr = false, restore_iamr = false;
+ unsigned long amr, iamr;
+
if (!mmu_has_feature(MMU_FTR_PKEY))
return;
- isync();
- mtspr(SPRN_AMR, regs->amr);
- mtspr(SPRN_IAMR, regs->iamr);
+ if (!mmu_has_feature(MMU_FTR_KUAP)) {
+ amr = mfspr(SPRN_AMR);
+ if (amr != regs->amr)
+ restore_amr = true;
+ } else
+ restore_amr = true;
+
+ if (!mmu_has_feature(MMU_FTR_KUEP)) {
+ iamr = mfspr(SPRN_IAMR);
+ if (iamr != regs->iamr)
+ restore_iamr = true;
+ } else
+ restore_iamr = true;
+
+
+ if (restore_amr || restore_iamr) {
+ isync();
+ if (restore_amr)
+ mtspr(SPRN_AMR, regs->amr);
+ if (restore_iamr)
+ mtspr(SPRN_IAMR, regs->iamr);
+ }
/*
* No isync required here because we are about to rfi
* back to previous context before any user accesses
* would be made, which is a CSI.
*/
}
+
static inline void kuap_restore_kernel_amr(struct pt_regs *regs,
unsigned long amr)
{
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index e49291594c68..a68517e99fd2 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -675,7 +675,7 @@ _ASM_NOKPROBE_SYMBOL(interrupt_return)
bne- .Lrestore_nvgprs
.Lfast_user_interrupt_return_amr:
- kuap_restore_user_amr r3
+ kuap_restore_user_amr r3, r4
.Lfast_user_interrupt_return:
ld r11,_NIP(r1)
ld r12,_MSR(r1)
diff --git a/arch/powerpc/kernel/syscall_64.c b/arch/powerpc/kernel/syscall_64.c
index 60c57609d316..681f9afafc6f 100644
--- a/arch/powerpc/kernel/syscall_64.c
+++ b/arch/powerpc/kernel/syscall_64.c
@@ -38,6 +38,7 @@ notrace long system_call_exception(long r3, long r4, long r5,
#ifdef CONFIG_PPC_PKEY
if (mmu_has_feature(MMU_FTR_PKEY)) {
unsigned long amr, iamr;
+ bool flush_needed = false;
/*
* When entering from userspace we mostly have the AMR/IAMR
* different from kernel default values. Hence don't compare.
@@ -46,11 +47,16 @@ notrace long system_call_exception(long r3, long r4, long r5,
iamr = mfspr(SPRN_IAMR);
regs->amr = amr;
regs->iamr = iamr;
- if (mmu_has_feature(MMU_FTR_KUAP))
+ if (mmu_has_feature(MMU_FTR_KUAP)) {
mtspr(SPRN_AMR, AMR_KUAP_BLOCKED);
- if (mmu_has_feature(MMU_FTR_KUEP))
+ flush_needed = true;
+ }
+ if (mmu_has_feature(MMU_FTR_KUEP)) {
mtspr(SPRN_IAMR, AMR_KUEP_BLOCKED);
- isync();
+ flush_needed = true;
+ }
+ if (flush_needed)
+ isync();
} else
#endif
kuap_check_amr();
--
2.28.0
^ permalink raw reply related
* [PATCH v6 21/22] powerpc/book3s64/hash/kup: Don't hardcode kup key
From: Aneesh Kumar K.V @ 2020-11-25 5:16 UTC (permalink / raw)
To: linuxppc-dev, mpe; +Cc: Aneesh Kumar K.V
In-Reply-To: <20201125051634.509286-1-aneesh.kumar@linux.ibm.com>
Make the KUAP/KUEP key a variable and also check whether the platform
limits the max key such that we can't use the key for KUAP/KUEP.
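As a rough user-space illustration of the check and the mask updates in the diff below, the following sketch models the reservation step. `struct pkey_state` and `reserve_kup_key()` are hypothetical names, and the `pkeyshift()` layout (2 bits per key, key 0 in the most significant bits of a 64-bit register) is an assumption made for this example rather than something stated in this patch.

```c
#include <assert.h>
#include <stdint.h>

#define AMR_BITS_PER_PKEY 2
#define PKEY_REG_BITS     64 /* assumed register width */

/* Assumed layout: key 0 occupies the two most significant bits. */
static int pkeyshift(int pkey)
{
	assert(pkey >= 0 && pkey < PKEY_REG_BITS / AMR_BITS_PER_PKEY);
	return PKEY_REG_BITS - (pkey + 1) * AMR_BITS_PER_PKEY;
}

/* Hypothetical container for the globals touched by the patch. */
struct pkey_state {
	int kup_key;
	uint32_t reserved_allocation_mask;
	uint64_t default_amr, default_iamr, default_uamor;
};

static void reserve_kup_key(struct pkey_state *s, int num_pkey)
{
	if (num_pkey <= s->kup_key) {
		/* Platform exposes too few keys: KUAP/KUEP cannot use one. */
		s->kup_key = -1;
		return;
	}

	/* Keep userspace from allocating or modifying the kernel's key. */
	s->reserved_allocation_mask |= 1u << s->kup_key;
	s->default_amr &= ~(UINT64_C(0x3) << pkeyshift(s->kup_key));
	s->default_iamr &= ~(UINT64_C(0x1) << pkeyshift(s->kup_key));
	s->default_uamor &= ~(UINT64_C(0x3) << pkeyshift(s->kup_key));
}
```

With `kup_key = 3` and 32 keys available, the model clears bits 57:56 of the default AMR and UAMOR; with only 3 keys available, `kup_key` becomes -1, which is the value `setup_kuap()` and `setup_kuep()` test for before enabling the feature.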
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
.../powerpc/include/asm/book3s/64/hash-pkey.h | 22 +-------
arch/powerpc/include/asm/book3s/64/pkeys.h | 1 +
arch/powerpc/mm/book3s64/pkeys.c | 53 ++++++++++++++++---
3 files changed, 49 insertions(+), 27 deletions(-)
diff --git a/arch/powerpc/include/asm/book3s/64/hash-pkey.h b/arch/powerpc/include/asm/book3s/64/hash-pkey.h
index 9f44e208f036..ff9907c72ee3 100644
--- a/arch/powerpc/include/asm/book3s/64/hash-pkey.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-pkey.h
@@ -2,9 +2,7 @@
#ifndef _ASM_POWERPC_BOOK3S_64_HASH_PKEY_H
#define _ASM_POWERPC_BOOK3S_64_HASH_PKEY_H
-/* We use key 3 for KERNEL */
-#define HASH_DEFAULT_KERNEL_KEY (HPTE_R_KEY_BIT0 | HPTE_R_KEY_BIT1)
-
+u64 pte_to_hpte_pkey_bits(u64 pteflags, unsigned long flags);
static inline u64 hash__vmflag_to_pte_pkey_bits(u64 vm_flags)
{
return (((vm_flags & VM_PKEY_BIT0) ? H_PTE_PKEY_BIT0 : 0x0UL) |
@@ -14,24 +12,6 @@ static inline u64 hash__vmflag_to_pte_pkey_bits(u64 vm_flags)
((vm_flags & VM_PKEY_BIT4) ? H_PTE_PKEY_BIT4 : 0x0UL));
}
-static inline u64 pte_to_hpte_pkey_bits(u64 pteflags, unsigned long flags)
-{
- unsigned long pte_pkey;
-
- pte_pkey = (((pteflags & H_PTE_PKEY_BIT4) ? HPTE_R_KEY_BIT4 : 0x0UL) |
- ((pteflags & H_PTE_PKEY_BIT3) ? HPTE_R_KEY_BIT3 : 0x0UL) |
- ((pteflags & H_PTE_PKEY_BIT2) ? HPTE_R_KEY_BIT2 : 0x0UL) |
- ((pteflags & H_PTE_PKEY_BIT1) ? HPTE_R_KEY_BIT1 : 0x0UL) |
- ((pteflags & H_PTE_PKEY_BIT0) ? HPTE_R_KEY_BIT0 : 0x0UL));
-
- if (mmu_has_feature(MMU_FTR_KUAP) || mmu_has_feature(MMU_FTR_KUEP)) {
- if ((pte_pkey == 0) && (flags & HPTE_USE_KERNEL_KEY))
- return HASH_DEFAULT_KERNEL_KEY;
- }
-
- return pte_pkey;
-}
-
static inline u16 hash__pte_to_pkey_bits(u64 pteflags)
{
return (((pteflags & H_PTE_PKEY_BIT4) ? 0x10 : 0x0UL) |
diff --git a/arch/powerpc/include/asm/book3s/64/pkeys.h b/arch/powerpc/include/asm/book3s/64/pkeys.h
index 3b8640498f5b..a2b6c4a7275f 100644
--- a/arch/powerpc/include/asm/book3s/64/pkeys.h
+++ b/arch/powerpc/include/asm/book3s/64/pkeys.h
@@ -8,6 +8,7 @@
extern u64 __ro_after_init default_uamor;
extern u64 __ro_after_init default_amr;
extern u64 __ro_after_init default_iamr;
+extern int kup_key;
static inline u64 vmflag_to_pte_pkey_bits(u64 vm_flags)
{
diff --git a/arch/powerpc/mm/book3s64/pkeys.c b/arch/powerpc/mm/book3s64/pkeys.c
index f029e7bf5ca2..204e4598b45c 100644
--- a/arch/powerpc/mm/book3s64/pkeys.c
+++ b/arch/powerpc/mm/book3s64/pkeys.c
@@ -37,7 +37,10 @@ u64 default_uamor __ro_after_init;
*/
static int execute_only_key = 2;
static bool pkey_execute_disable_supported;
-
+/*
+ * key used to implement KUAP/KUEP with hash translation.
+ */
+int kup_key = 3;
#define AMR_BITS_PER_PKEY 2
#define AMR_RD_BIT 0x1UL
@@ -185,6 +188,25 @@ void __init pkey_early_init_devtree(void)
default_uamor &= ~(0x3ul << pkeyshift(execute_only_key));
}
+ if (unlikely(num_pkey <= kup_key)) {
+ /*
+ * Insufficient number of keys to support
+ * KUAP/KUEP feature.
+ */
+ kup_key = -1;
+ } else {
+ /* handle key which is used by kernel for KUAP */
+ reserved_allocation_mask |= (0x1 << kup_key);
+ /*
+ * Mark access for kup_key in default amr so that
+ * we continue to operate with that AMR in
+ * copy_to/from_user().
+ */
+ default_amr &= ~(0x3ul << pkeyshift(kup_key));
+ default_iamr &= ~(0x1ul << pkeyshift(kup_key));
+ default_uamor &= ~(0x3ul << pkeyshift(kup_key));
+ }
+
/*
* Allow access for only key 0. And prevent any other modification.
*/
@@ -205,9 +227,6 @@ void __init pkey_early_init_devtree(void)
reserved_allocation_mask |= (0x1 << 1);
default_uamor &= ~(0x3ul << pkeyshift(1));
- /* handle key 3 which is used by kernel for KAUP */
- reserved_allocation_mask |= (0x1 << 3);
- default_uamor &= ~(0x3ul << pkeyshift(3));
/*
* Prevent the usage of OS reserved keys. Update UAMOR
@@ -236,7 +255,7 @@ void __init pkey_early_init_devtree(void)
#ifdef CONFIG_PPC_KUEP
void __init setup_kuep(bool disabled)
{
- if (disabled)
+ if (disabled || kup_key == -1)
return;
/*
* On hash if PKEY feature is not enabled, disable KUAP too.
@@ -262,7 +281,7 @@ void __init setup_kuep(bool disabled)
#ifdef CONFIG_PPC_KUAP
void __init setup_kuap(bool disabled)
{
- if (disabled)
+ if (disabled || kup_key == -1)
return;
/*
* On hash if PKEY feature is not enabled, disable KUAP too.
@@ -458,4 +477,26 @@ void arch_dup_pkeys(struct mm_struct *oldmm, struct mm_struct *mm)
mm->context.execute_only_pkey = oldmm->context.execute_only_pkey;
}
+u64 pte_to_hpte_pkey_bits(u64 pteflags, unsigned long flags)
+{
+ unsigned long pte_pkey;
+
+ pte_pkey = (((pteflags & H_PTE_PKEY_BIT4) ? HPTE_R_KEY_BIT4 : 0x0UL) |
+ ((pteflags & H_PTE_PKEY_BIT3) ? HPTE_R_KEY_BIT3 : 0x0UL) |
+ ((pteflags & H_PTE_PKEY_BIT2) ? HPTE_R_KEY_BIT2 : 0x0UL) |
+ ((pteflags & H_PTE_PKEY_BIT1) ? HPTE_R_KEY_BIT1 : 0x0UL) |
+ ((pteflags & H_PTE_PKEY_BIT0) ? HPTE_R_KEY_BIT0 : 0x0UL));
+
+ if (mmu_has_feature(MMU_FTR_KUAP) || mmu_has_feature(MMU_FTR_KUEP)) {
+ if ((pte_pkey == 0) &&
+ (flags & HPTE_USE_KERNEL_KEY) && (kup_key != -1)) {
+ u64 vm_flag = pkey_to_vmflag_bits(kup_key);
+ u64 pte_flag = hash__vmflag_to_pte_pkey_bits(vm_flag);
+ return pte_to_hpte_pkey_bits(pte_flag, 0);
+ }
+ }
+
+ return pte_pkey;
+}
+
#endif /* CONFIG_PPC_MEM_KEYS */
--
2.28.0
^ permalink raw reply related
* [PATCH v6 20/22] powerpc/book3s64/hash/kuep: Enable KUEP on hash
From: Aneesh Kumar K.V @ 2020-11-25 5:16 UTC (permalink / raw)
To: linuxppc-dev, mpe; +Cc: Aneesh Kumar K.V, Sandipan Das
In-Reply-To: <20201125051634.509286-1-aneesh.kumar@linux.ibm.com>
Reviewed-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
arch/powerpc/mm/book3s64/pkeys.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/mm/book3s64/pkeys.c b/arch/powerpc/mm/book3s64/pkeys.c
index 84f8664ffc47..f029e7bf5ca2 100644
--- a/arch/powerpc/mm/book3s64/pkeys.c
+++ b/arch/powerpc/mm/book3s64/pkeys.c
@@ -236,7 +236,12 @@ void __init pkey_early_init_devtree(void)
#ifdef CONFIG_PPC_KUEP
void __init setup_kuep(bool disabled)
{
- if (disabled || !early_radix_enabled())
+ if (disabled)
+ return;
+ /*
+ * On hash if PKEY feature is not enabled, disable KUAP too.
+ */
+ if (!early_radix_enabled() && !early_mmu_has_feature(MMU_FTR_PKEY))
return;
if (smp_processor_id() == boot_cpuid) {
--
2.28.0
^ permalink raw reply related