* [PATCH] net: ethernet: fs-enet: remove casting dma_alloc_coherent
From: Xu Wang @ 2020-12-11 8:52 UTC (permalink / raw)
To: pantelis.antoniou, davem, kuba, linuxppc-dev; +Cc: linux-kernel
Remove casting the values returned by dma_alloc_coherent.
Signed-off-by: Xu Wang <vulab@iscas.ac.cn>
---
drivers/net/ethernet/freescale/fs_enet/mac-fec.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/freescale/fs_enet/mac-fec.c b/drivers/net/ethernet/freescale/fs_enet/mac-fec.c
index 99fe2c210d0f..3ae345676e50 100644
--- a/drivers/net/ethernet/freescale/fs_enet/mac-fec.c
+++ b/drivers/net/ethernet/freescale/fs_enet/mac-fec.c
@@ -131,7 +131,7 @@ static int allocate_bd(struct net_device *dev)
struct fs_enet_private *fep = netdev_priv(dev);
const struct fs_platform_info *fpi = fep->fpi;
- fep->ring_base = (void __force __iomem *)dma_alloc_coherent(fep->dev,
+ fep->ring_base = dma_alloc_coherent(fep->dev,
(fpi->tx_ring + fpi->rx_ring) *
sizeof(cbd_t), &fep->ring_mem_addr,
GFP_KERNEL);
--
2.17.1
^ permalink raw reply related
* [powerpc/merge] System crash during cpu offline/online operation
From: Sachin Sant @ 2020-12-11 10:17 UTC (permalink / raw)
To: linuxppc-dev
I am observing system crash during a cpu offline/online operation
with latest merge branch code running in a PowerVM LPAR (P8 onwards)
# uname -r
5.10.0-rc7-01792-g244569c777ca
# ppc64_cpu --smt=1
[ 244.205194] cpu 1 (hwid 1) Ready to die…
………
……...
[ 247.015113] cpu 30 (hwid 30) Ready to die...
[ 247.104973] cpu 31 (hwid 31) Ready to die…
# ppc64_cpu --smt=8
At this point the LPAR reboots instantly without any trace message.
IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM
………..
………..
IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM
1 = SMS Menu 5 = Default Boot List
8 = Open Firmware Prompt 6 = Stored Boot List
9 = Restricted Open Firmware Prompt
…….
…….
merge commit 9acd775e45 (based on rc6) was good. Mainline 5.10.0-rc7 also does
not exhibit this problem.
Thanks
-Sachin
^ permalink raw reply
* Re: [PATCH v1 1/2] KVM: PPC: Book3S HV: Add support for H_RPT_INVALIDATE (nested case only)
From: Bharata B Rao @ 2020-12-11 10:33 UTC (permalink / raw)
To: kvm-ppc, linuxppc-dev; +Cc: aneesh.kumar, david, npiggin
In-Reply-To: <20201019112642.53016-2-bharata@linux.ibm.com>
On Mon, Oct 19, 2020 at 04:56:41PM +0530, Bharata B Rao wrote:
> Implements H_RPT_INVALIDATE hcall and supports only nested case
> currently.
>
> A KVM capability KVM_CAP_RPT_INVALIDATE is added to indicate the
> support for this hcall.
As Paul mentioned in the thread, this hcall does both process scoped
invalidations and partition scoped invalidations for L2 guest.
I am adding KVM_CAP_RPT_INVALIDATE capability with only partition
scoped invalidations (nested case) implemented in the hcall as we
don't see the need for KVM to implement process scoped invalidation
function as KVM may never run with LPCR[GTSE]=0.
I am wondering if enabling the capability with only partial
implementation of the hcall is the correct thing to do. In future
if we ever want process scoped invalidations support in this hcall,
we may not be able to differentiate the availability of two functions
cleanly from QEMU.
So does it make sense to implement the process scoped invalidation
function also now itself even if it is not going to be used in
KVM?
Regards,
Bharata.
^ permalink raw reply
* [PATCH] ASoC: fsl: imx-hdmi: Fix an uninitialized return in probe
From: Dan Carpenter @ 2020-12-11 10:13 UTC (permalink / raw)
To: Timur Tabi, Shengjiu Wang
Cc: alsa-devel, linuxppc-dev, Xiubo Li, Shawn Guo, Sascha Hauer,
Takashi Iwai, kernel-janitors, Liam Girdwood, Jaroslav Kysela,
Nicolin Chen, Mark Brown, NXP Linux Team, Pengutronix Kernel Team,
Fabio Estevam
The "ret" variable should be set to a negative error code but
currently it returns uninitialized stack data.
Fixes: 6a5f850aa83a ("ASoC: fsl: Add imx-hdmi machine driver")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
---
sound/soc/fsl/imx-hdmi.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/sound/soc/fsl/imx-hdmi.c b/sound/soc/fsl/imx-hdmi.c
index 2c2a76a71940..ede4a9ad1054 100644
--- a/sound/soc/fsl/imx-hdmi.c
+++ b/sound/soc/fsl/imx-hdmi.c
@@ -164,6 +164,7 @@ static int imx_hdmi_probe(struct platform_device *pdev)
if ((hdmi_out && hdmi_in) || (!hdmi_out && !hdmi_in)) {
dev_err(&pdev->dev, "Invalid HDMI DAI link\n");
+ ret = -EINVAL;
goto fail;
}
--
2.29.2
^ permalink raw reply related
* [RFC HACK PATCH] PCI: dwc: layerscape: Hack around enumeration problems with Honeycomb LX2K
From: Daniel Thompson @ 2020-12-11 12:15 UTC (permalink / raw)
To: Minghuan Lian, Mingkai Hu, Roy Zang
Cc: Rob Herring, Daniel Thompson, Lorenzo Pieralisi, patches,
linux-pci, Jon Nettleton, linux-kernel, Bjorn Helgaas,
linuxppc-dev, linux-arm-kernel
I have been chasing down a problem enumerating an NVMe drive on a
Honeycomb LX2K (NXP LX2160A). Specifically the drive can only enumerate
successfully if the we are emitting lots of console messages via a UART.
If the system is booted with `quiet` set then enumeration fails.
I guessed this would be due to the timing impact of printk-to-UART and
tried to find out where a delay could be added to provoke a successful
enumeration.
This patch contains the results. The delay length (1ms) was selected by
binary searching downwards until the delay was not effective for these
devices (Honeycomb LX2K and a Western Digital WD Blue SN550).
I have also included the workaround twice (conditionally compiled). The
first change is the *latest* possible code path that we can deploy a
delay whilst the second is the earliest place I could find.
The summary is that the critical window were we are currently relying on
a call to the console UART code can "mend" the driver runs from calling
dw_pcie_setup_rc() in host init to just before we read the state in the
link up callback.
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
---
Notes:
This patch is RFC (and HACK) because I don't have much clue *why* this
patch works... merely that this is the smallest possible change I need
to replicate the current accidental printk() workaround. Perhaps one
could argue that RFC here stands for request-for-clue. All my
observations and changes here are empirical and I don't know how best to
turn them into something that is not a hack!
BTW I noticed many other pcie-designware drivers take advantage
of a function called dw_pcie_wait_for_link() in their init paths...
but my naive attempts to add it to the layerscape driver results
in non-booting systems so I haven't embarrassed myself by including
that in the patch!
drivers/pci/controller/dwc/pci-layerscape.c | 35 +++++++++++++++++++++
1 file changed, 35 insertions(+)
diff --git a/drivers/pci/controller/dwc/pci-layerscape.c b/drivers/pci/controller/dwc/pci-layerscape.c
index f24f79a70d9a..c354904b90ef 100644
--- a/drivers/pci/controller/dwc/pci-layerscape.c
+++ b/drivers/pci/controller/dwc/pci-layerscape.c
@@ -22,6 +22,8 @@
#include "pcie-designware.h"
+#define WORKAROUND_LATEST_POSSIBLE
+
/* PEX1/2 Misc Ports Status Register */
#define SCFG_PEXMSCPORTSR(pex_idx) (0x94 + (pex_idx) * 4)
#define LTSSM_STATE_SHIFT 20
@@ -113,10 +115,31 @@ static int ls_pcie_link_up(struct dw_pcie *pci)
struct ls_pcie *pcie = to_ls_pcie(pci);
u32 state;
+ /*
+ * Strictly speaking *this* (before the ioread32) is the latest
+ * point a simple delay can be effective. If we move the delay
+ * after the ioread32 then the NVMe does not enumerate.
+ *
+ * However this function appears to be frequently called so an
+ * unconditional delay here causes noticeable delay at boot
+ * time. Hence we implement the workaround by retrying the read
+ * after a short delay if we think we might need to return false.
+ */
+
state = (ioread32(pcie->lut + pcie->drvdata->lut_dbg) >>
pcie->drvdata->ltssm_shift) &
LTSSM_STATE_MASK;
+#ifdef WORKAROUND_LATEST_POSSIBLE
+ if (state < LTSSM_PCIE_L0) {
+ /* see comment above */
+ mdelay(1);
+ state = (ioread32(pcie->lut + pcie->drvdata->lut_dbg) >>
+ pcie->drvdata->ltssm_shift) &
+ LTSSM_STATE_MASK;
+ }
+#endif
+
if (state < LTSSM_PCIE_L0)
return 0;
@@ -152,6 +175,18 @@ static int ls_pcie_host_init(struct pcie_port *pp)
dw_pcie_setup_rc(pp);
+#ifdef WORKAROUND_EARLIEST_POSSIBLE
+ /*
+ * This is the earliest point the delay is effective.
+ * If we move it before dw_pcie_setup_rc() then the
+ * NVMe does not enumerate.
+ *
+ * 500us is too short to reliably work around the issue
+ * hence adopting 1000us here.
+ */
+ mdelay(1);
+#endif
+
return 0;
}
base-commit: 0477e92881850d44910a7e94fc2c46f96faa131f
--
2.29.2
^ permalink raw reply related
* Re: [RFC HACK PATCH] PCI: dwc: layerscape: Hack around enumeration problems with Honeycomb LX2K
From: Rob Herring @ 2020-12-11 14:37 UTC (permalink / raw)
To: Daniel Thompson
Cc: Roy Zang, Lorenzo Pieralisi, Linaro Patches, PCI, Jon Nettleton,
linux-kernel@vger.kernel.org, Minghuan Lian, linux-arm-kernel,
Bjorn Helgaas, linuxppc-dev, Mingkai Hu
In-Reply-To: <20201211121507.28166-1-daniel.thompson@linaro.org>
On Fri, Dec 11, 2020 at 6:15 AM Daniel Thompson
<daniel.thompson@linaro.org> wrote:
>
> I have been chasing down a problem enumerating an NVMe drive on a
> Honeycomb LX2K (NXP LX2160A). Specifically the drive can only enumerate
> successfully if the we are emitting lots of console messages via a UART.
> If the system is booted with `quiet` set then enumeration fails.
We really need to get away from work-arounds for device X on host Y. I
suspect they are not limited to that combination only...
How exactly does it fail to enumerate? There's a (racy) linkup check
on config accesses, is it reporting link down and skipping config
accesses?
> I guessed this would be due to the timing impact of printk-to-UART and
> tried to find out where a delay could be added to provoke a successful
> enumeration.
>
> This patch contains the results. The delay length (1ms) was selected by
> binary searching downwards until the delay was not effective for these
> devices (Honeycomb LX2K and a Western Digital WD Blue SN550).
>
> I have also included the workaround twice (conditionally compiled). The
> first change is the *latest* possible code path that we can deploy a
> delay whilst the second is the earliest place I could find.
>
> The summary is that the critical window were we are currently relying on
> a call to the console UART code can "mend" the driver runs from calling
> dw_pcie_setup_rc() in host init to just before we read the state in the
> link up callback.
>
> Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
> ---
>
> Notes:
> This patch is RFC (and HACK) because I don't have much clue *why* this
> patch works... merely that this is the smallest possible change I need
> to replicate the current accidental printk() workaround. Perhaps one
> could argue that RFC here stands for request-for-clue. All my
> observations and changes here are empirical and I don't know how best to
> turn them into something that is not a hack!
>
> BTW I noticed many other pcie-designware drivers take advantage
> of a function called dw_pcie_wait_for_link() in their init paths...
> but my naive attempts to add it to the layerscape driver results
> in non-booting systems so I haven't embarrassed myself by including
> that in the patch!
You need to look at what's pending for v5.11, because I reworked this
to be more unified. The ordering of init is also possibly changed. The
sequence is now like this:
dw_pcie_setup_rc(pp);
dw_pcie_msi_init(pp);
if (!dw_pcie_link_up(pci) && pci->ops->start_link) {
ret = pci->ops->start_link(pci);
if (ret)
goto err_free_msi;
}
/* Ignore errors, the link may come up later */
dw_pcie_wait_for_link(pci);
> drivers/pci/controller/dwc/pci-layerscape.c | 35 +++++++++++++++++++++
> 1 file changed, 35 insertions(+)
>
> diff --git a/drivers/pci/controller/dwc/pci-layerscape.c b/drivers/pci/controller/dwc/pci-layerscape.c
> index f24f79a70d9a..c354904b90ef 100644
> --- a/drivers/pci/controller/dwc/pci-layerscape.c
> +++ b/drivers/pci/controller/dwc/pci-layerscape.c
> @@ -22,6 +22,8 @@
>
> #include "pcie-designware.h"
>
> +#define WORKAROUND_LATEST_POSSIBLE
> +
> /* PEX1/2 Misc Ports Status Register */
> #define SCFG_PEXMSCPORTSR(pex_idx) (0x94 + (pex_idx) * 4)
> #define LTSSM_STATE_SHIFT 20
> @@ -113,10 +115,31 @@ static int ls_pcie_link_up(struct dw_pcie *pci)
> struct ls_pcie *pcie = to_ls_pcie(pci);
> u32 state;
>
> + /*
> + * Strictly speaking *this* (before the ioread32) is the latest
> + * point a simple delay can be effective. If we move the delay
> + * after the ioread32 then the NVMe does not enumerate.
> + *
> + * However this function appears to be frequently called so an
> + * unconditional delay here causes noticeable delay at boot
> + * time. Hence we implement the workaround by retrying the read
> + * after a short delay if we think we might need to return false.
> + */
> +
> state = (ioread32(pcie->lut + pcie->drvdata->lut_dbg) >>
> pcie->drvdata->ltssm_shift) &
> LTSSM_STATE_MASK;
>
> +#ifdef WORKAROUND_LATEST_POSSIBLE
> + if (state < LTSSM_PCIE_L0) {
> + /* see comment above */
> + mdelay(1);
> + state = (ioread32(pcie->lut + pcie->drvdata->lut_dbg) >>
> + pcie->drvdata->ltssm_shift) &
> + LTSSM_STATE_MASK;
Side note, I really wonder about the variety of ways in DWC drivers we
have to check link state. The default is a debug register... Are the
standard PCIe registers for this really broken in these cases?
> + }
> +#endif
> +
> if (state < LTSSM_PCIE_L0)
> return 0;
>
> @@ -152,6 +175,18 @@ static int ls_pcie_host_init(struct pcie_port *pp)
>
> dw_pcie_setup_rc(pp);
Might be related to the PORT_LOGIC_SPEED_CHANGE we do in here.
> +#ifdef WORKAROUND_EARLIEST_POSSIBLE
> + /*
> + * This is the earliest point the delay is effective.
> + * If we move it before dw_pcie_setup_rc() then the
> + * NVMe does not enumerate.
> + *
> + * 500us is too short to reliably work around the issue
> + * hence adopting 1000us here.
> + */
> + mdelay(1);
> +#endif
> +
> return 0;
> }
>
>
> base-commit: 0477e92881850d44910a7e94fc2c46f96faa131f
> --
> 2.29.2
>
^ permalink raw reply
* [PATCH] powerpc/memhotplug: quieting some DLPAR operations
From: Laurent Dufour @ 2020-12-11 14:59 UTC (permalink / raw)
To: linuxppc-dev; +Cc: nathanl, cheloha, linux-kernel, paulus
When attempting to remove by index a set of LMB a lot of messages are
displayed on the console, even when everything goes fine:
pseries-hotplug-mem: Attempting to hot-remove LMB, drc index 8000002d
Offlined Pages 4096
pseries-hotplug-mem: Memory at 2d0000000 was hot-removed
The 2 messages prefixed by "pseries-hotplug-mem" are not really helpful for
the end user, they should be debug outputs.
In case of error, because some of the LMB's pages couldn't be offlined, the
following is displayed on the console:
pseries-hotplug-mem: Attempting to hot-remove LMB, drc index 8000003e
pseries-hotplug-mem: Failed to hot-remove memory at 3e0000000
dlpar: Could not handle DLPAR request "memory remove index 0x8000003e"
Again, the 2 messages prefixed by "pseries-hotplug-mem" are useless, and the
generic DLPAR prefixed message should be enough. Turning the 2 firts at
DEBUG level.
These 2 first changes are mainly triggered by the changes introduced in
drmgr:
https://groups.google.com/g/powerpc-utils-devel/c/Y6ef4NB3EzM/m/9cu5JHRxAQAJ
Also, when adding a bunch of LMBs, a message is displayed in the console per LMB
like these ones:
pseries-hotplug-mem: Memory at 7e0000000 (drc index 8000007e) was hot-added
pseries-hotplug-mem: Memory at 7f0000000 (drc index 8000007f) was hot-added
pseries-hotplug-mem: Memory at 800000000 (drc index 80000080) was hot-added
pseries-hotplug-mem: Memory at 810000000 (drc index 80000081) was hot-added
When adding 1TB of memory and LMB size is 256MB, this leads to 4096
messages to be displayed on the console. These messages are not really
helpful for the end user, so moving them to the DEBUG level.
Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
---
arch/powerpc/platforms/pseries/hotplug-memory.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
index 7efe6ec5d14a..8377f1f7c78e 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -479,7 +479,7 @@ static int dlpar_memory_remove_by_index(u32 drc_index)
int lmb_found;
int rc;
- pr_info("Attempting to hot-remove LMB, drc index %x\n", drc_index);
+ pr_debug("Attempting to hot-remove LMB, drc index %x\n", drc_index);
lmb_found = 0;
for_each_drmem_lmb(lmb) {
@@ -497,10 +497,10 @@ static int dlpar_memory_remove_by_index(u32 drc_index)
rc = -EINVAL;
if (rc)
- pr_info("Failed to hot-remove memory at %llx\n",
- lmb->base_addr);
+ pr_debug("Failed to hot-remove memory at %llx\n",
+ lmb->base_addr);
else
- pr_info("Memory at %llx was hot-removed\n", lmb->base_addr);
+ pr_debug("Memory at %llx was hot-removed\n", lmb->base_addr);
return rc;
}
@@ -717,8 +717,8 @@ static int dlpar_memory_add_by_count(u32 lmbs_to_add)
if (!drmem_lmb_reserved(lmb))
continue;
- pr_info("Memory at %llx (drc index %x) was hot-added\n",
- lmb->base_addr, lmb->drc_index);
+ pr_debug("Memory at %llx (drc index %x) was hot-added\n",
+ lmb->base_addr, lmb->drc_index);
drmem_remove_lmb_reservation(lmb);
}
rc = 0;
--
2.29.2
^ permalink raw reply related
* Re: [PATCH] net: ethernet: fs-enet: remove casting dma_alloc_coherent
From: Christophe Leroy @ 2020-12-11 15:22 UTC (permalink / raw)
To: Xu Wang, pantelis.antoniou, davem, kuba, linuxppc-dev; +Cc: linux-kernel
In-Reply-To: <20201211085212.85457-1-vulab@iscas.ac.cn>
Le 11/12/2020 à 09:52, Xu Wang a écrit :
> Remove casting the values returned by dma_alloc_coherent.
Can you explain more in the commit log ?
As far as I can see, dma_alloc_coherent() doesn't return __iomem, and ring_base member is __iomem
Christophe
>
> Signed-off-by: Xu Wang <vulab@iscas.ac.cn>
> ---
> drivers/net/ethernet/freescale/fs_enet/mac-fec.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/freescale/fs_enet/mac-fec.c b/drivers/net/ethernet/freescale/fs_enet/mac-fec.c
> index 99fe2c210d0f..3ae345676e50 100644
> --- a/drivers/net/ethernet/freescale/fs_enet/mac-fec.c
> +++ b/drivers/net/ethernet/freescale/fs_enet/mac-fec.c
> @@ -131,7 +131,7 @@ static int allocate_bd(struct net_device *dev)
> struct fs_enet_private *fep = netdev_priv(dev);
> const struct fs_platform_info *fpi = fep->fpi;
>
> - fep->ring_base = (void __force __iomem *)dma_alloc_coherent(fep->dev,
> + fep->ring_base = dma_alloc_coherent(fep->dev,
> (fpi->tx_ring + fpi->rx_ring) *
> sizeof(cbd_t), &fep->ring_mem_addr,
> GFP_KERNEL);
>
^ permalink raw reply
* [RFC PATCH v1] powerpc/net: Implement eBPF on PPC32
From: Christophe Leroy @ 2020-12-11 15:30 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
KP Singh, Naveen N. Rao, Sandipan Das
Cc: netdev, bpf, linuxppc-dev, linux-kernel
This first draft of eBPF for PPC32. This is still dirty WIP.
Don't pay too much attention on comments, most of them are copied
as is from PPC64 implementation.
Test result:
test_bpf: Summary: 378 PASSED, 0 FAILED, [332/366 JIT'ed]
Registers mapping:
[BPF_REG_0] = r21-r22
/* function arguments */
[BPF_REG_1] = r3-r4
[BPF_REG_2] = r5-r6
[BPF_REG_3] = r7-r8
[BPF_REG_4] = r9-r10
[BPF_REG_5] = r11-r12
/* non volatile registers */
[BPF_REG_6] = r23-r24
[BPF_REG_7] = r25-r26
[BPF_REG_8] = r27-r28
[BPF_REG_9] = r29-r30
/* frame pointer aka BPF_REG_10 */
[BPF_REG_FP] = r31
/* eBPF jit internal registers */
[BPF_REG_AX] = r19-r20
[TMP_REG] = r18
r0 is also used as temporary register as much as possible. It is
referenced directly in the code in order to avoid misuse of it, as
some instructions interpret it as value 0 instead of register r0.
The following operations are not (or only partially) supported
for the time being:
case BPF_ALU64 | BPF_DIV | BPF_X: /* dst /= src */
case BPF_ALU64 | BPF_MOD | BPF_X: /* dst %= src */
case BPF_ALU64 | BPF_MOD | BPF_K: /* dst %= imm */
case BPF_ALU64 | BPF_DIV | BPF_K: /* dst /= imm */
case BPF_ALU64 | BPF_LSH | BPF_X: /* dst <<= src; */
case BPF_ALU64 | BPF_LSH | BPF_K: /* dst <<== imm */
case BPF_ALU64 | BPF_RSH | BPF_X: /* dst >>= src */
case BPF_ALU64 | BPF_RSH | BPF_K: /* dst >>= imm */
case BPF_ALU64 | BPF_ARSH | BPF_X: /* (s64) dst >>= src */
case BPF_ALU64 | BPF_ARSH | BPF_K: /* (s64) dst >>= imm */
case BPF_STX | BPF_XADD | BPF_DW: /* *(u64 *)(dst + off) += src */
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
arch/powerpc/Kconfig | 3 +-
arch/powerpc/include/asm/ppc-opcode.h | 11 +
arch/powerpc/net/Makefile | 6 +-
arch/powerpc/net/bpf_jit32.h | 173 ++--
arch/powerpc/net/bpf_jit_asm.S | 226 -----
arch/powerpc/net/bpf_jit_comp.c | 683 --------------
arch/powerpc/net/bpf_jit_comp32.c | 1177 +++++++++++++++++++++++++
7 files changed, 1249 insertions(+), 1030 deletions(-)
delete mode 100644 arch/powerpc/net/bpf_jit_asm.S
delete mode 100644 arch/powerpc/net/bpf_jit_comp.c
create mode 100644 arch/powerpc/net/bpf_jit_comp32.c
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 9e679ba0811c..dd6ccd550230 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -190,7 +190,6 @@ config PPC
select HAVE_ARCH_TRACEHOOK
select HAVE_ASM_MODVERSIONS
select HAVE_C_RECORDMCOUNT
- select HAVE_CBPF_JIT if !PPC64
select HAVE_STACKPROTECTOR if PPC64 && $(cc-option,-mstack-protector-guard=tls -mstack-protector-guard-reg=r13)
select HAVE_STACKPROTECTOR if PPC32 && $(cc-option,-mstack-protector-guard=tls -mstack-protector-guard-reg=r2)
select HAVE_CONTEXT_TRACKING if PPC64
@@ -199,7 +198,7 @@ config PPC
select HAVE_DEBUG_STACKOVERFLOW
select HAVE_DYNAMIC_FTRACE
select HAVE_DYNAMIC_FTRACE_WITH_REGS if MPROFILE_KERNEL
- select HAVE_EBPF_JIT if PPC64
+ select HAVE_EBPF_JIT
select HAVE_EFFICIENT_UNALIGNED_ACCESS if !(CPU_LITTLE_ENDIAN && POWER7_CPU)
select HAVE_FAST_GUP
select HAVE_FTRACE_MCOUNT_RECORD
diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h
index a6e3700c4566..0cefa1da9a1f 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -417,6 +417,7 @@
#define PPC_RAW_LD(r, base, i) (PPC_INST_LD | ___PPC_RT(r) | ___PPC_RA(base) | IMM_DS(i))
#define PPC_RAW_LWZ(r, base, i) (0x80000000 | ___PPC_RT(r) | ___PPC_RA(base) | IMM_L(i))
#define PPC_RAW_LWZX(t, a, b) (0x7c00002e | ___PPC_RT(t) | ___PPC_RA(a) | ___PPC_RB(b))
+#define PPC_RAW_LMW(r, base, i) (0xb8000000 | ___PPC_RT(r) | ___PPC_RA(base) | IMM_L(i))
#define PPC_RAW_STD(r, base, i) (PPC_INST_STD | ___PPC_RS(r) | ___PPC_RA(base) | IMM_DS(i))
#define PPC_RAW_STDCX(s, a, b) (0x7c0001ad | ___PPC_RS(s) | ___PPC_RA(a) | ___PPC_RB(b))
#define PPC_RAW_LFSX(t, a, b) (0x7c00042e | ___PPC_RT(t) | ___PPC_RA(a) | ___PPC_RB(b))
@@ -425,6 +426,9 @@
#define PPC_RAW_STFDX(s, a, b) (0x7c0005ae | ___PPC_RS(s) | ___PPC_RA(a) | ___PPC_RB(b))
#define PPC_RAW_LVX(t, a, b) (0x7c0000ce | ___PPC_RT(t) | ___PPC_RA(a) | ___PPC_RB(b))
#define PPC_RAW_STVX(s, a, b) (0x7c0001ce | ___PPC_RS(s) | ___PPC_RA(a) | ___PPC_RB(b))
+#define PPC_RAW_ADDE(t, a, b) (0x7c000114 | ___PPC_RT(t) | ___PPC_RA(a) | ___PPC_RB(b))
+#define PPC_RAW_ADDZE(t, a) (0x7c000194 | ___PPC_RT(t) | ___PPC_RA(a))
+#define PPC_RAW_ADDME(t, a) (0x7c0001d4 | ___PPC_RT(t) | ___PPC_RA(a))
#define PPC_RAW_ADD(t, a, b) (PPC_INST_ADD | ___PPC_RT(t) | ___PPC_RA(a) | ___PPC_RB(b))
#define PPC_RAW_ADD_DOT(t, a, b) (PPC_INST_ADD | ___PPC_RT(t) | ___PPC_RA(a) | ___PPC_RB(b) | 0x1)
#define PPC_RAW_ADDC(t, a, b) (0x7c000014 | ___PPC_RT(t) | ___PPC_RA(a) | ___PPC_RB(b))
@@ -443,6 +447,7 @@
#define PPC_RAW_STDU(r, base, i) (0xf8000001 | ___PPC_RS(r) | ___PPC_RA(base) | ((i) & 0xfffc))
#define PPC_RAW_STW(r, base, i) (0x90000000 | ___PPC_RS(r) | ___PPC_RA(base) | IMM_L(i))
#define PPC_RAW_STWU(r, base, i) (0x94000000 | ___PPC_RS(r) | ___PPC_RA(base) | IMM_L(i))
+#define PPC_RAW_STMW(r, base, i) (0xbc000000 | ___PPC_RS(r) | ___PPC_RA(base) | IMM_L(i))
#define PPC_RAW_STH(r, base, i) (0xb0000000 | ___PPC_RS(r) | ___PPC_RA(base) | IMM_L(i))
#define PPC_RAW_STB(r, base, i) (0x98000000 | ___PPC_RS(r) | ___PPC_RA(base) | IMM_L(i))
#define PPC_RAW_LBZ(r, base, i) (0x88000000 | ___PPC_RT(r) | ___PPC_RA(base) | IMM_L(i))
@@ -460,6 +465,10 @@
#define PPC_RAW_CMPLW(a, b) (0x7c000040 | ___PPC_RA(a) | ___PPC_RB(b))
#define PPC_RAW_CMPLD(a, b) (0x7c200040 | ___PPC_RA(a) | ___PPC_RB(b))
#define PPC_RAW_SUB(d, a, b) (0x7c000050 | ___PPC_RT(d) | ___PPC_RB(a) | ___PPC_RA(b))
+#define PPC_RAW_SUBFC(d, a, b) (0x7c000010 | ___PPC_RT(d) | ___PPC_RA(a) | ___PPC_RB(b))
+#define PPC_RAW_SUBFE(d, a, b) (0x7c000110 | ___PPC_RT(d) | ___PPC_RA(a) | ___PPC_RB(b))
+#define PPC_RAW_SUBFIC(d, a, i) (0x20000000 | ___PPC_RT(d) | ___PPC_RA(a) | IMM_L(i))
+#define PPC_RAW_SUBFZE(d, a) (0x7c000190 | ___PPC_RT(d) | ___PPC_RA(a))
#define PPC_RAW_MULD(d, a, b) (0x7c0001d2 | ___PPC_RT(d) | ___PPC_RA(a) | ___PPC_RB(b))
#define PPC_RAW_MULW(d, a, b) (0x7c0001d6 | ___PPC_RT(d) | ___PPC_RA(a) | ___PPC_RB(b))
#define PPC_RAW_MULHWU(d, a, b) (0x7c000016 | ___PPC_RT(d) | ___PPC_RA(a) | ___PPC_RB(b))
@@ -472,11 +481,13 @@
#define PPC_RAW_DIVDEU_DOT(t, a, b) (0x7c000312 | ___PPC_RT(t) | ___PPC_RA(a) | ___PPC_RB(b) | 0x1)
#define PPC_RAW_AND(d, a, b) (0x7c000038 | ___PPC_RA(d) | ___PPC_RS(a) | ___PPC_RB(b))
#define PPC_RAW_ANDI(d, a, i) (0x70000000 | ___PPC_RA(d) | ___PPC_RS(a) | IMM_L(i))
+#define PPC_RAW_ANDIS(d, a, i) (0x74000000 | ___PPC_RA(d) | ___PPC_RS(a) | IMM_L(i))
#define PPC_RAW_AND_DOT(d, a, b) (0x7c000039 | ___PPC_RA(d) | ___PPC_RS(a) | ___PPC_RB(b))
#define PPC_RAW_OR(d, a, b) (0x7c000378 | ___PPC_RA(d) | ___PPC_RS(a) | ___PPC_RB(b))
#define PPC_RAW_MR(d, a) PPC_RAW_OR(d, a, a)
#define PPC_RAW_ORI(d, a, i) (PPC_INST_ORI | ___PPC_RA(d) | ___PPC_RS(a) | IMM_L(i))
#define PPC_RAW_ORIS(d, a, i) (PPC_INST_ORIS | ___PPC_RA(d) | ___PPC_RS(a) | IMM_L(i))
+#define PPC_RAW_NOR(d, a, b) (0x7c0000f8 | ___PPC_RA(d) | ___PPC_RS(a) | ___PPC_RB(b))
#define PPC_RAW_XOR(d, a, b) (0x7c000278 | ___PPC_RA(d) | ___PPC_RS(a) | ___PPC_RB(b))
#define PPC_RAW_XORI(d, a, i) (0x68000000 | ___PPC_RA(d) | ___PPC_RS(a) | IMM_L(i))
#define PPC_RAW_XORIS(d, a, i) (0x6c000000 | ___PPC_RA(d) | ___PPC_RS(a) | IMM_L(i))
diff --git a/arch/powerpc/net/Makefile b/arch/powerpc/net/Makefile
index c2dec3a68d4c..bfc17c54e39a 100644
--- a/arch/powerpc/net/Makefile
+++ b/arch/powerpc/net/Makefile
@@ -2,8 +2,4 @@
#
# Arch-specific network modules
#
-ifdef CONFIG_PPC64
-obj-$(CONFIG_BPF_JIT) += bpf_jit_comp64.o
-else
-obj-$(CONFIG_BPF_JIT) += bpf_jit_asm.o bpf_jit_comp.o
-endif
+obj-$(CONFIG_BPF_JIT) += bpf_jit_comp$(BITS).o
diff --git a/arch/powerpc/net/bpf_jit32.h b/arch/powerpc/net/bpf_jit32.h
index 448dfd4d98e1..f6f0aa87eedf 100644
--- a/arch/powerpc/net/bpf_jit32.h
+++ b/arch/powerpc/net/bpf_jit32.h
@@ -1,139 +1,84 @@
/* SPDX-License-Identifier: GPL-2.0-only */
/*
- * bpf_jit32.h: BPF JIT compiler for PPC
+ * bpf_jit32.h: BPF JIT compiler for PPC32
*
- * Copyright 2011 Matt Evans <matt@ozlabs.org>, IBM Corporation
- *
- * Split from bpf_jit.h
+ * Copyright 2016 Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
+ * IBM Corporation
*/
#ifndef _BPF_JIT32_H
#define _BPF_JIT32_H
-#include <asm/asm-compat.h>
#include "bpf_jit.h"
-#ifdef CONFIG_PPC64
-#define BPF_PPC_STACK_R3_OFF 48
-#define BPF_PPC_STACK_LOCALS 32
-#define BPF_PPC_STACK_BASIC (48+64)
-#define BPF_PPC_STACK_SAVE (18*8)
-#define BPF_PPC_STACKFRAME (BPF_PPC_STACK_BASIC+BPF_PPC_STACK_LOCALS+ \
- BPF_PPC_STACK_SAVE)
-#define BPF_PPC_SLOWPATH_FRAME (48+64)
-#else
-#define BPF_PPC_STACK_R3_OFF 24
-#define BPF_PPC_STACK_LOCALS 16
-#define BPF_PPC_STACK_BASIC (24+32)
-#define BPF_PPC_STACK_SAVE (18*4)
-#define BPF_PPC_STACKFRAME (BPF_PPC_STACK_BASIC+BPF_PPC_STACK_LOCALS+ \
- BPF_PPC_STACK_SAVE)
-#define BPF_PPC_SLOWPATH_FRAME (24+32)
-#endif
-
-#define REG_SZ (BITS_PER_LONG/8)
-
/*
- * Generated code register usage:
- *
- * As normal PPC C ABI (e.g. r1=sp, r2=TOC), with:
+ * Stack layout:
*
- * skb r3 (Entry parameter)
- * A register r4
- * X register r5
- * addr param r6
- * r7-r10 scratch
- * skb->data r14
- * skb headlen r15 (skb->len - skb->data_len)
- * m[0] r16
- * m[...] ...
- * m[15] r31
+ * [ prev sp ] <-------------
+ * [ nv gpr save area ] 6*8 |
+ * [ tail_call_cnt ] 8 |
+ * [ local_tmp_var ] 8 |
+ * fp (r31) --> [ ebpf stack space ] upto 512 |
+ * [ frame header ] 32/112 |
+ * sp (r1) ---> [ stack pointer ] --------------
*/
-#define r_skb 3
-#define r_ret 3
-#define r_A 4
-#define r_X 5
-#define r_addr 6
-#define r_scratch1 7
-#define r_scratch2 8
-#define r_D 14
-#define r_HL 15
-#define r_M 16
-
-#ifndef __ASSEMBLY__
-
-/*
- * Assembly helpers from arch/powerpc/net/bpf_jit.S:
- */
-#define DECLARE_LOAD_FUNC(func) \
- extern u8 func[], func##_negative_offset[], func##_positive_offset[]
-
-DECLARE_LOAD_FUNC(sk_load_word);
-DECLARE_LOAD_FUNC(sk_load_half);
-DECLARE_LOAD_FUNC(sk_load_byte);
-DECLARE_LOAD_FUNC(sk_load_byte_msh);
-#define PPC_LBZ_OFFS(r, base, i) do { if ((i) < 32768) EMIT(PPC_RAW_LBZ(r, base, i)); \
- else { EMIT(PPC_RAW_ADDIS(r, base, IMM_HA(i))); \
- EMIT(PPC_RAW_LBZ(r, r, IMM_L(i))); } } while(0)
-
-#define PPC_LD_OFFS(r, base, i) do { if ((i) < 32768) EMIT(PPC_RAW_LD(r, base, i)); \
- else { EMIT(PPC_RAW_ADDIS(r, base, IMM_HA(i))); \
- EMIT(PPC_RAW_LD(r, r, IMM_L(i))); } } while(0)
-
-#define PPC_LWZ_OFFS(r, base, i) do { if ((i) < 32768) EMIT(PPC_RAW_LWZ(r, base, i)); \
- else { EMIT(PPC_RAW_ADDIS(r, base, IMM_HA(i))); \
- EMIT(PPC_RAW_LWZ(r, r, IMM_L(i))); } } while(0)
-
-#define PPC_LHZ_OFFS(r, base, i) do { if ((i) < 32768) EMIT(PPC_RAW_LHZ(r, base, i)); \
- else { EMIT(PPC_RAW_ADDIS(r, base, IMM_HA(i))); \
- EMIT(PPC_RAW_LHZ(r, r, IMM_L(i))); } } while(0)
-
-#ifdef CONFIG_PPC64
-#define PPC_LL_OFFS(r, base, i) do { PPC_LD_OFFS(r, base, i); } while(0)
-#else
-#define PPC_LL_OFFS(r, base, i) do { PPC_LWZ_OFFS(r, base, i); } while(0)
-#endif
+/* for gpr non volatile registers BPG_REG_6 to 10 */
+#define BPF_PPC_STACK_SAVE ((17+3)*4)
+/* for bpf JIT code internal usage */
+#define BPF_PPC_STACK_LOCALS 16
+/* stack frame excluding BPF stack, ensure this is quadword aligned */
+#define BPF_PPC_STACKFRAME (STACK_FRAME_MIN_SIZE + \
+ BPF_PPC_STACK_LOCALS + BPF_PPC_STACK_SAVE)
-#ifdef CONFIG_SMP
-#ifdef CONFIG_PPC64
-#define PPC_BPF_LOAD_CPU(r) \
- do { BUILD_BUG_ON(sizeof_field(struct paca_struct, paca_index) != 2); \
- PPC_LHZ_OFFS(r, 13, offsetof(struct paca_struct, paca_index)); \
- } while (0)
-#else
-#define PPC_BPF_LOAD_CPU(r) \
- do { BUILD_BUG_ON(sizeof_field(struct task_struct, cpu) != 4); \
- PPC_LHZ_OFFS(r, 2, offsetof(struct task_struct, cpu)); \
- } while(0)
-#endif
-#else
-#define PPC_BPF_LOAD_CPU(r) do { EMIT(PPC_RAW_LI(r, 0)); } while(0)
-#endif
+#ifndef __ASSEMBLY__
-#define PPC_LHBRX_OFFS(r, base, i) \
- do { PPC_LI32(r, i); EMIT(PPC_RAW_LHBRX(r, r, base)); } while(0)
-#ifdef __LITTLE_ENDIAN__
-#define PPC_NTOHS_OFFS(r, base, i) PPC_LHBRX_OFFS(r, base, i)
-#else
-#define PPC_NTOHS_OFFS(r, base, i) PPC_LHZ_OFFS(r, base, i)
-#endif
+/* BPF register usage */
+#define TMP_REG (MAX_BPF_JIT_REG + 0)
+
+/* BPF to ppc register mappings */
+static const int b2p[] = {
+ /* function return value */
+ [BPF_REG_0] = 22,
+ /* function arguments */
+ [BPF_REG_1] = 4,
+ [BPF_REG_2] = 6,
+ [BPF_REG_3] = 8,
+ [BPF_REG_4] = 10,
+ [BPF_REG_5] = 12,
+ /* non volatile registers */
+ [BPF_REG_6] = 24,
+ [BPF_REG_7] = 26,
+ [BPF_REG_8] = 28,
+ [BPF_REG_9] = 30,
+ /* frame pointer aka BPF_REG_10 */
+ [BPF_REG_FP] = 31,
+ /* eBPF jit internal registers */
+ [BPF_REG_AX] = 20,
+ [TMP_REG] = 18,
+};
-#define PPC_BPF_LL(r, base, i) do { EMIT(PPC_RAW_LWZ(r, base, i)); } while(0)
-#define PPC_BPF_STL(r, base, i) do { EMIT(PPC_RAW_STW(r, base, i)); } while(0)
-#define PPC_BPF_STLU(r, base, i) do { EMIT(PPC_RAW_STWU(r, base, i)); } while(0)
+/* PPC NVR range -- update this if we ever use NVRs below r27 */
+#define BPF_PPC_NVR_MIN 18
-#define SEEN_DATAREF 0x10000 /* might call external helpers */
-#define SEEN_XREG 0x20000 /* X reg is used */
-#define SEEN_MEM 0x40000 /* SEEN_MEM+(1<<n) = use mem[n] for temporary
- * storage */
-#define SEEN_MEM_MSK 0x0ffff
+#define SEEN_FUNC 0x20000000 /* might call external helpers */
+#define SEEN_STACK 0x40000000 /* uses BPF stack */
+#define SEEN_TAILCALL 0x80000000 /* uses tail calls */
struct codegen_context {
+ /*
+ * This is used to track register usage as well
+ * as calls to external helpers.
+ * - register usage is tracked with corresponding
+ * bits (r3-r10 and r27-r31)
+ * - rest of the bits can be used to track other
+ * things -- for now, we use bits 16 to 23
+ * encoded in SEEN_* macros above
+ */
unsigned int seen;
unsigned int idx;
- int pc_ret0; /* bpf index of first RET #0 instruction (if any) */
+ unsigned int stack_size;
};
-#endif
+#endif /* !__ASSEMBLY__ */
#endif
diff --git a/arch/powerpc/net/bpf_jit_asm.S b/arch/powerpc/net/bpf_jit_asm.S
deleted file mode 100644
index 2f5030d8383f..000000000000
--- a/arch/powerpc/net/bpf_jit_asm.S
+++ /dev/null
@@ -1,226 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-only */
-/* bpf_jit.S: Packet/header access helper functions
- * for PPC64 BPF compiler.
- *
- * Copyright 2011 Matt Evans <matt@ozlabs.org>, IBM Corporation
- */
-
-#include <asm/ppc_asm.h>
-#include <asm/asm-compat.h>
-#include "bpf_jit32.h"
-
-/*
- * All of these routines are called directly from generated code,
- * whose register usage is:
- *
- * r3 skb
- * r4,r5 A,X
- * r6 *** address parameter to helper ***
- * r7-r10 scratch
- * r14 skb->data
- * r15 skb headlen
- * r16-31 M[]
- */
-
-/*
- * To consider: These helpers are so small it could be better to just
- * generate them inline. Inline code can do the simple headlen check
- * then branch directly to slow_path_XXX if required. (In fact, could
- * load a spare GPR with the address of slow_path_generic and pass size
- * as an argument, making the call site a mtlr, li and bllr.)
- */
- .globl sk_load_word
-sk_load_word:
- PPC_LCMPI r_addr, 0
- blt bpf_slow_path_word_neg
- .globl sk_load_word_positive_offset
-sk_load_word_positive_offset:
- /* Are we accessing past headlen? */
- subi r_scratch1, r_HL, 4
- PPC_LCMP r_scratch1, r_addr
- blt bpf_slow_path_word
- /* Nope, just hitting the header. cr0 here is eq or gt! */
-#ifdef __LITTLE_ENDIAN__
- lwbrx r_A, r_D, r_addr
-#else
- lwzx r_A, r_D, r_addr
-#endif
- blr /* Return success, cr0 != LT */
-
- .globl sk_load_half
-sk_load_half:
- PPC_LCMPI r_addr, 0
- blt bpf_slow_path_half_neg
- .globl sk_load_half_positive_offset
-sk_load_half_positive_offset:
- subi r_scratch1, r_HL, 2
- PPC_LCMP r_scratch1, r_addr
- blt bpf_slow_path_half
-#ifdef __LITTLE_ENDIAN__
- lhbrx r_A, r_D, r_addr
-#else
- lhzx r_A, r_D, r_addr
-#endif
- blr
-
- .globl sk_load_byte
-sk_load_byte:
- PPC_LCMPI r_addr, 0
- blt bpf_slow_path_byte_neg
- .globl sk_load_byte_positive_offset
-sk_load_byte_positive_offset:
- PPC_LCMP r_HL, r_addr
- ble bpf_slow_path_byte
- lbzx r_A, r_D, r_addr
- blr
-
-/*
- * BPF_LDX | BPF_B | BPF_MSH: ldxb 4*([offset]&0xf)
- * r_addr is the offset value
- */
- .globl sk_load_byte_msh
-sk_load_byte_msh:
- PPC_LCMPI r_addr, 0
- blt bpf_slow_path_byte_msh_neg
- .globl sk_load_byte_msh_positive_offset
-sk_load_byte_msh_positive_offset:
- PPC_LCMP r_HL, r_addr
- ble bpf_slow_path_byte_msh
- lbzx r_X, r_D, r_addr
- rlwinm r_X, r_X, 2, 32-4-2, 31-2
- blr
-
-/* Call out to skb_copy_bits:
- * We'll need to back up our volatile regs first; we have
- * local variable space at r1+(BPF_PPC_STACK_BASIC).
- * Allocate a new stack frame here to remain ABI-compliant in
- * stashing LR.
- */
-#define bpf_slow_path_common(SIZE) \
- mflr r0; \
- PPC_STL r0, PPC_LR_STKOFF(r1); \
- /* R3 goes in parameter space of caller's frame */ \
- PPC_STL r_skb, (BPF_PPC_STACKFRAME+BPF_PPC_STACK_R3_OFF)(r1); \
- PPC_STL r_A, (BPF_PPC_STACK_BASIC+(0*REG_SZ))(r1); \
- PPC_STL r_X, (BPF_PPC_STACK_BASIC+(1*REG_SZ))(r1); \
- addi r5, r1, BPF_PPC_STACK_BASIC+(2*REG_SZ); \
- PPC_STLU r1, -BPF_PPC_SLOWPATH_FRAME(r1); \
- /* R3 = r_skb, as passed */ \
- mr r4, r_addr; \
- li r6, SIZE; \
- bl skb_copy_bits; \
- nop; \
- /* R3 = 0 on success */ \
- addi r1, r1, BPF_PPC_SLOWPATH_FRAME; \
- PPC_LL r0, PPC_LR_STKOFF(r1); \
- PPC_LL r_A, (BPF_PPC_STACK_BASIC+(0*REG_SZ))(r1); \
- PPC_LL r_X, (BPF_PPC_STACK_BASIC+(1*REG_SZ))(r1); \
- mtlr r0; \
- PPC_LCMPI r3, 0; \
- blt bpf_error; /* cr0 = LT */ \
- PPC_LL r_skb, (BPF_PPC_STACKFRAME+BPF_PPC_STACK_R3_OFF)(r1); \
- /* Great success! */
-
-bpf_slow_path_word:
- bpf_slow_path_common(4)
- /* Data value is on stack, and cr0 != LT */
- lwz r_A, BPF_PPC_STACK_BASIC+(2*REG_SZ)(r1)
- blr
-
-bpf_slow_path_half:
- bpf_slow_path_common(2)
- lhz r_A, BPF_PPC_STACK_BASIC+(2*8)(r1)
- blr
-
-bpf_slow_path_byte:
- bpf_slow_path_common(1)
- lbz r_A, BPF_PPC_STACK_BASIC+(2*8)(r1)
- blr
-
-bpf_slow_path_byte_msh:
- bpf_slow_path_common(1)
- lbz r_X, BPF_PPC_STACK_BASIC+(2*8)(r1)
- rlwinm r_X, r_X, 2, 32-4-2, 31-2
- blr
-
-/* Call out to bpf_internal_load_pointer_neg_helper:
- * We'll need to back up our volatile regs first; we have
- * local variable space at r1+(BPF_PPC_STACK_BASIC).
- * Allocate a new stack frame here to remain ABI-compliant in
- * stashing LR.
- */
-#define sk_negative_common(SIZE) \
- mflr r0; \
- PPC_STL r0, PPC_LR_STKOFF(r1); \
- /* R3 goes in parameter space of caller's frame */ \
- PPC_STL r_skb, (BPF_PPC_STACKFRAME+BPF_PPC_STACK_R3_OFF)(r1); \
- PPC_STL r_A, (BPF_PPC_STACK_BASIC+(0*REG_SZ))(r1); \
- PPC_STL r_X, (BPF_PPC_STACK_BASIC+(1*REG_SZ))(r1); \
- PPC_STLU r1, -BPF_PPC_SLOWPATH_FRAME(r1); \
- /* R3 = r_skb, as passed */ \
- mr r4, r_addr; \
- li r5, SIZE; \
- bl bpf_internal_load_pointer_neg_helper; \
- nop; \
- /* R3 != 0 on success */ \
- addi r1, r1, BPF_PPC_SLOWPATH_FRAME; \
- PPC_LL r0, PPC_LR_STKOFF(r1); \
- PPC_LL r_A, (BPF_PPC_STACK_BASIC+(0*REG_SZ))(r1); \
- PPC_LL r_X, (BPF_PPC_STACK_BASIC+(1*REG_SZ))(r1); \
- mtlr r0; \
- PPC_LCMPLI r3, 0; \
- beq bpf_error_slow; /* cr0 = EQ */ \
- mr r_addr, r3; \
- PPC_LL r_skb, (BPF_PPC_STACKFRAME+BPF_PPC_STACK_R3_OFF)(r1); \
- /* Great success! */
-
-bpf_slow_path_word_neg:
- lis r_scratch1,-32 /* SKF_LL_OFF */
- PPC_LCMP r_addr, r_scratch1 /* addr < SKF_* */
- blt bpf_error /* cr0 = LT */
- .globl sk_load_word_negative_offset
-sk_load_word_negative_offset:
- sk_negative_common(4)
- lwz r_A, 0(r_addr)
- blr
-
-bpf_slow_path_half_neg:
- lis r_scratch1,-32 /* SKF_LL_OFF */
- PPC_LCMP r_addr, r_scratch1 /* addr < SKF_* */
- blt bpf_error /* cr0 = LT */
- .globl sk_load_half_negative_offset
-sk_load_half_negative_offset:
- sk_negative_common(2)
- lhz r_A, 0(r_addr)
- blr
-
-bpf_slow_path_byte_neg:
- lis r_scratch1,-32 /* SKF_LL_OFF */
- PPC_LCMP r_addr, r_scratch1 /* addr < SKF_* */
- blt bpf_error /* cr0 = LT */
- .globl sk_load_byte_negative_offset
-sk_load_byte_negative_offset:
- sk_negative_common(1)
- lbz r_A, 0(r_addr)
- blr
-
-bpf_slow_path_byte_msh_neg:
- lis r_scratch1,-32 /* SKF_LL_OFF */
- PPC_LCMP r_addr, r_scratch1 /* addr < SKF_* */
- blt bpf_error /* cr0 = LT */
- .globl sk_load_byte_msh_negative_offset
-sk_load_byte_msh_negative_offset:
- sk_negative_common(1)
- lbz r_X, 0(r_addr)
- rlwinm r_X, r_X, 2, 32-4-2, 31-2
- blr
-
-bpf_error_slow:
- /* fabricate a cr0 = lt */
- li r_scratch1, -1
- PPC_LCMPI r_scratch1, 0
-bpf_error:
- /* Entered with cr0 = lt */
- li r3, 0
- /* Generated code will 'blt epilogue', returning 0. */
- blr
diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
deleted file mode 100644
index e809cb5a1631..000000000000
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ /dev/null
@@ -1,683 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-only
-/* bpf_jit_comp.c: BPF JIT compiler
- *
- * Copyright 2011 Matt Evans <matt@ozlabs.org>, IBM Corporation
- *
- * Based on the x86 BPF compiler, by Eric Dumazet (eric.dumazet@gmail.com)
- * Ported to ppc32 by Denis Kirjanov <kda@linux-powerpc.org>
- */
-#include <linux/moduleloader.h>
-#include <asm/cacheflush.h>
-#include <asm/asm-compat.h>
-#include <linux/netdevice.h>
-#include <linux/filter.h>
-#include <linux/if_vlan.h>
-
-#include "bpf_jit32.h"
-
-static inline void bpf_flush_icache(void *start, void *end)
-{
- smp_wmb();
- flush_icache_range((unsigned long)start, (unsigned long)end);
-}
-
-static void bpf_jit_build_prologue(struct bpf_prog *fp, u32 *image,
- struct codegen_context *ctx)
-{
- int i;
- const struct sock_filter *filter = fp->insns;
-
- if (ctx->seen & (SEEN_MEM | SEEN_DATAREF)) {
- /* Make stackframe */
- if (ctx->seen & SEEN_DATAREF) {
- /* If we call any helpers (for loads), save LR */
- EMIT(PPC_INST_MFLR | __PPC_RT(R0));
- PPC_BPF_STL(0, 1, PPC_LR_STKOFF);
-
- /* Back up non-volatile regs. */
- PPC_BPF_STL(r_D, 1, -(REG_SZ*(32-r_D)));
- PPC_BPF_STL(r_HL, 1, -(REG_SZ*(32-r_HL)));
- }
- if (ctx->seen & SEEN_MEM) {
- /*
- * Conditionally save regs r15-r31 as some will be used
- * for M[] data.
- */
- for (i = r_M; i < (r_M+16); i++) {
- if (ctx->seen & (1 << (i-r_M)))
- PPC_BPF_STL(i, 1, -(REG_SZ*(32-i)));
- }
- }
- PPC_BPF_STLU(1, 1, -BPF_PPC_STACKFRAME);
- }
-
- if (ctx->seen & SEEN_DATAREF) {
- /*
- * If this filter needs to access skb data,
- * prepare r_D and r_HL:
- * r_HL = skb->len - skb->data_len
- * r_D = skb->data
- */
- PPC_LWZ_OFFS(r_scratch1, r_skb, offsetof(struct sk_buff,
- data_len));
- PPC_LWZ_OFFS(r_HL, r_skb, offsetof(struct sk_buff, len));
- EMIT(PPC_RAW_SUB(r_HL, r_HL, r_scratch1));
- PPC_LL_OFFS(r_D, r_skb, offsetof(struct sk_buff, data));
- }
-
- if (ctx->seen & SEEN_XREG) {
- /*
- * TODO: Could also detect whether first instr. sets X and
- * avoid this (as below, with A).
- */
- EMIT(PPC_RAW_LI(r_X, 0));
- }
-
- /* make sure we dont leak kernel information to user */
- if (bpf_needs_clear_a(&filter[0]))
- EMIT(PPC_RAW_LI(r_A, 0));
-}
-
-static void bpf_jit_build_epilogue(u32 *image, struct codegen_context *ctx)
-{
- int i;
-
- if (ctx->seen & (SEEN_MEM | SEEN_DATAREF)) {
- EMIT(PPC_RAW_ADDI(1, 1, BPF_PPC_STACKFRAME));
- if (ctx->seen & SEEN_DATAREF) {
- PPC_BPF_LL(0, 1, PPC_LR_STKOFF);
- EMIT(PPC_RAW_MTLR(0));
- PPC_BPF_LL(r_D, 1, -(REG_SZ*(32-r_D)));
- PPC_BPF_LL(r_HL, 1, -(REG_SZ*(32-r_HL)));
- }
- if (ctx->seen & SEEN_MEM) {
- /* Restore any saved non-vol registers */
- for (i = r_M; i < (r_M+16); i++) {
- if (ctx->seen & (1 << (i-r_M)))
- PPC_BPF_LL(i, 1, -(REG_SZ*(32-i)));
- }
- }
- }
- /* The RETs have left a return value in R3. */
-
- EMIT(PPC_RAW_BLR());
-}
-
-#define CHOOSE_LOAD_FUNC(K, func) \
- ((int)K < 0 ? ((int)K >= SKF_LL_OFF ? func##_negative_offset : func) : func##_positive_offset)
-
-/* Assemble the body code between the prologue & epilogue. */
-static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image,
- struct codegen_context *ctx,
- unsigned int *addrs)
-{
- const struct sock_filter *filter = fp->insns;
- int flen = fp->len;
- u8 *func;
- unsigned int true_cond;
- int i;
-
- /* Start of epilogue code */
- unsigned int exit_addr = addrs[flen];
-
- for (i = 0; i < flen; i++) {
- unsigned int K = filter[i].k;
- u16 code = bpf_anc_helper(&filter[i]);
-
- /*
- * addrs[] maps a BPF bytecode address into a real offset from
- * the start of the body code.
- */
- addrs[i] = ctx->idx * 4;
-
- switch (code) {
- /*** ALU ops ***/
- case BPF_ALU | BPF_ADD | BPF_X: /* A += X; */
- ctx->seen |= SEEN_XREG;
- EMIT(PPC_RAW_ADD(r_A, r_A, r_X));
- break;
- case BPF_ALU | BPF_ADD | BPF_K: /* A += K; */
- if (!K)
- break;
- EMIT(PPC_RAW_ADDI(r_A, r_A, IMM_L(K)));
- if (K >= 32768)
- EMIT(PPC_RAW_ADDIS(r_A, r_A, IMM_HA(K)));
- break;
- case BPF_ALU | BPF_SUB | BPF_X: /* A -= X; */
- ctx->seen |= SEEN_XREG;
- EMIT(PPC_RAW_SUB(r_A, r_A, r_X));
- break;
- case BPF_ALU | BPF_SUB | BPF_K: /* A -= K */
- if (!K)
- break;
- EMIT(PPC_RAW_ADDI(r_A, r_A, IMM_L(-K)));
- if (K >= 32768)
- EMIT(PPC_RAW_ADDIS(r_A, r_A, IMM_HA(-K)));
- break;
- case BPF_ALU | BPF_MUL | BPF_X: /* A *= X; */
- ctx->seen |= SEEN_XREG;
- EMIT(PPC_RAW_MULW(r_A, r_A, r_X));
- break;
- case BPF_ALU | BPF_MUL | BPF_K: /* A *= K */
- if (K < 32768)
- EMIT(PPC_RAW_MULI(r_A, r_A, K));
- else {
- PPC_LI32(r_scratch1, K);
- EMIT(PPC_RAW_MULW(r_A, r_A, r_scratch1));
- }
- break;
- case BPF_ALU | BPF_MOD | BPF_X: /* A %= X; */
- case BPF_ALU | BPF_DIV | BPF_X: /* A /= X; */
- ctx->seen |= SEEN_XREG;
- EMIT(PPC_RAW_CMPWI(r_X, 0));
- if (ctx->pc_ret0 != -1) {
- PPC_BCC(COND_EQ, addrs[ctx->pc_ret0]);
- } else {
- PPC_BCC_SHORT(COND_NE, (ctx->idx*4)+12);
- EMIT(PPC_RAW_LI(r_ret, 0));
- PPC_JMP(exit_addr);
- }
- if (code == (BPF_ALU | BPF_MOD | BPF_X)) {
- EMIT(PPC_RAW_DIVWU(r_scratch1, r_A, r_X));
- EMIT(PPC_RAW_MULW(r_scratch1, r_X, r_scratch1));
- EMIT(PPC_RAW_SUB(r_A, r_A, r_scratch1));
- } else {
- EMIT(PPC_RAW_DIVWU(r_A, r_A, r_X));
- }
- break;
- case BPF_ALU | BPF_MOD | BPF_K: /* A %= K; */
- PPC_LI32(r_scratch2, K);
- EMIT(PPC_RAW_DIVWU(r_scratch1, r_A, r_scratch2));
- EMIT(PPC_RAW_MULW(r_scratch1, r_scratch2, r_scratch1));
- EMIT(PPC_RAW_SUB(r_A, r_A, r_scratch1));
- break;
- case BPF_ALU | BPF_DIV | BPF_K: /* A /= K */
- if (K == 1)
- break;
- PPC_LI32(r_scratch1, K);
- EMIT(PPC_RAW_DIVWU(r_A, r_A, r_scratch1));
- break;
- case BPF_ALU | BPF_AND | BPF_X:
- ctx->seen |= SEEN_XREG;
- EMIT(PPC_RAW_AND(r_A, r_A, r_X));
- break;
- case BPF_ALU | BPF_AND | BPF_K:
- if (!IMM_H(K))
- EMIT(PPC_RAW_ANDI(r_A, r_A, K));
- else {
- PPC_LI32(r_scratch1, K);
- EMIT(PPC_RAW_AND(r_A, r_A, r_scratch1));
- }
- break;
- case BPF_ALU | BPF_OR | BPF_X:
- ctx->seen |= SEEN_XREG;
- EMIT(PPC_RAW_OR(r_A, r_A, r_X));
- break;
- case BPF_ALU | BPF_OR | BPF_K:
- if (IMM_L(K))
- EMIT(PPC_RAW_ORI(r_A, r_A, IMM_L(K)));
- if (K >= 65536)
- EMIT(PPC_RAW_ORIS(r_A, r_A, IMM_H(K)));
- break;
- case BPF_ANC | SKF_AD_ALU_XOR_X:
- case BPF_ALU | BPF_XOR | BPF_X: /* A ^= X */
- ctx->seen |= SEEN_XREG;
- EMIT(PPC_RAW_XOR(r_A, r_A, r_X));
- break;
- case BPF_ALU | BPF_XOR | BPF_K: /* A ^= K */
- if (IMM_L(K))
- EMIT(PPC_RAW_XORI(r_A, r_A, IMM_L(K)));
- if (K >= 65536)
- EMIT(PPC_RAW_XORIS(r_A, r_A, IMM_H(K)));
- break;
- case BPF_ALU | BPF_LSH | BPF_X: /* A <<= X; */
- ctx->seen |= SEEN_XREG;
- EMIT(PPC_RAW_SLW(r_A, r_A, r_X));
- break;
- case BPF_ALU | BPF_LSH | BPF_K:
- if (K == 0)
- break;
- else
- EMIT(PPC_RAW_SLWI(r_A, r_A, K));
- break;
- case BPF_ALU | BPF_RSH | BPF_X: /* A >>= X; */
- ctx->seen |= SEEN_XREG;
- EMIT(PPC_RAW_SRW(r_A, r_A, r_X));
- break;
- case BPF_ALU | BPF_RSH | BPF_K: /* A >>= K; */
- if (K == 0)
- break;
- else
- EMIT(PPC_RAW_SRWI(r_A, r_A, K));
- break;
- case BPF_ALU | BPF_NEG:
- EMIT(PPC_RAW_NEG(r_A, r_A));
- break;
- case BPF_RET | BPF_K:
- PPC_LI32(r_ret, K);
- if (!K) {
- if (ctx->pc_ret0 == -1)
- ctx->pc_ret0 = i;
- }
- /*
- * If this isn't the very last instruction, branch to
- * the epilogue if we've stuff to clean up. Otherwise,
- * if there's nothing to tidy, just return. If we /are/
- * the last instruction, we're about to fall through to
- * the epilogue to return.
- */
- if (i != flen - 1) {
- /*
- * Note: 'seen' is properly valid only on pass
- * #2. Both parts of this conditional are the
- * same instruction size though, meaning the
- * first pass will still correctly determine the
- * code size/addresses.
- */
- if (ctx->seen)
- PPC_JMP(exit_addr);
- else
- EMIT(PPC_RAW_BLR());
- }
- break;
- case BPF_RET | BPF_A:
- EMIT(PPC_RAW_MR(r_ret, r_A));
- if (i != flen - 1) {
- if (ctx->seen)
- PPC_JMP(exit_addr);
- else
- EMIT(PPC_RAW_BLR());
- }
- break;
- case BPF_MISC | BPF_TAX: /* X = A */
- EMIT(PPC_RAW_MR(r_X, r_A));
- break;
- case BPF_MISC | BPF_TXA: /* A = X */
- ctx->seen |= SEEN_XREG;
- EMIT(PPC_RAW_MR(r_A, r_X));
- break;
-
- /*** Constant loads/M[] access ***/
- case BPF_LD | BPF_IMM: /* A = K */
- PPC_LI32(r_A, K);
- break;
- case BPF_LDX | BPF_IMM: /* X = K */
- PPC_LI32(r_X, K);
- break;
- case BPF_LD | BPF_MEM: /* A = mem[K] */
- EMIT(PPC_RAW_MR(r_A, r_M + (K & 0xf)));
- ctx->seen |= SEEN_MEM | (1<<(K & 0xf));
- break;
- case BPF_LDX | BPF_MEM: /* X = mem[K] */
- EMIT(PPC_RAW_MR(r_X, r_M + (K & 0xf)));
- ctx->seen |= SEEN_MEM | (1<<(K & 0xf));
- break;
- case BPF_ST: /* mem[K] = A */
- EMIT(PPC_RAW_MR(r_M + (K & 0xf), r_A));
- ctx->seen |= SEEN_MEM | (1<<(K & 0xf));
- break;
- case BPF_STX: /* mem[K] = X */
- EMIT(PPC_RAW_MR(r_M + (K & 0xf), r_X));
- ctx->seen |= SEEN_XREG | SEEN_MEM | (1<<(K & 0xf));
- break;
- case BPF_LD | BPF_W | BPF_LEN: /* A = skb->len; */
- BUILD_BUG_ON(sizeof_field(struct sk_buff, len) != 4);
- PPC_LWZ_OFFS(r_A, r_skb, offsetof(struct sk_buff, len));
- break;
- case BPF_LDX | BPF_W | BPF_ABS: /* A = *((u32 *)(seccomp_data + K)); */
- PPC_LWZ_OFFS(r_A, r_skb, K);
- break;
- case BPF_LDX | BPF_W | BPF_LEN: /* X = skb->len; */
- PPC_LWZ_OFFS(r_X, r_skb, offsetof(struct sk_buff, len));
- break;
-
- /*** Ancillary info loads ***/
- case BPF_ANC | SKF_AD_PROTOCOL: /* A = ntohs(skb->protocol); */
- BUILD_BUG_ON(sizeof_field(struct sk_buff,
- protocol) != 2);
- PPC_NTOHS_OFFS(r_A, r_skb, offsetof(struct sk_buff,
- protocol));
- break;
- case BPF_ANC | SKF_AD_IFINDEX:
- case BPF_ANC | SKF_AD_HATYPE:
- BUILD_BUG_ON(sizeof_field(struct net_device,
- ifindex) != 4);
- BUILD_BUG_ON(sizeof_field(struct net_device,
- type) != 2);
- PPC_LL_OFFS(r_scratch1, r_skb, offsetof(struct sk_buff,
- dev));
- EMIT(PPC_RAW_CMPDI(r_scratch1, 0));
- if (ctx->pc_ret0 != -1) {
- PPC_BCC(COND_EQ, addrs[ctx->pc_ret0]);
- } else {
- /* Exit, returning 0; first pass hits here. */
- PPC_BCC_SHORT(COND_NE, ctx->idx * 4 + 12);
- EMIT(PPC_RAW_LI(r_ret, 0));
- PPC_JMP(exit_addr);
- }
- if (code == (BPF_ANC | SKF_AD_IFINDEX)) {
- PPC_LWZ_OFFS(r_A, r_scratch1,
- offsetof(struct net_device, ifindex));
- } else {
- PPC_LHZ_OFFS(r_A, r_scratch1,
- offsetof(struct net_device, type));
- }
-
- break;
- case BPF_ANC | SKF_AD_MARK:
- BUILD_BUG_ON(sizeof_field(struct sk_buff, mark) != 4);
- PPC_LWZ_OFFS(r_A, r_skb, offsetof(struct sk_buff,
- mark));
- break;
- case BPF_ANC | SKF_AD_RXHASH:
- BUILD_BUG_ON(sizeof_field(struct sk_buff, hash) != 4);
- PPC_LWZ_OFFS(r_A, r_skb, offsetof(struct sk_buff,
- hash));
- break;
- case BPF_ANC | SKF_AD_VLAN_TAG:
- BUILD_BUG_ON(sizeof_field(struct sk_buff, vlan_tci) != 2);
-
- PPC_LHZ_OFFS(r_A, r_skb, offsetof(struct sk_buff,
- vlan_tci));
- break;
- case BPF_ANC | SKF_AD_VLAN_TAG_PRESENT:
- PPC_LBZ_OFFS(r_A, r_skb, PKT_VLAN_PRESENT_OFFSET());
- if (PKT_VLAN_PRESENT_BIT)
- EMIT(PPC_RAW_SRWI(r_A, r_A, PKT_VLAN_PRESENT_BIT));
- if (PKT_VLAN_PRESENT_BIT < 7)
- EMIT(PPC_RAW_ANDI(r_A, r_A, 1));
- break;
- case BPF_ANC | SKF_AD_QUEUE:
- BUILD_BUG_ON(sizeof_field(struct sk_buff,
- queue_mapping) != 2);
- PPC_LHZ_OFFS(r_A, r_skb, offsetof(struct sk_buff,
- queue_mapping));
- break;
- case BPF_ANC | SKF_AD_PKTTYPE:
- PPC_LBZ_OFFS(r_A, r_skb, PKT_TYPE_OFFSET());
- EMIT(PPC_RAW_ANDI(r_A, r_A, PKT_TYPE_MAX));
- EMIT(PPC_RAW_SRWI(r_A, r_A, 5));
- break;
- case BPF_ANC | SKF_AD_CPU:
- PPC_BPF_LOAD_CPU(r_A);
- break;
- /*** Absolute loads from packet header/data ***/
- case BPF_LD | BPF_W | BPF_ABS:
- func = CHOOSE_LOAD_FUNC(K, sk_load_word);
- goto common_load;
- case BPF_LD | BPF_H | BPF_ABS:
- func = CHOOSE_LOAD_FUNC(K, sk_load_half);
- goto common_load;
- case BPF_LD | BPF_B | BPF_ABS:
- func = CHOOSE_LOAD_FUNC(K, sk_load_byte);
- common_load:
- /* Load from [K]. */
- ctx->seen |= SEEN_DATAREF;
- PPC_FUNC_ADDR(r_scratch1, func);
- EMIT(PPC_RAW_MTLR(r_scratch1));
- PPC_LI32(r_addr, K);
- EMIT(PPC_RAW_BLRL());
- /*
- * Helper returns 'lt' condition on error, and an
- * appropriate return value in r3
- */
- PPC_BCC(COND_LT, exit_addr);
- break;
-
- /*** Indirect loads from packet header/data ***/
- case BPF_LD | BPF_W | BPF_IND:
- func = sk_load_word;
- goto common_load_ind;
- case BPF_LD | BPF_H | BPF_IND:
- func = sk_load_half;
- goto common_load_ind;
- case BPF_LD | BPF_B | BPF_IND:
- func = sk_load_byte;
- common_load_ind:
- /*
- * Load from [X + K]. Negative offsets are tested for
- * in the helper functions.
- */
- ctx->seen |= SEEN_DATAREF | SEEN_XREG;
- PPC_FUNC_ADDR(r_scratch1, func);
- EMIT(PPC_RAW_MTLR(r_scratch1));
- EMIT(PPC_RAW_ADDI(r_addr, r_X, IMM_L(K)));
- if (K >= 32768)
- EMIT(PPC_RAW_ADDIS(r_addr, r_addr, IMM_HA(K)));
- EMIT(PPC_RAW_BLRL());
- /* If error, cr0.LT set */
- PPC_BCC(COND_LT, exit_addr);
- break;
-
- case BPF_LDX | BPF_B | BPF_MSH:
- func = CHOOSE_LOAD_FUNC(K, sk_load_byte_msh);
- goto common_load;
- break;
-
- /*** Jump and branches ***/
- case BPF_JMP | BPF_JA:
- if (K != 0)
- PPC_JMP(addrs[i + 1 + K]);
- break;
-
- case BPF_JMP | BPF_JGT | BPF_K:
- case BPF_JMP | BPF_JGT | BPF_X:
- true_cond = COND_GT;
- goto cond_branch;
- case BPF_JMP | BPF_JGE | BPF_K:
- case BPF_JMP | BPF_JGE | BPF_X:
- true_cond = COND_GE;
- goto cond_branch;
- case BPF_JMP | BPF_JEQ | BPF_K:
- case BPF_JMP | BPF_JEQ | BPF_X:
- true_cond = COND_EQ;
- goto cond_branch;
- case BPF_JMP | BPF_JSET | BPF_K:
- case BPF_JMP | BPF_JSET | BPF_X:
- true_cond = COND_NE;
- cond_branch:
- /* same targets, can avoid doing the test :) */
- if (filter[i].jt == filter[i].jf) {
- if (filter[i].jt > 0)
- PPC_JMP(addrs[i + 1 + filter[i].jt]);
- break;
- }
-
- switch (code) {
- case BPF_JMP | BPF_JGT | BPF_X:
- case BPF_JMP | BPF_JGE | BPF_X:
- case BPF_JMP | BPF_JEQ | BPF_X:
- ctx->seen |= SEEN_XREG;
- EMIT(PPC_RAW_CMPLW(r_A, r_X));
- break;
- case BPF_JMP | BPF_JSET | BPF_X:
- ctx->seen |= SEEN_XREG;
- EMIT(PPC_RAW_AND_DOT(r_scratch1, r_A, r_X));
- break;
- case BPF_JMP | BPF_JEQ | BPF_K:
- case BPF_JMP | BPF_JGT | BPF_K:
- case BPF_JMP | BPF_JGE | BPF_K:
- if (K < 32768)
- EMIT(PPC_RAW_CMPLWI(r_A, K));
- else {
- PPC_LI32(r_scratch1, K);
- EMIT(PPC_RAW_CMPLW(r_A, r_scratch1));
- }
- break;
- case BPF_JMP | BPF_JSET | BPF_K:
- if (K < 32768)
- /* PPC_ANDI is /only/ dot-form */
- EMIT(PPC_RAW_ANDI(r_scratch1, r_A, K));
- else {
- PPC_LI32(r_scratch1, K);
- EMIT(PPC_RAW_AND_DOT(r_scratch1, r_A,
- r_scratch1));
- }
- break;
- }
- /* Sometimes branches are constructed "backward", with
- * the false path being the branch and true path being
- * a fallthrough to the next instruction.
- */
- if (filter[i].jt == 0)
- /* Swap the sense of the branch */
- PPC_BCC(true_cond ^ COND_CMP_TRUE,
- addrs[i + 1 + filter[i].jf]);
- else {
- PPC_BCC(true_cond, addrs[i + 1 + filter[i].jt]);
- if (filter[i].jf != 0)
- PPC_JMP(addrs[i + 1 + filter[i].jf]);
- }
- break;
- default:
- /* The filter contains something cruel & unusual.
- * We don't handle it, but also there shouldn't be
- * anything missing from our list.
- */
- if (printk_ratelimit())
- pr_err("BPF filter opcode %04x (@%d) unsupported\n",
- filter[i].code, i);
- return -ENOTSUPP;
- }
-
- }
- /* Set end-of-body-code address for exit. */
- addrs[i] = ctx->idx * 4;
-
- return 0;
-}
-
-void bpf_jit_compile(struct bpf_prog *fp)
-{
- unsigned int proglen;
- unsigned int alloclen;
- u32 *image = NULL;
- u32 *code_base;
- unsigned int *addrs;
- struct codegen_context cgctx;
- int pass;
- int flen = fp->len;
-
- if (!bpf_jit_enable)
- return;
-
- addrs = kcalloc(flen + 1, sizeof(*addrs), GFP_KERNEL);
- if (addrs == NULL)
- return;
-
- /*
- * There are multiple assembly passes as the generated code will change
- * size as it settles down, figuring out the max branch offsets/exit
- * paths required.
- *
- * The range of standard conditional branches is +/- 32Kbytes. Since
- * BPF_MAXINSNS = 4096, we can only jump from (worst case) start to
- * finish with 8 bytes/instruction. Not feasible, so long jumps are
- * used, distinct from short branches.
- *
- * Current:
- *
- * For now, both branch types assemble to 2 words (short branches padded
- * with a NOP); this is less efficient, but assembly will always complete
- * after exactly 3 passes:
- *
- * First pass: No code buffer; Program is "faux-generated" -- no code
- * emitted but maximum size of output determined (and addrs[] filled
- * in). Also, we note whether we use M[], whether we use skb data, etc.
- * All generation choices assumed to be 'worst-case', e.g. branches all
- * far (2 instructions), return path code reduction not available, etc.
- *
- * Second pass: Code buffer allocated with size determined previously.
- * Prologue generated to support features we have seen used. Exit paths
- * determined and addrs[] is filled in again, as code may be slightly
- * smaller as a result.
- *
- * Third pass: Code generated 'for real', and branch destinations
- * determined from now-accurate addrs[] map.
- *
- * Ideal:
- *
- * If we optimise this, near branches will be shorter. On the
- * first assembly pass, we should err on the side of caution and
- * generate the biggest code. On subsequent passes, branches will be
- * generated short or long and code size will reduce. With smaller
- * code, more branches may fall into the short category, and code will
- * reduce more.
- *
- * Finally, if we see one pass generate code the same size as the
- * previous pass we have converged and should now generate code for
- * real. Allocating at the end will also save the memory that would
- * otherwise be wasted by the (small) current code shrinkage.
- * Preferably, we should do a small number of passes (e.g. 5) and if we
- * haven't converged by then, get impatient and force code to generate
- * as-is, even if the odd branch would be left long. The chances of a
- * long jump are tiny with all but the most enormous of BPF filter
- * inputs, so we should usually converge on the third pass.
- */
-
- cgctx.idx = 0;
- cgctx.seen = 0;
- cgctx.pc_ret0 = -1;
- /* Scouting faux-generate pass 0 */
- if (bpf_jit_build_body(fp, 0, &cgctx, addrs))
- /* We hit something illegal or unsupported. */
- goto out;
-
- /*
- * Pretend to build prologue, given the features we've seen. This will
- * update ctgtx.idx as it pretends to output instructions, then we can
- * calculate total size from idx.
- */
- bpf_jit_build_prologue(fp, 0, &cgctx);
- bpf_jit_build_epilogue(0, &cgctx);
-
- proglen = cgctx.idx * 4;
- alloclen = proglen + FUNCTION_DESCR_SIZE;
- image = module_alloc(alloclen);
- if (!image)
- goto out;
-
- code_base = image + (FUNCTION_DESCR_SIZE/4);
-
- /* Code generation passes 1-2 */
- for (pass = 1; pass < 3; pass++) {
- /* Now build the prologue, body code & epilogue for real. */
- cgctx.idx = 0;
- bpf_jit_build_prologue(fp, code_base, &cgctx);
- bpf_jit_build_body(fp, code_base, &cgctx, addrs);
- bpf_jit_build_epilogue(code_base, &cgctx);
-
- if (bpf_jit_enable > 1)
- pr_info("Pass %d: shrink = %d, seen = 0x%x\n", pass,
- proglen - (cgctx.idx * 4), cgctx.seen);
- }
-
- if (bpf_jit_enable > 1)
- /* Note that we output the base address of the code_base
- * rather than image, since opcodes are in code_base.
- */
- bpf_jit_dump(flen, proglen, pass, code_base);
-
- bpf_flush_icache(code_base, code_base + (proglen/4));
-
-#ifdef CONFIG_PPC64
- /* Function descriptor nastiness: Address + TOC */
- ((u64 *)image)[0] = (u64)code_base;
- ((u64 *)image)[1] = local_paca->kernel_toc;
-#endif
-
- fp->bpf_func = (void *)image;
- fp->jited = 1;
-
-out:
- kfree(addrs);
- return;
-}
-
-void bpf_jit_free(struct bpf_prog *fp)
-{
- if (fp->jited)
- module_memfree(fp->bpf_func);
-
- bpf_prog_unlock_free(fp);
-}
diff --git a/arch/powerpc/net/bpf_jit_comp32.c b/arch/powerpc/net/bpf_jit_comp32.c
new file mode 100644
index 000000000000..909c27da71f1
--- /dev/null
+++ b/arch/powerpc/net/bpf_jit_comp32.c
@@ -0,0 +1,1177 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * bpf_jit_comp64.c: eBPF JIT compiler
+ *
+ * Copyright 2016 Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
+ * IBM Corporation
+ *
+ * Based on the powerpc classic BPF JIT compiler by Matt Evans
+ */
+#include <linux/moduleloader.h>
+#include <asm/cacheflush.h>
+#include <asm/asm-compat.h>
+#include <linux/netdevice.h>
+#include <linux/filter.h>
+#include <linux/if_vlan.h>
+#include <asm/kprobes.h>
+#include <linux/bpf.h>
+
+#include "bpf_jit32.h"
+
+static void bpf_jit_fill_ill_insns(void *area, unsigned int size)
+{
+ memset32(area, BREAKPOINT_INSTRUCTION, size/4);
+}
+
+static inline void bpf_flush_icache(void *start, void *end)
+{
+ smp_wmb();
+ flush_icache_range((unsigned long)start, (unsigned long)end);
+}
+
+static inline bool bpf_is_seen_register(struct codegen_context *ctx, int i)
+{
+ return (ctx->seen & (3 << (30 - b2p[i])));
+}
+
+static inline void bpf_set_seen_register(struct codegen_context *ctx, int i)
+{
+ ctx->seen |= (3 << (30 - b2p[i]));
+}
+
+static inline bool bpf_has_stack_frame(struct codegen_context *ctx)
+{
+ /*
+ * We only need a stack frame if:
+ * - we call other functions (kernel helpers), or
+ * - the bpf program uses its stack area
+ * The latter condition is deduced from the usage of BPF_REG_FP
+ */
+ return true;
+}
+
+/*
+ * When not setting up our own stackframe, the redzone usage is:
+ *
+ * [ prev sp ] <-------------
+ * [ ... ] |
+ * sp (r1) ---> [ stack pointer ] --------------
+ * [ nv gpr save area ] 6*8
+ * [ tail_call_cnt ] 8
+ * [ local_tmp_var ] 8
+ * [ unused red zone ] 208 bytes protected
+ */
+static int bpf_jit_stack_local(struct codegen_context *ctx)
+{
+ if (bpf_has_stack_frame(ctx))
+ return STACK_FRAME_MIN_SIZE + ctx->stack_size;
+ else
+ return -(BPF_PPC_STACK_SAVE + 16);
+}
+
+static int bpf_jit_stack_tailcallcnt(struct codegen_context *ctx)
+{
+ return bpf_jit_stack_local(ctx) + 8;
+}
+
+static int bpf_jit_stack_offsetof(struct codegen_context *ctx, int reg)
+{
+ if (reg >= BPF_PPC_NVR_MIN && reg < 32)
+ return (bpf_has_stack_frame(ctx) ?
+ (BPF_PPC_STACKFRAME + ctx->stack_size) : 0)
+ - (4 * (32 - reg));
+
+ pr_err("BPF JIT is asking about unknown registers");
+ BUG();
+}
+
+static void bpf_jit_build_prologue(u32 *image, struct codegen_context *ctx)
+{
+ /*
+ * Initialize tail_call_cnt if we do tail calls.
+ * Otherwise, put in NOPs so that it can be skipped when we are
+ * invoked through a tail call.
+ */
+ if (ctx->seen & SEEN_TAILCALL) {
+ EMIT(PPC_RAW_LI(0, 0));
+ /* this goes in the redzone */
+ EMIT(PPC_RAW_STW(0, 1, -(BPF_PPC_STACK_SAVE + 8)));
+ } else {
+ EMIT(PPC_RAW_NOP());
+ EMIT(PPC_RAW_NOP());
+ }
+ EMIT(PPC_RAW_MR(b2p[BPF_REG_1], 3));
+ EMIT(PPC_RAW_LI(b2p[BPF_REG_1]-1, 0));
+
+#define BPF_TAILCALL_PROLOGUE_SIZE 16
+
+ if (bpf_is_seen_register(ctx, BPF_REG_5)) {
+ EMIT(PPC_RAW_LWZ(b2p[BPF_REG_5]-1, 1, 8));
+ EMIT(PPC_RAW_LWZ(b2p[BPF_REG_5], 1, 12));
+ }
+ /*
+ * We need a stack frame, but we don't necessarily need to
+ * save/restore LR unless we call other functions
+ */
+ if (ctx->seen & SEEN_FUNC) {
+ EMIT(PPC_INST_MFLR | __PPC_RT(R0));
+ EMIT(PPC_RAW_STW(0, 1, PPC_LR_STKOFF));
+ }
+
+ EMIT(PPC_RAW_STWU(1, 1, -(BPF_PPC_STACKFRAME + ctx->stack_size)));
+
+ /*
+ * Back up non-volatile regs -- BPF registers 6-10
+ * If we haven't created our own stack frame, we save these
+ * in the protected zone below the previous stack frame
+ */
+ EMIT(PPC_RAW_STMW(18, 1, bpf_jit_stack_offsetof(ctx, 18)));
+
+ /* Setup frame pointer to point to the bpf stack area */
+ if (bpf_is_seen_register(ctx, BPF_REG_FP))
+ EMIT(PPC_RAW_ADDI(b2p[BPF_REG_FP], 1, STACK_FRAME_MIN_SIZE + ctx->stack_size));
+}
+
+static void bpf_jit_emit_common_epilogue(u32 *image, struct codegen_context *ctx)
+{
+ /* Restore NVRs */
+ EMIT(PPC_RAW_LMW(18, 1, bpf_jit_stack_offsetof(ctx, 18)));
+
+ /* Tear down our stack frame */
+ EMIT(PPC_RAW_ADDI(1, 1, BPF_PPC_STACKFRAME + ctx->stack_size));
+ if (ctx->seen & SEEN_FUNC) {
+ EMIT(PPC_RAW_LWZ(0, 1, PPC_LR_STKOFF));
+ EMIT(PPC_RAW_MTLR(0));
+ }
+}
+
+static void bpf_jit_build_epilogue(u32 *image, struct codegen_context *ctx)
+{
+ EMIT(PPC_RAW_MR(3, b2p[BPF_REG_0]));
+
+ bpf_jit_emit_common_epilogue(image, ctx);
+
+ EMIT(PPC_RAW_BLR());
+}
+
+static void bpf_jit_emit_func_call(u32 *image, struct codegen_context *ctx, u64 func)
+{
+ /* Load function address into r0 */
+ PPC_LI32(0, func);
+ EMIT(PPC_RAW_MTLR(0));
+ EMIT(PPC_RAW_BLRL());
+}
+
+static void bpf_jit_emit_tail_call(u32 *image, struct codegen_context *ctx, u32 out)
+{
+ /*
+ * By now, the eBPF program has already setup parameters in r3, r4 and r5
+ * r3/BPF_REG_1 - pointer to ctx -- passed as is to the next bpf program
+ * r4/BPF_REG_2 - pointer to bpf_array
+ * r5/BPF_REG_3 - index in bpf_array
+ */
+ int b2p_bpf_array = b2p[BPF_REG_2];
+ int b2p_index = b2p[BPF_REG_3];
+
+ /*
+ * if (index >= array->map.max_entries)
+ * goto out;
+ */
+ EMIT(PPC_RAW_LWZ(0, b2p_bpf_array, offsetof(struct bpf_array, map.max_entries)));
+ EMIT(PPC_RAW_CMPLW(b2p_index, 0));
+ PPC_BCC(COND_GE, out);
+
+ /*
+ * if (tail_call_cnt > MAX_TAIL_CALL_CNT)
+ * goto out;
+ */
+ EMIT(PPC_RAW_LWZ(0, 1, bpf_jit_stack_tailcallcnt(ctx)));
+ EMIT(PPC_RAW_CMPLWI(0, MAX_TAIL_CALL_CNT));
+ PPC_BCC(COND_GT, out);
+
+ /*
+ * tail_call_cnt++;
+ */
+ EMIT(PPC_RAW_ADDI(0, 0, 1));
+ EMIT(PPC_RAW_STW(0, 1, bpf_jit_stack_tailcallcnt(ctx)));
+
+ /* prog = array->ptrs[index]; */
+ EMIT(PPC_RAW_MULI(0, b2p_index, 8));
+ EMIT(PPC_RAW_ADD(0, 0, b2p_bpf_array));
+ EMIT(PPC_RAW_LWZ(0, 0, offsetof(struct bpf_array, ptrs)));
+
+ /*
+ * if (prog == NULL)
+ * goto out;
+ */
+ EMIT(PPC_RAW_CMPLWI(0, 0));
+ PPC_BCC(COND_EQ, out);
+
+ /* goto *(prog->bpf_func + prologue_size); */
+ EMIT(PPC_RAW_LWZ(0, 0, offsetof(struct bpf_prog, bpf_func)));
+ EMIT(PPC_RAW_ADDI(0, 0, BPF_TAILCALL_PROLOGUE_SIZE));
+ EMIT(PPC_RAW_MTCTR(0));
+
+ EMIT(PPC_RAW_MR(3, b2p[BPF_REG_1]));
+
+ /* tear down stack, restore NVRs, ... */
+ bpf_jit_emit_common_epilogue(image, ctx);
+
+ EMIT(PPC_RAW_BCTR());
+ /* out: */
+}
+
+/* Assemble the body code between the prologue & epilogue */
+static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image,
+ struct codegen_context *ctx,
+ u32 *addrs, bool extra_pass)
+{
+ const struct bpf_insn *insn = fp->insnsi;
+ int flen = fp->len;
+ int i, ret;
+
+ /* Start of epilogue code - will only be valid 2nd pass onwards */
+ u32 exit_addr = addrs[flen];
+
+ for (i = 0; i < flen; i++) {
+ u32 code = insn[i].code;
+ u32 dst_reg = b2p[insn[i].dst_reg];
+ u32 dst_reg_h = dst_reg - 1;
+ u32 src_reg = b2p[insn[i].src_reg];
+ u32 src_reg_h = src_reg - 1;
+ u32 tmp_reg = b2p[TMP_REG];
+ s16 off = insn[i].off;
+ s32 imm = insn[i].imm;
+ bool func_addr_fixed;
+ u64 func_addr;
+ u32 true_cond;
+ u32 tmp_idx;
+
+ /*
+ * addrs[] maps a BPF bytecode address into a real offset from
+ * the start of the body code.
+ */
+ addrs[i] = ctx->idx * 4;
+
+ /*
+ * As an optimization, we note down which non-volatile registers
+ * are used so that we can only save/restore those in our
+ * prologue and epilogue. We do this here regardless of whether
+ * the actual BPF instruction uses src/dst registers or not
+ * (for instance, BPF_CALL does not use them). The expectation
+ * is that those instructions will have src_reg/dst_reg set to
+ * 0. Even otherwise, we just lose some prologue/epilogue
+ * optimization but everything else should work without
+ * any issues.
+ */
+ if (dst_reg >= BPF_PPC_NVR_MIN && dst_reg < 32)
+ bpf_set_seen_register(ctx, insn[i].dst_reg);
+
+ if (src_reg >= BPF_PPC_NVR_MIN && src_reg < 32)
+ bpf_set_seen_register(ctx, insn[i].src_reg);
+
+ switch (code) {
+ /*
+ * Arithmetic operations: ADD/SUB/MUL/DIV/MOD/NEG
+ */
+ case BPF_ALU | BPF_ADD | BPF_X: /* (u32) dst += (u32) src */
+ EMIT(PPC_RAW_ADD(dst_reg, dst_reg, src_reg));
+ break;
+ case BPF_ALU64 | BPF_ADD | BPF_X: /* dst += src */
+ EMIT(PPC_RAW_ADDC(dst_reg, dst_reg, src_reg));
+ EMIT(PPC_RAW_ADDE(dst_reg_h, dst_reg_h, src_reg_h));
+ break;
+ case BPF_ALU | BPF_SUB | BPF_X: /* (u32) dst -= (u32) src */
+ EMIT(PPC_RAW_SUB(dst_reg, dst_reg, src_reg));
+ break;
+ case BPF_ALU64 | BPF_SUB | BPF_X: /* dst -= src */
+ EMIT(PPC_RAW_SUBFC(dst_reg, src_reg, dst_reg));
+ EMIT(PPC_RAW_SUBFE(dst_reg_h, src_reg_h, dst_reg_h));
+ break;
+ case BPF_ALU | BPF_SUB | BPF_K: /* (u32) dst -= (u32) imm */
+ imm = -imm;
+ fallthrough;
+ case BPF_ALU | BPF_ADD | BPF_K: /* (u32) dst += (u32) imm */
+ if (IMM_HA(imm) & 0xffff)
+ EMIT(PPC_RAW_ADDIS(dst_reg, dst_reg, IMM_HA(imm)));
+ if (IMM_L(imm))
+ EMIT(PPC_RAW_ADDI(dst_reg, dst_reg, IMM_L(imm)));
+ break;
+ case BPF_ALU64 | BPF_SUB | BPF_K: /* dst -= imm */
+ imm = -imm;
+ fallthrough;
+ case BPF_ALU64 | BPF_ADD | BPF_K: /* dst += imm */
+ if (imm) {
+ PPC_LI32(0, imm);
+ EMIT(PPC_RAW_ADDC(dst_reg, dst_reg, 0));
+ if (imm >= 0)
+ EMIT(PPC_RAW_ADDZE(dst_reg_h, dst_reg_h));
+ else
+ EMIT(PPC_RAW_ADDME(dst_reg_h, dst_reg_h));
+ }
+ break;
+ case BPF_ALU | BPF_MUL | BPF_X: /* (u32) dst *= (u32) src */
+ EMIT(PPC_RAW_MULW(dst_reg, dst_reg, src_reg));
+ break;
+ case BPF_ALU64 | BPF_MUL | BPF_X: /* dst *= src */
+ EMIT(PPC_RAW_MULW(0, dst_reg, src_reg_h));
+ EMIT(PPC_RAW_MULW(dst_reg_h, dst_reg_h, src_reg));
+ EMIT(PPC_RAW_MULHWU(tmp_reg, dst_reg, src_reg));
+ EMIT(PPC_RAW_MULW(dst_reg, dst_reg, src_reg));
+ EMIT(PPC_RAW_ADD(dst_reg_h, dst_reg_h, 0));
+ EMIT(PPC_RAW_ADD(dst_reg_h, dst_reg_h, tmp_reg));
+ break;
+ case BPF_ALU | BPF_MUL | BPF_K: /* (u32) dst *= (u32) imm */
+ if (imm >= -32768 && imm < 32768) {
+ EMIT(PPC_RAW_MULI(dst_reg, dst_reg, imm));
+ } else {
+ PPC_LI32(0, imm);
+ EMIT(PPC_RAW_MULW(dst_reg, dst_reg, 0));
+ }
+ break;
+ case BPF_ALU64 | BPF_MUL | BPF_K: /* dst *= imm */
+ PPC_LI32(0, imm);
+ if (imm >= 0) {
+ EMIT(PPC_RAW_MULW(dst_reg_h, dst_reg_h, 0));
+ EMIT(PPC_RAW_MULW(dst_reg, dst_reg, 0));
+ EMIT(PPC_RAW_MULHWU(0, dst_reg, 0));
+ EMIT(PPC_RAW_ADD(dst_reg_h, dst_reg_h, 0));
+ } else {
+ EMIT(PPC_RAW_MULW(dst_reg_h, dst_reg_h, 0));
+ EMIT(PPC_RAW_NEG(tmp_reg, dst_reg));
+ EMIT(PPC_RAW_ADD(dst_reg_h, dst_reg_h, tmp_reg));
+ EMIT(PPC_RAW_MULHWU(tmp_reg, dst_reg, 0));
+ EMIT(PPC_RAW_MULW(dst_reg, dst_reg, 0));
+ EMIT(PPC_RAW_ADD(dst_reg_h, dst_reg_h, tmp_reg));
+ }
+ break;
+ case BPF_ALU | BPF_DIV | BPF_X: /* (u32) dst /= (u32) src */
+ EMIT(PPC_RAW_DIVWU(dst_reg, dst_reg, src_reg));
+ break;
+ case BPF_ALU | BPF_MOD | BPF_X: /* (u32) dst %= (u32) src */
+ EMIT(PPC_RAW_DIVWU(0, dst_reg, src_reg));
+ EMIT(PPC_RAW_MULW(0, src_reg, 0));
+ EMIT(PPC_RAW_SUB(dst_reg, dst_reg, 0));
+ break;
+ case BPF_ALU64 | BPF_DIV | BPF_X: /* dst /= src */
+ case BPF_ALU64 | BPF_MOD | BPF_X: /* dst %= src */
+ return -ENOTSUPP;
+ case BPF_ALU | BPF_DIV | BPF_K: /* (u32) dst /= (u32) imm */
+ if (imm == 0)
+ return -EINVAL;
+ else if (imm == 1)
+ break;
+
+ PPC_LI32(0, imm);
+ EMIT(PPC_RAW_DIVWU(dst_reg, dst_reg, 0));
+ if (!fp->aux->verifier_zext)
+ EMIT(PPC_RAW_LI(dst_reg_h, 0));
+ break;
+ case BPF_ALU | BPF_MOD | BPF_K: /* (u32) dst %= (u32) imm */
+ if (imm == 0)
+ return -EINVAL;
+
+ PPC_LI32(tmp_reg, imm);
+ EMIT(PPC_RAW_DIVWU(0, dst_reg, tmp_reg));
+ EMIT(PPC_RAW_MULW(0, tmp_reg, 0));
+ EMIT(PPC_RAW_SUB(dst_reg, dst_reg, 0));
+ break;
+ case BPF_ALU64 | BPF_MOD | BPF_K: /* dst %= imm */
+ case BPF_ALU64 | BPF_DIV | BPF_K: /* dst /= imm */
+ return -ENOTSUPP;
+ case BPF_ALU | BPF_NEG: /* (u32) dst = -dst */
+ EMIT(PPC_RAW_NEG(dst_reg, dst_reg));
+ break;
+ case BPF_ALU64 | BPF_NEG: /* dst = -dst */
+ EMIT(PPC_RAW_SUBFIC(dst_reg, dst_reg, 0));
+ EMIT(PPC_RAW_SUBFZE(dst_reg_h, dst_reg_h));
+ break;
+
+ /*
+ * Logical operations: AND/OR/XOR/[A]LSH/[A]RSH
+ */
+ case BPF_ALU64 | BPF_AND | BPF_X: /* dst = dst & src */
+ EMIT(PPC_RAW_AND(dst_reg_h, dst_reg_h, src_reg_h));
+ fallthrough;
+ case BPF_ALU | BPF_AND | BPF_X: /* (u32) dst = dst & src */
+ EMIT(PPC_RAW_AND(dst_reg, dst_reg, src_reg));
+ break;
+ case BPF_ALU64 | BPF_AND | BPF_K: /* dst = dst & imm */
+ if (imm >= 0)
+ EMIT(PPC_RAW_LI(dst_reg_h, 0));
+ fallthrough;
+ case BPF_ALU | BPF_AND | BPF_K: /* (u32) dst = dst & imm */
+ if (!IMM_H(imm)) {
+ EMIT(PPC_RAW_ANDI(dst_reg, dst_reg, IMM_L(imm)));
+ } else if (!IMM_L(imm)) {
+ EMIT(PPC_RAW_ANDIS(dst_reg, dst_reg, IMM_H(imm)));
+ } else {
+ PPC_LI32(0, imm);
+ EMIT(PPC_RAW_AND(dst_reg, dst_reg, 0));
+ }
+ break;
+ case BPF_ALU64 | BPF_OR | BPF_X: /* dst = dst | src */
+ EMIT(PPC_RAW_OR(dst_reg_h, dst_reg_h, src_reg_h));
+ fallthrough;
+ case BPF_ALU | BPF_OR | BPF_X: /* dst = (u32) dst | (u32) src */
+ EMIT(PPC_RAW_OR(dst_reg, dst_reg, src_reg));
+ break;
+ case BPF_ALU64 | BPF_OR | BPF_K:/* dst = dst | imm */
+ /* Sign-extended */
+ if (imm < 0)
+ EMIT(PPC_RAW_LI(dst_reg_h, -1));
+ fallthrough;
+ case BPF_ALU | BPF_OR | BPF_K:/* dst = (u32) dst | (u32) imm */
+ if (IMM_L(imm))
+ EMIT(PPC_RAW_ORI(dst_reg, dst_reg, IMM_L(imm)));
+ if (IMM_H(imm))
+ EMIT(PPC_RAW_ORIS(dst_reg, dst_reg, IMM_H(imm)));
+ break;
+ case BPF_ALU64 | BPF_XOR | BPF_X: /* dst ^= src */
+ EMIT(PPC_RAW_XOR(dst_reg_h, dst_reg_h, src_reg_h));
+ EMIT(PPC_RAW_XOR(dst_reg, dst_reg, src_reg));
+ break;
+ case BPF_ALU | BPF_XOR | BPF_X: /* (u32) dst ^= src */
+ EMIT(PPC_RAW_XOR(dst_reg, dst_reg, src_reg));
+ break;
+ case BPF_ALU64 | BPF_XOR | BPF_K: /* dst ^= imm */
+ if (imm < 0)
+ EMIT(PPC_RAW_NOR(dst_reg_h, dst_reg_h, dst_reg_h));
+ fallthrough;
+ case BPF_ALU | BPF_XOR | BPF_K: /* (u32) dst ^= (u32) imm */
+ if (IMM_L(imm))
+ EMIT(PPC_RAW_XORI(dst_reg, dst_reg, IMM_L(imm)));
+ if (IMM_H(imm))
+ EMIT(PPC_RAW_XORIS(dst_reg, dst_reg, IMM_H(imm)));
+ break;
+ case BPF_ALU | BPF_LSH | BPF_X: /* (u32) dst <<= (u32) src */
+ EMIT(PPC_RAW_SLW(dst_reg, dst_reg, src_reg));
+ break;
+ case BPF_ALU64 | BPF_LSH | BPF_X: /* dst <<= src; */
+ return -ENOTSUPP;
+ case BPF_ALU | BPF_LSH | BPF_K: /* (u32) dst <<== (u32) imm */
+ /* with imm 0, we still need to clear top 32 bits */
+ EMIT(PPC_RAW_SLWI(dst_reg, dst_reg, imm));
+ break;
+ case BPF_ALU64 | BPF_LSH | BPF_K: /* dst <<== imm */
+ if (imm != 0)
+ return -ENOTSUPP;
+ break;
+ case BPF_ALU | BPF_RSH | BPF_X: /* (u32) dst >>= (u32) src */
+ EMIT(PPC_RAW_SRW(dst_reg, dst_reg, src_reg));
+ break;
+ case BPF_ALU64 | BPF_RSH | BPF_X: /* dst >>= src */
+ return -ENOTSUPP;
+ case BPF_ALU | BPF_RSH | BPF_K: /* (u32) dst >>= (u32) imm */
+ EMIT(PPC_RAW_SRWI(dst_reg, dst_reg, imm));
+ break;
+ case BPF_ALU64 | BPF_RSH | BPF_K: /* dst >>= imm */
+ if (imm != 0)
+ return -ENOTSUPP;
+ break;
+ case BPF_ALU | BPF_ARSH | BPF_X: /* (s32) dst >>= src */
+ EMIT(PPC_RAW_SRAW(dst_reg_h, dst_reg, src_reg));
+ break;
+ case BPF_ALU64 | BPF_ARSH | BPF_X: /* (s64) dst >>= src */
+ return -ENOTSUPP;
+ case BPF_ALU | BPF_ARSH | BPF_K: /* (s32) dst >>= imm */
+ EMIT(PPC_RAW_SRAWI(dst_reg, dst_reg, imm));
+ break;
+ case BPF_ALU64 | BPF_ARSH | BPF_K: /* (s64) dst >>= imm */
+ if (imm != 0)
+ return -ENOTSUPP;
+ break;
+
+ /*
+ * MOV
+ */
+ case BPF_ALU64 | BPF_MOV | BPF_X: /* dst = src */
+ if (dst_reg_h != src_reg_h)
+ EMIT(PPC_RAW_MR(dst_reg_h, src_reg_h));
+ fallthrough;
+ case BPF_ALU | BPF_MOV | BPF_X: /* (u32) dst = src */
+ if (dst_reg != src_reg)
+ EMIT(PPC_RAW_MR(dst_reg, src_reg));
+ if (imm == 1) {
+ /* special mov32 for zext */
+ EMIT(PPC_RAW_LI(dst_reg_h, 0));
+ break;
+ }
+ break;
+ case BPF_ALU64 | BPF_MOV | BPF_K: /* dst = (s64) imm */
+ PPC_LI32(dst_reg, imm);
+ EMIT(PPC_RAW_LI(dst_reg_h, imm < 0 ? -1 : 0));
+ break;
+ case BPF_ALU | BPF_MOV | BPF_K: /* (u32) dst = imm */
+ PPC_LI32(dst_reg, imm);
+ if (!fp->aux->verifier_zext)
+ EMIT(PPC_RAW_LI(dst_reg_h, 0));
+ break;
+
+ /*
+ * BPF_FROM_BE/LE
+ */
+ case BPF_ALU | BPF_END | BPF_FROM_LE:
+ switch (imm) {
+ case 16:
+ /* Rotate 8 bits left & mask with 0x0000ff00 */
+ EMIT(PPC_RAW_RLWINM(0, dst_reg, 8, 16, 23));
+ /* Rotate 8 bits right & insert LSB to reg */
+ EMIT(PPC_RAW_RLWIMI(0, dst_reg, 24, 24, 31));
+ /* Move result back to dst_reg_h */
+ EMIT(PPC_RAW_MR(dst_reg, 0));
+ break;
+ case 32:
+ /*
+ * Rotate word left by 8 bits:
+ * 2 bytes are already in their final position
+ * -- byte 2 and 4 (of bytes 1, 2, 3 and 4)
+ */
+ EMIT(PPC_RAW_RLWINM(0, dst_reg, 8, 0, 31));
+ /* Rotate 24 bits and insert byte 1 */
+ EMIT(PPC_RAW_RLWIMI(0, dst_reg, 24, 0, 7));
+ /* Rotate 24 bits and insert byte 3 */
+ EMIT(PPC_RAW_RLWIMI(0, dst_reg, 24, 16, 23));
+ EMIT(PPC_RAW_MR(dst_reg, 0));
+ break;
+ case 64:
+ EMIT(PPC_RAW_RLWINM(tmp_reg, dst_reg, 8, 0, 31));
+ EMIT(PPC_RAW_RLWINM(0, dst_reg_h, 8, 0, 31));
+ /* Rotate 24 bits and insert byte 1 */
+ EMIT(PPC_RAW_RLWIMI(tmp_reg, dst_reg, 24, 0, 7));
+ EMIT(PPC_RAW_RLWIMI(0, dst_reg_h, 24, 0, 7));
+ /* Rotate 24 bits and insert byte 3 */
+ EMIT(PPC_RAW_RLWIMI(tmp_reg, dst_reg, 24, 16, 23));
+ EMIT(PPC_RAW_RLWIMI(0, dst_reg_h, 24, 16, 23));
+ EMIT(PPC_RAW_MR(dst_reg, 0));
+ EMIT(PPC_RAW_MR(dst_reg_h, tmp_reg));
+ break;
+ }
+ break;
+ case BPF_ALU | BPF_END | BPF_FROM_BE:
+ switch (imm) {
+ case 16:
+ /* zero-extend 16 bits into 32 bits */
+ EMIT(PPC_RAW_RLWINM(dst_reg, dst_reg, 0, 16, 31));
+ break;
+ case 32:
+ case 64:
+ /* nop */
+ break;
+ }
+ break;
+
+ /*
+ * BPF_ST(X)
+ */
+ case BPF_STX | BPF_MEM | BPF_B: /* *(u8 *)(dst + off) = src */
+ EMIT(PPC_RAW_STB(src_reg, dst_reg, off));
+ break;
+ case BPF_ST | BPF_MEM | BPF_B: /* *(u8 *)(dst + off) = imm */
+ PPC_LI32(0, imm);
+ EMIT(PPC_RAW_STB(0, dst_reg, off));
+ break;
+ case BPF_STX | BPF_MEM | BPF_H: /* (u16 *)(dst + off) = src */
+ EMIT(PPC_RAW_STH(src_reg, dst_reg, off));
+ break;
+ case BPF_ST | BPF_MEM | BPF_H: /* (u16 *)(dst + off) = imm */
+ PPC_LI32(0, imm);
+ EMIT(PPC_RAW_STH(0, dst_reg, off));
+ break;
+ case BPF_STX | BPF_MEM | BPF_W: /* *(u32 *)(dst + off) = src */
+ EMIT(PPC_RAW_STW(src_reg, dst_reg, off));
+ break;
+ case BPF_ST | BPF_MEM | BPF_W: /* *(u32 *)(dst + off) = imm */
+ PPC_LI32(0, imm);
+ EMIT(PPC_RAW_STW(0, dst_reg, off));
+ break;
+ case BPF_STX | BPF_MEM | BPF_DW: /* (u64 *)(dst + off) = src */
+ EMIT(PPC_RAW_STW(src_reg_h, dst_reg, off));
+ EMIT(PPC_RAW_STW(src_reg, dst_reg, off+4));
+ break;
+ case BPF_ST | BPF_MEM | BPF_DW: /* *(u64 *)(dst + off) = imm */
+ PPC_LI32(0, imm);
+ EMIT(PPC_RAW_STW(0, dst_reg, off+4));
+ EMIT(PPC_RAW_LI(0, imm < 0 ? -1 : 0));
+ EMIT(PPC_RAW_STW(0, dst_reg, off));
+ break;
+
+ /*
+ * BPF_STX XADD (atomic_add)
+ */
+ /* *(u32 *)(dst + off) += src */
+ case BPF_STX | BPF_XADD | BPF_W:
+ /* Get offset into TMP_REG */
+ EMIT(PPC_RAW_LI(tmp_reg, off));
+ tmp_idx = ctx->idx * 4;
+ /* load value from memory into r0 */
+ EMIT(PPC_RAW_LWARX(0, tmp_reg, dst_reg, 0));
+ /* add value from src_reg into this */
+ EMIT(PPC_RAW_ADD(0, 0, src_reg));
+ /* store result back */
+ EMIT(PPC_RAW_STWCX(0, tmp_reg, dst_reg));
+ /* we're done if this succeeded */
+ PPC_BCC_SHORT(COND_NE, tmp_idx);
+ break;
+ /* *(u64 *)(dst + off) += src */
+ case BPF_STX | BPF_XADD | BPF_DW:
+ return -ENOTSUPP;
+
+ /*
+ * BPF_LDX
+ */
+ /* dst = *(u8 *)(ul) (src + off) */
+ case BPF_LDX | BPF_MEM | BPF_B:
+ EMIT(PPC_RAW_LBZ(dst_reg, src_reg, off));
+ if (!fp->aux->verifier_zext)
+ EMIT(PPC_RAW_LI(dst_reg_h, 0));
+ break;
+ /* dst = *(u16 *)(ul) (src + off) */
+ case BPF_LDX | BPF_MEM | BPF_H:
+ EMIT(PPC_RAW_LHZ(dst_reg, src_reg, off));
+ if (!fp->aux->verifier_zext)
+ EMIT(PPC_RAW_LI(dst_reg_h, 0));
+ break;
+ /* dst = *(u32 *)(ul) (src + off) */
+ case BPF_LDX | BPF_MEM | BPF_W:
+ EMIT(PPC_RAW_LWZ(dst_reg, src_reg, off));
+ if (!fp->aux->verifier_zext)
+ EMIT(PPC_RAW_LI(dst_reg_h, 0));
+ break;
+ /* dst = *(u64 *)(ul) (src + off) */
+ case BPF_LDX | BPF_MEM | BPF_DW:
+ EMIT(PPC_RAW_LWZ(dst_reg_h, src_reg, off));
+ EMIT(PPC_RAW_LWZ(dst_reg, src_reg, off+4));
+ break;
+
+ /*
+ * Doubleword load
+ * 16 byte instruction that uses two 'struct bpf_insn'
+ */
+ case BPF_LD | BPF_IMM | BPF_DW: /* dst = (u64) imm */
+ PPC_LI32(dst_reg_h, (u32)insn[i + 1].imm);
+ PPC_LI32(dst_reg, (u32)insn[i].imm);
+ /* Adjust for two bpf instructions */
+ addrs[++i] = ctx->idx * 4;
+ break;
+
+ /*
+ * Return/Exit
+ */
+ case BPF_JMP | BPF_EXIT:
+ /*
+ * If this isn't the very last instruction, branch to
+ * the epilogue. If we _are_ the last instruction,
+ * we'll just fall through to the epilogue.
+ */
+ if (i != flen - 1)
+ PPC_JMP(exit_addr);
+ /* else fall through to the epilogue */
+ break;
+
+ /*
+ * Call kernel helper or bpf function
+ */
+ case BPF_JMP | BPF_CALL:
+ ctx->seen |= SEEN_FUNC;
+
+ ret = bpf_jit_get_func_addr(fp, &insn[i], extra_pass,
+ &func_addr, &func_addr_fixed);
+ if (ret < 0)
+ return ret;
+
+ bpf_jit_emit_func_call(image, ctx, func_addr);
+
+ EMIT(PPC_RAW_MR(b2p[BPF_REG_0]-1, 3));
+ EMIT(PPC_RAW_MR(b2p[BPF_REG_0], 4));
+ break;
+
+ /*
+ * Jumps and branches
+ */
+ case BPF_JMP | BPF_JA:
+ PPC_JMP(addrs[i + 1 + off]);
+ break;
+
+ case BPF_JMP | BPF_JGT | BPF_K:
+ case BPF_JMP | BPF_JGT | BPF_X:
+ case BPF_JMP | BPF_JSGT | BPF_K:
+ case BPF_JMP | BPF_JSGT | BPF_X:
+ case BPF_JMP32 | BPF_JGT | BPF_K:
+ case BPF_JMP32 | BPF_JGT | BPF_X:
+ case BPF_JMP32 | BPF_JSGT | BPF_K:
+ case BPF_JMP32 | BPF_JSGT | BPF_X:
+ true_cond = COND_GT;
+ goto cond_branch;
+ case BPF_JMP | BPF_JLT | BPF_K:
+ case BPF_JMP | BPF_JLT | BPF_X:
+ case BPF_JMP | BPF_JSLT | BPF_K:
+ case BPF_JMP | BPF_JSLT | BPF_X:
+ case BPF_JMP32 | BPF_JLT | BPF_K:
+ case BPF_JMP32 | BPF_JLT | BPF_X:
+ case BPF_JMP32 | BPF_JSLT | BPF_K:
+ case BPF_JMP32 | BPF_JSLT | BPF_X:
+ true_cond = COND_LT;
+ goto cond_branch;
+ case BPF_JMP | BPF_JGE | BPF_K:
+ case BPF_JMP | BPF_JGE | BPF_X:
+ case BPF_JMP | BPF_JSGE | BPF_K:
+ case BPF_JMP | BPF_JSGE | BPF_X:
+ case BPF_JMP32 | BPF_JGE | BPF_K:
+ case BPF_JMP32 | BPF_JGE | BPF_X:
+ case BPF_JMP32 | BPF_JSGE | BPF_K:
+ case BPF_JMP32 | BPF_JSGE | BPF_X:
+ true_cond = COND_GE;
+ goto cond_branch;
+ case BPF_JMP | BPF_JLE | BPF_K:
+ case BPF_JMP | BPF_JLE | BPF_X:
+ case BPF_JMP | BPF_JSLE | BPF_K:
+ case BPF_JMP | BPF_JSLE | BPF_X:
+ case BPF_JMP32 | BPF_JLE | BPF_K:
+ case BPF_JMP32 | BPF_JLE | BPF_X:
+ case BPF_JMP32 | BPF_JSLE | BPF_K:
+ case BPF_JMP32 | BPF_JSLE | BPF_X:
+ true_cond = COND_LE;
+ goto cond_branch;
+ case BPF_JMP | BPF_JEQ | BPF_K:
+ case BPF_JMP | BPF_JEQ | BPF_X:
+ case BPF_JMP32 | BPF_JEQ | BPF_K:
+ case BPF_JMP32 | BPF_JEQ | BPF_X:
+ true_cond = COND_EQ;
+ goto cond_branch;
+ case BPF_JMP | BPF_JNE | BPF_K:
+ case BPF_JMP | BPF_JNE | BPF_X:
+ case BPF_JMP32 | BPF_JNE | BPF_K:
+ case BPF_JMP32 | BPF_JNE | BPF_X:
+ true_cond = COND_NE;
+ goto cond_branch;
+ case BPF_JMP | BPF_JSET | BPF_K:
+ case BPF_JMP | BPF_JSET | BPF_X:
+ case BPF_JMP32 | BPF_JSET | BPF_K:
+ case BPF_JMP32 | BPF_JSET | BPF_X:
+ true_cond = COND_NE;
+ /* Fall through */
+
+cond_branch:
+ switch (code) {
+ case BPF_JMP | BPF_JGT | BPF_X:
+ case BPF_JMP | BPF_JLT | BPF_X:
+ case BPF_JMP | BPF_JGE | BPF_X:
+ case BPF_JMP | BPF_JLE | BPF_X:
+ case BPF_JMP | BPF_JEQ | BPF_X:
+ case BPF_JMP | BPF_JNE | BPF_X:
+ /* unsigned comparison */
+ EMIT(PPC_RAW_CMPLW(dst_reg_h, src_reg_h));
+ PPC_BCC_SHORT(COND_NE, (ctx->idx + 2) * 4);
+ EMIT(PPC_RAW_CMPLW(dst_reg, src_reg));
+ break;
+ case BPF_JMP32 | BPF_JGT | BPF_X:
+ case BPF_JMP32 | BPF_JLT | BPF_X:
+ case BPF_JMP32 | BPF_JGE | BPF_X:
+ case BPF_JMP32 | BPF_JLE | BPF_X:
+ case BPF_JMP32 | BPF_JEQ | BPF_X:
+ case BPF_JMP32 | BPF_JNE | BPF_X:
+ /* unsigned comparison */
+ EMIT(PPC_RAW_CMPLW(dst_reg, src_reg));
+ break;
+ case BPF_JMP | BPF_JSGT | BPF_X:
+ case BPF_JMP | BPF_JSLT | BPF_X:
+ case BPF_JMP | BPF_JSGE | BPF_X:
+ case BPF_JMP | BPF_JSLE | BPF_X:
+ /* signed comparison */
+ EMIT(PPC_RAW_CMPW(dst_reg_h, src_reg_h));
+ PPC_BCC_SHORT(COND_NE, (ctx->idx + 2) * 4);
+ EMIT(PPC_RAW_CMPLW(dst_reg, src_reg));
+ break;
+ case BPF_JMP32 | BPF_JSGT | BPF_X:
+ case BPF_JMP32 | BPF_JSLT | BPF_X:
+ case BPF_JMP32 | BPF_JSGE | BPF_X:
+ case BPF_JMP32 | BPF_JSLE | BPF_X:
+ /* signed comparison */
+ EMIT(PPC_RAW_CMPW(dst_reg, src_reg));
+ break;
+ case BPF_JMP | BPF_JSET | BPF_X:
+ EMIT(PPC_RAW_AND_DOT(0, dst_reg_h, src_reg_h));
+ PPC_BCC_SHORT(COND_NE, (ctx->idx + 2) * 4);
+ EMIT(PPC_RAW_AND_DOT(0, dst_reg, src_reg));
+ break;
+ case BPF_JMP32 | BPF_JSET | BPF_X: {
+ EMIT(PPC_RAW_AND_DOT(0, dst_reg, src_reg));
+ break;
+ case BPF_JMP | BPF_JNE | BPF_K:
+ case BPF_JMP | BPF_JEQ | BPF_K:
+ case BPF_JMP | BPF_JGT | BPF_K:
+ case BPF_JMP | BPF_JLT | BPF_K:
+ case BPF_JMP | BPF_JGE | BPF_K:
+ case BPF_JMP | BPF_JLE | BPF_K:
+ /*
+ * Need sign-extended load, so only positive
+ * values can be used as imm in cmpldi
+ */
+ if (imm >= 0 && imm < 32768) {
+ EMIT(PPC_RAW_CMPLWI(dst_reg_h, 0));
+ PPC_BCC_SHORT(COND_NE, (ctx->idx + 2) * 4);
+ EMIT(PPC_RAW_CMPLWI(dst_reg, imm));
+ } else {
+ /* sign-extending load ... but unsigned comparison */
+ EMIT(PPC_RAW_LI(0, imm < 0 ? -1 : 0));
+ EMIT(PPC_RAW_CMPLW(dst_reg_h, 0));
+ PPC_LI32(0, imm);
+ PPC_BCC_SHORT(COND_NE, (ctx->idx + 2) * 4);
+ EMIT(PPC_RAW_CMPLW(dst_reg, 0));
+ }
+ break;
+ case BPF_JMP32 | BPF_JNE | BPF_K:
+ case BPF_JMP32 | BPF_JEQ | BPF_K:
+ case BPF_JMP32 | BPF_JGT | BPF_K:
+ case BPF_JMP32 | BPF_JLT | BPF_K:
+ case BPF_JMP32 | BPF_JGE | BPF_K:
+ case BPF_JMP32 | BPF_JLE | BPF_K:
+ /*
+ * Need sign-extended load, so only positive
+ * values can be used as imm in cmpldi
+ */
+ if (imm >= 0 && imm < 65536) {
+ EMIT(PPC_RAW_CMPLWI(dst_reg, imm));
+ } else {
+ /* sign-extending load */
+ PPC_LI32(0, imm);
+ /* ... but unsigned comparison */
+ EMIT(PPC_RAW_CMPLW(dst_reg, 0));
+ }
+ break;
+ }
+ case BPF_JMP | BPF_JSGT | BPF_K:
+ case BPF_JMP | BPF_JSLT | BPF_K:
+ case BPF_JMP | BPF_JSGE | BPF_K:
+ case BPF_JMP | BPF_JSLE | BPF_K:
+ /*
+ * signed comparison, so any 16-bit value
+ * can be used in cmpdi
+ */
+ if (imm >= 0 && imm < 65536) {
+ EMIT(PPC_RAW_CMPWI(dst_reg_h, imm < 0 ? -1 : 0));
+ PPC_BCC_SHORT(COND_NE, (ctx->idx + 2) * 4);
+ EMIT(PPC_RAW_CMPLWI(dst_reg, imm));
+ } else {
+ /* sign-extending load */
+ EMIT(PPC_RAW_CMPWI(dst_reg_h, imm < 0 ? -1 : 0));
+ PPC_LI32(0, imm);
+ PPC_BCC_SHORT(COND_NE, (ctx->idx + 2) * 4);
+ EMIT(PPC_RAW_CMPLW(dst_reg, 0));
+ }
+ break;
+ case BPF_JMP32 | BPF_JSGT | BPF_K:
+ case BPF_JMP32 | BPF_JSLT | BPF_K:
+ case BPF_JMP32 | BPF_JSGE | BPF_K:
+ case BPF_JMP32 | BPF_JSLE | BPF_K:
+ /*
+ * signed comparison, so any 16-bit value
+ * can be used in cmpdi
+ */
+ if (imm >= -32768 && imm < 32768) {
+ EMIT(PPC_RAW_CMPWI(dst_reg, imm));
+ } else {
+ /* sign-extending load */
+ PPC_LI32(0, imm);
+ EMIT(PPC_RAW_CMPW(dst_reg, 0));
+ }
+ break;
+ case BPF_JMP | BPF_JSET | BPF_K:
+ /* andi does not sign-extend the immediate */
+ if (imm >= 0 && imm < 32768) {
+ /* PPC_ANDI is _only/always_ dot-form */
+ EMIT(PPC_RAW_ANDI(0, dst_reg, imm));
+ } else {
+ PPC_LI32(0, imm);
+ if (imm < 0) {
+ EMIT(PPC_RAW_CMPWI(dst_reg_h, 0));
+ PPC_BCC_SHORT(COND_NE, (ctx->idx + 2) * 4);
+ }
+ EMIT(PPC_RAW_AND_DOT(0, dst_reg, 0));
+ }
+ break;
+ case BPF_JMP32 | BPF_JSET | BPF_K:
+ /* andi does not sign-extend the immediate */
+ if (imm >= -32768 && imm < 32768)
+ /* PPC_ANDI is _only/always_ dot-form */
+ EMIT(PPC_RAW_ANDI(0, dst_reg, imm));
+ else {
+ PPC_LI32(0, imm);
+ EMIT(PPC_RAW_AND_DOT(0, dst_reg, 0));
+ }
+ break;
+ }
+ PPC_BCC(true_cond, addrs[i + 1 + off]);
+ break;
+
+ /*
+ * Tail call
+ */
+ case BPF_JMP | BPF_TAIL_CALL:
+ ctx->seen |= SEEN_TAILCALL;
+ bpf_jit_emit_tail_call(image, ctx, addrs[i + 1]);
+ break;
+
+ default:
+ /*
+ * The filter contains something cruel & unusual.
+ * We don't handle it, but also there shouldn't be
+ * anything missing from our list.
+ */
+ pr_err_ratelimited("eBPF filter opcode %04x (@%d) unsupported\n", code, i);
+ return -ENOTSUPP;
+ }
+ }
+
+ /* Set end-of-body-code address for exit. */
+ addrs[i] = ctx->idx * 4;
+
+ return 0;
+}
+
+/* Fix the branch target addresses for subprog calls */
+static int bpf_jit_fixup_subprog_calls(struct bpf_prog *fp, u32 *image,
+ struct codegen_context *ctx, u32 *addrs)
+{
+ const struct bpf_insn *insn = fp->insnsi;
+ bool func_addr_fixed;
+ u64 func_addr;
+ u32 tmp_idx;
+ int i, ret;
+
+ for (i = 0; i < fp->len; i++) {
+ /*
+ * During the extra pass, only the branch target addresses for
+ * the subprog calls need to be fixed. All other instructions
+ * can left untouched.
+ *
+ * The JITed image length does not change because we already
+ * ensure that the JITed instruction sequence for these calls
+ * are of fixed length by padding them with NOPs.
+ */
+ if (insn[i].code == (BPF_JMP | BPF_CALL) &&
+ insn[i].src_reg == BPF_PSEUDO_CALL) {
+ ret = bpf_jit_get_func_addr(fp, &insn[i], true,
+ &func_addr,
+ &func_addr_fixed);
+ if (ret < 0)
+ return ret;
+
+ /*
+ * Save ctx->idx as this would currently point to the
+ * end of the JITed image and set it to the offset of
+ * the instruction sequence corresponding to the
+ * subprog call temporarily.
+ */
+ tmp_idx = ctx->idx;
+ ctx->idx = addrs[i] / 4;
+ bpf_jit_emit_func_call(image, ctx, func_addr);
+
+ /*
+ * Restore ctx->idx here. This is safe as the length
+ * of the JITed sequence remains unchanged.
+ */
+ ctx->idx = tmp_idx;
+ }
+ }
+
+ return 0;
+}
+
+struct powerpc64_jit_data {
+ struct bpf_binary_header *header;
+ u32 *addrs;
+ u8 *image;
+ u32 proglen;
+ struct codegen_context ctx;
+};
+
+bool bpf_jit_needs_zext(void)
+{
+ return true;
+}
+
+struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
+{
+ u32 proglen;
+ u32 alloclen;
+ u8 *image = NULL;
+ u32 *code_base;
+ u32 *addrs;
+ struct powerpc64_jit_data *jit_data;
+ struct codegen_context cgctx;
+ int pass;
+ int flen;
+ struct bpf_binary_header *bpf_hdr;
+ struct bpf_prog *org_fp = fp;
+ struct bpf_prog *tmp_fp;
+ bool bpf_blinded = false;
+ bool extra_pass = false;
+
+ if (!fp->jit_requested)
+ return org_fp;
+
+ tmp_fp = bpf_jit_blind_constants(org_fp);
+ if (IS_ERR(tmp_fp))
+ return org_fp;
+
+ if (tmp_fp != org_fp) {
+ bpf_blinded = true;
+ fp = tmp_fp;
+ }
+
+ jit_data = fp->aux->jit_data;
+ if (!jit_data) {
+ jit_data = kzalloc(sizeof(*jit_data), GFP_KERNEL);
+ if (!jit_data) {
+ fp = org_fp;
+ goto out;
+ }
+ fp->aux->jit_data = jit_data;
+ }
+
+ flen = fp->len;
+ addrs = jit_data->addrs;
+ if (addrs) {
+ cgctx = jit_data->ctx;
+ image = jit_data->image;
+ bpf_hdr = jit_data->header;
+ proglen = jit_data->proglen;
+ alloclen = proglen + FUNCTION_DESCR_SIZE;
+ extra_pass = true;
+ goto skip_init_ctx;
+ }
+
+ addrs = kcalloc(flen + 1, sizeof(*addrs), GFP_KERNEL);
+ if (addrs == NULL) {
+ fp = org_fp;
+ goto out_addrs;
+ }
+
+ memset(&cgctx, 0, sizeof(struct codegen_context));
+
+ /* Make sure that the stack is quadword aligned. */
+ cgctx.stack_size = round_up(fp->aux->stack_depth, 16);
+
+ /* Scouting faux-generate pass 0 */
+ if (bpf_jit_build_body(fp, 0, &cgctx, addrs, false)) {
+ /* We hit something illegal or unsupported. */
+ fp = org_fp;
+ goto out_addrs;
+ }
+
+ /*
+ * If we have seen a tail call, we need a second pass.
+ * This is because bpf_jit_emit_common_epilogue() is called
+ * from bpf_jit_emit_tail_call() with a not yet stable ctx->seen.
+ */
+ if (cgctx.seen & SEEN_TAILCALL) {
+ cgctx.idx = 0;
+ if (bpf_jit_build_body(fp, 0, &cgctx, addrs, false)) {
+ fp = org_fp;
+ goto out_addrs;
+ }
+ }
+
+ /*
+ * Pretend to build prologue, given the features we've seen. This will
+ * update ctgtx.idx as it pretends to output instructions, then we can
+ * calculate total size from idx.
+ */
+ bpf_jit_build_prologue(0, &cgctx);
+ bpf_jit_build_epilogue(0, &cgctx);
+
+ proglen = cgctx.idx * 4;
+ alloclen = proglen + FUNCTION_DESCR_SIZE;
+
+ bpf_hdr = bpf_jit_binary_alloc(alloclen, &image, 4,
+ bpf_jit_fill_ill_insns);
+ if (!bpf_hdr) {
+ fp = org_fp;
+ goto out_addrs;
+ }
+
+skip_init_ctx:
+ code_base = (u32 *)(image + FUNCTION_DESCR_SIZE);
+
+ if (extra_pass) {
+ /*
+ * Do not touch the prologue and epilogue as they will remain
+ * unchanged. Only fix the branch target address for subprog
+ * calls in the body.
+ *
+ * This does not change the offsets and lengths of the subprog
+ * call instruction sequences and hence, the size of the JITed
+ * image as well.
+ */
+ bpf_jit_fixup_subprog_calls(fp, code_base, &cgctx, addrs);
+
+ /* There is no need to perform the usual passes. */
+ goto skip_codegen_passes;
+ }
+
+ /* Code generation passes 1-2 */
+ for (pass = 1; pass < 3; pass++) {
+ /* Now build the prologue, body code & epilogue for real. */
+ cgctx.idx = 0;
+ bpf_jit_build_prologue(code_base, &cgctx);
+ bpf_jit_build_body(fp, code_base, &cgctx, addrs, extra_pass);
+ bpf_jit_build_epilogue(code_base, &cgctx);
+
+ if (bpf_jit_enable > 1)
+ pr_info("Pass %d: shrink = %d, seen = 0x%x\n", pass,
+ proglen - (cgctx.idx * 4), cgctx.seen);
+ }
+
+skip_codegen_passes:
+ if (bpf_jit_enable > 1)
+ /*
+ * Note that we output the base address of the code_base
+ * rather than image, since opcodes are in code_base.
+ */
+ bpf_jit_dump(flen, proglen, pass, code_base);
+
+#ifdef PPC64_ELF_ABI_v1
+ /* Function descriptor nastiness: Address + TOC */
+ ((u64 *)image)[0] = (u64)code_base;
+ ((u64 *)image)[1] = local_paca->kernel_toc;
+#endif
+
+ fp->bpf_func = (void *)image;
+ fp->jited = 1;
+ fp->jited_len = alloclen;
+
+ bpf_flush_icache(bpf_hdr, (u8 *)bpf_hdr + (bpf_hdr->pages * PAGE_SIZE));
+ if (!fp->is_func || extra_pass) {
+ bpf_prog_fill_jited_linfo(fp, addrs);
+out_addrs:
+ kfree(addrs);
+ kfree(jit_data);
+ fp->aux->jit_data = NULL;
+ } else {
+ jit_data->addrs = addrs;
+ jit_data->ctx = cgctx;
+ jit_data->proglen = proglen;
+ jit_data->image = image;
+ jit_data->header = bpf_hdr;
+ }
+
+out:
+ if (bpf_blinded)
+ bpf_jit_prog_release_other(fp, fp == org_fp ? tmp_fp : org_fp);
+
+ return fp;
+}
+
+/* Overriding bpf_jit_free() as we don't set images read-only. */
+void bpf_jit_free(struct bpf_prog *fp)
+{
+ unsigned long addr = (unsigned long)fp->bpf_func & PAGE_MASK;
+ struct bpf_binary_header *bpf_hdr = (void *)addr;
+
+ if (fp->jited)
+ bpf_jit_binary_free(bpf_hdr);
+
+ bpf_prog_unlock_free(fp);
+}
--
2.25.0
^ permalink raw reply related
* Re: [PATCH] powerpc/mm: Refactor the floor/ceiling check in hugetlb range freeing functions
From: Qian Cai @ 2020-12-11 16:17 UTC (permalink / raw)
To: Christophe Leroy, Benjamin Herrenschmidt, Paul Mackerras,
Michael Ellerman
Cc: linuxppc-dev, linux-kernel
In-Reply-To: <16a571bb32eb6e8cd44bda484c8d81cd8a25e6d7.1604668827.git.christophe.leroy@csgroup.eu>
On Fri, 2020-11-06 at 13:20 +0000, Christophe Leroy wrote:
> All hugetlb range freeing functions have a verification like the following,
> which only differs by the mask used, depending on the page table level.
>
> start &= MASK;
> if (start < floor)
> return;
> if (ceiling) {
> ceiling &= MASK;
> if (! ceiling)
> return;
> }
> if (end - 1 > ceiling - 1)
> return;
>
> Refactor that into a helper function which takes the mask as
> an argument, returning true when [start;end[ is not fully
> contained inside [floor;ceiling[
>
> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
> ---
> arch/powerpc/mm/hugetlbpage.c | 56 ++++++++++++-----------------------
> 1 file changed, 19 insertions(+), 37 deletions(-)
>
> diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
> index 36c3800769fb..f8d8a4988e15 100644
> --- a/arch/powerpc/mm/hugetlbpage.c
> +++ b/arch/powerpc/mm/hugetlbpage.c
> @@ -294,6 +294,21 @@ static void hugepd_free(struct mmu_gather *tlb, void *hugepte)
> static inline void hugepd_free(struct mmu_gather *tlb, void *hugepte) {}
> #endif
>
> +/* Return true when the entry to be freed maps more than the area being freed */
> +static bool range_is_outside_limits(unsigned long start, unsigned long end,
> + unsigned long floor, unsigned long ceiling,
> + unsigned long mask)
> +{
> + if ((start & mask) < floor)
> + return true;
> + if (ceiling) {
> + ceiling &= mask;
> + if (!ceiling)
> + return true;
> + }
> + return end - 1 > ceiling - 1;
> +}
> +
> static void free_hugepd_range(struct mmu_gather *tlb, hugepd_t *hpdp, int pdshift,
> unsigned long start, unsigned long end,
> unsigned long floor, unsigned long ceiling)
> @@ -309,15 +324,7 @@ static void free_hugepd_range(struct mmu_gather *tlb, hugepd_t *hpdp, int pdshif
> if (shift > pdshift)
> num_hugepd = 1 << (shift - pdshift);
>
> - start &= pdmask;
> - if (start < floor)
> - return;
> - if (ceiling) {
> - ceiling &= pdmask;
> - if (! ceiling)
> - return;
> - }
> - if (end - 1 > ceiling - 1)
> + if (range_is_outside_limits(start, end, floor, ceiling, pdmask))
> return;
>
> for (i = 0; i < num_hugepd; i++, hpdp++)
> @@ -334,18 +341,9 @@ static void hugetlb_free_pte_range(struct mmu_gather *tlb, pmd_t *pmd,
> unsigned long addr, unsigned long end,
> unsigned long floor, unsigned long ceiling)
> {
> - unsigned long start = addr;
> pgtable_t token = pmd_pgtable(*pmd);
>
> - start &= PMD_MASK;
> - if (start < floor)
> - return;
> - if (ceiling) {
> - ceiling &= PMD_MASK;
> - if (!ceiling)
> - return;
> - }
> - if (end - 1 > ceiling - 1)
> + if (range_is_outside_limits(addr, end, floor, ceiling, PMD_MASK))
> return;
>
> pmd_clear(pmd);
> @@ -395,15 +393,7 @@ static void hugetlb_free_pmd_range(struct mmu_gather *tlb, pud_t *pud,
> addr, next, floor, ceiling);
> } while (addr = next, addr != end);
>
> - start &= PUD_MASK;
> - if (start < floor)
> - return;
> - if (ceiling) {
> - ceiling &= PUD_MASK;
> - if (!ceiling)
> - return;
> - }
> - if (end - 1 > ceiling - 1)
> + if (range_is_outside_limits(start, end, floor, ceiling, PUD_MASK))
> return;
>
> pmd = pmd_offset(pud, start);
> @@ -446,15 +436,7 @@ static void hugetlb_free_pud_range(struct mmu_gather *tlb, p4d_t *p4d,
> }
> } while (addr = next, addr != end);
>
> - start &= PGDIR_MASK;
> - if (start < floor)
> - return;
> - if (ceiling) {
> - ceiling &= PGDIR_MASK;
> - if (!ceiling)
> - return;
> - }
> - if (end - 1 > ceiling - 1)
> + if (range_is_outside_limits(start, end, floor, ceiling, PGDIR_MASK))
> return;
>
> pud = pud_offset(p4d, start);
Well, "start" is still in use in hugetlb_free_pmd_range() and
hugetlb_free_pud_range() after range_is_outside_limits(), but after this patch,
"start" is not longer has the bitmask, i.e., "no &=".
Anyway, reverting this commit from today's linux-next fixed a crash on POWE9 NV.
# runltp -f hugetlb
[ 7703.114640][T58070] LTP: starting hugemmap05_1 (hugemmap05 -m)
[ 7703.157792][ C99] ------------[ cut here ]------------
[ 7703.158279][ C99] kernel BUG at arch/powerpc/mm/book3s64/pgtable.c:387!
[ 7703.158306][ C99] Oops: Exception in kernel mode, sig: 5 [#1]
[ 7703.158330][ C99] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=256 NUMA PowerNV
[ 7703.158343][ C99] Modules linked in: vfio_pci vfio_virqfd vfio_iommu_spapr_tce vfio vfio_spapr_eeh loop kvm_hv kvm ip_tables x_tables sd_mod ahci libahci tg3 libata firmware_class libphy dm_mirror dm_region_hash dm_log dm_mod [last unloaded: dummy_del_mod]
[ 7703.158435][ C99] CPU: 99 PID: 308 Comm: ksoftirqd/99 Tainted: G O 5.10.0-rc7-next-20201211 #1
[ 7703.158464][ C99] NIP: c00000000005dbec LR: c0000000003352f4 CTR: 0000000000000000
[ 7703.158489][ C99] REGS: c00020000bb6f830 TRAP: 0700 Tainted: G O (5.10.0-rc7-next-20201211)
[ 7703.158528][ C99] MSR: 900000000282b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 24002284 XER: 20040000
[ 7703.158570][ C99] GPR00: c0000000003352f4 c00020000bb6fad0 c000000007f70b00 c0002000385b3ff0
[ 7703.158570][ C99] GPR04: 0000000000000000 0000000000000003 c00020000bb6f8b4 0000000000000001
[ 7703.158570][ C99] GPR08: 0000000000000001 0000000000000009 0000000000000008 0000000000000002
[ 7703.158570][ C99] GPR12: 0000000024002488 c000201fff649c00 c000000007f2a20c 0000000000000000
[ 7703.158570][ C99] GPR16: 0000000000000007 0000000000000000 c000000000194d10 c000000000194d10
[ 7703.158570][ C99] GPR24: 0000000000000014 0000000000000015 c000201cc6e72398 c000000007fac4b4
[ 7703.158570][ C99] GPR28: c000000007f2bf80 c000000007fac2f8 0000000000000008 c000200033870000
[ 7703.158766][ C99] NIP [c00000000005dbec] __tlb_remove_table+0x1dc/0x1e0
pgtable_free at arch/powerpc/mm/book3s64/pgtable.c:387
(inlined by) __tlb_remove_table at arch/powerpc/mm/book3s64/pgtable.c:405
[ 7703.158805][ C99] LR [c0000000003352f4] tlb_remove_table_rcu+0x54/0xa0
[ 7703.158853][ C99] Call Trace:
[ 7703.158872][ C99] [c00020000bb6fad0] [c00000000005db4c] __tlb_remove_table+0x13c/0x1e0 (unreliable)
[ 7703.158890][ C99] [c00020000bb6fb00] [c0000000003352f4] tlb_remove_table_rcu+0x54/0xa0
__tlb_remove_table_free at mm/mmu_gather.c:101
(inlined by) tlb_remove_table_rcu at mm/mmu_gather.c:156
[ 7703.158927][ C99] [c00020000bb6fb30] [c000000000194d7c] rcu_core+0x35c/0xbb0
rcu_do_batch at kernel/rcu/tree.c:2502
(inlined by) rcu_core at kernel/rcu/tree.c:2737
[ 7703.158966][ C99] [c00020000bb6fbf0] [c00000000095a3d0] __do_softirq+0x480/0x704
[ 7703.159006][ C99] [c00020000bb6fd10] [c0000000000cc1f4] run_ksoftirqd+0x74/0xd0
run_ksoftirqd at kernel/softirq.c:651
(inlined by) run_ksoftirqd at kernel/softirq.c:642
[ 7703.159046][ C99] [c00020000bb6fd30] [c0000000001040c8] smpboot_thread_fn+0x278/0x320
[ 7703.159096][ C99] [c00020000bb6fda0] [c0000000000fc8a4] kthread+0x1c4/0x1d0
[ 7703.159145][ C99] [c00020000bb6fe10] [c00000000000d9fc] ret_from_kernel_thread+0x5c/0x80
[ 7703.159183][ C99] Instruction dump:
[ 7703.159204][ C99] 60000000 7c0802a6 3c82f8b4 7fe3fb78 38847470 f8010040 482b4fc5 60000000
[ 7703.159248][ C99] 0fe00000 7c0802a6 fbe10028 f8010040 <0fe00000> 3c4c07f1 38422f10 7c0802a6
[ 7703.159293][ C99] ---[ end trace 1d92a5231ba6a0d5 ]---
^ permalink raw reply
* Re: [PATCH] net: ethernet: fs-enet: remove casting dma_alloc_coherent
From: Christophe Leroy @ 2020-12-11 16:42 UTC (permalink / raw)
To: David Laight, Xu Wang, pantelis.antoniou@gmail.com,
davem@davemloft.net, kuba@kernel.org,
linuxppc-dev@lists.ozlabs.org
Cc: linux-kernel@vger.kernel.org
In-Reply-To: <6fc4b62ee7754d78b8f7b9c2275bc47e@AcuMS.aculab.com>
Le 11/12/2020 à 17:07, David Laight a écrit :
> From: Christophe Leroy
>> Sent: 11 December 2020 15:22
>>
>> Le 11/12/2020 à 09:52, Xu Wang a écrit :
>>> Remove casting the values returned by dma_alloc_coherent.
>>
>> Can you explain more in the commit log ?
>>
>> As far as I can see, dma_alloc_coherent() doesn't return __iomem, and ring_base member is __iomem
>
> Which is probably wrong - that is the kernel address of kernel memory.
> So it shouldn't have the __iomem marker.
That's where the buffer descriptors are, the driver accesses to the content of the buffer
descriptors using the IO accessors in_be16()/out_be16(). Is it not correct ?
Christophe
>
> I wonder what else is wrong....
>
> David
>
>>
>> Christophe
>>
>>>
>>> Signed-off-by: Xu Wang <vulab@iscas.ac.cn>
>>> ---
>>> drivers/net/ethernet/freescale/fs_enet/mac-fec.c | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/net/ethernet/freescale/fs_enet/mac-fec.c
>> b/drivers/net/ethernet/freescale/fs_enet/mac-fec.c
>>> index 99fe2c210d0f..3ae345676e50 100644
>>> --- a/drivers/net/ethernet/freescale/fs_enet/mac-fec.c
>>> +++ b/drivers/net/ethernet/freescale/fs_enet/mac-fec.c
>>> @@ -131,7 +131,7 @@ static int allocate_bd(struct net_device *dev)
>>> struct fs_enet_private *fep = netdev_priv(dev);
>>> const struct fs_platform_info *fpi = fep->fpi;
>>>
>>> - fep->ring_base = (void __force __iomem *)dma_alloc_coherent(fep->dev,
>>> + fep->ring_base = dma_alloc_coherent(fep->dev,
>>> (fpi->tx_ring + fpi->rx_ring) *
>>> sizeof(cbd_t), &fep->ring_mem_addr,
>>> GFP_KERNEL);
>>>
>
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)
>
^ permalink raw reply
* Re: [RFC HACK PATCH] PCI: dwc: layerscape: Hack around enumeration problems with Honeycomb LX2K
From: Daniel Thompson @ 2020-12-11 17:05 UTC (permalink / raw)
To: Rob Herring
Cc: Roy Zang, Lorenzo Pieralisi, Linaro Patches, PCI, Jon Nettleton,
linux-kernel@vger.kernel.org, Minghuan Lian, linux-arm-kernel,
Bjorn Helgaas, linuxppc-dev, Mingkai Hu
In-Reply-To: <CAL_JsqKQxFvkFtph1BZD2LKdZjboxhMTWkZe_AWS-vMD9y0pMw@mail.gmail.com>
On Fri, Dec 11, 2020 at 08:37:40AM -0600, Rob Herring wrote:
> On Fri, Dec 11, 2020 at 6:15 AM Daniel Thompson
> <daniel.thompson@linaro.org> wrote:
> >
> > I have been chasing down a problem enumerating an NVMe drive on a
> > Honeycomb LX2K (NXP LX2160A). Specifically the drive can only enumerate
> > successfully if the we are emitting lots of console messages via a UART.
> > If the system is booted with `quiet` set then enumeration fails.
>
> We really need to get away from work-arounds for device X on host Y. I
> suspect they are not limited to that combination only...
No objection on that. This patch was essentially sharing the result of
an investigation where I got stuck at the "now fix it properly" stage!
> How exactly does it fail to enumerate? There's a (racy) linkup check
> on config accesses, is it reporting link down and skipping config
> accesses?
In dmesg terms it looked like this:
--- nvme_borked_gpu_working.linted.dmesg 2020-11-18 15:28:35.544118213 +0000
+++ nvme_working_gpu_working.linted.dmesg 2020-11-18 15:28:56.180076946 +0000
@@ -241,11 +241,19 @@
pci 0000:00:00.0: reg 0x38: [mem 0x9048000000-0x9048ffffff pref]
pci 0000:00:00.0: supports D1 D2
pci 0000:00:00.0: PME# supported from D0 D1 D2 D3hot
+pci 0000:01:00.0: [15b7:5009] type 00 class 0x010802
+pci 0000:01:00.0: reg 0x10: [mem 0x9049000000-0x9049003fff 64bit]
+pci 0000:01:00.0: reg 0x20: [mem 0x9049004000-0x90490040ff 64bit]
pci 0000:00:00.0: BAR 1: assigned [mem 0x9040000000-0x9043ffffff]
pci 0000:00:00.0: BAR 0: assigned [mem 0x9044000000-0x9044ffffff]
pci 0000:00:00.0: BAR 6: assigned [mem 0x9045000000-0x9045ffffff pref]
+pci 0000:00:00.0: BAR 14: assigned [mem 0x9046000000-0x90460fffff]
+pci 0000:01:00.0: BAR 0: assigned [mem 0x9046000000-0x9046003fff 64bit]
+pci 0000:01:00.0: BAR 4: assigned [mem 0x9046004000-0x90460040ff 64bit]
pci 0000:00:00.0: PCI bridge to [bus 01-ff]
+pci 0000:00:00.0: bridge window [mem 0x9046000000-0x90460fffff]
pci 0000:00:00.0: Max Payload Size set to 256/ 256 (was 128), Max Read Rq 256
+pci 0000:01:00.0: Max Payload Size set to 256/ 512 (was 128), Max Read Rq 256
layerscape-pcie 3800000.pcie: host bridge /soc/pcie@3800000 ranges:
layerscape-pcie 3800000.pcie: MEM 0xa040000000..0xa07fffffff -> 0x0040000000
layerscape-pcie 3800000.pcie: PCI host bridge to bus 0001:00
... and be aware that the last three lines here are another PCIe
controller coming up just fine and it is about to detect the GPU
(which like the NVMe is also gen3) just fine whether or not we
add a short delay.
> > I guessed this would be due to the timing impact of printk-to-UART and
> > tried to find out where a delay could be added to provoke a successful
> > enumeration.
> >
> > This patch contains the results. The delay length (1ms) was selected by
> > binary searching downwards until the delay was not effective for these
> > devices (Honeycomb LX2K and a Western Digital WD Blue SN550).
> >
> > I have also included the workaround twice (conditionally compiled). The
> > first change is the *latest* possible code path that we can deploy a
> > delay whilst the second is the earliest place I could find.
> >
> > The summary is that the critical window were we are currently relying on
> > a call to the console UART code can "mend" the driver runs from calling
> > dw_pcie_setup_rc() in host init to just before we read the state in the
> > link up callback.
> >
> > Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
> > ---
> >
> > Notes:
> > This patch is RFC (and HACK) because I don't have much clue *why* this
> > patch works... merely that this is the smallest possible change I need
> > to replicate the current accidental printk() workaround. Perhaps one
> > could argue that RFC here stands for request-for-clue. All my
> > observations and changes here are empirical and I don't know how best to
> > turn them into something that is not a hack!
> >
> > BTW I noticed many other pcie-designware drivers take advantage
> > of a function called dw_pcie_wait_for_link() in their init paths...
> > but my naive attempts to add it to the layerscape driver results
> > in non-booting systems so I haven't embarrassed myself by including
> > that in the patch!
>
> You need to look at what's pending for v5.11, because I reworked this
> to be more unified. The ordering of init is also possibly changed. The
> sequence is now like this:
>
> dw_pcie_setup_rc(pp);
> dw_pcie_msi_init(pp);
>
> if (!dw_pcie_link_up(pci) && pci->ops->start_link) {
> ret = pci->ops->start_link(pci);
> if (ret)
> goto err_free_msi;
> }
>
> /* Ignore errors, the link may come up later */
> dw_pcie_wait_for_link(pci);
Thanks. That looks likely to fix it since IIUC dw_pcie_wait_for_link()
will end up waiting somewhat like the double check I added to
ls_pcie_link_up().
I'll take a look at let you know.
Daniel.
^ permalink raw reply
* RE: [PATCH] net: ethernet: fs-enet: remove casting dma_alloc_coherent
From: David Laight @ 2020-12-11 16:55 UTC (permalink / raw)
To: 'Christophe Leroy', Xu Wang, pantelis.antoniou@gmail.com,
davem@davemloft.net, kuba@kernel.org,
linuxppc-dev@lists.ozlabs.org
Cc: linux-kernel@vger.kernel.org
In-Reply-To: <4a1c2852-781f-e125-afcb-69387660b6af@csgroup.eu>
From: Christophe Leroy
> Sent: 11 December 2020 16:43
>
> Le 11/12/2020 à 17:07, David Laight a écrit :
> > From: Christophe Leroy
> >> Sent: 11 December 2020 15:22
> >>
> >> Le 11/12/2020 à 09:52, Xu Wang a écrit :
> >>> Remove casting the values returned by dma_alloc_coherent.
> >>
> >> Can you explain more in the commit log ?
> >>
> >> As far as I can see, dma_alloc_coherent() doesn't return __iomem, and ring_base member is __iomem
> >
> > Which is probably wrong - that is the kernel address of kernel memory.
> > So it shouldn't have the __iomem marker.
>
> That's where the buffer descriptors are, the driver accesses to the content of the buffer
> descriptors using the IO accessors in_be16()/out_be16(). Is it not correct ?
I've just been looking at the crap in there.
My understanding is that IO accessors are for IO devices (eg addresses
from io_remap() etc).
Buffers allocated by dma_alloc_coherent() are normal kernel memory
and don't need any accessors.
Now you might need some barriers - mostly because an ethernet chip
can typically read a ring entry without being prodded.
IIRC there is a barrier in writel() to ensure the dma master will
'see' all memory writes done before the IO write that kicks it into
doing some processing.
The fact that the driver contains so many __iomem casts (eg in
tx_restart) is an indication that something is badly awry.
__iomem exists to check you are using the correct type of pointer.
Any __iomem casts are dubious.
David
>
> Christophe
>
> >
> > I wonder what else is wrong....
> >
> > David
> >
> >>
> >> Christophe
> >>
> >>>
> >>> Signed-off-by: Xu Wang <vulab@iscas.ac.cn>
> >>> ---
> >>> drivers/net/ethernet/freescale/fs_enet/mac-fec.c | 2 +-
> >>> 1 file changed, 1 insertion(+), 1 deletion(-)
> >>>
> >>> diff --git a/drivers/net/ethernet/freescale/fs_enet/mac-fec.c
> >> b/drivers/net/ethernet/freescale/fs_enet/mac-fec.c
> >>> index 99fe2c210d0f..3ae345676e50 100644
> >>> --- a/drivers/net/ethernet/freescale/fs_enet/mac-fec.c
> >>> +++ b/drivers/net/ethernet/freescale/fs_enet/mac-fec.c
> >>> @@ -131,7 +131,7 @@ static int allocate_bd(struct net_device *dev)
> >>> struct fs_enet_private *fep = netdev_priv(dev);
> >>> const struct fs_platform_info *fpi = fep->fpi;
> >>>
> >>> - fep->ring_base = (void __force __iomem *)dma_alloc_coherent(fep->dev,
> >>> + fep->ring_base = dma_alloc_coherent(fep->dev,
> >>> (fpi->tx_ring + fpi->rx_ring) *
> >>> sizeof(cbd_t), &fep->ring_mem_addr,
> >>> GFP_KERNEL);
> >>>
> >
> > -
> > Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> > Registration No: 1397386 (Wales)
> >
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
^ permalink raw reply
* RE: [PATCH] net: ethernet: fs-enet: remove casting dma_alloc_coherent
From: David Laight @ 2020-12-11 16:07 UTC (permalink / raw)
To: 'Christophe Leroy', Xu Wang, pantelis.antoniou@gmail.com,
davem@davemloft.net, kuba@kernel.org,
linuxppc-dev@lists.ozlabs.org
Cc: linux-kernel@vger.kernel.org
In-Reply-To: <34548188-67f4-d3ef-c2e3-871fc520e838@csgroup.eu>
From: Christophe Leroy
> Sent: 11 December 2020 15:22
>
> Le 11/12/2020 à 09:52, Xu Wang a écrit :
> > Remove casting the values returned by dma_alloc_coherent.
>
> Can you explain more in the commit log ?
>
> As far as I can see, dma_alloc_coherent() doesn't return __iomem, and ring_base member is __iomem
Which is probably wrong - that is the kernel address of kernel memory.
So it shouldn't have the __iomem marker.
I wonder what else is wrong....
David
>
> Christophe
>
> >
> > Signed-off-by: Xu Wang <vulab@iscas.ac.cn>
> > ---
> > drivers/net/ethernet/freescale/fs_enet/mac-fec.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/net/ethernet/freescale/fs_enet/mac-fec.c
> b/drivers/net/ethernet/freescale/fs_enet/mac-fec.c
> > index 99fe2c210d0f..3ae345676e50 100644
> > --- a/drivers/net/ethernet/freescale/fs_enet/mac-fec.c
> > +++ b/drivers/net/ethernet/freescale/fs_enet/mac-fec.c
> > @@ -131,7 +131,7 @@ static int allocate_bd(struct net_device *dev)
> > struct fs_enet_private *fep = netdev_priv(dev);
> > const struct fs_platform_info *fpi = fep->fpi;
> >
> > - fep->ring_base = (void __force __iomem *)dma_alloc_coherent(fep->dev,
> > + fep->ring_base = dma_alloc_coherent(fep->dev,
> > (fpi->tx_ring + fpi->rx_ring) *
> > sizeof(cbd_t), &fep->ring_mem_addr,
> > GFP_KERNEL);
> >
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
^ permalink raw reply
* Re: [powerpc/merge] System crash during cpu offline/online operation
From: Sachin Sant @ 2020-12-12 10:07 UTC (permalink / raw)
To: Aneesh Kumar K.V; +Cc: linuxppc-dev
In-Reply-To: <F4809152-C7F5-4ED8-B071-85A9115BD29D@linux.vnet.ibm.com>
> On 11-Dec-2020, at 3:47 PM, Sachin Sant <sachinp@linux.vnet.ibm.com> wrote:
>
> I am observing system crash during a cpu offline/online operation
> with latest merge branch code running in a PowerVM LPAR (P8 onwards)
>
> # uname -r
> 5.10.0-rc7-01792-g244569c777ca
> # ppc64_cpu --smt=1
> [ 244.205194] cpu 1 (hwid 1) Ready to die…
> ………
> ……...
> [ 247.015113] cpu 30 (hwid 30) Ready to die...
> [ 247.104973] cpu 31 (hwid 31) Ready to die…
> # ppc64_cpu --smt=8
>
> At this point the LPAR reboots instantly without any trace message.
Git bisect leads me to the following commit:
3b47b7549ead0719e94022c6742199333c7c8d9f is the first bad commit
commit 3b47b7549ead0719e94022c6742199333c7c8d9f
Author: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Date: Fri Nov 27 10:14:07 2020 +0530
powerpc/book3s64/kuap: Move KUAP related function outside radix
Here is the bisect log:
# git bisect log
git bisect start
# bad: [244569c777ca638b08c75db88fe035bdec52ef80] Automatic merge of 'next' into merge (2020-12-10 00:34)
git bisect bad 244569c777ca638b08c75db88fe035bdec52ef80
# good: [9acd775e4579bde0a6d937d72f9669e418aa87ad] Automatic merge of 'master' into merge (2020-12-05 22:54)
git bisect good 9acd775e4579bde0a6d937d72f9669e418aa87ad
# good: [ab91292cb3e9f43d9c6839d7572d17b35bc21710] Merge tag 'char-misc-5.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
git bisect good ab91292cb3e9f43d9c6839d7572d17b35bc21710
# bad: [19b311ca51e108b6d8d679496af8635fdc1984a8] ocxl: Initiate a TLB invalidate command
git bisect bad 19b311ca51e108b6d8d679496af8635fdc1984a8
# bad: [d94b827e89dc3f92cd871d10f4992a6bd3c861e5] powerpc/book3s64/kuap: Use Key 3 for kernel mapping with hash translation
git bisect bad d94b827e89dc3f92cd871d10f4992a6bd3c861e5
# good: [1d15ffdfc94127d75e04a88344ee1ce8c79f05fd] KVM: PPC: Book3S HV: Ratelimit machine check messages coming from guests
git bisect good 1d15ffdfc94127d75e04a88344ee1ce8c79f05fd
# good: [9f378b9f007cc94beadea40df83cc62a76975c6f] KVM: PPC: BOOK3S: PR: Ignore UAMOR SPR
git bisect good 9f378b9f007cc94beadea40df83cc62a76975c6f
# bad: [3b47b7549ead0719e94022c6742199333c7c8d9f] powerpc/book3s64/kuap: Move KUAP related function outside radix
git bisect bad 3b47b7549ead0719e94022c6742199333c7c8d9f
# good: [39df17bc20059c84ddc6f91831fce2e2cc79a6f3] powerpc/book3s64/kuap/kuep: Move uamor setup to pkey init
git bisect good 39df17bc20059c84ddc6f91831fce2e2cc79a6f3
# first bad commit: [3b47b7549ead0719e94022c6742199333c7c8d9f] powerpc/book3s64/kuap: Move KUAP related function outside radix
Thanks
-Sachin
^ permalink raw reply
* [PATCH] powerpc/mm: Fix hugetlb_free_pmd_range() and hugetlb_free_pud_range()
From: Christophe Leroy @ 2020-12-12 13:41 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, qcai
Cc: linuxppc-dev, linux-kernel
Commit 7bfe54b5f165 ("powerpc/mm: Refactor the floor/ceiling check in
hugetlb range freeing functions") inadvertely removed the mask
applied to start parameter in those two functions, leading to the
following crash on power9.
[ 7703.114640][T58070] LTP: starting hugemmap05_1 (hugemmap05 -m)
[ 7703.157792][ C99] ------------[ cut here ]------------
[ 7703.158279][ C99] kernel BUG at arch/powerpc/mm/book3s64/pgtable.c:387!
[ 7703.158306][ C99] Oops: Exception in kernel mode, sig: 5 [#1]
[ 7703.158330][ C99] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=256 NUMA PowerNV
[ 7703.158343][ C99] Modules linked in: vfio_pci vfio_virqfd vfio_iommu_spapr_tce vfio vfio_spapr_eeh loop kvm_hv kvm ip_tables x_tables sd_mod ahci libahci tg3 libata firmware_class libphy dm_mirror dm_region_hash dm_log dm_mod [last unloaded: dummy_del_mod]
[ 7703.158435][ C99] CPU: 99 PID: 308 Comm: ksoftirqd/99 Tainted: G O 5.10.0-rc7-next-20201211 #1
[ 7703.158464][ C99] NIP: c00000000005dbec LR: c0000000003352f4 CTR: 0000000000000000
[ 7703.158489][ C99] REGS: c00020000bb6f830 TRAP: 0700 Tainted: G O (5.10.0-rc7-next-20201211)
[ 7703.158528][ C99] MSR: 900000000282b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 24002284 XER: 20040000
[ 7703.158570][ C99] GPR00: c0000000003352f4 c00020000bb6fad0 c000000007f70b00 c0002000385b3ff0
[ 7703.158570][ C99] GPR04: 0000000000000000 0000000000000003 c00020000bb6f8b4 0000000000000001
[ 7703.158570][ C99] GPR08: 0000000000000001 0000000000000009 0000000000000008 0000000000000002
[ 7703.158570][ C99] GPR12: 0000000024002488 c000201fff649c00 c000000007f2a20c 0000000000000000
[ 7703.158570][ C99] GPR16: 0000000000000007 0000000000000000 c000000000194d10 c000000000194d10
[ 7703.158570][ C99] GPR24: 0000000000000014 0000000000000015 c000201cc6e72398 c000000007fac4b4
[ 7703.158570][ C99] GPR28: c000000007f2bf80 c000000007fac2f8 0000000000000008 c000200033870000
[ 7703.158766][ C99] NIP [c00000000005dbec] __tlb_remove_table+0x1dc/0x1e0
pgtable_free at arch/powerpc/mm/book3s64/pgtable.c:387
(inlined by) __tlb_remove_table at arch/powerpc/mm/book3s64/pgtable.c:405
[ 7703.158805][ C99] LR [c0000000003352f4] tlb_remove_table_rcu+0x54/0xa0
[ 7703.158853][ C99] Call Trace:
[ 7703.158872][ C99] [c00020000bb6fad0] [c00000000005db4c] __tlb_remove_table+0x13c/0x1e0 (unreliable)
[ 7703.158890][ C99] [c00020000bb6fb00] [c0000000003352f4] tlb_remove_table_rcu+0x54/0xa0
__tlb_remove_table_free at mm/mmu_gather.c:101
(inlined by) tlb_remove_table_rcu at mm/mmu_gather.c:156
[ 7703.158927][ C99] [c00020000bb6fb30] [c000000000194d7c] rcu_core+0x35c/0xbb0
rcu_do_batch at kernel/rcu/tree.c:2502
(inlined by) rcu_core at kernel/rcu/tree.c:2737
[ 7703.158966][ C99] [c00020000bb6fbf0] [c00000000095a3d0] __do_softirq+0x480/0x704
[ 7703.159006][ C99] [c00020000bb6fd10] [c0000000000cc1f4] run_ksoftirqd+0x74/0xd0
run_ksoftirqd at kernel/softirq.c:651
(inlined by) run_ksoftirqd at kernel/softirq.c:642
[ 7703.159046][ C99] [c00020000bb6fd30] [c0000000001040c8] smpboot_thread_fn+0x278/0x320
[ 7703.159096][ C99] [c00020000bb6fda0] [c0000000000fc8a4] kthread+0x1c4/0x1d0
[ 7703.159145][ C99] [c00020000bb6fe10] [c00000000000d9fc] ret_from_kernel_thread+0x5c/0x80
[ 7703.159183][ C99] Instruction dump:
[ 7703.159204][ C99] 60000000 7c0802a6 3c82f8b4 7fe3fb78 38847470 f8010040 482b4fc5 60000000
[ 7703.159248][ C99] 0fe00000 7c0802a6 fbe10028 f8010040 <0fe00000> 3c4c07f1 38422f10 7c0802a6
[ 7703.159293][ C99] ---[ end trace 1d92a5231ba6a0d5 ]---
Properly apply the masks before calling pmd_free_tlb() and
pud_free_tlb() respectively.
Reported-by: Qian Cai <qcai@redhat.com>
Fixes: 7bfe54b5f165 ("powerpc/mm: Refactor the floor/ceiling check in hugetlb range freeing functions")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
arch/powerpc/mm/hugetlbpage.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index f8d8a4988e15..8b3cc4d688e8 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -396,9 +396,9 @@ static void hugetlb_free_pmd_range(struct mmu_gather *tlb, pud_t *pud,
if (range_is_outside_limits(start, end, floor, ceiling, PUD_MASK))
return;
- pmd = pmd_offset(pud, start);
+ pmd = pmd_offset(pud, start & PUD_MASK);
pud_clear(pud);
- pmd_free_tlb(tlb, pmd, start);
+ pmd_free_tlb(tlb, pmd, start & PUD_MASK);
mm_dec_nr_pmds(tlb->mm);
}
@@ -439,9 +439,9 @@ static void hugetlb_free_pud_range(struct mmu_gather *tlb, p4d_t *p4d,
if (range_is_outside_limits(start, end, floor, ceiling, PGDIR_MASK))
return;
- pud = pud_offset(p4d, start);
+ pud = pud_offset(p4d, start & PGDIR_MASK);
p4d_clear(p4d);
- pud_free_tlb(tlb, pud, start);
+ pud_free_tlb(tlb, pud, start & PGDIR_MASK);
mm_dec_nr_puds(tlb->mm);
}
--
2.25.0
^ permalink raw reply related
* [PATCH] powerpc/vas: Fix IRQ name allocation
From: Cédric Le Goater @ 2020-12-12 14:27 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Sukadev Bhattiprolu, Haren Myneni, Cédric Le Goater
The VAS device allocates a generic interrupt to handle page faults but
the IRQ name doesn't show under /proc. This is because it's on
stack. Allocate the name.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
I didn't understand this part in init_vas_instance() :
if (vinst->virq) {
rc = vas_irq_fault_window_setup(vinst);
/*
* Fault window is used only for user space send windows.
* So if vinst->virq is NULL, tx_win_open returns -ENODEV
* for user space.
*/
if (rc)
vinst->virq = 0;
}
If the IRQ cannot be requested, the device probing should fail but
it's not today. The use of 'vinst->virq' is suspicious.
arch/powerpc/platforms/powernv/vas.h | 1 +
arch/powerpc/platforms/powernv/vas.c | 11 ++++++++---
2 files changed, 9 insertions(+), 3 deletions(-)
diff --git a/arch/powerpc/platforms/powernv/vas.h b/arch/powerpc/platforms/powernv/vas.h
index 70f793e8f6cc..c7db3190baca 100644
--- a/arch/powerpc/platforms/powernv/vas.h
+++ b/arch/powerpc/platforms/powernv/vas.h
@@ -340,6 +340,7 @@ struct vas_instance {
struct vas_window *rxwin[VAS_COP_TYPE_MAX];
struct vas_window *windows[VAS_WINDOWS_PER_CHIP];
+ char *name;
char *dbgname;
struct dentry *dbgdir;
};
diff --git a/arch/powerpc/platforms/powernv/vas.c b/arch/powerpc/platforms/powernv/vas.c
index 598e4cd563fb..b65256a63e87 100644
--- a/arch/powerpc/platforms/powernv/vas.c
+++ b/arch/powerpc/platforms/powernv/vas.c
@@ -28,12 +28,10 @@ static DEFINE_PER_CPU(int, cpu_vas_id);
static int vas_irq_fault_window_setup(struct vas_instance *vinst)
{
- char devname[64];
int rc = 0;
- snprintf(devname, sizeof(devname), "vas-%d", vinst->vas_id);
rc = request_threaded_irq(vinst->virq, vas_fault_handler,
- vas_fault_thread_fn, 0, devname, vinst);
+ vas_fault_thread_fn, 0, vinst->name, vinst);
if (rc) {
pr_err("VAS[%d]: Request IRQ(%d) failed with %d\n",
@@ -80,6 +78,12 @@ static int init_vas_instance(struct platform_device *pdev)
if (!vinst)
return -ENOMEM;
+ vinst->name = kasprintf(GFP_KERNEL, "vas-%d", vasid);
+ if (!vinst->name) {
+ kfree(vinst);
+ return -ENOMEM;
+ }
+
INIT_LIST_HEAD(&vinst->node);
ida_init(&vinst->ida);
mutex_init(&vinst->mutex);
@@ -162,6 +166,7 @@ static int init_vas_instance(struct platform_device *pdev)
return 0;
free_vinst:
+ kfree(vinst->name);
kfree(vinst);
return -ENODEV;
--
2.26.2
^ permalink raw reply related
* [PATCH 1/2] dmaengine: fsldma: Fix a resource leak in the remove function
From: Christophe JAILLET @ 2020-12-12 16:05 UTC (permalink / raw)
To: leoyang.li, zw, vkoul, dan.j.williams, timur
Cc: dmaengine, kernel-janitors, Christophe JAILLET, linuxppc-dev,
linux-kernel
A 'irq_dispose_mapping()' call is missing in the remove function.
Add it.
This is needed to undo the 'irq_of_parse_and_map() call from the probe
function and already part of the error handling path of the probe function.
It was added in the probe function only in commit d3f620b2c4fe ("fsldma:
simplify IRQ probing and handling")
Fixes: 77cd62e8082b ("fsldma: allow Freescale Elo DMA driver to be compiled as a module")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
---
Patch provided as-is.
I don't have the configuration to compile test this patch
---
drivers/dma/fsldma.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
index 0feb323bae1e..554f70a0c18c 100644
--- a/drivers/dma/fsldma.c
+++ b/drivers/dma/fsldma.c
@@ -1314,6 +1314,7 @@ static int fsldma_of_remove(struct platform_device *op)
if (fdev->chan[i])
fsl_dma_chan_remove(fdev->chan[i]);
}
+ irq_dispose_mapping(fdev->irq);
iounmap(fdev->regs);
kfree(fdev);
--
2.27.0
^ permalink raw reply related
* [PATCH 2/2] dmaengine: fsldma: Fix a resource leak in an error handling path of the probe function
From: Christophe JAILLET @ 2020-12-12 16:06 UTC (permalink / raw)
To: leoyang.li, zw, vkoul, dan.j.williams, iws
Cc: dmaengine, kernel-janitors, Christophe JAILLET, linuxppc-dev,
linux-kernel
In case of error, the previous 'fsl_dma_chan_probe()' calls must be undone
by some 'fsl_dma_chan_remove()', as already done in the remove function.
It was added in the remove function in commit 77cd62e8082b ("fsldma: allow
Freescale Elo DMA driver to be compiled as a module")
Fixes: d3f620b2c4fe ("fsldma: simplify IRQ probing and handling")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
---
Patch provided as-is.
I don't have the configuration to compile test this patch
---
drivers/dma/fsldma.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
index 554f70a0c18c..f8459cc5315d 100644
--- a/drivers/dma/fsldma.c
+++ b/drivers/dma/fsldma.c
@@ -1214,6 +1214,7 @@ static int fsldma_of_probe(struct platform_device *op)
{
struct fsldma_device *fdev;
struct device_node *child;
+ unsigned int i;
int err;
fdev = kzalloc(sizeof(*fdev), GFP_KERNEL);
@@ -1292,6 +1293,10 @@ static int fsldma_of_probe(struct platform_device *op)
return 0;
out_free_fdev:
+ for (i = 0; i < FSL_DMA_MAX_CHANS_PER_DEVICE; i++) {
+ if (fdev->chan[i])
+ fsl_dma_chan_remove(fdev->chan[i]);
+ }
irq_dispose_mapping(fdev->irq);
iounmap(fdev->regs);
out_free:
--
2.27.0
^ permalink raw reply related
* [PATCH 2/3] kbuild: LD_VERSION redenomination
From: Masahiro Yamada @ 2020-12-12 16:54 UTC (permalink / raw)
To: linux-kbuild
Cc: Thomas Bogendoerfer, Dominique Martinet, linuxppc-dev,
linux-kernel, Jiaxun Yang, linux-mips, Paul Mackerras,
Catalin Marinas, Huacai Chen, Will Deacon, Masahiro Yamada,
linux-arm-kernel
In-Reply-To: <20201212165431.150750-1-masahiroy@kernel.org>
Commit ccbef1674a15 ("Kbuild, lto: add ld-version and ld-ifversion
macros") introduced scripts/ld-version.sh for GCC LTO.
At that time, this script handled 5 version fields because GCC LTO
needed the downstream binutils. (https://lkml.org/lkml/2014/4/8/272)
The code snippet from the submitted patch was as follows:
# We need HJ Lu's Linux binutils because mainline binutils does not
# support mixing assembler and LTO code in the same ld -r object.
# XXX check if the gcc plugin ld is the expected one too
# XXX some Fedora binutils should also support it. How to check for that?
ifeq ($(call ld-ifversion,-ge,22710001,y),y)
...
However, GCC LTO was not merged into the mainline after all.
(https://lkml.org/lkml/2014/4/8/272)
So, the 4th and 5th fields were never used, and finally removed by
commit 0d61ed17dd30 ("ld-version: Drop the 4th and 5th version
components").
Since then, the last 4-digits returned by this script is always zeros.
Remove the meaningless last 4-digits. This makes the version format
consistent with GCC_VERSION, CLANG_VERSION, LLD_VERSION.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
---
arch/arm64/Kconfig | 2 +-
arch/mips/loongson64/Platform | 2 +-
arch/mips/vdso/Kconfig | 2 +-
arch/powerpc/Makefile | 2 +-
arch/powerpc/lib/Makefile | 2 +-
scripts/ld-version.sh | 2 +-
6 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index a6b5b7ef40ae..69d56b21a6ec 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1499,7 +1499,7 @@ config ARM64_PTR_AUTH
depends on (CC_HAS_SIGN_RETURN_ADDRESS || CC_HAS_BRANCH_PROT_PAC_RET) && AS_HAS_PAC
# Modern compilers insert a .note.gnu.property section note for PAC
# which is only understood by binutils starting with version 2.33.1.
- depends on LD_IS_LLD || LD_VERSION >= 233010000 || (CC_IS_GCC && GCC_VERSION < 90100)
+ depends on LD_IS_LLD || LD_VERSION >= 23301 || (CC_IS_GCC && GCC_VERSION < 90100)
depends on !CC_IS_CLANG || AS_HAS_CFI_NEGATE_RA_STATE
depends on (!FUNCTION_GRAPH_TRACER || DYNAMIC_FTRACE_WITH_REGS)
help
diff --git a/arch/mips/loongson64/Platform b/arch/mips/loongson64/Platform
index ec42c5085905..cc0b9c87f9ad 100644
--- a/arch/mips/loongson64/Platform
+++ b/arch/mips/loongson64/Platform
@@ -35,7 +35,7 @@ cflags-$(CONFIG_CPU_LOONGSON64) += $(call as-option,-Wa$(comma)-mno-fix-loongson
# can't easily be used safely within the kbuild framework.
#
ifeq ($(call cc-ifversion, -ge, 0409, y), y)
- ifeq ($(call ld-ifversion, -ge, 225000000, y), y)
+ ifeq ($(call ld-ifversion, -ge, 22500, y), y)
cflags-$(CONFIG_CPU_LOONGSON64) += \
$(call cc-option,-march=loongson3a -U_MIPS_ISA -D_MIPS_ISA=_MIPS_ISA_MIPS64)
else
diff --git a/arch/mips/vdso/Kconfig b/arch/mips/vdso/Kconfig
index 7aec721398d5..a665f6108cb5 100644
--- a/arch/mips/vdso/Kconfig
+++ b/arch/mips/vdso/Kconfig
@@ -12,7 +12,7 @@
# the lack of relocations. As such, we disable the VDSO for microMIPS builds.
config MIPS_LD_CAN_LINK_VDSO
- def_bool LD_VERSION >= 225000000 || LD_IS_LLD
+ def_bool LD_VERSION >= 22500 || LD_IS_LLD
config MIPS_DISABLE_VDSO
def_bool CPU_MICROMIPS || (!CPU_MIPSR6 && !MIPS_LD_CAN_LINK_VDSO)
diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index 5c8c06215dd4..6a9a852c3d56 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -65,7 +65,7 @@ UTS_MACHINE := $(subst $(space),,$(machine-y))
ifdef CONFIG_PPC32
KBUILD_LDFLAGS_MODULE += arch/powerpc/lib/crtsavres.o
else
-ifeq ($(call ld-ifversion, -ge, 225000000, y),y)
+ifeq ($(call ld-ifversion, -ge, 22500, y),y)
# Have the linker provide sfpr if possible.
# There is a corresponding test in arch/powerpc/lib/Makefile
KBUILD_LDFLAGS_MODULE += --save-restore-funcs
diff --git a/arch/powerpc/lib/Makefile b/arch/powerpc/lib/Makefile
index 69a91b571845..d4efc182662a 100644
--- a/arch/powerpc/lib/Makefile
+++ b/arch/powerpc/lib/Makefile
@@ -31,7 +31,7 @@ obj-$(CONFIG_FUNCTION_ERROR_INJECTION) += error-inject.o
# 64-bit linker creates .sfpr on demand for final link (vmlinux),
# so it is only needed for modules, and only for older linkers which
# do not support --save-restore-funcs
-ifeq ($(call ld-ifversion, -lt, 225000000, y),y)
+ifeq ($(call ld-ifversion, -lt, 22500, y),y)
extra-$(CONFIG_PPC64) += crtsavres.o
endif
diff --git a/scripts/ld-version.sh b/scripts/ld-version.sh
index f2be0ff9a738..0f8a2c0f9502 100755
--- a/scripts/ld-version.sh
+++ b/scripts/ld-version.sh
@@ -6,6 +6,6 @@
gsub(".*version ", "");
gsub("-.*", "");
split($1,a, ".");
- print a[1]*100000000 + a[2]*1000000 + a[3]*10000;
+ print a[1]*10000 + a[2]*100 + a[3];
exit
}
--
2.27.0
^ permalink raw reply related
* Re: [PATCH AUTOSEL 5.9 27/39] sched/idle: Fix arch_cpu_idle() vs tracing
From: Sasha Levin @ 2020-12-13 14:10 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Mark Rutland, uclinux-h8-devel, linux-ia64, linux-parisc,
linux-s390, linux-hexagon, Heiko Carstens, linux-sh, linux-um,
linux-kernel, stable, linux-mips, openrisc, linux-csky,
Sven Schnelle, linux-alpha, sparclinux, linux-riscv, linuxppc-dev,
linux-arm-kernel
In-Reply-To: <20201203171015.GN2414@hirez.programming.kicks-ass.net>
On Thu, Dec 03, 2020 at 06:10:15PM +0100, Peter Zijlstra wrote:
>On Thu, Dec 03, 2020 at 03:54:42PM +0100, Heiko Carstens wrote:
>> On Thu, Dec 03, 2020 at 08:28:21AM -0500, Sasha Levin wrote:
>> > From: Peter Zijlstra <peterz@infradead.org>
>> >
>> > [ Upstream commit 58c644ba512cfbc2e39b758dd979edd1d6d00e27 ]
>> >
>> > We call arch_cpu_idle() with RCU disabled, but then use
>> > local_irq_{en,dis}able(), which invokes tracing, which relies on RCU.
>> >
>> > Switch all arch_cpu_idle() implementations to use
>> > raw_local_irq_{en,dis}able() and carefully manage the
>> > lockdep,rcu,tracing state like we do in entry.
>> >
>> > (XXX: we really should change arch_cpu_idle() to not return with
>> > interrupts enabled)
>> >
>> > Reported-by: Sven Schnelle <svens@linux.ibm.com>
>> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
>> > Reviewed-by: Mark Rutland <mark.rutland@arm.com>
>> > Tested-by: Mark Rutland <mark.rutland@arm.com>
>> > Link: https://lkml.kernel.org/r/20201120114925.594122626@infradead.org
>> > Signed-off-by: Sasha Levin <sashal@kernel.org>
>>
>> This patch broke s390 irq state tracing. A patch to fix this is
>> scheduled to be merged upstream today (hopefully).
>> Therefore I think this patch should not yet go into 5.9 stable.
>
>Agreed.
I'll also grab b1cae1f84a0f ("s390: fix irq state tracing"). Thanks!
--
Thanks,
Sasha
^ permalink raw reply
* Re: [PATCH] powerpc/ps3: use dma_mapping_error()
From: Geoff Levand @ 2020-12-13 19:38 UTC (permalink / raw)
To: Vincent Stehlé, linuxppc-dev, linux-kernel; +Cc: Geert Uytterhoeven
In-Reply-To: <20201213182622.23047-1-vincent.stehle@laposte.net>
On 12/13/20 10:26 AM, Vincent Stehlé wrote:
> The DMA address returned by dma_map_single() should be checked with
> dma_mapping_error(). Fix the ps3stor_setup() function accordingly.
>
> Fixes: 80071802cb9c ("[POWERPC] PS3: Storage Driver Core")
> Signed-off-by: Vincent Stehlé <vincent.stehle@laposte.net>
> Cc: Geoff Levand <geoff@infradead.org>
> Cc: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
> ---
> drivers/ps3/ps3stor_lib.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
Looks good. Thanks for submitting.
Acked by: Geoff Levand <geoff@infradead.org>
^ permalink raw reply
* Re: [PATCH] powerpc/ps3: use dma_mapping_error()
From: Geert Uytterhoeven @ 2020-12-13 19:39 UTC (permalink / raw)
To: Vincent Stehlé
Cc: Geoff Levand, Geert Uytterhoeven, linuxppc-dev,
Linux Kernel Mailing List
In-Reply-To: <20201213182622.23047-1-vincent.stehle@laposte.net>
On Sun, Dec 13, 2020 at 8:06 PM Vincent Stehlé
<vincent.stehle@laposte.net> wrote:
> The DMA address returned by dma_map_single() should be checked with
> dma_mapping_error(). Fix the ps3stor_setup() function accordingly.
>
> Fixes: 80071802cb9c ("[POWERPC] PS3: Storage Driver Core")
> Signed-off-by: Vincent Stehlé <vincent.stehle@laposte.net>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
^ permalink raw reply
* [PATCH] powerpc: fix alignment bug whithin the init sections
From: Ariel Marcovitch @ 2020-12-13 18:35 UTC (permalink / raw)
To: mpe; +Cc: paulus, linuxppc-dev, Ariel Marcovitch, linux-kernel
This is a bug that can cause early crashes in configurations with a
.exit.text section smaller than a page and a .init.text section that
ends in the beginning of a physical page (this is kinda random, which
might explain why this wasn't really encountered before).
The init sections are ordered like this:
.init.text
.exit.text
.init.data
Currently, these sections aren't page aligned.
Because the init code is mapped read-only at runtime and because the
.init.text section can potentially reside on the same physical page as
.init.data, the beginning of .init.data might be mapped read-only along
with .init.text.
Then when the kernel tries to modify a variable in .init.data (like
kthreadd_done, used in kernel_init()) the kernel panics.
To avoid this, I made these sections page aligned.
Fixes: 060ef9d89d18 ("powerpc32: PAGE_EXEC required for inittext")
Signed-off-by: Ariel Marcovitch <ariel.marcovitch@gmail.com>
---
arch/powerpc/kernel/vmlinux.lds.S | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/arch/powerpc/kernel/vmlinux.lds.S b/arch/powerpc/kernel/vmlinux.lds.S
index 326e113d2e45..e3a7c90c03f4 100644
--- a/arch/powerpc/kernel/vmlinux.lds.S
+++ b/arch/powerpc/kernel/vmlinux.lds.S
@@ -179,6 +179,11 @@ SECTIONS
#endif
} :text
+ /* .init.text is made RO and .exit.text is not, so we must
+ * ensure these sections reside in separate physical pages.
+ */
+ . = ALIGN(PAGE_SIZE);
+
/* .exit.text is discarded at runtime, not link time,
* to deal with references from __bug_table
*/
@@ -186,6 +191,8 @@ SECTIONS
EXIT_TEXT
}
+ . = ALIGN(PAGE_SIZE);
+
.init.data : AT(ADDR(.init.data) - LOAD_OFFSET) {
INIT_DATA
}
base-commit: 1398820fee515873379809a6415930ad0764b2f6
--
2.17.1
^ permalink raw reply related
* [PATCH] powerpc/ps3: use dma_mapping_error()
From: Vincent Stehlé @ 2020-12-13 18:26 UTC (permalink / raw)
To: linuxppc-dev, linux-kernel
Cc: Geoff Levand, Geert Uytterhoeven, Vincent Stehlé
The DMA address returned by dma_map_single() should be checked with
dma_mapping_error(). Fix the ps3stor_setup() function accordingly.
Fixes: 80071802cb9c ("[POWERPC] PS3: Storage Driver Core")
Signed-off-by: Vincent Stehlé <vincent.stehle@laposte.net>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
---
drivers/ps3/ps3stor_lib.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/ps3/ps3stor_lib.c b/drivers/ps3/ps3stor_lib.c
index 333ba83006e48..a12a1ad9b5fe3 100644
--- a/drivers/ps3/ps3stor_lib.c
+++ b/drivers/ps3/ps3stor_lib.c
@@ -189,7 +189,7 @@ int ps3stor_setup(struct ps3_storage_device *dev, irq_handler_t handler)
dev->bounce_lpar = ps3_mm_phys_to_lpar(__pa(dev->bounce_buf));
dev->bounce_dma = dma_map_single(&dev->sbd.core, dev->bounce_buf,
dev->bounce_size, DMA_BIDIRECTIONAL);
- if (!dev->bounce_dma) {
+ if (dma_mapping_error(&dev->sbd.core, dev->bounce_dma)) {
dev_err(&dev->sbd.core, "%s:%u: map DMA region failed\n",
__func__, __LINE__);
error = -ENODEV;
--
2.29.2
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox