* Re: [PATCH 1/4] perf: Add comment about current state of PERF_MEM_LVL_* namespace and remove an extra line
From: kajoljain @ 2021-10-05 9:48 UTC (permalink / raw)
To: mpe, linuxppc-dev, linux-kernel, peterz, mingo, acme, jolsa,
namhyung, ak
Cc: mark.rutland, songliubraving, atrajeev, daniel, rnsastry,
alexander.shishkin, ast, linux-perf-users, yao.jin, maddy, paulus,
kan.liang
In-Reply-To: <20211005091837.250044-1-kjain@linux.ibm.com>
Hi,
Sorry I missed to update correct version details.
Link to the previous patch-set, where discussion related to addition of
new data source encoding field 'mem_hops' happened:
https://lkml.org/lkml/2021/9/4/37
Changelog:
- Rather then adding new macros for L2.1/L3.1 (same chip, different
core) entries as part of field lvlnum, we are introducing new field
called 'mem_hops' which can be used to get hops
level data(intra-chip/package or inter-chip/off-package details).
As suggested by Peter Zijlstra.
- Using OnChip to denote data accesses from 'another core of same chip'
is not too clear. Update it to 'remote core, same chip' as pointed by
Michael Ellerman.
- Update the fix patch of correcting data source encodings to use new
added field 'mem_hops'.
Thanks,
Kajol Jain
On 10/5/21 2:48 PM, Kajol Jain wrote:
> Add a comment about PERF_MEM_LVL_* namespace being depricated
> to some extent in favour of added PERF_MEM_{LVLNUM_,REMOTE_,SNOOPX_}
> fields.
>
> Remove an extra line present in perf_mem__lvl_scnprintf function.
>
> Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
> ---
> include/uapi/linux/perf_event.h | 8 +++++++-
> tools/include/uapi/linux/perf_event.h | 8 +++++++-
> tools/perf/util/mem-events.c | 1 -
> 3 files changed, 14 insertions(+), 3 deletions(-)
>
> diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
> index f92880a15645..e1701e9c7858 100644
> --- a/include/uapi/linux/perf_event.h
> +++ b/include/uapi/linux/perf_event.h
> @@ -1241,7 +1241,13 @@ union perf_mem_data_src {
> #define PERF_MEM_OP_EXEC 0x10 /* code (execution) */
> #define PERF_MEM_OP_SHIFT 0
>
> -/* memory hierarchy (memory level, hit or miss) */
> +/*
> + * PERF_MEM_LVL_* namespace being depricated to some extent in the
> + * favour of newer composite PERF_MEM_{LVLNUM_,REMOTE_,SNOOPX_} fields.
> + * Supporting this namespace inorder to not break defined ABIs.
> + *
> + * memory hierarchy (memory level, hit or miss)
> + */
> #define PERF_MEM_LVL_NA 0x01 /* not available */
> #define PERF_MEM_LVL_HIT 0x02 /* hit level */
> #define PERF_MEM_LVL_MISS 0x04 /* miss level */
> diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
> index f92880a15645..e1701e9c7858 100644
> --- a/tools/include/uapi/linux/perf_event.h
> +++ b/tools/include/uapi/linux/perf_event.h
> @@ -1241,7 +1241,13 @@ union perf_mem_data_src {
> #define PERF_MEM_OP_EXEC 0x10 /* code (execution) */
> #define PERF_MEM_OP_SHIFT 0
>
> -/* memory hierarchy (memory level, hit or miss) */
> +/*
> + * PERF_MEM_LVL_* namespace being depricated to some extent in the
> + * favour of newer composite PERF_MEM_{LVLNUM_,REMOTE_,SNOOPX_} fields.
> + * Supporting this namespace inorder to not break defined ABIs.
> + *
> + * memory hierarchy (memory level, hit or miss)
> + */
> #define PERF_MEM_LVL_NA 0x01 /* not available */
> #define PERF_MEM_LVL_HIT 0x02 /* hit level */
> #define PERF_MEM_LVL_MISS 0x04 /* miss level */
> diff --git a/tools/perf/util/mem-events.c b/tools/perf/util/mem-events.c
> index f0e75df72b80..ff7289e28192 100644
> --- a/tools/perf/util/mem-events.c
> +++ b/tools/perf/util/mem-events.c
> @@ -320,7 +320,6 @@ int perf_mem__lvl_scnprintf(char *out, size_t sz, struct mem_info *mem_info)
> /* already taken care of */
> m &= ~(PERF_MEM_LVL_HIT|PERF_MEM_LVL_MISS);
>
> -
> if (mem_info && mem_info->data_src.mem_remote) {
> strcat(out, "Remote ");
> l += 7;
>
^ permalink raw reply
* Re: [PATCH v5 00/14] PCI: Add support for Apple M1
From: Marc Zyngier @ 2021-10-05 9:57 UTC (permalink / raw)
To: Linus Walleij
Cc: open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS,
Mark Kettenis, Lorenzo Pieralisi, Krzysztof Wilczyński, PCI,
Joerg Roedel, Sven Peter, linux-kernel@vger.kernel.org,
Joey Gouly, Rob Herring, Hector Martin, Bjorn Helgaas,
Robin Murphy, Android Kernel Team,
linuxppc-dev@lists.ozlabs.org list, opensuse-ppc,
Alyssa Rosenzweig, Stan Skowronek
In-Reply-To: <CACRpkdaL=YEfqSmAogLcP0Gn2gUqSaEXZQrJD1GR5QU+DyuyDQ@mail.gmail.com>
On Mon, 04 Oct 2021 21:42:45 +0100,
Linus Walleij <linus.walleij@linaro.org> wrote:
>
> On Mon, Oct 4, 2021 at 9:52 PM Rob Herring <robh+dt@kernel.org> wrote:
>
> > FYI, I pushed patches 1-3 to kernelCI and didn't see any regressions.
> > I am a bit worried about changes to the DT interrupt parsing and
> > ancient platforms (such as PowerMacs). Most likely there wouldn't be
> > any report until -rc1 or months later on those old systems.
>
> Lets page the PPC lists to see if someone can test on some powermac.
/me eyes the XServe-G5 that hasn't been powered on in 10 years. What
could possibly go wrong?
M.
--
Without deviation from the norm, progress is not possible.
^ permalink raw reply
* Re: [PATCH v1 2/6] mm/memory_hotplug: remove CONFIG_MEMORY_HOTPLUG_SPARSE
From: Greg Kroah-Hartman @ 2021-10-05 13:43 UTC (permalink / raw)
To: David Hildenbrand
Cc: Michal Hocko, Michael S. Tsirkin, Peter Zijlstra, Jason Wang,
Dave Hansen, virtualization, linux-mm, Paul Mackerras,
linux-kselftest, H. Peter Anvin, Shuah Khan, Alex Shi,
Jonathan Corbet, x86, Ingo Molnar, Borislav Petkov,
Andy Lutomirski, Thomas Gleixner, Oscar Salvador, linux-doc,
linux-kernel, Rafael J. Wysocki, Andrew Morton, linuxppc-dev,
Mike Rapoport
In-Reply-To: <20210929143600.49379-3-david@redhat.com>
On Wed, Sep 29, 2021 at 04:35:56PM +0200, David Hildenbrand wrote:
> CONFIG_MEMORY_HOTPLUG depends on CONFIG_SPARSEMEM, so there is no need for
> CONFIG_MEMORY_HOTPLUG_SPARSE anymore; adjust all instances to use
> CONFIG_MEMORY_HOTPLUG and remove CONFIG_MEMORY_HOTPLUG_SPARSE.
>
> Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
^ permalink raw reply
* Re: [PATCH v1 2/6] mm/memory_hotplug: remove CONFIG_MEMORY_HOTPLUG_SPARSE
From: Shuah Khan @ 2021-10-05 14:22 UTC (permalink / raw)
To: David Hildenbrand, linux-kernel
Cc: Michal Hocko, Michael S. Tsirkin, Peter Zijlstra, Jason Wang,
Dave Hansen, virtualization, linux-mm, Paul Mackerras,
linux-kselftest, H. Peter Anvin, Shuah Khan, Alex Shi,
Jonathan Corbet, x86, Ingo Molnar, Borislav Petkov,
Andy Lutomirski, Shuah Khan, Thomas Gleixner, Oscar Salvador,
Greg Kroah-Hartman, linux-doc, Rafael J. Wysocki, Andrew Morton,
linuxppc-dev, Mike Rapoport
In-Reply-To: <20210929143600.49379-3-david@redhat.com>
On 9/29/21 8:35 AM, David Hildenbrand wrote:
> CONFIG_MEMORY_HOTPLUG depends on CONFIG_SPARSEMEM, so there is no need for
> CONFIG_MEMORY_HOTPLUG_SPARSE anymore; adjust all instances to use
> CONFIG_MEMORY_HOTPLUG and remove CONFIG_MEMORY_HOTPLUG_SPARSE.
>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
> arch/powerpc/include/asm/machdep.h | 2 +-
> arch/powerpc/kernel/setup_64.c | 2 +-
> arch/powerpc/platforms/powernv/setup.c | 4 ++--
> arch/powerpc/platforms/pseries/setup.c | 2 +-
> drivers/base/Makefile | 2 +-
> drivers/base/node.c | 9 ++++-----
> drivers/virtio/Kconfig | 2 +-
> include/linux/memory.h | 18 +++++++-----------
> include/linux/node.h | 4 ++--
> lib/Kconfig.debug | 2 +-
> mm/Kconfig | 4 ----
> mm/memory_hotplug.c | 2 --
> tools/testing/selftests/memory-hotplug/config | 1 -
> 13 files changed, 21 insertions(+), 33 deletions(-)
>
> {
> diff --git a/tools/testing/selftests/memory-hotplug/config b/tools/testing/selftests/memory-hotplug/config
> index a7e8cd5bb265..1eef042a31e1 100644
> --- a/tools/testing/selftests/memory-hotplug/config
> +++ b/tools/testing/selftests/memory-hotplug/config
> @@ -1,5 +1,4 @@
> CONFIG_MEMORY_HOTPLUG=y
> -CONFIG_MEMORY_HOTPLUG_SPARSE=y
> CONFIG_NOTIFIER_ERROR_INJECTION=y
> CONFIG_MEMORY_NOTIFIER_ERROR_INJECT=m
> CONFIG_MEMORY_HOTREMOVE=y
>
For Kselftest change:
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
thanks,
-- Shuah
^ permalink raw reply
* [PATCH v4 0/8] Fix long standing AER Error Handling Issues
From: Naveen Naidu @ 2021-10-05 17:18 UTC (permalink / raw)
To: bhelgaas, ruscur, oohall
Cc: linux-pci, linux-kernel, Naveen Naidu, skhan, linuxppc-dev,
linux-kernel-mentees
This patch series aims at fixing some of the AER error handling issues
we have.
Currently we have the following issues:
- Confusing message in aer_print_error()
- aer_err_info not being initialized completely in DPC path before
we print the AER logs
- A bug [1] in clearing of AER registers in the native AER path
[1] https://lore.kernel.org/linux-pci/20151229155822.GA17321@localhost/
The primary aim of this patch series is to converge the APEI path and the
native AER error handling paths. In our current code, we find that we
have two different behaviours (especially when it comes to clearing of
the AER registers) for the same functionality.
This patch series, tries to bring the same semantics and hence more
commonanlity between the APEI part of code and the native OS
handling of AER errors.
PATCH 1:
- Fixes the first issue
PATCH 2 - 4:
- Fixes the second issue
- "Patch 3/8" is dependent on "Patch 2/8" in the series
PATCH 5 - 7
- Deals with converging the various paths and brings more
commonality between them
- "Patch 6/8" depends on "Patch 1/8"
PATCH 8:
- Adds extra information in AER error logs.
Thanks,
Naveen Naidu
Changelog
=========
v4:
- Implement review comments
- Make "Patch 1/8" commit message more meaningful
- Fix the code comment error detected by kernel test robot
in "Patch 6/8"
v2 and v3:
- Fix up mail formatting and include the appropriate receipients for
the patch.
Naveen Naidu (8):
[PATCH v4 1/8] PCI/AER: Remove ID from aer_agent_string[]
[PATCH v4 2/8] PCI: Cleanup struct aer_err_info
[PATCH v4 3/8] PCI/DPC: Initialize info->id in dpc_process_error()
[PATCH v4 4/8] PCI/DPC: Use pci_aer_clear_status() in dpc_process_error()
[PATCH v4 5/8] PCI/DPC: Converge EDR and DPC Path of clearing AER registers
[PATCH v4 6/8] PCI/AER: Clear error device AER registers in aer_irq()
[PATCH v4 7/8] PCI/ERR: Remove redundant clearing of AER register in pcie_do_recovery()
[PATCH v4 8/8] PCI/AER: Include DEVCTL in aer_print_error()
drivers/pci/pci.h | 23 +++-
drivers/pci/pcie/aer.c | 269 ++++++++++++++++++++++++++++-------------
drivers/pci/pcie/dpc.c | 9 +-
drivers/pci/pcie/err.c | 9 +-
4 files changed, 209 insertions(+), 101 deletions(-)
--
2.25.1
^ permalink raw reply
* [PATCH v4 1/8] PCI/AER: Remove ID from aer_agent_string[]
From: Naveen Naidu @ 2021-10-05 17:18 UTC (permalink / raw)
To: bhelgaas, ruscur, oohall
Cc: linux-pci, linux-kernel, Naveen Naidu, skhan, linuxppc-dev,
linux-kernel-mentees
In-Reply-To: <cover.1633453452.git.naveennaidu479@gmail.com>
Currently, we do not print the "id" field in the AER error logs. Yet the
aer_agent_string[] has the word "id" in it. The AER error log looks
like:
pcieport 0000:00:03.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)
Without the "id" field in the error log, The aer_agent_string[]
(eg: "Receiver ID") does not make sense. A user reading the
aer_agent_string[] in the log, might inadvertently look for an "id"
field and not finding it might lead to confusion.
Remove the "ID" from the aer_agent_string[].
The following are sample dummy errors inject via aer-inject.
Before
=======
In 010caed4ccb6 ("PCI/AER: Decode Error Source Requester ID"),
the "id" field was removed from the AER error logs, so currently AER
logs look like:
pcieport 0000:00:03.0: AER: Corrected error received: 0000:00:03:0
pcieport 0000:00:03.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID) <--- no id field
pcieport 0000:00:03.0: device [1b36:000c] error status/mask=00000040/0000e000
pcieport 0000:00:03.0: [ 6] BadTLP
After
======
pcieport 0000:00:03.0: AER: Corrected error received: 0000:00:03.0
pcieport 0000:00:03.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver)
pcieport 0000:00:03.0: device [1b36:000c] error status/mask=00000040/0000e000
pcieport 0000:00:03.0: [ 6] BadTLP
Signed-off-by: Naveen Naidu <naveennaidu479@gmail.com>
---
drivers/pci/pcie/aer.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 9784fdcf3006..241ff361b43c 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -516,10 +516,10 @@ static const char *aer_uncorrectable_error_string[] = {
};
static const char *aer_agent_string[] = {
- "Receiver ID",
- "Requester ID",
- "Completer ID",
- "Transmitter ID"
+ "Receiver",
+ "Requester",
+ "Completer",
+ "Transmitter"
};
#define aer_stats_dev_attr(name, stats_array, strings_array, \
@@ -703,7 +703,7 @@ void aer_print_error(struct pci_dev *dev, struct aer_err_info *info)
const char *level;
if (!info->status) {
- pci_err(dev, "PCIe Bus Error: severity=%s, type=Inaccessible, (Unregistered Agent ID)\n",
+ pci_err(dev, "PCIe Bus Error: severity=%s, type=Inaccessible, (Unregistered Agent)\n",
aer_error_severity_string[info->severity]);
goto out;
}
--
2.25.1
^ permalink raw reply related
* [PATCH v4 2/8] PCI: Cleanup struct aer_err_info
From: Naveen Naidu @ 2021-10-05 17:18 UTC (permalink / raw)
To: bhelgaas, ruscur, oohall
Cc: linux-pci, linux-kernel, Naveen Naidu, skhan, linuxppc-dev,
linux-kernel-mentees
In-Reply-To: <cover.1633453452.git.naveennaidu479@gmail.com>
The id, status and the mask fields of the struct aer_err_info comes
directly from the registers, hence their sizes should be explicit.
The length of these registers are:
- id: 16 bits - Represents the Error Source Requester ID
- status: 32 bits - COR/UNCOR Error Status
- mask: 32 bits - COR/UNCOR Error Mask
Since the length of the above registers are even, use u16 and u32
to represent their values.
Also remove the __pad fields.
"pahole" was run on the modified struct aer_err_info and the size
remains unchanged.
Signed-off-by: Naveen Naidu <naveennaidu479@gmail.com>
---
drivers/pci/pci.h | 8 +++-----
1 file changed, 3 insertions(+), 5 deletions(-)
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 1cce56c2aea0..9be7a966fda7 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -427,18 +427,16 @@ struct aer_err_info {
struct pci_dev *dev[AER_MAX_MULTI_ERR_DEVICES];
int error_dev_num;
- unsigned int id:16;
+ u16 id;
unsigned int severity:2; /* 0:NONFATAL | 1:FATAL | 2:COR */
- unsigned int __pad1:5;
unsigned int multi_error_valid:1;
unsigned int first_error:5;
- unsigned int __pad2:2;
unsigned int tlp_header_valid:1;
- unsigned int status; /* COR/UNCOR Error Status */
- unsigned int mask; /* COR/UNCOR Error Mask */
+ u32 status; /* COR/UNCOR Error Status */
+ u32 mask; /* COR/UNCOR Error Mask */
struct aer_header_log_regs tlp; /* TLP Header */
};
--
2.25.1
^ permalink raw reply related
* [PATCH v4 3/8] PCI/DPC: Initialize info->id in dpc_process_error()
From: Naveen Naidu @ 2021-10-05 17:18 UTC (permalink / raw)
To: bhelgaas, ruscur, oohall
Cc: linux-pci, linux-kernel, Naveen Naidu, skhan, linuxppc-dev,
linux-kernel-mentees
In-Reply-To: <cover.1633453452.git.naveennaidu479@gmail.com>
In the dpc_process_error() path, info->id isn't initialized before being
passed to aer_print_error(). In the corresponding AER path, it is
initialized in aer_isr_one_error().
The error message shown during Coverity Scan is:
Coverity #1461602
CID 1461602 (#1 of 1): Uninitialized scalar variable (UNINIT)
8. uninit_use_in_call: Using uninitialized value info.id when calling aer_print_error.
Initialize the "info->id" before passing it to aer_print_error()
Fixes: 8aefa9b0d910 ("PCI/DPC: Print AER status in DPC event handling")
Signed-off-by: Naveen Naidu <naveennaidu479@gmail.com>
---
drivers/pci/pcie/dpc.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
index c556e7beafe3..df3f3a10f8bc 100644
--- a/drivers/pci/pcie/dpc.c
+++ b/drivers/pci/pcie/dpc.c
@@ -262,14 +262,14 @@ static int dpc_get_aer_uncorrect_severity(struct pci_dev *dev,
void dpc_process_error(struct pci_dev *pdev)
{
- u16 cap = pdev->dpc_cap, status, source, reason, ext_reason;
+ u16 cap = pdev->dpc_cap, status, reason, ext_reason;
struct aer_err_info info;
pci_read_config_word(pdev, cap + PCI_EXP_DPC_STATUS, &status);
- pci_read_config_word(pdev, cap + PCI_EXP_DPC_SOURCE_ID, &source);
+ pci_read_config_word(pdev, cap + PCI_EXP_DPC_SOURCE_ID, &info.id);
pci_info(pdev, "containment event, status:%#06x source:%#06x\n",
- status, source);
+ status, info.id);
reason = (status & PCI_EXP_DPC_STATUS_TRIGGER_RSN) >> 1;
ext_reason = (status & PCI_EXP_DPC_STATUS_TRIGGER_RSN_EXT) >> 5;
--
2.25.1
^ permalink raw reply related
* [PATCH v4 4/8] PCI/DPC: Use pci_aer_clear_status() in dpc_process_error()
From: Naveen Naidu @ 2021-10-05 17:18 UTC (permalink / raw)
To: bhelgaas, ruscur, oohall
Cc: linux-pci, linux-kernel, Naveen Naidu, skhan, linuxppc-dev,
linux-kernel-mentees
In-Reply-To: <cover.1633453452.git.naveennaidu479@gmail.com>
dpc_process_error() clears both AER fatal and non fatal status
registers. Instead of clearing each status registers via a different
function call use pci_aer_clear_status().
This helps clean up the code a bit.
Signed-off-by: Naveen Naidu <naveennaidu479@gmail.com>
---
drivers/pci/pcie/dpc.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
index df3f3a10f8bc..faf4a1e77fab 100644
--- a/drivers/pci/pcie/dpc.c
+++ b/drivers/pci/pcie/dpc.c
@@ -288,8 +288,7 @@ void dpc_process_error(struct pci_dev *pdev)
dpc_get_aer_uncorrect_severity(pdev, &info) &&
aer_get_device_error_info(pdev, &info)) {
aer_print_error(pdev, &info);
- pci_aer_clear_nonfatal_status(pdev);
- pci_aer_clear_fatal_status(pdev);
+ pci_aer_clear_status(pdev);
}
}
--
2.25.1
^ permalink raw reply related
* [PATCH v4 5/8] PCI/DPC: Converge EDR and DPC Path of clearing AER registers
From: Naveen Naidu @ 2021-10-05 17:18 UTC (permalink / raw)
To: bhelgaas, ruscur, oohall
Cc: linux-pci, linux-kernel, Naveen Naidu, skhan, linuxppc-dev,
linux-kernel-mentees
In-Reply-To: <cover.1633453452.git.naveennaidu479@gmail.com>
In the EDR path, AER registers are cleared *after* DPC error event is
processed. The process stack in EDR is:
edr_handle_event()
dpc_process_error()
pci_aer_raw_clear_status()
pcie_do_recovery()
But in DPC path, AER status registers are cleared *while* processing
the error. The process stack in DPC is:
dpc_handler()
dpc_process_error()
pci_aer_clear_status()
pcie_do_recovery()
In EDR path, AER status registers are cleared irrespective of whether
the error was an RP PIO or unmasked uncorrectable error. But in DPC, the
AER status registers are cleared only when it's an unmasked uncorrectable
error.
This leads to two different behaviours for the same task (handling of
DPC errors) in FFS systems and when native OS has control.
Bring the same semantics for clearing the AER status register in EDR
path and DPC path.
Signed-off-by: Naveen Naidu <naveennaidu479@gmail.com>
---
drivers/pci/pcie/dpc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
index faf4a1e77fab..68899a3db126 100644
--- a/drivers/pci/pcie/dpc.c
+++ b/drivers/pci/pcie/dpc.c
@@ -288,7 +288,6 @@ void dpc_process_error(struct pci_dev *pdev)
dpc_get_aer_uncorrect_severity(pdev, &info) &&
aer_get_device_error_info(pdev, &info)) {
aer_print_error(pdev, &info);
- pci_aer_clear_status(pdev);
}
}
@@ -297,6 +296,7 @@ static irqreturn_t dpc_handler(int irq, void *context)
struct pci_dev *pdev = context;
dpc_process_error(pdev);
+ pci_aer_clear_status(pdev);
/* We configure DPC so it only triggers on ERR_FATAL */
pcie_do_recovery(pdev, pci_channel_io_frozen, dpc_reset_link);
--
2.25.1
^ permalink raw reply related
* [PATCH v4 6/8] PCI/AER: Clear error device AER registers in aer_irq()
From: Naveen Naidu @ 2021-10-05 17:18 UTC (permalink / raw)
To: bhelgaas, ruscur, oohall
Cc: linux-pci, linux-kernel, Naveen Naidu, skhan, linuxppc-dev,
linux-kernel-mentees
In-Reply-To: <cover.1633453452.git.naveennaidu479@gmail.com>
Converge the APEI path and native AER path of clearing the AER registers
of the error device.
In APEI path, the system firmware clears the AER registers before
handing off the record to OS. But in "native AER" path, the execution
path of clearing the AER register is as follows:
aer_isr_one_error
aer_print_port_info
if (find_source_device())
aer_process_err_devices
handle_error_source
pci_write_config_dword(dev, PCI_ERR_COR_STATUS, ...)
The above path has a bug, if the find_source_device() fails, AER
registers are not cleared from the error device. This means, the error
device will keep reporting the error again and again and would lead
to message spew.
Related Bug Report:
https://lore.kernel.org/linux-pci/20151229155822.GA17321@localhost/
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1521173
The above bug could be avoided, if the AER registers are cleared during
the AER IRQ handler aer_irq(), which would provide guarantee that the AER
error registers are always cleared. This is similar to how APEI handles
these errors.
The main aim is that:
When an interrupt handler deals with a interrupt, it must *always*
clear the source of the interrupt.
Signed-off-by: Naveen Naidu <naveennaidu479@gmail.com>
---
drivers/pci/pci.h | 13 ++-
drivers/pci/pcie/aer.c | 249 ++++++++++++++++++++++++++++-------------
2 files changed, 184 insertions(+), 78 deletions(-)
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 9be7a966fda7..eb88d8bfeaf7 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -424,7 +424,6 @@ static inline bool pci_dev_is_added(const struct pci_dev *dev)
#define AER_MAX_MULTI_ERR_DEVICES 5 /* Not likely to have more */
struct aer_err_info {
- struct pci_dev *dev[AER_MAX_MULTI_ERR_DEVICES];
int error_dev_num;
u16 id;
@@ -440,6 +439,18 @@ struct aer_err_info {
struct aer_header_log_regs tlp; /* TLP Header */
};
+/* Preliminary AER error information processed from Root port */
+struct aer_devices_err_info {
+ struct pci_dev *dev[AER_MAX_MULTI_ERR_DEVICES];
+ struct aer_err_info err_info;
+};
+
+/* AER information associated with each error device */
+struct aer_dev_err_info {
+ struct pci_dev *dev;
+ struct aer_err_info err_info;
+};
+
int aer_get_device_error_info(struct pci_dev *dev, struct aer_err_info *info);
void aer_print_error(struct pci_dev *dev, struct aer_err_info *info);
#endif /* CONFIG_PCIEAER */
diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 241ff361b43c..d3937f5384e4 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -36,6 +36,18 @@
#define AER_ERROR_SOURCES_MAX 128
+/*
+ * There can be 128 maximum error sources (AER_ERROR_SOURCES_MAX) and each
+ * error source can have maximum of 5 error devices (AER_MAX_MULTI_ERR_DEVICES)
+ * so the maximum error devices we can report is:
+ *
+ * AER_ERROR_DEVICES_MAX = AER_ERROR_SOURCES_MAX * AER_MAX_MULTI_ERR_DEVICES == (128 * 5) == 640
+ *
+ * But since, the size in KFIFO should be a power of two, the closest value
+ * to 640 is 1024
+ */
+# define AER_ERROR_DEVICES_MAX 1024
+
#define AER_MAX_TYPEOF_COR_ERRS 16 /* as per PCI_ERR_COR_STATUS */
#define AER_MAX_TYPEOF_UNCOR_ERRS 27 /* as per PCI_ERR_UNCOR_STATUS*/
@@ -46,7 +58,7 @@ struct aer_err_source {
struct aer_rpc {
struct pci_dev *rpd; /* Root Port device */
- DECLARE_KFIFO(aer_fifo, struct aer_err_source, AER_ERROR_SOURCES_MAX);
+ DECLARE_KFIFO(aer_fifo, struct aer_dev_err_info, AER_ERROR_DEVICES_MAX);
};
/* AER stats for the device */
@@ -803,14 +815,14 @@ void cper_print_aer(struct pci_dev *dev, int aer_severity,
/**
* add_error_device - list device to be handled
- * @e_info: pointer to error info
+ * @e_dev: pointer to error info
* @dev: pointer to pci_dev to be added
*/
-static int add_error_device(struct aer_err_info *e_info, struct pci_dev *dev)
+static int add_error_device(struct aer_devices_err_info *e_dev, struct pci_dev *dev)
{
- if (e_info->error_dev_num < AER_MAX_MULTI_ERR_DEVICES) {
- e_info->dev[e_info->error_dev_num] = pci_dev_get(dev);
- e_info->error_dev_num++;
+ if (e_dev->err_info.error_dev_num < AER_MAX_MULTI_ERR_DEVICES) {
+ e_dev->dev[e_dev->err_info.error_dev_num] = pci_dev_get(dev);
+ e_dev->err_info.error_dev_num++;
return 0;
}
return -ENOSPC;
@@ -877,18 +889,18 @@ static bool is_error_source(struct pci_dev *dev, struct aer_err_info *e_info)
static int find_device_iter(struct pci_dev *dev, void *data)
{
- struct aer_err_info *e_info = (struct aer_err_info *)data;
+ struct aer_devices_err_info *e_dev = (struct aer_devices_err_info *)data;
- if (is_error_source(dev, e_info)) {
+ if (is_error_source(dev, &e_dev->err_info)) {
/* List this device */
- if (add_error_device(e_info, dev)) {
+ if (add_error_device(e_dev, dev)) {
/* We cannot handle more... Stop iteration */
/* TODO: Should print error message here? */
return 1;
}
/* If there is only a single error, stop iteration */
- if (!e_info->multi_error_valid)
+ if (!e_dev->err_info.multi_error_valid)
return 1;
}
return 0;
@@ -897,7 +909,7 @@ static int find_device_iter(struct pci_dev *dev, void *data)
/**
* find_source_device - search through device hierarchy for source device
* @parent: pointer to Root Port pci_dev data structure
- * @e_info: including detailed error information such like id
+ * @e_dev: including detailed error information such like id
*
* Return true if found.
*
@@ -907,26 +919,26 @@ static int find_device_iter(struct pci_dev *dev, void *data)
* e_info->error_dev_num and e_info->dev[], based on the given information.
*/
static bool find_source_device(struct pci_dev *parent,
- struct aer_err_info *e_info)
+ struct aer_devices_err_info *e_dev)
{
struct pci_dev *dev = parent;
int result;
/* Must reset in this function */
- e_info->error_dev_num = 0;
+ e_dev->err_info.error_dev_num = 0;
/* Is Root Port an agent that sends error message? */
- result = find_device_iter(dev, e_info);
+ result = find_device_iter(dev, e_dev);
if (result)
return true;
if (pci_pcie_type(parent) == PCI_EXP_TYPE_RC_EC)
- pcie_walk_rcec(parent, find_device_iter, e_info);
+ pcie_walk_rcec(parent, find_device_iter, e_dev);
else
- pci_walk_bus(parent->subordinate, find_device_iter, e_info);
+ pci_walk_bus(parent->subordinate, find_device_iter, e_dev);
- if (!e_info->error_dev_num) {
- pci_info(parent, "can't find device of ID%04x\n", e_info->id);
+ if (!e_dev->err_info.error_dev_num) {
+ pci_info(parent, "can't find device of ID%04x\n", e_dev->err_info.id);
return false;
}
return true;
@@ -940,24 +952,42 @@ static bool find_source_device(struct pci_dev *parent,
* Invoked when an error being detected by Root Port.
*/
static void handle_error_source(struct pci_dev *dev, struct aer_err_info *info)
+{
+ /*
+ * Correctable error does not need software intervention.
+ * No need to go through error recovery process.
+ */
+ if (info->severity == AER_NONFATAL)
+ pcie_do_recovery(dev, pci_channel_io_normal, aer_root_reset);
+ else if (info->severity == AER_FATAL)
+ pcie_do_recovery(dev, pci_channel_io_frozen, aer_root_reset);
+ pci_dev_put(dev);
+}
+
+/**
+ * clear_error_source_aer_registers - clear AER registers of the error source device
+ * @dev: pointer to pci_dev data structure of error source device
+ * @info: comprehensive error information
+ *
+ * Invoked when an error being detected by Root Port but before we handle the
+ * error.
+ */
+static void clear_error_source_aer_registers(struct pci_dev *dev, struct aer_err_info info)
{
int aer = dev->aer_cap;
- if (info->severity == AER_CORRECTABLE) {
- /*
- * Correctable error does not need software intervention.
- * No need to go through error recovery process.
- */
+ if (info.severity == AER_CORRECTABLE) {
if (aer)
pci_write_config_dword(dev, aer + PCI_ERR_COR_STATUS,
- info->status);
+ info.status);
if (pcie_aer_is_native(dev))
pcie_clear_device_status(dev);
- } else if (info->severity == AER_NONFATAL)
- pcie_do_recovery(dev, pci_channel_io_normal, aer_root_reset);
- else if (info->severity == AER_FATAL)
- pcie_do_recovery(dev, pci_channel_io_frozen, aer_root_reset);
- pci_dev_put(dev);
+ } else if (info.severity == AER_NONFATAL) {
+ pci_aer_clear_nonfatal_status(dev);
+ } else if (info.severity == AER_FATAL) {
+ pci_aer_clear_fatal_status(dev);
+ }
+
}
#ifdef CONFIG_ACPI_APEI_PCIEAER
@@ -1093,70 +1123,112 @@ int aer_get_device_error_info(struct pci_dev *dev, struct aer_err_info *info)
return 1;
}
-static inline void aer_process_err_devices(struct aer_err_info *e_info)
-{
- int i;
-
- /* Report all before handle them, not to lost records by reset etc. */
- for (i = 0; i < e_info->error_dev_num && e_info->dev[i]; i++) {
- if (aer_get_device_error_info(e_info->dev[i], e_info))
- aer_print_error(e_info->dev[i], e_info);
- }
- for (i = 0; i < e_info->error_dev_num && e_info->dev[i]; i++) {
- if (aer_get_device_error_info(e_info->dev[i], e_info))
- handle_error_source(e_info->dev[i], e_info);
- }
-}
-
/**
- * aer_isr_one_error - consume an error detected by root port
- * @rpc: pointer to the root port which holds an error
+ * aer_find_corr_error_source_device - find the error source which detected the corrected error
+ * @rp: pointer to Root Port pci_dev data structure
* @e_src: pointer to an error source
+ * @e_info: including detailed error information such like id
+ *
+ * Return true if found.
+ *
+ * Process the error information received at the Root Port, set these values
+ * in the aer_devices_err_info and find all the devices that are related to
+ * the error.
*/
-static void aer_isr_one_error(struct aer_rpc *rpc,
- struct aer_err_source *e_src)
+static bool aer_find_corr_error_source_device(struct pci_dev *rp,
+ struct aer_err_source *e_src,
+ struct aer_devices_err_info *e_info)
{
- struct pci_dev *pdev = rpc->rpd;
- struct aer_err_info e_info;
-
- pci_rootport_aer_stats_incr(pdev, e_src);
-
- /*
- * There is a possibility that both correctable error and
- * uncorrectable error being logged. Report correctable error first.
- */
if (e_src->status & PCI_ERR_ROOT_COR_RCV) {
- e_info.id = ERR_COR_ID(e_src->id);
- e_info.severity = AER_CORRECTABLE;
+ e_info->err_info.id = ERR_COR_ID(e_src->id);
+ e_info->err_info.severity = AER_CORRECTABLE;
if (e_src->status & PCI_ERR_ROOT_MULTI_COR_RCV)
- e_info.multi_error_valid = 1;
+ e_info->err_info.multi_error_valid = 1;
else
- e_info.multi_error_valid = 0;
- aer_print_port_info(pdev, &e_info);
+ e_info->err_info.multi_error_valid = 0;
- if (find_source_device(pdev, &e_info))
- aer_process_err_devices(&e_info);
+ if (!find_source_device(rp, e_info))
+ return false;
}
+ return true;
+}
+/**
+ * aer_find_uncorr_error_source_device - find the error source which detected the uncorrected error
+ * @rp: pointer to Root Port pci_dev data structure
+ * @e_src: pointer to an error source
+ * @e_info: including detailed error information such like id
+ *
+ * Return true if found.
+ *
+ * Process the error information received at the Root Port, set these values
+ * in the aer_devices_err_info and find all the devices that are related to
+ * the error.
+ */
+static bool aer_find_uncorr_error_source_device(struct pci_dev *rp,
+ struct aer_err_source *e_src,
+ struct aer_devices_err_info *e_info)
+{
if (e_src->status & PCI_ERR_ROOT_UNCOR_RCV) {
- e_info.id = ERR_UNCOR_ID(e_src->id);
+ e_info->err_info.id = ERR_UNCOR_ID(e_src->id);
if (e_src->status & PCI_ERR_ROOT_FATAL_RCV)
- e_info.severity = AER_FATAL;
+ e_info->err_info.severity = AER_FATAL;
else
- e_info.severity = AER_NONFATAL;
+ e_info->err_info.severity = AER_NONFATAL;
if (e_src->status & PCI_ERR_ROOT_MULTI_UNCOR_RCV)
- e_info.multi_error_valid = 1;
+ e_info->err_info.multi_error_valid = 1;
else
- e_info.multi_error_valid = 0;
+ e_info->err_info.multi_error_valid = 0;
+
+ if (!find_source_device(rp, e_info))
+ return false;
+ }
- aer_print_port_info(pdev, &e_info);
+ return true;
+}
- if (find_source_device(pdev, &e_info))
- aer_process_err_devices(&e_info);
+/**
+ * aer_isr_one_error - consume an error detected by root port
+ * @rp: pointer to Root Port pci_dev data structure
+ * @e_dev: pointer to an error device
+ */
+static void aer_isr_one_error(struct pci_dev *rp, struct aer_dev_err_info *e_dev)
+{
+ aer_print_port_info(rp, &e_dev->err_info);
+ aer_print_error(e_dev->dev, &e_dev->err_info);
+ handle_error_source(e_dev->dev, &e_dev->err_info);
+}
+
+static bool aer_add_err_devices_to_queue(struct aer_rpc *rpc,
+ struct aer_devices_err_info *e_info)
+{
+ int i;
+ struct aer_dev_err_info *e_dev;
+
+ e_dev = kzalloc(sizeof(*e_dev), GFP_ATOMIC);
+ if (!e_dev)
+ return false;
+
+ for (i = 0; i < e_info->err_info.error_dev_num && e_info->dev[i]; i++) {
+ e_dev->err_info = e_info->err_info;
+ e_dev->dev = e_info->dev[i];
+
+ /*
+ * Store the AER register information for each error device on
+ * the queue
+ */
+ if (aer_get_device_error_info(e_dev->dev, &e_dev->err_info)) {
+ if (!kfifo_put(&rpc->aer_fifo, *e_dev))
+ return false;
+
+ clear_error_source_aer_registers(e_dev->dev, e_dev->err_info);
+ }
}
+
+ return true;
}
/**
@@ -1170,13 +1242,13 @@ static irqreturn_t aer_isr(int irq, void *context)
{
struct pcie_device *dev = (struct pcie_device *)context;
struct aer_rpc *rpc = get_service_data(dev);
- struct aer_err_source e_src;
+ struct aer_dev_err_info e_dev;
if (kfifo_is_empty(&rpc->aer_fifo))
return IRQ_NONE;
- while (kfifo_get(&rpc->aer_fifo, &e_src))
- aer_isr_one_error(rpc, &e_src);
+ while (kfifo_get(&rpc->aer_fifo, &e_dev))
+ aer_isr_one_error(rpc->rpd, &e_dev);
return IRQ_HANDLED;
}
@@ -1194,6 +1266,11 @@ static irqreturn_t aer_irq(int irq, void *context)
struct pci_dev *rp = rpc->rpd;
int aer = rp->aer_cap;
struct aer_err_source e_src = {};
+ struct aer_devices_err_info *e_info;
+
+ e_info = kzalloc(sizeof(*e_info), GFP_ATOMIC);
+ if (!e_info)
+ return IRQ_NONE;
pci_read_config_dword(rp, aer + PCI_ERR_ROOT_STATUS, &e_src.status);
if (!(e_src.status & (PCI_ERR_ROOT_UNCOR_RCV|PCI_ERR_ROOT_COR_RCV)))
@@ -1202,8 +1279,26 @@ static irqreturn_t aer_irq(int irq, void *context)
pci_read_config_dword(rp, aer + PCI_ERR_ROOT_ERR_SRC, &e_src.id);
pci_write_config_dword(rp, aer + PCI_ERR_ROOT_STATUS, e_src.status);
- if (!kfifo_put(&rpc->aer_fifo, e_src))
- return IRQ_HANDLED;
+ pci_rootport_aer_stats_incr(rp, &e_src);
+
+ /*
+ * There is a possibility that both correctable error and
+ * uncorrectable error are being logged. Find the devices which caused
+ * correctable errors first so that they can be added to the queue first
+ * and will be reported first.
+ *
+ * Before adding the error device to the queue to be handled, clear the
+ * AER status registers.
+ */
+ if (aer_find_corr_error_source_device(rp, &e_src, e_info)) {
+ if (!aer_add_err_devices_to_queue(rpc, e_info))
+ return IRQ_NONE;
+ }
+
+ if (aer_find_uncorr_error_source_device(rp, &e_src, e_info)) {
+ if (!aer_add_err_devices_to_queue(rpc, e_info))
+ return IRQ_NONE;
+ }
return IRQ_WAKE_THREAD;
}
--
2.25.1
^ permalink raw reply related
* [PATCH v4 7/8] PCI/ERR: Remove redundant clearing of AER register in pcie_do_recovery()
From: Naveen Naidu @ 2021-10-05 17:18 UTC (permalink / raw)
To: bhelgaas, ruscur, oohall
Cc: linux-pci, linux-kernel, Naveen Naidu, skhan, linuxppc-dev,
linux-kernel-mentees
In-Reply-To: <cover.1633453452.git.naveennaidu479@gmail.com>
pcie_do_recovery() is shared across the following paths:
- ACPI APEI
- Native AER path
- EDR
- DPC
ACPI APEI
==========
ghes_handle_aer()
aer_recover_queue()
kfifo_in_spinlocked(aer_recover_ring)
aer_recover_work_func()
while (kfifo_get(aer_recover_ring))
pcie_do_recovery()
In this path the system firmware clears the AER registers before
handing off the record to the OS in ghes_handle_aer()
Native AER
==========
aer_irq()
aer_add_err_devices_to_queue()
kfifo_put(&rpc->aer_fifo, *e_dev)
clear_error_source_aer_registers() <---- AER registers are cleard
aer_isr()
aer_isr_one_error()
handle_error_source()
pcie_do_recovery()
The AER registers are cleared during the handling of IRQ, i.e before we
the recovery starts.
DPC
=====
dpc_handler()
dpc_process_error()
pci_aer_clear_status() <---- AER registers are cleared
pcie_do_recovery()
EDR
====
edr_handle_event()
dpc_process_error()
pci_aer_raw_clear_status() <---- AER registers are cleared
pcie_do_recovery()
In all the above paths, the AER registers are cleared before
pcie_do_recovery(). The non fatal status AER registers are again cleared
in pcie_do_recovery(). This is redundant.
Remove redundant clearing of AER register in pcie_do_recovery()
Signed-off-by: Naveen Naidu <naveennaidu479@gmail.com>
---
drivers/pci/pcie/err.c | 9 +++------
1 file changed, 3 insertions(+), 6 deletions(-)
diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index b576aa890c76..fe04b0ae22f4 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -231,14 +231,11 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
/*
* If we have native control of AER, clear error status in the device
- * that detected the error. If the platform retained control of AER,
- * it is responsible for clearing this status. In that case, the
- * signaling device may not even be visible to the OS.
+ * that detected the error.
*/
- if (host->native_aer || pcie_ports_native) {
+ if (host->native_aer || pcie_ports_native)
pcie_clear_device_status(dev);
- pci_aer_clear_nonfatal_status(dev);
- }
+
pci_info(bridge, "device recovery successful\n");
return status;
--
2.25.1
^ permalink raw reply related
* [PATCH v4 8/8] PCI/AER: Include DEVCTL in aer_print_error()
From: Naveen Naidu @ 2021-10-05 17:18 UTC (permalink / raw)
To: bhelgaas, ruscur, oohall
Cc: linux-pci, linux-kernel, Naveen Naidu, skhan, linuxppc-dev,
linux-kernel-mentees
In-Reply-To: <cover.1633453452.git.naveennaidu479@gmail.com>
Print the contents of Device Control Register of the device which
detected the error. This might help in faster error diagnosis.
Sample output from dummy error injected by aer-inject:
pcieport 0000:00:03.0: AER: Corrected error received: 0000:00:03.0
pcieport 0000:00:03.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver)
pcieport 0000:00:03.0: device [1b36:000c] error status/mask=00000040/0000e000, devctl=0x000f <-- devctl added to the error log
pcieport 0000:00:03.0: [ 6] BadTLP
Signed-off-by: Naveen Naidu <naveennaidu479@gmail.com>
---
drivers/pci/pci.h | 2 ++
drivers/pci/pcie/aer.c | 10 ++++++++--
2 files changed, 10 insertions(+), 2 deletions(-)
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index eb88d8bfeaf7..48ed7f91113b 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -437,6 +437,8 @@ struct aer_err_info {
u32 status; /* COR/UNCOR Error Status */
u32 mask; /* COR/UNCOR Error Mask */
struct aer_header_log_regs tlp; /* TLP Header */
+
+ u16 devctl;
};
/* Preliminary AER error information processed from Root port */
diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index d3937f5384e4..fdeef9deb016 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -729,8 +729,8 @@ void aer_print_error(struct pci_dev *dev, struct aer_err_info *info)
aer_error_severity_string[info->severity],
aer_error_layer[layer], aer_agent_string[agent]);
- pci_printk(level, dev, " device [%04x:%04x] error status/mask=%08x/%08x\n",
- dev->vendor, dev->device, info->status, info->mask);
+ pci_printk(level, dev, " device [%04x:%04x] error status/mask=%08x/%08x, devctl=%#06x\n",
+ dev->vendor, dev->device, info->status, info->mask, info->devctl);
__aer_print_error(dev, info);
@@ -1083,6 +1083,12 @@ int aer_get_device_error_info(struct pci_dev *dev, struct aer_err_info *info)
if (!aer)
return 0;
+ /*
+ * Cache the value of Device Control Register now, because later the
+ * device might not be available
+ */
+ pcie_capability_read_word(dev, PCI_EXP_DEVCTL, &info->devctl);
+
if (info->severity == AER_CORRECTABLE) {
pci_read_config_dword(dev, aer + PCI_ERR_COR_STATUS,
&info->status);
--
2.25.1
^ permalink raw reply related
* Re: [PATCH 2/4] perf: Add mem_hops field in perf_mem_data_src structure
From: Peter Zijlstra @ 2021-10-05 20:20 UTC (permalink / raw)
To: Kajol Jain
Cc: mark.rutland, atrajeev, ak, daniel, rnsastry, alexander.shishkin,
linux-kernel, acme, ast, linux-perf-users, yao.jin, mingo, paulus,
maddy, jolsa, namhyung, songliubraving, linuxppc-dev, kan.liang
In-Reply-To: <20211005091837.250044-2-kjain@linux.ibm.com>
On Tue, Oct 05, 2021 at 02:48:35PM +0530, Kajol Jain wrote:
> Going forward, future generation systems can have more hierarchy
> within the chip/package level but currently we don't have any data source
> encoding field in perf, which can be used to represent this level of data.
>
> Add a new field called 'mem_hops' in the perf_mem_data_src structure
> which can be used to represent intra-chip/package or inter-chip/off-package
> details. This field is of size 3 bits where PERF_MEM_HOPS_{NA, 0..6} value
> can be used to present different hop levels data.
>
> Also add corresponding macros to define mem_hop field values
> and shift value.
>
> Currently we define macro for HOPS_0 which corresponds
> to data coming from another core but same chip.
>
> For ex: Encodings for mem_hops fields with L2 cache:
>
> L2 - local L2
> L2 | REMOTE | HOPS_0 - remote core, same chip L2
Can we do s/chip/node/ ? Hops are something NUMA related, while chips
come in a bag or something :-)
> +/* hop level */
> +#define PERF_MEM_HOPS_0 0x01 /* remote core, same chip */
> +/* 2-7 available */
> +#define PERF_MEM_HOPS_SHIFT 43
^ permalink raw reply
* Re: [PATCH 3/9] powerpc/bpf: Remove unused SEEN_STACK
From: Naveen N. Rao @ 2021-10-05 20:22 UTC (permalink / raw)
To: Alexei Starovoitov, Christophe Leroy, Daniel Borkmann,
Johan Almbladh, Michael Ellerman, Nicholas Piggin
Cc: bpf, linuxppc-dev
In-Reply-To: <a9904ed3-c9fc-d86f-a720-de0a7e7a8938@csgroup.eu>
Christophe Leroy wrote:
>
>
> Le 04/10/2021 à 20:11, Naveen N. Rao a écrit :
>> Christophe Leroy wrote:
>>>
>>>
>>> Le 01/10/2021 à 23:14, Naveen N. Rao a écrit :
>>>> From: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
>>>>
>>>> SEEN_STACK is unused on PowerPC. Remove it. Also, have
>>>> SEEN_TAILCALL use 0x40000000.
>>>
>>> Why change SEEN_TAILCALL ? Would it be a problem to leave it as is ?
>>>
>>>>
>>>> Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
>>>> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
>>
>> I prefer the bit usage to be contiguous. Changing SEEN_TAILCALL isn't a
>> problem either.
>>
>
> Well you are adding SEEN_BIG_PROG in following patch so it would still
> be contiguous at the end.
>
> I don't really mind but I thought it would be less churn to just leave
> SEEN_TAILCALL as is and re-use 0x40000000 for SEEN_BIG_PROG.
Ah ok. This patch was from a different series and it made more sense to
change the bit number there. I have reused the patch here as-is since
the change is fairly trivial.
- Naveen
^ permalink raw reply
* [PATCH v2 00/10] powerpc/bpf: Various fixes
From: Naveen N. Rao @ 2021-10-05 20:25 UTC (permalink / raw)
To: Michael Ellerman, Nicholas Piggin, Jordan Niethe, Daniel Borkmann,
Alexei Starovoitov, Christophe Leroy, Johan Almbladh, Song Liu
Cc: bpf, linuxppc-dev
This is v2 of the series posted at:
http://lkml.kernel.org/r/cover.1633104510.git.naveen.n.rao@linux.vnet.ibm.com
Only patches from v1 that need to go into powerpc/fixes are included.
Other patches will be posted as a separate series for inclusion into
powerpc/next.
Patches 7 to 10 are new and fix issues in ppc32.
- Naveen
Naveen N. Rao (10):
powerpc/lib: Add helper to check if offset is within conditional
branch range
powerpc/bpf: Validate branch ranges
powerpc/bpf: Fix BPF_MOD when imm == 1
powerpc/bpf: Fix BPF_SUB when imm == 0x80000000
powerpc/security: Add a helper to query stf_barrier type
powerpc/bpf: Emit stf barrier instruction sequences for BPF_NOSPEC
powerpc/bpf ppc32: Fix ALU32 BPF_ARSH operation
powerpc/bpf ppc32: Fix JMP32_JSET_K
powerpc/bpf ppc32: Do not emit zero extend instruction for 64-bit
BPF_END
powerpc/bpf ppc32: Fix BPF_SUB when imm == 0x80000000
arch/powerpc/include/asm/code-patching.h | 1 +
arch/powerpc/include/asm/security_features.h | 5 +
arch/powerpc/kernel/security.c | 5 +
arch/powerpc/lib/code-patching.c | 7 +-
arch/powerpc/net/bpf_jit.h | 33 +++---
arch/powerpc/net/bpf_jit64.h | 8 +-
arch/powerpc/net/bpf_jit_comp.c | 6 +-
arch/powerpc/net/bpf_jit_comp32.c | 16 +--
arch/powerpc/net/bpf_jit_comp64.c | 100 +++++++++++++++----
9 files changed, 139 insertions(+), 42 deletions(-)
base-commit: cdcb1396e357bd198f81dc7fa4f5d819063abe44
--
2.33.0
^ permalink raw reply
* [PATCH v2 01/10] powerpc/lib: Add helper to check if offset is within conditional branch range
From: Naveen N. Rao @ 2021-10-05 20:25 UTC (permalink / raw)
To: Michael Ellerman, Nicholas Piggin, Jordan Niethe, Daniel Borkmann,
Alexei Starovoitov, Christophe Leroy, Johan Almbladh, Song Liu
Cc: bpf, linuxppc-dev
In-Reply-To: <cover.1633464148.git.naveen.n.rao@linux.vnet.ibm.com>
Add a helper to check if a given offset is within the branch range for a
powerpc conditional branch instruction, and update some sites to use the
new helper.
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
---
Changelog:
- Change 0x7FFF to 0x7fff, per Christophe
arch/powerpc/include/asm/code-patching.h | 1 +
arch/powerpc/lib/code-patching.c | 7 ++++++-
arch/powerpc/net/bpf_jit.h | 7 +------
3 files changed, 8 insertions(+), 7 deletions(-)
diff --git a/arch/powerpc/include/asm/code-patching.h b/arch/powerpc/include/asm/code-patching.h
index a95f63788c6b14..4ba834599c4d4c 100644
--- a/arch/powerpc/include/asm/code-patching.h
+++ b/arch/powerpc/include/asm/code-patching.h
@@ -23,6 +23,7 @@
#define BRANCH_ABSOLUTE 0x2
bool is_offset_in_branch_range(long offset);
+bool is_offset_in_cond_branch_range(long offset);
int create_branch(struct ppc_inst *instr, const u32 *addr,
unsigned long target, int flags);
int create_cond_branch(struct ppc_inst *instr, const u32 *addr,
diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index f9a3019e37b43c..c5ed9882383521 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -228,6 +228,11 @@ bool is_offset_in_branch_range(long offset)
return (offset >= -0x2000000 && offset <= 0x1fffffc && !(offset & 0x3));
}
+bool is_offset_in_cond_branch_range(long offset)
+{
+ return offset >= -0x8000 && offset <= 0x7fff && !(offset & 0x3);
+}
+
/*
* Helper to check if a given instruction is a conditional branch
* Derived from the conditional checks in analyse_instr()
@@ -280,7 +285,7 @@ int create_cond_branch(struct ppc_inst *instr, const u32 *addr,
offset = offset - (unsigned long)addr;
/* Check we can represent the target in the instruction format */
- if (offset < -0x8000 || offset > 0x7FFF || offset & 0x3)
+ if (!is_offset_in_cond_branch_range(offset))
return 1;
/* Mask out the flags and target, so they don't step on each other. */
diff --git a/arch/powerpc/net/bpf_jit.h b/arch/powerpc/net/bpf_jit.h
index 99fad093f43ec1..935ea95b66359e 100644
--- a/arch/powerpc/net/bpf_jit.h
+++ b/arch/powerpc/net/bpf_jit.h
@@ -78,11 +78,6 @@
#define PPC_FUNC_ADDR(d,i) do { PPC_LI32(d, i); } while(0)
#endif
-static inline bool is_nearbranch(int offset)
-{
- return (offset < 32768) && (offset >= -32768);
-}
-
/*
* The fly in the ointment of code size changing from pass to pass is
* avoided by padding the short branch case with a NOP. If code size differs
@@ -91,7 +86,7 @@ static inline bool is_nearbranch(int offset)
* state.
*/
#define PPC_BCC(cond, dest) do { \
- if (is_nearbranch((dest) - (ctx->idx * 4))) { \
+ if (is_offset_in_cond_branch_range((long)(dest) - (ctx->idx * 4))) { \
PPC_BCC_SHORT(cond, dest); \
EMIT(PPC_RAW_NOP()); \
} else { \
--
2.33.0
^ permalink raw reply related
* [PATCH v2 02/10] powerpc/bpf: Validate branch ranges
From: Naveen N. Rao @ 2021-10-05 20:25 UTC (permalink / raw)
To: Michael Ellerman, Nicholas Piggin, Jordan Niethe, Daniel Borkmann,
Alexei Starovoitov, Christophe Leroy, Johan Almbladh, Song Liu
Cc: bpf, linuxppc-dev
In-Reply-To: <cover.1633464148.git.naveen.n.rao@linux.vnet.ibm.com>
Add checks to ensure that we never emit branch instructions with
truncated branch offsets.
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
Tested-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
---
arch/powerpc/net/bpf_jit.h | 26 ++++++++++++++++++++------
arch/powerpc/net/bpf_jit_comp.c | 6 +++++-
arch/powerpc/net/bpf_jit_comp32.c | 8 ++++++--
arch/powerpc/net/bpf_jit_comp64.c | 8 ++++++--
4 files changed, 37 insertions(+), 11 deletions(-)
diff --git a/arch/powerpc/net/bpf_jit.h b/arch/powerpc/net/bpf_jit.h
index 935ea95b66359e..7e9b978b768ed9 100644
--- a/arch/powerpc/net/bpf_jit.h
+++ b/arch/powerpc/net/bpf_jit.h
@@ -24,16 +24,30 @@
#define EMIT(instr) PLANT_INSTR(image, ctx->idx, instr)
/* Long jump; (unconditional 'branch') */
-#define PPC_JMP(dest) EMIT(PPC_INST_BRANCH | \
- (((dest) - (ctx->idx * 4)) & 0x03fffffc))
+#define PPC_JMP(dest) \
+ do { \
+ long offset = (long)(dest) - (ctx->idx * 4); \
+ if (!is_offset_in_branch_range(offset)) { \
+ pr_err_ratelimited("Branch offset 0x%lx (@%u) out of range\n", offset, ctx->idx); \
+ return -ERANGE; \
+ } \
+ EMIT(PPC_INST_BRANCH | (offset & 0x03fffffc)); \
+ } while (0)
+
/* blr; (unconditional 'branch' with link) to absolute address */
#define PPC_BL_ABS(dest) EMIT(PPC_INST_BL | \
(((dest) - (unsigned long)(image + ctx->idx)) & 0x03fffffc))
/* "cond" here covers BO:BI fields. */
-#define PPC_BCC_SHORT(cond, dest) EMIT(PPC_INST_BRANCH_COND | \
- (((cond) & 0x3ff) << 16) | \
- (((dest) - (ctx->idx * 4)) & \
- 0xfffc))
+#define PPC_BCC_SHORT(cond, dest) \
+ do { \
+ long offset = (long)(dest) - (ctx->idx * 4); \
+ if (!is_offset_in_cond_branch_range(offset)) { \
+ pr_err_ratelimited("Conditional branch offset 0x%lx (@%u) out of range\n", offset, ctx->idx); \
+ return -ERANGE; \
+ } \
+ EMIT(PPC_INST_BRANCH_COND | (((cond) & 0x3ff) << 16) | (offset & 0xfffc)); \
+ } while (0)
+
/* Sign-extended 32-bit immediate load */
#define PPC_LI32(d, i) do { \
if ((int)(uintptr_t)(i) >= -32768 && \
diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index 53aefee3fe70be..fcbf7a917c566e 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -210,7 +210,11 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
/* Now build the prologue, body code & epilogue for real. */
cgctx.idx = 0;
bpf_jit_build_prologue(code_base, &cgctx);
- bpf_jit_build_body(fp, code_base, &cgctx, addrs, extra_pass);
+ if (bpf_jit_build_body(fp, code_base, &cgctx, addrs, extra_pass)) {
+ bpf_jit_binary_free(bpf_hdr);
+ fp = org_fp;
+ goto out_addrs;
+ }
bpf_jit_build_epilogue(code_base, &cgctx);
if (bpf_jit_enable > 1)
diff --git a/arch/powerpc/net/bpf_jit_comp32.c b/arch/powerpc/net/bpf_jit_comp32.c
index beb12cbc8c2994..a74d52204f8da2 100644
--- a/arch/powerpc/net/bpf_jit_comp32.c
+++ b/arch/powerpc/net/bpf_jit_comp32.c
@@ -200,7 +200,7 @@ void bpf_jit_emit_func_call_rel(u32 *image, struct codegen_context *ctx, u64 fun
}
}
-static void bpf_jit_emit_tail_call(u32 *image, struct codegen_context *ctx, u32 out)
+static int bpf_jit_emit_tail_call(u32 *image, struct codegen_context *ctx, u32 out)
{
/*
* By now, the eBPF program has already setup parameters in r3-r6
@@ -261,7 +261,9 @@ static void bpf_jit_emit_tail_call(u32 *image, struct codegen_context *ctx, u32
bpf_jit_emit_common_epilogue(image, ctx);
EMIT(PPC_RAW_BCTR());
+
/* out: */
+ return 0;
}
/* Assemble the body code between the prologue & epilogue */
@@ -1090,7 +1092,9 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *
*/
case BPF_JMP | BPF_TAIL_CALL:
ctx->seen |= SEEN_TAILCALL;
- bpf_jit_emit_tail_call(image, ctx, addrs[i + 1]);
+ ret = bpf_jit_emit_tail_call(image, ctx, addrs[i + 1]);
+ if (ret < 0)
+ return ret;
break;
default:
diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c
index b87a63dba9c8fb..f06c62089b1457 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -206,7 +206,7 @@ void bpf_jit_emit_func_call_rel(u32 *image, struct codegen_context *ctx, u64 fun
EMIT(PPC_RAW_BCTRL());
}
-static void bpf_jit_emit_tail_call(u32 *image, struct codegen_context *ctx, u32 out)
+static int bpf_jit_emit_tail_call(u32 *image, struct codegen_context *ctx, u32 out)
{
/*
* By now, the eBPF program has already setup parameters in r3, r4 and r5
@@ -267,7 +267,9 @@ static void bpf_jit_emit_tail_call(u32 *image, struct codegen_context *ctx, u32
bpf_jit_emit_common_epilogue(image, ctx);
EMIT(PPC_RAW_BCTR());
+
/* out: */
+ return 0;
}
/* Assemble the body code between the prologue & epilogue */
@@ -993,7 +995,9 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *
*/
case BPF_JMP | BPF_TAIL_CALL:
ctx->seen |= SEEN_TAILCALL;
- bpf_jit_emit_tail_call(image, ctx, addrs[i + 1]);
+ ret = bpf_jit_emit_tail_call(image, ctx, addrs[i + 1]);
+ if (ret < 0)
+ return ret;
break;
default:
--
2.33.0
^ permalink raw reply related
* [PATCH v2 03/10] powerpc/bpf: Fix BPF_MOD when imm == 1
From: Naveen N. Rao @ 2021-10-05 20:25 UTC (permalink / raw)
To: Michael Ellerman, Nicholas Piggin, Jordan Niethe, Daniel Borkmann,
Alexei Starovoitov, Christophe Leroy, Johan Almbladh, Song Liu
Cc: bpf, linuxppc-dev
In-Reply-To: <cover.1633464148.git.naveen.n.rao@linux.vnet.ibm.com>
Only ignore the operation if dividing by 1.
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
Tested-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
Fixes: 156d0e290e969c ("powerpc/ebpf/jit: Implement JIT compiler for extended BPF")
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
---
arch/powerpc/net/bpf_jit_comp64.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c
index f06c62089b1457..d67f6d62e2e1ff 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -391,8 +391,14 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *
case BPF_ALU64 | BPF_DIV | BPF_K: /* dst /= imm */
if (imm == 0)
return -EINVAL;
- else if (imm == 1)
- goto bpf_alu32_trunc;
+ if (imm == 1) {
+ if (BPF_OP(code) == BPF_DIV) {
+ goto bpf_alu32_trunc;
+ } else {
+ EMIT(PPC_RAW_LI(dst_reg, 0));
+ break;
+ }
+ }
PPC_LI32(b2p[TMP_REG_1], imm);
switch (BPF_CLASS(code)) {
--
2.33.0
^ permalink raw reply related
* [PATCH v2 04/10] powerpc/bpf: Fix BPF_SUB when imm == 0x80000000
From: Naveen N. Rao @ 2021-10-05 20:25 UTC (permalink / raw)
To: Michael Ellerman, Nicholas Piggin, Jordan Niethe, Daniel Borkmann,
Alexei Starovoitov, Christophe Leroy, Johan Almbladh, Song Liu
Cc: bpf, linuxppc-dev
In-Reply-To: <cover.1633464148.git.naveen.n.rao@linux.vnet.ibm.com>
We aren't handling subtraction involving an immediate value of
0x80000000 properly. Fix the same.
Fixes: 156d0e290e969c ("powerpc/ebpf/jit: Implement JIT compiler for extended BPF")
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
---
Changelog:
- Split up BPF_ADD and BPF_SUB cases per Christophe's comments
arch/powerpc/net/bpf_jit_comp64.c | 27 +++++++++++++++++----------
1 file changed, 17 insertions(+), 10 deletions(-)
diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c
index d67f6d62e2e1ff..6626e6c17d4ed2 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -330,18 +330,25 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *
EMIT(PPC_RAW_SUB(dst_reg, dst_reg, src_reg));
goto bpf_alu32_trunc;
case BPF_ALU | BPF_ADD | BPF_K: /* (u32) dst += (u32) imm */
- case BPF_ALU | BPF_SUB | BPF_K: /* (u32) dst -= (u32) imm */
case BPF_ALU64 | BPF_ADD | BPF_K: /* dst += imm */
+ if (!imm) {
+ goto bpf_alu32_trunc;
+ } else if (imm >= -32768 && imm < 32768) {
+ EMIT(PPC_RAW_ADDI(dst_reg, dst_reg, IMM_L(imm)));
+ } else {
+ PPC_LI32(b2p[TMP_REG_1], imm);
+ EMIT(PPC_RAW_ADD(dst_reg, dst_reg, b2p[TMP_REG_1]));
+ }
+ goto bpf_alu32_trunc;
+ case BPF_ALU | BPF_SUB | BPF_K: /* (u32) dst -= (u32) imm */
case BPF_ALU64 | BPF_SUB | BPF_K: /* dst -= imm */
- if (BPF_OP(code) == BPF_SUB)
- imm = -imm;
- if (imm) {
- if (imm >= -32768 && imm < 32768)
- EMIT(PPC_RAW_ADDI(dst_reg, dst_reg, IMM_L(imm)));
- else {
- PPC_LI32(b2p[TMP_REG_1], imm);
- EMIT(PPC_RAW_ADD(dst_reg, dst_reg, b2p[TMP_REG_1]));
- }
+ if (!imm) {
+ goto bpf_alu32_trunc;
+ } else if (imm > -32768 && imm < 32768) {
+ EMIT(PPC_RAW_ADDI(dst_reg, dst_reg, IMM_L(-imm)));
+ } else {
+ PPC_LI32(b2p[TMP_REG_1], imm);
+ EMIT(PPC_RAW_SUB(dst_reg, dst_reg, b2p[TMP_REG_1]));
}
goto bpf_alu32_trunc;
case BPF_ALU | BPF_MUL | BPF_X: /* (u32) dst *= (u32) src */
--
2.33.0
^ permalink raw reply related
* [PATCH v2 05/10] powerpc/security: Add a helper to query stf_barrier type
From: Naveen N. Rao @ 2021-10-05 20:25 UTC (permalink / raw)
To: Michael Ellerman, Nicholas Piggin, Jordan Niethe, Daniel Borkmann,
Alexei Starovoitov, Christophe Leroy, Johan Almbladh, Song Liu
Cc: bpf, linuxppc-dev
In-Reply-To: <cover.1633464148.git.naveen.n.rao@linux.vnet.ibm.com>
Add a helper to return the stf_barrier type for the current processor.
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
---
arch/powerpc/include/asm/security_features.h | 5 +++++
arch/powerpc/kernel/security.c | 5 +++++
2 files changed, 10 insertions(+)
diff --git a/arch/powerpc/include/asm/security_features.h b/arch/powerpc/include/asm/security_features.h
index 792eefaf230b80..27574f218b371f 100644
--- a/arch/powerpc/include/asm/security_features.h
+++ b/arch/powerpc/include/asm/security_features.h
@@ -39,6 +39,11 @@ static inline bool security_ftr_enabled(u64 feature)
return !!(powerpc_security_features & feature);
}
+#ifdef CONFIG_PPC_BOOK3S_64
+enum stf_barrier_type stf_barrier_type_get(void);
+#else
+static inline enum stf_barrier_type stf_barrier_type_get(void) { return STF_BARRIER_NONE; }
+#endif
// Features indicating support for Spectre/Meltdown mitigations
diff --git a/arch/powerpc/kernel/security.c b/arch/powerpc/kernel/security.c
index 1a998490fe60f0..15fb5ea1b9eafa 100644
--- a/arch/powerpc/kernel/security.c
+++ b/arch/powerpc/kernel/security.c
@@ -263,6 +263,11 @@ static int __init handle_no_stf_barrier(char *p)
early_param("no_stf_barrier", handle_no_stf_barrier);
+enum stf_barrier_type stf_barrier_type_get(void)
+{
+ return stf_enabled_flush_types;
+}
+
/* This is the generic flag used by other architectures */
static int __init handle_ssbd(char *p)
{
--
2.33.0
^ permalink raw reply related
* [PATCH v2 06/10] powerpc/bpf: Emit stf barrier instruction sequences for BPF_NOSPEC
From: Naveen N. Rao @ 2021-10-05 20:25 UTC (permalink / raw)
To: Michael Ellerman, Nicholas Piggin, Jordan Niethe, Daniel Borkmann,
Alexei Starovoitov, Christophe Leroy, Johan Almbladh, Song Liu
Cc: bpf, linuxppc-dev
In-Reply-To: <cover.1633464148.git.naveen.n.rao@linux.vnet.ibm.com>
Emit similar instruction sequences to commit a048a07d7f4535
("powerpc/64s: Add support for a store forwarding barrier at kernel
entry/exit") when encountering BPF_NOSPEC.
Mitigations are enabled depending on what the firmware advertises. In
particular, we do not gate these mitigations based on current settings,
just like in x86. Due to this, we don't need to take any action if
mitigations are enabled or disabled at runtime.
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
---
arch/powerpc/net/bpf_jit64.h | 8 ++---
arch/powerpc/net/bpf_jit_comp64.c | 55 ++++++++++++++++++++++++++++---
2 files changed, 55 insertions(+), 8 deletions(-)
diff --git a/arch/powerpc/net/bpf_jit64.h b/arch/powerpc/net/bpf_jit64.h
index 7b713edfa7e261..b63b35e45e558c 100644
--- a/arch/powerpc/net/bpf_jit64.h
+++ b/arch/powerpc/net/bpf_jit64.h
@@ -16,18 +16,18 @@
* with our redzone usage.
*
* [ prev sp ] <-------------
- * [ nv gpr save area ] 6*8 |
+ * [ nv gpr save area ] 5*8 |
* [ tail_call_cnt ] 8 |
- * [ local_tmp_var ] 8 |
+ * [ local_tmp_var ] 16 |
* fp (r31) --> [ ebpf stack space ] upto 512 |
* [ frame header ] 32/112 |
* sp (r1) ---> [ stack pointer ] --------------
*/
/* for gpr non volatile registers BPG_REG_6 to 10 */
-#define BPF_PPC_STACK_SAVE (6*8)
+#define BPF_PPC_STACK_SAVE (5*8)
/* for bpf JIT code internal usage */
-#define BPF_PPC_STACK_LOCALS 16
+#define BPF_PPC_STACK_LOCALS 24
/* stack frame excluding BPF stack, ensure this is quadword aligned */
#define BPF_PPC_STACKFRAME (STACK_FRAME_MIN_SIZE + \
BPF_PPC_STACK_LOCALS + BPF_PPC_STACK_SAVE)
diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c
index 6626e6c17d4ed2..51c7f6cd9a0a10 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -15,6 +15,7 @@
#include <linux/if_vlan.h>
#include <asm/kprobes.h>
#include <linux/bpf.h>
+#include <asm/security_features.h>
#include "bpf_jit64.h"
@@ -35,9 +36,9 @@ static inline bool bpf_has_stack_frame(struct codegen_context *ctx)
* [ prev sp ] <-------------
* [ ... ] |
* sp (r1) ---> [ stack pointer ] --------------
- * [ nv gpr save area ] 6*8
+ * [ nv gpr save area ] 5*8
* [ tail_call_cnt ] 8
- * [ local_tmp_var ] 8
+ * [ local_tmp_var ] 16
* [ unused red zone ] 208 bytes protected
*/
static int bpf_jit_stack_local(struct codegen_context *ctx)
@@ -45,12 +46,12 @@ static int bpf_jit_stack_local(struct codegen_context *ctx)
if (bpf_has_stack_frame(ctx))
return STACK_FRAME_MIN_SIZE + ctx->stack_size;
else
- return -(BPF_PPC_STACK_SAVE + 16);
+ return -(BPF_PPC_STACK_SAVE + 24);
}
static int bpf_jit_stack_tailcallcnt(struct codegen_context *ctx)
{
- return bpf_jit_stack_local(ctx) + 8;
+ return bpf_jit_stack_local(ctx) + 16;
}
static int bpf_jit_stack_offsetof(struct codegen_context *ctx, int reg)
@@ -272,10 +273,33 @@ static int bpf_jit_emit_tail_call(u32 *image, struct codegen_context *ctx, u32 o
return 0;
}
+/*
+ * We spill into the redzone always, even if the bpf program has its own stackframe.
+ * Offsets hardcoded based on BPF_PPC_STACK_SAVE -- see bpf_jit_stack_local()
+ */
+void bpf_stf_barrier(void);
+
+asm (
+" .global bpf_stf_barrier ;"
+" bpf_stf_barrier: ;"
+" std 21,-64(1) ;"
+" std 22,-56(1) ;"
+" sync ;"
+" ld 21,-64(1) ;"
+" ld 22,-56(1) ;"
+" ori 31,31,0 ;"
+" .rept 14 ;"
+" b 1f ;"
+" 1: ;"
+" .endr ;"
+" blr ;"
+);
+
/* Assemble the body code between the prologue & epilogue */
int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *ctx,
u32 *addrs, bool extra_pass)
{
+ enum stf_barrier_type stf_barrier = stf_barrier_type_get();
const struct bpf_insn *insn = fp->insnsi;
int flen = fp->len;
int i, ret;
@@ -646,6 +670,29 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *
* BPF_ST NOSPEC (speculation barrier)
*/
case BPF_ST | BPF_NOSPEC:
+ if (!security_ftr_enabled(SEC_FTR_FAVOUR_SECURITY) ||
+ !security_ftr_enabled(SEC_FTR_STF_BARRIER))
+ break;
+
+ switch (stf_barrier) {
+ case STF_BARRIER_EIEIO:
+ EMIT(PPC_RAW_EIEIO() | 0x02000000);
+ break;
+ case STF_BARRIER_SYNC_ORI:
+ EMIT(PPC_RAW_SYNC());
+ EMIT(PPC_RAW_LD(b2p[TMP_REG_1], _R13, 0));
+ EMIT(PPC_RAW_ORI(_R31, _R31, 0));
+ break;
+ case STF_BARRIER_FALLBACK:
+ EMIT(PPC_RAW_MFLR(b2p[TMP_REG_1]));
+ PPC_LI64(12, dereference_kernel_function_descriptor(bpf_stf_barrier));
+ EMIT(PPC_RAW_MTCTR(12));
+ EMIT(PPC_RAW_BCTRL());
+ EMIT(PPC_RAW_MTLR(b2p[TMP_REG_1]));
+ break;
+ case STF_BARRIER_NONE:
+ break;
+ }
break;
/*
--
2.33.0
^ permalink raw reply related
* [PATCH v2 07/10] powerpc/bpf ppc32: Fix ALU32 BPF_ARSH operation
From: Naveen N. Rao @ 2021-10-05 20:25 UTC (permalink / raw)
To: Michael Ellerman, Nicholas Piggin, Jordan Niethe, Daniel Borkmann,
Alexei Starovoitov, Christophe Leroy, Johan Almbladh, Song Liu
Cc: bpf, linuxppc-dev
In-Reply-To: <cover.1633464148.git.naveen.n.rao@linux.vnet.ibm.com>
Correct the destination register used for ALU32 BPF_ARSH operation.
Fixes: 51c66ad849a703 ("powerpc/bpf: Implement extended BPF on PPC32")
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
---
arch/powerpc/net/bpf_jit_comp32.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/powerpc/net/bpf_jit_comp32.c b/arch/powerpc/net/bpf_jit_comp32.c
index a74d52204f8da2..519ecb9ab67266 100644
--- a/arch/powerpc/net/bpf_jit_comp32.c
+++ b/arch/powerpc/net/bpf_jit_comp32.c
@@ -625,7 +625,7 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *
EMIT(PPC_RAW_LI(dst_reg_h, 0));
break;
case BPF_ALU | BPF_ARSH | BPF_X: /* (s32) dst >>= src */
- EMIT(PPC_RAW_SRAW(dst_reg_h, dst_reg, src_reg));
+ EMIT(PPC_RAW_SRAW(dst_reg, dst_reg, src_reg));
break;
case BPF_ALU64 | BPF_ARSH | BPF_X: /* (s64) dst >>= src */
bpf_set_seen_register(ctx, tmp_reg);
--
2.33.0
^ permalink raw reply related
* [PATCH v2 08/10] powerpc/bpf ppc32: Fix JMP32_JSET_K
From: Naveen N. Rao @ 2021-10-05 20:25 UTC (permalink / raw)
To: Michael Ellerman, Nicholas Piggin, Jordan Niethe, Daniel Borkmann,
Alexei Starovoitov, Christophe Leroy, Johan Almbladh, Song Liu
Cc: bpf, linuxppc-dev
In-Reply-To: <cover.1633464148.git.naveen.n.rao@linux.vnet.ibm.com>
'andi' only takes an unsigned 16-bit value. Correct the imm range used
when emitting andi.
Fixes: 51c66ad849a703 ("powerpc/bpf: Implement extended BPF on PPC32")
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
---
arch/powerpc/net/bpf_jit_comp32.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/powerpc/net/bpf_jit_comp32.c b/arch/powerpc/net/bpf_jit_comp32.c
index 519ecb9ab67266..7c65de9ed4fa64 100644
--- a/arch/powerpc/net/bpf_jit_comp32.c
+++ b/arch/powerpc/net/bpf_jit_comp32.c
@@ -1075,7 +1075,7 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *
break;
case BPF_JMP32 | BPF_JSET | BPF_K:
/* andi does not sign-extend the immediate */
- if (imm >= -32768 && imm < 32768) {
+ if (imm >= 0 && imm < 32768) {
/* PPC_ANDI is _only/always_ dot-form */
EMIT(PPC_RAW_ANDI(_R0, dst_reg, imm));
} else {
--
2.33.0
^ permalink raw reply related
* [PATCH v2 10/10] powerpc/bpf ppc32: Fix BPF_SUB when imm == 0x80000000
From: Naveen N. Rao @ 2021-10-05 20:25 UTC (permalink / raw)
To: Michael Ellerman, Nicholas Piggin, Jordan Niethe, Daniel Borkmann,
Alexei Starovoitov, Christophe Leroy, Johan Almbladh, Song Liu
Cc: bpf, linuxppc-dev
In-Reply-To: <cover.1633464148.git.naveen.n.rao@linux.vnet.ibm.com>
Special case handling of the smallest 32-bit negative number for BPF_SUB.
Fixes: 51c66ad849a703 ("powerpc/bpf: Implement extended BPF on PPC32")
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
---
arch/powerpc/net/bpf_jit_comp32.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/powerpc/net/bpf_jit_comp32.c b/arch/powerpc/net/bpf_jit_comp32.c
index 68dc8a8231de04..0da31d41d41310 100644
--- a/arch/powerpc/net/bpf_jit_comp32.c
+++ b/arch/powerpc/net/bpf_jit_comp32.c
@@ -357,7 +357,7 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *
PPC_LI32(_R0, imm);
EMIT(PPC_RAW_ADDC(dst_reg, dst_reg, _R0));
}
- if (imm >= 0)
+ if (imm >= 0 || (BPF_OP(code) == BPF_SUB && imm == 0x80000000))
EMIT(PPC_RAW_ADDZE(dst_reg_h, dst_reg_h));
else
EMIT(PPC_RAW_ADDME(dst_reg_h, dst_reg_h));
--
2.33.0
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox