LinuxPPC-Dev Archive on lore.kernel.org

LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH v3 2/8] PCI: Cleanup struct aer_err_info
From: Naveen Naidu @ 2021-10-04 14:29 UTC (permalink / raw)
  To: bhelgaas, ruscur, oohall
  Cc: linux-pci, linux-kernel, Naveen Naidu, skhan, linuxppc-dev,
	linux-kernel-mentees
In-Reply-To: <cover.1633357368.git.naveennaidu479@gmail.com>

The id, status and the mask fields of the struct aer_err_info comes
directly from the registers, hence their sizes should be explicit.

The length of these registers are:
  - id: 16 bits - Represents the Error Source Requester ID
  - status: 32 bits - COR/UNCOR Error Status
  - mask: 32 bits - COR/UNCOR Error Mask

Since the length of the above registers are even, use u16 and u32
to represent their values.

Also remove the __pad fields.

"pahole" was run on the modified struct aer_err_info and the size
remains unchanged.

Signed-off-by: Naveen Naidu <naveennaidu479@gmail.com>
---
 drivers/pci/pci.h | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 1cce56c2aea0..9be7a966fda7 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -427,18 +427,16 @@ struct aer_err_info {
 	struct pci_dev *dev[AER_MAX_MULTI_ERR_DEVICES];
 	int error_dev_num;
 
-	unsigned int id:16;
+	u16	id;
 
 	unsigned int severity:2;	/* 0:NONFATAL | 1:FATAL | 2:COR */
-	unsigned int __pad1:5;
 	unsigned int multi_error_valid:1;
 
 	unsigned int first_error:5;
-	unsigned int __pad2:2;
 	unsigned int tlp_header_valid:1;
 
-	unsigned int status;		/* COR/UNCOR Error Status */
-	unsigned int mask;		/* COR/UNCOR Error Mask */
+	u32 status;		/* COR/UNCOR Error Status */
+	u32 mask;		/* COR/UNCOR Error Mask */
 	struct aer_header_log_regs tlp;	/* TLP Header */
 };
 
-- 
2.25.1


^ permalink raw reply related

* [PATCH v3 3/8] PCI/DPC: Initialize info->id in dpc_process_error()
From: Naveen Naidu @ 2021-10-04 14:29 UTC (permalink / raw)
  To: bhelgaas, ruscur, oohall
  Cc: linux-pci, linux-kernel, Naveen Naidu, skhan, linuxppc-dev,
	linux-kernel-mentees
In-Reply-To: <cover.1633357368.git.naveennaidu479@gmail.com>

In the dpc_process_error() path, info->id isn't initialized before being
passed to aer_print_error(). In the corresponding AER path, it is
initialized in aer_isr_one_error().

The error message shown during Coverity Scan is:

  Coverity #1461602
  CID 1461602 (#1 of 1): Uninitialized scalar variable (UNINIT)
  8. uninit_use_in_call: Using uninitialized value info.id when calling aer_print_error.

Initialize the "info->id" before passing it to aer_print_error()

Fixes: 8aefa9b0d910 ("PCI/DPC: Print AER status in DPC event handling")
Signed-off-by: Naveen Naidu <naveennaidu479@gmail.com>
---
 drivers/pci/pcie/dpc.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
index c556e7beafe3..df3f3a10f8bc 100644
--- a/drivers/pci/pcie/dpc.c
+++ b/drivers/pci/pcie/dpc.c
@@ -262,14 +262,14 @@ static int dpc_get_aer_uncorrect_severity(struct pci_dev *dev,
 
 void dpc_process_error(struct pci_dev *pdev)
 {
-	u16 cap = pdev->dpc_cap, status, source, reason, ext_reason;
+	u16 cap = pdev->dpc_cap, status, reason, ext_reason;
 	struct aer_err_info info;
 
 	pci_read_config_word(pdev, cap + PCI_EXP_DPC_STATUS, &status);
-	pci_read_config_word(pdev, cap + PCI_EXP_DPC_SOURCE_ID, &source);
+	pci_read_config_word(pdev, cap + PCI_EXP_DPC_SOURCE_ID, &info.id);
 
 	pci_info(pdev, "containment event, status:%#06x source:%#06x\n",
-		 status, source);
+		 status, info.id);
 
 	reason = (status & PCI_EXP_DPC_STATUS_TRIGGER_RSN) >> 1;
 	ext_reason = (status & PCI_EXP_DPC_STATUS_TRIGGER_RSN_EXT) >> 5;
-- 
2.25.1


^ permalink raw reply related

* [PATCH v3 4/8] PCI/DPC: Use pci_aer_clear_status() in dpc_process_error()
From: Naveen Naidu @ 2021-10-04 14:30 UTC (permalink / raw)
  To: bhelgaas, ruscur, oohall
  Cc: linux-pci, linux-kernel, Naveen Naidu, skhan, linuxppc-dev,
	linux-kernel-mentees
In-Reply-To: <cover.1633357368.git.naveennaidu479@gmail.com>

dpc_process_error() clears both AER fatal and non fatal status
registers. Instead of clearing each status registers via a different
function call use pci_aer_clear_status().

This helps clean up the code a bit.

Signed-off-by: Naveen Naidu <naveennaidu479@gmail.com>
---
 drivers/pci/pcie/dpc.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
index df3f3a10f8bc..faf4a1e77fab 100644
--- a/drivers/pci/pcie/dpc.c
+++ b/drivers/pci/pcie/dpc.c
@@ -288,8 +288,7 @@ void dpc_process_error(struct pci_dev *pdev)
 		 dpc_get_aer_uncorrect_severity(pdev, &info) &&
 		 aer_get_device_error_info(pdev, &info)) {
 		aer_print_error(pdev, &info);
-		pci_aer_clear_nonfatal_status(pdev);
-		pci_aer_clear_fatal_status(pdev);
+		pci_aer_clear_status(pdev);
 	}
 }
 
-- 
2.25.1


^ permalink raw reply related

* [PATCH v3 5/8] PCI/DPC: Converge EDR and DPC Path of clearing AER registers
From: Naveen Naidu @ 2021-10-04 14:30 UTC (permalink / raw)
  To: bhelgaas, ruscur, oohall
  Cc: linux-pci, linux-kernel, Naveen Naidu, skhan, linuxppc-dev,
	linux-kernel-mentees
In-Reply-To: <cover.1633357368.git.naveennaidu479@gmail.com>

In the EDR path, AER registers are cleared *after* DPC error event is
processed. The process stack in EDR is:

  edr_handle_event()
    dpc_process_error()
    pci_aer_raw_clear_status()
    pcie_do_recovery()

But in DPC path, AER status registers are cleared *while* processing
the error. The process stack in DPC is:

  dpc_handler()
    dpc_process_error()
      pci_aer_clear_status()
    pcie_do_recovery()

In EDR path, AER status registers are cleared irrespective of whether
the error was an RP PIO or unmasked uncorrectable error. But in DPC, the
AER status registers are cleared only when it's an unmasked uncorrectable
error.

This leads to two different behaviours for the same task (handling of
DPC errors) in FFS systems and when native OS has control.

Bring the same semantics for clearing the AER status register in EDR
path and DPC path.

Signed-off-by: Naveen Naidu <naveennaidu479@gmail.com>
---
 drivers/pci/pcie/dpc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
index faf4a1e77fab..68899a3db126 100644
--- a/drivers/pci/pcie/dpc.c
+++ b/drivers/pci/pcie/dpc.c
@@ -288,7 +288,6 @@ void dpc_process_error(struct pci_dev *pdev)
 		 dpc_get_aer_uncorrect_severity(pdev, &info) &&
 		 aer_get_device_error_info(pdev, &info)) {
 		aer_print_error(pdev, &info);
-		pci_aer_clear_status(pdev);
 	}
 }

@@ -297,6 +296,7 @@ static irqreturn_t dpc_handler(int irq, void *context)
 	struct pci_dev *pdev = context;

 	dpc_process_error(pdev);
+	pci_aer_clear_status(pdev);

 	/* We configure DPC so it only triggers on ERR_FATAL */
 	pcie_do_recovery(pdev, pci_channel_io_frozen, dpc_reset_link);
-- 
2.25.1

^ permalink raw reply related

* [PATCH v3 6/8] PCI/AER: Clear error device AER registers in aer_irq()
From: Naveen Naidu @ 2021-10-04 14:30 UTC (permalink / raw)
  To: bhelgaas, ruscur, oohall
  Cc: linux-pci, linux-kernel, Naveen Naidu, skhan, linuxppc-dev,
	linux-kernel-mentees
In-Reply-To: <cover.1633357368.git.naveennaidu479@gmail.com>

Converge the APEI path and native AER path of clearing the AER registers
of the error device.

In APEI path, the system firmware clears the AER registers before
handing off the record to OS. But in "native AER" path, the execution
path of clearing the AER register is as follows:

  aer_isr_one_error
    aer_print_port_info
      if (find_source_device())
        aer_process_err_devices
          handle_error_source
            pci_write_config_dword(dev, PCI_ERR_COR_STATUS, ...)

The above path has a bug, if the find_source_device() fails, AER
registers are not cleared from the error device. This means, the error
device will keep reporting the error again and again and would lead
to message spew.

Related Bug Report:
  https://lore.kernel.org/linux-pci/20151229155822.GA17321@localhost/
  https://bugzilla.kernel.org/show_bug.cgi?id=109691
  https://bugzilla.kernel.org/show_bug.cgi?id=109691
  https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1521173

The above bug could be avoided, if the AER registers are cleared during
the AER IRQ handler aer_irq(), which would provide guarantee that the AER
error registers are always cleared. This is similar to how APEI handles
these errors.

The main aim is that:

  When an interrupt handler deals with a interrupt, it must *always*
  clear the source of the interrupt.

Signed-off-by: Naveen Naidu <naveennaidu479@gmail.com>
---
 drivers/pci/pci.h      |  13 ++-
 drivers/pci/pcie/aer.c | 245 ++++++++++++++++++++++++++++-------------
 2 files changed, 182 insertions(+), 76 deletions(-)

diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 9be7a966fda7..eb88d8bfeaf7 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -424,7 +424,6 @@ static inline bool pci_dev_is_added(const struct pci_dev *dev)
 #define AER_MAX_MULTI_ERR_DEVICES	5	/* Not likely to have more */
 
 struct aer_err_info {
-	struct pci_dev *dev[AER_MAX_MULTI_ERR_DEVICES];
 	int error_dev_num;
 
 	u16	id;
@@ -440,6 +439,18 @@ struct aer_err_info {
 	struct aer_header_log_regs tlp;	/* TLP Header */
 };
 
+/* Preliminary AER error information processed from Root port */
+struct aer_devices_err_info {
+	struct pci_dev *dev[AER_MAX_MULTI_ERR_DEVICES];
+	struct aer_err_info err_info;
+};
+
+/* AER information associated with each error device */
+struct aer_dev_err_info {
+	struct pci_dev *dev;
+	struct aer_err_info err_info;
+};
+
 int aer_get_device_error_info(struct pci_dev *dev, struct aer_err_info *info);
 void aer_print_error(struct pci_dev *dev, struct aer_err_info *info);
 #endif	/* CONFIG_PCIEAER */
diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 241ff361b43c..91f91d6ab052 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -36,6 +36,18 @@
 
 #define AER_ERROR_SOURCES_MAX		128
 
+/*
+ * There can be 128 maximum error sources (AER_ERROR_SOURCES_MAX) and each
+ * error source can have maximum of 5 error devices (AER_MAX_MULTI_ERR_DEVICES)
+ * so the maximum error devices we can report is:
+ *
+ * AER_ERROR_DEVICES_MAX = AER_ERROR_SOURCES_MAX * AER_MAX_MULTI_ERR_DEVICES == (128 * 5) == 640
+ *
+ * But since, the size in KFIFO should be a power of two, the closest value
+ * to 640 is 1024
+ */
+# define AER_ERROR_DEVICES_MAX		1024
+
 #define AER_MAX_TYPEOF_COR_ERRS		16	/* as per PCI_ERR_COR_STATUS */
 #define AER_MAX_TYPEOF_UNCOR_ERRS	27	/* as per PCI_ERR_UNCOR_STATUS*/
 
@@ -46,7 +58,7 @@ struct aer_err_source {
 
 struct aer_rpc {
 	struct pci_dev *rpd;		/* Root Port device */
-	DECLARE_KFIFO(aer_fifo, struct aer_err_source, AER_ERROR_SOURCES_MAX);
+	DECLARE_KFIFO(aer_fifo, struct aer_dev_err_info, AER_ERROR_DEVICES_MAX);
 };
 
 /* AER stats for the device */
@@ -806,11 +818,11 @@ void cper_print_aer(struct pci_dev *dev, int aer_severity,
  * @e_info: pointer to error info
  * @dev: pointer to pci_dev to be added
  */
-static int add_error_device(struct aer_err_info *e_info, struct pci_dev *dev)
+static int add_error_device(struct aer_devices_err_info *e_dev, struct pci_dev *dev)
 {
-	if (e_info->error_dev_num < AER_MAX_MULTI_ERR_DEVICES) {
-		e_info->dev[e_info->error_dev_num] = pci_dev_get(dev);
-		e_info->error_dev_num++;
+	if (e_dev->err_info.error_dev_num < AER_MAX_MULTI_ERR_DEVICES) {
+		e_dev->dev[e_dev->err_info.error_dev_num] = pci_dev_get(dev);
+		e_dev->err_info.error_dev_num++;
 		return 0;
 	}
 	return -ENOSPC;
@@ -877,18 +889,18 @@ static bool is_error_source(struct pci_dev *dev, struct aer_err_info *e_info)
 
 static int find_device_iter(struct pci_dev *dev, void *data)
 {
-	struct aer_err_info *e_info = (struct aer_err_info *)data;
+	struct aer_devices_err_info *e_dev = (struct aer_devices_err_info *)data;
 
-	if (is_error_source(dev, e_info)) {
+	if (is_error_source(dev, &e_dev->err_info)) {
 		/* List this device */
-		if (add_error_device(e_info, dev)) {
+		if (add_error_device(e_dev, dev)) {
 			/* We cannot handle more... Stop iteration */
 			/* TODO: Should print error message here? */
 			return 1;
 		}
 
 		/* If there is only a single error, stop iteration */
-		if (!e_info->multi_error_valid)
+		if (!e_dev->err_info.multi_error_valid)
 			return 1;
 	}
 	return 0;
@@ -907,26 +919,26 @@ static int find_device_iter(struct pci_dev *dev, void *data)
  * e_info->error_dev_num and e_info->dev[], based on the given information.
  */
 static bool find_source_device(struct pci_dev *parent,
-		struct aer_err_info *e_info)
+		struct aer_devices_err_info *e_dev)
 {
 	struct pci_dev *dev = parent;
 	int result;
 
 	/* Must reset in this function */
-	e_info->error_dev_num = 0;
+	e_dev->err_info.error_dev_num = 0;
 
 	/* Is Root Port an agent that sends error message? */
-	result = find_device_iter(dev, e_info);
+	result = find_device_iter(dev, e_dev);
 	if (result)
 		return true;
 
 	if (pci_pcie_type(parent) == PCI_EXP_TYPE_RC_EC)
-		pcie_walk_rcec(parent, find_device_iter, e_info);
+		pcie_walk_rcec(parent, find_device_iter, e_dev);
 	else
-		pci_walk_bus(parent->subordinate, find_device_iter, e_info);
+		pci_walk_bus(parent->subordinate, find_device_iter, e_dev);
 
-	if (!e_info->error_dev_num) {
-		pci_info(parent, "can't find device of ID%04x\n", e_info->id);
+	if (!e_dev->err_info.error_dev_num) {
+		pci_info(parent, "can't find device of ID%04x\n", e_dev->err_info.id);
 		return false;
 	}
 	return true;
@@ -940,24 +952,42 @@ static bool find_source_device(struct pci_dev *parent,
  * Invoked when an error being detected by Root Port.
  */
 static void handle_error_source(struct pci_dev *dev, struct aer_err_info *info)
+{
+	/*
+	 * Correctable error does not need software intervention.
+	 * No need to go through error recovery process.
+	 */
+	if (info->severity == AER_NONFATAL)
+		pcie_do_recovery(dev, pci_channel_io_normal, aer_root_reset);
+	else if (info->severity == AER_FATAL)
+		pcie_do_recovery(dev, pci_channel_io_frozen, aer_root_reset);
+	pci_dev_put(dev);
+}
+
+/**
+ * clear_error_source_aer_registers - clear AER registers of the error source device
+ * @dev: pointer to pci_dev data structure of error source device
+ * @info: comprehensive error information
+ *
+ * Invoked when an error being detected by Root Port but before we handle the
+ * error.
+ */
+static void clear_error_source_aer_registers(struct pci_dev *dev, struct aer_err_info info)
 {
 	int aer = dev->aer_cap;
 
-	if (info->severity == AER_CORRECTABLE) {
-		/*
-		 * Correctable error does not need software intervention.
-		 * No need to go through error recovery process.
-		 */
+	if (info.severity == AER_CORRECTABLE) {
 		if (aer)
 			pci_write_config_dword(dev, aer + PCI_ERR_COR_STATUS,
-					info->status);
+					info.status);
 		if (pcie_aer_is_native(dev))
 			pcie_clear_device_status(dev);
-	} else if (info->severity == AER_NONFATAL)
-		pcie_do_recovery(dev, pci_channel_io_normal, aer_root_reset);
-	else if (info->severity == AER_FATAL)
-		pcie_do_recovery(dev, pci_channel_io_frozen, aer_root_reset);
-	pci_dev_put(dev);
+	} else if (info.severity == AER_NONFATAL) {
+		pci_aer_clear_nonfatal_status(dev);
+	} else if (info.severity == AER_FATAL) {
+		pci_aer_clear_fatal_status(dev);
+	}
+
 }
 
 #ifdef CONFIG_ACPI_APEI_PCIEAER
@@ -1093,70 +1123,112 @@ int aer_get_device_error_info(struct pci_dev *dev, struct aer_err_info *info)
 	return 1;
 }
 
-static inline void aer_process_err_devices(struct aer_err_info *e_info)
-{
-	int i;
-
-	/* Report all before handle them, not to lost records by reset etc. */
-	for (i = 0; i < e_info->error_dev_num && e_info->dev[i]; i++) {
-		if (aer_get_device_error_info(e_info->dev[i], e_info))
-			aer_print_error(e_info->dev[i], e_info);
-	}
-	for (i = 0; i < e_info->error_dev_num && e_info->dev[i]; i++) {
-		if (aer_get_device_error_info(e_info->dev[i], e_info))
-			handle_error_source(e_info->dev[i], e_info);
-	}
-}
-
 /**
- * aer_isr_one_error - consume an error detected by root port
- * @rpc: pointer to the root port which holds an error
+ * aer_find_corr_error_source_device - find the error source which detected the corrected error
+ * @rp: pointer to Root Port pci_dev data structure
  * @e_src: pointer to an error source
+ * @e_info: including detailed error information such like id
+ *
+ * Return true if found.
+ *
+ * Process the error information received at the Root Port, set these values
+ * in the aer_devices_err_info and find all the devices that are related to
+ * the error.
  */
-static void aer_isr_one_error(struct aer_rpc *rpc,
-		struct aer_err_source *e_src)
+static bool aer_find_corr_error_source_device(struct pci_dev *rp,
+					struct aer_err_source *e_src,
+					struct aer_devices_err_info *e_info)
 {
-	struct pci_dev *pdev = rpc->rpd;
-	struct aer_err_info e_info;
-
-	pci_rootport_aer_stats_incr(pdev, e_src);
-
-	/*
-	 * There is a possibility that both correctable error and
-	 * uncorrectable error being logged. Report correctable error first.
-	 */
 	if (e_src->status & PCI_ERR_ROOT_COR_RCV) {
-		e_info.id = ERR_COR_ID(e_src->id);
-		e_info.severity = AER_CORRECTABLE;
+		e_info->err_info.id = ERR_COR_ID(e_src->id);
+		e_info->err_info.severity = AER_CORRECTABLE;
 
 		if (e_src->status & PCI_ERR_ROOT_MULTI_COR_RCV)
-			e_info.multi_error_valid = 1;
+			e_info->err_info.multi_error_valid = 1;
 		else
-			e_info.multi_error_valid = 0;
-		aer_print_port_info(pdev, &e_info);
+			e_info->err_info.multi_error_valid = 0;
 
-		if (find_source_device(pdev, &e_info))
-			aer_process_err_devices(&e_info);
+		if (!find_source_device(rp, e_info))
+			return false;
 	}
+	return true;
+}
 
+/**
+ * aer_find_corr_error_source_device - find the error source which detected the uncorrected error
+ * @rp: pointer to Root Port pci_dev data structure
+ * @e_src: pointer to an error source
+ * @e_info: including detailed error information such like id
+ *
+ * Return true if found.
+ *
+ * Process the error information received at the Root Port, set these values
+ * in the aer_devices_err_info and find all the devices that are related to
+ * the error.
+ */
+static bool aer_find_uncorr_error_source_device(struct pci_dev *rp,
+					struct aer_err_source *e_src,
+					struct aer_devices_err_info *e_info)
+{
 	if (e_src->status & PCI_ERR_ROOT_UNCOR_RCV) {
-		e_info.id = ERR_UNCOR_ID(e_src->id);
+		e_info->err_info.id = ERR_UNCOR_ID(e_src->id);
 
 		if (e_src->status & PCI_ERR_ROOT_FATAL_RCV)
-			e_info.severity = AER_FATAL;
+			e_info->err_info.severity = AER_FATAL;
 		else
-			e_info.severity = AER_NONFATAL;
+			e_info->err_info.severity = AER_NONFATAL;
 
 		if (e_src->status & PCI_ERR_ROOT_MULTI_UNCOR_RCV)
-			e_info.multi_error_valid = 1;
+			e_info->err_info.multi_error_valid = 1;
 		else
-			e_info.multi_error_valid = 0;
+			e_info->err_info.multi_error_valid = 0;
+
+		if (!find_source_device(rp, e_info))
+			return false;
+	}
 
-		aer_print_port_info(pdev, &e_info);
+	return true;
+}
 
-		if (find_source_device(pdev, &e_info))
-			aer_process_err_devices(&e_info);
+/**
+ * aer_isr_one_error - consume an error detected by root port
+ * @rp: pointer to Root Port pci_dev data structure
+ * @e_dev: pointer to an error device
+ */
+static void aer_isr_one_error(struct pci_dev *rp, struct aer_dev_err_info *e_dev)
+{
+	aer_print_port_info(rp, &e_dev->err_info);
+	aer_print_error(e_dev->dev, &e_dev->err_info);
+	handle_error_source(e_dev->dev, &e_dev->err_info);
+}
+
+static bool aer_add_err_devices_to_queue(struct aer_rpc *rpc,
+		struct aer_devices_err_info *e_info)
+{
+	int i;
+	struct aer_dev_err_info *e_dev;
+
+	e_dev = kzalloc(sizeof(*e_dev), GFP_ATOMIC);
+	if (!e_dev)
+		return false;
+
+	for (i = 0; i < e_info->err_info.error_dev_num && e_info->dev[i]; i++) {
+		e_dev->err_info = e_info->err_info;
+		e_dev->dev = e_info->dev[i];
+
+		/*
+		 * Store the AER register information for each error device on
+		 * the queue
+		 */
+		if (aer_get_device_error_info(e_dev->dev, &e_dev->err_info)) {
+			if (!kfifo_put(&rpc->aer_fifo, *e_dev))
+				return false;
+
+			clear_error_source_aer_registers(e_dev->dev, e_dev->err_info);
+		}
 	}
+
+	return true;
 }
 
 /**
@@ -1170,13 +1242,13 @@ static irqreturn_t aer_isr(int irq, void *context)
 {
 	struct pcie_device *dev = (struct pcie_device *)context;
 	struct aer_rpc *rpc = get_service_data(dev);
-	struct aer_err_source e_src;
+	struct aer_dev_err_info e_dev;
 
 	if (kfifo_is_empty(&rpc->aer_fifo))
 		return IRQ_NONE;
 
-	while (kfifo_get(&rpc->aer_fifo, &e_src))
-		aer_isr_one_error(rpc, &e_src);
+	while (kfifo_get(&rpc->aer_fifo, &e_dev))
+		aer_isr_one_error(rpc->rpd, &e_dev);
 	return IRQ_HANDLED;
 }
 
@@ -1194,6 +1266,11 @@ static irqreturn_t aer_irq(int irq, void *context)
 	struct pci_dev *rp = rpc->rpd;
 	int aer = rp->aer_cap;
 	struct aer_err_source e_src = {};
+	struct aer_devices_err_info *e_info;
+
+	e_info = kzalloc(sizeof(*e_info), GFP_ATOMIC);
+	if (!e_info)
+		return IRQ_NONE;
 
 	pci_read_config_dword(rp, aer + PCI_ERR_ROOT_STATUS, &e_src.status);
 	if (!(e_src.status & (PCI_ERR_ROOT_UNCOR_RCV|PCI_ERR_ROOT_COR_RCV)))
@@ -1202,8 +1279,26 @@ static irqreturn_t aer_irq(int irq, void *context)
 	pci_read_config_dword(rp, aer + PCI_ERR_ROOT_ERR_SRC, &e_src.id);
 	pci_write_config_dword(rp, aer + PCI_ERR_ROOT_STATUS, e_src.status);
 
-	if (!kfifo_put(&rpc->aer_fifo, e_src))
-		return IRQ_HANDLED;
+	pci_rootport_aer_stats_incr(rp, &e_src);
+
+	/*
+	 * There is a possibility that both correctable error and
+	 * uncorrectable error are being logged. Find the devices which caused
+	 * correctable errors first so that they can be added to the queue first
+	 * and will be reported first.
+	 *
+	 * Before adding the error device to the queue to be handled, clear the
+	 * AER status registers.
+	 */
+	if (aer_find_corr_error_source_device(rp, &e_src, e_info)) {
+		if (!aer_add_err_devices_to_queue(rpc, e_info))
+			return IRQ_NONE;
+	}
+
+	if (aer_find_uncorr_error_source_device(rp, &e_src, e_info)) {
+		if (!aer_add_err_devices_to_queue(rpc, e_info))
+			return IRQ_NONE;
+	}
 
 	return IRQ_WAKE_THREAD;
 }
-- 
2.25.1


^ permalink raw reply related

* [PATCH v3 7/8] PCI/ERR: Remove redundant clearing of AER register in pcie_do_recovery()
From: Naveen Naidu @ 2021-10-04 14:30 UTC (permalink / raw)
  To: bhelgaas, ruscur, oohall
  Cc: linux-pci, linux-kernel, Naveen Naidu, skhan, linuxppc-dev,
	linux-kernel-mentees
In-Reply-To: <cover.1633357368.git.naveennaidu479@gmail.com>

pcie_do_recovery() is shared across the following paths:
 - ACPI APEI
 - Native AER path
 - EDR
 - DPC

ACPI APEI
==========

  ghes_handle_aer()
    aer_recover_queue()
      kfifo_in_spinlocked(aer_recover_ring)

  aer_recover_work_func()
    while (kfifo_get(aer_recover_ring))
      pcie_do_recovery()

In this path the system firmware clears the AER registers before
handing off the record to the OS in ghes_handle_aer()

Native AER
==========

 aer_irq()
   aer_add_err_devices_to_queue()
     kfifo_put(&rpc->aer_fifo, *e_dev)
     clear_error_source_aer_registers()   <---- AER registers are cleard

 aer_isr()
   aer_isr_one_error()
    handle_error_source()
      pcie_do_recovery()

The AER registers are cleared during the handling of IRQ, i.e before we
the recovery starts.

DPC
=====

  dpc_handler()
    dpc_process_error()
    pci_aer_clear_status()       <---- AER registers are cleared
    pcie_do_recovery()

EDR
====

  edr_handle_event()
    dpc_process_error()
    pci_aer_raw_clear_status()  <---- AER registers are cleared
    pcie_do_recovery()

In all the above paths, the AER registers are cleared before
pcie_do_recovery(). The non fatal status AER registers are again cleared
in pcie_do_recovery(). This is redundant.

Remove redundant clearing of AER register in pcie_do_recovery()

Signed-off-by: Naveen Naidu <naveennaidu479@gmail.com>
---
 drivers/pci/pcie/err.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index b576aa890c76..fe04b0ae22f4 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -231,14 +231,11 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
 
 	/*
 	 * If we have native control of AER, clear error status in the device
-	 * that detected the error.  If the platform retained control of AER,
-	 * it is responsible for clearing this status.  In that case, the
-	 * signaling device may not even be visible to the OS.
+	 * that detected the error.
 	 */
-	if (host->native_aer || pcie_ports_native) {
+	if (host->native_aer || pcie_ports_native)
 		pcie_clear_device_status(dev);
-		pci_aer_clear_nonfatal_status(dev);
-	}
+
 	pci_info(bridge, "device recovery successful\n");
 	return status;
 
-- 
2.25.1


^ permalink raw reply related

* [PATCH v3 8/8] PCI/AER: Include DEVCTL in aer_print_error()
From: Naveen Naidu @ 2021-10-04 14:30 UTC (permalink / raw)
  To: bhelgaas, ruscur, oohall
  Cc: linux-pci, linux-kernel, Naveen Naidu, skhan, linuxppc-dev,
	linux-kernel-mentees
In-Reply-To: <cover.1633357368.git.naveennaidu479@gmail.com>

Print the contents of Device Control Register of the device which
detected the error. This might help in faster error diagnosis.

Sample output from dummy error injected by aer-inject:

  pcieport 0000:00:03.0: AER: Corrected error received: 0000:00:03.0
  pcieport 0000:00:03.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver)
  pcieport 0000:00:03.0:   device [1b36:000c] error status/mask=00000040/0000e000, devctl=0x000f
  pcieport 0000:00:03.0:    [ 6] BadTLP

Signed-off-by: Naveen Naidu <naveennaidu479@gmail.com>
---
 drivers/pci/pci.h      |  2 ++
 drivers/pci/pcie/aer.c | 10 ++++++++--
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index eb88d8bfeaf7..48ed7f91113b 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -437,6 +437,8 @@ struct aer_err_info {
 	u32 status;		/* COR/UNCOR Error Status */
 	u32 mask;		/* COR/UNCOR Error Mask */
 	struct aer_header_log_regs tlp;	/* TLP Header */
+
+	u16 devctl;
 };
 
 /* Preliminary AER error information processed from Root port */
diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 91f91d6ab052..42cae01b6887 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -729,8 +729,8 @@ void aer_print_error(struct pci_dev *dev, struct aer_err_info *info)
 		   aer_error_severity_string[info->severity],
 		   aer_error_layer[layer], aer_agent_string[agent]);
 
-	pci_printk(level, dev, "  device [%04x:%04x] error status/mask=%08x/%08x\n",
-		   dev->vendor, dev->device, info->status, info->mask);
+	pci_printk(level, dev, "  device [%04x:%04x] error status/mask=%08x/%08x, devctl=%#06x\n",
+		   dev->vendor, dev->device, info->status, info->mask, info->devctl);
 
 	__aer_print_error(dev, info);
 
@@ -1083,6 +1083,12 @@ int aer_get_device_error_info(struct pci_dev *dev, struct aer_err_info *info)
 	if (!aer)
 		return 0;
 
+	/*
+	 * Cache the value of Device Control Register now, because later the
+	 * device might not be available
+	 */
+	pcie_capability_read_word(dev, PCI_EXP_DEVCTL, &info->devctl);
+
 	if (info->severity == AER_CORRECTABLE) {
 		pci_read_config_dword(dev, aer + PCI_ERR_COR_STATUS,
 			&info->status);
-- 
2.25.1


^ permalink raw reply related

* [PATCH 0/6] MIPS: OCTEON: Remove redundant AER code
From: Naveen Naidu @ 2021-10-04 17:59 UTC (permalink / raw)
  To: bhelgaas, tsbogend, ruscur, oohall
  Cc: linux-pci, linuxppc-dev, linux-mips, linux-kernel, Naveen Naidu,
	skhan, linux-kernel-mentees

e8635b484f64 ("MIPS: Add Cavium OCTEON PCI support.") added MIPS
specific code to enable PCIe and AER error reporting (*irrespective
of CONFIG_PCIEAER value*) because PCI core didn't do that at the time.

But currently, the PCI core clears and enables the AER status registers.
So it's redundant for octeon code to do so. This patch series removes
the redundant code from the pci-octeon.c

Currently, the correctable and uncorrectable AER mask registers are not
set to their default value when AER service driver is loaded. This
defect is also fixed in the "[PATCH 1/6]" in the series.

Please note that "Patch 4/6" is dependent on "Patch 1/6".

Thanks,
Naveen Naidu

Naveen Naidu (6):
 [PATCH 1/6] PCI/AER: Enable COR/UNCOR error reporting in set_device_error_reporting()
 [PATCH 2/6] MIPS: OCTEON: Remove redundant clearing of AER status registers
 [PATCH 3/6] MIPS: OCTEON: Remove redundant enable of PCIe normal error reporting
 [PATCH 4/6] MIPS: OCTEON: Remove redundant enable of COR/UNCOR error
 [PATCH 5/6] MIPS: OCTEON: Remove redundant ECRC Generation Enable
 [PATCH 6/6] MIPS: OCTEON: Remove redundant enable of RP error reporting

 arch/mips/pci/pci-octeon.c | 50 --------------------------------------
 drivers/pci/pcie/aer.c     | 13 +++++++++-
 2 files changed, 12 insertions(+), 51 deletions(-)

-- 
2.25.1

^ permalink raw reply

* [PATCH 1/6] PCI/AER: Enable COR/UNCOR error reporting in set_device_error_reporting()
From: Naveen Naidu @ 2021-10-04 17:59 UTC (permalink / raw)
  To: bhelgaas, tsbogend, ruscur, oohall
  Cc: linux-pci, linuxppc-dev, linux-mips, linux-kernel, Naveen Naidu,
	skhan, linux-kernel-mentees
In-Reply-To: <cover.1633369560.git.naveennaidu479@gmail.com>

The (PCIe r5.0, sec 7.6.4.3, Table 7-101) and  (PCIe r5.0, sec 7.8.4.6,
Table 7-104) states that the default values for the Uncorrectable Error
Mask and Correctable Error Mask should be 0b. But the current code does
not set the default value of these registers when the PCIe bus loads the
AER service driver.

Enable reporting of all correctable and uncorrectable errors during
aer_probe()

Signed-off-by: Naveen Naidu <naveennaidu479@gmail.com>
---
 drivers/pci/pcie/aer.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 9784fdcf3006..88c4ca6098fb 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -1212,6 +1212,7 @@ static int set_device_error_reporting(struct pci_dev *dev, void *data)
 {
 	bool enable = *((bool *)data);
 	int type = pci_pcie_type(dev);
+	int aer = dev->aer_cap;
 
 	if ((type == PCI_EXP_TYPE_ROOT_PORT) ||
 	    (type == PCI_EXP_TYPE_RC_EC) ||
@@ -1223,8 +1224,18 @@ static int set_device_error_reporting(struct pci_dev *dev, void *data)
 			pci_disable_pcie_error_reporting(dev);
 	}
 
-	if (enable)
+	if (enable) {
+
+		/* Enable reporting of all uncorrectable errors */
+		/* Uncorrectable Error Mask - turned on bits disable errors */
+		pci_write_config_dword(dev, aer + PCI_ERR_UNCOR_MASK, 0);
+
+		/* Enable reporting of all correctable errors */
+		/* Correctable Error Mask - turned on bits disable errors */
+		pci_write_config_dword(dev, aer + PCI_ERR_COR_MASK, 0);
+
 		pcie_set_ecrc_checking(dev);
+	}
 
 	return 0;
 }
-- 
2.25.1


^ permalink raw reply related

* Re: [PATCH v6 07/11] PCI: Replace pci_dev::driver usage that gets the driver name
From: Ido Schimmel @ 2021-10-04 16:06 UTC (permalink / raw)
  To: Uwe Kleine-König
  Cc: linux-pci, Alexander Duyck, oss-drivers, Paul Mackerras,
	Herbert Xu, Rafał Miłecki, Jesse Brandeburg,
	Bjorn Helgaas, Ido Schimmel, Jakub Kicinski, Yisen Zhuang,
	Vadym Kochan, Michael Buesch, Jiri Pirko, Salil Mehta, netdev,
	linux-wireless, linux-kernel, Taras Chornyi, Zhou Wang,
	linux-crypto, kernel, Simon Horman, Oliver O'Halloran,
	linuxppc-dev, David S. Miller
In-Reply-To: <20211004125935.2300113-8-u.kleine-koenig@pengutronix.de>

On Mon, Oct 04, 2021 at 02:59:31PM +0200, Uwe Kleine-König wrote:
> struct pci_dev::driver holds (apart from a constant offset) the same
> data as struct pci_dev::dev->driver. With the goal to remove struct
> pci_dev::driver to get rid of data duplication replace getting the
> driver name by dev_driver_string() which implicitly makes use of struct
> pci_dev::dev->driver.
> 
> Acked-by: Simon Horman <simon.horman@corigine.com> (for NFP)
> Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>

For mlxsw:

Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>

Tested with the kexec flow that I mentioned last time. Works fine now.

Thanks

^ permalink raw reply

* Re: [RFC PATCH 4/8] powerpc: add CPU field to struct thread_info
From: Michael Ellerman @ 2021-10-05  1:55 UTC (permalink / raw)
  To: Kees Cook
  Cc: Peter Zijlstra, Catalin Marinas, Paul Mackerras, linux-riscv,
	Will Deacon, Ard Biesheuvel, open list:S390, Vasily Gorbik,
	Russell King, Christian Borntraeger, Ingo Molnar, Albert Ou,
	Arnd Bergmann, Heiko Carstens, Keith Packard, Borislav Petkov,
	Andy Lutomirski, Paul Walmsley, Thomas Gleixner, Linux ARM,
	open list:LINUX FOR POWERPC (32-BIT AND 64-BIT),
	Linux Kernel Mailing List, Palmer Dabbelt, Linus Torvalds
In-Reply-To: <202109301045.15DDDA0B@keescook>

Kees Cook <keescook@chromium.org> writes:
> On Thu, Sep 30, 2021 at 08:46:04AM +1000, Michael Ellerman wrote:
>> Ard Biesheuvel <ardb@kernel.org> writes:
>> > On Tue, 28 Sept 2021 at 02:16, Michael Ellerman <mpe@ellerman.id.au> wrote:
>> >>
>> >> Michael Ellerman <mpe@ellerman.id.au> writes:
>> >> > Ard Biesheuvel <ardb@kernel.org> writes:
>> >> >> On Tue, 14 Sept 2021 at 14:11, Ard Biesheuvel <ardb@kernel.org> wrote:
>> >> >>>
>> >> >>> The CPU field will be moved back into thread_info even when
>> >> >>> THREAD_INFO_IN_TASK is enabled, so add it back to powerpc's definition
>> >> >>> of struct thread_info.
>> >> >>>
>> >> >>> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
>> >> >>
>> >> >> Michael,
>> >> >>
>> >> >> Do you have any objections or issues with this patch or the subsequent
>> >> >> ones cleaning up the task CPU kludge for ppc32? Christophe indicated
>> >> >> that he was happy with it.
>> >> >
>> >> > No objections, it looks good to me, thanks for cleaning up that horror :)
>> >> >
>> >> > It didn't apply cleanly to master so I haven't tested it at all, if you can point me at a
>> >> > git tree with the dependencies I'd be happy to run some tests over it.
>> >>
>> >> Actually I realised I can just drop the last patch.
>> >>
>> >> So that looks fine, passes my standard quick build & boot on qemu tests,
>> >> and builds with/without stack protector enabled.
>> >>
>> >
>> > Thanks.
>> >
>> > Do you have any opinion on how this series should be merged? Kees Cook
>> > is willing to take them via his cross-arch tree, or you could carry
>> > them if you prefer. Taking it via multiple trees at the same time is
>> > going to be tricky, or take two cycles, with I'd prefer to avoid.
>> 
>> I don't really mind. If Kees is happy to take it then that's OK by me.
>> 
>> If Kees put the series in a topic branch based off rc2 then I could
>> merge that, and avoid any conflicts.
>
> I've created:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git for-next/thread_info/cpu
>
> it includes a --no-ff merge commit, which I'm not sure is desirable? Let
> me know if I should adjust this, or if Linus will yell about this if I
> send him a PR containing a merge commit? I'm not sure what's right here.

It looks good to me.

I don't think Linus will be bothered about that merge. It has useful
information, ie. explains why you're merging it and that arch
maintainers have acked it, and quotes Ard's cover letter.

cheers

^ permalink raw reply

* Re: [PATCH v3 6/8] PCI/AER: Clear error device AER registers in aer_irq()
From: kernel test robot @ 2021-10-05  5:02 UTC (permalink / raw)
  To: Naveen Naidu, bhelgaas, ruscur, oohall
  Cc: kbuild-all, linux-pci, linuxppc-dev, linux-kernel, Naveen Naidu,
	skhan, linux-kernel-mentees
In-Reply-To: <9092514ec952ab3d60498b6f6c29671b85f9aeff.1633357368.git.naveennaidu479@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 3637 bytes --]

Hi Naveen,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on helgaas-pci/next]
[also build test WARNING on linux/master linus/master v5.15-rc3 next-20210922]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Naveen-Naidu/Fix-long-standing-AER-Error-Handling-Issues/20211004-223758
base:   https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git next
config: arc-allyesconfig (attached as .config)
compiler: arceb-elf-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/ab727f8771a49bb5f4c7bbf10dc9ba9b90a454cf
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Naveen-Naidu/Fix-long-standing-AER-Error-Handling-Issues/20211004-223758
        git checkout ab727f8771a49bb5f4c7bbf10dc9ba9b90a454cf
        # save the attached .config to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross O=build_dir ARCH=arc SHELL=/bin/bash drivers/pci/pcie/ drivers/usb/dwc3/

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   drivers/pci/pcie/aer.c:822: warning: Function parameter or member 'e_dev' not described in 'add_error_device'
   drivers/pci/pcie/aer.c:822: warning: Excess function parameter 'e_info' description in 'add_error_device'
   drivers/pci/pcie/aer.c:923: warning: Function parameter or member 'e_dev' not described in 'find_source_device'
   drivers/pci/pcie/aer.c:923: warning: Excess function parameter 'e_info' description in 'find_source_device'
>> drivers/pci/pcie/aer.c:1172: warning: expecting prototype for aer_find_corr_error_source_device(). Prototype was for aer_find_uncorr_error_source_device() instead


vim +1172 drivers/pci/pcie/aer.c

  1156	
  1157	/**
  1158	 * aer_find_corr_error_source_device - find the error source which detected the uncorrected error
  1159	 * @rp: pointer to Root Port pci_dev data structure
  1160	 * @e_src: pointer to an error source
  1161	 * @e_info: including detailed error information such like id
  1162	 *
  1163	 * Return true if found.
  1164	 *
  1165	 * Process the error information received at the Root Port, set these values
  1166	 * in the aer_devices_err_info and find all the devices that are related to
  1167	 * the error.
  1168	 */
  1169	static bool aer_find_uncorr_error_source_device(struct pci_dev *rp,
  1170						struct aer_err_source *e_src,
  1171						struct aer_devices_err_info *e_info)
> 1172	{
  1173		if (e_src->status & PCI_ERR_ROOT_UNCOR_RCV) {
  1174			e_info->err_info.id = ERR_UNCOR_ID(e_src->id);
  1175	
  1176			if (e_src->status & PCI_ERR_ROOT_FATAL_RCV)
  1177				e_info->err_info.severity = AER_FATAL;
  1178			else
  1179				e_info->err_info.severity = AER_NONFATAL;
  1180	
  1181			if (e_src->status & PCI_ERR_ROOT_MULTI_UNCOR_RCV)
  1182				e_info->err_info.multi_error_valid = 1;
  1183			else
  1184				e_info->err_info.multi_error_valid = 0;
  1185	
  1186			if (!find_source_device(rp, e_info))
  1187				return false;
  1188		}
  1189	
  1190		return true;
  1191	}
  1192	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 69189 bytes --]

^ permalink raw reply

* Re: [PATCH 6/9] powerpc/bpf: Fix BPF_SUB when imm == 0x80000000
From: Christophe Leroy @ 2021-10-05  5:40 UTC (permalink / raw)
  To: Naveen N. Rao, Alexei Starovoitov, Daniel Borkmann,
	Johan Almbladh, Michael Ellerman, Nicholas Piggin
  Cc: bpf, linuxppc-dev
In-Reply-To: <1633371242.5ghdfjua6t.naveen@linux.ibm.com>



Le 04/10/2021 à 20:18, Naveen N. Rao a écrit :
> Christophe Leroy wrote:
>>
>>
>> Le 01/10/2021 à 23:14, Naveen N. Rao a écrit :
>>> We aren't handling subtraction involving an immediate value of
>>> 0x80000000 properly. Fix the same.
>>>
>>> Fixes: 156d0e290e969c ("powerpc/ebpf/jit: Implement JIT compiler for 
>>> extended BPF")
>>> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
>>> ---
>>>   arch/powerpc/net/bpf_jit_comp64.c | 16 ++++++++--------
>>>   1 file changed, 8 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/arch/powerpc/net/bpf_jit_comp64.c 
>>> b/arch/powerpc/net/bpf_jit_comp64.c
>>> index ffb7a2877a8469..4641a50e82d50d 100644
>>> --- a/arch/powerpc/net/bpf_jit_comp64.c
>>> +++ b/arch/powerpc/net/bpf_jit_comp64.c
>>> @@ -333,15 +333,15 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 
>>> *image, struct codegen_context *
>>>           case BPF_ALU | BPF_SUB | BPF_K: /* (u32) dst -= (u32) imm */
>>>           case BPF_ALU64 | BPF_ADD | BPF_K: /* dst += imm */
>>>           case BPF_ALU64 | BPF_SUB | BPF_K: /* dst -= imm */
>>> -            if (BPF_OP(code) == BPF_SUB)
>>> -                imm = -imm;
>>> -            if (imm) {
>>> -                if (imm >= -32768 && imm < 32768)
>>> -                    EMIT(PPC_RAW_ADDI(dst_reg, dst_reg, IMM_L(imm)));
>>> -                else {
>>> -                    PPC_LI32(b2p[TMP_REG_1], imm);
>>> +            if (imm > -32768 && imm < 32768) {
>>> +                EMIT(PPC_RAW_ADDI(dst_reg, dst_reg,
>>> +                    BPF_OP(code) == BPF_SUB ? IMM_L(-imm) : 
>>> IMM_L(imm)));
>>> +            } else {
>>> +                PPC_LI32(b2p[TMP_REG_1], imm);
>>> +                if (BPF_OP(code) == BPF_SUB)
>>> +                    EMIT(PPC_RAW_SUB(dst_reg, dst_reg, 
>>> b2p[TMP_REG_1]));
>>> +                else
>>>                       EMIT(PPC_RAW_ADD(dst_reg, dst_reg, 
>>> b2p[TMP_REG_1]));
>>> -                }
>>>               }
>>>               goto bpf_alu32_trunc;
>>
>> There is now so few code common to both BPF_ADD and BPF_SUB that you 
>> should make them different cases.
>>
>> While at it, why not also use ADDIS if imm is 32 bits ? That would be 
>> an ADDIS/ADDI instead of LIS/ORI/ADD
> 
> Sure. I wanted to limit the change for this fix. We can do a separate 
> patch to optimize code generation for BPF_ADD.
> 

Sure, this second part was just a thought, I agree it should be another 
patch.

My main comment here is to split stuff and make it a different case, I 
don't think it increases the change much, and IMO it is easier to read:

diff --git a/arch/powerpc/net/bpf_jit_comp64.c 
b/arch/powerpc/net/bpf_jit_comp64.c
index ffb7a2877a84..39226d88c558 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -330,11 +330,7 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 
*image, struct codegen_context *
  			EMIT(PPC_RAW_SUB(dst_reg, dst_reg, src_reg));
  			goto bpf_alu32_trunc;
  		case BPF_ALU | BPF_ADD | BPF_K: /* (u32) dst += (u32) imm */
-		case BPF_ALU | BPF_SUB | BPF_K: /* (u32) dst -= (u32) imm */
  		case BPF_ALU64 | BPF_ADD | BPF_K: /* dst += imm */
-		case BPF_ALU64 | BPF_SUB | BPF_K: /* dst -= imm */
-			if (BPF_OP(code) == BPF_SUB)
-				imm = -imm;
  			if (imm) {
  				if (imm >= -32768 && imm < 32768)
  					EMIT(PPC_RAW_ADDI(dst_reg, dst_reg, IMM_L(imm)));
@@ -344,6 +340,17 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 
*image, struct codegen_context *
  				}
  			}
  			goto bpf_alu32_trunc;
+		case BPF_ALU | BPF_SUB | BPF_K: /* (u32) dst -= (u32) imm */
+		case BPF_ALU64 | BPF_SUB | BPF_K: /* dst -= imm */
+			if (imm) {
+				if (-imm >= -32768 && -imm < 32768) {
+					EMIT(PPC_RAW_ADDI(dst_reg, dst_reg, IMM_L(-imm)));
+				} else {
+					PPC_LI32(b2p[TMP_REG_1], imm);
+					EMIT(PPC_RAW_SUB(dst_reg, dst_reg, b2p[TMP_REG_1]));
+				}
+			}
+			goto bpf_alu32_trunc;
  		case BPF_ALU | BPF_MUL | BPF_X: /* (u32) dst *= (u32) src */
  		case BPF_ALU64 | BPF_MUL | BPF_X: /* dst *= src */
  			if (BPF_CLASS(code) == BPF_ALU)

^ permalink raw reply related

* Re: [PATCH 4/9] powerpc/bpf: Handle large branch ranges with BPF_EXIT
From: Christophe Leroy @ 2021-10-05  5:46 UTC (permalink / raw)
  To: Naveen N. Rao, Alexei Starovoitov, Daniel Borkmann,
	Johan Almbladh, Michael Ellerman, Nicholas Piggin
  Cc: bpf, linuxppc-dev
In-Reply-To: <1633371632.j9hqy0kjhu.naveen@linux.ibm.com>



Le 04/10/2021 à 20:24, Naveen N. Rao a écrit :
> Christophe Leroy wrote:
>>
>>
>> Le 01/10/2021 à 23:14, Naveen N. Rao a écrit :
>>> In some scenarios, it is possible that the program epilogue is outside
>>> the branch range for a BPF_EXIT instruction. Instead of rejecting such
>>> programs, emit an indirect branch. We track the size of the bpf program
>>> emitted after the initial run and do a second pass since BPF_EXIT can
>>> end up emitting different number of instructions depending on the
>>> program size.
>>>
>>> Suggested-by: Jordan Niethe <jniethe5@gmail.com>
>>> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
>>> ---
>>>   arch/powerpc/net/bpf_jit.h        |  3 +++
>>>   arch/powerpc/net/bpf_jit_comp.c   | 22 +++++++++++++++++++++-
>>>   arch/powerpc/net/bpf_jit_comp32.c |  2 +-
>>>   arch/powerpc/net/bpf_jit_comp64.c |  2 +-
>>>   4 files changed, 26 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/arch/powerpc/net/bpf_jit.h b/arch/powerpc/net/bpf_jit.h
>>> index 89bd744c2bffd4..4023de1698b9f5 100644
>>> --- a/arch/powerpc/net/bpf_jit.h
>>> +++ b/arch/powerpc/net/bpf_jit.h
>>> @@ -126,6 +126,7 @@
>>>   #define SEEN_FUNC    0x20000000 /* might call external helpers */
>>>   #define SEEN_TAILCALL    0x40000000 /* uses tail calls */
>>> +#define SEEN_BIG_PROG    0x80000000 /* large prog, >32MB */
>>>   #define SEEN_VREG_MASK    0x1ff80000 /* Volatile registers r3-r12 */
>>>   #define SEEN_NVREG_MASK    0x0003ffff /* Non volatile registers 
>>> r14-r31 */
>>> @@ -179,6 +180,8 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 
>>> *image, struct codegen_context *
>>>   void bpf_jit_build_prologue(u32 *image, struct codegen_context *ctx);
>>>   void bpf_jit_build_epilogue(u32 *image, struct codegen_context *ctx);
>>>   void bpf_jit_realloc_regs(struct codegen_context *ctx);
>>> +int bpf_jit_emit_exit_insn(u32 *image, struct codegen_context *ctx,
>>> +                    int tmp_reg, unsigned long exit_addr);
>>>   #endif
>>> diff --git a/arch/powerpc/net/bpf_jit_comp.c 
>>> b/arch/powerpc/net/bpf_jit_comp.c
>>> index fcbf7a917c566e..3204872fbf2738 100644
>>> --- a/arch/powerpc/net/bpf_jit_comp.c
>>> +++ b/arch/powerpc/net/bpf_jit_comp.c
>>> @@ -72,6 +72,21 @@ static int bpf_jit_fixup_subprog_calls(struct 
>>> bpf_prog *fp, u32 *image,
>>>       return 0;
>>>   }
>>> +int bpf_jit_emit_exit_insn(u32 *image, struct codegen_context *ctx,
>>> +                    int tmp_reg, unsigned long exit_addr)
>>> +{
>>> +    if (!(ctx->seen & SEEN_BIG_PROG) && 
>>> is_offset_in_branch_range(exit_addr)) {
>>> +        PPC_JMP(exit_addr);
>>> +    } else {
>>> +        ctx->seen |= SEEN_BIG_PROG;
>>> +        PPC_FUNC_ADDR(tmp_reg, (unsigned long)image + exit_addr);
>>> +        EMIT(PPC_RAW_MTCTR(tmp_reg));
>>> +        EMIT(PPC_RAW_BCTR());
>>> +    }
>>> +
>>> +    return 0;
>>> +}
>>> +
>>>   struct powerpc64_jit_data {
>>>       struct bpf_binary_header *header;
>>>       u32 *addrs;
>>> @@ -155,12 +170,17 @@ struct bpf_prog *bpf_int_jit_compile(struct 
>>> bpf_prog *fp)
>>>           goto out_addrs;
>>>       }
>>> +    if (!is_offset_in_branch_range((long)cgctx.idx * 4))
>>> +        cgctx.seen |= SEEN_BIG_PROG;
>>> +
>>>       /*
>>>        * If we have seen a tail call, we need a second pass.
>>>        * This is because bpf_jit_emit_common_epilogue() is called
>>>        * from bpf_jit_emit_tail_call() with a not yet stable ctx->seen.
>>> +     * We also need a second pass if we ended up with too large
>>> +     * a program so as to fix branches.
>>>        */
>>> -    if (cgctx.seen & SEEN_TAILCALL) {
>>> +    if (cgctx.seen & (SEEN_TAILCALL | SEEN_BIG_PROG)) {
>>>           cgctx.idx = 0;
>>>           if (bpf_jit_build_body(fp, 0, &cgctx, addrs, false)) {
>>>               fp = org_fp;
>>> diff --git a/arch/powerpc/net/bpf_jit_comp32.c 
>>> b/arch/powerpc/net/bpf_jit_comp32.c
>>> index a74d52204f8da2..d2a67574a23066 100644
>>> --- a/arch/powerpc/net/bpf_jit_comp32.c
>>> +++ b/arch/powerpc/net/bpf_jit_comp32.c
>>> @@ -852,7 +852,7 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 
>>> *image, struct codegen_context *
>>>                * we'll just fall through to the epilogue.
>>>                */
>>>               if (i != flen - 1)
>>> -                PPC_JMP(exit_addr);
>>> +                bpf_jit_emit_exit_insn(image, ctx, tmp_reg, exit_addr);
>>
>> On ppc32, if you use tmp_reg you must flag it. But I think you could 
>> use r0 instead.
> 
> Indeed. Can we drop tracking of the temp registers and using them while
> remapping registers? Are you seeing significant benefits with re-use of 
> those temp registers?
> 

I'm not sure to follow you.

On ppc32, all volatile registers are used for function arguments, so 
temp registers are necessarily non-volatile so we track them as all 
non-volatile registers we use.

I think saving on stack only the non-volatile registers we use provides 
real benefit, otherwise you wouldn't have implemented it would you ?

Anyway here you should use _R0 instead of tmp_reg.

Christophe

^ permalink raw reply

* Re: [PATCH 3/9] powerpc/bpf: Remove unused SEEN_STACK
From: Christophe Leroy @ 2021-10-05  5:50 UTC (permalink / raw)
  To: Naveen N. Rao, Alexei Starovoitov, Daniel Borkmann,
	Johan Almbladh, Michael Ellerman, Nicholas Piggin
  Cc: bpf, linuxppc-dev
In-Reply-To: <1633369544.ekqufta9bg.naveen@linux.ibm.com>



Le 04/10/2021 à 20:11, Naveen N. Rao a écrit :
> Christophe Leroy wrote:
>>
>>
>> Le 01/10/2021 à 23:14, Naveen N. Rao a écrit :
>>> From: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
>>>
>>> SEEN_STACK is unused on PowerPC. Remove it. Also, have
>>> SEEN_TAILCALL use 0x40000000.
>>
>> Why change SEEN_TAILCALL ? Would it be a problem to leave it as is ?
>>
>>>
>>> Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
>>> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
> 
> I prefer the bit usage to be contiguous. Changing SEEN_TAILCALL isn't a 
> problem either.
> 

Well you are adding SEEN_BIG_PROG in following patch so it would still 
be contiguous at the end.

I don't really mind but I thought it would be less churn to just leave 
SEEN_TAILCALL as is and re-use 0x40000000 for SEEN_BIG_PROG.

Anyway

Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>

^ permalink raw reply

* Re: [V2 1/4] powerpc/perf: Refactor the code definition of perf reg extended mask
From: Michael Ellerman @ 2021-10-05  5:52 UTC (permalink / raw)
  To: Athira Rajeev, acme, jolsa; +Cc: kjain, maddy, linuxppc-dev, rnsastry
In-Reply-To: <20210930122055.1390-2-atrajeev@linux.vnet.ibm.com>

Athira Rajeev <atrajeev@linux.vnet.ibm.com> writes:
> PERF_REG_PMU_MASK_300 and PERF_REG_PMU_MASK_31 defines the mask
> value for extended registers. Current definition of these mask values
> uses hex constant and does not use registers by name, making it less
> readable. Patch refactor the macro values by or'ing together the actual
> register value constants. Also include PERF_REG_EXTENDED_MAX as
> part of enum definition.
>
> Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
> ---
>  arch/powerpc/include/uapi/asm/perf_regs.h | 21 +++++++++++++--------
>  1 file changed, 13 insertions(+), 8 deletions(-)
>
> diff --git a/arch/powerpc/include/uapi/asm/perf_regs.h b/arch/powerpc/include/uapi/asm/perf_regs.h
> index 578b3ee86105..fb1d8a9b4393 100644
> --- a/arch/powerpc/include/uapi/asm/perf_regs.h
> +++ b/arch/powerpc/include/uapi/asm/perf_regs.h
> @@ -61,27 +61,32 @@ enum perf_event_powerpc_regs {
>  	PERF_REG_POWERPC_PMC4,
>  	PERF_REG_POWERPC_PMC5,
>  	PERF_REG_POWERPC_PMC6,
> -	/* Max regs without the extended regs */
> +	/* Max mask value for interrupt regs w/o extended regs */
>  	PERF_REG_POWERPC_MAX = PERF_REG_POWERPC_MMCRA + 1,
> +	/* Max mask value for interrupt regs including extended regs */
> +	PERF_REG_EXTENDED_MAX = PERF_REG_POWERPC_PMC6 + 1,
>  };
>  
>  #define PERF_REG_PMU_MASK	((1ULL << PERF_REG_POWERPC_MAX) - 1)
>  
> -/* Exclude MMCR3, SIER2, SIER3 for CPU_FTR_ARCH_300 */
> -#define	PERF_EXCLUDE_REG_EXT_300	(7ULL << PERF_REG_POWERPC_MMCR3)
> -
>  /*
>   * PERF_REG_EXTENDED_MASK value for CPU_FTR_ARCH_300
>   * includes 9 SPRS from MMCR0 to PMC6 excluding the
> - * unsupported SPRS in PERF_EXCLUDE_REG_EXT_300.
> + * unsupported SPRS MMCR3, SIER2 and SIER3.
>   */
> -#define PERF_REG_PMU_MASK_300   ((0xfffULL << PERF_REG_POWERPC_MMCR0) - PERF_EXCLUDE_REG_EXT_300)
> +#define PERF_REG_PMU_MASK_300	\
> +	((1ul << PERF_REG_POWERPC_MMCR0) | (1ul << PERF_REG_POWERPC_MMCR1) | \
> +	(1ul << PERF_REG_POWERPC_MMCR2) | (1ul << PERF_REG_POWERPC_PMC1) | \
> +	(1ul << PERF_REG_POWERPC_PMC2) | (1ul << PERF_REG_POWERPC_PMC3) | \
> +	(1ul << PERF_REG_POWERPC_PMC4) | (1ul << PERF_REG_POWERPC_PMC5) | \
> +	(1ul << PERF_REG_POWERPC_PMC6))

These all need to be unsigned long long. Otherwise when building on big
endian (which defaults to 32-bit), we see errors such as:

  In file included from /home/michael/linux/tools/perf/arch/powerpc/include/perf_regs.h:7:0,
                   from arch/powerpc/util/../../../util/perf_regs.h:30,
                   from arch/powerpc/util/perf_regs.c:7:
  arch/powerpc/util/perf_regs.c: In function ‘arch__intr_reg_mask’:
  /home/michael/linux/tools/arch/powerpc/include/uapi/asm/perf_regs.h:78:8: error: left shift count >= width of type [-Werror=shift-count-overflow]
    ((1ul << PERF_REG_POWERPC_MMCR0) | (1ul << PERF_REG_POWERPC_MMCR1) | \
          ^
  arch/powerpc/util/perf_regs.c:206:19: note: in expansion of macro ‘PERF_REG_PMU_MASK_300’
     extended_mask = PERF_REG_PMU_MASK_300;
                     ^

cheers

^ permalink raw reply

* Re: [PATCH 1/3] fixup mmu_features immediately after getting cpu pa features.
From: Mahesh J Salgaonkar @ 2021-10-05  5:54 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linuxppc-dev, mahesh, linux-kernel, Sourabh Jain, Abdul haleem,
	hbathini
In-Reply-To: <e67acb9b-dd8e-767a-b57b-f12b3b0fd44d@linux.ibm.com>

On 2021-10-04 21:02:21 Mon, Aneesh Kumar K.V wrote:
> On 10/4/21 20:41, Sourabh Jain wrote:
> > From: Mahesh Salgaonkar <mahesh@linux.ibm.com>
> > 
> > On system with radix support available, early_radix_enabled() starts
> > returning true for a small window (until mmu_early_init_devtree() is
> > called) even when radix mode disabled on kernel command line. This causes
> > ppc64_bolted_size() to return ULONG_MAX in HPT mode instead of supported
> > segment size, during boot cpu paca allocation.
> > 
> > With kernel command line = "... disable_radix":
> > 
> > early_init_devtree:			  <- early_radix_enabled() = false
> >    early_init_dt_scan_cpus:		  <- early_radix_enabled() = false
> >        ...
> >        check_cpu_pa_features:		  <- early_radix_enabled() = false
> >        ...				^ <- early_radix_enabled() = TRUE
> >        allocate_paca:			| <- early_radix_enabled() = TRUE
> >            ...                           |
> >            ppc64_bolted_size:		| <- early_radix_enabled() = TRUE
> >                if (early_radix_enabled())| <- early_radix_enabled() = TRUE
> >                    return ULONG_MAX;     |
> >        ...                               |
> >    ...					| <- early_radix_enabled() = TRUE
> >    ...					| <- early_radix_enabled() = TRUE
> >    mmu_early_init_devtree()              V
> >    ...					  <- early_radix_enabled() = false
> > 
> > So far we have not seen any issue because allocate_paca() takes minimum of
> > ppc64_bolted_size and rma_size while allocating paca. However it is better
> > to close this window by fixing up the mmu features as early as possible.
> > This fixes early_radix_enabled() and ppc64_bolted_size() to return valid
> > values in radix disable mode. This patch will help subsequent patch to
> > depend on early_radix_enabled() check while detecting supported segment
> > size in HPT mode.
> > 
> > Signed-off-by: Mahesh Salgaonkar <mahesh@linux.ibm.com>
> > Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
> > Reported-and-tested-by: Abdul haleem <abdhalee@linux.vnet.ibm.com>
> > ---
> >   arch/powerpc/include/asm/book3s/64/mmu.h | 1 +
> >   arch/powerpc/include/asm/mmu.h           | 1 +
> >   arch/powerpc/kernel/prom.c               | 1 +
> >   arch/powerpc/mm/init_64.c                | 5 ++++-
> >   4 files changed, 7 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h b/arch/powerpc/include/asm/book3s/64/mmu.h
> > index c02f42d1031e..69a89fa1330d 100644
> > --- a/arch/powerpc/include/asm/book3s/64/mmu.h
> > +++ b/arch/powerpc/include/asm/book3s/64/mmu.h
> > @@ -197,6 +197,7 @@ extern int mmu_vmemmap_psize;
> >   extern int mmu_io_psize;
> >   /* MMU initialization */
> > +void mmu_cpu_feature_fixup(void);
> >   void mmu_early_init_devtree(void);
> >   void hash__early_init_devtree(void);
> >   void radix__early_init_devtree(void);
> > diff --git a/arch/powerpc/include/asm/mmu.h b/arch/powerpc/include/asm/mmu.h
> > index 8abe8e42e045..c8eafd401fe9 100644
> > --- a/arch/powerpc/include/asm/mmu.h
> > +++ b/arch/powerpc/include/asm/mmu.h
> > @@ -401,6 +401,7 @@ extern void early_init_mmu(void);
> >   extern void early_init_mmu_secondary(void);
> >   extern void setup_initial_memory_limit(phys_addr_t first_memblock_base,
> >   				       phys_addr_t first_memblock_size);
> > +static inline void mmu_cpu_feature_fixup(void) { }
> >   static inline void mmu_early_init_devtree(void) { }
> >   static inline void pkey_early_init_devtree(void) {}
> > diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
> > index 2e67588f6f6e..1727a3abe6c1 100644
> > --- a/arch/powerpc/kernel/prom.c
> > +++ b/arch/powerpc/kernel/prom.c
> > @@ -380,6 +380,7 @@ static int __init early_init_dt_scan_cpus(unsigned long node,
> >   		check_cpu_pa_features(node);
> >   	}
> > +	mmu_cpu_feature_fixup();
> 
> can you do that call inside check_cpu_pa_features? or is it because we have
> the same issue with baremetal platforms?

Yup same issue exist on baremetal as well in case of dt_cpu_ftrs_in_use
is true. Hence calling it after the if (!dt_cpu_ftrs_in_use) code block
takes care of both pseries and baremetal platforms.

> 
> Can we also rename this to indicate we are sanitizing the feature flag based
> on kernel command line.  Something like
> 
> /* Update cpu features based on kernel command line */
> update_cpu_features();

Sure will do.

Thanks for your review.
-Mahesh.

^ permalink raw reply

* Re: [PATCH 2/3] Remove 256MB limit restriction for boot cpu paca allocation
From: kernel test robot @ 2021-10-05  5:58 UTC (permalink / raw)
  To: Sourabh Jain, mpe
  Cc: kbuild-all, linuxppc-dev, mahesh, Sourabh Jain, Mahesh Salgaonkar,
	linux-kernel, Abdul haleem, aneesh.kumar, hbathini
In-Reply-To: <20211004151142.256251-3-sourabhjain@linux.ibm.com>

[-- Attachment #1: Type: text/plain, Size: 5105 bytes --]

Hi Sourabh,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on powerpc/next]
[also build test ERROR on linux/master linus/master v5.15-rc3 next-20210922]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Sourabh-Jain/Update-crashkernel-offset-to-allow-kernel-to-boot-on-large-config-LPARs/20211004-233345
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc-buildonly-randconfig-r004-20211004 (attached as .config)
compiler: powerpc64-linux-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/563e715d022b3fab0f1791f64c3944aa34d20f04
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Sourabh-Jain/Update-crashkernel-offset-to-allow-kernel-to-boot-on-large-config-LPARs/20211004-233345
        git checkout 563e715d022b3fab0f1791f64c3944aa34d20f04
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross ARCH=powerpc 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   arch/powerpc/kernel/prom.c: In function 'early_init_dt_scan_cpus':
>> arch/powerpc/kernel/prom.c:389:17: error: implicit declaration of function 'hash__early_detect_seg_size' [-Werror=implicit-function-declaration]
     389 |                 hash__early_detect_seg_size();
         |                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~
   cc1: some warnings being treated as errors


vim +/hash__early_detect_seg_size +389 arch/powerpc/kernel/prom.c

   307	
   308	static int __init early_init_dt_scan_cpus(unsigned long node,
   309						  const char *uname, int depth,
   310						  void *data)
   311	{
   312		const char *type = of_get_flat_dt_prop(node, "device_type", NULL);
   313		const __be32 *prop;
   314		const __be32 *intserv;
   315		int i, nthreads;
   316		int len;
   317		int found = -1;
   318		int found_thread = 0;
   319	
   320		/* We are scanning "cpu" nodes only */
   321		if (type == NULL || strcmp(type, "cpu") != 0)
   322			return 0;
   323	
   324		/* Get physical cpuid */
   325		intserv = of_get_flat_dt_prop(node, "ibm,ppc-interrupt-server#s", &len);
   326		if (!intserv)
   327			intserv = of_get_flat_dt_prop(node, "reg", &len);
   328	
   329		nthreads = len / sizeof(int);
   330	
   331		/*
   332		 * Now see if any of these threads match our boot cpu.
   333		 * NOTE: This must match the parsing done in smp_setup_cpu_maps.
   334		 */
   335		for (i = 0; i < nthreads; i++) {
   336			if (be32_to_cpu(intserv[i]) ==
   337				fdt_boot_cpuid_phys(initial_boot_params)) {
   338				found = boot_cpu_count;
   339				found_thread = i;
   340			}
   341	#ifdef CONFIG_SMP
   342			/* logical cpu id is always 0 on UP kernels */
   343			boot_cpu_count++;
   344	#endif
   345		}
   346	
   347		/* Not the boot CPU */
   348		if (found < 0)
   349			return 0;
   350	
   351		DBG("boot cpu: logical %d physical %d\n", found,
   352		    be32_to_cpu(intserv[found_thread]));
   353		boot_cpuid = found;
   354	
   355		/*
   356		 * PAPR defines "logical" PVR values for cpus that
   357		 * meet various levels of the architecture:
   358		 * 0x0f000001	Architecture version 2.04
   359		 * 0x0f000002	Architecture version 2.05
   360		 * If the cpu-version property in the cpu node contains
   361		 * such a value, we call identify_cpu again with the
   362		 * logical PVR value in order to use the cpu feature
   363		 * bits appropriate for the architecture level.
   364		 *
   365		 * A POWER6 partition in "POWER6 architected" mode
   366		 * uses the 0x0f000002 PVR value; in POWER5+ mode
   367		 * it uses 0x0f000001.
   368		 *
   369		 * If we're using device tree CPU feature discovery then we don't
   370		 * support the cpu-version property, and it's the responsibility of the
   371		 * firmware/hypervisor to provide the correct feature set for the
   372		 * architecture level via the ibm,powerpc-cpu-features binding.
   373		 */
   374		if (!dt_cpu_ftrs_in_use()) {
   375			prop = of_get_flat_dt_prop(node, "cpu-version", NULL);
   376			if (prop && (be32_to_cpup(prop) & 0xff000000) == 0x0f000000)
   377				identify_cpu(0, be32_to_cpup(prop));
   378	
   379			check_cpu_feature_properties(node);
   380			check_cpu_pa_features(node);
   381		}
   382	
   383		mmu_cpu_feature_fixup();
   384		identical_pvr_fixup(node);
   385		init_mmu_slb_size(node);
   386	
   387		/* Initialize segment sizes */
   388		if (!early_radix_enabled())
 > 389			hash__early_detect_seg_size();
   390	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 43214 bytes --]

^ permalink raw reply

* Re: [PATCH v3 6/8] PCI/AER: Clear error device AER registers in aer_irq()
From: kernel test robot @ 2021-10-05  6:54 UTC (permalink / raw)
  To: Naveen Naidu, bhelgaas, ruscur, oohall
  Cc: linuxppc-dev, kbuild-all, linux-pci, llvm, linux-kernel,
	Naveen Naidu, skhan, linux-kernel-mentees
In-Reply-To: <9092514ec952ab3d60498b6f6c29671b85f9aeff.1633357368.git.naveennaidu479@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 3710 bytes --]

Hi Naveen,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on helgaas-pci/next]
[also build test WARNING on linux/master linus/master v5.15-rc3 next-20210921]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Naveen-Naidu/Fix-long-standing-AER-Error-Handling-Issues/20211004-223758
base:   https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git next
config: x86_64-randconfig-a006-20211004 (attached as .config)
compiler: clang version 14.0.0 (https://github.com/llvm/llvm-project c0039de2953d15815448b4b3c3bafb45607781e0)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/ab727f8771a49bb5f4c7bbf10dc9ba9b90a454cf
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Naveen-Naidu/Fix-long-standing-AER-Error-Handling-Issues/20211004-223758
        git checkout ab727f8771a49bb5f4c7bbf10dc9ba9b90a454cf
        # save the attached .config to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash drivers/pci/pcie/

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   drivers/pci/pcie/aer.c:822: warning: Function parameter or member 'e_dev' not described in 'add_error_device'
   drivers/pci/pcie/aer.c:822: warning: Excess function parameter 'e_info' description in 'add_error_device'
   drivers/pci/pcie/aer.c:923: warning: Function parameter or member 'e_dev' not described in 'find_source_device'
   drivers/pci/pcie/aer.c:923: warning: Excess function parameter 'e_info' description in 'find_source_device'
>> drivers/pci/pcie/aer.c:1172: warning: expecting prototype for aer_find_corr_error_source_device(). Prototype was for aer_find_uncorr_error_source_device() instead


vim +1172 drivers/pci/pcie/aer.c

  1156	
  1157	/**
  1158	 * aer_find_corr_error_source_device - find the error source which detected the uncorrected error
  1159	 * @rp: pointer to Root Port pci_dev data structure
  1160	 * @e_src: pointer to an error source
  1161	 * @e_info: including detailed error information such like id
  1162	 *
  1163	 * Return true if found.
  1164	 *
  1165	 * Process the error information received at the Root Port, set these values
  1166	 * in the aer_devices_err_info and find all the devices that are related to
  1167	 * the error.
  1168	 */
  1169	static bool aer_find_uncorr_error_source_device(struct pci_dev *rp,
  1170						struct aer_err_source *e_src,
  1171						struct aer_devices_err_info *e_info)
> 1172	{
  1173		if (e_src->status & PCI_ERR_ROOT_UNCOR_RCV) {
  1174			e_info->err_info.id = ERR_UNCOR_ID(e_src->id);
  1175	
  1176			if (e_src->status & PCI_ERR_ROOT_FATAL_RCV)
  1177				e_info->err_info.severity = AER_FATAL;
  1178			else
  1179				e_info->err_info.severity = AER_NONFATAL;
  1180	
  1181			if (e_src->status & PCI_ERR_ROOT_MULTI_UNCOR_RCV)
  1182				e_info->err_info.multi_error_valid = 1;
  1183			else
  1184				e_info->err_info.multi_error_valid = 0;
  1185	
  1186			if (!find_source_device(rp, e_info))
  1187				return false;
  1188		}
  1189	
  1190		return true;
  1191	}
  1192	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 41518 bytes --]

^ permalink raw reply

* Re: [PATCH 3/3] powerpc: Set crashkernel offset to mid of RMA region
From: Sourabh Jain @ 2021-10-05  8:14 UTC (permalink / raw)
  To: Aneesh Kumar K.V, mpe
  Cc: linuxppc-dev, Abdul haleem, linux-kernel, hbathini, mahesh
In-Reply-To: <f13e218e-4e38-4076-672f-d555d7abfc02@linux.ibm.com>


On 04/10/21 21:36, Aneesh Kumar K.V wrote:
> On 10/4/21 20:41, Sourabh Jain wrote:
>> On large config LPARs (having 192 and more cores), Linux fails to boot
>> due to insufficient memory in the first memory block. It is due to the
>> reserve crashkernel area starts at 128MB offset by default and which
>> doesn't leave enough space in the first memory block to accommodate
>> memory for other essential system resources.
>>
>> Given that the RMA region size can be 512MB or more, setting the
>> crashkernel offset to mid of RMA size will leave enough space to
>> kernel to allocate memory for other system resources in the first
>> memory block.
>>
>> Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
>> Reported-and-tested-by: Abdul haleem <abdhalee@linux.vnet.ibm.com>
>> ---
>>   arch/powerpc/kernel/rtas.c |  3 +++
>>   arch/powerpc/kexec/core.c  | 13 +++++++++----
>>   2 files changed, 12 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
>> index ff80bbad22a5..ce5e62bb4d8e 100644
>> --- a/arch/powerpc/kernel/rtas.c
>> +++ b/arch/powerpc/kernel/rtas.c
>> @@ -1235,6 +1235,9 @@ int __init early_init_dt_scan_rtas(unsigned 
>> long node,
>>       entryp = of_get_flat_dt_prop(node, "linux,rtas-entry", NULL);
>>       sizep  = of_get_flat_dt_prop(node, "rtas-size", NULL);
>>   +    if (of_get_flat_dt_prop(node, "ibm,hypertas-functions", NULL))
>> +        powerpc_firmware_features |= FW_FEATURE_LPAR;
>> +
>
> The equivalent check that we currently do more than checking 
> ibm,hypertas-functions.
>
>     if (!strcmp(uname, "rtas") || !strcmp(uname, "rtas@0")) {
>         prop = of_get_flat_dt_prop(node, "ibm,hypertas-functions",
>                        &len);
>         if (prop) {
>             powerpc_firmware_features |= FW_FEATURE_LPAR;
>             fw_hypertas_feature_init(prop, len);
>         }
>
>
> also do we expect other firmware features to be set along with 
> FW_FEATURE_LPAR?

This patch needs to move crash kernel reservation to mid point of rma 
size for LPAR in reserve_crashkernel() function. Since 
reserve_crashkernel() is called too early even before 
powerpc_firmware_features is set with FW_FEATURE_LPAR, the check for if 
(firmware_has_feature(FW_FEATURE_LPAR)) fails and hence we only need to 
make sure that we set this flag early during early_init_dt_scan_rtas().

The rest of the LPAR specific initialization isn't required at this 
point and will be still done during pseries_probe_fw_features() as usual.

Thanks,
Sourabh Jain


^ permalink raw reply

* [PATCH 1/4] perf: Add comment about current state of PERF_MEM_LVL_* namespace and remove an extra line
From: Kajol Jain @ 2021-10-05  9:18 UTC (permalink / raw)
  To: mpe, linuxppc-dev, linux-kernel, peterz, mingo, acme, jolsa,
	namhyung, ak
  Cc: mark.rutland, songliubraving, atrajeev, daniel, rnsastry,
	alexander.shishkin, kjain, ast, linux-perf-users, yao.jin, maddy,
	paulus, kan.liang

Add a comment about PERF_MEM_LVL_* namespace being depricated
to some extent in favour of added PERF_MEM_{LVLNUM_,REMOTE_,SNOOPX_}
fields.

Remove an extra line present in perf_mem__lvl_scnprintf function.

Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
---
 include/uapi/linux/perf_event.h       | 8 +++++++-
 tools/include/uapi/linux/perf_event.h | 8 +++++++-
 tools/perf/util/mem-events.c          | 1 -
 3 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index f92880a15645..e1701e9c7858 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -1241,7 +1241,13 @@ union perf_mem_data_src {
 #define PERF_MEM_OP_EXEC	0x10 /* code (execution) */
 #define PERF_MEM_OP_SHIFT	0
 
-/* memory hierarchy (memory level, hit or miss) */
+/*
+ * PERF_MEM_LVL_* namespace being depricated to some extent in the
+ * favour of newer composite PERF_MEM_{LVLNUM_,REMOTE_,SNOOPX_} fields.
+ * Supporting this namespace inorder to not break defined ABIs.
+ *
+ * memory hierarchy (memory level, hit or miss)
+ */
 #define PERF_MEM_LVL_NA		0x01  /* not available */
 #define PERF_MEM_LVL_HIT	0x02  /* hit level */
 #define PERF_MEM_LVL_MISS	0x04  /* miss level  */
diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
index f92880a15645..e1701e9c7858 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -1241,7 +1241,13 @@ union perf_mem_data_src {
 #define PERF_MEM_OP_EXEC	0x10 /* code (execution) */
 #define PERF_MEM_OP_SHIFT	0
 
-/* memory hierarchy (memory level, hit or miss) */
+/*
+ * PERF_MEM_LVL_* namespace being depricated to some extent in the
+ * favour of newer composite PERF_MEM_{LVLNUM_,REMOTE_,SNOOPX_} fields.
+ * Supporting this namespace inorder to not break defined ABIs.
+ *
+ * memory hierarchy (memory level, hit or miss)
+ */
 #define PERF_MEM_LVL_NA		0x01  /* not available */
 #define PERF_MEM_LVL_HIT	0x02  /* hit level */
 #define PERF_MEM_LVL_MISS	0x04  /* miss level  */
diff --git a/tools/perf/util/mem-events.c b/tools/perf/util/mem-events.c
index f0e75df72b80..ff7289e28192 100644
--- a/tools/perf/util/mem-events.c
+++ b/tools/perf/util/mem-events.c
@@ -320,7 +320,6 @@ int perf_mem__lvl_scnprintf(char *out, size_t sz, struct mem_info *mem_info)
 	/* already taken care of */
 	m &= ~(PERF_MEM_LVL_HIT|PERF_MEM_LVL_MISS);
 
-
 	if (mem_info && mem_info->data_src.mem_remote) {
 		strcat(out, "Remote ");
 		l += 7;
-- 
2.26.2


^ permalink raw reply related

* [PATCH 2/4] perf: Add mem_hops field in perf_mem_data_src structure
From: Kajol Jain @ 2021-10-05  9:18 UTC (permalink / raw)
  To: mpe, linuxppc-dev, linux-kernel, peterz, mingo, acme, jolsa,
	namhyung, ak
  Cc: mark.rutland, songliubraving, atrajeev, daniel, rnsastry,
	alexander.shishkin, kjain, ast, linux-perf-users, yao.jin, maddy,
	paulus, kan.liang
In-Reply-To: <20211005091837.250044-1-kjain@linux.ibm.com>

Going forward, future generation systems can have more hierarchy
within the chip/package level but currently we don't have any data source
encoding field in perf, which can be used to represent this level of data.

Add a new field called 'mem_hops' in the perf_mem_data_src structure
which can be used to represent intra-chip/package or inter-chip/off-package
details. This field is of size 3 bits where PERF_MEM_HOPS_{NA, 0..6} value
can be used to present different hop levels data.

Also add corresponding macros to define mem_hop field values
and shift value.

Currently we define macro for HOPS_0 which corresponds
to data coming from another core but same chip.

For ex: Encodings for mem_hops fields with L2 cache:

L2			- local L2
L2 | REMOTE | HOPS_0	- remote core, same chip L2

Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
---
 include/uapi/linux/perf_event.h | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index e1701e9c7858..42680563228c 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -1210,14 +1210,16 @@ union perf_mem_data_src {
 			mem_remote:1,   /* remote */
 			mem_snoopx:2,	/* snoop mode, ext */
 			mem_blk:3,	/* access blocked */
-			mem_rsvd:21;
+			mem_hops:3,	/* hop level */
+			mem_rsvd:18;
 	};
 };
 #elif defined(__BIG_ENDIAN_BITFIELD)
 union perf_mem_data_src {
 	__u64 val;
 	struct {
-		__u64	mem_rsvd:21,
+		__u64	mem_rsvd:18,
+			mem_hops:3,	/* hop level */
 			mem_blk:3,	/* access blocked */
 			mem_snoopx:2,	/* snoop mode, ext */
 			mem_remote:1,   /* remote */
@@ -1313,6 +1315,11 @@ union perf_mem_data_src {
 #define PERF_MEM_BLK_ADDR	0x04 /* address conflict */
 #define PERF_MEM_BLK_SHIFT	40
 
+/* hop level */
+#define PERF_MEM_HOPS_0		0x01 /* remote core, same chip */
+/* 2-7 available */
+#define PERF_MEM_HOPS_SHIFT	43
+
 #define PERF_MEM_S(a, s) \
 	(((__u64)PERF_MEM_##a##_##s) << PERF_MEM_##a##_SHIFT)
 
-- 
2.26.2


^ permalink raw reply related

* [PATCH 3/4] tools/perf: Add mem_hops field in perf_mem_data_src structure
From: Kajol Jain @ 2021-10-05  9:18 UTC (permalink / raw)
  To: mpe, linuxppc-dev, linux-kernel, peterz, mingo, acme, jolsa,
	namhyung, ak
  Cc: mark.rutland, songliubraving, atrajeev, daniel, rnsastry,
	alexander.shishkin, kjain, ast, linux-perf-users, yao.jin, maddy,
	paulus, kan.liang
In-Reply-To: <20211005091837.250044-1-kjain@linux.ibm.com>

Going forward, future generation systems can have more hierarchy
within the chip/package level but currently we don't have any data source
encoding field in perf, which can be used to represent this level of data.

Add a new field called 'mem_hops' in the perf_mem_data_src structure
which can be used to represent intra-chip/package or inter-chip/off-package
details. This field is of size 3 bits where PERF_MEM_HOPS_{NA, 0..6} value
can be used to present different hop levels data.

Also add corresponding macros to define mem_hop field values
and shift value.

Currently we define macro for HOPS_0 which corresponds
to data coming from another core but same chip.

Add functionality to represent mem_hop field data in
perf_mem__lvl_scnprintf function with the help of added string
array called mem_hops.

For ex: Encodings for mem_hops fields with L2 cache:

L2                      - local L2
L2 | REMOTE | HOPS_0    - remote core, same chip L2

Since with the addition of HOPS field, now remote can be used to
denote cache access from the same chip but different core, a check
is added in the c2c_decode_stats function to set mrem only when HOPS
is zero along with set remote field.

Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
---
 tools/include/uapi/linux/perf_event.h | 11 +++++++++--
 tools/perf/util/mem-events.c          | 19 ++++++++++++++++++-
 2 files changed, 27 insertions(+), 3 deletions(-)

diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
index e1701e9c7858..42680563228c 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -1210,14 +1210,16 @@ union perf_mem_data_src {
 			mem_remote:1,   /* remote */
 			mem_snoopx:2,	/* snoop mode, ext */
 			mem_blk:3,	/* access blocked */
-			mem_rsvd:21;
+			mem_hops:3,	/* hop level */
+			mem_rsvd:18;
 	};
 };
 #elif defined(__BIG_ENDIAN_BITFIELD)
 union perf_mem_data_src {
 	__u64 val;
 	struct {
-		__u64	mem_rsvd:21,
+		__u64	mem_rsvd:18,
+			mem_hops:3,	/* hop level */
 			mem_blk:3,	/* access blocked */
 			mem_snoopx:2,	/* snoop mode, ext */
 			mem_remote:1,   /* remote */
@@ -1313,6 +1315,11 @@ union perf_mem_data_src {
 #define PERF_MEM_BLK_ADDR	0x04 /* address conflict */
 #define PERF_MEM_BLK_SHIFT	40
 
+/* hop level */
+#define PERF_MEM_HOPS_0		0x01 /* remote core, same chip */
+/* 2-7 available */
+#define PERF_MEM_HOPS_SHIFT	43
+
 #define PERF_MEM_S(a, s) \
 	(((__u64)PERF_MEM_##a##_##s) << PERF_MEM_##a##_SHIFT)
 
diff --git a/tools/perf/util/mem-events.c b/tools/perf/util/mem-events.c
index ff7289e28192..585b29592a24 100644
--- a/tools/perf/util/mem-events.c
+++ b/tools/perf/util/mem-events.c
@@ -301,6 +301,16 @@ static const char * const mem_lvlnum[] = {
 	[PERF_MEM_LVLNUM_NA] = "N/A",
 };
 
+static const char * const mem_hops[] = {
+	"N/A",
+	/*
+	 * While printing, 'Remote' will be added to represent
+	 * 'Remote core, same chip' accesses as remote field need
+	 * to be set with mem_hops field.
+	 */
+	"core, same chip",
+};
+
 int perf_mem__lvl_scnprintf(char *out, size_t sz, struct mem_info *mem_info)
 {
 	size_t i, l = 0;
@@ -325,6 +335,9 @@ int perf_mem__lvl_scnprintf(char *out, size_t sz, struct mem_info *mem_info)
 		l += 7;
 	}
 
+	if (mem_info && mem_info->data_src.mem_hops)
+		l += scnprintf(out + l, sz - l, "%s ", mem_hops[mem_info->data_src.mem_hops]);
+
 	printed = 0;
 	for (i = 0; m && i < ARRAY_SIZE(mem_lvl); i++, m >>= 1) {
 		if (!(m & 0x1))
@@ -471,8 +484,12 @@ int c2c_decode_stats(struct c2c_stats *stats, struct mem_info *mi)
 	/*
 	 * Skylake might report unknown remote level via this
 	 * bit, consider it when evaluating remote HITMs.
+	 *
+	 * Incase of power, remote field can also be used to denote cache
+	 * accesses from the another core of same chip. Hence, setting
+	 * mrem only when HOPS is zero along with set remote field.
 	 */
-	bool mrem  = data_src->mem_remote;
+	bool mrem  = (data_src->mem_remote && !data_src->mem_hops);
 	int err = 0;
 
 #define HITM_INC(__f)		\
-- 
2.26.2


^ permalink raw reply related

* [PATCH 4/4] powerpc/perf: Fix data source encodings for L2.1 and L3.1 accesses
From: Kajol Jain @ 2021-10-05  9:18 UTC (permalink / raw)
  To: mpe, linuxppc-dev, linux-kernel, peterz, mingo, acme, jolsa,
	namhyung, ak
  Cc: mark.rutland, songliubraving, atrajeev, daniel, rnsastry,
	alexander.shishkin, kjain, ast, linux-perf-users, yao.jin, maddy,
	paulus, kan.liang
In-Reply-To: <20211005091837.250044-1-kjain@linux.ibm.com>

Fix the data source encodings to represent L2.1/L3.1(another core's
L2/L3 on the same chip) accesses properly for power10 and older
plaforms.

Add new macros(LEVEL/REM) which can be used to add mem_lvl_num and remote
field data inside perf_mem_data_src structure.

Result in power9 system with patch changes:

localhost:~/linux/tools/perf # ./perf mem report | grep Remote
     0.01%             1  236           Remote core, same chip L3 or L3 hit  [.] 0x0000000000002dd0                producer_consumer   [.] 0x00007fffadd4cc10        
anon               HitM          N/A                     No       N/A        0              0            
     0.01%             1  208           Remote core, same chip L3 or L3 hit  [.] 0x0000000000002dd0                producer_consumer   [.] 0x00007fff9dd33710        
anon               HitM          N/A                     No       N/A        0              0            
     0.00%             1  176           Remote core, same chip L3 or L3 hit  [.] 0x0000000000002dd0                producer_consumer   [.] 0x00007fff9b22c290        
anon               HitM          N/A                     No       N/A        0              0 

Fixes: 79e96f8f930d ("powerpc/perf: Export memory hierarchy info to user
space")
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
---
 arch/powerpc/perf/isa207-common.c | 26 +++++++++++++++++++++-----
 arch/powerpc/perf/isa207-common.h |  2 ++
 2 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/perf/isa207-common.c b/arch/powerpc/perf/isa207-common.c
index f92bf5f6b74f..7ea873ab2e6f 100644
--- a/arch/powerpc/perf/isa207-common.c
+++ b/arch/powerpc/perf/isa207-common.c
@@ -238,11 +238,27 @@ static inline u64 isa207_find_source(u64 idx, u32 sub_idx)
 		ret |= P(SNOOP, HIT);
 		break;
 	case 5:
-		ret = PH(LVL, REM_CCE1);
-		if ((sub_idx == 0) || (sub_idx == 2) || (sub_idx == 4))
-			ret |= P(SNOOP, HIT);
-		else if ((sub_idx == 1) || (sub_idx == 3) || (sub_idx == 5))
-			ret |= P(SNOOP, HITM);
+		if (cpu_has_feature(CPU_FTR_ARCH_31)) {
+			ret = REM | P(HOPS, 0);
+
+			if (sub_idx == 0 || sub_idx == 4)
+				ret |= PH(LVL, L2) | LEVEL(L2) | P(SNOOP, HIT);
+			else if (sub_idx == 1 || sub_idx == 5)
+				ret |= PH(LVL, L2) | LEVEL(L2) | P(SNOOP, HITM);
+			else if (sub_idx == 2 || sub_idx == 6)
+				ret |= PH(LVL, L3) | LEVEL(L3) | P(SNOOP, HIT);
+			else if (sub_idx == 3 || sub_idx == 7)
+				ret |= PH(LVL, L3) | LEVEL(L3) | P(SNOOP, HITM);
+		} else {
+			if (sub_idx == 0)
+				ret = PH(LVL, L2) | LEVEL(L2) | REM | P(SNOOP, HIT) | P(HOPS, 0);
+			else if (sub_idx == 1)
+				ret = PH(LVL, L2) | LEVEL(L2) | REM | P(SNOOP, HITM) | P(HOPS, 0);
+			else if (sub_idx == 2 || sub_idx == 4)
+				ret = PH(LVL, L3) | LEVEL(L3) | REM | P(SNOOP, HIT) | P(HOPS, 0);
+			else if (sub_idx == 3 || sub_idx == 5)
+				ret = PH(LVL, L3) | LEVEL(L3) | REM | P(SNOOP, HITM) | P(HOPS, 0);
+		}
 		break;
 	case 6:
 		ret = PH(LVL, REM_CCE2);
diff --git a/arch/powerpc/perf/isa207-common.h b/arch/powerpc/perf/isa207-common.h
index 4a2cbc3dc047..ff122603989b 100644
--- a/arch/powerpc/perf/isa207-common.h
+++ b/arch/powerpc/perf/isa207-common.h
@@ -273,6 +273,8 @@
 #define P(a, b)				PERF_MEM_S(a, b)
 #define PH(a, b)			(P(LVL, HIT) | P(a, b))
 #define PM(a, b)			(P(LVL, MISS) | P(a, b))
+#define LEVEL(x)			P(LVLNUM, x)
+#define REM				P(REMOTE, REMOTE)
 
 int isa207_get_constraint(u64 event, unsigned long *maskp, unsigned long *valp, u64 event_config1);
 int isa207_compute_mmcr(u64 event[], int n_ev,
-- 
2.26.2


^ permalink raw reply related

* Re: [PATCH 1/4] perf: Add comment about current state of PERF_MEM_LVL_* namespace and remove an extra line
From: kajoljain @ 2021-10-05  9:48 UTC (permalink / raw)
  To: mpe, linuxppc-dev, linux-kernel, peterz, mingo, acme, jolsa,
	namhyung, ak
  Cc: mark.rutland, songliubraving, atrajeev, daniel, rnsastry,
	alexander.shishkin, ast, linux-perf-users, yao.jin, maddy, paulus,
	kan.liang
In-Reply-To: <20211005091837.250044-1-kjain@linux.ibm.com>

Hi,
  Sorry I missed to update correct version details.

Link to the previous patch-set, where discussion related to addition of
new data source encoding field 'mem_hops' happened:

https://lkml.org/lkml/2021/9/4/37

Changelog:

- Rather then adding new macros for L2.1/L3.1 (same chip, different
core) entries as part of field lvlnum, we are introducing new field
called 'mem_hops' which can be used to get hops
level data(intra-chip/package or inter-chip/off-package details).
As suggested by Peter Zijlstra.

- Using OnChip to denote data accesses from 'another core of same chip'
  is not too clear. Update it to 'remote core, same chip' as pointed by
  Michael Ellerman.

- Update the fix patch of correcting data source encodings to use new
added field 'mem_hops'.

Thanks,
Kajol Jain


On 10/5/21 2:48 PM, Kajol Jain wrote:
> Add a comment about PERF_MEM_LVL_* namespace being depricated
> to some extent in favour of added PERF_MEM_{LVLNUM_,REMOTE_,SNOOPX_}
> fields.
> 
> Remove an extra line present in perf_mem__lvl_scnprintf function.
> 
> Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
> ---
>  include/uapi/linux/perf_event.h       | 8 +++++++-
>  tools/include/uapi/linux/perf_event.h | 8 +++++++-
>  tools/perf/util/mem-events.c          | 1 -
>  3 files changed, 14 insertions(+), 3 deletions(-)
> 
> diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
> index f92880a15645..e1701e9c7858 100644
> --- a/include/uapi/linux/perf_event.h
> +++ b/include/uapi/linux/perf_event.h
> @@ -1241,7 +1241,13 @@ union perf_mem_data_src {
>  #define PERF_MEM_OP_EXEC	0x10 /* code (execution) */
>  #define PERF_MEM_OP_SHIFT	0
>  
> -/* memory hierarchy (memory level, hit or miss) */
> +/*
> + * PERF_MEM_LVL_* namespace being depricated to some extent in the
> + * favour of newer composite PERF_MEM_{LVLNUM_,REMOTE_,SNOOPX_} fields.
> + * Supporting this namespace inorder to not break defined ABIs.
> + *
> + * memory hierarchy (memory level, hit or miss)
> + */
>  #define PERF_MEM_LVL_NA		0x01  /* not available */
>  #define PERF_MEM_LVL_HIT	0x02  /* hit level */
>  #define PERF_MEM_LVL_MISS	0x04  /* miss level  */
> diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
> index f92880a15645..e1701e9c7858 100644
> --- a/tools/include/uapi/linux/perf_event.h
> +++ b/tools/include/uapi/linux/perf_event.h
> @@ -1241,7 +1241,13 @@ union perf_mem_data_src {
>  #define PERF_MEM_OP_EXEC	0x10 /* code (execution) */
>  #define PERF_MEM_OP_SHIFT	0
>  
> -/* memory hierarchy (memory level, hit or miss) */
> +/*
> + * PERF_MEM_LVL_* namespace being depricated to some extent in the
> + * favour of newer composite PERF_MEM_{LVLNUM_,REMOTE_,SNOOPX_} fields.
> + * Supporting this namespace inorder to not break defined ABIs.
> + *
> + * memory hierarchy (memory level, hit or miss)
> + */
>  #define PERF_MEM_LVL_NA		0x01  /* not available */
>  #define PERF_MEM_LVL_HIT	0x02  /* hit level */
>  #define PERF_MEM_LVL_MISS	0x04  /* miss level  */
> diff --git a/tools/perf/util/mem-events.c b/tools/perf/util/mem-events.c
> index f0e75df72b80..ff7289e28192 100644
> --- a/tools/perf/util/mem-events.c
> +++ b/tools/perf/util/mem-events.c
> @@ -320,7 +320,6 @@ int perf_mem__lvl_scnprintf(char *out, size_t sz, struct mem_info *mem_info)
>  	/* already taken care of */
>  	m &= ~(PERF_MEM_LVL_HIT|PERF_MEM_LVL_MISS);
>  
> -
>  	if (mem_info && mem_info->data_src.mem_remote) {
>  		strcat(out, "Remote ");
>  		l += 7;
> 

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox