* [PATCH] iommu/arm-smmu: Pretty-print context fault related regs
@ 2024-05-23 21:34 Rob Clark
0 siblings, 0 replies; 10+ messages in thread
From: Rob Clark @ 2024-05-23 21:34 UTC (permalink / raw)
To: iommu
Cc: linux-arm-msm, Stephen Boyd, Rob Clark, Will Deacon, Robin Murphy,
Joerg Roedel, Jason Gunthorpe, Jerry Snitselaar,
Krzysztof Kozlowski, Rob Herring, Dmitry Baryshkov,
moderated list:ARM SMMU DRIVERS, open list:IOMMU SUBSYSTEM,
open list
From: Rob Clark <robdclark@chromium.org>
Parse out the bitfields for easier-to-read fault messages.
Signed-off-by: Rob Clark <robdclark@chromium.org>
---
Stephen was wanting easier to read fault messages.. so I typed this up.
drivers/iommu/arm/arm-smmu/arm-smmu.c | 53 +++++++++++++++++++++++++--
drivers/iommu/arm/arm-smmu/arm-smmu.h | 5 +++
2 files changed, 54 insertions(+), 4 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index c572d877b0e1..06712d73519c 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -411,6 +411,8 @@ static irqreturn_t arm_smmu_context_fault(int irq, void *dev)
unsigned long iova;
struct arm_smmu_domain *smmu_domain = dev;
struct arm_smmu_device *smmu = smmu_domain->smmu;
+ static DEFINE_RATELIMIT_STATE(rs, DEFAULT_RATELIMIT_INTERVAL,
+ DEFAULT_RATELIMIT_BURST);
int idx = smmu_domain->cfg.cbndx;
int ret;
@@ -425,10 +427,53 @@ static irqreturn_t arm_smmu_context_fault(int irq, void *dev)
ret = report_iommu_fault(&smmu_domain->domain, NULL, iova,
fsynr & ARM_SMMU_FSYNR0_WNR ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ);
- if (ret == -ENOSYS)
- dev_err_ratelimited(smmu->dev,
- "Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, cbfrsynra=0x%x, cb=%d\n",
- fsr, iova, fsynr, cbfrsynra, idx);
+ if (ret == -ENOSYS && __ratelimit(&rs)) {
+ static const struct {
+ u32 mask; const char *name;
+ } fsr_bits[] = {
+ { ARM_SMMU_FSR_MULTI, "MULTI" },
+ { ARM_SMMU_FSR_SS, "SS" },
+ { ARM_SMMU_FSR_UUT, "UUT" },
+ { ARM_SMMU_FSR_ASF, "ASF" },
+ { ARM_SMMU_FSR_TLBLKF, "TLBLKF" },
+ { ARM_SMMU_FSR_TLBMCF, "TLBMCF" },
+ { ARM_SMMU_FSR_EF, "EF" },
+ { ARM_SMMU_FSR_PF, "PF" },
+ { ARM_SMMU_FSR_AFF, "AFF" },
+ { ARM_SMMU_FSR_TF, "TF" },
+ }, fsynr0_bits[] = {
+ { ARM_SMMU_FSYNR0_WNR, "WNR" },
+ { ARM_SMMU_FSYNR0_PNU, "PNU" },
+ { ARM_SMMU_FSYNR0_IND, "IND" },
+ { ARM_SMMU_FSYNR0_NSATTR, "NSATTR" },
+ { ARM_SMMU_FSYNR0_PTWF, "PTWF" },
+ { ARM_SMMU_FSYNR0_AFR, "AFR" },
+ };
+
+ pr_err("%s %s: Unhandled context fault: fsr=0x%x (",
+ dev_driver_string(smmu->dev), dev_name(smmu->dev), fsr);
+
+ for (int i = 0, n = 0; i < ARRAY_SIZE(fsr_bits); i++) {
+ if (fsr & fsr_bits[i].mask) {
+ pr_cont("%s%s", (n > 0) ? "|" : "", fsr_bits[i].name);
+ n++;
+ }
+ }
+
+ pr_cont("), iova=0x%08lx, fsynr=0x%x (S1CBNDX=%u", iova, fsynr,
+ (fsynr >> 16) & 0xff);
+
+ for (int i = 0; i < ARRAY_SIZE(fsynr0_bits); i++) {
+ if (fsynr & fsynr0_bits[i].mask) {
+ pr_cont("|%s", fsynr0_bits[i].name);
+ }
+ }
+
+ pr_cont("|PLVL=%u), cbfrsynra=0x%x, cb=%d\n",
+ fsynr & 0x3, /* FSYNR0.PLV */
+ cbfrsynra, idx);
+
+ }
arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_FSR, fsr);
return IRQ_HANDLED;
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.h b/drivers/iommu/arm/arm-smmu/arm-smmu.h
index 836ed6799a80..3b051273718b 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.h
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.h
@@ -223,6 +223,11 @@ enum arm_smmu_cbar_type {
#define ARM_SMMU_CB_FSYNR0 0x68
#define ARM_SMMU_FSYNR0_WNR BIT(4)
+#define ARM_SMMU_FSYNR0_PNU BIT(5)
+#define ARM_SMMU_FSYNR0_IND BIT(6)
+#define ARM_SMMU_FSYNR0_NSATTR BIT(8)
+#define ARM_SMMU_FSYNR0_PTWF BIT(10)
+#define ARM_SMMU_FSYNR0_AFR BIT(11)
#define ARM_SMMU_CB_FSYNR1 0x6c
--
2.45.1
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH] iommu/arm-smmu: Pretty-print context fault related regs
@ 2024-06-04 15:01 Rob Clark
2024-06-17 10:27 ` Pranjal Shrivastava
2024-06-17 13:07 ` Robin Murphy
0 siblings, 2 replies; 10+ messages in thread
From: Rob Clark @ 2024-06-04 15:01 UTC (permalink / raw)
To: iommu
Cc: linux-arm-msm, Stephen Boyd, Rob Clark, Will Deacon, Robin Murphy,
Joerg Roedel, Jason Gunthorpe, Jerry Snitselaar,
Krzysztof Kozlowski, Dmitry Baryshkov,
moderated list:ARM SMMU DRIVERS, open list
From: Rob Clark <robdclark@chromium.org>
Parse out the bitfields for easier-to-read fault messages.
Signed-off-by: Rob Clark <robdclark@chromium.org>
---
Stephen was wanting easier to read fault messages.. so I typed this up.
Resend with the new iommu list address
drivers/iommu/arm/arm-smmu/arm-smmu.c | 53 +++++++++++++++++++++++++--
drivers/iommu/arm/arm-smmu/arm-smmu.h | 5 +++
2 files changed, 54 insertions(+), 4 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index c572d877b0e1..06712d73519c 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -411,6 +411,8 @@ static irqreturn_t arm_smmu_context_fault(int irq, void *dev)
unsigned long iova;
struct arm_smmu_domain *smmu_domain = dev;
struct arm_smmu_device *smmu = smmu_domain->smmu;
+ static DEFINE_RATELIMIT_STATE(rs, DEFAULT_RATELIMIT_INTERVAL,
+ DEFAULT_RATELIMIT_BURST);
int idx = smmu_domain->cfg.cbndx;
int ret;
@@ -425,10 +427,53 @@ static irqreturn_t arm_smmu_context_fault(int irq, void *dev)
ret = report_iommu_fault(&smmu_domain->domain, NULL, iova,
fsynr & ARM_SMMU_FSYNR0_WNR ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ);
- if (ret == -ENOSYS)
- dev_err_ratelimited(smmu->dev,
- "Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, cbfrsynra=0x%x, cb=%d\n",
- fsr, iova, fsynr, cbfrsynra, idx);
+ if (ret == -ENOSYS && __ratelimit(&rs)) {
+ static const struct {
+ u32 mask; const char *name;
+ } fsr_bits[] = {
+ { ARM_SMMU_FSR_MULTI, "MULTI" },
+ { ARM_SMMU_FSR_SS, "SS" },
+ { ARM_SMMU_FSR_UUT, "UUT" },
+ { ARM_SMMU_FSR_ASF, "ASF" },
+ { ARM_SMMU_FSR_TLBLKF, "TLBLKF" },
+ { ARM_SMMU_FSR_TLBMCF, "TLBMCF" },
+ { ARM_SMMU_FSR_EF, "EF" },
+ { ARM_SMMU_FSR_PF, "PF" },
+ { ARM_SMMU_FSR_AFF, "AFF" },
+ { ARM_SMMU_FSR_TF, "TF" },
+ }, fsynr0_bits[] = {
+ { ARM_SMMU_FSYNR0_WNR, "WNR" },
+ { ARM_SMMU_FSYNR0_PNU, "PNU" },
+ { ARM_SMMU_FSYNR0_IND, "IND" },
+ { ARM_SMMU_FSYNR0_NSATTR, "NSATTR" },
+ { ARM_SMMU_FSYNR0_PTWF, "PTWF" },
+ { ARM_SMMU_FSYNR0_AFR, "AFR" },
+ };
+
+ pr_err("%s %s: Unhandled context fault: fsr=0x%x (",
+ dev_driver_string(smmu->dev), dev_name(smmu->dev), fsr);
+
+ for (int i = 0, n = 0; i < ARRAY_SIZE(fsr_bits); i++) {
+ if (fsr & fsr_bits[i].mask) {
+ pr_cont("%s%s", (n > 0) ? "|" : "", fsr_bits[i].name);
+ n++;
+ }
+ }
+
+ pr_cont("), iova=0x%08lx, fsynr=0x%x (S1CBNDX=%u", iova, fsynr,
+ (fsynr >> 16) & 0xff);
+
+ for (int i = 0; i < ARRAY_SIZE(fsynr0_bits); i++) {
+ if (fsynr & fsynr0_bits[i].mask) {
+ pr_cont("|%s", fsynr0_bits[i].name);
+ }
+ }
+
+ pr_cont("|PLVL=%u), cbfrsynra=0x%x, cb=%d\n",
+ fsynr & 0x3, /* FSYNR0.PLV */
+ cbfrsynra, idx);
+
+ }
arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_FSR, fsr);
return IRQ_HANDLED;
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.h b/drivers/iommu/arm/arm-smmu/arm-smmu.h
index 836ed6799a80..3b051273718b 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.h
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.h
@@ -223,6 +223,11 @@ enum arm_smmu_cbar_type {
#define ARM_SMMU_CB_FSYNR0 0x68
#define ARM_SMMU_FSYNR0_WNR BIT(4)
+#define ARM_SMMU_FSYNR0_PNU BIT(5)
+#define ARM_SMMU_FSYNR0_IND BIT(6)
+#define ARM_SMMU_FSYNR0_NSATTR BIT(8)
+#define ARM_SMMU_FSYNR0_PTWF BIT(10)
+#define ARM_SMMU_FSYNR0_AFR BIT(11)
#define ARM_SMMU_CB_FSYNR1 0x6c
--
2.45.1
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH] iommu/arm-smmu: Pretty-print context fault related regs
2024-06-04 15:01 [PATCH] iommu/arm-smmu: Pretty-print context fault related regs Rob Clark
@ 2024-06-17 10:27 ` Pranjal Shrivastava
2024-06-17 13:07 ` Robin Murphy
1 sibling, 0 replies; 10+ messages in thread
From: Pranjal Shrivastava @ 2024-06-17 10:27 UTC (permalink / raw)
To: Rob Clark
Cc: iommu, linux-arm-msm, Stephen Boyd, Rob Clark, Will Deacon,
Robin Murphy, Joerg Roedel, Jason Gunthorpe, Jerry Snitselaar,
Krzysztof Kozlowski, Dmitry Baryshkov,
moderated list:ARM SMMU DRIVERS, open list
Hi Rob,
On Tue, Jun 4, 2024 at 8:32 PM Rob Clark <robdclark@gmail.com> wrote:
>
> From: Rob Clark <robdclark@chromium.org>
>
> Parse out the bitfields for easier-to-read fault messages.
>
> Signed-off-by: Rob Clark <robdclark@chromium.org>
> ---
> Stephen was wanting easier to read fault messages.. so I typed this up.
>
> Resend with the new iommu list address
>
> drivers/iommu/arm/arm-smmu/arm-smmu.c | 53 +++++++++++++++++++++++++--
> drivers/iommu/arm/arm-smmu/arm-smmu.h | 5 +++
> 2 files changed, 54 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> index c572d877b0e1..06712d73519c 100644
> --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> @@ -411,6 +411,8 @@ static irqreturn_t arm_smmu_context_fault(int irq, void *dev)
> unsigned long iova;
> struct arm_smmu_domain *smmu_domain = dev;
> struct arm_smmu_device *smmu = smmu_domain->smmu;
> + static DEFINE_RATELIMIT_STATE(rs, DEFAULT_RATELIMIT_INTERVAL,
> + DEFAULT_RATELIMIT_BURST);
> int idx = smmu_domain->cfg.cbndx;
> int ret;
>
> @@ -425,10 +427,53 @@ static irqreturn_t arm_smmu_context_fault(int irq, void *dev)
> ret = report_iommu_fault(&smmu_domain->domain, NULL, iova,
> fsynr & ARM_SMMU_FSYNR0_WNR ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ);
>
> - if (ret == -ENOSYS)
> - dev_err_ratelimited(smmu->dev,
> - "Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, cbfrsynra=0x%x, cb=%d\n",
> - fsr, iova, fsynr, cbfrsynra, idx);
> + if (ret == -ENOSYS && __ratelimit(&rs)) {
> + static const struct {
> + u32 mask; const char *name;
> + } fsr_bits[] = {
> + { ARM_SMMU_FSR_MULTI, "MULTI" },
> + { ARM_SMMU_FSR_SS, "SS" },
> + { ARM_SMMU_FSR_UUT, "UUT" },
> + { ARM_SMMU_FSR_ASF, "ASF" },
> + { ARM_SMMU_FSR_TLBLKF, "TLBLKF" },
> + { ARM_SMMU_FSR_TLBMCF, "TLBMCF" },
> + { ARM_SMMU_FSR_EF, "EF" },
> + { ARM_SMMU_FSR_PF, "PF" },
> + { ARM_SMMU_FSR_AFF, "AFF" },
> + { ARM_SMMU_FSR_TF, "TF" },
I think we are missing to log the translation scheme i.e. `Format
bits[10:9]` field of this register as per the SMMUv2 spec. Maybe add
that too?
> + }, fsynr0_bits[] = {
> + { ARM_SMMU_FSYNR0_WNR, "WNR" },
> + { ARM_SMMU_FSYNR0_PNU, "PNU" },
> + { ARM_SMMU_FSYNR0_IND, "IND" },
> + { ARM_SMMU_FSYNR0_NSATTR, "NSATTR" },
> + { ARM_SMMU_FSYNR0_PTWF, "PTWF" },
> + { ARM_SMMU_FSYNR0_AFR, "AFR" },
> + };
> +
> + pr_err("%s %s: Unhandled context fault: fsr=0x%x (",
> + dev_driver_string(smmu->dev), dev_name(smmu->dev), fsr);
> +
> + for (int i = 0, n = 0; i < ARRAY_SIZE(fsr_bits); i++) {
> + if (fsr & fsr_bits[i].mask) {
> + pr_cont("%s%s", (n > 0) ? "|" : "", fsr_bits[i].name);
> + n++;
> + }
> + }
Nit: Maybe add a line-wrap here after logging the iova?
Not trying to sound like a terminal geek, but a full log with pr_cont
could get very long (> 80 characters).
So, let's log fsr & iova in one line, fsynr and others in the next.
> +
> + pr_cont("), iova=0x%08lx, fsynr=0x%x (S1CBNDX=%u", iova, fsynr,
> + (fsynr >> 16) & 0xff);
> +
> + for (int i = 0; i < ARRAY_SIZE(fsynr0_bits); i++) {
> + if (fsynr & fsynr0_bits[i].mask) {
> + pr_cont("|%s", fsynr0_bits[i].name);
> + }
> + }
> +
> + pr_cont("|PLVL=%u), cbfrsynra=0x%x, cb=%d\n",
> + fsynr & 0x3, /* FSYNR0.PLV */
> + cbfrsynra, idx);
> +
> + }
>
> arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_FSR, fsr);
> return IRQ_HANDLED;
> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.h b/drivers/iommu/arm/arm-smmu/arm-smmu.h
> index 836ed6799a80..3b051273718b 100644
> --- a/drivers/iommu/arm/arm-smmu/arm-smmu.h
> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.h
> @@ -223,6 +223,11 @@ enum arm_smmu_cbar_type {
>
> #define ARM_SMMU_CB_FSYNR0 0x68
> #define ARM_SMMU_FSYNR0_WNR BIT(4)
> +#define ARM_SMMU_FSYNR0_PNU BIT(5)
> +#define ARM_SMMU_FSYNR0_IND BIT(6)
> +#define ARM_SMMU_FSYNR0_NSATTR BIT(8)
> +#define ARM_SMMU_FSYNR0_PTWF BIT(10)
> +#define ARM_SMMU_FSYNR0_AFR BIT(11)
>
Nit: Worth prefixing these with "CB_" i.e. "ARM_SMMU_CB_FSYNR0_* " to
avoid confusion with "GFSYNR0".
> #define ARM_SMMU_CB_FSYNR1 0x6c
>
> --
> 2.45.1
>
>
Thanks,
Pranjal
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] iommu/arm-smmu: Pretty-print context fault related regs
2024-06-04 15:01 [PATCH] iommu/arm-smmu: Pretty-print context fault related regs Rob Clark
2024-06-17 10:27 ` Pranjal Shrivastava
@ 2024-06-17 13:07 ` Robin Murphy
2024-06-17 16:18 ` Rob Clark
1 sibling, 1 reply; 10+ messages in thread
From: Robin Murphy @ 2024-06-17 13:07 UTC (permalink / raw)
To: Rob Clark, iommu
Cc: linux-arm-msm, Stephen Boyd, Rob Clark, Will Deacon, Joerg Roedel,
Jason Gunthorpe, Jerry Snitselaar, Krzysztof Kozlowski,
Dmitry Baryshkov, moderated list:ARM SMMU DRIVERS, open list,
Pranjal Shrivastava
On 04/06/2024 4:01 pm, Rob Clark wrote:
> From: Rob Clark <robdclark@chromium.org>
>
> Parse out the bitfields for easier-to-read fault messages.
>
> Signed-off-by: Rob Clark <robdclark@chromium.org>
> ---
> Stephen was wanting easier to read fault messages.. so I typed this up.
>
> Resend with the new iommu list address
>
> drivers/iommu/arm/arm-smmu/arm-smmu.c | 53 +++++++++++++++++++++++++--
> drivers/iommu/arm/arm-smmu/arm-smmu.h | 5 +++
> 2 files changed, 54 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> index c572d877b0e1..06712d73519c 100644
> --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> @@ -411,6 +411,8 @@ static irqreturn_t arm_smmu_context_fault(int irq, void *dev)
> unsigned long iova;
> struct arm_smmu_domain *smmu_domain = dev;
> struct arm_smmu_device *smmu = smmu_domain->smmu;
> + static DEFINE_RATELIMIT_STATE(rs, DEFAULT_RATELIMIT_INTERVAL,
> + DEFAULT_RATELIMIT_BURST);
> int idx = smmu_domain->cfg.cbndx;
> int ret;
>
> @@ -425,10 +427,53 @@ static irqreturn_t arm_smmu_context_fault(int irq, void *dev)
> ret = report_iommu_fault(&smmu_domain->domain, NULL, iova,
> fsynr & ARM_SMMU_FSYNR0_WNR ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ);
>
> - if (ret == -ENOSYS)
> - dev_err_ratelimited(smmu->dev,
> - "Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, cbfrsynra=0x%x, cb=%d\n",
> - fsr, iova, fsynr, cbfrsynra, idx);
> + if (ret == -ENOSYS && __ratelimit(&rs)) {
> + static const struct {
> + u32 mask; const char *name;
> + } fsr_bits[] = {
> + { ARM_SMMU_FSR_MULTI, "MULTI" },
> + { ARM_SMMU_FSR_SS, "SS" },
> + { ARM_SMMU_FSR_UUT, "UUT" },
> + { ARM_SMMU_FSR_ASF, "ASF" },
> + { ARM_SMMU_FSR_TLBLKF, "TLBLKF" },
> + { ARM_SMMU_FSR_TLBMCF, "TLBMCF" },
> + { ARM_SMMU_FSR_EF, "EF" },
> + { ARM_SMMU_FSR_PF, "PF" },
> + { ARM_SMMU_FSR_AFF, "AFF" },
> + { ARM_SMMU_FSR_TF, "TF" },
> + }, fsynr0_bits[] = {
> + { ARM_SMMU_FSYNR0_WNR, "WNR" },
> + { ARM_SMMU_FSYNR0_PNU, "PNU" },
> + { ARM_SMMU_FSYNR0_IND, "IND" },
> + { ARM_SMMU_FSYNR0_NSATTR, "NSATTR" },
> + { ARM_SMMU_FSYNR0_PTWF, "PTWF" },
> + { ARM_SMMU_FSYNR0_AFR, "AFR" },
> + };
> +
> + pr_err("%s %s: Unhandled context fault: fsr=0x%x (",
> + dev_driver_string(smmu->dev), dev_name(smmu->dev), fsr);
> +
> + for (int i = 0, n = 0; i < ARRAY_SIZE(fsr_bits); i++) {
> + if (fsr & fsr_bits[i].mask) {
> + pr_cont("%s%s", (n > 0) ? "|" : "", fsr_bits[i].name);
Given that SMMU faults have a high likelihood of correlating with other
errors, e.g. the initiating device also reporting that it got an abort
back, this much pr_cont is a recipe for an unreadable mess. Furthermore,
just imagine how "helpful" this would be when faults in two contexts are
reported by two different CPUs at the same time ;)
I'd prefer to retain the original message as-is, so there is at least
still an unambiguous "atomic" view of a fault's entire state, then
follow it with a decode more in the style of arm64's ESR logging. TBH I
also wouldn't disapprove of hiding the additional decode behind a
command-line/runtime parameter, since a fault storm can cripple a system
enough as it is, without making the interrupt handler spend even longer
printing to a potentially slow console.
> + n++;
> + }
> + }
> +
> + pr_cont("), iova=0x%08lx, fsynr=0x%x (S1CBNDX=%u", iova, fsynr,
> + (fsynr >> 16) & 0xff);
Please define all the bitfields properly (and I agree with Pranjal about
the naming).
Thanks,
Robin.
> +
> + for (int i = 0; i < ARRAY_SIZE(fsynr0_bits); i++) {
> + if (fsynr & fsynr0_bits[i].mask) {
> + pr_cont("|%s", fsynr0_bits[i].name);
> + }
> + }
> +
> + pr_cont("|PLVL=%u), cbfrsynra=0x%x, cb=%d\n",
> + fsynr & 0x3, /* FSYNR0.PLV */
> + cbfrsynra, idx);
> +
> + }
>
> arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_FSR, fsr);
> return IRQ_HANDLED;
> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.h b/drivers/iommu/arm/arm-smmu/arm-smmu.h
> index 836ed6799a80..3b051273718b 100644
> --- a/drivers/iommu/arm/arm-smmu/arm-smmu.h
> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.h
> @@ -223,6 +223,11 @@ enum arm_smmu_cbar_type {
>
> #define ARM_SMMU_CB_FSYNR0 0x68
> #define ARM_SMMU_FSYNR0_WNR BIT(4)
> +#define ARM_SMMU_FSYNR0_PNU BIT(5)
> +#define ARM_SMMU_FSYNR0_IND BIT(6)
> +#define ARM_SMMU_FSYNR0_NSATTR BIT(8)
> +#define ARM_SMMU_FSYNR0_PTWF BIT(10)
> +#define ARM_SMMU_FSYNR0_AFR BIT(11)
>
> #define ARM_SMMU_CB_FSYNR1 0x6c
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] iommu/arm-smmu: Pretty-print context fault related regs
2024-06-17 13:07 ` Robin Murphy
@ 2024-06-17 16:18 ` Rob Clark
2024-06-17 17:33 ` Robin Murphy
0 siblings, 1 reply; 10+ messages in thread
From: Rob Clark @ 2024-06-17 16:18 UTC (permalink / raw)
To: Robin Murphy
Cc: iommu, linux-arm-msm, Stephen Boyd, Rob Clark, Will Deacon,
Joerg Roedel, Jason Gunthorpe, Jerry Snitselaar,
Krzysztof Kozlowski, Dmitry Baryshkov,
moderated list:ARM SMMU DRIVERS, open list, Pranjal Shrivastava
On Mon, Jun 17, 2024 at 6:07 AM Robin Murphy <robin.murphy@arm.com> wrote:
>
> On 04/06/2024 4:01 pm, Rob Clark wrote:
> > From: Rob Clark <robdclark@chromium.org>
> >
> > Parse out the bitfields for easier-to-read fault messages.
> >
> > Signed-off-by: Rob Clark <robdclark@chromium.org>
> > ---
> > Stephen was wanting easier to read fault messages.. so I typed this up.
> >
> > Resend with the new iommu list address
> >
> > drivers/iommu/arm/arm-smmu/arm-smmu.c | 53 +++++++++++++++++++++++++--
> > drivers/iommu/arm/arm-smmu/arm-smmu.h | 5 +++
> > 2 files changed, 54 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> > index c572d877b0e1..06712d73519c 100644
> > --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
> > +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> > @@ -411,6 +411,8 @@ static irqreturn_t arm_smmu_context_fault(int irq, void *dev)
> > unsigned long iova;
> > struct arm_smmu_domain *smmu_domain = dev;
> > struct arm_smmu_device *smmu = smmu_domain->smmu;
> > + static DEFINE_RATELIMIT_STATE(rs, DEFAULT_RATELIMIT_INTERVAL,
> > + DEFAULT_RATELIMIT_BURST);
> > int idx = smmu_domain->cfg.cbndx;
> > int ret;
> >
> > @@ -425,10 +427,53 @@ static irqreturn_t arm_smmu_context_fault(int irq, void *dev)
> > ret = report_iommu_fault(&smmu_domain->domain, NULL, iova,
> > fsynr & ARM_SMMU_FSYNR0_WNR ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ);
> >
> > - if (ret == -ENOSYS)
> > - dev_err_ratelimited(smmu->dev,
> > - "Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, cbfrsynra=0x%x, cb=%d\n",
> > - fsr, iova, fsynr, cbfrsynra, idx);
> > + if (ret == -ENOSYS && __ratelimit(&rs)) {
> > + static const struct {
> > + u32 mask; const char *name;
> > + } fsr_bits[] = {
> > + { ARM_SMMU_FSR_MULTI, "MULTI" },
> > + { ARM_SMMU_FSR_SS, "SS" },
> > + { ARM_SMMU_FSR_UUT, "UUT" },
> > + { ARM_SMMU_FSR_ASF, "ASF" },
> > + { ARM_SMMU_FSR_TLBLKF, "TLBLKF" },
> > + { ARM_SMMU_FSR_TLBMCF, "TLBMCF" },
> > + { ARM_SMMU_FSR_EF, "EF" },
> > + { ARM_SMMU_FSR_PF, "PF" },
> > + { ARM_SMMU_FSR_AFF, "AFF" },
> > + { ARM_SMMU_FSR_TF, "TF" },
> > + }, fsynr0_bits[] = {
> > + { ARM_SMMU_FSYNR0_WNR, "WNR" },
> > + { ARM_SMMU_FSYNR0_PNU, "PNU" },
> > + { ARM_SMMU_FSYNR0_IND, "IND" },
> > + { ARM_SMMU_FSYNR0_NSATTR, "NSATTR" },
> > + { ARM_SMMU_FSYNR0_PTWF, "PTWF" },
> > + { ARM_SMMU_FSYNR0_AFR, "AFR" },
> > + };
> > +
> > + pr_err("%s %s: Unhandled context fault: fsr=0x%x (",
> > + dev_driver_string(smmu->dev), dev_name(smmu->dev), fsr);
> > +
> > + for (int i = 0, n = 0; i < ARRAY_SIZE(fsr_bits); i++) {
> > + if (fsr & fsr_bits[i].mask) {
> > + pr_cont("%s%s", (n > 0) ? "|" : "", fsr_bits[i].name);
>
> Given that SMMU faults have a high likelihood of correlating with other
> errors, e.g. the initiating device also reporting that it got an abort
> back, this much pr_cont is a recipe for an unreadable mess. Furthermore,
> just imagine how "helpful" this would be when faults in two contexts are
> reported by two different CPUs at the same time ;)
It looks like arm_smmu_context_fault() is only used with non-threaded
irq's. And this fallback is only used if driver doesn't register it's
own fault handler. So I don't think this will be a problem.
> I'd prefer to retain the original message as-is, so there is at least
> still an unambiguous "atomic" view of a fault's entire state, then
> follow it with a decode more in the style of arm64's ESR logging. TBH I
> also wouldn't disapprove of hiding the additional decode behind a
> command-line/runtime parameter, since a fault storm can cripple a system
> enough as it is, without making the interrupt handler spend even longer
> printing to a potentially slow console.
It _is_ ratelimited. But we could perhaps use a higher loglevel (pr_debug?)
BR,
-R
> > + n++;
> > + }
> > + }
> > +
> > + pr_cont("), iova=0x%08lx, fsynr=0x%x (S1CBNDX=%u", iova, fsynr,
> > + (fsynr >> 16) & 0xff);
>
> Please define all the bitfields properly (and I agree with Pranjal about
> the naming).
>
> Thanks,
> Robin.
>
> > +
> > + for (int i = 0; i < ARRAY_SIZE(fsynr0_bits); i++) {
> > + if (fsynr & fsynr0_bits[i].mask) {
> > + pr_cont("|%s", fsynr0_bits[i].name);
> > + }
> > + }
> > +
> > + pr_cont("|PLVL=%u), cbfrsynra=0x%x, cb=%d\n",
> > + fsynr & 0x3, /* FSYNR0.PLV */
> > + cbfrsynra, idx);
> > +
> > + }
> >
> > arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_FSR, fsr);
> > return IRQ_HANDLED;
> > diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.h b/drivers/iommu/arm/arm-smmu/arm-smmu.h
> > index 836ed6799a80..3b051273718b 100644
> > --- a/drivers/iommu/arm/arm-smmu/arm-smmu.h
> > +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.h
> > @@ -223,6 +223,11 @@ enum arm_smmu_cbar_type {
> >
> > #define ARM_SMMU_CB_FSYNR0 0x68
> > #define ARM_SMMU_FSYNR0_WNR BIT(4)
> > +#define ARM_SMMU_FSYNR0_PNU BIT(5)
> > +#define ARM_SMMU_FSYNR0_IND BIT(6)
> > +#define ARM_SMMU_FSYNR0_NSATTR BIT(8)
> > +#define ARM_SMMU_FSYNR0_PTWF BIT(10)
> > +#define ARM_SMMU_FSYNR0_AFR BIT(11)
> >
> > #define ARM_SMMU_CB_FSYNR1 0x6c
> >
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] iommu/arm-smmu: Pretty-print context fault related regs
2024-06-17 16:18 ` Rob Clark
@ 2024-06-17 17:33 ` Robin Murphy
2024-06-18 14:57 ` Rob Clark
0 siblings, 1 reply; 10+ messages in thread
From: Robin Murphy @ 2024-06-17 17:33 UTC (permalink / raw)
To: Rob Clark
Cc: iommu, linux-arm-msm, Stephen Boyd, Rob Clark, Will Deacon,
Joerg Roedel, Jason Gunthorpe, Jerry Snitselaar,
Krzysztof Kozlowski, Dmitry Baryshkov,
moderated list:ARM SMMU DRIVERS, open list, Pranjal Shrivastava
On 2024-06-17 5:18 pm, Rob Clark wrote:
> On Mon, Jun 17, 2024 at 6:07 AM Robin Murphy <robin.murphy@arm.com> wrote:
>>
>> On 04/06/2024 4:01 pm, Rob Clark wrote:
>>> From: Rob Clark <robdclark@chromium.org>
>>>
>>> Parse out the bitfields for easier-to-read fault messages.
>>>
>>> Signed-off-by: Rob Clark <robdclark@chromium.org>
>>> ---
>>> Stephen was wanting easier to read fault messages.. so I typed this up.
>>>
>>> Resend with the new iommu list address
>>>
>>> drivers/iommu/arm/arm-smmu/arm-smmu.c | 53 +++++++++++++++++++++++++--
>>> drivers/iommu/arm/arm-smmu/arm-smmu.h | 5 +++
>>> 2 files changed, 54 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
>>> index c572d877b0e1..06712d73519c 100644
>>> --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
>>> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
>>> @@ -411,6 +411,8 @@ static irqreturn_t arm_smmu_context_fault(int irq, void *dev)
>>> unsigned long iova;
>>> struct arm_smmu_domain *smmu_domain = dev;
>>> struct arm_smmu_device *smmu = smmu_domain->smmu;
>>> + static DEFINE_RATELIMIT_STATE(rs, DEFAULT_RATELIMIT_INTERVAL,
>>> + DEFAULT_RATELIMIT_BURST);
>>> int idx = smmu_domain->cfg.cbndx;
>>> int ret;
>>>
>>> @@ -425,10 +427,53 @@ static irqreturn_t arm_smmu_context_fault(int irq, void *dev)
>>> ret = report_iommu_fault(&smmu_domain->domain, NULL, iova,
>>> fsynr & ARM_SMMU_FSYNR0_WNR ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ);
>>>
>>> - if (ret == -ENOSYS)
>>> - dev_err_ratelimited(smmu->dev,
>>> - "Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, cbfrsynra=0x%x, cb=%d\n",
>>> - fsr, iova, fsynr, cbfrsynra, idx);
>>> + if (ret == -ENOSYS && __ratelimit(&rs)) {
>>> + static const struct {
>>> + u32 mask; const char *name;
>>> + } fsr_bits[] = {
>>> + { ARM_SMMU_FSR_MULTI, "MULTI" },
>>> + { ARM_SMMU_FSR_SS, "SS" },
>>> + { ARM_SMMU_FSR_UUT, "UUT" },
>>> + { ARM_SMMU_FSR_ASF, "ASF" },
>>> + { ARM_SMMU_FSR_TLBLKF, "TLBLKF" },
>>> + { ARM_SMMU_FSR_TLBMCF, "TLBMCF" },
>>> + { ARM_SMMU_FSR_EF, "EF" },
>>> + { ARM_SMMU_FSR_PF, "PF" },
>>> + { ARM_SMMU_FSR_AFF, "AFF" },
>>> + { ARM_SMMU_FSR_TF, "TF" },
>>> + }, fsynr0_bits[] = {
>>> + { ARM_SMMU_FSYNR0_WNR, "WNR" },
>>> + { ARM_SMMU_FSYNR0_PNU, "PNU" },
>>> + { ARM_SMMU_FSYNR0_IND, "IND" },
>>> + { ARM_SMMU_FSYNR0_NSATTR, "NSATTR" },
>>> + { ARM_SMMU_FSYNR0_PTWF, "PTWF" },
>>> + { ARM_SMMU_FSYNR0_AFR, "AFR" },
>>> + };
>>> +
>>> + pr_err("%s %s: Unhandled context fault: fsr=0x%x (",
>>> + dev_driver_string(smmu->dev), dev_name(smmu->dev), fsr);
>>> +
>>> + for (int i = 0, n = 0; i < ARRAY_SIZE(fsr_bits); i++) {
>>> + if (fsr & fsr_bits[i].mask) {
>>> + pr_cont("%s%s", (n > 0) ? "|" : "", fsr_bits[i].name);
>>
>> Given that SMMU faults have a high likelihood of correlating with other
>> errors, e.g. the initiating device also reporting that it got an abort
>> back, this much pr_cont is a recipe for an unreadable mess. Furthermore,
>> just imagine how "helpful" this would be when faults in two contexts are
>> reported by two different CPUs at the same time ;)
>
> It looks like arm_smmu_context_fault() is only used with non-threaded
> irq's. And this fallback is only used if driver doesn't register it's
> own fault handler. So I don't think this will be a problem.
You don't think two different IRQs can fire on two different CPUs at the
same time (or close to)? It's already bad enough when multiple CPUs
panic and one has to pick apart line-by-line interleaving of the
registers/stacktraces - imagine how much more utterly unusable that
would be with bits of different dumps randomly pr_cont'ed together onto
the same line(s)...
Even when unrelated stuff gets interleaved because other CPUs just
happen to be calling printk() at the same time for unrelated reasons
it's still annoying, and pr_cont makes a bigger mess than not.
>> I'd prefer to retain the original message as-is, so there is at least
>> still an unambiguous "atomic" view of a fault's entire state, then
>> follow it with a decode more in the style of arm64's ESR logging. TBH I
>> also wouldn't disapprove of hiding the additional decode behind a
>> command-line/runtime parameter, since a fault storm can cripple a system
>> enough as it is, without making the interrupt handler spend even longer
>> printing to a potentially slow console.
>
> It _is_ ratelimited. But we could perhaps use a higher loglevel (pr_debug?)
Yeah, I'd have no complaint with pr_debug/dev_dbg either, if that suits
your use case. True that the ratelimit may typically mitigate the
overall impact, but still in the worst case with a sufficiently slow
console and/or a sufficiently large amount to print per __ratelimit()
call, it can end up being slow enough to stay below the threshold. Don't
ask me how I know that :)
Thanks,
Robin.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] iommu/arm-smmu: Pretty-print context fault related regs
2024-06-17 17:33 ` Robin Murphy
@ 2024-06-18 14:57 ` Rob Clark
2024-06-18 15:47 ` Pranjal Shrivastava
0 siblings, 1 reply; 10+ messages in thread
From: Rob Clark @ 2024-06-18 14:57 UTC (permalink / raw)
To: Robin Murphy
Cc: iommu, linux-arm-msm, Stephen Boyd, Rob Clark, Will Deacon,
Joerg Roedel, Jason Gunthorpe, Jerry Snitselaar,
Krzysztof Kozlowski, Dmitry Baryshkov,
moderated list:ARM SMMU DRIVERS, open list, Pranjal Shrivastava
On Mon, Jun 17, 2024 at 10:33 AM Robin Murphy <robin.murphy@arm.com> wrote:
>
> On 2024-06-17 5:18 pm, Rob Clark wrote:
> > On Mon, Jun 17, 2024 at 6:07 AM Robin Murphy <robin.murphy@arm.com> wrote:
> >>
> >> On 04/06/2024 4:01 pm, Rob Clark wrote:
> >>> From: Rob Clark <robdclark@chromium.org>
> >>>
> >>> Parse out the bitfields for easier-to-read fault messages.
> >>>
> >>> Signed-off-by: Rob Clark <robdclark@chromium.org>
> >>> ---
> >>> Stephen was wanting easier to read fault messages.. so I typed this up.
> >>>
> >>> Resend with the new iommu list address
> >>>
> >>> drivers/iommu/arm/arm-smmu/arm-smmu.c | 53 +++++++++++++++++++++++++--
> >>> drivers/iommu/arm/arm-smmu/arm-smmu.h | 5 +++
> >>> 2 files changed, 54 insertions(+), 4 deletions(-)
> >>>
> >>> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> >>> index c572d877b0e1..06712d73519c 100644
> >>> --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
> >>> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> >>> @@ -411,6 +411,8 @@ static irqreturn_t arm_smmu_context_fault(int irq, void *dev)
> >>> unsigned long iova;
> >>> struct arm_smmu_domain *smmu_domain = dev;
> >>> struct arm_smmu_device *smmu = smmu_domain->smmu;
> >>> + static DEFINE_RATELIMIT_STATE(rs, DEFAULT_RATELIMIT_INTERVAL,
> >>> + DEFAULT_RATELIMIT_BURST);
> >>> int idx = smmu_domain->cfg.cbndx;
> >>> int ret;
> >>>
> >>> @@ -425,10 +427,53 @@ static irqreturn_t arm_smmu_context_fault(int irq, void *dev)
> >>> ret = report_iommu_fault(&smmu_domain->domain, NULL, iova,
> >>> fsynr & ARM_SMMU_FSYNR0_WNR ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ);
> >>>
> >>> - if (ret == -ENOSYS)
> >>> - dev_err_ratelimited(smmu->dev,
> >>> - "Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, cbfrsynra=0x%x, cb=%d\n",
> >>> - fsr, iova, fsynr, cbfrsynra, idx);
> >>> + if (ret == -ENOSYS && __ratelimit(&rs)) {
> >>> + static const struct {
> >>> + u32 mask; const char *name;
> >>> + } fsr_bits[] = {
> >>> + { ARM_SMMU_FSR_MULTI, "MULTI" },
> >>> + { ARM_SMMU_FSR_SS, "SS" },
> >>> + { ARM_SMMU_FSR_UUT, "UUT" },
> >>> + { ARM_SMMU_FSR_ASF, "ASF" },
> >>> + { ARM_SMMU_FSR_TLBLKF, "TLBLKF" },
> >>> + { ARM_SMMU_FSR_TLBMCF, "TLBMCF" },
> >>> + { ARM_SMMU_FSR_EF, "EF" },
> >>> + { ARM_SMMU_FSR_PF, "PF" },
> >>> + { ARM_SMMU_FSR_AFF, "AFF" },
> >>> + { ARM_SMMU_FSR_TF, "TF" },
> >>> + }, fsynr0_bits[] = {
> >>> + { ARM_SMMU_FSYNR0_WNR, "WNR" },
> >>> + { ARM_SMMU_FSYNR0_PNU, "PNU" },
> >>> + { ARM_SMMU_FSYNR0_IND, "IND" },
> >>> + { ARM_SMMU_FSYNR0_NSATTR, "NSATTR" },
> >>> + { ARM_SMMU_FSYNR0_PTWF, "PTWF" },
> >>> + { ARM_SMMU_FSYNR0_AFR, "AFR" },
> >>> + };
> >>> +
> >>> + pr_err("%s %s: Unhandled context fault: fsr=0x%x (",
> >>> + dev_driver_string(smmu->dev), dev_name(smmu->dev), fsr);
> >>> +
> >>> + for (int i = 0, n = 0; i < ARRAY_SIZE(fsr_bits); i++) {
> >>> + if (fsr & fsr_bits[i].mask) {
> >>> + pr_cont("%s%s", (n > 0) ? "|" : "", fsr_bits[i].name);
> >>
> >> Given that SMMU faults have a high likelihood of correlating with other
> >> errors, e.g. the initiating device also reporting that it got an abort
> >> back, this much pr_cont is a recipe for an unreadable mess. Furthermore,
> >> just imagine how "helpful" this would be when faults in two contexts are
> >> reported by two different CPUs at the same time ;)
> >
> > It looks like arm_smmu_context_fault() is only used with non-threaded
> > irq's. And this fallback is only used if driver doesn't register it's
> > own fault handler. So I don't think this will be a problem.
>
> You don't think two different IRQs can fire on two different CPUs at the
> same time (or close to)? It's already bad enough when multiple CPUs
> panic and one has to pick apart line-by-line interleaving of the
> registers/stacktraces - imagine how much more utterly unusable that
> would be with bits of different dumps randomly pr_cont'ed together onto
> the same line(s)...
_different_ irq's, yes
Anyways, the reason for pr_cont() was that there wasn't another
reasonable way to decide where separator chars were needed with a
single pr_err(). I could instead construct a string on stack and
print that in a single call, but pr_cont() seemed like the more
reasonable alternative.
BR,
-R
> Even when unrelated stuff gets interleaved because other CPUs just
> happen to be calling printk() at the same time for unrelated reasons
> it's still annoying, and pr_cont makes a bigger mess than not.
> >> I'd prefer to retain the original message as-is, so there is at least
> >> still an unambiguous "atomic" view of a fault's entire state, then
> >> follow it with a decode more in the style of arm64's ESR logging. TBH I
> >> also wouldn't disapprove of hiding the additional decode behind a
> >> command-line/runtime parameter, since a fault storm can cripple a system
> >> enough as it is, without making the interrupt handler spend even longer
> >> printing to a potentially slow console.
> >
> > It _is_ ratelimited. But we could perhaps use a higher loglevel (pr_debug?)
>
> Yeah, I'd have no complaint with pr_debug/dev_dbg either, if that suits
> your use case. True that the ratelimit may typically mitigate the
> overall impact, but still in the worst case with a sufficiently slow
> console and/or a sufficiently large amount to print per __ratelimit()
> call, it can end up being slow enough to stay below the threshold. Don't
> ask me how I know that :)
>
> Thanks,
> Robin.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] iommu/arm-smmu: Pretty-print context fault related regs
2024-06-18 14:57 ` Rob Clark
@ 2024-06-18 15:47 ` Pranjal Shrivastava
2024-06-18 16:07 ` Rob Clark
0 siblings, 1 reply; 10+ messages in thread
From: Pranjal Shrivastava @ 2024-06-18 15:47 UTC (permalink / raw)
To: Rob Clark
Cc: Robin Murphy, iommu, linux-arm-msm, Stephen Boyd, Rob Clark,
Will Deacon, Joerg Roedel, Jason Gunthorpe, Jerry Snitselaar,
Krzysztof Kozlowski, Dmitry Baryshkov,
moderated list:ARM SMMU DRIVERS, open list
On Tue, Jun 18, 2024 at 8:28 PM Rob Clark <robdclark@gmail.com> wrote:
>
> On Mon, Jun 17, 2024 at 10:33 AM Robin Murphy <robin.murphy@arm.com> wrote:
> >
> > On 2024-06-17 5:18 pm, Rob Clark wrote:
> > > On Mon, Jun 17, 2024 at 6:07 AM Robin Murphy <robin.murphy@arm.com> wrote:
> > >>
> > >> On 04/06/2024 4:01 pm, Rob Clark wrote:
> > >>> From: Rob Clark <robdclark@chromium.org>
> > >>>
> > >>> Parse out the bitfields for easier-to-read fault messages.
> > >>>
> > >>> Signed-off-by: Rob Clark <robdclark@chromium.org>
> > >>> ---
> > >>> Stephen was wanting easier to read fault messages.. so I typed this up.
> > >>>
> > >>> Resend with the new iommu list address
> > >>>
> > >>> drivers/iommu/arm/arm-smmu/arm-smmu.c | 53 +++++++++++++++++++++++++--
> > >>> drivers/iommu/arm/arm-smmu/arm-smmu.h | 5 +++
> > >>> 2 files changed, 54 insertions(+), 4 deletions(-)
> > >>>
> > >>> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> > >>> index c572d877b0e1..06712d73519c 100644
> > >>> --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
> > >>> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> > >>> @@ -411,6 +411,8 @@ static irqreturn_t arm_smmu_context_fault(int irq, void *dev)
> > >>> unsigned long iova;
> > >>> struct arm_smmu_domain *smmu_domain = dev;
> > >>> struct arm_smmu_device *smmu = smmu_domain->smmu;
> > >>> + static DEFINE_RATELIMIT_STATE(rs, DEFAULT_RATELIMIT_INTERVAL,
> > >>> + DEFAULT_RATELIMIT_BURST);
> > >>> int idx = smmu_domain->cfg.cbndx;
> > >>> int ret;
> > >>>
> > >>> @@ -425,10 +427,53 @@ static irqreturn_t arm_smmu_context_fault(int irq, void *dev)
> > >>> ret = report_iommu_fault(&smmu_domain->domain, NULL, iova,
> > >>> fsynr & ARM_SMMU_FSYNR0_WNR ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ);
> > >>>
> > >>> - if (ret == -ENOSYS)
> > >>> - dev_err_ratelimited(smmu->dev,
> > >>> - "Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, cbfrsynra=0x%x, cb=%d\n",
> > >>> - fsr, iova, fsynr, cbfrsynra, idx);
> > >>> + if (ret == -ENOSYS && __ratelimit(&rs)) {
> > >>> + static const struct {
> > >>> + u32 mask; const char *name;
> > >>> + } fsr_bits[] = {
> > >>> + { ARM_SMMU_FSR_MULTI, "MULTI" },
> > >>> + { ARM_SMMU_FSR_SS, "SS" },
> > >>> + { ARM_SMMU_FSR_UUT, "UUT" },
> > >>> + { ARM_SMMU_FSR_ASF, "ASF" },
> > >>> + { ARM_SMMU_FSR_TLBLKF, "TLBLKF" },
> > >>> + { ARM_SMMU_FSR_TLBMCF, "TLBMCF" },
> > >>> + { ARM_SMMU_FSR_EF, "EF" },
> > >>> + { ARM_SMMU_FSR_PF, "PF" },
> > >>> + { ARM_SMMU_FSR_AFF, "AFF" },
> > >>> + { ARM_SMMU_FSR_TF, "TF" },
> > >>> + }, fsynr0_bits[] = {
> > >>> + { ARM_SMMU_FSYNR0_WNR, "WNR" },
> > >>> + { ARM_SMMU_FSYNR0_PNU, "PNU" },
> > >>> + { ARM_SMMU_FSYNR0_IND, "IND" },
> > >>> + { ARM_SMMU_FSYNR0_NSATTR, "NSATTR" },
> > >>> + { ARM_SMMU_FSYNR0_PTWF, "PTWF" },
> > >>> + { ARM_SMMU_FSYNR0_AFR, "AFR" },
> > >>> + };
> > >>> +
> > >>> + pr_err("%s %s: Unhandled context fault: fsr=0x%x (",
> > >>> + dev_driver_string(smmu->dev), dev_name(smmu->dev), fsr);
> > >>> +
> > >>> + for (int i = 0, n = 0; i < ARRAY_SIZE(fsr_bits); i++) {
> > >>> + if (fsr & fsr_bits[i].mask) {
> > >>> + pr_cont("%s%s", (n > 0) ? "|" : "", fsr_bits[i].name);
> > >>
> > >> Given that SMMU faults have a high likelihood of correlating with other
> > >> errors, e.g. the initiating device also reporting that it got an abort
> > >> back, this much pr_cont is a recipe for an unreadable mess. Furthermore,
> > >> just imagine how "helpful" this would be when faults in two contexts are
> > >> reported by two different CPUs at the same time ;)
> > >
> > > It looks like arm_smmu_context_fault() is only used with non-threaded
> > > irq's. And this fallback is only used if driver doesn't register it's
> > > own fault handler. So I don't think this will be a problem.
> >
> > You don't think two different IRQs can fire on two different CPUs at the
> > same time (or close to)? It's already bad enough when multiple CPUs
> > panic and one has to pick apart line-by-line interleaving of the
> > registers/stacktraces - imagine how much more utterly unusable that
> > would be with bits of different dumps randomly pr_cont'ed together onto
> > the same line(s)...
>
> _different_ irq's, yes
>
> Anyways, the reason for pr_cont() was that there wasn't another
> reasonable way to decide where separator chars were needed with a
> single pr_err(). I could instead construct a string on stack and
> print that in a single call, but pr_cont() seemed like the more
> reasonable alternative.
>
> BR,
> -R
The string approach sounds good to me, if possible, let's break this
out into a helper function, something like `arm_smmu_log_ctx_fault`
and put it under a module parameter, I guess? Not sure if this
requires a new Kconfig option, would like Robin's opinion on this.
Thanks,
Pranjal
>
> > Even when unrelated stuff gets interleaved because other CPUs just
> > happen to be calling printk() at the same time for unrelated reasons
> > it's still annoying, and pr_cont makes a bigger mess than not.
> > >> I'd prefer to retain the original message as-is, so there is at least
> > >> still an unambiguous "atomic" view of a fault's entire state, then
> > >> follow it with a decode more in the style of arm64's ESR logging. TBH I
> > >> also wouldn't disapprove of hiding the additional decode behind a
> > >> command-line/runtime parameter, since a fault storm can cripple a system
> > >> enough as it is, without making the interrupt handler spend even longer
> > >> printing to a potentially slow console.
> > >
> > > It _is_ ratelimited. But we could perhaps use a higher loglevel (pr_debug?)
> >
> > Yeah, I'd have no complaint with pr_debug/dev_dbg either, if that suits
> > your use case. True that the ratelimit may typically mitigate the
> > overall impact, but still in the worst case with a sufficiently slow
> > console and/or a sufficiently large amount to print per __ratelimit()
> > call, it can end up being slow enough to stay below the threshold. Don't
> > ask me how I know that :)
> >
> > Thanks,
> > Robin.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] iommu/arm-smmu: Pretty-print context fault related regs
2024-06-18 15:47 ` Pranjal Shrivastava
@ 2024-06-18 16:07 ` Rob Clark
2024-06-26 15:28 ` Pranjal Shrivastava
0 siblings, 1 reply; 10+ messages in thread
From: Rob Clark @ 2024-06-18 16:07 UTC (permalink / raw)
To: Pranjal Shrivastava
Cc: Robin Murphy, iommu, linux-arm-msm, Stephen Boyd, Rob Clark,
Will Deacon, Joerg Roedel, Jason Gunthorpe, Jerry Snitselaar,
Krzysztof Kozlowski, Dmitry Baryshkov,
moderated list:ARM SMMU DRIVERS, open list
On Tue, Jun 18, 2024 at 8:47 AM Pranjal Shrivastava <praan@google.com> wrote:
>
> On Tue, Jun 18, 2024 at 8:28 PM Rob Clark <robdclark@gmail.com> wrote:
> >
> > On Mon, Jun 17, 2024 at 10:33 AM Robin Murphy <robin.murphy@arm.com> wrote:
> > >
> > > On 2024-06-17 5:18 pm, Rob Clark wrote:
> > > > On Mon, Jun 17, 2024 at 6:07 AM Robin Murphy <robin.murphy@arm.com> wrote:
> > > >>
> > > >> On 04/06/2024 4:01 pm, Rob Clark wrote:
> > > >>> From: Rob Clark <robdclark@chromium.org>
> > > >>>
> > > >>> Parse out the bitfields for easier-to-read fault messages.
> > > >>>
> > > >>> Signed-off-by: Rob Clark <robdclark@chromium.org>
> > > >>> ---
> > > >>> Stephen was wanting easier to read fault messages.. so I typed this up.
> > > >>>
> > > >>> Resend with the new iommu list address
> > > >>>
> > > >>> drivers/iommu/arm/arm-smmu/arm-smmu.c | 53 +++++++++++++++++++++++++--
> > > >>> drivers/iommu/arm/arm-smmu/arm-smmu.h | 5 +++
> > > >>> 2 files changed, 54 insertions(+), 4 deletions(-)
> > > >>>
> > > >>> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> > > >>> index c572d877b0e1..06712d73519c 100644
> > > >>> --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
> > > >>> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> > > >>> @@ -411,6 +411,8 @@ static irqreturn_t arm_smmu_context_fault(int irq, void *dev)
> > > >>> unsigned long iova;
> > > >>> struct arm_smmu_domain *smmu_domain = dev;
> > > >>> struct arm_smmu_device *smmu = smmu_domain->smmu;
> > > >>> + static DEFINE_RATELIMIT_STATE(rs, DEFAULT_RATELIMIT_INTERVAL,
> > > >>> + DEFAULT_RATELIMIT_BURST);
> > > >>> int idx = smmu_domain->cfg.cbndx;
> > > >>> int ret;
> > > >>>
> > > >>> @@ -425,10 +427,53 @@ static irqreturn_t arm_smmu_context_fault(int irq, void *dev)
> > > >>> ret = report_iommu_fault(&smmu_domain->domain, NULL, iova,
> > > >>> fsynr & ARM_SMMU_FSYNR0_WNR ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ);
> > > >>>
> > > >>> - if (ret == -ENOSYS)
> > > >>> - dev_err_ratelimited(smmu->dev,
> > > >>> - "Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, cbfrsynra=0x%x, cb=%d\n",
> > > >>> - fsr, iova, fsynr, cbfrsynra, idx);
> > > >>> + if (ret == -ENOSYS && __ratelimit(&rs)) {
> > > >>> + static const struct {
> > > >>> + u32 mask; const char *name;
> > > >>> + } fsr_bits[] = {
> > > >>> + { ARM_SMMU_FSR_MULTI, "MULTI" },
> > > >>> + { ARM_SMMU_FSR_SS, "SS" },
> > > >>> + { ARM_SMMU_FSR_UUT, "UUT" },
> > > >>> + { ARM_SMMU_FSR_ASF, "ASF" },
> > > >>> + { ARM_SMMU_FSR_TLBLKF, "TLBLKF" },
> > > >>> + { ARM_SMMU_FSR_TLBMCF, "TLBMCF" },
> > > >>> + { ARM_SMMU_FSR_EF, "EF" },
> > > >>> + { ARM_SMMU_FSR_PF, "PF" },
> > > >>> + { ARM_SMMU_FSR_AFF, "AFF" },
> > > >>> + { ARM_SMMU_FSR_TF, "TF" },
> > > >>> + }, fsynr0_bits[] = {
> > > >>> + { ARM_SMMU_FSYNR0_WNR, "WNR" },
> > > >>> + { ARM_SMMU_FSYNR0_PNU, "PNU" },
> > > >>> + { ARM_SMMU_FSYNR0_IND, "IND" },
> > > >>> + { ARM_SMMU_FSYNR0_NSATTR, "NSATTR" },
> > > >>> + { ARM_SMMU_FSYNR0_PTWF, "PTWF" },
> > > >>> + { ARM_SMMU_FSYNR0_AFR, "AFR" },
> > > >>> + };
> > > >>> +
> > > >>> + pr_err("%s %s: Unhandled context fault: fsr=0x%x (",
> > > >>> + dev_driver_string(smmu->dev), dev_name(smmu->dev), fsr);
> > > >>> +
> > > >>> + for (int i = 0, n = 0; i < ARRAY_SIZE(fsr_bits); i++) {
> > > >>> + if (fsr & fsr_bits[i].mask) {
> > > >>> + pr_cont("%s%s", (n > 0) ? "|" : "", fsr_bits[i].name);
> > > >>
> > > >> Given that SMMU faults have a high likelihood of correlating with other
> > > >> errors, e.g. the initiating device also reporting that it got an abort
> > > >> back, this much pr_cont is a recipe for an unreadable mess. Furthermore,
> > > >> just imagine how "helpful" this would be when faults in two contexts are
> > > >> reported by two different CPUs at the same time ;)
> > > >
> > > > It looks like arm_smmu_context_fault() is only used with non-threaded
> > > > irq's. And this fallback is only used if driver doesn't register it's
> > > > own fault handler. So I don't think this will be a problem.
> > >
> > > You don't think two different IRQs can fire on two different CPUs at the
> > > same time (or close to)? It's already bad enough when multiple CPUs
> > > panic and one has to pick apart line-by-line interleaving of the
> > > registers/stacktraces - imagine how much more utterly unusable that
> > > would be with bits of different dumps randomly pr_cont'ed together onto
> > > the same line(s)...
> >
> > _different_ irq's, yes
> >
> > Anyways, the reason for pr_cont() was that there wasn't another
> > reasonable way to decide where separator chars were needed with a
> > single pr_err(). I could instead construct a string on stack and
> > print that in a single call, but pr_cont() seemed like the more
> > reasonable alternative.
> >
> > BR,
> > -R
>
> The string approach sounds good to me, if possible, let's break this
> out into a helper function, something like `arm_smmu_log_ctx_fault`
> and put it under a module parameter, I guess? Not sure if this
> requires a new Kconfig option, would like Robin's opinion on this.
I did notice that qcom_smmu_context_fault() appeared recently, also
adding similar pretty-print.. but only for things with a tbu driver
(and a bit more open coded). So a helper would probably make sense.
Less sure about mod param or Kconfig, but I tend to be of the opinion
that the kernel already has too much configurability.. others may have
a different view.
BR,
-R
> Thanks,
> Pranjal
>
> >
> > > Even when unrelated stuff gets interleaved because other CPUs just
> > > happen to be calling printk() at the same time for unrelated reasons
> > > it's still annoying, and pr_cont makes a bigger mess than not.
> > > >> I'd prefer to retain the original message as-is, so there is at least
> > > >> still an unambiguous "atomic" view of a fault's entire state, then
> > > >> follow it with a decode more in the style of arm64's ESR logging. TBH I
> > > >> also wouldn't disapprove of hiding the additional decode behind a
> > > >> command-line/runtime parameter, since a fault storm can cripple a system
> > > >> enough as it is, without making the interrupt handler spend even longer
> > > >> printing to a potentially slow console.
> > > >
> > > > It _is_ ratelimited. But we could perhaps use a higher loglevel (pr_debug?)
> > >
> > > Yeah, I'd have no complaint with pr_debug/dev_dbg either, if that suits
> > > your use case. True that the ratelimit may typically mitigate the
> > > overall impact, but still in the worst case with a sufficiently slow
> > > console and/or a sufficiently large amount to print per __ratelimit()
> > > call, it can end up being slow enough to stay below the threshold. Don't
> > > ask me how I know that :)
> > >
> > > Thanks,
> > > Robin.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] iommu/arm-smmu: Pretty-print context fault related regs
2024-06-18 16:07 ` Rob Clark
@ 2024-06-26 15:28 ` Pranjal Shrivastava
0 siblings, 0 replies; 10+ messages in thread
From: Pranjal Shrivastava @ 2024-06-26 15:28 UTC (permalink / raw)
To: Rob Clark
Cc: Robin Murphy, iommu, linux-arm-msm, Stephen Boyd, Rob Clark,
Will Deacon, Joerg Roedel, Jason Gunthorpe, Jerry Snitselaar,
Krzysztof Kozlowski, Dmitry Baryshkov,
moderated list:ARM SMMU DRIVERS, open list
On Tue, Jun 18, 2024 at 9:37 PM Rob Clark <robdclark@gmail.com> wrote:
>
> On Tue, Jun 18, 2024 at 8:47 AM Pranjal Shrivastava <praan@google.com> wrote:
> >
> > On Tue, Jun 18, 2024 at 8:28 PM Rob Clark <robdclark@gmail.com> wrote:
> > >
> > > On Mon, Jun 17, 2024 at 10:33 AM Robin Murphy <robin.murphy@arm.com> wrote:
> > > >
> > > > On 2024-06-17 5:18 pm, Rob Clark wrote:
> > > > > On Mon, Jun 17, 2024 at 6:07 AM Robin Murphy <robin.murphy@arm.com> wrote:
> > > > >>
> > > > >> On 04/06/2024 4:01 pm, Rob Clark wrote:
> > > > >>> From: Rob Clark <robdclark@chromium.org>
> > > > >>>
> > > > >>> Parse out the bitfields for easier-to-read fault messages.
> > > > >>>
> > > > >>> Signed-off-by: Rob Clark <robdclark@chromium.org>
> > > > >>> ---
> > > > >>> Stephen was wanting easier to read fault messages.. so I typed this up.
> > > > >>>
> > > > >>> Resend with the new iommu list address
> > > > >>>
> > > > >>> drivers/iommu/arm/arm-smmu/arm-smmu.c | 53 +++++++++++++++++++++++++--
> > > > >>> drivers/iommu/arm/arm-smmu/arm-smmu.h | 5 +++
> > > > >>> 2 files changed, 54 insertions(+), 4 deletions(-)
> > > > >>>
> > > > >>> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> > > > >>> index c572d877b0e1..06712d73519c 100644
> > > > >>> --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
> > > > >>> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> > > > >>> @@ -411,6 +411,8 @@ static irqreturn_t arm_smmu_context_fault(int irq, void *dev)
> > > > >>> unsigned long iova;
> > > > >>> struct arm_smmu_domain *smmu_domain = dev;
> > > > >>> struct arm_smmu_device *smmu = smmu_domain->smmu;
> > > > >>> + static DEFINE_RATELIMIT_STATE(rs, DEFAULT_RATELIMIT_INTERVAL,
> > > > >>> + DEFAULT_RATELIMIT_BURST);
> > > > >>> int idx = smmu_domain->cfg.cbndx;
> > > > >>> int ret;
> > > > >>>
> > > > >>> @@ -425,10 +427,53 @@ static irqreturn_t arm_smmu_context_fault(int irq, void *dev)
> > > > >>> ret = report_iommu_fault(&smmu_domain->domain, NULL, iova,
> > > > >>> fsynr & ARM_SMMU_FSYNR0_WNR ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ);
> > > > >>>
> > > > >>> - if (ret == -ENOSYS)
> > > > >>> - dev_err_ratelimited(smmu->dev,
> > > > >>> - "Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, cbfrsynra=0x%x, cb=%d\n",
> > > > >>> - fsr, iova, fsynr, cbfrsynra, idx);
> > > > >>> + if (ret == -ENOSYS && __ratelimit(&rs)) {
> > > > >>> + static const struct {
> > > > >>> + u32 mask; const char *name;
> > > > >>> + } fsr_bits[] = {
> > > > >>> + { ARM_SMMU_FSR_MULTI, "MULTI" },
> > > > >>> + { ARM_SMMU_FSR_SS, "SS" },
> > > > >>> + { ARM_SMMU_FSR_UUT, "UUT" },
> > > > >>> + { ARM_SMMU_FSR_ASF, "ASF" },
> > > > >>> + { ARM_SMMU_FSR_TLBLKF, "TLBLKF" },
> > > > >>> + { ARM_SMMU_FSR_TLBMCF, "TLBMCF" },
> > > > >>> + { ARM_SMMU_FSR_EF, "EF" },
> > > > >>> + { ARM_SMMU_FSR_PF, "PF" },
> > > > >>> + { ARM_SMMU_FSR_AFF, "AFF" },
> > > > >>> + { ARM_SMMU_FSR_TF, "TF" },
> > > > >>> + }, fsynr0_bits[] = {
> > > > >>> + { ARM_SMMU_FSYNR0_WNR, "WNR" },
> > > > >>> + { ARM_SMMU_FSYNR0_PNU, "PNU" },
> > > > >>> + { ARM_SMMU_FSYNR0_IND, "IND" },
> > > > >>> + { ARM_SMMU_FSYNR0_NSATTR, "NSATTR" },
> > > > >>> + { ARM_SMMU_FSYNR0_PTWF, "PTWF" },
> > > > >>> + { ARM_SMMU_FSYNR0_AFR, "AFR" },
> > > > >>> + };
> > > > >>> +
> > > > >>> + pr_err("%s %s: Unhandled context fault: fsr=0x%x (",
> > > > >>> + dev_driver_string(smmu->dev), dev_name(smmu->dev), fsr);
> > > > >>> +
> > > > >>> + for (int i = 0, n = 0; i < ARRAY_SIZE(fsr_bits); i++) {
> > > > >>> + if (fsr & fsr_bits[i].mask) {
> > > > >>> + pr_cont("%s%s", (n > 0) ? "|" : "", fsr_bits[i].name);
> > > > >>
> > > > >> Given that SMMU faults have a high likelihood of correlating with other
> > > > >> errors, e.g. the initiating device also reporting that it got an abort
> > > > >> back, this much pr_cont is a recipe for an unreadable mess. Furthermore,
> > > > >> just imagine how "helpful" this would be when faults in two contexts are
> > > > >> reported by two different CPUs at the same time ;)
> > > > >
> > > > > It looks like arm_smmu_context_fault() is only used with non-threaded
> > > > > irq's. And this fallback is only used if driver doesn't register it's
> > > > > own fault handler. So I don't think this will be a problem.
> > > >
> > > > You don't think two different IRQs can fire on two different CPUs at the
> > > > same time (or close to)? It's already bad enough when multiple CPUs
> > > > panic and one has to pick apart line-by-line interleaving of the
> > > > registers/stacktraces - imagine how much more utterly unusable that
> > > > would be with bits of different dumps randomly pr_cont'ed together onto
> > > > the same line(s)...
> > >
> > > _different_ irq's, yes
> > >
> > > Anyways, the reason for pr_cont() was that there wasn't another
> > > reasonable way to decide where separator chars were needed with a
> > > single pr_err(). I could instead construct a string on stack and
> > > print that in a single call, but pr_cont() seemed like the more
> > > reasonable alternative.
> > >
> > > BR,
> > > -R
> >
> > The string approach sounds good to me, if possible, let's break this
> > out into a helper function, something like `arm_smmu_log_ctx_fault`
> > and put it under a module parameter, I guess? Not sure if this
> > requires a new Kconfig option, would like Robin's opinion on this.
>
> I did notice that qcom_smmu_context_fault() appeared recently, also
> adding similar pretty-print.. but only for things with a tbu driver
> (and a bit more open coded). So a helper would probably make sense.
>
> Less sure about mod param or Kconfig, but I tend to be of the opinion
> that the kernel already has too much configurability.. others may have
> a different view.
>
> BR,
> -R
Ping. Bringing this up again.
I think having something like a dev_dbg would work too.
Any thoughts? I'm okay with Rob's suggestion TBH.
Thanks,
Pranjal
>
> > Thanks,
> > Pranjal
> >
> > >
> > > > Even when unrelated stuff gets interleaved because other CPUs just
> > > > happen to be calling printk() at the same time for unrelated reasons
> > > > it's still annoying, and pr_cont makes a bigger mess than not.
> > > > >> I'd prefer to retain the original message as-is, so there is at least
> > > > >> still an unambiguous "atomic" view of a fault's entire state, then
> > > > >> follow it with a decode more in the style of arm64's ESR logging. TBH I
> > > > >> also wouldn't disapprove of hiding the additional decode behind a
> > > > >> command-line/runtime parameter, since a fault storm can cripple a system
> > > > >> enough as it is, without making the interrupt handler spend even longer
> > > > >> printing to a potentially slow console.
> > > > >
> > > > > It _is_ ratelimited. But we could perhaps use a higher loglevel (pr_debug?)
> > > >
> > > > Yeah, I'd have no complaint with pr_debug/dev_dbg either, if that suits
> > > > your use case. True that the ratelimit may typically mitigate the
> > > > overall impact, but still in the worst case with a sufficiently slow
> > > > console and/or a sufficiently large amount to print per __ratelimit()
> > > > call, it can end up being slow enough to stay below the threshold. Don't
> > > > ask me how I know that :)
> > > >
> > > > Thanks,
> > > > Robin.
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2024-06-26 15:29 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-06-04 15:01 [PATCH] iommu/arm-smmu: Pretty-print context fault related regs Rob Clark
2024-06-17 10:27 ` Pranjal Shrivastava
2024-06-17 13:07 ` Robin Murphy
2024-06-17 16:18 ` Rob Clark
2024-06-17 17:33 ` Robin Murphy
2024-06-18 14:57 ` Rob Clark
2024-06-18 15:47 ` Pranjal Shrivastava
2024-06-18 16:07 ` Rob Clark
2024-06-26 15:28 ` Pranjal Shrivastava
-- strict thread matches above, loose matches on Subject: below --
2024-05-23 21:34 Rob Clark
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).