[PATCH 1/2] iommu/msm: resume device after fault

* [PATCH 1/2] iommu/msm: resume device after fault
From: Rob Clark @ 2016-08-12 15:29 UTC (permalink / raw)
  To: iommu@lists.linux-foundation.org
  Cc: linux-arm-msm, Sricharan

We need to disable stall on memory access after a fault, otherwise the
device using the iommu will be perma-wedged with no access to memory.
This was causing drm/msm to be unable to recover the gpu after an iommu
fault.

Signed-off-by: Rob Clark <robdclark@gmail.com>
---
 drivers/iommu/msm_iommu.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/iommu/msm_iommu.c b/drivers/iommu/msm_iommu.c
index b09692b..f6f596f 100644
--- a/drivers/iommu/msm_iommu.c
+++ b/drivers/iommu/msm_iommu.c
@@ -628,6 +628,7 @@ irqreturn_t msm_iommu_fault_handler(int irq, void *dev_id)
 			pr_err("Interesting registers:\n");
 			print_ctx_regs(iommu->base, i);
 			SET_FSR(iommu->base, i, 0x4000000F);
+			SET_RESUME(iommu->base, i, 1);
 		}
 	}
 	__disable_clocks(iommu);
-- 
2.7.4
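
The effect of the one-line change above can be illustrated with a small userspace model. This is only a sketch under stated assumptions: the `model_ctx_bank` struct, the write-one-to-clear behaviour of the fault status register, the fault bit used, and all `model_*` names are hypothetical and do not reflect the driver's real register interface. It shows that clearing the fault status alone leaves the context bank stalled, while also writing the resume bit releases it.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_NCB     4            /* hypothetical number of context banks */
#define FSR_CLEAR_ALL 0x4000000Fu  /* value the patch writes via SET_FSR() */

/* Hypothetical software model of one context bank. */
struct model_ctx_bank {
	uint32_t fsr;   /* non-zero while a fault is latched */
	int stalled;    /* 1 while client memory accesses are stalled */
};

/* Latch a fault: set a status bit and stall the bank. */
void model_raise_fault(struct model_ctx_bank *cb)
{
	cb->fsr |= 0x2u;   /* arbitrary fault bit, chosen for this model */
	cb->stalled = 1;
}

/*
 * Model of the handler loop: clear the fault status of every faulting
 * bank (assumed write-one-to-clear) and, if 'resume' is set, release
 * the stall as SET_RESUME(..., 1) is meant to do.
 * Returns the number of banks that had a fault latched.
 */
int model_handle_faults(struct model_ctx_bank *banks, int ncb, int resume)
{
	int handled = 0;

	for (int i = 0; i < ncb; i++) {
		if (!banks[i].fsr)
			continue;
		banks[i].fsr &= ~FSR_CLEAR_ALL;
		if (resume)
			banks[i].stalled = 0;
		handled++;
	}
	return handled;
}

/* Returns 1 if any bank is still stalled, 0 otherwise. */
int model_any_stalled(const struct model_ctx_bank *banks, int ncb)
{
	for (int i = 0; i < ncb; i++)
		if (banks[i].stalled)
			return 1;
	return 0;
}
```

Calling `model_handle_faults()` with `resume` set to 0 corresponds to the behaviour before the patch in this model: the fault status is cleared but the bank stays stalled.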

* [PATCH 2/2] iommu/msm: wire up fault handling
From: Rob Clark @ 2016-08-12 15:29 UTC (permalink / raw)
  To: iommu@lists.linux-foundation.org
  Cc: linux-arm-msm, Sricharan

When things go wrong on the gpu, we can get *thousands* of faults.  The
resulting flood of pr_err() prints slowed down resuming the iommu so
much that drm/msm would think the GPU had actually hung and reset it.

Wire up the fault reporting, so that we instead get a small ratelimited
print of the fault address from drm/msm's fault handler.

Signed-off-by: Rob Clark <robdclark@gmail.com>
---
 drivers/iommu/msm_iommu.c | 16 +++++++++++-----
 drivers/iommu/msm_iommu.h |  3 +++
 2 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/msm_iommu.c b/drivers/iommu/msm_iommu.c
index f6f596f..1110b72 100644
--- a/drivers/iommu/msm_iommu.c
+++ b/drivers/iommu/msm_iommu.c
@@ -411,6 +411,7 @@ static int msm_iommu_attach_dev(struct iommu_domain *domain, struct device *dev)
 			}
 			__disable_clocks(iommu);
 			list_add(&iommu->dom_node, &priv->list_attached);
+			iommu->domain = domain;
 		}
 	}
 
@@ -614,8 +615,8 @@ irqreturn_t msm_iommu_fault_handler(int irq, void *dev_id)
 		goto fail;
 	}
 
-	pr_err("Unexpected IOMMU page fault!\n");
-	pr_err("base = %08x\n", (unsigned int)iommu->base);
+	pr_debug("Unexpected IOMMU page fault!\n");
+	pr_debug("base = %08x\n", (unsigned int)iommu->base);
 
 	ret = __enable_clocks(iommu);
 	if (ret)
@@ -624,9 +625,14 @@ irqreturn_t msm_iommu_fault_handler(int irq, void *dev_id)
 	for (i = 0; i < iommu->ncb; i++) {
 		fsr = GET_FSR(iommu->base, i);
 		if (fsr) {
-			pr_err("Fault occurred in context %d.\n", i);
-			pr_err("Interesting registers:\n");
-			print_ctx_regs(iommu->base, i);
+			int ret = report_iommu_fault(iommu->domain,
+					to_msm_priv(iommu->domain)->dev,
+					GET_FAR(iommu->base, i), 0);
+			if (ret == -ENOSYS) {
+				pr_err("Fault occurred in context %d.\n", i);
+				pr_err("Interesting registers:\n");
+				print_ctx_regs(iommu->base, i);
+			}
 			SET_FSR(iommu->base, i, 0x4000000F);
 			SET_RESUME(iommu->base, i, 1);
 		}
diff --git a/drivers/iommu/msm_iommu.h b/drivers/iommu/msm_iommu.h
index 4ca25d5..c53016c 100644
--- a/drivers/iommu/msm_iommu.h
+++ b/drivers/iommu/msm_iommu.h
@@ -56,6 +56,8 @@
  * dom_node:	list head for domain
  * ctx_list:	list of 'struct msm_iommu_ctx_dev'
  * context_map: Bitmap to track allocated context banks
+ * domain:	iommu domain that this iommu dev is a member of,
+ * 		ie. whose msm_priv::list_attached are we on?
  */
 struct msm_iommu_dev {
 	void __iomem *base;
@@ -68,6 +70,7 @@ struct msm_iommu_dev {
 	struct list_head dom_node;
 	struct list_head ctx_list;
 	DECLARE_BITMAP(context_map, IOMMU_MAX_CBS);
+	struct iommu_domain *domain;
 };
 
 /**
-- 
2.7.4
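
The dispatch-with-fallback pattern in the patch above can be sketched in plain userspace C. The sketch assumes only what the patch relies on: reporting a fault returns -ENOSYS when the domain has no handler installed, and the verbose register dump is kept for that case alone. The `model_*` types and functions and the counters are hypothetical stand-ins, not the kernel's `struct iommu_domain` or `report_iommu_fault()`.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct model_domain;

/* Hypothetical handler type, loosely shaped like a domain fault handler. */
typedef int (*model_fault_handler_t)(struct model_domain *dom,
				     unsigned long iova, int flags,
				     void *token);

/* Hypothetical stand-in for an iommu domain with an optional handler. */
struct model_domain {
	model_fault_handler_t handler;
	void *token;
};

/* Counters used by the example handler and the fallback path. */
struct model_stats {
	unsigned long reported;      /* faults passed to a handler */
	unsigned long verbose_dumps; /* faults that hit the pr_err fallback */
	unsigned long last_iova;     /* most recent address seen by the handler */
};

/* Report a fault; -ENOSYS signals that nobody is listening. */
int model_report_fault(struct model_domain *dom, unsigned long iova, int flags)
{
	if (!dom->handler)
		return -ENOSYS;
	return dom->handler(dom, iova, flags, dom->token);
}

/* Example handler: record the fault quietly and claim it. */
int model_count_fault(struct model_domain *dom, unsigned long iova,
		      int flags, void *token)
{
	struct model_stats *stats = token;

	(void)dom;
	(void)flags;
	stats->reported++;
	stats->last_iova = iova;
	return 0;
}

/* Per-context part of the interrupt path: dump only without a handler. */
void model_fault_irq(struct model_domain *dom, unsigned long far,
		     struct model_stats *stats)
{
	int ret = model_report_fault(dom, far, 0);

	if (ret == -ENOSYS)
		stats->verbose_dumps++; /* stands in for the pr_err() dump */
}
```

With a handler installed, the model takes the quiet path and the fallback counter stays untouched, which is the behaviour the commit message asks for.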

* RE: [PATCH 1/2] iommu/msm: resume device after fault
From: Sricharan @ 2016-08-12 15:50 UTC (permalink / raw)
  To: 'Rob Clark', iommu; +Cc: linux-arm-msm

Hi,

>We need to disable stall on memory access after a fault, otherwise the
>device using the iommu will be perma-wedged with no access to memory.
>This was causing drm/msm to be unable to recover the gpu after an iommu
>fault.
>
>Signed-off-by: Rob Clark <robdclark@gmail.com>
>---
> drivers/iommu/msm_iommu.c | 1 +
> 1 file changed, 1 insertion(+)
>
>diff --git a/drivers/iommu/msm_iommu.c b/drivers/iommu/msm_iommu.c
>index b09692b..f6f596f 100644
>--- a/drivers/iommu/msm_iommu.c
>+++ b/drivers/iommu/msm_iommu.c
>@@ -628,6 +628,7 @@ irqreturn_t msm_iommu_fault_handler(int irq, void *dev_id)
> 			pr_err("Interesting registers:\n");
> 			print_ctx_regs(iommu->base, i);
> 			SET_FSR(iommu->base, i, 0x4000000F);
>+			SET_RESUME(iommu->base, i, 1);

    Acked-by:  sricharan@codeaurora.org

Regards,
Sricharan

* RE: [PATCH 2/2] iommu/msm: wire up fault handling
From: Sricharan @ 2016-08-12 16:17 UTC (permalink / raw)
  To: 'Rob Clark', iommu@lists.linux-foundation.org
  Cc: linux-arm-msm

Hi,

>When things go wrong on the gpu, we can get *thousands* of faults.  With
>so many pr_err() prints, which were slowing down resuming the iommu,
>drm/msm would think the GPU had actually hung and reset it.
>
>Wire up the fault reporting, so instead we get a small ratelimited print
>of the fault address from drm/msm's fault handler instead.
>
>Signed-off-by: Rob Clark <robdclark@gmail.com>
>---
> drivers/iommu/msm_iommu.c | 16 +++++++++++-----
> drivers/iommu/msm_iommu.h |  3 +++
> 2 files changed, 14 insertions(+), 5 deletions(-)
>
>diff --git a/drivers/iommu/msm_iommu.c b/drivers/iommu/msm_iommu.c
>index f6f596f..1110b72 100644
>--- a/drivers/iommu/msm_iommu.c
>+++ b/drivers/iommu/msm_iommu.c
>@@ -411,6 +411,7 @@ static int msm_iommu_attach_dev(struct iommu_domain *domain, struct device *dev)
> 			}
> 			__disable_clocks(iommu);
> 			list_add(&iommu->dom_node, &priv->list_attached);
>+			iommu->domain = domain;
> 		}
> 	}
>
>@@ -614,8 +615,8 @@ irqreturn_t msm_iommu_fault_handler(int irq, void *dev_id)
> 		goto fail;
> 	}
>
>-	pr_err("Unexpected IOMMU page fault!\n");
>-	pr_err("base = %08x\n", (unsigned int)iommu->base);
>+	pr_debug("Unexpected IOMMU page fault!\n");

              So I was just thinking whether it's better to have only this as a
              ratelimited print, for global faults? Otherwise,
                Reviewed-by: sricharan@codeaurora.org
 
Regards,
 Sricharan
  

* Re: [PATCH 2/2] iommu/msm: wire up fault handling
From: Rob Clark @ 2016-08-12 16:39 UTC (permalink / raw)
  To: Sricharan; +Cc: iommu@lists.linux-foundation.org, linux-arm-msm

On Fri, Aug 12, 2016 at 12:17 PM, Sricharan <sricharan@codeaurora.org> wrote:
> Hi,
>
>>When things go wrong on the gpu, we can get *thousands* of faults.  With
>>so many pr_err() prints, which were slowing down resuming the iommu,
>>drm/msm would think the GPU had actually hung and reset it.
>>
>>Wire up the fault reporting, so instead we get a small ratelimited print
>>of the fault address from drm/msm's fault handler instead.
>>
>>Signed-off-by: Rob Clark <robdclark@gmail.com>
>>---
>> drivers/iommu/msm_iommu.c | 16 +++++++++++-----
>> drivers/iommu/msm_iommu.h |  3 +++
>> 2 files changed, 14 insertions(+), 5 deletions(-)
>>
>>diff --git a/drivers/iommu/msm_iommu.c b/drivers/iommu/msm_iommu.c
>>index f6f596f..1110b72 100644
>>--- a/drivers/iommu/msm_iommu.c
>>+++ b/drivers/iommu/msm_iommu.c
>>@@ -411,6 +411,7 @@ static int msm_iommu_attach_dev(struct iommu_domain *domain, struct device *dev)
>>                       }
>>                       __disable_clocks(iommu);
>>                       list_add(&iommu->dom_node, &priv->list_attached);
>>+                      iommu->domain = domain;
>>               }
>>       }
>>
>>@@ -614,8 +615,8 @@ irqreturn_t msm_iommu_fault_handler(int irq, void *dev_id)
>>               goto fail;
>>       }
>>
>>-      pr_err("Unexpected IOMMU page fault!\n");
>>-      pr_err("base = %08x\n", (unsigned int)iommu->base);
>>+      pr_debug("Unexpected IOMMU page fault!\n");
>
>               So was just thinking if its better to have only this as a ratelimited print,
>               for global faults ?, otherwise

It is possibly a good idea to ratelimit the pr_err prints that are
emitted when there is no fault handler installed.  In the case where
there is a handler, though, I don't think we should print anything
(at least not unless DEBUG is defined).

If we can actually resume the faulting memory transaction, then we
could use this to implement virtual memory for the GPU, like the HMM
stuff, in order to use malloc'd memory with the gpu without having to
pin it.

(I know we can resume future memory transactions, but I'm not sure
whether we can update the iommu page tables and resume the transaction
that triggered the fault.)

BR,
-R

>                 Reviewed-by: sricharan@codeaurora.org
>
> Regards,
>  Sricharan
>
>
>
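
The ratelimiting discussed in this reply can be sketched with a simple burst-per-interval limiter. In the kernel one would normally reach for the existing helpers such as `pr_err_ratelimited()`; the code below is only a userspace model of the idea, and the `model_ratelimit` struct, its fields, and the tick-based clock are assumptions made for illustration.

```c
#include <assert.h>

/* Hypothetical limiter state: allow 'burst' messages per 'interval' ticks. */
struct model_ratelimit {
	unsigned long interval; /* window length in ticks */
	unsigned int burst;     /* messages allowed per window */
	unsigned long begin;    /* tick at which the current window started */
	unsigned int printed;   /* messages allowed in the current window */
	unsigned int missed;    /* messages suppressed so far */
};

/*
 * Returns 1 if a message may be printed at tick 'now', 0 if it should
 * be suppressed. A new window opens once 'interval' ticks have passed.
 */
int model_ratelimit_ok(struct model_ratelimit *rl, unsigned long now)
{
	if (now - rl->begin >= rl->interval) {
		rl->begin = now;
		rl->printed = 0;
	}
	if (rl->printed < rl->burst) {
		rl->printed++;
		return 1;
	}
	rl->missed++;
	return 0;
}
```

A fault path without an installed handler could consult such a limiter before each verbose dump, so that a burst of thousands of faults produces only a handful of prints per window.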
