RE: [PATCH 3/4] iommu/arm-smmu: Disable stalling faults for all endpoints

All of lore.kernel.org
 help / color / mirror / Atom feed

From: "Sricharan" <sricharan@codeaurora.org>
To: 'Jordan Crouse' <jcrouse@codeaurora.org>,
	'Rob Clark' <robdclark@gmail.com>
Cc: 'linux-arm-msm' <linux-arm-msm@vger.kernel.org>,
	iommu@lists.linux-foundation.org,
	'Will Deacon' <will.deacon@arm.com>,
	linux-arm-kernel@lists.infradead.org
Subject: RE: [PATCH 3/4] iommu/arm-smmu: Disable stalling faults for all endpoints
Date: Sat, 10 Dec 2016 21:14:42 +0530	[thread overview]
Message-ID: <000301d252fc$5b19a810$114cf830$@codeaurora.org> (raw)
In-Reply-To: <20161207000025.GB21862@jcrouse-lnx.qualcomm.com>

Hi Rob,

>> > Although it's not really Linux's job to save hardware integrators from
>> > their own misfortune, it *is* our job to stop userspace (e.g. VFIO
>> > clients) from hosing the system for everybody else, even if they might
>> > already be required to have elevated privileges.
>> >
>> > Given that the fault handling code currently executes entirely in IRQ
>> > context, there is nothing that can sensibly be done to recover from
>> > things like page faults anyway, so let's rip this code out for now and
>> > avoid the potential for deadlock.
>>
>> Hi Will,
>>
>> so, I'd like to re-introduce this feature, I *guess* as some sort of
>> opt-in quirk (ie. disabled by default unless something in DT tells you
>> otherwise??  But I'm open to suggestions.  I'm not entirely sure what
>> hw was having problems due to this feature.)
>>
>> On newer snapdragon devices we are using arm-smmu for the GPU, and
>> halting the GPU so the driver's fault handler can dump some GPU state
>> on faults is enormously helpful for debugging and tracking down where
>> in the gpu cmdstream the fault was triggered.  In addition, we will
>> eventually want the ability to update pagetables from fault handler
>> and resuming the faulting transition.
>>
>> Some additional comments below..
>>
>> > Cc: <stable@vger.kernel.org>
>> > Reported-by: Matt Evans <matt.evans@arm.com>
>> > Signed-off-by: Will Deacon <will.deacon@arm.com>
>> > ---
>> >  drivers/iommu/arm-smmu.c | 34 +++++++---------------------------
>> >  1 file changed, 7 insertions(+), 27 deletions(-)
>> >
>> > diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
>> > index 4f49fe29f202..2db74ebc3240 100644
>> > --- a/drivers/iommu/arm-smmu.c
>> > +++ b/drivers/iommu/arm-smmu.c
>> > @@ -686,8 +686,7 @@ static struct iommu_gather_ops arm_smmu_gather_ops = {
>> >
>> >  static irqreturn_t arm_smmu_context_fault(int irq, void *dev)
>> >  {
>> > -       int flags, ret;
>> > -       u32 fsr, fsynr, resume;
>> > +       u32 fsr, fsynr;
>> >         unsigned long iova;
>> >         struct iommu_domain *domain = dev;
>> >         struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>> > @@ -701,34 +700,15 @@ static irqreturn_t arm_smmu_context_fault(int irq, void *dev)
>> >         if (!(fsr & FSR_FAULT))
>> >                 return IRQ_NONE;
>> >
>> > -       if (fsr & FSR_IGN)
>> > -               dev_err_ratelimited(smmu->dev,
>> > -                                   "Unexpected context fault (fsr 0x%x)\n",
>> > -                                   fsr);
>> > -
>> >         fsynr = readl_relaxed(cb_base + ARM_SMMU_CB_FSYNR0);
>> > -       flags = fsynr & FSYNR0_WNR ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ;
>> > -
>> >         iova = readq_relaxed(cb_base + ARM_SMMU_CB_FAR);
>> > -       if (!report_iommu_fault(domain, smmu->dev, iova, flags)) {
>> > -               ret = IRQ_HANDLED;
>> > -               resume = RESUME_RETRY;
>> > -       } else {
>> > -               dev_err_ratelimited(smmu->dev,
>> > -                   "Unhandled context fault: iova=0x%08lx, fsynr=0x%x, cb=%d\n",
>> > -                   iova, fsynr, cfg->cbndx);
>>
>> I would like to decouple this dev_err_ratelimit() print from the
>> RESUME_RETRY vs RESUME_TERMINATE behaviour.  I need the ability to
>> indicate by return from my fault handler whether to resume or
>> terminate.  But I already have my own ratelimted prints and would
>> prefer not to spam dmesg twice.
>>
>> I'm thinking about report_iommu_fault() returning:
>>
>>   0 => RESUME_RETRY
>>   -EFAULT => RESUME_TERMINATE but don't print
>>   anything else (or specifically -ENOSYS?) => RESUME_TERMINATE and print
>>
>> thoughts?
>>
>> > -               ret = IRQ_NONE;
>> > -               resume = RESUME_TERMINATE;
>> > -       }
>> > -
>> > -       /* Clear the faulting FSR */
>> > -       writel(fsr, cb_base + ARM_SMMU_CB_FSR);
>> >
>> > -       /* Retry or terminate any stalled transactions */
>> > -       if (fsr & FSR_SS)
>> > -               writel_relaxed(resume, cb_base + ARM_SMMU_CB_RESUME);
>>
>> This might be a bug in qcom's implementation of the smmu spec, but
>> seems like we don't have SS bit set, yet we still require RESUME reg
>> to be written, otherwise gpu is perma-wedged.  Maybe topic for a
>> separate quirk?  I'm not sure if writing RESUME reg on other hw when
>> SS bit is not set is likely to cause problems?  If not I suppose we
>> could just unconditionally write it.
>>
>> Anyways, I'm not super-familiar w/ arm-smmu so suggestions welcome..
>> in between debugging freedreno I'll try to put together some patches.
>
>From what I can tell we need SCTLR_CFCFG to make the stall happen otherwise
>the operation just gets terminated immediately and *then* we get notification
>but by then the system keeps going.
>

Yes thats right, SCTLR_CFCFG was removed as a part of this patch.

>I think SCTLR_HUPCF helps control that behavior (i.e. we don't go off faulting
>through eternity) but I don't know how it works.
>
>From my very unlearned understanding I think we do want to set CFCFG and then
>stall and let the interrupt handler decide to retry/terminate.

Yes, infact i was thinking of adding this as a new patch, but now that this was
a reverted one. As you said, it would be good to know  the hw which was having
problem with this and probably having this has a quirk otherwise.

Also i see that FSR_SS is implemented by the qcom  and the 
downstream code uses it.

Regards,
  Sricharan

WARNING: multiple messages have this Message-ID (diff)

From: sricharan@codeaurora.org (Sricharan)
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH 3/4] iommu/arm-smmu: Disable stalling faults for all endpoints
Date: Sat, 10 Dec 2016 21:14:42 +0530	[thread overview]
Message-ID: <000301d252fc$5b19a810$114cf830$@codeaurora.org> (raw)
In-Reply-To: <20161207000025.GB21862@jcrouse-lnx.qualcomm.com>

Hi Rob,

>> > Although it's not really Linux's job to save hardware integrators from
>> > their own misfortune, it *is* our job to stop userspace (e.g. VFIO
>> > clients) from hosing the system for everybody else, even if they might
>> > already be required to have elevated privileges.
>> >
>> > Given that the fault handling code currently executes entirely in IRQ
>> > context, there is nothing that can sensibly be done to recover from
>> > things like page faults anyway, so let's rip this code out for now and
>> > avoid the potential for deadlock.
>>
>> Hi Will,
>>
>> so, I'd like to re-introduce this feature, I *guess* as some sort of
>> opt-in quirk (ie. disabled by default unless something in DT tells you
>> otherwise??  But I'm open to suggestions.  I'm not entirely sure what
>> hw was having problems due to this feature.)
>>
>> On newer snapdragon devices we are using arm-smmu for the GPU, and
>> halting the GPU so the driver's fault handler can dump some GPU state
>> on faults is enormously helpful for debugging and tracking down where
>> in the gpu cmdstream the fault was triggered.  In addition, we will
>> eventually want the ability to update pagetables from fault handler
>> and resuming the faulting transition.
>>
>> Some additional comments below..
>>
>> > Cc: <stable@vger.kernel.org>
>> > Reported-by: Matt Evans <matt.evans@arm.com>
>> > Signed-off-by: Will Deacon <will.deacon@arm.com>
>> > ---
>> >  drivers/iommu/arm-smmu.c | 34 +++++++---------------------------
>> >  1 file changed, 7 insertions(+), 27 deletions(-)
>> >
>> > diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
>> > index 4f49fe29f202..2db74ebc3240 100644
>> > --- a/drivers/iommu/arm-smmu.c
>> > +++ b/drivers/iommu/arm-smmu.c
>> > @@ -686,8 +686,7 @@ static struct iommu_gather_ops arm_smmu_gather_ops = {
>> >
>> >  static irqreturn_t arm_smmu_context_fault(int irq, void *dev)
>> >  {
>> > -       int flags, ret;
>> > -       u32 fsr, fsynr, resume;
>> > +       u32 fsr, fsynr;
>> >         unsigned long iova;
>> >         struct iommu_domain *domain = dev;
>> >         struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>> > @@ -701,34 +700,15 @@ static irqreturn_t arm_smmu_context_fault(int irq, void *dev)
>> >         if (!(fsr & FSR_FAULT))
>> >                 return IRQ_NONE;
>> >
>> > -       if (fsr & FSR_IGN)
>> > -               dev_err_ratelimited(smmu->dev,
>> > -                                   "Unexpected context fault (fsr 0x%x)\n",
>> > -                                   fsr);
>> > -
>> >         fsynr = readl_relaxed(cb_base + ARM_SMMU_CB_FSYNR0);
>> > -       flags = fsynr & FSYNR0_WNR ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ;
>> > -
>> >         iova = readq_relaxed(cb_base + ARM_SMMU_CB_FAR);
>> > -       if (!report_iommu_fault(domain, smmu->dev, iova, flags)) {
>> > -               ret = IRQ_HANDLED;
>> > -               resume = RESUME_RETRY;
>> > -       } else {
>> > -               dev_err_ratelimited(smmu->dev,
>> > -                   "Unhandled context fault: iova=0x%08lx, fsynr=0x%x, cb=%d\n",
>> > -                   iova, fsynr, cfg->cbndx);
>>
>> I would like to decouple this dev_err_ratelimit() print from the
>> RESUME_RETRY vs RESUME_TERMINATE behaviour.  I need the ability to
>> indicate by return from my fault handler whether to resume or
>> terminate.  But I already have my own ratelimted prints and would
>> prefer not to spam dmesg twice.
>>
>> I'm thinking about report_iommu_fault() returning:
>>
>>   0 => RESUME_RETRY
>>   -EFAULT => RESUME_TERMINATE but don't print
>>   anything else (or specifically -ENOSYS?) => RESUME_TERMINATE and print
>>
>> thoughts?
>>
>> > -               ret = IRQ_NONE;
>> > -               resume = RESUME_TERMINATE;
>> > -       }
>> > -
>> > -       /* Clear the faulting FSR */
>> > -       writel(fsr, cb_base + ARM_SMMU_CB_FSR);
>> >
>> > -       /* Retry or terminate any stalled transactions */
>> > -       if (fsr & FSR_SS)
>> > -               writel_relaxed(resume, cb_base + ARM_SMMU_CB_RESUME);
>>
>> This might be a bug in qcom's implementation of the smmu spec, but
>> seems like we don't have SS bit set, yet we still require RESUME reg
>> to be written, otherwise gpu is perma-wedged.  Maybe topic for a
>> separate quirk?  I'm not sure if writing RESUME reg on other hw when
>> SS bit is not set is likely to cause problems?  If not I suppose we
>> could just unconditionally write it.
>>
>> Anyways, I'm not super-familiar w/ arm-smmu so suggestions welcome..
>> in between debugging freedreno I'll try to put together some patches.
>
>From what I can tell we need SCTLR_CFCFG to make the stall happen otherwise
>the operation just gets terminated immediately and *then* we get notification
>but by then the system keeps going.
>

Yes thats right, SCTLR_CFCFG was removed as a part of this patch.

>I think SCTLR_HUPCF helps control that behavior (i.e. we don't go off faulting
>through eternity) but I don't know how it works.
>
>From my very unlearned understanding I think we do want to set CFCFG and then
>stall and let the interrupt handler decide to retry/terminate.

Yes, infact i was thinking of adding this as a new patch, but now that this was
a reverted one. As you said, it would be good to know  the hw which was having
problem with this and probably having this has a quirk otherwise.

Also i see that FSR_SS is implemented by the qcom  and the 
downstream code uses it.

Regards,
  Sricharan

next prev parent reply	other threads:[~2016-12-10 15:44 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-08-18 13:05 [PATCH 0/4] iommu/arm-smmu: Fixes for 4.8 Will Deacon
2016-08-18 13:05 ` Will Deacon
2016-08-18 13:05 ` [PATCH 1/4] iommu/io-pgtable-arm-v7s: Fix attributes when splitting blocks Will Deacon
2016-08-18 13:05   ` Will Deacon
2016-08-18 13:05   ` Will Deacon
2016-08-18 13:05 ` [PATCH 2/4] iommu/arm-smmu: Fix CMDQ error handling Will Deacon
2016-08-18 13:05   ` Will Deacon
2016-08-18 13:05   ` Will Deacon
2016-08-18 13:05 ` [PATCH 3/4] iommu/arm-smmu: Disable stalling faults for all endpoints Will Deacon
2016-08-18 13:05   ` Will Deacon
2016-08-18 13:05   ` Will Deacon
     [not found]   ` <1471525542-14969-4-git-send-email-will.deacon-5wv7dgnIgG8@public.gmane.org>
2016-12-06 23:30     ` Rob Clark
2016-12-06 23:30       ` Rob Clark
2016-12-07  0:00       ` Jordan Crouse
2016-12-07  0:00         ` Jordan Crouse
2016-12-10 15:44         ` Sricharan [this message]
2016-12-10 15:44           ` Sricharan
     [not found]       ` <CAF6AEGujUb37qagBZK1T13eAh=50zy0z5HpFF9FsuvkZZ7c5qw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-12-16 11:54         ` Will Deacon
2016-12-16 11:54           ` Will Deacon
     [not found]           ` <20161216115413.GE13191-5wv7dgnIgG8@public.gmane.org>
2016-12-19  9:03             ` Sricharan
2016-12-19  9:03               ` Sricharan
2016-12-20 16:17               ` Will Deacon
2016-12-20 16:17                 ` Will Deacon
2017-09-13 19:31             ` Rob Clark
2017-09-13 19:31               ` Rob Clark
2017-09-18 17:33               ` Will Deacon
2017-09-18 17:33                 ` Will Deacon
2017-09-18 18:45                 ` Rob Clark
2017-09-18 18:45                   ` Rob Clark
     [not found] ` <1471525542-14969-1-git-send-email-will.deacon-5wv7dgnIgG8@public.gmane.org>
2016-08-18 13:05   ` [PATCH 4/4] iommu/arm-smmu: Don't BUG() if we find aborting STEs with disable_bypass Will Deacon
2016-08-18 13:05     ` Will Deacon
2016-08-18 16:52   ` [PATCH 0/4] iommu/arm-smmu: Fixes for 4.8 Joerg Roedel
2016-08-18 16:52     ` Joerg Roedel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='000301d252fc$5b19a810$114cf830$@codeaurora.org' \
    --to=sricharan@codeaurora.org \
    --cc=iommu@lists.linux-foundation.org \
    --cc=jcrouse@codeaurora.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-arm-msm@vger.kernel.org \
    --cc=robdclark@gmail.com \
    --cc=will.deacon@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.