RE: [PATCH 3/4] iommu/arm-smmu: Disable stalling faults for all endpoints

linux-arm-msm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: "Sricharan" <sricharan@codeaurora.org>
To: 'Jordan Crouse' <jcrouse@codeaurora.org>,
	'Rob Clark' <robdclark@gmail.com>
Cc: 'linux-arm-msm' <linux-arm-msm@vger.kernel.org>,
	iommu@lists.linux-foundation.org,
	'Will Deacon' <will.deacon@arm.com>,
	linux-arm-kernel@lists.infradead.org
Subject: RE: [PATCH 3/4] iommu/arm-smmu: Disable stalling faults for all endpoints
Date: Sat, 10 Dec 2016 21:14:42 +0530	[thread overview]
Message-ID: <000301d252fc$5b19a810$114cf830$@codeaurora.org> (raw)
In-Reply-To: <20161207000025.GB21862@jcrouse-lnx.qualcomm.com>

Hi Rob,

>> > Although it's not really Linux's job to save hardware integrators from
>> > their own misfortune, it *is* our job to stop userspace (e.g. VFIO
>> > clients) from hosing the system for everybody else, even if they might
>> > already be required to have elevated privileges.
>> >
>> > Given that the fault handling code currently executes entirely in IRQ
>> > context, there is nothing that can sensibly be done to recover from
>> > things like page faults anyway, so let's rip this code out for now and
>> > avoid the potential for deadlock.
>>
>> Hi Will,
>>
>> so, I'd like to re-introduce this feature, I *guess* as some sort of
>> opt-in quirk (ie. disabled by default unless something in DT tells you
>> otherwise??  But I'm open to suggestions.  I'm not entirely sure what
>> hw was having problems due to this feature.)
>>
>> On newer snapdragon devices we are using arm-smmu for the GPU, and
>> halting the GPU so the driver's fault handler can dump some GPU state
>> on faults is enormously helpful for debugging and tracking down where
>> in the gpu cmdstream the fault was triggered.  In addition, we will
>> eventually want the ability to update pagetables from fault handler
>> and resuming the faulting transition.
>>
>> Some additional comments below..
>>
>> > Cc: <stable@vger.kernel.org>
>> > Reported-by: Matt Evans <matt.evans@arm.com>
>> > Signed-off-by: Will Deacon <will.deacon@arm.com>
>> > ---
>> >  drivers/iommu/arm-smmu.c | 34 +++++++---------------------------
>> >  1 file changed, 7 insertions(+), 27 deletions(-)
>> >
>> > diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
>> > index 4f49fe29f202..2db74ebc3240 100644
>> > --- a/drivers/iommu/arm-smmu.c
>> > +++ b/drivers/iommu/arm-smmu.c
>> > @@ -686,8 +686,7 @@ static struct iommu_gather_ops arm_smmu_gather_ops = {
>> >
>> >  static irqreturn_t arm_smmu_context_fault(int irq, void *dev)
>> >  {
>> > -       int flags, ret;
>> > -       u32 fsr, fsynr, resume;
>> > +       u32 fsr, fsynr;
>> >         unsigned long iova;
>> >         struct iommu_domain *domain = dev;
>> >         struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>> > @@ -701,34 +700,15 @@ static irqreturn_t arm_smmu_context_fault(int irq, void *dev)
>> >         if (!(fsr & FSR_FAULT))
>> >                 return IRQ_NONE;
>> >
>> > -       if (fsr & FSR_IGN)
>> > -               dev_err_ratelimited(smmu->dev,
>> > -                                   "Unexpected context fault (fsr 0x%x)\n",
>> > -                                   fsr);
>> > -
>> >         fsynr = readl_relaxed(cb_base + ARM_SMMU_CB_FSYNR0);
>> > -       flags = fsynr & FSYNR0_WNR ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ;
>> > -
>> >         iova = readq_relaxed(cb_base + ARM_SMMU_CB_FAR);
>> > -       if (!report_iommu_fault(domain, smmu->dev, iova, flags)) {
>> > -               ret = IRQ_HANDLED;
>> > -               resume = RESUME_RETRY;
>> > -       } else {
>> > -               dev_err_ratelimited(smmu->dev,
>> > -                   "Unhandled context fault: iova=0x%08lx, fsynr=0x%x, cb=%d\n",
>> > -                   iova, fsynr, cfg->cbndx);
>>
>> I would like to decouple this dev_err_ratelimit() print from the
>> RESUME_RETRY vs RESUME_TERMINATE behaviour.  I need the ability to
>> indicate by return from my fault handler whether to resume or
>> terminate.  But I already have my own ratelimted prints and would
>> prefer not to spam dmesg twice.
>>
>> I'm thinking about report_iommu_fault() returning:
>>
>>   0 => RESUME_RETRY
>>   -EFAULT => RESUME_TERMINATE but don't print
>>   anything else (or specifically -ENOSYS?) => RESUME_TERMINATE and print
>>
>> thoughts?
>>
>> > -               ret = IRQ_NONE;
>> > -               resume = RESUME_TERMINATE;
>> > -       }
>> > -
>> > -       /* Clear the faulting FSR */
>> > -       writel(fsr, cb_base + ARM_SMMU_CB_FSR);
>> >
>> > -       /* Retry or terminate any stalled transactions */
>> > -       if (fsr & FSR_SS)
>> > -               writel_relaxed(resume, cb_base + ARM_SMMU_CB_RESUME);
>>
>> This might be a bug in qcom's implementation of the smmu spec, but
>> seems like we don't have SS bit set, yet we still require RESUME reg
>> to be written, otherwise gpu is perma-wedged.  Maybe topic for a
>> separate quirk?  I'm not sure if writing RESUME reg on other hw when
>> SS bit is not set is likely to cause problems?  If not I suppose we
>> could just unconditionally write it.
>>
>> Anyways, I'm not super-familiar w/ arm-smmu so suggestions welcome..
>> in between debugging freedreno I'll try to put together some patches.
>
>From what I can tell we need SCTLR_CFCFG to make the stall happen otherwise
>the operation just gets terminated immediately and *then* we get notification
>but by then the system keeps going.
>

Yes thats right, SCTLR_CFCFG was removed as a part of this patch.

>I think SCTLR_HUPCF helps control that behavior (i.e. we don't go off faulting
>through eternity) but I don't know how it works.
>
>From my very unlearned understanding I think we do want to set CFCFG and then
>stall and let the interrupt handler decide to retry/terminate.

Yes, infact i was thinking of adding this as a new patch, but now that this was
a reverted one. As you said, it would be good to know  the hw which was having
problem with this and probably having this has a quirk otherwise.

Also i see that FSR_SS is implemented by the qcom  and the 
downstream code uses it.

Regards,
  Sricharan

next prev parent reply	other threads:[~2016-12-10 15:44 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <1471525542-14969-1-git-send-email-will.deacon@arm.com>
     [not found] ` <1471525542-14969-4-git-send-email-will.deacon@arm.com>
     [not found]   ` <1471525542-14969-4-git-send-email-will.deacon-5wv7dgnIgG8@public.gmane.org>
2016-12-06 23:30     ` [PATCH 3/4] iommu/arm-smmu: Disable stalling faults for all endpoints Rob Clark
2016-12-07  0:00       ` Jordan Crouse
2016-12-10 15:44         ` Sricharan [this message]
     [not found]       ` <CAF6AEGujUb37qagBZK1T13eAh=50zy0z5HpFF9FsuvkZZ7c5qw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-12-16 11:54         ` Will Deacon
     [not found]           ` <20161216115413.GE13191-5wv7dgnIgG8@public.gmane.org>
2016-12-19  9:03             ` Sricharan
2016-12-20 16:17               ` Will Deacon
2017-09-13 19:31             ` Rob Clark
2017-09-18 17:33               ` Will Deacon
2017-09-18 18:45                 ` Rob Clark

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='000301d252fc$5b19a810$114cf830$@codeaurora.org' \
    --to=sricharan@codeaurora.org \
    --cc=iommu@lists.linux-foundation.org \
    --cc=jcrouse@codeaurora.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-arm-msm@vger.kernel.org \
    --cc=robdclark@gmail.com \
    --cc=will.deacon@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).