From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from mx1.redhat.com ([209.132.183.28])
	by bombadil.infradead.org with esmtps (Exim 4.85_2 #1 (Red Hat Linux))
	id 1cBBpa-0006XR-S2
	for kexec@lists.infradead.org; Mon, 28 Nov 2016 02:38:12 +0000
Subject: Re: [PATCH v2] iommu/vt-d: Flush old iommu caches for kdump when the device gets context mapped
References: <1479486225-9286-1-git-send-email-xlpang@redhat.com>
From: Xunlei Pang
Message-ID: <583B98D0.90300@redhat.com>
Date: Mon, 28 Nov 2016 10:39:12 +0800
MIME-Version: 1.0
In-Reply-To: <1479486225-9286-1-git-send-email-xlpang@redhat.com>
Reply-To: xlpang@redhat.com
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Sender: "kexec"
Errors-To: kexec-bounces+dwmw2=infradead.org@lists.infradead.org
To: Xunlei Pang, iommu@lists.linux-foundation.org, David Woodhouse, Joerg Roedel
Cc: Don Brace, Baoquan He, Joseph Szczypek, Dave Young, kexec@lists.infradead.org, linux-kernel@vger.kernel.org, Myron Stowe

Ping Joerg/David, do you have any comment on it?

On 2016/11/19 at 00:23, Xunlei Pang wrote:
> We met the DMAR fault both on hpsa P420i and P421 SmartArray controllers
> under kdump, it can be steadily reproduced on several different machines,
> the dmesg log is like (running on 4.9.0-rc5+):
> HP HPSA Driver (v 3.4.16-0)
> hpsa 0000:02:00.0: using doorbell to reset controller
> hpsa 0000:02:00.0: board ready after hard reset.
> hpsa 0000:02:00.0: Waiting for controller to respond to no-op
> DMAR: Setting identity map for device 0000:02:00.0 [0xe8000 - 0xe8fff]
> DMAR: Setting identity map for device 0000:02:00.0 [0xf4000 - 0xf4fff]
> DMAR: Setting identity map for device 0000:02:00.0 [0xbdf6e000 - 0xbdf6efff]
> DMAR: Setting identity map for device 0000:02:00.0 [0xbdf6f000 - 0xbdf7efff]
> DMAR: Setting identity map for device 0000:02:00.0 [0xbdf7f000 - 0xbdf82fff]
> DMAR: Setting identity map for device 0000:02:00.0 [0xbdf83000 - 0xbdf84fff]
> DMAR: DRHD: handling fault status reg 2
> DMAR: [DMA Read] Request device [02:00.0] fault addr fffff000 [fault reason 06] PTE Read access is not set
> hpsa 0000:02:00.0: controller message 03:00 timed out
> hpsa 0000:02:00.0: no-op failed; re-trying
>
> After some debugging, we found that the fault addr is from DMA initiated at
> the driver probe stage after reset (not in-flight DMA), and the corresponding
> pte entry value is correct, the fault is likely due to the old iommu caches
> of the in-flight DMA before it.
>
> Thus we need to flush the old cache after context mapping is set up for the
> device, where the device is supposed to finish reset at its driver probe
> stage and no in-flight DMA exists hereafter.
>
> I'm not sure if the hardware is responsible for invalidating all the related
> caches allocated in iommu hardware during reset, but seems not the case for hpsa,
> actually many device drivers even have problems properly resetting the hardware.
> Anyway flushing (again) by software in kdump mode when the device gets context
> mapped which is a quite infrequent operation does little harm.
>
> With this patch, the problematic machine can survive the kdump tests.
>
> CC: Myron Stowe
> CC: Joseph Szczypek
> CC: Don Brace
> CC: Baoquan He
> CC: Dave Young
> Signed-off-by: Xunlei Pang
> ---
> v1 -> v2:
> Flush caches using old domain id.
>
>  drivers/iommu/intel-iommu.c | 22 ++++++++++++++++++++++
>  1 file changed, 22 insertions(+)
>
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 3965e73..653304d 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -2024,6 +2024,28 @@ static int domain_context_mapping_one(struct dmar_domain *domain,
>  	if (context_present(context))
>  		goto out_unlock;
>
> +	/*
> +	 * For kdump cases, old valid entries may be cached due to the
> +	 * in-flight DMA and copied pgtable, but there is no unmapping
> +	 * behaviour for them, thus we need an explicit cache flush for
> +	 * the newly-mapped device. For kdump, at this point, the device
> +	 * is supposed to finish reset at its driver probe stage, so no
> +	 * in-flight DMA will exist, and we don't need to worry anymore
> +	 * hereafter.
> +	 */
> +	if (context_copied(context)) {
> +		u16 did_old = context_domain_id(context);
> +
> +		if (did_old >= 0 && did_old < cap_ndoms(iommu->cap)) {
> +			iommu->flush.flush_context(iommu, did_old,
> +						   (((u16)bus) << 8) | devfn,
> +						   DMA_CCMD_MASK_NOBIT,
> +						   DMA_CCMD_DEVICE_INVL);
> +			iommu->flush.flush_iotlb(iommu, did_old, 0, 0,
> +						 DMA_TLB_DSI_FLUSH);
> +		}
> +	}
> +
>  	pgd = domain->pgd;
>
>  	context_clear_entry(context);

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec
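
A note on the flush arguments in the patch: flush_context() is given
(((u16)bus) << 8) | devfn as the source-id, and the flush is guarded by a
range check on the old domain id. The sketch below is a standalone
user-space illustration of that arithmetic only, not kernel code.
pci_devfn(), pci_source_id() and old_did_in_range() are hypothetical
helper names, and the ndoms parameter is an assumption standing in for
whatever cap_ndoms(iommu->cap) would return on real hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Pack a PCI device number (5 bits) and function number (3 bits) into a
 * devfn byte. Hypothetical helper for illustration. */
static uint8_t pci_devfn(uint8_t dev, uint8_t fn)
{
	return (uint8_t)(((dev & 0x1f) << 3) | (fn & 0x07));
}

/* Source-id in the form the patch passes to flush_context():
 * bus number in bits 15:8, devfn in bits 7:0. */
static uint16_t pci_source_id(uint8_t bus, uint8_t devfn)
{
	return (uint16_t)(((uint16_t)bus << 8) | devfn);
}

/* Range check on the old domain id. ndoms is an assumed stand-in for
 * cap_ndoms(iommu->cap). The id is unsigned, so a ">= 0" comparison is
 * always true and only the upper bound can reject a value. */
static bool old_did_in_range(uint16_t did_old, unsigned long ndoms)
{
	return did_old < ndoms;
}
```

For the faulting device in the log, 02:00.0, pci_source_id(0x02,
pci_devfn(0, 0)) evaluates to 0x0200. Since did_old is a u16, only the
upper-bound half of the patch's range check can ever fail.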