* [PATCH v2] iommu: intel: do deep dma-unmapping, to avoid kernel-flooding.
  2021-10-12 13:56 Ajay Garg
  2021-10-22 17:33 ` Ajay Garg
  2021-11-09 23:56 ` Alex Williamson
  0 siblings, 2 replies; 5+ messages in thread

From: Ajay Garg @ 2021-10-12 13:56 UTC (permalink / raw)
To: iommu; +Cc: alex.williamson

Origins at:
https://lists.linuxfoundation.org/pipermail/iommu/2021-October/thread.html

=== Changes from v1 => v2 ===

a) Improved patch-description.

b) A more root-level fix, as suggested by:
   1. Alex Williamson <alex.williamson@redhat.com>
   2. Lu Baolu <baolu.lu@linux.intel.com>

=== Issue ===

Kernel-log flooding is seen when an x86_64 L1 guest (Ubuntu 21) is booted
in qemu/kvm on an x86_64 host (Ubuntu 21), with a host PCI device attached.

The following kind of logs, along with the stacktraces, cause the flood:

......
DMAR: ERROR: DMA PTE for vPFN 0x428ec already set (to 3f6ec003 not 3f6ec003)
DMAR: ERROR: DMA PTE for vPFN 0x428ed already set (to 3f6ed003 not 3f6ed003)
DMAR: ERROR: DMA PTE for vPFN 0x428ee already set (to 3f6ee003 not 3f6ee003)
DMAR: ERROR: DMA PTE for vPFN 0x428ef already set (to 3f6ef003 not 3f6ef003)
DMAR: ERROR: DMA PTE for vPFN 0x428f0 already set (to 3f6f0003 not 3f6f0003)
......

=== Current Behaviour, leading to the issue ===

Currently, when we do a dma-unmapping, we unmap/unlink the mappings, but
the pte-entries are not cleared. Thus, the following sequence floods the
kernel logs:

i) A dma-unmapping makes the real/leaf-level pte-slot invalid, but the
pte-content itself is not cleared.

ii) Later, during some dma-mapping procedure, as the pte-slot is about to
hold a new pte-value, the intel-iommu driver checks whether a prior
pte-entry exists in the slot. If it does, it logs a kernel error, along
with a corresponding stacktrace.

iii) Step ii) runs in abundance, and the kernel logs run insane.
=== Fix ===

We ensure that, as part of a dma-unmapping, each (unmapped) pte-slot is
also cleared of its value/content (at the leaf level, where the real
iova => pfn mapping is stored).

This completes a "deep" dma-unmapping.

Signed-off-by: Ajay Garg <ajaygargnsit@gmail.com>
---
 drivers/iommu/intel/iommu.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index d75f59ae28e6..485a8ea71394 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -5090,6 +5090,8 @@ static size_t intel_iommu_unmap(struct iommu_domain *domain,
 	gather->freelist = domain_unmap(dmar_domain, start_pfn,
 					last_pfn, gather->freelist);
 
+	dma_pte_clear_range(dmar_domain, start_pfn, last_pfn);
+
 	if (dmar_domain->max_addr == iova + size)
 		dmar_domain->max_addr = iova;
 
-- 
2.30.2

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
* Re: [PATCH v2] iommu: intel: do deep dma-unmapping, to avoid kernel-flooding.

From: Ajay Garg @ 2021-10-22 17:33 UTC (permalink / raw)
To: iommu; +Cc: Alex Williamson

Ping ..

Any updates on this, please?

It will be great to have the fix upstreamed (properly, of course).

Right now, the patch contains the change as suggested: explicitly/properly
clearing out dma-mappings when unmap is called. Please let me know in
whatever way I can help, including testing/debugging other approaches if
required.

Many thanks to Alex and Lu for their continued support on the issue.

P.S.: I might have missed mentioning the information about the device that
causes the flooding. Please find it below:

######################################
sudo lspci -vvv

0a:00.0 SD Host controller: O2 Micro, Inc.
OZ600FJ0/OZ900FJ0/OZ600FJS SD/MMC Card Reader Controller (rev 05) (prog-if 01)
	Subsystem: Dell OZ600FJ0/OZ900FJ0/OZ600FJS SD/MMC Card Reader Controller
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 17
	IOMMU group: 14
	Region 0: Memory at e2c20000 (32-bit, non-prefetchable) [size=512]
	Capabilities: [a0] Power Management version 3
		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [48] MSI: Enable- Count=1/1 Maskable+ 64bit+
		Address: 0000000000000000  Data: 0000
		Masking: 00000000  Pending: 00000000
	Capabilities: [80] Express (v1) Endpoint, MSI 00
		DevCap:	MaxPayload 128 bytes, PhantFunc 0, Latency L0s <4us, L1 <64us
			ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 10.000W
		DevCtl:	CorrErr- NonFatalErr- FatalErr- UnsupReq-
			RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 128 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
		LnkCap:	Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <512ns, L1 <64us
			ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp-
		LnkCtl:	ASPM L0s Enabled; RCB 64 bytes, Disabled- CommClk-
			ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 2.5GT/s (ok), Width x1 (ok)
			TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
	Capabilities: [100 v1] Virtual Channel
		Caps:	LPEVC=0 RefClk=100ns PATEntryBits=1
		Arb:	Fixed- WRR32- WRR64- WRR128-
		Ctrl:	ArbSelect=Fixed
		Status:	InProgress-
		VC0:	Caps:	PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
			Arb:	Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
			Ctrl:	Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
			Status:	NegoPending- InProgress-
	Capabilities: [200 v1] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
			RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
			RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt-
			RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
		AERCap:	First Error Pointer: 00, ECRCGenCap- ECRCGenEn- ECRCChkCap- ECRCChkEn-
			MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
		HeaderLog: 00000000 00000000 00000000 00000000
	Kernel driver in use: sdhci-pci
	Kernel modules: sdhci_pci
######################################

Thanks and Regards,
Ajay

On Tue, Oct 12, 2021 at 7:27 PM Ajay Garg <ajaygargnsit@gmail.com> wrote:
>
> [v2 patch description and diff quoted in full; snipped]
* Re: [PATCH v2] iommu: intel: do deep dma-unmapping, to avoid kernel-flooding.

From: Ajay Garg @ 2021-10-23 7:00 UTC (permalink / raw)
To: iommu; +Cc: Alex Williamson

Another piece of information:

The observations are the same if the current pci-device (sd/mmc
controller) is detached and another pci-device (sound controller) is
attached to the guest. So it looks like we can rule out any
(pci-)device-specific issue.

For reference, here are the details of the other pci-device I tried with:

###############################################
sudo lspci -vvv

00:1b.0 Audio device: Intel Corporation 6 Series/C200 Series Chipset Family High Definition Audio Controller (rev 04)
	DeviceName: Onboard Audio
	Subsystem: Dell 6 Series/C200 Series Chipset Family High Definition Audio Controller
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 31
	IOMMU group: 5
	Region 0: Memory at e2e60000 (64-bit, non-prefetchable) [size=16K]
	Capabilities: [50] Power Management version 2
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=55mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [60] MSI: Enable+ Count=1/1 Maskable- 64bit+
		Address: 00000000fee00358  Data: 0000
	Capabilities: [70] Express (v1) Root Complex Integrated Endpoint, MSI 00
		DevCap:	MaxPayload 128 bytes, PhantFunc 0
			ExtTag- RBE- FLReset+
		DevCtl:	CorrErr- NonFatalErr- FatalErr- UnsupReq-
			RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
			FLReset- MaxPayload 128 bytes, MaxReadReq 128 bytes
		DevSta:	CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
	Capabilities: [100 v1] Virtual Channel
		Caps:	LPEVC=0 RefClk=100ns PATEntryBits=1
		Arb:	Fixed- WRR32- WRR64- WRR128-
		Ctrl:	ArbSelect=Fixed
		Status:	InProgress-
		VC0:	Caps:	PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
			Arb:	Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
			Ctrl:	Enable+ ID=0 ArbSelect=Fixed TC/VC=01
			Status:	NegoPending- InProgress-
		VC1:	Caps:	PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
			Arb:	Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
			Ctrl:	Enable+ ID=1 ArbSelect=Fixed TC/VC=22
			Status:	NegoPending- InProgress-
	Capabilities: [130 v1] Root Complex Link
		Desc:	PortNumber=0f ComponentID=00 EltType=Config
		Link0:	Desc:	TargetPort=00 TargetComponent=00 AssocRCRB- LinkType=MemMapped LinkValid+
			Addr:	00000000fed1c000
	Kernel driver in use: snd_hda_intel
	Kernel modules: snd_hda_intel
###############################################

On Fri, Oct 22, 2021 at 11:03 PM Ajay Garg <ajaygargnsit@gmail.com> wrote:
>
> [previous message quoted in full; snipped]
* Re: [PATCH v2] iommu: intel: do deep dma-unmapping, to avoid kernel-flooding.

From: Alex Williamson @ 2021-11-09 23:56 UTC (permalink / raw)
To: baolu.lu; +Cc: iommu

Hi Baolu,

Have you looked into this? I'm able to reproduce by starting and
destroying an assigned device VM several times. It seems like it came in
with Joerg's pull request for the v5.15 merge window. Bisecting lands me
on 3f34f1259776, where intel-iommu added map/unmap_pages support, but I'm
not convinced that isn't an artifact: the regular map/unmap calls had been
simplified to only be used for single pages by that point. If I mask the
map/unmap_pages callbacks and use map/unmap with (pgsize * size) and
restore the previous pgsize_bitmap, I can generate the same faults. So
maybe the root issue was introduced somewhere else, or perhaps it is a
latent bug in the clearing of pte ranges, as Ajay proposes below. In any
case, I think there's a real issue here. Thanks,

Alex

On Tue, 12 Oct 2021 19:26:53 +0530
Ajay Garg <ajaygargnsit@gmail.com> wrote:

> [v2 patch quoted in full; snipped]
> > > > === Current Behaviour, leading to the issue === > > Currently, when we do a dma-unmapping, we unmap/unlink the mappings, but > the pte-entries are not cleared. > > Thus, following sequencing would flood the kernel-logs : > > i) > A dma-unmapping makes the real/leaf-level pte-slot invalid, but the > pte-content itself is not cleared. > > ii) > Now, during some later dma-mapping procedure, as the pte-slot is about > to hold a new pte-value, the intel-iommu checks if a prior > pte-entry exists in the pte-slot. If it exists, it logs a kernel-error, > along with a corresponding stacktrace. > > iii) > Step ii) runs in abundance, and the kernel-logs run insane. > > > > === Fix === > > We ensure that as part of a dma-unmapping, each (unmapped) pte-slot > is also cleared of its value/content (at the leaf-level, where the > real mapping from a iova => pfn mapping is stored). > > This completes a "deep" dma-unmapping. > > > > Signed-off-by: Ajay Garg <ajaygargnsit@gmail.com> > --- > drivers/iommu/intel/iommu.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c > index d75f59ae28e6..485a8ea71394 100644 > --- a/drivers/iommu/intel/iommu.c > +++ b/drivers/iommu/intel/iommu.c > @@ -5090,6 +5090,8 @@ static size_t intel_iommu_unmap(struct iommu_domain *domain, > gather->freelist = domain_unmap(dmar_domain, start_pfn, > last_pfn, gather->freelist); > > + dma_pte_clear_range(dmar_domain, start_pfn, last_pfn); > + > if (dmar_domain->max_addr == iova + size) > dmar_domain->max_addr = iova; > _______________________________________________ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH v2] iommu: intel: do deep dma-unmapping, to avoid kernel-flooding.

From: Lu Baolu @ 2021-11-10 6:33 UTC (permalink / raw)
To: Alex Williamson; +Cc: iommu

Hi Alex,

On 2021/11/10 7:56, Alex Williamson wrote:
> Hi Baolu,
>
> Have you looked into this?

I am looking at this.

> I'm able to reproduce by starting and
> destroying an assigned device VM several times. It seems like it came
> in with Joerg's pull request for the v5.15 merge window. Bisecting
> lands me on 3f34f1259776 where intel-iommu added map/unmap_pages
> support, but I'm not convinced that isn't an artifact that the regular
> map/unmap calls had been simplified to only be used for single
> pages by that point. If I mask the map/unmap_pages callbacks and use
> map/unmap with (pgsize * size) and restore the previous pgsize_bitmap,
> I can generate the same faults. So maybe the root issue was introduced
> somewhere else, or perhaps it is a latent bug in clearing of pte ranges
> as Ajay proposes below. In any case, I think there's a real issue
> here. Thanks,

I am trying to reproduce this issue with my local setup. I will come back
again after I have more details.

Best regards,
baolu

> On Tue, 12 Oct 2021 19:26:53 +0530
> Ajay Garg <ajaygargnsit@gmail.com> wrote:
>
>> [v2 patch quoted in full; snipped]