From: "Roger Pau Monné" <roger.pau@citrix.com>
To: Paul Durrant <Paul.Durrant@citrix.com>
Cc: xen-devel <xen-devel@lists.xenproject.org>,
	"abelgun@amazon.com" <abelgun@amazon.com>,
	David Woodhouse <dwmw2@infradead.org>,
	Jan Beulich <JBeulich@suse.com>,
	"bercarug@amazon.com" <bercarug@amazon.com>
Subject: Re: PVH dom0 creation fails - the system freezes
Date: Thu, 26 Jul 2018 18:46:11 +0200
Message-ID: <20180726164611.gneruvoi5vmvbkd5@mac.bytemobile.com>
In-Reply-To: <39fd8a224e5e4a39999b04582d45a0c4@AMSPEX02CL03.citrite.net>

On Wed, Jul 25, 2018 at 05:19:03PM +0100, Paul Durrant wrote:
> > -----Original Message-----
> > From: Xen-devel [mailto:xen-devel-bounces@lists.xenproject.org] On Behalf
> > Of Roger Pau Monné
> > Sent: 25 July 2018 15:12
> > To: bercarug@amazon.com
> > Cc: xen-devel <xen-devel@lists.xenproject.org>; David Woodhouse
> > <dwmw2@infradead.org>; Jan Beulich <JBeulich@suse.com>;
> > abelgun@amazon.com
> > Subject: Re: [Xen-devel] PVH dom0 creation fails - the system freezes
> > 
> > On Wed, Jul 25, 2018 at 04:57:23PM +0300, bercarug@amazon.com wrote:
> > > On 07/25/2018 04:35 PM, Roger Pau Monné wrote:
> > > > On Wed, Jul 25, 2018 at 01:06:43PM +0300, bercarug@amazon.com
> > wrote:
> > > > > On 07/24/2018 12:54 PM, Jan Beulich wrote:
> > > > > > > > > On 23.07.18 at 13:50, <bercarug@amazon.com> wrote:
> > > > > > > For the last few days, I have been trying to get a PVH dom0 running,
> > > > > > > however I encountered the following problem: the system seems
> > to
> > > > > > > freeze after the hypervisor boots, the screen goes black. I have
> > tried to
> > > > > > > debug it via a serial console (using Minicom) and managed to get
> > some
> > > > > > > more Xen output, after the screen turns black.
> > > > > > >
> > > > > > > I mention that I have tried to boot the PVH dom0 using different
> > kernel
> > > > > > > images (from 4.9.0 to 4.18-rc3), different Xen  versions (4.10, 4.11,
> > 4.12).
> > > > > > >
> > > > > > > Below I attached my system / hypervisor configuration, as well as
> > the
> > > > > > > output captured through the serial console, corresponding to the
> > latest
> > > > > > > versions for Xen and the Linux Kernel (Xen staging and Kernel from
> > the
> > > > > > > xen/tip tree).
> > > > > > > [...]
> > > > > > > (XEN) [VT-D]iommu.c:919: iommu_fault_status: Fault Overflow
> > > > > > > (XEN) [VT-D]iommu.c:921: iommu_fault_status: Primary Pending
> > Fault
> > > > > > > (XEN) [VT-D]DMAR:[DMA Write] Request device [0000:00:14.0] fault
> > addr 8deb3000, iommu reg = ffff82c00021b000
> > > > Can you figure out which PCI device is 00:14.0?
> > > This is the output of lspci -vvv for device 00:14.0:
> > >
> > > 00:14.0 USB controller: Intel Corporation Sunrise Point-H USB 3.0 xHCI
> > > Controller (rev 31) (prog-if 30 [XHCI])
> > >         Subsystem: Intel Corporation Sunrise Point-H USB 3.0 xHCI Controller
> > >         Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
> > ParErr+
> > > Stepping- SERR+ FastB2B- DisINTx+
> > >         Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
> > > <TAbort- <MAbort+ >SERR- <PERR- INTx-
> > >         Latency: 0
> > >         Interrupt: pin A routed to IRQ 178
> > >         Region 0: Memory at a2e00000 (64-bit, non-prefetchable) [size=64K]
> > >         Capabilities: [70] Power Management version 2
> > >                 Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA
> > > PME(D0-,D1-,D2-,D3hot+,D3cold+)
> > >                 Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
> > >         Capabilities: [80] MSI: Enable+ Count=1/8 Maskable- 64bit+
> > >                 Address: 00000000fee0e000  Data: 4021
> > >         Kernel driver in use: xhci_hcd
> > >         Kernel modules: xhci_pci
> > 
> > I'm afraid your USB controller is missing RMRR entries in the DMAR
> > ACPI tables, thus causing the IOMMU faults and not working properly.
> > 
> > You could try to manually add some extra rmrr regions by appending:
> > 
> > rmrr=0x8deb3=0:0:14.0
> > 
> > To the Xen command line, and keep adding any address that pops up in
> > the iommu faults. This is of course quite cumbersome, but there's no
> > way to get the required memory addresses if the data in RMRR is
> > wrong/incomplete.
> > 
> 
> You could just add all E820 reserved regions in there. That will almost certainly cover it.

I have a prototype patch for this that attempts to identity-map all
reserved regions below 4GB into the p2m. It's still a WIP, but if you
could give it a try, that would help me figure out whether it fixes
your issues and is indeed something that would be good to have.
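
As a reading aid for the patch below, here is a standalone sketch of the
first decision setup_inclusive_mappings() makes for each page: whether
the page is a candidate for a 1:1 mapping at all. The enums and the
function name are hypothetical stand-ins for Xen's page_is_ram_type(),
is_pv_domain() and is_hvm_domain(); this is not code from the patch and
it leaves out the later filters (Xen's own range, conventional RAM for
HVM/strict mode, memory below 1MB).

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the E820-derived RAM_TYPE_* classes. */
enum mem_type { MEM_CONVENTIONAL, MEM_RESERVED, MEM_UNUSABLE, MEM_OTHER };

/* Hypothetical stand-in for is_pv_domain() / is_hvm_domain(). */
enum dom_type { DOM_PV, DOM_HVM };

/*
 * Return true if a page of the given type is a candidate for an
 * identity mapping, mirroring the three-way choice in the patch:
 *  - PV + inclusive, below 4GB: everything except unusable ranges
 *  - HVM/PVH + inclusive:       reserved ranges only
 *  - otherwise:                 conventional RAM only
 */
static bool want_identity_map(enum dom_type dom, bool inclusive,
                              bool below_4g, enum mem_type type)
{
    if ( dom == DOM_PV && inclusive && below_4g )
        return type != MEM_UNUSABLE;

    if ( dom == DOM_HVM && inclusive )
        return type == MEM_RESERVED;

    return type == MEM_CONVENTIONAL;
}
```

Under this sketch, the faulting address from the log (0x8deb3000) would
only be picked up for a PVH dom0 if the E820 map marks it as reserved.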

I don't really like the patch as-is because it doesn't check whether
the reserved regions added to the p2m overlap with, for example, the
LAPIC page or the PCIe MCFG regions. I will continue to work on a safer
version.
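
To illustrate the kind of check that is missing, here is a minimal,
self-contained sketch of filtering a pfn against a list of ranges that
must not be identity mapped. The struct and function names are
hypothetical and do not exist in Xen; a real version would have to take
the LAPIC and MCFG addresses from the platform rather than from a fixed
table.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Inclusive page-frame range [start, end]; hypothetical helper type. */
struct pfn_range {
    uint64_t start, end;
};

/* True if the two inclusive ranges share at least one page frame. */
static bool ranges_overlap(struct pfn_range a, struct pfn_range b)
{
    return a.start <= b.end && b.start <= a.end;
}

/* True if pfn falls inside any of the n excluded ranges. */
static bool pfn_is_excluded(uint64_t pfn, const struct pfn_range *excl,
                            size_t n)
{
    for ( size_t i = 0; i < n; i++ )
        if ( pfn >= excl[i].start && pfn <= excl[i].end )
            return true;

    return false;
}
```

For example, with an exclusion table holding the default LAPIC page
(pfn 0xfee00) and a made-up MCFG window at pfns 0xe0000-0xeffff, the
faulting pfn 0x8deb3 would still be mapped, while 0xfee00 would be
skipped.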

If you can give this a shot, please remove any rmrr options from the
command line and use iommu=debug in order to catch any issues.

Thanks, Roger.
---8<---
diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
index 2c44fabf99..76a1fd6681 100644
--- a/xen/drivers/passthrough/iommu.c
+++ b/xen/drivers/passthrough/iommu.c
@@ -21,6 +21,8 @@
 #include <xen/keyhandler.h>
 #include <xsm/xsm.h>
 
+#include <asm/setup.h>
+
 static int parse_iommu_param(const char *s);
 static void iommu_dump_p2m_table(unsigned char key);
 
@@ -47,6 +49,8 @@ integer_param("iommu_dev_iotlb_timeout", iommu_dev_iotlb_timeout);
  *   no-igfx                    Disable VT-d for IGD devices (insecure)
  *   no-amd-iommu-perdev-intremap Don't use per-device interrupt remapping
  *                              tables (insecure)
+ *   inclusive                  Include any memory ranges below 4GB not used
+ *                              by Xen or unusable to the iommu page tables.
  */
 custom_param("iommu", parse_iommu_param);
 bool_t __initdata iommu_enable = 1;
@@ -60,6 +64,7 @@ bool_t __read_mostly iommu_passthrough;
 bool_t __read_mostly iommu_snoop = 1;
 bool_t __read_mostly iommu_qinval = 1;
 bool_t __read_mostly iommu_intremap = 1;
+bool __read_mostly iommu_inclusive = true;
 
 /*
  * In the current implementation of VT-d posted interrupts, in some extreme
@@ -126,6 +131,8 @@ static int __init parse_iommu_param(const char *s)
             iommu_dom0_strict = val;
         else if ( !strncmp(s, "sharept", ss - s) )
             iommu_hap_pt_share = val;
+        else if ( !strncmp(s, "inclusive", ss - s) )
+            iommu_inclusive = val;
         else
             rc = -EINVAL;
 
@@ -165,6 +172,85 @@ static void __hwdom_init check_hwdom_reqs(struct domain *d)
     iommu_dom0_strict = 1;
 }
 
+static void __hwdom_init setup_inclusive_mappings(struct domain *d)
+{
+    unsigned long i, j, tmp, top, max_pfn;
+
+    BUG_ON(!is_hardware_domain(d));
+
+    max_pfn = (GB(4) >> PAGE_SHIFT) - 1;
+    top = max(max_pdx, pfn_to_pdx(max_pfn) + 1);
+
+    for ( i = 0; i < top; i++ )
+    {
+        unsigned long pfn = pdx_to_pfn(i);
+        bool map;
+        int rc = 0;
+
+        /*
+         * Set up 1:1 mapping for dom0. Default to include only
+         * conventional RAM areas and let RMRRs include needed reserved
+         * regions. When set, the inclusive mapping additionally maps in
+         * every pfn up to 4GB except those that fall in unusable ranges.
+         */
+        if ( pfn > max_pfn && !mfn_valid(_mfn(pfn)) )
+            continue;
+
+        if ( is_pv_domain(d) && iommu_inclusive && pfn <= max_pfn )
+            map = !page_is_ram_type(pfn, RAM_TYPE_UNUSABLE);
+        else if ( is_hvm_domain(d) && iommu_inclusive )
+            map = page_is_ram_type(pfn, RAM_TYPE_RESERVED);
+        else
+            map = page_is_ram_type(pfn, RAM_TYPE_CONVENTIONAL);
+
+        if ( !map )
+            continue;
+
+        /* Exclude Xen bits */
+        if ( xen_in_range(pfn) )
+            continue;
+
+        /*
+         * If dom0-strict mode is enabled or guest type is HVM/PVH then exclude
+         * conventional RAM and let the common code map dom0's pages.
+         */
+        if ( (iommu_dom0_strict || is_hvm_domain(d)) &&
+             page_is_ram_type(pfn, RAM_TYPE_CONVENTIONAL) )
+            continue;
+
+        /* For HVM avoid memory below 1MB because that's already mapped. */
+        if ( is_hvm_domain(d) && pfn < PFN_DOWN(MB(1)) )
+            continue;
+
+        tmp = 1 << (PAGE_SHIFT - PAGE_SHIFT_4K);
+        for ( j = 0; j < tmp; j++ )
+        {
+            int ret;
+
+            if ( iommu_use_hap_pt(d) )
+            {
+                ASSERT(is_hvm_domain(d));
+                ret = set_identity_p2m_entry(d, pfn * tmp + j, p2m_access_rw,
+                                             0);
+            }
+            else
+                ret = iommu_map_page(d, pfn * tmp + j, pfn * tmp + j,
+                                     IOMMUF_readable|IOMMUF_writable);
+
+            if ( !rc )
+               rc = ret;
+        }
+
+        if ( rc )
+            printk(XENLOG_WARNING " d%d: IOMMU mapping failed: %d\n",
+                   d->domain_id, rc);
+
+        if (!(i & (0xfffff >> (PAGE_SHIFT - PAGE_SHIFT_4K))))
+            process_pending_softirqs();
+    }
+
+}
+
 void __hwdom_init iommu_hwdom_init(struct domain *d)
 {
     const struct domain_iommu *hd = dom_iommu(d);
@@ -207,7 +293,10 @@ void __hwdom_init iommu_hwdom_init(struct domain *d)
                    d->domain_id, rc);
     }
 
-    return hd->platform_ops->hwdom_init(d);
+    hd->platform_ops->hwdom_init(d);
+
+    if ( !iommu_passthrough )
+        setup_inclusive_mappings(d);
 }
 
 void iommu_teardown(struct domain *d)
diff --git a/xen/drivers/passthrough/vtd/extern.h b/xen/drivers/passthrough/vtd/extern.h
index fb7edfaef9..91cadc602e 100644
--- a/xen/drivers/passthrough/vtd/extern.h
+++ b/xen/drivers/passthrough/vtd/extern.h
@@ -99,6 +99,4 @@ void pci_vtd_quirk(const struct pci_dev *);
 bool_t platform_supports_intremap(void);
 bool_t platform_supports_x2apic(void);
 
-void vtd_set_hwdom_mapping(struct domain *d);
-
 #endif // _VTD_EXTERN_H_
diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index 1710256823..569ec4aec2 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1304,12 +1304,6 @@ static void __hwdom_init intel_iommu_hwdom_init(struct domain *d)
 {
     struct acpi_drhd_unit *drhd;
 
-    if ( !iommu_passthrough && is_pv_domain(d) )
-    {
-        /* Set up 1:1 page table for hardware domain. */
-        vtd_set_hwdom_mapping(d);
-    }
-
     setup_hwdom_pci_devices(d, setup_hwdom_device);
     setup_hwdom_rmrr(d);
 
diff --git a/xen/drivers/passthrough/vtd/x86/vtd.c b/xen/drivers/passthrough/vtd/x86/vtd.c
index cc2bfea162..9971915349 100644
--- a/xen/drivers/passthrough/vtd/x86/vtd.c
+++ b/xen/drivers/passthrough/vtd/x86/vtd.c
@@ -32,11 +32,9 @@
 #include "../extern.h"
 
 /*
- * iommu_inclusive_mapping: when set, all memory below 4GB is included in dom0
- * 1:1 iommu mappings except xen and unusable regions.
+ * iommu_inclusive_mapping: superseded by iommu=inclusive.
  */
-static bool_t __hwdom_initdata iommu_inclusive_mapping = 1;
-boolean_param("iommu_inclusive_mapping", iommu_inclusive_mapping);
+boolean_param("iommu_inclusive_mapping", iommu_inclusive);
 
 void *map_vtd_domain_page(u64 maddr)
 {
@@ -107,67 +105,3 @@ void hvm_dpci_isairq_eoi(struct domain *d, unsigned int isairq)
     }
     spin_unlock(&d->event_lock);
 }
-
-void __hwdom_init vtd_set_hwdom_mapping(struct domain *d)
-{
-    unsigned long i, j, tmp, top, max_pfn;
-
-    BUG_ON(!is_hardware_domain(d));
-
-    max_pfn = (GB(4) >> PAGE_SHIFT) - 1;
-    top = max(max_pdx, pfn_to_pdx(max_pfn) + 1);
-
-    for ( i = 0; i < top; i++ )
-    {
-        unsigned long pfn = pdx_to_pfn(i);
-        bool map;
-        int rc = 0;
-
-        /*
-         * Set up 1:1 mapping for dom0. Default to include only
-         * conventional RAM areas and let RMRRs include needed reserved
-         * regions. When set, the inclusive mapping additionally maps in
-         * every pfn up to 4GB except those that fall in unusable ranges.
-         */
-        if ( pfn > max_pfn && !mfn_valid(_mfn(pfn)) )
-            continue;
-
-        if ( iommu_inclusive_mapping && pfn <= max_pfn )
-            map = !page_is_ram_type(pfn, RAM_TYPE_UNUSABLE);
-        else
-            map = page_is_ram_type(pfn, RAM_TYPE_CONVENTIONAL);
-
-        if ( !map )
-            continue;
-
-        /* Exclude Xen bits */
-        if ( xen_in_range(pfn) )
-            continue;
-
-        /*
-         * If dom0-strict mode is enabled then exclude conventional RAM
-         * and let the common code map dom0's pages.
-         */
-        if ( iommu_dom0_strict &&
-             page_is_ram_type(pfn, RAM_TYPE_CONVENTIONAL) )
-            continue;
-
-        tmp = 1 << (PAGE_SHIFT - PAGE_SHIFT_4K);
-        for ( j = 0; j < tmp; j++ )
-        {
-            int ret = iommu_map_page(d, pfn * tmp + j, pfn * tmp + j,
-                                     IOMMUF_readable|IOMMUF_writable);
-
-            if ( !rc )
-               rc = ret;
-        }
-
-        if ( rc )
-            printk(XENLOG_WARNING VTDPREFIX " d%d: IOMMU mapping failed: %d\n",
-                   d->domain_id, rc);
-
-        if (!(i & (0xfffff >> (PAGE_SHIFT - PAGE_SHIFT_4K))))
-            process_pending_softirqs();
-    }
-}
-
diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
index 6b42e3b876..15d6584837 100644
--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
@@ -35,6 +35,7 @@ extern bool_t iommu_snoop, iommu_qinval, iommu_intremap, iommu_intpost;
 extern bool_t iommu_hap_pt_share;
 extern bool_t iommu_debug;
 extern bool_t amd_iommu_perdev_intremap;
+extern bool iommu_inclusive;
 
 extern unsigned int iommu_dev_iotlb_timeout;
 


