From mboxrd@z Thu Jan  1 00:00:00 1970
From: Wei Wang2 <wei.wang2@amd.com>
Subject: Re: [PATCH] amd iommu: Do not adjust paging mode for dom0
	devices
Date: Tue, 8 Feb 2011 19:02:33 +0100
Message-ID: <201102081902.33844.wei.wang2@amd.com>
References: <C975B974.2ACE1%keir@xen.org>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="Boundary-00=_5UYUNw9CZzS6LpX"
Return-path: <xen-devel-bounces@lists.xensource.com>
In-Reply-To: <C975B974.2ACE1%keir@xen.org>
List-Unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xensource.com>
List-Help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-Subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Keir Fraser <keir@xen.org>
Cc: "xen-devel@lists.xensource.com" <xen-devel@lists.xensource.com>
List-Id: xen-devel@lists.xenproject.org

--Boundary-00=_5UYUNw9CZzS6LpX
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline


On Monday 07 February 2011 16:00:04 Keir Fraser wrote:
> On 07/02/2011 13:30, "Wei Wang2" <wei.wang2@amd.com> wrote:
> > On Monday 07 February 2011 11:47:32 Keir Fraser wrote:
> >> On 07/02/2011 10:33, "Wei Wang2" <wei.wang2@amd.com> wrote:
> >>
> >> Personally I would suggest starting with small 2-level tables and
> >> dynamically increase their height as bigger mappings are added to them.
> >> Else stick with 4-level tables, or size tables according to global
> >> variable max_page. I think basing anything on d->max_pages is not a good
> >> idea.
> >>
> >>  -- Keir
> >
> > How does the attached patch look like? It uses global variable max_page
> > for pv and dom0 and calculate maxpfn for hvm guest. This should cover gfn
> > holes on hvm guests.
>
> The p2m code already tracks the largest gfn for HVM guests. Try using
> p2m_get_hostp2m(d)->max_mapped_pfn for HVM guests. Note that this could
> increase after you sample it, however. Hence why you really need to have a
> statically deep-enough table, or the ability to grow the table depth
> dynamically.
Keir, 
The attached patch implements dynamic adjustment of the I/O page table depth.
Please review. Page table growth is triggered from amd_iommu_map_page (and
amd_iommu_unmap_page), and the table only grows upward: each new level is
added above the current root. I have tested it with different devices (NIC
and graphics) and different guests (Linux and Win7) with different guest
memory sizes (512M, 1G, 4G and above).
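To illustrate the growth rule, here is a minimal standalone sketch of the
level calculation only. It is not the patch itself: required_level() is a
hypothetical helper, and the 9-bit index per level (512 entries per table)
is assumed here rather than taken from the headers. It mirrors the loop in
update_paging_mode() without allocating tables or touching device table
entries.

```c
#include <assert.h>

/* Assumed for this sketch: 512 entries per table, i.e. 9 index bits. */
#define PTE_PER_TABLE_SHIFT 9
#define PTE_PER_TABLE_SIZE  (1UL << PTE_PER_TABLE_SHIFT)

/*
 * Hypothetical helper: return the number of page table levels needed to
 * map gfn, given that the table currently has cur_level levels. The table
 * never shrinks, so the result is always >= cur_level.
 */
static unsigned int required_level(unsigned int cur_level, unsigned long gfn)
{
    unsigned long offset;

    assert(cur_level >= 1);

    /* Index of gfn in the current root table. */
    offset = gfn >> (PTE_PER_TABLE_SHIFT * (cur_level - 1));

    /* Each new root level adds another 9 bits of reach. */
    while ( offset >= PTE_PER_TABLE_SIZE )
    {
        cur_level++;
        offset >>= PTE_PER_TABLE_SHIFT;
    }

    return cur_level;
}
```

Under these assumptions a 2-level table reaches gfns below 0x40000 (1G of
4K pages), so a 512M or 1G guest stays at 2 levels, while a gfn at or above
0x40000 moves the table to 3 levels.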
Although this turned out to be easier than I first thought, it may not be
trivial enough for the release. If so, I will send you another patch to
revert c/s 22825 tomorrow.
Thanks,
Wei
Signed-off-by: Wei Wang <wei.wang2@amd.com>

> Your change for PV guests would definitely be correct, however.
>
>  -- Keir
>
> > Thanks,
> > Wei
> > Signed-off-by: Wei Wang <wei.wang2@amd.com>
> >
> >>> I was assuming max_pdx is the index number... Or are
> >>> you referring memory hot plug? If so, we might also need 4 level for
> >>> dom0.


--Boundary-00=_5UYUNw9CZzS6LpX
Content-Type: text/x-diff; charset="iso-8859-1"; name="dynamic_pgmode.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="dynamic_pgmode.patch"
Content-Description: dynamic_pgmode.patch

diff -r 77d05af7dc78 xen/drivers/passthrough/amd/iommu_map.c
--- a/xen/drivers/passthrough/amd/iommu_map.c	Mon Feb 07 09:58:11 2011 +0000
+++ b/xen/drivers/passthrough/amd/iommu_map.c	Tue Feb 08 18:49:34 2011 +0100
@@ -472,6 +472,89 @@ static u64 iommu_l2e_from_pfn(struct pag
     return next_table_maddr;
 }
 
+static int update_paging_mode(struct domain *d, unsigned long gfn)
+{
+    u16 bdf;
+    void *device_entry;
+    unsigned int req_id, level, offset;
+    unsigned long flags;
+    struct pci_dev *pdev;
+    struct amd_iommu *iommu = NULL;
+    struct page_info *new_root = NULL;
+    struct page_info *old_root = NULL;
+    void *new_root_vaddr;
+    u64 old_root_maddr;
+    struct hvm_iommu *hd = domain_hvm_iommu(d);
+
+    level = hd->paging_mode;
+    old_root = hd->root_table;
+    offset = gfn >> (PTE_PER_TABLE_SHIFT * (level - 1));
+
+    ASSERT(spin_is_locked(&hd->mapping_lock) && is_hvm_domain(d));
+
+    while ( offset >= PTE_PER_TABLE_SIZE )
+    {
+        /* Allocate and install a new root table.
+         * Only upper I/O page table grows, no need to fix next level bits */
+        new_root = alloc_amd_iommu_pgtable();
+        if ( new_root == NULL )
+        {
+            AMD_IOMMU_DEBUG("%s Cannot allocate I/O page table\n",
+                            __func__);
+            return -ENOMEM;
+        }
+
+        new_root_vaddr = __map_domain_page(new_root);
+        old_root_maddr = page_to_maddr(old_root);
+        amd_iommu_set_page_directory_entry((u32 *)new_root_vaddr,
+                                           old_root_maddr, level);
+        level++;
+        old_root = new_root;
+        offset >>= PTE_PER_TABLE_SHIFT;
+    }
+
+    if ( new_root != NULL )
+    {
+        hd->paging_mode = level;
+        hd->root_table = new_root;
+
+        if ( !spin_is_locked(&pcidevs_lock) )
+            AMD_IOMMU_DEBUG("%s Try to access pdev_list "
+                            "without acquiring pcidevs_lock.\n", __func__);
+
+        /* Update device table entries using new root table and paging mode */
+        for_each_pdev( d, pdev )
+        {
+            bdf = (pdev->bus << 8) | pdev->devfn;
+            req_id = get_dma_requestor_id(bdf);
+            iommu = find_iommu_for_device(bdf);
+            if ( !iommu )
+            {
+                AMD_IOMMU_DEBUG("%s Fail to find iommu.\n", __func__);
+                return -ENODEV;
+            }
+
+            spin_lock_irqsave(&iommu->lock, flags);
+            device_entry = iommu->dev_table.buffer +
+                           (req_id * IOMMU_DEV_TABLE_ENTRY_SIZE);
+
+            /* valid = 0 only works for dom0 passthrough mode */
+            amd_iommu_set_root_page_table((u32 *)device_entry,
+                                          page_to_maddr(hd->root_table),
+                                          hd->domain_id,
+                                          hd->paging_mode, 1);
+
+            invalidate_dev_table_entry(iommu, req_id);
+            flush_command_buffer(iommu);
+            spin_unlock_irqrestore(&iommu->lock, flags);
+        }
+
+        /* For safety, invalidate all entries */
+        invalidate_all_iommu_pages(d);
+    }
+    return 0;
+}
+
 int amd_iommu_map_page(struct domain *d, unsigned long gfn, unsigned long mfn,
                        unsigned int flags)
 {
@@ -481,6 +564,18 @@ int amd_iommu_map_page(struct domain *d,
     BUG_ON( !hd->root_table );
 
     spin_lock(&hd->mapping_lock);
+
+    /* Since HVM domain is initialized with 2 level IO page table,
+     * we might need a deeper page table for larger gfn now */
+    if ( is_hvm_domain(d) )
+    {
+        if ( update_paging_mode(d, gfn) )
+        {
+            AMD_IOMMU_DEBUG("Update page mode failed gfn = %lx\n", gfn);
+            domain_crash(d);
+            return -EFAULT;
+        }
+    }
 
     iommu_l2e = iommu_l2e_from_pfn(hd->root_table, hd->paging_mode, gfn);
     if ( iommu_l2e == 0 )
@@ -509,6 +604,18 @@ int amd_iommu_unmap_page(struct domain *
     BUG_ON( !hd->root_table );
 
     spin_lock(&hd->mapping_lock);
+
+    /* Since HVM domain is initialized with 2 level IO page table,
+     * we might need a deeper page table for larger gfn now */
+    if ( is_hvm_domain(d) )
+    {
+        if ( update_paging_mode(d, gfn) )
+        {
+            AMD_IOMMU_DEBUG("Update page mode failed gfn = %lx\n", gfn);
+            domain_crash(d);
+            return -EFAULT;
+        }
+    }
 
     iommu_l2e = iommu_l2e_from_pfn(hd->root_table, hd->paging_mode, gfn);
 
diff -r 77d05af7dc78 xen/drivers/passthrough/amd/pci_amd_iommu.c
--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c	Mon Feb 07 09:58:11 2011 +0000
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c	Tue Feb 08 18:49:34 2011 +0100
@@ -214,8 +214,11 @@ static int amd_iommu_domain_init(struct 
         return -ENOMEM;
     }
 
+    /* For pv and dom0, stick with get_paging_mode(max_page)
+     * For HVM guests, use 2 level page table at first */
     hd->paging_mode = is_hvm_domain(d) ?
-        IOMMU_PAGE_TABLE_LEVEL_4 : get_paging_mode(max_page);
+                      IOMMU_PAGING_MODE_LEVEL_2 :
+                      get_paging_mode(max_page);
 
     hd->domain_id = d->domain_id;
 
@@ -297,9 +300,6 @@ static int reassign_device( struct domai
 
     list_move(&pdev->domain_list, &target->arch.pdev_list);
     pdev->domain = target;
-
-    if ( target->max_pages > 0 )
-        t->paging_mode = get_paging_mode(target->max_pages);
 
     /* IO page tables might be destroyed after pci-detach the last device
      * In this case, we have to re-allocate root table for next pci-attach.*/
diff -r 77d05af7dc78 xen/include/asm-x86/hvm/svm/amd-iommu-defs.h
--- a/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h	Mon Feb 07 09:58:11 2011 +0000
+++ b/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h	Tue Feb 08 18:49:34 2011 +0100
@@ -386,8 +386,6 @@
 #define IOMMU_PAGES                 (MMIO_PAGES_PER_IOMMU * MAX_AMD_IOMMUS)
 #define DEFAULT_DOMAIN_ADDRESS_WIDTH    48
 #define MAX_AMD_IOMMUS                  32
-#define IOMMU_PAGE_TABLE_LEVEL_3        3
-#define IOMMU_PAGE_TABLE_LEVEL_4        4
 
 /* interrupt remapping table */
 #define INT_REMAP_INDEX_DM_MASK         0x1C00

--Boundary-00=_5UYUNw9CZzS6LpX--