Re: [PATCH] amd iommu: Do not adjust paging mode for dom0 devices

From: Wei Wang2 <wei.wang2@amd.com>
To: Keir Fraser <keir@xen.org>
Cc: "xen-devel@lists.xensource.com" <xen-devel@lists.xensource.com>
Subject: Re: [PATCH] amd iommu: Do not adjust paging mode for dom0 devices
Date: Tue, 8 Feb 2011 19:02:33 +0100	[thread overview]
Message-ID: <201102081902.33844.wei.wang2@amd.com> (raw)
In-Reply-To: <C975B974.2ACE1%keir@xen.org>

[-- Attachment #1: Type: text/plain, Size: 1916 bytes --]


On Monday 07 February 2011 16:00:04 Keir Fraser wrote:
> On 07/02/2011 13:30, "Wei Wang2" <wei.wang2@amd.com> wrote:
> > On Monday 07 February 2011 11:47:32 Keir Fraser wrote:
> >> On 07/02/2011 10:33, "Wei Wang2" <wei.wang2@amd.com> wrote:
> >>
> >> Personally I would suggest starting with small 2-level tables and
> >> dynamically increasing their height as bigger mappings are added to them.
> >> Else stick with 4-level tables, or size tables according to global
> >> variable max_page. I think basing anything on d->max_pages is not a good
> >> idea.
> >>
> >>  -- Keir
> >
> > How does the attached patch look? It uses the global variable max_page
> > for PV guests and dom0, and calculates maxpfn for HVM guests. This should
> > cover gfn holes in HVM guests.
>
> The p2m code already tracks the largest gfn for HVM guests. Try using
> p2m_get_hostp2m(d)->max_mapped_pfn for HVM guests. Note that this could
> increase after you sample it, however. Hence why you really need to have a
> statically deep-enough table, or the ability to grow the table depth
> dynamically.

Keir,
The attached patch implements dynamic adjustment of the I/O page table
depth; please review. Growth is triggered from amd_iommu_map_page and
adds new levels at the top of the table. I have tested it with different
devices (NIC and GFX), different guests (Linux and Win7) and different
guest memory sizes (512M, 1G, 4G and above).
Although this turned out to be simpler than I first thought, it may still
not be trivial enough for the release. If so, I will send you another
patch tomorrow to revert c/s 22825.
Thanks,
Wei
Signed-off-by: Wei Wang <wei.wang2@amd.com>

> Your change for PV guests would definitely be correct, however.
>
>  -- Keir
>
> > Thanks,
> > Wei
> > Signed-off-by: Wei Wang <wei.wang2@amd.com>
> >
> >>> I was assuming max_pdx is the index number... Or are
> >>> you referring to memory hotplug? If so, we might also need 4 levels for
> >>> dom0.
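
For reference, a rough sketch of the arithmetic behind starting HVM domains at two levels. Assuming 512 entries per 4 KiB table (9 gfn bits per level), a 2-level table reaches 2^18 pages (1 GiB of guest physical space), a 3-level table 2^27 pages (512 GiB), and a 4-level table 2^36 pages (256 TiB, matching the 48-bit DEFAULT_DOMAIN_ADDRESS_WIDTH). The helpers required_level() and max_gfn_for_mb() below are hypothetical, not Xen functions: required_level() repeats the test that update_paging_mode() loops on, and max_gfn_for_mb() assumes RAM is contiguous from gfn 0, ignoring the HVM MMIO hole.

```c
#include <assert.h>

/* Assumed AMD IOMMU geometry: 512 eight-byte entries per 4 KiB table,
 * i.e. 9 gfn index bits per level. Names mirror the patch; the values
 * here are assumptions for this sketch. */
#define PTE_PER_TABLE_SHIFT 9
#define PTE_PER_TABLE_SIZE  (1UL << PTE_PER_TABLE_SHIFT)

/* Hypothetical helper: smallest depth >= start_level at which the
 * top-level index of gfn still fits in a single table. */
unsigned int required_level(unsigned long gfn, unsigned int start_level)
{
    unsigned int level = start_level;
    unsigned long offset;

    assert(start_level >= 1);
    offset = gfn >> (PTE_PER_TABLE_SHIFT * (level - 1));

    /* Each extra level adds 9 more gfn bits of reach. */
    while ( offset >= PTE_PER_TABLE_SIZE )
    {
        level++;
        offset >>= PTE_PER_TABLE_SHIFT;
    }
    return level;
}

/* Hypothetical helper: highest gfn of a guest with mem_mb MiB of RAM laid
 * out contiguously from gfn 0 (4 KiB pages, so 1 MiB = 256 pages). */
unsigned long max_gfn_for_mb(unsigned long mem_mb)
{
    return (mem_mb << 8) - 1;
}
```

Under these assumptions a 512M or 1G guest fits in two levels, while a 4G guest (highest gfn 0xfffff) needs three. As Keir notes above, the highest mapped gfn can also grow after it is sampled, so the depth has to be re-checked at map time rather than fixed once.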



[-- Attachment #2: dynamic_pgmode.patch --]
[-- Type: text/x-diff, Size: 6206 bytes --]

diff -r 77d05af7dc78 xen/drivers/passthrough/amd/iommu_map.c
--- a/xen/drivers/passthrough/amd/iommu_map.c	Mon Feb 07 09:58:11 2011 +0000
+++ b/xen/drivers/passthrough/amd/iommu_map.c	Tue Feb 08 18:49:34 2011 +0100
@@ -472,6 +472,89 @@ static u64 iommu_l2e_from_pfn(struct pag
     return next_table_maddr;
 }
 
+static int update_paging_mode(struct domain *d, unsigned long gfn)
+{
+    u16 bdf;
+    void *device_entry;
+    unsigned int req_id, level, offset;
+    unsigned long flags;
+    struct pci_dev *pdev;
+    struct amd_iommu *iommu = NULL;
+    struct page_info *new_root = NULL;
+    struct page_info *old_root = NULL;
+    void *new_root_vaddr;
+    u64 old_root_maddr;
+    struct hvm_iommu *hd = domain_hvm_iommu(d);
+
+    level = hd->paging_mode;
+    old_root = hd->root_table;
+    offset = gfn >> (PTE_PER_TABLE_SHIFT * (level - 1));
+
+    ASSERT(spin_is_locked(&hd->mapping_lock) && is_hvm_domain(d));
+
+    while ( offset >= PTE_PER_TABLE_SIZE )
+    {
+        /* Install a new root table; lower levels need no fixup. */
+        new_root = alloc_amd_iommu_pgtable();
+        if ( new_root == NULL )
+        {
+            AMD_IOMMU_DEBUG("%s Cannot allocate I/O page table\n",
+                            __func__);
+            return -ENOMEM;
+        }
+
+        new_root_vaddr = __map_domain_page(new_root);
+        old_root_maddr = page_to_maddr(old_root);
+        amd_iommu_set_page_directory_entry((u32 *)new_root_vaddr,
+                                           old_root_maddr, level);
+        unmap_domain_page(new_root_vaddr);
+        level++;
+        old_root = new_root;
+        offset >>= PTE_PER_TABLE_SHIFT;
+    }
+
+    if ( new_root != NULL )
+    {
+        hd->paging_mode = level;
+        hd->root_table = new_root;
+
+        if ( !spin_is_locked(&pcidevs_lock) )
+            AMD_IOMMU_DEBUG("%s Try to access pdev_list "
+                            "without acquiring pcidevs_lock.\n", __func__);
+
+        /* Update device table entries using new root table and paging mode */
+        for_each_pdev( d, pdev )
+        {
+            bdf = (pdev->bus << 8) | pdev->devfn;
+            req_id = get_dma_requestor_id(bdf);
+            iommu = find_iommu_for_device(bdf);
+            if ( !iommu )
+            {
+                AMD_IOMMU_DEBUG("%s Failed to find IOMMU.\n", __func__);
+                return -ENODEV;
+            }
+
+            spin_lock_irqsave(&iommu->lock, flags);
+            device_entry = iommu->dev_table.buffer +
+                           (req_id * IOMMU_DEV_TABLE_ENTRY_SIZE);
+
+            /* valid = 0 only works for dom0 passthrough mode */
+            amd_iommu_set_root_page_table((u32 *)device_entry,
+                                          page_to_maddr(hd->root_table),
+                                          hd->domain_id,
+                                          hd->paging_mode, 1);
+
+            invalidate_dev_table_entry(iommu, req_id);
+            flush_command_buffer(iommu);
+            spin_unlock_irqrestore(&iommu->lock, flags);
+        }
+
+        /* For safety, invalidate all entries */
+        invalidate_all_iommu_pages(d);
+    }
+    return 0;
+}
+
 int amd_iommu_map_page(struct domain *d, unsigned long gfn, unsigned long mfn,
                        unsigned int flags)
 {
@@ -481,6 +564,19 @@ int amd_iommu_map_page(struct domain *d,
     BUG_ON( !hd->root_table );
 
     spin_lock(&hd->mapping_lock);
+
+    /* Since HVM domains are initialized with a 2-level I/O page table,
+     * a larger gfn may now require a deeper page table. */
+    if ( is_hvm_domain(d) )
+    {
+        if ( update_paging_mode(d, gfn) )
+        {
+            spin_unlock(&hd->mapping_lock);
+            AMD_IOMMU_DEBUG("Update page mode failed gfn = %lx\n", gfn);
+            domain_crash(d);
+            return -EFAULT;
+        }
+    }
 
     iommu_l2e = iommu_l2e_from_pfn(hd->root_table, hd->paging_mode, gfn);
     if ( iommu_l2e == 0 )
@@ -509,6 +605,19 @@ int amd_iommu_unmap_page(struct domain *
     BUG_ON( !hd->root_table );
 
     spin_lock(&hd->mapping_lock);
+
+    /* Since HVM domains are initialized with a 2-level I/O page table,
+     * a larger gfn may now require a deeper page table. */
+    if ( is_hvm_domain(d) )
+    {
+        if ( update_paging_mode(d, gfn) )
+        {
+            spin_unlock(&hd->mapping_lock);
+            AMD_IOMMU_DEBUG("Update page mode failed gfn = %lx\n", gfn);
+            domain_crash(d);
+            return -EFAULT;
+        }
+    }
 
     iommu_l2e = iommu_l2e_from_pfn(hd->root_table, hd->paging_mode, gfn);
 
diff -r 77d05af7dc78 xen/drivers/passthrough/amd/pci_amd_iommu.c
--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c	Mon Feb 07 09:58:11 2011 +0000
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c	Tue Feb 08 18:49:34 2011 +0100
@@ -214,8 +214,11 @@ static int amd_iommu_domain_init(struct 
         return -ENOMEM;
     }
 
+    /* For PV guests and dom0, stick with get_paging_mode(max_page).
+     * For HVM guests, start with a 2-level page table. */
     hd->paging_mode = is_hvm_domain(d) ?
-        IOMMU_PAGE_TABLE_LEVEL_4 : get_paging_mode(max_page);
+                      IOMMU_PAGING_MODE_LEVEL_2 :
+                      get_paging_mode(max_page);
 
     hd->domain_id = d->domain_id;
 
@@ -297,9 +300,6 @@ static int reassign_device( struct domai
 
     list_move(&pdev->domain_list, &target->arch.pdev_list);
     pdev->domain = target;
-
-    if ( target->max_pages > 0 )
-        t->paging_mode = get_paging_mode(target->max_pages);
 
     /* IO page tables might be destroyed after pci-detach the last device
      * In this case, we have to re-allocate root table for next pci-attach.*/
diff -r 77d05af7dc78 xen/include/asm-x86/hvm/svm/amd-iommu-defs.h
--- a/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h	Mon Feb 07 09:58:11 2011 +0000
+++ b/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h	Tue Feb 08 18:49:34 2011 +0100
@@ -386,8 +386,6 @@
 #define IOMMU_PAGES                 (MMIO_PAGES_PER_IOMMU * MAX_AMD_IOMMUS)
 #define DEFAULT_DOMAIN_ADDRESS_WIDTH    48
 #define MAX_AMD_IOMMUS                  32
-#define IOMMU_PAGE_TABLE_LEVEL_3        3
-#define IOMMU_PAGE_TABLE_LEVEL_4        4
 
 /* interrupt remapping table */
 #define INT_REMAP_INDEX_DM_MASK         0x1C00
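
To make the top-down growth in update_paging_mode concrete, here is a standalone toy model in C. Everything in it (struct toy_domain, toy_grow_for_gfn(), toy_map_page(), toy_lookup(), and tables as plain pointer arrays) is hypothetical and only mirrors the idea; it does not use the real AMD IOMMU entry format, device table updates, or IOTLB invalidation. One deliberate difference from the patch: the sketch updates the root and depth on every iteration rather than once after the loop, so a failed allocation leaves a consistent (if not yet deep enough) table.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define PTE_PER_TABLE_SHIFT 9
#define PTE_PER_TABLE_SIZE  (1UL << PTE_PER_TABLE_SHIFT)
/* Index of gfn within a table at the given level (level 1 = leaf). */
#define PTE_INDEX(gfn, level) \
    (((gfn) >> (PTE_PER_TABLE_SHIFT * ((level) - 1))) & (PTE_PER_TABLE_SIZE - 1))

/* Toy model, not Xen code: each slot of a level-N table (N > 1) points at
 * a level-(N-1) table; level-1 slots hold the mapped frame number. */
struct toy_domain {
    void **root;              /* current root table */
    unsigned int paging_mode; /* current depth */
};

static void **alloc_table(void)
{
    return calloc(PTE_PER_TABLE_SIZE, sizeof(void *));
}

/* Grow from the top until gfn fits. Slot 0 of each new root points at the
 * old root: every gfn that fit before has top-level index 0 at the new
 * level, so existing lower levels stay valid unchanged. In the patch the
 * caller holds hd->mapping_lock; this single-threaded toy has no lock. */
int toy_grow_for_gfn(struct toy_domain *d, unsigned long gfn)
{
    assert(d->root != NULL && d->paging_mode >= 1);

    while ( (gfn >> (PTE_PER_TABLE_SHIFT * (d->paging_mode - 1))) >=
            PTE_PER_TABLE_SIZE )
    {
        void **new_root = alloc_table();

        if ( new_root == NULL )
            return -1;
        new_root[0] = d->root;
        d->root = new_root;
        d->paging_mode++;
    }
    return 0;
}

/* Grow if needed, then walk down (allocating missing tables) and store
 * mfn in the level-1 slot for gfn. */
int toy_map_page(struct toy_domain *d, unsigned long gfn, uintptr_t mfn)
{
    void **table;
    unsigned int level;

    if ( toy_grow_for_gfn(d, gfn) != 0 )
        return -1;

    table = d->root;
    for ( level = d->paging_mode; level > 1; level-- )
    {
        void **next = table[PTE_INDEX(gfn, level)];

        if ( next == NULL )
        {
            next = alloc_table();
            if ( next == NULL )
                return -1;
            table[PTE_INDEX(gfn, level)] = next;
        }
        table = next;
    }
    table[PTE_INDEX(gfn, 1)] = (void *)mfn;
    return 0;
}

/* Returns the mapped frame, or 0 if gfn is unmapped or beyond the table. */
uintptr_t toy_lookup(const struct toy_domain *d, unsigned long gfn)
{
    void **table = d->root;
    unsigned int level;

    if ( (gfn >> (PTE_PER_TABLE_SHIFT * (d->paging_mode - 1))) >=
         PTE_PER_TABLE_SIZE )
        return 0;

    for ( level = d->paging_mode; level > 1 && table != NULL; level-- )
        table = table[PTE_INDEX(gfn, level)];

    return table ? (uintptr_t)table[PTE_INDEX(gfn, 1)] : 0;
}
```

Starting from a 2-level toy table, mapping gfn 0x10 and then gfn 0xfffff grows the table to three levels, and the earlier mapping is still reached through slot 0 of the new root. The toy never frees its tables; a real implementation would also have to update every device table entry and flush, as the patch does.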

[-- Attachment #3: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Thread overview: 9+ messages
2011-02-01 17:34 [PATCH] amd iommu: Do not adjust paging mode for dom0 devices Wei Wang2
2011-02-06 16:58 ` Keir Fraser
2011-02-07  9:58   ` Wei Wang2
2011-02-07 10:10     ` Keir Fraser
2011-02-07 10:33       ` Wei Wang2
2011-02-07 10:47         ` Keir Fraser
2011-02-07 13:30           ` Wei Wang2
2011-02-07 15:00             ` Keir Fraser
2011-02-08 18:02               ` Wei Wang2 [this message]
