From mboxrd@z Thu Jan 1 00:00:00 1970
From: Wei Wang2
Subject: Re: [PATCH] amd iommu: Do not adjust paging mode for dom0 devices
Date: Tue, 8 Feb 2011 19:02:33 +0100
Message-ID: <201102081902.33844.wei.wang2@amd.com>
References:
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="Boundary-00=_5UYUNw9CZzS6LpX"
Return-path:
In-Reply-To:
List-Unsubscribe: ,
List-Post:
List-Help:
List-Subscribe: ,
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Keir Fraser
Cc: "xen-devel@lists.xensource.com"
List-Id: xen-devel@lists.xenproject.org

--Boundary-00=_5UYUNw9CZzS6LpX
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

On Monday 07 February 2011 16:00:04 Keir Fraser wrote:
> On 07/02/2011 13:30, "Wei Wang2" wrote:
> > On Monday 07 February 2011 11:47:32 Keir Fraser wrote:
> >> On 07/02/2011 10:33, "Wei Wang2" wrote:
> >>
> >> Personally I would suggest starting with small 2-level tables and
> >> dynamically increase their height as bigger mappings are added to them.
> >> Else stick with 4-level tables, or size tables according to global
> >> variable max_page. I think basing anything on d->max_pages is not a good
> >> idea.
> >>
> >> -- Keir
> >
> > How does the attached patch look like? It uses global variable max_page
> > for pv and dom0 and calculate maxpfn for hvm guest. This should cover gfn
> > holes on hvm guests.
>
> The p2m code already tracks the largest gfn for HVM guests. Try using
> p2m_get_hostp2m(d)->max_mapped_pfn for HVM guests. Note that this could
> increase after you sample it, however. Hence why you really need to have a
> statically deep-enough table, or the ability to grow the table depth
> dynamically.

Keir,
The attached patch implements dynamic page table depth adjustment. Please
review. I/O page table growth is triggered by amd_iommu_map_page, and the
table grows upward by adding a new root level.
I have tested it with different devices (nic and gfx) and different guests
(linux and Win7) with different guest memory sizes (512M, 1G, 4G and above).
Although this turned out easier than I first thought, it may not be trivial
enough for the release. If so, I will send you another patch to revert
c/s 22825 tomorrow.

Thanks,
Wei
Signed-off-by: Wei Wang

> Your change for PV guests would definitely be correct, however.
>
> -- Keir
>
> > Thanks,
> > Wei
> > Signed-off-by: Wei Wang
> >
> >>> I was assuming max_pdx is the index number... Or are
> >>> you referring memory hot plug? If so, we might also need 4 level for
> >>> dom0.

--Boundary-00=_5UYUNw9CZzS6LpX
Content-Type: text/x-diff; charset="iso-8859-1"; name="dynamic_pgmode.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="dynamic_pgmode.patch"
Content-Description: dynamic_pgmode.patch

diff -r 77d05af7dc78 xen/drivers/passthrough/amd/iommu_map.c
--- a/xen/drivers/passthrough/amd/iommu_map.c Mon Feb 07 09:58:11 2011 +0000
+++ b/xen/drivers/passthrough/amd/iommu_map.c Tue Feb 08 18:49:34 2011 +0100
@@ -472,6 +472,89 @@ static u64 iommu_l2e_from_pfn(struct pag
     return next_table_maddr;
 }
 
+static int update_paging_mode(struct domain *d, unsigned long gfn)
+{
+    u16 bdf;
+    void *device_entry;
+    unsigned int req_id, level, offset;
+    unsigned long flags;
+    struct pci_dev *pdev;
+    struct amd_iommu *iommu = NULL;
+    struct page_info *new_root = NULL;
+    struct page_info *old_root = NULL;
+    void *new_root_vaddr;
+    u64 old_root_maddr;
+    struct hvm_iommu *hd = domain_hvm_iommu(d);
+
+    level = hd->paging_mode;
+    old_root = hd->root_table;
+    offset = gfn >> (PTE_PER_TABLE_SHIFT * (level - 1));
+
+    ASSERT(spin_is_locked(&hd->mapping_lock) && is_hvm_domain(d));
+
+    while ( offset >= PTE_PER_TABLE_SIZE )
+    {
+        /* Allocate and install a new root table.
+         * Only upper I/O page table grows, no need to fix next level bits */
+        new_root = alloc_amd_iommu_pgtable();
+        if ( new_root == NULL )
+        {
+            AMD_IOMMU_DEBUG("%s Cannot allocate I/O page table\n",
+                            __func__);
+            return -ENOMEM;
+        }
+
+        new_root_vaddr = __map_domain_page(new_root);
+        old_root_maddr = page_to_maddr(old_root);
+        amd_iommu_set_page_directory_entry((u32 *)new_root_vaddr,
+                                           old_root_maddr, level);
+        level++;
+        old_root = new_root;
+        offset >>= PTE_PER_TABLE_SHIFT;
+    }
+
+    if ( new_root != NULL )
+    {
+        hd->paging_mode = level;
+        hd->root_table = new_root;
+
+        if ( !spin_is_locked(&pcidevs_lock) )
+            AMD_IOMMU_DEBUG("%s Try to access pdev_list "
+                            "without aquiring pcidevs_lock.\n", __func__);
+
+        /* Update device table entries using new root table and paging mode */
+        for_each_pdev( d, pdev )
+        {
+            bdf = (pdev->bus << 8) | pdev->devfn;
+            req_id = get_dma_requestor_id(bdf);
+            iommu = find_iommu_for_device(bdf);
+            if ( !iommu )
+            {
+                AMD_IOMMU_DEBUG("%s Fail to find iommu.\n", __func__);
+                return -ENODEV;
+            }
+
+            spin_lock_irqsave(&iommu->lock, flags);
+            device_entry = iommu->dev_table.buffer +
+                           (req_id * IOMMU_DEV_TABLE_ENTRY_SIZE);
+
+            /* valid = 0 only works for dom0 passthrough mode */
+            amd_iommu_set_root_page_table((u32 *)device_entry,
+                                          page_to_maddr(hd->root_table),
+                                          hd->domain_id,
+                                          hd->paging_mode, 1);
+
+            invalidate_dev_table_entry(iommu, req_id);
+            flush_command_buffer(iommu);
+            spin_unlock_irqrestore(&iommu->lock, flags);
+        }
+
+        /* For safety, invalidate all entries */
+        invalidate_all_iommu_pages(d);
+    }
+    return 0;
+}
+
 int amd_iommu_map_page(struct domain *d, unsigned long gfn, unsigned long mfn,
                        unsigned int flags)
 {
@@ -481,6 +564,18 @@ int amd_iommu_map_page(struct domain *d,
     BUG_ON( !hd->root_table );
 
     spin_lock(&hd->mapping_lock);
+
+    /* Since HVM domain is initialized with 2 level IO page table,
+     * we might need a deeper page table for lager gfn now */
+    if ( is_hvm_domain(d) )
+    {
+        if ( update_paging_mode(d, gfn) )
+        {
+            AMD_IOMMU_DEBUG("Update page mode failed gfn = %lx\n", gfn);
+            domain_crash(d);
+            return -EFAULT;
+        }
+    }
 
     iommu_l2e = iommu_l2e_from_pfn(hd->root_table, hd->paging_mode, gfn);
     if ( iommu_l2e == 0 )
@@ -509,6 +604,18 @@ int amd_iommu_unmap_page(struct domain *
     BUG_ON( !hd->root_table );
 
     spin_lock(&hd->mapping_lock);
+
+    /* Since HVM domain is initialized with 2 level IO page table,
+     * we might need a deeper page table for lager gfn now */
+    if ( is_hvm_domain(d) )
+    {
+        if ( update_paging_mode(d, gfn) )
+        {
+            AMD_IOMMU_DEBUG("Update page mode failed gfn = %lx\n", gfn);
+            domain_crash(d);
+            return -EFAULT;
+        }
+    }
 
     iommu_l2e = iommu_l2e_from_pfn(hd->root_table, hd->paging_mode, gfn);
 
diff -r 77d05af7dc78 xen/drivers/passthrough/amd/pci_amd_iommu.c
--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c Mon Feb 07 09:58:11 2011 +0000
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c Tue Feb 08 18:49:34 2011 +0100
@@ -214,8 +214,11 @@ static int amd_iommu_domain_init(struct
         return -ENOMEM;
     }
 
+    /* For pv and dom0, stick with get_paging_mode(max_page)
+     * For HVM dom0, use 2 level page table at first */
     hd->paging_mode = is_hvm_domain(d) ?
-        IOMMU_PAGE_TABLE_LEVEL_4 : get_paging_mode(max_page);
+        IOMMU_PAGING_MODE_LEVEL_2 :
+        get_paging_mode(max_page);
 
     hd->domain_id = d->domain_id;
 
@@ -297,9 +300,6 @@ static int reassign_device( struct domai
 
     list_move(&pdev->domain_list, &target->arch.pdev_list);
     pdev->domain = target;
-
-    if ( target->max_pages > 0 )
-        t->paging_mode = get_paging_mode(target->max_pages);
 
     /* IO page tables might be destroyed after pci-detach the last device
      * In this case, we have to re-allocate root table for next pci-attach.*/
diff -r 77d05af7dc78 xen/include/asm-x86/hvm/svm/amd-iommu-defs.h
--- a/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h Mon Feb 07 09:58:11 2011 +0000
+++ b/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h Tue Feb 08 18:49:34 2011 +0100
@@ -386,8 +386,6 @@
 #define IOMMU_PAGES (MMIO_PAGES_PER_IOMMU * MAX_AMD_IOMMUS)
 #define DEFAULT_DOMAIN_ADDRESS_WIDTH 48
 #define MAX_AMD_IOMMUS 32
-#define IOMMU_PAGE_TABLE_LEVEL_3 3
-#define IOMMU_PAGE_TABLE_LEVEL_4 4
 
 /* interrupt remapping table */
 #define INT_REMAP_INDEX_DM_MASK 0x1C00

--Boundary-00=_5UYUNw9CZzS6LpX
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

--Boundary-00=_5UYUNw9CZzS6LpX--