From: Imre Deak <imre.deak@intel.com>
Subject: Re: [PATCH 5/9] [v5] drm/i915/bdw: Reorganize PT allocations
Date: Mon, 24 Feb 2014 19:03:12 +0200
Message-ID: <1393261392.13131.126.camel@intelbox>
In-Reply-To: <1392925881-6665-1-git-send-email-benjamin.widawsky@intel.com>
To: Ben Widawsky
Cc: Intel GFX, Ben Widawsky
List-Id: intel-gfx@lists.freedesktop.org

On Thu, 2014-02-20 at 11:51 -0800, Ben Widawsky wrote:
> The previous allocation mechanism would get 2 contiguous allocations,
> one for the page directories, and one for the page tables. As each page
> table is 1 page, and there are 512 of these per page directory, this
> goes to 2MB. An unfriendly request at best. Worse still, our HW now
> supports 4 page directories, and a 2MB allocation is not allowed.
>
> In order to fix this, this patch attempts to split up each page table
> allocation into a single, discrete allocation. There is nothing really
> fancy about the patch itself, it just has to manage an extra pointer
> indirection, and have a fancier bit of logic to free up the pages.
>
> To accommodate some of the added complexity, two new helpers are
> introduced to allocate, and free the page table pages.
>
> NOTE: I really wanted to split the way we do allocations, and the way in
> which we identify the page table/page directory being used. I found
> splitting this functionality up to be too unwieldy. I apologize in
> advance to the reviewer. I'd recommend looking at the result, rather
> than the diff.
>
> v2/NOTE2: This patch predated commit:
>     6f1cc993518462ccf039e195fabd47e7aa5bfd13
>     Author: Chris Wilson
>     Date:   Tue Dec 31 15:50:31 2013 +0000
>
>         drm/i915: Avoid dereference past end of page arr
>
> It fixed the same issue as that patch, but because of the limbo state of
> PPGTT, Chris' patch was merged instead. The excess churn is a result of
> my using my original patch, which has my preferred naming. Primarily
> act_* is changed to which_*, but it's mostly the same otherwise. I've
> kept the convention Chris used for the pte wrap (I had something
> slightly different, and broken - but fixable).
>
> v3: Rename which_p[..]e to drop which_ (Chris)
> Remove BUG_ON in inner loop (Chris)
> Redo the pde/pdpe wrap logic (Chris)
>
> v4: s/1MB/2MB in commit message (Imre)
> Plug leaking gen8_pt_pages in both the error path, as well as general
> free case (Imre)
>
> v5: Rename leftover "which_" variables (Imre)
> Add the pde = 0 wrap that was missed from v3 (Imre)
>
> Reviewed-by: Imre Deak
> Signed-off-by: Ben Widawsky

Reviewed-by: Imre Deak <imre.deak@intel.com>

> ---
>  drivers/gpu/drm/i915/i915_drv.h     |   5 +-
>  drivers/gpu/drm/i915/i915_gem_gtt.c | 132 ++++++++++++++++++++++++++++--------
>  2 files changed, 108 insertions(+), 29 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index ab23bfd..465ebf4 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -691,6 +691,7 @@ struct i915_gtt {
>  };
>  #define gtt_total_entries(gtt) ((gtt).base.total >> PAGE_SHIFT)
>
> +#define GEN8_LEGACY_PDPS 4
>  struct i915_hw_ppgtt {
>  	struct i915_address_space base;
>  	struct kref ref;
> @@ -698,14 +699,14 @@ struct i915_hw_ppgtt {
>  	unsigned num_pd_entries;
>  	union {
>  		struct page **pt_pages;
> -		struct page *gen8_pt_pages;
> +		struct page **gen8_pt_pages[GEN8_LEGACY_PDPS];
>  	};
>  	struct page *pd_pages;
>  	int num_pd_pages;
>  	int num_pt_pages;
>  	union {
>  		uint32_t pd_offset;
> -		dma_addr_t pd_dma_addr[4];
> +		dma_addr_t pd_dma_addr[GEN8_LEGACY_PDPS];
>  	};
>  	union {
>  		dma_addr_t *pt_dma_addr;
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 03a3871..46adb0d 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -64,7 +64,19 @@ typedef gen8_gtt_pte_t gen8_ppgtt_pde_t;
>
>  #define GEN8_PTES_PER_PAGE		(PAGE_SIZE / sizeof(gen8_gtt_pte_t))
>  #define GEN8_PDES_PER_PAGE		(PAGE_SIZE / sizeof(gen8_ppgtt_pde_t))
> -#define GEN8_LEGACY_PDPS		4
> +
> +/* GEN8 legacy style address is defined as a 3 level page table:
> + * 31:30 | 29:21 | 20:12 |  11:0
> + * PDPE  |  PDE  |  PTE  | offset
> + * The difference as compared to normal x86 3 level page table is the PDPEs are
> + * programmed via register.
> + */
> +#define GEN8_PDPE_SHIFT			30
> +#define GEN8_PDPE_MASK			0x3
> +#define GEN8_PDE_SHIFT			21
> +#define GEN8_PDE_MASK			0x1ff
> +#define GEN8_PTE_SHIFT			12
> +#define GEN8_PTE_MASK			0x1ff
>
>  #define PPAT_UNCACHED_INDEX		(_PAGE_PWT | _PAGE_PCD)
>  #define PPAT_CACHED_PDE_INDEX		0 /* WB LLC */
> @@ -261,32 +273,36 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
>  	struct i915_hw_ppgtt *ppgtt =
>  		container_of(vm, struct i915_hw_ppgtt, base);
>  	gen8_gtt_pte_t *pt_vaddr, scratch_pte;
> -	unsigned first_entry = start >> PAGE_SHIFT;
> +	unsigned pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
> +	unsigned pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
> +	unsigned pte = start >> GEN8_PTE_SHIFT & GEN8_PTE_MASK;
>  	unsigned num_entries = length >> PAGE_SHIFT;
> -	unsigned act_pt = first_entry / GEN8_PTES_PER_PAGE;
> -	unsigned first_pte = first_entry % GEN8_PTES_PER_PAGE;
>  	unsigned last_pte, i;
>
>  	scratch_pte = gen8_pte_encode(ppgtt->base.scratch.addr,
>  				      I915_CACHE_LLC, use_scratch);
>
>  	while (num_entries) {
> -		struct page *page_table = &ppgtt->gen8_pt_pages[act_pt];
> +		struct page *page_table = ppgtt->gen8_pt_pages[pdpe][pde];
>
> -		last_pte = first_pte + num_entries;
> +		last_pte = pte + num_entries;
>  		if (last_pte > GEN8_PTES_PER_PAGE)
>  			last_pte = GEN8_PTES_PER_PAGE;
>
>  		pt_vaddr = kmap_atomic(page_table);
>
> -		for (i = first_pte; i < last_pte; i++)
> +		for (i = pte; i < last_pte; i++) {
>  			pt_vaddr[i] = scratch_pte;
> +			num_entries--;
> +		}
>
>  		kunmap_atomic(pt_vaddr);
>
> -		num_entries -= last_pte - first_pte;
> -		first_pte = 0;
> -		act_pt++;
> +		pte = 0;
> +		if (++pde == GEN8_PDES_PER_PAGE) {
> +			pdpe++;
> +			pde = 0;
> +		}
>  	}
>  }
>
> @@ -298,38 +314,59 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
>  	struct i915_hw_ppgtt *ppgtt =
>  		container_of(vm, struct i915_hw_ppgtt, base);
>  	gen8_gtt_pte_t *pt_vaddr;
> -	unsigned first_entry = start >> PAGE_SHIFT;
> -	unsigned act_pt = first_entry / GEN8_PTES_PER_PAGE;
> -	unsigned act_pte = first_entry % GEN8_PTES_PER_PAGE;
> +	unsigned pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
> +	unsigned pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
> +	unsigned pte = start >> GEN8_PTE_SHIFT & GEN8_PTE_MASK;
>  	struct sg_page_iter sg_iter;
>
>  	pt_vaddr = NULL;
> +
>  	for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
> +		if (WARN_ON(pdpe >= GEN8_LEGACY_PDPS))
> +			break;
> +
>  		if (pt_vaddr == NULL)
> -			pt_vaddr = kmap_atomic(&ppgtt->gen8_pt_pages[act_pt]);
> +			pt_vaddr = kmap_atomic(ppgtt->gen8_pt_pages[pdpe][pde]);
>
> -		pt_vaddr[act_pte] =
> +		pt_vaddr[pte] =
>  			gen8_pte_encode(sg_page_iter_dma_address(&sg_iter),
>  					cache_level, true);
> -		if (++act_pte == GEN8_PTES_PER_PAGE) {
> +		if (++pte == GEN8_PTES_PER_PAGE) {
>  			kunmap_atomic(pt_vaddr);
>  			pt_vaddr = NULL;
> -			act_pt++;
> -			act_pte = 0;
> +			if (pde + 1 == GEN8_PDES_PER_PAGE) {
> +				pdpe++;
> +				pde = 0;
> +			}
> +			pte = 0;
>  		}
>  	}
>  	if (pt_vaddr)
>  		kunmap_atomic(pt_vaddr);
>  }
>
> -static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
> +static void gen8_free_page_tables(struct page **pt_pages)
>  {
>  	int i;
>
> -	for (i = 0; i < ppgtt->num_pd_pages ; i++)
> +	if (pt_pages == NULL)
> +		return;
> +
> +	for (i = 0; i < GEN8_PDES_PER_PAGE; i++)
> +		if (pt_pages[i])
> +			__free_pages(pt_pages[i], 0);
> +}
> +
> +static void gen8_ppgtt_free(const struct i915_hw_ppgtt *ppgtt)
> +{
> +	int i;
> +
> +	for (i = 0; i < ppgtt->num_pd_pages; i++) {
> +		gen8_free_page_tables(ppgtt->gen8_pt_pages[i]);
> +		kfree(ppgtt->gen8_pt_pages[i]);
>  		kfree(ppgtt->gen8_pt_dma_addr[i]);
> +	}
>
> -	__free_pages(ppgtt->gen8_pt_pages, get_order(ppgtt->num_pt_pages << PAGE_SHIFT));
>  	__free_pages(ppgtt->pd_pages, get_order(ppgtt->num_pd_pages << PAGE_SHIFT));
>  }
>
> @@ -368,20 +405,61 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
>  	gen8_ppgtt_free(ppgtt);
>  }
>
> +static struct page **__gen8_alloc_page_tables(void)
> +{
> +	struct page **pt_pages;
> +	int i;
> +
> +	pt_pages = kcalloc(GEN8_PDES_PER_PAGE, sizeof(struct page *), GFP_KERNEL);
> +	if (!pt_pages)
> +		return ERR_PTR(-ENOMEM);
> +
> +	for (i = 0; i < GEN8_PDES_PER_PAGE; i++) {
> +		pt_pages[i] = alloc_page(GFP_KERNEL);
> +		if (!pt_pages[i])
> +			goto bail;
> +	}
> +
> +	return pt_pages;
> +
> +bail:
> +	gen8_free_page_tables(pt_pages);
> +	kfree(pt_pages);
> +	return ERR_PTR(-ENOMEM);
> +}
> +
>  static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt,
>  					   const int max_pdp)
>  {
> -	struct page *pt_pages;
> +	struct page **pt_pages[GEN8_LEGACY_PDPS];
>  	const int num_pt_pages = GEN8_PDES_PER_PAGE * max_pdp;
> +	int i, ret;
>
> -	pt_pages = alloc_pages(GFP_KERNEL, get_order(num_pt_pages << PAGE_SHIFT));
> -	if (!pt_pages)
> -		return -ENOMEM;
> +	for (i = 0; i < max_pdp; i++) {
> +		pt_pages[i] = __gen8_alloc_page_tables();
> +		if (IS_ERR(pt_pages[i])) {
> +			ret = PTR_ERR(pt_pages[i]);
> +			goto unwind_out;
> +		}
> +	}
> +
> +	/* NB: Avoid touching gen8_pt_pages until last to keep the allocation,
> +	 * "atomic" - for cleanup purposes.
> + */
> +	for (i = 0; i < max_pdp; i++)
> +		ppgtt->gen8_pt_pages[i] = pt_pages[i];
>
> -	ppgtt->gen8_pt_pages = pt_pages;
>  	ppgtt->num_pt_pages = 1 << get_order(num_pt_pages << PAGE_SHIFT);
>
>  	return 0;
> +
> +unwind_out:
> +	while (i--) {
> +		gen8_free_page_tables(pt_pages[i]);
> +		kfree(pt_pages[i]);
> +	}
> +
> +	return ret;
>  }
>
>  static int gen8_ppgtt_allocate_dma(struct i915_hw_ppgtt *ppgtt)
> @@ -463,7 +541,7 @@ static int gen8_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt,
>  	struct page *p;
>  	int ret;
>
> -	p = &ppgtt->gen8_pt_pages[pd * GEN8_PDES_PER_PAGE + pt];
> +	p = ppgtt->gen8_pt_pages[pd][pt];
>  	pt_addr = pci_map_page(ppgtt->base.dev->pdev,
>  			       p, 0, PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
>  	ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pt_addr);