public inbox for intel-gfx@lists.freedesktop.org
From: Ben Widawsky <benjamin.widawsky@intel.com>
To: Intel GFX <intel-gfx@lists.freedesktop.org>
Cc: Anthony Bernecky <anthony.bernecky@intel.com>,
	Ben Widawsky <benjamin.widawsky@intel.com>,
	mesa-dev <mesa-dev@lists.freedesktop.org>
Subject: [PATCH 00/68] Broadwell 48b addressing and prelocations (no relocs)
Date: Thu, 21 Aug 2014 20:11:23 -0700
Message-ID: <1408677155-1840-1-git-send-email-benjamin.widawsky@intel.com>

The primary goal of these patches is to introduce what I've started
calling "prelocations" on Broadwell. A prelocation is like a
relocation, except that the client, not the kernel, chooses the address.
When a GPU client specifies a prelocation, it is instructing the kernel
where in the GPU address space the buffer should be mapped. The mechanism
works very similarly to a relocation, except that it uses the execbuffer
object to obtain the offset, and binds the object there if needed. If a GPU
client uses only prelocations, the relocation process can be skipped
entirely. This sounds like a big win initially, but realistically, with
full PPGTT and a 48b address space, it's unlikely to noticeably improve
anything. This work does leave address space allocation up to
libc/malloc [1] instead of drm_mm, which I believe has some upside given
the overhead of creating new VMAs. Independent of prelocations, dynamic
page table allocation by itself can save measurable memory on systems
running multiple GPU clients. As previously mentioned, this kind of thing is
needed for OCL 2.0 SVM. One other advantage I've discussed with Ken... [2].
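The prelocation idea can be modeled in a few lines of plain C. This is a minimal userspace sketch, not the kernel or libdrm code: the flag `HYPOTHETICAL_EXEC_OBJECT_PRELOCATED`, the struct `exec_object`, and both helpers are hypothetical names invented for illustration, and the real execbuffer uAPI layout differs. It illustrates the two points above: the client reuses an address its CPU allocator already handed out as the GPU offset, and the relocation pass is only skippable when every object in the batch is prelocated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* HYPOTHETICAL flag: the client chose obj->offset itself and asks the
 * kernel to bind the object at exactly that GPU address. */
#define HYPOTHETICAL_EXEC_OBJECT_PRELOCATED (1u << 0)

/* Simplified, hypothetical execbuffer object entry (not the real uAPI). */
struct exec_object {
	uint64_t offset;      /* GPU virtual address requested by the client */
	uint64_t size;        /* object size in bytes */
	uint32_t flags;
	uint32_t reloc_count; /* relocation entries attached to this object */
};

/* SVM-style mirroring: the GPU offset is simply the CPU address that
 * malloc (or mmap) already handed out, so no drm_mm search is needed. */
static struct exec_object prelocate_from_cpu_ptr(const void *ptr, uint64_t size)
{
	struct exec_object obj = {
		.offset = (uint64_t)(uintptr_t)ptr,
		.size = size,
		.flags = HYPOTHETICAL_EXEC_OBJECT_PRELOCATED,
		.reloc_count = 0,
	};
	return obj;
}

/* The relocation pass can be skipped entirely only when every object is
 * prelocated and none of them carries relocation entries. */
static bool can_skip_relocations(const struct exec_object *objs, size_t count)
{
	assert(objs != NULL || count == 0);
	for (size_t i = 0; i < count; i++) {
		if (!(objs[i].flags & HYPOTHETICAL_EXEC_OBJECT_PRELOCATED))
			return false;
		if (objs[i].reloc_count != 0)
			return false;
	}
	return true;
}
```

A single object that still relies on a relocation (or lacks the flag) forces the whole batch back onto the relocation path, which is why the gain only materializes for clients that prelocate everything.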

The difficult part of enabling this [for 64b platforms] is supporting the
48b address space. As mentioned in previous versions of this cover
letter, and in my blog post [3], it's not feasible to allocate page tables
for the entire 48b address space. Dynamic page table allocation and teardown
required a lot of plumbing and rework, and to make the interfaces as
neat as possible, I had to put a good deal of work into GEN7 PPGTT as
well. The other really difficult part is taking the malloc'd memory and
turning it into GPU-usable pages. Luckily, Chris already did that for me
with userptr, so I simply reused his work.
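The "not feasible" claim follows from simple arithmetic. This sketch assumes the GEN8 4-level layout of 4 KiB table pages holding 512 8-byte entries, so each level resolves 9 address bits (9 + 9 + 9 + 9 + 12 = 48); the helper names are hypothetical. Leaf page tables alone for a fully populated 48b space come to 2^27 tables of 4 KiB, i.e. 512 GiB, versus 8 MiB for a 32b space, while an on-demand scheme only needs one page table per 2 MiB region a client actually touches.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT    12u /* 4 KiB pages */
#define LEVEL_BITS    9u  /* 512 entries of 8 bytes per 4 KiB table */
#define LEVEL_MASK    ((1u << LEVEL_BITS) - 1)
#define PT_SPAN_SHIFT (PAGE_SHIFT + LEVEL_BITS) /* one page table maps 2 MiB */

/* Hypothetical helper type: per-level indices of a 48-bit GPU address. */
struct va48_indices {
	unsigned pml4e, pdpe, pde, pte, offset;
};

/* Split a 48-bit address: PML4 | PDP | PD | PT | page offset. */
static struct va48_indices split_va48(uint64_t va)
{
	struct va48_indices idx;

	assert(va < (1ull << 48));
	idx.offset = (unsigned)(va & ((1u << PAGE_SHIFT) - 1));
	idx.pte    = (unsigned)(va >> PAGE_SHIFT) & LEVEL_MASK;
	idx.pde    = (unsigned)(va >> (PAGE_SHIFT + LEVEL_BITS)) & LEVEL_MASK;
	idx.pdpe   = (unsigned)(va >> (PAGE_SHIFT + 2 * LEVEL_BITS)) & LEVEL_MASK;
	idx.pml4e  = (unsigned)(va >> (PAGE_SHIFT + 3 * LEVEL_BITS)) & LEVEL_MASK;
	return idx;
}

/* Bytes of leaf page tables needed to map all of a 2^va_bits address
 * space up front: one 4 KiB table per 2 MiB of address space. */
static uint64_t leaf_table_bytes(unsigned va_bits)
{
	assert(va_bits >= PT_SPAN_SHIFT && va_bits <= 48);
	return (1ull << (va_bits - PT_SPAN_SHIFT)) << PAGE_SHIFT;
}

/* Page tables an on-demand scheme needs for [start, start + length):
 * one per distinct 2 MiB region the range touches. */
static uint64_t tables_needed(uint64_t start, uint64_t length)
{
	assert(length > 0);
	uint64_t last = start + length - 1;
	return (last >> PT_SPAN_SHIFT) - (start >> PT_SPAN_SHIFT) + 1;
}
```

The upper levels (page directories, PDPs, the PML4) add comparatively little, so the leaf tables dominate the cost of preallocation.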

The kernel patches are lightly tested at best. Previous iterations of
this series were more thoroughly tested, but enough has changed since
then that I would assume the code is unstable. If, miraculously, it is
almost stable, there are still a lot of cosmetic things to clean up, and
a performance optimization to avoid re-mapping already mapped objects.
I started on a patch to do this but ran into too many stability problems
(see "Optimize PDP loads" in previous posts). Memory leaks were likely
introduced with the dynamic page tables; plugging those would be nice.
One could also implement the reaper I refer to in the comments.

Accompanying the kernel prelocation support are libdrm patches, an
intel-gpu-tools test, and a mesa patch. Some parts of the code are in
rough shape and were meant for demonstration only. The userspace
components in particular were mostly meant as sample code. [4]

The series is fundamentally 5 parts, with some bleeding between parts 2-3
and 3-4.

1. [00-18] Provide fixes to make a stable branch for testing with full
PPGTT. I've previously posted this as a separate series. In the
meantime, many similar fixes have gone in, and some of these may be
dropped. So this is mostly here for completeness.

2. [19-42] Rework code to avoid as much future churn
as possible.  Nothing special here. Some of this is arguably #3.

3. [43-46] Make page table allocations dynamic. I tried to keep this
generic, but since the current code supported very specific page table
depths, it's really mostly GEN7.

4. [47-66] GEN8 dynamic page table support with 64b page table support.
This was very hard to split up, and is definitely the majority of the
work.

5. [67] A basic SVM interface. I opted not to use the create2 IOCTL since
there are patches for that already, and I wanted to have something
that's as reusable as possible.

X. The rest are workarounds/libdrm/mesa/igt.
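Parts 3 and 4 replace up-front page table allocation with allocation on first use and teardown once a table empties. The following is a minimal userspace model of that idea for a single page directory, not the i915 implementation: the structures and function names are hypothetical, and a per-table use counter stands in for the bitmap tracking the GEN8 patches introduce. Bit 0 is used as a stand-in "present" bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define PTES_PER_PT 512u
#define PDES_PER_PD 512u
#define PT_SHIFT    12u /* 4 KiB pages */
#define PD_SHIFT    21u /* each page table maps 2 MiB */

/* Hypothetical page table: 512 64-bit PTEs plus how many are in use. */
struct page_table {
	uint64_t pte[PTES_PER_PT];
	unsigned used;
};

/* Hypothetical page directory covering 1 GiB; tables appear on demand. */
struct page_directory {
	struct page_table *pt[PDES_PER_PD];
	unsigned allocated; /* number of live page tables */
};

/* Map one 4 KiB page at va (< 1 GiB), allocating the page table for that
 * 2 MiB region on first use. Returns false on allocation failure. */
static bool pd_map_page(struct page_directory *pd, uint64_t va, uint64_t dma_addr)
{
	unsigned pde = (unsigned)(va >> PD_SHIFT) & (PDES_PER_PD - 1);
	unsigned pte = (unsigned)(va >> PT_SHIFT) & (PTES_PER_PT - 1);
	struct page_table *pt;

	assert(va < (1ull << 30));
	pt = pd->pt[pde];
	if (!pt) {
		pt = calloc(1, sizeof(*pt));
		if (!pt)
			return false;
		pd->pt[pde] = pt;
		pd->allocated++;
	}
	if (pt->pte[pte] == 0)
		pt->used++; /* remapping a live PTE does not bump the count */
	pt->pte[pte] = dma_addr | 1; /* bit 0: stand-in "present" bit */
	return true;
}

/* Unmap one page and free its page table as soon as it becomes empty. */
static void pd_unmap_page(struct page_directory *pd, uint64_t va)
{
	unsigned pde = (unsigned)(va >> PD_SHIFT) & (PDES_PER_PD - 1);
	unsigned pte = (unsigned)(va >> PT_SHIFT) & (PTES_PER_PT - 1);
	struct page_table *pt = pd->pt[pde];

	if (!pt || pt->pte[pte] == 0)
		return;
	pt->pte[pte] = 0;
	if (--pt->used == 0) {
		free(pt);
		pd->pt[pde] = NULL;
		pd->allocated--;
	}
}
```

Freeing a table the moment its count reaches zero keeps this model leak-free, at the cost of possibly reallocating the same table soon after; a deferred reaper would trade that churn for delayed freeing.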

Kernel:
http://cgit.freedesktop.org/~bwidawsk/drm-intel/log/?h=prelocate
libdrm:
http://cgit.freedesktop.org/~bwidawsk/drm/log/?h=prelocate
mesa:
http://cgit.freedesktop.org/~bwidawsk/mesa/log/?h=prelocate
IGT:
http://cgit.freedesktop.org/~bwidawsk/intel-gpu-tools/log/?h=prelocate

Final thoughts:
* Due to time pressure, I was not able to go back and test on GEN7.
The original patches I posted back in March did work fine on GEN7, but I
cannot speak to their quality now. That said, I did the work, so I figured
I may as well provide it. For the sake of progress, someone should
test/fix GEN7, or simply drop the GEN7 support.

* Broadwell currently hangs with this patch series when I run piglit.
I have gone through plenty of software bugs, and this current hang is
baffling. Therefore I think it makes sense to put dynamic page table
allocation behind either a module parameter or a CONFIG_ option until
that's solved.

* Again on stability, this series introduces a lot of extra flushes. I
believe if we can figure out the cause of some of these issues, we can
remove some of those flushes.

* I haven't tested aliasing-PPGTT-only mode in a while. Someone should do that.

* I'll bet 32b is broken.

* A lot of issues I had were related to the complexities of dealing with
legacy contexts. It's possible, and I am hopeful, that with execlists
these issues go away, and so do the hangs.

* The patches have been rebased SOOOOO many times that they really need to
be reviewed closely to make sure they're bisectable. They were at one
time, but I doubt it's the case now.



[1] We have to use mmap in certain situations due to a hardware
limitation. I'm not sure how libc manages these things together. I hope
it's efficient...

[2] We can potentially always set the state base address to 0 and rely on HW
contexts to save and restore this information, thus eliminating this
non-pipelined state upload. It turns out this is not possible in all
cases because of hardware limitations, but it's a neat idea that someone
can possibly turn into something useful. It's also probably a premature
optimization given how many PIPE_CONTROL stalls we have.

[3] https://bwidawsk.net/blog/index.php/2014/07/future-ppgtt-part-4-dynamic-page-table-allocations-64-bit-address-space-gpu-mirroring-and-yeah-something-about-relocs-too/

[4] This was the best I could do on short notice. I won't be improving,
rebasing, or fixing these patches any longer, but someone is welcome to take
them over. Consider this my parting gift before I go on sabbatical [tomorrow].

--

Ben Widawsky (68):
  drm/i915: Split up do_switch
  drm/i915: Extract l3 remapping out of ctx switch
  drm/i915/ppgtt: Load address space after mi_set_context
  drm/i915: Fix another another use-after-free in do_switch
  drm/i915/ctx: Return earlier on failure
  drm/i915/error: vma error capture prettyify
  drm/i915/error: Do a better job of disambiguating VMAs
  drm/i915/error: Capture vmas instead of BOs
  drm/i915: Add some extra guards in evict_vm
  drm/i915: Make an uninterruptible evict
  drm/i915: More correct (slower) ppgtt cleanup
  drm/i915: Defer PPGTT cleanup
  drm/i915/bdw: Enable full PPGTT
  drm/i915: Get the error state over the wire (HACKish)
  drm/i915/gen8: Invalidate TLBs before PDP reload
  drm/i915: Remove false assertion in ppgtt_release
  Revert "drm/i915/bdw: Use timeout mode for RC6 on bdw"
  drm/i915/trace: Fix offsets for 64b
  drm/i915: Wrap VMA binding
  drm/i915: Make pin global flags explicit
  drm/i915: Split out aliasing binds
  drm/i915: fix gtt_total_entries()
  drm/i915: Rename to GEN8_LEGACY_PDPES
  drm/i915: Split out verbose PPGTT dumping
  drm/i915: s/pd/pdpe, s/pt/pde
  drm/i915: rename map/unmap to dma_map/unmap
  drm/i915: Setup less PPGTT on failed pagedir
  drm/i915: clean up PPGTT init error path
  drm/i915: Un-hardcode number of page directories
  drm/i915: Make gen6_write_pdes gen6_map_page_tables
  drm/i915: Range clearing is PPGTT agnostic
  drm/i915: Page table helpers, and define renames
  drm/i915: construct page table abstractions
  drm/i915: Complete page table structures
  drm/i915: Create page table allocators
  drm/i915: Generalize GEN6 mapping
  drm/i915: Clean up pagetable DMA map & unmap
  drm/i915: Always dma map page table allocations
  drm/i915: Consolidate dma mappings
  drm/i915: Always dma map page directory allocations
  drm/i915: Track GEN6 page table usage
  drm/i915: Extract context switch skip logic
  drm/i915: Track page table reload need
  drm/i915: Initialize all contexts
  drm/i915: Finish gen6/7 dynamic page table allocation
  drm/i915/bdw: Use dynamic allocation idioms on free
  drm/i915/bdw: pagedirs rework allocation
  drm/i915/bdw: pagetable allocation rework
  drm/i915/bdw: Make the pdp switch a bit less hacky
  drm/i915: num_pd_pages/num_pd_entries isn't useful
  drm/i915: Extract PPGTT param from pagedir alloc
  drm/i915/bdw: Split out mappings
  drm/i915/bdw: begin bitmap tracking
  drm/i915/bdw: Dynamic page table allocations
  drm/i915/bdw: Make pdp allocation more dynamic
  drm/i915/bdw: Abstract PDP usage
  drm/i915/bdw: Add dynamic page trace events
  drm/i915/bdw: Add ppgtt info for dynamic pages
  drm/i915/bdw: implement alloc/teardown for 4lvl
  drm/i915/bdw: Add 4 level switching infrastructure
  drm/i915/bdw: Generalize PTE writing for GEN8 PPGTT
  drm/i915: Plumb sg_iter through va allocation ->maps
  drm/i915: Introduce map and unmap for VMAs
  drm/i915: Depend exclusively on map and unmap_vma
  drm/i915: Expand error state's address width to 64b
  drm/i915/bdw: Flip the 48b switch
  drm/i915: Provide a soft_pin hook
  XXX: drm/i915: Unexplained workarounds

 drivers/gpu/drm/i915/i915_debugfs.c        |  114 +-
 drivers/gpu/drm/i915/i915_drv.h            |   61 +-
 drivers/gpu/drm/i915/i915_gem.c            |  231 +++-
 drivers/gpu/drm/i915/i915_gem_context.c    |  276 ++++-
 drivers/gpu/drm/i915/i915_gem_evict.c      |   39 +-
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |   27 +-
 drivers/gpu/drm/i915/i915_gem_gtt.c        | 1838 +++++++++++++++++++++-------
 drivers/gpu/drm/i915/i915_gem_gtt.h        |  379 +++++-
 drivers/gpu/drm/i915/i915_gem_stolen.c     |    2 +-
 drivers/gpu/drm/i915/i915_gem_userptr.c    |    7 +-
 drivers/gpu/drm/i915/i915_gpu_error.c      |  171 ++-
 drivers/gpu/drm/i915/i915_reg.h            |    1 +
 drivers/gpu/drm/i915/i915_sysfs.c          |    2 +-
 drivers/gpu/drm/i915/i915_trace.h          |  156 ++-
 drivers/gpu/drm/i915/intel_pm.c            |   16 +-
 drivers/gpu/drm/i915/intel_ringbuffer.c    |    2 +-
 include/uapi/drm/i915_drm.h                |    3 +-
 17 files changed, 2588 insertions(+), 737 deletions(-)

-- 
2.0.4

Thread overview: 85+ messages
2014-08-22  3:11 Ben Widawsky [this message]
2014-08-22  3:11 ` [PATCH 01/68] drm/i915: Split up do_switch Ben Widawsky
2014-08-22  3:11 ` [PATCH 02/68] drm/i915: Extract l3 remapping out of ctx switch Ben Widawsky
2014-08-22  3:11 ` [PATCH 03/68] drm/i915/ppgtt: Load address space after mi_set_context Ben Widawsky
2014-08-22  3:11 ` [PATCH 04/68] drm/i915: Fix another another use-after-free in do_switch Ben Widawsky
2014-08-22  3:11 ` [PATCH 05/68] drm/i915/ctx: Return earlier on failure Ben Widawsky
2014-08-22  3:11 ` [PATCH 06/68] drm/i915/error: vma error capture prettyify Ben Widawsky
2014-08-22  3:11 ` [PATCH 07/68] drm/i915/error: Do a better job of disambiguating VMAs Ben Widawsky
2014-08-22  3:11 ` [PATCH 08/68] drm/i915/error: Capture vmas instead of BOs Ben Widawsky
2014-08-22  3:11 ` [PATCH 09/68] drm/i915: Add some extra guards in evict_vm Ben Widawsky
2014-08-22  3:11 ` [PATCH 10/68] drm/i915: Make an uninterruptible evict Ben Widawsky
2014-08-22  3:11 ` [PATCH 11/68] drm/i915: More correct (slower) ppgtt cleanup Ben Widawsky
2014-08-22  3:11 ` [PATCH 12/68] drm/i915: Defer PPGTT cleanup Ben Widawsky
2014-08-22  3:11 ` [PATCH 13/68] drm/i915/bdw: Enable full PPGTT Ben Widawsky
2014-08-22  3:11 ` [PATCH 14/68] drm/i915: Get the error state over the wire (HACKish) Ben Widawsky
2014-08-22  3:11 ` [PATCH 15/68] drm/i915/gen8: Invalidate TLBs before PDP reload Ben Widawsky
2014-08-22  3:11 ` [PATCH 16/68] drm/i915: Remove false assertion in ppgtt_release Ben Widawsky
2014-08-22  3:11 ` [PATCH 17/68] Revert "drm/i915/bdw: Use timeout mode for RC6 on bdw" Ben Widawsky
2014-10-31 19:45   ` Rodrigo Vivi
2014-10-31 21:10     ` Rodrigo Vivi
2014-08-22  3:11 ` [PATCH 18/68] drm/i915/trace: Fix offsets for 64b Ben Widawsky
2014-08-22  3:11 ` [PATCH 19/68] drm/i915: Wrap VMA binding Ben Widawsky
2014-08-22  3:11 ` [PATCH 20/68] drm/i915: Make pin global flags explicit Ben Widawsky
2014-08-22  3:11 ` [PATCH 21/68] drm/i915: Split out aliasing binds Ben Widawsky
2014-08-22  3:11 ` [PATCH 22/68] drm/i915: fix gtt_total_entries() Ben Widawsky
2014-08-22  3:11 ` [PATCH 23/68] drm/i915: Rename to GEN8_LEGACY_PDPES Ben Widawsky
2014-08-22  3:11 ` [PATCH 24/68] drm/i915: Split out verbose PPGTT dumping Ben Widawsky
2014-08-22  3:11 ` [PATCH 25/68] drm/i915: s/pd/pdpe, s/pt/pde Ben Widawsky
2014-08-22  3:11 ` [PATCH 26/68] drm/i915: rename map/unmap to dma_map/unmap Ben Widawsky
2014-08-22  3:11 ` [PATCH 27/68] drm/i915: Setup less PPGTT on failed pagedir Ben Widawsky
2014-08-22  3:11 ` [PATCH 28/68] drm/i915: clean up PPGTT init error path Ben Widawsky
2014-08-22  3:11 ` [PATCH 29/68] drm/i915: Un-hardcode number of page directories Ben Widawsky
2014-08-22  3:11 ` [PATCH 30/68] drm/i915: Make gen6_write_pdes gen6_map_page_tables Ben Widawsky
2014-08-22  3:11 ` [PATCH 31/68] drm/i915: Range clearing is PPGTT agnostic Ben Widawsky
2014-08-22  3:11 ` [PATCH 32/68] drm/i915: Page table helpers, and define renames Ben Widawsky
2014-08-22  3:11 ` [PATCH 33/68] drm/i915: construct page table abstractions Ben Widawsky
2014-08-22  3:11 ` [PATCH 34/68] drm/i915: Complete page table structures Ben Widawsky
2014-08-22  3:11 ` [PATCH 35/68] drm/i915: Create page table allocators Ben Widawsky
2014-08-22  3:11 ` [PATCH 36/68] drm/i915: Generalize GEN6 mapping Ben Widawsky
2014-08-22  3:12 ` [PATCH 37/68] drm/i915: Clean up pagetable DMA map & unmap Ben Widawsky
2014-08-22  3:12 ` [PATCH 38/68] drm/i915: Always dma map page table allocations Ben Widawsky
2014-08-22  3:12 ` [PATCH 39/68] drm/i915: Consolidate dma mappings Ben Widawsky
2014-08-22  3:12 ` [PATCH 40/68] drm/i915: Always dma map page directory allocations Ben Widawsky
2014-08-22  3:12 ` [PATCH 41/68] drm/i915: Track GEN6 page table usage Ben Widawsky
2014-08-22  3:12 ` [PATCH 42/68] drm/i915: Extract context switch skip logic Ben Widawsky
2014-08-22  3:12 ` [PATCH 43/68] drm/i915: Track page table reload need Ben Widawsky
2014-08-22  3:12 ` [PATCH 44/68] drm/i915: Initialize all contexts Ben Widawsky
2014-08-22  3:12 ` [PATCH 45/68] drm/i915: Finish gen6/7 dynamic page table allocation Ben Widawsky
2014-08-22  3:12 ` [PATCH 46/68] drm/i915/bdw: Use dynamic allocation idioms on free Ben Widawsky
2014-08-22  3:12 ` [PATCH 47/68] drm/i915/bdw: pagedirs rework allocation Ben Widawsky
2014-08-22  3:12 ` [PATCH 48/68] drm/i915/bdw: pagetable allocation rework Ben Widawsky
2014-08-22  3:12 ` [PATCH 49/68] drm/i915/bdw: Make the pdp switch a bit less hacky Ben Widawsky
2014-08-22  3:12 ` [PATCH 50/68] drm/i915: num_pd_pages/num_pd_entries isn't useful Ben Widawsky
2014-08-22  3:12 ` [PATCH 51/68] drm/i915: Extract PPGTT param from pagedir alloc Ben Widawsky
2014-08-22  3:12 ` [PATCH 52/68] drm/i915/bdw: Split out mappings Ben Widawsky
2014-08-22  3:12 ` [PATCH 53/68] drm/i915/bdw: begin bitmap tracking Ben Widawsky
2014-08-22  3:12 ` [PATCH 54/68] drm/i915/bdw: Dynamic page table allocations Ben Widawsky
2014-08-22  3:12 ` [PATCH 55/68] drm/i915/bdw: Make pdp allocation more dynamic Ben Widawsky
2014-08-22  3:12 ` [PATCH 56/68] drm/i915/bdw: Abstract PDP usage Ben Widawsky
2014-08-22  3:12 ` [PATCH 57/68] drm/i915/bdw: Add dynamic page trace events Ben Widawsky
2014-08-22  3:12 ` [PATCH 58/68] drm/i915/bdw: Add ppgtt info for dynamic pages Ben Widawsky
2014-08-22  3:12 ` [PATCH 59/68] drm/i915/bdw: implement alloc/teardown for 4lvl Ben Widawsky
2014-08-22  3:12 ` [PATCH 60/68] drm/i915/bdw: Add 4 level switching infrastructure Ben Widawsky
2014-08-22  3:12 ` [PATCH 61/68] drm/i915/bdw: Generalize PTE writing for GEN8 PPGTT Ben Widawsky
2014-08-22  3:12 ` [PATCH 62/68] drm/i915: Plumb sg_iter through va allocation ->maps Ben Widawsky
2014-08-22  3:12 ` [PATCH 63/68] drm/i915: Introduce map and unmap for VMAs Ben Widawsky
2014-08-22  3:12 ` [PATCH 64/68] drm/i915: Depend exclusively on map and unmap_vma Ben Widawsky
2014-08-22  3:12 ` [PATCH 65/68] drm/i915: Expand error state's address width to 64b Ben Widawsky
2014-08-22  3:12 ` [PATCH 66/68] drm/i915/bdw: Flip the 48b switch Ben Widawsky
2014-08-22  3:12 ` [PATCH 67/68] drm/i915: Provide a soft_pin hook Ben Widawsky
2014-08-22  3:12 ` [PATCH 68/68] XXX: drm/i915: Unexplained workarounds Ben Widawsky
2014-08-22  3:12 ` [PATCH 1/2] intel: Split out bo allocation Ben Widawsky
2014-08-22  3:12 ` [PATCH 2/2] intel: Add prelocation support Ben Widawsky
2014-08-22  3:12 ` [PATCH] i965: First step toward prelocation Ben Widawsky
2014-08-22 12:15   ` [Mesa-dev] " Alex Deucher
2014-08-22 17:14     ` Ben Widawsky
2014-08-22  3:12 ` [PATCH] no_reloc: test case Ben Widawsky
2014-08-22  6:30 ` [Intel-gfx] [PATCH 00/68] Broadwell 48b addressing and prelocations (no relocs) Chris Wilson
2014-08-22  6:59   ` Kenneth Graunke
2014-08-22  7:03     ` Chris Wilson
2014-08-22 13:30       ` Daniel Vetter
2014-08-22 13:38         ` [Intel-gfx] " Chris Wilson
2014-08-22 20:29           ` Daniel Vetter
2014-08-22 20:38           ` [Intel-gfx] " Daniel Vetter
2014-08-25 22:42             ` Jesse Barnes
