From: Ben Widawsky <benjamin.widawsky@intel.com>
To: Intel GFX <intel-gfx@lists.freedesktop.org>
Cc: Anthony Bernecky <anthony.bernecky@intel.com>,
Ben Widawsky <benjamin.widawsky@intel.com>,
mesa-dev <mesa-dev@lists.freedesktop.org>
Subject: [PATCH 00/68] Broadwell 48b addressing and prelocations (no relocs)
Date: Thu, 21 Aug 2014 20:11:23 -0700
Message-ID: <1408677155-1840-1-git-send-email-benjamin.widawsky@intel.com>
The primary goal of these patches is to introduce what I've started
calling "prelocations" on Broadwell. A prelocation is like a
relocation, except not. When a GPU client specifies a prelocation, it is
instructing the kernel where in the GPU address space the buffer should
be mapped. The mechanism works very similarly to a relocation, except it
uses the execbuffer object to obtain the offset, and binds if needed. If
a GPU client uses only prelocations, the relocation process can be
skipped entirely. This sounds like a big win initially, but
realistically, with full PPGTT and a 48b address space, it's unlikely to
noticeably improve anything. Doing this work leaves the address space
allocation up to libc/malloc [1] instead of drm_mm, which I believe has
some upside due to the hits on creating new VMAs. Not specific to
prelocations, dynamic page table allocations by themselves can save
measurable memory on systems running multiple GPU clients. As previously
mentioned, this kind of thing is needed for OCL 2.0 SVM. One other
advantage I've discussed with Ken... [2].
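
To make the "skip relocations entirely" condition concrete, here is a
minimal sketch in plain C. It is not the interface from these patches:
the struct, its fields, and the HYP_EXEC_OBJECT_PRELOCATED flag are
hypothetical names invented for illustration. It only models the rule
described above, that relocation processing can be skipped when every
object in the execbuffer carries a userspace-chosen offset.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag: userspace has chosen the GPU address itself. */
#define HYP_EXEC_OBJECT_PRELOCATED (1u << 0)

/* Hypothetical, simplified stand-in for an execbuffer object entry. */
struct hyp_exec_object {
	uint32_t handle;           /* buffer handle */
	uint32_t flags;            /* HYP_EXEC_OBJECT_* */
	uint64_t offset;           /* prelocated: GPU address picked by userspace */
	uint32_t relocation_count; /* number of relocation entries supplied */
};

/*
 * The relocation pass can be skipped only if every object is
 * prelocated and none of them carries relocation entries.
 */
static bool can_skip_relocations(const struct hyp_exec_object *objs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (!(objs[i].flags & HYP_EXEC_OBJECT_PRELOCATED))
			return false;
		if (objs[i].relocation_count != 0)
			return false;
	}
	return true;
}
```

A caller would run can_skip_relocations() over the object list once per
submission and fall back to the normal relocation path as soon as a
single object is not prelocated.
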
The difficult part of enabling this [for 64b platforms] is supporting the
48b address space. As mentioned in previous versions of this cover
letter, and my blog post [3], it's not feasible to allocate the entire 48b
address space's page tables. Dynamic page table allocation and teardown
required a lot of plumbing and rework, and to make the interfaces as
neat as possible, I also had to put a good deal of work into GEN7 PPGTT
as well. The other really difficult part is taking the malloc'd memory
and turning it into GPU usable pages. Luckily, Chris already did that for
me with userptr, so I simply reused his work.
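
As a rough illustration of why the full 48b space cannot be populated up
front, the arithmetic can be sketched in C. The constants below are
assumptions for the sketch (4 KiB pages, 8-byte entries, 512 entries per
table, four levels), and the helper names are made up here. Under those
assumptions, the last-level page tables alone would need 2^36 entries of
8 bytes each, i.e. 512 GiB, versus 8 MiB for a 32b space.

```c
#include <stdint.h>

#define PAGE_SHIFT  12 /* assumed 4 KiB pages */
#define LEVEL_BITS   9 /* assumed 512 entries per table */
#define ENTRY_BYTES  8 /* assumed 64-bit entries */

/* Bytes of last-level page tables needed to map va_bits up front. */
static uint64_t full_pt_bytes(unsigned va_bits)
{
	uint64_t num_ptes = 1ULL << (va_bits - PAGE_SHIFT);

	return num_ptes * ENTRY_BYTES;
}

/*
 * Index into the table at a given level for a GPU virtual address.
 * Level 0 is the page table, level 3 the top-level table.
 */
static unsigned table_index(uint64_t addr, unsigned level)
{
	unsigned shift = PAGE_SHIFT + LEVEL_BITS * level;

	return (unsigned)((addr >> shift) & ((1u << LEVEL_BITS) - 1));
}
```

With dynamic allocation, only the tables that table_index() actually
lands in for mapped ranges need to exist.
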
The kernel patches are lightly tested at best. Previous iterations of
this series were more thoroughly tested, but enough has changed since
then that I would assume the code is unstable. If miraculously it is
almost stable, there are still a lot of cosmetic things to clean up, and
a performance optimization to reduce re-mapping already mapped objects.
I started on a patch to do this but ran into too many stability problems
(see "Optimize PDP loads" from previous posts). It's likely memory leaks
are introduced with the dynamic page tables; plugging those would be nice.
One could also implement the reaper I refer to in the comments.
Alongside the kernel prelocation support are the libdrm patches, an
intel-gpu-tools test, and a mesa patch. Some parts of the code are in
rough shape, and were meant for demonstration only. The userspace
components in particular were mostly meant as sample code. [4]
The series is fundamentally 5 parts, with some bleeding between 2-3 and
3-4.
1. [00-18] Provide fixes to make a stable branch for testing with full
PPGTT. I've previously posted this as a separate series. In the
meantime, many similar fixes have gone in, and some of these may be
dropped. So this is mostly here for completeness.
2. [19-42] Rework code to avoid as much future churn
as possible. Nothing special here. Some of this is arguably #3.
3. [43-46] Make page table allocations dynamic. I tried to keep this
generic, but since the current code supported very specific page table
depths, it's really mostly GEN7.
4. [47-67] GEN8 dynamic page table support with 64b page table support.
This was very hard to split up, and is definitely the majority of the
work.
5. [68] A basic SVM interface. I opted not to use the create2 IOCTL since
there are patches for that already, and I wanted to have something
that's as reusable as possible.
X. The rest are workaround/libdrm/mesa/igt.
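
For readers unfamiliar with the idea behind parts 3 and 4, the following
is a minimal userspace sketch of dynamic page table allocation and
teardown. It is not code from the series: the two-level layout, struct
names, and the "valid" bit are simplifications assumed for illustration,
and calloc/free stand in for the kernel's page allocation. A page table
is allocated only when the first page in its range is mapped, and freed
again when its last entry is cleared.

```c
#include <stdint.h>
#include <stdlib.h>

#define ENTRIES    512
#define PAGE_SHIFT 12
#define PDE_SHIFT  (PAGE_SHIFT + 9)

struct page_table {
	uint64_t pte[ENTRIES];
	unsigned used;                 /* number of valid entries */
};

struct page_dir {
	struct page_table *pt[ENTRIES]; /* NULL until first use */
	unsigned allocated;             /* number of live page tables */
};

/* Map one page, allocating its page table on first use. 0 on success. */
static int pd_map_page(struct page_dir *pd, uint64_t gpu_addr, uint64_t phys)
{
	unsigned pde = (gpu_addr >> PDE_SHIFT) & (ENTRIES - 1);
	unsigned pte = (gpu_addr >> PAGE_SHIFT) & (ENTRIES - 1);

	if (!pd->pt[pde]) {
		pd->pt[pde] = calloc(1, sizeof(struct page_table));
		if (!pd->pt[pde])
			return -1;
		pd->allocated++;
	}
	if (!pd->pt[pde]->pte[pte])
		pd->pt[pde]->used++;
	pd->pt[pde]->pte[pte] = phys | 1; /* bit 0: valid (illustrative) */
	return 0;
}

/* Unmap one page, freeing the page table once it becomes empty. */
static void pd_unmap_page(struct page_dir *pd, uint64_t gpu_addr)
{
	unsigned pde = (gpu_addr >> PDE_SHIFT) & (ENTRIES - 1);
	unsigned pte = (gpu_addr >> PAGE_SHIFT) & (ENTRIES - 1);
	struct page_table *pt = pd->pt[pde];

	if (!pt || !pt->pte[pte])
		return;
	pt->pte[pte] = 0;
	if (--pt->used == 0) {
		free(pt);
		pd->pt[pde] = NULL;
		pd->allocated--;
	}
}
```

The per-table use count plays the role that usage tracking plays in the
real code: without it, teardown cannot tell when a table is safe to free.
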
Kernel:
http://cgit.freedesktop.org/~bwidawsk/drm-intel/log/?h=prelocate
libdrm:
http://cgit.freedesktop.org/~bwidawsk/drm/log/?h=prelocate
mesa:
http://cgit.freedesktop.org/~bwidawsk/mesa/log/?h=prelocate
IGT:
http://cgit.freedesktop.org/~bwidawsk/intel-gpu-tools/log/?h=prelocate
Final thoughts:
* Due to time pressure, the ability to go back and test on GEN7 was lost.
The original patches I posted back in March did work fine on GEN7, but I
cannot speak to the quality now. That said, I did the work, so I figured
I may as well provide it. For the sake of progress, someone should
test/fix GEN7, or simply drop the GEN7 support.
* Broadwell is currently hanging with this patch series when I run piglit.
I have gone through plenty of software bugs, and this current hang is
baffling. Therefore I think it makes sense to put dynamic page table
allocations behind either a module parameter or a CONFIG_ option until
that's solved.
* Again on the stability, there are a lot of extra flushes introduced as a
result of this series. I believe if we can figure out the cause of some
of these issues, we can remove some flushes.
* I haven't tested aliasing PPGTT only in a while. Someone should do that.
* I'll bet 32b is broken.
* A lot of issues I had were related to the complexities of dealing with
legacy contexts. It's possible, and I am hopeful, that with execlists
these issues go away, and so do the hangs.
* The patches have been rebased SOOOOO many times that they really need to
be reviewed closely to make sure they're bisectable. They were at one
time, but I doubt it's the case now.
[1] We have to use mmap in certain situations due to a hardware
limitation. I'm not sure how libc manages these things together. I hope
it's efficient...
[2] We can potentially always set the state base to be 0, and rely on HW
contexts to save/restore this information, thus eliminating this
non-pipelined state upload. It turns out this is not possible for all
cases because of hardware limitations, but it's a neat idea that someone
can possibly turn into something useful. It's also probably a premature
optimization given how many PIPE_CONTROL stalls we have.
[3] https://bwidawsk.net/blog/index.php/2014/07/future-ppgtt-part-4-dynamic-page-table-allocations-64-bit-address-space-gpu-mirroring-and-yeah-something-about-relocs-too/
[4] This was the best I could do on short notice. I won't be improving,
rebasing, or fixing these patches any longer, but someone is welcome to take
them over. Consider this my parting gift before I go on sabbatical [tomorrow].
--
Ben Widawsky (68):
drm/i915: Split up do_switch
drm/i915: Extract l3 remapping out of ctx switch
drm/i915/ppgtt: Load address space after mi_set_context
drm/i915: Fix another another use-after-free in do_switch
drm/i915/ctx: Return earlier on failure
drm/i915/error: vma error capture prettyify
drm/i915/error: Do a better job of disambiguating VMAs
drm/i915/error: Capture vmas instead of BOs
drm/i915: Add some extra guards in evict_vm
drm/i915: Make an uninterruptible evict
drm/i915: More correct (slower) ppgtt cleanup
drm/i915: Defer PPGTT cleanup
drm/i915/bdw: Enable full PPGTT
drm/i915: Get the error state over the wire (HACKish)
drm/i915/gen8: Invalidate TLBs before PDP reload
drm/i915: Remove false assertion in ppgtt_release
Revert "drm/i915/bdw: Use timeout mode for RC6 on bdw"
drm/i915/trace: Fix offsets for 64b
drm/i915: Wrap VMA binding
drm/i915: Make pin global flags explicit
drm/i915: Split out aliasing binds
drm/i915: fix gtt_total_entries()
drm/i915: Rename to GEN8_LEGACY_PDPES
drm/i915: Split out verbose PPGTT dumping
drm/i915: s/pd/pdpe, s/pt/pde
drm/i915: rename map/unmap to dma_map/unmap
drm/i915: Setup less PPGTT on failed pagedir
drm/i915: clean up PPGTT init error path
drm/i915: Un-hardcode number of page directories
drm/i915: Make gen6_write_pdes gen6_map_page_tables
drm/i915: Range clearing is PPGTT agnostic
drm/i915: Page table helpers, and define renames
drm/i915: construct page table abstractions
drm/i915: Complete page table structures
drm/i915: Create page table allocators
drm/i915: Generalize GEN6 mapping
drm/i915: Clean up pagetable DMA map & unmap
drm/i915: Always dma map page table allocations
drm/i915: Consolidate dma mappings
drm/i915: Always dma map page directory allocations
drm/i915: Track GEN6 page table usage
drm/i915: Extract context switch skip logic
drm/i915: Track page table reload need
drm/i915: Initialize all contexts
drm/i915: Finish gen6/7 dynamic page table allocation
drm/i915/bdw: Use dynamic allocation idioms on free
drm/i915/bdw: pagedirs rework allocation
drm/i915/bdw: pagetable allocation rework
drm/i915/bdw: Make the pdp switch a bit less hacky
drm/i915: num_pd_pages/num_pd_entries isn't useful
drm/i915: Extract PPGTT param from pagedir alloc
drm/i915/bdw: Split out mappings
drm/i915/bdw: begin bitmap tracking
drm/i915/bdw: Dynamic page table allocations
drm/i915/bdw: Make pdp allocation more dynamic
drm/i915/bdw: Abstract PDP usage
drm/i915/bdw: Add dynamic page trace events
drm/i915/bdw: Add ppgtt info for dynamic pages
drm/i915/bdw: implement alloc/teardown for 4lvl
drm/i915/bdw: Add 4 level switching infrastructure
drm/i915/bdw: Generalize PTE writing for GEN8 PPGTT
drm/i915: Plumb sg_iter through va allocation ->maps
drm/i915: Introduce map and unmap for VMAs
drm/i915: Depend exclusively on map and unmap_vma
drm/i915: Expand error state's address width to 64b
drm/i915/bdw: Flip the 48b switch
drm/i915: Provide a soft_pin hook
XXX: drm/i915: Unexplained workarounds
drivers/gpu/drm/i915/i915_debugfs.c | 114 +-
drivers/gpu/drm/i915/i915_drv.h | 61 +-
drivers/gpu/drm/i915/i915_gem.c | 231 +++-
drivers/gpu/drm/i915/i915_gem_context.c | 276 ++++-
drivers/gpu/drm/i915/i915_gem_evict.c | 39 +-
drivers/gpu/drm/i915/i915_gem_execbuffer.c | 27 +-
drivers/gpu/drm/i915/i915_gem_gtt.c | 1838 +++++++++++++++++++++-------
drivers/gpu/drm/i915/i915_gem_gtt.h | 379 +++++-
drivers/gpu/drm/i915/i915_gem_stolen.c | 2 +-
drivers/gpu/drm/i915/i915_gem_userptr.c | 7 +-
drivers/gpu/drm/i915/i915_gpu_error.c | 171 ++-
drivers/gpu/drm/i915/i915_reg.h | 1 +
drivers/gpu/drm/i915/i915_sysfs.c | 2 +-
drivers/gpu/drm/i915/i915_trace.h | 156 ++-
drivers/gpu/drm/i915/intel_pm.c | 16 +-
drivers/gpu/drm/i915/intel_ringbuffer.c | 2 +-
include/uapi/drm/i915_drm.h | 3 +-
17 files changed, 2588 insertions(+), 737 deletions(-)
--
2.0.4