* libdrm relocation reset support
@ 2011-10-24 18:04 Eric Anholt
2011-10-24 18:04 ` [PATCH 1/2] intel: Use stdbool.h for dealing with boolean values Eric Anholt
2011-10-24 18:04 ` [PATCH 2/2] intel: Add an interface for removing relocs after they're added Eric Anholt
0 siblings, 2 replies; 4+ messages in thread
From: Eric Anholt @ 2011-10-24 18:04 UTC (permalink / raw)
To: intel-gfx
Early in the GEM work, we introduced this ugly split in the 3D driver
state setup to support the check_aperture code. We would go through
the GL state and make a list of every BO that would be part of state
setup, and save that, then do check_aperture on them, then go through
state setup again actually emitting the state. The overhead wasn't
huge (after some small optimizations), but the complexity cost on the
code was pretty bad.
While I was getting started on vertex texturing, I was having an awful
time wrapping my mind around the state setup sequence for VS/WM
surface state. I realized that if I couldn't understand it, having
rewritten most of it myself, everyone else was screwed. So, I spent a
while trying to work out how to avoid the split. I realized two
things:
1) We now emit only a single level of relocations.
Back in the pre-state-streaming days, we'd have this big tree of
relocations. The batch pointed to the binding table, which pointed to
the surface state, which pointed to texture data. Now, that chain is
batch pointing to binding-table-in-batch pointing to
surface-state-in-batch, pointing to texture data. So, to do a
checkpoint and restore, we only need to do it on the batchbuffer.
That's way easier than previous plans.
(Tested with an assertion on gen6)
2) We don't care about check_aperture after rolling back the last
primitive emitted.
The plan is to replace this inner loop of state setup:
    for each prim {
        compute bo list
        if (check_aperture_space(bo list)) {
            batch flush
            compute bo list
            if (check_aperture_space(bo list)) {
                whine_about_batch_size()
                fall back;
            }
        }
        upload state to BOs
    }
with this inner loop:
    for each prim {
        retried = false
    retry:
        upload state to BOs
        if (check_aperture_space(batch)) {
            if (!retried) {
                reset_to_last_prim()
                batch flush
                retried = true
                goto retry;
            } else {
                if (batch_flush())
                    whine_about_batch_size()
            }
        }
    }
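
To make the control flow of the new loop concrete, here is a minimal, self-contained C model of it. Everything in it is a hypothetical stand-in rather than driver code: `struct batch`, `APERTURE_LIMIT`, the integer "cost" per primitive, and the helper bodies are assumptions made for illustration. It also assumes the retry happens exactly once per primitive, right after the reset-and-flush, and that a second failure is reported only if the flush itself fails.

```c
#include <assert.h>
#include <stdbool.h>

#define APERTURE_LIMIT 100 /* hypothetical aperture budget, arbitrary units */

/* Hypothetical stand-in for a batchbuffer: only tracks aperture usage. */
struct batch {
    int used;    /* aperture referenced by everything emitted so far */
    int saved;   /* value of 'used' at the last primitive boundary */
    int flushes; /* number of flushes performed, for inspection */
};

/* Nonzero means "does not fit", matching the pseudocode's convention. */
static bool check_aperture_space(const struct batch *b)
{
    return b->used > APERTURE_LIMIT;
}

/* Roll back to the state saved after the last complete primitive. */
static void reset_to_last_prim(struct batch *b)
{
    assert(b->saved <= b->used);
    b->used = b->saved;
}

/* Submit the batch; the mock "kernel" rejects it if it is over budget. */
static int batch_flush(struct batch *b)
{
    int ret = (b->used > APERTURE_LIMIT) ? -1 : 0;

    b->used = 0;
    b->saved = 0;
    b->flushes++;
    return ret;
}

/* Returns how many primitives were too big even for an empty batch. */
static int draw_prims(struct batch *b, const int *cost, int nprims)
{
    int failed = 0;

    for (int i = 0; i < nprims; i++) {
        bool retried = false;
retry:
        b->used += cost[i]; /* "upload state to BOs" */
        if (check_aperture_space(b)) {
            if (!retried) {
                /* Drop this primitive, flush what came before it,
                 * then re-emit it into the fresh batch. */
                reset_to_last_prim(b);
                batch_flush(b);
                retried = true;
                goto retry;
            } else if (batch_flush(b) != 0) {
                failed++; /* whine_about_batch_size() */
            }
        }
        b->saved = b->used; /* checkpoint for the next primitive */
    }
    return failed;
}
```

With per-primitive costs of 40, 40, 40 and 150 against a budget of 100, the third primitive triggers one reset-and-flush and then fits, while the fourth fails even in an empty batch and is the only one reported.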
Note how if we do a reset, we immediately flush. We don't need to
check aperture space -- the kernel will tell us if we actually ran out
of aperture or not. And if we did run out of aperture, it's because
either the single prim was too big, or because check_aperture was
wrong at the point of setting up the last primitive.
This means that all I need to do for reset_to_last_prim() is to roll
back where the next batch emits go (for the final MI_BATCHBUFFER_END
emit), and the set of relocations emitted since the last primitive.
This change gives us the relocations reset required.
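
As a rough illustration of what reset_to_last_prim() has to save and restore, the sketch below models a batch as an emit offset plus a list of relocation targets. The types and helpers (`mock_bo`, `mock_batch`, `batch_checkpoint`, `emit_reloc`, `batch_save`, `batch_reset_to_saved`) are hypothetical and do not come from libdrm or Mesa; `clear_relocs()` only mimics the behaviour of the drm_intel_gem_bo_clear_relocs() call added in patch 2, and, unlike that call, it does not special-case relocations that point back at the batch itself.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_RELOCS   16
#define BATCH_DWORDS 64

/* Hypothetical, simplified buffer object: just a reference count. */
struct mock_bo {
    int refcount;
};

/* Hypothetical batch: emitted dwords plus the relocation target list. */
struct mock_batch {
    uint32_t map[BATCH_DWORDS];
    int used;                                 /* next dword to emit */
    struct mock_bo *reloc_target[MAX_RELOCS];
    int reloc_count;
};

/* Everything needed to roll the batch back to a primitive boundary. */
struct batch_checkpoint {
    int used;
    int reloc_count;
};

/* Emitting a relocation takes a reference on the target BO. */
static void emit_reloc(struct mock_batch *b, struct mock_bo *target)
{
    assert(b->used < BATCH_DWORDS && b->reloc_count < MAX_RELOCS);
    target->refcount++;
    b->reloc_target[b->reloc_count++] = target;
    b->map[b->used++] = 0; /* placeholder for the presumed offset */
}

static struct batch_checkpoint batch_save(const struct mock_batch *b)
{
    struct batch_checkpoint cp = { b->used, b->reloc_count };
    return cp;
}

/* Models clearing relocations from index 'start' onward: drop the
 * references they held, then truncate the list. */
static void clear_relocs(struct mock_batch *b, int start)
{
    assert(b->reloc_count >= start);
    for (int i = start; i < b->reloc_count; i++)
        b->reloc_target[i]->refcount--;
    b->reloc_count = start;
}

static void batch_reset_to_saved(struct mock_batch *b,
                                 struct batch_checkpoint cp)
{
    clear_relocs(b, cp.reloc_count);
    b->used = cp.used;
}
```

A caller would take a checkpoint with batch_save() after each complete primitive and hand it to batch_reset_to_saved() when the aperture check fails, so that only the partially emitted primitive is discarded.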
The simplification in i965 comes out to:
41 files changed, 228 insertions(+), 418 deletions(-)
but I don't think that accurately reflects the benefit.
* [PATCH 1/2] intel: Use stdbool.h for dealing with boolean values.
From: Eric Anholt @ 2011-10-24 18:04 UTC (permalink / raw)
To: intel-gfx
A few of the bitfield-based booleans are left in place. Changing them
to "bool" results in the same code size, so I'm erring on the side of
not changing things.
---
intel/intel_bufmgr_gem.c | 65 +++++++++++++++++++++++----------------------
1 files changed, 33 insertions(+), 32 deletions(-)
diff --git a/intel/intel_bufmgr_gem.c b/intel/intel_bufmgr_gem.c
index 1baa0b3..31ea26e 100644
--- a/intel/intel_bufmgr_gem.c
+++ b/intel/intel_bufmgr_gem.c
@@ -51,6 +51,7 @@
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
+#include <stdbool.h>
#include "errno.h"
#include "libdrm_lists.h"
@@ -105,7 +106,7 @@ typedef struct _drm_intel_bufmgr_gem {
unsigned int has_blt : 1;
unsigned int has_relaxed_fencing : 1;
unsigned int bo_reuse : 1;
- char fenced_relocs;
+ bool fenced_relocs;
} drm_intel_bufmgr_gem;
#define DRM_INTEL_RELOC_FENCE (1<<0)
@@ -163,24 +164,24 @@ struct _drm_intel_bo_gem {
* Boolean of whether this BO and its children have been included in
* the current drm_intel_bufmgr_check_aperture_space() total.
*/
- char included_in_check_aperture;
+ bool included_in_check_aperture;
/**
* Boolean of whether this buffer has been used as a relocation
* target and had its size accounted for, and thus can't have any
* further relocations added to it.
*/
- char used_as_reloc_target;
+ bool used_as_reloc_target;
/**
* Boolean of whether we have encountered an error whilst building the relocation tree.
*/
- char has_error;
+ bool has_error;
/**
* Boolean of whether this buffer can be re-used
*/
- char reusable;
+ bool reusable;
/**
* Size in bytes of this buffer and its relocation descendents.
@@ -507,7 +508,7 @@ drm_intel_setup_reloc_list(drm_intel_bo *bo)
bo_gem->reloc_target_info = malloc(max_relocs *
sizeof(drm_intel_reloc_target));
if (bo_gem->relocs == NULL || bo_gem->reloc_target_info == NULL) {
- bo_gem->has_error = 1;
+ bo_gem->has_error = true;
free (bo_gem->relocs);
bo_gem->relocs = NULL;
@@ -592,12 +593,12 @@ drm_intel_gem_bo_alloc_internal(drm_intel_bufmgr *bufmgr,
unsigned int page_size = getpagesize();
int ret;
struct drm_intel_gem_bo_bucket *bucket;
- int alloc_from_cache;
+ bool alloc_from_cache;
unsigned long bo_size;
- int for_render = 0;
+ bool for_render = false;
if (flags & BO_ALLOC_FOR_RENDER)
- for_render = 1;
+ for_render = true;
/* Round the allocated size up to a power of two number of pages. */
bucket = drm_intel_gem_bo_bucket_for_size(bufmgr_gem, size);
@@ -616,7 +617,7 @@ drm_intel_gem_bo_alloc_internal(drm_intel_bufmgr *bufmgr,
pthread_mutex_lock(&bufmgr_gem->lock);
/* Get a buffer out of the cache if available */
retry:
- alloc_from_cache = 0;
+ alloc_from_cache = false;
if (bucket != NULL && !DRMLISTEMPTY(&bucket->head)) {
if (for_render) {
/* Allocate new render-target BOs from the tail (MRU)
@@ -626,7 +627,7 @@ retry:
bo_gem = DRMLISTENTRY(drm_intel_bo_gem,
bucket->head.prev, head);
DRMLISTDEL(&bo_gem->head);
- alloc_from_cache = 1;
+ alloc_from_cache = true;
} else {
/* For non-render-target BOs (where we're probably
* going to map it first thing in order to fill it
@@ -638,7 +639,7 @@ retry:
bo_gem = DRMLISTENTRY(drm_intel_bo_gem,
bucket->head.next, head);
if (!drm_intel_gem_bo_busy(&bo_gem->bo)) {
- alloc_from_cache = 1;
+ alloc_from_cache = true;
DRMLISTDEL(&bo_gem->head);
}
}
@@ -702,9 +703,9 @@ retry:
atomic_set(&bo_gem->refcount, 1);
bo_gem->validate_index = -1;
bo_gem->reloc_tree_fences = 0;
- bo_gem->used_as_reloc_target = 0;
- bo_gem->has_error = 0;
- bo_gem->reusable = 1;
+ bo_gem->used_as_reloc_target = false;
+ bo_gem->has_error = false;
+ bo_gem->reusable = true;
drm_intel_bo_gem_set_in_aperture_size(bufmgr_gem, bo_gem);
@@ -845,7 +846,7 @@ drm_intel_bo_gem_create_from_name(drm_intel_bufmgr *bufmgr,
bo_gem->gem_handle = open_arg.handle;
bo_gem->bo.handle = open_arg.handle;
bo_gem->global_name = handle;
- bo_gem->reusable = 0;
+ bo_gem->reusable = false;
memset(&get_tiling, 0, sizeof(get_tiling));
get_tiling.handle = bo_gem->gem_handle;
@@ -938,7 +939,7 @@ drm_intel_gem_bo_unreference_final(drm_intel_bo *bo, time_t time)
}
}
bo_gem->reloc_count = 0;
- bo_gem->used_as_reloc_target = 0;
+ bo_gem->used_as_reloc_target = false;
DBG("bo_unreference final: %d (%s)\n",
bo_gem->gem_handle, bo_gem->name);
@@ -1329,28 +1330,28 @@ static int
do_bo_emit_reloc(drm_intel_bo *bo, uint32_t offset,
drm_intel_bo *target_bo, uint32_t target_offset,
uint32_t read_domains, uint32_t write_domain,
- int need_fence)
+ bool need_fence)
{
drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bo->bufmgr;
drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
drm_intel_bo_gem *target_bo_gem = (drm_intel_bo_gem *) target_bo;
- int fenced_command;
+ bool fenced_command;
if (bo_gem->has_error)
return -ENOMEM;
if (target_bo_gem->has_error) {
- bo_gem->has_error = 1;
+ bo_gem->has_error = true;
return -ENOMEM;
}
/* We never use HW fences for rendering on 965+ */
if (bufmgr_gem->gen >= 4)
- need_fence = 0;
+ need_fence = false;
fenced_command = need_fence;
if (target_bo_gem->tiling_mode == I915_TILING_NONE)
- need_fence = 0;
+ need_fence = false;
/* Create a new relocation list if needed */
if (bo_gem->relocs == NULL && drm_intel_setup_reloc_list(bo))
@@ -1368,7 +1369,7 @@ do_bo_emit_reloc(drm_intel_bo *bo, uint32_t offset,
*/
assert(!bo_gem->used_as_reloc_target);
if (target_bo_gem != bo_gem) {
- target_bo_gem->used_as_reloc_target = 1;
+ target_bo_gem->used_as_reloc_target = true;
bo_gem->reloc_tree_size += target_bo_gem->reloc_tree_size;
}
/* An object needing a fence is a tiled buffer, so it won't have
@@ -1419,7 +1420,7 @@ drm_intel_gem_bo_emit_reloc_fence(drm_intel_bo *bo, uint32_t offset,
uint32_t read_domains, uint32_t write_domain)
{
return do_bo_emit_reloc(bo, offset, target_bo, target_offset,
- read_domains, write_domain, 1);
+ read_domains, write_domain, true);
}
/**
@@ -1800,7 +1801,7 @@ drm_intel_gem_bo_flink(drm_intel_bo *bo, uint32_t * name)
if (ret != 0)
return -errno;
bo_gem->global_name = flink.name;
- bo_gem->reusable = 0;
+ bo_gem->reusable = false;
DRMLISTADDTAIL(&bo_gem->name_list, &bufmgr_gem->named);
}
@@ -1821,7 +1822,7 @@ drm_intel_bufmgr_gem_enable_reuse(drm_intel_bufmgr *bufmgr)
{
drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bufmgr;
- bufmgr_gem->bo_reuse = 1;
+ bufmgr_gem->bo_reuse = true;
}
/**
@@ -1837,7 +1838,7 @@ drm_intel_bufmgr_gem_enable_fenced_relocs(drm_intel_bufmgr *bufmgr)
drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *)bufmgr;
if (bufmgr_gem->bufmgr.bo_exec == drm_intel_gem_bo_exec2)
- bufmgr_gem->fenced_relocs = 1;
+ bufmgr_gem->fenced_relocs = true;
}
/**
@@ -1855,7 +1856,7 @@ drm_intel_gem_bo_get_aperture_space(drm_intel_bo *bo)
return 0;
total += bo->size;
- bo_gem->included_in_check_aperture = 1;
+ bo_gem->included_in_check_aperture = true;
for (i = 0; i < bo_gem->reloc_count; i++)
total +=
@@ -1903,7 +1904,7 @@ drm_intel_gem_bo_clear_aperture_space_flag(drm_intel_bo *bo)
if (bo == NULL || !bo_gem->included_in_check_aperture)
return;
- bo_gem->included_in_check_aperture = 0;
+ bo_gem->included_in_check_aperture = false;
for (i = 0; i < bo_gem->reloc_count; i++)
drm_intel_gem_bo_clear_aperture_space_flag(bo_gem->
@@ -2020,7 +2021,7 @@ drm_intel_gem_bo_disable_reuse(drm_intel_bo *bo)
{
drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
- bo_gem->reusable = 0;
+ bo_gem->reusable = false;
return 0;
}
@@ -2116,7 +2117,7 @@ drm_intel_bufmgr_gem_init(int fd, int batch_size)
struct drm_i915_gem_get_aperture aperture;
drm_i915_getparam_t gp;
int ret, tmp;
- int exec2 = 0;
+ bool exec2 = false;
bufmgr_gem = calloc(1, sizeof(*bufmgr_gem));
if (bufmgr_gem == NULL)
@@ -2167,7 +2168,7 @@ drm_intel_bufmgr_gem_init(int fd, int batch_size)
gp.param = I915_PARAM_HAS_EXECBUF2;
ret = drmIoctl(bufmgr_gem->fd, DRM_IOCTL_I915_GETPARAM, &gp);
if (!ret)
- exec2 = 1;
+ exec2 = true;
gp.param = I915_PARAM_HAS_BSD;
ret = drmIoctl(bufmgr_gem->fd, DRM_IOCTL_I915_GETPARAM, &gp);
--
1.7.7
* [PATCH 2/2] intel: Add an interface for removing relocs after they're added.
From: Eric Anholt @ 2011-10-24 18:04 UTC (permalink / raw)
To: intel-gfx
---
intel/intel_bufmgr.h | 2 ++
intel/intel_bufmgr_gem.c | 42 ++++++++++++++++++++++++++++++++++++++++++
2 files changed, 44 insertions(+), 0 deletions(-)
diff --git a/intel/intel_bufmgr.h b/intel/intel_bufmgr.h
index 889ef46..abe9711 100644
--- a/intel/intel_bufmgr.h
+++ b/intel/intel_bufmgr.h
@@ -147,6 +147,8 @@ void drm_intel_bufmgr_gem_enable_reuse(drm_intel_bufmgr *bufmgr);
void drm_intel_bufmgr_gem_enable_fenced_relocs(drm_intel_bufmgr *bufmgr);
int drm_intel_gem_bo_map_gtt(drm_intel_bo *bo);
int drm_intel_gem_bo_unmap_gtt(drm_intel_bo *bo);
+int drm_intel_gem_bo_get_reloc_count(drm_intel_bo *bo);
+void drm_intel_gem_bo_clear_relocs(drm_intel_bo *bo, int start);
void drm_intel_gem_bo_start_gtt_access(drm_intel_bo *bo, int write_enable);
int drm_intel_get_pipe_from_crtc_id(drm_intel_bufmgr *bufmgr, int crtc_id);
diff --git a/intel/intel_bufmgr_gem.c b/intel/intel_bufmgr_gem.c
index 31ea26e..54433ea 100644
--- a/intel/intel_bufmgr_gem.c
+++ b/intel/intel_bufmgr_gem.c
@@ -1423,6 +1423,48 @@ drm_intel_gem_bo_emit_reloc_fence(drm_intel_bo *bo, uint32_t offset,
read_domains, write_domain, true);
}
+int
+drm_intel_gem_bo_get_reloc_count(drm_intel_bo *bo)
+{
+ drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
+
+ return bo_gem->reloc_count;
+}
+
+/**
+ * Removes existing relocation entries in the BO after "start".
+ *
+ * This allows a user to avoid a two-step process for state setup with
+ * counting up all the buffer objects and doing a
+ * drm_intel_bufmgr_check_aperture_space() before emitting any of the
+ * relocations for the state setup. Instead, save the state of the
+ * batchbuffer including drm_intel_gem_bo_get_reloc_count(), emit all the
+ * state, and then check if it still fits in the aperture.
+ *
+ * Any further drm_intel_bufmgr_check_aperture_space() queries
+ * involving this buffer in the tree are undefined after this call.
+ */
+void
+drm_intel_gem_bo_clear_relocs(drm_intel_bo *bo, int start)
+{
+ drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
+ int i;
+ struct timespec time;
+
+ clock_gettime(CLOCK_MONOTONIC, &time);
+
+ assert(bo_gem->reloc_count >= start);
+ /* Unreference the cleared target buffers */
+ for (i = start; i < bo_gem->reloc_count; i++) {
+ if (bo_gem->reloc_target_info[i].bo != bo) {
+ drm_intel_gem_bo_unreference_locked_timed(bo_gem->
+ reloc_target_info[i].bo,
+ time.tv_sec);
+ }
+ }
+ bo_gem->reloc_count = start;
+}
+
/**
* Walk the tree of relocations rooted at BO and accumulate the list of
* validations to be performed and update the relocation buffers with
--
1.7.7
* Re: [PATCH 2/2] intel: Add an interface for removing relocs after they're added.
From: Daniel Vetter @ 2011-10-25 11:51 UTC (permalink / raw)
To: Eric Anholt; +Cc: intel-gfx
Maybe add your nice example from the series intro to the commit msg to
show how this is supposed to be used. Otherwise for both patches:
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
--
Daniel Vetter
Mail: daniel@ffwll.ch
Mobile: +41 (0)79 365 57 48