* drm/radeon: giving userspace control over hardware engine sync
@ 2014-08-14 16:12 Christian König
2014-08-14 16:12 ` [PATCH 1/4] drm/radeon: add pre and post IB sync Christian König
` (4 more replies)
0 siblings, 5 replies; 11+ messages in thread
From: Christian König @ 2014-08-14 16:12 UTC (permalink / raw)
To: dri-devel
Hello everyone,
this set of patch adds the ability for userspace to better control when synchronization between the different hardware engines on radeon happens. Previously every access to a buffer object was serialized and concurrent execution could only happen if command submissions didn't shared any buffer handle.
Patch #1 in this series adds the ability to not only sync before the IB execution, but also after it before the fence value is written. This is a workaround because TTM currently can't handle multiple fences attached to a single buffer object.
Patch #2 allows concurrent execution of command submission if there is only read only access to the same buffer.
Patch #3 adds a DONT_SYNC flag to each buffer object send to the kernel which allows userspace to explicitly note that concurrent access to a buffer is ok. The usage of this flag is restricted in that way that each operation the client doesn't knows about (eviction, access by other clients etc...) is still implicitly synced to.
Patch #4 adds a DONT_FENCE flag that tells the kernel to sync all operations to a buffer handle, but don't fence that handle with the current command submission. This is necessarily because we currently abuses zero sized buffer objects as fence handles.
Please review and comment,
Christian.
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 1/4] drm/radeon: add pre and post IB sync
2014-08-14 16:12 drm/radeon: giving userspace control over hardware engine sync Christian König
@ 2014-08-14 16:12 ` Christian König
2014-08-14 16:12 ` [PATCH 2/4] drm/radeon: allow concurrent BO access by different engines Christian König
` (3 subsequent siblings)
4 siblings, 0 replies; 11+ messages in thread
From: Christian König @ 2014-08-14 16:12 UTC (permalink / raw)
To: dri-devel
From: Christian König <christian.koenig@amd.com>
This allows us to run different engines concurrently and
still have the fences sync to all operations as currently
required by TTM.
Signed-off-by: Christian König <christian.koenig@amd.com>
---
drivers/gpu/drm/radeon/radeon.h | 3 ++-
drivers/gpu/drm/radeon/radeon_cs.c | 18 ++++++++++++------
drivers/gpu/drm/radeon/radeon_ib.c | 30 ++++++++++++++++++++++--------
drivers/gpu/drm/radeon/radeon_vm.c | 8 ++++----
4 files changed, 40 insertions(+), 19 deletions(-)
diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index e715e0c..4579361 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -800,7 +800,8 @@ struct radeon_ib {
struct radeon_fence *fence;
struct radeon_vm *vm;
bool is_const_ib;
- struct radeon_semaphore *semaphore;
+ struct radeon_semaphore *presync;
+ struct radeon_semaphore *postsync;
};
struct radeon_ring {
diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c
index ee712c1..2be4fc5 100644
--- a/drivers/gpu/drm/radeon/radeon_cs.c
+++ b/drivers/gpu/drm/radeon/radeon_cs.c
@@ -228,11 +228,15 @@ static void radeon_cs_sync_rings(struct radeon_cs_parser *p)
int i;
for (i = 0; i < p->nrelocs; i++) {
- if (!p->relocs[i].robj)
+ struct radeon_cs_reloc *reloc = &p->relocs[i];
+ struct radeon_bo *bo = reloc->robj;
+ struct radeon_fence *fence;
+
+ if (!bo)
continue;
- radeon_semaphore_sync_to(p->ib.semaphore,
- p->relocs[i].robj->tbo.sync_obj);
+ fence = bo->tbo.sync_obj;
+ radeon_semaphore_sync_to(p->ib.presync, fence);
}
}
@@ -252,9 +256,11 @@ int radeon_cs_parser_init(struct radeon_cs_parser *p, void *data)
INIT_LIST_HEAD(&p->validated);
p->idx = 0;
p->ib.sa_bo = NULL;
- p->ib.semaphore = NULL;
+ p->ib.presync = NULL;
+ p->ib.postsync = NULL;
p->const_ib.sa_bo = NULL;
- p->const_ib.semaphore = NULL;
+ p->const_ib.presync = NULL;
+ p->const_ib.postsync = NULL;
p->chunk_ib_idx = -1;
p->chunk_relocs_idx = -1;
p->chunk_flags_idx = -1;
@@ -537,7 +543,7 @@ static int radeon_cs_ib_vm_chunk(struct radeon_device *rdev,
goto out;
}
radeon_cs_sync_rings(parser);
- radeon_semaphore_sync_to(parser->ib.semaphore, vm->fence);
+ radeon_semaphore_sync_to(parser->ib.presync, vm->fence);
if ((rdev->family >= CHIP_TAHITI) &&
(parser->chunk_const_ib_idx != -1)) {
diff --git a/drivers/gpu/drm/radeon/radeon_ib.c b/drivers/gpu/drm/radeon/radeon_ib.c
index 65b0c21..be09d70 100644
--- a/drivers/gpu/drm/radeon/radeon_ib.c
+++ b/drivers/gpu/drm/radeon/radeon_ib.c
@@ -64,10 +64,13 @@ int radeon_ib_get(struct radeon_device *rdev, int ring,
return r;
}
- r = radeon_semaphore_create(rdev, &ib->semaphore);
- if (r) {
+ r = radeon_semaphore_create(rdev, &ib->presync);
+ if (r)
+ return r;
+
+ r = radeon_semaphore_create(rdev, &ib->postsync);
+ if (r)
return r;
- }
ib->ring = ring;
ib->fence = NULL;
@@ -96,7 +99,8 @@ int radeon_ib_get(struct radeon_device *rdev, int ring,
*/
void radeon_ib_free(struct radeon_device *rdev, struct radeon_ib *ib)
{
- radeon_semaphore_free(rdev, &ib->semaphore, ib->fence);
+ radeon_semaphore_free(rdev, &ib->presync, ib->fence);
+ radeon_semaphore_free(rdev, &ib->postsync, ib->fence);
radeon_sa_bo_free(rdev, &ib->sa_bo, ib->fence);
radeon_fence_unref(&ib->fence);
}
@@ -144,11 +148,11 @@ int radeon_ib_schedule(struct radeon_device *rdev, struct radeon_ib *ib,
if (ib->vm) {
struct radeon_fence *vm_id_fence;
vm_id_fence = radeon_vm_grab_id(rdev, ib->vm, ib->ring);
- radeon_semaphore_sync_to(ib->semaphore, vm_id_fence);
+ radeon_semaphore_sync_to(ib->presync, vm_id_fence);
}
- /* sync with other rings */
- r = radeon_semaphore_sync_rings(rdev, ib->semaphore, ib->ring);
+ /* sync with other rings before IB execution */
+ r = radeon_semaphore_sync_rings(rdev, ib->presync, ib->ring);
if (r) {
dev_err(rdev->dev, "failed to sync rings (%d)\n", r);
radeon_ring_unlock_undo(rdev, ring);
@@ -160,9 +164,19 @@ int radeon_ib_schedule(struct radeon_device *rdev, struct radeon_ib *ib,
if (const_ib) {
radeon_ring_ib_execute(rdev, const_ib->ring, const_ib);
- radeon_semaphore_free(rdev, &const_ib->semaphore, NULL);
+ radeon_semaphore_free(rdev, &const_ib->presync, NULL);
+ radeon_semaphore_free(rdev, &const_ib->postsync, NULL);
}
radeon_ring_ib_execute(rdev, ib->ring, ib);
+
+ /* sync with other rings after IB execution */
+ r = radeon_semaphore_sync_rings(rdev, ib->postsync, ib->ring);
+ if (r) {
+ dev_err(rdev->dev, "failed to sync rings (%d)\n", r);
+ radeon_ring_unlock_undo(rdev, ring);
+ return r;
+ }
+
r = radeon_fence_emit(rdev, &ib->fence, ib->ring);
if (r) {
dev_err(rdev->dev, "failed to emit fence for new IB (%d)\n", r);
diff --git a/drivers/gpu/drm/radeon/radeon_vm.c b/drivers/gpu/drm/radeon/radeon_vm.c
index b2c336a..61f2e6a 100644
--- a/drivers/gpu/drm/radeon/radeon_vm.c
+++ b/drivers/gpu/drm/radeon/radeon_vm.c
@@ -696,8 +696,8 @@ int radeon_vm_update_page_directory(struct radeon_device *rdev,
if (ib.length_dw != 0) {
radeon_asic_vm_pad_ib(rdev, &ib);
- radeon_semaphore_sync_to(ib.semaphore, pd->tbo.sync_obj);
- radeon_semaphore_sync_to(ib.semaphore, vm->last_id_use);
+ radeon_semaphore_sync_to(ib.presync, pd->tbo.sync_obj);
+ radeon_semaphore_sync_to(ib.presync, vm->last_id_use);
WARN_ON(ib.length_dw > ndw);
r = radeon_ib_schedule(rdev, &ib, NULL);
if (r) {
@@ -823,7 +823,7 @@ static void radeon_vm_update_ptes(struct radeon_device *rdev,
unsigned nptes;
uint64_t pte;
- radeon_semaphore_sync_to(ib->semaphore, pt->tbo.sync_obj);
+ radeon_semaphore_sync_to(ib->presync, pt->tbo.sync_obj);
if ((addr & ~mask) == (end & ~mask))
nptes = end - addr;
@@ -965,7 +965,7 @@ int radeon_vm_bo_update(struct radeon_device *rdev,
radeon_asic_vm_pad_ib(rdev, &ib);
WARN_ON(ib.length_dw > ndw);
- radeon_semaphore_sync_to(ib.semaphore, vm->fence);
+ radeon_semaphore_sync_to(ib.presync, vm->fence);
r = radeon_ib_schedule(rdev, &ib, NULL);
if (r) {
radeon_ib_free(rdev, &ib);
--
1.9.1
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH 2/4] drm/radeon: allow concurrent BO access by different engines
2014-08-14 16:12 drm/radeon: giving userspace control over hardware engine sync Christian König
2014-08-14 16:12 ` [PATCH 1/4] drm/radeon: add pre and post IB sync Christian König
@ 2014-08-14 16:12 ` Christian König
2014-08-14 16:12 ` [PATCH 3/4] drm/radeon: add DONT_SYNC flag to CS relocs Christian König
` (2 subsequent siblings)
4 siblings, 0 replies; 11+ messages in thread
From: Christian König @ 2014-08-14 16:12 UTC (permalink / raw)
To: dri-devel
From: Christian König <christian.koenig@amd.com>
This patch allows concurrent access of different engines to the same BO
as long as everybody only reads from it. Since TTM can't (yet) handle
multiple fences for one BO we still sync the fence after executing the IB.
Signed-off-by: Christian König <christian.koenig@amd.com>
---
drivers/gpu/drm/radeon/radeon.h | 2 ++
drivers/gpu/drm/radeon/radeon_cs.c | 24 +++++++++++++++++++++++-
drivers/gpu/drm/radeon/radeon_ttm.c | 8 ++++++++
3 files changed, 33 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index 4579361..c0f7773 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -478,6 +478,7 @@ struct radeon_bo {
u32 tiling_flags;
u32 pitch;
int surface_reg;
+ struct radeon_fence *written;
/* list of all virtual address to which this bo
* is associated to
*/
@@ -1017,6 +1018,7 @@ struct radeon_cs_reloc {
unsigned allowed_domains;
uint32_t tiling_flags;
uint32_t handle;
+ bool written;
};
struct radeon_cs_chunk {
diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c
index 2be4fc5..3aa7e48 100644
--- a/drivers/gpu/drm/radeon/radeon_cs.c
+++ b/drivers/gpu/drm/radeon/radeon_cs.c
@@ -166,6 +166,7 @@ static int radeon_cs_parser_relocs(struct radeon_cs_parser *p)
p->relocs[i].tv.bo = &p->relocs[i].robj->tbo;
p->relocs[i].handle = r->handle;
+ p->relocs[i].written = !!r->write_domain;
radeon_cs_buckets_add(&buckets, &p->relocs[i].tv.head,
priority);
@@ -236,7 +237,16 @@ static void radeon_cs_sync_rings(struct radeon_cs_parser *p)
continue;
fence = bo->tbo.sync_obj;
- radeon_semaphore_sync_to(p->ib.presync, fence);
+
+ if (bo->written && radeon_fence_signaled(bo->written))
+ radeon_fence_unref(&bo->written);
+
+ /* if either this CS or the last one write to
+ the BO we sync before executing the IB */
+ if (reloc->written || bo->written)
+ radeon_semaphore_sync_to(p->ib.presync, fence);
+ else
+ radeon_semaphore_sync_to(p->ib.postsync, fence);
}
}
@@ -406,6 +416,18 @@ static void radeon_cs_parser_fini(struct radeon_cs_parser *parser, int error, bo
*/
list_sort(NULL, &parser->validated, cmp_size_smaller_first);
+ /* remember which BOs we write to */
+ for (i = 0; i < parser->nrelocs; i++) {
+ struct radeon_cs_reloc *reloc = &parser->relocs[i];
+ struct radeon_bo *bo = reloc->robj;
+
+ if (!bo || !reloc->written)
+ continue;
+
+ radeon_fence_unref(&bo->written);
+ bo->written = radeon_fence_ref(parser->ib.fence);
+ }
+
ttm_eu_fence_buffer_objects(&parser->ticket,
&parser->validated,
parser->ib.fence);
diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c
index 72afe82..76be612 100644
--- a/drivers/gpu/drm/radeon/radeon_ttm.c
+++ b/drivers/gpu/drm/radeon/radeon_ttm.c
@@ -228,10 +228,12 @@ static int radeon_move_blit(struct ttm_buffer_object *bo,
struct radeon_device *rdev;
uint64_t old_start, new_start;
struct radeon_fence *fence;
+ struct radeon_bo *rbo;
int r, ridx;
rdev = radeon_get_rdev(bo->bdev);
ridx = radeon_copy_ring_index(rdev);
+ rbo = container_of(bo, struct radeon_bo, tbo);
old_start = old_mem->start << PAGE_SHIFT;
new_start = new_mem->start << PAGE_SHIFT;
@@ -269,6 +271,12 @@ static int radeon_move_blit(struct ttm_buffer_object *bo,
r = radeon_copy(rdev, old_start, new_start,
new_mem->num_pages * (PAGE_SIZE / RADEON_GPU_PAGE_SIZE), /* GPU pages */
&fence);
+
+ if (!r) {
+ radeon_fence_unref(&rbo->written);
+ rbo->written = radeon_fence_ref(fence);
+ }
+
/* FIXME: handle copy error */
r = ttm_bo_move_accel_cleanup(bo, (void *)fence,
evict, no_wait_gpu, new_mem);
--
1.9.1
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH 3/4] drm/radeon: add DONT_SYNC flag to CS relocs
2014-08-14 16:12 drm/radeon: giving userspace control over hardware engine sync Christian König
2014-08-14 16:12 ` [PATCH 1/4] drm/radeon: add pre and post IB sync Christian König
2014-08-14 16:12 ` [PATCH 2/4] drm/radeon: allow concurrent BO access by different engines Christian König
@ 2014-08-14 16:12 ` Christian König
2014-08-14 18:43 ` Jerome Glisse
2014-08-14 16:12 ` [PATCH 4/4] drm/radeon: add flag to NOT fence a BO on CS Christian König
2014-08-14 18:35 ` drm/radeon: giving userspace control over hardware engine sync Jerome Glisse
4 siblings, 1 reply; 11+ messages in thread
From: Christian König @ 2014-08-14 16:12 UTC (permalink / raw)
To: dri-devel
From: Christian König <christian.koenig@amd.com>
This allows userspace to explicitly don't sync submission to
different rings as long as all uses stay in the same client.
Signed-off-by: Christian König <christian.koenig@amd.com>
---
drivers/gpu/drm/radeon/radeon.h | 3 +++
drivers/gpu/drm/radeon/radeon_cs.c | 23 ++++++++++++++++++++++-
drivers/gpu/drm/radeon/radeon_gem.c | 1 +
drivers/gpu/drm/radeon/radeon_ttm.c | 3 +++
include/uapi/drm/radeon_drm.h | 2 ++
5 files changed, 31 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index c0f7773..740a0b2 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -478,6 +478,8 @@ struct radeon_bo {
u32 tiling_flags;
u32 pitch;
int surface_reg;
+ struct drm_file *last_user;
+ struct radeon_fence *last_sync;
struct radeon_fence *written;
/* list of all virtual address to which this bo
* is associated to
@@ -1018,6 +1020,7 @@ struct radeon_cs_reloc {
unsigned allowed_domains;
uint32_t tiling_flags;
uint32_t handle;
+ uint32_t flags;
bool written;
};
diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c
index 3aa7e48..11e4789 100644
--- a/drivers/gpu/drm/radeon/radeon_cs.c
+++ b/drivers/gpu/drm/radeon/radeon_cs.c
@@ -166,6 +166,7 @@ static int radeon_cs_parser_relocs(struct radeon_cs_parser *p)
p->relocs[i].tv.bo = &p->relocs[i].robj->tbo;
p->relocs[i].handle = r->handle;
+ p->relocs[i].flags = r->flags;
p->relocs[i].written = !!r->write_domain;
radeon_cs_buckets_add(&buckets, &p->relocs[i].tv.head,
@@ -236,6 +237,14 @@ static void radeon_cs_sync_rings(struct radeon_cs_parser *p)
if (!bo)
continue;
+ /* always sync to the last operation
+ the clients doesn't know about */
+ radeon_semaphore_sync_to(p->ib.presync, bo->last_sync);
+
+ if (bo->last_user == p->filp &&
+ reloc->flags & RADEON_RELOC_DONT_SYNC)
+ continue;
+
fence = bo->tbo.sync_obj;
if (bo->written && radeon_fence_signaled(bo->written))
@@ -421,7 +430,19 @@ static void radeon_cs_parser_fini(struct radeon_cs_parser *parser, int error, bo
struct radeon_cs_reloc *reloc = &parser->relocs[i];
struct radeon_bo *bo = reloc->robj;
- if (!bo || !reloc->written)
+ if (!bo)
+ continue;
+
+ /* if the client changes remember that and always serialize
+ all operations from different clients */
+ if (bo->last_user != parser->filp && bo->tbo.sync_obj) {
+ struct radeon_fence *fence = bo->tbo.sync_obj;
+ radeon_fence_unref(&bo->last_sync);
+ bo->last_sync = radeon_fence_ref(fence);
+ }
+ bo->last_user = parser->filp;
+
+ if (!reloc->written)
continue;
radeon_fence_unref(&bo->written);
diff --git a/drivers/gpu/drm/radeon/radeon_gem.c b/drivers/gpu/drm/radeon/radeon_gem.c
index bfd7e1b..c73dbc1 100644
--- a/drivers/gpu/drm/radeon/radeon_gem.c
+++ b/drivers/gpu/drm/radeon/radeon_gem.c
@@ -259,6 +259,7 @@ int radeon_gem_create_ioctl(struct drm_device *dev, void *data,
r = radeon_gem_handle_lockup(rdev, r);
return r;
}
+ gem_to_radeon_bo(gobj)->last_user = filp;
r = drm_gem_handle_create(filp, gobj, &handle);
/* drop reference from allocate - handle holds it now */
drm_gem_object_unreference_unlocked(gobj);
diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c
index 76be612..a4f964f 100644
--- a/drivers/gpu/drm/radeon/radeon_ttm.c
+++ b/drivers/gpu/drm/radeon/radeon_ttm.c
@@ -273,6 +273,9 @@ static int radeon_move_blit(struct ttm_buffer_object *bo,
&fence);
if (!r) {
+ rbo->last_user = NULL;
+ radeon_fence_unref(&rbo->last_sync);
+ rbo->last_sync = radeon_fence_ref(fence);
radeon_fence_unref(&rbo->written);
rbo->written = radeon_fence_ref(fence);
}
diff --git a/include/uapi/drm/radeon_drm.h b/include/uapi/drm/radeon_drm.h
index 509b2d7..5bd3f68 100644
--- a/include/uapi/drm/radeon_drm.h
+++ b/include/uapi/drm/radeon_drm.h
@@ -944,6 +944,8 @@ struct drm_radeon_cs_chunk {
};
/* drm_radeon_cs_reloc.flags */
+#define RADEON_RELOC_PRIO_MASK (0xf << 0)
+#define RADEON_RELOC_DONT_SYNC (1 << 4)
struct drm_radeon_cs_reloc {
uint32_t handle;
--
1.9.1
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH 4/4] drm/radeon: add flag to NOT fence a BO on CS
2014-08-14 16:12 drm/radeon: giving userspace control over hardware engine sync Christian König
` (2 preceding siblings ...)
2014-08-14 16:12 ` [PATCH 3/4] drm/radeon: add DONT_SYNC flag to CS relocs Christian König
@ 2014-08-14 16:12 ` Christian König
2014-08-14 18:35 ` drm/radeon: giving userspace control over hardware engine sync Jerome Glisse
4 siblings, 0 replies; 11+ messages in thread
From: Christian König @ 2014-08-14 16:12 UTC (permalink / raw)
To: dri-devel
From: Christian König <christian.koenig@amd.com>
This is useful and only allowed with pure sync objects. E.g.
the CS depends on finishing a previous CS, but should not add
the new fence the handle.
Signed-off-by: Christian König <christian.koenig@amd.com>
---
drivers/gpu/drm/radeon/radeon_cs.c | 29 ++++++++++++++++++++++++-----
include/uapi/drm/radeon_drm.h | 1 +
2 files changed, 25 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c
index 11e4789..bdb8eda 100644
--- a/drivers/gpu/drm/radeon/radeon_cs.c
+++ b/drivers/gpu/drm/radeon/radeon_cs.c
@@ -225,7 +225,7 @@ static int radeon_cs_get_ring(struct radeon_cs_parser *p, u32 ring, s32 priority
return 0;
}
-static void radeon_cs_sync_rings(struct radeon_cs_parser *p)
+static int radeon_cs_sync_rings(struct radeon_cs_parser *p)
{
int i;
@@ -237,6 +237,18 @@ static void radeon_cs_sync_rings(struct radeon_cs_parser *p)
if (!bo)
continue;
+ fence = bo->tbo.sync_obj;
+ if (reloc->flags & RADEON_RELOC_DONT_FENCE) {
+
+ /* only allowed for sync objects */
+ if (radeon_bo_size(bo) != 0)
+ return -EINVAL;
+
+ list_del(&p->relocs[i].tv.head);
+ radeon_bo_unreserve(bo);
+ p->relocs[i].robj = NULL;
+ }
+
/* always sync to the last operation
the clients doesn't know about */
radeon_semaphore_sync_to(p->ib.presync, bo->last_sync);
@@ -245,8 +257,6 @@ static void radeon_cs_sync_rings(struct radeon_cs_parser *p)
reloc->flags & RADEON_RELOC_DONT_SYNC)
continue;
- fence = bo->tbo.sync_obj;
-
if (bo->written && radeon_fence_signaled(bo->written))
radeon_fence_unref(&bo->written);
@@ -257,6 +267,8 @@ static void radeon_cs_sync_rings(struct radeon_cs_parser *p)
else
radeon_semaphore_sync_to(p->ib.postsync, fence);
}
+
+ return 0;
}
/* XXX: note that this is called from the legacy UMS CS ioctl as well */
@@ -498,7 +510,10 @@ static int radeon_cs_ib_chunk(struct radeon_device *rdev,
(parser->ring == TN_RING_TYPE_VCE2_INDEX))
radeon_vce_note_usage(rdev);
- radeon_cs_sync_rings(parser);
+ r = radeon_cs_sync_rings(parser);
+ if (r)
+ return r;
+
r = radeon_ib_schedule(rdev, &parser->ib, NULL);
if (r) {
DRM_ERROR("Failed to schedule IB !\n");
@@ -585,7 +600,11 @@ static int radeon_cs_ib_vm_chunk(struct radeon_device *rdev,
if (r) {
goto out;
}
- radeon_cs_sync_rings(parser);
+
+ r = radeon_cs_sync_rings(parser);
+ if (r)
+ goto out;
+
radeon_semaphore_sync_to(parser->ib.presync, vm->fence);
if ((rdev->family >= CHIP_TAHITI) &&
diff --git a/include/uapi/drm/radeon_drm.h b/include/uapi/drm/radeon_drm.h
index 5bd3f68..289a0ca 100644
--- a/include/uapi/drm/radeon_drm.h
+++ b/include/uapi/drm/radeon_drm.h
@@ -946,6 +946,7 @@ struct drm_radeon_cs_chunk {
/* drm_radeon_cs_reloc.flags */
#define RADEON_RELOC_PRIO_MASK (0xf << 0)
#define RADEON_RELOC_DONT_SYNC (1 << 4)
+#define RADEON_RELOC_DONT_FENCE (1 << 5)
struct drm_radeon_cs_reloc {
uint32_t handle;
--
1.9.1
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: drm/radeon: giving userspace control over hardware engine sync
2014-08-14 16:12 drm/radeon: giving userspace control over hardware engine sync Christian König
` (3 preceding siblings ...)
2014-08-14 16:12 ` [PATCH 4/4] drm/radeon: add flag to NOT fence a BO on CS Christian König
@ 2014-08-14 18:35 ` Jerome Glisse
2014-08-15 12:04 ` Christian König
4 siblings, 1 reply; 11+ messages in thread
From: Jerome Glisse @ 2014-08-14 18:35 UTC (permalink / raw)
To: Christian König; +Cc: dri-devel
On Thu, Aug 14, 2014 at 06:12:01PM +0200, Christian König wrote:
> Hello everyone,
>
> this set of patch adds the ability for userspace to better control when
> synchronization between the different hardware engines on radeon happens.
> Previously every access to a buffer object was serialized and concurrent
> execution could only happen if command submissions didn't shared any
> buffer handle.
I must say i do not like the whole direction of abusing buffer object to be
expose as fence to userspace. Allowing 0 sized bo was the first step in this
bad direction.
I do understand it's lot easier to implement such things in isolation from
other and that trying to design and get a common kernel subsystem accepted
is way more painful and unpredictible.
> Patch #1 in this series adds the ability to not only sync before the IB
> execution, but also after it before the fence value is written. This is
> a workaround because TTM currently can't handle multiple fences attached
> to a single buffer object.
We do not want multiple fences to be associated with a buffer object ever.
In fact i believe getting rid of fence associated to buffer object is what
would make sense.
Others comment as a per patch reply.
Cheers,
Jérôme
> Patch #2 allows concurrent execution of command submission if there is
> only read only access to the same buffer.
>
> Patch #3 adds a DONT_SYNC flag to each buffer object send to the kernel
> which allows userspace to explicitly note that concurrent access to a
> buffer is ok. The usage of this flag is restricted in that way that each
> operation the client doesn't knows about (eviction, access by other clients
> etc...) is still implicitly synced to.
>
> Patch #4 adds a DONT_FENCE flag that tells the kernel to sync all operations
> to a buffer handle, but don't fence that handle with the current command
> submission. This is necessarily because we currently abuses zero sized buffer
> objects as fence handles.
>
> Please review and comment,
> Christian.
>
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/dri-devel
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 3/4] drm/radeon: add DONT_SYNC flag to CS relocs
2014-08-14 16:12 ` [PATCH 3/4] drm/radeon: add DONT_SYNC flag to CS relocs Christian König
@ 2014-08-14 18:43 ` Jerome Glisse
2014-08-14 20:34 ` Alex Deucher
0 siblings, 1 reply; 11+ messages in thread
From: Jerome Glisse @ 2014-08-14 18:43 UTC (permalink / raw)
To: Christian König; +Cc: dri-devel
On Thu, Aug 14, 2014 at 06:12:04PM +0200, Christian König wrote:
> From: Christian König <christian.koenig@amd.com>
>
> This allows userspace to explicitly don't sync submission to
> different rings as long as all uses stay in the same client.
>
> Signed-off-by: Christian König <christian.koenig@amd.com>
> ---
> drivers/gpu/drm/radeon/radeon.h | 3 +++
> drivers/gpu/drm/radeon/radeon_cs.c | 23 ++++++++++++++++++++++-
> drivers/gpu/drm/radeon/radeon_gem.c | 1 +
> drivers/gpu/drm/radeon/radeon_ttm.c | 3 +++
> include/uapi/drm/radeon_drm.h | 2 ++
> 5 files changed, 31 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
> index c0f7773..740a0b2 100644
> --- a/drivers/gpu/drm/radeon/radeon.h
> +++ b/drivers/gpu/drm/radeon/radeon.h
> @@ -478,6 +478,8 @@ struct radeon_bo {
> u32 tiling_flags;
> u32 pitch;
> int surface_reg;
> + struct drm_file *last_user;
> + struct radeon_fence *last_sync;
> struct radeon_fence *written;
> /* list of all virtual address to which this bo
> * is associated to
> @@ -1018,6 +1020,7 @@ struct radeon_cs_reloc {
> unsigned allowed_domains;
> uint32_t tiling_flags;
> uint32_t handle;
> + uint32_t flags;
> bool written;
> };
>
> diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c
> index 3aa7e48..11e4789 100644
> --- a/drivers/gpu/drm/radeon/radeon_cs.c
> +++ b/drivers/gpu/drm/radeon/radeon_cs.c
> @@ -166,6 +166,7 @@ static int radeon_cs_parser_relocs(struct radeon_cs_parser *p)
>
> p->relocs[i].tv.bo = &p->relocs[i].robj->tbo;
> p->relocs[i].handle = r->handle;
> + p->relocs[i].flags = r->flags;
> p->relocs[i].written = !!r->write_domain;
>
> radeon_cs_buckets_add(&buckets, &p->relocs[i].tv.head,
> @@ -236,6 +237,14 @@ static void radeon_cs_sync_rings(struct radeon_cs_parser *p)
> if (!bo)
> continue;
>
> + /* always sync to the last operation
> + the clients doesn't know about */
> + radeon_semaphore_sync_to(p->ib.presync, bo->last_sync);
> +
> + if (bo->last_user == p->filp &&
> + reloc->flags & RADEON_RELOC_DONT_SYNC)
> + continue;
> +
> fence = bo->tbo.sync_obj;
>
> if (bo->written && radeon_fence_signaled(bo->written))
> @@ -421,7 +430,19 @@ static void radeon_cs_parser_fini(struct radeon_cs_parser *parser, int error, bo
> struct radeon_cs_reloc *reloc = &parser->relocs[i];
> struct radeon_bo *bo = reloc->robj;
>
> - if (!bo || !reloc->written)
> + if (!bo)
> + continue;
> +
> + /* if the client changes remember that and always serialize
> + all operations from different clients */
> + if (bo->last_user != parser->filp && bo->tbo.sync_obj) {
> + struct radeon_fence *fence = bo->tbo.sync_obj;
> + radeon_fence_unref(&bo->last_sync);
> + bo->last_sync = radeon_fence_ref(fence);
> + }
> + bo->last_user = parser->filp;
> +
> + if (!reloc->written)
> continue;
>
> radeon_fence_unref(&bo->written);
> diff --git a/drivers/gpu/drm/radeon/radeon_gem.c b/drivers/gpu/drm/radeon/radeon_gem.c
> index bfd7e1b..c73dbc1 100644
> --- a/drivers/gpu/drm/radeon/radeon_gem.c
> +++ b/drivers/gpu/drm/radeon/radeon_gem.c
> @@ -259,6 +259,7 @@ int radeon_gem_create_ioctl(struct drm_device *dev, void *data,
> r = radeon_gem_handle_lockup(rdev, r);
> return r;
> }
> + gem_to_radeon_bo(gobj)->last_user = filp;
> r = drm_gem_handle_create(filp, gobj, &handle);
> /* drop reference from allocate - handle holds it now */
> drm_gem_object_unreference_unlocked(gobj);
> diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c
> index 76be612..a4f964f 100644
> --- a/drivers/gpu/drm/radeon/radeon_ttm.c
> +++ b/drivers/gpu/drm/radeon/radeon_ttm.c
> @@ -273,6 +273,9 @@ static int radeon_move_blit(struct ttm_buffer_object *bo,
> &fence);
>
> if (!r) {
> + rbo->last_user = NULL;
> + radeon_fence_unref(&rbo->last_sync);
> + rbo->last_sync = radeon_fence_ref(fence);
> radeon_fence_unref(&rbo->written);
> rbo->written = radeon_fence_ref(fence);
> }
> diff --git a/include/uapi/drm/radeon_drm.h b/include/uapi/drm/radeon_drm.h
> index 509b2d7..5bd3f68 100644
> --- a/include/uapi/drm/radeon_drm.h
> +++ b/include/uapi/drm/radeon_drm.h
> @@ -944,6 +944,8 @@ struct drm_radeon_cs_chunk {
> };
>
> /* drm_radeon_cs_reloc.flags */
> +#define RADEON_RELOC_PRIO_MASK (0xf << 0)
RADEON_RELOC_PRIO_MASK not use anywhere, not explain by anything, so new API
with no justification, i would say NAK
> +#define RADEON_RELOC_DONT_SYNC (1 << 4)
>
> struct drm_radeon_cs_reloc {
> uint32_t handle;
> --
> 1.9.1
>
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/dri-devel
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 3/4] drm/radeon: add DONT_SYNC flag to CS relocs
2014-08-14 18:43 ` Jerome Glisse
@ 2014-08-14 20:34 ` Alex Deucher
2014-08-14 21:06 ` Jerome Glisse
0 siblings, 1 reply; 11+ messages in thread
From: Alex Deucher @ 2014-08-14 20:34 UTC (permalink / raw)
To: Jerome Glisse; +Cc: Maling list - DRI developers
On Thu, Aug 14, 2014 at 2:43 PM, Jerome Glisse <j.glisse@gmail.com> wrote:
> On Thu, Aug 14, 2014 at 06:12:04PM +0200, Christian König wrote:
>> From: Christian König <christian.koenig@amd.com>
>>
>> This allows userspace to explicitly don't sync submission to
>> different rings as long as all uses stay in the same client.
>>
>> Signed-off-by: Christian König <christian.koenig@amd.com>
>> ---
>> drivers/gpu/drm/radeon/radeon.h | 3 +++
>> drivers/gpu/drm/radeon/radeon_cs.c | 23 ++++++++++++++++++++++-
>> drivers/gpu/drm/radeon/radeon_gem.c | 1 +
>> drivers/gpu/drm/radeon/radeon_ttm.c | 3 +++
>> include/uapi/drm/radeon_drm.h | 2 ++
>> 5 files changed, 31 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
>> index c0f7773..740a0b2 100644
>> --- a/drivers/gpu/drm/radeon/radeon.h
>> +++ b/drivers/gpu/drm/radeon/radeon.h
>> @@ -478,6 +478,8 @@ struct radeon_bo {
>> u32 tiling_flags;
>> u32 pitch;
>> int surface_reg;
>> + struct drm_file *last_user;
>> + struct radeon_fence *last_sync;
>> struct radeon_fence *written;
>> /* list of all virtual address to which this bo
>> * is associated to
>> @@ -1018,6 +1020,7 @@ struct radeon_cs_reloc {
>> unsigned allowed_domains;
>> uint32_t tiling_flags;
>> uint32_t handle;
>> + uint32_t flags;
>> bool written;
>> };
>>
>> diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c
>> index 3aa7e48..11e4789 100644
>> --- a/drivers/gpu/drm/radeon/radeon_cs.c
>> +++ b/drivers/gpu/drm/radeon/radeon_cs.c
>> @@ -166,6 +166,7 @@ static int radeon_cs_parser_relocs(struct radeon_cs_parser *p)
>>
>> p->relocs[i].tv.bo = &p->relocs[i].robj->tbo;
>> p->relocs[i].handle = r->handle;
>> + p->relocs[i].flags = r->flags;
>> p->relocs[i].written = !!r->write_domain;
>>
>> radeon_cs_buckets_add(&buckets, &p->relocs[i].tv.head,
>> @@ -236,6 +237,14 @@ static void radeon_cs_sync_rings(struct radeon_cs_parser *p)
>> if (!bo)
>> continue;
>>
>> + /* always sync to the last operation
>> + the clients doesn't know about */
>> + radeon_semaphore_sync_to(p->ib.presync, bo->last_sync);
>> +
>> + if (bo->last_user == p->filp &&
>> + reloc->flags & RADEON_RELOC_DONT_SYNC)
>> + continue;
>> +
>> fence = bo->tbo.sync_obj;
>>
>> if (bo->written && radeon_fence_signaled(bo->written))
>> @@ -421,7 +430,19 @@ static void radeon_cs_parser_fini(struct radeon_cs_parser *parser, int error, bo
>> struct radeon_cs_reloc *reloc = &parser->relocs[i];
>> struct radeon_bo *bo = reloc->robj;
>>
>> - if (!bo || !reloc->written)
>> + if (!bo)
>> + continue;
>> +
>> + /* if the client changes remember that and always serialize
>> + all operations from different clients */
>> + if (bo->last_user != parser->filp && bo->tbo.sync_obj) {
>> + struct radeon_fence *fence = bo->tbo.sync_obj;
>> + radeon_fence_unref(&bo->last_sync);
>> + bo->last_sync = radeon_fence_ref(fence);
>> + }
>> + bo->last_user = parser->filp;
>> +
>> + if (!reloc->written)
>> continue;
>>
>> radeon_fence_unref(&bo->written);
>> diff --git a/drivers/gpu/drm/radeon/radeon_gem.c b/drivers/gpu/drm/radeon/radeon_gem.c
>> index bfd7e1b..c73dbc1 100644
>> --- a/drivers/gpu/drm/radeon/radeon_gem.c
>> +++ b/drivers/gpu/drm/radeon/radeon_gem.c
>> @@ -259,6 +259,7 @@ int radeon_gem_create_ioctl(struct drm_device *dev, void *data,
>> r = radeon_gem_handle_lockup(rdev, r);
>> return r;
>> }
>> + gem_to_radeon_bo(gobj)->last_user = filp;
>> r = drm_gem_handle_create(filp, gobj, &handle);
>> /* drop reference from allocate - handle holds it now */
>> drm_gem_object_unreference_unlocked(gobj);
>> diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c
>> index 76be612..a4f964f 100644
>> --- a/drivers/gpu/drm/radeon/radeon_ttm.c
>> +++ b/drivers/gpu/drm/radeon/radeon_ttm.c
>> @@ -273,6 +273,9 @@ static int radeon_move_blit(struct ttm_buffer_object *bo,
>> &fence);
>>
>> if (!r) {
>> + rbo->last_user = NULL;
>> + radeon_fence_unref(&rbo->last_sync);
>> + rbo->last_sync = radeon_fence_ref(fence);
>> radeon_fence_unref(&rbo->written);
>> rbo->written = radeon_fence_ref(fence);
>> }
>> diff --git a/include/uapi/drm/radeon_drm.h b/include/uapi/drm/radeon_drm.h
>> index 509b2d7..5bd3f68 100644
>> --- a/include/uapi/drm/radeon_drm.h
>> +++ b/include/uapi/drm/radeon_drm.h
>> @@ -944,6 +944,8 @@ struct drm_radeon_cs_chunk {
>> };
>>
>> /* drm_radeon_cs_reloc.flags */
>> +#define RADEON_RELOC_PRIO_MASK (0xf << 0)
>
> RADEON_RELOC_PRIO_MASK not use anywhere, not explain by anything, so new API
> with no justification, i would say NAK
The first 4 bits are used in:
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=c9b76548899cde2e729e3bca015d7e78ec5baad7
and that interface is already used in mesa.
This just makes it clear that those bits are already used when adding
additional flags.
Alex
>
>> +#define RADEON_RELOC_DONT_SYNC (1 << 4)
>>
>> struct drm_radeon_cs_reloc {
>> uint32_t handle;
>> --
>> 1.9.1
>>
>> _______________________________________________
>> dri-devel mailing list
>> dri-devel@lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/dri-devel
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/dri-devel
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 3/4] drm/radeon: add DONT_SYNC flag to CS relocs
2014-08-14 20:34 ` Alex Deucher
@ 2014-08-14 21:06 ` Jerome Glisse
2014-08-15 7:46 ` Christian König
0 siblings, 1 reply; 11+ messages in thread
From: Jerome Glisse @ 2014-08-14 21:06 UTC (permalink / raw)
To: Alex Deucher; +Cc: Maling list - DRI developers
On Thu, Aug 14, 2014 at 04:34:29PM -0400, Alex Deucher wrote:
> On Thu, Aug 14, 2014 at 2:43 PM, Jerome Glisse <j.glisse@gmail.com> wrote:
> > On Thu, Aug 14, 2014 at 06:12:04PM +0200, Christian König wrote:
> >> From: Christian König <christian.koenig@amd.com>
> >>
> >> This allows userspace to explicitly don't sync submission to
> >> different rings as long as all uses stay in the same client.
> >>
> >> Signed-off-by: Christian König <christian.koenig@amd.com>
> >> ---
> >> drivers/gpu/drm/radeon/radeon.h | 3 +++
> >> drivers/gpu/drm/radeon/radeon_cs.c | 23 ++++++++++++++++++++++-
> >> drivers/gpu/drm/radeon/radeon_gem.c | 1 +
> >> drivers/gpu/drm/radeon/radeon_ttm.c | 3 +++
> >> include/uapi/drm/radeon_drm.h | 2 ++
> >> 5 files changed, 31 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
> >> index c0f7773..740a0b2 100644
> >> --- a/drivers/gpu/drm/radeon/radeon.h
> >> +++ b/drivers/gpu/drm/radeon/radeon.h
> >> @@ -478,6 +478,8 @@ struct radeon_bo {
> >> u32 tiling_flags;
> >> u32 pitch;
> >> int surface_reg;
> >> + struct drm_file *last_user;
> >> + struct radeon_fence *last_sync;
> >> struct radeon_fence *written;
> >> /* list of all virtual address to which this bo
> >> * is associated to
> >> @@ -1018,6 +1020,7 @@ struct radeon_cs_reloc {
> >> unsigned allowed_domains;
> >> uint32_t tiling_flags;
> >> uint32_t handle;
> >> + uint32_t flags;
> >> bool written;
> >> };
> >>
> >> diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c
> >> index 3aa7e48..11e4789 100644
> >> --- a/drivers/gpu/drm/radeon/radeon_cs.c
> >> +++ b/drivers/gpu/drm/radeon/radeon_cs.c
> >> @@ -166,6 +166,7 @@ static int radeon_cs_parser_relocs(struct radeon_cs_parser *p)
> >>
> >> p->relocs[i].tv.bo = &p->relocs[i].robj->tbo;
> >> p->relocs[i].handle = r->handle;
> >> + p->relocs[i].flags = r->flags;
> >> p->relocs[i].written = !!r->write_domain;
> >>
> >> radeon_cs_buckets_add(&buckets, &p->relocs[i].tv.head,
> >> @@ -236,6 +237,14 @@ static void radeon_cs_sync_rings(struct radeon_cs_parser *p)
> >> if (!bo)
> >> continue;
> >>
> >> + /* always sync to the last operation
> >> + the clients doesn't know about */
> >> + radeon_semaphore_sync_to(p->ib.presync, bo->last_sync);
> >> +
> >> + if (bo->last_user == p->filp &&
> >> + reloc->flags & RADEON_RELOC_DONT_SYNC)
> >> + continue;
> >> +
> >> fence = bo->tbo.sync_obj;
> >>
> >> if (bo->written && radeon_fence_signaled(bo->written))
> >> @@ -421,7 +430,19 @@ static void radeon_cs_parser_fini(struct radeon_cs_parser *parser, int error, bo
> >> struct radeon_cs_reloc *reloc = &parser->relocs[i];
> >> struct radeon_bo *bo = reloc->robj;
> >>
> >> - if (!bo || !reloc->written)
> >> + if (!bo)
> >> + continue;
> >> +
> >> + /* if the client changes remember that and always serialize
> >> + all operations from different clients */
> >> + if (bo->last_user != parser->filp && bo->tbo.sync_obj) {
> >> + struct radeon_fence *fence = bo->tbo.sync_obj;
> >> + radeon_fence_unref(&bo->last_sync);
> >> + bo->last_sync = radeon_fence_ref(fence);
> >> + }
> >> + bo->last_user = parser->filp;
> >> +
> >> + if (!reloc->written)
> >> continue;
> >>
> >> radeon_fence_unref(&bo->written);
> >> diff --git a/drivers/gpu/drm/radeon/radeon_gem.c b/drivers/gpu/drm/radeon/radeon_gem.c
> >> index bfd7e1b..c73dbc1 100644
> >> --- a/drivers/gpu/drm/radeon/radeon_gem.c
> >> +++ b/drivers/gpu/drm/radeon/radeon_gem.c
> >> @@ -259,6 +259,7 @@ int radeon_gem_create_ioctl(struct drm_device *dev, void *data,
> >> r = radeon_gem_handle_lockup(rdev, r);
> >> return r;
> >> }
> >> + gem_to_radeon_bo(gobj)->last_user = filp;
> >> r = drm_gem_handle_create(filp, gobj, &handle);
> >> /* drop reference from allocate - handle holds it now */
> >> drm_gem_object_unreference_unlocked(gobj);
> >> diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c
> >> index 76be612..a4f964f 100644
> >> --- a/drivers/gpu/drm/radeon/radeon_ttm.c
> >> +++ b/drivers/gpu/drm/radeon/radeon_ttm.c
> >> @@ -273,6 +273,9 @@ static int radeon_move_blit(struct ttm_buffer_object *bo,
> >> &fence);
> >>
> >> if (!r) {
> >> + rbo->last_user = NULL;
> >> + radeon_fence_unref(&rbo->last_sync);
> >> + rbo->last_sync = radeon_fence_ref(fence);
> >> radeon_fence_unref(&rbo->written);
> >> rbo->written = radeon_fence_ref(fence);
> >> }
> >> diff --git a/include/uapi/drm/radeon_drm.h b/include/uapi/drm/radeon_drm.h
> >> index 509b2d7..5bd3f68 100644
> >> --- a/include/uapi/drm/radeon_drm.h
> >> +++ b/include/uapi/drm/radeon_drm.h
> >> @@ -944,6 +944,8 @@ struct drm_radeon_cs_chunk {
> >> };
> >>
> >> /* drm_radeon_cs_reloc.flags */
> >> +#define RADEON_RELOC_PRIO_MASK (0xf << 0)
> >
> > RADEON_RELOC_PRIO_MASK not use anywhere, not explain by anything, so new API
> > with no justification, i would say NAK
>
> The first 4 bits are used in:
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=c9b76548899cde2e729e3bca015d7e78ec5baad7
> and that interface is already used in mesa.
>
> This just makes it clear that those bits are already used when adding
> additional flags.
Well would be good to introduce the flag and use it as part of
separate patch.
>
> Alex
>
> >
> >> +#define RADEON_RELOC_DONT_SYNC (1 << 4)
> >>
> >> struct drm_radeon_cs_reloc {
> >> uint32_t handle;
> >> --
> >> 1.9.1
> >>
> >> _______________________________________________
> >> dri-devel mailing list
> >> dri-devel@lists.freedesktop.org
> >> http://lists.freedesktop.org/mailman/listinfo/dri-devel
> > _______________________________________________
> > dri-devel mailing list
> > dri-devel@lists.freedesktop.org
> > http://lists.freedesktop.org/mailman/listinfo/dri-devel
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 3/4] drm/radeon: add DONT_SYNC flag to CS relocs
2014-08-14 21:06 ` Jerome Glisse
@ 2014-08-15 7:46 ` Christian König
0 siblings, 0 replies; 11+ messages in thread
From: Christian König @ 2014-08-15 7:46 UTC (permalink / raw)
To: Jerome Glisse, Alex Deucher; +Cc: Maling list - DRI developers
Am 14.08.2014 um 23:06 schrieb Jerome Glisse:
> On Thu, Aug 14, 2014 at 04:34:29PM -0400, Alex Deucher wrote:
>> On Thu, Aug 14, 2014 at 2:43 PM, Jerome Glisse <j.glisse@gmail.com> wrote:
>>> On Thu, Aug 14, 2014 at 06:12:04PM +0200, Christian König wrote:
>>>> From: Christian König <christian.koenig@amd.com>
>>>>
>>>> This allows userspace to explicitly don't sync submission to
>>>> different rings as long as all uses stay in the same client.
>>>>
>>>> Signed-off-by: Christian König <christian.koenig@amd.com>
>>>> ---
>>>> drivers/gpu/drm/radeon/radeon.h | 3 +++
>>>> drivers/gpu/drm/radeon/radeon_cs.c | 23 ++++++++++++++++++++++-
>>>> drivers/gpu/drm/radeon/radeon_gem.c | 1 +
>>>> drivers/gpu/drm/radeon/radeon_ttm.c | 3 +++
>>>> include/uapi/drm/radeon_drm.h | 2 ++
>>>> 5 files changed, 31 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
>>>> index c0f7773..740a0b2 100644
>>>> --- a/drivers/gpu/drm/radeon/radeon.h
>>>> +++ b/drivers/gpu/drm/radeon/radeon.h
>>>> @@ -478,6 +478,8 @@ struct radeon_bo {
>>>> u32 tiling_flags;
>>>> u32 pitch;
>>>> int surface_reg;
>>>> + struct drm_file *last_user;
>>>> + struct radeon_fence *last_sync;
>>>> struct radeon_fence *written;
>>>> /* list of all virtual address to which this bo
>>>> * is associated to
>>>> @@ -1018,6 +1020,7 @@ struct radeon_cs_reloc {
>>>> unsigned allowed_domains;
>>>> uint32_t tiling_flags;
>>>> uint32_t handle;
>>>> + uint32_t flags;
>>>> bool written;
>>>> };
>>>>
>>>> diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c
>>>> index 3aa7e48..11e4789 100644
>>>> --- a/drivers/gpu/drm/radeon/radeon_cs.c
>>>> +++ b/drivers/gpu/drm/radeon/radeon_cs.c
>>>> @@ -166,6 +166,7 @@ static int radeon_cs_parser_relocs(struct radeon_cs_parser *p)
>>>>
>>>> p->relocs[i].tv.bo = &p->relocs[i].robj->tbo;
>>>> p->relocs[i].handle = r->handle;
>>>> + p->relocs[i].flags = r->flags;
>>>> p->relocs[i].written = !!r->write_domain;
>>>>
>>>> radeon_cs_buckets_add(&buckets, &p->relocs[i].tv.head,
>>>> @@ -236,6 +237,14 @@ static void radeon_cs_sync_rings(struct radeon_cs_parser *p)
>>>> if (!bo)
>>>> continue;
>>>>
>>>> + /* always sync to the last operation
>>>> + the clients doesn't know about */
>>>> + radeon_semaphore_sync_to(p->ib.presync, bo->last_sync);
>>>> +
>>>> + if (bo->last_user == p->filp &&
>>>> + reloc->flags & RADEON_RELOC_DONT_SYNC)
>>>> + continue;
>>>> +
>>>> fence = bo->tbo.sync_obj;
>>>>
>>>> if (bo->written && radeon_fence_signaled(bo->written))
>>>> @@ -421,7 +430,19 @@ static void radeon_cs_parser_fini(struct radeon_cs_parser *parser, int error, bo
>>>> struct radeon_cs_reloc *reloc = &parser->relocs[i];
>>>> struct radeon_bo *bo = reloc->robj;
>>>>
>>>> - if (!bo || !reloc->written)
>>>> + if (!bo)
>>>> + continue;
>>>> +
>>>> + /* if the client changes remember that and always serialize
>>>> + all operations from different clients */
>>>> + if (bo->last_user != parser->filp && bo->tbo.sync_obj) {
>>>> + struct radeon_fence *fence = bo->tbo.sync_obj;
>>>> + radeon_fence_unref(&bo->last_sync);
>>>> + bo->last_sync = radeon_fence_ref(fence);
>>>> + }
>>>> + bo->last_user = parser->filp;
>>>> +
>>>> + if (!reloc->written)
>>>> continue;
>>>>
>>>> radeon_fence_unref(&bo->written);
>>>> diff --git a/drivers/gpu/drm/radeon/radeon_gem.c b/drivers/gpu/drm/radeon/radeon_gem.c
>>>> index bfd7e1b..c73dbc1 100644
>>>> --- a/drivers/gpu/drm/radeon/radeon_gem.c
>>>> +++ b/drivers/gpu/drm/radeon/radeon_gem.c
>>>> @@ -259,6 +259,7 @@ int radeon_gem_create_ioctl(struct drm_device *dev, void *data,
>>>> r = radeon_gem_handle_lockup(rdev, r);
>>>> return r;
>>>> }
>>>> + gem_to_radeon_bo(gobj)->last_user = filp;
>>>> r = drm_gem_handle_create(filp, gobj, &handle);
>>>> /* drop reference from allocate - handle holds it now */
>>>> drm_gem_object_unreference_unlocked(gobj);
>>>> diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c
>>>> index 76be612..a4f964f 100644
>>>> --- a/drivers/gpu/drm/radeon/radeon_ttm.c
>>>> +++ b/drivers/gpu/drm/radeon/radeon_ttm.c
>>>> @@ -273,6 +273,9 @@ static int radeon_move_blit(struct ttm_buffer_object *bo,
>>>> &fence);
>>>>
>>>> if (!r) {
>>>> + rbo->last_user = NULL;
>>>> + radeon_fence_unref(&rbo->last_sync);
>>>> + rbo->last_sync = radeon_fence_ref(fence);
>>>> radeon_fence_unref(&rbo->written);
>>>> rbo->written = radeon_fence_ref(fence);
>>>> }
>>>> diff --git a/include/uapi/drm/radeon_drm.h b/include/uapi/drm/radeon_drm.h
>>>> index 509b2d7..5bd3f68 100644
>>>> --- a/include/uapi/drm/radeon_drm.h
>>>> +++ b/include/uapi/drm/radeon_drm.h
>>>> @@ -944,6 +944,8 @@ struct drm_radeon_cs_chunk {
>>>> };
>>>>
>>>> /* drm_radeon_cs_reloc.flags */
>>>> +#define RADEON_RELOC_PRIO_MASK (0xf << 0)
>>> RADEON_RELOC_PRIO_MASK not use anywhere, not explain by anything, so new API
>>> with no justification, i would say NAK
>> The first 4 bits are used in:
>> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=c9b76548899cde2e729e3bca015d7e78ec5baad7
>> and that interface is already used in mesa.
>>
>> This just makes it clear that those bits are already used when adding
>> additional flags.
> Well would be good to introduce the flag and use it as part of
> separate patch.
Which is what I actually wanted to do, but forgotten about it because
all of the fence discussion.
Going to send out a separate cleanup patch soon,
Christian.
>
>> Alex
>>
>>>> +#define RADEON_RELOC_DONT_SYNC (1 << 4)
>>>>
>>>> struct drm_radeon_cs_reloc {
>>>> uint32_t handle;
>>>> --
>>>> 1.9.1
>>>>
>>>> _______________________________________________
>>>> dri-devel mailing list
>>>> dri-devel@lists.freedesktop.org
>>>> http://lists.freedesktop.org/mailman/listinfo/dri-devel
>>> _______________________________________________
>>> dri-devel mailing list
>>> dri-devel@lists.freedesktop.org
>>> http://lists.freedesktop.org/mailman/listinfo/dri-devel
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: drm/radeon: giving userspace control over hardware engine sync
2014-08-14 18:35 ` drm/radeon: giving userspace control over hardware engine sync Jerome Glisse
@ 2014-08-15 12:04 ` Christian König
0 siblings, 0 replies; 11+ messages in thread
From: Christian König @ 2014-08-15 12:04 UTC (permalink / raw)
To: Jerome Glisse; +Cc: dri-devel
Am 14.08.2014 um 20:35 schrieb Jerome Glisse:
> On Thu, Aug 14, 2014 at 06:12:01PM +0200, Christian König wrote:
>> Hello everyone,
>>
>> this set of patch adds the ability for userspace to better control when
>> synchronization between the different hardware engines on radeon happens.
>> Previously every access to a buffer object was serialized and concurrent
>> execution could only happen if command submissions didn't shared any
>> buffer handle.
> I must say i do not like the whole direction of abusing buffer object to be
> expose as fence to userspace. Allowing 0 sized bo was the first step in this
> bad direction.
I'm not very keen about it either, but I wanted to wait what's the final
result of the fence discussion and how we want to expose them to
userspace before switching to something else.
> I do understand it's lot easier to implement such things in isolation from
> other and that trying to design and get a common kernel subsystem accepted
> is way more painful and unpredictible.
Correct, comparing the history of GEM named handles and DMA-buf the idea
should be relative clear what we should use. Well after all everything
is just a file, isn't it?
>> Patch #1 in this series adds the ability to not only sync before the IB
>> execution, but also after it before the fence value is written. This is
>> a workaround because TTM currently can't handle multiple fences attached
>> to a single buffer object.
> We do not want multiple fences to be associated with a buffer object ever.
Well maybe we shouldn't use the wording associating the fence to the
buffer, but rather protecting the buffer object by multiple fences.
Otherwise it wouldn't be possible to allow concurrent access from
different hardware engines to the same buffer object. What we should
avoid clearly is exposing any of this to userspace.
> In fact i believe getting rid of fence associated to buffer object is what
> would make sense.
Yeah, agree. We should have a separated fence object that userspace can
query. In the end it should probably be similar to a DMA-buf fd or a
futex. Not sure how a sane interface would look like.
Christian.
>
> Others comment as a per patch reply.
>
> Cheers,
> Jérôme
>
>> Patch #2 allows concurrent execution of command submission if there is
>> only read only access to the same buffer.
>>
>> Patch #3 adds a DONT_SYNC flag to each buffer object send to the kernel
>> which allows userspace to explicitly note that concurrent access to a
>> buffer is ok. The usage of this flag is restricted in that way that each
>> operation the client doesn't knows about (eviction, access by other clients
>> etc...) is still implicitly synced to.
>>
>> Patch #4 adds a DONT_FENCE flag that tells the kernel to sync all operations
>> to a buffer handle, but don't fence that handle with the current command
>> submission. This is necessarily because we currently abuses zero sized buffer
>> objects as fence handles.
>>
>> Please review and comment,
>> Christian.
>>
>> _______________________________________________
>> dri-devel mailing list
>> dri-devel@lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/dri-devel
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2014-08-15 12:05 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-08-14 16:12 drm/radeon: giving userspace control over hardware engine sync Christian König
2014-08-14 16:12 ` [PATCH 1/4] drm/radeon: add pre and post IB sync Christian König
2014-08-14 16:12 ` [PATCH 2/4] drm/radeon: allow concurrent BO access by different engines Christian König
2014-08-14 16:12 ` [PATCH 3/4] drm/radeon: add DONT_SYNC flag to CS relocs Christian König
2014-08-14 18:43 ` Jerome Glisse
2014-08-14 20:34 ` Alex Deucher
2014-08-14 21:06 ` Jerome Glisse
2014-08-15 7:46 ` Christian König
2014-08-14 16:12 ` [PATCH 4/4] drm/radeon: add flag to NOT fence a BO on CS Christian König
2014-08-14 18:35 ` drm/radeon: giving userspace control over hardware engine sync Jerome Glisse
2014-08-15 12:04 ` Christian König
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.