From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jerome Glisse
Subject: Re: [PATCH 2/3] drm/radeon: add command submission IDs
Date: Fri, 19 Sep 2014 09:40:51 -0400
Message-ID: <20140919134050.GA2581@gmail.com>
References: <1411054213-11173-1-git-send-email-deathsimple@vodafone.de>
 <1411054213-11173-2-git-send-email-deathsimple@vodafone.de>
 <20140919004215.GA3447@gmail.com>
 <20140919031155.GB3447@gmail.com>
 <20140919040629.GC3447@gmail.com>
 <541BE35F.9000200@vodafone.de>
In-Reply-To: <541BE35F.9000200@vodafone.de>
To: Christian König
Cc: dri-devel@lists.freedesktop.org
List-Id: dri-devel@lists.freedesktop.org

On Fri, Sep 19, 2014 at 10:03:43AM +0200, Christian König wrote:
> Hi Jerome,
>
> this is exactly the approach which we wanted to avoid. And according to Alex
> it was actually you guys (I think Dave) who noted that this is probably not
> such a good idea.
>
> The issue is that by telling userspace the ring specific sequence number we
> nail down the order of execution. E.g. a scheduler that wants to change the
> order in which IBs are submitted to the hardware can't do so any more
> because we already gave userspace a sequence number to identify the command
> submission.

Well, for the current code this design does work; for hw where you want the
hw/driver to do some scheduling this can still be achieved by having a seq
number per ring per process.

>
> In addition to that, for hardware inter device synchronization we want to
> refer to a certain kernel object and not just the fence number. So we can
> only use some kind of handle based approach, e.g. as discussed before we
> want to be able to turn this sequence number into an FD fence.

Inter-device is not an issue here. For inter-device you want to convert
this seq number to an fd, at which point you can create a fence structure
with the proper information, i.e. create a structure only if it is needed.

Really this is the only sane design; if we start allocating a structure for
each cs (well, for each cs userspace cares about) this is going to play badly
with memory pressure.

Cheers,
Jérôme

>
> Regards,
> Christian.
>
> Am 19.09.2014 um 06:06 schrieb Jerome Glisse:
> >On Thu, Sep 18, 2014 at 11:11:57PM -0400, Jerome Glisse wrote:
> >>On Thu, Sep 18, 2014 at 08:42:16PM -0400, Jerome Glisse wrote:
> >>>On Thu, Sep 18, 2014 at 05:30:12PM +0200, Christian König wrote:
> >>>>From: Christian König
> >>>>
> >>>>This patch adds a new 64bit ID as a result to each command submission.
> >>>NAK for design reasons.
> >>>
> >>>First and foremost we DO NOT WANT TO ALLOCATE ANYTHING for the submission id.
> >>>I know that hooking up to the fence makes life easy but this is not how it
> >>>should be done.
> >>>
> >>>What you want to expose is the per ring 32bit seq value and report to user
> >>>space the value and the ring id.
> >>>To handle the wrap around you add a unique 32bit wrap counter which is
> >>>incremented every 1 << 30 seq values or some other big number (but it must
> >>>be smaller than 1 << 31). Then you use arithmetic to determine if a cs seq
> >>>number is done or not (i.e. the wrap counter diff should not be bigger than
> >>>2 or 3) and the seq number should obviously be after it, again using an
> >>>arithmetic diff like the jiffies code.
> >>>
> >>>All this being hidden in the kernel, all userspace has to know is:
> >>>32 bit seq value per ring
> >>>32 bit ring unique id
> >>>32 bit wrap counter per ring
> >>>
> >>>With such a scheme you never allocate anything or worry about any fence.
> >>>Way simpler and less code.
> >>>
> >>>Cheers,
> >>>Jérôme
> >>Because code is clearer than words, attached is a patch of what I am thinking
> >>about. The arithmetic should be right, as well as the coherency and proper
> >>memory barriers. Though it is completely untested and will require a couple
> >>of fixes for properly checking userspace arguments and other aspects (see
> >>FIXME in patches).
> >>
> >>There is no allocation, it is a pretty small patch, and it provides a fast,
> >>lightweight solution. While the new ioctl arguments and return value can be
> >>improved (like reporting more information to userspace, for instance warning
> >>userspace when it is querying very old fences), I think the overall design
> >>is sound.
> >>
> >>Let me know.
> >Ok, actually after a beer (given that the Planet Express manual for Bender,
> >which also applies to me, clearly stipulates to expect malfunction if not
> >properly inebriated) it became clear that by reversing the problem I could
> >make it a lot simpler. So attached is an even simpler patch that gracefully
> >handles userspace querying for a very old fence (i.e. one that could have
> >wrapped around).
> >
> >Code explains it all.
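[Editorial sketch] To make the arithmetic concrete, here is a minimal sketch of
the jiffies-style wrap-safe check described above. The helper and its parameter
names are illustrative only and are not part of either patch:

#include <stdbool.h>
#include <stdint.h>

/* Illustrative only: wrap-safe "is this cs done" test as described above.
 * last_seq/cur_wrap are the ring's current values, seq/wrap are the values
 * the kernel reported to userspace for the command submission. */
static bool cs_seq_signaled(uint32_t last_seq, uint32_t cur_wrap,
			    uint32_t seq, uint32_t wrap)
{
	/* A query about a submission several wrap periods old is assumed
	 * to be long done. */
	if ((int32_t)(cur_wrap - wrap) > 2)
		return true;
	/* Otherwise a signed difference handles a 32bit wrap around of the
	 * per ring seq value, exactly like the jiffies time_after() trick. */
	return (int32_t)(last_seq - seq) >= 0;
}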
> > > >>Cheers, > >>J=E9r=F4me > >> > >>> > >>>>Signed-off-by: Christian K=F6nig > >>>>--- > >>>> drivers/gpu/drm/radeon/Makefile | 2 +- > >>>> drivers/gpu/drm/radeon/radeon.h | 13 +- > >>>> drivers/gpu/drm/radeon/radeon_cs.c | 13 ++ > >>>> drivers/gpu/drm/radeon/radeon_kms.c | 41 +++---- > >>>> drivers/gpu/drm/radeon/radeon_seq.c | 229 +++++++++++++++++++++++++= +++++++++++ > >>>> include/uapi/drm/radeon_drm.h | 1 + > >>>> 6 files changed, 277 insertions(+), 22 deletions(-) > >>>> create mode 100644 drivers/gpu/drm/radeon/radeon_seq.c > >>>> > >>>>diff --git a/drivers/gpu/drm/radeon/Makefile b/drivers/gpu/drm/radeon= /Makefile > >>>>index 8cc9f68..21ee928 100644 > >>>>--- a/drivers/gpu/drm/radeon/Makefile > >>>>+++ b/drivers/gpu/drm/radeon/Makefile > >>>>@@ -81,7 +81,7 @@ radeon-y +=3D radeon_device.o radeon_asic.o radeon_= kms.o \ > >>>> rv770_smc.o cypress_dpm.o btc_dpm.o sumo_dpm.o sumo_smc.o trinity_= dpm.o \ > >>>> trinity_smc.o ni_dpm.o si_smc.o si_dpm.o kv_smc.o kv_dpm.o ci_smc.= o \ > >>>> ci_dpm.o dce6_afmt.o radeon_vm.o radeon_ucode.o radeon_ib.o radeon= _mn.o \ > >>>>- radeon_sync.o > >>>>+ radeon_sync.o radeon_seq.o > >>>> # add async DMA block > >>>> radeon-y +=3D \ > >>>>diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon= /radeon.h > >>>>index b3b4e96..dbfd346 100644 > >>>>--- a/drivers/gpu/drm/radeon/radeon.h > >>>>+++ b/drivers/gpu/drm/radeon/radeon.h > >>>>@@ -423,6 +423,15 @@ static inline bool radeon_fence_is_earlier(struc= t radeon_fence *a, > >>>> } > >>>> /* > >>>>+ * Userspace command submission identifier generation > >>>>+ */ > >>>>+struct radeon_seq; > >>>>+ > >>>>+uint64_t radeon_seq_push(struct radeon_seq **seq, struct radeon_fenc= e *fence); > >>>>+struct radeon_fence *radeon_seq_query(struct radeon_seq *seq, uint64= _t id); > >>>>+void radeon_seq_destroy(struct radeon_seq **seq); > >>>>+ > >>>>+/* > >>>> * Tiling registers > >>>> */ > >>>> struct radeon_surface_reg { > >>>>@@ -946,7 +955,9 @@ struct radeon_vm_manager { > >>>> * file private structure > >>>> */ > >>>> struct radeon_fpriv { > >>>>- struct radeon_vm vm; > >>>>+ struct radeon_vm vm; > >>>>+ struct mutex seq_lock; > >>>>+ struct radeon_seq *seq; > >>>> }; > >>>> /* > >>>>diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/rad= eon/radeon_cs.c > >>>>index b8f20a5..0c0f0b3 100644 > >>>>--- a/drivers/gpu/drm/radeon/radeon_cs.c > >>>>+++ b/drivers/gpu/drm/radeon/radeon_cs.c > >>>>@@ -413,6 +413,19 @@ static void radeon_cs_parser_fini(struct radeon_= cs_parser *parser, int error, bo > >>>> unsigned i; > >>>> if (!error) { > >>>>+ if (parser->chunk_flags && > >>>>+ parser->chunk_flags->length_dw > 4) { > >>>>+ struct radeon_fpriv *fpriv =3D parser->filp->driver_priv; > >>>>+ uint32_t __user *to =3D parser->chunk_flags->user_ptr; > >>>>+ uint64_t id; > >>>>+ > >>>>+ mutex_lock(&fpriv->seq_lock); > >>>>+ id =3D radeon_seq_push(&fpriv->seq, parser->ib.fence); > >>>>+ mutex_unlock(&fpriv->seq_lock); > >>>>+ > >>>>+ copy_to_user(&to[3], &id, sizeof(uint64_t)); > >>>>+ } > >>>>+ > >>>> /* Sort the buffer list from the smallest to largest buffer, > >>>> * which affects the order of buffers in the LRU list. 
> >>>> * This assures that the smallest buffers are added first > >>>>diff --git a/drivers/gpu/drm/radeon/radeon_kms.c b/drivers/gpu/drm/ra= deon/radeon_kms.c > >>>>index 85ee6f7..38880af 100644 > >>>>--- a/drivers/gpu/drm/radeon/radeon_kms.c > >>>>+++ b/drivers/gpu/drm/radeon/radeon_kms.c > >>>>@@ -578,39 +578,34 @@ void radeon_driver_lastclose_kms(struct drm_dev= ice *dev) > >>>> */ > >>>> int radeon_driver_open_kms(struct drm_device *dev, struct drm_file = *file_priv) > >>>> { > >>>>+ struct radeon_fpriv *fpriv =3D kzalloc(sizeof(*fpriv), GFP_KERNEL); > >>>> struct radeon_device *rdev =3D dev->dev_private; > >>>> int r; > >>>>- file_priv->driver_priv =3D NULL; > >>>>+ if (unlikely(!fpriv)) > >>>>+ return -ENOMEM; > >>>>+ > >>>>+ file_priv->driver_priv =3D fpriv; > >>>> r =3D pm_runtime_get_sync(dev->dev); > >>>> if (r < 0) > >>>>- return r; > >>>>+ goto error; > >>>> /* new gpu have virtual address space support */ > >>>> if (rdev->family >=3D CHIP_CAYMAN) { > >>>>- struct radeon_fpriv *fpriv; > >>>> struct radeon_vm *vm; > >>>> int r; > >>>>- fpriv =3D kzalloc(sizeof(*fpriv), GFP_KERNEL); > >>>>- if (unlikely(!fpriv)) { > >>>>- return -ENOMEM; > >>>>- } > >>>>- > >>>> vm =3D &fpriv->vm; > >>>> r =3D radeon_vm_init(rdev, vm); > >>>>- if (r) { > >>>>- kfree(fpriv); > >>>>- return r; > >>>>- } > >>>>+ if (r) > >>>>+ goto error; > >>>> if (rdev->accel_working) { > >>>> r =3D radeon_bo_reserve(rdev->ring_tmp_bo.bo, false); > >>>> if (r) { > >>>> radeon_vm_fini(rdev, vm); > >>>>- kfree(fpriv); > >>>>- return r; > >>>>+ goto error; > >>>> } > >>>> /* map the ib pool buffer read only into > >>>>@@ -623,16 +618,20 @@ int radeon_driver_open_kms(struct drm_device *d= ev, struct drm_file *file_priv) > >>>> RADEON_VM_PAGE_SNOOPED); > >>>> if (r) { > >>>> radeon_vm_fini(rdev, vm); > >>>>- kfree(fpriv); > >>>>- return r; > >>>>+ goto error; > >>>> } > >>>> } > >>>>- file_priv->driver_priv =3D fpriv; > >>>> } > >>>>+ mutex_init(&fpriv->seq_lock); > >>>>+ > >>>> pm_runtime_mark_last_busy(dev->dev); > >>>> pm_runtime_put_autosuspend(dev->dev); > >>>> return 0; > >>>>+ > >>>>+error: > >>>>+ kfree(fpriv); > >>>>+ return r; > >>>> } > >>>> /** > >>>>@@ -646,11 +645,13 @@ int radeon_driver_open_kms(struct drm_device *d= ev, struct drm_file *file_priv) > >>>> void radeon_driver_postclose_kms(struct drm_device *dev, > >>>> struct drm_file *file_priv) > >>>> { > >>>>+ struct radeon_fpriv *fpriv =3D file_priv->driver_priv; > >>>> struct radeon_device *rdev =3D dev->dev_private; > >>>>+ radeon_seq_destroy(&fpriv->seq); > >>>>+ > >>>> /* new gpu have virtual address space support */ > >>>> if (rdev->family >=3D CHIP_CAYMAN && file_priv->driver_priv) { > >>>>- struct radeon_fpriv *fpriv =3D file_priv->driver_priv; > >>>> struct radeon_vm *vm =3D &fpriv->vm; > >>>> int r; > >>>>@@ -664,9 +665,9 @@ void radeon_driver_postclose_kms(struct drm_devic= e *dev, > >>>> } > >>>> radeon_vm_fini(rdev, vm); > >>>>- kfree(fpriv); > >>>>- file_priv->driver_priv =3D NULL; > >>>> } > >>>>+ kfree(fpriv); > >>>>+ file_priv->driver_priv =3D NULL; > >>>> } > >>>> /** > >>>>diff --git a/drivers/gpu/drm/radeon/radeon_seq.c b/drivers/gpu/drm/ra= deon/radeon_seq.c > >>>>new file mode 100644 > >>>>index 0000000..d8857f1 > >>>>--- /dev/null > >>>>+++ b/drivers/gpu/drm/radeon/radeon_seq.c > >>>>@@ -0,0 +1,229 @@ > >>>>+/* > >>>>+ * Copyright 2014 Advanced Micro Devices, Inc. > >>>>+ * All Rights Reserved. 
> >>>>+ * > >>>>+ * Permission is hereby granted, free of charge, to any person obtai= ning a > >>>>+ * copy of this software and associated documentation files (the > >>>>+ * "Software"), to deal in the Software without restriction, includi= ng > >>>>+ * without limitation the rights to use, copy, modify, merge, publis= h, > >>>>+ * distribute, sub license, and/or sell copies of the Software, and = to > >>>>+ * permit persons to whom the Software is furnished to do so, subjec= t to > >>>>+ * the following conditions: > >>>>+ * > >>>>+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, E= XPRESS OR > >>>>+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTA= BILITY, > >>>>+ * FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVEN= T SHALL > >>>>+ * THE COPYRIGHT HOLDERS, AUTHORS AND/OR ITS SUPPLIERS BE LIABLE FOR= ANY CLAIM, > >>>>+ * DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TOR= T OR > >>>>+ * OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWAR= E OR THE > >>>>+ * USE OR OTHER DEALINGS IN THE SOFTWARE. > >>>>+ * > >>>>+ * The above copyright notice and this permission notice (including = the > >>>>+ * next paragraph) shall be included in all copies or substantial po= rtions > >>>>+ * of the Software. > >>>>+ * > >>>>+ */ > >>>>+/* > >>>>+ * Authors: > >>>>+ * Christian K=F6nig > >>>>+ */ > >>>>+ > >>>>+#include > >>>>+#include "radeon.h" > >>>>+ > >>>>+/* > >>>>+ * ID sequences > >>>>+ * This code generates a 64bit identifier for a command submission. > >>>>+ * It works by adding the fence of the command submission to a autom= atically > >>>>+ * resizing ring buffer. > >>>>+ */ > >>>>+ > >>>>+struct radeon_seq { > >>>>+ uint64_t start; > >>>>+ uint64_t end; > >>>>+ uint64_t mask; > >>>>+ struct radeon_seq *replacement; > >>>>+}; > >>>>+ > >>>>+/** > >>>>+ * radeon_seq_create - create a new sequence object > >>>>+ * > >>>>+ * @start: start value for this sequence > >>>>+ * @size: size of the ring buffer, must be power of two > >>>>+ * > >>>>+ * Allocate and initialize a new ring buffer and header. > >>>>+ * Returns NULL if allocation fails, new object otherwise. > >>>>+ */ > >>>>+static struct radeon_seq *radeon_seq_create(uint64_t start, unsigned= size) > >>>>+{ > >>>>+ unsigned bytes =3D sizeof(struct radeon_seq) + > >>>>+ size * sizeof(struct radeon_fence *); > >>>>+ > >>>>+ struct radeon_seq *seq; > >>>>+ > >>>>+ seq =3D kmalloc(bytes, GFP_KERNEL); > >>>>+ if (!seq) > >>>>+ return NULL; > >>>>+ > >>>>+ seq->start =3D start; > >>>>+ seq->end =3D start; > >>>>+ seq->mask =3D size - 1; > >>>>+ seq->replacement =3D NULL; > >>>>+ > >>>>+ return seq; > >>>>+} > >>>>+ > >>>>+/** > >>>>+ * radeon_seq_ring - get pointer to ring buffer > >>>>+ * > >>>>+ * @seq: sequence object > >>>>+ * > >>>>+ * Calculate the address of the ring buffer. > >>>>+ */ > >>>>+static struct radeon_fence **radeon_seq_ring(struct radeon_seq *seq) > >>>>+{ > >>>>+ return (struct radeon_fence **)&seq[1]; > >>>>+} > >>>>+ > >>>>+/** > >>>>+ * radeon_seq_try_free - try to free fences from the ring buffer > >>>>+ * > >>>>+ * @seq: sequence object > >>>>+ * > >>>>+ * Try to free fences from the start of the ring buffer. 
> >>>>+ */ > >>>>+static void radeon_seq_try_free(struct radeon_seq *seq) > >>>>+{ > >>>>+ struct radeon_fence **ring =3D radeon_seq_ring(seq); > >>>>+ > >>>>+ while (seq->start !=3D seq->end) { > >>>>+ unsigned idx =3D seq->start & seq->mask; > >>>>+ struct radeon_fence *fence =3D ring[idx]; > >>>>+ > >>>>+ if (!radeon_fence_signaled(fence)) > >>>>+ break; > >>>>+ > >>>>+ radeon_fence_unref(&fence); > >>>>+ ++seq->start; > >>>>+ } > >>>>+} > >>>>+ > >>>>+/** > >>>>+ * radeon_seq_add - add new fence to the end of the ring buffer > >>>>+ * > >>>>+ * @seq: sequence object > >>>>+ * @f: the fence object > >>>>+ * > >>>>+ * Add the fence and return the generated ID. > >>>>+ */ > >>>>+static uint64_t radeon_seq_add(struct radeon_seq *seq, struct radeon= _fence *f) > >>>>+{ > >>>>+ struct radeon_fence **ring =3D radeon_seq_ring(seq); > >>>>+ > >>>>+ ring[seq->end & seq->mask] =3D radeon_fence_ref(f); > >>>>+ return seq->end++; > >>>>+} > >>>>+ > >>>>+/** > >>>>+ * radeon_seq_push - check for room and add the fence > >>>>+ * > >>>>+ * @seq: sequence object > >>>>+ * @fence: the fence object > >>>>+ * > >>>>+ * Check for room on the ring buffer, if there isn't enough > >>>>+ * reallocate the sequence object and add the fence. > >>>>+ * Returns the generated ID. > >>>>+ */ > >>>>+uint64_t radeon_seq_push(struct radeon_seq **seq, struct radeon_fenc= e *fence) > >>>>+{ > >>>>+ unsigned size_for_new_seq =3D 4; > >>>>+ uint64_t start_for_new_seq =3D 1; > >>>>+ > >>>>+ if (*seq) { > >>>>+ /* try to release old replacements */ > >>>>+ while ((*seq)->replacement) { > >>>>+ radeon_seq_try_free(*seq); > >>>>+ if ((*seq)->start =3D=3D (*seq)->end) { > >>>>+ struct radeon_seq *repl =3D (*seq)->replacement; > >>>>+ > >>>>+ kfree(*seq); > >>>>+ *seq =3D repl; > >>>>+ } else { > >>>>+ /* move on to the current container */ > >>>>+ seq =3D &(*seq)->replacement; > >>>>+ } > >>>>+ } > >>>>+ > >>>>+ /* check if we have enough room for one more fence */ > >>>>+ radeon_seq_try_free(*seq); > >>>>+ if (((*seq)->end - (*seq)->start) <=3D (*seq)->mask) > >>>>+ return radeon_seq_add(*seq, fence); > >>>>+ > >>>>+ /* not enough room, let's allocate a replacement */ > >>>>+ size_for_new_seq =3D ((*seq)->mask + 1) * 2; > >>>>+ start_for_new_seq =3D (*seq)->end + 1; > >>>>+ seq =3D &(*seq)->replacement; > >>>>+ } > >>>>+ > >>>>+ *seq =3D radeon_seq_create(start_for_new_seq, size_for_new_seq); > >>>>+ if (!*seq) { > >>>>+ /* not enough memory for a new sequence object, but failing > >>>>+ here isn't a good idea either cause the commands are already > >>>>+ submitted to the hardware. So just block on the fence. */ > >>>>+ int r =3D radeon_fence_wait(fence, false); > >>>>+ if (r) > >>>>+ DRM_ERROR("Error waiting for fence (%d)\n", r); > >>>>+ return 0; > >>>>+ } > >>>>+ return radeon_seq_add(*seq, fence); > >>>>+} > >>>>+ > >>>>+/** > >>>>+ * radeon_seq_query - lockup fence by it's ID > >>>>+ * > >>>>+ * @seq: sequence object > >>>>+ * @id: the generated ID > >>>>+ * > >>>>+ * Lockup the associated fence by it's ID. > >>>>+ * Returns fence object or NULL if it couldn't be found. 
> >>>>+ */
> >>>>+struct radeon_fence *radeon_seq_query(struct radeon_seq *seq, uint64_t id)
> >>>>+{
> >>>>+ struct radeon_fence **ring;
> >>>>+
> >>>>+ while (seq && id > seq->end)
> >>>>+ seq = seq->replacement;
> >>>>+
> >>>>+ if (!seq || id < seq->start)
> >>>>+ return NULL;
> >>>>+
> >>>>+ ring = radeon_seq_ring(seq);
> >>>>+ return ring[id & seq->mask];
> >>>>+}
> >>>>+
> >>>>+/**
> >>>>+ * radeon_seq_destroy - destroy the sequence object
> >>>>+ *
> >>>>+ * @seq_ptr: pointer to sequence object
> >>>>+ *
> >>>>+ * Destroy the sequence objects and release all fence references taken.
> >>>>+ */
> >>>>+void radeon_seq_destroy(struct radeon_seq **seq_ptr)
> >>>>+{
> >>>>+ struct radeon_seq *seq = *seq_ptr;
> >>>>+ while (seq) {
> >>>>+ struct radeon_seq *repl = seq->replacement;
> >>>>+ unsigned start = seq->start & seq->mask;
> >>>>+ unsigned end = seq->end & seq->mask;
> >>>>+ struct radeon_fence **ring;
> >>>>+ unsigned i;
> >>>>+
> >>>>+ ring = radeon_seq_ring(seq);
> >>>>+ for (i = start; i < end; ++i)
> >>>>+ radeon_fence_unref(&ring[i]);
> >>>>+
> >>>>+ kfree(seq);
> >>>>+ seq = repl;
> >>>>+ }
> >>>>+ *seq_ptr = NULL;
> >>>>+}
> >>>>diff --git a/include/uapi/drm/radeon_drm.h b/include/uapi/drm/radeon_drm.h
> >>>>index 50d0fb4..6b2b2e7 100644
> >>>>--- a/include/uapi/drm/radeon_drm.h
> >>>>+++ b/include/uapi/drm/radeon_drm.h
> >>>>@@ -959,6 +959,7 @@ struct drm_radeon_gem_va {
> >>>> #define RADEON_CS_RING_VCE 4
> >>>> /* The third dword of RADEON_CHUNK_ID_FLAGS is a sint32 that sets the priority */
> >>>> /* 0 = normal, + = higher priority, - = lower priority */
> >>>>+/* The fourth and fifth dwords are a 64bit fence ID generated for this CS */
> >>>> struct drm_radeon_cs_chunk {
> >>>> uint32_t chunk_id;
> >>>>--
> >>>>1.9.1
> >>>>
> >>>>_______________________________________________
> >>>>dri-devel mailing list
> >>>>dri-devel@lists.freedesktop.org
> >>>>http://lists.freedesktop.org/mailman/listinfo/dri-devel
> >>
> >>From b95fe503dad89875da9db2d11d05f93eca41232d Mon Sep 17 00:00:00 2001
> >>From: Jérôme Glisse
> >>Date: Thu, 18 Sep 2014 22:51:21 -0400
> >>Subject: [PATCH] drm/radeon: cs sequence id and cs completion query.
> >>MIME-Version: 1.0
> >>Content-Type: text/plain; charset=UTF-8
> >>Content-Transfer-Encoding: 8bit
> >>
> >>This reports back to userspace the ring id and sequence number that can
> >>be used by userspace to query for the completion of the cs on the
> >>hardware. This also adds a new ioctl to perform such a query.
> >>
> >>There is an extra value supplied to userspace, the wrap counter, that
> >>is used to detect wrap around and to gracefully handle the case where
> >>userspace might be querying about a very, very, very old cs executed
> >>a long time ago in a far far away universe.
> >>
> >>This patch is aimed to introduce the necessary ground work for user
> >>space explicit synchronization. By allowing userspace to query about
> >>cs completion on hardware, userspace can perform operations and
> >>synchronization on buffers by itself without having the cs ioctl
> >>implicitly wait for older cs completion before scheduling a new cs.
> >>This part is however left to a follow-up patch.
> >>
> >>Signed-off-by: Jérôme Glisse
> >>---
> >> drivers/gpu/drm/radeon/radeon.h | 2 ++
> >> drivers/gpu/drm/radeon/radeon_cs.c | 60 ++++++++++++++++++++++++++++++++++
> >> drivers/gpu/drm/radeon/radeon_fence.c | 4 +++
> >> drivers/gpu/drm/radeon/radeon_kms.c | 1 +
> >> include/uapi/drm/radeon_drm.h | 9 ++++++
> >> 5 files changed, 76 insertions(+)
> >>
> >>diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
> >>index 5f05b4c..1f1dd1f 100644
> >>--- a/drivers/gpu/drm/radeon/radeon.h
> >>+++ b/drivers/gpu/drm/radeon/radeon.h
> >>@@ -118,6 +118,7 @@ extern int radeon_bapm;
> >> #define RADEON_DEBUGFS_MAX_COMPONENTS 32
> >> #define RADEONFB_CONN_LIMIT 4
> >> #define RADEON_BIOS_NUM_SCRATCH 8
> >>+#define RADEON_SEQ_WRAP_VALUE (1 << 30)
> >> /* fence seq are set to this number when signaled */
> >> #define RADEON_FENCE_SIGNALED_SEQ 0LL
> >>@@ -355,6 +356,7 @@ struct radeon_fence_driver {
> >> /* sync_seq is protected by ring emission lock */
> >> uint64_t sync_seq[RADEON_NUM_RINGS];
> >> atomic64_t last_seq;
> >>+ int32_t wrap_seq;
> >> bool initialized;
> >> };
> >>diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c
> >>index 83f382e..be4ae25 100644
> >>--- a/drivers/gpu/drm/radeon/radeon_cs.c
> >>+++ b/drivers/gpu/drm/radeon/radeon_cs.c
> >>@@ -404,6 +404,17 @@ static void radeon_cs_parser_fini(struct radeon_cs_parser *parser, int error, bo
> >> ttm_eu_fence_buffer_objects(&parser->ticket,
> >> &parser->validated,
> >> parser->ib.fence);
> >>+ if (parser->chunk_flags && parser->chunk_flags->length_dw > 4) {
> >>+ uint32_t __user *to = parser->chunk_flags->user_ptr;
> >>+ uint32_t tmp;
> >>+
> >>+ tmp = lower_32_bits(parser->ib.fence->seq);
> >>+ copy_to_user(&to[3], &tmp, sizeof(uint32_t));
> >>+ tmp = parser->ib.fence->ring;
> >>+ copy_to_user(&to[4], &tmp, sizeof(uint32_t));
> >>+ tmp = rdev->fence_drv[tmp].wrap_seq;
> >>+ copy_to_user(&to[5], &tmp, sizeof(uint32_t));
> >>+ }
> >> } else if (backoff) {
> >> ttm_eu_backoff_reservation(&parser->ticket,
> >> &parser->validated);
> >>@@ -823,3 +834,52 @@ int radeon_cs_packet_next_reloc(struct radeon_cs_parser *p,
> >> *cs_reloc = p->relocs_ptr[(idx / 4)];
> >> return 0;
> >> }
> >>+
> >>+int radeon_cs_done_ioctl(struct drm_device *dev, void *data, struct drm_file *filp)
> >>+{
> >>+ struct radeon_device *rdev = dev->dev_private;
> >>+ struct drm_radeon_cs_done *args = data;
> >>+ unsigned i = args->ring;
> >>+ int32_t last_seq, sync_seq, wrap_seq;
> >>+
> >>+ /* FIXME check args->ring value is ok. */
> >>+
> >>+ /*
> >>+ * The memory barrier is matched with the one in radeon_fence_emit() and
> >>+ * it ensures that we get the right matching wrap_seq and sync_seq.
> >>+ *
> >>+ * Note that there is no need to protect fence_drv.sync_seq here as the
> >>+ * barrier ensures we will get the coherency we need.
> >>+ */
> >>+ wrap_seq = ACCESS_ONCE(rdev->fence_drv[i].wrap_seq);
> >>+ smp_rmb();
> >>+ sync_seq = lower_32_bits(ACCESS_ONCE(rdev->fence_drv[i].sync_seq[i]));
> >>+
> >>+ /*
> >>+ * So if the current wrap_seq and the one we are queried with differ by
> >>+ * more than one, this means we are being queried about a very old fence
> >>+ * seq value and we can assume it is long done.
> >>+ *
> >>+ * Well, this is not entirely true; for it to be true we would need to
> >>+ * stall when we increment the wrap counter if cs from the previous wrap
> >>+ * were not completed, but this is highly unlikely. So live with the
> >>+ * thrill of it going wrong!
> >>+ */
> >>+ if (abs((unsigned)wrap_seq - (unsigned)args->wrap) > 1)
> >>+ return 1;
> >>+ /* Now check if the currently reported fence seq is done or not. */
> >>+ /* FIXME call fence func to update last_seq just in case. */
> >>+ last_seq = lower_32_bits(atomic64_read(&rdev->fence_drv[i].last_seq));
> >>+ if ((last_seq - args->seq) >= 0)
> >>+ return 1;
> >>+ /*
> >>+ * Last failsafe to handle the horrible case where userspace held on to
> >>+ * a wrap and seq value for so long without querying that we wrapped
> >>+ * around. This is here to avoid userspace waiting for a fence that was
> >>+ * emitted a long time ago, but the current sync_seq[ring] value might be
> >>+ * stuck and thus we might go bigger than this very old seq value.
> >>+ */
> >>+ if (((sync_seq - args->seq) < 0) && args->wrap == wrap_seq)
> >>+ return 1;
> >>+ return 0;
> >>+}
> >>diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c
> >>index 9137870..a6adcf6 100644
> >>--- a/drivers/gpu/drm/radeon/radeon_fence.c
> >>+++ b/drivers/gpu/drm/radeon/radeon_fence.c
> >>@@ -119,6 +119,10 @@ int radeon_fence_emit(struct radeon_device *rdev,
> >> kref_init(&((*fence)->kref));
> >> (*fence)->rdev = rdev;
> >> (*fence)->seq = ++rdev->fence_drv[ring].sync_seq[ring];
> >>+ /* Barrier is important for radeon_cs_done_ioctl. */
> >>+ smp_wmb();
> >>+ if (rdev->fence_drv[ring].sync_seq[ring] == RADEON_SEQ_WRAP_VALUE)
> >>+ rdev->fence_drv[ring].wrap_seq++;
> >> (*fence)->ring = ring;
> >> radeon_fence_ring_emit(rdev, ring, *fence);
> >> trace_radeon_fence_emit(rdev->ddev, ring, (*fence)->seq);
> >>diff --git a/drivers/gpu/drm/radeon/radeon_kms.c b/drivers/gpu/drm/radeon/radeon_kms.c
> >>index eb7164d..c9cfcf5 100644
> >>--- a/drivers/gpu/drm/radeon/radeon_kms.c
> >>+++ b/drivers/gpu/drm/radeon/radeon_kms.c
> >>@@ -885,5 +885,6 @@ const struct drm_ioctl_desc radeon_ioctls_kms[] = {
> >> DRM_IOCTL_DEF_DRV(RADEON_GEM_BUSY, radeon_gem_busy_ioctl, DRM_AUTH|DRM_UNLOCKED|DRM_RENDER_ALLOW),
> >> DRM_IOCTL_DEF_DRV(RADEON_GEM_VA, radeon_gem_va_ioctl, DRM_AUTH|DRM_UNLOCKED|DRM_RENDER_ALLOW),
> >> DRM_IOCTL_DEF_DRV(RADEON_GEM_OP, radeon_gem_op_ioctl, DRM_AUTH|DRM_UNLOCKED|DRM_RENDER_ALLOW),
> >>+ DRM_IOCTL_DEF_DRV(RADEON_CS_DONE, radeon_cs_done_ioctl, DRM_AUTH|DRM_UNLOCKED|DRM_RENDER_ALLOW),
> >> };
> >> int radeon_max_kms_ioctl = ARRAY_SIZE(radeon_ioctls_kms);
> >>diff --git a/include/uapi/drm/radeon_drm.h b/include/uapi/drm/radeon_drm.h
> >>index fea6099..a0c215d 100644
> >>--- a/include/uapi/drm/radeon_drm.h
> >>+++ b/include/uapi/drm/radeon_drm.h
> >>@@ -554,6 +554,7 @@ typedef struct {
> >> #define DRM_IOCTL_RADEON_GEM_BUSY DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_BUSY, struct drm_radeon_gem_busy)
> >> #define DRM_IOCTL_RADEON_GEM_VA DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_VA, struct drm_radeon_gem_va)
> >> #define DRM_IOCTL_RADEON_GEM_OP DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_OP, struct drm_radeon_gem_op)
> >>+#define DRM_IOCTL_RADEON_CS_DONE DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_CS_DONE, struct drm_radeon_cs_done)
> >> typedef struct drm_radeon_init {
> >> enum {
> >>@@ -936,6 +937,7 @@ struct drm_radeon_gem_va {
> >> #define RADEON_CS_RING_VCE 4
> >> /* The third dword of RADEON_CHUNK_ID_FLAGS is a sint32 that sets the priority */
> >> /* 0 = normal, + = higher priority, - = lower priority */
> >>+/* The fourth, fifth and sixth dwords are the 32bit fence seq, ring id and wrap counter of this CS */
> >> struct drm_radeon_cs_chunk {
> >> uint32_t chunk_id;
> >>@@ -1038,4 +1040,11 @@ struct drm_radeon_info {
> >> #define CIK_TILE_MODE_DEPTH_STENCIL_1D 5
> >>+struct drm_radeon_cs_done {
> >>+ int32_t seq;
> >>+ int32_t ring;
> >>+ int32_t wrap;
> >>+ int32_t pad;
> >>+};
> >>+
> >> #endif
> >>--
> >>1.9.3
> >>
>
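
[Editorial sketch] For reference, a minimal sketch of how userspace might
consume the interface proposed in the patch above, assuming the extended
RADEON_CHUNK_ID_FLAGS chunk and DRM_IOCTL_RADEON_CS_DONE land as posted (the
struct fields, the ioctl and the dword layout are taken from the patch;
everything else, including the helper name, is illustrative):

#include <stdint.h>
#include <xf86drm.h>
#include <drm/radeon_drm.h>	/* assumes the patched uapi header */

/* Ask the kernel whether a previously submitted CS has completed.
 * flags[] is the RADEON_CHUNK_ID_FLAGS chunk after the CS ioctl returned:
 * as proposed, dwords 3..5 hold the 32bit fence seq, ring id and wrap
 * counter for that submission. */
static int cs_is_done(int fd, const uint32_t flags[6])
{
	struct drm_radeon_cs_done args;

	args.seq  = flags[3];	/* fence seq reported by the CS ioctl */
	args.ring = flags[4];	/* ring the CS was submitted to */
	args.wrap = flags[5];	/* wrap counter at submission time */
	args.pad  = 0;

	/* As proposed, the handler returns 1 when the CS has completed,
	 * 0 when it is still pending, negative on error. */
	return drmIoctl(fd, DRM_IOCTL_RADEON_CS_DONE, &args);
}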