From mboxrd@z Thu Jan 1 00:00:00 1970 From: =?windows-1252?Q?Christian_K=F6nig?= Subject: Re: [PATCH 2/3] drm/radeon: add command submission IDs Date: Fri, 19 Sep 2014 16:29:35 +0200 Message-ID: <541C3DCF.9020800@vodafone.de> References: <1411054213-11173-1-git-send-email-deathsimple@vodafone.de> <1411054213-11173-2-git-send-email-deathsimple@vodafone.de> <20140919004215.GA3447@gmail.com> <20140919031155.GB3447@gmail.com> <20140919040629.GC3447@gmail.com> <541BE35F.9000200@vodafone.de> <20140919134050.GA2581@gmail.com> Mime-Version: 1.0 Content-Type: text/plain; charset="windows-1252"; Format="flowed" Content-Transfer-Encoding: quoted-printable Return-path: Received: from pegasos-out.vodafone.de (pegasos-out.vodafone.de [80.84.1.38]) by gabe.freedesktop.org (Postfix) with ESMTP id 091F7899D6 for ; Fri, 19 Sep 2014 07:29:47 -0700 (PDT) In-Reply-To: <20140919134050.GA2581@gmail.com> List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: Jerome Glisse Cc: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org Am 19.09.2014 um 15:40 schrieb Jerome Glisse: > On Fri, Sep 19, 2014 at 10:03:43AM +0200, Christian K=F6nig wrote: >> Hi Jerome, >> >> this is exactly the approach which we wanted to avoid. And according to = Alex >> it was actually you guys (I think Dave) who noted that this is probably = not >> such a good idea. >> >> The issue is that by telling userspace the ring specific sequence number= we >> nail down the order of execution. E.g. a scheduler who want to change the >> order in which IBs are submitted to the hardware can't do so any more >> because we already gave userspace a sequence number to identify the comm= and >> submission. > Well for curent code this design does work, for hw where you want to have= the > hw/driver do some scheduling this can still be achieve by having a seq nu= mber > per ring per process. Correct, for current design we could do so but we have more long term = goals with that in mind. I will reorder my work a bit which should make = it more clear why we need the original fence and not only the sequence = for this. > >> Additional to that for hardware inter device synchronization we want to >> refer to a certain kernel object and not just the fence number. So we can >> only use some kind of handle based approach, e.g. as discussed before we >> want to be able to turn this sequence number in a FD fence. > Inter-device is not an issue here. For inter-device you want to convert t= his > seq number to an fd at which point you can create a fence structure with = the > proper information ie create a structure only if it is needed. > > Really this is the only sane design if we start allocating structure for = each > cs (well for each cs userspace cares) this gonna play bad with memory pre= ssure. Mhm, what's the matter with this? My implementation only allocates a single ring buffer for each client = which automatically resizes, apart from that we only have the fence = structure which is allocated anyway and released when automatically as = soon as all users goes away. And the scheduler implementation the Intel guys are working on would use = even more memory. As long as everything as accounted to the correct = process I don't really see a problem with that. Regards, Christian. > > Cheers, > J=E9r=F4me > >> Regards, >> Christian. >> >> Am 19.09.2014 um 06:06 schrieb Jerome Glisse: >>> On Thu, Sep 18, 2014 at 11:11:57PM -0400, Jerome Glisse wrote: >>>> On Thu, Sep 18, 2014 at 08:42:16PM -0400, Jerome Glisse wrote: >>>>> On Thu, Sep 18, 2014 at 05:30:12PM +0200, Christian K=F6nig wrote: >>>>>> From: Christian K=F6nig >>>>>> >>>>>> This patch adds a new 64bit ID as a result to each command submissio= n. >>>>> NAK for design reasons. >>>>> >>>>> First and foremost we DO NOT WANT TO ALLOCATE ANYTHING for submission= id. >>>>> I know that hooking up to fence make life easy but this is not how it >>>>> should be done. >>>>> >>>>> What you want to expose is the per ring 32bit seq value and report to= user >>>>> space the value and the ring id. To handle the wrap around you add a = uniq >>>>> 32bit wrap counter which is incremented every 1 << 30 or some big num= ber >>>>> seq value number (but must be smaller than 1 << 31). Then you use ari= thmetic >>>>> to determine if a cs seq number is done or not (ie the seq wrap around >>>>> counter diff should not be bigger then 2 or 3) and the seq number sho= uld >>>>> obviously be after again using arithmetic diff like jiffies code. >>>>> >>>>> All this being hidden in the kernel all the user space have to know i= s : >>>>> 32 bit seq value per ring >>>>> 32 bit ring uniq id >>>>> 32 wrap counter per ring >>>>> >>>>> With such scheme you never allocate anything or worry about any fence. >>>>> Way simpler and less code. >>>>> >>>>> Cheers, >>>>> J=E9r=F4me >>>> Because code is clearer than words attached is a patch of what i am th= inking >>>> about. The arithmetic should be right as well as the coherency and pro= per >>>> memory barrier. Thought it is completely untested and will require cou= ple >>>> fixes for properly checking userspace arguments and other aspect (see = FIXME >>>> in patches). >>>> >>>> There is no allocation, it is a pretty small patch, and it provide a f= ast >>>> ligthweight solution. While the new ioctl argument and return value ca= n be >>>> improved (like reporting more information to userspace like for instan= ce >>>> warning userspace when it is querying very old fences). I think the ov= erall >>>> design is sound. >>>> >>>> Let me know. >>> Ok actually after a beer (given that planet express manual for Bender, = which >>> also apply to me, clearly stipulate to expect malfunction if not proper= ly >>> inebriate) it became clear that by reversing the problem i could make i= t a >>> lot simpler. So attached is an even simpler patch that handle gracefull= y user >>> space querying for very old fence (ie that could have wrap around). >>> >>> Code explains it all. >>> >>>> Cheers, >>>> J=E9r=F4me >>>> >>>>>> Signed-off-by: Christian K=F6nig >>>>>> --- >>>>>> drivers/gpu/drm/radeon/Makefile | 2 +- >>>>>> drivers/gpu/drm/radeon/radeon.h | 13 +- >>>>>> drivers/gpu/drm/radeon/radeon_cs.c | 13 ++ >>>>>> drivers/gpu/drm/radeon/radeon_kms.c | 41 +++---- >>>>>> drivers/gpu/drm/radeon/radeon_seq.c | 229 ++++++++++++++++++++++++= ++++++++++++ >>>>>> include/uapi/drm/radeon_drm.h | 1 + >>>>>> 6 files changed, 277 insertions(+), 22 deletions(-) >>>>>> create mode 100644 drivers/gpu/drm/radeon/radeon_seq.c >>>>>> >>>>>> diff --git a/drivers/gpu/drm/radeon/Makefile b/drivers/gpu/drm/radeo= n/Makefile >>>>>> index 8cc9f68..21ee928 100644 >>>>>> --- a/drivers/gpu/drm/radeon/Makefile >>>>>> +++ b/drivers/gpu/drm/radeon/Makefile >>>>>> @@ -81,7 +81,7 @@ radeon-y +=3D radeon_device.o radeon_asic.o radeon= _kms.o \ >>>>>> rv770_smc.o cypress_dpm.o btc_dpm.o sumo_dpm.o sumo_smc.o trinity= _dpm.o \ >>>>>> trinity_smc.o ni_dpm.o si_smc.o si_dpm.o kv_smc.o kv_dpm.o ci_smc= .o \ >>>>>> ci_dpm.o dce6_afmt.o radeon_vm.o radeon_ucode.o radeon_ib.o radeo= n_mn.o \ >>>>>> - radeon_sync.o >>>>>> + radeon_sync.o radeon_seq.o >>>>>> # add async DMA block >>>>>> radeon-y +=3D \ >>>>>> diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeo= n/radeon.h >>>>>> index b3b4e96..dbfd346 100644 >>>>>> --- a/drivers/gpu/drm/radeon/radeon.h >>>>>> +++ b/drivers/gpu/drm/radeon/radeon.h >>>>>> @@ -423,6 +423,15 @@ static inline bool radeon_fence_is_earlier(stru= ct radeon_fence *a, >>>>>> } >>>>>> /* >>>>>> + * Userspace command submission identifier generation >>>>>> + */ >>>>>> +struct radeon_seq; >>>>>> + >>>>>> +uint64_t radeon_seq_push(struct radeon_seq **seq, struct radeon_fen= ce *fence); >>>>>> +struct radeon_fence *radeon_seq_query(struct radeon_seq *seq, uint6= 4_t id); >>>>>> +void radeon_seq_destroy(struct radeon_seq **seq); >>>>>> + >>>>>> +/* >>>>>> * Tiling registers >>>>>> */ >>>>>> struct radeon_surface_reg { >>>>>> @@ -946,7 +955,9 @@ struct radeon_vm_manager { >>>>>> * file private structure >>>>>> */ >>>>>> struct radeon_fpriv { >>>>>> - struct radeon_vm vm; >>>>>> + struct radeon_vm vm; >>>>>> + struct mutex seq_lock; >>>>>> + struct radeon_seq *seq; >>>>>> }; >>>>>> /* >>>>>> diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/ra= deon/radeon_cs.c >>>>>> index b8f20a5..0c0f0b3 100644 >>>>>> --- a/drivers/gpu/drm/radeon/radeon_cs.c >>>>>> +++ b/drivers/gpu/drm/radeon/radeon_cs.c >>>>>> @@ -413,6 +413,19 @@ static void radeon_cs_parser_fini(struct radeon= _cs_parser *parser, int error, bo >>>>>> unsigned i; >>>>>> if (!error) { >>>>>> + if (parser->chunk_flags && >>>>>> + parser->chunk_flags->length_dw > 4) { >>>>>> + struct radeon_fpriv *fpriv =3D parser->filp->driver_priv; >>>>>> + uint32_t __user *to =3D parser->chunk_flags->user_ptr; >>>>>> + uint64_t id; >>>>>> + >>>>>> + mutex_lock(&fpriv->seq_lock); >>>>>> + id =3D radeon_seq_push(&fpriv->seq, parser->ib.fence); >>>>>> + mutex_unlock(&fpriv->seq_lock); >>>>>> + >>>>>> + copy_to_user(&to[3], &id, sizeof(uint64_t)); >>>>>> + } >>>>>> + >>>>>> /* Sort the buffer list from the smallest to largest buffer, >>>>>> * which affects the order of buffers in the LRU list. >>>>>> * This assures that the smallest buffers are added first >>>>>> diff --git a/drivers/gpu/drm/radeon/radeon_kms.c b/drivers/gpu/drm/r= adeon/radeon_kms.c >>>>>> index 85ee6f7..38880af 100644 >>>>>> --- a/drivers/gpu/drm/radeon/radeon_kms.c >>>>>> +++ b/drivers/gpu/drm/radeon/radeon_kms.c >>>>>> @@ -578,39 +578,34 @@ void radeon_driver_lastclose_kms(struct drm_de= vice *dev) >>>>>> */ >>>>>> int radeon_driver_open_kms(struct drm_device *dev, struct drm_file= *file_priv) >>>>>> { >>>>>> + struct radeon_fpriv *fpriv =3D kzalloc(sizeof(*fpriv), GFP_KERNEL); >>>>>> struct radeon_device *rdev =3D dev->dev_private; >>>>>> int r; >>>>>> - file_priv->driver_priv =3D NULL; >>>>>> + if (unlikely(!fpriv)) >>>>>> + return -ENOMEM; >>>>>> + >>>>>> + file_priv->driver_priv =3D fpriv; >>>>>> r =3D pm_runtime_get_sync(dev->dev); >>>>>> if (r < 0) >>>>>> - return r; >>>>>> + goto error; >>>>>> /* new gpu have virtual address space support */ >>>>>> if (rdev->family >=3D CHIP_CAYMAN) { >>>>>> - struct radeon_fpriv *fpriv; >>>>>> struct radeon_vm *vm; >>>>>> int r; >>>>>> - fpriv =3D kzalloc(sizeof(*fpriv), GFP_KERNEL); >>>>>> - if (unlikely(!fpriv)) { >>>>>> - return -ENOMEM; >>>>>> - } >>>>>> - >>>>>> vm =3D &fpriv->vm; >>>>>> r =3D radeon_vm_init(rdev, vm); >>>>>> - if (r) { >>>>>> - kfree(fpriv); >>>>>> - return r; >>>>>> - } >>>>>> + if (r) >>>>>> + goto error; >>>>>> if (rdev->accel_working) { >>>>>> r =3D radeon_bo_reserve(rdev->ring_tmp_bo.bo, false); >>>>>> if (r) { >>>>>> radeon_vm_fini(rdev, vm); >>>>>> - kfree(fpriv); >>>>>> - return r; >>>>>> + goto error; >>>>>> } >>>>>> /* map the ib pool buffer read only into >>>>>> @@ -623,16 +618,20 @@ int radeon_driver_open_kms(struct drm_device *= dev, struct drm_file *file_priv) >>>>>> RADEON_VM_PAGE_SNOOPED); >>>>>> if (r) { >>>>>> radeon_vm_fini(rdev, vm); >>>>>> - kfree(fpriv); >>>>>> - return r; >>>>>> + goto error; >>>>>> } >>>>>> } >>>>>> - file_priv->driver_priv =3D fpriv; >>>>>> } >>>>>> + mutex_init(&fpriv->seq_lock); >>>>>> + >>>>>> pm_runtime_mark_last_busy(dev->dev); >>>>>> pm_runtime_put_autosuspend(dev->dev); >>>>>> return 0; >>>>>> + >>>>>> +error: >>>>>> + kfree(fpriv); >>>>>> + return r; >>>>>> } >>>>>> /** >>>>>> @@ -646,11 +645,13 @@ int radeon_driver_open_kms(struct drm_device *= dev, struct drm_file *file_priv) >>>>>> void radeon_driver_postclose_kms(struct drm_device *dev, >>>>>> struct drm_file *file_priv) >>>>>> { >>>>>> + struct radeon_fpriv *fpriv =3D file_priv->driver_priv; >>>>>> struct radeon_device *rdev =3D dev->dev_private; >>>>>> + radeon_seq_destroy(&fpriv->seq); >>>>>> + >>>>>> /* new gpu have virtual address space support */ >>>>>> if (rdev->family >=3D CHIP_CAYMAN && file_priv->driver_priv) { >>>>>> - struct radeon_fpriv *fpriv =3D file_priv->driver_priv; >>>>>> struct radeon_vm *vm =3D &fpriv->vm; >>>>>> int r; >>>>>> @@ -664,9 +665,9 @@ void radeon_driver_postclose_kms(struct drm_devi= ce *dev, >>>>>> } >>>>>> radeon_vm_fini(rdev, vm); >>>>>> - kfree(fpriv); >>>>>> - file_priv->driver_priv =3D NULL; >>>>>> } >>>>>> + kfree(fpriv); >>>>>> + file_priv->driver_priv =3D NULL; >>>>>> } >>>>>> /** >>>>>> diff --git a/drivers/gpu/drm/radeon/radeon_seq.c b/drivers/gpu/drm/r= adeon/radeon_seq.c >>>>>> new file mode 100644 >>>>>> index 0000000..d8857f1 >>>>>> --- /dev/null >>>>>> +++ b/drivers/gpu/drm/radeon/radeon_seq.c >>>>>> @@ -0,0 +1,229 @@ >>>>>> +/* >>>>>> + * Copyright 2014 Advanced Micro Devices, Inc. >>>>>> + * All Rights Reserved. >>>>>> + * >>>>>> + * Permission is hereby granted, free of charge, to any person obta= ining a >>>>>> + * copy of this software and associated documentation files (the >>>>>> + * "Software"), to deal in the Software without restriction, includ= ing >>>>>> + * without limitation the rights to use, copy, modify, merge, publi= sh, >>>>>> + * distribute, sub license, and/or sell copies of the Software, and= to >>>>>> + * permit persons to whom the Software is furnished to do so, subje= ct to >>>>>> + * the following conditions: >>>>>> + * >>>>>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, = EXPRESS OR >>>>>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANT= ABILITY, >>>>>> + * FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVE= NT SHALL >>>>>> + * THE COPYRIGHT HOLDERS, AUTHORS AND/OR ITS SUPPLIERS BE LIABLE FO= R ANY CLAIM, >>>>>> + * DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TO= RT OR >>>>>> + * OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWA= RE OR THE >>>>>> + * USE OR OTHER DEALINGS IN THE SOFTWARE. >>>>>> + * >>>>>> + * The above copyright notice and this permission notice (including= the >>>>>> + * next paragraph) shall be included in all copies or substantial p= ortions >>>>>> + * of the Software. >>>>>> + * >>>>>> + */ >>>>>> +/* >>>>>> + * Authors: >>>>>> + * Christian K=F6nig >>>>>> + */ >>>>>> + >>>>>> +#include >>>>>> +#include "radeon.h" >>>>>> + >>>>>> +/* >>>>>> + * ID sequences >>>>>> + * This code generates a 64bit identifier for a command submission. >>>>>> + * It works by adding the fence of the command submission to a auto= matically >>>>>> + * resizing ring buffer. >>>>>> + */ >>>>>> + >>>>>> +struct radeon_seq { >>>>>> + uint64_t start; >>>>>> + uint64_t end; >>>>>> + uint64_t mask; >>>>>> + struct radeon_seq *replacement; >>>>>> +}; >>>>>> + >>>>>> +/** >>>>>> + * radeon_seq_create - create a new sequence object >>>>>> + * >>>>>> + * @start: start value for this sequence >>>>>> + * @size: size of the ring buffer, must be power of two >>>>>> + * >>>>>> + * Allocate and initialize a new ring buffer and header. >>>>>> + * Returns NULL if allocation fails, new object otherwise. >>>>>> + */ >>>>>> +static struct radeon_seq *radeon_seq_create(uint64_t start, unsigne= d size) >>>>>> +{ >>>>>> + unsigned bytes =3D sizeof(struct radeon_seq) + >>>>>> + size * sizeof(struct radeon_fence *); >>>>>> + >>>>>> + struct radeon_seq *seq; >>>>>> + >>>>>> + seq =3D kmalloc(bytes, GFP_KERNEL); >>>>>> + if (!seq) >>>>>> + return NULL; >>>>>> + >>>>>> + seq->start =3D start; >>>>>> + seq->end =3D start; >>>>>> + seq->mask =3D size - 1; >>>>>> + seq->replacement =3D NULL; >>>>>> + >>>>>> + return seq; >>>>>> +} >>>>>> + >>>>>> +/** >>>>>> + * radeon_seq_ring - get pointer to ring buffer >>>>>> + * >>>>>> + * @seq: sequence object >>>>>> + * >>>>>> + * Calculate the address of the ring buffer. >>>>>> + */ >>>>>> +static struct radeon_fence **radeon_seq_ring(struct radeon_seq *seq) >>>>>> +{ >>>>>> + return (struct radeon_fence **)&seq[1]; >>>>>> +} >>>>>> + >>>>>> +/** >>>>>> + * radeon_seq_try_free - try to free fences from the ring buffer >>>>>> + * >>>>>> + * @seq: sequence object >>>>>> + * >>>>>> + * Try to free fences from the start of the ring buffer. >>>>>> + */ >>>>>> +static void radeon_seq_try_free(struct radeon_seq *seq) >>>>>> +{ >>>>>> + struct radeon_fence **ring =3D radeon_seq_ring(seq); >>>>>> + >>>>>> + while (seq->start !=3D seq->end) { >>>>>> + unsigned idx =3D seq->start & seq->mask; >>>>>> + struct radeon_fence *fence =3D ring[idx]; >>>>>> + >>>>>> + if (!radeon_fence_signaled(fence)) >>>>>> + break; >>>>>> + >>>>>> + radeon_fence_unref(&fence); >>>>>> + ++seq->start; >>>>>> + } >>>>>> +} >>>>>> + >>>>>> +/** >>>>>> + * radeon_seq_add - add new fence to the end of the ring buffer >>>>>> + * >>>>>> + * @seq: sequence object >>>>>> + * @f: the fence object >>>>>> + * >>>>>> + * Add the fence and return the generated ID. >>>>>> + */ >>>>>> +static uint64_t radeon_seq_add(struct radeon_seq *seq, struct radeo= n_fence *f) >>>>>> +{ >>>>>> + struct radeon_fence **ring =3D radeon_seq_ring(seq); >>>>>> + >>>>>> + ring[seq->end & seq->mask] =3D radeon_fence_ref(f); >>>>>> + return seq->end++; >>>>>> +} >>>>>> + >>>>>> +/** >>>>>> + * radeon_seq_push - check for room and add the fence >>>>>> + * >>>>>> + * @seq: sequence object >>>>>> + * @fence: the fence object >>>>>> + * >>>>>> + * Check for room on the ring buffer, if there isn't enough >>>>>> + * reallocate the sequence object and add the fence. >>>>>> + * Returns the generated ID. >>>>>> + */ >>>>>> +uint64_t radeon_seq_push(struct radeon_seq **seq, struct radeon_fen= ce *fence) >>>>>> +{ >>>>>> + unsigned size_for_new_seq =3D 4; >>>>>> + uint64_t start_for_new_seq =3D 1; >>>>>> + >>>>>> + if (*seq) { >>>>>> + /* try to release old replacements */ >>>>>> + while ((*seq)->replacement) { >>>>>> + radeon_seq_try_free(*seq); >>>>>> + if ((*seq)->start =3D=3D (*seq)->end) { >>>>>> + struct radeon_seq *repl =3D (*seq)->replacement; >>>>>> + >>>>>> + kfree(*seq); >>>>>> + *seq =3D repl; >>>>>> + } else { >>>>>> + /* move on to the current container */ >>>>>> + seq =3D &(*seq)->replacement; >>>>>> + } >>>>>> + } >>>>>> + >>>>>> + /* check if we have enough room for one more fence */ >>>>>> + radeon_seq_try_free(*seq); >>>>>> + if (((*seq)->end - (*seq)->start) <=3D (*seq)->mask) >>>>>> + return radeon_seq_add(*seq, fence); >>>>>> + >>>>>> + /* not enough room, let's allocate a replacement */ >>>>>> + size_for_new_seq =3D ((*seq)->mask + 1) * 2; >>>>>> + start_for_new_seq =3D (*seq)->end + 1; >>>>>> + seq =3D &(*seq)->replacement; >>>>>> + } >>>>>> + >>>>>> + *seq =3D radeon_seq_create(start_for_new_seq, size_for_new_seq); >>>>>> + if (!*seq) { >>>>>> + /* not enough memory for a new sequence object, but failing >>>>>> + here isn't a good idea either cause the commands are already >>>>>> + submitted to the hardware. So just block on the fence. */ >>>>>> + int r =3D radeon_fence_wait(fence, false); >>>>>> + if (r) >>>>>> + DRM_ERROR("Error waiting for fence (%d)\n", r); >>>>>> + return 0; >>>>>> + } >>>>>> + return radeon_seq_add(*seq, fence); >>>>>> +} >>>>>> + >>>>>> +/** >>>>>> + * radeon_seq_query - lockup fence by it's ID >>>>>> + * >>>>>> + * @seq: sequence object >>>>>> + * @id: the generated ID >>>>>> + * >>>>>> + * Lockup the associated fence by it's ID. >>>>>> + * Returns fence object or NULL if it couldn't be found. >>>>>> + */ >>>>>> +struct radeon_fence *radeon_seq_query(struct radeon_seq *seq, uint6= 4_t id) >>>>>> +{ >>>>>> + struct radeon_fence **ring; >>>>>> + >>>>>> + while (seq && id > seq->end) >>>>>> + seq =3D seq->replacement; >>>>>> + >>>>>> + if (!seq || id < seq->start) >>>>>> + return NULL; >>>>>> + >>>>>> + ring =3D radeon_seq_ring(seq); >>>>>> + return ring[id & seq->mask]; >>>>>> +} >>>>>> + >>>>>> +/** >>>>>> + * radeon_seq_destroy - destroy the sequence object >>>>>> + * >>>>>> + * @seq_ptr: pointer to sequence object >>>>>> + * >>>>>> + * Destroy the sequence objects and release all fence references ta= ken. >>>>>> + */ >>>>>> +void radeon_seq_destroy(struct radeon_seq **seq_ptr) >>>>>> +{ >>>>>> + struct radeon_seq *seq =3D *seq_ptr; >>>>>> + while (seq) { >>>>>> + struct radeon_seq *repl =3D seq->replacement; >>>>>> + unsigned start =3D seq->start & seq->mask; >>>>>> + unsigned end =3D seq->end & seq->mask; >>>>>> + struct radeon_fence **ring; >>>>>> + unsigned i; >>>>>> + >>>>>> + ring =3D radeon_seq_ring(seq); >>>>>> + for (i =3D start; i < end; ++i) >>>>>> + radeon_fence_unref(&ring[i]); >>>>>> + >>>>>> + kfree(seq); >>>>>> + seq =3D repl; >>>>>> + } >>>>>> + *seq_ptr =3D NULL; >>>>>> +} >>>>>> diff --git a/include/uapi/drm/radeon_drm.h b/include/uapi/drm/radeon= _drm.h >>>>>> index 50d0fb4..6b2b2e7 100644 >>>>>> --- a/include/uapi/drm/radeon_drm.h >>>>>> +++ b/include/uapi/drm/radeon_drm.h >>>>>> @@ -959,6 +959,7 @@ struct drm_radeon_gem_va { >>>>>> #define RADEON_CS_RING_VCE 4 >>>>>> /* The third dword of RADEON_CHUNK_ID_FLAGS is a sint32 that sets = the priority */ >>>>>> /* 0 =3D normal, + =3D higher priority, - =3D lower priority */ >>>>>> +/* The fourth and fives dword are a 64bit fence ID generated for th= is CS */ >>>>>> struct drm_radeon_cs_chunk { >>>>>> uint32_t chunk_id; >>>>>> -- = >>>>>> 1.9.1 >>>>>> >>>>>> _______________________________________________ >>>>>> dri-devel mailing list >>>>>> dri-devel@lists.freedesktop.org >>>>>> http://lists.freedesktop.org/mailman/listinfo/dri-devel >>>> From b95fe503dad89875da9db2d11d05f93eca41232d Mon Sep 17 00:00:00 2001 >>>> From: =3D?UTF-8?q?J=3DC3=3DA9r=3DC3=3DB4me=3D20Glisse?=3D >>>> Date: Thu, 18 Sep 2014 22:51:21 -0400 >>>> Subject: [PATCH] drm/radeon: cs sequence id and cs completion query. >>>> MIME-Version: 1.0 >>>> Content-Type: text/plain; charset=3DUTF-8 >>>> Content-Transfer-Encoding: 8bit >>>> >>>> This report back to userspace ring id and sequence number that can >>>> be use by userspace to query for the completetion of the cs on the >>>> hardware. This also add a new ioctl to perform such query. >>>> >>>> There is an extra value supplied to userspace the wrap counter that >>>> is use to detect wrap around and to gracefuly handle the case where >>>> userspace might be querying about a very, very, very old cs executed >>>> long time ago in a far far away universe. >>>> >>>> This patch is aimed to introduce the necessary ground work for user >>>> space explicit synchronization. By allowing userspace to query about >>>> cs completetion on hardware, user space can perform operation and >>>> synchronization on buffer by itself without having the cs ioctl to >>>> implicitly wait for older cs completetion before scheduling new cs. >>>> This part is however left to a follow-up patch. >>>> >>>> Signed-off-by: J=E9r=F4me Glisse >>>> --- >>>> drivers/gpu/drm/radeon/radeon.h | 2 ++ >>>> drivers/gpu/drm/radeon/radeon_cs.c | 60 +++++++++++++++++++++++++= ++++++++++ >>>> drivers/gpu/drm/radeon/radeon_fence.c | 4 +++ >>>> drivers/gpu/drm/radeon/radeon_kms.c | 1 + >>>> include/uapi/drm/radeon_drm.h | 9 ++++++ >>>> 5 files changed, 76 insertions(+) >>>> >>>> diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/= radeon.h >>>> index 5f05b4c..1f1dd1f 100644 >>>> --- a/drivers/gpu/drm/radeon/radeon.h >>>> +++ b/drivers/gpu/drm/radeon/radeon.h >>>> @@ -118,6 +118,7 @@ extern int radeon_bapm; >>>> #define RADEON_DEBUGFS_MAX_COMPONENTS 32 >>>> #define RADEONFB_CONN_LIMIT 4 >>>> #define RADEON_BIOS_NUM_SCRATCH 8 >>>> +#define RADEON_SEQ_WRAP_VALUE (1 << 30) >>>> /* fence seq are set to this number when signaled */ >>>> #define RADEON_FENCE_SIGNALED_SEQ 0LL >>>> @@ -355,6 +356,7 @@ struct radeon_fence_driver { >>>> /* sync_seq is protected by ring emission lock */ >>>> uint64_t sync_seq[RADEON_NUM_RINGS]; >>>> atomic64_t last_seq; >>>> + int32_t wrap_seq; >>>> bool initialized; >>>> }; >>>> diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/rade= on/radeon_cs.c >>>> index 83f382e..be4ae25 100644 >>>> --- a/drivers/gpu/drm/radeon/radeon_cs.c >>>> +++ b/drivers/gpu/drm/radeon/radeon_cs.c >>>> @@ -404,6 +404,17 @@ static void radeon_cs_parser_fini(struct radeon_c= s_parser *parser, int error, bo >>>> ttm_eu_fence_buffer_objects(&parser->ticket, >>>> &parser->validated, >>>> parser->ib.fence); >>>> + if (parser->chunk_flags && parser->chunk_flags->length_dw > 4) { >>>> + uint32_t __user *to =3D parser->chunk_flags->user_ptr; >>>> + uint32_t tmp; >>>> + >>>> + tmp =3D lower_32_bits(parser->ib.fence->seq); >>>> + copy_to_user(&to[3], &tmp, sizeof(uint32_t)); >>>> + tmp =3D parser->ib.fence->ring; >>>> + copy_to_user(&to[4], &tmp, sizeof(uint32_t)); >>>> + tmp =3D rdev->fence_drv[tmp]->wrap_seq; >>>> + copy_to_user(&to[5], &tmp, sizeof(uint32_t)); >>>> + } >>>> } else if (backoff) { >>>> ttm_eu_backoff_reservation(&parser->ticket, >>>> &parser->validated); >>>> @@ -823,3 +834,52 @@ int radeon_cs_packet_next_reloc(struct radeon_cs_= parser *p, >>>> *cs_reloc =3D p->relocs_ptr[(idx / 4)]; >>>> return 0; >>>> } >>>> + >>>> +int radeon_cs_done_ioctl(struct drm_device *dev, void *data, struct d= rm_file *filp) >>>> +{ >>>> + struct radeon_device *rdev =3D dev->dev_private; >>>> + struct drm_radeon_cs_done *args =3D data; >>>> + unsigned i =3D args->ring; >>>> + int32_t last_seq, sync_seq, wrap_seq; >>>> + >>>> + /* FIXME check args->ring value is ok. */ >>>> + >>>> + /* >>>> + * The memory barrier is match with the one in radeon_fence_emit() a= nd >>>> + * it insure us that we get the right matching wrap_seq and sync_seq. >>>> + * >>>> + * Note that no need to protect the fence_drv.sync_seq here as barri= er >>>> + * insure us we will get the coherency we need. >>>> + */ >>>> + wrap_seq =3D ACCESS_ONCE(rdev->fence_drv[i].wrap_seq); >>>> + smp_rmb(); >>>> + sync_seq =3D lower_32_bits(ACCESS_ONCE(rdev->fence_drv[i].sync_seq[i= ])); >>>> + >>>> + /* >>>> + * So if current wrap_seq and one we are queried with are differ by >>>> + * more than one this means that we are queried about a very old fen= ce >>>> + * seq value and we can assume it is long done. >>>> + * >>>> + * Well this is not entirely true, for it to be true we would need to >>>> + * stall when we increment the wrap counter if cs in previous wrap w= ere >>>> + * not completed but this is highly unlikely. So live with the trill= of >>>> + * it going wrong ! >>>> + */ >>>> + if (abs((unsigned)wrap_seq - (unsigned)args->wrap) > 1) >>>> + return 1; >>>> + /* Now check if currently reported fence seq is done or not. */ >>>> + /* FIXME call fence func to update last_seq just in case. */ >>>> + last_seq =3D lower_32_bits(atomic64_read(&rdev->fence_drv[i].last_se= q)); >>>> + if ((last_seq - arg->seq) >=3D 0) >>>> + return 1; >>>> + /* >>>> + * Last failsafe to handle the horrible case were userspace holded on >>>> + * a wrap and seq value for so long without querying that it we wrap= ped >>>> + * around. This is here to avoid userspace waiting for a fence that = was >>>> + * emited a long time ago but the current sync_seq[ring] value migth= be >>>> + * stuck and thus we might go bigger than this very old seq value. >>>> + */ >>>> + if (((sync_seq - args->seq) < 0) && args->wrap =3D=3D wrap_seq) >>>> + return 1; >>>> + return 0; >>>> +} >>>> diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/r= adeon/radeon_fence.c >>>> index 9137870..a6adcf6 100644 >>>> --- a/drivers/gpu/drm/radeon/radeon_fence.c >>>> +++ b/drivers/gpu/drm/radeon/radeon_fence.c >>>> @@ -119,6 +119,10 @@ int radeon_fence_emit(struct radeon_device *rdev, >>>> kref_init(&((*fence)->kref)); >>>> (*fence)->rdev =3D rdev; >>>> (*fence)->seq =3D ++rdev->fence_drv[ring].sync_seq[ring]; >>>> + /* Barrier is important for radeon_fence_cs_done_ioctl. */ >>>> + smp_wmb(); >>>> + if (rdev->fence_drv[ring].sync_seq[ring] =3D=3D RADEON_SEQ_WRAP_VALU= E) >>>> + rdev->fence_drv[ring].wrap_seq++; >>>> (*fence)->ring =3D ring; >>>> radeon_fence_ring_emit(rdev, ring, *fence); >>>> trace_radeon_fence_emit(rdev->ddev, ring, (*fence)->seq); >>>> diff --git a/drivers/gpu/drm/radeon/radeon_kms.c b/drivers/gpu/drm/rad= eon/radeon_kms.c >>>> index eb7164d..c9cfcf5 100644 >>>> --- a/drivers/gpu/drm/radeon/radeon_kms.c >>>> +++ b/drivers/gpu/drm/radeon/radeon_kms.c >>>> @@ -885,5 +885,6 @@ const struct drm_ioctl_desc radeon_ioctls_kms[] = =3D { >>>> DRM_IOCTL_DEF_DRV(RADEON_GEM_BUSY, radeon_gem_busy_ioctl, DRM_AUTH|= DRM_UNLOCKED|DRM_RENDER_ALLOW), >>>> DRM_IOCTL_DEF_DRV(RADEON_GEM_VA, radeon_gem_va_ioctl, DRM_AUTH|DRM_= UNLOCKED|DRM_RENDER_ALLOW), >>>> DRM_IOCTL_DEF_DRV(RADEON_GEM_OP, radeon_gem_op_ioctl, DRM_AUTH|DRM_= UNLOCKED|DRM_RENDER_ALLOW), >>>> + DRM_IOCTL_DEF_DRV(RADEON_CS_DONE, radeon_cs_done_ioctl, DRM_AUTH|DRM= _UNLOCKED|DRM_RENDER_ALLOW), >>>> }; >>>> int radeon_max_kms_ioctl =3D ARRAY_SIZE(radeon_ioctls_kms); >>>> diff --git a/include/uapi/drm/radeon_drm.h b/include/uapi/drm/radeon_d= rm.h >>>> index fea6099..a0c215d 100644 >>>> --- a/include/uapi/drm/radeon_drm.h >>>> +++ b/include/uapi/drm/radeon_drm.h >>>> @@ -554,6 +554,7 @@ typedef struct { >>>> #define DRM_IOCTL_RADEON_GEM_BUSY DRM_IOWR(DRM_COMMAND_BASE + DRM_RA= DEON_GEM_BUSY, struct drm_radeon_gem_busy) >>>> #define DRM_IOCTL_RADEON_GEM_VA DRM_IOWR(DRM_COMMAND_BASE + DRM_RAD= EON_GEM_VA, struct drm_radeon_gem_va) >>>> #define DRM_IOCTL_RADEON_GEM_OP DRM_IOWR(DRM_COMMAND_BASE + DRM_RAD= EON_GEM_OP, struct drm_radeon_gem_op) >>>> +#define DRM_IOCTL_RADEON_CS_DONE DRM_IOWR(DRM_COMMAND_BASE + DRM_RADE= ON_CS_DONE, struct drm_radeon_cs_done) >>>> typedef struct drm_radeon_init { >>>> enum { >>>> @@ -936,6 +937,7 @@ struct drm_radeon_gem_va { >>>> #define RADEON_CS_RING_VCE 4 >>>> /* The third dword of RADEON_CHUNK_ID_FLAGS is a sint32 that sets th= e priority */ >>>> /* 0 =3D normal, + =3D higher priority, - =3D lower priority */ >>>> +/* The fifth, sixth, seventh dword are a 32bit fence ID, ring id and= wrap id of this CS */ >>>> struct drm_radeon_cs_chunk { >>>> uint32_t chunk_id; >>>> @@ -1038,4 +1040,11 @@ struct drm_radeon_info { >>>> #define CIK_TILE_MODE_DEPTH_STENCIL_1D 5 >>>> +struct drm_radeon_cs_done { >>>> + int32_t seq; >>>> + int32_t ring; >>>> + int32_t wrap; >>>> + int32_t pad; >>>> +}; >>>> + >>>> #endif >>>> -- = >>>> 1.9.3 >>>>