From mboxrd@z Thu Jan  1 00:00:00 1970
From: =?windows-1252?Q?Christian_K=F6nig?= <deathsimple@vodafone.de>
Subject: Re: [PATCH 2/3] drm/radeon: add command submission IDs
Date: Fri, 19 Sep 2014 16:29:35 +0200
Message-ID: <541C3DCF.9020800@vodafone.de>
References: <1411054213-11173-1-git-send-email-deathsimple@vodafone.de>
 <1411054213-11173-2-git-send-email-deathsimple@vodafone.de>
 <20140919004215.GA3447@gmail.com> <20140919031155.GB3447@gmail.com>
 <20140919040629.GC3447@gmail.com> <541BE35F.9000200@vodafone.de>
 <20140919134050.GA2581@gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="windows-1252"; Format="flowed"
Content-Transfer-Encoding: quoted-printable
Return-path: <dri-devel-bounces@lists.freedesktop.org>
Received: from pegasos-out.vodafone.de (pegasos-out.vodafone.de [80.84.1.38])
 by gabe.freedesktop.org (Postfix) with ESMTP id 091F7899D6
 for <dri-devel@lists.freedesktop.org>; Fri, 19 Sep 2014 07:29:47 -0700 (PDT)
In-Reply-To: <20140919134050.GA2581@gmail.com>
List-Unsubscribe: <http://lists.freedesktop.org/mailman/options/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <http://lists.freedesktop.org/archives/dri-devel>
List-Post: <mailto:dri-devel@lists.freedesktop.org>
List-Help: <mailto:dri-devel-request@lists.freedesktop.org?subject=help>
List-Subscribe: <http://lists.freedesktop.org/mailman/listinfo/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=subscribe>
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel" <dri-devel-bounces@lists.freedesktop.org>
To: Jerome Glisse <j.glisse@gmail.com>
Cc: dri-devel@lists.freedesktop.org
List-Id: dri-devel@lists.freedesktop.org

Am 19.09.2014 um 15:40 schrieb Jerome Glisse:
> On Fri, Sep 19, 2014 at 10:03:43AM +0200, Christian K=F6nig wrote:
>> Hi Jerome,
>>
>> this is exactly the approach which we wanted to avoid. And according to =
Alex
>> it was actually you guys (I think Dave) who noted that this is probably =
not
>> such a good idea.
>>
>> The issue is that by telling userspace the ring specific sequence number=
 we
>> nail down the order of execution. E.g. a scheduler who want to change the
>> order in which IBs are submitted to the hardware can't do so any more
>> because we already gave userspace a sequence number to identify the comm=
and
>> submission.
> Well for curent code this design does work, for hw where you want to have=
 the
> hw/driver do some scheduling this can still be achieve by having a seq nu=
mber
> per ring per process.

Correct, for current design we could do so but we have more long term =

goals with that in mind. I will reorder my work a bit which should make =

it more clear why we need the original fence and not only the sequence =

for this.

>
>> Additional to that for hardware inter device synchronization we want to
>> refer to a certain kernel object and not just the fence number. So we can
>> only use some kind of handle based approach, e.g. as discussed before we
>> want to be able to turn this sequence number in a FD fence.
> Inter-device is not an issue here. For inter-device you want to convert t=
his
> seq number to an fd at which point you can create a fence structure with =
the
> proper information ie create a structure only if it is needed.
>
> Really this is the only sane design if we start allocating structure for =
each
> cs (well for each cs userspace cares) this gonna play bad with memory pre=
ssure.
Mhm, what's the matter with this?

My implementation only allocates a single ring buffer for each client =

which automatically resizes, apart from that we only have the fence =

structure which is allocated anyway and released when automatically as =

soon as all users goes away.

And the scheduler implementation the Intel guys are working on would use =

even more memory. As long as everything as accounted to the correct =

process I don't really see a problem with that.

Regards,
Christian.

>
> Cheers,
> J=E9r=F4me
>
>> Regards,
>> Christian.
>>
>> Am 19.09.2014 um 06:06 schrieb Jerome Glisse:
>>> On Thu, Sep 18, 2014 at 11:11:57PM -0400, Jerome Glisse wrote:
>>>> On Thu, Sep 18, 2014 at 08:42:16PM -0400, Jerome Glisse wrote:
>>>>> On Thu, Sep 18, 2014 at 05:30:12PM +0200, Christian K=F6nig wrote:
>>>>>> From: Christian K=F6nig <christian.koenig@amd.com>
>>>>>>
>>>>>> This patch adds a new 64bit ID as a result to each command submissio=
n.
>>>>> NAK for design reasons.
>>>>>
>>>>> First and foremost we DO NOT WANT TO ALLOCATE ANYTHING for submission=
 id.
>>>>> I know that hooking up to fence make life easy but this is not how it
>>>>> should be done.
>>>>>
>>>>> What you want to expose is the per ring 32bit seq value and report to=
 user
>>>>> space the value and the ring id. To handle the wrap around you add a =
uniq
>>>>> 32bit wrap counter which is incremented every 1 << 30 or some big num=
ber
>>>>> seq value number (but must be smaller than 1 << 31). Then you use ari=
thmetic
>>>>> to determine if a cs seq number is done or not (ie the seq wrap around
>>>>> counter diff should not be bigger then 2 or 3) and the seq number sho=
uld
>>>>> obviously be after again using arithmetic diff like jiffies code.
>>>>>
>>>>> All this being hidden in the kernel all the user space have to know i=
s :
>>>>> 32 bit seq value per ring
>>>>> 32 bit ring uniq id
>>>>> 32 wrap counter per ring
>>>>>
>>>>> With such scheme you never allocate anything or worry about any fence.
>>>>> Way simpler and less code.
>>>>>
>>>>> Cheers,
>>>>> J=E9r=F4me
>>>> Because code is clearer than words attached is a patch of what i am th=
inking
>>>> about. The arithmetic should be right as well as the coherency and pro=
per
>>>> memory barrier. Thought it is completely untested and will require cou=
ple
>>>> fixes for properly checking userspace arguments and other aspect (see =
FIXME
>>>> in patches).
>>>>
>>>> There is no allocation, it is a pretty small patch, and it provide a f=
ast
>>>> ligthweight solution. While the new ioctl argument and return value ca=
n be
>>>> improved (like reporting more information to userspace like for instan=
ce
>>>> warning userspace when it is querying very old fences). I think the ov=
erall
>>>> design is sound.
>>>>
>>>> Let me know.
>>> Ok actually after a beer (given that planet express manual for Bender, =
which
>>> also apply to me, clearly stipulate to expect malfunction if not proper=
ly
>>> inebriate) it became clear that by reversing the problem i could make i=
t a
>>> lot simpler. So attached is an even simpler patch that handle gracefull=
y user
>>> space querying for very old fence (ie that could have wrap around).
>>>
>>> Code explains it all.
>>>
>>>> Cheers,
>>>> J=E9r=F4me
>>>>
>>>>>> Signed-off-by: Christian K=F6nig <christian.koenig@amd.com>
>>>>>> ---
>>>>>>   drivers/gpu/drm/radeon/Makefile     |   2 +-
>>>>>>   drivers/gpu/drm/radeon/radeon.h     |  13 +-
>>>>>>   drivers/gpu/drm/radeon/radeon_cs.c  |  13 ++
>>>>>>   drivers/gpu/drm/radeon/radeon_kms.c |  41 +++----
>>>>>>   drivers/gpu/drm/radeon/radeon_seq.c | 229 ++++++++++++++++++++++++=
++++++++++++
>>>>>>   include/uapi/drm/radeon_drm.h       |   1 +
>>>>>>   6 files changed, 277 insertions(+), 22 deletions(-)
>>>>>>   create mode 100644 drivers/gpu/drm/radeon/radeon_seq.c
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/radeon/Makefile b/drivers/gpu/drm/radeo=
n/Makefile
>>>>>> index 8cc9f68..21ee928 100644
>>>>>> --- a/drivers/gpu/drm/radeon/Makefile
>>>>>> +++ b/drivers/gpu/drm/radeon/Makefile
>>>>>> @@ -81,7 +81,7 @@ radeon-y +=3D radeon_device.o radeon_asic.o radeon=
_kms.o \
>>>>>>   	rv770_smc.o cypress_dpm.o btc_dpm.o sumo_dpm.o sumo_smc.o trinity=
_dpm.o \
>>>>>>   	trinity_smc.o ni_dpm.o si_smc.o si_dpm.o kv_smc.o kv_dpm.o ci_smc=
.o \
>>>>>>   	ci_dpm.o dce6_afmt.o radeon_vm.o radeon_ucode.o radeon_ib.o radeo=
n_mn.o \
>>>>>> -	radeon_sync.o
>>>>>> +	radeon_sync.o radeon_seq.o
>>>>>>   # add async DMA block
>>>>>>   radeon-y +=3D \
>>>>>> diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeo=
n/radeon.h
>>>>>> index b3b4e96..dbfd346 100644
>>>>>> --- a/drivers/gpu/drm/radeon/radeon.h
>>>>>> +++ b/drivers/gpu/drm/radeon/radeon.h
>>>>>> @@ -423,6 +423,15 @@ static inline bool radeon_fence_is_earlier(stru=
ct radeon_fence *a,
>>>>>>   }
>>>>>>   /*
>>>>>> + * Userspace command submission identifier generation
>>>>>> + */
>>>>>> +struct radeon_seq;
>>>>>> +
>>>>>> +uint64_t radeon_seq_push(struct radeon_seq **seq, struct radeon_fen=
ce *fence);
>>>>>> +struct radeon_fence *radeon_seq_query(struct radeon_seq *seq, uint6=
4_t id);
>>>>>> +void radeon_seq_destroy(struct radeon_seq **seq);
>>>>>> +
>>>>>> +/*
>>>>>>    * Tiling registers
>>>>>>    */
>>>>>>   struct radeon_surface_reg {
>>>>>> @@ -946,7 +955,9 @@ struct radeon_vm_manager {
>>>>>>    * file private structure
>>>>>>    */
>>>>>>   struct radeon_fpriv {
>>>>>> -	struct radeon_vm		vm;
>>>>>> +	struct radeon_vm	vm;
>>>>>> +	struct mutex		seq_lock;
>>>>>> +	struct radeon_seq	*seq;
>>>>>>   };
>>>>>>   /*
>>>>>> diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/ra=
deon/radeon_cs.c
>>>>>> index b8f20a5..0c0f0b3 100644
>>>>>> --- a/drivers/gpu/drm/radeon/radeon_cs.c
>>>>>> +++ b/drivers/gpu/drm/radeon/radeon_cs.c
>>>>>> @@ -413,6 +413,19 @@ static void radeon_cs_parser_fini(struct radeon=
_cs_parser *parser, int error, bo
>>>>>>   	unsigned i;
>>>>>>   	if (!error) {
>>>>>> +		if (parser->chunk_flags &&
>>>>>> +		    parser->chunk_flags->length_dw > 4) {
>>>>>> +			struct radeon_fpriv *fpriv =3D parser->filp->driver_priv;
>>>>>> +			uint32_t __user *to =3D parser->chunk_flags->user_ptr;
>>>>>> +			uint64_t id;
>>>>>> +
>>>>>> +			mutex_lock(&fpriv->seq_lock);
>>>>>> +			id =3D radeon_seq_push(&fpriv->seq, parser->ib.fence);
>>>>>> +			mutex_unlock(&fpriv->seq_lock);
>>>>>> +
>>>>>> +			copy_to_user(&to[3], &id, sizeof(uint64_t));
>>>>>> +		}
>>>>>> +
>>>>>>   		/* Sort the buffer list from the smallest to largest buffer,
>>>>>>   		 * which affects the order of buffers in the LRU list.
>>>>>>   		 * This assures that the smallest buffers are added first
>>>>>> diff --git a/drivers/gpu/drm/radeon/radeon_kms.c b/drivers/gpu/drm/r=
adeon/radeon_kms.c
>>>>>> index 85ee6f7..38880af 100644
>>>>>> --- a/drivers/gpu/drm/radeon/radeon_kms.c
>>>>>> +++ b/drivers/gpu/drm/radeon/radeon_kms.c
>>>>>> @@ -578,39 +578,34 @@ void radeon_driver_lastclose_kms(struct drm_de=
vice *dev)
>>>>>>    */
>>>>>>   int radeon_driver_open_kms(struct drm_device *dev, struct drm_file=
 *file_priv)
>>>>>>   {
>>>>>> +	struct radeon_fpriv *fpriv =3D kzalloc(sizeof(*fpriv), GFP_KERNEL);
>>>>>>   	struct radeon_device *rdev =3D dev->dev_private;
>>>>>>   	int r;
>>>>>> -	file_priv->driver_priv =3D NULL;
>>>>>> +	if (unlikely(!fpriv))
>>>>>> +		return -ENOMEM;
>>>>>> +
>>>>>> +	file_priv->driver_priv =3D fpriv;
>>>>>>   	r =3D pm_runtime_get_sync(dev->dev);
>>>>>>   	if (r < 0)
>>>>>> -		return r;
>>>>>> +		goto error;
>>>>>>   	/* new gpu have virtual address space support */
>>>>>>   	if (rdev->family >=3D CHIP_CAYMAN) {
>>>>>> -		struct radeon_fpriv *fpriv;
>>>>>>   		struct radeon_vm *vm;
>>>>>>   		int r;
>>>>>> -		fpriv =3D kzalloc(sizeof(*fpriv), GFP_KERNEL);
>>>>>> -		if (unlikely(!fpriv)) {
>>>>>> -			return -ENOMEM;
>>>>>> -		}
>>>>>> -
>>>>>>   		vm =3D &fpriv->vm;
>>>>>>   		r =3D radeon_vm_init(rdev, vm);
>>>>>> -		if (r) {
>>>>>> -			kfree(fpriv);
>>>>>> -			return r;
>>>>>> -		}
>>>>>> +		if (r)
>>>>>> +			goto error;
>>>>>>   		if (rdev->accel_working) {
>>>>>>   			r =3D radeon_bo_reserve(rdev->ring_tmp_bo.bo, false);
>>>>>>   			if (r) {
>>>>>>   				radeon_vm_fini(rdev, vm);
>>>>>> -				kfree(fpriv);
>>>>>> -				return r;
>>>>>> +				goto error;
>>>>>>   			}
>>>>>>   			/* map the ib pool buffer read only into
>>>>>> @@ -623,16 +618,20 @@ int radeon_driver_open_kms(struct drm_device *=
dev, struct drm_file *file_priv)
>>>>>>   						  RADEON_VM_PAGE_SNOOPED);
>>>>>>   			if (r) {
>>>>>>   				radeon_vm_fini(rdev, vm);
>>>>>> -				kfree(fpriv);
>>>>>> -				return r;
>>>>>> +				goto error;
>>>>>>   			}
>>>>>>   		}
>>>>>> -		file_priv->driver_priv =3D fpriv;
>>>>>>   	}
>>>>>> +	mutex_init(&fpriv->seq_lock);
>>>>>> +
>>>>>>   	pm_runtime_mark_last_busy(dev->dev);
>>>>>>   	pm_runtime_put_autosuspend(dev->dev);
>>>>>>   	return 0;
>>>>>> +
>>>>>> +error:
>>>>>> +	kfree(fpriv);
>>>>>> +	return r;
>>>>>>   }
>>>>>>   /**
>>>>>> @@ -646,11 +645,13 @@ int radeon_driver_open_kms(struct drm_device *=
dev, struct drm_file *file_priv)
>>>>>>   void radeon_driver_postclose_kms(struct drm_device *dev,
>>>>>>   				 struct drm_file *file_priv)
>>>>>>   {
>>>>>> +	struct radeon_fpriv *fpriv =3D file_priv->driver_priv;
>>>>>>   	struct radeon_device *rdev =3D dev->dev_private;
>>>>>> +	radeon_seq_destroy(&fpriv->seq);
>>>>>> +
>>>>>>   	/* new gpu have virtual address space support */
>>>>>>   	if (rdev->family >=3D CHIP_CAYMAN && file_priv->driver_priv) {
>>>>>> -		struct radeon_fpriv *fpriv =3D file_priv->driver_priv;
>>>>>>   		struct radeon_vm *vm =3D &fpriv->vm;
>>>>>>   		int r;
>>>>>> @@ -664,9 +665,9 @@ void radeon_driver_postclose_kms(struct drm_devi=
ce *dev,
>>>>>>   		}
>>>>>>   		radeon_vm_fini(rdev, vm);
>>>>>> -		kfree(fpriv);
>>>>>> -		file_priv->driver_priv =3D NULL;
>>>>>>   	}
>>>>>> +	kfree(fpriv);
>>>>>> +	file_priv->driver_priv =3D NULL;
>>>>>>   }
>>>>>>   /**
>>>>>> diff --git a/drivers/gpu/drm/radeon/radeon_seq.c b/drivers/gpu/drm/r=
adeon/radeon_seq.c
>>>>>> new file mode 100644
>>>>>> index 0000000..d8857f1
>>>>>> --- /dev/null
>>>>>> +++ b/drivers/gpu/drm/radeon/radeon_seq.c
>>>>>> @@ -0,0 +1,229 @@
>>>>>> +/*
>>>>>> + * Copyright 2014 Advanced Micro Devices, Inc.
>>>>>> + * All Rights Reserved.
>>>>>> + *
>>>>>> + * Permission is hereby granted, free of charge, to any person obta=
ining a
>>>>>> + * copy of this software and associated documentation files (the
>>>>>> + * "Software"), to deal in the Software without restriction, includ=
ing
>>>>>> + * without limitation the rights to use, copy, modify, merge, publi=
sh,
>>>>>> + * distribute, sub license, and/or sell copies of the Software, and=
 to
>>>>>> + * permit persons to whom the Software is furnished to do so, subje=
ct to
>>>>>> + * the following conditions:
>>>>>> + *
>>>>>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, =
EXPRESS OR
>>>>>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANT=
ABILITY,
>>>>>> + * FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVE=
NT SHALL
>>>>>> + * THE COPYRIGHT HOLDERS, AUTHORS AND/OR ITS SUPPLIERS BE LIABLE FO=
R ANY CLAIM,
>>>>>> + * DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TO=
RT OR
>>>>>> + * OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWA=
RE OR THE
>>>>>> + * USE OR OTHER DEALINGS IN THE SOFTWARE.
>>>>>> + *
>>>>>> + * The above copyright notice and this permission notice (including=
 the
>>>>>> + * next paragraph) shall be included in all copies or substantial p=
ortions
>>>>>> + * of the Software.
>>>>>> + *
>>>>>> + */
>>>>>> +/*
>>>>>> + * Authors:
>>>>>> + *    Christian K=F6nig <christian.koenig@amd.com>
>>>>>> + */
>>>>>> +
>>>>>> +#include <drm/drmP.h>
>>>>>> +#include "radeon.h"
>>>>>> +
>>>>>> +/*
>>>>>> + * ID sequences
>>>>>> + * This code generates a 64bit identifier for a command submission.
>>>>>> + * It works by adding the fence of the command submission to a auto=
matically
>>>>>> + * resizing ring buffer.
>>>>>> + */
>>>>>> +
>>>>>> +struct radeon_seq {
>>>>>> +	uint64_t		start;
>>>>>> +	uint64_t		end;
>>>>>> +	uint64_t		mask;
>>>>>> +	struct radeon_seq	*replacement;
>>>>>> +};
>>>>>> +
>>>>>> +/**
>>>>>> + * radeon_seq_create - create a new sequence object
>>>>>> + *
>>>>>> + * @start: start value for this sequence
>>>>>> + * @size: size of the ring buffer, must be power of two
>>>>>> + *
>>>>>> + * Allocate and initialize a new ring buffer and header.
>>>>>> + * Returns NULL if allocation fails, new object otherwise.
>>>>>> + */
>>>>>> +static struct radeon_seq *radeon_seq_create(uint64_t start, unsigne=
d size)
>>>>>> +{
>>>>>> +	unsigned bytes =3D sizeof(struct radeon_seq) +
>>>>>> +		size * sizeof(struct radeon_fence *);
>>>>>> +
>>>>>> +	struct radeon_seq *seq;
>>>>>> +
>>>>>> +	seq =3D kmalloc(bytes, GFP_KERNEL);
>>>>>> +	if (!seq)
>>>>>> +		return NULL;
>>>>>> +
>>>>>> +	seq->start =3D start;
>>>>>> +	seq->end =3D start;
>>>>>> +	seq->mask =3D size - 1;
>>>>>> +	seq->replacement =3D NULL;
>>>>>> +
>>>>>> +	return seq;
>>>>>> +}
>>>>>> +
>>>>>> +/**
>>>>>> + * radeon_seq_ring - get pointer to ring buffer
>>>>>> + *
>>>>>> + * @seq: sequence object
>>>>>> + *
>>>>>> + * Calculate the address of the ring buffer.
>>>>>> + */
>>>>>> +static struct radeon_fence **radeon_seq_ring(struct radeon_seq *seq)
>>>>>> +{
>>>>>> +	return (struct radeon_fence **)&seq[1];
>>>>>> +}
>>>>>> +
>>>>>> +/**
>>>>>> + * radeon_seq_try_free - try to free fences from the ring buffer
>>>>>> + *
>>>>>> + * @seq: sequence object
>>>>>> + *
>>>>>> + * Try to free fences from the start of the ring buffer.
>>>>>> + */
>>>>>> +static void radeon_seq_try_free(struct radeon_seq *seq)
>>>>>> +{
>>>>>> +	struct radeon_fence **ring =3D radeon_seq_ring(seq);
>>>>>> +
>>>>>> +	while (seq->start !=3D seq->end) {
>>>>>> +		unsigned idx =3D seq->start & seq->mask;
>>>>>> +		struct radeon_fence *fence =3D ring[idx];
>>>>>> +
>>>>>> +		if (!radeon_fence_signaled(fence))
>>>>>> +			break;
>>>>>> +
>>>>>> +		radeon_fence_unref(&fence);
>>>>>> +		++seq->start;
>>>>>> +	}
>>>>>> +}
>>>>>> +
>>>>>> +/**
>>>>>> + * radeon_seq_add - add new fence to the end of the ring buffer
>>>>>> + *
>>>>>> + * @seq: sequence object
>>>>>> + * @f: the fence object
>>>>>> + *
>>>>>> + * Add the fence and return the generated ID.
>>>>>> + */
>>>>>> +static uint64_t radeon_seq_add(struct radeon_seq *seq, struct radeo=
n_fence *f)
>>>>>> +{
>>>>>> +	struct radeon_fence **ring =3D radeon_seq_ring(seq);
>>>>>> +
>>>>>> +	ring[seq->end & seq->mask] =3D radeon_fence_ref(f);
>>>>>> +	return seq->end++;
>>>>>> +}
>>>>>> +
>>>>>> +/**
>>>>>> + * radeon_seq_push - check for room and add the fence
>>>>>> + *
>>>>>> + * @seq: sequence object
>>>>>> + * @fence: the fence object
>>>>>> + *
>>>>>> + * Check for room on the ring buffer, if there isn't enough
>>>>>> + * reallocate the sequence object and add the fence.
>>>>>> + * Returns the generated ID.
>>>>>> + */
>>>>>> +uint64_t radeon_seq_push(struct radeon_seq **seq, struct radeon_fen=
ce *fence)
>>>>>> +{
>>>>>> +	unsigned size_for_new_seq =3D 4;
>>>>>> +	uint64_t start_for_new_seq =3D 1;
>>>>>> +
>>>>>> +	if (*seq) {
>>>>>> +		/* try to release old replacements */
>>>>>> +		while ((*seq)->replacement) {
>>>>>> +			radeon_seq_try_free(*seq);
>>>>>> +			if ((*seq)->start =3D=3D (*seq)->end) {
>>>>>> +				struct radeon_seq *repl =3D (*seq)->replacement;
>>>>>> +
>>>>>> +				kfree(*seq);
>>>>>> +				*seq =3D repl;
>>>>>> +			} else {
>>>>>> +				/* move on to the current container */
>>>>>> +				seq =3D &(*seq)->replacement;
>>>>>> +			}
>>>>>> +		}
>>>>>> +
>>>>>> +		/* check if we have enough room for one more fence */
>>>>>> +		radeon_seq_try_free(*seq);
>>>>>> +		if (((*seq)->end - (*seq)->start) <=3D (*seq)->mask)
>>>>>> +			return radeon_seq_add(*seq, fence);
>>>>>> +
>>>>>> +		/* not enough room, let's allocate a replacement */
>>>>>> +		size_for_new_seq =3D ((*seq)->mask + 1) * 2;
>>>>>> +		start_for_new_seq =3D (*seq)->end + 1;
>>>>>> +		seq =3D &(*seq)->replacement;
>>>>>> +	}
>>>>>> +
>>>>>> +	*seq =3D radeon_seq_create(start_for_new_seq, size_for_new_seq);
>>>>>> +	if (!*seq) {
>>>>>> +		/* not enough memory for a new sequence object, but failing
>>>>>> +		   here isn't a good idea either cause the commands are already
>>>>>> +		   submitted to the hardware. So just block on the fence. */
>>>>>> +		int r =3D radeon_fence_wait(fence, false);
>>>>>> +		if (r)
>>>>>> +			DRM_ERROR("Error waiting for fence (%d)\n", r);
>>>>>> +		return 0;
>>>>>> +	}
>>>>>> +	return radeon_seq_add(*seq, fence);
>>>>>> +}
>>>>>> +
>>>>>> +/**
>>>>>> + * radeon_seq_query - lockup fence by it's ID
>>>>>> + *
>>>>>> + * @seq: sequence object
>>>>>> + * @id: the generated ID
>>>>>> + *
>>>>>> + * Lockup the associated fence by it's ID.
>>>>>> + * Returns fence object or NULL if it couldn't be found.
>>>>>> + */
>>>>>> +struct radeon_fence *radeon_seq_query(struct radeon_seq *seq, uint6=
4_t id)
>>>>>> +{
>>>>>> +	struct radeon_fence **ring;
>>>>>> +
>>>>>> +	while (seq && id > seq->end)
>>>>>> +		seq =3D seq->replacement;
>>>>>> +
>>>>>> +	if (!seq || id < seq->start)
>>>>>> +		return NULL;
>>>>>> +
>>>>>> +	ring =3D radeon_seq_ring(seq);
>>>>>> +	return ring[id & seq->mask];
>>>>>> +}
>>>>>> +
>>>>>> +/**
>>>>>> + * radeon_seq_destroy - destroy the sequence object
>>>>>> + *
>>>>>> + * @seq_ptr: pointer to sequence object
>>>>>> + *
>>>>>> + * Destroy the sequence objects and release all fence references ta=
ken.
>>>>>> + */
>>>>>> +void radeon_seq_destroy(struct radeon_seq **seq_ptr)
>>>>>> +{
>>>>>> +	struct radeon_seq *seq =3D *seq_ptr;
>>>>>> +	while (seq) {
>>>>>> +		struct radeon_seq *repl =3D seq->replacement;
>>>>>> +		unsigned start =3D seq->start & seq->mask;
>>>>>> +		unsigned end =3D seq->end & seq->mask;
>>>>>> +		struct radeon_fence **ring;
>>>>>> +		unsigned i;
>>>>>> +
>>>>>> +		ring =3D radeon_seq_ring(seq);
>>>>>> +		for (i =3D start; i < end; ++i)
>>>>>> +			radeon_fence_unref(&ring[i]);
>>>>>> +
>>>>>> +		kfree(seq);
>>>>>> +		seq =3D repl;
>>>>>> +	}
>>>>>> +	*seq_ptr =3D NULL;
>>>>>> +}
>>>>>> diff --git a/include/uapi/drm/radeon_drm.h b/include/uapi/drm/radeon=
_drm.h
>>>>>> index 50d0fb4..6b2b2e7 100644
>>>>>> --- a/include/uapi/drm/radeon_drm.h
>>>>>> +++ b/include/uapi/drm/radeon_drm.h
>>>>>> @@ -959,6 +959,7 @@ struct drm_radeon_gem_va {
>>>>>>   #define RADEON_CS_RING_VCE          4
>>>>>>   /* The third dword of RADEON_CHUNK_ID_FLAGS is a sint32 that sets =
the priority */
>>>>>>   /* 0 =3D normal, + =3D higher priority, - =3D lower priority */
>>>>>> +/* The fourth and fives dword are a 64bit fence ID generated for th=
is CS */
>>>>>>   struct drm_radeon_cs_chunk {
>>>>>>   	uint32_t		chunk_id;
>>>>>> -- =

>>>>>> 1.9.1
>>>>>>
>>>>>> _______________________________________________
>>>>>> dri-devel mailing list
>>>>>> dri-devel@lists.freedesktop.org
>>>>>> http://lists.freedesktop.org/mailman/listinfo/dri-devel
>>>>  From b95fe503dad89875da9db2d11d05f93eca41232d Mon Sep 17 00:00:00 2001
>>>> From: =3D?UTF-8?q?J=3DC3=3DA9r=3DC3=3DB4me=3D20Glisse?=3D <jglisse@red=
hat.com>
>>>> Date: Thu, 18 Sep 2014 22:51:21 -0400
>>>> Subject: [PATCH] drm/radeon: cs sequence id and cs completion query.
>>>> MIME-Version: 1.0
>>>> Content-Type: text/plain; charset=3DUTF-8
>>>> Content-Transfer-Encoding: 8bit
>>>>
>>>> This report back to userspace ring id and sequence number that can
>>>> be use by userspace to query for the completetion of the cs on the
>>>> hardware. This also add a new ioctl to perform such query.
>>>>
>>>> There is an extra value supplied to userspace the wrap counter that
>>>> is use to detect wrap around and to gracefuly handle the case where
>>>> userspace might be querying about a very, very, very old cs executed
>>>> long time ago in a far far away universe.
>>>>
>>>> This patch is aimed to introduce the necessary ground work for user
>>>> space explicit synchronization. By allowing userspace to query about
>>>> cs completetion on hardware, user space can perform operation and
>>>> synchronization on buffer by itself without having the cs ioctl to
>>>> implicitly wait for older cs completetion before scheduling new cs.
>>>> This part is however left to a follow-up patch.
>>>>
>>>> Signed-off-by: J=E9r=F4me Glisse <jglisse@redhat.com>
>>>> ---
>>>>   drivers/gpu/drm/radeon/radeon.h       |  2 ++
>>>>   drivers/gpu/drm/radeon/radeon_cs.c    | 60 +++++++++++++++++++++++++=
++++++++++
>>>>   drivers/gpu/drm/radeon/radeon_fence.c |  4 +++
>>>>   drivers/gpu/drm/radeon/radeon_kms.c   |  1 +
>>>>   include/uapi/drm/radeon_drm.h         |  9 ++++++
>>>>   5 files changed, 76 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/=
radeon.h
>>>> index 5f05b4c..1f1dd1f 100644
>>>> --- a/drivers/gpu/drm/radeon/radeon.h
>>>> +++ b/drivers/gpu/drm/radeon/radeon.h
>>>> @@ -118,6 +118,7 @@ extern int radeon_bapm;
>>>>   #define RADEON_DEBUGFS_MAX_COMPONENTS		32
>>>>   #define RADEONFB_CONN_LIMIT			4
>>>>   #define RADEON_BIOS_NUM_SCRATCH			8
>>>> +#define RADEON_SEQ_WRAP_VALUE			(1 << 30)
>>>>   /* fence seq are set to this number when signaled */
>>>>   #define RADEON_FENCE_SIGNALED_SEQ		0LL
>>>> @@ -355,6 +356,7 @@ struct radeon_fence_driver {
>>>>   	/* sync_seq is protected by ring emission lock */
>>>>   	uint64_t			sync_seq[RADEON_NUM_RINGS];
>>>>   	atomic64_t			last_seq;
>>>> +	int32_t				wrap_seq;
>>>>   	bool				initialized;
>>>>   };
>>>> diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/rade=
on/radeon_cs.c
>>>> index 83f382e..be4ae25 100644
>>>> --- a/drivers/gpu/drm/radeon/radeon_cs.c
>>>> +++ b/drivers/gpu/drm/radeon/radeon_cs.c
>>>> @@ -404,6 +404,17 @@ static void radeon_cs_parser_fini(struct radeon_c=
s_parser *parser, int error, bo
>>>>   		ttm_eu_fence_buffer_objects(&parser->ticket,
>>>>   					    &parser->validated,
>>>>   					    parser->ib.fence);
>>>> +		if (parser->chunk_flags && parser->chunk_flags->length_dw > 4) {
>>>> +			uint32_t __user *to =3D parser->chunk_flags->user_ptr;
>>>> +			uint32_t tmp;
>>>> +
>>>> +			tmp =3D lower_32_bits(parser->ib.fence->seq);
>>>> +			copy_to_user(&to[3], &tmp, sizeof(uint32_t));
>>>> +			tmp =3D parser->ib.fence->ring;
>>>> +			copy_to_user(&to[4], &tmp, sizeof(uint32_t));
>>>> +			tmp =3D rdev->fence_drv[tmp]->wrap_seq;
>>>> +			copy_to_user(&to[5], &tmp, sizeof(uint32_t));
>>>> +		}
>>>>   	} else if (backoff) {
>>>>   		ttm_eu_backoff_reservation(&parser->ticket,
>>>>   					   &parser->validated);
>>>> @@ -823,3 +834,52 @@ int radeon_cs_packet_next_reloc(struct radeon_cs_=
parser *p,
>>>>   		*cs_reloc =3D p->relocs_ptr[(idx / 4)];
>>>>   	return 0;
>>>>   }
>>>> +
>>>> +int radeon_cs_done_ioctl(struct drm_device *dev, void *data, struct d=
rm_file *filp)
>>>> +{
>>>> +	struct radeon_device *rdev =3D dev->dev_private;
>>>> +	struct drm_radeon_cs_done *args =3D data;
>>>> +	unsigned i =3D args->ring;
>>>> +	int32_t last_seq, sync_seq, wrap_seq;
>>>> +
>>>> +	/* FIXME check args->ring value is ok. */
>>>> +
>>>> +	/*
>>>> +	 * The memory barrier is match with the one in radeon_fence_emit() a=
nd
>>>> +	 * it insure us that we get the right matching wrap_seq and sync_seq.
>>>> +	 *
>>>> +	 * Note that no need to protect the fence_drv.sync_seq here as barri=
er
>>>> +	 * insure us we will get the coherency we need.
>>>> +	 */
>>>> +	wrap_seq =3D ACCESS_ONCE(rdev->fence_drv[i].wrap_seq);
>>>> +	smp_rmb();
>>>> +	sync_seq =3D lower_32_bits(ACCESS_ONCE(rdev->fence_drv[i].sync_seq[i=
]));
>>>> +
>>>> +	/*
>>>> +	 * So if current wrap_seq and one we are queried with are differ by
>>>> +	 * more than one this means that we are queried about a very old fen=
ce
>>>> +	 * seq value and we can assume it is long done.
>>>> +	 *
>>>> +	 * Well this is not entirely true, for it to be true we would need to
>>>> +	 * stall when we increment the wrap counter if cs in previous wrap w=
ere
>>>> +	 * not completed but this is highly unlikely. So live with the trill=
 of
>>>> +	 * it going wrong !
>>>> +	 */
>>>> +	if (abs((unsigned)wrap_seq - (unsigned)args->wrap) > 1)
>>>> +		return 1;
>>>> +	/* Now check if currently reported fence seq is done or not. */
>>>> +	/* FIXME call fence func to update last_seq just in case. */
>>>> +	last_seq =3D lower_32_bits(atomic64_read(&rdev->fence_drv[i].last_se=
q));
>>>> +	if ((last_seq - arg->seq) >=3D 0)
>>>> +		return 1;
>>>> +	/*
>>>> +	 * Last failsafe to handle the horrible case were userspace holded on
>>>> +	 * a wrap and seq value for so long without querying that it we wrap=
ped
>>>> +	 * around. This is here to avoid userspace waiting for a fence that =
was
>>>> +	 * emited a long time ago but the current sync_seq[ring] value migth=
 be
>>>> +	 * stuck and thus we might go bigger than this very old seq value.
>>>> +	 */
>>>> +	if (((sync_seq - args->seq) < 0) && args->wrap =3D=3D wrap_seq)
>>>> +		return 1;
>>>> +	return 0;
>>>> +}
>>>> diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/r=
adeon/radeon_fence.c
>>>> index 9137870..a6adcf6 100644
>>>> --- a/drivers/gpu/drm/radeon/radeon_fence.c
>>>> +++ b/drivers/gpu/drm/radeon/radeon_fence.c
>>>> @@ -119,6 +119,10 @@ int radeon_fence_emit(struct radeon_device *rdev,
>>>>   	kref_init(&((*fence)->kref));
>>>>   	(*fence)->rdev =3D rdev;
>>>>   	(*fence)->seq =3D ++rdev->fence_drv[ring].sync_seq[ring];
>>>> +	/* Barrier is important for radeon_fence_cs_done_ioctl. */
>>>> +	smp_wmb();
>>>> +	if (rdev->fence_drv[ring].sync_seq[ring] =3D=3D RADEON_SEQ_WRAP_VALU=
E)
>>>> +		rdev->fence_drv[ring].wrap_seq++;
>>>>   	(*fence)->ring =3D ring;
>>>>   	radeon_fence_ring_emit(rdev, ring, *fence);
>>>>   	trace_radeon_fence_emit(rdev->ddev, ring, (*fence)->seq);
>>>> diff --git a/drivers/gpu/drm/radeon/radeon_kms.c b/drivers/gpu/drm/rad=
eon/radeon_kms.c
>>>> index eb7164d..c9cfcf5 100644
>>>> --- a/drivers/gpu/drm/radeon/radeon_kms.c
>>>> +++ b/drivers/gpu/drm/radeon/radeon_kms.c
>>>> @@ -885,5 +885,6 @@ const struct drm_ioctl_desc radeon_ioctls_kms[] =
=3D {
>>>>   	DRM_IOCTL_DEF_DRV(RADEON_GEM_BUSY, radeon_gem_busy_ioctl, DRM_AUTH|=
DRM_UNLOCKED|DRM_RENDER_ALLOW),
>>>>   	DRM_IOCTL_DEF_DRV(RADEON_GEM_VA, radeon_gem_va_ioctl, DRM_AUTH|DRM_=
UNLOCKED|DRM_RENDER_ALLOW),
>>>>   	DRM_IOCTL_DEF_DRV(RADEON_GEM_OP, radeon_gem_op_ioctl, DRM_AUTH|DRM_=
UNLOCKED|DRM_RENDER_ALLOW),
>>>> +	DRM_IOCTL_DEF_DRV(RADEON_CS_DONE, radeon_cs_done_ioctl, DRM_AUTH|DRM=
_UNLOCKED|DRM_RENDER_ALLOW),
>>>>   };
>>>>   int radeon_max_kms_ioctl =3D ARRAY_SIZE(radeon_ioctls_kms);
>>>> diff --git a/include/uapi/drm/radeon_drm.h b/include/uapi/drm/radeon_d=
rm.h
>>>> index fea6099..a0c215d 100644
>>>> --- a/include/uapi/drm/radeon_drm.h
>>>> +++ b/include/uapi/drm/radeon_drm.h
>>>> @@ -554,6 +554,7 @@ typedef struct {
>>>>   #define DRM_IOCTL_RADEON_GEM_BUSY	DRM_IOWR(DRM_COMMAND_BASE + DRM_RA=
DEON_GEM_BUSY, struct drm_radeon_gem_busy)
>>>>   #define DRM_IOCTL_RADEON_GEM_VA		DRM_IOWR(DRM_COMMAND_BASE + DRM_RAD=
EON_GEM_VA, struct drm_radeon_gem_va)
>>>>   #define DRM_IOCTL_RADEON_GEM_OP		DRM_IOWR(DRM_COMMAND_BASE + DRM_RAD=
EON_GEM_OP, struct drm_radeon_gem_op)
>>>> +#define DRM_IOCTL_RADEON_CS_DONE	DRM_IOWR(DRM_COMMAND_BASE + DRM_RADE=
ON_CS_DONE, struct drm_radeon_cs_done)
>>>>   typedef struct drm_radeon_init {
>>>>   	enum {
>>>> @@ -936,6 +937,7 @@ struct drm_radeon_gem_va {
>>>>   #define RADEON_CS_RING_VCE          4
>>>>   /* The third dword of RADEON_CHUNK_ID_FLAGS is a sint32 that sets th=
e priority */
>>>>   /* 0 =3D normal, + =3D higher priority, - =3D lower priority */
>>>> +/* The fifth, sixth, seventh  dword are a 32bit fence ID, ring id and=
 wrap id of this CS */
>>>>   struct drm_radeon_cs_chunk {
>>>>   	uint32_t		chunk_id;
>>>> @@ -1038,4 +1040,11 @@ struct drm_radeon_info {
>>>>   #define CIK_TILE_MODE_DEPTH_STENCIL_1D		5
>>>> +struct drm_radeon_cs_done {
>>>> +	int32_t			seq;
>>>> +	int32_t			ring;
>>>> +	int32_t			wrap;
>>>> +	int32_t			pad;
>>>> +};
>>>> +
>>>>   #endif
>>>> -- =

>>>> 1.9.3
>>>>