From mboxrd@z Thu Jan  1 00:00:00 1970
From: Jerome Glisse <j.glisse@gmail.com>
Subject: Re: [PATCH 2/3] drm/radeon: add command submission IDs
Date: Fri, 19 Sep 2014 09:40:51 -0400
Message-ID: <20140919134050.GA2581@gmail.com>
References: <1411054213-11173-1-git-send-email-deathsimple@vodafone.de>
 <1411054213-11173-2-git-send-email-deathsimple@vodafone.de>
 <20140919004215.GA3447@gmail.com> <20140919031155.GB3447@gmail.com>
 <20140919040629.GC3447@gmail.com> <541BE35F.9000200@vodafone.de>
Mime-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
Return-path: <dri-devel-bounces@lists.freedesktop.org>
Received: from mail-qc0-f182.google.com (mail-qc0-f182.google.com
 [209.85.216.182])
 by gabe.freedesktop.org (Postfix) with ESMTP id AF56D6E1AA
 for <dri-devel@lists.freedesktop.org>; Fri, 19 Sep 2014 06:43:48 -0700 (PDT)
Received: by mail-qc0-f182.google.com with SMTP id i8so2452263qcq.13
 for <dri-devel@lists.freedesktop.org>; Fri, 19 Sep 2014 06:43:48 -0700 (PDT)
Content-Disposition: inline
In-Reply-To: <541BE35F.9000200@vodafone.de>
List-Unsubscribe: <http://lists.freedesktop.org/mailman/options/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <http://lists.freedesktop.org/archives/dri-devel>
List-Post: <mailto:dri-devel@lists.freedesktop.org>
List-Help: <mailto:dri-devel-request@lists.freedesktop.org?subject=help>
List-Subscribe: <http://lists.freedesktop.org/mailman/listinfo/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=subscribe>
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel" <dri-devel-bounces@lists.freedesktop.org>
To: Christian =?iso-8859-1?Q?K=F6nig?= <deathsimple@vodafone.de>
Cc: dri-devel@lists.freedesktop.org
List-Id: dri-devel@lists.freedesktop.org

On Fri, Sep 19, 2014 at 10:03:43AM +0200, Christian K=F6nig wrote:
> Hi Jerome,
> =

> this is exactly the approach which we wanted to avoid. And according to A=
lex
> it was actually you guys (I think Dave) who noted that this is probably n=
ot
> such a good idea.
> =

> The issue is that by telling userspace the ring specific sequence number =
we
> nail down the order of execution. E.g. a scheduler who want to change the
> order in which IBs are submitted to the hardware can't do so any more
> because we already gave userspace a sequence number to identify the comma=
nd
> submission.

Well for curent code this design does work, for hw where you want to have t=
he
hw/driver do some scheduling this can still be achieve by having a seq numb=
er
per ring per process.

> =

> Additional to that for hardware inter device synchronization we want to
> refer to a certain kernel object and not just the fence number. So we can
> only use some kind of handle based approach, e.g. as discussed before we
> want to be able to turn this sequence number in a FD fence.

Inter-device is not an issue here. For inter-device you want to convert this
seq number to an fd at which point you can create a fence structure with the
proper information ie create a structure only if it is needed.

Really this is the only sane design if we start allocating structure for ea=
ch
cs (well for each cs userspace cares) this gonna play bad with memory press=
ure.

Cheers,
J=E9r=F4me

> =

> Regards,
> Christian.
> =

> Am 19.09.2014 um 06:06 schrieb Jerome Glisse:
> >On Thu, Sep 18, 2014 at 11:11:57PM -0400, Jerome Glisse wrote:
> >>On Thu, Sep 18, 2014 at 08:42:16PM -0400, Jerome Glisse wrote:
> >>>On Thu, Sep 18, 2014 at 05:30:12PM +0200, Christian K=F6nig wrote:
> >>>>From: Christian K=F6nig <christian.koenig@amd.com>
> >>>>
> >>>>This patch adds a new 64bit ID as a result to each command submission.
> >>>NAK for design reasons.
> >>>
> >>>First and foremost we DO NOT WANT TO ALLOCATE ANYTHING for submission =
id.
> >>>I know that hooking up to fence make life easy but this is not how it
> >>>should be done.
> >>>
> >>>What you want to expose is the per ring 32bit seq value and report to =
user
> >>>space the value and the ring id. To handle the wrap around you add a u=
niq
> >>>32bit wrap counter which is incremented every 1 << 30 or some big numb=
er
> >>>seq value number (but must be smaller than 1 << 31). Then you use arit=
hmetic
> >>>to determine if a cs seq number is done or not (ie the seq wrap around
> >>>counter diff should not be bigger then 2 or 3) and the seq number shou=
ld
> >>>obviously be after again using arithmetic diff like jiffies code.
> >>>
> >>>All this being hidden in the kernel all the user space have to know is=
 :
> >>>32 bit seq value per ring
> >>>32 bit ring uniq id
> >>>32 wrap counter per ring
> >>>
> >>>With such scheme you never allocate anything or worry about any fence.
> >>>Way simpler and less code.
> >>>
> >>>Cheers,
> >>>J=E9r=F4me
> >>Because code is clearer than words attached is a patch of what i am thi=
nking
> >>about. The arithmetic should be right as well as the coherency and prop=
er
> >>memory barrier. Thought it is completely untested and will require coup=
le
> >>fixes for properly checking userspace arguments and other aspect (see F=
IXME
> >>in patches).
> >>
> >>There is no allocation, it is a pretty small patch, and it provide a fa=
st
> >>ligthweight solution. While the new ioctl argument and return value can=
 be
> >>improved (like reporting more information to userspace like for instance
> >>warning userspace when it is querying very old fences). I think the ove=
rall
> >>design is sound.
> >>
> >>Let me know.
> >Ok actually after a beer (given that planet express manual for Bender, w=
hich
> >also apply to me, clearly stipulate to expect malfunction if not properly
> >inebriate) it became clear that by reversing the problem i could make it=
 a
> >lot simpler. So attached is an even simpler patch that handle gracefully=
 user
> >space querying for very old fence (ie that could have wrap around).
> >
> >Code explains it all.
> >
> >>Cheers,
> >>J=E9r=F4me
> >>
> >>>
> >>>>Signed-off-by: Christian K=F6nig <christian.koenig@amd.com>
> >>>>---
> >>>>  drivers/gpu/drm/radeon/Makefile     |   2 +-
> >>>>  drivers/gpu/drm/radeon/radeon.h     |  13 +-
> >>>>  drivers/gpu/drm/radeon/radeon_cs.c  |  13 ++
> >>>>  drivers/gpu/drm/radeon/radeon_kms.c |  41 +++----
> >>>>  drivers/gpu/drm/radeon/radeon_seq.c | 229 +++++++++++++++++++++++++=
+++++++++++
> >>>>  include/uapi/drm/radeon_drm.h       |   1 +
> >>>>  6 files changed, 277 insertions(+), 22 deletions(-)
> >>>>  create mode 100644 drivers/gpu/drm/radeon/radeon_seq.c
> >>>>
> >>>>diff --git a/drivers/gpu/drm/radeon/Makefile b/drivers/gpu/drm/radeon=
/Makefile
> >>>>index 8cc9f68..21ee928 100644
> >>>>--- a/drivers/gpu/drm/radeon/Makefile
> >>>>+++ b/drivers/gpu/drm/radeon/Makefile
> >>>>@@ -81,7 +81,7 @@ radeon-y +=3D radeon_device.o radeon_asic.o radeon_=
kms.o \
> >>>>  	rv770_smc.o cypress_dpm.o btc_dpm.o sumo_dpm.o sumo_smc.o trinity_=
dpm.o \
> >>>>  	trinity_smc.o ni_dpm.o si_smc.o si_dpm.o kv_smc.o kv_dpm.o ci_smc.=
o \
> >>>>  	ci_dpm.o dce6_afmt.o radeon_vm.o radeon_ucode.o radeon_ib.o radeon=
_mn.o \
> >>>>-	radeon_sync.o
> >>>>+	radeon_sync.o radeon_seq.o
> >>>>  # add async DMA block
> >>>>  radeon-y +=3D \
> >>>>diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon=
/radeon.h
> >>>>index b3b4e96..dbfd346 100644
> >>>>--- a/drivers/gpu/drm/radeon/radeon.h
> >>>>+++ b/drivers/gpu/drm/radeon/radeon.h
> >>>>@@ -423,6 +423,15 @@ static inline bool radeon_fence_is_earlier(struc=
t radeon_fence *a,
> >>>>  }
> >>>>  /*
> >>>>+ * Userspace command submission identifier generation
> >>>>+ */
> >>>>+struct radeon_seq;
> >>>>+
> >>>>+uint64_t radeon_seq_push(struct radeon_seq **seq, struct radeon_fenc=
e *fence);
> >>>>+struct radeon_fence *radeon_seq_query(struct radeon_seq *seq, uint64=
_t id);
> >>>>+void radeon_seq_destroy(struct radeon_seq **seq);
> >>>>+
> >>>>+/*
> >>>>   * Tiling registers
> >>>>   */
> >>>>  struct radeon_surface_reg {
> >>>>@@ -946,7 +955,9 @@ struct radeon_vm_manager {
> >>>>   * file private structure
> >>>>   */
> >>>>  struct radeon_fpriv {
> >>>>-	struct radeon_vm		vm;
> >>>>+	struct radeon_vm	vm;
> >>>>+	struct mutex		seq_lock;
> >>>>+	struct radeon_seq	*seq;
> >>>>  };
> >>>>  /*
> >>>>diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/rad=
eon/radeon_cs.c
> >>>>index b8f20a5..0c0f0b3 100644
> >>>>--- a/drivers/gpu/drm/radeon/radeon_cs.c
> >>>>+++ b/drivers/gpu/drm/radeon/radeon_cs.c
> >>>>@@ -413,6 +413,19 @@ static void radeon_cs_parser_fini(struct radeon_=
cs_parser *parser, int error, bo
> >>>>  	unsigned i;
> >>>>  	if (!error) {
> >>>>+		if (parser->chunk_flags &&
> >>>>+		    parser->chunk_flags->length_dw > 4) {
> >>>>+			struct radeon_fpriv *fpriv =3D parser->filp->driver_priv;
> >>>>+			uint32_t __user *to =3D parser->chunk_flags->user_ptr;
> >>>>+			uint64_t id;
> >>>>+
> >>>>+			mutex_lock(&fpriv->seq_lock);
> >>>>+			id =3D radeon_seq_push(&fpriv->seq, parser->ib.fence);
> >>>>+			mutex_unlock(&fpriv->seq_lock);
> >>>>+
> >>>>+			copy_to_user(&to[3], &id, sizeof(uint64_t));
> >>>>+		}
> >>>>+
> >>>>  		/* Sort the buffer list from the smallest to largest buffer,
> >>>>  		 * which affects the order of buffers in the LRU list.
> >>>>  		 * This assures that the smallest buffers are added first
> >>>>diff --git a/drivers/gpu/drm/radeon/radeon_kms.c b/drivers/gpu/drm/ra=
deon/radeon_kms.c
> >>>>index 85ee6f7..38880af 100644
> >>>>--- a/drivers/gpu/drm/radeon/radeon_kms.c
> >>>>+++ b/drivers/gpu/drm/radeon/radeon_kms.c
> >>>>@@ -578,39 +578,34 @@ void radeon_driver_lastclose_kms(struct drm_dev=
ice *dev)
> >>>>   */
> >>>>  int radeon_driver_open_kms(struct drm_device *dev, struct drm_file =
*file_priv)
> >>>>  {
> >>>>+	struct radeon_fpriv *fpriv =3D kzalloc(sizeof(*fpriv), GFP_KERNEL);
> >>>>  	struct radeon_device *rdev =3D dev->dev_private;
> >>>>  	int r;
> >>>>-	file_priv->driver_priv =3D NULL;
> >>>>+	if (unlikely(!fpriv))
> >>>>+		return -ENOMEM;
> >>>>+
> >>>>+	file_priv->driver_priv =3D fpriv;
> >>>>  	r =3D pm_runtime_get_sync(dev->dev);
> >>>>  	if (r < 0)
> >>>>-		return r;
> >>>>+		goto error;
> >>>>  	/* new gpu have virtual address space support */
> >>>>  	if (rdev->family >=3D CHIP_CAYMAN) {
> >>>>-		struct radeon_fpriv *fpriv;
> >>>>  		struct radeon_vm *vm;
> >>>>  		int r;
> >>>>-		fpriv =3D kzalloc(sizeof(*fpriv), GFP_KERNEL);
> >>>>-		if (unlikely(!fpriv)) {
> >>>>-			return -ENOMEM;
> >>>>-		}
> >>>>-
> >>>>  		vm =3D &fpriv->vm;
> >>>>  		r =3D radeon_vm_init(rdev, vm);
> >>>>-		if (r) {
> >>>>-			kfree(fpriv);
> >>>>-			return r;
> >>>>-		}
> >>>>+		if (r)
> >>>>+			goto error;
> >>>>  		if (rdev->accel_working) {
> >>>>  			r =3D radeon_bo_reserve(rdev->ring_tmp_bo.bo, false);
> >>>>  			if (r) {
> >>>>  				radeon_vm_fini(rdev, vm);
> >>>>-				kfree(fpriv);
> >>>>-				return r;
> >>>>+				goto error;
> >>>>  			}
> >>>>  			/* map the ib pool buffer read only into
> >>>>@@ -623,16 +618,20 @@ int radeon_driver_open_kms(struct drm_device *d=
ev, struct drm_file *file_priv)
> >>>>  						  RADEON_VM_PAGE_SNOOPED);
> >>>>  			if (r) {
> >>>>  				radeon_vm_fini(rdev, vm);
> >>>>-				kfree(fpriv);
> >>>>-				return r;
> >>>>+				goto error;
> >>>>  			}
> >>>>  		}
> >>>>-		file_priv->driver_priv =3D fpriv;
> >>>>  	}
> >>>>+	mutex_init(&fpriv->seq_lock);
> >>>>+
> >>>>  	pm_runtime_mark_last_busy(dev->dev);
> >>>>  	pm_runtime_put_autosuspend(dev->dev);
> >>>>  	return 0;
> >>>>+
> >>>>+error:
> >>>>+	kfree(fpriv);
> >>>>+	return r;
> >>>>  }
> >>>>  /**
> >>>>@@ -646,11 +645,13 @@ int radeon_driver_open_kms(struct drm_device *d=
ev, struct drm_file *file_priv)
> >>>>  void radeon_driver_postclose_kms(struct drm_device *dev,
> >>>>  				 struct drm_file *file_priv)
> >>>>  {
> >>>>+	struct radeon_fpriv *fpriv =3D file_priv->driver_priv;
> >>>>  	struct radeon_device *rdev =3D dev->dev_private;
> >>>>+	radeon_seq_destroy(&fpriv->seq);
> >>>>+
> >>>>  	/* new gpu have virtual address space support */
> >>>>  	if (rdev->family >=3D CHIP_CAYMAN && file_priv->driver_priv) {
> >>>>-		struct radeon_fpriv *fpriv =3D file_priv->driver_priv;
> >>>>  		struct radeon_vm *vm =3D &fpriv->vm;
> >>>>  		int r;
> >>>>@@ -664,9 +665,9 @@ void radeon_driver_postclose_kms(struct drm_devic=
e *dev,
> >>>>  		}
> >>>>  		radeon_vm_fini(rdev, vm);
> >>>>-		kfree(fpriv);
> >>>>-		file_priv->driver_priv =3D NULL;
> >>>>  	}
> >>>>+	kfree(fpriv);
> >>>>+	file_priv->driver_priv =3D NULL;
> >>>>  }
> >>>>  /**
> >>>>diff --git a/drivers/gpu/drm/radeon/radeon_seq.c b/drivers/gpu/drm/ra=
deon/radeon_seq.c
> >>>>new file mode 100644
> >>>>index 0000000..d8857f1
> >>>>--- /dev/null
> >>>>+++ b/drivers/gpu/drm/radeon/radeon_seq.c
> >>>>@@ -0,0 +1,229 @@
> >>>>+/*
> >>>>+ * Copyright 2014 Advanced Micro Devices, Inc.
> >>>>+ * All Rights Reserved.
> >>>>+ *
> >>>>+ * Permission is hereby granted, free of charge, to any person obtai=
ning a
> >>>>+ * copy of this software and associated documentation files (the
> >>>>+ * "Software"), to deal in the Software without restriction, includi=
ng
> >>>>+ * without limitation the rights to use, copy, modify, merge, publis=
h,
> >>>>+ * distribute, sub license, and/or sell copies of the Software, and =
to
> >>>>+ * permit persons to whom the Software is furnished to do so, subjec=
t to
> >>>>+ * the following conditions:
> >>>>+ *
> >>>>+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, E=
XPRESS OR
> >>>>+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTA=
BILITY,
> >>>>+ * FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVEN=
T SHALL
> >>>>+ * THE COPYRIGHT HOLDERS, AUTHORS AND/OR ITS SUPPLIERS BE LIABLE FOR=
 ANY CLAIM,
> >>>>+ * DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TOR=
T OR
> >>>>+ * OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWAR=
E OR THE
> >>>>+ * USE OR OTHER DEALINGS IN THE SOFTWARE.
> >>>>+ *
> >>>>+ * The above copyright notice and this permission notice (including =
the
> >>>>+ * next paragraph) shall be included in all copies or substantial po=
rtions
> >>>>+ * of the Software.
> >>>>+ *
> >>>>+ */
> >>>>+/*
> >>>>+ * Authors:
> >>>>+ *    Christian K=F6nig <christian.koenig@amd.com>
> >>>>+ */
> >>>>+
> >>>>+#include <drm/drmP.h>
> >>>>+#include "radeon.h"
> >>>>+
> >>>>+/*
> >>>>+ * ID sequences
> >>>>+ * This code generates a 64bit identifier for a command submission.
> >>>>+ * It works by adding the fence of the command submission to a autom=
atically
> >>>>+ * resizing ring buffer.
> >>>>+ */
> >>>>+
> >>>>+struct radeon_seq {
> >>>>+	uint64_t		start;
> >>>>+	uint64_t		end;
> >>>>+	uint64_t		mask;
> >>>>+	struct radeon_seq	*replacement;
> >>>>+};
> >>>>+
> >>>>+/**
> >>>>+ * radeon_seq_create - create a new sequence object
> >>>>+ *
> >>>>+ * @start: start value for this sequence
> >>>>+ * @size: size of the ring buffer, must be power of two
> >>>>+ *
> >>>>+ * Allocate and initialize a new ring buffer and header.
> >>>>+ * Returns NULL if allocation fails, new object otherwise.
> >>>>+ */
> >>>>+static struct radeon_seq *radeon_seq_create(uint64_t start, unsigned=
 size)
> >>>>+{
> >>>>+	unsigned bytes =3D sizeof(struct radeon_seq) +
> >>>>+		size * sizeof(struct radeon_fence *);
> >>>>+
> >>>>+	struct radeon_seq *seq;
> >>>>+
> >>>>+	seq =3D kmalloc(bytes, GFP_KERNEL);
> >>>>+	if (!seq)
> >>>>+		return NULL;
> >>>>+
> >>>>+	seq->start =3D start;
> >>>>+	seq->end =3D start;
> >>>>+	seq->mask =3D size - 1;
> >>>>+	seq->replacement =3D NULL;
> >>>>+
> >>>>+	return seq;
> >>>>+}
> >>>>+
> >>>>+/**
> >>>>+ * radeon_seq_ring - get pointer to ring buffer
> >>>>+ *
> >>>>+ * @seq: sequence object
> >>>>+ *
> >>>>+ * Calculate the address of the ring buffer.
> >>>>+ */
> >>>>+static struct radeon_fence **radeon_seq_ring(struct radeon_seq *seq)
> >>>>+{
> >>>>+	return (struct radeon_fence **)&seq[1];
> >>>>+}
> >>>>+
> >>>>+/**
> >>>>+ * radeon_seq_try_free - try to free fences from the ring buffer
> >>>>+ *
> >>>>+ * @seq: sequence object
> >>>>+ *
> >>>>+ * Try to free fences from the start of the ring buffer.
> >>>>+ */
> >>>>+static void radeon_seq_try_free(struct radeon_seq *seq)
> >>>>+{
> >>>>+	struct radeon_fence **ring =3D radeon_seq_ring(seq);
> >>>>+
> >>>>+	while (seq->start !=3D seq->end) {
> >>>>+		unsigned idx =3D seq->start & seq->mask;
> >>>>+		struct radeon_fence *fence =3D ring[idx];
> >>>>+
> >>>>+		if (!radeon_fence_signaled(fence))
> >>>>+			break;
> >>>>+
> >>>>+		radeon_fence_unref(&fence);
> >>>>+		++seq->start;
> >>>>+	}
> >>>>+}
> >>>>+
> >>>>+/**
> >>>>+ * radeon_seq_add - add new fence to the end of the ring buffer
> >>>>+ *
> >>>>+ * @seq: sequence object
> >>>>+ * @f: the fence object
> >>>>+ *
> >>>>+ * Add the fence and return the generated ID.
> >>>>+ */
> >>>>+static uint64_t radeon_seq_add(struct radeon_seq *seq, struct radeon=
_fence *f)
> >>>>+{
> >>>>+	struct radeon_fence **ring =3D radeon_seq_ring(seq);
> >>>>+
> >>>>+	ring[seq->end & seq->mask] =3D radeon_fence_ref(f);
> >>>>+	return seq->end++;
> >>>>+}
> >>>>+
> >>>>+/**
> >>>>+ * radeon_seq_push - check for room and add the fence
> >>>>+ *
> >>>>+ * @seq: sequence object
> >>>>+ * @fence: the fence object
> >>>>+ *
> >>>>+ * Check for room on the ring buffer, if there isn't enough
> >>>>+ * reallocate the sequence object and add the fence.
> >>>>+ * Returns the generated ID.
> >>>>+ */
> >>>>+uint64_t radeon_seq_push(struct radeon_seq **seq, struct radeon_fenc=
e *fence)
> >>>>+{
> >>>>+	unsigned size_for_new_seq =3D 4;
> >>>>+	uint64_t start_for_new_seq =3D 1;
> >>>>+
> >>>>+	if (*seq) {
> >>>>+		/* try to release old replacements */
> >>>>+		while ((*seq)->replacement) {
> >>>>+			radeon_seq_try_free(*seq);
> >>>>+			if ((*seq)->start =3D=3D (*seq)->end) {
> >>>>+				struct radeon_seq *repl =3D (*seq)->replacement;
> >>>>+
> >>>>+				kfree(*seq);
> >>>>+				*seq =3D repl;
> >>>>+			} else {
> >>>>+				/* move on to the current container */
> >>>>+				seq =3D &(*seq)->replacement;
> >>>>+			}
> >>>>+		}
> >>>>+
> >>>>+		/* check if we have enough room for one more fence */
> >>>>+		radeon_seq_try_free(*seq);
> >>>>+		if (((*seq)->end - (*seq)->start) <=3D (*seq)->mask)
> >>>>+			return radeon_seq_add(*seq, fence);
> >>>>+
> >>>>+		/* not enough room, let's allocate a replacement */
> >>>>+		size_for_new_seq =3D ((*seq)->mask + 1) * 2;
> >>>>+		start_for_new_seq =3D (*seq)->end + 1;
> >>>>+		seq =3D &(*seq)->replacement;
> >>>>+	}
> >>>>+
> >>>>+	*seq =3D radeon_seq_create(start_for_new_seq, size_for_new_seq);
> >>>>+	if (!*seq) {
> >>>>+		/* not enough memory for a new sequence object, but failing
> >>>>+		   here isn't a good idea either cause the commands are already
> >>>>+		   submitted to the hardware. So just block on the fence. */
> >>>>+		int r =3D radeon_fence_wait(fence, false);
> >>>>+		if (r)
> >>>>+			DRM_ERROR("Error waiting for fence (%d)\n", r);
> >>>>+		return 0;
> >>>>+	}
> >>>>+	return radeon_seq_add(*seq, fence);
> >>>>+}
> >>>>+
> >>>>+/**
> >>>>+ * radeon_seq_query - lockup fence by it's ID
> >>>>+ *
> >>>>+ * @seq: sequence object
> >>>>+ * @id: the generated ID
> >>>>+ *
> >>>>+ * Lockup the associated fence by it's ID.
> >>>>+ * Returns fence object or NULL if it couldn't be found.
> >>>>+ */
> >>>>+struct radeon_fence *radeon_seq_query(struct radeon_seq *seq, uint64=
_t id)
> >>>>+{
> >>>>+	struct radeon_fence **ring;
> >>>>+
> >>>>+	while (seq && id > seq->end)
> >>>>+		seq =3D seq->replacement;
> >>>>+
> >>>>+	if (!seq || id < seq->start)
> >>>>+		return NULL;
> >>>>+
> >>>>+	ring =3D radeon_seq_ring(seq);
> >>>>+	return ring[id & seq->mask];
> >>>>+}
> >>>>+
> >>>>+/**
> >>>>+ * radeon_seq_destroy - destroy the sequence object
> >>>>+ *
> >>>>+ * @seq_ptr: pointer to sequence object
> >>>>+ *
> >>>>+ * Destroy the sequence objects and release all fence references tak=
en.
> >>>>+ */
> >>>>+void radeon_seq_destroy(struct radeon_seq **seq_ptr)
> >>>>+{
> >>>>+	struct radeon_seq *seq =3D *seq_ptr;
> >>>>+	while (seq) {
> >>>>+		struct radeon_seq *repl =3D seq->replacement;
> >>>>+		unsigned start =3D seq->start & seq->mask;
> >>>>+		unsigned end =3D seq->end & seq->mask;
> >>>>+		struct radeon_fence **ring;
> >>>>+		unsigned i;
> >>>>+
> >>>>+		ring =3D radeon_seq_ring(seq);
> >>>>+		for (i =3D start; i < end; ++i)
> >>>>+			radeon_fence_unref(&ring[i]);
> >>>>+
> >>>>+		kfree(seq);
> >>>>+		seq =3D repl;
> >>>>+	}
> >>>>+	*seq_ptr =3D NULL;
> >>>>+}
> >>>>diff --git a/include/uapi/drm/radeon_drm.h b/include/uapi/drm/radeon_=
drm.h
> >>>>index 50d0fb4..6b2b2e7 100644
> >>>>--- a/include/uapi/drm/radeon_drm.h
> >>>>+++ b/include/uapi/drm/radeon_drm.h
> >>>>@@ -959,6 +959,7 @@ struct drm_radeon_gem_va {
> >>>>  #define RADEON_CS_RING_VCE          4
> >>>>  /* The third dword of RADEON_CHUNK_ID_FLAGS is a sint32 that sets t=
he priority */
> >>>>  /* 0 =3D normal, + =3D higher priority, - =3D lower priority */
> >>>>+/* The fourth and fives dword are a 64bit fence ID generated for thi=
s CS */
> >>>>  struct drm_radeon_cs_chunk {
> >>>>  	uint32_t		chunk_id;
> >>>>-- =

> >>>>1.9.1
> >>>>
> >>>>_______________________________________________
> >>>>dri-devel mailing list
> >>>>dri-devel@lists.freedesktop.org
> >>>>http://lists.freedesktop.org/mailman/listinfo/dri-devel
> >> From b95fe503dad89875da9db2d11d05f93eca41232d Mon Sep 17 00:00:00 2001
> >>From: =3D?UTF-8?q?J=3DC3=3DA9r=3DC3=3DB4me=3D20Glisse?=3D <jglisse@redh=
at.com>
> >>Date: Thu, 18 Sep 2014 22:51:21 -0400
> >>Subject: [PATCH] drm/radeon: cs sequence id and cs completion query.
> >>MIME-Version: 1.0
> >>Content-Type: text/plain; charset=3DUTF-8
> >>Content-Transfer-Encoding: 8bit
> >>
> >>This report back to userspace ring id and sequence number that can
> >>be use by userspace to query for the completetion of the cs on the
> >>hardware. This also add a new ioctl to perform such query.
> >>
> >>There is an extra value supplied to userspace the wrap counter that
> >>is use to detect wrap around and to gracefuly handle the case where
> >>userspace might be querying about a very, very, very old cs executed
> >>long time ago in a far far away universe.
> >>
> >>This patch is aimed to introduce the necessary ground work for user
> >>space explicit synchronization. By allowing userspace to query about
> >>cs completetion on hardware, user space can perform operation and
> >>synchronization on buffer by itself without having the cs ioctl to
> >>implicitly wait for older cs completetion before scheduling new cs.
> >>This part is however left to a follow-up patch.
> >>
> >>Signed-off-by: J=E9r=F4me Glisse <jglisse@redhat.com>
> >>---
> >>  drivers/gpu/drm/radeon/radeon.h       |  2 ++
> >>  drivers/gpu/drm/radeon/radeon_cs.c    | 60 ++++++++++++++++++++++++++=
+++++++++
> >>  drivers/gpu/drm/radeon/radeon_fence.c |  4 +++
> >>  drivers/gpu/drm/radeon/radeon_kms.c   |  1 +
> >>  include/uapi/drm/radeon_drm.h         |  9 ++++++
> >>  5 files changed, 76 insertions(+)
> >>
> >>diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/r=
adeon.h
> >>index 5f05b4c..1f1dd1f 100644
> >>--- a/drivers/gpu/drm/radeon/radeon.h
> >>+++ b/drivers/gpu/drm/radeon/radeon.h
> >>@@ -118,6 +118,7 @@ extern int radeon_bapm;
> >>  #define RADEON_DEBUGFS_MAX_COMPONENTS		32
> >>  #define RADEONFB_CONN_LIMIT			4
> >>  #define RADEON_BIOS_NUM_SCRATCH			8
> >>+#define RADEON_SEQ_WRAP_VALUE			(1 << 30)
> >>  /* fence seq are set to this number when signaled */
> >>  #define RADEON_FENCE_SIGNALED_SEQ		0LL
> >>@@ -355,6 +356,7 @@ struct radeon_fence_driver {
> >>  	/* sync_seq is protected by ring emission lock */
> >>  	uint64_t			sync_seq[RADEON_NUM_RINGS];
> >>  	atomic64_t			last_seq;
> >>+	int32_t				wrap_seq;
> >>  	bool				initialized;
> >>  };
> >>diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeo=
n/radeon_cs.c
> >>index 83f382e..be4ae25 100644
> >>--- a/drivers/gpu/drm/radeon/radeon_cs.c
> >>+++ b/drivers/gpu/drm/radeon/radeon_cs.c
> >>@@ -404,6 +404,17 @@ static void radeon_cs_parser_fini(struct radeon_cs=
_parser *parser, int error, bo
> >>  		ttm_eu_fence_buffer_objects(&parser->ticket,
> >>  					    &parser->validated,
> >>  					    parser->ib.fence);
> >>+		if (parser->chunk_flags && parser->chunk_flags->length_dw > 4) {
> >>+			uint32_t __user *to =3D parser->chunk_flags->user_ptr;
> >>+			uint32_t tmp;
> >>+
> >>+			tmp =3D lower_32_bits(parser->ib.fence->seq);
> >>+			copy_to_user(&to[3], &tmp, sizeof(uint32_t));
> >>+			tmp =3D parser->ib.fence->ring;
> >>+			copy_to_user(&to[4], &tmp, sizeof(uint32_t));
> >>+			tmp =3D rdev->fence_drv[tmp]->wrap_seq;
> >>+			copy_to_user(&to[5], &tmp, sizeof(uint32_t));
> >>+		}
> >>  	} else if (backoff) {
> >>  		ttm_eu_backoff_reservation(&parser->ticket,
> >>  					   &parser->validated);
> >>@@ -823,3 +834,52 @@ int radeon_cs_packet_next_reloc(struct radeon_cs_p=
arser *p,
> >>  		*cs_reloc =3D p->relocs_ptr[(idx / 4)];
> >>  	return 0;
> >>  }
> >>+
> >>+int radeon_cs_done_ioctl(struct drm_device *dev, void *data, struct dr=
m_file *filp)
> >>+{
> >>+	struct radeon_device *rdev =3D dev->dev_private;
> >>+	struct drm_radeon_cs_done *args =3D data;
> >>+	unsigned i =3D args->ring;
> >>+	int32_t last_seq, sync_seq, wrap_seq;
> >>+
> >>+	/* FIXME check args->ring value is ok. */
> >>+
> >>+	/*
> >>+	 * The memory barrier is match with the one in radeon_fence_emit() and
> >>+	 * it insure us that we get the right matching wrap_seq and sync_seq.
> >>+	 *
> >>+	 * Note that no need to protect the fence_drv.sync_seq here as barrier
> >>+	 * insure us we will get the coherency we need.
> >>+	 */
> >>+	wrap_seq =3D ACCESS_ONCE(rdev->fence_drv[i].wrap_seq);
> >>+	smp_rmb();
> >>+	sync_seq =3D lower_32_bits(ACCESS_ONCE(rdev->fence_drv[i].sync_seq[i]=
));
> >>+
> >>+	/*
> >>+	 * So if current wrap_seq and one we are queried with are differ by
> >>+	 * more than one this means that we are queried about a very old fence
> >>+	 * seq value and we can assume it is long done.
> >>+	 *
> >>+	 * Well this is not entirely true, for it to be true we would need to
> >>+	 * stall when we increment the wrap counter if cs in previous wrap we=
re
> >>+	 * not completed but this is highly unlikely. So live with the trill =
of
> >>+	 * it going wrong !
> >>+	 */
> >>+	if (abs((unsigned)wrap_seq - (unsigned)args->wrap) > 1)
> >>+		return 1;
> >>+	/* Now check if currently reported fence seq is done or not. */
> >>+	/* FIXME call fence func to update last_seq just in case. */
> >>+	last_seq =3D lower_32_bits(atomic64_read(&rdev->fence_drv[i].last_seq=
));
> >>+	if ((last_seq - arg->seq) >=3D 0)
> >>+		return 1;
> >>+	/*
> >>+	 * Last failsafe to handle the horrible case were userspace holded on
> >>+	 * a wrap and seq value for so long without querying that it we wrapp=
ed
> >>+	 * around. This is here to avoid userspace waiting for a fence that w=
as
> >>+	 * emited a long time ago but the current sync_seq[ring] value migth =
be
> >>+	 * stuck and thus we might go bigger than this very old seq value.
> >>+	 */
> >>+	if (((sync_seq - args->seq) < 0) && args->wrap =3D=3D wrap_seq)
> >>+		return 1;
> >>+	return 0;
> >>+}
> >>diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/ra=
deon/radeon_fence.c
> >>index 9137870..a6adcf6 100644
> >>--- a/drivers/gpu/drm/radeon/radeon_fence.c
> >>+++ b/drivers/gpu/drm/radeon/radeon_fence.c
> >>@@ -119,6 +119,10 @@ int radeon_fence_emit(struct radeon_device *rdev,
> >>  	kref_init(&((*fence)->kref));
> >>  	(*fence)->rdev =3D rdev;
> >>  	(*fence)->seq =3D ++rdev->fence_drv[ring].sync_seq[ring];
> >>+	/* Barrier is important for radeon_fence_cs_done_ioctl. */
> >>+	smp_wmb();
> >>+	if (rdev->fence_drv[ring].sync_seq[ring] =3D=3D RADEON_SEQ_WRAP_VALUE)
> >>+		rdev->fence_drv[ring].wrap_seq++;
> >>  	(*fence)->ring =3D ring;
> >>  	radeon_fence_ring_emit(rdev, ring, *fence);
> >>  	trace_radeon_fence_emit(rdev->ddev, ring, (*fence)->seq);
> >>diff --git a/drivers/gpu/drm/radeon/radeon_kms.c b/drivers/gpu/drm/rade=
on/radeon_kms.c
> >>index eb7164d..c9cfcf5 100644
> >>--- a/drivers/gpu/drm/radeon/radeon_kms.c
> >>+++ b/drivers/gpu/drm/radeon/radeon_kms.c
> >>@@ -885,5 +885,6 @@ const struct drm_ioctl_desc radeon_ioctls_kms[] =3D=
 {
> >>  	DRM_IOCTL_DEF_DRV(RADEON_GEM_BUSY, radeon_gem_busy_ioctl, DRM_AUTH|D=
RM_UNLOCKED|DRM_RENDER_ALLOW),
> >>  	DRM_IOCTL_DEF_DRV(RADEON_GEM_VA, radeon_gem_va_ioctl, DRM_AUTH|DRM_U=
NLOCKED|DRM_RENDER_ALLOW),
> >>  	DRM_IOCTL_DEF_DRV(RADEON_GEM_OP, radeon_gem_op_ioctl, DRM_AUTH|DRM_U=
NLOCKED|DRM_RENDER_ALLOW),
> >>+	DRM_IOCTL_DEF_DRV(RADEON_CS_DONE, radeon_cs_done_ioctl, DRM_AUTH|DRM_=
UNLOCKED|DRM_RENDER_ALLOW),
> >>  };
> >>  int radeon_max_kms_ioctl =3D ARRAY_SIZE(radeon_ioctls_kms);
> >>diff --git a/include/uapi/drm/radeon_drm.h b/include/uapi/drm/radeon_dr=
m.h
> >>index fea6099..a0c215d 100644
> >>--- a/include/uapi/drm/radeon_drm.h
> >>+++ b/include/uapi/drm/radeon_drm.h
> >>@@ -554,6 +554,7 @@ typedef struct {
> >>  #define DRM_IOCTL_RADEON_GEM_BUSY	DRM_IOWR(DRM_COMMAND_BASE + DRM_RAD=
EON_GEM_BUSY, struct drm_radeon_gem_busy)
> >>  #define DRM_IOCTL_RADEON_GEM_VA		DRM_IOWR(DRM_COMMAND_BASE + DRM_RADE=
ON_GEM_VA, struct drm_radeon_gem_va)
> >>  #define DRM_IOCTL_RADEON_GEM_OP		DRM_IOWR(DRM_COMMAND_BASE + DRM_RADE=
ON_GEM_OP, struct drm_radeon_gem_op)
> >>+#define DRM_IOCTL_RADEON_CS_DONE	DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEO=
N_CS_DONE, struct drm_radeon_cs_done)
> >>  typedef struct drm_radeon_init {
> >>  	enum {
> >>@@ -936,6 +937,7 @@ struct drm_radeon_gem_va {
> >>  #define RADEON_CS_RING_VCE          4
> >>  /* The third dword of RADEON_CHUNK_ID_FLAGS is a sint32 that sets the=
 priority */
> >>  /* 0 =3D normal, + =3D higher priority, - =3D lower priority */
> >>+/* The fifth, sixth, seventh  dword are a 32bit fence ID, ring id and =
wrap id of this CS */
> >>  struct drm_radeon_cs_chunk {
> >>  	uint32_t		chunk_id;
> >>@@ -1038,4 +1040,11 @@ struct drm_radeon_info {
> >>  #define CIK_TILE_MODE_DEPTH_STENCIL_1D		5
> >>+struct drm_radeon_cs_done {
> >>+	int32_t			seq;
> >>+	int32_t			ring;
> >>+	int32_t			wrap;
> >>+	int32_t			pad;
> >>+};
> >>+
> >>  #endif
> >>-- =

> >>1.9.3
> >>
> =