From mboxrd@z Thu Jan  1 00:00:00 1970
From: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Subject: Re: [PULL REQUEST] ttm fence conversion
Date: Tue, 02 Sep 2014 14:29:01 +0200
Message-ID: <5405B80D.6090909@canonical.com>
References: <540459E3.9060406@canonical.com> <54046722.1000207@vodafone.de>
 <540475A6.2020603@canonical.com> <54049D19.30802@vodafone.de>
 <5404BE44.8030407@canonical.com> <5405850F.8010901@vodafone.de>
 <54058A07.2090101@canonical.com> <5405936E.2030301@vodafone.de>
Mime-Version: 1.0
Content-Type: text/plain; charset="windows-1252"
Content-Transfer-Encoding: quoted-printable
Return-path: <dri-devel-bounces@lists.freedesktop.org>
Received: from youngberry.canonical.com (youngberry.canonical.com
 [91.189.89.112])
 by gabe.freedesktop.org (Postfix) with ESMTP id 4468C89109
 for <dri-devel@lists.freedesktop.org>; Tue,  2 Sep 2014 05:29:03 -0700 (PDT)
In-Reply-To: <5405936E.2030301@vodafone.de>
List-Unsubscribe: <http://lists.freedesktop.org/mailman/options/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <http://lists.freedesktop.org/archives/dri-devel>
List-Post: <mailto:dri-devel@lists.freedesktop.org>
List-Help: <mailto:dri-devel-request@lists.freedesktop.org?subject=help>
List-Subscribe: <http://lists.freedesktop.org/mailman/listinfo/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=subscribe>
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel" <dri-devel-bounces@lists.freedesktop.org>
To: =?windows-1252?Q?Christian_K=F6nig?= <deathsimple@vodafone.de>, Dave Airlie <airlied@redhat.com>
Cc: "dri-devel@lists.freedesktop.org" <dri-devel@lists.freedesktop.org>
List-Id: dri-devel@lists.freedesktop.org

Op 02-09-14 om 11:52 schreef Christian K=F6nig:
> Am 02.09.2014 um 11:12 schrieb Maarten Lankhorst:
>> Op 02-09-14 om 10:51 schreef Christian K=F6nig:
>>> Am 01.09.2014 um 20:43 schrieb Maarten Lankhorst:
>>>> Hey,
>>>>
>>>> On 01-09-14 18:21, Christian K=F6nig wrote:
>>>>> Am 01.09.2014 um 15:33 schrieb Maarten Lankhorst:
>>>>>> Hey,
>>>>>>
>>>>>> Op 01-09-14 om 14:31 schreef Christian K=F6nig:
>>>>>>> Please wait a second with that.
>>>>>>>
>>>>>>> I didn't had a chance to test this yet and nobody has yet given it'=
s rb on at least the radeon changes in this branch.
>>>>>> Ok, my fault. I thought it was implicitly acked. I haven't made any =
functional changes to these patches,
>>>>>> just some small fixups and a fix to make it apply after the upstream=
 removal of  RADEON_FENCE_SIGNALED_SEQ.
>>>>> Yeah, but the resulting patch looks to complex for my taste and shoul=
d be simplified a bit more. Here is a more detailed review:
>>>>>
>>>>>> +    wait_queue_t fence_wake;
>>>>> Only a nitpick, but please fix the indention and maybe add a comment.
>>>>>
>>>>>> +    struct work_struct delayed_irq_work;
>>>>> Just drop that, the new fall back work item should take care of this =
when the unfortunate case happens that somebody tries to enable_signaling i=
n the middle of a GPU reset.
>>>> I can only drop it if radeon_gpu_reset will always call radeon_irq_set=
 after downgrading to read mode, even if no work needs to be done. :-)
>>>>
>>>> Then again, should be possible.
>>> The fall back handler should take care of the rare condition that we ca=
n't activate the IRQ because the driver is in a lockup handler.
>>>
>>> The issue is that the delayed_irq_work handler needs to take the exclus=
ive lock once more and so would block an innocent process for the duration =
of the GPU lockup.
>>>
>>> Either reschedule as delayed work item if we can't take the lock immedi=
ately or just live with the delay of the fall back handler. Since IRQs usua=
lly don't work correctly immediately after an GPU reset I'm pretty sure tha=
t the fallback handler will be needed anyway.
>> Ok, rescheduling would be fine. Or could I go with the alternative, remo=
ve the delayed_irq_work and always set irqs after downgrading the write loc=
k?
>
> Always setting the IRQ's after downgrading the write lock would work for =
me as well.
>
>
>>
>>>>>>    /*
>>>>>> - * Cast helper
>>>>>> - */
>>>>>> -#define to_radeon_fence(p) ((struct radeon_fence *)(p))
>>>>>> -
>>>>>> -/*
>>>>> Please define the new cast helper in radeon.h as well.
>>>> The ops are only defined in radeon_fence.c, and nothing outside of rad=
eon_fence.c should care about the internals.
>>> Then define this as a function instead, I need a checked cast from a fe=
nce to a radeon_fence outside of the fence code as well.
>> Ok.
>>
>>>>>>        if (!rdev->needs_reset) {
>>>>>> -        up_write(&rdev->exclusive_lock);
>>>>>> +        downgrade_write(&rdev->exclusive_lock);
>>>>>> +        wake_up_all(&rdev->fence_queue);
>>>>>> +        up_read(&rdev->exclusive_lock);
>>>>>>            return 0;
>>>>>>        }
>>>>> Just drop that as well, no need to wake up anybody here.
>>>> Maybe not, but if I have to remove delayed_irq_work I do need to add a=
 radeon_irq_set here.
>>>>
>>>>>>    downgrade_write(&rdev->exclusive_lock);
>>>>>> +    wake_up_all(&rdev->fence_queue);
>>>>> Same here, the IB test will wake up all fences for recheck anyway.
>>>> Same as previous comment. :-)
>>>>
>>>>>> + * radeon_fence_read_seq - Returns the current fence value without =
updating
>>>>>> + *
>>>>>> + * @rdev: radeon_device pointer
>>>>>> + * @ring: ring index to return the seqno of
>>>>>> + */
>>>>>> +static uint64_t radeon_fence_read_seq(struct radeon_device *rdev, i=
nt ring)
>>>>>> +{
>>>>>> +    uint64_t last_seq =3D atomic64_read(&rdev->fence_drv[ring].last=
_seq);
>>>>>> +    uint64_t last_emitted =3D rdev->fence_drv[ring].sync_seq[ring];
>>>>>> +    uint64_t seq =3D radeon_fence_read(rdev, ring);
>>>>>> +
>>>>>> +    seq =3D radeon_fence_read(rdev, ring);
>>>>>> +    seq |=3D last_seq & 0xffffffff00000000LL;
>>>>>> +    if (seq < last_seq) {
>>>>>> +        seq &=3D 0xffffffff;
>>>>>> +        seq |=3D last_emitted & 0xffffffff00000000LL;
>>>>>> +    }
>>>>>> +    return seq;
>>>>>> +}
>>>>> Completely drop that and just check the last_seq signaled as set by r=
adeon_fence_activity.
>>>> Do you mean call radeon_fence_activity in radeon_fence_signaled? Or sh=
ould I just use the cached value in radeon_fence_check_signaled.
>>> Just check the cached value, it should be updated by radeon_fence_activ=
ity immediately before calling this anyway.
>> Ok. I think I wrote this as a workaround for unreliable interrupts. :-)
>>
>>>> I can't call fence_activity in check_signaled, because it would cause =
re-entrancy in fence_queue.
>>>>
>>>>>> +        if (!ret)
>>>>>> +            FENCE_TRACE(&fence->base, "signaled from irq context\n"=
);
>>>>>> +        else
>>>>>> +            FENCE_TRACE(&fence->base, "was already signaled\n");
>>>>> Is all that text tracing necessary? Probably better define a trace po=
int here.
>>>> It gets optimized out normally. There's already a tracepoint called in=
 fence_signal.
>>>>  =

>>>>>> +    if (atomic64_read(&rdev->fence_drv[fence->ring].last_seq) >=3D =
fence->seq ||
>>>>>> +        !rdev->ddev->irq_enabled)
>>>>>> +        return false;
>>>>> Checking irq_enabled here might not be such a good idea if the fence =
code don't has a fall back on it's own. What exactly happens if enable_sign=
aling returns false?
>>>> I thought irq_enabled couldn't happen under normal circumstances?
>>> Not 100% sure, but I think it is temporary turned off during reset.
>>>
>>>> Anyway the fence gets treated as signaled if it returns false, and fen=
ce_signal will get called.
>>> Thought so, well that's rather bad if we failed to install the IRQ hand=
le that we just treat all fences as signaled isn't it?
>> I wrote this code before the delayed work was added, I guess the check f=
or !irq_enabled can be removed now. :-)
>>
>>>>>> +static signed long radeon_fence_default_wait(struct fence *f, bool =
intr,
>>>>>> +                         signed long timeout)
>>>>>> +{
>>>>>> +    struct radeon_fence *fence =3D to_radeon_fence(f);
>>>>>> +    struct radeon_device *rdev =3D fence->rdev;
>>>>>> +    bool signaled;
>>>>>> +
>>>>>> +    fence_enable_sw_signaling(&fence->base);
>>>>>> +
>>>>>> +    /*
>>>>>> +     * This function has to return -EDEADLK, but cannot hold
>>>>>> +     * exclusive_lock during the wait because some callers
>>>>>> +     * may already hold it. This means checking needs_reset without
>>>>>> +     * lock, and not fiddling with any gpu internals.
>>>>>> +     *
>>>>>> +     * The callback installed with fence_enable_sw_signaling will
>>>>>> +     * run before our wait_event_*timeout call, so we will see
>>>>>> +     * both the signaled fence and the changes to needs_reset.
>>>>>> +     */
>>>>>> +
>>>>>> +    if (intr)
>>>>>> +        timeout =3D wait_event_interruptible_timeout(rdev->fence_qu=
eue,
>>>>>> +                               ((signaled =3D (test_bit(FENCE_FLAG_=
SIGNALED_BIT, &fence->base.flags))) || rdev->needs_reset),
>>>>>> +                               timeout);
>>>>>> +    else
>>>>>> +        timeout =3D wait_event_timeout(rdev->fence_queue,
>>>>>> +                         ((signaled =3D (test_bit(FENCE_FLAG_SIGNAL=
ED_BIT, &fence->base.flags))) || rdev->needs_reset),
>>>>>> +                         timeout);
>>>>>> +
>>>>>> +    if (timeout > 0 && !signaled)
>>>>>> +        return -EDEADLK;
>>>>>> +    return timeout;
>>>>>> +}
>>>>> This at least needs to be properly formated, but I think since we now=
 don't need extra handling any more we don't need an extra wait function as=
 well.
>>>> I thought of removing the extra handling, but the -EDEADLK stuff is ne=
eded because a deadlock could happen in ttm_bo_lock_delayed_workqueue other=
wise if the gpu's really hung there would never be any progress forward.
>>> Hui what? ttm_bo_lock_delayed_workqueue shouldn't call any blocking wai=
t.
>> Oops, you're right. ttm_bo_delayed_delete is called with remove_all fals=
e, not true.
>>
>> Unfortunately ttm_bo_vm_fault does hold the exclusive_lock in read mode,=
 and other places that use eviction will use it too.
>> Without returning -EDEADLK this could mean that ttm_bo_move_accel_cleanu=
p would block forever,
>> so this function has to stay.
>
> Ok, fine with me. I'm not deep enough into the TTM code to really judge t=
his, but my understanding was that TTM still calls it's own wait callback.
That one is about to go away. :-)

How does this patch look?
---- 8< ----
commit e75af5ee3b94157e868cb48b4bfbc1ca36183ba4
Author: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Date:   Thu Jan 9 11:03:12 2014 +0100

    drm/radeon: use common fence implementation for fences, v4
    =

    Changes since v1:
    - Kill the sw interrupt dance, add and use
      radeon_irq_kms_sw_irq_get_delayed instead.
    - Change custom wait function, lockdep complained about it.
      Holding exclusive_lock in the wait function might cause deadlocks.
      Instead do all the processing in .enable_signaling, and wait
      on the global fence_queue to pick up gpu resets.
    - Process all fences in radeon_gpu_reset after reset to close a race
      with the trylock in enable_signaling.
    Changes since v2:
    - Small changes to work with the rewritten lockup recovery patches.
    Changes since v3:
    - Call radeon_fence_schedule_check when exclusive_lock cannot be
      acquired to always cause a wake up.
    - Reset irqs from hangup check.
    - Drop reading seqno in the callback, use cached value.
    - Fix indentation in radeon_fence_default_wait
    - Add a radeon_test_signaled function, drop a few test_bit calls.
    - Make to_radeon_fence global.
    =

    Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>

diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeo=
n.h
index 83a24614138a..d80dc547a105 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -66,6 +66,7 @@
 #include <linux/kref.h>
 #include <linux/interval_tree.h>
 #include <linux/hashtable.h>
+#include <linux/fence.h>
 =

 #include <ttm/ttm_bo_api.h>
 #include <ttm/ttm_bo_driver.h>
@@ -354,17 +355,19 @@ struct radeon_fence_driver {
 	/* sync_seq is protected by ring emission lock */
 	uint64_t			sync_seq[RADEON_NUM_RINGS];
 	atomic64_t			last_seq;
-	bool				initialized;
+	bool				initialized, delayed_irq;
 	struct delayed_work		lockup_work;
 };
 =

 struct radeon_fence {
+	struct fence base;
+
 	struct radeon_device		*rdev;
-	struct kref			kref;
-	/* protected by radeon_fence.lock */
 	uint64_t			seq;
 	/* RB, DMA, etc. */
 	unsigned			ring;
+
+	wait_queue_t			fence_wake;
 };
 =

 int radeon_fence_driver_start_ring(struct radeon_device *rdev, int ring);
@@ -782,6 +785,7 @@ struct radeon_irq {
 int radeon_irq_kms_init(struct radeon_device *rdev);
 void radeon_irq_kms_fini(struct radeon_device *rdev);
 void radeon_irq_kms_sw_irq_get(struct radeon_device *rdev, int ring);
+bool radeon_irq_kms_sw_irq_get_delayed(struct radeon_device *rdev, int rin=
g);
 void radeon_irq_kms_sw_irq_put(struct radeon_device *rdev, int ring);
 void radeon_irq_kms_pflip_irq_get(struct radeon_device *rdev, int crtc);
 void radeon_irq_kms_pflip_irq_put(struct radeon_device *rdev, int crtc);
@@ -2308,6 +2312,7 @@ struct radeon_device {
 	struct radeon_mman		mman;
 	struct radeon_fence_driver	fence_drv[RADEON_NUM_RINGS];
 	wait_queue_head_t		fence_queue;
+	unsigned			fence_context;
 	struct mutex			ring_lock;
 	struct radeon_ring		ring[RADEON_NUM_RINGS];
 	bool				ib_pool_ready;
@@ -2441,7 +2446,17 @@ void cik_mm_wdoorbell(struct radeon_device *rdev, u3=
2 index, u32 v);
 /*
  * Cast helper
  */
-#define to_radeon_fence(p) ((struct radeon_fence *)(p))
+extern const struct fence_ops radeon_fence_ops;
+
+static inline struct radeon_fence *to_radeon_fence(struct fence *f)
+{
+	struct radeon_fence *__f =3D container_of(f, struct radeon_fence, base);
+
+	if (__f->base.ops =3D=3D &radeon_fence_ops)
+		return __f;
+
+	return NULL;
+}
 =

 /*
  * Registers read & write functions.
diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeo=
n/radeon_device.c
index d30f1cc1aa12..e84a76e6656a 100644
--- a/drivers/gpu/drm/radeon/radeon_device.c
+++ b/drivers/gpu/drm/radeon/radeon_device.c
@@ -1253,6 +1253,7 @@ int radeon_device_init(struct radeon_device *rdev,
 	for (i =3D 0; i < RADEON_NUM_RINGS; i++) {
 		rdev->ring[i].idx =3D i;
 	}
+	rdev->fence_context =3D fence_context_alloc(RADEON_NUM_RINGS);
 =

 	DRM_INFO("initializing kernel modesetting (%s 0x%04X:0x%04X 0x%04X:0x%04X=
).\n",
 		radeon_family_name[rdev->family], pdev->vendor, pdev->device,
diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon=
/radeon_fence.c
index ecdba3afa2c3..af9f2d6bd7d0 100644
--- a/drivers/gpu/drm/radeon/radeon_fence.c
+++ b/drivers/gpu/drm/radeon/radeon_fence.c
@@ -130,15 +130,18 @@ int radeon_fence_emit(struct radeon_device *rdev,
 		      struct radeon_fence **fence,
 		      int ring)
 {
+	u64 seq =3D ++rdev->fence_drv[ring].sync_seq[ring];
+
 	/* we are protected by the ring emission mutex */
 	*fence =3D kmalloc(sizeof(struct radeon_fence), GFP_KERNEL);
 	if ((*fence) =3D=3D NULL) {
 		return -ENOMEM;
 	}
-	kref_init(&((*fence)->kref));
 	(*fence)->rdev =3D rdev;
-	(*fence)->seq =3D ++rdev->fence_drv[ring].sync_seq[ring];
+	(*fence)->seq =3D seq;
 	(*fence)->ring =3D ring;
+	fence_init(&(*fence)->base, &radeon_fence_ops,
+		   &rdev->fence_queue.lock, rdev->fence_context + ring, seq);
 	radeon_fence_ring_emit(rdev, ring, *fence);
 	trace_radeon_fence_emit(rdev->ddev, ring, (*fence)->seq);
 	radeon_fence_schedule_check(rdev, ring);
@@ -146,6 +149,41 @@ int radeon_fence_emit(struct radeon_device *rdev,
 }
 =

 /**
+ * radeon_fence_check_signaled - callback from fence_queue
+ *
+ * this function is called with fence_queue lock held, which is also used
+ * for the fence locking itself, so unlocked variants are used for
+ * fence_signal, and remove_wait_queue.
+ */
+static int radeon_fence_check_signaled(wait_queue_t *wait, unsigned mode, =
int flags, void *key)
+{
+	struct radeon_fence *fence;
+	u64 seq;
+
+	fence =3D container_of(wait, struct radeon_fence, fence_wake);
+
+	/*
+	 * We cannot use radeon_fence_process here because we're already
+	 * in the waitqueue, in a call from wake_up_all.
+	 */
+	seq =3D atomic64_read(&fence->rdev->fence_drv[fence->ring].last_seq);
+	if (seq >=3D fence->seq) {
+		int ret =3D fence_signal_locked(&fence->base);
+
+		if (!ret)
+			FENCE_TRACE(&fence->base, "signaled from irq context\n");
+		else
+			FENCE_TRACE(&fence->base, "was already signaled\n");
+
+		radeon_irq_kms_sw_irq_put(fence->rdev, fence->ring);
+		__remove_wait_queue(&fence->rdev->fence_queue, &fence->fence_wake);
+		fence_put(&fence->base);
+	} else
+		FENCE_TRACE(&fence->base, "pending\n");
+	return 0;
+}
+
+/**
  * radeon_fence_activity - check for fence activity
  *
  * @rdev: radeon_device pointer
@@ -242,6 +280,15 @@ static void radeon_fence_check_lockup(struct work_stru=
ct *work)
 		return;
 	}
 =

+	if (fence_drv->delayed_irq && rdev->ddev->irq_enabled) {
+		unsigned long irqflags;
+
+		fence_drv->delayed_irq =3D false;
+		spin_lock_irqsave(&rdev->irq.lock, irqflags);
+		radeon_irq_set(rdev);
+		spin_unlock_irqrestore(&rdev->irq.lock, irqflags);
+	}
+
 	if (radeon_fence_activity(rdev, ring))
 		wake_up_all(&rdev->fence_queue);
 =

@@ -276,21 +323,6 @@ void radeon_fence_process(struct radeon_device *rdev, =
int ring)
 }
 =

 /**
- * radeon_fence_destroy - destroy a fence
- *
- * @kref: fence kref
- *
- * Frees the fence object (all asics).
- */
-static void radeon_fence_destroy(struct kref *kref)
-{
-	struct radeon_fence *fence;
-
-	fence =3D container_of(kref, struct radeon_fence, kref);
-	kfree(fence);
-}
-
-/**
  * radeon_fence_seq_signaled - check if a fence sequence number has signal=
ed
  *
  * @rdev: radeon device pointer
@@ -318,6 +350,75 @@ static bool radeon_fence_seq_signaled(struct radeon_de=
vice *rdev,
 	return false;
 }
 =

+static bool radeon_fence_is_signaled(struct fence *f)
+{
+	struct radeon_fence *fence =3D to_radeon_fence(f);
+	struct radeon_device *rdev =3D fence->rdev;
+	unsigned ring =3D fence->ring;
+	u64 seq =3D fence->seq;
+
+	if (atomic64_read(&rdev->fence_drv[ring].last_seq) >=3D seq) {
+		return true;
+	}
+
+	if (down_read_trylock(&rdev->exclusive_lock)) {
+		radeon_fence_process(rdev, ring);
+		up_read(&rdev->exclusive_lock);
+
+		if (atomic64_read(&rdev->fence_drv[ring].last_seq) >=3D seq) {
+			return true;
+		}
+	}
+	return false;
+}
+
+/**
+ * radeon_fence_enable_signaling - enable signalling on fence
+ * @fence: fence
+ *
+ * This function is called with fence_queue lock held, and adds a callback
+ * to fence_queue that checks if this fence is signaled, and if so it
+ * signals the fence and removes itself.
+ */
+static bool radeon_fence_enable_signaling(struct fence *f)
+{
+	struct radeon_fence *fence =3D to_radeon_fence(f);
+	struct radeon_device *rdev =3D fence->rdev;
+
+	if (atomic64_read(&rdev->fence_drv[fence->ring].last_seq) >=3D fence->seq)
+		return false;
+
+	if (down_read_trylock(&rdev->exclusive_lock)) {
+		radeon_irq_kms_sw_irq_get(rdev, fence->ring);
+
+		if (radeon_fence_activity(rdev, fence->ring))
+			wake_up_all_locked(&rdev->fence_queue);
+
+		/* did fence get signaled after we enabled the sw irq? */
+		if (atomic64_read(&rdev->fence_drv[fence->ring].last_seq) >=3D fence->se=
q) {
+			radeon_irq_kms_sw_irq_put(rdev, fence->ring);
+			up_read(&rdev->exclusive_lock);
+			return false;
+		}
+
+		up_read(&rdev->exclusive_lock);
+	} else {
+		/* we're probably in a lockup, lets not fiddle too much */
+		if (radeon_irq_kms_sw_irq_get_delayed(rdev, fence->ring))
+			rdev->fence_drv[fence->ring].delayed_irq =3D true;
+		radeon_fence_schedule_check(rdev, fence->ring);
+	}
+
+	fence->fence_wake.flags =3D 0;
+	fence->fence_wake.private =3D NULL;
+	fence->fence_wake.func =3D radeon_fence_check_signaled;
+	__add_wait_queue(&rdev->fence_queue, &fence->fence_wake);
+	fence_get(f);
+
+	FENCE_TRACE(&fence->base, "armed on ring %i!\n", fence->ring);
+	return true;
+}
+
 /**
  * radeon_fence_signaled - check if a fence has signaled
  *
@@ -330,8 +431,15 @@ bool radeon_fence_signaled(struct radeon_fence *fence)
 {
 	if (!fence)
 		return true;
-	if (radeon_fence_seq_signaled(fence->rdev, fence->seq, fence->ring))
+
+	if (radeon_fence_seq_signaled(fence->rdev, fence->seq, fence->ring)) {
+		int ret;
+
+		ret =3D fence_signal(&fence->base);
+		if (!ret)
+			FENCE_TRACE(&fence->base, "signaled from radeon_fence_signaled\n");
 		return true;
+	}
 	return false;
 }
 =

@@ -433,17 +541,15 @@ int radeon_fence_wait(struct radeon_fence *fence, boo=
l intr)
 	uint64_t seq[RADEON_NUM_RINGS] =3D {};
 	long r;
 =

-	if (fence =3D=3D NULL) {
-		WARN(1, "Querying an invalid fence : %p !\n", fence);
-		return -EINVAL;
-	}
-
 	seq[fence->ring] =3D fence->seq;
 	r =3D radeon_fence_wait_seq_timeout(fence->rdev, seq, intr, MAX_SCHEDULE_=
TIMEOUT);
 	if (r < 0) {
 		return r;
 	}
 =

+	r =3D fence_signal(&fence->base);
+	if (!r)
+		FENCE_TRACE(&fence->base, "signaled from fence_wait\n");
 	return 0;
 }
 =

@@ -557,7 +663,7 @@ int radeon_fence_wait_empty(struct radeon_device *rdev,=
 int ring)
  */
 struct radeon_fence *radeon_fence_ref(struct radeon_fence *fence)
 {
-	kref_get(&fence->kref);
+	fence_get(&fence->base);
 	return fence;
 }
 =

@@ -574,7 +680,7 @@ void radeon_fence_unref(struct radeon_fence **fence)
 =

 	*fence =3D NULL;
 	if (tmp) {
-		kref_put(&tmp->kref, radeon_fence_destroy);
+		fence_put(&tmp->base);
 	}
 }
 =

@@ -887,3 +993,72 @@ int radeon_debugfs_fence_init(struct radeon_device *rd=
ev)
 	return 0;
 #endif
 }
+
+static const char *radeon_fence_get_driver_name(struct fence *fence)
+{
+	return "radeon";
+}
+
+static const char *radeon_fence_get_timeline_name(struct fence *f)
+{
+	struct radeon_fence *fence =3D to_radeon_fence(f);
+	switch (fence->ring) {
+	case RADEON_RING_TYPE_GFX_INDEX: return "radeon.gfx";
+	case CAYMAN_RING_TYPE_CP1_INDEX: return "radeon.cp1";
+	case CAYMAN_RING_TYPE_CP2_INDEX: return "radeon.cp2";
+	case R600_RING_TYPE_DMA_INDEX: return "radeon.dma";
+	case CAYMAN_RING_TYPE_DMA1_INDEX: return "radeon.dma1";
+	case R600_RING_TYPE_UVD_INDEX: return "radeon.uvd";
+	case TN_RING_TYPE_VCE1_INDEX: return "radeon.vce1";
+	case TN_RING_TYPE_VCE2_INDEX: return "radeon.vce2";
+	default: WARN_ON_ONCE(1); return "radeon.unk";
+	}
+}
+
+static inline bool radeon_test_signaled(struct radeon_fence *fence)
+{
+	return test_bit(FENCE_FLAG_SIGNALED_BIT, &fence->base.flags);
+}
+
+static signed long radeon_fence_default_wait(struct fence *f, bool intr,
+					     signed long t)
+{
+	struct radeon_fence *fence =3D to_radeon_fence(f);
+	struct radeon_device *rdev =3D fence->rdev;
+	bool signaled;
+
+	fence_enable_sw_signaling(&fence->base);
+
+	/*
+	 * This function has to return -EDEADLK, but cannot hold
+	 * exclusive_lock during the wait because some callers
+	 * may already hold it. This means checking needs_reset without
+	 * lock, and not fiddling with any gpu internals.
+	 *
+	 * The callback installed with fence_enable_sw_signaling will
+	 * run before our wait_event_*timeout call, so we will see
+	 * both the signaled fence and the changes to needs_reset.
+	 */
+
+	if (intr)
+		t =3D wait_event_interruptible_timeout(rdev->fence_queue,
+			((signaled =3D radeon_test_signaled(fence)) ||
+			 rdev->needs_reset), t);
+	else
+		t =3D wait_event_timeout(rdev->fence_queue,
+			((signaled =3D radeon_test_signaled(fence)) ||
+			 rdev->needs_reset), t);
+
+	if (t > 0 && !signaled)
+		return -EDEADLK;
+	return t;
+}
+
+const struct fence_ops radeon_fence_ops =3D {
+	.get_driver_name =3D radeon_fence_get_driver_name,
+	.get_timeline_name =3D radeon_fence_get_timeline_name,
+	.enable_signaling =3D radeon_fence_enable_signaling,
+	.signaled =3D radeon_fence_is_signaled,
+	.wait =3D radeon_fence_default_wait,
+	.release =3D NULL,
+};
diff --git a/drivers/gpu/drm/radeon/radeon_irq_kms.c b/drivers/gpu/drm/rade=
on/radeon_irq_kms.c
index f0bff4be67f1..7784911d78ef 100644
--- a/drivers/gpu/drm/radeon/radeon_irq_kms.c
+++ b/drivers/gpu/drm/radeon/radeon_irq_kms.c
@@ -324,6 +324,21 @@ void radeon_irq_kms_sw_irq_get(struct radeon_device *r=
dev, int ring)
 }
 =

 /**
+ * radeon_irq_kms_sw_irq_get_delayed - enable software interrupt
+ *
+ * @rdev: radeon device pointer
+ * @ring: ring whose interrupt you want to enable
+ *
+ * Enables the software interrupt for a specific ring (all asics).
+ * The software interrupt is generally used to signal a fence on
+ * a particular ring.
+ */
+bool radeon_irq_kms_sw_irq_get_delayed(struct radeon_device *rdev, int rin=
g)
+{
+	return atomic_inc_return(&rdev->irq.ring_int[ring]) =3D=3D 1;
+}
+
+/**
  * radeon_irq_kms_sw_irq_put - disable software interrupt
  *
  * @rdev: radeon device pointer