From mboxrd@z Thu Jan  1 00:00:00 1970
From: =?windows-1252?Q?Christian_K=F6nig?= <deathsimple@vodafone.de>
Subject: Re: [PULL REQUEST] ttm fence conversion
Date: Tue, 02 Sep 2014 15:47:35 +0200
Message-ID: <5405CA77.80100@vodafone.de>
References: <540459E3.9060406@canonical.com> <54046722.1000207@vodafone.de>
 <540475A6.2020603@canonical.com> <54049D19.30802@vodafone.de>
 <5404BE44.8030407@canonical.com> <5405850F.8010901@vodafone.de>
 <54058A07.2090101@canonical.com> <5405936E.2030301@vodafone.de>
 <5405B80D.6090909@canonical.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="windows-1252"; Format="flowed"
Content-Transfer-Encoding: quoted-printable
Return-path: <dri-devel-bounces@lists.freedesktop.org>
Received: from pegasos-out.vodafone.de (pegasos-out.vodafone.de [80.84.1.38])
 by gabe.freedesktop.org (Postfix) with ESMTP id 832B76E02D
 for <dri-devel@lists.freedesktop.org>; Tue,  2 Sep 2014 06:48:19 -0700 (PDT)
In-Reply-To: <5405B80D.6090909@canonical.com>
List-Unsubscribe: <http://lists.freedesktop.org/mailman/options/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <http://lists.freedesktop.org/archives/dri-devel>
List-Post: <mailto:dri-devel@lists.freedesktop.org>
List-Help: <mailto:dri-devel-request@lists.freedesktop.org?subject=help>
List-Subscribe: <http://lists.freedesktop.org/mailman/listinfo/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=subscribe>
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel" <dri-devel-bounces@lists.freedesktop.org>
To: Maarten Lankhorst <maarten.lankhorst@canonical.com>, Dave Airlie <airlied@redhat.com>
Cc: "dri-devel@lists.freedesktop.org" <dri-devel@lists.freedesktop.org>
List-Id: dri-devel@lists.freedesktop.org

> How does this patch look?
Looks better now, yes. This patch is Reviewed-by: Christian König <christian.koenig@amd.com>

The next one we need to take a look at is "drm/radeon: use rcu waits in some ioctls":
> @@ -110,9 +110,12 @@ static int radeon_gem_set_domain(struct drm_gem_object *gobj,
>         }
>         if (domain == RADEON_GEM_DOMAIN_CPU) {
>                 /* Asking for cpu access wait for object idle */
> -               r = radeon_bo_wait(robj, NULL, false);
> -               if (r) {
> -                       printk(KERN_ERR "Failed to wait for object !\n");
> +               r = reservation_object_wait_timeout_rcu(robj->tbo.resv, true, true, 30 * HZ);

Here r is still an int, but reservation_object_wait_timeout_rcu() returns a long, so this assignment might overflow.
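
To illustrate the type concern, here is a minimal userspace sketch, not the actual patch. wait_timeout_stub() and wait_result_to_errno() are hypothetical names made up for this example, and mapping a timeout to -EBUSY is only an assumption about how the caller might want to report it. The point is simply to keep the result in a long and narrow it to an int after it has been classified:

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>

/*
 * Hypothetical stand-in for a wait helper that returns a long:
 * a negative errno on error, 0 on timeout, remaining time otherwise.
 */
static long wait_timeout_stub(long result)
{
	return result;
}

/* Returns nonzero if the value survives a conversion to int unchanged. */
static int fits_in_int(long v)
{
	return v >= INT_MIN && v <= INT_MAX;
}

/*
 * Classify the long result first, then hand back an int error code.
 * Assumption for this sketch: a timeout is reported as -EBUSY.
 */
static int wait_result_to_errno(long r)
{
	if (r < 0) {
		/* errno values are small, so narrowing is safe here */
		assert(fits_in_int(r));
		return (int)r;
	}
	if (r == 0)
		return -EBUSY;	/* the wait timed out */
	return 0;		/* finished with time to spare */
}
```

With this shape the remaining time is never stored in an int, so a large timeout such as LONG_MAX cannot be truncated into a bogus error code.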

Apart from that the patch has my rb as well.

Regards,
Christian.

On 02.09.2014 at 14:29, Maarten Lankhorst wrote:
> On 02-09-14 at 11:52, Christian König wrote:
>> On 02.09.2014 at 11:12, Maarten Lankhorst wrote:
>>> On 02-09-14 at 10:51, Christian König wrote:
>>>> On 01.09.2014 at 20:43, Maarten Lankhorst wrote:
>>>>> Hey,
>>>>>
>>>>> On 01-09-14 18:21, Christian König wrote:
>>>>>> On 01.09.2014 at 15:33, Maarten Lankhorst wrote:
>>>>>>> Hey,
>>>>>>>
>>>>>>> On 01-09-14 at 14:31, Christian König wrote:
>>>>>>>> Please wait a second with that.
>>>>>>>>
>>>>>>>> I haven't had a chance to test this yet and nobody has yet given their rb on at least the radeon changes in this branch.
>>>>>>> Ok, my fault. I thought it was implicitly acked. I haven't made any functional changes to these patches,
>>>>>>> just some small fixups and a fix to make it apply after the upstream removal of RADEON_FENCE_SIGNALED_SEQ.
>>>>>> Yeah, but the resulting patch looks too complex for my taste and should be simplified a bit more. Here is a more detailed review:
>>>>>>
>>>>>>> +    wait_queue_t fence_wake;
>>>>>> Only a nitpick, but please fix the indentation and maybe add a comment.
>>>>>>
>>>>>>> +    struct work_struct delayed_irq_work;
>>>>>> Just drop that, the new fall back work item should take care of this when the unfortunate case happens that somebody tries to enable_signaling in the middle of a GPU reset.
>>>>> I can only drop it if radeon_gpu_reset will always call radeon_irq_set after downgrading to read mode, even if no work needs to be done. :-)
>>>>>
>>>>> Then again, should be possible.
>>>> The fall back handler should take care of the rare condition that we can't activate the IRQ because the driver is in a lockup handler.
>>>>
>>>> The issue is that the delayed_irq_work handler needs to take the exclusive lock once more and so would block an innocent process for the duration of the GPU lockup.
>>>>
>>>> Either reschedule as delayed work item if we can't take the lock immediately or just live with the delay of the fall back handler. Since IRQs usually don't work correctly immediately after a GPU reset I'm pretty sure that the fallback handler will be needed anyway.
>>> Ok, rescheduling would be fine. Or could I go with the alternative, remove the delayed_irq_work and always set irqs after downgrading the write lock?
>> Always setting the IRQs after downgrading the write lock would work for me as well.
>>
>>
>>>>>>>     /*
>>>>>>> - * Cast helper
>>>>>>> - */
>>>>>>> -#define to_radeon_fence(p) ((struct radeon_fence *)(p))
>>>>>>> -
>>>>>>> -/*
>>>>>> Please define the new cast helper in radeon.h as well.
>>>>> The ops are only defined in radeon_fence.c, and nothing outside of radeon_fence.c should care about the internals.
>>>> Then define this as a function instead, I need a checked cast from a fence to a radeon_fence outside of the fence code as well.
>>> Ok.
>>>
>>>>>>>         if (!rdev->needs_reset) {
>>>>>>> -        up_write(&rdev->exclusive_lock);
>>>>>>> +        downgrade_write(&rdev->exclusive_lock);
>>>>>>> +        wake_up_all(&rdev->fence_queue);
>>>>>>> +        up_read(&rdev->exclusive_lock);
>>>>>>>             return 0;
>>>>>>>         }
>>>>>> Just drop that as well, no need to wake up anybody here.
>>>>> Maybe not, but if I have to remove delayed_irq_work I do need to add a radeon_irq_set here.
>>>>>
>>>>>>>     downgrade_write(&rdev->exclusive_lock);
>>>>>>> +    wake_up_all(&rdev->fence_queue);
>>>>>> Same here, the IB test will wake up all fences for recheck anyway.
>>>>> Same as previous comment. :-)
>>>>>
>>>>>>> + * radeon_fence_read_seq - Returns the current fence value without updating
>>>>>>> + *
>>>>>>> + * @rdev: radeon_device pointer
>>>>>>> + * @ring: ring index to return the seqno of
>>>>>>> + */
>>>>>>> +static uint64_t radeon_fence_read_seq(struct radeon_device *rdev, int ring)
>>>>>>> +{
>>>>>>> +    uint64_t last_seq = atomic64_read(&rdev->fence_drv[ring].last_seq);
>>>>>>> +    uint64_t last_emitted = rdev->fence_drv[ring].sync_seq[ring];
>>>>>>> +    uint64_t seq = radeon_fence_read(rdev, ring);
>>>>>>> +
>>>>>>> +    seq = radeon_fence_read(rdev, ring);
>>>>>>> +    seq |= last_seq & 0xffffffff00000000LL;
>>>>>>> +    if (seq < last_seq) {
>>>>>>> +        seq &= 0xffffffff;
>>>>>>> +        seq |= last_emitted & 0xffffffff00000000LL;
>>>>>>> +    }
>>>>>>> +    return seq;
>>>>>>> +}
>>>>>> Completely drop that and just check the last_seq signaled as set by radeon_fence_activity.
>>>>> Do you mean call radeon_fence_activity in radeon_fence_signaled? Or should I just use the cached value in radeon_fence_check_signaled.
>>>> Just check the cached value, it should be updated by radeon_fence_activity immediately before calling this anyway.
>>> Ok. I think I wrote this as a workaround for unreliable interrupts. :-)
>>>
>>>>> I can't call fence_activity in check_signaled, because it would cause re-entrancy in fence_queue.
>>>>>
>>>>>>> +        if (!ret)
>>>>>>> +            FENCE_TRACE(&fence->base, "signaled from irq context\n");
>>>>>>> +        else
>>>>>>> +            FENCE_TRACE(&fence->base, "was already signaled\n");
>>>>>> Is all that text tracing necessary? Probably better define a trace point here.
>>>>> It gets optimized out normally. There's already a tracepoint called in fence_signal.
>>>>>
>>>>>>> +    if (atomic64_read(&rdev->fence_drv[fence->ring].last_seq) >= fence->seq ||
>>>>>>> +        !rdev->ddev->irq_enabled)
>>>>>>> +        return false;
>>>>>> Checking irq_enabled here might not be such a good idea if the fence code doesn't have a fall back of its own. What exactly happens if enable_signaling returns false?
>>>>> I thought irq_enabled couldn't happen under normal circumstances?
>>>> Not 100% sure, but I think it is temporarily turned off during reset.
>>>>
>>>>> Anyway the fence gets treated as signaled if it returns false, and fence_signal will get called.
>>>> Thought so, well that's rather bad if we failed to install the IRQ handler that we just treat all fences as signaled isn't it?
>>> I wrote this code before the delayed work was added, I guess the check for !irq_enabled can be removed now. :-)
>>>
>>>>>>> +static signed long radeon_fence_default_wait(struct fence *f, bool intr,
>>>>>>> +                         signed long timeout)
>>>>>>> +{
>>>>>>> +    struct radeon_fence *fence = to_radeon_fence(f);
>>>>>>> +    struct radeon_device *rdev = fence->rdev;
>>>>>>> +    bool signaled;
>>>>>>> +
>>>>>>> +    fence_enable_sw_signaling(&fence->base);
>>>>>>> +
>>>>>>> +    /*
>>>>>>> +     * This function has to return -EDEADLK, but cannot hold
>>>>>>> +     * exclusive_lock during the wait because some callers
>>>>>>> +     * may already hold it. This means checking needs_reset without
>>>>>>> +     * lock, and not fiddling with any gpu internals.
>>>>>>> +     *
>>>>>>> +     * The callback installed with fence_enable_sw_signaling will
>>>>>>> +     * run before our wait_event_*timeout call, so we will see
>>>>>>> +     * both the signaled fence and the changes to needs_reset.
>>>>>>> +     */
>>>>>>> +
>>>>>>> +    if (intr)
>>>>>>> +        timeout = wait_event_interruptible_timeout(rdev->fence_queue,
>>>>>>> +                               ((signaled = (test_bit(FENCE_FLAG_SIGNALED_BIT, &fence->base.flags))) || rdev->needs_reset),
>>>>>>> +                               timeout);
>>>>>>> +    else
>>>>>>> +        timeout = wait_event_timeout(rdev->fence_queue,
>>>>>>> +                         ((signaled = (test_bit(FENCE_FLAG_SIGNALED_BIT, &fence->base.flags))) || rdev->needs_reset),
>>>>>>> +                         timeout);
>>>>>>> +
>>>>>>> +    if (timeout > 0 && !signaled)
>>>>>>> +        return -EDEADLK;
>>>>>>> +    return timeout;
>>>>>>> +}
>>>>>> This at least needs to be properly formatted, but I think since we now don't need extra handling any more we don't need an extra wait function either.
>>>>> I thought of removing the extra handling, but the -EDEADLK stuff is needed because a deadlock could happen in ttm_bo_lock_delayed_workqueue otherwise; if the gpu's really hung there would never be any progress forward.
>>>> Hui what? ttm_bo_lock_delayed_workqueue shouldn't call any blocking wait.
>>> Oops, you're right. ttm_bo_delayed_delete is called with remove_all false, not true.
>>>
>>> Unfortunately ttm_bo_vm_fault does hold the exclusive_lock in read mode, and other places that use eviction will use it too.
>>> Without returning -EDEADLK this could mean that ttm_bo_move_accel_cleanup would block forever,
>>> so this function has to stay.
>> Ok, fine with me. I'm not deep enough into the TTM code to really judge this, but my understanding was that TTM still calls its own wait callback.
> That one is about to go away. :-)
>
> How does this patch look?
> ---- 8< ----
> commit e75af5ee3b94157e868cb48b4bfbc1ca36183ba4
> Author: Maarten Lankhorst <maarten.lankhorst@canonical.com>
> Date:   Thu Jan 9 11:03:12 2014 +0100
>
>      drm/radeon: use common fence implementation for fences, v4
>
>      Changes since v1:
>      - Kill the sw interrupt dance, add and use
>        radeon_irq_kms_sw_irq_get_delayed instead.
>      - Change custom wait function, lockdep complained about it.
>        Holding exclusive_lock in the wait function might cause deadlocks.
>        Instead do all the processing in .enable_signaling, and wait
>        on the global fence_queue to pick up gpu resets.
>      - Process all fences in radeon_gpu_reset after reset to close a race
>        with the trylock in enable_signaling.
>      Changes since v2:
>      - Small changes to work with the rewritten lockup recovery patches.
>      Changes since v3:
>      - Call radeon_fence_schedule_check when exclusive_lock cannot be
>        acquired to always cause a wake up.
>      - Reset irqs from hangup check.
>      - Drop reading seqno in the callback, use cached value.
>      - Fix indentation in radeon_fence_default_wait
>      - Add a radeon_test_signaled function, drop a few test_bit calls.
>      - Make to_radeon_fence global.
>
>      Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
>
> diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
> index 83a24614138a..d80dc547a105 100644
> --- a/drivers/gpu/drm/radeon/radeon.h
> +++ b/drivers/gpu/drm/radeon/radeon.h
> @@ -66,6 +66,7 @@
>   #include <linux/kref.h>
>   #include <linux/interval_tree.h>
>   #include <linux/hashtable.h>
> +#include <linux/fence.h>
>
>   #include <ttm/ttm_bo_api.h>
>   #include <ttm/ttm_bo_driver.h>
> @@ -354,17 +355,19 @@ struct radeon_fence_driver {
>   	/* sync_seq is protected by ring emission lock */
>   	uint64_t			sync_seq[RADEON_NUM_RINGS];
>   	atomic64_t			last_seq;
> -	bool				initialized;
> +	bool				initialized, delayed_irq;
>   	struct delayed_work		lockup_work;
>   };
>
>   struct radeon_fence {
> +	struct fence base;
> +
>   	struct radeon_device		*rdev;
> -	struct kref			kref;
> -	/* protected by radeon_fence.lock */
>   	uint64_t			seq;
>   	/* RB, DMA, etc. */
>   	unsigned			ring;
> +
> +	wait_queue_t			fence_wake;
>   };
>
>   int radeon_fence_driver_start_ring(struct radeon_device *rdev, int ring);
> @@ -782,6 +785,7 @@ struct radeon_irq {
>   int radeon_irq_kms_init(struct radeon_device *rdev);
>   void radeon_irq_kms_fini(struct radeon_device *rdev);
>   void radeon_irq_kms_sw_irq_get(struct radeon_device *rdev, int ring);
> +bool radeon_irq_kms_sw_irq_get_delayed(struct radeon_device *rdev, int ring);
>   void radeon_irq_kms_sw_irq_put(struct radeon_device *rdev, int ring);
>   void radeon_irq_kms_pflip_irq_get(struct radeon_device *rdev, int crtc);
>   void radeon_irq_kms_pflip_irq_put(struct radeon_device *rdev, int crtc);
> @@ -2308,6 +2312,7 @@ struct radeon_device {
>   	struct radeon_mman		mman;
>   	struct radeon_fence_driver	fence_drv[RADEON_NUM_RINGS];
>   	wait_queue_head_t		fence_queue;
> +	unsigned			fence_context;
>   	struct mutex			ring_lock;
>   	struct radeon_ring		ring[RADEON_NUM_RINGS];
>   	bool				ib_pool_ready;
> @@ -2441,7 +2446,17 @@ void cik_mm_wdoorbell(struct radeon_device *rdev, u32 index, u32 v);
>   /*
>    * Cast helper
>    */
> -#define to_radeon_fence(p) ((struct radeon_fence *)(p))
> +extern const struct fence_ops radeon_fence_ops;
> +
> +static inline struct radeon_fence *to_radeon_fence(struct fence *f)
> +{
> +	struct radeon_fence *__f = container_of(f, struct radeon_fence, base);
> +
> +	if (__f->base.ops == &radeon_fence_ops)
> +		return __f;
> +
> +	return NULL;
> +}
>
>   /*
>    * Registers read & write functions.
> diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
> index d30f1cc1aa12..e84a76e6656a 100644
> --- a/drivers/gpu/drm/radeon/radeon_device.c
> +++ b/drivers/gpu/drm/radeon/radeon_device.c
> @@ -1253,6 +1253,7 @@ int radeon_device_init(struct radeon_device *rdev,
>   	for (i = 0; i < RADEON_NUM_RINGS; i++) {
>   		rdev->ring[i].idx = i;
>   	}
> +	rdev->fence_context = fence_context_alloc(RADEON_NUM_RINGS);
>
>   	DRM_INFO("initializing kernel modesetting (%s 0x%04X:0x%04X 0x%04X:0x%04X).\n",
>   		radeon_family_name[rdev->family], pdev->vendor, pdev->device,
> diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c
> index ecdba3afa2c3..af9f2d6bd7d0 100644
> --- a/drivers/gpu/drm/radeon/radeon_fence.c
> +++ b/drivers/gpu/drm/radeon/radeon_fence.c
> @@ -130,15 +130,18 @@ int radeon_fence_emit(struct radeon_device *rdev,
>   		      struct radeon_fence **fence,
>   		      int ring)
>   {
> +	u64 seq = ++rdev->fence_drv[ring].sync_seq[ring];
> +
>   	/* we are protected by the ring emission mutex */
>   	*fence = kmalloc(sizeof(struct radeon_fence), GFP_KERNEL);
>   	if ((*fence) == NULL) {
>   		return -ENOMEM;
>   	}
> -	kref_init(&((*fence)->kref));
>   	(*fence)->rdev = rdev;
> -	(*fence)->seq = ++rdev->fence_drv[ring].sync_seq[ring];
> +	(*fence)->seq = seq;
>   	(*fence)->ring = ring;
> +	fence_init(&(*fence)->base, &radeon_fence_ops,
> +		   &rdev->fence_queue.lock, rdev->fence_context + ring, seq);
>   	radeon_fence_ring_emit(rdev, ring, *fence);
>   	trace_radeon_fence_emit(rdev->ddev, ring, (*fence)->seq);
>   	radeon_fence_schedule_check(rdev, ring);
> @@ -146,6 +149,41 @@ int radeon_fence_emit(struct radeon_device *rdev,
>   }
>
>   /**
> + * radeon_fence_check_signaled - callback from fence_queue
> + *
> + * this function is called with fence_queue lock held, which is also used
> + * for the fence locking itself, so unlocked variants are used for
> + * fence_signal, and remove_wait_queue.
> + */
> +static int radeon_fence_check_signaled(wait_queue_t *wait, unsigned mode, int flags, void *key)
> +{
> +	struct radeon_fence *fence;
> +	u64 seq;
> +
> +	fence = container_of(wait, struct radeon_fence, fence_wake);
> +
> +	/*
> +	 * We cannot use radeon_fence_process here because we're already
> +	 * in the waitqueue, in a call from wake_up_all.
> +	 */
> +	seq = atomic64_read(&fence->rdev->fence_drv[fence->ring].last_seq);
> +	if (seq >= fence->seq) {
> +		int ret = fence_signal_locked(&fence->base);
> +
> +		if (!ret)
> +			FENCE_TRACE(&fence->base, "signaled from irq context\n");
> +		else
> +			FENCE_TRACE(&fence->base, "was already signaled\n");
> +
> +		radeon_irq_kms_sw_irq_put(fence->rdev, fence->ring);
> +		__remove_wait_queue(&fence->rdev->fence_queue, &fence->fence_wake);
> +		fence_put(&fence->base);
> +	} else
> +		FENCE_TRACE(&fence->base, "pending\n");
> +	return 0;
> +}
> +
> +/**
>    * radeon_fence_activity - check for fence activity
>    *
>    * @rdev: radeon_device pointer
> @@ -242,6 +280,15 @@ static void radeon_fence_check_lockup(struct work_struct *work)
>   		return;
>   	}
>
> +	if (fence_drv->delayed_irq && rdev->ddev->irq_enabled) {
> +		unsigned long irqflags;
> +
> +		fence_drv->delayed_irq = false;
> +		spin_lock_irqsave(&rdev->irq.lock, irqflags);
> +		radeon_irq_set(rdev);
> +		spin_unlock_irqrestore(&rdev->irq.lock, irqflags);
> +	}
> +
>   	if (radeon_fence_activity(rdev, ring))
>   		wake_up_all(&rdev->fence_queue);
>
> @@ -276,21 +323,6 @@ void radeon_fence_process(struct radeon_device *rdev, int ring)
>   }
>
>   /**
> - * radeon_fence_destroy - destroy a fence
> - *
> - * @kref: fence kref
> - *
> - * Frees the fence object (all asics).
> - */
> -static void radeon_fence_destroy(struct kref *kref)
> -{
> -	struct radeon_fence *fence;
> -
> -	fence = container_of(kref, struct radeon_fence, kref);
> -	kfree(fence);
> -}
> -
> -/**
>    * radeon_fence_seq_signaled - check if a fence sequence number has signaled
>    *
>    * @rdev: radeon device pointer
> @@ -318,6 +350,75 @@ static bool radeon_fence_seq_signaled(struct radeon_device *rdev,
>   	return false;
>   }
>
> +static bool radeon_fence_is_signaled(struct fence *f)
> +{
> +	struct radeon_fence *fence = to_radeon_fence(f);
> +	struct radeon_device *rdev = fence->rdev;
> +	unsigned ring = fence->ring;
> +	u64 seq = fence->seq;
> +
> +	if (atomic64_read(&rdev->fence_drv[ring].last_seq) >= seq) {
> +		return true;
> +	}
> +
> +	if (down_read_trylock(&rdev->exclusive_lock)) {
> +		radeon_fence_process(rdev, ring);
> +		up_read(&rdev->exclusive_lock);
> +
> +		if (atomic64_read(&rdev->fence_drv[ring].last_seq) >= seq) {
> +			return true;
> +		}
> +	}
> +	return false;
> +}
> +
> +/**
> + * radeon_fence_enable_signaling - enable signalling on fence
> + * @fence: fence
> + *
> + * This function is called with fence_queue lock held, and adds a callback
> + * to fence_queue that checks if this fence is signaled, and if so it
> + * signals the fence and removes itself.
> + */
> +static bool radeon_fence_enable_signaling(struct fence *f)
> +{
> +	struct radeon_fence *fence = to_radeon_fence(f);
> +	struct radeon_device *rdev = fence->rdev;
> +
> +	if (atomic64_read(&rdev->fence_drv[fence->ring].last_seq) >= fence->seq)
> +		return false;
> +
> +	if (down_read_trylock(&rdev->exclusive_lock)) {
> +		radeon_irq_kms_sw_irq_get(rdev, fence->ring);
> +
> +		if (radeon_fence_activity(rdev, fence->ring))
> +			wake_up_all_locked(&rdev->fence_queue);
> +
> +		/* did fence get signaled after we enabled the sw irq? */
> +		if (atomic64_read(&rdev->fence_drv[fence->ring].last_seq) >= fence->seq) {
> +			radeon_irq_kms_sw_irq_put(rdev, fence->ring);
> +			up_read(&rdev->exclusive_lock);
> +			return false;
> +		}
> +
> +		up_read(&rdev->exclusive_lock);
> +	} else {
> +		/* we're probably in a lockup, lets not fiddle too much */
> +		if (radeon_irq_kms_sw_irq_get_delayed(rdev, fence->ring))
> +			rdev->fence_drv[fence->ring].delayed_irq = true;
> +		radeon_fence_schedule_check(rdev, fence->ring);
> +	}
> +
> +	fence->fence_wake.flags = 0;
> +	fence->fence_wake.private = NULL;
> +	fence->fence_wake.func = radeon_fence_check_signaled;
> +	__add_wait_queue(&rdev->fence_queue, &fence->fence_wake);
> +	fence_get(f);
> +
> +	FENCE_TRACE(&fence->base, "armed on ring %i!\n", fence->ring);
> +	return true;
> +}
> +
>   /**
>    * radeon_fence_signaled - check if a fence has signaled
>    *
> @@ -330,8 +431,15 @@ bool radeon_fence_signaled(struct radeon_fence *fence)
>   {
>   	if (!fence)
>   		return true;
> -	if (radeon_fence_seq_signaled(fence->rdev, fence->seq, fence->ring))
> +
> +	if (radeon_fence_seq_signaled(fence->rdev, fence->seq, fence->ring)) {
> +		int ret;
> +
> +		ret = fence_signal(&fence->base);
> +		if (!ret)
> +			FENCE_TRACE(&fence->base, "signaled from radeon_fence_signaled\n");
>   		return true;
> +	}
>   	return false;
>   }
>
> @@ -433,17 +541,15 @@ int radeon_fence_wait(struct radeon_fence *fence, bool intr)
>   	uint64_t seq[RADEON_NUM_RINGS] = {};
>   	long r;
>
> -	if (fence == NULL) {
> -		WARN(1, "Querying an invalid fence : %p !\n", fence);
> -		return -EINVAL;
> -	}
> -
>   	seq[fence->ring] = fence->seq;
>   	r = radeon_fence_wait_seq_timeout(fence->rdev, seq, intr, MAX_SCHEDULE_TIMEOUT);
>   	if (r < 0) {
>   		return r;
>   	}
>
> +	r = fence_signal(&fence->base);
> +	if (!r)
> +		FENCE_TRACE(&fence->base, "signaled from fence_wait\n");
>   	return 0;
>   }
>
> @@ -557,7 +663,7 @@ int radeon_fence_wait_empty(struct radeon_device *rdev, int ring)
>    */
>   struct radeon_fence *radeon_fence_ref(struct radeon_fence *fence)
>   {
> -	kref_get(&fence->kref);
> +	fence_get(&fence->base);
>   	return fence;
>   }
>
> @@ -574,7 +680,7 @@ void radeon_fence_unref(struct radeon_fence **fence)
>
>   	*fence = NULL;
>   	if (tmp) {
> -		kref_put(&tmp->kref, radeon_fence_destroy);
> +		fence_put(&tmp->base);
>   	}
>   }
>
> @@ -887,3 +993,72 @@ int radeon_debugfs_fence_init(struct radeon_device *rdev)
>   	return 0;
>   #endif
>   }
> +
> +static const char *radeon_fence_get_driver_name(struct fence *fence)
> +{
> +	return "radeon";
> +}
> +
> +static const char *radeon_fence_get_timeline_name(struct fence *f)
> +{
> +	struct radeon_fence *fence = to_radeon_fence(f);
> +	switch (fence->ring) {
> +	case RADEON_RING_TYPE_GFX_INDEX: return "radeon.gfx";
> +	case CAYMAN_RING_TYPE_CP1_INDEX: return "radeon.cp1";
> +	case CAYMAN_RING_TYPE_CP2_INDEX: return "radeon.cp2";
> +	case R600_RING_TYPE_DMA_INDEX: return "radeon.dma";
> +	case CAYMAN_RING_TYPE_DMA1_INDEX: return "radeon.dma1";
> +	case R600_RING_TYPE_UVD_INDEX: return "radeon.uvd";
> +	case TN_RING_TYPE_VCE1_INDEX: return "radeon.vce1";
> +	case TN_RING_TYPE_VCE2_INDEX: return "radeon.vce2";
> +	default: WARN_ON_ONCE(1); return "radeon.unk";
> +	}
> +}
> +
> +static inline bool radeon_test_signaled(struct radeon_fence *fence)
> +{
> +	return test_bit(FENCE_FLAG_SIGNALED_BIT, &fence->base.flags);
> +}
> +
> +static signed long radeon_fence_default_wait(struct fence *f, bool intr,
> +					     signed long t)
> +{
> +	struct radeon_fence *fence = to_radeon_fence(f);
> +	struct radeon_device *rdev = fence->rdev;
> +	bool signaled;
> +
> +	fence_enable_sw_signaling(&fence->base);
> +
> +	/*
> +	 * This function has to return -EDEADLK, but cannot hold
> +	 * exclusive_lock during the wait because some callers
> +	 * may already hold it. This means checking needs_reset without
> +	 * lock, and not fiddling with any gpu internals.
> +	 *
> +	 * The callback installed with fence_enable_sw_signaling will
> +	 * run before our wait_event_*timeout call, so we will see
> +	 * both the signaled fence and the changes to needs_reset.
> +	 */
> +
> +	if (intr)
> +		t = wait_event_interruptible_timeout(rdev->fence_queue,
> +			((signaled = radeon_test_signaled(fence)) ||
> +			 rdev->needs_reset), t);
> +	else
> +		t = wait_event_timeout(rdev->fence_queue,
> +			((signaled = radeon_test_signaled(fence)) ||
> +			 rdev->needs_reset), t);
> +
> +	if (t > 0 && !signaled)
> +		return -EDEADLK;
> +	return t;
> +}
> +
> +const struct fence_ops radeon_fence_ops = {
> +	.get_driver_name = radeon_fence_get_driver_name,
> +	.get_timeline_name = radeon_fence_get_timeline_name,
> +	.enable_signaling = radeon_fence_enable_signaling,
> +	.signaled = radeon_fence_is_signaled,
> +	.wait = radeon_fence_default_wait,
> +	.release = NULL,
> +};
> diff --git a/drivers/gpu/drm/radeon/radeon_irq_kms.c b/drivers/gpu/drm/radeon/radeon_irq_kms.c
> index f0bff4be67f1..7784911d78ef 100644
> --- a/drivers/gpu/drm/radeon/radeon_irq_kms.c
> +++ b/drivers/gpu/drm/radeon/radeon_irq_kms.c
> @@ -324,6 +324,21 @@ void radeon_irq_kms_sw_irq_get(struct radeon_device *rdev, int ring)
>   }
>
>   /**
> + * radeon_irq_kms_sw_irq_get_delayed - enable software interrupt
> + *
> + * @rdev: radeon device pointer
> + * @ring: ring whose interrupt you want to enable
> + *
> + * Enables the software interrupt for a specific ring (all asics).
> + * The software interrupt is generally used to signal a fence on
> + * a particular ring.
> + */
> +bool radeon_irq_kms_sw_irq_get_delayed(struct radeon_device *rdev, int ring)
> +{
> +	return atomic_inc_return(&rdev->irq.ring_int[ring]) == 1;
> +}
> +
> +/**
>    * radeon_irq_kms_sw_irq_put - disable software interrupt
>    *
>    * @rdev: radeon device pointer
>