From mboxrd@z Thu Jan 1 00:00:00 1970 From: Maarten Lankhorst Subject: Re: [PULL REQUEST] ttm fence conversion Date: Tue, 02 Sep 2014 11:12:39 +0200 Message-ID: <54058A07.2090101@canonical.com> References: <540459E3.9060406@canonical.com> <54046722.1000207@vodafone.de> <540475A6.2020603@canonical.com> <54049D19.30802@vodafone.de> <5404BE44.8030407@canonical.com> <5405850F.8010901@vodafone.de> Mime-Version: 1.0 Content-Type: text/plain; charset="windows-1252" Content-Transfer-Encoding: quoted-printable Return-path: Received: from youngberry.canonical.com (youngberry.canonical.com [91.189.89.112]) by gabe.freedesktop.org (Postfix) with ESMTP id 89D6D6E422 for ; Tue, 2 Sep 2014 02:12:41 -0700 (PDT) In-Reply-To: <5405850F.8010901@vodafone.de> List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: =?windows-1252?Q?Christian_K=F6nig?= , Dave Airlie Cc: "dri-devel@lists.freedesktop.org" List-Id: dri-devel@lists.freedesktop.org Op 02-09-14 om 10:51 schreef Christian K=F6nig: > Am 01.09.2014 um 20:43 schrieb Maarten Lankhorst: >> Hey, >> >> On 01-09-14 18:21, Christian K=F6nig wrote: >>> Am 01.09.2014 um 15:33 schrieb Maarten Lankhorst: >>>> Hey, >>>> >>>> Op 01-09-14 om 14:31 schreef Christian K=F6nig: >>>>> Please wait a second with that. >>>>> >>>>> I didn't had a chance to test this yet and nobody has yet given it's = rb on at least the radeon changes in this branch. >>>> Ok, my fault. I thought it was implicitly acked. I haven't made any fu= nctional changes to these patches, >>>> just some small fixups and a fix to make it apply after the upstream r= emoval of RADEON_FENCE_SIGNALED_SEQ. >>> Yeah, but the resulting patch looks to complex for my taste and should = be simplified a bit more. Here is a more detailed review: >>> >>>> + wait_queue_t fence_wake; >>> Only a nitpick, but please fix the indention and maybe add a comment. >>> >>>> + struct work_struct delayed_irq_work; >>> Just drop that, the new fall back work item should take care of this wh= en the unfortunate case happens that somebody tries to enable_signaling in = the middle of a GPU reset. >> I can only drop it if radeon_gpu_reset will always call radeon_irq_set a= fter downgrading to read mode, even if no work needs to be done. :-) >> >> Then again, should be possible. > > The fall back handler should take care of the rare condition that we can'= t activate the IRQ because the driver is in a lockup handler. > > The issue is that the delayed_irq_work handler needs to take the exclusiv= e lock once more and so would block an innocent process for the duration of= the GPU lockup. > > Either reschedule as delayed work item if we can't take the lock immediat= ely or just live with the delay of the fall back handler. Since IRQs usuall= y don't work correctly immediately after an GPU reset I'm pretty sure that = the fallback handler will be needed anyway. Ok, rescheduling would be fine. Or could I go with the alternative, remove = the delayed_irq_work and always set irqs after downgrading the write lock? >>>> /* >>>> - * Cast helper >>>> - */ >>>> -#define to_radeon_fence(p) ((struct radeon_fence *)(p)) >>>> - >>>> -/* >>> Please define the new cast helper in radeon.h as well. >> The ops are only defined in radeon_fence.c, and nothing outside of radeo= n_fence.c should care about the internals. > > Then define this as a function instead, I need a checked cast from a fenc= e to a radeon_fence outside of the fence code as well. Ok. >> >>>> if (!rdev->needs_reset) { >>>> - up_write(&rdev->exclusive_lock); >>>> + downgrade_write(&rdev->exclusive_lock); >>>> + wake_up_all(&rdev->fence_queue); >>>> + up_read(&rdev->exclusive_lock); >>>> return 0; >>>> } >>> Just drop that as well, no need to wake up anybody here. >> Maybe not, but if I have to remove delayed_irq_work I do need to add a r= adeon_irq_set here. >> >>>> downgrade_write(&rdev->exclusive_lock); >>>> + wake_up_all(&rdev->fence_queue); >>> Same here, the IB test will wake up all fences for recheck anyway. >> Same as previous comment. :-) >> >>>> + * radeon_fence_read_seq - Returns the current fence value without up= dating >>>> + * >>>> + * @rdev: radeon_device pointer >>>> + * @ring: ring index to return the seqno of >>>> + */ >>>> +static uint64_t radeon_fence_read_seq(struct radeon_device *rdev, int= ring) >>>> +{ >>>> + uint64_t last_seq =3D atomic64_read(&rdev->fence_drv[ring].last_s= eq); >>>> + uint64_t last_emitted =3D rdev->fence_drv[ring].sync_seq[ring]; >>>> + uint64_t seq =3D radeon_fence_read(rdev, ring); >>>> + >>>> + seq =3D radeon_fence_read(rdev, ring); >>>> + seq |=3D last_seq & 0xffffffff00000000LL; >>>> + if (seq < last_seq) { >>>> + seq &=3D 0xffffffff; >>>> + seq |=3D last_emitted & 0xffffffff00000000LL; >>>> + } >>>> + return seq; >>>> +} >>> Completely drop that and just check the last_seq signaled as set by rad= eon_fence_activity. >> Do you mean call radeon_fence_activity in radeon_fence_signaled? Or shou= ld I just use the cached value in radeon_fence_check_signaled. > > Just check the cached value, it should be updated by radeon_fence_activit= y immediately before calling this anyway. Ok. I think I wrote this as a workaround for unreliable interrupts. :-) >> >> I can't call fence_activity in check_signaled, because it would cause re= -entrancy in fence_queue. >> >>>> + if (!ret) >>>> + FENCE_TRACE(&fence->base, "signaled from irq context\n"); >>>> + else >>>> + FENCE_TRACE(&fence->base, "was already signaled\n"); >>> Is all that text tracing necessary? Probably better define a trace poin= t here. >> It gets optimized out normally. There's already a tracepoint called in f= ence_signal. >> = >>>> + if (atomic64_read(&rdev->fence_drv[fence->ring].last_seq) >=3D fe= nce->seq || >>>> + !rdev->ddev->irq_enabled) >>>> + return false; >>> Checking irq_enabled here might not be such a good idea if the fence co= de don't has a fall back on it's own. What exactly happens if enable_signal= ing returns false? >> I thought irq_enabled couldn't happen under normal circumstances? > > Not 100% sure, but I think it is temporary turned off during reset. > >> Anyway the fence gets treated as signaled if it returns false, and fence= _signal will get called. > Thought so, well that's rather bad if we failed to install the IRQ handle= that we just treat all fences as signaled isn't it? I wrote this code before the delayed work was added, I guess the check for = !irq_enabled can be removed now. :-) >> >>>> +static signed long radeon_fence_default_wait(struct fence *f, bool in= tr, >>>> + signed long timeout) >>>> +{ >>>> + struct radeon_fence *fence =3D to_radeon_fence(f); >>>> + struct radeon_device *rdev =3D fence->rdev; >>>> + bool signaled; >>>> + >>>> + fence_enable_sw_signaling(&fence->base); >>>> + >>>> + /* >>>> + * This function has to return -EDEADLK, but cannot hold >>>> + * exclusive_lock during the wait because some callers >>>> + * may already hold it. This means checking needs_reset without >>>> + * lock, and not fiddling with any gpu internals. >>>> + * >>>> + * The callback installed with fence_enable_sw_signaling will >>>> + * run before our wait_event_*timeout call, so we will see >>>> + * both the signaled fence and the changes to needs_reset. >>>> + */ >>>> + >>>> + if (intr) >>>> + timeout =3D wait_event_interruptible_timeout(rdev->fence_queu= e, >>>> + ((signaled =3D (test_bit(FENCE_FLAG_SI= GNALED_BIT, &fence->base.flags))) || rdev->needs_reset), >>>> + timeout); >>>> + else >>>> + timeout =3D wait_event_timeout(rdev->fence_queue, >>>> + ((signaled =3D (test_bit(FENCE_FLAG_SIGNALED= _BIT, &fence->base.flags))) || rdev->needs_reset), >>>> + timeout); >>>> + >>>> + if (timeout > 0 && !signaled) >>>> + return -EDEADLK; >>>> + return timeout; >>>> +} >>> This at least needs to be properly formated, but I think since we now d= on't need extra handling any more we don't need an extra wait function as w= ell. >> I thought of removing the extra handling, but the -EDEADLK stuff is need= ed because a deadlock could happen in ttm_bo_lock_delayed_workqueue otherwi= se if the gpu's really hung there would never be any progress forward. > > Hui what? ttm_bo_lock_delayed_workqueue shouldn't call any blocking wait. Oops, you're right. ttm_bo_delayed_delete is called with remove_all false, = not true. Unfortunately ttm_bo_vm_fault does hold the exclusive_lock in read mode, an= d other places that use eviction will use it too. Without returning -EDEADLK this could mean that ttm_bo_move_accel_cleanup w= ould block forever, so this function has to stay. ~Maarten