From mboxrd@z Thu Jan 1 00:00:00 1970 From: =?ISO-8859-1?Q?Christian_K=F6nig?= Subject: Re: [PATCH] drm/radeon: avoid deadlock in pm path when waiting for fence Date: Mon, 17 Dec 2012 17:41:31 +0100 Message-ID: <50CF4B3B.4000708@vodafone.de> References: <1355760272-18402-1-git-send-email-j.glisse@gmail.com> Mime-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1"; Format="flowed" Content-Transfer-Encoding: quoted-printable Return-path: Received: from outgoing.email.vodafone.de (outgoing.email.vodafone.de [139.7.28.128]) by gabe.freedesktop.org (Postfix) with ESMTP id 915BCE5CB2 for ; Mon, 17 Dec 2012 08:41:34 -0800 (PST) In-Reply-To: <1355760272-18402-1-git-send-email-j.glisse@gmail.com> List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: dri-devel-bounces+sf-dri-devel=m.gmane.org@lists.freedesktop.org Errors-To: dri-devel-bounces+sf-dri-devel=m.gmane.org@lists.freedesktop.org To: j.glisse@gmail.com Cc: Jerome Glisse , dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org On 17.12.2012 17:04, j.glisse@gmail.com wrote: > From: Jerome Glisse > > radeon_fence_wait_empty_locked should not trigger GPU reset as no > place where it's call from would benefit from such thing and it > actually lead to a kernel deadlock in case the reset is triggered > from pm codepath. Instead force ring completion in place where it > makes sense or return early in others. > > Signed-off-by: Jerome Glisse I wanted to stop losing GPU reset signals by moving the reset into = radeon_fence_wait_empty locked, but it's true that in most cases it = doesn't make much sense (suspend/finish) and I didn't know that it could = cause a hang in the PM code. We should probably fix the PM code to properly signal this condition to = it's caller and reset the GPU when it is possible to do so, but fixing = the deadlock is of course more important. Also should probably go into the stable kernel as well, but anyway it is: Reviewed-by: Christian K=F6nig > --- > drivers/gpu/drm/radeon/radeon.h | 2 +- > drivers/gpu/drm/radeon/radeon_device.c | 13 +++++++++++-- > drivers/gpu/drm/radeon/radeon_fence.c | 30 ++++++++++++++-------------= --- > drivers/gpu/drm/radeon/radeon_pm.c | 15 ++++++++++++--- > 4 files changed, 38 insertions(+), 22 deletions(-) > > diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/rad= eon.h > index 9c7625c..071b2d7 100644 > --- a/drivers/gpu/drm/radeon/radeon.h > +++ b/drivers/gpu/drm/radeon/radeon.h > @@ -231,7 +231,7 @@ void radeon_fence_process(struct radeon_device *rdev,= int ring); > bool radeon_fence_signaled(struct radeon_fence *fence); > int radeon_fence_wait(struct radeon_fence *fence, bool interruptible); > int radeon_fence_wait_next_locked(struct radeon_device *rdev, int ring); > -void radeon_fence_wait_empty_locked(struct radeon_device *rdev, int ring= ); > +int radeon_fence_wait_empty_locked(struct radeon_device *rdev, int ring); > int radeon_fence_wait_any(struct radeon_device *rdev, > struct radeon_fence **fences, > bool intr); > diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/rad= eon/radeon_device.c > index 774fae7..53a9223 100644 > --- a/drivers/gpu/drm/radeon/radeon_device.c > +++ b/drivers/gpu/drm/radeon/radeon_device.c > @@ -1163,6 +1163,7 @@ int radeon_suspend_kms(struct drm_device *dev, pm_m= essage_t state) > struct drm_crtc *crtc; > struct drm_connector *connector; > int i, r; > + bool force_completion =3D false; > = > if (dev =3D=3D NULL || dev->dev_private =3D=3D NULL) { > return -ENODEV; > @@ -1205,8 +1206,16 @@ int radeon_suspend_kms(struct drm_device *dev, pm_= message_t state) > = > mutex_lock(&rdev->ring_lock); > /* wait for gpu to finish processing current batch */ > - for (i =3D 0; i < RADEON_NUM_RINGS; i++) > - radeon_fence_wait_empty_locked(rdev, i); > + for (i =3D 0; i < RADEON_NUM_RINGS; i++) { > + r =3D radeon_fence_wait_empty_locked(rdev, i); > + if (r) { > + /* delay GPU reset to resume */ > + force_completion =3D true; > + } > + } > + if (force_completion) { > + radeon_fence_driver_force_completion(rdev); > + } > mutex_unlock(&rdev->ring_lock); > = > radeon_save_bios_scratch_regs(rdev); > diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/rade= on/radeon_fence.c > index bf7b20e..28c09b6 100644 > --- a/drivers/gpu/drm/radeon/radeon_fence.c > +++ b/drivers/gpu/drm/radeon/radeon_fence.c > @@ -609,26 +609,20 @@ int radeon_fence_wait_next_locked(struct radeon_dev= ice *rdev, int ring) > * Returns 0 if the fences have passed, error for all other cases. > * Caller must hold ring lock. > */ > -void radeon_fence_wait_empty_locked(struct radeon_device *rdev, int ring) > +int radeon_fence_wait_empty_locked(struct radeon_device *rdev, int ring) > { > uint64_t seq =3D rdev->fence_drv[ring].sync_seq[ring]; > + int r; > = > - while(1) { > - int r; > - r =3D radeon_fence_wait_seq(rdev, seq, ring, false, false); > + r =3D radeon_fence_wait_seq(rdev, seq, ring, false, false); > + if (r) { > if (r =3D=3D -EDEADLK) { > - mutex_unlock(&rdev->ring_lock); > - r =3D radeon_gpu_reset(rdev); > - mutex_lock(&rdev->ring_lock); > - if (!r) > - continue; > - } > - if (r) { > - dev_err(rdev->dev, "error waiting for ring to become" > - " idle (%d)\n", r); > + return -EDEADLK; > } > - return; > + dev_err(rdev->dev, "error waiting for ring[%d] to become idle (%d)\n", > + ring, r); > } > + return 0; > } > = > /** > @@ -854,13 +848,17 @@ int radeon_fence_driver_init(struct radeon_device *= rdev) > */ > void radeon_fence_driver_fini(struct radeon_device *rdev) > { > - int ring; > + int ring, r; > = > mutex_lock(&rdev->ring_lock); > for (ring =3D 0; ring < RADEON_NUM_RINGS; ring++) { > if (!rdev->fence_drv[ring].initialized) > continue; > - radeon_fence_wait_empty_locked(rdev, ring); > + r =3D radeon_fence_wait_empty_locked(rdev, ring); > + if (r) { > + /* no need to trigger GPU reset as we are unloading */ > + radeon_fence_driver_force_completion(rdev); > + } > wake_up_all(&rdev->fence_queue); > radeon_scratch_free(rdev, rdev->fence_drv[ring].scratch_reg); > rdev->fence_drv[ring].initialized =3D false; > diff --git a/drivers/gpu/drm/radeon/radeon_pm.c b/drivers/gpu/drm/radeon/= radeon_pm.c > index aa14dbb..0bfa656 100644 > --- a/drivers/gpu/drm/radeon/radeon_pm.c > +++ b/drivers/gpu/drm/radeon/radeon_pm.c > @@ -234,7 +234,7 @@ static void radeon_set_power_state(struct radeon_devi= ce *rdev) > = > static void radeon_pm_set_clocks(struct radeon_device *rdev) > { > - int i; > + int i, r; > = > /* no need to take locks, etc. if nothing's going to change */ > if ((rdev->pm.requested_clock_mode_index =3D=3D rdev->pm.current_clock= _mode_index) && > @@ -248,8 +248,17 @@ static void radeon_pm_set_clocks(struct radeon_devic= e *rdev) > /* wait for the rings to drain */ > for (i =3D 0; i < RADEON_NUM_RINGS; i++) { > struct radeon_ring *ring =3D &rdev->ring[i]; > - if (ring->ready) > - radeon_fence_wait_empty_locked(rdev, i); > + if (!ring->ready) { > + continue; > + } > + r =3D radeon_fence_wait_empty_locked(rdev, i); > + if (r) { > + /* needs a GPU reset dont reset here */ > + mutex_unlock(&rdev->ring_lock); > + up_write(&rdev->pm.mclk_lock); > + mutex_unlock(&rdev->ddev->struct_mutex); > + return; > + } > } > = > radeon_unmap_vram_bos(rdev);