From mboxrd@z Thu Jan 1 00:00:00 1970 From: =?windows-1252?Q?Christian_K=F6nig?= Subject: Re: [PATCH 1/3] drm/radeon: take exclusive_lock in read mode during ring tests Date: Mon, 18 Aug 2014 18:07:08 +0200 Message-ID: <53F224AC.4070806@vodafone.de> References: <53F2118F.8000706@canonical.com> <53F215A0.1070107@vodafone.de> Mime-Version: 1.0 Content-Type: text/plain; charset="windows-1252"; Format="flowed" Content-Transfer-Encoding: quoted-printable Return-path: Received: from pegasos-out.vodafone.de (pegasos-out.vodafone.de [80.84.1.38]) by gabe.freedesktop.org (Postfix) with ESMTP id 2A67A6E457 for ; Mon, 18 Aug 2014 09:07:45 -0700 (PDT) Received: from localhost (localhost.localdomain [127.0.0.1]) by pegasos-out.vodafone.de (Rohrpostix2 Daemon) with ESMTP id 52E64761F49 for ; Mon, 18 Aug 2014 18:07:44 +0200 (CEST) Received: from pegasos-out.vodafone.de ([127.0.0.1]) by localhost (rohrpostix2.prod.vfnet.de [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id lBr5s14eZdQs for ; Mon, 18 Aug 2014 18:07:22 +0200 (CEST) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: Alex Deucher Cc: "dri-devel@lists.freedesktop.org" List-Id: dri-devel@lists.freedesktop.org > Yeah, looks like a bug. I think the attached patch should fix it. Sounds logical and the patch is Reviewed-by: Christian K=F6nig Going to apply Maartens patch on top and test that one a bit to make sure i= t works as expected. Regards, Christian. Am 18.08.2014 um 18:03 schrieb Alex Deucher: > On Mon, Aug 18, 2014 at 11:02 AM, Christian K=F6nig > wrote: >> Am 18.08.2014 um 16:45 schrieb Maarten Lankhorst: >> >>> This is needed for the next commit, because the lockup detection >>> will need the read lock to run. >>> >>> Signed-off-by: Maarten Lankhorst >>> --- >>> drivers/gpu/drm/radeon/radeon.h | 2 +- >>> drivers/gpu/drm/radeon/radeon_device.c | 61 >>> ++++++++++++++++++++-------------- >>> 2 files changed, 37 insertions(+), 26 deletions(-) >>> >>> diff --git a/drivers/gpu/drm/radeon/radeon.h >>> b/drivers/gpu/drm/radeon/radeon.h >>> index 9e1732eb402c..1d806983ec7b 100644 >>> --- a/drivers/gpu/drm/radeon/radeon.h >>> +++ b/drivers/gpu/drm/radeon/radeon.h >>> @@ -2312,7 +2312,7 @@ struct radeon_device { >>> bool need_dma32; >>> bool accel_working; >>> bool fastfb_working; /* IGP feature= */ >>> - bool needs_reset; >>> + bool needs_reset, in_reset; >>> struct radeon_surface_reg surface_regs[RADEON_GEM_MAX_SURFACES= ]; >>> const struct firmware *me_fw; /* all family ME firmware */ >>> const struct firmware *pfp_fw; /* r6/700 PFP firmware */ >>> diff --git a/drivers/gpu/drm/radeon/radeon_device.c >>> b/drivers/gpu/drm/radeon/radeon_device.c >>> index c8ea050c8fa4..82633fdd399d 100644 >>> --- a/drivers/gpu/drm/radeon/radeon_device.c >>> +++ b/drivers/gpu/drm/radeon/radeon_device.c >>> @@ -1671,29 +1671,34 @@ int radeon_gpu_reset(struct radeon_device *rdev) >>> down_write(&rdev->exclusive_lock); >>> if (!rdev->needs_reset) { >>> + WARN_ON(rdev->in_reset); >>> up_write(&rdev->exclusive_lock); >>> return 0; >>> } >>> rdev->needs_reset =3D false; >>> - >>> - radeon_save_bios_scratch_regs(rdev); >>> - /* block TTM */ >>> resched =3D ttm_bo_lock_delayed_workqueue(&rdev->mman.bdev); >>> - radeon_pm_suspend(rdev); >>> - radeon_suspend(rdev); >>> - for (i =3D 0; i < RADEON_NUM_RINGS; ++i) { >>> - ring_sizes[i] =3D radeon_ring_backup(rdev, &rdev->ring[= i], >>> - &ring_data[i]); >>> - if (ring_sizes[i]) { >>> - saved =3D true; >>> - dev_info(rdev->dev, "Saved %d dwords of command= s " >>> - "on ring %d.\n", ring_sizes[i], i); >>> + if (!rdev->in_reset) { >>> + rdev->in_reset =3D true; >>> + >>> + radeon_save_bios_scratch_regs(rdev); >>> + /* block TTM */ >>> + radeon_pm_suspend(rdev); >>> + radeon_suspend(rdev); >> >> That won't work correctly because you might end up with calling pm_resume >> more often than suspend and that can only lead to a crash. Saying this we >> probably already have a bug in the reset code at this point anyway, but = see >> below. >> >> >>> + >>> + for (i =3D 0; i < RADEON_NUM_RINGS; ++i) { >>> + ring_sizes[i] =3D radeon_ring_backup(rdev, >>> &rdev->ring[i], >>> + &ring_data[i= ]); >>> + if (ring_sizes[i]) { >>> + saved =3D true; >>> + dev_info(rdev->dev, "Saved %d dwords of >>> commands " >>> + "on ring %d.\n", ring_sizes[i], >>> i); >>> + } >>> } >>> - } >>> + } else >>> + memset(ring_data, 0, sizeof(ring_data)); >>> -retry: >>> r =3D radeon_asic_reset(rdev); >>> if (!r) { >>> dev_info(rdev->dev, "GPU reset succeeded, trying to >>> resume\n"); >>> @@ -1702,40 +1707,46 @@ retry: >>> radeon_restore_bios_scratch_regs(rdev); >> >> We should resume PM here as well. >> >> >>> - if (!r) { >>> + if (!r && saved) { >>> for (i =3D 0; i < RADEON_NUM_RINGS; ++i) { >>> radeon_ring_restore(rdev, &rdev->ring[i], >>> ring_sizes[i], ring_data[i= ]); >>> - ring_sizes[i] =3D 0; >>> ring_data[i] =3D NULL; >>> } >>> + } else { >>> + radeon_fence_driver_force_completion(rdev); >>> + >>> + for (i =3D 0; i < RADEON_NUM_RINGS; ++i) { >>> + kfree(ring_data[i]); >>> + } >>> + } >>> + downgrade_write(&rdev->exclusive_lock); >>> + ttm_bo_unlock_delayed_workqueue(&rdev->mman.bdev, resched); >> >> I would unlock the delayed_workqueue first and then downgrade the readlo= ck. >> >> >>> + if (!r) { >>> r =3D radeon_ib_ring_tests(rdev); >>> if (r) { >>> dev_err(rdev->dev, "ib ring test failed (%d).\= n", >>> r); >>> if (saved) { >>> - saved =3D false; >>> + /* if reset fails, try without saving d= ata >>> */ >>> + rdev->needs_reset =3D true; >>> radeon_suspend(rdev); >>> - goto retry; >>> + up_read(&rdev->exclusive_lock); >>> + return -EAGAIN; >>> } >>> } >>> - } else { >>> - radeon_fence_driver_force_completion(rdev); >>> - for (i =3D 0; i < RADEON_NUM_RINGS; ++i) { >>> - kfree(ring_data[i]); >>> - } >>> } >>> >> >>> radeon_pm_resume(rdev); >> >> Move this more up. >> >> Alex is more into this, but it's probably a bug in the current reset code >> that this is after the IB tests, cause the IB tests needs everything pow= ered >> up and with PM handling suspended it is possible that individual blocks = are >> powered down. > Yeah, looks like a bug. I think the attached patch should fix it. > > Alex > >> Thanks, >> Christian. >> >> >>> drm_helper_resume_force_mode(rdev->ddev); >>> - ttm_bo_unlock_delayed_workqueue(&rdev->mman.bdev, resched); >>> if (r) { >>> /* bad news, how to tell it to userspace ? */ >>> dev_info(rdev->dev, "GPU reset failed\n"); >>> } >>> - up_write(&rdev->exclusive_lock); >>> + rdev->in_reset =3D false; >>> + up_read(&rdev->exclusive_lock); >>> return r; >>> } >>> >> >> _______________________________________________ >> dri-devel mailing list >> dri-devel@lists.freedesktop.org >> http://lists.freedesktop.org/mailman/listinfo/dri-devel