All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Christian König" <deathsimple@vodafone.de>
To: Alex Deucher <alexdeucher@gmail.com>
Cc: "dri-devel@lists.freedesktop.org" <dri-devel@lists.freedesktop.org>
Subject: Re: [PATCH 1/3] drm/radeon: take exclusive_lock in read mode during ring tests
Date: Mon, 18 Aug 2014 18:07:08 +0200	[thread overview]
Message-ID: <53F224AC.4070806@vodafone.de> (raw)
In-Reply-To: <CADnq5_NZ0Gnzzj9+1=7hnF=zWZ90+b00yujhwY6u9_vFmc1_+w@mail.gmail.com>

> Yeah, looks like a bug.  I think the attached patch should fix it.
Sounds logical and the patch is Reviewed-by: Christian König <christian.koenig@amd.com>

Going to apply Maartens patch on top and test that one a bit to make sure it works as expected.

Regards,
Christian.

Am 18.08.2014 um 18:03 schrieb Alex Deucher:
> On Mon, Aug 18, 2014 at 11:02 AM, Christian König
> <deathsimple@vodafone.de> wrote:
>> Am 18.08.2014 um 16:45 schrieb Maarten Lankhorst:
>>
>>> This is needed for the next commit, because the lockup detection
>>> will need the read lock to run.
>>>
>>> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
>>> ---
>>>    drivers/gpu/drm/radeon/radeon.h        |  2 +-
>>>    drivers/gpu/drm/radeon/radeon_device.c | 61
>>> ++++++++++++++++++++--------------
>>>    2 files changed, 37 insertions(+), 26 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/radeon/radeon.h
>>> b/drivers/gpu/drm/radeon/radeon.h
>>> index 9e1732eb402c..1d806983ec7b 100644
>>> --- a/drivers/gpu/drm/radeon/radeon.h
>>> +++ b/drivers/gpu/drm/radeon/radeon.h
>>> @@ -2312,7 +2312,7 @@ struct radeon_device {
>>>          bool                            need_dma32;
>>>          bool                            accel_working;
>>>          bool                            fastfb_working; /* IGP feature*/
>>> -       bool                            needs_reset;
>>> +       bool                            needs_reset, in_reset;
>>>          struct radeon_surface_reg surface_regs[RADEON_GEM_MAX_SURFACES];
>>>          const struct firmware *me_fw;   /* all family ME firmware */
>>>          const struct firmware *pfp_fw;  /* r6/700 PFP firmware */
>>> diff --git a/drivers/gpu/drm/radeon/radeon_device.c
>>> b/drivers/gpu/drm/radeon/radeon_device.c
>>> index c8ea050c8fa4..82633fdd399d 100644
>>> --- a/drivers/gpu/drm/radeon/radeon_device.c
>>> +++ b/drivers/gpu/drm/radeon/radeon_device.c
>>> @@ -1671,29 +1671,34 @@ int radeon_gpu_reset(struct radeon_device *rdev)
>>>          down_write(&rdev->exclusive_lock);
>>>          if (!rdev->needs_reset) {
>>> +               WARN_ON(rdev->in_reset);
>>>                  up_write(&rdev->exclusive_lock);
>>>                  return 0;
>>>          }
>>>          rdev->needs_reset = false;
>>> -
>>> -       radeon_save_bios_scratch_regs(rdev);
>>> -       /* block TTM */
>>>          resched = ttm_bo_lock_delayed_workqueue(&rdev->mman.bdev);
>>> -       radeon_pm_suspend(rdev);
>>> -       radeon_suspend(rdev);
>>>    -     for (i = 0; i < RADEON_NUM_RINGS; ++i) {
>>> -               ring_sizes[i] = radeon_ring_backup(rdev, &rdev->ring[i],
>>> -                                                  &ring_data[i]);
>>> -               if (ring_sizes[i]) {
>>> -                       saved = true;
>>> -                       dev_info(rdev->dev, "Saved %d dwords of commands "
>>> -                                "on ring %d.\n", ring_sizes[i], i);
>>> +       if (!rdev->in_reset) {
>>> +               rdev->in_reset = true;
>>> +
>>> +               radeon_save_bios_scratch_regs(rdev);
>>> +               /* block TTM */
>>> +               radeon_pm_suspend(rdev);
>>> +               radeon_suspend(rdev);
>>
>> That won't work correctly because you might end up with calling pm_resume
>> more often than suspend and that can only lead to a crash. Saying this we
>> probably already have a bug in the reset code at this point anyway, but see
>> below.
>>
>>
>>> +
>>> +               for (i = 0; i < RADEON_NUM_RINGS; ++i) {
>>> +                       ring_sizes[i] = radeon_ring_backup(rdev,
>>> &rdev->ring[i],
>>> +                                                          &ring_data[i]);
>>> +                       if (ring_sizes[i]) {
>>> +                               saved = true;
>>> +                               dev_info(rdev->dev, "Saved %d dwords of
>>> commands "
>>> +                                        "on ring %d.\n", ring_sizes[i],
>>> i);
>>> +                       }
>>>                  }
>>> -       }
>>> +       } else
>>> +               memset(ring_data, 0, sizeof(ring_data));
>>>    -retry:
>>>          r = radeon_asic_reset(rdev);
>>>          if (!r) {
>>>                  dev_info(rdev->dev, "GPU reset succeeded, trying to
>>> resume\n");
>>> @@ -1702,40 +1707,46 @@ retry:
>>>          radeon_restore_bios_scratch_regs(rdev);
>>
>> We should resume PM here as well.
>>
>>
>>>    -     if (!r) {
>>> +       if (!r && saved) {
>>>                  for (i = 0; i < RADEON_NUM_RINGS; ++i) {
>>>                          radeon_ring_restore(rdev, &rdev->ring[i],
>>>                                              ring_sizes[i], ring_data[i]);
>>> -                       ring_sizes[i] = 0;
>>>                          ring_data[i] = NULL;
>>>                  }
>>> +       } else {
>>> +               radeon_fence_driver_force_completion(rdev);
>>> +
>>> +               for (i = 0; i < RADEON_NUM_RINGS; ++i) {
>>> +                       kfree(ring_data[i]);
>>> +               }
>>> +       }
>>> +       downgrade_write(&rdev->exclusive_lock);
>>> +       ttm_bo_unlock_delayed_workqueue(&rdev->mman.bdev, resched);
>>
>> I would unlock the delayed_workqueue first and then downgrade the readlock.
>>
>>
>>>    +     if (!r) {
>>>                  r = radeon_ib_ring_tests(rdev);
>>>                  if (r) {
>>>                          dev_err(rdev->dev, "ib ring test failed (%d).\n",
>>> r);
>>>                          if (saved) {
>>> -                               saved = false;
>>> +                               /* if reset fails, try without saving data
>>> */
>>> +                               rdev->needs_reset = true;
>>>                                  radeon_suspend(rdev);
>>> -                               goto retry;
>>> +                               up_read(&rdev->exclusive_lock);
>>> +                               return -EAGAIN;
>>>                          }
>>>                  }
>>> -       } else {
>>> -               radeon_fence_driver_force_completion(rdev);
>>> -               for (i = 0; i < RADEON_NUM_RINGS; ++i) {
>>> -                       kfree(ring_data[i]);
>>> -               }
>>>          }
>>>
>>
>>>          radeon_pm_resume(rdev);
>>
>> Move this more up.
>>
>> Alex is more into this, but it's probably a bug in the current reset code
>> that this is after the IB tests, cause the IB tests needs everything powered
>> up and with PM handling suspended it is possible that individual blocks are
>> powered down.
> Yeah, looks like a bug.  I think the attached patch should fix it.
>
> Alex
>
>> Thanks,
>> Christian.
>>
>>
>>>          drm_helper_resume_force_mode(rdev->ddev);
>>>    -     ttm_bo_unlock_delayed_workqueue(&rdev->mman.bdev, resched);
>>>          if (r) {
>>>                  /* bad news, how to tell it to userspace ? */
>>>                  dev_info(rdev->dev, "GPU reset failed\n");
>>>          }
>>>    -     up_write(&rdev->exclusive_lock);
>>> +       rdev->in_reset = false;
>>> +       up_read(&rdev->exclusive_lock);
>>>          return r;
>>>    }
>>>
>>
>> _______________________________________________
>> dri-devel mailing list
>> dri-devel@lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/dri-devel

  reply	other threads:[~2014-08-18 16:07 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-08-18 14:45 [PATCH 1/3] drm/radeon: take exclusive_lock in read mode during ring tests Maarten Lankhorst
2014-08-18 14:45 ` [PATCH 2/3] drm/radeon: handle lockup in delayed work, v3 Maarten Lankhorst
2014-08-18 15:12   ` Christian König
2014-08-18 15:28     ` Maarten Lankhorst
2014-08-18 16:20       ` Christian König
2014-08-18 14:46 ` [PATCH 3/3] drm/radeon: add timeout argument to radeon_fence_wait_seq Maarten Lankhorst
2014-08-18 15:02 ` [PATCH 1/3] drm/radeon: take exclusive_lock in read mode during ring tests Christian König
2014-08-18 15:24   ` [PATCH 1/3] drm/radeon: take exclusive_lock in read mode during ring tests, v2 Maarten Lankhorst
2014-08-18 16:03   ` [PATCH 1/3] drm/radeon: take exclusive_lock in read mode during ring tests Alex Deucher
2014-08-18 16:07     ` Christian König [this message]
2014-08-18 16:10       ` Alex Deucher
2014-08-18 20:02         ` Maarten Lankhorst
2014-08-18 20:56           ` Alex Deucher

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=53F224AC.4070806@vodafone.de \
    --to=deathsimple@vodafone.de \
    --cc=alexdeucher@gmail.com \
    --cc=dri-devel@lists.freedesktop.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.