From: Andrey Grodzovsky <Andrey.Grodzovsky-5C7GfCeVMHo@public.gmane.org>
To: christian.koenig-5C7GfCeVMHo@public.gmane.org,
	"Deucher, Alexander" <Alexander.Deucher-5C7GfCeVMHo@public.gmane.org>,
	"StDenis, Tom" <Tom.StDenis-5C7GfCeVMHo@public.gmane.org>,
	amd-gfx mailing list <amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org>,
	"Zhou, David(ChunMing)" <David1.Zhou-5C7GfCeVMHo@public.gmane.org>
Subject: Re: Regression on gfx8 with ring init
Date: Fri, 21 Sep 2018 13:56:43 -0400
Message-ID: <681ddd4e-6bd2-db28-4286-2cc577d0f00a@amd.com>
In-Reply-To: <04944e7b-044b-4b16-3d2f-e760eedcee9a-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
No worries, I will just revert locally until then to clear the extra
errors during my investigation of current GPU reset status and issues.
Andrey
On 09/21/2018 01:53 PM, Christian König wrote:
> I unfortunately don't have a Polaris to test this myself.
>
> But please give me time till Monday so that I can at least try one
> more thing to fix it.
>
> Christian.
>
> Am 21.09.2018 um 19:11 schrieb Andrey Grodzovsky:
>>
>> Ping...
>>
>>
>> Andrey
>>
>>
>> On 09/20/2018 04:35 PM, Andrey Grodzovsky wrote:
>>>
>>> What's the status of this error and the suggested patch to fix it?
>>> It impacts GPU reset on Polaris11.
>>>
>>> Do we want to investigate why the original patch breaks it, or just
>>> disable the test with the proposed patch?
>>>
>>>
>>> P.S. Suspend/resume also stopped working on the latest branch - I will
>>> bisect it later today or tomorrow.
>>>
>>>
>>> Andrey
>>>
>>>
>>> On 09/18/2018 11:00 AM, Christian König wrote:
>>>> Tom,
>>>>
>>>> can you check whether the following makes it work again?
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
>>>> index b6160de70d12..d65f5ba92fc5 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
>>>> @@ -937,6 +937,10 @@ static int gfx_v8_0_ring_test_ib(struct amdgpu_ring *ring, long timeout)
>>>>          return r;
>>>>  }
>>>>
>>>> +static int gfx_v8_0_kiq_ring_test_ib(struct amdgpu_ring *ring, long timeout)
>>>> +{
>>>> +        return 0;
>>>> +}
>>>>
>>>>  static void gfx_v8_0_free_microcode(struct amdgpu_device *adev)
>>>>  {
>>>> @@ -7174,7 +7178,7 @@ static const struct amdgpu_ring_funcs gfx_v8_0_ring_funcs_kiq = {
>>>>          .emit_ib = gfx_v8_0_ring_emit_ib_compute,
>>>>          .emit_fence = gfx_v8_0_ring_emit_fence_kiq,
>>>>          .test_ring = gfx_v8_0_ring_test_ring,
>>>> -        .test_ib = gfx_v8_0_ring_test_ib,
>>>> +        .test_ib = gfx_v8_0_kiq_ring_test_ib,
>>>>          .insert_nop = amdgpu_ring_insert_nop,
>>>>          .pad_ib = amdgpu_ring_generic_pad_ib,
>>>>          .emit_rreg = gfx_v8_0_ring_emit_rreg,
>>>>
>>>>
>>>> Thanks,
>>>> Christian.
>>>>
>>>> Am 18.09.2018 um 16:41 schrieb Christian König:
>>>>> CRTC and GFX interrupts seem to be working perfectly fine.
>>>>>
>>>>> The problem here looks like only EOP interrupts from the Compute
>>>>> queue are not correctly handled.
>>>>>
>>>>> Most likely a bug somewhere in gfx_v8_0_eop_irq().
>>>>>
>>>>> Christian.
>>>>>
>>>>> Am 18.09.2018 um 16:36 schrieb Deucher, Alexander:
>>>>>>
>>>>>> FWIW, a number of consumer Raven boards have bad IVRS tables
>>>>>> (Windows doesn't use interrupt remapping, so the tables are sometimes
>>>>>> wrong and probably not validated). There are a number of
>>>>>> workarounds to manually override the IVRS tables to make
>>>>>> interrupts work. I think specifying pci=noacpi is also a
>>>>>> possible workaround.
>>>>>>
>>>>>>
>>>>>> Alex
>>>>>>
>>>>>> ------------------------------------------------------------------------
>>>>>> *From:* amd-gfx <amd-gfx-bounces-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org> on behalf
>>>>>> of Christian König <christian.koenig-5C7GfCeVMHo@public.gmane.org>
>>>>>> *Sent:* Tuesday, September 18, 2018 10:31:16 AM
>>>>>> *To:* StDenis, Tom; amd-gfx mailing list; Zhou, David(ChunMing)
>>>>>> *Subject:* Re: Regression on gfx8 with ring init
>>>>>> Well looks like interrupt processing is working perfectly fine.
>>>>>>
>>>>>> But looking at the error message once more I see that this actually
>>>>>> affects ring number 9 and not the GFX ring.
>>>>>>
>>>>>> Can you fix amdgpu_ib_ring_tests() to print ring->name instead of
>>>>>> the number?
>>>>>>
>>>>>> That must be one of the compute rings.
>>>>>>
>>>>>> Thanks,
>>>>>> Christian.
>>>>>>
>>>>>> Am 18.09.2018 um 16:20 schrieb Tom St Denis:
>>>>>> > On 2018-09-18 10:13 a.m., Christian König wrote:
>>>>>> >> Mhm, there is no failed IB test in there any more, is there?
>>>>>> >
>>>>>> > Oh sorry, I thought you wanted to test HEAD~ ... Attached is a log
>>>>>> > from the tip of drm-next.
>>>>>> >
>>>>>> > Tom
>>>>>> >
>>>>>> >>
>>>>>> >> Christian.
>>>>>> >>
>>>>>> >> Am 18.09.2018 um 16:09 schrieb Tom St Denis:
>>>>>> >>> Disabling IOMMU in the BIOS resulted in a correct boot up...
>>>>>> >>>
>>>>>> >>> Here's the log.
>>>>>> >>>
>>>>>> >>> Tom
>>>>>> >>>
>>>>>> >>> On 2018-09-18 9:58 a.m., Tom St Denis wrote:
>>>>>> >>>> Odd, I couldn't even boot my system with the dGPU as primary after
>>>>>> >>>> rebuilding the kernel. It got hung up in the IOMMU driver (loads
>>>>>> >>>> of AMD-Vi IOMMU errors), which I wasn't able to capture because it
>>>>>> >>>> panicked before loading the network stack.
>>>>>> >>>>
>>>>>> >>>> Bizarre.
>>>>>> >>>>
>>>>>> >>>> I'll keep trying.
>>>>>> >>>>
>>>>>> >>>> Tom
>>>>>> >>>>
>>>>>> >>>> On 2018-09-18 9:35 a.m., Christian König wrote:
>>>>>> >>>>> Am 18.09.2018 um 15:32 schrieb Tom St Denis:
>>>>>> >>>>>> On 2018-09-18 9:30 a.m., Christian König wrote:
>>>>>> >>>>>>> Great, not sure if that is good or bad news.
>>>>>> >>>>>>>
>>>>>> >>>>>>> Anyway, I am going to revert the change for now. Does anybody
>>>>>> >>>>>>> volunteer to figure out why interrupts sometimes don't work
>>>>>> >>>>>>> correctly on Raven?
>>>>>> >>>>>>
>>>>>> >>>>>> What does "doesn't work correctly" mean? My workstation is a Raven1
>>>>>> >>>>>> (Ryzen 2400G) and, other than the TTM bulk move issue, has been
>>>>>> >>>>>> perfectly stable (through suspend/resumes too, I might add).
>>>>>> >>>>>>
>>>>>> >>>>>> Anything I could test with my devel raven?
>>>>>> >>>>>
>>>>>> >>>>> The problem seems to be that on some boards IH handling doesn't
>>>>>> >>>>> work as it should.
>>>>>> >>>>>
>>>>>> >>>>> Can you try to disable the onboard graphics and try again?
>>>>>> >>>>>
>>>>>> >>>>> If that still doesn't work there is a DRM_DEBUG in
>>>>>> >>>>> amdgpu_ih_process(), make that a DRM_ERROR and send me the
>>>>>> >>>>> resulting dmesg of loading amdgpu (but don't start any UMD).
>>>>>> >>>>>
>>>>>> >>>>> Thanks,
>>>>>> >>>>> Christian.
>>>>>> >>>>>
>>>>>> >>>>>>
>>>>>> >>>>>>
>>>>>> >>>>>> Tom
>>>>>> >>>>>>
>>>>>> >>>>>>>
>>>>>> >>>>>>> Christian.
>>>>>> >>>>>>>
>>>>>> >>>>>>> Am 18.09.2018 um 15:27 schrieb Tom St Denis:
>>>>>> >>>>>>>> This commit:
>>>>>> >>>>>>>>
>>>>>> >>>>>>>> [root@raven linux]# git bisect good
>>>>>> >>>>>>>> 9b0df0937a852d299fbe42a5939c9a8a4cc83c55 is the first bad commit
>>>>>> >>>>>>>> commit 9b0df0937a852d299fbe42a5939c9a8a4cc83c55
>>>>>> >>>>>>>> Author: Christian König <christian.koenig-5C7GfCeVMHo@public.gmane.org>
>>>>>> >>>>>>>> Date: Tue Sep 18 10:38:09 2018 +0200
>>>>>> >>>>>>>>
>>>>>> >>>>>>>> drm/amdgpu: remove fence fallback
>>>>>> >>>>>>>>
>>>>>> >>>>>>>> DC doesn't seem to have a fallback path either.
>>>>>> >>>>>>>>
>>>>>> >>>>>>>> So when interrupts doesn't work any more we are pretty much
>>>>>> >>>>>>>> busted no matter what.
>>>>>> >>>>>>>>
>>>>>> >>>>>>>> Signed-off-by: Christian König <christian.koenig-5C7GfCeVMHo@public.gmane.org>
>>>>>> >>>>>>>> Reviewed-by: Chunming Zhou <david1.zhou-5C7GfCeVMHo@public.gmane.org>
>>>>>> >>>>>>>>
>>>>>> >>>>>>>> Results in this:
>>>>>> >>>>>>>>
>>>>>> >>>>>>>> [ 24.334025] [drm] Initialized amdgpu 3.27.0 20150101 for 0000:07:00.0 on minor 1
>>>>>> >>>>>>>> [ 24.335674] modprobe (3895) used greatest stack depth: 12600 bytes left
>>>>>> >>>>>>>> [ 26.272358] [drm:gfx_v8_0_ring_test_ib [amdgpu]] *ERROR* amdgpu: IB test timed out.
>>>>>> >>>>>>>> [ 26.272460] [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* amdgpu: failed testing IB on ring 9 (-110).
>>>>>> >>>>>>>> [ 26.407885] [drm:process_one_work] *ERROR* ib ring test failed (-110).
>>>>>> >>>>>>>> [ 28.506708] fuse init (API version 7.27)
>>>>>> >>>>>>>>
>>>>>> >>>>>>>> On init with my polaris/raven1 system.
>>>>>> >>>>>>>>
>>>>>> >>>>>>>> Cheers,
>>>>>> >>>>>>>> Tom
>>>>>> >>>>>>>> _______________________________________________
>>>>>> >>>>>>>> amd-gfx mailing list
>>>>>> >>>>>>>> amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org
>>>>>> >>>>>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
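
For readers following the thread, here is a minimal, self-contained C sketch of the two ideas discussed above: swapping the .test_ib callback of one ring for a stub that always succeeds, and reporting IB test failures by ring name rather than by index. It is not driver code. All names here (struct ring, struct ring_funcs, generic_test_ib, kiq_test_ib, run_ib_tests, the irq_works flag) are hypothetical stand-ins, and the "wait" is simulated with a flag instead of a real fence. On Linux ETIMEDOUT is 110, which matches the (-110) in the log.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified stand-ins for the driver structures. */
struct ring;

struct ring_funcs {
	int (*test_ib)(struct ring *ring, long timeout);
};

struct ring {
	const char *name;
	const struct ring_funcs *funcs;
	int irq_works;	/* simulation only: does the fence interrupt arrive? */
};

/* Generic IB test: simulated wait that times out when no interrupt arrives. */
static int generic_test_ib(struct ring *ring, long timeout)
{
	(void)timeout;
	return ring->irq_works ? 0 : -ETIMEDOUT;
}

/* Stub in the spirit of the proposed patch: skip the test, report success. */
static int kiq_test_ib(struct ring *ring, long timeout)
{
	(void)ring;
	(void)timeout;
	return 0;
}

static const struct ring_funcs generic_funcs = { .test_ib = generic_test_ib };
static const struct ring_funcs kiq_funcs = { .test_ib = kiq_test_ib };

/*
 * Run the IB test on every ring through its function table.
 * Failures are reported by ring name; the first error code is returned.
 */
static int run_ib_tests(struct ring *rings, size_t count, long timeout)
{
	int ret = 0;

	for (size_t i = 0; i < count; i++) {
		int r = rings[i].funcs->test_ib(&rings[i], timeout);

		if (r) {
			fprintf(stderr, "failed testing IB on ring %s (%d)\n",
				rings[i].name, r);
			if (!ret)
				ret = r;
		}
	}
	return ret;
}
```

Calling run_ib_tests() on an array where one ring uses generic_funcs with irq_works set to 0 returns -ETIMEDOUT and names that ring in the message. Pointing the same ring at kiq_funcs makes the call return 0, which shows why such a stub hides the symptom without explaining the missing interrupt.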