Re: [PATCH 2/2] drm/amdgpu: fix vcn sw init failed

All of lore.kernel.org
 help / color / mirror / Atom feed

From: "Lazar, Lijo" <lijo.lazar@amd.com>
To: "Zhang, Jesse(Jie)" <Jesse.Zhang@amd.com>,
	"amd-gfx@lists.freedesktop.org" <amd-gfx@lists.freedesktop.org>
Cc: "Deucher, Alexander" <Alexander.Deucher@amd.com>,
	"Koenig, Christian" <Christian.Koenig@amd.com>,
	"Prosyak, Vitaly" <Vitaly.Prosyak@amd.com>,
	"Huang, Tim" <Tim.Huang@amd.com>
Subject: Re: [PATCH 2/2] drm/amdgpu: fix vcn sw init failed
Date: Wed, 13 Nov 2024 08:40:16 +0530	[thread overview]
Message-ID: <63fcdfc5-af2a-4065-ab5a-81ca5dca6db9@amd.com> (raw)
In-Reply-To: <DM4PR12MB51524E90E5A98A03BF7FDD8CE35A2@DM4PR12MB5152.namprd12.prod.outlook.com>



On 11/13/2024 7:58 AM, Zhang, Jesse(Jie) wrote:
> [AMD Official Use Only - AMD Internal Distribution Only]
> 
> Hi, Lijo,
> 
> -----Original Message-----
> From: Lazar, Lijo <Lijo.Lazar@amd.com>
> Sent: Tuesday, November 12, 2024 10:54 PM
> To: Zhang, Jesse(Jie) <Jesse.Zhang@amd.com>; amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander <Alexander.Deucher@amd.com>; Koenig, Christian <Christian.Koenig@amd.com>; Prosyak, Vitaly <Vitaly.Prosyak@amd.com>; Huang, Tim <Tim.Huang@amd.com>
> Subject: Re: [PATCH 2/2] drm/amdgpu: fix vcn sw init failed
> 
> 
> 
> On 11/12/2024 8:00 PM, Jesse.zhang@amd.com wrote:
>> [ 2875.870277] [drm:amdgpu_device_init [amdgpu]] *ERROR* sw_init of IP
>> block <vcn_v4_0_3> failed -22 [ 2875.880494] amdgpu 0000:01:00.0:
>> amdgpu: amdgpu_device_ip_init failed [ 2875.887689] amdgpu
>> 0000:01:00.0: amdgpu: Fatal error during GPU init [ 2875.894791] amdgpu 0000:01:00.0: amdgpu: amdgpu: finishing device.
>>
>> Add irqs with different IRQ source pointer for vcn0 and vcn1.
>>
>> Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
>> ---
>>  drivers/gpu/drm/amd/amdgpu/vcn_v4_0_3.c | 19 +++++++++++++------
>>  1 file changed, 13 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0_3.c
>> b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0_3.c
>> index ef3dfd44a022..82b90f1e6f33 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0_3.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0_3.c
>> @@ -83,6 +83,10 @@ static const struct amdgpu_hwip_reg_entry
>> vcn_reg_list_4_0_3[] = {
>>
>>  #define NORMALIZE_VCN_REG_OFFSET(offset) \
>>               (offset & 0x1FFFF)
>> +static int amdgpu_ih_clientid_vcns[] = {
>> +     SOC15_IH_CLIENTID_VCN,
>> +     SOC15_IH_CLIENTID_VCN1
> 
> This is not valid for 4.0.3. It uses only the same client id, different node_id to distinguish. Also, there are max of 4 instances.
> 
> I would say that entire IP instance series was done in a haste without applying thought and breaks other things including ip block mask.
> 
> If the same client id is used, it returns -EINVAL (because of the following check) and sw init fails at step vcn_v4_0_3_sw_init / amdgpu_irq_add_id.
> 
> amdgpu_irq_add_id:
> if (adev->irq.client[client_id].sources[src_id] != NULL)
>         return -EINVAL;
> 

We had some side discussions on IP block-per-instance approach.
Personally, I was not in favour of it as I thought allowing IP block to
handle its own instances is the better approach and that could handle
dependencies between instances. Turns out that there are more like
handling common things for all instances as in this example.

I would prefer to revert the patch series and consider all angles before
moving forward on the approach. Will leave this to Alex/Christian/Leo on
the final decsion.

Thanks,
Lijo

> Regards
> Jesse
> 
> 
> Thanks,
> Lijo
> 
>> +};
>>
>>  static int vcn_v4_0_3_start_sriov(struct amdgpu_device *adev);
>> static void vcn_v4_0_3_set_unified_ring_funcs(struct amdgpu_device
>> *adev, int inst); @@ -150,9 +154,9 @@ static int vcn_v4_0_3_sw_init(struct amdgpu_ip_block *ip_block)
>>       if (r)
>>               return r;
>>
>> -     /* VCN DEC TRAP */
>> -     r = amdgpu_irq_add_id(adev, SOC15_IH_CLIENTID_VCN,
>> -             VCN_4_0__SRCID__UVD_ENC_GENERAL_PURPOSE, &adev->vcn.inst->irq);
>> +     /* VCN UNIFIED TRAP */
>> +     r = amdgpu_irq_add_id(adev, amdgpu_ih_clientid_vcns[inst],
>> +                     VCN_4_0__SRCID__UVD_ENC_GENERAL_PURPOSE,
>> +&adev->vcn.inst[inst].irq);
>>       if (r)
>>               return r;
>>
>> @@ -174,7 +178,7 @@ static int vcn_v4_0_3_sw_init(struct
>> amdgpu_ip_block *ip_block)
>>
>>       ring->vm_hub = AMDGPU_MMHUB0(adev->vcn.inst[inst].aid_id);
>>       sprintf(ring->name, "vcn_unified_%d", adev->vcn.inst[inst].aid_id);
>> -     r = amdgpu_ring_init(adev, ring, 512, &adev->vcn.inst->irq, 0,
>> +     r = amdgpu_ring_init(adev, ring, 512, &adev->vcn.inst[inst].irq, 0,
>>                                AMDGPU_RING_PRIO_DEFAULT,
>>                                &adev->vcn.inst[inst].sched_score);
>>       if (r)
>> @@ -1734,9 +1738,12 @@ static const struct amdgpu_irq_src_funcs vcn_v4_0_3_irq_funcs = {
>>   */
>>  static void vcn_v4_0_3_set_irq_funcs(struct amdgpu_device *adev, int
>> inst)  {
>> -     adev->vcn.inst->irq.num_types++;
>> +     if (adev->vcn.harvest_config & (1 << inst))
>> +             return;
>> +
>> +     adev->vcn.inst[inst].irq.num_types = adev->vcn.num_enc_rings + 1;
>>
>> -     adev->vcn.inst->irq.funcs = &vcn_v4_0_3_irq_funcs;
>> +     adev->vcn.inst[inst].irq.funcs = &vcn_v4_0_3_irq_funcs;
>>  }
>>
>>  static void vcn_v4_0_3_print_ip_state(struct amdgpu_ip_block
>> *ip_block, struct drm_printer *p)

next prev parent reply	other threads:[~2024-11-13  3:10 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-11-12 14:30 [PATCH 1/2] drm/admgpu: fix vcn sw init failed Jesse.zhang@amd.com
2024-11-12 14:30 ` [PATCH 2/2] drm/amdgpu: " Jesse.zhang@amd.com
2024-11-12 14:54   ` Lazar, Lijo
2024-11-13  2:28     ` Zhang, Jesse(Jie)
2024-11-13  3:10       ` Lazar, Lijo [this message]
2024-11-13  4:46         ` Alex Deucher
2024-11-13  5:03           ` Lazar, Lijo
2024-11-13  5:22             ` Zhang, Jesse(Jie)
2024-11-13  5:24             ` Alex Deucher
2024-11-13  5:32               ` Lazar, Lijo
2024-11-13 21:43                 ` Alex Deucher
2024-11-15 12:34                   ` Christian König
2024-11-15 14:24                     ` Alex Deucher
2024-11-13  5:30 ` [PATCH 1/2] drm/admgpu: " Alex Deucher

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=63fcdfc5-af2a-4065-ab5a-81ca5dca6db9@amd.com \
    --to=lijo.lazar@amd.com \
    --cc=Alexander.Deucher@amd.com \
    --cc=Christian.Koenig@amd.com \
    --cc=Jesse.Zhang@amd.com \
    --cc=Tim.Huang@amd.com \
    --cc=Vitaly.Prosyak@amd.com \
    --cc=amd-gfx@lists.freedesktop.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.