From: Lang Yu <Lang.Yu@amd.com>
To: "Quan, Evan" <Evan.Quan@amd.com>
Cc: "Grodzovsky, Andrey" <Andrey.Grodzovsky@amd.com>,
	"Lazar, Lijo" <Lijo.Lazar@amd.com>,
	"amd-gfx@lists.freedesktop.org" <amd-gfx@lists.freedesktop.org>,
	"Huang, Ray" <Ray.Huang@amd.com>,
	"Deucher, Alexander" <Alexander.Deucher@amd.com>,
	"Koenig, Christian" <Christian.Koenig@amd.com>
Subject: Re: [PATCH 2/2] drm/amdgpu: add support for SMU debug option
Date: Fri, 10 Dec 2021 10:33:59 +0800
Message-ID: <YbK8lwgodK5VutKA@lang-desktop>
In-Reply-To: <DM6PR12MB2619199DA5BBADDF0ED9ED0CE4719@DM6PR12MB2619.namprd12.prod.outlook.com>

On 12/10/ , Quan, Evan wrote:
> [AMD Official Use Only]
> 
> 
> 
> > -----Original Message-----
> > From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of Lang Yu
> > Sent: Thursday, December 9, 2021 4:49 PM
> > To: amd-gfx@lists.freedesktop.org
> > Cc: Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>; Lazar, Lijo
> > <Lijo.Lazar@amd.com>; Huang, Ray <Ray.Huang@amd.com>; Deucher,
> > Alexander <Alexander.Deucher@amd.com>; Yu, Lang <Lang.Yu@amd.com>;
> > Koenig, Christian <Christian.Koenig@amd.com>
> > Subject: [PATCH 2/2] drm/amdgpu: add support for SMU debug option
> > 
> > SMU firmware guys expect the driver to maintain the error context
> > and to stop interacting with SMU once an SMU error has occurred.
> > That will aid in debugging SMU firmware issues.
> > 
> > Add SMU debug option support for this request. It can be
> > enabled or disabled via the amdgpu_smu_debug debugfs file.
> > When enabled, it brings the hardware to a kind of halt state
> > so that no one can touch it any more in the event of SMU
> > errors.
> > 
> > Currently, the driver interacts with SMU by sending messages,
> > and there are three ways of sending messages to SMU.
> > Handle them respectively as follows:
> > 
> > 1, smu_cmn_send_smc_msg_with_param() for normal timeout cases
> > 
> >   Halt on any error.
> > 
> > 2, smu_cmn_send_msg_without_waiting()/smu_cmn_wait_for_response()
> > for longer timeout cases
> > 
> >   Halt on errors apart from ETIME. Otherwise this way won't work.
> > 
> > 3, smu_cmn_send_msg_without_waiting() for no waiting cases
> > 
> >   Halt on errors apart from ETIME. Otherwise the second way won't work.
> > 
> > After halting, use BUG() to explicitly notify users.
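
The halt policy described above can be sketched as a small decision function. This is a plain userspace illustration, not driver code: the enum and smu_should_halt() are hypothetical names, and only the rule itself (halt on any error for the with-param path, halt on any error except -ETIME for the other two) is taken from the patch.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical labels for the three send paths; not part of the patch. */
enum smu_send_path {
	SMU_SEND_WITH_PARAM,      /* smu_cmn_send_smc_msg_with_param() */
	SMU_SEND_WITHOUT_WAITING, /* smu_cmn_send_msg_without_waiting() */
	SMU_WAIT_FOR_RESPONSE,    /* smu_cmn_wait_for_response() */
};

/*
 * Returns true when the patch would call amdgpu_device_halt() and BUG()
 * for the given path and result code.
 */
static bool smu_should_halt(bool debug_mode, enum smu_send_path path, int res)
{
	/* Nothing to do when the option is off or the call succeeded. */
	if (!debug_mode || res == 0)
		return false;

	/* Normal timeout path: every error is fatal, including -ETIME. */
	if (path == SMU_SEND_WITH_PARAM)
		return true;

	/* Other paths: -ETIME is tolerated so a later wait can still run. */
	return res != -ETIME;
}
```

For example, smu_should_halt(true, SMU_WAIT_FOR_RESPONSE, -ETIME) is false, while the same result code on SMU_SEND_WITH_PARAM halts.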
> > 
> > == Command Guide ==
> > 
> > 1, enable SMU debug option
> > 
> >  # echo 1 > /sys/kernel/debug/dri/0/amdgpu_smu_debug
> > 
> > 2, disable SMU debug option
> > 
> >  # echo 0 > /sys/kernel/debug/dri/0/amdgpu_smu_debug
> > 
> > v4:
> >  - Set to halt state instead of a simple hang.(Christian)
> > 
> > v3:
> >  - Use debugfs_create_bool().(Christian)
> >  - Put variable into smu_context struct.
> >  - Don't resend command when timeout.
> > 
> > v2:
> >  - Resend command when timeout.(Lijo)
> >  - Use debugfs file instead of module parameter.
> > 
> > Signed-off-by: Lang Yu <lang.yu@amd.com>
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c |  3 +++
> >  drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h     |  5 +++++
> >  drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c      | 20 +++++++++++++++++++-
> >  3 files changed, 27 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
> > index 164d6a9e9fbb..86cd888c7822 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
> > @@ -1618,6 +1618,9 @@ int amdgpu_debugfs_init(struct amdgpu_device *adev)
> >  	if (!debugfs_initialized())
> >  		return 0;
> > 
> > +	debugfs_create_bool("amdgpu_smu_debug", 0600, root,
> > +				  &adev->smu.smu_debug_mode);
> > +
> >  	ent = debugfs_create_file("amdgpu_preempt_ib", 0600, root, adev,
> >  				  &fops_ib_preempt);
> >  	if (IS_ERR(ent)) {
> > diff --git a/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h b/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h
> > index f738f7dc20c9..50dbf5594a9d 100644
> > --- a/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h
> > +++ b/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h
> > @@ -569,6 +569,11 @@ struct smu_context
> >  	struct smu_user_dpm_profile user_dpm_profile;
> > 
> >  	struct stb_context stb_context;
> > +	/*
> > +	 * When enabled, it makes SMU errors fatal.
> > +	 * (0 = disabled (default), 1 = enabled)
> > +	 */
> > +	bool smu_debug_mode;
> [Quan, Evan] Can you expand this into a bit mask (like ppfeaturemask), so that we can add support for other debug features in the future?
> >  };

OK.
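
A rough sketch of what the bit-mask form could look like, in plain userspace C. All names here (the mask macros, struct smu_debug_ctx, smu_debug_enabled()) are hypothetical and only illustrate the idea; the actual layout is left to the next revision.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit assignments; only bit 0 corresponds to this patch. */
#define SMU_DEBUG_HALT_ON_ERROR_MASK (1u << 0)
#define SMU_DEBUG_RESERVED_MASK      (1u << 1) /* placeholder for a future feature */

/* Stand-in for the field that would replace "bool smu_debug_mode". */
struct smu_debug_ctx {
	uint32_t smu_debug_mask;
};

/* Returns true when the given debug feature bit is set in the mask. */
static bool smu_debug_enabled(const struct smu_debug_ctx *ctx, uint32_t feature)
{
	return (ctx->smu_debug_mask & feature) != 0;
}
```

With a u32 field, the debugfs entry could be created with debugfs_create_x32() instead of debugfs_create_bool(), so a mask value would be written to the file instead of 0/1.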

> > 
> >  struct i2c_adapter;
> > diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> > index 048ca1673863..84016d22c075 100644
> > --- a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> > +++ b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> > @@ -272,6 +272,11 @@ int smu_cmn_send_msg_without_waiting(struct smu_context *smu,
> >  	__smu_cmn_send_msg(smu, msg_index, param);
> >  	res = 0;
> >  Out:
> > +	if (unlikely(smu->smu_debug_mode) && res && (res != -ETIME)) {
> > +		amdgpu_device_halt(smu->adev);
> > +		BUG();
> [Quan, Evan] I agree amdgpu_device_halt() is a good idea. Christian and Andrey can share more insights with you about that.
> Do we still need the "BUG()" then?

The BUG() is used to explicitly notify users that something went
wrong. Otherwise userspace may not know immediately.
The FW guys requested this in the ticket.

Regards,
Lang

> BR
> Evan
> > +	}
> > +
> >  	return res;
> >  }
> > 
> > @@ -288,9 +293,17 @@ int smu_cmn_send_msg_without_waiting(struct smu_context *smu,
> >  int smu_cmn_wait_for_response(struct smu_context *smu)
> >  {
> >  	u32 reg;
> > +	int res;
> > 
> >  	reg = __smu_cmn_poll_stat(smu);
> > -	return __smu_cmn_reg2errno(smu, reg);
> > +	res = __smu_cmn_reg2errno(smu, reg);
> > +
> > +	if (unlikely(smu->smu_debug_mode) && res && (res != -ETIME)) {
> > +		amdgpu_device_halt(smu->adev);
> > +		BUG();
> > +	}
> > +
> > +	return res;
> >  }
> > 
> >  /**
> > @@ -357,6 +370,11 @@ int smu_cmn_send_smc_msg_with_param(struct smu_context *smu,
> >  	if (read_arg)
> >  		smu_cmn_read_arg(smu, read_arg);
> >  Out:
> > +	if (unlikely(smu->smu_debug_mode) && res) {
> > +		amdgpu_device_halt(smu->adev);
> > +		BUG();
> > +	}
> > +
> >  	mutex_unlock(&smu->message_lock);
> >  	return res;
> >  }
> > --
> > 2.25.1
