Re: [Intel-gfx] [PATCH] drm/i915/guc: Check for ct enabled while waiting for response

From: "Dixit, Ashutosh" <ashutosh.dixit@intel.com>
To: Zhanjun Dong <zhanjun.dong@intel.com>
Cc: intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org
Subject: Re: [Intel-gfx] [PATCH] drm/i915/guc: Check for ct enabled while waiting for response
Date: Thu, 16 Jun 2022 21:42:59 -0700
Message-ID: <87mtebx5m4.wl-ashutosh.dixit@intel.com>
In-Reply-To: <20220616220158.15778-1-zhanjun.dong@intel.com>

On Thu, 16 Jun 2022 15:01:59 -0700, Zhanjun Dong wrote:
>
> We are seeing error message of "No response for request". Some cases
> happened while waiting for response and reset/suspend action was triggered.
> In this case, no response is not an error, active requests will be
> cancelled.
>
> This patch will handle this condition and change the error message into
> debug message.

The convention we follow in drm is to record the version of the patch and
what changed in that version.

Generally I am ok with this version of the patch but still have a couple of
questions.

> -static int wait_for_ct_request_update(struct ct_request *req, u32 *status)
> +static int wait_for_ct_request_update(struct intel_guc_ct *ct, struct ct_request *req, u32 *status)
>  {
>	int err;
> +	bool ct_enabled;
>
>	/*
>	 * Fast commands should complete in less than 10us, so sample quickly
> @@ -481,12 +483,15 @@ static int wait_for_ct_request_update(struct ct_request *req, u32 *status)
>  #define GUC_CTB_RESPONSE_TIMEOUT_SHORT_MS 10
>  #define GUC_CTB_RESPONSE_TIMEOUT_LONG_MS 1000
>  #define done \
> -	(FIELD_GET(GUC_HXG_MSG_0_ORIGIN, READ_ONCE(req->status)) == \
> +	(!(ct_enabled = intel_guc_ct_enabled(ct)) || \
> +	 FIELD_GET(GUC_HXG_MSG_0_ORIGIN, READ_ONCE(req->status)) == \
>	 GUC_HXG_ORIGIN_GUC)
>	err = wait_for_us(done, GUC_CTB_RESPONSE_TIMEOUT_SHORT_MS);
>	if (err)
>		err = wait_for(done, GUC_CTB_RESPONSE_TIMEOUT_LONG_MS);
>  #undef done
> +	if (!ct_enabled)
> +		err = -ECANCELED;

So we have the choice of either setting the request status here as I was
suggesting earlier, e.g. as follows:

	#define   GUC_HXG_TYPE_REQUEST_CANCELED        4u // unused value

	if (!ct_enabled)
		req->status = GUC_HXG_TYPE_REQUEST_CANCELED;

We would return 0 in this case and would check for the req->status value
above where needed.

Or we can return -ECANCELED. I don't know if -ECANCELED is the right value
to return, but whatever we return will have to be unique (unused elsewhere
in this path) since we are relying on the return value. -ECANCELED is
unique, so that part is ok.

Do other reviewers have a preference whether we should set req->status or
return a unique return value?
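
To make the two options concrete, here is a minimal standalone sketch. It is
not i915 code: struct mock_ct, struct mock_request, the MOCK_STATUS_* values
and both function names are hypothetical stand-ins, and the wait loop is
reduced to a single check. It only shows where the "cancelled" information
ends up in each case.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration only, not the i915 definitions. */
#define MOCK_STATUS_PENDING	0u
#define MOCK_STATUS_DONE	1u
#define MOCK_STATUS_CANCELED	4u	/* mirrors the "unused value" idea */

struct mock_ct { bool enabled; };
struct mock_request { uint32_t status; };

/* Option A: report cancellation through a distinct return value. */
static int wait_returning_errno(const struct mock_ct *ct,
				const struct mock_request *req,
				uint32_t *status)
{
	int err = 0;

	assert(status != NULL);
	if (!ct->enabled)
		err = -ECANCELED;	/* caller must compare against this */
	else if (req->status != MOCK_STATUS_DONE)
		err = -ETIMEDOUT;

	*status = req->status;
	return err;
}

/* Option B: record cancellation in the request status and return 0. */
static int wait_setting_status(const struct mock_ct *ct,
			       struct mock_request *req,
			       uint32_t *status)
{
	int err = 0;

	assert(status != NULL);
	if (!ct->enabled)
		req->status = MOCK_STATUS_CANCELED;	/* caller checks status */
	else if (req->status != MOCK_STATUS_DONE)
		err = -ETIMEDOUT;

	*status = req->status;
	return err;
}
```

With option A every caller that wants to stay quiet on cancellation has to
compare the return value against -ECANCELED. With option B the return path
is untouched, but every consumer of the status has to know about the new
value.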

>	*status = req->status;
>	return err;
> @@ -703,11 +708,15 @@ static int ct_send(struct intel_guc_ct *ct,
>
>	intel_guc_notify(ct_to_guc(ct));
>
> -	err = wait_for_ct_request_update(&request, status);
> +	err = wait_for_ct_request_update(ct, &request, status);
>	g2h_release_space(ct, GUC_CTB_HXG_MSG_MAX_LEN);
>	if (unlikely(err)) {
> -		CT_ERROR(ct, "No response for request %#x (fence %u)\n",
> -			 action[0], request.fence);
> +		if (err == -ECANCELED)
> +			CT_DEBUG(ct, "Request %#x (fence %u) cancelled as CTB is disabled\n",
> +				 action[0], request.fence);
> +		else
> +			CT_ERROR(ct, "No response for request %#x (fence %u)\n",
> +				 action[0], request.fence);
>		goto unlink;
>	}
>
> @@ -771,8 +780,9 @@ int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
>
>	ret = ct_send(ct, action, len, response_buf, response_buf_size, &status);
>	if (unlikely(ret < 0)) {
> -		CT_ERROR(ct, "Sending action %#x failed (%pe) status=%#X\n",
> -			 action[0], ERR_PTR(ret), status);
> +		if (ret != -ECANCELED)
> +			CT_ERROR(ct, "Sending action %#x failed (%pe) status=%#X\n",
> +				 action[0], ERR_PTR(ret), status);

I am wondering why we even have this print, and whether we should just
delete it or convert it to CT_DEBUG(). The reason is that only the error
prints closest to where the actual error occurs are useful, since they
pinpoint the error clearly. This seems to me to be a "second" print from a
higher-level function, which does not seem particularly useful.
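
As a rough illustration of that point (a standalone sketch, not the driver
code; log_error(), low_level_send(), high_level_send() and the error_prints
counter are all made-up names), the lower-level function logs once where the
failure is detected and the higher-level one only propagates the error:

```c
#include <errno.h>
#include <stdio.h>

/* Counts error-level messages, purely for illustration. */
static int error_prints;

static void log_error(const char *msg)
{
	error_prints++;
	fprintf(stderr, "error: %s\n", msg);
}

/* Low level: detects the failure and logs it once, close to the cause. */
static int low_level_send(int simulate_timeout)
{
	if (simulate_timeout) {
		log_error("no response for request");
		return -ETIMEDOUT;
	}
	return 0;
}

/* Higher level: propagates the error without printing it a second time. */
static int high_level_send(int simulate_timeout)
{
	return low_level_send(simulate_timeout);
}
```

A failing call then produces exactly one error line, and nothing above it
needs a special case to suppress a duplicate.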


>	} else if (unlikely(ret)) {
>		CT_DEBUG(ct, "send action %#x returned %d (%#x)\n",
>			 action[0], ret, ret);
> --
> 2.36.0
>
