public inbox for linux-kernel@vger.kernel.org
From: "Ville Syrjälä" <ville.syrjala@linux.intel.com>
To: Tomas Winkler <tomas.winkler@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Alexander Usyskin <alexander.usyskin@intel.com>,
	Vitaly Lubart <vitaly.lubart@intel.com>,
	linux-kernel@vger.kernel.org, intel-gfx@lists.freedesktop.org,
	Alan Previn <alan.previn.teres.alexis@intel.com>
Subject: Re: [char-misc-next 3/4] mei: pxp: re-enable client on errors
Date: Tue, 14 Nov 2023 16:00:27 +0200
Message-ID: <ZVN9e3BczixJy_1H@intel.com>
In-Reply-To: <20231011110157.247552-4-tomas.winkler@intel.com>

On Wed, Oct 11, 2023 at 02:01:56PM +0300, Tomas Winkler wrote:
> From: Alexander Usyskin <alexander.usyskin@intel.com>
> 
> Disable and enable mei-pxp client on errors to clean the internal state.

This broke i915 on my Alder Lake-P laptop.

Trying to start Xorg just hangs and I eventually have to power off the
laptop to get things back into shape.

The behaviour gets a bit better after commit fb99e79ee62a ("mei: update mei-pxp's
component interface with timeouts") as Xorg "only" gets blocked for
~10 seconds, after which it manages to start, and I get a bunch of spew
in dmesg:
[   25.431535] i915 0000:00:02.0: [drm] *ERROR* Failed to send PXP TEE message
[   30.435241] mei_pxp 0000:00:16.0-fbf6fcf1-96cf-4e2e-a6a6-1bab8cbe36b1: Trying to reset the channel...
[   30.435965] i915 0000:00:02.0: [drm] *ERROR* Failed to send PXP TEE message
[   30.437341] i915 0000:00:02.0: [drm] *ERROR* Failed to send PXP TEE message
[   30.437356] i915 0000:00:02.0: [drm] *ERROR* Failed to send tee msg for inv-stream-key-15, ret=[28]
[   35.555210] mei_pxp 0000:00:16.0-fbf6fcf1-96cf-4e2e-a6a6-1bab8cbe36b1: Trying to reset the channel...
[   35.555919] i915 0000:00:02.0: [drm] *ERROR* Failed to send PXP TEE message
[   35.555937] i915 0000:00:02.0: [drm] *ERROR* Failed to send tee msg init arb session, ret=[-62]
[   35.555941] i915 0000:00:02.0: [drm] *ERROR* tee cmd for arb session creation failed
[   35.556765] i915 0000:00:02.0: [drm] *ERROR* Failed to send PXP TEE message
[   36.021808] fuse: init (API version 7.39)
[   40.675183] mei_pxp 0000:00:16.0-fbf6fcf1-96cf-4e2e-a6a6-1bab8cbe36b1: Trying to reset the channel...
[   40.676045] i915 0000:00:02.0: [drm] *ERROR* Failed to send PXP TEE message
[   40.676591] i915 0000:00:02.0: [drm] *ERROR* Failed to send PXP TEE message
[   40.676602] i915 0000:00:02.0: [drm] *ERROR* Failed to send tee msg for inv-stream-key-15, ret=[28]
[   40.960209] mate-session-ch[5936]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set
[   45.795172] mei_pxp 0000:00:16.0-fbf6fcf1-96cf-4e2e-a6a6-1bab8cbe36b1: Trying to reset the channel...
[   45.795872] i915 0000:00:02.0: [drm] *ERROR* Failed to send PXP TEE message
[   45.796520] i915 0000:00:02.0: [drm] *ERROR* Failed to send PXP TEE message
[   50.915183] mei_pxp 0000:00:16.0-fbf6fcf1-96cf-4e2e-a6a6-1bab8cbe36b1: Trying to reset the channel...
[   50.916005] i915 0000:00:02.0: [drm] *ERROR* Failed to send PXP TEE message
[   50.916012] i915 0000:00:02.0: [drm] *ERROR* Failed to send tee msg for inv-stream-key-15, ret=[-62]
[   50.916846] i915 0000:00:02.0: [drm] *ERROR* Failed to send PXP TEE message
[   56.035149] mei_pxp 0000:00:16.0-fbf6fcf1-96cf-4e2e-a6a6-1bab8cbe36b1: Trying to reset the channel...
[   56.035956] i915 0000:00:02.0: [drm] *ERROR* Failed to send PXP TEE message
[   56.036585] i915 0000:00:02.0: [drm] *ERROR* Failed to send PXP TEE message
[   56.036592] i915 0000:00:02.0: [drm] *ERROR* Failed to send tee msg for inv-stream-key-15, ret=[28]
[   61.155137] mei_pxp 0000:00:16.0-fbf6fcf1-96cf-4e2e-a6a6-1bab8cbe36b1: Trying to reset the channel...

The same spew repeats every time I run any application that uses the GPU,
and the application also gets blocked for a long time (e.g. firefox now
takes over 15 seconds to start).

> 
> Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com>
> Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
> ---
>  drivers/misc/mei/pxp/mei_pxp.c | 70 +++++++++++++++++++++++-----------
>  1 file changed, 48 insertions(+), 22 deletions(-)
> 
> diff --git a/drivers/misc/mei/pxp/mei_pxp.c b/drivers/misc/mei/pxp/mei_pxp.c
> index c6cdd6a47308ebcc72f34c38..9875d16445bb03efcfb31cd9 100644
> --- a/drivers/misc/mei/pxp/mei_pxp.c
> +++ b/drivers/misc/mei/pxp/mei_pxp.c
> @@ -23,6 +23,24 @@
>  
>  #include "mei_pxp.h"
>  
> +static inline int mei_pxp_reenable(const struct device *dev, struct mei_cl_device *cldev)
> +{
> +	int ret;
> +
> +	dev_warn(dev, "Trying to reset the channel...\n");
> +	ret = mei_cldev_disable(cldev);
> +	if (ret < 0)
> +		dev_warn(dev, "mei_cldev_disable failed. %d\n", ret);
> +	/*
> +	 * Explicitly ignoring disable failure,
> +	 * enable may fix the states and succeed
> +	 */
> +	ret = mei_cldev_enable(cldev);
> +	if (ret < 0)
> +		dev_err(dev, "mei_cldev_enable failed. %d\n", ret);
> +	return ret;
> +}
> +
>  /**
>   * mei_pxp_send_message() - Sends a PXP message to ME FW.
>   * @dev: device corresponding to the mei_cl_device
> @@ -35,6 +53,7 @@ mei_pxp_send_message(struct device *dev, const void *message, size_t size)
>  {
>  	struct mei_cl_device *cldev;
>  	ssize_t byte;
> +	int ret;
>  
>  	if (!dev || !message)
>  		return -EINVAL;
> @@ -44,10 +63,20 @@ mei_pxp_send_message(struct device *dev, const void *message, size_t size)
>  	byte = mei_cldev_send(cldev, message, size);
>  	if (byte < 0) {
>  		dev_dbg(dev, "mei_cldev_send failed. %zd\n", byte);
> -		return byte;
> +		switch (byte) {
> +		case -ENOMEM:
> +			fallthrough;
> +		case -ENODEV:
> +			fallthrough;
> +		case -ETIME:
> +			ret = mei_pxp_reenable(dev, cldev);
> +			if (ret)
> +				byte = ret;
> +			break;
> +		}
>  	}
>  
> -	return 0;
> +	return byte;
>  }
>  
>  /**
> @@ -63,6 +92,7 @@ mei_pxp_receive_message(struct device *dev, void *buffer, size_t size)
>  	struct mei_cl_device *cldev;
>  	ssize_t byte;
>  	bool retry = false;
> +	int ret;
>  
>  	if (!dev || !buffer)
>  		return -EINVAL;
> @@ -73,26 +103,22 @@ mei_pxp_receive_message(struct device *dev, void *buffer, size_t size)
>  	byte = mei_cldev_recv(cldev, buffer, size);
>  	if (byte < 0) {
>  		dev_dbg(dev, "mei_cldev_recv failed. %zd\n", byte);
> -		if (byte != -ENOMEM)
> -			return byte;
> -
> -		/* Retry the read when pages are reclaimed */
> -		msleep(20);
> -		if (!retry) {
> -			retry = true;
> -			goto retry;
> -		} else {
> -			dev_warn(dev, "No memory on data receive after retry, trying to reset the channel...\n");
> -			byte = mei_cldev_disable(cldev);
> -			if (byte < 0)
> -				dev_warn(dev, "mei_cldev_disable failed. %zd\n", byte);
> -			/*
> -			 * Explicitly ignoring disable failure,
> -			 * enable may fix the states and succeed
> -			 */
> -			byte = mei_cldev_enable(cldev);
> -			if (byte < 0)
> -				dev_err(dev, "mei_cldev_enable failed. %zd\n", byte);
> +		switch (byte) {
> +		case -ENOMEM:
> +			/* Retry the read when pages are reclaimed */
> +			msleep(20);
> +			if (!retry) {
> +				retry = true;
> +				goto retry;
> +			}
> +			fallthrough;
> +		case -ENODEV:
> +			fallthrough;
> +		case -ETIME:
> +			ret = mei_pxp_reenable(dev, cldev);
> +			if (ret)
> +				byte = ret;
> +			break;
>  		}
>  	}
>  
> -- 
> 2.41.0
> 
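
One possible reading of the ret=[28] lines in the dmesg output above,
offered as an assumption rather than a confirmed diagnosis: the send hunk
replaces "return 0" with "return byte", so a successful send would now
return the positive byte count instead of 0. If the i915 caller treats any
non-zero return as an error (not verified here), a 28-byte message would be
reported as failed even though it was sent. The userspace sketch below
models just that return-convention change; fake_cldev_send(),
send_message_before(), send_message_after() and caller_sees_failure() are
hypothetical names for illustration, not kernel API.

```c
#include <stddef.h>
#include <sys/types.h>	/* ssize_t */

/* Hypothetical stand-in for mei_cldev_send(): reports the bytes "sent". */
ssize_t fake_cldev_send(const void *message, size_t size)
{
	(void)message;
	return (ssize_t)size;
}

/* Convention before the quoted hunk: 0 on success, negative errno on failure. */
int send_message_before(const void *message, size_t size)
{
	ssize_t byte = fake_cldev_send(message, size);

	if (byte < 0)
		return (int)byte;
	return 0;
}

/* Convention after the quoted hunk: the byte count is returned on success. */
int send_message_after(const void *message, size_t size)
{
	ssize_t byte = fake_cldev_send(message, size);

	return (int)byte;
}

/* Hypothetical caller that treats any non-zero return as a failure. */
int caller_sees_failure(int ret)
{
	return ret != 0;
}
```

With a 28-byte message, send_message_before() yields 0 and the caller
proceeds, while send_message_after() yields 28 and the same caller takes
its error path. The -62 values in the log are a separate case: 62 is ETIME
on Linux, which matches the new re-enable path being taken on timeouts.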

-- 
Ville Syrjälä
Intel
