From: "Ville Syrjälä" <ville.syrjala@linux.intel.com>
To: Tomas Winkler <tomas.winkler@intel.com>
Cc: Alan Previn <alan.previn.teres.alexis@intel.com>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	intel-gfx@lists.freedesktop.org,
	Alexander Usyskin <alexander.usyskin@intel.com>,
	linux-kernel@vger.kernel.org,
	Vitaly Lubart <vitaly.lubart@intel.com>
Subject: Re: [Intel-gfx] [char-misc-next 3/4] mei: pxp: re-enable client on errors
Date: Tue, 14 Nov 2023 16:00:27 +0200
Message-ID: <ZVN9e3BczixJy_1H@intel.com>
In-Reply-To: <20231011110157.247552-4-tomas.winkler@intel.com>

On Wed, Oct 11, 2023 at 02:01:56PM +0300, Tomas Winkler wrote:
> From: Alexander Usyskin <alexander.usyskin@intel.com>
> 
> Disable and enable mei-pxp client on errors to clean the internal state.

This broke i915 on my Alderlake-P laptop.

Trying to start Xorg just hangs and I eventually have to power off the
laptop to get things back into shape.

The behaviour gets a bit better after commit fb99e79ee62a ("mei: update mei-pxp's
component interface with timeouts") as Xorg "only" gets blocked for
~10 seconds, after which it manages to start, and I get a bunch of spew
in dmesg:
[   25.431535] i915 0000:00:02.0: [drm] *ERROR* Failed to send PXP TEE message
[   30.435241] mei_pxp 0000:00:16.0-fbf6fcf1-96cf-4e2e-a6a6-1bab8cbe36b1: Trying to reset the channel...
[   30.435965] i915 0000:00:02.0: [drm] *ERROR* Failed to send PXP TEE message
[   30.437341] i915 0000:00:02.0: [drm] *ERROR* Failed to send PXP TEE message
[   30.437356] i915 0000:00:02.0: [drm] *ERROR* Failed to send tee msg for inv-stream-key-15, ret=[28]
[   35.555210] mei_pxp 0000:00:16.0-fbf6fcf1-96cf-4e2e-a6a6-1bab8cbe36b1: Trying to reset the channel...
[   35.555919] i915 0000:00:02.0: [drm] *ERROR* Failed to send PXP TEE message
[   35.555937] i915 0000:00:02.0: [drm] *ERROR* Failed to send tee msg init arb session, ret=[-62]
[   35.555941] i915 0000:00:02.0: [drm] *ERROR* tee cmd for arb session creation failed
[   35.556765] i915 0000:00:02.0: [drm] *ERROR* Failed to send PXP TEE message
[   36.021808] fuse: init (API version 7.39)
[   40.675183] mei_pxp 0000:00:16.0-fbf6fcf1-96cf-4e2e-a6a6-1bab8cbe36b1: Trying to reset the channel...
[   40.676045] i915 0000:00:02.0: [drm] *ERROR* Failed to send PXP TEE message
[   40.676591] i915 0000:00:02.0: [drm] *ERROR* Failed to send PXP TEE message
[   40.676602] i915 0000:00:02.0: [drm] *ERROR* Failed to send tee msg for inv-stream-key-15, ret=[28]
[   40.960209] mate-session-ch[5936]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set
[   45.795172] mei_pxp 0000:00:16.0-fbf6fcf1-96cf-4e2e-a6a6-1bab8cbe36b1: Trying to reset the channel...
[   45.795872] i915 0000:00:02.0: [drm] *ERROR* Failed to send PXP TEE message
[   45.796520] i915 0000:00:02.0: [drm] *ERROR* Failed to send PXP TEE message
[   50.915183] mei_pxp 0000:00:16.0-fbf6fcf1-96cf-4e2e-a6a6-1bab8cbe36b1: Trying to reset the channel...
[   50.916005] i915 0000:00:02.0: [drm] *ERROR* Failed to send PXP TEE message
[   50.916012] i915 0000:00:02.0: [drm] *ERROR* Failed to send tee msg for inv-stream-key-15, ret=[-62]
[   50.916846] i915 0000:00:02.0: [drm] *ERROR* Failed to send PXP TEE message
[   56.035149] mei_pxp 0000:00:16.0-fbf6fcf1-96cf-4e2e-a6a6-1bab8cbe36b1: Trying to reset the channel...
[   56.035956] i915 0000:00:02.0: [drm] *ERROR* Failed to send PXP TEE message
[   56.036585] i915 0000:00:02.0: [drm] *ERROR* Failed to send PXP TEE message
[   56.036592] i915 0000:00:02.0: [drm] *ERROR* Failed to send tee msg for inv-stream-key-15, ret=[28]
[   61.155137] mei_pxp 0000:00:16.0-fbf6fcf1-96cf-4e2e-a6a6-1bab8cbe36b1: Trying to reset the channel...

The same spew repeats every time I run any application that uses the GPU,
and the application also gets blocked for a long time (e.g. Firefox now
takes over 15 seconds to start).

> 
> Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com>
> Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
> ---
>  drivers/misc/mei/pxp/mei_pxp.c | 70 +++++++++++++++++++++++-----------
>  1 file changed, 48 insertions(+), 22 deletions(-)
> 
> diff --git a/drivers/misc/mei/pxp/mei_pxp.c b/drivers/misc/mei/pxp/mei_pxp.c
> index c6cdd6a47308ebcc72f34c38..9875d16445bb03efcfb31cd9 100644
> --- a/drivers/misc/mei/pxp/mei_pxp.c
> +++ b/drivers/misc/mei/pxp/mei_pxp.c
> @@ -23,6 +23,24 @@
>  
>  #include "mei_pxp.h"
>  
> +static inline int mei_pxp_reenable(const struct device *dev, struct mei_cl_device *cldev)
> +{
> +	int ret;
> +
> +	dev_warn(dev, "Trying to reset the channel...\n");
> +	ret = mei_cldev_disable(cldev);
> +	if (ret < 0)
> +		dev_warn(dev, "mei_cldev_disable failed. %d\n", ret);
> +	/*
> +	 * Explicitly ignoring disable failure,
> +	 * enable may fix the states and succeed
> +	 */
> +	ret = mei_cldev_enable(cldev);
> +	if (ret < 0)
> +		dev_err(dev, "mei_cldev_enable failed. %d\n", ret);
> +	return ret;
> +}
> +
>  /**
>   * mei_pxp_send_message() - Sends a PXP message to ME FW.
>   * @dev: device corresponding to the mei_cl_device
> @@ -35,6 +53,7 @@ mei_pxp_send_message(struct device *dev, const void *message, size_t size)
>  {
>  	struct mei_cl_device *cldev;
>  	ssize_t byte;
> +	int ret;
>  
>  	if (!dev || !message)
>  		return -EINVAL;
> @@ -44,10 +63,20 @@ mei_pxp_send_message(struct device *dev, const void *message, size_t size)
>  	byte = mei_cldev_send(cldev, message, size);
>  	if (byte < 0) {
>  		dev_dbg(dev, "mei_cldev_send failed. %zd\n", byte);
> -		return byte;
> +		switch (byte) {
> +		case -ENOMEM:
> +			fallthrough;
> +		case -ENODEV:
> +			fallthrough;
> +		case -ETIME:
> +			ret = mei_pxp_reenable(dev, cldev);
> +			if (ret)
> +				byte = ret;
> +			break;
> +		}
>  	}
>  
> -	return 0;
> +	return byte;
>  }
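
A note on the return value in this hunk: `return 0` became `return byte`,
so if mei_cldev_send() reports the number of bytes sent (as the
`ssize_t byte` variable suggests), a successful send now returns a
positive count instead of 0. That would be consistent with the ret=[28]
lines in the dmesg output above, though nothing in this message confirms
it. A minimal user-space sketch of the two conventions follows.
fake_cldev_send(), send_before_patch(), send_after_patch() and
caller_sees_failure() are hypothetical names used only for illustration,
not kernel or i915 functions, and the "non-zero means failure" caller is
an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/*
 * Hypothetical stand-in for mei_cldev_send(): returns the number of
 * bytes sent, or a negative errno when fail_errno is non-zero.
 */
static ssize_t fake_cldev_send(size_t size, int fail_errno)
{
	assert(fail_errno >= 0);
	return fail_errno ? -(ssize_t)fail_errno : (ssize_t)size;
}

/* Convention before the patch: negative errno on failure, 0 on success. */
static int send_before_patch(size_t size, int fail_errno)
{
	ssize_t byte = fake_cldev_send(size, fail_errno);

	if (byte < 0)
		return (int)byte;
	return 0;
}

/* Convention after the patch: 'byte' is returned on every path. */
static int send_after_patch(size_t size, int fail_errno)
{
	ssize_t byte = fake_cldev_send(size, fail_errno);

	/* Re-enable handling omitted: it only affects the failure path. */
	return (int)byte;
}

/* Assumed caller behaviour: any non-zero return counts as a failure. */
static int caller_sees_failure(int ret)
{
	return ret != 0;
}
```

With a 28-byte message and no injected error, send_before_patch()
yields 0 while send_after_patch() yields 28, which the assumed caller
would treat as a failure.
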
>  
>  /**
> @@ -63,6 +92,7 @@ mei_pxp_receive_message(struct device *dev, void *buffer, size_t size)
>  	struct mei_cl_device *cldev;
>  	ssize_t byte;
>  	bool retry = false;
> +	int ret;
>  
>  	if (!dev || !buffer)
>  		return -EINVAL;
> @@ -73,26 +103,22 @@ mei_pxp_receive_message(struct device *dev, void *buffer, size_t size)
>  	byte = mei_cldev_recv(cldev, buffer, size);
>  	if (byte < 0) {
>  		dev_dbg(dev, "mei_cldev_recv failed. %zd\n", byte);
> -		if (byte != -ENOMEM)
> -			return byte;
> -
> -		/* Retry the read when pages are reclaimed */
> -		msleep(20);
> -		if (!retry) {
> -			retry = true;
> -			goto retry;
> -		} else {
> -			dev_warn(dev, "No memory on data receive after retry, trying to reset the channel...\n");
> -			byte = mei_cldev_disable(cldev);
> -			if (byte < 0)
> -				dev_warn(dev, "mei_cldev_disable failed. %zd\n", byte);
> -			/*
> -			 * Explicitly ignoring disable failure,
> -			 * enable may fix the states and succeed
> -			 */
> -			byte = mei_cldev_enable(cldev);
> -			if (byte < 0)
> -				dev_err(dev, "mei_cldev_enable failed. %zd\n", byte);
> +		switch (byte) {
> +		case -ENOMEM:
> +			/* Retry the read when pages are reclaimed */
> +			msleep(20);
> +			if (!retry) {
> +				retry = true;
> +				goto retry;
> +			}
> +			fallthrough;
> +		case -ENODEV:
> +			fallthrough;
> +		case -ETIME:
> +			ret = mei_pxp_reenable(dev, cldev);
> +			if (ret)
> +				byte = ret;
> +			break;
>  		}
>  	}
>  
> -- 
> 2.41.0
> 
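For reference, the control flow of the new receive-path switch can be
modelled in user space as below. This is only a sketch of the quoted
hunk: struct recv_script, fake_recv(), fake_reenable() and
model_receive() are hypothetical names, the scripted results stand in
for successive mei_cldev_recv() calls, and the msleep(20) is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical script of results for successive receive attempts. */
struct recv_script {
	const ssize_t *results;
	size_t count;		/* must be at least 1 */
	size_t next;
	int reenable_calls;	/* how often the re-enable path ran */
	int reenable_result;	/* what the modelled re-enable returns */
};

static ssize_t fake_recv(struct recv_script *s)
{
	size_t i;

	assert(s->count > 0);
	/* Repeat the last scripted result once the script runs out. */
	i = s->next < s->count ? s->next++ : s->count - 1;
	return s->results[i];
}

static int fake_reenable(struct recv_script *s)
{
	s->reenable_calls++;
	return s->reenable_result;
}

/* Mirrors the switch in the quoted hunk, minus logging and sleeping. */
static ssize_t model_receive(struct recv_script *s)
{
	ssize_t byte;
	bool retry = false;
	int ret;

retry:
	byte = fake_recv(s);
	if (byte < 0) {
		switch (byte) {
		case -ENOMEM:
			/* First -ENOMEM: retry the read once. */
			if (!retry) {
				retry = true;
				goto retry;
			}
			/* fall through */
		case -ENODEV:
		case -ETIME:
			/* Reset the channel; report its error if it fails. */
			ret = fake_reenable(s);
			if (ret)
				byte = ret;
			break;
		}
	}
	return byte;
}
```

In this model a single -ENOMEM followed by a successful read never
reaches the re-enable path, while -ENODEV, -ETIME or a second -ENOMEM
triggers exactly one re-enable. When the re-enable succeeds, the
original error is still what the function returns.
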

-- 
Ville Syrjälä
Intel
