Re: [PATCH] powerpc/rtas: Introduce rtas_get_sensor_nonblocking() for pci hotplug driver.

linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed

From: Nathan Lynch <nathanl@linux.ibm.com>
To: Mahesh Salgaonkar <mahesh@linux.ibm.com>,
	linuxppc-dev <linuxppc-dev@ozlabs.org>
Cc: Oliver O'Halloran <oohall@gmail.com>
Subject: Re: [PATCH] powerpc/rtas: Introduce rtas_get_sensor_nonblocking() for pci hotplug driver.
Date: Mon, 29 Nov 2021 22:53:41 -0600	[thread overview]
Message-ID: <87o862nt0q.fsf@linux.ibm.com> (raw)
In-Reply-To: <163817631601.2016996.16085383012429651821.stgit@jupiter>

Mahesh Salgaonkar <mahesh@linux.ibm.com> writes:
> When certain PHB HW failure causes phyp to recover PHB, it marks the PE
> state as temporarily unavailable until recovery is complete. This also
> triggers an EEH handler in Linux which needs to notify drivers, and perform
> recovery. But before notifying the driver about the pci error it uses
> get_adapter_state()->get-sesnor-state() operation of the hotplug_slot to
> determine if the slot contains a device or not. if the slot is empty, the
> recovery is skipped entirely.
>
> However on certain PHB failures, the rtas call get-sesnor-state() returns
> extended busy error (9902) until PHB is recovered by phyp. Once PHB is
> recovered, the get-sensor-state() returns success with correct presence
> status. The rtas call interface rtas_get_sensor() loops over the rtas call
> on extended delay return code (9902) until the return value is either
> success (0) or error (-1). This causes the EEH handler to get stuck for ~6
> seconds before it could notify that the pci error has been detected and
> stop any active operations.

I am curious whether you see any difference with "powerpc/rtas:
rtas_busy_delay() improvements" which was recently applied. It will
cause the the calling task to sleep in response to a 990x status instead
of immediately retrying:

https://git.kernel.org/powerpc/c/38f7b7067dae0c101be573106018e8af22a90fdf

If that commit helps then maybe this change isn't needed.

Otherwise, see my comments below.


> -int rtas_get_sensor_fast(int sensor, int index, int *state)
> +static int
> +__rtas_get_sensor(int sensor, int index, int *state, bool warn_on)

Boolean flag parameters in this style are undesirable. As a reader you
can't infer the significance of a 'true' or 'false' in the argument list
at the call site.

>  {
>  	int token = rtas_token("get-sensor-state");
>  	int rc;
> @@ -618,14 +619,26 @@ int rtas_get_sensor_fast(int sensor, int index, int *state)
>  		return -ENOENT;
>  
>  	rc = rtas_call(token, 2, 2, state, sensor, index);
> -	WARN_ON(rc == RTAS_BUSY || (rc >= RTAS_EXTENDED_DELAY_MIN &&
> -				    rc <= RTAS_EXTENDED_DELAY_MAX));
> +	WARN_ON(warn_on &&
> +		(rc == RTAS_BUSY || (rc >= RTAS_EXTENDED_DELAY_MIN &&
> +				    rc <= RTAS_EXTENDED_DELAY_MAX)));
>  
>  	if (rc < 0)
>  		return rtas_error_rc(rc);
>  	return rc;
>  }

Issues I see with this, in terms of correctness and convention:

* On non-negative status from rtas_call(), including 990x,
  __rtas_get_sensor() returns the RTAS status unchanged. On a negative
  status, it returns a Linux errno value. On a -2 (busy) status
  rtas_error_rc() prints an error message and returns -ERANGE. Seems
  difficult for a caller to handle. Generally we want rtas_* APIs to
  adhere to a Linux 0/-errno convention or to return the RTAS
  status unchanged, but not a mixture.

* __rtas_get_sensor() is called by rtas_get_sensor_fast() and
  rtas_get_sensor_nonblocking(), but is not called by rtas_get_sensor(),
  despite common practice with __-prefixed functions.

> +int rtas_get_sensor_fast(int sensor, int index, int *state)
> +{
> +	return __rtas_get_sensor(sensor, index, state, true);
> +}
> +
> +int rtas_get_sensor_nonblocking(int sensor, int index, int *state)
> +{
> +	return __rtas_get_sensor(sensor, index, state, false);
> +}
> +EXPORT_SYMBOL(rtas_get_sensor_nonblocking);
> +
>  bool rtas_indicator_present(int token, int *maxindex)
>  {
>  	int proplen, count, i;
> diff --git a/drivers/pci/hotplug/rpaphp_pci.c b/drivers/pci/hotplug/rpaphp_pci.c
> index c380bdacd1466..8a7d681254ce9 100644
> --- a/drivers/pci/hotplug/rpaphp_pci.c
> +++ b/drivers/pci/hotplug/rpaphp_pci.c
> @@ -23,7 +23,7 @@ int rpaphp_get_sensor_state(struct slot *slot, int *state)
>  	int rc;
>  	int setlevel;
>  
> -	rc = rtas_get_sensor(DR_ENTITY_SENSE, slot->index, state);
> +	rc = rtas_get_sensor_nonblocking(DR_ENTITY_SENSE, slot->index, state);
>  
>  	if (rc < 0) {
>  		if (rc == -EFAULT || rc == -EEXIST) {
> @@ -38,10 +38,10 @@ int rpaphp_get_sensor_state(struct slot *slot, int *state)
>  			if (rc < 0) {
>  				dbg("%s: power on slot[%s] failed rc=%d.\n",
>  				    __func__, slot->name, rc);
> -			} else {
> -				rc = rtas_get_sensor(DR_ENTITY_SENSE,
> -						     slot->index, state);
> +				return rc;
>  			}
> +			rc = rtas_get_sensor_nonblocking(DR_ENTITY_SENSE,
> +							 slot->index, state);
>  		} else if (rc == -ENODEV)
>  			info("%s: slot is unusable\n", __func__);
>  		else

If I'm reading it right rpaphp_get_sensor_state() now returns 9902 in
the situation this change is trying to address. I checked a couple of
its call sites and it seems like this is going to propagate back into
the PCI hotplug core which of course doesn't understand RTAS call
statuses. So this doesn't seem right.

Maybe it would be better to have rpaphp_get_sensor_state() invoke
rtas_call("get-sensor-state", ...) directly and code whatever special
behavior is needed there, instead of introducing a new exported API. The
driver seems to want to deal with the RTAS return values anyway - it's
implicitly mapping ENODEV, EFAULT, EEXIST from rtas_get_sensor() back to
-9002, -9000, -9001 respectively.

next prev parent reply	other threads:[~2021-11-30  4:54 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-11-29  8:58 [PATCH] powerpc/rtas: Introduce rtas_get_sensor_nonblocking() for pci hotplug driver Mahesh Salgaonkar
2021-11-29 22:54 ` Tyrel Datwyler
2021-11-30  1:06   ` Nathan Lynch
2021-11-30  1:21     ` Tyrel Datwyler
2021-11-30  4:53 ` Nathan Lynch [this message]
2021-11-30  9:31   ` Mahesh J Salgaonkar
2021-11-30 12:39     ` Nathan Lynch
2021-12-03 13:42   ` Mahesh J Salgaonkar
2021-12-09 15:03     ` Nathan Lynch

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87o862nt0q.fsf@linux.ibm.com \
    --to=nathanl@linux.ibm.com \
    --cc=linuxppc-dev@ozlabs.org \
    --cc=mahesh@linux.ibm.com \
    --cc=oohall@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).