linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
From: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
To: Christophe Lombard <clombard@linux.vnet.ibm.com>,
	imunsie@au1.ibm.com, fbarrat@linux.vnet.ibm.com
Cc: linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH] cxl: Add a kernel thread to check the coherent platform function's state
Date: Tue, 19 Apr 2016 12:40:07 +1000	[thread overview]
Message-ID: <57159A87.70705@au1.ibm.com> (raw)
In-Reply-To: <1460984701-21490-1-git-send-email-clombard@linux.vnet.ibm.com>

On 18/04/16 23:05, Christophe Lombard wrote:
> In the POWERVM environement, the PHYP CoherentAccel component manages

environment

> the state of the Coherant Accelerator Processor Interface adapter and

Coherent

> virtualizes CAPI resources, handles CAPP, PSL, PSL Slice errors - and
> interrupts - and provides a new set of HCALLs for the OS APIs to utilize
> AFUs.
>
> During the course of operation, a coherent platform function can
> encounter errors. Some possible reason for errors are:
> • Hardware recoverable and unrecoverable errors
> • Transient and over-threshold correctable errors
>
> PHYP implements its own state model for the coherent platform function.
> The current state of this Acclerator Fonction Unit (AFU) is available

Accelerator Function Unit

> through a hcall.
>
> In case of low-level troubles (or error injection), The PHYP component

the

> may reset the card and change the AFU state. The PHYP interface doesn't
> provide any way to be notified when that happens.
>
> The current implementation of the cxl driver, for the POWERVM
> environment, follows the general error recovery procedures required to
> reset operation of the coherent platform function. The platform firmware
> resets and reconfigures hardware when an external action is required -
> attach/detach a process, link ok, ....
>
> The purpose of this patch is to interact with the external driver
> (where the AFU is shown) even if no action is required. A kernel thread
> is needed to check every x seconds the current state of the AFU to see
> if we need to enter an error recovery path.
>
> Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com>

A few minor issues below.

> diff --git a/drivers/misc/cxl/guest.c b/drivers/misc/cxl/guest.c
> index 8213372..06dfe7f 100644
> --- a/drivers/misc/cxl/guest.c
> +++ b/drivers/misc/cxl/guest.c
> @@ -19,6 +19,10 @@
>   #define CXL_SLOT_RESET_EVENT		2
>   #define CXL_RESUME_EVENT		3
>
> +#define CXL_KTHREAD 			"cxl_kthread"
> +
> +void stop_state_thread(struct cxl_afu *afu);

static?

[...]

> -static int afu_do_recovery(struct cxl_afu *afu)
> +static int handle_state_thread(void *data)
>   {
> -	int rc;
> +	struct cxl_afu *afu;
> +	int rc = 0;

It looks like we don't use rc (see also comment below).

>
> -	/* many threads can arrive here, in case of detach_all for example.
> -	 * Only one needs to drive the recovery
> -	 */
> -	if (mutex_trylock(&afu->guest->recovery_lock)) {
> -		rc = afu_update_state(afu);
> -		mutex_unlock(&afu->guest->recovery_lock);
> -		return rc;
> +	pr_devel("in %s\n", __func__);
> +
> +	afu = (struct cxl_afu*)data;

CodingStyle: space between cxl_afu and *

> +	do {
> +		set_current_state(TASK_INTERRUPTIBLE);
> +
> +		if (afu) {
> +			afu_update_state(afu);

Should we be checking the retval here?

> +			if (afu->guest->previous_state == H_STATE_PERM_UNAVAILABLE)
> +				goto out;
> +		} else
> +			return -ENODEV;
> +		schedule_timeout(msecs_to_jiffies(3000));
> +	} while(!kthread_should_stop());

CodingStyle: space between while and (

> +
> +out:
> +	afu->guest->kthread_tsk = NULL;
> +	return rc;
> +}
> +
> +void start_state_thread(struct cxl_afu *afu)

static?

> +{
> +	if (afu->guest->kthread_tsk)
> +		return;
> +
> +	/* start kernel thread to handle the state of the afu */
> +	afu->guest->kthread_tsk = kthread_run(&handle_state_thread,
> +				  (void *)afu, CXL_KTHREAD);
> +	if (IS_ERR(afu->guest->kthread_tsk)) {
> +		pr_devel("cannot start state kthread\n");
> +		afu->guest->kthread_tsk = NULL;
>   	}
> -	return 0;
> +}
> +
> +void stop_state_thread(struct cxl_afu *afu)

static?

-- 
Andrew Donnellan              OzLabs, ADL Canberra
andrew.donnellan@au1.ibm.com  IBM Australia Limited

  reply	other threads:[~2016-04-19  2:41 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-04-18 13:05 [PATCH] cxl: Add a kernel thread to check the coherent platform function's state Christophe Lombard
2016-04-19  2:40 ` Andrew Donnellan [this message]
2016-04-19  9:15   ` christophe lombard
2016-04-19  9:47 ` Michael Ellerman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=57159A87.70705@au1.ibm.com \
    --to=andrew.donnellan@au1.ibm.com \
    --cc=clombard@linux.vnet.ibm.com \
    --cc=fbarrat@linux.vnet.ibm.com \
    --cc=imunsie@au1.ibm.com \
    --cc=linuxppc-dev@lists.ozlabs.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).