All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
To: Christophe Lombard <clombard@linux.vnet.ibm.com>,
	imunsie@au1.ibm.com, fbarrat@linux.vnet.ibm.com
Cc: linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH] cxl: Add a kernel thread to check the coherent platform function's state
Date: Tue, 19 Apr 2016 12:40:07 +1000	[thread overview]
Message-ID: <57159A87.70705@au1.ibm.com> (raw)
In-Reply-To: <1460984701-21490-1-git-send-email-clombard@linux.vnet.ibm.com>

On 18/04/16 23:05, Christophe Lombard wrote:
> In the POWERVM environement, the PHYP CoherentAccel component manages

environment

> the state of the Coherant Accelerator Processor Interface adapter and

Coherent

> virtualizes CAPI resources, handles CAPP, PSL, PSL Slice errors - and
> interrupts - and provides a new set of HCALLs for the OS APIs to utilize
> AFUs.
>
> During the course of operation, a coherent platform function can
> encounter errors. Some possible reason for errors are:
> • Hardware recoverable and unrecoverable errors
> • Transient and over-threshold correctable errors
>
> PHYP implements its own state model for the coherent platform function.
> The current state of this Acclerator Fonction Unit (AFU) is available

Accelerator Function Unit

> through a hcall.
>
> In case of low-level troubles (or error injection), The PHYP component

the

> may reset the card and change the AFU state. The PHYP interface doesn't
> provide any way to be notified when that happens.
>
> The current implementation of the cxl driver, for the POWERVM
> environment, follows the general error recovery procedures required to
> reset operation of the coherent platform function. The platform firmware
> resets and reconfigures hardware when an external action is required -
> attach/detach a process, link ok, ....
>
> The purpose of this patch is to interact with the external driver
> (where the AFU is shown) even if no action is required. A kernel thread
> is needed to check every x seconds the current state of the AFU to see
> if we need to enter an error recovery path.
>
> Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com>

A few minor issues below.

> diff --git a/drivers/misc/cxl/guest.c b/drivers/misc/cxl/guest.c
> index 8213372..06dfe7f 100644
> --- a/drivers/misc/cxl/guest.c
> +++ b/drivers/misc/cxl/guest.c
> @@ -19,6 +19,10 @@
>   #define CXL_SLOT_RESET_EVENT		2
>   #define CXL_RESUME_EVENT		3
>
> +#define CXL_KTHREAD 			"cxl_kthread"
> +
> +void stop_state_thread(struct cxl_afu *afu);

static?

[...]

> -static int afu_do_recovery(struct cxl_afu *afu)
> +static int handle_state_thread(void *data)
>   {
> -	int rc;
> +	struct cxl_afu *afu;
> +	int rc = 0;

It looks like we don't use rc (see also comment below).

>
> -	/* many threads can arrive here, in case of detach_all for example.
> -	 * Only one needs to drive the recovery
> -	 */
> -	if (mutex_trylock(&afu->guest->recovery_lock)) {
> -		rc = afu_update_state(afu);
> -		mutex_unlock(&afu->guest->recovery_lock);
> -		return rc;
> +	pr_devel("in %s\n", __func__);
> +
> +	afu = (struct cxl_afu*)data;

CodingStyle: space between cxl_afu and *

> +	do {
> +		set_current_state(TASK_INTERRUPTIBLE);
> +
> +		if (afu) {
> +			afu_update_state(afu);

Should we be checking the retval here?

> +			if (afu->guest->previous_state == H_STATE_PERM_UNAVAILABLE)
> +				goto out;
> +		} else
> +			return -ENODEV;
> +		schedule_timeout(msecs_to_jiffies(3000));
> +	} while(!kthread_should_stop());

CodingStyle: space between while and (

> +
> +out:
> +	afu->guest->kthread_tsk = NULL;
> +	return rc;
> +}
> +
> +void start_state_thread(struct cxl_afu *afu)

static?

> +{
> +	if (afu->guest->kthread_tsk)
> +		return;
> +
> +	/* start kernel thread to handle the state of the afu */
> +	afu->guest->kthread_tsk = kthread_run(&handle_state_thread,
> +				  (void *)afu, CXL_KTHREAD);
> +	if (IS_ERR(afu->guest->kthread_tsk)) {
> +		pr_devel("cannot start state kthread\n");
> +		afu->guest->kthread_tsk = NULL;
>   	}
> -	return 0;
> +}
> +
> +void stop_state_thread(struct cxl_afu *afu)

static?

-- 
Andrew Donnellan              OzLabs, ADL Canberra
andrew.donnellan@au1.ibm.com  IBM Australia Limited

  reply	other threads:[~2016-04-19  2:41 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-04-18 13:05 [PATCH] cxl: Add a kernel thread to check the coherent platform function's state Christophe Lombard
2016-04-19  2:40 ` Andrew Donnellan [this message]
2016-04-19  9:15   ` christophe lombard
2016-04-19  9:47 ` Michael Ellerman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=57159A87.70705@au1.ibm.com \
    --to=andrew.donnellan@au1.ibm.com \
    --cc=clombard@linux.vnet.ibm.com \
    --cc=fbarrat@linux.vnet.ibm.com \
    --cc=imunsie@au1.ibm.com \
    --cc=linuxppc-dev@lists.ozlabs.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.