LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH v5 3/7] powerpc: use task_pid_nr() for TID allocation
From: Frederic Barrat @ 2018-05-11  8:48 UTC (permalink / raw)
  To: Alastair D'Silva, linuxppc-dev
  Cc: linux-kernel, linux-doc, mikey, vaibhav, aneesh.kumar, malat,
	felix, pombredanne, sukadev, npiggin, gregkh, arnd,
	andrew.donnellan, fbarrat, corbet, Alastair D'Silva
In-Reply-To: <20180511061303.10728-4-alastair@au1.ibm.com>



Le 11/05/2018 à 08:12, Alastair D'Silva a écrit :
> From: Alastair D'Silva <alastair@d-silva.org>
> 
> The current implementation of TID allocation, using a global IDR, may
> result in an errant process starving the system of available TIDs.
> Instead, use task_pid_nr(), as mentioned by the original author. The
> scenario described which prevented it's use is not applicable, as
> set_thread_tidr can only be called after the task struct has been
> populated.
> 
> In the unlikely event that 2 threads share the TID and are waiting,
> all potential outcomes have been determined safe.
> 
> Signed-off-by: Alastair D'Silva <alastair@d-silva.org>
> ---

Thanks for adding the comment. It assumes the reader is aware that the 
TIDR value is only used for the notification using the 'wait' 
instruction, but that's likely to be the case.

Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>


>   arch/powerpc/include/asm/switch_to.h |   1 -
>   arch/powerpc/kernel/process.c        | 122 ++++++---------------------
>   2 files changed, 28 insertions(+), 95 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/switch_to.h b/arch/powerpc/include/asm/switch_to.h
> index be8c9fa23983..5b03d8a82409 100644
> --- a/arch/powerpc/include/asm/switch_to.h
> +++ b/arch/powerpc/include/asm/switch_to.h
> @@ -94,6 +94,5 @@ static inline void clear_task_ebb(struct task_struct *t)
>   extern int set_thread_uses_vas(void);
> 
>   extern int set_thread_tidr(struct task_struct *t);
> -extern void clear_thread_tidr(struct task_struct *t);
> 
>   #endif /* _ASM_POWERPC_SWITCH_TO_H */
> diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> index 3b00da47699b..c5b8e53acbae 100644
> --- a/arch/powerpc/kernel/process.c
> +++ b/arch/powerpc/kernel/process.c
> @@ -1496,103 +1496,41 @@ int set_thread_uses_vas(void)
>   }
> 
>   #ifdef CONFIG_PPC64
> -static DEFINE_SPINLOCK(vas_thread_id_lock);
> -static DEFINE_IDA(vas_thread_ida);
> -
> -/*
> - * We need to assign a unique thread id to each thread in a process.
> +/**
> + * Assign a TIDR (thread ID) for task @t and set it in the thread
> + * structure. For now, we only support setting TIDR for 'current' task.
>    *
> - * This thread id, referred to as TIDR, and separate from the Linux's tgid,
> - * is intended to be used to direct an ASB_Notify from the hardware to the
> - * thread, when a suitable event occurs in the system.
> + * Since the TID value is a truncated form of it PID, it is possible
> + * (but unlikely) for 2 threads to have the same TID. In the unlikely event
> + * that 2 threads share the same TID and are waiting, one of the following
> + * cases will happen:
>    *
> - * One such event is a "paste" instruction in the context of Fast Thread
> - * Wakeup (aka Core-to-core wake up in the Virtual Accelerator Switchboard
> - * (VAS) in POWER9.
> + * 1. The correct thread is running, the wrong thread is not
> + * In this situation, the correct thread is woken and proceeds to pass it's
> + * condition check.
>    *
> - * To get a unique TIDR per process we could simply reuse task_pid_nr() but
> - * the problem is that task_pid_nr() is not yet available copy_thread() is
> - * called. Fixing that would require changing more intrusive arch-neutral
> - * code in code path in copy_process()?.
> + * 2. Neither threads are running
> + * In this situation, neither thread will be woken. When scheduled, the waiting
> + * threads will execute either a wait, which will return immediately, followed
> + * by a condition check, which will pass for the correct thread and fail
> + * for the wrong thread, or they will execute the condition check immediately.
>    *
> - * Further, to assign unique TIDRs within each process, we need an atomic
> - * field (or an IDR) in task_struct, which again intrudes into the arch-
> - * neutral code. So try to assign globally unique TIDRs for now.
> + * 3. The wrong thread is running, the correct thread is not
> + * The wrong thread will be woken, but will fail it's condition check and
> + * re-execute wait. The correct thread, when scheduled, will execute either
> + * it's condition check (which will pass), or wait, which returns immediately
> + * when called the first time after the thread is scheduled, followed by it's
> + * condition check (which will pass).
>    *
> - * NOTE: TIDR 0 indicates that the thread does not need a TIDR value.
> - *	 For now, only threads that expect to be notified by the VAS
> - *	 hardware need a TIDR value and we assign values > 0 for those.
> - */
> -#define MAX_THREAD_CONTEXT	((1 << 16) - 1)
> -static int assign_thread_tidr(void)
> -{
> -	int index;
> -	int err;
> -	unsigned long flags;
> -
> -again:
> -	if (!ida_pre_get(&vas_thread_ida, GFP_KERNEL))
> -		return -ENOMEM;
> -
> -	spin_lock_irqsave(&vas_thread_id_lock, flags);
> -	err = ida_get_new_above(&vas_thread_ida, 1, &index);
> -	spin_unlock_irqrestore(&vas_thread_id_lock, flags);
> -
> -	if (err == -EAGAIN)
> -		goto again;
> -	else if (err)
> -		return err;
> -
> -	if (index > MAX_THREAD_CONTEXT) {
> -		spin_lock_irqsave(&vas_thread_id_lock, flags);
> -		ida_remove(&vas_thread_ida, index);
> -		spin_unlock_irqrestore(&vas_thread_id_lock, flags);
> -		return -ENOMEM;
> -	}
> -
> -	return index;
> -}
> -
> -static void free_thread_tidr(int id)
> -{
> -	unsigned long flags;
> -
> -	spin_lock_irqsave(&vas_thread_id_lock, flags);
> -	ida_remove(&vas_thread_ida, id);
> -	spin_unlock_irqrestore(&vas_thread_id_lock, flags);
> -}
> -
> -/*
> - * Clear any TIDR value assigned to this thread.
> - */
> -void clear_thread_tidr(struct task_struct *t)
> -{
> -	if (!t->thread.tidr)
> -		return;
> -
> -	if (!cpu_has_feature(CPU_FTR_P9_TIDR)) {
> -		WARN_ON_ONCE(1);
> -		return;
> -	}
> -
> -	mtspr(SPRN_TIDR, 0);
> -	free_thread_tidr(t->thread.tidr);
> -	t->thread.tidr = 0;
> -}
> -
> -void arch_release_task_struct(struct task_struct *t)
> -{
> -	clear_thread_tidr(t);
> -}
> -
> -/*
> - * Assign a unique TIDR (thread id) for task @t and set it in the thread
> - * structure. For now, we only support setting TIDR for 'current' task.
> + * 4. Both threads are running
> + * Both threads will be woken. The wrong thread will fail it's condition check
> + * and execute another wait, while the correct thread will pass it's condition
> + * check.
> + *
> + * @t: the task to set the thread ID for
>    */
>   int set_thread_tidr(struct task_struct *t)
>   {
> -	int rc;
> -
>   	if (!cpu_has_feature(CPU_FTR_P9_TIDR))
>   		return -EINVAL;
> 
> @@ -1602,11 +1540,7 @@ int set_thread_tidr(struct task_struct *t)
>   	if (t->thread.tidr)
>   		return 0;
> 
> -	rc = assign_thread_tidr();
> -	if (rc < 0)
> -		return rc;
> -
> -	t->thread.tidr = rc;
> +	t->thread.tidr = (u16)task_pid_nr(t);
>   	mtspr(SPRN_TIDR, t->thread.tidr);
> 
>   	return 0;
> 

^ permalink raw reply

* Re: [PATCH v5 4/7] ocxl: Rename pnv_ocxl_spa_remove_pe to clarify it's action
From: Frederic Barrat @ 2018-05-11  8:49 UTC (permalink / raw)
  To: Alastair D'Silva, linuxppc-dev
  Cc: linux-kernel, linux-doc, mikey, vaibhav, aneesh.kumar, malat,
	felix, pombredanne, sukadev, npiggin, gregkh, arnd,
	andrew.donnellan, fbarrat, corbet, Alastair D'Silva
In-Reply-To: <20180511061303.10728-5-alastair@au1.ibm.com>



Le 11/05/2018 à 08:13, Alastair D'Silva a écrit :
> From: Alastair D'Silva <alastair@d-silva.org>
> 
> The function removes the process element from NPU cache.
> 
> Signed-off-by: Alastair D'Silva <alastair@d-silva.org>
> ---

Acked-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>

>   arch/powerpc/include/asm/pnv-ocxl.h   | 2 +-
>   arch/powerpc/platforms/powernv/ocxl.c | 4 ++--
>   drivers/misc/ocxl/link.c              | 2 +-
>   3 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/pnv-ocxl.h b/arch/powerpc/include/asm/pnv-ocxl.h
> index f6945d3bc971..208b5503f4ed 100644
> --- a/arch/powerpc/include/asm/pnv-ocxl.h
> +++ b/arch/powerpc/include/asm/pnv-ocxl.h
> @@ -28,7 +28,7 @@ extern int pnv_ocxl_map_xsl_regs(struct pci_dev *dev, void __iomem **dsisr,
>   extern int pnv_ocxl_spa_setup(struct pci_dev *dev, void *spa_mem, int PE_mask,
>   			void **platform_data);
>   extern void pnv_ocxl_spa_release(void *platform_data);
> -extern int pnv_ocxl_spa_remove_pe(void *platform_data, int pe_handle);
> +extern int pnv_ocxl_spa_remove_pe_from_cache(void *platform_data, int pe_handle);
> 
>   extern int pnv_ocxl_alloc_xive_irq(u32 *irq, u64 *trigger_addr);
>   extern void pnv_ocxl_free_xive_irq(u32 irq);
> diff --git a/arch/powerpc/platforms/powernv/ocxl.c b/arch/powerpc/platforms/powernv/ocxl.c
> index fa9b53af3c7b..8c65aacda9c8 100644
> --- a/arch/powerpc/platforms/powernv/ocxl.c
> +++ b/arch/powerpc/platforms/powernv/ocxl.c
> @@ -475,7 +475,7 @@ void pnv_ocxl_spa_release(void *platform_data)
>   }
>   EXPORT_SYMBOL_GPL(pnv_ocxl_spa_release);
> 
> -int pnv_ocxl_spa_remove_pe(void *platform_data, int pe_handle)
> +int pnv_ocxl_spa_remove_pe_from_cache(void *platform_data, int pe_handle)
>   {
>   	struct spa_data *data = (struct spa_data *) platform_data;
>   	int rc;
> @@ -483,7 +483,7 @@ int pnv_ocxl_spa_remove_pe(void *platform_data, int pe_handle)
>   	rc = opal_npu_spa_clear_cache(data->phb_opal_id, data->bdfn, pe_handle);
>   	return rc;
>   }
> -EXPORT_SYMBOL_GPL(pnv_ocxl_spa_remove_pe);
> +EXPORT_SYMBOL_GPL(pnv_ocxl_spa_remove_pe_from_cache);
> 
>   int pnv_ocxl_alloc_xive_irq(u32 *irq, u64 *trigger_addr)
>   {
> diff --git a/drivers/misc/ocxl/link.c b/drivers/misc/ocxl/link.c
> index f30790582dc0..656e8610eec2 100644
> --- a/drivers/misc/ocxl/link.c
> +++ b/drivers/misc/ocxl/link.c
> @@ -599,7 +599,7 @@ int ocxl_link_remove_pe(void *link_handle, int pasid)
>   	 * On powerpc, the entry needs to be cleared from the context
>   	 * cache of the NPU.
>   	 */
> -	rc = pnv_ocxl_spa_remove_pe(link->platform_data, pe_handle);
> +	rc = pnv_ocxl_spa_remove_pe_from_cache(link->platform_data, pe_handle);
>   	WARN_ON(rc);
> 
>   	pe_data = radix_tree_delete(&spa->pe_tree, pe_handle);
> 

^ permalink raw reply

* Re: [PATCH v5 5/7] ocxl: Expose the thread_id needed for wait on POWER9
From: Frederic Barrat @ 2018-05-11  9:25 UTC (permalink / raw)
  To: Alastair D'Silva, linuxppc-dev
  Cc: linux-kernel, linux-doc, mikey, vaibhav, aneesh.kumar, malat,
	felix, pombredanne, sukadev, npiggin, gregkh, arnd,
	andrew.donnellan, fbarrat, corbet, Alastair D'Silva
In-Reply-To: <20180511061303.10728-6-alastair@au1.ibm.com>



Le 11/05/2018 à 08:13, Alastair D'Silva a écrit :
> From: Alastair D'Silva <alastair@d-silva.org>
> 
> In order to successfully issue as_notify, an AFU needs to know the TID
> to notify, which in turn means that this information should be
> available in userspace so it can be communicated to the AFU.
> 
> Signed-off-by: Alastair D'Silva <alastair@d-silva.org>
> ---

Ok, so we keep the limitation of having only one thread per context able 
to call 'wait', even though we don't have to worry about depleting the 
pool of TIDs any more. I think that's acceptable, though we don't really 
have a reason to justify it any more. Any reason you want to keep it 
that way?

   Fred


>   drivers/misc/ocxl/context.c       |  5 ++-
>   drivers/misc/ocxl/file.c          | 53 +++++++++++++++++++++++++++++++
>   drivers/misc/ocxl/link.c          | 36 +++++++++++++++++++++
>   drivers/misc/ocxl/ocxl_internal.h |  1 +
>   include/misc/ocxl.h               |  9 ++++++
>   include/uapi/misc/ocxl.h          |  8 +++++
>   6 files changed, 111 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/misc/ocxl/context.c b/drivers/misc/ocxl/context.c
> index 909e8807824a..95f74623113e 100644
> --- a/drivers/misc/ocxl/context.c
> +++ b/drivers/misc/ocxl/context.c
> @@ -34,6 +34,8 @@ int ocxl_context_init(struct ocxl_context *ctx, struct ocxl_afu *afu,
>   	mutex_init(&ctx->xsl_error_lock);
>   	mutex_init(&ctx->irq_lock);
>   	idr_init(&ctx->irq_idr);
> +	ctx->tidr = 0;
> +
>   	/*
>   	 * Keep a reference on the AFU to make sure it's valid for the
>   	 * duration of the life of the context
> @@ -65,6 +67,7 @@ int ocxl_context_attach(struct ocxl_context *ctx, u64 amr)
>   {
>   	int rc;
> 
> +	// Locks both status & tidr
>   	mutex_lock(&ctx->status_mutex);
>   	if (ctx->status != OPENED) {
>   		rc = -EIO;
> @@ -72,7 +75,7 @@ int ocxl_context_attach(struct ocxl_context *ctx, u64 amr)
>   	}
> 
>   	rc = ocxl_link_add_pe(ctx->afu->fn->link, ctx->pasid,
> -			current->mm->context.id, 0, amr, current->mm,
> +			current->mm->context.id, ctx->tidr, amr, current->mm,
>   			xsl_fault_error, ctx);
>   	if (rc)
>   		goto out;
> diff --git a/drivers/misc/ocxl/file.c b/drivers/misc/ocxl/file.c
> index 038509e5d031..eb409a469f21 100644
> --- a/drivers/misc/ocxl/file.c
> +++ b/drivers/misc/ocxl/file.c
> @@ -5,6 +5,8 @@
>   #include <linux/sched/signal.h>
>   #include <linux/uaccess.h>
>   #include <uapi/misc/ocxl.h>
> +#include <asm/reg.h>
> +#include <asm/switch_to.h>
>   #include "ocxl_internal.h"
> 
> 
> @@ -123,11 +125,55 @@ static long afu_ioctl_get_metadata(struct ocxl_context *ctx,
>   	return 0;
>   }
> 
> +#ifdef CONFIG_PPC64
> +static long afu_ioctl_enable_p9_wait(struct ocxl_context *ctx,
> +		struct ocxl_ioctl_p9_wait __user *uarg)
> +{
> +	struct ocxl_ioctl_p9_wait arg;
> +
> +	memset(&arg, 0, sizeof(arg));
> +
> +	if (cpu_has_feature(CPU_FTR_P9_TIDR)) {
> +		enum ocxl_context_status status;
> +
> +		// Locks both status & tidr
> +		mutex_lock(&ctx->status_mutex);
> +		if (!ctx->tidr) {
> +			if (set_thread_tidr(current))
> +				return -ENOENT;
> +
> +			ctx->tidr = current->thread.tidr;
> +		}
> +
> +		status = ctx->status;
> +		mutex_unlock(&ctx->status_mutex);
> +
> +		if (status == ATTACHED) {
> +			int rc;
> +			struct link *link = ctx->afu->fn->link;
> +
> +			rc = ocxl_link_update_pe(link, ctx->pasid, ctx->tidr);
> +			if (rc)
> +				return rc;
> +		}
> +
> +		arg.thread_id = ctx->tidr;
> +	} else
> +		return -ENOENT;
> +
> +	if (copy_to_user(uarg, &arg, sizeof(arg)))
> +		return -EFAULT;
> +
> +	return 0;
> +}
> +#endif
> +
>   #define CMD_STR(x) (x == OCXL_IOCTL_ATTACH ? "ATTACH" :			\
>   			x == OCXL_IOCTL_IRQ_ALLOC ? "IRQ_ALLOC" :	\
>   			x == OCXL_IOCTL_IRQ_FREE ? "IRQ_FREE" :		\
>   			x == OCXL_IOCTL_IRQ_SET_FD ? "IRQ_SET_FD" :	\
>   			x == OCXL_IOCTL_GET_METADATA ? "GET_METADATA" :	\
> +			x == OCXL_IOCTL_ENABLE_P9_WAIT ? "ENABLE_P9_WAIT" :	\
>   			"UNKNOWN")
> 
>   static long afu_ioctl(struct file *file, unsigned int cmd,
> @@ -186,6 +232,13 @@ static long afu_ioctl(struct file *file, unsigned int cmd,
>   				(struct ocxl_ioctl_metadata __user *) args);
>   		break;
> 
> +#ifdef CONFIG_PPC64
> +	case OCXL_IOCTL_ENABLE_P9_WAIT:
> +		rc = afu_ioctl_enable_p9_wait(ctx,
> +				(struct ocxl_ioctl_p9_wait __user *) args);
> +		break;
> +#endif
> +
>   	default:
>   		rc = -EINVAL;
>   	}
> diff --git a/drivers/misc/ocxl/link.c b/drivers/misc/ocxl/link.c
> index 656e8610eec2..88876ae8f330 100644
> --- a/drivers/misc/ocxl/link.c
> +++ b/drivers/misc/ocxl/link.c
> @@ -544,6 +544,42 @@ int ocxl_link_add_pe(void *link_handle, int pasid, u32 pidr, u32 tidr,
>   }
>   EXPORT_SYMBOL_GPL(ocxl_link_add_pe);
> 
> +int ocxl_link_update_pe(void *link_handle, int pasid, __u16 tid)
> +{
> +	struct link *link = (struct link *) link_handle;
> +	struct spa *spa = link->spa;
> +	struct ocxl_process_element *pe;
> +	int pe_handle, rc;
> +
> +	if (pasid > SPA_PASID_MAX)
> +		return -EINVAL;
> +
> +	pe_handle = pasid & SPA_PE_MASK;
> +	pe = spa->spa_mem + pe_handle;
> +
> +	mutex_lock(&spa->spa_lock);
> +
> +	pe->tid = tid;
> +
> +	/*
> +	 * The barrier makes sure the PE is updated
> +	 * before we clear the NPU context cache below, so that the
> +	 * old PE cannot be reloaded erroneously.
> +	 */
> +	mb();
> +
> +	/*
> +	 * hook to platform code
> +	 * On powerpc, the entry needs to be cleared from the context
> +	 * cache of the NPU.
> +	 */
> +	rc = pnv_ocxl_spa_remove_pe_from_cache(link->platform_data, pe_handle);
> +	WARN_ON(rc);
> +
> +	mutex_unlock(&spa->spa_lock);
> +	return rc;
> +}
> +
>   int ocxl_link_remove_pe(void *link_handle, int pasid)
>   {
>   	struct link *link = (struct link *) link_handle;
> diff --git a/drivers/misc/ocxl/ocxl_internal.h b/drivers/misc/ocxl/ocxl_internal.h
> index 5d421824afd9..a32f2151029f 100644
> --- a/drivers/misc/ocxl/ocxl_internal.h
> +++ b/drivers/misc/ocxl/ocxl_internal.h
> @@ -77,6 +77,7 @@ struct ocxl_context {
>   	struct ocxl_xsl_error xsl_error;
>   	struct mutex irq_lock;
>   	struct idr irq_idr;
> +	u16 tidr; // Thread ID used for P9 wait implementation
>   };
> 
>   struct ocxl_process_element {
> diff --git a/include/misc/ocxl.h b/include/misc/ocxl.h
> index 51ccf76db293..9ff6ddc28e22 100644
> --- a/include/misc/ocxl.h
> +++ b/include/misc/ocxl.h
> @@ -188,6 +188,15 @@ extern int ocxl_link_add_pe(void *link_handle, int pasid, u32 pidr, u32 tidr,
>   		void (*xsl_err_cb)(void *data, u64 addr, u64 dsisr),
>   		void *xsl_err_data);
> 
> +/**
> + * Update values within a Process Element
> + *
> + * link_handle: the link handle associated with the process element
> + * pasid: the PASID for the AFU context
> + * tid: the new thread id for the process element
> + */
> +extern int ocxl_link_update_pe(void *link_handle, int pasid, __u16 tid);
> +
>   /*
>    * Remove a Process Element from the Shared Process Area for a link
>    */
> diff --git a/include/uapi/misc/ocxl.h b/include/uapi/misc/ocxl.h
> index 0af83d80fb3e..561e6f0dfcb7 100644
> --- a/include/uapi/misc/ocxl.h
> +++ b/include/uapi/misc/ocxl.h
> @@ -48,6 +48,13 @@ struct ocxl_ioctl_metadata {
>   	__u64 reserved[13]; // Total of 16*u64
>   };
> 
> +struct ocxl_ioctl_p9_wait {
> +	__u16 thread_id; // The thread ID required to wake this thread
> +	__u16 reserved1;
> +	__u32 reserved2;
> +	__u64 reserved3[3];
> +};
> +
>   struct ocxl_ioctl_irq_fd {
>   	__u64 irq_offset;
>   	__s32 eventfd;
> @@ -62,5 +69,6 @@ struct ocxl_ioctl_irq_fd {
>   #define OCXL_IOCTL_IRQ_FREE	_IOW(OCXL_MAGIC, 0x12, __u64)
>   #define OCXL_IOCTL_IRQ_SET_FD	_IOW(OCXL_MAGIC, 0x13, struct ocxl_ioctl_irq_fd)
>   #define OCXL_IOCTL_GET_METADATA _IOR(OCXL_MAGIC, 0x14, struct ocxl_ioctl_metadata)
> +#define OCXL_IOCTL_ENABLE_P9_WAIT	_IOR(OCXL_MAGIC, 0x15, struct ocxl_ioctl_p9_wait)
> 
>   #endif /* _UAPI_MISC_OCXL_H */
> 

^ permalink raw reply

* Re: [PATCH v5 6/7] ocxl: Add an IOCTL so userspace knows what OCXL features are available
From: Frederic Barrat @ 2018-05-11  9:27 UTC (permalink / raw)
  To: Alastair D'Silva, linuxppc-dev
  Cc: linux-kernel, linux-doc, mikey, vaibhav, aneesh.kumar, malat,
	felix, pombredanne, sukadev, npiggin, gregkh, arnd,
	andrew.donnellan, fbarrat, corbet, Alastair D'Silva
In-Reply-To: <20180511061303.10728-7-alastair@au1.ibm.com>



Le 11/05/2018 à 08:13, Alastair D'Silva a écrit :
> From: Alastair D'Silva <alastair@d-silva.org>
> 
> In order for a userspace AFU driver to call the POWER9 specific
> OCXL_IOCTL_ENABLE_P9_WAIT, it needs to verify that it can actually
> make that call.
> 
> Signed-off-by: Alastair D'Silva <alastair@d-silva.org>
> ---

Acked-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>


>   drivers/misc/ocxl/file.c | 25 +++++++++++++++++++++++++
>   include/uapi/misc/ocxl.h |  6 ++++++
>   2 files changed, 31 insertions(+)
> 
> diff --git a/drivers/misc/ocxl/file.c b/drivers/misc/ocxl/file.c
> index eb409a469f21..33ae46ce0a8a 100644
> --- a/drivers/misc/ocxl/file.c
> +++ b/drivers/misc/ocxl/file.c
> @@ -168,12 +168,32 @@ static long afu_ioctl_enable_p9_wait(struct ocxl_context *ctx,
>   }
>   #endif
> 
> +
> +static long afu_ioctl_get_features(struct ocxl_context *ctx,
> +		struct ocxl_ioctl_features __user *uarg)
> +{
> +	struct ocxl_ioctl_features arg;
> +
> +	memset(&arg, 0, sizeof(arg));
> +
> +#ifdef CONFIG_PPC64
> +	if (cpu_has_feature(CPU_FTR_P9_TIDR))
> +		arg.flags[0] |= OCXL_IOCTL_FEATURES_FLAGS0_P9_WAIT;
> +#endif
> +
> +	if (copy_to_user(uarg, &arg, sizeof(arg)))
> +		return -EFAULT;
> +
> +	return 0;
> +}
> +
>   #define CMD_STR(x) (x == OCXL_IOCTL_ATTACH ? "ATTACH" :			\
>   			x == OCXL_IOCTL_IRQ_ALLOC ? "IRQ_ALLOC" :	\
>   			x == OCXL_IOCTL_IRQ_FREE ? "IRQ_FREE" :		\
>   			x == OCXL_IOCTL_IRQ_SET_FD ? "IRQ_SET_FD" :	\
>   			x == OCXL_IOCTL_GET_METADATA ? "GET_METADATA" :	\
>   			x == OCXL_IOCTL_ENABLE_P9_WAIT ? "ENABLE_P9_WAIT" :	\
> +			x == OCXL_IOCTL_GET_FEATURES ? "GET_FEATURES" :	\
>   			"UNKNOWN")
> 
>   static long afu_ioctl(struct file *file, unsigned int cmd,
> @@ -239,6 +259,11 @@ static long afu_ioctl(struct file *file, unsigned int cmd,
>   		break;
>   #endif
> 
> +	case OCXL_IOCTL_GET_FEATURES:
> +		rc = afu_ioctl_get_features(ctx,
> +				(struct ocxl_ioctl_features __user *) args);
> +		break;
> +
>   	default:
>   		rc = -EINVAL;
>   	}
> diff --git a/include/uapi/misc/ocxl.h b/include/uapi/misc/ocxl.h
> index 561e6f0dfcb7..97937cfa3baa 100644
> --- a/include/uapi/misc/ocxl.h
> +++ b/include/uapi/misc/ocxl.h
> @@ -55,6 +55,11 @@ struct ocxl_ioctl_p9_wait {
>   	__u64 reserved3[3];
>   };
> 
> +#define OCXL_IOCTL_FEATURES_FLAGS0_P9_WAIT	0x01
> +struct ocxl_ioctl_features {
> +	__u64 flags[4];
> +};
> +
>   struct ocxl_ioctl_irq_fd {
>   	__u64 irq_offset;
>   	__s32 eventfd;
> @@ -70,5 +75,6 @@ struct ocxl_ioctl_irq_fd {
>   #define OCXL_IOCTL_IRQ_SET_FD	_IOW(OCXL_MAGIC, 0x13, struct ocxl_ioctl_irq_fd)
>   #define OCXL_IOCTL_GET_METADATA _IOR(OCXL_MAGIC, 0x14, struct ocxl_ioctl_metadata)
>   #define OCXL_IOCTL_ENABLE_P9_WAIT	_IOR(OCXL_MAGIC, 0x15, struct ocxl_ioctl_p9_wait)
> +#define OCXL_IOCTL_GET_FEATURES _IOR(OCXL_MAGIC, 0x16, struct ocxl_ioctl_features)
> 
>   #endif /* _UAPI_MISC_OCXL_H */
> 

^ permalink raw reply

* Re: [PATCH v5 7/7] ocxl: Document new OCXL IOCTLs
From: Frederic Barrat @ 2018-05-11  9:28 UTC (permalink / raw)
  To: Alastair D'Silva, linuxppc-dev
  Cc: linux-kernel, linux-doc, mikey, vaibhav, aneesh.kumar, malat,
	felix, pombredanne, sukadev, npiggin, gregkh, arnd,
	andrew.donnellan, fbarrat, corbet, Alastair D'Silva
In-Reply-To: <20180511061303.10728-8-alastair@au1.ibm.com>



Le 11/05/2018 à 08:13, Alastair D'Silva a écrit :
> From: Alastair D'Silva <alastair@d-silva.org>
> 
> Signed-off-by: Alastair D'Silva <alastair@d-silva.org>
> ---

Acked-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>


>   Documentation/accelerators/ocxl.rst | 11 +++++++++++
>   1 file changed, 11 insertions(+)
> 
> diff --git a/Documentation/accelerators/ocxl.rst b/Documentation/accelerators/ocxl.rst
> index ddcc58d01cfb..14cefc020e2d 100644
> --- a/Documentation/accelerators/ocxl.rst
> +++ b/Documentation/accelerators/ocxl.rst
> @@ -157,6 +157,17 @@ OCXL_IOCTL_GET_METADATA:
>     Obtains configuration information from the card, such at the size of
>     MMIO areas, the AFU version, and the PASID for the current context.
> 
> +OCXL_IOCTL_ENABLE_P9_WAIT:
> +
> +  Allows the AFU to wake a userspace thread executing 'wait'. Returns
> +  information to userspace to allow it to configure the AFU. Note that
> +  this is only available on POWER9.
> +
> +OCXL_IOCTL_GET_FEATURES:
> +
> +  Reports on which CPU features that affect OpenCAPI are usable from
> +  userspace.
> +
> 
>   mmap
>   ----
> 

^ permalink raw reply

* RE: [PATCH v5 5/7] ocxl: Expose the thread_id needed for wait on POWER9
From: Alastair D'Silva @ 2018-05-11 10:06 UTC (permalink / raw)
  To: 'Frederic Barrat', 'Alastair D'Silva',
	linuxppc-dev
  Cc: linux-kernel, linux-doc, mikey, vaibhav, aneesh.kumar, malat,
	felix, pombredanne, sukadev, npiggin, gregkh, arnd,
	andrew.donnellan, fbarrat, corbet
In-Reply-To: <9656b063-eafd-a882-c678-a1c4d504cd54@linux.ibm.com>


> -----Original Message-----
> From: Frederic Barrat <fbarrat@linux.ibm.com>
> Sent: Friday, 11 May 2018 7:25 PM
> To: Alastair D'Silva <alastair@au1.ibm.com>; =
linuxppc-dev@lists.ozlabs.org
> Cc: linux-kernel@vger.kernel.org; linux-doc@vger.kernel.org;
> mikey@neuling.org; vaibhav@linux.vnet.ibm.com;
> aneesh.kumar@linux.vnet.ibm.com; malat@debian.org;
> felix@linux.vnet.ibm.com; pombredanne@nexb.com;
> sukadev@linux.vnet.ibm.com; npiggin@gmail.com;
> gregkh@linuxfoundation.org; arnd@arndb.de;
> andrew.donnellan@au1.ibm.com; fbarrat@linux.vnet.ibm.com;
> corbet@lwn.net; Alastair D'Silva <alastair@d-silva.org>
> Subject: Re: [PATCH v5 5/7] ocxl: Expose the thread_id needed for wait =
on
> POWER9
>=20
>=20
>=20
> Le 11/05/2018 =C3=A0 08:13, Alastair D'Silva a =C3=A9crit :
> > From: Alastair D'Silva <alastair@d-silva.org>
> >
> > In order to successfully issue as_notify, an AFU needs to know the =
TID
> > to notify, which in turn means that this information should be
> > available in userspace so it can be communicated to the AFU.
> >
> > Signed-off-by: Alastair D'Silva <alastair@d-silva.org>
> > ---
>=20
> Ok, so we keep the limitation of having only one thread per context =
able to
> call 'wait', even though we don't have to worry about depleting the =
pool of
> TIDs any more. I think that's acceptable, though we don't really have =
a reason
> to justify it any more. Any reason you want to keep it that way?
>=20

No strong reason, just trying to minimise the amount of changes. We can =
always expand the scope later, if we have a use-case for it.

>    Fred
>=20
>=20
> >   drivers/misc/ocxl/context.c       |  5 ++-
> >   drivers/misc/ocxl/file.c          | 53 =
+++++++++++++++++++++++++++++++
> >   drivers/misc/ocxl/link.c          | 36 +++++++++++++++++++++
> >   drivers/misc/ocxl/ocxl_internal.h |  1 +
> >   include/misc/ocxl.h               |  9 ++++++
> >   include/uapi/misc/ocxl.h          |  8 +++++
> >   6 files changed, 111 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/misc/ocxl/context.c =
b/drivers/misc/ocxl/context.c
> > index 909e8807824a..95f74623113e 100644
> > --- a/drivers/misc/ocxl/context.c
> > +++ b/drivers/misc/ocxl/context.c
> > @@ -34,6 +34,8 @@ int ocxl_context_init(struct ocxl_context *ctx, =
struct
> ocxl_afu *afu,
> >   	mutex_init(&ctx->xsl_error_lock);
> >   	mutex_init(&ctx->irq_lock);
> >   	idr_init(&ctx->irq_idr);
> > +	ctx->tidr =3D 0;
> > +
> >   	/*
> >   	 * Keep a reference on the AFU to make sure it's valid for the
> >   	 * duration of the life of the context @@ -65,6 +67,7 @@ int
> > ocxl_context_attach(struct ocxl_context *ctx, u64 amr)
> >   {
> >   	int rc;
> >
> > +	// Locks both status & tidr
> >   	mutex_lock(&ctx->status_mutex);
> >   	if (ctx->status !=3D OPENED) {
> >   		rc =3D -EIO;
> > @@ -72,7 +75,7 @@ int ocxl_context_attach(struct ocxl_context *ctx, =
u64
> amr)
> >   	}
> >
> >   	rc =3D ocxl_link_add_pe(ctx->afu->fn->link, ctx->pasid,
> > -			current->mm->context.id, 0, amr, current->mm,
> > +			current->mm->context.id, ctx->tidr, amr, current-
> >mm,
> >   			xsl_fault_error, ctx);
> >   	if (rc)
> >   		goto out;
> > diff --git a/drivers/misc/ocxl/file.c b/drivers/misc/ocxl/file.c =
index
> > 038509e5d031..eb409a469f21 100644
> > --- a/drivers/misc/ocxl/file.c
> > +++ b/drivers/misc/ocxl/file.c
> > @@ -5,6 +5,8 @@
> >   #include <linux/sched/signal.h>
> >   #include <linux/uaccess.h>
> >   #include <uapi/misc/ocxl.h>
> > +#include <asm/reg.h>
> > +#include <asm/switch_to.h>
> >   #include "ocxl_internal.h"
> >
> >
> > @@ -123,11 +125,55 @@ static long afu_ioctl_get_metadata(struct
> ocxl_context *ctx,
> >   	return 0;
> >   }
> >
> > +#ifdef CONFIG_PPC64
> > +static long afu_ioctl_enable_p9_wait(struct ocxl_context *ctx,
> > +		struct ocxl_ioctl_p9_wait __user *uarg) {
> > +	struct ocxl_ioctl_p9_wait arg;
> > +
> > +	memset(&arg, 0, sizeof(arg));
> > +
> > +	if (cpu_has_feature(CPU_FTR_P9_TIDR)) {
> > +		enum ocxl_context_status status;
> > +
> > +		// Locks both status & tidr
> > +		mutex_lock(&ctx->status_mutex);
> > +		if (!ctx->tidr) {
> > +			if (set_thread_tidr(current))
> > +				return -ENOENT;
> > +
> > +			ctx->tidr =3D current->thread.tidr;
> > +		}
> > +
> > +		status =3D ctx->status;
> > +		mutex_unlock(&ctx->status_mutex);
> > +
> > +		if (status =3D=3D ATTACHED) {
> > +			int rc;
> > +			struct link *link =3D ctx->afu->fn->link;
> > +
> > +			rc =3D ocxl_link_update_pe(link, ctx->pasid, ctx->tidr);
> > +			if (rc)
> > +				return rc;
> > +		}
> > +
> > +		arg.thread_id =3D ctx->tidr;
> > +	} else
> > +		return -ENOENT;
> > +
> > +	if (copy_to_user(uarg, &arg, sizeof(arg)))
> > +		return -EFAULT;
> > +
> > +	return 0;
> > +}
> > +#endif
> > +
> >   #define CMD_STR(x) (x =3D=3D OCXL_IOCTL_ATTACH ? "ATTACH" :
> 		\
> >   			x =3D=3D OCXL_IOCTL_IRQ_ALLOC ? "IRQ_ALLOC" :	\
> >   			x =3D=3D OCXL_IOCTL_IRQ_FREE ? "IRQ_FREE" :
> 	\
> >   			x =3D=3D OCXL_IOCTL_IRQ_SET_FD ? "IRQ_SET_FD" :
> 	\
> >   			x =3D=3D OCXL_IOCTL_GET_METADATA ?
> "GET_METADATA" :	\
> > +			x =3D=3D OCXL_IOCTL_ENABLE_P9_WAIT ?
> "ENABLE_P9_WAIT" :	\
> >   			"UNKNOWN")
> >
> >   static long afu_ioctl(struct file *file, unsigned int cmd, @@ =
-186,6
> > +232,13 @@ static long afu_ioctl(struct file *file, unsigned int =
cmd,
> >   				(struct ocxl_ioctl_metadata __user *) args);
> >   		break;
> >
> > +#ifdef CONFIG_PPC64
> > +	case OCXL_IOCTL_ENABLE_P9_WAIT:
> > +		rc =3D afu_ioctl_enable_p9_wait(ctx,
> > +				(struct ocxl_ioctl_p9_wait __user *) args);
> > +		break;
> > +#endif
> > +
> >   	default:
> >   		rc =3D -EINVAL;
> >   	}
> > diff --git a/drivers/misc/ocxl/link.c b/drivers/misc/ocxl/link.c =
index
> > 656e8610eec2..88876ae8f330 100644
> > --- a/drivers/misc/ocxl/link.c
> > +++ b/drivers/misc/ocxl/link.c
> > @@ -544,6 +544,42 @@ int ocxl_link_add_pe(void *link_handle, int =
pasid,
> u32 pidr, u32 tidr,
> >   }
> >   EXPORT_SYMBOL_GPL(ocxl_link_add_pe);
> >
> > +int ocxl_link_update_pe(void *link_handle, int pasid, __u16 tid) {
> > +	struct link *link =3D (struct link *) link_handle;
> > +	struct spa *spa =3D link->spa;
> > +	struct ocxl_process_element *pe;
> > +	int pe_handle, rc;
> > +
> > +	if (pasid > SPA_PASID_MAX)
> > +		return -EINVAL;
> > +
> > +	pe_handle =3D pasid & SPA_PE_MASK;
> > +	pe =3D spa->spa_mem + pe_handle;
> > +
> > +	mutex_lock(&spa->spa_lock);
> > +
> > +	pe->tid =3D tid;
> > +
> > +	/*
> > +	 * The barrier makes sure the PE is updated
> > +	 * before we clear the NPU context cache below, so that the
> > +	 * old PE cannot be reloaded erroneously.
> > +	 */
> > +	mb();
> > +
> > +	/*
> > +	 * hook to platform code
> > +	 * On powerpc, the entry needs to be cleared from the context
> > +	 * cache of the NPU.
> > +	 */
> > +	rc =3D pnv_ocxl_spa_remove_pe_from_cache(link->platform_data,
> pe_handle);
> > +	WARN_ON(rc);
> > +
> > +	mutex_unlock(&spa->spa_lock);
> > +	return rc;
> > +}
> > +
> >   int ocxl_link_remove_pe(void *link_handle, int pasid)
> >   {
> >   	struct link *link =3D (struct link *) link_handle; diff --git
> > a/drivers/misc/ocxl/ocxl_internal.h
> > b/drivers/misc/ocxl/ocxl_internal.h
> > index 5d421824afd9..a32f2151029f 100644
> > --- a/drivers/misc/ocxl/ocxl_internal.h
> > +++ b/drivers/misc/ocxl/ocxl_internal.h
> > @@ -77,6 +77,7 @@ struct ocxl_context {
> >   	struct ocxl_xsl_error xsl_error;
> >   	struct mutex irq_lock;
> >   	struct idr irq_idr;
> > +	u16 tidr; // Thread ID used for P9 wait implementation
> >   };
> >
> >   struct ocxl_process_element {
> > diff --git a/include/misc/ocxl.h b/include/misc/ocxl.h index
> > 51ccf76db293..9ff6ddc28e22 100644
> > --- a/include/misc/ocxl.h
> > +++ b/include/misc/ocxl.h
> > @@ -188,6 +188,15 @@ extern int ocxl_link_add_pe(void *link_handle, =
int
> pasid, u32 pidr, u32 tidr,
> >   		void (*xsl_err_cb)(void *data, u64 addr, u64 dsisr),
> >   		void *xsl_err_data);
> >
> > +/**
> > + * Update values within a Process Element
> > + *
> > + * link_handle: the link handle associated with the process element
> > + * pasid: the PASID for the AFU context
> > + * tid: the new thread id for the process element  */ extern int
> > +ocxl_link_update_pe(void *link_handle, int pasid, __u16 tid);
> > +
> >   /*
> >    * Remove a Process Element from the Shared Process Area for a =
link
> >    */
> > diff --git a/include/uapi/misc/ocxl.h b/include/uapi/misc/ocxl.h =
index
> > 0af83d80fb3e..561e6f0dfcb7 100644
> > --- a/include/uapi/misc/ocxl.h
> > +++ b/include/uapi/misc/ocxl.h
> > @@ -48,6 +48,13 @@ struct ocxl_ioctl_metadata {
> >   	__u64 reserved[13]; // Total of 16*u64
> >   };
> >
> > +struct ocxl_ioctl_p9_wait {
> > +	__u16 thread_id; // The thread ID required to wake this thread
> > +	__u16 reserved1;
> > +	__u32 reserved2;
> > +	__u64 reserved3[3];
> > +};
> > +
> >   struct ocxl_ioctl_irq_fd {
> >   	__u64 irq_offset;
> >   	__s32 eventfd;
> > @@ -62,5 +69,6 @@ struct ocxl_ioctl_irq_fd {
> >   #define OCXL_IOCTL_IRQ_FREE	_IOW(OCXL_MAGIC, 0x12, __u64)
> >   #define OCXL_IOCTL_IRQ_SET_FD	_IOW(OCXL_MAGIC, 0x13, struct
> ocxl_ioctl_irq_fd)
> >   #define OCXL_IOCTL_GET_METADATA _IOR(OCXL_MAGIC, 0x14, struct
> > ocxl_ioctl_metadata)
> > +#define OCXL_IOCTL_ENABLE_P9_WAIT	_IOR(OCXL_MAGIC, 0x15,
> struct ocxl_ioctl_p9_wait)
> >
> >   #endif /* _UAPI_MISC_OCXL_H */
> >
>=20
>=20
> ---
> This email has been checked for viruses by AVG.
> http://www.avg.com

^ permalink raw reply

* Re: [PATCH] misc: cxl: Change return type to vm_fault_t
From: Frederic Barrat @ 2018-05-11 10:24 UTC (permalink / raw)
  To: Souptick Joarder, fbarrat, andrew.donnellan, arnd, gregkh
  Cc: linuxppc-dev, linux-kernel, willy
In-Reply-To: <20180417145354.GA31451@jordon-HP-15-Notebook-PC>



Le 17/04/2018 à 16:53, Souptick Joarder a écrit :
> Use new return type vm_fault_t for fault handler. For
> now, this is just documenting that the function returns
> a VM_FAULT value rather than an errno. Once all instances
> are converted, vm_fault_t will become a distinct type.
> 
> Reference id -> 1c8f422059ae ("mm: change return type to
> vm_fault_t")
> 
> previously cxl_mmap_fault returns VM_FAULT_NOPAGE as
> default value irrespective of vm_insert_pfn() return
> value. This bug is fixed with new vmf_insert_pfn()
> which will return VM_FAULT_ type based on err.
> 
> Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
> ---

It looks ok, and it passed some basic testing.

Acked-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>

   Fred


>   drivers/misc/cxl/context.c | 7 ++++---
>   1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/misc/cxl/context.c b/drivers/misc/cxl/context.c
> index 7ff315a..c6ec872 100644
> --- a/drivers/misc/cxl/context.c
> +++ b/drivers/misc/cxl/context.c
> @@ -128,11 +128,12 @@ void cxl_context_set_mapping(struct cxl_context *ctx,
>   	mutex_unlock(&ctx->mapping_lock);
>   }
> 
> -static int cxl_mmap_fault(struct vm_fault *vmf)
> +static vm_fault_t cxl_mmap_fault(struct vm_fault *vmf)
>   {
>   	struct vm_area_struct *vma = vmf->vma;
>   	struct cxl_context *ctx = vma->vm_file->private_data;
>   	u64 area, offset;
> +	vm_fault_t ret;
> 
>   	offset = vmf->pgoff << PAGE_SHIFT;
> 
> @@ -169,11 +170,11 @@ static int cxl_mmap_fault(struct vm_fault *vmf)
>   		return VM_FAULT_SIGBUS;
>   	}
> 
> -	vm_insert_pfn(vma, vmf->address, (area + offset) >> PAGE_SHIFT);
> +	ret = vmf_insert_pfn(vma, vmf->address, (area + offset) >> PAGE_SHIFT);
> 
>   	mutex_unlock(&ctx->status_mutex);
> 
> -	return VM_FAULT_NOPAGE;
> +	return ret;
>   }
> 
>   static const struct vm_operations_struct cxl_mmap_vmops = {
> --
> 1.9.1
> 

^ permalink raw reply

* Re: [PATCH v5 5/7] ocxl: Expose the thread_id needed for wait on POWER9
From: Frederic Barrat @ 2018-05-11 11:03 UTC (permalink / raw)
  To: Alastair D'Silva, 'Alastair D'Silva',
	linuxppc-dev
  Cc: linux-kernel, linux-doc, mikey, vaibhav, aneesh.kumar, malat,
	felix, pombredanne, sukadev, npiggin, gregkh, arnd,
	andrew.donnellan, fbarrat, corbet
In-Reply-To: <02a401d3e90f$cdb0be90$69123bb0$@d-silva.org>



Le 11/05/2018 à 12:06, Alastair D'Silva a écrit :
> 
>> -----Original Message-----
>> From: Frederic Barrat <fbarrat@linux.ibm.com>
>> Sent: Friday, 11 May 2018 7:25 PM
>> To: Alastair D'Silva <alastair@au1.ibm.com>; linuxppc-dev@lists.ozlabs.org
>> Cc: linux-kernel@vger.kernel.org; linux-doc@vger.kernel.org;
>> mikey@neuling.org; vaibhav@linux.vnet.ibm.com;
>> aneesh.kumar@linux.vnet.ibm.com; malat@debian.org;
>> felix@linux.vnet.ibm.com; pombredanne@nexb.com;
>> sukadev@linux.vnet.ibm.com; npiggin@gmail.com;
>> gregkh@linuxfoundation.org; arnd@arndb.de;
>> andrew.donnellan@au1.ibm.com; fbarrat@linux.vnet.ibm.com;
>> corbet@lwn.net; Alastair D'Silva <alastair@d-silva.org>
>> Subject: Re: [PATCH v5 5/7] ocxl: Expose the thread_id needed for wait on
>> POWER9
>>
>>
>>
>> Le 11/05/2018 à 08:13, Alastair D'Silva a écrit :
>>> From: Alastair D'Silva <alastair@d-silva.org>
>>>
>>> In order to successfully issue as_notify, an AFU needs to know the TID
>>> to notify, which in turn means that this information should be
>>> available in userspace so it can be communicated to the AFU.
>>>
>>> Signed-off-by: Alastair D'Silva <alastair@d-silva.org>
>>> ---
>>
>> Ok, so we keep the limitation of having only one thread per context able to
>> call 'wait', even though we don't have to worry about depleting the pool of
>> TIDs any more. I think that's acceptable, though we don't really have a reason
>> to justify it any more. Any reason you want to keep it that way?
>>
> 
> No strong reason, just trying to minimise the amount of changes. We can always expand the scope later, if we have a use-case for it.

ok. Agreed, it's not worth holding up this series, we can send a follow 
up patch.

   Fred


>>     Fred
>>
>>
>>>    drivers/misc/ocxl/context.c       |  5 ++-
>>>    drivers/misc/ocxl/file.c          | 53 +++++++++++++++++++++++++++++++
>>>    drivers/misc/ocxl/link.c          | 36 +++++++++++++++++++++
>>>    drivers/misc/ocxl/ocxl_internal.h |  1 +
>>>    include/misc/ocxl.h               |  9 ++++++
>>>    include/uapi/misc/ocxl.h          |  8 +++++
>>>    6 files changed, 111 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/misc/ocxl/context.c b/drivers/misc/ocxl/context.c
>>> index 909e8807824a..95f74623113e 100644
>>> --- a/drivers/misc/ocxl/context.c
>>> +++ b/drivers/misc/ocxl/context.c
>>> @@ -34,6 +34,8 @@ int ocxl_context_init(struct ocxl_context *ctx, struct
>> ocxl_afu *afu,
>>>    	mutex_init(&ctx->xsl_error_lock);
>>>    	mutex_init(&ctx->irq_lock);
>>>    	idr_init(&ctx->irq_idr);
>>> +	ctx->tidr = 0;
>>> +
>>>    	/*
>>>    	 * Keep a reference on the AFU to make sure it's valid for the
>>>    	 * duration of the life of the context @@ -65,6 +67,7 @@ int
>>> ocxl_context_attach(struct ocxl_context *ctx, u64 amr)
>>>    {
>>>    	int rc;
>>>
>>> +	// Locks both status & tidr
>>>    	mutex_lock(&ctx->status_mutex);
>>>    	if (ctx->status != OPENED) {
>>>    		rc = -EIO;
>>> @@ -72,7 +75,7 @@ int ocxl_context_attach(struct ocxl_context *ctx, u64
>> amr)
>>>    	}
>>>
>>>    	rc = ocxl_link_add_pe(ctx->afu->fn->link, ctx->pasid,
>>> -			current->mm->context.id, 0, amr, current->mm,
>>> +			current->mm->context.id, ctx->tidr, amr, current-
>>> mm,
>>>    			xsl_fault_error, ctx);
>>>    	if (rc)
>>>    		goto out;
>>> diff --git a/drivers/misc/ocxl/file.c b/drivers/misc/ocxl/file.c index
>>> 038509e5d031..eb409a469f21 100644
>>> --- a/drivers/misc/ocxl/file.c
>>> +++ b/drivers/misc/ocxl/file.c
>>> @@ -5,6 +5,8 @@
>>>    #include <linux/sched/signal.h>
>>>    #include <linux/uaccess.h>
>>>    #include <uapi/misc/ocxl.h>
>>> +#include <asm/reg.h>
>>> +#include <asm/switch_to.h>
>>>    #include "ocxl_internal.h"
>>>
>>>
>>> @@ -123,11 +125,55 @@ static long afu_ioctl_get_metadata(struct
>> ocxl_context *ctx,
>>>    	return 0;
>>>    }
>>>
>>> +#ifdef CONFIG_PPC64
>>> +static long afu_ioctl_enable_p9_wait(struct ocxl_context *ctx,
>>> +		struct ocxl_ioctl_p9_wait __user *uarg) {
>>> +	struct ocxl_ioctl_p9_wait arg;
>>> +
>>> +	memset(&arg, 0, sizeof(arg));
>>> +
>>> +	if (cpu_has_feature(CPU_FTR_P9_TIDR)) {
>>> +		enum ocxl_context_status status;
>>> +
>>> +		// Locks both status & tidr
>>> +		mutex_lock(&ctx->status_mutex);
>>> +		if (!ctx->tidr) {
>>> +			if (set_thread_tidr(current))
>>> +				return -ENOENT;
>>> +
>>> +			ctx->tidr = current->thread.tidr;
>>> +		}
>>> +
>>> +		status = ctx->status;
>>> +		mutex_unlock(&ctx->status_mutex);
>>> +
>>> +		if (status == ATTACHED) {
>>> +			int rc;
>>> +			struct link *link = ctx->afu->fn->link;
>>> +
>>> +			rc = ocxl_link_update_pe(link, ctx->pasid, ctx->tidr);
>>> +			if (rc)
>>> +				return rc;
>>> +		}
>>> +
>>> +		arg.thread_id = ctx->tidr;
>>> +	} else
>>> +		return -ENOENT;
>>> +
>>> +	if (copy_to_user(uarg, &arg, sizeof(arg)))
>>> +		return -EFAULT;
>>> +
>>> +	return 0;
>>> +}
>>> +#endif
>>> +
>>>    #define CMD_STR(x) (x == OCXL_IOCTL_ATTACH ? "ATTACH" :
>> 		\
>>>    			x == OCXL_IOCTL_IRQ_ALLOC ? "IRQ_ALLOC" :	\
>>>    			x == OCXL_IOCTL_IRQ_FREE ? "IRQ_FREE" :
>> 	\
>>>    			x == OCXL_IOCTL_IRQ_SET_FD ? "IRQ_SET_FD" :
>> 	\
>>>    			x == OCXL_IOCTL_GET_METADATA ?
>> "GET_METADATA" :	\
>>> +			x == OCXL_IOCTL_ENABLE_P9_WAIT ?
>> "ENABLE_P9_WAIT" :	\
>>>    			"UNKNOWN")
>>>
>>>    static long afu_ioctl(struct file *file, unsigned int cmd, @@ -186,6
>>> +232,13 @@ static long afu_ioctl(struct file *file, unsigned int cmd,
>>>    				(struct ocxl_ioctl_metadata __user *) args);
>>>    		break;
>>>
>>> +#ifdef CONFIG_PPC64
>>> +	case OCXL_IOCTL_ENABLE_P9_WAIT:
>>> +		rc = afu_ioctl_enable_p9_wait(ctx,
>>> +				(struct ocxl_ioctl_p9_wait __user *) args);
>>> +		break;
>>> +#endif
>>> +
>>>    	default:
>>>    		rc = -EINVAL;
>>>    	}
>>> diff --git a/drivers/misc/ocxl/link.c b/drivers/misc/ocxl/link.c index
>>> 656e8610eec2..88876ae8f330 100644
>>> --- a/drivers/misc/ocxl/link.c
>>> +++ b/drivers/misc/ocxl/link.c
>>> @@ -544,6 +544,42 @@ int ocxl_link_add_pe(void *link_handle, int pasid,
>> u32 pidr, u32 tidr,
>>>    }
>>>    EXPORT_SYMBOL_GPL(ocxl_link_add_pe);
>>>
>>> +int ocxl_link_update_pe(void *link_handle, int pasid, __u16 tid) {
>>> +	struct link *link = (struct link *) link_handle;
>>> +	struct spa *spa = link->spa;
>>> +	struct ocxl_process_element *pe;
>>> +	int pe_handle, rc;
>>> +
>>> +	if (pasid > SPA_PASID_MAX)
>>> +		return -EINVAL;
>>> +
>>> +	pe_handle = pasid & SPA_PE_MASK;
>>> +	pe = spa->spa_mem + pe_handle;
>>> +
>>> +	mutex_lock(&spa->spa_lock);
>>> +
>>> +	pe->tid = tid;
>>> +
>>> +	/*
>>> +	 * The barrier makes sure the PE is updated
>>> +	 * before we clear the NPU context cache below, so that the
>>> +	 * old PE cannot be reloaded erroneously.
>>> +	 */
>>> +	mb();
>>> +
>>> +	/*
>>> +	 * hook to platform code
>>> +	 * On powerpc, the entry needs to be cleared from the context
>>> +	 * cache of the NPU.
>>> +	 */
>>> +	rc = pnv_ocxl_spa_remove_pe_from_cache(link->platform_data,
>> pe_handle);
>>> +	WARN_ON(rc);
>>> +
>>> +	mutex_unlock(&spa->spa_lock);
>>> +	return rc;
>>> +}
>>> +
>>>    int ocxl_link_remove_pe(void *link_handle, int pasid)
>>>    {
>>>    	struct link *link = (struct link *) link_handle; diff --git
>>> a/drivers/misc/ocxl/ocxl_internal.h
>>> b/drivers/misc/ocxl/ocxl_internal.h
>>> index 5d421824afd9..a32f2151029f 100644
>>> --- a/drivers/misc/ocxl/ocxl_internal.h
>>> +++ b/drivers/misc/ocxl/ocxl_internal.h
>>> @@ -77,6 +77,7 @@ struct ocxl_context {
>>>    	struct ocxl_xsl_error xsl_error;
>>>    	struct mutex irq_lock;
>>>    	struct idr irq_idr;
>>> +	u16 tidr; // Thread ID used for P9 wait implementation
>>>    };
>>>
>>>    struct ocxl_process_element {
>>> diff --git a/include/misc/ocxl.h b/include/misc/ocxl.h index
>>> 51ccf76db293..9ff6ddc28e22 100644
>>> --- a/include/misc/ocxl.h
>>> +++ b/include/misc/ocxl.h
>>> @@ -188,6 +188,15 @@ extern int ocxl_link_add_pe(void *link_handle, int
>> pasid, u32 pidr, u32 tidr,
>>>    		void (*xsl_err_cb)(void *data, u64 addr, u64 dsisr),
>>>    		void *xsl_err_data);
>>>
>>> +/**
>>> + * Update values within a Process Element
>>> + *
>>> + * link_handle: the link handle associated with the process element
>>> + * pasid: the PASID for the AFU context
>>> + * tid: the new thread id for the process element  */ extern int
>>> +ocxl_link_update_pe(void *link_handle, int pasid, __u16 tid);
>>> +
>>>    /*
>>>     * Remove a Process Element from the Shared Process Area for a link
>>>     */
>>> diff --git a/include/uapi/misc/ocxl.h b/include/uapi/misc/ocxl.h index
>>> 0af83d80fb3e..561e6f0dfcb7 100644
>>> --- a/include/uapi/misc/ocxl.h
>>> +++ b/include/uapi/misc/ocxl.h
>>> @@ -48,6 +48,13 @@ struct ocxl_ioctl_metadata {
>>>    	__u64 reserved[13]; // Total of 16*u64
>>>    };
>>>
>>> +struct ocxl_ioctl_p9_wait {
>>> +	__u16 thread_id; // The thread ID required to wake this thread
>>> +	__u16 reserved1;
>>> +	__u32 reserved2;
>>> +	__u64 reserved3[3];
>>> +};
>>> +
>>>    struct ocxl_ioctl_irq_fd {
>>>    	__u64 irq_offset;
>>>    	__s32 eventfd;
>>> @@ -62,5 +69,6 @@ struct ocxl_ioctl_irq_fd {
>>>    #define OCXL_IOCTL_IRQ_FREE	_IOW(OCXL_MAGIC, 0x12, __u64)
>>>    #define OCXL_IOCTL_IRQ_SET_FD	_IOW(OCXL_MAGIC, 0x13, struct
>> ocxl_ioctl_irq_fd)
>>>    #define OCXL_IOCTL_GET_METADATA _IOR(OCXL_MAGIC, 0x14, struct
>>> ocxl_ioctl_metadata)
>>> +#define OCXL_IOCTL_ENABLE_P9_WAIT	_IOR(OCXL_MAGIC, 0x15,
>> struct ocxl_ioctl_p9_wait)
>>>
>>>    #endif /* _UAPI_MISC_OCXL_H */
>>>
>>
>>
>> ---
>> This email has been checked for viruses by AVG.
>> http://www.avg.com
> 
> 

^ permalink raw reply

* [PATCH 0/2] powerpc: Scheduler optimization for POWER9 bigcores
From: Gautham R. Shenoy @ 2018-05-11 11:17 UTC (permalink / raw)
  To: Michael Ellerman, Benjamin Herrenschmidt, Michael Neuling,
	Vaidyanathan Srinivasan, Akshay Adiga, Shilpasri G Bhat,
	Balbir Singh, Oliver O'Halloran, Nicholas Piggin
  Cc: linuxppc-dev, linux-kernel, Gautham R. Shenoy

From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>

Hi,

A pair of IBM POWER9 SMT4 cores can be fused together to form a
big-core with 8 SMT threads. This can be discovered via the
"ibm,thread-groups" CPU property in the device tree which will
indicate which group of threads that share the L1 cache, translation
cache and instruction data flow.  If there are multiple such group of
threads, then the core is a big-core. Furthermore, the thread-ids of
such a big-core is obtained by interleaving the thread-ids of the
component SMT4 cores.

Eg: Threads in the pair of component SMT4 cores of an interleaved
big-core are numbered {0,2,4,6} and {1,3,5,7} respectively.
    
On such a big-core, when multiple tasks are scheduled to run on the
big-core, we get the best performance when the tasks are spread across
the pair of SMT4 cores.

The Linux scheduler has a feature called "ASYM_SMT" which will bias
the load-balancing of the tasks on the smaller numbered threads in the
core. On an big-core whose threads are interleavings of the threads of
the small cores, enabling ASYM_SMT automatically results in spreading
the tasks uniformly across the associated pair of SMT4 cores.

In this patchset, we detect if the cores are big-cores with interleaved threads.
If so, we enable ASYM_SMT feature at the SMT-sched domain.

Experimental results for ebizzy with 2 threads, bound to a single big-core
show a marked improvement with this patchset over the 4.17-rc3 vanilla
kernel.

The results of 100 such runs, for 4.17-rc3 vanilla kernel and
4.17-rc3 + this_patchset are as follows.

4.17-rc3 vanilla:
================================================
  ebizzy records/s   :# samples: Histogram 
=================================================
[1000000 - 2000000]  :      14    : ###
[2000000 - 3000000]  :      13    : ###
[3000000 - 4000000]  :      15    : ####
[4000000 - 5000000]  :       2    : #
[5000000 - 6000000]  :      56    : ############
=================================================
 
4.17-rc3 vanilla + this_patchset
=================================================
  ebizzy records/s   :# samples: Histogram 
=================================================
[1000000 - 2000000]  :       0    : #
[2000000 - 3000000]  :      14    : ###
[3000000 - 4000000]  :       0    : #
[4000000 - 5000000]  :       1    : #
[5000000 - 6000000]  :      85    : ##################
=================================================

This patchset contains two patchset.

The first patch detects the presence of big-cores with interleaved
threads.

The second patch adds the ASYM_SMT bit to the flags the SMT-sched
domain when interleaved big-cores are detected.

Gautham R. Shenoy (2):
  powerpc: Detect the presence of big-core with interleaved threads
  powerpc: Enable ASYM_SMT on interleaved big-core systems

 arch/powerpc/include/asm/cputhreads.h |  8 +++--
 arch/powerpc/kernel/setup-common.c    | 63 +++++++++++++++++++++++++++++++++--
 arch/powerpc/kernel/smp.c             |  2 +-
 3 files changed, 67 insertions(+), 6 deletions(-)

-- 
1.9.4

^ permalink raw reply

* [PATCH 1/2] powerpc: Detect the presence of big-core with interleaved threads
From: Gautham R. Shenoy @ 2018-05-11 11:17 UTC (permalink / raw)
  To: Michael Ellerman, Benjamin Herrenschmidt, Michael Neuling,
	Vaidyanathan Srinivasan, Akshay Adiga, Shilpasri G Bhat,
	Balbir Singh, Oliver O'Halloran, Nicholas Piggin
  Cc: linuxppc-dev, linux-kernel, Gautham R. Shenoy
In-Reply-To: <1526037444-22876-1-git-send-email-ego@linux.vnet.ibm.com>

From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>

A pair of IBM POWER9 SMT4 cores can be fused together to form a
big-core with 8 SMT threads. This can be discovered via the
"ibm,thread-groups" CPU property in the device tree which will
indicate which group of threads that share the L1 cache, translation
cache and instruction data flow.  If there are multiple such group of
threads, then the core is a big-core. The thread-ids of the threads of
the big-core can be obtained by interleaving the thread-ids of the
thread-groups (component small core).

Eg: Threads in the pair of component SMT4 cores of an interleaved
big-core are numbered {0,2,4,6} and {1,3,5,7} respectively.

This patch introduces a function to check if a given device tree node
corresponding to a CPU node represents an interleaved big-core.

This function is invoked during the boot-up to detect the presence of
interleaved big-cores. The presence of such an interleaved big-core is
recorded in a global variable for later use.

Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/cputhreads.h |  8 +++--
 arch/powerpc/kernel/setup-common.c    | 63 +++++++++++++++++++++++++++++++++--
 2 files changed, 66 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/cputhreads.h b/arch/powerpc/include/asm/cputhreads.h
index d71a909..b706f0a 100644
--- a/arch/powerpc/include/asm/cputhreads.h
+++ b/arch/powerpc/include/asm/cputhreads.h
@@ -23,11 +23,13 @@
 extern int threads_per_core;
 extern int threads_per_subcore;
 extern int threads_shift;
+extern bool has_interleaved_big_core;
 extern cpumask_t threads_core_mask;
 #else
-#define threads_per_core	1
-#define threads_per_subcore	1
-#define threads_shift		0
+#define threads_per_core		1
+#define threads_per_subcore		1
+#define threads_shift			0
+#define has_interleaved_big_core	0
 #define threads_core_mask	(*get_cpu_mask(0))
 #endif
 
diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
index 0af5c11..884dff2 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -408,10 +408,12 @@ void __init check_for_initrd(void)
 #ifdef CONFIG_SMP
 
 int threads_per_core, threads_per_subcore, threads_shift;
+bool has_interleaved_big_core;
 cpumask_t threads_core_mask;
 EXPORT_SYMBOL_GPL(threads_per_core);
 EXPORT_SYMBOL_GPL(threads_per_subcore);
 EXPORT_SYMBOL_GPL(threads_shift);
+EXPORT_SYMBOL_GPL(has_interleaved_big_core);
 EXPORT_SYMBOL_GPL(threads_core_mask);
 
 static void __init cpu_init_thread_core_maps(int tpc)
@@ -436,8 +438,56 @@ static void __init cpu_init_thread_core_maps(int tpc)
 	printk(KERN_DEBUG " (thread shift is %d)\n", threads_shift);
 }
 
-
 u32 *cpu_to_phys_id = NULL;
+/*
+ * check_for_interleaved_big_core - Checks if the core represented by
+ *	 dn is a big-core whose threads are interleavings of the
+ *	 threads of the component small cores.
+ *
+ * @dn: device node corresponding to the core.
+ *
+ * Returns true if the core is a interleaved big-core.
+ * Returns false otherwise.
+ */
+static inline bool check_for_interleaved_big_core(struct device_node *dn)
+{
+	int len, nr_groups, threads_per_group;
+	const __be32 *thread_groups;
+	__be32 *thread_list, *first_cpu_idx;
+	int cur_cpu, next_cpu, i, j;
+
+	thread_groups = of_get_property(dn, "ibm,thread-groups", &len);
+	if (!thread_groups)
+		return false;
+
+	nr_groups = be32_to_cpu(*(thread_groups + 1));
+	if (nr_groups <= 1)
+		return false;
+
+	threads_per_group = be32_to_cpu(*(thread_groups + 2));
+	thread_list = (__be32 *)thread_groups + 3;
+
+	/*
+	 * In case of an interleaved big-core, the thread-ids of the
+	 * big-core can be obtained by interleaving the the thread-ids
+	 * of the component small
+	 *
+	 * Eg: On a 8-thread big-core with two SMT4 small cores, the
+	 * threads of the two component small cores will be
+	 * {0, 2, 4, 6} and {1, 3, 5, 7}.
+	 */
+	for (i = 0; i < nr_groups; i++) {
+		first_cpu_idx = thread_list + i * threads_per_group;
+
+		for (j = 0; j < threads_per_group - 1; j++) {
+			cur_cpu = be32_to_cpu(*(first_cpu_idx + j));
+			next_cpu = be32_to_cpu(*(first_cpu_idx + j + 1));
+			if (next_cpu != cur_cpu + nr_groups)
+				return false;
+		}
+	}
+	return true;
+}
 
 /**
  * setup_cpu_maps - initialize the following cpu maps:
@@ -565,7 +615,16 @@ void __init smp_setup_cpu_maps(void)
 	vdso_data->processorCount = num_present_cpus();
 #endif /* CONFIG_PPC64 */
 
-        /* Initialize CPU <=> thread mapping/
+	dn = of_find_node_by_type(NULL, "cpu");
+	if (dn) {
+		if (check_for_interleaved_big_core(dn)) {
+			has_interleaved_big_core = true;
+			pr_info("Detected interleaved big-cores\n");
+		}
+		of_node_put(dn);
+	}
+
+	/* Initialize CPU <=> thread mapping/
 	 *
 	 * WARNING: We assume that the number of threads is the same for
 	 * every CPU in the system. If that is not the case, then some code
-- 
1.9.4

^ permalink raw reply related

* [PATCH 2/2] powerpc: Enable ASYM_SMT on interleaved big-core systems
From: Gautham R. Shenoy @ 2018-05-11 11:17 UTC (permalink / raw)
  To: Michael Ellerman, Benjamin Herrenschmidt, Michael Neuling,
	Vaidyanathan Srinivasan, Akshay Adiga, Shilpasri G Bhat,
	Balbir Singh, Oliver O'Halloran, Nicholas Piggin
  Cc: linuxppc-dev, linux-kernel, Gautham R. Shenoy
In-Reply-To: <1526037444-22876-1-git-send-email-ego@linux.vnet.ibm.com>

From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>

Each of the SMT4 cores forming a fused-core are more or less
independent units. Thus when multiple tasks are scheduled to run on
the fused core, we get the best performance when the tasks are spread
across the pair of SMT4 cores.

Since the threads in the pair of SMT4 cores of an interleaved big-core
are numbered {0,2,4,6} and {1,3,5,7} respectively, enable ASYM_SMT on
such interleaved big-cores that will bias the load-balancing of tasks
on smaller numbered threads, which will automatically result in
spreading the tasks uniformly across the associated pair of SMT4
cores.

Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/smp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 9ca7148..0153f01 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -1082,7 +1082,7 @@ static int powerpc_smt_flags(void)
 {
 	int flags = SD_SHARE_CPUCAPACITY | SD_SHARE_PKG_RESOURCES;
 
-	if (cpu_has_feature(CPU_FTR_ASYM_SMT)) {
+	if (cpu_has_feature(CPU_FTR_ASYM_SMT) || has_interleaved_big_core) {
 		printk_once(KERN_INFO "Enabling Asymmetric SMT scheduling\n");
 		flags |= SD_ASYM_PACKING;
 	}
-- 
1.9.4

^ permalink raw reply related

* Re: [PATCH v3 1/2] cxl: Set the PBCQ Tunnel BAR register when enabling capi mode
From: Frederic Barrat @ 2018-05-11 11:25 UTC (permalink / raw)
  To: Philippe Bergheaud, linuxppc-dev; +Cc: clombard, benh
In-Reply-To: <20180425110834.16048-1-felix@linux.ibm.com>



Le 25/04/2018 à 13:08, Philippe Bergheaud a écrit :
> Skiboot used to set the default Tunnel BAR register value when capi mode
> was enabled. This approach was ok for the cxl driver, but prevented other
> drivers from choosing different values.
> 
> Skiboot versions > 5.11 will not set the default value any longer. This
> patch modifies the cxl driver to set/reset the Tunnel BAR register when
> entering/exiting the cxl mode, with pnv_pci_set_tunnel_bar().
> 
> Signed-off-by: Philippe Bergheaud <felix@linux.ibm.com>
> Reviewed-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
> ---

ok, so that should work with old skiboot (since we are re-writing the 
value already set) and new skiboot. Thanks!

Acked-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>


> v2: Restrict tunnel bar setting to power9.
>      Do not fail cxl_configure_adapter() on tunnel bar setting error.
>      Log an info message instead, and continue configuring capi mode.
> 
> v3: No change.
> ---
>   drivers/misc/cxl/pci.c | 6 ++++++
>   1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/misc/cxl/pci.c b/drivers/misc/cxl/pci.c
> index 83f1d08058fc..355c789406f7 100644
> --- a/drivers/misc/cxl/pci.c
> +++ b/drivers/misc/cxl/pci.c
> @@ -1742,6 +1742,10 @@ static int cxl_configure_adapter(struct cxl *adapter, struct pci_dev *dev)
>   	/* Required for devices using CAPP DMA mode, harmless for others */
>   	pci_set_master(dev);
>   
> +	if (cxl_is_power9())
> +		if (pnv_pci_set_tunnel_bar(dev, 0x00020000E0000000ull, 1))
> +			dev_info(&dev->dev, "Tunneled operations unsupported\n");
> +
>   	if ((rc = pnv_phb_to_cxl_mode(dev, adapter->native->sl_ops->capi_mode)))
>   		goto err;
>   
> @@ -1768,6 +1772,8 @@ static void cxl_deconfigure_adapter(struct cxl *adapter)
>   {
>   	struct pci_dev *pdev = to_pci_dev(adapter->dev.parent);
>   
> +	if (cxl_is_power9())
> +		pnv_pci_set_tunnel_bar(pdev, 0x00020000E0000000ull, 0);
>   	cxl_native_release_psl_err_irq(adapter);
>   	cxl_unmap_adapter_regs(adapter);
>   
> 

^ permalink raw reply

* Re: [PATCH v3 2/2] cxl: Report the tunneled operations status
From: Frederic Barrat @ 2018-05-11 11:26 UTC (permalink / raw)
  To: Philippe Bergheaud, linuxppc-dev; +Cc: clombard, benh
In-Reply-To: <20180425110834.16048-2-felix@linux.ibm.com>



Le 25/04/2018 à 13:08, Philippe Bergheaud a écrit :
> Failure to synchronize the tunneled operations does not prevent
> the initialization of the cxl card. This patch reports the tunneled
> operations status via /sys.
> 
> Signed-off-by: Philippe Bergheaud <felix@linux.ibm.com>
> ---

Good idea, but you'll have to also edit
Documentation/ABI/testing/sysfs-class-cxl
to describe the new sysfs entry.

   Fred


> v3: Added this patch to report the tunneled operations status.
> ---
>   drivers/misc/cxl/cxl.h   |  1 +
>   drivers/misc/cxl/pci.c   |  7 ++++++-
>   drivers/misc/cxl/sysfs.c | 10 ++++++++++
>   3 files changed, 17 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/misc/cxl/cxl.h b/drivers/misc/cxl/cxl.h
> index a4c9c8297a6d..918d4fb742d1 100644
> --- a/drivers/misc/cxl/cxl.h
> +++ b/drivers/misc/cxl/cxl.h
> @@ -717,6 +717,7 @@ struct cxl {
>   	bool perst_select_user;
>   	bool perst_same_image;
>   	bool psl_timebase_synced;
> +	bool tunneled_ops_supported;
>   
>   	/*
>   	 * number of contexts mapped on to this card. Possible values are:
> diff --git a/drivers/misc/cxl/pci.c b/drivers/misc/cxl/pci.c
> index 355c789406f7..008f50a0c465 100644
> --- a/drivers/misc/cxl/pci.c
> +++ b/drivers/misc/cxl/pci.c
> @@ -1742,9 +1742,14 @@ static int cxl_configure_adapter(struct cxl *adapter, struct pci_dev *dev)
>   	/* Required for devices using CAPP DMA mode, harmless for others */
>   	pci_set_master(dev);
>   
> -	if (cxl_is_power9())
> +	adapter->tunneled_ops_supported = false;
> +
> +	if (cxl_is_power9()) {
>   		if (pnv_pci_set_tunnel_bar(dev, 0x00020000E0000000ull, 1))
>   			dev_info(&dev->dev, "Tunneled operations unsupported\n");
> +		else
> +			adapter->tunneled_ops_supported = true;
> +	}
>   
>   	if ((rc = pnv_phb_to_cxl_mode(dev, adapter->native->sl_ops->capi_mode)))
>   		goto err;
> diff --git a/drivers/misc/cxl/sysfs.c b/drivers/misc/cxl/sysfs.c
> index 95285b7f636f..4b5a4c5d3c01 100644
> --- a/drivers/misc/cxl/sysfs.c
> +++ b/drivers/misc/cxl/sysfs.c
> @@ -78,6 +78,15 @@ static ssize_t psl_timebase_synced_show(struct device *device,
>   	return scnprintf(buf, PAGE_SIZE, "%i\n", adapter->psl_timebase_synced);
>   }
>   
> +static ssize_t tunneled_ops_supported_show(struct device *device,
> +					struct device_attribute *attr,
> +					char *buf)
> +{
> +	struct cxl *adapter = to_cxl_adapter(device);
> +
> +	return scnprintf(buf, PAGE_SIZE, "%i\n", adapter->tunneled_ops_supported);
> +}
> +
>   static ssize_t reset_adapter_store(struct device *device,
>   				   struct device_attribute *attr,
>   				   const char *buf, size_t count)
> @@ -183,6 +192,7 @@ static struct device_attribute adapter_attrs[] = {
>   	__ATTR_RO(base_image),
>   	__ATTR_RO(image_loaded),
>   	__ATTR_RO(psl_timebase_synced),
> +	__ATTR_RO(tunneled_ops_supported),
>   	__ATTR_RW(load_image_on_perst),
>   	__ATTR_RW(perst_reloads_same_image),
>   	__ATTR(reset, S_IWUSR, NULL, reset_adapter_store),
> 

^ permalink raw reply

* [GIT PULL] Please pull powerpc/linux.git powerpc-4.17-5 tag
From: Michael Ellerman @ 2018-05-11 13:41 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, linuxppc-dev, naveen.n.rao

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Hi Linus,

Please pull some more powerpc fixes for 4.17:

The following changes since commit b2d7ecbe355698010a6b7a15eb179e09eb3d6a34:

  powerpc/kvm/booke: Fix altivec related build break (2018-04-27 16:36:03 +1000)

are available in the git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git tags/powerpc-4.17-5

for you to fetch changes up to 6c0a8f6b5a45ac892a763b6299bd3c5324fc5e02:

  powerpc/pseries: Fix CONFIG_NUMA=n build (2018-05-08 14:59:56 +1000)

- ----------------------------------------------------------------
powerpc fixes for 4.17 #5

One fix for an actual regression, the change to the SYSCALL_DEFINE wrapper broke
FTRACE_SYSCALLS for us due to a name mismatch. There's also another commit to
the same code to make sure we match all our syscalls with various prefixes.

And then just one minor build fix, and the removal of an unused variable that
was removed and then snuck back in due to some rebasing.

Thanks to:
  Naveen N. Rao.

- ----------------------------------------------------------------
Michael Ellerman (2):
      powerpc/64: Remove unused paca->soft_enabled
      powerpc/pseries: Fix CONFIG_NUMA=n build

Naveen N. Rao (2):
      powerpc/trace/syscalls: Update syscall name matching logic
      powerpc/trace/syscalls: Update syscall name matching logic to account for ppc_ prefix

 arch/powerpc/include/asm/ftrace.h   | 29 +++++++++++++++++++++--------
 arch/powerpc/include/asm/paca.h     |  1 -
 arch/powerpc/include/asm/topology.h | 13 +++++--------
 3 files changed, 26 insertions(+), 17 deletions(-)
-----BEGIN PGP SIGNATURE-----

iQIcBAEBCAAGBQJa9Z2YAAoJEFHr6jzI4aWAS/IQAJwlHONhDnkjycj5hWQbmYuU
tRBckKHixyiB5r9trAhpiO7H78dod880qORM/Qvx4f/2fhs+2kNOAD5BJlAwfjg2
0OgJJWMXE6i5IPFzALwWn8rmUHWRaimYaOW+oGh+ieN0nup80OFQwZMLB28CUdGX
b4Nu0c+1jtc9TO5R4zbaDysCVA9xbQjEv0WffwrV5PFDoDm7cxy8cp/RqU1xwZwi
ar07SPM6eu+cUx9NqMpKG75wO2lh2KuqBSYqH3wxlxSSpygLSutdy/hQRZqqQJKF
sNFmb239OIDayTH38Ht2GADl7W+qbJSLmdpqhNvmu3xgnprEd6Ipf3M5hmkSksb/
/0ppzvGfFNOleBRMIOawTgRa7er+AsDZUuC65JZuEkGv1eNXMcALWy8/UCQ3gyQV
jlxL8JoWoTW8AqDgJvb50GDCDxvZpZNGebARWc57A2OkvhHTLG+DqYE/oVHKCyZM
wkCnHHsBOsS+0p/qOPIHuwvu1l03EsuY/K1D1tkWrqAjJX5MTIA9rn4R9zMZnDPb
9UXxPymYazMaJ5yKzbMZmsLNjntu9ltQi8aS1l1FwTrubYzdohagjdg3unOMwcHv
UAHPK609fhOncfylPNWlqaG4CjWRNyqBB6zGSSmj8v7Pg65VxhrgNac5ES8RbbN2
GZUrjS71LSRFiwqYK4+j
=nibe
-----END PGP SIGNATURE-----

^ permalink raw reply

* [PATCH] powerpc/perf: Fix memory allocation for core-imc based on num_possible_cpus()
From: Anju T Sudhakar @ 2018-05-11 13:43 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: anju, mpe, maddy, ppaidipe

Currently memory is allocated for core-imc based on cpu_present_mask, which has
bit 'cpu' set iff cpu is populated. We use  (cpu number / threads per core)
as as array index to access the memory.
So in a system with guarded cores, since allocation happens based on
cpu_present_mask, (cpu number / threads per core) bounds the index and leads
to memory overflow.

The issue is exposed in a guard test.
The guard test will make some CPU's as un-available to the system during boot
time as well as at runtime. So when the cpu is unavailable to the system during
boot time, the memory allocation happens depending on the number of available
cpus. And when we access the memory using (cpu number / threads per core) as the
index the system crashes due to memory overflow.

Allocating memory for core-imc based on cpu_possible_mask, which has
bit 'cpu' set iff cpu is populatable, will fix this issue.

Reported-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com>
Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
---
 arch/powerpc/perf/imc-pmu.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
index d7532e7..75fb23c 100644
--- a/arch/powerpc/perf/imc-pmu.c
+++ b/arch/powerpc/perf/imc-pmu.c
@@ -1146,7 +1146,7 @@ static int init_nest_pmu_ref(void)
 
 static void cleanup_all_core_imc_memory(void)
 {
-	int i, nr_cores = DIV_ROUND_UP(num_present_cpus(), threads_per_core);
+	int i, nr_cores = DIV_ROUND_UP(num_possible_cpus(), threads_per_core);
 	struct imc_mem_info *ptr = core_imc_pmu->mem_info;
 	int size = core_imc_pmu->counter_mem_size;
 
@@ -1264,7 +1264,7 @@ static int imc_mem_init(struct imc_pmu *pmu_ptr, struct device_node *parent,
 		if (!pmu_ptr->pmu.name)
 			return -ENOMEM;
 
-		nr_cores = DIV_ROUND_UP(num_present_cpus(), threads_per_core);
+		nr_cores = DIV_ROUND_UP(num_possible_cpus(), threads_per_core);
 		pmu_ptr->mem_info = kcalloc(nr_cores, sizeof(struct imc_mem_info),
 								GFP_KERNEL);
 
-- 
2.7.4

^ permalink raw reply related

* Re: [PATCH 1/2] crypto: vmx - Remove overly verbose printk from AES init routines
From: Herbert Xu @ 2018-05-11 16:19 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: linux-crypto, linuxppc-dev, linux-kernel, mhcerri, pfsmorigo
In-Reply-To: <20180503122930.17448-1-mpe@ellerman.id.au>

On Thu, May 03, 2018 at 10:29:29PM +1000, Michael Ellerman wrote:
> In the vmx AES init routines we do a printk(KERN_INFO ...) to report
> the fallback implementation we're using.
> 
> However with a slow console this can significantly affect the speed of
> crypto operations. Using 'cryptsetup benchmark' the removal of the
> printk() leads to a ~5x speedup for aes-cbc decryption.
> 
> So remove them.
> 
> Fixes: 8676590a1593 ("crypto: vmx - Adding AES routines for VMX module")
> Fixes: 8c755ace357c ("crypto: vmx - Adding CBC routines for VMX module")
> Fixes: 4f7f60d312b3 ("crypto: vmx - Adding CTR routines for VMX module")
> Fixes: cc333cd68dfa ("crypto: vmx - Adding GHASH routines for VMX module")
> Cc: stable@vger.kernel.org # v4.1+
> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
> ---
>  drivers/crypto/vmx/aes.c     | 2 --
>  drivers/crypto/vmx/aes_cbc.c | 3 ---
>  drivers/crypto/vmx/aes_ctr.c | 2 --
>  drivers/crypto/vmx/ghash.c   | 2 --
>  4 files changed, 9 deletions(-)

Patch applied.  Thanks.

> If this is the wrong fix please let me know, I'm not a crypto expert.
> 
> What we see is 'cryptsetup benchmark' causing thousands of these printks() to
> happen. The call trace is something like:
> 
> [c000001e47867a60] [c0000000009cf6b4] p8_aes_cbc_init+0x74/0xf0
> [c000001e47867ae0] [c000000000551a80] __crypto_alloc_tfm+0x1d0/0x2c0
> [c000001e47867b20] [c00000000055aea4] crypto_skcipher_init_tfm+0x124/0x280
> [c000001e47867b60] [c00000000055138c] crypto_create_tfm+0x9c/0x1a0
> [c000001e47867ba0] [c000000000552220] crypto_alloc_tfm+0xa0/0x140
> [c000001e47867c00] [c00000000055b168] crypto_alloc_skcipher+0x48/0x70
> [c000001e47867c40] [c00000000057af28] skcipher_bind+0x38/0x50
> [c000001e47867c80] [c00000000057a07c] alg_bind+0xbc/0x220
> [c000001e47867d10] [c000000000a016a0] __sys_bind+0x90/0x100
> [c000001e47867df0] [c000000000a01750] sys_bind+0x40/0x60
> [c000001e47867e30] [c00000000000b320] system_call+0x58/0x6c
> 
> 
> Is it normal for init to be called on every system call?

This is the tfm init function, so yes it is called every time
you allocate a tfm.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH 2/2] crypto: vmx - Remove overly verbose printk from AES XTS init
From: Herbert Xu @ 2018-05-11 16:19 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: linux-crypto, linuxppc-dev, linux-kernel, mhcerri, pfsmorigo
In-Reply-To: <20180503122930.17448-2-mpe@ellerman.id.au>

On Thu, May 03, 2018 at 10:29:30PM +1000, Michael Ellerman wrote:
> In p8_aes_xts_init() we do a printk(KERN_INFO ...) to report the
> fallback implementation we're using. However with a slow console this
> can significantly affect the speed of crypto operations. So remove it.
> 
> Fixes: c07f5d3da643 ("crypto: vmx - Adding support for XTS")
> Cc: stable@vger.kernel.org # v4.8+
> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
> ---
>  drivers/crypto/vmx/aes_xts.c | 2 --
>  1 file changed, 2 deletions(-)

Patch applied.  Thanks.
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* RE: [PATCH 3/9] soc: fsl: set rcpm bit for FTM
From: Leo Li @ 2018-05-11 17:00 UTC (permalink / raw)
  To: Yinbo Zhu, Yinbo Zhu, Rob Herring, Mark Rutland,
	Catalin Marinas ), Will Deacon ), Lorenzo Pieralisi )
  Cc: Xiaobo Xie, Ran Wang, Daniel Lezcano, Thomas Gleixner, Shawn Guo,
	Madalin-cristian Bucur, Z.q. Hou, Jerry Huang, M.h. Lian,
	Qiang Zhao, Fabio Estevam, Jiaheng Fan, Po Liu, Nipun Gupta,
	Horia Geantă, Priyanka Jain, Sumit Garg, costi,
	Bogdan Purcareata, open list:CLOCKSOURCE, CLOCKEVENT DRIVERS,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS,
	linux-arm-kernel@lists.infradead.org,
	open list:FREESCALE SOC DRIVERS, Andy Tang, Ying Zhang
In-Reply-To: <20180511033530.7931-3-yinbo.zhu@nxp.com>



> -----Original Message-----
> From: Yinbo Zhu [mailto:yinbo.zhu@nxp.com]
> Sent: Thursday, May 10, 2018 10:35 PM
> To: Yinbo Zhu <yinbo.zhu@nxp.com>; Rob Herring <robh+dt@kernel.org>;
> Mark Rutland <mark.rutland@arm.com>; Catalin Marinas )
> <catalin.marinas@arm.com>; Will Deacon ) <will.deacon@arm.com>;
> Lorenzo Pieralisi ) <lorenzo.pieralisi@arm.com>; Leo Li <leoyang.li@nxp.c=
om>
> Cc: Xiaobo Xie <xiaobo.xie@nxp.com>; Ran Wang <ran.wang_1@nxp.com>;
> Daniel Lezcano <daniel.lezcano@linaro.org>; Thomas Gleixner
> <tglx@linutronix.de>; Shawn Guo <shawnguo@kernel.org>; Madalin-cristian
> Bucur <madalin.bucur@nxp.com>; Z.q. Hou <zhiqiang.hou@nxp.com>; Jerry
> Huang <jerry.huang@nxp.com>; M.h. Lian <minghuan.lian@nxp.com>;
> Qiang Zhao <qiang.zhao@nxp.com>; Fabio Estevam
> <fabio.estevam@nxp.com>; Jiaheng Fan <jiaheng.fan@nxp.com>; Po Liu
> <po.liu@nxp.com>; Nipun Gupta <nipun.gupta@nxp.com>; Horia Geant=E3
> <horia.geanta@nxp.com>; Priyanka Jain <priyanka.jain@nxp.com>; Sumit
> Garg <sumit.garg@nxp.com>; costi <constantin.tudor@freescale.com>;
> Bogdan Purcareata <bogdan.purcareata@nxp.com>; Meng Yi
> <meng.yi@nxp.com>; Wang Dongsheng <dongsheng.wang@nxp.com>; open
> list:CLOCKSOURCE, CLOCKEVENT DRIVERS <linux-kernel@vger.kernel.org>;
> open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS
> <devicetree@vger.kernel.org>; linux-arm-kernel@lists.infradead.org; open
> list:FREESCALE SOC DRIVERS <linuxppc-dev@lists.ozlabs.org>; Andy Tang
> <andy.tang@nxp.com>; Ying Zhang <ying.zhang22455@nxp.com>
> Subject: [PATCH 3/9] soc: fsl: set rcpm bit for FTM
>=20
> From: Zhang Ying-22455 <ying.zhang22455@nxp.com>
>=20
> Set RCPM for FTM when using FTM as wakeup source. Because the RCPM
> module of each platform has different big-end and little-end mode, there
> need to set RCPM depending on the platform.
>=20
> Signed-off-by: Zhang Ying-22455 <ying.zhang22455@nxp.com>
> Signed-off-by: Yinbo Zhu <yinbo.zhu@nxp.com>
> ---
>  .../devicetree/bindings/timer/fsl,ftm-timer.txt    |    7 ++
>  drivers/soc/fsl/layerscape/ftm_alarm.c             |   92 ++++++++++++++=
++++-
>  2 files changed, 94 insertions(+), 5 deletions(-)
>=20
> diff --git a/Documentation/devicetree/bindings/timer/fsl,ftm-timer.txt
> b/Documentation/devicetree/bindings/timer/fsl,ftm-timer.txt
> index aa8c402..15ead58 100644
> --- a/Documentation/devicetree/bindings/timer/fsl,ftm-timer.txt
> +++ b/Documentation/devicetree/bindings/timer/fsl,ftm-timer.txt
> @@ -3,6 +3,13 @@ Freescale FlexTimer Module (FTM) Timer  Required
> properties:
>=20
>  - compatible : should be "fsl,ftm-timer"

Hi Yingbo,

This is a change that breaks backward compatibility and not acceptable.

> + Possible compatibles for ARM:
> +     "fsl,ls1012a-ftm"
> +     "fsl,ls1021a-ftm"
> +     "fsl,ls1043a-ftm"
> +     "fsl,ls1046a-ftm"
> +     "fsl,ls1088a-ftm"
> +     "fsl,ls208xa-ftm"
>  - reg : Specifies base physical address and size of the register sets fo=
r the
>    clock event device and clock source device.
>  - interrupts : Should be the clock event device interrupt.
> diff --git a/drivers/soc/fsl/layerscape/ftm_alarm.c
> b/drivers/soc/fsl/layerscape/ftm_alarm.c
> index 6f9882f..811dcfa 100644
> --- a/drivers/soc/fsl/layerscape/ftm_alarm.c
> +++ b/drivers/soc/fsl/layerscape/ftm_alarm.c

There is no such file in the mainline kernel.  So it looks like the patch s=
et is based on some internal git repo instead of the upstream Linux kernel.=
  This kind of patches shouldn't be sent to the upstream mailing list for r=
eview.

Regards,
Leo

^ permalink raw reply

* [PATCH 0/7] Miscellaneous patches and bug fixes
From: Uma Krishnan @ 2018-05-11 19:04 UTC (permalink / raw)
  To: linux-scsi, James Bottomley, Martin K. Petersen, Matthew R. Ochs,
	Manoj N. Kumar
  Cc: linuxppc-dev, Andrew Donnellan, Frederic Barrat,
	Christophe Lombard

This patch series adds few improvements to the cxlflash driver and
it also contains couple of bug fixes.

This patch series is intended for 4.18 and is bisectable.

Matthew R. Ochs (1):
  cxlflash: Use local mutex for AFU serialization

Uma Krishnan (6):
  cxlflash: Yield to active send threads
  cxlflash: Limit the debug logs in the IO path
  cxlflash: Acquire semaphore before invoking ioctl services
  cxlflash: Add include guards to backend.h
  cxlflash: Abstract hardware dependent assignments
  cxlflash: Isolate external module dependencies

 drivers/scsi/cxlflash/Kconfig     |  2 +-
 drivers/scsi/cxlflash/Makefile    |  4 +++-
 drivers/scsi/cxlflash/backend.h   |  5 +++++
 drivers/scsi/cxlflash/common.h    |  1 +
 drivers/scsi/cxlflash/lunmgt.c    |  4 +++-
 drivers/scsi/cxlflash/main.c      | 21 +++++++++------------
 drivers/scsi/cxlflash/main.h      | 20 ++++++++++++++++++++
 drivers/scsi/cxlflash/superpipe.c |  9 ++++++++-
 drivers/scsi/cxlflash/vlun.c      |  3 ++-
 9 files changed, 52 insertions(+), 17 deletions(-)

-- 
2.1.0

^ permalink raw reply

* [PATCH 1/7] cxlflash: Yield to active send threads
From: Uma Krishnan @ 2018-05-11 19:04 UTC (permalink / raw)
  To: linux-scsi, James Bottomley, Martin K. Petersen, Matthew R. Ochs,
	Manoj N. Kumar
  Cc: linuxppc-dev, Andrew Donnellan, Frederic Barrat,
	Christophe Lombard
In-Reply-To: <1526065440-38806-1-git-send-email-ukrishn@linux.vnet.ibm.com>

The following Oops may be encountered if the device is reset, i.e. EEH
recovery, while there is heavy I/O traffic:

59:mon> t
[c000200db64bb680] c008000009264c40 cxlflash_queuecommand+0x3b8/0x500
					[cxlflash]
[c000200db64bb770] c00000000090d3b0 scsi_dispatch_cmd+0x130/0x2f0
[c000200db64bb7f0] c00000000090fdd8 scsi_request_fn+0x3c8/0x8d0
[c000200db64bb900] c00000000067f528 __blk_run_queue+0x68/0xb0
[c000200db64bb930] c00000000067ab80 __elv_add_request+0x140/0x3c0
[c000200db64bb9b0] c00000000068daac blk_execute_rq_nowait+0xec/0x1a0
[c000200db64bba00] c00000000068dbb0 blk_execute_rq+0x50/0xe0
[c000200db64bba50] c0000000006b2040 sg_io+0x1f0/0x520
[c000200db64bbaf0] c0000000006b2e94 scsi_cmd_ioctl+0x534/0x610
[c000200db64bbc20] c000000000926208 sd_ioctl+0x118/0x280
[c000200db64bbcc0] c00000000069f7ac blkdev_ioctl+0x7fc/0xe30
[c000200db64bbd20] c000000000439204 block_ioctl+0x84/0xa0
[c000200db64bbd40] c0000000003f8514 do_vfs_ioctl+0xd4/0xa00
[c000200db64bbde0] c0000000003f8f04 SyS_ioctl+0xc4/0x130
[c000200db64bbe30] c00000000000b184 system_call+0x58/0x6c

When there is no room to send the I/O request, the cached room is refreshed
by reading the memory mapped command room value from the AFU. The AFU
register mapping is refreshed during a reset, creating a race condition
that can lead to the Oops above.

During a device reset, the AFU should not be unmapped until all the active
send threads quiesce. An atomic counter, cmds_active, is currently used to
track internal AFU commands and quiesce during reset. This same counter can
also be used for the active send threads.

Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com>
---
 drivers/scsi/cxlflash/main.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/scsi/cxlflash/main.c b/drivers/scsi/cxlflash/main.c
index a24d7e6..dad2be6 100644
--- a/drivers/scsi/cxlflash/main.c
+++ b/drivers/scsi/cxlflash/main.c
@@ -616,6 +616,7 @@ static int cxlflash_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scp)
 		rc = 0;
 		goto out;
 	default:
+		atomic_inc(&afu->cmds_active);
 		break;
 	}
 
@@ -641,6 +642,7 @@ static int cxlflash_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scp)
 	memcpy(cmd->rcb.cdb, scp->cmnd, sizeof(cmd->rcb.cdb));
 
 	rc = afu->send_cmd(afu, cmd);
+	atomic_dec(&afu->cmds_active);
 out:
 	return rc;
 }
-- 
2.1.0

^ permalink raw reply related

* [PATCH 2/7] cxlflash: Limit the debug logs in the IO path
From: Uma Krishnan @ 2018-05-11 19:05 UTC (permalink / raw)
  To: linux-scsi, James Bottomley, Martin K. Petersen, Matthew R. Ochs,
	Manoj N. Kumar
  Cc: linuxppc-dev, Andrew Donnellan, Frederic Barrat,
	Christophe Lombard
In-Reply-To: <1526065440-38806-1-git-send-email-ukrishn@linux.vnet.ibm.com>

The kernel log can get filled with debug messages from send_cmd_ioarrin()
when dynamic debug is enabled for the cxlflash module and there is a lot
of legacy I/O traffic.

While these messages are necessary to debug issues that involve command
tracking, the abundance of data can overwrite other useful data in the
log. The best option available is to limit the messages that should
serve most of the common use cases.

Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com>
---
 drivers/scsi/cxlflash/main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/cxlflash/main.c b/drivers/scsi/cxlflash/main.c
index dad2be6..cf0b407 100644
--- a/drivers/scsi/cxlflash/main.c
+++ b/drivers/scsi/cxlflash/main.c
@@ -339,8 +339,8 @@ static int send_cmd_ioarrin(struct afu *afu, struct afu_cmd *cmd)
 	writeq_be((u64)&cmd->rcb, &hwq->host_map->ioarrin);
 out:
 	spin_unlock_irqrestore(&hwq->hsq_slock, lock_flags);
-	dev_dbg(dev, "%s: cmd=%p len=%u ea=%016llx rc=%d\n", __func__,
-		cmd, cmd->rcb.data_len, cmd->rcb.data_ea, rc);
+	dev_dbg_ratelimited(dev, "%s: cmd=%p len=%u ea=%016llx rc=%d\n",
+		__func__, cmd, cmd->rcb.data_len, cmd->rcb.data_ea, rc);
 	return rc;
 }
 
-- 
2.1.0

^ permalink raw reply related

* [PATCH 3/7] cxlflash: Acquire semaphore before invoking ioctl services
From: Uma Krishnan @ 2018-05-11 19:05 UTC (permalink / raw)
  To: linux-scsi, James Bottomley, Martin K. Petersen, Matthew R. Ochs,
	Manoj N. Kumar
  Cc: linuxppc-dev, Andrew Donnellan, Frederic Barrat,
	Christophe Lombard
In-Reply-To: <1526065440-38806-1-git-send-email-ukrishn@linux.vnet.ibm.com>

When a superpipe process that makes use of virtual LUNs is terminated or
killed abruptly, there is a possibility that the cxlflash driver could
hang and deprive other operations on the adapter.

The release fop registered to be invoked on a context close, detaches
every LUN associated with the context. The underlying service to detach
the LUN assumes it has been called with the read semaphore held, and
releases the semaphore before any operation that could be time consuming.

When invoked without holding the read semaphore, an opportunity is created
for the semaphore's count to become negative when it is temporarily
released during one of these potential lengthy operations. This negative
count results in subsequent acquisition attempts taking forever, leading
to the hang.

To support the current design point of holding the semaphore on the
ioctl() paths, the release fop should acquire it before invoking any
ioctl services.

Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com>
---
 drivers/scsi/cxlflash/superpipe.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/scsi/cxlflash/superpipe.c b/drivers/scsi/cxlflash/superpipe.c
index 04a3bf9..5ba6e62 100644
--- a/drivers/scsi/cxlflash/superpipe.c
+++ b/drivers/scsi/cxlflash/superpipe.c
@@ -988,6 +988,10 @@ static int cxlflash_disk_detach(struct scsi_device *sdev,
  * theoretically never occur), every call into this routine results
  * in a complete freeing of a context.
  *
+ * Detaching the LUN is typically an ioctl() operation and the underlying
+ * code assumes that ioctl_rwsem has been acquired as a reader. To support
+ * that design point, the semaphore is acquired and released around detach.
+ *
  * Return: 0 on success
  */
 static int cxlflash_cxl_release(struct inode *inode, struct file *file)
@@ -1026,9 +1030,11 @@ static int cxlflash_cxl_release(struct inode *inode, struct file *file)
 
 	dev_dbg(dev, "%s: close for ctxid=%d\n", __func__, ctxid);
 
+	down_read(&cfg->ioctl_rwsem);
 	detach.context_id = ctxi->ctxid;
 	list_for_each_entry_safe(lun_access, t, &ctxi->luns, list)
 		_cxlflash_disk_detach(lun_access->sdev, ctxi, &detach);
+	up_read(&cfg->ioctl_rwsem);
 out_release:
 	cfg->ops->fd_release(inode, file);
 out:
-- 
2.1.0

^ permalink raw reply related

* [PATCH 4/7] cxlflash: Use local mutex for AFU serialization
From: Uma Krishnan @ 2018-05-11 19:05 UTC (permalink / raw)
  To: linux-scsi, James Bottomley, Martin K. Petersen, Matthew R. Ochs,
	Manoj N. Kumar
  Cc: linuxppc-dev, Andrew Donnellan, Frederic Barrat,
	Christophe Lombard
In-Reply-To: <1526065440-38806-1-git-send-email-ukrishn@linux.vnet.ibm.com>

From: "Matthew R. Ochs" <mrochs@linux.vnet.ibm.com>

AFUs can only process a single AFU command at a time. This is enforced
with a global mutex situated within the AFU send routine. As this mutex
has a global scope, it has the potential to unnecessarily block commands
destined for other AFUs.

Instead of using a global mutex, transition the mutex to be per-AFU. This
will allow commands to only be blocked by siblings of the same AFU.

Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com>
---
 drivers/scsi/cxlflash/common.h | 1 +
 drivers/scsi/cxlflash/main.c   | 6 +++---
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/cxlflash/common.h b/drivers/scsi/cxlflash/common.h
index 3556b1d..dfcbecb 100644
--- a/drivers/scsi/cxlflash/common.h
+++ b/drivers/scsi/cxlflash/common.h
@@ -240,6 +240,7 @@ struct afu {
 	struct cxlflash_afu_map __iomem *afu_map;	/* entire MMIO map */
 
 	atomic_t cmds_active;	/* Number of currently active AFU commands */
+	struct mutex sync_active;	/* Mutex to serialize AFU commands */
 	u64 hb;
 	u32 internal_lun;	/* User-desired LUN mode for this AFU */
 
diff --git a/drivers/scsi/cxlflash/main.c b/drivers/scsi/cxlflash/main.c
index cf0b407..c91e912 100644
--- a/drivers/scsi/cxlflash/main.c
+++ b/drivers/scsi/cxlflash/main.c
@@ -2126,6 +2126,7 @@ static int init_afu(struct cxlflash_cfg *cfg)
 
 	cfg->ops->perst_reloads_same_image(cfg->afu_cookie, true);
 
+	mutex_init(&afu->sync_active);
 	afu->num_hwqs = afu->desired_hwqs;
 	for (i = 0; i < afu->num_hwqs; i++) {
 		rc = init_mc(cfg, i);
@@ -2309,7 +2310,6 @@ static int send_afu_cmd(struct afu *afu, struct sisl_ioarcb *rcb)
 	char *buf = NULL;
 	int rc = 0;
 	int nretry = 0;
-	static DEFINE_MUTEX(sync_active);
 
 	if (cfg->state != STATE_NORMAL) {
 		dev_dbg(dev, "%s: Sync not required state=%u\n",
@@ -2317,7 +2317,7 @@ static int send_afu_cmd(struct afu *afu, struct sisl_ioarcb *rcb)
 		return 0;
 	}
 
-	mutex_lock(&sync_active);
+	mutex_lock(&afu->sync_active);
 	atomic_inc(&afu->cmds_active);
 	buf = kmalloc(sizeof(*cmd) + __alignof__(*cmd) - 1, GFP_KERNEL);
 	if (unlikely(!buf)) {
@@ -2372,7 +2372,7 @@ static int send_afu_cmd(struct afu *afu, struct sisl_ioarcb *rcb)
 		*rcb->ioasa = cmd->sa;
 out:
 	atomic_dec(&afu->cmds_active);
-	mutex_unlock(&sync_active);
+	mutex_unlock(&afu->sync_active);
 	kfree(buf);
 	dev_dbg(dev, "%s: returning rc=%d\n", __func__, rc);
 	return rc;
-- 
2.1.0

^ permalink raw reply related

* [PATCH 5/7] cxlflash: Add include guards to backend.h
From: Uma Krishnan @ 2018-05-11 19:05 UTC (permalink / raw)
  To: linux-scsi, James Bottomley, Martin K. Petersen, Matthew R. Ochs,
	Manoj N. Kumar
  Cc: linuxppc-dev, Andrew Donnellan, Frederic Barrat,
	Christophe Lombard
In-Reply-To: <1526065440-38806-1-git-send-email-ukrishn@linux.vnet.ibm.com>

The new header file, backend.h, that was recently added is missing
the include guards. This commit adds the guards.

Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com>
---
 drivers/scsi/cxlflash/backend.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/scsi/cxlflash/backend.h b/drivers/scsi/cxlflash/backend.h
index bcd8a6c..55638d1 100644
--- a/drivers/scsi/cxlflash/backend.h
+++ b/drivers/scsi/cxlflash/backend.h
@@ -12,6 +12,9 @@
  * 2 of the License, or (at your option) any later version.
  */
 
+#ifndef _CXLFLASH_BACKEND_H
+#define _CXLFLASH_BACKEND_H
+
 extern const struct cxlflash_backend_ops cxlflash_cxl_ops;
 extern const struct cxlflash_backend_ops cxlflash_ocxl_ops;
 
@@ -45,3 +48,5 @@ struct cxlflash_backend_ops {
 	int (*fd_mmap)(struct file *file, struct vm_area_struct *vm);
 	int (*fd_release)(struct inode *inode, struct file *file);
 };
+
+#endif /* _CXLFLASH_BACKEND_H */
-- 
2.1.0

^ permalink raw reply related

* [PATCH 6/7] cxlflash: Abstract hardware dependent assignments
From: Uma Krishnan @ 2018-05-11 19:06 UTC (permalink / raw)
  To: linux-scsi, James Bottomley, Martin K. Petersen, Matthew R. Ochs,
	Manoj N. Kumar
  Cc: linuxppc-dev, Andrew Donnellan, Frederic Barrat,
	Christophe Lombard
In-Reply-To: <1526065440-38806-1-git-send-email-ukrishn@linux.vnet.ibm.com>

As a staging cleanup to support transport specific builds of the cxlflash
module, relocate device dependent assignments to header files. This will
avoid littering the core driver with conditional compilation logic.

Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com>
---
 drivers/scsi/cxlflash/main.c |  7 ++-----
 drivers/scsi/cxlflash/main.h | 15 +++++++++++++++
 2 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/cxlflash/main.c b/drivers/scsi/cxlflash/main.c
index c91e912..cd7dcc5 100644
--- a/drivers/scsi/cxlflash/main.c
+++ b/drivers/scsi/cxlflash/main.c
@@ -3708,11 +3708,8 @@ static int cxlflash_probe(struct pci_dev *pdev,
 	cfg->init_state = INIT_STATE_NONE;
 	cfg->dev = pdev;
 	cfg->cxl_fops = cxlflash_cxl_fops;
-
-	if (ddv->flags & CXLFLASH_OCXL_DEV)
-		cfg->ops = &cxlflash_ocxl_ops;
-	else
-		cfg->ops = &cxlflash_cxl_ops;
+	cfg->ops = cxlflash_assign_ops(ddv);
+	WARN_ON_ONCE(!cfg->ops);
 
 	/*
 	 * Promoted LUNs move to the top of the LUN table. The rest stay on
diff --git a/drivers/scsi/cxlflash/main.h b/drivers/scsi/cxlflash/main.h
index 6f1be62..ed4908e 100644
--- a/drivers/scsi/cxlflash/main.h
+++ b/drivers/scsi/cxlflash/main.h
@@ -20,6 +20,8 @@
 #include <scsi/scsi.h>
 #include <scsi/scsi_device.h>
 
+#include "backend.h"
+
 #define CXLFLASH_NAME		"cxlflash"
 #define CXLFLASH_ADAPTER_NAME	"IBM POWER CXL Flash Adapter"
 #define CXLFLASH_MAX_ADAPTERS	32
@@ -100,6 +102,19 @@ struct dev_dependent_vals {
 #define CXLFLASH_OCXL_DEV		0x0000000000000004ULL
 };
 
+static inline const struct cxlflash_backend_ops *
+cxlflash_assign_ops(struct dev_dependent_vals *ddv)
+{
+	const struct cxlflash_backend_ops *ops = NULL;
+
+	if (ddv->flags & CXLFLASH_OCXL_DEV)
+		ops = &cxlflash_ocxl_ops;
+	if (!(ddv->flags & CXLFLASH_OCXL_DEV))
+		ops = &cxlflash_cxl_ops;
+
+	return ops;
+}
+
 struct asyc_intr_info {
 	u64 status;
 	char *desc;
-- 
2.1.0

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox