From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by lists.ozlabs.org (Postfix) with ESMTPS id 3wpNKX0cZjzDqL8 for ; Thu, 15 Jun 2017 22:36:31 +1000 (AEST) Received: from pps.filterd (m0098410.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.0.20/8.16.0.20) with SMTP id v5FCYv34103931 for ; Thu, 15 Jun 2017 08:36:30 -0400 Received: from e06smtp13.uk.ibm.com (e06smtp13.uk.ibm.com [195.75.94.109]) by mx0a-001b2d01.pphosted.com with ESMTP id 2b3skrt7an-1 (version=TLSv1.2 cipher=AES256-SHA bits=256 verify=NOT) for ; Thu, 15 Jun 2017 08:36:29 -0400 Received: from localhost by e06smtp13.uk.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Thu, 15 Jun 2017 13:36:27 +0100 Subject: Re: [PATCH V2] cxl: Export library to support IBM XSL To: Christophe Lombard , linuxppc-dev@lists.ozlabs.org, vaibhav@linux.vnet.ibm.com, andrew.donnellan@au1.ibm.com References: <1497446944-15759-1-git-send-email-clombard@linux.vnet.ibm.com> From: Frederic Barrat Date: Thu, 15 Jun 2017 14:36:24 +0200 MIME-Version: 1.0 In-Reply-To: <1497446944-15759-1-git-send-email-clombard@linux.vnet.ibm.com> Content-Type: text/plain; charset=utf-8; format=flowed Message-Id: List-Id: Linux on PowerPC Developers Mail List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Salut Christophe, A few comments below, nothing major... Le 14/06/2017 à 15:29, Christophe Lombard a écrit : > This patch exports a in-kernel 'library' API which can be called by > other drivers to help interacting with an IBM XSL on a POWER9 system. > > The XSL (Translation Service Layer) is a stripped down version of the > PSL (Power Service Layer) used in some cards such as the Mellanox CX5. > Like the PSL, it implements the CAIA architecture, but has a number > of differences, mostly in it's implementation dependent registers. > > The XSL also uses a special DMA cxl mode, which uses a slightly > different init sequence for the CAPP and PHB. > > Changelog[v2] > - Rebase to latest upstream. > - Return -EFAULT in case of NULL pointer in cxllib_handle_fault(). > - Reverse parameters when copro_handle_mm_fault() is called. > The change log shouldn't be part of the commit message, but below the next "---" > +++ b/drivers/misc/cxl/cxllib.c > @@ -0,0 +1,241 @@ > +/* > + * Copyright 2016 IBM Corp. Year need to be updated > + * > + * This program is free software; you can redistribute it and/or > + * modify it under the terms of the GNU General Public License > + * as published by the Free Software Foundation; either version > + * 2 of the License, or (at your option) any later version. > + */ > + > +#include > +#include > +#include "cxl.h" > +#include > +#include Ordering of the #include is messy: #include #include #include #include #include "cxl.h" > +int cxllib_set_device_dma(struct pci_dev *dev, unsigned long flags) > +{ > + int rc; > + > + if (flags) > + return -EINVAL; > + > + rc = dma_set_mask(&dev->dev, DMA_BIT_MASK(64)); > + return rc; > +} > +EXPORT_SYMBOL_GPL(cxllib_set_device_dma); A comment in cxllib_set_device_dma() would help: /* * When switching the PHB to capi mode, the TVT#1 entry for * the Partitionable Endpoint is set in bypass mode, like * in PCI mode. * Configure the device dma to use TVT#1, which is done * by calling dma_set_mask() with a mask large enough. */ > + > +int cxllib_get_PE_attributes(struct task_struct *task, > + unsigned long translation_mode, struct cxllib_pe_attributes *attr) > +{ > + struct mm_struct *mm = NULL; > + > + if (translation_mode != CXL_TRANSLATED_MODE && > + translation_mode != CXL_REAL_MODE) > + return -EINVAL; > + > + attr->sr = cxl_calculate_sr(false /* master */, > + task == NULL /* kernel ctx */, > + translation_mode == CXL_REAL_MODE, > + true /* p9 */); > + attr->lpid = mfspr(SPRN_LPID); > + if (task) { > + mm = get_task_mm(task); > + if (mm == NULL) > + return -EINVAL; > + /* > + * Caller is keeping a reference on mm_users for as long > + * as XSL uses the memory context > + */ > + attr->pid = mm->context.id; > + mmput(mm); > + } else { > + attr->pid = 0; > + } > + attr->tid = 0; We'll need to remember to patch that function as welll (attr->tid) when we add support for as_notify (even though Mellanox is not expected to use as_notify in capi mode, just pci I believe) > +int cxllib_handle_fault(struct mm_struct *mm, u64 addr, u64 size, u64 flags) > +{ > + int rc; > + u64 dar; > + struct vm_area_struct *vma = NULL; > + unsigned long page_size; > + > + if (mm == NULL) > + return -EFAULT; > + > + down_read(&mm->mmap_sem); > + > + for (dar = addr; dar < addr + size; dar += page_size) { > + if (!vma || dar < vma->vm_start || dar > vma->vm_end) { > + vma = find_vma(mm, addr); > + if (!vma) { > + pr_err("Can't find vma for addr %016llx\n", addr); > + rc = -EFAULT; > + goto out; > + } > + /* get the size of the pages allocated */ > + page_size = vma_kernel_pagesize(vma); > + } > + > + rc = cxl_handle_page_fault(true, mm, flags, dar); Why do we pass "true" for kernel parameter? Actually do we even need a kernel input parameter for cxl_handle_page_fault() ? It seems that we can infer it based on mm. If NULL, then we are in kernel space. > diff --git a/drivers/misc/cxl/fault.c b/drivers/misc/cxl/fault.c > index c79e39b..9db63f3 100644 > --- a/drivers/misc/cxl/fault.c > +++ b/drivers/misc/cxl/fault.c > @@ -132,18 +132,16 @@ static int cxl_handle_segment_miss(struct cxl_context *ctx, > return IRQ_HANDLED; > } > > -static void cxl_handle_page_fault(struct cxl_context *ctx, > - struct mm_struct *mm, u64 dsisr, u64 dar) > +int cxl_handle_page_fault(bool kernel_context, > + struct mm_struct *mm, u64 dsisr, u64 dar) > { > unsigned flt = 0; > int result; > unsigned long access, flags, inv_flags = 0; > > - trace_cxl_pte_miss(ctx, dsisr, dar); > - > if ((result = copro_handle_mm_fault(mm, dar, dsisr, &flt))) { > pr_devel("copro_handle_mm_fault failed: %#x\n", result); > - return cxl_ack_ae(ctx); > + return result; > } > > if (!radix_enabled()) { > @@ -156,7 +154,7 @@ static void cxl_handle_page_fault(struct cxl_context *ctx, > access |= _PAGE_WRITE; > > access |= _PAGE_PRIVILEGED; > - if ((!ctx->kernel) || (REGION_ID(dar) == USER_REGION_ID)) > + if (!kernel_context || (REGION_ID(dar) == USER_REGION_ID)) > access &= ~_PAGE_PRIVILEGED; > > if (dsisr & DSISR_NOHPTE) > @@ -166,8 +164,7 @@ static void cxl_handle_page_fault(struct cxl_context *ctx, > hash_page_mm(mm, dar, access, 0x300, inv_flags); > local_irq_restore(flags); > } > - pr_devel("Page fault successfully handled for pe: %i!\n", ctx->pe); > - cxl_ops->ack_irq(ctx, CXL_PSL_TFC_An_R, 0); > + return 0; > } > > /* > @@ -261,9 +258,15 @@ void cxl_handle_fault(struct work_struct *fault_work) > > if (cxl_is_segment_miss(ctx, dsisr)) > cxl_handle_segment_miss(ctx, mm, dar); > - else if (cxl_is_page_fault(ctx, dsisr)) > - cxl_handle_page_fault(ctx, mm, dsisr, dar); > - else > + else if (cxl_is_page_fault(ctx, dsisr)) { > + trace_cxl_pte_miss(ctx, dsisr, dar); > + if (cxl_handle_page_fault(ctx->kernel, mm, dsisr, dar)) { > + cxl_ack_ae(ctx); > + } else { > + pr_devel("Page fault successfully handled for pe: %i!\n", ctx->pe); > + cxl_ops->ack_irq(ctx, CXL_PSL_TFC_An_R, 0); > + } Could we have that code in a wrapper before calling cxl_handle_page_fault()? It would keep the code cleaner and in line with what we do for cxl_handle_segment_miss(). > +++ b/include/misc/cxllib.h > @@ -0,0 +1,132 @@ > +/* > + * Copyright 2016 IBM Corp. Year update. > +/* > + * Get the Process Element structure for the given thread > + * > + * Input: > + * pid: points the struct pid for the given thread (i.e. linux pid) > + * translation_mode: whether addresses should be translated > + */ > +struct cxllib_pe_attributes { > + u64 sr; > + u32 lpid; > + u32 tid; > + u32 pid; > +}; > +#define CXL_TRANSLATED_MODE 0 > +#define CXL_REAL_MODE 1 > + > +int cxllib_get_PE_attributes(struct task_struct *task, > + unsigned long translation_mode, struct cxllib_pe_attributes *attr); Description in comment no longer matches reality. /* * Get the Process Element structure for the given thread * * Input: * task: task_struct for the context of the translation * translation_mode: whether addresses should be translated * Output: * attr: attributes to fill up the Process Element structure from CAIA */ Thanks, Fred