From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by lists.ozlabs.org (Postfix) with ESMTPS id 40Frhd52lwzF277 for ; Wed, 4 Apr 2018 00:15:09 +1000 (AEST) Received: from pps.filterd (m0098404.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.0.22/8.16.0.22) with SMTP id w33EE01C132581 for ; Tue, 3 Apr 2018 10:15:06 -0400 Received: from e06smtp12.uk.ibm.com (e06smtp12.uk.ibm.com [195.75.94.108]) by mx0a-001b2d01.pphosted.com with ESMTP id 2h4b6m82yg-1 (version=TLSv1.2 cipher=AES256-SHA256 bits=256 verify=NOT) for ; Tue, 03 Apr 2018 10:15:06 -0400 Received: from localhost by e06smtp12.uk.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Tue, 3 Apr 2018 15:15:03 +0100 Subject: Re: [PATCH v2] cxl: Fix possible deadlock when processing page faults from cxllib To: Laurent Dufour , Frederic Barrat , linuxppc-dev@lists.ozlabs.org, mpe@ellerman.id.au Cc: clombard@linux.vnet.ibm.com, andrew.donnellan@au1.ibm.com, vaibhav@linux.vnet.ibm.com References: <20180403135402.30374-1-fbarrat@linux.vnet.ibm.com> <904fbfd8-5224-78b5-ca74-4c4dec1125e7@linux.vnet.ibm.com> From: Frederic Barrat Date: Tue, 3 Apr 2018 16:15:00 +0200 MIME-Version: 1.0 In-Reply-To: <904fbfd8-5224-78b5-ca74-4c4dec1125e7@linux.vnet.ibm.com> Content-Type: text/plain; charset=utf-8; format=flowed Message-Id: List-Id: Linux on PowerPC Developers Mail List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Le 03/04/2018 à 16:06, Laurent Dufour a écrit : > On 03/04/2018 15:54, Frederic Barrat wrote: >> cxllib_handle_fault() is called by an external driver when it needs to >> have the host resolve page faults for a buffer. The buffer can cover >> several pages and VMAs. The function iterates over all the pages used >> by the buffer, based on the page size of the VMA. >> >> To ensure some stability while processing the faults, the thread T1 >> grabs the mm->mmap_sem semaphore with read access (R1). However, when >> processing a page fault for a single page, one of the underlying >> functions, copro_handle_mm_fault(), also grabs the same semaphore with >> read access (R2). So the thread T1 takes the semaphore twice. >> >> If another thread T2 tries to access the semaphore in write mode W1 >> (say, because it wants to allocate memory and calls 'brk'), then that >> thread T2 will have to wait because there's a reader (R1). If the >> thread T1 is processing a new page at that time, it won't get an >> automatic grant at R2, because there's now a writer thread >> waiting (T2). And we have a deadlock. >> >> The timeline is: >> 1. thread T1 owns the semaphore with read access R1 >> 2. thread T2 requests write access W1 and waits >> 3. thread T1 requests read access R2 and waits >> >> The fix is for the thread T1 to release the semaphore R1 once it got >> the information it needs from the current VMA. The address space/VMAs >> could evolve while T1 iterates over the full buffer, but in the >> unlikely case where T1 misses a page, the external driver will raise a >> new page fault when retrying the memory access. >> >> Fixes: 3ced8d730063 ("cxl: Export library to support IBM XSL") >> Cc: stable@vger.kernel.org # 4.13+ >> Signed-off-by: Frederic Barrat > > What are the changes introduced in this v2 ? How did that happen! I would have sworn to have added the changelog! Code is the same. I rewrote the commit message to hopefully make it more readable. Fred >> --- >> drivers/misc/cxl/cxllib.c | 85 ++++++++++++++++++++++++++++++----------------- >> 1 file changed, 55 insertions(+), 30 deletions(-) >> >> diff --git a/drivers/misc/cxl/cxllib.c b/drivers/misc/cxl/cxllib.c >> index 30ccba436b3b..55cd35d1a9cc 100644 >> --- a/drivers/misc/cxl/cxllib.c >> +++ b/drivers/misc/cxl/cxllib.c >> @@ -208,49 +208,74 @@ int cxllib_get_PE_attributes(struct task_struct *task, >> } >> EXPORT_SYMBOL_GPL(cxllib_get_PE_attributes); >> >> -int cxllib_handle_fault(struct mm_struct *mm, u64 addr, u64 size, u64 flags) >> +static int get_vma_info(struct mm_struct *mm, u64 addr, >> + u64 *vma_start, u64 *vma_end, >> + unsigned long *page_size) >> { >> - int rc; >> - u64 dar; >> struct vm_area_struct *vma = NULL; >> - unsigned long page_size; >> - >> - if (mm == NULL) >> - return -EFAULT; >> + int rc = 0; >> >> down_read(&mm->mmap_sem); >> >> vma = find_vma(mm, addr); >> if (!vma) { >> - pr_err("Can't find vma for addr %016llx\n", addr); >> rc = -EFAULT; >> goto out; >> } >> - /* get the size of the pages allocated */ >> - page_size = vma_kernel_pagesize(vma); >> - >> - for (dar = (addr & ~(page_size - 1)); dar < (addr + size); dar += page_size) { >> - if (dar < vma->vm_start || dar >= vma->vm_end) { >> - vma = find_vma(mm, addr); >> - if (!vma) { >> - pr_err("Can't find vma for addr %016llx\n", addr); >> - rc = -EFAULT; >> - goto out; >> - } >> - /* get the size of the pages allocated */ >> - page_size = vma_kernel_pagesize(vma); >> + *page_size = vma_kernel_pagesize(vma); >> + *vma_start = vma->vm_start; >> + *vma_end = vma->vm_end; >> +out: >> + up_read(&mm->mmap_sem); >> + return rc; >> +} >> + >> +int cxllib_handle_fault(struct mm_struct *mm, u64 addr, u64 size, u64 flags) >> +{ >> + int rc; >> + u64 dar, vma_start, vma_end; >> + unsigned long page_size; >> + >> + if (mm == NULL) >> + return -EFAULT; >> + >> + /* >> + * The buffer we have to process can extend over several pages >> + * and may also cover several VMAs. >> + * We iterate over all the pages. The page size could vary >> + * between VMAs. >> + */ >> + rc = get_vma_info(mm, addr, &vma_start, &vma_end, &page_size); >> + if (rc) >> + return rc; >> + >> + for (dar = (addr & ~(page_size - 1)); dar < (addr + size); >> + dar += page_size) { >> + if (dar < vma_start || dar >= vma_end) { >> + /* >> + * We don't hold the mm->mmap_sem semaphore >> + * while iterating, since the semaphore is >> + * required by one of the lower-level page >> + * fault processing functions and it could >> + * create a deadlock. >> + * >> + * It means the VMAs can be altered between 2 >> + * loop iterations and we could theoretically >> + * miss a page (however unlikely). But that's >> + * not really a problem, as the driver will >> + * retry access, get another page fault on the >> + * missing page and call us again. >> + */ >> + rc = get_vma_info(mm, dar, &vma_start, &vma_end, >> + &page_size); >> + if (rc) >> + return rc; >> } >> >> rc = cxl_handle_mm_fault(mm, flags, dar); >> - if (rc) { >> - pr_err("cxl_handle_mm_fault failed %d", rc); >> - rc = -EFAULT; >> - goto out; >> - } >> + if (rc) >> + return -EFAULT; >> } >> - rc = 0; >> -out: >> - up_read(&mm->mmap_sem); >> - return rc; >> + return 0; >> } >> EXPORT_SYMBOL_GPL(cxllib_handle_fault); >>