From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by lists.ozlabs.org (Postfix) with ESMTPS id 40GSGG5ssFzF1pY for ; Wed, 4 Apr 2018 23:57:58 +1000 (AEST) Received: from pps.filterd (m0098404.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.0.22/8.16.0.22) with SMTP id w34AsgXr075864 for ; Wed, 4 Apr 2018 06:59:06 -0400 Received: from e06smtp11.uk.ibm.com (e06smtp11.uk.ibm.com [195.75.94.107]) by mx0a-001b2d01.pphosted.com with ESMTP id 2h4uw1vfr3-1 (version=TLSv1.2 cipher=AES256-SHA256 bits=256 verify=NOT) for ; Wed, 04 Apr 2018 06:59:06 -0400 Received: from localhost by e06smtp11.uk.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Wed, 4 Apr 2018 11:59:03 +0100 Subject: Re: [PATCH] cxl: Fix possible deadlock when processing page faults from cxllib To: "Aneesh Kumar K.V" , Frederic Barrat , linuxppc-dev@lists.ozlabs.org, mpe@ellerman.id.au Cc: ldufour@linux.vnet.ibm.com, clombard@linux.vnet.ibm.com, andrew.donnellan@au1.ibm.com, vaibhav@linux.vnet.ibm.com References: <20180403094311.331-1-fbarrat@linux.vnet.ibm.com> <606e7509-5c37-b38e-9b6c-a60a5694c652@linux.ibm.com> <8e099a01-ba11-d26f-07ab-bf0fa6e387c1@linux.ibm.com> From: Frederic Barrat Date: Wed, 4 Apr 2018 12:58:59 +0200 MIME-Version: 1.0 In-Reply-To: <8e099a01-ba11-d26f-07ab-bf0fa6e387c1@linux.ibm.com> Content-Type: text/plain; charset=utf-8; format=flowed Message-Id: List-Id: Linux on PowerPC Developers Mail List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Le 03/04/2018 à 17:31, Aneesh Kumar K.V a écrit : > On 04/03/2018 08:10 PM, Aneesh Kumar K.V wrote: >> On 04/03/2018 03:13 PM, Frederic Barrat wrote: >>> cxllib_handle_fault() is called by an external driver when it needs to >>> have the host process page faults for a buffer which may cover several >>> pages. Currently the function holds the mm->mmap_sem semaphore with >>> read access while iterating over the buffer, since it could spawn >>> several VMAs. When calling a lower-level function to handle the page >>> fault for a single page, the semaphore is accessed again in read >>> mode. That is wrong and can lead to deadlocks if a writer tries to >>> sneak in while a buffer of several pages is being processed. >>> >>> The fix is to release the semaphore once cxllib_handle_fault() got the >>> information it needs from the current vma. The address space/VMAs >>> could evolve while we iterate over the full buffer, but in the >>> unlikely case where we miss a page, the driver will raise a new page >>> fault when retrying. >>> >>> Fixes: 3ced8d730063 ("cxl: Export library to support IBM XSL") >>> Cc: stable@vger.kernel.org # 4.13+ >>> Signed-off-by: Frederic Barrat >>> --- >>>   drivers/misc/cxl/cxllib.c | 85 >>> ++++++++++++++++++++++++++++++----------------- >>>   1 file changed, 55 insertions(+), 30 deletions(-) >>> >>> diff --git a/drivers/misc/cxl/cxllib.c b/drivers/misc/cxl/cxllib.c >>> index 30ccba436b3b..55cd35d1a9cc 100644 >>> --- a/drivers/misc/cxl/cxllib.c >>> +++ b/drivers/misc/cxl/cxllib.c >>> @@ -208,49 +208,74 @@ int cxllib_get_PE_attributes(struct task_struct >>> *task, >>>   } >>>   EXPORT_SYMBOL_GPL(cxllib_get_PE_attributes); >>> -int cxllib_handle_fault(struct mm_struct *mm, u64 addr, u64 size, >>> u64 flags) >>> +static int get_vma_info(struct mm_struct *mm, u64 addr, >>> +            u64 *vma_start, u64 *vma_end, >>> +            unsigned long *page_size) >>>   { >>> -    int rc; >>> -    u64 dar; >>>       struct vm_area_struct *vma = NULL; >>> -    unsigned long page_size; >>> - >>> -    if (mm == NULL) >>> -        return -EFAULT; >>> +    int rc = 0; >>>       down_read(&mm->mmap_sem); >>>       vma = find_vma(mm, addr); >>>       if (!vma) { >>> -        pr_err("Can't find vma for addr %016llx\n", addr); >>>           rc = -EFAULT; >>>           goto out; >>>       } >>> -    /* get the size of the pages allocated */ >>> -    page_size = vma_kernel_pagesize(vma); >>> - >>> -    for (dar = (addr & ~(page_size - 1)); dar < (addr + size); dar >>> += page_size) { >>> -        if (dar < vma->vm_start || dar >= vma->vm_end) { >>> -            vma = find_vma(mm, addr); >>> -            if (!vma) { >>> -                pr_err("Can't find vma for addr %016llx\n", addr); >>> -                rc = -EFAULT; >>> -                goto out; >>> -            } >>> -            /* get the size of the pages allocated */ >>> -            page_size = vma_kernel_pagesize(vma); >>> +    *page_size = vma_kernel_pagesize(vma); >>> +    *vma_start = vma->vm_start; >>> +    *vma_end = vma->vm_end; >>> +out: >>> +    up_read(&mm->mmap_sem); >>> +    return rc; >>> +} >>> + >>> +int cxllib_handle_fault(struct mm_struct *mm, u64 addr, u64 size, >>> u64 flags) >>> +{ >>> +    int rc; >>> +    u64 dar, vma_start, vma_end; >>> +    unsigned long page_size; >>> + >>> +    if (mm == NULL) >>> +        return -EFAULT; >>> + >>> +    /* >>> +     * The buffer we have to process can extend over several pages >>> +     * and may also cover several VMAs. >>> +     * We iterate over all the pages. The page size could vary >>> +     * between VMAs. >>> +     */ >>> +    rc = get_vma_info(mm, addr, &vma_start, &vma_end, &page_size); >>> +    if (rc) >>> +        return rc; >>> + >>> +    for (dar = (addr & ~(page_size - 1)); dar < (addr + size); >>> +         dar += page_size) { >>> +        if (dar < vma_start || dar >= vma_end) { >> >> >> IIUC, we are fetching the vma to get just the page_size with which it >> is mapped? Can't we iterate with PAGE_SIZE? Considering hugetlb page >> size will be larger than PAGE_SIZE, we might call into >> cxl_handle_mm_fault multiple times for a hugetlb page. Does that cause >> any issue? Also can cxl be used with hugetlb mappings? >> > Can you also try to use a helper like below. That will clarify the need > of find_vma there. > > static int __cxllib_handle_fault(struct mm_struct *mm, unsigned long > start, unsigned long end, >                  unsigned long mapping_psize, u64 flags) > { >     int rc; >     unsigned long dar; > >     for (dar = start; dar < end; dar += mapping_psize) { >         rc = cxl_handle_mm_fault(mm, flags, dar); >         if (rc) { >             rc = -EFAULT; >             goto out; >         } >     } >     rc = 0; > out: >     return rc; > } I'm struggling to make good use of it. I see your point, it makes touching all the pages for a given VMA easy (same page size). But I'm given a buffer, which can span several VMAs, and we can have a varying page size. So I now need an outside loop iterating over all the VMAs and call the helper for a subset of the VMA. IMHO, it's not making the code any easier, i.e what I gain in the helper is lost in the outside VMA loop. The current code, with a single loop and a varying increment based on the page size, doesn't look that bad to me. Fred