From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id BEF0DC001DF for ; Mon, 24 Jul 2023 15:22:25 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231145AbjGXPWX (ORCPT ); Mon, 24 Jul 2023 11:22:23 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46126 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229926AbjGXPWW (ORCPT ); Mon, 24 Jul 2023 11:22:22 -0400 Received: from madras.collabora.co.uk (madras.collabora.co.uk [IPv6:2a00:1098:0:82:1000:25:2eeb:e5ab]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 52FFC133; Mon, 24 Jul 2023 08:22:18 -0700 (PDT) Received: from [192.168.100.7] (unknown [39.34.185.74]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) (Authenticated sender: usama.anjum) by madras.collabora.co.uk (Postfix) with ESMTPSA id 4B5EE66070F7; Mon, 24 Jul 2023 16:22:08 +0100 (BST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=collabora.com; s=mail; t=1690212136; bh=bnkD018OLMVlnqYn0CKMoCrb5lYKht87nInr5jLEyEA=; h=Date:Cc:Subject:To:References:From:In-Reply-To:From; b=CYMUR0MZRmc//CNry3EtS7AIREmSN/0yFPzY/6s695eoKPkgT+ct3gadk3IlUuUIp MQNmt+B43Y0N1k1L8Jwy9R9/xb8ASxVQ4pQ0LCT+BODxDWSIDNDxoveDZyFbJemlpZ EKSLpXeBZuE0+R2a9zUOOLqRR+NyU7g/1a1fsjeaDd8OVABfsAx6hGueqFQt4Qtf4M 93pIsSXckHctpaXrxDaA/9Mb8wZx/xOAp9UQ9PcGUkcldXmbxvGkRPX7D9sliDQ8H1 bXZrMpGeTNHoRV7CuK2Evmq1rYOF+ykT++ZYJ1iU+jRfRyRy/S12nefGQE7SFS4eD2 znKDMuxFs0sPA== Message-ID: <44eddc7d-fd68-1595-7e4f-e196abe37311@collabora.com> Date: Mon, 24 Jul 2023 20:21:58 +0500 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.12.0 Cc: Muhammad Usama Anjum , =?UTF-8?B?TWljaGHFgiBNaXJvc8WCYXc=?= , Andrei Vagin , Danylo Mocherniuk , Alex Sierra , Alexander Viro , Andrew Morton , Axel Rasmussen , Christian Brauner , Cyrill Gorcunov , Dan Williams , David Hildenbrand , Greg KH , "Gustavo A . R . Silva" , "Liam R . Howlett" , Matthew Wilcox , Mike Rapoport , Nadav Amit , Pasha Tatashin , Paul Gofman , Peter Xu , Shuah Khan , Suren Baghdasaryan , Vlastimil Babka , Yang Shi , Yun Zhou , linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kselftest@vger.kernel.org, kernel@collabora.com Subject: Re: [v2] fs/proc/task_mmu: Implement IOCTL for efficient page table scanning To: =?UTF-8?B?TWljaGHFgiBNaXJvc8WCYXc=?= References: <20230713101415.108875-6-usama.anjum@collabora.com> <7eedf953-7cf6-c342-8fa8-b7626d69ab63@collabora.com> <382f4435-2088-08ce-20e9-bc1a15050861@collabora.com> Content-Language: en-US From: Muhammad Usama Anjum In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org On 7/24/23 7:38 PM, Michał Mirosław wrote: > On Mon, 24 Jul 2023 at 16:04, Muhammad Usama Anjum > wrote: >> >> Fixed found bugs. Testing it further. >> >> - Split and backoff in case buffer full case as well >> - Fix the wrong breaking of loop if page isn't interesting, skip intead >> - Untag the address and save them into struct >> - Round off the end address to next page >> >> Signed-off-by: Muhammad Usama Anjum >> --- >> fs/proc/task_mmu.c | 54 ++++++++++++++++++++++++++-------------------- >> 1 file changed, 31 insertions(+), 23 deletions(-) >> >> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c >> index add21fdf3c9a..64b326d0ec6d 100644 >> --- a/fs/proc/task_mmu.c >> +++ b/fs/proc/task_mmu.c >> @@ -1985,18 +1989,19 @@ static int pagemap_scan_output(unsigned long >> categories, >> unsigned long n_pages, total_pages; >> int ret = 0; >> >> + if (!p->vec_buf) >> + return 0; >> + >> if (!pagemap_scan_is_interesting_page(categories, p)) { >> *end = addr; >> return 0; >> } >> >> - if (!p->vec_buf) >> - return 0; >> - >> categories &= p->arg.return_mask; > > This is wrong - is_interesting() check must happen before output as > the `*end = addr` means the range should be skipped, but return 0 > requests continuing of the walk. Will revert. > >> @@ -2044,7 +2050,7 @@ static int pagemap_scan_thp_entry(pmd_t *pmd, >> unsigned long start, >> * Break huge page into small pages if the WP operation >> * need to be performed is on a portion of the huge page. >> */ >> - if (end != start + HPAGE_SIZE) { >> + if (end != start + HPAGE_SIZE || ret == -ENOSPC) { > > Why is it needed? If `end == start + HPAGE_SIZE` then we're handling a > full hugepage anyway. If we weren't able to add the complete thp in the output buffer and we need to perform WP on the entire page, we should split and rollback. Otherwise we'll WP the entire thp and we'll lose the state on the remaining THP which wasn't added to output. Lets say max=100 only 100 pages would be added to output we need to split and rollback otherwise other 412 pages would get WP > >> @@ -2066,8 +2072,8 @@ static int pagemap_scan_pmd_entry(pmd_t *pmd, >> unsigned long start, >> { >> struct pagemap_scan_private *p = walk->private; >> struct vm_area_struct *vma = walk->vma; >> + unsigned long addr, categories, next; >> pte_t *pte, *start_pte; >> - unsigned long addr; >> bool flush = false; >> spinlock_t *ptl; >> int ret; >> @@ -2088,12 +2094,14 @@ static int pagemap_scan_pmd_entry(pmd_t *pmd, >> unsigned long start, >> } >> >> for (addr = start; addr != end; pte++, addr += PAGE_SIZE) { >> - unsigned long categories = p->cur_vma_category | >> - pagemap_page_category(vma, addr, ptep_get(pte)); >> - unsigned long next = addr + PAGE_SIZE; >> + categories = p->cur_vma_category | >> + pagemap_page_category(vma, addr, ptep_get(pte)); >> + next = addr + PAGE_SIZE; > > Why moving the variable declarations out of the loop? Saving spaces inside loop. What are pros of declation of variable in loop? > >> >> ret = pagemap_scan_output(categories, p, addr, &next); >> - if (next == addr) >> + if (ret == 0 && next == addr) >> + continue; >> + else if (next == addr) >> break; > > Ah, this indeed was a bug. Nit: > > if (next == addr) { if (!ret) continue; break; } > >> @@ -2204,8 +2212,6 @@ static const struct mm_walk_ops pagemap_scan_ops = { >> static int pagemap_scan_get_args(struct pm_scan_arg *arg, >> unsigned long uarg) >> { >> - unsigned long start, end, vec; >> - >> if (copy_from_user(arg, (void __user *)uarg, sizeof(*arg))) >> return -EFAULT; >> >> @@ -2219,22 +2225,24 @@ static int pagemap_scan_get_args(struct pm_scan_arg >> *arg, >> arg->category_anyof_mask | arg->return_mask) & ~PM_SCAN_CATEGORIES) >> return -EINVAL; >> >> - start = untagged_addr((unsigned long)arg->start); >> - end = untagged_addr((unsigned long)arg->end); >> - vec = untagged_addr((unsigned long)arg->vec); >> + arg->start = untagged_addr((unsigned long)arg->start); >> + arg->end = untagged_addr((unsigned long)arg->end); >> + arg->vec = untagged_addr((unsigned long)arg->vec); > > BTW, We should we keep the tag in args writeback(). Sorry what? After this function, the start, end and vec would be used. We need to make sure that the address are untagged before that. > >> /* Validate memory pointers */ >> - if (!IS_ALIGNED(start, PAGE_SIZE)) >> + if (!IS_ALIGNED(arg->start, PAGE_SIZE)) >> return -EINVAL; >> - if (!access_ok((void __user *)start, end - start)) >> + if (!access_ok((void __user *)arg->start, arg->end - arg->start)) >> return -EFAULT; >> - if (!vec && arg->vec_len) >> + if (!arg->vec && arg->vec_len) >> return -EFAULT; >> - if (vec && !access_ok((void __user *)vec, >> + if (arg->vec && !access_ok((void __user *)arg->vec, >> arg->vec_len * sizeof(struct page_region))) >> return -EFAULT; >> >> /* Fixup default values */ >> + arg->end = (arg->end & ~PAGE_MASK) ? >> + ((arg->end & PAGE_MASK) + PAGE_SIZE) : (arg->end); > > arg->end = ALIGN(arg->end, PAGE_SIZE); > >> if (!arg->max_pages) >> arg->max_pages = ULONG_MAX; >> > > Best Regards > Michał Mirosław -- BR, Muhammad Usama Anjum