From: Andrei Vagin <avagin@gmail.com>
To: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: "Peter Xu" <peterx@redhat.com>,
"David Hildenbrand" <david@redhat.com>,
"Andrew Morton" <akpm@linux-foundation.org>,
"Michał Mirosław" <emmir@google.com>,
"Danylo Mocherniuk" <mdanylo@google.com>,
"Paul Gofman" <pgofman@codeweavers.com>,
"Cyrill Gorcunov" <gorcunov@gmail.com>,
"Mike Rapoport" <rppt@kernel.org>,
"Nadav Amit" <namit@vmware.com>,
"Alexander Viro" <viro@zeniv.linux.org.uk>,
"Shuah Khan" <shuah@kernel.org>,
"Christian Brauner" <brauner@kernel.org>,
"Yang Shi" <shy828301@gmail.com>,
"Vlastimil Babka" <vbabka@suse.cz>,
"Liam R . Howlett" <Liam.Howlett@oracle.com>,
"Yun Zhou" <yun.zhou@windriver.com>,
"Suren Baghdasaryan" <surenb@google.com>,
"Alex Sierra" <alex.sierra@amd.com>,
"Matthew Wilcox" <willy@infradead.org>,
"Pasha Tatashin" <pasha.tatashin@soleen.com>,
"Axel Rasmussen" <axelrasmussen@google.com>,
"Gustavo A . R . Silva" <gustavoars@kernel.org>,
"Dan Williams" <dan.j.williams@intel.com>,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-mm@kvack.org, linux-kselftest@vger.kernel.org,
"Greg KH" <gregkh@linuxfoundation.org>,
kernel@collabora.com
Subject: Re: [PATCH v22 2/5] fs/proc/task_mmu: Implement IOCTL to get and optionally clear info about PTEs
Date: Fri, 30 Jun 2023 08:01:14 -0700
Message-ID: <ZJ7uOqPIJwMiCuOI@gmail.com>
In-Reply-To: <20230628095426.1886064-3-usama.anjum@collabora.com>
On Wed, Jun 28, 2023 at 02:54:23PM +0500, Muhammad Usama Anjum wrote:
> This IOCTL, PAGEMAP_SCAN on pagemap file can be used to get and/or clear
> the info about page table entries. The following operations are supported
> in this ioctl:
> - Get the information if the pages have been written-to (PAGE_IS_WRITTEN),
> file mapped (PAGE_IS_FILE), present (PAGE_IS_PRESENT), swapped
> (PAGE_IS_SWAPPED) or page has pfn zero (PAGE_IS_PFNZERO).
> - Find pages which have been written-to and/or write protect the pages
> (atomic PM_SCAN_OP_GET + PM_SCAN_OP_WP)
>
> This IOCTL can be extended to get information about more PTE bits. The
> entire address range passed by user [start, end) is scanned until either
> the user provided buffer is full or max_pages have been found.
>
> Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
<snip>
> +
> +static long do_pagemap_scan(struct mm_struct *mm, unsigned long __arg)
> +{
> + struct pm_scan_arg __user *uarg = (struct pm_scan_arg __user *)__arg;
> + unsigned long long start, end, walk_start, walk_end;
> + unsigned long empty_slots, vec_index = 0;
> + struct mmu_notifier_range range;
> + struct page_region __user *vec;
> + struct pagemap_scan_private p;
> + struct pm_scan_arg arg;
> + int ret = 0;
> +
> + if (copy_from_user(&arg, uarg, sizeof(arg)))
> + return -EFAULT;
> +
> + start = untagged_addr((unsigned long)arg.start);
> + end = untagged_addr((unsigned long)arg.end);
> + vec = (struct page_region __user *)untagged_addr((unsigned long)arg.vec);
> +
> + ret = pagemap_scan_args_valid(&arg, start, end, vec);
> + if (ret)
> + return ret;
> +
> + p.max_pages = (arg.max_pages) ? arg.max_pages : ULONG_MAX;
> + p.found_pages = 0;
> + p.required_mask = arg.required_mask;
> + p.anyof_mask = arg.anyof_mask;
> + p.excluded_mask = arg.excluded_mask;
> + p.return_mask = arg.return_mask;
> + p.flags = arg.flags;
> + p.flags |= ((p.required_mask | p.anyof_mask | p.excluded_mask) &
> + PAGE_IS_WRITTEN) ? PM_SCAN_REQUIRE_UFFD : 0;
> + p.cur_buf.start = p.cur_buf.len = p.cur_buf.flags = 0;
> + p.vec_buf = NULL;
> + p.vec_buf_len = PAGEMAP_WALK_SIZE >> PAGE_SHIFT;
> +
> + /*
> + * Allocate smaller buffer to get output from inside the page walk
> + * functions and walk page range in PAGEMAP_WALK_SIZE size chunks. As
> + * we want to return output to user in compact form where no two
> + * consecutive regions should be continuous and have the same flags.
> + * So store the latest element in p.cur_buf between different walks and
> + * store the p.cur_buf at the end of the walk to the user buffer.
> + */
> + if (IS_PM_SCAN_GET(p.flags)) {
> + p.vec_buf = kmalloc_array(p.vec_buf_len, sizeof(*p.vec_buf),
> + GFP_KERNEL);
> + if (!p.vec_buf)
> + return -ENOMEM;
> + }
> +
> + if (IS_PM_SCAN_WP(p.flags)) {
> + mmu_notifier_range_init(&range, MMU_NOTIFY_PROTECTION_VMA, 0,
> + mm, start, end);
> + mmu_notifier_invalidate_range_start(&range);
> + }
> +
> + walk_start = walk_end = start;
> + while (walk_end < end && !ret) {
> + if (IS_PM_SCAN_GET(p.flags)) {
> + p.vec_buf_index = 0;
> +
> + /*
> + * All data is copied to cur_buf first. When more data
> + * is found, we push cur_buf to vec_buf and copy new
> + * data to cur_buf. Subtract 1 from length as the
> + * index of cur_buf isn't counted in length.
> + */
> + empty_slots = arg.vec_len - vec_index;
> + p.vec_buf_len = min(p.vec_buf_len, empty_slots - 1);
> + }
> +
> + ret = mmap_read_lock_killable(mm);
> + if (ret)
> + goto return_status;
> +
> + walk_end = min((walk_start + PAGEMAP_WALK_SIZE) & PAGEMAP_WALK_MASK, end);
> +
> + ret = walk_page_range(mm, walk_start, walk_end,
> + &pagemap_scan_ops, &p);
> + mmap_read_unlock(mm);
> +
> + if (ret && ret != PM_SCAN_FOUND_MAX_PAGES &&
> + ret != PM_SCAN_END_WALK)
> + goto return_status;
> +
> + walk_start = walk_end;
> + if (IS_PM_SCAN_GET(p.flags) && p.vec_buf_index) {
> + if (copy_to_user(&vec[vec_index], p.vec_buf,
> + p.vec_buf_index * sizeof(*p.vec_buf))) {
> + /*
> + * Return error even though the OP succeeded
> + */
> + ret = -EFAULT;
> + goto return_status;
> + }
> + vec_index += p.vec_buf_index;
> + }
> + }
> +
> + if (p.cur_buf.len) {
> + if (copy_to_user(&vec[vec_index], &p.cur_buf, sizeof(p.cur_buf))) {
> + ret = -EFAULT;
> + goto return_status;
> + }
> + vec_index++;
> + }
> +
> + ret = vec_index;
> +
> +return_status:
> + arg.start = (unsigned long)walk_end;
This doesn't look right. pagemap_scan_pmd_entry() can stop early, for
example when it hits the max_pages limit, so walk_end can point past the
last address that was actually processed. Am I missing something?
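To make the concern concrete, here is a toy userspace model of the loop
above. It is only a sketch, not kernel code: CHUNK, struct scan_state,
stop_addr, walk_chunk() and scan() are hypothetical names, addresses are
plain page indexes, and every page is assumed to match.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CHUNK 4 /* hypothetical stand-in for PAGEMAP_WALK_SIZE, in pages */

struct scan_state {
	size_t max_pages;   /* limit on matching pages */
	size_t found_pages; /* matching pages seen so far */
	size_t stop_addr;   /* hypothetical: first page not yet processed */
};

/*
 * Walk pages [start, end). In this toy model every page matches.
 * Returns true if the walk stopped early because max_pages was reached.
 */
static bool walk_chunk(struct scan_state *s, size_t start, size_t end)
{
	for (size_t addr = start; addr < end; addr++) {
		if (s->found_pages == s->max_pages) {
			s->stop_addr = addr; /* stopped in the middle of the chunk */
			return true;
		}
		s->found_pages++;
	}
	s->stop_addr = end;
	return false;
}

/*
 * Same loop shape as the quoted function. Returns walk_end, which is what
 * the patch writes back to arg.start; *actual receives the address where
 * the walk really stopped.
 */
static size_t scan(size_t start, size_t end, size_t max_pages, size_t *actual)
{
	struct scan_state s = {
		.max_pages = max_pages,
		.found_pages = 0,
		.stop_addr = start,
	};
	size_t walk_start = start, walk_end = start;
	bool stopped = false;

	assert(start <= end);
	while (walk_end < end && !stopped) {
		walk_end = walk_start + CHUNK < end ? walk_start + CHUNK : end;
		stopped = walk_chunk(&s, walk_start, walk_end);
		walk_start = walk_end;
	}
	*actual = s.stop_addr;
	return walk_end;
}
```

With scan(0, 16, 6, &actual), the second chunk stops at page 6 but
walk_end is already 8, so a caller that resumes from the reported start
would skip pages 6 and 7. Tracking the real stop address in the private
data, as stop_addr does in this model, is one possible way to avoid that.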
> + if (copy_to_user(&uarg->start, &arg.start, sizeof(arg.start)))
> + ret = -EFAULT;
> +
> + if (IS_PM_SCAN_WP(p.flags))
> + mmu_notifier_invalidate_range_end(&range);
> +
> + kfree(p.vec_buf);
> + return ret;
> +}
> +