From: Michał Mirosław
Date: Thu, 15 Jun 2023 16:52:40 +0200
Subject: Re: [PATCH v18 2/5] fs/proc/task_mmu: Implement IOCTL to get and optionally clear info about PTEs
To: Muhammad Usama Anjum
Cc: Peter Xu, David Hildenbrand, Andrew Morton, Andrei Vagin, Danylo Mocherniuk, Paul Gofman, Cyrill Gorcunov, Mike Rapoport, Nadav Amit, Alexander Viro, Shuah Khan, Christian Brauner, Yang Shi, Vlastimil Babka, "Liam R. Howlett", Yun Zhou, Suren Baghdasaryan, Alex Sierra, Matthew Wilcox, Pasha Tatashin, Axel Rasmussen, "Gustavo A. R. Silva", Dan Williams, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kselftest@vger.kernel.org, Greg KH, kernel@collabora.com
In-Reply-To: <96b7cc00-d213-ad7d-1b48-b27f75b04d22@collabora.com>

On Thu, 15 Jun 2023 at 15:58, Muhammad Usama Anjum wrote:
> I'll send the next revision now.
> On 6/14/23 11:00 PM, Michał Mirosław wrote:
> > (A quick reply to answer open questions in case they help the next version.)
> >
> > On Wed, 14 Jun 2023 at 19:10, Muhammad Usama Anjum
> > wrote:
> >> On 6/14/23 8:14 PM, Michał Mirosław wrote:
> >>> On Wed, 14 Jun 2023 at 15:46, Muhammad Usama Anjum
> >>> wrote:
> >>>>
> >>>> On 6/14/23 3:36 AM, Michał Mirosław wrote:
> >>>>> On Tue, 13 Jun 2023 at 12:29, Muhammad Usama Anjum
> >>>>> wrote:
> > [...]
> >>>>>> +	if (cur_buf->bitmap == bitmap &&
> >>>>>> +	    cur_buf->start + cur_buf->len * PAGE_SIZE == addr) {
> >>>>>> +		cur_buf->len += n_pages;
> >>>>>> +		p->found_pages += n_pages;
> >>>>>> +	} else {
> >>>>>> +		if (cur_buf->len && p->vec_buf_index >= p->vec_buf_len)
> >>>>>> +			return -ENOMEM;
> >>>>>
> >>>>> Shouldn't this be -ENOSPC? -ENOMEM usually signifies that the kernel
> >>>>> ran out of memory when allocating, not that there is no space in a
> >>>>> user-provided buffer.
> >>>> There are 3 kinds of return values here:
> >>>> * PM_SCAN_FOUND_MAX_PAGES (1) ---> max_pages have been found. Abort the
> >>>>   page walk from the next entry.
> >>>> * 0 ---> continue the page walk.
> >>>> * -ENOMEM ---> abort the page walk from the current entry; the user
> >>>>   buffer is full. This is not an error, only a stop signal; -ENOMEM
> >>>>   just differentiates it from (1). It is for internal use and isn't
> >>>>   returned to the user.
> >>>
> >>> But why is ENOSPC not good here? It was used before, I think.
> >> -ENOSPC is returned as a true error from pagemap_scan_hugetlb_entry(),
> >> so I had to remove -ENOSPC from here: it wasn't a true error here, only
> >> a way to abort the walk immediately.
> >> I'm liking the following return code from here now:
> >>
> >> #define PM_SCAN_BUFFER_FULL (-256)
> >
> > I guess this will be reworked anyway, but I'd prefer this didn't need
> > custom errors etc. If we agree to decouple the selection and GET
> > output, it could be:
> >
> > bool is_interesting_page(p, flags); // this one does the required/anyof/excluded match
> > size_t output_range(p, start, len, flags); // this one fills the output vector and returns how many pages fit
> >
> > In this setup, `is_interesting_page() && (n_out = output_range()) < n_pages`
> > means this is the final range, no more will fit. And if `n_out == 0`
> > then no pages fit and no WP is needed (no other special cases).
> Right now, pagemap_scan_output() performs the work of both of these
> functions. That part can be split out into is_interesting_pages() and we
> can leave the remaining part as it is.
>
> Saying that n_out < n_pages tells us the buffer is full covers one case.
> But there is also the case where the maximum number of pages has been
> found and the walk needs to be aborted.

This case is exactly what `n_out < n_pages` will cover (if scan_output uses
max_pages properly to limit n_out). Isn't it the case that when the buffer
is full we always want to abort the scan (with WP if `n_out > 0`)?

> >>>>> For the flag name: PM_REQUIRE_WRITE_ACCESS?
> >>>>> Or is it intended to be checked only if doing WP (as the current name
> >>>>> suggests) and so it would be redundant as WP currently requires
> >>>>> `p->required_mask = PAGE_IS_WRITTEN`?
> >>>> This is intended to indicate whether userfaultfd is needed. If
> >>>> PAGE_IS_WRITTEN is set in any of the masks, we need to check whether
> >>>> userfaultfd has been initialized for this memory. I'll rename it to
> >>>> PM_SCAN_REQUIRE_UFFD.
> >>>
> >>> Why do we need that check? Wouldn't `is_written = false` work for vmas
> >>> not registered via uffd?
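
The following minimal userspace sketch illustrates two ideas from this thread: the decoupled selection/output split proposed above, and what `is_written` could mean in VMAs that are not registered with uffd. `is_interesting_page()` and `output_range()` are the names proposed in this thread. Everything else is hypothetical: `struct scan_private` (standing in for pagemap_scan_private), `struct region`, `struct fake_page`, the `CAT_*` bits and `scan_vma()`. The sketch rests on two assumptions. First, a page counts as written when its uffd-wp bit has been cleared. Second, uffd-wp registration is recorded once per VMA (e.g. in test_walk), so unregistered VMAs simply report the written bit as 0. No custom error code is needed: `n_out < n_pages` ends the walk whether the buffer is full or `max_pages` was reached.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL
/* Hypothetical category bits (not the patch's PAGE_IS_* values). */
#define CAT_PRESENT (1u << 0)
#define CAT_WRITTEN (1u << 1)

/* One output entry: a run of contiguous pages with identical categories. */
struct region {
	uint64_t start;
	uint64_t len;		/* in pages */
	uint32_t categories;
};

/* Simulated per-page state that a real PTE walk would read. */
struct fake_page {
	bool present;
	bool uffd_wp;		/* uffd-wp bit or marker still set */
};

/* Hypothetical stand-in for pagemap_scan_private. */
struct scan_private {
	uint32_t required_mask, anyof_mask, excluded_mask;
	struct region *vec;	/* zero-initialised output buffer */
	size_t vec_len;		/* capacity, at least 1 */
	size_t vec_index;	/* entry currently being extended */
	size_t found_pages;
	size_t max_pages;	/* 0 means no limit */
	bool vma_uffd_wp;	/* set once per VMA, e.g. in test_walk */
};

/* In VMAs without uffd-wp, "written" is simply reported as 0. */
static uint32_t page_categories(const struct scan_private *p,
				const struct fake_page *pg)
{
	uint32_t cats = pg->present ? CAT_PRESENT : 0;

	if (p->vma_uffd_wp && !pg->uffd_wp)
		cats |= CAT_WRITTEN;
	return cats;
}

/* Selection only: the required/anyof/excluded match. */
static bool is_interesting_page(const struct scan_private *p, uint32_t cats)
{
	if ((cats & p->required_mask) != p->required_mask)
		return false;
	if (p->anyof_mask && !(cats & p->anyof_mask))
		return false;
	return !(cats & p->excluded_mask);
}

/* Output only: record up to n_pages starting at start; return how many fit. */
static size_t output_range(struct scan_private *p, uint64_t start,
			   size_t n_pages, uint32_t cats)
{
	struct region *cur = &p->vec[p->vec_index];

	if (p->max_pages && n_pages > p->max_pages - p->found_pages)
		n_pages = p->max_pages - p->found_pages;
	if (n_pages == 0)
		return 0;

	if (cur->len && (cur->categories != cats ||
			 cur->start + cur->len * SKETCH_PAGE_SIZE != start)) {
		if (p->vec_index + 1 >= p->vec_len)
			return 0;	/* buffer full */
		cur = &p->vec[++p->vec_index];
	}
	if (cur->len == 0) {
		cur->start = start;
		cur->categories = cats;
	}
	cur->len += n_pages;
	p->found_pages += n_pages;
	return n_pages;
}

/* Walk one VMA a page at a time; return the number of entries used. */
static size_t scan_vma(struct scan_private *p, uint64_t base,
		       const struct fake_page *pages, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		uint32_t cats = page_categories(p, &pages[i]);
		size_t n_pages = 1, n_out;

		if (!is_interesting_page(p, cats))
			continue;
		n_out = output_range(p, base + i * SKETCH_PAGE_SIZE, n_pages, cats);
		/* WP (if requested) would apply to the n_out pages just reported. */
		if (n_out < n_pages)
			break;	/* final range: buffer full or max_pages hit */
	}
	return p->vec[0].len ? p->vec_index + 1 : 0;
}
```

The example input uses a two-entry buffer and `required_mask = CAT_WRITTEN`, with written pages at indices 0, 1, 3 and 5. This produces the entries [base, 2 pages] and [base + 3 pages, 1 page]. The walk stops at page 5 because no free entry is left. With `vma_uffd_wp` false, the call does not fail; it just reports nothing.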
> >> UFFD_FEATURE_WP_ASYNC and UNPOPULATED need to be set on the memory region
> >> for it to report correct written values. Without UFFD WP ASYNC and
> >> UNPOPULATED set on the memory, we consider the UFFD_WP state undefined.
> >> If the user hasn't initialized the memory with UFFD, they have no right
> >> to set is_written = false.
> >
> > How about calculating `is_written = is_uffd_registered() && is_uffd_wp()`?
> > This would enable a user to apply GET+WP for the whole address space of a
> > process regardless of whether all of it is registered.
> I wouldn't want to check whether uffd is registered again and again. This
> is why we do it only once per walk in pagemap_scan_test_walk().

There is no need to do the checks repeatedly. If I understand the code
correctly, uffd registration is per-vma, so it can be communicated from
test_walk to the entry/hole callbacks via a field in pagemap_scan_private.

> >>> While here, I wonder if we really need to fail the call if unknown bits
> >>> are set in those masks: if this bit set is expanded with more category
> >>> flags, a newer userspace running on an older kernel would get EINVAL
> >>> even if "treat unknown as 0" is what it requires. There is no simple
> >>> way in the API to discover what bits the kernel supports. We could
> >>> allow a no-op (no WP nor GET) call to help with that, and then
> >>> rejecting unknown bits would make sense.
> >> I've not seen any examples of this. But I've seen examples of returning
> >> an error if the kernel doesn't support a feature. Each new feature comes
> >> with a kernel version, and kernels from that version onward support it.
> >> If a user tries to use a feature which isn't present in the kernel, we
> >> should return an error and not proceed to confuse the user/kernel. In
> >> fact, userfaultfd_api() returns an error immediately if the requested
> >> features have a bit set which the kernel doesn't support.
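
The following minimal sketch shows how the strict policy described above (reject unknown bits, as userfaultfd_api() does for features) can coexist with the no-op probe suggested in this thread. Everything here is hypothetical and is not the patch's uAPI: `struct scan_args`, the `SCAN_OP_WP` and `CAT_*` bit values, `do_scan()` and `categories_supported()` are illustrative names only. A call with no output buffer and no WP flag only validates its arguments. Userspace can therefore test category bits one at a time and get 0 or -EINVAL back.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag and category bits; not the uAPI values from the patch. */
#define SCAN_OP_WP	(1u << 0)
#define SCAN_FLAGS_ALL	SCAN_OP_WP

#define CAT_WRITTEN	(1u << 0)
#define CAT_FILE	(1u << 1)
#define CAT_PRESENT	(1u << 2)
#define CAT_SWAPPED	(1u << 3)
#define CATS_SUPPORTED	(CAT_WRITTEN | CAT_FILE | CAT_PRESENT | CAT_SWAPPED)

/* Hypothetical argument block for the scan call. */
struct scan_args {
	uint64_t flags;
	uint64_t vec;		/* output buffer address; 0 means no GET */
	uint64_t vec_len;
	uint64_t required_mask, anyof_mask, excluded_mask, return_mask;
};

/* Strict policy: unknown bits anywhere are rejected with -EINVAL. */
static int scan_validate(const struct scan_args *a)
{
	uint64_t masks = a->required_mask | a->anyof_mask |
			 a->excluded_mask | a->return_mask;

	if (a->flags & ~(uint64_t)SCAN_FLAGS_ALL)
		return -EINVAL;
	if (masks & ~(uint64_t)CATS_SUPPORTED)
		return -EINVAL;
	if (a->vec && !a->vec_len)
		return -EINVAL;
	return 0;
}

/* Kernel side: a call that neither GETs nor WPs only validates its input. */
static int do_scan(const struct scan_args *a)
{
	int ret = scan_validate(a);

	if (ret || (!a->vec && !(a->flags & SCAN_OP_WP)))
		return ret;	/* error, or accepted no-op probe */
	/* The page walk, GET output and WP would go here; the sketch skips them. */
	return 0;
}

/* Userspace side: probe whether the given category bits are understood. */
static bool categories_supported(uint64_t cats)
{
	struct scan_args probe = { .return_mask = cats };

	return do_scan(&probe) == 0;
}
```

With this design, new category bits remain strictly validated. A newer userspace can still degrade gracefully on an older kernel by probing before it relies on a bit.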
> >
> > I think we should have a way of detecting the supported flags if we
> > don't want a forward compatibility policy for flags here. Maybe it
> > would be enough to allow all the no-op combinations for this purpose?
> Again, I don't think UFFD does anything like this.

If it's cheap and easy to give users a way to detect the supported
features, why not do it?

Best Regards
Michał Mirosław