From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 94FDBC61DA4 for ; Wed, 15 Mar 2023 19:53:55 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 254E36B0071; Wed, 15 Mar 2023 15:53:55 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 204D56B0072; Wed, 15 Mar 2023 15:53:55 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 0A6086B0075; Wed, 15 Mar 2023 15:53:55 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id F06F26B0071 for ; Wed, 15 Mar 2023 15:53:54 -0400 (EDT) Received: from smtpin12.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay06.hostedemail.com (Postfix) with ESMTP id C1714AB8A6 for ; Wed, 15 Mar 2023 19:53:54 +0000 (UTC) X-FDA: 80572183188.12.5594A48 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by imf21.hostedemail.com (Postfix) with ESMTP id A68CC1C0020 for ; Wed, 15 Mar 2023 19:53:52 +0000 (UTC) Authentication-Results: imf21.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b="ba6aO/J1"; spf=pass (imf21.hostedemail.com: domain of peterx@redhat.com designates 170.10.129.124 as permitted sender) smtp.mailfrom=peterx@redhat.com; dmarc=pass (policy=none) header.from=redhat.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1678910032; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=woVT10ZvTBDEkwPsfEHU2WbSYTau3o/cURF+myIHKzc=; b=Bx4UqBqvp50H2637jgSeLEtkH3SVfl/r0edd+r4pVbpgQZMcgBGkLbB0B+nTxg0bVXEM1d sEYPwuaVkHP1A+mLQ+gEpQBuYS2f9kwCR0MWbjQuy+TnzefamWlJGpLYaD8mdINkIfA8+/ vYY76yCbla/ASJ0zNyFD/ln/pek4xWo= ARC-Authentication-Results: i=1; imf21.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b="ba6aO/J1"; spf=pass (imf21.hostedemail.com: domain of peterx@redhat.com designates 170.10.129.124 as permitted sender) smtp.mailfrom=peterx@redhat.com; dmarc=pass (policy=none) header.from=redhat.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1678910032; a=rsa-sha256; cv=none; b=b3UY6eYHkbVZnLBy8rag6YNwpYAHwUhERMij+UMl2QipW9Emb58GvzjP02MqJ50oJqeUED i+p3bhaF0gakqZUjDK7+blQvaYTg9Wr24/xgGCtwec6alI9C870F8mz04N4gy7KCv2Lrn2 ij25GXKBSzaNPit8xe9n0+7bERfrAZY= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1678910032; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=woVT10ZvTBDEkwPsfEHU2WbSYTau3o/cURF+myIHKzc=; b=ba6aO/J1wMWPTxdqakcfsDKhj7hySCz7IlBu07cAotg2mMY+Puo4oZ/OzAPbwV853K8zM9 6VFg2vQiwD+e5yCJhMe7IMMxVs+OFo0LpFkSiVhZkdx3o6NKu5wCakNFVZNUwaVKXG5eUh eqQDYYoPqvaeh3uj5MvCUS9vBiHy5C8= Received: from mail-qv1-f72.google.com (mail-qv1-f72.google.com [209.85.219.72]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-663-ky4knivUOMSnI4tbK2U4WQ-1; Wed, 15 Mar 2023 15:53:50 -0400 X-MC-Unique: ky4knivUOMSnI4tbK2U4WQ-1 Received: by mail-qv1-f72.google.com with SMTP id jh21-20020a0562141fd500b0053c23b938a0so12352449qvb.17 for ; Wed, 15 Mar 2023 12:53:50 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1678910030; x=1681502030; h=in-reply-to:content-transfer-encoding:content-disposition :mime-version:references:message-id:subject:cc:to:from:date :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=woVT10ZvTBDEkwPsfEHU2WbSYTau3o/cURF+myIHKzc=; b=LPVzF5+4Uwt4GzMqgyWkn1NYEwztQ8TF3hQ1OIgEuQ5lAjVIhpFZ2KKxcV3KX6YDr7 ZJ9zME8lXo1ipL6u8qGdnCT7AxnAfFocAUGxiHjJatsPyoj0C62hYorixJ8AxpZYrHka ZdDmGAciriJ9QjqJVbuJlkVNM4sN7GkHCLg2DWmGEwISSOpjGli9U+qb/UkBHrQYAWxC C+hqbzEXR9knPqpJ1Zd4sD0gCVO+h320LHk8nqrqOVFgH/BC0Ao3mfF3g7k4cFdFlZsG NuGg0AUNlozoQPaLTLbVwdHi0PfdAGn9s9iha6FlPmpqDkzUjJqws9VrLmzMsiOQiEcg 8Vow== X-Gm-Message-State: AO0yUKVAEpWlC07KJLcyEk7vf+zu/8NDS3nF0ouNc6HdJRN9VJsrv6VH zvPryEEMVl8nnKJQHGPK2vP+baEih3qijTWwrsxxQrO45JIiq7pCTlYsumyz9KSJzoQqds1qQ15 /mkAFEmP5sFI= X-Received: by 2002:a05:622a:64f:b0:3bf:a564:573b with SMTP id a15-20020a05622a064f00b003bfa564573bmr8006377qtb.0.1678910030296; Wed, 15 Mar 2023 12:53:50 -0700 (PDT) X-Google-Smtp-Source: AK7set+YTOsYq99xGXjreXVl31GgX6FpNk3Wo+WVqNbF2kHRbcCNbb1E4jkD9EHxOg3lsUoFTmcIMw== X-Received: by 2002:a05:622a:64f:b0:3bf:a564:573b with SMTP id a15-20020a05622a064f00b003bfa564573bmr8006333qtb.0.1678910029944; Wed, 15 Mar 2023 12:53:49 -0700 (PDT) Received: from x1n (bras-base-aurron9127w-grc-56-70-30-145-63.dsl.bell.ca. [70.30.145.63]) by smtp.gmail.com with ESMTPSA id p22-20020a374216000000b0074374e2b630sm4309024qka.119.2023.03.15.12.53.47 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 15 Mar 2023 12:53:49 -0700 (PDT) Date: Wed, 15 Mar 2023 15:53:47 -0400 From: Peter Xu To: Muhammad Usama Anjum Cc: David Hildenbrand , Andrew Morton , =?utf-8?B?TWljaGHFgiBNaXJvc8WCYXc=?= , Andrei Vagin , Danylo Mocherniuk , Paul Gofman , Cyrill Gorcunov , Mike Rapoport , Nadav Amit , Alexander Viro , Shuah Khan , Christian Brauner , Yang Shi , Vlastimil Babka , "Liam R . Howlett" , Yun Zhou , Suren Baghdasaryan , Alex Sierra , Matthew Wilcox , Pasha Tatashin , Axel Rasmussen , "Gustavo A . R . Silva" , Dan Williams , linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kselftest@vger.kernel.org, Greg KH , kernel@collabora.com Subject: Re: [PATCH v11 4/7] fs/proc/task_mmu: Implement IOCTL to get and optionally clear info about PTEs Message-ID: References: <20230309135718.1490461-1-usama.anjum@collabora.com> <20230309135718.1490461-5-usama.anjum@collabora.com> <3d2d1ba4-bfab-6b3d-f0d6-ae0920ebdcb0@collabora.com> MIME-Version: 1.0 In-Reply-To: <3d2d1ba4-bfab-6b3d-f0d6-ae0920ebdcb0@collabora.com> X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit X-Rspamd-Server: rspam05 X-Rspamd-Queue-Id: A68CC1C0020 X-Stat-Signature: syegp3mi4y41jxqstysn4mumuzo6jgq9 X-Rspam-User: X-HE-Tag: 1678910032-51256 X-HE-Meta: U2FsdGVkX19e1Nxqcq/TtKZyU8J8bi9OFIOz7i7IVCnxQc79rWqM5cCfq6Zf4Ip2sbKDR8gfkDX+ViAtFMU+AkdQYUq+wzrxy0FM+6VgAxk1J9NoXxre077j+sERcGC5B7B7qE+JXIptlP1fRnDvHXuEiYiG5tV5c8MzVZChVvsSrYWB3YaYCWFZvjZ1u//OG3rP9b04dkLnkA2sMgZYRFJPhy6ZHnofwmLMultEMwDuAraIFgM4x5XJFt9AkDVk9GbRglAlLh6RDaeXTxHWSpgsCxPNkZzZyDSdMy5J55169HvZuDo1xIzZ6NHtQ2JgUArMzdKG9LB21Go9lTGwF/culWRylLj6v9rP5z8SsD6gKOkQDHDrvfgOTTfH3qADo1nkXE7LV8F4FCcLhS2QymL28MzAkGVfirIhLUB5ux7QvJDAF2eqz0+u8dwQNV6EETu+G2dsnwI/MSNnXDyNEYCK6YBv6a2RsRckp6ohvVpSe7dD0XlHEgrejE+ci/h6faKNVgXvpJGK/0Fo3+ZekuXmbvusV+M+F/WedCUMS2C00snNHTtT1qEthyAMpxi4AfzEsKe/VyLevMIayf3/I7fSFPUGGMbeNJTRO+ugA6AIOO3JYFnxW6BM/0rOJ5zgX9uDYwubJlAOzcDZYiqPiDoFeKGLAJxTWlsQzwxCMRTmrbtAZ8PeDeH2HavZUTk2rgiBoB7DEQ9e6MLEZEqfqdVHcFy6BtznXZa2OWx/x47iEozRkWX5SF5aRiJGH9JDtVkcGtC+HHu1ptR2eMYHicA4L++bpyeDIpxDLrixEUHyJmMquOlzUuPVGPmicMfw7k6TkJosE0b3ndjRgp/tdkLW2uDWcPAW1Mr8JxbuLeN2rXq4rOcpXnr9h4Qe44j5zKTf8+9Eaf2EIwtEB7/sZcJwK2oY/W6jBzX33Okw7JlyeX7ZKkBuEq8VozDt7xd1XAOI0/tTDKYWp+1BGlL DXnvDq1T G1ASM3GbcBZMDgocH5Qdwea4hmF/Pnmqz5ZWqDBIW6OEozWjYzwn2ePOlbpCSzYwaelW1apeaPj4OFoFZGqvF4wbqWEbv/Y6EFyhKXVPsrlYEHFnJmk2ZLGYjdc7pmXsI0DwGcXsqRP8Cugz01DCz7xFNuMYeu5cUQEei0Hv0A8CCpzyyaWS4eBYt5NxoKvZPUuB21X4W9td1EMV4IRx0pcgvJ+Rfut6Q/m3WcBoA/zu8k6Z5LnPN+4qCZeB84uKZXyNhKx4vlLwkj8aGhsTKymBcIMIC1zWluFFjjhfjLMDjAKzvdH5gy4/k2iBWYbWUVPfAV0RsPoe87BBe9CihdOEw/b2XlA7bAkqcbp+dL+jMX9C9oSWqPpidtEiYTv5Hqlv5jxl7c55aAnWpD5ATXbPEAp8BVBYrcXeWkfshdCLJVL7Mx16UykNbBw== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Wed, Mar 15, 2023 at 09:54:40PM +0500, Muhammad Usama Anjum wrote: > On 3/15/23 8:55 PM, Peter Xu wrote: > > On Thu, Mar 09, 2023 at 06:57:15PM +0500, Muhammad Usama Anjum wrote: > >> + for (addr = start; !ret && addr < end; pte++, addr += PAGE_SIZE) { > >> + pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl); > >> + > >> + is_writ = !is_pte_uffd_wp(*pte); > >> + is_file = vma->vm_file; > >> + is_pres = pte_present(*pte); > >> + is_swap = is_swap_pte(*pte); > >> + > >> + pte_unmap_unlock(pte, ptl); > >> + > >> + ret = pagemap_scan_output(is_writ, is_file, is_pres, is_swap, > >> + p, addr, 1); > >> + if (ret) > >> + break; > >> + > >> + if (PM_SCAN_OP_IS_WP(p) && is_writ && > >> + uffd_wp_range(walk->mm, vma, addr, PAGE_SIZE, true) < 0) > >> + ret = -EINVAL; > >> + } > > > > This is not real atomic.. > > > > Taking the spinlock for eacy pte is not only overkill but wrong in > > atomicity because the pte can change right after spinlock unlocked. > Let me explain. It seems like wrong, but it isn't. In my rigorous testing, > it didn't show any side-effect. Here we are finding out if a page is > written. If page is written, only then we clear it. Lets look at the > different possibilities here: > - If a page isn't written, we'll not clear it. > - If a page is written and there isn't any race, we'll clear written-to > flag by write protecting it. > - If a page is written but before clearing it, data is written again to the > page. The page would remain written and we'll clear it. > - If a page is written but before clearing it, it gets write protected, > we'll still write protected it. There is double right protection here, but > no side-effect. > > Lets turn this into a truth table for easier understanding. Here first > coulmn and thrid column represents this above code. 2nd column represents > any other thread interacting with the page. > > If page is written/dirty some other task interacts wp_page > no does nothing no > no writes to page no > no wp the page no > yes does nothing yes > yes write to page yes > yes wp the page yes > > As you can see there isn't any side-effect happening. We aren't over doing > the wp or under-doing the write-protect. > > Even if we were doing something wrong here and I bring the lock over all of > this, the pages get become written or wp just after unlocking. It is > expected. This current implementation doesn't seem to be breaking this. > > Is my understanding wrong somewhere here? Can you point out? Yes you're right. With is_writ check it looks all fine. > > Previous to this current locking design were either buggy or slower when > multiple threads were working on same pages. Current implementation removes > the limitations: > - The memcpy inside pagemap_scan_output is happening with pte unlocked. Why this has anything to worry? Isn't that memcpy only applies to a page_region struct? > - We are only wp a page if we have noted this page to be dirty > - No mm write lock is required. Only read lock works fine just like > userfaultfd_writeprotect() takes only read lock. I didn't even notice you used to use write lock. Yes I think read lock is suffice here. > > There is only one con here that we are locking and unlocking the pte lock > again and again. > > Please have a look at my explanation and let me know what do you think. I think this is fine as long as the semantics is correct, which I believe is the case. The spinlock can be optimized, but it can be done on top if needs more involved changes. > > > > > Unfortunately you also cannot reuse uffd_wp_range() because that's not > > atomic either, my fault here. Probably I was thinking mostly from > > soft-dirty pov on batching the collect+reset. > > > > You need to take the spin lock, collect whatever bits, set/clear whatever > > bits, only until then release the spin lock. > > > > "Not atomic" means you can have some page got dirtied but you could miss > > it. Depending on how strict you want, I think it'll break apps like CRIU > > if strict atomicity needed for migrating a process. If we want to have a > > new interface anyway, IMHO we'd better do that in the strict way. > In my rigorous multi-threaded testing where a lots of threads are working > on same set of pages, we aren't losing even a single update. I can share > the test if you want. Good to have tests covering that. I'd say you can add the test into selftests along with the series when you repost if it's convenient. It can be part of an existing test or it can be a new one under mm/. Thanks, -- Peter Xu