From: Peter Xu <peterx@redhat.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Axel Rasmussen, Nadav Amit, Mike Rapoport, Hugh Dickins, Mike Kravetz,
    Kirill A. Shutemov, Alistair Popple, Jerome Glisse, Matthew Wilcox,
    Andrew Morton, peterx@redhat.com, David Hildenbrand, Andrea Arcangeli
Subject: [PATCH v6 00/23] userfaultfd-wp: Support shmem and hugetlbfs
Date: Mon, 15 Nov 2021 15:54:59 +0800
Message-Id: <20211115075522.73795-1-peterx@redhat.com>

This is v6 of the series to add shmem+hugetlbfs support for userfaultfd
write protection.  It is based on v5.16-rc1 (fa55b7dcdc43), with the below
two patches applied first:

  Subject: [PATCH RFC 0/2] mm: Rework zap ptes on swap entries
  https://lore.kernel.org/lkml/20211110082952.19266-1-peterx@redhat.com/

The whole tree can be found here for testing:

  https://github.com/xzpeter/linux/tree/uffd-wp-shmem-hugetlbfs

Previous versions:

RFC: https://lore.kernel.org/lkml/20210115170907.24498-1-peterx@redhat.com/
v1:  https://lore.kernel.org/lkml/20210323004912.35132-1-peterx@redhat.com/
v2:  https://lore.kernel.org/lkml/20210427161317.50682-1-peterx@redhat.com/
v3:  https://lore.kernel.org/lkml/20210527201927.29586-1-peterx@redhat.com/
v4:  https://lore.kernel.org/lkml/20210714222117.47648-1-peterx@redhat.com/
v5:  https://lore.kernel.org/lkml/20210715201422.211004-1-peterx@redhat.com/

Overview
========

This is the first version of this work that rebases the uffd-wp logic upon
PTE markers.  The major logic is the same as v5, but since there are quite
a few minor changes here and there, I decided not to provide a change log
at all, as it would no longer be helpful.  However, I should have addressed
all the comments raised by reviewers; please shout if I missed something.

I still kept many of Mike's Reviewed-by tags where there was essentially no
change to the patch content (I only touched up quite a few commit messages),
but it would be nice if Mike could still go over the patches even with the
R-b tags in place.
A PTE marker is a new type of swap entry that is only applicable to
file-backed memories like shmem and hugetlbfs.  It's used to persist some
pte-level information even if the original present ptes in the pgtable are
zapped.  This information could be one of:

  (1) Userfaultfd wr-protect information
  (2) PTE soft-dirty information
  (3) Or others

This series only uses the marker to store uffd-wp information across
temporary zappings of shmem/hugetlbfs pgtables, for example, when a shmem
thp is split.  So even if ptes are temporarily zapped, the wr-protect
information can still be kept within the pgtables.  Then, when the page
fault triggers again, we know this pte is wr-protected, so we can treat the
pte the same as a normal uffd wr-protected pte.

The extra information is encoded into the swap entry, or swp_offset to be
explicit, with the swp_type being PTE_MARKER.  So far uffd-wp only uses one
bit out of the swap entry; the rest of the swp_offset bits are still
reserved for other purposes.

There are two configs to enable/disable PTE markers:

  CONFIG_PTE_MARKER
  CONFIG_PTE_MARKER_UFFD_WP

We can set !PTE_MARKER to completely disable all the PTE markers, along
with uffd-wp support.  I made two configs so we can also enable PTE markers
but disable the uffd-wp file-backed support, for other purposes.  At the
end of the current series, I enable CONFIG_PTE_MARKER by default, but that
patch is standalone, and if anyone worries about having it on by default,
we can also consider turning it off by dropping that one-liner patch.  So
far I don't see a huge risk in doing so, so I kept that patch.

In most cases, PTE markers should be treated as none ptes.  That is
because, unlike most other swap entry types, there's no PFN or block offset
information encoded into PTE markers, only some extra well-defined bits
showing the status of the pte.  These bits should only be used as extra
data when servicing an upcoming page fault, and that should be it.

I did spend a lot of time going over all the pte_none() users this time.
It is indeed a challenge because there are a lot of them, and I hope I
didn't miss a single place where pte markers need to be taken care of.
Luckily, I don't think they need to be considered in many cases, for
example: boot code, arch code (especially non-x86), kernel-only page
handling (e.g. CPA), or device driver code dealing with pure PFN mappings.

I introduced pte_none_mostly() in this series for when we need to handle
pte markers the same as none ptes; the "mostly" is another way to write
"either a none pte or a pte marker".

I didn't replace pte_none() to cover pte markers, for the below reasons:

  - Only a very small fraction of pte_none() callers will handle pte
    markers.  E.g., all the kernel pages do not require knowledge of pte
    markers.  So we don't pollute the major use cases.

  - Unconditionally changing pte_none() semantics could confuse people,
    because pte_none() has existed for such a long time.

  - Unconditionally changing pte_none() semantics could make pte_none()
    slower, even though in many cases pte markers do not exist.

  - There are cases where we'd like to handle pte markers differently from
    pte_none(), so a full replacement is also impossible.  E.g. khugepaged
    should still treat pte markers as normal swap ptes rather than none
    ptes, because pte markers will always need a fault-in to merge the
    marker with a valid pte.  Or the smaps code will need to parse PTE
    markers, not none ptes.
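To make the encoding and the pte_none_mostly() idea above more concrete,
here is a minimal sketch in C.  It is illustrative only: apart from
pte_none_mostly() and the uffd-wp marker bit described above, the helper
names, the SWP_PTE_MARKER constant and the choice of bit 0 are assumptions
and need not match the actual patches.

  /*
   * Sketch only, not the exact code in the series.  A pte marker is a
   * swap entry whose swp_type is the (new) PTE_MARKER type and whose
   * swp_offset carries the marker bits; so far only the uffd-wp bit is
   * used (assumed to be bit 0 here), the rest stays reserved.
   */
  #define PTE_MARKER_UFFD_WP	(1UL << 0)

  static inline swp_entry_t make_uffd_wp_marker(void)
  {
  	/* Encode the uffd-wp bit into swp_offset of a marker entry */
  	return swp_entry(SWP_PTE_MARKER, PTE_MARKER_UFFD_WP);
  }

  static inline bool is_pte_marker(pte_t pte)
  {
  	return is_swap_pte(pte) &&
  	       swp_type(pte_to_swp_entry(pte)) == SWP_PTE_MARKER;
  }

  /* "Mostly none": either a real none pte, or a pte marker */
  static inline bool pte_none_mostly(pte_t pte)
  {
  	return pte_none(pte) || is_pte_marker(pte);
  }

With such helpers, the code paths that must ignore markers can test
pte_none_mostly() instead of pte_none(), while places like khugepaged or
the smaps code keep seeing markers as swap ptes, as discussed above.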
Patch Layout
============

Introducing PTE marker and uffd-wp bit in PTE marker:

  mm: Introduce PTE_MARKER swap entry
  mm: Teach core mm about pte markers
  mm: Check against orig_pte for finish_fault()
  mm/uffd: PTE_MARKER_UFFD_WP

Adding support for shmem uffd-wp:

  mm/shmem: Take care of UFFDIO_COPY_MODE_WP
  mm/shmem: Handle uffd-wp special pte in page fault handler
  mm/shmem: Persist uffd-wp bit across zapping for file-backed
  mm/shmem: Allow uffd wr-protect none pte for file-backed mem
  mm/shmem: Allows file-back mem to be uffd wr-protected on thps
  mm/shmem: Handle uffd-wp during fork()

Adding support for hugetlbfs uffd-wp:

  mm/hugetlb: Introduce huge pte version of uffd-wp helpers
  mm/hugetlb: Hook page faults for uffd write protection
  mm/hugetlb: Take care of UFFDIO_COPY_MODE_WP
  mm/hugetlb: Handle UFFDIO_WRITEPROTECT
  mm/hugetlb: Handle pte markers in page faults
  mm/hugetlb: Allow uffd wr-protect none ptes
  mm/hugetlb: Only drop uffd-wp special pte if required
  mm/hugetlb: Handle uffd-wp during fork()

Misc handling on the rest of mm for uffd-wp file-backed:

  mm/khugepaged: Don't recycle vma pgtable if uffd-wp registered
  mm/pagemap: Recognize uffd-wp bit for shmem/hugetlbfs

Enabling of uffd-wp on file-backed memory:

  mm/uffd: Enable write protection for shmem & hugetlbfs
  mm: Enable PTE markers by default
  selftests/uffd: Enable uffd-wp for shmem/hugetlbfs

Tests
=====

- x86_64
  - Compile tested on:
    - PTE_MARKER && PTE_MARKER_UFFD_WP
    - PTE_MARKER && !PTE_MARKER_UFFD_WP
    - !PTE_MARKER
    - !USERFAULTFD
  - Kernel userfaultfd selftests for shmem/hugetlb/hugetlb_shared
  - Umapsort [1,2] test for shmem/hugetlb, with swap on/off
- aarch64
  - Compile and smoke tested with !PTE_MARKER

[1] https://github.com/xzpeter/umap-apps/tree/peter
[2] https://github.com/xzpeter/umap/tree/peter-shmem-hugetlbfs

Peter Xu (23):
  mm: Introduce PTE_MARKER swap entry
  mm: Teach core mm about pte markers
  mm: Check against orig_pte for finish_fault()
  mm/uffd: PTE_MARKER_UFFD_WP
  mm/shmem: Take care of UFFDIO_COPY_MODE_WP
  mm/shmem: Handle uffd-wp special pte in page fault handler
  mm/shmem: Persist uffd-wp bit across zapping for file-backed
  mm/shmem: Allow uffd wr-protect none pte for file-backed mem
  mm/shmem: Allows file-back mem to be uffd wr-protected on thps
  mm/shmem: Handle uffd-wp during fork()
  mm/hugetlb: Introduce huge pte version of uffd-wp helpers
  mm/hugetlb: Hook page faults for uffd write protection
  mm/hugetlb: Take care of UFFDIO_COPY_MODE_WP
  mm/hugetlb: Handle UFFDIO_WRITEPROTECT
  mm/hugetlb: Handle pte markers in page faults
  mm/hugetlb: Allow uffd wr-protect none ptes
  mm/hugetlb: Only drop uffd-wp special pte if required
  mm/hugetlb: Handle uffd-wp during fork()
  mm/khugepaged: Don't recycle vma pgtable if uffd-wp registered
  mm/pagemap: Recognize uffd-wp bit for shmem/hugetlbfs
  mm/uffd: Enable write protection for shmem & hugetlbfs
  mm: Enable PTE markers by default
  selftests/uffd: Enable uffd-wp for shmem/hugetlbfs

 arch/s390/include/asm/hugetlb.h          |  15 ++
 fs/hugetlbfs/inode.c                     |  15 +-
 fs/proc/task_mmu.c                       |  11 ++
 fs/userfaultfd.c                         |  31 +---
 include/asm-generic/hugetlb.h            |  24 +++
 include/linux/hugetlb.h                  |  27 ++--
 include/linux/mm.h                       |  20 +++
 include/linux/mm_inline.h                |  45 ++++++
 include/linux/shmem_fs.h                 |   4 +-
 include/linux/swap.h                     |  15 +-
 include/linux/swapops.h                  |  79 ++++++++++
 include/linux/userfaultfd_k.h            |  67 +++++++++
 include/uapi/linux/userfaultfd.h         |  10 +-
 mm/Kconfig                               |  16 ++
 mm/filemap.c                             |   5 +
 mm/hmm.c                                 |   2 +-
 mm/hugetlb.c                             | 181 +++++++++++++++++-----
 mm/khugepaged.c                          |  14 +-
 mm/memcontrol.c                          |   8 +-
 mm/memory.c                              | 184 ++++++++++++++++++++---
 mm/mincore.c                             |   3 +-
 mm/mprotect.c                            |  76 +++++++++-
 mm/rmap.c                                |   8 +
 mm/shmem.c                               |   4 +-
 mm/userfaultfd.c                         |  61 +++++---
 tools/testing/selftests/vm/userfaultfd.c |   4 +-
 26 files changed, 798 insertions(+), 131 deletions(-)

-- 
2.32.0