From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 19 Apr 2022 18:29:26 +0200
MIME-Version: 1.0
Subject: Re: [PATCH v3 14/16] mm: support GUP-triggered unsharing of anonymous pages
To: Vlastimil Babka, linux-kernel@vger.kernel.org
Cc: Andrew Morton, Hugh Dickins, Linus Torvalds, David Rientjes,
 Shakeel Butt, John Hubbard, Jason Gunthorpe, Mike Kravetz,
 Mike Rapoport, Yang Shi, "Kirill A . Shutemov", Matthew Wilcox,
 Jann Horn, Michal Hocko, Nadav Amit, Rik van Riel, Roman Gushchin,
 Andrea Arcangeli, Peter Xu, Donald Dutile, Christoph Hellwig,
 Oleg Nesterov, Jan Kara, Liang Zhang, Pedro Gomes, Oded Gabbay,
 linux-mm@kvack.org
References: <20220329160440.193848-1-david@redhat.com>
 <20220329160440.193848-15-david@redhat.com>
 <9005b167-db08-c967-463b-5e0e092cbb6c@suse.cz>
From: David Hildenbrand
Organization: Red Hat
In-Reply-To: <9005b167-db08-c967-463b-5e0e092cbb6c@suse.cz>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit

On 14.04.22 19:15, Vlastimil Babka wrote:
> On 3/29/22 18:04, David Hildenbrand wrote:
>> Whenever GUP currently ends up taking a R/O pin on an anonymous page that
>> might be shared -- mapped R/O and !PageAnonExclusive() -- any write fault
>> on the page table entry will end up replacing the mapped anonymous page
>> due to COW, resulting in the GUP pin no longer being consistent with the
>> page actually mapped into the page table.
>>
>> The possible ways to deal with this situation are:
>>  (1) Ignore and pin -- what we do right now.
>>  (2) Fail to pin -- which would be rather surprising to callers and
>>      could break user space.
>>  (3) Trigger unsharing and pin the now exclusive page -- reliable R/O
>>      pins.
>>
>> We want to implement 3) because it provides the clearest semantics and
>> allows for checking in unpin_user_pages() and friends for possible BUGs:
>> when trying to unpin a page that's no longer exclusive, clearly
>> something went very wrong and might result in memory corruptions that
>> might be hard to debug. So we better have a nice way to spot such
>> issues.
>>
>> To implement 3), we need a way for GUP to trigger unsharing:
>> FAULT_FLAG_UNSHARE. FAULT_FLAG_UNSHARE is only applicable to R/O mapped
>> anonymous pages and resembles COW logic during a write fault. However, in
>> contrast to a write fault, GUP-triggered unsharing will, for example, still
>> maintain the write protection.
>>
>> Let's implement FAULT_FLAG_UNSHARE by hooking into the existing write fault
>> handlers for all applicable anonymous page types: ordinary pages, THP and
>> hugetlb.
>>
>> * If FAULT_FLAG_UNSHARE finds a R/O-mapped anonymous page that has been
>>   marked exclusive in the meantime by someone else, there is nothing to do.
>> * If FAULT_FLAG_UNSHARE finds a R/O-mapped anonymous page that's not
>>   marked exclusive, it will try detecting if the process is the exclusive
>>   owner. If exclusive, it can be set exclusive similar to reuse logic
>>   during write faults via page_move_anon_rmap() and there is nothing
>>   else to do; otherwise, we either have to copy and map a fresh,
>>   anonymous exclusive page R/O (ordinary pages, hugetlb), or split the
>>   THP.
>>
>> This commit is heavily based on patches by Andrea.
>>
>> Co-developed-by: Andrea Arcangeli
>> Signed-off-by: Andrea Arcangeli
>> Signed-off-by: David Hildenbrand
>
> Acked-by: Vlastimil Babka
>
> Modulo a nit and suspected logical bug below.

Thanks!

>> @@ -4515,8 +4550,11 @@ static inline vm_fault_t create_huge_pmd(struct vm_fault *vmf)
>>  /* `inline' is required to avoid gcc 4.1.2 build error */
>>  static inline vm_fault_t wp_huge_pmd(struct vm_fault *vmf)
>>  {
>> +	const bool unshare = vmf->flags & FAULT_FLAG_UNSHARE;
>> +
>>  	if (vma_is_anonymous(vmf->vma)) {
>> -		if (userfaultfd_huge_pmd_wp(vmf->vma, vmf->orig_pmd))
>> +		if (unlikely(unshare) &&
>
> Is this condition flipped, should it be "likely(!unshare)"? As the similar
> code in do_wp_page() does.

Good catch, this should affect uffd-wp on THP -- it wouldn't trigger as
expected. Thanks a lot for finding that!

>
>> +		    userfaultfd_huge_pmd_wp(vmf->vma, vmf->orig_pmd))
>>  			return handle_userfault(vmf, VM_UFFD_WP);
>>  		return do_huge_pmd_wp_page(vmf);
>>  	}
>> @@ -4651,10 +4689,11 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
>>  		update_mmu_tlb(vmf->vma, vmf->address, vmf->pte);
>>  		goto unlock;
>>  	}
>> -	if (vmf->flags & FAULT_FLAG_WRITE) {
>> +	if (vmf->flags & (FAULT_FLAG_WRITE|FAULT_FLAG_UNSHARE)) {
>>  		if (!pte_write(entry))
>>  			return do_wp_page(vmf);
>> -		entry = pte_mkdirty(entry);
>> +		else if (likely(vmf->flags & FAULT_FLAG_WRITE))
>> +			entry = pte_mkdirty(entry);
>>  	}
>>  	entry = pte_mkyoung(entry);
>>  	if (ptep_set_access_flags(vmf->vma, vmf->address, vmf->pte, entry,
>

So the following on top, right?

diff --git a/mm/memory.c b/mm/memory.c
index 8b3cb73f5e44..4584c7e87a70 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3137,7 +3137,7 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
 			free_swap_cache(old_page);
 		put_page(old_page);
 	}
-	return page_copied && !unshare ? VM_FAULT_WRITE : 0;
+	return (page_copied && !unshare) ? VM_FAULT_WRITE : 0;
 oom_free_new:
 	put_page(new_page);
 oom:
@@ -4604,7 +4604,7 @@ static inline vm_fault_t wp_huge_pmd(struct vm_fault *vmf)
 	const bool unshare = vmf->flags & FAULT_FLAG_UNSHARE;
 
 	if (vma_is_anonymous(vmf->vma)) {
-		if (unlikely(unshare) &&
+		if (likely(!unshare) &&
 		    userfaultfd_huge_pmd_wp(vmf->vma, vmf->orig_pmd))
 			return handle_userfault(vmf, VM_UFFD_WP);
 		return do_huge_pmd_wp_page(vmf);

-- 
Thanks,

David / dhildenb