From: David Hildenbrand
Organization: Red Hat
To: Peter Xu, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: Andrew Morton, Andrea Arcangeli, Axel Rasmussen, Hugh Dickins, Nadav Amit
Subject: Re: [PATCH] mm/khugepaged: Detecting uffd-wp vma more efficiently
Date: Wed, 22 Sep 2021 20:21:40 +0200
Message-ID: <6bbb8e29-9e21-dfbe-d23d-61de7e3cc6db@redhat.com>
In-Reply-To: <20210922175156.130228-1-peterx@redhat.com>

On 22.09.21 19:51, Peter Xu wrote:
> We forbid merging thps for uffd-wp enabled regions, by breaking the khugepaged
> scanning right after we detected a uffd-wp armed pte (either present, or swap).
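
To make the difference between the two strategies concrete, here is a minimal C sketch. It is a toy model, not kernel code: struct toy_vma, struct toy_pmd_range, TOY_VM_UFFD_WP and both helpers are hypothetical stand-ins, and the real khugepaged_scan_pmd() applies many more checks than the single uffd-wp one modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy model (hypothetical, not kernel code): a VMA carries flags, and each
 * of the PTES_PER_PMD entries covering one PMD-sized range may or may not
 * be uffd-wp armed. One bool per entry stands in for both the present
 * (pte_uffd_wp) and swap (pte_swp_uffd_wp) markers.
 */
#define PTES_PER_PMD 512          /* 2 MiB / 4 KiB, as on x86-64 */
#define TOY_VM_UFFD_WP (1u << 0)  /* stand-in for VM_UFFD_WP */

struct toy_vma {
	unsigned int flags;
};

struct toy_pmd_range {
	bool pte_uffd_wp[PTES_PER_PMD];
};

/* Old strategy: scan every entry and give up at the first armed one. */
static bool can_collapse_per_pte(const struct toy_pmd_range *r)
{
	assert(r != NULL);
	for (size_t i = 0; i < PTES_PER_PMD; i++)
		if (r->pte_uffd_wp[i])
			return false;  /* analogous to SCAN_PTE_UFFD_WP */
	return true;
}

/* New strategy: reject the whole VMA up front based on its flag. */
static bool can_collapse_per_vma(const struct toy_vma *vma)
{
	assert(vma != NULL);
	return !(vma->flags & TOY_VM_UFFD_WP);  /* analogous to SCAN_VMA_CHECK */
}
```

For a VMA with the uffd-wp flag set but no armed entries in a given PMD-sized range, the per-PTE check still allows the collapse while the per-VMA check refuses it; that partially write-protected case is the one that matters for users who only wp a small fraction of a large VMA.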
>
> It works, but it's less efficient, because those ptes only exist for VM_UFFD_WP
> enabled VMAs. Checking against the vma flag would be more efficient, and good
> enough. To be explicit, we could still be able to merge some thps for
> VM_UFFD_WP regions before this patch as long as they have zero uffd-wp armed
> ptes, however that's not a major target for thp collapse anyways.
>

Hm, are we sure there are no users that could benefit from the current
handling?

I'm thinking about long-term uffd-wp users that effectively end up
wp-ing only a small fraction of a gigantic vma, or always wp complete
blocks in a certain granularity in the range of THP.

Databases come to mind ...

In the past, I played with the idea of using uffd-wp to protect access
to logically unplugged memory regions that are part of virtio-mem devices
in QEMU -- which would do exactly what is described above. But I'll most
probably be using ordinary uffd once any users that might read such
logically unplugged memory have been "fixed".

The change itself looks sane to me AFAICT.

> This mostly reverts commit e1e267c7928fe387e5e1cffeafb0de2d0473663a, but
> instead we do the same check at vma level, so it's not a bugfix.
>
> This also paves the way for file-backed uffd-wp support, as the VM_UFFD_WP flag
> will work for file-backed too.
>
> After this patch, the error for khugepaged for these regions will switch from
> SCAN_PTE_UFFD_WP to SCAN_VMA_CHECK.
>
> Since uffd minor mode should not allow thp as well, do the same thing for minor
> mode to stop early on trying to collapse pages in khugepaged.
>
> Cc: Andrea Arcangeli
> Cc: Axel Rasmussen
> Cc: Hugh Dickins
> Cc: Nadav Amit
> Signed-off-by: Peter Xu
> ---
>
> Axel: as I asked in the other thread, please help check whether minor mode will
> work properly with shmem thp enabled.
> If not, I feel like this patch could be
> part of that effort at last, but it's also possible that I missed something.
>
> Signed-off-by: Peter Xu
> ---
>  include/trace/events/huge_memory.h |  1 -
>  mm/khugepaged.c                    | 26 +++-----------------------
>  2 files changed, 3 insertions(+), 24 deletions(-)
>
> diff --git a/include/trace/events/huge_memory.h b/include/trace/events/huge_memory.h
> index 4fdb14a81108..53532f5925c3 100644
> --- a/include/trace/events/huge_memory.h
> +++ b/include/trace/events/huge_memory.h
> @@ -15,7 +15,6 @@
>  	EM( SCAN_EXCEED_SWAP_PTE, "exceed_swap_pte") \
>  	EM( SCAN_EXCEED_SHARED_PTE, "exceed_shared_pte") \
>  	EM( SCAN_PTE_NON_PRESENT, "pte_non_present") \
> -	EM( SCAN_PTE_UFFD_WP, "pte_uffd_wp") \
>  	EM( SCAN_PAGE_RO, "no_writable_page") \
>  	EM( SCAN_LACK_REFERENCED_PAGE, "lack_referenced_page") \
>  	EM( SCAN_PAGE_NULL, "page_null") \
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 045cc579f724..3afe66d48db0 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -31,7 +31,6 @@ enum scan_result {
>  	SCAN_EXCEED_SWAP_PTE,
>  	SCAN_EXCEED_SHARED_PTE,
>  	SCAN_PTE_NON_PRESENT,
> -	SCAN_PTE_UFFD_WP,
>  	SCAN_PAGE_RO,
>  	SCAN_LACK_REFERENCED_PAGE,
>  	SCAN_PAGE_NULL,
> @@ -467,6 +466,9 @@ static bool hugepage_vma_check(struct vm_area_struct *vma,
>  		return false;
>  	if (vma_is_temporary_stack(vma))
>  		return false;
> +	/* Don't allow thp merging for wp/minor enabled uffd regions */
> +	if (userfaultfd_wp(vma) || userfaultfd_minor(vma))
> +		return false;
>  	return !(vm_flags & VM_NO_KHUGEPAGED);
>  }
>  
> @@ -1246,15 +1248,6 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
>  		pte_t pteval = *_pte;
>  		if (is_swap_pte(pteval)) {
>  			if (++unmapped <= khugepaged_max_ptes_swap) {
> -				/*
> -				 * Always be strict with uffd-wp
> -				 * enabled swap entries.  Please see
> -				 * comment below for pte_uffd_wp().
> -				 */
> -				if (pte_swp_uffd_wp(pteval)) {
> -					result = SCAN_PTE_UFFD_WP;
> -					goto out_unmap;
> -				}
>  				continue;
>  			} else {
>  				result = SCAN_EXCEED_SWAP_PTE;
> @@ -1270,19 +1263,6 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
>  				goto out_unmap;
>  			}
>  		}
> -		if (pte_uffd_wp(pteval)) {
> -			/*
> -			 * Don't collapse the page if any of the small
> -			 * PTEs are armed with uffd write protection.
> -			 * Here we can also mark the new huge pmd as
> -			 * write protected if any of the small ones is
> -			 * marked but that could bring unknown
> -			 * userfault messages that falls outside of
> -			 * the registered range.  So, just be simple.
> -			 */
> -			result = SCAN_PTE_UFFD_WP;
> -			goto out_unmap;
> -		}
>  		if (pte_write(pteval))
>  			writable = true;
>  
>

-- 
Thanks,

David / dhildenb