From: Peter Xu
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: peterx@redhat.com, Nadav Amit, Hugh Dickins, David Hildenbrand, Axel Rasmussen, Matthew Wilcox, Alistair Popple, Mike Rapoport, Andrew Morton, Jerome Glisse, Mike Kravetz, "Kirill A. Shutemov", Andrea Arcangeli
Subject: [PATCH v7 19/23] mm/khugepaged: Don't recycle vma pgtable if uffd-wp registered
Date: Fri, 4 Mar 2022 13:17:04 +0800
Message-Id: <20220304051708.86193-20-peterx@redhat.com>
In-Reply-To: <20220304051708.86193-1-peterx@redhat.com>
References: <20220304051708.86193-1-peterx@redhat.com>

When we're trying to collapse a 2M huge shmem page, don't retract the pgtable
pmd page if the vma is registered with uffd-wp, because that pgtable could
have pte markers installed. Recycling that pgtable means we'll lose the pte
markers, which could cause data loss for a uffd-wp enabled application on
shmem.

Instead of disabling khugepaged on these files, simply skip retracting these
special VMAs. The page cache can then still be merged into a huge thp, and
other mm/vma can still map the range of the file with a huge thp when proper.

Note that checking VM_UFFD_WP needs to be done with mmap_sem held for write,
which avoids a race like:

         khugepaged                         user thread
         ==========                         ===========
     check VM_UFFD_WP, not set
                                       UFFDIO_REGISTER with uffd-wp on shmem
                                       wr-protect some pages (install markers)
     take mmap_sem write lock
     erase pmd and free pmd page
      --> pte markers are dropped unnoticed!

Signed-off-by: Peter Xu
---
 mm/khugepaged.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index a4e5eaf3eb01..87d88d6725af 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1456,6 +1456,10 @@ void collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr)
 	if (!hugepage_vma_check(vma, vma->vm_flags | VM_HUGEPAGE))
 		return;
 
+	/* Keep pmd pgtable for uffd-wp; see comment in retract_page_tables() */
+	if (userfaultfd_wp(vma))
+		return;
+
 	hpage = find_lock_page(vma->vm_file->f_mapping,
 			       linear_page_index(vma, haddr));
 	if (!hpage)
@@ -1591,7 +1595,15 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
 		 * reverse order. Trylock is a way to avoid deadlock.
 		 */
 		if (mmap_write_trylock(mm)) {
-			if (!khugepaged_test_exit(mm))
+			/*
+			 * When a vma is registered with uffd-wp, we can't
+			 * recycle the pmd pgtable because there can be pte
+			 * markers installed. Skip only this vma, so other mm/vma
+			 * can still have the same file mapped hugely; however,
+			 * it'll always be mapped in small page size for uffd-wp
+			 * registered ranges.
+			 */
+			if (!khugepaged_test_exit(mm) && !userfaultfd_wp(vma))
 				collapse_and_free_pmd(mm, vma, addr, pmd);
 			mmap_write_unlock(mm);
 		} else {
-- 
2.32.0
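
For readers who want to step through the race from the commit message, here
is a minimal user-space sketch of the two orderings. It is a toy model only,
not kernel code: struct toy_vma and every toy_* function are hypothetical
names invented for illustration, the "lock" is represented purely by the
order of the calls, and a bool array stands in for the pte markers in the
pmd's page table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_PTRS_PER_PTE 8

/* Hypothetical stand-in for a vma plus its pte page table (an assumption). */
struct toy_vma {
	bool uffd_wp;                  /* models VM_UFFD_WP being set */
	bool pte_table_present;        /* false once the pgtable is retracted */
	bool marker[TOY_PTRS_PER_PTE]; /* models pte markers in that pgtable */
};

/* Models UFFDIO_REGISTER with uffd-wp, then wr-protecting one page. */
void toy_register_and_wrprotect(struct toy_vma *vma, size_t idx)
{
	assert(idx < TOY_PTRS_PER_PTE);
	vma->uffd_wp = true;
	if (vma->pte_table_present)
		vma->marker[idx] = true;
}

/* Models erasing the pmd and freeing the pte page: markers go with it. */
void toy_retract(struct toy_vma *vma)
{
	for (size_t i = 0; i < TOY_PTRS_PER_PTE; i++)
		vma->marker[i] = false;
	vma->pte_table_present = false;
}

size_t toy_count_markers(const struct toy_vma *vma)
{
	size_t n = 0;

	for (size_t i = 0; i < TOY_PTRS_PER_PTE; i++)
		n += vma->marker[i] ? 1 : 0;
	return n;
}

/* Racy ordering: the flag is sampled before the write lock is taken. */
size_t toy_racy_collapse(void)
{
	struct toy_vma vma = { .pte_table_present = true };
	bool skip = vma.uffd_wp;             /* check VM_UFFD_WP: not set */

	toy_register_and_wrprotect(&vma, 3); /* user thread slips in here */
	/* the write lock would be taken here, too late for the check */
	if (!skip)
		toy_retract(&vma);
	return toy_count_markers(&vma);      /* marker was dropped */
}

/* Patched ordering: the flag is checked once the write lock is held. */
size_t toy_locked_collapse(void)
{
	struct toy_vma vma = { .pte_table_present = true };

	toy_register_and_wrprotect(&vma, 3); /* user thread ran first */
	/* write lock held from here on; registration cannot interleave */
	if (!vma.uffd_wp)
		toy_retract(&vma);
	return toy_count_markers(&vma);      /* marker survives */
}
```

Calling toy_racy_collapse() leaves no markers, while toy_locked_collapse()
keeps the one installed by the simulated user thread. In the model, as in the
patch, the only difference is whether the uffd-wp check happens before or
after the point where the write lock is held.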