From: Peter Xu <peterx@redhat.com>
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: peterx@redhat.com, Jason Gunthorpe, John Hubbard, Andrew Morton, Christoph Hellwig, Yang Shi, Oleg Nesterov, Kirill Tkhai, Kirill Shutemov, Hugh Dickins, Jann Horn, Linus Torvalds, Michal Hocko, Jan Kara, Andrea Arcangeli, Leon Romanovsky
Subject: [PATCH v2 4/4] mm/thp: Split huge pmds/puds if they're pinned when fork()
Date: Fri, 25 Sep 2020 18:26:00 -0400
Message-Id: <20200925222600.6832-5-peterx@redhat.com>
X-Mailer: git-send-email 2.26.2
In-Reply-To: <20200925222600.6832-1-peterx@redhat.com>
References: <20200925222600.6832-1-peterx@redhat.com>

Pinned pages shouldn't be write-protected when fork() happens, because follow-up copy-on-write on these pages could cause the pinned pages to be
replaced by random newly allocated pages.

For huge PMDs, we split the huge pmd if pinning is detected, so that future handling will be done at the PTE level (with our latest changes, each of the small pages will be copied).  We achieve this by letting copy_huge_pmd() return -EAGAIN for pinned pages, so that we fall through in copy_pmd_range() and finally land in the next copy_pte_range() call.

Huge PUDs are even more special: so far they do not support anonymous pages.  But they can be handled the same way as huge PMDs, even though splitting a huge PUD means erasing the PUD entries.  This guarantees that follow-up fault-ins will remap the same pages in either parent or child later.

This might not be the most efficient way, but it should be easy and clean enough.  It should also be fine, since we're tackling a very rare case, just to make sure userspaces that pinned some THPs will still work without MADV_DONTFORK even after they fork()ed.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 mm/huge_memory.c | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index faadc449cca5..da397779a6d4 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1074,6 +1074,24 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 
 	src_page = pmd_page(pmd);
 	VM_BUG_ON_PAGE(!PageHead(src_page), src_page);
+
+	/*
+	 * If this page is a potentially pinned page, split and retry the fault
+	 * with smaller page size.  Normally this should not happen because the
+	 * userspace should use MADV_DONTFORK upon pinned regions.  This is a
+	 * best effort that the pinned pages won't be replaced by another
+	 * random page during the coming copy-on-write.
+	 */
+	if (unlikely(is_cow_mapping(vma->vm_flags) &&
+		     atomic_read(&src_mm->has_pinned) &&
+		     page_maybe_dma_pinned(src_page))) {
+		pte_free(dst_mm, pgtable);
+		spin_unlock(src_ptl);
+		spin_unlock(dst_ptl);
+		__split_huge_pmd(vma, src_pmd, addr, false, NULL);
+		return -EAGAIN;
+	}
+
 	get_page(src_page);
 	page_dup_rmap(src_page, true);
 	add_mm_counter(dst_mm, MM_ANONPAGES, HPAGE_PMD_NR);
@@ -1177,6 +1195,16 @@ int copy_huge_pud(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 		/* No huge zero pud yet */
 	}
 
+	/* Please refer to comments in copy_huge_pmd() */
+	if (unlikely(is_cow_mapping(vma->vm_flags) &&
+		     atomic_read(&src_mm->has_pinned) &&
+		     page_maybe_dma_pinned(pud_page(pud)))) {
+		spin_unlock(src_ptl);
+		spin_unlock(dst_ptl);
+		__split_huge_pud(vma, src_pud, addr);
+		return -EAGAIN;
+	}
+
 	pudp_set_wrprotect(src_mm, addr, src_pud);
 	pud = pud_mkold(pud_wrprotect(pud));
 	set_pud_at(dst_mm, addr, dst_pud, pud);
-- 
2.26.2