Message-ID: <6cda91b3-bb7a-4c4c-a618-2572b9c8bbf9@redhat.com>
Date: Thu, 3 Aug 2023 15:38:09 +0200
Subject: Re: [PATCH v4 3/3] mm: Batch-zap large anonymous folio PTE mappings
From: David Hildenbrand
Organization: Red Hat
To: Ryan Roberts, Andrew Morton, Matthew Wilcox, Yin Fengwei, Yu Zhao,
 Yang Shi, "Huang, Ying", Zi Yan, Nathan Chancellor, Alexander Gordeev,
 Gerald Schaefer
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org
References: <20230727141837.3386072-1-ryan.roberts@arm.com>
 <20230727141837.3386072-4-ryan.roberts@arm.com>
In-Reply-To: <20230727141837.3386072-4-ryan.roberts@arm.com>

On 27.07.23 16:18, Ryan Roberts wrote:
> This allows batching the rmap removal with folio_remove_rmap_range(),
> which means we avoid spuriously adding a partially unmapped folio to the
> deferred split queue in the common case, which reduces split queue lock
> contention.
>
> Previously each page was removed from the rmap individually with
> page_remove_rmap().
> If the first page belonged to a large folio, this
> would cause page_remove_rmap() to conclude that the folio was now
> partially mapped and add the folio to the deferred split queue. But
> subsequent calls would cause the folio to become fully unmapped, meaning
> there is no value to adding it to the split queue.
>
> A complicating factor is that for platforms where MMU_GATHER_NO_GATHER
> is enabled (e.g. s390), __tlb_remove_page() drops a reference to the
> page. This means that the folio reference count could drop to zero while
> still in use (i.e. before folio_remove_rmap_range() is called). This
> does not happen on other platforms because the actual page freeing is
> deferred.
>
> Solve this by appropriately getting/putting the folio to guarantee it
> does not get freed early. Given the need to get/put the folio in the
> batch path, we stick to the non-batched path if the folio is not large.
> While the batched path is functionally correct for a folio with 1 page,
> it is unlikely to be as efficient as the existing non-batched path in
> this case.
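
To make the deferred-split problem described above concrete, here is a
minimal user-space sketch. It is not kernel code: struct toy_folio and
the toy_* functions are hypothetical names invented for illustration,
and the rule "queue the folio whenever a removal leaves it partially
mapped" is a simplification of what the commit message says
page_remove_rmap() does.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a large folio; NOT the kernel's struct folio. */
struct toy_folio {
	int nr_pages;	/* pages in the folio */
	int nr_mapped;	/* pages still mapped by a PTE */
	bool queued;	/* on the (toy) deferred split queue? */
};

/*
 * Drop nr page mappings in one step. The folio is queued only if this
 * removal leaves it partially mapped (assumed simplification).
 */
static void toy_remove_rmap_range(struct toy_folio *f, int nr)
{
	assert(nr > 0 && nr <= f->nr_mapped);
	f->nr_mapped -= nr;
	if (f->nr_mapped > 0 && f->nr_mapped < f->nr_pages)
		f->queued = true;
}

/* Old path: one call per page, so the first call sees a partial mapping. */
static bool toy_zap_per_page(int nr_pages)
{
	struct toy_folio f = { nr_pages, nr_pages, false };

	for (int i = 0; i < nr_pages; i++)
		toy_remove_rmap_range(&f, 1);
	return f.queued;
}

/* Batched path: all mappings go away in a single call. */
static bool toy_zap_batched(int nr_pages)
{
	struct toy_folio f = { nr_pages, nr_pages, false };

	toy_remove_rmap_range(&f, nr_pages);
	return f.queued;
}
```

In this model, zapping a fully mapped 4-page folio page by page queues
it after the first removal (3 of 4 pages remain mapped), while the
batched call never observes a partial mapping and leaves it unqueued.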
>
> Signed-off-by: Ryan Roberts
> ---
>  mm/memory.c | 132 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 132 insertions(+)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 01f39e8144ef..d35bd8d2b855 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1391,6 +1391,99 @@ zap_install_uffd_wp_if_needed(struct vm_area_struct *vma,
>  	pte_install_uffd_wp_if_needed(vma, addr, pte, pteval);
>  }
>
> +static inline unsigned long page_cont_mapped_vaddr(struct page *page,
> +				struct page *anchor, unsigned long anchor_vaddr)
> +{
> +	unsigned long offset;
> +	unsigned long vaddr;
> +
> +	offset = (page_to_pfn(page) - page_to_pfn(anchor)) << PAGE_SHIFT;
> +	vaddr = anchor_vaddr + offset;
> +
> +	if (anchor > page) {
> +		if (vaddr > anchor_vaddr)
> +			return 0;
> +	} else {
> +		if (vaddr < anchor_vaddr)
> +			return ULONG_MAX;
> +	}
> +
> +	return vaddr;
> +}
> +
> +static int folio_nr_pages_cont_mapped(struct folio *folio,
> +				      struct page *page, pte_t *pte,
> +				      unsigned long addr, unsigned long end)
> +{
> +	pte_t ptent;
> +	int floops;
> +	int i;
> +	unsigned long pfn;
> +	struct page *folio_end;
> +
> +	if (!folio_test_large(folio))
> +		return 1;
> +
> +	folio_end = &folio->page + folio_nr_pages(folio);
> +	end = min(page_cont_mapped_vaddr(folio_end, page, addr), end);
> +	floops = (end - addr) >> PAGE_SHIFT;
> +	pfn = page_to_pfn(page);
> +	pfn++;
> +	pte++;
> +
> +	for (i = 1; i < floops; i++) {
> +		ptent = ptep_get(pte);
> +
> +		if (!pte_present(ptent) || pte_pfn(ptent) != pfn)
> +			break;
> +
> +		pfn++;
> +		pte++;
> +	}
> +
> +	return i;
> +}
> +
> +static unsigned long try_zap_anon_pte_range(struct mmu_gather *tlb,
> +					    struct vm_area_struct *vma,
> +					    struct folio *folio,
> +					    struct page *page, pte_t *pte,
> +					    unsigned long addr, int nr_pages,
> +					    struct zap_details *details)
> +{
> +	struct mm_struct *mm = tlb->mm;
> +	pte_t ptent;
> +	bool full;
> +	int i;
> +
> +	/* __tlb_remove_page may drop a ref; prevent going to 0 while in use. */
> +	folio_get(folio);

Is there no way around that? It feels wrong and IMHO a bit ugly.

With this patch, you might suddenly have mapcount > refcount for a
folio, or am I wrong?

> +
> +	for (i = 0; i < nr_pages;) {
> +		ptent = ptep_get_and_clear_full(mm, addr, pte, tlb->fullmm);
> +		tlb_remove_tlb_entry(tlb, pte, addr);
> +		zap_install_uffd_wp_if_needed(vma, addr, pte, details, ptent);
> +		full = __tlb_remove_page(tlb, page, 0);
> +
> +		if (unlikely(page_mapcount(page) < 1))
> +			print_bad_pte(vma, addr, ptent, page);

Can we avoid new users of page_mapcount() outside rmap code, please? :)

-- 
Cheers,

David / dhildenb