From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 3 Aug 2023 15:50:25 +0200
From: David Hildenbrand <david@redhat.com>
To: Ryan Roberts, Andrew Morton, Matthew Wilcox, Yin Fengwei, Yu Zhao,
 Yang Shi, "Huang, Ying", Zi Yan, Nathan Chancellor, Alexander Gordeev,
 Gerald Schaefer
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org
References: <20230727141837.3386072-1-ryan.roberts@arm.com>
 <20230727141837.3386072-4-ryan.roberts@arm.com>
 <6cda91b3-bb7a-4c4c-a618-2572b9c8bbf9@redhat.com>
In-Reply-To: <6cda91b3-bb7a-4c4c-a618-2572b9c8bbf9@redhat.com>
Organization: Red Hat
Subject: Re: [PATCH v4 3/3] mm: Batch-zap large anonymous folio PTE mappings

On 03.08.23 15:38, David Hildenbrand wrote:
> On 27.07.23 16:18, Ryan Roberts wrote:
>> This allows batching the rmap removal with folio_remove_rmap_range(),
>> which means we avoid spuriously adding a partially unmapped folio to the
>> deferred split queue in the common case, which reduces split queue lock
>> contention.
>>
>> Previously each page was removed from the rmap individually with
>> page_remove_rmap(). If the first page belonged to a large folio, this
>> would cause page_remove_rmap() to conclude that the folio was now
>> partially mapped and add the folio to the deferred split queue. But
>> subsequent calls would cause the folio to become fully unmapped, meaning
>> there is no value in adding it to the split queue.
>>
>> A complicating factor is that for platforms where MMU_GATHER_NO_GATHER
>> is enabled (e.g. s390), __tlb_remove_page() drops a reference to the
>> page. This means that the folio reference count could drop to zero while
>> still in use (i.e. before folio_remove_rmap_range() is called). This
>> does not happen on other platforms because the actual page freeing is
>> deferred.
>>
>> Solve this by appropriately getting/putting the folio to guarantee it
>> does not get freed early. Given the need to get/put the folio in the
>> batch path, we stick to the non-batched path if the folio is not large.
>> While the batched path is functionally correct for a folio with 1 page,
>> it is unlikely to be as efficient as the existing non-batched path in
>> this case.
>>
>> Signed-off-by: Ryan Roberts
>> ---
>>   mm/memory.c | 132 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 132 insertions(+)
>>
>> diff --git a/mm/memory.c b/mm/memory.c
>> index 01f39e8144ef..d35bd8d2b855 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -1391,6 +1391,99 @@ zap_install_uffd_wp_if_needed(struct vm_area_struct *vma,
>>   		pte_install_uffd_wp_if_needed(vma, addr, pte, pteval);
>>   }
>>
>> +static inline unsigned long page_cont_mapped_vaddr(struct page *page,
>> +				struct page *anchor, unsigned long anchor_vaddr)
>> +{
>> +	unsigned long offset;
>> +	unsigned long vaddr;
>> +
>> +	offset = (page_to_pfn(page) - page_to_pfn(anchor)) << PAGE_SHIFT;
>> +	vaddr = anchor_vaddr + offset;
>> +
>> +	if (anchor > page) {
>> +		if (vaddr > anchor_vaddr)
>> +			return 0;
>> +	} else {
>> +		if (vaddr < anchor_vaddr)
>> +			return ULONG_MAX;
>> +	}
>> +
>> +	return vaddr;
>> +}
>> +
>> +static int folio_nr_pages_cont_mapped(struct folio *folio,
>> +				      struct page *page, pte_t *pte,
>> +				      unsigned long addr, unsigned long end)
>> +{
>> +	pte_t ptent;
>> +	int floops;
>> +	int i;
>> +	unsigned long pfn;
>> +	struct page *folio_end;
>> +
>> +	if (!folio_test_large(folio))
>> +		return 1;
>> +
>> +	folio_end = &folio->page + folio_nr_pages(folio);
>> +	end = min(page_cont_mapped_vaddr(folio_end, page, addr), end);
>> +	floops = (end - addr) >> PAGE_SHIFT;
>> +	pfn = page_to_pfn(page);
>> +	pfn++;
>> +	pte++;
>> +
>> +	for (i = 1; i < floops; i++) {
>> +		ptent = ptep_get(pte);
>> +
>> +		if (!pte_present(ptent) || pte_pfn(ptent) != pfn)
>> +			break;
>> +
>> +		pfn++;
>> +		pte++;
>> +	}
>> +
>> +	return i;
>> +}
>> +
>> +static unsigned long try_zap_anon_pte_range(struct mmu_gather *tlb,
>> +					    struct vm_area_struct *vma,
>> +					    struct folio *folio,
>> +					    struct page *page, pte_t *pte,
>> +					    unsigned long addr, int nr_pages,
>> +					    struct zap_details *details)
>> +{
>> +	struct mm_struct *mm = tlb->mm;
>> +	pte_t ptent;
>> +	bool full;
>> +	int i;
>> +
>> +	/* __tlb_remove_page may drop a ref; prevent going to 0 while in use. */
>> +	folio_get(folio);
>
> Is there no way around that? It feels wrong and IMHO a bit ugly.
>
> With this patch, you might suddenly have mapcount > refcount for a
> folio, or am I wrong?

Thinking about it, maybe we should really find a way to keep the current
logic flow unmodified:

1) ptep_get_and_clear_full()
2) tlb_remove_tlb_entry()
3) page_remove_rmap()
4) __tlb_remove_page()

For example, one loop to handle 1) and 2); and another one to handle 4).

This will need a way to query for the first loop how often we can call
__tlb_remove_page() before we need a flush. The simple answer would be
"batch->max - batch->nr". tlb_next_batch() makes exceeding that a bit
harder, but maybe it's not really required.

-- 
Cheers,

David / dhildenb