From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id BBED1C10F16 for ; Mon, 6 May 2024 12:38:38 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 37D536B0088; Mon, 6 May 2024 08:38:38 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 32A2B6B008A; Mon, 6 May 2024 08:38:38 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 1F3036B008C; Mon, 6 May 2024 08:38:38 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id 032AC6B0088 for ; Mon, 6 May 2024 08:38:37 -0400 (EDT) Received: from smtpin27.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay02.hostedemail.com (Postfix) with ESMTP id 73101120A2B for ; Mon, 6 May 2024 12:38:37 +0000 (UTC) X-FDA: 82087924674.27.6CE09EF Received: from mail-vs1-f42.google.com (mail-vs1-f42.google.com [209.85.217.42]) by imf02.hostedemail.com (Postfix) with ESMTP id AAEA680002 for ; Mon, 6 May 2024 12:38:34 +0000 (UTC) Authentication-Results: imf02.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=ZHbuRuGc; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf02.hostedemail.com: domain of 21cnbao@gmail.com designates 209.85.217.42 as permitted sender) smtp.mailfrom=21cnbao@gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1714999114; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=iNddV6/g7b5X9nuwqRkA6IZ/STwot3sVpGbOunU96o4=; b=0gquIwPbJ6UBrGp5uaz/k0nZhXhsvHGHwh4QKZ2OCexS/GEl2P/P2KCRsZNlqiyhCpTKE8 Ill0U2jYB7MkrRvTkOP/bB1eV/A9WR/1XqWqWay8hCHjzFoiAsOU8gg8h4Z1tEd8aJPEuz ionGzk1qVCR03P8ws/gaMEG9IHbBi5k= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1714999114; a=rsa-sha256; cv=none; b=16J8HDrRkkZquFmqXZ3SJpSK3+qk9u1OtO7tXPtv1dc0AcHrZiieUUNS3h+AJLfkUwnmlp DabWoGAkRQs5TqxFpkKGhhITudTruHkMI2hdEOkfm1emzdyTeV5KFLmbYGNeeNdDF2O2Yi 6zzuVjhzW2GAc40RfR+x72DruCDhI4w= ARC-Authentication-Results: i=1; imf02.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=ZHbuRuGc; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf02.hostedemail.com: domain of 21cnbao@gmail.com designates 209.85.217.42 as permitted sender) smtp.mailfrom=21cnbao@gmail.com Received: by mail-vs1-f42.google.com with SMTP id ada2fe7eead31-47ef5a51829so285216137.2 for ; Mon, 06 May 2024 05:38:34 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1714999114; x=1715603914; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=iNddV6/g7b5X9nuwqRkA6IZ/STwot3sVpGbOunU96o4=; b=ZHbuRuGc8Rh5VMKuPJXeyIl318PX/DSNRfkguF0CNa7OTbShY2lW5GvNX8lPbA6ZXt g1zPnn8Jr9HCPjG7MSEiHmR00PuMiFoNgTmIpJpY9nPCshMtM+jsOrGHBo0cNL4hBEGk K7/ZwNkivl288QMTLbdBDjt05wRDm2dJSTFm/ZSUzhootbVfAACLAwPHY2N9O7QBi1qI b/+TrV70TYz9+HTuWkx0hLkamx/H/8sN3EZ5bhN59khNvmNIySN5Ne8N1oKFG2+HuAdN bT0kpo93PTBPOdRHuDzyiWHkeLFdgcFMskTUlwqRASXrpmELsrdvKmQYNxdM7ZKOggti GtLQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1714999114; x=1715603914; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=iNddV6/g7b5X9nuwqRkA6IZ/STwot3sVpGbOunU96o4=; b=AAaOnzjxH70OoQxWrvTfcnF76REVbSvsHbjYtW5KdFB+LLp8quOx8XX56HjS153oav UqKo3ZprDKcW1cVhWHkfQkbF9duSG2URphEkJXwMlDuNGztUKZD15W60ed/pIijMD6h/ qNGktet+stunfBsUVK93jCM4RfpY8IT8Z0HJcR3PRVMFRJpb6AeCEk81NSiI0Kz4Lvc0 YY5XyQ5frhhiYAEQgln+xqpYhmO6KajJrkCV23Uk4HrYwpTOuHaQf5dra3p51w5x1iA3 I9RiZxd8/QNl9dPCpdkOCQqhkXhRAd8YRxNV0DoVTbKSGtEcx6RrvgXB3VAR4Wf6+8G3 Zi3Q== X-Forwarded-Encrypted: i=1; AJvYcCVa3PTt0DrlzfmDfdjWXRw+CE37W+ShUlMV84mvxiJjrlXP6CeTrqhX2MgcWUoDhNnTSdOB2CGVdrMQZfSD20T8n30= X-Gm-Message-State: AOJu0YyacUCr81fBqQh9Pg92t3eCFqwDUq12XphjWLZ4pn+p5XAVSOBa TcorU96OjW4jN/U3CbXaHD7Iil+rg+MHiPrwVl5DFys4A7ycDfHUPeyd99lOncB84BbInFTbh4V f28/53MuuWM+FasTr0lmR3eHd8Tk= X-Google-Smtp-Source: AGHT+IF361PCS/bcPQDXdLLBgiShj8tlb0DlrjHFfS9TEIc70Q8vWeR4C5rynAElu9jpvQokJBteD0vRH1G3zQL8TGI= X-Received: by 2002:a05:6102:5f47:b0:47e:a214:5317 with SMTP id il7-20020a0561025f4700b0047ea2145317mr12206866vsb.19.1714999113599; Mon, 06 May 2024 05:38:33 -0700 (PDT) MIME-Version: 1.0 References: <20240503005023.174597-1-21cnbao@gmail.com> <20240503005023.174597-7-21cnbao@gmail.com> <0b4d4d4b-91d8-4fd5-af4e-aebe9ee08b89@arm.com> In-Reply-To: From: Barry Song <21cnbao@gmail.com> Date: Tue, 7 May 2024 00:38:22 +1200 Message-ID: Subject: Re: [PATCH v3 6/6] mm: swap: entirely map large folios found in swapcache To: David Hildenbrand Cc: Ryan Roberts , akpm@linux-foundation.org, linux-mm@kvack.org, baolin.wang@linux.alibaba.com, chrisl@kernel.org, hanchuanhua@oppo.com, hannes@cmpxchg.org, hughd@google.com, kasong@tencent.com, linux-kernel@vger.kernel.org, surenb@google.com, v-songbaohua@oppo.com, willy@infradead.org, xiang@kernel.org, ying.huang@intel.com, yosryahmed@google.com, yuzhao@google.com, ziy@nvidia.com Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Server: rspam08 X-Rspamd-Queue-Id: AAEA680002 X-Stat-Signature: 97cmwi5grohumn3syukaoqq7jwm5jb93 X-Rspam-User: X-HE-Tag: 1714999114-473134 X-HE-Meta: U2FsdGVkX1/BbkDAff8ldprzjhqkYR0rqf9vkU1a8YbhGvDSy+mBbRxoUhcq4wGA/+aCOccc9M+icepLwO+ONr3E56ZcyFingmj4Dk/55T4bIzMbBcFq4cHZcgEoAbp8D+ux3vv3LEuwFHDNFnOuLe8vMG9Wgg3sb3+i+km5+gmqPig4AuGbkDRfkA1A/H6uJwf9IDT5mAckcLGQfHweta/cSFwDemR1AX4LR88e5zb5fMVI+wLid0bn3fIvnTtXq9WZ0+5oOLMRTnYTd9hCBFCEAi6D+1BnMAiwP4KUpSYTXpefnffF3gxxfZ6VtnfloWgqBo7udrqXMtDIYyPQl+MuzOlg1CiYcP8j67F8CmxD8Ebc04ma8XR+FNY9+ewxS3QEmTRm/TBQ8gRwKTrHdQCji+mdNB+fQ4KLTZQdcAKudE2014GRUy9SOHGgUvzLEbp/KN2QhkXXoNJ30gFewcONHc9+NxtJcw6gQrqV7mh6bfTXa0nFPyDstduROZ/H7FwWM0zvf05et/Eb1Qi0rMmvIU0A9WwaeGvKPlKG+SdqgtfJy3qh3l76JYROW16hH6Fn+S1MGpcNDIvn/cetdkSK2b8hIg/rLyaP19rBu3WzGI+AyAx+aSQZMXqa9L7akXuQaSdqR9kNVc0gIWiDHvhxb3p2Oj1HC9wopFk6PC33fXE4mQwwhTjKfsxWyVRmjjG6oU4DAnTo2SSL74MOHmSgiLJDgO4zQ1vFjg0De828B0iYjJ/NqHSSnwHRlVxRZ8S/EDXLLimZaaqa4RZcibXCQn8Ztmf7lEGSWZ/28vFSOHS06HAsyN0dp6YkvNWXV/rSNUMO+awTYPx33oYKlogpKVg0tX9G53hcqZPUaqmyRVZgqKP9XXtJ+BqO4aWaFEwvFRaJj/0DyPZ4j7zszbDtCGpX2D1+ldkIlsEC/g6cK5DnM5AlobjqBl5mASDYZYi0ZfobcilHDtStFFk Gs5OKsqk an/4/Gdmm9mCNbxy6a/j4fmLmdLUtjYp7FB5F16xtoGT9X38q7U/ALmOge6P41gFzTBtfdprGZC0xymsNbT5q5VorezSyTw/DCuJGd6zUXQJBT9XRb34A/fZJXH5kE0yQhPL7G/iYwF5iSCzUDG6QjhY3nd97tmSSe/piwk1gJtLA4MaI2HfSaGiRJw8Fu3TiSkuqNWJpruBV/2eAqJC0IQxx9JOVhqcUpBN7a3SSiPx+vEW4SZUAhoj3qctV4/HUvuyGBgs7BDSmU8bDvPhWBUhUc6ie91CxvZ99wek6geewNE6ghxhAkBBVYvfJ48BWqrimARFZjdccsgiIG/UH7zNV/5NkqMTBzMwnFBkfWlPOB4m3ZpRNO++7UHWcuEvru7FwaBPC2V7fO//ae0NdSxsUkgc7oKPZ83wU78P7aw0R3JCtw8NkLL3cZ67kkY5TnqRuSJQjJqUNdD2YMaGGXkVLA8DTMzwuO/G98H3liTF6izfVCuNLh0to6g== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Tue, May 7, 2024 at 12:07=E2=80=AFAM David Hildenbrand wrote: > > On 04.05.24 01:23, Barry Song wrote: > > On Fri, May 3, 2024 at 6:50=E2=80=AFPM Ryan Roberts wrote: > >> > >> On 03/05/2024 01:50, Barry Song wrote: > >>> From: Chuanhua Han > >>> > >>> When a large folio is found in the swapcache, the current implementat= ion > >>> requires calling do_swap_page() nr_pages times, resulting in nr_pages > >>> page faults. This patch opts to map the entire large folio at once to > >>> minimize page faults. Additionally, redundant checks and early exits > >>> for ARM64 MTE restoring are removed. > >>> > >>> Signed-off-by: Chuanhua Han > >>> Co-developed-by: Barry Song > >>> Signed-off-by: Barry Song > >> > >> With the suggested changes below: > >> > >> Reviewed-by: Ryan Roberts > >> > >>> --- > >>> mm/memory.c | 60 ++++++++++++++++++++++++++++++++++++++++++--------= --- > >>> 1 file changed, 48 insertions(+), 12 deletions(-) > >>> > >>> diff --git a/mm/memory.c b/mm/memory.c > >>> index 22e7c33cc747..940fdbe69fa1 100644 > >>> --- a/mm/memory.c > >>> +++ b/mm/memory.c > >>> @@ -3968,6 +3968,10 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) > >>> pte_t pte; > >>> vm_fault_t ret =3D 0; > >>> void *shadow =3D NULL; > >>> + int nr_pages =3D 1; > >>> + unsigned long page_idx =3D 0; > >>> + unsigned long address =3D vmf->address; > >>> + pte_t *ptep; > >> > >> nit: Personally I'd prefer all these to get initialised just before th= e "if > >> (folio_test_large()..." block below. That way it is clear they are fre= sh (incase > >> any logic between here and there makes an adjustment) and its clear th= at they > >> are only to be used after that block (the compiler will warn if using = an > >> uninitialized value). > > > > right. I agree this will make the code more readable. > > > >> > >>> > >>> if (!pte_unmap_same(vmf)) > >>> goto out; > >>> @@ -4166,6 +4170,36 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) > >>> goto out_nomap; > >>> } > >>> > >>> + ptep =3D vmf->pte; > >>> + if (folio_test_large(folio) && folio_test_swapcache(folio)) { > >>> + int nr =3D folio_nr_pages(folio); > >>> + unsigned long idx =3D folio_page_idx(folio, page); > >>> + unsigned long folio_start =3D vmf->address - idx * PAGE= _SIZE; > >>> + unsigned long folio_end =3D folio_start + nr * PAGE_SIZ= E; > >>> + pte_t *folio_ptep; > >>> + pte_t folio_pte; > >>> + > >>> + if (unlikely(folio_start < max(vmf->address & PMD_MASK,= vma->vm_start))) > >>> + goto check_folio; > >>> + if (unlikely(folio_end > pmd_addr_end(vmf->address, vma= ->vm_end))) > >>> + goto check_folio; > >>> + > >>> + folio_ptep =3D vmf->pte - idx; > >>> + folio_pte =3D ptep_get(folio_ptep); > >>> + if (!pte_same(folio_pte, pte_move_swp_offset(vmf->orig_= pte, -idx)) || > >>> + swap_pte_batch(folio_ptep, nr, folio_pte) !=3D nr) > >>> + goto check_folio; > >>> + > >>> + page_idx =3D idx; > >>> + address =3D folio_start; > >>> + ptep =3D folio_ptep; > >>> + nr_pages =3D nr; > >>> + entry =3D folio->swap; > >>> + page =3D &folio->page; > >>> + } > >>> + > >>> +check_folio: > >> > >> Is this still the correct label name, given the checks are now above t= he new > >> block? Perhaps "one_page" or something like that? > > > > not quite sure about this, as the code after one_page can be multiple_p= ages. > > On the other hand, it seems we are really checking folio after "check_f= olio" > > :-) > > > > > > BUG_ON(!folio_test_anon(folio) && folio_test_mappedtodisk(folio)); > > BUG_ON(folio_test_anon(folio) && PageAnonExclusive(page)); > > > > /* > > * Check under PT lock (to protect against concurrent fork() sharing > > * the swap entry concurrently) for certainly exclusive pages. > > */ > > if (!folio_test_ksm(folio)) { > > > > > >> > >>> + > >>> /* > >>> * PG_anon_exclusive reuses PG_mappedtodisk for anon pages. A = swap pte > >>> * must never point at an anonymous page in the swapcache that= is > >>> @@ -4225,12 +4259,13 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) > >>> * We're already holding a reference on the page but haven't m= apped it > >>> * yet. > >>> */ > >>> - swap_free_nr(entry, 1); > >>> + swap_free_nr(entry, nr_pages); > >>> if (should_try_to_free_swap(folio, vma, vmf->flags)) > >>> folio_free_swap(folio); > >>> > >>> - inc_mm_counter(vma->vm_mm, MM_ANONPAGES); > >>> - dec_mm_counter(vma->vm_mm, MM_SWAPENTS); > >>> + folio_ref_add(folio, nr_pages - 1); > >>> + add_mm_counter(vma->vm_mm, MM_ANONPAGES, nr_pages); > >>> + add_mm_counter(vma->vm_mm, MM_SWAPENTS, -nr_pages); > >>> pte =3D mk_pte(page, vma->vm_page_prot); > >>> > >>> /* > >>> @@ -4240,34 +4275,35 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) > >>> * exclusivity. > >>> */ > >>> if (!folio_test_ksm(folio) && > >>> - (exclusive || folio_ref_count(folio) =3D=3D 1)) { > >>> + (exclusive || (folio_ref_count(folio) =3D=3D nr_pages && > >>> + folio_nr_pages(folio) =3D=3D nr_pages))) { > >> > >> I think in practice there is no change here? If nr_pages > 1 then the = folio is > >> in the swapcache, so there is an extra ref on it? I agree with the cha= nge for > >> robustness sake. Just checking my understanding. > > > > This is the code showing we are reusing/(mkwrite) a folio either > > 1. we meet a small folio and we are the only one hitting the small foli= o > > 2. we meet a large folio and we are the only one hitting the large foli= o > > > > any corner cases besides the above two seems difficult. for example, > > > > while we hit a large folio in swapcache but if we can't entirely map it > > (nr_pages=3D=3D1) due to partial unmap, we will have folio_ref_count(fo= lio) > > =3D=3D nr_pages =3D=3D 1 > > No, there would be other references from the swapcache and > folio_ref_count(folio) > 1. See my other reply. right. can be clearer by: @@ -4263,7 +4264,6 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) if (should_try_to_free_swap(folio, vma, vmf->flags)) folio_free_swap(folio); - folio_ref_add(folio, nr_pages - 1); add_mm_counter(vma->vm_mm, MM_ANONPAGES, nr_pages); add_mm_counter(vma->vm_mm, MM_SWAPENTS, -nr_pages); pte =3D mk_pte(page, vma->vm_page_prot); @@ -4275,14 +4275,14 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) * exclusivity. */ if (!folio_test_ksm(folio) && - (exclusive || (folio_ref_count(folio) =3D=3D nr_pages && - folio_nr_pages(folio) =3D=3D nr_pages))) { + (exclusive || folio_ref_count(folio) =3D=3D 1)) { if (vmf->flags & FAULT_FLAG_WRITE) { pte =3D maybe_mkwrite(pte_mkdirty(pte), vma); vmf->flags &=3D ~FAULT_FLAG_WRITE; } rmap_flags |=3D RMAP_EXCLUSIVE; } + folio_ref_add(folio, nr_pages - 1); flush_icache_pages(vma, page, nr_pages); if (pte_swp_soft_dirty(vmf->orig_pte)) pte =3D pte_mksoft_dirty(pte); > > -- > Cheers, > > David / dhildenb >