From: Barry Song <21cnbao@gmail.com>
Date: Fri, 19 Jan 2024 07:54:10 +0800
Subject: Re: [PATCH RFC 0/6] mm: support large folios swap-in
To: Ryan Roberts
Cc: akpm@linux-foundation.org, david@redhat.com, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, mhocko@suse.com, shy828301@gmail.com,
 wangkefeng.wang@huawei.com, willy@infradead.org, xiang@kernel.org,
 ying.huang@intel.com, yuzhao@google.com, surenb@google.com,
 steven.price@arm.com
In-Reply-To: <0450f151-143a-4ce8-8131-31180bbc13b8@arm.com>
References: <20231025144546.577640-1-ryan.roberts@arm.com>
 <20240118111036.72641-1-21cnbao@gmail.com>
 <0450f151-143a-4ce8-8131-31180bbc13b8@arm.com>

On Thu, Jan 18, 2024 at 11:25 PM Ryan Roberts wrote:
>
> On 18/01/2024 11:10, Barry Song wrote:
> > On an embedded system like Android, more than half of anon memory is actually
> > in swap devices such as zRAM.
> > For example, while an app is switched to background, its most memory
> > might be swapped-out.
> >
> > Now we have mTHP features, unfortunately, if we don't support large folios
> > swap-in, once those large folios are swapped-out, we immediately lose the
> > performance gain we can get through large folios and hardware optimization
> > such as CONT-PTE.
> >
> > In theory, we don't need to rely on Ryan's swap out patchset[1]. That is to say,
> > before swap-out, if some memory were normal pages, but when swapping in, we
> > can also swap-in them as large folios.
>
> I think this could also violate MADV_NOHUGEPAGE; if the application has
> requested that we do not create a THP, then we had better not; it could cause a
> correctness issue in some circumstances. You would need to pay attention to this
> vma flag if taking this approach.
>
> > But this might require I/O happen at
> > some random places in swap devices. So we limit the large folios swap-in to
> > those areas which were large folios before swapping-out, aka, swaps are also
> > contiguous in hardware.
>
> In fact, even this may not be sufficient; it's possible that a contiguous set of
> base pages (small folios) were allocated to a virtual mapping and all swapped
> out together - they would likely end up contiguous in the swap file, but should
> not be swapped back in as a single folio because of this (same reasoning applies
> to cluster of smaller THPs that you mistake for a larger THP, etc).
>
> So you will need to check what THP sizes are enabled and check the VMA
> suitability regardless; Perhaps you are already doing this - I haven't looked at
> the code yet.
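
To make the concerns above concrete, here is a minimal user-space sketch in C
of the kind of eligibility test being asked for. It is not kernel code and not
what the series implements: every model_* name, the flag value, and the idea of
recording the swap-out order for a range are assumptions made purely for
illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; none of these are kernel definitions. */
#define MODEL_PAGE_SHIFT    12
#define MODEL_VM_NOHUGEPAGE 0x1u

struct model_vma {
    unsigned long start;          /* first byte of the mapping */
    unsigned long end;            /* one past the last byte */
    unsigned int flags;           /* MODEL_VM_* bits */
    unsigned long enabled_orders; /* bit N set: order-N folios allowed */
};

/* Assumed bookkeeping: how the range looked when it was swapped out. */
struct model_swap_range {
    unsigned long first_offset;   /* first swap slot of the range */
    int swapped_order;            /* 0 if it was swapped out as base pages */
};

/* Return true only if swapping the range in as one order-N folio is allowed. */
static bool model_can_swapin_large(const struct model_vma *vma,
                                   unsigned long fault_addr, int order,
                                   const struct model_swap_range *range)
{
    assert(order > 0 && order < 20);

    unsigned long size = 1UL << (MODEL_PAGE_SHIFT + order);
    unsigned long start = fault_addr & ~(size - 1);

    /* The application asked for no huge pages in this mapping. */
    if (vma->flags & MODEL_VM_NOHUGEPAGE)
        return false;
    /* This folio size is not enabled. */
    if (!(vma->enabled_orders & (1UL << order)))
        return false;
    /* The naturally aligned range must fit inside the mapping. */
    if (start < vma->start || start + size > vma->end)
        return false;
    /* Contiguous slots are not enough: it must have been one folio. */
    if (range->swapped_order != order)
        return false;
    /* The swap slots must be aligned to the folio size as well. */
    if (range->first_offset & ((1UL << order) - 1))
        return false;
    return true;
}
```

In this model, a range of base pages that merely happens to sit contiguously
in swap (swapped_order == 0) is rejected even when every other check passes.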

We are actually reusing your alloc_anon_folio(), adding a parameter so that
it supports both do_anonymous_page() and do_swap_page():

-static struct folio *alloc_anon_folio(struct vm_fault *vmf)
+static struct folio *alloc_anon_folio(struct vm_fault *vmf,
+		bool (*pte_range_check)(pte_t *, int))
 {
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 	struct vm_area_struct *vma = vmf->vma;
@@ -4190,7 +4270,7 @@ static struct folio *alloc_anon_folio(struct vm_fault *vmf)
 	order = highest_order(orders);
 	while (orders) {
 		addr = ALIGN_DOWN(vmf->address, PAGE_SIZE << order);
-		if (pte_range_none(pte + pte_index(addr), 1 << order))
+		if (pte_range_check(pte + pte_index(addr), 1 << order))
 			break;
 		order = next_order(&orders, order);
 	}
@@ -4269,7 +4349,7 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
 	if (unlikely(anon_vma_prepare(vma)))
 		goto oom;
 	/* Returns NULL on OOM or ERR_PTR(-EAGAIN) if we must retry the fault */
-	folio = alloc_anon_folio(vmf);
+	folio = alloc_anon_folio(vmf, pte_range_none);
 	if (IS_ERR(folio))
 		return 0;
 	if (!folio)
--

I assume this already checks everything?

>
> I'll aim to review the code in the next couple of weeks.

Nice, thanks!

>
> Thanks,
> Ryan
>
> > On the other hand, in OPPO's product, we've deployed
> > anon large folios on millions of phones[2]. we enhanced zsmalloc and zRAM to
> > compress and decompress large folios as a whole, which help improve compression
> > ratio and decrease CPU consumption significantly. In zsmalloc and zRAM we can
> > save large objects whose original size are 64KiB for example. So it is also a
> > better choice for us to only swap-in large folios for those compressed large
> > objects as a large folio can be decompressed all together.
> >
> > Note I am moving my previous "arm64: mm: swap: support THP_SWAP on hardware
> > with MTE" to this series as it might help review.
> >
> > [1] [PATCH v3 0/4] Swap-out small-sized THP without splitting
> > https://lore.kernel.org/linux-mm/20231025144546.577640-1-ryan.roberts@arm.com/
> > [2] OnePlusOSS / android_kernel_oneplus_sm8550
> > https://github.com/OnePlusOSS/android_kernel_oneplus_sm8550/tree/oneplus/sm8550_u_14.0.0_oneplus11
> >
> > Barry Song (2):
> >   arm64: mm: swap: support THP_SWAP on hardware with MTE
> >   mm: rmap: weaken the WARN_ON in __folio_add_anon_rmap()
> >
> > Chuanhua Han (4):
> >   mm: swap: introduce swap_nr_free() for batched swap_free()
> >   mm: swap: make should_try_to_free_swap() support large-folio
> >   mm: support large folios swapin as a whole
> >   mm: madvise: don't split mTHP for MADV_PAGEOUT
> >
> >  arch/arm64/include/asm/pgtable.h |  21 ++----
> >  arch/arm64/mm/mteswap.c          |  42 ++++++++++++
> >  include/asm-generic/tlb.h        |  10 +++
> >  include/linux/huge_mm.h          |  12 ----
> >  include/linux/pgtable.h          |  62 ++++++++++++++++-
> >  include/linux/swap.h             |   6 ++
> >  mm/madvise.c                     |  48 ++++++++++++++
> >  mm/memory.c                      | 110 ++++++++++++++++++++++++++-----
> >  mm/page_io.c                     |   2 +-
> >  mm/rmap.c                        |   5 +-
> >  mm/swap_slots.c                  |   2 +-
> >  mm/swapfile.c                    |  29 ++++++++
> >  12 files changed, 301 insertions(+), 48 deletions(-)
> >
>

Thanks
Barry
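
For readers following the alloc_anon_folio() hunk quoted earlier in this mail,
the order-selection loop with a pluggable range check can be modelled in user
space roughly as below. This is only a sketch under stated assumptions: the
model_* names, the PTE encoding (0 means "none", any other value is a swap
slot number), and the swap-contiguity predicate are all hypothetical. The
predicate that the series actually passes from do_swap_page() is not shown in
this mail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PTRS_PER_PTE 512
#define MODEL_MAX_ORDER    9     /* 2^9 == MODEL_PTRS_PER_PTE */

typedef unsigned long model_pte_t;   /* assumed encoding, see lead-in */
typedef bool (*model_range_check_t)(const model_pte_t *pte, int nr);

/* Models the anonymous-fault case: every entry in the range is empty. */
static bool model_pte_range_none(const model_pte_t *pte, int nr)
{
    for (int i = 0; i < nr; i++)
        if (pte[i] != 0)
            return false;
    return true;
}

/* Hypothetical swap-in case: nr consecutive, naturally aligned swap slots. */
static bool model_pte_range_swap_contig(const model_pte_t *pte, int nr)
{
    if (pte[0] == 0 || pte[0] % (unsigned long)nr != 0)
        return false;
    for (int i = 1; i < nr; i++)
        if (pte[i] != pte[0] + (unsigned long)i)
            return false;
    return true;
}

/*
 * Walk the enabled orders from largest to smallest and return the first
 * one whose naturally aligned range around fault_idx passes the check.
 * Returns 0 when only a single base page can be used.
 */
static int model_pick_order(const model_pte_t *table, size_t fault_idx,
                            unsigned long orders, model_range_check_t check)
{
    assert(fault_idx < MODEL_PTRS_PER_PTE);

    for (int order = MODEL_MAX_ORDER; order > 0; order--) {
        if (!(orders & (1UL << order)))
            continue;
        size_t nr = (size_t)1 << order;
        size_t start = fault_idx & ~(nr - 1);
        if (check(table + start, (int)nr))
            return order;
    }
    return 0;
}
```

The point of the callback is that the same loop serves both fault paths; only
the notion of a "usable" PTE range changes between them.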