From: Barry Song <21cnbao@gmail.com>
Date: Sun, 17 Mar 2024 19:11:46 +1300
Subject: Re: [RFC PATCH v3 5/5] mm: support large folios swapin as a whole
To: Ryan Roberts
Cc: "Huang, Ying", Matthew Wilcox, akpm@linux-foundation.org,
 linux-mm@kvack.org, chengming.zhou@linux.dev, chrisl@kernel.org,
 david@redhat.com, hannes@cmpxchg.org, kasong@tencent.com,
 linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org,
 mhocko@suse.com, nphamcs@gmail.com, shy828301@gmail.com,
 steven.price@arm.com, surenb@google.com, wangkefeng.wang@huawei.com,
 xiang@kernel.org, yosryahmed@google.com, yuzhao@google.com,
 Chuanhua Han, Barry Song
References: <20240304081348.197341-1-21cnbao@gmail.com>
 <20240304081348.197341-6-21cnbao@gmail.com>
 <87wmq3yji6.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <87sf0rx3d6.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <4fea8887-b3a1-4b32-8484-c3eeb74cf2e0@arm.com>
In-Reply-To: <4fea8887-b3a1-4b32-8484-c3eeb74cf2e0@arm.com>

On Sat, Mar 16, 2024 at 1:06 AM Ryan Roberts wrote:
>
> On 15/03/2024 10:01, Barry Song wrote:
> > On Fri, Mar 15, 2024 at 10:17 PM Huang, Ying wrote:
> >>
> >> Barry Song <21cnbao@gmail.com> writes:
> >>
> >>> On Fri, Mar 15, 2024 at
9:43 PM Huang, Ying wrote:
> >>>>
> >>>> Barry Song <21cnbao@gmail.com> writes:
> >>>>
> >>>>> From: Chuanhua Han
> >>>>>
> >>>>> On an embedded system like Android, more than half of anon memory is
> >>>>> actually in swap devices such as zRAM. For example, while an app is
> >>>>> switched to background, its most memory might be swapped-out.
> >>>>>
> >>>>> Now we have mTHP features, unfortunately, if we don't support large folios
> >>>>> swap-in, once those large folios are swapped-out, we immediately lose the
> >>>>> performance gain we can get through large folios and hardware optimization
> >>>>> such as CONT-PTE.
> >>>>>
> >>>>> This patch brings up mTHP swap-in support. Right now, we limit mTHP swap-in
> >>>>> to those contiguous swaps which were likely swapped out from mTHP as a
> >>>>> whole.
> >>>>>
> >>>>> Meanwhile, the current implementation only covers the SWAP_SYCHRONOUS
> >>>>> case. It doesn't support swapin_readahead as large folios yet since this
> >>>>> kind of shared memory is much less than memory mapped by single process.
> >>>>
> >>>> In contrast, I still think that it's better to start with normal swap-in
> >>>> path, then expand to SWAP_SYCHRONOUS case.
> >>>
> >>> I'd rather try the reverse direction as non-sync anon memory is only around
> >>> 3% in a phone as my observation.
> >>
> >> Phone is not the only platform that Linux is running on.
> >
> > I suppose it's generally true that forked shared anonymous pages only
> > constitute a small portion of all anonymous pages. The majority of
> > anonymous pages are within a single process.
> >
> > I agree phones are not the only platform. But Rome wasn't built in a
> > day. I can only get started on a hardware which I can easily reach and
> > have enough hardware/test resources on it.
> > So we may take the first step which can be applied on a real product
> > and improve its performance, and step by step, we broaden it and make it
> > widely useful to various areas in which I can't reach :-)
> >
> > so probably we can have a sysfs "enable" entry with default "n" or
> > have a maximum swap-in order as Ryan's suggestion [1] at the beginning,
>
> I wasn't necessarily suggesting that we should hard-code an upper limit. I was
> just pointing out that we likely need some policy somewhere because the right
> thing very likely depends on the folio size and workload. And there is likely
> similar policy needed for CoW.
>
> We already have per-thp-size directories in sysfs, so there is a natural place
> to add new controls as you suggest - that would fit well. Of course if we can
> avoid exposing yet more controls that would be preferable.
>
> >
> > "
> > So in the common case, swap-in will pull in the same size of folio as was
> > swapped-out. Is that definitely the right policy for all folio sizes? Certainly
> > it makes sense for "small" large folios (e.g. up to 64K IMHO). But I'm not sure
> > it makes sense for 2M THP; As the size increases the chances of actually needing
> > all of the folio reduces so chances are we are wasting IO. There are similar
> > arguments for CoW, where we currently copy 1 page per fault - it probably makes
> > sense to copy the whole folio up to a certain size.
> > "

Right now we have an "enabled" entry for each size, for example:

/sys/kernel/mm/transparent_hugepage/hugepages-64kB/enabled

For the phone case it would be quite simple: just enable 64KiB (plus
optionally 16KiB) and allow swap-in of folios of those sizes. That doesn't
need any new controls, since do_swap_page() does the same checks as
do_anonymous_page() does. And we have actually deployed 64KiB-only swap-out
and swap-in on millions of real phones.
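
For anyone who wants to see which sizes are currently enabled on a test
machine, a minimal userspace sketch follows. It is only an illustration and
not part of the patchset. It assumes each per-size directory holds an
`enabled` file whose content lists the choices with the active one in
brackets (e.g. `always inherit madvise [never]`); the helper names
`selected_option` and `enabled_sizes_kb` are made up for this example.

```python
import os
import re

# Default location of the per-size THP directories (assumed; pass another
# root to enabled_sizes_kb() if your system differs).
THP_ROOT = "/sys/kernel/mm/transparent_hugepage"


def selected_option(text):
    """Return the bracketed choice from a sysfs multi-choice string, or None."""
    match = re.search(r"\[([^\]]+)\]", text)
    return match.group(1) if match else None


def enabled_sizes_kb(root=THP_ROOT):
    """Map each hugepages-<N>kB directory under root to its active 'enabled' value."""
    result = {}
    if not os.path.isdir(root):
        return result  # no THP sysfs tree at this location
    for name in os.listdir(root):
        m = re.fullmatch(r"hugepages-(\d+)kB", name)
        if not m:
            continue  # skip unrelated entries such as khugepaged
        path = os.path.join(root, name, "enabled")
        try:
            with open(path) as f:
                result[int(m.group(1))] = selected_option(f.read())
        except OSError:
            continue  # entry missing or unreadable
    return result


if __name__ == "__main__":
    for size, mode in sorted(enabled_sizes_kb().items()):
        print(f"{size}kB: {mode}")
```

Run as a script, it prints one line per size found under the default root,
and prints nothing on a kernel without per-size directories.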

Considering other users' scenarios, which might want larger folios such as
2MiB or 1MiB but only want smaller swap-in folio sizes, I could add a new
swapin control like

/sys/kernel/mm/transparent_hugepage/hugepages-64kB/swapin

which can be 1 or 0.

With this, it seems safer for the patchset to land while I don't have the
ability to extensively test it on Linux servers?

Thanks
Barry