From: Barry Song <21cnbao@gmail.com>
Date: Tue, 15 Mar 2022 00:11:50 +1300
Subject: Re: [PATCH v7 04/12] mm: multigenerational LRU: groundwork
References: <20220208081902.3550911-1-yuzhao@google.com> <20220208081902.3550911-5-yuzhao@google.com>
To: Yu Zhao
Cc: Johannes Weiner, Andrew Morton, Mel Gorman, Michal Hocko, Andi Kleen, Aneesh Kumar, Catalin Marinas, Dave Hansen, Hillf Danton, Jens Axboe, Jesse Barnes, Jonathan Corbet, Linus Torvalds, Matthew Wilcox, Michael Larabel, Mike Rapoport, Rik van Riel, Vlastimil Babka, Will Deacon, Ying Huang,
 LAK, Linux Doc Mailing List, LKML, Linux-MM, Kernel Page Reclaim v2, x86, Brian Geffon, Jan Alexander Steffens, Oleksandr Natalenko, Steven Barrett, Suleiman Souhlal, Daniel Byrne, Donald Carr, Holger Hoffstätte, Konstantin Kharlamov, Shuang Zhai, Sofia Trinh

> > > > > We used to put a faulted file page in inactive, if we access it a
> > > > > second time, it can be promoted to active. then in recent years, we
> > > > > have also applied this to anon pages while kernel adds workingset
> > > > > protection for anon pages. so basically both anon and file pages go
> > > > > into the inactive list for the 1st time, if we access it for the
> > > > > second time, they go to the active list. if we don't access it any
> > > > > more, they are likely to be reclaimed as they are inactive.
> > > > > we do have some special fastpath for code section, executable file
> > > > > pages are kept on active list as long as they are accessed.
> > > >
> > > > Yes.
> > > >
> > > > > so all of the above concerns are actually not that correct?
> > > >
> > > > They are valid concerns but I don't know any popular workloads that
> > > > care about them.
> > >
> > > Hi Yu,
> > > here we can get a workload in Kim's patchset while he added workingset
> > > protection for anon pages:
> > > https://patchwork.kernel.org/project/linux-mm/cover/1581401993-20041-1-git-send-email-iamjoonsoo.kim@lge.com/
> >
> > Thanks. I wouldn't call that a workload because it's not a real
> > application. By popular workloads, I mean applications that the
> > majority of people actually run on phones, in cloud, etc.
> >
> > > anon pages used to go to active rather than inactive, but kim's patchset
> > > moved to use inactive first. then only after the anon page is accessed
> > > second time, it can move to active.
> >
> > Yes. To clarify, the A-bit doesn't really mean the first or second
> > access. It can be many accesses each time it's set.
> >
> > > "In current implementation, newly created or swap-in anonymous page is
> > > started on the active list. Growing the active list results in rebalancing
> > > active/inactive list so old pages on the active list are demoted to the
> > > inactive list. Hence, hot page on the active list isn't protected at all.
> > >
> > > Following is an example of this situation.
> > >
> > > Assume that 50 hot pages on active list and system can contain total
> > > 100 pages. Numbers denote the number of pages on active/inactive
> > > list (active | inactive). (h) stands for hot pages and (uo) stands for
> > > used-once pages.
> > >
> > > 1. 50 hot pages on active list
> > > 50(h) | 0
> > >
> > > 2. workload: 50 newly created (used-once) pages
> > > 50(uo) | 50(h)
> > >
> > > 3. workload: another 50 newly created (used-once) pages
> > > 50(uo) | 50(uo), swap-out 50(h)
> > >
> > > As we can see, hot pages are swapped-out and it would cause swap-in later."
> > >
> > > Is MGLRU able to avoid the swap-out of the 50 hot pages?
> >
> > I think the real question is why the 50 hot pages can be moved to the
> > inactive list. If they are really hot, the A-bit should protect them.
>
> This is a good question.
>
> I guess it is probably because the current lru is trying to maintain a balance
> between the sizes of active and inactive lists. Thus, it can shrink active list
> even though pages might be still "hot" but not the recently accessed ones.
>
> 1. 50 hot pages on active list
> 50(h) | 0
>
> 2. workload: 50 newly created (used-once) pages
> 50(uo) | 50(h)
>
> 3. workload: another 50 newly created (used-once) pages
> 50(uo) | 50(uo), swap-out 50(h)
>
> the old kernel without anon workingset protection put workload 2 on active, so
> pushed 50 hot pages from active to inactive. workload 3 would further contribute
> to evict the 50 hot pages.
>
> it seems mglru doesn't demote pages from the youngest generation to older
> generation only in order to balance the list size? so mglru is probably safe
> in these cases.
>
> I will run some tests mentioned in Kim's patchset and report the result to you
> afterwards.
>

Hi Yu,

I did find that putting faulted pages into the youngest generation leads to
a regression in the ebizzy case that Kim's patchset mentioned when he added
workingset protection for anon pages.
I made a small modification to rand_chunk(), which is probably similar to
the modification Kim mentioned in his patchset. The modification can be
found here:
https://github.com/21cnbao/ltp/commit/7134413d747bfa9ef

The test environment is an x86 machine on which I set the memory size to
2.5GB, set zRAM to 2GB, and disabled external disk swap.

With the vanilla kernel:

\time -v ./a.out -vv -t 4 -s 209715200 -S 200000

So we have 10 chunks and 4 threads; each chunk is 209715200 bytes (200MB).

Typical result:
        Command being timed: "./a.out -vv -t 4 -s 209715200 -S 200000"
        User time (seconds): 36.19
        System time (seconds): 229.72
        Percent of CPU this job got: 371%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 1:11.59
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 2166196
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 9990128
        Minor (reclaiming a frame) page faults: 33315945
        Voluntary context switches: 59144
        Involuntary context switches: 167754
        Swaps: 0
        File system inputs: 2760
        File system outputs: 8
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

With gen_lru and lru_gen/enabled=0x3:

Typical result:
        Command being timed: "./a.out -vv -t 4 -s 209715200 -S 200000"
        User time (seconds): 36.34
        System time (seconds): 276.07
        Percent of CPU this job got: 378%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 1:22.46    **** 15% time +
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 2168120
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 13362810    ***** 30% page fault +
        Minor (reclaiming a frame) page faults: 33394617
        Voluntary context switches: 55216
        Involuntary context switches: 137220
        Swaps: 0
        File system inputs: 4088
        File system outputs: 8
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

With gen_lru and lru_gen/enabled=0x7:

Typical result:
        Command being timed: "./a.out -vv -t 4 -s 209715200 -S 200000"
        User time (seconds): 36.13
        System time (seconds): 251.71
        Percent of CPU this job got: 378%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 1:16.00    ***** better than enabled=0x3, worse than vanilla
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 2120988
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 12706512
        Minor (reclaiming a frame) page faults: 33422243
        Voluntary context switches: 49485
        Involuntary context switches: 126765
        Swaps: 0
        File system inputs: 2976
        File system outputs: 8
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

I can also reproduce the problem on arm64.

I am not saying this is going to block mglru from being mainlined. But I am
still curious whether this is an issue worth addressing somehow in mglru.

> > > since MGLRU
> > > is putting faulted pages to the youngest generation directly, do we have the
> > > risk mentioned in Kim's patchset?
> >
> > There are always risks :) I could imagine a thousand ways to make VM
> > suffer, but all of them could be irrelevant to how it actually does in
> > production. So a concrete use case of yours would be much appreciated
> > for this discussion.
>

Thanks
Barry
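
P.S. For anyone who wants to double-check the rough percentages annotated in the results above, here is a minimal sketch that recomputes them. The numbers are copied by hand from the three "\time -v" reports in this mail; the names RUNS, pct_change and deltas_vs are only illustrative helpers and are not part of ebizzy or the kernel.

```python
# Values copied by hand from the three "\time -v" reports in this mail.
# Elapsed times are converted from m:ss to seconds (e.g. 1:11.59 -> 71.59).
RUNS = {
    "vanilla":     {"elapsed_s": 71.59, "system_s": 229.72, "major_faults": 9990128},
    "enabled=0x3": {"elapsed_s": 82.46, "system_s": 276.07, "major_faults": 13362810},
    "enabled=0x7": {"elapsed_s": 76.00, "system_s": 251.71, "major_faults": 12706512},
}


def pct_change(new, base):
    """Relative change of `new` versus `base`, in percent."""
    return (new - base) / base * 100.0


def deltas_vs(baseline, runs=RUNS):
    """Per-metric percentage change of every other run against `baseline`."""
    base = runs[baseline]
    return {
        name: {metric: pct_change(value, base[metric])
               for metric, value in metrics.items()}
        for name, metrics in runs.items()
        if name != baseline
    }


if __name__ == "__main__":
    for name, delta in deltas_vs("vanilla").items():
        summary = ", ".join(f"{m} {v:+.1f}%" for m, v in delta.items())
        print(f"{name}: {summary}")
```

On these numbers, enabled=0x3 comes out at roughly +15% elapsed time and +34% major faults against vanilla, and enabled=0x7 at roughly +6% and +27%. These are single "typical" runs, so the figures should be read as indicative only.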