From: Suren Baghdasaryan
Date: Tue, 6 Jun 2023 12:48:38 -0700
Subject: Re: [PATCH] mm: madvise: fix uneven accounting of psi
To: Charan Teja Kalla
Cc: Johannes Weiner, akpm@linux-foundation.org, minchan@kernel.org, quic_pkondeti@quicinc.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
References: <1685531374-6091-1-git-send-email-quic_charante@quicinc.com> <20230531221955.GD102494@cmpxchg.org> <230e45e8-8cd8-3668-bbfa-a95212b4cb99@quicinc.com> <20230605180013.GD221380@cmpxchg.org>

Hi Charan,

On Tue, Jun 6, 2023 at 7:54 AM Charan Teja Kalla wrote:
>
> Thanks Johannes for the detailed review comments...
>
> On 6/5/2023 11:30 PM, Johannes Weiner wrote:
> >> Agree that we shouldn't be really silence the thrashing. My point is we
> >> shouldn't be considering the folios as thrashing If those were getting
> >> reclaim by the user him self through MADV_PAGEOUT under the assumption
> >> that __user knows they are not real working set__. Please let me know
> >> if I am not making sense here.
> > I'm not sure I agree with this. I think it misses the point of what
> > the madvise is actually for.
> >
> > The workingset is defined based on access frequency and available
> > memory. Thrashing is defined as having to read pages back shortly
> > after their eviction.
> >
> > MADV_PAGEOUT is for the application to inform the kernel that it's
> > done accessing the pages, so that the kernel can accelerate their
> > eviction over other pages that may still be in use. This is ultimately
> > meant to REDUCE reclaim and paging.
> >
> > However, in this case, the MADVISE_PAGEOUT evicts pages that are
> > reused after and then refault. It INCREASED reclaim and paging.
> >
> I agree here...
> > Surely that's a problem? And the system would have behaved better
> > without the madvise() in the first place?
> >
> Yes, the system behavior could be much better without this PAGEOUT
> operation...
> > In fact, I would argue that the pressure spike is a great signal for
> > detecting overzealous madvising. If you're redefining the workingset
> > from access frequency to "whatever the user is saying", that will take
> > away an important mechanism to detect advise bugs and unnecessary IO.
> currently wanted to do the PAGEOUT operation but what information lacks
> is if I am really operating on the workingset pages. Had the client
> knows that he is operating on the workingset pages, he could have backed
> off from madvising.
>
> I now note that I shouldn't be defining the workingset from "whatever
> user is saying". But then, IMO, there should be a way from the kernel to
> the user that his madvise operation is being performed on the workingset
> pages.
>
> One way the user can do is monitoring the PSI events while PAGEOUT is
> being performed and he may exclude those VMA's from next time.
>
> Alternatively kernel itself can support it may be through like
> MADV_PAGEOUT_INACTIVE which doesn't pageout the Workingset pages.
>
> Please let me know your opinion about this interface.
>
> This has the usecase on android where it just assumes that 2nd
> background app will most likely to be not used in the future thus
> reclaim those app pages. It works well for most of the times but such
> assumption will go wrong with the usecase I had mentioned.

Hi Folks. Sorry for being late to the party.
Yeah, userspace does not have a crystal ball to predict future user
behavior, so there will always be pathological cases where the usual
assumptions, and the resulting madvise(), make things worse.

I think this discussion can be split into several questions/issues:
1. Inconsistency in how madvise(MADV_PAGEOUT) affects the PSI
calculation when the page is refaulted, based on the path the page took
before being evicted by madvise(). In your initial description, case (a)
is inconsistent with (b) and (c), and that is probably worth fixing.
IMHO (a) should be made consistent with the others, not the other way
around. My reasoning is that the page was expelled from the active list,
so it was part of the active workingset.
2. Whether refaults caused by an incorrect madvise(MADV_PAGEOUT) should
be counted as workingset refaults and affect PSI. This one I think is
trickier. IMHO such a refault should be counted as a workingset refault
simply because the page was refaulted and it was part of the workingset.
Whether it should affect PSI, which is supposed to be an indicator of
"pressure", is, I think, debatable. With madvise() in the mix, a refault
might happen without any real memory pressure... So, the answer is not
obvious to me.
3. Should refaults caused by an incorrect madvise(MADV_PAGEOUT) be
distinguished from refaults of pages that were evicted by kernel reclaim
mechanisms? I can see a use for that from userspace, to detect incorrect
madvise() calls and adjust their aggressiveness. I think the API might
get a bit complex because of the need to associate refaults with
specific madvise() calls/VMAs in order to understand which hint was
incorrect and adjust the behavior.
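
To make the userspace side of question 3 a bit more concrete, here is a
rough sketch (Python, purely illustrative, not something that was posted
or proposed in this thread). It samples the cumulative "some" stall time
from /proc/pressure/memory before and after touching a buffer that was
previously hinted out with MADV_PAGEOUT. The helper names (parse_psi,
read_memory_stall_us, pageout_then_touch) and the 16 MiB buffer size are
assumptions made up for the example. It also assumes a kernel with PSI
enabled and with swap configured; without swap, anonymous pages are not
actually paged out and the delta will mostly reflect unrelated activity.

```python
import mmap


def parse_psi(text):
    """Parse PSI file contents into {"some": {...}, "full": {...}}."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        stats = {}
        for field in fields[1:]:
            key, _, value = field.partition("=")
            # "total" is cumulative stall time in microseconds;
            # the avgN fields are percentages.
            stats[key] = int(value) if key == "total" else float(value)
        result[fields[0]] = stats
    return result


def read_memory_stall_us(path="/proc/pressure/memory"):
    """Return cumulative "some" stall time in us, or None if PSI is unavailable."""
    try:
        with open(path) as f:
            return parse_psi(f.read())["some"]["total"]
    except (OSError, KeyError, ValueError):
        return None


def pageout_then_touch(size=16 * 1024 * 1024):
    """Hint a private anonymous buffer out with MADV_PAGEOUT, then read it back."""
    buf = mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    try:
        buf.write(b"\x01" * size)  # fault every page in
        advice = getattr(mmap, "MADV_PAGEOUT", None)  # absent on older Python/libc
        if advice is not None:
            try:
                buf.madvise(advice)
            except OSError:
                pass  # kernel without MADV_PAGEOUT support
        # Touch one byte per page; pages that were really evicted refault here.
        return sum(buf[i] for i in range(0, size, mmap.PAGESIZE))
    finally:
        buf.close()


if __name__ == "__main__":
    before = read_memory_stall_us()
    pageout_then_touch()
    after = read_memory_stall_us()
    if before is None or after is None:
        print("PSI not available on this system")
    else:
        print(f"memory 'some' stall during refault: {after - before} us")
```

Note that the PSI numbers are system-wide (or per-cgroup), so a delta
like this cannot say which madvise() call or VMA was the wrong hint.
That is exactly the association problem mentioned in question 3.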

Hope my feedback is useful. If we can improve Android's userspace
behavior, I'm happy to help make that happen.
Thanks,
Suren.

>
> --Thanks.