From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 21D90CE7A89 for ; Thu, 5 Sep 2024 19:24:57 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id A0AD56B008C; Thu, 5 Sep 2024 15:24:56 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 9BA866B0092; Thu, 5 Sep 2024 15:24:56 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 8A8DC6B0093; Thu, 5 Sep 2024 15:24:56 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id 6C2C96B008C for ; Thu, 5 Sep 2024 15:24:56 -0400 (EDT) Received: from smtpin21.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay02.hostedemail.com (Postfix) with ESMTP id 1AB68120A39 for ; Thu, 5 Sep 2024 19:24:56 +0000 (UTC) X-FDA: 82531662192.21.D6C7380 Received: from mail-wm1-f54.google.com (mail-wm1-f54.google.com [209.85.128.54]) by imf28.hostedemail.com (Postfix) with ESMTP id 1718AC0007 for ; Thu, 5 Sep 2024 19:24:52 +0000 (UTC) Authentication-Results: imf28.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=jWwnfIrF; spf=pass (imf28.hostedemail.com: domain of usamaarif642@gmail.com designates 209.85.128.54 as permitted sender) smtp.mailfrom=usamaarif642@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1725564196; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=CbVNl1AsNveIurDxg2UqvATkpgD+3xfOD/Vx+6xl1Ys=; b=WMZ1oYdWT/H/v7euT3qomAZhlztK6hIq26A7C8e9nzsU7ws+R8MKyy/2rfUcA68/XY1Oqt +nONcqdh8+wHqLzZKTOxjb9+46CkoAEGEve9OVkpDbOWWKS7utkNbyZH1aqeK3qQNhYsVc mVJqdPeO/46mQThbaB8Lp3Q71PMhUXs= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1725564196; a=rsa-sha256; cv=none; b=o/MuyRxDu/14r3V0z1hlmANlXM0g42XWaU3WMimrxyuhXaXlvi0w9FKZF7WiTKzKCsrDkv 3izPVJ4bClfjV/HtR5UH5CK1B6u9Wz+z6so8UC7j+nJX5ygFwb/sTTyYHjMYMX7NizZmt0 mL2fn4iqwnuV6kFtfplcla1wXdelZ48= ARC-Authentication-Results: i=1; imf28.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=jWwnfIrF; spf=pass (imf28.hostedemail.com: domain of usamaarif642@gmail.com designates 209.85.128.54 as permitted sender) smtp.mailfrom=usamaarif642@gmail.com; dmarc=pass (policy=none) header.from=gmail.com Received: by mail-wm1-f54.google.com with SMTP id 5b1f17b1804b1-42bfb50e4e6so9058875e9.2 for ; Thu, 05 Sep 2024 12:24:52 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1725564291; x=1726169091; darn=kvack.org; h=content-transfer-encoding:in-reply-to:from:content-language :references:cc:to:subject:user-agent:mime-version:date:message-id :from:to:cc:subject:date:message-id:reply-to; bh=CbVNl1AsNveIurDxg2UqvATkpgD+3xfOD/Vx+6xl1Ys=; b=jWwnfIrFpvKstVk72QC3kTpRGUHnMjOVOEA1JzDSCFtM4Bckdssp46qMvtOsM4A7iF BKiRaNnwOR6ts+hvqPAhM5+LtzX2Qq2DYKmkLC8ukvzuGRE2TlCYjiDuYIJ8SWr9TT3l J1NJHiHa1gNkZEDWKywDBfwIjU/C9vN+45nvIEPfTuuwiNUZMBI0xUfc80rINX/nYP8h AuqSJpdGJ/dhRf612R1jJ7UivQby6YaqEIX2cFll1TNWbAF2eW9l167cENtPLAsqJ1NJ q4XBwob6TE8a6g7vfYrJfD3blbSPwwYKLiO5EUdOYUbcLtvYSy9GVaXDsTLJvN9oGf7h 4dwQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1725564291; x=1726169091; h=content-transfer-encoding:in-reply-to:from:content-language :references:cc:to:subject:user-agent:mime-version:date:message-id :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=CbVNl1AsNveIurDxg2UqvATkpgD+3xfOD/Vx+6xl1Ys=; b=OAx/RIKxnL7ZIzu2n3Rq1yLSr6qqEn8LURz4u/a57L3Jyva8Bpcx4lceA5CdpTKRAl rljRttftDynrTfpBLt1HI7ycW+oaae/4e7oFbbn8a7IdcY1pZHfaSjG3zSPptfyAx1hm Craw2HvwzZ/Y3aasx1WfGORp8W7JqOln9quwRFVoPHAhSfELdq/7xrnsNrHXU1lt9w5r OFGylguBfr/jwpcFczbH/88uoubMmwyeAlGvmy32JkUCcBSfMFiC0MoHcVkxWaFEY1vI TWujaH8HpBok2ajzRDmeYtBHbnxmwai8THfSDIlcMjnfwqv8DdYpElHtPrldcvUtCAgP 3AXw== X-Forwarded-Encrypted: i=1; AJvYcCUUtZ94J4htUTRMu4fFizigRY1Ar+Z+h81wp+XX2+ZcP+aM7uG+vves01fhhCOXNYa8JbT87OTC+g==@kvack.org X-Gm-Message-State: AOJu0YxtMIvrRuvLbMmqYHngHgQST4t3DD+e+BlK5HiCeMFn5uOMe8uw kxktO4LgBkpQwUwFXA0WW+gTWwwKuFyUY/wyN6EXYMXTR+tHO33+UasUCq3W X-Google-Smtp-Source: AGHT+IHZa/rGu6zLqp1a+czg0eJcLBEk6NINPiW6JvwpJnLQEhN78I2SNiqDWkiUB3iGN4vbyXkRLA== X-Received: by 2002:a5d:5244:0:b0:374:c1cc:2eb7 with SMTP id ffacd0b85a97d-374c1cc2f6cmr11980228f8f.35.1725564290600; Thu, 05 Sep 2024 12:24:50 -0700 (PDT) Received: from ?IPV6:2a02:6b6f:e750:7600:c5:51ce:2b5:970b? ([2a02:6b6f:e750:7600:c5:51ce:2b5:970b]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-378885ee8bdsm228660f8f.68.2024.09.05.12.24.50 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Thu, 05 Sep 2024 12:24:50 -0700 (PDT) Message-ID: <25da676b-f87d-4cbc-8df7-93528c601d62@gmail.com> Date: Thu, 5 Sep 2024 20:24:49 +0100 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH v5 1/6] mm: free zapped tail pages when splitting isolated thp To: Hugh Dickins Cc: Andrew Morton , Yu Zhao , linux-mm@kvack.org, hannes@cmpxchg.org, riel@surriel.com, shakeel.butt@linux.dev, roman.gushchin@linux.dev, david@redhat.com, npache@redhat.com, baohua@kernel.org, ryan.roberts@arm.com, rppt@kernel.org, willy@infradead.org, cerasuolodomenico@gmail.com, ryncsn@gmail.com, corbet@lwn.net, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, kernel-team@meta.com, Shuang Zhai References: <20240830100438.3623486-1-usamaarif642@gmail.com> <20240830100438.3623486-2-usamaarif642@gmail.com> <1d490ab5-5cf8-4c16-65d0-37a62999fcd5@google.com> <1ffdf94d-ce3f-4dac-8ed3-0681f98beebf@gmail.com> <5efa255b-3689-0c91-1980-c0f0562d84e9@google.com> Content-Language: en-US From: Usama Arif In-Reply-To: <5efa255b-3689-0c91-1980-c0f0562d84e9@google.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Rspamd-Queue-Id: 1718AC0007 X-Stat-Signature: mybc6t1846u6ot83bf9kstgyxoh7rb3d X-Rspamd-Server: rspam09 X-Rspam-User: X-HE-Tag: 1725564292-322556 X-HE-Meta: U2FsdGVkX1+YwWLHuJZwUYsQh4ID3TZKelC9YgUv20hgpXytO1cn0LG8fI9FfXtKw0NU2Qeob9AcaSUJqtk6EIuHxzgeRDlgrNL22jxTzdF6tfAgaIU9ui/bBksaIQ1aL7mGXEh9Q6z2hAcjDQ4KvhGpn9TmUqXImE2RXSZq3pEXU+0yJYBjunFIJ/wOu0beSa1xyNMkAOyvXe+/xgmWC+d69btDKeqSy0SV5Ntl0AxgKOcF2CjxvwY0gzFVYyLYb1jUNFekyVDyJeQCzB3DRl1SEmQf3PEyrT03dOSMiXaQ00p4oRyicTh+lJ+XIiwo5Ywv0MFplZ7rUEngfRPWpZIaEJSu2iQ3QW9tv0qIrnXtgobKV5ysDDKPNUeiW+PNorxLOVQSuniiZGX/SMSNCyQ+E6d9fvwcVVZhF63r3rZrUjK/WsJCe79xeb7yFgjQdmalKNokOT6ovWV8o9y0IadnJyd1MKcpuPm8cODMAu2AAYF4IO+W2lYAko6BVQWlMAyGYShSt31yJ18y2lqiWtXU5aNNVjWVTWmib2Xzh/dNNCDGCFVMwcsCpKV9cFxEbPjCbvf3jMkp9XjSjVKQeHuVwl3D8/vgJCfhi5KM/IKULWwWaCr0k+RygASg5vl0otPHqZr2MgYOtKRED3v82FmijxEAe0VmQp+SmKn+KepO9NL9eyXhRVw2hVBH/JzxfeHfWLtxUEbhHYZZ1Gu3PNv6Wlv3Uxxs7e3sCxGGP/RTcu6/HYIh0chjlduYAtawrTzxCRtu+cDBEzx+rHNz4dnl7P6YIfNz+T+VIjmwtC6eihDMUuXr+xnceZ/Tcilr3pACwaMNbc9Zy7MIm75JnSvdWuZfdf3j5ni6hdfG38+gtLE6nVp7eTupVpuljUns4Nfpxv+RnZQbaKHAUnC0lkHZDjqOReNrDiksJwrRY7XbJct8onQMOP9nCeYzO/lrVAeVvkVbMUOIJa8DE43 RTvuXLl3 9r0m0X7L9qvISr2sO2WBe0Zbqz5kOg03SL5C8vxXi7M3kbBe0h77lhpmyS3TF0UXpqNxdMVlQkzLAOnDTq1k+HI/Rh/AmrUuNfOD4yifZbWbEh+yiZN3wIHosU1h5QNEpsUO6EOY+GFFFBEhnYb4uwbeLOvr0T1ZYJaV1KWS/6sCf5TjC9C727OJV51b3yw2DhgAurA/tNe/cSE+uNX+7SQeB05i44Kj8AxaM0NrKC1nCP0S+/8CUUJ069kLU58gzRNKO/RKLmPwpAX4Q/PhHZqlPUJAmriDvyPHH7Sx/5wuMXFOVzA2qIRjawnJrmuNtLVuURT0muvqSz9xXC/jeVgwI8YQNOr+6AsJclnbTPSSONFlx7IG18LspvrjCkMNqv+IozoHdrnAZ/5AbUin5wf7r4ElD3cNJqjR1ooBS26I4KSd+XAPQv1vjbT4NR0D6jNfMvKRzAA8mvh4YE6QE0SP2eLeDI4Rlj0VCS2MKSkOhsLVGzrJqzEsq2LUWukSy1iau3qLArtzStP+mQJ4IhWgaey/n/mg+Dm0I X-Bogosity: Ham, tests=bogofilter, spamicity=0.000001, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On 05/09/2024 19:05, Hugh Dickins wrote: > On Thu, 5 Sep 2024, Usama Arif wrote: >> On 05/09/2024 09:46, Hugh Dickins wrote: >>> On Fri, 30 Aug 2024, Usama Arif wrote: >>> >>>> From: Yu Zhao >>>> >>>> If a tail page has only two references left, one inherited from the >>>> isolation of its head and the other from lru_add_page_tail() which we >>>> are about to drop, it means this tail page was concurrently zapped. >>>> Then we can safely free it and save page reclaim or migration the >>>> trouble of trying it. >>>> >>>> Signed-off-by: Yu Zhao >>>> Tested-by: Shuang Zhai >>>> Acked-by: Johannes Weiner >>>> Signed-off-by: Usama Arif >>> >>> I'm sorry, but I think this patch (just this 1/6) needs to be dropped: >>> it is only an optimization, and unless a persuasive performance case >>> can be made to extend it, it ought to go (perhaps revisited later). >>> >> >> I am ok for patch 1 only to be dropped. Patches 2-6 are not dependent on it. >> >> Its an optimization and underused shrinker doesn't depend on it. >> Its possible that folio->new_folio below might fix it? But if it doesn't, >> I can retry later on to make this work and resend it only if it alone shows >> a significant performance improvement. >> >> Thanks a lot for debugging this! and sorry it caused an issue. >> >> >>> The problem I kept hitting was that all my work, requiring compaction and >>> reclaim, got (killably) stuck in or repeatedly calling reclaim_throttle(): >>> because nr_isolated_anon had grown high - and remained high even when the >>> load had all been killed. >>> >>> Bisection led to the 2/6 (remap to shared zeropage), but I'd say this 1/6 >>> is the one to blame. I was intending to send this patch to "fix" it: >>> >>> --- a/mm/huge_memory.c >>> +++ b/mm/huge_memory.c >>> @@ -3295,6 +3295,8 @@ static void __split_huge_page(struct pag >>> folio_clear_active(new_folio); >>> folio_clear_unevictable(new_folio); >>> list_del(&new_folio->lru); >>> + node_stat_sub_folio(folio, NR_ISOLATED_ANON + >>> + folio_is_file_lru(folio)); >> >> Maybe this should have been below? (Notice the folio->new_folio) >> >> + node_stat_sub_folio(new_folio, NR_ISOLATED_ANON + >> + folio_is_file_lru(new_folio)); > > Yes, how very stupid of me (I'm well aware of the earlier bugfixes here, > and ought not to have done a blind copy and paste of those lines): thanks. > > However... it makes no difference. I gave yours a run, expecting a > much smaller negative number, but actually it works out much the same: > because, of course, by this point in the code "folio" is left pointing > to a small folio, and is almost equivalent to new_folio for the > node_stat calculations. > > (I say "almost" because I guess there's a chance that the page at > folio got reused as part of a larger folio meanwhile, which would > throw the stats off (if they weren't off already).) > > So, even with your fix to my fix, this code doesn't work. > Whereas a revert of this 1/6 does work: nr_isolated_anon and > nr_isolated_file come to 0 when the system is at rest, as expected > (and as silence from vmstat_refresh confirms - /proc/vmstat itself > presents negative stats as 0, in order to hide transient oddities). > > Hugh Thanks for trying. I was hoping you had mTHPs enabled and then the folio -> new_folio change would have fixed it. Happy for patch 1 only to be dropped. I can try to figure it out later and send if I can actually show any performance numbers for the fixed version on real world cases. Thanks, Usama > >> >>> if (!folio_batch_add(&free_folios, new_folio)) { >>> mem_cgroup_uncharge_folios(&free_folios); >>> free_unref_folios(&free_folios); >>> >>> And that ran nicely, until I terminated the run and did >>> grep nr_isolated /proc/sys/vm/stat_refresh /proc/vmstat >>> at the end: stat_refresh kindly left a pr_warn in dmesg to say >>> nr_isolated_anon -334013737 >>> >>> My patch is not good enough. IIUC, some split_huge_pagers (reclaim?) >>> know how many pages they isolated and decremented the stats by, and >>> increment by that same number at the end; whereas other split_huge_pagers >>> (migration?) decrement one by one as they go through the list afterwards. >>> >>> I've run out of time (I'm about to take a break): I gave up researching >>> who needs what, and was already feeling this optimization does too much >>> second guessing of what's needed (and its array of VM_WARN_ON_ONCE_FOLIOs >>> rather admits to that). >>> >>> And I don't think it's as simple as moving the node_stat_sub_folio() >>> into 2/6 where the zero pte is substituted: that would probably handle >>> the vast majority of cases, but aren't there others which pass the >>> folio_ref_freeze(new_folio, 2) test - the title's zapped tail pages, >>> or racily truncated now that the folio has been unlocked, for example? >>> >>> Hugh