From mboxrd@z Thu Jan  1 00:00:00 1970
From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, Matthew Wilcox, Hugh Dickins, Chris Li,
    David Hildenbrand, Yosry Ahmed, "Huang, Ying", Nhat Pham,
    Johannes Weiner, Baolin Wang, Baoquan He, Barry Song,
    Kalesh Singh, Kemeng Shi, Tim Chen, Ryan Roberts,
    linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH 01/28] mm, swap: don't scan every fragment cluster
Date: Thu, 15 May 2025 04:17:01 +0800
Message-ID: <20250514201729.48420-2-ryncsn@gmail.com>
X-Mailer: git-send-email 2.49.0
In-Reply-To: <20250514201729.48420-1-ryncsn@gmail.com>
References: <20250514201729.48420-1-ryncsn@gmail.com>
Reply-To: Kairui Song
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

From: Kairui Song

Fragment clusters have, by definition, already failed high order
allocation. The only reason we still scan them is that a swap entry may
get freed without its cache being released, leaving the swap map entry
in a HAS_CACHE-only status, so the cluster never gets moved back to the
non-full or free cluster list. The chance of this is low, and it only
happens when device usage is low (!vm_swap_full()).

The scan is especially unhelpful for SWP_SYNCHRONOUS_IO devices, where
the swap cache is almost always freed as soon as the count reaches
zero. Besides, a high order allocation failure is not a critical issue,
while the scan noticeably slows down mTHP allocation when the fragment
cluster list is long.

The HAS_CACHE issue will be fixed in a proper way later, so drop this
fragment cluster scanning design.
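As an aside, why a single-cluster scan still suffices can be shown with
a minimal userspace C sketch of the list rotation (illustration only:
struct cluster, isolate_head() and append_tail() below are hypothetical
stand-ins for the kernel's cluster lists, isolate_lock_cluster() and
move_cluster(), not actual kernel API). Each allocation attempt
detaches only the head cluster, tries it once, and re-appends it at the
tail, so repeated allocations still visit every fragment cluster over
time:

#include <stdbool.h>
#include <stdio.h>

struct cluster {
	int id;
	bool has_free_slot;	/* stand-in for a reclaimable slot */
	struct cluster *next;
};

/* Detach the list head, loosely like isolate_lock_cluster(). */
static struct cluster *isolate_head(struct cluster **list)
{
	struct cluster *ci = *list;

	if (ci)
		*list = ci->next;
	return ci;
}

/* Re-append a cluster at the tail so the list stays rotated. */
static void append_tail(struct cluster **list, struct cluster *ci)
{
	struct cluster **pp = list;

	while (*pp)
		pp = &(*pp)->next;
	ci->next = NULL;
	*pp = ci;
}

int main(void)
{
	struct cluster c3 = { 3, true, NULL };
	struct cluster c2 = { 2, false, &c3 };
	struct cluster c1 = { 1, false, &c2 };
	struct cluster *frag_list = &c1;
	int attempt;

	/* Each attempt scans exactly one cluster, then rotates. */
	for (attempt = 0; attempt < 3; attempt++) {
		struct cluster *ci = isolate_head(&frag_list);

		if (!ci)
			break;
		if (ci->has_free_slot) {
			printf("attempt %d: allocated from cluster %d\n",
			       attempt, ci->id);
			ci->has_free_slot = false;
		} else {
			printf("attempt %d: cluster %d still full, rotating\n",
			       attempt, ci->id);
		}
		append_tail(&frag_list, ci);
	}
	return 0;
}

The third attempt allocates from cluster 3: even though each attempt
scans a single cluster, the rotation guarantees the whole list is
eventually covered, which is all that is needed to reclaim the rare
HAS_CACHE-only slots.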
Signed-off-by: Kairui Song
---
 include/linux/swap.h |  1 -
 mm/swapfile.c        | 32 +++++++++-----------------------
 2 files changed, 9 insertions(+), 24 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index bc0e1c275fc0..817e427a47d2 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -310,7 +310,6 @@ struct swap_info_struct {
 					/* list of cluster that contains at least one free slot */
 	struct list_head frag_clusters[SWAP_NR_ORDERS];
 					/* list of cluster that are fragmented or contented */
-	atomic_long_t frag_cluster_nr[SWAP_NR_ORDERS];
 	unsigned int pages;		/* total of usable pages of swap */
 	atomic_long_t inuse_pages;	/* number of those currently in use */
 	struct swap_sequential_cluster *global_cluster; /* Use one global cluster for rotating device */
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 026090bf3efe..34188714479f 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -470,11 +470,6 @@ static void move_cluster(struct swap_info_struct *si,
 	else
 		list_move_tail(&ci->list, list);
 	spin_unlock(&si->lock);
-
-	if (ci->flags == CLUSTER_FLAG_FRAG)
-		atomic_long_dec(&si->frag_cluster_nr[ci->order]);
-	else if (new_flags == CLUSTER_FLAG_FRAG)
-		atomic_long_inc(&si->frag_cluster_nr[ci->order]);
 	ci->flags = new_flags;
 }
 
@@ -926,32 +921,25 @@ static unsigned long cluster_alloc_swap_entry(struct swap_info_struct *si, int o
 		swap_reclaim_full_clusters(si, false);
 
 	if (order < PMD_ORDER) {
-		unsigned int frags = 0, frags_existing;
-
 		while ((ci = isolate_lock_cluster(si, &si->nonfull_clusters[order]))) {
 			found = alloc_swap_scan_cluster(si, ci, cluster_offset(si, ci),
 							order, usage);
 			if (found)
 				goto done;
-			/* Clusters failed to allocate are moved to frag_clusters */
-			frags++;
 		}
 
-		frags_existing = atomic_long_read(&si->frag_cluster_nr[order]);
-		while (frags < frags_existing &&
-		       (ci = isolate_lock_cluster(si, &si->frag_clusters[order]))) {
-			atomic_long_dec(&si->frag_cluster_nr[order]);
-			/*
-			 * Rotate the frag list to iterate, they were all
-			 * failing high order allocation or moved here due to
-			 * per-CPU usage, but they could contain newly released
-			 * reclaimable (eg. lazy-freed swap cache) slots.
-			 */
+		/*
+		 * Scanning only one fragment cluster is good enough. Order 0
+		 * allocation will surely succeed, mTHP allocation failure
+		 * is not critical, and scanning one cluster still keeps the
+		 * list rotated and scanned (for reclaiming HAS_CACHE).
+		 */
+		ci = isolate_lock_cluster(si, &si->frag_clusters[order]);
+		if (ci) {
 			found = alloc_swap_scan_cluster(si, ci, cluster_offset(si, ci),
-							order, usage);
+							order, usage);
 			if (found)
 				goto done;
-			frags++;
 		}
 	}
 
@@ -973,7 +961,6 @@ static unsigned long cluster_alloc_swap_entry(struct swap_info_struct *si, int o
 		 * allocation, but reclaim may drop si->lock and race with another user.
 		 */
 		while ((ci = isolate_lock_cluster(si, &si->frag_clusters[o]))) {
-			atomic_long_dec(&si->frag_cluster_nr[o]);
 			found = alloc_swap_scan_cluster(si, ci, cluster_offset(si, ci),
 							0, usage);
 			if (found)
@@ -3234,7 +3221,6 @@ static struct swap_cluster_info *setup_clusters(struct swap_info_struct *si,
 	for (i = 0; i < SWAP_NR_ORDERS; i++) {
 		INIT_LIST_HEAD(&si->nonfull_clusters[i]);
 		INIT_LIST_HEAD(&si->frag_clusters[i]);
-		atomic_long_set(&si->frag_cluster_nr[i], 0);
 	}
 
 	/*
-- 
2.49.0