Date: Wed, 16 Jul 2025 01:05:50 -0700 (PDT)
From: Hugh Dickins <hughd@google.com>
To: Andrew Morton
cc: Baolin Wang, Baoquan He, Barry Song <21cnbao@gmail.com>, Chris Li,
    David Rientjes, Kairui Song, Kemeng Shi, Shakeel Butt,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH mm-new 1/2] mm/shmem: hold shmem_swaplist spinlock (not mutex) much less
Message-ID: <87beaec6-a3b0-ce7a-c892-1e1e5bd57aa3@google.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
A flamegraph (from an MGLRU load) showed shmem_writeout()'s use of the global
shmem_swaplist_mutex worryingly hot: improvement is long overdue.

3.1 commit 6922c0c7abd3 ("tmpfs: convert shmem_writepage and enable swap")
apologized for extending shmem_swaplist_mutex across add_to_swap_cache(), and
hoped to find another way: yes, there may be lots of work to allocate radix
tree nodes in there.  Then 6.15 commit b487a2da3575 ("mm, swap: simplify folio
swap allocation") will have made it worse, by moving shmem_writeout()'s swap
allocation under that mutex too (but the worrying flamegraph was observed even
before that change).

There's a useful comment about pagelock no longer protecting from eviction
once moved to swap cache: but it's good till shmem_delete_from_page_cache()
replaces page pointer by swap entry, so move the swaplist add between them.

We would much prefer to take the global lock once per inode than once per
page: given the possible races with shmem_unuse() pruning when !swapped (and
other tasks racing to swap other pages out or in), try the swaplist add
whenever swapped was incremented from 0 (but inode may already be on the list
- only unuse and evict bother to remove it).

This technique is more subtle than it looks (we're avoiding the very lock
which would make it easy), but works: whereas an unlocked list_empty() check
runs a risk of the inode being unqueued and left off the swaplist forever,
swapoff only completing when the page is faulted in or removed.

The need for a sleepable mutex went away in 5.1 commit b56a2d8af914 ("mm: rid
swapoff of quadratic complexity"): a spinlock works better now.

This commit is certain to take shmem_swaplist_mutex out of contention, and has
been seen to make a practical improvement (but there is likely to have been an
underlying issue which made its contention so visible).

Signed-off-by: Hugh Dickins <hughd@google.com>
---
 mm/shmem.c | 59 ++++++++++++++++++++++++++++++-----------------------
 1 file changed, 33 insertions(+), 26 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 60247dc48505..33675361031b 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -292,7 +292,7 @@ bool vma_is_shmem(struct vm_area_struct *vma)
 }
 
 static LIST_HEAD(shmem_swaplist);
-static DEFINE_MUTEX(shmem_swaplist_mutex);
+static DEFINE_SPINLOCK(shmem_swaplist_lock);
 
 #ifdef CONFIG_TMPFS_QUOTA
 
@@ -432,10 +432,13 @@ static void shmem_free_inode(struct super_block *sb, size_t freed_ispace)
  *
  * But normally info->alloced == inode->i_mapping->nrpages + info->swapped
  * So mm freed is info->alloced - (inode->i_mapping->nrpages + info->swapped)
+ *
+ * Return: true if swapped was incremented from 0, for shmem_writeout().
  */
-static void shmem_recalc_inode(struct inode *inode, long alloced, long swapped)
+static bool shmem_recalc_inode(struct inode *inode, long alloced, long swapped)
 {
 	struct shmem_inode_info *info = SHMEM_I(inode);
+	bool first_swapped = false;
 	long freed;
 
 	spin_lock(&info->lock);
@@ -450,8 +453,11 @@ static void shmem_recalc_inode(struct inode *inode, long alloced, long swapped)
 	 * to stop a racing shmem_recalc_inode() from thinking that a page has
 	 * been freed. Compensate here, to avoid the need for a followup call.
 	 */
-	if (swapped > 0)
+	if (swapped > 0) {
+		if (info->swapped == swapped)
+			first_swapped = true;
 		freed += swapped;
+	}
 	if (freed > 0)
 		info->alloced -= freed;
 	spin_unlock(&info->lock);
@@ -459,6 +465,7 @@ static void shmem_recalc_inode(struct inode *inode, long alloced, long swapped)
 	/* The quota case may block */
 	if (freed > 0)
 		shmem_inode_unacct_blocks(inode, freed);
+	return first_swapped;
 }
 
 bool shmem_charge(struct inode *inode, long pages)
@@ -1399,11 +1406,11 @@ static void shmem_evict_inode(struct inode *inode)
 			/* Wait while shmem_unuse() is scanning this inode... */
 			wait_var_event(&info->stop_eviction,
 				       !atomic_read(&info->stop_eviction));
-			mutex_lock(&shmem_swaplist_mutex);
+			spin_lock(&shmem_swaplist_lock);
 			/* ...but beware of the race if we peeked too early */
 			if (!atomic_read(&info->stop_eviction))
 				list_del_init(&info->swaplist);
-			mutex_unlock(&shmem_swaplist_mutex);
+			spin_unlock(&shmem_swaplist_lock);
 		}
 	}
@@ -1526,7 +1533,7 @@ int shmem_unuse(unsigned int type)
 	if (list_empty(&shmem_swaplist))
 		return 0;
 
-	mutex_lock(&shmem_swaplist_mutex);
+	spin_lock(&shmem_swaplist_lock);
 start_over:
 	list_for_each_entry_safe(info, next, &shmem_swaplist, swaplist) {
 		if (!info->swapped) {
@@ -1540,12 +1547,12 @@ int shmem_unuse(unsigned int type)
 		 * (igrab() would protect from unlink, but not from unmount).
 		 */
 		atomic_inc(&info->stop_eviction);
-		mutex_unlock(&shmem_swaplist_mutex);
+		spin_unlock(&shmem_swaplist_lock);
 
 		error = shmem_unuse_inode(&info->vfs_inode, type);
 		cond_resched();
 
-		mutex_lock(&shmem_swaplist_mutex);
+		spin_lock(&shmem_swaplist_lock);
 		if (atomic_dec_and_test(&info->stop_eviction))
 			wake_up_var(&info->stop_eviction);
 		if (error)
@@ -1556,7 +1563,7 @@ int shmem_unuse(unsigned int type)
 		if (!info->swapped)
 			list_del_init(&info->swaplist);
 	}
-	mutex_unlock(&shmem_swaplist_mutex);
+	spin_unlock(&shmem_swaplist_lock);
 
 	return error;
 }
@@ -1646,30 +1653,30 @@ int shmem_writeout(struct folio *folio, struct swap_iocb **plug,
 			folio_mark_uptodate(folio);
 	}
 
-	/*
-	 * Add inode to shmem_unuse()'s list of swapped-out inodes,
-	 * if it's not already there. Do it now before the folio is
-	 * moved to swap cache, when its pagelock no longer protects
-	 * the inode from eviction. But don't unlock the mutex until
-	 * we've incremented swapped, because shmem_unuse_inode() will
-	 * prune a !swapped inode from the swaplist under this mutex.
-	 */
-	mutex_lock(&shmem_swaplist_mutex);
-	if (list_empty(&info->swaplist))
-		list_add(&info->swaplist, &shmem_swaplist);
-
 	if (!folio_alloc_swap(folio, __GFP_HIGH | __GFP_NOMEMALLOC | __GFP_NOWARN)) {
-		shmem_recalc_inode(inode, 0, nr_pages);
+		bool first_swapped = shmem_recalc_inode(inode, 0, nr_pages);
+
+		/*
+		 * Add inode to shmem_unuse()'s list of swapped-out inodes,
+		 * if it's not already there. Do it now before the folio is
+		 * removed from page cache, when its pagelock no longer
+		 * protects the inode from eviction. And do it now, after
+		 * we've incremented swapped, because shmem_unuse() will
+		 * prune a !swapped inode from the swaplist.
+		 */
+		if (first_swapped) {
+			spin_lock(&shmem_swaplist_lock);
+			if (list_empty(&info->swaplist))
+				list_add(&info->swaplist, &shmem_swaplist);
+			spin_unlock(&shmem_swaplist_lock);
+		}
+
 		swap_shmem_alloc(folio->swap, nr_pages);
 		shmem_delete_from_page_cache(folio, swp_to_radix_entry(folio->swap));
 
-		mutex_unlock(&shmem_swaplist_mutex);
 		BUG_ON(folio_mapped(folio));
 		return swap_writeout(folio, plug);
 	}
 
-	if (!info->swapped)
-		list_del_init(&info->swaplist);
-	mutex_unlock(&shmem_swaplist_mutex);
 	if (nr_pages > 1)
 		goto try_split;
 redirty:
-- 
2.43.0
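
For readers outside mm/: below is a minimal user-space sketch of the locking
pattern the commit message describes, with invented names (struct obj,
obj_swap_out(), global_list) and pthread mutexes standing in for the kernel's
per-inode lock and the new global spinlock. Each object's swapped count is
updated under its own lock, and only the zero-to-nonzero transition takes the
global list lock, tolerating that the object may already be queued. It is an
illustration of the idea, not part of the patch.

/*
 * Illustrative sketch only (not kernel code): take the global list lock
 * only when an object's swapped count goes from 0 to nonzero.
 */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

struct obj {
	pthread_mutex_t lock;	/* stands in for info->lock */
	long swapped;		/* pages of this object now in swap */
	struct obj *next;	/* link on the global list */
	bool queued;
};

static pthread_mutex_t global_lock = PTHREAD_MUTEX_INITIALIZER;
static struct obj *global_list;	/* stands in for shmem_swaplist */

/* Account @nr newly swapped pages; report whether count went 0 -> nonzero. */
static bool obj_add_swapped(struct obj *o, long nr)
{
	bool first;

	pthread_mutex_lock(&o->lock);
	o->swapped += nr;
	first = (o->swapped == nr);	/* counter was 0 before this call */
	pthread_mutex_unlock(&o->lock);
	return first;
}

static void obj_swap_out(struct obj *o, long nr)
{
	/*
	 * Only the first swapped page of an object pays for the global
	 * lock; later pages skip it entirely.  The object may already be
	 * queued (only the "unuse" and "evict" paths dequeue), so re-check
	 * under the global lock, as the patch re-checks list_empty().
	 */
	if (obj_add_swapped(o, nr)) {
		pthread_mutex_lock(&global_lock);
		if (!o->queued) {
			o->next = global_list;
			global_list = o;
			o->queued = true;
		}
		pthread_mutex_unlock(&global_lock);
	}
}

int main(void)
{
	struct obj o = { .lock = PTHREAD_MUTEX_INITIALIZER };

	obj_swap_out(&o, 4);	/* first write-out: queues the object */
	obj_swap_out(&o, 2);	/* later write-out: no global lock taken */
	printf("swapped=%ld queued=%d\n", o.swapped, o.queued);
	return 0;
}

The re-check of queued under the global lock mirrors the list_empty() check
kept in shmem_writeout(): the zero-to-nonzero report alone is not enough,
because only the unuse and evict paths ever remove an object from the list.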