From: Barry Song
Date: Sat, 3 Aug 2024 18:38:41 +0800
Subject: Re: [PATCH v5 6/9] mm: swap: allow cache reclaim to skip slot cache
To: chrisl@kernel.org
Cc: Andrew Morton, Kairui Song, Hugh Dickins, Ryan Roberts, "Huang, Ying",
 Kalesh Singh, linux-kernel@vger.kernel.org, linux-mm@kvack.org
In-Reply-To: <20240730-swap-allocator-v5-6-cb9c148b9297@kernel.org>
References: <20240730-swap-allocator-v5-0-cb9c148b9297@kernel.org>
 <20240730-swap-allocator-v5-6-cb9c148b9297@kernel.org>
Content-Type: text/plain; charset="UTF-8"

On Wed, Jul 31, 2024 at 2:49 PM wrote:
>
> From: Kairui Song
>
> Currently we free the reclaimed slots through the slot cache even
> if the slot is required to be empty immediately. As a result the
> reclaim caller will see the slot still occupied even after a
> successful reclaim, and needs to keep reclaiming until the slot
> cache gets flushed. This causes ineffective or over-reclaim when
> swap is under stress.
>
> So introduce a new flag allowing the slot to be emptied, bypassing
> the slot cache.
>
> Signed-off-by: Kairui Song
> ---
>  mm/swapfile.c | 152 +++++++++++++++++++++++++++++++++++++++++-----------------
>  1 file changed, 109 insertions(+), 43 deletions(-)
>
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 9b63b2262cc2..4c0fc0409d3c 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -53,8 +53,15 @@
>  static bool swap_count_continued(struct swap_info_struct *, pgoff_t,
>  				 unsigned char);
>  static void free_swap_count_continuations(struct swap_info_struct *);
> +static void swap_entry_range_free(struct swap_info_struct *si, swp_entry_t entry,
> +				  unsigned int nr_pages);
>  static void swap_range_alloc(struct swap_info_struct *si, unsigned long offset,
>  			     unsigned int nr_entries);
> +static bool folio_swapcache_freeable(struct folio *folio);
> +static struct swap_cluster_info *lock_cluster_or_swap_info(
> +		struct swap_info_struct *si, unsigned long offset);
> +static void unlock_cluster_or_swap_info(struct swap_info_struct *si,
> +					struct swap_cluster_info *ci);
>
>  static DEFINE_SPINLOCK(swap_lock);
>  static unsigned int nr_swapfiles;
> @@ -129,8 +136,25 @@ static inline unsigned char swap_count(unsigned char ent)
>   * corresponding page
>   */
>  #define TTRS_UNMAPPED		0x2
> -/* Reclaim the swap entry if swap is getting full*/
> +/* Reclaim the swap entry if swap is getting full */
>  #define TTRS_FULL		0x4
> +/* Reclaim directly, bypass the slot cache and don't touch device lock */
> +#define TTRS_DIRECT		0x8
> +
> +static bool swap_is_has_cache(struct swap_info_struct *si,
> +			      unsigned long offset, int nr_pages)
> +{
> +	unsigned char *map = si->swap_map + offset;
> +	unsigned char *map_end = map + nr_pages;
> +
> +	do {
> +		VM_BUG_ON(!(*map & SWAP_HAS_CACHE));
> +		if (*map != SWAP_HAS_CACHE)
> +			return false;
> +	} while (++map < map_end);
> +
> +	return true;
> +}
>
>  /*
>   * returns number of pages in the folio that backs the swap entry. If positive,
> @@ -141,12 +165,22 @@ static int __try_to_reclaim_swap(struct swap_info_struct *si,
>  				 unsigned long offset, unsigned long flags)
>  {
>  	swp_entry_t entry = swp_entry(si->type, offset);
> +	struct address_space *address_space = swap_address_space(entry);
> +	struct swap_cluster_info *ci;
>  	struct folio *folio;
> -	int ret = 0;
> +	int ret, nr_pages;
> +	bool need_reclaim;
>
> -	folio = filemap_get_folio(swap_address_space(entry), swap_cache_index(entry));
> +	folio = filemap_get_folio(address_space, swap_cache_index(entry));
>  	if (IS_ERR(folio))
>  		return 0;
> +
> +	/* offset could point to the middle of a large folio */
> +	entry = folio->swap;
> +	offset = swp_offset(entry);
> +	nr_pages = folio_nr_pages(folio);
> +	ret = -nr_pages;
> +
>  	/*
>  	 * When this function is called from scan_swap_map_slots() and it's
>  	 * called by vmscan.c at reclaiming folios. So we hold a folio lock
> @@ -154,14 +188,50 @@ static int __try_to_reclaim_swap(struct swap_info_struct *si,
>  	 * case and you should use folio_free_swap() with explicit folio_lock()
>  	 * in usual operations.
>  	 */
> -	if (folio_trylock(folio)) {
> -		if ((flags & TTRS_ANYWAY) ||
> -		    ((flags & TTRS_UNMAPPED) && !folio_mapped(folio)) ||
> -		    ((flags & TTRS_FULL) && mem_cgroup_swap_full(folio)))
> -			ret = folio_free_swap(folio);
> -		folio_unlock(folio);
> +	if (!folio_trylock(folio))
> +		goto out;
> +
> +	need_reclaim = ((flags & TTRS_ANYWAY) ||
> +			((flags & TTRS_UNMAPPED) && !folio_mapped(folio)) ||
> +			((flags & TTRS_FULL) && mem_cgroup_swap_full(folio)));
> +	if (!need_reclaim || !folio_swapcache_freeable(folio))
> +		goto out_unlock;
> +
> +	/*
> +	 * It's safe to delete the folio from swap cache only if the folio's
> +	 * swap_map is HAS_CACHE only, which means the slots have no page table
> +	 * reference or pending writeback, and can't be allocated to others.
> +	 */
> +	ci = lock_cluster_or_swap_info(si, offset);
> +	need_reclaim = swap_is_has_cache(si, offset, nr_pages);
> +	unlock_cluster_or_swap_info(si, ci);
> +	if (!need_reclaim)
> +		goto out_unlock;
> +
> +	if (!(flags & TTRS_DIRECT)) {
> +		/* Free through slot cache */
> +		delete_from_swap_cache(folio);
> +		folio_set_dirty(folio);
> +		ret = nr_pages;
> +		goto out_unlock;
>  	}
> -	ret = ret ? folio_nr_pages(folio) : -folio_nr_pages(folio);
> +
> +	xa_lock_irq(&address_space->i_pages);
> +	__delete_from_swap_cache(folio, entry, NULL);
> +	xa_unlock_irq(&address_space->i_pages);
> +	folio_ref_sub(folio, nr_pages);
> +	folio_set_dirty(folio);
> +
> +	spin_lock(&si->lock);
> +	/* Only sinple page folio can be backed by zswap */
> +	if (!nr_pages)
> +		zswap_invalidate(entry);

I am trying to figure out if I am mad :-) Does nr_pages == 0 mean a
single page folio?

> +	swap_entry_range_free(si, entry, nr_pages);
> +	spin_unlock(&si->lock);
> +	ret = nr_pages;
> +out_unlock:
> +	folio_unlock(folio);
> +out:
>  	folio_put(folio);
>  	return ret;
>  }
> @@ -903,7 +973,7 @@ static int scan_swap_map_slots(struct swap_info_struct *si,
>  	if (vm_swap_full() && si->swap_map[offset] == SWAP_HAS_CACHE) {
>  		int swap_was_freed;
>  		spin_unlock(&si->lock);
> -		swap_was_freed = __try_to_reclaim_swap(si, offset, TTRS_ANYWAY);
> +		swap_was_freed = __try_to_reclaim_swap(si, offset, TTRS_ANYWAY | TTRS_DIRECT);
>  		spin_lock(&si->lock);
>  		/* entry was freed successfully, try to use this again */
>  		if (swap_was_freed > 0)
> @@ -1340,9 +1410,6 @@ void put_swap_folio(struct folio *folio, swp_entry_t entry)
>  	unsigned long offset = swp_offset(entry);
>  	struct swap_cluster_info *ci;
>  	struct swap_info_struct *si;
> -	unsigned char *map;
> -	unsigned int i, free_entries = 0;
> -	unsigned char val;
>  	int size = 1 << swap_entry_order(folio_order(folio));
>
>  	si = _swap_info_get(entry);
> @@ -1350,23 +1417,14 @@ void put_swap_folio(struct folio *folio, swp_entry_t entry)
>  		return;
>
>  	ci = lock_cluster_or_swap_info(si, offset);
> -	if (size > 1) {
> -		map = si->swap_map + offset;
> -		for (i = 0; i < size; i++) {
> -			val = map[i];
> -			VM_BUG_ON(!(val & SWAP_HAS_CACHE));
> -			if (val == SWAP_HAS_CACHE)
> -				free_entries++;
> -		}
> -		if (free_entries == size) {
> -			unlock_cluster_or_swap_info(si, ci);
> -			spin_lock(&si->lock);
> -			swap_entry_range_free(si, entry, size);
> -			spin_unlock(&si->lock);
> -			return;
> -		}
> +	if (size > 1 && swap_is_has_cache(si, offset, size)) {
> +		unlock_cluster_or_swap_info(si, ci);
> +		spin_lock(&si->lock);
> +		swap_entry_range_free(si, entry, size);
> +		spin_unlock(&si->lock);
> +		return;
>  	}
> -	for (i = 0; i < size; i++, entry.val++) {
> +	for (int i = 0; i < size; i++, entry.val++) {
>  		if (!__swap_entry_free_locked(si, offset + i, SWAP_HAS_CACHE)) {
>  			unlock_cluster_or_swap_info(si, ci);
>  			free_swap_slot(entry);
> @@ -1526,16 +1584,7 @@ static bool folio_swapped(struct folio *folio)
>  	return swap_page_trans_huge_swapped(si, entry, folio_order(folio));
>  }
>
> -/**
> - * folio_free_swap() - Free the swap space used for this folio.
> - * @folio: The folio to remove.
> - *
> - * If swap is getting full, or if there are no more mappings of this folio,
> - * then call folio_free_swap to free its swap space.
> - *
> - * Return: true if we were able to release the swap space.
> - */
> -bool folio_free_swap(struct folio *folio)
> +static bool folio_swapcache_freeable(struct folio *folio)
>  {
>  	VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
>
> @@ -1543,8 +1592,6 @@ bool folio_free_swap(struct folio *folio)
>  		return false;
>  	if (folio_test_writeback(folio))
>  		return false;
> -	if (folio_swapped(folio))
> -		return false;
>
>  	/*
>  	 * Once hibernation has begun to create its image of memory,
> @@ -1564,6 +1611,25 @@ bool folio_free_swap(struct folio *folio)
>  	if (pm_suspended_storage())
>  		return false;
>
> +	return true;
> +}
> +
> +/**
> + * folio_free_swap() - Free the swap space used for this folio.
> + * @folio: The folio to remove.
> + *
> + * If swap is getting full, or if there are no more mappings of this folio,
> + * then call folio_free_swap to free its swap space.
> + *
> + * Return: true if we were able to release the swap space.
> + */
> +bool folio_free_swap(struct folio *folio)
> +{
> +	if (!folio_swapcache_freeable(folio))
> +		return false;
> +	if (folio_swapped(folio))
> +		return false;
> +
>  	delete_from_swap_cache(folio);
>  	folio_set_dirty(folio);
>  	return true;
> @@ -1640,7 +1706,7 @@ void free_swap_and_cache_nr(swp_entry_t entry, int nr)
>  	 * to the next boundary.
>  	 */
>  	nr = __try_to_reclaim_swap(si, offset,
> -					   TTRS_UNMAPPED | TTRS_FULL);
> +				   TTRS_UNMAPPED | TTRS_FULL);
>  	if (nr == 0)
>  		nr = 1;
>  	else if (nr < 0)
>
> --
> 2.46.0.rc1.232.g9752f9e123-goog
>