From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C00991E0DD8 for ; Mon, 15 Dec 2025 03:57:50 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1765771072; cv=none; b=FiE6V87YJmGgkDW02bhW6EJ2t+V+o1Zma6FfgBQShzO70ZZboCHYVREKpOmq2Wshpk6k9fZG6n/i+jjVtD7A1SV/z9GX6DdvQVlp6+r1Tp6HPqpXrdgBVTpUuKdhu69yrqHAdn53WCMZxKqWO1F4y0Iix4Yjx22xeTqubSzAnvE= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1765771072; c=relaxed/simple; bh=h/BT/0hDeecdoYWsP8VwppJsiTYbaD+cOPpwaRkjcIY=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=stoqUs3Wy9uZ91VRAgsso0dQOojVsuFjF+dIi986ekOrq8592C0KKsJhB+PorquGk5vOrJfquE8oacgfpHirEagGZslNU5R0hMgdtL+BXGT+YuPQK0Sw4rFAZGl5HelflgZNnRdBp1WMu1Hl0py+ukB1lDJEcw+GccDkbtWQr3U= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=DeA3RqhC; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="DeA3RqhC" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1765771069; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=vP69h/tudeYRvb+VTJJItOX2+7RkyUHJiC5JDYMONtA=; b=DeA3RqhCuRTsFmZICuxMvDdiibvtJrO04kiTYN7/yGikvEZGmw7vx4YcXvlgYBMydFdDCT FTWjd2u37OmhH3x8+zH1m0tYDXDkVzBQ7Fgx3h9+32f/ZzUhE+6PdmFrZl1jjLg4WlBlLz aXEEPwsFAav5fAGd4zQXnR9NwOPRj+E= Received: from mx-prod-mc-06.mail-002.prod.us-west-2.aws.redhat.com (ec2-35-165-154-97.us-west-2.compute.amazonaws.com [35.165.154.97]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-65-_gwCH_5MOje5wl9DGQUINg-1; Sun, 14 Dec 2025 22:57:44 -0500 X-MC-Unique: _gwCH_5MOje5wl9DGQUINg-1 X-Mimecast-MFC-AGG-ID: _gwCH_5MOje5wl9DGQUINg_1765771062 Received: from mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.4]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-06.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 2F6131800451; Mon, 15 Dec 2025 03:57:41 +0000 (UTC) Received: from localhost (unknown [10.72.112.95]) by mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 9E0B230001A2; Mon, 15 Dec 2025 03:57:37 +0000 (UTC) Date: Mon, 15 Dec 2025 11:57:33 +0800 From: Baoquan He To: Kairui Song Cc: linux-mm@kvack.org, Andrew Morton , Barry Song , Chris Li , Nhat Pham , Yosry Ahmed , David Hildenbrand , Johannes Weiner , Youngjun Park , Hugh Dickins , Baolin Wang , Ying Huang , Kemeng Shi , Lorenzo Stoakes , "Matthew Wilcox (Oracle)" , linux-kernel@vger.kernel.org, Kairui Song Subject: Re: [PATCH v4 09/19] mm, swap: swap entry of a bad slot should not be considered as swapped out Message-ID: References: <20251205-swap-table-p2-v4-0-cb7e28a26a40@tencent.com> <20251205-swap-table-p2-v4-9-cb7e28a26a40@tencent.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20251205-swap-table-p2-v4-9-cb7e28a26a40@tencent.com> X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.4 On 12/05/25 at 03:29am, Kairui Song wrote: > From: Kairui Song > > When checking if a swap entry is swapped out, we simply check if the > bitwise result of the count value is larger than 0. But SWAP_MAP_BAD > will also be considered as a swao count value larger than 0. > > SWAP_MAP_BAD being considered as a count value larger than 0 is useful > for the swap allocator: they will be seen as a used slot, so the > allocator will skip them. But for the swapped out check, this > isn't correct. > > There is currently no observable issue. The swapped out check is only > useful for readahead and folio swapped-out status check. For readahead, > the swap cache layer will abort upon checking and updating the swap map. > For the folio swapped out status check, the swap allocator will never > allocate an entry of bad slots to folio, so that part is fine too. The > worst that could happen now is redundant allocation/freeing of folios > and waste CPU time. > > This also makes it easier to get rid of swap map checking and update > during folio insertion in the swap cache layer. Will swap_entry_swapped() be called in other places in phase 3 of swap table? I checked the code of the current phase 2, in all three places you convert it with swp_offset(entry), is it necessary? Why don't you keep the swap_entry_swapped(si, entry)? Surely, this is trivial. mm/swap_state.c <> if (!swap_entry_swapped(si, swp_offset(entry))) mm/swapfile.c <> return swap_entry_swapped(si, swp_offset(entry)); mm/swapfile.c <> WARN_ON(swap_entry_swapped(si, offset)); > > Signed-off-by: Kairui Song > --- > include/linux/swap.h | 6 ++++-- > mm/swap_state.c | 4 ++-- > mm/swapfile.c | 22 +++++++++++----------- > 3 files changed, 17 insertions(+), 15 deletions(-) > > diff --git a/include/linux/swap.h b/include/linux/swap.h > index bf72b548a96d..936fa8f9e5f3 100644 > --- a/include/linux/swap.h > +++ b/include/linux/swap.h > @@ -466,7 +466,8 @@ int find_first_swap(dev_t *device); > extern unsigned int count_swap_pages(int, int); > extern sector_t swapdev_block(int, pgoff_t); > extern int __swap_count(swp_entry_t entry); > -extern bool swap_entry_swapped(struct swap_info_struct *si, swp_entry_t entry); > +extern bool swap_entry_swapped(struct swap_info_struct *si, > + unsigned long offset); > extern int swp_swapcount(swp_entry_t entry); > struct backing_dev_info; > extern struct swap_info_struct *get_swap_device(swp_entry_t entry); > @@ -535,7 +536,8 @@ static inline int __swap_count(swp_entry_t entry) > return 0; > } > > -static inline bool swap_entry_swapped(struct swap_info_struct *si, swp_entry_t entry) > +static inline bool swap_entry_swapped(struct swap_info_struct *si, > + unsigned long offset) > { > return false; > } > diff --git a/mm/swap_state.c b/mm/swap_state.c > index 8c429dc33ca9..0c5aad537716 100644 > --- a/mm/swap_state.c > +++ b/mm/swap_state.c > @@ -527,8 +527,8 @@ struct folio *swap_cache_alloc_folio(swp_entry_t entry, gfp_t gfp_mask, > if (folio) > return folio; > > - /* Skip allocation for unused swap slot for readahead path. */ > - if (!swap_entry_swapped(si, entry)) > + /* Skip allocation for unused and bad swap slot for readahead. */ > + if (!swap_entry_swapped(si, swp_offset(entry))) > return NULL; > > /* Allocate a new folio to be added into the swap cache. */ > diff --git a/mm/swapfile.c b/mm/swapfile.c > index e23287c06f1c..5a766d4fcaa5 100644 > --- a/mm/swapfile.c > +++ b/mm/swapfile.c > @@ -1766,21 +1766,21 @@ int __swap_count(swp_entry_t entry) > return swap_count(si->swap_map[offset]); > } > > -/* > - * How many references to @entry are currently swapped out? > - * This does not give an exact answer when swap count is continued, > - * but does include the high COUNT_CONTINUED flag to allow for that. > +/** > + * swap_entry_swapped - Check if the swap entry at @offset is swapped. > + * @si: the swap device. > + * @offset: offset of the swap entry. > */ > -bool swap_entry_swapped(struct swap_info_struct *si, swp_entry_t entry) > +bool swap_entry_swapped(struct swap_info_struct *si, unsigned long offset) > { > - pgoff_t offset = swp_offset(entry); > struct swap_cluster_info *ci; > int count; > > ci = swap_cluster_lock(si, offset); > count = swap_count(si->swap_map[offset]); > swap_cluster_unlock(ci); > - return !!count; > + > + return count && count != SWAP_MAP_BAD; > } > > /* > @@ -1866,7 +1866,7 @@ static bool folio_swapped(struct folio *folio) > return false; > > if (!IS_ENABLED(CONFIG_THP_SWAP) || likely(!folio_test_large(folio))) > - return swap_entry_swapped(si, entry); > + return swap_entry_swapped(si, swp_offset(entry)); > > return swap_page_trans_huge_swapped(si, entry, folio_order(folio)); > } > @@ -3677,10 +3677,10 @@ static int __swap_duplicate(swp_entry_t entry, unsigned char usage, int nr) > count = si->swap_map[offset + i]; > > /* > - * swapin_readahead() doesn't check if a swap entry is valid, so the > - * swap entry could be SWAP_MAP_BAD. Check here with lock held. > + * Allocator never allocates bad slots, and readahead is guarded > + * by swap_entry_swapped. > */ > - if (unlikely(swap_count(count) == SWAP_MAP_BAD)) { > + if (WARN_ON(swap_count(count) == SWAP_MAP_BAD)) { > err = -ENOENT; > goto unlock_out; > } > > -- > 2.52.0 >