Date: Thu, 9 Jan 2025 10:07:54 +0800
From: Baoquan He
To: Kairui Song
Cc: linux-mm@kvack.org, Andrew Morton, Chris Li, Barry Song, Ryan Roberts,
 Hugh Dickins, Yosry Ahmed, "Huang, Ying", Nhat Pham, Johannes Weiner,
 Kalesh Singh, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3 10/13] mm, swap: simplify percpu cluster updating
References: <20241230174621.61185-1-ryncsn@gmail.com> <20241230174621.61185-11-ryncsn@gmail.com>
In-Reply-To: <20241230174621.61185-11-ryncsn@gmail.com>
On 12/31/24 at 01:46am, Kairui Song wrote:
> From: Kairui Song
> 
> Instead of using a returning argument, we can simply store the next
> cluster offset to the fixed percpu location, which reduce the stack
> usage and simplify the function:
> 
> Object size:
> ./scripts/bloat-o-meter mm/swapfile.o mm/swapfile.o.new
> add/remove: 0/0 grow/shrink: 0/2 up/down: 0/-271 (-271)
> Function                        old     new   delta
> get_swap_pages                 2847    2733    -114
> alloc_swap_scan_cluster         894     737    -157
> Total: Before=30833, After=30562, chg -0.88%
> 
> Stack usage:
> Before:
> swapfile.c:1190:5:get_swap_pages       240     static
> 
> After:
> swapfile.c:1185:5:get_swap_pages       216     static
> 
> Signed-off-by: Kairui Song
> ---
>  include/linux/swap.h |  4 +--
>  mm/swapfile.c        | 66 +++++++++++++++++++-------------------------
>  2 files changed, 31 insertions(+), 39 deletions(-)
> 
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index c4ff31cb6bde..4c1d2e69689f 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -275,9 +275,9 @@ enum swap_cluster_flags {
>   * The first page in the swap file is the swap header, which is always marked
>   * bad to prevent it from being allocated as an entry. This also prevents the
>   * cluster to which it belongs being marked free. Therefore 0 is safe to use as
> - * a sentinel to indicate next is not valid in percpu_cluster.
> + * a sentinel to indicate an entry is not valid.
>   */
> -#define SWAP_NEXT_INVALID	0
> +#define SWAP_ENTRY_INVALID	0
>  
>  #ifdef CONFIG_THP_SWAP
>  #define SWAP_NR_ORDERS		(PMD_ORDER + 1)
> 
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index dadd4fead689..60a650ba88fd 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -759,23 +759,23 @@ static bool cluster_alloc_range(struct swap_info_struct *si, struct swap_cluster
>  	return true;
>  }
>  
> -static unsigned int alloc_swap_scan_cluster(struct swap_info_struct *si, unsigned long offset,
> -					    unsigned int *foundp, unsigned int order,
> +/* Try use a new cluster for current CPU and allocate from it. */
> +static unsigned int alloc_swap_scan_cluster(struct swap_info_struct *si,
> +					    struct swap_cluster_info *ci,
> +					    unsigned long offset,
> +					    unsigned int order,
>  					    unsigned char usage)
>  {
> -	unsigned long start = offset & ~(SWAPFILE_CLUSTER - 1);
> +	unsigned int next = SWAP_ENTRY_INVALID, found = SWAP_ENTRY_INVALID;
> +	unsigned long start = ALIGN_DOWN(offset, SWAPFILE_CLUSTER);
>  	unsigned long end = min(start + SWAPFILE_CLUSTER, si->max);
>  	unsigned int nr_pages = 1 << order;
>  	bool need_reclaim, ret;
> -	struct swap_cluster_info *ci;
>  
> -	ci = &si->cluster_info[offset / SWAPFILE_CLUSTER];
>  	lockdep_assert_held(&ci->lock);
>  
> -	if (end < nr_pages || ci->count + nr_pages > SWAPFILE_CLUSTER) {
> -		offset = SWAP_NEXT_INVALID;
> +	if (end < nr_pages || ci->count + nr_pages > SWAPFILE_CLUSTER)
>  		goto out;
> -	}
>  
>  	for (end -= nr_pages; offset <= end; offset += nr_pages) {
>  		need_reclaim = false;
> @@ -789,34 +789,27 @@ static unsigned int alloc_swap_scan_cluster(struct swap_info_struct *si, unsigne
>  			 * cluster has no flag set, and change of list
>  			 * won't cause fragmentation.
>  			 */
> -			if (!cluster_is_usable(ci, order)) {
> -				offset = SWAP_NEXT_INVALID;
> +			if (!cluster_is_usable(ci, order))
>  				goto out;
> -			}
>  			if (cluster_is_free(ci))
>  				offset = start;
>  			/* Reclaim failed but cluster is usable, try next */
>  			if (!ret)
>  				continue;
>  		}
> -		if (!cluster_alloc_range(si, ci, offset, usage, order)) {
> -			offset = SWAP_NEXT_INVALID;
> -			goto out;
> -		}
> -		*foundp = offset;
> -		if (ci->count == SWAPFILE_CLUSTER) {
> -			offset = SWAP_NEXT_INVALID;
> -			goto out;
> -		}
> +		if (!cluster_alloc_range(si, ci, offset, usage, order))
> +			break;
> +		found = offset;
>  		offset += nr_pages;
> +		if (ci->count < SWAPFILE_CLUSTER && offset <= end)
> +			next = offset;
>  		break;
>  	}
> -	if (offset > end)
> -		offset = SWAP_NEXT_INVALID;
>  out:
>  	relocate_cluster(si, ci);
>  	unlock_cluster(ci);
> -	return offset;
> +	__this_cpu_write(si->percpu_cluster->next[order], next);
> +	return found;
>  }
>  
>  /* Return true if reclaimed a whole cluster */
> @@ -885,8 +878,8 @@ static unsigned long cluster_alloc_swap_entry(struct swap_info_struct *si, int o
>  	if (cluster_is_usable(ci, order)) {
>  		if (cluster_is_free(ci))
>  			offset = cluster_offset(si, ci);
> -		offset = alloc_swap_scan_cluster(si, offset, &found,
> -						 order, usage);
> +		found = alloc_swap_scan_cluster(si, ci, offset,
> +						order, usage);
>  	} else {
>  		unlock_cluster(ci);
>  	}
> @@ -897,8 +890,8 @@ static unsigned long cluster_alloc_swap_entry(struct swap_info_struct *si, int o
>  new_cluster:
>  	ci = cluster_isolate_lock(si, &si->free_clusters);
>  	if (ci) {
> -		offset = alloc_swap_scan_cluster(si, cluster_offset(si, ci),
> -						 &found, order, usage);
> +		found = alloc_swap_scan_cluster(si, ci, cluster_offset(si, ci),
> +						order, usage);
>  		if (found)
>  			goto done;
>  	}
> @@ -911,8 +904,8 @@ static unsigned long cluster_alloc_swap_entry(struct swap_info_struct *si, int o
>  		unsigned int frags = 0, frags_existing;
>  
>  		while ((ci = cluster_isolate_lock(si, &si->nonfull_clusters[order]))) {
> -			offset = alloc_swap_scan_cluster(si, cluster_offset(si, ci),
> -							 &found, order, usage);
> +			found = alloc_swap_scan_cluster(si, ci, cluster_offset(si, ci),
> +							order, usage);
>  			/*
>  			 * With `fragmenting` set to true, it will surely take
>  			 * the cluster off nonfull list
> @@ -932,8 +925,8 @@ static unsigned long cluster_alloc_swap_entry(struct swap_info_struct *si, int o
>  			 * per-CPU usage, but they could contain newly released
>  			 * reclaimable (eg. lazy-freed swap cache) slots.
>  			 */
> -			offset = alloc_swap_scan_cluster(si, cluster_offset(si, ci),
> -							 &found, order, usage);
> +			found = alloc_swap_scan_cluster(si, ci, cluster_offset(si, ci),
> +							order, usage);
>  			if (found)
>  				goto done;
>  			frags++;
> @@ -959,21 +952,20 @@ static unsigned long cluster_alloc_swap_entry(struct swap_info_struct *si, int o
>  		 */
>  		while ((ci = cluster_isolate_lock(si, &si->frag_clusters[o]))) {
>  			atomic_long_dec(&si->frag_cluster_nr[o]);
> -			offset = alloc_swap_scan_cluster(si, cluster_offset(si, ci),
> -							 &found, order, usage);
> +			found = alloc_swap_scan_cluster(si, ci, cluster_offset(si, ci),
> +							0, usage);
>  			if (found)
>  				goto done;
>  		}
>  
>  		while ((ci = cluster_isolate_lock(si, &si->nonfull_clusters[o]))) {
> -			offset = alloc_swap_scan_cluster(si, cluster_offset(si, ci),
> -							 &found, order, usage);
> +			found = alloc_swap_scan_cluster(si, ci, cluster_offset(si, ci),
> +							0, usage);
>  			if (found)
>  				goto done;
>  		}
>  	}
> done:
> -	__this_cpu_write(si->percpu_cluster->next[order], offset);
>  	local_unlock(&si->percpu_cluster->lock);

Do you think we still need to hold si->percpu_cluster->lock until the end of
the cluster_alloc_swap_entry() invocation? If so, we would need to hold the
lock for the whole period of walking through percpu_cluster->next,
free_clusters, nonfull_clusters and frag_clusters until we get one available
slot, even though we keep updating si->percpu_cluster->next[order] along the
way. I can't see the point of changing it like this.

> 
>  	return found;
> @@ -3194,7 +3186,7 @@ static struct swap_cluster_info *setup_clusters(struct swap_info_struct *si,
>  
>  		cluster = per_cpu_ptr(si->percpu_cluster, cpu);
>  		for (i = 0; i < SWAP_NR_ORDERS; i++)
> -			cluster->next[i] = SWAP_NEXT_INVALID;
> +			cluster->next[i] = SWAP_ENTRY_INVALID;
>  		local_lock_init(&cluster->lock);
>  	}
> 
> -- 
> 2.47.1