Date: Thu, 23 Feb 2023 15:46:25 -0800
From: Minchan Kim
To: Sergey Senozhatsky
Cc: Andrew Morton, Yosry Ahmed, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCHv2 4/6] zsmalloc: rework compaction algorithm
Message-ID:
References: <20230223030451.543162-1-senozhatsky@chromium.org>
 <20230223030451.543162-5-senozhatsky@chromium.org>
In-Reply-To: <20230223030451.543162-5-senozhatsky@chromium.org>
On Thu, Feb 23, 2023 at 12:04:49PM +0900, Sergey Senozhatsky wrote:
> The zsmalloc compaction algorithm has the potential to
> waste some CPU cycles, particularly when compacting pages
> within the same fullness group.
> This is due to the way it
> selects the head page of the fullness list for source and
> destination pages, and how it reinserts those pages during
> each iteration. The algorithm may first use a page as a
> migration destination and then as a migration source,
> leading to an unnecessary back-and-forth movement of
> objects.
>
> Consider the following fullness list:
>
> PageA PageB PageC PageD PageE
>
> During the first iteration, the compaction algorithm will
> select PageA as the source and PageB as the destination.
> All of PageA's objects will be moved to PageB, and then
> PageA will be released while PageB is reinserted into the
> fullness list.
>
> PageB PageC PageD PageE
>
> During the next iteration, the compaction algorithm will
> again select the head of the list as the source and destination,
> meaning that PageB will now serve as the source and PageC as
> the destination. This will result in the objects being moved
> away from PageB, the same objects that were just moved to PageB
> in the previous iteration.
>
> To prevent this avalanche effect, the compaction algorithm

Good point.

> should not reinsert the destination page between iterations.
> By doing so, the most optimal page will continue to be used
> and its usage ratio will increase, reducing internal
> fragmentation. The destination page should only be reinserted
> into the fullness list if:
> - It becomes full
> - No source page is available.

I think that's really the better option, yeah.
>
> Signed-off-by: Sergey Senozhatsky
> ---
>  mm/zsmalloc.c | 82 ++++++++++++++++++++++++---------------------------
>  1 file changed, 38 insertions(+), 44 deletions(-)
>
> diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
> index 1a92ebe338eb..eacf9e32da5c 100644
> --- a/mm/zsmalloc.c
> +++ b/mm/zsmalloc.c
> @@ -1786,15 +1786,14 @@ struct zs_compact_control {
>  	int obj_idx;
>  };
>
> -static int migrate_zspage(struct zs_pool *pool, struct size_class *class,
> -			  struct zs_compact_control *cc)
> +static void migrate_zspage(struct zs_pool *pool, struct size_class *class,
> +			   struct zs_compact_control *cc)
>  {
>  	unsigned long used_obj, free_obj;
>  	unsigned long handle;
>  	struct page *s_page = cc->s_page;
>  	struct page *d_page = cc->d_page;
>  	int obj_idx = cc->obj_idx;
> -	int ret = 0;
>
>  	while (1) {
>  		handle = find_alloced_obj(class, s_page, &obj_idx);
> @@ -1807,10 +1806,8 @@ static int migrate_zspage(struct zs_pool *pool, struct size_class *class,
>  		}
>
>  		/* Stop if there is no more space */
> -		if (zspage_full(class, get_zspage(d_page))) {
> -			ret = -ENOMEM;
> +		if (zspage_full(class, get_zspage(d_page)))
>  			break;
> -		}
>
>  		used_obj = handle_to_obj(handle);
>  		free_obj = obj_malloc(pool, get_zspage(d_page), handle);
> @@ -1823,8 +1820,6 @@ static int migrate_zspage(struct zs_pool *pool, struct size_class *class,
>  	/* Remember last position in this iteration */
>  	cc->s_page = s_page;
>  	cc->obj_idx = obj_idx;
> -
> -	return ret;
>  }
>
>  static struct zspage *isolate_src_zspage(struct size_class *class)
> @@ -2228,57 +2223,56 @@ static unsigned long __zs_compact(struct zs_pool *pool,
>  	 * as well as zpage allocation/free
>  	 */
>  	spin_lock(&pool->lock);
> -	while ((src_zspage = isolate_src_zspage(class))) {
> -		/* protect someone accessing the zspage(i.e., zs_map_object) */
> -		migrate_write_lock(src_zspage);
> -
> -		if (!zs_can_compact(class))
> -			break;
> -
> -		cc.obj_idx = 0;
> -		cc.s_page = get_first_page(src_zspage);
> -
> -		while ((dst_zspage = isolate_dst_zspage(class))) {
> -			migrate_write_lock_nested(dst_zspage);
> -
> +	while (1) {

Hmm, I preferred the old loop structure. Did you see any problem with
keeping the old code structure? Can't we just add a check after
migrate_zspage for whether the destination zspage is full, and put it
back if it is? Otherwise, keep continuing with the current source zspage
(or a new destination zspage) until we have completely migrated all
objects out of the source. If there are no more source zspages in the
list, we can break the loop and then put the destination zspage back
into the right class group after the loop ends.

> +		if (!dst_zspage) {
> +			dst_zspage = isolate_dst_zspage(class);
> +			if (!dst_zspage)
> +				goto out;
> +			migrate_write_lock(dst_zspage);
>  			cc.d_page = get_first_page(dst_zspage);
> -			/*
> -			 * If there is no more space in dst_page, resched
> -			 * and see if anyone had allocated another zspage.
> -			 */
> -			if (!migrate_zspage(pool, class, &cc))
> -				break;
> +		}
>
> +		if (!zs_can_compact(class)) {
>  			putback_zspage(class, dst_zspage);
>  			migrate_write_unlock(dst_zspage);
> -			dst_zspage = NULL;
> -			if (spin_is_contended(&pool->lock))
> -				break;
> +			goto out;
>  		}
>
> -		/* Stop if we couldn't find slot */
> -		if (dst_zspage == NULL)
> -			break;
> +		src_zspage = isolate_src_zspage(class);
> +		if (!src_zspage) {
> +			putback_zspage(class, dst_zspage);
> +			migrate_write_unlock(dst_zspage);
> +			goto out;
> +		}
>
> -		putback_zspage(class, dst_zspage);
> -		migrate_write_unlock(dst_zspage);
> +		migrate_write_lock_nested(src_zspage);
> +
> +		cc.obj_idx = 0;
> +		cc.s_page = get_first_page(src_zspage);
> +		migrate_zspage(pool, class, &cc);
>
>  		if (putback_zspage(class, src_zspage) == ZS_INUSE_RATIO_0) {
>  			migrate_write_unlock(src_zspage);
>  			free_zspage(pool, class, src_zspage);
>  			pages_freed += class->pages_per_zspage;
> -		} else
> +		} else {
>  			migrate_write_unlock(src_zspage);
> -		spin_unlock(&pool->lock);
> -		cond_resched();
> -		spin_lock(&pool->lock);
> -	}
> +		}
>
> -	if (src_zspage) {
> -		putback_zspage(class, src_zspage);
> -		migrate_write_unlock(src_zspage);
> -	}
> +		if (get_fullness_group(class, dst_zspage) == ZS_INUSE_RATIO_100
> +		    || spin_is_contended(&pool->lock)) {
> +			putback_zspage(class, dst_zspage);
> +			migrate_write_unlock(dst_zspage);
> +			dst_zspage = NULL;
> +		}
>
> +		if (!dst_zspage) {
> +			spin_unlock(&pool->lock);
> +			cond_resched();
> +			spin_lock(&pool->lock);
> +		}
> +	}
> +out:
>  	spin_unlock(&pool->lock);
>
>  	return pages_freed;
> --
> 2.39.2.637.g21b0678d19-goog
>
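
To make the loop shape I have in mind concrete, roughly something like
this (an untested sketch only -- locking details, the nested lock
ordering and the ZS_INUSE_RATIO_0 free path are elided; the function
names are the ones used in the patch):

```c
	/* Sketch: source-driven outer loop as before; the destination is
	 * put back and replaced only once it actually fills up. */
	while ((src_zspage = isolate_src_zspage(class))) {
		if (!zs_can_compact(class))
			break;

		cc.obj_idx = 0;
		cc.s_page = get_first_page(src_zspage);

		/* Drain this source completely, switching dst as needed. */
		for (;;) {
			if (!dst_zspage) {
				dst_zspage = isolate_dst_zspage(class);
				if (!dst_zspage)
					goto out;
				cc.d_page = get_first_page(dst_zspage);
			}

			migrate_zspage(pool, class, &cc);

			if (!zspage_full(class, get_zspage(cc.d_page)))
				break;	/* source fully drained */

			/* Destination filled up: put it back, take a new one. */
			putback_zspage(class, dst_zspage);
			dst_zspage = NULL;
		}

		putback_zspage(class, src_zspage);
	}
out:
	if (dst_zspage)
		putback_zspage(class, dst_zspage);
```

That keeps the "don't reinsert the destination between iterations" win
while preserving the old source-driven structure, I think.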