Subject: Re: [PATCH net-next v13 08/14] mm: page_frag: some minor refactoring before adding new API
From: Alexander H Duyck
To: Yunsheng Lin, davem@davemloft.net, kuba@kernel.org, pabeni@redhat.com
Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org, Andrew Morton, linux-mm@kvack.org
Date: Wed, 14 Aug 2024 10:54:27 -0700
Message-ID: <7d16ba784eb564f9d556f532d670b9bc4698d913.camel@gmail.com>
In-Reply-To: <20240808123714.462740-9-linyunsheng@huawei.com>
References: <20240808123714.462740-1-linyunsheng@huawei.com> <20240808123714.462740-9-linyunsheng@huawei.com>

On Thu, 2024-08-08 at 20:37 +0800, Yunsheng Lin wrote:
> Refactor common codes from __page_frag_alloc_va_align()
> to __page_frag_cache_reload(), so that the new API can
> make use of them.
> 
> CC: Alexander Duyck
> Signed-off-by: Yunsheng Lin
> ---
>  include/linux/page_frag_cache.h |   2 +-
>  mm/page_frag_cache.c            | 138 ++++++++++++++++++--------------
>  2 files changed, 81 insertions(+), 59 deletions(-)
> 
> diff --git a/include/linux/page_frag_cache.h b/include/linux/page_frag_cache.h
> index 4ce924eaf1b1..0abffdd10a1c 100644
> --- a/include/linux/page_frag_cache.h
> +++ b/include/linux/page_frag_cache.h
> @@ -52,7 +52,7 @@ static inline void *encoded_page_address(unsigned long encoded_va)
>  
>  static inline void page_frag_cache_init(struct page_frag_cache *nc)
>  {
> -        nc->encoded_va = 0;
> +        memset(nc, 0, sizeof(*nc));
>  }
>  

Still not a fan of this. Just setting encoded_va to 0 should be enough,
as the other fields will automatically be overwritten when the new page
is allocated. Relying on memset is problematic at best, since you then
introduce the potential for issues where remaining somehow gets
corrupted while encoded_va/page is 0. I would rather have both of these
checked as part of allocation than just assume the cache is valid
because remaining is set.

I would prefer to keep the check for a non-0 encoded_page value and
then check remaining, rather than relying on remaining alone, as that
creates a single point of failure. With that we can safely tear away a
page, and the next caller that tries to allocate will populate a new
page and the associated fields.
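To say it in code, this is roughly what I have in mind (untested, just
reusing the names from this patch): init stays a single store, and the
allocator refuses to trust remaining until it has seen a non-zero
encoded_va.

static inline void page_frag_cache_init(struct page_frag_cache *nc)
{
        /* encoded_va doubles as the "cache is empty" flag; pagecnt_bias
         * and remaining are rewritten whenever a page is (re)loaded, so
         * there is no need to wipe the whole struct here.
         */
        nc->encoded_va = 0;
}

and in the allocation path:

        /* check for a page before looking at the byte count, so a bogus
         * remaining with encoded_va == 0 can never hand out an address
         */
        if (unlikely(!nc->encoded_va)) {
                if (!__page_frag_cache_reload(nc, gfp_mask))
                        return NULL;
        }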
>  static inline bool page_frag_cache_is_pfmemalloc(struct page_frag_cache *nc)
> diff --git a/mm/page_frag_cache.c b/mm/page_frag_cache.c
> index 2544b292375a..4e6b1c4684f0 100644
> --- a/mm/page_frag_cache.c
> +++ b/mm/page_frag_cache.c
> @@ -19,8 +19,27 @@
>  #include 
>  #include "internal.h"
>  
> -static struct page *__page_frag_cache_refill(struct page_frag_cache *nc,
> -                                             gfp_t gfp_mask)
> +static bool __page_frag_cache_reuse(unsigned long encoded_va,
> +                                    unsigned int pagecnt_bias)
> +{
> +        struct page *page;
> +
> +        page = virt_to_page((void *)encoded_va);
> +        if (!page_ref_sub_and_test(page, pagecnt_bias))
> +                return false;
> +
> +        if (unlikely(encoded_page_pfmemalloc(encoded_va))) {
> +                free_unref_page(page, encoded_page_order(encoded_va));
> +                return false;
> +        }
> +
> +        /* OK, page count is 0, we can safely set it */
> +        set_page_count(page, PAGE_FRAG_CACHE_MAX_SIZE + 1);
> +        return true;
> +}
> +
> +static bool __page_frag_cache_refill(struct page_frag_cache *nc,
> +                                     gfp_t gfp_mask)
>  {
>          unsigned long order = PAGE_FRAG_CACHE_MAX_ORDER;
>          struct page *page = NULL;
> @@ -35,8 +54,8 @@ static struct page *__page_frag_cache_refill(struct page_frag_cache *nc,
>          if (unlikely(!page)) {
>                  page = alloc_pages_node(NUMA_NO_NODE, gfp, 0);
>                  if (unlikely(!page)) {
> -                        nc->encoded_va = 0;
> -                        return NULL;
> +                        memset(nc, 0, sizeof(*nc));
> +                        return false;
>                  }
>  
>                  order = 0;
> @@ -45,7 +64,33 @@ static struct page *__page_frag_cache_refill(struct page_frag_cache *nc,
>          nc->encoded_va = encode_aligned_va(page_address(page), order,
>                                             page_is_pfmemalloc(page));
>  
> -        return page;
> +        /* Even if we own the page, we do not use atomic_set().
> +         * This would break get_page_unless_zero() users.
> +         */
> +        page_ref_add(page, PAGE_FRAG_CACHE_MAX_SIZE);
> +
> +        return true;
> +}
> +
> +/* Reload cache by reusing the old cache if it is possible, or
> + * refilling from the page allocator.
> + */
> +static bool __page_frag_cache_reload(struct page_frag_cache *nc,
> +                                     gfp_t gfp_mask)
> +{
> +        if (likely(nc->encoded_va)) {
> +                if (__page_frag_cache_reuse(nc->encoded_va, nc->pagecnt_bias))
> +                        goto out;
> +        }
> +
> +        if (unlikely(!__page_frag_cache_refill(nc, gfp_mask)))
> +                return false;
> +
> +out:
> +        /* reset page count bias and remaining to start of new frag */
> +        nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;
> +        nc->remaining = page_frag_cache_page_size(nc->encoded_va);

One thought I am having is that it might be better to set pagecnt_bias
at the same time as the page_ref_add or set_page_count call. Setting
the remaining value there as well probably makes sense too, since in
the refill case you can use the "order" value directly instead of
having to write it into and then read it back out of the encoded
va/page. With that we could simplify this function and get something
closer to what we had for the original alloc_va_align code.
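If it helps, something along these lines is what I am picturing -- an
untested sketch reusing the helpers from this patch, with the reuse
helper taking the cache pointer instead of (encoded_va, pagecnt_bias)
so it can do its own bookkeeping:

static bool __page_frag_cache_reuse(struct page_frag_cache *nc)
{
        unsigned long encoded_va = nc->encoded_va;
        struct page *page = virt_to_page((void *)encoded_va);

        if (!page_ref_sub_and_test(page, nc->pagecnt_bias))
                return false;

        if (unlikely(encoded_page_pfmemalloc(encoded_va))) {
                free_unref_page(page, encoded_page_order(encoded_va));
                return false;
        }

        /* OK, page count is 0, we can safely set it */
        set_page_count(page, PAGE_FRAG_CACHE_MAX_SIZE + 1);

        /* bias/remaining reset lives right next to the refcount reset */
        nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;
        nc->remaining = page_frag_cache_page_size(encoded_va);
        return true;
}

and at the tail of __page_frag_cache_refill(), while "order" is still
in hand:

        /* Even if we own the page, we do not use atomic_set().
         * This would break get_page_unless_zero() users.
         */
        page_ref_add(page, PAGE_FRAG_CACHE_MAX_SIZE);

        /* no need to decode the size back out of encoded_va */
        nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;
        nc->remaining = PAGE_SIZE << order;

        return true;

With that the reload helper collapses to little more than:

static bool __page_frag_cache_reload(struct page_frag_cache *nc,
                                     gfp_t gfp_mask)
{
        if (likely(nc->encoded_va) && __page_frag_cache_reuse(nc))
                return true;

        return __page_frag_cache_refill(nc, gfp_mask);
}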
> +        return true;
>  }
>  
>  void page_frag_cache_drain(struct page_frag_cache *nc)
> @@ -55,7 +100,7 @@ void page_frag_cache_drain(struct page_frag_cache *nc)
>  
>          __page_frag_cache_drain(virt_to_head_page((void *)nc->encoded_va),
>                                  nc->pagecnt_bias);
> -        nc->encoded_va = 0;
> +        memset(nc, 0, sizeof(*nc));
>  }
>  EXPORT_SYMBOL(page_frag_cache_drain);
>  
> @@ -73,67 +118,44 @@ void *__page_frag_alloc_va_align(struct page_frag_cache *nc,
>                                   unsigned int align_mask)
>  {
>          unsigned long encoded_va = nc->encoded_va;
> -        unsigned int size, remaining;
> -        struct page *page;
> -
> -        if (unlikely(!encoded_va)) {

We should still be checking this before we even touch remaining.
Otherwise we greatly increase the risk of handing out a bad virtual
address and greatly decrease the likelihood of catching such errors
gracefully.

> -refill:
> -                page = __page_frag_cache_refill(nc, gfp_mask);
> -                if (!page)
> -                        return NULL;
> -
> -                encoded_va = nc->encoded_va;
> -                size = page_frag_cache_page_size(encoded_va);
> -
> -                /* Even if we own the page, we do not use atomic_set().
> -                 * This would break get_page_unless_zero() users.
> -                 */
> -                page_ref_add(page, PAGE_FRAG_CACHE_MAX_SIZE);
> -
> -                /* reset page count bias and remaining to start of new frag */
> -                nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;
> -                nc->remaining = size;

With my suggested change above you could essentially just drop the
block starting from the comment, and this function wouldn't need to
change as much as it does.

> -        } else {
> -                size = page_frag_cache_page_size(encoded_va);
> -        }
> +        unsigned int remaining;
>  
>          remaining = nc->remaining & align_mask;
> -        if (unlikely(remaining < fragsz)) {
> -                if (unlikely(fragsz > PAGE_SIZE)) {
> -                        /*
> -                         * The caller is trying to allocate a fragment
> -                         * with fragsz > PAGE_SIZE but the cache isn't big
> -                         * enough to satisfy the request, this may
> -                         * happen in low memory conditions.
> -                         * We don't release the cache page because
> -                         * it could make memory pressure worse
> -                         * so we simply return NULL here.
> -                         */
> -                        return NULL;
> -                }
> -
> -                page = virt_to_page((void *)encoded_va);
>  
> -                if (!page_ref_sub_and_test(page, nc->pagecnt_bias))
> -                        goto refill;
> -
> -                if (unlikely(encoded_page_pfmemalloc(encoded_va))) {
> -                        free_unref_page(page, encoded_page_order(encoded_va));
> -                        goto refill;
> -                }

Likewise for this block here. We can essentially just make use of the
__page_frag_cache_reuse function without needing a complete rework of
the code (see the sketch at the end of this mail).

> +        /* As we have ensured remaining is zero when initializing and draining old
> +         * cache, 'remaining >= fragsz' checking is enough to indicate there is
> +         * enough available space for the new fragment allocation.
> +         */
> +        if (likely(remaining >= fragsz)) {
> +                nc->pagecnt_bias--;
> +                nc->remaining = remaining - fragsz;
>  
> -                /* OK, page count is 0, we can safely set it */
> -                set_page_count(page, PAGE_FRAG_CACHE_MAX_SIZE + 1);
> +                return encoded_page_address(encoded_va) +
> +                       (page_frag_cache_page_size(encoded_va) - remaining);
> +        }
>  
> -                /* reset page count bias and remaining to start of new frag */
> -                nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;
> -                remaining = size;
> +        if (unlikely(fragsz > PAGE_SIZE)) {
> +                /*
> +                 * The caller is trying to allocate a fragment with
> +                 * fragsz > PAGE_SIZE but the cache isn't big enough to satisfy
> +                 * the request, this may happen in low memory conditions. We don't
> +                 * release the cache page because it could make memory pressure
> +                 * worse so we simply return NULL here.
> +                 */
> +                return NULL;
>          }
>  
> +        if (unlikely(!__page_frag_cache_reload(nc, gfp_mask)))
> +                return NULL;
> +
> +        /* As the we are allocating fragment from cache by count-up way, the offset
> +         * of allocated fragment from the just reloaded cache is zero, so remaining
> +         * aligning and offset calculation are not needed.
> +         */
>          nc->pagecnt_bias--;
> -        nc->remaining = remaining - fragsz;
> +        nc->remaining -= fragsz;
>  
> -        return encoded_page_address(encoded_va) + (size - remaining);
> +        return encoded_page_address(nc->encoded_va);
>  }
>  EXPORT_SYMBOL(__page_frag_alloc_va_align);
>  
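Putting the pieces above together, the allocation path itself can stay
very close to the pre-patch shape. Again this is only an untested
sketch; it assumes __page_frag_cache_reuse()/__page_frag_cache_refill()
update pagecnt_bias and remaining themselves as suggested earlier, and
that refill still clears encoded_va when it fails:

void *__page_frag_alloc_va_align(struct page_frag_cache *nc,
                                 unsigned int fragsz, gfp_t gfp_mask,
                                 unsigned int align_mask)
{
        unsigned int size, remaining;

        /* keep the "no page yet" check up front, before remaining is used */
        if (unlikely(!nc->encoded_va)) {
                if (!__page_frag_cache_refill(nc, gfp_mask))
                        return NULL;
        }

        size = page_frag_cache_page_size(nc->encoded_va);
        remaining = nc->remaining & align_mask;
        if (unlikely(remaining < fragsz)) {
                if (unlikely(fragsz > PAGE_SIZE))
                        return NULL;

                /* try to reuse the current page, otherwise grab a new one */
                if (!__page_frag_cache_reuse(nc) &&
                    !__page_frag_cache_refill(nc, gfp_mask))
                        return NULL;

                size = page_frag_cache_page_size(nc->encoded_va);
                remaining = nc->remaining;
        }

        nc->pagecnt_bias--;
        nc->remaining = remaining - fragsz;

        return encoded_page_address(nc->encoded_va) + (size - remaining);
}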