Subject: Re: [RFC v11 07/14] mm: page_frag: reuse existing space for 'size' and 'pfmemalloc'
From: Alexander H Duyck <alexander.duyck@gmail.com>
To: Yunsheng Lin <linyunsheng@huawei.com>, davem@davemloft.net, kuba@kernel.org, pabeni@redhat.com
Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org, Andrew Morton, linux-mm@kvack.org
Date: Sun, 21 Jul 2024 15:59:06 -0700
In-Reply-To: <20240719093338.55117-8-linyunsheng@huawei.com>
References: <20240719093338.55117-1-linyunsheng@huawei.com> <20240719093338.55117-8-linyunsheng@huawei.com>
On Fri, 2024-07-19 at 17:33 +0800, Yunsheng Lin wrote:
> Currently there is one 'struct page_frag' for every 'struct
> sock' and 'struct task_struct', we are about to replace the
> 'struct page_frag' with 'struct page_frag_cache' for them.
> Before begin the replacing, we need to ensure the size of
> 'struct page_frag_cache' is not bigger than the size of
> 'struct page_frag', as there may be tens of thousands of
> 'struct sock' and 'struct task_struct' instances in the
> system.
> 
> By or'ing the page order & pfmemalloc with lower bits of
> 'va' instead of using 'u16' or 'u32' for page size and 'u8'
> for pfmemalloc, we are able to avoid 3 or 5 bytes space waste.
> And page address & pfmemalloc & order is unchanged for the
> same page in the same 'page_frag_cache' instance, it makes
> sense to fit them together.
> 
> After this patch, the size of 'struct page_frag_cache' should be
> the same as the size of 'struct page_frag'.
> 
> CC: Alexander Duyck <alexander.duyck@gmail.com>
> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
> ---
>  include/linux/mm_types_task.h   | 16 +++++------
>  include/linux/page_frag_cache.h | 49 +++++++++++++++++++++++++++++++--
>  mm/page_frag_cache.c            | 49 +++++++++++++++------------------
>  3 files changed, 77 insertions(+), 37 deletions(-)
> 
> diff --git a/include/linux/mm_types_task.h b/include/linux/mm_types_task.h
> index b1c54b2b9308..f2610112a642 100644
> --- a/include/linux/mm_types_task.h
> +++ b/include/linux/mm_types_task.h
> @@ -50,18 +50,18 @@ struct page_frag {
>  #define PAGE_FRAG_CACHE_MAX_SIZE	__ALIGN_MASK(32768, ~PAGE_MASK)
>  #define PAGE_FRAG_CACHE_MAX_ORDER	get_order(PAGE_FRAG_CACHE_MAX_SIZE)
>  struct page_frag_cache {
> -	void *va;
> -#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
> +	/* encoded_va consists of the virtual address, pfmemalloc bit and order
> +	 * of a page.
> +	 */
> +	unsigned long encoded_va;
> +
> +#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE) && (BITS_PER_LONG <= 32)
>  	__u16 remaining;
> -	__u16 size;
> +	__u16 pagecnt_bias;
>  #else
>  	__u32 remaining;
> +	__u32 pagecnt_bias;
>  #endif
> -	/* we maintain a pagecount bias, so that we dont dirty cache line
> -	 * containing page->_refcount every time we allocate a fragment.
> -	 */
> -	unsigned int pagecnt_bias;
> -	bool pfmemalloc;
>  };
>  
>  /* Track pages that require TLB flushes */
> diff --git a/include/linux/page_frag_cache.h b/include/linux/page_frag_cache.h
> index ef1572f11248..12a16f8e8ad0 100644
> --- a/include/linux/page_frag_cache.h
> +++ b/include/linux/page_frag_cache.h
> @@ -3,19 +3,64 @@
>  #ifndef _LINUX_PAGE_FRAG_CACHE_H
>  #define _LINUX_PAGE_FRAG_CACHE_H
>  
> +#include
> +#include
>  #include
>  #include
>  #include
>  #include
>  
> +#define PAGE_FRAG_CACHE_ORDER_MASK		GENMASK(7, 0)

I would pull the PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE check from below
and use it to wrap this mask definition. If we don't need the order you
could define the mask as 0. With that you get the benefit of the
compiler being able to figure out that we never read the order bits,
since any value ANDed with 0 is 0.

Also, a comment explaining why you want this to be a full byte would be
useful. I am assuming it is for assembler optimization, as the shift
operation usually expects a byte.
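
Roughly what I am thinking of, as an untested sketch:

#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
#define PAGE_FRAG_CACHE_ORDER_MASK	GENMASK(7, 0)
#else
/* order is always 0 here, an all-zero mask lets the compiler strip it */
#define PAGE_FRAG_CACHE_ORDER_MASK	0UL
#endif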
> +#define PAGE_FRAG_CACHE_PFMEMALLOC_BIT		BIT(8)
> +#define PAGE_FRAG_CACHE_PFMEMALLOC_SHIFT	8
> +
> +static inline unsigned long encode_aligned_va(void *va, unsigned int order,
> +					      bool pfmemalloc)
> +{
> +	BUILD_BUG_ON(PAGE_FRAG_CACHE_MAX_ORDER > PAGE_FRAG_CACHE_ORDER_MASK);
> +	BUILD_BUG_ON(PAGE_FRAG_CACHE_PFMEMALLOC_SHIFT >= PAGE_SHIFT);
> +
> +#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
> +	return (unsigned long)va | order |
> +	       (pfmemalloc << PAGE_FRAG_CACHE_PFMEMALLOC_SHIFT);
> +#else
> +	return (unsigned long)va |
> +	       (pfmemalloc << PAGE_FRAG_CACHE_PFMEMALLOC_SHIFT);
> +#endif

So with the mask trick I called out above you could just have
(order & PAGE_FRAG_CACHE_ORDER_MASK) be one of your inputs. If
ORDER_MASK is 0, the compiler will know the result is 0 and strip the
term out on its own.

Also, doing a shift on a bool is risky. What you might look at doing
instead is something like multiplying an unsigned long bit value by the
bool, or at least recasting pfmemalloc to something other than a bool
before shifting; see the sketch after the pfmemalloc helper below.

> +}
> +
> +static inline unsigned long encoded_page_order(unsigned long encoded_va)
> +{
> +#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
> +	return encoded_va & PAGE_FRAG_CACHE_ORDER_MASK;
> +#else
> +	return 0;
> +#endif
> +}
> +

As mentioned above, if the mask takes care of it for us it should just
return 0 automatically, cutting out this code without the #if/#else
logic.

> +static inline bool encoded_page_pfmemalloc(unsigned long encoded_va)
> +{
> +	return encoded_va & PAGE_FRAG_CACHE_PFMEMALLOC_BIT;
> +}
> +

Technically you aren't returning a bool here, you are returning an
unsigned long. It would be best to wrap this in "!!()".
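
Putting those suggestions together, something like this untested sketch
(BUILD_BUG_ON checks omitted) is what I have in mind, assuming the
conditional ORDER_MASK definition above:

static inline unsigned long encode_aligned_va(void *va, unsigned int order,
					      bool pfmemalloc)
{
	/* the mask strips the order term at compile time when it is 0,
	 * and multiplying the bit by the bool avoids shifting a bool
	 */
	return (unsigned long)va |
	       (order & PAGE_FRAG_CACHE_ORDER_MASK) |
	       (PAGE_FRAG_CACHE_PFMEMALLOC_BIT * pfmemalloc);
}

static inline unsigned long encoded_page_order(unsigned long encoded_va)
{
	/* evaluates to 0 on its own when ORDER_MASK is 0 */
	return encoded_va & PAGE_FRAG_CACHE_ORDER_MASK;
}

static inline bool encoded_page_pfmemalloc(unsigned long encoded_va)
{
	/* !!() turns the bit into an actual 0/1 */
	return !!(encoded_va & PAGE_FRAG_CACHE_PFMEMALLOC_BIT);
}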
> +static inline void *encoded_page_address(unsigned long encoded_va)
> +{
> +	return (void *)(encoded_va & PAGE_MASK);
> +}
> +
>  static inline void page_frag_cache_init(struct page_frag_cache *nc)
>  {
> -	nc->va = NULL;
> +	nc->encoded_va = 0;
>  }
>  
>  static inline bool page_frag_cache_is_pfmemalloc(struct page_frag_cache *nc)
>  {
> -	return !!nc->pfmemalloc;
> +	return encoded_page_pfmemalloc(nc->encoded_va);
> +}
> +
> +static inline unsigned int page_frag_cache_page_size(unsigned long encoded_va)
> +{
> +	return PAGE_SIZE << encoded_page_order(encoded_va);
>  }
>  
>  void page_frag_cache_drain(struct page_frag_cache *nc);
> diff --git a/mm/page_frag_cache.c b/mm/page_frag_cache.c
> index b12496f05c4a..7928e5d50711 100644
> --- a/mm/page_frag_cache.c
> +++ b/mm/page_frag_cache.c
> @@ -22,7 +22,7 @@
>  static struct page *__page_frag_cache_refill(struct page_frag_cache *nc,
>  					     gfp_t gfp_mask)
>  {
> -	unsigned int page_size = PAGE_FRAG_CACHE_MAX_SIZE;
> +	unsigned long order = PAGE_FRAG_CACHE_MAX_ORDER;
>  	struct page *page = NULL;
>  	gfp_t gfp = gfp_mask;
>  
> @@ -35,28 +35,27 @@ static struct page *__page_frag_cache_refill(struct page_frag_cache *nc,
>  	if (unlikely(!page)) {
>  		page = alloc_pages_node(NUMA_NO_NODE, gfp, 0);
>  		if (unlikely(!page)) {
> -			nc->va = NULL;
> +			nc->encoded_va = 0;
>  			return NULL;
>  		}
>  
> -		page_size = PAGE_SIZE;
> +		order = 0;
>  	}
>  
> -#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
> -	nc->size = page_size;
> -#endif
> -	nc->va = page_address(page);
> +	nc->encoded_va = encode_aligned_va(page_address(page), order,
> +					   page_is_pfmemalloc(page));
>  
>  	return page;
>  }
>  
>  void page_frag_cache_drain(struct page_frag_cache *nc)
>  {
> -	if (!nc->va)
> +	if (!nc->encoded_va)
>  		return;
>  
> -	__page_frag_cache_drain(virt_to_head_page(nc->va), nc->pagecnt_bias);
> -	nc->va = NULL;
> +	__page_frag_cache_drain(virt_to_head_page((void *)nc->encoded_va),
> +				nc->pagecnt_bias);
> +	nc->encoded_va = 0;
>  }
>  EXPORT_SYMBOL(page_frag_cache_drain);
>  
> @@ -73,36 +72,30 @@ void *__page_frag_alloc_va_align(struct page_frag_cache *nc,
>  				 unsigned int fragsz, gfp_t gfp_mask,
>  				 unsigned int align_mask)
>  {
> -	unsigned int size = PAGE_SIZE;
> -	unsigned int remaining;
> +	unsigned long encoded_va = nc->encoded_va;
> +	unsigned int size, remaining;
>  	struct page *page;
>  
> -	if (unlikely(!nc->va)) {
> +	if (unlikely(!encoded_va)) {
>  refill:
>  		page = __page_frag_cache_refill(nc, gfp_mask);
>  		if (!page)
>  			return NULL;
>  
> -#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
> -		/* if size can vary use size else just use PAGE_SIZE */
> -		size = nc->size;
> -#endif
> +		encoded_va = nc->encoded_va;
> +		size = page_frag_cache_page_size(encoded_va);
> +
>  		/* Even if we own the page, we do not use atomic_set().
>  		 * This would break get_page_unless_zero() users.
>  		 */
>  		page_ref_add(page, PAGE_FRAG_CACHE_MAX_SIZE);
>  
>  		/* reset page count bias and remaining to start of new frag */
> -		nc->pfmemalloc = page_is_pfmemalloc(page);
>  		nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;
>  		nc->remaining = size;
>  	}
>  
> -#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
> -	/* if size can vary use size else just use PAGE_SIZE */
> -	size = nc->size;
> -#endif
> -
> +	size = page_frag_cache_page_size(encoded_va);

As I think I mentioned in an earlier patch, it would probably be better
to do this before the if statement above. That way you avoid
recomputing size when you allocate a new page. With any luck the
compiler will realize that this is essentially an "else" for that if
statement. Either that, or just make it an explicit else for the
allocation block above.
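
Something like this untested sketch is what I am getting at:

	if (unlikely(!encoded_va)) {
refill:
		page = __page_frag_cache_refill(nc, gfp_mask);
		if (!page)
			return NULL;

		encoded_va = nc->encoded_va;
		size = page_frag_cache_page_size(encoded_va);

		page_ref_add(page, PAGE_FRAG_CACHE_MAX_SIZE);

		nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;
		nc->remaining = size;
	} else {
		/* reusing the cached page, only recompute size here */
		size = page_frag_cache_page_size(encoded_va);
	}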
>  	remaining = nc->remaining & align_mask;
>  	if (unlikely(remaining < fragsz)) {
>  		if (unlikely(fragsz > PAGE_SIZE)) {
> @@ -118,13 +111,15 @@ void *__page_frag_alloc_va_align(struct page_frag_cache *nc,
>  			return NULL;
>  		}
>  
> -		page = virt_to_page(nc->va);
> +		page = virt_to_page((void *)encoded_va);
>  
>  		if (!page_ref_sub_and_test(page, nc->pagecnt_bias))
>  			goto refill;
>  
> -		if (unlikely(nc->pfmemalloc)) {
> -			free_unref_page(page, compound_order(page));
> +		if (unlikely(encoded_page_pfmemalloc(encoded_va))) {
> +			VM_BUG_ON(compound_order(page) !=
> +				  encoded_page_order(encoded_va));
> +			free_unref_page(page, encoded_page_order(encoded_va));
>  			goto refill;
>  		}
>  
> @@ -141,7 +136,7 @@ void *__page_frag_alloc_va_align(struct page_frag_cache *nc,
>  	nc->pagecnt_bias--;
>  	nc->remaining = remaining - fragsz;
>  
> -	return nc->va + (size - remaining);
> +	return encoded_page_address(encoded_va) + (size - remaining);
>  }
>  EXPORT_SYMBOL(__page_frag_alloc_va_align);