From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <82be328d-8f04-417f-bdf2-e8c0f6f58057@huawei.com>
Date: Tue, 27 Aug 2024 20:06:25 +0800
Subject: Re: [PATCH net-next v15 06/13] mm: page_frag: reuse existing space for 'size' and 'pfmemalloc'
To: Alexander Duyck
CC: Andrew Morton
References: <20240826124021.2635705-1-linyunsheng@huawei.com> <20240826124021.2635705-7-linyunsheng@huawei.com>
From: Yunsheng Lin
In-Reply-To:

On 2024/8/27 0:46, Alexander Duyck wrote:
> On Mon, Aug 26, 2024 at 5:46 AM Yunsheng Lin wrote:
>>
>> Currently there is one 'struct page_frag' for every 'struct
>> sock' and 'struct task_struct', and we are about to replace the
>> 'struct page_frag' with 'struct page_frag_cache' for them.
>> Before beginning the replacement, we need to ensure the size of
>> 'struct page_frag_cache' is not bigger than the size of
>> 'struct page_frag', as there may be tens of thousands of
>> 'struct sock' and 'struct task_struct' instances in the
>> system.
>>
>> By or'ing the page order & pfmemalloc with the lower bits of
>> 'va' instead of using 'u16' or 'u32' for the page size and 'u8'
>> for pfmemalloc, we can avoid wasting 3 or 5 bytes of space. And
>> as the page address & pfmemalloc & order are unchanged for the
>> same page in the same 'page_frag_cache' instance, it makes sense
>> to fit them together.
>>
>> After this patch, the size of 'struct page_frag_cache' should be
>> the same as the size of 'struct page_frag'.
>>
>> CC: Alexander Duyck
>> Signed-off-by: Yunsheng Lin
>> ---
>>  include/linux/mm_types_task.h   | 19 ++++++-----
>>  include/linux/page_frag_cache.h | 60 +++++++++++++++++++++++++++++++--
>>  mm/page_frag_cache.c            | 51 +++++++++++++++-------------
>>  3 files changed, 97 insertions(+), 33 deletions(-)
>>
>> diff --git a/include/linux/mm_types_task.h b/include/linux/mm_types_task.h
>> index cdc1e3696439..a8635460e027 100644
>> --- a/include/linux/mm_types_task.h
>> +++ b/include/linux/mm_types_task.h
>> @@ -50,18 +50,21 @@ struct page_frag {
>>  #define PAGE_FRAG_CACHE_MAX_SIZE    __ALIGN_MASK(32768, ~PAGE_MASK)
>>  #define PAGE_FRAG_CACHE_MAX_ORDER   get_order(PAGE_FRAG_CACHE_MAX_SIZE)
>>  struct page_frag_cache {
>> -        void *va;
>> -#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
>> +        /* encoded_page consists of the virtual address, pfmemalloc bit and order
>> +         * of a page.
>> +         */
>> +        unsigned long encoded_page;
>> +
>> +        /* we maintain a pagecount bias, so that we dont dirty cache line
>> +         * containing page->_refcount every time we allocate a fragment.
>> +         */
>> +#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE) && (BITS_PER_LONG <= 32)
>>          __u16 offset;
>> -        __u16 size;
>> +        __u16 pagecnt_bias;
>>  #else
>>          __u32 offset;
>> +        __u32 pagecnt_bias;
>>  #endif
>> -        /* we maintain a pagecount bias, so that we dont dirty cache line
>> -         * containing page->_refcount every time we allocate a fragment.
>> -         */
>> -        unsigned int pagecnt_bias;
>> -        bool pfmemalloc;
>>  };
>>
>>  /* Track pages that require TLB flushes */
>> diff --git a/include/linux/page_frag_cache.h b/include/linux/page_frag_cache.h
>> index 0a52f7a179c8..372d6ed7e20a 100644
>> --- a/include/linux/page_frag_cache.h
>> +++ b/include/linux/page_frag_cache.h
>> @@ -3,18 +3,74 @@
>>  #ifndef _LINUX_PAGE_FRAG_CACHE_H
>>  #define _LINUX_PAGE_FRAG_CACHE_H
>>
>> +#include
>> +#include
>>  #include
>> +#include
>>  #include
>>  #include
>>
>> +#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
>> +/* Use a full byte here to enable assembler optimization as the shift
>> + * operation is usually expecting a byte.
>> + */
>> +#define PAGE_FRAG_CACHE_ORDER_MASK          GENMASK(7, 0)
>> +#define PAGE_FRAG_CACHE_PFMEMALLOC_SHIFT    8
>> +#define PAGE_FRAG_CACHE_PFMEMALLOC_BIT      BIT(PAGE_FRAG_CACHE_PFMEMALLOC_SHIFT)
>> +#else
>> +/* Compiler should be able to figure out we don't read things as any value
>> + * ANDed with 0 is 0.
>> + */
>> +#define PAGE_FRAG_CACHE_ORDER_MASK          0
>> +#define PAGE_FRAG_CACHE_PFMEMALLOC_SHIFT    0
>> +#define PAGE_FRAG_CACHE_PFMEMALLOC_BIT      BIT(PAGE_FRAG_CACHE_PFMEMALLOC_SHIFT)
>> +#endif
>> +
>> +static inline unsigned long page_frag_encode_page(struct page *page,
>> +                                                  unsigned int order,
>> +                                                  bool pfmemalloc)
>> +{
>> +        BUILD_BUG_ON(PAGE_FRAG_CACHE_MAX_ORDER > PAGE_FRAG_CACHE_ORDER_MASK);
>> +        BUILD_BUG_ON(PAGE_FRAG_CACHE_PFMEMALLOC_BIT >= PAGE_SIZE);
>> +
>> +        return (unsigned long)page_address(page) |
>> +               (order & PAGE_FRAG_CACHE_ORDER_MASK) |
>> +               ((unsigned long)pfmemalloc << PAGE_FRAG_CACHE_PFMEMALLOC_SHIFT);
>> +}
>> +
>> +static inline unsigned long page_frag_encoded_page_order(unsigned long encoded_page)
>> +{
>> +        return encoded_page & PAGE_FRAG_CACHE_ORDER_MASK;
>> +}
>> +
>> +static inline bool page_frag_encoded_page_pfmemalloc(unsigned long encoded_page)
>> +{
>> +        return !!(encoded_page & PAGE_FRAG_CACHE_PFMEMALLOC_BIT);
>> +}
>> +
>> +static inline void *page_frag_encoded_page_address(unsigned long encoded_page)
>> +{
>> +        return (void *)(encoded_page & PAGE_MASK);
>> +}
>> +
>> +static inline struct page *page_frag_encoded_page_ptr(unsigned long encoded_page)
>> +{
>> +        return virt_to_page((void *)encoded_page);
>> +}
>> +
>>  static inline void page_frag_cache_init(struct page_frag_cache *nc)
>>  {
>> -        nc->va = NULL;
>> +        nc->encoded_page = 0;
>>  }
>>
>>  static inline bool page_frag_cache_is_pfmemalloc(struct page_frag_cache *nc)
>>  {
>> -        return !!nc->pfmemalloc;
>> +        return page_frag_encoded_page_pfmemalloc(nc->encoded_page);
>> +}
>> +
>> +static inline unsigned int page_frag_cache_page_size(unsigned long encoded_page)
>> +{
>> +        return PAGE_SIZE << page_frag_encoded_page_order(encoded_page);
>>  }
>>
>>  void page_frag_cache_drain(struct page_frag_cache *nc);
>
> So how many of these additions are actually needed outside of the
> page_frag_cache.c file? I'm just wondering if we could look at moving

At least page_frag_cache_is_pfmemalloc(), page_frag_encoded_page_order(),
page_frag_encoded_page_ptr() and page_frag_encoded_page_address() are needed
outside of the page_frag_cache.c file for now; they are used mostly in
__page_frag_cache_commit() and __page_frag_alloc_refill_probe_align() for
debugging and performance reasons, see patches 7 & 10.

The only one left is page_frag_encode_page(); I am not sure it makes much
sense to move it to page_frag_cache.c while the rest of them stay in the
.h file.

> most of these into the c file itself instead of making them accessible
> to all callers as I don't believe we currently have anyone looking
> into the size of the frag cache or anything like that and I would
> prefer to avoid exposing such functionality if possible. As the
> non-order0 allocation problem with this has pointed out, people will
> exploit any interface exposed even if unintentionally.
>
> I would want to move the size/order logic as well as splitting out the
> virtual address as we shouldn't be allowing the user to look at that
> without going through an allocation function.

I generally agree with the above argument if there is a way to do that
without sacrificing the debugging and performance mentioned above.
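
For illustration, below is a minimal userspace-style sketch of the bit
layout being discussed. The EX_* macros and the encode/decode helper names
here are made-up stand-ins, and a 4K-page, 64-bit configuration is assumed;
the authoritative definitions are the PAGE_FRAG_CACHE_* macros and the
page_frag_encode_page()/page_frag_encoded_page_*() helpers quoted in the
patch above.

#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Illustrative stand-ins for the kernel definitions (4K pages, 64-bit). */
#define EX_PAGE_SIZE            4096UL
#define EX_PAGE_MASK            (~(EX_PAGE_SIZE - 1))
#define EX_ORDER_MASK           0xffUL  /* GENMASK(7, 0) */
#define EX_PFMEMALLOC_SHIFT     8
#define EX_PFMEMALLOC_BIT       (1UL << EX_PFMEMALLOC_SHIFT)

/* Pack a page-aligned virtual address, its order and the pfmemalloc flag
 * into one unsigned long, mirroring what the patch does for encoded_page.
 */
static unsigned long encode(void *va, unsigned int order, bool pfmemalloc)
{
        return (unsigned long)va | (order & EX_ORDER_MASK) |
               ((unsigned long)pfmemalloc << EX_PFMEMALLOC_SHIFT);
}

static unsigned int decode_order(unsigned long encoded)
{
        return encoded & EX_ORDER_MASK;
}

static bool decode_pfmemalloc(unsigned long encoded)
{
        return !!(encoded & EX_PFMEMALLOC_BIT);
}

static void *decode_address(unsigned long encoded)
{
        return (void *)(encoded & EX_PAGE_MASK);
}

int main(void)
{
        /* aligned_alloc() stands in for a page allocation here. */
        void *va = aligned_alloc(EX_PAGE_SIZE, EX_PAGE_SIZE << 3);
        unsigned long encoded = encode(va, 3, true);

        /* The low bits of a page-aligned address are always zero, so the
         * order (bits 0-7) and pfmemalloc flag (bit 8) round-trip without
         * disturbing the address itself.
         */
        assert(decode_address(encoded) == va);
        assert(decode_order(encoded) == 3);
        assert(decode_pfmemalloc(encoded));

        printf("cache size: %lu\n", EX_PAGE_SIZE << decode_order(encoded));
        free(va);
        return 0;
}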