Date: Wed, 1 Feb 2023 20:06:37 +0200
From: Mike Rapoport <rppt@kernel.org>
To: lsf-pc@lists.linux-foundation.org, linux-mm@kvack.org
Subject: [LSF/MM/BPF TOPIC] reducing direct map fragmentation
Hi all,

There are use cases that need to remove pages from the direct map, or at least map them at PTE level. These include vfree, module loading, ftrace, kprobes, BPF, secretmem, and generally any caller of the set_memory/set_direct_map APIs. Remapping pages at PTE level splits the PUD- and PMD-sized mappings in the direct map, which leads to performance degradation.
To reduce the performance hit caused by this fragmentation of the direct map, it makes sense to group and/or cache the base pages removed from the direct map, so that most of the base pages created by a split of a large page are consumed by users that require PTE-level mappings.

Last year, the proposal to use a new migrate type for such a cache received strong pushback, and the suggested alternative was to try using slab instead. I've been thinking about it (yeah, it took me a while) and I believe slab is not appropriate: the use cases require at least page-size allocations, some would really benefit from higher-order allocations, and in most cases the code that allocates memory excluded from the direct map needs the struct page/folio. For example, caching text allocations in 2M pages would benefit from reduced iTLB pressure, and doing kmalloc() from vmalloc() would be far more intrusive than using some variant of __alloc_pages(). Secretmem, and potentially PKS-protected page tables, also need the struct page/folio.

My current proposal is to have a cache of 2M pages close to the page allocator, with a GFP flag that makes an allocation request use that cache. On the free() path, pages that are mapped at PTE level will be put back into that cache. The cache is internally implemented as a buddy allocator, so it can satisfy high-order allocations, and a shrinker will release free pages from the cache back to the page allocator.

I hope to have a first prototype posted Really Soon.

-- 
Sincerely yours,
Mike.