Date: Thu, 12 Dec 2024 08:25:08 -0800
From: Johannes Weiner
To: Barry Song <21cnbao@gmail.com>
Cc: David Hildenbrand, Christoph Hellwig, akpm@linux-foundation.org,
 linux-mm@kvack.org, axboe@kernel.dk, bala.seshasayee@linux.intel.com,
 chrisl@kernel.org, kanchana.p.sridhar@intel.com, kasong@tencent.com,
 nphamcs@gmail.com, ryan.roberts@arm.com, senozhatsky@chromium.org,
 terrelln@fb.com, usamaarif642@gmail.com, v-songbaohua@oppo.com,
 wajdi.k.feghali@intel.com, willy@infradead.org,
 ying.huang@linux.alibaba.com, yosryahmed@google.com,
 baolin.wang@linux.alibaba.com
Subject: Re: [PATCH RFC] mm: map zero-filled pages to zero_pfn while doing swap-in
Message-ID: <20241212162508.GA4712@cmpxchg.org>
References: <20241212073711.82300-1-21cnbao@gmail.com>
 <41e33113-1ac4-48fa-8eac-0f90ba5bc864@redhat.com>

On Thu, Dec 12, 2024 at 10:16:22PM +1300, Barry Song wrote:
> On Thu, Dec 12, 2024 at 9:51 PM David Hildenbrand wrote:
> >
> > On 12.12.24 09:46, Barry Song wrote:
> > > On Thu, Dec 12, 2024 at 9:29 PM Christoph Hellwig wrote:
> > >>
> > >> On Thu, Dec 12, 2024 at 08:37:11PM +1300, Barry Song wrote:
> > >>> From: Barry Song
> > >>>
> > >>> While developing the zeromap series, Usama observed that certain
> > >>> workloads may contain over 10% zero-filled pages. This may present
> > >>> an opportunity to save memory by mapping zero-filled pages to zero_pfn
> > >>> in do_swap_page(). If a write occurs later, do_wp_page() can
> > >>> allocate a new page using the Copy-on-Write mechanism.
> > >>
> > >> Shouldn't this be done during, or rather instead of swap out instead?
> > >> Swapping all zero pages out just to optimize the in-memory
> > >> representation on seems rather backwards.
> > >
> > > I’m having trouble understanding your point—it seems like you might
> > > not have fully read the code. :-)
> > >
> > > The situation is as follows: for a zero-filled page, we are currently
> > > allocating a new page unconditionally.
> > > By mapping this zero-filled page to zero_pfn, we could
> > > save the memory used by this page.
> > >
> > > We don't need to allocate the memory until the page is written(which may never
> > > happen).
> >
> > I think what Christoph means is that you would determine that at PTE
> > unmap time, and directly place the zero page in there. So there would be
> > no need to have the page fault at all.
> >
> > I suspect at PTE unmap time might be problematic, because we might still
> > have other (i.e., GUP) references modifying that page, and we can only
> > rely on the page content being stable after we flushed the TLB as well.
> > (I recall some deferred flushing optimizations)
>
> Yes, we need to follow a strict sequence:
>
> 1. try_to_unmap - unmap PTEs in all processes;
> 2. try_to_unmap_flush_dirty - flush deferred TLB shootdown;
> 3. pageout - zeromap will set 1 in bitmap if page is zero-filled
>
> At the moment of pageout(), we can be confident that the page is zero-filled.
>
> mapping to zeropage during unmap seems quite risky.

You have to unmap and flush to stop modifications, but I think not in
all processes before it's safe to decide. Shared anon pages have COW
semantics; when you enter try_to_unmap() with a page and rmap gives
you a pte, it's one of these:

  a) never forked, no sibling ptes
  b) cow broken into private copy, no sibling ptes
  c) cow/WP; any writes to this or another pte will go to a new page.

In cases a and b you need to unmap and flush the current pte, but then
it's safe to check contents and set the zero pte right away, even
before finishing the rmap walk.

In case c, modifications to the page are impossible due to WP, so you
don't even need to unmap and flush before checking the contents. The
pte lock holds up COW breaking to a new page until you're done.

It's definitely more complicated than the current implementation, but
if it can be made to work, we could get rid of the bitmap.

You might also reduce faults, but I'm a bit skeptical. Presumably
zerofilled regions are mostly considered invalid by the application,
not useful data, so a populating write that will cowbreak seems more
likely to happen next than a faultless read from the zeropage.
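
To make the a/b/c case analysis above concrete, here is a toy
userspace model in C. It is only a sketch, not kernel code: every name
in it (toy_pte_case, toy_result, toy_page_is_zero, toy_unmap_one) is
hypothetical, the "unmap and flush" step is a flag rather than a real
TLB shootdown, and it assumes the page contents cannot change once
that step is done or while the pte is write-protected. GUP references
and the pte lock are not modeled at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 4096

/* Hypothetical classification of the pte handed out by the rmap walk. */
enum toy_pte_case {
	TOY_NEVER_FORKED,	/* case a: no sibling ptes */
	TOY_COW_BROKEN,		/* case b: private copy, no sibling ptes */
	TOY_COW_WP,		/* case c: shared and write-protected */
};

struct toy_result {
	bool flushed_first;	/* pte was unmapped + flushed before the check */
	bool map_zero;		/* pte could be pointed at the zero page */
};

/* Scan the whole page; true only if every byte is zero. */
bool toy_page_is_zero(const unsigned char *page)
{
	assert(page != NULL);
	for (size_t i = 0; i < TOY_PAGE_SIZE; i++) {
		if (page[i] != 0)
			return false;
	}
	return true;
}

/* Model the per-pte decision for one step of the rmap walk. */
struct toy_result toy_unmap_one(enum toy_pte_case c, const unsigned char *page)
{
	struct toy_result r = { .flushed_first = false, .map_zero = false };

	/*
	 * Cases a and b: the pte may be writable, so writes through it
	 * have to be stopped before the contents can be trusted. The
	 * flag stands in for unmapping and flushing this one pte.
	 */
	if (c != TOY_COW_WP)
		r.flushed_first = true;

	/*
	 * Case c: write-protected, so the contents are assumed stable
	 * already and can be checked without unmapping first.
	 */
	r.map_zero = toy_page_is_zero(page);
	return r;
}
```

In this model a zero-filled page in case a or b yields flushed_first
and map_zero both set, while a zero-filled page in case c yields
map_zero without flushed_first. A page with any non-zero byte never
gets map_zero, whichever case it is in.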