Date: Fri, 8 Mar 2024 09:44:18 -0400
From: Jason Gunthorpe
To: Alistair Popple
Cc: linux-mm@kvack.org, jhubbard@nvidia.com, rcampbell@nvidia.com, willy@infradead.org, dan.j.williams@intel.com, david@fromorbit.com, linux-fsdevel@vger.kernel.org, jack@suse.cz, djwong@kernel.org, hch@lst.de, david@redhat.com
Subject: Re: ZONE_DEVICE refcounting
Message-ID: <20240308134418.GH9179@nvidia.com>
References: <87ttlhmj9p.fsf@nvdebian.thelocal>
In-Reply-To: <87ttlhmj9p.fsf@nvdebian.thelocal>

On Fri, Mar 08, 2024 at 03:24:35PM +1100, Alistair Popple wrote:
> Hi,
>
> I have been looking at fixing up ZONE_DEVICE refcounting again. Specifically I
> have been looking at fixing the 1-based refcounts that are currently used for
> FS DAX pages (and p2pdma pages, but that's trivial).
>
> This started with the simple idea of "just subtract one from the
> refcounts everywhere and that will fix the off by one". Unfortunately
> it's not that simple. For starters doing a simple conversion like that
> requires allowing pages to be mapped with zero refcounts. That seems
> wrong. It also leads to problems detecting idle IO vs. page map pages.
>
> So instead I'm thinking of doing something along the lines of the following:
>
> 1. Refcount FS DAX pages normally. I.e. map them with vm_insert_page() and
>    increment the refcount in line with mapcount and decrement it when pages are
>    unmapped.

This is the right thing to do.

> 2. As per normal pages the pages are considered free when the refcount drops
>    to zero.
>
> 3. Because these are treated as normal pages for refcounting we no longer map
>    them as pte_devmap() (possibly freeing up a PTE bit).

Yes, the pmd/pte_devmap() should ideally go away.

> 4. PMD sized FS DAX pages get treated the same as normal compound pages.
>
> 5. This means we need to allow compound ZONE_DEVICE pages. Tail pages share
>    the page->pgmap field with page->compound_head, but this isn't a problem
>    because the LSB of page->pgmap is free and we can still get pgmap from
>    compound_head(page)->pgmap.

Right, this is the actual work - the mm is obviously already happy with
its part, fsdax just needs to create a properly sized folio and map it
properly.

> 6. When FS DAX pages are freed they notify filesystem drivers. This can be done
>    from the pgmap->ops->page_free() callback.
>
> 7. We could probably get rid of the pgmap refcounting because we can just scan
>    pages and look for any pages with non-zero references and wait for them to be
>    freed whilst ensuring no new mappings can be created (some drivers do a
>    similar thing for private pages today). This might be a follow-up
>    change.

Yeah, the pgmap refcounting needs some cleanup for sure.

Jason
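
For readers following point 5, here is a rough userspace sketch of the
pointer-tagging idea it relies on. This is not kernel code and not taken
from any patch in this thread: every toy_* name is hypothetical, and the
only identifiers borrowed from the discussion are pgmap and compound_head.
It assumes the pgmap object is at least 2-byte aligned, so bit 0 of its
address is free to mark a tail page, and it needs C11 for the anonymous
union.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct dev_pagemap. The int member gives it
 * an alignment of at least 2, so bit 0 of its address is always clear. */
struct toy_pgmap {
    int id;
};

/* Toy model of the shared word: a head (or order-0) page stores the pgmap
 * pointer, a tail page stores the head page address with bit 0 set. */
struct toy_page {
    union {
        struct toy_pgmap *pgmap;
        uintptr_t compound_head;
    };
};

static void toy_set_head(struct toy_page *head, struct toy_pgmap *pgmap)
{
    head->pgmap = pgmap;
}

static void toy_set_tail(struct toy_page *tail, struct toy_page *head)
{
    tail->compound_head = (uintptr_t)head | 1u;
}

/* Return the head page: strip the tag bit if this is a tail page. */
static struct toy_page *toy_compound_head(struct toy_page *page)
{
    uintptr_t word = page->compound_head;

    if (word & 1u)
        return (struct toy_page *)(word - 1);
    return page;
}

/* Look up pgmap via the head page, so it works for head and tail alike. */
static struct toy_pgmap *toy_page_pgmap(struct toy_page *page)
{
    return toy_compound_head(page)->pgmap;
}

/* Build one head page plus three tail pages and check every lookup. */
static int toy_demo(void)
{
    struct toy_pgmap pgmap = { .id = 7 };
    struct toy_page pages[4];

    toy_set_head(&pages[0], &pgmap);
    for (size_t i = 1; i < 4; i++)
        toy_set_tail(&pages[i], &pages[0]);

    for (size_t i = 0; i < 4; i++) {
        assert(toy_compound_head(&pages[i]) == &pages[0]);
        assert(toy_page_pgmap(&pages[i]) == &pgmap);
    }
    return toy_page_pgmap(&pages[3])->id;
}
```

The point of the sketch is only that a tail page never needs its own
pgmap pointer: one tag bit is enough to tell the two uses of the word
apart, and the lookup always goes through the head page.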