Date: Tue, 5 Mar 2024 20:35:54 +0000
From: Matthew Wilcox
To: David Hildenbrand
Cc: linux-mm@kvack.org, Oscar Salvador
Subject: Re: [PATCH 0/5] Remove some races around folio_test_hugetlb
References: <20240301214712.2853147-1-willy@infradead.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Tue, Mar 05, 2024 at 10:10:08AM +0100, David Hildenbrand wrote:
> > The cost of this reliability is that we now consume the word I recently
> > freed in folio->page[1].  I think this is acceptable; we've still gained
> > a completely reliable folio_test_hugetlb() (which we didn't have before
> > I started messing around with the folio dtors).  Non-hugetlb users
> > can use large_id as a pointer to something else entirely, or even as a
> > non-pointer, as long as they can guarantee it can't conflict (ie don't
> > use it as a bitfield).
>
> That probably means that we have to always set the lowest bit to use it for
> something else, or use another bit.

Yes, that would work.

> I was wondering if
>
> a) We could move that to another subpage. In hugetlb folios we have plenty
> of space for such things.
> I guess we'd have to be able to detect the folio size
> without holding a reference, to make sure we can touch another subpage.

Yes, that was my concern.  I wanted to put it in page[2] with all the
other hugetlb goop, but I got to thinking about an order-1 compound page
allocated at the end of memmap and got scared.

We could make folio_test_hugetlb() look at ->flags for the head bit,
then look at ->flags_1 for the order and finally at ->hugetlb_id, but
now we've looked at three cachelines to answer a fairly frequent question.

And then what if the folio got split between looking at ->flags and
->flags_1 and we get a bogus folio order that makes it look OK?  We can't
even look at ->flags, ->flags_1 and recheck ->flags because it might
have got split, freed and reallocated in the meantime.

> b) We could overload _nr_pages_mapped. We'd effectively have to steal one
> bit from _nr_pages_mapped to make this work.
>
> Maybe what works is using the existing mechanism (hugetlb flag), and then
> storing the pointer in _nr_pages_mapped.
>
> So depending on the hugetlb flag, we can interpret _nr_pages_mapped either
> as the pointer or as the old variant.
>
> Mostly only folio_large_is_mapped() would need care for now, to ignore
> _nr_pages_mapped if the hugetlb flag is set.

I don't mind that at all.  We wouldn't even need to steal a bit or use
the existing flag; we could just say that -2 means this is a hugetlb folio.
That works as long as it ends up at the same offset as page->mapping
(because that's always NULL or a pointer, possibly with a low bit set,
so it can't ever be a number between -4095 and -1).

IOW:

word    page0                   page1
0       flags                   flags
1       lru.next                head
2       lru.prev                entire_mapcount + gap
3       mapping                 nr_pages_mapped + gap / hugetlb_id
4       index                   pincount + nr_pages
5       private                 unused
6       mapcount+refcount       mapcount+refcount(0)
7       memcg_data              -

or on 32-bit

word    page0                   page1
0       flags                   flags
1       lru.next                head
2       lru.prev                entire_mapcount
3       mapping                 nr_pages_mapped / hugetlb_id
4       index                   pincount
5       private                 unused
6       mapcount                mapcount
7       refcount                refcount
8       memcg_data              -
9+      virtual? last_cpupid?   whatever

Does this fit with your plans?
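
For illustration only, here is a user-space sketch of the "-2 means
hugetlb" idea and of why it relies on word 3 lining up with
page->mapping. It is not kernel code. struct toy_page,
TOY_HUGETLB_SENTINEL and the toy_*() helpers are names made up for this
example, and it assumes the sentinel fills the whole of word 3 in page1,
which the mail above does not spell out. The range check is modelled on
the kernel's IS_ERR_VALUE() convention.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: same bound as the kernel's errno-pointer range. */
#define TOY_MAX_ERRNO 4095

/* Hypothetical sentinel: "-2 means this is a hugetlb folio". */
#define TOY_HUGETLB_SENTINEL ((uintptr_t)(-2))

/* Toy model of the first four words of the layout table above. */
struct toy_page {
	uintptr_t flags;  /* word 0: flags in both page0 and page1 */
	uintptr_t word1;  /* lru.next in page0, head in page1 */
	uintptr_t word2;  /* lru.prev in page0, entire_mapcount in page1 */
	uintptr_t word3;  /* mapping in page0, nr_pages_mapped / hugetlb_id in page1 */
};

/* The scheme only works if the sentinel sits exactly at word 3. */
_Static_assert(offsetof(struct toy_page, word3) == 3 * sizeof(uintptr_t),
	       "sentinel word must share its offset with mapping");

/* True if w, read as a signed word, lies between -4095 and -1. */
static inline int toy_in_errno_range(uintptr_t w)
{
	return w >= (uintptr_t)(-TOY_MAX_ERRNO);
}

/*
 * One load from one word. If the page being inspected is really a page0
 * (or an unrelated page), word 3 holds NULL or a pointer, possibly with
 * a low bit set, so it cannot compare equal to the sentinel.
 */
static inline int toy_test_hugetlb(const struct toy_page *page1)
{
	return page1->word3 == TOY_HUGETLB_SENTINEL;
}
```

In this model the check touches a single word of page[1], so it avoids
the flags / flags_1 / hugetlb_id sequence and the split race described
above. Whether a real implementation could store the value that way next
to nr_pages_mapped is a separate question this sketch does not answer.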